Re: Don't use the \C escape in regexes - Why not?

Michael Ludwig Tue, 04 May 2010 04:06:46 -0700

On 04.05.2010 at 11:09, Gisle Aas wrote:

> I regret that I let \C sneak into the URI module.



I think I now understand why \C might be a bad idea in that method, and 
perhaps in general.

The fact that character strings in Perl are encoded in UTF-8 internally is an 
implementation detail, and you shouldn't care about it or make any assumptions 
about this technicality. But by using \C to derive an encoded version - a byte 
string - from a character string (and maybe even taking it for granted that 
you'll get a UTF-8 byte string), you're tying your interface to that 
implementation detail. The behaviour of your code would change as soon as Perl 
moved on to use, say, UTF-16 as the internal encoding. (Which is highly 
unlikely, but that's another story.)

Is it this (theoretically fragile) implicitness in handling character strings 
that makes \C a bad idea?

But probably not as bad an idea as relying on the default platform encoding in 
Java ("default charset" in Java API doc lingo), which may be different from 
country to country and from installation to installation.

http://java.sun.com/javase/6/docs/api/java/lang/String.html#String%28byte[]%29
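To make the Java comparison concrete, here is a minimal sketch of naming the charset explicitly instead of relying on the platform default. The class and method names (ExplicitCharset, encodeUtf8, decodeUtf8) are made up for illustration, and StandardCharsets assumes Java 7 or later; on Java 6 you would use Charset.forName("UTF-8") instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class ExplicitCharset {

    // Encode with a named charset: the same bytes on every installation.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with a named charset instead of calling new String(byte[]).
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u00FC"; // LATIN SMALL LETTER U WITH DIAERESIS

        byte[] explicit = encodeUtf8(s);
        // getBytes() without an argument uses Charset.defaultCharset(),
        // so this result may differ from one installation to the next.
        byte[] implicit = s.getBytes();

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("explicit UTF-8:  " + Arrays.toString(explicit));
        System.out.println("platform bytes:  " + Arrays.toString(implicit));
        System.out.println("round trip ok:   " + decodeUtf8(explicit).equals(s));
    }
}
```

The explicit variant always yields the two bytes 0xC3 0xBC for this character, whereas the platform-dependent variant yields whatever the local default charset produces.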

-- 
Michael.Ludwig (#) XING.com
