Re: Don't use the \C escape in regexes - Why not?

Michael Ludwig Tue, 04 May 2010 04:22:15 -0700

On 04.05.2010 at 13:06, Michael Ludwig wrote:

> Is it this (theoretically fragile) implicitness in handling character strings 
> that makes \C a bad idea?
> 
> But probably not as bad an idea as relying on the default platform encoding 
> in Java ("default charset" in Java API doc lingo), which may be different 
> from country to country and from installation to installation.
> 
> http://java.sun.com/javase/6/docs/api/java/lang/String.html#String%28byte[]%29
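
A minimal sketch of the decoding pitfall referred to above. The class and method names are hypothetical and chosen for illustration only. `new String(byte[])` decodes with the platform default charset, so UTF-8 input may come out garbled on an installation whose default is something else (for example windows-1252). Naming the charset explicitly gives the same result everywhere. Note that on JDK 18 and later the default charset is UTF-8 unless configured otherwise (JEP 400), so the difference mostly shows up on older JDKs or non-default configurations.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // UTF-8 bytes for "Kase" with an a-umlaut (U+00E4), which encodes as 0xC3 0xA4.
    static final byte[] UTF8_BYTES = {
        (byte) 'K', (byte) 0xC3, (byte) 0xA4, (byte) 's', (byte) 'e'
    };

    // Implicit: uses the platform default charset, so the result can vary by installation.
    static String decodeImplicit(byte[] bytes) {
        return new String(bytes);
    }

    // Explicit: always decodes as UTF-8, so the result is the same on every installation.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("implicit: " + decodeImplicit(UTF8_BYTES));
        System.out.println("explicit: " + decodeExplicit(UTF8_BYTES));
    }
}
```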


Or, as the more symmetrical counterpart to encoding via \C in Perl:

http://java.sun.com/javase/6/docs/api/java/lang/String.html#getBytes%28%29

  public byte[] getBytes()
    Encodes this String into a sequence of bytes
    using the platform's default charset, storing
    the result into a new byte array.
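
A sketch of the encoding direction, again with hypothetical class and method names. `getBytes()` with no argument uses the default charset, so the byte count for a non-ASCII string depends on the installation (one byte for the a-umlaut under ISO-8859-1 or windows-1252, two under UTF-8). Passing a `Charset` removes that dependency.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeDemo {
    // "Kase" with an a-umlaut (U+00E4), written with a Unicode escape to keep the source ASCII-only.
    static final String TEXT = "K\u00e4se";

    // Implicit: encodes with the platform default charset.
    static byte[] encodeImplicit(String s) {
        return s.getBytes();
    }

    // Explicit: always encodes as UTF-8 (the a-umlaut becomes 0xC3 0xA4).
    static byte[] encodeExplicit(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("implicit: " + Arrays.toString(encodeImplicit(TEXT)));
        System.out.println("explicit: " + Arrays.toString(encodeExplicit(TEXT)));
    }
}
```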

This is a much more serious and real problem than implicitly encoding via \C in
Perl, because Java installations do not all use the same platform encoding,
whereas all current Perl installations use the same internal encoding. (All
Java installations use the same internal encoding, UTF-16, I think, but this
fact is well hidden from the interface.)
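
A small sketch of where the UTF-16 model does surface through the String API, using a hypothetical class name. `length()` counts UTF-16 code units, so a supplementary character such as U+1F600 counts as two. Unlike the examples above, these results do not depend on the default charset. (Since JDK 9, compact strings may store Latin-1-only text with one byte per char internally, but the API still presents UTF-16 code units.)

```java
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    // U+1F600 (outside the BMP, so a surrogate pair) followed by 'a'.
    static final String TEXT = new StringBuilder().appendCodePoint(0x1F600).append('a').toString();

    public static void main(String[] args) {
        // length() counts UTF-16 code units: 2 for the surrogate pair plus 1 for 'a'.
        System.out.println("length(): " + TEXT.length());
        // codePointCount() counts Unicode code points: 2.
        System.out.println("code points: " + TEXT.codePointCount(0, TEXT.length()));
        // UTF-16BE uses 2 bytes per code unit and writes no byte-order mark: 6 bytes.
        System.out.println("UTF-16BE bytes: " + TEXT.getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```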

-- 
Michael.Ludwig (#) XING.com