Re: Don't use the \C escape in regexes - Why not?

2010-05-04 Thread Michael Ludwig
On 04.05.2010 at 11:09, Gisle Aas wrote: I regret that I let \C sneak into the URI module. I might have understood why one might think that \C is not a good idea to use in that method, and maybe not in general. The fact that character strings in Perl are encoded in UTF-8 is an

Re: Don't use the \C escape in regexes - Why not?

2010-05-04 Thread Michael Ludwig
On 04.05.2010 at 13:06, Michael Ludwig wrote: Is it this (theoretically fragile) implicitness in handling character strings that makes \C a bad idea? But probably not as bad an idea as relying on the default platform encoding in Java (default charset in Java API doc lingo), which may be

Re: Don't use the \C escape in regexes - Why not?

2010-05-04 Thread Gisle Aas
I regret that I let \C sneak into the URI module. Now we have an interface that depends on the internal UTF-8 flag of the strings passed in. This makes it very hard to explain, makes it not do what you want when different types of strings are combined and makes it hard to fix in ways that don't

Re: Don't use the \C escape in regexes - Why not?

2010-05-04 Thread Aristotle Pagaltzis
* Michael Ludwig michael.lud...@xing.com [2010-05-04 14:55]: But wait a second: While URIs are meant to be made of characters, they're also meant to go over the wire, and there are no characters on the wire, only bytes. There is no standard encoding defined for the wire, although UTF-8 has
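
To illustrate the point about bytes on the wire: the same character percent-encodes to different byte sequences depending on which encoding is chosen. This is only a rough sketch in Python rather than Perl, using the standard library's `urllib.parse.quote`; the variable names are made up for the example and nothing here comes from the URI module itself.

```python
from urllib.parse import quote

# One character: LATIN SMALL LETTER E WITH ACUTE (U+00E9).
ch = "\u00e9"

# A URI carries bytes, so the character has to be encoded first.
# The chosen encoding decides which bytes get percent-escaped.
utf8_form = quote(ch, encoding="utf-8")      # two bytes: C3 A9
latin1_form = quote(ch, encoding="latin-1")  # one byte: E9

print(utf8_form)    # %C3%A9
print(latin1_form)  # %E9
```

Here the encoding is stated explicitly at the call site, so the result does not depend on how the string happens to be stored internally.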