Re: removing accents

2004-01-03 Thread Eric Cholet
On 3 Jan 04, at 15:49, Jarkko Hietaniemi wrote: I'm afraid the process of taking NFD and then removing \pM characters (remove_accent() as below) would remove too much, stripping marks other than accents. For example, it replaces '≠' (U+2260) with '=' since a mathematical "negation slash" is encoded by

Re: Keeping byte-wise processing as an option

2004-01-03 Thread Guido Flohr
Martin Duerst wrote: in 5.6 and 5.8, but _in_principle_ the bytes pragma should tell Perl in both 5.6 and 5.8 that "I want bytes, darn it." But you still run into problems when you pass UTF-8 flagged variables to legacy modules without the pragma. Yes, that seems to do the job. But is this availab

Re: removing accents

2004-01-03 Thread Jarkko Hietaniemi
I'm afraid the process of taking NFD and then removing \pM characters (remove_accent() as below) would remove too much, stripping marks other than accents. For example, it replaces '≠' (U+2260) with '=' since a mathematical "negation slash" is encoded by U+0338 which is to be removed. Also, although they
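
The remove_accent() routine mentioned above is Perl and is not shown in this excerpt. As a rough illustration of the pitfall being described, here is a minimal Python sketch of the same idea (normalize to NFD, then drop every character in a Mark category). The function name remove_marks is hypothetical, and this is not the code from the thread.

```python
import unicodedata

def remove_marks(text: str) -> str:
    """Decompose to NFD, then drop all Mark (M*) characters.

    Hypothetical helper for illustration only; it mirrors the
    'NFD followed by removing \\pM' approach discussed above.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )

# An accented letter loses its accent, as intended.
print(remove_marks("caf\u00e9"))

# U+2260 (NOT EQUAL TO) decomposes to '=' plus U+0338, a combining mark,
# so the mark is stripped and only '=' is left.
print(remove_marks("\u2260"))
```

The second call shows the concern raised in the message: U+0338 is a combining mark but not an accent, so stripping all marks silently changes the meaning of the symbol.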

Re: Keeping byte-wise processing as an option

2004-01-03 Thread Jarkko Hietaniemi
5.00503, 5.6.x and 5.8.x. I don't think that the tricks you need to program around the Unicode cliffs across Perl versions are collected in a document. I think now that people have had time to "Unicodify" their applications with 5.8.x, starting to collect the tricks required and found useful woul

Re: Keeping byte-wise processing as an option

2004-01-03 Thread Jarkko Hietaniemi
If it were just me, that would be easy. But stating on an FAQ page 'use Perl 5.8.1 or later' for something that worked probably even in Perl 4 doesn't look like a good idea. Perl 4? And here I was being afraid that getting 5.6 to work right would be tricky... :-) I think we need to define "work"