Re: Converting string to UTF-16LE

2004-03-02 Thread Larry Wall
On Wed, Mar 03, 2004 at 09:15:00AM +0200, Jarkko Hietaniemi wrote: : FWIW, for this particular "case" and for the Perl 5.10 and Perl 6 : I think the best way to handle these Unicode case foldings : (CaseFoldings.txt and SpecCase.txt) would most probably be to do the : foldings in _Perl_ compile-ti

Re: Converting string to UTF-16LE

2004-03-02 Thread Jarkko Hietaniemi
Anyway, sounds to me like someone has mixed Level 3 support into levels 1 and 2. If that's the case, I think it's a fundamental mistake. Perl 5 should pick a level to default to, and stick with it. Going to other levels should require explicit lexically-scoped declaration to minimize magical ac

Re: Converting string to UTF-16LE

2004-03-02 Thread Jarkko Hietaniemi
Sure, but if you let them, the Unicode Consortium will drive the required minimum default Unicode support level up to about 42, and then we won't need a new release of Windows to slow everyone down. :-) What, you mean Perl 6 is not going to be also unified with XML and called XERL? :-) -- Jarkko

Re: Converting string to UTF-16LE

2004-03-02 Thread Jarkko Hietaniemi
Offhand (and I'm just guessing here from the contents of the hashes), somebody has overgeneralized somewhere, and applied language-specific transformations when they're not desired, with the result that utf8 strings have to be prepared to change lengths at various times. And changing string lengths

Re: Converting string to UTF-16LE

2004-03-02 Thread Larry Wall
On Tue, Mar 02, 2004 at 10:16:43PM +0200, Jarkko Hietaniemi wrote: : ... and following the CaseFolding.txt is required in the Unicode : regular expression : guidelines (http://www.unicode.org/unicode/reports/tr18/), the "Default : Loose Matches" : (http://www.unicode.org/unicode/reports/tr18/

Re: Converting string to UTF-16LE

2004-03-02 Thread Jarkko Hietaniemi
I think I now managed to shave off the speed hit of those special casing tables quite well, and all tests still pass (brute force removal of the tables made some tests of op/lc, op/pat, and all of uni/* to fall flat on their face), now UTF-8 casing operations are "only" half the speed of non-.

Re: Converting string to UTF-16LE

2004-03-02 Thread Jarkko Hietaniemi
Larry Wall <[EMAIL PROTECTED]> writes: On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote: : For this example the search value will be "Ibañez". Because of the search : isn't case-sensitive, all letters should be uppercased, using the uc method. I don't think this is your problem,

Re: Converting string to UTF-16LE

2004-03-02 Thread Jarkko Hietaniemi
If I can recall correctly, the case tables were in response to the Unicode CaseFolding table (lib/unicore/CaseFolding.txt) which does indeed define language-independent foldings that are more complex than usual (mostly caused by encoding irregularities in Unicode) Maybe just the placement of thos

Re: Converting string to UTF-16LE

2004-03-02 Thread Larry Wall
On Tue, Mar 02, 2004 at 05:25:21PM +0100, Robert Allerstorfer wrote: : > On Mon, 01 Mar 2004 20:55:14 + Nick Ing-Simmons : > <[EMAIL PROTECTED]> wrote: : : > lib/unicore/To/Upper.pl includes a toupper mapping of ñ to Ñ properly. : : while you are getting attention to the : : unicore/To/Uppe

Re: Converting string to UTF-16LE

2004-03-02 Thread Robert Allerstorfer
> On Mon, 01 Mar 2004 20:55:14 + Nick Ing-Simmons > <[EMAIL PROTECTED]> wrote: > lib/unicore/To/Upper.pl includes a toupper mapping of ñ to Ñ properly. while you are getting attention to the unicore/To/Upper.pl file, you may also want to note that I have found a very nasty bug related to t

Re: Converting string to UTF-16LE

2004-03-02 Thread Larry Wall
On Mon, Mar 01, 2004 at 08:55:14PM +, Nick Ing-Simmons wrote: : Since you are here ;-) : : Why does ñ not uppercase to Ñ ? If I recall correctly, it's because the pumpking of the time thought that backward compatibility was more important than consistency, and gave the internal 8-bit represen
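
The behaviour asked about here can be reproduced with a short sketch. It assumes a perl without locale settings in effect and without the unicode_strings feature (the `no feature` line needs perl 5.12 or later and is only there to make that assumption explicit). Variable names are illustrative, not taken from the thread.

```perl
use strict;
use warnings;
no feature 'unicode_strings';   # assumption: keep the traditional 8-bit semantics

# The same character, U+00F1 (n with tilde), stored two ways.
my $native   = "\xF1";          # native 8-bit internal representation
my $upgraded = "\xF1";
utf8::upgrade($upgraded);       # same character, UTF-8 internal representation

# uc() leaves the native 8-bit string alone but maps the upgraded one.
printf "native:   %02X\n", ord(uc $native);     # prints F1
printf "upgraded: %02X\n", ord(uc $upgraded);   # prints D1
```

Decoding input with Encode (or reading it through an :encoding layer) yields the second kind of string, which is why case mapping tends to work once the data has been decoded.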

Re: Converting string to UTF-16LE

2004-03-02 Thread SADAHIRO Tomoyuki
On Mon, 01 Mar 2004 20:55:14 + Nick Ing-Simmons <[EMAIL PROTECTED]> wrote: > Larry Wall <[EMAIL PROTECTED]> writes: > >On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote: > >: For this example the search value will be "Ibañez". Because of the search > >: isn't case-sensitive, a

Re: Converting string to UTF-16LE

2004-03-01 Thread Nick Ing-Simmons
Larry Wall <[EMAIL PROTECTED]> writes: >On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote: >: For this example the search value will be "Ibañez". Because of the search >: isn't case-sensitive, all letters should be uppercased, using the uc method. > >I don't think this is your probl

Re: Converting string to UTF-16LE

2004-03-01 Thread Larry Wall
On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote: : For this example the search value will be "Ibañez". Because of the search : isn't case-sensitive, all letters should be uppercased, using the uc method. I don't think this is your problem, but in general I think it's better to ca

Re: Converting string to UTF-16LE

2004-03-01 Thread Sebastian Lehmann
Hello Nick, thanks a lot for your answer. When I ran your script (with the 'Ñ' in $sLine), the script works great. Motivated by this "victory" I modified my search script. The results were very strange. Using the lc method instead of uc works. Using the uc method only works if I placed the uc ca

Re: Converting string to UTF-16LE

2004-02-29 Thread John Delacour
At 12:43 am +0200 1/3/04, Jarkko Hietaniemi wrote: Maybe I'm missing something...? perl -le 'open(X, ">:encoding(ucs2be)", "ucs2be");print X chr(0x1234);close X' perl -le 'open(X, "<:encoding(ucs2be)", "ucs2be");printf "%x\n", ord(<X>)' No. It was me that was missing it :-)

Re: Converting string to UTF-16LE

2004-02-29 Thread Jarkko Hietaniemi
Maybe I'm missing something...? perl -le 'open(X, ">:encoding(ucs2be)", "ucs2be");print X chr(0x1234);close X' perl -le 'open(X, "<:encoding(ucs2be)", "ucs2be");printf "%x\n", ord(<X>)' -- Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special biologist word we use fo
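
The two one-liners above can be combined into a single script. This is a minimal sketch, assuming a temporary file (via File::Temp) in place of the file named "ucs2be", lexical filehandles in place of X, and the canonical encoding name UCS-2BE.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a throwaway temp file stands in for the "ucs2be" file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write one character, U+1234, through the encoding layer.
open(my $out, '>:encoding(UCS-2BE)', $path) or die "cannot write $path: $!";
print {$out} chr(0x1234);
close $out;

# Read it back through the same layer; the result is a character string.
open(my $in, '<:encoding(UCS-2BE)', $path) or die "cannot read $path: $!";
my $text = <$in>;
close $in;

printf "%x\n", ord($text);   # prints 1234
```

The file on disk holds two bytes (0x12 0x34); the layer does the conversion in both directions, so the script never handles the UCS-2 bytes itself.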

Re: Converting string to UTF-16LE

2004-02-29 Thread John Delacour
At 8:58 pm + 29/2/04, John Delacour wrote: Suppose that /tmp/iba.txt contains the text "ibañez" in UCS-2, preceded by the BOM, then this works here (Perl 5.8.3) use Encode qw/encode decode/; my $f_16 = qq~/tmp/iba.txt~; open F16, qq~$f_16~; my $ucs2 = <F16>; my $utf8 = decode("UCS-2BE", $ucs2)

Re: Converting string to UTF-16LE

2004-02-29 Thread John Delacour
At 6:19 pm +0100 25/2/04, Sebastian Lehmann wrote: Can anybody tell me how to work with UTF8 and UTF16 in the same script? Any help would be greatly appreciated. Suppose that /tmp/iba.txt contains the text "ibañez" in UCS-2, preceded by the BOM, then this works here (Perl 5.8.3) use Encode qw/e
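
A self-contained sketch of the same idea, working on an in-memory buffer rather than /tmp/iba.txt. Note one deliberate difference from the post: it decodes with the "UTF-16" encoding, which requires and consumes the BOM, instead of "UCS-2BE". The sample data and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed sample data: a big-endian BOM followed by "ibanez" with n-tilde.
my $raw = "\xFE\xFF" . encode('UTF-16BE', "iba\x{F1}ez");

# Decode to a Perl character string; the BOM selects the byte order
# and is removed from the result.
my $chars = decode('UTF-16', $raw);

# Case mapping happens on characters, not on encoded bytes.
my $upper = uc $chars;

# Re-encode explicitly as UTF-16LE (no BOM is written for this encoding).
my $le = encode('UTF-16LE', $upper);

print join(' ', map { sprintf 'U+%04X', ord } split //, $upper), "\n";
printf "%d bytes of UTF-16LE\n", length $le;
```

The general pattern is decode on input, operate on characters, encode on output; mixing UTF-8 and UTF-16 in one script then reduces to choosing the encoding name at each boundary.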

Re: Converting string to UTF-16LE

2004-02-26 Thread Nick Ing-Simmons
Sebastian Lehmann <[EMAIL PROTECTED]> writes: >Hello, > >i use a perl script to search different files. The search values are given >from a HTML page, the results are displayed on this page, too. The files are >saved in the UTF16LE format, therefore i will open them with the following >open command
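
The open command in the original post is cut off above, so the following is only a sketch of one way to approach the task described: read a UTF-16LE file through a PerlIO encoding layer and match a search value case-insensitively. The sample lines, the temp file, and the assumption that the search value arrives from the HTML form as UTF-8 bytes are all hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Assumption: build a small UTF-16LE sample file to search.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;
open(my $out, '>:encoding(UTF-16LE)', $path) or die "cannot write $path: $!";
print {$out} "Fender\n", "IBA\x{D1}EZ\n", "Gibson\n";
close $out;

# Assumption: the search value arrives as UTF-8 bytes; decode it first
# so that lc() sees characters rather than bytes.
my $needle = lc decode('UTF-8', "Iba\xC3\xB1ez");

my @hits;
open(my $in, '<:encoding(UTF-16LE)', $path) or die "cannot read $path: $!";
while (my $line = <$in>) {
    chomp $line;
    # Fold both sides the same way before comparing.
    push @hits, $line if index(lc $line, $needle) >= 0;
}
close $in;

printf "%d matching line(s)\n", scalar @hits;
```

Both the file contents and the search value are decoded before any case mapping, so the comparison is between character strings regardless of the encodings they arrived in.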