On Wed, Mar 03, 2004 at 09:15:00AM +0200, Jarkko Hietaniemi wrote:
: FWIW, for this particular "case" and for the Perl 5.10 and Perl 6
: I think the best way to handle these Unicode case foldings
: (CaseFolding.txt and SpecialCasing.txt) would most probably be to do the
: foldings in _Perl_ compile-ti
Anyway, sounds to me like someone has mixed Level 3 support into levels
1 and 2. If that's the case, I think it's a fundamental mistake.
Perl 5 should pick a level to default to, and stick with it. Going to other
levels should require explicit lexically-scoped declaration to minimize
magical ac
Sure, but if you let them, the Unicode Consortium will drive the
required minimum default Unicode support level up to about 42, and
then we won't need a new release of Windows to slow everyone down. :-)
What, you mean Perl 6 is not going to be also unified with XML and
called XERL? :-)
--
Jarkko
Offhand (and I'm just guessing here from the contents of the hashes),
somebody has overgeneralized somewhere, and applied language-specific
transformations when they're not desired, with the result that utf8
strings have to be prepared to change lengths at various times. And
changing string lengths
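One concrete case of such a length change: under SpecialCasing.txt, uppercasing U+00DF (sharp s) yields the two letters "SS". The sketch below shows it with an arbitrary sample word. The utf8::upgrade call is there so that Perl applies Unicode casing rules instead of its legacy 8-bit ones.

```perl
use strict;
use warnings;

# U+00DF LATIN SMALL LETTER SHARP S. utf8::upgrade only changes the
# internal representation; it makes uc() apply Unicode casing rules
# (including SpecialCasing) instead of the legacy 8-bit rules.
my $word = "stra\x{DF}e";
utf8::upgrade($word);

my $upper = uc $word;    # sharp s has no one-character uppercase: it becomes "SS"
printf "%d chars -> %s (%d chars)\n", length $word, $upper, length $upper;
# prints: 6 chars -> STRASSE (7 chars)
```

So any code that assumes uc() preserves length (fixed-size buffers, in-place case changes) has to cope with growth.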
On Tue, Mar 02, 2004 at 10:16:43PM +0200, Jarkko Hietaniemi wrote:
: ... and following the CaseFolding.txt is required in the Unicode regular
: expression guidelines (http://www.unicode.org/unicode/reports/tr18/),
: the "Default Loose Matches"
: (http://www.unicode.org/unicode/reports/tr18/
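Later Perls expose full case folding from CaseFolding.txt directly as fc() (Perl 5.16 and up, enabled by `use v5.16` or `use feature 'fc'`). With it, a caseless comparison does not have to go through uc() or lc(). Here is a small sketch; caseless_eq is a hypothetical helper name, not a core function.

```perl
use v5.16;      # turns on strict, say, fc and unicode_strings
use warnings;

# Hypothetical helper: caseless equality via full case folding (fc).
sub caseless_eq {
    my ($x, $y) = @_;
    return fc($x) eq fc($y);
}

say caseless_eq("Stra\x{DF}e", "STRASSE")     ? "match" : "no match";  # match
say caseless_eq("Iba\x{F1}ez", "IBA\x{D1}EZ") ? "match" : "no match";  # match
say caseless_eq("Iba\x{F1}ez", "IBANEZ")      ? "match" : "no match";  # no match
```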
I think I have now managed to shave off the speed hit of those special
casing tables quite well, and all tests still pass (brute-force removal
of the tables made some tests in op/lc and op/pat, and all of uni/*,
fall flat on their face). Now UTF-8 casing operations are "only" half
the speed of non-.
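If you want to see the gap on your own perl, here is a rough sketch using the core Benchmark module. The ratio depends on the perl version and the hardware, so no figures are quoted here. The sample text is arbitrary ASCII, and only its internal representation differs between the two cases.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The same ASCII text held once as a byte string and once as an
# upgraded (internally UTF-8) string; only the internal form differs.
my $bytes = "The quick brown fox jumps over the lazy dog. " x 100;
my $chars = $bytes;
utf8::upgrade($chars);

# Run each case for at least one CPU second and print a comparison table.
cmpthese(-1, {
    bytes => sub { my $u = uc $bytes },
    utf8  => sub { my $u = uc $chars },
});
```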
Larry Wall <[EMAIL PROTECTED]> writes:
On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
: For this example the search value will be "Ibañez". Because of the search
: isn't case-sensitive, all letters should be uppercased, using the uc method.
I don't think this is your problem,
If I recall correctly, the case tables were in response to the Unicode
CaseFolding table (lib/unicore/CaseFolding.txt), which does indeed define
language-independent foldings that are more complex than usual (mostly
caused by encoding irregularities in Unicode). Maybe just the placement of
thos
On Tue, Mar 02, 2004 at 05:25:21PM +0100, Robert Allerstorfer wrote:
: > On Mon, 01 Mar 2004 20:55:14 + Nick Ing-Simmons
: > <[EMAIL PROTECTED]> wrote:
:
: > lib/unicore/To/Upper.pl includes a toupper mapping of ñ to Ñ properly.
:
: while you are getting attention to the
:
: unicore/To/Uppe
> On Mon, 01 Mar 2004 20:55:14 + Nick Ing-Simmons
> <[EMAIL PROTECTED]> wrote:
> lib/unicore/To/Upper.pl includes a toupper mapping of ñ to Ñ properly.
while you are getting attention to the
unicore/To/Upper.pl
file, you may also want to note that I have found a very nasty bug
related to t
On Mon, Mar 01, 2004 at 08:55:14PM +, Nick Ing-Simmons wrote:
: Since you are here ;-)
:
: Why does ñ not uppercase to Ñ ?
If I recall correctly, it's because the pumpking of the time thought
that backward compatibility was more important than consistency,
and gave the internal 8-bit represen
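The effect is easy to reproduce. In this sketch, utf8::upgrade changes only how the same character is stored. Without `use locale` or the unicode_strings feature (Perl 5.12 and up), uc() on a byte string case-maps only ASCII, while an upgraded string gets Unicode rules.

```perl
use strict;
use warnings;

# U+00F1 LATIN SMALL LETTER N WITH TILDE, stored as a single byte.
my $byte_str = "\x{F1}";

# Same character, but stored internally as UTF-8.
my $utf8_str = $byte_str;
utf8::upgrade($utf8_str);

# Without 'use locale' or the unicode_strings feature, uc() on a byte
# string only case-maps ASCII; an upgraded string gets Unicode rules.
printf "byte string: U+%04X\n", ord uc $byte_str;   # U+00F1 (unchanged)
printf "upgraded:    U+%04X\n", ord uc $utf8_str;   # U+00D1
```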
On Mon, 01 Mar 2004 20:55:14 +
Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:
> Larry Wall <[EMAIL PROTECTED]> writes:
> >On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
> >: For this example the search value will be "Ibañez". Because of the search
> >: isn't case-sensitive, a
Larry Wall <[EMAIL PROTECTED]> writes:
>On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
>: For this example the search value will be "Ibañez". Because of the search
>: isn't case-sensitive, all letters should be uppercased, using the uc method.
>
>I don't think this is your probl
On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
: For this example the search value will be "Ibañez". Because of the search
: isn't case-sensitive, all letters should be uppercased, using the uc method.
I don't think this is your problem, but in general I think it's better
to ca
Hello Nick,
thanks a lot for your answer. When I ran your script (with the 'Ñ' in
$sLine), the script works great. Motivated by this "victory" I modified my
search script. The results were very strange.
Using the lc method instead of uc works. Using the uc method only works if I
placed the uc ca
At 12:43 am +0200 1/3/04, Jarkko Hietaniemi wrote:
Maybe I'm missing something...?
perl -le 'open(X, ">:encoding(ucs2be)", "ucs2be");print X chr(0x1234);close X'
perl -le 'open(X, "<:encoding(ucs2be)", "ucs2be");printf "%x\n", ord(<X>)'
No. It was me that was missing it :-)
Maybe I'm missing something...?
perl -le 'open(X, ">:encoding(ucs2be)", "ucs2be");print X chr(0x1234);close X'
perl -le 'open(X, "<:encoding(ucs2be)", "ucs2be");printf "%x\n", ord(<X>)'
--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special biologist word we use fo
At 8:58 pm + 29/2/04, John Delacour wrote:
Suppose that /tmp/iba.txt contains the text
"ibañez" in UCS-2, preceded by the BOM, then
this works here (Perl 5.8.3)
use Encode qw/encode decode/;
my $f_16 = qq~/tmp/iba.txt~;
open F16, qq~$f_16~;
my $ucs2 = <F16>;
my $utf8 = decode("UCS-2BE", $ucs2)
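The excerpt above is cut off. Here is a self-contained sketch of one way to read such a file. It is not John's actual script, whose remainder is not shown. It builds a stand-in for /tmp/iba.txt in a temporary file, slurps the raw bytes, and lets Encode's "UTF-16" decoder pick the byte order from the BOM. That decoder also drops the BOM, whereas decoding as UCS-2BE would keep it as a U+FEFF character.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

# Stand-in for /tmp/iba.txt: a big-endian BOM followed by "iba\x{F1}ez" in UCS-2BE.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "\xFE\xFF", encode('UCS-2BE', "iba\x{F1}ez");
close $fh;

# Slurp the raw bytes; no line splitting or encoding layer is involved.
open my $in, '<:raw', $path or die "open $path: $!";
my $raw = do { local $/; <$in> };
close $in;

# 'UTF-16' without BE/LE takes the byte order from the BOM and drops it;
# decode('UCS-2BE', $raw) would instead keep the BOM as a U+FEFF character.
my $text = decode('UTF-16', $raw);
print length($text), " characters: ", encode('UTF-8', uc $text), "\n";
# prints (on a UTF-8 terminal): 6 characters: IBAÑEZ
```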
At 6:19 pm +0100 25/2/04, Sebastian Lehmann wrote:
Can anybody tell me how to work with UTF8 and UTF16 in the same script? Any
help would be greatly appreciated.
Suppose that /tmp/iba.txt contains the text
"ibañez" in UCS-2, preceded by the BOM, then this
works here (Perl 5.8.3)
use Encode qw/e
Sebastian Lehmann <[EMAIL PROTECTED]> writes:
>Hello,
>
>i use a perl script to search different files. The search values are given
>from a HTML page, the results are displayed on this page, too. The files are
>saved in the UTF16LE format, therefore i will open them with the following
>open command
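Below is a minimal sketch of that kind of search. It assumes the files are UTF-16LE and that the search value has already been decoded to Perl characters. The file contents and the $needle value are made up for illustration. The idea is to decode while reading and let the /i match handle case, so neither side needs uc().

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Made-up sample data written as UTF-16LE, standing in for the real files.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} encode('UTF-16LE', "Garcia, Maria\nIBA\x{D1}EZ, Sebastian\n");
close $fh;

# Hypothetical search value, already decoded to characters.
my $needle = "Iba\x{F1}ez";
utf8::upgrade($needle);    # make sure Unicode rules apply to the pattern as well

# Decode while reading; each line arrives as a character string.
open my $in, '<:raw:encoding(UTF-16LE)', $path or die "open $path: $!";
my @hits;
while (my $line = <$in>) {
    chomp $line;
    # /i ignores case on the decoded characters, so no uc() is needed.
    push @hits, $line if $line =~ /\Q$needle\E/i;
}
close $in;

print scalar(@hits), " match(es)\n";    # prints: 1 match(es)
```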