Re: Interpretation of non-UTF8 strings

2004-08-24 Thread Jarkko Hietaniemi
if character codes are interpreted according to Unicode. It implies that when localized texts are taken from the system, they must be decoded from the locale encoding. If you really do have a Grand Plan of how to integrate locales and Unicode happily, congratulations. -- Jarkko Hietaniemi

Re: Interpretation of non-UTF8 strings

2004-08-22 Thread Jarkko Hietaniemi
the platforms: currently at least Linux, Win32, and Mac OS X have serious Unicode platform support. Being portable over the filesystems, CLIs, and APIs is no fun. -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead

Re: Weird behavior of encoding open pragmas

2004-08-17 Thread Jarkko Hietaniemi
(Sorry about linewrap, my MUA insists...) -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Jarkko Hietaniemi
/stdout/stderr, @ARGV, filenames) between the default locale encoding and Perl's internal encodings? I would really appreciate if people would run perluniintro, and perlrun/-C, but I have already given up the hope.

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Jarkko Hietaniemi
Marcin 'Qrczak' Kowalczyk wrote: In a message of Mon, 16-08-2004, 16:31 +0300, Jarkko Hietaniemi wrote: In summary, some parts of Perl treat non-UTF-8 scalars as ISO-8859-1, while others treat them as whatever is expected by default in files and filenames and commandline (the locale tells

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Jarkko Hietaniemi
Marcin 'Qrczak' Kowalczyk wrote: In a message of Mon, 16-08-2004, 16:54 +0300, Jarkko Hietaniemi wrote: The encoding pragma partially works. It doesn't influence the assumed encoding of files opened without specifying the encoding, nor the handling of filenames, and it needs to be told about

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Jarkko Hietaniemi
the dot, and use that value. (3) Use the value from either (1) or (2) and if Encode recognizes that, good. Otherwise give up. Or something like that. (It's documented in the open pragma, somewhere).

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Jarkko Hietaniemi
:-)

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Jarkko Hietaniemi
Jarkko Hietaniemi wrote: $ perl -e 'use open ":locale"; use encoding("latin2"); print chr(260), "\n"' $ perl -e 'use encoding("latin2"); use open ":locale"; print chr(260), "\n"' \x{12a9} does not map to iso-8859-2 at -e line 1. panic: sv_setpvn called with negative strlen at -e line 1. Which Perl

Re: BOM and principle of least surprise

2004-05-17 Thread Jarkko Hietaniemi
Erland Sommarskog wrote: Jarkko Hietaniemi ([EMAIL PROTECTED]) writes: Though I must say that personally I would avoid using BOM with UTF-8: there is little reason to use a byte order mark with UTF-8 since UTF-8 is byte order independent. True. But then there is this pesky little issue: do

Re: BOM and principle of least surprise

2004-05-16 Thread Jarkko Hietaniemi
Perl: UTF-16 doesn't parse as Perl, and the UTF-8 BOM doesn't parse as Perl.

Re: BOM and principle of least surprise

2004-05-14 Thread Jarkko Hietaniemi
Erland Sommarskog wrote: Jarkko Hietaniemi ([EMAIL PROTECTED]) writes: Nick Ing-Simmons wrote: This thread started as complaint that perl5 can't read a script saved as UCS-2/UTF-16 or whatever Windows uses. Uh, really? Perl 5.8+ should be able to do that, automatically. To be able

Re: BOM and principle of least surprise

2004-05-05 Thread Jarkko Hietaniemi
encoding for all inputs and outputs to be UTF8, unless it has We tried this with perl 5.8.0 and the feedback was overwhelmingly negative... if people do print chr 0xff they do expect one byte, not two. been converted and that conversion is somehow flagged.
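The "one byte, not two" expectation can be illustrated with a minimal sketch. It assumes a Perl with in-memory filehandles (5.8 or later) and no -C or PERL_UNICODE setting in effect; the helper name bytes_written is hypothetical and only serves the illustration.

```perl
use strict;
use warnings;

# Write chr(0xff) to an in-memory file opened with the given mode and
# return how many bytes ended up in the buffer.
# (bytes_written is a made-up helper name, not part of any module.)
sub bytes_written {
    my ($mode) = @_;
    my $buffer = '';
    open(my $fh, $mode, \$buffer) or die "cannot open in-memory file: $!";
    print {$fh} chr(0xff);
    close($fh) or die "cannot close in-memory file: $!";
    return length($buffer);
}

# Without a layer the character goes out as a single byte;
# with :utf8 it is written as a two-byte UTF-8 sequence.
printf "no layer: %d byte(s)\n", bytes_written('>');
printf ":utf8:    %d byte(s)\n", bytes_written('>:utf8');
```

Making the layer explicit per handle keeps the byte-oriented default intact for code that relies on it.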

Re: BOM and principle of least surprise

2004-05-04 Thread Jarkko Hietaniemi
it's always UTF8, and if you want it different, you must convert it yourself each time would save lots and lots (and lots) of problems with guessing. Predictability is good, yes?

Re: AL32UTF8

2004-05-02 Thread Jarkko Hietaniemi
form of encoding, that's CESU: http://www.unicode.org/reports/tr26/.) You'll need to do a conversion pass before you can mark it as UTF-8. I think an Encode translation table would be the best place to do this kind of mapping. Encode::CESU, anyone?

Re: AL32UTF8

2004-04-29 Thread Jarkko Hietaniemi
Tim Bunce wrote: Am I right in thinking that perl's internal utf8 representation represents surrogates as a single (4 byte) code point and not as two separate code points? Mmmh. Right and wrong... as a single code point, yes, since the real UTF-8 doesn't do surrogates which are only a UTF-16

Re: PERL_UNICODE environment variable

2004-04-06 Thread Jarkko Hietaniemi
Jonathan Warden wrote: As I understand it, the -CSD commandline option should add UTF8 to the PerlIO layers for all file streams. But it seems only to be applying it to STDIN and STDOUT, and not other streams. Anyone know what's going on? my $file = new IO::File($filename) or die

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-28 Thread Jarkko Hietaniemi
one must always either accept a good enough sorting, or one must customize more or less heavily. But according to Unicode default collation, 'Ä' is ordered as a modified 'A' and equal to 'A' at the primary level.
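The primary-level equality of 'Ä' and 'A' can be checked with a short sketch. It assumes the Unicode::Collate module with its default collation table is installed; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $plain  = "A";
my $umlaut = "\x{C4}";    # LATIN CAPITAL LETTER A WITH DIAERESIS

# level => 1 compares base letters only; the default collator also
# looks at accents (secondary) and case (tertiary) differences.
my $primary = Unicode::Collate->new(level => 1);
my $full    = Unicode::Collate->new;

print "equal at primary level: ",
    ($primary->eq($plain, $umlaut) ? "yes" : "no"), "\n";
print "full comparison (cmp):  ", $full->cmp($plain, $umlaut), "\n";
```

A language that sorts 'Ä' as a separate letter would still need the customization mentioned in the message; this sketch only shows the default behavior.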

Re: BOM and principle of least surprise

2004-03-24 Thread Jarkko Hietaniemi
I said the principle of least surprise, because having read Perluniintro my impression was that I should really have to care in which format the string was in. You should not need to care *once* the data has been read into Perl. Before that, in the input phase, Perl needs your help. If you

Re: Converting string to UTF-16LE

2004-02-29 Thread Jarkko Hietaniemi
Maybe I'm missing something...? perl -le 'open(X, ">:encoding(ucs2be)", "ucs2be");print X chr(0x1234);close X' perl -le 'open(X, "<:encoding(ucs2be)", "ucs2be");printf "%x\n", ord(<X>)'
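The same write-then-read round trip can be sketched without touching the filesystem. This is an illustration under the assumption that in-memory filehandles and the :encoding layer are available (Perl 5.8 or later); the canonical encoding name UCS-2BE is used instead of the alias in the one-liners.

```perl
use strict;
use warnings;

my $buffer = '';

# Write one character through an :encoding layer into an in-memory file.
open(my $out, '>:encoding(UCS-2BE)', \$buffer) or die "open for write: $!";
print {$out} chr(0x1234);
close($out) or die "close: $!";

# The buffer now holds the big-endian two-byte form of U+1234.
my $hex = unpack('H*', $buffer);

# Read it back through the same layer and recover the code point.
open(my $in, '<:encoding(UCS-2BE)', \$buffer) or die "open for read: $!";
my $char = <$in>;
close($in);

printf "bytes: %s, code point: %x\n", $hex, ord($char);
```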

Re: Perl 5.8.0 utf8/regex bug

2004-02-21 Thread Jarkko Hietaniemi
fixed in Mac OS X 10.3.2's 5.8.1-RC3 plus change, and is still fixed in 5.8.2 and 5.8.3 I have compiled for myself). So yes, it was a bug in Perl, not Encode (which makes sense, Encode doesn't diddle with regex matching as such).

Re: Info required - Wide API calls in Win32 Perl = 5.8.2

2004-02-19 Thread Jarkko Hietaniemi
On Feb 20, 2004, at 1.16, Peter NESWAL wrote: First, thanks to all on the fast response to my questions. Jan Dubois wrote: On Thu, 19 Feb 2004 22:03:14 +0200, Jarkko Hietaniemi [EMAIL PROTECTED] wrote: After switching from perl 5.8.0 build 806 to perl 5.8.2 build 808 I found that the ability

Re: How to convert base64 string to utf-8

2004-02-02 Thread Jarkko Hietaniemi
Unfortunately, you will be out of luck for the somewhat common case of UTF-7 (unless it is available in Encode by now). UTF-7 has been supported since Encode 1.95, perl 5.8.1 had Encode 1.98, and perl is now at 5.8.3 with Encode 1.99.
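As a small sketch of the UTF-7 support mentioned here, the following decodes the sample string from RFC 2152. It assumes an Encode version that includes the UTF-7 encoding; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample from RFC 2152: "Hi Mom -<WHITE SMILING FACE>-!"
my $utf7 = 'Hi Mom -+Jjo--!';
my $text = decode('UTF-7', $utf7);

# The base64 run "+Jjo-" decodes to the single character U+263A.
printf "length: %d, character 8: U+%04X\n",
    length($text), ord(substr($text, 8, 1));
```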

Re: Keeping byte-wise processing as an option

2004-01-03 Thread Jarkko Hietaniemi
in the last sentence. Did you want to say I don't know?. Yes, something like that. Yes, that seems to do the job. But is this available in 5.0 or earlier? Certainly not. It's not even available (does not come standard) before 5.6, and IIRC use came in at 5.0.

Re: Keeping byte-wise processing as an option

2004-01-03 Thread Jarkko Hietaniemi
4?)

Re: removing accents

2004-01-03 Thread Jarkko Hietaniemi
OVERLAY which is to be removed. Also, although they are not accents, it's unclear (and quite language-dependent) what should be done with ligatures.
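One common approach to accent removal, sketched here as an assumption and not necessarily the one discussed in the thread, is to decompose with Unicode::Normalize and drop nonspacing marks. The helper name strip_marks is hypothetical. The second call shows the ligature caveat: a character with no decomposition is left untouched.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Decompose to NFD, then delete nonspacing marks (general category Mn).
# (strip_marks is a made-up helper name for this illustration.)
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

my $plain    = strip_marks("r\x{E9}sum\x{E9}");   # e with acute, twice
my $ligature = strip_marks("\x{E6}");             # LATIN SMALL LETTER AE

print "$plain\n";
printf "ligature still U+%04X\n", ord($ligature);
```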

Re: \W and [\W]

2004-01-02 Thread Jarkko Hietaniemi
the legacy brain not to fire, other tests involving characters in that range started failing.

Re: Keeping byte-wise processing as an option

2004-01-02 Thread Jarkko Hietaniemi
unfinished as regards to Unicode. No, it won't get fixed. Beyond 5.8, I don't. Some people may have some tricks they use to get Unicode code working both in 5.6 and 5.8, but _in_principle_ the bytes pragma should tell Perl in both 5.6 and 5.8 that I want bytes, darn it.

Re: perlunicode comment - when Unicode does not happen

2003-12-25 Thread Jarkko Hietaniemi
no variable in it :-) Just environment. Jungshik

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
much anything about Win32 APIs, Unicode or not, so I cannot be of much help.

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
.

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
and unicode would mean all the same (UTF-8). As for 'normalization', I have to think more about it. And so on.. I've been just thinking aloud so that you have to bear with some incoherency. I think incoherency is the key word in this area.

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
. In other words: the common data is the CLDR data.

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
in Perl is new - and currently it is dead in the water for things like -d - so why not just fix it. Common practice in how do I detect what charset+encoding the user now wants to use for their filenames and what funky charset+encoding filesystems there are out there, I guess. -- Jarkko Hietaniemi

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
process that runs in my laptop creating and seeing by using just one locale setting, well, that would be a neat trick.

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
the ICU stuff going into P6 - hope it will be relatively easy to use though - I hate the APIs! I don't think the plan is to have ICU APIs as such available (at least in the first approximation), but instead leverage from ICU's low level Unicode machinery.

Re: perlunicode comment - when Unicode does not happen

2003-12-22 Thread Jarkko Hietaniemi
of system calls (like in Windows). Again, I think the right way to do what you want is to create a set of (operating system dependent) modules (some may require XS) that introduce the necessary filesystem-related (mkdir etc) variants (or overrides, if one wants those).

Re: perlunicode comment - when Unicode does not happen

2003-12-22 Thread Jarkko Hietaniemi
give it quite a bit of thought during the Perl 5.8.0 process and I am pretty certain of my reasoning.

Re: Unicode::Collate question

2003-12-04 Thread Jarkko Hietaniemi
to be improving slowly. It seems that they are working on separating the locale data from the ICU, calling the data CLDR: http://oss.software.ibm.com/cvs/icu/~checkout~/locale/CLDR_status.html

Re: Unicode::Collate question

2003-12-01 Thread Jarkko Hietaniemi
you can find from search.cpan.org with sort, for example Sort::ArbBiLex might be useful.

Re: Unicode::Collate question

2003-12-01 Thread Jarkko Hietaniemi
/ Mimer_SQL_Engine_DocSet/Mimer_Concepts14.html Thanks, -- Eric Cholet

Re: 5.8.1 perlre man page: [:punct:] vs. \p{IsPunct}

2003-11-02 Thread Jarkko Hietaniemi
say [:punct:] on non-Unicode data, you are doing an _operating_ _system_ _dependent_ AND _locale_ _dependent_ operation. [:punct:] and \p{Punct} are (supposed to be) equivalent with Unicode data. In Perl 5.8.1, they are _not_ equivalent, as the following snippet will demonstrate:

Re: Bidirectional (bidi) Support?

2003-10-26 Thread Jarkko Hietaniemi
reason that often comes up with Unicode is that they want to have a 1:1 round-tripping from any legacy encoding to Unicode and back. So if some existing old encoding had the Arabic presentation forms, Unicode had to have them.

nice article about Unicode

2003-10-14 Thread Jarkko Hietaniemi
http://www.joelonsoftware.com/articles/Unicode.html

Re: Mixing Unicode and Byte output on a Unicode enabled Perl 5.8.0

2003-10-10 Thread Jarkko Hietaniemi
confuses things.

Re: bytes::substr() ?

2003-09-03 Thread Jarkko Hietaniemi
Perl 5.8.1, whenever that happens, will have bytes::substr().
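A minimal sketch of the difference between character-level and byte-level views of one string, assuming a Perl that ships bytes::substr() (5.8.1 or later). Loading the pragma as `use bytes ()` makes the bytes:: functions callable without switching the surrounding code to byte semantics.

```perl
use strict;
use warnings;
use bytes ();    # load bytes:: functions only; no byte semantics in this scope

my $smiley = "\x{263A}";    # one character, three bytes in Perl's internal UTF-8

my $char_length = length($smiley);
my $byte_length = bytes::length($smiley);
my $first_byte  = unpack('H*', bytes::substr($smiley, 0, 1));

print "characters: $char_length, bytes: $byte_length, first byte: $first_byte\n";
```

The byte view exposes Perl's internal representation, so it is best kept to debugging and low-level work.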

Re: UTF-8 case conversion

2003-09-03 Thread Jarkko Hietaniemi
say use utf8; at the top of your script and write your script, including the string literals, in UTF-8. Thanks! Sigge

Re: UTF-8 case conversion

2003-09-03 Thread Jarkko Hietaniemi
:-)

Re: UTF-8 case conversion

2003-09-03 Thread Jarkko Hietaniemi
CAVEAT: The following operations look the same but are not quite so; from_to($data, ïso-8859-1, ütf8); #1 $data = decode(ïso-8859-1, $data); #2 Ooops. My GNU Emacs iso-accents-mode tried to be too helpful, here...
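The caveat quoted from the Encode documentation can be made concrete with a short sketch using the intended encoding names (iso-8859-1 and utf8, without the stray accents). It assumes a standard Encode installation; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(from_to decode);

my $latin1 = "\xE4";    # a-diaeresis as a single ISO-8859-1 byte

# 1: from_to() converts in place and leaves UTF-8 *octets* behind.
my $octets = $latin1;
from_to($octets, 'iso-8859-1', 'utf8');

# 2: decode() returns a *character* string.
my $chars = decode('iso-8859-1', $latin1);

printf "from_to: length %d, bytes %s, utf8 flag %s\n",
    length($octets), unpack('H*', $octets),
    (utf8::is_utf8($octets) ? 'on' : 'off');
printf "decode:  length %d, code point U+%04X, utf8 flag %s\n",
    length($chars), ord($chars),
    (utf8::is_utf8($chars) ? 'on' : 'off');
```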

Re: Endless loop with illegal UTF-8 in Encode.pm

2003-08-29 Thread Jarkko Hietaniemi
? My Encode.pm is Version 1.75. It's part of the perl debian package 5.8.0-19 (debian unstable). I wish Encode.pm would handle invalid UTF-8 in a graceful manner... Cheers, -Sven

Re: When regex dot doesn't work on unicode characters

2003-07-03 Thread Jarkko Hietaniemi
s/(.)/: $1 :/; # HAS NO EFFECT (left side did not match?!) s/(..)(A)/$1: $2 :/; # HAS NO EFFECT (left side did not match?!) Is this something that will be fixed in 5.8.1? Yes. It has already been fixed.

Re: utf8_heavy noise

2003-06-22 Thread Jarkko Hietaniemi
and it is better to initialize $bits. (But is utf8_heavy.pl not ported for 64-bits?) Thanks, your patch has now been applied. I haven't seen such warnings in 64-bit platforms, but then again, beyond BMP hasn't been tested much.

Re: [FYI] Intersection and Removal of Character Class

2003-06-14 Thread Jarkko Hietaniemi
Excellent! I added a mention of this module to the perlunicode pod.

Re: Unicode 4.0 vs. Perl 5.8.x ?

2003-06-11 Thread Jarkko Hietaniemi
.

Re: How do I find out which encoding a filehandle is using

2003-05-30 Thread Jarkko Hietaniemi
In Perl 5.8.1 there will be an API, PerlIO::get_layers() that will return the list of layer names on a filehandle.
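A minimal sketch of PerlIO::get_layers(), assuming Perl 5.8.1 or later. It uses in-memory filehandles so that it is self-contained; the helper name layers_for is hypothetical, and the exact layer names printed depend on the platform and Perl build.

```perl
use strict;
use warnings;

# Open an in-memory file for reading with the given mode string and
# return the names of the PerlIO layers on the resulting handle.
# (layers_for is a made-up helper name for this illustration.)
sub layers_for {
    my ($mode) = @_;
    my $buffer = "data\n";
    open(my $fh, $mode, \$buffer) or die "cannot open in-memory file: $!";
    my @layers = PerlIO::get_layers($fh);
    close($fh);
    return @layers;
}

print "plain:    ", join(' ', layers_for('<')), "\n";
print "encoding: ", join(' ', layers_for('<:encoding(UTF-8)')), "\n";
```

Checking for an entry that starts with "encoding(" is one way to find out whether a handle already decodes its input.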

Re: UTF8 flag unsets in inheritted methods

2003-03-14 Thread Jarkko Hietaniemi
?

Re: Looping with Unicode

2003-03-07 Thread Jarkko Hietaniemi
*think* a..z followed by all pairings of \w characters in the a..chr(0x100) range could be an answer, but I'm not certain.

Re: Looping with Unicode

2003-03-06 Thread Jarkko Hietaniemi
the quoteless style myself. Again, did I forget to set something? (yes I did try use utf8;). thanks! /Daniel

Re: Odd regexp behavior

2003-02-26 Thread Jarkko Hietaniemi
. Also note that $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");' 1 so \x{2019} *does* match \S in principle ... odd. (Perl v5.6.0 built for i386-linux) Markus -- Markus Kuhn, Computer Lab, Univ of Cambridge, GB http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

Re: [PATCH] viscii.ucm

2003-02-16 Thread Jarkko Hietaniemi
Since the original VISCII data came (I think) from the Unicode web site, maybe this discrepancy should be reported even more officially?

Re: [PATCH] viscii.ucm

2003-02-16 Thread Jarkko Hietaniemi
...or it could come from the Tcl/Tk mapping tables?

Re: [PATCH] viscii.ucm

2003-02-16 Thread Jarkko Hietaniemi
P.S. Is ftp.funet.fi still down? I think I am ready to $Encode::VERSION++ with this patch applied. Unfortunately so. Or, FUNET is up, it's just that the updates from PAUSE to FUNET are stopped until they are certain everything is fixed. (See use.perl.org.)

Re: Understanding Unicode support in Perl

2003-01-26 Thread Jarkko Hietaniemi
is gone since it caused too many problems for existing code.

Re: CGI and UTF

2003-01-21 Thread Jarkko Hietaniemi
-C:1 / -C:0 it is. (The :part being optional.)

Re: CGI and UTF

2003-01-18 Thread Jarkko Hietaniemi
that the above do not change the fact that if a *programmer* wants their code to be UTF-8 aware, they need to think about the evil binmode().

Re: CGI and UTF

2003-01-05 Thread Jarkko Hietaniemi
your page. I have to think more about this, though, not to make the checking at the point of reading for example unreasonably slow. And I'll be rather Internet connectivity challenged in the coming weeks, so please be patient.

Re: CGI and UTF

2003-01-05 Thread Jarkko Hietaniemi
to Larry Wall :-) I'm trying to get an opinion about this from him, and I just logged a problem ticket about this issue. --ewh

Re: Encode utf-16 problem

2002-12-04 Thread Jarkko Hietaniemi
; ./perl -e 'print pack("v*", 0xFEFF, unpack("C*", "test\n"))' >! utf16 $ hex utf16 ff fe 74 00 65 00 73 00 74 00 0a 00 ..t.e.s.t... $ ./perl -Ilib -we 'open(FH, "<:encoding(utf16)", "utf16");print <FH>' test $
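The BOM handling shown in this session can be sketched without temporary files. This assumes a Perl with in-memory filehandles and the :encoding layer (5.8 or later); the byte string is built the same way as in the one-liner, as a little-endian BOM followed by the text.

```perl
use strict;
use warnings;

# Little-endian UTF-16 with a leading BOM: ff fe 74 00 65 00 ...
my $bytes = pack('v*', 0xFEFF, unpack('C*', "test\n"));

# The generic UTF-16 encoding reads the BOM to decide the byte order.
open(my $fh, '<:encoding(UTF-16)', \$bytes) or die "cannot open: $!";
my $line = <$fh>;
close($fh);

printf "raw bytes: %s\n", unpack('H*', $bytes);
print "decoded:   $line";
```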

Re: Unicode Perl Dependencies

2002-11-12 Thread Jarkko Hietaniemi
/ directory? If so, what happened? preferential. Otherwise if someone has some information and/or advice on how to track down which parts we do need, I'd be quite appreciative. Thanks very much Karl Matthias

Re: tr/// and use encoding

2002-10-03 Thread Jarkko Hietaniemi
your source code is other text encodings than UTF-8. But tr/// does not embrace this magic. Clarification: tr/A-E/P-T/ (the ranges) does not embrace that magic. tr/ABCDE/PQRST/ does work with the encoding pragma since that employs string literals.

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
be overcome by simple eval qq{} as illustrated. This much idiom would not hurt much, at least not as much as the Cookbook sample

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
the message from mutt, I do not see any eight-bit characters in the saved file...

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
(Hi, it's me again...) Are you doing character ranges in the tr/// under 'use encoding'? (I'm asking because I see a - in the middle of what I assume is mangled EUC-JP)

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
of tr/// is very limited. I think you want s///e.

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
On Wed, Oct 02, 2002 at 10:44:06PM +0900, Dan Kogai wrote: On Wednesday, Oct 2, 2002, at 22:34 Asia/Tokyo, Jarkko Hietaniemi wrote: Yes. that's where hiragana - katakana conversion is attempted; English equivalent of tr/A-Z/a-z/. Okay... What are the {begin,end} codepoints of those ranges
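For illustration, a hiragana-to-katakana tr/// can be written with explicit code points, which sidesteps any dependence on the source encoding. The range endpoints below (U+3041..U+3096 and U+30A1..U+30F6) are an assumption taken from the Unicode code charts, not from this thread, and the helper name is hypothetical.

```perl
use strict;
use warnings;

# Map each hiragana letter to the katakana letter 0x60 code points above it.
# Range endpoints are assumed from the Unicode charts.
# (hiragana_to_katakana is a made-up helper name.)
sub hiragana_to_katakana {
    my ($text) = @_;
    $text =~ tr/\x{3041}-\x{3096}/\x{30A1}-\x{30F6}/;
    return $text;
}

my $hiragana = "\x{3072}\x{3089}\x{304C}\x{306A}";    # hi-ra-ga-na
my $katakana = hiragana_to_katakana($hiragana);

printf "U+%04X ", ord($_) for split //, $katakana;
print "\n";
```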

Re: Encode::compat (was Re: Encode functionality for Perl 5.6.1)

2002-09-21 Thread Jarkko Hietaniemi
rather trivially with the: s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg; s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
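The two substitutions can be sketched as a pair of helper functions with the shift and mask operators written out. This is a minimal illustration that assumes plain byte strings as input (no UTF8 flag set); the function names are hypothetical.

```perl
use strict;
use warnings;

# Latin-1 byte string -> UTF-8 byte string: every byte 0x80..0xFF becomes
# a two-byte sequence (lead byte 0xC2 or 0xC3, then a continuation byte).
sub latin1_to_utf8 {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])/chr(0xC0 | ord($1) >> 6) . chr(0x80 | ord($1) & 0x3F)/eg;
    return $bytes;
}

# The reverse: fold each 0xC2/0xC3-led pair back into a single byte.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    $bytes =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1) << 6 & 0xC0 | ord($2) & 0x3F)/eg;
    return $bytes;
}

my $latin1 = "caf\xE9 \xA9";    # e-acute and the copyright sign
my $utf8   = latin1_to_utf8($latin1);

printf "latin1: %s\n", unpack('H*', $latin1);
printf "utf8:   %s\n", unpack('H*', $utf8);
printf "back:   %s\n", unpack('H*', utf8_to_latin1($utf8));
```

The reverse direction only handles code points up to U+00FF; anything beyond Latin-1 passes through unchanged.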

Re: Is \p{EastAsianFullwidth} worth implementing?

2002-09-19 Thread Jarkko Hietaniemi
? User-defined Unicode character properties (character classes), are publicly documented in perlunicode.pod. Are there any hidden drawbacks or other problems with this idea? Thanks, /Autrijus/
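A minimal sketch of a user-defined property in the style documented in perlunicode: a subroutine whose name starts with "In" returns lines of tab-separated hexadecimal range endpoints. The property name InKana and the ranges are illustrative, not part of the proposal in this thread.

```perl
use strict;
use warnings;

# User-defined property: each line is "start<TAB>end" in hexadecimal.
# Here: the Hiragana and Katakana blocks.
sub InKana {
    return "3040\t309F\n30A0\t30FF\n";
}

my $kana  = "\x{3042}\x{30AB}";    # HIRAGANA A, KATAKANA KA
my $latin = "abc";

print "kana matches:  ", ($kana  =~ /^\p{InKana}+$/ ? "yes" : "no"), "\n";
print "latin matches: ", ($latin =~ /^\p{InKana}+$/ ? "yes" : "no"), "\n";
```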

Re: Is \p{EastAsianFullwidth} worth implementing?

2002-09-19 Thread Jarkko Hietaniemi
properties (character classes), are publicly documented in perlunicode.pod. I'll be happy to oblige and make a U::EAW with that, then. If it is found to work fine, we can certainly merge that later back into the core (maybe when Unicode 3.2.1 comes out).

Re: Is \p{EastAsianFullwidth} worth implementing?

2002-09-19 Thread Jarkko Hietaniemi
basically a user cannot define 'Is' properties on his/her own, because the canonized form does not match the specified $type:

[FYI] locales resources and issues

2002-09-18 Thread Jarkko Hietaniemi
Found this nice resource, I especially like the list of issues (which is much longer than the list of advantages...) http://www.i18nguy.com/locales/locale-resources.html http://www.i18nguy.com/locales/index.html

Re: 2 Suprises w/5.8.0

2002-08-01 Thread Jarkko Hietaniemi
Excellent, thanks Andreas. I see a cookbook like this as a patch for perluniintro/perlunicode for 5.8.1. -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen

Re: 2 Suprises w/5.8.0

2002-07-31 Thread Jarkko Hietaniemi
Only this combination got 'split' in myFunction to chop up utf-8 text properly. Is this behavior expected? Without seeing more detail, yes. Raw embedded UTF-8 has to be marked as UTF-8 somehow, and use utf8 is the primary way. Another issue I've encountered was with using Unicode::String
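Why raw UTF-8 has to be marked before split treats it as characters can be sketched with utf8::decode(), which is one way of doing the marking at run time (use utf8 does it for literals in the source). The sample bytes are an assumption for illustration.

```perl
use strict;
use warnings;

# Two octets that form the UTF-8 encoding of U+00E4 (a-diaeresis).
my $octets = "\xC3\xA4";

# Unmarked, Perl sees two separate one-byte characters.
my @before = split //, $octets;

# utf8::decode() converts in place and marks the string as characters.
my $chars = $octets;
utf8::decode($chars) or die "not valid UTF-8";
my @after = split //, $chars;

printf "before: %d piece(s), after: %d piece(s), U+%04X\n",
    scalar(@before), scalar(@after), ord($after[0]);
```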

Re: translating the Perl 5.8.0 announcement to CJK

2002-07-18 Thread Jarkko Hietaniemi
Final round of proofreadings, if I may: http://www.iki.fi/jhi/pl580.txt.big5.tw http://www.iki.fi/jhi/pl580.txt.euc.cn http://www.iki.fi/jhi/pl580.txt.euc.jp http://www.iki.fi/jhi/pl580.txt.euc.kr

Re: [PATCH] Encode::FB_QUIET

2002-07-17 Thread Jarkko Hietaniemi
On Wed, Jul 17, 2002 at 06:25:01PM +0900, Tatsuhiko Miyagawa wrote: here's a tiny doc patch for Encode: Thanks, applied. -- Tatsuhiko Miyagawa [EMAIL PROTECTED] --- Encode.pm~ Sun Jun 2 03:08:01 2002 +++ Encode.pm Wed Jul 17 18:24:02 2002 @@ -552,7 +552,7 @@ while(defined(read

Re: translating the Perl 5.8.0 announcement to CJK

2002-07-17 Thread Jarkko Hietaniemi
On Thu, Jul 18, 2002 at 11:51:11AM +0900, Dan Kogai wrote: On Thursday, July 18, 2002, at 02:31 AM, Jarkko Hietaniemi wrote: Notice the name changes. I also edited away the DJGPP broken entry since that's now fixed. I notice that the VMS section in the .jp one is untranslated

Re: translating the Perl 5.8.0 announcement to CJK

2002-07-17 Thread Jarkko Hietaniemi
On Thu, Jul 18, 2002 at 12:34:22PM +0800, Autrijus Tang wrote: On Thu, Jul 18, 2002 at 07:21:58AM +0300, Jarkko Hietaniemi wrote: Thus fixed. I've also corrected linefeeds so it looks better via web browsers. Get the newer version one via http://www.dan.co.jp/~dankogai/bleedperl

Re: Performance and interface of Encode(3pm) in perl 5.8.0-RC1

2002-07-10 Thread Jarkko Hietaniemi
There has been some talk of Encode possibly caching internally fast paths for small (like one eight-bit cset to another) conversions. But it was decided that we better get it first working (and out of the door with 5.8.0) and only then try to make it faster. I won't comment more on the OO vs

Re: Questions about Unicode Support in 5.6.1

2002-07-07 Thread Jarkko Hietaniemi
Do you mean that the Unicode Standard has moved on from what it was when 5.6.1 was released? Yes. What version of Unicode did 5.6.1 support? 3.0.1. (The same as Perl 5.6.0 supported.) What version of Unicode will 5.8.0 support? 3.2.0. 3. In which areas is Perl 5.6.1's Unicode

Re: Another Unicode s/// buglet?

2002-06-26 Thread Jarkko Hietaniemi
Groan. It doesn't seem that people have been stress testing Unicode s/// so far that much. But anyway, another fix attached, along with the previous one.

Re: Another Unicode s/// buglet?

2002-06-26 Thread Jarkko Hietaniemi
Mopping up. Change 17362 by jhi@alpha on 2002/06/26 15:25:45 Let's not leak. Affected files ... //depot/perl/pp_hot.c#283 edit Differences ... //depot/perl/pp_hot.c#283 (text) Index: perl/pp_hot.c --- perl/pp_hot.c#282~17358~Wed Jun 26 17:37:12 2002 +++

Re: Another Unicode s/// buglet?

2002-06-26 Thread Jarkko Hietaniemi
On Wed, Jun 26, 2002 at 05:43:07PM +0100, Hugo van der Sanden wrote: SADAHIRO Tomoyuki [EMAIL PROTECTED] wrote: :With Perl 5.8.0 RC2 (or plus Change 17353), :there is something strange. : :In $unicode =~ s/$regex/$bytes/, :$bytes is not upgraded, :and a malformed Unicode string is

Re: Another Unicode s/// buglet?

2002-06-26 Thread Jarkko Hietaniemi
On Wed, Jun 26, 2002 at 05:52:25PM +0100, Hugo van der Sanden wrote: I wrote: :Attached patch passes all existing tests here, as well as some new ones. Whoops, crossed in the post. My patch was written against 17356; it may not be necessary after #17358, but the extra tests might be worth

Re: Another Unicode s/// buglet?

2002-06-26 Thread Jarkko Hietaniemi
Also, does your version need the additional SvSetMagicSV(nsv, dstr) Probably. I'll try whether it disturbs anything, I guess no since we'd need some magical Unicode string? Doesn't seem to hurt, but before I have a test case that exercises that piece of code, I'd rather not add

FYI: Unicode Keyboards for Mac OS

2002-06-07 Thread Jarkko Hietaniemi
http://wordherd.com/keyboards/

Re: ICU and Parrot

2002-06-01 Thread Jarkko Hietaniemi
On Sun, Jun 02, 2002 at 02:22:32AM +0900, Dan Kogai wrote: On Saturday, June 1, 2002, at 04:37 AM, Autrijus Tang wrote: Understood. In a related note: http://www.li18nux.org/docs/html/CodesetAliasTable-V10.html has spurred quite a bit of discussion in Taiwan because of the mandated

[jhi@iki.fi: 5.8.0 RC1 has been released]

2002-06-01 Thread Jarkko Hietaniemi
Since the two big pieces of Perl 5.8.0 news are much better Unicode support and much better threads support, here is the announcement. Enjoy. - Forwarded message from Jarkko Hietaniemi [EMAIL PROTECTED] - =head1 Perl 5.8.0 Release Candidate 1 The Perl 5 developer team is pleased to announce

Re: ICU and Parrot

2002-05-30 Thread Jarkko Hietaniemi
On Fri, May 31, 2002 at 06:18:55AM +0900, Dan Kogai wrote: On Friday, May 31, 2002, at 06:06 AM, George Rhoten wrote: Hopefully you take the implicit information in the UCM files and put that into encode implementation too. For instance, in gb18030 there are whole ranges of Unicode

Re: README.cjk?

2002-05-06 Thread Jarkko Hietaniemi
On Mon, May 06, 2002 at 09:01:58PM -0400, Jungshik Shin wrote: On Tue, 7 May 2002, Dan Kogai wrote: Hi Dan, pumpking is calling for (hopefully) the last chance to update README.cjk. On Tuesday, May 7, 2002, at 02:48 , Jarkko Hietaniemi wrote: Do I have the latest versions

Re: Change 16302: Provide the \N{U+HHHH} syntax before we forget.

2002-05-02 Thread Jarkko Hietaniemi
On Thu, May 02, 2002 at 08:01:34AM +0200, Philip Newton wrote: On Wed, 1 May 2002 07:00:05 -0700, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote: Change 16302 by jhi@alpha on 2002/05/01 12:54:24 Provide the \N{U+HHHH} syntax before we forget. Do we also want to support U-HH? I
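A short sketch of the \N{U+HHHH} syntax named in this change, assuming a Perl that supports it (5.8.0 or later). It shows that the named form, the \x{} escape, and chr() all produce the same character.

```perl
use strict;
use warnings;

my $by_name = "\N{U+263A}";    # WHITE SMILING FACE via the U+ notation
my $by_hex  = "\x{263A}";
my $by_chr  = chr(0x263A);

printf "U+%04X, all equal: %s\n",
    ord($by_name),
    ($by_name eq $by_hex && $by_hex eq $by_chr ? "yes" : "no");
```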

Re: [Encode] 1.66 Released

2002-05-01 Thread Jarkko Hietaniemi
On Wed, May 01, 2002 at 02:58:13PM +0900, Dan Kogai wrote: My fever is down at last when I released Encode-1.66, available as follows; Whole: http://www.dan.co.jp/~dankogai/Encode-1.66.tar.gz or CPAN Diff against current: 264 lines
