if
character codes are interpreted according to Unicode. It implies
that when localized texts are taken from the system, they must be
decoded from the locale encoding.
If you really do have a Grand Plan of how to integrate locales and
Unicode happily, congratulations.
--
Jarkko Hietaniemi
the platforms: currently at least Linux, Win32, and Mac OS
X have serious Unicode platform support. Being portable over the
filesystems, CLIs, and APIs is no fun.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'. It is 'dead
(Sorry about linewrap, my MUA insists...)
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
/stdout/stderr,
@ARGV, filenames) between the default locale encoding and Perl's
internal encodings?
I would really appreciate if people would run perluniintro, and
perlrun/-C, but I have already given up the hope.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
Marcin 'Qrczak' Kowalczyk wrote:
In a message of Mon, 16-08-2004, at 16:31 +0300, Jarkko Hietaniemi
wrote:
In summary, some parts of Perl treat non-UTF-8 scalars as ISO-8859-1,
while others treat it as whatever is expected by default in files and
filenames and commandline (the locale tells
Marcin 'Qrczak' Kowalczyk wrote:
In a message of Mon, 16-08-2004, at 16:54 +0300, Jarkko Hietaniemi
wrote:
The encoding pragma partially works. It doesn't influence assumed
encoding of files opened without specifying the encoding, nor handling
of filenames, and it needs to be told about
the dot, and use that value.
(3) Use the value from either (1) or (2) and if Encode recognizes that,
good. Otherwise give up.
Or something like that. (It's documented in the open pragma, somewhere).
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
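A minimal sketch of the fallback steps listed above. The snippet is truncated, so step (1) is assumed here to be a query through I18N::Langinfo, and step (2) is assumed to read whatever follows the dot in LC_ALL, LC_CTYPE or LANG. The helper name guess_locale_encoding is hypothetical and is not part of the open pragma.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns an Encode encoding name, or undef ("give up").
# Pass a list like (LANG => 'fi_FI.ISO-8859-1') to test step (2) in isolation.
sub guess_locale_encoding {
    my (%env) = @_;
    my $codeset;

    # (1) Assumed first step: ask the C library, when no test environment
    #     was supplied and I18N::Langinfo works on this platform.
    if (!%env) {
        %env = %ENV;
        $codeset = eval {
            require I18N::Langinfo;
            I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());
        };
    }

    # (2) Otherwise take whatever follows the dot in the locale variables.
    if (!defined $codeset || $codeset eq '') {
        for my $var (qw(LC_ALL LC_CTYPE LANG)) {
            my $value = $env{$var};
            next unless defined $value;
            if ($value =~ /\.([^@]+)/) { $codeset = $1; last; }
        }
    }

    # (3) Accept the value only if Encode recognizes it.
    return undef unless defined $codeset && $codeset ne '';
    my $enc = Encode::find_encoding($codeset);
    return $enc ? $enc->name : undef;
}
```

Called with no arguments the helper consults the real environment; the explicit-list form exists only so the parsing can be checked without touching %ENV.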
:-)
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
Jarkko Hietaniemi wrote:
$ perl -e 'use open ":locale"; use encoding("latin2"); print chr(260), "\n"'
$ perl -e 'use encoding("latin2"); use open ":locale"; print chr(260), "\n"'
\x{12a9} does not map to iso-8859-2 at -e line 1.
panic: sv_setpvn called with negative strlen at -e line 1.
Which Perl
Erland Sommarskog wrote:
Jarkko Hietaniemi ([EMAIL PROTECTED]) writes:
Though I must say that personally I would avoid using BOM with UTF-8:
there is little reason to use a byte order mark with UTF-8 since UTF-8
is byte order independent.
True. But then there is this pesky little issue: do
Perl: UTF-16 doesn't parse as Perl, and the UTF-8 BOM doesn't
parse as Perl.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
Erland Sommarskog wrote:
Jarkko Hietaniemi ([EMAIL PROTECTED]) writes:
Nick Ing-Simmons wrote:
This thread started as complaint that perl5 can't read a
script saved as UCS-2/UTF-16 or whatever Windows uses.
Uh, really? Perl 5.8+ should be able to do that, automatically.
To be able
encoding for all inputs and outputs to be UTF8, unless it has
We tried this with perl 5.8.0 and the feedback was overwhelmingly
negative... if people do print chr 0xff they do expect one byte,
not two.
been converted and that conversion is somehow flagged.
--
Jarkko Hietaniemi [EMAIL PROTECTED
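The "one byte, not two" expectation can be illustrated with in-memory filehandles. This is only a sketch of the behaviour under discussion, assuming a perl with PerlIO in-memory files (5.8 or later); the variable names are made up for the example.

```perl
use strict;
use warnings;

my $char = chr(0xff);

# No layer: the character U+00FF goes out as the single byte 0xFF.
open(my $raw, '>', \my $raw_buf) or die "open: $!";
print {$raw} $char;
close $raw;

# With :utf8 the same character is written as a two-byte UTF-8 sequence.
open(my $utf8, '>:utf8', \my $utf8_buf) or die "open: $!";
print {$utf8} $char;
close $utf8;

printf "raw: %d byte(s), :utf8: %d byte(s)\n",
    length($raw_buf), length($utf8_buf);
```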
it's always UTF8, and if you want it different, you must
convert it yourself each time would save lots and lots (and lots) of
problems with guessing. Predictability is good, yes?
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
form of encoding, that's
CESU: http://www.unicode.org/reports/tr26/.)
You'll need to do a conversion pass before you can mark it as UTF-8.
I think an Encode translation table would be the best place to do this
kind of mapping. Encode::CESU, anyone?
--
Jarkko Hietaniemi [EMAIL PROTECTED] http
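A rough sketch of what such a conversion pass might look like, done by hand rather than through an Encode translation table. The function name cesu8_to_string is hypothetical; it assumes the input is a byte string in which each supplementary character is stored as a surrogate pair of two three-byte sequences, as TR26 describes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: CESU-8 octets in, Perl character string out.
sub cesu8_to_string {
    my ($bytes) = @_;

    # A high surrogate is ED A0..AF xx, a low surrogate is ED B0..BF xx.
    # Recombine each pair into one code point and re-encode it as real UTF-8.
    $bytes =~ s{\xED([\xA0-\xAF])([\x80-\xBF])\xED([\xB0-\xBF])([\x80-\xBF])}{
        my $hi = 0xD800 | ((ord($1) & 0x0F) << 6) | (ord($2) & 0x3F);
        my $lo = 0xDC00 | ((ord($3) & 0x0F) << 6) | (ord($4) & 0x3F);
        my $cp = 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
        encode('UTF-8', chr($cp));
    }ge;

    # Only now is it safe to treat the octets as UTF-8.
    return decode('UTF-8', $bytes);
}

my $string = cesu8_to_string("A\xED\xA0\x80\xED\xB0\x80");
printf "U+%04X\n", ord(substr($string, 1, 1));
```

Unpaired surrogates are left alone by the substitution and are then subject to whatever decode() does with malformed input.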
Tim Bunce wrote:
Am I right in thinking that perl's internal utf8 representation
represents surrogates as a single (4 byte) code point and not as
two separate code points?
Mmmh. Right and wrong... as a single code point, yes, since the real
UTF-8 doesn't do surrogates which are only a UTF-16
Jonathan Warden wrote:
As I understand it, the -CSD commandline option should add UTF8 to the
PerlIO layers for all file streams. But it seems only to be applying it to
STDIN and STDOUT, and not other streams.
Anyone know what's going on?
my $file = new IO::File($filename) or die
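One way to sidestep the question is to set the layer explicitly on each handle instead of relying on -C. A minimal sketch, assuming a temporary file and the strict :encoding(UTF-8) layer rather than :utf8; as far as perlrun describes it, the -C I/O defaults only apply to open() calls in the scope of the program file itself, which would explain why handles opened inside IO::File are not affected.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

my (undef, $filename) = tempfile(UNLINK => 1);

# Write: open normally, then push the encoding layer on this handle.
my $out = IO::File->new($filename, 'w') or die "Cannot write $filename: $!";
$out->binmode(':encoding(UTF-8)');
$out->print("\x{263A}\n");
$out->close;

# Read: same again, so the octets are decoded back into characters.
my $in = IO::File->new($filename, 'r') or die "Cannot read $filename: $!";
$in->binmode(':encoding(UTF-8)');
my $line = $in->getline;
$in->close;

printf "read back %d character(s)\n", length($line);
```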
one must always either accept a good enough sorting, or one must
customize more or less heavily.
But according to Unicode default collation, 'Ä' is ordered
as a modified 'A' and equal to 'A' at the primary level.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
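The primary-level equality can be checked with Unicode::Collate, which ships with Perl 5.8 and later. A small sketch using the default collation table with no tailoring:

```perl
use strict;
use warnings;
use Unicode::Collate;

my $A_umlaut = "\x{C4}";    # LATIN CAPITAL LETTER A WITH DIAERESIS

# level => 1 compares primary weights only (base letters).
my $primary = Unicode::Collate->new(level => 1);
# The default collator also looks at the accent and case levels.
my $full    = Unicode::Collate->new;

my $equal_at_primary = $primary->eq($A_umlaut, 'A') ? 1 : 0;
my $order_at_full    = $full->cmp('A', $A_umlaut);

print "equal at primary level: $equal_at_primary\n";
print "full comparison A vs A-umlaut: $order_at_full\n";
```

Languages that sort this letter elsewhere (Finnish and Swedish put it after Z) need a tailored collator, which is the "customize more or less heavily" case.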
I said the principle of least surprise, because having read Perluniintro
my impression was that I should really have to care in which format the
string was in.
You should not need to care *once* the data has been read into Perl.
Before that, in the input phase, Perl needs your help.
If you
Maybe I'm missing something...?
perl -le 'open(X, ">:encoding(ucs2be)", "ucs2be");print X chr(0x1234);close X'
perl -le 'open(X, "<:encoding(ucs2be)", "ucs2be");printf "%x\n", ord(<X>)'
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
fixed in Mac OS X 10.3.2's 5.8.1-RC3 plus change, and is still fixed in
5.8.2 and 5.8.3 I have compiled for myself). So yes, it was a bug in Perl,
not Encode (which makes sense, Encode doesn't diddle with regex matching as
such).
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
On Feb 20, 2004, at 1.16, Peter NESWAL wrote:
First, thanks to all on the fast response to my questions.
Jan Dubois wrote:
On Thu, 19 Feb 2004 22:03:14 +0200, Jarkko Hietaniemi [EMAIL PROTECTED]
wrote:
After switching from perl 5.8.0 build 806 to perl 5.8.2 build 808
I found that the ability
Unfortunately, you will be out of luck for the somewhat common case of
UTF-7 (unless it is available in Encode by now).
UTF-7 has been supported since Encode 1.95, perl 5.8.1 had Encode 1.98,
and perl is now at 5.8.3 with Encode 1.99.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
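With a recent enough Encode, a UTF-7 round trip looks like the following. A minimal sketch; the sample string is arbitrary and the exact octets produced are left unstated on purpose.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "A\x{2262}\x{0391}.";       # A, NOT IDENTICAL TO, GREEK ALPHA, full stop

my $utf7 = encode('UTF-7', $text);     # seven-bit octets
my $back = decode('UTF-7', $utf7);     # characters again

print "UTF-7 form: $utf7\n";
print "round trip ", ($back eq $text ? "ok" : "FAILED"), "\n";
```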
in the last sentence. Did you
want to say "I don't know"?
Yes, something like that.
Yes, that seems to do the job. But is this available in 5.0 or
earlier?
Certainly not. It's not even available (does not come standard) before
5.6, and IIRC use came in at 5.0.
--
Jarkko Hietaniemi [EMAIL PROTECTED
4?)
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
OVERLAY which is to be removed.
Also, although they are not accents, it's unclear (and quite
language-dependent) what should be done with ligatures.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
the legacy brain not to fire, other
tests involving characters in that range started failing.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
unfinished as
regards to Unicode. No, it won't get fixed. Beyond 5.8, I don't.
Some people may have some tricks they use to get Unicode code working
both in 5.6 and 5.8, but _in_principle_ the bytes pragma should tell Perl in
both 5.6 and 5.8 that I want bytes, darn it.
--
Jarkko Hietaniemi [EMAIL
no variable in it :-)
Just environment.
Jungshik
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
much anything about Win32 APIs, Unicode or
not, so I cannot be of much help.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
and unicode
would mean all the same (UTF-8). As for 'normalization', I have to
think more about it. And so on.. I've been just thinking aloud so that
you have to bear with some incoherency.
I think incoherency is the key word in this area.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi
.
In other words: the common data is the CLDR data.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
in Perl is new - and currently it is dead in the
water for things like -d - so why not just fix it.
Common practice in how do I detect what charset+encoding
the user now wants to use for their filenames and what funky
charset+encoding filesystems there are out there, I guess.
--
Jarkko Hietaniemi
process that runs in my laptop creating
and seeing by using just one locale setting, well, that would be a neat
trick.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
the ICU stuff going into P6 - hope it will be relatively
easy to use though - I hate the APIs!
I don't think the plan is to have ICU APIs as such available (at least
in the first approximation), but instead leverage from ICU's low level
Unicode machinery.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http
of system calls (like in Windows).
Again, I think the right way to do what you want is to create a set
of (operating system dependent) modules (some may require XS) that
introduce the necessary filesystem-related (mkdir etc) variants (or
overrides, if one wants those).
--
Jarkko Hietaniemi [EMAIL
give it quite a bit of thought
during the Perl 5.8.0 process and I am pretty certain of my reasoning.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
to be improving slowly.
It seems that they are working on separating the locale data from
the ICU, calling the data CLDR:
http://oss.software.ibm.com/cvs/icu/~checkout~/locale/CLDR_status.html
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
you can find from search.cpan.org with sort, for example Sort::ArbBiLex
might be useful.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
/
Mimer_SQL_Engine_DocSet/Mimer_Concepts14.html
Thanks,
--
Eric Cholet
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
say :punct: on a non-Unicode data, you are doing _operating_
_system_ _dependent_ AND _locale_ _dependent_ operation. :punct: and
\p{Punct} are (supposed to be) equivalent with Unicode data.
in Perl 5.8.1, they are _not_ equivalent, as the following snippet will
demonstrate:
--
Jarkko
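The original demonstration is cut off in the archive. As a stand-in, here is a sketch of how the two classes can be compared on a current perl, where (as far as perlrecharclass documents it) the POSIX class also matches a handful of ASCII symbols such as "$" and "+" that \p{Punct} does not. The helper name punct_classes is made up for the example, and this says nothing about what 5.8.1 did.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (matches [[:punct:]], matches \p{Punct}).
sub punct_classes {
    my ($char) = @_;
    return (
        ($char =~ /[[:punct:]]/ ? 1 : 0),
        ($char =~ /\p{Punct}/   ? 1 : 0),
    );
}

for my $char ('$', '+', '!', "\x{2019}") {
    my ($posix, $unicode) = punct_classes($char);
    printf "U+%04X  [[:punct:]]=%d  \\p{Punct}=%d\n",
        ord($char), $posix, $unicode;
}
```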
reason that often comes up with Unicode is that they want to
have a 1:1 round-tripping from any legacy encoding to Unicode and back.
So if some existing old encoding had the Arabic presentation forms,
Unicode had to have them.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
http://www.joelonsoftware.com/articles/Unicode.html
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
confuses things.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
Perl 5.8.1, whenever that happens, will have bytes::substr().
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
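For readers on a perl that has it, bytes::substr() and bytes::length() look at the internal octets of a string rather than its characters. A small sketch, assuming a non-EBCDIC platform where the internal encoding is UTF-8; the bytes documentation itself recommends these mainly for debugging.

```perl
use strict;
use warnings;
use bytes ();    # load the bytes:: functions without enabling byte semantics

my $smiley = "\x{263A}";                         # one character

my $chars  = length($smiley);                    # counts characters
my $octets = bytes::length($smiley);             # counts internal octets
my $first  = bytes::substr($smiley, 0, 1);       # first internal octet

printf "%d character, %d octets, first octet 0x%02X\n",
    $chars, $octets, ord($first);
```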
say "use utf8;" at
the top of your script and write your script, including the string
literals, in UTF-8.
Thanks!
Sigge
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
:-)
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
B<CAVEAT>: The following operations look the same but are not quite so;
from_to($data, ïso-8859-1, ütf8); #1
$data = decode(ïso-8859-1, $data); #2
Ooops. My GNU Emacs iso-accents-mode tried to be too helpful, here...
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
?
My Encode.pm is Version 1.75. It's part of the perl debian package
5.8.0-19 (debian unstable).
I wish Encode.pm would handle invalid UTF-8 in a graceful manner...
Cheers,
-Sven
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
s/(.)/: $1 :/; # HAS NO EFFECT (left side did not match?!)
s/(..)(A)/$1: $2 :/; # HAS NO EFFECT (left side did not match?!)
Is this something that will be fixed in 5.8.1?
Yes. It has already been fixed.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
and it is better to initialize $bits.
(But is utf8_heavy.pl not ported for 64-bits?)
Thanks, your patch has been now applied. I haven't seen such
warnings in 64-bit platforms, but then again, beyond BMP hasn't been
tested much.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
Excellent! I added a mention of this module to the perlunicode pod.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
In Perl 5.8.1 there will be an API, PerlIO::get_layers() that will
return the list of layer names on a filehandle.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
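A short sketch of PerlIO::get_layers() in use, assuming a perl that provides it. An in-memory file stands in for a real one, so the first layer reported will differ from what a disk file shows.

```perl
use strict;
use warnings;

my $octets = "caf\xC3\xA9\n";

# Open an in-memory file for reading with an explicit encoding layer.
open(my $fh, '<:encoding(UTF-8)', \$octets) or die "open: $!";

# Ask PerlIO which layers are stacked on the handle.
my @layers = PerlIO::get_layers($fh);
print "layers: @layers\n";

my $has_encoding = grep { /^encoding\(/ } @layers;
print "an encoding layer is ", ($has_encoding ? "present" : "absent"), "\n";
close $fh;
```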
?
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
*think* a..z followed by all pairings of \w characters in the
a..chr(0x100) range could be an answer, but I'm not certain.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
the
quoteless style myself. Again, did I forget to set something?
(yes I did try use utf8;).
thanks!
/Daniel
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
.
Also note that
$ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
1
so \x{2019} *does* match \S in principle ... odd.
(Perl v5.6.0 built for i386-linux)
Markus
--
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__
--
Jarkko Hietaniemi
Since the original VISCII data came (I think) from the Unicode web
site, maybe this discrepancy should be reported even more officially?
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
...or it could come from the Tcl/Tk mapping tables?
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
P.S. Is ftp.funet.fi still down? I think I am ready to
$Encode::VERSION++ with this patch applied.
Unfortunately so. Or, FUNET is up, it's just that the updates
from PAUSE to FUNET are stopped until they are certain everything
is fixed. (See use.perl.org.)
--
Jarkko Hietaniemi [EMAIL
is gone since it caused too many problems
for existing code.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
-C:1 / -C:0 it is. (The :part being optional.)
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
that the above do not change the fact that if a *programmer* wants
their code to be UTF-8 aware, they need to think about the evil binmode().
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
your page. I have to think more about this,
though, not to make the checking at the point of reading for example
unreasonably slow. And I'll be rather Internet connectivity
challenged in the coming weeks, so please be patient.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
to Larry Wall :-)
I'm trying to get an opinion about this from him, and I just logged
a problem ticket about this issue.
--ewh
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
$ ./perl -e 'print pack("v*", 0xFEFF, unpack("C*", "test\n"))' > utf16
$ hex utf16
ff fe 74 00 65 00 73 00 74 00 0a 00 ..t.e.s.t...
$ ./perl -Ilib -we 'open(FH, "<:encoding(utf16)", "utf16");print <FH>'
test
$
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
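The same experiment can be done in memory with Encode, without a scratch file. A minimal sketch: the pack() call builds little-endian UTF-16 with a leading BOM, and decoding as plain "UTF-16" lets Encode pick the byte order from that BOM.

```perl
use strict;
use warnings;
use Encode qw(decode);

# BOM (U+FEFF) followed by "test\n", each as a 16-bit little-endian unit.
my $utf16 = pack('v*', 0xFEFF, unpack('C*', "test\n"));

# "UTF-16" (no BE/LE suffix) consumes the BOM and uses it to choose the order.
my $text = decode('UTF-16', $utf16);

printf "%d octets in, %d characters out\n", length($utf16), length($text);
print $text;
```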
/ directory? If so, what happened?
preferential. Otherwise if someone has some information and/or advice on
how to track down which parts we do need, I'd be quite appreciative.
Thanks very much
Karl Matthias
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
your source code is other text encodings than UTF-8. But tr/// does
not embrace this magic.
Clarification: tr/A-E/P-T/ (the ranges) does not embrace that magic.
tr/ABCDE/PQRST/ does work with the encoding pragma since that employs
string literals.
--
Jarkko Hietaniemi [EMAIL PROTECTED
be overcome by simple eval qq{} as illustrated. This
much idiom would not hurt much, at least not as much as the Cookbook sample
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
the message from mutt, I do not see any eight-bit
characters in the saved file...
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
(Hi, it's me again...)
Are you doing character ranges in the tr/// under 'use encoding'?
(I'm asking because I see a - in the middle of what I assume is
mangled EUC-JP)
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
of tr/// is very limited. I think you want s///e.
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
On Wed, Oct 02, 2002 at 10:44:06PM +0900, Dan Kogai wrote:
On Wednesday, Oct 2, 2002, at 22:34 Asia/Tokyo, Jarkko Hietaniemi wrote:
Yes. that's where hiragana - katakana conversion is attempted;
English equivalent of tr/A-Z/a-z/.
Okay... What are the {begin,end} codepoints of those ranges
rather trivially with the:
s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
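For illustration, here are the two substitutions (Latin-1 to UTF-8 and back, written with the shift and mask operators spelled out) applied to a sample byte string. The sample word is arbitrary; both strings are plain octets throughout, with no Encode or PerlIO layers involved.

```perl
use strict;
use warnings;

my $latin1 = "caf\xE9";                     # ISO 8859-1 octets

# Latin-1 -> UTF-8: every high octet becomes a two-octet sequence.
(my $utf8 = $latin1)
    =~ s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;

# UTF-8 -> Latin-1: fold each C2/C3 lead octet plus trail octet back into one.
(my $back = $utf8)
    =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;

printf "latin1=%s utf8=%s back=%s\n",
    map { unpack('H*', $_) } $latin1, $utf8, $back;
```

Only lead octets C2 and C3 are handled on the way back because those are the only ones that can encode code points in the 0x80..0xFF range.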
? User-defined
Unicode character properties (character classes), are publicly
documented in perlunicode.pod.
Are there any hidden drawbacks or other problems with this idea?
Thanks,
/Autrijus/
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
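For reference, a user-defined property in the style perlunicode documents: a sub whose name starts with "In" or "Is" returns a list of code point ranges, one "start TAB end" pair of hex numbers per line. The ranges below (the Hiragana and Katakana blocks) are just an example and have nothing to do with East Asian Width.

```perl
use strict;
use warnings;

# User-defined property: usable as \p{InKana} in the package that defines it.
sub InKana {
    return "3040\t309F\n"     # Hiragana block
         . "30A0\t30FF\n";    # Katakana block
}

my $hiragana_a = "\x{3042}";

my @results = (
    ($hiragana_a =~ /\p{InKana}/ ? 1 : 0),
    ('a'         =~ /\p{InKana}/ ? 1 : 0),
);
print "hiragana: $results[0], latin: $results[1]\n";
```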
properties (character classes), are publicly
documented in perlunicode.pod.
I'll be happy to oblige and make a U::EAW with that, then.
If it is found to work fine, we can certainly merge that later back
into the core (maybe when Unicode 3.2.1 comes out).
--
Jarkko Hietaniemi [EMAIL PROTECTED
basically a user cannot define 'Is' properties on his/her own,
because the canonized form does not match the specified $type:
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
Found this nice resource, I especially like the list of issues
(which is much longer than the list of advantages...)
http://www.i18nguy.com/locales/locale-resources.html
http://www.i18nguy.com/locales/index.html
--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi
Excellent, thanks Andreas. I see a cookbook like this as a patch for
perluniintro/perlunicode for 5.8.1.
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen
Only this combination got 'split' in myFunction to chop up utf-8
text properly. Is this behavior expected?
Without seeing more detail, yes. Raw embedded UTF-8 has to
be marked as UTF-8 somehow, and use utf8 is the primary way.
Another issue I've encountered was with using Unicode::String
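The effect of marking the data can be seen without any literal non-ASCII text in the source. A minimal sketch, assuming the octets are valid UTF-8: before utf8::decode() the string is six separate octets, afterwards it is two characters, and split behaves accordingly.

```perl
use strict;
use warnings;

my $raw = "\xE6\x97\xA5\xE6\x9C\xAC";     # UTF-8 octets for U+65E5 U+672C

# Unmarked: Perl sees six one-octet "characters".
my @octets = split //, $raw;

# Marked as UTF-8: the same data is now two characters.
my $text = $raw;
utf8::decode($text) or die "not valid UTF-8";
my @chars = split //, $text;

printf "%d pieces before, %d pieces after\n", scalar(@octets), scalar(@chars);
```

"use utf8" does the equivalent for string literals in the source file; data read at run time still has to be decoded, either like this or through an I/O layer.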
Final round of proofreadings, if I may:
http://www.iki.fi/jhi/pl580.txt.big5.tw
http://www.iki.fi/jhi/pl580.txt.euc.cn
http://www.iki.fi/jhi/pl580.txt.euc.jp
http://www.iki.fi/jhi/pl580.txt.euc.kr
--
$jhi++; # http://www.iki.fi/jhi/
On Wed, Jul 17, 2002 at 06:25:01PM +0900, Tatsuhiko Miyagawa wrote:
here's a tiny doc patch for Encode:
Thanks, applied.
--
Tatsuhiko Miyagawa [EMAIL PROTECTED]
--- Encode.pm~ Sun Jun 2 03:08:01 2002
+++ Encode.pm Wed Jul 17 18:24:02 2002
@@ -552,7 +552,7 @@
while(defined(read
On Thu, Jul 18, 2002 at 11:51:11AM +0900, Dan Kogai wrote:
On Thursday, July 18, 2002, at 02:31 AM, Jarkko Hietaniemi wrote:
Notice the name changes. I also edited away the DJGPP broken entry
since that's now fixed. I notice that the VMS section in the .jp one
is untranslated
On Thu, Jul 18, 2002 at 12:34:22PM +0800, Autrijus Tang wrote:
On Thu, Jul 18, 2002 at 07:21:58AM +0300, Jarkko Hietaniemi wrote:
Thus fixed. I've also corrected linefeeds so it looks better via web
browsers. Get the newer version one via
http://www.dan.co.jp/~dankogai/bleedperl
There has been some talk of Encode possibly caching internally
fast paths for small (like one eight-bit cset to another)
conversions. But it was decided that we better get it first
working (and out of the door with 5.8.0) and only then try to
make it faster.
I won't comment more on the OO vs
Do you mean that the Unicode Standard has moved on from what it was when
5.6.1 was released?
Yes.
What version of Unicode did 5.6.1 support?
3.0.1. (The same as Perl 5.6.0 supported.)
What version of Unicode will 5.8.0 support?
3.2.0.
3. In which areas is Perl 5.6.1's Unicode
Groan. It doesn't seem that people have been stress testing Unicode
s/// so far that much. But anyway, another fix attached, along with
the previous one.
--
$jhi++; # http://www.iki.fi/jhi/
Mopping up.
Change 17362 by jhi@alpha on 2002/06/26 15:25:45
Let's not leak.
Affected files ...
//depot/perl/pp_hot.c#283 edit
Differences ...
//depot/perl/pp_hot.c#283 (text)
Index: perl/pp_hot.c
--- perl/pp_hot.c#282~17358~Wed Jun 26 17:37:12 2002
+++
On Wed, Jun 26, 2002 at 05:43:07PM +0100, Hugo van der Sanden wrote:
SADAHIRO Tomoyuki [EMAIL PROTECTED] wrote:
:With Perl 5.8.0 RC2 (or plus Change 17353),
:there is something strange.
:
:In $unicode =~ s/$regex/$bytes/,
:$bytes is not upgraded,
:and a malformed Unicode string is
On Wed, Jun 26, 2002 at 05:52:25PM +0100, Hugo van der Sanden wrote:
I wrote:
:Attached patch passes all existing tests here, as well as some new ones.
Whoops, crossed in the post. My patch was written against 17356;
it may not be necessary after #17358, but the extra tests might be
worth
Also, does your version need the additional
SvSetMagicSV(nsv, dstr)
Probably. I'll try whether it disturbs anything, I guess no since
we'd need some magical Unicode string?
Doesn't seem to hurt, but before I have a test case that exercises
that piece of code, I'd rather not add
http://wordherd.com/keyboards/
--
$jhi++; # http://www.iki.fi/jhi/
On Sun, Jun 02, 2002 at 02:22:32AM +0900, Dan Kogai wrote:
On Saturday, June 1, 2002, at 04:37 AM, Autrijus Tang wrote:
Understood.
In a related note:
http://www.li18nux.org/docs/html/CodesetAliasTable-V10.html
has spurred quite a bit of discussion in Taiwan because of the mandated
Since the two big Perl 5.8.0 news are much better Unicode support
and much better threads support, here is the announcement. Enjoy.
- Forwarded message from Jarkko Hietaniemi [EMAIL PROTECTED] -
=head1 Perl 5.8.0 Release Candidate 1
The Perl 5 developer team is pleased to announce
On Fri, May 31, 2002 at 06:18:55AM +0900, Dan Kogai wrote:
On Friday, May 31, 2002, at 06:06 AM, George Rhoten wrote:
Hopefully you take the implicit information in the UCM files and put
that into encode implementation too. For instance, in gb18030 there are
whole ranges of Unicode
On Mon, May 06, 2002 at 09:01:58PM -0400, Jungshik Shin wrote:
On Tue, 7 May 2002, Dan Kogai wrote:
Hi Dan,
pumpking is calling for (hopefully) the last chance to update
README.cjk.
On Tuesday, May 7, 2002, at 02:48 , Jarkko Hietaniemi wrote:
Do I have the latest versions
On Thu, May 02, 2002 at 08:01:34AM +0200, Philip Newton wrote:
On Wed, 1 May 2002 07:00:05 -0700, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote:
Change 16302 by jhi@alpha on 2002/05/01 12:54:24
Provide the \N{U+} syntax before we forget.
Do we also want to support U-HH? I
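As it ended up in released perls, the \N{U+...} form names a character by its hexadecimal code point inside a double-quoted string. A short sketch comparing it with chr() and with a character name; the smiley is an arbitrary choice.

```perl
use strict;
use warnings;
use charnames ':full';

my $by_number = "\N{U+263A}";                 # code point in hex
my $by_chr    = chr(0x263A);                  # same thing computed at run time
my $by_name   = "\N{WHITE SMILING FACE}";     # same character by name

print "all three agree\n"
    if $by_number eq $by_chr && $by_chr eq $by_name;
```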
On Wed, May 01, 2002 at 02:58:13PM +0900, Dan Kogai wrote:
My fever is down at last when I released Encode-1.66, available as
follows;
Whole:
http://www.dan.co.jp/~dankogai/Encode-1.66.tar.gz or CPAN
Diff against current: 264 lines