Re: UTF-8 encoding & decoding

2016-05-14 Thread Aristotle Pagaltzis
* Pali Rohár [2016-05-12 20:23]: > If both functions should do same thing, why we have duplicity? Encode.pm is big and fairly slow, because it handles a zillion encodings and has lots of options for handling invalid input data. Perl needs only UTF-8 transcoding and needs it fast, so it has code f

Re: UTF-8 encoding & decoding

2016-05-12 Thread Pali Rohár
On Friday 06 May 2016 09:24:01 Karl Williamson wrote: > On 05/05/2016 08:37 AM, Pali Rohár wrote: > >Hi! > > > >I thought that I understood UTF-8 encoding/decoding done in perl until I > >looked into source code of Encode package... (exactly sub encode_utf8) > > >

Re: UTF-8 encoding & decoding

2016-05-06 Thread Karl Williamson
On 05/05/2016 08:37 AM, Pali Rohár wrote: Hi! I thought that I understood UTF-8 encoding/decoding as done in perl until I looked into the source code of the Encode package... (specifically sub encode_utf8) Before... I had only read the description of the Encode package (not the source code): https://metacpan.org/pod/Encode

Re: UTF-8 encoding & decoding

2016-05-06 Thread Aristotle Pagaltzis
* Pali Rohár [2016-05-06 14:50]: > 1. What is difference between those two calls? > > utf8::encode($str); > > and > > $str = Encode::encode('utf8', $str); > > 2. What is difference between those? > > utf8::decode($str); > $str = Encode::decode_utf8($str); They do the same thing with different
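
The equivalence claimed in this reply can be checked with a small script. This is only a sketch, assuming a stock Perl 5 with the core Encode module; the variable names are illustrative and do not come from the thread. `utf8::encode` rewrites its argument in place, while `Encode::encode` returns the bytes as a new string.

```perl
use strict;
use warnings;
use Encode ();

# Two characters: U+00E5 and U+6211 (illustrative sample data).
my $text = "\x{E5}\x{6211}";

# In-place variant: the scalar itself becomes a string of UTF-8 bytes.
my $in_place = $text;
utf8::encode($in_place);

# Functional variant: returns the bytes; with no CHECK argument
# the source string is left untouched.
my $returned = Encode::encode('utf8', $text);

print $in_place eq $returned ? "same bytes\n" : "different bytes\n";
printf "%d characters -> %d bytes\n", length($text), length($in_place);
```

For well-formed input like this the two results are byte-for-byte identical; the differences discussed in the thread concern speed and the handling of invalid data.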

UTF-8 encoding & decoding

2016-05-06 Thread Pali Rohár
Hi! I thought that I understood UTF-8 encoding/decoding as done in perl until I looked into the source code of the Encode package... (specifically sub encode_utf8) Before... I had only read the description of the Encode package (not the source code): https://metacpan.org/pod/Encode#UTF-8-vs.-utf8-vs.-UTF8 I tried to find some

Re: Choice of BOM for UTF-16 encoding

2014-02-09 Thread Aristotle Pagaltzis
* Geoffrey Leach [2014-02-10 07:35]: > Is there a way to force (from my module) the choice to be LE? It turns > out that the library I'm supporting (taglib) works in LE. Does it need a BOM prepended? If not, just do the obvious and `encode('UTF-16LE', $str)`. C.f. `perldoc Encode::Unicode`. Re
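
A short sketch of the suggestion above, assuming the core Encode module and reusing the sample string from the original question. Whether taglib actually wants a BOM is not settled in the thread, so the BOM-prefixed variant is shown only as an option.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $str = "\x{6211}\x{7684}";

# Plain "UTF-16" prepends a BOM and writes big-endian code units.
my $with_bom = encode('UTF-16', $str);

# "UTF-16LE" fixes the byte order and writes no BOM at all.
my $le = encode('UTF-16LE', $str);

# If the consumer does need a little-endian BOM, prepend it by hand.
my $le_with_bom = "\xFF\xFE" . $le;

for my $pair ([ 'UTF-16' => $with_bom ], [ 'UTF-16LE' => $le ],
              [ 'BOM + UTF-16LE' => $le_with_bom ]) {
    my ($label, $bytes) = @$pair;
    printf "%-15s %s\n", $label,
        join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
```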

Choice of BOM for UTF-16 encoding

2014-02-09 Thread Geoffrey Leach
I'm the maintainer of Audio::Taglib. Summary of my perl5 (revision 5 version 16 subversion 3) configuration: osname=linux, osvers=3.10.9-200.fc19.x86_64, archname=x86_64-linux-thread-multi $utf16 = encode("UTF-16", "\x{6211}\x{7684}") prepends a big-endian BOM. As best I can tell this r

Given 5.014 features, is encoding::warnings still needed?

2011-06-08 Thread Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯
In <http://stackoverflow.com/q/6281049#comment-7334585>, tchrist asks: | Is `encoding::warnings` actually still needed given the `/dual` modifiers | and the `unicode_strings` feature?

Re: encoding(UTF16-LE) on Windows

2011-02-02 Thread Erland Sommarskog
Michael Ludwig (mil...@gmx.de) writes: >> For instance, I use Windows exclusively, so Unicode in file names is >> no problem. > > Did a quick test: > > (v5.12.1) built for MSWin32-x86-multi-thread (so ActiveState) > > * a…b.txt > * not correct > * doesn't have anything with "uni" or "utf"

Re: encoding(UTF16-LE) on Windows

2011-01-31 Thread Michael Ludwig
tance, I use Windows exclusively, so Unicode in file names is > no problem. Did a quick test: \,,,/ (o o) --oOOo-(_)-oOOo-- use strict; use warnings; use utf8; my $fn = 'a…b.txt'; # with a Unicode character open my $fh, '>:encoding(UTF-8)', $fn or die

Re: encoding(UTF16-LE) on Windows

2011-01-31 Thread Erland Sommarskog
Michael Ludwig (mil...@gmx.de) writes: > Erland Sommarskog schrieb am 29.01.2011 um 14:02 (+0100): > >> Yes, there certainly seems to be some more stuff to do in the Unicode >> support in Perl. For instance, support for Unicode filenames in open >> or opendir. > > I think there is no portable ans

Re: encoding(UTF16-LE) on Windows

2011-01-30 Thread Michael Ludwig
Erland Sommarskog schrieb am 29.01.2011 um 14:02 (+0100): > Yes, there certainly seems to be some more stuff to do in the Unicode > support in Perl. For instance, support for Unicode filenames in open > or opendir. I think there is no portable answer here, as it depends on the filesystem's suppor

RE: encoding(UTF16-LE) on Windows

2011-01-30 Thread Erland Sommarskog
"Jan Dubois" (j...@activestate.com) writes: > I've double-checked with Leon, who thinks that this is due to bug 38456: > > http://rt.perl.org/rt3//Public/Bug/Display.html?id=38456 > > He made a patch to fix the bug, and the patch has been applied to > bleadperl already. I ran you sample scri

RE: encoding(UTF16-LE) on Windows

2011-01-28 Thread Jan Dubois
On Fri, 21 Jan 2011, Erland Sommarskog wrote: > "Jan Dubois" (j...@activestate.com) writes: > > You need to stack the I/O layers in the right order. The :encoding() > > layer needs to come last (be at the bottom of the stack), *after* the > > :crlf layer adds the ad

RE: encoding(UTF16-LE) on Windows

2011-01-23 Thread Erland Sommarskog
"Jan Dubois" (j...@activestate.com) writes: > You need to stack the I/O layers in the right order. The :encoding() > layer needs to come last (be at the bottom of the stack), *after* the > :crlf layer adds the additional carriage returns. The way to pop the > default :crlf

RE: encoding(UTF16-LE) on Windows

2011-01-21 Thread Jan Dubois
I wrote: > I saw some discussion today that the :raw pseudo-layer in the open() > call will also remove the buffering layer (it doesn’t do that when you > use it in a binmode() call). I’ll try to remember to send a followup > once I actually understand what is going on. That seems indeed to be the

RE: encoding(UTF16-LE) on Windows

2011-01-21 Thread Jan Dubois
:40 PM To: perl-unicode@perl.org Subject: Re: encoding(UTF16-LE) on Windows Jan Dubois wrote: Files opened on Windows already have the :crlf layer pushed by default, so you somehow need to get the :encoding layer *below* it. Is it possible to re-write the working statement open(my $fh

RE: encoding(UTF16-LE) on Windows

2011-01-21 Thread Jan Dubois
On Fri, 21 Jan 2011, Erland Sommarskog wrote: > > There is still one thing that is not clear to me. The incorrect end-of-line > was > > 0D 00 0A > > But the way you describe it, I would expect it to be > > 0D 0A 00 I went back to the very first message in the thread, where you write: | Wh

RE: encoding(UTF16-LE) on Windows

2011-01-21 Thread Erland Sommarskog
"Jan Dubois" (j...@activestate.com) writes: > Now when you print a string to the filehandle, then it will be passed > to the top-most layer first (:crlf), which will s/\n/\r\n/g on the > string, and then passes it on to the next lower layer :encoding, which > will do the

Re: encoding(UTF16-LE) on Windows

2011-01-20 Thread Bob Hallissy
Jan Dubois wrote: Files opened on Windows already have the :crlf layer pushed by default, so you somehow need to get the :encoding layer*below* it. Is it possible to re-write the working statement open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $filename) or die $!; in a w

Re: encoding(UTF16-LE) on Windows

2011-01-20 Thread 'Michael Ludwig'
[RE: encoding(UTF16-LE) on Windows] Jan Dubois schrieb am 20.01.2011 um 12:45 (-0800): > On Thu, 20 Jan 2011, Michael Ludwig wrote: > > Erland Sommarskog schrieb am 20.01.2011 um 08:29 (-): > > > "Jan Dubois" (j...@activestate.com) writes: > > > > You n

RE: encoding(UTF16-LE) on Windows

2011-01-20 Thread Jan Dubois
On Thu, 20 Jan 2011, Erland Sommarskog wrote: > One can sense some potential for improvements. Not the least in the > documentation area. This is open source. Patches welcome! This is how things get better. Cheers, -Jan

RE: encoding(UTF16-LE) on Windows

2011-01-20 Thread Jan Dubois
On Thu, 20 Jan 2011, Michael Ludwig wrote: > Erland Sommarskog schrieb am 20.01.2011 um 08:29 (-): > > "Jan Dubois" (j...@activestate.com) writes: > > > You need to stack the I/O layers in the right order. The :encoding() > > > layer needs to come last (b

Re: encoding(UTF16-LE) on Windows

2011-01-20 Thread Michael Ludwig
Erland Sommarskog schrieb am 20.01.2011 um 08:29 (-): > "Jan Dubois" (j...@activestate.com) writes: > > You need to stack the I/O layers in the right order. The :encoding() > > layer needs to come last (be at the bottom of the stack), *after* the > > :crlf lay

RE: encoding(UTF16-LE) on Windows

2011-01-20 Thread Erland Sommarskog
"Jan Dubois" (j...@activestate.com) writes: > You need to stack the I/O layers in the right order. The :encoding() > layer needs to come last (be at the bottom of the stack), *after* the > :crlf layer adds the additional carriage returns. The way to pop the > default :crlf

Re: encoding(UTF16-LE) on Windows

2011-01-19 Thread 'Michael Ludwig'
Jan Dubois schrieb am 19.01.2011 um 11:08 (-0800): > You need to stack the I/O layers in the right order. The :encoding() > layer needs to come last (be at the bottom of the stack), *after* the > :crlf layer adds the additional carriage returns. The way to pop the > default :crlf

RE: encoding(UTF16-LE) on Windows

2011-01-19 Thread Jan Dubois
On Wed, 19 Jan 2011, Michael Ludwig wrote: > Erland Sommarskog schrieb am 17.01.2011 um 13:57 (-): > > I'm on Windows and I have this small script: > > > >use strict; > >open F, '>:encoding(UTF-16LE)', "slask2.txt"; > >

Re: encoding(UTF16-LE) on Windows

2011-01-19 Thread Michael Ludwig
Erland Sommarskog schrieb am 17.01.2011 um 13:57 (-): > I'm on Windows and I have this small script: > >use strict; >open F, '>:encoding(UTF-16LE)', "slask2.txt"; >print F "1\n2\n3\n"; >close F; > > When I open t

encoding(UTF16-LE) on Windows

2011-01-19 Thread Erland Sommarskog
I'm on Windows and I have this small script: use strict; open F, '>:encoding(UTF-16LE)', "slask2.txt"; print F "1\n2\n3\n"; close F; When I open the output in a hex editor I see 31 00 0D 0A 00 32 00 0D 0A 00 33 0D 0A 00 I would expect to se

Re: Detecting malformed characters in files opened with '<:encoding(something)'

2010-10-03 Thread Darren Duncan
harryfm...@comcast.net wrote: Various places in the Perl docs say, with good and sufficient reason, that when reading a UTF-8 file, it should be opened '<:encoding(utf8)' rather than '<:utf8'. Actually, what you want is ":encoding(UTF-8)" because that is more strict. -- Darren Duncan
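
The strictness difference mentioned here can be observed directly through Encode, which backs the `:encoding()` layer. The following is a sketch under the assumption of a reasonably recent Encode; it calls `decode` rather than opening a file, and the byte sequence is an illustrative choice (the would-be encoding of 0x110000, one past the last Unicode code point).

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_CROAK);

# The two spellings resolve to different encoding objects.
my $strict_name = find_encoding('UTF-8')->name;
my $lax_name    = find_encoding('utf8')->name;
print "UTF-8 -> $strict_name\n";
print "utf8  -> $lax_name\n";

my $bytes = "\xF4\x90\x80\x80";

# decode() with a CHECK value may modify its input, so work on copies.
my $strict_input = $bytes;
my $strict = eval { decode('UTF-8', $strict_input, FB_CROAK) };
my $strict_rejected = defined $strict ? 0 : 1;

my $lax_input = $bytes;
my $lax = eval { decode('utf8', $lax_input, FB_CROAK) };

print "strict UTF-8: ", ($strict_rejected ? "rejected" : "accepted"), "\n";
printf "lax utf8:     %s\n",
    defined $lax ? sprintf('accepted as 0x%X', ord $lax) : 'rejected';
```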

Detecting malformed characters in files opened with '<:encoding(something)'

2010-10-03 Thread harryfmudd
Dear List, Various places in the Perl docs say, with good and sufficient reason, that when reading a UTF-8 file, it should be opened '<:encoding(utf8)' rather than '<:utf8'. The thing is, nowhere can I find documented what happens when a malformed character is enc

Re: use encoding 'utf8' and \x{00e4} notation

2010-02-03 Thread Michael Ludwig
Am 03.02.2010 um 08:55 schrieb Aristotle Pagaltzis: > * Michael Ludwig [2010-02-02 17:35]: >> use encoding 'utf8'; > > The `encoding` pragma is broken. Do not use it. > > You want > >use open ':encoding(UTF-8)', ':std'; Thanks

Re: use encoding 'utf8' and \x{00e4} notation

2010-02-03 Thread Aristotle Pagaltzis
* Michael Ludwig [2010-02-02 17:35]: > use encoding 'utf8'; The `encoding` pragma is broken. Do not use it. You want use open ':encoding(UTF-8)', ':std'; Regards, -- Aristotle Pagaltzis // <http://plasmasturm.org/>

use encoding 'utf8' and \x{00e4} notation

2010-02-02 Thread Michael Ludwig
I was under the assumption that: use encoding 'utf8'; was equivalent to: use utf8; # source in UTF-8 binmode STDIN, ':utf8'; binmode STDOUT, ':utf8'; But that does not seem to be the case. Please consider and run the following script: use strict; use warnings;

Re: Fix UTF Encoding issue

2007-12-04 Thread Martin Koegler
es not test if a string is really UTF-8. It seems to be intended to check whether perl stores the string internally in a multi-byte encoding. mfg Martin Kögler.

Re: Fix UTF Encoding issue

2007-12-04 Thread Ismail Dönmez
Tuesday 04 December 2007 10:47:39 Ismail Dönmez yazmıştı: > Tuesday 04 December 2007 10:44:12 Martin Koegler yazmıştı: > > On Tue, Dec 04, 2007 at 10:33:39AM +0200, Ismail Dönmez wrote: > > > Following to_utf8 function works for me : > > > > For me too (Debian sarge+etch). > > Thanks for testing.

Re: Fix UTF Encoding issue

2007-12-04 Thread Martin Koegler
return $str; In the original thread, there was some discussion that some people might want a different fallback encoding. So maybe you should keep the second call to decode for the fallback encoding. > } mfg Martin Kögler

Re: Fix UTF Encoding issue

2007-12-04 Thread Wincent Colaiuta
El 4/12/2007, a las 9:55, Ismail Dönmez escribió: Tuesday 04 December 2007 10:47:39 Ismail Dönmez yazmıştı: Tuesday 04 December 2007 10:44:12 Martin Koegler yazmıştı: On Tue, Dec 04, 2007 at 10:33:39AM +0200, Ismail Dönmez wrote: Following to_utf8 function works for me : For me too (Debian

Re: Fix UTF Encoding issue

2007-12-04 Thread Martin Koegler
On Tue, Dec 04, 2007 at 09:55:04AM +0200, Ismail Dönmez wrote: > Tuesday 04 December 2007 Tarihinde 09:50:28 yazmıştı: > > The bug affects old versions of perl (Debian sarge = oldstable). > > As it works on the newer Debian etch, do you really think, that it is > > a good idea to report issue? >

Re: Fix UTF Encoding issue

2007-12-04 Thread Ismail Dönmez
> > > > if(utf8::valid($str)) > > { > > utf8::decode($str); > > } > > > > return $str; > > In the original thread, there was some discussion that some people > might want a different fallback encoding. So maybe you should

Re: Fix UTF Encoding issue

2007-12-04 Thread Ismail Dönmez
Tuesday 04 December 2007 10:28:59 Ismail Dönmez yazmıştı: > Tuesday 04 December 2007 10:16:34 Martin Koegler yazmıştı: > [...] > > > print t("#öäü"); > > print t("#ÀöÌ"); > > print "\n"; > > How about this one, doesn't even use Encode, uses just built-in utf8 > function : > > [~]> cat test.pl >

Re: Fix UTF Encoding issue

2007-12-04 Thread Ismail Dönmez
Tuesday 04 December 2007 10:16:34 Martin Koegler yazmıştı: [...] > print t("#öäü"); > print t("#ÀöÌ"); > print "\n"; How about this one, doesn't even use Encode, uses just built-in utf8 function : [~]> cat test.pl binmode STDOUT, ':utf8'; my $str = "#öäü"; if (utf8::valid($str)) { utf8:

Re: Fix UTF Encoding issue

2007-12-04 Thread Ismail Dönmez
tr, > > >>>>>>Encode::FB_DEFAULT); > > >>>>>>- } > > >>>>>>+ eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); > > >>>>>>}; > > >>>>>>+ return decode($fallback_encoding, $str, Enc

Re: Fix UTF Encoding issue

2007-12-04 Thread Martin Koegler
eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); > >>>>>>}; > >>>>>>+ return decode($fallback_encoding, $str, Encode::FB_DEFAULT); > >>>>>> } > >>>>>> > >>This version

Re: Fix UTF Encoding issue

2007-12-04 Thread Ismail Dönmez
Tuesday 04 December 2007 Tarihinde 09:50:28 yazmıştı: > The bug affects old versions of perl (Debian sarge = oldstable). > As it works on the newer Debian etch, do you really think, that it is > a good idea to report issue? Same problem here with v5.8.8 which is latest stable perl5 release. Regar

Re: Fix UTF Encoding issue

2007-12-04 Thread Martin Koegler
On Mon, Dec 03, 2007 at 06:02:54PM +0100, Jakub Narebski wrote: > On Mon, 3 Dec 2007, Martin Koegler wrote: > > eval { $res = decode_utf8(...); } > > if ($@) > > return decode(...); > > return $res > > > > or > > > > eval { $res = decode_utf8(...); } > > if (defined $res) > > return $
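
The decode-with-fallback pattern debated in this thread can be sketched as follows. This is not the patch under discussion: the function body, the `$fallback_encoding` default, and the sample byte strings are assumptions made for illustration. The key points are that `return` inside `eval` only leaves the `eval` block, and that `decode` with a CHECK value may clobber its input, so a copy is decoded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: try strict UTF-8 first, fall back to another encoding.
sub to_utf8 {
    my ($str, $fallback_encoding) = @_;
    $fallback_encoding = 'iso-8859-1' unless defined $fallback_encoding;

    # Decode a copy, because FB_CROAK allows decode() to modify its input.
    my $copy = $str;
    my $res  = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $res if defined $res;

    # Strict decoding failed: treat the bytes as the fallback encoding.
    return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
}

# The same character (U+00E4) supplied as UTF-8 bytes and as Latin-1 bytes.
my $from_utf8   = to_utf8("\xC3\xA4");
my $from_latin1 = to_utf8("\xE4");

printf "from UTF-8 bytes:   U+%04X\n", ord $from_utf8;
printf "from Latin-1 bytes: U+%04X\n", ord $from_latin1;
```

Both inputs should come out as the same single character, which is the property the thread reports as broken on some older installations.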

Re: Fix UTF Encoding issue

2007-12-03 Thread Benjamin Close
ding, $str, > > >>>>>> Encode::FB_DEFAULT); > > >>>>>> -} > > >>>>>> +eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); }; > > >>>>>> +return decode($fallback_encoding, $str, Encode::FB_DEFAULT); > >

Re: Fix UTF Encoding issue

2007-12-03 Thread Ismail Dönmez
val { return ($res = decode_utf8($str, Encode::FB_CROAK)); }; > >>>>>> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT); > >>>>>> } > >> > >> This version is broken on Debian sarge and etch. Feeding a UTF-8 and a > >&g

Re: Fix UTF Encoding issue

2007-12-03 Thread Benjamin Close
de::FB_CROAK)); }; + return decode($fallback_encoding, $str, Encode::FB_DEFAULT); } This version is broken on Debian sarge and etch. Feeding a UTF-8 and a latin1 encoding of the same character sequence yields to different results. For the record, this was on a debian

Re: Fix UTF Encoding issue

2007-12-03 Thread Jakub Narebski
t;>> - } >>>>> + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); }; >>>>> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT); >>>>> } > > This version is broken on Debian sarge and etch. Feeding a UTF-8 and a latin1 &

Re: UTF-16BE -> UTF-8 encoding() error

2007-11-29 Thread Paul Bijnens
On 2007-11-29 01:04, Jenkins, Nicholas S (GE Money) wrote: I have 2 data files I want to compare...one is in UTF-16BE (Windows "Unicode" format) and one is in UTF-8 format. I wrote 3 perl programs: *)1 to normalize data in the UTF-16BE source and write to a UTF-8 formatted output file *)1 to

UTF-16BE -> UTF-8 encoding() error

2007-11-29 Thread Jenkins, Nicholas S (GE Money)
Hi...I found this DL via the perldoc.perl.org/perluniintro page...if I'm violating protocol for writing directly, please pardon. I have 2 data files I want to compare...one is in UTF-16BE (Windows "Unicode" format) and one is in UTF-8 format. I wrote 3 perl programs: *)1 to normalize data in the

Re: Use of encoding/decoding and 3-param open

2007-11-15 Thread Paul Bijnens
On 2007-11-13 19:56, Juerd Waalboer wrote: $rv = open (OUT2, ">:utf8", "sample2"); Should work well. Remember that you shouldn't use :utf8 for input. In the general case, :encoding(UTF-8) is safest. Can you elaborate more on the subtle difference betwe

Re: Use of encoding/decoding and 3-param open

2007-11-15 Thread Juerd Waalboer
Paul Bijnens skribis 2007-11-15 14:52 (+0100): > Can you elaborate more on the subtle difference between: > binmode(STDIN, ":utf8"); > binmode(STDIN, ":encoding(UTF-8)"); http://search.cpan.org/~rgarcia/perl-5.9.5/pod/perlunifaq.pod#Cheat?!_Tell_me,_how_can_I_che

Re: Use of encoding/decoding and 3-param open

2007-11-13 Thread Juerd Waalboer
this. \x in Perl takes codepoint numbers, and C384 is not the codepoint for the character that you want. Likewise, the codepoint U+00E5 (LATIN SMALL LETTER A WITH RING ABOVE) is not at all like U+C3A5, even though the UTF-8 encoding is C3 A5. Please do yourself a big favor and learn about the di
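
The distinction drawn here between a code point and its UTF-8 bytes can be made concrete. A minimal sketch, assuming the core Encode module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# \x{...} names a code point, not a byte sequence.
my $char  = "\x{E5}";                  # U+00E5, one character
my $bytes = encode('UTF-8', $char);    # its UTF-8 encoding: C3 A5

# Writing the UTF-8 bytes as if they were a code point number
# yields a completely different single character, U+C3A5.
my $other = "\x{C3A5}";

printf "U+%04X encodes to: %s\n", ord($char),
    join ' ', map { sprintf '%02X', ord } split //, $bytes;
printf "\\x{C3A5} is one character with code point U+%04X\n", ord($other);
```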

Use of encoding/decoding and 3-param open

2007-11-13 Thread Greger Leijonhufvud
stanå Bruk AB Open 'open (OUT1, ">", "sample1")' returns 1 Open 'open (OUT2, ">:utf8", "sample2")' returns 1 Open 'open (OUT3, '>:enco

FW: Utf8 encoding

2007-10-26 Thread Vajramatti Shashidhar (DS/EES1)
amatti Shashidhar (DS/EES1) > Sent: Friday, October 26, 2007 12:03 PM > To: 'perl-unicode@perl.org' > Subject: Utf8 encoding > > Hello, > I am parsing an xml file using libxml2. The xml file has umlauts (German > keys ü/ö/ä etc.), ° (degree), etc. as characters.

Re: Utf8 encoding

2007-10-26 Thread Juerd Waalboer
rser = XML::LibXML->new(); my $doc = $parser->parse_file( $x_file ); What are the contents of the file? (What is its encoding?) The first line should contain the encoding: -- Met vriendelijke groet, Kind regards, Korajn salutojn, Juerd Waalboer: Perl hacker <[EMAIL PROTECTED]

Utf8 encoding

2007-10-26 Thread Vajramatti Shashidhar (DS/EES1)
icate encoding !" Below is the snippet of code that I used. my $parser = ""; my $doc = ""; $parser = XML::LibXML->new();# $doc = $parser->parse_file( $x_file ); I tried to encode using "setEncoding" but no results. -Shashi

Support(ed|ing) Vietnamese encoding in Encode

2007-09-18 Thread Nguyen Vu Hung
Dear Mr. Kogai, Your Encode supports the Vietnamese viscii and CP1258 encodings. As far as I know, there are four more legacy Vietnamese encodings out there: VNI VPS TCVN VIQR # Refer to http://vietunicode.sourceforge.net/charset/ # for mapping tables. As a request, can you make Perl's Encode support

Re: the encoding "646"

2007-01-19 Thread Kjetil Torgrim Homme
On Fri, 2007-01-19 at 22:01 +0900, Marty Pauley wrote: > On Thu, 18 Jan 2007 20:50:50 +0100 Kjetil Torgrim Homme > <[EMAIL PROTECTED]> wrote: > > > I request you add "646" as an alias for "ascii". > > But it isn't the same as ascii! We shouldn't need to add a bug to Perl > just to fix a dodgy So

Re: the encoding "646"

2007-01-19 Thread Marty Pauley
Hello On Thu, 18 Jan 2007 20:50:50 +0100 Kjetil Torgrim Homme <[EMAIL PROTECTED]> wrote: > I request you add "646" as an alias for "ascii". But it isn't the same as ascii! We shouldn't need to add a bug to Perl just to fix a dodgy Solaris 8 locale setup. If it really is ISO 646 then you need t

the encoding "646"

2007-01-18 Thread Kjetil Torgrim Homme
in Solaris 8, the "C" locale uses a charset which is unknown to Perl: #! /local/bin/perl use I18N::Langinfo qw(langinfo CODESET); my $term_encoding = langinfo(CODESET()); binmode STDOUT, ":encoding($term_encoding)"; pri

Normalizing an unknown filehandle encoding to utf8

2006-10-05 Thread Napiorkowski, John
Hi,   I’m trying to normalize a filehandle of unknown encoding to UTF8.  There is a lot of documentation about changing/converting data formats but nothing I’ve tried works.  Here is my problem and what I tried to do to solve it.   I have a form upload which is allowing my clients to

Re: iso-2022-jp encoding on EBCDIC

2005-12-21 Thread rajarshi das
Created ticket # 16663. SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote: On Mon, 19 Dec 2005 22:28:55 -0800 (PST), rajarshi das <[EMAIL PROTECTED]> wrote > I am testing this with iso-2022-jp encoding : > > use encoding 'iso-2022-jp'; > >

Re: iso-2022-jp encoding on EBCDIC

2005-12-20 Thread SADAHIRO Tomoyuki
On Mon, 19 Dec 2005 22:28:55 -0800 (PST), rajarshi das <[EMAIL PROTECTED]> wrote > I am testing this with iso-2022-jp encoding : > ---- > use encoding 'iso-2022-jp'; > > $a = "^[$B$!^[(B"; > print "a : $a\n"; > ---

Re: iso-2022-jp encoding on EBCDIC

2005-12-19 Thread rajarshi das
--- SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote: > > On Wed, 14 Dec 2005 05:19:00 -0800 (PST), rajarshi > das <[EMAIL PROTECTED]> wrote > > > Hi, > > > > The following two line script gives an error on > z/OS : "Unknown encoding '

Re: iso-2022-jp encoding on EBCDIC

2005-12-15 Thread Nick Ing-Simmons
Rajarshi Das <[EMAIL PROTECTED]> writes: > Hi, > > The following two line script gives an error on z/OS : "Unknown encoding > 'iso-2022- > jp' at line ..". > - > use Encode; use encoding 'iso-2022-jp'; > -

Re: iso-2022-jp encoding on EBCDIC

2005-12-15 Thread SADAHIRO Tomoyuki
On Wed, 14 Dec 2005 05:19:00 -0800 (PST), rajarshi das <[EMAIL PROTECTED]> wrote > Hi, > > The following two line script gives an error on z/OS : "Unknown encoding > 'iso-2022-jp' at line ..". > - > use Encode; > use encodi

iso-2022-jp encoding on EBCDIC

2005-12-14 Thread rajarshi das
Hi,   The following two line script gives an error on z/OS : "Unknown encoding 'iso-2022-jp' at line ..". - use Encode; use encoding 'iso-2022-jp';   How do we confirm if iso-2022-jp is supported on z/OS or not ? Or if it i

Re: intelligent lexically encoding

2005-09-15 Thread Andreas J. Koenig
> On Wed, 7 Sep 2005 20:39:20 -0700, Jerzy Giergiel <[EMAIL PROTECTED]> > said: > Neither of those fallbacks is OK, I want á converted to accent > stripped version of itself i.e. a. The second solution isn't very > helpful either, it's basically tr replacement table which is not

Re: intelligent lexically encoding

2005-09-08 Thread Dan Kogai
On Sep 08, 2005, at 12:39 , Jerzy Giergiel wrote: Neither of those fallbacks is OK, I want á converted to accent stripped version of itself i.e. a. The second solution isn't very helpful either, it's basically tr replacement table which is not much fun to write when majority of upper 128 cha

Re: intelligent lexically encoding

2005-09-07 Thread Jerzy Giergiel
's gotta be a simpler and more elegant solution. thanks anyway. sorry for bugging people here with a trivial question. I need to convert from MacRoman encoding to ASCII (7-bit). Encode package simply replaces out of range characters with a question mark. I need something intelligent lex

Re: intelligent lexically encoding

2005-09-07 Thread Dan Kogai
On Sep 08, 2005, at 11:22 , Jerzy Giergiel wrote: sorry for bugging people here with a trivial question. I need to convert from MacRoman encoding to ASCII (7-bit). Encode package simply replaces out of range characters with a question mark. I need something intelligent lexically speaking

intelligent lexically encoding

2005-09-07 Thread Jerzy Giergiel
Sorry for bugging people here with a trivial question. I need to convert from MacRoman encoding to ASCII (7-bit). The Encode package simply replaces out-of-range characters with a question mark. I need something intelligent lexically speaking. For example aacute should be converted to a. Any

Re: Encoding iso-8859-16

2005-08-19 Thread Nicholas Clark
On Fri, Aug 19, 2005 at 05:51:10PM +0530, Sastry wrote: > Hi > > The test case uses the invariant character that is below <127 on > ISO-8859-16 codepage. Since character 'a' has a codepoint of 129 on > EBCDIC, is there a place in the code where it should apply > NATIVE_TO_ASCII macro on the inp

Re: Encoding iso-8859-16

2005-08-19 Thread Sastry
Clark <[EMAIL PROTECTED]> wrote: > On Fri, Aug 19, 2005 at 05:01:04PM +0530, Sastry wrote: > > Hi Nicholas > > > > With reference to my previous mail on encoding module > > > > use Encode; > > $string = "a"; > > $enc_string = encode("iso

Re: Encoding iso-8859-16

2005-08-19 Thread Nicholas Clark
On Fri, Aug 19, 2005 at 05:01:04PM +0530, Sastry wrote: > Hi Nicholas > > With reference to my previous mail on encoding module > > use Encode; > $string = "a"; > $enc_string = encode("iso-8859-16", $string); > print "\n String: $string\n"

Re: Encoding iso-8859-16

2005-08-19 Thread Sastry
Hi Nicholas With reference to my previous mail on encoding module use Encode; $string = "a"; $enc_string = encode("iso-8859-16", $string); print "\n String: $string\n"; print "\n enc_string: $enc_string\n"; a)How different are those ext/Encode/def_t.

Re: Encoding iso-8859-16

2005-08-10 Thread Nicholas Clark
On Wed, Aug 10, 2005 at 02:11:45PM +0530, Sastry wrote: > On 8/9/05, Nicholas Clark <[EMAIL PROTECTED]> wrote: > > On Tue, Aug 09, 2005 at 10:58:48AM +0530, Sastry wrote: > > > > $enc_string = encode("iso-8859-16", $string); > > So $enc_string should be a single byte, 97, everywhere. > Can you su

Re: Encoding iso-8859-16

2005-08-10 Thread Sastry
ring = "a"; > > > $enc_string = encode("iso-8859-16", $string); > > > > > > print ord ($enc_string), "\n"; > > 73. Odd. > > It should print 97 on all platforms. Because: > > $string contains 1 byte, the byte that represe

Re: Encoding iso-8859-16

2005-08-09 Thread Sastry
ode("iso-8859-16", $string); > > > > > > print ord ($enc_string), "\n"; > > 73. Odd. > > It should print 97 on all platforms. Because: > > $string contains 1 byte, the byte that represents 'a' in the platform's > default character encoding. > > The encode call should convert from the default encoding to iso-8859-16 > And 'a' in iso-8859-16 is 97. > Everywhere. > > So $enc_string should be a single byte, 97, everywhere. > > Nicholas Clark >

Re: Encoding iso-8859-16

2005-08-09 Thread Nicholas Clark
"\n"; 73. Odd. It should print 97 on all platforms. Because: $string contains 1 byte, the byte that represents 'a' in the platform's default character encoding. The encode call should convert from the default encoding to iso-8859-16 And 'a' in iso-8859-16 is 97. Everywhere. So $enc_string should be a single byte, 97, everywhere. Nicholas Clark
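
The expectation stated in this message can be checked with a few lines. This sketch assumes an ASCII-based platform with the core Encode module; on EBCDIC the thread reports a different result, which is exactly the behaviour under dispute.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $string     = "a";
my $enc_string = encode("iso-8859-16", $string);

# 'a' occupies position 97 in iso-8859-16, so a correct conversion
# produces exactly one byte with that value.
printf "%d byte(s), first byte value %d\n",
    length($enc_string), ord($enc_string);
```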

Re: Encoding iso-8859-16

2005-08-08 Thread Sastry
Hi I get 73 printed on EBCDIC platform. I think it is supposed to print 129 as it is the numeric equivalent of 'a'. -Sastry On 8/8/05, Nicholas Clark <[EMAIL PROTECTED]> wrote: > On Thu, Aug 04, 2005 at 11:51:44AM +0530, Sastry wrote: > > Hi > > > > I am running the following script on EBCDI

Re: Encoding iso-8859-16

2005-08-08 Thread Nicholas Clark
On Thu, Aug 04, 2005 at 11:51:44AM +0530, Sastry wrote: > Hi > > I am running the following script on EBCDIC > > use Encode; > $string = "a"; > $enc_string = encode("iso-8859-16", $string); > print "\n String: $string\n"; > print "\n enc_string: $enc_string\n"; > > > The output: > > String: a

Encoding iso-8859-16

2005-08-03 Thread Sastry
Hi I am running the following script on EBCDIC use Encode; $string = "a"; $enc_string = encode("iso-8859-16", $string); print "\n String: $string\n"; print "\n enc_string: $enc_string\n"; The output: String: a enc_string: ñ (This is the character for codepoint \xF1 on iso-8859-16) What is th

encoding a sting using iso-8859-16 codepage

2005-07-05 Thread Sastry
Hi I have the following problem which gives different results when I run on linux machine and on z/OS Could explain me the reason for so? On linux the enc_string will still beeuro whereas on z/OS it is / >, Thanks in advance regards Sastry use Encode; $string = "eur

use encoding 'utf8' bug/shortcoming?

2005-04-13 Thread John Williams
I have tried the following script on perl-5.8.[356], and I was wondering if someone could confirm whether this is a bug or known shortcoming of use encoding 'utf8'; From the encoding docs: > Implicit upgrading for byte strings > > By default, if strings operating under byte

ucs-2 encoding and decoding in perl 5.6.1

2005-02-03 Thread Andrew V. Kuzmin
Good afternoon! I can't find on the internet any module for perl 5.6.1 that can decode or encode a string to or from UCS-2 format. Could you send one to me? Thank you! Andrew from Russia

Re: :encoding() layer modifies read-only scalars

2004-11-29 Thread Bjoern Hoehrmann
>>> for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/) >>> { >>>my $backup = $string; >>>open F, "<:encoding($_)", \$backup; >>>my $char; >>>read F, $char, 1, 0; >>>close F; >>> >>

Re: :encoding() layer modifies read-only scalars

2004-11-29 Thread Nick Ing-Simmons
ot;"); >> >> for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/) >> { >>my $backup = $string; >>open F, "<:encoding($_)", \$backup; >>my $char; >>read F, $char, 1, 0; >>close F; >> >>die u

Re: :encoding() layer modifies read-only scalars

2004-11-28 Thread Bjoern Hoehrmann
> { > my $backup = $string; >open F, "<:encoding($_)", \$backup; >my $char; >read F, $char, 1, 0; >close F; > >die unless $backup eq $string; > } > >Gives > > utf8 "\xFE" does not map to Unicode at ... line 13. >

Re: Website encoding

2004-11-27 Thread John Delacour
s? : my $uri = 'http://www.lemonde.fr'; my $fin = '/tmp/latin1.html'; my $fout = '/tmp/utf8.html'; my $charsetin = "text/html; charset=iso-8859-1"; my $charsetout = "text/html; charset=UTF-8"; `curl -o $fin $uri` ; open(FIN, "<:encoding(iso-8859-1)"

Re: Website encoding

2004-11-19 Thread Nick Ing-Simmons
om UTF-8 to perl's internal form. Again this is trivial. > >However it's not working. > >Does that mean that the encoding of the actual characters on the page is >not in the charset in the meta tag? Quite possibly - do you mean the chars in the headers or the body? >

Re: Website encoding

2004-11-17 Thread Masanori HATA
's decode function > to turn it into 'perl's internal format' .. which in 5.8.5 is utf8 > right? I then store that in the db. > > However it's not working. > > Does that mean that the encoding of the actual characters on the page is > not in the charset in

Website encoding

2004-11-17 Thread Rick Measham
Please forgive this going to both lists but I'm not sure where things are going wrong... I have many website around the world that I need to index. They're straight HTML pages rather than perl-served and thus the headers say the content-type is 'text/html' .. without mentionin

:encoding() layer modifies read-only scalars

2004-10-31 Thread Bjoern Hoehrmann
Hi, Encode 2.08, PerlIO::scalar 0.02, ActivePerl 5.8.2, #!perl -w use strict; use warnings; use Encode; my $string = encode(UTF16 => ""); for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/) { my $backup = $string; open F, "<:encoding($_)"

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
de:RETURN_ON_ERR >>> is >>> on when the caller is PerlIO::encoding... >> >> Or, one could backport PerlIO::encoding (with your patch) to CPAN and >> require this latest version for Encode 2.08. > >That was what came across my mind first but I found it was not

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
d to return >> immediately at partial character but now Encode:RETURN_ON_ERR is >> required, meaning those who installed Encode-2.07 on older perl are in >> trouble w/ PerlIO. So I am looking for a solution which does that >> without tweaking PerlIO::encoding. > >W

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
d looks like something is broken in >> PerlIO::encoding. >> More precisely, ext/PerlIO/t/encoding.t fails test 14, that tests >> open(F,'<:encoding(utf-8)',$threebyte). > >The easiest solution is the patch below; > >--- ext/PerlIO/encoding/encoding.pm.dist
