Re: encoding(UTF16-LE) on Windows

2011-02-02 Thread Erland Sommarskog
Michael Ludwig writes: >> For instance, I use Windows exclusively, so Unicode in file names is >> no problem. > > Did a quick test: > > (v5.12.1) built for MSWin32-x86-multi-thread (so ActiveState) > > * a…b.txt > * not correct > * doesn't have anything with "uni" or "utf"

Re: encoding(UTF16-LE) on Windows

2011-01-31 Thread Michael Ludwig
Erland Sommarskog wrote on 31.01.2011 at 23:42 (+0100): > Michael Ludwig writes: > > Erland Sommarskog wrote on 29.01.2011 at 14:02 (+0100): > > > >> Yes, there certainly seems to be some more stuff to do in the > >> Unicode support in Perl. For instance, support for Unicode >

Re: encoding(UTF16-LE) on Windows

2011-01-31 Thread Erland Sommarskog
Michael Ludwig writes: > Erland Sommarskog wrote on 29.01.2011 at 14:02 (+0100): > >> Yes, there certainly seems to be some more stuff to do in the Unicode >> support in Perl. For instance, support for Unicode filenames in open >> or opendir. > > I think there is no portable ans

Re: encoding(UTF16-LE) on Windows

2011-01-30 Thread Michael Ludwig
Erland Sommarskog wrote on 29.01.2011 at 14:02 (+0100): > Yes, there certainly seems to be some more stuff to do in the Unicode > support in Perl. For instance, support for Unicode filenames in open > or opendir. I think there is no portable answer here, as it depends on the filesystem's suppor

RE: encoding(UTF16-LE) on Windows

2011-01-30 Thread Erland Sommarskog
"Jan Dubois" writes: > I've double-checked with Leon, who thinks that this is due to bug 38456: > > http://rt.perl.org/rt3//Public/Bug/Display.html?id=38456 > > He made a patch to fix the bug, and the patch has been applied to > bleadperl already. I ran your sample scri

RE: encoding(UTF16-LE) on Windows

2011-01-28 Thread Jan Dubois
On Fri, 21 Jan 2011, Erland Sommarskog wrote: > "Jan Dubois" writes: > > You need to stack the I/O layers in the right order. The :encoding() > > layer needs to come last (be at the bottom of the stack), *after* the > > :crlf layer adds the additional carriage returns. The

RE: encoding(UTF16-LE) on Windows

2011-01-23 Thread Erland Sommarskog
"Jan Dubois" writes: > You need to stack the I/O layers in the right order. The :encoding() > layer needs to come last (be at the bottom of the stack), *after* the > :crlf layer adds the additional carriage returns. The way to pop the > default :crlf layer is to start out w

RE: encoding(UTF16-LE) on Windows

2011-01-21 Thread Jan Dubois
I wrote: > I saw some discussion today that the :raw pseudo-layer in the open() > call will also remove the buffering layer (it doesn’t do that when you > use it in a binmode() call). I’ll try to remember to send a followup > once I actually understand what is going on. That seems indeed to be the

RE: encoding(UTF16-LE) on Windows

2011-01-21 Thread Jan Dubois
[Re: encoding(UTF16-LE) on Windows] Jan Dubois wrote: Files opened on Windows already have the :crlf layer pushed by default, so you somehow need to get the :encoding layer *below* it. Is it possible to re-write the working statement open(my $fh

RE: encoding(UTF16-LE) on Windows

2011-01-21 Thread Jan Dubois
On Fri, 21 Jan 2011, Erland Sommarskog wrote: > > There is still one thing that is not clear to me. The incorrect end-of-line > was > > 0D 00 0A > > But the way you describe it, I would expect it to be > > 0D 0A 00 I went back to the very first message in the thread, where you write: | Wh

RE: encoding(UTF16-LE) on Windows

2011-01-21 Thread Erland Sommarskog
"Jan Dubois" writes: > Now when you print a string to the filehandle, then it will be passed > to the top-most layer first (:crlf), which will s/\n/\r\n/g on the > string, and then passes it on to the next lower layer :encoding, which > will do the encoding, and when it reach
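The layer order Jan describes can be checked without a Windows box. The sketch below is a minimal illustration, not code from the thread: `layers_to_hex` is a hypothetical helper, and `File::Temp` is used only to get a scratch file. It assumes, as Jan explains, that layers named later in the `open` mode string sit higher in the stack, so printed data passes through them first. The second call deliberately puts `:crlf` below `:encoding`, the arrangement the thread blames for the broken output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given PerlIO layer string
# and return the resulting file contents as a lowercase hex string.
sub layers_to_hex {
    my ($layers, $text) = @_;
    my (undef, $path) = tempfile(UNLINK => 1);   # scratch file, removed at exit
    open(my $out, ">$layers", $path) or die "open $path: $!";
    print {$out} $text;
    close($out) or die "close $path: $!";

    open(my $in, '<:raw', $path) or die "open $path: $!";
    local $/;                                    # slurp the whole file
    my $bytes = <$in>;
    close($in);
    return unpack('H*', $bytes);
}

# :raw pops the default layers, :encoding is pushed next (bottom) and :crlf
# last (top), so "\n" becomes "\r\n" *before* the text is encoded.
my $good = layers_to_hex(':raw:encoding(UTF-16LE):crlf', "1\n2\n3\n");

# Reversed order: :crlf sits below :encoding, so the CR is inserted into
# the already-encoded UTF-16LE byte stream.
my $bad = layers_to_hex(':raw:crlf:encoding(UTF-16LE)', "1\n2\n3\n");

print "good: $good\n";
print "bad:  $bad\n";
```

With the recommended order each line should end in the UTF-16LE sequence 0D 00 0A 00; comparing it with the second dump makes the difference between the two stack orders easy to see.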

Re: encoding(UTF16-LE) on Windows

2011-01-20 Thread Bob Hallissy
Jan Dubois wrote: Files opened on Windows already have the :crlf layer pushed by default, so you somehow need to get the :encoding layer *below* it. Is it possible to re-write the working statement open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $filename) or die $!; in a way that works

Re: encoding(UTF16-LE) on Windows

2011-01-20 Thread 'Michael Ludwig'
[RE: encoding(UTF16-LE) on Windows] Jan Dubois wrote on 20.01.2011 at 12:45 (-0800): > On Thu, 20 Jan 2011, Michael Ludwig wrote: > > Erland Sommarskog wrote on 20.01.2011 at 08:29 (-): > > > "Jan Dubois" writes: > > > > You n

RE: encoding(UTF16-LE) on Windows

2011-01-20 Thread Jan Dubois
On Thu, 20 Jan 2011, Erland Sommarskog wrote: > One can sense some potential for improvements. Not the least in the > documentation area. This is open source. Patches welcome! This is how things get better. Cheers, -Jan

RE: encoding(UTF16-LE) on Windows

2011-01-20 Thread Jan Dubois
On Thu, 20 Jan 2011, Michael Ludwig wrote: > Erland Sommarskog wrote on 20.01.2011 at 08:29 (-): > > "Jan Dubois" writes: > > > You need to stack the I/O layers in the right order. The :encoding() > > > layer needs to come last (be at the bottom of the stack), *after*

Re: encoding(UTF16-LE) on Windows

2011-01-20 Thread Michael Ludwig
Erland Sommarskog wrote on 20.01.2011 at 08:29 (-): > "Jan Dubois" writes: > > You need to stack the I/O layers in the right order. The :encoding() > > layer needs to come last (be at the bottom of the stack), *after* the > > :crlf layer adds the additional carriage re

RE: encoding(UTF16-LE) on Windows

2011-01-20 Thread Erland Sommarskog
"Jan Dubois" writes: > You need to stack the I/O layers in the right order. The :encoding() > layer needs to come last (be at the bottom of the stack), *after* the > :crlf layer adds the additional carriage returns. The way to pop the > default :crlf layer is to start out w

Re: encoding(UTF16-LE) on Windows

2011-01-19 Thread 'Michael Ludwig'
Jan Dubois wrote on 19.01.2011 at 11:08 (-0800): > You need to stack the I/O layers in the right order. The :encoding() > layer needs to come last (be at the bottom of the stack), *after* the > :crlf layer adds the additional carriage returns. The way to pop the > default :crlf layer is to sta

RE: encoding(UTF16-LE) on Windows

2011-01-19 Thread Jan Dubois
On Wed, 19 Jan 2011, Michael Ludwig wrote: > Erland Sommarskog wrote on 17.01.2011 at 13:57 (-): > > I'm on Windows and I have this small script: > > > >use strict; > >open F, '>:encoding(UTF-16LE)', "slask2.txt"; > >print F "1\n2\n3\n"; > >close F; > > > > When I open the out

Re: encoding(UTF16-LE) on Windows

2011-01-19 Thread Michael Ludwig
Erland Sommarskog wrote on 17.01.2011 at 13:57 (-): > I'm on Windows and I have this small script: > >use strict; >open F, '>:encoding(UTF-16LE)', "slask2.txt"; >print F "1\n2\n3\n"; >close F; > > When I open the output in a hex editor I see > > 31 00 0D 0A 00 32 00 0D 0A

Re: Encoding iso-8859-16

2005-08-19 Thread Nicholas Clark
On Fri, Aug 19, 2005 at 05:51:10PM +0530, Sastry wrote: > Hi > > The test case uses the invariant character that is below <127 on > ISO-8859-16 codepage. Since character 'a' has a codepoint of 129 on > EBCDIC, is there a place in the code where it should apply > NATIVE_TO_ASCII macro on the inp

Re: Encoding iso-8859-16

2005-08-19 Thread Sastry
Hi The test case uses the invariant character that is below <127 on ISO-8859-16 codepage. Since character 'a' has a codepoint of 129 on EBCDIC, is there a place in the code where it should apply NATIVE_TO_ASCII macro on the input character? -Sastry On 8/19/05, Nicholas Clark <[EMAIL PROTECTE

Re: Encoding iso-8859-16

2005-08-19 Thread Nicholas Clark
On Fri, Aug 19, 2005 at 05:01:04PM +0530, Sastry wrote: > Hi Nicholas > > With reference to my previous mail on encoding module > > use Encode; > $string = "a"; > $enc_string = encode("iso-8859-16", $string); > print "\n String: $string\n"; > print "\n enc_string: $enc_string\n"; > > a)How diffe

Re: Encoding iso-8859-16

2005-08-19 Thread Sastry
Hi Nicholas With reference to my previous mail on encoding module use Encode; $string = "a"; $enc_string = encode("iso-8859-16", $string); print "\n String: $string\n"; print "\n enc_string: $enc_string\n"; a)How different are those ext/Encode/def_t.c and ext/Encode/Byte/byte_t.c files in EBCDI

Re: Encoding iso-8859-16

2005-08-10 Thread Nicholas Clark
On Wed, Aug 10, 2005 at 02:11:45PM +0530, Sastry wrote: > On 8/9/05, Nicholas Clark wrote: > > On Tue, Aug 09, 2005 at 10:58:48AM +0530, Sastry wrote: > > > > $enc_string = encode("iso-8859-16", $string); > > So $enc_string should be a single byte, 97, everywhere. > Can you su
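Nicholas's expectation can be written as a short check. This is a sketch that reuses `Encode::encode` exactly as in Sastry's script; on an ASCII platform it should report one byte with value 97, and the point made in the thread is that an EBCDIC build ought to report the same thing.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same input as Sastry's script: the single character "a".
# Per the thread, its native code point is 129 on EBCDIC and 97 on ASCII.
my $string     = "a";
my $enc_string = encode("iso-8859-16", $string);

# ISO-8859-16 encodes "a" as the single byte 0x61 (97), so the expectation
# is length 1 and ord 97 regardless of the platform's native charset.
printf "length=%d ord=%d\n", length($enc_string), ord($enc_string);
```

A result such as 73, as reported on the EBCDIC machine, would indicate that the native-to-Unicode mapping is being skipped somewhere before the byte table lookup.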

Re: Encoding iso-8859-16

2005-08-10 Thread Sastry
On 8/9/05, Nicholas Clark wrote: > On Tue, Aug 09, 2005 at 10:58:48AM +0530, Sastry wrote: > > Hi > > > > I get 73 printed on EBCDIC platform. I think it is supposed to print > > 129 as it is the numeric equivalent of 'a'. > > > > -Sastry > > > > > > > > On 8/8/05, Nicholas Cla

Re: Encoding iso-8859-16

2005-08-09 Thread Sastry
Hi Nicholas Clark I agree that it is supposed to print the numerical equivalent 97. I attempted to see if there is any bug in the encode module. Surprisingly, I noticed that there are two .c files in ext/Encode/def_t.c and ext/Encode/Byte/byte_t.c which are generated using enc2xs. They are diff

Re: Encoding iso-8859-16

2005-08-09 Thread Nicholas Clark
On Tue, Aug 09, 2005 at 10:58:48AM +0530, Sastry wrote: > Hi > > I get 73 printed on EBCDIC platform. I think it is supposed to print > 129 as it is the numeric equivalent of 'a'. > > -Sastry > > > > On 8/8/05, Nicholas Clark wrote: > > On your EBCDIC platform, what does

Re: Encoding iso-8859-16

2005-08-08 Thread Sastry
Hi I get 73 printed on EBCDIC platform. I think it is supposed to print 129 as it is the numeric equivalent of 'a'. -Sastry On 8/8/05, Nicholas Clark wrote: > On Thu, Aug 04, 2005 at 11:51:44AM +0530, Sastry wrote: > > Hi > > > > I am running the following script on EBCDI

Re: Encoding iso-8859-16

2005-08-08 Thread Nicholas Clark
On Thu, Aug 04, 2005 at 11:51:44AM +0530, Sastry wrote: > Hi > > I am running the following script on EBCDIC > > use Encode; > $string = "a"; > $enc_string = encode("iso-8859-16", $string); > print "\n String: $string\n"; > print "\n enc_string: $enc_string\n"; > > > The output: > > String: a

Re: :encoding() layer modifies read-only scalars

2004-11-29 Thread Bjoern Hoehrmann
* Nick Ing-Simmons wrote: >>> Encode 2.08, PerlIO::scalar 0.02, ActivePerl 5.8.2, >>> >>> #!perl -w >>> use strict; >>> use warnings; >>> use Encode; >>> >>> my $string = encode(UTF16 => ""); >>> >>> for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/) >>> { >>>my $backup = $string;

Re: :encoding() layer modifies read-only scalars

2004-11-29 Thread Nick Ing-Simmons
Bjoern Hoehrmann writes: >* Bjoern Hoehrmann wrote: >> Encode 2.08, PerlIO::scalar 0.02, ActivePerl 5.8.2, >> >> #!perl -w >> use strict; >> use warnings; >> use Encode; >> >> my $string = encode(UTF16 => ""); >> >> for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/)

Re: :encoding() layer modifies read-only scalars

2004-11-28 Thread Bjoern Hoehrmann
* Bjoern Hoehrmann wrote: > Encode 2.08, PerlIO::scalar 0.02, ActivePerl 5.8.2, > > #!perl -w > use strict; > use warnings; > use Encode; > > my $string = encode(UTF16 => ""); > > for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/) > { >my $backup = $string; >open F, "<:encodin

Re: encoding...

2003-11-02 Thread John Delacour
At 3:36 pm -0800 2/11/03, Jan Dubois wrote: Should work if you initialize the variable in a BEGIN block: BEGIN { $source = 'MacRoman'; } use encoding $source, STDOUT => 'utf-8'; Ah! Yes, put single quotes around your EOT marker: $text = <<'EOT'; $ome$tuff $ome$tuff $ome$tuff
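Both answers rest on Perl's compile-time behaviour, which a small sketch can show. Here `use constant` is a stand-in (an assumption of this example) for the thread's `use encoding $source, STDOUT => 'utf-8'` line, chosen only because its result is easy to inspect; the BEGIN block and the single-quoted heredoc terminator are the two techniques from the replies.

```perl
use strict;
use warnings;

# `use` runs at compile time, so a plain run-time assignment to $source
# would still be undef when the `use` line executes. Assigning inside a
# BEGIN block makes the value available in time.
our $source;
BEGIN { $source = 'MacRoman'; }
use constant SOURCE_ENCODING => $source;   # stand-in for `use encoding $source`

# Single-quoting the heredoc terminator disables interpolation, so "$ome"
# and "$tuff" stay literal text instead of being treated as variables.
my $text = <<'EOT';
$ome$tuff
EOT

print "source: ", SOURCE_ENCODING, "\n";
print $text;
```

Without the BEGIN block, `SOURCE_ENCODING` would be undef, which matches the symptom John describes for his variable on the `use` line.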

Re: encoding...

2003-11-02 Thread Jan Dubois
On Sun, 2 Nov 2003 23:24:41 +, John Delacour wrote: >Question 1. > >In this script I would like for convenience' sake to use variables in >the second line, but I don't seem to be able to do so. Am I missing >something or is it simply not possible? > > >$source = 'MacRoma

Re: Encoding vs Charset

2002-03-27 Thread Keld Jørn Simonsen
On Wed, Mar 27, 2002 at 11:59:10AM -0500, Jungshik Shin wrote: > On Wed, 27 Mar 2002, Dan Kogai wrote: > > > On Wednesday, March 27, 2002, at 11:22 , Jungshik Shin wrote: > > > IMHO, you're also misusing the term 'charset' here. MIME charset > > > can be used synonymously with 'encodings' (or >

Re: Encoding vs Charset

2002-03-27 Thread Jungshik Shin
On Wed, 27 Mar 2002, Dan Kogai wrote: > On Wednesday, March 27, 2002, at 11:22 , Jungshik Shin wrote: > > IMHO, you're also misusing the term 'charset' here. MIME charset > > can be used synonymously with 'encodings' (or > > character set encoding scheme: see CJKV Information Processing, > > IE

Re: Encoding vs Charset

2002-03-26 Thread Jungshik Shin
On Tue, 26 Mar 2002, Jungshik Shin wrote: > > really means euc-cn and charset="ks_c_5601-1987" really means euc-kr. > > Sadly this misconception is embedded in popular browsers. > M$ OE, M$ Frontpage keep producing html docs. However, > it also has to be noted that the encoding > designated as
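To see how one implementation treats these MIME labels, Perl's `Encode` module can resolve each one through its alias table. This sketch only prints whatever canonical name the installed `Encode` reports (or `(unknown)`); it does not assert what `gb2312` or `ks_c_5601-1987` should map to, since that mapping is exactly what the thread is debating.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# MIME charset labels from the discussion, resolved to the canonical
# encoding name the installed Encode associates with each one.
for my $label (qw(gb2312 euc-cn ks_c_5601-1987 euc-kr)) {
    my $enc = find_encoding($label);
    printf "%-16s => %s\n", $label, defined $enc ? $enc->name : '(unknown)';
}
```

If `gb2312` and `euc-cn` print the same canonical name, the library is treating the charset label as a name for the EUC-CN encoding scheme, in line with the argument made here.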

Re: Encoding vs Charset

2002-03-26 Thread Jungshik Shin
>And I have found that most of Chinese (Continental; seems like > Taiwanese are much more technically correct) and Korean mails and web > pages confuse "charset" and "encodings". That is, charset="gb2312" IMHO, you're also misusing the term 'charset' here. MIME charset can be used syno