On 2005-07-26, at 11:31, Jeremias Reith (via RT) wrote:

There is an endianness switch after each newline when outputting UTF-16 on
Win32.

Example script:

#!/usr/bin/perl

binmode(STDOUT, ':encoding(UTF-16)');
map { print $_ . "\n" } @ARGV;


This produces the following (called with "foo bar baz" as command line params):

0000000: feff 0066 006f 006f 000d 0a00 6200 6100  ...f.o.o....b.a.
0000010: 7200 0d0a 0062 0061 007a 000d 0a         r....b.a.z...

The carriage return (\r) is correctly output as 0x000d, but the newline that follows is printed in little-endian order (0x0a00 instead of 0x000a). Furthermore, all
following characters are printed in LE until the next line break.

In my view, this looks like a bug in the code that transparently adds
a \r before each newline on the Windows platform.

So each line break causes a switch from BE to LE and vice versa.
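
The dump is consistent with a single CR byte being inserted into the already encoded byte stream, which would shift every following 16-bit unit by one byte. A minimal sketch of that idea, assuming the translation is a plain byte-level substitution (the sub names crlf_after_encoding and crlf_before_encoding are made up for illustration, and no PerlIO layers are involved here):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode to UTF-16 first, then translate LF to CR LF
# on the raw bytes. Encode's "UTF-16" writes a big-endian BOM (fe ff)
# followed by big-endian code units.
sub crlf_after_encoding {
    my ($text) = @_;
    my $bytes = encode('UTF-16', $text);
    $bytes =~ s/\x0a/\x0d\x0a/g;    # inserts ONE byte before each 0x0a byte
    return unpack('H*', $bytes);
}

# Hypothetical helper: translate LF to CR LF on the characters first,
# then encode, so CR becomes a full 16-bit code unit (00 0d).
sub crlf_before_encoding {
    my ($text) = @_;
    (my $copy = $text) =~ s/\n/\r\n/g;
    return unpack('H*', encode('UTF-16', $copy));
}

my $text = join '', map { "$_\n" } qw(foo bar baz);
print crlf_after_encoding($text),  "\n";
print crlf_before_encoding($text), "\n";
```

For foo, bar and baz the first line printed should match the Win32 dump above byte for byte, while the second keeps every code unit intact (000d 000a).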

Output of the same script on Mac OS X (Perl 5.8.6):

0000000: feff 0066 006f 006f 000a 0062 0061 0072  ...f.o.o...b.a.r
0000010: 000a 0062 0061 007a 000a                 ...b.a.z..

No problem here.

The issue seems to be that the implicit push of the :crlf layer onto the handle takes place before the explicit push of :encoding(UTF-16) when, to get the correct results, it should happen after. Some more tests on Mac OS X perl 5.8.6:

$ perl -we 'binmode(STDOUT, ":crlf"); binmode(STDOUT, ":encoding(UTF-16)"); map { print $_ . "\n" } @ARGV;' foo bar baz | od -x
0000000 feff 0066 006f 006f 000d 0a00 6200 6100
0000020 7200 0d0a 0062 0061 007a 000d 0a00
0000035
$ perl -we 'binmode(STDOUT, ":encoding(UTF-16)"); binmode(STDOUT, ":crlf"); map { print $_ . "\n" } @ARGV;' foo bar baz | od -x
0000000 feff 0066 006f 006f 000d 000a 0062 0061
0000020 0072 000d 000a 0062 0061 007a 000d 000a
0000040
$ perl -we 'binmode(STDOUT, ":raw:encoding(UTF-16):crlf"); map { print $_ . "\n" } @ARGV;' foo bar baz | od -x
0000000 feff 0066 006f 006f 000d 000a 0062 0061
0000020 0072 000d 000a 0062 0061 007a 000d 000a
0000040

The first is wrong, and what I suspect is happening due to the implicit push on Windows; the second, with the pushes swapped, gives the right answer; the third does too and is what I think you need to get around the issue. (The :raw is a no-op for Mac OS X, but won't be for Windows.) As I don't have a Windows box to hand, please try it and get back to us with the result.
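
For use in a script rather than a one-liner, the same layer order can be given to open(). This is only a sketch, assuming the output goes to a file instead of STDOUT; write_utf16_crlf and slurp_hex are hypothetical helper names, and the behaviour on Win32 is still unconfirmed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write each line through :crlf stacked on top of
# :encoding(UTF-16), with :raw first so no implicit :crlf sits underneath.
sub write_utf16_crlf {
    my ($path, @lines) = @_;
    open(my $out, '>:raw:encoding(UTF-16):crlf', $path)
        or die "Cannot open $path for writing: $!";
    print {$out} "$_\n" for @lines;
    close($out) or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: read the file back untranslated and return it as hex.
sub slurp_hex {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open $path for reading: $!";
    local $/;                      # slurp mode
    my $bytes = <$in>;
    close($in);
    return unpack('H*', $bytes);
}

my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
write_utf16_crlf($tmp_path, qw(foo bar baz));
print slurp_hex($tmp_path), "\n";
```

If the layers stack as intended, every line ending in the hex output should read 000d000a, as in the second and third od dumps above.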

Thanks.
--
Dominic Dunlop
