On 2005-07-26, at 11:31, Jeremias Reith (via RT) wrote:
There is an endianness switch after each newline while outputting
UTF-16 on Win32.
Example script:
#!/usr/bin/perl
binmode(STDOUT, ':encoding(UTF-16)');
map { print $_ . "\n" } @ARGV;
This produces the following (called with "foo bar baz" as command line
params):
0000000: feff 0066 006f 006f 000d 0a00 6200 6100 ...f.o.o....b.a.
0000010: 7200 0d0a 0062 0061 007a 000d 0a r....b.a.z...
The carriage return (\r) is correctly output as 0xd, but after that the
newline is printed in little endian (0xa00 instead of 0xa).
Furthermore, all following chars are printed in LE until the next line
break.
From my point of view, this looks like a bug in the code that
transparently adds a \r for each newline on the Windows platform.
So each line break causes a switch from BE to LE and vice versa.
Output of the same script on Mac OS X (Perl 5.8.6):
0000000: feff 0066 006f 006f 000a 0062 0061 0072 ...f.o.o...b.a.r
0000010: 000a 0062 0061 007a 000a ...b.a.z..
No problem here.
The issue seems to be that the implicit push of the :crlf layer onto
the handle takes place before the explicit push of :encoding(UTF-16)
when, to get the correct results, it should happen after. Some more
tests on Mac OS X perl 5.8.6:
$ perl -we 'binmode(STDOUT, ":crlf"); binmode(STDOUT, ":encoding(UTF-16)");
    map { print $_ . "\n" } @ARGV;' foo bar baz | od -x
0000000 feff 0066 006f 006f 000d 0a00 6200 6100
0000020 7200 0d0a 0062 0061 007a 000d 0a00
0000035
$ perl -we 'binmode(STDOUT, ":encoding(UTF-16)"); binmode(STDOUT, ":crlf");
    map { print $_ . "\n" } @ARGV;' foo bar baz | od -x
0000000 feff 0066 006f 006f 000d 000a 0062 0061
0000020 0072 000d 000a 0062 0061 007a 000d 000a
0000040
$ perl -we 'binmode(STDOUT, ":raw:encoding(UTF-16):crlf");
    map { print $_ . "\n" } @ARGV;' foo bar baz | od -x
0000000 feff 0066 006f 006f 000d 000a 0062 0061
0000020 0072 000d 000a 0062 0061 007a 000d 000a
0000040
The first is wrong, and is what I suspect happens on Windows because of
the implicit push; the second, with the pushes swapped, gives the right
answer; the third does too, and is what I think you need to get around
the issue. (The :raw is a no-op for Mac OS X, but won't be for
Windows.) As I don't have a Windows box to hand, please try it and get
back to us with the result.
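The byte pattern in the first dump can also be reproduced without
PerlIO layers at all. What follows is only a minimal sketch, assuming
that the wrong ordering amounts to newline translation being applied to
bytes that are already UTF-16. Encode's encode() stands in for the
:encoding layer, a plain s/\n/\r\n/ stands in for :crlf, and hex_words
is a hypothetical helper added just for display:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of any module): show a byte string as
# space-separated groups of two bytes, similar to od -x on a BE machine.
sub hex_words {
    my ($bytes) = @_;
    return join ' ', unpack '(H4)*', $bytes;
}

my $text = join '', map { "$_\n" } qw(foo bar);

# Assumed model of the wrong order: encode to UTF-16 first, then insert
# a single \r byte before each \n byte. Every insertion shifts the
# following 16-bit units by one byte, which looks like a BE/LE flip.
(my $wrong = encode('UTF-16', $text)) =~ s/\n/\r\n/g;

# Assumed model of the right order: translate newlines while the data
# is still characters, then encode, so \r becomes a full 16-bit unit.
(my $chars = $text) =~ s/\n/\r\n/g;
my $right = encode('UTF-16', $chars);

print "translate after encoding:  ", hex_words($wrong), "\n";
print "translate before encoding: ", hex_words($right), "\n";
```

If that model is right, the apparent endianness switch is really a
one-byte misalignment that the next newline happens to undo, rather
than the encoder changing byte order.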
Thanks.
--
Dominic Dunlop