Re: Wide character support for Test::More

2008-02-24 Thread Aristotle Pagaltzis
* Michael G Schwern <[EMAIL PROTECTED]> [2008-02-24 05:40]:
> I just merged together a number of tickets having to do with
> Test::More not liking wide characters.

Good. Now you can close them, since it’s not your bug. It’s the
main program’s responsibility to set the encoding on its handles
sensibly.

It is not your module doing something wrong in its `print`s; it
is your users not setting up their environment properly.

> use 5.008;
> use strict;
> use warnings;
  use open ':std', ':locale';
> use Test::More tests => 1;
>
> my $uni = "\x{11e}";
>
> ok( $uni eq $uni, "Testing $uni" );
>
> __END__
> 1..1
> Wide character in print at lib/Test/Builder.pm line 1252.
  ^^ after the above patch, gone
> ok 1 - Testing Ğ
> 
> I know almost nothing about Unicode.  How do I make this Just
> Work?  Is it safe to just set binmode to always be ':utf8' if
> perl > 5.8?

No! If that were safe, why would it not be the default?

The right thing is for the caller to have his locale set
properly, so that if his terminal expects Latin-1 or KOI8-R or
UTF-8, the above `open` incantation will transcode output to
Latin-1 or KOI8-R or UTF-8 as appropriate.

This also ensures that if the tests try to output a character
that the terminal cannot render, the user will get a warning
and the character will be output as an `\x` escape. If you
blithely assume that their terminal can render UTF-8, and it
cannot, you are likely to throw garbage at the terminal that it
will interpret as control sequences, screwing up the display
entirely.
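
A minimal sketch of that fallback, assuming an in-memory handle with
an explicit `:encoding(iso-8859-1)` layer as a stand-in for a Latin-1
terminal (with `:locale`, the layer would be picked from the
environment instead). The variable names are illustrative only:

```perl
use strict;
use warnings;

# Collect warnings so they can be inspected instead of going to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Assumption: this in-memory handle plays the role of a Latin-1 terminal.
my $out = '';
open my $fh, '>:encoding(iso-8859-1)', \$out or die "open failed: $!";

# U+011E has no Latin-1 representation, so the layer cannot encode it.
print {$fh} "Testing \x{11e}\n";
close $fh or die "close failed: $!";

print "bytes written: $out";
print "warnings seen: ", scalar(@warnings), "\n";
```

The bytes that reach the handle are plain ASCII, with the unmappable
character spelled out as an `\x{...}` escape rather than emitted raw.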

Unfortunately, a properly set locale was not the rule, at least
a few years ago; Perl 5.8.0 did the moral equivalent of that
`open` automatically, and it caused a lot of problems for users
when distros like RedHat started going all-UTF8 in earnest. I
have no idea whether the situation is better now… although one
would hope it is.

Anyway, if I’ve set up my STDFOO handles to transcode to a
different encoding, I certainly wouldn’t want Test::More to
blithely stomp all over that configuration. *That* *would* be
a bug… and squarely *your* bug.

Regards,
-- 
Aristotle Pagaltzis // 


Re: Wide character support for Test::More

2008-02-24 Thread Nicholas Clark
On Sat, Feb 23, 2008 at 08:35:15PM -0800, Michael G Schwern wrote:
> I just merged together a number of tickets having to do with Test::More not 
> liking wide characters.

> Wide character in print at lib/Test/Builder.pm line 1252.
> ok 1 - Testing ??
> 
> 
> I know almost nothing about Unicode.  How do I make this Just Work?  Is it 

You can't. It might be that the user's terminal simply cannot display that
character.

One could get round it by escaping (in some way) all characters that are non-
ASCII, for example with \x{} encoding, and also escaping \ (at least) so that
all the diagnostic output would be valid for a "" string.
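
A rough sketch of that escaping idea, assuming a hypothetical helper
named `escape_for_diag` (it is not part of Test::More or Test::Builder).
It only handles backslashes and non-ASCII characters; a complete version
would presumably also need to escape `"`, `$` and `@`:

```perl
use strict;
use warnings;

# Hypothetical helper: make a string safe to print on any terminal by
# turning it into the body of a double-quoted Perl string.
sub escape_for_diag {
    my ($text) = @_;

    # Escape the escape character first, so the \x{...} sequences
    # added below are not escaped a second time.
    $text =~ s/\\/\\\\/g;

    # Replace every non-ASCII character with its \x{...} form.
    $text =~ s/([^\x00-\x7f])/sprintf '\\x{%x}', ord $1/ge;

    return $text;
}

print escape_for_diag("Testing \x{11e}"), "\n";   # prints: Testing \x{11e}
```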

> safe to just set binmode to always be ':utf8' if perl > 5.8?

Yes, safe from a Perl point of view. But not correct if the user's terminal
isn't expecting UTF-8. And wrong if the user's terminal is expecting the
same 8-bit encoding that the script is already in.

However, it might be safe enough to invoke the testing Perl with -CLS
(set STDIN/STDOUT/STDERR to UTF-8 if the user's locale has UTF-8 in it)
which is probably going to be more right more often than anything else.
I'm not sure whether -CLS only came in with 5.8.1.

Nicholas Clark


Re: Wide character support for Test::More

2008-02-24 Thread Aristotle Pagaltzis
* Nicholas Clark <[EMAIL PROTECTED]> [2008-02-24 11:55]:
> However, it might be safe enough to invoke the testing Perl
> with -CLS (set STDIN/STDOUT/STDERR to UTF-8 if the user's
> locale has UTF-8 in it) which is probably going to be more
> right more often than anything else. Not sure if -CLS only came
> in with 5.8.1

Yes, it’s a 5.8.1 innovation. Unfortunately, you can neither pass
that switch via the shebang line nor easily emulate its effect
from within a script.

(The `open` incantation in my other mail is unconditional, in
contrast to a `-C` switch with the `L` option. OTOH it will set
the encoding to whatever the locale says; if that’s not in UTF-8,
it will set it to whatever is appropriate.)

Regards,
-- 
Aristotle Pagaltzis // 


Re: Wide character support for Test::More

2008-02-24 Thread Michael G Schwern

Aristotle Pagaltzis wrote:
> > use 5.008;
> > use strict;
> > use warnings;
>   use open ':std', ':locale';
> > use Test::More tests => 1;
> >
> > my $uni = "\x{11e}";
> >
> > ok( $uni eq $uni, "Testing $uni" );
> >
> > __END__
> > 1..1
> > Wide character in print at lib/Test/Builder.pm line 1252.
>   ^^ after the above patch, gone


There's the rub: it doesn't go away.

Test::Builder dups STDERR and STDOUT so that you can mess with them to
your heart's content and still get testing done.  File I/O disciplines
(layers) don't appear to be copied across dups.  That's what everyone was
complaining about: they had to manually apply layers to Test::Builder's
own handles.
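
One way to check this on a given perl is to compare the layers on a
handle with those on its dup. This is only a sketch: it uses a
temporary file in place of STDOUT, and it prints whatever layers the
running perl reports rather than assuming a result.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a temp file stands in for STDOUT so the real handles
# are left alone.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':encoding(UTF-8)' or die "binmode failed: $!";

# Dup the handle the same way one would dup STDOUT.
open my $dup, '>&', $fh or die "dup failed: $!";

my @orig_layers = PerlIO::get_layers($fh);
my @dup_layers  = PerlIO::get_layers($dup);

print "original: @orig_layers\n";
print "dup:      @dup_layers\n";
```

If the encoding layer is missing from the second line, output through
the dup will not be transcoded until the layers are applied by hand.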


It appears I have to manually copy the layers across, ok.

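# Copy the PerlIO layers of $src onto $dest, so that a dup'd handle
# encodes its output the same way as the handle it was dup'd from.
sub _copy_io_layers {
    my($self, $src, $dest) = @_;

    $self->_try(sub {
        require PerlIO;
        my @layers = PerlIO::get_layers($src);

        # get_layers returns bare layer names, so prefix each with ':'.
        binmode $dest, join " ", map ":$_", @layers if @layers;
    });
}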

That does it.  Thank you for playing software confessional. :)


--
The past has a vote, but not a veto.
-- Mordecai M. Kaplan