Pierre Nugues wrote on 06.09.2010 at 22:02 (+0200):

> 2/ The output with "use utf8;"

This pragma tells the interpreter that your script source is in UTF-8.
So it affects the literals in your tr/// list. It does not tell the
interpreter what output encoding to use.
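
A minimal sketch of that distinction, assuming the script file itself is
saved as UTF-8. The variable names and the sample word are only
illustrative; they are not taken from your script.

```perl
use strict;
use warnings;
use utf8;                         # literals in this file are decoded as UTF-8

my $word  = "säga";               # four characters, not five octets
my $chars = length $word;

(my $plain = $word) =~ tr/ä/a/;   # tr/// sees the character ä, not two octets

# The output encoding is a separate decision, made per filehandle.
binmode(STDOUT, ':encoding(UTF-8)');
print "$word has $chars characters; transliterated: $plain\n";
```

Without the binmode line, the pragma alone would still leave the output
layer unset, which is why "use utf8;" by itself did not fix your output.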

> 3/ With 
> use utf8;
> binmode(STDOUT, ':utf8');
> I get (this time, the terminal can display the <C2> as a Â. This is
> not correct. It strips the accented characters):

The tr operator may have removed some of the bytes that made up those
characters.

> 4/ With binmode(STDOUT, ':utf8') only (Then, there is a combination of
> wrongly coded quotes in Latin 1 or Latin 9  that the terminal displays
> and accented characters that are shown with their UTF-8 substitutes
> interpreted as Latin 1 or Latin 9 characters);
> 
> »Tjuvgömmare
> !
> »
> säga

Your output is double-encoded. This is what happens here:

(1) You're reading text encoded as UTF-8 in binary mode.
(2) Consequently, you don't have text in Perl: you have octets.
(3) You're applying some butchery to the octets using the tr operator.
(4) You're outputting the remaining octets, encoding them as UTF-8 a
    second time.
(5) You're seeing garbage on the screen.
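
The steps above can be reproduced in a few lines, followed by one
possible fix. This is a sketch rather than your actual script: the
sample word is hard-coded as octets to stand in for a line read in
binary mode, and the variable names are my own.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "säga" as UTF-8 octets, as read from a file without a decoding layer.
my $octets = "s\xC3\xA4ga";

# Double encoding: each octet is treated as a Latin-1 character and
# encoded again, so the single ä ends up as four octets.
my $double = encode('UTF-8', $octets);

# Fix: decode on input, work on characters, encode once on output.
my $text = decode('UTF-8', $octets);
(my $plain = $text) =~ tr/\x{E4}/a/;   # \x{E4} is ä
my $once = encode('UTF-8', $text);

printf "%d octets in, %d double-encoded, %d encoded once\n",
    length $octets, length $double, length $once;
```

In a real script you would normally not call decode and encode by hand
but put an ':encoding(UTF-8)' layer on both the input and the output
filehandle, so that everything in between operates on characters.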

-- 
Michael Ludwig
