Re: Strange "UTF-8" problem

David Graff Tue, 13 Apr 2004 04:30:51 -0700

[EMAIL PROTECTED] said:
> The one line script was:
> perl -e 'open(I, "lang.lng"); while(<I>) {next if /^[\t ]*#/; next unless
> /\w/; s/\s*$//; print;}'
>
> The text file lang.lng was:
> line1=māta
> lala=tāta
>
> It is not a UTF-8 encoded file, but a simple ANSI/Unix end of line file
> that contains some special chars in latin1 character set.
>
> The program works fine with Perl 5.8.3 under Red Hat, but it gives me
> that error if running it on another system where I have perl 5.8.0.
> However, I guess this has nothing to do with the version of perl....


Actually, it might have everything to do with the version. Perl 5.8.0 took
its default I/O behavior from the current locale setting: on a Red Hat
system whose locale refers to UTF-8 (e.g. LANG=en_US.UTF-8), 5.8.0 would
implicitly treat every input file as a UTF-8 file.

This was soon recognized as a bad idea. Starting with 5.8.1, Perl no
longer takes this default from the locale: input files are read as plain
bytes (no special character semantics), and you have to specify ":utf8"
via open() or binmode() in order to interpret the input data as UTF-8.

To get your one-liner to behave the same on 5.8.0 as it does on 5.8.1 and
later (your 5.8.3 included), add "use bytes;" to it. On 5.8.3 this is not
necessary, but it does no harm.
-- 
-----------
David Graff                     Linguistic Data Consortium
[EMAIL PROTECTED]               3600 Market St., Suite 810
voice: (215) 898-0887           University of Pennsylvania
fax:   (215) 573-2175           Philadelphia, PA 19104
                http://www.ldc.upenn.edu