Having dug into this further on Unix, I can see that the aliasing mechanism fills most of the gaps in mapping local codepages to encoding names. But I have also come to realize that Perl does not use the underlying system codepages; it relies on its own encoding objects to handle conversions. Since only a small set of encoding objects is available by default, I would need to load additional Perl CPAN modules to get additional language encodings; otherwise my code would not run much outside ASCII and English environments. Windows seemed to work OK with Simplified Chinese using the Encode package, but maybe the Windows implementation does use the underlying system codepages somehow?

So am I correct that I would need to load additional encodings, and that I cannot otherwise count on Perl to access the wide range of encodings available on the system? I just want to confirm that I am not misunderstanding something here.
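
One way to check this on a given machine is to ask Encode what it has loaded and what it can load on demand. The following is only a minimal sketch: as far as the Encode documentation describes it, `Encode->encodings()` lists the encodings already loaded, while `Encode->encodings(':all')` also includes those that installed modules can supply on demand. The sample names in the loop (including the deliberately bogus `no-such-encoding`) are illustrative assumptions, not a list of required encodings.

```perl
use strict;
use warnings;
use Encode ();

# Encodings already loaded into this process (the small default set).
my @loaded = Encode->encodings();

# Every encoding Encode can load on demand from the installed modules.
my @all = Encode->encodings(':all');

printf "%d loaded, %d available in total\n", scalar @loaded, scalar @all;

# find_encoding() resolves aliases and loads an encoding on demand; it
# returns undef when nothing installed provides the requested name.
for my $name (qw(latin1 cp1252 euc-cn shiftjis no-such-encoding)) {
    my $enc = Encode::find_encoding($name);
    printf "%-18s => %s\n", $name, $enc ? $enc->name : '(not available)';
}
```

If an encoding shows up in the `:all` list, no extra CPAN install should be needed for it; whether that list covers every codepage a given system uses is something the output would have to confirm.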

Thanks very much,
Dave Schlegel



Nicholas Clark <[EMAIL PROTECTED]>
11/09/2005 10:11 AM

To: David Schlegel/Lexington/[EMAIL PROTECTED]
cc: David Graff <[EMAIL PROTECTED]>, perl-unicode@perl.org
Subject: Re: Converting between UTF8 and local codepage without specifying local codepage

On Wed, Nov 09, 2005 at 10:02:31AM -0500, David Schlegel wrote:
> That is helpful information. I have been spending time trying to determine
> the local codepage by other means, but have consistently been told that
> this is the wrong approach and that Perl must know somehow. Getting a
> definitive answer is almost as helpful as getting a better answer.
>
> Based on what you are saying, there is no way to ask Perl what the "local
> codepage" is and hence there can be no variant of "Encode" which can be
> told to convert from "local codepage" to UTF8 without having to provide
> the "local codepage" value explicitly.

Yes. A good summary of the situation.

> Is I18N::Langinfo's langinfo(CODESET) the best way to determine the local
> codepage on Unix? Windows seems to reliably include the codepage number in
> the locale, but Unix is all over the map.

I don't know. I have little to no experience of doing conversion of real
data, certainly for data outside of ISO-8859-1 and UTF-8, and I've never used
I18N::Langinfo. I hope that someone else on this list can give a decent
answer.
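
For reference, the I18N::Langinfo approach asked about above might be sketched as follows. This is a hedged illustration rather than a recommended method: it assumes a Unix system where I18N::Langinfo is available, the helper name `local_to_utf8` is hypothetical, and the codeset name reported by the C library is not guaranteed to be one that Encode recognizes, which is why the lookup result is checked.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode ();

# Adopt the locale from the environment (LANG, LC_ALL, ...) so that CODESET
# reflects the user's settings rather than the default "C" locale.
setlocale(LC_CTYPE, '');

# The name varies by platform, e.g. "UTF-8" or "ISO-8859-1".
my $codeset = langinfo(CODESET);

# May be undef: Encode does not necessarily know the C library's name.
my $enc = Encode::find_encoding($codeset);

# Hypothetical helper: convert bytes in the local codepage to UTF-8 bytes.
sub local_to_utf8 {
    my ($bytes) = @_;
    die "No Encode object for codeset '$codeset'\n" unless $enc;
    return Encode::encode('UTF-8', $enc->decode($bytes));
}

printf "Local codeset: %s (Encode name: %s)\n",
    $codeset, $enc ? $enc->name : 'not recognized';
```

When the lookup fails, the remaining options are to define an alias for that codeset name or to fall back to an explicitly configured encoding.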

Nicholas Clark

