Re: Problem with Encoding
John Delacour wrote:
> At 7:32 pm +0100 17/2/05, Philippe de Rochambeau wrote:
>> I am trying to convert MacRoman encoded text to iso-8859-1...
>> The input file, data.txt contains the following string:
>> Les éléphants sont arrivés. EURO
>
> First of all iso-8859-1 does not contain the Euro sign. The character set you probably intend is Windows-1252

No he doesn't, he wants iso-8859-15

-- 
David Cantrell | Reality Engineer, Ministry of Information
WARNING! People in front of screen are stupider than they appear
    -- Tanuki the Raccoon-dog, in the Monastery
Re: Problem with Encoding
At 12:33 pm + 18/2/05, David Cantrell wrote:
>> First of all iso-8859-1 does not contain the Euro sign. The character set you probably intend is Windows-1252
>
> No he doesn't, he wants iso-8859-15

I doubt it very much, but you seem to have inside information.

#!/usr/bin/perl -w
use Encode;
$euro = "\x{20ac}";                      # U+20AC EURO SIGN
$mac = encode("MacRoman", $euro);
$cp1252 = encode("cp1252", $euro);
$latin9 = encode("iso-8859-15", $euro);
print "$mac $cp1252 $latin9";
Re: Problem with Encoding
John Delacour wrote:
> At 12:33 pm + 18/2/05, David Cantrell wrote:
>>> First of all iso-8859-1 does not contain the Euro sign. The character set you probably intend is Windows-1252
>>
>> No he doesn't, he wants iso-8859-15
>
> I doubt it very much

If he says he wants ISO 8859-1 and he says he wants the Euro sign, then he wants ISO 8859-15, which is identical to 8859-1 but with the generic currency symbol replaced with the Euro symbol, and a few rarely used characters replaced with slightly less rarely used letters.

> but you seem to have inside information.

That's funny, so do you when you claim he probably intends some odd proprietary Microsoft thing from their legacy Windows operating system. Using that is dangerous both because whether the euro character is present in the character set depends on which version of windows-1252 you use, and also because software support for it is poor.

> #!/usr/bin/perl -w
> use Encode;
> $euro = "\x{20ac}";
> $mac = encode("MacRoman", $euro);
> $cp1252 = encode("cp1252", $euro);
> $latin9 = encode("iso-8859-15", $euro);
> print "$mac $cp1252 $latin9";

That prints a capital U with circumflex (I think, it's hard to see), followed by two spaces, followed by a Euro symbol, proving my point rather elegantly. Thank you!

-- 
David Cantrell | Reality Engineer, Ministry of Information
If I was made in God's image, does that make God a grouchy unshaven pervert?
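Since the result of the script above depends on how the terminal interprets each byte, it may help to look at the bytes themselves. The following is a minimal sketch, not taken from either poster: it reuses the same three encoding names and only adds `unpack` to show each encoded Euro sign in hex. The `%bytes` hash is just an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# U+20AC EURO SIGN as a Perl character string
my $euro = "\x{20ac}";

# Encode into each charset discussed in the thread and record the raw byte in hex
my %bytes;
for my $name ('MacRoman', 'cp1252', 'iso-8859-15') {
    $bytes{$name} = uc(unpack('H*', encode($name, $euro)));
    printf "%-12s 0x%s\n", $name, $bytes{$name};
}
```

This should report 0xDB for MacRoman, 0x80 for cp1252 and 0xA4 for iso-8859-15. Assuming the terminal was set to ISO 8859-15, that matches the output described above: 0xDB displays as a capital U with circumflex, 0x80 is an invisible control character, and 0xA4 is the Euro sign.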
Re: Problem with Encoding
On Feb 17, 2005, at 1:32 PM, Philippe de Rochambeau wrote:
> use encoding "MacRoman"; #, STDIN => "MacRoman", STDOUT => "iso-8859-1";

You've specified STDOUT as 8859-1...

> open FH, ">data2.txt" or die $!;
> print FH $data;

But you're not printing to STDOUT. Try opening FH like this:

open FH, '>:encoding(iso-8859-1)', 'data2.txt' or die $!;

sherm--

Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
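The same idea can be applied to both ends of the conversion, so that nothing depends on the `encoding` pragma at all. What follows is only a sketch under assumptions: `convert_file` is a hypothetical helper name, and the target charset is passed in rather than fixed, since which one the original poster needs is exactly what the rest of the thread is arguing about.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, decoding with $from and encoding with $to
sub convert_file {
    my ($in, $from, $out, $to) = @_;
    # The :encoding() layer decodes bytes to characters on read ...
    open my $src, "<:encoding($from)", $in  or die "Cannot read $in: $!";
    # ... and encodes characters back to bytes on write
    open my $dst, ">:encoding($to)",   $out or die "Cannot write $out: $!";
    print {$dst} $_ while <$src>;
    close $src;
    close $dst or die "Cannot close $out: $!";
}

# Example call, using the file names from the original post:
# convert_file('data.txt', 'MacRoman', 'data2.txt', 'iso-8859-15');
```

The example call uses iso-8859-15 only because it has a slot for the Euro sign; cp1252 would work the same way, whereas iso-8859-1 cannot represent that character.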
Re: Problem with Encoding
At 7:32 pm +0100 17/2/05, Philippe de Rochambeau wrote:
> I am trying to convert MacRoman encoded text to iso-8859-1. The script below shows what I am trying to do.
> The input file, data.txt contains the following string:
> Les éléphants sont arrivés. EURO

First of all iso-8859-1 does not contain the Euro sign. The character set you probably intend is Windows-1252, loosely termed "Windows Latin 1" in OS X menus. Unfortunately Perl has a pretty loose approach to charset names too, though when it says iso-8859-1 it means it and not any extended version of it.

Try this. It works here with Perl 5.8.6:

#!/usr/bin/perl -w
use encoding "MacRoman", STDOUT => "windows1252";
chdir "$ENV{HOME}/temp/";
open STDIN, "mac.txt" or die $!;
open STDOUT, ">windows1252.txt" or die $!;
while ( <STDIN> ) {
    print;
}

JD
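The point that iso-8859-1 has no Euro sign can be checked directly with Encode. This is a rough sketch rather than anything posted in the thread: `fits_in` is a hypothetical helper, and it relies on `Encode::FB_CROAK`, which makes `encode` die on a character the target charset cannot represent.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if every character of $text exists in charset $name
sub fits_in {
    my ($name, $text) = @_;
    # encode() may modify its argument when a CHECK value is given, so pass a copy
    my $copy = $text;
    return eval { encode($name, $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# The sample string from the original post, with the Euro sign as U+20AC
my $sample = "Les \x{e9}l\x{e9}phants sont arriv\x{e9}s. \x{20ac}";

for my $name ('iso-8859-1', 'iso-8859-15', 'cp1252') {
    printf "%-12s %s\n", $name, fits_in($name, $sample) ? 'ok' : 'cannot encode';
}
```

With the Euro sign in the sample, iso-8859-1 should be reported as unable to encode the string, while iso-8859-15 and cp1252 both accept it.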