Re: Problem with Encoding

2005-02-18 Thread David Cantrell
John Delacour wrote:
> At 7:32 pm +0100 17/2/05, Philippe de Rochambeau wrote:
>> I am trying to convert MacRoman encoded text to iso-8859-1...
>> The input file, data.txt contains the following string:
>> Les éléphants sont arrivés. EURO
> First of all iso-8859-1 does not contain the Euro sign.  The character
> set you probably intend is Windows-1252

No he doesn't, he wants iso-8859-15
--
David Cantrell | Reality Engineer, Ministry of Information
WARNING! People in front of screen are stupider than they appear
-- Tanuki the Raccoon-dog, in the Monastery


Re: Problem with Encoding

2005-02-18 Thread John Delacour
At 12:33 pm +0000 18/2/05, David Cantrell wrote:
>> First of all iso-8859-1 does not contain the Euro sign.  The
>> character set you probably intend is Windows-1252
> No he doesn't, he wants iso-8859-15

I doubt it very much, but you seem to have inside information.

#!/usr/bin/perl -w
use Encode;
$euro = "\x{20ac}";                      # U+20AC EURO SIGN
$mac = encode("MacRoman", $euro);        # one byte per target charset
$cp1252 = encode("cp1252", $euro);
$latin9 = encode("iso-8859-15", $euro);
print "$mac $cp1252 $latin9";            # raw bytes, separated by spaces


Re: Problem with Encoding

2005-02-18 Thread David Cantrell
John Delacour wrote:
> At 12:33 pm +0000 18/2/05, David Cantrell wrote:
>>> First of all iso-8859-1 does not contain the Euro sign.  The
>>> character set you probably intend is Windows-1252
>> No he doesn't, he wants iso-8859-15
> I doubt it very much

If he says he wants ISO 8859-1 and he says he wants the Euro sign, then
he wants ISO 8859-15, which is identical to 8859-1 but with the generic
currency symbol replaced with the Euro symbol, and a few rarely used
characters replaced with slightly less rarely used letters.
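
For anyone who would rather check that than take it on trust, here is a rough sketch that asks Encode which byte values decode differently under the two charsets. The sub name latin9_differences is made up for this example, and it assumes nothing beyond the Encode module that ships with Perl 5.8 and later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the byte values (0x00..0xFF) that map to
# different Unicode characters in ISO 8859-1 and ISO 8859-15.
sub latin9_differences {
    return grep {
        my $byte = chr $_;
        decode('iso-8859-1', $byte) ne decode('iso-8859-15', $byte);
    } 0x00 .. 0xFF;
}

# Print each differing byte with its code point in both charsets.
for my $code (latin9_differences()) {
    my $byte = chr $code;
    printf "0x%02X  U+%04X -> U+%04X\n",
        $code,
        ord(decode('iso-8859-1',  $byte)),
        ord(decode('iso-8859-15', $byte));
}
```

This should list eight positions, starting with 0xA4, where the currency sign U+00A4 becomes the Euro sign U+20AC.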

> but you seem to have inside information.

That's funny, so do you when you claim he probably intends some odd
proprietary Microsoft thing from their legacy Windows operating
system.  Using that is dangerous, both because whether the Euro character
is present depends on which version of windows-1252 you use, and
because software support for it is poor.

> #!/usr/bin/perl -w
> use Encode;
> $euro = "\x{20ac}";
> $mac = encode("MacRoman", $euro);
> $cp1252 = encode("cp1252", $euro);
> $latin9 = encode("iso-8859-15", $euro);
> print "$mac $cp1252 $latin9";

That prints a capital U with circumflex (I think, it's hard to see),
followed by two spaces, followed by a Euro symbol, proving my point
rather elegantly.  Thank you!
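
To take the terminal's own encoding out of the picture, here is a small sketch that prints the encoded bytes as hex instead of sending them raw to the screen. euro_hex is a name invented for this example; the charset names are the ones from the quoted script, plus plain iso-8859-1 for comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

my $euro = "\x{20ac}";    # U+20AC EURO SIGN

# Hypothetical helper: hex of the byte(s) Encode produces for the
# Euro sign in the named charset.
sub euro_hex {
    my ($charset) = @_;
    return uc(unpack('H*', encode($charset, $euro)));
}

for my $charset (qw(MacRoman cp1252 iso-8859-15 iso-8859-1)) {
    printf "%-12s 0x%s\n", $charset, euro_hex($charset);
}
```

The values should come out as 0xDB, 0x80 and 0xA4 for the first three. Read as ISO 8859-15, 0xDB is a capital U with circumflex, 0x80 is a non-printing control code and 0xA4 is the Euro sign. For plain iso-8859-1 Encode falls back to a substitute "?" (0x3F), since that charset has no Euro sign at all.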

--
David Cantrell | Reality Engineer, Ministry of Information
 If I was made in God's image, does that
 make God a grouchy unshaven pervert?


Re: Problem with Encoding

2005-02-17 Thread Sherm Pendley
On Feb 17, 2005, at 1:32 PM, Philippe de Rochambeau wrote:
> use encoding "MacRoman"; #, STDIN => "MacRoman", STDOUT => "iso-8859-1";

You've specified STDOUT as 8859-1...

> open FH, ">data2.txt" or die $!;
> print FH $data;

But you're not printing to STDOUT. Try opening FH like this:
open FH, '>:encoding(iso-8859-1)', 'data2.txt' or die $!;
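
Put together, a self-contained sketch of that approach might look like the following. It writes a small MacRoman sample into a temporary directory (standing in for data.txt), then copies it to a second file through an :encoding(iso-8859-1) layer. The temp-dir setup and the lexical handle names are assumptions made so the example runs on its own; the Euro sign is left out of the sample because ISO 8859-1 has no code point for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory so the example does not touch real files.
my $dir = tempdir(CLEANUP => 1);
my ($in, $out) = ("$dir/data.txt", "$dir/data2.txt");

# Create a MacRoman-encoded input file (sample data for this sketch).
open my $seed, '>:encoding(MacRoman)', $in or die "$in: $!";
print {$seed} "Les \x{e9}l\x{e9}phants sont arriv\x{e9}s.\n";
close $seed or die "$in: $!";

# Decode MacRoman on the way in, encode ISO 8859-1 on the way out.
open my $src, '<:encoding(MacRoman)',   $in  or die "$in: $!";
open my $dst, '>:encoding(iso-8859-1)', $out or die "$out: $!";
while (my $line = <$src>) {
    print {$dst} $line;
}
close $src or die "$in: $!";
close $dst or die "$out: $!";

# Read the result back as raw bytes to check what was written.
open my $raw, '<:raw', $out or die "$out: $!";
my $bytes = do { local $/; <$raw> };
close $raw or die "$out: $!";
print unpack('H*', $bytes), "\n";
```

Each é should show up as e9 in the hex dump, the ISO 8859-1 value, rather than MacRoman's 8e.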
sherm--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org


Re: Problem with Encoding

2005-02-17 Thread John Delacour
At 7:32 pm +0100 17/2/05, Philippe de Rochambeau wrote:
> I am trying to convert MacRoman encoded text to iso-8859-1. The
> script below shows what I am trying to do.
>
> The input file, data.txt contains the following string:
> Les éléphants sont arrivés. EURO

First of all iso-8859-1 does not contain the Euro sign.  The 
character set you probably intend is Windows-1252, loosely termed 
Windows Latin 1 in OS X menus.  Unfortunately Perl has a pretty 
loose approach to charset names too, though when it says iso-8859-1 
it means it and not any extended version of it.

Try this.  It works here with Perl 5.8.6:
#!/usr/bin/perl -w
use encoding "MacRoman", STDOUT => "windows1252";
chdir "$ENV{HOME}/temp/";
open  STDIN,  "<mac.txt" or die $!;
open  STDOUT, ">windows1252.txt" or die $!;
while ( <STDIN> ) {
  print;
}
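
For data that is already in memory rather than on a filehandle, Encode's from_to does the same transcoding on a byte string in place. A rough sketch, with the sample bytes written out by hand (0x8E is e-acute and 0xDB is the Euro sign in MacRoman); the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(from_to);

# MacRoman bytes for the sample sentence, ending in the Euro sign.
my $octets = "Les \x8El\x8Ephants sont arriv\x8Es. \xDB";

# Convert in place; from_to returns the new length in bytes,
# or undef if the conversion fails.
my $length = from_to($octets, 'MacRoman', 'cp1252');
defined $length or die "conversion failed";

# Show the result as hex so it does not depend on the terminal.
print unpack('H*', $octets), "\n";
```

Swapping 'cp1252' for 'iso-8859-15' should put the Euro sign at 0xA4 instead of 0x80, with the accented letters unchanged at 0xE9.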
JD