Re: Writing utf 8 files

2006-06-23 Thread Joel Rees


If you open the file as utf-8 you will see ö and if you open it
as MacRoman you will see √∂.  You could also open it as
Traditional Chinese or Simplified Chinese or many other things and  
see other things.  UTF-8 byte order is always the same, so there is  
no need for a BOM, though some editors might use it as a hint.


Given that his editor seems to have interpreted the file as utf-8  
with the BOM in place and as something else without the BOM, we might  
guess that his editor recognizes the BOM.
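As a rough illustration of what "recognizing the BOM" could mean, here is a minimal sketch that checks whether a byte string starts with the three bytes EF BB BF (U+FEFF encoded as UTF-8). The helper name has_utf8_bom is made up for this example, and how any particular editor actually decides on an encoding is an assumption on my part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the byte string starts with the UTF-8
# encoded BOM (U+FEFF encoded as the three bytes EF BB BF).
sub has_utf8_bom {
    my ($bytes) = @_;
    return substr($bytes, 0, 3) eq "\xEF\xBB\xBF" ? 1 : 0;
}

# "ö" in UTF-8 is the two bytes C3 B6.
my $plain    = "\xC3\xB6";
my $with_bom = "\xEF\xBB\xBF" . $plain;

printf "without BOM: %s\n", has_utf8_bom($plain)    ? "BOM found" : "no BOM";
printf "with BOM:    %s\n", has_utf8_bom($with_bom) ? "BOM found" : "no BOM";
```

Without the BOM, an editor has nothing but the content itself (or its defaults) to go on, which would fit the behaviour described above.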


We could also, of course, guess that his login account is set to  
default to something other than utf-8, which is also in keeping with  
my experience with Mac OS X when the user has not deliberately messed  
around with things.




Writing utf 8 files

2006-06-22 Thread Tommy Nordgren
How do I write proper UTF-8 characters to a file? I write only two
characters, and they come out as four garbage characters when I view
the file in an editor.
-
This sig is dedicated to the advancement of Nuclear Power
Tommy Nordgren
[EMAIL PROTECTED]





Re: Writing utf 8 files

2006-06-22 Thread Tommy Nordgren


On 22 Jun 2006, at 20:15, Sherm Pendley wrote:


On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:

How do I write proper UTF-8 characters to a file? I write only two
characters, and they come out as four garbage characters when I view
the file in an editor.


Quick answer:

open FH, ">:utf8", "file";

Complete answer:

perldoc perluniintro
perldoc PerlIO

sherm--

Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org


I've already tried that. That was what I was doing when I got garbage.





Re: Writing utf 8 files

2006-06-22 Thread Sherm Pendley

On Jun 22, 2006, at 2:29 PM, Tommy Nordgren wrote:


On 22 Jun 2006, at 20:15, Sherm Pendley wrote:


On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:

How do I write proper UTF-8 characters to a file? I write only
two characters, and they come out as four garbage characters when
I view the file in an editor.


Quick answer:

open FH, ">:utf8", "file";

Complete answer:

perldoc perluniintro
perldoc PerlIO

	I've already tried that. That was what I was doing when I got
garbage.


Well, the above is correct as far as Perl goes - but it doesn't rule  
out other problems. Are you certain that the editor you're using is  
interpreting the file correctly, as UTF8? Also, are you certain that  
your input really is UTF8?


For instance, I ran this script to generate a test file:

#!/usr/bin/perl

use strict;
use warnings;
use utf8; # This allows utf8 in string literals, like below

open FH, '>:utf8', '/Users/sherm/hello.txt' or die $!;
print FH "Hëllö, wörld!\n";
close FH;

When I open the file in BBEdit, I see gibberish, because BBEdit can't  
determine that it's UTF8 (there's no BOM), and misinterprets it as  
the default Mac OS Roman instead. But, if I change BBEdit's default  
encoding, or use the "Reopen Using Encoding" function, BBEdit
displays the file correctly.
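One way to take the editor out of the picture is to look at the bytes that actually landed in the file. The following is only a sketch: the helper utf8_bytes_written is a name invented here, and it writes to a temporary file rather than the path used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a string through the :utf8 layer, read the
# file back as raw bytes, and return those bytes as a hex string.
sub utf8_bytes_written {
    my ($string) = @_;

    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open my $out, '>:utf8', $path or die "Cannot write $path: $!";
    print {$out} $string;
    close $out or die "Cannot close $path: $!";

    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    local $/;                       # slurp the whole file
    my $bytes = <$in>;
    close $in;

    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# "Hëllö" written with escapes, so this script itself stays pure ASCII.
print utf8_bytes_written("H\x{EB}ll\x{F6}"), "\n";   # 48 C3 AB 6C 6C C3 B6
```

If the dump shows C3 AB for ë and C3 B6 for ö, the file is fine and any gibberish comes from how the editor interprets it.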


sherm--




Re: Writing utf 8 files

2006-06-22 Thread Tommy Nordgren


On 22 Jun 2006, at 20:29, Tommy Nordgren wrote:



On 22 Jun 2006, at 20:15, Sherm Pendley wrote:


On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:

How do I write proper UTF-8 characters to a file? I write only
two characters, and they come out as four garbage characters when
I view the file in an editor.


Quick answer:

open FH, ">:utf8", "file";

Complete answer:

perldoc perluniintro
perldoc PerlIO


	I've already tried that. That was what I was doing when I got
garbage.


I found the problem: it is necessary to
1) use the "use utf8" pragma;
2) explicitly write a BOM byte sequence immediately after opening the
file.

Point 2 is where I erred. I expected the BOM to be added automatically
when opening a file for writing with the utf-8 encoding.
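For reference, writing the BOM by hand as described in point 2 might look roughly like the sketch below. The helper names write_utf8_with_bom and read_raw are invented for this example, and it uses a temporary file; printing U+FEFF through the :utf8 layer is what produces the bytes EF BB BF.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to $path as UTF-8, preceded by a BOM.
sub write_utf8_with_bom {
    my ($path, $text) = @_;
    open my $fh, '>:utf8', $path or die "Cannot open $path: $!";
    print {$fh} "\x{FEFF}";     # BOM; the :utf8 layer encodes it as EF BB BF
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!";
}

# Hypothetical helper: read a file back as raw, undecoded bytes.
sub read_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    local $/;
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

write_utf8_with_bom($path, "\x{F6}");    # "ö", written with an escape
print join(' ', map { sprintf '%02X', ord($_) } split //, read_raw($path)), "\n";
# EF BB BF C3 B6
```

Perl does not add the BOM on its own for the :utf8 layer, which matches what was observed here; whether the BOM is wanted at all depends on the program that will read the file.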





Re: Writing utf 8 files

2006-06-22 Thread Sherm Pendley

On Jun 22, 2006, at 3:28 PM, Tommy Nordgren wrote:


On 22 Jun 2006, at 20:29, Tommy Nordgren wrote:



On 22 Jun 2006, at 20:15, Sherm Pendley wrote:


On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:

How do I write proper UTF-8 characters to a file? I write only
two characters, and they come out as four garbage characters when
I view the file in an editor.


Quick answer:

open FH, ">:utf8", "file";

Complete answer:

perldoc perluniintro
perldoc PerlIO

	I've already tried that. That was what I was doing when I got
garbage.


I found the problem: it is necessary to
1) use the "use utf8" pragma;


That's only needed if your actual Perl code is UTF-8 encoded, like my  
example was. If your UTF-8 data is coming from an external source,  
use utf8 has no effect.
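To make the effect of the pragma concrete, here is a small sketch. It assumes this script file itself is saved as UTF-8; the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($with_pragma, $without_pragma);

{
    use utf8;       # source literals are decoded as UTF-8 text
    $with_pragma = length("ö");      # one character
}

{
    no utf8;        # source literals are kept as raw bytes
    $without_pragma = length("ö");   # two bytes, C3 B6
}

print "with use utf8:    $with_pragma\n";
print "without use utf8: $without_pragma\n";
```

The pragma only changes how Perl reads the literals in the source file. Data read from a filehandle or another external source is not touched by it.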


sherm--




Re: Writing utf 8 files

2006-06-22 Thread John Delacour

At 9:28 pm +0200 22/6/06, Tommy Nordgren wrote:


On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:

How do I write proper UTF-8 characters to a file? I write only
two characters, and they come out as four garbage characters when
I view the file in an editor.


The only reason for that can be that you have your editor set to open 
files as MacRoman or some non-utf-8 charset.  Provided your editor 
prefs are set to open as utf-8 or you opt for utf-8 in the open file 
dialog you will not get this problem.




I found the problem: it is necessary to
1) use the "use utf8" pragma;
2) explicitly write a BOM byte sequence immediately after opening the file.

Point 2 is where I erred. I expected the BOM to be added automatically
when opening a file for writing with the utf-8 encoding.


You would need to give an example of what you are doing, but neither 
of those things should be necessary and nor should it be necessary to 
specify utf-8 when opening the filehandle as Sherm suggested.


The following script will write ö, utf8-encoded to trash.txt on 
the desktop:


#!/usr/bin/perl
my $text = "ö";
my $f = "$ENV{HOME}/desktop/trash.txt";
open F, ">$f" or die $!;
print F $text;
close F;

If you open the file as utf-8 you will see ö and if you open it as
MacRoman you will see √∂.  You could also open it as Traditional
Chinese or Simplified Chinese or many other things and see other 
things.  UTF-8 byte order is always the same, so there is no need for 
a BOM, though some editors might use it as a hint.
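The same two bytes read under two different charsets can be sketched with the core Encode module. The helper name codepoints is made up for this example; the output is shown as code points so that the script prints only ASCII.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that "ö" (U+00F6) occupies in a UTF-8 file.
my $bytes = "\xC3\xB6";

# Hypothetical helper: decode the octets with the given encoding and
# list the resulting characters as U+XXXX code points.
sub codepoints {
    my ($encoding, $octets) = @_;
    my $chars = decode($encoding, $octets);
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

print "as UTF-8:    ", codepoints('UTF-8',    $bytes), "\n";   # U+00F6
print "as MacRoman: ", codepoints('MacRoman', $bytes), "\n";   # U+221A U+2202
```

U+221A and U+2202 are the square root and partial differential signs, so one character written as UTF-8 shows up as two "garbage" characters in MacRoman, and two such characters show up as four.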


JD