> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Kalarness, Bill
> Sent: 28 March 2006 14:34
> To: PerlWin32Users
> Subject: How to read/write text files with "unicode" encoding?
> 
> 
> > Hi,
> > 
> > I'm trying to read the contents of one or more text files, with the
> > ultimate goal of sorting, merging, and filtering the contents.

Assuming you are talking about 'Windows Unicode', the encoding you are
after is UTF-16LE. I replaced your open IN with:

open IN, "<:raw:encoding(UTF-16LE):crlf", $glossaryout
    or die "Cannot open $glossaryout: $!";

and your open OUT with:

open OUT, ">:raw:encoding(UTF-16LE):crlf", $newout
    or die "Cannot open $newout: $!";

For output, Windows then likes you to write a byte order mark (BOM)
before any other data, as follows:

print OUT "\x{feff}";

Printing that character throws a warning, which I have not found a way
to suppress, but the resulting file looks good, and a quick look in a
hex viewer suggests your data is being preserved.
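
To make the steps above concrete, here is a minimal sketch of the whole
round trip: read a UTF-16LE file, sort its lines, and write them back
with a BOM. It is not the attached script. The file names and sample
data are hypothetical, and lexical filehandles are swapped in for the
bareword IN and OUT. It also assumes you want to strip the BOM on input,
since the UTF-16LE layer does not consume it and it would otherwise
arrive as U+FEFF at the start of the first line.

```perl
use strict;
use warnings;

# Hypothetical file names, for illustration only.
my $in_file  = 'unicode_sketch_in.txt';
my $out_file = 'unicode_sketch_out.txt';

# Create a small UTF-16LE sample file (with BOM) so the sketch runs on its own.
open my $seed, '>:raw:encoding(UTF-16LE):crlf', $in_file
    or die "Cannot create $in_file: $!";
print {$seed} "\x{FEFF}", "pear\n", "apple\n", "caf\x{E9}\n";
close $seed or die "Cannot close $in_file: $!";

# Read: decode UTF-16LE, then let :crlf turn CRLF into "\n".
open my $in, '<:raw:encoding(UTF-16LE):crlf', $in_file
    or die "Cannot open $in_file: $!";
my @lines = <$in>;
close $in;

# The BOM comes through as an ordinary character; drop it before sorting.
$lines[0] =~ s/\A\x{FEFF}// if @lines;

# Plain code point sort; substitute whatever ordering you actually need.
my @sorted = sort @lines;

# Write: BOM first, then the data; :crlf turns "\n" back into CRLF.
open my $out, '>:raw:encoding(UTF-16LE):crlf', $out_file
    or die "Cannot open $out_file: $!";
print {$out} "\x{FEFF}", @sorted;
close $out or die "Cannot close $out_file: $!";
```

If you open the output in a hex viewer, the first two bytes should be
FF FE, which is the BOM as encoded in UTF-16LE.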

I'll be honest and admit I don't completely understand all this, but I
fiddled for ages a while back when having to create files which were
'unicode', and this set of options seemed to do the job.

I gave up on your example two!

Modified script attached - hope it helps.

Cheers,
Paul

Attachment: unicode_test.pl
