RE: UTF-16 -> UTF-8

Rui Ribeiro Wed, 21 Nov 2001 14:07:51 -0800

Dear Martin,

I can use Perl 5.6; in fact, I am using it. Thank you for your code, but I need to write
the words converted to UTF-16 to a database, not to a text file. We were using the text
file for output only to check that the conversion was being done properly. Our real
objective was (and still is) to read a text in UTF-8, parse it (for which Perl seems to be
the best option), and write the parts resulting from the parse to a database (Access to
start with, and later MS SQL Server).


Nevertheless, thank you for your suggestions and your code.

Best regards.

Rui Ribeiro

> Dear Rui,
>
> I probably missed the start of this thread where you said that you couldn't
> use Perl 5.6. But if you could use Perl 5.6, then something like this would
> work:
>
> open(INFILE, "<$ARGV[0]") || die "Can't read $ARGV[0]";
> open(OUTFILE, ">$ARGV[1]") || die "Can't write $ARGV[1]";
> binmode OUTFILE;
>
> print OUTFILE pack('v', 0xfeff);
> while(<INFILE>)
> {
>     s/\n$/\015\012/o;
>     print OUTFILE pack('v*', unpack('U*', $_));
> }
>
> close(OUTFILE);
> close(INFILE);
>
> Some thoughts on this code, which I use.
>
> 1. It is set up to work in the Windows environment, hence the newline tidy-ups and the
> use of 'v' for packing.
> 2. It doesn't support surrogates. But you could get around this by changing the key line
> to something like (this is untested):
>
> $s = $_;
> print OUTFILE pack('v*', map { $_ > 0xFFFF
>     ? ((($_ - 0x10000) >> 10) + 0xD800, ($_ & 0x3FF) + 0xDC00)
>     : $_ } unpack('U*', $s));
>
> HTH,
> Martin
>
>
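
Since the converted text is meant for a database rather than a file, the same pack-based
idea can be kept in memory instead of being printed to OUTFILE. The following is only a
minimal sketch under stated assumptions: the helper name to_utf16le is hypothetical (it does
not appear in the thread), and the input is assumed to be a Perl string that already holds
Unicode characters rather than raw UTF-8 bytes. How a given database driver expects to
receive UTF-16 data is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): encode a string of Unicode
# characters as UTF-16LE bytes, splitting code points above 0xFFFF into
# surrogate pairs. Assumes $chars holds characters, not raw UTF-8 bytes.
sub to_utf16le {
    my ($chars) = @_;
    my @units;
    for my $cp (map { ord } split //, $chars) {
        if ($cp > 0xFFFF) {
            my $v = $cp - 0x10000;               # 20-bit offset above the BMP
            push @units, 0xD800 + ($v >> 10),    # high (lead) surrogate
                         0xDC00 + ($v & 0x3FF);  # low (trail) surrogate
        } else {
            push @units, $cp;                    # BMP code point: one 16-bit unit
        }
    }
    return pack('v*', @units);                   # 'v' = 16-bit little-endian
}

# "A" followed by U+1D11E (MUSICAL SYMBOL G CLEF), shown as hex bytes.
my $bytes = to_utf16le("A" . chr(0x1D11E));
print join(' ', map { sprintf '%02X', $_ } unpack('C*', $bytes)), "\n";
# prints: 41 00 34 D8 1E DD
```

The return value is an ordinary byte string with no byte order mark, so it can be kept in a
variable and handed on to whatever does the database write. Whether a BOM or a different
byte order is wanted depends on the consumer.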
