Dear Martin,

I can use Perl 5.6; in fact, I'm already using it. Thank you for your code, but I need to write the converted words in UTF-16 to a database, not to a text file. We were writing to the text file only to check whether the conversion was being done properly. Our real objective was (and still is) to read a text in UTF-8, parse it (for which Perl seems to be the best option), and write the parts resulting from the parse to a database (Access to start with, then MS SQL Server).
Nevertheless, thank you for your suggestions and your code.

Best regards,
Rui Ribeiro

> Dear Rui,
>
> I probably missed the start of this thread where you said that you couldn't
> use Perl 5.6. But if you could use Perl 5.6, then something like this would
> work:
>
>   open(INFILE, "<$ARGV[0]") || die "Can't read $ARGV[0]";
>   open(OUTFILE, ">$ARGV[1]") || die "Can't write $ARGV[1]";
>   binmode OUTFILE;
>
>   print OUTFILE pack('v', 0xfeff);
>   while(<INFILE>)
>   {
>       s/\n$/\015\012/o;
>       print OUTFILE pack('v*', unpack('U*', $_));
>   }
>
>   close(OUTFILE);
>   close(INFILE);
>
> Some thoughts on this code, which I use.
>
> 1. It is set up to work in the Windows environment, hence the newline tidy-ups
>    and the use of 'v' for packing.
> 2. It doesn't support surrogates. But you could get around this by changing
>    the key line to something like (this is untested):
>
>   $s = $_;
>   print OUTFILE pack('v*', map { $_ > 0xFFFF
>       ? ((($_ - 0x10000) >> 10) + 0xD800, (($_ - 0x10000) & 0x3FF) + 0xDC00)
>       : $_ } unpack('U*', $s));
>
> HTH,
> Martin
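For the surrogate case Martin mentions, here is a minimal Perl sketch. It assumes the input has already been decoded into a Perl character string. The helper name `utf16le_bytes` and its optional BOM argument are hypothetical. The helper builds the UTF-16LE bytes in memory instead of printing them to a file. It also subtracts 0x10000 before splitting a code point above U+FFFF into a high and a low surrogate.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): encode a Perl character string
# as UTF-16LE bytes, splitting code points above U+FFFF into surrogate pairs.
sub utf16le_bytes {
    my ($str, $with_bom) = @_;
    my @units;
    push @units, 0xFEFF if $with_bom;    # optional byte-order mark

    for my $ch (split //, $str) {
        my $cp = ord $ch;
        if ($cp > 0xFFFF) {
            my $v = $cp - 0x10000;                  # 20-bit offset
            push @units, 0xD800 + ($v >> 10),       # high surrogate: top 10 bits
                         0xDC00 + ($v & 0x3FF);     # low surrogate: bottom 10 bits
        }
        else {
            push @units, $cp;                       # BMP character: one code unit
        }
    }

    # 'v' packs each 16-bit code unit little-endian, as in the original code.
    return pack('v*', @units);
}
```

The result is returned rather than printed, so it can be handed to whatever database layer you use, for example as a bound parameter in a DBI insert. Whether a given driver and column type expect raw UTF-16 bytes or a Perl character string is worth checking separately.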