RE: UTF-16 -> UTF-8

Tim Scott Wed, 21 Nov 2001 14:35:46 -0800

I don't know whether this helps or is even related, but I found that when processing UTF-8 data in an Oracle database, Perl *seemed* to corrupt the data beyond recognition, until I built the script as a freestanding executable using the Perl Dev Kit from ActiveState. After that it all worked fine.

Having already obtained a license for the PDK, I thought nothing more of it and just made a note that the build step is needed. Might something similar resolve your problem?

By 'beyond recognition' I mean this: the script was asked to store two particular bytes, which I expected to represent a particular glyph, but it actually stored two entirely different bytes, which the application displayed as punctuation. I had been careful at every stage to ensure that the environment and the database were set to use UTF-8, and I changed nothing in the environment or the script to get it working, apart from building the executable.

Maybe it's a clue. Maybe it's a red herring. PDK's free to try for a week ...
[ you may need to 'require DBD::ODBC;' to get it to build entirely freestanding ]

Regards,
Tim

Rui Ribeiro <[EMAIL PROTECTED]> wrote:

Philip,

I think the problem still lies with Perl. Not with Unicode::String though. My guess is this:

When the Unicode value is added to the SQL string in
$sql="INSERT INTO Tipo_Referencia ( Descricao ) VALUES ('$palavra_utf16');";
there is an implicit conversion from the Unicode::String object to an ordinary Perl string. An
ordinary Perl string doesn't "understand" Unicode, so it treats each multibyte character as several
single-byte characters and writes them to Access that way.
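
To make the guess above concrete, here is a minimal sketch of how a byte string holding UTF-16 data looks to Perl once it is interpolated into SQL. It assumes the Encode module (bundled with later Perl releases) rather than Unicode::String, and the sample text is hypothetical, so it illustrates the byte-versus-character distinction and does not reproduce Unicode::String's own stringification.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample value: c-cedilla, a-tilde, o (3 characters).
my $text = "\x{E7}\x{E3}o";

# Encode the same characters two ways; both results are plain byte strings.
my $utf16 = encode('UTF-16LE', $text);   # 2 bytes per character, no BOM
my $utf8  = encode('UTF-8',    $text);   # 2 bytes for each accented letter, 1 for "o"

printf "chars: %d, UTF-16LE bytes: %d, UTF-8 bytes: %d\n",
    length($text), length($utf16), length($utf8);

# Interpolation copies the bytes as they are: every byte becomes its own
# one-byte "character" in the SQL text, including the NUL bytes of UTF-16.
my $sql = "INSERT INTO Tipo_Referencia ( Descricao ) VALUES ('$utf16');";
printf "SQL length: %d\n", length($sql);
```

Whether the driver or Access then interprets those bytes as intended is a separate question that this sketch does not answer.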

I've tried another method to write to the database. But there is also an implicit conversion in this
instruction:

$rs->{"Descricao"} = $palavra_utf16;

$rs is the dynamic recordset to which I'll add a new record, and "Descricao" is the field name to
which I intended to add the Unicode value.

So I think (or rather, I guess) the problem may lie in the fact that Perl has no native
support for Unicode in UTF-16 format (and Access has none for UTF-8!). So with the functions
and methods available for writing to an Access database from Perl, there will always be a conversion to
something other than the UTF-16 recognized by Access before the value is actually written.
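
For reference, the conversion named in the subject line can be done explicitly instead of being left to an implicit step. The following is a rough sketch only, assuming the Encode module is available; the input bytes are a hypothetical example and nothing here touches Access.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-16LE bytes for "S", a-tilde, "o".
my $utf16_bytes = "S\x00\xE3\x00o\x00";

# Step 1: bytes -> characters. Step 2: characters -> UTF-8 bytes.
my $chars      = decode('UTF-16LE', $utf16_bytes);
my $utf8_bytes = encode('UTF-8', $chars);

# Show the resulting bytes in hex.
print join(' ', map { sprintf '%02X', ord } split //, $utf8_bytes), "\n";
# prints "53 C3 A3 6F"
```

Decoding first and encoding second keeps it obvious at each point whether a variable holds characters or bytes.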

I guess I'll have to handle my special chars outside Perl. It's less elegant, but probably easier to
solve.

Once again your insights have been very instructive. Thank you so much for your help.
Best regards.

Rui

> -----Original Message-----
> From: Philip Newton [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 21 November 2001 18:29
> To: Rui Ribeiro
> Cc: [EMAIL PROTECTED]
> Subject: Re: UTF-16 -> UTF-8
>
>
> On Wed, 21 Nov 2001 16:34:48 -0000, in perl.unicode you wrote:
>
> > Don't lose more time over this. It seems there is some kind of problem with
> > the recognition of the encoding from other Office apps.
> > It's rather surprising that Notepad recognizes the characters properly and
> > Word and Access don't.
>
> Would it maybe help to add a BOM (byte order mark) at the beginning of
> the file?
>
> Anyway, I suppose you can now ask more questions on a Word or Access
> list; the Perl part appears to work now, as far as I can see.
>
> Cheers,
> Philip
>
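
Philip's BOM suggestion could be tried along these lines. This is a sketch under assumptions: it uses the Encode module, the sample text is hypothetical, and whether Word or Access actually honor the BOM is not something the thread confirms.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample text containing two non-ASCII characters.
my $text = "Descri\x{E7}\x{E3}o";

# Encode's plain "UTF-16" prepends a BOM and writes big-endian data.
my $with_bom = encode('UTF-16', $text);

# "UTF-16LE" writes no BOM, so for little-endian output add FF FE by hand.
my $le_with_bom = "\xFF\xFE" . encode('UTF-16LE', $text);

printf "BE BOM: %vX, LE BOM: %vX\n",
    substr($with_bom, 0, 2), substr($le_with_bom, 0, 2);
```

When writing either string to a file on Windows, the filehandle should be in binmode so that no newline translation alters the bytes.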
