RE: UTF-16 -> UTF-8

Rui Ribeiro Wed, 21 Nov 2001 15:12:35 -0800

Yes, Tim, I see your point. There is probably a relation with my problem. But it seems a bit strange that this happened using UTF-8, because Perl 5.6 seems to treat UTF-8 chars properly.

My big problem is that I simply can't use UTF-8, because MS databases only recognize UTF-16 or UCS-2 (I think they're the same, right?). They can't handle UTF-8 at all. If they could, I think there wouldn't be a problem.
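
On the UTF-16 versus UCS-2 question: the two produce identical bytes for characters in the Basic Multilingual Plane, but only UTF-16 can represent characters above U+FFFF, which it does with surrogate pairs. A minimal sketch, assuming a Perl that ships the core Encode module (5.8 or later, so newer than the 5.6 discussed here); the sample characters are arbitrary illustrations:

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00E7 (c with cedilla) lies in the Basic Multilingual Plane,
# so UCS-2 and UTF-16 encode it to the same two bytes.
my $bmp   = "\x{E7}";
my $ucs2  = encode('UCS-2LE',  $bmp);
my $utf16 = encode('UTF-16LE', $bmp);
print "BMP character: UCS-2 and UTF-16 agree\n" if $ucs2 eq $utf16;

# U+1D11E (musical G clef) lies outside the BMP. UTF-16 encodes it
# as a surrogate pair (four bytes); UCS-2 has no way to express it.
my $clef     = "\x{1D11E}";
my $clef_hex = unpack('H*', encode('UTF-16LE', $clef));
print "U+1D11E as UTF-16LE: $clef_hex\n";
```

For typical Western European text the distinction never shows up, which is probably why the two names are often used interchangeably.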

It's just a pity that in Perl UTF-8 is the "native" format for Unicode support, at least on Windows, where UTF-16 is the native format. That is how you end up with this kind of problem.

But there are other inconsistencies even in the Windows "universe". If you want to output database Unicode content to a browser, you need to use UTF-8 when using IIS. Fortunately, the conversion from UTF-16 to UTF-8 can be done automatically using several methods from ASP objects (I know that because my database content in Unicode will have to be viewed in a browser and I had to test it). So it is a rather messy situation.

Thanks for your tip.

Regards,

Rui

-----Original Message-----
From: Tim Scott [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 21 November 2001 22:37
To: Rui Ribeiro; Philip Newton
Cc: [EMAIL PROTECTED]
Subject: RE: UTF-16 -> UTF-8

I don't know if this will help / is related or whatever, but I did find that when processing UTF-8 data in an Oracle database Perl *seemed* to corrupt the data beyond recognition, until I built it as a freestanding executable using the Perl Dev Kit from ActiveState - it then all worked fine.
Having already obtained a license for the PDK I thought nothing more of it, just made a note that it needs doing. Might a similar thing resolve your problem ?
By 'beyond recognition' : the script was asked to store two particular bytes which I expected to represent a particular glyph, but it actually stored two entirely different bytes which were represented by some punctuation when displayed in the application. I had been careful at all stages to ensure that the environment and the database were set to use UTF-8, and had changed nothing in the environment or the script to get it working - apart from building the executable.
Maybe it's a clue. Maybe it's a red herring. PDK's free to try for a week ...
[ you may need to 'require DBD::ODBC;' to get it to build entirely freestanding ]
Regards,
Tim
Rui Ribeiro <[EMAIL PROTECTED]> wrote:
Philip,

I think the problem still lies with Perl. Not with Unicode::String though. My guess is this:

When adding the unicode value to the Sql string in
$sql="INSERT INTO Tipo_Referencia ( Descricao ) VALUES ('$palavra_utf16');";
there is an implicit conversion from the Unicode::String object to a common Perl String value. The
common Perl String value doesn't "understand" Unicode, so it treats the multibyte char as several
single byte chars and writes them to Access that way.

I've tried another method to write to the database. But there is also an implicit conversion in this
instruction:

$rs->{"Descricao"} = $palavra_utf16;

$rs is the dynamic recordset to which I'll add a new record, and "Descricao" is the field name to
which I intended to add the Unicode value.

So I think (better to say, I guess) the problem may lie with the fact that Perl doesn't have native
support for Unicode in UTF-16 format (and Access doesn't have it for UTF-8!). So using the functions
/ methods available to write to an Access database from Perl, there will always be a conversion to
something other than the UTF-16 recognized by Access, before the value is actually written.
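
The "one multibyte char treated as several single byte chars" effect guessed at above can be seen directly by looking at the bytes. This is only a sketch, assuming the core Encode module (Perl 5.8 or later) rather than Unicode::String, and using c with cedilla as a hypothetical sample character; it does not touch Access or the recordset at all:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $char = "\x{E7}";    # c with cedilla, a hypothetical sample character

# The same character as raw bytes in the two encodings under discussion.
my $utf8    = encode('UTF-8',    $char);
my $utf16le = encode('UTF-16LE', $char);
printf "UTF-8 bytes:    %s\n", unpack('H*', $utf8);
printf "UTF-16LE bytes: %s\n", unpack('H*', $utf16le);

# If a consumer reads the UTF-8 bytes as one-byte-per-character
# Latin-1, the single character turns into two unrelated ones.
my $misread = decode('ISO-8859-1', $utf8);
printf "characters after misreading: %d\n", length $misread;
```

Whichever layer sits between the script and the database has to be handed the byte form it expects; a mismatch at that boundary would produce exactly this kind of split character.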

I guess I'll have to handle my special chars outside Perl. It's less elegant, but probably easier to
solve.

Once again your insights have been very instructive. Thank you so much for your help.
Best regards.

Rui

> -----Original Message-----
> From: Philip Newton [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 21 November 2001 18:29
> To: Rui Ribeiro
> Cc: [EMAIL PROTECTED]
> Subject: Re: UTF-16 -> UTF-8
>
>
> On Wed, 21 Nov 2001 16:34:48 -0000, in perl.unicode you wrote:
>
> > Don't lose more time over this. It seems there is some kind of problem with
> > the recognition of the encoding from other Office apps.
> > It's rather surprising that Notepad recognizes the characters properly and
> > Word and Access don't.
>
> Would it maybe help to add a BOM (byte order mark) at the beginning of
> the file?
>
> Anyway, I suppose you can now ask more questions on a Word or Access
> list; the Perl part appears to work now, as far as I can see.
>
> Cheers,
> Philip
>
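
Philip's BOM suggestion could be tried along these lines. This is a sketch under assumptions: it relies on the core Encode module (Perl 5.8 or later), and the file name and sample text are hypothetical. Encoding U+FEFF as UTF-16LE yields the bytes FF FE, which is the little-endian byte order mark.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample content with Windows line endings.
my $text = "Descri\x{E7}\x{E3}o\r\n";

# Prepend U+FEFF before encoding; in UTF-16LE it becomes FF FE.
my $bytes = encode('UTF-16LE', "\x{FEFF}" . $text);

# Hypothetical output file name.
my $file = 'sample-utf16.txt';
open my $fh, '>', $file or die "cannot open $file: $!";
binmode $fh;    # write the bytes untouched, no newline translation
print {$fh} $bytes;
close $fh or die "cannot close $file: $!";
```

Whether Word or Access then picks the right encoding is something only a test on the target machine can confirm.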
