RE: UTF-16 -> UTF-8

Rui Ribeiro Tue, 20 Nov 2001 16:18:59 -0800

Philip,

Thank you for your help.
This work is being done by a couple of my students, so I only sent you one of the
results of their experiments. They have tried other things as well, so I'll add a few
comments inline below.



> On Tue, 20 Nov 2001 16:35:25 -0000, in perl.unicode you wrote:
>
> > open(FICH1,"fich1.txt")||die"Could not open the file fich1.txt";
> > open(FICH3,">fich3.txt")||die"Could not open the file fich3.txt";
>
> Good that you check for success, but you should also include the reason
> -- it's in $!. For example:
>
>     open(FICH1, "fich1.txt") || die "Could not open " .
>                                     "the file fich1.txt: $!";
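For reference, here is a minimal sketch of the same check written with three-argument open and a lexical filehandle. The file name is hypothetical and assumed not to exist, so the sketch exercises the failure path and shows the reason that $! adds to the message:

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist, so that open() fails.
my $missing = 'ficheiro_inexistente.txt';

# Three-argument open with a lexical filehandle; on failure $! holds the
# operating system's reason (e.g. "No such file or directory").
my $ok = eval {
    open(my $fh, '<', $missing)
        or die "Could not open the file $missing: $!\n";
    close($fh);
    1;
};
my $error = $ok ? '' : $@;

print $ok ? "opened $missing\n" : "failed: $error";
```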
>
> > use utf8;

Yes, there is no need for it.

>
> You shouldn't need that. Unicode::String will do all the Unicodery for
> you; your program only needs to handle 'plain' bytes.
>
> > while (<FICH1>) {
> >     chomp($_);
> >     $palavra1=$_;
> >     @array=split(/ /,$palavra1);
>
> What do you use $palavra1 and @array for? (And @array is usually a bad
> variable name.)

Yes, quite true. I guess they left it from some experiment and I overlooked it.

> >     $palavra2=utf16($_);
>
> Here is a mistake. If you call utf16($_), it means "$_ is a string
> encoded in UTF-16. Take it and convert it into a Unicode::String
> object."
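To make the direction of utf16() concrete, here is a sketch that uses the core Encode module instead of Unicode::String (a swapped-in technique, not the module discussed here). It decodes the same UTF-8 bytes twice: once correctly as UTF-8, and once wrongly as UTF-16BE, which is roughly analogous to passing a UTF-8 line to utf16(). The sample word is invented:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Helper: list the code points of a character string as "U+XXXX ...".
sub code_points { join ' ', map { sprintf 'U+%04X', ord($_) } split //, $_[0] }

# "olá" encoded as UTF-8: the bytes 6F 6C C3 A1 (assumed sample input).
my $line = "ol\xc3\xa1";

my $as_utf8  = decode('UTF-8',    $line);  # correct: 3 characters
my $as_utf16 = decode('UTF-16BE', $line);  # wrong: bytes paired into 2 unrelated characters

print "UTF-8 view:  ", code_points($as_utf8),  "\n";  # U+006F U+006C U+00E1
print "UTF-16 view: ", code_points($as_utf16), "\n";  # U+6F6C U+C3A1
```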

We've tried with utf8(). It reads correctly, and it writes correctly as long as you
write the output in UTF-8.
>
> But you said you wanted to convert from UTF-8 to UTF-16. So you probably
> want something like
>
>     $palavra_objeito = utf8($_);
>     $palavra_em_utf16 = $palavra_objeito->utf16;

We've tried just that and the result wasn't what we expected...
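The two-call pattern suggested above (decode from UTF-8, then produce UTF-16) can also be sketched with the core Encode module. This is an assumed equivalent, not Unicode::String's own API, and the sample word is invented:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $utf8_bytes = "ol\xc3\xa1";                    # "olá" as UTF-8 (sample input)
my $characters = decode('UTF-8', $utf8_bytes);     # call 1: bytes -> characters
my $utf16_be   = encode('UTF-16BE', $characters);  # call 2: characters -> UTF-16BE bytes

print unpack('H*', $utf16_be), "\n";               # prints 006f006c00e1
```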
>
> Note that ->utf16 will return UTF-16BE, as I understand it, since
> "Internally a Unicode::String object is a string of 2 byte values in
> network byte order (big-endian)" (quote from the docs). So if your
> database and/or file wants UTF-16LE (which is more natural for Intel
> chips), then you need to do something such as
>
>     $palavra_objeito->byteswap;

Now there's something we didn't try.
>
> first (after you assign to $palavra_objeito and before you call ->utf16)
> to convert from big-endian to little-endian.
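What a byte swap does can be sketched with pack/unpack: read the UTF-16BE string as big-endian 16-bit units ("n*") and write them back little-endian ("v*"). This sketch does not use Unicode::String's byteswap; it assumes the core Encode module and only checks that the swapped bytes match Encode's own UTF-16LE output:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $word     = "ol\x{e1}";                 # "olá" as a Perl character string
my $utf16_be = encode('UTF-16BE', $word);  # bytes 00 6F 00 6C 00 E1

# Swap each 16-bit unit: unpack as big-endian ("n"), repack as little-endian ("v").
my $utf16_le = pack('v*', unpack('n*', $utf16_be));

print unpack('H*', $utf16_le), "\n";       # prints 6f006c00e100
print $utf16_le eq encode('UTF-16LE', $word) ? "matches UTF-16LE\n" : "mismatch\n";
```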

>
> >     $sql = "INSERT INTO Tipo_Referencia ( Descricao ) SELECT '$palavra2' AS Expr1;";
>
> Is there a reason why you don't write this as
>
>     $sql = "INSERT INTO Tipo_Referencia ( Descricao ) " .
>            "VALUES ('$palavra_em_utf16')";

Not really, but the previous syntax has worked many times.
>
> ? The "INSERT INTO table (columns) VALUES (literals)" is, for me, the
> usual syntax, and "INSERT INTO table (columns) SELECT literals AS dummy"
> looks strange to me.

Maybe. I just copied the syntax from an Access query, and it worked on many occasions.
Even writing a UTF-8 value worked with that syntax, although Access obviously didn't make
much sense of it, since UTF-8 isn't really something it "understands".

But your syntax is the correct one (and the one that follows the SQL standard).
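A sketch of building the VALUES form from a plain Perl string follows. The helper sql_literal and the sample value are hypothetical; doubling single quotes is the standard way to embed a quote in an SQL string literal, and it keeps a stray apostrophe from breaking the statement:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a value in single quotes, doubling any quotes inside it.
sub sql_literal {
    my ($value) = @_;
    (my $escaped = $value) =~ s/'/''/g;
    return "'$escaped'";
}

my $descricao = "Livro d'autor";   # invented sample value
my $sql = "INSERT INTO Tipo_Referencia ( Descricao ) "
        . "VALUES (" . sql_literal($descricao) . ")";

print "$sql\n";
```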

>
> >     print FICH3 $palavra2,"\n";
> >     $conn->execute($sql,,,adExecuteNoRecords);
>
> This is the same as
>
>     $conn->execute($sql,adExecuteNoRecords);
>
If the constant adExecuteNoRecords has to be the fourth parameter to
> ->execute, then say so:
>
>     $conn->execute($sql, undef, undef, adExecuteNoRecords);
>
> Perl isn't Visual Basic :)

There, you caught me. I'm much more fluent in VB than in Perl, and I was the one who
gave my students the ADO code...
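The point about the empty commas is easy to check: Perl simply drops them, so only explicit undef values keep the later argument in fourth position. In this sketch, count_args is a hypothetical stand-in for ->execute that only reports how many arguments it received:

```perl
use strict;
use warnings;

# Hypothetical stand-in for ->execute: reports how many arguments it got.
sub count_args { return scalar @_ }

my $collapsed = count_args('SQL', , , 'FLAG');            # empty slots vanish
my $explicit  = count_args('SQL', undef, undef, 'FLAG');  # placeholders kept

print "$collapsed $explicit\n";   # prints "2 4"
```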
>

> To summarise, I think you have misunderstood how Unicode::String works.
> utf16() (called as a function, not a method) doesn't convert a string
> *to* UTF-16, it expects a string in UTF-16 and converts *from* that
> encoding into the internal format used by Unicode::String and returns an
> object. Then you can call methods on that object to produce another
> encoding such as UTF-8 or Latin-1 or whatever. So conversions involving
> Unicode::String generally involve at least two calls.
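Going the other way (UTF-16 to UTF-8, as in the subject line) follows the same pattern: one call to decode into characters, then one call per target encoding. A sketch with the core Encode module rather than Unicode::String, assuming the stored data is UTF-16LE:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $from_db = "o\x00l\x00\xe1\x00";          # "olá" as UTF-16LE bytes (assumed input)
my $chars   = decode('UTF-16LE', $from_db);  # call 1: decode into characters
my $utf8    = encode('UTF-8', $chars);       # call 2a: characters -> UTF-8
my $latin1  = encode('ISO-8859-1', $chars);  # call 2b: characters -> Latin-1

print unpack('H*', $utf8), " ", unpack('H*', $latin1), "\n";  # prints 6f6cc3a1 6f6ce1
```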

Not quite, but it was clearly a bad example, so your conclusions are justified.
I'll try your suggestions and let you know about the result.

Thank you for your time and your help.

Regards.

Rui
