Philip,

Thank you for your help.
This work is being done by a couple of students of mine, so I just sent
you one of the results of the experiments. But they have tried other
things, so I'll make some localized comments below.


> On Tue, 20 Nov 2001 16:35:25 -0000, in perl.unicode you wrote:
>
> > open(FICH1,"fich1.txt")||die"Nao foi possivel abrir o ficheiro fich1.txt";
> > open(FICH3,">fich3.txt")||die"Nao foi possivel abrir o ficheiro fich3.txt";
>
> Good that you check for success, but you should also include the reason
> -- it's in $!. For example:
>
>     open(FICH1, "fich1.txt") || die "Nao foi possivel abrir " .
>                                     "o ficheiro fich1.txt: $!";
>
> > use utf8;

Yes, there is no need for it.

>
> You shouldn't need that. Unicode::String will do all the Unicodery for
> you; your program only needs to handle 'plain' bytes.
>
> > while (<FICH1>) {
> >     chomp($_);
> >     $palavra1=$_;
> >     @array=split(/ /,$palavra1);
>
> What do you use $palavra1 and @array for? (And @array is usually a bad
> variable name.)

Yes, quite true. I guess they left it in from some experiment and I
overlooked it.

> >     $palavra2=utf16($_);
>
> Here is a mistake. If you call utf16($_), it means "$_ is a string
> encoded in UTF-16. Take it and convert it into a Unicode::String
> object."

We've tried with utf8. It reads well, and it writes well as long as you
write it out in UTF-8.
>
> But you said you wanted to convert from UTF-8 to UTF-16. So you probably
> want something like
>
>     $palavra_objeito = utf8($_);
>     $palavra_em_utf16 = $palavra_objeito->utf16;

We've tried just that and the result wasn't what we expected...
>
> Note that ->utf16 will return UTF-16BE, as I understand it, since
> "Internally a Unicode::String object is a string of 2 byte values in
> network byte order (big-endian)" (quote from the docs). So if your
> database and/or file wants UTF-16LE (which is more natural for Intel
> chips), then you need to do something such as
>
>     $palavra_objeito->byteswap;

Now there's something we didn't try.
>
> first (after you assign to $palavra_objeito and before you call ->utf16)
> to convert from big-endian to little-endian.

>
> >     $sql =  "INSERT INTO Tipo_Referencia ( Descricao ) SELECT '$palavra2' AS Expr1;";
>
> Is there a reason why you don't write this as
>
>     $sql = "INSERT INTO Tipo_Referencia ( Descricao ) " .
>            "VALUES ('$palavra_em_utf16')"

Not really, but the previous syntax has worked many times.
>
> ? The "INSERT INTO table (columns) VALUES (literals)" is, for me, the
> usual syntax, and "INSERT INTO table (columns) SELECT literals AS dummy"
> looks strange to me.

Maybe; I just copied the syntax from an Access query. It worked on many
occasions. Even writing a UTF-8 value worked with that syntax. Obviously
Access didn't make much sense of it, as UTF-8 isn't really something it
"understands".

But your syntax is the more correct one (and the one that respects the
SQL standard).

>
> >     print FICH3 $palavra2,"\n";
> >     $conn->execute($sql,,,adExecuteNoRecords);
>
> This is the same as
>
>     $conn->execute($sql,adExecuteNoRecords);
>
> . If the constant adExecuteNoRecords has to be the fourth parameter to
> ->execute, then say so:
>
>     $conn->execute($sql, undef, undef, adExecuteNoRecords);
>
> . Perl isn't Visual Basic :)

There, you caught me. I'm much more fluent in VB than in Perl, and I was
the one who gave my students the ADO code...
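
A minimal sketch of why the empty argument slots vanish in Perl. Note
that count_args is a hypothetical stand-in for ->execute that only
reports how many arguments it received, and the string values are
placeholders:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $conn->execute: returns its argument count.
sub count_args { return scalar @_ }

# Consecutive commas do not create empty slots; the list collapses.
my $with_gaps = count_args('sql',,,'flag');

# Explicit undef values keep the fourth position in place.
my $with_undefs = count_args('sql', undef, undef, 'flag');

print "with gaps:   $with_gaps\n";    # prints "with gaps:   2"
print "with undefs: $with_undefs\n";  # prints "with undefs: 4"
```

So a constant meant for the fourth parameter only lands there when the
skipped positions are filled with undef.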
>

> To summarise, I think you have misunderstood how Unicode::String works.
> utf16() (called as a function, not a method) doesn't convert a string
> *to* UTF-16, it expects a string in UTF-16 and converts *from* that
> encoding into the internal format used by Unicode::String and returns an
> object. Then you can call methods on that object to produce another
> encoding such as UTF-8 or Latin-1 or whatever. So conversions involving
> Unicode::String generally involve at least two calls.

Not quite, but it is clear that it was a bad example and your conclusions
are, therefore, justified.
I'll try your suggestions and let you know about the result.

Thank you for your time and your help.

Regards.

Rui
