Philip,

Thank you for your help. This work is being done by a couple of students of mine, so I sent you just one of the results of their experiments; they have tried other things as well. I'll make some localized comments below.

> On Tue, 20 Nov 2001 16:35:25 -0000, in perl.unicode you wrote:
>
> > open(FICH1,"fich1.txt")||die"Nao foi possivel abrir o ficheiro fich1.txt";
> > open(FICH3,">fich3.txt")||die"Nao foi possivel abrir o ficheiro fich3.txt";
>
> Good that you check for success, but you should also include the reason
> -- it's in $!. For example:
>
>     open(FICH1, "fich1.txt") || die "Nao foi possivel abrir " .
>         "o ficheiro fich1.txt: $!";
>
> > use utf8;

Yes, there is no need for it.

>
> You shouldn't need that. Unicode::String will do all the Unicodery for
> you; your program only needs to handle 'plain' bytes.
>
> > while (<FICH1>) {
> >     chomp($_);
> >     $palavra1=$_;
> >     @array=split(/ /,$palavra1);
>
> What do you use $palavra1 and @array for? (And @array is usually a bad
> variable name.)

Yes, quite true. I guess they left it over from some experiment and I overlooked it.

>
> >     $palavra2=utf16($_);
>
> Here is a mistake. If you call utf16($_), it means "$_ is a string
> encoded in UTF-16. Take it and convert it into a Unicode::String
> object."

We've tried it with utf8. It reads well, and it writes well as long as you write it in UTF-8.

>
> But you said you wanted to convert from UTF-8 to UTF-16. So you probably
> want something like
>
>     $palavra_objeito = utf8($_);
>     $palavra_em_utf16 = $palavra_objeito->utf16;

We've tried just that and the result wasn't what we expected...

>
> Note that ->utf16 will return UTF-16BE, as I understand it, since
> "Internally a Unicode::String object is a string of 2 byte values in
> network byte order (big-endian)" (quote from the docs). So if your
> database and/or file wants UTF-16LE (which is more natural for Intel
> chips), then you need to do something such as
>
>     $palavra_objeito->byteswap;

Now there's something we didn't try.

>
> first (after you assign to $palavra_objeito and before you call ->utf16)
> to convert from big-endian to little-endian.
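
For reference, a minimal sketch of the UTF-8 to UTF-16LE conversion discussed above. It is not the Unicode::String code from this thread: it assumes a Perl that ships the core Encode module (5.8 or later), and the helper name utf8_to_utf16le is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the thread): take UTF-8 bytes and
# return the same text as UTF-16LE bytes.
sub utf8_to_utf16le {
    my ($utf8_bytes) = @_;
    my $text = decode('UTF-8', $utf8_bytes);   # bytes -> Perl character string
    return encode('UTF-16LE', $text);          # characters -> little-endian bytes
}

# "caf" followed by e-acute, which is the two bytes C3 A9 in UTF-8.
my $utf16le = utf8_to_utf16le("caf\xC3\xA9");
print unpack('H*', $utf16le), "\n";            # prints 630061006600e900
```

The 'UTF-16LE' encoding in Encode writes no byte-order mark, so if the target expects one it has to be added separately.
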
>
> >     $sql = "INSERT INTO Tipo_Referencia ( Descricao ) SELECT '$palavra2' AS Expr1;";
>
> Is there a reason why you don't write this as
>
>     $sql = "INSERT INTO Tipo_Referencia ( Descricao ) " .
>         "VALUES ('$palavra_em_utf16')"

Not really, but the previous syntax has worked many times.

>
> ? The "INSERT INTO table (columns) VALUES (literals)" is, for me, the
> usual syntax, and "INSERT INTO table (columns) SELECT literals AS dummy"
> looks strange to me.

Maybe. I just copied the syntax from an Access query. It has worked on many occasions; even writing a UTF-8 value worked with that syntax. Obviously Access didn't make much sense of it, as UTF-8 isn't really something it "understands". But your syntax is the more correct one (and the one that respects the SQL standard).

>
> >     print FICH3 $palavra2,"\n";
> >     $conn->execute($sql,,,adExecuteNoRecords);
>
> This is the same as
>
>     $conn->execute($sql,adExecuteNoRecords);
>
> .. If the constant adExecuteNoRecords has to be the fourth parameter to
> ->execute, then say so:
>
>     $conn->execute($sql, undef, undef, adExecuteNoRecords);
>
> .. Perl isn't Visual Basic :)

There, you caught me. I'm much more fluent in VB than in Perl, and I was the one who gave my students the ADO code...

>
> To summarise, I think you have misunderstood how Unicode::String works.
> utf16() (called as a function, not a method) doesn't convert a string
> *to* UTF-16, it expects a string in UTF-16 and converts *from* that
> encoding into the internal format used by Unicode::String and returns an
> object. Then you can call methods on that object to produce another
> encoding such as UTF-8 or Latin-1 or whatever. So conversions involving
> Unicode::String generally involve at least two calls.

Not quite, but it is clear that it was a bad example, and your conclusions are therefore justified.

I'll try your suggestions and let you know about the result.

Thank you for your time and your help.

Regards.

Rui