Re: utf8, japanese, web-pages: beginning to see the light...

2004-06-17 Thread Gisle Aas
Marco Baroni <[EMAIL PROTECTED]> writes:
> Thanks for the quick reply.
>
> > A workaround can be to pass it encoded UTF8.
>
> Excuse me my persistent confusion, but what does this mean, in concrete?
The stuff that encode_utf8() returns (see 'perldoc Encode') or you get from a UTF-8 file read in
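To make the "encoded UTF8" distinction concrete, here is a minimal sketch using only the core Encode module. The sample string (three Japanese characters) and the variable names are my own illustration, not taken from the thread. It shows that encode_utf8() takes a string of Unicode characters and returns the bytes of its UTF-8 encoding, and that decode_utf8() reverses this.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# A character string: three Unicode characters (illustrative sample text).
my $chars = "\x{65E5}\x{672C}\x{8A9E}";

# encode_utf8() returns a byte string holding the UTF-8 encoding of $chars.
my $bytes = encode_utf8($chars);

# length() counts characters in the first case and bytes in the second.
printf "characters: %d\n", length $chars;
printf "bytes:      %d\n", length $bytes;

# decode_utf8() goes the other way, from UTF-8 bytes back to characters.
my $round_trip = decode_utf8($bytes);
print "round trip ok\n" if $round_trip eq $chars;
```

Passing "encoded UTF8" to a module would then mean handing it something like $bytes rather than $chars.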

Re: utf8, japanese, web-pages: beginning to see the light...

2004-06-17 Thread Marco Baroni
Thanks for the quick reply.
> A workaround can be to pass it encoded UTF8.
Excuse me my persistent confusion, but what does this mean, in concrete?
Regards,
Marco
-- 
Marco Baroni
SSLMIT, University of Bologna
http://sslmit.unibo.it/~baroni

Re: utf8, japanese, web-pages: beginning to see the light...

2004-06-17 Thread Gisle Aas
Marco Baroni <[EMAIL PROTECTED]> writes:
> >> Now for a much less pressing issue: Does anybody know of something
> >> similar to the HTML::FormatText module that can take utf-8 input, and
> >> generate utf-8 output?
> >
> > Doubt it. But if you run it on Unicode chars (as indicated above)
> > then

Re: utf8, japanese, web-pages: beginning to see the light...

2004-06-17 Thread Marco Baroni
>> Now for a much less pressing issue: Does anybody know of something similar to the HTML::FormatText module that can take utf-8 input, and generate utf-8 output?
> Doubt it. But if you run it on Unicode chars (as indicated above) then unless it is doing something too clever it should just work.
Could i

Re: utf8, japanese, web-pages: beginning to see the light...

2004-05-18 Thread Nick Ing-Simmons
Marco Baroni <[EMAIL PROTECTED]> writes:
>A few days ago, I queried this list about my problems with a script
>that finds the charset of Japanese web pages and translates their text
>into utf-8.
>
>The following solution, proposed by Nick Ing-Simmons, worked for my
>case:
>
>>binmode STDOUT

utf8, japanese, web-pages: beginning to see the light...

2004-05-13 Thread Marco Baroni
A few days ago, I queried this list about my problems with a script that finds the charset of Japanese web pages and translates their text into utf-8. The following solution, proposed by Nick Ing-Simmons, worked for my case:
> binmode STDOUT,":utf8";
> my $encoding = find_encoding($charset)
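The quoted solution is cut off in this archive view, so the following is only a sketch of how a find_encoding() approach might be completed, not the code that was actually posted. The charset value, the sample EUC-JP bytes, and the decode step are assumptions for illustration. In a real script $charset would come from the page's headers or meta tags and $raw from the fetched page body.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical inputs, hard-coded here so the sketch is self-contained.
my $charset = "euc-jp";
my $raw     = "\xC6\xFC\xCB\xDC\xB8\xEC";   # sample text as EUC-JP bytes

# Have Perl encode characters as UTF-8 when printing to STDOUT.
binmode STDOUT, ":utf8";

# find_encoding() returns an encoding object, or undef for an unknown name.
my $encoding = find_encoding($charset)
    or die "Unknown charset: $charset\n";

# Decode the raw bytes into Perl's internal Unicode characters.
my $text = $encoding->decode($raw);

print $text, "\n";
```

Checking the return value of find_encoding() matters for web pages, since declared charsets are often misspelled or missing.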