On 1/19/06, Sherwin Daganato <[EMAIL PROTECTED]> wrote:
> On Thu, Jan 19, 2006 at 06:25:30PM +0800, Orlando Andico wrote:
>> however YAML::Syck converts the Chinese text to escaped unicode, e.g.
>> "\xE7\xB9\x81\xE9\xAB\x94\xE4\xB8\xAD\xE6\x96\x87"
>>
>> which is understandable, but i can't figure out how to convert the escaped
>> form back to UTF-8 (Encode module doesn't seem to do it) because i want to
>> re-display it after it has gone back and forth through YAML.
>
> Have you tried utf8::upgrade()?

my understanding is that the utf8 module is deprecated in perl 5.8+

actually, in reply to Dido's post: yes, I meant escaped UTF-8 (i tend to use "unicode" and "UTF-8" interchangeably, even though Unicode is the character set and UTF-8 is only one of its encodings, simply because UTF-8 is the default encoding).
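To make the Unicode-versus-UTF-8 distinction concrete, here is a small sketch in Python (used only for illustration; the thread itself is about Perl). It takes the first character of the example text and shows it as an abstract Unicode code point and as two different byte encodings of that same code point:

```python
# One Unicode character versus its byte encodings.
ch = "繁"  # first character of the thread's example text

code_point = ord(ch)                  # abstract Unicode code point
utf8_bytes = ch.encode("utf-8")       # UTF-8: 3 bytes for this character
utf16_bytes = ch.encode("utf-16-be")  # UTF-16 (big-endian, no BOM): 2 bytes

print(f"U+{code_point:04X}")  # U+7E41
print(utf8_bytes.hex())       # e7b981
print(utf16_bytes.hex())      # 7e41
```

The UTF-8 bytes are exactly the first three \x escapes in the YAML::Syck output quoted above; UTF-16 encodes the same code point as different bytes.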

it's not surprising that libsyck converted it to escaped UTF-8, because the original *IS* UTF-8. A "dumb" application renders it as the raw (unprintable) UTF-8 byte sequence, but our friendly GNOME 2.x (and even plain X; i tried it in an xterm as well) interprets the byte sequence and displays the correct Unicode characters. oh yeah, that IS Traditional Chinese (the first two characters; the last two are the same in Simplified Chinese).

libsyck is actually at fault, not YAML::Syck, since the latter is just a wrapper around libsyck (a C library). the pure-Perl YAML module doesn't have this problem, but it is 20x slower... hmm... decisions, decisions.

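a completely "dumb" implementation would be: whenever the sequence \xDD is seen, replace it with chr(hex(DD)), then decode the resulting bytes as UTF-8 (e.g. with Encode::decode) if you need Perl characters rather than raw bytes. doesn't appeal to my sense of elegance, though.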
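The "dumb" approach above can be sketched as follows. This is a Python sketch for illustration only; in Perl the same idea would roughly be s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/ge followed by Encode::decode('UTF-8', ...). The function name unescape_utf8 is hypothetical, and the sketch assumes that everything in the text other than the \xDD escapes is plain ASCII.

```python
import re

# A literal backslash, then "x", then exactly two hex digits, e.g. \xE7
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def unescape_utf8(escaped: str) -> str:
    """Turn text containing \\xDD escapes back into a Unicode string.

    Hypothetical helper; assumes everything that is not a \\xDD escape
    is plain ASCII.
    """
    # Step 1, the "dumb" part: replace each \xDD with the character chr(0xDD).
    raw = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
    # Every character is now below 256, so Latin-1 turns each one into
    # exactly one byte with the same value.
    data = raw.encode("latin-1")
    # Step 2: interpret the whole byte sequence as UTF-8.
    return data.decode("utf-8")


escaped = r"\xE7\xB9\x81\xE9\xAB\x94\xE4\xB8\xAD\xE6\x96\x87"
print(unescape_utf8(escaped))  # 繁體中文
```

The UTF-8 decoding has to run on the whole byte sequence rather than escape by escape, because each Chinese character here spans three escapes.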


_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
[email protected] (#PLUG @ irc.free.net.ph)
Read the Guidelines: http://linux.org.ph/lists
Searchable Archives: http://archives.free.net.ph
