Orlando Andico wrote:
> 繁體中文

Pronounced something like fan-ti-zhong-wen in Mandarin, I think.

Character by character it roughly goes 'complex', 'body' (or 'form'),
'China', 'writing'.  I think it means traditional Chinese writing, i.e.
the complicated Chinese script that is still in use in places like
Taiwan and, for the most part, by the Japanese in their Kanji (though
Japan has simplified some characters of its own), the People's Republic
having opted to simplify the characters in an effort to stem the tide
of illiteracy.

Anyway, to answer your question, Orly, I don't think you're really
getting escaped Unicode.  Rather, YAML::Syck seems to escape every
input byte above the ASCII range no matter what.  The first character
you have, 繁, is U+7E41 according to the Kanji database I use [1], and
in UTF-8 that comes out to precisely the three bytes E7 B9 81, which is
the \xE7\xB9\x81 in your escaped output.  So it's not strictly escaped
Unicode, but escaped UTF-8.  If you unescape it and treat the result as
binary, the bytes you get back are the original UTF-8 representation.
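
To make the byte arithmetic concrete, here is a minimal sketch in Python
(chosen only for illustration, not Perl and not anything YAML::Syck does
internally).  The helper name unescape_hex is made up, and it assumes
the escapes are literal backslash-x-hex sequences like the ones in your
output, with everything else plain ASCII.

```python
import re

# The first character of the quoted string, U+7E41.
char = "\u7e41"

# Encode to UTF-8 and render each byte as a literal \xNN escape.
utf8_bytes = char.encode("utf-8")
escaped = "".join(f"\\x{b:02X}" for b in utf8_bytes)
print(escaped)  # \xE7\xB9\x81


def unescape_hex(text):
    """Hypothetical helper: turn literal \\xNN sequences back into raw bytes.

    Assumes every other character in `text` is plain ASCII.
    """
    raw = text.encode("ascii")
    return re.sub(
        rb"\\x([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )


# Unescaping yields the original UTF-8 bytes, which decode back to U+7E41.
restored = unescape_hex(escaped)
print(restored == utf8_bytes)            # True
print(restored.decode("utf-8") == char)  # True
```

The point is only that the escaped form is three escaped bytes, not one
escaped code point, so decoding has to happen after the unescaping.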

That's the trouble with programs that don't understand anything but
ASCII.  They tend to mangle non-ASCII text badly.  But consider yourself
lucky; PHP is a far worse language for dealing with non-ASCII character
sets, as I found to my chagrin while writing an app for a Japanese
client.  At least the UTF-8-unaware Syck library managed not to mangle
things too badly, since the original bytes can still be recovered.

[1] http://kanjidict.stc.cx/pastesearch

-- 
While there is a lower class, I am in it, while there is a criminal
element, I am of it, and while there is a soul in prison, I am not free.
http://stormwyrm.blogspot.com/
_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
[email protected] (#PLUG @ irc.free.net.ph)
Read the Guidelines: http://linux.org.ph/lists
Searchable Archives: http://archives.free.net.ph
