Orlando Andico wrote:
> 繁體中文

Pronounced something like fan-ti-zhong-wen in Mandarin, I think.
Character by character it goes roughly 'complex', 'form', 'China', 'writing'. I think it means traditional Chinese writing, i.e. the complicated Chinese script that is still in use in places like Taiwan and, for the most part, by the Japanese in their Kanji (the People's Republic having opted to simplify the characters in an effort to stem the tide of illiteracy).

Anyway, to answer your question, Orly: I don't think you're really getting escaped Unicode. Rather, YAML::Syck escapes every input byte above the ASCII range, no matter what. The first character you have, 繁, is U+7E41 according to the Kanji database I use [1], which in UTF-8 comes out to precisely the three bytes \xE7\xB9\x81 in your escaped output. So it's not strictly escaped Unicode, but escaped UTF-8. If you undo the escaping and treat the result as binary, the bytes you get back are the original UTF-8 representation.

That's the trouble with programs that don't understand anything but ASCII: they tend to mangle things badly. But consider yourself lucky; PHP is a far worse language for dealing with non-ASCII character sets, as I found to my chagrin while writing an app for a Japanese client. At least the UTF-8-unaware Syck library managed not to mangle things so badly.

[1] http://kanjidict.stc.cx/pastesearch

-- 
While there is a lower class, I am in it, while there is a criminal
element, I am of it, and while there is a soul in prison, I am not free.
http://stormwyrm.blogspot.com/
_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
(#PLUG @ irc.free.net.ph)
Read the Guidelines: http://linux.org.ph/lists
Searchable Archives: http://archives.free.net.ph
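
For anyone who wants to check the byte arithmetic for 繁 described above, here is a minimal sketch. It is in Python rather than Perl and does not use YAML::Syck at all; the helper name unescape_hex is made up for illustration, and it assumes the escaped output uses literal \xNN sequences like the ones quoted in this thread.

```python
import re

def unescape_hex(escaped: str) -> bytes:
    """Turn literal \\xNN sequences back into raw bytes (hypothetical helper).

    Anything outside a \\xNN escape is assumed to be plain ASCII.
    """
    out = bytearray()
    pos = 0
    for m in re.finditer(r"\\x([0-9A-Fa-f]{2})", escaped):
        out += escaped[pos:m.start()].encode("ascii")  # text before the escape
        out.append(int(m.group(1), 16))                # the escaped byte itself
        pos = m.end()
    out += escaped[pos:].encode("ascii")               # trailing text, if any
    return bytes(out)

char = "\u7e41"                    # the character 繁
utf8_bytes = char.encode("utf-8")  # its UTF-8 representation (three bytes)

# Escape every byte the way an ASCII-only tool might.
escaped = "".join(f"\\x{b:02X}" for b in utf8_bytes)
print(f"U+{ord(char):04X}", escaped)

# Unescape, then decode the raw bytes as UTF-8 to recover the character.
raw = unescape_hex(escaped)
print(raw.decode("utf-8") == char)
```

The point is only that the escaping works on bytes, not code points: unescaping gives back the UTF-8 bytes, and decoding those as UTF-8 gives back the original character.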

