"Andrew C. West" <[EMAIL PROTECTED]> wrote on Friday, February 14, 2003 2:29 AM
Subject: Re: traditional vs simplified chinese

> On Thu, 13 Feb 2003 09:48:45 -0800 (PST), "Zhang Weiwu" wrote:
>
> > Take it easy, if you find one 500B (the measure word) it is usually enough to
> > say it is traditional Chinese, one 4E2A (measure word) is in simplified
> > Chinese. They never happen together in a logically correct document.
>
> Marco is absolutely correct that Simplified and Traditional Chinese may
> legitimately be found together on the same Web page (and I for one have several
> pages where they do).
> Certainly, I've seen "traditional" texts which mix U+500B with U+4E2A (and with
> U+7B87 for that matter). With Unicode it is now possible to transcribe
> traditional texts as they are written, rather than translate into "traditional"
> or "simplified". Take, for example, this Web page --
> http://uk.geocities.com/Morrison1782/Texts/TianguanCifu.html -- which
> transcribes a short one-act play from the Cantonese Opera tradition, published
> during the Qing dynasty (probably early 19th century).

Okay, Andrew is a real expert and is right about this. I would like to look at that page if I could reach geocities.com. (For at least two years, nobody has been able to go to geocities.com directly from China.)

I never saw 500B and 4E2A in the same printed document in the 20 years I lived in China. (Well, minus the years when I could not read yet :) Unless you had an obvious reason to do so, printing a book in Traditional characters used to be considered somewhat wrong in China. There is a language council (YuWei) in charge of such issues. At one period people wanted to eliminate Traditional Chinese completely. I remember an advertisement on the street when I was a child, which said people should report public appearances of Traditional Chinese characters to the local culture ministry of some sort. (Oh, this is very OT.)

So let me correct what I said: if you find a 4E2A, the text may still be Traditional, but if you find a 500B it is very, very likely to be Traditional Chinese. I think we can search for 500B; if it does not appear, the text is likely to be Simplified.

It is a pity that I have never read transcribed books (I mean copies of the original ancient books), which is why I made this kind of mistake. I will try to read more in future.

> It has U+4E2A (simplified
> ge4) but not U+500B (traditional ge4), and yet is written mostly in
> "traditional" characters. How would your algorithm classify such a page ?

Well, I was not talking about an algorithm in the first place. I thought Paul Hastings <[EMAIL PROTECTED]> wanted to do it by looking at the text. And we don't have lots of such mixed pages.
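
For anyone who does want to automate the check, the corrected rule of thumb could be sketched roughly as below. This is only an illustration of the heuristic discussed in this thread, not a reliable classifier: the function name `guess_script` and the result labels are made up for the example, and a page like Andrew's (U+4E2A present, U+500B absent, otherwise traditional characters) would be guessed wrongly as simplified.

```python
# Hypothetical helper illustrating the single-character heuristic from
# this thread. The names and labels are assumptions, not an existing API.

GE_TRADITIONAL = "\u500b"  # U+500B, traditional measure word ge4
GE_SIMPLIFIED = "\u4e2a"   # U+4E2A, simplified measure word ge4


def guess_script(text: str) -> str:
    """Guess the script of `text` from which form of ge4 it contains."""
    if GE_TRADITIONAL in text:
        # U+500B is a strong hint of traditional text.
        return "likely traditional"
    if GE_SIMPLIFIED in text:
        # Weak hint only: older "traditional" texts may also use U+4E2A,
        # so such a page would be misclassified here.
        return "likely simplified"
    # Neither form occurs, so this heuristic has no evidence either way.
    return "unknown"


if __name__ == "__main__":
    print(guess_script("\u4e09\u500b\u4eba"))  # contains U+500B
    print(guess_script("\u4e09\u4e2a\u4eba"))  # contains U+4E2A only
```

Checking for U+500B first mirrors the point made above: its presence is the stronger signal, while U+4E2A alone proves little.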

