On Friday 2003.10.10 14:48:58 -0700, Magda Danish (Unicode) wrote:
> Roberto,
>
> I am forwarding your question to the Unicode mailing list for possible
> answers from the list's subscribers.
>
> Regards,
>
> Magda Danish
> Administrative Director
> The Unicode Consortium
> 650-693-3921
>
>
> > -----Original Message-----
> > Date/Time: Thu Oct 9 10:20:19 EDT 2003
> > Contact: [EMAIL PROTECTED]
> > Report Type: Other Question, Problem, or Feedback
> >
> > Hi at all,
> > i have a little question:
> > Characters in the unicode range U+4E00 and U+9FFF are Unified
> > Ideographs for
> > CJK languages. This means that all characters are togheter
> > for Chinense,
> > Japanese and Korean languages?

Yes, that's why they are called "unified".

> > If i take a charcters for,
> > example U+4E01,
> > this is a valid character for all three languages?

Most likely. There are some characters that only occur in modern
simplified Chinese, some that for the most part only occur in modern
traditional Chinese (such as used in Taiwan or Hong Kong), and some
that only occur in Japanese.

> > My problem is to recognize from the 32 bit value of unicode
> > character if this
> > is a chinese character or korean or japanese. How can do this?

You can't, so don't try to do it on a character-by-character basis.
It is useless. As a human looking at a string of text, you can tell
what language it is from the context. Of course, in Japanese text you
will expect to see Hiragana or Katakana, and in Korean text you will
expect to see Hangul syllables. But there is every possibility that a
Korean text might contain embedded Chinese quotations, or a Japanese
text might contain embedded Korean, or ... you get the idea ...

> >
> > I develop international application under win98, win200 with
> > Visual Studio 6.0
> >
> > thanks a lot.
> >
> > Roberto (ITALY)
> >
> > -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> > (End of Report)
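
To make the "look at the whole string, not one character" point
concrete, here is a minimal sketch in Python. It is only a heuristic:
it counts Han, kana and Hangul characters over the whole string and
guesses from the mix. The function names (count_scripts,
guess_language), the return labels and the tie rule are my own
assumptions for illustration, and the Han range is just the
U+4E00..U+9FFF range from the question, not every ideograph block.
The same range checks should port directly to C++.

```python
def count_scripts(text):
    """Count characters per script over the whole string."""
    counts = {"han": 0, "kana": 0, "hangul": 0}
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF:
            # CJK Unified Ideographs (main block only, as in the question)
            counts["han"] += 1
        elif 0x3040 <= cp <= 0x30FF:
            # Hiragana and Katakana blocks
            counts["kana"] += 1
        elif 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:
            # Hangul syllables, plus the Hangul Jamo block
            counts["hangul"] += 1
    return counts


def guess_language(text):
    """Guess from context; a lone ideograph never identifies a language."""
    c = count_scripts(text)
    if c["kana"] == 0 and c["hangul"] == 0:
        # Han only: could be Chinese, or Japanese/Korean written in Han.
        return "han-only" if c["han"] else "unknown"
    if c["kana"] > c["hangul"]:
        return "ja"
    if c["hangul"] > c["kana"]:
        return "ko"
    return "mixed"  # assumed tie rule: equal kana and Hangul counts


if __name__ == "__main__":
    for sample in ["丁", "日本語のテキスト", "한국어 텍스트"]:
        print(repr(sample), guess_language(sample))
```

Note that U+4E01 on its own comes back as "han-only", which is the
honest answer. Text with embedded quotations in another language will
still fool a counter like this, so treat the result as a hint and
prefer explicit language tagging where you have it.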

