We are using Unicode to represent Japanese characters.
Will the Japanese characters be sorted according to: a) their order in the Japanese character set, OR b) the order in which they are listed in Unicode, OR c) are the results of the two approaches the same?
If you sort Unicode strings in their binary order, then you get b).
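As a rough illustration of what binary order means in practice, here is a minimal Java sketch. The class and method names are made up for this example. It relies on String.compareTo, which compares UTF-16 code units; for characters in the Basic Multilingual Plane, such as the ones used here, that is the same as Unicode code point order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, for illustration only.
class BinaryOrderSort {
    // Sorts by UTF-16 code unit value, which is what String.compareTo does.
    // For BMP characters this matches Unicode code point order.
    static List<String> sortByCodeUnits(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        // U+30A2 (Katakana A), U+4E9C (Kanji), U+3044 (Hiragana I), U+3042 (Hiragana A)
        List<String> words = List.of("ア", "亜", "い", "あ");
        // Hiragana block first, then Katakana, then Han: [あ, い, ア, 亜]
        System.out.println(sortByCodeUnits(words));
    }
}
```

Note how all Hiragana sort before all Katakana simply because of where the blocks sit in Unicode, which is usually not what a Japanese reader expects.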
If you use the Unicode Collation Algorithm (UCA), then you get better results for Katakana/Hiragana, but still Unicode order (b) for Kanji/Han characters. See http://www.unicode.org/reports/tr10/
ICU implements UCA and also provides a Japanese tailoring for JIS X 4061 order. Kanji are sorted in JIS X 0208 order (a). For this you simply instantiate a Collator object for the locale ID "ja" and use it to compare strings or to generate sort keys. This is supported in both the C/C++ and Java versions of ICU.
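The two usage patterns mentioned above (comparing strings directly, and generating sort keys) might look roughly like the following sketch. To keep it runnable without extra libraries it uses the JDK's java.text.Collator as a stand-in, which is an assumption of this example and not ICU itself; the JDK's Japanese rules may not match ICU's JIS X 4061 tailoring. With ICU4J you would use com.ibm.icu.text.Collator instead, which offers analogous getInstance, compare and getCollationKey methods. The class and method names below are made up for illustration.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class JapaneseCollationSketch {
    // Stand-in: JDK collator for Japanese, not the ICU collator.
    static Collator japaneseCollator() {
        return Collator.getInstance(Locale.JAPANESE);
    }

    // Pattern 1: let the collator compare strings directly.
    static List<String> sortWithCollator(List<String> input, Collator collator) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator); // Collator is a Comparator
        return copy;
    }

    // Pattern 2: build one sort key per string, then compare the keys.
    // This pays off when the same strings are compared many times.
    static List<String> sortWithKeys(List<String> input, Collator collator) {
        List<CollationKey> keys = new ArrayList<>();
        for (String s : input) {
            keys.add(collator.getCollationKey(s));
        }
        Collections.sort(keys); // CollationKey is Comparable
        List<String> out = new ArrayList<>();
        for (CollationKey k : keys) {
            out.add(k.getSourceString());
        }
        return out;
    }

    public static void main(String[] args) {
        Collator collator = japaneseCollator();
        List<String> words = List.of("ア", "亜", "い", "あ");
        System.out.println(sortWithCollator(words, collator));
        System.out.println(sortWithKeys(words, collator));
    }
}
```

Both methods should produce the same order for the same collator; sort keys are mainly a performance choice for large lists or database indexes.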
http://oss.software.ibm.com/icu/userguide/Collate_Intro.html http://oss.software.ibm.com/icu/
Best regards, markus

