Find the sort order that is culturally most appropriate, and build a table to map this order to the Unicode code points.
In this case, the culturally most appropriate sort order requires sorting the same characters differently depending on the word they appear in. To sort Japanese properly, you have to be able to identify the reading of kanji compounds. Sometimes different words are written with the same kanji but have different meanings and readings, so the same characters have to be sorted differently depending on their meaning. For example,
生物
is sorted as
せいぶつ (seibutsu)
when it means "living creature," while it is sorted as
なまもの (namamono)
when it means "raw fish." I'm not sure how a computer program would be able to identify which pronunciation is appropriate in which case.
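One way around this, as far as I can tell, is not to guess at all: store the kana reading next to the written form and sort on the reading. Here is a minimal Python sketch of that idea. The record layout and the names `entries` and `sort_by_reading` are made up for illustration, and comparing hiragana by plain code point only approximates a proper Japanese collation (it ignores details such as voiced marks, small kana, and long-vowel marks).

```python
# Hypothetical records: each entry carries its written form plus a kana
# reading supplied by a human, since the reading cannot be derived from
# the kanji alone.
entries = [
    {"written": "生物", "reading": "なまもの", "gloss": "raw fish"},
    {"written": "生物", "reading": "せいぶつ", "gloss": "living creature"},
    {"written": "花", "reading": "はな", "gloss": "flower"},
]


def sort_by_reading(items):
    """Sort entries by their kana reading instead of their kanji.

    Assumption: readings are written in hiragana, whose code point order
    roughly follows the traditional kana order. A real implementation
    would use a proper collation library instead.
    """
    return sorted(items, key=lambda entry: entry["reading"])


ordered = sort_by_reading(entries)
readings = [entry["reading"] for entry in ordered]
glosses = [entry["gloss"] for entry in ordered]
```

With this, the two entries written 生物 end up in different places because their readings differ, but only because somebody entered the reading by hand.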
Stefan

