http://www.unicode.org/reports/tr38/ gives a good summary of the possibilities.
Which and where?

Trying to "fold" from one locale to another, which is what folding from traditional to simplified would be, is not a good idea. Best practice is to bear in mind the locale being used, and do information retrieval on a locale-by-locale basis.
What do you mean?

Put simply: Either you don't let someone search a TW database with simplified characters or you convert either the search terms or the searched documents internally for the duration of your search – or some combination of these options. It is not at all obvious to me what the fastest way in a big data context is. There's gotta be research about this.
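To make the "convert for the duration of your search" option concrete, here is a minimal Python sketch that folds both the search term and the searched documents to simplified forms before comparing. The mapping table is a tiny hand-picked illustration and the names `TRAD_TO_SIMP`, `fold` and `search` are made up for this example; a real table would be far larger and could presumably be derived from the variant data that TR38 describes. It says nothing about which approach is fastest at scale.

```python
# Illustrative, hand-picked mapping only (an assumption for this sketch);
# a production table would cover thousands of characters.
TRAD_TO_SIMP = str.maketrans({
    "國": "国",
    "語": "语",
    "學": "学",
    "發": "发",  # "to emit, develop"
    "髮": "发",  # "hair": two traditional characters share one simplified form
})

def fold(text: str) -> str:
    """Map every traditional character known to the table to its simplified form."""
    return text.translate(TRAD_TO_SIMP)

def search(query: str, documents: list[str]) -> list[str]:
    """Return the documents containing the query, comparing folded forms only."""
    folded_query = fold(query)
    return [doc for doc in documents if folded_query in fold(doc)]

docs = ["國語學習", "頭髮", "发展"]
simplified_hits = search("国语", docs)  # simplified query against traditional text
ambiguous_hits = search("发", docs)     # matches both the "hair" and "develop" documents
```

The second query shows the cost of folding: because the mapping is many-to-one, a simplified search term can match traditional text with an unrelated meaning, which is the kind of loss the locale-by-locale advice is trying to avoid.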

Stephan

