Doug Ewell wrote:

> [...] Far from being a simple operation like Latin
> case mapping (to which it was compared), TC/SC
> requires potentially complex analysis of the text
> being converted.
>
> This is the opinion of many experts within, as well as
> outside, the Unicode standardization effort, and it is
> the reason you will not find a Unicode TC/SC mapping
> table.
Actually, such a table can easily be extracted from Unicode's Unihan database (a huge file: <http://www.unicode.org/Public/UNIDATA/Unihan.txt>). The relevant information for TC->SC is in the field <kSimplifiedVariant>, and for SC->TC in the field <kTraditionalVariant>. As each field is on a separate line, the information can be extracted quite simply, e.g. with the DOS command:

    find "kSimplifiedVariant" Unihan.txt > kSimplifiedVariant.txt

However, as Doug explained, this 1-to-1 data is NOT suitable for a full-fledged conversion. Still, it may be a good starting point for more complex approaches. It can also prove useful for implementing things such as a user-friendly search function that matches any variant of the sought characters. In this respect, Unihan contains two more fields that may be useful: <kSemanticVariant> and <kSpecializedSemanticVariant>.

_ Marco
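P.S. For anyone who wants to go beyond the DOS one-liner, here is a minimal sketch in Python of building such a variant table. It assumes the tab-separated layout used by Unihan.txt (`U+XXXX<TAB>kFieldName<TAB>U+YYYY ...`); the `build_variant_map` function and the toy sample are mine, not part of the database. As noted above, mapping each character to its first listed variant is only a naive 1-to-1 substitution, not a real TC/SC converter.

```python
import re

def build_variant_map(unihan_text, field="kSimplifiedVariant"):
    """Build a character -> list-of-variant-characters map.

    Assumes Unihan.txt's tab-separated layout:
        U+XXXX<TAB>kFieldName<TAB>U+YYYY [U+ZZZZ ...]
    A character may list several variants; all are kept,
    which is exactly why 1-to-1 conversion is unreliable.
    """
    mapping = {}
    for line in unihan_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != field:
            continue  # keep only the requested field
        source = chr(int(parts[0][2:], 16))
        variants = [chr(int(cp[2:], 16))
                    for cp in re.findall(r"U\+[0-9A-F]+", parts[2])]
        if variants:
            mapping[source] = variants
    return mapping

# Toy sample; the real data would come from Unihan.txt itself.
sample = ("U+6C23\tkSimplifiedVariant\tU+6C14\n"
          "U+9580\tkSimplifiedVariant\tU+95E8")
tc_to_sc = build_variant_map(sample)

# Naive conversion: always take the first listed variant.
print("".join(tc_to_sc.get(c, [c])[0] for c in "氣門"))  # prints 气门
```

The `.get(c, [c])` fallback simply passes through any character that has no simplified variant, which is the behaviour you'd want in a search function that matches all variants of the sought characters.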

