Re the following, just FYI:

>One of the reasons why the whole problem of Han variants is so nasty is
>that there are so many different kinds of variant out there. In order to
>try to bring order to this chaos, we need a model and we need data, and
>the IRG is the best organization to provide that model and those data.
>
>I should point out that at the last IRG, not only did Unicode have a paper
>on variants, but the rapporteur also made a presentation on why this is a
>problem, and much of the work at the meeting was done using a variant
>database provided by Taiwan. The HKSAR also has a similar database. And,
>of course, almost any Han dictionary has variant data in it, including in
>many Chinese dictionaries TC/SC equivalence.
>
>That Han variants exist is not an issue.

A huge database used by scholars, the 800-million-character electronic version of the Siku Quanshu, offers the following non-exclusive classes of character equivalents for users to choose from: yiti (traditional variants), tongjia (different characters sometimes used interchangeably), jianfan (simplified/traditional), zhengwu (correct/mistaken), Zhong-Ri (Chinese/Japanese), xinjiu (new/old), gujin (ancient/recent), xingjin (close in shape).

And this is for a database that is itself simpler than, e.g., a library database would be: it is all in traditional characters of one form or another, whereas in a library database no one could predict in what kind of characters a particular title on Ming history would be written, and one would always expect a search to return both.

Unfortunately, no documentation is provided as to which particular pairs of characters these equivalence classes cover, although when doing a specific search, a user is alerted to which characters are added as equivalents under a particular choice.

Martin Heijdra

