ICU has a canonical iterator: given a string, it provides all the strings that produce the same result as that string under toNFC(...).
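To make the idea concrete, here is a small brute-force sketch in Python. It is not ICU's algorithm. ICU exposes the real thing as `CanonicalIterator` (ICU4J: `com.ibm.icu.text.CanonicalIterator`; ICU4C: `icu::CanonicalIterator`). The sketch uses only the standard `unicodedata` module, and all helper names (`canonical_equivalents`, `_one_step`, `_compositions`, `_neighbours`) are hypothetical. It starts from the input and explores the closure under three rewrites that preserve canonical equivalence: replacing a character by its one-level canonical decomposition, composing one or two characters back into a character that canonically decomposes to them, and swapping two adjacent combining marks with different nonzero combining classes. It assumes canonical mappings have at most two code points and computes Hangul syllable decompositions algorithmically, because `unicodedata.decomposition` returns nothing for them. Like ICU's iterator, it covers only canonical equivalence (NFC/NFD), not the compatibility forms NFKC/NFKD. The search is exponential, so it is only practical for short strings.

```python
import unicodedata
from collections import deque
from functools import lru_cache

# Hangul constants from the Unicode Standard, chapter 3 (conjoining jamo behavior).
_S_BASE, _L_BASE, _V_BASE, _T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
_T_COUNT, _N_COUNT, _S_COUNT = 28, 588, 11172


def _one_step(ch):
    """Return the one-level canonical decomposition of ch, or None if it has none."""
    s_index = ord(ch) - _S_BASE
    if 0 <= s_index < _S_COUNT:  # Hangul syllable, decomposed algorithmically
        t_index = s_index % _T_COUNT
        if t_index:  # LVT syllable -> LV syllable + trailing consonant
            return chr(ord(ch) - t_index) + chr(_T_BASE + t_index)
        return (chr(_L_BASE + s_index // _N_COUNT)
                + chr(_V_BASE + (s_index % _N_COUNT) // _T_COUNT))
    d = unicodedata.decomposition(ch)
    if not d or d.startswith("<"):  # no mapping, or a compatibility mapping
        return None
    return "".join(chr(int(h, 16)) for h in d.split())


@lru_cache(maxsize=None)
def _compositions():
    """Inverse map: one-level decomposition (1 or 2 chars) -> characters producing it."""
    inv = {}
    for cp in range(0x110000):
        ch = chr(cp)
        d = _one_step(ch)
        if d is not None:
            inv.setdefault(d, []).append(ch)
    return inv


def _neighbours(s, inv):
    """Yield strings reachable from s by one equivalence-preserving rewrite."""
    for i, ch in enumerate(s):
        d = _one_step(ch)
        if d is not None:  # decompose one character
            yield s[:i] + d + s[i + 1:]
        for c in inv.get(ch, ()):  # undo a singleton decomposition
            yield s[:i] + c + s[i + 1:]
        if i + 1 < len(s):
            for c in inv.get(s[i:i + 2], ()):  # compose an adjacent pair
                yield s[:i] + c + s[i + 2:]
            a, b = unicodedata.combining(ch), unicodedata.combining(s[i + 1])
            if a and b and a != b:  # reorder marks of different classes
                yield s[:i] + s[i + 1] + ch + s[i + 2:]


def canonical_equivalents(s):
    """Return the set of all strings canonically equivalent to s (including s)."""
    target = unicodedata.normalize("NFD", s)
    inv = _compositions()
    seen = {s}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        for nxt in _neighbours(cur, inv):
            # Keep only genuinely equivalent strings, as a safety check.
            if nxt not in seen and unicodedata.normalize("NFD", nxt) == target:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For example, `canonical_equivalents("\u00C5")` should contain U+00C5, "A" followed by U+030A COMBINING RING ABOVE, and U+212B ANGSTROM SIGN, all of which map to U+00C5 under NFC.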
Mark

*— The best is the enemy of the good —*

On Mon, Oct 4, 2010 at 20:59, Bjoern Hoehrmann <[email protected]> wrote:

> Hi,
>
> Every now and then I need a tool that takes a Unicode string and gives
> me all the strings that are not identical but equivalent under one of
> the four normalization forms defined in UAX #15. Now I do have a couple
> of hacks that get me by, but is there any tool or paper that has a more
> complete solution? Last year I worked a bit in the general direction
> (http://lists.w3.org/Archives/Public/www-archive/2009Feb/0071.html), but
> I ran out of time after proving that the sets of strings in one of the
> normal forms are all regular languages, and writing a denormalizer was
> not the goal anyway.
>
> Thanks,
> --
> Björn Höhrmann · mailto:[email protected] · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Phone: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

