Hi Denis,

A few thoughts ... library data may be NFC or NFD, but it is more likely to conform to the MARC character repertoire, so it isn't exactly NFD.
Vietnamese data is either 1) NFC or 2) neither NFC nor NFD. It would be rare to find Vietnamese data in NFD.

For a range of African languages, mainly ones using diacritics and diacritic stacking, it may be 1) NFC, 2) NFD or 3) neither NFC nor NFD, depending on the input framework used.

On Jan 22, 2013 3:26 AM, "Denis Jacquerye" <[email protected]> wrote:
> Does anybody have any idea of how much of the Web is normalized in NFC
> or NFD? Or how much not normalized?
>
> How would one find out or try to make a smart guess?
>
> I know a lot of library catalogue data is in NFD or somewhat
> decomposed. Is there any other field that heavily uses decomposition?
>
> --
> Denis Moyogo Jacquerye
> African Network for Localisation http://www.africanlocalisation.net/
> Nkótá ya Kongó míbalé --- http://info-langues-congo.1sd.org/
> DejaVu fonts --- http://www.dejavu-fonts.org/
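As a rough way to make the "NFC / NFD / neither" distinction concrete when sampling data, here is a minimal sketch, assuming Python 3.8+ (for `unicodedata.is_normalized`). The helper names `normalization_form` and `tally` are hypothetical, not an established survey method. Text with no composable or decomposable characters (e.g. plain ASCII) is both NFC and NFD, so it gets its own "both" bucket.

```python
import unicodedata
from collections import Counter


def normalization_form(text: str) -> str:
    """Classify a string as 'NFC', 'NFD', 'both', or 'neither'."""
    nfc = unicodedata.is_normalized("NFC", text)
    nfd = unicodedata.is_normalized("NFD", text)
    if nfc and nfd:
        # Nothing to compose or decompose, e.g. plain ASCII.
        return "both"
    if nfc:
        return "NFC"
    if nfd:
        return "NFD"
    # Mixed sequences, e.g. a precomposed vowel followed by a combining mark.
    return "neither"


def tally(samples):
    """Count how many sample strings fall into each category (hypothetical helper)."""
    return Counter(normalization_form(s) for s in samples)


samples = [
    "Nk\u00f3t\u00e1",     # precomposed ó, á
    "Nko\u0301ta\u0301",   # o/a + COMBINING ACUTE ACCENT
    "Vi\u00ea\u0323t",     # precomposed ê + COMBINING DOT BELOW
    "Kongo",               # ASCII only
]
for s in samples:
    print(repr(s), normalization_form(s))
```

The Vietnamese sample shows the "neither" case: ê followed by a combining dot below is not NFC (which would use the single code point ệ, U+1EC7) and not NFD (which would be e + dot below + circumflex).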

