On Friday, July 18, 2003 7:36 AM, Michael Everson <[EMAIL PROTECTED]> wrote:
> At 00:57 +0200 2003-07-18, Philippe Verdy wrote:
>
> > Why is row 03 so restricted? Shouldn't it include those accents and
> > diacritics that are used by other characters once canonically
> > decomposed? Or does it imply that MES-2 is only supposed to use
> > strings in NFC form?
> >
> > Also, is this list under full closure with existing character
> > properties, like NFKD decompositions, and case mappings?
>
> The MES-2 is what it is, and was developed at the time when it was.
> It is thought to be a minimum requirement for European requirements,
> and is certainly a lot better than that old Adobe glyph list that was
> supported earlier on. It doesn't depend on very smart fonts.
>
> Personally I prefer the Multilingual European Subset.

Is there any work at CEN to revise its MES-2 subset (into an MES-2.1?) so
that it takes into account not only the ISO 10646 reference but also the
Unicode character properties, making the set self-closed and actually
implementable, at least under NFC closure and case-mapping closure?

Support for NFKC closure should then be added in a next step, which could
optionally specify support for the corresponding decompositions (but this
would bring in combining characters, and would increase the number of
precomposed NFC characters to include in the repertoire).

I don't think it's up to Unicode to do this work. CEN should be contacted
to do it, unless some vendor or open-source developers have already done
it and published the result.

I also note that modern Hebrew and Arabic are excluded from MES-2, as they
are not used by any official language of the European Union, EFTA, or the
future EU candidates. But they are certainly of great interest for
countries for which the EU is a major partner and which use these scripts.

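To make the closure question above concrete, here is a minimal Python sketch of how one might check a repertoire for gaps. It is only an illustration: `closure_gaps` and the toy repertoire are hypothetical names and data, not the real MES-2 list, and the check looks at single characters only (it does not test whether a base letter plus a combining mark, both in the set, compose to a character outside it). The results also depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

def closure_gaps(repertoire, forms=("NFC", "NFD", "NFKC")):
    """Map each check name to the code points it yields outside `repertoire`.

    `repertoire` is a set of integer code points. Each member is
    normalized and case-mapped on its own, and any resulting code
    point that is not in the set is reported as a gap.
    """
    gaps = {form: set() for form in forms}
    gaps["case"] = set()
    for cp in repertoire:
        ch = chr(cp)
        # Normalization closure: every character produced must be in the set.
        for form in forms:
            for out in unicodedata.normalize(form, ch):
                if ord(out) not in repertoire:
                    gaps[form].add(ord(out))
        # Case-mapping closure: upper- and lowercase results must be in the set.
        for mapped in (ch.upper(), ch.lower()):
            for out in mapped:
                if ord(out) not in repertoire:
                    gaps["case"].add(ord(out))
    return gaps

# Hypothetical toy repertoire (NOT the real MES-2 list):
# "e", "e with acute", and the <fi> ligature U+FB01.
toy = {0x0065, 0x00E9, 0xFB01}
gaps = closure_gaps(toy)
for name, missing in gaps.items():
    print(name, sorted(f"U+{cp:04X}" for cp in missing))
```

With this toy set, the NFD check reports U+0301 (the combining acute from row 03) and the case check reports the missing capitals, which is the kind of gap the questions above are about.
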
In the future, support will be needed for modern Georgian (a subset of
U+10A0..U+10FF) and modern Armenian (a subset of U+0530..U+058F), as well
as some characters from Cyrillic Supplementary (in U+0500..U+052F).

Conversely, I don't understand why MES-2 includes characters in row U+25xx
(Box Drawing, Block Elements, Geometric Shapes), which are not strictly
needed for text (notably for legal publications of the E.U., which would
be better served by markup systems), or the two Alphabetic Presentation
Forms U+FB01..U+FB02 (the <fi> and <fl> ligatures), which are really
unneeded, even for legal purposes. To be coherent, the <ff>, <ffi> and
<ffl> ligatures should then have been included as well...

I suppose these come from legacy encodings widely used in some EU, EFTA
and European Council countries, but CEN should have avoided them (font
renderers could still select them, if they are available in fonts).

-- 
Philippe.
Spam not tolerated: any unsolicited message will be reported to your
Internet service providers.

