This is not strictly related to this Unicode version update, but I have an interesting question about the Unicode Stability Policy.
Summary: how does the stability policy apply to the exact values (or aliases) of the "Decomposition Type" (dt) property for compatibility decomposition mappings?

In the strict definition applicable to Unicode 4.1+, the stability of decompositions is defined in terms of normalizations that remain idempotent across versions, for strings containing only characters that are assigned and encoded in each version. Consequently each character's decomposition mapping (i.e. the list of code points to which it is mapped) must remain stable. With the weaker definition of Unicode 3.1+, this list of code points could change (this was fixed later, so that the mapping became normalized with NFD), and some errors could be corrected (under very limited conditions, as shown in NormalizationCorrections.txt, which lists the corrigenda for Unicode 3.2 and 4.0). But the weaker definition only speaks about a much simpler, reduced decomposition type, i.e. only "canonical" or "compatibility".

If I look precisely at the possible distinct values of the "dt" property, this weaker stability would apply strictly only to these property values (and their aliases, as defined in PropertyValueAliases.txt):

  dt ; Can ; Canonical ; can
  dt ; None ; none

All the other values have to be interpreted as "compatibility" for the purpose of actually implementing the four standard normalization forms (NFC, NFD, NFKC, NFKD), i.e. wherever the short value of the "dt" property is any one of:

  dt ; Com ; Compat ; com
  dt ; Enc ; Circle ; enc
  dt ; Fin ; Final ; fin
  dt ; Font ; font
  dt ; Fra ; Fraction ; fra
  dt ; Init ; Initial ; init
  dt ; Iso ; Isolated ; iso
  dt ; Med ; Medial ; med
  dt ; Nar ; Narrow ; nar
  dt ; Nb ; Nobreak ; nb
  dt ; Sml ; Small ; sml
  dt ; Sqr ; Square ; sqr
  dt ; Sub ; sub
  dt ; Sup ; Super ; sup
  dt ; Vert ; Vertical ; vert
  dt ; Wide ; wide

Is this list of compatibility decomposition types subject to the stability policy?
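To make the distinction concrete, here is a minimal Python sketch that collapses every dt alias listed above into the three classes the normalization algorithms actually need. The function name `normalization_class` is hypothetical, and the case-insensitive lookup is a simplification of the UCD's own loose-matching rules for property values.

```python
# Hypothetical helper: collapse the detailed Decomposition_Type values
# (short names, long names and lowercase aliases listed above) into the
# three classes that matter for NFC/NFD/NFKC/NFKD.

CANONICAL = {"can", "canonical"}
NONE = {"none"}
# Every alias of the sixteen compatibility sub-types listed above.
COMPAT = {
    "com", "compat", "enc", "circle", "fin", "final", "font",
    "fra", "fraction", "init", "initial", "iso", "isolated",
    "med", "medial", "nar", "narrow", "nb", "nobreak",
    "sml", "small", "sqr", "square", "sub", "sup", "super",
    "vert", "vertical", "wide",
}


def normalization_class(dt_value: str) -> str:
    """Return 'Can', 'Com' or 'None' for a dt alias (case-insensitive)."""
    v = dt_value.lower()
    if v in CANONICAL:
        return "Can"
    if v in NONE:
        return "None"
    if v in COMPAT:
        return "Com"
    raise ValueError(f"unknown Decomposition_Type value: {dt_value!r}")
```

For example, `normalization_class("Nobreak")` and `normalization_class("vert")` both return `"Com"`: from the normalization algorithms' point of view, the sixteen compatibility sub-types are indistinguishable.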
(Yes, new aliases may be added in implementations, as long as they stay within the same equivalence classes.) But could there be new compatibility decomposition types (still preserving their uniqueness)? And can these types change (for example from "dt=Small" to "dt=Narrow", or from "dt=Nobreak" to "dt=Compat")?

I've looked closely at the definitions of other derived properties, and it does not seem that the "dt" property is used for anything other than implementing the normalizations (for example, the word-breaking properties do not depend on "dt=nb"). And it might eventually be convenient to change some characters with compatibility decomposition mappings to better-fitting decomposition types (only to one of the existing values, excluding possible future distinct values, as required by the "Property Alias Uniqueness" stability rule in Unicode 3.2+). Such a change would not break the idempotency of normalizations as defined for Unicode 4.1+, nor even under the weaker definition for Unicode 3.1+.

The strict rule for Unicode 4.1+ just says: "Decomposition Mapping: Once a character is assigned, its decomposition mapping will not change." But I wonder whether this applies to the exact decomposition type, as made explicit just below it in the weaker definition, because the rule only speaks about the value of the "decomposition mapping" property, which does not itself contain the value of the "decomposition type" property. Even in the proposed update for TR44, the "Decomposition_Type" and "Decomposition_Mapping" properties are defined separately (the first as an enumeration of the property values listed in PropertyValueAliases.txt, the second as a string made only of code points). If the stability of this large enumeration is in fact very weak (and the enumeration is not even needed to guarantee normalization idempotency), then we could just as well simplify it in the UCD to contain only "Can" (Canonical), "Com" (Compatibility) and "None".
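A toy Python sketch of why the exact tag seems irrelevant to normalization: the recursive decomposition below only tests whether a tag is present, never which one. The table uses the UnicodeData.txt field 5 values of U+00A0 and U+00E9; re-tagging U+00A0 as `<compat>` is purely hypothetical, and `parse_field5` and `full_decomposition` are illustrative names. It omits canonical reordering and Hangul decomposition, so it is not a complete NFD/NFKD implementation.

```python
import re

# Field 5 of UnicodeData.txt: optional "<tag> " then hex code points.
FIELD5 = re.compile(r"(?:<(\w+)> )?([0-9A-F]{4,6}(?: [0-9A-F]{4,6})*)")


def parse_field5(field: str):
    """Split field 5 into (tag or None, list of code points)."""
    m = FIELD5.fullmatch(field)
    if m is None:
        raise ValueError(f"unparsable decomposition field: {field!r}")
    return m.group(1), [int(cp, 16) for cp in m.group(2).split()]


def full_decomposition(cp, table, compat):
    """Recursively decompose cp. compat=True also follows compatibility
    mappings (as NFKD does); compat=False follows only canonical ones
    (as NFD does). Only the presence of a tag matters, not its value."""
    field = table.get(cp)
    if field is None:
        return [cp]
    tag, mapping = parse_field5(field)
    if tag is not None and not compat:
        return [cp]  # compatibility mapping: skipped by NFD
    out = []
    for c in mapping:
        out.extend(full_decomposition(c, table, compat))
    return out


table_a = {0x00A0: "<noBreak> 0020", 0x00E9: "0065 0301"}
# Same data, with U+00A0 hypothetically re-tagged as <compat>.
table_b = {0x00A0: "<compat> 0020", 0x00E9: "0065 0301"}

# Both canonical and compatibility decompositions are unaffected.
for cp in (0x00A0, 0x00E9):
    for compat in (False, True):
        assert full_decomposition(cp, table_a, compat) == \
            full_decomposition(cp, table_b, compat)
```

In this model, a change such as "dt=Nobreak" to "dt=Compat" leaves every decomposition result unchanged; only a change between the tagged and untagged (canonical) classes would matter.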
But we could just as well do the reverse, by further refining the list of compatibility types between the "<" and ">" brackets in the main UnicodeData.txt file. And maybe we could even allow multiple values (except "Can" and "None"), but I fear this could break some existing UCD parsers that only look for letters between these angle brackets to detect compatibility values, without checking which value is specified between them, using a simple regexp like /(<[A-Za-z]+> )/.

-- Philippe.
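A small Python sketch of the parser concern, assuming a parser built on the loose regexp quoted above; `is_compatibility` is a hypothetical helper, and the comma-separated multi-value tag is an invented syntax used only for illustration.

```python
import re

# The kind of loose pattern mentioned above: it only checks that some
# letters sit between the angle brackets, not which tag they spell.
LOOSE_TAG = re.compile(r"(<[A-Za-z]+> )")


def is_compatibility(field5: str) -> bool:
    """Naive check: field 5 starting with '<letters> ' is compatibility."""
    return LOOSE_TAG.match(field5) is not None


# Current format: a single tag, or no tag for canonical mappings.
print(is_compatibility("<noBreak> 0020"))       # True
print(is_compatibility("0065 0301"))            # False (canonical)

# A hypothetical multi-valued tag (NOT part of the UCD) fails the pattern,
# so this parser would silently misread it as a canonical mapping.
print(is_compatibility("<small,narrow> 0020"))  # False
```

Refining the single-tag vocabulary keeps such parsers working; changing the syntax inside the brackets does not.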

