One problem caused by disunification is the added complexity of every algorithm that handles text.
I forgot an important case where disunification also occurred: combining sequences are the "normal" encoding, but legacy charsets encoded the precomposed characters separately, and Unicode had to map them for round-trip compatibility. This had a consequence: the creation of additional character properties (those underlying "canonical equivalence") in order to reconcile the two sets of encodings and allow some form of equivalence (the first sketch below illustrates this).

In fact this is general: each time we disunify a character, we have to add new properties, and possibly update the algorithms to take those properties into account and establish some form of equivalence. So disunification solves one problem but creates others. We have to weigh the benefits and costs of using the disunified characters against those of using the "normal" characters (possibly in sequences). But given the number of cases where we must support sequences anyway (even if it's only combining sequences for canonical equivalence), we should strongly disfavor disunifying characters: if it's possible with sequences, don't disunify.

A famous example (based on a legacy decision which was bad in my opinion, as the cost was not considered) was the disunification of Latin/Greek letters for mathematical purposes, only to force a specific style. But the alternative representation using sequences (using variation selectors, for example, since the addition of specific modifiers for "styles" like "bold", "italic" or "monospace" was rejected for good reasons) was never really analyzed in terms of benefits and costs against the algorithms we already have (and that could have been updated); the second sketch below shows the situation this left us with. Mathematical symbols are (normally...) not used at all in the same context as plain alphabetic letters, even though there's absolutely no guarantee that they will always be distinguishable from them when they occur in some linguistic text rendered with the same style...

The naive thinking that disunification makes things simpler is completely wrong: an application that ignored all character properties and used only isolated characters would break legitimate rules in many cases, even for rendering purposes. It is in fact simpler to keep the sequences that are already encoded, or to extend them to cover more cases: e.g. add new variation sequences, introduce some new modifiers, not just new combining characters, and so on.

We were strongly told: Unicode encodes characters, not glyphs. This should be remembered (and the cost caused by disunifying distinct glyphs is also a good argument against it).
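Here is a minimal sketch of the first cost I mean, using Python's standard unicodedata module: the precomposed character and the combining sequence are canonically equivalent, but a naive binary comparison cannot see that, so a conformant algorithm has to consult the normalization data derived from those extra properties.

```python
import unicodedata

precomposed = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE (the legacy-charset form)
sequence = "e\u0301"    # "e" + COMBINING ACUTE ACCENT (the "normal" encoding)

# A naive comparison ignores the canonical-equivalence properties entirely...
print(precomposed == sequence)  # False

# ...so equivalence is only visible after normalizing both sides.
print(unicodedata.normalize("NFD", precomposed)
      == unicodedata.normalize("NFD", sequence))            # True
print(unicodedata.normalize("NFC", sequence) == precomposed)  # True
```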
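And a second sketch for the mathematical-letters case: the disunified code points are not canonically equivalent to the plain letters, so they survive NFC untouched; only the lossier compatibility normalization (NFKC) folds the style away. The final comment shows roughly what the rejected sequence-based alternative would have looked like; it is purely hypothetical, since no such style variation sequence was ever standardized.

```python
import unicodedata

math_bold_a = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A (the disunified character)
plain_a = "A"               # LATIN CAPITAL LETTER A

# Canonical normalization does not relate them: the two letters stay distinct.
print(unicodedata.normalize("NFC", math_bold_a) == plain_a)   # False

# Only compatibility normalization recovers the plain letter, discarding the style.
print(unicodedata.normalize("NFKC", math_bold_a) == plain_a)  # True

# Hypothetical only -- no variation sequence for "bold" was ever registered:
# bold_a_as_sequence = plain_a + "\uFE00"  # base letter + a style selector
```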
2016-03-17 8:20 GMT+01:00 Asmus Freytag (t) <[email protected]>:

> On 3/16/2016 11:11 PM, Philippe Verdy wrote:
>
>> "Disunification may be an answer?" We should avoid it as well.
>>
>> Disunification is only acceptable when
>> - there's a complete disunification of concepts....
>
> I think answering this question depends on the understanding of "concept",
> and on understanding what it is that Unicode encodes.
>
> When it comes to *symbols*, which is where the discussion originated,
> it's not immediately obvious what Unicode encodes. For example, I posit
> that Unicode does not encode the "concept" for specific mathematical
> operators, but the individual "symbols" that are used for them.
>
> For example, PRIME and DOUBLE PRIME can be used for minutes and seconds
> (both of time and arc) as well as for other purposes. Unicode correctly
> does not encode "MINUTE OF ARC", but the symbol used for that -- leaving it
> up to the notational convention to relate the concept and the symbol.
>
> Thus we have a case where multiple concepts match a single symbol. For the
> converse, we take the well-known case of COMMA and FULL STOP, which can both
> be used to separate a decimal fraction.
>
> Only in those cases where a single concept is associated so exclusively
> with a given symbol do we find the situation that it makes sense to treat
> variations in shape of that symbol as the same symbol, but with different
> glyphs.
>
> For some astrological symbols that is the case, but for others it is not.
> Therefore, the encoding model for astrological text cannot be uniform.
> Where symbols have exclusive association with a concept, the natural
> encoding is to encode symbols with an understood set of variant glyphs.
> Where concepts are denoted with symbols that are also used otherwise, then
> the association of concept to symbol must become a matter of notational
> convention and cannot form the basis of encoding: the code elements have to
> be on a lower level, and by necessity represent specific symbol shapes.
>
> A./

