On 3/22/2013 4:16 AM, Philippe Verdy wrote:
2013/3/22 Asmus Freytag <[email protected]>:
The number of conventions that can be applicable to certain punctuation
characters is truly staggering, and it seems unlikely that Unicode is the
right place to
a) discover all of them or
b) standardize an expression for them.
My intent is certainly not to discover and encode all of them. But
existing characters are well known for having very common distinct
semantics which merit separate encodings.

This claim would have to be scrutinized, and, to be accepted, would require very detailed evidence. Also, on what principles would you base the requirement to make a distinction in encoding?

And this notably includes their use as numeric grouping separators or decimal separators.

Well, the standard currently rules that such use does not warrant separate encoding - and the standard has been consistent about that for the entire 20+ years of its existence.

Further, all other character encoding standards have encoded these characters as unified with ordinary punctuation. This is very different from the ANO TELEIA discussion, where an argument could be made that *before* Unicode, the character occurred only in *specific* character sets - and that was a distinction that was lost when these character sets were mapped to Unicode.
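For reference, U+0387 GREEK ANO TELEIA did make it into the standard, but only with a canonical decomposition to the ordinary MIDDLE DOT, so the distinction evaporates under normalization anyway. A quick check (plain Python, standard unicodedata calls; the snippet itself is only my illustration):

    import unicodedata

    ano_teleia = "\u0387"   # GREEK ANO TELEIA
    middle_dot = "\u00B7"   # MIDDLE DOT

    # Singleton canonical decomposition: ANO TELEIA folds to MIDDLE DOT
    # under both NFD and NFC.
    print(unicodedata.decomposition(ano_teleia))                   # 00B7
    print(unicodedata.normalize("NFC", ano_teleia) == middle_dot)  # True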

No such argument exists for either middle dot or raised decimal point (except insofar as you could possibly claim that raised decimal point had never been encoded properly before, but then you'd have to show some evidence for that position).

Such common semantic modifiers would be easier to support than
encoding many new special variants of characters (that won't even be
rendered by most applications, and thus won't be used).

That might be the case - except that they would introduce a number of problems. Any "modifier" that has no appearance of its own can get separated from the base character during editing.

The huge base of installed software is not prepared to handle an entirely different *kind* of character code, whereas support for simple character additions is something that will eventually percolate through most systems - that fact makes disunifications a much more straightforward process.

Some examples: the invisible multiplication sign, the invisible
function sign,

Nah, these are not modifiers. They stand on their own. Their "invisibility" is not ideal, but not any worse than "word joiner" or "zwsp". All of these characters are separators - with the difference that the nature of the separator was determined to be crucial enough to encode explicitly. (And of course, reasonable people can disagree on each case).

Note that Unicode cloned several characters based on their word-break (or non-break) behavior, which is not a novel idea (earlier character encodings did the same with "no break space"). Already at that stage the train of having a "word break attribute character" (what you call a modifier) had left the station.
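To make that concrete: these are ordinary, standalone code points in the UCD, not attributes attached to a neighbouring character. A quick illustration (plain Python, standard unicodedata module - the snippet is only my illustration, not anything from the standard):

    import unicodedata

    # Each of these "invisible" characters is a character in its own
    # right, with its own code point, name and properties.
    for ch in ("\u2062",    # INVISIBLE TIMES
               "\u2061",    # FUNCTION APPLICATION
               "\u2060",    # WORD JOINER
               "\u200B",    # ZERO WIDTH SPACE
               "\u00A0"):   # NO-BREAK SPACE (the older precedent)
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")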

The only way to handle these issues, for better or for worse, is by disunification (where that can be justified in exceptional circumstances).

and even the Latin/Greek mathematical letter-symbols,
which were only encoded to capture style differences that carry
occasional but rare semantic distinctions. For me, adding those
variants was really pseudo-coding, breaking the fundamental encoding
model, complicating the task for font creators and renderer designers,
and greatly increasing the size and complexity of collation tables.

Many of these character variants could have been expressed as a base
character and some modifier (whose distinct rendering was only
optional), allowing a much easier integration and better use. Because
of that the UCD is full of added variants that are almost never
used, and we have to live with encoded texts that persist in using
ambiguous characters for the most common possible distinctions.

No, for the math alphabetics you would have had to have a modifier that was *not* optional, breaking the variation selector model.

There was certainly discussion of a "combining bold" or "combining italic" at the time.

One of the major reasons this was rejected was the desire to prevent the creation of such "operators" that could be applied to *every* character in the standard.

And, of course, the desire to allow ordinary software to do the right thing in displaying these - the whole infrastructure to handle such modifiers would have been lacking.

Further, when you use an italic "a" in math, you do not need most (or all) software to be aware that this relates to an ordinary "a" in any way. It doesn't, really, except in text-to-speech conversion or similar, highly specialized tasks. So, unlike variation selectors, there would have been no benefit in using a modifier.
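Where such specialized processing does need the relationship - a screen reader, say - it can recover it through compatibility normalization. A minimal Python sketch to illustrate (standard unicodedata calls only):

    import unicodedata

    italic_a = "\U0001D44E"            # MATHEMATICAL ITALIC SMALL A
    print(unicodedata.name(italic_a))

    # The styled letter is a code point of its own; only NFKC
    # compatibility normalization folds it back to the plain "a".
    print(unicodedata.normalize("NFKC", italic_a))   # a
    print(italic_a == "a")                           # False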

A./
