Re: Rendering Raised FULL STOP between Digits

Asmus Freytag Sat, 09 Mar 2013 11:37:01 -0800

Richard,

the situation with the raised decimal point is a mess in Unicode.

I know that Mark thinks we have too many dots, but the reason this case is a mess is that the unification with U+002E is both unworkable in practice and contrary to precedent.

The precedent in Unicode is to encode characters separately when they have different appearances, unless it is fundamentally the "same" character and the difference in appearance can be determined unambiguously from "context".

There are two primary kinds of context that Unicode admits here. One is based on surrounding text (such as positional forms of Arabic letters). The other is overall stylistic context, such as a font choice (such as upright vs. slanted integral symbols).

When the appearance of a character differs based on the author's intent, and two (or more) different appearances can occur in the same document with different significance, then the usual response by Unicode has been to encode explicit characters. (The phonetic characters are full of examples of this, like the lower case a without hook or the g with hook, both of which need to be distinguishable from other forms of these letters in phonetics.)

So, if a British document can use both inline dots and raised dots, then you can't assign a single font to cover both. Well, the thought was, software might recognize the numeric context. However, as you've pointed out, section numbers are numeric and do not have the raised dot. In fact, as far as such documents are concerned, the raised dot itself can be used by the reader to distinguish decimal numbers from other uses of numbers separated by dots (something not possible in other languages that lack this convention).
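To make the section-number problem concrete, here is a minimal sketch of the kind of context-based substitution described above. The function name and the sample sentence are invented for illustration, and U+00B7 is used only as a stand-in for the raised glyph; a real renderer would do this at the glyph level, not by rewriting text.

```python
import re

# Stand-in for the raised form; using U+00B7 here is an assumption of this sketch.
RAISED_DOT = "\u00b7"

def naive_raise(text: str) -> str:
    """Raise every FULL STOP that sits directly between two ASCII digits."""
    return re.sub(r"(?<=[0-9])\.(?=[0-9])", RAISED_DOT, text)

sample = "See section 2.1: the value is 3.14"
print(naive_raise(sample))
# Both "2.1" and "3.14" are rewritten, although only the second is a decimal number.
```

The digit-dot-digit pattern is all the software can see, so the section number is raised along with the decimal; telling them apart would need extra information supplied by the author.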

So, on the face of it, the choice to unify the raised decimal dot with 002E violates the encoding model, by pushing semantic distinctions into some kind of markup. On top of that, it's not really practical to expect to have to either mark up all decimal numbers or all section numbers with separate styles or font bindings. That's something not required anywhere else.


So far, that's bad enough.

Next, you have the issue that Unicode refused (quite properly) to encode a generic "decimal separator" character, the appearance of which was supposed to vary with external context (like locale or a document global style). This suggestion had been intended to allow numerical expressions to be cut and pasted between documents in different languages with all numbers formatted correctly w/o further editing. That is, the same character would appear as either comma or period (or raised period) depending on context.

I wrote that I agreed with the choice not to encode such a special character for that purpose. However, by not encoding a character for the raised decimal point, Unicode did an about-face and made 002E a "limited purpose" version of a "decimal separator". Suddenly, there is a character that is supposed to have different appearance based on context: on the line for US documents, off the line for British documents.

This directly violates the precedent established by the refusal to encode the generic "decimal separator".


What can be done?

I believe the Unicode Standard should be fixed by explicitly removing all suggestions in the text that the raised decimal point is unified with 002E.

Second, the standard should be amended by identifying which character is to be used instead for this purpose.

It might be something like 00B7. In that case, 00B7 would have to have properties that effectively produce the correct result in numeric context, while leaving non-numeric context unchanged. I believe that is entirely possible, and non-disruptive, insofar as numeric use of 00B7 does not exist for any purpose other than showing a raised decimal point (I suspect there are documents in the wild that already use this character for that purpose).
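As a rough illustration of what a separate character would buy, the sketch below assumes the author has typed U+00B7 for the raised decimal point and U+002E for everything else. The helper name and sample strings are hypothetical, and this is not a statement of what the standard specifies.

```python
import re

MIDDLE_DOT = "\u00b7"  # assumed here to be the character chosen for the raised decimal point

# A decimal number, under this assumption, is digits + U+00B7 + digits.
DECIMAL = re.compile("[0-9]+" + MIDDLE_DOT + "[0-9]+")

def extract_decimals(text: str) -> list[float]:
    """Return the decimal numbers written with U+00B7, ignoring dotted section numbers."""
    return [float(m.group().replace(MIDDLE_DOT, ".")) for m in DECIMAL.finditer(text)]

print(extract_decimals("Section 2.1 gives 3\u00b714 and 0\u00b75"))
# Only the two numbers written with U+00B7 are picked up; "2.1" is left alone.
```

With the distinction carried by the character itself, no markup or font binding is needed, and U+00B7 outside a digit-to-digit position (for example between letters) is not touched by the pattern.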

If that alternative is deemed not acceptable, the only remaining choice would be to add a new character. (I would recommend that only as the last resort).

A./
