Re: The rules of encoding (from Re: Missing geometric shapes)

Asmus Freytag Thu, 08 Nov 2012 18:05:26 -0800

On 11/8/2012 4:39 PM, Mark E. Shoulson wrote:

On 11/08/2012 01:48 AM, William_J_G Overington wrote:
Michael Everson wrote:
< ... collect examples of these in print ...

Mark E. Shoulson wrote:
We don't encode "it would be nice/useful." We encode *characters*, glyphs that people use (yes, I know I conflated glyphs and characters there.)
...
Unicode isn't a system for encoding ratings. It's a system for encoding what people write and print.
I have at various times, as research has progressed, deposited with the British Library pdf documents that I have produced and published and I have deposited with the British Library TrueType fonts that I have produced and published and I have received email receipts for them.
Some of the pdf publications contain new symbols, used intermixed with text in a plain text situation. I have used Private Use Area encodings for the symbols.
Yet the publications have not been published in hardcopy form.
I think you may be taking me too literally. A PDF document which is essentially a proxy for a printed page (only cheaper to copy and produce) would count, to me, as usage "in print." I don't make the rules, but I think some of the Unicoders who do would agree. The charge of the rules being "out of date" because they demand usage is not an accurate one, and pointing to printing vs electronic usage is a red herring.
I have long complained about another writing system which I felt had trouble being encoded due to chicken-and-egg issues (Klingon), but even so people have been using it in the PUA; see http://qurgh.blogspot.com/ (now defunct, apparently, but the site is still there), and the KLI's collection of Qo'noS QonoS is available in Latin letters or in pIqaD in PUA.
I agree that there is something to the charge of chicken-and-egg issues with encoding writing systems (you can't write it until it's encoded, you can't encode it until it's written), but probably more with the amount of usage that has to be seen, not with the requirement that there be SOME usage.
I stand by it: we don't encode what would be cool to have. We encode what people *use*.

Actually, there are certain instances where characters are encoded based on expected usage.

Currency symbols are a well-known case for that, but there have been instances of phonetic characters encoded in order to facilitate creation and publication of certain databases for specialists, without burdening them with instant obsolescence (if they had used PUA characters).

If an important publisher of mathematical works (or publisher of important mathematical works) made a case for adding a recently created symbol so that they can go ahead and make it part of their standard repertoire, I would think it churlish to require them to create portability problems for their users by first creating documents with PUA encoding.

What these examples have in common is that they reflect a small number of characters with an "instant" user community that's well defined and understood (and appropriate to the type of character). The main reason for the restriction to "encode what people use" is that characters cannot be retracted if the hoped-for enthusiasm for them doesn't materialize.

The other reason is that the Unicode Standard is a standard - what it encodes needs to be worthy of standardization. There are exceptional instances where "leading" standardization can be justified - they are few and far between, but they exist. As exceptions prove the rule - the majority of characters will continue to be cases where standardization follows demonstrated use.

A./

