On 11/8/2012 4:39 PM, Mark E. Shoulson wrote:
On 11/08/2012 01:48 AM, William_J_G Overington wrote:
Michael Everson <[email protected]> wrote:

< ... collect examples of these in print ...

Mark E. Shoulson <[email protected]> wrote:

We don't encode "it would be nice/useful." We encode *characters*, glyphs that people use (yes, I know I conflated glyphs and characters there.)
...
Unicode isn't a system for encoding ratings. It's a system for encoding what people write and print.

I have at various times, as research has progressed, deposited with the British Library PDF documents and TrueType fonts that I have produced and published, and I have received email receipts for them.

Some of the PDF publications contain new symbols, used intermixed with text in a plain text situation. I have used Private Use Area encodings for the symbols.

Yet the publications have not been published in hardcopy form.

I think you may be taking me too literally. A PDF document which is essentially a proxy for a printed page (only cheaper to copy and produce) would count, to me, as usage "in print." I don't make the rules, but I think some of the Unicoders who do would agree. The charge of the rules being "out of date" because they demand usage is not an accurate one, and pointing to printing vs electronic usage is a red herring.

I have long complained about another writing system which I felt had trouble being encoded due to chicken-and-egg issues (Klingon), but even so people have been using it in the PUA; see http://qurgh.blogspot.com/ (now defunct, apparently, but the site is still there), and the KLI's collection of Qo'noS QonoS is available in Latin letters or in pIqaD in PUA.

I agree that there is something to the charge of chicken-and-egg issues with encoding writing systems (you can't write it until it's encoded, you can't encode it until it's written), but probably more with the amount of usage that has to be seen, not with the requirement that there be SOME usage.

I stand by it: we don't encode what would be cool to have. We encode what people *use*.


Actually, there are certain instances where characters are encoded based on expected usage.

Currency symbols are a well-known case of that, but there have also been instances of phonetic characters encoded in order to facilitate the creation and publication of certain databases for specialists, without burdening them with the instant obsolescence they would have faced had they used PUA characters.

If an important publisher of mathematical works (or publisher of important mathematical works) made a case for adding a recently created symbol so that they can go ahead and make it part of their standard repertoire, I would think it churlish to require them to create portability problems for their users by first creating documents with PUA encoding.

What these examples have in common is that they reflect a small number of characters with an "instant" user community that's well defined and understood (and appropriate to the type of character). The main reason for the restriction to "encode what people use" is that characters cannot be retracted if the hoped-for enthusiasm for them doesn't materialize.

The other reason is that the Unicode Standard is a standard - what it encodes needs to be worthy of standardization. There are exceptional instances where "leading" standardization can be justified - they are few and far between, but they exist. These exceptions prove the rule - the majority of characters will continue to be cases where standardization follows demonstrated use.

A./
