On 11/8/2012 4:39 PM, Mark E. Shoulson wrote:
On 11/08/2012 01:48 AM, William_J_G Overington wrote:
Michael Everson <[email protected]> wrote:
< ... collect examples of these in print ...
Mark E. Shoulson <[email protected]> wrote:
We don't encode "it would be nice/useful." We encode *characters*,
glyphs that people use (yes, I know I conflated glyphs and
characters there).
...
Unicode isn't a system for encoding ratings. It's a system for
encoding what people write and print.
I have at various times, as research has progressed, deposited with
the British Library PDF documents that I have produced and published,
and I have deposited with the British Library TrueType fonts that I
have produced and published, and I have received email receipts for them.
Some of the PDF publications contain new symbols, used intermixed
with text in a plain text situation. I have used Private Use Area
encodings for the symbols.
Yet the publications have not been published in hardcopy form.
I think you may be taking me too literally. A PDF document which is
essentially a proxy for a printed page (only cheaper to copy and
produce) would count, to me, as usage "in print." I don't make the
rules, but I think some of the Unicoders who do would agree. The
charge of the rules being "out of date" because they demand usage is
not an accurate one, and pointing to printing vs electronic usage is a
red herring.
I have long complained about another writing system which I felt had
trouble being encoded due to chicken-and-egg issues (Klingon), but
even so people have been using it in the PUA; see
http://qurgh.blogspot.com/ (now defunct, apparently, but the site is
still there), and the KLI's collection of Qo'noS QonoS is available in
Latin letters or in pIqaD in PUA.
I agree that there is something to the charge of chicken-and-egg
issues with encoding writing systems (you can't write it until it's
encoded, you can't encode it until it's written), but probably more
with the amount of usage that has to be seen, not with the requirement
that there be SOME usage.
I stand by it: we don't encode what would be cool to have. We encode
what people *use*.
Actually, there are certain instances where characters are encoded based
on expected usage.
Currency symbols are a well-known case for that, but there have been
instances of phonetic characters encoded in order to facilitate creation
and publication of certain databases for specialists, without burdening
them with instant obsolescence (if they had used PUA characters).
If an important publisher of mathematical works (or publisher of
important mathematical works) made a case for adding a recently created
symbol so that they can go ahead and make it part of their standard
repertoire, I would think it churlish to require them to create
portability problems for their users by first creating documents with
PUA encoding.
What these examples have in common is that they reflect a small number
of characters with an "instant" user community that's well defined and
understood (and appropriate to the type of character). The main reason
for the restriction to "encode what people use" is that characters
cannot be retracted if the hoped-for enthusiasm for them doesn't
materialize.
The other reason is that the Unicode Standard is a standard - what it
encodes needs to be worthy of standardization. There are exceptional
instances where "leading" standardization can be justified - they are
few and far between, but they exist. As exceptions prove the rule - the
majority of characters will continue to be cases where standardization
follows demonstrated use.
A./