On Wednesday 11 August 2010, Doug Ewell <[email protected]> wrote:
> Maybe (though I don't personally believe so) the concept of "plain text" has
> become so passé that William's variation selectors for swash e's, and
> additional ligatures, and weather reporting codes, and Portable Interpretable
> Object Code may one day be considered "within scope" for Unicode.
Variation selector pairs to access alternate glyphs, additional ligatures,
localizable sentences and a portable interpretable object code are not all in
the same category.
The matter of ligatures is distinctly different from the other items.
The problem with ligatures as encoded in regular Unicode is that they need an
advanced format font and an application that is aware of advanced format fonts.
Thus the golden ligatures collection of Private Use Area code points, started
in 2002, is still of use for producing hardcopy printouts
and for making graphics files for people who do not have access to a desktop
publishing program that can use an advanced format font. Hopefully, all desktop
publishing programs will one day have the capability to handle ligatures in the
regular Unicode manner. The golden ligatures collection is a solution that can
be useful until that time.
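As a sketch of the difference between the two approaches (the Private Use Area
code point shown is a hypothetical assignment chosen purely for illustration,
not one from the golden ligatures collection), in Python:

```python
# Two ways of requesting an "fi" ligature in stored text.

# Regular Unicode approach: store the plain letters and let an advanced
# format font (for example an OpenType font with a 'liga' feature) form
# the ligature at display time. The stored text remains searchable.
regular = "fi"

# Private Use Area approach: store a single code point that a specially
# made font maps directly to the ligature glyph. U+E700 is hypothetical.
pua = "\uE700"

print([hex(ord(c)) for c in regular])  # ['0x66', '0x69']
print(hex(ord(pua)))                   # '0xe700'
```

The trade-off described above follows directly: the PUA text displays as the
ligature even in simple applications, but only with a font that supports that
private assignment.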
The concept of "plain text" becoming passé is not a necessary condition for
encoding in Unicode character plus variation selector pairs to access alternate
glyphs, for encoding localizable sentences, for encoding a portable
interpretable object code, or for encoding vector graphics commands. They would
be encoded in the same manner as if they were plain text, not necessarily
because they are regarded as plain text.
They could easily be encoded in regular Unicode if there is a consensus that
doing so is desirable.
If such a consensus is formed, there is no need for what is regarded as plain
text to change. What is encoded in Unicode and what is regarded as plain text
need not be the same.
Unfortunately, the present policy appears to be that encoding cannot take place
proactively. A policy of proactive encoding need not
lead to a free-for-all of encoding as encoding would only be done after debate
and the formation of a consensus. A policy of proactive encoding would however
sweep away the present requirement of widespread existing usage needing to be
demonstrated as a necessary condition for encoding. Such a condition might not
be unreasonable where encoding is from letterpress printed books or from stone
carving from long ago: however, where the condition is required for modern
all-electronic communication, then, in my opinion, the condition is an
unreasonable shackle on progress and innovation.
On the specific matter in this thread of encoding character plus variation
selector pairs to access alternate glyphs: that encoding would not
need the allocation of any new code points. It would need the allocation of
character plus variation selector pairs. Those character plus variation
selector pairs would be unlikely to have any other uses if they are not
encoded. There could be a practice that only character plus variation selector
pairs using variation selector 5 onward were used for accessing alternate
glyphs, thus leaving four character plus variation selector pairs available for
other encoding.
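As a sketch of how such pairs are formed (the base character chosen here is an
illustrative assumption; the selector code points themselves are the standard
ones), in Python:

```python
# The sixteen variation selectors occupy U+FE00 (VS1) to U+FE0F (VS16).
VS = {n: chr(0xFE00 + n - 1) for n in range(1, 17)}

# A character plus variation selector pair is simply the base character
# followed by the selector; no new code points need to be allocated.
base = "e"           # illustrative base character (e.g. for a swash form)
pair = base + VS[5]  # VS5 = U+FE04, per "variation selector 5 onward"

print([hex(ord(c)) for c in pair])  # ['0x65', '0xfe04']
```

A font that does not support the pair simply displays the base character, which
is the graceful-fallback behaviour mentioned later in this post.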
What I find a problem at present is this. If some character plus variation
selector pairs for accessing alternate glyphs were encoded into regular
Unicode, it seems to me (am I correct in this?) that they could be used with
existing advanced-font-aware application programs immediately; it would just be
a matter of one or more fonts that use them becoming available. Yet in order to
get them encoded, it appears that many texts using a Private Use Area encoding
would need to be produced, causing problems for web archiving and search
engines that a regular Unicode encoding would not cause. All the while, texts
produced using a Private Use Area encoding would display incorrectly with fonts
that do not support the alternate glyphs, whereas proper character plus
variation selector pairs would not produce those problems.
I recognize that there may be good reasons of which I am unaware at the time of
writing this text for Unicode and ISO not providing facilities for proactive
encoding, yet I wonder if it would be a good idea to review the policy now in
2010, in case it is just a matter of policies made long ago still being applied
when they are no longer desirable.
Now certainly, if the policy were changed so that proactive encoding is
possible when a consensus can be achieved, that does not mean that all, or
indeed any, of my own ideas, currently encoded into the Private Use Area, would
necessarily be encoded into regular Unicode.
Consider please the matter of emoji. If more emoji are to be encoded, why is it
necessary for them first to be used in the Private Use Area, possibly with
several different encodings, before being encoded into Unicode and ISO 10646? I
say that it would be better to allow ideas for new emoji to be submitted
proactively, with a view to encoding some each year. That would encourage user
interest and provide for product upgrading.
Returning to the topic of this thread, it seems to me that it would be good for
there to be proactive encoding into Unicode of some character plus variation
selector pairs to access alternate glyphs. As far as I know, it would do no
harm and would be fun.
Regarding the policy that prevents proactive encoding at the present time: is
that policy written down anywhere in a formal document? If so, is that text
available publicly, and is there a procedure whereby a review of the
possibility of changing that policy can be requested, please?
William Overington
21 August 2010