On Wednesday 11 August 2010, Doug Ewell <[email protected]> wrote:
> Maybe (though I don't personally believe so) the concept of "plain text" has
> become so passé that William's variation selectors for swash e's, and
> additional ligatures, and weather reporting codes, and Portable Interpretable
> Object Code may one day be considered "within scope" for Unicode.
Variation selector pairs to access alternate glyphs, additional ligatures,
localizable sentences and a portable interpretable object code are not all in
the same category.
The matter of ligatures is distinctly different from the other items.
The problem with ligatures as encoded in regular Unicode is that they need an
advanced format font and an application that is aware of advanced format fonts.
Thus the golden ligatures collection of Private Use Area code points, started
in 2002, is still of use for producing hardcopy printouts
and for making graphics files for people who do not have access to a desktop
publishing program that can use an advanced format font. Hopefully, all desktop
publishing programs will one day have the capability to handle ligatures in the
regular Unicode manner. The golden ligatures collection is a solution that can
be useful until that time.
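As a sketch of the difference between the two approaches (the Private Use Area
code point shown is a hypothetical assignment chosen purely for illustration,
not one from the golden ligatures collection), in Python:

```python
# Two ways of requesting an "fi" ligature in stored text.

# Regular Unicode approach: store the plain letters and let an advanced
# format font (for example an OpenType font with a 'liga' feature) form
# the ligature at display time. The stored text remains searchable.
regular = "fi"

# Private Use Area approach: store a single code point that a specially
# made font maps directly to the ligature glyph. U+E700 is hypothetical.
pua = "\uE700"

print([hex(ord(c)) for c in regular])  # ['0x66', '0x69']
print(hex(ord(pua)))                   # '0xe700'
```

The trade-off described above follows directly: the PUA text displays as the
ligature even in simple applications, but only with a font that supports that
private assignment.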
The concept of "plain text" becoming passé is not a necessary condition for
encoding in Unicode character plus variation selector pairs to access alternate
glyphs, for encoding localizable sentences, for encoding a portable
interpretable object code, or for encoding vector graphics commands. They would
be encoded in the same manner as if they were plain text, not necessarily
because they are regarded as plain text.
They could easily be encoded in regular Unicode if there is a consensus that
doing so is desirable.
If such a consensus is formed, there is no need for what is regarded as plain
text to change. What is encoded in Unicode and what is regarded as plain text
need not be the same.
Unfortunately, the present policy appears to be that encoding cannot take place
proactively. A policy of proactive encoding need not
lead to a free-for-all of encoding as encoding would only be done after debate
and the formation of a consensus. A policy of proactive encoding would however
sweep away the present requirement of widespread existing usage needing to be
demonstrated as a necessary condition for encoding. Such a condition might not
be unreasonable where encoding is from letterpress printed books or from stone
carving from long ago: however, where the condition is required for modern
all-electronic communication, then, in my opinion, the condition is an
unreasonable shackle on progress and innovation.
On the specific matter in this thread of encoding character plus variation
selector pairs to access alternate glyphs: that encoding would not
need the allocation of any new code points. It would need the allocation of
character plus variation selector pairs. Those character plus variation
selector pairs would be unlikely to have any other uses if they are not
encoded. There could be a practice that only character plus variation selector
pairs using variation selector 5 onward were used for accessing alternate
glyphs, thus leaving four character plus variation selector pairs available for
other encoding.
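As a sketch of how such pairs are formed (the base character chosen here is an
illustrative assumption; the selector code points themselves are the standard
ones), in Python:

```python
# The sixteen variation selectors occupy U+FE00 (VS1) to U+FE0F (VS16).
VS = {n: chr(0xFE00 + n - 1) for n in range(1, 17)}

# A character plus variation selector pair is simply the base character
# followed by the selector; no new code points need to be allocated.
base = "e"           # illustrative base character (e.g. for a swash form)
pair = base + VS[5]  # VS5 = U+FE04, per "variation selector 5 onward"

print([hex(ord(c)) for c in pair])  # ['0x65', '0xfe04']
```

A font that does not support the pair simply displays the base character, which
is the graceful-fallback behaviour mentioned later in this post.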
What I find a problem at present is this. If some character plus variation
selector pairs for accessing alternate glyphs were encoded into regular
Unicode, it seems to me (am I correct in this?) that they could be used with
existing advanced-font-aware application programs immediately; it would just be
a matter of one or more fonts that use them becoming available. Yet in order to
get them encoded, it appears that many texts using a Private Use Area encoding
would need to be produced, causing problems for web archiving and search
engines that a regular Unicode encoding would not cause. All the while, texts
produced using a Private Use Area encoding would display incorrectly with fonts
that do not support the alternate glyphs, whereas proper character plus
variation selector pairs would not produce those problems.
I recognize that there may be good reasons of which I am unaware at the time of
writing this text for Unicode and ISO not providing facilities for proactive
encoding, yet I wonder if it would be a good idea to review the policy now in
2010, in case it is just a matter of policies made long ago still being applied
when they are no longer desirable.
Now certainly, if the policy were changed so that proactive encoding is
possible when a consensus can be achieved, that does not mean that all, or
indeed any, of my own ideas, currently encoded into the Private Use Area, would
necessarily be encoded into regular Unicode.
Consider please the matter of emoji. If more emoji are to be encoded, why is it
necessary for them first to be used in the Private Use Area, possibly with
several different encodings, before being encoded into Unicode and ISO 10646? I
say that it would be better to allow ideas for new emoji to be submitted
proactively, with a view to encoding some each year. That would encourage user
interest and provide for product upgrading.
Returning to the topic of this thread, it seems to me that it would be good for
there to be proactive encoding into Unicode of some character plus variation
selector pairs to access alternate glyphs. As far as I know, it would do no
harm and would be fun.
Regarding the policy that prevents proactive encoding at the present time: is
that policy written down anywhere in a formal document? If so, is that text
available publicly, and is there a procedure whereby a review of the
possibility of changing that policy can be requested, please?
William Overington
21 August 2010