Thanks to everyone who has commented, especially John Cowan, Doug Ewell, 
and David Starner (I'm on the digest, and so apologize if I haven't 
thanked someone who has provided substantial comments). Thanks too to 
Mr. Overington, though I agree with Mr. Kaplan that, for my purposes, 
this is a bit too much work to avoid the minor issue of overlapping PUA 
uses; I was hoping merely to find an existing registry that might have 
some overlap with the user community I'm concerned with. I'm replying 
mainly to Mr. Ewell's comments, which are the kind of counter-arguments 
I was hoping to be able to consider.

Sorry to be coy, but I'm writing a proposal (not a Unicode proposal) to 
the authors of a couple of Unicode proposals for such a registry, and 
the proposals which would be included in the registry are ones I had no 
hand in writing.  I think it would be better for me to avoid too much 
precision until I've got the approval of the proposal writers (who 
would also be among the most important of my targeted users).

> There's no reason it has to be that way.  Proposed glyphs are posted on
> the Unicode Web site months in advance of their "go live" date, even
> before the beta period, largely for this reason.  I'm sure Unicode-aware
> type designers like John Hudson don't wait until a version of Unicode is
> formally released before they start designing glyphs.

True, but many scholarly communities are small enough that their needs 
might not be of interest to type designers with a wider target audience 
(like Mr. Hudson), and so they depend largely upon small typographers, 
even amateurs, to provide their type.  In such cases, it
would seem to me that a registry such as the one I'm suggesting would 
help to drive the transition.  At any rate, I've already had two type 
designers who've done type for the community show interest in such a 
registry.

> One important point to remember is that any use or proposed use of the
> PUA, such as ConScript, is strictly up to private organizations, not the
> Unicode Consortium.  To be sure, ConScript is the domain of two guys who
> are quite influential in Unicode, but they do not maintain ConScript in
> any official capacity as representatives of Unicode.

Fully aware of this.  I'm thinking that this would be an improvement 
over the status quo, which is, as David Starner suggested, the use of 
informal private encodings or escaped entities.

> I would think you could simply use the version number of the Unicode
> Standard.  For example, the use of Tagalog would have been conformant to
> this proposed PUA registry until Unicode version 3.2, at which time it
> would have to be removed from the registry because of its introduction
> into Unicode.

This had not occurred to me!  The only thing that would militate 
against this would be if additional characters, not yet proposed, were 
identified and proposed at a later date; that would require a new 
registry version number which would not correspond to a Unicode version 
number, and so might be distinguished using a letter suffix, etc.  (I 
don't foresee this happening, but it's better to be safe than sorry, 
no?)
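
For illustration, here's a minimal sketch (in Python) of how such 
version strings might be parsed and sorted, assuming the convention is 
a Unicode version number plus an optional letter suffix for 
registry-only interim releases (that convention is just my assumption, 
not anything settled):

import re

def parse_registry_version(v):
    """Split e.g. '3.2a' into a sortable key: ((3, 2), 'a')."""
    m = re.fullmatch(r"(\d+)\.(\d+)([a-z]?)", v)
    if not m:
        raise ValueError("unrecognized registry version: %s" % v)
    return (int(m.group(1)), int(m.group(2))), m.group(3)

# Plain Unicode-tracking versions sort before lettered interim ones:
# prints ['3.2', '3.2a', '4.0']
print(sorted(["4.0", "3.2a", "3.2"], key=parse_registry_version))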

> Conformance to this registry, especially over a period of time, is up to
> the user community.  The presence of a standard is no guarantee that it
> will be followed, or even noticed.

Excellent, this is the problem I was most concerned with.  The target 
users for the registry would be a small number of electronic scholarly 
publishers in the community.  The license for the fonts would strongly 
urge content providers using registry-based fonts to convert their 
character data to the Unicode-approved code points within, say, six 
months of release, and for the target publishers this wouldn't be a 
problem.  If the distribution sites for the released fonts all included 
prominent links to the registry site, and the registry site provided 
information on the progress of the characters through the encoding 
process, this would, I hope, drive the adoption of later versions.

So those outside the target user group would at least be made aware of 
the process by the license, and a mechanism would be in place to prevent 
the dead hand of the older versions of the registry from being quite so 
strong.
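
To make the conversion requirement concrete, here's a minimal sketch 
(in Python) of the sort of conversion script the registry could publish 
alongside each Unicode release; the code point pairs below are 
hypothetical placeholders, not real assignments:

# Maps registry PUA code points (BMP PUA, U+E000..U+F8FF) to the code
# points later assigned by Unicode.  Both columns are hypothetical.
PUA_TO_UNICODE = {
    0xE000: 0x10380,
    0xE001: 0x10381,
}

def convert(text):
    """Replace registry PUA characters with their approved equivalents."""
    return text.translate(PUA_TO_UNICODE)

# Content providers would run this over their character data within the
# six-month window after the relevant Unicode release.
updated = convert("\uE000\uE001 plain text passes through unchanged")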

> Suppose Old Persian Cuneiform is encoded in Patrick's PUA registry next
> week, and that encoding achieves some popularity.  Then suppose at some
> later date it is encoded in Unicode, say version 4.1.  This will
> necessarily cause the encoding in Patrick's registry to be withdrawn, or
> at least deprecated.

I was thinking deprecated for two versions or two years, whichever was 
longer, and then ultimately withdrawn.

> How many people will switch immediately to the
> sanctioned Unicode encoding?  How quickly will existing software and
> data be converted?  Probably not right away, and the chances for a
> timely conversion are less if the private-use encoding is particularly
> successful, whether or not there are scripts available to help people
> make the conversion.

There would in fact be a published timetable.  Of course, if the 
private-use encoding became popular enough that it was used OUTSIDE the 
targeted group of content providers, this would become an issue.  But 
since the targeted group of content providers is pretty influential in 
the community (e.g., most users in the community would need to get a 
font that could be used to read the targeted group's content), I'm 
hoping that their transition would drive the transition of other 
content providers.

So obviously this idea is strongly dependent upon the approval and 
cooperation of the targeted group of content providers, and so would 
have to be abandoned if I did not convince them.

> This is exactly the reason for the "rigorous proposal/review policy"
> mentioned earlier, and perhaps the biggest drawback to the concept of a
> widespread PUA encoding for future Unicode scripts.  It usually does
> take a while to get characters encoded in Unicode, not just because
> committees are big and slow and bureaucratic, but because there are real
> decisions to be made that can take a lot of time and research.  Rushing
> these characters into use before Unicode and WG2 have finished making
> these decisions could subvert the process and create the dilemmas
> Patrick mentioned.

The point is that the registry would not be "rushing characters into 
use"; the characters are already in use, via a variety of 
non-standardized methods, and are widely used in print in the 
community.

I'm all too aware of why it takes time and research - for example, 
there are times when it is very difficult to distinguish a unique 
character from a variant letterform.  However, there are characters 
which are unambiguously represented as entities in an existing private 
encoding, which are present as glyphs in existing privately "encoded" 
fonts (fonts that are not compatible with one another), and which are 
clearly not merely alternate glyphs but unique characters.  These are 
the characters I would think could be included in such a registry, and 
they would have a very high probability (I'd guess 90% or more) of 
being encoded.  But my ability (even with the help of others familiar 
with both the principles of Unicode and the needs of the community) to 
"predict" whether a character will be approved by Unicode and WG2 isn't 
going to be 100% accurate.  So it would seem to me that the best route 
would be to include the proposals in toto and work out in advance what 
will be done if certain characters are not encoded.

It seems obvious to me that if all the proposals were rejected for some 
reason, the PUA registry would just continue on as-is. But if there were 
hard-to-dispute reasons why a particular character of a proposal were 
rejected, that character would have to be discontinued in some way.  
Would deprecation without deletion make sense for this circumstance?
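
As a sketch of what "deprecation without deletion" might look like in 
the registry's data (in Python; the field names and statuses are my 
assumptions, not part of any actual registry format):

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    ACTIVE = "active"          # normal PUA assignment
    DEPRECATED = "deprecated"  # superseded or rejected, but still listed
    WITHDRAWN = "withdrawn"    # removed after the deprecation window

@dataclass
class RegistryEntry:
    pua_code_point: int        # e.g. 0xE000 (hypothetical)
    name: str
    status: Status = Status.ACTIVE
    # Set once Unicode assigns a standard code point:
    unicode_code_point: Optional[int] = None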

Does this answer your objections, do you think?  (I'm not asking if 
you're convinced, only if you think it's something you'd consider 
reasonable, even if you disagree with it.)

Another serious issue: the characters are such that I doubt they would 
be approved for the BMP.  Most of the tools being used by the users in 
the community in question (mostly Windows 98 and Mac OS 9 word 
processors and web browsers - yes, Mac OS 9 will be a problem anyway) 
are not yet able to handle supplementary-plane characters, at least not 
without serious intervention.  The PUA code points used would therefore 
be in the BMP, because use of the supplementary-plane PUA (Planes 15 
and 16: U+F0000..U+FFFFD and U+100000..U+10FFFD) would be an obstacle 
to adoption.  The problem will be getting the targeted content 
providers to agree beforehand to convert their content to the approved 
code points when they become available, since the BMP code points are 
easier to support.  Does anyone have any advice or prior experience for 
dealing with this issue?
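
For what it's worth, here's a minimal sketch (in Python) of an audit 
one could run over content to see which PUA ranges it actually uses, 
which might help in deciding whether a document can stay BMP-only for 
older tools:

BMP_PUA = range(0xE000, 0xF900)           # U+E000..U+F8FF
PLANE_15_PUA = range(0xF0000, 0xFFFFE)    # U+F0000..U+FFFFD
PLANE_16_PUA = range(0x100000, 0x10FFFE)  # U+100000..U+10FFFD

def pua_usage(text):
    """Count BMP vs. supplementary-plane PUA characters in a string."""
    counts = {"bmp": 0, "supplementary": 0}
    for ch in text:
        cp = ord(ch)
        if cp in BMP_PUA:
            counts["bmp"] += 1
        elif cp in PLANE_15_PUA or cp in PLANE_16_PUA:
            counts["supplementary"] += 1
    return counts

print(pua_usage("\uE000 mixed \U000F0000 content"))
# prints {'bmp': 1, 'supplementary': 1}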

Finally, are there any existing resources describing / testing support 
for PUA characters in existing applications, besides Alan Wood's test 
page?  Perhaps at ConScript?

Thanks again for taking the time to answer these questions.


Patrick Rourke
[EMAIL PROTECTED]

