Thanks to everyone who has commented, especially John Cowan, Doug Ewell, and David Starner (I'm on the digest, so I apologize if I haven't thanked someone who has provided substantial comments). Thanks too to Mr. Overington, though I agree with Mr. Kaplan that this is a bit too much work to avoid what is, for my purposes, the minor issue of overlapping PUA uses; I was hoping merely to find an existing registry which might have some overlap with the user community I'm concerned with. I'm replying mainly to Mr. Ewell's comments, which are the kind of counter-arguments I was hoping to be able to consider.
Sorry to be coy, but since I'm writing a proposal (not a Unicode proposal) to the authors of a couple of Unicode proposals for such a registry, and since the proposals which would be included in the registry are ones I did not have any hand in writing, I think it would be better for me to avoid too much precision until I've got the approval of the proposal writers (who would also be among the most important of my targeted users).

> There's no reason it has to be that way. Proposed glyphs are posted on
> the Unicode Web site months in advance of their "go live" date, even
> before the beta period, largely for this reason. I'm sure Unicode-aware
> type designers like John Hudson don't wait until a version of Unicode is
> formally released before they start designing glyphs.

True, but many scholarly communities are small enough that their needs might not be of interest to type designers with a wider targeted audience (like Mr. Hudson), and so they depend largely upon small typographers, even amateurs, to provide their type. In such cases, it would seem to me that a registry such as the one I'm suggesting would help to drive the transition. At any rate, I've already had two type designers who've done type for the community show interest in such a registry.

> One important point to remember is that any use or proposed use of the
> PUA, such as ConScript, is strictly up to private organizations, not the
> Unicode Consortium. To be sure, ConScript is the domain of two guys who
> are quite influential in Unicode, but they do not maintain ConScript in
> any official capacity as representatives of Unicode.

Fully aware of this. I'm thinking that this would be an improvement over the status quo, which is, as David Starner suggested, the use of informal private encodings or escaped entities.

> I would think you could simply use the version number of the Unicode
> Standard. For example, the use of Tagalog would have been conformant to
> this proposed PUA registry until Unicode version 3.2, at which time it
> would have to be removed from the registry because of its introduction
> into Unicode.

This had not occurred to me! The only thing that would militate against it is the case where additional characters are identified which have not yet been proposed and are proposed at a later date; that would require a new registry version number which would not correspond to a Unicode version number, and so might be distinguished with a letter suffix, etc. (I don't foresee this happening, but it's better to be safe than sorry, no?)

> Conformance to this registry, especially over a period of time, is up to
> the user community. The presence of a standard is no guarantee that it
> will be followed, or even noticed.

Excellent, this is the problem I was most concerned with. The target users for the registry would be a small number of electronic scholarly publishers in the community. The license for the fonts would require content providers using registry-based fonts to convert their character data to the Unicode-approved codepoints within, say, six months of release, and for the target publishers this wouldn't be a problem. If the distribution sites for the released fonts all included prominent links to the registry site, and the registry site provided information on the progress of the characters in the encoding process, this would, I hope, drive the adoption of later versions. So those outside the target user group would at least be made aware of the process by the license, and a mechanism would be in place to prevent the dead hand of the older versions of the registry from being quite so strong.

> Suppose Old Persian Cuneiform is encoded in Patrick's PUA registry next
> week, and that encoding achieves some popularity. Then suppose at some
> later date it is encoded in Unicode, say version 4.1. This will
> necessarily cause the encoding in Patrick's registry to be withdrawn, or
> at least deprecated.

I was thinking deprecated for two versions or two years, whichever was longer, and then ultimately withdrawn.

> How many people will switch immediately to the
> sanctioned Unicode encoding? How quickly will existing software and
> data be converted? Probably not right away, and the chances for a
> timely conversion are less if the private-use encoding is particularly
> successful, whether or not there are scripts available to help people
> make the conversion.

There would in fact be a published timetable. Of course, if the private-use encoding became popular enough that it was used OUTSIDE the targeted group of content providers, this would become an issue. But since the targeted group of content providers is pretty influential in the community (e.g., most users in the community would need to get a font that could be used to read the targeted group's content), I'm hoping that their transition would drive the transition of other content providers. So obviously this idea is strongly dependent upon the approval and cooperation of the targeted group of content providers, and would have to be abandoned if I did not convince them.

> This is exactly the reason for the "rigorous proposal/review policy"
> mentioned earlier, and perhaps the biggest drawback to the concept of a
> widespread PUA encoding for future Unicode scripts. It usually does
> take a while to get characters encoded in Unicode, not just because
> committees are big and slow and bureaucratic, but because there are real
> decisions to be made that can take a lot of time and research. Rushing
> these characters into use before Unicode and WG2 have finished making
> these decisions could subvert the process and create the dilemmas
> Patrick mentioned.
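(An aside on the conversion scripts mentioned above: once the registry publishes a mapping from its PUA code points to the codepoints eventually approved by Unicode, the data conversion itself is mechanical. A minimal sketch in Python — the code points shown are hypothetical placeholders, not actual registry or Unicode assignments:)

```python
# Sketch of a registry-to-Unicode conversion step. The mapping table is
# hypothetical: the real one would be published alongside each registry
# release, pairing each deprecated PUA code point with the code point
# approved by Unicode and WG2.
PUA_TO_APPROVED = {
    0xE000: 0x10330,  # hypothetical: registry PUA point -> approved point
    0xE001: 0x10331,
}

def convert(text):
    """Replace registry PUA characters with their approved equivalents,
    leaving all other characters untouched."""
    return text.translate(PUA_TO_APPROVED)

old_data = "\ue000\ue001"      # content encoded against the registry
new_data = convert(old_data)   # same content, approved code points
```

Content providers could run something like this over their character data within the timetable the license specifies.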
The point is that the registry would not be "rushing characters into use"; rather, these would be characters which are already in use via a variety of non-standardized methods and which are widely used in print in the community. I'm all too aware of why it takes time and research - for example, there are times when it is very difficult to distinguish a unique character from a variant letterform. However, there are characters which are unambiguously represented as entities in an existing private encoding, are present as glyphs in existing privately "encoded" fonts (which are not compatible with one another), and are clearly not merely alternate glyphs but unique characters. These are the characters which I would think could be included in such a registry, and they would have a very high probability (I'd guess 90% or more) of being encoded. But my ability (with the help of others who are familiar with both the principles of Unicode and with the needs of the community) to "predict" whether a character will be approved by Unicode and WG2 isn't going to be 100% accurate. So it would seem to me that the best route would be to include the proposals in toto and work out in advance what will be done if certain characters are not encoded. It seems obvious to me that if all the proposals were rejected for some reason, the PUA registry would just continue on as-is. But if there were hard-to-dispute reasons why a particular character of a proposal was rejected, that character would have to be discontinued in some way. Would deprecation without deletion make sense for this circumstance? Does this answer your objections, do you think? (I'm not asking if you're convinced, only if you think it's something you'd consider reasonable, even if you disagree with it.)

Another serious issue: the characters are such that I doubt they would be approved for the BMP.
Most of the tools being used by the users in the community in question (mostly Windows 98 and Mac OS 9 word processors and web browsers - yes, Mac OS 9 will be a problem anyway) are not yet able to handle supplementary-plane characters, at least not without serious intervention. The PUA code points used would therefore be in the BMP, because use of the supplementary-plane PUA (I don't remember the code points, so forgive me for not knowing what plane(s) they're in) would be an obstacle to adoption. The problem will be getting the targeted content providers to agree beforehand to convert their content to the approved codepoints when they become available, since the BMP code points are easier to support. Does anyone have any advice or prior experience for dealing with this issue?

Finally, are there any existing resources describing or testing support for PUA characters in existing applications, besides Alan Wood's test page? Perhaps at ConScript?

Thanks again for taking the time to answer these questions.

Patrick Rourke
[EMAIL PROTECTED]

