Kenneth Whistler wrote, as part of a longer response to my original posting.
>William Overington asked:

[snip]

>> I wonder if consideration could please be given as to whether this matter
>> should be left unregulated or whether some level of regulation should be
>> used.

>I think this should depend first on a determination of whether there
>is a demonstrated need for an actual representation of these sequences --
>which ought to be determined by the people responsible for the
>data stores which might contain them, namely the online bibliographic
>community.

[further remarks here snipped]

Actually, "this matter" to which I was intending to refer was as follows, being more general than just the romanization of Cyrillic characters.

quote

It seems to me that this matter of sequences of combining characters being used to give glyphs where different meanings are needed other than just locally and that glyphs for such meanings are only correctly displayed if a particular rendering system or a particular font are used touches at the roots of the Unicode system.

It seems to me that the glyphs for such sequences are being left as if they were a Private Use Area unregulated system. I recognize that fonts have glyph variations in that, say, an Arial letter b looks different to a Bookman Old Style letter b, yet in that case the meaning is the same.

I wonder if consideration could please be given as to whether this matter should be left unregulated or whether some level of regulation should be used.

end quote

In another post in the same thread, Ken states as follows.

quote

But that wasn't my point. There is no particular evidence that the ALA-LC conventions with the dot above the graphic ligature ties is in widespread use for romanizations of these particular languages, that I can see. So the *urgency* of solving this problem isn't there, unless the LC/library/bibliographic community comes to the UTC and indicates that they have a data interchange problem with USMARC records using ANSEL that requires a clear representation solution in Unicode.

end quote

The problem on which I am seeking discussion is whether, in the present state of the rules, there would be any need for any bibliographic community to approach the Unicode Consortium over such a matter and, if they would not need to do so, whether it would be better to seek to change the rules now.

It is convenient to consider the situation in relation to the romanization of Cyrillic characters, yet similar considerations may well also apply to topics such as the Byzantine legal texts. There may well be other topics to which similar considerations apply.

For example, please suppose that there were a committee called the Romanization of Cyrillic Committee. Suppose that that committee were to have various meetings and decide that, for a ts romanization ligature, t U+FE20 s U+FE21 suits them fine, and that, for the ts with a dot above romanization ligature, t U+FE20 s U+FE21 U+0307 suits them fine. Suppose further that the committee then publishes a list of assignments and example glyphs. The glyph for the ts with a dot above ligature in that publication has the dot above the curved line, centred horizontally.

It is only later that someone with expert knowledge of the Unicode standard sees the published list and notices that the glyph shown in the document is, in fact, not the way that the glyph should appear according to the Unicode standard. By this time, many copies of the document have been published and sent to libraries around the world! Databases have started to be converted to what that publication may well be calling "the new Unicode based system".

This might sound impossible, yet what is the present alternative? There is no way to formally register such sequences with the Unicode Consortium!

I suggest that it might be a good idea to have an infrastructure whereby the Unicode Consortium registers sequences of combining characters and example glyphs, categorized as to application. This would have potentially far reaching benefits.
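
As a minimal sketch of the two sequences in the hypothetical committee example above, the following Python uses the standard `unicodedata` module to list each code point with its character name and canonical combining class. The variable and function names are my own, chosen for illustration. The sketch also checks whether moving the dot above to sit before the right half mark would give a canonically equivalent sequence. It cannot show how any particular rendering system or font would position the dot; it only shows what the character data says about the sequences.

```python
import unicodedata

# The two sequences from the hypothetical committee's list.
TS_LIGATURE = "t\uFE20s\uFE21"
TS_LIGATURE_DOT = "t\uFE20s\uFE21\u0307"


def describe(sequence):
    """Return (code point, character name, combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in sequence
    ]


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    for row in describe(TS_LIGATURE_DOT):
        print(row)

    # An alternative ordering (illustrative only): dot above placed
    # before the right half mark instead of after it.
    reordered = "t\uFE20s\u0307\uFE21"
    print(canonically_equivalent(TS_LIGATURE_DOT, reordered))
```

The three combining marks involved all have the same combining class, so normalization does not reorder them, and the two orderings are distinct sequences as far as canonical equivalence is concerned. A published list would therefore need to state exactly which ordering is intended.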

Suppose, for example, that such an infrastructure existed, and that there is a mathematician, M, and a font designer, F, who do not know each other.

M is writing a research paper on a particular branch of mathematics, where one of the key reference papers was written by an author whose name is written in Cyrillic characters, yet which name also has a romanized version. M finds that that romanization needs a character to represent the ts romanization ligature. How can M, who is using a word processor to prepare the research paper, insert that character into the document? M is keen to insert the ts ligature in a form compatible with the standard bibliographic method for romanization of Cyrillic names. Fortunately, M finds that the word processor has available various special characters and finds a ts ligature and inserts it in the document. Behind the scenes the word processor software inserts the correct Unicode sequence for the ts ligature. The display is excellent.

However, as well as depending on the word processor software having the capability to add the ts ligature sequence, the display is only possible because F had, when updating the design of the mainstream roman font R which F designed, included glyphs for various sequences of characters used for representing romanization of Cyrillic characters. F is pleased to have done that, so that text set in the R font will, if some end user chooses to include some romanization of Cyrillic characters in a document, have iu, IU, ts and TS ligatures (etc) all appear in an elegant form. F is pleased that the R font can be used by end users in so many different areas of application, because not only has F included sequences for romanization of Cyrillic ligatures, F has also included ligatures for Byzantine legal texts and for various other specialist application areas where a general purpose roman font, such as R, might well be used by some of the end user community.

F has found this quite straightforward to do. Although not an expert in the underlying theory of either the romanization of Cyrillic characters or the encoding of Byzantine legal codes, F has the advantage of simply monitoring the Unicode website and, whenever a new collection of sequences is published, deciding whether to include those sequences in the various fonts which F looks after. Actually, F has, thus far, included all of the published sequences in the R font. However, F has only included a few of the sequences in various other fonts. For example, for the sequences for Byzantine legal codes, F included special glyphs for each of the sequences in a decorative font based upon the handwriting of a Byzantine scribe.

Stepping back outside the hypothesis, what we have now, even with the best quality advice, is no more than the equivalent of legal opinion on what a sequence means: registering sequences and their glyphs would be the equivalent of a ruling by a court of record.

For the avoidance of doubt, I am not suggesting that every possible sequence of characters be registered. I am simply suggesting that a registration procedure might well be helpful to the end user community, so that authors of documents, font designers and others would all be in step regarding which sequences to use for particular applications, and regarding which sequences to consider including in fonts as sequences that produce a specific glyph, rather than the rendering system needing to rely on default combinations of combining characters which might produce a poor typographic display.

I feel that there is presently the opportunity for the Unicode Consortium to provide this facility to the end user community.
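
To make the idea of a registration list a little more concrete, here is a rough sketch in Python of how a published list of sequences, categorized as to application, might be held and consulted. Everything in it is hypothetical: no such registry exists, and the names `RegisteredSequence`, `REGISTRY`, `sequences_for` and `find_registered` are invented for illustration. The two entries are simply the ts sequences from the committee example earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredSequence:
    """One entry in a hypothetical published list of sequences."""
    sequence: str      # the registered character sequence
    application: str   # the application area it is registered for
    description: str   # a human-readable label for the intended glyph


# Hypothetical registry content, taken from the example in this post.
REGISTRY = [
    RegisteredSequence("t\uFE20s\uFE21", "romanization of Cyrillic",
                       "ts ligature"),
    RegisteredSequence("t\uFE20s\uFE21\u0307", "romanization of Cyrillic",
                       "ts ligature with dot above"),
]


def sequences_for(application, registry=REGISTRY):
    """Entries a font designer might review for one application area."""
    return [entry for entry in registry if entry.application == application]


def find_registered(text, registry=REGISTRY):
    """Registered entries that occur in text, longest sequence first.

    A longer sequence may contain a shorter registered one as a prefix,
    in which case both are reported.
    """
    hits = [entry for entry in registry if entry.sequence in text]
    return sorted(hits, key=lambda entry: len(entry.sequence), reverse=True)


if __name__ == "__main__":
    for entry in sequences_for("romanization of Cyrillic"):
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in entry.sequence)
        print(entry.description, ":", code_points)
```

In this picture, a font designer such as F would call something like `sequences_for` when deciding which glyphs to add to a font, while word processor software could use the same list to offer M the correct sequence for a chosen ligature.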

If the matter of establishing the infrastructure is left for too long, perhaps until some specific criterion of practical need is met, then it may well be that there is typographic chaos in the matter and that the matter will never then be right due to various legacy systems by then being in use.

So, I ask whether this matter could please be considered.

William Overington

19 September 2002

