Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

William Overington Thu, 19 Sep 2002 11:10:16 -0700

Kenneth Whistler wrote, as part of a longer response to my original posting:


>William Overington asked:

[snip]

>> I wonder if consideration could please be given as to whether this matter
>> should be left unregulated or whether some level of regulation should be
>> used.

>I think this should depend first on a determination of whether there
>is a demonstrated need for an actual representation of these sequences --
>which ought to be determined by the people responsible for the
>data stores which might contain them, namely the online bibliographic
>community.

[further remarks here snipped]

Actually, the "this matter" to which I intended to refer was more general
than just the romanization of Cyrillic characters.  It was as follows.

quote

It seems to me that this matter of sequences of combining characters being
used to give glyphs where different meanings are needed other than just
locally and that glyphs for such meanings are only correctly displayed if a
particular rendering system or a particular font are used touches at the
roots of the Unicode system.

It seems to me that the glyphs for such sequences are being left as if they
were a Private Use Area unregulated system.  I recognize that fonts have
glyph variations in that, say, an Arial letter b looks different to a
Bookman Old Style letter b, yet in that case the meaning is the same.

I wonder if consideration could please be given as to whether this matter
should be left unregulated or whether some level of regulation should be
used.

end quote

In another post in the same thread, Ken states as follows.

quote

But that wasn't my point. There is no particular evidence
that the ALA-LC conventions with the dot above the graphic
ligature ties is in widespread use for romanizations of these
particular languages, that I can see. So the *urgency* of
solving this problem isn't there, unless the LC/library/bibliographic
community comes to the UTC and indicates that they have a data interchange
problem with USMARC records using ANSEL that requires a clear
representation solution in Unicode.

end quote

The question I would like to see discussed is whether, under the present
rules, any bibliographic community would need to approach the Unicode
Consortium over such a matter at all and, if they would not need to do so,
whether it would be better to seek to change the rules now.

It is convenient to consider the situation in relation to the romanization
of Cyrillic characters, yet similar considerations may well also apply to
topics such as the Byzantine legal texts, and to other topics besides.

For example, please suppose that there were a committee called the
Romanization of Cyrillic Committee.  Suppose that that committee were to
hold various meetings and decide that, for a ts romanization ligature, the
sequence

t U+FE20 s U+FE21

suits them fine, and that, for the ts with a dot above romanization
ligature, the sequence

t U+FE20 s U+FE21 U+0307

suits them fine, and were then to publish a list of assignments and example
glyphs.  The glyph for the ts with a dot above ligature in that publication
has the dot above the curved line, centred horizontally.  It is only later
that someone with expert knowledge of the Unicode standard sees the
published list and notices that the glyph shown in the document is, in
fact, not the way that the glyph should appear according to the Unicode
standard.  By this time, many copies of the document have been published
and sent to libraries around the world, and databases have started to be
converted to what that publication may well be calling "the new Unicode
based system".
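
As a minimal sketch, assuming Python and its standard unicodedata module
(neither of which is part of the original discussion), the two sequences in
the example can be written out and inspected character by character.  The
names TS_LIGATURE, TS_LIGATURE_DOT and describe are illustrative only.

```python
import unicodedata

# The two sequences from the example, written as code points.
TS_LIGATURE = "t\uFE20s\uFE21"            # t U+FE20 s U+FE21
TS_LIGATURE_DOT = "t\uFE20s\uFE21\u0307"  # t U+FE20 s U+FE21 U+0307


def describe(sequence):
    """Return (code point, character name, combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in sequence
    ]


for label, seq in (("ts", TS_LIGATURE), ("ts with dot above", TS_LIGATURE_DOT)):
    print(label)
    for code_point, name, ccc in describe(seq):
        print(f"  {code_point}  {name}  (combining class {ccc})")
```

The character properties record only the order of the marks: the dot above
follows the ligature right half on the s.  Nothing in these properties says
that the dot is to be centred over the tie, which is the kind of gap between
a published example glyph and the encoded sequence that the scenario above
describes.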

This might sound impossible, yet what is the present alternative?  There is
no way to formally register such sequences with the Unicode Consortium!

I suggest that it might be a good idea to have an infrastructure whereby the
Unicode Consortium registers sequences of combining characters and example
glyphs, categorized as to application.
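
To make the suggestion concrete, here is a rough sketch, in Python, of what
one entry in such a register might record.  Everything in it is hypothetical:
no such register exists, and the class name RegisteredSequence, the field
names, the category label and the two sample entries are assumptions made
purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredSequence:
    """One hypothetical register entry (not an actual Unicode data format)."""
    sequence: str      # the exact sequence of code points
    application: str   # category of application, as a free-form label
    description: str   # plain-language description of the example glyph


# Hypothetical contents, using the two sequences discussed in this post.
REGISTER = [
    RegisteredSequence("t\uFE20s\uFE21",
                       "romanization-of-cyrillic",
                       "ts ligature"),
    RegisteredSequence("t\uFE20s\uFE21\u0307",
                       "romanization-of-cyrillic",
                       "ts ligature with dot above"),
]


def by_application(register, application):
    """Return every entry registered under the given application category."""
    return [entry for entry in register if entry.application == application]


def lookup(register, sequence):
    """Return the entry for an exact sequence, or None if it is unregistered."""
    for entry in register:
        if entry.sequence == sequence:
            return entry
    return None
```

A font designer could call by_application to list everything registered for
one area, and an author's software could call lookup to check that a sequence
it is about to insert is a registered one.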

This would have potentially far reaching benefits.

Suppose, for example, that such an infrastructure existed, and that there is
a mathematician, M, and a font designer, F, who do not know each other.

M is writing a research paper on a particular branch of mathematics, where
one of the key reference papers was written by an author whose name is
written in Cyrillic characters, yet which name also has a romanized version.
M finds that that romanization needs a character to represent the ts
romanization ligature.  M, who is using a word processor to prepare the
research paper, is keen to insert the ts ligature in a form compatible with
the standard bibliographic method for romanization of Cyrillic names.  How
can M insert that character into the document?

Fortunately, M finds that the word processor has available various special
characters and finds a ts ligature and inserts it in the document.  Behind
the scenes the word processor software inserts the correct Unicode sequence
for the ts ligature.

The display is excellent.  However, as well as the word processor software
having the capability to add the ts ligature sequence, the display is only
possible because F had, when updating the design of the mainstream roman
font R which F designed, included glyphs for various sequences of characters
used for representing romanization of Cyrillic characters.  F is pleased to
have done that, so that text set in the R font will, if some end user
chooses to include some romanization of Cyrillic characters in a document,
have iu, IU, ts and TS ligatures (etc) all appear in an elegant form.  F is
pleased that the R font can be used by end users in so many different areas
of application, because not only has F included sequences for romanization
of Cyrillic ligatures, F has also included ligatures for Byzantine legal
texts and for various other specialist application areas where a general
purpose roman font, such as R, might well be used by some of the end user
community.

F has found this quite straightforward to do, as, although not an expert in
the underlying theory of either the romanization of Cyrillic characters or
the encoding of Byzantine legal codes, F has the advantage of being able
simply to monitor the Unicode website and, whenever a new collection of
sequences is published, to decide whether to include those sequences in the
various fonts which F looks after.

Actually, F has, thus far, included all of the published sequences in the R
font.  However, F has only included a few of the sequences in various other
fonts.  For example, for the sequences for Byzantine legal codes, F included
special glyphs for each of the sequences in a decorative font based upon the
handwriting of a Byzantine scribe.

Stepping back outside the hypothesis, what we have now, even with the best
quality advice, is no more than the equivalent of legal opinion on what a
sequence means: registering sequences and their glyphs would be the
equivalent of a ruling by a court of record.

For the avoidance of doubt, I am not suggesting that every possible sequence
of characters be registered.  I am simply suggesting that a registration
procedure might well be helpful to the end user community, so that authors
of documents, font designers and others would all be in step regarding which
sequences to use for particular applications and regarding which sequences
to consider including in fonts as sequences that produce a specific glyph,
rather than the rendering system needing to rely on default combinations of
combining characters which might produce a poor typographic display.
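
The distinction between a sequence given a specific glyph and one left to
default combination can be sketched as follows.  This is only an
illustration in Python under stated assumptions: the function name
find_registered and the set REGISTERED are hypothetical, and a real
rendering system would make this decision inside the font and layout engine
rather than by scanning strings in this way.

```python
# Hypothetical set of sequences for which a font supplies a specific glyph.
REGISTERED = {
    "t\uFE20s\uFE21",        # ts ligature
    "t\uFE20s\uFE21\u0307",  # ts ligature with dot above
}


def find_registered(text, sequences):
    """Return (start index, sequence) for each registered sequence in text.

    Longer sequences are tried first, so that the ts ligature with a dot
    above is not mistaken for a plain ts ligature followed by a stray dot.
    Text outside the matches would be left to default combining behaviour.
    """
    # Ignore empty strings, which would otherwise match everywhere.
    ordered = sorted((s for s in sequences if s), key=len, reverse=True)
    matches = []
    i = 0
    while i < len(text):
        for seq in ordered:
            if text.startswith(seq, i):
                matches.append((i, seq))
                i += len(seq)
                break
        else:
            # No registered sequence starts here; move on one character.
            i += 1
    return matches
```

For instance, find_registered("a" + "t\uFE20s\uFE21\u0307" + "b", REGISTERED)
reports a single match, for the dotted ligature, starting at index 1.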

I feel that there is presently the opportunity for the Unicode Consortium to
provide this facility to the end user community.  If the matter of
establishing the infrastructure is left for too long, perhaps until some
specific criterion of practical need is met, then there may well be
typographic chaos, and the matter may never be put right because various
legacy systems will by then be in use.

So, I ask whether this matter could please be considered.

William Overington

19 September 2002
