In a message dated 2001-10-02 4:50:03 Pacific Daylight Time, 
[EMAIL PROTECTED] writes:

>  Is there an official Unicode Consortium statement that states, for the
>  record, that the Unicode Consortium refuses to encode more ligatures and
>  precomposed characters please?

I'm pretty sure there is, since it has been brought up so often by UTC 
members on this list.  If there is no such statement, then one should be 
drafted.

>  I feel that this is a matter that needs to be formally resolved one way or
>  the other, so that, if such a refusal has been declared then people who wish
>  to have these characters encoded may act knowing that the Unicode Consortium
>  will have legally estopped itself from making any future complaint that it
>  has some right to set the standards in such a matter and that those people
>  who would like to see the problem solved and ligatured characters encoded as
>  single characters so that a font can be produced may proceed accordingly,
>  perhaps approaching the international standards body directly if the Unicode
>  Consortium refuses to do so without a process of even considering individual
>  submissions on their individual merits.  On the other hand, if no such
>  formal statement has been issued, then those people who would like to see
>  the problem solved and ligatured characters encoded as single characters so
>  that a font can be produced for use with software such as Microsoft Word may
>  proceed to define characters in the private use area in a manner compatible
>  with their possible promotion to being regular unicode characters in the
>  presentation forms section.

Was that only two sentences?  Wow....

Regarding the "refusal" to encode more ligatures and precomposed presentation 
forms: It is not arbitrary.  There is a reason why Unicode will not encode 
these things.  They would interfere with the established standard for 
decomposition.  Now that Unicode has reached its present level of popularity, 
some vendors and implementations (and standards) require a stable set of 
decomposable code points.  That set is Unicode 3.0.  If new precomposed 
characters were added, engines and standards that were built to the new 
standard would decompose them differently from those built to the old 
standard, and this is not acceptable to those who need decomposition to work 
at all.
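To make the stability point concrete, here is a short Python sketch (my own illustration, not from the original discussion) using the standard-library `unicodedata` module.  An existing presentation form like U+FB00 already carries an immutable compatibility decomposition; an unassigned slot like U+FB07 normalizes to itself, which is exactly why old and new implementations would disagree if a decomposition were added to it later:

```python
import unicodedata

# U+FB00 LATIN SMALL LIGATURE FF already has a compatibility
# decomposition to the two-letter sequence "ff".
lig_ff = "\uFB00"
print(unicodedata.normalize("NFKD", lig_ff))        # -> ff

# The decomposition mapping is part of the character's properties
# in the Unicode Character Database.
print(unicodedata.decomposition(lig_ff))            # -> <compat> 0066 0066

# An unassigned code point (e.g. U+FB07) has no decomposition and
# normalizes to itself; an engine built to an older UCD would keep
# doing so even if a later version assigned one.
print(unicodedata.normalize("NFKD", "\uFB07") == "\uFB07")  # -> True
```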

Precomposed characters and ligatures won't be considered "on their individual 
merits," and they won't be "promoted" from a private standard to true Unicode 
character status, because the decomposition problem is bigger than the 
individual merits.  Note that I personally like the ct ligature and think it 
would be a great thing to have in a font.  If this were 1993, perhaps it 
might have been encoded.

Regarding fonts: Nothing is stopping you or anyone else from making a font 
with these precomposed glyphs and associating them with Unicode PUA (Private 
Use Area) code points.  That is an excellent illustration of a possible use 
of the PUA, and many, many font vendors do just that.  
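For example, a private registry of this kind boils down to a simple mapping, sketched below in Python.  The U+E707 assignment is the one the original poster suggested and is purely hypothetical; any code point in the BMP Private Use Area (U+E000..U+F8FF) is equally conformant:

```python
# Hypothetical private mapping from ligature names to PUA code points.
PUA_LIGATURES = {
    "ct": "\uE707",  # illustrative assignment only; not standardized
}

def is_pua(ch: str) -> bool:
    """True if ch lies in the BMP Private Use Area (U+E000..U+F8FF)."""
    return 0xE000 <= ord(ch) <= 0xF8FF

ct = PUA_LIGATURES["ct"]
print(f"U+{ord(ct):04X}", is_pua(ct))  # -> U+E707 True
```

A font vendor and a document author who share such a table can interoperate, but only with each other; nothing outside the agreement gives the code point meaning.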

>  I feel that it would be quite wrong to pull up the ladder on the possibility
>  of adding characters such as the ct ligature as U+FB07 without the
>  possibility of consideration of each case on its merits at the time that a
>  possibility arises.  A situation would then exist that several ligatures
>  have been defined as U+FB00 through to U+FB06 including one long s ligature,
>  yet that U+FB07 through to U+FB12 must remain unused even though they could
>  be quite reasonably used for ct and various long s ligatures so as to
>  produce a set of characters that could be used, if desired, for transcribing
>  the typography of an 18th Century printed book.  Yet, if the ladder has been
>  pulled up, perhaps U+FB07 can be defined as the ct ligature directly by the
>  international standards organization and the international standards
>  organization could decide directly about including the long s ligatures.

The organization you are talking about is ISO/IEC JTC1/SC2/WG2.  They are 
firmly committed to maintaining compatibility between Unicode and ISO/IEC 
10646.  Sorry, but this is a good thing.

>  If the possibility of fair consideration is, however, still open, then the
>  ct ligature could be defined as U+E707 within the private use area and
>  published as part of an independent private initiative amongst those members
>  of the unicode user community that would like to be able to use that
>  character in a document by the character being encoded as a character in an
>  ordinary font file.  That would enable font makers to add in the ct
>  character if they so choose.

You might start by checking existing fonts, especially those shipped with 
major operating systems, to see what PUA code points are commonly used 
internally for glyphs not associated with a standard Unicode character.  I 
know that several Windows fonts have privately assigned glyphs, and I assume 
the same is true for Macintosh fonts.  Also, maybe the various font makers 
who haunt this list could contribute any guidelines they know of for 
quasi-standardizing these code points.  Obviously, you are hoping that 
standardizing the code points could lead to some measure of interoperability; 
otherwise there would be no discussion.  If all you want is to encode the ct 
ligature in a font, you can use any old PUA character you wish, conformantly.

OTOH, private creation of quasi-standards on the part of vendors is not 
necessarily a good thing.  It is the sort of thing that the public tends to 
vilify Microsoft for doing.

If you want to interchange the ct ligature and the long-s ligatures, you can 
do that right now.  Just encode <c, ZWJ, t> or <long-s, ZWJ, whatever>.  
Then, rendering engines that have a glyph for the desired ligature can render 
it, and those that don't will fall back to the individual characters 
(assuming they are conformant).  This approach has at least three major 
advantages:

(1)  It is already supported by the Unicode Standard.
(2)  It provides a standard interchange mechanism without requiring font 
vendors to agree on the code point used for the precomposed glyph.
(3)  It provides a sensible fallback mechanism for the great majority of 
fonts that, let's admit it, will not have these specialized glyphs.
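The sequences described above can be built literally; here is a minimal Python sketch (the `ligature_request` helper is my own illustrative name, not an API from the mail):

```python
ZWJ = "\u200D"      # ZERO WIDTH JOINER: a request, not a demand, to ligate
LONG_S = "\u017F"   # LATIN SMALL LETTER LONG S

def ligature_request(*letters: str) -> str:
    """Join letters with ZWJ to hint that a renderer may ligate them."""
    return ZWJ.join(letters)

ct = ligature_request("c", "t")     # the sequence <c, ZWJ, t>
st = ligature_request(LONG_S, "t")  # the sequence <long-s, ZWJ, t>

# A renderer with no ct glyph simply draws "c" then "t"; the visible
# fallback is the same text with the invisible joiner ignored.
fallback = ct.replace(ZWJ, "")
print([f"U+{ord(c):04X}" for c in ct], fallback)  # -> ['U+0063', 'U+200D', 'U+0074'] ct
```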

Think about it.

In a message dated 2001-10-02 6:35:16 Pacific Daylight Time, 
[EMAIL PROTECTED] writes:

>> You might want to take a look at the ConScript Unicode Registry, which was
>> originally intended for "constructed" and artificial scripts, but which
>> could also be used for this purpose.
>
>  No, it couldn't. It's for constructed and artificial scripts, not for 
>  precomposed Latin glyphs.

I stand corrected.  But there is no reason William couldn't initiate his own 
registry, along the lines of CSUR, for the purpose of assigning PUA code 
points to precomposed Latin glyphs.  Just don't expect the characters thus 
added to "graduate" somehow into Unicode.

-Doug Ewell
 Fullerton, California
