On 03/05/2002 08:00:58 PM Kenneth Whistler wrote:

>Actually, I am finding myself attracted to the parsimony of this
>approach. 

Parsimony? Thinking in terms of formal grammars and formal languages, it's 
a simple mechanism that overgenerates big time. Not everyone would call 
that parsimony. Isn't it a powerful mechanism that deals with a very small 
problem? A howitzer to shoot two rabbits? As Rick has suggested, how many 
double diacritics are we *really* likely to encounter? (Or are we 
considering this so that people will be able to invent new ways to notate 
things in writing?) And how many triple, quadruple, n-tuple (n > 4) 
diacritics are we *really* likely to encounter?


>1. Rendering applications already have to deal with combining
>   enclosing marks (well, at least if they choose to support them).

That qualifier is pretty significant here. I can't imagine too many font 
developers getting terribly excited about implementing U+20DD to enclose 
more than one preceding character, for example. Font developers will 
implement multi-character enclosing marks for Arabic because (i) they know 
that these really are needed, and (ii) they know that there is a 
well-contained limit to what length spans they have to accommodate. But if 
you ask those font developers to implement a combining tilde that can span 
up to eight base characters, I think you'll get a very cool response (or 
else an earful of laughter).

Sure, it's a slick idea. But it seems to be a solution begging for a need.


>On the downside, it might be awhile before rendering engines
>and font definitions really catch up to it. 

If you demonstrated a need for particular double diacritics, you'd 
probably get implementations before too long. But you'd need to spell out 
*exactly* what diacritics were involved -- font developers aren't going to 
go inventing typographic oddities of their own volition. And don't expect 
spans longer than two base characters unless you can come up with specific 
needs that clearly point to a text-based solution rather than a 
general-graphics-based solution. I suspect that the list of clear needs 
you could come up with is very short -- a handful at best. And if it's so 
few, why not just encode them directly rather than create a generative 
mechanism that's never going to be used except in very limited ways?


>That is, the whole
>notion of "adjusting" a diacritic to apply to an enclosure is
>fairly sophisticated, since it may involve context-dependent
>rules and arbitrary shape modifications -- not merely moving
>a glyph origin point based on a preceding glyph's metrics.

Not "may involve", but "will involve". That's why font developers are only 
likely to implement this mechanism for a very small set of documented 
needs.



>On the other hand, hacked up fonts for limited dictionary
>usage could be rather quick and easy. For the old Webster's
>pronunciation guides, the entities are really the oomacr
>and oobreve shown in the examples that started this thread.
>Simply preform those entities as glyphs in a font, and map them
>to <o, CGJ, o, CGJ, combining_macron> and
>to <o, CGJ, o, CGJ, combining_breve> respectively. Presto,
>you have a Unicode representation for the text, and a
>reliable font rendering for them, without any fancy-dancing
>about dynamic positional adjustments. 

And if it's only those two things that have any likelihood of getting 
implemented in fonts, doesn't it make more sense to encode those two 
diacritics rather than create a hypergenerative mechanism that will be 
ignored?
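
(For concreteness, here is roughly what that font-side hack amounts to, as 
a Python sketch. The glyph names and the lookup table are my own invention, 
not anything in TUS or in a real font; a real implementation would do this 
with a GSUB-style ligature substitution rather than Python.)

    # Hypothetical sketch of the sequence-to-glyph mapping Ken describes:
    # map <o, CGJ, o, CGJ, combining macron/breve> to a single preformed
    # glyph; everything else maps one character to one glyph.
    CGJ = "\u034F"            # COMBINING GRAPHEME JOINER
    MACRON = "\u0304"         # COMBINING MACRON
    BREVE = "\u0306"          # COMBINING BREVE

    PREFORMED = {
        ("o", CGJ, "o", CGJ, MACRON): "oomacron",   # invented glyph name
        ("o", CGJ, "o", CGJ, BREVE):  "oobreve",    # invented glyph name
    }

    def map_to_glyphs(text):
        glyphs, i = [], 0
        while i < len(text):
            chunk = tuple(text[i:i + 5])
            if chunk in PREFORMED:           # whole dictionary entity at once
                glyphs.append(PREFORMED[chunk])
                i += 5
            else:                            # everything else, one char per glyph
                glyphs.append(text[i])
                i += 1
        return glyphs

    print(map_to_glyphs("f" + "o" + CGJ + "o" + CGJ + MACRON + "d"))
    # -> ['f', 'oomacron', 'd']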


>The fallback rendering,
>in applications and fonts not wise to the CGJ rules would
>be {o o-macron} and {o o-breve}, which while not exact,
>is at least comprehensible and close enough for gummint work.

Or the fallback rendering would be {o o box-macron}, since TUS3.2 has 
basically told font implementers that CGJ between a base and a combining 
mark indicates bad data, and since fonts developed prior to TUS3.2 could 
map CGJ to .notdef anyway.

I don't generally find myself arguing against generative mechanisms. I 
won't be as horrified as Rick if this gets implemented, but I'm inclined 
to agree with him that it's not really needed. If it is going to 
happen, I'd suggest it get done at the next meeting while people are still 
working on implementing 3.2 CGJ behaviour and the new end-of-ayah type of 
behaviour. Or let's just encode a double macron and double breve and move 
on.



- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>

