On Wednesday, August 06, 2003 12:38 PM, Kent Karlsson <[EMAIL PROTECTED]> wrote:
> Since I think <a, ring above, cgj, dot below> should be canonically
> equivalent to <a, dot below, cgj, ring above>, but cannot be made
> so (now), the only ways out seem to be to either formally deprecate
> CGJ, or at least confine it to very specific uses. Other occurrences
> would not be ill-formed or illegal, but would then be non-conforming.

There is a way to specify that <A, RingAbove, CGJ, DotBelow> is
well-formed but <A, DotBelow, CGJ, RingAbove> is not:
a CGJ is allowed in a combining sequence only if it precedes
a base character, or precedes a combining character whose
combining class is strictly lower than the combining class of
the character immediately before the CGJ.

With this definition, and with the combining classes indicated:

- <A=0, RingAbove=230, CGJ=0, DotBelow=220>
  is well-formed because 220 < 230. It is distinct from:
  <A=0, RingAbove=230, DotBelow=220>, whose canonical
  ordering is
  <A=0, DotBelow=220, RingAbove=230>

- <A=0, DotBelow=220, CGJ=0, RingAbove=230>
  is ill-formed because 230 > 220. The CGJ is superfluous
  and should be removed to create:
  <A=0, DotBelow=220, RingAbove=230>

- <A=0, DotBelow=220, CGJ=0, MacronBelow=220>
  is ill-formed because 220 = 220. The CGJ is superfluous
  and should be removed to create:
  <A=0, DotBelow=220, MacronBelow=220>
  which is well-formed and in canonical order.

- <A=0, MacronBelow=220, CGJ=0, DotBelow=220>
  is ill-formed because 220 = 220. The CGJ is superfluous
  and should be removed to create:
  <A=0, MacronBelow=220, DotBelow=220>
  which is well-formed and in canonical order.
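
As a rough illustration, the proposed rule could be checked mechanically
along these lines. This is only a sketch in Python: the function name
cgj_is_allowed is made up for this message, "base character" is
approximated as "any character of combining class 0", and a CGJ with
nothing after it is treated as not allowed. Both of those are my
assumptions, not part of any standard.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0


def cgj_is_allowed(text: str, i: int) -> bool:
    """Apply the proposed well-formedness rule to the CGJ at text[i].

    Hypothetical helper: it is not part of Unicode or of any library.
    """
    if text[i] != CGJ:
        raise ValueError("text[i] is not a CGJ")
    if i + 1 >= len(text):
        # Assumption: a trailing CGJ precedes nothing, so it is rejected.
        return False
    # Assumption: a CGJ at the very start is treated as following class 0.
    prev_ccc = unicodedata.combining(text[i - 1]) if i > 0 else 0
    next_ccc = unicodedata.combining(text[i + 1])
    if next_ccc == 0:
        # Assumption: class 0 stands in for "base character".
        return True
    # Otherwise the next mark must have a strictly lower class.
    return next_ccc < prev_ccc


RING_ABOVE = "\u030A"    # class 230
DOT_BELOW = "\u0323"     # class 220
MACRON_BELOW = "\u0331"  # class 220

print(cgj_is_allowed("A" + RING_ABOVE + CGJ + DOT_BELOW, 2))    # True
print(cgj_is_allowed("A" + DOT_BELOW + CGJ + RING_ABOVE, 2))    # False
print(cgj_is_allowed("A" + DOT_BELOW + CGJ + MACRON_BELOW, 2))  # False
```

The combining classes come from unicodedata.combining(), so the check
follows whatever version of the Unicode data ships with the Python
build in use.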

This "well-formed" rule would give CGJ a precise meaning:
used in the middle of a combining sequence, it is the only
way to bypass the canonical reordering of combining
characters.
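
The reordering behaviour itself is easy to observe. The sketch below
uses Python's standard unicodedata.normalize(); the variable names are
only for illustration. It shows that NFD swaps the two marks when they
are adjacent, and leaves them alone when a CGJ (class 0) sits between
them.

```python
import unicodedata

A = "A"
RING_ABOVE = "\u030A"  # combining class 230
DOT_BELOW = "\u0323"   # combining class 220
CGJ = "\u034F"         # combining class 0

plain = A + RING_ABOVE + DOT_BELOW
joined = A + RING_ABOVE + CGJ + DOT_BELOW

# Without CGJ, canonical ordering puts the class-220 mark first.
nfd_plain = unicodedata.normalize("NFD", plain)
print(nfd_plain == A + DOT_BELOW + RING_ABOVE)  # True

# With CGJ in between, the marks are not reordered across it.
nfd_joined = unicodedata.normalize("NFD", joined)
print(nfd_joined == joined)  # True
```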

