"Philippe Verdy" wrote:
> But even with this case, you wont be able to encode with the ZWJ trick
> in plain text, such groupings that are expressed this way in TeX:
>
> \breve{ \breve{oo} x \breve{ o\acute{o} } }
>
> Because double diacritics encoded in Unicode can't be safely stacked
> together (for such application you'll need a rich-text layer on top of
> Unicode, such as TeX here).
I just thought about a solution to allow stacking of double-diacritics: we
could use variation selectors after them,
to specify a higher level of grouping.
So in the example above:
- "\breve{oo}" remains encoded as:
- "x" remains encoded as:
- "o \acute{o}" remains encoded as: followed by or
- "\breve{o \acute{o}}" remains encoded as:
And to stack a second level of breves, we could use between those three groups:
Even software that does not know how to create the layout would still
consider this long sequence as an unbreakable extended grapheme cluster, and
its important relative ordering will be preserved by normalizations. Here
also you'll be able to add other single diacritics on top of the double
breves...
This way, you may stack up to 256 additional levels of double diacritics in a
structured layer that will be
preserved as a single extended grapheme cluster.
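
As a rough illustration of the idea (not a registered use of variation
selectors), the Python sketch below maps a grouping level to one of the 256
existing variation selectors (U+FE00..U+FE0F and U+E0100..U+E01EF) and checks
that a sequence with a selector placed after U+035D COMBINING DOUBLE BREVE is
left unchanged by NFC and NFD. The function names level_selector and
is_normalization_stable, and the level numbering, are assumptions made only
for this example.

```python
import unicodedata

DOUBLE_BREVE = "\u035D"  # COMBINING DOUBLE BREVE (combining class 234)


def level_selector(level: int) -> str:
    """Hypothetical mapping: level 1..256 -> VS1..VS256.

    VS1..VS16 are U+FE00..U+FE0F; VS17..VS256 are U+E0100..U+E01EF.
    """
    if not 1 <= level <= 256:
        raise ValueError("level must be in 1..256")
    if level <= 16:
        return chr(0xFE00 + level - 1)
    return chr(0xE0100 + level - 17)


def is_normalization_stable(text: str) -> bool:
    """True if NFC and NFD both leave the text unchanged."""
    return all(unicodedata.normalize(form, text) == text
               for form in ("NFC", "NFD"))


# A double breve tagged with grouping level 1 (selector placed after it).
tagged = "o" + DOUBLE_BREVE + level_selector(1) + "o"

# The selector has combining class 0, so canonical reordering cannot
# move the double breve (class 234) across it.
print(unicodedata.combining(level_selector(1)))
print(is_normalization_stable(tagged))
```

The check only covers normalization; it says nothing about how a renderer or
a text segmenter would treat such a sequence.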
Software that doesn't know what to do with the variation selectors will
ignore them and will treat all the double breves above as equal, so it will
render something like this in TeX:
\breve{ oo x o \acute{o} }
in a single grouping (not so bad after all...)
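
The fallback just described can be sketched in a few lines of Python: a
process that does not understand the selectors simply drops them, so every
double breve ends up at the same (default) level. The helper names below are
assumptions for this example only.

```python
def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Return the text as seen by software that ignores variation selectors."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# "o", double breve tagged with VS2, "o": the tag is dropped and only the
# plain double breve remains.
tagged = "o\u035D\uFE01o"
flattened = strip_variation_selectors(tagged)
```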
BUT! Such variation sequences have NOT been allocated in the Unicode registry
for this purpose. I think that such an application should use something other
than variation selectors, which are intended to represent glyphic variants of
the individual double diacritics.
And I think that this could be done by allocating instead, in the special
plane 15, a block for STACKING selectors (or more generally GROUPING LEVELS),
with exactly the same properties as variation selectors, except that they
won't require prior registration for their use in association with double
diacritics.
Such selectors could possibly also be used to encode two-dimensional
structures like those used in Egyptian hieroglyphs, which already use the
default horizontal layout and would require only one additional vertical
stacking. For example:
- generates the TeX equivalent of: "\hiero{1} \hiero{2}" : this is the normal
horizontal reading
- generates the TeX-like equivalent of: "\vstack{ \hiero{1} \hiero{2} }" :
this is the
vertical stacking behavior, and needs a joiner-like character to preserve the
unbreakable "extended grapheme
cluster".
But when both horizontal and vertical layout are used, the direction of
stacking in complex groupings must be
disambiguated, and would require two distinct characters. We could use ZWJ for
grouping with horizontal layout
(within a larger vertically stacked compound), and ZWNJ for grouping with
vertical layout. So we would encode here
for this second case.
Now if the structure is more complex, we'll need several levels of grouping,
both for the horizontal and the
vertical joiners. Adding a GROUPING LEVEL (acting exactly like a variation
selector), encoded just after ZWJ or ZWNJ
(using the special codepoint in plane 15, encoded as a combining character with
combining class 0) would solve the
representation problem.
For example (HIERO1-HIERO2:HIERO3)-HIERO4:HIERO5 (using the WikiHiero
notation), whose layout is similar to:
+--------+--------+--------+
| HIERO1 | HIERO2 |        |
+--------+--------+ HIERO4 |
|     HIERO3      |        |
+-----------------+--------+
|          HIERO5          |
+--------------------------+
could be encoded as:
And it will still match the definition of extended grapheme clusters, while
also fully preserving the semantic
composition and structure of the cluster :
* The absence of a grouping level selector means that the horizontal or
vertical joiners are acting at level 0.
* Sequences encoded at the same grouping level using ZWJ separators assume
the horizontal layout for hieroglyphs.
* Those encoded at the same grouping level with ZWNJ assume the vertical
layout.
* ZWJ (horizontal layout) has a higher grouping priority than ZWNJ if they
occur simultaneously at the same level.
If the grouping level selectors are not supported by the layout engine, it will
just try to honor ZWJ and ZWNJ
(ignoring the specified grouping levels) as if it was only encoded as:
which is the actual encoding (in WikiHiero syntax) of
(HIERO1-HIERO2:HIERO3-HIERO4:HIERO5)
+--------+--------+
| HIERO1 | HIERO2 |
+--------+--------+
| HIERO3 | HIERO4 |
+--------+--------+
|     HIERO5      |
+-----------------+
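
To make the precedence rule concrete, here is a minimal Python sketch of a
parser for the notation as it is used in this message: "-" joins
horizontally, ":" stacks vertically, "-" binds tighter than ":", and
parentheses stand in for an explicit grouping level. It is not the WikiHiero
implementation; the function name parse_layout and the ("H", ...) / ("V", ...)
tuple format are assumptions for the example.

```python
import re


def parse_layout(expr: str):
    """Parse e.g. '(A-B:C)-D:E' into nested ('H', ...) / ('V', ...) tuples."""
    tokens = re.findall(r"[()\-:]|[^()\-:\s]+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        if peek() is None:
            raise ValueError("unexpected end of input")
        tok = take()
        if tok == "(":
            node = vertical()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        if tok in (")", "-", ":"):
            raise ValueError(f"unexpected {tok!r}")
        return tok  # a sign name such as 'HIERO1'

    def horizontal():
        # '-' binds tighter, so horizontal runs are collected first.
        items = [atom()]
        while peek() == "-":
            take()
            items.append(atom())
        return items[0] if len(items) == 1 else ("H", *items)

    def vertical():
        items = [horizontal()]
        while peek() == ":":
            take()
            items.append(horizontal())
        return items[0] if len(items) == 1 else ("V", *items)

    tree = vertical()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return tree


grouped = parse_layout("(HIERO1-HIERO2:HIERO3)-HIERO4:HIERO5")
ungrouped = parse_layout("(HIERO1-HIERO2:HIERO3-HIERO4:HIERO5)")
```

With the inner grouping, HIERO4 sits beside the stacked HIERO1/HIERO2/HIERO3
block; without it, the parse degrades to three rows stacked vertically, as in
the second diagram.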
And if vertical stacking is not supported by the layout engine, it will also
ignore the ZWJ and ZWNJ, and so will render the five hieroglyphs linearly, in
effect ignoring only the vertical layers by drawing them in three successive
spans as:
+-----------------+-----------------+--------+
| HIERO1 HIERO2 | HIERO3 HIERO4 | HIERO5 |
+-----------------+-----------------+--------+
Which is, for now, all that Unicode officially documents.
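
This last fallback can also be sketched in Python, again on the
"-" / ":" notation used in this message rather than on real code points:
grouping is discarded, each ":" starts a new span, and the signs are read
left to right. The function name linear_spans is an assumption for the
example.

```python
def linear_spans(expr: str) -> list[list[str]]:
    """Flatten a layout expression into successive horizontal spans."""
    flat = expr.replace("(", "").replace(")", "")  # grouping is ignored
    return [span.split("-") for span in flat.split(":")]


spans = linear_spans("(HIERO1-HIERO2:HIERO3)-HIERO4:HIERO5")
reading_order = [sign for span in spans for sign in span]
```

Both the grouped and the ungrouped expressions reduce to the same three
spans, so the reading order of the five signs is preserved either way.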
But what I don't like in such a use of ZWJ and ZWNJ is that they are not
intended for controlling the layout, but instead for hinting the presence or
absence of ligatures. Are compound layouts such as those used in hieroglyphs
to be considered as special graphic ligatures?
I think that they represent something much stronger than what ZWJ and ZWNJ
represent. But there are precedents for such strong semantic assignments to
ZWJ and ZWNJ in Indic scripts. I see no reason why what is already used to
control the semantics (and partially the graphic appearance) in Indic scripts
(in a way specific to those scripts) could not also be used here specifically
for hieroglyphs, which really need such strong semantics, even if they
certainly don't need other kinds of ligatures.
Adding the generic ZWJ and ZWNJ (optionally followed by the generic grouping
level selectors) to the hieroglyphic script will not alter the way it is
already encoded. But at least it will be possible to preserve the hieroglyph
semantics in plain text, without depending on an unspecified syntax.
So my discussion here proposes only one addition for encoding as new
characters in Unicode:
- adding a new block of grouping selectors in the special plane 15. In my
opinion, a single row of 16 grouping level selectors (acting in addition to
the implicit level 0) will be enough for all situations. They MUST have
combining class 0, and might be ignorable, just like variation selectors,
except that they don't imply any glyph modification for the characters that
are encoded in the composite "default grapheme cluster". They must have a
general category of "zero-width" combining characters (probably Mn), and must
be *optionally* ignorable in collations. They should not be format controls
(in general category C), because they would then be ignored in all cases in
collations.
- the addition of 2 generic horizontal/vertical grouping characters may be
discussed: can we override ZWJ and ZWNJ? If not, then ZWJ/ZWNJ + a grouping
level may also be encoded as a single Unicode character, with the same
general properties as ZWJ and ZWNJ, all in the same allocated block in the
special plane.
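
For comparison with the properties listed above, this short Python sketch
prints the general category and canonical combining class that the standard
unicodedata module reports for two existing variation selectors and for
ZWNJ and ZWJ. The SAMPLES table and the describe helper are assumptions for
the example; the values come from whatever Unicode version the Python build
ships with.

```python
import unicodedata

# Existing characters whose properties the proposed selectors would mirror.
SAMPLES = {
    "VS1": "\uFE00",
    "VS17": "\U000E0100",
    "ZWNJ": "\u200C",
    "ZWJ": "\u200D",
}


def describe(ch: str) -> tuple[str, int]:
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


for label, ch in SAMPLES.items():
    category, ccc = describe(ch)
    print(f"{label}: U+{ord(ch):04X} gc={category} ccc={ccc}")
```

The variation selectors are reported as nonspacing marks (Mn) and the two
joiners as format controls (Cf), all with combining class 0.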
Only the vertical groupings will be used to stack vertically the double
diacritics or to stack other diacritics on
top of a double diacritic.
This is left open for discussion, as several options are possible, before one
can be implemented somewhere, tested, and finally recommended.
I'm not asking to add grouping selectors immediately, if existing variation
selectors can safely be used on top of
ZWJ and ZWNJ, and if ZWJ/ZWNJ can be used in some scripts (like Egyptian
hieroglyphs) to encode their semantic 2D
layout.
Philippe.