On 04/08/2003 16:52, Kenneth Whistler wrote:

U+200B ZERO WIDTH SPACE might be appropriate, but this has the problem that it is a break opportunity, which is not always appropriate.



U+200B ZERO WIDTH SPACE is not appropriate, for the same reason
that U+FEFF (or U+2060) is not appropriate: The Standard does
not specify the display of non-spacing marks on it as a means
of showing the marks without base characters. And, as you indicate,
U+200B (like U+FEFF and U+2060) is implicated in the control
of line break opportunities. None of them is defined
as a glyph display anchor or some such.
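
For readers who want to inspect the three characters named above, here is a minimal sketch using Python's standard unicodedata module. It only reports the name, general category and canonical combining class recorded in whatever version of the Unicode Character Database ships with the interpreter; it says nothing about line breaking, which unicodedata does not expose. The helper name describe is hypothetical, chosen for illustration.

```python
import unicodedata

# The three code points discussed in the message above.
CANDIDATES = ["\u200B", "\uFEFF", "\u2060"]

def describe(ch):
    """Return (code point, name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch))
```

None of the three is a combining mark (combining class 0), and nothing in these properties marks them as a base for displaying a mark in isolation.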


Thank you for the clarification.






Their
names may be misleading; people intending to use them for any other
function should carefully read the sections of the Unicode Standard
that discuss their usage.



But which sections? Where is the index, online?


Patience please. The editor is paddling as fast as she can. If
you will refrain from clicking the remote for just a day or two
longer, all will be revealed.


I will wait, and try to do so patiently.



Are you surprised that I am confused?



No. That's why I'm spending time trying to keep making the
clarifications for you and others.


Thank you. I appreciate the time you are putting into this.




The function I think you have in mind is not isolated display of
a combining mark, but rather trying to find a mechanism for
getting around the conformance strictures of the standard, to
get a combining mark to apply to a *following* base
character, rather than to a *preceding* base character.




If by "apply" in the above you mean "be positioned adjacent to",



No, I mean logical application, in this context.


There are admitted deficiencies in the standard's text, even
now, regarding just what the "graphic interaction" for a combining
mark means -- that is grist for the Unicode 5.0 mill to grind
very finely, I suggest.



there is already a problem with the standard: the EXISTING Hebrew page of the standard contravenes its own conformance strictures. Under the existing standard (irrespective of any changes being proposed), and in legacy encodings, the combining mark holam is usually positioned graphically above the preceding base character. In certain environments, however, specifically when it is followed by a silent alef (holam male is a separate issue), it is positioned graphically above the following base character. But the standard has anticipated this kind of difficulty by recognising that positioning is not always consistent with logical ordering: see the note on Indic vowel signs in The Unicode Standard 4.0, section 2.10, subsection "Sequence of Base Characters and Diacritics", http://www.unicode.org/book/preview/ch02.pdf.
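
To make the distinction between logical ordering and positioning concrete, here is a rough sketch that groups each combining mark with the base character that precedes it in the encoded sequence. It is a simplification, not the standard's full grapheme cluster algorithm, and the function name clusters and the sample word (lamed, holam, alef) are assumptions chosen for illustration.

```python
import unicodedata

def clusters(text):
    """Attach each combining mark to the base character before it."""
    out = []
    for ch in text:
        # General categories Mn, Mc and Me all start with "M".
        if unicodedata.category(ch).startswith("M") and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out

# Illustrative sample: lamed + holam + (silent) alef.
SAMPLE = "\u05DC\u05B9\u05D0"

if __name__ == "__main__":
    for cluster in clusters(SAMPLE):
        print([f"U+{ord(c):04X}" for c in cluster])
```

In the encoded text the holam (U+05B9) belongs logically to the lamed, whatever position a renderer gives it relative to the alef.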


Or meditate on Figure 2-3, Unicode Character Code to Rendered Glyphs.
That is the fundamental mandala of the standard. ;-)


Thanks for the pointer.

A similar issue, not related to Hebrew, would be a (mythical) requirement to display a diacritic such as U+0315, U+031B or U+0322 in isolation. It would not always be appropriate to use a space or NBSP as the base character, as this would indent the glyph from the beginning of a line in a way that might not be wanted. What would be the recommended encoding if one wanted to display one of these characters with no leading white space?
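
For reference, the space-or-NBSP approach mentioned above can be sketched as follows. This only builds the sequences; it does not answer the question of how to avoid the leading white space. The helper name isolated is hypothetical.

```python
import unicodedata

# The three diacritics mentioned above.
MARKS = ["\u0315", "\u031B", "\u0322"]
NBSP = "\u00A0"

def isolated(mark, base=NBSP):
    """Return base + mark, refusing anything that is not a combining mark."""
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return base + mark

if __name__ == "__main__":
    for mark in MARKS:
        seq = isolated(mark)
        print(unicodedata.name(mark), [f"U+{ord(c):04X}" for c in seq])
```

Passing base=" " gives the SPACE variant; either way the base character occupies an advance width, which is the indentation the question is about.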



This is a documented special case. Hebrew holam followed by silent alef is also a special case, whether you like it or not; it just hasn't been documented. It could be removed, but that would require changes to every existing (ancient or modern) pointed Hebrew text.



The discussion of details of how to represent these sequences
should probably migrate back to the [EMAIL PROTECTED] list.


I have already copied it there.


-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/




