On 01/29/2015 08:19 PM, Philippe Verdy wrote:
2015-01-29 19:52 GMT+01:00 Karl Williamson <[email protected]>:
Rule WB4 is
"Ignore Format and Extend characters, except when they appear at the
beginning of a region of text.".
Not clearly stated, but it appears to me that the ZWJ must be
considered here to be the beginning of a region of text, as we are
looking at the boundary between it and the "A". No rule
specifically mentions ALetter followed by an Extend, so by the
default rule, WB14
"Otherwise, break everywhere (including around ideographs)"
All the text is aimed at finding candidate positions for breaks. It
is not very clear that "ignore" is definitive and means that there
cannot be any further breaks before the Format and Extend characters,
except at the beginning of text. So all the remaining rules are ignored:
there was a match and you stop there, with no break before:
Any × (Format | Extend)
This is confirmed in other rules that use the word "otherwise",
including the last one (WB14) you quote, which is explicitly not applicable.
I don't understand you here. I understand all the words, but I don't
see what you're trying to say. My claim is that there should be a rule
like the one you give:
Any × (Format | Extend)
but there isn't. I think you are maybe trying to say that the word
"ignore" in this UAX is tantamount to such a rule. I am a native
English speaker, and would never have drawn that inference from the
text. There are a lot of passages in the Standard that sound like
gibberish to me. I know the words' meanings, but the combinations don't
make any sense. I don't recall ever having this issue in other
standards I've looked at.
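
To make the two readings concrete, here is a minimal sketch in Python. It
is not the UAX #29 algorithm: wb_class() is a hypothetical, heavily
simplified property lookup (real code would use WordBreakProperty.txt),
ZWJ is assumed to be Extend as in the discussion above, and only WB4, WB5
and WB14 are modelled. The flag wb4_implies_no_break is my own name for
the question at issue: whether "ignore" also means
"Any × (Format | Extend)".

```python
import unicodedata

IGNORED = ("Extend", "Format")

def wb_class(ch):
    """Hypothetical, simplified Word_Break lookup (an assumption, not the real table)."""
    if ch in "\u200c\u200d":          # ZWNJ, ZWJ: treated as Extend here
        return "Extend"
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me", "Mc"):     # combining marks, roughly Extend
        return "Extend"
    if cat == "Cf":                   # other format characters, roughly Format
        return "Format"
    if ch.isalpha():
        return "ALetter"
    return "Other"

def boundaries(text, wb4_implies_no_break=True):
    """Return the offsets i where a break is allowed between text[i-1] and text[i]."""
    classes = [wb_class(c) for c in text]
    result = []
    for i in range(1, len(text)):
        cur = classes[i]
        if cur in IGNORED:
            if wb4_implies_no_break:
                continue              # reading 1: Any × (Format | Extend)
            result.append(i)          # reading 2: nothing matches, so WB14 breaks here
            continue
        # WB4: skip Extend/Format to the left, unless we reach the start of text
        j = i - 1
        while j > 0 and classes[j] in IGNORED:
            j -= 1
        if classes[j] == "ALetter" and cur == "ALetter":
            continue                  # WB5: ALetter × ALetter
        result.append(i)              # WB14: otherwise, break
    return result

print(boundaries("A\u200dB"))                              # reading 1
print(boundaries("A\u200dB", wb4_implies_no_break=False))  # reading 2
```

The two calls differ only at the position between "A" and the ZWJ, which
is exactly the boundary I am asking about.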
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode