On 4/20/2014 3:24 AM, Eli Zaretskii wrote:
Would someone please help understand the following subtleties and
obscure language in the UBA document found at
http://www.unicode.org/reports/tr9/? Thanks in advance.
Eli,
I've tried to give you some explanations - in some places, I concur with
you that the wording could be improved and that such improved wording
should be proposed to the UTC (or its editorial committee) for
incorporation into a future update.
For details, see below.
1. In paragraph 3.1.2, near its very end, we have this sentence (with
my emphasis):
As rule X10 will specify, an isolating run sequence is the unit to
which the rules following it are applied, and the last character of
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
one level run in the sequence is considered to be immediately
followed by the first character of the next level run in the
sequence during this phase of the algorithm.
What does it mean here by "the rules following it"? Following what?
That looks like a bad referent, but from context, this "it" must be X10,
so "the rules following it" are the rules that come after X10 (W1–W7,
N0–N2, and I1–I2).
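To illustrate the "immediately followed by" part of that sentence, here is a rough sketch (not text from UAX#9). It assumes an isolating run sequence is given as a list of level runs, each a half-open (start, end) range of text positions; the function names are made up for this example.

```python
def flatten_runs(level_runs):
    """Concatenate level runs, given as (start, end) half-open ranges."""
    return [pos for start, end in level_runs for pos in range(start, end)]


def next_in_sequence(level_runs, pos):
    """Return the position that follows pos within the sequence, or None."""
    order = flatten_runs(level_runs)
    i = order.index(pos)
    return order[i + 1] if i + 1 < len(order) else None


# Two level runs separated by text that belongs to other sequences.
runs = [(0, 3), (7, 9)]
print(flatten_runs(runs))
print(next_in_sequence(runs, 2))
```

For the purposes of the later rules, position 2 is treated as if it were directly followed by position 7, even though other characters sit between them in the text.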
2. In BD16 (paragraph 3.1.3), the 1st bullet says:
. Create a stack for elements each consisting of a bracket character
and a text position. Initialize it to empty.
But then 1st sub-bullet of the 3rd bullet says:
. If an opening paired bracket is found, push its
Bidi_Paired_Bracket property value and its text position onto
the stack.
But the stack does not hold values of Bidi_Paired_Bracket property, it
holds characters.
The Bidi_Paired_Bracket property is a character code (it is the
character code of the other
partner in the pair).
Items 2 and 3 below that say:
2. Compare the closing paired bracket being inspected or its
canonical equivalent to the bracket in the current stack
element.
3. If the values match, meaning the two characters
form a bracket pair, then [...]
So I guess the 1st bullet is correct, but the 3rd bullet should say
"... push the opening paired bracket character and its text position
onto the stack". Is this the correct interpretation?
What's really required is that the stack contain a unique identifier for
each bracket pair, so that, given a function that maps either opening or
closing brackets (or their canonical equivalents) to this id, one can
determine that both characters belong to the same pair.
This unique id could be the opening or the closing bracket (or its
canonical equivalent); it makes no practical difference. However, it
looks like UAX#9 is written in terms of the code point for the closing
bracket.
Bullet 1 could be changed to
. Create a stack for elements each consisting of a *code point*
(Bidi_Paired_Bracket property value)
and a text position. Initialize it to empty.
to make things more clear. And a slight wording change might help the
reader with item 2:
2. Compare the *code point for the* closing paired bracket being inspected
or its canonical equivalent to the *code point* (Bidi_Paired_Bracket
property value) in the current stack element.
And, to continue:
3. If the values match, meaning *the character being inspected and the
character at the text position in the stack* form a bracket pair, then [...]
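As a minimal sketch of the stack described above, assuming the stack holds the closing bracket's code point (the Bidi_Paired_Bracket value of the opener) plus the opener's text position. The tiny PAIRED table and the function name are hypothetical stand-ins for this example; a real implementation would take the pairs from BidiBrackets.txt, and this sketch leaves out canonical equivalents, the check on the brackets' bidirectional type, and the stack depth limit.

```python
# Hypothetical miniature table: opening bracket -> Bidi_Paired_Bracket value
# (the code point of its closing partner).
PAIRED = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRED.values())


def find_bracket_pairs(text):
    """Return sorted (open_pos, close_pos) pairs, roughly following BD16."""
    stack = []  # elements: (closing code point, text position of the opener)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRED:
            # Push the Bidi_Paired_Bracket value, not the opener itself.
            stack.append((PAIRED[ch], pos))
        elif ch in CLOSERS:
            # Search from the top of the stack for a matching element.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    pairs.append((stack[i][1], pos))
                    del stack[i:]  # pop the match and everything above it
                    break
            # No match: the closer stays unpaired, the stack is unchanged.
    return sorted(pairs)


print(find_bracket_pairs("a(b[c)d]"))
```

With the stack keyed on the closing code point, the comparison in item 2 is a plain equality test against the character being inspected. Keying on the opening bracket instead would work just as well, provided the lookup is adjusted to match.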
3. Paragraph 3.3.2 says, under "Non-formatting characters":
X6. For all types besides B, BN, RLE, LRE, RLO, LRO, PDF, RLI, LRI,
FSI, and PDI:
. Set the current character’s embedding level to the embedding
level of the last entry on the directional status stack.
[...]
Note that the current embedding level is not changed by this rule.
What does this last sentence mean by "the current embedding level"?
The first bullet of X6 mandates that "the current character’s
embedding level" _is_ changed by this rule, so what other "current
embedding level" is alluded to here?
I'm punting on that one - can someone else answer this?
4. Rule X10 says in its last bullet:
Apply rules W1–W7, N0–N2, and I1–I2, in the order in which they
appear below, to each of the isolating run sequences, applying one
rule to all the characters in the sequence in the order in which
they occur in the sequence before applying another rule to any part
of the sequence. The order that one isolating run sequence is
treated relative to another does not matter.
Does the last sentence mean that it is OK to apply W1 to the 1st
isolating sequence, then apply W1 to the second isolating sequence,
then apply W2 to the 1st isolating sequence, followed by W2
application to the 2nd isolating sequence, etc.? IOW, the last
sentence refers to the order of processing between the isolating run
sequences, but says nothing about the order of applying rules between
the sequences.
Apply rules W1–W7, N0–N2, and I1–I2 to each of the isolating run sequences.
For each sequence, [completely] apply each rule in the order in which they
appear below.
The order that one isolating run sequence is treated relative to another
does not matter.
I believe the above restatement expresses the same thing in fewer words.
The "completely" may be unnecessary. The text about applying the rules to "all
characters" seems to be unnecessary, unless there is, in any of the rules, an
option to not apply it to some characters. Unless incomplete application is
envisaged, calling out the "all characters" here just confuses.
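A small sketch of the two schedules in question, assuming (as I read the rules) that a rule applied to one isolating run sequence never looks at another sequence. The two toy rules are only loosely modelled on W1 and W3 and are not the real rules; all names are made up for this example.

```python
import copy


def per_sequence(sequences, rules):
    """Finish all rules on one sequence before touching the next."""
    for seq in sequences:
        for rule in rules:
            rule(seq)


def rule_by_rule(sequences, rules):
    """Apply one rule to every sequence, then the next rule, and so on."""
    for rule in rules:
        for seq in sequences:
            rule(seq)


def toy_rule_1(seq):
    # NSM takes the type of the preceding character (loosely like W1).
    for i in range(1, len(seq)):
        if seq[i] == "NSM":
            seq[i] = seq[i - 1]


def toy_rule_2(seq):
    # AL becomes R (loosely like W3).
    for i, t in enumerate(seq):
        if t == "AL":
            seq[i] = "R"


rules = [toy_rule_1, toy_rule_2]
a = [["AL", "NSM"], ["L", "NSM", "AL"]]
b = copy.deepcopy(a)
per_sequence(a, rules)
rule_by_rule(b, rules)
print(a == b)
```

Under that independence assumption both schedules give the same result. What must not change is the order of the rules within any one sequence, and each rule has to finish over the whole sequence before the next rule starts.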
5. Rule N0 says:
. For each bracket-pair element in the list of pairs of text positions
a. Inspect the bidirectional types of the characters enclosed
within the bracket pair.
b. If any strong type (either L or R) matching the embedding
direction is found, set the type for both brackets in the pair
to match the embedding direction.
First, what is meant here by "strong type [...] matching the embedding
direction"? Does the "match" here consider only the odd/even value of
the current embedding level vs R/L type, in the sense that odd levels
"match" R and even levels "match" L? Or does this mean some other
kind of matching? Table 3, which is the only place that seems to refer
to the issue, is not entirely clear, either:
e The text ordering type (L or R) that matches the embedding level
direction (even or odd).
Again, the sense of the "match" here is not clear.
Even levels match L and odd levels match R; this might be made more explicit.
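Spelled out as code, the match is nothing more than a parity test on the embedding level. The function name is made up for this example.

```python
def embedding_direction(level):
    """Return 'L' for an even embedding level and 'R' for an odd one."""
    return "L" if level % 2 == 0 else "R"


print(embedding_direction(0), embedding_direction(1))
```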
Next, what is meant here by "the characters enclosed within the
bracket pair"? If the bracket pair encloses another bracket pair,
which is inner to it, do the characters inside the inner pair count
for the purposes of resolving the level of the outer pair?
They do, so there's no need to change the text.
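A rough sketch of step b under that reading: everything strictly between the two brackets is inspected, nested pairs included. The function name and the list-of-types representation are assumptions for this example, and the sketch covers only the "matching direction found" test, not the rest of N0.

```python
def has_matching_strong(types, open_pos, close_pos, level):
    """True if any type strictly between the brackets matches direction e."""
    e = "L" if level % 2 == 0 else "R"
    # As I understand N0, EN and AN count as R within this scope.
    enclosed = [
        "R" if t in ("EN", "AN") else t
        for t in types[open_pos + 1:close_pos]
    ]
    return e in enclosed


# Types for "( a [ b ] )" where a is L and b is R; brackets are ON.
types = ["ON", "L", "ON", "R", "ON", "ON"]
print(has_matching_strong(types, 0, 5, 1))  # outer pair, odd level
print(has_matching_strong(types, 2, 4, 0))  # inner pair, even level
```

At an odd level the outer pair finds the R that sits inside the inner pair, which is the case asked about.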
Lastly, I presume that by "the bidirectional types of the enclosed
characters" the text means the resolved types as modified by the
preceding phases, not the original types. Is that correct?
It's the strong type assigned by rule N0.
A./
Again, thanks in advance for any help.
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode