Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

Asmus Freytag Sun, 20 Apr 2014 13:02:04 -0700

On 4/20/2014 3:24 AM, Eli Zaretskii wrote:

Would someone please help understand the following subtleties and
obscure language in the UBA document found at
http://www.unicode.org/reports/tr9/?  Thanks in advance.


Eli,

I've tried to give you some explanations - in some places, I concur with you that the wording could be improved and that such improved wording should be proposed to the UTC (or its editorial committee) for incorporation into a future update.


For details, see below.


1. In paragraph 3.1.2, near its very end, we have this sentence (with
my emphasis):

   As rule X10 will specify, an isolating run sequence is the unit to
   which the rules following it are applied, and the last character of
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   one level run in the sequence is considered to be immediately
   followed by the first character of the next level run in the
   sequence during this phase of the algorithm.

What does it mean here by "the rules following it"?  Following what?


That looks like a bad referent, but from context, this "it" must be X10.


2. In BD16 (paragraph 3.1.3), the 1st bullet says:

   . Create a stack for elements each consisting of a bracket character
     and a text position. Initialize it to empty.

But then 1st sub-bullet of the 3rd bullet says:

     . If an opening paired bracket is found, push its
       Bidi_Paired_Bracket property value and its text position onto
       the stack.

But the stack does not hold values of Bidi_Paired_Bracket property, it
holds characters.

The Bidi_Paired_Bracket property is a character code (it is the character code of the other partner in the pair).

  Items 2 and 3 below that say:

       2. Compare the closing paired bracket being inspected or its
         canonical equivalent to the bracket in the current stack
         element.
       3. If the values match, meaning the two characters
         form a bracket pair, then [...]

So I guess the 1st bullet is correct, but the 3rd bullet should say
"... push the opening paired bracket character and its text position
onto the stack".  Is this the correct interpretation?

What's really required is that the stack contain a unique identifier for each bracket pair, so that, given a function that maps either opening or closing brackets (or their canonical equivalents) to this id, one can determine that both characters belong to the same pair.

This unique id could be the opening or the closing bracket (or its canonical equivalent); it makes no practical difference. However, it looks like UAX#9 is written in terms of the code point for the closing bracket.


Bullet 1 could be changed to

  . Create a stack for elements each consisting of a *code point*
    (Bidi_Paired_Bracket property value) and a text position.
    Initialize it to empty.

to make things more clear. And a slight wording change might help the reader with item 2:


      2. Compare the *code point for the* closing paired bracket being
         inspected or its canonical equivalent to the *code point*
         (Bidi_Paired_Bracket property value) in the current stack
         element.


And, to continue

      3. If the values match, meaning *the character being inspected and
         the character at the text position in the stack* form a bracket
         pair, then [...]
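
To make the stack contents concrete, here is a minimal sketch in Python of the pair-matching steps quoted above. It is an illustration under stated assumptions, not the reference implementation: the PAIRED table is a hypothetical three-entry stand-in for the Bidi_Paired_Bracket data, canonical equivalents are ignored, and the input is treated as a single isolating run sequence in which every bracket is still eligible for pairing.

```python
# Hypothetical stand-in for the Bidi_Paired_Bracket property: maps an
# opening bracket to the code point of its closing partner. A real
# implementation would load the full property data and also handle
# canonical equivalents.
PAIRED = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRED.values())


def bracket_pairs(text):
    """Return (open_pos, close_pos) pairs, sorted by opening position."""
    stack = []  # elements: (code point of closing partner, text position)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRED:
            # Opening bracket: push its paired code point and its position.
            stack.append((PAIRED[ch], pos))
        elif ch in CLOSERS:
            # Closing bracket: search from the top of the stack downward.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    pairs.append((stack[i][1], pos))
                    # Pop through the matched element, inclusively.
                    del stack[i:]
                    break
            # If nothing matched, the stack is left untouched.
    pairs.sort()
    return pairs
```

With this sketch, bracket_pairs("a(b[c)d]") yields only the pair (1, 5): the unmatched "[" is discarded when ")" pops through it, so the later "]" finds nothing to pair with. Storing the opening bracket instead of its partner would work equally well, provided the comparison in step 2 is adjusted to match.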


3. Paragraph 3.3.2 says, under "Non-formatting characters":

    X6. For all types besides B, BN, RLE, LRE, RLO, LRO, PDF, RLI, LRI,
        FSI, and PDI:

        . Set the current character’s embedding level to the embedding
          level of the last entry on the directional status stack.

     [...]

    Note that the current embedding level is not changed by this rule.

What does this last sentence mean by "the current embedding level"?
The first bullet of X6 mandates that "the current character’s
embedding level" _is_ changed by this rule, so what other "current
embedding level" is alluded to here?

I'm punting on that one - can someone else answer this?


4. Rule X10 says in its last bullet:

    Apply rules W1–W7, N0–N2, and I1–I2, in the order in which they
    appear below, to each of the isolating run sequences, applying one
    rule to all the characters in the sequence in the order in which
    they occur in the sequence before applying another rule to any part
    of the sequence. The order that one isolating run sequence is
    treated relative to another does not matter.

Does the last sentence mean that it is OK to apply W1 to the 1st
isolating sequence, then apply W1 to the second isolating sequence,
then apply W2 to the 1st isolating sequence, followed by W2
application to the 2nd isolating sequence, etc.?  IOW, the last
sentence refers to the order of processing between the isolating run
sequences, but says nothing about the order of applying rules between
the sequences.


   Apply rules W1–W7, N0–N2, and I1–I2 to each of the isolating run
   sequences. For each sequence, [completely] apply each rule in the
   order in which they appear below. The order that one isolating run
   sequence is treated relative to another does not matter.

I believe the above restatement expresses the same thing in fewer words.
The "completely" may be unnecessary. The text about applying the rules to "all
characters" seems to be unnecessary, unless there is, in any of the rules, an
option to not apply it to some characters. Unless incomplete application is
envisaged, calling out the "all characters" here just confuses.
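
As a rough illustration of why the relative order of sequences does not matter, here is a small Python sketch. The two rules are toy stand-ins invented for this example (they are not W1 or any other UAX#9 rule); the only property they share with the real rules is the one assumed here, namely that each rule reads and writes a single sequence at a time.

```python
def toy_rule_1(types):
    # Toy stand-in: an "X" takes the type of the preceding character.
    out = list(types)
    for i in range(1, len(out)):
        if out[i] == "X":
            out[i] = out[i - 1]
    return out


def toy_rule_2(types):
    # Toy stand-in: every "A" becomes "R".
    return ["R" if t == "A" else t for t in types]


def sequence_major(sequences, rules):
    # Finish all rules on one sequence before touching the next sequence.
    result = []
    for seq in sequences:
        for rule in rules:
            seq = rule(seq)
        result.append(seq)
    return result


def rule_major(sequences, rules):
    # Apply one rule to every sequence before moving on to the next rule.
    result = [list(seq) for seq in sequences]
    for rule in rules:
        result = [rule(seq) for seq in result]
    return result
```

Both drivers give the same answer for any input, because within each sequence the rules still run completely and in order; only the interleaving between sequences differs. That is the interleaving described in the question (W1 on both sequences, then W2 on both, and so on).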


5. Rule N0 says:

    . For each bracket-pair element in the list of pairs of text positions

      a. Inspect the bidirectional types of the characters enclosed
        within the bracket pair.
      b. If any strong type (either L or R) matching the embedding
        direction is found, set the type for both brackets in the pair
        to match the embedding direction.

First, what is meant here by "strong type [...] matching the embedding
direction"?  Does the "match" here consider only the odd/even value of
the current embedding level vs R/L type, in the sense that odd levels
"match" R and even levels "match" L?  Or does this mean some other
kind of matching?  Table 3, which is the only place that seems to refer
to the issue, is not entirely clear, either:

   e   The text ordering type (L or R) that matches the embedding level
       direction (even or odd).

Again, the sense of the "match" here is not clear.


It is the even/odd to L/R match (even levels match L, odd levels match R); this might be made more explicit.
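
Spelled out as a one-line Python sketch (the function name is made up for this illustration):

```python
def embedding_direction(level):
    # Even embedding levels are left-to-right, odd levels right-to-left.
    return "L" if level % 2 == 0 else "R"
```

So a strong type "matches the embedding direction" when it equals embedding_direction(level) for the embedding level in effect.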


Next, what is meant here by "the characters enclosed within the
bracket pair"?  If the bracket pair encloses another bracket pair,
which is inner to it, do the characters inside the inner pair count
for the purposes of resolving the level of the outer pair?

They do, so there's no need to change the text.
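
A minimal Python sketch of steps a and b, assuming the bidirectional types are held in a list indexed by text position and that the pair positions come from BD16. The function name and arguments are hypothetical. The scan covers everything between the two brackets, including the contents of any nested pair.

```python
def has_matching_strong_type(types, open_pos, close_pos, level):
    """True if a strong type matching the embedding direction lies
    anywhere between the two brackets, nested pairs included."""
    e = "L" if level % 2 == 0 else "R"  # embedding direction
    for t in types[open_pos + 1:close_pos]:
        # N0 treats EN and AN as R within this scope.
        if t in ("EN", "AN"):
            t = "R"
        if t == e:
            return True
    return False
```

For the types of "( a [ R ] )", that is ["ON", "L", "ON", "R", "ON", "ON"], the outer pair (0, 5) at an odd level finds the R that sits inside the inner pair (2, 4).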


Lastly, I presume that by "the bidirectional types of the enclosed
characters" the text means the resolved types as modified by the
preceding phases, not the original types.  Is that correct?


It's the strong type assigned by rule N0.

A./


Again, thanks in advance for any help.
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
