Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

Asmus Freytag Mon, 21 Apr 2014 23:29:14 -0700

On 4/21/2014 8:32 PM, Ilya Zakharevich wrote:

On Mon, Apr 21, 2014 at 06:08:12PM -0700, Asmus Freytag wrote:

Here's the text I supplied, with numbers added for discussion. It
definitely needs some editing, but the point of the exercise would be
to see what:


     1.  A bracket pair is a pair of characters consisting of an opening
         paired bracket and a closing paired bracket such that the
         Bidi_Paired_Bracket property value of the former equals the
         latter, subject to the following constraints.

         a - both characters of a pair occur in the same isolating run
             sequence
         b - the closing character of a pair follows the opening character
         c - any bracket character can belong at most to one pair, the
             earliest possible one
         d - any bracket character not part of a pair is treated like an
             ordinary character
         e - pairs may nest properly, but their spans may not overlap
             otherwise


     2.  Bracket characters with canonical decompositions are supposed
         to be treated as if they had been normalized, to allow
         normalized and non-normalized text to give the same result.


c) needs rewording, because it is not correct

The BD16 examples show

        a ( b ) c ) d           2-4
        a ( b ( c ) d           4-6

From that, it follows that it's not the earliest but the one with the
smallest span.

Sorry, I do not see any definition here.  Just a collection of words
which looks like a definition, but only locally…

Thank you for the high praise. :?

Now you deleted language, which I will restore here, put into a
reasonable order, and complete with the suggested edit on "c":

d) brackets are resolved at the earliest opportunity, starting from the
   beginning of the text.

c) if there are two possible ways to resolve a pair, the one spanning
   less text is used.

f) unpaired bracket characters remaining inside a resolved bracket pair
   are treated as ordinary characters (get ignored for bracket matching
   purposes).


And I think I can even invent an example which I cannot parse using
your definition:

   1(  2[  3(  4]  5)  6)

Is looking-at-1 forcing match of 3-and-5?  Or what?



Let's see what the text gives (before we improve it further).

1. -  1( or 3( could match 5) or 6) , 2[ could only match 4]

a. - we have only one isolating run, so this is a no-op

b. - all closing characters follow their putative opening characters, so this is a no-op

d. - at location 5 is the earliest opportunity to match a pair
     (before we get to 5 we don't have an opening and a closing)
c. - we could match 1( or 3( but we use 3(, because it spans less text

e., f. - these can probably be combined, but 4] is now inside a resolved
pair and is ignored.

Now, when we reach 6) we have another pair and, per d., it's the earliest
possible moment we can resolve it, so we match 1( and 6).

Now I add something to your example

  1(  2[  3(  4]  5)  6)  7]

even though 2[ and 7] properly surround 3( and 5), they can't match,
because 1( and 6) surround only 2[, which makes it unpaired and ignored
(per f.).

If the example had been

  1(  2[  3(  4]  5)  6]  7)


then, on reaching 6], we could have matched it with 2[, and 7) with 1(.


Eli's definition starts

  A bracket pair is a pair of an opening paired bracket and a closing
  paired bracket characters within the same isolating run sequence,
  such that the Bidi_Paired_Bracket property value of the former
  character or its canonical equivalent equals the latter character or
  its canonical equivalent, ....

and continues:

  ....and all the opening and closing bracket
  characters in between these two are balanced.

That continuation we found out was incorrect, so we would need to fix it.

Here's an attempt:

   ... subject to the following conditions:


        a. a match is attempted at the left-most closing bracket character
           unmatched at this point
        b. the closest earlier matching opening bracket that is still
           unmatched at this point is used to form the pair
        c. any unmatched bracket character enclosed in a pair is ignored
           for further matching
        d. matching ends when no more pairs can be formed
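
To make conditions a. through d. easier to test against examples, here
is a minimal sketch of one literal reading of them. It is an
illustration only, not the UAX#9 text: the names match_brackets and
PAIRS are made up for this sketch, only two ASCII bracket types are
handled, and isolating run sequences and canonical equivalents are
ignored. Positions are 1-based, as in the BD16 examples.

```python
# Hypothetical helper names; only ( ) and [ ] are considered here.
PAIRS = {"(": ")", "[": "]"}


def match_brackets(text, pairs=PAIRS):
    """Return (open_pos, close_pos) pairs, 1-based, sorted by opening position."""
    opener_for = {close: open_ for open_, close in pairs.items()}
    unmatched = []  # opening brackets still available, in text order
    found = []
    for pos, ch in enumerate(text, start=1):
        if ch in pairs:
            unmatched.append((pos, ch))
        elif ch in opener_for:
            # a. scanning left to right, each closing bracket is tried in turn
            # b. look for the closest earlier opener of the matching type
            for k in range(len(unmatched) - 1, -1, -1):
                if unmatched[k][1] == opener_for[ch]:
                    found.append((unmatched[k][0], pos))
                    # c. openers enclosed by the new pair are dropped
                    del unmatched[k:]
                    break
            # a closer with no earlier matching opener can never pair later
    # d. the scan ends with the text, when no more pairs can be formed
    return sorted(found)


print(match_brackets("a(b)c)d"))  # BD16 example, expected 2-4
print(match_brackets("a(b(c)d"))  # BD16 example, expected 4-6
print(match_brackets("([(]))"))   # 1( 2[ 3( 4] 5) 6)
```

Under this reading the two BD16 examples quoted earlier come out as 2-4
and 4-6. For 1( 2[ 3( 4] 5) 6) the sketch tries 4] first, so it pairs
2-4 and then 1-5, which is worth comparing with the walk-through above.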

I believe with this, you can parse the examples in UAX#9 and the examples
we discussed here. If not, I'd appreciate it if you could help identify
and remedy any gaps.

A./


