On 4/21/2014 8:32 PM, Ilya Zakharevich wrote:
On Mon, Apr 21, 2014 at 06:08:12PM -0700, Asmus Freytag wrote:
Here's the text I supplied, with numbers added for discussion. It
definitely needs some
editing, but the point of the exercise would be to see what:
1. A bracket pair is a pair of characters consisting of an opening
paired bracket and a closing paired bracket such that the
Bidi_Paired_Bracket property value of the former equals the
latter,
subject to the following constraints.
a - both characters of a pair occur in the same isolating run
sequence
b - the closing character of a pair follows the opening character
c - any bracket character can belong at most to one pair, the
earliest possible one
d - any bracket character not part of a pair is treated like an
ordinary character
e - pairs may nest properly, but their spans may not overlap
otherwise
2. Bracket characters with canonical decompositions are supposed to be
treated as if they had been normalized, to allow normalized and
non-normalized text to give the same result.
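As an illustration of point 2, here is a minimal Python sketch, assuming a tiny hand-written table in place of the real Bidi_Paired_Bracket data. The names PAIRED and is_bracket_match are hypothetical. U+2329 and U+232A have canonical decompositions to U+3008 and U+3009, so normalizing both characters before the lookup makes the two spellings behave alike.

```python
import unicodedata

# Hypothetical, partial table for illustration only; the real data is the
# Bidi_Paired_Bracket property. Keys and values are in normalized form.
PAIRED = {"(": ")", "[": "]", "\u3008": "\u3009"}

def is_bracket_match(opener, closer):
    """True if closer is the paired bracket of opener, after normalization."""
    o = unicodedata.normalize("NFD", opener)
    c = unicodedata.normalize("NFD", closer)
    return PAIRED.get(o) == c

print(is_bracket_match("\u2329", "\u232A"))  # non-normalized pair
print(is_bracket_match("\u2329", "\u3009"))  # mixed pair
```

Both calls should give the same answer as the fully normalized pair U+3008 / U+3009.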
c) needs rewording, because it is not correct
The BD16 examples show
a ( b ) c ) d 2-4
a ( b ( c ) d 4-6
From that, it follows that it's not the earliest but the one with the smallest
span.
Sorry, I do not see any definition here. Just a collection of words
which looks like a definition, but only locally…
Thank you for the high praise. :?
You have now deleted language, which I will restore here, put into a
reasonable order, and complete with the suggested edit to "c":
d) brackets are resolved at the earliest opportunity, starting from the
beginning of the text.
c) if there are two possible ways to resolve a pair, the one spanning less text
is used.
f) unpaired bracket characters remaining inside a resolved bracket pair are
treated as ordinary characters (get ignored for bracket matching purposes).
And I think I can even invent an example which I cannot parse using
your definition:
1( 2[ 3( 4] 5) 6)
Is looking-at-1 forcing match of 3-and-5? Or what?
Let's see what the text gives (before we improve it further).
1. - 1( or 3( could match 5) or 6) , 2[ could only match 4]
a. - we have only one isolating run, so this is a no-op
b. - all closing characters follow their putative opening characters, so
this is a no-op
d. - at location 5 is the earliest opportunity to match a pair
(before we get to 5 we don't have an opening and a closing bracket)
c. - we could match 1( or 3( but we use 3, because it spans less text
e. , f. - can probably combine these, but 4] is now inside a resolved
pair and is ignored.
Now, when we reach 6) we have another pair, and per d, it's the earliest
possible moment we can resolve it, so we match 1( and 6).
Now I add something to your example
1( 2[ 3( 4] 5) 6) 7]
even though 2[ and 7] properly surround 3( and 5), they can't match, because
1( and 6) surround only 2[, which makes it unpaired and ignored (per f.).
If the example had been
1( 2[ 3( 4] 5) 6] 7)
then, on reaching 6] we could have matched it with 2[ and 7) with 1(
Eli's definition starts
A bracket pair is a pair of an opening paired bracket and a closing
paired bracket characters within the same isolating run sequence,
such that the Bidi_Paired_Bracket property value of the former
character or its canonical equivalent equals the latter character or
its canonical equivalent, ....
and continues:
....and all the opening and closing bracket
characters in between these two are balanced.
That continuation we found out was incorrect, so we would need to fix it.
Here's an attempt:
... subject to the following conditions:
a. a match is attempted at the left-most closing bracket character
unmatched at this point
b. the closest earlier matching opening bracket that is still unmatched
at this point is used to form the pair
c. any unmatched bracket character enclosed in a pair is ignored
for further matching
d. matching ends when no more pairs can be formed
I believe with this, you can parse the examples in UAX#9 and the examples
we discussed here. If not, I'd appreciate it if you could help identify and
remedy any gaps.
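To make conditions a. through d. above easier to test, here is a minimal Python sketch of one possible reading of them. It is not the reference implementation. The names PAIRS and bracket_pairs are hypothetical, the bracket table is a three-entry stand-in for the Bidi_Paired_Bracket property, and the text is assumed to be a single isolating run sequence. It also assumes that a closing bracket with no available opening bracket is simply skipped. Positions are 1-based, as in the BD16 examples.

```python
# Hypothetical stand-in for the Bidi_Paired_Bracket property.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def bracket_pairs(text):
    """Return sorted 1-based (open, close) positions per conditions a-d."""
    available = []  # indices of openers still unmatched and not ignored
    pairs = []
    # (a) a left-to-right scan visits the left-most unmatched closer first
    for i, ch in enumerate(text):
        if ch in PAIRS:
            available.append(i)
        elif ch in CLOSERS:
            # (b) look for the closest earlier opener whose partner is ch
            for k in range(len(available) - 1, -1, -1):
                if PAIRS[text[available[k]]] == ch:
                    pairs.append((available[k] + 1, i + 1))
                    # (c) openers enclosed in the new pair are ignored from
                    # now on; the matched opener is removed as well
                    del available[k:]
                    break
            # assumption: a closer with no opener stays unpaired
    # (d) the scan ends when no more pairs can be formed
    return sorted(pairs)

print(bracket_pairs("a(b)c)d"))  # [(2, 4)]
print(bracket_pairs("a(b(c)d"))  # [(4, 6)]
```

The two calls reproduce the BD16 examples quoted earlier (2-4 and 4-6). Other strings from this thread can be fed to the same function to check whether this reading gives the intended pairs.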
A./