On 4/22/2014 2:19 AM, Ilya Zakharevich wrote:
I think the crucial problem is with

   1(  2[  3(  4]  5) 5b]  6)

I have two possible interpretations: one matches 2 with 5b, another
leaves 2 unmatched.

Ilya,

If you read UAX #9, you will see that the algorithm pushes openers onto a stack. Each time it finds a closer, it searches down the stack for a matching opener. If it finds a match, it discards any openers enclosed by the pair; if it finds none, it discards the closer.

(Discard = ignore for further matching, i.e. no longer treat as a bracket.)

So, when we reach 4] we have

3(
2[
1(

on the stack. 4] matches 2[, and the enclosed 3( is discarded. 1( remains on the stack and is later matched by 5).

Ultimately, 5b] and 6) find no match and are ignored, so the resolved pairs are 2[ with 4] and 1( with 5).
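A minimal Python sketch of the stack-based procedure walked through above, applied to the same example. It assumes only ASCII ( ) and [ ] are brackets and omits canonical equivalence and BD16's limit on stack depth; the function name pair_brackets is hypothetical, and this is an illustration, not the UAX #9 reference implementation.

```python
# Hypothetical sketch of stack-based bracket pairing in the style of UAX #9 BD16.
# Assumptions: only ASCII ( ) and [ ] are brackets; canonical equivalence and
# the stack-depth limit are left out for brevity.
PAIRS = {"(": ")", "[": "]"}
CLOSERS = set(PAIRS.values())


def pair_brackets(chars):
    """Return (opener_index, closer_index) pairs sorted by opener position."""
    stack = []  # entries are (index of opener, closing bracket it expects)
    pairs = []
    for i, ch in enumerate(chars):
        if ch in PAIRS:
            stack.append((i, PAIRS[ch]))
        elif ch in CLOSERS:
            # Search down the stack for the nearest opener this closer matches.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][1] == ch:
                    pairs.append((stack[depth][0], i))
                    # Pop the matched opener and discard every opener above it.
                    del stack[depth:]
                    break
            # If the loop finds no match, the closer is simply ignored.
    return sorted(pairs)


# The example 1( 2[ 3( 4] 5) 5b] 6) from the quoted message.
example = [("1", "("), ("2", "["), ("3", "("), ("4", "]"),
           ("5", ")"), ("5b", "]"), ("6", ")")]
labels = [label for label, _ in example]
chars = [ch for _, ch in example]
for o, c in pair_brackets(chars):
    print(labels[o] + chars[o], "pairs with", labels[c] + chars[c])
# prints "1( pairs with 5)" then "2[ pairs with 4]"
```

Because the search runs from the top of the stack downward, the nearest compatible opener always wins, which is the "closer together is more likely correct" assumption rather than any attempt to find the best hierarchy.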

I believe your scheme does not match the PBA: it assumes that brackets are hierarchical and attempts to preserve the best hierarchy, whereas the PBA assumes that pairs that are closer together are more likely to be correct matches. (In non-mathematical text, hierarchies are not the norm, and they are shallow at best.)

What the PBA actually does can now be put into a definition plus a rule, neither of which uses "stack" or other implementation details such as "variables" or "lists".

D  A bracket pair is a pair consisting of an opening paired bracket
  character and a closing paired bracket character within the same
  isolating run sequence, such that the Bidi_Paired_Bracket property
  value of the former character or its canonical equivalent equals the
  latter character or its canonical equivalent.
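To make definition D concrete, here is a small Python sketch of the pairing test alone, with canonical equivalence handled through NFD. The table BIDI_PAIRED_BRACKET is an assumed, hand-copied excerpt of the Bidi_Paired_Bracket values (a real implementation would load all of BidiBrackets.txt), the function is_bracket_pair is hypothetical, and the "same isolating run sequence" condition is assumed to have been checked by the caller.

```python
import unicodedata

# Assumed excerpt of Bidi_Paired_Bracket values for opening brackets; a real
# implementation would load the full BidiBrackets.txt data file.
BIDI_PAIRED_BRACKET = {
    "(": ")",
    "[": "]",
    "{": "}",
    "\u2329": "\u232A",  # LEFT-/RIGHT-POINTING ANGLE BRACKET
    "\u3008": "\u3009",  # LEFT/RIGHT ANGLE BRACKET
}


def canonical(ch):
    """Canonical decomposition (NFD); U+2329 maps to U+3008, U+232A to U+3009."""
    return unicodedata.normalize("NFD", ch)


def is_bracket_pair(opener, closer):
    """Hypothetical test for definition D: the opener's Bidi_Paired_Bracket
    value (looked up directly or via its canonical equivalent) equals the
    closer, modulo canonical equivalence. Assumes both characters lie in the
    same isolating run sequence."""
    paired = BIDI_PAIRED_BRACKET.get(opener) or BIDI_PAIRED_BRACKET.get(canonical(opener))
    return paired is not None and canonical(paired) == canonical(closer)


print(is_bracket_pair("\u2329", "\u3009"))  # True: equal modulo canonical equivalence
print(is_bracket_pair("(", "]"))            # False: property values differ
```

Note that nothing in this test mentions order, nesting, or other brackets; that is exactly the part left for rule R.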

R  Characters are resolved into resolved bracket pairs as follows:
  Starting at the beginning of the text, whenever a closing bracket
  character is encountered, find the nearest preceding opening bracket
  character that is not part of a resolved pair, is not ignored for pair
  resolution, and can form a bracket pair with it. If one exists,
  resolve the pair and mark any enclosed opening brackets of any kind as
  ignored. Otherwise, if no pair can be resolved, mark the closing
  bracket as ignored.
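As a rough check that rule R needs no stack, here is a Python sketch that follows its wording directly: it keeps only a per-character state ("resolved", "ignored", or unset) and scans backward from each closer. It assumes ASCII ( ) and [ ] only, a single isolating run sequence, and no canonical equivalence; resolve_pairs is a hypothetical name. On the example above it yields the same pairs as the stack description.

```python
# Hypothetical sketch of rule R, written without an explicit stack.
# Assumptions: ASCII ( ) and [ ] only, no canonical equivalence, and the
# input is a single isolating run sequence.
PAIRS = {"(": ")", "[": "]"}
CLOSERS = set(PAIRS.values())


def resolve_pairs(chars):
    """Return resolved (opener_index, closer_index) pairs sorted by opener."""
    state = [None] * len(chars)  # None, "resolved", or "ignored"
    resolved = []
    for j, ch in enumerate(chars):
        if ch not in CLOSERS:
            continue
        # Nearest preceding opener that is neither resolved nor ignored
        # and that can form a bracket pair with this closer.
        match = None
        for i in range(j - 1, -1, -1):
            if state[i] is None and chars[i] in PAIRS and PAIRS[chars[i]] == ch:
                match = i
                break
        if match is None:
            state[j] = "ignored"  # no pair can be resolved for this closer
            continue
        state[match] = state[j] = "resolved"
        resolved.append((match, j))
        # Enclosed openers that are still unresolved become ignored; openers
        # already in a resolved pair keep that state and cannot match again.
        for k in range(match + 1, j):
            if chars[k] in PAIRS and state[k] is None:
                state[k] = "ignored"
    return sorted(resolved)


# 1( 2[ 3( 4] 5) 5b] 6)
print(resolve_pairs(["(", "[", "(", "]", ")", "]", ")"]))  # [(0, 4), (1, 3)]
```

The only state each character carries is whether it is resolved or ignored, which is the point of stating R without "stack", "variables", or "lists"; the nested backward scan is quadratic and is meant for clarity, not speed.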


This shows that the text of BD16 in UAX #9 tries to cover both a definition
and a rule, which is what makes it so difficult to follow.

I think what should be proposed is exactly such a breakdown: a smaller definition
that addresses the matching of property values (modulo canonical equivalence),
kept separate from the strategy for resolving actual pairs, which is better
stated as a rule.

The rule does not need to use implementation language to be definite.

A "resolved" bracket pair is simply an actual pair resolved by rule R, and the
rest of the PBA acts on "resolved" pairs.

A./

_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode