Bidi Parenthesis Algorithm and BidiCharacterTest.txt

2014-10-14 Thread Eli Zaretskii
Hi,

One of the test cases in BidiCharacterTest.txt seems to me to
contradict the description of the rules N0 through N2 of the UBA.  Or
maybe I'm missing something.

Here are the details.

The test case in question, on line 114 of BidiCharacterTest.txt, is as
follows:

0061 0028 0028 007B 0062 2680 005B 005D 0029 007D 005B 0063 005B 005D 005D 05D0 0029;1;1;2 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1;16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

The first field, up to the 1st semicolon, is the sequence of
characters given by their Unicode codepoints, in the logical order.
Translated into readable text, it looks like this:

a ( ( { b ⚀ [ ] )  }  [  c  [  ]  ]  א  )
1 2 3 4 5 6  7 8 9 10 11 12 13 14 15 16 17

where I inserted blanks between every 2 characters, for better
readability, and added position numbers.  The next two fields of the
test case data, both of whose values are 1, specify that the paragraph
direction is RTL and that the resolved paragraph embedding level is 1.
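
As an illustration, one such record can be split into its fields with a few lines of Python. The helper name `parse_record` is hypothetical. The sketch assumes the field order given in the header comments of BidiCharacterTest.txt:

- code points
- paragraph direction (0 = LTR, 1 = RTL, 2 = auto)
- resolved paragraph level
- resolved levels (`x` marks characters removed by rule X9)
- visual order

```python
# Hypothetical helper: split one BidiCharacterTest.txt record into fields.
# Assumed field order: code points; paragraph direction; resolved paragraph
# level; resolved levels; visual order.
record = ("0061 0028 0028 007B 0062 2680 005B 005D 0029 007D 005B 0063 005B "
          "005D 005D 05D0 0029;1;1;"
          "2 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1;"
          "16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0")


def parse_record(line):
    codes, para_dir, para_level, levels, order = line.split(";")
    return {
        # Field 0: code points in logical order
        "text": "".join(chr(int(cp, 16)) for cp in codes.split()),
        # Field 1: 0 = LTR, 1 = RTL, 2 = auto
        "paragraph_direction": int(para_dir),
        # Field 2: resolved paragraph embedding level
        "paragraph_level": int(para_level),
        # Field 3: resolved level per character; "x" = removed by X9
        "levels": [None if lv == "x" else int(lv) for lv in levels.split()],
        # Field 4: logical indices in visual order, left to right
        "visual_order": [int(i) for i in order.split()],
    }


rec = parse_record(record)
print(rec["paragraph_direction"], rec["paragraph_level"], len(rec["text"]))
# 1 1 17
```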

Let me now present the application of N0 through N2, as I understand
them, to this text.  (Since there are no explicit directional codes
here, and no weak characters, we can skip all the rules before N0.)

The results of identifying bracket pairs, per BD16, sorted by the
position of the opening bracket, are as follows:

 2 and 17
 3 and 9
 7 and 8
 11 and 15
 13 and 14
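
The BD16 pairing step that produces this list can be sketched as a stack walk. This is a simplified, hypothetical helper; `find_bracket_pairs` does not come from any reference implementation. It has the following limitations:

- It knows only the three ASCII bracket pairs used in this test case.
- It assumes every bracket still has Bidi_Class ON.
- It does not model canonical equivalents or the fixed stack-depth limit of BD16.

```python
# Hypothetical, simplified sketch of BD16 bracket-pair identification.
# Only ASCII brackets are handled; canonical equivalents and the BD16
# stack-depth limit are not modeled.
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def find_bracket_pairs(text):
    """Return (opening, closing) 1-based positions sorted by opening position."""
    stack = []   # (opening bracket, position), innermost on top
    pairs = []
    for pos, ch in enumerate(text, start=1):
        if ch in OPENERS:
            stack.append((ch, pos))
        elif ch in CLOSERS:
            # Search from the top of the stack down for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == CLOSERS[ch]:
                    pairs.append((stack[depth][1], pos))
                    del stack[depth:]  # pop through the matched element
                    break
            # If nothing matched, the closer is ignored and the stack is kept.
    return sorted(pairs)


text = "a(({b\u2680[])}[c[]]\u05d0)"
print(find_bracket_pairs(text))
# [(2, 17), (3, 9), (7, 8), (11, 15), (13, 14)]
```

Note that the '{' at position 4 never finds a partner. The '}' at position 10 arrives after the pair 3-9 has already popped it off the stack.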

Applying N0, we see that:

 . The pair 2-17 encloses 'א', which matches the embedding direction,
   so N0b says to resolve this pair to the embedding direction,
   i.e. R.

 . The pair 3-9 encloses 'b', whose direction is opposite to the
   embedding direction, and has 'a' before the opening bracket, so
   N0c1 says we should resolve this pair as L, the direction opposite
   to the embedding one.

 . The pair 7-8 encloses no strong characters, so it is left as is.

 . The pair 11-15 encloses 'c' and is preceded by 'b', so N0c1 again
   says to resolve this pair as L.

 . The pair 13-14 encloses no strong characters, so it is left alone.

Therefore, the result after N0 is this:

a ( ( { b ⚀ [ ] ) } [ c [ ] ] א )
L R L N L N  N N L N L L N N L R R
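
The reading of N0 used above tests every pair against the Bidi_Class values as they stood when N0 began. It can be sketched as follows. The helper `n0_snapshot` is hypothetical and makes these assumptions:

- Only L, R and ON occur, so EN/AN never need to be treated as R.
- sos is R, because the paragraph level is 1.
- Positions are 1-based, as in the pair list above.

It reproduces the N0 result shown here, with ON in place of N.

```python
# Hypothetical sketch of N0 under the "snapshot" reading: every pair is
# tested against the classes as they were when N0 started.
# Assumptions: only L, R and ON occur; sos = R; positions are 1-based.

def n0_snapshot(types, pairs, embedding="R", sos="R"):
    original = list(types)   # frozen copy consulted for every test
    result = list(types)     # values written by N0
    opposite = "L" if embedding == "R" else "R"
    for o, c in pairs:
        inside = {original[i - 1] for i in range(o + 1, c)}
        if embedding in inside:                        # N0b
            new = embedding
        elif opposite in inside:                       # N0c
            preceding = sos
            for i in range(o - 1, 0, -1):              # search backwards
                if original[i - 1] in ("L", "R"):
                    preceding = original[i - 1]
                    break
            new = opposite if preceding == opposite else embedding  # N0c1 / N0c2
        else:                                          # N0d: no strong type inside
            continue
        result[o - 1] = result[c - 1] = new
    return result


types = ["L", "ON", "ON", "ON", "L", "ON", "ON", "ON", "ON",
         "ON", "ON", "L", "ON", "ON", "ON", "R", "ON"]
pairs = [(2, 17), (3, 9), (7, 8), (11, 15), (13, 14)]
print(" ".join(n0_snapshot(types, pairs)))
# L R L ON L ON ON ON L ON L L ON ON L R R
```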

Applying N1, we then obtain the following result:

a ( ( { b ⚀ [ ] ) } [ c [ ] ] א )
L R L L L L  L L L L L L L L L R R

There are no neutrals left, so N2 doesn't need to be applied.
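
Rules N1 and N2 amount to filling each maximal run of neutrals from its neighbours. The function `resolve_neutrals` in this sketch is hypothetical and assumes:

- ON is the only neutral class present.
- There are no EN/AN characters.
- sos and eos are both R for a paragraph at level 1.

A run whose two sides agree takes that direction (N1). Any run whose sides disagree falls through to N2 and takes the embedding direction.

```python
# Hypothetical sketch of rules N1 and N2 for one isolating run sequence.
# Assumptions: ON is the only neutral; no EN/AN; sos = eos = R (level 1).

def resolve_neutrals(types, embedding="R", sos="R", eos="R"):
    out = list(types)
    i = 0
    while i < len(out):
        if out[i] != "ON":
            i += 1
            continue
        j = i
        while j < len(out) and out[j] == "ON":
            j += 1                     # out[i:j] is a maximal run of neutrals
        before = out[i - 1] if i > 0 else sos
        after = out[j] if j < len(out) else eos
        # N1 when both sides agree, otherwise N2 (embedding direction)
        fill = before if before == after else embedding
        out[i:j] = [fill] * (j - i)
        i = j
    return out


after_n0 = ["L", "R", "L", "ON", "L", "ON", "ON", "ON", "L",
            "ON", "L", "L", "ON", "ON", "L", "R", "R"]
print(" ".join(resolve_neutrals(after_n0)))
# L R L L L L L L L L L L L L L R R
```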

Now I2 gives the following resolved levels:

a ( ( { b ⚀ [ ] ) } [ c [ ] ] א )
2 1 2 2 2 2  2 2 2 2 2 2 2 2 2 1 1
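
Rules I1 and I2 can be sketched as below; `implicit_levels` is a hypothetical helper. At an odd level such as 1, I2 raises L, EN and AN by one. Every L therefore ends up at level 2 and every R stays at level 1, which is how the levels above follow from the N1 result.

```python
# Hypothetical sketch of the implicit rules I1 and I2.

def implicit_levels(types, level):
    levels = []
    for t in types:
        if level % 2 == 0:                 # I1: even (LTR) embedding level
            if t == "R":
                levels.append(level + 1)
            elif t in ("AN", "EN"):
                levels.append(level + 2)
            else:
                levels.append(level)
        else:                              # I2: odd (RTL) embedding level
            levels.append(level + 1 if t in ("L", "EN", "AN") else level)
    return levels


after_n1 = ["L", "R"] + ["L"] * 13 + ["R", "R"]
print(" ".join(map(str, implicit_levels(after_n1, 1))))
# 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1
```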

However, BidiCharacterTest.txt gives a different sequence of resolved
levels:

2 1 1 1 2 1  1 1 1 1 1 2 1 1 1 1 1

Could someone please point out what I am missing or doing incorrectly?

Thanks in advance.
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


Re: Bliss?

2014-10-14 Thread Doug Ewell
Markus Scherer markus dot icu at gmail dot com wrote:

 As Michael said, I don't have information. But I found this which
 might help:
 http://en.wikipedia.org/wiki/Blissymbols#Towards_the_international_standardization_of_the_script

Statements in the linked article such as the following (not written by
Markus) always trouble me:

The proposed encoding does not use the lexical encoding model used in
the existing ISO-IR/169 registered character set, but instead applies
the Unicode and ISO character-glyph model to the Bliss-character model
already adopted by BCI, since this would significantly reduce the number
of needed characters.

since my understanding has always been that the reasons behind the
character-glyph model go much deeper than reducing the number of encoded
characters.

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org


Re: Bliss?

2014-10-14 Thread Andrew West
On 14 October 2014 17:06, Doug Ewell d...@ewellic.org wrote:

 Statements in the linked article such as the following (not written by
 Markus) always trouble me:

Gosh, I wonder who it could have been?

https://en.wikipedia.org/w/index.php?title=Blissymbols&diff=331226727&oldid=331223779

Andrew


Re: Bliss?

2014-10-14 Thread Michael Everson
On 14 Oct 2014, at 17:59, Andrew West andrewcw...@gmail.com wrote:

 On 14 October 2014 17:06, Doug Ewell d...@ewellic.org wrote:
 
 Statements in the linked article such as the following (not written by
 Markus) always trouble me:
 
 Gosh, I wonder who it could have been?
 
 https://en.wikipedia.org/w/index.php?title=Blissymbols&diff=331226727&oldid=331223779

Oof.

Folks, I’m a member of the BC-UK committee and have been working with BCI for 
years to ready Bliss for encoding. Work proceeds apace. 

Michael Everson * http://www.evertype.com/


Re: Bidi Parenthesis Algorithm and BidiCharacterTest.txt

2014-10-14 Thread Eli Zaretskii
 From: Andrew Glass (WINDOWS) andrew.gl...@microsoft.com
 Date: Tue, 14 Oct 2014 18:07:24 +
 
 The difference is that N0 is applied per bracket pair and the result of the
 resolution of one bracket pair may impact the resolution of other bracket
 pairs in the same isolating run sequence. So in your example:
 
 · 2-17 is resolved to R as you say.
 
 · Since 2-17 is now R and not neutral, the resolution of 3-9 is R because the
 check for context finds the opening parenthesis at 2 (now R) before the a at 1.
 Therefore 3-9 is R under N0c2.

But there's nothing about this in the UAX#9 language!  How did you
arrive at this dependency, using just what the UBA says?

 The proposed update attempts to make this clearer in the intro to 3.3.5:
 
 http://www.unicode.org/reports/tr9/tr9-32.html#N0
 
 Note that this rule is applied based on the current bidirectional character
 type of each paired bracket and not the original type, as this could have
 changed under X6.
 
 Perhaps this should be emended to include that N0 can also update the type for
 subsequent tests under N0, which is the case here.

There's a big difference between X6 and N0.  X6 is about the explicit
override, and is applied before N0.  Your interpretation makes N0 a
recursive rule, something that is not even hinted at by the UBA spec.

 Currently N0 states:
 
 N0. Process bracket pairs in an isolating run sequence sequentially in the
 logical order of the text positions of the opening paired brackets using the
 logic given below.
 
 Example 1 illustrates a similar case in that the neutral ! resolves to R
 because of the bracket resolution to R rather than the context between two Ls.
 This of course takes place in N1 and not N0 as in the example you ask about.

Of course!  And so Example 1 is very different from what we are
discussing, because each stage of the algorithm is applied to the
results of the previous stage.  But there's no other place, AFAICS,
where the same stage is applied recursively.  So I really don't see
how this interpretation could be gleaned from the UBA description.

Thanks for explaining, but it is really frustrating to find out about
these untold subtleties at this late stage.  (And yes, I've read the
proposed changes in tr9-32.html, and not even they say anything about
this.)  How can we be sure that your interpretation is indeed correct,
if it is not even hinted anywhere?


RE: Bidi Parenthesis Algorithm and BidiCharacterTest.txt

2014-10-14 Thread Whistler, Ken
Eli asked in response to Andrew:

  · Since 2-17 is now R and not neutral, the resolution of 3-9 is R because the
  check for context finds the opening parenthesis at 2 (now R) before the a at 1.
  Therefore 3-9 is R under N0c2.

 But there's nothing about this in the UAX#9 language!  How did you
 arrive at this dependency, using just what the UBA says?

See below.

  Perhaps this should be emended to include that N0 can also update the type for
  subsequent tests under N0, which is the case here.

 There's a big difference between X6 and N0.  X6 is about the explicit
 override, and is applied before N0.  Your interpretation makes N0 a
 recursive rule, something that is not even hinted at by the UBA spec.

I disagree that this makes N0 a recursive rule. It is a rule with repeatedly
applicable subparts. And like nearly all the rules in the UBA (except ones
which explicitly state that they apply to *original* Bidi_Class values,
which thus have to be stored across the life of the processing of
the string in question), all rules apply to the *current* Bidi_Class
values of the examined context.

In this sense, the UBA, for most rules, operates as a set of
change-and-forget steps. Thus in the case of N0, if you are
processing a sequential list of bracket pairs, you just process
each pair, one at a time, and it sees as its input whatever the
*current* state is -- which may be (and often is) changed by
the last step.

What you do *not* need to do for N0 is preserve the starting
state when N0 was initiated, and independently check each
bracket pair against *that* array of Bidi_Class values while you
are busy setting them to new values.
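
The difference between the two readings comes down to which array the N0 tests read. The sketch below is hypothetical: `rule_n0` and its `sequential` flag are illustrative names, not part of bidiref or any other reference implementation. It assumes:

- Only L, R and ON occur.
- sos is R at paragraph level 1.
- Pair positions are 1-based.

With `sequential=True` the output agrees with the Bidi_Class line of "Current State: 14" in the bidiref trace quoted below. With `sequential=False` it reproduces the snapshot result from the original question.

```python
# Hypothetical sketch of N0 under two readings:
#   sequential=True  -> each pair tests the *current* classes, including
#                       brackets already resolved by earlier pairs
#   sequential=False -> each pair tests a snapshot taken when N0 started
# Assumptions: only L, R and ON occur; sos = R; positions are 1-based.

def rule_n0(types, pairs, sequential, embedding="R", sos="R"):
    current = list(types)
    frozen = list(types)
    opposite = "L" if embedding == "R" else "R"
    for o, c in pairs:
        view = current if sequential else frozen   # the array the tests read
        inside = {view[i - 1] for i in range(o + 1, c)}
        if embedding in inside:                        # N0b
            new = embedding
        elif opposite in inside:                       # N0c
            preceding = next((view[i - 1] for i in range(o - 1, 0, -1)
                              if view[i - 1] in ("L", "R")), sos)
            new = opposite if preceding == opposite else embedding  # N0c1/N0c2
        else:                                          # N0d: leave pair alone
            continue
        current[o - 1] = current[c - 1] = new
    return current


types = ["L", "ON", "ON", "ON", "L", "ON", "ON", "ON", "ON",
         "ON", "ON", "L", "ON", "ON", "ON", "R", "ON"]
pairs = [(2, 17), (3, 9), (7, 8), (11, 15), (13, 14)]
print(" ".join(rule_n0(types, pairs, sequential=True)))
# L R R ON L ON ON ON R ON R L ON ON R R R
print(" ".join(rule_n0(types, pairs, sequential=False)))
# L R L ON L ON ON ON L ON L L ON ON L R R
```

Carrying the sequential result through N1, N2 and I2 gives 2 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1, the levels listed in BidiCharacterTest.txt.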

 Of course!  And so Example 1 is very different from what we are
 discussing, because each stage of the algorithm is applied to the
 results of the previous stage.  But there's no other place, AFAICS,
 where the same stage is applied recursively.  So I really don't see
 how this interpretation could be gleaned from the UBA description.

I agree that this could (and should) be made more explicit, as
it is apparent that people can run into problems of interpretation
here.

An examination of the functioning of the N0 rule in the bidi
reference implementations could, however, also be used to
help explain what is intended here. For example, in the particular
test case in question, the bidiref C implementation can have its
debug diagnostics cranked up, and you find:

Trace: Entering br_UBA_ResolveEN [W7]
Current State: 13
  Text:       0061 0028 0028 007B 0062 2680 005B 005D 0029 007D 005B 0063 005B 005D 005D 05D0 0029
  Bidi_Class: L    ON   ON   ON   L    ON   ON   ON   ON   ON   ON   L    ON   ON   ON   R    ON
  Levels:     1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
  Runs:
RR

…

Trace: Exiting br_SortPairList
Pair list:  {1,16} {2,8} {6,7} {10,14} {12,13}
Debug: Strong direction e between brackets
Debug: Strong direction o between brackets
Debug: No strong direction between brackets
Debug: Strong direction o between brackets
Debug: No strong direction between brackets
Current State: 14
  Text:       0061 0028 0028 007B 0062 2680 005B 005D 0029 007D 005B 0063 005B 005D 005D 05D0 0029
  Bidi_Class: L    R    R    ON   L    ON   ON   ON   R    ON   R    L    ON   ON   R    R    R
  Levels:     1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
  Runs:
RR

Which is the clue needed to track down how the interpretation
based on comparing Bidi_Class values retained from the initiation of
rule N0 is incorrect.

--Ken

 Thanks for explaining, but it is really frustrating to find out about
 these untold subtleties at this late stage.  (And yes, I've read the
 proposed changes in tr9-32.html, and not even they say anything about
 this.)  How can we be sure that your interpretation is indeed correct,
 if it is not even hinted anywhere?


Re: Bidi Parenthesis Algorithm and BidiCharacterTest.txt

2014-10-14 Thread Eli Zaretskii
 From: Whistler, Ken ken.whist...@sap.com
 Date: Tue, 14 Oct 2014 22:14:02 +
 Cc: Whistler, Ken ken.whist...@sap.com,
 unicode@unicode.org unicode@unicode.org
 
 I disagree that this makes N0 a recursive rule. It is a rule with repeatedly
 applicable subparts. And like nearly all the rules in the UBA (except ones
 which explicitly state that they apply to *original* Bidi_Class values,
 which thus have to be stored across the life of the processing of
 the string in question), all rules apply to the *current* Bidi_Class
 values of the examined context.

Can you point out where this is stated in the UBA?

According to my reading of the UBA, only W7 could qualify as something
similar to the recursive interpretation of N0.  All the other rules
are either defined in a way that the recursion cannot happen
(because the conditions for applying the rule disappear after it is
applied once), or explicitly speak about a sequence of similar
characters whose bidi types are modified in the same manner.

 Trace: Exiting br_SortPairList
 Pair list: {1,16} {2,8} {6,7} {10,14} {12,13}
 Debug: Strong direction e between brackets
 Debug: Strong direction o between brackets
 Debug: No strong direction between brackets
 Debug: Strong direction o between brackets
 Debug: No strong direction between brackets

This doesn't explain _why_ the decision was that the direction between
brackets was one or the other.  Which is at the core of the issue at
hand.  So this debugging output doesn't really help here.

In any case, when designing an implementation, one normally expects to
read some formal requirements, not learn those requirements from
another implementation.

Anyway, I'm glad we all agree that, once again, the new additions to
the UBA, and the BPA-related ones in particular, are not described
well enough to avoid misinterpretations and misunderstanding such as
this one, and that the language should be improved and clarified,
hopefully sooner rather than later.  I've just lost 20 hours of work
due to that.