Re: A simpler definition of the Bidi Algorithm

Michael D. Adams Sun, 17 Oct 2010 11:19:55 -0700

"The biggest challenge was not in creating those tables, but in
understanding the nuances of the rules, by the way."


Two questions so I can understand better.

First, by nuances do you mean the nuances of how the rules interact
(which I think would be simplified by using a definition as I have
proposed) or some other nuance?

Second, how did you create the state tables?  There are standard,
automated, optimal techniques for turning a regular language into a
state table by turning the regular language into a finite state
automaton (this is partly why I'm advocating phrasing the rules as a
single regular language).  Did you use something like that or some
other way?

On Sun, Oct 17, 2010 at 1:12 PM, Asmus Freytag <[email protected]> wrote:
>  On 10/17/2010 7:01 AM, Michael D. Adams wrote:
>>
>> This is something that not even the C++ and Java reference
>> implementations do (though it appears that the C++ implementation of
>> the W rules was originally derived from a regular expression as it
>> uses state tables, but if so it is undocumented).  (Which by the way
>> they have not been proven to be equivalent, they have merely been
>> tested.  Proof is a much more complicated formalism.)
>
> Having written the C++ reference implementation, I know a thing or two about
> it :)
>
> The two implementations were not formally proven to be equivalent - in a way
> that wasn't the purpose. The purpose was to see whether several (in this
> case two) implementers could read the rules and come up with the same
> results.
>
> When someone creates a real-world implementation to a specification like the
> Bidi Algorithm, it's not usually verified by formal proof, but by testing.
> Therefore, the exercise had to with finding out what level of testing was
> sufficient to capture inadvertent misapplication of some of the
> less-well-worded rules. (Some of them have since been rewritten to make
> their intent less ambiguous).
>
> Most of the issues were found with the test pass that simply compared all
> possible sequences up to length 6. That is better than the BidiTest.txt
> file, which I understand only goes to length 4. Stochastic sampling of
> sequences up to length 20 resulted in fewer reported discrepancies - again,
> all of this is from memory.
>
> For the test, the maximal depth of embeddings was set to 15 instead of 63,
> and the input were strings of bidi classes, not raw characters - therefore
> cutting down on the number of possible sequences.
>
> The Java implementation was deliberately designed to be transparent -
> matching the way the rules are formulated in an obvious way. For the C++
> implementation I wanted to do something different, and possibly faster, so I
> hand-coded a few state tables. The biggest challenge was not in creating
> those tables, but in understanding the nuances of the rules, by the way.
>
> The situation today is not fully comparable, since there was some feedback
> from the reference implementation project to the rules and especially their
> wording.
>
> A./
>

Re: A simpler definition of the Bidi Algorithm

Reply via email to