"The biggest challenge was not in creating those tables, but in understanding the nuances of the rules, by the way."
Two questions so I can understand better. First, by nuances do you mean the nuances of how the rules interact (which I think would be simplified by using a definition as I have proposed) or some other nuance? Second, how did you create the state tables? There are standard, automated, optimal techniques for turning a regular language into a state table by turning the regular language into a finite state automaton (this is partly why I'm advocating phrasing the rules as a single regular language). Did you use something like that or some other way? On Sun, Oct 17, 2010 at 1:12 PM, Asmus Freytag <[email protected]> wrote: > On 10/17/2010 7:01 AM, Michael D. Adams wrote: >> >> This is something that not even the C++ and Java reference >> implementations do (though it appears that the C++ implementation of >> the W rules was originally derived from a regular expression as it >> uses state tables, but if so it is undocumented). (Which by the way >> they have not been proven to be equivalent, they have merely been >> tested. Proof is a much more complicated formalism.) > > Having written the C++ reference implementation, I know a thing or two about > it :) > > The two implementations were not formally proven to be equivalent - in a way > that wasn't the purpose. The purpose was to see whether several (in this > case two) implementers could read the rules and come up with the same > results. > > When someone creates a real-world implementation to a specification like the > Bidi Algorithm, it's not usually verified by formal proof, but by testing. > Therefore, the exercise had to with finding out what level of testing was > sufficient to capture inadvertent misapplication of some of the > less-well-worded rules. (Some of them have since been rewritten to make > their intent less ambiguous). > > Most of the issues were found with the test pass that simply compared all > possible sequences up to length 6. That is better than the BidiTest.txt > file, which I understand only goes to length 4. Stochastic sampling of > sequences up to length 20 resulted in fewer reported discrepancies - again, > all of this is from memory. > > For the test, the maximal depth of embeddings was set to 15 instead of 63, > and the input were strings of bidi classes, not raw characters - therefore > cutting down on the number of possible sequences. > > The Java implementation was deliberately designed to be transparent - > matching the way the rules are formulated in an obvious way. For the C++ > implementation I wanted to do something different, and possibly faster, so I > hand-coded a few state tables. The biggest challenge was not in creating > those tables, but in understanding the nuances of the rules, by the way. > > The situation today is not fully comparable, since there was some feedback > from the reference implementation project to the rules and especially their > wording. > > A./ >

