Gaal Yahas wrote:
> On Sun, Feb 1, 2009 at 2:36 PM, Amit Aronovitch <[email protected]> wrote:
>
>> If you construct your regexp from pieces, I expect each part to be
>> meaningful as re on its own, so have the same rules.
>>
>
> No, you can't make this assumption. Here's a common Perl 5 idiom:
>
>   my $alternatives = "(" . (join ")|(", map quotemeta $_, @choices) . ")";
>   my $re = qr/$alternatives/;
>
> There happens to be no bidi trouble here, but it's evidence that
> people assemble regexps from small pieces. I'm sure there are other
> cases.
>
In fact this is very much like the example I had in mind. I expected
each part to be a meaningful RE on its own only because I could not
think of any useful counterexample. Nevertheless, you are right, and I
suspect that such counterexamples would pop up eventually. The
role-selection scheme I proposed earlier could cover that case, and I
believe the occasional edge-case confusion would be outweighed by the
clarity that a standardized bidi display method for complex expressions
would bring.
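
For concreteness, here is a rough sketch of the point conceded above: the pieces used to assemble a regexp need not be valid REs on their own. It is written in Python rather than Perl purely for illustration, and the names build_alternation and is_standalone_re are made up for this example; re.escape plays the role of quotemeta.

```python
import re


def build_alternation(choices):
    """Join escaped literals into "(a)|(b)|...", mirroring the quoted Perl idiom."""
    return "(" + ")|(".join(re.escape(c) for c in choices) + ")"


def is_standalone_re(piece):
    """Return True if the piece compiles as a regular expression on its own."""
    try:
        re.compile(piece)
        return True
    except re.error:
        return False


pattern = build_alternation(["a.b", "c+d"])

# The glue strings are not valid REs by themselves...
print(is_standalone_re("("))      # False
print(is_standalone_re(")|("))    # False
# ...but the assembled result is.
print(is_standalone_re(pattern))  # True
```

So any display rule that assumes every fragment is itself a well-formed RE would have to cope with fragments such as ")|(" as well.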

>> Actually, the suggestion for source code (programming languages
>> dependant - again we assume full syntax awareness) specifically defines
>> string literals and comments as tokens, so it would be $foo = "MOLAHS"
>> as well.
>>
>>> Also, there's the tokenization problem you already
>>> mention: NATBA"G would come out very bad from this transformation.
>>>
>> In #1 certainly it would not be much different than NATBAG. In #2, since
>> " is not a special character in RE, it will be included in the token and
>> have the standard Bidi algorithm applied to it - so you'd get G"ABTAN ,
>> just as you'd probably expect.
>>
>
> Hmmm, you're right. I can't off-hand give you a counterexample where
> the bidi algorithm's idea of tokenization and RE's differ, but I have
> a hunch it's difficult to cover all cases.
>
We are not talking about the general-purpose bidi algorithm, but about a
new standard that would provide a "higher-level protocol" for it (as
suggested in Unicode UAX #9). The main idea is that the application
(e.g. an editor) would do a *syntax-dependent* tokenization and run the
bidi algorithm separately on each token (the term "token" may be
misleading here, since it can refer to long strings such as comments).
If we define the tokenization process for RE correctly, there should be
no collisions (preventing them is exactly its purpose).

> You're welcome. I may be interested in future discussions on this
> topic, if you're cooking something up.
>
As has probably become apparent by now, what is being "cooked up" is a
proposal for a new standard for bidi display of complex expressions (it
includes general guidelines and several examples for specific syntaxes,
one of which is RE). I'll send a current draft to you and Gabor;
comments are very welcome. I prefer not to post it publicly because it
is still preliminary and undergoing major modifications, but anyone
interested is welcome to contact me.
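
To make the per-token idea discussed above a bit more concrete, here is a toy sketch in Python. It is not the draft proposal and not the real bidi algorithm. It assumes the convention used in this thread (uppercase ASCII letters stand in for right-to-left letters), an arbitrary set of RE metacharacters (META), and a crude stand-in for bidi reordering that simply reverses tokens containing only stand-in RTL letters and neutrals. The function names are hypothetical.

```python
# Assumed metacharacter set; a real definition would depend on the RE dialect.
META = set("()[]{}|*+?.^$\\")


def tokenize(pattern):
    """Split a pattern into single-metacharacter tokens and literal runs."""
    tokens, run = [], []
    for ch in pattern:
        if ch in META:
            if run:
                tokens.append("".join(run))
                run = []
            tokens.append(ch)
        else:
            run.append(ch)
    if run:
        tokens.append("".join(run))
    return tokens


def display_token(token):
    """Stand-in for running bidi on one token: reverse tokens that are purely RTL."""
    has_rtl = any(c.isupper() for c in token)  # uppercase = stand-in RTL letter
    has_ltr = any(c.islower() for c in token)
    if has_rtl and not has_ltr:
        return token[::-1]
    return token  # LTR, neutral-only and mixed tokens are left alone in this sketch


def display(pattern):
    """Reorder each token separately, keeping the tokens in left-to-right order."""
    return "".join(display_token(t) for t in tokenize(pattern))


print(display('NATBA"G'))          # G"ABTAN
print(display("(SHALOM)|(abc)"))   # (MOLAHS)|(abc)
```

Since " is not in META, it stays inside the literal token and is reordered together with it, while the parentheses and | keep their positions. Escape sequences and mixed-direction tokens are deliberately not handled here.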

    Amit
_______________________________________________
Perl mailing list
[email protected]
http://perl.org.il/mailman/listinfo/perl
