Re: Does ::: constrain the pattern engine implementation?
On Wed, Aug 28, 2002 at 10:36:54AM -0700, Larry Wall wrote:
> That is a worthy consideration, but expressiveness takes precedence
> over it in this case. DFAs are really only good for telling you
> *whether* and *where* a pattern matches as a whole. They are
> relatively useless for telling you *how* a pattern matches.

Actually, DFAs can also tell you *how* a pattern matches. For instance, the RE library (http://www.sourceforge.net/projects/libre/) is DFA-based and supports a lot of Perl 5 regular expression features:

  - submatches
  - left-most matching semantics
  - greedy and non-greedy operators (*, *?, ?, ??)
  - zero-length assertions such as ^, $, \b

I don't know how to implement back-references and embedded Perl code using a DFA. Look-ahead and look-behind assertions can probably be implemented, though this looks tricky. But these features are not used that often, so I believe that most real-life Perl 5 regular expressions can be handled by a DFA.

> Add to that the fact that most real-life patterns don't generally do
> much backtracking, because they're written to succeed, not to fail.
> This pattern never backtracks, for instance:
>
>     my ($num) = /^Items: (\d+)/;

Another advantage of DFAs over NFAs is that the core of a DFA-based pattern matching implementation is a small, simple loop which is executed very efficiently by modern CPUs. On the other hand, the core of an NFA-based implementation is much larger and much more complex, and I'm not sure it is executed as efficiently. In particular, there are probably a lot more branch mispredictions.

As an example of the performance improvement one can sometimes achieve using a DFA-based rather than an NFA-based implementation, here are the results I get on my computer for the regular expression benchmark from the Great Computer Language Shootout (http://www.bagley.org/~doug/shootout/bench/regexmatch/):

    Perl 5                        3.49s
    OCaml with PCRE (NFA-based)   3.17s
    OCaml with RE   (DFA-based)   0.34s

-- Jerome
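[The non-backtracking idea Jerome describes can be sketched in miniature. This is a toy illustration in Python, not the RE library's actual algorithm (which uses tagged transitions): advance every live match attempt in lockstep, so each character of the input is examined exactly once. The pattern is restricted here to a set of non-empty literal alternatives, and `lockstep_match` is a hypothetical helper name.]

```python
def lockstep_match(alternatives, text):
    """Scan text once, tracking every live partial match in lockstep
    (the set-of-states idea behind a DFA). Returns (start, end) of the
    first match to complete, or None. No backtracking occurs: each
    character of text is examined exactly once. Alternatives must be
    non-empty literal strings."""
    live = []  # tuples of (alt_index, chars_already_matched, start_pos)
    for pos, ch in enumerate(text):
        # a new attempt for every alternative can begin at each position
        live.extend((i, 0, pos) for i in range(len(alternatives)))
        survivors = []
        for alt, off, start in live:
            if alternatives[alt][off] == ch:
                if off + 1 == len(alternatives[alt]):
                    return (start, pos + 1)   # full match completed
                survivors.append((alt, off + 1, start))
        live = survivors
    return None
```

A usage sketch: `lockstep_match(["Feb", "Mar"], "xxMar01")` finds `"Mar"` at offset 2 without ever re-reading a character, which is exactly why the inner loop stays small and branch-predictable.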
Re: Does ::: constrain the pattern engine implementation?
[EMAIL PROTECTED] (Deven T. Corzine) writes:
> Would it be _possible_ to create a non-backtracking implementation
> of a Perl 6 pattern engine

I don't believe that it is, but not just because of : and friends. Why does it matter?

--
Life sucks, but it's better than the alternative.
    -- Peter da Silva
Re: Does ::: constrain the pattern engine implementation?
At 10:57 AM -0400 8/28/02, Deven T. Corzine wrote:
> Would it be _possible_ to create a non-backtracking implementation of
> a Perl 6 pattern engine, or does the existence of backtracking-related
> operators preclude this possibility in advance?

In general, no, of course it's not possible to create a non-backtracking perl regex engine. Far too much of perl's regexes requires backtracking.

That doesn't mean you can't write one for a specific subset of perl's regexes, though. A medium-term goal for the regex engine is to note where a DFA would give correct behaviour and use one, rather than going through the more expensive generalized regex engine we'd otherwise use.

If you want to head over to [EMAIL PROTECTED] and pitch in on the regex implementation (it's being worked on now) that'd be great.

-- Dan

--it's like this---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
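[A crude sketch of the kind of check Dan describes: deciding whether a given pattern is safe to hand to a DFA. This is a hypothetical textual heuristic, purely illustrative -- a real engine would walk the compiled parse tree rather than grep the pattern source -- but it shows the shape of the idea: reject any construct a plain DFA cannot express.]

```python
import re

def dfa_safe(pattern):
    """Return False if the pattern (Perl 5 syntax) appears to use a
    construct a plain DFA cannot express. Illustrative only: it scans
    the pattern text, which a real engine would never do -- it would
    inspect the parsed pattern tree instead."""
    unsafe = [
        r'\\[1-9]',    # backreferences \1 .. \9
        r'\(\?=',      # lookahead
        r'\(\?!',      # negative lookahead
        r'\(\?<[=!]',  # lookbehind
        r'\(\?\{',     # embedded code
    ]
    return not any(re.search(u, pattern) for u in unsafe)
```

So `dfa_safe(r'^Items: (\d+)')` is true, while a pattern with a backreference like `r'(\w+) \1'` gets routed to the general backtracking engine.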
Re: Does ::: constrain the pattern engine implementation?
On 28 Aug 2002, Simon Cozens wrote:
> [EMAIL PROTECTED] (Deven T. Corzine) writes:
> > Would it be _possible_ to create a non-backtracking implementation
> > of a Perl 6 pattern engine
>
> I don't believe that it is, but not just because of : and friends.
> Why does it matter?

I'm not saying we should dump the operators -- if we get more power by assuming a backtracking implementation, maybe that's a worthwhile tradeoff. On the other hand, if we can keep the implementation possibilities more open, that's always a worthwhile goal, even if we're not sure if or when we'll ever take advantage of those possibilities, or if we even could...

It seems like backtracking is a Bad Thing, in that it leads to reprocessing data that we've already looked at. On the other hand, it seems to be a Necessary Evil because of the memory costs of avoiding backtracking, and because we might have to give up valuable features without backtracking.

It may be that backreferences already demand backtracking. Or some other feature might. I don't know; I haven't thought it through.

If we must have backtracking, so be it. But if that's a tradeoff we're making for more expressive and powerful patterns, we should still at least make that tradeoff with our eyes open. And if the tradeoff can be avoided, that's even better.

Deven
Re: Does ::: constrain the pattern engine implementation?
On Wed, 28 Aug 2002, Dan Sugalski wrote:
> At 10:57 AM -0400 8/28/02, Deven T. Corzine wrote:
> > Would it be _possible_ to create a non-backtracking implementation of
> > a Perl 6 pattern engine, or does the existence of backtracking-related
> > operators preclude this possibility in advance?
>
> In general, no, of course it's not possible to create a non-backtracking
> perl regex engine. Far too much of perl's regexes requires backtracking.

Given that Perl 5 regexes are no longer regular (much less Perl 6), I'm sure this is probably true. There may be a regular subset which could be implemented without backtracking if problematic features are avoided, but surely a complete non-backtracking implementation is beyond reach.

On the other hand, :, ::, ::: and commit don't necessarily need to be a problem if they can be treated as hints that can be ignored. If allowing the normal engine to backtrack despite the hints would change the results, that might be a problem. I don't know; cut may pose special problems. Even if the new operators can't work without backtracking, maybe it doesn't matter, since there's surely a few others inherited from Perl 5 as well...

> That doesn't mean you can't write one for a specific subset of perl's
> regexes, though. A medium-term goal for the regex engine is to note
> where a DFA would give correct behaviour and use one, rather than going
> through the more expensive generalized regex engine we'd otherwise use.

I think this is a more realistic goal, and more or less what I had in mind. I believe there are many subpatterns which might be beneficial to compile to a DFA (or DFA-like) form, where runtime performance is important. For example, if a pattern is matching dates, a (Jan|Feb|Mar|Apr|...) subpattern would be more efficient to implement as a DFA than with backtracking. With a large amount of data to process, that represents significant savings...
> If you want to head over to [EMAIL PROTECTED] and pitch in on the regex
> implementation (it's being worked on now) that'd be great.

I'd like to do that, if I can find the time. It would be interesting to make a small experimental prototype to see if DFA construction could really improve performance over backtracking, but it would probably need to be a very restricted subset of regex operations to test the idea...

However, while I'm still on perl6-language, I have two language issues to discuss first:

(1) Can we have a :study modifier in Perl 6 for patterns?

It could be a no-op if necessary, but it could take the place of Perl 5's "study" operator and indicate that the programmer WANTS the pattern optimized for maximum runtime speed, even at the cost of compile time or memory. (Hey, how about a :cram modifier for extreme optimization? :-)

(2) Would simple alternation impede DFA-style optimizations?

Currently, a pattern like (Jun|June) would never match "June" because the leftmost match "Jun" would always take precedence, despite the normal longest-match behavior of regexes in general. This example could be implemented in a DFA; would that always be the case?

Would it be better for the matching of (Jun|June) to be undefined and implementation-dependent? Or is it best to require leftmost semantics?

Deven
Re: Does ::: constrain the pattern engine implementation?
On Wed, 28 Aug 2002, Deven T. Corzine wrote:
> Would it be better for the matching of (Jun|June) to be undefined and
> implementation-dependent? Or is it best to require leftmost semantics?

For an alternation spelled out explicitly in the pattern, it seems like undefined matching would be confusing. I regularly order the branches of regexes assuming they are tried left-to-right.

On the other hand, and on a related note of constrained implementation, do we require leftmost matching for interpolated arrays of literals (e.g. /@x/)? If, as with hyper-operators, we said the order of evaluation is undefined, we could use a fast algorithm (Aho-Corasick?) that doesn't preserve order.

/s
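[For what it's worth, Aho-Corasick is straightforward to sketch. This toy version -- a hypothetical helper, not proposed engine code -- builds the automaton over literal alternatives and scans the text in a single left-to-right pass. Note that it naturally reports matches in order of *position in the text*, not in the order the alternatives were written, which is exactly the order-insensitivity being proposed.]

```python
from collections import deque

def aho_corasick(patterns):
    """Build a tiny Aho-Corasick automaton over literal patterns and
    return a scan(text) function that reports every (start, pattern)
    hit in one pass over the text, with no backtracking."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                       # build the trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    q = deque(goto[0].values())              # BFS to set failure links
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]           # inherit suffix matches

    def scan(text):
        s, hits = 0, []
        for i, ch in enumerate(text):
            while s and ch not in goto[s]:   # follow failure links
                s = fail[s]
            s = goto[s].get(ch, 0)
            hits += [(i - len(p) + 1, p) for p in out[s]]
        return hits
    return scan
```

Scanning "Junebug" with the alternatives ("Jun", "June") reports *both* hits at position 0, one per end point -- the automaton has no notion of one alternative taking precedence over the other.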
Re: Does ::: constrain the pattern engine implementation?
On Wed, 28 Aug 2002, Deven T. Corzine wrote:

: I'm not saying we should dump the operators -- if we get more power by
: assuming a backtracking implementation, maybe that's a worthwhile tradeoff.
:
: On the other hand, if we can keep the implementation possibilities more
: open, that's always a worthwhile goal, even if we're not sure if or when
: we'll ever take advantage of those possibilities, or if we even could...

That is a worthy consideration, but expressiveness takes precedence over it in this case. DFAs are really only good for telling you *whether* and *where* a pattern matches as a whole. They are relatively useless for telling you *how* a pattern matches. For instance, a DFA can tell you that you have a valid computer program, but can't hand you back the syntax tree, because it has no way to decide between shifting and reducing. It has to do both simultaneously.

: It seems like backtracking is a Bad Thing, in that it leads to reprocessing
: data that we've already looked at. On the other hand, it seems to be a
: Necessary Evil because of the memory costs of avoiding backtracking, and
: because we might have to give up valuable features without backtracking.
:
: It may be that backreferences already demand backtracking. Or some other
: feature might. I don't know; I haven't thought it through.

I believe you are correct that backrefs require backtracking. Maybe some smart person will find a way to trace back through the states by which a DFA matched to retrieve backref info, but that's probably worth a PhD or two. Minimal matching is also difficult in a DFA, I suspect.

: If we must have backtracking, so be it. But if that's a tradeoff we're
: making for more expressive and powerful patterns, we should still at least
: make that tradeoff with our eyes open. And if the tradeoff can be avoided,
: that's even better.

I refer you to http:://history.org/grandmother/teach/egg/suck. :-)

That's a tradeoff I knowingly made in Perl 1.
I saw that awk had a DFA implementation, and suffered for it as a useful tool. And it's not just the backrefs. An NFA can easily incorporate strategies such as Boyer-Moore, which actually skips looking at many of the characters, or the "scream" algorithm used by study, which can skip even more. All the DFAs I've seen have to look at every character, albeit only once. I suppose it's theoretically possible to compile to a Boyer-Moore-ish state machine, but I've never seen it done.

Add to that the fact that most real-life patterns don't generally do much backtracking, because they're written to succeed, not to fail. This pattern never backtracks, for instance:

    my ($num) = /^Items: (\d+)/;

I'm not against applying a DFA implementation where it's useful and practical, but just because it's the best in some limited theoretical framework doesn't cut it for me. Humans do a great deal of backtracking in real life, once the limits of their parallel pattern matching circuits are exceeded. Even in language we often have to reparse sentences that are garden pathological. Why should computers be exempt? :-)

Larry
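[The character-skipping Larry mentions is easy to see in a Boyer-Moore-Horspool sketch (a simplified Boyer-Moore; this is a hypothetical illustration, not Perl's actual fixed-string optimization). The bad-character shift table lets the scan jump forward by up to the length of the needle, whereas a DFA must feed every single character through its transition table.]

```python
def horspool_find(needle, haystack):
    """Return the index of the first occurrence of needle in haystack,
    or -1. The shift table often lets whole stretches of the haystack
    go completely unexamined -- the skipping that a plain DFA scan,
    which must look at every character once, cannot do."""
    m = len(needle)
    if m == 0:
        return 0
    # how far the window may shift when it currently ends with char c
    shift = {c: m - 1 - i for i, c in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= len(haystack):
        if haystack[pos:pos + m] == needle:
            return pos
        # shift by the full needle length for characters not in needle
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

Searching for "Items" in "some Items: 42", the very first window ends in a space, so the scan leaps 5 characters forward and lands directly on the match, having never inspected "ome ".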
Re: Does ::: constrain the pattern engine implementation?
On Wed, 28 Aug 2002, Deven T. Corzine wrote:

: I'd like to do that, if I can find the time. It would be interesting to
: make a small experimental prototype to see if DFA construction could really
: improve performance over backtracking, but it would probably need to be a
: very restricted subset of regex operations to test the idea...

That'd be cool.

: However, while I'm still on perl6-language, I have two language issues to
: discuss first:
:
: (1) Can we have a :study modifier in Perl 6 for patterns?
:
: It could be a no-op if necessary, but it could take the place of Perl 5's
: "study" operator and indicate that the programmer WANTS the pattern
: optimized for maximum runtime speed, even at the cost of compile time or
: memory. (Hey, how about a :cram modifier for extreme optimization? :-)

Well, "studied" isn't really a property of a pattern--it's a property of a string that knows it will have multiple patterns matched against it. One could put a :study on the first pattern, but that's somewhat deceptive.

: (2) Would simple alternation impede DFA-style optimizations?
:
: Currently, a pattern like (Jun|June) would never match "June" because the
: leftmost match "Jun" would always take precedence, despite the normal
: longest-match behavior of regexes in general. This example could be
: implemented in a DFA; would that always be the case?

Well, "June" can match if what follows fails to match after "Jun".

: Would it be better for the matching of (Jun|June) to be undefined and
: implementation-dependent? Or is it best to require leftmost semantics?

Well, the semantics shouldn't generally wobble around like that, but it'd be pretty easy to let them wobble on purpose via pragma (or via :modifier, which are really just pragmas that scope to regex groups).

Larry
Re: Does ::: constrain the pattern engine implementation?
At 10:36 AM -0700 8/28/02, Larry Wall wrote:
> On Wed, 28 Aug 2002, Deven T. Corzine wrote:
> : I'm not saying we should dump the operators -- if we get more power by
> : assuming a backtracking implementation, maybe that's a worthwhile tradeoff.
> :
> : On the other hand, if we can keep the implementation possibilities more
> : open, that's always a worthwhile goal, even if we're not sure if or when
> : we'll ever take advantage of those possibilities, or if we even could...
>
> That is a worthy consideration, but expressiveness takes precedence over
> it in this case. DFAs are really only good for telling you *whether* and
> *where* a pattern matches as a whole. They are relatively useless for
> telling you *how* a pattern matches. For instance, a DFA can tell you
> that you have a valid computer program, but can't hand you back the
> syntax tree, because it has no way to decide between shifting and
> reducing. It has to do both simultaneously.

While true, there are reasonably common cases where you don't care about what or where, just whether. For a set of mushed-together examples:

    while (<>) {
        last if /END_OF_DATA/;
        $line .= $_ if /=$/;
        next unless /$user_entered_string/;
    }

Sure, it's a restricted subset of the stuff people do, and that's cool. I'd not even want to put in DFA-detecting code in the main regex compilation grammar. But in those cases where it is useful, a :dfa switch for regexes would be nifty. (Though *please* don't yet--we've not gotten the current grammar fully implemented :)

-- Dan

--it's like this---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: Does ::: constrain the pattern engine implementation?
On Wed, 28 Aug 2002, Larry Wall wrote:
> : (1) Can we have a :study modifier in Perl 6 for patterns?
> :
> : It could be a no-op if necessary, but it could take the place of Perl 5's
> : "study" operator and indicate that the programmer WANTS the pattern
> : optimized for maximum runtime speed, even at the cost of compile time or
> : memory. (Hey, how about a :cram modifier for extreme optimization? :-)
>
> Well, "studied" isn't really a property of a pattern--it's a property of
> a string that knows it will have multiple patterns matched against it.
> One could put a :study on the first pattern, but that's somewhat
> deceptive.

Oh yeah. I forgot it applied to the string, not the pattern! I forgot since I never use it! :-) Still, it could be considered a parallel...

Perhaps a better approach would be to allow the optimization priorities to be specified, perhaps even as numerical ranges for relative importance? The three obvious dimensions to quantify would be compile time, run-time speed, and memory usage. There are often tradeoffs between these three, and giving the programmer the ability to specify his/her preferences could allow for aggressive optimizations that are normally inappropriate...

Of course, these would be useful not only as modifiers for compiling any regexes, but as general pragmas controlling optimizing behavior of the entire Perl 6 compiler/optimizer...

I'm not sure if it's good enough to just say "optimize for run-time speed at the expense of compile time and memory" (or variations with only one of the two sacrificed) -- or if it's better to have a scale (say, 0-9) for how important each dimension is. For the extreme case where long compile time and large memory usage are irrelevant, but extreme run-time speed is a must, the programmer might specify optimization priorities of compile=0, memory=0, speed=9. I'm not sure what sort of syntax would be appropriate for such specifications...

> : (2) Would simple alternation impede DFA-style optimizations?
> :
> : Currently, a pattern like (Jun|June) would never match "June" because
> : the leftmost match "Jun" would always take precedence, despite the
> : normal longest-match behavior of regexes in general. This example
> : could be implemented in a DFA; would that always be the case?
>
> Well, "June" can match if what follows fails to match after "Jun".

True enough. Couldn't that still be implemented in a DFA? (Possibly at the cost of doubling the size of the DFA for the later part of the regex!)

> : Would it be better for the matching of (Jun|June) to be undefined and
> : implementation-dependent? Or is it best to require leftmost semantics?
>
> Well, the semantics shouldn't generally wobble around like that, but
> it'd be pretty easy to let them wobble on purpose via pragma (or via
> :modifier, which are really just pragmas that scope to regex groups).

Yeah, it's probably safer not to have that much room for undefined behavior, since people will just try it and assume that their implementation's behavior is the universal behavior...

Would there be a good way to say "don't care" about the longest-vs-leftmost matching semantics? Would it be worthwhile to have longest-trumps-leftmost as an optional modifier? (This might be very expensive if implemented in a backtracking engine, since it could no longer shortcut alternations...)

Dan suggested :dfa for DFA semantics -- is that the best answer, or would it be better to define the modifiers in terms of visible behavior rather than implementation, if possible?

Deven
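[The "longest-trumps-leftmost" semantics asked about above is easy to pin down for a literal alternation. A hypothetical sketch of the *semantics* only, not an engine implementation: try every branch at the current position and keep the longest one that matches, ignoring the order the branches were written in.]

```python
def longest_alt_match(alternatives, text, pos=0):
    """Longest-trumps-leftmost semantics for a literal alternation
    anchored at pos: every branch is tried and the longest match wins,
    regardless of written order. A backtracking engine cannot get this
    behavior by short-circuiting at the first branch that succeeds."""
    best = None
    for alt in alternatives:
        if text.startswith(alt, pos) and (best is None or len(alt) > len(best)):
            best = alt
    return best
```

Under these semantics ("Jun", "June") against "Junebug" yields "June", where leftmost semantics would yield "Jun"; against "Junk", only "Jun" can match, so both semantics agree.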
Re: Does ::: constrain the pattern engine implementation?
On Wed, Aug 28, 2002 at 12:55:44PM -0400, Deven T. Corzine wrote:
> On Wed, 28 Aug 2002, Dan Sugalski wrote:
> > At 10:57 AM -0400 8/28/02, Deven T. Corzine wrote:
>
> On the other hand, :, ::, ::: and commit don't necessarily need to be a
> problem if they can be treated as hints that can be ignored. If allowing
> the normal engine to backtrack despite the hints would change the
> results, that might be a problem. I don't know; cut may pose special
> problems.

They do change the semantics. (June|Jun) ebug matches the string "Junebug", but (June|Jun): ebug does not. Similarly,

    (June|Jun) ebug
    (June|Jun) :: ebug
    (June|Jun) ::: ebug
    (June|Jun) commit ebug

all behave differently when embedded in a larger grammar. However, it is very possible that in many (the majority?) of actual uses, they may be intended purely as optimizations, and so any differing behavior is unintentional. It may be worth having a flag noting that (maybe combined with a warning: "you said this :: isn't supposed to change what can match, but it does.")

> > That doesn't mean you can't write one for a specific subset of perl's
> > regexes, though. A medium-term goal for the regex engine is to note
> > where a DFA would give correct behaviour and use one, rather than
> > going through the more expensive generalized regex engine we'd
> > otherwise use.
>
> I think this is a more realistic goal, and more or less what I had in
> mind. I believe there are many subpatterns which might be beneficial to
> compile to a DFA (or DFA-like) form, where runtime performance is
> important. For example, if a pattern is matching dates, a
> (Jan|Feb|Mar|Apr|...) subpattern would be more efficient to implement
> as a DFA than with backtracking. With a large amount of data to
> process, that represents significant savings...

I agree. I will reply to this in perl6-internals, though.

> (2) Would simple alternation impede DFA-style optimizations?
>
> Currently, a pattern like (Jun|June) would never match "June" because
> the leftmost match "Jun" would always take precedence, despite the
> normal longest-match behavior of regexes in general. This example could
> be implemented in a DFA; would that always be the case?

You should read Friedl's _Mastering Regular Expressions_, if you haven't already. A POSIX NFA would be required to find the longest match (it has to work as if it were a DFA). A traditional NFA produces what would result from the straightforward backtracking implementation, which often gives an answer closer to what the user expects. IMO, these are often different, and the DFA would surprise users fairly often.

> Would it be better for the matching of (Jun|June) to be undefined and
> implementation-dependent? Or is it best to require leftmost semantics?

I'm voting leftmost, because I've frequently seen people depend on it. I'm not so sure that Larry's suggestion of adding a :dfa flag is always the right approach, because I think this might actually be something you'd want to set for subsets of a grammar or a single expression. I don't think it's useful enough to go as far as proposing that || mean "alternate without defining the order of preference", but perhaps some angle-bracketed thing would work. (Or can you embed flags in expressions, like perl5's (?imsx:R) thing? Then the :dfa flag is of course adequate!)
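[Python's re module uses the same traditional-NFA, leftmost-alternation semantics as Perl 5, so the behavior under discussion is easy to demonstrate; a POSIX leftmost-longest engine would return "June" in the first case instead.]

```python
import re

# Leftmost alternative wins, even though "June" is the longer match:
m = re.match(r'(Jun|June)', 'June')
print(m.group(1))                                 # prints: Jun

# But "June" can still match when what follows forces "Jun" to fail --
# the engine backtracks into the alternation and tries the next branch:
m = re.match(r'(Jun|June)bug', 'Junebug')
print(m.group(1))                                 # prints: June
```

The second case is precisely why a simple "pick the branch in written order and commit" strategy is not enough: correctness of the surrounding pattern can depend on revisiting the alternation.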
Re: Does ::: constrain the pattern engine implementation?
On Wed, 28 Aug 2002, Larry Wall wrote:
> That is a worthy consideration, but expressiveness takes precedence
> over it in this case.

I see nothing wrong with expressiveness taking precedence -- I'm only saying that it would be best to be cognizant of any restrictions we're hardcoding into the design (in case there's a less restrictive means to the same ends) and make that design tradeoff knowingly rather than by default. If we can find general solutions that don't demand a particular style of implementation, that's probably an improvement. There may be unavoidable cases, in which case we decide to accept the limitation for expressiveness, and that's a perfectly reasonable design choice. I'd just hate to ignore the issue now, and have someone later say "here's a great way it could have been done that would have allowed this improvement in the implementation"...

> DFAs are really only good for telling you *whether* and *where* a
> pattern matches as a whole. They are relatively useless for telling
> you *how* a pattern matches. For instance, a DFA can tell you that you
> have a valid computer program, but can't hand you back the syntax
> tree, because it has no way to decide between shifting and reducing.
> It has to do both simultaneously.

Yes and no. You're right, but see below.

> : It may be that backreferences already demand backtracking. Or some
> : other feature might. I don't know; I haven't thought it through.
>
> I believe you are correct that backrefs require backtracking. Maybe
> some smart person will find a way to trace back through the states by
> which a DFA matched to retrieve backref info, but that's probably
> worth a PhD or two.

Well, there are certainly PhD students out there doing new research all the time. Who knows what one will come up with one day? It would suck if one gets a PhD for a super-clever pattern-matching algorithm, and we find that we can't use it because of hardcoded assumptions in the language design... As for backtracing states of the DFA, see below.
> Minimal matching is also difficult in a DFA, I suspect.

Is it? I'm not sure. Since the DFA effectively follows all branches of the NFA at once, perhaps minimal matching is no more difficult than maximal? Then again, maybe not. :-)

> : If we must have backtracking, so be it. But if that's a tradeoff we're
> : making for more expressive and powerful patterns, we should still at
> : least make that tradeoff with our eyes open. And if the tradeoff can
> : be avoided, that's even better.
>
> I refer you to http:://history.org/grandmother/teach/egg/suck. :-)

Um, huh?

> That's a tradeoff I knowingly made in Perl 1. I saw that awk had a DFA
> implementation, and suffered for it as a useful tool.

I suspect it's not practical to have an all-DFA implementation with nearly the power and expressiveness of Perl 4 or Perl 5 regexes, much less Perl 6. On the other hand, many patterns have subpatterns which might benefit from using a DFA as an optimization. You don't lose the expressiveness here, if the backtracking NFA is available as well. I'm just asking that we consider the impact on such optimizations, and see if we can leave the door open to reap the benefits without compromising the power and expressiveness we all want. Maybe this just amounts to adding a few modifiers to allow semantic variants (like longest-trumps-leftmost), to enable optimizations that would otherwise impinge on correctness...

> And it's not just the backrefs. An NFA can easily incorporate
> strategies such as Boyer-Moore, which actually skips looking at many
> of the characters, or the "scream" algorithm used by study, which can
> skip even more. All the DFAs I've seen have to look at every
> character, albeit only once. I suppose it's theoretically possible to
> compile to a Boyer-Moore-ish state machine, but I've never seen it
> done.

Okay, I confess that I've been saying "DFA" when I don't necessarily mean precisely that. What I really mean is a non-backtracking state machine of some sort, but I'm calling it a DFA because it would be similar to one (to the degree possible) and people know what a DFA is. I could say "NBSM", but that seems confusing. :-)

Your objections to the limitations of a DFA are quite correct, of course. Modifications would be required to overcome the limits, and then it's no longer really a DFA, just like Perl 5's regular expressions are no longer really regular expressions in the mathematical sense. I'm envisioning a state machine of some sort, which has a lot in common with a DFA but isn't strictly a DFA anymore. If you prefer, I'll call it an NBSM, or I'm open to better suggestions for a name!

Anyway, to respond to your objections to a DFA:

* While you couldn't hand back a syntax tree from a true DFA, it should be possible to create an NBSM from a DFA recognizer, modified to record whatever extra information is needed to execute the code that constructs the syntax tree. The
Re: Does ::: constrain the pattern engine implementation?
On Wed, 28 Aug 2002, Steve Fink wrote:
> On Wed, Aug 28, 2002 at 12:55:44PM -0400, Deven T. Corzine wrote:
> > On the other hand, :, ::, ::: and commit don't necessarily need to be
> > a problem if they can be treated as hints that can be ignored. If
> > allowing the normal engine to backtrack despite the hints would
> > change the results, that might be a problem. I don't know; cut may
> > pose special problems.
>
> They do change the semantics. (June|Jun) ebug matches the string
> "Junebug", but (June|Jun): ebug does not. Similarly,
>
>     (June|Jun) ebug
>     (June|Jun) :: ebug
>     (June|Jun) ::: ebug
>     (June|Jun) commit ebug
>
> all behave differently when embedded in a larger grammar.

Good point. Okay, they definitely change the semantics. Still, could such semantics be implemented in a non-backtracking state machine, whether or not it's a strict DFA?

> However, it is very possible that in many (the majority?) of actual
> uses, they may be intended purely as optimizations and so any differing
> behavior is unintentional. It may be worth having a flag noting that
> (maybe combined with a warning: "you said this :: isn't supposed to
> change what can match, but it does.")

I think this is like the leftmost matching semantics -- it may exist for the sake of implementation efficiency, yet it has semantic consequences as well. In many cases, those semantic differences may be immaterial, yet some code relies on it. Allowing flags to specify that such differences in semantics are immaterial to your pattern would be helpful. (Would it make sense for one flag to say "don't care" to the semantic differences for BOTH leftmost matching and the :/::/:::/etc. operators?)

> > I believe there are many subpatterns which might be beneficial to
> > compile to a DFA (or DFA-like) form, where runtime performance is
> > important. For example, if a pattern is matching dates, a
> > (Jan|Feb|Mar|Apr|...) subpattern would be more efficient to implement
> > as a DFA than with backtracking. With a large amount of data to
> > process, that represents significant savings...
>
> I agree. I will reply to this in perl6-internals, though.

Yes, the discussion of details belongs there, when it's not infringing on issues of language design, as the semantic consequences do...

> > (2) Would simple alternation impede DFA-style optimizations?
> >
> > Currently, a pattern like (Jun|June) would never match "June" because
> > the leftmost match "Jun" would always take precedence, despite the
> > normal longest-match behavior of regexes in general. This example
> > could be implemented in a DFA; would that always be the case?
>
> You should read Friedl's _Mastering Regular Expressions_, if you
> haven't already. A POSIX NFA would be required to find the longest
> match (it has to work as if it were a DFA). A traditional NFA produces
> what would result from the straightforward backtracking
> implementation, which often gives an answer closer to what the user
> expects. IMO, these are often different, and the DFA would surprise
> users fairly often.

Rather than following a traditional approach to NFA/DFA construction, would it be possible to use a modified approach that preserves leftmost matching? (If so, would it be more expensive, or just different?)

> > Would it be better for the matching of (Jun|June) to be undefined and
> > implementation-dependent? Or is it best to require leftmost
> > semantics?
>
> I'm voting leftmost, because I've frequently seen people depend on it.

I was going to agree, until I read your next paragraph. However, it would be useful to be able to say "don't care" to the semantic distinction -- it might even be useful to be able to demand longest-match take precedence over leftmost matching, but that would incur an extra cost in the normal regex engine...

> I'm not so sure that Larry's suggestion of adding a :dfa flag is always
> the right approach, because I think this might actually be something
> you'd want to set for subsets of a grammar or a single expression. I
> don't think it's useful enough to go as far as proposing that || mean
> "alternate without defining the order of preference", but perhaps some
> angle-bracketed thing would work. (Or can you embed flags in
> expressions, like perl5's (?imsx:R) thing? Then the :dfa flag is of
> course adequate!)

You know, when you bring up an idea like ||, I start thinking that maybe the default should be NOT to have a preference (since it normally doesn't matter) and to only guarantee the leftmost short-circuit behavior with || instead of |. That would allow for more implementation flexibility, and provide a beautiful parallel with C -- in C, only || short-circuits, and the | operator still evaluates all parts. (Granted, that's because it's bitwise, but there's still a nice parallel there.) For the few cases where someone wants to rely on leftmost matching of any alternation, they could simply use ||