On Mon, Feb 2, 2009 at 10:22 PM, <[email protected]> wrote: > Excellent stuff. I assume there are no regressions of performance or > correctness and that it lints. > > Does the not at start flag need to be on all RegExpNodes or could we > just put it on the ChoiceNodes and their children? > > > http://codereview.chromium.org/19539/diff/1/2 > File src/ast.cc (right): > > http://codereview.chromium.org/19539/diff/1/2#newcode257 > Line 257: if (node->max_match() > 0) { return false; } > It's unlikely to make any difference, but this should probably be > min_match. I think there should be a comment explaining why it makes > any sense at all to look at nodes other than the first one (the first > ones could be lookaheads or other zero-length nodes followed by a ^). > > http://codereview.chromium.org/19539/diff/1/5 > File src/jsregexp.cc (right): > > http://codereview.chromium.org/19539/diff/1/5#newcode3156 > Line 3156: bool all_omitted = true; > I think you can omit this if you implement the suggestions below. > > http://codereview.chromium.org/19539/diff/1/5#newcode3196 > Line 3196: } else if (alt_gen->quick_check_details.cannot_match()) { > You can emit the goto fail here if i == choice_count - 1 > > http://codereview.chromium.org/19539/diff/1/5#newcode3971 > Line 3971: 0, new_max, is_greedy, body, compiler, on_success, > not_at_start); > not_at_start can be true here since we know that min > 0 and body can't > be empty. > > http://codereview.chromium.org/19539/diff/1/5#newcode4839 > Line 4839: // Unroll once, and tell the loop that it's not at the start > of the input. > we just told the loop it wasn't at the start > > http://codereview.chromium.org/19539/diff/1/6 > File src/jsregexp.h (right): > > http://codereview.chromium.org/19539/diff/1/6#newcode1199 > Line 1199: cp_offset_ == 0 && > A trace is not trivial unless at_start_ is UNKNOWN. > > http://codereview.chromium.org/19539/diff/1/7 > File src/parser.cc (right): > > http://codereview.chromium.org/19539/diff/1/7#newcode3611 > Line 3611: new RegExpAssertion(RegExpAssertion::START_OF_LINE)); > I think you should set_contains_anchor here too. The code generation > for START_OF_LINE can benefit from knowing whether or not we can be at > the start too. > > http://codereview.chromium.org/19539 >
Extra thoughts: Perhaps EatsAtLeast should be able to ignore alternatives that can't match due to a ^. This way something like (^f|.bar|.baz) knows that it can preload 2 or 4 characters and look for the 'ba' sequence. If you have a text node with non-ASCII characters and the input string is known to be ASCII then you can also mark that quick check as impossible. -- Erik Corry, Software Engineer Google Denmark ApS. CVR nr. 28 86 69 84 c/o Philip & Partners, 7 Vognmagergade, P.O. Box 2227, DK-1018 Copenhagen K, Denmark. --~--~---------~--~----~------------~-------~--~----~ v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev -~----------~----~----~----~------~----~------~--~---
