On Mon, Feb 2, 2009 at 10:22 PM,  <[email protected]> wrote:
> Excellent stuff.  I assume there are no regressions of performance or
> correctness and that it lints.
>
> Does the not at start flag need to be on all RegExpNodes or could we
> just put it on the ChoiceNodes and their children?
>
>
> http://codereview.chromium.org/19539/diff/1/2
> File src/ast.cc (right):
>
> http://codereview.chromium.org/19539/diff/1/2#newcode257
> Line 257: if (node->max_match() > 0) { return false; }
> It's unlikely to make any difference, but this should probably be
> min_match.  I think there should be a comment explaining why it makes
> any sense at all to look at nodes other than the first one (the first
> ones could be lookaheads or other zero-length nodes followed by a ^).
>
> http://codereview.chromium.org/19539/diff/1/5
> File src/jsregexp.cc (right):
>
> http://codereview.chromium.org/19539/diff/1/5#newcode3156
> Line 3156: bool all_omitted = true;
> I think you can omit this if you implement the suggestions below.
>
> http://codereview.chromium.org/19539/diff/1/5#newcode3196
> Line 3196: } else if (alt_gen->quick_check_details.cannot_match()) {
> You can emit the goto fail here if i == choice_count - 1
>
> http://codereview.chromium.org/19539/diff/1/5#newcode3971
> Line 3971: 0, new_max, is_greedy, body, compiler, on_success,
> not_at_start);
> not_at_start can be true here since we know that min > 0 and body can't
> be empty.
>
> http://codereview.chromium.org/19539/diff/1/5#newcode4839
> Line 4839: // Unroll once, and tell the loop that it's not at the start
> of the input.
> we just told the loop it wasn't at the start
>
> http://codereview.chromium.org/19539/diff/1/6
> File src/jsregexp.h (right):
>
> http://codereview.chromium.org/19539/diff/1/6#newcode1199
> Line 1199: cp_offset_ == 0 &&
> A trace is not trivial unless at_start_ is UNKNOWN.
>
> http://codereview.chromium.org/19539/diff/1/7
> File src/parser.cc (right):
>
> http://codereview.chromium.org/19539/diff/1/7#newcode3611
> Line 3611: new RegExpAssertion(RegExpAssertion::START_OF_LINE));
> I think you should set_contains_anchor here too.  The code generation
> for START_OF_LINE can benefit from knowing whether or not we can be at
> the start too.
>
> http://codereview.chromium.org/19539
>

Extra thoughts:

Perhaps EatsAtLeast should be able to ignore alternatives that can't
match due to a ^.  This way something like (^f|.bar|.baz) knows that
it can preload 2 or 4 characters and look for the 'ba' sequence.

If you have a text node with non-ASCII characters and the input string
is known to be ASCII then you can also mark that quick check as
impossible.


-- 
Erik Corry, Software Engineer
Google Denmark ApS.  CVR nr. 28 86 69 84
c/o Philip & Partners, 7 Vognmagergade, P.O. Box 2227, DK-1018
Copenhagen K, Denmark.

--~--~---------~--~----~------------~-------~--~----~
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
-~----------~----~----~----~------~----~------~--~---

Reply via email to