Attached is an automaton, as usual, for \bboy\b (don't ask, I got it
from a Mozilla test).  These things are becoming increasingly
complicated, so allow me to explain a bit.

The initial choice node and character classes make up the lookbehind.
This regexp needs to know whether it follows a word or non-word
character, but if matching is simply started one character early, the
lookbehind node will consume that character and continue to the right
place.
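
To illustrate the idea (this is only a rough Python sketch, not the
V8 code; find_boy and WORD_CHARS are names I made up for the example),
here is \bboy\b matched by looking at the character one position early
and remembering whether it was a word character, instead of doing a
real lookbehind:

```python
import string

# ASCII word characters, i.e. what \w matches in a JS regexp.
WORD_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def find_boy(text):
    """Return the start indices where \\bboy\\b would match in text."""
    hits = []
    for start in range(len(text)):
        # "Start one character early": the character before the match
        # plays the role of the one consumed by the initial class.
        # At the very start of the input there is nothing to consume,
        # which counts as "does not follow a word character".
        follows_word = start > 0 and text[start - 1] in WORD_CHARS

        # 'b' is a word character, so the leading \b can only hold if
        # the previous character was not a word character.
        if follows_word:
            continue
        if not text.startswith("boy", start):
            continue

        # Trailing \b: 'y' is a word character, so the next character
        # (if any) must not be one.
        end = start + 3
        if end < len(text) and text[end] in WORD_CHARS:
            continue
        hits.append(start)
    return hits


print(find_boy("boy oh boy, cowboy boys"))  # [0, 7]
```

The "cowboy" and "boys" occurrences are rejected by the leading and
trailing boundary checks respectively.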

WI means that a node needs to know whether or not it follows a word
character.  DW means that a node is responsible for passing
information forward about whether the last character it consumed was a
word character.  DDW means that it actually does so -- after the full
analysis DDW iff DW.  FW contains information about whether a node
follows a word character; 1 means that it does, 0 means that it
doesn't and no FW means that it doesn't know.  Ignore IW, it is used
for bookkeeping (well, technically this is all bookkeeping but you
know what I mean).
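
If it helps to see the flags as data, here is a small Python sketch.
The NodeInfo class and check_determines_word function are hypothetical
names for illustration only, and the real node info in jsregexp.h is
laid out differently; the only invariant checked is the "DDW iff DW"
one mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeInfo:
    """Hypothetical per-node word-boundary flags (illustration only)."""
    wi: bool = False           # WI: wants to know if it follows a word char
    dw: bool = False           # DW: responsible for passing that info forward
    ddw: bool = False          # DDW: actually does pass it forward
    fw: Optional[bool] = None  # FW: follows a word char; None = unknown


def check_determines_word(info):
    """After the full analysis, DDW should hold exactly when DW does."""
    return info.dw == info.ddw


ok = NodeInfo(wi=True, dw=True, ddw=True, fw=True)
bad = NodeInfo(dw=True, ddw=False)
print(check_determines_word(ok), check_determines_word(bad))  # True False
```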

Some of these nodes could be simplified, for instance the 'boy' text
node on the left which has FW=1.  It must only match on a word
boundary, but it starts with a word character and follows a word
character -- in other words it must always fail.  What will happen is
that if a node has an FW, it will be factored into the start set of
that node; so if this node follows a \w, its start set will be
intersected with \W, which in this case will make it empty.
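
A rough Python sketch of that start-set intersection, for a node that
must match on a word boundary.  Everything here (boundary_start_set,
WORD, representing a start set as a plain set of characters) is an
assumption for the sake of the example and not how the automaton
actually stores things:

```python
import string

# ASCII word characters (\w); \W is everything else.
WORD = frozenset(string.ascii_letters + string.digits + "_")


def boundary_start_set(first_chars, follows_word):
    """Narrow the start set of a node that must sit on a word boundary.

    first_chars:  characters the node could start with
    follows_word: True, False, or None if the node doesn't know (no FW)
    """
    if follows_word is None:
        return set(first_chars)       # nothing known, nothing to factor in
    if follows_word:
        return set(first_chars) - WORD  # intersect with \W
    return set(first_chars) & WORD      # intersect with \w


# The 'boy' node starts with 'b'.  Following a word character, its
# start set becomes empty, so the node can never match.
print(boundary_start_set({"b"}, True))   # set()
print(boundary_start_set({"b"}, False))  # {'b'}
```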

On Wed, Dec 10, 2008 at 4:28 PM,  <[EMAIL PROTECTED]> wrote:
> Reviewers: Lasse Reichstein,
>
> Description:
> - Added lookbehind propagation for the initial node; now, if the
>  initial node is interested in what precedes it the automaton is
>  given an initial all-consuming character class that determines it.
> - Added verification of some node information invariants.  We now
>  check that if a node expresses interest in what precedes it that
>  information is available to it after assertion expansion.
>
> Please review this at http://codereview.chromium.org/13343
>
> Affected files:
>  M src/globals.h
>  M src/jsregexp.h
>  M src/jsregexp.cc
>  M src/parser.h
>  M src/parser.cc
>  M test/cctest/test-regexp.cc
>
>
>

--~--~---------~--~----~------------~-------~--~----~
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
-~----------~----~----~----~------~----~------~--~---

<<inline: graph.svg>>
