Attached is an automaton, as usual, for \bboy\b (don't ask, I got it from a Mozilla test). These things are becoming increasingly complicated, so allow me to explain a bit.
The initial choice node and character classes make up the lookbehind. This regexp needs to know whether it follows a word or a non-word character, but if it is simply started one character early, the lookbehind node will consume that character and continue to the right place.

WI means that a node needs to know whether or not it follows a word character. DW means that a node is responsible for passing information forward about whether the last character it consumed was a word character. DDW means that it does so -- after the full analysis, DDW iff DW. FW contains information about whether a node follows a word character: 1 means that it does, 0 means that it doesn't, and no FW means that it doesn't know. Ignore IW, it is used for bookkeeping (well, technically this is all bookkeeping but you know what I mean).

Some of these nodes could be simplified, for instance the 'boy' text node on the left, which has FW=1. It must only match on a word boundary, but it starts with a word character and follows a word character -- in other words it must always fail. What will happen is that if a node has an FW, it will be factored into the node's start set: if the node follows a \w, the start set will be intersected with \W, which in this case makes it empty.

On Wed, Dec 10, 2008 at 4:28 PM, <[EMAIL PROTECTED]> wrote:
> Reviewers: Lasse Reichstein,
>
> Description:
> - Added lookbehind propagation for the initial node; now, if the
>   initial node is interested in what precedes it the automaton is
>   given an initial all-consuming character class that determines it.
> - Added verification of some node information invariants. We now
>   check that if a node expresses interest in what precedes it that
>   information is available to it after assertion expansion.
>
> Please review this at http://codereview.chromium.org/13343
>
> Affected files:
>   M src/globals.h
>   M src/jsregexp.h
>   M src/jsregexp.cc
>   M src/parser.h
>   M src/parser.cc
>   M test/cctest/test-regexp.cc
<<inline: graph.svg>>
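In case the start-set business is easier to follow as code, here is a rough sketch of the idea. It is Python rather than the actual C++ in jsregexp.cc, and every name in it (WORD, boundary_start_set, follows_word_at, boy_candidates) is made up for illustration. It assumes an ASCII alphabet, models FW as True / False / None, and only handles the leading \b of \bboy\b.

```python
import string

# Hypothetical ASCII model of \w and \W; not taken from the V8 sources.
WORD = frozenset(string.ascii_letters + string.digits + "_")
ALPHABET = frozenset(chr(c) for c in range(128))
NON_WORD = ALPHABET - WORD


def boundary_start_set(first_chars, follows_word):
    """Start set of a node that must begin on a word boundary.

    follows_word plays the role of FW: True (follows a word character),
    False (doesn't), or None (doesn't know).
    """
    first_chars = frozenset(first_chars)
    if follows_word is None:
        return first_chars  # no FW, so nothing to factor in
    # A boundary needs exactly one of (previous, next) to be a word character.
    allowed_next = NON_WORD if follows_word else WORD
    return first_chars & allowed_next


def follows_word_at(text, pos):
    """Look one character back, as the initial lookbehind class would."""
    if pos == 0:
        return False  # start of input is treated like a non-word character
    return text[pos - 1] in WORD


def boy_candidates(text):
    """Positions where a 'boy' text node behind a leading boundary can match."""
    hits = []
    for pos in range(len(text)):
        start_set = boundary_start_set("b", follows_word_at(text, pos))
        if text[pos] in start_set and text.startswith("boy", pos):
            hits.append(pos)
    return hits


print(boundary_start_set("b", True))     # frozenset(): the node can never match
print(boy_candidates("boy cowboy boy"))  # [0, 11]
```

The first print is the FW=1 'boy' node from the graph: {b} intersected with \W is empty, so the node always fails and could be dropped. The 'boy' inside 'cowboy' is rejected for the same reason.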
