On Fri, Aug 09, 2002 at 09:50:00PM -0400, Mark J. Reed wrote:
> On Fri, Aug 09, 2002 at 05:23:58PM -0700, Steve Fink wrote:
> > On Thu, Aug 08, 2002 at 12:05:00AM -0400, Mark J. Reed wrote:
> > > Finite state machines can match regular expressions whose only operations
> > > are closure (*), alternation (|), and grouping.  Some of the other things
> > 
> > Don't forget optional subexpressions (?). Not sure what the official
> > name is.
>
> Don't need 'em.  a? === (a|) (a or nothing).

Duh. Right.

> > They can also handle negation, but nobody ever seems to put that into
> > the regex syntax. Probably because the exact semantics get fuzzy when
> > you're not anchored at the beginning and end, and any particular
> > semantics you might pick are deceptive and/or useless.
>
> Hm.  Not sure about negation.

Take a DFA representing a regular expression. Make all nonaccepting
states accepting, and all accepting states nonaccepting.

Which is really silly if you're not anchoring the expression at both
ends: the negation of the pattern "hello", when given the input string
"hello", will successfully match "h". Or if you let it run off the end
of the input string and keep the last encountered accepting state, as
is often done to get the longest match, it will match "hell".

> > > (Historically, the grep program has used an NFA, while egrep has
> > > used a DFA.)
> > 
> > That sounds backwards.
>
> It's not.  It was the example our prof used, and it's also that way
> in table 4-1 of _Mastering_Regular_Expressions_.

Hmm, you're right. But my version of egrep has backreferences. I'm
confused.

> > You should probably also mention somewhere that once you break free of
> > the bounds of regularity, it's much easier to implement many features
> > and irregular constructs with an NFA rather than a DFA. Like capturing
> > parentheses, alternation that prefers the choice the expression gives
> > first, etc.
>
> True.   Although with a DFA there's not much reason to add preferential
> alternation since it matches just as fast no matter which alternate
> matches.

Right, but it matters when you're also returning where in the string
the pattern matched. (That's why I said "features" in addition to
"irregular constructs", because order of alternation has nothing to do
with the binary decision of whether to match.)

And, of course, when you're doing sick perlish things like embedded
code segments...

Reply via email to