RE: Regular and Context-Free languages

2002-08-12 Thread Thom Boyer

Steve Find said on August 09, 2002 6:24 PM:
Anyone happen to know where pushdown automata fit in this list? Can
they handle context-sensitive, just context-free, or some other
subset?

Mark Reed said on August 09, 2002 7:60 PM:
To recognize a context-sensitive language I think you need a Turing
machine.  I'm not aware of anything intermediate in power between
a PDA and a TM.

A regular expression is equivalent to an FSM 
  (Finite State Machine).
A context-free grammar is equivalent to a PDA 
  (Push-down Automata -- i.e., an FSM with a stack)
A context-sensitive grammars is equivalent to an LBA 
  (Linear Bounded Automata -- i.e., a Turing machine restricted 
  to a given finite length of tape)
An unrestricted grammar is equivalent to a TM 
  (Turing Machine).

[info from _Introduction_to_Automata_Theory,_Languages,_and_Computation_,
John E. Hopcroft and Jeffrey D. Ullman]



Re: Regular and Context-Free languages

2002-08-09 Thread Steve Fink

Wow. Since you went to the trouble of writing all this up, it really
ought to go in a FAQ somewhere.

On Thu, Aug 08, 2002 at 12:05:00AM -0400, Mark J. Reed wrote:
 Finite state machines can match regular expressions whose only operations
 are closure (*), alternation (|), and grouping.  Some of the other things

Don't forget optional subexpressions (?). Not sure what the official
name is.

They can also handle negation, but nobody ever seems to put that into
the regex syntax. Probably because the exact semantics get fuzzy when
you're not anchored at the beginning and end, and any particular
semantics you might pick are deceptive and/or useless.

 You can turn an nondeterministic FA into a deterministic one (a DFA)
 by a fairly straightforward algorithm.  Basically, each state in the
 DFA corresponds to a set of all the possible states the NFA *could*
 be in at a given point in a string.  So the number of states in
 a DFA built from an NFA can be huge; but the machine never has to
 backtrack and try again because it knows exactly which transition
 to make at each step.  So in general, NFAs take less memory but
 run slower, while DFAs run faster but take more memory.  

And if you implement the NFA the way you're implying (as a traditional
NFA), you'll always get the same answer to whether or not the
expression matches, but assuming the answer is yes the NFA and DFA
might match different parts of the string. Which doesn't contradict
anything you're saying, but it's good to be aware that the theoretical
purpose for these things (deciding whether a given string matches an
expression or not) is not the same as the common practical purpose
(finding the exact substring within a given string that matches the
expression).

 (Historically, the grep program has used an NFA, while egrep has
 used a DFA.)

That sounds backwards.

You should probably also mention somewhere that once you break free of
the bounds of regularity, it's much easier to implement many features
and irregular constructs with an NFA rather than a DFA. Like capturing
parentheses, alternation that prefers the choice the expression gives
first, etc.

 A step up from regular languages  are context-free languages, which can
 count. C-F languages can be recognized by a variety of different
 mechanisms, but not by FSMs.

Anyone happen to know where pushdown automata fit in this list? Can
they handle context-sensitive, just context-free, or some other
subset?



Re: Regular and Context-Free languages

2002-08-09 Thread Mark J. Reed

On Fri, Aug 09, 2002 at 05:23:58PM -0700, Steve Fink wrote:
 Wow. Since you went to the trouble of writing all this up, it really
 ought to go in a FAQ somewhere.
It probably already is in a FAQ somewhere; I just don't know where. :)

 On Thu, Aug 08, 2002 at 12:05:00AM -0400, Mark J. Reed wrote:
  Finite state machines can match regular expressions whose only operations
  are closure (*), alternation (|), and grouping.  Some of the other things
 
 Don't forget optional subexpressions (?). Not sure what the official
 name is.
Don't need 'em.  a? === (a|) (a or nothing).

 They can also handle negation, but nobody ever seems to put that into
 the regex syntax. Probably because the exact semantics get fuzzy when
 you're not anchored at the beginning and end, and any particular
 semantics you might pick are deceptive and/or useless.
Hm.  Not sure about negation.

 And if you implement the NFA the way you're implying (as a traditional
 NFA), you'll always get the same answer to whether or not the
 expression matches, but assuming the answer is yes the NFA and DFA
 might match different parts of the string. Which doesn't contradict
 anything you're saying, but it's good to be aware that the theoretical
 purpose for these things (deciding whether a given string matches an
 expression or not) is not the same as the common practical purpose
 (finding the exact substring within a given string that matches the
 expression).
True.

  (Historically, the grep program has used an NFA, while egrep has
  used a DFA.)
 
 That sounds backwards.
It's not.  It was the example our prof used, and it's also that way
in table 4-1 of _Mastering_Regular_Expressions_.

 You should probably also mention somewhere that once you break free of
 the bounds of regularity, it's much easier to implement many features
 and irregular constructs with an NFA rather than a DFA. Like capturing
 parentheses, alternation that prefers the choice the expression gives
 first, etc.
True.   Although with a DFA there's not much reason to add preferential
alternation since it matches just as fast no matter which alternate
matches.

 Anyone happen to know where pushdown automata fit in this list? Can
 they handle context-sensitive, just context-free, or some other
 subset?
The set of languages recognizable by pushdown automata (PDAs) is exactly
the set of context-free languages.   

To recognize a context-sensitive language I think you need a Turing
machine.  I'm not aware of anything intermediate in power between
a PDA and a TM.

-- 
Mark REED| CNN Internet Technology
1 CNN Center Rm SW0831G  | [EMAIL PROTECTED]
Atlanta, GA 30348  USA   | +1 404 827 4754 
--
Help, Mr. Wizard!
-- Tennessee Tuxedo



Re: Regular and Context-Free languages

2002-08-09 Thread Steve Fink

On Fri, Aug 09, 2002 at 09:50:00PM -0400, Mark J. Reed wrote:
 On Fri, Aug 09, 2002 at 05:23:58PM -0700, Steve Fink wrote:
  On Thu, Aug 08, 2002 at 12:05:00AM -0400, Mark J. Reed wrote:
   Finite state machines can match regular expressions whose only operations
   are closure (*), alternation (|), and grouping.  Some of the other things
  
  Don't forget optional subexpressions (?). Not sure what the official
  name is.

 Don't need 'em.  a? === (a|) (a or nothing).

Duh. Right.

  They can also handle negation, but nobody ever seems to put that into
  the regex syntax. Probably because the exact semantics get fuzzy when
  you're not anchored at the beginning and end, and any particular
  semantics you might pick are deceptive and/or useless.

 Hm.  Not sure about negation.

Take a DFA representing a regular expression. Make all nonaccepting
states accepting, and all accepting states nonaccepting.

Which is really silly if you're not anchoring the expression at both
ends: the negation of the pattern hello, when given the input string
hello, will successfully match h. Or if you let it run off the end
of the input string and keep the last encountered accepting state, as
is often done to get the longest match, it will match hell.

   (Historically, the grep program has used an NFA, while egrep has
   used a DFA.)
  
  That sounds backwards.

 It's not.  It was the example our prof used, and it's also that way
 in table 4-1 of _Mastering_Regular_Expressions_.

Hmm, you're right. But my version of egrep has backreferences. I'm
confused.

  You should probably also mention somewhere that once you break free of
  the bounds of regularity, it's much easier to implement many features
  and irregular constructs with an NFA rather than a DFA. Like capturing
  parentheses, alternation that prefers the choice the expression gives
  first, etc.

 True.   Although with a DFA there's not much reason to add preferential
 alternation since it matches just as fast no matter which alternate
 matches.

Right, but it matters when you're also returning where in the string
the pattern matched. (That's why I said features in addition to
irregular constructs, because order of alternation has nothing to do
with the binary decision of whether to match.)

And, of course, when you're doing sick perlish things like embedded
code segments...