RE: Regular and Context-Free languages
Steve Find said on August 09, 2002 6:24 PM: Anyone happen to know where pushdown automata fit in this list? Can they handle context-sensitive, just context-free, or some other subset? Mark Reed said on August 09, 2002 7:60 PM: To recognize a context-sensitive language I think you need a Turing machine. I'm not aware of anything intermediate in power between a PDA and a TM. A regular expression is equivalent to an FSM (Finite State Machine). A context-free grammar is equivalent to a PDA (Push-down Automata -- i.e., an FSM with a stack) A context-sensitive grammars is equivalent to an LBA (Linear Bounded Automata -- i.e., a Turing machine restricted to a given finite length of tape) An unrestricted grammar is equivalent to a TM (Turing Machine). [info from _Introduction_to_Automata_Theory,_Languages,_and_Computation_, John E. Hopcroft and Jeffrey D. Ullman]
Re: Regular and Context-Free languages
Wow. Since you went to the trouble of writing all this up, it really ought to go in a FAQ somewhere. On Thu, Aug 08, 2002 at 12:05:00AM -0400, Mark J. Reed wrote: Finite state machines can match regular expressions whose only operations are closure (*), alternation (|), and grouping. Some of the other things Don't forget optional subexpressions (?). Not sure what the official name is. They can also handle negation, but nobody ever seems to put that into the regex syntax. Probably because the exact semantics get fuzzy when you're not anchored at the beginning and end, and any particular semantics you might pick are deceptive and/or useless. You can turn an nondeterministic FA into a deterministic one (a DFA) by a fairly straightforward algorithm. Basically, each state in the DFA corresponds to a set of all the possible states the NFA *could* be in at a given point in a string. So the number of states in a DFA built from an NFA can be huge; but the machine never has to backtrack and try again because it knows exactly which transition to make at each step. So in general, NFAs take less memory but run slower, while DFAs run faster but take more memory. And if you implement the NFA the way you're implying (as a traditional NFA), you'll always get the same answer to whether or not the expression matches, but assuming the answer is yes the NFA and DFA might match different parts of the string. Which doesn't contradict anything you're saying, but it's good to be aware that the theoretical purpose for these things (deciding whether a given string matches an expression or not) is not the same as the common practical purpose (finding the exact substring within a given string that matches the expression). (Historically, the grep program has used an NFA, while egrep has used a DFA.) That sounds backwards. You should probably also mention somewhere that once you break free of the bounds of regularity, it's much easier to implement many features and irregular constructs with an NFA rather than a DFA. Like capturing parentheses, alternation that prefers the choice the expression gives first, etc. A step up from regular languages are context-free languages, which can count. C-F languages can be recognized by a variety of different mechanisms, but not by FSMs. Anyone happen to know where pushdown automata fit in this list? Can they handle context-sensitive, just context-free, or some other subset?
Re: Regular and Context-Free languages
On Fri, Aug 09, 2002 at 05:23:58PM -0700, Steve Fink wrote: Wow. Since you went to the trouble of writing all this up, it really ought to go in a FAQ somewhere. It probably already is in a FAQ somewhere; I just don't know where. :) On Thu, Aug 08, 2002 at 12:05:00AM -0400, Mark J. Reed wrote: Finite state machines can match regular expressions whose only operations are closure (*), alternation (|), and grouping. Some of the other things Don't forget optional subexpressions (?). Not sure what the official name is. Don't need 'em. a? === (a|) (a or nothing). They can also handle negation, but nobody ever seems to put that into the regex syntax. Probably because the exact semantics get fuzzy when you're not anchored at the beginning and end, and any particular semantics you might pick are deceptive and/or useless. Hm. Not sure about negation. And if you implement the NFA the way you're implying (as a traditional NFA), you'll always get the same answer to whether or not the expression matches, but assuming the answer is yes the NFA and DFA might match different parts of the string. Which doesn't contradict anything you're saying, but it's good to be aware that the theoretical purpose for these things (deciding whether a given string matches an expression or not) is not the same as the common practical purpose (finding the exact substring within a given string that matches the expression). True. (Historically, the grep program has used an NFA, while egrep has used a DFA.) That sounds backwards. It's not. It was the example our prof used, and it's also that way in table 4-1 of _Mastering_Regular_Expressions_. You should probably also mention somewhere that once you break free of the bounds of regularity, it's much easier to implement many features and irregular constructs with an NFA rather than a DFA. Like capturing parentheses, alternation that prefers the choice the expression gives first, etc. True. Although with a DFA there's not much reason to add preferential alternation since it matches just as fast no matter which alternate matches. Anyone happen to know where pushdown automata fit in this list? Can they handle context-sensitive, just context-free, or some other subset? The set of languages recognizable by pushdown automata (PDAs) is exactly the set of context-free languages. To recognize a context-sensitive language I think you need a Turing machine. I'm not aware of anything intermediate in power between a PDA and a TM. -- Mark REED| CNN Internet Technology 1 CNN Center Rm SW0831G | [EMAIL PROTECTED] Atlanta, GA 30348 USA | +1 404 827 4754 -- Help, Mr. Wizard! -- Tennessee Tuxedo
Re: Regular and Context-Free languages
On Fri, Aug 09, 2002 at 09:50:00PM -0400, Mark J. Reed wrote: On Fri, Aug 09, 2002 at 05:23:58PM -0700, Steve Fink wrote: On Thu, Aug 08, 2002 at 12:05:00AM -0400, Mark J. Reed wrote: Finite state machines can match regular expressions whose only operations are closure (*), alternation (|), and grouping. Some of the other things Don't forget optional subexpressions (?). Not sure what the official name is. Don't need 'em. a? === (a|) (a or nothing). Duh. Right. They can also handle negation, but nobody ever seems to put that into the regex syntax. Probably because the exact semantics get fuzzy when you're not anchored at the beginning and end, and any particular semantics you might pick are deceptive and/or useless. Hm. Not sure about negation. Take a DFA representing a regular expression. Make all nonaccepting states accepting, and all accepting states nonaccepting. Which is really silly if you're not anchoring the expression at both ends: the negation of the pattern hello, when given the input string hello, will successfully match h. Or if you let it run off the end of the input string and keep the last encountered accepting state, as is often done to get the longest match, it will match hell. (Historically, the grep program has used an NFA, while egrep has used a DFA.) That sounds backwards. It's not. It was the example our prof used, and it's also that way in table 4-1 of _Mastering_Regular_Expressions_. Hmm, you're right. But my version of egrep has backreferences. I'm confused. You should probably also mention somewhere that once you break free of the bounds of regularity, it's much easier to implement many features and irregular constructs with an NFA rather than a DFA. Like capturing parentheses, alternation that prefers the choice the expression gives first, etc. True. Although with a DFA there's not much reason to add preferential alternation since it matches just as fast no matter which alternate matches. Right, but it matters when you're also returning where in the string the pattern matched. (That's why I said features in addition to irregular constructs, because order of alternation has nothing to do with the binary decision of whether to match.) And, of course, when you're doing sick perlish things like embedded code segments...