Hi Albert, Nice to read your answer:) Perhaps I shall read some introduction material on compiler/regular expression parser design, is there any recommendation?
Best regards, Davy On Aug 21, 4:18 pm, "A.T.Hofkamp" <[EMAIL PROTECTED]> wrote: > Davy wrote: > > Hi David, > > > Thank you for the generous help :) > > As far as I can see, Lex and Yacc do the same thing on different > > abstraction layer. > > That is, Lex rule is regular expression of character, while Yacc rule > > (if we use EBNF) is regular expression of token. The only difference > > is Yacc rule embrace recursion. > > True, if you forget about the white space that you silently throw away. > > Below I wrote some background information with some terms you can use as > starting point for searching more information. > > The lack of recursion in lex makes it possible to generate a DFA > (deterministic finite automaton) from the regular expression(s), which allows > for *very fast* processing of the characters. > > The yacc parser on the other hand is slower, and more powerful than regular > expressions. > > (Note that in compiler theory, 'regular expressions with recursion' don't > exist, such grammars fall in the Chomsky hierarchy of contex-free languages. > For such languages, you need to have a stack besides the automaton.) > > This additional power allows you to recognize nested structures correctly, for > example to find the matching closing brakcet in an expression. > > The LALR(1) algorithm used by yacc is a compromise between maximal recognition > power (enough for most languages), and not too many states (although that is > of less importance nowadays). > > > I guess Lex parser is only a special case of Yacc parser (Lex output > > is list but not recursively structure like tree). So why not merge > > them into one tool? > > Speed and simplicity mostly. The current division of labour means you can use > optimized algorithms for each part while reducing overall complexity. > > While writing yacc rules, you don't have to think about the representation of > an integer number (was it '12345' or '0x3039' or '030071' or > 'B11000000111001'?). > > None the less, parsers that combine both phases do exist. One example is > 'sglr' (a C-based solution, I am not aware of any Python-based solutions). It > uses much more powerful (and much more resource-eating) algorithms to > recognize the text. > > Their power is useful at places where the lex/yacc algorithms are not enough, > eg with ambigious grammars and/or textual input with several languages > combined (eg Cobol with embedded SQL). > > Albert --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "ply-hack" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/ply-hack?hl=en -~----------~----~----~----~------~----~------~--~---
