Re: Problem with processing more than one statement

D.Hendriks (Dennis) Thu, 12 Feb 2009 02:44:50 -0800

Hello Ben,

If you don't specify the start symbol, PLY uses the left-hand side of 
the first rule in the file. In your original grammar that was 
'statement', because the first rule in the file is 'statement : VARIABLE 
EQUALS expression SEMI'. At least, that is how I remember it from the 
PLY documentation. I always pass the start symbol explicitly to 
yacc.yacc() (the start keyword argument). See the PLY documentation for 
more information. If you take a look at the generated parser.out file, 
you see:


Grammar

Rule 1     S' -> statement
Rule 2     statement -> VARIABLE EQUALS expression SEMI
Rule 3     statement -> expression
Rule 4     expression -> expression PLUS expression
Rule 5     expression -> expression MINUS expression
Rule 6     expression -> expression TIMES expression
Rule 7     expression -> expression DIVIDE expression
Rule 8     expression -> NUMBER
Rule 9     expression -> VARIABLE

This confirms that 'statement' is the start symbol: the augmented start 
symbol S', which PLY adds as Rule 1, derives 'statement'.
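
To make the point about the start symbol concrete, here is a minimal 
sketch, not taken from Ben's file. It assumes a cut-down grammar in which 
a statement is just 'VARIABLE EQUALS NUMBER SEMI', and the rule function 
names (p_statements_single, p_statements_many) are made up for the 
example. 'statement' is deliberately still the first rule, so without 
start='statements' it would become the start symbol again.

```python
import ply.lex as lex
import ply.yacc as yacc

# Cut-down token set (assumption: just enough for '$foo = 1;').
tokens = ('VARIABLE', 'NUMBER', 'EQUALS', 'SEMI')

t_EQUALS = r'='
t_SEMI = r';'
t_ignore = ' \t'

def t_VARIABLE(t):
    r'\$[a-zA-Z]+'
    return t

def t_NUMBER(t):
    r'[0-9]+'
    t.value = int(t.value)
    return t

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

def t_error(t):
    t.lexer.skip(1)

# 'statement' is the first rule in the file, so it would be the default
# start symbol if none were given.
def p_statement(p):
    'statement : VARIABLE EQUALS NUMBER SEMI'
    p[0] = (p[1], p[3])

def p_statements_single(p):
    'statements : statement'
    p[0] = [p[1]]

def p_statements_many(p):
    'statements : statements statement'
    p[0] = p[1] + [p[2]]

def p_error(p):
    raise SyntaxError("unexpected %r" % (p,))

lexer = lex.lex()
# Name the start symbol explicitly instead of relying on rule order.
# debug=False and write_tables=False only suppress parser.out/parsetab.py.
parser = yacc.yacc(start='statements', debug=False, write_tables=False)

result = parser.parse('$foo = 1;\n$bar = 3;', lexer=lexer)
print(result)
```

With the explicit start symbol both assignments are parsed, and the 
order of the rule functions in the file no longer matters.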

PLY generates parsing tables from the grammar. When you pass input to 
the parse method, PLY tries to match the whole input against the start 
symbol, so here it tries to match exactly one 'statement'. If you want 
to see what happens internally, pass debug=2 to the parse method. You 
may also want to check the parser.out file, which shows the parse tables 
in human-readable form. With debug=2 the output looks like this (I use 
PLY 3.0, but you get something similar in PLY 2.5):

PLY: PARSE DEBUG START

State  : 0
Stack  : . LexToken(VARIABLE,'$foo',1,0)
Action : Shift and goto state 3

State  : 3
Stack  : VARIABLE . LexToken(EQUALS,'=',1,5)
Action : Shift and goto state 5

State  : 5
Stack  : VARIABLE EQUALS . LexToken(NUMBER,1,1,7)
Action : Shift and goto state 1

State  : 1
Stack  : VARIABLE EQUALS NUMBER . LexToken(SEMI,';',1,8)
Action : Reduce rule [expression -> NUMBER] with [1] and goto state 7
expression_number
Result : <int @ 0x813ef18> (1)

State  : 11
Stack  : VARIABLE EQUALS expression . LexToken(SEMI,';',1,8)
Action : Shift and goto state 16

State  : 16
Stack  : VARIABLE EQUALS expression SEMI . LexToken(VARIABLE,'$bar',2,10)
ERROR: Error  : VARIABLE EQUALS expression SEMI . LexToken(VARIABLE,'$bar',2,10)
Syntax error on line 2: $bar

The parser is in state 16 when the error occurs. You can look that state 
up in parser.out:

state 16

    (1) statement -> VARIABLE EQUALS expression SEMI .

    $end            reduce using rule 1 (statement -> VARIABLE EQUALS expression SEMI .)

In that state the parser only accepts $end (end of input): it did not 
expect any more input after the first ';' character. That is exactly 
why you need the 'statements' symbol and its parsing rules.
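
The difference between the two start symbols can also be sketched 
without PLY. The following is a hand-written illustration in plain 
Python, not an LALR(1) parser and not what PLY does internally. It 
assumes the same simplification as before (a statement is 'VARIABLE 
EQUALS NUMBER SEMI'), and all function names are invented for the 
example.

```python
import re

# Simplified tokens: only what '$foo = 1;' needs.
TOKEN_RE = re.compile(
    r'(?P<VARIABLE>\$[a-zA-Z]+)|(?P<NUMBER>[0-9]+)|(?P<EQUALS>=)|(?P<SEMI>;)'
)

def tokenize(text):
    """Return the list of token type names, skipping whitespace."""
    types = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("illegal character %r" % text[pos])
        types.append(m.lastgroup)
        pos = m.end()
    return types

STATEMENT = ['VARIABLE', 'EQUALS', 'NUMBER', 'SEMI']

def match_statement(types, pos):
    """Match one 'statement' at pos and return the position after it."""
    if types[pos:pos + 4] != STATEMENT:
        raise SyntaxError("no statement at token %d" % pos)
    return pos + 4

def accepts_statement(types):
    """Start symbol 'statement': one statement, then end of input."""
    try:
        return match_statement(types, 0) == len(types)
    except SyntaxError:
        return False

def accepts_statements(types):
    """Start symbol 'statements': one or more statements, then end of input."""
    try:
        pos = match_statement(types, 0)
        while pos < len(types):
            pos = match_statement(types, pos)
        return True
    except SyntaxError:
        return False

types = tokenize('$foo = 1;\n$bar = 3;')
print(accepts_statement(types))   # False: input continues after the first ';'
print(accepts_statements(types))  # True: the second statement is consumed too
```

The first check fails for the same reason state 16 only accepts $end: 
after one complete statement, nothing but end of input is allowed.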

For more information, consult the documentation on PLY, the original 
yacc, and/or LALR(1) parsers.

Hope this helps,
Dennis



comsatcat wrote:
> Dennis,
>  
> Thank you for the feedback... I've no real experience using lex/yacc 
> so this is a new experience for me :)
>  
> I got it working by adding the two definitions you suggested, but I'm 
> not sure I fully understand what's going on, so I was hoping you could 
> elaborate.
>  
> From what I get from looking at the code...
>  
> It defaults to the first processing statement encountered.
> By adding the two definitions you mentioned below, I'm essentially 
> saying "the data can have multiple statements and statements of 
> statements".  What I'm not understanding is the path the parser takes 
> after matching the statement.
>  
> If I am following correctly it should look like this (from top to bottom):
>  
> statement = dereference(statements)
> expression = dereference(statement)
> variable/number = dereference(expression)
>  
> I understand it's much more complex than that, but in a nutshell that's 
> the general path it's following?
>  
> Thanks in advance,
> Ben
>  
>
>  
> On Thu, Feb 12, 2009 at 1:27 AM, D.Hendriks (Dennis) 
> <[email protected] <mailto:[email protected]>> wrote:
>
>
>     Hello comsatcat,
>
>     You defined parser rules but did not specify the start symbol, meaning
>     it gets to be 'statement', because that's the first one (I think).
>     Statement matches the part '$foo = 1;' after which the statement has
>     been matched. The grammar doesn't expect anything after that. You
>     could add something like this, before any other parsing rules:
>
>     def p_statements_1(p):
>        '''
>        statements : statement
>        '''
>        pass # or something else...
>
>
>     def p_statements_2(p):
>        '''
>        statements : statements statement
>        '''
>        pass # or something else...
>
>
>     Or you could add it after/between the other rules, and explicitly
>     define the start symbol.
>
>     Dennis
>
>
>     comsatcat wrote:
>     > I'm playing with Ply, my input file is as follows:
>     >
>     > $foo = 1;
>     > $bar = 3;
>     >
>     > My problem is, when I pass multiple lines to the parser, it
>     > errors out after the first statement at $bar...
>     >
>     > <code>
>     >
>     > #!/usr/bin/env python
>     >
>     > import os, sys
>     > import ply.lex as lex
>     > import ply.yacc as yacc
>     >
>     > tables = {}
>     >
>     > reserved = {
>     >             'if' : 'IF',
>     >             'else' : 'ELSE',
>     >             'elsif' : 'ELSIF',
>     >             'while' : 'WHILE',
>     >             'for' : 'FOR',
>     >             'print' : 'PRINT'
>     > }
>     >
>     > tokens = [
>     >           'VARIABLE', 'NUMBER',
>     >           'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'EQUALS',
>     >           'LPAREN', 'RPAREN', 'SEMI', 'COMMENT',
>     >           'LT', 'GT', 'ET', 'LE', 'GE'
>     > ]
>     > tokens += list(reserved.values())
>     > t_LT = r'<'
>     > t_GT = r'>'
>     > t_ET = r'=='
>     > t_LE = r'<='
>     > t_GE = r'>='
>     > t_SEMI = r';'
>     > t_PLUS = r'\+'
>     > t_MINUS = r'-'
>     > t_TIMES = r'\*'
>     > t_DIVIDE = r'/'
>     > t_EQUALS = r'='
>     > t_LPAREN = r'\('
>     > t_RPAREN = r'\)'
>     >
>     > t_ignore = ' \t'
>     >
>     > def t_COMMENT(t):
>     >     r'\#.*'
>     >     pass
>     >
>     > def t_VARIABLE(t):
>     >     r'\$[a-zA-Z]{1,}'
>     >     t.type = reserved.get(t.value, 'VARIABLE')
>     >     return t
>     >
>     > def t_NUMBER(t):
>     >     r'[0-9]+'
>     >     t.value = int(t.value)
>     >     return t
>     >
>     > def t_newline(t):
>     >     r'\n'
>     >     t.lexer.lineno += 1
>     >
>     > def t_error(t):
>     >     print "Illegal character '%s'" % t.value[0]
>     >     t.lexer.skip(1)
>     >
>     > lexer = lex.lex()
>     >
>     > precedence = (
>     >               ('left', 'PLUS','MINUS'),
>     >               ('left', 'TIMES','DIVIDE')
>     >             )
>     >
>     > def p_statement_assignment(p):
>     >     '''
>     >     statement : VARIABLE EQUALS expression SEMI
>     >     '''
>     >     tables[p[1].replace("$", "")] = p[3]
>     >     print "statement_assignment"
>     >
>     > def p_statement_expression(p):
>     >     '''
>     >     statement : expression
>     >     '''
>     >     print "statement_expression"
>     >     p[0] = p[1]
>     >
>     > def p_expression(p):
>     >     '''
>     >     expression : expression PLUS expression
>     >                | expression MINUS expression
>     >                | expression TIMES expression
>     >                | expression DIVIDE expression
>     >     '''
>     >     print "expression"
>     >     if p[2] == '+':
>     >         p[0] = p[1] + p[3]
>     >     elif p[2] == '-':
>     >         p[0] = p[1] - p[3]
>     >     elif p[2] == '*':
>     >         p[0] = p[1] * p[3]
>     >     elif p[2] == '/':
>     >         p[0] = p[1] / p[3]
>     >
>     > def p_expression_number(p):
>     >     '''
>     >     expression : NUMBER
>     >     '''
>     >     print "expression_number"
>     >     p[0] = p[1]
>     >
>     > def p_expression_variable(p):
>     >     '''
>     >     expression : VARIABLE
>     >     '''
>     >     print "expression_variable"
>     >     try:
>     >         p[0] = tables[p[1].replace("$", "")]
>     >     except KeyError:
>     >         print "Cannot find variable"
>     >         pass
>     >
>     > def p_error(p):
>     >     if not p:
>     >         print "Syntax error: premature end of file"
>     >     else:
>     >         print "Syntax error on line %d: %s" % (p.lineno, p.value)
>     >
>     > parser = yacc.yacc()
>     >
>     > def run():
>     >     data = open(sys.argv[1]).read()
>     >     parser.parse(data)
>     >
>     > if __name__ == "__main__":
>     >     if len(sys.argv) != 2:
>     >         sys.exit(0)
>     >
>     >     run()
>     >     print str(tables)
>     >
>     > </code>
>     >
>     > Does anyone see what I'm doing wrong here?

