I've written a rather minimal s-expression parser with PLY, but I'm
experiencing a strange bug.  Since the code is rather short, I'll post
it here:

--==BEGIN lexer.py==--
import ply.lex as lex

tokens = ('INTEGER', 'FLOAT', 'STRING', 'LPAREN', 'RPAREN',
          'IDENTIFIER', 'NEWLINE', 'RATIONAL')

t_FLOAT         = r'((\d*\.\d+)(E[\+-]?\d+)?|([1-9]\d*E[\+-]?\d+))'
t_STRING        = r'\".*?\"'
t_LPAREN        = r'\('
t_RPAREN        = r'\)'
t_IDENTIFIER    = r'[^0-9()][^()\ \t\n]*'
t_INTEGER       = r'(-)?\d+'
t_RATIONAL      = r'(-)?\d+/\d+'

t_ignore = ' \t'

def t_NEWLINE(t):
    r'\n'
    t.lexer.lineno += 1

def t_error(t):
    '''
    Houston, we have a problem.
    '''
    print("Illegal character %s" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex (optimize = 0)

--==END lexer.py==--

Now, when I do this:

>>> from lexer import lexer
>>>
>>> lexer.input (' (+ 7abc 3 "xyz") ')
>>> for token in lexer:
...     print(token)

I get:

LexToken(LPAREN,'(',1,1)
LexToken(IDENTIFIER,'+',1,2)
LexToken(INTEGER,'7',1,4)
LexToken(IDENTIFIER,'abc',1,5)
LexToken(INTEGER,'3',1,9)
LexToken(IDENTIFIER,'"xyz"',1,11)
LexToken(RPAREN,')',1,16)
>>>

What I'd expect is an error when lexing 7abc, since it's not a valid
identifier.  What makes me suspect this is a PLY bug rather than a bug
in my code is that pyscheme
(http://hkn.eecs.berkeley.edu/~dyoo/python/pyscheme/) builds its lexer
and parser using PLY and has the same bug.  Can anyone confirm this is
a bug in PLY, or am I doing something subtly wrong?
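
For comparison, here is a minimal sketch using only the standard re
module.  It is not PLY's code: the names TOKEN_SPEC, STRICT_SPEC,
build and tokenize are made up for illustration, the patterns are
simplified versions of mine, and I'm only assuming that PLY works in a
broadly similar way (matching a combined regex at the current
position).  It reproduces the same 7 / abc split, and the second spec
shows one possible way to get the error I expected, by adding a
lookahead that requires a delimiter after an integer.

```python
import re

# Hypothetical stand-in for a regex-driven lexer (not PLY itself).
TOKEN_SPEC = [
    ("INTEGER", r"-?\d+"),
    ("IDENTIFIER", r"[^0-9()][^()\ \t\n]*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
]

# Same spec, but an integer must be followed by a delimiter or end of input.
STRICT_SPEC = [("INTEGER", r"-?\d+(?![^()\s])")] + TOKEN_SPEC[1:]


def build(spec):
    """Combine the patterns into one alternation of named groups."""
    return re.compile("|".join("(?P<%s>%s)" % pair for pair in spec))


def tokenize(text, master):
    """Repeatedly match the combined regex at the current position."""
    pos = 0
    tokens = []
    while pos < len(text):
        if text[pos] in " \t":  # plays the role of t_ignore
            pos += 1
            continue
        m = master.match(text, pos)
        if m is None:
            raise ValueError("Illegal character %r" % text[pos])
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Nothing forces a token to end at a delimiter, so 7abc is split in two.
print(tokenize("(+ 7abc 3)", build(TOKEN_SPEC)))

# With the lookahead, no pattern matches at the 7 and an error is raised.
try:
    tokenize("(+ 7abc 3)", build(STRICT_SPEC))
except ValueError as exc:
    print(exc)
```

If PLY really does behave like the first spec, then the split would be
a consequence of my token patterns rather than a bug, but I'd
appreciate confirmation from someone who knows the internals.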

Thanks!
