I agree. Add an extra lexing layer that strips the unwanted tokens from the stream before passing them to the parser. The parse() function has a lexer argument to specify an alternative lexer. There is also a tokenfunc argument that specifies the function to use for getting tokens. Using either of those, you could inject extra processing to discard tokens.
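The extra filtering layer described above can be sketched as follows. This is a minimal sketch and makes some assumptions. The wrapped object is assumed to behave like a PLY lexer: it has input() and token(), and token() returns None at end of input. `FilteringLexer`, `make_tokenfunc` and `IGNORED_TYPES` are hypothetical names used only for illustration and are not part of PLY. The ignored token types are the ones listed in the quoted thread.

```python
# Minimal sketch, assuming the wrapped object behaves like a PLY lexer:
# input(data) feeds text, token() returns the next token or None at the end.
# FilteringLexer, make_tokenfunc and IGNORED_TYPES are hypothetical names.

# Token types the parser never needs to see (from the thread below).
IGNORED_TYPES = frozenset(
    ["WHITESPACE", "OPEN_TAG", "CLOSE_TAG", "COMMENT", "DOC_COMMENT"]
)


class FilteringLexer:
    """Wraps the full (highlighting) lexer and drops ignored token types."""

    def __init__(self, lexer, ignored=IGNORED_TYPES):
        self.lexer = lexer
        self.ignored = frozenset(ignored)

    def input(self, data):
        # Hand the source text straight to the underlying lexer.
        self.lexer.input(data)

    def token(self):
        # Keep pulling tokens until one is not ignored or input runs out.
        while True:
            tok = self.lexer.token()
            if tok is None or tok.type not in self.ignored:
                return tok

    def __getattr__(self, name):
        # Only called for attributes the wrapper lacks (e.g. lineno);
        # forward them to the wrapped lexer.
        return getattr(self.lexer, name)


def make_tokenfunc(lexer, ignored=IGNORED_TYPES):
    """Return a zero-argument token getter that skips ignored types."""
    ignored = frozenset(ignored)

    def next_token():
        while True:
            tok = lexer.token()
            if tok is None or tok.type not in ignored:
                return tok

    return next_token
```

Assuming PLY 3.x's `parse(input, lexer=..., tokenfunc=...)` keywords, usage would look like `parser.parse(data, lexer=FilteringLexer(full_lexer))`, or `parser.parse(data, lexer=full_lexer, tokenfunc=make_tokenfunc(full_lexer))`. In the second form, parse() still calls `full_lexer.input(data)` but pulls tokens through the filter. With either form, the whitespace and comment tokens never reach the grammar, so p_error() no longer has to call yacc.errok() for them. The full lexer stays untouched for syntax highlighting.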
Cheers,
Dave

On Oct 13, 2010, at 1:25 AM, Oldřich Jedlička wrote:

> Hi Dave,
>
> On Monday 11 October 2010 17:01:26 Dave Benjamin wrote:
>> Hi all,
>>
>> I'm writing a PHP parser with PLY, which you can find here:
>> http://github.com/ramen/phply
>>
>> The lexer is designed to be as close as possible to the one built into
>> PHP (http://php.net/token_get_all), which means that there are tokens
>> for WHITESPACE, OPEN_TAG, CLOSE_TAG, and a few other syntactical
>> elements that are ignored by the parser, but still available in case
>> someone wants to use the lexer for color syntax highlighting, etc.
>
> I would write another lexer that calls the full lexer (the color syntax
> highlighting one), but ignores the mentioned tokens. The parser should not care
> about something like whitespaces or comments.
>
> Oldřich.
>
>> I don't want these tokens to produce any values in the parser output,
>> not even None, so the technique I've been using is to call errok() for
>> these tokens in the error handler:
>>
>> def p_error(t):
>>     if t:
>>         if t.type in ('WHITESPACE', 'OPEN_TAG', 'CLOSE_TAG',
>>                       'COMMENT', 'DOC_COMMENT'):
>>             yacc.errok()
>>         else:
>>             raise SyntaxError('invalid syntax', (None, t.lineno, None,
>>                                                  t.value))
>>     else:
>>         raise SyntaxError('unexpected EOF while parsing', (None, None,
>>                                                            None, None))
>>
>> http://github.com/ramen/phply/blob/master/phply/phpparse.py#L1297
>>
>> I wonder if this is the right way to do it, or if there's a better
>> way. For one thing, when I start up my parser, I get the following
>> warnings:
>>
>> WARNING: Token 'DOC_COMMENT' defined, but not used
>> WARNING: Token 'COMMENT' defined, but not used
>> WARNING: Token 'WHITESPACE' defined, but not used
>> WARNING: Token 'OPEN_TAG' defined, but not used
>> WARNING: There are 4 unused tokens
>>
>> I can make these warnings go away by adding a rule that accepts these
>> tokens, but then I start producing values for them as well, which I
>> don't want.
>> They can appear anywhere, so the error handler seems like
>> a convenient place to ignore them, but I wonder if this is an abuse of
>> this feature of PLY. I also wonder if it is thread-safe, since
>> yacc.errok() is module-level.
>>
>> I'd appreciate any advice on the topic, or comments or suggestions
>> about the project in general. Thanks for your time!
>>
>> Dave
>
> --
> You received this message because you are subscribed to the Google Groups
> "ply-hack" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/ply-hack?hl=en.
