I agree. Add an extra lexing layer that strips the unwanted tokens from the
stream before passing it to the parser. The parse() function takes a lexer
argument that specifies an alternative lexer, and a tokenfunc argument that
specifies the function used to fetch tokens. Using either one, you can inject
extra processing to discard tokens.
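
Roughly, the filtering layer could look like the sketch below. It is only an
illustration: FilteredLexer and IGNORED are made-up names, the token types are
copied from the p_error() quoted further down, and the wrapped object is
assumed to expose the usual input() and token() methods of a PLY lexer.

```python
# Token types the parser should never see (taken from the quoted p_error()).
IGNORED = ('WHITESPACE', 'OPEN_TAG', 'CLOSE_TAG', 'COMMENT', 'DOC_COMMENT')


class FilteredLexer:
    """Hypothetical wrapper: forwards to a full lexer, dropping some tokens."""

    def __init__(self, lexer, ignored=IGNORED):
        self.lexer = lexer                  # the full (highlighting) lexer
        self.ignored = frozenset(ignored)

    def input(self, data):
        # Hand the source text straight to the wrapped lexer.
        self.lexer.input(data)

    def token(self):
        # Keep pulling tokens until one is not ignored, or the stream ends
        # (the wrapped lexer signals end of input by returning None).
        while True:
            tok = self.lexer.token()
            if tok is None or tok.type not in self.ignored:
                return tok

    def __getattr__(self, name):
        # Anything else (lineno, lexpos, ...) is read from the wrapped lexer.
        return getattr(self.lexer, name)
```

Assuming parser is your yacc parser and full_lexer is the existing lexer, you
would then call something like parser.parse(source,
lexer=FilteredLexer(full_lexer)). The full lexer stays untouched, so it can
still be used directly for syntax highlighting.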

Cheers,
Dave



On Oct 13, 2010, at 1:25 AM, Oldřich Jedlička wrote:

> Hi Dave,
> 
> On Monday 11 October 2010 17:01:26 Dave Benjamin wrote:
>> Hi all,
>> 
>> I'm writing a PHP parser with PLY, which you can find here:
>> http://github.com/ramen/phply
>> 
>> The lexer is designed to be as close as possible to the one built into
>> PHP (http://php.net/token_get_all), which means that there are tokens
>> for WHITESPACE, OPEN_TAG, CLOSE_TAG, and a few other syntactical
>> elements that are ignored by the parser, but still available in case
>> someone wants to use the lexer for color syntax highlighting, etc.
> 
> I would write another lexer that calls the full lexer (the syntax 
> highlighting one) but ignores the mentioned tokens. The parser should not care 
> about things like whitespace or comments.
> 
> Oldřich.
> 
>> I don't want these tokens to produce any values in the parser output,
>> not even None, so the technique I've been using is to call errok() for
>> these tokens in the error handler:
>> 
>> def p_error(t):
>>     if t:
>>         if t.type in ('WHITESPACE', 'OPEN_TAG', 'CLOSE_TAG',
>>                       'COMMENT', 'DOC_COMMENT'):
>>             yacc.errok()
>>         else:
>>             raise SyntaxError('invalid syntax',
>>                               (None, t.lineno, None, t.value))
>>     else:
>>         raise SyntaxError('unexpected EOF while parsing',
>>                           (None, None, None, None))
>> 
>> http://github.com/ramen/phply/blob/master/phply/phpparse.py#L1297
>> 
>> I wonder if this is the right way to do it, or if there's a better
>> way. For one thing, when I start up my parser, I get the following
>> warnings:
>> 
>> WARNING: Token 'DOC_COMMENT' defined, but not used
>> WARNING: Token 'COMMENT' defined, but not used
>> WARNING: Token 'WHITESPACE' defined, but not used
>> WARNING: Token 'OPEN_TAG' defined, but not used
>> WARNING: There are 4 unused tokens
>> 
>> I can make these warnings go away by adding a rule that accepts these
>> tokens, but then I start producing values for them as well, which I
>> don't want. They can appear anywhere, so the error handler seems like
>> a convenient place to ignore them, but I wonder if this is an abuse of
>> this feature of PLY. I also wonder if it is thread-safe, since
>> yacc.errok() is module-level.
>> 
>> I'd appreciate any advice on the topic, or comments or suggestions
>> about the project in general. Thanks for your time!
>> 
>> Dave
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "ply-hack" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/ply-hack?hl=en.
> 
