I am trying to use the lexers in Pygments to extract comments from code files. I can filter down to just the comment tokens successfully, but those tokens always include the language's comment syntax (e.g. for Python, I'm looking at hash-prefixed comments and docstrings). Ideally I would like to reverse-look up the regex pattern for a given token definition, so that I can use that regex to strip the comment syntax from the comment tokens and be left with only the comment text itself.
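For reference, here is a minimal version of what I have working so far. Note that the `r'^#\s*'` stripping regex is hand-written for Python, which is exactly the part I'd like to derive from the lexer instead:

```python
import re

from pygments.lexers import PythonLexer
from pygments.token import Comment

code = "x = 1  # set x to one\n"

# Collect only the comment tokens; their values still carry the '#' syntax.
comments = [value for token, value in PythonLexer().get_tokens(code)
            if token in Comment]

# Hand-written stripping regex -- this is what I want to look up instead.
stripped = [re.sub(r'^#\s*', '', c) for c in comments]
print(stripped)  # ['set x to one']
```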
I have tried writing a function that iterates over a lexer's token definitions to look up this information. I managed (I think) to work around the include mechanism of the token definitions by recursing into the referenced state whenever an include entry is found. However, I am now stumped by the callback functions (using, bygroups). I could probably find a way over this hurdle too, but I'm also guessing there should be a way to get all the contents of a lexer's token definitions as fully evaluated tuples. The issue may be that I don't need (or want) the complexity of state awareness for what I'm doing, which (I believe) is the reason the token defs use callbacks in the first place.

Can anyone advise how to get at the appropriate token definitions? Thank you.

--
You received this message because you are subscribed to the Google Groups "pocoo-libs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/pocoo-libs?hl=en.
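P.S. For concreteness, this is roughly the traversal I have so far (a sketch; `comment_patterns` is my own naming, it reads the raw `tokens` class attribute, and it simply skips any entry whose token is a callback such as `bygroups`/`using` -- the part I'm stuck on):

```python
from pygments.lexer import include
from pygments.lexers import PythonLexer
from pygments.token import Comment

def comment_patterns(lexer_cls, state='root', seen=None):
    """Collect regex patterns whose token type is in Comment,
    recursing through include() references between states."""
    if seen is None:
        seen = set()
    if state in seen:        # guard against include() cycles
        return []
    seen.add(state)
    patterns = []
    for entry in lexer_cls.tokens[state]:
        if isinstance(entry, include):
            # include is a str subclass naming another state
            patterns.extend(comment_patterns(lexer_cls, str(entry), seen))
        elif isinstance(entry, tuple):
            regex, token = entry[0], entry[1]
            if callable(token):
                continue     # bygroups()/using() callback: this is where I'm stuck
            if isinstance(regex, str) and token in Comment:
                patterns.append(regex)
        # anything else (default, inherit, ...) is ignored for now
    return patterns

print(comment_patterns(PythonLexer))
```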
