I am trying to use the lexers in Pygments to extract comments from code files. I can filter down to just the comment tokens successfully, but those tokens always include the language's comment syntax (e.g. for Python, I'm looking at hash-prefixed comments and docstrings). Ideally I would like to reverse-look up the regex pattern for a given token definition, so that I can use that regex to strip the comment syntax from the comment tokens and be left with only the comment text itself.
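For reference, here is a minimal version of what I have working so far. Note that the `r'^#\s*'` stripping regex is hand-written for Python, which is exactly the part I'd like to derive from the lexer instead:

```python
import re

from pygments.lexers import PythonLexer
from pygments.token import Comment

code = "x = 1  # set x to one\n"

# Collect only the comment tokens; their values still carry the '#' syntax.
comments = [value for token, value in PythonLexer().get_tokens(code)
            if token in Comment]

# Hand-written stripping regex -- this is what I want to look up instead.
stripped = [re.sub(r'^#\s*', '', c) for c in comments]
print(stripped)  # ['set x to one']
```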
I have tried writing a function that iterates over a lexer's token definitions to look up this information. I managed (I think) to work around the include mechanism of the token definitions by recursing into the referenced state whenever an include entry is found. However, I am now stumped by the callback functions (using, bygroups). I could probably find a way over this hurdle too, but I'm also guessing there should be a way to get all the contents of a lexer's token definitions as fully evaluated tuples. The issue may be that I don't need (or want) the complexity of state awareness for what I'm doing, which (I believe) is the reason the token defs use callbacks in the first place.

Can anyone advise how to get at the appropriate token definitions? Thank you.

--
You received this message because you are subscribed to the Google Groups "pocoo-libs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/pocoo-libs?hl=en.
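P.S. For concreteness, this is roughly the traversal I have so far (a sketch; `comment_patterns` is my own naming, it reads the raw `tokens` class attribute, and it simply skips any entry whose token is a callback such as `bygroups`/`using` -- the part I'm stuck on):

```python
from pygments.lexer import include
from pygments.lexers import PythonLexer
from pygments.token import Comment

def comment_patterns(lexer_cls, state='root', seen=None):
    """Collect regex patterns whose token type is in Comment,
    recursing through include() references between states."""
    if seen is None:
        seen = set()
    if state in seen:        # guard against include() cycles
        return []
    seen.add(state)
    patterns = []
    for entry in lexer_cls.tokens[state]:
        if isinstance(entry, include):
            # include is a str subclass naming another state
            patterns.extend(comment_patterns(lexer_cls, str(entry), seen))
        elif isinstance(entry, tuple):
            regex, token = entry[0], entry[1]
            if callable(token):
                continue     # bygroups()/using() callback: this is where I'm stuck
            if isinstance(regex, str) and token in Comment:
                patterns.append(regex)
        # anything else (default, inherit, ...) is ignored for now
    return patterns

print(comment_patterns(PythonLexer))
```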
