One other idea that's occurred to me (I think I saw it somewhere in the PLY
pages, but I can't find it now) is to nest lexical scanners. I keep my
current scanner and feed it into a filter that mutates the tokens as needed
and emits the tokens I need. In practice, the messy parts of Wikipedia's
mark-up can be resolved by looking a few tokens ahead. So, when I find
'{{{{{', I can pull tokens until I find either a '}}}' (which seems to
consistently be the fourth token after the '{{{{{') or a '}}' (which I
haven't seen occur, but better safe than sorry), and then emit either '{{{'
followed by '{{' or '{{' followed by '{{{', depending on which one turned up
first.

Here's a simple proof-of-concept:
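import ply.lex as lex


class wrapper(object):
    """Wrap a PLY lexer and split each LBRACES5 token into LBRACES2 + LBRACES3."""

    def __init__(self, klass):
        self.klass = klass   # the ply.lex module (anything with a lex() factory)
        self.lexer = None    # the real lexer, created by lex()
        self.stack = []      # tokens waiting to be emitted

    def lex(self, *argv, **kwds):
        # Keep the real lexer under its own name so this method isn't shadowed.
        self.lexer = self.klass.lex(*argv, **kwds)
        return self

    def input(self, *argv, **kwds):
        return self.lexer.input(*argv, **kwds)

    def token(self):
        if self.stack:
            return self.stack.pop()
        token = self.lexer.token()
        if token is not None and token.type == 'LBRACES5':
            # Queue a '{{{' token that starts two characters after the original...
            new_token = lex.LexToken()
            new_token.type = 'LBRACES3'
            new_token.value = '{{{'
            new_token.lineno = token.lineno
            new_token.lexpos = token.lexpos + 2
            self.stack.append(new_token)
            # ...and turn the original token into the leading '{{'.
            token.type = 'LBRACES2'
            token.value = '{{'
        return token

    def __iter__(self):
        return self

    def next(self):
        t = self.token()
        if t is None:
            raise StopIteration
        return t

    __next__ = next   # Python 3 spelling of the iterator method


lexer = wrapper(lex).lex()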
[...]
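
The proof-of-concept always emits '{{' first and never looks ahead. Here is a rough sketch of the lookahead variant described at the top. It is only a sketch: it uses a stand-in Token tuple instead of PLY's LexToken so that it runs on its own. The RBRACES3/RBRACES2 type names, the split_braces name and the max_lookahead window are all made up for illustration. I'm also assuming that the closer seen first belongs to the inner (second) opener, which may not hold when '{{{{{' runs are nested.

```python
from collections import deque, namedtuple

# Stand-in for PLY's LexToken so the sketch runs without PLY (an assumption).
Token = namedtuple("Token", "type value lineno lexpos")

CLOSERS = ("RBRACES3", "RBRACES2")  # hypothetical token type names


def split_braces(tokens, max_lookahead=8):
    """Yield tokens, splitting each LBRACES5 based on the first closer ahead."""
    source = iter(tokens)
    buffered = deque()  # tokens read ahead but not yet emitted

    while True:
        tok = buffered.popleft() if buffered else next(source, None)
        if tok is None:
            return
        if tok.type != "LBRACES5":
            yield tok
            continue

        # Top the buffer up to a bounded window of upcoming tokens.
        while len(buffered) < max_lookahead:
            nxt = next(source, None)
            if nxt is None:
                break
            buffered.append(nxt)

        # The first closer in the window decides the order of the split.
        closer = next((t.type for t in buffered if t.type in CLOSERS), None)
        if closer == "RBRACES2":
            first, second = ("LBRACES3", "{{{"), ("LBRACES2", "{{")
        else:
            # '}}}' first, or nothing found: same order as the proof-of-concept.
            first, second = ("LBRACES2", "{{"), ("LBRACES3", "{{{")

        yield Token(first[0], first[1], tok.lineno, tok.lexpos)
        yield Token(second[0], second[1], tok.lineno,
                    tok.lexpos + len(first[1]))
```

Because the buffered tokens go back through the same loop, a second '{{{{{' inside the window still gets split. With real PLY tokens you would build LexToken objects instead of the tuple, as the wrapper above does.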