Re: Parsing (well, lexing, really) wikipedia markup

Sam Denton Fri, 10 Jun 2011 11:16:32 -0700

One other idea that's occurred to me (I think I saw it somewhere in the PLY 
pages, but I can't find it now) is to nest lexical scanners.  I keep my 
current scanner and feed it into a filter that mutates the tokens as needed 
and emits the tokens I need.  In practice, the messy parts of Wikipedia's 
mark-up can be resolved by looking a few tokens ahead. So, when I find 
'{{{{{', I can pull tokens until I find either a '}}}' (which seems to 
consistently be the fourth token after that one) or a '}}' (which I haven't 
seen occurring, but better safe than sorry), and then emit either '{{{' and 
'{{' or '{{' and '{{{'.


Here's a simple proof-of-concept. It doesn't do the look-ahead yet; it 
always splits '{{{{{' into '{{' followed by '{{{':

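    import ply.lex as lex

    class wrapper(object):
        """Wrap a PLY lexer and split each LBRACES5 token in two."""
        def __init__(self, klass):
            self.klass = klass   # the lex module (anything with a lex() factory)
            self.stack = []      # tokens queued to be returned before reading on
        def lex(self, *argv, **kwds):
            # Keep the real lexer in self.lexer so it doesn't shadow this method.
            self.lexer = self.klass.lex(*argv, **kwds)
            return self
        def input(self, *argv, **kwds):
            return self.lexer.input(*argv, **kwds)
        def token(self):
            # Return any queued token first.
            if self.stack:
                return self.stack.pop()
            token = self.lexer.token()
            if token is not None:
                if token.type == 'LBRACES5':
                    # Queue a '{{{' token for the next call ...
                    new_token = lex.LexToken()
                    new_token.type = 'LBRACES3'
                    new_token.value = '{{{'
                    new_token.lineno = token.lineno
                    new_token.lexpos = token.lexpos + 2
                    self.stack.append(new_token)
                    # ... and turn the current token into the leading '{{'.
                    token.type = 'LBRACES2'
                    token.value = '{{'
            return token
        # Iterator protocol, for both Python 2 (next) and Python 3 (__next__).
        def __iter__(self):
            return self
        def next(self):
            t = self.token()
            if t is None:
                raise StopIteration
            return t
        __next__ = next

    lexer = wrapper(lex).lex()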
[...]
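
The look-ahead step described above (pull tokens until a '}}}' or '}}' turns up, then decide how to split) could be sketched roughly as follows. This is only an illustration, not part of the proof-of-concept. The token names RBRACES3 and RBRACES2, the Token stand-in for lex.LexToken, the function name split_braces, and the max_lookahead limit are all assumptions made for the example. It also assumes that the first closing token belongs to the innermost opener, so a '}}}' gives '{{' then '{{{', and a '}}' gives '{{{' then '{{'.

```python
from collections import deque, namedtuple

# Hypothetical stand-in for ply.lex.LexToken so the sketch runs without PLY.
Token = namedtuple("Token", "type value lineno lexpos")

# Assumed names for the closing-brace tokens; the real lexer may differ.
CLOSERS = ("RBRACES3", "RBRACES2")


def split_braces(tokens, max_lookahead=16):
    """Yield tokens, splitting each LBRACES5 according to the next closer."""
    source = iter(tokens)
    buffered = deque()  # tokens read during look-ahead, not yet emitted

    def first_closer():
        # Check tokens already buffered, then read more up to the limit.
        for tok in buffered:
            if tok.type in CLOSERS:
                return tok.type
        while len(buffered) < max_lookahead:
            tok = next(source, None)
            if tok is None:
                break
            buffered.append(tok)
            if tok.type in CLOSERS:
                return tok.type
        return None  # no closer found within the look-ahead window

    while True:
        tok = buffered.popleft() if buffered else next(source, None)
        if tok is None:
            return
        if tok.type != "LBRACES5":
            yield tok
            continue
        if first_closer() == "RBRACES2":
            parts = [("LBRACES3", "{{{"), ("LBRACES2", "{{")]
        else:
            # '}}}' found first, or nothing found: same split as the
            # proof-of-concept.
            parts = [("LBRACES2", "{{"), ("LBRACES3", "{{{")]
        pos = tok.lexpos
        for type_, text in parts:
            yield Token(type_, text, tok.lineno, pos)
            pos += len(text)
```

The filter takes any iterable of tokens, so the wrapped lexer could be passed in directly. It ignores other openers sitting between the '{{{{{' and the closer, which is fine only as long as the closer really does show up within a few tokens. With a real PLY lexer the two emitted tokens would be built as lex.LexToken objects, the way the proof-of-concept does it.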

