Steve Holden wrote:

Suppose I use the dict and I want to access the regex associated with
the token named "tokenname" (that is, no iteration, but a single
access). I could simply write tokendict["tokenname"]. But with the list
of tuples, I can't think of an equally easy way to do that. But then, as
a beginner, I might be underestimating Python.

But when do you want to do that? There's no point inventing use cases -
they should be demonstrated needs.

Well, I had been thinking about further reducing the number of regex matches needed. So I wanted to modify my lexer not to tokenize the whole input at once, but to grab only the next token from the input "just in time", on demand. For that I was thinking of having a next() method like this:

    def next( self, nameOfExpectedToken ):
        # Try only the regex of the token the caller expects, starting
        # at the current offset into the source.
        regex = self.getRegexByTokenName( nameOfExpectedToken )
        match = regex.match( self.source, self.offset )
        if not match: return False
        text = match.group(0)
        line = self.line               # line on which the token starts
        self.line += text.count( "\n" )
        self.offset += len( text )
        return ( nameOfExpectedToken, match, line )
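
For illustration, a caller that knows which token it expects could use it roughly like this (the function and the token name NUMBER are made-up examples, not part of my actual code):

    # Hypothetical caller: try only the token expected at this point.
    def expect_number( lexer ):
        token = lexer.next( "NUMBER" )
        if not token:
            return None            # no NUMBER at the current offset
        name, match, line = token
        return match.group(0), line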

I'm not sure if this is a good idea, but it looks like one to me. The problem is the first line of the method, which retrieves the regex associated with the given token name. Using a dict, I could simply write

        regex = self.tokendict[nameOfExpectedToken]

But with a list I suppose I wouldn't get away without a loop, which I assume is more expensive than the dict lookup.
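
For comparison, here is roughly what the same lookup looks like with each data structure (self.tokenlist is just an assumed attribute name for the list-of-tuples variant; the two methods are alternatives, not meant to coexist):

    def getRegexByTokenName( self, name ):
        # dict version: a single constant-time lookup
        return self.tokendict[name]

    def getRegexByTokenName( self, name ):
        # list-of-tuples version: linear search over (name, regex) pairs
        for tokenname, regex in self.tokenlist:
            if tokenname == name:
                return regex
        raise KeyError( name )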

Or simply pass compiled token patterns in in the first place when they
are necessary ... then the caller has the option of not bothering to
optimize in the first place!

That would be an option. But shouldn't it be the lexer that takes care of optimizing its own work as much as it can without the caller's assistance? After all, the caller should not need to know about the internal workings of the lexer.
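
One possible compromise (just a sketch; the constructor signature is my assumption) would be for the lexer to accept either plain pattern strings or already compiled regex objects, so a caller may precompile but never has to:

    import re

    class Lexer:
        def __init__( self, tokens ):
            # tokens: sequence of (name, pattern) pairs; a pattern may be
            # a string or an already compiled regex object.
            self.tokendict = {}
            for name, pattern in tokens:
                if not hasattr( pattern, "match" ):
                    pattern = re.compile( pattern )
                self.tokendict[name] = pattern

        def getRegexByTokenName( self, name ):
            return self.tokendict[name]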

[Optimizing performance by putting most frequent tokens first]
With a dict you have no such opportunity, because the ordering is
determined by the implementation and not by your data structure.

True. Still, I should be able to gain even better performance with the next() approach above, as it would completely eliminate all "useless" matching (like trying to match FOO where no FOO is allowed).
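
As a made-up illustration of what I mean: a parser rule that only allows, say, a NUMBER or an IDENTIFIER at some point would try at most those two regexes instead of walking the whole ordered token list:

    # Hypothetical parser fragment built on the next() method above.
    def parse_operand( lexer ):
        for name in ( "NUMBER", "IDENTIFIER" ):
            token = lexer.next( name )
            if token:
                return token
        raise SyntaxError( "expected NUMBER or IDENTIFIER" )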

Greetings,
Thomas

--
Just because many of them are wrong doesn't mean they are right!
(Coluche)
