Steve Holden wrote:

Suppose I use the dict and I want to access the regex associated with
the token named "tokenname" (that is, no iteration, but a single
access). I could simply write tokendict["tokenname"]. But with the list
of tuples, I can't think of an equally easy way to do that. But then, as
a beginner, I might be underestimating Python.

But when do you want to do that? There's no point inventing use cases -
they should be demonstrated needs.

Well, I had been thinking about further reducing the number of regex matches needed. So I wanted to modify my lexer not to tokenize the whole input at once, but to grab only the next token from the input "just in time", on demand. For that I was thinking of having a next() method like this:

    def next( self, nameOfExpectedToken ):
        # Try only the regex of the token the caller expects, starting
        # at the current offset into the source.
        regex = self.getRegexByTokenName( nameOfExpectedToken )
        match = regex.match( self.source, self.offset )
        if not match: return False
        text = match.group(0)
        line = self.line               # line on which the token starts
        self.line += text.count( "\n" )
        self.offset += len( text )
        return ( nameOfExpectedToken, match, line )
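
For illustration, a caller that knows which token it expects could use it roughly like this (the function and the token name NUMBER are made-up examples, not part of my actual code):

    # Hypothetical caller: try only the token expected at this point.
    def expect_number( lexer ):
        token = lexer.next( "NUMBER" )
        if not token:
            return None            # no NUMBER at the current offset
        name, match, line = token
        return match.group(0), line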

I'm not sure if this is a good idea, but it looks like one to me. The problem is the first line of the method, which retrieves the regex associated with the given token name. Using a dict, I could simply write

        regex = self.tokendict[nameOfExpectedToken]

But with a list I suppose I wouldn't get away without a loop, which I assume is more expensive than the dict lookup.
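
For comparison, here is roughly what the same lookup looks like with each data structure (self.tokenlist is just an assumed attribute name for the list-of-tuples variant; the two methods are alternatives, not meant to coexist):

    def getRegexByTokenName( self, name ):
        # dict version: a single constant-time lookup
        return self.tokendict[name]

    def getRegexByTokenName( self, name ):
        # list-of-tuples version: linear search over (name, regex) pairs
        for tokenname, regex in self.tokenlist:
            if tokenname == name:
                return regex
        raise KeyError( name )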

Or simply pass compiled token patterns in in the first place when they
are necessary ... then the caller has the option of not bothering to
optimize in the first place!

That would be an option. But shouldn't it be the lexer that takes care of optimizing its own work as much as it can without the caller's assistance? After all, the caller should not need to know about the internal workings of the lexer.
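
One possible compromise (just a sketch; the constructor signature is my assumption) would be for the lexer to accept either plain pattern strings or already compiled regex objects, so a caller may precompile but never has to:

    import re

    class Lexer:
        def __init__( self, tokens ):
            # tokens: sequence of (name, pattern) pairs; a pattern may be
            # a string or an already compiled regex object.
            self.tokendict = {}
            for name, pattern in tokens:
                if not hasattr( pattern, "match" ):
                    pattern = re.compile( pattern )
                self.tokendict[name] = pattern

        def getRegexByTokenName( self, name ):
            return self.tokendict[name]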

[Optimizing performance by putting most frequent tokens first]
With a dict you have no such opportunity, because the ordering is
determined by the implementation and not by your data structure.

True. Still, I should be able to gain even better performance with the next() approach above, as it would completely eliminate all "useless" matching (like trying to match FOO where no FOO is allowed).
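
As a made-up illustration of what I mean: a parser rule that only allows, say, a NUMBER or an IDENTIFIER at some point would try at most those two regexes instead of walking the whole ordered token list:

    # Hypothetical parser fragment built on the next() method above.
    def parse_operand( lexer ):
        for name in ( "NUMBER", "IDENTIFIER" ):
            token = lexer.next( name )
            if token:
                return token
        raise SyntaxError( "expected NUMBER or IDENTIFIER" )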

Greetings,
Thomas

--
Just because many of them are wrong doesn't mean they are right!
(Coluche)
