Hi everyone, I haven't had a chance to introduce myself yet. I'm George
Boutsioukis, a CS student from Greece and I'm currently enrolled as a GSOC
student for the PSF. The task I was involved with for the past few weeks was
speeding up the 2to3 tool. 

For those who are not aware of 2to3's internals, the tool matches a series 
of tree patterns to a python syntax tree and applies a code transformation 
on each match. The real bottleneck of this tool is the tree pattern matching
algorithm, since the current version traverses all nodes of the tree,
top-to-bottom, and checks each node against a set of tree patterns.

The new algorithm I've implemented reduces the candidate nodes for matching.
The process works in three steps:

1) Each pattern tree is reduced to the most 'characteristic' path, i.e. the
rarest path you'll encounter on a real code tree. (this is simpler than it
sounds)

2) From the set of the above linear patterns an Aho-Corasick automaton is
constructed that is capable of inclusively matching all patterns(meaning 
that it may mark a wrong candidate but will never miss an instance)

3) The automaton is run on each leaf until the root of the tree is reached 
and the resulting match set is returned. And finally we apply each fixer 
for each node in the match set. Of course the process is a bit more 
complicated since we have to recheck the transformed code for new matches etc.

Since it is quite a hard concept to illustrate in a post, tell me if you
have any questions or need more info.

The benefit of this somewhat complex process is substantial; the total run 
time is reduced by roughly 50%. The new module is still experimental code, 
but mature enough to pass all tests. I propose to include the new module
with the next version of Python, perhaps as an explicit option for 2to3 
until it can be thoroughly tested.

The code repository can be found here: 
https://code.google.com/p/2to3-speedup2/source/browse/
(the code is a fork from py3k's 2to3 and was tested with version 3.1.2)

The blog I've been using for GSOC(containing more details on the matching
algorithm): http://george-gsoc.blogspot.com/

Regards,
George


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to