[issue14043] Speed-up importlib's _FileFinder

2012-02-19 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Committed now (with invalidate_caches() as well).

--
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-19 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset bbaab666e6c7 by Antoine Pitrou in branch 'default':
Issue #14043: Speed up importlib's _FileFinder by at least 8x, and add a new 
importlib.invalidate_caches() function.
http://hg.python.org/cpython/rev/bbaab666e6c7

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-18 Thread Brett Cannon

Brett Cannon  added the comment:

I wouldn't worry too much about test_runpy as it has always been a touchy test 
for me.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-18 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

This is a patch demonstrating invalidate_caches().
As for test_runpy, it seems runpy uses imp.get_loader(), so I don't see how it 
can remain synchronized with importlib's state.

--
Added file: http://bugs.python.org/file24560/listdir_cache3.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-18 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Another possibility would be to include an explicit invalidate_caches() 
function in importlib.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Eric Snow

Changes by Eric Snow :


--
nosy: +eric.snow

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> Well, it's the reason you added file size to .pyc files. =)

pyc files only store the integer timestamp. Here I store the original
floating-point timestamp, and I assumed it would be sufficient :)

>  I assume if you store the system clock as well to see if no time has
> changed it would kill the performance gain.

I don't know. I might give it a try but it might still be fragile.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Brett Cannon

Brett Cannon  added the comment:

Well, it's the reason you added file size to .pyc files. =) I assume if you 
store the system clock as well to see if no time has changed it would kill the 
performance gain. We could also have two versions of _FileFinder such that 
people could choose which they preferred.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> The lastest patch (listdir_cache2) LGTM. Have you by any chance
> patched it in and run ./python -m importlib.test.regrtest (runs the
> entire test suite with builtins.__import__() substituted with
> importlib.__import__())?

There are a couple of test failures (in test_import, test_runpy,
test_reprlib). Apparently this is due to false positives when comparing
directory modification times. If I sleep() enough between tests, the
failures disappear...
I wonder if this could be a problem in real-life (say, someone compiling
some DSL to Python code on-the-fly and expecting import to pick up
without any timestamp problems).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Brett Cannon

Brett Cannon  added the comment:

The lastest patch (listdir_cache2) LGTM. Have you by any chance patched it in 
and run ./python -m importlib.test.regrtest (runs the entire test suite with 
builtins.__import__() substituted with importlib.__import__())? If not I can do 
it on my ride home tonight to make sure semantics didn't surprisingly change (I 
don't see how). And then I can benchmark against the standard benchmark suite 
to see if it speeds things up in my bootstrap branch.

--
stage: patch review -> commit review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Changes by Antoine Pitrou :


Added file: http://bugs.python.org/file24552/listdir_cache2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Updated light-weight caching patch (now checks PYTHONCASEOK dynamically).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> And while I'm thinking about it, the hybrid caching approach of
> caching just the directory contents to start and then caching full
> details on successful searches (and failures) would also let you cache
> the full file path as the strings will be interned and used again by
> the loaders for setting __file__.

Well, you shouldn't call find_module() on the same module twice (except
for benchmarks), so I'm not sure what caching the results would bring.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Brett Cannon

Brett Cannon  added the comment:

And while I'm thinking about it, the hybrid caching approach of caching just 
the directory contents to start and then caching full details on successful 
searches (and failures) would also let you cache the full file path as the 
strings will be interned and used again by the loaders for setting __file__.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Brett Cannon

Brett Cannon  added the comment:

On Fri, Feb 17, 2012 at 14:31, Antoine Pitrou wrote:

>
> Antoine Pitrou  added the comment:
>
> > Why pre-calculate everything? In the most common case any single
> > module will be imported once, if at all. And once it is imported it
> > will get cached in sys.modules, alleviating the need to hit the finder
> > again. So from a performance standpoint wouldn't it be better not to
> > do all of the pre-calculation and instead do that as needed assuming
> > that sys.modules will shield the finder from having to do repetitive
> > things like figuring out what loader is needed?
>
> I figured it would avoid repetitive tests for all 10 suffixes.
> That said, I have now tried the alternative: find_module() is around 50%
> slower, but building the cache is 10x faster. Perhaps this is a winner.
>

What is the time increase for find_module() vs. the speed-up of building
the cache? I.e. how many imports are needed before doing the full
calculation is a benefit? And would it make sense to have a hybrid of
caching the contents for fast start-up but then caching full details after
a successful find? That would mean no work is ever simply tossed out and
forgotten.

> It would depend on the situation (short or long sys.path, few or many
> imports, etc.). Perhaps you can try both patches on your bootstrap repo?
>

Yep, that's not hard (and it will only get faster as I replace the bodies
of __import__() and _gcd_import() with C code so that sys.modules is C-fast
again). Question is what to benchmark against? I should probably get the
standard benchmarks up and running and see how those are affected
(especially the start-up ones).

>
> >  Plus if the finder gets its cache invalidated frequently it  will
> > simply be wasting its time.
>
> Well, in real-world situations I don't think the cache will ever get
> invalidated: because imports are mostly done at startup, and because
> invalidating the cache means you are installing new libraries or
> updating existing ones while a running program is about to import
> something.
>

I agree, but it was just something to consider.

>
> > Otherwise it's good to know three of us now have independently come up
> > with fundamentally the same idea for speeding up imports. =)
>
> Yup :)
>
> --
>
> ___
> Python tracker 
> 
> ___
>

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Patch with light-weight alternative.

--
Added file: http://bugs.python.org/file24551/listdir_cache.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> Why pre-calculate everything? In the most common case any single
> module will be imported once, if at all. And once it is imported it
> will get cached in sys.modules, alleviating the need to hit the finder
> again. So from a performance standpoint wouldn't it be better not to
> do all of the pre-calculation and instead do that as needed assuming
> that sys.modules will shield the finder from having to do repetitive
> things like figuring out what loader is needed?

I figured it would avoid repetitive tests for all 10 suffixes.
That said, I have now tried the alternative: find_module() is around 50%
slower, but building the cache is 10x faster. Perhaps this is a winner.
It would depend on the situation (short or long sys.path, few or many
imports, etc.). Perhaps you can try both patches on your bootstrap repo?

>  Plus if the finder gets its cache invalidated frequently it  will
> simply be wasting its time.

Well, in real-world situations I don't think the cache will ever get
invalidated: because imports are mostly done at startup, and because
invalidating the cache means you are installing new libraries or
updating existing ones while a running program is about to import
something.

> Otherwise it's good to know three of us now have independently come up
> with fundamentally the same idea for speeding up imports. =)

Yup :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Brett Cannon

Brett Cannon  added the comment:

OK, I have gone ahead and done a review over on rietveld for the code as-is. 
But I do have a design question.

Why pre-calculate everything? In the most common case any single module will be 
imported once, if at all. And once it is imported it will get cached in 
sys.modules, alleviating the need to hit the finder again. So from a 
performance standpoint wouldn't it be better not to do all of the 
pre-calculation and instead do that as needed assuming that sys.modules will 
shield the finder from having to do repetitive things like figuring out what 
loader is needed? You will have to do the calculation regardless for the first 
import, but you might not need to import every module in a directory. Plus if 
the finder gets its cache invalidated frequently it  will simply be wasting its 
time.

I'm not going to argue from the perspective that most modules won't get 
imported as that's purely a stdlib thing; I am willing to bet most projects 
import nearly all of their modules in each execution.

Otherwise it's good to know three of us now have independently come up with 
fundamentally the same idea for speeding up imports. =)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> Just to fill you in, the discussion centred on the idea of doing a
> listdir() of the directory the FileFinder was in charge of watching
> and caching that. Then, when it had to look up a file all it had to do
> was stat the directory to look for a change before it simply looked at
> a frozenset of directory contents. That does away with having to stat
> for every file type (directory, module, extension, etc.) and replaces
> it with one upfront listdir() and then a single directory stat per
> lookup.

Ah, then it's basically what my patch does (except that it also computes
all potential module names instead of dumping the os.listdir() result in
a frozenset).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Brett Cannon

Brett Cannon  added the comment:

Just to fill you in, the discussion centred on the idea of doing a listdir() of 
the directory the FileFinder was in charge of watching and caching that. Then, 
when it had to look up a file all it had to do was stat the directory to look 
for a change before it simply looked at a frozenset of directory contents. That 
does away with having to stat for every file type (directory, module, 
extension, etc.) and replaces it with one upfront listdir() and then a single 
directory stat per lookup.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Regenerated patch.

--
Added file: http://bugs.python.org/file24550/find_module_cache3.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> I have not had a chance to deep-dive in the patch, but I just wanted
> to double-check that this (a) implemented the idea PJE and I discussed
> on python-dev

I haven't followed this discussion.

> , and (b) you re-generate the patch as I pushed some changes to
> importlib today.

Will do.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Brett Cannon

Brett Cannon  added the comment:

I have not had a chance to deep-dive in the patch, but I just wanted to 
double-check that this (a) implemented the idea PJE and I discussed on 
python-dev, and (b) you re-generate the patch as I pushed some changes to 
importlib today.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Updated patch with faster cache filling.

--
Added file: http://bugs.python.org/file24549/find_module_cache2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Note that the cost of filling the cache itself isn't trivial:

$ ./python -m timeit -s "import sys; from importlib import _bootstrap" 
"_bootstrap._file_path_hook('Lib')._fill_cache()"
1000 loops, best of 3: 1.88 msec per loop

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Changes by Antoine Pitrou :


Added file: http://bugs.python.org/file24548/find_module_cache.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

Changes by Antoine Pitrou :


Removed file: http://bugs.python.org/file24547/find_module_cache.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14043] Speed-up importlib's _FileFinder

2012-02-17 Thread Antoine Pitrou

New submission from Antoine Pitrou :

This patch makes importlib's _FileFinder more than 10x faster by keeping a 
cache of directory entries. It's actually faster than imp.find_module()!

-> imp.find_module:

$ ./python -m timeit -s "import imp" "imp.find_module('logging', None)"
1 loops, best of 3: 69.9 usec per loop
$ ./python -m timeit -s "import imp" "imp.find_module('bisect', None)"
1 loops, best of 3: 108 usec per loop

-> unpatched importlib:

$ ./python -m timeit -s "from importlib._bootstrap import _DefaultPathFinder as 
Finder" "Finder.find_module('logging', None)"
1000 loops, best of 3: 431 usec per loop
$ ./python -m timeit -s "from importlib._bootstrap import _DefaultPathFinder as 
Finder" "Finder.find_module('bisect', None)"
1000 loops, best of 3: 468 usec per loop

-> patched importlib:

$ ./python -m timeit -s "from importlib._bootstrap import _DefaultPathFinder as 
Finder" "Finder.find_module('logging', None)"
1 loops, best of 3: 37.5 usec per loop
$ ./python -m timeit -s "from importlib._bootstrap import _DefaultPathFinder as 
Finder" "Finder.find_module('bisect', None)"
1 loops, best of 3: 36.9 usec per loop

--
components: Interpreter Core, Library (Lib)
files: find_module_cache.patch
keywords: patch
messages: 153566
nosy: brett.cannon, pitrou
priority: normal
severity: normal
stage: patch review
status: open
title: Speed-up importlib's _FileFinder
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file24547/find_module_cache.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com