Re: [Python-Dev] PEP 3147: PYC Repository Directories

Brett Cannon Mon, 01 Feb 2010 11:37:14 -0800

On Sun, Jan 31, 2010 at 11:04, Raymond Hettinger
<raymond.hettin...@gmail.com> wrote:
>
> On Jan 30, 2010, at 4:00 PM, Barry Warsaw wrote:
>> Abstract
>> ========
>>
>> This PEP describes an extension to Python's import mechanism which
>> improves sharing of Python source code files among multiple installed
>> different versions of the Python interpreter.
>
> +1
>
>
>>  It does this by
>> allowing many different byte compilation files (.pyc files) to be
>> co-located with the Python source file (.py file).
>
> It would be nice if all the compilation files could be tucked
> into one single zipfile per directory to reduce directory clutter.
>
> It has several benefits besides tidiness. It hides the implementation
> details of when magic numbers get shifted.  And it may allow faster
> start-up times when the zipfile is in the disk cache.


It also eliminates stat calls. I have not seen anyone mention this,
but on filesystems where stat calls are expensive (e.g. NFS), the
PEP's directory layout is going to increase import cost (and thus
startup time, which some people are already incredibly paranoid
about). You shift from one stat call to check for a bytecode file to
two, just in the search phase, *per file check* (remember you need to
search for both module.py and module/__init__.py). And then you
potentially get to repeat all of this during the load process,
depending on how aggressively the loader caches.

As others have said, an uncompressed zip file could work here. Or even
a file format where the first 4 bytes are the timestamp, followed by
chunks of length-of-bytecode|magic|bytecode. That allows opening the
file in append mode to add more bytecode, instead of a zipfile's
requirement of rewriting the TOC at the end of the file every time you
mutate the file (if I remember the zip file format correctly). The
biggest cost in this simple approach would be reading the file in
(unless you mmap the thing when possible), since once read the data
will be a bytes object, which means constant-time indexing as you hop
from chunk to chunk until you find the right magic number. And adding
support to differentiate -O bytecode from regular bytecode is simply a
matter of adding a marker per chunk.
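
A minimal sketch of the append-only format described above may make the
idea concrete. Everything beyond "4-byte timestamp, then chunks of
length|magic|bytecode" is an assumption made for illustration: the
little-endian field layout, the one-byte -O marker, and the function
names create_repo, append_bytecode and find_bytecode are all
hypothetical, not something the PEP or this thread specifies.

```python
import struct

# Assumed layout (hypothetical): little-endian, no padding.
HEADER = struct.Struct("<I")    # source timestamp, 4 bytes
CHUNK = struct.Struct("<I4sB")  # bytecode length, 4-byte magic, -O marker


def create_repo(path, mtime):
    """Start a new file holding only the 4-byte timestamp."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(mtime & 0xFFFFFFFF))


def append_bytecode(path, magic, bytecode, optimized=False):
    """Add one chunk in append mode; existing chunks are never rewritten."""
    with open(path, "ab") as f:
        f.write(CHUNK.pack(len(bytecode), magic, int(optimized)))
        f.write(bytecode)


def find_bytecode(path, magic, optimized=False):
    """Return (timestamp, bytecode); bytecode is None if no chunk matches."""
    with open(path, "rb") as f:
        data = f.read()  # one read; everything after this is indexing
    (mtime,) = HEADER.unpack_from(data, 0)
    pos = HEADER.size
    while pos < len(data):
        length, chunk_magic, flag = CHUNK.unpack_from(data, pos)
        pos += CHUNK.size
        if chunk_magic == magic and bool(flag) == optimized:
            return mtime, data[pos:pos + length]
        pos += length  # skip this chunk's bytecode without parsing it
    return mtime, None
```

Each interpreter would call append_bytecode with its own magic number,
and find_bytecode skips from chunk header to chunk header, so the cost
of a lookup is dominated by the single read of the file.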

And I disagree with the PEP's suggestion that this would be difficult,
given the proper file format. For zip files, zipimport already has the
read code in C; it would just require code to write to a zip file. And
as for the format I mentioned above, that's dead-simple to implement.
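
For the zip variant, here is a rough sketch of the write and read sides
using the standard library's pure-Python zipfile module (not
zipimport's C code). The member naming scheme and the helper names
store_bytecode and load_bytecode are assumptions for illustration only.

```python
import zipfile


def store_bytecode(zip_path, member, bytecode):
    """Append one bytecode entry, stored uncompressed (ZIP_STORED)."""
    # Mode "a" creates the archive if it does not exist yet.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(member, bytecode)


def load_bytecode(zip_path, member):
    """Return the stored bytes for member, or None if it is absent."""
    with zipfile.ZipFile(zip_path) as zf:
        try:
            return zf.read(member)
        except KeyError:  # raised when the archive has no such member
            return None
```

One archive per directory could then hold an entry per module and
interpreter version, for example a hypothetical member name such as
"module.<magic>.pyc".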

-Brett