Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-09 Thread Nick Coghlan
Terry Reedy wrote:
> Definitely. I have even wondered whether it would be possible to cache
> not just the bytecode for initializing a module, but also the
> initialized module itself (perhaps minus the name bindings for other
> imported modules).

Not easily, since running the module may have other side effects that
can't be cached.
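
A minimal sketch of that point, assuming a hypothetical module body held in the string SOURCE and a list named events that stands in for state outside the module. The compiled code object can be marshalled and reused, but the effects of running it happen again on every load:

```python
import marshal
import types

# Hypothetical module source; `events` stands in for any state that lives
# outside the module (files, sockets, registries, other modules).
SOURCE = """
events.append("module body ran")
value = len(events)
"""

events = []

# This is the part a bytecode cache can store: the compiled code object.
cached_bytecode = marshal.dumps(compile(SOURCE, "<demo>", "exec"))


def load_from_cache(name):
    """Create a fresh module and run the cached code object in it."""
    module = types.ModuleType(name)
    module.events = events  # give the module body access to the outside state
    exec(marshal.loads(cached_bytecode), module.__dict__)
    return module


first = load_from_cache("demo")
second = load_from_cache("demo")

# The body ran twice, so the outside state was touched twice. Restoring a
# snapshot of `first` instead would have skipped the second side effect.
print(events)
print(first.value, second.value)
```

Caching the initialized module would have to reproduce or skip those outside effects, which is the hard part.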

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-08 Thread Nick Coghlan
Ron Adam wrote:
> To tell the truth in most cases I hardly notice the extra time the first
> run takes compared to later runs with the precompiled byte code.  Yes it
> may be a few seconds at start up, but after that it's usually not a big
> part of the execution time.  Hmmm, I wonder if there's a threshold in
> file size where it really doesn't make a significant difference?

It's relative to runtime for the application itself (long-running
applications aren't going to notice as much of a percentage effect on
runtime) as well as to how many Python files are actually imported at
startup (only importing a limited number of modules, importing primarily
extension modules or effective use of a lazy module loading mechanism
will all drastically reduce the proportional impact of precompiled bytecode).

We struggle enough with startup time that doing anything that makes it
slower is rather undesirable though.
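
One rough way to see the cost being discussed, as a sketch only: compare compiling a source string with unmarshalling an already compiled code object. SOURCE is a made-up stand-in for a module, and the numbers depend entirely on the machine and the Python build, so none are quoted here.

```python
import marshal
import timeit

# Hypothetical stand-in for a moderately sized module: 200 small functions.
SOURCE = "\n".join(
    f"def f{i}(x):\n    return x + {i}" for i in range(200)
)


def from_source():
    """What happens when no cached bytecode is available."""
    return compile(SOURCE, "<demo>", "exec")


CACHED = marshal.dumps(from_source())


def from_cache():
    """What happens when cached bytecode can be loaded."""
    return marshal.loads(CACHED)


compile_time = timeit.timeit(from_source, number=20)
load_time = timeit.timeit(from_cache, number=20)
print(f"compile: {compile_time:.4f}s  unmarshal: {load_time:.4f}s")
```

Multiplying the difference by the number of modules imported at startup gives a feel for the proportional impact.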

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-08 Thread Terry Reedy

On 2/8/2010 7:54 AM, Nick Coghlan wrote:
> Ron Adam wrote:
>> To tell the truth in most cases I hardly notice the extra time the first
>> run takes compared to later runs with the precompiled byte code.  Yes it
>> may be a few seconds at start up, but after that it's usually not a big
>> part of the execution time.  Hmmm, I wonder if there's a threshold in
>> file size where it really doesn't make a significant difference?
>
> It's relative to runtime for the application itself (long-running
> applications aren't going to notice as much of a percentage effect on
> runtime) as well as to how many Python files are actually imported at
> startup (only importing a limited number of modules, importing primarily
> extension modules or effective use of a lazy module loading mechanism
> will all drastically reduce the proportional impact of precompiled bytecode)
>
> We struggle enough with startup time that doing anything that makes it
> slower is rather undesirable though.


Definitely. I have even wondered whether it would be possible to cache 
not just the bytecode for initializing a module, but also the 
initialized module itself (perhaps minus the name bindings for other 
imported modules).


Terry Jan Reedy




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Feb 06, 2010, at 02:20 PM, Guido van Rossum wrote:

>> Upon further reflection, I agree.  __file__ also points to the source in
>> Python 2.7.
>
> Not in the 2.7 svn repo I have access to. It still points to the .pyc
> file if it was used.

Ah, I was fooled by a missing pyc file.  Run it a second time and you're
right, it points to the pyc.

> And I propose not to disturb this in 2.7, at least not by default. I'm
> fine though with a flag or distro-overridable config setting to change
> this behavior.

Cool.  I'm not sure this is absolutely necessary for Debian/Ubuntu, so I'll
call YAGNI on it for 2.x (until and unless it isn't ;).

>> Do we need an attribute to point to the compiled bytecode file?
>
> I think we do. Quite unrelated to this discussion I have a use case
> for knowing easily whether a module was actually loaded from bytecode
> or not -- but I also have a need for __file__ to point to the source.
> So having both __file__ and __compiled__ makes sense to me.

__compiled__ or __cached__?  I like the latter but don't have strong feelings
about it either way.

> When there is no source code but only bytecode I am fine with both
> pointing to the bytecode; in that case I presume that the bytecode is
> not in a __pyr__ subdirectory. For dynamically loaded extension
> modules I think both should be left unset, and some other __xxx__
> variable could point to the .so or .dll file. FWIW the most common use
> case for __file__ is probably to find data files relative to it. Since
> the data won't be in the __pyr__ directory we couldn't make __file__
> point to the __pyr__/pyc file without much code breakage.
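
To make the breakage concrete, here is a small sketch of the data-file idiom. The helper name data_file and both paths are hypothetical; the only point is what happens to the lookup when the value of __file__ moves into a cache subdirectory.

```python
from pathlib import PurePosixPath


def data_file(module_file, name):
    """Locate a data file that ships in the same directory as a module."""
    return PurePosixPath(module_file).parent / name


# Hypothetical paths, for illustration only.
print(data_file("/srv/app/pkg/mod.py", "defaults.cfg"))
# -> /srv/app/pkg/defaults.cfg

# If __file__ pointed at the cached bytecode instead, the same code would
# look in the wrong directory.
print(data_file("/srv/app/pkg/__pyr__/mod.pyc", "defaults.cfg"))
# -> /srv/app/pkg/__pyr__/defaults.cfg
```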

The other main use case for having such an attribute on extension modules is
diagnostics.  I want to be able to find out where on the file system a .so
actually lives:

Python 2.7a3+ (trunk:78030, Feb  6 2010, 15:18:29)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import _socket
>>> _socket.__file__
'/home/barry/projects/python/trunk/build/lib.linux-x86_64-2.7/_socket.so'

> (Yes, I am still in favor of the folder-per-folder model.)

Cool.
-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Feb 04, 2010, at 03:00 PM, Glenn Linderman wrote:

> When a PEP 3147 (if modified by my suggestion) version of Python runs,
> and the directory doesn't exist, and it wants to create a .pyc, it would
> create the directory, and put the .pyc there.  Sort of just like how it
> creates .pyc files, now, but an extra step of creating the repository
> directory if it doesn't exist.  After the first run, it would exist.  It
> is described in the PEP, and I quoted that section... Python will
> create a 'foo.pyr' directory... I'm just suggesting different semantics
> for how many directories, and what is contained in them.
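
The "extra step" can be sketched as follows. This is an illustration under stated assumptions, not the PEP's implementation: the function name compile_to_repository, the directory name __pyr__ and the per-version file name are all made up for the example. py_compile.compile() with its cfile argument is the standard library call that writes the bytecode.

```python
import os
import py_compile
import sys
import tempfile


def compile_to_repository(source_path, dirname="__pyr__"):
    """Compile source_path into a sibling repository directory.

    The directory name and the per-version file name are assumptions
    made for this sketch.
    """
    head, tail = os.path.split(source_path)
    repository = os.path.join(head, dirname)
    os.makedirs(repository, exist_ok=True)  # the extra step: create on first use
    tag = "py%d%d" % sys.version_info[:2]
    target = os.path.join(
        repository, os.path.splitext(tail)[0] + "." + tag + ".pyc"
    )
    py_compile.compile(source_path, cfile=target, doraise=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "foo.py")
    with open(source, "w") as f:
        f.write("answer = 42\n")
    first = compile_to_repository(source)   # creates the directory
    second = compile_to_repository(source)  # directory already exists
    created = os.path.isfile(first)
```

After the first call the directory exists, so later runs only pay for the existence check.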

I've added __pyr_version__ as an open question in the PEP (not yet committed),
as is making this default behavior (no -R flag required).

-Barry






Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Jan 31, 2010, at 01:06 PM, Ron Adam wrote:

> With a single cache directory, we could have an option to force writing
> bytecode to a desired location.  That might be useful on its own for
> creating runtime bytecode only installations for installers.

One important reason for wanting to keep the bytecode cache files colocated
with the source files is that I want to be able to continue to manipulate
$PYTHONPATH to control how Python finds its modules.  With a single
system-wide cache directory that won't be easy.  E.g. $PYTHONPATH might be
hacked to find the source file you expect, but how would that interact with
how Python finds its cache files?   I'm strongly in favor of keeping the cache
files as close to the source they were generated from as possible.

-Barry





Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Michael Foord

On 07/02/2010 17:48, Barry Warsaw wrote:

> [snip...]
>
>> And I propose not to disturb this in 2.7, at least not by default. I'm
>> fine though with a flag or distro-overridable config setting to change
>> this behavior.
>
> Cool.  I'm not sure this is absolutely necessary for Debian/Ubuntu, so I'll
> call YAGNI on it for 2.x (until and unless it isn't ;).


What are the chances of getting this into 2.x at all? For it to get into
2.7, likely to be the last major version in the 2.x series, the PEP
needs to be approved and the implementation needs to be feature complete
by April 3rd (first beta release according to the schedule [1]).


Michael Foord

[1] http://www.python.org/dev/peps/pep-0373/#release-schedule

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog





Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Jan 31, 2010, at 08:10 PM, Silke von Bargen wrote:

> Martin v. Löwis wrote:
>> There is also the issue of race conditions with multiple simultaneous
>> accesses. The original format for the PEP had race conditions for
>> multiple simultaneous writers; ZIP will also have race conditions for
>> concurrent readers/writers (as any new writer will have to overwrite
>> the central directory, making the zip file temporarily unavailable -
>> unless they copy it, in which case we are back to writer/writer
>> races).
>>
>> Regards,
>> Martin
>
> Good point. OTOH the probability for this to happen actually is very small.

And yet, when it does happen, it's probably a monster to debug and defend
against.   Unless we have a convincing cross-platform story for preventing
these race conditions, I think a single-file (e.g. zipfile) approach is
infeasible.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Jan 31, 2010, at 11:34 PM, Nick Coghlan wrote:

> I must admit I quite like the __pyr__ directory approach as well. Since
> the interpreter knows the suffix it is looking for, names shouldn't
> conflict. Using a single directory allows the name to be less cryptic,
> too (e.g. __pycache__).

Something else that occurs to me: the name of the directory (under the
folder-per-folder approach) probably ought to be the same as the name of the
module attribute.  There's probably no good reason to make it different, and
making it the same makes the association stronger.

That still gives us plenty of opportunity to bikeshed, but __pycache__ seems
reasonable to me (it's the cache of parsing and compiling the .py file).

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Feb 01, 2010, at 08:26 AM, Tim Delaney wrote:

> The pyc/pyo files are just an optimisation detail, and are essentially
> temporary. Given that, if they were to live in a single directory, to me it
> seems obvious that the default location for that should be in the system
> temporary directory. I can immediately think of the following advantages:
>
> 1. No one really complains too much about putting things in /tmp unless it
> starts taking up too much space. In which case they delete it and if it gets
> reused, it gets recreated.

IIUC the Filesystem Hierarchy Standard correctly, then these files really
should go under /var/cache/python.  (Don't ask me where that would be on
non-FHS compliant systems <cough>Windows</cough>).  I've explained in other
followups why I don't particularly like separating the source from the cache
files though, but if you wanted a sick approach:

Take the full absolute path to the .py file, plus the magic number, plus the
time stamp and hash that.  Cache the pyc file under /var/cache/python/<hash>.
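
A sketch of what that naming scheme could look like. The function name hashed_cache_path, the NUL separator and the choice of SHA-256 are assumptions, since the message does not name a hash function; importlib.util.MAGIC_NUMBER is the standard library's bytecode magic number for the running interpreter. Nothing is written to disk here.

```python
import hashlib
import importlib.util
import os

CACHE_ROOT = "/var/cache/python"  # from the message; this sketch never writes to it


def hashed_cache_path(source_path, mtime):
    """Map (absolute path, magic number, timestamp) to one cache file name."""
    key = b"\0".join([
        os.fsencode(os.path.abspath(source_path)),
        importlib.util.MAGIC_NUMBER,
        str(int(mtime)).encode("ascii"),
    ])
    # SHA-256 is an assumption; the message does not name a hash function.
    return os.path.join(CACHE_ROOT, hashlib.sha256(key).hexdigest())


old = hashed_cache_path("pkg/mod.py", 1265500000)
new = hashed_cache_path("pkg/mod.py", 1265500001)
print(old)
print(new)
```

One consequence of hashing the timestamp into the name is that every edit of the source leaves the previous cache file behind under a different name, so something would have to clean those up.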

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Feb 06, 2010, at 04:02 PM, Guido van Rossum wrote:

> On Sat, Feb 6, 2010 at 3:28 PM, Barry Warsaw ba...@python.org wrote:
>> On Feb 01, 2010, at 02:04 PM, Paul Du Bois wrote:
>>
>>> It's an interesting challenge to write the file in such a way that
>>> it's safe for a reader and writer to co-exist. Like Brett, I
>>> considered an append-only scheme, but one needs to handle the case
>>> where the bytecode for a particular magic number changes. At some
>>> point you'd need to sweep garbage from the file. All solutions seem
>>> unnecessarily complex, and unnecessary since in practice the case
>>> should not come up.
>>
>> I don't think that part's difficult.  The byte code's only going to change if
>> the source file has changed, and in that case, /all/ the byte code in the fat
>> pyc file will be invalidated, so the whole thing can be deleted by the first
>> writer.  I'd worked that out in the original fat pyc version of the PEP.
>
> I'm sorry, but I'm totally against fat bytecode files. They make
> things harder for all tools. The beauty of the existing bytecode
> format is that it's totally trivial: magic number, source mtime,
> marshalled code object. You can't beat the beauty of that.
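
For readers who have not looked inside a .pyc, the three-part layout named above can be sketched like this. The helper names are hypothetical, and the 4-byte little-endian mtime field reflects the format as it stood at the time of this thread; later CPython releases use a longer header, so this is not a reader for current .pyc files.

```python
import importlib.util
import marshal
import struct

MAGIC = importlib.util.MAGIC_NUMBER  # 4 bytes identifying the bytecode version


def build_skinny_pyc(source_text, filename, mtime):
    """Return magic number + source mtime + marshalled code object."""
    code = compile(source_text, filename, "exec")
    return MAGIC + struct.pack("<I", int(mtime) & 0xFFFFFFFF) + marshal.dumps(code)


def read_skinny_pyc(data):
    """Return (mtime, code), or None when the magic number does not match."""
    if data[:4] != MAGIC:
        return None
    (mtime,) = struct.unpack("<I", data[4:8])
    return mtime, marshal.loads(data[8:])


blob = build_skinny_pyc("answer = 6 * 7\n", "<demo>", 1265500000)
mtime, code = read_skinny_pyc(blob)
namespace = {}
exec(code, namespace)
print(mtime, namespace["answer"])
```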

Just for the record, I totally agree.  I was just explaining something I had
figured out in the original version of the PEP, which wasn't published but
which Martin had seen an early draft of.  When Martin made the suggestion of
sibling cache directories, I immediately realized that it was much cleaner,
better, and easier to implement than fat files (especially because I already
had some nasty complex code that implemented the fat files ;).  I'm beginning
to be convinced <wink> that a folder-per-folder approach is the best take on
this yet.

> For the traditional skinny bytecode files, I believe that the
> existing algorithm which writes zeros in the place of the magic number
> first, writes the rest of the file, and then goes back to write the
> correct magic number, is correct with a single writer and multiple
> readers (assuming the readers ignore the file if its magic number is
> invalid). The creat(O_EXCL) option ensures that there won't be
> multiple writers. No rename() is necessary; POSIX rename() may be
> atomic, but it's a directory modification which makes it potentially
> slow.
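
The algorithm described above, sketched in Python rather than the interpreter's C. The function names and the placeholder magic value are made up for the example; os.O_EXCL together with os.O_CREAT is what makes a second writer fail instead of clobbering the file.

```python
import os
import tempfile


def write_cache_file(path, magic, payload):
    """Single-writer protocol: zeros first, payload next, real magic last."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)  # raises FileExistsError for a second writer
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * len(magic))   # readers that arrive now see a bad magic
        f.write(payload)
        f.flush()
        f.seek(0)
        f.write(magic)                # only now does the file become valid


def read_cache_file(path, magic):
    """Return the payload, or None if the file is not (yet) valid."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:len(magic)] != magic:
        return None
    return data[len(magic):]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mod.pyc")
    write_cache_file(path, b"MAGC", b"payload")  # b"MAGC" is a placeholder
    roundtrip = read_cache_file(path, b"MAGC")
    try:
        write_cache_file(path, b"MAGC", b"other")
        second_writer_blocked = False
    except FileExistsError:
        second_writer_blocked = True
```

A reader that opens the file between the first and the last write sees zeros where the magic number should be and simply ignores the file.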

Agreed, and the current approach is time and battle tested.  I don't think we
need to be mucking around with it.

My current effort on this PEP will be spent on fleshing out the
folder-per-folder approach, understanding the implications of that, and
integrating all the other great comments in this thread.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Feb 06, 2010, at 04:39 PM, Guido van Rossum wrote:

> The conflict is purely that PEP 3147 proposes the new behavior to be
> optional, and adds a flag (-R) and an environment variable
> ($PYTHONPYR) to change it. I presume Barry is proposing this out of
> fear that the new behavior might upset somebody; personally I think it
> would be better if the behavior weren't optional. At least not in new
> Python releases

Good to know!  Yes, that's one reason why I made it optional, the other being
that I suspect most people don't care about the original use case (making sure
pyc files from different Python versions don't conflict).  However, with a
folder-per-folder approach, the side benefit of reducing directory clutter by
hiding all the pyc files becomes more compelling.

> -- in backports such as a distribution that wants this
> feature might make, it may make sense to be more conservative, or at
> least to have a way to turn it off.

For backports I think the most conservative approach is to require a flag to
enable this behavior.  If we make this the default for new versions of Python
(something I'd support) then tools written for Python >= 3.2 will know this is
just how it's done.  I worry about existing deployed tools for Python < 2.7
and 3.1.

How about this: enable it by default in 3.2 and 2.7.  No option to disable it.
Allow distro backports to define a flag or environment variable to enable it.
The PEP can even be silent about how that's actually done, and a Debian
implementation for Python 2.6 or 3.1 could even use the (now documented :) -X
flag.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Barry Warsaw
On Feb 07, 2010, at 05:59 PM, Michael Foord wrote:

> On 07/02/2010 17:48, Barry Warsaw wrote:
>> [snip...]
>>> And I propose not to disturb this in 2.7, at least not by default. I'm
>>> fine though with a flag or distro-overridable config setting to change
>>> this behavior.
>>
>> Cool.  I'm not sure this is absolutely necessary for Debian/Ubuntu, so I'll
>> call YAGNI on it for 2.x (until and unless it isn't ;).

Sorry, I was calling YAGNI on any change in behavior of module.__file__.

> What are the chances of getting this into 2.x at all? For it to get into
> 2.7, likely to be the last major version in the 2.x series, the PEP
> needs to be approved and the implementation needs to be feature complete
> by April 3rd (first beta release according to the schedule [1]).

I'd like to consult with my Debian/Ubuntu Python maintainer colleagues to see
if it's worth getting into 2.7.  If it is, and we can get a BDFL pronouncement
on the PEP (after the next rounds of updates), then I think it will be
feasible to implement in the time remaining.  Heck, that's what Pycon sprints
are for, no? :)

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread M.-A. Lemburg
Barry Warsaw wrote:
> On Feb 03, 2010, at 11:59 AM, M.-A. Lemburg wrote:
>
>> How about using an optionally relative cache dir setting to let
>> the user decide ?
>
> Why do we need that level of flexibility?

It's very easy to implement (see the code I posted) and gives
you a lot of control with a single env variable.

Some use cases:

1. PYTHONCACHE=. (store the cache files in the same dir as the
   .py file)

   This setting mimics what we've had in Python for decades. Users
   know about this Python behavior and expect it.

   It's also the only reasonable way of shipping byte-code only
   packages.

2. PYTHONCACHE=.pycache (store the cache files in a subdir of the
   dir where the .py file is stored)

   When using lots of cache files for multiple Python versions or
   variants, the .py source code directory can easily get cluttered
   with too many such files.

   Putting them into a subdir solves this problem. This would be
   useful for developers running and testing the code with different
   Python versions.

3. PYTHONCACHE=~/.python/cache (store the cache files in a user dir,
   outside the Python source file dir)

   This allows easy removal of all cache files and prevents
   cluttering up the sys.path dirs with cache files or directories
   altogether.

   It's also handy if the source code dirs are not writable by
   the user importing them. OTOH, every user would create a copy
   of the cache files (this is what currently happens with setuptools
   eggs and is very annoying).
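
The three use cases boil down to one resolution rule, sketched here. This is not the code referred to earlier in this message; the function name cache_dir_for and the example source path are hypothetical, and posixpath is used so the printed results are the same on every platform. The sketch only resolves the directory and says nothing about how files from different source dirs would be kept apart in a shared cache dir.

```python
import posixpath


def cache_dir_for(source_path, setting):
    """Resolve a PYTHONCACHE-style setting for one source file.

    Relative settings are taken relative to the directory of the source
    file; absolute or ~-prefixed settings are used as they are.
    """
    source_dir = posixpath.dirname(posixpath.abspath(source_path))
    setting = posixpath.expanduser(setting)
    if posixpath.isabs(setting):
        return setting
    return posixpath.normpath(posixpath.join(source_dir, setting))


source = "/srv/app/pkg/mod.py"  # hypothetical source file
print(cache_dir_for(source, "."))                # -> /srv/app/pkg
print(cache_dir_for(source, ".pycache"))         # -> /srv/app/pkg/.pycache
print(cache_dir_for(source, "~/.python/cache"))  # depends on the user's home dir
```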

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Guido van Rossum
On Sun, Feb 7, 2010 at 10:17 AM, Barry Warsaw ba...@python.org wrote:
> On Jan 31, 2010, at 11:34 PM, Nick Coghlan wrote:
>
>> I must admit I quite like the __pyr__ directory approach as well. Since
>> the interpreter knows the suffix it is looking for, names shouldn't
>> conflict. Using a single directory allows the name to be less cryptic,
>> too (e.g. __pycache__).
>
> Something else that occurs to me; the name of the directory (under
> folder-per-folder approach) probably ought to be the same as the name of the
> module attribute.  There's probably no good reason to make it different, and
> making it the same makes the association stronger.

I'm not sure I follow. The directory doesn't suddenly become an
attribute. Moreover, the directory contains many files (assuming
folder-per-folder) and the attribute would point to a single file
inside that directory.

> That still gives us plenty of opportunity to bikeshed, but __pycache__ seems
> reasonable to me (it's the cache of parsing and compiling the .py file).

While technically it is a cache, I don't think that emphasizing that
point is helpful. For 20 years people have thought of it as compiled
bytecode.

Also, while on the filesystem it makes sense for it to have "py" in the
directory name, that does not make sense for the attribute name. After
all we don't go around calling things __pyfile__, __pygetattr__,
__pysys__... ;-)

I'm still for __compiled__ as the attribute; I don't have a particular
preference for the directory name or the naming scheme used inside it,
as long as neither starts with '.' (and probably the directory should
be __something__).

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Brett Cannon
On Sun, Feb 7, 2010 at 10:44, Barry Warsaw ba...@python.org wrote:
> On Feb 06, 2010, at 04:39 PM, Guido van Rossum wrote:
>
>> The conflict is purely that PEP 3147 proposes the new behavior to be
>> optional, and adds a flag (-R) and an environment variable
>> ($PYTHONPYR) to change it. I presume Barry is proposing this out of
>> fear that the new behavior might upset somebody; personally I think it
>> would be better if the behavior weren't optional. At least not in new
>> Python releases
>
> Good to know!  Yes, that's one reason why I made it optional, the other being
> that I suspect most people don't care about the original use case (making sure
> pyc files from different Python versions don't conflict).  However, with a
> folder-per-folder approach, the side benefit of reducing directory clutter by
> hiding all the pyc files becomes more compelling.
>
>> -- in backports such as a distribution that wants this
>> feature might make, it may make sense to be more conservative, or at
>> least to have a way to turn it off.
>
> For backports I think the most conservative approach is to require a flag to
> enable this behavior.  If we make this the default for new versions of Python
> (something I'd support) then tools written for Python >= 3.2 will know this is
> just how it's done.  I worry about existing deployed tools for Python < 2.7
> and 3.1.
>
> How about this: enable it by default in 3.2 and 2.7.  No option to disable it.
> Allow distro backports to define a flag or environment variable to enable it.
> The PEP can even be silent about how that's actually done, and a Debian
> implementation for Python 2.6 or 3.1 could even use the (now documented :) -X
> flag.

Would you keep the old behavior around as well, or simply drop it? I
personally vote for the latter for simplicity and performance reasons
(by not having to look in so many places for bytecode), but I can see
tool people who magically calculate the location of the bytecode not
loving the idea (another reason why giving loaders a method to return
all relevant paths is a good idea; no more guessing).
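
The difference between guessing and asking can be sketched as below. importlib.util.cache_from_source() is a standard library function in later Python 3 releases and postdates this thread; the two wrapper names and the example path are hypothetical.

```python
import importlib.util


def guessed_bytecode_path(source_path):
    """The hard-coded guess: foo.py is compiled to foo.pyc right next to it."""
    return source_path + "c"


def reported_bytecode_path(source_path):
    """Ask the import machinery where the bytecode for source_path belongs."""
    return importlib.util.cache_from_source(source_path)


source = "pkg/mod.py"  # hypothetical module
print(guessed_bytecode_path(source))
print(reported_bytecode_path(source))
```

A tool written against the second function keeps working if the on-disk layout changes again; a tool built on the first does not.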

-Brett





Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Guido van Rossum
On Sun, Feb 7, 2010 at 12:23 PM, Brett Cannon br...@python.org wrote:
> On Sun, Feb 7, 2010 at 10:44, Barry Warsaw ba...@python.org wrote:
>> On Feb 06, 2010, at 04:39 PM, Guido van Rossum wrote:
>>
>>> The conflict is purely that PEP 3147 proposes the new behavior to be
>>> optional, and adds a flag (-R) and an environment variable
>>> ($PYTHONPYR) to change it. I presume Barry is proposing this out of
>>> fear that the new behavior might upset somebody; personally I think it
>>> would be better if the behavior weren't optional. At least not in new
>>> Python releases
>>
>> Good to know!  Yes, that's one reason why I made it optional, the other being
>> that I suspect most people don't care about the original use case (making sure
>> pyc files from different Python versions don't conflict).  However, with a
>> folder-per-folder approach, the side benefit of reducing directory clutter by
>> hiding all the pyc files becomes more compelling.
>>
>>> -- in backports such as a distribution that wants this
>>> feature might make, it may make sense to be more conservative, or at
>>> least to have a way to turn it off.
>>
>> For backports I think the most conservative approach is to require a flag to
>> enable this behavior.  If we make this the default for new versions of Python
>> (something I'd support) then tools written for Python >= 3.2 will know this is
>> just how it's done.  I worry about existing deployed tools for Python < 2.7
>> and 3.1.
>>
>> How about this: enable it by default in 3.2 and 2.7.  No option to disable it.
>> Allow distro backports to define a flag or environment variable to enable it.
>> The PEP can even be silent about how that's actually done, and a Debian
>> implementation for Python 2.6 or 3.1 could even use the (now documented :) -X
>> flag.
>
> Would you keep the old behavior around as well, or simply drop it? I
> personally vote for the latter for simplicity and performance reasons
> (by not having to look in so many places for bytecode), but I can see
> tool people who magically calculate the location of the bytecode not
> loving the idea (another reason why giving loaders a method to return
> all relevant paths is a good idea; no more guessing).

For 3.2 I think it's fine to simply drop the old behavior (as long as
a good loader API is added at the same time).

But for 2.7 I think we ought to be a lot more conservative and not
force tools to upgrade, so I think we should keep the old behavior in
2.7 as the default (though distros can change this if they want to,
and backport if they need to).

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-07 Thread Ron Adam



Barry Warsaw wrote:

> On Jan 31, 2010, at 01:06 PM, Ron Adam wrote:
>
>> With a single cache directory, we could have an option to force writing
>> bytecode to a desired location.  That might be useful on its own for
>> creating runtime bytecode only installations for installers.
>
> One important reason for wanting to keep the bytecode cache files colocated
> with the source files is that I want to be able to continue to manipulate
> $PYTHONPATH to control how Python finds its modules.  With a single
> system-wide cache directory that won't be easy.  E.g. $PYTHONPATH might be
> hacked to find the source file you expect, but how would that interact with
> how Python finds its cache files?   I'm strongly in favor of keeping the cache
> files as close to the source they were generated from as possible.


Yes, I agree. After thinking about it, it does seem like it may be more
complex than I first thought.


I think the folder-per-folder option sounds like the best default option at 
this time.  It reduces folder clutter for the python developer and may 
loosen the link between source files and byte code files just enough that 
it will be easier to experiment with more flexible modes later.




It seems to me that in the long run (probably no time soon), it might be
nice to even do away with on-disk byte code altogether unless it's
explicitly asked for. As computers get faster, the time it takes to compile
byte code may become a smaller and smaller percentage of the total run time.
That is, unless the size of Python programs increases at the same rate or faster.


To tell the truth in most cases I hardly notice the extra time the first 
run takes compared to later runs with the precompiled byte code.  Yes it 
may be a few seconds at start up, but after that it's usually not a big 
part of the execution time.  Hmmm, I wonder if there's a threshold in file 
size where it really doesn't make a significant difference?


Regards,
  Ron



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Barry Warsaw
On Feb 03, 2010, at 01:17 PM, Guido van Rossum wrote:

> Can you clarify? In Python 3, __file__ always points to the source.
> Clearly that is the way of the future. For 99.99% of uses of __file__,
> if it suddenly never pointed to a .pyc file any more (even if one
> existed) that would be just fine. So what's this talk of switching to
> __source__?

Upon further reflection, I agree.  __file__ also points to the source in
Python 2.7.  Do we need an attribute to point to the compiled bytecode file?

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Barry Warsaw
On Feb 05, 2010, at 07:37 PM, Nick Coghlan wrote:

Brett Cannon wrote:
 Does code exist out there where people are constructing bytecode from
  multiple files for a single module?

I'm quite prepared to call YAGNI on that idea and just return a 2-tuple
of source filename and compiled filename.

Me too.  I think a 2-tuple of (source-path, compiled-path) is probably going
to be fine for all practical purposes.  I'd assign the former to a module's
__file__ (as is done today in Python = 2.7) and the latter to a module's
__cached__.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Barry Warsaw
On Feb 03, 2010, at 11:59 AM, M.-A. Lemburg wrote:

How about using an optionally relative cache dir setting to let
the user decide ?

Why do we need that level of flexibility?

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Barry Warsaw
On Feb 03, 2010, at 11:07 PM, Nick Coghlan wrote:

It's also the case that having to run Python to manage my own filesystem
would be very annoying. If a dev has a broken .pyc that prevents the
affected Python build from even starting how are they meant to use the
nonfunctioning interpreter to find and delete the offending file? How is
someone meant to find and delete the .pyc files if they prefer to use a
graphical file manager over (or in conjunction with) the command line?

I agree.  I'd prefer to have a predictable place for the cached files,
independent of having to run Python to tell you where that is.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Barry Warsaw
On Feb 03, 2010, at 09:26 AM, Floris Bruynooghe wrote:

On Wed, Feb 03, 2010 at 06:14:44PM +1100, Ben Finney wrote:
 I don't understand the distinction you're making between those two
 options. Can you explain what you mean by each of “siblings” and
 “folder-per-folder”?

siblings: the original proposal, i.e.:

foo.py
foo.pyr/
    MAGIC1.pyc
    MAGIC1.pyo
    ...
bar.py
bar.pyr/
    MAGIC1.pyc
    MAGIC1.pyo
    ...

folder-per-folder:

foo.py
bar.py
__pyr__/
    foo.MAGIC1.pyc
    foo.MAGIC1.pyo
    foo.MAGIC2.pyc
    bar.MAGIC1.pyc
    ...

IIUC

Correct.  If necessary, I'll define those two terms in the PEP.
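
The two layouts can be sketched as path computations. This is only an illustration: the helper names `sibling_path` and `folder_per_folder_path` are made up here, and `MAGIC1` stands in for whatever tag ends up identifying the bytecode version.

```python
import os


def sibling_path(source, magic, ext=".pyc"):
    """Siblings layout: pkg/foo.py -> pkg/foo.pyr/MAGIC.pyc (hypothetical helper)."""
    root, _ = os.path.splitext(source)
    return os.path.join(root + ".pyr", magic + ext)


def folder_per_folder_path(source, magic, ext=".pyc"):
    """Folder-per-folder layout: pkg/foo.py -> pkg/__pyr__/foo.MAGIC.pyc
    (hypothetical helper)."""
    directory, filename = os.path.split(source)
    name, _ = os.path.splitext(filename)
    return os.path.join(directory, "__pyr__", name + "." + magic + ext)
```

In the first layout every source file gets its own directory; in the second, each source directory gets exactly one extra entry no matter how many modules it holds.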

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread exarkun

On 08:21 pm, ba...@python.org wrote:

On Feb 03, 2010, at 01:17 PM, Guido van Rossum wrote:

Can you clarify? In Python 3, __file__ always points to the source.
Clearly that is the way of the future. For 99.99% of uses of __file__,
if it suddenly never pointed to a .pyc file any more (even if one
existed) that would be just fine. So what's this talk of switching to
__source__?


Upon further reflection, I agree.  __file__ also points to the source 
in
Python 2.7.  Do we need an attribute to point to the compiled bytecode 
file?


What if, instead of trying to annotate the module object with this 
assortment of metadata - metadata which depends on lots of things, and 
can vary from interpreter to interpreter, and even from module to module 
(depending on how it was loaded) - we just stuck with the __loader__ 
annotation, and encouraged/allowed/facilitated the use of the loader 
object to learn all of this extra information?


Jean-Paul


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Guido van Rossum
On Sat, Feb 6, 2010 at 12:21 PM, Barry Warsaw ba...@python.org wrote:
 On Feb 03, 2010, at 01:17 PM, Guido van Rossum wrote:
Can you clarify? In Python 3, __file__ always points to the source.
Clearly that is the way of the future. For 99.99% of uses of __file__,
if it suddenly never pointed to a .pyc file any more (even if one
existed) that would be just fine. So what's this talk of switching to
__source__?

 Upon further reflection, I agree.  __file__ also points to the source in
 Python 2.7.

Not in the 2.7 svn repo I have access to. It still points to the .pyc
file if it was used.

And I propose not to disturb this in 2.7, at least not by default. I'm
fine though with a flag or distro-overridable config setting to change
this behavior.

 Do we need an attribute to point to the compiled bytecode file?

I think we do. Quite unrelated to this discussion I have a use case
for knowing easily whether a module was actually loaded from bytecode
or not -- but I also have a need for __file__ to point to the source.
So having both __file__ and __compiled__ makes sense to me.

When there is no source code but only bytecode I am fine with both
pointing to the bytecode; in that case I presume that the bytecode is
not in a __pyr__ subdirectory. For dynamically loaded extension
modules I think both should be left unset, and some other __xxx__
variable could point to the .so or .dll file. FWIW the most common use
case for __file__ is probably to find data files relative to it. Since
the data won't be in the __pyr__ directory we couldn't make __file__
point to the __pyr__/pyc file without much code breakage.
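
To illustrate the data-file use case: a minimal sketch, assuming a hypothetical helper named `data_path` that resolves files relative to a module's `__file__`. It is not part of any library.

```python
import os


def data_path(module_file, *parts):
    """Return the path of a data file that lives next to a module.

    `module_file` is the value of the module's __file__ attribute.
    """
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), *parts)
```

With `__file__` set to `pkg/foo.py`, `data_path(__file__, "data.txt")` resolves inside `pkg/`. If `__file__` were `pkg/__pyr__/foo.pyc` instead, the same call would look inside `pkg/__pyr__/`, where the data file does not live.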

(Yes, I am still in favor of the folder-per-folder model.)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Barry Warsaw
On Jan 31, 2010, at 11:04 AM, Raymond Hettinger wrote:

  It does this by
 allowing many different byte compilation files (.pyc files) to be
 co-located with the Python source file (.py file).  

It would be nice if all the compilation files could be tucked
into one single zipfile per directory to reduce directory clutter.

It has several benefits besides tidiness. It hides the implementation
details of when magic numbers get shifted.  And it may allow faster
start-up times when the zipfile is in the disk cache.

This is closer in spirit to the original (uncirculated) PEP which called for
fat pyc files, but without the complicated implementation details.  It's still
an interesting approach to explore.

Writer concurrency can be handled with dot-lock files, but that does incur
some extra overhead, such as the remove() of the lock file.

-Barry





Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Barry Warsaw
On Feb 01, 2010, at 02:04 PM, Paul Du Bois wrote:

It's an interesting challenge to write the file in such a way that
it's safe for a reader and writer to co-exist. Like Brett, I
considered an append-only scheme, but one needs to handle the case
where the bytecode for a particular magic number changes. At some
point you'd need to sweep garbage from the file. All solutions seem
unnecessarily complex, and unnecessary since in practice the case
should not come up.

I don't think that part's difficult.  The byte code's only going to change if
the source file has changed, and in that case, /all/ the byte code in the fat
pyc file will be invalidated, so the whole thing can be deleted by the first
writer.  I'd worked that out in the original fat pyc version of the PEP.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Barry Warsaw
On Feb 01, 2010, at 11:28 PM, Martin v. Löwis wrote:

So what would you do for concurrent writers, then? The current
implementation relies on creat(O_EXCL) to be atomic, so a second
writer would just fail. This is but the only IO operation that is
guaranteed to be atomic (along with mkdir(2)), so reusing the current
approach doesn't work.

I believe rename(2) is atomic also, at least on POSIX.  I'm not sure if that
helps us though.
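
For reference, the usual write-to-a-temporary-file-then-rename pattern looks roughly like the sketch below. It is a generic illustration, not what the import machinery does, and `atomic_write` is a made-up name.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # maps to rename(2) on POSIX
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that this handles a reader racing a writer, but two concurrent writers still both do the work; the last rename wins.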

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Guido van Rossum
On Sat, Feb 6, 2010 at 3:28 PM, Barry Warsaw ba...@python.org wrote:
 On Feb 01, 2010, at 02:04 PM, Paul Du Bois wrote:

It's an interesting challenge to write the file in such a way that
it's safe for a reader and writer to co-exist. Like Brett, I
considered an append-only scheme, but one needs to handle the case
where the bytecode for a particular magic number changes. At some
point you'd need to sweep garbage from the file. All solutions seem
unnecessarily complex, and unnecessary since in practice the case
should not come up.

 I don't think that part's difficult.  The byte code's only going to change if
 the source file has changed, and in that case, /all/ the byte code in the fat
 pyc file will be invalidated, so the whole thing can be deleted by the first
 writer.  I'd worked that out in the original fat pyc version of the PEP.

I'm sorry, but I'm totally against fat bytecode files. They make
things harder for all tools. The beauty of the existing bytecode
format is that it's totally trivial: magic number, source mtime,
marshalled code object. You can't beat the beauty of that.

For the traditional skinny bytecode files, I believe that the
existing algorithm which writes zeros in the place of the magic number
first, writes the rest of the file, and then goes back to write the
correct magic number, is correct with a single writer and multiple
readers (assuming the readers ignore the file if its magic number is
invalid). The creat(O_EXCL) option ensures that there won't be
multiple writers. No rename() is necessary; POSIX rename() may be
atomic, but it's a directory modification which makes it potentially
slow.
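
The single-writer, multiple-reader scheme described above can be sketched in Python as follows. This assumes the layout named in this message (4-byte magic number, 4-byte source mtime, marshalled code object); bytecode files in current CPython releases carry a longer header. `MAGIC` is a placeholder value, and `write_pyc` and `read_pyc` are hypothetical names.

```python
import marshal
import os
import struct

MAGIC = b"\x01\x02\r\n"  # placeholder, not a real CPython magic number


def write_pyc(path, code, mtime):
    """Write a bytecode file; return False if another writer got there first."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    try:
        fd = os.open(path, flags, 0o644)  # O_EXCL: fails if the file exists
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * 4)                         # invalid magic for now
        f.write(struct.pack("<I", mtime & 0xFFFFFFFF))
        f.write(marshal.dumps(code))
        f.flush()
        f.seek(0)
        f.write(MAGIC)                             # the file becomes valid last
    return True


def read_pyc(path):
    """Return the code object, or None if the magic number is not valid."""
    with open(path, "rb") as f:
        if f.read(4) != MAGIC:
            return None                            # incomplete or foreign file
        f.read(4)                                  # skip the source mtime
        return marshal.loads(f.read())
```

A reader that opens the file while the writer is still working sees four zero bytes where the magic number should be and simply ignores the file.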

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Ben Finney
Barry Warsaw ba...@python.org writes:

 On Feb 03, 2010, at 11:07 PM, Nick Coghlan wrote:

 It's also the case that having to run Python to manage my own
 filesystem would be very annoying.
[…]

Files that are problematic wouldn't need Python to manage any more than
currently. The suggestion was just that, a suggestion for Python to
expose information to assist; it wouldn't be required.

 I agree. I'd prefer to have a predictable place for the cached files,
 independent of having to run Python to tell you where that is.

Right; I don't see who would disagree with that. I don't see any
conflict between “decouple compiled bytecode file locations from source
file locations” versus “predictable location for the compiled bytecode
files”.

-- 
 \ “All television is educational television. The question is: |
  `\   what is it teaching?” —Nicholas Johnson |
_o__)  |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Guido van Rossum
On Sat, Feb 6, 2010 at 4:27 PM, Ben Finney ben+pyt...@benfinney.id.au wrote:
 Barry Warsaw ba...@python.org writes:
 I agree. I'd prefer to have a predictable place for the cached files,
 independent of having to run Python to tell you where that is.

 Right; I don't see who would disagree with that. I don't see any
 conflict between “decouple compiled bytecode file locations from source
 file locations” versus “predictable location for the compiled bytecode
 files”.

The conflict is purely that PEP 3147 proposes the new behavior to be
optional, and adds a flag (-R) and an environment variable
($PYTHONPYR) to change it. I presume Barry is proposing this out of
fear that the new behavior might upset somebody; personally I think it
would be better if the behavior weren't optional. At least not in new
Python releases -- in backports such as a distribution that wants this
feature might make, it may make sense to be more conservative, or at
least to have a way to turn it off.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Nick Coghlan
Ben Finney wrote:
 Right; I don't see who would disagree with that. I don't see any
 conflict between “decouple compiled bytecode file locations from source
 file locations” versus “predictable location for the compiled bytecode
 files”.

The more decoupled they are, the harder it is to manually find the
bytecode file.

With the current .pyc scheme, .pyr folders or an SVN style Python cache
directory, finding the bytecode file is pretty easy, since the cached
file is either in the same directory as the source file or in a
subdirectory.

With any form of shadow hierarchy though, it gets trickier because you
have to:
1. Find the root of the shadow hierarchy
2. Navigate within the shadow hierarchy down to the point that matches
where your source file was

It's a fairly significant increase in mental overhead. It gets much
worse if the location of the shadow hierarchy root is configurable in
any way (e.g. based on sys.path contents or an environment variable).

Restricting the caching mechanism to the folder containing the source
file keeps things a lot simpler.
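
The difference in what you need to know can be sketched as two lookups. Both function names are hypothetical, and the `__pyr__` directory name is just the one used so far in this thread.

```python
import os


def local_cache_dir(source):
    """Folder-per-folder: the cache sits beside the source file."""
    return os.path.join(os.path.dirname(source), "__pyr__")


def shadow_cache_dir(source, source_root, shadow_root):
    """Shadow hierarchy: mirror the source tree under a separate root.

    The caller has to know both roots before the lookup can even start.
    """
    relative = os.path.relpath(os.path.dirname(source), source_root)
    return os.path.normpath(os.path.join(shadow_root, relative))
```

The first function needs only the source path; the second needs two extra pieces of configuration, which is the mental overhead in question.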

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Guido van Rossum
On Sat, Feb 6, 2010 at 5:10 PM, Nick Coghlan ncogh...@gmail.com wrote:
 Ben Finney wrote:
 Right; I don't see who would disagree with that. I don't see any
 conflict between “decouple compiled bytecode file locations from source
 file locations” versus “predictable location for the compiled bytecode
 files”.

 The more decoupled they are, the harder it is to manually find the
 bytecode file.

 With the current .pyc scheme, .pyr folders or an SVN style Python cache
 directory, finding the bytecode file is pretty easy, since the cached
 file is either in the same directory as the source file or in a
 subdirectory.

 With any form of shadow hierarchy though, it gets trickier because you
 have to:
 1. Find the root of the shadow hierarchy
 2. Navigate within the shadow hierarchy down to the point that matches
 where your source file was

 It's a fairly significant increase in mental overhead. It gets much
 worse if the location of the shadow hierarchy root is configurable in
 any way (e.g. based on sys.path contents or an environment variable).

 Restricting the caching mechanism to the folder containing the source
 file keeps things a lot simpler.

Great way of explaining why the basic folder-per-folder model wins
over the folder-per-sys.path-entry model! The basic folder-per-folder
model doesn't need to know what sys.path is. (And I hadn't followed
previous messages in the thread with enough care to understand the
subtle implications of Ben's point. Sorry!)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Ben Finney
Nick Coghlan ncogh...@gmail.com writes:

 The more decoupled they are, the harder it is to manually find the
 bytecode file.

Okay. So it's not so much about “predictable”, but rather about
“predictable by a human without too much cognitive effort”.

I can see value in that, though it's best to be explicit that this is a
goal (to be clear that “a program can tell you where they live” isn't a
solution).

 It's a fairly significant increase in mental overhead. It gets much
 worse if the location of the shadow hierarchy root is configurable in
 any way (e.g. based on sys.path contents or an environment variable).

 Restricting the caching mechanism to the folder containing the source
 file keeps things a lot simpler.

Simpler for the human working on the source code; not for the human
trying to fit this scheme in with an OS package management system.
(Again, I'm just clarifying and making the contrast explicit, not
judging relative values.)

This makes it clearer to me that there is a glaring incompatibility
between this desire for “keep the compiled bytecode files close to the
source files” versus “decouple the locations so the OS package manager
can do its job of managing installed files”.

I recognise after earlier discussion in this thread that's not an issue
being addressed by PEP 3147.

-- 
 \ “Those are my principles. If you don't like them I have |
  `\others.” —Groucho Marx |
_o__)  |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-06 Thread Nick Coghlan
exar...@twistedmatrix.com wrote:
 On 08:21 pm, ba...@python.org wrote:
 On Feb 03, 2010, at 01:17 PM, Guido van Rossum wrote:
 Can you clarify? In Python 3, __file__ always points to the source.
 Clearly that is the way of the future. For 99.99% of uses of __file__,
 if it suddenly never pointed to a .pyc file any more (even if one
 existed) that would be just fine. So what's this talk of switching to
 __source__?

 Upon further reflection, I agree.  __file__ also points to the source in
 Python 2.7.  Do we need an attribute to point to the compiled bytecode
 file?
 
 What if, instead of trying to annotate the module object with this
 assortment of metadata - metadata which depends on lots of things, and
 can vary from interpreter to interpreter, and even from module to module
 (depending on how it was loaded) - we just stuck with the __loader__
 annotation, and encouraged/allowed/facilitated the use of the loader
 object to learn all of this extra information?

Trickier than it sounds. In the case of answering the question "was this
module loaded from bytecode or not?", the loader will need somewhere to
store the answer for each file.

The easiest per-module store is the module's own global namespace - the
loader's own attribute namespace isn't appropriate, since one loader may
handle multiple modules.

The filesystem can't be used as a reference because even when the file
is loaded from source, the bytecode file will usually be created as a
side effect.
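
A toy sketch of the point about per-module storage, assuming a made-up `RecordingLoader` class (this is not importlib's API, and `__compiled__` is only the attribute name being floated in this thread):

```python
import types


class RecordingLoader:
    """Toy loader that serves several modules from in-memory source."""

    def __init__(self, sources, cached=()):
        self.sources = sources      # module name -> source text
        self.cached = set(cached)   # names we pretend were loaded from bytecode

    def load_module(self, name):
        module = types.ModuleType(name)
        module.__loader__ = self
        module.__file__ = name + ".py"
        # One loader handles many modules, so the per-module answer is
        # recorded in the module's own namespace, not on the loader.
        module.__compiled__ = name + ".pyc" if name in self.cached else None
        exec(compile(self.sources[name], module.__file__, "exec"), module.__dict__)
        return module
```

Both modules below share one loader object, yet each carries its own answer:

```python
loader = RecordingLoader({"a": "x = 1", "b": "x = 2"}, cached=["b"])
a = loader.load_module("a")   # a.__compiled__ is None
b = loader.load_module("b")   # b.__compiled__ == "b.pyc"
```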

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-05 Thread Nick Coghlan
Brett Cannon wrote:
 If we add a new method like get_filenames(), I would suggest going
 with Antoine's suggestion of a tuple for __compiled__ (allowing
 loaders to indicate that they actually constructed the runtime
 bytecode from multiple cached files on-disk).
 
 
 Does code exist out there where people are constructing bytecode from
  multiple files for a single module?

I'm quite prepared to call YAGNI on that idea and just return a 2-tuple
of source filename and compiled filename.

The theoretical use case was for a module that was partially compiled to
native code in advance, so its compiled version was a combination of
a shared library and a bytecode file. It isn't really all that
compelling an idea - it would be easy enough for a loader to pick one or
the other and stick that in __compiled__.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-04 Thread Brett Cannon
On Wed, Feb 3, 2010 at 13:33, Martin v. Löwis mar...@v.loewis.de wrote:

 Guido van Rossum wrote:
  On Wed, Feb 3, 2010 at 12:47 PM, Nick Coghlan ncogh...@gmail.com
 wrote:
  On the issue of __file__, I'd suggesting not being too hasty in
  deprecating that in favour of __source__. While I can see a lot of value
  in having it point to the source file more often with a different
  attribute that points to the cached file, I don't see a lot of gain to
  compensate for the pain of changing the name of __file__ itself.
 
  Can you clarify? In Python 3, __file__ always points to the source.
  Clearly that is the way of the future. For 99.99% of uses of __file__,
  if it suddenly never pointed to a .pyc file any more (even if one
  existed) that would be just fine. So what's this talk of switching to
  __source__?

 I originally proposed it, not knowing that Python 3 already changed the
 meaning of __file__ for byte code files.

 What I really wanted to suggest is that it should be possible to tell
 what gets really executed, plus what source file had been considered.

 So if __file__ is always the source file, a second attribute should tell
 whether a byte code file got read (so that you can delete that in case
 you doubt it's current, for example).


What should be done for loaders? Right now we have get_filename() which is
what __file__ is to be set to. For importlib there is source_path and
bytecode_path, but both of those are specified to return None in the cases
of source or bytecode not being available, respectively.

The bare minimum, I think, is we need loaders to have method(s) that return
the path to the source -- whether it exists or not, to set __file__ to --
and the path to bytecode if it exists -- to set __compiled__ or whatever
attribute we come up with. That suggests to me either two new methods or one
that returns a two-item tuple. We could possibly keep get_filename() and say
that people need to compare its output to what source_path()-equivalent
method returns, but that seems bad if the source location needs to be based
on the bytecode location.

My thinking is we deprecate get_filename() and introduce some new method
that returns a two-item tuple (get_paths?). First item is where the source
should be, and the second is where the bytecode is if it exists (else it's
None). Putting both calculations into a single method seems better than a
source_path()/bytecode_path() as the latter would quite possibly need
source_path() to call bytecode_path() on its own to calculate where the
source should be if it doesn't exist on top of the direct call to
get_bytecode() for setting __compiled__ itself.
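
As a rough sketch of the two-item-tuple idea: `get_paths` is only the name suggested above, `ToySourceLoader` is invented for illustration, and the assumption that bytecode sits next to the source as `foo.pyc` is just the legacy layout.

```python
import os


class ToySourceLoader:
    """Toy loader illustrating a get_paths() method (hypothetical API)."""

    def __init__(self, source_path):
        self.source_path = source_path

    def get_paths(self):
        """Return (source_path, bytecode_path_or_None).

        The first item is where the source should be, whether or not it
        exists; the second is the bytecode path only if that file exists.
        """
        bytecode = self.source_path + "c"   # foo.py -> foo.pyc (assumed layout)
        if not os.path.exists(bytecode):
            bytecode = None
        return self.source_path, bytecode
```

A caller would then set both attributes from one call, e.g. `module.__file__, module.__compiled__ = loader.get_paths()`.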

-Brett




 Regards,
 Martin




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-04 Thread Glenn Linderman
On approximately 1/30/2010 4:00 PM, came the following characters from 
the keyboard of Barry Warsaw:

When the Python executable is given a `-R` flag, or the environment
variable `$PYTHONPYR` is set, then Python will create a `foo.pyr`
directory and write a `pyc` file to that directory with the hexlified
magic number as the base name.
   


After the discussion so far, my opinion is that if the source directory 
contains an appropriate python repository directory [1], and the 
version of Python implements PEP 3147, that there should be no need for 
-R or $PYTHONPYR to exist, but that such versions of Python would 
simply, and always look in the python repository directory for binaries.


I've reached this conclusion for several reasons/benefits:

1) it makes the rules simpler for people finding the binaries
2) there is no double lookup to find a binary at run time
3) if the PEP changes to implement alternatives B or C in [1], then I 
hear a large consensus of people that like that behavior, to clean up 
the annoying clutter of .pyc files mixed with source.
4) There is no need to add or document the command line option or 
environment variable.




[1] Alternative A... source-file-root.pyr, as in the PEP, Alt. B... 
source-file-dir/__pyr__ all versions/files in same lookaside directory, 
Alt. C... source-file-dir/__pyr_version__, each Python version with 
different bytecode would have some sort of version string or magic 
number that identifies it, and would look only in that directory for its 
.pyc/.pyo files.  I prefer C for 4 reasons: 1) easier to blow away one 
version; 2) easier to see what that version has compiled; 3) most people 
use only one or two versions, so directory proliferation is limited; 4) 
even when there are 30 versions of Python, the subdirectories would 
contain the same order-of-magnitude count of files as the source 
directory for performance issues, if the file system has a knee in the 
performance curve as some do.
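
Alternative C and its "easier to blow away one version" property can be sketched like this. The helper names are made up, and the exact form of the version tag (here a short string such as `26`) is an assumption, since the proposal only says "some sort of version string or magic number".

```python
import os
import shutil


def alt_c_path(source, version):
    """Alternative C: pkg/foo.py -> pkg/__pyr_VERSION__/foo.pyc (hypothetical)."""
    directory, filename = os.path.split(source)
    name = os.path.splitext(filename)[0]
    return os.path.join(directory, "__pyr_" + version + "__", name + ".pyc")


def purge_version(directory, version):
    """Delete everything one Python version has cached in `directory`."""
    shutil.rmtree(os.path.join(directory, "__pyr_" + version + "__"),
                  ignore_errors=True)
```

Removing one version's cache is a single directory removal, and the other versions' directories are untouched.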


--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-04 Thread Nick Coghlan
Brett Cannon wrote:
 My thinking is we deprecate get_filename() and introduce some new method
 that returns a two-item tuple (get_paths?). First item is where the
 source should be, and the second is where the bytecode is if it exists
 (else it's None). Putting both calculations into a single method seems
 better than a source_path()/bytecode_path() as the latter would quite
 possibly need source_path() to call bytecode_path() on its own to
 calculate where the source should be if it doesn't exist on top of the
 direct call to get_bytecode() for setting __compiled__ itself.

If we add a new method like get_filenames(), I would suggest going with
Antoine's suggestion of a tuple for __compiled__ (allowing loaders to
indicate that they actually constructed the runtime bytecode from
multiple cached files on-disk).

The runpy logic would then be something like:

  try:
    method = loader.get_filenames
  except AttributeError:
    __compiled__ = ()
    try:
      method = loader.get_filename
    except:
      __file__ = None
    else:
      __file__ = method()
  else:
    __file__, *__compiled__ = method()


For the import machinery itself, setting __compiled__ would be the
responsibility of the loaders due to the way load_module is specified. I
still sometimes wonder if we would be better off splitting that method
into separate prepare_module and exec_module methods to allow the
interpreter a chance to fiddle with the module globals before the module
code gets executed.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-04 Thread Nick Coghlan
Glenn Linderman wrote:
 Alt. C... source-file-dir/__pyr_version__, each Python version with
 different bytecode would have some sort of version string or magic
 number that identifies it, and would look only in that directory for its
 .pyc/.pyo files.  I prefer C for 4 reasons: 1) easier to blow away one
 version; 2) easier to see what that version has compiled; 3) most people
 use only one or two versions, so directory proliferation is limited; 4)
 even when there are 30 versions of Python, the subdirectories would
 contain the same order-of-magnitude count of files as the source
 directory for performance issues, if the file system has a knee in the
 performance curve as some do.

I don't think this suggestion had come up before, but I like it. It also
reduces the amount of filename adjustment needed in the individual cache
directories.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-04 Thread Brett Cannon
On Thu, Feb 4, 2010 at 13:51, Nick Coghlan ncogh...@gmail.com wrote:

 Brett Cannon wrote:
  My thinking is we deprecate get_filename() and introduce some new method
  that returns a two-item tuple (get_paths?). First item is where the
  source should be, and the second is where the bytecode is if it exists
  (else it's None). Putting both calculations into a single method seems
  better than a source_path()/bytecode_path() as the latter would quite
  possibly need source_path() to call bytecode_path() on its own to
  calculate where the source should be if it doesn't exist on top of the
  direct call to get_bytecode() for setting __compiled__ itself.

 If we add a new method like get_filenames(), I would suggest going with
 Antoine's suggestion of a tuple for __compiled__ (allowing loaders to
 indicate that they actually constructed the runtime bytecode from
 multiple cached files on-disk).


Does code exist out there where people are constructing bytecode from
multiple files for a single module?


 The runpy logic would then be something like:

  try:
    method = loader.get_filenames
  except AttributeError:
    __compiled__ = ()
    try:
      method = loader.get_filename
    except:
      __file__ = None
    else:
      __file__ = method()
  else:
    __file__, *__compiled__ = method()


Should it really be a flat sequence that get_filenames returns? That first
value has a very special meaning compared to the rest which suggests to me
keeping the returned sequence to two items, just with the second item being
a sequence itself.
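
A rough sketch of the two-item shape, for comparison with the flat
sequence above. SketchLoader and describe() are made-up names, and
get_filenames() is only the method being proposed in this thread, not an
existing API:

```python
class SketchLoader:
    """Hypothetical loader; get_filenames() is the proposal, not a real API."""

    def __init__(self, source, compiled=()):
        self._source = source
        self._compiled = tuple(compiled)

    def get_filenames(self):
        # First item: where the source should be.
        # Second item: a (possibly empty) sequence of cached bytecode files.
        return self._source, self._compiled


def describe(loader):
    """Return (source, compiled), falling back to get_filename() if needed."""
    try:
        method = loader.get_filenames
    except AttributeError:
        compiled = ()
        try:
            method = loader.get_filename
        except AttributeError:
            source = None
        else:
            source = method()
    else:
        # Exactly two items, so the special first value never mixes
        # with the list of cached files.
        source, compiled = method()
    return source, tuple(compiled)
```

With this shape the caller unpacks exactly two names, and a loader that
built its bytecode from several cached files simply returns a longer
second item.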



 For the import machinery itself, setting __compiled__ would be the
 responsibility of the loaders due to the way load_module is specified.


Yep.


 I
 still sometimes wonder if we would be better off splitting that method
 into separate prepare_module and exec_module methods to allow the
 interpreter a chance to fiddle with the module globals before the module
 code gets executed.


There's a reason why importlib has its ABCs abstracted the way it does;
there's a bunch of stuff that can be automated and is common to all loaders
that load_module has to cover. We could consider refactoring the API, but I
don't know if it is worth the hassle since importlib has decorators that
take care of low-level commonality and has ABCs for higher-level stuff.

But yes, given a do-over, I would abstract loaders to a finer grain to let
import handle more of the details.

-Brett




 Cheers,
 Nick.

 --
 Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
 ---



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-04 Thread Eric Smith

Glenn Linderman wrote:
On approximately 1/30/2010 4:00 PM, came the following characters from 
the keyboard of Barry Warsaw:

When the Python executable is given a `-R` flag, or the environment
variable `$PYTHONPYR` is set, then Python will create a `foo.pyr`
directory and write a `pyc` file to that directory with the hexlified
magic number as the base name.
   


After the discussion so far, my opinion is that if the source directory 
contains an appropriate python repository directory [1], and the 
version of Python implements PEP 3147, that there should be no need for 
-R or $PYTHONPYR to exist, but that such versions of Python would 
simply, and always look in the python repository directory for binaries.


How would the python repository directory ever get created?



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-04 Thread Glenn Linderman
On approximately 2/4/2010 2:28 PM, came the following characters from 
the keyboard of Eric Smith:

Glenn Linderman wrote:
On approximately 1/30/2010 4:00 PM, came the following characters 
from the keyboard of Barry Warsaw:

When the Python executable is given a `-R` flag, or the environment
variable `$PYTHONPYR` is set, then Python will create a `foo.pyr`
directory and write a `pyc` file to that directory with the hexlified
magic number as the base name.


After the discussion so far, my opinion is that if the source 
directory contains an appropriate python repository directory [1], 
and the version of Python implements PEP 3147, that there should be 
no need for -R or $PYTHONPYR to exist, but that such versions of 
Python would simply, and always look in the python repository 
directory for binaries.


How would the python repository directory ever get created?


When a PEP 3147 (if modified by my suggestion) version of Python runs, 
and the directory doesn't exist, and it wants to create a .pyc, it would 
create the directory, and put the .pyc there.  Sort of just like how it 
creates .pyc files, now, but an extra step of creating the repository 
directory if it doesn't exist.  After the first run, it would exist.  It 
is described in the PEP, and I quoted that section: "Python will 
create a 'foo.pyr' directory..." I'm just suggesting different semantics 
for how many directories, and what is contained in them.
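
A minimal sketch of that "create the directory on first write" step,
assuming a per-folder repository directory. The directory name
"__pyr__", the "MAGIC1" tag and the function name are placeholders for
illustration only; the thread has not settled on any of them:

```python
import os
import py_compile


def write_cached_bytecode(source_path, repo_dir_name="__pyr__", tag="MAGIC1"):
    """Compile source_path into <dir>/<repo_dir_name>/<name>.<tag>.pyc.

    repo_dir_name and tag are illustrative placeholders.
    """
    source_dir, source_file = os.path.split(os.path.abspath(source_path))
    module_name = os.path.splitext(source_file)[0]
    repo_dir = os.path.join(source_dir, repo_dir_name)
    # The one extra step compared with writing foo.pyc next to foo.py:
    # make sure the repository directory exists before writing into it.
    os.makedirs(repo_dir, exist_ok=True)
    target = os.path.join(repo_dir, "%s.%s.pyc" % (module_name, tag))
    py_compile.compile(source_path, cfile=target, doraise=True)
    return target
```

After the first call the directory exists, so later calls only rewrite
the .pyc file.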


--
Glenn

“Everyone is entitled to their own opinion, but not their own facts. In 
turn, everyone is entitled to their own opinions of the facts, but not 
their own facts based on their opinions.” -- Guy Rocha, retiring NV 
state archivist




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Floris Bruynooghe
On Wed, Feb 03, 2010 at 06:14:44PM +1100, Ben Finney wrote:
 Barry Warsaw ba...@python.org writes:
 
  I suppose this is going to be very subjective, but in skimming the
  thread it seems like most people like putting the byte code cache
  artifacts in subdirectories (be they siblings or folder-per-folder).
 
 I don't understand the distinction you're making between those two
 options. Can you explain what you mean by each of “siblings” and
 “folder-per-folder”?

siblings: the original proposal, i.e.:

foo.py
foo.pyr/
MAGIC1.pyc
MAGIC1.pyo
...
bar.py
bar.pyr/
MAGIC1.pyc
MAGIC1.pyo
...

folder-per-folder:

foo.py
bar.py
__pyr__/
foo.MAGIC1.pyc
foo.MAGIC1.pyo
foo.MAGIC2.pyc
bar.MAGIC1.pyc
...

IIUC

Personally I'm +1 on the folder-per-folder option.
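
The two layouts can be sketched as path computations. This is only an
illustration of the directory trees shown above; the function names are
made up, and "MAGIC1" stands in for whatever tag ends up in the file
name:

```python
import os


def sibling_cache_path(source_path, magic="MAGIC1", ext="pyc"):
    """foo.py -> foo.pyr/MAGIC1.pyc (one cache directory per source file)."""
    base = os.path.splitext(source_path)[0]
    return os.path.join(base + ".pyr", "%s.%s" % (magic, ext))


def folder_cache_path(source_path, magic="MAGIC1", ext="pyc"):
    """foo.py -> __pyr__/foo.MAGIC1.pyc (one cache directory per folder)."""
    directory, filename = os.path.split(source_path)
    name = os.path.splitext(filename)[0]
    return os.path.join(directory, "__pyr__", "%s.%s.%s" % (name, magic, ext))
```

In the sibling layout the module name lives in the directory name; in
the folder-per-folder layout it moves into the file name, so a package
with many modules gains one extra directory instead of one per module.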


Floris


-- 
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Glenn Linderman
On approximately 2/2/2010 7:05 PM, came the following characters from 
the keyboard of Guido van Rossum:

On Tue, Feb 2, 2010 at 5:41 PM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
   

On approximately 2/2/2010 4:28 PM, came the following characters from the
keyboard of Guido van Rossum:
 

Argh. zipfiles are way too complex to be writing.
   

Agreed.  But in reading that, it somehow triggered a question: does
zipimport only work for zipfiles, or does it work for any archive format
that Python stdlib knows how to decode?  And if only the former, why are
they so special?
 

The former.

They are special because (unlike e.g. tar files) you can read the
table of contents of a zipfile without parsing the entire file.


They are not unique in this... most archive formats except tar have a 
directory.  But that is likely a good reason not to support tar for this 
purpose, especially since tar usually comes as .tar.Z or .tar.gz 
or .tar.bz2 etc. and would require two passes before the data could be 
found at all.
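
A small illustration of the table-of-contents point, using only the
stdlib zipfile module. The archive is built in memory so the example is
self-contained; the member names are made up:

```python
import io
import zipfile

# Build a small archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("pkg/__init__.py", "")
    archive.writestr("pkg/mod.py", "x = 1\n")

# Opening for reading only parses the central directory stored at the
# end of the file; member data is not touched until a member is read.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    source = archive.read("pkg/mod.py").decode("ascii")
```

A compressed tar stream has no such index, so finding one member means
decompressing and walking the archive from the start.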



Also
because they are universally supported which makes it unnecessary to
support other formats. Again, contrast tar files which are virtually
unheard of on Windows.
   


This may well be true, at least for some definitions of Universal.  
However, the definition of Universal that matters to the discussion 
is all the platforms on which Python is supported... and certainly all 
those platforms have support for all the archive formats in Python's 
stdlib, eh?  Oh!  Sorry, I had jumped to the conclusion that the stdlib 
(because of the batteries included philosophy) supported things like 7z 
and rar files, since they've been around for years, but I see there is a 
limited selection there.  OK, I found the ticket that suggests adding 7z 
and nosied myself.  Didn't bother to look for rar, because I'm a 7z fan, 
and it has better compression factors in most cases.


--
Glenn

“Everyone is entitled to their own opinion, but not their own facts. In 
turn, everyone is entitled to their own opinions of the facts, but not 
their own facts based on their opinions.” -- Guy Rocha, retiring NV 
state archivist




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Michael Foord

On 03/02/2010 06:50, Barry Warsaw wrote:

I have to say up front that I'm somewhat shocked at how quickly this thread
has exploded!  Since I'm sprinting this week, I haven't thoroughly read every
message and won't have time tonight to answer every question, but I'll try to
pick out some common ideas.  I really appreciate everyone's input and will try
to clarify the PEP where I can.

It is probably not clear enough from the PEP, but I actually don't expect that
most individual Python developers will use this feature.  This is why the -R
flag exists and the behavior is turned off by default.


The fact that it doesn't affect most developers makes it the *perfect* 
opportunity to bikeshed... :-)


Michael


  When I'm developing
some Python code in my home directory, I usually only use one Python version
and even if I'm going to test it with multiple Python versions, I won't need
to do this *simultaneously*.  I will generally blow away all build artifacts
(including, but not limited to .pyc files) and then rebuild with the different
Python version.

I think that this feature will be limited mostly to distros, which have
different use cases than individual developers.  But these are important use
cases for Python to support nonetheless.

My rationale for choosing the file system layout in the PEP was to try to
present something more familiar to today's Python and to avoid radical
reorganization of the way Python caches its byte code.  Thus having a sibling
directory that differs from the source just by extension seemed more natural
to me.

Encoding the magic number in the file name under .pyr would I thought make the
look up scheme more efficient since the import machinery can craft the file
name directly.  I agree it's not very human friendly because nobody really
knows which magic numbers are associated with which Python versions and flags.

As to the question of sibling directories or folder-per-folder I think
performance issues should be the deciding factor.  There are file system
limitations to consider (but also a wide variety of file systems in use).  Do
the number of stat calls predominate the performance costs?  Maybe it makes
sense to implement the two different approaches and do some measurements.

-Barry
   



   



--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your 
employer, to release me from all obligations and waivers arising from any and all 
NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, 
confidentiality, non-disclosure, non-compete and acceptable use policies (BOGUS 
AGREEMENTS) that I have entered into with your employer, its partners, licensors, 
agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. 
You further represent that you have the authority to release me from any BOGUS AGREEMENTS 
on behalf of your employer.




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread M.-A. Lemburg
 On 03/02/2010 06:50, Barry Warsaw wrote:
 As to the question of sibling directories or folder-per-folder I think
 performance issues should be the deciding factor.  There are file system
 limitations to consider (but also a wide variety of file systems in
 use).  Do
 the number of stat calls predominate the performance costs?  Maybe it
 makes
 sense to implement the two different approaches and do some measurements.

How about using an optionally relative cache dir setting to let
the user decide ?

import imp, os

# Get cache dir, default to module_dir
cache_dir = os.environ.get('PYTHONCACHEDIR', '.')

# Get names and versions
module_cache_type = 'pyc'
module_cache_version = imp.get_magic().encode('hex')
module_name = module.__name__
module_cache_file = '%s.%s.%s' % (module_name, module_cache_version,
                                  module_cache_type)
module_dir = os.path.split(module.__file__)[0]

# Determine cache dir and cache file pathname
module_cache_dir = os.path.abspath(os.path.join(module_dir, cache_dir))
module_cache_pathname = os.path.join(module_cache_dir, module_cache_file)

# Write PYC data to module_cache_pathname
...
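
The snippet above relies on Python 2's str.encode('hex') and on a
`module` object already being in scope. A Python 3 flavoured sketch of
the same path calculation follows; PYTHONCACHEDIR is still only the
hypothetical variable proposed here, and cache_pathname() is a made-up
helper name:

```python
import importlib.util
import os


def cache_pathname(module_name, module_file, environ=os.environ):
    """Return the cache file path for a module under the scheme above."""
    # A relative setting is resolved against the module's own directory;
    # an absolute PYTHONCACHEDIR (hypothetical) redirects every cache file.
    cache_dir = environ.get("PYTHONCACHEDIR", ".")
    version = importlib.util.MAGIC_NUMBER.hex()
    cache_file = "%s.%s.pyc" % (module_name, version)
    module_dir = os.path.dirname(os.path.abspath(module_file))
    module_cache_dir = os.path.abspath(os.path.join(module_dir, cache_dir))
    return os.path.join(module_cache_dir, cache_file)
```

Setting the variable to "__pyr__" would give the folder-per-folder
layout, while an absolute path would put every cache file in a single
directory.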

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Antoine Pitrou
Barry Warsaw barry at python.org writes:
 
 As to the question of sibling directories or folder-per-folder I think
 performance issues should be the deciding factor.  There are file system
 limitations to consider (but also a wide variety of file systems in use).  Do
 the number of stat calls predominate the performance costs?  Maybe it makes
 sense to implement the two different approaches and do some measurements.

How about doing measurements /with the current implementation/? Everyone seems
to worry about stat() calls but there don't seem to be any figures to evaluate
their significance.
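
One way to get a first rough figure, sketched with the stdlib only. This
times isolated stat() calls on names that probably do not exist; it does
not exercise the import machinery itself, and the candidate file names
are arbitrary assumptions:

```python
import os
import timeit


def time_failed_stats(directory, candidates=("foo.pyc", "foo.pyo", "foo.so"),
                      repeat=1000):
    """Time `repeat` rounds of stat() calls on probably-missing files."""
    paths = [os.path.join(directory, name) for name in candidates]

    def probe():
        for path in paths:
            try:
                os.stat(path)
            except OSError:
                # A miss is the interesting case: import probes many
                # candidate names before it finds the right one.
                pass

    return timeit.timeit(probe, number=repeat)
```

Running it against directories on different file systems would give a
baseline to compare against total startup time.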

Thanks

Antoine.




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Nick Coghlan
Ben Finney wrote:
 I don't think keeping the cache files in a mass of intertwingled extra
 subdirectories is the way to solve that problem. That speaks, rather, to
 the need for Python to be able to find the file on behalf of the user
 and blow it away on request, so the user doesn't need to go searching.
 
 Possible interface (with spelling of options chosen hastily)::
 
 $ python foo.py# Use cached byte code if available.
 $ python --force-compile foo.py# Unconditionally compile.
 
 If removing the byte code file, without running the module, is what's
 desired::
 
 $ python --delete-cache foo.py # Delete cached byte code.
 $ rm $(python --show-cache-file foo.py)  # Same as above.
 
 That should cover just about any common need for the user to know
 exactly which byte code file corresponds to a given source file. That,
 in turn, frees us to choose a less obtrusive location for the byte code
 files than mingled in with the source.

That's nice in theory, but tricky in practice given the intended
flexibility of the import system (i.e. we don't want to perpetrate new
import features that aren't part of the common importer interface, so
any such proposal would need to come complete with suggested extensions
to the PEP 302 importer protocol).

It's also the case that having to run Python to manage my own filesystem
would be very annoying. If a dev has a broken .pyc that prevents the
affected Python build from even starting, how are they meant to use the
nonfunctioning interpreter to find and delete the offending file? How is
someone meant to find and delete the .pyc files if they prefer to use a
graphical file manager over (or in conjunction with) the command line?

We can provide a utility script in the Python distribution to copy a
source tree without the Python cache directories easily enough, which
would be far simpler than providing the extra tools to cherry pick
compilation or deletion of individual cache files.
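
A minimal sketch of such a utility, assuming the cache directories carry
a known name. The "__pyr__" default, the "*.pyr" pattern and the function
name are placeholders taken from the layouts discussed in this thread:

```python
import shutil


def copy_without_caches(src, dst, cache_dir_name="__pyr__"):
    """Copy a source tree, skipping cache folders and stray bytecode files."""
    # ignore_patterns() matches names at every level of the tree, so both
    # per-folder cache directories and old-style .pyc/.pyo files are skipped.
    ignore = shutil.ignore_patterns(cache_dir_name, "*.pyr", "*.pyc", "*.pyo")
    return shutil.copytree(src, dst, ignore=ignore)
```

Because the cache lives in predictably named directories, this needs no
knowledge of the import system at all.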

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Nick Coghlan
Floris Bruynooghe wrote:
 Personally I'm +1 on the folder-per-folder option.

Of all the proposed options, I also dislike the SVN/CVS style folder
structure the least ;)

Cheers,
Nick.

P.S. Translation of the double negative: I don't find any of the
solutions, even the current .pyc/.pyo approach, to be particularly
elegant, so I can't really say I like any of them in an absolute sense.
However, having a single cache folder inside each Python source folder
seems to strike the best balance between keeping a tidy filesystem and
still being able to locate a cached file given only the location of the
source file (or vice-versa) without using any Python-specific tools, so
it is the approach I personally prefer.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Nick Coghlan
Bob Ippolito wrote:
 I like this option as well, but why not just name the directory .pyc
 instead of __pyr__ or .pyr? That way people probably won't even have
 to reconfigure their tools to ignore it :)

This actually came up in another part of the thread. The conclusion was
that, since the cached Python files can significantly affect the way
Python executes, it would be better not to use dot-files or set the
hidden attribute in the folder's metadata (on filesystems that support
that).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Nick Coghlan
Glenn Linderman wrote:
 On approximately 2/2/2010 7:05 PM, came the following characters from
  the keyboard of Guido van Rossum:
 On Tue, Feb 2, 2010 at 5:41 PM, Glenn 
 Linderman v+pyt...@g.nevcal.com wrote:
 Agreed.  But in reading that, it somehow triggered a question:
 does zipimport only work for zipfiles, or does it work for any
 archive format that Python stdlib knows how to decode?  And if
 only the former, why are they so special?
 
 The former.
 
 They are special because (unlike e.g. tar files) you can read the 
 table of contents of a zipfile without parsing the entire file.
 
 They are not unique in this... most archive formats except tar have a
  directory.  But that is likely a good reason not to support tar for
 this purpose, especially since tar usually comes as .tar.Z or
 .tar.gz or .tar.bz2 etc. and would require two passes before the data
 could be found at all.

It's also because nobody has done the work to hook up any additional
archive formats (as zipimport needs to work for importing the standard
library itself, it isn't quite as simple as just importing an extra
module to do the manipulation. Extending the test suite to cover a new
archive format would require some work as well).

Given that zip files already work and are almost universal, I figure
folks have just opted to use that and then found other things to do with
their coding time :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Nick Coghlan
Barry Warsaw wrote:
 Encoding the magic number in the file name under .pyr would I thought make the
 look up scheme more efficient since the import machinery can craft the file
 name directly.  I agree it's not very human friendly because nobody really
 knows which magic numbers are associated with which Python versions and flags.

Having a lookup dictionary from Python version + C API magic numbers to
the magic strings used in cache filenames in the import engine shouldn't
be too tricky. I'll admit it wasn't until the thread had already been
going for a while that I realised that, though :)
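
Such a table could look roughly like the sketch below. Only the running
interpreter's entry is filled in, so no magic numbers are invented; the
"<implementation>-<major><minor>" tag format and the names MAGIC_TO_TAG
and cache_tag_for() are assumptions for illustration:

```python
import importlib.util
import sys

# Map raw magic numbers to readable tags. A real table would list every
# supported version; this one only knows about the current interpreter.
MAGIC_TO_TAG = {
    importlib.util.MAGIC_NUMBER: "%s-%d%d" % (
        sys.implementation.name, sys.version_info[0], sys.version_info[1]),
}


def cache_tag_for(magic):
    """Return the readable tag for a magic number, or its hex form if unknown."""
    return MAGIC_TO_TAG.get(magic, magic.hex())
```

Falling back to the hexlified magic number keeps unknown versions
distinguishable even when the table has no entry for them.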

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Barry Warsaw
On Feb 03, 2010, at 11:31 PM, Nick Coghlan wrote:

Having a lookup dictionary from Python version + C API magic numbers to
the magic strings used in cache filenames in the import engine shouldn't
be too tricky. I'll admit it wasn't until the thread had already been
going for a while that I realised that, though :)

I agree, and it's clear that would be much more user friendly.  I've added a
note to my working copy of the PEP and leave that as a possible design
change.  I'm still not certain what the right mapping would be though.  Python
version numbers don't seem quite right, but maybe they are a good enough
solution.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Barry Warsaw
On Feb 03, 2010, at 12:57 PM, Antoine Pitrou wrote:

How about doing measurements /with the current implementation/? Everyone
seems to worry about stat() calls but there don't seem to be any figures to
evaluate their significance.

Yes, very good idea, if for no other reason than to give us a baseline for
comparison.  Added to the PEP.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Nick Coghlan
Barry Warsaw wrote:
 On Feb 03, 2010, at 11:31 PM, Nick Coghlan wrote:
 
 Having a lookup dictionary from Python version + C API magic numbers to
 the magic strings used in cache filenames in the import engine shouldn't
 be too tricky. I'll admit it wasn't until the thread had already been
 going for a while that I realised that, though :)
 
 I agree, and it's clear that would be much more user friendly.  I've added a
 note to my working copy of the PEP and leave that as a possible design
 change.  I'm still not certain what the right mapping would be though.  Python
 version numbers don't seem quite right, but maybe they are a good enough
 solution.

If we ditch the -U option for 2.7, then we'll only have one magic number
per CPython version. I've been using cpython-27 in my examples.

On the issue of __file__, I'd suggest not being too hasty in
deprecating that in favour of __source__. While I can see a lot of value
in having it point to the source file more often with a different
attribute that points to the cached file, I don't see a lot of gain to
compensate for the pain of changing the name of __file__ itself.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Brett Cannon
On Wed, Feb 3, 2010 at 05:27, Nick Coghlan ncogh...@gmail.com wrote:
 Glenn Linderman wrote:
 On approximately 2/2/2010 7:05 PM, came the following characters from
  the keyboard of Guido van Rossum:
 On Tue, Feb 2, 2010 at 5:41 PM, Glenn
 Linderman v+pyt...@g.nevcal.com wrote:
 Agreed.  But in reading that, it somehow triggered a question:
 does zipimport only work for zipfiles, or does it work for any
 archive format that Python stdlib knows how to decode?  And if
 only the former, why are they so special?

 The former.

 They are special because (unlike e.g. tar files) you can read the
 table of contents of a zipfile without parsing the entire file.

 They are not unique in this... most archive formats except tar have a
  directory.  But that is likely a good reason not to support tar for
 this purpose, especially since tar usually comes as .tar.Z or
 .tar.gz or .tar.bz2 etc. and would require two passes before the data
 could be found at all.

 It's also because nobody has done the work to hook up any additional
 archive formats (as zipimport needs to work for importing the standard
 library itself, it isn't quite as simple as just importing an extra
 module to do the manipulation. Extending the test suite to cover a new
 archive format would require some work as well).

 Given that zip files already work and are almost universal, I figure
 folks have just opted to use that and then found other things to do with
 their coding time :)


If people really need alternative archive formats they can use the
importers package: http://packages.python.org/importers/ . If someone
really wants to use another format they can use the ABCs in the
package to easily write their own importer. It also contains a sqlite3
importer and its own zip importer.

-Brett

 Cheers,
 Nick.

 --
 Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
 ---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Ben Finney
Nick Coghlan ncogh...@gmail.com writes:

 P.S. Translation of the double negative: I don't find any of the
 solutions, even the current .pyc/.pyo approach, to be particularly
 elegant, so I can't really say I like any of them in an absolute
 sense. However, having a single cache folder inside each Python source
 folder seems to strike the best balance between keeping a tidy
 filesystem and still being able to locate a cached file given only the
 location of the source file (or vice-versa) without using any
 Python-specific tools, so it is the approach I personally prefer.

Something I think is being lost here: AFAICT, the impetus behind this
PEP is to allow OS distributions to decouple the location of the
compiled bytecode files from the location of the source code files. (If
I'm mistaken, then clearly I don't understand the PEP's purpose at all
and I'd love to have this misconception corrected.)

If that's so, then I don't see how what you suggest above is any
significant progress toward that goal. It still tightly couples the
locations of the source code files and the compiled bytecode files.
Having a distinct cache of compiled bytecode files addresses this
better.

-- 
 \  “I find the whole business of religion profoundly interesting. |
  `\ But it does mystify me that otherwise intelligent people take |
_o__)it seriously.” —Douglas Adams |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Guido van Rossum
On Wed, Feb 3, 2010 at 12:47 PM, Nick Coghlan ncogh...@gmail.com wrote:
 On the issue of __file__, I'd suggest not being too hasty in
 deprecating that in favour of __source__. While I can see a lot of value
 in having it point to the source file more often with a different
 attribute that points to the cached file, I don't see a lot of gain to
 compensate for the pain of changing the name of __file__ itself.

Can you clarify? In Python 3, __file__ always points to the source.
Clearly that is the way of the future. For 99.99% of uses of __file__,
if it suddenly never pointed to a .pyc file any more (even if one
existed) that would be just fine. So what's this talk of switching to
__source__?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Martin v. Löwis
Guido van Rossum wrote:
 On Wed, Feb 3, 2010 at 12:47 PM, Nick Coghlan ncogh...@gmail.com wrote:
 On the issue of __file__, I'd suggest not being too hasty in
 deprecating that in favour of __source__. While I can see a lot of value
 in having it point to the source file more often with a different
 attribute that points to the cached file, I don't see a lot of gain to
 compensate for the pain of changing the name of __file__ itself.
 
 Can you clarify? In Python 3, __file__ always points to the source.
 Clearly that is the way of the future. For 99.99% of uses of __file__,
 if it suddenly never pointed to a .pyc file any more (even if one
 existed) that would be just fine. So what's this talk of switching to
 __source__?

I originally proposed it, not knowing that Python 3 already changed the
meaning of __file__ for byte code files.

What I really wanted to suggest is that it should be possible to tell
what gets really executed, plus what source file had been considered.

So if __file__ is always the source file, a second attribute should tell
whether a byte code file got read (so that you can delete that in case
you doubt it's current, for example).
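
For reference, a sketch of how that pair can be inspected on current
Python 3 releases, where the module spec carries both the source
location and the cache location. json is used only because it is a
convenient pure-Python stdlib module; the variable names are arbitrary:

```python
import importlib.util
import json  # any pure-Python stdlib module works for the demonstration

spec = json.__spec__
source_file = spec.origin   # the .py file that was considered
cached_file = spec.cached   # where the bytecode cache lives, or None

# The same mapping is available without importing the module at all.
expected = importlib.util.cache_from_source(source_file)
```

Knowing the cache location is enough to delete a suspect byte code file
by hand; it does not by itself say whether the cached file was the one
actually read.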

Regards,
Martin




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Nick Coghlan
Guido van Rossum wrote:
 On Wed, Feb 3, 2010 at 12:47 PM, Nick Coghlan ncogh...@gmail.com wrote:
 On the issue of __file__, I'd suggest not being too hasty in
 deprecating that in favour of __source__. While I can see a lot of value
 in having it point to the source file more often with a different
 attribute that points to the cached file, I don't see a lot of gain to
 compensate for the pain of changing the name of __file__ itself.
 
 Can you clarify? In Python 3, __file__ always points to the source.
 Clearly that is the way of the future. For 99.99% of uses of __file__,
 if it suddenly never pointed to a .pyc file any more (even if one
 existed) that would be just fine. So what's this talk of switching to
 __source__?
 

In Barry's rough notes that he added to the PEP he said he thought
__file__ had become too ambiguous and was going to suggest changing the
name to __source__. That struck me as an overreaction to a very mild
ambiguity (one that will only lessen with time if a new attribute is
added to point to the cached file that was actually executed).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Nick Coghlan
Ben Finney wrote:
 Nick Coghlan ncogh...@gmail.com writes:
 
 P.S. Translation of the double negative: I don't find any of the
 solutions, even the current .pyc/.pyo approach, to be particularly
 elegant, so I can't really say I like any of them in an absolute
 sense. However, having a single cache folder inside each Python source
 folder seems to strike the best balance between keeping a tidy
 filesystem and still being able to locate a cached file given only the
 location of the source file (or vice-versa) without using any
 Python-specific tools, so it is the approach I personally prefer.
 
 Something I think is being lost here: AFAICT, the impetus behind this
 PEP is to allow OS distributions to decouple the location of the
 compiled bytecode files from the location of the source code files. (If
 I'm mistaken, then clearly I don't understand the PEP's purpose at all
 and I'd love to have this misconception corrected.)

No, the purpose is to allow the same source file to be shared between
multiple versions of the Python interpreter without their compiled files
conflicting as they do now. It's the support for multiple .pyc and .pyo
files per .py file that is the significant change, not the specific
location of those files.

Being able to get rid of the existing .pyc/.pyo clutter at the same time
is just a bonus.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-03 Thread Ben Finney
Thanks for the explanation.

Nick Coghlan ncogh...@gmail.com writes:

 Being able to get rid of the existing .pyc/.pyo clutter at the same
 time is just a bonus.

Okay. I maintain (unsurprisingly) that replacing it with subdirectory
clutter is a poor bargain. But I have nothing new to add on that score
for now.

-- 
 \ “A man may be a fool and not know it — but not if he is |
  `\   married.” —Henry L. Mencken |
_o__)  |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Larry Hastings


On Sun, Jan 31, 2010 at 1:03 PM, Simon Cross 
hodgestar+python...@gmail.com wrote:

I don't know whether I'm in favour of using a single pyr folder or not
but if a single folder is used I'd definitely prefer the folder to be
called __pyr__ rather than .pyr.


Guido van Rossum wrote:
Exactly what I would prefer. I worry that having many small 
directories is a fairly poor use of the filesystem. A quick scan of 
/usr/local/lib/python3.2 on my Linux box reveals 1163 .py files but 
only 57 directories.


Just to be clear: what should go in the __pyr__ folder?  I can see two 
possibilities:


1) All files go directly into __pyr__, a flat directory tree.
   foo.py
   bar.py
   __pyr__/
      foo.py.c.3160
      bar.py.c.3160

2) Each source file gets its own subdirectory of __pyr__.
   foo.py
   bar.py
   __pyr__/
      foo.py/
         c.3160
      bar.py/
         c.3160

2 makes it easier to clear the cache for a particular source file--just 
delete its matching directory.  The downside is that we're back to lots 
of small directories.  And it's not that onerous to do a rm 
__pyr__/foo.py.*.  So I suspect you prefer option 1.
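
The two layouts can be compared by computing where the cache file for a given source file would live. This is only a sketch: the helper names are made up for illustration, and the "3160" tag is simply copied from the listing above.

```python
from pathlib import PurePosixPath

MAGIC_TAG = "3160"  # hypothetical tag, copied from the listing above


def flat_cache_path(source, tag=MAGIC_TAG):
    """Option 1: every cache file sits directly inside __pyr__."""
    src = PurePosixPath(source)
    return src.parent / "__pyr__" / f"{src.name}.c.{tag}"


def nested_cache_path(source, tag=MAGIC_TAG):
    """Option 2: each source file gets its own subdirectory of __pyr__."""
    src = PurePosixPath(source)
    return src.parent / "__pyr__" / src.name / f"c.{tag}"


print(flat_cache_path("pkg/foo.py"))    # pkg/__pyr__/foo.py.c.3160
print(nested_cache_path("pkg/foo.py"))  # pkg/__pyr__/foo.py/c.3160
```

Either way the import machinery can build the name directly from the source path; the difference is only whether clearing one module's cache means a glob or a directory removal.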



The proposal gets a +1 from me,


/larry/


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Sebastian Rittau
On Sun, Jan 31, 2010 at 12:44:33PM +0100, Georg Brandl wrote:

 At least to me, this does not explain why an unwanted (why unwanted? If
 it's unwanted, set PYTHONDONTWRITEBYTECODE=1) directory is worse than an
 unwanted file.

A directory feels different from a file. For example, typing ls in my shell
prints regular files in black, but directories in bold and blue. File
managers and IDEs also highlight directories differently. In tree views,
directories have expander buttons that also make them stand out.

As a concrete example, have a look at these two screenshots:

  http://tinyurl.com/yz2fr6c and http://tinyurl.com/yg38uqt

In the first one, the subpackages stand out, while in the second one they
are hard to make out among the *.pyr directories. A directory just adds
more clutter than a file.

But overall I like the idea of using just a single __pycache__ or
__pyr__ directory per path. This would also reduce the *.pyc clutter.

 - Sebastian


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Neil Schemenauer
Nick Coghlan ncogh...@gmail.com wrote:
 Henning von Bargen wrote:
 The solution is so obvious:
 
 Why not use a .pyr file that is internally a zip file?

I think a Zip file might be the right approach too.  Either you
could have directories in the zip file for each version, e.g.

2.7/foo.pyc
3.3/foo.pyc
2.7/bar.pyc
3.3/bar.pyc

Or a Zip directory for each module:

foo/2.7.pyc
foo/3.3.pyc

I think you could get away without funky names because dot would
always be in the version number.

This would be implemented simply as an extension to the zip import
mechanism we already have.  Using the zip format would allow people
to use existing zip utilities to manipulate them.

 Agreed this should be discussed in the PEP, but one obvious problem is
 the speed impact. Picking up a file from a subdirectory is going to
 introduce less overhead than unpacking it from a zipfile.

I'm pretty sure it would be better than using directories.  A
directory for every module is not performance friendly.  Really, our
current module per file is not performance friendly.

Zip files could use store as the compression method if you are
really worried about CPU time.
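
A rough sketch of the per-version layout, using the standard zipfile module with the "store" method so nothing is compressed. The helper names and the fake payloads are assumptions for illustration, and nothing here hooks into zipimport.

```python
import io
import zipfile


def add_bytecode(archive, version, module, payload):
    """Append payload, uncompressed, as '<version>/<module>.pyc'."""
    name = f"{version}/{module}.pyc"
    with zipfile.ZipFile(archive, "a", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return name


def read_bytecode(archive, version, module):
    """Fetch the stored bytes for one interpreter version."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(f"{version}/{module}.pyc")


# An in-memory buffer stands in for a foo.pyr file on disk.
pyr = io.BytesIO()
add_bytecode(pyr, "2.7", "foo", b"fake 2.7 bytecode")
add_bytecode(pyr, "3.3", "foo", b"fake 3.3 bytecode")
print(read_bytecode(pyr, "3.3", "foo"))
```

Note that every append rewrites the archive's central directory, which is the write-side cost raised elsewhere in this thread.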

  Neil



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Glenn Linderman
On approximately 2/2/2010 4:28 PM, came the following characters from 
the keyboard of Guido van Rossum:

Argh. zipfiles are way too complex to be writing.


Agreed.  But in reading that, it somehow triggered a question: does 
zipimport only work for zipfiles, or does it work for any archive format 
that Python stdlib knows how to decode?  And if only the former, why are 
they so special?


--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Guido van Rossum
On Tue, Feb 2, 2010 at 5:41 PM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
 On approximately 2/2/2010 4:28 PM, came the following characters from the
 keyboard of Guido van Rossum:

 Argh. zipfiles are way too complex to be writing.

 Agreed.  But in reading that, it somehow triggered a question: does
 zipimport only work for zipfiles, or does it work for any archive format
 that Python stdlib knows how to decode?  And if only the former, why are
 they so special?

The former.

They are special because (unlike e.g. tar files) you can read the
table of contents of a zipfile without parsing the entire file. Also
because they are universally supported which makes it unnecessary to
support other formats. Again, contrast tar files which are virtually
unheard of on Windows.
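
To illustrate the table-of-contents point with the stdlib zipfile module: opening an archive reads only the central directory, so member names and sizes are available without touching the member data. The archive contents below are invented for the example.

```python
import io
import zipfile

# Build a small archive in memory; the member names are made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/__init__.pyc", b"\0" * 1024)
    zf.writestr("pkg/mod.pyc", b"\0" * 4096)

# infolist() is served from the central directory at the end of the file.
with zipfile.ZipFile(buf) as zf:
    toc = {info.filename: info.file_size for info in zf.infolist()}

print(toc)
```

A tar file has no such index, so listing its members means walking every header in the stream.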

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Bob Ippolito
On Sun, Jan 31, 2010 at 11:16 AM, Guido van Rossum gu...@python.org wrote:
 Whoa. This thread already exploded. I'm picking this message to
 respond to because it reflects my own view after reading the PEP.

 On Sun, Jan 31, 2010 at 4:13 AM, Hanno Schlichting ha...@hannosch.eu wrote:
 On Sun, Jan 31, 2010 at 1:03 PM, Simon Cross
 hodgestar+python...@gmail.com wrote:
 I don't know whether I'm in favour of using a single pyr folder or not
 but if a single folder is used I'd definitely prefer the folder to be
 called __pyr__ rather than .pyr.

 Exactly what I would prefer. I worry that having many small
 directories is a fairly poor use of the filesystem. A quick scan of
 /usr/local/lib/python3.2 on my Linux box reveals 1163 .py files but
 only 57 directories.

I like this option as well, but why not just name the directory .pyc
instead of __pyr__ or .pyr? That way people probably won't even have
to reconfigure their tools to ignore it :)

-bob


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Barry Warsaw
I have to say up front that I'm somewhat shocked at how quickly this thread
has exploded!  Since I'm sprinting this week, I haven't thoroughly read every
message and won't have time tonight to answer every question, but I'll try to
pick out some common ideas.  I really appreciate everyone's input and will try
to clarify the PEP where I can.

It is probably not clear enough from the PEP, but I actually don't expect that
most individual Python developers will use this feature.  This is why the -R
flag exists and the behavior is turned off by default.  When I'm developing
some Python code in my home directory, I usually only use one Python version
and even if I'm going to test it with multiple Python versions, I won't need
to do this *simultaneously*.  I will generally blow away all build artifacts
(including, but not limited to .pyc files) and then rebuild with the different
Python version.

I think that this feature will be limited mostly to distros, which have
different use cases than individual developers.  But these are important use
cases for Python to support nonetheless.

My rationale for choosing the file system layout in the PEP was to try to
present something more familiar to today's Python and to avoid radical
reorganization of the way Python caches its byte code.  Thus having a sibling
directory that differs from the source just by extension seemed more natural
to me.

Encoding the magic number in the file name under .pyr would, I thought, make the
look up scheme more efficient since the import machinery can craft the file
name directly.  I agree it's not very human friendly because nobody really
knows which magic numbers are associated with which Python versions and flags.

As to the question of sibling directories or folder-per-folder I think
performance issues should be the deciding factor.  There are file system
limitations to consider (but also a wide variety of file systems in use).  Does
the number of stat calls dominate the performance costs?  Maybe it makes
sense to implement the two different approaches and do some measurements.
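
One possible starting point for such measurements is a micro-benchmark of os.stat() itself, covering both hits and misses, since a failed lookup is what the import search pays for most often. This is only a sketch: stat_cost is a made-up helper, and real numbers depend heavily on the file system and on caching.

```python
import os
import timeit


def stat_cost(path, number=1000):
    """Average seconds per os.stat() call on path, failed lookups included."""
    def probe():
        try:
            os.stat(path)
        except OSError:
            pass  # a miss still costs a system call
    return timeit.timeit(probe, number=number) / number


hit = stat_cost(os.getcwd())
miss = stat_cost(os.path.join(os.getcwd(), "no_such_module.pyr"))
print(f"hit: {hit:.2e}s  miss: {miss:.2e}s")
```

Multiplying the per-call cost by the number of candidate paths each layout probes per import gives a first estimate before implementing either scheme.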

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Barry Warsaw
On Jan 30, 2010, at 11:21 PM, Vitor Bosshard wrote:

Why not:

foo.py
foo.pyc #  2.7 or  3.2
foo.27.pyc
foo.32.pyc
etc.

Because this clutters the module's directory more than it does today, which I
considered to be a negative factor.  And as others have pointed out, there
isn't a one-to-one relationship between Python version numbers and byte code
compatibility.

I'd rather have a folder cluttered with files I know I can ignore (and
can easily run a selective rm over) than one that is cluttered with
subfolders.

I suppose this is going to be very subjective, but in skimming the thread it
seems like most people like putting the byte code cache artifacts in
subdirectories (be they siblings or folder-per-folder).

-Barry





Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Barry Warsaw
On Jan 31, 2010, at 03:07 PM, Ben Finney wrote:

In other words, my understanding is that the current PEP would have the
following tree for an example project::

foo/
    __init__.py
    __init__.pyr/
        deadbeef.pyc
        decafbad.pyc
    lorem.py
    lorem.pyr/
        deadbeef.pyc
        decafbad.pyc

[...etc...]

That's a nightmarish mess of compiled files swamping the source files,
as has been pointed out several times.

Except that I think it will be quite uncommon for typical Python developers to
be confronted with this.

Could we instead have a single subdirectory for each tree of module
packages, keeping them tidily out of the way of the source files, while
making them located just as deterministically::

If we do not choose the sibling folder approach, I feel pretty strongly that
it ought to be more like the Subversion-like folder-per-folder approach than the
Bazaar-like folder-at-top-of-hierarchy approach.  If you have to manually blow
away a particular pyc file, folder-per-folder makes it much easier to find
exactly what you want to blow away without having to search up the file system,
and then back down again to find the pyc file to delete.  How many ..'s does
it take until you're lost in the twisty maze of ls?

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Barry Warsaw
On Jan 31, 2010, at 12:36 PM, Georg Brandl wrote:

Not really -- much of the code I've seen that tries to guess the source
file name from a __file__ value just does something like this:

   if fname.lower().endswith(('.pyc', '.pyo')): fname = fname[:-1]

That's not compatible with using .pyr, either.

The rationale for the .pyr extension is because I've usually seen (and
written) this instead:

    base, ext = os.path.splitext(fname)
    py_file = base + '.py'
    # ...or...
    if ext != '.py':
        continue

I think I rarely care what the extension is if it's not '.py'.
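
As a self-contained version of the splitext idiom (source_for is a name chosen here for illustration), this shows why the extension itself hardly matters: anything after the last dot is replaced, so '.pyc', '.pyo' and a hypothetical '.pyr' all map back to the same source name.

```python
import os.path


def source_for(fname):
    """Guess the source file name by swapping whatever extension is present."""
    base, _ext = os.path.splitext(fname)
    return base + ".py"


for name in ("pkg/mod.pyc", "pkg/mod.pyo", "pkg/mod.pyr", "pkg/mod.py"):
    print(name, "->", source_for(name))
```

The slicing idiom quoted above (fname[:-1]) only works when the name ends in '.pyc' or '.pyo', which is the incompatibility Georg points out.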

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Barry Warsaw
On Jan 31, 2010, at 09:30 PM, Martin v. Löwis wrote:

If a single pyc folder is used, I think an additional __source__
attribute would be needed to indicate what source file time stamp had
been checked (if any) to determine that the byte code file is current.

This is a good point.  __file__ is ambiguous so I think a reasonable thing to
add to the PEP is clear semantics for extracting the source file name and the
cached file name from the module object.

Python 3 uses the .py file for __file__ but I'd like to see a transition to
__source__ for that, with __cache__ for the location of the PVM, JVM, LLVM or
whatever compilation cache artifact file.

I've added a note to my working update of the PEP.

-Barry




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Ben Finney
Barry Warsaw ba...@python.org writes:

 I suppose this is going to be very subjective, but in skimming the
 thread it seems like most people like putting the byte code cache
 artifacts in subdirectories (be they siblings or folder-per-folder).

I don't understand the distinction you're making between those two
options. Can you explain what you mean by each of “siblings” and
“folder-per-folder”?

-- 
 \ “Pinky, are you pondering what I'm pondering?” “I think so, |
  `\   Brain, but Tuesday Weld isn't a complete sentence.” —_Pinky and |
_o__)   The Brain_ |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-02 Thread Ben Finney
Barry Warsaw ba...@python.org writes:

 If you have to manually blow away a particular pyc file,
 folder-per-folder makes it much easier to find exactly what you want
 to blow away without having to search up the file system, and then back
 down again to find the pyc file to delete. How many ..'s does it take
 until you're lost in the twisty maze of ls?

I don't think keeping the cache files in a mass of intertwingled extra
subdirectories is the way to solve that problem. That speaks, rather, to
the need for Python to be able to find the file on behalf of the user
and blow it away on request, so the user doesn't need to go searching.

Possible interface (with spelling of options chosen hastily)::

$ python foo.py                  # Use cached byte code if available.
$ python --force-compile foo.py  # Unconditionally compile.

If removing the byte code file, without running the module, is what's
desired::

$ python --delete-cache foo.py # Delete cached byte code.
$ rm $(python --show-cache-file foo.py)  # Same as above.

That should cover just about any common need for the user to know
exactly which byte code file corresponds to a given source file. That,
in turn, frees us to choose a less obtrusive location for the byte code
files than mingled in with the source.
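
The options above are hypothetical, but their behaviour is easy to sketch in Python. The sketch assumes a much later interpreter than the ones discussed here, where importlib.util.cache_from_source() reports the cache location for a source file; show_cache_file and delete_cache are names invented to mirror the proposed flags.

```python
import importlib.util
import os


def show_cache_file(source_path):
    """Return the byte code path the running interpreter would use."""
    return importlib.util.cache_from_source(source_path)


def delete_cache(source_path):
    """Remove the cached byte code; report whether a file was deleted."""
    try:
        os.remove(show_cache_file(source_path))
    except FileNotFoundError:
        return False
    return True


print(show_cache_file("foo.py"))
```

With a helper like this the user never needs to know which directory layout was chosen, which is the point being argued.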

-- 
 \ “Pinky, are you pondering what I'm pondering?” “I think so, but |
  `\  where will we find an open tattoo parlor at this time of |
_o__)   night?” —_Pinky and The Brain_ |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Martin v. Löwis
 3. In each top level directory on sys.path, shadow file hierarchy
   Major Pro: trivial to separate out all cached files
   Major Con: ??? (I got nuthin')

The major con of this option (and option 2) is an ambiguity about where to
look in the case of packages. In particular for namespace packages
(of the setuptools kind, or the PEP 382 kind), the directory where a
package is found on sys.path can change across Python runs.

So when you run Python several times, and install additional eggs
in-between, you get different directories all caching the same pyc
files. If you then uninstall some of the eggs, it may be difficult to
find out what pyc files to delete.

 Note that with option two, creating a bytecode only zipfile would be
 trivial: just add the __pycache__ directory as the top-level directory
 in the zipfile and leave out everything else (assume there were no data
 files in the package that were still needed).

I think any scheme that uses directories for pyc files will cause stale
pyc files to be located on disk. I then think it is important to never
automatically use these in imports - i.e. only ever consider a file in
a __pycache__ directory if you found a .py file earlier.
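
That policy could be sketched as follows. It assumes a later Python where importlib.util.cache_from_source() maps a source path to its cache path; usable_cache is a name made up for the example, and a real importer would also compare timestamps.

```python
import importlib.util
import os


def usable_cache(source_path):
    """Return the cache path only when the source file was found first."""
    if not os.path.isfile(source_path):
        return None  # orphaned byte code is never considered
    cache = importlib.util.cache_from_source(source_path)
    return cache if os.path.isfile(cache) else None
```

Deleting foo.py therefore makes its cached file invisible to imports, even while the stale file is still sitting on disk.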

If that is the policy, then a __pycache__ directory in a zipfile would
have no effect (and rightly so). Instead, to run code from bytecode,
the byte code files should be on sys.path themselves (probably still
named the same way as they are named inside __pycache__).

Regards,
Martin


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Gertjan Klein
Hanno Schlichting wrote:

+1 for a single strategy that is used in all cases. The current
solution could be phased out across multiple releases, but in the end
there should be a single approach and no flag. Otherwise some code and
tools will only support one of the approaches, especially if this is
seen as something only a minority of Linux distributions uses.

-1. As far as I can tell, this PEP proposes to solve a specific problem
that Linux distributions have. As they have decent package managers,
this PEP makes their maintainers' lives a lot easier. If implemented, I
believe it would eventually be used by all of them, not just a
minority.

For just about anyone else, I believe the current situation works
perfectly fine, and should not be changed. Personally, I work mainly on
Windows, and things I install are placed in the site-packages directory
of the Python version I use. There is no need to place .pyc files in
subdirectories there, as there will only ever be one. Programs I write
myself are also rarely, if ever, run by multiple Python versions. They
get run by the default Python on my system; if I change the default, the
.pyc files get overwritten, which is exactly what I want, I no longer
need the old ones.

As to the single cache directory per directory versus per .py file
issue: a subdirectory per .py file is easier to manipulate manually;
listing the .py file and the subdirectory containing the compiled
versions belonging to it makes it somewhat easier to prevent errors due
to deleting the source but not the compiled version. However, as the
use-case for this PEP seems to be to make life easier for Linux
packagers, it seems that a single __pycache__ subdirectory (or whatever
the name would be) is preferable: less filesystem clutter, and no risks
of forgetting to delete .pyc files, as this is about system-managed
Python source.

Gertjan.





Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Antoine Pitrou

 Would you still be a -1 on making it the new scheme the default if it
 used a single cache directory instead? That would actually be cleaner
 than the current solution rather than messier.

Well, I guess no, although additional directories are always more
intrusive than additional files (visually, or with tools such as du
for example).





Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread M.-A. Lemburg
Raymond Hettinger wrote:
 
 On Jan 30, 2010, at 4:00 PM, Barry Warsaw wrote:
 Abstract
 

 This PEP describes an extension to Python's import mechanism which
 improves sharing of Python source code files among multiple installed
 different versions of the Python interpreter.
 
 +1 

+1 from here as well.

  It does this by
 allowing many different byte compilation files (.pyc files) to be
 co-located with the Python source file (.py file).  

+1 on the idea of having a standard for Python module cache
files.

+1 on having those files in the same directory as the associated
module file, just like we already do.

-1 on the idea of using directories for these. This only
complicates cleanup, management and distribution of such
files. Perhaps we could make this an option, though.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Brett Cannon
On Sun, Jan 31, 2010 at 11:16, Guido van Rossum gu...@python.org wrote:
 Whoa. This thread already exploded. I'm picking this message to
 respond to because it reflects my own view after reading the PEP.

 On Sun, Jan 31, 2010 at 4:13 AM, Hanno Schlichting ha...@hannosch.eu wrote:
 On Sun, Jan 31, 2010 at 1:03 PM, Simon Cross
 hodgestar+python...@gmail.com wrote:
 I don't know whether I'm in favour of using a single pyr folder or not
 but if a single folder is used I'd definitely prefer the folder to be
 called __pyr__ rather than .pyr.

 Exactly what I would prefer. I worry that having many small
 directories is a fairly poor use of the filesystem. A quick scan of
 /usr/local/lib/python3.2 on my Linux box reveals 1163 .py files but
 only 57 directories.

 Do you have any specific reason for that?

 Using the leading dot notation is an established pattern to hide
 non-essential information from directory views. What makes this
 non-applicable in this situation and a custom Python notation better?

 Because we don't want to completely hide the pyc files. Also the dot
 naming convention is somewhat platform-specific.

 FWIW in Python 3, the __file__ variable always points to the .py
 source filename. I agreed with Georg that there ought to be an API for
 finding the pyc file for a module. This could be a small addition to
 the PEP.

Importlib somewhat does this already through a module's loader:
http://docs.python.org/py3k/library/importlib.html#importlib.abc.PyPycLoader.bytecode_path
. If you want to work off of module names this is enough; if importlib
did the import then you can do __loader__.bytecode_path(__name__). And
if it has not been loaded yet then that simply requires me exposing an
importlib.find_module() that returns a loader for the module.

Trick comes down to when you want it based on __file__ instead of the
module name. Oh, and me finally breaking up import so that it has
proper loaders or bootstrapping importlib; small snag. =) But at least
the code already exists for this stuff.
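
For comparison, here is how the lookup by module name can be written against a later importlib, not the API linked above: importlib.util.find_spec() returns a spec without executing the module, and its cached attribute holds the byte code path when there is one. bytecode_path_for is a name invented for this sketch.

```python
import importlib.util


def bytecode_path_for(module_name):
    """Return the byte code path for a module, or None if it has none."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # module not found at all
    return spec.cached  # None for built-in and extension modules


print(bytecode_path_for("json"))
print(bytecode_path_for("sys"))
```

Going from a __file__ value rather than a module name is the harder direction, as noted above.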

-Brett


 --
 --Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Brett Cannon
On Sun, Jan 31, 2010 at 11:04, Raymond Hettinger
raymond.hettin...@gmail.com wrote:

 On Jan 30, 2010, at 4:00 PM, Barry Warsaw wrote:
 Abstract
 

 This PEP describes an extension to Python's import mechanism which
 improves sharing of Python source code files among multiple installed
 different versions of the Python interpreter.

 +1


  It does this by
 allowing many different byte compilation files (.pyc files) to be
 co-located with the Python source file (.py file).

 It would be nice if all the compilation files could be tucked
 into one single zipfile per directory to reduce directory clutter.

 It has several benefits besides tidiness. It hides the implementation
 details of when magic numbers get shifted.  And it may allow faster
 start-up times when the zipfile is in the disk cache.

It also eliminates stat calls. I have not seen anyone mention this,
but on filesystems where stat calls are expensive (e.g. NFS), this is
going to increase import cost (and thus startup time which some people
are already incredibly paranoid about). You are now going to shift
from a single stat call to check for a bytecode file to two just in
the search phase *per file check* (remember you need to search for
module.py and module/__init__.py). And then you get to repeat all of
this during the load process (potentially, depending on how aggressive
the loader is with caching).

As others have said, an uncompressed zip file could work here. Or even
a file format where the first 4 bytes is the timestamp and then after
that are chunks of length-of-bytecode|magic|bytecode. That allows for
opening a file in append mode to add more bytecode instead of a
zipfile's requirement of rewriting the TOC on the end of the file
every time you mutate the file (if I remember the zip file format
correctly). Biggest cost in this simple approach would be reading the
file in (unless you mmap the thing when possible) since once read the
code will be a bytes object which means constant time indexing until
you find the right magic number. And adding support to differentiate
between -O bytecode is simply adding a marker per chunk of bytecode.
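
A minimal sketch of that format, to show how little code it takes. The field widths (a 4-byte length and a 4-byte magic, both little-endian) and the magic values are assumptions made for the example, and an in-memory bytes object stands in for the file.

```python
import struct

HEADER = struct.Struct("<I")   # source timestamp
CHUNK = struct.Struct("<I4s")  # length of bytecode, then magic


def new_cache(mtime):
    """Start a cache blob holding only the source timestamp."""
    return HEADER.pack(mtime)


def append_chunk(blob, magic, bytecode):
    """Append one length|magic|bytecode chunk, as append mode would."""
    return blob + CHUNK.pack(len(bytecode), magic) + bytecode


def find_bytecode(blob, magic):
    """Scan the chunks for magic; return (timestamp, bytecode or None)."""
    (mtime,) = HEADER.unpack_from(blob, 0)
    offset = HEADER.size
    while offset < len(blob):
        length, chunk_magic = CHUNK.unpack_from(blob, offset)
        offset += CHUNK.size
        if chunk_magic == magic:
            return mtime, blob[offset:offset + length]
        offset += length  # skip another interpreter's bytecode
    return mtime, None


blob = new_cache(1265000000)
blob = append_chunk(blob, b"MAG1", b"bytecode for one version")
blob = append_chunk(blob, b"MAG2", b"bytecode for another")
print(find_bytecode(blob, b"MAG2"))
```

The scan is linear in the number of chunks, not in the size of the bytecode, because each chunk header says how far to skip. It does nothing about two interpreters appending at the same time.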

And I disagree this would be difficult as the PEP suggests given the
proper file format. For zip files zipimport already has the read code
in C; it just would require the code to write to a zip file. And as
for the format I mentioned above, that's dead-simple to implement.

-Brett


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Antoine Pitrou
Le Mon, 01 Feb 2010 11:35:19 -0800, Brett Cannon a écrit :
 
 As others have said, an uncompressed zip file could work here. Or even a
 file format where the first 4 bytes is the timestamp and then after that
 are chunks of length-of-bytecode|magic|bytecode. That allows for opening
 a file in append mode to add more bytecode instead of a zipfile's
 requirement of rewriting the TOC on the end of the file every time you
 mutate the file (if I remember the zip file format correctly).

Making the file append-only doesn't eliminate the problems with 
concurrent modification. You still have to specify and implement a robust 
cross-platform file locking system which will have to be shared by all 
implementations. This is really a great deal of complication to add to 
the interpreter(s).

And, besides, it might not even work on NFS which was the motivation for 
your proposal :)




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Martin v. Löwis
 And I disagree this would be difficult as the PEP suggests given the
 proper file format. For zip files zipimport already has the read code
 in C; it just would require the code to write to a zip file. And as
 for the format I mentioned above, that's dead-simple to implement.

How do you write to a zipfile while others are reading it?

Regards,
Martin


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Brett Cannon
On Mon, Feb 1, 2010 at 13:19, Martin v. Löwis mar...@v.loewis.de wrote:
 And I disagree this would be difficult as the PEP suggests given the
 proper file format. For zip files zipimport already has the read code
 in C; it just would require the code to write to a zip file. And as
 for the format I mentioned above, that's dead-simple to implement.

 How do you write to a zipfile while others are reading it?


By hating concurrency (i.e. I don't have an answer which kills my idea).

-Brett


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Paul Du Bois
 On Mon, Feb 1, 2010 at 13:19, Martin v. Löwis mar...@v.loewis.de wrote:
 How do you write to a zipfile while others are reading it?

On Mon, Feb 1, 2010 at 1:23 PM, Brett Cannon br...@python.org wrote:
 By hating concurrency (i.e. I don't have an answer which kills my idea).

The python I use (win32 2.6.2) does not complain if it cannot read
from or write to a .pyc; and thus it handles multiple python processes
trying to create .pyc files at the same time. Is the .zip case really
any different? Since .pyc files are an optimization, it seems natural
and correct that .pyc IO errors pass silently (apologies to Tim).

It's an interesting challenge to write the file in such a way that
it's safe for a reader and writer to co-exist. Like Brett, I
considered an append-only scheme, but one needs to handle the case
where the bytecode for a particular magic number changes. At some
point you'd need to sweep garbage from the file. All solutions seem
unnecessarily complex, and unnecessary since in practice the case
should not come up.

paul


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Martin v. Löwis
 The python I use (win32 2.6.2) does not complain if it cannot read
 from or write to a .pyc; and thus it handles multiple python processes
 trying to create .pyc files at the same time. Is the .zip case really
 any different? Since .pyc files are an optimization, it seems natural
 and correct that .pyc IO errors pass silently (apologies to Tim).
 
 It's an interesting challenge to write the file in such a way that
 it's safe for a reader and writer to co-exist. 

I grant you that this may actually work for concurrent readers
(although on Windows, you'll have to pick the file share mode
carefully). The reader would have to be fairly robust, as the central
directory may disappear or get garbled while it is reading.

So what would you do for concurrent writers, then? The current
implementation relies on creat(O_EXCL) to be atomic, so a second
writer would just fail. This is about the only IO operation that is
guaranteed to be atomic (along with mkdir(2)), so reusing the current
approach doesn't work.
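
For illustration, exclusive creation looks roughly like this from Python. This is a sketch of the general technique (opening with `O_CREAT | O_EXCL`), not the interpreter's actual import code, and the function name is made up.

```python
import os


def write_exclusively(path, data):
    """Create `path` only if it does not exist yet; return True on success."""
    try:
        # O_CREAT | O_EXCL makes the "check and create" a single atomic step.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False              # another writer got there first
    with os.fdopen(fd, "wb") as fp:
        fp.write(data)
    return True
```

This only protects the creation of a new file. It gives no help for updating a file that already exists, which is what a shared archive would need.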

Regards,
Martin


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-02-01 Thread Paul Du Bois
 The python I use (win32 2.6.2) does not complain if it cannot read
 from or write to a .pyc; and thus it handles multiple python processes
 trying to create .pyc files at the same time. Is the .zip case really
 any different?

[ snip discussion of difficulty of writing a sharing-safe update ]

On Mon, Feb 1, 2010 at 2:28 PM, Martin v. Löwis mar...@v.loewis.de wrote:
 So what would you do for concurrent writers, then? The current
 implementation relies on creat(O_EXCL) to be atomic, so a second
 writer would just fail. This is about the only IO operation that is
 guaranteed to be atomic (along with mkdir(2)), so reusing the current
 approach doesn't work.

Sorry, I'm guilty of having assumed that the POSIX API has an
operation analogous to win32 CreateFile(GENERIC_WRITE, 0 /* ie,
FILE_SHARE_NONE*/).

If shared-reader/single-writer semantics are not available, the only
other possibility I can think of is to avoid opening the .pyc for
write. To write a .pyc one would read it, write and flush updates to a
temp file, and rename(). This isn't atomic, but given the invariant
that the .pyc always contains consistent data, the new file will also
only contain consistent data. Races manifest as updates getting lost.
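
A rough sketch of that write-to-temp-then-rename approach, assuming the temp file is created in the same directory as the target so the rename stays on one filesystem. The function name is made up, and `os.replace` is used as the rename step.

```python
import contextlib
import os
import tempfile


def replace_via_temp_file(path, new_data):
    """Write `new_data` to a temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(new_data)
            fp.flush()
            os.fsync(fp.fileno())   # push the data to disk before renaming
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        with contextlib.suppress(OSError):
            os.unlink(tmp_path)
        raise
```

Two processes doing read, modify, replace at the same time can still overwrite each other's additions. That is the lost-update race mentioned above.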

One obvious drawback is that the .pyc inode would change on every update.

paul


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Nick Coghlan
Henning von Bargen wrote:
 I like the idea of the PEP.
 On the other hand, I dislike using directories for it.
 Others have explained enough reasons for why creating many
 directories is a bad idea; and there may be other reasons
 (file-system limits for number of directories, problems when
 the directories are located on the network).

Actually, this is the first post I've seen noting objective problems
with the use of a subdirectory. The others were just a subjective
difference in perspective that saw subdirectory clutter as somehow being
worse than file clutter.

Specific examples of filesystems with different limits on file and
subdirectory counts and network filesystems where opening a subdirectory
can result in a significant speed impact would be very helpful.

 The solution is so obvious:
 
 Why not use a .pyr file that is internally a zip file?

Agreed this should be discussed in the PEP, but one obvious problem is
the speed impact. Picking up a file from a subdirectory is going to
introduce less overhead than unpacking it from a zipfile.

That said, using a non-compressed zipfile would make a lot more sense
than inventing our own archive format if a subdirectory is eventually
deemed unsuitable.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Martin v. Löwis
 Linux distributions such as Ubuntu [2]_ and Debian [3]_ provide more
 than one Python version at the same time to their users.  For example,
 Ubuntu 9.10 Karmic Koala can install Python 2.5, 2.6, and 3.1, with
 Python 2.6 being the default.

 In order to ease the burden on operating system packagers for these
 distributions, the distribution packages do not contain Python version
 numbers [4]_; they are shared across all Python versions installed on
 the system.  Putting Python version numbers in the packages would be a
 maintenance nightmare, since all the packages - *and their
 dependencies* - would have to be updated every time a new Python
 release was added or removed from the distribution.  Because of the
 sheer number of packages available, this amount of work is infeasible.
 
 As a non-Debian user (I'm a Gentoo user), the above doesn't enlighten me,
 even after skimming the referenced document.  Perhaps an example would
 be helpful?

I think the basic question is: how do you get stuff into
/usr/lib/python2.6/site-packages/Pyrex?

One option would be to have a Debian package python26-pyrex. Then you
would also need a python25-pyrex package and a python27-pyrex package,
all essentially containing the very same files (but installed into
different directories).

What they want is a single python-pyrex package that automatically works
for all Python versions - even those that aren't yet installed (i.e.
install python-pyrex first, and Python 2.7 later, and python-pyrex
should be available).

Having a single directory in sys.path for all Python versions currently
doesn't work, as the pyc files for each version would conflict.

The current solution consists (for package installation) of
a) installing the files in a single place
b) creating a directory hierarchy in each Python's site-package
c) symlinking all .py files into this directory hierarchy
d) byte-compiling all .py files in the hierarchy
For installation of new Python versions, they need to
a) walk over the list of installed Python packages
b) for each one, repeat steps b..d from above

With the PEP in place, for pure-Python packages, they could
a) have a system wide directory for pure-Python packages, and
b) arrange that directory to appear on sys.path for all Python
   versions
On package installation, they then could
a) install the files in that system-wide directory
b) for each Python version, run byte-code compilation of the
   new package
On Python installation, they would
a) byte-compile the entire directory.
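
As a sketch, the "byte-compile the entire directory" step for one interpreter could use the standard library's compileall module. The wrapper name is made up, and a packaging system would run the equivalent once per installed Python version (for example via `pythonX.Y -m compileall <directory>`).

```python
import compileall


def byte_compile_shared_dir(directory):
    """Byte-compile every .py file under `directory` for this interpreter.

    Returns a true value if all files compiled successfully.
    """
    # quiet=1 prints only errors; files with up-to-date bytecode are skipped.
    return compileall.compile_dir(directory, quiet=1)
```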

Alternatively, to support packages that don't work with all Python
versions, they could continue to use symlinking, but restrict that
onto the top directories of each package (i.e. not create a directory
hierarchy in site-packages).

 (FYI, Gentoo just installs the pyc files into each of the installed
 Python's site-packages that is supported by the package in question...disk
 space is relatively cheap.)

I suppose Gentoo also installs .py files into each site-packages?

How does it deal with a Python installation that happens after the
package installation?

 * Would a moratorium on byte code changes, similar to the language
   moratorium described in PEP 3003 [16]_ be a better approach to
   pursue, and would that solve the problem for vendors?  At the time
   of this writing, PEP 3003 is silent on the issue.
 
 Unless the bytecode change moratorium was permanent (unlikely), how would
 this solve the vendor issues?

A vendor strategy might be to not store .pyc files on disk for some
Python versions (i.e. those that differ from the rest). Assume that
3.2, 3.3, 3.4 use the same pyc magic, and 3.5, 3.6, 3.7 also do. Then,
at any point in time, one of the Python versions is the system python
in Debian. This is the one who decides the official .pyc magic. The
other Python installations on the same system can either reuse the
existing .pyc files (if the magic matches), or not, in which case they
have to recompile (to memory) the Python source on every startup. The
longer the moratorium, the less of a problem this could cause for users.
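
Whether an installation can reuse an existing .pyc comes down to comparing the first four bytes of the file with the running interpreter's magic number. A small sketch of that check follows; it uses `importlib.util.MAGIC_NUMBER`, which exists in current Python 3 but is newer than this thread, and the helper name is made up.

```python
import importlib.util


def pyc_matches_running_python(pyc_bytes):
    """True if the .pyc data starts with this interpreter's magic number."""
    return pyc_bytes[:4] == importlib.util.MAGIC_NUMBER
```

If the check fails, the interpreter has to compile the source again, and under the strategy above it would keep the result in memory only.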

Regards,
Martin


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Ben Finney
Nick Coghlan ncogh...@gmail.com writes:

 Actually, this is the first post I've seen noting objective problems
 with the use of a subdirectory. The others were just a subjective
 difference in perspective that saw subdirectory clutter as somehow
 being worse than file clutter.

Here's another one, then:

The directory where the source code files reside is often a working area
for the developer. The directory structure is an essential tool of
organising the project; the presence of an unwanted directory is clutter
to this purpose, in a way that the presence of an unwanted file is not.

-- 
 \ “Alternative explanations are always welcome in science, if |
  `\   they are better and explain more. Alternative explanations that |
_o__) explain nothing are not welcome.” —Victor J. Stenger, 2001-11-05 |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Martin v. Löwis
 Agreed this should be discussed in the PEP, but one obvious problem is
 the speed impact. Picking up a file from a subdirectory is going to
 introduce less overhead than unpacking it from a zipfile.

There is also the issue of race conditions with multiple simultaneous
accesses. The original format for the PEP had race conditions for
multiple simultaneous writers; ZIP will also have race conditions for
concurrent readers/writers (as any new writer will have to overwrite
the central directory, making the zip file temporarily unavailable -
unless they copy it, in which case we are back to writer/writer
races).
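
The central directory issue can be observed with the standard zipfile module. The sketch below builds an uncompressed archive in memory, appends a second member, and compares the bytes at the offset where the central directory used to start. The member names are only examples.

```python
import io
import zipfile

CENTRAL_DIR_SIG = b"PK\x01\x02"   # signature of a central directory record
LOCAL_FILE_SIG = b"PK\x03\x04"    # signature of a local file header

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("cpython-26.pyc", b"bytecode for one version")
before = buf.getvalue()

# The payload above does not contain the signature, so the first match
# is the start of the central directory.
old_offset = before.find(CENTRAL_DIR_SIG)

with zipfile.ZipFile(buf, "a", zipfile.ZIP_STORED) as zf:
    zf.writestr("cpython-31.pyc", b"bytecode for another version")
after = buf.getvalue()

# The new member was written over the old central directory, and a new
# central directory was written further along when the archive was closed.
print(after[old_offset:old_offset + 4] == LOCAL_FILE_SIG)
print(after.find(CENTRAL_DIR_SIG) > old_offset)
```

A reader that located the central directory just before the append would find member data at that offset afterwards.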

Regards,
Martin


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Antoine Pitrou
Barry Warsaw barry at python.org writes:
 
 Putting Python version numbers in the packages would be a
 maintenance nightmare, since all the packages - *and their
 dependencies* - would have to be updated every time a new Python
 release was added or removed from the distribution.  Because of the
 sheer number of packages available, this amount of work is infeasible.

How is this infeasible exactly? Wouldn't it be an easy target for scripting?

 As an example of the problem, a common (though fragile) Python idiom
 for locating data files is to do something like this::

I don't think this is fragile. It is the most robust I can think of, but perhaps
I'm missing another solution :)
(well, apart from pkg_resources, that is)
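
For reference, the idiom in question usually looks something like the sketch below. The helper name and the resource name in the usage note are made up.

```python
import os


def adjacent_resource(module_file, resource_name):
    """Path of a data file that lives next to the given module file."""
    here = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(here, resource_name)
```

Inside a module this would be called as `adjacent_resource(__file__, "logo.png")`. It depends only on `__file__` naming something in the same directory as the source file, which is why the directory level reported by `__file__` matters.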

 The implementation of this PEP would have to ensure that the same
 directory level is returned from `__file__` as it does without the
 `pyr` directory, so that the common idiom above continues to work::
 
  >>> import foo
  >>> foo.__file__
  'foo.pyr'

Would things like exec() work on the given directory?

 An earlier version of this PEP described fat Python byte code files.
 These files would contain the equivalent of multiple `pyc` files in a
 single `pyf` file, with a lookup table keyed off the appropriate magic
 number.  This was an extensible file format so that the first 5
 parallel Python implementations could be supported fairly efficiently,
 but with extension lookup tables available to scale `pyf` byte code
 objects as large as necessary.

As Martin said, this creates concurrent access problems, when several
interpreters modify the file simultaneously.

 * What about `py` source files that are compatible with most but not
   all installed Python versions?  We might need a way to say this py
   file should be hidden from Python versions X.Y or earlier.

-1. This is the distributor's job, not Python's.
If you want you can create dummy pyc's in your pyr that will raise an
ImportError or a NotImplementedError with some versions of Python. But I don't
think Python should have a stake in this.

 * Would a moratorium on byte code changes, similar to the language
   moratorium described in PEP 3003 [16]_ be a better approach to
   pursue, and would that solve the problem for vendors?  At the time
   of this writing, PEP 3003 is silent on the issue.

-1. Bytecode is an internal detail; besides, it is vital to be able to evolve 
it.

Regards

Antoine.




Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Georg Brandl
On 31.01.2010 07:29, Nick Coghlan wrote:
 Vitor Bosshard wrote:
 There is no one-to-one correspondence between Python version and pyc
 magic numbers. Different runtime options may change the magic number and
 different versions may reuse a magic number
 
 Good point. Runtime options would need to change the version (e.g.
 foo.25U.py), and versions that reuse magic numbers would be
 redundantly written to disk. However, the underlying issue as I see it
 is that the magic value is an implementation detail that should not be
 exposed.
 
 I think this is actually a good point - while there needs to be a
 shared namespace to allow different Python implementations to avoid
 stepping on each other's toes, CPython's bytecode compatibility magic
 number may not be the best choice as the distinguishing identifier.
 
 It may be better to give the magic numbers a meaningful corresponding
 string, such that the filenames would be more like:
 
 foo.py
 foo.pyr/
   cpython-25.pyc
   cpython-25U.pyc
   cpython-27.pyc
   cpython-27U.pyc
   cpython-32.pyc
   unladen-011.pyc
   wpython-11.pyc

+1.  It should be quite easy to assign a new name every time the magic
number is updated.
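
A sketch of how a loader might build such a path, assuming the layout quoted above (`foo.pyr/<tag>.pyc`). The function name is hypothetical and the tag strings are the examples from the quoted message, not an agreed registry.

```python
import os


def pyr_bytecode_path(source_path, tag):
    """Map foo.py plus a tag like 'cpython-27' to foo.pyr/cpython-27.pyc."""
    base, _ext = os.path.splitext(source_path)
    return os.path.join(base + ".pyr", tag + ".pyc")
```

Each implementation would only need to know its own tag, and a new tag would be assigned whenever its bytecode format changes.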

 If we don't change the bytecode for a given Python version, then the
 name of the bytecode format used wouldn't change either.

That would be the only remaining complaint for casual users. (Why doesn't
Python compile my file for 2.8?)

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Georg Brandl
On 31.01.2010 05:18, Ben Finney wrote:
 Nick Coghlan ncogh...@gmail.com writes:
 
 It won't be cluttered with subfolders - you will have at most one .pyr
 per source .py file.
 
 If that doesn't meet your threshold of “cluttered with subfolders”, I'm
 at a loss for words to think where that threshold might be. It meets,
 and exceeds by a long shot, my threshold for subfolder clutter.
 
 Even adding a *single* subfolder in arbitrary directories is an
 obnoxious act for a program to do automatically, and is not to be
 undertaken lightly. It might be justified in this case, but that doesn't
 mean we should open the gates to even more clutter.

Then why did Subversion choose to follow the CVS way and create a
subdirectory in each versioned directory?  IMO, this is much more
annoying given the alternative of a single .hg/.bzr/whatever directory.
For .pyc vs .pyr, you don't currently have the alternative of putting all
that stuff in one directory.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Georg Brandl
On 31.01.2010 10:21, Ben Finney wrote:
 Nick Coghlan ncogh...@gmail.com writes:
 
 Actually, this is the first post I've seen noting objective problems
 with the use of a subdirectory. The others were just a subjective
 difference in perspective that saw subdirectory clutter as somehow
 being worse than file clutter.
 
 Here's another one, then:
 
 The directory where the source code files reside is often a working area
 for the developer. The directory structure is an essential tool of
 organising the project; the presence of an unwanted directory is clutter
 to this purpose, in a way that the presence of an unwanted file is not.

At least to me, this does not explain why an unwanted (why unwanted? If
it's unwanted, set PYTHONDONTWRITEBYTECODE=1) directory is worse than an
unwanted file.
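
The same switch is available from inside a running program as `sys.dont_write_bytecode`. A small sketch that turns it on for a block of code and then restores the previous setting; the context manager name is made up.

```python
import contextlib
import sys


@contextlib.contextmanager
def no_bytecode_written():
    """Temporarily stop the import system from writing bytecode files."""
    previous = sys.dont_write_bytecode
    sys.dont_write_bytecode = True
    try:
        yield
    finally:
        sys.dont_write_bytecode = previous   # restore the earlier setting
```

Imports performed inside `with no_bytecode_written():` still compile the source; the result is just not written to disk.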

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Simon Cross
On Sun, Jan 31, 2010 at 1:54 PM, Hanno Schlichting ha...@hannosch.eu wrote:
 I'd be a big +1 to using a single .pyr directory per source directory.

I don't know whether I'm in favour of using a single pyr folder or not
but if a single folder is used I'd definitely prefer the folder to be
called __pyr__ rather than .pyr.


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Hanno Schlichting
On Sun, Jan 31, 2010 at 1:03 PM, Simon Cross
hodgestar+python...@gmail.com wrote:
 I don't know whether I'm in favour of using a single pyr folder or not
 but if a single folder is used I'd definitely prefer the folder to be
 called __pyr__ rather than .pyr.

Do you have any specific reason for that?

Using the leading dot notation is an established pattern to hide
non-essential information from directory views. What makes this
non-applicable in this situation and a custom Python notation better?

Hanno


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Simon Cross
On Sun, Jan 31, 2010 at 2:13 PM, Hanno Schlichting ha...@hannosch.eu wrote:
 On Sun, Jan 31, 2010 at 1:03 PM, Simon Cross
 hodgestar+python...@gmail.com wrote:
 I don't know whether I'm in favour of using a single pyr folder or not
 but if a single folder is used I'd definitely prefer the folder to be
 called __pyr__ rather than .pyr.

 Do you have any specific reason for that?

I'd rather not have the confusion caused by stray .pyc files multiplied
by having said stray files buried in a hidden folder.

 Using the leading dot notation is an established pattern to hide
 non-essential information from directory views. What makes this
 non-applicable in this situation and a custom Python notation better?

Something being an established pattern doesn't mean it's a good idea.
If we're going with a by-convention argument anyway, surely Python
conventions should take precedence -- this is *Python* after all. :)

On the whole I'm against hiding folders because what information is
non-essential varies from situation to situation. People (including
me) regularly screw up dealing with .svn folders by including them in
source tarballs, copying parts of one working copy into another, etc.

Schiavo
Simon


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Nick Coghlan
Georg Brandl wrote:
 On 31.01.2010 07:18, Nick Coghlan wrote:
 Ben Finney wrote:
 Could we instead have a single subdirectory for each tree of module
 packages, keeping them tidily out of the way of the source files, while
 making them located just as deterministically::
 Not easily. With the scheme currently proposed in the PEP, setting a
 value for __file__ which is both reasonably accurate and backwards
 compatible with existing file manipulation techniques is
 straightforward: just use the name of the cache directory.
 
 Not really -- much of the code I've seen that tries to guess the source
 file name from a __file__ value just does something like this:
 
if fname.lower().endswith(('.pyc', '.pyo')): fname = fname[:-1]
 
 That's not compatible with using .pyr, either.

That's not the backwards compatibility I'm talking about - I'm talking
about the more common one mentioned in the PEP where __file__ is used
with os.path.split to locate adjacent resource files.

Agreed that even the .pyr idea causes backwards compatibility problems
with code like the above (fortunately we can fix the stdlib instances
ourselves).
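
To make the incompatibility concrete, here is the quoted idiom wrapped in a function (the function name is made up), applied to a traditional .pyc name and to a `.pyr` name of the kind discussed in this thread.

```python
def guess_source(fname):
    """Common idiom: strip the trailing 'c' or 'o' to get the .py name."""
    if fname.lower().endswith((".pyc", ".pyo")):
        fname = fname[:-1]
    return fname


print(guess_source("foo.pyc"))   # foo.py
print(guess_source("foo.pyr"))   # foo.pyr (left unchanged, not a source file)
```

Code written this way would need an extra case for the new name, which is the kind of fix that can be made in the stdlib but not in third-party code.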

Cheers,
Nick.


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---

