Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-11-30 Thread PJ Eby
On Sat, Nov 26, 2011 at 11:53 AM, Éric Araujo mer...@netwok.org wrote:

  Le 11/08/2011 20:30, P.J. Eby a écrit :
  At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:
   I’ll just regret that it's not possible to provide a module docstring
  to inform that this is a namespace package used for X and Y.
  It *is* possible - you'd just have to put it in a zc.py file.  IOW,
  this PEP still allows namespace-defining packages to exist, as was
  requested by early commenters on PEP 382.  It just doesn't *require*
  them to exist in order for the namespace contents to be importable.

 That’s quite cool.  I guess such a namespace-defining module (zc.py
 here) would be importable, right?


Yes.


  Also, would it cause worse
 performance for other zc.* packages than if there were no zc.py?


No.  The first import of a subpackage sets up the __path__, and all
subsequent imports use it.



  A pure virtual package having no source file, I think it should have no
 __file__ at all.

 Antoine and someone else thought likewise (I can find the link if you
 want); do you consider it consensus enough to update the PEP?


Sure.  At this point, though, before doing any more work on the PEP I'd
like to have some idea of whether there's any chance of it being accepted.
 At this point, there seems to be a lot of passive, "Usenet nod syndrome"
type support for it, but little active support.

It doesn't help at all that I'm not really in a position to provide an
implementation, and the persons most likely to implement have been leaning
somewhat towards 382, or wanting to modify 402 such that it uses .pyp
directory extensions so that PEP 395 can be supported...

And while 402 is an extension of an idea that Guido proposed a few years
ago, he hasn't weighed in lately on whether he still likes that idea, let
alone whether he likes where I've taken it.  ;-)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-11-30 Thread Éric Araujo
Hi,

Thanks for the replies.

 At this point, though, before doing any more work on the PEP I'd
 like to have some idea of whether there's any chance of it being accepted.
  At this point, there seems to be a lot of passive, "Usenet nod syndrome"
 type support for it, but little active support.
If this helps, I am +1, and I’m sure other devs will chime in.  I think
the feature is useful, and I prefer 402’s way to 382’s pyp directories.
 I do acknowledge that 402 poses problems to PEP 395 which 382 does not,
and as I’m not in a position to help, my vote may count less.

Cheers


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-11-30 Thread Martin v. Löwis
 If this helps, I am +1, and I’m sure other devs will chime in.  I think
 the feature is useful, and I prefer 402’s way to 382’s pyp directories.

If that's the obstacle to adopting PEP 382, it would be easy to revert
the PEP back to having file markers to indicate package-ness. I insist
on having markers of some kind, though (IIUC, this is also what PEP 395
requires).

The main problem with file markers is that
a) they must not overlap across portions of a package, and
b) the actual file name and content is irrelevant.

a) means that package authors have to come up with some name, and b)
means that the name actually doesn't matter (but the file name extension
would). UUIDs would work, as would the name of the portion/distribution.
I think the specific choice of name will confuse people into
interpreting things in the file name that aren't really intended.

Regards,
Martin


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-11-30 Thread Nick Coghlan
On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby p...@telecommunity.com wrote:
 It doesn't help at all that I'm not really in a position to provide an
 implementation, and the persons most likely to implement have been leaning
 somewhat towards 382, or wanting to modify 402 such that it uses .pyp
 directory extensions so that PEP 395 can be supported...

While I was initially a fan of the possibilities of PEP 402, I
eventually decided that we would be trading an easy problem (you need
an '__init__.py' marker file or a '.pyp' extension to get Python to
recognise your package directory) for a hard one (What's your
sys.path look like? What did you mean for it to look like?). Symlinks
(and the fact we implicitly call realpath() during system
initialisation and import) just make things even messier.
*Deliberately* allowing package structures on the filesystem to become
ambiguous is a recipe for future pain (and could potentially undo a
lot of the good work done by PEP 328's elimination of implicit
relative imports).

I acknowledge there is a lot of confusion amongst novices as to how
packages and imports actually work, but my diagnosis of the root cause
of that problem is completely different from that supposed by PEP 402
(as documented in the more recent versions of PEP 395, I've come to
believe it is due to the way we stuff up the default sys.path[0]
initialisation when packages are involved).

So, in the end, I've come to strongly prefer the PEP 382 approach. The
principle of "Explicit is better than implicit" applies to package
detection on the filesystem just as much as it does to any other kind
of API design, and it really isn't that different from the way we
treat actual Python files (i.e. you can *execute* arbitrary files, but
they need to have an appropriate extension if you want to import
them).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-11-30 Thread Glyph

On Nov 30, 2011, at 6:39 PM, Nick Coghlan wrote:

 On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby p...@telecommunity.com wrote:
 It doesn't help at all that I'm not really in a position to provide an
 implementation, and the persons most likely to implement have been leaning
 somewhat towards 382, or wanting to modify 402 such that it uses .pyp
 directory extensions so that PEP 395 can be supported...
 
 While I was initially a fan of the possibilities of PEP 402, I
 eventually decided that we would be trading an easy problem (you need
 an '__init__.py' marker file or a '.pyp' extension to get Python to
 recognise your package directory) for a hard one (What's your
 sys.path look like? What did you mean for it to look like?). Symlinks
 (and the fact we implicitly call realpath() during system
 initialisation and import) just make things even messier.
 *Deliberately* allowing package structures on the filesystem to become
 ambiguous is a recipe for future pain (and could potentially undo a
 lot of the good work done by PEP 328's elimination of implicit
 relative imports).
 
 I acknowledge there is a lot of confusion amongst novices as to how
 packages and imports actually work, but my diagnosis of the root cause
 of that problem is completely different from that supposed by PEP 402
 (as documented in the more recent versions of PEP 395, I've come to
 believe it is due to the way we stuff up the default sys.path[0]
 initialisation when packages are involved).
 
 So, in the end, I've come to strongly prefer the PEP 382 approach. The
 principle of "Explicit is better than implicit" applies to package
 detection on the filesystem just as much as it does to any other kind
 of API design, and it really isn't that different from the way we
 treat actual Python files (i.e. you can *execute* arbitrary files, but
 they need to have an appropriate extension if you want to import
 them).

I've helped an almost distressing number of newbies overcome their confusion 
about sys.path and packages.  Systems using Twisted are, almost by definition, 
hairy integration problems, and are frequently being created or maintained by 
people with little to no previous Python experience.

Given that experience, I completely agree with everything you've written above 
(except for the part where you initially liked it).  I appreciate the insight 
that PEP 402 offers about python's package mechanism (and the difficulties 
introduced by namespace packages).  Its statement of the problem is good, but 
in my opinion its solution points in exactly the wrong direction: packages need 
to be _more_ explicit about their package-ness and tools need to be stricter 
about how they're laid out.  It would be great if sys.path[0] were actually 
correct when running a script inside a package, or at least issued a warning 
which would explain how to correctly lay out said package.  I would love to see 
a loud alarm every time a module accidentally got imported by the same name 
twice.  I wish I knew, once and for all, whether it was 'import Image' or 'from 
PIL import Image'.
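
The "same module imported under two names" problem can at least be detected after the fact. The following is only a rough sketch, not an existing Python feature: find_duplicate_imports is a hypothetical helper, and it assumes that two entries with the same resolved __file__ are the same source file.

```python
import os
import sys
from collections import defaultdict


def find_duplicate_imports(modules=None):
    """Map each source file to the module names loaded from it,
    keeping only files that were loaded under more than one name.

    Hypothetical helper for illustration; not part of the stdlib.
    """
    if modules is None:
        modules = sys.modules
    names_by_file = defaultdict(set)
    # Copy the items first: sys.modules may change while we iterate.
    for name, module in list(modules.items()):
        filename = getattr(module, "__file__", None)
        if isinstance(filename, str):
            names_by_file[os.path.realpath(filename)].add(name)
    return {path: sorted(names)
            for path, names in names_by_file.items()
            if len(names) > 1}


if __name__ == "__main__":
    for path, names in sorted(find_duplicate_imports().items()):
        print(path, "->", ", ".join(names))
```

Run against a live interpreter this also reports deliberate aliases (os.path is one in CPython), so the output is a list of candidates to check rather than a list of bugs.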

My hope is that if Python starts to tighten these things up a bit, or at least 
communicate better about best practices, editors and IDEs will develop better 
automatic discovery features and frameworks will start to normalize their 
sys.path setups and stop depending on accidents of current directory and script 
location.  This will in turn vastly decrease confusion among new python 
developers taking on large projects with a bunch of libraries, who mostly don't 
care what the rules for where files are supposed to go are, and just want to 
put them somewhere that works.

-glyph


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-11-26 Thread Éric Araujo
Hi,

Going through my email backlog.

 Le 11/08/2011 20:30, P.J. Eby a écrit :
 At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:
 (By the way, both of these additions to the import protocol (i.e. the
 dynamically-added ``__path__``, and dynamically-created modules)
 apply recursively to child packages, using the parent package's
 ``__path__`` in place of ``sys.path`` as a basis for generating a
 child ``__path__``.  This means that self-contained and virtual
 packages can contain each other without limitation, with the caveat
 that if you put a virtual package inside a self-contained one, it's
 gonna have a really short ``__path__``!)
 I don't understand the caveat or its implications.
 Since each package's __path__ is the same length or shorter than its
 parent's by default, then if you put a virtual package inside a
 self-contained one, it will be functionally speaking no different
 than a self-contained one, in that it will have only one path
 entry.  So, it's not really useful to put a virtual package inside a
 self-contained one, even though you can do it.  (Apart from it
 letting you avoid a superfluous __init__ module, assuming it's indeed
 superfluous.)
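
The "same length or shorter" remark can be illustrated with a small filesystem-only sketch. It assumes that a child __path__ is built by looking for a same-named subdirectory under each entry of the parent's path; child_path is a hypothetical name, and real importers (zip files, custom hooks) would not necessarily work this way.

```python
import os


def child_path(parent_path, child_name):
    """Derive a child package's path from its parent's path entries.

    Filesystem-only illustration; the function name is hypothetical.
    Each parent entry contributes at most one child entry, so the
    result can never be longer than parent_path.
    """
    result = []
    for entry in parent_path:
        candidate = os.path.join(entry, child_name)
        if os.path.isdir(candidate):
            result.append(candidate)
    return result
```

A self-contained package has a single-entry __path__, so any virtual package nested inside it gets at most one entry too, which is the "really short __path__" mentioned in the caveat.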

I still don’t understand why this matters or what negative effects it
could have on code, but I’m fine with not understanding.  I’ll trust
that people writing or maintaining import-related tools will agree or
complain about that item.

 I’ll just regret that it's not possible to provide a module docstring
 to inform that this is a namespace package used for X and Y.
 It *is* possible - you'd just have to put it in a zc.py file.  IOW,
 this PEP still allows namespace-defining packages to exist, as was
 requested by early commenters on PEP 382.  It just doesn't *require*
 them to exist in order for the namespace contents to be importable.

That’s quite cool.  I guess such a namespace-defining module (zc.py
here) would be importable, right?  Also, would it cause worse
performance for other zc.* packages than if there were no zc.py?

 This was probably said on import-sig, but here I go: yet another import
 artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
 up all this.
 Well, I rather *like* having them there, personally, vs. having to
 learn yet another API, but oh well, whatever.

Agreed with “whatever” :)  I just like to grunt sometimes.

 AFAIK, ImportEngine isn't going to do away with the need for the
 global ones to live somewhere,

Yep, but as Nick replied, at least we’ll gain one structure to rule them
all.

 Let's imagine my application Spam has a namespace spam.ext for plugins.
  To use a custom directory where plugins are stored, or a zip file with
 plugins (I don't use eggs, so let me talk about zip files here), I'd
 have to call sys.path.append *and* pkgutil.extend_virtual_paths?
 As written in the current proposal, yes.  There was some discussion
 on Python-Dev about having this happen automatically, and I proposed
 that it could be done by making virtual packages' __path__ attributes
 an iterable proxy object, rather than a list:

That sounds a bit too complicated.  What about just having
pkgutil.extend_virtual_paths call sys.path.append?  For maximum
flexibility, extend_virtual_paths could have an argument to avoid
calling sys.path.append.
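
To make the suggestion concrete, here is a rough sketch of what such a combined call could look like. Everything in it is an assumption: add_plugin_location is a hypothetical stand-in (it is not pkgutil.extend_virtual_paths, whose behaviour is defined by the PEP), it only handles plain directories, and it only updates packages that are already imported and named explicitly by the caller.

```python
import os
import sys


def add_plugin_location(path_entry, package_names, append_to_sys_path=True):
    """Register path_entry and extend already-imported packages.

    Hypothetical helper for illustration only. With
    append_to_sys_path=False the caller keeps control of sys.path,
    which is the opt-out argument suggested above.
    """
    if append_to_sys_path and path_entry not in sys.path:
        sys.path.append(path_entry)
    for name in package_names:
        module = sys.modules.get(name)
        if module is None or not hasattr(module, "__path__"):
            continue  # not imported yet, or not a package
        subdir = os.path.join(path_entry, *name.split("."))
        if os.path.isdir(subdir) and subdir not in module.__path__:
            module.__path__.append(subdir)
```

With something like this, the Spam application would call add_plugin_location(plugin_dir, ["spam.ext"]) once instead of making two separate calls.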

 Besides, putting data files in a Python package is held very poorly by
 some (mostly people following the File Hierarchy Standard),
 ISTM that anybody who thinks that is being inconsistent in
 considering the Python code itself to not be a data file by that
 same criterion...  especially since one of the more common uses for
 such data files are for e.g. HTML templates (which usually contain
 some sort of code) or GUI resources (which are pretty tightly bound
 to the code).

A good example is documentation: Having a unique location
(/usr/share/doc) for all installed software makes my life easier.
Another example is JavaScript files used with HTML documents, such as
jQuery: Debian recently split the jQuery file out of their Sphinx
package, so that there is only one library installed that all packages
can use and that can be updated and fixed once for all.  (I’m
simplifying; there can be multiple versions of libraries, but not
multiple copies.  I’ll stop here; I’m not one of the authors of the
Filesystem Hierarchy Standard, and I’ll rant against package_data in
distutils mailing lists :)

 A pure virtual package having no source file, I think it should have no
 __file__ at all.

Antoine and someone else thought likewise (I can find the link if you
want); do you consider it consensus enough to update the PEP?

Regards


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread Vinay Sajip
Éric Araujo merwok at netwok.org writes:

 Besides, putting data files in a Python package is held very poorly by
 some (mostly people following the File Hierarchy Standard), and in
 distutils2/packaging, we (will) have a resources system that’s as
 convenient for users and more flexible for OS packagers.  Using __file__
 for more than information on the module is frowned upon for other
 reasons anyway (I talked with a Debian developer about this one day but
 forgot), so I think the limitation is okay.
 

The FHS does not apply in all scenarios - not all Python code is
deployed/packaged at system level. For example, plug-ins (such as Django apps)
are often not meant to be installed by a system-level packager. This might also
be true in scenarios where Python is embedded into some other application. It's
really useful to be able to co-locate packages with their data (e.g. in a zip
file) and I don't think all instances of putting data files in a package are to
be frowned upon.

Regards,

Vinay Sajip



Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread P.J. Eby

At 02:02 PM 8/11/2011 -0400, Glyph Lefkowitz wrote:
Rather than a one-by-one ad-hoc consideration of which attribute 
should be set to None or empty strings or string or what have 
you, I'd really like to see a discussion in the PEP saying what a 
package really is vs. what a module is, and what one can reasonably 
expect from it from an API and tooling perspective.


The assumption I've been working from is the only guarantee I've ever 
seen the Python docs give: i.e., that a package is a module object 
with a __path__ attribute.  Modules aren't even required to have a 
__file__ object -- builtin modules don't, for example.  (And the 
contents of __file__ are not required to have any particular 
semantics: PEP 302 notes that it can be a dummy value like 
"<frozen>", for example.)


Technically, btw, PEP 302 requires __file__ to be a string, so making 
__file__ = None will be a backwards-incompatible change.  But any 
code that walks modules in sys.modules is going to break today if it 
expects a __file__ attribute to exist, because 'sys' itself doesn't have one!
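
As a quick illustration of the two points above (a package is a module with a __path__; not every module has a __file__), here is a minimal check. The is_package name is just a label for this example, and the results noted in the comments are what CPython gives.

```python
import email
import sys


def is_package(module):
    """The one guarantee cited above: packages have __path__."""
    return hasattr(module, "__path__")


if __name__ == "__main__":
    print(is_package(email))         # email is a package on CPython
    print(is_package(sys))           # sys is a plain builtin module
    print(hasattr(sys, "__file__"))  # builtin modules carry no __file__
```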


So, my leaning is towards leaving off __file__, since today's code 
already has to deal with it being nonexistent, if it's working with 
arbitrary modules, and that'll produce breakage sooner rather than 
later -- the twisted.python.modules code, for example, would fail 
with a loud AttributeError, rather than going on to silently assume 
that a module with a dummy __file__ isn't a package.   (Which is NOT 
a valid assumption *now*, btw, as I'll explain below.)
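
Code that walks sys.modules can cope with a missing __file__ along these lines. This is only a sketch of the defensive pattern, with a made-up function name; it treats an absent attribute and a None value the same way.

```python
import sys


def modules_without_file():
    """Names of loaded modules that have no usable __file__."""
    missing = []
    # Copy the items first: sys.modules may change while we iterate.
    for name, module in list(sys.modules.items()):
        if getattr(module, "__file__", None) is None:
            missing.append(name)
    return sorted(missing)


if __name__ == "__main__":
    print(modules_without_file())
```

Using getattr with a default keeps the walk from failing with AttributeError on modules such as sys, at the cost of hiding the difference between "no attribute" and "attribute is None".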


Anyway, if you have any suggestions for verbiage that should be added 
to the PEP to clarify these assumptions, I'd be happy to add 
them.  However, I think that the real problem you're encountering at 
the moment has more to do with making assumptions about the Python 
import ecosystem that aren't valid today, and haven't been valid 
since at least the introduction of PEP 302, if not earlier import 
hook systems as well.



 But the whole pure virtual mechanism here seems to pile even 
more inconsistency on top of an already irritatingly inconsistent 
import mechanism.  I was reasonably happy with my attempt to paper 
over PEP 302's weirdnesses from a user perspective:


http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html

(or https://launchpad.net/modules if 
you are not a Twisted user)


Users of this API can traverse the module hierarchy with certain 
expectations; each module or package would have .pathEntry and 
.filePath attributes, each of which would refer to the appropriate 
place.  Of course __path__ complicates things a bit, but so it goes.


I don't mean to be critical, and no doubt what you've written works 
fine for your current requirements, but on my quick attempt to skim 
through the code I found many things which appear to me to be 
incompatible with PEP 302.


That is, the above code hardcodes a variety of assumptions about the 
import system that haven't been true since Python 2.3.  (For example, 
it assumes that the contents of sys.path strings have inspectable 
semantics, that the contents of __file__ can tell you things about 
the module-ness or package-ness of a module object, etc.)


If you want to fully support PEP 302, you might want to consider 
making this a wrapper over the corresponding pkgutil APIs (available 
since Python 2.5) that do roughly the same things, but which delegate 
all path string inspection to importer objects and allow extensible 
delegation for importers that don't support the optional methods involved.


(Of course, if the pkgutil APIs are missing something you need, 
perhaps you could propose additions.)
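
For reference, the kind of pkgutil call being suggested looks roughly like this. pkgutil.iter_modules is a real stdlib function; list_children is just a name chosen for the example, and the email package is used only because it ships with Python.

```python
import email
import pkgutil


def list_children(package):
    """List (name, is_package) pairs for a package's direct children.

    pkgutil asks the importer responsible for each __path__ entry,
    so no path-string inspection happens in this code.
    """
    return [(name, is_pkg)
            for _finder, name, is_pkg in pkgutil.iter_modules(package.__path__)]


if __name__ == "__main__":
    for name, is_pkg in list_children(email):
        print(name, "(package)" if is_pkg else "(module)")
```

Note that the parent package has to be imported already so that its __path__ is available; the children themselves are not imported by this call.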



Now it seems like pure virtual packages are going to introduce a new 
type of special case into the hierarchy which have neither 
.pathEntry nor .filePath objects.


The problem is that your API's notion that these things exist as 
coherent concepts was never really a valid assumption in the first 
place.  .pth files and namespace packages already meant that the idea 
of a package coming from a single path entry made no sense.  And 
namespace packages installed by setuptools' system packaging mode 
*don't have a __file__ attribute* today...  heck they don't have 
__init__ modules, either.


So, adding virtual packages isn't actually going to change anything, 
except perhaps by making these scenarios more common.




Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread Glyph Lefkowitz

On Aug 12, 2011, at 11:24 AM, P.J. Eby wrote:

 That is, the above code hardcodes a variety of assumptions about the import 
 system that haven't been true since Python 2.3.

Thanks for this feedback.  I honestly did not realize how old and creaky this 
code had gotten.  It was originally developed for Python 2.4 and it certainly 
shows its age.  Practically speaking, the code is correct for the bundled 
importers, and paths and zipfiles are all we've cared about thus far.

 (For example, it assumes that the contents of sys.path strings have 
 inspectable semantics, that the contents of __file__ can tell you things 
 about the module-ness or package-ness of a module object, etc.)

Unfortunately, the primary goal of this code is to do something impossible - 
walk the module hierarchy without importing any code.  So some heuristics are 
necessary.  Upon further reflection, PEP 402 _will_ make dealing with namespace 
packages from this code considerably easier: we won't need to do AST analysis 
to look for a __path__ attribute or anything gross like that to improve 
correctness; we can just look in various directories on sys.path and accurately 
predict what __path__ will be synthesized to be.

However, the isPackage() method can and should be looking at the module if it's 
already loaded, and not always guessing based on paths.  The whole reason 
there's an 'importPackages' flag to walk() is that some applications of this 
code care more about accuracy than others, so it tries to be as correct as it 
can be.

(Of course this is still wrong for the case where a __path__ is dynamically 
constructed by user code, but there's only so well one can do at that.)

 If you want to fully support PEP 302, you might want to consider making this 
 a wrapper over the corresponding pkgutil APIs (available since Python 2.5) 
 that do roughly the same things, but which delegate all path string 
 inspection to importer objects and allow extensible delegation for importers 
 that don't support the optional methods involved.

This code still needs to support Python 2.4, but I will make a note of this for 
future reference.

 (Of course, if the pkgutil APIs are missing something you need, perhaps you 
 could propose additions.)

 Now it seems like pure virtual packages are going to introduce a new type of 
 special case into the hierarchy which have neither .pathEntry nor .filePath 
 objects.
 
 The problem is that your API's notion that these things exist as coherent 
 concepts was never really a valid assumption in the first place.  .pth files 
 and namespace packages already meant that the idea of a package coming from a 
 single path entry made no sense.  And namespace packages installed by 
 setuptools' system packaging mode *don't have a __file__ attribute* today...  
 heck they don't have __init__ modules, either.

The fact that getModule('sys') breaks is reason enough to re-visit some of 
these design decisions.

 So, adding virtual packages isn't actually going to change anything, except 
 perhaps by making these scenarios more common.

In that case, I guess it's a good thing; these bugs should be dealt with.  
Thanks for pointing them out.  My opinion of PEP 402 has been completely 
reversed - although I'd still like to see a section about the module system 
from a library/tools author point of view rather than a time-traveling perl 
user's narrative :).



Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread P.J. Eby

At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
Upon further reflection, PEP 402 _will_ make dealing with namespace 
packages from this code considerably easier: we won't need to do AST 
analysis to look for a __path__ attribute or anything gross like 
that to improve correctness; we can just look in various directories on 
sys.path and accurately predict what __path__ will be synthesized to be.


The flip side of that is that you can't always know whether a 
directory is a virtual package without deep inspection: one 
consequence of PEP 402 is that any directory that contains a Python 
module (of whatever type), however deeply nested, will be a valid 
package name.  So, you can't rule out that a given directory *might* 
be a package, without walking its entire reachable subtree.  (Within 
the subset of directory names that are valid Python identifiers, of course.)


However, you *can* quickly tell that a directory *might* be a package 
or is *probably* one: if it contains modules, or is the same name as 
an already-discovered module, it's a pretty safe bet that you can 
flag it as such.
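
A filesystem-only sketch of the "might be a package" test described above follows. The assumptions are loud ones: it only understands plain directories, it treats .py and .pyc as the module suffixes, and might_be_package is a made-up name rather than anything the PEP defines. Other importer backends would need their own logic.

```python
import os

# Assumption for this sketch: only these suffixes count as modules.
MODULE_SUFFIXES = (".py", ".pyc")


def is_module_file(filename):
    """True if filename looks like an importable module file."""
    stem, ext = os.path.splitext(filename)
    return ext in MODULE_SUFFIXES and stem.isidentifier()


def might_be_package(directory):
    """True if a module exists anywhere in the reachable subtree.

    In the worst case this walks the whole subtree, which is the
    cost pointed out above.
    """
    for _root, dirnames, filenames in os.walk(directory):
        # Only descend into directories whose names are identifiers.
        dirnames[:] = [d for d in dirnames if d.isidentifier()]
        if any(is_module_file(f) for f in filenames):
            return True
    return False
```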


In any case, you probably should *not* do the building of a virtual 
path yourself; the protocols and APIs added by PEP 402 should allow 
you to simply ask for the path to be constructed on your 
behalf.  Otherwise, you are going to be back in the same business of 
second-guessing arbitrary importer backends again!


(E.g. note that PEP 402 does not say virtual package subpaths must be 
filesystem or zipfile subdirectories of their parents - an importer 
could just as easily allow you to treat subdirectories named 
'twisted.python' as part of a virtual package with that name!)


Anyway, pkgutil defines some extra methods that importers can 
implement to support module-walking, and part of the PEP 402 
implementation should be to make this support virtual packages as well.



This code still needs to support Python 2.4, but I will make a note 
of this for future reference.


A suggestion: just take the pkgutil code and bundle it for Python 2.4 
as something._pkgutil.  There's very little about it that's 2.5+ 
specific, at least when I wrote the bits that do the module walking.


Of course, the main disadvantage of pkgutil for your purposes is that 
it currently requires packages to be imported in order to walk their 
child modules.  (IIRC, it does *not*, however, require them to be 
imported in order to discover their existence.)



In that case, I guess it's a good thing; these bugs should be dealt 
with.  Thanks for pointing them out.  My opinion of PEP 402 has been 
completely reversed - although I'd still like to see a section about 
the module system from a library/tools author point of view rather 
than a time-traveling perl user's narrative :).


LOL.

If you will propose the wording you'd like to see, I'll be happy to 
check it for any current-and-or-future incorrect assumptions.  ;-)




Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread Glyph Lefkowitz

On Aug 12, 2011, at 2:33 PM, P.J. Eby wrote:

 At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
 Upon further reflection, PEP 402 _will_ make dealing with namespace packages 
 from this code considerably easier: we won't need to do AST analysis to look 
 for a __path__ attribute or anything gross like that to improve correctness; we 
 can just look in various directories on sys.path and accurately predict what 
 __path__ will be synthesized to be.
 
 The flip side of that is that you can't always know whether a directory is a 
 virtual package without deep inspection: one consequence of PEP 402 is that 
 any directory that contains a Python module (of whatever type), however 
 deeply nested, will be a valid package name.  So, you can't rule out that a 
 given directory *might* be a package, without walking its entire reachable 
 subtree.  (Within the subset of directory names that are valid Python 
 identifiers, of course.)

Are there any rules about passing invalid identifiers to __import__ though, or 
is that just less likely? :)

 However, you *can* quickly tell that a directory *might* be a package or is 
 *probably* one: if it contains modules, or is the same name as an 
 already-discovered module, it's a pretty safe bet that you can flag it as 
 such.

I still like the idea of a 'marker' file.  It would be great if there were a 
new marker like __package__.py.  I say this more for the benefit of users 
looking at a directory on their filesystem and trying to understand whether 
this is a package or not than I do for my own programmatic tools though; it's 
already hard enough to understand the package-ness of a part of your filesystem 
and its interactions with PYTHONPATH; making directories mysteriously and 
automatically become packages depending on context will worsen that situation, 
I think.

I also have this not-terribly-well-defined idea that it would be handy for 
different providers of the _contents_ of namespace packages to provide their 
own instrumentation to be made aware that they've been added to the __path__ of 
a particular package.  This may be a solution in search of a problem, but I 
imagine that each __package__.py would be executed in the same module 
namespace.  This would allow namespace packages to do things like set up 
compatibility aliases, lazy imports, plugin registrations, etc, as they 
currently do with __init__.py.  Perhaps it would be better to define its 
relationship to the package-module namespace in a more sensible way than 
"execute all over each other in no particular order".

Also, if I had my druthers, Python would raise an exception if someone added a 
directory marked as a package to sys.path, to refuse to import things from it, 
and when a submodule was run as a script, add the nearest directory not marked 
as a package to sys.path, rather than the script's directory itself.  The whole 
"__name__ is wrong because your current directory was wrong when you ran that 
command" thing is so confusing to explain that I hope we can eventually consign 
it to the dustbin of history.  But if you can't even reasonably guess whether a 
directory is supposed to be an entry on sys.path or a package, that's going to 
be really hard to do.
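
The "nearest directory not marked as a package" rule could be sketched as below. This illustrates the wish, not anything Python does today: it assumes __init__.py is the marker, and nearest_non_package_dir is a hypothetical name.

```python
import os


def nearest_non_package_dir(script_path):
    """Walk upwards from a script until a directory without an
    __init__.py marker is found, and return that directory.

    Illustrative only; Python itself puts the script's own
    directory on sys.path.
    """
    directory = os.path.dirname(os.path.abspath(script_path))
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent = os.path.dirname(directory)
        if parent == directory:
            break  # reached the filesystem root
        directory = parent
    return directory
```

The loop only works because the marker file makes package-ness decidable by looking at a single directory, which is the property that would be lost if directories became packages implicitly.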

 In any case, you probably should *not* do the building of a virtual path 
 yourself; the protocols and APIs added by PEP 402 should allow you to simply 
 ask for the path to be constructed on your behalf.  Otherwise, you are going 
 to be back in the same business of second-guessing arbitrary importer 
 backends again!

What do you mean, "building of a virtual path"?

 (E.g. note that PEP 402 does not say virtual package subpaths must be 
 filesystem or zipfile subdirectories of their parents - an importer could 
 just as easily allow you to treat subdirectories named 'twisted.python' as 
 part of a virtual package with that name!)
 
 Anyway, pkgutil defines some extra methods that importers can implement to 
 support module-walking, and part of the PEP 402 implementation should be to 
 make this support virtual packages as well.

The more that this can focus on module-walking without executing code, the 
happier I'll be :).

 This code still needs to support Python 2.4, but I will make a note of this 
 for future reference.
 
 A suggestion: just take the pkgutil code and bundle it for Python 2.4 as 
 something._pkgutil.  There's very little about it that's 2.5+ specific, at 
 least when I wrote the bits that do the module walking.
 
 Of course, the main disadvantage of pkgutil for your purposes is that it 
 currently requires packages to be imported in order to walk their child 
 modules.  (IIRC, it does *not*, however, require them to be imported in order 
 to discover their existence.)
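
For illustration, a small sketch of discovering child modules without importing them, using the existing pkgutil.iter_modules() call on explicit directories. The helper name list_children is an assumption made for this example:

```python
import pkgutil


def list_children(directories):
    """Return sorted (name, is_package) pairs for the modules found in
    `directories`.  Nothing is imported; only the directories are listed."""
    return sorted(
        (name, ispkg)
        for _finder, name, ispkg in pkgutil.iter_modules(directories)
    )
```

Passing a not-yet-imported package's directory (or a computed would-be __path__) gives its children by name only.  By contrast, pkgutil.walk_packages() imports each package it descends into, which is the disadvantage mentioned above.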

One of the stipulations of this code is that it might give different results 
when the modules are loaded and not.  So it's fine to inspect that first and 
then invoke pkgutil only in the 'loaded' case, with the knowledge that the 
not-loaded case may be 

Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread P.J. Eby

At 05:03 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
Are there any rules about passing invalid identifiers to __import__ 
though, or is that just less likely? :)


I suppose you have a point there.  ;-)


I still like the idea of a 'marker' file.  It would be great if 
there were a new marker like __package__.py.


Having any required marker file makes separately-installable portions 
of a package impossible, since it would then be in conflict at 
installation time.


The (semi-)competing proposal, PEP 382, is based on allowing each 
portion to have a differently-named marker; we came up with PEP 402 
as a way to get rid of the need for any marker files (not to mention 
the bikeshedding involved.)




What do you mean by "building of a virtual path"?


Constructing the __path__-to-be of a not-yet-imported virtual 
package.  The PEP defines a protocol for constructing this, by asking 
the importer objects to provide __path__ entries, and it does not 
require anything to be imported.  So there's no reason to 
re-implement the algorithm yourself.
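
As a rough illustration only: the sketch below computes a would-be __path__ for plain filesystem directories, without importing anything.  It is not the protocol from the PEP, which asks importer objects for the entries and so also covers zipfiles and other backends.  The name sketch_virtual_path and its search_path parameter are hypothetical, and self-contained packages (those with an __init__ module) are ignored here.

```python
import os
import sys


def sketch_virtual_path(fullname, search_path=None):
    """Compute candidate __path__ entries for `fullname` by looking for
    matching subdirectories, one dotted-name component at a time."""
    path = list(sys.path if search_path is None else search_path)
    for part in fullname.split("."):
        # Each level's path is derived from the parent's path, so it can
        # never have more entries than the parent's.
        path = [
            os.path.join(entry, part)
            for entry in path
            if os.path.isdir(os.path.join(entry, part))
        ]
    return path
```

For example, sketch_virtual_path("zc.buildout") first narrows sys.path down to the directories named zc, then to their buildout subdirectories.  An empty result means no portion of the package was found.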



The more that this can focus on module-walking without executing 
code, the happier I'll be :).


Virtual packages actually improve on this situation, in that a 
virtual path can be computed without the need to import the 
package.  (Assuming a submodule or subpackage doesn't munge the 
__path__, of course.)




Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-11 Thread Barry Warsaw
On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:

 * XXX what is the __file__ of a pure virtual package?  ``None``?
   Some arbitrary string?  The path of the first directory with a
   trailing separator?  No matter what we put, *some* code is
   going to break, but the last choice might allow some code to
   accidentally work.  Is that good or bad?
A pure virtual package having no source file, I think it should have no
__file__ at all.  I don’t know if that would break more code than using
an empty string for example, but it feels righter.

I agree that the empty string is the worst of the choices.  no __file__ or
__file__=None is better.

-Barry


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-11 Thread Glyph Lefkowitz
On Aug 11, 2011, at 11:39 AM, Barry Warsaw wrote:

 On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:
 
 * XXX what is the __file__ of a pure virtual package?  ``None``?
  Some arbitrary string?  The path of the first directory with a
  trailing separator?  No matter what we put, *some* code is
  going to break, but the last choice might allow some code to
  accidentally work.  Is that good or bad?
 A pure virtual package having no source file, I think it should have no
 __file__ at all.  I don’t know if that would break more code than using
 an empty string for example, but it feels righter.
 
 I agree that the empty string is the worst of the choices.  no __file__ or
 __file__=None is better.

In some sense, I agree: hacks like empty strings are likely to lead to 
path-manipulation bugs where the wrong file gets opened (or worse, deleted, 
with predictable deleterious effects).  But the whole pure virtual mechanism 
here seems to pile even more inconsistency on top of an already irritatingly 
inconsistent import mechanism.  I was reasonably happy with my attempt to paper 
over PEP 302's weirdnesses from a user perspective:

http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html

(or https://launchpad.net/modules if you are not a Twisted user)

Users of this API can traverse the module hierarchy with certain expectations; 
each module or package would have .pathEntry and .filePath attributes, each of 
which would refer to the appropriate place.  Of course __path__ complicates 
things a bit, but so it goes.

Now it seems like pure virtual packages are going to introduce a new type of 
special case into the hierarchy which have neither .pathEntry nor .filePath 
objects.

Rather than a one-by-one ad-hoc consideration of which attribute should be set 
to None or empty strings or "<string>" or what have you, I'd really like to see 
a discussion in the PEP saying what a package really is vs. what a module is, 
and what one can reasonably expect from it from an API and tooling perspective. 
 Right now I have to puzzle out the intent of the final API from the 
problem/solution description and thought experiment.

Despite authoring several namespace packages myself, I don't have any of the 
problems described in the PEP.  I just want to know how to write correct tools 
given this new specification.  I suspect that this PEP will be the only 
reference for how packages work for a long time to come (just as PEP 302 was 
before it) so it should really get this right.


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-11 Thread Antoine Pitrou
On Thu, 11 Aug 2011 11:39:52 -0400
Barry Warsaw ba...@python.org wrote:

 On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:
 
  * XXX what is the __file__ of a pure virtual package?  ``None``?
    Some arbitrary string?  The path of the first directory with a
    trailing separator?  No matter what we put, *some* code is
    going to break, but the last choice might allow some code to
    accidentally work.  Is that good or bad?
 A pure virtual package having no source file, I think it should have no
 __file__ at all.  I don’t know if that would break more code than using
 an empty string for example, but it feels righter.
 
 I agree that the empty string is the worst of the choices.  no __file__ or
 __file__=None is better.

None should be the answer. It simplifies inspection of module data
(repr(__file__) gives you something recognizable instead of raising)
and makes semantically sense (!) since there is, indeed, no actual file
backing the module.
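
A small sketch of what this means for tool authors: code that tolerates both outcomes discussed here (no __file__ attribute at all, or __file__ set to None) can read the attribute defensively.  The helper name module_file is hypothetical:

```python
import types


def module_file(module):
    """Return the module's __file__, or None when the attribute is
    missing or already None (e.g. for a package with no source file)."""
    return getattr(module, "__file__", None)


# A freshly created module object has no __file__ attribute at all.
virtual = types.ModuleType("virtual_demo")
print(repr(module_file(virtual)))
```

With a missing attribute, a plain virtual.__file__ would raise AttributeError, whereas an explicit None can be inspected directly.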

Regards

Antoine.




Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-11 Thread P.J. Eby

At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:

Hi,

I've read PEP 402 and would like to offer comments.


Thanks.


Minor: I would reserve "packaging" for
packaging/distribution/installation/deployment matters, not Python
modules.  I suggest "Python package semantics".


Changing to "Python package import semantics" to hopefully be even 
clearer.  ;-)


(Nitpick: I was somewhat intentionally ambiguous because we are 
talking here about how a package is physically implemented in the 
filesystem, and that actually *is* kind of a packaging issue.  But 
it's not necessarily a *useful* intentional ambiguity, so I've no 
problem with removing it.)




Minor: In the UNIX world, or with version control tools, moving and
renaming are one and the same thing (hg mv spam.py spam/__init__.py for
example).  Also, if you turn a module into a package, you may want to
move code around, change imports, etc., so I'm not sure the renaming
part is such a big step.  Anyway, if the import-sig people say that
users think it's a complex or costly operation, I can believe it.


It's not that it's complex or costly in anything other than *mental* 
overhead -- you have to remember to do it and it's not particularly 
obvious.  (But people on import-sig did mention this and other things 
covered by the PEP as being a frequent root cause of beginner 
inquiries on #python, Stack Overflow, et al.)




 (By the way, both of these additions to the import protocol (i.e. the
 dynamically-added ``__path__``, and dynamically-created modules)
 apply recursively to child packages, using the parent package's
 ``__path__`` in place of ``sys.path`` as a basis for generating a
 child ``__path__``.  This means that self-contained and virtual
 packages can contain each other without limitation, with the caveat
 that if you put a virtual package inside a self-contained one, it's
 gonna have a really short ``__path__``!)
I don't understand the caveat or its implications.


Since each package's __path__ is, by default, the same length as or 
shorter than its parent's, a virtual package placed inside a 
self-contained one will be functionally no different from a 
self-contained one, in that it will have only one path entry.  So 
it's not really useful to put a virtual package inside a 
self-contained one, even though you can do it.  (Apart from it 
letting you avoid a superfluous __init__ module, assuming it's indeed 
superfluous.)




 In other words, we don't allow pure virtual packages to be imported
 directly, only modules and self-contained packages.  (This is an
 acceptable limitation, because there is no *functional* value to
 importing such a package by itself.  After all, the module object
 will have no *contents* until you import at least one of its
 subpackages or submodules!)

 Once ``zc.buildout`` has been successfully imported, though, there
 *will* be a ``zc`` module in ``sys.modules``, and trying to import it
 will of course succeed.  We are only preventing an *initial* import
 from succeeding, in order to prevent false-positive import successes
 when clashing subdirectories are present on ``sys.path``.
I find that limitation acceptable.  After all, there is no zc project,
and no zc module, just a zc namespace.  I'll just regret that it's not
possible to provide a module docstring to inform that this is a
namespace package used for X and Y.


It *is* possible - you'd just have to put it in a zc.py file.  IOW, 
this PEP still allows namespace-defining packages to exist, as was 
requested by early commenters on PEP 382.  It just doesn't *require* 
them to exist in order for the namespace contents to be importable.




 The resulting list (whether empty or not) is then stored in a
 ``sys.virtual_package_paths`` dictionary, keyed by module name.
This was probably said on import-sig, but here I go: yet another import
artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
up all this.


Well, I rather *like* having them there, personally, vs. having to 
learn yet another API, but oh well, whatever.  AFAIK, ImportEngine 
isn't going to do away with the need for the global ones to live 
somewhere, at least not in 3.3.




 * A new ``extend_virtual_paths(path_entry)`` function, to extend
   existing, already-imported virtual packages' ``__path__`` attributes
   to include any portions found in a new ``sys.path`` entry.  This
   function should be called by applications extending ``sys.path``
   at runtime, e.g. when adding a plugin directory or an egg to the
   path.
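
A hedged sketch of what such a function could do, for filesystem directories only.  The PEP's function does not exist in any released Python, so everything here is illustrative: sketch_extend_virtual_paths is a hypothetical name, and the virtual_paths dict is a stand-in for the proposed sys.virtual_package_paths registry.

```python
import os


def sketch_extend_virtual_paths(path_entry, virtual_paths):
    """Append to each registered package path any portion found under the
    new `path_entry`.  Returns a dict of the directories added per package."""
    added = {}
    # Handle parents before children so a child can build on the
    # directories just added to its parent.
    for name in sorted(virtual_paths, key=lambda n: n.count(".")):
        parent, _, tail = name.rpartition(".")
        bases = added.get(parent, []) if parent else [path_entry]
        fresh = [
            os.path.join(base, tail)
            for base in bases
            if os.path.isdir(os.path.join(base, tail))
            and os.path.join(base, tail) not in virtual_paths[name]
        ]
        virtual_paths[name].extend(fresh)
        added[name] = fresh
    return added
```

An application adding a plugin directory would then append it to sys.path and also call a function like this one, which is the two-step pattern asked about below.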
Let's imagine my application Spam has a namespace spam.ext for plugins.
 To use a custom directory where plugins are stored, or a zip file with
plugins (I don't use eggs, so let me talk about zip files here), I'd
have to call sys.path.append *and* pkgutil.extend_virtual_paths?


As written in the current proposal, yes.  There was some discussion 
on Python-Dev about having this happen automatically, and I proposed 
that it could be done by making virtual 

Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-11 Thread Nick Coghlan
On Fri, Aug 12, 2011 at 4:30 AM, P.J. Eby p...@telecommunity.com wrote:
 At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:
  The resulting list (whether empty or not) is then stored in a
  ``sys.virtual_package_paths`` dictionary, keyed by module name.
 This was probably said on import-sig, but here I go: yet another import
 artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
 up all this.

 Well, I rather *like* having them there, personally, vs. having to learn yet
 another API, but oh well, whatever.  AFAIK, ImportEngine isn't going to do
 away with the need for the global ones to live somewhere, at least not in
 3.3.

And likely not for the entire 3.x series - I shudder at the thought of
the backwards incompatibility hell associated with trying to remove
them...

The point of the ImportEngine API is that the caching elements of the
import state introduce cross dependencies between various global data
structures. Code that manipulates those data structures needs to
correctly invalidate or otherwise update the state as things change. I
seem to recall a certain programming construct that is designed to
make it easier to manage interdependent data structures...
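
To make the hint concrete, here is a toy sketch of a class that keeps a search path and its dependent cache consistent.  It is not the ImportEngine API; the class name ImportStateSketch and all of its methods are hypothetical.

```python
class ImportStateSketch:
    """Bundle a search path with a per-entry cache so that every change
    to the path also updates the cache that depends on it."""

    def __init__(self, path=()):
        self._path = list(path)
        self._cache = {}  # path entry -> cached lookup result

    @property
    def path(self):
        # Expose a read-only view so callers cannot bypass invalidation.
        return tuple(self._path)

    def lookup(self, entry, factory):
        """Return the cached result for `entry`, computing it once."""
        if entry not in self._cache:
            self._cache[entry] = factory(entry)
        return self._cache[entry]

    def remove_entry(self, entry):
        """Drop a path entry together with its cached state."""
        self._path.remove(entry)
        self._cache.pop(entry, None)
```

With module-level globals, the caller has to remember the second step of remove_entry(); with an object, the invariant lives in one place.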

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia