Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
Hi,

sorry for nitpicking, but...

On Wed, Jul 20, 2011 at 05:58, P.J. Eby <p...@telecommunity.com> wrote:
> ...
> For those implementing PEP \302 importer objects:

the '\' should be removed, right?

Cheers,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
Hi Sandro,

> On Wed, Jul 20, 2011 at 05:58, P.J. Eby <p...@telecommunity.com> wrote:
>> For those implementing PEP \302 importer objects:
> the '\' should be removed, right?

No. Philip used backslashes to prevent the HTML conversion from turning each and every instance of “PEP \d+” into a link, which gets annoying after the first few hundred times. (It was discussed a few months ago, probably on web-sig or python-dev, for PEP 333 or , if memory serves.)

Cheers
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Sat, Jul 30, 2011 at 14:57, Éric Araujo <mer...@netwok.org> wrote:
>> On Wed, Jul 20, 2011 at 05:58, P.J. Eby <p...@telecommunity.com> wrote:
>>> For those implementing PEP \302 importer objects:
>> the '\' should be removed, right?
>
> No. Philip used backslashes to prevent the HTML conversion from turning each and every instance of “PEP \d+” into a link, which gets annoying after the first few hundred times. (It was discussed a few months ago, probably on web-sig or python-dev, for PEP 333 or , if memory serves.)

Gaah, sorry for the noise then! (But at least I learnt something new!)

Cheers,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
First off, kudos to PJE for his work on this PEP. He really had the key insight for this new approach, and did a great job of explaining his vision clearly, so that I think everybody over on import-sig got it.

On Jul 20, 2011, at 08:57 AM, P.J. Eby wrote:
>At 06:46 PM 7/20/2011 +1000, Nick Coghlan wrote:
>>On Wed, Jul 20, 2011 at 1:58 PM, P.J. Eby <p...@telecommunity.com> wrote:
>>> So, without further ado, here it is:
>>
>>I pushed this version up to the PEPs repo, so it now has a number (402) and can be read in prettier HTML format:
>>http://www.python.org/dev/peps/pep-0402/
>
>Technically, shouldn't this be a 3XXX series PEP? Or are we not doing those any more now that all PEPs would be 3XXX?

Great question. I don't know if we want/need to make the distinction any more. It does feel a little odd putting Python 3 PEPs (the only kind of new Standards Track PEPs) in the 0XXX numbers, but now that we're all moving to Python 3 <wink>, segregating new PEPs into the 3XXX range seems a bit contrived. I think filling up 0XXX is probably fine.

-Barry
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
Antoine Pitrou wrote:
> The additional confusion lies in the fact that a module can be shadowed by something which is not a module (a mere global variable). I find it rather baffling.

I think we're stuck with that as long as we use the same syntax for importing a submodule and importing a non-module name from a module, i.e. 'from x import y'.

--
Greg
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
P.J. Eby wrote:
> "from x import y" means "import x; y = x.y".

It actually means slightly more than that if y is a submodule, in which case it means "import x.y; y = x.y".

--
Greg
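Greg's point can be illustrated with a simplified sketch of the lookup that `from x import y` performs, written against Python 3's `importlib`. The helper `from_import` is hypothetical and only approximates the interpreter's real logic, which handles more edge cases: the attribute is checked first, and a submodule import is attempted only when the attribute is missing and the module is a package.

```python
import importlib


def from_import(module_name, name):
    """Hypothetical helper approximating `from <module_name> import <name>`.

    Simplified sketch; the interpreter's real logic covers more cases.
    """
    module = importlib.import_module(module_name)
    # Step 1: an existing attribute wins. This is how a plain global such
    # as `y = 5` can "shadow" a submodule of the same name.
    if hasattr(module, name):
        return getattr(module, name)
    # Step 2: only packages (modules with a __path__) can have submodules,
    # so only then fall back to importing <module_name>.<name>.
    if hasattr(module, "__path__"):
        return importlib.import_module(f"{module_name}.{name}")
    raise ImportError(f"cannot import name {name!r} from {module_name!r}")
```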
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 11:52 AM 7/21/2011 +1000, Nick Coghlan wrote:
>Trying to change how packages are identified at the Python level makes PEP 382 sound positively appealing. __path__ needs to stay :)
>
>In which case, it should be a list, not a sentinel. ;-)
>
>Even better would be for these (and sys.path) to be list subclasses that did the right thing under the hood, as Glenn suggested. Code that *replaces* rather than modifies these attributes would still potentially break virtual packages, but code that modifies them in place would do the right thing automatically. (Note that all code that manipulates sys.path and __path__ attributes requires explicit calls to correctly support current namespace package mechanisms, so this would actually be an improvement on the status quo rather than making anything worse.)

I think the simplest thing, if we're keeping __path__ (and on reflection, I think we should), would be to call extend_virtual_paths() automatically on new path entries found in sys.path when an import is performed, relative to the previous value of sys.path.

That is, we save an old copy of sys.path somewhere, and whenever __import__() is called (well, once it gets past checking whether the target is already in sys.modules, anyway), it checks the current sys.path against it, and calls extend_virtual_paths() on any sys.path entries that weren't in the old sys.path.

This is not the most efficient thing in the world, as it will cause a bunch of stat calls to happen against the new directories in the middle of a possibly entirely unrelated import operation, but it would certainly address the issue in the Simplest Way That Could Possibly Work.

A stricter (safer) version of the same thing would be one where we only update __path__ values that are unchanged since we created them, and rather than only appending new entries, we replace the __path__ with a newly computed one.
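A minimal sketch of the simple snapshot-and-diff idea, assuming a module-level copy of `sys.path` and an `extend_virtual_paths` callable supplied by the caller (the PEP's proposed `pkgutil.extend_virtual_paths()` is not an existing API, so it is injected here). The names `_last_seen_path`, `new_path_entries` and `check_for_new_entries` are hypothetical.

```python
import sys

# Hypothetical snapshot of sys.path as of the previous import.
_last_seen_path = list(sys.path)


def new_path_entries(current, previous):
    """Return the entries of `current` that are absent from `previous`, in order."""
    previous_set = set(previous)
    return [entry for entry in current if entry not in previous_set]


def check_for_new_entries(extend_virtual_paths):
    """Diff sys.path against the snapshot and extend virtual packages.

    Meant to be called from __import__() once the sys.modules fast path
    has been passed. `extend_virtual_paths` is injected so this sketch
    does not depend on the proposed (hypothetical) pkgutil function.
    """
    global _last_seen_path
    added = new_path_entries(sys.path, _last_seen_path)
    _last_seen_path = list(sys.path)
    for entry in added:
        extend_virtual_paths(entry)  # may cost stat calls per virtual package
    return added
```

Entries that were removed from `sys.path` are simply ignored, matching the "only new entries" behaviour of the simple version; the stricter version would instead recompute each untouched `__path__` from scratch.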
This version is safer because it avoids corner cases like "I imported foo.bar while foo.baz 1.1 was on my path, then I prepended a directory to sys.path that has foo.baz 1.2, but I still get foo.baz 1.1 when I import." But it loses in cases where people do direct __path__ manipulation. On the other hand, it's a lot easier to say "you break it, you bought it" where __path__ manipulation is concerned, so I'm actually pretty inclined towards using the strict version.

Hey... here's a crazy idea. Suppose that a virtual package __path__ is a *tuple* instead of a list? Now, in order to change it, you *have* to replace it. And we can cache the tuple we initially set it to in sys.virtual_package_paths, so we can do an 'is' check before replacing it. Voila: __path__ still exists and is still a sequence for a virtual path, but you have to explicitly replace it if you want to do anything funky -- at which point you're responsible for maintaining it.

I'm tempted to say, "well, why not use a list-subclass proxy, then?", but that means more work for no real difference. I just went through dozens of examples of __path__ usage (found via Google), and I found exactly two examples of code that modifies a __path__ that is not:

1. In the __init__.py whose __path__ it is (i.e., code that'll still have a list), or
2. Modifying the __path__ of an explicitly-named self-contained package that's part of the same distribution.

The two examples are from Twisted and Google AppEngine. In the Twisted case, it's some sort of namespace-package-like plugin chicanery, and in the AppEngine case, well, I'm not sure what the heck it's doing, but it seems to be making sure that you can still import stuff that has the same name as stdlib stuff, or something.

The Twisted case (and an apparent copy of the same code in a project called flumotion) uses ihooks, though, so I'm not sure it'll even get executed for virtual packages.
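The tuple-plus-identity-check idea can be sketched as follows. `virtual_package_paths` is a plain dict standing in for the proposed `sys.virtual_package_paths`, and `set_virtual_path` / `refresh_virtual_path` are hypothetical helpers. The `is` test asks "is this still the exact tuple we installed?" rather than comparing contents, so any replacement by user code is left alone.

```python
# Hypothetical stand-in for the proposed sys.virtual_package_paths cache.
virtual_package_paths = {}


def set_virtual_path(module, portions):
    """Install a tuple __path__ and remember the exact object installed."""
    path = tuple(portions)
    module.__path__ = path
    virtual_package_paths[module.__name__] = path


def refresh_virtual_path(module, new_portions):
    """Recompute __path__ only if nobody has replaced it since we set it.

    Returns True if the path was replaced, False if it was left alone.
    """
    cached = virtual_package_paths.get(module.__name__)
    if cached is None or module.__path__ is not cached:
        # Someone replaced __path__: "you break it, you bought it".
        return False
    set_virtual_path(module, new_portions)
    return True
```

Because a tuple cannot be mutated in place, the only way user code can change such a `__path__` is by assigning a new object, which is exactly what the identity check detects.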
The Google case loops over everything in sys.modules, in a function by the name of appengine.dist.fix_paths()... but I wasn't able to find out who calls this function, when, or why.

So, pretty much, except for these bits of nosy code, the vast majority of code out there seems to only mess with its own self-contained paths, making the use of tuples seem like a pretty safe choice. (Oh, and all the code I found that reads paths without modifying them uses only tuple-safe operations.)

So, if we implement automatic __path__ updates for virtual packages, I'm currently leaning towards the strict approach using tuples, but could possibly be persuaded towards read-only list proxies instead.

Side note: it looks like a *lot* of code out there abuses __path__[0] to find data files, so I probably need to add a note to the PEP about not doing that when you convert a self-contained package to a virtual one. Of course, I suppose using a sentinel could address *that* problem, or an iteration-only proxy. The main concern here is that using __path__[0] will *seem* to work when you first use it with a
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 12:59 PM 7/21/2011 -0700, Reliable Domains wrote:
>I assume that the implicit extend_virtual_paths() would be smart enough to only do real work if there are virtual packages to do it in, so much of the performance costs (bunch of stats) are bounded by the existence of and number of virtual packages that have actually been imported, correct?

Yes - this is true even for an explicit call. It only does this for imported virtual packages, and child virtual packages are only checked for if the parent package exists. So, in the case of a directory being added that has no parent packages, the cost in stats is equal to the number of top-level, *imported* virtual packages.

The __path__ wrapper scheme can do this even better, and defer doing any of the stat calls until/unless another import occurs for one of those packages. So if you munge sys.path and then don't import anything from a virtual package, no extra stat calls would happen at all.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Thu, Jul 21, 2011 at 11:20 PM, P.J. Eby <p...@telecommunity.com> wrote:
> This seems to lean in favor of making a simple reiterable wrapper type for the __path__, that only allows you to take the length and iterate over it. With an appropriate design, it could actually update itself automatically, given a subname and a parent __path__/sys.path. That is, it could keep a tuple copy of the last-seen parent path, and before iteration, compare tuple(self.parent_path) to self.last_seen_path. If they're different, it rebuilds the value to be iterated over.
>
> Voila: transparent updating of all virtual __path__ values from sys.path changes (or modifications to self-contained __path__ parents, btw), and trying to change it (or read an item from it positionally) will not create any silent failures.
>
> Alright... *if* we support automatic updates to virtual __paths__, this is probably how we should do it. (It will require, though, that imp.find_module be changed to use a different iteration method than PyList_GetItem, as it's quite possible a virtual __path__ will get passed into it.)

A no-indexing tuple wrapper for virtual package __path__ values that automatically updates itself in response to parent path modifications sounds good to me (errors shall not pass silently, etc.). This also allows virtual packages to be indicated clearly just through the type of their __path__ attribute rather than having to look them up in the import state.

I still like the idea of keeping sys.virtual_packages as a dict mapping to the path values, though - it makes it easier to debug erroneous __path__ replacement in virtual packages by checking pkg.__path__ is sys.virtual_package_paths[pkg.__name__]

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
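A minimal sketch of the reiterable wrapper, under stated assumptions: the class name `VirtualPath` is hypothetical, a portion is taken to be any `<parent entry>/<subname>` directory (the real proposal would consult importers rather than `os.path.isdir()`), and only `len()` and iteration are supported, so positional access such as `__path__[0]` fails loudly with `TypeError`. Filesystem checks happen only when the path is actually used, which also gives the deferral of stat calls mentioned earlier in the thread.

```python
import os


class VirtualPath:
    """Hypothetical reiterable, non-indexable __path__ that rebuilds on demand.

    `parent_path` is any iterable of directories (sys.path, or a parent
    package's __path__, possibly another VirtualPath); `subname` is the
    last component of the package name (e.g. "baz" for "foo.baz").
    """

    def __init__(self, subname, parent_path):
        self._subname = subname
        self._parent_path = parent_path
        self._last_seen = None  # tuple copy of the parent path last used
        self._entries = ()

    def _refresh(self):
        current = tuple(self._parent_path)
        if current != self._last_seen:
            # Parent changed (or first use): recompute the portions.
            self._entries = tuple(
                os.path.join(entry, self._subname)
                for entry in current
                if os.path.isdir(os.path.join(entry, self._subname))
            )
            self._last_seen = current

    def __iter__(self):
        self._refresh()
        return iter(self._entries)

    def __len__(self):
        self._refresh()
        return len(self._entries)

    # Deliberately no __getitem__: indexing raises TypeError.
```

Because the parent is only read through `tuple(...)`, a child package's `VirtualPath` can wrap its parent's `VirtualPath`, so changes to `sys.path` propagate down the hierarchy on the next iteration.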
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Tue, 19 Jul 2011 23:58:55 -0400
P.J. Eby <p...@telecommunity.com> wrote:
> Anyway, to make a long story short, we came up with an alternative implementation plan that actually solves some other problems besides the one that PEP 382 sets out to solve, and whose implementation is a bit easier to explain. (In fact, for users coming from various other languages, it hardly needs any explanation at all.)

I have a question. If I have (on sys.path) a module x.py containing, say:

    y = 5

and (also on sys.path) a directory x containing a y.py module, what is "from x import y" supposed to do?

(Currently, it would bind y to its value in x.py.)

Regards

Antoine.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Fri, Jul 22, 2011 at 9:35 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Tue, 19 Jul 2011 23:58:55 -0400
> P.J. Eby <p...@telecommunity.com> wrote:
>> Anyway, to make a long story short, we came up with an alternative implementation plan that actually solves some other problems besides the one that PEP 382 sets out to solve, and whose implementation is a bit easier to explain. (In fact, for users coming from various other languages, it hardly needs any explanation at all.)
>
> I have a question. If I have (on sys.path) a module x.py containing, say:
>
>     y = 5
>
> and (also on sys.path) a directory x containing a y.py module, what is "from x import y" supposed to do?
>
> (Currently, it would bind y to its value in x.py.)

It would behave the same as it does today: the imported value of 'y' would be 5. Virtual packages only kick in if an import would otherwise fail.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
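Nick's answer can be checked with a small experiment under Python 3.3 or later, where a plain directory without `__init__.py` is only used if no module or regular package of that name is found. One temporary base directory holds both `x.py` (defining `y = 5`) and a sibling `x/` directory containing `y.py`. The function name is hypothetical and the layout is created on the fly.

```python
import importlib
import os
import sys
import tempfile


def module_vs_directory_demo():
    """Return what `from x import y` binds when x.py and x/y.py share a base dir."""
    with tempfile.TemporaryDirectory() as base:
        # x.py defines a plain global y = 5 ...
        with open(os.path.join(base, "x.py"), "w") as f:
            f.write("y = 5\n")
        # ... and a sibling directory x/ (no __init__.py) holds a y.py module.
        os.mkdir(os.path.join(base, "x"))
        with open(os.path.join(base, "x", "y.py"), "w") as f:
            f.write("SUBMODULE = True\n")
        sys.path.insert(0, base)
        sys.modules.pop("x", None)  # make sure x is imported afresh
        importlib.invalidate_caches()
        try:
            from x import y  # x.py wins; y is its global, not x/y.py
            return y
        finally:
            sys.path.remove(base)
            sys.modules.pop("x", None)
```

Calling `module_vs_directory_demo()` binds `y` to the global from `x.py`, and the `x/` directory is never consulted.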
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Friday, 22 July 2011 at 09:53 +1000, Nick Coghlan wrote:
> On Fri, Jul 22, 2011 at 9:35 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
>> On Tue, 19 Jul 2011 23:58:55 -0400
>> P.J. Eby <p...@telecommunity.com> wrote:
>>> Anyway, to make a long story short, we came up with an alternative implementation plan that actually solves some other problems besides the one that PEP 382 sets out to solve, and whose implementation is a bit easier to explain. (In fact, for users coming from various other languages, it hardly needs any explanation at all.)
>>
>> I have a question. If I have (on sys.path) a module x.py containing, say:
>>
>>     y = 5
>>
>> and (also on sys.path) a directory x containing a y.py module, what is "from x import y" supposed to do?
>>
>> (Currently, it would bind y to its value in x.py.)
>
> It would behave the same as it does today: the imported value of 'y' would be 5. Virtual packages only kick in if an import would otherwise fail.

Wouldn't it produce confusing situations like the above example?

Regards

Antoine.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On 7/21/2011 5:00 PM, Antoine Pitrou wrote:
> On Friday, 22 July 2011 at 09:53 +1000, Nick Coghlan wrote:
>> On Fri, Jul 22, 2011 at 9:35 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
>>> On Tue, 19 Jul 2011 23:58:55 -0400
>>> P.J. Eby <p...@telecommunity.com> wrote:
>>>> Anyway, to make a long story short, we came up with an alternative implementation plan that actually solves some other problems besides the one that PEP 382 sets out to solve, and whose implementation is a bit easier to explain. (In fact, for users coming from various other languages, it hardly needs any explanation at all.)
>>>
>>> I have a question. If I have (on sys.path) a module x.py containing, say:
>>>
>>>     y = 5
>>>
>>> and (also on sys.path) a directory x containing a y.py module, what is "from x import y" supposed to do?
>>>
>>> (Currently, it would bind y to its value in x.py.)
>>
>> It would behave the same as it does today: the imported value of 'y' would be 5. Virtual packages only kick in if an import would otherwise fail.
>
> Wouldn't it produce confusing situations like the above example?
>
> Regards
>
> Antoine.

If I have (on sys.path) a directory x containing a y.py module, and later (on sys.path) another directory x containing a y.py module, what is "from x import y" supposed to do?

OR

If I have (on sys.path) a module x.py containing, say:

    y = 5

and later (on sys.path) another module x.py containing, say:

    y = 6

what is "from x import y" supposed to do?

I guess I don't see how this new proposal makes anything more confusing than it already is?
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Thu, 21 Jul 2011 17:31:04 -0700
Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
> If I have (on sys.path) a directory x containing a y.py module, and later (on sys.path) another directory x containing a y.py module, what is "from x import y" supposed to do?
>
> OR
>
> If I have (on sys.path) a module x.py containing, say:
>
>     y = 5
>
> and later (on sys.path) another module x.py containing, say:
>
>     y = 6
>
> what is "from x import y" supposed to do?
>
> I guess I don't see how this new proposal makes anything more confusing than it already is?

It does. In your two examples, the x.py files (or the x directories) live in two different base directories; imports are then resolved in sys.path order, which is expected and intuitive.

However, you can have an x.py file and an x directory *in the same base directory which is present in sys.path*, meaning sys.path can't help disambiguate in this case.

Regards

Antoine.
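The cross-directory case that Antoine calls expected and intuitive can be demonstrated the same way: each value gets its own base directory with its own `x.py`, and whichever directory comes first on `sys.path` supplies `y`. The helper `import_y_with_order` is hypothetical.

```python
import importlib
import os
import sys
import tempfile
from contextlib import ExitStack


def import_y_with_order(values):
    """Create one base dir per value, each with an x.py defining y = value,
    put them on sys.path in the given order, and return the imported y."""
    with ExitStack() as stack:
        bases = []
        for value in values:
            base = stack.enter_context(tempfile.TemporaryDirectory())
            with open(os.path.join(base, "x.py"), "w") as f:
                f.write(f"y = {value}\n")
            bases.append(base)
        sys.path[:0] = bases  # earlier entries are searched first
        sys.modules.pop("x", None)  # forget any previously imported x
        importlib.invalidate_caches()
        try:
            from x import y
            return y
        finally:
            for base in bases:
                sys.path.remove(base)
            sys.modules.pop("x", None)
```

Swapping the order of the directories swaps the result, which is the disambiguation that sys.path cannot provide when `x.py` and `x/` sit side by side in one directory.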
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On 7/21/2011 5:38 PM, Antoine Pitrou wrote:
> However, you can have an x.py file and an x directory *in the same base directory which is present in sys.path*, meaning sys.path can't help disambiguate in this case.

Ah yes. It means there has to be one more rule for disambiguation, which Nick supplied. Your case wasn't clear to me from your first description, however.

As long as there is an ordering, and it is documented, it is not particularly confusing.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Fri, Jul 22, 2011 at 10:00 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> Wouldn't it produce confusing situations like the above example?

I don't see how it is any more confusing than any other form of module shadowing. For backwards compatibility reasons, the precedence model will be:

1. Modules and self-contained packages that can satisfy the import request are checked for first (along the whole length of sys.path).
2. If that fails, the virtual package mechanism is checked.

PEP 402 eliminates some cases of package shadowing by making __init__.py files optional, so your scenario will actually *work*, so long as the submodule name doesn't conflict with a module attribute.

*Today* if you have:

    x.py
    x.pyd
    x.so
    x/__init__.py

in the same sys.path directory, x.py wins (search order is controlled by the internal order of checks within the import system - and source files are first on that list).

With PEP 402, x.py still wins, but the submodules within the x directory become accessible so long as they don't conflict with *actual* attributes set in the x module.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
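The two-pass precedence model listed above can be simulated without touching the filesystem. In this hypothetical sketch, `resolve` takes `(directory, entries)` pairs in `sys.path` order. It deliberately does not model the order of checks *within* a single directory (between `x.py`, extension modules and `x/__init__.py`), only the two passes across the whole path.

```python
def resolve(name, path_listing):
    """Hypothetical simulation of the two-pass precedence model.

    `path_listing` is a list of (directory, entries) pairs in sys.path
    order; `entries` is a set of strings such as "x.py", "x/" (a plain
    directory) and "x/__init__.py".
    """
    # Pass 1: any module or self-contained package, along the whole path.
    for directory, entries in path_listing:
        if f"{name}.py" in entries or f"{name}/__init__.py" in entries:
            return ("self-contained", directory)
    # Pass 2: only if pass 1 failed, gather every plain `name/` directory
    # as a portion of a virtual package, again in sys.path order.
    portions = [
        f"{directory}/{name}"
        for directory, entries in path_listing
        if f"{name}/" in entries
    ]
    if portions:
        return ("virtual", portions)
    raise ImportError(f"No module named {name!r}")
```

For example, `resolve("x", [("a", {"x/"}), ("b", {"x.py"})])` returns `("self-contained", "b")`: the module in `b` wins even though the plain directory in `a` comes earlier on the path.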
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Fri, Jul 22, 2011 at 10:53 AM, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
> Ah yes. It means there has to be one more rule for disambiguation, which Nick supplied. Your case wasn't clear to me from your first description, however.
>
> As long as there is an ordering, and it is documented, it is not particularly confusing.

The genuinely confusing part is that x.py still takes precedence, even if it appears on sys.path *after* x/y.py. However, we're forced into that behaviour by backwards compatibility requirements. The alternative of allowing x/y.py to take precedence has been rejected on those grounds more than once.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Friday, 22 July 2011 at 10:58 +1000, Nick Coghlan wrote:
> On Fri, Jul 22, 2011 at 10:00 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
>> Wouldn't it produce confusing situations like the above example?
>
> I don't see how it is any more confusing than any other form of module shadowing.

The additional confusion lies in the fact that a module can be shadowed by something which is not a module (a mere global variable). I find it rather baffling.

Regards

Antoine.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 03:04 AM 7/22/2011 +0200, Antoine Pitrou wrote:
>The additional confusion lies in the fact that a module can be shadowed by something which is not a module (a mere global variable). I find it rather baffling.

If you move x.py to x/__init__.py, it does *exactly the same thing* in current versions of Python:

    Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from x import y
    >>> import x.y
    >>> x.y
    <module 'x.y' from 'x\y.py'>
    >>> y
    5

The PEP does nothing new or different here. If something is baffling you, it's the behavior of "from ... import", not the actual importing process. "from x import y" means "import x; y = x.y". The PEP does not propose we change this. ;-)
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Fri, Jul 22, 2011 at 11:04 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Friday, 22 July 2011 at 10:58 +1000, Nick Coghlan wrote:
>> On Fri, Jul 22, 2011 at 10:00 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
>>> Wouldn't it produce confusing situations like the above example?
>>
>> I don't see how it is any more confusing than any other form of module shadowing.
>
> The additional confusion lies in the fact that a module can be shadowed by something which is not a module (a mere global variable). I find it rather baffling.

It's still an improvement on current Python. There, a submodule can be shadowed uselessly by something that doesn't even exist. For example:

    x.py            -- No 'y' attribute
    x/__init__.py   -- not needed in PEP 402
    x/y.py

    from x import y -- ImportError now, but would work in PEP 402

However, this does highlight an interesting corner case not yet covered by the PEP: when building a virtual path to add to an existing module, what do we do with directories that contain __init__.py[co] files?

1. Ignore the entire directory (i.e. leave it out of the created path)? (always emit ImportWarning)
2. Ignore the file and add the directory to the created path anyway? (never emit ImportWarning)
3. Ignore the file and add the directory to the created path anyway? (emit ImportWarning if __init__.py is not empty)
4. Ignore the file only if it is empty, otherwise ignore the whole directory? (emit ImportWarning if __init__.py is not empty)
5. Execute the file in the namespace of the existing module?

I suspect option 1 will lead to the fewest quirks, since it preserves current shadowing behaviour for modules and self-contained packages.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
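Option 1 might look like this when collecting portions for a virtual path. `build_virtual_path` and `INIT_FILES` are hypothetical names, and `os.path.isdir()` / `os.path.isfile()` stand in for whatever the real importer protocol would use. `ImportWarning` is ignored by Python's default warning filters, so it only shows up with `-W` or an explicit filter.

```python
import os
import warnings

# The "__init__.py[co]" files from the message above.
INIT_FILES = ("__init__.py", "__init__.pyc", "__init__.pyo")


def build_virtual_path(name, search_path):
    """Collect `name/` directories from `search_path`, applying option 1:
    a directory containing an __init__ file is left out entirely, and an
    ImportWarning is emitted for each one skipped."""
    portions = []
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if not os.path.isdir(candidate):
            continue
        if any(os.path.isfile(os.path.join(candidate, f)) for f in INIT_FILES):
            warnings.warn(
                f"{candidate!r} contains an __init__ file; "
                "not added to the virtual package path",
                ImportWarning,
            )
            continue
        portions.append(candidate)
    return portions
```

Skipping the whole directory keeps a self-contained package from being silently merged into a virtual one, which is the shadowing behaviour option 1 aims to preserve.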
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Wed, Jul 20, 2011 at 1:58 PM, P.J. Eby <p...@telecommunity.com> wrote:
> So, without further ado, here it is:

I pushed this version up to the PEPs repo, so it now has a number (402) and can be read in prettier HTML format:
http://www.python.org/dev/peps/pep-0402/

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
+1 (and yay!)

--
Piotr Ożarowski                         Debian GNU/Linux Developer
www.ozarowski.pl          www.griffith.cc          www.debian.org
GPG Fingerprint: 1D2F A898 58DA AF62 1786 2DF7 AEF6 F1A2 A745 7645
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On 7/19/2011 8:58 PM, P.J. Eby wrote:
> Standard Library Changes/Additions
> ----------------------------------
>
> The ``pkgutil`` module should be updated to handle this specification appropriately, including any necessary changes to ``extend_path()``, ``iter_modules()``, etc.
>
> Specifically the proposed changes and additions to ``pkgutil`` are:
>
> * A new ``extend_virtual_paths(path_entry)`` function, to extend existing, already-imported virtual packages' ``__path__`` attributes to include any portions found in a new ``sys.path`` entry. This function should be called by applications extending ``sys.path`` at runtime, e.g. when adding a plugin directory or an egg to the path.
>
>   The implementation of this function does a simple top-down traversal of ``sys.virtual_packages``, and performs any necessary ``get_subpath()`` calls to identify what path entries need to be added to each package's ``__path__``, given that `path_entry` has been added to ``sys.path``. (Or, in the case of sub-packages, adding a derived subpath entry, based on their parent namespace's ``__path__``.)

When I read about creating __path__ from sys.path, I immediately thought of the issue of programs that extend sys.path, and the above is the workaround for such programs. But it requires such programs to do work, and there are a lot of such programs (I, a relative newbie, have had to write some). As it turns out, I can't think of a situation where I have extended sys.path that would result in a problem for fancy namespace packages, because so far I've only written modules, not packages, and only modules are on the paths that I add to sys.path. But that does not make for a general solution.

Is there some way to create a new __path__ that would reflect the fact that it has been dynamically created, rather than set from __init__.py, and then when it is referenced, calculate (and cache?) a new value of __path__ to actually search?
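The top-down traversal described in the quoted PEP text can be sketched as below. This is not an existing `pkgutil` API: `virtual_packages` is a plain dict standing in for the proposed `sys.virtual_packages` (assumed here to map dotted names of already-imported virtual packages to their `__path__` lists), and a simple `os.path.isdir()` test stands in for the PEP's `get_subpath()`. An empty registry makes the call a cheap no-op.

```python
import os


def extend_virtual_paths(path_entry, virtual_packages):
    """Hypothetical sketch of the proposed pkgutil.extend_virtual_paths().

    Returns a dict mapping each package name to the portions it gained.
    """
    added = {}
    # Visiting names in order of nesting depth is a top-down traversal:
    # every parent is processed before any of its children.
    for name in sorted(virtual_packages, key=lambda n: n.count(".")):
        parent, _, tail = name.rpartition(".")
        # Top-level packages look in the new sys.path entry itself;
        # sub-packages look only in portions just added to their parent.
        bases = added.get(parent, []) if parent else [path_entry]
        new = [
            os.path.join(base, tail)
            for base in bases
            if os.path.isdir(os.path.join(base, tail))
        ]
        new = [p for p in new if p not in virtual_packages[name]]
        virtual_packages[name].extend(new)
        added[name] = new
    return added
```

Deriving a sub-package's new portions only from what its parent just gained is what keeps the cost proportional to the imported virtual packages rather than to everything on disk.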
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 06:46 PM 7/20/2011 +1000, Nick Coghlan wrote:
>On Wed, Jul 20, 2011 at 1:58 PM, P.J. Eby <p...@telecommunity.com> wrote:
>> So, without further ado, here it is:
>
>I pushed this version up to the PEPs repo, so it now has a number (402) and can be read in prettier HTML format:
>http://www.python.org/dev/peps/pep-0402/

Technically, shouldn't this be a 3XXX series PEP? Or are we not doing those any more now that all PEPs would be 3XXX?
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 02:24 AM 7/20/2011 -0700, Glenn Linderman wrote:
>When I read about creating __path__ from sys.path, I immediately thought of the issue of programs that extend sys.path, and the above is the workaround for such programs. But it requires such programs to do work, and there are a lot of such programs (I, a relative newbie, have had to write some). As it turns out, I can't think of a situation where I have extended sys.path that would result in a problem for fancy namespace packages, because so far I've only written modules, not packages, and only modules are on the paths that I add to sys.path. But that does not make for a general solution.

Most programs extend sys.path in order to import things. If those things aren't yet imported, they don't have a __path__ yet, and so don't need to be fixed. Only programs that modify sys.path *after* importing something that has a dynamic __path__ would need to do anything about that.

>Is there some way to create a new __path__ that would reflect the fact that it has been dynamically created, rather than set from __init__.py, and then when it is referenced, calculate (and cache?) a new value of __path__ to actually search?

That's what extend_virtual_paths() is for. It updates the __path__ of all currently-imported virtual packages. Where before you wrote:

    sys.path.append('foo')

You would now write:

    sys.path.append('foo')
    pkgutil.extend_virtual_paths('foo')

...assuming you have virtual packages you've already imported. If you don't, there's no reason to call extend_virtual_paths(). But it doesn't hurt anything if you call it unnecessarily, because it uses sys.virtual_packages to find out what to update, and if you haven't imported any virtual packages, there's nothing to update and the call will be a quick no-op.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On 07/20/2011 08:57 AM, P.J. Eby wrote:
> At 06:46 PM 7/20/2011 +1000, Nick Coghlan wrote:
>> On Wed, Jul 20, 2011 at 1:58 PM, P.J. Eby <p...@telecommunity.com> wrote:
>>> So, without further ado, here it is:
>>
>> I pushed this version up to the PEPs repo, so it now has a number (402) and can be read in prettier HTML format:
>> http://www.python.org/dev/peps/pep-0402/
>
> Technically, shouldn't this be a 3XXX series PEP? Or are we not doing those any more now that all PEPs would be 3XXX?

I think we're back to normal PEP numbering. PEP 382 was also 3.x only.

Eric.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Tue, 19 Jul 2011 23:58:55 -0400, P.J. Eby p...@telecommunity.com wrote: Worse, this is not just a problem for new users: it prevents *anyone* from easily splitting a package into separately-installable components. In Perl terms, it would be as if every possible ``Net::`` module on CPAN had to be bundled up and shipped in a single tarball! In general the simplicity of the proposed mechanism and implementation is attractive. However, this bit of discussion struck me as sending the wrong message. We don't *want* something like the CPAN module hierarchy. I prefer to keep things as flat as practical. Namespace packages clearly have utility, but please let's not descend into Java-esque package hierarchies. -- R. David Murray http://www.bitdance.com
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
I wonder if this fixes the long-standing issue in OS vendors' distributions. In Fedora, for example, there are both arch-specific and non-arch directories: /usr/lib/python2.7 and /usr/lib64/python2.7. Pure Python goes into /usr/lib/python2.7, and code including binaries goes into /usr/lib64/python2.7. But if a package has both, it all has to go into /usr/lib64/python2.7, because the current loader can't find pieces in 2 different directories. You can't have both /usr/lib/python2.7/site-packages/foo and /usr/lib64/python2.7/site-packages/foo. So if this PEP will allow pieces of foo to be found in 2 different places, that would be helpful, IMO.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 10:40 AM 7/20/2011 -0400, Neal Becker wrote: I wonder if this fixes the long-standing issue in OS vendors' distributions. In Fedora, for example, there are both arch-specific and non-arch directories: /usr/lib/python2.7 and /usr/lib64/python2.7. Pure Python goes into /usr/lib/python2.7, and code including binaries goes into /usr/lib64/python2.7. But if a package has both, it all has to go into /usr/lib64/python2.7, because the current loader can't find pieces in 2 different directories. You can't have both /usr/lib/python2.7/site-packages/foo and /usr/lib64/python2.7/site-packages/foo. So if this PEP will allow pieces of foo to be found in 2 different places, that would be helpful, IMO. It's more of a long-term solution than a short-term one. In order for it to work the way you want, 'foo' would need to have its main code in foo.py rather than foo/__init__.py. You could of course make that change on the author's behalf for your distro, or remove it altogether if it doesn't contain any actual code. However, if you're going to make changes anyway, you could instead change its __init__.py to append extra directories to the module __path__... and that's something you can do right now.
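The "you can do that right now" option refers to a package's __init__.py extending its own __path__. A minimal sketch using the standard library's pkgutil.extend_path(); the two temporary directories stand in for /usr/lib/python2.7/site-packages and /usr/lib64/python2.7/site-packages, and the package name `splitdemo_pkg` is made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Stand-ins for the arch-independent and arch-specific site-packages.
pure_root = tempfile.mkdtemp()
arch_root = tempfile.mkdtemp()


def write(path, text=""):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


# The pure-Python half ships the real __init__.py, which asks pkgutil to add
# every other 'splitdemo_pkg' directory found on sys.path to __path__.
write(os.path.join(pure_root, "splitdemo_pkg", "__init__.py"),
      "from pkgutil import extend_path\n"
      "__path__ = extend_path(__path__, __name__)\n")
write(os.path.join(pure_root, "splitdemo_pkg", "pure.py"), "KIND = 'pure'\n")
# The arch-specific half has no __init__.py of its own.
write(os.path.join(arch_root, "splitdemo_pkg", "native.py"), "KIND = 'native'\n")

sys.path[:0] = [pure_root, arch_root]
importlib.invalidate_caches()

import splitdemo_pkg.pure
import splitdemo_pkg.native  # found through the extended __path__

print(splitdemo_pkg.pure.KIND, splitdemo_pkg.native.KIND)  # pure native
```

The cost of this approach is exactly what the reply points out: someone has to edit the package's __init__.py, which is the step the PEP aims to make unnecessary.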
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Tue, Jul 19, 2011 at 8:58 PM, P.J. Eby p...@telecommunity.com wrote: The biggest likely exception to the above would be when a piece of code tries to check whether some package is installed by importing it. If this is done *only* by importing a top-level module (i.e., not checking for a ``__version__`` or some other attribute), *and* there is a directory of the same name as the sought-for package on ``sys.path`` somewhere, *and* the package is not actually installed, then such code could *perhaps* be fooled into thinking a package is installed that really isn't. This part worries me slightly. Imagine a program laid out as follows:

    datagen.py
    json/foo.js
    json/bar.js

datagen.py uses the files in json/ to generate sample data for a database. In datagen.py is the following code:

    try:
        import json
    except ImportError:
        import simplejson as json

Currently, this works just fine, but it will break (as I understand it) under the PEP because the json directory will become a virtual package and no ImportError will be raised. Is there a mitigation for this in the PEP that I've missed? However, even in the rare case where all these conditions line up to happen at once, the failure is more likely to be annoying than damaging. In most cases, after all, the code will simply fail a little later on, when it actually tries to DO something with the imported (but empty) module. (And code that checks ``__version__`` attributes or for the presence of some desired function, class, or module in the package will not see a false positive result in the first place.) It may only be annoying, but it's still a breaking change, and a subtle one at that. Checking __version__ is of course possible, but it's never been necessary before, so it's unlikely there's much code that does it. It also makes the fallback code significantly less neat.
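PEP 402 was not ultimately accepted; Python 3.3 adopted PEP 420 instead, whose namespace packages also turn a bare directory on sys.path into an importable package when no regular module or package of that name is found. That makes the false positive easy to reproduce today. A minimal sketch, with the made-up name `fastjson_optional` standing in for an optional dependency that is assumed not to be installed (with the real `json`, the stdlib module would still win, because regular modules take precedence over namespace portions). `has_usable` is a hypothetical helper showing the attribute check mentioned above as the workaround.

```python
import importlib
import os
import sys
import tempfile

# A data directory that merely shares its name with an optional dependency.
script_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(script_dir, "fastjson_optional"))
with open(os.path.join(script_dir, "fastjson_optional", "foo.js"), "w") as f:
    f.write("{}")
sys.path.insert(0, script_dir)
importlib.invalidate_caches()

# The classic probe "succeeds" because the directory becomes a namespace package.
try:
    import fastjson_optional
    naive_found = True
except ImportError:
    naive_found = False


def has_usable(module_name, required_attr):
    """Import module_name and also require an attribute the real package defines."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, required_attr)


print(naive_found)                               # True (false positive)
print(has_usable("fastjson_optional", "dumps"))  # False
```

As noted above, the attribute check works but makes the fallback code noticeably less tidy than a bare try/except around the import.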
- Jeff
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Wed, Jul 20, 2011 at 11:56 AM, Jeff Hardy jdha...@gmail.com wrote: On Tue, Jul 19, 2011 at 8:58 PM, P.J. Eby p...@telecommunity.com wrote: The biggest likely exception to the above would be when a piece of code tries to check whether some package is installed by importing it. If this is done *only* by importing a top-level module (i.e., not checking for a ``__version__`` or some other attribute), *and* there is a directory of the same name as the sought-for package on ``sys.path`` somewhere, *and* the package is not actually installed, then such code could *perhaps* be fooled into thinking a package is installed that really isn't. This part worries me slightly. Imagine a program laid out as follows:

    datagen.py
    json/foo.js
    json/bar.js

datagen.py uses the files in json/ to generate sample data for a database. In datagen.py is the following code:

    try:
        import json
    except ImportError:
        import simplejson as json

Currently, this works just fine, but it will break (as I understand it) under the PEP because the json directory will become a virtual package and no ImportError will be raised. Is there a mitigation for this in the PEP that I've missed? This problem was brought up a few times on import-sig, but I don't think a solution was ever decided on. The best solution I can think of would be to have a way for a module to mark itself as finalized (I'm not sure if that's the best term--just the first that popped into my head). This would prevent its __path__ from being created or extended in any way. For example, if the json module contains `__finalized__ = True` or something of the like, any `import json.foo` would immediately fail. Of course, this would put all the onus on the json module to solve this problem, and other modules might actually wish to be extendable into packages, in which case you'd still have this problem. In that case there would need to be a way to mark a directory as not containing importable code.
Not sure what the best approach to that would be, especially since one of the goals of this PEP seems to be to avoid marker files. Erik
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 08:56 AM 7/20/2011 -0700, Jeff Hardy wrote: On Tue, Jul 19, 2011 at 8:58 PM, P.J. Eby p...@telecommunity.com wrote: The biggest likely exception to the above would be when a piece of code tries to check whether some package is installed by importing it. If this is done *only* by importing a top-level module (i.e., not checking for a ``__version__`` or some other attribute), *and* there is a directory of the same name as the sought-for package on ``sys.path`` somewhere, *and* the package is not actually installed, then such code could *perhaps* be fooled into thinking a package is installed that really isn't. This part worries me slightly. Imagine a program laid out as follows:

    datagen.py
    json/foo.js
    json/bar.js

datagen.py uses the files in json/ to generate sample data for a database. In datagen.py is the following code:

    try:
        import json
    except ImportError:
        import simplejson as json

Currently, this works just fine, but it will break (as I understand it) under the PEP because the json directory will become a virtual package and no ImportError will be raised. Well, it won't fail as long as there actually *is* a json module or package on the path. ;-) But I do see your point. Is there a mitigation for this in the PEP that I've missed? A possible mitigation would be to require that get_subpath() only return a directory name if that directory in fact contains importable modules somewhere. This is actually discussed a bit later as an open issue under Implementation Notes, indicating that iter_modules() has this issue as well. The main open questions in doing this kind of checking have to do with recursion: it's perfectly valid to have, say, a 'zc/' directory whose only content is a 'buildout/' subdirectory. Of course, it still wouldn't help if the 'json/' subdirectory in your example did contain .py files.
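The "contains importable modules somewhere" check has to recurse to accept a 'zc/' directory whose only content is 'buildout/'. A minimal sketch of such a check; `contains_importable_code` is a hypothetical helper, not the PEP's get_subpath(), and it assumes that any file ending in one of importlib.machinery.all_suffixes() counts as importable.

```python
import os
import tempfile
from importlib.machinery import all_suffixes

# Source, bytecode and extension-module suffixes for this interpreter.
IMPORTABLE_SUFFIXES = tuple(all_suffixes())


def contains_importable_code(directory):
    """Return True if directory, or any directory below it, holds a module file."""
    for _root, _dirs, files in os.walk(directory):
        if any(name.endswith(IMPORTABLE_SUFFIXES) for name in files):
            return True
    return False


def make_tree(base, relative_files):
    for rel in relative_files:
        path = os.path.join(base, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


with tempfile.TemporaryDirectory() as base:
    make_tree(base, ["json/foo.js", "json/bar.js", "zc/buildout/__init__.py"])
    json_ok = contains_importable_code(os.path.join(base, "json"))
    zc_ok = contains_importable_code(os.path.join(base, "zc"))

print(json_ok, zc_ok)  # False True
```

The walk can be expensive on large data directories, and, as noted above, it still accepts a 'json/' directory that happens to contain .py files.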
There is another possibility, though: What if we change the logic for pure-virtual package creation so that the parent module is created *if and only if* a child module is found? In that case, trying to import a pure virtual 'zc' package would fail, but importing 'zc.buildout' would succeed as long as there was a zc/buildout.py or a zc/buildout/__init__.py somewhere. And in your example, 'import json' would fail -- which is to say, succeed. ;-) This is a minor change to the spec, though perhaps a bit hairier to implement in practice. The current import.c loop over the module name parts (iterating over say, 'zc', then 'buildout', and importing them in turn) would need to be reworked so that it could either roll back the virtual package creation in the event of sub-import failure or conversely delay creation of the parent package(s) until a sub-import finds a module. I certainly think it's *doable*, mind you, but I'd hate to have to do it in C. ;-) Hm. Here's another variant that might be easier to implement (even in C), and could offer some other advantages as well. Suppose we replace the sys.virtual_packages set() with a sys.virtual_paths dict(): a dictionary that maps from module names to __path__ lists, and that's populated by the __path__ creation algorithm described in the PEP. (An empty list would mean that __path__ creation failed for that module/package name.) Now, if a module doesn't have a __path__ (or doesn't exist), we look in sys.virtual_paths for the module name. If the retrieved list is empty, we fail the import. If it's not, we proceed... but *don't* create a module or set the existing module's __path__. Then, at the point where an import succeeds, and we're going to set an attribute on the parent module, we recursively construct parent modules and set their __path__ attributes from sys.virtual_paths, if a module doesn't exist in sys.path, or its __path__ isn't set. Voila. 
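The sys.virtual_paths variant can be modelled without touching the real import system. This is a toy under stated assumptions, not the PEP's or CPython's code: `findable` stands in for what finders can locate as real modules, `virtual_paths` for the proposed sys.virtual_paths dict (an empty list meaning __path__ creation failed), and `modules` for sys.modules. Parents are only materialised once a child import is about to succeed, so a failed `import json.foo` leaves no empty package behind, and a pure virtual package cannot be imported directly.

```python
import types


class ToyImporter:
    """Toy model of deferred parent-module creation."""

    def __init__(self, findable, virtual_paths):
        self.findable = set(findable)             # names a finder can load as real modules
        self.virtual_paths = dict(virtual_paths)  # name -> computed __path__ ([] = none)
        self.modules = {}                         # stand-in for sys.modules

    def _parent_is_usable(self, name):
        mod = self.modules.get(name)
        if mod is not None and hasattr(mod, "__path__"):
            return True                            # an ordinary, already-imported package
        return bool(self.virtual_paths.get(name))  # otherwise it needs a virtual path

    def _materialize(self, name):
        # Called only when a child import is about to succeed.
        mod = self.modules.get(name)
        if mod is None:
            mod = self.modules[name] = types.ModuleType(name)
            head, _, tail = name.rpartition(".")
            if head:
                setattr(self._materialize(head), tail, mod)
        if not hasattr(mod, "__path__"):
            mod.__path__ = list(self.virtual_paths.get(name, []))
        return mod

    def import_name(self, fullname):
        if fullname in self.modules:
            return self.modules[fullname]
        parts = fullname.split(".")
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in self.findable and prefix not in self.modules:
                self.import_name(prefix)           # real parent modules load normally
            if not self._parent_is_usable(prefix):
                raise ImportError(f"No module named {fullname!r}")
        if fullname not in self.findable:          # includes pure virtual packages
            raise ImportError(f"No module named {fullname!r}")
        module = types.ModuleType(fullname)
        head, _, tail = fullname.rpartition(".")
        if head:
            setattr(self._materialize(head), tail, module)
        self.modules[fullname] = module
        return module


toy = ToyImporter(findable={"zc.buildout", "json"},
                  virtual_paths={"zc": ["/site-a/zc", "/site-b/zc"], "json": []})
for name in ("zc", "json.foo"):
    try:
        toy.import_name(name)
    except ImportError as exc:
        print(exc)                                 # No module named 'zc', then 'json.foo'
buildout = toy.import_name("zc.buildout")
print(toy.modules["zc"].__path__)                  # ['/site-a/zc', '/site-b/zc']
print(hasattr(toy.modules["json"], "__path__"))    # False
```

Note that the real json module is imported normally but never gains a __path__, matching the behaviour described for an existing json.py.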
Now there are fewer introspection problems as well: trying to 'import json.foo' when there's no 'foo.py' in any json/ directory will *not* create an empty 'json' package in sys.modules as a side-effect. And it won't add a __path__ to the 'json' module if there were a json.py found, either. What's more, since importing a pure virtual package now fails unless you've successfully imported something from it before, it makes more sense for it to not have a __file__, or a __file__ of None. Actually, it's too bad that we have to have parent packages in sys.modules, or I'd suggest we just make pure virtual packages unimportable, period. Technically, we *could* always create dummy parent modules for virtual packages and *not* put them in sys.modules, but I'm not sure if that's a good idea. It would be more consistent in some ways with the idea that virtual packages are not directly importable, but an interesting side effect would be that if module A does: import foo.bar and module B does: import foo.baz Then module A's version of 'foo' has *only* a 'bar' attribute and B's version has *only* a 'baz' attribute. This
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 12:37 PM 7/20/2011 -0400, Erik wrote: The best solution I can think of would be to have a way for a module to mark itself as finalized (I'm not sure if that's the best term--just the first that popped into my head). This would prevent its __path__ from being created or extended in any way. For example, if the json module contains `__finalized__ = True` or something of the like, any `import json.foo` would immediately fail. That wouldn't actually fix the problem Jeff brought up, which was the case where there *wasn't* a json.py. In any case, we can fix this now by banning direct import of pure-virtual packages. In that case there would need to be a way to mark a directory as not containing importable code. Not sure what the best approach to that would be, especially since one of the goals of this PEP seems to be to avoid marker files. For this particular issue, we don't need it. For tools that process Python code, or use pkgutil.walk_modules(), there may still be use cases, so we'll keep an eye open for relevant input. Hopefully someone will say something that jars loose an idea or two, as happened with Jeff's issue above. (Btw, as we speak, I am swiping Jeff's example and adding it into the PEP. ;-) It makes a great motivating example for banning pure-virtual package imports.)
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Wed, Jul 20, 2011 at 11:04 AM, P.J. Eby p...@telecommunity.com wrote: Hm. Here's another variant that might be easier to implement (even in C), and could offer some other advantages as well. Suppose we replace the sys.virtual_packages set() with a sys.virtual_paths dict(): a dictionary that maps from module names to __path__ lists, and that's populated by the __path__ creation algorithm described in the PEP. (An empty list would mean that __path__ creation failed for that module/package name.) Now, if a module doesn't have a __path__ (or doesn't exist), we look in sys.virtual_paths for the module name. If the retrieved list is empty, we fail the import. If it's not, we proceed... but *don't* create a module or set the existing module's __path__. Then, at the point where an import succeeds, and we're going to set an attribute on the parent module, we recursively construct parent modules and set their __path__ attributes from sys.virtual_paths, if a module doesn't exist in sys.path, or its __path__ isn't set. (I'm guessing you meant sys.modules in that last sentence.) This is a really nice solution. So a virtual package is not imported until a submodule of the virtual package is successfully imported (except for direct import of pure virtual packages). It seems like sys.virtual_packages should be populated even during a failed submodule import. Is that right? Also, it makes sense that the above applies to all virtual packages, not just pure ones. Voila. Now there are fewer introspection problems as well: trying to 'import json.foo' when there's no 'foo.py' in any json/ directory will *not* create an empty 'json' package in sys.modules as a side-effect. And it won't add a __path__ to the 'json' module if there were a json.py found, either. What's more, since importing a pure virtual package now fails unless you've successfully imported something from it before, it makes more sense for it to not have a __file__, or a __file__ of None. 
Actually, it's too bad that we have to have parent packages in sys.modules, or I'd suggest we just make pure virtual packages unimportable, period. It wouldn't be that hard to disallow their direct import entirely, but still allow the indirect import when successfully importing a submodule. However, that would effectively imply that the import of submodules of the virtual package will also fail. In other words, it may be a source of confusion if a package can't be imported but its submodule can. There is one remaining difference between the two types of virtual packages that's derived from allowing direct import of pure virtual packages. When a pure virtual package is directly imported, a new [empty] module is created and its __path__ is set to the matching value in sys.virtual_packages. However, an impure virtual package is not created upon direct import, and its __path__ is not updated until a submodule import is attempted. Even the sys.virtual_packages entry is not generated until the submodule attempt, since the virtual package mechanism doesn't kick in until the point that an ImportError is currently raised. This isn't that big a deal, but it would be the one behavioral difference between the two kinds of virtual packages. So either leave that one difference, disallow direct import of pure virtual packages, or attempt to make virtual packages for all non-package imports. That last one would impose the virtual package overhead on many more imports so it is probably too impractical. I'm fine with leaving the one difference. Technically, we *could* always create dummy parent modules for virtual packages and *not* put them in sys.modules, but I'm not sure if that's a good idea. 
It would be more consistent in some ways with the idea that virtual packages are not directly importable, but an interesting side effect would be that if module A does: import foo.bar and module B does: import foo.baz Then module A's version of 'foo' has *only* a 'bar' attribute and B's version has *only* a 'baz' attribute. This could be considered a good thing, a bad thing, or a weird thing, depending on how you look at it. ;-) Probably, we should stick with the current shared 'foo' instance, even for pure virtual packages. It's just that 'foo' should not exist in sys.packages until one of the above imports succeeds. (Guessing you meant sys.virtual_packages.) Agreed. FYI, last night I started on an importlib-based implementation for the PEP and the above solution would be really easy to incorporate. -eric Anyway, thanks for bringing this issue up, because now we can fix the hole *entirely*. If pure virtual packages can never be imported directly, then they can *never* create false positive imports -- and the Backward Compatibility part of the PEP gets shorter. ;-) Hurray! (I'm tempted to run off and tweak the PEP for this right now, but I want to see if any of the folks who'd be doing the actual 3.x
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On 7/20/2011 1:04 PM, P.J. Eby wrote: This part worries me slightly. Imagine a program laid out as follows:

    datagen.py
    json/foo.js
    json/bar.js

datagen.py uses the files in json/ to generate sample data for a database. In datagen.py is the following code:

    try:
        import json
    except ImportError:
        import simplejson as json

While reading the PEP, I worried about this standard usage too but missed the scenario you imagined. Good catch. A possible mitigation would be to require that get_subpath() only return a directory name if that directory in fact contains importable modules somewhere. This is actually discussed a bit later as an open issue under Implementation Notes, indicating that iter_modules() has this issue as well. If one actually wants to create a bare-as-possible empty module, one can do that now either with a directory containing an empty __init__.py or, even cleaner, imp.new_module. So there is no need for the new mechanism to ever duplicate either ;-). So +1 on improving back-compatibility. -- Terry Jan Reedy
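For reference, the "even cleaner" route to a bare module: imp.new_module() was the spelling at the time, but the imp module has since been removed (Python 3.12), and types.ModuleType builds the same kind of bare module object. A minimal sketch; the module name is arbitrary.

```python
import types

# A bare module object: name and docstring only, no __file__ and no __path__.
# It is not registered in sys.modules unless you put it there yourself.
empty = types.ModuleType("empty_demo", "An intentionally empty module.")

print(empty.__name__, empty.__doc__)                            # empty_demo An intentionally empty module.
print(hasattr(empty, "__file__"), hasattr(empty, "__path__"))   # False False
```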
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 01:35 PM 7/20/2011 -0600, Eric Snow wrote: This is a really nice solution. So a virtual package is not imported until a submodule of the virtual package is successfully imported Correct... (except for direct import of pure virtual packages). Not correct. ;-) What we do is avoid creating a parent module or altering its __path__ until a submodule/subpackage import is just about to be successfully completed. See the change I just pushed to the PEP: http://hg.python.org/peps/rev/a6f02035c66c Or read the revised Specification section here (which is a bit easier to read than the diff): http://www.python.org/dev/peps/pep-0402/#specification The change is basically that we wait until a successful find_module() happens before creating or tweaking any parent modules. This way, the load_module() part will still see an initialized parent package in sys.modules, and if it does any relative imports, they'll still work. (It *does* mean that if an error happens during load_module(), then future imports of the virtual package will succeed, but I'm okay with that corner case.) It seems like sys.virtual_packages should be populated even during a failed submodule import. Is that right? Yes. In the actual draft, btw, I dubbed it ``sys.virtual_package_paths`` and made it a dictionary. This actually makes the pkgutil.extend_path() code more general: it'll be able to fix the paths of things you haven't actually imported yet. ;-) Also, it makes sense that the above applies to all virtual packages, not just pure ones. Well, if the package isn't pure then what you've imported is really just an ordinary module, not a package at all. ;-) When a pure virtual package is directly imported, a new [empty] module is created and its __path__ is set to the matching value in sys.virtual_packages. However, an impure virtual package is not created upon direct import, and its __path__ is not updated until a submodule import is attempted. 
Even the sys.virtual_packages entry is not generated until the submodule attempt, since the virtual package mechanism doesn't kick in until the point that an ImportError is currently raised. This isn't that big a deal, but it would be the one behavioral difference between the two kinds of virtual packages. So either leave that one difference, disallow direct import of pure virtual packages, or attempt to make virtual packages for all non-package imports. That last one would impose the virtual package overhead on many more imports so it is probably too impractical. I'm fine with leaving the one difference. At this point, I've updated the PEP to disallow direct imports of pure virtual packages. AFAICT it's the only approach that ensures you can't get false positive imports by having unrelated-but-similarly-named directories floating around. So, really, there's not a difference, except that you can't import a useless empty module that you have no real business importing in the first place... and I'm fine with that. ;-) FYI, last night I started on an importlib-based implementation for the PEP and the above solution would be really easy to incorporate. Well, you might want to double-check that now that I've updated the spec. ;-) In the new approach, you cannot rely on parent modules existing before proceeding to the submodule import. However, I've just glanced at the importlib trunk, and I think I see what you mean. It's already using a recursive approach, rather than an iterative one, so the change should be a lot simpler there than in import.c. There probably just needs to be a pair of functions like:

    def _get_parent_path(parent):
        pmod = sys.modules.get(parent)
        if pmod is None:
            try:
                pmod = _gcd_import(parent)
            except ImportError:
                # Can't import parent, is it a virtual package?
                path = imp.get_virtual_path(parent)
                if not path:
                    # no, allow the parent's import error to propagate
                    raise
                return path
        if hasattr(pmod, '__path__'):
            return pmod.__path__
        else:
            return imp.get_virtual_path(parent)

    def _get_parent_module(parent):
        pmod = sys.modules.get(parent)
        if pmod is None:
            pmod = sys.modules[parent] = imp.new_module(parent)
            if '.' in parent:
                head, _, tail = parent.rpartition('.')
                setattr(_get_parent_module(head), tail, pmod)
        if not hasattr(pmod, '__path__'):
            pmod.__path__ = imp.get_virtual_path(parent)
        return pmod

And then instead of hanging on to parent_module during the import process, you'd just grab a path from _get_parent_path(), and initialize parent_module a little later, i.e.:

    if parent:
        path = _get_parent_path(parent)
        if not path:
            msg = (_ERR_MSG + '; {} is not a
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Wed, Jul 20, 2011 at 2:44 PM, P.J. Eby p...@telecommunity.com wrote: At 01:35 PM 7/20/2011 -0600, Eric Snow wrote: This is a really nice solution. So a virtual package is not imported until a submodule of the virtual package is successfully imported Correct... (except for direct import of pure virtual packages). Not correct. ;-) What we do is avoid creating a parent module or altering its __path__ until a submodule/subpackage import is just about to be successfully completed. Good point, though I was talking about direct imports of pure virtual packages (which you've indicated are disallowed by the current draft). Also, it makes sense that the above applies to all virtual packages, not just pure ones. Well, if the package isn't pure then what you've imported is really just an ordinary module, not a package at all. ;-) I meant that if the submodule import fails in the impure case, the existing module does not end up with a __path__. When a pure virtual package is directly imported, a new [empty] module is created and its __path__ is set to the matching value in sys.virtual_packages. However, an impure virtual package is not created upon direct import, and its __path__ is not updated until a submodule import is attempted. Even the sys.virtual_packages entry is not generated until the submodule attempt, since the virtual package mechanism doesn't kick in until the point that an ImportError is currently raised. This isn't that big a deal, but it would be the one behavioral difference between the two kinds of virtual packages. So either leave that one difference, disallow direct import of pure virtual packages, or attempt to make virtual packages for all non-package imports. That last one would impose the virtual package overhead on many more imports so it is probably too impractical. I'm fine with leaving the one difference. At this point, I've updated the PEP to disallow direct imports of pure virtual packages. 
AFAICT it's the only approach that ensures you can't get false positive imports by having unrelated-but-similarly-named directories floating around. I see what you mean. That case is probably more important than the case of having a package that fails to import but submodules of the package that succeed. FYI, last night I started on an importlib-based implementation for the PEP and the above solution would be really easy to incorporate. Well, you might want to double-check that now that I've updated the spec. ;-) In the new approach, you cannot rely on parent modules existing before proceeding to the submodule import. However, I've just glanced at the importlib trunk, and I think I see what you mean. It's already using a recursive approach, rather than an iterative one, so the change should be a lot simpler there than in import.c. [snip] So, yeah, actually, that's looking pretty sweet. Basically, we just have to throw a virtual_package_paths dict into the sys module, and do the above along with the get_virtual_path() function and add get_subpath() to the importer objects, in order to get the PEP's core functionality working. Exactly. That's part of why the importlib approach is so appealing to me. Brett really did a nice job. -eric
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On 7/20/2011 6:05 AM, P.J. Eby wrote: At 02:24 AM 7/20/2011 -0700, Glenn Linderman wrote: When I read about creating __path__ from sys.path, I immediately thought of the issue of programs that extend sys.path, and the above is the workaround for such programs. but it requires such programs to do work, and there are a lot of such programs (I, a relative newbie, have had to write some). As it turns out, I can't think of a situation where I have extended sys.path that would result in a problem for fancy namespace packages, because so far I've only written modules, not packages, and only modules are on the paths that I add to sys.path. But that does not make for a general solution. Most programs extend sys.path in order to import things. If those things aren't yet imported, they don't have a __path__ yet, and so don't need to be fixed. Only programs that modify sys.path *after* importing something that has a dynamic __path__ would need to do anything about that. Sure. But there are a lot of things already imported by Python itself, and if this mechanism gets used in the stdlib, a program wouldn't know whether it is safe to skip the pkgutil.extend_virtual_paths() call. Plus, that requires importing pkgutil, which isn't necessarily done by every program that extends the sys.path (import sys is sufficient at present). Plus, if some 3rd party packages are imported before sys.path is extended, knowledge of how they are implemented is required to decide whether to import pkgutil and call extend_virtual_paths or not. So I am still left with my original question: Is there some way to create a new __path__ that would reflect the fact that it has been dynamically created, rather than set from __init__.py, and then when it is referenced, calculate (and cache?) a new value of __path__ to actually search? That's what extend_virtual_paths() is for. It updates the __path__ of all currently-imported virtual packages.
Where before you wrote:

    sys.path.append('foo')

You would now write:

    sys.path.append('foo')
    pkgutil.extend_virtual_paths('foo')

...assuming you have virtual packages you've already imported. If you don't, there's no reason to call extend_virtual_paths(). But it doesn't hurt anything if you call it unnecessarily, because it uses sys.virtual_packages to find out what to update, and if you haven't imported any virtual packages, there's nothing to update and the call will be a quick no-op. I think I would have to write

    sys.path.append('foo')
    import pkgutil
    pkgutil.extend_virtual_paths('foo')

or I'd get an error. And, in the absence of knowing (because I didn't write them) whether any of the packages I imported before extending sys.path are virtual packages or not, I would have to do this every time I extend sys.path. And so it becomes a burden on writing programs. If the code is as boilerplate as you describe, should sys.path become an object that acts like a list, instead of a list, and have its append method automatically do the pkgutil.extend_virtual_paths for me? Then I wouldn't have to worry about whether any of the packages I imported were virtual packages or not.
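The "calculate (and cache?) a new value of __path__" idea is close to what Python eventually shipped: PEP 420 (Python 3.3), which took the place of PEP 402, gives namespace packages a __path__ that is recomputed when the parent path (sys.path, for a top-level package) changes, so no explicit extend call is needed. A sketch demonstrating that PEP 420 behaviour (not PEP 402's); the package name `glenn_ns_demo` and the temporary-directory layout are illustrative only.

```python
import importlib
import os
import sys
import tempfile


def add_portion(root, module_name):
    """Create <root>/glenn_ns_demo/<module_name>.py with no __init__.py."""
    pkg_dir = os.path.join(root, "glenn_ns_demo")
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, module_name + ".py"), "w") as f:
        f.write(f"NAME = {module_name!r}\n")


first_root, second_root = tempfile.mkdtemp(), tempfile.mkdtemp()
add_portion(first_root, "early")
add_portion(second_root, "late")

sys.path.append(first_root)
importlib.invalidate_caches()
import glenn_ns_demo.early
print(len(list(glenn_ns_demo.__path__)))  # 1

# Extend sys.path *after* the namespace package was imported; no extra call.
sys.path.append(second_root)
importlib.invalidate_caches()
import glenn_ns_demo.late
print(len(list(glenn_ns_demo.__path__)))  # 2
print(glenn_ns_demo.late.NAME)            # late
```

The recomputation is triggered lazily when the package's __path__ is iterated, which is roughly the "reference it, then recalculate" behaviour asked for above.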
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 03:22 PM 7/20/2011 -0600, Eric Snow wrote:
On Wed, Jul 20, 2011 at 2:44 PM, P.J. Eby p...@telecommunity.com wrote:
So, yeah, actually, that's looking pretty sweet. Basically, we just have to throw a virtual_package_paths dict into the sys module, and do the above along with the get_virtual_path() function and add get_subpath() to the importer objects, in order to get the PEP's core functionality working.

Exactly. That's part of why the importlib approach is so appealing to me.

Actually, it turns out I was a little too optimistic -- the sketch I gave doesn't work right for anything but top-level virtual packages, because I didn't take into account the part where get_virtual_path() needs a parent path. Fixing *that* error then leads to a really nasty bit of mutual recursion in which the parent module imports are attempted over and over again in something like O(N**2), I think. In order to get rid of that, _gcd_import would have to grow some internal memoization so it doesn't retry the same imports repeatedly.

Ironically enough, this is because _gcd_import() is recursive, and thus attempts the imports in the opposite order (sort of) from import.c, which means that you can't get hold of the parent's __path__ without recursing (again). :-(

And trying to work around that with memoization led me to the realization that you actually can't implement PEP 402 using that type of recursion. That is, to implement the spec correctly, _gcd_import is going to have to be refactored to iterate left-to-right over module name parts, rather than recursing right-to-left. That's because PEP 402 only allows for processing a virtual path if a module is not found, *not* if a module is found but can't be loaded. But, with importlib currently being recursive, it only knows that a parent import failed via ImportError, not whether that error arose from failing to find the module, or failing to load the module!
So, the core part of the _gcd_import() function will need to be rewritten to iterate instead of recursing. (Still, it's probably not going to be *terribly* difficult. I'll take a look at doing a sketch of that next, but if I do one I'll send it to Import-SIG instead of here; it's not a detail that matters to the general PEP discussion.)
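The left-to-right iteration described above can be sketched as follows. This is not importlib's _gcd_import(); find, load and virtual_path are hypothetical callbacks, and a plain dict stands in for sys.modules. What the sketch shows is the distinction that matters for PEP 402: a virtual path is only consulted when find() reports "not found", while an exception raised by load() propagates unchanged, and each parent's __path__ is already in hand for the next step without any recursion.

```python
import types


def import_dotted(name, find, load, virtual_path, modules):
    """Import a dotted module name parent-first, iterating left to right.

    All callbacks are hypothetical stand-ins, not importlib APIs:
      find(fullname, parent_path)         -> spec, or None if not found
      load(spec)                          -> module (may raise on load errors)
      virtual_path(fullname, parent_path) -> list of dirs, possibly empty
    `modules` plays the role of sys.modules.
    """
    parts = name.split(".")
    parent_path = None  # None: search sys.path for the top-level name
    module = None
    for i in range(1, len(parts) + 1):
        fullname = ".".join(parts[:i])
        if i > 1 and parent_path is None:
            parent = ".".join(parts[:i - 1])
            raise ImportError(f"{parent!r} is not a package")
        module = modules.get(fullname)
        if module is None:
            spec = find(fullname, parent_path)
            if spec is not None:
                # Found: any exception from load() is a *load* failure and
                # propagates; no virtual-path fallback is attempted.
                module = load(spec)
            else:
                # Not found: only now may a virtual package be created.
                vpath = virtual_path(fullname, parent_path)
                if not vpath:
                    raise ImportError(f"No module named {fullname!r}")
                module = types.ModuleType(fullname)
                module.__path__ = vpath
            modules[fullname] = module
        # The parent's __path__ is now available for the next part.
        parent_path = getattr(module, "__path__", None)
    return module
```

Because parents are resolved before children, no memoization is needed to avoid repeating parent imports.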
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 03:09 PM 7/20/2011 -0700, Glenn Linderman wrote:
On 7/20/2011 6:05 AM, P.J. Eby wrote:
At 02:24 AM 7/20/2011 -0700, Glenn Linderman wrote:
When I read about creating __path__ from sys.path, I immediately thought of the issue of programs that extend sys.path, and the above is the workaround for such programs, but it requires such programs to do work, and there are a lot of such programs (I, a relative newbie, have had to write some). As it turns out, I can't think of a situation where I have extended sys.path that would result in a problem for fancy namespace packages, because so far I've only written modules, not packages, and only modules are on the paths that I add to sys.path. But that does not make for a general solution.

Most programs extend sys.path in order to import things. If those things aren't yet imported, they don't have a __path__ yet, and so don't need to be fixed. Only programs that modify sys.path *after* importing something that has a dynamic __path__ would need to do anything about that.

Sure. But there are a lot of things already imported by Python itself, and if this mechanism gets used in the stdlib, a program wouldn't know whether it is safe or not, to not bother with the pkgutil.extend_virtual_paths() call or not.

I'm not sure I see how the mechanism could meaningfully be used in the stdlib, since IIUC we're not going for Perl-style package naming. So, all stdlib packages would be self-contained.

Plus, that requires importing pkgutil, which isn't necessarily done by every program that extends the sys.path (import sys is sufficient at present). Plus, if some 3rd party packages are imported before sys.path is extended, the knowledge of how they are implemented is required to make a choice about whether it is needed to import pkgutil and call extend_virtual_paths or not.

I'd recommend *always* using it, outside of simple startup code.
So I am still left with my original question: Is there some way to create a new __path__ that would reflect the fact that it has been dynamically created, rather than set from __init__.py, and then when it is referenced, calculate (and cache?) a new value of __path__ to actually search?

Hm. Yes, there is a way to do something like that, but it would complicate things a bit. We'd need to:

1. Leave __path__ off of the modules, and always pull them from sys.virtual_package_paths, and
2. Before using a value in sys.virtual_package_paths, we'd need to check whether sys.path had changed since we last cached anything, and if so, clear sys.virtual_package_paths first, to force a refresh.

This doesn't sound particularly forbidding, but there are various unpleasant consequences, like being unable to tell whether a module is a package or not, and whether it's a virtual package or not. We'd have to invent new ways to denote these things. On the bright side, though, it *would* allow transparent live updates to virtual package paths, so it might be worth considering.

By the way, the reason we have to get rid of __path__ is that if we kept it, then code could change it, and then we wouldn't know if it was actually safe to change it automatically... even if no code had actually changed it. In principle, we could keep __path__ attributes around, and automatically update them in the case where sys.path has changed, so long as user code hasn't directly altered or replaced the __path__. But it seems to me to be a dangerous corner case; I'd rather that code which touches __path__ be taking responsibility for that path's correctness from then on, rather than having it get updated (possibly incorrectly) behind its back. So, I'd say that for this approach, we'd have to actually leave __path__ off of virtual packages' parent modules.

Anyway, it seems worth considering. We just need to sort out what the downsides are for any current tools thinking that such modules aren't packages.
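Step 2 above (invalidate the cache whenever sys.path has changed) could look roughly like this. It is only a sketch under stated assumptions: the cache dict plays the role of sys.virtual_package_paths, and compute(name, path_entries) is a hypothetical callback that builds a virtual __path__. Comparing a tuple snapshot catches both in-place mutation and rebinding such as `sys.path = [...]`, at the cost of one O(len(sys.path)) comparison per lookup.

```python
import sys


class VirtualPathCache:
    """Recompute virtual package paths only when sys.path has changed.

    compute(name, path_entries) is a hypothetical callback that returns the
    virtual __path__ list for package `name` given the current path entries.
    """

    def __init__(self, compute, path_source=lambda: sys.path):
        self._compute = compute
        self._path_source = path_source  # re-read on every lookup
        self._snapshot = None            # sys.path as of the last cache fill
        self._cache = {}                 # stands in for sys.virtual_package_paths

    def get(self, name):
        current = tuple(self._path_source())
        if current != self._snapshot:
            # sys.path was mutated *or* rebound since we last cached: start over.
            self._cache.clear()
            self._snapshot = current
        if name not in self._cache:
            self._cache[name] = self._compute(name, list(current))
        return self._cache[name]
```

Because lookups always go through the cache, a module's __path__ attribute would not be the source of truth here, which is exactly the trade-off described above.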
(But hey, at least it'll be consistent with what such tools would think of the on-disk representation! That is, a tool that thinks foo.py alongside a foo/ subdirectory is just a module with no package, will also think that 'foo', once imported, is a module with no package.)

And, in the absence of knowing (because I didn't write them) whether any of the packages I imported before extending sys.path are virtual packages or not, I would have to do this every time I extend sys.path. And so it becomes a burden on writing programs. If the code is so boilerplate as you describe, should sys.path become an object that acts like a list, instead of a list, and have its append method automatically do the pkgutil.extend_virtual_paths for me? Then I wouldn't have to worry about whether any of the packages I imported were virtual packages or not.

Well, then we'd have to worry about other mutation methods, and things like 'sys.path = [blah, blah]', as well. So if we're going to ditch
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On 7/20/2011 4:03 PM, P.J. Eby wrote:
I'd recommend *always* using it, outside of simple startup code.

So that is a burden on every program. Documentation would help, but it certainly makes updating sys.path much more complex -- 3 lines (counting the import of pkgutil) instead of one. Then there is the complexity of understanding why it is needed at all, when in simple cases the single line works fine; yet it would be bug-prone to have both ways.

So I am still left with my original question: Is there some way to create a new __path__ that would reflect the fact that it has been dynamically created, rather than set from __init__.py, and then when it is referenced, calculate (and cache?) a new value of __path__ to actually search?

Hm. Yes, there is a way to do something like that, but it would complicate things a bit.

From what you said, it would complicate the solution for complex packaging tasks, but would return simple extensions of sys.path to being simple again. Sounds like a good tradeoff, but I'll leave that to you and other more knowledgeable people to figure out the details and implementation... I snipped the explanation because it is beyond my present knowledge base.

Anyway, it seems worth considering. We just need to sort out what the downsides are for any current tools thinking that such modules aren't packages. (But hey, at least it'll be consistent with what such tools would think of the on-disk representation! That is, a tool that thinks foo.py alongside a foo/ subdirectory is just a module with no package, will also think that 'foo', once imported, is a module with no package.)

Please consider it. I think your initial proposal solves some problems, but a version that doesn't complicate the normal, simple extension of sys.path would be a much better solution, so I am happy to hear that you have ideas in that regard. Hopefully, they don't complicate things too much more.
So far, I haven't gotten my head around packages as they presently exist (this __init__.py stuff seems much more complex than the simplicity of Perl imports that I was used to, although I certainly like many things about Python better than Perl, and have switched whole-heartedly, although I still have a fair bit of Perl code to port in the fullness of time). I think your proposal here, although maintaining some amount of backward-compatibility may require complexity of implementation, can simplify the requirements for creating new packages, to the extent I understand it.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Thu, Jul 21, 2011 at 9:03 AM, P.J. Eby p...@telecommunity.com wrote:
Hm. Yes, there is a way to do something like that, but it would complicate things a bit. We'd need to:
1. Leave __path__ off of the modules, and always pull them from sys.virtual_package_paths, and

Setting __path__ to a sentinel value (imp.VirtualPath?) would break less code, as hasattr(mod, '__path__') checks would still work.

Even better would be for these (and sys.path) to be list subclasses that did the right thing under the hood as Glenn suggested. Code that *replaces* rather than modifies these attributes would still potentially break virtual packages, but code that modifies them in place would do the right thing automatically. (Note that all code that manipulates sys.path and __path__ attributes requires explicit calls to correctly support current namespace package mechanisms, so this would actually be an improvement on the status quo rather than making anything worse).

I'll note that this kind of thing is one of the key reasons the import state should some day move to a real class - state coherency is one of the major use cases for the descriptor protocol, which is unavailable when interdependent state is stored as module attributes. (Don't worry, that day is a very long way away, if it ever happens at all)

2. Before using a value in sys.virtual_package_paths, we'd need to check whether sys.path had changed since we last cached anything, and if so, clear sys.virtual_package_paths first, to force a refresh.
This doesn't sound particularly forbidding, but there are various unpleasant consequences, like being unable to tell whether a module is a package or not, and whether it's a virtual package or not. We'd have to invent new ways to denote these things.

Trying to change how packages are identified at the Python level makes PEP 382 sound positively appealing. __path__ needs to stay :)

Cheers, Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
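The list-subclass idea (Glenn's suggestion, endorsed above) can be sketched as a list that calls a hook after every in-place mutation. The class name and the on_change hook are hypothetical; in the proposal the hook would refresh the __path__ of already-imported virtual packages. As both PJE and Nick point out, rebinding (`sys.path = [...]` or replacing a module's `__path__`) bypasses any such object entirely.

```python
class NotifyingPathList(list):
    """A list that calls on_change() after each in-place mutation (sketch).

    Hypothetical: sys.path, or a virtual package's __path__, could be an
    instance of such a class, with on_change refreshing virtual packages.
    """

    def __init__(self, iterable=(), on_change=None):
        super().__init__(iterable)
        self.on_change = on_change or (lambda: None)


def _notifying(name):
    """Wrap list.<name> so that on_change() runs after the mutation."""
    base = getattr(list, name)

    def method(self, *args, **kwargs):
        result = base(self, *args, **kwargs)
        self.on_change()
        return result

    method.__name__ = name
    return method


# Every in-place mutator of list; __iadd__/__imul__ return self unchanged.
for _name in ("append", "extend", "insert", "remove", "pop", "clear",
              "sort", "reverse", "__setitem__", "__delitem__",
              "__iadd__", "__imul__"):
    setattr(NotifyingPathList, _name, _notifying(_name))
```

Wrapping the whole mutator set, rather than just append(), addresses the "other mutation methods" concern; the rebinding case still needs an explicit call or a different design.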
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
On Wed, Jul 20, 2011 at 7:52 PM, Nick Coghlan ncogh...@gmail.com wrote:
Even better would be for these (and sys.path) to be list subclasses that did the right thing under the hood as Glenn suggested. Code that *replaces* rather than modifies these attributes would still potentially break virtual packages, but code that modifies them in place would do the right thing automatically. (Note that all code that manipulates sys.path and __path__ attributes requires explicit calls to correctly support current namespace package mechanisms, so this would actually be an improvement on the status quo rather than making anything worse).

+1 as a solution to the problem Glenn brought up. However, I'm still not clear on how much code out there changes sys.path in the offending way, forcing the need to provide a more implicit solution in this PEP than extend_virtual_paths(). And in cases where sys.path *is* changed, and it impacts some virtual package, how many places is that going to happen in one project? My guess is not many (and so not many boilerplate calls). Is it worth adding implicit __path__ updates for that use case, rather than just the extend_virtual_paths() function?

As an aside, my first reaction to Glenn's suggestion was "that would be cool." Would it be a pursuable option? We can take this over to import-sig if it is.

-eric