At 01:35 PM 7/20/2011 -0600, Eric Snow wrote:
> This is a really nice solution.  So a virtual package is not imported
> until a submodule of the virtual package is successfully imported

Correct...

> (except for direct import of pure virtual packages).

Not correct. ;-) What we do is avoid creating a parent module or altering its __path__ until a submodule/subpackage import is just about to be successfully completed.

See the change I just pushed to the PEP:

   http://hg.python.org/peps/rev/a6f02035c66c

Or read the revised Specification section here (which is a bit easier to read than the diff):

   http://www.python.org/dev/peps/pep-0402/#specification

The change is basically that we wait until a successful find_module() happens before creating or tweaking any parent modules. This way, the load_module() part will still see an initialized parent package in sys.modules, and if it does any relative imports, they'll still work.

(It *does* mean that if an error happens during load_module(), then future imports of the virtual package will succeed, but I'm okay with that corner case.)


> It seems like
> sys.virtual_packages should be populated even during a failed
> submodule import.  Is that right?

Yes. In the actual draft, btw, I dubbed it ``sys.virtual_package_paths`` and made it a dictionary. This actually makes the pkgutil.extend_path() code more general: it'll be able to fix the paths of things you haven't actually imported yet. ;-)
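
To illustrate why a dictionary helps here, below is a rough sketch, not the PEP's code. It assumes a plain module-level dict standing in for ``sys.virtual_package_paths`` and a simple filesystem scan; ``scan()``, ``refresh_virtual_paths()`` and ``demo()`` are hypothetical names used only for this example. The point is that every cached path can be recomputed in place by name, whether or not the package has been imported yet:

```python
import os
import tempfile

# Hypothetical stand-in for the proposed sys.virtual_package_paths dict:
# package name -> list of candidate directories.
virtual_package_paths = {}


def scan(name, search_path):
    """Return the directories called `name` directly under each entry."""
    candidates = (os.path.join(entry, name) for entry in search_path)
    return [d for d in candidates if os.path.isdir(d)]


def refresh_virtual_paths(search_path):
    """Recompute every cached path in place, imported or not."""
    for name, path in virtual_package_paths.items():
        # Slice assignment keeps the same list object, so anything that
        # already holds a reference to it sees the update.
        path[:] = scan(name, search_path)


def demo():
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, 'first')
        second = os.path.join(root, 'second')
        os.makedirs(os.path.join(first, 'foo'))
        os.makedirs(os.path.join(second, 'foo'))

        search_path = [first]
        cached = virtual_package_paths['foo'] = scan('foo', search_path)
        before = len(cached)

        # A new search-path entry appears; 'foo' was never imported, but
        # its cached path can still be fixed up by name.
        search_path.append(second)
        refresh_virtual_paths(search_path)
        return before, len(cached)
```

Updating the lists in place, rather than rebinding the dict values, is a design choice of this sketch, made on the assumption that a module's __path__ could share the same list object.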


> Also, it makes sense that the above applies to all virtual packages,
> not just pure ones.

Well, if the package isn't "pure" then what you've imported is really just an ordinary module, not a package at all. ;-)



> When a pure virtual package is directly imported, a new [empty] module
> is created and its __path__ is set to the matching value in
> sys.virtual_packages.  However, an "impure" virtual package is not
> created upon direct import, and its __path__ is not updated until a
> submodule import is attempted.  Even the sys.virtual_packages entry is
> not generated until the submodule attempt, since the virtual package
> mechanism doesn't kick in until the point that an ImportError is
> currently raised.
>
> This isn't that big a deal, but it would be the one behavioral
> difference between the two kinds of virtual packages.  So either leave
> that one difference, disallow direct import of pure virtual packages,
> or attempt to make virtual packages for all non-package imports.  That
> last one would impose the virtual package overhead on many more
> imports so it is probably too impractical.  I'm fine with leaving the
> one difference.

At this point, I've updated the PEP to disallow direct imports of pure virtual packages. AFAICT it's the only approach that ensures you can't get false positive imports by having unrelated-but-similarly-named directories floating around.

So, really, there's not a difference, except that you can't import a useless empty module that you have no real business importing in the first place... and I'm fine with that. ;-)


> FYI, last night I started on an importlib-based implementation for the
> PEP and the above solution would be really easy to incorporate.

Well, you might want to double-check that now that I've updated the spec. ;-) In the new approach, you cannot rely on parent modules existing before proceeding to the submodule import.

However, I've just glanced at the importlib trunk, and I think I see what you mean. It's already using a recursive approach, rather than an iterative one, so the change should be a lot simpler there than in import.c.

There probably just needs to be a pair of functions like:

    def _get_parent_path(parent):
        pmod = sys.modules.get(parent)
        if pmod is None:
            try:
                pmod = _gcd_import(parent)
            except ImportError:
                # Can't import parent, is it a virtual package?
                path = imp.get_virtual_path(parent)
                if not path:
                    # no, allow the parent's import error to propagate
                    raise
                return path
        if hasattr(pmod, '__path__'):
            return pmod.__path__
        else:
            return imp.get_virtual_path(parent)

    def _get_parent_module(parent):
        pmod = sys.modules.get(parent)
        if pmod is None:
            pmod = sys.modules[parent] = imp.new_module(parent)
            if '.' in parent:
                head, _, tail = parent.rpartition('.')
                setattr(_get_parent_module(head), tail, pmod)
        if not hasattr(pmod, '__path__'):
            pmod.__path__ = imp.get_virtual_path(parent)
        return pmod

And then instead of hanging on to parent_module during the import process, you'd just grab a path from _get_parent_path(), and initialize parent_module a little later, i.e.:

        if parent:
            path = _get_parent_path(parent)
            if not path:
                msg = (_ERR_MSG + '; {} is not a package').format(name, parent)
                raise ImportError(msg)

        meta_path = sys.meta_path + _IMPLICIT_META_PATH
        for finder in meta_path:
            loader = finder.find_module(name, path)
            if loader is not None:
                # ensure parent module exists and is a package before loading
                # (top-level imports have no parent, so skip them)
                if parent:
                    parent_module = _get_parent_module(parent)
                loader.load_module(name)
                break
        else:
            raise ImportError(_ERR_MSG.format(name))

So, yeah, actually, that's looking pretty sweet. Basically, we just have to throw a virtual_package_paths dict into the sys module, and do the above along with the get_virtual_path() function and add get_subpath() to the importer objects, in order to get the PEP's core functionality working.
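
For a feel of how the last two pieces might fit together, here is a minimal filesystem-only sketch. It is an assumption-laden illustration rather than the PEP's implementation: ``DirImporter`` is a toy class invented for this example, ``get_virtual_path()`` takes the importer list as an explicit argument instead of consulting sys.path, and no caching in virtual_package_paths is attempted.

```python
import os
import tempfile


class DirImporter:
    """Toy importer for one search-path directory (illustrative only)."""

    def __init__(self, path):
        self.path = path

    def get_subpath(self, fullname):
        # Only the last name component is looked up beneath this
        # importer's directory; return None if no such directory exists.
        tail = fullname.rpartition('.')[2]
        candidate = os.path.join(self.path, tail)
        return candidate if os.path.isdir(candidate) else None


def get_virtual_path(fullname, importers):
    """Collect every non-None get_subpath() result, in search order."""
    subpaths = (importer.get_subpath(fullname) for importer in importers)
    return [p for p in subpaths if p is not None]


def demo():
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, 'a')
        b = os.path.join(root, 'b')
        os.makedirs(os.path.join(a, 'foo'))  # only 'a' has a foo/ directory
        os.makedirs(b)
        importers = [DirImporter(a), DirImporter(b)]
        found = get_virtual_path('foo', importers)
        missing = get_virtual_path('bar', importers)
        return found == [os.path.join(a, 'foo')], missing
```

An empty result plays the role of "not a virtual package", which is the case where _get_parent_path() above lets the parent's ImportError propagate.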
