Re: Solving the import problem

Graham Dumpleton Tue, 07 Jun 2005 18:11:44 -0700

An update on a few things that I have managed to get working in
Vampire in respect of some of the issues below, plus a few other
comments.


On 08/06/2005, at 6:33 AM, Nicolas Lehuen wrote:

One last thing that we should prepare is a clear and definite answer
to the zillion users who need to import a custom utility module.
Today, we have 4 ways of importing code:

a) the standard "import" keyword. Today, it works unchanged
(mod_python doesn't install any import hook). The consequence is that
the only modules that can be imported this way are those found on the
PYTHONPATH. Importing custom code is easy if you can manipulate this
variable (either directly or through the PythonPath configuration
directive), but not everybody has this luxury (think shared hosting,
although not being able to change the PythonPath through an .htaccess
file seems pretty restrictive to me).
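Where the path can be changed, the PythonPath directive (which takes a Python list expression such as "sys.path+['/some/dir']") boils down to extending sys.path. A minimal sketch of doing the same from code, assuming a private library directory; add_library_dir and the myutils module are hypothetical names used only for illustration:

```python
import importlib
import os
import sys
import tempfile


def add_library_dir(path):
    """Append a directory to sys.path once so that a plain "import" can
    find modules stored there (hypothetical helper)."""
    path = os.path.realpath(path)
    if path not in sys.path:
        sys.path.append(path)
    return path


if __name__ == "__main__":
    # Stand-in for a private library directory such as /home/me/lib.
    libdir = tempfile.mkdtemp()
    with open(os.path.join(libdir, "myutils.py"), "w") as f:
        f.write("def greet(name):\n    return 'hello ' + name\n")

    add_library_dir(libdir)
    add_library_dir(libdir)  # the second call changes nothing
    myutils = importlib.import_module("myutils")
    print(myutils.greet("world"))
```

Note that code imported this way is cached in sys.modules like any other module, so it is not reloaded when the file changes.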


I finally worked out the proper way in Python that one is meant to
install import hooks so that you don't screw up other packages also
trying to use import hooks, although it relies on the other packages
doing it the correct way as well.
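This message doesn't show Vampire's hook code, so the following is only a sketch of a cooperative hook using today's importlib API (the 2005-era protocol used find_module()/load_module() rather than find_spec()). The key point is that a finder on sys.meta_path returns None for any name it doesn't own, so other installed hooks and the default finders still get their turn. LocalDirFinder and install() are hypothetical names. Modules found this way still end up in sys.modules, because that is where the standard machinery stores them:

```python
import importlib.abc
import importlib.util
import os
import sys


class LocalDirFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that resolves top-level module names against a
    private list of directories kept separate from sys.path."""

    def __init__(self, search_dirs):
        self.search_dirs = [os.path.realpath(d) for d in search_dirs]

    def find_spec(self, fullname, path=None, target=None):
        # Dotted (package) names are left to the other finders.
        if "." in fullname:
            return None
        for directory in self.search_dirs:
            candidate = os.path.join(directory, fullname + ".py")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(fullname, candidate)
        # Returning None, rather than raising, lets the next finder on
        # sys.meta_path try; this is what keeps several hooks cooperating.
        return None


def install(finder):
    """Put the finder ahead of the default finders, at most once."""
    if finder not in sys.meta_path:
        sys.meta_path.insert(0, finder)
```

Inserting the finder first means a file in the private directory shadows a same-named module on sys.path; appending it instead would give sys.path priority.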

The result is that in Vampire, when the feature is enabled, you can
use the "import" keyword to import modules local to the document tree
where the handler is and it will use the Vampire module importing
system instead for those imports. Where the context is traceable back
to a top level import of a handler from Vampire, the automatic module
reloading mechanism, including changes in children causing parents
to be reloaded, is all working okay.
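Vampire's actual reloading code isn't shown here. As a hypothetical sketch of the rule that a change in a child forces its parents to reload, the set of modules to reload can be computed by walking the import graph backwards from the modified files. deps and stale_modules are illustrative names; walking reverse edges, rather than recursing down from each parent, keeps import cycles from hiding a stale module:

```python
from collections import deque


def stale_modules(deps, changed):
    """Return every module that must be reloaded.

    deps maps a module name to the names of the modules it imported (its
    children); changed holds the modules whose source files were modified.
    A module is stale if it changed or if any module it depends on,
    directly or indirectly, is stale.
    """
    # Reverse the graph: child -> set of parents that imported it.
    parents = {}
    for parent, children in deps.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)

    # Breadth-first walk from each changed module up to all its importers.
    stale = set(changed)
    queue = deque(changed)
    while queue:
        name = queue.popleft()
        for parent in parents.get(name, ()):
            if parent not in stale:
                stale.add(parent)
                queue.append(parent)
    return stale


deps = {
    "handler": ["utils"],
    "utils": ["config"],
    "other": ["config"],
    "static": [],
}
print(sorted(stale_modules(deps, {"config"})))
# ['config', 'handler', 'other', 'utils']
```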

When this feature kicks in, it will only search in the same directory
as the handler file and, optionally, along a module search path
which is distinct from the normal sys.path. This search path has to
be separate and can't overlap with sys.path because you will end up
with duplicate modules loaded in different ways if one isn't careful.
The preferred approach is that sys.path should simply not include any
directory which is a part of the document tree.
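As a rough illustration of that rule, a start-up check can list any sys.path entries that sit at or below the document root. This is a hypothetical helper, not part of Vampire or mod_python:

```python
import os
import sys


def entries_inside_tree(doc_root, sys_path=None):
    """Return the sys.path entries located at or below doc_root.

    Hypothetical start-up check: any hit means a file in the document
    tree could be imported twice, once via sys.path and once via the
    private search mechanism.
    """
    if sys_path is None:
        sys_path = sys.path
    root = os.path.realpath(doc_root)
    hits = []
    for entry in sys_path:
        # An empty string in sys.path stands for the current directory.
        resolved = os.path.realpath(entry or os.curdir)
        try:
            inside = os.path.commonpath([root, resolved]) == root
        except ValueError:
            # Paths on different drives (Windows) cannot overlap.
            inside = False
        if inside:
            hits.append(entry)
    return hits


# Example with a hypothetical document root:
# print(entries_inside_tree("/var/www/html"))
```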

The only part of what "import" provides that isn't working completely
yet is importing packages. The parts that do work are importing the
root of a package, importing a sub module/package which was already
imported by its parent, and using the from/import syntax to import only
parts of any of these.

The one bit that I haven't been able to get working yet is where you
have "import package.module" and where "module" wasn't explicitly
imported by "package/__init__.py".

The reason it doesn't work is that the part of the Python import system
that deals with packages assumes that imported modules are always stored
in sys.modules. It relies on this and will search sys.modules for the
parent module to determine which directory it is in and thus from where
it should import the sub module/package.
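This reliance is easy to see with importlib on a current Python. In the hypothetical sketch below, the parent "package" is a bare module object that was never loaded from any file; registering it in sys.modules with a __path__ is enough for "import pkg.sub" to find the submodule there, and once it is removed the same import fails (assuming nothing called demo_pkg is on sys.path). The function and module names are illustrative only:

```python
import importlib
import os
import sys
import tempfile
import types


def import_via_registered_parent(pkg_name, pkg_dir, sub_name):
    """Import pkg_name.sub_name after registering a bare parent module.

    Once the parent is in sys.modules, the import system locates the
    submodule by reading sys.modules[pkg_name].__path__; it does not go
    back to the file system for the parent itself.
    """
    parent = types.ModuleType(pkg_name)
    parent.__path__ = [pkg_dir]      # directories searched for submodules
    sys.modules[pkg_name] = parent   # the lookup the import system relies on
    return importlib.import_module(pkg_name + "." + sub_name)


if __name__ == "__main__":
    pkg_dir = tempfile.mkdtemp()
    with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
        f.write("VALUE = 42\n")

    mod = import_via_registered_parent("demo_pkg", pkg_dir, "sub")
    print(mod.VALUE)  # 42

    # Drop both entries: the parent is no longer known to the import
    # system, so the submodule cannot be located any more.
    del sys.modules["demo_pkg.sub"]
    del sys.modules["demo_pkg"]
    try:
        importlib.import_module("demo_pkg.sub")
    except ImportError as exc:
        print("import failed:", exc)
```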

At the moment, this makes it look to me as though any system that tries
to use import hooks in Python cannot support packages where the
modules/packages are not stored in sys.modules.

Because of this, even though packages partly work, at the moment I throw
an import error with a message saying that packages aren't supported in
the context of the Vampire module importing system if such an import is
attempted. This shouldn't be an issue for individual handler files stored
in the document tree, as you wouldn't normally write them as packages
anyway. It might be an issue if someone had a set of utility modules
living outside the document tree that they wanted automatic reloading to
work on. The only choice there at the moment is not to use a traditional
package in that context. You could get more flexibility by accessing the
module loading API in Vampire directly, but that means the utility
modules, which perhaps shouldn't strictly know about Vampire/mod_python,
will.

b) the PythonImport directive, which ensures that a module is imported
(hence its initialization code is run), but doesn't really import it
into the handler's or published module's namespace.

c) the apache.import_module() function, which is frankly a strange
beast. It knows how to approximately reload some modules, but has many
tricks that make it frankly dangerous, notably the one that causes
modules with the same name but residing in different directories to
collide. I really think that mixing dynamic (re)loading of code with
the usual import mechanisms is a recipe for disaster. Anyway, today,
it's the only way our users can import some shared code, using a
strange idiom like
apache.import_module('mymodule',[dirname(__file__)]).


I know you have marked:

  http://issues.apache.org/jira/browse/MODPYTHON-9

as resolved by virtue of including a new module importing system in
publisher, but there is still the underlying problem in the import_module()
function that once you access an "index.py" in a subdirectory, the one
in the parent is effectively lost. I realise that even if this is fixed,
each still gets reloaded on cyclic requests, but at least the parent
doesn't become completely useless.
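The collision is a property of keying loaded code by module name alone, which is what sys.modules does. The hypothetical cache below is keyed the same way (it is not apache.import_module's actual code) and shows the effect on two index.py files: they keep evicting each other, so alternating requests re-execute a file every time. load_from_file and NameKeyedCache are illustrative names:

```python
import importlib.util
import os


def load_from_file(path):
    """Execute a source file as a fresh module object.

    The module is deliberately not registered in sys.modules.
    """
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


class NameKeyedCache:
    """Hypothetical cache keyed by module name only, like sys.modules."""

    def __init__(self):
        self.modules = {}  # module name -> (file path, module)
        self.loads = 0

    def get(self, path):
        name = os.path.splitext(os.path.basename(path))[0]
        entry = self.modules.get(name)
        if entry is None or entry[0] != path:
            # Another file with the same name replaces the entry, so
            # /a/index.py and /a/sub/index.py keep evicting each other.
            self.modules[name] = (path, load_from_file(path))
            self.loads += 1
        return self.modules[name][1]
```

Requesting /a/index.py, /a/sub/index.py, /a/index.py and /a/sub/index.py through this cache executes a file four times; keying by the absolute file path instead would execute each file once.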

d) the new publisher.get_page(req,path), which is not really an answer
since it is designed to allow a published object to call another
published object from another page (not to call some shared code).

This mess should be sorted out. As a baseline, I'd say that we have 4
kinds of code in mod_python:


Brain slowing down at this point. I'll perhaps come back with some more
coherent thoughts on the rest of your points later when I have got some
other things out of the way. :-)

1) the standard Python code that should be imported using the "import" keyword


2) handlers, which are dynamically loaded through apache.import_module
(so they are registered in sys.modules, with all the problems that this
can cause when sharing a single setup with multiple handlers that have
the same name, "publisher" for example); this should be fixed.

3) published modules, which are dynamically loaded by the
mod_python.publisher handler (so now they don't have any problems that
were previously caused by apache.import_module). An important thing to
notice is that published modules are usually stored in a directory
which is served by Apache (handlers don't need to reside in a public
directory), amongst .html and image files. Hence, people can
legitimately be reluctant to put their core application code
(including DB passwords etc.) in published modules, for security and
code/presentation separation issues.

4) custom library code, AKA core application code. This code should
reside somewhere, preferably in a private directory (at least direct
access to this code from the web should be denied) and be easily
imported and reloaded into published modules, without having to tinker
too much with the PYTHONPATH variable or the PythonPath directive.

What would be nice is a clear and definite way to handle those 4 kinds
of code. To me, layers 2, 3 and 4 could be handled by the same dynamic
code cache, except that a careful directory structure or naming scheme
would prevent layer 4 from being visible from the web.
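A minimal sketch of such a shared cache, under stated assumptions: modules are keyed by their resolved file path (so two index.py files never collide), are never registered in sys.modules, and are re-executed when the file's modification time changes. Keeping layer 4 private is left to the directory layout: the cache accepts files from a public document root and from a private library directory that Apache doesn't serve. CodeCache and its parameters are hypothetical, and a real system would also need the child-to-parent reload tracking described earlier in this message:

```python
import importlib.util
import os


class CodeCache:
    """Hypothetical dynamic code cache for layers 2-4 (handlers,
    published modules, private library code)."""

    def __init__(self, allowed_roots):
        self.allowed_roots = [os.path.realpath(r) for r in allowed_roots]
        self.entries = {}  # resolved path -> (mtime, module)

    def _check_allowed(self, path):
        for root in self.allowed_roots:
            try:
                if os.path.commonpath([root, path]) == root:
                    return
            except ValueError:
                # Different drives on Windows: cannot be inside this root.
                continue
        raise ImportError("%s is outside the configured code roots" % path)

    def load(self, path):
        path = os.path.realpath(path)
        self._check_allowed(path)
        mtime = os.path.getmtime(path)
        cached = self.entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        # (Re)load: execute the file into a brand-new module object. The
        # name is derived from the path so it is not a bare "index".
        name = "_codecache_%x" % (hash(path) & 0xFFFFFFFF)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        self.entries[path] = (mtime, module)
        return module


# Usage, with hypothetical paths:
# cache = CodeCache(["/var/www/html", "/var/lib/myapp/lib"])
# core = cache.load("/var/lib/myapp/lib/core.py")
```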

I know Vampire solves a lot of these problems, so we have two alternatives:


A) We decide that we won't solve the whole problem in mod_python. We
take apache.import_module out and shoot it. Handlers are loaded in a
real dynamic code cache (maybe the same as the one now used by
mod_python.publisher), which solves a lot of problems.

Custom library code is not handled: if you want to import some code,
you put it wherever you like and make sure PYTHONPATH or the
PythonPath directive point to it, so you can import it like a standard
module. You'll never use apache.import_module anymore, it will
blissfully dissolve into oblivion (and be removed from the module,
anyway).

If you need to reload your core application code without restarting
Apache, then too bad, mod_python doesn't know how to do this. Check
out Vampire.

B) We decide to solve the whole problem in mod_python.
apache.import_module is not much luckier this time: it is still taken
out and shot in the head. We solve the handler loading problem. But
now, with a little help from Graham, custom application code can be
dynamically loaded and reloaded from any place without having to
tinker with the PYTHONPATH variable and/or the PythonPath directive.
Everything can be done from the source code with a little help from an
.htaccess file.

So, sorry for this long mail, but I had to get this out. The current
situation is pretty bad, zillions of people need to do this simple
thing, and when they notice it's not that simple (or it's buggy), they
decide to build the nth application framework on mod_python. So,
either we reckon it's None of our business, that users should turn to
higher level frameworks like Vampire, and we remove
apache.import_module, or we decide to tackle the issue, and we remove
apache.import_module. Either way, it must leave :).

What do you think?

Regards,
Nicolas
