I admit I haven't fully digested everything that has already been proposed,
but here is my take on the issue, put together on my train ride to work
this morning.

I feel that there are a lot of issues which need to be covered to solve the
problems with module loading. The apache.import_module() method is
currently used in a number of different contexts and each has differing
requirements. We need to look at each of these in turn and make sure we
clearly record and understand what is required for each.

The first point at which apache.import_module() is used is to load the top
level handler, i.e., the module associated with a PythonHandler directive or
with the directive for a phase other than the content handler. The other
type of top level import is the one done by the PythonImport directive.

If apache.import_module() were to be replaced with a mechanism which avoids
use of the "imp" module and storage of modules in sys.modules, these
particular cases of top level imports wouldn't be able to use it
exclusively. This is because the top level handler for PythonHandler will
often be a module which is stored in the Python site-packages directory.
Examples include mod_python.publisher, mod_python.psp, mpservlets and
vampire.

There are already problems when, in one part of the document tree, someone
sets PythonHandler to mod_python.psp, while in a different part of the tree
a handler does an explicit import of mod_python.psp. From memory, if the
PythonHandler case is triggered first, the later explicit import of
mod_python.psp fails, because apache.import_module() doesn't quite set up
sys.modules in a way that is compatible with the "import" statement.

As well as top level imports from site-packages, PythonHandler has to
deal with the case where the module to be imported is loaded from the
document tree itself, specifically from the directory named by the
Directory directive or the directory where the .htaccess file resides.

This currently works because mod_python amends sys.path to include that
directory before doing the import using apache.import_module(). The problem
is that you then can't easily use a module with the same name as the
PythonHandler in different directories.
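To make the collision concrete, here is a small stand-alone Python 3 sketch, not mod_python code. The helper import_like_mod_python() is a hypothetical stand-in for "add the handler's directory to sys.path, then import by name". Because sys.modules is keyed by module name only, the second directory's handler.py is never loaded.

```python
import importlib
import os
import sys
import tempfile


def make_handler_dir(label):
    """Create a temporary directory holding a module named handler.py."""
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, "handler.py"), "w") as f:
        f.write("WHERE = %r\n" % label)
    return directory


def import_like_mod_python(directory, name):
    """Hypothetical stand-in for current behaviour: extend sys.path, import by name."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return importlib.import_module(name)


dir_a = make_handler_dir("a")
dir_b = make_handler_dir("b")

first = import_like_mod_python(dir_a, "handler")
# dir_b is now first on sys.path, but sys.modules["handler"] already exists,
# so the cached module from dir_a is returned again.
second = import_like_mod_python(dir_b, "handler")
print(first.WHERE, second.WHERE)  # a a
```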

What I think needs to happen for these top level imports is that mod_python
has to determine if the module to be loaded is to come from the document
tree or from somewhere else on sys.path. If the module is not from the
document tree then the standard Python import mechanisms would be used to
import it. Consequently, such modules would not be candidates for any form
of automatic module reloading. That is, no module reloading would be done on
anything in sys.modules as it is now.

This would ensure, for example, that mod_python.psp is imported in the
standard way and that an explicit import of mod_python.psp from a user's
handler code works, removing the current hack whereby mod_python.psp must
be loaded in a user's handler using apache.import_module().

If mod_python finds that the module is not a standard module but one which
is defined within the document tree, then it would use the new and improved
apache.import_module() which doesn't rely on sys.modules.
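A rough sketch of that decision, in plain Python 3 using importlib rather than the "imp" module. The names load_top_level_handler() and _load_from_file() are hypothetical, and treating "a module_name.py file exists in the handler directory" as the test for "comes from the document tree" is my assumption about how the check could be done, not existing mod_python behaviour.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Hypothetical private cache standing in for the new apache.import_module():
# keyed by absolute file path and never stored in sys.modules.
_document_tree_modules = {}


def _load_from_file(path):
    """Load a module from an explicit file without touching sys.modules."""
    path = os.path.abspath(path)
    module = _document_tree_modules.get(path)
    if module is None:
        name = os.path.splitext(os.path.basename(path))[0]
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        _document_tree_modules[path] = module
    return module


def load_top_level_handler(module_name, handler_dir):
    """Resolve the module named by a PythonHandler-style directive.

    A plain name with a matching .py file in handler_dir (the Directory or
    .htaccess location) is treated as part of the document tree. Anything
    else, e.g. mod_python.psp, goes through the standard import system so a
    later "import" statement sees the very same module object.
    """
    candidate = os.path.join(handler_dir, module_name + ".py")
    if "." not in module_name and os.path.isfile(candidate):
        return _load_from_file(candidate)
    return importlib.import_module(module_name)


# Example: one document-tree module and one standard library module.
docroot = tempfile.mkdtemp()
with open(os.path.join(docroot, "page_handler.py"), "w") as f:
    f.write("GREETING = 'from the document tree'\n")

page = load_top_level_handler("page_handler", docroot)
standard = load_top_level_handler("json", docroot)
print(page.GREETING)                    # from the document tree
print("page_handler" in sys.modules)    # False
print(standard is sys.modules["json"])  # True
```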

Note that the direction I am looking at here is that apache.import_module()
is made to function properly in the contexts it needs to and not perform
double duty in satisfying extra requirements of top level mod_python imports
where it has to import stuff from site-packages. Top level imports should
be treated specially by mod_python, which should only defer to
apache.import_module() for imports from the document tree.

If this separation is done, I think the distinction introduced by having a
separate module loader in mod_python.publisher can be eliminated.
apache.import_module() can simply be replaced with the loader from
mod_python.publisher, or a modification of it that satisfies other
requirements I will talk about in future emails.

For imports made from within any of the above modules, the general rule
should be that if it is a standard module on sys.path, then "import" is
used. If it is within the document tree, then apache.import_module() is.

For utility modules which live outside the document tree, which are
specific to the web application, which aren't on sys.path, and for which
you want module reloading to work, apache.import_module() would still be
used, but you have to pass the actual directory to the function.

Arguably, calling apache.import_module() without a path should be
disallowed, with a path always required. Further, sys.path should no longer
be automatically amended to include the directory for which the
PythonHandler is defined, and apache.import_module() should never search
sys.path.
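A minimal Python 3 sketch of an import_module() along these lines, assuming a hypothetical signature import_module(module_name, path) where path is mandatory. Modules are cached by absolute file name and reloaded when the file's modification time changes; nothing is put into sys.modules and sys.path is never consulted. It deliberately ignores packages and thread locking.

```python
import os
import tempfile
import types

# Hypothetical cache: absolute file name -> (modification time, module).
_cache = {}


def import_module(module_name, path):
    """Load path/module_name.py; path is required and sys.path is never searched."""
    if not path:
        raise ValueError("an explicit directory is required")
    filename = os.path.abspath(os.path.join(path, module_name + ".py"))
    mtime = os.path.getmtime(filename)  # raises OSError if the file is missing
    cached = _cache.get(filename)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    # (Re)load by compiling the source directly; sys.modules is not used.
    module = types.ModuleType(module_name)
    module.__file__ = filename
    with open(filename) as source:
        code = compile(source.read(), filename, "exec")
    exec(code, module.__dict__)
    _cache[filename] = (mtime, module)
    return module


def write_helper(directory, text):
    """Write helper.py into directory and return its path."""
    filename = os.path.join(directory, "helper.py")
    with open(filename, "w") as f:
        f.write(text)
    return filename


# Two same-named modules in different directories stay distinct.
dir_a, dir_b = tempfile.mkdtemp(), tempfile.mkdtemp()
write_helper(dir_a, "VALUE = 'a'\n")
write_helper(dir_b, "VALUE = 'b'\n")
a = import_module("helper", dir_a)
b = import_module("helper", dir_b)
print(a.VALUE, b.VALUE)  # a b

# Edit dir_a's copy and push its mtime forward so the change is detected.
filename = write_helper(dir_a, "VALUE = 'a2'\n")
later = os.path.getmtime(filename) + 10
os.utime(filename, (later, later))
print(import_module("helper", dir_a).VALUE)  # a2
```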

As far as I can tell, the only real reason sys.path is searched at present
is to satisfy top level imports, which need to find things in
site-packages or elsewhere on sys.path. As such, if mod_python does its own
checking and knows when standard Python imports should be used, this
ability can be discarded.

The implication of not extending sys.path automatically is that "import"
will no longer work for loading a file in the same directory as a handler
in the document tree. This was always dangerous anyway, as that module
could also have been loaded by apache.import_module() and a problem could
thus arise. Any "import" used in this way would need to be changed to
apache.import_module(), or a simple import hook could be introduced which,
when used in a module imported using apache.import_module(), uses
apache.import_module() underneath for an "import" of a file in the same
directory.
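One possible shape for such an import hook, sketched in Python 3 and not how mod_python actually does it. It assumes each module loaded by a hypothetical path-based import_module() gets its own `__builtins__` dictionary whose `__import__` looks in that module's directory first. This relies on CPython resolving the "import" statement through `__import__` in the module globals' `__builtins__`.

```python
import builtins
import os
import sys
import tempfile
import types

# Hypothetical path-based loader cache: absolute file name -> module.
# Modules loaded here are never placed in sys.modules.
_cache = {}


def _builtins_for(directory):
    """Return a builtins dict whose __import__ checks `directory` first."""

    def sibling_aware_import(name, globals=None, locals=None, fromlist=(), level=0):
        sibling = os.path.join(directory, name + ".py")
        if level == 0 and "." not in name and os.path.isfile(sibling):
            # A plain "import x" of a file next to the importing module.
            return import_module(name, directory)
        # Everything else goes through the normal import machinery.
        return builtins.__import__(name, globals, locals, fromlist, level)

    namespace = dict(builtins.__dict__)
    namespace["__import__"] = sibling_aware_import
    return namespace


def import_module(module_name, path):
    """Load path/module_name.py with the sibling-aware import hook installed."""
    filename = os.path.abspath(os.path.join(path, module_name + ".py"))
    module = _cache.get(filename)
    if module is None:
        module = types.ModuleType(module_name)
        module.__file__ = filename
        # "import" statements in this module will use our __import__.
        module.__dict__["__builtins__"] = _builtins_for(os.path.dirname(filename))
        with open(filename) as source:
            code = compile(source.read(), filename, "exec")
        _cache[filename] = module  # cache first so circular imports terminate
        try:
            exec(code, module.__dict__)
        except BaseException:
            del _cache[filename]
            raise
    return module


# Example: page.py does a plain "import helper" of a file beside it.
docroot = tempfile.mkdtemp()
with open(os.path.join(docroot, "helper.py"), "w") as f:
    f.write("MESSAGE = 'loaded from the same directory'\n")
with open(os.path.join(docroot, "page.py"), "w") as f:
    f.write("import os\nimport helper\nRESULT = helper.MESSAGE\n")

page = import_module("page", docroot)
print(page.RESULT)              # loaded from the same directory
print("helper" in sys.modules)  # False
```

Anything not found as a sibling file, such as the `import os` in page.py, falls through to the normal import machinery unchanged.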

How does this seem to people? There is still more detail in this part alone
which will need clarification, and there are other issues as well which I
haven't even mentioned.

Anyway, time to do some work.

Graham
