2005/6/8, Graham Dumpleton <[EMAIL PROTECTED]>:
> > c) the apache.import_module() function, which is frankly a strange
> > beast. It knows how to approximately reload some modules, but has many
> > tricks that makes it frankly dangerous, namely the one that causes
> > modules with the same name but residing in different directories to
> > collide. I really think that mixing dynamic (re)loading of code with
> > the usual import mechanisms is a recipe for disaster. Anyway, today,
> > it's the only way our users can import some shared code, using a
> > strange idiom like
> > apache.import_module('mymodule',[dirname(__file__)]).
> 
> I know you have marked:
> 
>    http://issues.apache.org/jira/browse/MODPYTHON-9
> 
> as resolved by virtue of including a new module importing system in
> publisher, but there is still the underlying problem in import_module()
> function that once you access an "index.py" in a subdirectory, the one
> in the parent is effectively lost. I realise that even if this is fixed,
> each still gets reloaded on cyclic requests, but at least the parent
> doesn't become completely useless.

Well, the publisher problem is really solved, and you *can* publish
two index.py modules in different directories, even subdirectories,
without any problem or cyclic reload.

However, if you are still using apache.import_module, for example to
dynamically import a support module into a published module, then
you'll have some trouble, because *apache.import_module is hopelessly
broken*. I can't stress that enough.

I'm sorry to be harsh, but this function cannot be fixed so that it
works correctly. Well, it can, but the result would be a kludge.
Right now, if two applications or two users (in shared hosting) have a
util.py module that they want to import, the first one will do
something like:

util = apache.import_module('util',['/path/to/my/application/code'])

and the second one, being very clever and having read the mailing
list, will do:

util = apache.import_module('util',[os.path.dirname(__file__)])

Bang, you've got a module collision. The problem is, there is no way
to solve this while retaining the current semantics of
apache.import_module.
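To make the collision concrete, here is a minimal sketch (all names
hypothetical, not mod_python's actual code) of an importer that, like
the current import_module, keys its cache by bare module name. The
second caller silently gets the first caller's module:

```python
import os
import types

_cache = {}  # keyed by module name only, like the current import_module

def naive_import(name, path):
    # Returns the cached module if one with this *name* was already
    # loaded, regardless of which directory it came from -- this is
    # exactly the collision described above.
    if name in _cache:
        return _cache[name]
    filename = os.path.join(path[0], name + ".py")
    module = types.ModuleType(name)
    module.__file__ = filename
    with open(filename) as f:
        exec(f.read(), module.__dict__)
    _cache[name] = module
    return module
```

With two different util.py files in two directories, the second call to
naive_import('util', ...) returns the first directory's module.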

apache.import_module tries to emulate an environment in which modules
are imported in the standard way, with a kind of
on-the-fly-reconfigurable PYTHONPATH. This simply does not work.

For starters, I really think that providing a search path as an
argument is very, very bad practice. It forces you to hardcode a path
into some application code (or to make complicated contortions using
os.path.dirname or req.get_option). I think the way you did it in
Vampire is pretty clever, Graham: you look into the directory of the
module which handles the current request (and you provide a special
__req__ attribute at import time so the imported module can refer to
the current request), then, if the module is not found there, into a
special, application-specific search path which can be redefined
through a configuration directive and is distinct from sys.path.

You put it better than me: we should not mix sys.path, the
application-specific search path and the published document tree.

Another problem with apache.import_module is that it pollutes
sys.modules, creating a wealth of collision opportunities. sys.modules
should be reserved for nice, standard Python modules and modules
imported from sys.path, not for dynamically loaded modules.

The nice thing is that I think I understand the usage patterns of
apache.import_module much better now. So I think I see a way to
reimplement and fix it with the following behaviour:

A) Finding the module
1) If the path parameter is passed, use it
2) If the path parameter is absent, look for the module in the "local"
directory (local being based on the __file__ attribute of the current
module)
3) If there is no matching "local" module, look for it in the
application-specific path which is defined in a configuration file. I
think this requires access to a request object, so this one could be
tricky.
4) Look for the module in sys.path
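As a rough sketch (every name here is hypothetical, not a proposed
API), the search order in A could look like this, with None meaning
"fall through to the standard sys.path import" (case 4):

```python
import os

def find_module_file(name, path=None, local_dir=None, app_path=None):
    """Return the filename of name.py, trying each location in the
    order described above, or None to fall back to sys.path."""
    if path is not None:
        # Case 1: an explicit path argument wins over everything else.
        dirs = list(path)
    else:
        dirs = []
        if local_dir:
            # Case 2: the directory of the calling module (derived
            # from its __file__ attribute).
            dirs.append(local_dir)
        if app_path:
            # Case 3: the application-specific path taken from the
            # configuration (this is the step that may need a request
            # object).
            dirs.extend(app_path)
    for directory in dirs:
        filename = os.path.join(directory, name + ".py")
        if os.path.exists(filename):
            return filename
    return None  # case 4: let the normal sys.path machinery handle it
```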

B) Loading or reloading the module
In cases 1 to 3, we use the cache.ModuleCache class, as in the
publisher, to load or reload the module. We DO NOT store the resulting
module in sys.modules. If two modules have the same name but reside in
different directories, there won't be any collision, thanks to
cache.ModuleCache. There also won't be any multithreading issues,
since ModuleCache uses a two-level locking scheme.
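I'm not reproducing cache.ModuleCache here, but the essential idea --
keying entries by absolute file path, and using a short global lock
plus one lock per entry -- can be sketched like this (simplified,
hypothetical code, not the real class):

```python
import os
import threading
import types

class PathKeyedCache:
    def __init__(self):
        self._global_lock = threading.Lock()
        self._entries = {}  # abs path -> [lock, module, mtime]

    def _entry(self, filename):
        # First level: hold the global lock only long enough to find
        # or create the entry for this path.
        with self._global_lock:
            if filename not in self._entries:
                self._entries[filename] = [threading.Lock(), None, None]
            return self._entries[filename]

    def import_file(self, filename):
        filename = os.path.abspath(filename)
        entry = self._entry(filename)
        # Second level: the per-entry lock, so loading one module does
        # not block imports of unrelated modules.
        with entry[0]:
            mtime = os.path.getmtime(filename)
            if entry[1] is None or entry[2] != mtime:
                # Load (or reload after a change) without ever
                # touching sys.modules.
                name = os.path.basename(filename)[:-3]
                module = types.ModuleType(name)
                module.__file__ = filename
                with open(filename) as f:
                    exec(f.read(), module.__dict__)
                entry[1], entry[2] = module, mtime
            return entry[1]
```

Because the key is the absolute path, two util.py files in different
directories get two distinct entries and two distinct module objects.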

In case 4, we acquire the import lock. We check for the existence of
the module in sys.modules.

- If it is absent, we load the module.
- If it is present, we check that its __file__ attribute is the same.
If not, something really weird is happening (the same module has moved
within sys.path, or there is a possible collision) and we throw an
ImportError. We then check for a __timestamp__ attribute: if it is
absent, then too bad, this module is a core Python module which was
not imported using import_module, and we don't reload it (we could
forcibly reload it, but I fear this would be a bit dangerous), so we
use it as is. If the __timestamp__ attribute is present, then we can
check the timestamp and reload the module if necessary.

Eventually, we set the __timestamp__ attribute on the dynamically
loaded module and store it in sys.modules, since we found the module
on sys.path in the first place (case 4). It seems good practice to
retain the "it's on sys.path, so it ends up in sys.modules"
semantics. We can finally release the import lock.

Note that we could extend this behaviour by checking dependencies, so
that a dynamically loaded module is reloaded whenever one of the
dynamically loaded modules it depends on is reloaded. We'll look at
that once the behaviour described above is implemented.

What do you think about that?

Regards,
Nicolas
