import_module() and multiple modules of same name.
--------------------------------------------------

         Key: MODPYTHON-115
         URL: http://issues.apache.org/jira/browse/MODPYTHON-115
     Project: mod_python
        Type: Bug
  Components: core  
    Versions: 3.1.4, 3.2    
    Reporter: Graham Dumpleton


The "apache.import_module()" function is a thin wrapper over the standard 
Python module importing system. This means that modules are still stored in 
"sys.modules". As modules in "sys.modules" are keyed by their module name, this 
in turn means that there can only be one active instance of a module for a 
specific name.
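
As a minimal sketch of that constraint, the following uses only the standard 
"importlib" module rather than mod_python itself. The module name 
"samename_demo" and the temporary directory layout are assumptions made up for 
the illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: two directories, each with its own "samename_demo.py".
root = tempfile.mkdtemp()
for sub in ("subdir-1", "subdir-2"):
    os.mkdir(os.path.join(root, sub))
    with open(os.path.join(root, sub, "samename_demo.py"), "w") as f:
        f.write("WHERE = %r\n" % sub)

# First import finds the copy in subdir-1 and records it in sys.modules.
sys.path.insert(0, os.path.join(root, "subdir-1"))
importlib.invalidate_caches()
first = importlib.import_module("samename_demo")

# Put subdir-2 ahead of subdir-1 on sys.path and import the same name again.
sys.path.insert(0, os.path.join(root, "subdir-2"))
importlib.invalidate_caches()
second = importlib.import_module("samename_demo")

# The second import is answered from sys.modules, so subdir-2 is never read.
print(first is second, second.WHERE)
```

Both names refer to the one module object loaded from "subdir-1", since the 
lookup by name in "sys.modules" succeeds before "sys.path" is consulted.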

The "import_module()" function tries to work around this by comparing the path 
of the module already loaded against that being requested and, if they differ, 
reloading the correct module. This check of the path though only occurs when 
the "path" argument is actually supplied to the "import_module()" function. The 
"path" is only supplied in this way when mod_python.publisher makes use of the 
"import_module()" function. It is not supplied when the "Python*Handler" 
directives are used, because in that circumstance a module may actually be a 
system module and supplying "path" would prevent it from being found.

Even though mod_python.publisher supplies the "path" argument to the 
"import_module()" function, the check of the path has bugs, with modules 
possibly becoming inaccessible as documented in JIRA as MODPYTHON-9. 

The check by mod_python of the path name of the actual code file for a module, 
made to determine if it should be reloaded, can also cause a continual cycle of 
module reloading even though the modules on disk have not changed. This occurs 
when successive requests alternate between URLs that map to distinct modules 
having the same name. This cyclic reloading is documented in JIRA as 
MODPYTHON-10.
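
The cycle can be pictured with a toy model. This is not mod_python's code; the 
"loaded" table, the "toy_import()" function and the request list are all 
hypothetical, and the only behaviour modelled is "reload when the cached module 
came from a different file".

```python
# Toy model only, NOT mod_python's implementation. All names are hypothetical.
loaded = {}        # module name -> file the cached copy was loaded from
reload_count = 0   # how many times a path mismatch forced a reload


def toy_import(name, path):
    """Return the file serving `name`, reloading if the cached file differs."""
    global reload_count
    if name in loaded and loaded[name] != path:
        reload_count += 1   # same name, different file: reload
    loaded[name] = path
    return loaded[name]


# Requests alternate between two directories; nothing on disk ever changes.
requests = ["subdir-1/index.py", "subdir-2/index.py"] * 3
for path in requests:
    toy_import("index", path)

print(reload_count)
```

In this model every request after the first forces a reload, because each one 
finds the cached "index" module belonging to the other directory.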

Because a module is reloaded into the same object space as the existing module 
when two modules of the same name are in different locations, namespace 
pollution and security issues can also arise if one location for the module is 
public and the other private. This cross contamination of modules is 
documented in JIRA as MODPYTHON-11.
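
A minimal sketch of the contamination, using the standard "importlib.reload()" 
as a stand-in for mod_python's reloading. The module name "pollute_demo", the 
directory names and the variable names are assumptions for the illustration 
only.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: a private and a public module sharing one name.
root = tempfile.mkdtemp()
sources = {
    "private": "SECRET = 'db-password'\n",
    "public": "GREETING = 'hello'\n",
}
for sub, body in sources.items():
    os.mkdir(os.path.join(root, sub))
    with open(os.path.join(root, sub, "pollute_demo.py"), "w") as f:
        f.write(body)

# Load the private copy first.
sys.path.insert(0, os.path.join(root, "private"))
importlib.invalidate_caches()
mod = importlib.import_module("pollute_demo")

# Now make the public copy the first match and reload into the same object.
sys.path.insert(0, os.path.join(root, "public"))
importlib.invalidate_caches()
importlib.reload(mod)

# Reloading re-executes code in the existing namespace without clearing it,
# so names defined only by the private copy are still present.
print(sorted(name for name in vars(mod) if name.isupper()))
```

After the reload the module reports the public file as its source, yet the 
"SECRET" name from the private copy remains reachable through it.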

In respect of the "Python*Handler" directives where the "path" argument was 
never supplied to the "import_module()" function, the result would be that the 
first module loaded under the specified name would be used. Thus, any 
subsequent module of the same name referred to by a "Python*Handler" directive 
found in a different directory but within the same interpreter would in effect 
be ignored.

A caveat to this though is that such a "Python*Handler" directive would result 
in that handler's directory being inserted at the head of "sys.path". If the 
first instance of the module loaded under that name were at some point 
modified, the module would be automatically reloaded, but it would load the 
version from the different directory.
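
The effect of that caveat can be sketched outside of mod_python. Here the 
standard "importlib.reload()" stands in for the automatic reload, and the 
explicit "sys.path.insert()" stands in for what a second handler directive 
would do; the module name "reload_demo" and the directory layout are 
hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: the same module name in two handler directories.
root = tempfile.mkdtemp()
for sub in ("subdir-1", "subdir-2"):
    os.mkdir(os.path.join(root, sub))
    with open(os.path.join(root, sub, "reload_demo.py"), "w") as f:
        f.write("WHERE = %r\n" % sub)

# The first handler's directory goes on sys.path and its module is loaded.
sys.path.insert(0, os.path.join(root, "subdir-1"))
importlib.invalidate_caches()
mod = importlib.import_module("reload_demo")
before = mod.WHERE

# A second handler directive puts its own directory at the head of sys.path.
sys.path.insert(0, os.path.join(root, "subdir-2"))
importlib.invalidate_caches()

# A change to the first file would trigger a reload. The reload searches
# sys.path again, so it finds the copy in subdir-2 first.
importlib.reload(mod)
print(before, "->", mod.WHERE)
```

The module object is the same before and after, but the reload repopulates it 
from the directory now at the head of "sys.path".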

Now, although these problems as they relate to mod_python.publisher are 
addressed in mod_python 3.2.6, the underlying problems in "import_module()" are 
not. As the bug reports relating to mod_python.publisher have been closed off 
as resolved, I am creating this bug report to track the underlying problem as 
it applies to the "Python*Handler" directives and to explicit use of 
"import_module()".

To illustrate the issue as it applies to the "Python*Handler" directives, create 
two separate directories, each with a .htaccess file containing:

  AddHandler mod_python .py
  PythonHandler index
  PythonDebug On

In the "index.py" file in each separate directory put:

  import os
  from mod_python import apache

  def handler(req):
    req.content_type = 'text/plain'
    print >> req, os.getpid(), __file__
    return apache.OK

Assuming these are accessed as:

  /~grahamd/mod_python_9/subdir-1/index.py
  /~grahamd/mod_python_9/subdir-2/index.py

access the first URL, and the result will be:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-1/index.py

now access the second URL and we get:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-1/index.py

Note this assumes the same child process handled both requests, so configuring 
Apache to run a single child process is required for this test.

As one can see, it doesn't actually use the "subdir-2/index.py" module at all 
and still uses the "subdir-1/index.py" module.

If one modifies "subdir-1/index.py" so its timestamp is updated and loads the 
second URL again, we get:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-2/index.py

This occurs because it detects the change in the first module loaded, but 
because the second handler directory is now at the head of "sys.path", the 
reload picks up the module from that directory instead.

These issues with same-named modules in multiple locations are listed as 
ISSUE 14 in my list of module importer problems. See:

  http://www.dscpl.com.au/articles/modpython-003.html

