I know the subject line doesn't mean much, but I want to outline an
idea I have for an addition to mod_python which would help solve
a few problems. The mail is likely to be long, but if people can understand
what I am going on about, I would appreciate some feedback.

Some background information first.

In order to define which module contains a handler to be executed,
or even a specific handler within the module, the Python*Handler
directives are used. Thus you might say:

  SetHandler mod_python
  PythonHandler mymodule

or:

  AddHandler mod_python .py
  PythonHandler mymodule::handler_py

In the case where the module is not installed into a standard location
such as the Python "site-packages" directory, but in the document tree,
mod_python will modify "sys.path", prepending to it what it believes
is the directory to which the Python*Handler directive is being applied.

Unfortunately, mod_python doesn't really get this right, or can't, for a
few cases.

The main cases which mod_python has to deal with are use of the
directive within a Directory directive or htaccess file, or inside of a
Location/LocationMatch or Files/FilesMatch directive. The file based
directives may also be nested within a Directory directive or htaccess
file.

When a Python*Handler directive appears immediately inside of a
Directory directive or htaccess file:

  <Directory /some/directory>
  SetHandler mod_python
  PythonHandler mymodule
  </Directory>

mod_python is able to get hold of the name of the directory and it
uses this to set up the value of "req.hlist.directory". It is the value of
"req.hlist.directory" that mod_python prepends to "sys.path". Having
added it to "sys.path", if the module resides in that directory it will be
able to be found.
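
As a rough illustration of the effect, the step amounts to something
like the following. This is only a sketch and not mod_python's actual
code; the helper name "prepend_handler_directory" is made up for the
example, and the directory is passed in as a plain string rather than
read from "req.hlist.directory".

```python
import sys


def prepend_handler_directory(directory, path=None):
    """Put `directory` at the front of the search path, once.

    Hypothetical helper: `path` defaults to sys.path, but a plain list
    can be passed so the effect can be inspected safely.
    """
    if path is None:
        path = sys.path
    # Skip empty values and avoid adding the same entry twice.
    if directory and directory not in path:
        path.insert(0, directory)
    return path


# Example with a stand-in list instead of the real sys.path.
search_path = ["/usr/lib/python/site-packages"]
prepend_handler_directory("/some/directory", search_path)
prepend_handler_directory("/some/directory", search_path)
```

After this, "/some/directory" is searched first, so a "mymodule.py"
placed there would be found ahead of anything installed elsewhere.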

Now consider the case of file based directives, specifically a Files
directive inside of a Directory directive:

  <Directory /some/directory>

  <Files *.py>
  SetHandler mod_python
  PythonHandler mymodule::handler_py
  </Files>

  <Files *.csv>
  PythonHandler mymodule::handler_csv
  </Files>

  </Directory>

As logged in:

  http://issues.apache.org/jira/browse/MODPYTHON-126

this does not work. The problem is that the value that mod_python is
accessing and trying to use for the directory is the string "*.py" or
"*.csv".
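
To show why that breaks things, here is a much simplified stand-in for
the module search. It is an assumption made for illustration only: the
function "find_module_file" is hypothetical and Python's real import
machinery does a lot more, but the outcome for a bogus entry is the
same. A temporary directory plays the part of "/some/directory".

```python
import os
import tempfile


def find_module_file(name, search_path):
    """Return the first '<entry>/<name>.py' that exists, else None."""
    for entry in search_path:
        candidate = os.path.join(entry, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None


# Stand-in for /some/directory, holding an empty mymodule.py.
directory = tempfile.mkdtemp()
open(os.path.join(directory, "mymodule.py"), "w").close()

# With the real directory on the path the module is found.
found_with_directory = find_module_file("mymodule", [directory])

# With the Files pattern on the path instead, nothing is found,
# because "*.py" does not name a directory containing the module.
found_with_pattern = find_module_file("mymodule", ["*.py"])
```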

In the case of a Location directive:

  <Location /some/url>
  AddHandler mod_python .py
  PythonHandler mod_python.publisher | .py
  </Location>

what mod_python is possibly going to use (I haven't confirmed it) is the
URL. Because a URL path looks like a directory, this could actually be
an issue: you get something that looks like a directory, but is not
necessarily one, being added to "sys.path".

Ultimately, this is why Python*Handler only works immediately inside of
a Directory directive or htaccess file, where you want the module to be
picked up from the directory.

FWIW, I haven't yet sat down and tried to work out if the Python*Handler
code in mod_python can tell when it is in a Location/Files directive,
ignore the value it is currently using at that point and set
"req.hlist.directory" to None instead if that is appropriate.

Now on to a prelude of what I really want to talk about.

In the new module importer which I have implemented, and will be
proposing as a replacement for the existing version in mod_python, the
first argument to "apache.import_module()" has been overloaded so that
instead of a module name, you can supply a full path name.

Thus, you can say:

  __here__ = os.path.dirname(__file__)
  module = apache.import_module(os.path.join(__here__,"_common.py"))

When supplying the full path name, the ".py" extension has to be specified.
I.e., it truly has to be an exact and full path name.

As well as in explicit calls to "apache.import_module()", a full path
name can be supplied in the Apache configuration:

  <Directory /some/directory>
  AddHandler mod_python .py
  PythonHandler /some/directory/handlers/mymodule.py::handler_py | .py
  </Directory>

Because having to specify full path names in a configuration file like this is
a pain, especially when the directory can change when files are moved
around, a shortcut is implemented.

Specifically, you can say:

  <Directory /some/directory>
  AddHandler mod_python .py
  PythonHandler ~/mymodule.py::handler_py | .py
  </Directory>

The "~/" at the start of the path provided to the directive will be
replaced with the actual directory the Python*Handler directive is being
used in. In this case, that is the "/some/directory" directory.

The "~/" prefix can also be used when calling "apache.import_module()"
explicitly:

  config = apache.import_module("~/common/config.py")

This may be done for a handler module in the actual directory or a
subdirectory. In other words, the directory that the Python*Handler
directive was used in becomes a context point or base directory from
which files can be addressed using relative paths. This is good because
you don't have to go around changing lots of hard coded paths when you
reorganise directory structures.

You can still use relative paths relative to the directory the module is in, but
instead of having to say:

  __here__ = os.path.dirname(__file__)
  module = apache.import_module(os.path.join(__here__,"_common.py"))

with the new module importer you can use:

  module = apache.import_module("./_common.py")

or:

  config = apache.import_module("../common/config.py")

In this case, files are loaded relative to "os.path.dirname(__file__)".
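
Putting the three forms together, the path resolution could be sketched
roughly as below. This is not the importer's real code; the function
name "resolve_module_path" and its arguments are assumptions made for
the example, with "handler_directory" standing in for
"req.hlist.directory" and "importing_file" for the "__file__" of the
module doing the import. POSIX style paths are assumed.

```python
import posixpath


def resolve_module_path(spec, handler_directory, importing_file):
    """Turn a path given to the importer into a normalised full path.

    Hypothetical sketch covering "~/", "./", "../" and full paths.
    """
    if spec.startswith("~/"):
        # Relative to where the Python*Handler directive was used.
        if not handler_directory:
            raise ValueError("no handler directory available for '~/'")
        base, rest = handler_directory, spec[2:]
    elif spec.startswith("./") or spec.startswith("../"):
        # Relative to the directory of the importing module.
        base, rest = posixpath.dirname(importing_file), spec
    elif posixpath.isabs(spec):
        return posixpath.normpath(spec)
    else:
        raise ValueError("not a path, treat as an ordinary module name")
    return posixpath.normpath(posixpath.join(base, rest))


here = "/some/directory/pages/index.py"
from_handler = resolve_module_path("~/common/config.py", "/some/directory", here)
from_sibling = resolve_module_path("./_common.py", "/some/directory", here)
from_parent = resolve_module_path("../common/config.py", "/some/directory", here)
```

Note that the "~/" form fails outright when no usable handler directory
is known, which is exactly the situation in the Files case above.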

Going back to the Directory/Files/Python*Handler issue, the ability to
use the prefix "~/" isn't of much actual use there. This is because the
"req.hlist.directory" attribute is set to a bogus value.

Thus we come to my actual idea that I want some feedback on.

The idea is to provide a new directive in mod_python that allows you to
mark an arbitrary point in the directory hierarchy as a context point or
base directory from which files can then be addressed using relative
paths.

This would be done by using the directive "PythonSetDirectory"
immediately inside any Directory directive or htaccess file. The name
supplied to the directive would be the tag for identifying that context
point.

  <Directory /some/directory>
  PythonSetDirectory myapproot
  </Directory>

Where a request falls in a directory for which multiple such base
contexts have been marked at different levels within the enclosing
parent directories, all would be accessible. This could be done by
adding a function "req.get_directories()" which would return a table
object accessible like a Python dictionary.

  def handler(req):
    myapproot = req.get_directories()["myapproot"]
    ...
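
How the set of visible tags might be worked out for a request can be
sketched as follows. Nothing here exists in mod_python; the function
"directories_for" and the tags "siteroot" and "elsewhere" are invented
for the example, and the tagged directories are given as a plain
dictionary rather than gathered from the Apache configuration.

```python
def directories_for(filename, tagged):
    """Return the tagged directories which enclose `filename`.

    `tagged` maps a tag name to the directory it was set in.
    Hypothetical sketch of what could sit behind the proposed
    req.get_directories().
    """
    result = {}
    for tag, directory in tagged.items():
        # Compare with a trailing slash so that "/some/dir" does not
        # wrongly match a file under "/some/directory".
        root = directory.rstrip("/") + "/"
        if filename.startswith(root):
            result[tag] = directory
    return result


tagged = {
    "siteroot": "/some",
    "myapproot": "/some/directory",
    "elsewhere": "/another/place",
}
visible = directories_for("/some/directory/pages/index.py", tagged)
```

Here both "siteroot" and "myapproot" would be visible to the handler,
since both directories enclose the file the request maps to.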

Tying this in with "apache.import_module()", as well as being able to
use a "~/" prefix, you could use "~tag/" where tag is the name supplied
to the directive. Thus:

  config = apache.import_module("~myapproot/common/config.py")

What this all does is allow you to define other base directory points besides
that defined by where Python*Handler was used.

In the code example above though, you still have to declare a path
relative to that root. Thus it may make sense to generalise the
mechanism a bit to allow base directory contexts to be explicitly
defined by path. These could even be expressed relative to other base
contexts. For example:

  PythonAddDirectory myextroot /some/external/directory

  <Directory /some/directory>
  PythonSetDirectory myapproot
  PythonAddDirectory myapproot_common ~myapproot/common
  </Directory>

Then you can say:

  config = apache.import_module("~myapproot_common/config.py")

This makes it even easier to change the structure of things around, localising
changes to one place, the Apache configuration.
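
The expansion of "~tag/" paths, including tags defined relative to
other tags, could work along these lines. Again this is only a sketch
under assumptions: "expand_tagged_path" and "add_directory" are made up
names, the dictionary stands in for whatever the proposed directives
would record, and POSIX style paths are assumed.

```python
import posixpath


def expand_tagged_path(spec, directories):
    """Expand a leading "~tag/" in `spec` using `directories`.

    `directories` maps tag names to base directories. Paths without
    a "~" prefix are just normalised.
    """
    if not spec.startswith("~"):
        return posixpath.normpath(spec)
    tag, _, rest = spec[1:].partition("/")
    if tag not in directories:
        raise KeyError("unknown base directory tag: %r" % tag)
    return posixpath.normpath(posixpath.join(directories[tag], rest))


def add_directory(directories, tag, spec):
    """Mimic the proposed PythonAddDirectory: the path may itself
    be given relative to a tag which has already been defined."""
    directories[tag] = expand_tagged_path(spec, directories)


# Equivalent of the configuration above.
directories = {"myapproot": "/some/directory"}  # PythonSetDirectory
add_directory(directories, "myextroot", "/some/external/directory")
add_directory(directories, "myapproot_common", "~myapproot/common")

config_path = expand_tagged_path("~myapproot_common/config.py", directories)
```

Because each tag is resolved to a full path at the time it is defined,
the order of the definitions matters in this sketch: a tag can only
refer to tags defined before it.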

Getting back again to the Directory/Files/Python*Handler case, it can now
be solved by using:

  <Directory /some/directory>

  PythonSetDirectory here

  <Files *.py>
  SetHandler mod_python
  PythonHandler ~here/mymodule.py::handler_py
  </Files>

  <Files *.csv>
  PythonHandler ~here/mymodule.py::handler_csv
  </Files>

  </Directory>

Finally, another use for these tagged base directories is that one
could use them to calculate relative URLs as well. It would just
involve some comparisons between req.filename, req.path_info and the
base directory to work it out.
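
One way the comparison could go is sketched below. It takes plain
strings rather than a request object, and besides the filename and path
info it also needs the request URI, which is an addition of mine to
make the example work. The function name "base_url" is hypothetical,
and the sketch assumes the URL below the base maps one to one onto the
file system layout, which will not hold for every configuration.

```python
import posixpath


def base_url(uri, filename, path_info, base_directory):
    """Work out the URL which corresponds to `base_directory`.

    Hypothetical sketch: strip the path info from the URI, then strip
    the part of the file name which lies below the base directory.
    """
    if path_info and uri.endswith(path_info):
        uri = uri[:-len(path_info)]
    relative = posixpath.relpath(filename, base_directory)
    if relative == ".." or relative.startswith("../"):
        raise ValueError("file is not below the base directory")
    if not uri.endswith("/" + relative):
        raise ValueError("URL does not mirror the file system layout")
    # Keep the trailing slash so the result can be used as a prefix.
    return uri[:-len(relative)]


root_url = base_url(
    uri="/app/pages/index.py/extra/info",
    filename="/some/directory/pages/index.py",
    path_info="/extra/info",
    base_directory="/some/directory",
)
```

With the base URL in hand, a link to any other resource under the same
base directory is just that prefix plus the path relative to the base.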

Now, the question is, is this going too far as to what mod_python
should itself be implementing? The benefit of doing it in mod_python
though is that it can be done in a way that is mostly transparent and
doesn't require too many special steps on the part of the user. It is
also cleanly integrated with the module importer. To do it outside of
mod_python would be harder, and if everyone were to implement something
similar, the implementations aren't going to be compatible.

That all said, specific feedback I am interested in is:

1. Is this a good idea and something worthwhile putting in mod_python?

2. If it is, what are better names for the directives? I.e., instead of
PythonSetDirectory and PythonAddDirectory.

3. Again if it is, what is a better name for "req.get_directories()"?

4. Lastly, if you think the idea is good, but how it is done is not, how do you
think it should be done and how do you think the Python*Handler in Files
directive problem should be solved, if at all?

Thanks for your patience in reading this ramble, and for any input you may
have.

Graham

