Grisha wrote ..
> 
> On Sun, 26 Mar 2006, Graham Dumpleton wrote:
> 
> > One use for it that I already have is to get around the DirectoryIndex
> > problems in mod_python caused by Apache's use of the 
> > ap_internal_fast_redirect() function to implement that feature. The 
> > specifics of this particular issue are documented under:
> >
> >  http://issues.apache.org/jira/browse/MODPYTHON-146
> 
> 
> Could we zoom in this a little bit. I've read the description, but not
> quite sure I understand it quite yet. Is "the problem" that if I set
> req.notes['foo'] = 'bar' in a phase prior to fixup, by the time we get
> to the content handler, it will be gone because notes would be
> overwritten by mod_dir?

Fixup phase or earlier actually. In the case of req.notes though, it isn't
that the value in req.notes vanishes, it is that it gets duplicated.

Consider .htaccess file containing:

  AddHandler mod_python .py
  PythonHandler mod_python.publisher
  PythonDebug On

  DirectoryIndex index.py

  PythonFixupHandler _fixup

In _fixup.py in the same directory, have:

  from mod_python import apache
  import time

  def fixuphandler(req):
    time.sleep(0.1)
    req.notes['time'] = str(time.time())
    return apache.OK

In index.py have:

  def index(req): 
    return req.notes['time']

When I use a URL:

  http://localhost:8080/~grahamd/fast_redirect/index.py

the result I get is:

  1143667522.23

I.e., a single float value, as a string, holding the time at which the
fixup handler ran for the request.

If I now instead access the directory using the URL:

  http://localhost:8080/~grahamd/fast_redirect/

I instead get:

  ['1143667680.57', '1143667680.47']

In other words, instead of getting the single value I now get two values
contained in a list. It wouldn't matter if the two values were the same;
they would both still be included. Where a content handler was expecting
a single string value, it would die when it gets a list.

What is happening is that when the request is made against the directory
it runs through the phases up to and including the fixup handler phase.
As a consequence it runs _fixup::fixuphandler() with req.notes['time']
being set to be the time at that point.

At the end of the fixup phase a mod_dir handler kicks in and it sees
that the file type of request_rec->filename as indicated by
request_rec->finfo->filetype is APR_DIR. As a consequence it will apply
the DirectoryIndex directive, looping through the listed files to find a
candidate it can redirect the request to.

In finding a candidate it reapplies phases up to and including the fixup
handler phase on the new candidate filename. This is done so that access
and authorisation checks etc are still performed on the candidate file.

Because it has run the fixup handlers on the candidate file,
_fixup::fixuphandler() will be run again. This results in
req.notes['time'] being set a second time. At that stage the req.notes
table is a separate one, as the phases are in effect run as a sub
request to the main request against the directory.

If after checking through the candidates it finds one that matches, to
avoid having to run phases up to and including the fixup handler phase
on the candidate again, mod_dir tries to fake a redirect. This is what
ap_internal_fast_redirect() is being used for.

What the method does is to copy details from the request_rec structure
of the sub request for the candidate into the request_rec of the main
request. When the mod_dir fixup handler returns, the main request
then continues on to execute the content handler phase, with the
details of the sub request.

The problem with this is that rather than simply using req.notes from
the sub request, or overlaying the contents from the sub request onto
that of the main request so that matching keys are replaced, it merges
them together and keeps both. You therefore end up with multiple entries
for the 'time' value which was added.
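
The merge can be mimicked in plain Python without Apache. What follows is
only a toy model and not the real APR or mod_python code: the Table class
and the overlay() function are made-up names, and the sketch assumes that
the overlay simply concatenates the entries of both tables and that a
lookup gives back a string for one match and a list for several.

```python
# Toy model of a table that may hold several entries for the same key.
# This is an illustration only, not the APR or mod_python implementation.
class Table(object):
    def __init__(self, entries=None):
        self.entries = list(entries or [])

    def __setitem__(self, key, value):
        # Plain assignment replaces any existing entries for the key.
        self.entries = [(k, v) for k, v in self.entries if k != key]
        self.entries.append((key, value))

    def __getitem__(self, key):
        values = [v for k, v in self.entries if k == key]
        if not values:
            raise KeyError(key)
        # One match comes back as a string, several as a list.
        return values[0] if len(values) == 1 else values


def overlay(over, base):
    # Assumed behaviour of the overlay: entries of both tables are
    # concatenated, duplicates included, with those of 'over' first.
    return Table(over.entries + base.entries)


main_notes = Table()
main_notes['time'] = '1143667680.47'   # fixup run for the directory

sub_notes = Table()
sub_notes['time'] = '1143667680.57'    # fixup run again for index.py

merged = overlay(sub_notes, main_notes)
print(merged['time'])
```

Run on its own, this prints ['1143667680.57', '1143667680.47'], the same
shape of result as the directory URL gave above, whereas either table by
itself still yields a single string.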

To emphasise the problem, change the fixup handler to be:

  from mod_python import apache

  def fixuphandler(req):
    req.notes['filename'] = req.filename
    return apache.OK

and index.py to:

  def index(req): 
    return req.notes['filename']

The result when the URL for the directory is used is:

  ['/Users/grahamd/Sites/fast_redirect/index.py',
   '/Users/grahamd/Sites/fast_redirect/']
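
A content handler that has to cope with both cases could normalise the
value defensively. This is a minimal sketch only: single_note() is a
hypothetical helper, not part of mod_python, and it assumes, going purely
by the two results shown so far, that the entry set for the index file
comes first in the list.

```python
def single_note(value):
    # Hypothetical helper: collapse a possibly duplicated table value
    # down to a single string.
    if isinstance(value, list):
        # Assumption: the first entry is the one set for the index file,
        # as seen in the results above.
        return value[0]
    return value


print(single_note('/Users/grahamd/Sites/fast_redirect/index.py'))
print(single_note(['/Users/grahamd/Sites/fast_redirect/index.py',
                   '/Users/grahamd/Sites/fast_redirect/']))
```

In index.py one would then return single_note(req.notes['filename']). It
hides the symptom for req.notes only and does nothing about the cause.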

Now it isn't just req.notes that is going to see this merging as the code
in ap_internal_fast_redirect() is:

    r->notes = apr_table_overlay(r->pool, rr->notes, r->notes);
    r->headers_out = apr_table_overlay(r->pool, rr->headers_out,
                                       r->headers_out);
    r->err_headers_out = apr_table_overlay(r->pool, rr->err_headers_out,
                                           r->err_headers_out);
    r->subprocess_env = apr_table_overlay(r->pool, rr->subprocess_env,
                                          r->subprocess_env);

Thus, it also merges output headers and subprocess environment variables.
The merging of these could in itself also cause problems.

This isn't the end of the problems though as ap_internal_fast_redirect()
doesn't do anything with:

    /** Notes on *this* request */
    struct ap_conf_vector_t *request_config;

This has two implications for mod_python.

The first is that it is the request_config that the Python request
object instance is cached in. Because the request_config is still that
of the main request, when the content handler phase is executed, it will
pick up the Python request object of the main request. Thus, any
attributes added directly to the Python request object by the sub
request will be missing.

To illustrate this, change the fixup handler to:

  from mod_python import apache
  
  def fixuphandler(req):
    if req.finfo[apache.FINFO_FILETYPE] != apache.APR_DIR:
      req.attribute = req.filename
    return apache.OK

and index.py to:

  def index(req): 
    return req.attribute

Then access the index file directly using:

  http://localhost:8080/~grahamd/fast_redirect/index.py

The result is:

  /Users/grahamd/Sites/fast_redirect/index.py

Now access:

  http://localhost:8080/~grahamd/fast_redirect/

The result is an exception:

  AttributeError: 'mp_request' object has no attribute 'attribute'

Now, in the fixup handler I specifically checked for file type not equal
to a directory so that the attribute was only set when fixup handler for
index.py was run. So you don't think I am doing something wrong, you
could instead use a .htaccess file containing:

  AddHandler mod_python .py
  PythonHandler mod_python.publisher
  PythonDebug On
  
  DirectoryIndex index.py

  PythonFixupHandler _fixup | .py

and fixup handler of:

  from mod_python import apache

  def fixuphandler(req):
    req.attribute = req.filename
    return apache.OK

The important thing is that whether index.py is called directly or by
application of DirectoryIndex, it should behave the same, and it
doesn't.
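
The mechanism can be sketched in plain Python. Everything here is a
stand-in invented for illustration: RequestRec, PyRequest,
get_request_object() and fast_redirect() are not the real Apache or
mod_python names. The model assumes only what is described above, namely
that the Python request object is cached in request_config and that the
fast redirect copies other details across but leaves request_config
alone.

```python
class RequestRec(object):
    # Stand-in for Apache's request_rec; only the fields needed here.
    def __init__(self, filename):
        self.filename = filename
        self.request_config = {}   # per-request module data


class PyRequest(object):
    # Stand-in for the Python request object wrapping a request_rec.
    def __init__(self, rec):
        self._rec = rec

    @property
    def filename(self):
        return self._rec.filename


def get_request_object(rec):
    # Create the Python object once per request_rec and cache it in
    # that request_rec's request_config.
    if 'python' not in rec.request_config:
        rec.request_config['python'] = PyRequest(rec)
    return rec.request_config['python']


def fixuphandler(req):
    req.attribute = req.filename


def fast_redirect(rr, r):
    # Copies details from the sub request into the main request,
    # but does not touch r.request_config.
    r.filename = rr.filename


main = RequestRec('/Users/grahamd/Sites/fast_redirect/')
sub = RequestRec('/Users/grahamd/Sites/fast_redirect/index.py')

fixuphandler(get_request_object(sub))   # fixup runs for index.py only
fast_redirect(sub, main)

req = get_request_object(main)          # what the content handler gets
print(req.filename)
print(hasattr(req, 'attribute'))
```

The content handler sees the filename of index.py, yet the attribute set
during the sub request lives on a different Python object and so is not
there.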

The second problem with request_config not being copied is that details
of any Python based output filters registered from within the fixup
handler are also being lost.

Keeping the .htaccess file such that fixup handler only runs with a .py
extension, change fixup handler to be:

  from mod_python import apache
  
  def outputfilter(filter):
    apache.log_error("outputfilter")
    filter.pass_on()
    return apache.OK

  def fixuphandler(req):
    req.register_output_filter("PASS", "_fixup::outputfilter")
    req.add_output_filter("PASS")
    return apache.OK

and index.py to:

  def index(req): 
    return "HELLO"

Access index.py directly and you get:

  HELLO

Check the Apache error log and you will see:

  outputfilter

logged from the filter.

Access the directory and the browser gives an error saying it couldn't load
any data from that location. Look at the Apache error log and you will get
the error:

  python_filter: Could not find registered filter.

This is because Apache had a callback in place to call mod_python for the
filter, but then mod_python could not find it, as the registration details
were still in the request_config of the sub request request_rec and weren't
copied into the main request.
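
Again this can be modelled without Apache. The names RequestRec,
register_and_add(), fast_redirect() and run_filters() are all invented for
the sketch, and it rests on an assumption drawn from the error message
rather than from reading the code: that the entry in the filter chain
reaches the main request while the registration held in request_config
does not.

```python
class RequestRec(object):
    # Stand-in for request_rec with just the two pieces of state involved.
    def __init__(self):
        self.output_filters = []   # names of filters Apache will call
        self.request_config = {}   # where the registration is kept


def register_and_add(rec, name, func):
    # Models registering a filter and then adding it to the chain.
    rec.request_config.setdefault('filters', {})[name] = func
    rec.output_filters.append(name)


def fast_redirect(rr, r):
    # Assumption: the filter chain carries over, request_config does not.
    r.output_filters = rr.output_filters


def run_filters(rec, data):
    registered = rec.request_config.get('filters', {})
    for name in rec.output_filters:
        if name not in registered:
            raise LookupError('Could not find registered filter.')
        data = registered[name](data)
    return data


main, sub = RequestRec(), RequestRec()
register_and_add(sub, 'PASS', lambda data: data)   # done in the sub request
fast_redirect(sub, main)

print(run_filters(sub, 'HELLO'))
try:
    run_filters(main, 'HELLO')
except LookupError as exc:
    print('main request: %s' % exc)
```

The sub request can run its filter, but the main request is left with a
chain entry for which it has no registration, which is the situation the
error log reports.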

Thus there are a series of problems because of how ap_internal_fast_redirect()
is implemented and used by mod_dir.

On the main Apache httpd mailing list it was acknowledged that the way
the data is merged is wrong and that ap_internal_fast_redirect() was in
general causing problems for other Apache modules as well, such as
mod_rewrite. Some suggested that the fast redirect should be avoided in
favour of a full internal redirect, but that such a change couldn't be
made until Apache 2.4. As this is of no help now, a workaround is
required, and my example was one such workaround.

Note though that this whole issue of problems with the fast redirect is totally
distinct from whether req.finfo should be able to be updated. It just so
happened that I was wanting that ability to implement the workaround. I still
contend that there are other legitimate reasons for wanting to have req.finfo
be updatable.

BTW, some of the examples above only work with the 3.3 working version.
Specifically the output filter example.

Graham
