Re: Auto updating of req.finfo when req.filename changed.
On Sun, 26 Mar 2006, Graham Dumpleton wrote: One use for it that I already have is to get around the DirectoryIndex problems in mod_python caused by Apache's use of the ap_internal_fast_redirect() function to implement that feature. The specifics of this particular issue are documented under: http://issues.apache.org/jira/browse/MODPYTHON-146 Could we zoom in this a little bit. I've read the description, but not quite sure I understand it quite yet. Is the problem that if I set req.notes['foo'] = 'bar' in a phase prior to fixup, by the time we get to the content handler, it will be gone because notes would be overwritten by mod_dir? Grisha
Re: Auto updating of req.finfo when req.filename changed.
Grisha wrote .. On Sun, 26 Mar 2006, Graham Dumpleton wrote: One use for it that I already have is to get around the DirectoryIndex problems in mod_python caused by Apache's use of the ap_internal_fast_redirect() function to implement that feature. The specifics of this particular issue are documented under: http://issues.apache.org/jira/browse/MODPYTHON-146 Could we zoom in this a little bit. I've read the description, but not quite sure I understand it quite yet. Is the problem that if I set req.notes['foo'] = 'bar' in a phase prior to fixup, by the time we get to the content handler, it will be gone because notes would be overwritten by mod_dir? Fixup phase or earlier actually. In the case of req.notes though, it isn't that the value in req.notes vanishes, it is that it gets duplicated. Consider .htaccess file containing: AddHandler mod_python .py PythonHandler mod_python.publisher PythonDebug On DirectoryIndex index.py PythonFixupHandler _fixup In _fixup.py in the same directory, have: from mod_python import apache import time def fixuphandler(req): time.sleep(0.1) req.notes['time'] = str(time.time()) return apache.OK In index.py have: def index(req): return req.notes['time'] When I use a URL: http://localhost:8080/~grahamd/fast_redirect/index.py the result I get is: 1143667522.23 Ie., a single float value holding the time the request was made. If I now instead access the directory using the URL: http://localhost:8080/~grahamd/fast_redirect/ I instead get: ['1143667680.57', '1143667680.47'] In other words, instead of getting the single value I now get two values contained in a list. It wouldn't matter if the the two values were the same they would both still be included. Where a content handler was expecting a single string value, it would die when it gets a list. What is happening is that when the request is made against the directory it runs through the phases up to and including the fixup handler phase. As a consequence it runs _fixup::fixuphandler() with req.notes['time'] being set to be the time at that point. At the end of the fixup phase a mod_dir handler kicks in and it sees that the file type of request_rec-filename as indicated by request_rec-finfo-filetype is APR_DIR. As a consequence it will apply the DirectoryIndex directive, looping through listed files to find a candidate it can redirect the request too. In finding a candidate it reapplies phases up to and including the fixup handler phase on the new candidate filename. This is done so that access and authorisation checks etc are still performed on the candidate file. Because it has run the fixup handlers on the candidate file, the _fixup::fixuphandler() will be run again. This results in req.notes being set. At that stage the req.notes is separate as it is in effect run as a sub request to the main request against the directory. If after checking through the candidates it finds one that matches, to avoid having to run phases up to and including the fixup handler phase on the candidate again, mod_dir tries to fake a redirect. This is what ap_internal_fast_redirect() is being used for. What the method does is to copy details from the request_rec structure of the sub request for the candidate into the request_rec of the main request. When the mod_dir fixup handler returns, the main request then continues on to execute the content handler phase, with the details of the sub request. The problem with this is that rather than simply using req.notes from the sub request, or overlapping the contents from the sub request onto that of the main request, it merges them together. You therefore end up with multiple entries for the 'time' value which was added. To emphasise the problem, change the fixup handler to be: from mod_python import apache def fixuphandler(req): req.notes['filename'] = req.filename return apache.OK and index.py to: def index(req): return req.notes['filename'] The result when using URL against the directory is used is: ['/Users/grahamd/Sites/fast_redirect/index.py', '/Users/grahamd/Sites/fast_redirect/'] Now it isn't just req.notes that is going to see this merging as the code in ap_internal_fast_redirect() is: r-notes = apr_table_overlay(r-pool, rr-notes, r-notes); r-headers_out = apr_table_overlay(r-pool, rr-headers_out, r-headers_out); r-err_headers_out = apr_table_overlay(r-pool, rr-err_headers_out, r-err_headers_out); r-subprocess_env = apr_table_overlay(r-pool, rr-subprocess_env, r-subprocess_env); Thus, it also merges output headers and subprocess environment variables. The merging of these could in themselves also cause problems. This isn't the end of the problems though as ap_internal_fast_redirect() doesn't do anything with: /** Notes on
Re: Auto updating of req.finfo when req.filename changed.
I'm -1 on updating req.finfo when req.filename is updated. I don't have the time to explain it in detail, but I think it flows from Graham's explanation below. Basically, httpd does a stat() for you, and its result is stored in finfo. Why make it any more complicated and magical? If you alter req.filename - should that imply another stat()? I don't think so, because the stat() is the most expensive operations in the whole request service process and unnecessary stats should be avoided. I also don't see any good use case for it. If you need a stat - Python provides way for you to do it, there is no need to get httpd involved in this process. I, of course, may be missing something obvious here, but this is my initial reaction. Let httpd/apr behave the way they behave - there is no need to pythonize them, especially when the Python standard library provides the same functionality in a pythonic way. Grisha On Sun, 26 Mar 2006, Jim Gallacher wrote: Hi Graham, +1 auto update req.finfo when req.filename changed. Is there a use case where the user might change filename but not want finfo to change? I can't think of one, so let's save the user some work and make their code more robust to boot. A point I'd like to address is your concern about mod_python differing from the Apache C api in implementing certain features. Personally I think our general concern about this is somewhat misguided. I suspect that the majority of our users are more concerned about the python-ness of mod_python rather than its apache-ness, and most would never notice if we deviate slightly from the apache C api. Adding a method or attribute to the request object for example, if it's useful to *our* users, shouldn't be rejected out of hand just because it does not exist in the underlying api. Jim Graham Dumpleton wrote: Now the mailing list is a bit quiet, I would like to see if I can get some explicit feedback on some issues related to the inability to update the req.finfo attribute. Grisha, would be nice if you could respond on this issue and give some guidance else I fear I'll never be able to progress a solution to this issue. :-( As explained in: http://issues.apache.org/jira/browse/MODPYTHON-128 although it is possible to assign a new value to req.filename, there is no way to update req.finfo to the file stats associated with that new value of req.filename. If one had access to the low level C API this would normally be achieved using: apr_stat(r-finfo, r-filename, APR_FINFO_MIN, r-pool); In mod_python though, there is no way to access the function and affect that outcome. In mod_perl 1.0 they implemented the behaviour whereby the finfo attribute was automatically updated when the filename attribute was updated by a handler. In mod_perl 2.0 they dropped this though, as they wanted to preserve the idea that in mod_perl everything behaved exactly like the C API they were trying to provide a 1 to 1 mapping for. Thus in mod_perl 2.0 you need to write: use Apache2::RequestRec (); use APR::Finfo (); use APR::Const -compile = qw(FINFO_NORM); $r-filename($newfile); $r-finfo(APR::Finfo::stat($newfile, APR::Const::FINFO_NORM, $r-pool)); As mod_python isn't attempting to provide a strict 1 to 1 mapping, it might be argued that it could do what mod_perl 1.0 did and automatically updated the finfo attribute when filename is updated. The only other alternative is to add a new method to the Python request object for which there isn't strictly speaking a direct equivalent to in the Apache C API. That is, a method that calls apr_stat() but which only performs it in relation to the filename and finfo attributes in the request object itself and is not a generic routine. Since it isn't likely that mod_python will ever provide a lower level API for use of finfo related structures and functions and even if it did they most likely would be distinct to the request object, the name of the function added to the request object could still be called stat(). Thus mod_python equivalent to what mod_perl 2.0 does would be: req.filename = newfile req.stat() This though doesn't really convey a sense of what it occurring. Thus a more descriptive name would probably be more appropriate. For example: req.filename = newfile req.update_finfo() There is no ap_update_finfo() function now, but if they did later implement one, this would shadow it and prevent it being added to the request object if it was pertinent for that be done. The next problem is that apr_stat() actually takes an argument indicating what fields in the finfo attribute should be updated. In mod_perl 1.0 the value used when the automatic update was done was APR_FINFO_MIN which results in type, mtime, ctime, atime, size being updated. In the documentation for mod_perl 2.0 it suggests use of APR_FINFO_NORM instead which is described as intended to be used when an atomic unix apr_stat() is required whatever that means. Important
Re: Auto updating of req.finfo when req.filename changed.
Sorry, you have missed an important point. I am not suggesting this as some sort of convoluted alternative to using os.stat() in Python for when a Python based handlers wants to stat some alternative file. I am specifically addressing the issue where request_rec-filename is being updated to affect the outcome of a later handler phase which isn't even written in Python but C or Perl and which is designed to look at request_rec-filename and which correspondingly expects that request_rec-finfo be for the current file mentioned in request_rec-filename. The example I gave was: def headerparserhandler(req): if req.finfo[apache.FINFO_FILETYPE] == apache.APR_DIR: if req.uri[-1] == '/': req.filename = posixpath.join(req.filename, 'index.html') # XXX # Need to somehow ensure that req.finfo (actually # request_rec-finfo) is updated here to contain # apr_stat() result for new value of req.filename. # XXX return apache.OK The purpose of this handler is to get around the problem that the DirectoryIndex directive is effectively broken for mod_python if you use handlers prior to the content handler phase. Even though a workaround for a particular issue, updating req.filename and req.finfo is still a legitimate thing that can be done in C API modules to affect behaviour of later handlers implemented by other modules. In other words, please don't judge the example on the basis that it is a workaround for an Apache problem and say the real change should be to fix Apache's DirectoryIndex code. Now as to the handler above, it needs to do two things. The first is to update request_rec-filename. The new value of this field will then be consulted by the type checker phase of mod_mime to update the request_rec-content_type attribute. The second is that it needs to somehow force the update of the request_rec-finfo attribute to correspond to the new value of the request_rec-filename attribute. This then ensures that the file stats will reflect that it is a file and not a directory. If this update to request_rec-finfo is not done and it still records details of the original directory, when the fixup handler of mod_dir is executed, then it will still think the target of the request is a directory and will still trigger the broken DirectoryIndex code which is not what is desired. Thus, no matter how it is done, whether it be automatic or whether an explicit function is provided to do it, there needs to be some way of updating the underlying request_rec-finfo attribute. Saying to use Python os.stat() is of no use as req.finfo isn't writable, is not the same structure as what os.stat() returns and isn't a full reflection of what the underlying request_rec-finfo attribute holds anyway. So, you couldn't even fake up request_rec-finfo from the result of os.stat() anyway. There is therefore two things that need to worked out. The first is a consensus that there needs to be a way of updating req.finfo to correspond to new value of req.filename. Second is to work out how that should be able to be done. At the moment I get the feeling that you question the need for req.finfo to be updatable in the first place. I also think saying a stat is expensive is a red herring as we are talking about a whether specific case here. In the majority of cases, no one would have a legitimate need to be modifying req.filename in the first place, so no extra stat is going to be done, so cost of the stat is not an issue. Where req.filename is modified though, it would normally follow that a stat should be done to update req.finfo. In the above example to workaround DirectoryIndex problem the outcome is actually more effecient than the alternative, which is to use req.internal_redirect(). Thus, that one extra stat is performed is nothing compared to what is saved in avoiding the redirect. Now of all the code I have seen so far, the only time I have ever seen anyone modify req.filename is in mod_python.publisher. It updates it so that req.filename will properly yield the name of the Python code file that the request mapped against. Because mod_python.publisher partly uses it as a temporary variable rather than use a true temporary variable and only assign to it once, an automatic update of finfo would incur extra cost. But then, except for one case, mod_python.publisher is doing its own stat of some form after each assignment anyway, so it could eliminate those and use the automatically updated finfo. The one case it isn't stating is when it normalises the path for local OS, which strictly speaking it possibly shouldn't do as I think Apache might always provide req.filename in POSIX like format anyway and thus it is breaking that convention. That mod_python.publisher updates req.filename but not req.finfo also means that one can't use the mtime stored in req.finfo as input to req.set_last_modified(). Ie.,set last modified header
Re: Auto updating of req.finfo when req.filename changed.
Hi Graham, +1 auto update req.finfo when req.filename changed. Is there a use case where the user might change filename but not want finfo to change? I can't think of one, so let's save the user some work and make their code more robust to boot. A point I'd like to address is your concern about mod_python differing from the Apache C api in implementing certain features. Personally I think our general concern about this is somewhat misguided. I suspect that the majority of our users are more concerned about the python-ness of mod_python rather than its apache-ness, and most would never notice if we deviate slightly from the apache C api. Adding a method or attribute to the request object for example, if it's useful to *our* users, shouldn't be rejected out of hand just because it does not exist in the underlying api. Jim Graham Dumpleton wrote: Now the mailing list is a bit quiet, I would like to see if I can get some explicit feedback on some issues related to the inability to update the req.finfo attribute. Grisha, would be nice if you could respond on this issue and give some guidance else I fear I'll never be able to progress a solution to this issue. :-( As explained in: http://issues.apache.org/jira/browse/MODPYTHON-128 although it is possible to assign a new value to req.filename, there is no way to update req.finfo to the file stats associated with that new value of req.filename. If one had access to the low level C API this would normally be achieved using: apr_stat(r-finfo, r-filename, APR_FINFO_MIN, r-pool); In mod_python though, there is no way to access the function and affect that outcome. In mod_perl 1.0 they implemented the behaviour whereby the finfo attribute was automatically updated when the filename attribute was updated by a handler. In mod_perl 2.0 they dropped this though, as they wanted to preserve the idea that in mod_perl everything behaved exactly like the C API they were trying to provide a 1 to 1 mapping for. Thus in mod_perl 2.0 you need to write: use Apache2::RequestRec (); use APR::Finfo (); use APR::Const -compile = qw(FINFO_NORM); $r-filename($newfile); $r-finfo(APR::Finfo::stat($newfile, APR::Const::FINFO_NORM, $r-pool)); As mod_python isn't attempting to provide a strict 1 to 1 mapping, it might be argued that it could do what mod_perl 1.0 did and automatically updated the finfo attribute when filename is updated. The only other alternative is to add a new method to the Python request object for which there isn't strictly speaking a direct equivalent to in the Apache C API. That is, a method that calls apr_stat() but which only performs it in relation to the filename and finfo attributes in the request object itself and is not a generic routine. Since it isn't likely that mod_python will ever provide a lower level API for use of finfo related structures and functions and even if it did they most likely would be distinct to the request object, the name of the function added to the request object could still be called stat(). Thus mod_python equivalent to what mod_perl 2.0 does would be: req.filename = newfile req.stat() This though doesn't really convey a sense of what it occurring. Thus a more descriptive name would probably be more appropriate. For example: req.filename = newfile req.update_finfo() There is no ap_update_finfo() function now, but if they did later implement one, this would shadow it and prevent it being added to the request object if it was pertinent for that be done. The next problem is that apr_stat() actually takes an argument indicating what fields in the finfo attribute should be updated. In mod_perl 1.0 the value used when the automatic update was done was APR_FINFO_MIN which results in type, mtime, ctime, atime, size being updated. In the documentation for mod_perl 2.0 it suggests use of APR_FINFO_NORM instead which is described as intended to be used when an atomic unix apr_stat() is required whatever that means. Important to note though is that is that the ap_directory_walk() function in Apache which is used to map a URL against a file in the filesystem uses APR_FINFO_MIN. Now if a function were to be provided, it seems to make sense that it have a default whereby it uses APR_FINO_MIN, much as would be the case if the finfo attribute were updated automatically when filename is updated. Should though a function if provided allow the ability to supply an alternate for this value so as to be selective as to what attributes of finfo are updated? If it were allowed, have the problem that there are already attributes in mod_python for: FINFO_MODE = 0 FINFO_INO = 1 FINFO_DEV = 2 FINFO_NLINK = 3 FINFO_UID = 4 FINFO_GID = 5 FINFO_SIZE = 6 FINFO_ATIME = 7 FINFO_MTIME = 8 FINFO_CTIME = 9 FINFO_FNAME = 10 FINFO_NAME = 11 FINFO_FILETYPE = 12 Rather than these being equivalents to the APR constants: #define APR_FINFO_LINK 0x0001 #define