Re: [jira] Updated: (MODPYTHON-106) PythonAutoReload directive can't be turned off in config.
On Mon, 2 Jan 2006, Graham Dumpleton (JIRA) wrote: Should this be fixed in 3.2, given that it probably hasn't work in 3.0 and 3.1? I think it would be a good idea. Hopefully this is the last fix before we roll it out. Grisha
Re: [jira] Created: (MODPYTHON-105) mod_python.publisher should not discard content for HEAD request.
On Mon, 2 Jan 2006, Graham Dumpleton (JIRA) wrote: In addressing MODPYTHON-71, mod_python.publisher code was changed to read: if req.method!='HEAD': req.write(result) This change should not really have been made and it should be changed back to what was there before, ie., just: req.write(result) [...] As an an example of an Apache module that uses output filters to do stuff, there is mod_cache. Luckily in that case, a HEAD request is one of various cases where mod_cache decides it will not use the output. This does not mean though that some other output filter that someone is using might expect content to be there for HEAD. I am not sure I agree with this explanation (while I do agree that the behaviour should be reverted, but for different reason, see below). The RFC is pretty clear on HEAD: The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response., so for a filter (or anything) to expect the content with HEAD is wrong. In summary, one could also say that if a user wants to not output anything for a HEAD request, that is there decision, but mod_python.publisher should not be making that decision for them. May be not for the publisher, but given the MUST NOT it should be OK for either httpd or mod_python to do this. If httpd does not do it, then perhaps it should be made part of req.write()? It may be a good idea to check the httpd-dev archives to see if the issue of HEAD has been discussed in the past. In any event, I think the behaviour should be reverted to ignore HEAD for now simply because it is a half-ass solution given that req.write() gets through, especially because PSP templates rely on req.write() primarily. I think for 3.2 we can just leave it at that, but for 3.3 to seriously consider whether mod_python should chop output for HEAD requests. Also, I think a link to this JIRA issue as a comment somewhere in the code would be great or someone down the road will repeat this whole HEAD thing again. Grisha
Re: [jira] Created: (MODPYTHON-105) mod_python.publisher should not discard content for HEAD request.
On Wed, 4 Jan 2006, Gregory (Grisha) Trubetskoy wrote: It may be a good idea to check the httpd-dev archives to see if the issue of HEAD has been discussed in the past. Here's a relevant bit of info from the httpd-dev archives: * All handlers should always send content down even if r-header_only is set. If not, it means that the HEAD requests don't generate the same headers as a GET which is wrong. Grisha
Re: [jira] Created: (MODPYTHON-105) mod_python.publisher should notdiscard content for HEAD request.
Grisha wrote .. As an an example of an Apache module that uses output filters to do stuff, there is mod_cache. Luckily in that case, a HEAD request is one of various cases where mod_cache decides it will not use the output. This does not mean though that some other output filter that someone is using might expect content to be there for HEAD. I am not sure I agree with this explanation (while I do agree that the behaviour should be reverted, but for different reason, see below). The RFC is pretty clear on HEAD: The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response., so for a filter (or anything) to expect the content with HEAD is wrong. That is ignoring what filters can do, including the fact that they can add additional output headers. If a filter can add additional headers, that it doesn't receive content can be a problem. This is because the additional headers it adds may be in some way be dependent on the content. In effect this is partly what your quote from the http-dev list is about. * All handlers should always send content down even if r-header_only is set. If not, it means that the HEAD requests don't generate the same headers as a GET which is wrong. I'll grant that I may be partly talking theoretical cases here, but the inbuilt Apache CONTENT_LENGTH output filter can be used to add the content length to a response, provided the output filter is able to consume the whole content the first time it is called. Here is a case whereby if mod_python.publisher didn't return the content for a HEAD request, that the content length thereby calculated by the CONTENT_LENGTH output filter would be different for the HEAD request compared to the same request as a GET. Now if I can just work out why when I add CONTENT_LENGTH as an output filter in conjunction with mod_python.publisher that I don't get a content length at all, I might have a leg to stand on. :-) I note that if I have a Python based output filter, that a single req.write() generates two calls into the output filter. The first ends because of read returning an empty string, with the second reading None straight away. [Thu Jan 05 08:49:00 2006] [error] handler_1 [Thu Jan 05 08:49:00 2006] [error] uppercase [Thu Jan 05 08:49:00 2006] [error] is_input 0 [Thu Jan 05 08:49:00 2006] [error] read 'CONTENT' [Thu Jan 05 08:49:00 2006] [error] write 'CONTENT' [Thu Jan 05 08:49:00 2006] [error] read '' [Thu Jan 05 08:49:00 2006] [error] uppercase [Thu Jan 05 08:49:00 2006] [error] is_input 0 [Thu Jan 05 08:49:00 2006] [error] read None [Thu Jan 05 08:49:00 2006] [error] close For CONTENT_LENGTH to work, I would image it should be getting the equivalent of the None read at the end of the first call into the output filter. I wander if this is somehow specifically to do with how mod_python manages the bucket brigade. It would be unfortunate if mod_python is doing something a bit strange which would prevent the CONTENT_LENGTH output filter being used. Wait, I worked it out. Two calls are being seen into the output filter because req.write() is not being told not to flush the output. Thus, if req.write() is called as: req.write(result,0) it all works and content length is added by the CONTENT_LENGTH output filter. Thus with that change, one can see the problem with not outputing content when HEAD is used. Namely, no content length header generated. ~ [505]$ telnet localhost 8080 Trying ::1... Connected to localhost. Escape character is '^]'. GET /~grahamd/content_length/example.py HTTP/1.0 HTTP/1.1 200 OK Date: Wed, 04 Jan 2006 21:59:39 GMT Server: Apache/2.0.55 (Unix) mod_python/3.2.6-dev-20051229 Python/2.3 Content-Length: 7 Connection: close Content-Type: text/plain CONTENTConnection closed by foreign host. ~ [506]$ telnet localhost 8080 Trying ::1... Connected to localhost. Escape character is '^]'. HEAD /~grahamd/content_length/example.py HTTP/1.0 HTTP/1.1 200 OK Date: Wed, 04 Jan 2006 21:59:53 GMT Server: Apache/2.0.55 (Unix) mod_python/3.2.6-dev-20051229 Python/2.3 Connection: close Content-Type: text/plain Connection closed by foreign host. Either way, we agree that mod_python.publisher should still output content for HEAD. I would also propose as a change that the req.write() call not cause output to be flushed to allow an output filter like CONTENT_LENGTH to be used. I'll add a new JIRA issue for that. Graham
BUG?: Segfault with add_handler
Hi, So far, I really like mod_python -- it's a great piece of software. But I think I've found a small bug. I've noticed that adding a handler using add_handler prompts apache to segfault if no statically registered (e.g., specified in the apache conf files) handlers of the same category (e.g., PythonLogHandler, PythonTransHandler) are present. I've tested this in both the release version (314) and version 325b (albeit more thoroughly in the former). Here's how I triggered the bug: Add this to apache2.conf: PythonPath sys.path+[(the path goes here)] PythonTransHandler crashbug crashbug.py contains: from mod_python import apache def transhandler(req): req.add_handler(PythonLogHandler,foo::loghandler) return apache.OK foo contains: from mod_python import apache def loghandler(req): import sys sys.stderr.write(foo: saw request type: + req.method + \n) return apache.OK If I run apache sudo /usr/sbin/apache2 -X and tickle the server: echo -e GET / HTTP/1.0\n\n | nc localhost 8080 ...apache segfaults. On the other hand, if I add the line PythonLogHandler notfoo to apache2.conf, where notfoo.py is identical to foo except that foo: is replaced with notfoo, I get the expected two lines in the log, and no crash. Here's the trace: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1214269216 (LWP 21283)] 0xb76dddb3 in hlist_copy (p=0x8158fa8, hle=0x0) at hlist.c:91 91 head-handler = apr_pstrdup(p, hle-handler); (gdb) bt #0 0xb76dddb3 in hlist_copy (p=0x8158fa8, hle=0x0) at hlist.c:91 #1 0xb76dd9aa in MpHList_FromHLEntry (hle=0x0) at hlistobject.c:49 #2 0xb76e59b6 in python_handler (req=0x8238480, phase=0xb76e804c PythonLogHandler) at mod_python.c:1029 #3 0x08086086 in ap_run_log_transaction () #4 0x0806963e in ap_process_request () #5 0x08064b19 in _start () I suspect the problem is in this part of mod_python.c: Here's my understanding (please correct me if I'm wrong). If hle is NULL (i.e., there were no handlers of this type registered in the apache conf), but dynhle is not null (we registered a dynamic handler), we continue processing... if (! (hle || dynhle)) { /* nothing to do here */ return DECLINED; } /* determine interpreter to use */ interp_name = select_interp_name(req, NULL, conf, hle, NULL, 0); /* get/create interpreter */ idata = get_interpreter(interp_name, req-server); if (!idata) return HTTP_INTERNAL_SERVER_ERROR; /* create/acquire request object */ request_obj = get_request_object(req, interp_name, phase); /* remember the extension if any. used by publisher */ if (ext) request_obj-extension = apr_pstrdup(req-pool, ext); But then, we try to create an MpHList from hle, which is NULL -- prompting a crash a few levels down. /* create a hahdler list object */ request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle); Here's how I fixed it: --- /home/jdb/dl/mpsvn/mod_python-3.1.4/src/mod_python.c2005-01-29 16:25:28.0 -0500 +++ mod_python.c2006-01-04 16:59:13.0 -0500 @@ -1026,13 +1026,18 @@ request_obj-extension = apr_pstrdup(req-pool, ext); /* create a hahdler list object */ -request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle); - -/* add dynamically registered handlers, if any */ -if (dynhle) { +if (hle) { + request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle); + /* add dynamically registered handlers, if any */ + if (dynhle) { MpHList_Append(request_obj-hlo, dynhle); + } } - +else { + /* hle was NULL; so create hlo from dynhle */ + request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(dynhle); +} + /* * Here is where we call into Python! * This is the C equivalent of AFAIK, this works -- but I'm new at this. best regards, Joe
Re: BUG?: Segfault with add_handler
You are a few weeks too late. :-) See: http://issues.apache.org/jira/browse/MODPYTHON-98 There are whole bunch of little issues with adding handlers using req.add_handler(). For this one you seem to have tread much the same path as I did. Graham Joseph Barillari wrote .. Hi, So far, I really like mod_python -- it's a great piece of software. But I think I've found a small bug. I've noticed that adding a handler using add_handler prompts apache to segfault if no statically registered (e.g., specified in the apache conf files) handlers of the same category (e.g., PythonLogHandler, PythonTransHandler) are present. I've tested this in both the release version (314) and version 325b (albeit more thoroughly in the former). Here's how I triggered the bug: Add this to apache2.conf: PythonPath sys.path+[(the path goes here)] PythonTransHandler crashbug crashbug.py contains: from mod_python import apache def transhandler(req): req.add_handler(PythonLogHandler,foo::loghandler) return apache.OK foo contains: from mod_python import apache def loghandler(req): import sys sys.stderr.write(foo: saw request type: + req.method + \n) return apache.OK If I run apache sudo /usr/sbin/apache2 -X and tickle the server: echo -e GET / HTTP/1.0\n\n | nc localhost 8080 ...apache segfaults. On the other hand, if I add the line PythonLogHandler notfoo to apache2.conf, where notfoo.py is identical to foo except that foo: is replaced with notfoo, I get the expected two lines in the log, and no crash. Here's the trace: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1214269216 (LWP 21283)] 0xb76dddb3 in hlist_copy (p=0x8158fa8, hle=0x0) at hlist.c:91 91 head-handler = apr_pstrdup(p, hle-handler); (gdb) bt #0 0xb76dddb3 in hlist_copy (p=0x8158fa8, hle=0x0) at hlist.c:91 #1 0xb76dd9aa in MpHList_FromHLEntry (hle=0x0) at hlistobject.c:49 #2 0xb76e59b6 in python_handler (req=0x8238480, phase=0xb76e804c PythonLogHandler) at mod_python.c:1029 #3 0x08086086 in ap_run_log_transaction () #4 0x0806963e in ap_process_request () #5 0x08064b19 in _start () I suspect the problem is in this part of mod_python.c: Here's my understanding (please correct me if I'm wrong). If hle is NULL (i.e., there were no handlers of this type registered in the apache conf), but dynhle is not null (we registered a dynamic handler), we continue processing... if (! (hle || dynhle)) { /* nothing to do here */ return DECLINED; } /* determine interpreter to use */ interp_name = select_interp_name(req, NULL, conf, hle, NULL, 0); /* get/create interpreter */ idata = get_interpreter(interp_name, req-server); if (!idata) return HTTP_INTERNAL_SERVER_ERROR; /* create/acquire request object */ request_obj = get_request_object(req, interp_name, phase); /* remember the extension if any. used by publisher */ if (ext) request_obj-extension = apr_pstrdup(req-pool, ext); But then, we try to create an MpHList from hle, which is NULL -- prompting a crash a few levels down. /* create a hahdler list object */ request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle); Here's how I fixed it: --- /home/jdb/dl/mpsvn/mod_python-3.1.4/src/mod_python.c 2005-01-29 16:25:28.0 -0500 +++ mod_python.c 2006-01-04 16:59:13.0 -0500 @@ -1026,13 +1026,18 @@ request_obj-extension = apr_pstrdup(req-pool, ext); /* create a hahdler list object */ -request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle); - -/* add dynamically registered handlers, if any */ -if (dynhle) { +if (hle) { + request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle); + /* add dynamically registered handlers, if any */ + if (dynhle) { MpHList_Append(request_obj-hlo, dynhle); + } } - +else { + /* hle was NULL; so create hlo from dynhle */ + request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(dynhle); +} + /* * Here is where we call into Python! * This is the C equivalent of AFAIK, this works -- but I'm new at this. best regards, Joe
Re: [jira] Created: (MODPYTHON-105) mod_python.publisher should notdiscard content for HEAD request.
On Wed, 4 Jan 2006, Graham Dumpleton wrote: Either way, we agree that mod_python.publisher should still output content for HEAD. Yep. I would also propose as a change that the req.write() call not cause output to be flushed to allow an output filter like CONTENT_LENGTH to be used. Hmmm... This needs some more research. i.e. I don't quite completely understand why the CONTENT_LENGTH filter can only count bytes when there is no flush() (shortcoming of CONTENT_LENGTH or an impossibility?)... Implicit buffering would be a significant change - if someone used req.write() to generate dynamic content (e.g. output from one of those traceroute sites being sent in real time), they'd be very surprised to not see the output anymore. CGI I think flushes implicitely at every end of line which where this behaviour comes from... On the other hand implicit flush slows things down tremenduosly when you have lots of req.write()s, this was discovered when PSP was added and that's when the no-flush argument was introduced, so if backwards compatibility was of no concern, I'd make implicit buffering (i.e. no-flush) default. I'll add a new JIRA issue for that. Yep, totally. And thanks for all your help BTW :-) Grisha
Re: [jira] Created: (MODPYTHON-107) mod_python.publisher shouldn't flush result when written.
On Wed, 4 Jan 2006, Graham Dumpleton (JIRA) wrote: This makes one wander if there should be a configurable option for mod_python.psp to tell it not to flush output as well so that CONTENT_LENGTH could be used in that case as well. ??? I kinda mentioned this in the previous e-mail - PSP always does not flush output, in fact that 0/1 argument was added to req.write() just for PSP. So for pure PSP pages, content-length _should_ work (haven't tested it). Grisha
Re: [jira] Created: (MODPYTHON-105) mod_python.publisher shouldnotdiscard content for HEAD request.
Grisha wrote .. On Wed, 4 Jan 2006, Graham Dumpleton wrote: Either way, we agree that mod_python.publisher should still output content for HEAD. Yep. I would also propose as a change that the req.write() call not cause output to be flushed to allow an output filter like CONTENT_LENGTH to be used. Hmmm... This needs some more research. i.e. I don't quite completely understand why the CONTENT_LENGTH filter can only count bytes when there is no flush() (shortcoming of CONTENT_LENGTH or an impossibility?)... The content length header has to be added before CONTENT_LENGTH tries to write output from the first call into it. Thus if it doesn't detect end of content in that first call it can't add the content length header. It will not buffer input itself and has to write out data before it returns from each call. Implicit buffering would be a significant change - if someone used req.write() to generate dynamic content (e.g. output from one of those traceroute sites being sent in real time), they'd be very surprised to not see the output anymore. CGI I think flushes implicitely at every end of line which where this behaviour comes from... On the other hand implicit flush slows things down tremenduosly when you have lots of req.write()s, this was discovered when PSP was added and that's when the no-flush argument was introduced, so if backwards compatibility was of no concern, I'd make implicit buffering (i.e. no-flush) default. Not suggesting changing req.write() default, purely that one call of it in mod_python.publisher(). In the case where the only content is that returned by the publisher function, the full content is held at that point in a single string and there will be no more later on. Thus no harm in that one case of telling req.write() not to flush, since the next thing mod_python.publisher will do is return apache.OK anyway, which will cause it t be flushed anyway, but with the side effect that any output filter will be able to detect end of content in the one call. As to PSP flushing, you are right. I looked at the code, saw stuff like: STATIC_STR(req.write(\\\) and misinterpreted it since I didn't see a 0 at that point. Didn't realise it was the start of a multiline string with later: STATIC_STR(\\\,0); And thanks for all your help BTW :-) I thought at this point you would be getting sick of me finding all this bugs and suggested improvements. I have managed to delay you getting version 3.2 final out now a few times. Anyway, all an interesting exercise digging into all of this. Maybe one day I'll know enough about it to contemplate writing a book. ;-) Graham