Re: [jira] Updated: (MODPYTHON-106) PythonAutoReload directive can't be turned off in config.

2006-01-04 Thread Gregory (Grisha) Trubetskoy


On Mon, 2 Jan 2006, Graham Dumpleton (JIRA) wrote:


Should this be fixed in 3.2, given that it probably hasn't work in 3.0 and 3.1?


I think it would be a good idea. Hopefully this is the last fix before we 
roll it out.


Grisha


Re: [jira] Created: (MODPYTHON-105) mod_python.publisher should not discard content for HEAD request.

2006-01-04 Thread Gregory (Grisha) Trubetskoy


On Mon, 2 Jan 2006, Graham Dumpleton (JIRA) wrote:


In addressing MODPYTHON-71, mod_python.publisher code was changed to read:

   if req.method!='HEAD':
   req.write(result)

This change should not really have been made and it should be changed 
back to what was there before, ie., just:


   req.write(result)


[...]

As an an example of an Apache module that uses output filters to do 
stuff, there is mod_cache. Luckily in that case, a HEAD request is one 
of various cases where mod_cache decides it will not use the output. 
This does not mean though that some other output filter that someone is 
using might expect content to be there for HEAD.


I am not sure I agree with this explanation (while I do agree that the 
behaviour should be reverted, but for different reason, see below). The 
RFC is pretty clear on HEAD: The HEAD method is identical to GET except 
that the server MUST NOT return a message-body in the response., so for a 
filter (or anything) to expect the content with HEAD is wrong.


In summary, one could also say that if a user wants to not output 
anything for a HEAD request, that is there decision, but 
mod_python.publisher should not be making that decision for them.


May be not for the publisher, but given the MUST NOT it should be OK for 
either httpd or mod_python to do this. If httpd does not do it, then 
perhaps it should be made part of req.write()? It may be a good idea to 
check the httpd-dev archives to see if the issue of HEAD has been 
discussed in the past.


In any event, I think the behaviour should be reverted to ignore HEAD for 
now simply because it is a half-ass solution given that req.write() gets 
through, especially because PSP templates rely on req.write() primarily.


I think for 3.2 we can just leave it at that, but for 3.3 to seriously 
consider whether mod_python should chop output for HEAD requests.


Also, I think a link to this JIRA issue as a comment somewhere in the code 
would be great or someone down the road will repeat this whole HEAD thing 
again.


Grisha


Re: [jira] Created: (MODPYTHON-105) mod_python.publisher should not discard content for HEAD request.

2006-01-04 Thread Gregory (Grisha) Trubetskoy



On Wed, 4 Jan 2006, Gregory (Grisha) Trubetskoy wrote:

It may be a good idea to check the httpd-dev archives to see if the 
issue of HEAD has been discussed in the past.


Here's a relevant bit of info from the httpd-dev archives:


* All handlers should always send content down even if r-header_only
  is set.  If not, it means that the HEAD requests don't generate the
  same headers as a GET which is wrong.


Grisha



Re: [jira] Created: (MODPYTHON-105) mod_python.publisher should notdiscard content for HEAD request.

2006-01-04 Thread Graham Dumpleton
Grisha wrote ..
  As an an example of an Apache module that uses output filters to do 
  stuff, there is mod_cache. Luckily in that case, a HEAD request is one
  of various cases where mod_cache decides it will not use the output.
  This does not mean though that some other output filter that someone
 is 
  using might expect content to be there for HEAD.
 
 I am not sure I agree with this explanation (while I do agree that the
 behaviour should be reverted, but for different reason, see below). The
 RFC is pretty clear on HEAD: The HEAD method is identical to GET except
 that the server MUST NOT return a message-body in the response., so for
 a  filter (or anything) to expect the content with HEAD is wrong.

That is ignoring what filters can do, including the fact that they can
add additional output headers. If a filter can add additional headers,
that it doesn't receive content can be a problem. This is because the
additional headers it adds may be in some way be dependent on the
content.

In effect this is partly what your quote from the http-dev list is about.

* All handlers should always send content down even if r-header_only
   is set.  If not, it means that the HEAD requests don't generate the
   same headers as a GET which is wrong.

I'll grant that I may be partly talking theoretical cases here, but the inbuilt
Apache CONTENT_LENGTH output filter can be used to add the content
length to a response, provided the output filter is able to consume the
whole content the first time it is called.

Here is a case whereby if mod_python.publisher didn't return the
content for a HEAD request, that the content length thereby calculated
by the CONTENT_LENGTH output filter would be different for the
HEAD request compared to the same request as a GET.

Now if I can just work out why when I add CONTENT_LENGTH as an
output filter in conjunction with mod_python.publisher that I don't get
a content length at all, I might have a leg to stand on. :-)

I note that if I have a Python based output filter, that a single req.write()
generates two calls into the output filter. The first ends because of read
returning an empty string, with the second reading None straight away.

[Thu Jan 05 08:49:00 2006] [error] handler_1

[Thu Jan 05 08:49:00 2006] [error] uppercase
[Thu Jan 05 08:49:00 2006] [error] is_input 0
[Thu Jan 05 08:49:00 2006] [error] read 'CONTENT'
[Thu Jan 05 08:49:00 2006] [error] write 'CONTENT'
[Thu Jan 05 08:49:00 2006] [error] read ''

[Thu Jan 05 08:49:00 2006] [error] uppercase
[Thu Jan 05 08:49:00 2006] [error] is_input 0
[Thu Jan 05 08:49:00 2006] [error] read None
[Thu Jan 05 08:49:00 2006] [error] close

For CONTENT_LENGTH to work, I would image it should be getting the
equivalent of the None read at the end of the first call into the output
filter.

I wander if this is somehow specifically to do with how mod_python
manages the bucket brigade. It would be unfortunate if mod_python
is doing something a bit strange which would prevent the
CONTENT_LENGTH output filter being used.

Wait, I worked it out. Two calls are being seen into the output filter
because req.write() is not being told not to flush the output. Thus, if
req.write() is called as:

  req.write(result,0)

it all works and content length is added by the CONTENT_LENGTH
output filter.

Thus with that change, one can see the problem with not outputing
content when HEAD is used. Namely, no content length header
generated.

~ [505]$ telnet localhost 8080
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /~grahamd/content_length/example.py HTTP/1.0

HTTP/1.1 200 OK
Date: Wed, 04 Jan 2006 21:59:39 GMT
Server: Apache/2.0.55 (Unix) mod_python/3.2.6-dev-20051229 Python/2.3
Content-Length: 7
Connection: close
Content-Type: text/plain

CONTENTConnection closed by foreign host.


~ [506]$ telnet localhost 8080
Trying ::1...
Connected to localhost.
Escape character is '^]'.
HEAD /~grahamd/content_length/example.py HTTP/1.0

HTTP/1.1 200 OK
Date: Wed, 04 Jan 2006 21:59:53 GMT
Server: Apache/2.0.55 (Unix) mod_python/3.2.6-dev-20051229 Python/2.3
Connection: close
Content-Type: text/plain

Connection closed by foreign host.

Either way, we agree that mod_python.publisher should still output
content for HEAD.

I would also propose as a change that the req.write() call not cause
output to be flushed to allow an output filter like CONTENT_LENGTH
to be used. I'll add a new JIRA issue for that.

Graham


BUG?: Segfault with add_handler

2006-01-04 Thread Joseph Barillari
Hi,

So far, I really like mod_python -- it's a great piece of
software. But I think I've found a small bug.

I've noticed that adding a handler using add_handler prompts apache to
segfault if no statically registered (e.g., specified in the apache
conf files) handlers of the same category (e.g., PythonLogHandler,
PythonTransHandler) are present. I've tested this in both the release
version (314) and version 325b (albeit more thoroughly in the former).

Here's how I triggered the bug:

Add this to apache2.conf:
PythonPath sys.path+[(the path goes here)]
PythonTransHandler crashbug

crashbug.py contains:

from mod_python import apache

def transhandler(req):
req.add_handler(PythonLogHandler,foo::loghandler)
return apache.OK

foo contains:

from mod_python import apache

def loghandler(req):
import sys
sys.stderr.write(foo: saw request type: + req.method + \n)
return apache.OK

If I run apache

sudo   /usr/sbin/apache2 -X

and tickle the server:

echo -e GET / HTTP/1.0\n\n | nc localhost 8080

...apache segfaults.

On the other hand, if I add the line 

PythonLogHandler notfoo

to apache2.conf, where notfoo.py is identical to foo except that
foo: is replaced with notfoo, I get the expected two lines in the
log, and no crash.

Here's the trace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1214269216 (LWP 21283)]
0xb76dddb3 in hlist_copy (p=0x8158fa8, hle=0x0) at hlist.c:91
91  head-handler = apr_pstrdup(p, hle-handler);
(gdb) bt
#0  0xb76dddb3 in hlist_copy (p=0x8158fa8, hle=0x0) at hlist.c:91
#1  0xb76dd9aa in MpHList_FromHLEntry (hle=0x0) at hlistobject.c:49
#2  0xb76e59b6 in python_handler (req=0x8238480, phase=0xb76e804c 
PythonLogHandler) at mod_python.c:1029
#3  0x08086086 in ap_run_log_transaction ()
#4  0x0806963e in ap_process_request ()
#5  0x08064b19 in _start ()


I suspect the problem is in this part of mod_python.c:

Here's my understanding (please correct me if I'm wrong).  If hle is
NULL (i.e., there were no handlers of this type registered in the
apache conf), but dynhle is not null (we registered a dynamic
handler), we continue processing...

if (! (hle || dynhle)) {
/* nothing to do here */
return DECLINED;
}

/* determine interpreter to use */
interp_name = select_interp_name(req, NULL, conf, hle, NULL, 0);

/* get/create interpreter */
idata = get_interpreter(interp_name, req-server);

if (!idata)
return HTTP_INTERNAL_SERVER_ERROR;

/* create/acquire request object */
request_obj = get_request_object(req, interp_name, phase);

/* remember the extension if any. used by publisher */
if (ext) 
request_obj-extension = apr_pstrdup(req-pool, ext);

But then, we try to create an MpHList from hle, which is NULL -- prompting a 
crash a few levels down.

/* create a hahdler list object */
request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle);


Here's how I fixed it:

--- /home/jdb/dl/mpsvn/mod_python-3.1.4/src/mod_python.c2005-01-29 
16:25:28.0 -0500
+++ mod_python.c2006-01-04 16:59:13.0 -0500
@@ -1026,13 +1026,18 @@
 request_obj-extension = apr_pstrdup(req-pool, ext);
 
 /* create a hahdler list object */
-request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle);
-
-/* add dynamically registered handlers, if any */
-if (dynhle) {
+if (hle) {
+  request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle);
+  /* add dynamically registered handlers, if any */
+  if (dynhle) {
 MpHList_Append(request_obj-hlo, dynhle);
+  }
 }
-
+else {
+  /* hle was NULL; so create hlo from dynhle */
+  request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(dynhle);
+}
+
 /* 
  * Here is where we call into Python!
  * This is the C equivalent of

AFAIK, this works -- but I'm new at this.

best regards,

Joe


Re: BUG?: Segfault with add_handler

2006-01-04 Thread Graham Dumpleton
You are a few weeks too late. :-)

See:

  http://issues.apache.org/jira/browse/MODPYTHON-98

There are whole bunch of little issues with adding handlers using
req.add_handler(). For this one you seem to have tread much the
same path as I did.

Graham

Joseph Barillari wrote ..
 Hi,
 
 So far, I really like mod_python -- it's a great piece of
 software. But I think I've found a small bug.
 
 I've noticed that adding a handler using add_handler prompts apache to
 segfault if no statically registered (e.g., specified in the apache
 conf files) handlers of the same category (e.g., PythonLogHandler,
 PythonTransHandler) are present. I've tested this in both the release
 version (314) and version 325b (albeit more thoroughly in the former).
 
 Here's how I triggered the bug:
 
 Add this to apache2.conf:
 PythonPath sys.path+[(the path goes here)]
 PythonTransHandler crashbug
 
 crashbug.py contains:
 
 from mod_python import apache
 
 def transhandler(req):
 req.add_handler(PythonLogHandler,foo::loghandler)
 return apache.OK
 
 foo contains:
 
 from mod_python import apache
 
 def loghandler(req):
 import sys
 sys.stderr.write(foo: saw request type: + req.method + \n)
 return apache.OK
 
 If I run apache
 
 sudo   /usr/sbin/apache2 -X
 
 and tickle the server:
 
 echo -e GET / HTTP/1.0\n\n | nc localhost 8080
 
 ...apache segfaults.
 
 On the other hand, if I add the line 
 
 PythonLogHandler notfoo
 
 to apache2.conf, where notfoo.py is identical to foo except that
 foo: is replaced with notfoo, I get the expected two lines in the
 log, and no crash.
 
 Here's the trace:
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread -1214269216 (LWP 21283)]
 0xb76dddb3 in hlist_copy (p=0x8158fa8, hle=0x0) at hlist.c:91
 91  head-handler = apr_pstrdup(p, hle-handler);
 (gdb) bt
 #0  0xb76dddb3 in hlist_copy (p=0x8158fa8, hle=0x0) at hlist.c:91
 #1  0xb76dd9aa in MpHList_FromHLEntry (hle=0x0) at hlistobject.c:49
 #2  0xb76e59b6 in python_handler (req=0x8238480, phase=0xb76e804c 
 PythonLogHandler)
 at mod_python.c:1029
 #3  0x08086086 in ap_run_log_transaction ()
 #4  0x0806963e in ap_process_request ()
 #5  0x08064b19 in _start ()
 
 
 I suspect the problem is in this part of mod_python.c:
 
 Here's my understanding (please correct me if I'm wrong).  If hle is
 NULL (i.e., there were no handlers of this type registered in the
 apache conf), but dynhle is not null (we registered a dynamic
 handler), we continue processing...
 
 if (! (hle || dynhle)) {
 /* nothing to do here */
 return DECLINED;
 }
 
 /* determine interpreter to use */
 interp_name = select_interp_name(req, NULL, conf, hle, NULL, 0);
 
 /* get/create interpreter */
 idata = get_interpreter(interp_name, req-server);
 
 if (!idata)
 return HTTP_INTERNAL_SERVER_ERROR;
 
 /* create/acquire request object */
 request_obj = get_request_object(req, interp_name, phase);
 
 /* remember the extension if any. used by publisher */
 if (ext) 
 request_obj-extension = apr_pstrdup(req-pool, ext);
 
 But then, we try to create an MpHList from hle, which is NULL -- prompting
 a crash a few levels down.
 
 /* create a hahdler list object */
 request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle);
 
 
 Here's how I fixed it:
 
 --- /home/jdb/dl/mpsvn/mod_python-3.1.4/src/mod_python.c  2005-01-29 
 16:25:28.0
 -0500
 +++ mod_python.c  2006-01-04 16:59:13.0 -0500
 @@ -1026,13 +1026,18 @@
  request_obj-extension = apr_pstrdup(req-pool, ext);
  
  /* create a hahdler list object */
 -request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle);
 -
 -/* add dynamically registered handlers, if any */
 -if (dynhle) {
 +if (hle) {
 +  request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(hle);
 +  /* add dynamically registered handlers, if any */
 +  if (dynhle) {
  MpHList_Append(request_obj-hlo, dynhle);
 +  }
  }
 -
 +else {
 +  /* hle was NULL; so create hlo from dynhle */
 +  request_obj-hlo = (hlistobject *)MpHList_FromHLEntry(dynhle);
 +}
 +
  /* 
   * Here is where we call into Python!
   * This is the C equivalent of
 
 AFAIK, this works -- but I'm new at this.
 
 best regards,
 
 Joe


Re: [jira] Created: (MODPYTHON-105) mod_python.publisher should notdiscard content for HEAD request.

2006-01-04 Thread Gregory (Grisha) Trubetskoy


On Wed, 4 Jan 2006, Graham Dumpleton wrote:


Either way, we agree that mod_python.publisher should still output
content for HEAD.


Yep.


I would also propose as a change that the req.write() call not cause
output to be flushed to allow an output filter like CONTENT_LENGTH
to be used.


Hmmm... This needs some more research. i.e. I don't quite completely 
understand why the CONTENT_LENGTH filter can only count bytes when there 
is no flush() (shortcoming of CONTENT_LENGTH or an impossibility?)...


Implicit buffering would be a significant change - if someone used 
req.write() to generate dynamic content (e.g. output from one of those 
traceroute sites being sent in real time), they'd be very surprised to not 
see the output anymore. CGI I think flushes implicitely at every end of 
line which where this behaviour comes from...


On the other hand implicit flush slows things down tremenduosly when you 
have lots of req.write()s, this was discovered when PSP was added and 
that's when the no-flush argument was introduced, so if backwards 
compatibility was of no concern, I'd make implicit buffering (i.e. 
no-flush) default.



I'll add a new JIRA issue for that.


Yep, totally.

And thanks for all your help BTW :-)

Grisha


Re: [jira] Created: (MODPYTHON-107) mod_python.publisher shouldn't flush result when written.

2006-01-04 Thread Gregory (Grisha) Trubetskoy


On Wed, 4 Jan 2006, Graham Dumpleton (JIRA) wrote:

This makes one wander if there should be a configurable option for 
mod_python.psp to tell it not to flush output as well so that 
CONTENT_LENGTH could be used in that case as well. ???


I kinda mentioned this in the previous e-mail - PSP always does not flush 
output, in fact that 0/1 argument was added to req.write() just for PSP. 
So for pure PSP pages, content-length _should_ work (haven't tested it).


Grisha


Re: [jira] Created: (MODPYTHON-105) mod_python.publisher shouldnotdiscard content for HEAD request.

2006-01-04 Thread Graham Dumpleton
Grisha wrote ..
 
 On Wed, 4 Jan 2006, Graham Dumpleton wrote:
 
  Either way, we agree that mod_python.publisher should still output
  content for HEAD.
 
 Yep.
 
  I would also propose as a change that the req.write() call not cause
  output to be flushed to allow an output filter like CONTENT_LENGTH
  to be used.
 
 Hmmm... This needs some more research. i.e. I don't quite completely 
 understand why the CONTENT_LENGTH filter can only count bytes when there
 is no flush() (shortcoming of CONTENT_LENGTH or an impossibility?)...

The content length header has to be added before CONTENT_LENGTH
tries to write output from the first call into it. Thus if it doesn't detect
end of content in that first call it can't add the content length header.
It will not buffer input itself and has to write out data before it returns
from each call.

 Implicit buffering would be a significant change - if someone used 
 req.write() to generate dynamic content (e.g. output from one of those
 traceroute sites being sent in real time), they'd be very surprised to
 not 
 see the output anymore. CGI I think flushes implicitely at every end of
 line which where this behaviour comes from...
 
 On the other hand implicit flush slows things down tremenduosly when you
 have lots of req.write()s, this was discovered when PSP was added and 
 that's when the no-flush argument was introduced, so if backwards 
 compatibility was of no concern, I'd make implicit buffering (i.e. 
 no-flush) default.

Not suggesting changing req.write() default, purely that one call of it
in mod_python.publisher(). In the case where the only content is that
returned by the publisher function, the full content is held at that point
in a single string and there will be no more later on. Thus no harm in
that one case of telling req.write() not to flush, since the next thing
mod_python.publisher will do is return apache.OK anyway, which will
cause it t be flushed anyway, but with the side effect that any output
filter will be able to detect end of content in the one call.

As to PSP flushing, you are right. I looked at the code, saw stuff like:

  STATIC_STR(req.write(\\\)

and misinterpreted it since I didn't see a 0 at that point. Didn't
realise it was the start of a multiline string with later:

  STATIC_STR(\\\,0);

 And thanks for all your help BTW :-)

I thought at this point you would be getting sick of me finding all this
bugs and suggested improvements. I have managed to delay you getting
version 3.2 final out now a few times.

Anyway, all an interesting exercise digging into all of this. Maybe one
day I'll know enough about it to contemplate writing a book. ;-)

Graham