
Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-31 Thread Jim Gallacher


Jim Gallacher wrote:

Volodya wrote:


On Mon, Jan 30, 2006 at 09:40:39PM -0500, Graham Dumpleton wrote:


Graham Dumpleton wrote ..


Extending the above code as:

   Py_BEGIN_ALLOW_THREADS;
   rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
   Py_END_ALLOW_THREADS;

   if (! APR_STATUS_IS_SUCCESS(rc)) {
       PyErr_SetObject(PyExc_IOError,
                       PyString_FromString("Connection read error"));
       return NULL;
   }

   /* Return empty string if no buckets. Can be caused by EAGAIN. */
   if (APR_BRIGADE_EMPTY(bb)) {
       return PyString_FromString("");
   }

seems to fix the problem. I.e., use a call to APR_BRIGADE_EMPTY(bb) to check
whether any new buckets were added, returning an empty string if not.



Okay, this may work, but the EAGAIN propagating back up as an empty
string to Python can cause a tight loop to occur where calls are going
out and back into Python code. This will occur until something is read
or an error occurs.

To avoid the back and forth, another option may be:

   while (APR_BRIGADE_EMPTY(bb)) {
       Py_BEGIN_ALLOW_THREADS;
       rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
       Py_END_ALLOW_THREADS;

       if (! APR_STATUS_IS_SUCCESS(rc)) {
           PyErr_SetObject(PyExc_IOError,
                           PyString_FromString("Connection read error"));
           return NULL;
       }
   }




Graham,

this code runs smoothly, i.e. no segfaults, all tests passed:
FreeBSD 4.9:



That's good news. I still wonder why we are seeing this problem in 3.2 
and 3.1.4 though.


And what I meant to say was "and *NOT* in 3.1.4".

Jim

Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-31 Thread Jim Gallacher


Volodya wrote:

On Mon, Jan 30, 2006 at 09:40:39PM -0500, Graham Dumpleton wrote:


Graham Dumpleton wrote ..


Extending the above code as:

   Py_BEGIN_ALLOW_THREADS;
   rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
   Py_END_ALLOW_THREADS;

   if (! APR_STATUS_IS_SUCCESS(rc)) {
       PyErr_SetObject(PyExc_IOError,
                       PyString_FromString("Connection read error"));
       return NULL;
   }

   /* Return empty string if no buckets. Can be caused by EAGAIN. */
   if (APR_BRIGADE_EMPTY(bb)) {
       return PyString_FromString("");
   }

seems to fix the problem. I.e., use a call to APR_BRIGADE_EMPTY(bb) to check
whether any new buckets were added, returning an empty string if not.


Okay, this may work, but the EAGAIN propagating back up as an empty
string to Python can cause a tight loop to occur where calls are going
out and back into Python code. This will occur until something is read
or an error occurs.

To avoid the back and forth, another option may be:

   while (APR_BRIGADE_EMPTY(bb)) {
       Py_BEGIN_ALLOW_THREADS;
       rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
       Py_END_ALLOW_THREADS;

       if (! APR_STATUS_IS_SUCCESS(rc)) {
           PyErr_SetObject(PyExc_IOError,
                           PyString_FromString("Connection read error"));
           return NULL;
       }
   }




Graham,

this code runs smoothly, i.e. no segfaults, all tests passed:
FreeBSD 4.9:


That's good news. I still wonder why we are seeing this problem in 3.2 
and 3.1.4 though.


Jim

Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-31 Thread Volodya

On Mon, Jan 30, 2006 at 09:40:39PM -0500, Graham Dumpleton wrote:
> Graham Dumpleton wrote ..
> > Extending the above code as:
> > 
> >     Py_BEGIN_ALLOW_THREADS;
> >     rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
> >     Py_END_ALLOW_THREADS;
> > 
> >     if (! APR_STATUS_IS_SUCCESS(rc)) {
> >         PyErr_SetObject(PyExc_IOError,
> >                         PyString_FromString("Connection read error"));
> >         return NULL;
> >     }
> > 
> >     /* Return empty string if no buckets. Can be caused by EAGAIN. */
> >     if (APR_BRIGADE_EMPTY(bb)) {
> >         return PyString_FromString("");
> >     }
> > 
> > seems to fix the problem. I.e., use a call to APR_BRIGADE_EMPTY(bb) to check
> > whether any new buckets were added, returning an empty string if not.
> 
> Okay, this may work, but the EAGAIN propagating back up as an empty
> string to Python can cause a tight loop to occur where calls are going
> out and back into Python code. This will occur until something is read
> or an error occurs.
> 
> To avoid the back and forth, another option may be:
> 
>     while (APR_BRIGADE_EMPTY(bb)) {
>         Py_BEGIN_ALLOW_THREADS;
>         rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
>         Py_END_ALLOW_THREADS;
> 
>         if (! APR_STATUS_IS_SUCCESS(rc)) {
>             PyErr_SetObject(PyExc_IOError,
>                             PyString_FromString("Connection read error"));
>             return NULL;
>         }
>     }
> 

Graham,

this code runs smoothly, i.e. no segfaults, all tests passed:
FreeBSD 4.9:

  Apache/2.0.50 (prefork) Python/2.3.4
  Apache/2.0.55 (prefork) Python/2.4.2

Thanks!

Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-30 Thread Jim Gallacher


Graham Dumpleton wrote:

Graham Dumpleton wrote ..


Returning back up to _conn_read() in the mod_python source code, we have
the call to ap_get_brigade() through which core_input_filter() was invoked:

   Py_BEGIN_ALLOW_THREADS;
   rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
   Py_END_ALLOW_THREADS;

   if (! APR_STATUS_IS_SUCCESS(rc)) {
       PyErr_SetObject(PyExc_IOError,
                       PyString_FromString("Connection read error"));
       return NULL;
   }

Since APR_SUCCESS was returned and assigned to "rc", no problem is detected.

The code which follows then assumes that the first bucket in the bucket
brigade actually contains valid data, when in fact the first bucket is actually
crap as nothing was done to set up a valid bucket since EAGAIN was returned.
As a consequence it crashes.

Thus in summary, _conn_read() doesn't cater in any way for the possibility
that the initial socket read may have failed because of EAGAIN and thus
the bucket is bogus. The problem is, how is it meant to know this if the
value APR_SUCCESS is returned by ap_get_brigade()?



Extending the above code as:

    Py_BEGIN_ALLOW_THREADS;
    rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
    Py_END_ALLOW_THREADS;

    if (! APR_STATUS_IS_SUCCESS(rc)) {
        PyErr_SetObject(PyExc_IOError,
                        PyString_FromString("Connection read error"));
        return NULL;
    }

    /* Return empty string if no buckets. Can be caused by EAGAIN. */
    if (APR_BRIGADE_EMPTY(bb)) {
        return PyString_FromString("");
    }

seems to fix the problem. I.e., use a call to APR_BRIGADE_EMPTY(bb) to check
whether any new buckets were added, returning an empty string if not.

Can someone else seeing this issue try this fix and see if the tests then
work?


Note that APR_STATUS_IS_SUCCESS has been removed from apr 1.x, which is 
one of the issues in getting mod_python to run in Apache 2.2. It looks 
like we should just check if rc != 0. This is according to discussion 
here featuring Greg Stein and Ryan Bloom:

http://www.mail-archive.com/dev@httpd.apache.org/msg21757.html

I'll update MODPYTHON-78 regarding Apache 2.2 with details on this and 
apr_sockaddr_port_get which has also been removed in apr 1.x.
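
One possible way to keep the existing checks compiling against both APR 0.9 and APR 1.x is a small compatibility macro that is only defined when APR no longer supplies it. The following is a minimal stand-alone sketch, not the change that went into mod_python: apr_status_t and APR_SUCCESS are redefined locally as stand-ins so the fragment compiles without APR (real code would take them from apr_errno.h), and read_ok() is a hypothetical helper used only for illustration.

```c
#include <assert.h>

/* Stand-ins so the sketch compiles without APR; real code would
 * include apr_errno.h instead of these two definitions. */
typedef int apr_status_t;
#define APR_SUCCESS 0

/* APR 1.x no longer provides APR_STATUS_IS_SUCCESS, so define it
 * only when it is missing. */
#ifndef APR_STATUS_IS_SUCCESS
#define APR_STATUS_IS_SUCCESS(s) ((s) == APR_SUCCESS)
#endif

/* Hypothetical helper: nonzero when rc reports success. */
static int read_ok(apr_status_t rc)
{
    return APR_STATUS_IS_SUCCESS(rc);
}
```

Comparing rc directly against APR_SUCCESS at each call site would work just as well; the macro only avoids touching every existing check.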


Jim

Re: Segfaults in ConnectionHander

2006-01-30 Thread Jim Gallacher


Jim Gallacher wrote:

Graham Dumpleton wrote:


What I might speculate is that if the test in mod_python for the
connection handler is setup to run on a secondary listener port,
but with the primary still active, that it may trigger the problem
on other systems like Linux. Jim, you might want to try this and see
if you can duplicate it on Linux.



I'll try it tonight.



Graham,

I am not able to reproduce the problem using the configuration and 
example code you give in MODPYTHON-102. (Linux Debian 2.6.12-1-k7 kernel).


Jim

Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-30 Thread Graham Dumpleton

Graham Dumpleton wrote ..
> Extending the above code as:
> 
>     Py_BEGIN_ALLOW_THREADS;
>     rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
>     Py_END_ALLOW_THREADS;
> 
>     if (! APR_STATUS_IS_SUCCESS(rc)) {
>         PyErr_SetObject(PyExc_IOError,
>                         PyString_FromString("Connection read error"));
>         return NULL;
>     }
> 
>     /* Return empty string if no buckets. Can be caused by EAGAIN. */
>     if (APR_BRIGADE_EMPTY(bb)) {
>         return PyString_FromString("");
>     }
> 
> seems to fix the problem. I.e., use a call to APR_BRIGADE_EMPTY(bb) to check
> whether any new buckets were added, returning an empty string if not.

Okay, this may work, but the EAGAIN propagating back up as an empty
string to Python can cause a tight loop to occur where calls are going
out and back into Python code. This will occur until something is read
or an error occurs.

To avoid the back and forth, another option may be:

    while (APR_BRIGADE_EMPTY(bb)) {
        Py_BEGIN_ALLOW_THREADS;
        rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
        Py_END_ALLOW_THREADS;

        if (! APR_STATUS_IS_SUCCESS(rc)) {
            PyErr_SetObject(PyExc_IOError,
                            PyString_FromString("Connection read error"));
            return NULL;
        }
    }

What doesn't make sense to me is that on my Mac OS X box where this
problem only occurs when you have two listener ports, even when you
have already read some input from the connection, it tight loops with
the lowest level read always returning EAGAIN. Ie., it doesn't block at all.

Thus something really bad is happening on Mac OS X. Unless Apache
is setting some strange ioctl options on the socket to inadvertently
cause this, it looks to me like Mac OS X is broken in some way. I am
still on Mac OS X (10.3). I'll have to try it on my 10.4 box and see if it
makes any difference.
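
If the lowest-level read can keep returning EAGAIN without ever blocking, a retry loop of this kind may spin indefinitely. One possible guard is to cap the number of empty reads. The following is a stand-alone simulation under stated assumptions, not mod_python or APR code: fake_brigade, fake_source, fake_get_brigade(), read_with_retry_limit() and MAX_EMPTY_READS are all hypothetical stand-ins for the brigade, the socket, ap_get_brigade() and the loop in _conn_read().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EMPTY_READS 5   /* hypothetical cap, chosen arbitrarily */

/* Stand-in for a bucket brigade: only tracks how many buckets it holds. */
struct fake_brigade {
    size_t nbuckets;
};

/* Stand-in for the socket: how many reads will hit EAGAIN before
 * data finally arrives. */
struct fake_source {
    int eagain_remaining;
};

/* Stand-in for ap_get_brigade(): as described for core_input_filter(),
 * it reports success (0) on EAGAIN but leaves the brigade empty. */
static int fake_get_brigade(struct fake_source *src, struct fake_brigade *bb)
{
    if (src->eagain_remaining > 0) {
        src->eagain_remaining--;
        return 0;               /* "success", but no bucket was added */
    }
    bb->nbuckets++;
    return 0;
}

/* Returns the number of calls made until data arrived, or -1 if the
 * brigade was still empty after MAX_EMPTY_READS attempts. */
static int read_with_retry_limit(struct fake_source *src,
                                 struct fake_brigade *bb)
{
    int attempts = 0;

    assert(src != NULL && bb != NULL);
    while (bb->nbuckets == 0) {
        if (attempts == MAX_EMPTY_READS) {
            return -1;          /* give up instead of spinning forever */
        }
        attempts++;
        if (fake_get_brigade(src, bb) != 0) {
            return -1;          /* a real read could report an error here */
        }
    }
    return attempts;
}
```

A cap only bounds the spinning; it does not explain why the read fails to block in the first place, which is the open question here.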

Graham

Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-30 Thread Graham Dumpleton

Graham Dumpleton wrote ..
> Returning back up to _conn_read() in the mod_python source code, we have
> the call to ap_get_brigade() through which core_input_filter() was invoked:
> 
>     Py_BEGIN_ALLOW_THREADS;
>     rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
>     Py_END_ALLOW_THREADS;
> 
>     if (! APR_STATUS_IS_SUCCESS(rc)) {
>         PyErr_SetObject(PyExc_IOError,
>                         PyString_FromString("Connection read error"));
>         return NULL;
>     }
> 
> Since APR_SUCCESS was returned and assigned to "rc", no problem is detected.
> 
> The code which follows then assumes that the first bucket in the bucket
> brigade actually contains valid data, when in fact the first bucket is actually
> crap as nothing was done to set up a valid bucket since EAGAIN was returned.
> As a consequence it crashes.
> 
> Thus in summary, _conn_read() doesn't cater in any way for the possibility
> that the initial socket read may have failed because of EAGAIN and thus
> the bucket is bogus. The problem is, how is it meant to know this if the
> value APR_SUCCESS is returned by ap_get_brigade()?

Extending the above code as:

    Py_BEGIN_ALLOW_THREADS;
    rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
    Py_END_ALLOW_THREADS;

    if (! APR_STATUS_IS_SUCCESS(rc)) {
        PyErr_SetObject(PyExc_IOError,
                        PyString_FromString("Connection read error"));
        return NULL;
    }

    /* Return empty string if no buckets. Can be caused by EAGAIN. */
    if (APR_BRIGADE_EMPTY(bb)) {
        return PyString_FromString("");
    }

seems to fix the problem. I.e., use a call to APR_BRIGADE_EMPTY(bb) to check
whether any new buckets were added, returning an empty string if not.
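
To illustrate why the emptiness check matters, here is a stand-alone sketch of a brigade-like circular list. It assumes, as with APR's ring macros, that asking for the "first" element of an empty brigade yields the list's sentinel rather than a real bucket, so its fields must not be used. The types and functions below (struct bucket, struct brigade, brigade_init(), brigade_empty(), brigade_append(), first_data_or_empty()) are simplified hypothetical stand-ins, not the APR API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a bucket: a node in a circular list. */
struct bucket {
    struct bucket *next;
    struct bucket *prev;
    const char *data;       /* NULL for the sentinel */
};

/* Simplified stand-in for a brigade: the sentinel node is the list head. */
struct brigade {
    struct bucket sentinel;
};

static void brigade_init(struct brigade *bb)
{
    bb->sentinel.next = &bb->sentinel;
    bb->sentinel.prev = &bb->sentinel;
    bb->sentinel.data = NULL;
}

/* An empty brigade is one whose sentinel points back at itself. */
static int brigade_empty(const struct brigade *bb)
{
    return bb->sentinel.next == &bb->sentinel;
}

static void brigade_append(struct brigade *bb, struct bucket *b)
{
    b->prev = bb->sentinel.prev;
    b->next = &bb->sentinel;
    bb->sentinel.prev->next = b;
    bb->sentinel.prev = b;
}

/* Returns the first bucket's data, or "" when the brigade is empty,
 * so the sentinel is never treated as a real bucket. */
static const char *first_data_or_empty(struct brigade *bb)
{
    assert(bb != NULL);
    if (brigade_empty(bb)) {
        return "";
    }
    return bb->sentinel.next->data;
}
```

Without the brigade_empty() test, the caller would read fields of the sentinel, which mirrors the bogus type and read pointer seen in the crash.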

Can someone else seeing this issue try this fix and see if the tests then
work?

Graham

Re: Segfaults in ConnectionHander

2006-01-30 Thread Gregory (Grisha) Trubetskoy



This may be a good question to post to dev@httpd.apache.org

Grisha

On Mon, 30 Jan 2006, Graham Dumpleton wrote:


Getting a bit closer now, have next part of puzzle worked out.

Graham Dumpleton wrote ..

This is starting to look really ugly.

In _conn_read(), it first creates a bucket brigade from the connection
objects pool object. No chance of this being destroyed prematurely
as a result.

bb = apr_brigade_create(c->pool, c->bucket_alloc);


From what I understand, it then makes a call which links the bucket
brigade to the actual source of data.

rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);

Under normal circumstances this would also have the side effect of
performing the first actual read of data off the socket connection which
the client created to Apache.


When ap_get_brigade() is called, it is actually calling through to the
function core_input_filter() in Apache (server/core.c). In that function, it
ultimately hits the code:

   e = APR_BRIGADE_FIRST(ctx->b);
   rv = apr_bucket_read(e, &str, &len, block);

   if (APR_STATUS_IS_EAGAIN(rv)) {
       return APR_SUCCESS;
   }

Tracking down into apr_bucket_read() it ends up calling the function
socket_bucket_read() containing the code:

   *str = NULL;
   *len = APR_BUCKET_BUFF_SIZE;
   buf = apr_bucket_alloc(*len, a->list); /* XXX: check for failure? */

   rv = apr_socket_recv(p, buf, len);

   if (block == APR_NONBLOCK_READ) {
       apr_socket_timeout_set(p, timeout);
   }

   if (rv != APR_SUCCESS && rv != APR_EOF) {
       apr_bucket_free(buf);
       return rv;
   }

The apr_socket_recv() is what is doing the initial read of data from the
socket connection. This should block until the first data is received.

What is happening though is that it is returning -1 with errno set to
EAGAIN. Thus it frees the temporary bucket it created and returns
EAGAIN as the result.

If you note the code in the core_input_filter() it has:

   if (APR_STATUS_IS_EAGAIN(rv)) {
       return APR_SUCCESS;
   }

Thus, when EAGAIN is encountered, it simply returns success and does
not do anything else.

Returning back up to _conn_read() in the mod_python source code, we have
the call to ap_get_brigade() through which core_input_filter() was invoked:

   Py_BEGIN_ALLOW_THREADS;
   rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
   Py_END_ALLOW_THREADS;

   if (! APR_STATUS_IS_SUCCESS(rc)) {
       PyErr_SetObject(PyExc_IOError,
                       PyString_FromString("Connection read error"));
       return NULL;
   }

Since APR_SUCCESS was returned and assigned to "rc", no problem is detected.

The code which follows then assumes that the first bucket in the bucket
brigade actually contains valid data, when in fact the first bucket is actually
crap as nothing was done to set up a valid bucket since EAGAIN was returned.
As a consequence it crashes.

Thus in summary, _conn_read() doesn't cater in any way for the possibility
that the initial socket read may have failed because of EAGAIN and thus
the bucket is bogus. The problem is, how is it meant to know this if the
value APR_SUCCESS is returned by ap_get_brigade()?

At this point, seems a bit of research is needed of other examples of
connection handlers for Apache to see how they handle the initial startup
sequence and processing of initial data. What is in mod_python now does
not appear to be reliable in the face of an EAGAIN error occurring.

Graham

Re: Segfaults in ConnectionHander

2006-01-30 Thread Graham Dumpleton

Getting a bit closer now, have next part of puzzle worked out.

Graham Dumpleton wrote ..
> This is starting to look really ugly.
> 
> In _conn_read(), it first creates a bucket brigade from the connection
> objects pool object. No chance of this being destroyed prematurely
> as a result.
> 
> bb = apr_brigade_create(c->pool, c->bucket_alloc);
> 
> From what I understand, it then makes a call which links the bucket
> brigade to the actual source of data.
> 
> rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
> 
> Under normal circumstances this would also have the side effect of
> performing the first actual read of data off the socket connection which
> the client created to Apache.

When ap_get_brigade() is called, it is actually calling through to the
function core_input_filter() in Apache (server/core.c). In that function, it
ultimately hits the code:

    e = APR_BRIGADE_FIRST(ctx->b);
    rv = apr_bucket_read(e, &str, &len, block);

    if (APR_STATUS_IS_EAGAIN(rv)) {
        return APR_SUCCESS;
    }

Tracking down into apr_bucket_read() it ends up calling the function
socket_bucket_read() containing the code:

    *str = NULL;
    *len = APR_BUCKET_BUFF_SIZE;
    buf = apr_bucket_alloc(*len, a->list); /* XXX: check for failure? */

    rv = apr_socket_recv(p, buf, len);

    if (block == APR_NONBLOCK_READ) {
        apr_socket_timeout_set(p, timeout);
    }

    if (rv != APR_SUCCESS && rv != APR_EOF) {
        apr_bucket_free(buf);
        return rv;
    }

The apr_socket_recv() is what is doing the initial read of data from the
socket connection. This should block until the first data is received.

What is happening though is that it is returning -1 with errno set to
EAGAIN. Thus it frees the temporary bucket it created and returns
EAGAIN as the result.

If you note the code in the core_input_filter() it has:

    if (APR_STATUS_IS_EAGAIN(rv)) {
        return APR_SUCCESS;
    }

Thus, when EAGAIN is encountered, it simply returns success and does
not do anything else.

Returning back up to _conn_read() in the mod_python source code, we have
the call to ap_get_brigade() through which core_input_filter() was invoked:

    Py_BEGIN_ALLOW_THREADS;
    rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
    Py_END_ALLOW_THREADS;

    if (! APR_STATUS_IS_SUCCESS(rc)) {
        PyErr_SetObject(PyExc_IOError,
                        PyString_FromString("Connection read error"));
        return NULL;
    }

Since APR_SUCCESS was returned and assigned to "rc", no problem is detected.

The code which follows then assumes that the first bucket in the bucket
brigade actually contains valid data, when in fact the first bucket is actually
crap as nothing was done to set up a valid bucket since EAGAIN was returned.
As a consequence it crashes.

Thus in summary, _conn_read() doesn't cater in any way for the possibility
that the initial socket read may have failed because of EAGAIN and thus
the bucket is bogus. The problem is, how is it meant to know this if the
value APR_SUCCESS is returned by ap_get_brigade()?

At this point, seems a bit of research is needed of other examples of
connection handlers for Apache to see how they handle the initial startup
sequence and processing of initial data. What is in mod_python now does
not appear to be reliable in the face of an EAGAIN error occurring.

Graham

Re: Segfaults in ConnectionHander

2006-01-30 Thread Jim Gallacher


Graham Dumpleton wrote:


What I might speculate is that if the test in mod_python for the
connection handler is setup to run on a secondary listener port,
but with the primary still active, that it may trigger the problem
on other systems like Linux. Jim, you might want to try this and see
if you can duplicate it on Linux.


I'll try it tonight.

Jim

Re: Segfaults in ConnectionHander FreeBSD (was Re: 3.2.6 test period - how long do we wait?)

2006-01-30 Thread Jim Gallacher


David Fraser wrote:

Jim Gallacher wrote:


Barry Pederson wrote:

I think this is the general kind of thing we're looking for though, 
with some mistaken pointer/memory operation.


Too bad we can't write *everything* in python. :(


You haven't been following PyPy then? :-)

David



Well, sure, but I think porting will have to wait until at least 
mod_python 3.4. :)


Jim

Re: Segfaults in ConnectionHander FreeBSD (was Re: 3.2.6 test period - how long do we wait?)

2006-01-30 Thread David Fraser


Jim Gallacher wrote:

Barry Pederson wrote:
I think this is the general kind of thing we're looking for though, 
with some mistaken pointer/memory operation.

Too bad we can't write *everything* in python. :(

You haven't been following PyPy then? :-)

David

Re: Segfaults in ConnectionHander

2006-01-29 Thread Graham Dumpleton

Changed subject heading. See more of what I have uncovered below.
Not sure where to go next.

Graham Dumpleton wrote ..
> > > Unlike suggestions by someone else that "self" seemed to be getting
> > > corrupted, it looks fine to me, and code simply crashed down in:
> > >
> > >  apr_bucket_read(b, &data, &size, APR_BLOCK_READ)
> > >
> > > on very first call to it. Thus need to start tracking into Apache itself
> > > and see what there may be about bucket structures that isn't correct.
> > > This is where I got to last time before I gave up, feeling it wasn't
> > > worth the effort at the time. I'll try and build a version of Apache
> > > with debug so I can get a better stack trace.
> > 
> > The first thing I'd check is for validity of b. Buckets use reference
> > counting much like Python, so sometimes it's possible for a bucket to
> > "self-destruct".
> 
> Starting to delve into the bucket now. Haven't looked at reference count
> stuff yet, but the b->type object seems to be bogus. This is where the
> read() function pointer is kept and since it is a bad value it is why it
> dies.

This is starting to look really ugly.

In _conn_read(), it first creates a bucket brigade from the connection
objects pool object. No chance of this being destroyed prematurely
as a result.

bb = apr_brigade_create(c->pool, c->bucket_alloc);

From what I understand, it then makes a call which links the bucket
brigade to the actual source of data.

rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);

Under normal circumstances this would also have the side effect of
performing the first actual read of data off the socket connection which
the client created to Apache.

Important things to note here are the value of:

  c->input_filters->frec->filter_func.in_func

going into the call. Not sure exactly, but I imagine that this is the first
input filter which handles reading from the socket.

My logging shows the address of the input filter in memory as 178456.

When ap_get_brigade() returns okay, the first actual bucket from the
bucket brigade is obtained:

b = APR_BRIGADE_FIRST(bb);

There are two interesting values in the bucket worth looking at:

b->type->name
b->type->read

The first is the type of bucket object and the second is the pointer to a
function to read data from the bucket.

My logging shows the type of bucket as being "HEAP" and the address
of the read function pointer as 1819356.

I will not go into the rest of the function except to say that as necessary
it may do additional reads using apr_bucket_read() to get more data
if required when that initially read by ap_get_brigade() isn't enough.

Anyway, the above is when it is working okay. This being when I have the
connection handler attached to my primary listener port. As soon as I
add into the main Apache configuration file an additional socket for
Apache to listen on, ie., when I add:

  Listen 8081

it will crash in _conn_read() no matter whether I have attached the
connection handler to the primary listener port or the additional
listener port.

In contrast to the above, when it dies, the address of the input filter
in memory is still 178456, but the initial bucket in the bucket brigade
as populated by ap_get_brigade() is bogus. Ie., I get for the name crap
like:

  \x01\x80b\x18\x01\x8f\xec\x18\x01\x83b\x18\x01\x80b\x1c\x01\x8f\xcc\xb8

and the address of the read function is 88.

Importantly, the ap_get_brigade() function does not block on a read
waiting for the first data coming over the socket like it did before.

With the bogus bucket returned, when apr_bucket_read() is later called,
it tries to use the read function in the initial bucket which being bogus
causes the crash.

Thus in summary, with a secondary listener port the ap_get_brigade()
function doesn't block on read waiting for first data, returning
immediately, but still seeming to return success. The initial bucket
in the bucket brigade then seems to be bogus.

What I might speculate is that if the test in mod_python for the
connection handler is setup to run on a secondary listener port,
but with the primary still active, that it may trigger the problem
on other systems like Linux. Jim, you might want to try this and see
if you can duplicate it on Linux.

BTW, I am not saying this is the same problem on the BSD systems,
but it certainly is not correct either way.

Graham

Re: Segfaults in ConnectionHander FreeBSD (was Re: 3.2.6 test period - how long do we wait?)

2006-01-29 Thread Barry Pederson


Jim Gallacher wrote:


Dang, it's frustrating not being able to reproduce this bug in Linux.


I suppose it's maybe something to do with different malloc 
implementations or such.   I haven't seen any +1s for OpenBSD, which 
would be interesting to see since they added some stuff in 3.8 to help 
catch problems with this sort of thing


http://kerneltrap.org/node/5584

Anyone been able to use valgrind or similar with mod_python?  I Googled 
and found a couple old messages from '02 and '04 mentioning attempts to 
use this, but doesn't sound like much came out of it.  I think there's a 
valgrind port on FreeBSD, so I may give that a try.


Barry

Re: Segfaults in ConnectionHander FreeBSD (was Re: 3.2.6 test period - how long do we wait?)

2006-01-29 Thread Jim Gallacher


Barry Pederson wrote:
I don't know if this is the answer to the problem, but it looks like a 
bug anyway. In connobject.c starting at line 133:


/* time to grow destination string? */
if (len == 0 && bytes_read == bufsize) {
    _PyString_Resize(&result, bufsize + HUGE_STRING_LEN);
    buffer = PyString_AS_STRING((PyStringObject *) result);
    buffer += HUGE_STRING_LEN;
    bufsize += HUGE_STRING_LEN;
}


It looks like we've just set the buffer pointer to an address 
somewhere inside the buffer. That can't be good. The buffer pointer 
should be set to the bytes_read position. Perhaps one of you FreeBSD 
heads could try the attached patch.


Jim






Index: src/connobject.c
===
--- src/connobject.c (revision 369511)
+++ src/connobject.c (working copy)
@@ -135,7 +135,7 @@
 
 _PyString_Resize(&result, bufsize + HUGE_STRING_LEN);
 buffer = PyString_AS_STRING((PyStringObject *) result);
-buffer += HUGE_STRING_LEN;
+buffer += bytes_read;
 bufsize += HUGE_STRING_LEN;
 }
 



Sorry, that doesn't seem to fix it.  I did a fresh extraction of 
mod_python-3.2.6.tgz, applied the patch, did ./configure, make, su, make 
install, exit su, cd test, ran test.py - got the same result as before, 
with the same core dump apparently.


I really didn't think it would help since a buffer of HUGE_STRING_LEN 
(8192) should have been created in the first place. The unit test 
wouldn't be reading that many bytes, so I doubt the buffer is getting 
resized. All the same it still looks like a bug.
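
The idea behind the patch, deriving the write position from the start of the (possibly moved) buffer plus the number of bytes already read, can be shown in isolation. The following is a stand-alone sketch using plain realloc() rather than Python strings; struct growbuf, growbuf_append() and CHUNK are hypothetical names for illustration, with CHUNK playing the role of HUGE_STRING_LEN.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 8   /* hypothetical growth step; stands in for HUGE_STRING_LEN */

struct growbuf {
    char *base;     /* start of the allocation */
    size_t size;    /* total allocated bytes */
    size_t used;    /* bytes written so far */
};

/* Appends len bytes, growing by CHUNK as needed.
 * Returns 0 on success, -1 on allocation failure. */
static int growbuf_append(struct growbuf *g, const char *src, size_t len)
{
    assert(g != NULL && src != NULL);
    while (g->used + len > g->size) {
        char *p = realloc(g->base, g->size + CHUNK);
        if (p == NULL) {
            return -1;
        }
        g->base = p;        /* realloc may have moved the block */
        g->size += CHUNK;
    }
    /* The write position is always base + bytes already used,
     * never base + a fixed chunk size. */
    memcpy(g->base + g->used, src, len);
    g->used += len;
    return 0;
}
```

With a fixed offset the pointer is only right after the first resize; after a second resize it would land inside data that was already written.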


I think this is the general kind of thing we're looking for though, with 
some mistaken pointer/memory operation.


Too bad we can't write *everything* in python. :(


---

As I mentioned in another message, I did some experimenting with 
disabling other unittests and found if you disable just 
"test_fileupload", all the remaining tests including 
"test_connectionhandler" pass.


I hadn't forgotten. I'm just trying to understand what might be going on 
in the code and I spotted the bug.


If you disable everything except "test_fileupload" and 
"test_connectionhandler", then "test_connectionhandler" still crashes.


Dang, it's frustrating not being able to reproduce this bug in Linux.

So I suspect that it's code involved with running "test_fileupload" 
(Testing 1 MB file upload support) that's really the source of the 
problem, and it's screwing up some part of memory that's only tripped 
over later during the connectionhandler test.


One of the things I'm trying to understand is what has changed since 3.1 
that is causing this bug. The only thing different in connobject.c is a 
fix to actually return local_ip and local_host (was returning remote_ip 
and remote_host), plus makeipaddr and makesockaddr now call the 
equivalent apr functions rather than the prior roll-our-own approach. 
Neither of these changes should impact the _conn_read.


Jim
