Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-31 Thread Volodya
On Mon, Jan 30, 2006 at 09:40:39PM -0500, Graham Dumpleton wrote:
 Graham Dumpleton wrote ..
  Extending the above code as:
  
  Py_BEGIN_ALLOW_THREADS;
  rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
  Py_END_ALLOW_THREADS;
  
  if (! APR_STATUS_IS_SUCCESS(rc)) {
      PyErr_SetObject(PyExc_IOError,
                      PyString_FromString("Connection read error"));
      return NULL;
  }
  
  /* Return empty string if no buckets. Can be caused by EAGAIN. */
  if (APR_BRIGADE_EMPTY(bb)) {
      return PyString_FromString("");
  }
  
  seems to fix the problem. I.e., call APR_BRIGADE_EMPTY(bb) to check
  whether any new buckets were added, and return an empty string if not.
 
 Okay, this may work, but the EAGAIN propagating back up as an empty
 string to Python can cause a tight loop to occur where calls are going
 out and back into Python code. This will occur until something is read
 or an error occurs.
 
 To avoid the back and forth, another option may be:
 
 while (APR_BRIGADE_EMPTY(bb)) {
     Py_BEGIN_ALLOW_THREADS;
     rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
     Py_END_ALLOW_THREADS;
 
     if (! APR_STATUS_IS_SUCCESS(rc)) {
         PyErr_SetObject(PyExc_IOError,
                         PyString_FromString("Connection read error"));
         return NULL;
     }
 }
 

Graham,

this code runs smoothly, i.e. no segfaults, all tests passed:
FreeBSD 4.9:

  Apache/2.0.50 (prefork) Python/2.3.4
  Apache/2.0.55 (prefork) Python/2.4.2

Thanks!



Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-31 Thread Jim Gallacher

Volodya wrote:

On Mon, Jan 30, 2006 at 09:40:39PM -0500, Graham Dumpleton wrote:


Graham Dumpleton wrote ..


Extending the above code as:

   Py_BEGIN_ALLOW_THREADS;
   rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
   Py_END_ALLOW_THREADS;

   if (! APR_STATUS_IS_SUCCESS(rc)) {
       PyErr_SetObject(PyExc_IOError,
                       PyString_FromString("Connection read error"));
       return NULL;
   }

   /* Return empty string if no buckets. Can be caused by EAGAIN. */
   if (APR_BRIGADE_EMPTY(bb)) {
       return PyString_FromString("");
   }

seems to fix the problem. I.e., call APR_BRIGADE_EMPTY(bb) to check
whether any new buckets were added, and return an empty string if not.


Okay, this may work, but the EAGAIN propagating back up as an empty
string to Python can cause a tight loop to occur where calls are going
out and back into Python code. This will occur until something is read
or an error occurs.

To avoid the back and forth, another option may be:

   while (APR_BRIGADE_EMPTY(bb)) {
       Py_BEGIN_ALLOW_THREADS;
       rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
       Py_END_ALLOW_THREADS;

       if (! APR_STATUS_IS_SUCCESS(rc)) {
           PyErr_SetObject(PyExc_IOError,
                           PyString_FromString("Connection read error"));
           return NULL;
       }
   }




Graham,

this code runs smoothly, i.e. no segfaults, all tests passed:
FreeBSD 4.9:


That's good news. I still wonder why we are seeing this problem in 3.2 
and 3.1.4 though.


Jim





Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-31 Thread Jim Gallacher

Jim Gallacher wrote:

Volodya wrote:


On Mon, Jan 30, 2006 at 09:40:39PM -0500, Graham Dumpleton wrote:


Graham Dumpleton wrote ..


Extending the above code as:

   Py_BEGIN_ALLOW_THREADS;
   rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
   Py_END_ALLOW_THREADS;

   if (! APR_STATUS_IS_SUCCESS(rc)) {
       PyErr_SetObject(PyExc_IOError,
                       PyString_FromString("Connection read error"));
       return NULL;
   }

   /* Return empty string if no buckets. Can be caused by EAGAIN. */
   if (APR_BRIGADE_EMPTY(bb)) {
       return PyString_FromString("");
   }

seems to fix the problem. I.e., call APR_BRIGADE_EMPTY(bb) to check
whether any new buckets were added, and return an empty string if not.


Okay, this may work, but the EAGAIN propagating back up as an empty
string to Python can cause a tight loop to occur where calls are going
out and back into Python code. This will occur until something is read
or an error occurs.

To avoid the back and forth, another option may be:

   while (APR_BRIGADE_EMPTY(bb)) {
       Py_BEGIN_ALLOW_THREADS;
       rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
       Py_END_ALLOW_THREADS;

       if (! APR_STATUS_IS_SUCCESS(rc)) {
           PyErr_SetObject(PyExc_IOError,
                           PyString_FromString("Connection read error"));
           return NULL;
       }
   }




Graham,

this code runs smoothly, i.e. no segfaults, all tests passed:
FreeBSD 4.9:



That's good news. I still wonder why we are seeing this problem in 3.2 
and 3.1.4 though.


And what I meant to say was and *NOT* in 3.1.4.

Jim



Re: Segfaults in ConnectionHander FreeBSD (was Re: 3.2.6 test period - how long do we wait?)

2006-01-30 Thread David Fraser

Jim Gallacher wrote:

Barry Pederson wrote:
I think this is the general kind of thing we're looking for though, 
with some mistaken pointer/memory operation.

Too bad we can't write *everything* in python. :(

You haven't been following PyPy then? :-)

David


Re: Segfaults in ConnectionHander

2006-01-30 Thread Gregory (Grisha) Trubetskoy


This may be a good question to post to dev@httpd.apache.org

Grisha

On Mon, 30 Jan 2006, Graham Dumpleton wrote:


Getting a bit closer now, have next part of puzzle worked out.

Graham Dumpleton wrote ..

This is starting to look really ugly.

In _conn_read(), it first creates a bucket brigade from the connection
object's pool. No chance of this being destroyed prematurely
as a result.

bb = apr_brigade_create(c->pool, c->bucket_alloc);

From what I understand, it then makes a call which links the bucket
brigade to the actual source of data.

rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);

Under normal circumstances this would also have the side effect of
performing the first actual read of data off the socket connection which
the client created to Apache.


When ap_get_brigade() is called, it is actually calling through to the
function core_input_filter() in Apache (server/core.c). In that function, it
ultimately hits the code:

   e = APR_BRIGADE_FIRST(ctx->b);
   rv = apr_bucket_read(e, &str, &len, block);

   if (APR_STATUS_IS_EAGAIN(rv)) {
       return APR_SUCCESS;
   }

Tracking down into apr_bucket_read() it ends up calling the function
socket_bucket_read() containing the code:

   *str = NULL;
   *len = APR_BUCKET_BUFF_SIZE;
   buf = apr_bucket_alloc(*len, a->list); /* XXX: check for failure? */

   rv = apr_socket_recv(p, buf, len);

   if (block == APR_NONBLOCK_READ) {
       apr_socket_timeout_set(p, timeout);
   }

   if (rv != APR_SUCCESS && rv != APR_EOF) {
       apr_bucket_free(buf);
       return rv;
   }

The apr_socket_recv() is what is doing the initial read of data from the
socket connection. This should block until the first data is received.

What is happening though is that it is returning -1 with errno set to
EAGAIN. Thus it frees the temporary bucket it created and returns
EAGAIN as the result.

If you note the code in the core_input_filter() it has:

   if (APR_STATUS_IS_EAGAIN(rv)) {
   return APR_SUCCESS;
   }

Thus, when EAGAIN is encountered, it simply returns success and does
not do anything else.
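
The effect described above can be reproduced in miniature. The following is a self-contained sketch, not Apache code: fake_socket_read(), fake_core_filter(), struct fake_brigade and the ST_* status values are hypothetical stand-ins for apr_socket_recv(), core_input_filter(), the bucket brigade and the APR status codes. It only illustrates that once the filter maps "try again" to success, the caller cannot tell the two cases apart from the status and has to check whether anything was actually produced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for APR_SUCCESS / EAGAIN. */
enum status { ST_SUCCESS = 0, ST_EAGAIN = 11 };

/* Stand-in for a brigade: it just counts the buckets placed in it. */
struct fake_brigade { size_t nbuckets; };

/* Stand-in for the socket read; data_ready simulates whether bytes arrived. */
static enum status fake_socket_read(int data_ready, struct fake_brigade *bb)
{
    if (!data_ready)
        return ST_EAGAIN;          /* nothing read, nothing added to bb */
    bb->nbuckets++;
    return ST_SUCCESS;
}

/* Stand-in for the filter: like the code quoted above, it turns EAGAIN
 * into success and does nothing else. */
static enum status fake_core_filter(int data_ready, struct fake_brigade *bb)
{
    enum status rv = fake_socket_read(data_ready, bb);
    if (rv == ST_EAGAIN)
        return ST_SUCCESS;
    return rv;
}

/* Caller: returns 1 if there is a bucket that is safe to use, 0 otherwise. */
static int caller_has_data(int data_ready)
{
    struct fake_brigade bb = { 0 };
    enum status rc = fake_core_filter(data_ready, &bb);

    /* The status is "success" in both cases, so it carries no information. */
    assert(rc == ST_SUCCESS);

    /* Only the brigade contents reveal whether a read really happened. */
    return bb.nbuckets > 0;
}
```

In this sketch caller_has_data(0) reports no data even though the status was success, which is the situation _conn_read() would have to detect.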

Returning back up to _conn_read() in mod_python source code, we have
where core_input_filter() was called via ap_get_brigade():

   Py_BEGIN_ALLOW_THREADS;
   rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
   Py_END_ALLOW_THREADS;

   if (! APR_STATUS_IS_SUCCESS(rc)) {
       PyErr_SetObject(PyExc_IOError,
                       PyString_FromString("Connection read error"));
       return NULL;
   }

Since APR_SUCCESS was returned and assigned to rc, no problem is detected.

The code which follows then assumes that the first bucket in the bucket
brigade actually contains valid data, when in fact the first bucket is actually
crap as nothing was done to set up a valid bucket since EAGAIN was returned.
As a consequence it crashes.

Thus in summary, _conn_read() doesn't cater in any way for the possibility
that the initial socket read may have failed because of EAGAIN and thus
the bucket is bogus. The problem is, how is it meant to know this if the
value APR_SUCCESS is returned by ap_get_brigade()?

At this point, it seems a bit of research is needed into other examples of
connection handlers for Apache to see how they handle the initial startup
sequence and processing of initial data. What is in mod_python now does
not appear to be reliable in the face of an EAGAIN error occurring.

Graham





Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-30 Thread Graham Dumpleton
Graham Dumpleton wrote ..
 Returning back up to _conn_read() in mod_python source code, we have
 where core_input_filter() was called via ap_get_brigade():
 
 Py_BEGIN_ALLOW_THREADS;
 rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
 Py_END_ALLOW_THREADS;
 
 if (! APR_STATUS_IS_SUCCESS(rc)) {
     PyErr_SetObject(PyExc_IOError,
                     PyString_FromString("Connection read error"));
     return NULL;
 }
 
 Since APR_SUCCESS was returned and assigned to rc, no problem is detected.
 
 The code which follows then assumes that the first bucket in the bucket
 brigade actually contains valid data, when in fact the first bucket is
 actually crap as nothing was done to set up a valid bucket since EAGAIN
 was returned. As a consequence it crashes.
 
 Thus in summary, _conn_read() doesn't cater in any way for the possibility
 that the initial socket read may have failed because of EAGAIN and thus
 the bucket is bogus. The problem is, how is it meant to know this if the
 value APR_SUCCESS is returned by ap_get_brigade()?

Extending the above code as:

Py_BEGIN_ALLOW_THREADS;
rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
Py_END_ALLOW_THREADS;

if (! APR_STATUS_IS_SUCCESS(rc)) {
    PyErr_SetObject(PyExc_IOError,
                    PyString_FromString("Connection read error"));
    return NULL;
}

/* Return empty string if no buckets. Can be caused by EAGAIN. */
if (APR_BRIGADE_EMPTY(bb)) {
    return PyString_FromString("");
}

seems to fix the problem. I.e., call APR_BRIGADE_EMPTY(bb) to check
whether any new buckets were added, and return an empty string if not.

Can someone else seeing this issue try this fix and see if the tests then
work?

Graham


Re: Segfaults in ConnectionHander (Possible Solution)

2006-01-30 Thread Graham Dumpleton
Graham Dumpleton wrote ..
 Extending the above code as:
 
 Py_BEGIN_ALLOW_THREADS;
 rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
 Py_END_ALLOW_THREADS;
 
 if (! APR_STATUS_IS_SUCCESS(rc)) {
     PyErr_SetObject(PyExc_IOError,
                     PyString_FromString("Connection read error"));
     return NULL;
 }
 
 /* Return empty string if no buckets. Can be caused by EAGAIN. */
 if (APR_BRIGADE_EMPTY(bb)) {
     return PyString_FromString("");
 }
 
 seems to fix the problem. I.e., call APR_BRIGADE_EMPTY(bb) to check
 whether any new buckets were added, and return an empty string if not.

Okay, this may work, but the EAGAIN propagating back up as an empty
string to Python can cause a tight loop to occur where calls are going
out and back into Python code. This will occur until something is read
or an error occurs.

To avoid the back and forth, another option may be:

while (APR_BRIGADE_EMPTY(bb)) {
    Py_BEGIN_ALLOW_THREADS;
    rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
    Py_END_ALLOW_THREADS;

    if (! APR_STATUS_IS_SUCCESS(rc)) {
        PyErr_SetObject(PyExc_IOError,
                        PyString_FromString("Connection read error"));
        return NULL;
    }
}

What doesn't make sense to me is that on my Mac OS X box where this
problem only occurs when you have two listener ports, even when you
have already read some input from the connection, it tight loops with
the lowest level read always returning EAGAIN. Ie., it doesn't block at all.
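
If the lowest level read really can keep returning EAGAIN without blocking, an unbounded retry loop in C would spin just as the Python-level loop would. One possible variation, which is not what was proposed in this thread, is to cap the number of retries. The sketch below is self-contained and uses hypothetical names throughout (read_fn, read_with_limit(), empty_n_times()); the callback stands in for a call such as ap_get_brigade() followed by an emptiness check.

```c
#include <assert.h>

/* Hypothetical reader callback: returns 1 when data was produced, 0 when
 * the read came back empty (the EAGAIN case), and -1 on a real error. */
typedef int (*read_fn)(void *ctx);

/* Retry an empty read at most max_attempts times.
 * Returns 1 on data, -1 on error, 0 if still empty after max_attempts. */
static int read_with_limit(read_fn fn, void *ctx, int max_attempts)
{
    assert(fn != 0 && max_attempts > 0);

    for (int i = 0; i < max_attempts; i++) {
        int r = fn(ctx);
        if (r != 0)
            return r;              /* data or error: stop retrying */
    }
    return 0;                      /* gave up; caller decides what to do */
}

/* Test double: reports "empty" until the counter in ctx reaches zero. */
static int empty_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```

A cap only limits the spinning; it does not explain why the read fails to block in the first place.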

Thus something really bad is happening on Mac OS X. Unless Apache
is setting some strange ioctl options on the socket to inadvertently
cause this, it looks to me like Mac OS X is broken in some way. I am
still on Mac OS X (10.3). I'll have to try it on my 10.4 box and see if it
makes any difference.

Graham


Re: Segfaults in ConnectionHander

2006-01-30 Thread Jim Gallacher

Jim Gallacher wrote:

Graham Dumpleton wrote:


What I might speculate is that if the test in mod_python for the
connection handler is setup to run on a secondary listener port,
but with the primary still active, that it may trigger the problem
on other systems like Linux. Jim, you might want to try this and see
if you can duplicate it on Linux.



I'll try it tonight.



Graham,

I am not able to reproduce the problem using the configuration and 
example code you give in MODPYTHON-102. (Linux Debian 2.6.12-1-k7 kernel).


Jim


Re: Segfaults in ConnectionHander FreeBSD (was Re: 3.2.6 test period - how long do we wait?)

2006-01-29 Thread Barry Pederson

Jim Gallacher wrote:


Dang, it's frustrating not being able to reproduce this bug in Linux.


I suppose it's maybe something to do with different malloc 
implementations or such.   I haven't seen any +1s for OpenBSD, which 
would be interesting to see since they added some stuff in 3.8 to help 
catch problems with this sort of thing


http://kerneltrap.org/node/5584

Anyone been able to use valgrind or similar with mod_python?  I Googled 
and found a couple old messages from '02 and '04 mentioning attempts to 
use this, but doesn't sound like much came out of it.  I think there's a 
valgrind port on FreeBSD, so I may give that a try.


Barry


Re: Segfaults in ConnectionHander

2006-01-29 Thread Graham Dumpleton
Changed subject heading. See more of what I have uncovered below.
Not sure where to go next.

Graham Dumpleton wrote ..
   Unlike suggestions by someone else that self seemed to be getting
   corrupted, it looks fine to me, and code simply crashed down in:
   
       apr_bucket_read(b, &data, &size, APR_BLOCK_READ)
   
   on very first call to it. Thus need to start tracking into Apache itself
   and see what there may be about bucket structures that isn't correct.
   This is where I got to last time before I gave up, feeling it wasn't
   worth the effort at the time. I'll try and build a version of Apache
   with debug so I can get a better stack trace.
  
  The first thing I'd check is for validity of b. Buckets use reference
  counting much like Python, so sometimes it's possible for a bucket to
  self-destruct.
 
 Starting to delve into the bucket now. Haven't looked at reference count
 stuff yet, but the b->type object seems to be bogus. This is where the
 read() function pointer is kept and since it is a bad value it is why it
 dies.

This is starting to look really ugly.

In _conn_read(), it first creates a bucket brigade from the connection
object's pool. No chance of this being destroyed prematurely
as a result.

bb = apr_brigade_create(c->pool, c->bucket_alloc);

From what I understand, it then makes a call which links the bucket
brigade to the actual source of data.

rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);

Under normal circumstances this would also have the side effect of
performing the first actual read of data off the socket connection which
the client created to Apache.

An important thing to note here is the value of:

  c->input_filters->frec->filter_func.in_func

going into the call. Not sure exactly, but I imagine that this is the first
input filter which handles reading from the socket.

My logging shows the address of the input filter in memory as 178456.

When ap_get_brigade() returns okay, the first actual bucket from the
bucket brigade is obtained:

b = APR_BRIGADE_FIRST(bb);

There are two interesting values in the bucket worth looking at:

b->type->name
b->type->read

The first is the type of bucket object and the second is the pointer to a
function to read data from the bucket.

My logging shows the type of bucket as being HEAP and the address
of the read function pointer as 1819356.
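
To make the role of those two fields concrete, here is a simplified, self-contained sketch of dispatching a read through a per-type function pointer. The demo_* names are hypothetical and the structures are far smaller than the real APR bucket types; the point is only that the read goes through b->type->read, so a bogus type pointer means a jump to a bogus address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_bucket;

/* Simplified stand-in for a bucket type descriptor: a name plus a read
 * function pointer. */
struct demo_bucket_type {
    const char *name;
    int (*read)(struct demo_bucket *b, const char **data, size_t *len);
};

/* Simplified stand-in for a bucket: a type pointer plus its payload. */
struct demo_bucket {
    const struct demo_bucket_type *type;
    const char *payload;
};

/* Read implementation for the demo "HEAP" type. */
static int demo_heap_read(struct demo_bucket *b, const char **data, size_t *len)
{
    *data = b->payload;
    *len = strlen(b->payload);
    return 0;
}

static const struct demo_bucket_type demo_type_heap = { "HEAP", demo_heap_read };

/* Generic read: dispatches through the type pointer stored in the bucket. */
static int demo_bucket_read(struct demo_bucket *b, const char **data, size_t *len)
{
    /* These checks catch NULL only; a garbage non-NULL pointer would
     * still be followed and would most likely crash. */
    assert(b != NULL && b->type != NULL && b->type->read != NULL);
    return b->type->read(b, data, len);
}
```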

I will not go into the rest of the function except to say that as necessary
it may do additional reads using apr_bucket_read() to get more data
if required when that initially read by ap_get_brigade() isn't enough.

Anyway, the above is when it is working okay. This being when I have the
connection handler attached to my primary listener port. As soon as I
add into the main Apache configuration file an additional socket for
Apache to listen on, ie., when I add:

  Listen 8081

it will crash in _conn_read() no matter whether I have attached the
connection handler to the primary listener port or the additional
listener port.

In contrast to the above, when it dies, the address of the input filter
in memory is still 178456, but the initial bucket in the bucket brigade
as populated by ap_get_brigade() is bogus. Ie., I get for the name crap
like:

  \x01\x80b\x18\x01\x8f\xec\x18\x01\x83b\x18\x01\x80b\x1c\x01\x8f\xcc\xb8

and the address of the read function is 88.

Importantly, the ap_get_brigade() function does not block on a read
waiting for the first data coming over the socket like it did before.

With the bogus bucket returned, when apr_bucket_read() is later called,
it tries to use the read function in the initial bucket which being bogus
causes the crash.

Thus in summary, with a secondary listener port the ap_get_brigade()
function doesn't block on read waiting for first data, returning
immediately, but still seeming to return success. The initial bucket
in the bucket brigade then seems to be bogus.
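
One way such a bogus first bucket could arise, assuming the brigade is a ring whose head doubles as a sentinel (which is how APR's ring macros are documented to work), is that "first" on an empty ring yields the sentinel rather than a real element. The following self-contained sketch uses hypothetical ring_* names and a simplified layout, not the APR macros, to show the idea along with a checked accessor that refuses to hand out the sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Link fields embedded in every element and in the ring head. */
struct ring_link { struct ring_link *next, *prev; };

struct ring_item {
    struct ring_link link;   /* first member, so a link pointer can be cast back */
    int value;
};

/* The head doubles as the sentinel: an empty ring points at itself. */
struct ring { struct ring_link head; };

static void ring_init(struct ring *r)
{
    r->head.next = r->head.prev = &r->head;
}

static int ring_empty(const struct ring *r)
{
    return r->head.next == &r->head;
}

/* Insert an item at the tail, just before the sentinel. */
static void ring_append(struct ring *r, struct ring_item *it)
{
    assert(it != NULL);
    it->link.next = &r->head;
    it->link.prev = r->head.prev;
    r->head.prev->next = &it->link;
    r->head.prev = &it->link;
}

/* Checked accessor: returns NULL instead of the sentinel when the ring is
 * empty. An unchecked version would return the head reinterpreted as an
 * item, and reading its fields would yield whatever memory follows. */
static struct ring_item *ring_first(struct ring *r)
{
    if (ring_empty(r))
        return NULL;
    return (struct ring_item *)r->head.next;
}
```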

What I might speculate is that if the test in mod_python for the
connection handler is setup to run on a secondary listener port,
but with the primary still active, that it may trigger the problem
on other systems like Linux. Jim, you might want to try this and see
if you can duplicate it on Linux.

BTW, I am not saying this is the same problem on the BSD systems,
but it certainly is not correct either way.

Graham