mod_python as a mod_dav backend

2006-01-30 Thread Matt Carpenter




Hi,
Not sure if this is best posted here, or to the mod_dav mailing list. But
here goes.
Has anyone looked at using mod_python as a backend for mod_dav, in a
similar way to FUSE's Python binding? Basically, a mod_dav_python.

Thanks,
Matt
-- 


Matt Carpenter
[EMAIL PROTECTED]

FCP Internet LTD
Unit 3, 52 Victoria Road,
Aldershot, Hampshire, GU11 1SS

tel: +44 (0) 1252 333 344
fax: +44 (0) 1252 333 348
efax: +44 (0) 8704 281 008


This message is confidential; Any unauthorised
disclosure, use or dissemination,
either whole or partial, is prohibited. If you are not the intended
recipient, please notify the
sender immediately and delete all copies of this message. Any views or
opinions presented are solely
those of the author and do not necessarily represent those of FCP
Internet or its subsidiaries.





Re: mod_python as a mod_dav backend

2006-01-30 Thread Matt Carpenter




Graham Dumpleton wrote:
Others may know what you are talking about, but I plead ignorance. Can
you perhaps describe further what you are talking about, how it would be
used etc. A URL to stuff that could be read to understand similar things
would also help.

Graham

What I am trying to achieve:
I'm writing a module for our system that manages documents attached to
records in our database. Updates to these documents are recorded in the
database along with the user who made the edit, plus various other
information depending on the type of document (the system manages
templates, and also mail merges with virtual .csv files pulling data from
the database). The directory structure is entirely virtual: on the servers
there are just a few directories, one for each type of file, and the files
are named after their record in the database.
mod_dav implements hooks (see http://mailman.lyra.org/pipermail/dav-dev/2005-April/005926.html),
but I'm not a C programmer, so I'd like these hooks to be able to
call Python functions instead.
Hope that makes sense.
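To make the idea a little more concrete, here is a minimal sketch of the kind of Python-side object such hooks might call into, loosely modelled on the per-operation methods of the FUSE Python binding. Everything in it is hypothetical: mod_dav exposes no such Python API, and the class name `RecordStore`, its methods, the `lookup` callable and the `<type>/<record id>` storage layout are illustrative assumptions based on the description above (a few directories per file type, files named after their database record, edits logged with the user who made them).

```python
import os
import tempfile
from datetime import datetime, timezone


class RecordStore:
    """Hypothetical backend mapping virtual DAV paths onto record-named files.

    `lookup` stands in for a database query that resolves a virtual path to
    (file_type, record_id). It is an assumption, not part of mod_dav.
    """

    def __init__(self, root, lookup):
        self.root = root
        self.lookup = lookup
        self.audit_log = []  # stands in for the edit history kept in the database

    def _storage_path(self, virtual_path):
        file_type, record_id = self.lookup(virtual_path)
        # On disk: one directory per file type, each file named after its record.
        return os.path.join(self.root, file_type, str(record_id))

    def read(self, virtual_path):
        with open(self._storage_path(virtual_path), "rb") as f:
            return f.read()

    def write(self, virtual_path, data, user):
        path = self._storage_path(virtual_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        # Record who made the edit and when.
        self.audit_log.append((virtual_path, user, datetime.now(timezone.utc)))


# Usage: every virtual path resolves to template record 42 in this toy example.
store = RecordStore(tempfile.mkdtemp(), lambda path: ("templates", 42))
store.write("/clients/acme/letter.odt", b"hello", user="matt")
```

A real mod_dav_python would need a C shim translating each mod_dav hook into a call on an object like this; that glue is the part that does not exist.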

Thanks,
Matt





Re: Segfaults in ConnectionHandler FreeBSD (was Re: 3.2.6 test period - how long do we wait?)

2006-01-30 Thread David Fraser

Jim Gallacher wrote:

Barry Pederson wrote:
I think this is the general kind of thing we're looking for though, 
with some mistaken pointer/memory operation.

Too bad we can't write *everything* in python. :(

You haven't been following PyPy then? :-)

David


Re: 3.2.6 test period - how long do we wait?

2006-01-30 Thread Jim Gallacher

Gregory (Grisha) Trubetskoy wrote:


On Sun, 29 Jan 2006, Graham Dumpleton wrote:


 buffer += bufsize;



On second thought, yes, you're right :-)



And if he's not then there is a bug in filter_read since that is what it 
does and it is very similar to _conn_read.


Jim


Re: mod_python as a mod_dav backend

2006-01-30 Thread Graham Dumpleton


On 30/01/2006, at 9:11 PM, Matt Carpenter wrote:


Hi,
Not sure if this is best posted here, or to the mod_dav mailing list.
But here goes.
Has anyone looked at using mod_python as a backend for mod_dav, in a
similar way to FUSE's Python binding? Basically, a mod_dav_python.


Others may know what you are talking about, but I plead ignorance. Can
you perhaps describe further what you are talking about, how it would be
used etc. A URL to stuff that could be read to understand similar things
would also help.

Graham


Re: Segfaults in ConnectionHandler

2006-01-30 Thread Gregory (Grisha) Trubetskoy


This may be a good question to post to dev@httpd.apache.org

Grisha

On Mon, 30 Jan 2006, Graham Dumpleton wrote:


Getting a bit closer now; I have the next part of the puzzle worked out.

Graham Dumpleton wrote ..

This is starting to look really ugly.

In _conn_read(), it first creates a bucket brigade from the connection
object's pool. There is no chance of this being destroyed prematurely
as a result.

bb = apr_brigade_create(c->pool, c->bucket_alloc);


From what I understand, it then makes a call which links the bucket
brigade to the actual source of data.

rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);

Under normal circumstances this would also have the side effect of
performing the first actual read of data off the socket connection which
the client created to Apache.


When ap_get_brigade() is called, it is actually calling through to the
function core_input_filter() in Apache (server/core.c). In that function, it
ultimately hits the code:

    e = APR_BRIGADE_FIRST(ctx->b);
    rv = apr_bucket_read(e, &str, &len, block);

    if (APR_STATUS_IS_EAGAIN(rv)) {
        return APR_SUCCESS;
    }

Tracing into apr_bucket_read(), it ends up calling the function
socket_bucket_read(), which contains the code:

    *str = NULL;
    *len = APR_BUCKET_BUFF_SIZE;
    buf = apr_bucket_alloc(*len, a->list); /* XXX: check for failure? */

    rv = apr_socket_recv(p, buf, len);

    if (block == APR_NONBLOCK_READ) {
        apr_socket_timeout_set(p, timeout);
    }

    if (rv != APR_SUCCESS && rv != APR_EOF) {
        apr_bucket_free(buf);
        return rv;
    }

The apr_socket_recv() is what is doing the initial read of data from the
socket connection. This should block until the first data is received.

What is happening though is that it is returning -1 with errno set to
EAGAIN. Thus it frees the temporary bucket it created and returns
EAGAIN as the result.
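The same condition is easy to reproduce from Python, which may help when experimenting: a recv() on a non-blocking socket with no pending data fails with EAGAIN (EWOULDBLOCK on some platforms) instead of blocking. This is only an analogy for what apr_socket_recv() is reporting; it assumes a POSIX system where socket.socketpair() gives a Unix-domain pair, and it does not exercise APR itself. The helper `try_recv` is hypothetical.

```python
import socket


def try_recv(sock, bufsize=8192):
    """Attempt one read; return the bytes, or None if the read would block."""
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        # Python raises BlockingIOError for EAGAIN/EWOULDBLOCK.
        return None


a, b = socket.socketpair()
b.setblocking(False)           # a socket left in non-blocking mode
first = try_recv(b)            # nothing sent yet, so the read would block
a.sendall(b"GET / HTTP/1.0\r\n")
second = try_recv(b)           # data is now queued and returned immediately
a.close()
b.close()
```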

If you note the code in the core_input_filter() it has:

    if (APR_STATUS_IS_EAGAIN(rv)) {
        return APR_SUCCESS;
    }

Thus, when EAGAIN is encountered, it simply returns success and does
not do anything else.

Returning back up to _conn_read() in the mod_python source code, this is
where core_input_filter() was called, via ap_get_brigade():

    Py_BEGIN_ALLOW_THREADS;
    rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
    Py_END_ALLOW_THREADS;

    if (! APR_STATUS_IS_SUCCESS(rc)) {
        PyErr_SetObject(PyExc_IOError,
                        PyString_FromString("Connection read error"));
        return NULL;
    }

Since APR_SUCCESS was returned and assigned to rc, no problem is detected.

The code which follows then assumes that the first bucket in the bucket
brigade actually contains valid data, when in fact the first bucket is actually
crap as nothing was done to set up a valid bucket since EAGAIN was returned.
As a consequence it crashes.

Thus in summary, _conn_read() doesn't cater in any way for the possibility
that the initial socket read may have failed because of EAGAIN and thus
the bucket is bogus. The problem is, how is it meant to know this if the
value APR_SUCCESS is returned by ap_get_brigade()?

At this point, it seems some research is needed into how other Apache
connection handlers deal with the initial startup sequence and the
processing of initial data. What is in mod_python now does not appear
to be reliable in the face of an EAGAIN error occurring.

Graham





Re: Segfaults in ConnectionHandler (Possible Solution)

2006-01-30 Thread Graham Dumpleton
Graham Dumpleton wrote ..
 Returning back up to _conn_read() in mod_python source code, we have
 where core_input_filter() was called ap_get_brigade():
 
 Py_BEGIN_ALLOW_THREADS;
 rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
 Py_END_ALLOW_THREADS;
 
 if (! APR_STATUS_IS_SUCCESS(rc)) {
     PyErr_SetObject(PyExc_IOError,
                     PyString_FromString("Connection read error"));
     return NULL;
 }
 
 Since APR_SUCCESS was returned and assigned to rc, no problem is detected.
 
 The code which follows then assumes that the first bucket in the bucket
 brigade actually contains valid data, when in fact the first bucket is
 actually crap as nothing was done to set up a valid bucket since EAGAIN was returned.
 As a consequence it crashes.
 
 Thus in summary, _conn_read() doesn't cater in any way for the possibility
 that the initial socket read may have failed because of EAGAIN and thus
 the bucket is bogus. The problem is, how is it meant to know this if the
 value APR_SUCCESS is returned by ap_get_brigade()?

Extending the above code as:

Py_BEGIN_ALLOW_THREADS;
rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
Py_END_ALLOW_THREADS;

if (! APR_STATUS_IS_SUCCESS(rc)) {
    PyErr_SetObject(PyExc_IOError,
                    PyString_FromString("Connection read error"));
    return NULL;
}

/* Return empty string if no buckets. Can be caused by EAGAIN. */
if (APR_BRIGADE_EMPTY(bb)) {
    return PyString_FromString("");
}

seems to fix the problem. That is, use APR_BRIGADE_EMPTY(bb) to check
whether any new buckets were added, and return an empty string if not.

Can someone else seeing this issue try this fix and see if the tests then
work.

Graham


Re: Segfaults in ConnectionHandler (Possible Solution)

2006-01-30 Thread Graham Dumpleton
Graham Dumpleton wrote ..
 Extending the above code as:
 
 Py_BEGIN_ALLOW_THREADS;
 rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);
 Py_END_ALLOW_THREADS;
 
 if (! APR_STATUS_IS_SUCCESS(rc)) {
     PyErr_SetObject(PyExc_IOError,
                     PyString_FromString("Connection read error"));
     return NULL;
 }
 
 /* Return empty string if no buckets. Can be caused by EAGAIN. */
 if (APR_BRIGADE_EMPTY(bb)) {
     return PyString_FromString("");
 }
 
 seems to fix the problem. Ie., use call to APR_BRIGADE_EMPTY(bb) to check
 whether any new buckets added and returning empty string if not.

Okay, this may work, but propagating EAGAIN back up to Python as an empty
string can cause a tight loop, with calls going out of and back into
Python code until something is read or an error occurs.

To avoid the back and forth, another option may be:

while (APR_BRIGADE_EMPTY(bb)) {
    Py_BEGIN_ALLOW_THREADS;
    rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ,
                        bufsize);
    Py_END_ALLOW_THREADS;

    if (! APR_STATUS_IS_SUCCESS(rc)) {
        PyErr_SetObject(PyExc_IOError,
                        PyString_FromString("Connection read error"));
        return NULL;
    }
}
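Both variants still spin if the lowest-level read keeps reporting EAGAIN. The usual way to avoid burning CPU on a non-blocking descriptor is to wait for readiness with select()/poll() before retrying. Below is a Python sketch of that general pattern; it is not how APR or mod_python implement reads, and the function name `blocking_read`, its `timeout` parameter and the choice to raise on timeout are assumptions for illustration. It assumes a POSIX system for socket.socketpair().

```python
import select
import socket


def blocking_read(sock, bufsize=8192, timeout=30.0):
    """Read from a possibly non-blocking socket without busy-looping on EAGAIN.

    Returns b"" only on a genuine end-of-file, so callers can tell
    "peer closed the connection" apart from "no data yet".
    """
    while True:
        try:
            return sock.recv(bufsize)
        except BlockingIOError:
            # EAGAIN: sleep in select() until the socket is readable instead
            # of retrying immediately, which would spin the CPU.
            readable, _, _ = select.select([sock], [], [], timeout)
            if not readable:
                raise TimeoutError("no data within %.1f seconds" % timeout)


a, b = socket.socketpair()
b.setblocking(False)
timed_out = False
try:
    blocking_read(b, timeout=0.05)   # nothing sent yet: waits, then times out
except TimeoutError:
    timed_out = True
a.sendall(b"ping")
data = blocking_read(b)
a.close()
eof = blocking_read(b)               # peer closed: end-of-file
b.close()
```

Note that this only helps if readiness notification is accurate; if the OS reports the socket readable while recv() keeps failing with EAGAIN, as appears to be happening on Mac OS X, it would still loop.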

What doesn't make sense to me is that on my Mac OS X box, where this
problem only occurs when you have two listener ports, it tight loops with
the lowest-level read always returning EAGAIN, even when some input has
already been read from the connection. That is, it doesn't block at all.

Thus something really bad is happening on Mac OS X. Unless Apache
is setting some strange ioctl options on the socket to inadvertently
cause this, it looks to me like Mac OS X is broken in some way. I am
still on Mac OS X (10.3). I'll have to try it on my 10.4 box and see if it
makes any difference.

Graham


Re: Segfaults in ConnectionHandler

2006-01-30 Thread Jim Gallacher

Jim Gallacher wrote:

Graham Dumpleton wrote:


What I might speculate is that if the test in mod_python for the
connection handler is set up to run on a secondary listener port,
but with the primary still active, it may trigger the problem
on other systems like Linux. Jim, you might want to try this and see
if you can duplicate it on Linux.



I'll try it tonight.



Graham,

I am not able to reproduce the problem using the configuration and 
example code you give in MODPYTHON-102. (Linux Debian 2.6.12-1-k7 kernel).


Jim


Re: contribution to mod_python: Apache + SimpleXMLRPCServer (fwd)

2006-01-30 Thread Graham Dumpleton
An initial few comments from a first pass through.

def _write(self, request, response, content_type='text/xml'):
    request.send_http_header()
    request.content_type = content_type
    request.write(response)

This is technically wrong, although it doesn't matter on mod_python 3.0 or later.

The issue is that in mod_python 2.7, send_http_header() should only be
called after the content type is set. Here they do it before. It only works
because in 3.X send_http_header() is a NOP.

# men have been killed for less
temp = sys.stderr
sys.stderr = stderr_mod_python(self.request)

...

sys.stderr = temp

This is not safe in a multithread MPM.
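One way to avoid the global swap is to leave sys.stderr alone and hand the formatted traceback to a per-request logging function. In mod_python that would typically be the request's log_error() method; in the sketch below `log` is simply any callable taking a string, so the example runs outside Apache. The helper name `log_exception` is hypothetical.

```python
import sys
import traceback


def log_exception(log):
    """Send the current exception's traceback to `log`, one line per call.

    Unlike reassigning sys.stderr, this touches no process-wide state, so
    concurrent requests in a threaded MPM cannot capture each other's output.
    """
    etype, evalue, etb = sys.exc_info()
    for chunk in traceback.format_exception(etype, evalue, etb):
        # Some chunks span several lines (file/line header plus source line).
        for line in chunk.rstrip("\n").splitlines():
            log(line)


# Usage: collect the lines in a list instead of writing to the error log.
captured = []
try:
    1 / 0
except ZeroDivisionError:
    log_exception(captured.append)
```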

except Exception, e:
    # report exception back to server
    response = xmlrpclib.dumps(
        xmlrpclib.Fault(1, "%s:%s" % (sys.exc_type, sys.exc_value))
        )
    # and also log it, duh
    etype, evalue, etb = sys.exc_info()

    stack = traceback.format_exception(etype, evalue, etb)
    for l in stack:
        sys.stderr.write(l)

First, it uses the fudged sys.stderr. Second, it exacerbates a problem in
XML-RPC, which is that there is no concept of namespaces for fault codes.
Because it uses an arbitrary fault code of 1 for both internal exceptions
and unexpected exceptions in user code, you can't easily distinguish a
valid fault response with code 1 generated by user code from an
unexpected exception.

It may be more appropriate to generate a 500 HTTP error response in
this circumstance, given that it really constitutes an internal server error
rather than a valid XML-RPC fault response generated by the
user. This is an issue for debate though. It depends on whether you want
to be conformant with how SimpleXMLRPCServer works, which is where
this questionable code came from in the first place.
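A sketch of that alternative: exposed functions raise an explicit Fault for application errors, and any other exception becomes an HTTP 500 rather than an ambiguous fault code 1. It uses Python 3's xmlrpc.client (the successor of xmlrpclib); `dispatch` and its (status, body) return convention are hypothetical, not part of SimpleXMLRPCServer or mod_python.

```python
import xmlrpc.client
from http import HTTPStatus


def dispatch(func, params):
    """Call func(*params) and build an (HTTP status, XML-RPC body) pair.

    - normal return          -> 200 with a methodResponse carrying the value
    - xmlrpc Fault raised    -> 200 with the fault chosen by user code
    - any other exception    -> 500 and no XML-RPC body, so a client cannot
                                mistake an internal crash for a real fault
    """
    try:
        result = func(*params)
    except xmlrpc.client.Fault as fault:
        return HTTPStatus.OK, xmlrpc.client.dumps(fault, methodresponse=True)
    except Exception:
        # Log the traceback here (per request, not via sys.stderr).
        return HTTPStatus.INTERNAL_SERVER_ERROR, ""
    return HTTPStatus.OK, xmlrpc.client.dumps((result,), methodresponse=True)


def add(a, b):
    return a + b


def refuse():
    # An application-level error the client is meant to see.
    raise xmlrpc.client.Fault(42, "not allowed")


status, body = dispatch(add, (2, 3))
```

Whether a 500 or a reserved fault code is preferable is the open question raised above; the sketch just shows the 500 option.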

Other issues are that it doesn't check that the incoming content type is
actually 'text/xml', as the XML-RPC specification requires. It doesn't use
the incoming content length when reading POST data, which can be a problem
in some cases. It also doesn't set the outgoing content length, which the
XML-RPC specification also requires. It could also perhaps be a bit more
knowledgeable about mod_python and pass through the apache.SERVER_RETURN
exception so as to allow exposed methods to still raise it if need be.
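The input and output checks listed above can be sketched as follows. To keep it runnable outside Apache, `FakeRequest` is a hypothetical stand-in whose `headers_in`, `headers_out`, `content_type`, `read()` and `write()` loosely mirror mod_python's request object; `handle_post` and the choice of 400/411/415 status codes are illustrative assumptions. The content type and length are set before any body is written, which also sidesteps the header-ordering issue noted earlier.

```python
import xmlrpc.client
from http import HTTPStatus


class FakeRequest:
    """Hypothetical stand-in for a mod_python request (illustration only)."""

    def __init__(self, body, content_type="text/xml"):
        self.headers_in = {"Content-Type": content_type,
                           "Content-Length": str(len(body))}
        self.headers_out = {}
        self.content_type = None
        self.output = b""
        self._body = body

    def read(self, n):
        chunk, self._body = self._body[:n], self._body[n:]
        return chunk

    def write(self, data):
        self.output += data


def handle_post(request, call):
    """Validate an XML-RPC POST, run call(method, params), and send the reply."""
    # XML-RPC requires text/xml; tolerate a "; charset=..." suffix.
    mime = request.headers_in.get("Content-Type", "").split(";")[0].strip()
    if mime != "text/xml":
        return HTTPStatus.UNSUPPORTED_MEDIA_TYPE
    length = request.headers_in.get("Content-Length")
    if length is None:
        return HTTPStatus.LENGTH_REQUIRED
    if not length.isdigit():
        return HTTPStatus.BAD_REQUEST
    # Read exactly Content-Length bytes rather than reading "until EOF".
    params, method = xmlrpc.client.loads(request.read(int(length)))
    reply = xmlrpc.client.dumps((call(method, params),),
                                methodresponse=True).encode("utf-8")
    # Set the content type and length before any body is written.
    request.content_type = "text/xml"
    request.headers_out["Content-Length"] = str(len(reply))
    request.write(reply)
    return HTTPStatus.OK


# Usage: a call to a hypothetical "add" method that sums its arguments.
body = xmlrpc.client.dumps((2, 3), "add").encode("utf-8")
req = FakeRequest(body)
status = handle_post(req, lambda method, params: sum(params))
```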

This isn't the only implementation of XML-RPC support integrated with
mod_python. I have an alternate take on it in Vampire which isn't bound
to the SimpleXMLRPCServer base class. See:

  http://svn.dscpl.com.au/vampire/trunk/software/vampire/xmlrpc.py

I can't find the others right now, but I have posted links to them before
on the main mod_python list.

Overall, I'm not sure at this point that it is worthwhile putting XML-RPC
support in mod_python. If it is done, I would prefer to see it be
done as part of a larger effort to provide a range of handler components
which all work consistently together, rather than ad hoc bits and pieces
that cannot be glued together easily.

Grisha wrote ..
 
 If someone here has spare Brain/CPU cycles, could you look at the attached
 code and provide feedback?
 
 Grisha
 
 -- Forwarded message --
 Date: Mon, 30 Jan 2006 18:04:42 -0800
 From: Matt Chisholm [EMAIL PROTECTED]
 Subject: Re: contribution to mod_python: Apache + SimpleXMLRPCServer
 
 On Jan 30 2006, 11:42, Matt Chisholm wrote:
 We've written a few classes to use the SimpleXMLRPCServer module in
 Python with mod_python instead of the Python CGI module.  We've been
 using it internally for a while and we'd like to contribute it back to
 the mod_python project; we don't really have the time to create a
 separate project for it, but it seems like something that would be
 useful to many people.
 
 We agree to assign copyright of this code to the mod_python project
 and to license it under the mod_python license.  Our employer,
 BitTorrent Inc., also agrees.
 
 I've attached a copy of the code.
 
 Please let me know if this is not the right channel to send
 contributions; also, I'm not on this list so please respond to me
 individually.
 
 Matt Chisholm