Hi,

(moving it to zope3-dev for discussion)

On Saturday, 21.04.2007, at 13:50 +0000, Jim Fulton wrote:
> On Apr 21, 2007, at 9:24 AM, Christian Theune wrote:
> 
> > *sigh*
> >
> > This is a can of worms. It looks like email.FeedParser.FeedParser is
> > probably what we want to use.
> 
> Why?

Ah. I was working on the integration of Blobs in Zope 3 and noticed that
we should try harder at some points to handle large data in a more
time-efficient way. Today, as soon as a file is available as a
NamedTemporaryFile, a Blob can consume it with a rename, which works in
O(1).
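
To illustrate why the rename is cheap, here is a minimal sketch. It is not
the actual Blob code: consume_file, demo and the "blob-0001" target name are
hypothetical, and the sketch assumes that the temporary file and the target
live on the same filesystem, which is what makes the rename a metadata-only
operation.

```python
import os
import tempfile


def consume_file(tmp_path, target_path):
    """Move an uploaded temp file into place without copying its bytes.

    Hypothetical helper. os.rename only touches directory entries when both
    paths are on the same filesystem, so its cost does not depend on the
    file size. Across filesystems it raises OSError instead.
    """
    os.rename(tmp_path, target_path)
    return target_path


def demo():
    # Create the temp file inside the (assumed) blob directory so that the
    # later rename stays on one filesystem.
    blob_dir = tempfile.mkdtemp()
    upload = tempfile.NamedTemporaryFile(dir=blob_dir, delete=False)
    upload.write(b"x" * 1024)
    upload.close()

    target = os.path.join(blob_dir, "blob-0001")
    consume_file(upload.name, target)
    return upload.name, target
```

If the temporary directory is on a different filesystem than the blob
storage, a copy is unavoidable, so where the temp file gets created matters.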

The only remaining issue is during the upload of large data via HTTP.
What happens there is that the upload is first streamed to a temporary
file in the server and then handed over to the publisher. The publisher
then parses the mime data and creates new files.

This takes up the application thread for a task that doesn't require
application resources, so it could be done before handing the request
to the publisher. For files of about 50 MB this already takes 5-10
seconds, possibly longer, depending on the machine.

Typically, the upload speed is relatively slow compared to the CPU time
required to unpack the mime data. IMHO we can make good use of that time
by unpacking it as early as possible.


To avoid blocking, the server could feed the body into something like
the feedparser while it is still receiving the data from the client. The
publisher can then build on what has already been parsed.
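
A minimal sketch of that idea, using the feed parser from the standard
library's email package. The function name parse_incrementally and the way
the chunks arrive are assumptions for illustration. The sketch feeds the
request's Content-Type as a header line first, because the parser expects a
complete message rather than a bare body.

```python
from email.feedparser import BytesFeedParser


def parse_incrementally(chunks, content_type):
    """Feed an upload body to the parser as the chunks arrive.

    `chunks` is any iterable of bytes (hypothetical: whatever the server
    reads from the socket). `content_type` is the value of the request's
    Content-Type header, including the multipart boundary.
    """
    parser = BytesFeedParser()
    # The parser needs the header block to learn the boundary.
    parser.feed(b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n")
    for chunk in chunks:
        # Chunks may end in the middle of a line; the parser buffers that.
        parser.feed(chunk)
    # close() returns the root message object with one sub-message per part.
    return parser.close()
```

Note that the resulting message object keeps every payload in memory, so
this alone does not solve the problem for large files. It only shows that
the parsing work can be interleaved with receiving the data.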

Dieter Maurer earlier pointed out an alternative (that would actually
work with the current cgi module): just introduce a new thread that
pre-processes the request before handing it into the publisher.

That way we can free up the application threads for smaller requests to
go through in between. However, the user would still see the lag after
uploading.
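
The pre-processing thread could look roughly like the following sketch. The
queue layout, the name start_preprocessor and the parse callable are
assumptions made for illustration and are not taken from ZServer or Twisted.

```python
import queue
import threading


def start_preprocessor(raw_requests, parsed_requests, parse):
    """Run `parse` on each raw request in a dedicated thread.

    Hypothetical sketch: `raw_requests` receives fully uploaded requests
    from the server, `parsed_requests` is what the application threads
    consume. A None item is used as a shutdown sentinel and is passed on.
    """
    def run():
        while True:
            raw = raw_requests.get()
            if raw is None:
                parsed_requests.put(None)
                break
            # The expensive MIME parsing happens here, outside of the
            # application threads.
            parsed_requests.put(parse(raw))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

Small requests could bypass this thread entirely. The parsing time itself
does not go away, which is why the user would still see the lag.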

> > And the Python guys have been talking
> > about getting rid of cgi.FieldStorage in its current implementation
> > since 2005 but nothing has happened. :(
> >
> > Some issues that would be nice as a preparation:
> >
> > - make a cleaner alternative interface as a replacement for  
> > FieldStorage
> 
> Could you describe the problems you perceive with FieldStorage?  I'm  
> not necessarily opposed to a change, but you need to present the  
> technical reasons.

I'm trying hard, but it takes a lot of time to understand what's
happening there. The public interface is badly underdocumented and a lot
of implicit behaviour is happening all over the place. I find it
unmaintainable - and as I'd like to adjust it to my needs, I feel like
rewriting would be better.

> > - use a parsed email.Message as input for the storage so that the  
> > publisher can blindly use the feedparser to push the uploaded body to.
> 
> Ditto.

I was trying to find something that can parse mime data already and the
Python docs refer to email.Message and discourage the use of the
deprecated packages.

> > I think writing up a spec would be a good thing, maybe we can do
> > something about this at a next sprint. I'm pretty sure this won't land
> > in Zope 3.4
> 
> Some notes:
> 
> - I'm 98% sure that when using ZServer or Twisted, the CGI module is  
> invoked by the publisher and, therefore, in the application thread,  
> not the reactor thread.  Blocking the select thread should not be an  
> issue.

That is right. However, blocking application threads is the issue I'm
trying to hunt down.

> - AFAIK, recent changes to FieldStorage use a max size on readline  
> calls that should prevent large memory consumption.  In the case of a  
> file upload, data are simply copied between temporary files.

True. I want to avoid the copy because of time issues, not memory.

> A small optimization would be to arrange to avoid the copy if the input  
> stream happens to be a file.

I have already done that. However, currently a copy happens when parsing
and we can't avoid that. But I want to do it before the request enters
the application thread.
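
For reference, the "avoid the copy if the input stream is a file" check can
be sketched as below. This is not the change referred to above, only a
hedged illustration: as_file_path and demo are hypothetical names, and the
test for a file-backed stream (a string `name` attribute pointing at an
existing file) is an assumption.

```python
import io
import os
import shutil
import tempfile


def as_file_path(stream):
    """Return a path holding the stream's data, copying only if needed."""
    name = getattr(stream, "name", None)
    if isinstance(name, str) and os.path.isfile(name):
        # Already backed by a file on disk: reuse it, no copy.
        return name
    # Otherwise spool the data into a new temp file in 64 KB blocks.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(stream, tmp, 64 * 1024)
        return tmp.name


def demo():
    # An in-memory stream has no file behind it, so it gets copied.
    path = as_file_path(io.BytesIO(b"abc"))
    with open(path, "rb") as f:
        return f.read()
```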

> The reason efforts to replace or refactor the cgi module over the  
> years have languished is that it is big and complicated and works  
> pretty well for the most part. 

Hmm. I assume you're pointing out it's hairy but it works so nobody
really got down to it. I see that when reading through various old
discussions on the lists too.

> If we are *really* going to do any  
> tinkering, it should be in cooperation with the wider Python  
> community, through the Web SIG.  Good luck. :)

I agree with that. I need to do some more research on how other
frameworks handle this, and I will probably raise the question on the
Web SIG list before starting any work (again). Thanks for the reminder.

Christian

-- 
gocept gmbh & co. kg - forsterstraße 29 - 06112 halle/saale - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development


_______________________________________________
Zope3-dev mailing list
Zope3-dev@zope.org