Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)

2004-08-24 Thread Ames Andreas (MPA/DF)
Hello,

Leonardo Rochael Almeida wrote:

 Yes, if you file a feature request on the bug collector, which can
 be found at:

I've put the most recent version of the patch to
http://collector.zope.org/Zope/1472.


cheers,

andreas

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)

2004-08-23 Thread Ames Andreas (MPA/DF)
Hello,

Dieter Maurer wrote:

 Maybe, I have a much simpler solution:

   Something in ZServer makes all file fields seekable by
   delivering them through some temporary (either a StringIO
   or a temporary file).

   Maybe, you could do the same for your requests?

Yes, I could.  But trying to avoid it was the very reason I tried to
patch HTTPRequest for.

What you propose is what the HTTPServer does for instance.  I have at
least two objections:

1) For big requests (big means some hardcoded threshold) HTTPServer
   buffers the request on disk.  Then ZPublisher creates a
   FieldStorage instance which will write the request once again to
   disk.  And that also means the request is read from disk several
   times.  That's a bit too much blocking disk I/O in a production
   webserver for my liking.  And that's not counting retries; each
   retry will add another pair of blocking disk I/O in the current
   implementation.  What I'm trying to do is:

   - avoid unneeded disk I/O in retries

   - make it possible that self.stdin in HTTPRequest can be
 non-seekable such that it doesn't have to be buffered on disk
 before FieldStorage is created (and does exactly that buffering).

2) I'm convinced that the one and only ZServer thread should do
   without any blocking (I/O) call (besides select() that is) and I
   want AJPServer also be an experiment to see if this improves
   performance.


cheers,

andreas

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)

2004-08-20 Thread Ames Andreas (MPA/DF)
Hi,

Andreas Ames wrote:

 next trial.  I hope I've got it straight now.

Not quite so (although this may only be nitpicking ;-).

fs_env = environ

should have read

fs_env = environ.copy()

obviously.  See attachment.


cheers,

andreas



--- lib.orig/python/ZPublisher/HTTPRequest.py	2004-08-18 16:37:18.0 +0200
+++ lib/python/ZPublisher/HTTPRequest.py	2004-08-20 13:07:25.0 +0200
@@ -60,6 +60,37 @@
 class NestedLoopExit( Exception ):
 pass
 
+FILELIST_ENV_ENTRY = 'request.filestorage.filelist'
+
+class ZFieldStorage(cgi.FieldStorage):
+\
+Tame the FieldStorage.
+
+This class overrides the stock implementation to add all file
+objects, generated while parsing a request, to a list.  This list
+is given by the caller.
+
+Unfortunately cgi.FieldStorage is not generic enough, to adapt the
+list of constructor parameters, by just deriving from it.  Thus
+the list is specified within the environ dict, given to the
+constructor.  This is ugly and could be fixed by a simple patch
+against cgi.FieldStorage.
+
+
+def __init__(self, fp=None, headers=None, outerboundary=,
+ environ=os.environ, keep_blank_values=0, strict_parsing=0):
+self._filelist = environ.get(FILELIST_ENV_ENTRY, [])
+cgi.FieldStorage.__init__(self, fp, headers, outerboundary,
+  environ, keep_blank_values,
+  strict_parsing)
+
+def make_file(self, binary=None):
+import tempfile
+res = tempfile.TemporaryFile(w+b)
+self._filelist.append(res)
+return res
+
+
 class HTTPRequest(BaseRequest):
 \
 Model HTTP request data.
@@ -125,10 +156,15 @@
 
 def retry(self):
 self.retry_count=self.retry_count+1
-self.stdin.seek(0)
+if not self._fs:# not really needed
+raise ValueError, Retrying on partially initialized request is not supported.  Call processInputs().
+for f in self._fs_filelist:
+f.seek(0)
 r=self.__class__(stdin=self.stdin,
  environ=self._orig_env,
- response=self.response.retry()
+ response=self.response.retry(),
+ fs = self._fs,
+ flist = self._fs_filelist
  )
 r.retry_count=self.retry_count
 return r
@@ -138,6 +174,8 @@
 # removing tempfiles.
 self.stdin = None
 self._file = None
+self._fs = None
+self._fs_filelist = None
 self.form.clear()
 # we want to clear the lazy dict here because BaseRequests don't have
 # one.  Without this, there's the possibility of memory leaking
@@ -237,7 +275,7 @@
 
 return self._client_addr
 
-def __init__(self, stdin, environ, response, clean=0):
+def __init__(self, stdin, environ, response, clean=0, fs=None, flist=None):
 self._orig_env=environ
 # Avoid the overhead of scrubbing the environment in the
 # case of request cloning for traversal purposes. If the
@@ -261,7 +299,11 @@
 self.steps=[]
 self._steps=[]
 self._lazies={}
-
+self._fs = fs
+if flist is None:
+self._fs_filelist = []
+else:
+self._fs_filelist = flist
 
 if environ.has_key('REMOTE_ADDR'):
 self._client_addr = environ['REMOTE_ADDR']
@@ -382,23 +424,26 @@
 taintedform=self.taintedform
 
 meth=None
-fs=FieldStorage(fp=fp,environ=environ,keep_blank_values=1)
-if not hasattr(fs,'list') or fs.list is None:
+if not self._fs:
+fs_env = environ.copy()
+fs_env[FILELIST_ENV_ENTRY] = self._fs_filelist
+self._fs=ZFieldStorage(fp=fp,environ=fs_env,keep_blank_values=1)
+if not hasattr(self._fs,'list') or self._fs.list is None:
 # Hm, maybe it's an XML-RPC
-if (fs.headers.has_key('content-type') and
-fs.headers['content-type'] == 'text/xml' and
+if (self._fs.headers.has_key('content-type') and
+self._fs.headers['content-type'] == 'text/xml' and
 method == 'POST'):
 # Ye haaa, XML-RPC!
 global xmlrpc
 if xmlrpc is None: import xmlrpc
-meth, self.args = xmlrpc.parse_input(fs.value)
+meth, self.args = xmlrpc.parse_input(self._fs.value)
 response=xmlrpc.response(response)
 other['RESPONSE']=self.response=response
 self.maybe_webdav_client = 0
 else:
-self._file=fs.file
+self._file=self._fs.file
 else:
-fslist=fs.list
+fslist=self._fs.list
 tuple_items={}
 lt=type([])
 CGI_name=isCGI_NAME

Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)

2004-08-20 Thread Dieter Maurer
Ames Andreas (MPA/DF) wrote at 2004-8-18 17:22 +0200:
I had a thinko in my previous patch
(http://mail.zope.org/pipermail/zope-dev/2004-August/023630.html)
which is corrected in the attached version of the patch.  Sorry for
any inconvenience, I might have caused.

I do not think that your patch can work:

  The FieldStorage must be rebuilt, because it can contain
  file objects. These file objects must be reset as they may
  have been (partially) read in the previous request.
  This prevent reusing the previous FieldStorage.

-- 
Dieter
___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)

2004-08-20 Thread Ames Andreas (MPA/DF)
Hi Dieter, *,

Dieter Maurer wrote:

   The FieldStorage must be rebuilt, because it can contain
   file objects. These file objects must be reset as they may
   have been (partially) read in the previous request.
   This prevent reusing the previous FieldStorage.

As you may have seen in my first patch proposal I actually tried to
address this problem, although halfbaken and buggy.  So here's the
next trial.  I hope I've got it straight now.

Why didn't I do something to reset the files, FiledStorage creates?

Because I thought it wouldn't be needed.  If you look at
cgi.FieldStorage you will see, that accessing FieldStorage's 'value'
field, will always reset the file before reading it and HTTPRequest's
'BODY' field does the same.

Cause of your hint I looked again and realised that I was wrong.
HTTPRquest's 'BODYFILE' field fails to reset the file before accessing
it (as in OFS/Image.py the File class does).  That could be easily
fixed.  What's worse is that FiledStorage can possibly create
subinstances of itsellf which can possibly contain other fileobjects.
This is the case for multipart-type requests.


How to solve this?

I can see three alternative aproaches:

1) In HTTPRequest.retry loop through self._fs.list (if it exists) and
   call seek(0) on every fileobject, you can find there.  This can
   only work if the maximum depth of the FiledStorage tree is
   predictable, which in turn depends on the possible input.  I don't
   know enough about the possible HTTP requests to determine such a
   limit or to even 'prove' its existance.

2) In HTTPRequest.retry recursively walk through the FieldStorage
   trees in self._fs.list (if it exists).  I wouldn't do that in
   Python without being able to predict the (worst case) number of
   function calls needed to do this recursion.  And without refreshing
   my knowledge of compiler techniques I wouldn't be able to roll the
   recursion out to a loop.

3) This is a somewhat drastic (and possibly conceived as hackish?)
   approach which OTOH is independent of the input and should perform
   quite well.  I overwrite cgi.FieldStorage, esp. its make_file
   method, such that it adds all fileobjects, that it creates, to a
   (possibly caller provided) list.  Thus the caller
   (i.e. HTTPRequest) has a list of all those fileobjects that need to
   be reset if the FieldStorage should be reused (as in ZPublisher's
   retries).

For the patch attached I chose the third approach.  Please let me
know, what you think about it.  If it is assessed to be ok I will
update the collctor item, I opened yesterday.


cheers,

andreas



--- lib.orig/python/ZPublisher/HTTPRequest.py	2004-08-18 16:37:18.0 +0200
+++ lib/python/ZPublisher/HTTPRequest.py	2004-08-19 17:42:31.0 +0200
@@ -60,6 +60,37 @@
 class NestedLoopExit( Exception ):
 pass
 
+FILELIST_ENV_ENTRY = 'request.filestorage.filelist'
+
+class ZFieldStorage(cgi.FieldStorage):
+\
+Tame the FieldStorage.
+
+This class overrides the stock implementation to add all file
+objects, generated while parsing a request, to a list.  This list
+is given by the caller.
+
+Unfortunately cgi.FieldStorage is not generic enough, to adapt the
+list of constructor parameters, by just deriving from it.  Thus
+the list is specified within the environ dict, given to the
+constructor.  This is ugly and could be fixed by a simple patch
+against cgi.FieldStorage.
+
+
+def __init__(self, fp=None, headers=None, outerboundary=,
+ environ=os.environ, keep_blank_values=0, strict_parsing=0):
+self._filelist = environ.get(FILELIST_ENV_ENTRY, [])
+cgi.FieldStorage.__init__(self, fp, headers, outerboundary,
+  environ, keep_blank_values,
+  strict_parsing)
+
+def make_file(self, binary=None):
+import tempfile
+res = tempfile.TemporaryFile(w+b)
+self._filelist.append(res)
+return res
+
+
 class HTTPRequest(BaseRequest):
 \
 Model HTTP request data.
@@ -125,10 +156,15 @@
 
 def retry(self):
 self.retry_count=self.retry_count+1
-self.stdin.seek(0)
+if not self._fs:
+raise ValueError, Retrying on partially initialized request is not supported.  Call processInputs().
+for f in self._fs_filelist:
+f.seek(0)
 r=self.__class__(stdin=self.stdin,
  environ=self._orig_env,
- response=self.response.retry()
+ response=self.response.retry(),
+ fs = self._fs,
+ flist = self._fs_filelist
  )
 r.retry_count=self.retry_count
 return r
@@ -138,6 +174,8 @@
 # removing tempfiles.
 self.stdin = None
 self._file = None
+self._fs = None
+self._fs_filelist = None
 self.form.clear()
 # we want 

Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)

2004-08-20 Thread Dieter Maurer
Ames Andreas (MPA/DF) wrote at 2004-8-19 18:04 +0200:
 ...
Cause of your hint I looked again and realised that I was wrong.
HTTPRquest's 'BODYFILE' field fails to reset the file before accessing
it (as in OFS/Image.py the File class does).  That could be easily
fixed.  What's worse is that FiledStorage can possibly create
subinstances of itsellf which can possibly contain other fileobjects.
This is the case for multipart-type requests.

Maybe, I have a much simpler solution:

  Something in ZServer makes all file fields seekable by
  delivering them through some temporary (either a StringIO
  or a temporary file).

  Maybe, you could do the same for your requests?

-- 
Dieter
___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)

2004-08-18 Thread Leonardo Rochael Almeida


Em Qua, 2004-08-18 às 12:22, Ames Andreas (MPA/DF) escreveu:
 [...]
 Questions:
 
 - Is this the right forum/place to send patches to?

This is the right forum to discuss the patches, and to send them for
evaluation if it's a small one, like yours. The best place to actually
send patches to is as attachment in a bug report in the Zope bug
collector.

 - Is there any chance that this could be applied to Zope's mainline?
   (If not I will proceed with a local disk buffering scheme in the
   long term.)

Yes, if you file a feature request on the bug collector, which can be
found at:

http://collector.zope.org/Zope

Cheers, Leo
___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )