Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)
Hello, Leonardo Rochael Almeida wrote: > Yes, if you file a feature request on the bug collector, which can > be found at: I've put the most recent version of the patch to http://collector.zope.org/Zope/1472. cheers, andreas ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)
Hello, Dieter Maurer wrote: > Maybe, I have a much simpler solution: > > Something in ZServer makes all file fields seekable by > delivering them through some temporary (either a StringIO > or a temporary file). > > Maybe, you could do the same for your requests? Yes, I could. But trying to avoid it was the very reason I tried to patch HTTPRequest for. What you propose is what the HTTPServer does for instance. I have at least two objections: 1) For big requests (big means some hardcoded threshold) HTTPServer buffers the request on disk. Then ZPublisher creates a FieldStorage instance which will write the request once again to disk. And that also means the request is read from disk several times. That's a bit too much blocking disk I/O in a production webserver for my liking. And that's not counting retries; each retry will add another pair of blocking disk I/O in the current implementation. What I'm trying to do is: - avoid unneeded disk I/O in retries - make it possible that self.stdin in HTTPRequest can be non-seekable such that it doesn't have to be buffered on disk before FieldStorage is created (and does exactly that buffering). 2) I'm convinced that the one and only ZServer thread should do without any blocking (I/O) call (besides select() that is) and I want AJPServer also be an experiment to see if this improves performance. cheers, andreas ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)
Ames Andreas (MPA/DF) wrote at 2004-8-19 18:04 +0200: > ... >Cause of your hint I looked again and realised that I was wrong. >HTTPRquest's 'BODYFILE' field fails to reset the file before accessing >it (as in OFS/Image.py the File class does). That could be easily >fixed. What's worse is that FiledStorage can possibly create >subinstances of itsellf which can possibly contain other fileobjects. >This is the case for multipart-type requests. Maybe, I have a much simpler solution: Something in ZServer makes all file fields seekable by delivering them through some temporary (either a StringIO or a temporary file). Maybe, you could do the same for your requests? -- Dieter ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)
Hi Dieter, *, Dieter Maurer wrote: > The "FieldStorage" must be rebuilt, because it can contain > file objects. These file objects must be reset as they may > have been (partially) read in the previous request. > This prevent reusing the previous "FieldStorage". As you may have seen in my first patch proposal I actually tried to address this problem, although halfbaken and buggy. So here's the next trial. I hope I've got it straight now. Why didn't I do something to reset the files, FiledStorage creates? Because I thought it wouldn't be needed. If you look at cgi.FieldStorage you will see, that accessing FieldStorage's 'value' field, will always reset the file before reading it and HTTPRequest's 'BODY' field does the same. Cause of your hint I looked again and realised that I was wrong. HTTPRquest's 'BODYFILE' field fails to reset the file before accessing it (as in OFS/Image.py the File class does). That could be easily fixed. What's worse is that FiledStorage can possibly create subinstances of itsellf which can possibly contain other fileobjects. This is the case for multipart-type requests. How to solve this? I can see three alternative aproaches: 1) In HTTPRequest.retry loop through self._fs.list (if it exists) and call seek(0) on every fileobject, you can find there. This can only work if the maximum depth of the FiledStorage tree is predictable, which in turn depends on the possible input. I don't know enough about the possible HTTP requests to determine such a limit or to even 'prove' its existance. 2) In HTTPRequest.retry recursively walk through the FieldStorage trees in self._fs.list (if it exists). I wouldn't do that in Python without being able to predict the (worst case) number of function calls needed to do this recursion. And without refreshing my knowledge of compiler techniques I wouldn't be able to roll the recursion out to a loop. 3) This is a somewhat drastic (and possibly conceived as hackish?) approach which OTOH is independent of the input and should perform quite well. I overwrite cgi.FieldStorage, esp. its make_file method, such that it adds all fileobjects, that it creates, to a (possibly caller provided) list. Thus the caller (i.e. HTTPRequest) has a list of all those fileobjects that need to be reset if the FieldStorage should be reused (as in ZPublisher's retries). For the patch attached I chose the third approach. Please let me know, what you think about it. If it is assessed to be ok I will update the collctor item, I opened yesterday. cheers, andreas --- lib.orig/python/ZPublisher/HTTPRequest.py 2004-08-18 16:37:18.0 +0200 +++ lib/python/ZPublisher/HTTPRequest.py 2004-08-19 17:42:31.0 +0200 @@ -60,6 +60,37 @@ class NestedLoopExit( Exception ): pass +FILELIST_ENV_ENTRY = 'request.filestorage.filelist' + +class ZFieldStorage(cgi.FieldStorage): +"""\ +Tame the FieldStorage. + +This class overrides the stock implementation to add all file +objects, generated while parsing a request, to a list. This list +is given by the caller. + +Unfortunately cgi.FieldStorage is not generic enough, to adapt the +list of constructor parameters, by just deriving from it. Thus +the list is specified within the environ dict, given to the +constructor. This is ugly and could be fixed by a simple patch +against cgi.FieldStorage. +""" + +def __init__(self, fp=None, headers=None, outerboundary="", + environ=os.environ, keep_blank_values=0, strict_parsing=0): +self._filelist = environ.get(FILELIST_ENV_ENTRY, []) +cgi.FieldStorage.__init__(self, fp, headers, outerboundary, + environ, keep_blank_values, + strict_parsing) + +def make_file(self, binary=None): +import tempfile +res = tempfile.TemporaryFile("w+b") +self._filelist.append(res) +return res + + class HTTPRequest(BaseRequest): """\ Model HTTP request data. @@ -125,10 +156,15 @@ def retry(self): self.retry_count=self.retry_count+1 -self.stdin.seek(0) +if not self._fs: +raise ValueError, "Retrying on partially initialized request is not supported. Call processInputs()." +for f in self._fs_filelist: +f.seek(0) r=self.__class__(stdin=self.stdin, environ=self._orig_env, - response=self.response.retry() + response=self.response.retry(), + fs = self._fs, + flist = self._fs_filelist ) r.retry_count=self.retry_count return r @@ -138,6 +174,8 @@ # removing tempfiles. self.stdin = None self._file = None +self._fs = None +self._fs_filelist = None self.form.clear
Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)
Ames Andreas (MPA/DF) wrote at 2004-8-18 17:22 +0200: >I had a thinko in my previous patch >(http://mail.zope.org/pipermail/zope-dev/2004-August/023630.html) >which is corrected in the attached version of the patch. Sorry for >any inconvenience, I might have caused. I do not think that your patch can work: The "FieldStorage" must be rebuilt, because it can contain file objects. These file objects must be reset as they may have been (partially) read in the previous request. This prevent reusing the previous "FieldStorage". -- Dieter ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)
Hi, Andreas Ames wrote: > next trial. I hope I've got it straight now. Not quite so (although this may only be nitpicking ;-). fs_env = environ should have read fs_env = environ.copy() obviously. See attachment. cheers, andreas --- lib.orig/python/ZPublisher/HTTPRequest.py 2004-08-18 16:37:18.0 +0200 +++ lib/python/ZPublisher/HTTPRequest.py 2004-08-20 13:07:25.0 +0200 @@ -60,6 +60,37 @@ class NestedLoopExit( Exception ): pass +FILELIST_ENV_ENTRY = 'request.filestorage.filelist' + +class ZFieldStorage(cgi.FieldStorage): +"""\ +Tame the FieldStorage. + +This class overrides the stock implementation to add all file +objects, generated while parsing a request, to a list. This list +is given by the caller. + +Unfortunately cgi.FieldStorage is not generic enough, to adapt the +list of constructor parameters, by just deriving from it. Thus +the list is specified within the environ dict, given to the +constructor. This is ugly and could be fixed by a simple patch +against cgi.FieldStorage. +""" + +def __init__(self, fp=None, headers=None, outerboundary="", + environ=os.environ, keep_blank_values=0, strict_parsing=0): +self._filelist = environ.get(FILELIST_ENV_ENTRY, []) +cgi.FieldStorage.__init__(self, fp, headers, outerboundary, + environ, keep_blank_values, + strict_parsing) + +def make_file(self, binary=None): +import tempfile +res = tempfile.TemporaryFile("w+b") +self._filelist.append(res) +return res + + class HTTPRequest(BaseRequest): """\ Model HTTP request data. @@ -125,10 +156,15 @@ def retry(self): self.retry_count=self.retry_count+1 -self.stdin.seek(0) +if not self._fs:# not really needed +raise ValueError, "Retrying on partially initialized request is not supported. Call processInputs()." +for f in self._fs_filelist: +f.seek(0) r=self.__class__(stdin=self.stdin, environ=self._orig_env, - response=self.response.retry() + response=self.response.retry(), + fs = self._fs, + flist = self._fs_filelist ) r.retry_count=self.retry_count return r @@ -138,6 +174,8 @@ # removing tempfiles. self.stdin = None self._file = None +self._fs = None +self._fs_filelist = None self.form.clear() # we want to clear the lazy dict here because BaseRequests don't have # one. Without this, there's the possibility of memory leaking @@ -237,7 +275,7 @@ """ return self._client_addr -def __init__(self, stdin, environ, response, clean=0): +def __init__(self, stdin, environ, response, clean=0, fs=None, flist=None): self._orig_env=environ # Avoid the overhead of scrubbing the environment in the # case of request cloning for traversal purposes. If the @@ -261,7 +299,11 @@ self.steps=[] self._steps=[] self._lazies={} - +self._fs = fs +if flist is None: +self._fs_filelist = [] +else: +self._fs_filelist = flist if environ.has_key('REMOTE_ADDR'): self._client_addr = environ['REMOTE_ADDR'] @@ -382,23 +424,26 @@ taintedform=self.taintedform meth=None -fs=FieldStorage(fp=fp,environ=environ,keep_blank_values=1) -if not hasattr(fs,'list') or fs.list is None: +if not self._fs: +fs_env = environ.copy() +fs_env[FILELIST_ENV_ENTRY] = self._fs_filelist +self._fs=ZFieldStorage(fp=fp,environ=fs_env,keep_blank_values=1) +if not hasattr(self._fs,'list') or self._fs.list is None: # Hm, maybe it's an XML-RPC -if (fs.headers.has_key('content-type') and -fs.headers['content-type'] == 'text/xml' and +if (self._fs.headers.has_key('content-type') and +self._fs.headers['content-type'] == 'text/xml' and method == 'POST'): # Ye haaa, XML-RPC! global xmlrpc if xmlrpc is None: import xmlrpc -meth, self.args = xmlrpc.parse_input(fs.value) +meth, self.args = xmlrpc.parse_input(self._fs.value) response=xmlrpc.response(response) other['RESPONSE']=self.response=response self.maybe_webdav_client = 0 else: -self._file=fs.file +self._file=self._fs.file else: -fslist=fs.list +fslist=self._fs.list tuple_items={} lt=type([]) CGI_name=isCGI_NAME __
Re: [Zope-dev] Patch: let non-seekable streams be input for ZPublisher (updated)
Em Qua, 2004-08-18 às 12:22, Ames Andreas (MPA/DF) escreveu: > [...] > Questions: > > - Is this the right forum/place to send patches to? This is the right forum to discuss the patches, and to send them for evaluation if it's a small one, like yours. The best place to actually send patches to is as attachment in a bug report in the Zope bug collector. > - Is there any chance that this could be applied to Zope's mainline? > (If not I will proceed with a local disk buffering scheme in the > long term.) Yes, if you file a feature request on the bug collector, which can be found at: http://collector.zope.org/Zope Cheers, Leo ___ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )