How does this differ from Local FS?
cheers,
Chris
[EMAIL PROTECTED] wrote:
I have been building an "ExternalFile" class which stores the body of
the file in an external file, mirroring the Zope path/hierarchy. This
will allow easy integration with servers that can mount the external
representation of the content and serve it with a consistent namespace.
To make life simple, I tried to move all file manipulation to Zope,
including upload/download/copy/cut/paste/delete and permissions. These
external files are transaction aware, blah blah..
Working with 20MB files I noticed some serious performance/scalability
issues and investigated. Here are the results.
A diff with my changes against version 2.2.2 is available at
http://www.superchannel.org/Playground/large_file_zope2.2.2_200010241.diff
Concerns:
- Zope objects like File require data as a seekable file or as a
  coherent block, rather than as a stream. Initializing/updating
  these objects *may* require loading the entire file into memory.
- In-memory buffering of request or response data can cause
  excessive swapping of the working set.
- The multi-service architecture (ZServer-ZPublisher) can limit the
  reuse of stream handles.
- Creating temporary files as FIFO buffers between the services
  causes significant swapping.
Modifications:
Using pipes
I found that FTPServer.ContentCollector was using a StringIO to
buffer the uploads from FTP clients. I changed this to a
TemporaryFile for a while, which revealed the leaked file
descriptor bug (see below). This intermediary temp file caused one
extra file copy for each request. The goal is to have no
intermediary files at all and to pipeline the content directly into
the Zope objects.
To remove this FTP upload file buffer, I converted the FTP collector
again, this time from a TemporaryFile into a pipe with reader and
writer file objects. The FTPRequest receives the reader, from which
it can process the input on the publish thread in processInputs.
Since we are dealing with blocking pipes, it is OK to have a reader
on the publish thread and a writer on the ZServer thread. The major
consideration was the proper way to read from a pipe through the
chain of control, especially in cgi.FieldStorage.
Stdin is treated as the reader of the pipe throughout the code. All
seek()s and tell()s on sys.stdin-type objects (a tty, not a seekable
file) should be considered illegal and removed.
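The arrangement described above can be sketched roughly like this.
This is an illustrative sketch, not the actual patch; the names
make_channel, producer, and consume are invented for the example:

```python
import os
import threading

def make_channel():
    """Return (reader, writer) file objects wrapping an OS pipe,
    standing in for the collector's reader/writer pair."""
    r_fd, w_fd = os.pipe()
    return os.fdopen(r_fd, 'rb'), os.fdopen(w_fd, 'wb')

def producer(writer, chunks):
    """Stands in for the ZServer thread feeding uploaded FTP data."""
    for chunk in chunks:
        writer.write(chunk)
    writer.close()            # closing signals EOF to the reader

def consume(reader):
    """Stands in for processInputs reading on the publish thread.
    Note: sequential read() calls only -- no seek() or tell(),
    since a pipe supports neither."""
    parts = []
    while True:
        block = reader.read(8192)
        if not block:         # EOF: writer closed its end
            break
        parts.append(block)
    reader.close()
    return b''.join(parts)

reader, writer = make_channel()
t = threading.Thread(target=producer, args=(writer, [b'hello ', b'world']))
t.start()
body = consume(reader)
t.join()
```

Because the pipe blocks, the publish thread simply waits whenever it
gets ahead of the ZServer thread, so no intermediary file is needed.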
Usage of FieldStorage from FTP (Unknown content-length)
To gain access to the body of a request, one typically calls
REQUEST['BODY'] or REQUEST['BODYFILE']. This returns the file
object the FieldStorage copied from stdin.
To prevent FieldStorage from copying the file from stdin to a
temporary file, we can set the CONTENT_LENGTH header to '0' in the
FTP _get_env for a STOR.
In this case, FieldStorage creates a temporary file but doesn't read
any data from stdin, so we can return stdin directly when BODYFILE is
requested and 'content-length' is '0'. However, BODYFILE could be a
pipe, which doesn't support 'seek' or 'tell'. The code used to suck
the data off the BODYFILE needs to be modified to cope with the
possibility of being passed a pipe.
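Adapting a consumer to a possibly-non-seekable BODYFILE amounts to
reading sequentially until EOF instead of relying on seek()/tell().
A minimal sketch (copy_body is an invented helper, not Zope API):

```python
def copy_body(bodyfile, write, chunk_size=65536):
    """Drain a request body that may be a pipe.

    A regular temp file could be rewound with seek(0) and re-read,
    but a pipe supports neither seek() nor tell(), so the only safe
    protocol is sequential read() calls until EOF.
    """
    total = 0
    while True:
        chunk = bodyfile.read(chunk_size)
        if not chunk:          # EOF: nothing more to drain
            break
        write(chunk)           # hand each chunk straight to the sink
        total += len(chunk)
    return total
```

The same loop works unchanged whether BODYFILE is a temp file, a
StringIO, or the reader end of a pipe.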
Updating Image.File to play with pipes
The _read_data method of Image.File pulls the data out of the
BODYFILE and sticks it in the instance as a string, pdata object, or
a linked list of pdata objects. The existing code reads and builds
the list in one clean sweep, back-to-front. I believe this keeps the
pdata.data chunks out of memory by quickly (sub)committing and then
deactivating (_p_changed = None) them.
Since we can no longer safely assume 'seek' is valid for BODYFILE, I
tried to read and build the list front-to-back. This kept the data
in memory, even though I tried to deactivate the objects quickly.
As a tradeoff, I read the data front-to-back and then built the list
back-to-front, taking another pass to reverse the list into the
correct order.
Memory usage appears to be steady, meaning the whole file is not
loaded into the working set. This also prevents unnecessary reading
into a temporary FieldStorage file during an FTP upload.
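The two-pass approach above reads front-to-back, prepending each new
chunk (which leaves the list linked back-to-front), then reverses the
links in a second pass. A rough sketch, with an invented Chunk class
standing in for Zope's Pdata:

```python
class Chunk:
    """Stand-in for a Pdata node: a data block plus a next link."""
    def __init__(self, data):
        self.data = data
        self.next = None

def read_chunks(bodyfile, chunk_size=4096):
    """Read front-to-back from a non-seekable stream, prepending each
    chunk so the list is (temporarily) in reverse order."""
    head = None
    size = 0
    while True:
        data = bodyfile.read(chunk_size)
        if not data:
            break
        node = Chunk(data)
        node.next = head       # prepend: list is back-to-front for now
        head = node
        size += len(data)
        # In Zope this is where each node would be subcommitted and
        # deactivated (_p_changed = None) to keep memory flat.
    return head, size

def reverse(head):
    """Second pass: reverse the links into front-to-back order."""
    prev = None
    while head is not None:
        head.next, prev, head = prev, head, head.next
    return prev
```

The reversal pass touches only the link fields, so the data chunks
themselves need not be resident in memory at the same time.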
Web based uploads...
...suck. I do not recommend doing a web-based upload for files larger
than 1MB. First, since a content-length is known, we don't get the
advantage of pipelining the data directly from the socket: a
temporary file must be created, written, and read. Second, I