On Thu, Jul 7, 2011 at 10:49 AM, Laurence Rowe <l...@lrowe.co.uk> wrote:
> One thing I found with my (rather naive) experiments building
> s3storage a few years ago is that you need to ensure requests to S3
> are made in parallel to get reasonable performance. This would be a
> lesser problem with blobs, but even then you might have multiple file
> uploads in the same request. The boto library is really useful, but
> doesn't support async requests.
Right, it occurred to me that commit performance with S3 might be an issue.
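To make the parallelism point concrete, here is a rough sketch of issuing several blob uploads at once from a thread pool, which is one way around a client library that only offers blocking calls. It is not taken from s3storage or boto: `upload_blob` and `upload_all` are hypothetical names, and a plain dict stands in for the S3 bucket so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_blob(store, key, data):
    # Hypothetical stand-in for one blocking S3 PUT; `store` is a plain dict.
    store[key] = data
    return key

def upload_all(store, blobs, max_workers=4):
    """Upload every key/data pair in `blobs` concurrently; return the keys."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_blob, store, key, data)
                   for key, data in blobs.items()]
        # result() re-raises any exception from the worker, so a failed
        # upload surfaces here instead of being silently dropped.
        return [f.result() for f in futures]

store = {}
keys = upload_all(store, {"blob-1": b"aaa", "blob-2": b"bbb"})
```

With a real client, each worker would spend most of its time waiting on the network, so a handful of threads should be enough to overlap the uploads in a single request.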
> I guess the simplest implementation would only upload a blob to S3 in
> tpc_begin as that is where the tid is set (and presumably the tid will
> form part of the blob's S3 url.) With large files that might make
> tpc_begin take a long time to complete as it waits for the blob data
> to be loaded into S3. It might be better to upload large blobs to a
> temporary s3 url first and then only make an S3 copy in tpc_begin,
> you'd need to do some benchmarks to see if this was worthwhile for all
> files or only files over a certain size.
I think I get where you're going, although I'd quibble with the details.
There is certainly some opportunity for doing things in parallel
up until you get to tpc_vote. I wonder if renames in S3 take much
time. I can imagine that they do.
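As far as I know, S3 has no native rename, so a "rename" there amounts to a copy followed by a delete. A minimal sketch of the stage-then-copy idea, under that assumption, might look like the following. `FakeS3` is an in-memory stand-in rather than a real client, and the `tmp/` and `blobs/<oid>/<tid>` key layouts are invented for illustration only.

```python
import uuid

class FakeS3:
    """In-memory stand-in for a bucket; not a real S3 client."""
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        del self.objects[key]

def stage_blob(s3, data):
    # Slow part: push the bytes to a temporary key before the tid is known.
    temp_key = "tmp/" + uuid.uuid4().hex
    s3.put(temp_key, data)
    return temp_key

def finalize_blob(s3, temp_key, oid, tid):
    # Once the tid is known, "rename" the staged object: copy, then delete.
    final_key = "blobs/%s/%s" % (oid, tid)
    s3.copy(temp_key, final_key)
    s3.delete(temp_key)
    return final_key
```

Whether the copy step is cheap enough to be worth the extra round trip is exactly what the benchmarks would need to show.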