Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On Fri, Apr 09, 2004 at 08:11:20PM -0400, Chris McDonough wrote:
> On Fri, 2004-04-09 at 18:02, Paul Winkler wrote:
> > That's easy. LocalFS, CMFCore.FSFile, and ExtFile 1.1.3 all read the
> > entire file into memory before sending anything back to the client.
> > That's why ExtFile 1.4 is so much better - it uses RESPONSE.write()
> > with 64k chunks.
>
> I just don't understand why it's a third the speed of ExtFile 1.1.3 at
> the largest file size.

You must be talking about my patched versions.
I don't understand that either!

--
Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's EGG-LIKE KAZUE!
(random hero from isometric.spaceninja.com)

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On Fri, 2004-04-09 at 18:02, Paul Winkler wrote:
> That's easy. LocalFS, CMFCore.FSFile, and ExtFile 1.1.3 all read the
> entire file into memory before sending anything back to the client.
> That's why ExtFile 1.4 is so much better - it uses RESPONSE.write()
> with 64k chunks.

I just don't understand why it's a third the speed of ExtFile 1.1.3 at
the largest file size.

> > overhead; products would be better off to do a direct RESPONSE.write if
> > the total file size was under 128k (as evidenced by the fact that the
> > iterator-based products are slower at small file sizes).
>
> Yeah, my patches were very minimal. They could easily be extended to
> only use iterators for larger content-length.
> I'll do that, add a working LocalFS and another thingie I heard about
> yesterday into the mix, and do another round of benchmarks maybe tomorrow.

Cool...

> Fine and dandy, but I'd still really love to see what can be done
> about ZEO. Caches only prevent ZODB access for stuff that's in the cache...
> Sooner or later, some user will get a cache miss and that poor soul will
> see speed go down by an order of magnitude or two. Ouch.

Since it's only on the first request, I don't care *too* much (although I
do care) how long it takes to slurp across ZEO, especially because blobs
rarely change in normal usage and thus would seldom be evicted from
FSCacheManager's cache. It's just likely "more bang for the buck" in the
common case to prevent the ZODB read in the first place than to try to
pick through and speed up ZEO's (very subtle) code.

That said, go for it!

- C
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On Fri, Apr 09, 2004 at 12:25:50AM -0400, Chris McDonough wrote:
> and 1216K (I cannot explain the large difference in results at 38322k
> across products; it appears to be product-specific).

That's easy. LocalFS, CMFCore.FSFile, and ExtFile 1.1.3 all read the
entire file into memory before sending anything back to the client.
That's why ExtFile 1.4 is so much better - it uses RESPONSE.write()
with 64k chunks.

> Also, for this
> file size, wrapping the data in a producer is actually just more

iterators now, remember? :-)

> overhead; products would be better off to do a direct RESPONSE.write if
> the total file size was under 128k (as evidenced by the fact that the
> iterator-based products are slower at small file sizes).

Yeah, my patches were very minimal. They could easily be extended to
only use iterators for larger content-length.
I'll do that, add a working LocalFS and another thingie I heard about
yesterday into the mix, and do another round of benchmarks maybe tomorrow.

> It's a bit of a mystery about why retrieving data over ZEO is an order
> of magnitude slower than retrieving it from a local file storage when
> the ZEO server is actually on the same physical machine. I'm sure
> there are lots of optimization strategies for making this faster that
> involve a lot of heavy lifting and hard thinking in a dark room. ;-)
> While that might be fun, in the meantime, for the hoi-polloi who just
> need to get stuff done, I believe that one simple answer to this is, as
> usual, to do very aggressive caching and avoid ZODB access altogether
> for large blobs. The ZEO caching algorithm is largely agnostic about
> what kinds of objects it caches and doesn't allow for setting a
> cache-eviction policy other than whatever its implementors consider
> optimal for the most general case. This would be fairly difficult to
> change in the best-case circumstance for someone with deep ZODB
> knowledge.
> A more effective strategy for "normal" people might be to
> contribute to the "hardening" of something like FileCacheManager, which
> prevents ZODB access in the first place and allows for fine-grained
> policies on an application level by caching blobs on client filesystems.

Fine and dandy, but I'd still really love to see what can be done
about ZEO. Caches only prevent ZODB access for stuff that's in the cache...
Sooner or later, some user will get a cache miss and that poor soul will
see speed go down by an order of magnitude or two. Ouch.

--
Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's HIGHLY TOXIC PENGUIN THING!
(random hero from isometric.spaceninja.com)
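The size-threshold idea discussed above (a direct RESPONSE.write for bodies under 128k, chunked streaming in 64k blocks above that) can be sketched roughly as follows. This is an illustration, not Zope's actual code: `serve`, `CHUNK`, and `THRESHOLD` are made-up names, and `response` stands in for any object with a `write()` method.

```python
CHUNK = 64 * 1024       # 64k blocks, as ExtFile 1.4 uses
THRESHOLD = 128 * 1024  # below this, one direct write is cheaper

def serve(f, response, size):
    """Deliver the open binary file f to response, picking a
    strategy based on the declared content length."""
    if size < THRESHOLD:
        # Small body: a single write avoids per-chunk overhead.
        response.write(f.read())
        return
    # Large body: stream in fixed-size chunks so the whole file
    # never sits in memory at once.
    while True:
        block = f.read(CHUNK)
        if not block:
            break
        response.write(block)
```

The crossover point would need benchmarking per deployment; 128k is just the figure quoted in the thread.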
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
I'm pretty sure that the tempfile penalty is unrelated to the results
Paul sees in his tests, at least for smaller files. If the content-length
header indicates that the body of the item is smaller than 128k, it does
not appear to be spooled from a tempfile at all. This also may be why
there isn't much of a difference between "normal" product results and
Paul's "patched" product results at his 12K file size, as opposed to the
roughly 60% difference in speed at 324K and 1216K (I cannot explain the
large difference in results at 38322k across products; it appears to be
product-specific). Also, for this file size, wrapping the data in a
producer is actually just more overhead; products would be better off to
do a direct RESPONSE.write if the total file size was under 128k (as
evidenced by the fact that the iterator-based products are slower at
small file sizes).

It's a bit of a mystery why retrieving data over ZEO is an order of
magnitude slower than retrieving it from a local file storage when the
ZEO server is actually on the same physical machine. I'm sure there are
lots of optimization strategies for making this faster that involve a
lot of heavy lifting and hard thinking in a dark room. ;-) While that
might be fun, in the meantime, for the hoi-polloi who just need to get
stuff done, I believe that one simple answer to this is, as usual, to do
very aggressive caching and avoid ZODB access altogether for large
blobs. The ZEO caching algorithm is largely agnostic about what kinds of
objects it caches and doesn't allow for setting a cache-eviction policy
other than whatever its implementors consider optimal for the most
general case. This would be fairly difficult to change even in the
best-case circumstance of someone with deep ZODB knowledge.
A more effective strategy for "normal" people might be to contribute to
the "hardening" of something like FileCacheManager, which prevents ZODB
access in the first place and allows for fine-grained policies on an
application level by caching blobs on client filesystems.

- C

On Thu, 2004-04-08 at 12:16, Paul Winkler wrote:
> On Wed, Mar 24, 2004 at 01:32:18PM -0500, Shane Hathaway wrote:
> > Jeremy has suggested that object pre-fetching could be added to ZODB.
>
> This is much on my mind currently.
> Any thoughts on what an API for pre-fetching might look like?
>
> The use case that most concerns me is:
> If you have an Image or File object with a very long Pdata chain,
> you're likely to A) thrash the ZEO client cache if it's not big enough,
> and B) spend a long time waiting for all the objects in the chain
> to load. At least, this is my theory on what's slowing things
> down - I will be trying to verify this today. See the bottom of this page:
> http://www.slinkp.com/code/zopestuff/blobnotes
RE: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On Thu, 8 Apr 2004 [EMAIL PROTECTED] wrote:
> I'm working on a product which serves files from the filesystem. The
> data retrieval method is the usual:
>
>     def pushData(self, f, outstream):
>         finished = False
>         while not finished:
>             block = f.read(blocksize)
>             if len(block) < blocksize:
>                 finished = True
>             outstream.write(block)
>
> f is the file on the filesystem, outstream is the request object.
>
> Testing with a 1Mbyte file (ab -n 12 -c 4), I get ~4.4 req/sec -
> ~4.7Mbyte/sec after a few iterations (the OS caches the file).

Zope reads the file from a cache but then forces its entire contents
into dirty buffers. If the file is sent before the OS decides to flush
the buffers to disk, and the OS is somehow smart enough to cancel the
write of those buffers, you're in luck--not much penalty. You might even
decide to put these files on a RAM disk.

Most network connections aren't that fast, though, so you have to expect
a concurrency level much higher than 4. The penalty comes when the OS
has to write all those concurrent copies of the same data to disk, then
delete each of them when the download is finished. Zope could make a
good file server if it just didn't make so many temporary copies.

> It seems from these results that ZServer's tempfile strategy causes
> some (~20% if everything is cached) performance hit, but I think that
> there should be other bottleneck(s) beside this.

Your test is too optimistic. Try a concurrency level of 200.

Another bottleneck is the asyncore select loop, which has an O(n) delay
for each I/O operation, where n is the number of connections currently
open.

Shane
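Shane's O(n) point can be demonstrated directly: every pass through a select()-style loop scans all open descriptors, so per-operation overhead grows with the number of idle connections. A rough, platform-dependent sketch (the `time_select` helper is made up for illustration):

```python
import select
import socket
import time

def time_select(n, rounds=200):
    """Time `rounds` select() calls over n idle socket pairs."""
    pairs = [socket.socketpair() for _ in range(n)]
    readers = [a for a, _ in pairs]
    start = time.perf_counter()
    for _ in range(rounds):
        # Nothing is ever readable here, so each call must scan
        # all n descriptors and then time out immediately.
        select.select(readers, [], [], 0)
    elapsed = time.perf_counter() - start
    for a, b in pairs:
        a.close()
        b.close()
    return elapsed
```

Comparing `time_select(10)` against `time_select(500)` should show the cost per select() call climbing with connection count, even though no data moves at all; this is the tax every I/O operation pays at high concurrency.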
RE: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
> In fact, Zope puts large files (the threshold is around 256K - 512K)
> into a temporary file before serving them, to free up application
> threads. It's a tremendous handicap.

I'm working on a product which serves files from the filesystem. The
data retrieval method is the usual:

    def pushData(self, f, outstream):
        finished = False
        while not finished:
            block = f.read(blocksize)
            if len(block) < blocksize:
                finished = True
            outstream.write(block)

f is the file on the filesystem, outstream is the request object.

Testing with a 1Mbyte file (ab -n 12 -c 4), I get ~4.4 req/sec -
~4.7Mbyte/sec after a few iterations (the OS caches the file).

Now, I change this method to the following:

    def pushData(self, f, outstream, data='0'*65536):
        for i in range(16):
            outstream.write(data)

The test results are the same.

Now, if I disable the temporary file thing in ZServer/HTTPResponse.py
the performance goes up to around 6.9 req/sec - ~7Mbyte/sec. If I
restore my pushData method to its original form it can still do
~6.2 req/sec - ~6.6Mbyte/sec. In this case, practically every disk
operation was served from the cache.

It seems from these results that ZServer's tempfile strategy causes
some (~20% if everything is cached) performance hit, but I think that
there should be other bottleneck(s) beside this.

Regards,
Sandor
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On Thu, Apr 08, 2004 at 08:02:05PM -0400, Shane Hathaway wrote:
> On 04/08/04 12:16, Paul Winkler wrote:
> > On Wed, Mar 24, 2004 at 01:32:18PM -0500, Shane Hathaway wrote:
> > > Jeremy has suggested that object pre-fetching could be added to ZODB.
> >
> > This is much on my mind currently.
> > Any thoughts on what an API for pre-fetching might look like?
>
> Well, thinking about it some more, it seems like it might be just as
> easy to run a special thread just for fetching blobs. One queue should
> tell the blob thread what to get, and another queue should return the
> results. I think that would be a nice design.

ok... i'll mull that over.

> > The use case that most concerns me is:
> > If you have an Image or File object with a very long Pdata chain,
> > you're likely to A) thrash the ZEO client cache if it's not big enough,
> > and B) spend a long time waiting for all the objects in the chain
> > to load. At least, this is my theory on what's slowing things
> > down - I will be trying to verify this today. See the bottom of this page:
> > http://www.slinkp.com/code/zopestuff/blobnotes
>
> In fact, Zope puts large files (the threshold is around 256K - 512K)
> into a temporary file before serving them, to free up application
> threads. It's a tremendous handicap.

If you're referring to the code in ZServerHTTPResponse.write: AFAICT,
since Image.File calls RESPONSE.write() for each Pdata in its chain,
we'll actually get a temporary file for each Pdata rather than one big
one for the whole file. There may be (probably are) performance issues
in there, but right now I'd be really happy to get ClientStorage a bit
closer to the speed of FileStorage for this Pdata gunk.

--
Paul Winkler
http://www.slinkp.com
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On 04/08/04 12:16, Paul Winkler wrote:
> On Wed, Mar 24, 2004 at 01:32:18PM -0500, Shane Hathaway wrote:
> > Jeremy has suggested that object pre-fetching could be added to ZODB.
>
> This is much on my mind currently.
> Any thoughts on what an API for pre-fetching might look like?

Well, thinking about it some more, it seems like it might be just as
easy to run a special thread just for fetching blobs. One queue should
tell the blob thread what to get, and another queue should return the
results. I think that would be a nice design.

> The use case that most concerns me is:
> If you have an Image or File object with a very long Pdata chain,
> you're likely to A) thrash the ZEO client cache if it's not big enough,
> and B) spend a long time waiting for all the objects in the chain
> to load. At least, this is my theory on what's slowing things
> down - I will be trying to verify this today. See the bottom of this page:
> http://www.slinkp.com/code/zopestuff/blobnotes

In fact, Zope puts large files (the threshold is around 256K - 512K)
into a temporary file before serving them, to free up application
threads. It's a tremendous handicap.

Here's a relevant tangent. Twisted and asyncore like all I/O to be
event-driven, but ZODB is fundamentally not event-driven. In ZODB, any
attribute access can incur I/O time, and you can't control that. This is
why Stackless is interesting for ZODB. With Stackless, you might be able
to switch stack frames at the moment ZODB needs to do some I/O and get
some useful work done. Without Stackless, ZODB cannot guarantee any
particular response time and therefore should not operate inside a
time-critical event loop. Threads can also solve this problem, but the
global interpreter lock hurts performance.

There's also POSH. With POSH, you can take advantage of multiple
processors (which you can't do with Stackless or threads)... that seems
like a really good thing. Some careful coding might make Zope + POSH
scream.
Shane
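The two-queue design Shane describes above can be sketched in a few lines. This is only an illustration of the shape of the idea: `start_blob_fetcher` and `load_blob` are invented names, and `load_blob` stands in for whatever would actually read from ZODB/ZEO.

```python
import queue
import threading

def start_blob_fetcher(load_blob):
    """Run a dedicated thread that pulls fetch requests from one
    queue and pushes (oid, data) results onto another, so blob
    loads never block the event loop."""
    requests = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            oid = requests.get()
            if oid is None:  # sentinel: shut the thread down
                break
            results.put((oid, load_blob(oid)))

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return requests, results, t
```

The event loop would put oids on `requests` and poll `results` non-blockingly; a real version would also need error propagation and per-request correlation.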
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On Wed, Mar 24, 2004 at 01:32:18PM -0500, Shane Hathaway wrote:
> Jeremy has suggested that object pre-fetching could be added to ZODB.

This is much on my mind currently.
Any thoughts on what an API for pre-fetching might look like?

The use case that most concerns me is:
If you have an Image or File object with a very long Pdata chain,
you're likely to A) thrash the ZEO client cache if it's not big enough,
and B) spend a long time waiting for all the objects in the chain
to load. At least, this is my theory on what's slowing things
down - I will be trying to verify this today. See the bottom of this page:
http://www.slinkp.com/code/zopestuff/blobnotes

--
Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's RETRO-NEWBIE Z!
(random hero from isometric.spaceninja.com)
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On Wed, Mar 24, 2004 at 12:57:00PM -0500, Chris McDonough wrote:
> On Wed, 2004-03-24 at 09:28, Shane Hathaway wrote:
> > This sounds useful for serving content from the filesystem.
> >
> > However, I'm a little concerned about this because producers must not
> > read from the object database.

FWIW, I've added your warning to the IStreamIterator docstring.
If I hadn't vaguely remembered this thread, I probably would have
tripped over it sooner or later!

--
Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's COBRA DAMN LAGOMORPH!
(random hero from isometric.spaceninja.com)
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
On Wed, 2004-03-24 at 13:32, Shane Hathaway wrote:
> Chris McDonough wrote:
> > IMO code that needs to read from the database shouldn't return a
> > producer. Instead, it should probably continue using the RESPONSE.write
> > streaming protocol in the worker thread when it needs to do
> > producer-like things. Returning a producer to ZPublisher seems to be
> > only really useful when the producer's "more" generator is guaranteed
> > to be exceedingly cheap because, as you noted, it is meant to operate
> > in the context of the main thread.
>
> I'll note that iterators probably ought to replace producers. Just
> spell "more" as "next" and they look pretty much the same.

I did consider that, but since the idea was to make it as fast as
possible, I figured we'd just return something that medusa could deal
with directly. But since medusa doesn't know beans about StopIteration
coming out of an iterator, we can't just alias "more" to "next" and
expect it to work, at least without changing medusa. But maybe that's
the right thing to do anyway (medusa is pretty overdue for some spring
cleaning), or maybe we just wrap the iterator up in something medusa
currently understands. It doesn't matter to me either way, really.

> > The time spent waiting for the code that accessed the database would
> > block all other asyncore operations, though, right? We'd need to test
> > it, but I suspect it might be a net loss for the "multiple requests for
> > the same object" case because the overhead of reading from the database
> > cache would be realized serially for each request.
>
> Look at it this way:
>
> - Don't ghostify anything manually. Let ZODB handle that.
>
> - Use a larger ZODB cache for the main thread's connection than you do
>   for the other connections, to increase the chance that objects will be
>   served directly from RAM.
>
> - As long as other threads aren't reading/writing the large objects,
>   there will be at most one copy of a large object in memory at any given
>   time.
> - Periodically ask the connection to collect garbage. It uses an LRU
>   strategy, which seems much more optimal than immediate deactivation.

OK. I'll let you handle that. ;-)

> > And if the object
> > isn't in cache, it could potentially block for quite a long time.
> > That said, I dunno. Do you think it might be a win? I guess my worry
> > is that the operation of a producer should be more or less
> > "guaranteed" to be cheap, and it seems hard to make that promise about
> > ZODB access, especially as the data might be coming over the wire from
> > ZEO.
>
> If the object is not loaded and not in the ZEO cache, the producer could
> say it's not ready yet and ask ZEO to fetch it in the background.

Right. We'd need to come up with a protocol that lets the producer
return "not ready yet". I suppose this could just be implemented as an
exception.

> Jeremy has suggested that object pre-fetching could be added to ZODB.

I'll let you handle that too. ;-)

> > FWIW, Jim intimated a while back that he might be interested in
> > providing "blob" support directly within ZODB. I can imagine an
> > implementation of this where maybe you can mark an object as
> > "blobifiable" and when you do so, the ZODB caching code writes a copy of
> > that object into a named file on disk during normal operations.
> > Then we could use a producer to spool the file data out without ever
> > actually reading data out of a database from a ZODB connection; we'd
> > just ask the connection for the filename.
>
> That's a possibility, although it would complicate the storage, and
> making it work with ZEO would require a distributed filesystem.

It would actually complicate the ZODB connection caching code, but the
storage would have really nothing to do with it. It also wouldn't
require a distributed filesystem, because all we'd be doing is storing
cached copies of the data on the local disk of each ZEO client.
An implementation could go something like this: objects that want to
participate in the blob caching scheme can implement a "_p_makeBlob"
method (or whatever), which returns an iterator representing the
serialized data stream. When a request for an object is provided to the
connection:

- If it is not in the ZODB cache, return a ghost like normal.

- If it is in the cache and it has a "_p_makeBlob" method, check whether
  a file exists on disk with its oid. If a file doesn't exist on disk,
  call _p_makeBlob and create the file using the iterator it returns.
  Set _p_blob_filename on the object to the filename of the file created.

- App code can now check for _p_blob_filename to see if a cached copy
  representing the serialized data exists on disk. If it does, it can
  make use of it however it sees fit.

- When a cached object is invalidated out of the ZODB caches, delete the
  cached file too. This happens on every ZEO client.

Solving race conditions and locking is an exercise left to the reader. ;-)

- C
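The oid-keyed file cache at the heart of that scheme might look something like the sketch below. None of these names exist in ZODB; "_p_makeBlob" is Chris's hypothetical hook, played here by the `make_blob` callable, and the race conditions and locking he mentions are still left out.

```python
import os

class BlobFileCache:
    """Cache serialized copies of 'blobifiable' objects in files
    named after their oids, on the local disk of a ZEO client."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def path_for(self, oid):
        # oid is bytes; its hex form makes a safe filename.
        return os.path.join(self.directory, oid.hex())

    def ensure(self, oid, make_blob):
        """make_blob plays the role of _p_makeBlob: it returns an
        iterator over the serialized data stream.  It is only called
        on a cache miss; hits are served straight from disk."""
        path = self.path_for(oid)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                for chunk in make_blob():
                    f.write(chunk)
        return path  # app code streams from this file

    def invalidate(self, oid):
        """Called when the object is invalidated out of the ZODB
        caches; deleting the file keeps the copies in sync."""
        try:
            os.remove(self.path_for(oid))
        except FileNotFoundError:
            pass
```

A producer could then spool the returned path out without touching the ZODB connection at all, which is the whole point of the scheme.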
Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster
Chris McDonough wrote:
> IMO code that needs to read from the database shouldn't return a
> producer. Instead, it should probably continue using the RESPONSE.write
> streaming protocol in the worker thread when it needs to do
> producer-like things. Returning a producer to ZPublisher seems to be
> only really useful when the producer's "more" generator is guaranteed
> to be exceedingly cheap because, as you noted, it is meant to operate
> in the context of the main thread.

I'll note that iterators probably ought to replace producers. Just
spell "more" as "next" and they look pretty much the same.

> The time spent waiting for the code that accessed the database would
> block all other asyncore operations, though, right? We'd need to test
> it, but I suspect it might be a net loss for the "multiple requests for
> the same object" case because the overhead of reading from the database
> cache would be realized serially for each request.

Look at it this way:

- Don't ghostify anything manually. Let ZODB handle that.

- Use a larger ZODB cache for the main thread's connection than you do
  for the other connections, to increase the chance that objects will be
  served directly from RAM.

- As long as other threads aren't reading/writing the large objects,
  there will be at most one copy of a large object in memory at any
  given time.

- Periodically ask the connection to collect garbage. It uses an LRU
  strategy, which seems much more optimal than immediate deactivation.

> And if the object isn't in cache, it could potentially block for quite
> a long time. That said, I dunno. Do you think it might be a win? I
> guess my worry is that the operation of a producer should be more or
> less "guaranteed" to be cheap, and it seems hard to make that promise
> about ZODB access, especially as the data might be coming over the wire
> from ZEO.

If the object is not loaded and not in the ZEO cache, the producer could
say it's not ready yet and ask ZEO to fetch it in the background.
Jeremy has suggested that object pre-fetching could be added to ZODB.

> FWIW, Jim intimated a while back that he might be interested in
> providing "blob" support directly within ZODB. I can imagine an
> implementation of this where maybe you can mark an object as
> "blobifiable" and when you do so, the ZODB caching code writes a copy of
> that object into a named file on disk during normal operations.
> Then we could use a producer to spool the file data out without ever
> actually reading data out of a database from a ZODB connection; we'd
> just ask the connection for the filename.

That's a possibility, although it would complicate the storage, and
making it work with ZEO would require a distributed filesystem.

Shane
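Shane's "spell 'more' as 'next'" observation, together with the later point that medusa doesn't understand StopIteration, suggests a small adapter. This is a sketch of the wrapping idea from the thread, not code from Zope or medusa; `IteratorProducer` is an invented name, and returning an empty string is the producer convention for end-of-data.

```python
class IteratorProducer:
    """Wrap a Python iterator so it speaks the producer protocol:
    more() yields the next chunk, or b"" once the iterator is
    exhausted (instead of raising StopIteration, which the
    producer side doesn't understand)."""

    def __init__(self, iterator):
        self.iterator = iter(iterator)

    def more(self):
        try:
            return next(self.iterator)
        except StopIteration:
            return b""  # producer convention for "done"
```

With something like this, iterator-returning code could be handed to a producer-driven event loop without changing the loop itself, which is the "wrap the iterator up in something medusa currently understands" option Chris mentions.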