Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-04-10 Thread Paul Winkler
On Fri, Apr 09, 2004 at 08:11:20PM -0400, Chris McDonough wrote:
 On Fri, 2004-04-09 at 18:02, Paul Winkler wrote:
  That's easy. LocalFS, CMFCore.FSFile, and ExtFile 1.1.3 all read the
  entire file into memory before sending anything back to the client.
  That's why ExtFile 1.4 is so much better - it uses RESPONSE.write()
  with 64k chunks.
 
 I just don't understand why it's a third the speed of ExtFile 1.1.3 at
 the largest file size.

You must be talking about my patched versions. I don't understand
that either!

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's EGG-LIKE KAZUE!
(random hero from isometric.spaceninja.com)

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-04-09 Thread Paul Winkler
On Fri, Apr 09, 2004 at 12:25:50AM -0400, Chris McDonough wrote:
 and 1216K (I cannot explain the large difference in results at 38322k
 across products; it appears to be product-specific).

That's easy. LocalFS, CMFCore.FSFile, and ExtFile 1.1.3 all read the
entire file into memory before sending anything back to the client.
That's why ExtFile 1.4 is so much better - it uses RESPONSE.write()
with 64k chunks.
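
To make the difference concrete, here is a minimal sketch. It is not the actual ExtFile or LocalFS code: `response` stands for anything with a write() method (such as Zope's RESPONSE), and the function names and the FakeResponse stand-in are made up for illustration.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64k, the chunk size mentioned above


def send_whole(f, response):
    """Read the entire file into memory, then write it in one go."""
    data = f.read()          # memory use grows with the file size
    response.write(data)


def send_chunked(f, response, chunk_size=CHUNK_SIZE):
    """Stream the file in fixed-size chunks; returns the number of writes."""
    writes = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:        # an empty read means end of file
            break
        response.write(chunk)
        writes += 1
    return writes


class FakeResponse:
    """Hypothetical stand-in for RESPONSE that records what was written."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)
```

With the chunked version, at most one chunk is held in memory at a time, regardless of how large the file is.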

  Also, for this
 file size, wrapping the data in a producer is actually just more

iterators now, remember? :-)

 overhead; products would be better off to do a direct RESPONSE.write if
 the total file size was under 128k (as evidenced by the fact that the
 iterator-based products are slower at small file sizes).

Yeah, my patches were very minimal. They could easily be extended to
only use iterators for larger content-length.
I'll do that, add a working LocalFS and another thingie I heard about 
yesterday into the mix, and do another round of benchmarks maybe tomorrow.
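
Roughly what I have in mind for the size cutoff, as a sketch only. The 128k threshold is the figure quoted above; `serve`, `file_chunks` and the convention of returning None after a direct write are assumptions for illustration, not the patched products' real code.

```python
import io

THRESHOLD = 128 * 1024   # cutoff suggested in the quoted text
CHUNK_SIZE = 64 * 1024


def file_chunks(f, chunk_size=CHUNK_SIZE):
    """Yield successive chunks of f until it is exhausted."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return
        yield chunk


def serve(f, size, response):
    """Write small files directly; hand back an iterator for large ones.

    Returns None when the body has already been written, otherwise an
    iterator that the caller (the publisher, in Zope's case) would consume.
    """
    if size < THRESHOLD:
        response.write(f.read())
        return None
    return file_chunks(f)
```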

 It's a bit of a mystery about why retrieving data over ZEO is an order
 of magnitude slower than retrieving it from a local file storage when
 the ZEO server is actually on the same physical machine.   I'm sure
 there are lots of optimization strategies for making this faster that
 involve a lot of heavy lifting and hard thinking in a dark room. ;-) 
 While that might be fun, in the meantime, for the hoi-polloi who just
 need to get stuff done, I believe that one simple answer to this is, as
 usual, to do very aggressive caching and avoid ZODB access altogether
 for large blobs.  The ZEO caching algorithm is largely agnostic about
 what kinds of objects it caches and doesn't allow for setting a
 cache-eviction policy other than whatever its implementors consider
 optimal for the most general case.   This would be fairly difficult to
 change in the best-case circumstance for someone with deep ZODB
 knowledge.  A more effective strategy for normal people might be to
 contribute to the hardening of something like FileCacheManager, which
 prevents ZODB access in the first place and allows for fine-grained
 policies on an application level by caching blobs on client filesystems.

Fine and dandy, but I'd still really love to see what can be done
about ZEO. Caches only prevent ZODB access for stuff that's in the cache...
Sooner or later, some user will get a cache miss and that poor soul will 
see speed go down by an order of magnitude or two.  Ouch.

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's HIGHLY TOXIC PENGUIN THING!
(random hero from isometric.spaceninja.com)



Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-04-09 Thread Chris McDonough
On Fri, 2004-04-09 at 18:02, Paul Winkler wrote:
 That's easy. LocalFS, CMFCore.FSFile, and ExtFile 1.1.3 all read the
 entire file into memory before sending anything back to the client.
 That's why ExtFile 1.4 is so much better - it uses RESPONSE.write()
 with 64k chunks.

I just don't understand why it's a third the speed of ExtFile 1.1.3 at
the largest file size.


  overhead; products would be better off to do a direct RESPONSE.write if
  the total file size was under 128k (as evidenced by the fact that the
  iterator-based products are slower at small file sizes).
 
 Yeah, my patches were very minimal. They could easily be extended to
 only use iterators for larger content-length.
 I'll do that, add a working LocalFS and another thingie I heard about 
 yesterday into the mix, and do another round of benchmarks maybe tomorrow.

Cool...


 Fine and dandy, but I'd still really love to see what can be done
 about ZEO. Caches only prevent ZODB access for stuff that's in the cache...
 Sooner or later, some user will get a cache miss and that poor soul will 
 see speed go down by an order of magnitude or two.  Ouch.

Since it's only on the first request, I don't care *too* much (although
I do care) how long it takes to slurp across ZEO, especially because
blobs rarely change in normal usage and thus would be seldom evicted
from FileCacheManager's cache.  It's just likely more bang for the buck
in the common case to prevent the ZODB read in the first place than to
try to pick through and speed up ZEO's (very subtle) code.  That said,
go for it!

- C





Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-04-08 Thread Paul Winkler
On Wed, Mar 24, 2004 at 12:57:00PM -0500, Chris McDonough wrote:
 moved to zope-dev only
 On Wed, 2004-03-24 at 09:28, Shane Hathaway wrote:
  This sounds useful for serving content from the filesystem.
  
  However, I'm a little concerned about this because producers must not 
  read from the object database. 

FWIW, I've added your warning to the IStreamIterator docstring.
If I hadn't vaguely remembered this thread, I probably would have
tripped over it sooner or later!
 
-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's COBRA DAMN LAGOMORPH!
(random hero from isometric.spaceninja.com)



Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-04-08 Thread Paul Winkler
On Wed, Mar 24, 2004 at 01:32:18PM -0500, Shane Hathaway wrote:
 Jeremy has suggested that object pre-fetching could be added to ZODB.

This is much on my mind currently.
Any thoughts on what an API for pre-fetching might look like?

The use case that most concerns me is:
If you have an Image or File object with a very long Pdata chain,
you're likely to A) thrash the ZEO client cache if it's not big enough,
and B) spend a long time waiting for all the objects in the chain
to load.  At least, this is my theory on what's slowing things
down - I will be trying to verify this today. See the bottom of this page:
http://www.slinkp.com/code/zopestuff/blobnotes
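
A toy model of the theory, assuming each chunk of the file is a separate persistent object reached through a `next` pointer and loaded only when touched. The `Chunk` class and `load_count` are invented for illustration and use no ZODB code; the point is only that a sequential walk costs one load per link.

```python
class Chunk:
    """Hypothetical stand-in for one link in a Pdata-style chain."""

    load_count = 0   # counts simulated database loads

    def __init__(self, data, next=None):
        self._data = data
        self.next = next

    @property
    def data(self):
        # In the real thing, touching a ghost would trigger a load
        # (possibly a ZEO round trip); here we only count it.
        Chunk.load_count += 1
        return self._data


def build_chain(payload, chunk_size):
    """Split payload into a linked chain of chunks and return the head."""
    head = None
    # Build from the tail so each chunk can point at its successor.
    for start in reversed(range(0, len(payload), chunk_size)):
        head = Chunk(payload[start:start + chunk_size], head)
    return head


def read_chain(head):
    """Walk the chain sequentially: one simulated load per chunk."""
    parts = []
    node = head
    while node is not None:
        parts.append(node.data)
        node = node.next
    return b"".join(parts)
```

Since each link is only discovered after the previous one has been loaded, the loads cannot overlap, which is why pre-fetching looks attractive here.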

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's RETRO-NEWBIE Z!
(random hero from isometric.spaceninja.com)



Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-04-08 Thread Shane Hathaway
On 04/08/04 12:16, Paul Winkler wrote:
 On Wed, Mar 24, 2004 at 01:32:18PM -0500, Shane Hathaway wrote:
  Jeremy has suggested that object pre-fetching could be added to ZODB.
 
 This is much on my mind currently.
 Any thoughts on what an API for pre-fetching might look like?

Well, thinking about it some more, it seems like it might be just as 
easy to run a special thread just for fetching blobs.  One queue should 
tell the blob thread what to get, and another queue should return the 
results.  I think that would be a nice design.
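
A minimal sketch of that two-queue design, using only the standard library. The `fetch` callable is a placeholder for whatever actually loads a blob; the names and the (key, data, error) result tuples are assumptions for illustration.

```python
import queue
import threading


def blob_worker(requests, results, fetch):
    """Fetch blobs named on `requests` and post them on `results`."""
    while True:
        key = requests.get()
        if key is None:           # sentinel: shut the worker down
            break
        try:
            results.put((key, fetch(key), None))
        except Exception as exc:  # report failures instead of dying
            results.put((key, None, exc))


def start_blob_thread(fetch):
    """Start the worker; returns (requests, results, thread)."""
    requests, results = queue.Queue(), queue.Queue()
    t = threading.Thread(target=blob_worker,
                         args=(requests, results, fetch), daemon=True)
    t.start()
    return requests, results, t
```

A caller puts keys on `requests` and later collects tuples from `results`; putting None on `requests` stops the thread.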

 The use case that most concerns me is:
 If you have an Image or File object with a very long Pdata chain,
 you're likely to A) thrash the ZEO client cache if it's not big enough,
 and B) spend a long time waiting for all the objects in the chain
 to load.  At least, this is my theory on what's slowing things
 down - I will be trying to verify this today. See the bottom of this page:
 http://www.slinkp.com/code/zopestuff/blobnotes

In fact, Zope puts large files (the threshold is around 256K - 512K) 
into a temporary file before serving them, to free up application 
threads.  It's a tremendous handicap.

Here's a relevant tangent.  Twisted and asyncore like all I/O to be 
event-driven, but ZODB is fundamentally not event-driven.  In ZODB, any 
attribute access can incur I/O time, and you can't control that.  This 
is why Stackless is interesting for ZODB.  With Stackless, you might be 
able to switch stack frames at the moment ZODB needs to do some I/O and 
get some useful work done.  Without Stackless, ZODB can not guarantee 
any particular response time and therefore should not operate inside a 
time-critical event loop.  Threads can also solve this problem, but the 
global interpreter lock hurts performance.  There's also POSH.  With 
POSH, you can take advantage of multiple processors (which you can't do 
with Stackless nor threads)... that seems like a really good thing. 
Some careful coding might make Zope + POSH scream.

Shane



RE: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-04-08 Thread zope
 In fact, Zope puts large files (the threshold is around 256K - 512K) 
 into a temporary file before serving them, to free up application 
 threads.  It's a tremendous handicap.

I'm working on a product which serves files from the filesystem. The
data retrieval method is the usual:

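def pushData(self, f, outstream):
    # blocksize is assumed to be defined elsewhere (e.g. a module constant)
    finished = False
    while not finished:
        block = f.read(blocksize)
        if len(block) < blocksize:
            # a short read means we reached the end of the file
            finished = True
        outstream.write(block)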

f is the file on the filesystem, outstream is the request object.

Testing with a 1Mbyte file (ab -n 12 -c 4), I get ~4.4 req/sec -
~4.7Mbyte/sec after a few iterations (the os caches the file).

Now, I change this method to the following:

def pushData(self, f, outstream, data='0'*65536):
    # ignore the file entirely and write 16 x 64k = 1Mbyte of constant data
    for i in range(16):
        outstream.write(data)

The test results are the same.

Now, if I disable the temporary file thing in ZServer/HTTPResponse.py
the performance goes up to around 6.9 req/sec - ~7Mbyte/sec. If I
restore my pushData method to its original form it can still do ~6.2
req/sec - ~6.6Mbyte/sec.
In this case, practically every disk operation was served from the
cache.

It seems from these results that ZServer's tempfile strategy causes
some (~20% if everything is cached) performance hit, but I think that
there must be other bottleneck(s) besides this.

Regards,
Sandor




Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-04-08 Thread Chris McDonough
I'm pretty sure that the tempfile penalty is unrelated to the results
Paul sees in his tests, at least for smaller files.

If the content-length header indicates that the body of the item is
smaller than 128k, it does not appear to be spooled from a tempfile at
all.  This also may be why there isn't much of a difference between
normal product results and Paul's patched product results at his 12K
file size as opposed to the roughly 60% difference in speed at 324K
and 1216K (I cannot explain the large difference in results at 38322k
across products; it appears to be product-specific).  Also, for this
file size, wrapping the data in a producer is actually just more
overhead; products would be better off to do a direct RESPONSE.write if
the total file size was under 128k (as evidenced by the fact that the
iterator-based products are slower at small file sizes).

It's a bit of a mystery about why retrieving data over ZEO is an order
of magnitude slower than retrieving it from a local file storage when
the ZEO server is actually on the same physical machine.   I'm sure
there are lots of optimization strategies for making this faster that
involve a lot of heavy lifting and hard thinking in a dark room. ;-) 
While that might be fun, in the meantime, for the hoi-polloi who just
need to get stuff done, I believe that one simple answer to this is, as
usual, to do very aggressive caching and avoid ZODB access altogether
for large blobs.  The ZEO caching algorithm is largely agnostic about
what kinds of objects it caches and doesn't allow for setting a
cache-eviction policy other than whatever its implementors consider
optimal for the most general case.   This would be fairly difficult to
change in the best-case circumstance for someone with deep ZODB
knowledge.  A more effective strategy for normal people might be to
contribute to the hardening of something like FileCacheManager, which
prevents ZODB access in the first place and allows for fine-grained
policies on an application level by caching blobs on client filesystems.

- C


On Thu, 2004-04-08 at 12:16, Paul Winkler wrote:
 On Wed, Mar 24, 2004 at 01:32:18PM -0500, Shane Hathaway wrote:
  Jeremy has suggested that object pre-fetching could be added to ZODB.
 
 This is much on my mind currently.
 Any thoughts on what an API for pre-fetching might look like?
 
 The use case that most concerns me is:
 If you have an Image or File object with a very long Pdata chain,
 you're likely to A) thrash the ZEO client cache if it's not big enough,
 and B) spend a long time waiting for all the objects in the chain
 to load.  At least, this is my theory on what's slowing things
 down - I will be trying to verify this today. See the bottom of this page:
 http://www.slinkp.com/code/zopestuff/blobnotes




Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-03-24 Thread Shane Hathaway
Chris McDonough wrote:
 IMO code that needs to read from the database shouldn't return a
 producer.  Instead, it should probably continue using the RESPONSE.write
 streaming protocol in the worker thread when it needs to do
 producer-like things.  Returning a producer to ZPublisher seems to be
 only really useful when the producer's "more" generator is guaranteed to
 be exceedingly cheap because as you noted it is meant to operate in the
 context of the main thread.

I'll note that iterators probably ought to replace producers.  Just 
spell "more" as "next" and they look pretty much the same.

 The time spent waiting for the code that accessed the database would
 block all other asyncore operations, though, right?  We'd need to test
 it, but I suspect it might be a net lose for the multiple requests for
 the same object case because the overhead of reading from the database
 cache would be realized serially for each request.

Look at it this way:

- Don't ghostify anything manually.  Let ZODB handle that.

- Use a larger ZODB cache for the main thread's connection than you do 
for the other connections, to increase the chance that objects will be 
served directly from RAM.

- As long as other threads aren't reading/writing the large objects, 
there will be at most one copy of a large object in memory at any given 
time.

- Periodically ask the connection to collect garbage.  It uses a LRU 
strategy, which seems much more optimal than immediate deactivation.

 And if the object
 isn't in cache, it could potentially block for quite a long time.
 That said, I dunno.  Do you think it might be a win?  I guess my worry
 is that the operation of a producer should be more or less
 guaranteed to be cheap and it seems hard to make that promise about
 ZODB access, especially as the data might be coming over the wire from
 ZEO.

If the object is not loaded and not in the ZEO cache, the producer could 
say it's not ready yet and ask ZEO to fetch it in the background. 
Jeremy has suggested that object pre-fetching could be added to ZODB.

 FWIW, Jim intimated a while back that he might be interested in
 providing blob support directly within ZODB. I can imagine an
 implementation of this where maybe you can mark an object as
 blobifiable and when you do so, the ZODB caching code writes a copy of
 that object into a named file on disk during normal operations
 <hand-waving goes here ;-)>  Then we could use a producer to spool the
 file data out without ever actually reading data out of a database from
 a ZODB connection; we'd just ask the connection for the filename.

That's a possibility, although it would complicate the storage, and 
making it work with ZEO would require a distributed filesystem.

Shane



Re: [Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

2004-03-24 Thread Chris McDonough
On Wed, 2004-03-24 at 13:32, Shane Hathaway wrote:
 Chris McDonough wrote:
  IMO code that needs to read from the database shouldn't return a
  producer.  Instead, it should probably continue using the RESPONSE.write
  streaming protocol in the worker thread when it needs to do
  producer-like things.  Returning a producer to ZPublisher seems to be only
  really useful when the producer's "more" generator is guaranteed to be
  exceedingly cheap because as you noted it is meant to operate in the
  context of the main thread.
 
 I'll note that iterators probably ought to replace producers.  Just 
 spell "more" as "next" and they look pretty much the same.

I did consider that, but since the idea was to make it as fast as
possible, I figured we'd just return something that medusa could deal
with directly.  But since medusa doesn't know beans about StopIteration
coming out of an iterator, we can't just alias "more" to "next" and
expect it to work, at least without changing medusa.  But maybe that's
the right thing to do anyway (medusa is pretty overdue for some spring
cleaning), or maybe we just wrap the iterator up in something medusa
currently understands.  It doesn't matter to me either way, really.
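
The wrapping option could look something like the sketch below. It assumes the producer convention is a more() method that returns the next chunk and an empty string once the data is exhausted; the class name is made up and this is not existing medusa or ZPublisher code.

```python
class IteratorProducer:
    """Adapt an iterator to the producer convention: more() returns the
    next chunk, and an empty string once the data is exhausted."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def more(self):
        # Skip empty chunks so that an empty return value always means
        # "finished", never "an empty chunk in the middle of the stream".
        for chunk in self._it:
            if chunk:
                return chunk
        return b""   # StopIteration is swallowed by the for loop
```

For example, IteratorProducer(iter([b"ab", b"cd"])) hands out the two chunks and then keeps returning an empty string.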

  The time spent waiting for the code that accessed the database would
  block all other asyncore operations, though, right?  We'd need to test
  it, but I suspect it might be a net lose for the multiple requests for
  the same object case because the overhead of reading from the database
  cache would be realized serially for each request.
 
 Look at it this way:
 
 - Don't ghostify anything manually.  Let ZODB handle that.

 - Use a larger ZODB cache for the main thread's connection than you do 
 for the other connections, to increase the chance that objects will be 
 served directly from RAM.
 
 - As long as other threads aren't reading/writing the large objects, 
 there will be at most one copy of a large object in memory at any given 
 time.
 
 - Periodically ask the connection to collect garbage.  It uses a LRU 
 strategy, which seems much more optimal than immediate deactivation.

OK.  I'll let you handle that. ;-)

   And if the object
  isn't in cache, it could potentially block for quite a long time.
  That said, I dunno.  Do you think it might be a win?  I guess my worry
  is that the operation of a producer should be more or less
  guaranteed to be cheap and it seems hard to make that promise about
  ZODB access, especially as the data might be coming over the wire from
  ZEO.
 
 If the object is not loaded and not in the ZEO cache, the producer could 
 say it's not ready yet and ask ZEO to fetch it in the background. 

Right.  We'd need to come up with a protocol that lets the producer
return "not ready yet".  I suppose this could just be implemented as an
exception.
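
Purely as a sketch of the exception idea: `NotReadyYet`, `DeferredProducer` and `poll` are hypothetical names, and no such protocol exists in medusa today. The producer raises when it has nothing to give yet, and returns an empty string only when it is really finished.

```python
class NotReadyYet(Exception):
    """Hypothetical signal: the producer has no data available right now."""


class DeferredProducer:
    """Producer whose data arrives later (e.g. from a background fetch)."""

    def __init__(self):
        self._chunks = []
        self._closed = False

    def feed(self, chunk):
        """Called by whoever fetched the data to make a chunk available."""
        self._chunks.append(chunk)

    def close(self):
        """Mark the stream as complete; no more chunks will be fed."""
        self._closed = True

    def more(self):
        if self._chunks:
            return self._chunks.pop(0)
        if self._closed:
            return b""            # finished: normal end-of-data signal
        raise NotReadyYet()       # nothing yet, but more is expected


def poll(producer):
    """What a caller might do: treat NotReadyYet as 'try again later'."""
    try:
        return producer.more()
    except NotReadyYet:
        return None
```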

 Jeremy has suggested that object pre-fetching could be added to ZODB.

I'll let you handle that too. ;-)

  FWIW, Jim intimated a while back that he might be interested in
  providing blob support directly within ZODB. I can imagine an
  implementation of this where maybe you can mark an object as
  blobifiable and when you do so, the ZODB caching code writes a copy of
  that object into a named file on disk during normal operations
  <hand-waving goes here ;-)>  Then we could use a producer to spool the
  file data out without ever actually reading data out of a database from
  a ZODB connection; we'd just ask the connection for the filename.
 
 That's a possibility, although it would complicate the storage, and 
 making it work with ZEO would require a distributed filesystem.

It would actually complicate the ZODB connection caching code but the
storage would have really nothing to do with it.  It also wouldn't
require a distributed filesystem, because all we'd be doing is storing
cached copies of the data on the local disk of each ZEO client.  An
implementation could go something like this:

Objects that want to participate in the blob caching scheme can
implement a _p_makeBlob method (or whatever), which returns an
iterator representing the serialized data stream.

When a request for an object is provided to the connection:

- if it is not in the ZODB cache, return a ghost like normal.
- if it is in the cache and it has a _p_makeBlob method,
  check if a file on disk exists with its oid.  if a file
  doesn't exist on disk, call _p_makeBlob and create the file
  using the iterator it returns.  set _p_blob_filename on the
  object to the filename of the file created.
- App code can now check for _p_blob_filename to see if
  a cached copy representing the serialized data exists on
  disk.  If it does, it can make use of it how it sees fit.
- when a cached object is invalidated out of the ZODB caches,
  delete the cached file too.

This happens on every ZEO client.  Solving race conditions and locking
is an exercise left to the reader. ;-)
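
A toy version of the steps above, to show the shape of it. Nothing here is ZODB API: `BlobFileCache`, `on_load` and `on_invalidate` are invented hooks, `oid` is assumed to be a string usable as a file name, and apart from writing through a temporary name no locking is attempted.

```python
import os
import tempfile


class BlobFileCache:
    """Per-client cache of blob files, keyed by oid (illustrative only)."""

    def __init__(self, directory=None):
        # Default to a fresh temporary directory on the local disk.
        self.directory = directory or tempfile.mkdtemp()

    def _path(self, oid):
        return os.path.join(self.directory, oid + ".blob")

    def on_load(self, oid, obj):
        """Call when a cached object is handed out by the connection."""
        make_blob = getattr(obj, "_p_makeBlob", None)
        if make_blob is None:
            return                      # object does not participate
        path = self._path(oid)
        if not os.path.exists(path):
            # Write to a temp name first so readers never see a partial file.
            tmp = path + ".tmp"
            with open(tmp, "wb") as out:
                for chunk in make_blob():
                    out.write(chunk)
            os.replace(tmp, path)
        obj._p_blob_filename = path

    def on_invalidate(self, oid):
        """Call when the object is invalidated out of the object cache."""
        try:
            os.remove(self._path(oid))
        except FileNotFoundError:
            pass                        # nothing was cached for this oid
```

App code would then check for `_p_blob_filename` on the object and serve straight from that file when it is set.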

- C


