Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-07 Thread Erik Dahl
Guys,

Thanks for all the great feedback.  Still processing it, but here are
some things we will try.

RelStorage - we'll try it in our app context to see if it helps or
hurts; will report back results.  Quick tests show some improvement.
We will also look at tuning our current ZEO setup.  Last time I looked
there was only the invalidation queue.  I poked around a bit for tuning
docs and didn't see any.  Can someone point me to them?

zc.catalogqueue - we are adding fullish indexing to our objects (they
aren't text pages, so not truly a full index).  Hopefully moving
indexing out to a separate process will keep the impact of the new
index low and help with our current conflict issues.

Our slow-loading object was a Persistent object with a regular list
inside the main pickle.  Objects that the list pointed to were
persistent, which I believe means that they will load separately.  In
general we have tried to make our persistent objects reasonably large
to lower the number of load round trips.  I haven't actually checked
its size yet, but it will be interesting to see.

-EAD



On Dec 4, 2009, at 6:21 PM, Shane Hathaway wrote:

 Jim Fulton wrote:
 I find this a bit confusing.  For the warm numbers, it looks like ZEO
 didn't utilize a persistent cache, which explains why the ZEO numbers
 are the same for hot and cold. Is that right?

 Yes.  It is currently difficult to set up ZEO caches, which I  
 consider an issue with this early version of zodbshootout.   
 zodbshootout does include a sample test configuration that turns on  
 a ZEO cache, but it's not possible to run that configuration with a  
 concurrency level greater than 1.

 What poll interval are you using for RelStorage in the tests?

 Assuming an application gets reasonable cache hit rates, I don't see
 any meaningful difference between ZEO and RelStorage in these numbers.

 You are entitled to your opinion. :-)  Personally, I have observed a  
 huge improvement for many operations.

 Second, does the test still write and then read roughly the same
 amount of data as before?

 That is a command line option.  The chart on the web page shows
 reading and writing 1000 small persistent objects per transaction,

 Which is why I consider this benchmark largely moot.  The database is
 small enough to fit in the server's disk cache.  Even the slowest
 access times are on the order of 0.5 milliseconds.  Disk accesses are
 typically measured in 10s of milliseconds.  With magnetic disks, for
 databases substantially larger than the server's RAM, the network
 component of loading objects will be noise compared to the disk
 access.

 That's why I think solid state disk is already a major win,  
 economically, for large ZODB setups.  The FusionIO cards in  
 particular are likely to be at least as reliable as any disk.  It's  
 time to change the way we think about seek time.

 Shane




Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-07 Thread Jim Fulton
On Mon, Dec 7, 2009 at 11:08 AM, Erik Dahl ed...@zenoss.com wrote:
 Guys,

 Thanks for all the great feedback.  Still processing it, but here are
 some things we will try.

 We will also
 look at tuning our current ZEO setup.  Last time I looked there was
 only the invalidation queue.  I poked around a bit for tuning docs and
 didn't see any.  Can someone point me to them?

The biggest problem ZODB has in general is docs.

I'd consider tuning your ZEO client caches and using
persistent caches.
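
For illustration, here's a minimal sketch of both knobs using ZEO's
ClientStorage (the address, cache size, and paths are assumptions, not
recommendations):

  from ZEO.ClientStorage import ClientStorage
  from ZODB import DB

  storage = ClientStorage(
      ('zeoserver', 8100),           # hypothetical server address
      cache_size=500 * 1024 * 1024,  # bigger client cache -> fewer server round trips
      client='zeo1',                 # naming the client enables a persistent cache file
      var='/var/zeo/cache',          # directory where the persistent cache file lives
      )
  db = DB(storage)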

I'm still curious how big your database is.

 Our slow-loading object was a Persistent object with a regular list
 inside the main pickle.  Objects that the list pointed to were
 persistent, which I believe means that they will load separately.

Yes.

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-07 Thread Erik Dahl
Biggest DBs we see are in the 10GB range.  Most are significantly  
smaller (100s of MB) so I don't think size is an issue.  We typically  
run big persistent caches so I don't think there is much help to be  
had there.  It would only make reads faster anyway, right?

-EAD



On Dec 7, 2009, at 11:21 AM, Jim Fulton wrote:

 On Mon, Dec 7, 2009 at 11:08 AM, Erik Dahl ed...@zenoss.com wrote:
 Guys,

 Thanks for all the great feedback.  Still processing it, but here are
 some things we will try.

  We will also
 look at tuning our current ZEO setup.  Last time I looked there was
 only the invalidation queue.  I poked around a bit for tuning docs
 and didn't see any.  Can someone point me to them?

 The biggest problem ZODB has in general is docs.

 I'd consider tuning your ZEO client caches and using
 persistent caches.

 I'm still curious how big your database is.

 Our slow-loading object was a Persistent object with a regular list
 inside the main pickle.  Objects that the list pointed to were
 persistent, which I believe means that they will load separately.

 Yes.

 Jim

 -- 
 Jim Fulton



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-07 Thread Jim Fulton
On Mon, Dec 7, 2009 at 1:46 PM, Erik Dahl ed...@zenoss.com wrote:
 Biggest DBs we see are in the 10GB range.  Most are significantly smaller
 (100s of MB) so I don't think size is an issue.  We typically run big
 persistent caches so I don't think there is much help to be had there.  It
 would only make reads faster anyway, right?

Yes, but reads typically dominate performance -- of course each
application is different.

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Fred Drake
On Fri, Dec 4, 2009 at 9:41 AM, Erik Dahl ed...@zenoss.com wrote:
  I'm guessing the indexes
 are a hotspot (haven't tested this out much, though I guess B-tree
 buckets should alleviate this problem some).  (is there a persistent
 queue around?)

Cataloging certainly can be a hot spot.  Check out zc.catalogqueue.
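
The shape of the decoupling looks roughly like this -- a sketch only,
using zc.queue for the persistent queue and made-up names (index_queue,
catalog.index_doc), not zc.catalogqueue's actual API:

  import transaction
  import zc.queue

  # One-time setup: a persistent, conflict-aware FIFO in the root:
  #   root['index_queue'] = zc.queue.Queue()

  def mark_for_indexing(root, obj):
      # Writers only append the oid; they no longer touch the catalog,
      # so concurrent commits stop conflicting on index buckets.
      root['index_queue'].put(obj._p_oid)

  def drain(root, catalog):
      # Run in a dedicated indexing client.  Each pull+index commits
      # on its own, so a conflict only retries one small unit of work.
      queue = root['index_queue']
      while len(queue):
          oid = queue.pull()
          obj = root._p_jar.get(oid)
          catalog.index_doc(oid, obj)  # hypothetical catalog API
          transaction.commit()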


  -Fred

-- 
Fred L. Drake, Jr.    fdrake at gmail.com
"Chaos is the score upon which reality is written." --Henry Miller


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Andreas Jung
Projects - at least enterprise-level projects - require a careful
choice of tools and backends.  The ZODB is not the golden bullet for
all and everything.  Depending on the data model and the project needs
you have to look at relational databases or NoSQL databases as
alternatives.

And as you wrote: ZODB-based applications perform badly in heavy-write
scenarios... you may raise the limits by using RelStorage, but perhaps
other backends might be the better choice - something one must consider
when starting a new project.

-aj

On 04.12.09 15:41, Erik Dahl wrote:
 Guys,

 We have a product written in python using ZODB/ZEO and I would like to
 improve the speed of the database in general.  Things that I have seen
 that I would like to improve - some I understand and some not.

 1. Loading of largish (but not too large; the object had a list with
 around 20K references in it) pickles can be very slow.  OK, what size
 pickle?  Not 100% sure - is there a way to get the pickle size of a
 ZODB persistent object?  I lamely tried to pickle one of our persistent
 objects and of course it blew up with max recursion because it went
 beyond the normal bounds of a ZODB persistent pickle.  There must be a
 way to do this though, right?

 2. Writing lots of objects.  I know that zodb wasn't written for this
 type of use case but we have backed into it.  We can have many (~30 -
 is that a lot?) zodb clients, and as a result large numbers of cache
 invalidations can be sent when a write occurs.  Could invalidation
 performance / cache refresh be an issue?

 3. DB hot spots.  Of course we see conflict errors when there are lots  
 of writes to the db from different clients that touch the same  
 object.  We haven't done a bunch of optimization work here but I'm  
 thinking of moving all indexing out to a separate client/process that  
 reads off a queue to find objects to index.  I'm guessing the indexes  
 are a hotspot (haven't tested this out much, though I guess B-tree
 buckets should alleviate this problem some).  (is there a persistent
 queue around?)

 Anyway these are some things that come to mind when I think of  
 performance issues.  I have the thought that many could be made better  
 with faster ZEO I/O.  Does this seem like a good assumption?  If so  
 what could we do to make ZEO faster?

 Questions:

 * We use a FileStorage - are there faster ones?  Can this be a bottleneck?
 * Is the ZEO protocol inefficient?
 * Is the ZEO server just plain slow?

 Thoughts I have that may have no impact.

 * rewrite ZEO or parts of it in C
 * write a C based storage

 Others?

 -EAD





-- 
ZOPYX Ltd. & Co. KG  \  zopyx group
Charlottenstr. 37/1  \  The full-service network for your
D-72070 Tübingen  \  Python, Zope and Plone projects
www.zopyx.com, i...@zopyx.com  \  www.zopyxgroup.com

E-Publishing, Python, Zope & Plone development, Consulting




Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Erik Dahl
Right, I have looked at the NoSQL stuff a good bit.  MongoDB is
interesting, but no transactions and their sharding is alpha code.
CouchDB has transactions but no sharding at all.  HBase has what I
want, but I can't get my brain around how to use the wacky schema.

Unfortunately, this isn't a new project, so if we could get more out of
what we have that would be great, but clearly swapping out the back end
may become necessary.

I haven't dived into RelStorage yet but have heard it will perform
better.  Not sure I understand why, though.  Isn't it just putting
pickles into a single table with an index on the oid? (or oid / serial).

-EAD



On Dec 4, 2009, at 10:15 AM, Andreas Jung wrote:

 Projects - at least enterprise-level projects - require a careful
 choice of tools and backends.  The ZODB is not the golden bullet for
 all and everything.  Depending on the data model and the project needs
 you have to look at relational databases or NoSQL databases as
 alternatives.

 And as you wrote: ZODB-based applications perform badly in heavy-write
 scenarios... you may raise the limits by using RelStorage, but perhaps
 other backends might be the better choice - something one must
 consider when starting a new project.

 -aj

 On 04.12.09 15:41, Erik Dahl wrote:
 Guys,

 We have a product written in python using ZODB/ZEO and I would like
 to improve the speed of the database in general.  Things that I have
 seen that I would like to improve - some I understand and some not.

 1. Loading of largish (but not too large; the object had a list with
 around 20K references in it) pickles can be very slow.  OK, what size
 pickle?  Not 100% sure - is there a way to get the pickle size of a
 ZODB persistent object?  I lamely tried to pickle one of our persistent
 objects and of course it blew up with max recursion because it went
 beyond the normal bounds of a ZODB persistent pickle.  There must be a
 way to do this though, right?

 2. Writing lots of objects.  I know that zodb wasn't written for this
 type of use case but we have backed into it.  We can have many (~30 -
 is that a lot?) zodb clients, and as a result large numbers of cache
 invalidations can be sent when a write occurs.  Could invalidation
 performance / cache refresh be an issue?

 3. DB hot spots.  Of course we see conflict errors when there are
 lots of writes to the db from different clients that touch the same
 object.  We haven't done a bunch of optimization work here but I'm
 thinking of moving all indexing out to a separate client/process that
 reads off a queue to find objects to index.  I'm guessing the indexes
 are a hotspot (haven't tested this out much, though I guess B-tree
 buckets should alleviate this problem some).  (is there a persistent
 queue around?)

 Anyway these are some things that come to mind when I think of
 performance issues.  I have the thought that many could be made better
 with faster ZEO I/O.  Does this seem like a good assumption?  If so
 what could we do to make ZEO faster?

 Questions:

 * We use a FileStorage - are there faster ones?  Can this be a
 bottleneck?
 * Is the ZEO protocol inefficient?
 * Is the ZEO server just plain slow?

 Thoughts I have that may have no impact.

 * rewrite ZEO or parts of it in C
 * write a C based storage

 Others?

 -EAD






 -- 
 ZOPYX Ltd. & Co. KG  \  zopyx group
 Charlottenstr. 37/1  \  The full-service network for your
 D-72070 Tübingen  \  Python, Zope and Plone projects
 www.zopyx.com, i...@zopyx.com  \  www.zopyxgroup.com

 E-Publishing, Python, Zope & Plone development, Consulting





Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Jim Fulton
On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl ed...@zenoss.com wrote:
...
 I haven't dived into RelStorage yet but have heard it will perform
 better.

It doesn't.  See:

  https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Jim Fulton
On Fri, Dec 4, 2009 at 9:41 AM, Erik Dahl ed...@zenoss.com wrote:
 Guys,

 We have a product written in python using ZODB/ZEO and I would like to
 improve the speed of the database in general.  Things that I have seen
 that I would like to improve - some I understand and some not.

How have you seen these? Do you have evidence that they are
affecting your applications specifically?

 1. Loading of largish (but not too large; the object had a list with
 around 20K references in it) pickles can be very slow.  OK, what size
 pickle?  Not 100% sure - is there a way to get the pickle size of a
 ZODB persistent object?  I lamely tried to pickle one of our persistent
 objects and of course it blew up with max recursion because it went
 beyond the normal bounds of a ZODB persistent pickle.  There must be a
 way to do this though, right?

You can get the pickle and thus the pickle size for an object fairly
straightforwardly.

  # load(oid, version) returns the object's raw pickle and its serial;
  # '' is the version argument used by ZODB 3.x storages.
  p, s = ob._p_jar.db().storage.load(ob._p_oid, '')
  print len(p)  # pickle size in bytes

Pickling is actually pretty fast.

By list, do you mean a Python list (or persistent list)?
When such an object is loaded, the database will have to instantiate
all those objects.  That might be time consuming.  In general, I'd
avoid large lists or persistent lists, especially if they are updated
a lot.

If you try to access those objects all at once, then those objects
will have to be loaded from the database, and that can be very
time consuming.
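
A sketch of the pattern and the usual mitigation (the Item class here
is made up):

  import persistent
  from BTrees.IOBTree import IOBTree

  class Item(persistent.Persistent):
      def __init__(self, i):
          self.value = i

  # Anti-pattern: one object holding a plain list of 20,000 persistent
  # references.  The list travels inside the parent's pickle, and
  # iterating it loads each Item with a separate storage round trip.
  #   parent.items = [Item(i) for i in range(20000)]

  # Better: a BTree container.  Its buckets are separate, small pickles
  # that load lazily, so scanning a key range only fetches the buckets
  # actually traversed (the Items themselves still load one at a time
  # when touched).
  items = IOBTree()
  for i in range(20000):
      items[i] = Item(i)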

 2. Writing lots of objects.  I know that zodb wasn't written for this
 type of use case but we have backed into it.

While some relational databases have greater transaction throughput,
you can get fair write throughput with ZODB.

  We can have many (~30 -
 is that a lot?)

That's on the order of what we have.

 zodb clients and as a result large numbers of cache
 invalidations can be sent when a write occurs.  Could invalidation
 performance / cache refresh be an issue?

I don't think so. Invalidations aren't that large and are sent asynchronously.

 3. DB hot spots.  Of course we see conflict errors when there are lots
 of writes to the db from different clients that touch the same
 object.  We haven't done a bunch of optimization work here but I'm
 thinking of moving all indexing out to a separate client/process that
 reads off a queue to find objects to index.  I'm guessing the indexes
 are a hotspot (haven't tested this out much, though I guess B-tree
 buckets should alleviate this problem some).  (is there a persistent
 queue around?)

Hotspots are an issue in any database.  It is generally worth
refactoring the application to avoid them. :)

 Anyway these are some things that come to mind when I think of
 performance issues.  I have the thought that many could be made better
 with faster ZEO I/O.  Does this seem like a good assumption?  If so
 what could we do to make ZEO faster?

There are 2 major issues with ZEO that I'm aware of:

- Each object load requires a round trip.  If you have a list
  of 20,000 objects and you want to iterate over them, that
  will require 20,000 separate object load requests, each requiring
  a network round trip.

  We've discussed schemes to give the database hints to pre-fetch
  objects, thus avoiding serial round trips.

- ZEO servers are currently single threaded. This can hurt a lot if you
  have many clients, and prevents you from taking advantage of
  multi-spindle storage systems.

I'll note that if you have a large database, a substantial amount of time
can be spent doing disk IO.  In a recent test we performed with a 500GB
database, disk access accounted for ~15 milliseconds of a ~16 millisecond
object load request. IOW, disk access times swamped network access times,
which in my measurements were on the order of 600-1000 microseconds.

 Questions:

 * We use a FileStorage - are there faster ones?

Not that I'm aware of.

 Can this be a bottleneck?

Not in and of itself AFAIK.  Your disk configuration
can matter a lot. How big is your database file?

The biggest issues with file storage for large databases are:

- packing can take a long time and can affect database
  performance a great deal.

- shut down and start up can take a long time to write and read
  an index file. God help you if you have to open a large database
  without an up-to-date index.

This is why I am (yet again) exploring a BerkeleyDB-based
storage implementation.
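
For the packing side, the usual mitigation is to schedule packs for
quiet periods.  A minimal sketch, assuming an open DB handle (the 7-day
retention is arbitrary; the zeopack script does the same against a
running ZEO server):

  # Pack away history older than 7 days.  Packing blocks while the
  # storage rewrites the data file, so run it from cron at a quiet time.
  db.pack(days=7)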

 * Is the ZEO protocol inefficient?

No, or at least not enough to matter.

 * Is the ZEO server just plain slow?

To some degree, yes, mainly because it is single threaded.

See my jim-thready-zeo2 branch.  I suspect that Python's GIL
will always put ZEO at a disadvantage relative to non-Python-based
database servers.

You can help by testing this. :)

 Thoughts I have that may have no impact.

 * rewrite ZEO or parts of it in C
 * write a C based storage

I don't think either of these will help in any significant way.

Jim

-- 
Jim Fulton

Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Chris Withers
Jim Fulton wrote:
 On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl ed...@zenoss.com wrote:
 ...
 I haven't dived into RelStorage yet but have heard it will perform
 better.
 
 It doesn't.  See:
 
   https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html

Jim,

That's a little sweeping and some might see you as a little biased in 
this ;-)

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
 - http://www.simplistix.co.uk



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Shane Hathaway
Jim Fulton wrote:
 On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl ed...@zenoss.com wrote:
 ...
 I haven't dived into RelStorage yet but have heard it will perform
 better.
 
 It doesn't.  See:
 
   https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html

Since you posted those numbers, I have worked hard to solve the 
performance issues you identified.  I have improved both RelStorage and 
the performance test, which is now a separate project (released on PyPI) 
named zodbshootout.  Your numbers may apply to RelStorage 1.2, but not 1.4.

See:

http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/

Shane


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Jim Fulton
On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway sh...@hathawaymix.org wrote:
 Jim Fulton wrote:

 On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl ed...@zenoss.com wrote:
 ...

 I haven't dived into RelStorage yet but have heard it will perform
 better.

 It doesn't.  See:

  https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html

 Since you posted those numbers, I have worked hard to solve the performance
 issues you identified.  I have improved both RelStorage and the performance
 test, which is now a separate project (released on PyPI) named zodbshootout.
  Your numbers may apply to RelStorage 1.2, but not 1.4.

 See:

 http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/

I won't take the time now to analyze the new test, although I will ask
a couple of questions:

First, in your results, you show cold, warm and hot numbers. Do these
correspond to my cold, hot and steamin numbers?

Second, does the test still write and then read roughly the same
amount of data as before?

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Shane Hathaway
Erik Dahl wrote:
 I haven't dived into RelStorage yet but have heard it will perform
 better.  Not sure I understand why, though.  Isn't it just putting
 pickles into a single table with an index on the oid? (or oid / serial).

Yes.  In theory, ZEO should be about the same speed, but all the 
measurements I've done suggest ZEO has excessively high internal 
latency.  I suspect ZEO uses Python sockets inefficiently, or that the 
GIL may be getting in the way.  I know FileStorage isn't the problem, 
since bare FileStorage can read or write about 20,000 objects per second 
on my laptop.
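
For anyone who wants to reproduce that kind of number, here's a rough
micro-benchmark sketch (file name and counts are arbitrary, and OS file
caching will flatter the read side):

  import time
  import transaction
  import persistent
  from BTrees.IOBTree import IOBTree
  from ZODB import DB
  from ZODB.FileStorage import FileStorage

  class Obj(persistent.Persistent):
      pass

  db = DB(FileStorage('bench.fs'))
  conn = db.open()
  root = conn.root()

  start = time.time()
  root['objs'] = objs = IOBTree()
  for i in range(20000):
      o = Obj()
      o.value = i
      objs[i] = o
  transaction.commit()
  print '%d writes/sec' % (20000 / (time.time() - start))

  # Re-open so reads come from the storage, not the connection cache.
  conn.close()
  db.close()
  db = DB(FileStorage('bench.fs'))
  conn = db.open()
  start = time.time()
  total = sum(o.value for o in conn.root()['objs'].values())
  print '%d reads/sec' % (20000 / (time.time() - start))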

Jim is working on a newer version of ZEO, so maybe that will close the 
performance gap.  For now, all the people who have told me they are 
using RelStorage seem to be happy with the improved speed.

Shane



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Shane Hathaway
Jim Fulton wrote:
 On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway sh...@hathawaymix.org wrote:
 http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/
 
 I won't take the time now to analyze the new test, although I will ask
 a couple of questions:
 
 First, in your results, you show cold, warm and hot numbers. Do these
 correspond to my cold, hot and steamin numbers?

zodbshootout still produces steamin numbers, but I didn't include them 
on the web page because they would dominate the chart.  I added the 
warm test after your speedtest modifications.  My cold, hot, and 
steamin numbers correspond with your cold, hot, and steamin numbers. 
The steamin numbers are about the same for RelStorage and ZEO.  See this 
page for a chart that includes steamin numbers:

http://pypi.python.org/pypi/zodbshootout#interpreting-the-results

 Second, does the test still write and then read roughly the same
 amount of data as before?

That is a command line option.  The chart on the web page shows reading 
and writing 1000 small persistent objects per transaction, and the 
object count is corrected now.  (OTOH, someone could claim the object 
count is off by one for some of the tests, and if that turns out 
significant, I can correct that.)

Shane



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Jim Fulton
On Fri, Dec 4, 2009 at 3:41 PM, Shane Hathaway sh...@hathawaymix.org wrote:
 Jim Fulton wrote:

 On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway sh...@hathawaymix.org
 wrote:

 http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/

 I won't take the time now to analyze the new test, although I will ask
 a couple of questions:

 First, in your results, you show cold, warm and hot numbers. Do these
 correspond to my cold, hot and steamin numbers?

 zodbshootout still produces steamin numbers, but I didn't include them on
 the web page because they would dominate the chart.  I added the warm test
 after your speedtest modifications.  My cold, hot, and steamin numbers
 correspond with your cold, hot, and steamin numbers. The steamin numbers are
 about the same for RelStorage and ZEO.  See this page for a chart that
 includes steamin numbers:

 http://pypi.python.org/pypi/zodbshootout#interpreting-the-results

I find this a bit confusing.  For the warm numbers, it looks like ZEO didn't
utilize a persistent cache, which explains why the ZEO numbers are the
same for hot and cold. Is that right?

What poll interval are you using for RelStorage in the tests?

Assuming an application gets reasonable cache hit rates, I don't see any
meaningful difference between ZEO and RelStorage in these numbers.

 Second, does the test still write and then read roughly the same
 amount of data as before?

 That is a command line option.  The chart on the web page shows reading and
 writing 1000 small persistent objects per transaction,

Which is why I consider this benchmark largely moot.  The database is
small enough to fit in the server's disk cache.  Even the slowest
access times are on the order of 0.5 milliseconds.  Disk accesses are
typically measured in 10s of milliseconds.  With magnetic disks, for
databases substantially larger than the server's RAM, the network
component of loading objects will be noise compared to the disk access.

The only real difference between RelStorage and ZEO is in cold numbers
for databases in RAM.  This measures the raw low-level server network
performance.  MySQL and Postgres servers are obviously faster than ZEO
at this level, but networking is a small component of the overall
workload.

Jim

-- 
Jim Fulton



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Shane Hathaway
Jim Fulton wrote:
 I find this a bit confusing.  For the warm numbers, it looks like ZEO didn't
 utilize a persistent cache, which explains why the ZEO numbers are the
 same for hot and cold. Is that right?

Yes.  It is currently difficult to set up ZEO caches, which I consider 
an issue with this early version of zodbshootout.  zodbshootout does 
include a sample test configuration that turns on a ZEO cache, but it's 
not possible to run that configuration with a concurrency level greater 
than 1.

 What poll interval are you using for RelStorage in the tests?

 Assuming an application gets reasonable cache hit rates, I don't see any
 meaningful difference between ZEO and RelStorage in these numbers.

You are entitled to your opinion. :-)  Personally, I have observed a 
huge improvement for many operations.

 Second, does the test still write and then read roughly the same
 amount of data as before?
 That is a command line option.  The chart on the web page shows reading and
 writing 1000 small persistent objects per transaction,
 
 Which is why I consider this benchmark largely moot.  The database is
 small enough to fit in the server's disk cache.  Even the slowest
 access times are on the order of 0.5 milliseconds.  Disk accesses are
 typically measured in 10s of milliseconds.  With magnetic disks, for
 databases substantially larger than the server's RAM, the network
 component of loading objects will be noise compared to the disk
 access.

That's why I think solid state disk is already a major win, 
economically, for large ZODB setups.  The FusionIO cards in particular 
are likely to be at least as reliable as any disk.  It's time to change 
the way we think about seek time.

Shane
