Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-07 Thread Jim Fulton
On Mon, Dec 7, 2009 at 1:46 PM, Erik Dahl  wrote:
> Biggest DBs we see are in the 10GB range.  Most are significantly smaller
> (100s of MB) so I don't think size is an issue.  We typically run big
> persistent caches so I don't think there is much help to be had there.  It
> would only make reads faster anyway, right?

Yes, but reads typically dominate performance -- of course each
application is different.

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-07 Thread Erik Dahl
Biggest DBs we see are in the 10GB range.  Most are significantly  
smaller (100s of MB) so I don't think size is an issue.  We typically  
run big persistent caches so I don't think there is much help to be  
had there.  It would only make reads faster anyway, right?

-EAD



On Dec 7, 2009, at 11:21 AM, Jim Fulton wrote:

> On Mon, Dec 7, 2009 at 11:08 AM, Erik Dahl  wrote:
>> Guys,
>>
>> Thanks for all the great feedback.  Still processing it but here are
>> some things we will try.
>>
>>  We will also
>> look at tuning our current ZEO setup.  Last time I looked there was
>> only the invalidation queue.  I poked around a bit for tuning docs and
>> didn't see any.  Can someone point me to them?
>
> The biggest problem ZODB has in general is docs.
>
> I'd consider tuning your ZEO client caches and using
> persistent caches.
>
> I'm still curious how big your database is.
>
>> Our slow loading object was a persistent object with a regular list
>> inside of the main pickle.  Objects that the list pointed to were
>> persistent, which I believe means that they will load separately.
>
> Yes.
>
> Jim
>
> -- 
> Jim Fulton



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-07 Thread Jim Fulton
On Mon, Dec 7, 2009 at 11:08 AM, Erik Dahl  wrote:
> Guys,
>
> Thanks for all the great feedback.  Still processing it but here are
> some things we will try.
>
> We will also
> look at tuning our current ZEO setup.  Last time I looked there was
> only the invalidation queue.  I poked around a bit for tuning docs and
> didn't see any.  Can someone point me to them?

The biggest problem ZODB has in general is docs.

I'd consider tuning your ZEO client caches and using
persistent caches.
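
For example, something like this (a sketch; the address, cache size,
and paths are made up -- tune them for your setup):

  from ZEO.ClientStorage import ClientStorage
  from ZODB import DB

  storage = ClientStorage(
      ('zeoserver', 8100),           # your ZEO server address
      cache_size=200 * 1024 * 1024,  # much bigger than the ~20MB default
      client='zeo1',                 # naming the client makes the cache
      var='/var/zeo',                #   persistent, stored under var
      )
  db = DB(storage)

A persistent cache survives restarts, so a freshly started client
doesn't have to re-fetch its whole working set from the server.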

I'm still curious how big your database is.

> Our slow loading object was a persistent object with a regular list
> inside of the main pickle.  Objects that the list pointed to were
> persistent, which I believe means that they will load separately.

Yes.

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-07 Thread Erik Dahl
Guys,

Thanks for all the great feedback.  Still processing it but here are
some things we will try.

RelStorage - we'll try it in our app context to see if it helps /
hurts, and will report back results.  Quick tests show some
improvement.  We will also look at tuning our current ZEO setup.  Last
time I looked there was only the invalidation queue.  I poked around a
bit for tuning docs and didn't see any.  Can someone point me to them?

zc.catalogqueue - we are adding fullish indexing to our objects (they
aren't text pages so it's not truly a "full" index).  Hopefully moving
indexing out to a separate process will keep the impact of the new
index low and help with our current conflict issues, roughly as
sketched below.
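
Roughly the shape we have in mind (just a sketch with made-up names,
not zc.catalogqueue's actual API): writers append oids to a small
persistent queue instead of touching the indexes, and one dedicated
process drains it and does the conflict-prone catalog updates.

  import persistent
  from BTrees.IOBTree import IOBTree

  class IndexQueue(persistent.Persistent):
      # Toy queue of oids waiting to be indexed.  A production queue
      # (zc.catalogqueue, presumably) would spread entries across
      # several buckets so the queue itself doesn't become a hotspot.

      def __init__(self):
          self._data = IOBTree()

      def put(self, oid):            # called by writers
          try:
              key = self._data.maxKey() + 1
          except ValueError:         # queue is empty
              key = 0
          self._data[key] = oid

      def pull(self, limit=100):     # called by the one indexing process
          oids = []
          for key in list(self._data.keys()[:limit]):
              oids.append(self._data[key])
              del self._data[key]
          return oids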

Our slow loading object was a persistent object with a regular list
inside of the main pickle.  Objects that the list pointed to were
persistent, which I believe means that they will load separately.  In
general we have tried to make our persistent objects reasonably large
to lower the number of load round trips.  I haven't actually checked
its size yet but it will be interesting to see.

-EAD



On Dec 4, 2009, at 6:21 PM, Shane Hathaway wrote:

> Jim Fulton wrote:
>> I find this a bit confusing.  For the warm numbers, it looks like
>> ZEO didn't
>> utilize a persistent cache, which explains why the ZEO numbers are  
>> the
>> same for hot and cold. Is that right?
>
> Yes.  It is currently difficult to set up ZEO caches, which I  
> consider an issue with this early version of zodbshootout.   
> zodbshootout does include a sample test configuration that turns on  
> a ZEO cache, but it's not possible to run that configuration with a  
> concurrency level greater than 1.
>
>> What poll interval are you using for relstorage in the tests?
>> Assuming an application gets reasonable cache hit rates, I don't  
>> see any
>> meaningful difference between ZEO and relstorage in these numbers.
>
> You are entitled to your opinion. :-)  Personally, I have observed a  
> huge improvement for many operations.
>
>>>> Second, does the test still write and then read roughly the same
>>>> amount of data as before?
>>> That is a command line option.  The chart on the web page shows  
>>> reading and
>>> writing 1000 small persistent objects per transaction,
>> Which is why I consider this benchmark largely moot. The database is
>> small enough to fit in the server's disk cache.  Even the slowest
>> access times are on the order of 0.5 milliseconds. Disk accesses are
>> typically measured in 10s of milliseconds.  With magnetic disks, for
>> databases substantially larger than the server's RAM, the network
>> component of loading objects will be noise compared to the disk
>> access.
>
> That's why I think solid state disk is already a major win,  
> economically, for large ZODB setups.  The FusionIO cards in  
> particular are likely to be at least as reliable as any disk.  It's  
> time to change the way we think about seek time.
>
> Shane
>



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Shane Hathaway
Jim Fulton wrote:
> I find this a bit confusing.  For the warm numbers, it looks like ZEO didn't
> utilize a persistent cache, which explains why the ZEO numbers are the
> same for hot and cold. Is that right?

Yes.  It is currently difficult to set up ZEO caches, which I consider 
an issue with this early version of zodbshootout.  zodbshootout does 
include a sample test configuration that turns on a ZEO cache, but it's 
not possible to run that configuration with a concurrency level greater 
than 1.

> What poll interval are you using for relstorage in the tests?
> 
> Assuming an application gets reasonable cache hit rates, I don't see any
> meaningful difference between ZEO and relstorage in these numbers.

You are entitled to your opinion. :-)  Personally, I have observed a 
huge improvement for many operations.

>>> Second, does the test still write and then read roughly the same
>>> amount of data as before?
>> That is a command line option.  The chart on the web page shows reading and
>> writing 1000 small persistent objects per transaction,
> 
> Which is why I consider this benchmark largely moot. The database is
> small enough to fit in the server's disk cache.  Even the slowest
> access times are on the order of 0.5 milliseconds. Disk accesses are
> typically measured in 10s of milliseconds.  With magnetic disks, for
> databases substantially larger than the server's RAM, the network
> component of loading objects will be noise compared to the disk access.

That's why I think solid state disk is already a major win, 
economically, for large ZODB setups.  The FusionIO cards in particular 
are likely to be at least as reliable as any disk.  It's time to change 
the way we think about seek time.

Shane



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Jim Fulton
On Fri, Dec 4, 2009 at 3:41 PM, Shane Hathaway  wrote:
> Jim Fulton wrote:
>>
>> On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway 
>> wrote:
>>>
>>> http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/
>>
>> I won't take the time now to analyze the new test, although I will ask
>> a couple of questions:
>>
>> First, in your results, you show cold, warm and hot numbers. Do these
>> correspond to my cold, hot and steamin numbers?
>
> zodbshootout still produces steamin numbers, but I didn't include them on
> the web page because they would dominate the chart.  I added the "warm" test
> after your speedtest modifications.  My cold, hot, and steamin numbers
> correspond with your cold, hot, and steamin numbers. The steamin numbers are
> about the same for RelStorage and ZEO.  See this page for a chart that
> includes steamin numbers:
>
> http://pypi.python.org/pypi/zodbshootout#interpreting-the-results

I find this a bit confusing.  For the warm numbers, it looks like ZEO didn't
utilize a persistent cache, which explains why the ZEO numbers are the
same for hot and cold. Is that right?

What poll interval are you using for relstorage in the tests?

Assuming an application gets reasonable cache hit rates, I don't see any
meaningful difference between ZEO and relstorage in these numbers.

>> Second, does the test still write and then read roughly the same
>> amount of data as before?
>
> That is a command line option.  The chart on the web page shows reading and
> writing 1000 small persistent objects per transaction,

Which is why I consider this benchmark largely moot. The database is
small enough to fit in the server's disk cache.  Even the slowest
access times are on the order of 0.5 milliseconds. Disk accesses are
typically measured in 10s of milliseconds.  With magnetic disks, for
databases substantially larger than the server's RAM, the network
component of loading objects will be noise compared to the disk access.

The only real difference between relstorage and ZEO is in cold numbers
for databases in RAM.  This measures the raw low-level server network
performance.  MySQL and PostgreSQL servers are obviously faster than
ZEO at this level, but networking is a small component of the overall
workload.

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Shane Hathaway
Jim Fulton wrote:
> On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway  wrote:
>> http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/
> 
> I won't take the time now to analyze the new test, although I will ask
> a couple of questions:
> 
> First, in your results, you show cold, warm and hot numbers. Do these
> correspond to my cold, hot and steamin numbers?

zodbshootout still produces steamin numbers, but I didn't include them 
on the web page because they would dominate the chart.  I added the 
"warm" test after your speedtest modifications.  My cold, hot, and 
steamin numbers correspond with your cold, hot, and steamin numbers. 
The steamin numbers are about the same for RelStorage and ZEO.  See this 
page for a chart that includes steamin numbers:

http://pypi.python.org/pypi/zodbshootout#interpreting-the-results

> Second, does the test still write and then read roughly the same
> amount of data as before?

That is a command line option.  The chart on the web page shows reading 
and writing 1000 small persistent objects per transaction, and the 
object count is corrected now.  (OTOH, someone could claim the object 
count is off by one for some of the tests, and if that turns out to be
significant, I can correct that.)

Shane



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Shane Hathaway
Erik Dahl wrote:
> I haven't dove into RelStorage yet but have heard it will perform
> better.  Not sure I understand why though.  Isn't it just putting
> pickles into a single table with an index on the oid (or oid / serial)?

Yes.  In theory, ZEO should be about the same speed, but all the 
measurements I've done suggest ZEO has excessively high internal 
latency.  I suspect ZEO uses Python sockets inefficiently, or that the 
GIL may be getting in the way.  I know FileStorage isn't the problem, 
since bare FileStorage can read or write about 20,000 objects per second 
on my laptop.
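
A rough way to reproduce that kind of number (a sketch; point it at
whatever Data.fs you have handy):

  import time
  from ZODB.FileStorage import FileStorage

  fs = FileStorage('Data.fs', read_only=True)

  # Collect some oids by scanning transaction records.
  oids = []
  for txn in fs.iterator():
      for record in txn:
          oids.append(record.oid)
      if len(oids) >= 20000:
          break

  # Time raw pickle loads: no network, no unpickling.
  start = time.time()
  for oid in oids:
      fs.load(oid, '')
  print len(oids) / (time.time() - start), 'loads per second'

There is no ZEO in that loop, so the gap between this number and what
a ZEO client sees is ZEO's overhead.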

Jim is working on a newer version of ZEO, so maybe that will close the 
performance gap.  For now, all the people who have told me they are 
using RelStorage seem to be happy with the improved speed.

Shane



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Jim Fulton
On Fri, Dec 4, 2009 at 3:07 PM, Shane Hathaway  wrote:
> Jim Fulton wrote:
>>
>> On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl  wrote:
>> ...
>>>
>>> I haven't dove into RelStorage yet but have heard it will perform
>>> better.
>>
>> It doesn't.  See:
>>
>>  https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html
>
> Since you posted those numbers, I have worked hard to solve the performance
> issues you identified.  I have improved both RelStorage and the performance
> test, which is now a separate project (released on PyPI) named zodbshootout.
>  Your numbers may apply to RelStorage 1.2, but not 1.4.
>
> See:
>
> http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/

I won't take the time now to analyze the new test, although I will ask
a couple of questions:

First, in your results, you show cold, warm and hot numbers. Do these
correspond to my cold, hot and steamin numbers?

Second, does the test still write and then read roughly the same
amount of data as before?

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Shane Hathaway
Jim Fulton wrote:
> On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl  wrote:
> ...
>> I haven't dove into RelStorage yet but have heard it will perform
>> better.
> 
> It doesn't.  See:
> 
>   https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html

Since you posted those numbers, I have worked hard to solve the 
performance issues you identified.  I have improved both RelStorage and 
the performance test, which is now a separate project (released on PyPI) 
named zodbshootout.  Your numbers may apply to RelStorage 1.2, but not 1.4.

See:

http://shane.willowrise.com/archives/relstorage-1-4-0b1-and-zodbshootout/

Shane


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Chris Withers
Jim Fulton wrote:
> On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl  wrote:
> ...
>> I haven't dove into RelStorage yet but have heard it will perform
>> better.
> 
> It doesn't.  See:
> 
>   https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html

Jim,

That's a little sweeping and some might see you as a little biased in 
this ;-)

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
 - http://www.simplistix.co.uk



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Jim Fulton
On Fri, Dec 4, 2009 at 9:41 AM, Erik Dahl  wrote:
> Guys,
>
> We have a product written in python using ZODB/ZEO and I would like to
> improve the speed of the database in general.  Of the things I have
> seen that I would like to improve, some I understand and some I don't.

How have you "seen" these? Do you have evidence that they are
affecting your applications specifically?

> 1. Loading of largish pickles (not huge; the object had a list with
> around 20K references in it) can be very slow.  OK, what size pickle?
> I'm not 100% sure; is there a way to get the pickle size of a zodb
> persistent object?  I lamely tried to pickle one of our persistent
> objects and of course it blew up with max recursion because it went
> beyond the normal bounds of a zodb persistent pickle.  There must be a
> way to do this though, right?

You can get the pickle and thus the pickle size for an object fairly
straightforwardly:

  # Ask the object's storage for its current record; load() returns
  # the pickle data and the serial ('' means the current, unversioned
  # revision).
  p, s = ob._p_jar.db().storage.load(ob._p_oid, '')
  print len(p)  # pickle size in bytes

Pickling is actually pretty fast.

By list, do you mean a Python list (or persistent list)?
When such an object is loaded, the database will have to instantiate
all those objects. That might be time consuming.  In general, I'd avoid
large lists or persistent lists, especially if they are updated a lot.

If you try to access those objects all at once, then those objects
will have to be loaded from the database, and that can be very
time consuming.
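
To make the tradeoff concrete (a sketch; 'root' is a connection root
and 'Thing' stands in for your class):

  import persistent
  from BTrees.IOBTree import IOBTree

  class Thing(persistent.Persistent):
      pass

  # A plain list of 20,000 persistent objects is stored as one big
  # pickle holding 20,000 object references; any change rewrites the
  # whole record, and iterating still loads each object separately.
  root['things'] = [Thing() for i in range(20000)]

  # A BTree spreads the same references over many small buckets, so
  # reads and writes touch only a few small records and concurrent
  # writers are much less likely to conflict.
  things = IOBTree()
  for i in range(20000):
      things[i] = Thing()
  root['things'] = things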

> 2. Writing lots of objects.  I know that zodb wasn't written for this
> type of use case but we have backed into it.

While some relational databases have greater transaction throughput,
you can get fair write throughput with ZODB.

>  We can have many (~30 is
> that a lot?)

That's on the order of what we have.

> zodb clients and as a result large numbers of cache
> invalidations can be sent when a write occurs.  Could invalidation
> performance / cache refresh be an issue?

I don't think so. Invalidations aren't that large and are sent asynchronously.

> 3. DB hot spots.  Of course we see conflict errors when there are lots
> of writes to the db from different clients that touch the same
> object.  We haven't done a bunch of optimization work here but I'm
> thinking of moving all indexing out to a separate client/process that
> reads off a queue to find objects to index.  I'm guessing the indexes
> are a hotspot (haven't tested this out much though I guess b-tree's
> buckets should alleviate this problem some).  (is there a persistent
> queue around?)

Hotspots are an issue in any database.  It is generally worth
refactoring the application to avoid them. :)

> Anyway these are some things that come to mind when I think of
> performance issues.  I have the thought that many could be made better
> with faster ZEO I/O.  Does this seem like a good assumption?  If so
> what could we do to make ZEO faster?

There are 2 major issues with ZEO that I'm aware of:

- Each object load requires a round trip.  If you have a list
  of 20,000 objects and you want to iterate over them, that
  will require 20,000 separate object load requests, each requiring
  a network round trip.

  We've discussed schemes to give the database hints to pre-fetch
  objects, thus avoiding serial round trips.

- ZEO servers are currently single threaded. This can hurt a lot if you
  have many clients and prevents you from taking advantage of multi-
  spindle storage systems.

I'll note that if you have a large database, a substantial amount of time
can be spent doing disk IO.  In a recent test we performed with a 500GB
database, disk access accounted for ~15 milliseconds of a ~16 millisecond
object load request. IOW, disk access times swamped network access times,
which in my measurements were on the order of 600-1000 microseconds.

> Questions:
>
> * We use a filestorage; are there faster ones?

Not that I'm aware of.

> Can this be a bottleneck?

Not in and of itself AFAIK.  Your disk configuration
can matter a lot. How big is your database file?

The biggest issues with file storage for large databases are:

- packing can take a long time and can affect database
  performance a great deal (the pack call itself is one line; see the
  sketch after this list).

- shutdown and startup can take a long time to write and read
  an index file. God help you if you have to open a large database
  without an up-to-date index.
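
The pack call itself is trivial (a sketch; 'storage' is an open
FileStorage):

  import time
  from ZODB.serialize import referencesf

  # Pack away history older than a week.  The expensive part is the
  # scan of the whole file, not the call itself.
  storage.pack(time.time() - 7 * 86400, referencesf)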

This is why I am (yet again) exploring a BerkeleyDB-based
storage implementation.

> * Is the ZEO protocol inefficient?

No, or at least not enough to matter.

> * Is the ZEO server just plain slow?

To some degree, yes, mainly because it is single threaded.

See my jim-thready-zeo2 branch.  I suspect that Python's GIL
will always put ZEO at a disadvantage relative to non-Python-based
database servers.

You can help by testing this. :)

> Thoughts I have that may have no impact.
>
> * rewrite ZEO or parts of it in C
> * write a C based storage

I don't think either of these will help in any significant way.

Jim

-- 
Jim Fulton

Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Jim Fulton
On Fri, Dec 4, 2009 at 11:31 AM, Erik Dahl  wrote:
...
> I haven't dove into RelStorage yet but have heard it will perform
> better.

It doesn't.  See:

  https://mail.zope.org/pipermail/zodb-dev/2009-October/012758.html

-- 
Jim Fulton


Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Erik Dahl
Right, I have looked at the NoSQL stuff a good bit.  MongoDB is
interesting but has no transactions, and its sharding is alpha code.
CouchDB has transactions but no sharding at all.  HBase has what I
want, but I can't get my brain around how to use the wacky schema.

Unfortunately, this isn't a new project, so if we could get more out of
what we have that would be great, but clearly swapping out the back end
may become necessary.

I haven't dove into RelStorage yet but have heard it will perform
better.  Not sure I understand why though.  Isn't it just putting
pickles into a single table with an index on the oid (or oid / serial)?

-EAD



On Dec 4, 2009, at 10:15 AM, Andreas Jung wrote:

> A project - at least an enterprise-level project - requires a careful
> choice of tools and backends. The ZODB is not a golden bullet for
> everything. Depending on the data model and the project's needs you
> have to look at relational databases or NoSQL databases as
> alternatives.
>
> And as you wrote: ZODB-based applications perform badly in heavy-write
> scenarios... you may raise the limits by using RelStorage, but perhaps
> other backends might be the better choice - something one must
> consider when starting a new project.
>
> -aj
>
> Am 04.12.09 15:41, schrieb Erik Dahl:
>> Guys,
>>
>> We have a product written in python using ZODB/ZEO and I would like
>> to improve the speed of the database in general.  Of the things I
>> have seen that I would like to improve, some I understand and some I
>> don't.
>>
>> 1. Loading of largish pickles (not huge; the object had a list with
>> around 20K references in it) can be very slow.  OK, what size pickle?
>> I'm not 100% sure; is there a way to get the pickle size of a zodb
>> persistent object?  I lamely tried to pickle one of our persistent
>> objects and of course it blew up with max recursion because it went
>> beyond the normal bounds of a zodb persistent pickle.  There must be a
>> way to do this though, right?
>>
>> 2. Writing lots of objects.  I know that zodb wasn't written for this
>> type of use case but we have backed into it.  We can have many (~30  
>> is
>> that a lot?) zodb clients and as a result large numbers of cache
>> invalidations can be sent when a write occurs.  Could invalidation
>> performance / cache refresh be an issue?
>>
>> 3. DB hot spots.  Of course we see conflict errors when there are  
>> lots
>> of writes to the db from different clients that touch the same
>> object.  We haven't done a bunch of optimization work here but I'm
>> thinking of moving all indexing out to a separate client/process that
>> reads off a queue to find objects to index.  I'm guessing the indexes
>> are a hotspot (haven't tested this out much though I guess b-tree's
>> buckets should alleviate this problem some).  (is there a persistent
>> queue around?)
>>
>> Anyway these are some things that come to mind when I think of
>> performance issues.  I have the thought that many could be made  
>> better
>> with faster ZEO I/O.  Does this seem like a good assumption?  If so
>> what could we do to make ZEO faster?
>>
>> Questions:
>>
>> * We use a filestorage; are there faster ones?  Can this be a
>> bottleneck?
>> * Is the ZEO protocol inefficient?
>> * Is the ZEO server just plain slow?
>>
>> Thoughts I have that may have no impact.
>>
>> * rewrite ZEO or parts of it in C
>> * write a C based storage
>>
>> Others?
>>
>> -EAD
>>
>>
>>
>> ___
>> For more information about ZODB, see the ZODB Wiki:
>> http://www.zope.org/Wikis/ZODB/
>>
>> ZODB-Dev mailing list  -  ZODB-Dev@zope.org
>> https://mail.zope.org/mailman/listinfo/zodb-dev
>>
>
>
> -- 
> ZOPYX Ltd. & Co KG  \  zopyx group
> Charlottenstr. 37/1  \  The full-service network for your
> D-72070 Tübingen  \  Python, Zope and Plone projects
> www.zopyx.com, i...@zopyx.com  \  www.zopyxgroup.com
> 
> E-Publishing, Python, Zope & Plone development, Consulting
>
>
> 



Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Andreas Jung
A project - at least an enterprise-level project - requires a careful
choice of tools and backends. The ZODB is not a golden bullet for
everything. Depending on the data model and the project's needs you
have to look at relational databases or NoSQL databases as
alternatives.

And as you wrote: ZODB-based applications perform badly in heavy-write
scenarios... you may raise the limits by using RelStorage, but perhaps
other backends might be the better choice - something one must consider
when starting a new project.

-aj

Am 04.12.09 15:41, schrieb Erik Dahl:
> Guys,
>
> We have a product written in python using ZODB/ZEO and I would like to
> improve the speed of the database in general.  Of the things I have
> seen that I would like to improve, some I understand and some I don't.
>
> 1. Loading of largish pickles (not huge; the object had a list with
> around 20K references in it) can be very slow.  OK, what size pickle?
> I'm not 100% sure; is there a way to get the pickle size of a zodb
> persistent object?  I lamely tried to pickle one of our persistent
> objects and of course it blew up with max recursion because it went
> beyond the normal bounds of a zodb persistent pickle.  There must be a
> way to do this though, right?
>
> 2. Writing lots of objects.  I know that zodb wasn't written for this  
> type of use case but we have backed into it.  We can have many (~30 is  
> that a lot?) zodb clients and as a result large numbers of cache  
> invalidations can be sent when a write occurs.  Could invalidation  
> performance / cache refresh be an issue?
>
> 3. DB hot spots.  Of course we see conflict errors when there are lots  
> of writes to the db from different clients that touch the same  
> object.  We haven't done a bunch of optimization work here but I'm  
> thinking of moving all indexing out to a separate client/process that  
> reads off a queue to find objects to index.  I'm guessing the indexes  
> are a hotspot (haven't tested this out much though I guess b-tree's  
> buckets should alleviate this problem some).  (is there a persistent  
> queue around?)
>
> Anyway these are some things that come to mind when I think of  
> performance issues.  I have the thought that many could be made better  
> with faster ZEO I/O.  Does this seem like a good assumption?  If so  
> what could we do to make ZEO faster?
>
> Questions:
>
> * We use a filestorage; are there faster ones?  Can this be a bottleneck?
> * Is the ZEO protocol inefficient?
> * Is the ZEO server just plain slow?
>
> Thoughts I have that may have no impact.
>
> * rewrite ZEO or parts of it in C
> * write a C based storage
>
> Others?
>
> -EAD
>
>
>
> ___
> For more information about ZODB, see the ZODB Wiki:
> http://www.zope.org/Wikis/ZODB/
>
> ZODB-Dev mailing list  -  ZODB-Dev@zope.org
> https://mail.zope.org/mailman/listinfo/zodb-dev
>   


-- 
ZOPYX Ltd. & Co KG  \  zopyx group
Charlottenstr. 37/1  \  The full-service network for your 
D-72070 Tübingen  \  Python, Zope and Plone projects
www.zopyx.com, i...@zopyx.com  \  www.zopyxgroup.com

E-Publishing, Python, Zope & Plone development, Consulting




Re: [ZODB-Dev] Making ZODB / ZEO faster

2009-12-04 Thread Fred Drake
On Fri, Dec 4, 2009 at 9:41 AM, Erik Dahl  wrote:
>  I'm guessing the indexes
> are a hotspot (haven't tested this out much though I guess b-tree's
> buckets should alleviate this problem some).  (is there a persistent
> queue around?)

Cataloging certainly can be a hot spot.  Check out zc.catalogqueue.


  -Fred

-- 
Fred L. Drake, Jr.
"Chaos is the score upon which reality is written." --Henry Miller