Re: [ZODB-Dev] Speeding up ZODB (was "redis cache for RelStorage")

2011-05-05 Thread Jim Fulton
On Thu, May 5, 2011 at 4:43 AM, Pedro Ferreira wrote:

...

> Since we are talking about speed, does anyone have any tips on making
> ZODB (in general) faster?

It's hard to give general advice.  The basic thing to be aware of is
that loads from the ZEO cache have some cost, but are fairly cheap,
especially if your ZEO cache can fit in the local operating-system
disk cache. (You may want to add RAM to your clients.)

Loads from the storage server are quite a bit more expensive. If you
have a large database (hundreds of gigabytes), then disk access times
on your storage server can also be a major factor.

Effective use of the ZEO cache can make a big difference.  To decide
how big your ZEO cache should be, turn on ZEO cache tracing, starting
from an empty cache, and use ZEO/scripts/cache_simul.py to experiment
with different cache sizes.

To enable cache tracing, run your application with the ZEO_CACHE_TRACE
environment variable set to a non-empty value.
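
As a minimal sketch of the step above (the helper names and the example
command are hypothetical, not part of ZEO), a small launcher can set the
variable before starting the application:

```python
import os
import subprocess


def env_with_cache_trace(base_env=None):
    """Return a copy of the environment with ZEO_CACHE_TRACE set."""
    env = dict(os.environ if base_env is None else base_env)
    # Any non-empty value turns tracing on; "1" is an arbitrary choice.
    env["ZEO_CACHE_TRACE"] = "1"
    return env


def run_traced(argv):
    """Run an application command (placeholder) with cache tracing enabled."""
    if not argv:
        raise ValueError("need a command to run")
    return subprocess.call(argv, env=env_with_cache_trace())


# Hypothetical usage: run_traced(["bin/instance", "fg"])
```

Remember to start from an empty cache, so the trace reflects a cold
start.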

> In our project, the DB is apparently the
> bottleneck, and we are considering implementing a memcache layer in
> order to avoid fetching so often from DB.

You should look at your ZEO cache configuration first.  You may be
able to improve performance quite a bit by simply increasing your ZEO
cache sizes.  If you have larger ZEO caches, you should probably use
persistent caches. You probably also want to set:

- drop-cache-rather-verify to true on the client
- invalidation-age on the server to at least an hour or two, so that
  clients can be disconnected from the storage server for a
  reasonable period of time without having to verify their caches.
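
To make those settings concrete, here is a rough sketch that renders the
relevant configuration sections as text. The helper functions, the
address, the cache size and the client name are all made up for
illustration; it also assumes that naming a client with the "client" key
is what makes the cache persistent and that invalidation-age is given in
seconds, so check both against your ZEO version.

```python
def zeo_client_section(server, cache_size, client_name):
    """Render a <zeoclient> section with a large, persistent cache."""
    return "\n".join([
        "<zeoclient>",
        "  server %s" % server,
        "  cache-size %s" % cache_size,
        # Assumption: a client name enables a persistent cache file.
        "  client %s" % client_name,
        "  drop-cache-rather-verify true",
        "</zeoclient>",
    ])


def zeo_server_section(address, invalidation_age_seconds):
    """Render a server <zeo> section (age assumed to be in seconds)."""
    return "\n".join([
        "<zeo>",
        "  address %s" % address,
        "  invalidation-age %d" % invalidation_age_seconds,
        "</zeo>",
    ])


# Placeholder values: a 2 GB client cache and a two-hour invalidation age.
print(zeo_client_section("localhost:8100", "2GB", "app1"))
print(zeo_server_section("8100", 2 * 60 * 60))
```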

If you haven't, make sure binary data like photos, movies, whatever
are stored in blobs.

Consider compressing your database records:

  http://pypi.python.org/pypi/zc.zlibstorage

Not only will this save disk space, but it will allow more of your
database to fit in the storage server's disk cache.  It will also
allow your ZEO caches to store 2-3 times as many records for a given
cache size.
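
To get a feel for the effect on your own data, a quick stdlib-only
sketch can compare pickled and zlib-compressed sizes. This is not how
zc.zlibstorage is wired in (that happens at the storage level); the
function name and the sample record are invented for illustration, and
the ratio you see depends entirely on your records.

```python
import pickle
import zlib


def compression_ratio(obj, level=6):
    """Return pickled size divided by zlib-compressed size for obj."""
    raw = pickle.dumps(obj, protocol=2)
    packed = zlib.compress(raw, level)
    # Round-trip check: decompressing must give back the same data.
    assert pickle.loads(zlib.decompress(packed)) == obj
    return len(raw) / float(len(packed))


# Invented sample record with the kind of repetition real data often has.
sample = [
    {"id": i, "title": "Conference session %d" % i, "room": "Main auditorium"}
    for i in range(200)
]
print("ratio: %.1f" % compression_ratio(sample))
```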

A disadvantage of ZEO caches is that they aren't shared between
processes and I've been thinking of ways to leverage something like
memcached.

> However, we were also
> wondering if we could in some way take advantage of different computer
> hardware - since the ZEO server is mostly single-threaded we thought of
> getting a machine with higher clock freq and larger cache rather than a
> commodity 8-core server (which is what we are using now).

Is your storage server CPU bound?  Starting with ZODB 3.10, ZEO
storage servers are multi-threaded. They have a thread for each
client.  We have a storage server that has run at 120% cpu on a 4-core
box.  Also, if you use zc.FileStorage, packing is mostly done in a
separate process.

> Any tips on the kind of hardware that performs best with ZODB/ZEO?

A major source of slowdown *can* be disk access times. How's IO wait
on your server?  If IO wait is high, then consider adding RAM to get a
larger cache or moving the database to an SSD. We're running our
largest, most active database on an SSD.  (Blobs are still on magnetic
disk.) Again, compression can help a lot here, allowing databases to
fit on an SSD that otherwise wouldn't.

> Are
> there any adjustments that can be done at the OS or even application
> layer that might improve performance?

Look at how your application is using data. If you have requests that
have to load a lot of data, maybe you can refactor your application to
load less.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] How to check for setting the same values on persistent objects?

2011-05-05 Thread Alexandre Garel
On 04/05/2011 11:53, Hanno Schlichting wrote:
> Hi.
>
> I tried to analyze the overhead of changing content in Plone a bit. It
> turns out we write back a lot of persistent objects to the database,
> even though the actual values of these objects haven't changed.
>
> Digging deeper I tried to understand what happens here:
>
> 1. persistent.__setattr__ will always set _p_changed to True and thus
> cause the object to be written back
> 2. Some BTree buckets define the "VALUE_SAME" macro. If the macro is
> available and the new value is the same as the old, the change is
> ignored
> 3. The VALUE_SAME macro is only defined for the int, long and float
> value variants but not the object based ones
> 4. All code in Products.ZCatalog does explicit comparisons of the old
> and new value and ignores non-value-changes. I haven't seen any other
> code doing this.
>
> I'm assuming doing a general check for "old == new" is not safe, as it
> might not be implemented correctly for all objects and doing the
> comparison might be expensive.

I know very little about ZODB internals, but in Python "old == new"
does not mean "old is new".

I don't know exactly how ZODB retrieves a particular object, but I
assume it does this using _p_oid. So for persistent classes you could
check old._p_oid == new._p_oid. For strings and ints you can of course
use old is new.

Sorry if I'm wrong, as I may be missing a lot of things!
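
A rough sketch of that idea, with some caveats. Tracked is only a
stand-in that mimics "every attribute write marks the object changed";
it is not the real persistent base class, and same_value and
set_if_changed are hypothetical helpers. Since identity checks on
strings and ints depend on interpreter interning, the sketch uses == for
those simple types and treats anything else as changed.

```python
class Tracked(object):
    """Stand-in for a persistent object: any attribute write flags it."""

    def __init__(self, **attrs):
        for key, value in attrs.items():
            object.__setattr__(self, key, value)
        object.__setattr__(self, "_p_changed", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)


_MISSING = object()
SIMPLE_TYPES = (str, bytes, int, float, bool, type(None))


def same_value(old, new):
    """Conservative check: True only when we are sure nothing changed."""
    if old is new:
        return True
    old_oid = getattr(old, "_p_oid", None)
    if old_oid is not None:
        # Objects that carry an oid are compared by oid.
        return old_oid == getattr(new, "_p_oid", None)
    if isinstance(old, SIMPLE_TYPES) and type(old) is type(new):
        return old == new  # cheap and well defined for simple types
    return False  # unknown type: assume it changed


def set_if_changed(obj, name, value):
    """Assign only when the value differs; return True if a write happened."""
    old = getattr(obj, name, _MISSING)
    if old is not _MISSING and same_value(old, value):
        return False
    setattr(obj, name, value)
    return True
```

With this, set_if_changed(doc, "title", doc.title) leaves _p_changed
untouched, while assigning a different title still marks the object as
changed.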

Alex


Re: [ZODB-Dev] Speeding up ZODB (was "redis cache for RelStorage")

2011-05-05 Thread Hanno Schlichting
On Thu, May 5, 2011 at 10:43 AM, Pedro Ferreira wrote:
> Since we are talking about speed, does anyone have any tips on making
> ZODB (in general) faster?

Query fewer objects from the database. Make sure you don't store lots
of tiny persistent objects in the database; I'd aim for storing data
in chunks of 8-32 KB, or use blobs for larger objects. Remember that
ZODB is a key/value storage for the most part. Model your data
accordingly.
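
As an illustration of the chunking idea only (the function name and the
16 KB target are arbitrary picks within the range above), small items
can be grouped by pickled size so that each group could be stored as a
single record instead of many tiny ones:

```python
import pickle


def chunk_by_pickled_size(items, target_bytes=16 * 1024):
    """Group items into lists whose summed pickle size stays near the target."""
    chunks, current, current_size = [], [], 0
    for item in items:
        size = len(pickle.dumps(item, protocol=2))
        # Start a new chunk once adding this item would exceed the target.
        if current and current_size + size > target_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Made-up data: 1000 small strings end up in a handful of chunks.
items = ["x" * 100 for _ in range(1000)]
chunks = chunk_by_pickled_size(items)
print(len(chunks), "chunks for", len(items), "items")
```

The sizes are estimates: pickling a whole chunk at once shares some
overhead, so real records will be somewhat smaller than the sum.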

> In our project, the DB is apparently the
> bottleneck, and we are considering implementing a memcache layer in
> order to avoid fetching so often from DB.

Before you do that, you might consider switching to RelStorage, which
already has a memcached caching layer in addition to the connection
caches.

But remember that throwing more caches at the problem isn't a
solution. It's likely the way you store or query the data from the
database that's not optimal.

> However, we were also
> wondering if we could in some way take advantage of different computer
> hardware - since the ZEO server is mostly single-threaded we thought of
> getting a machine with higher clock freq and larger cache rather than a
> commodity 8-core server (which is what we are using now).

The ZEO server needs almost no CPU power, except for garbage
collection and packing. During normal operations the CPU speed should
be irrelevant.

> Any tips on the kind of hardware that performs best with ZODB/ZEO? Are
> there any adjustments that can be done at the OS or even application
> layer that might improve performance?

Faster disks. Whatever you can do to get faster disks will help
performance. But that's general advice that applies to all database
servers. You can also throw more memory at the db server, so the
operating system's disk cache will kick in and you'll actually read
data from memory instead of the disks.

Hanno


[ZODB-Dev] Speeding up ZODB (was "redis cache for RelStorage")

2011-05-05 Thread Pedro Ferreira

> I suspect either one is so fast that the speed of Redis or Memcached is 
> irrelevant.  If you want speed, minimize the latency of the *network*, 
> and that means getting good network hardware.

Since we are talking about speed, does anyone have any tips on making
ZODB (in general) faster? In our project, the DB is apparently the
bottleneck, and we are considering implementing a memcache layer in
order to avoid fetching so often from DB. However, we were also
wondering if we could in some way take advantage of different computer
hardware - since the ZEO server is mostly single-threaded we thought of
getting a machine with higher clock freq and larger cache rather than a
commodity 8-core server (which is what we are using now).
Any tips on the kind of hardware that performs best with ZODB/ZEO? Are
there any adjustments that can be done at the OS or even application
layer that might improve performance?

Thanks,

Pedro

-- 
José Pedro Ferreira

Software Developer, Indico Project
http://indico-software.org

CERN - European Organization for Nuclear Research
1211 Geneve 23, Switzerland
IT-UDS-AVC
Office: 513-R-0042
Tel. +41227677159