Re: [ZODB-Dev] Server-side caching

2012-02-13 Thread Pedro Ferreira

Dear Jim,

Thanks for your answer.


The OS' file-system cache acts as a storage server cache.  The storage
server does (essentially) no processing to data read from disk, so an
application-level cache would add nothing over the disk cache provided by
the storage server.


I see; then I guess it would be good to have at least as much RAM as the 
total size of the DB, no? From what I see on our server, the Linux buffer 
cache takes around 13 GB of the 16 GB available, while the rest is mostly 
taken by the ZEO process (1.7 GB). The database is 17 GB on disk.



Also note that, for better or worse, FileStorage uses an in-memory index
of current record positions, so no disk access is needed to find current data.


Yes, but pickles still have to be retrieved, right? I guess this would 
mean random access (for a database like ours, in which we have many 
small objects), which doesn't favor cache performance.


I'm asking this because in the tests we've run with SSDs we have seen a 
20% decrease in read time for non-client-cached objects. So, there 
seems to be some disk I/O going on.




In general, I'd say no.  It can depend on lots of details, including:

- database size
- active set size
- network speed
- memory and disk speeds on clients and servers
- ...


In any case, from what I see, these client caches cannot be shared 
between processes, which doesn't make them very useful for our setup, in 
which we have many parallel processes asking for the same objects over 
and over again.


Thanks once again,

Pedro


Re: [ZODB-Dev] Server-side caching

2012-02-13 Thread Laurence Rowe
On 13 February 2012 10:06, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote:
 The OS' file-system cache acts as a storage server cache.  The storage
 server does (essentially) no processing to data read from disk, so an
 application-level cache would add nothing over the disk cache provided by
 the storage server.


 I see; then I guess it would be good to have at least as much RAM as the
 total size of the DB, no? From what I see on our server, the Linux buffer
 cache takes around 13 GB of the 16 GB available, while the rest is mostly
 taken by the ZEO process (1.7 GB). The database is 17 GB on disk.

Adding enough memory so the database fits in RAM is always a good idea.

Since the introduction of blobs, this should be possible (and
relatively cheap) for most ZODB deployments. For Plone sites, a 30 GB
pre-blobs Data.fs typically falls to 2-3 GB with blobs.

There's also the wrapper storage zc.zlibstorage, which compresses ZODB
records, allowing more of the database to fit in RAM (RelStorage has an
option to compress records too).
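
For reference, a minimal sketch of wrapping a FileStorage this way,
assuming zc.zlibstorage is installed (file names are placeholders; only
records written through the wrapper are compressed, so existing data
stays uncompressed until you pack or copy the database):

import ZODB
import ZODB.FileStorage
import zc.zlibstorage

# Wrap the file storage; records are transparently compressed with
# zlib on write and decompressed on read.
storage = zc.zlibstorage.ZlibStorage(
    ZODB.FileStorage.FileStorage('Data.fs'))
db = ZODB.DB(storage)

conn = db.open()
root = conn.root()
# ... use the database as usual ...
conn.close()
db.close()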

 Also note that, for better or worse, FileStorage uses an in-memory index
 of current record positions, so no disk access is needed to find current
 data.


 Yes, but pickles still have to be retrieved, right? I guess this would mean
 random access (for a database like ours, in which we have many small
 objects), which doesn't favor cache performance.

 I'm asking this because in the tests we've run with SSDs we have seen a 20%
 decrease in read time for non-client-cached objects. So, there seems to
 be some disk I/O going on.

The mean performance improvement doesn't tell the whole story here.
With most of your database in the file-system cache, median read times
will be identical, but your 95th-percentile read times will show a
huge decrease, as the seek time of an SSD is orders of magnitude lower
than the seek time of a spinning disk.

Even when you have enough RAM for the OS to cache the database in
memory, I still think SSDs are worthwhile. Packing the database,
backing up, or any other operation that churns through the disk can
cause the database to drop out of the file-system cache. Be sure to
choose an SSD with capacitor backup so it won't lose your data; see:
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html.

 In general, I'd say no.  It can depend on lots of details, including:

 - database size
 - active set size
 - network speed
 - memory and disk speeds on clients and servers
 - ...


 In any case, from what I see, these client caches cannot be shared between
 processes, which doesn't make them very useful for our setup, in which we
 have many parallel processes asking for the same objects over and over again.

You could also try a ZEO fanout setup, where you have a ZEO server
running on each client machine. The intermediary ZEO's client cache
(you could put it on tmpfs if you have enough RAM) is then shared
between all the clients running on that machine; see the sketch below.
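
I haven't tried this exact snippet, so treat it as a sketch: a fanout
intermediary is just a ZEO server whose backing storage is a
ClientStorage pointed at the central server. The host names, ports and
cache sizes below are placeholders:

import ZEO.ClientStorage
import ZEO.StorageServer

# Client connection to the central ZEO server, with a large persistent
# cache; put the cache directory (var) on tmpfs if you have the RAM.
upstream = ZEO.ClientStorage.ClientStorage(
    ('zeo-main.example.org', 8100),   # central server (placeholder)
    client='fanout',                  # enables a persistent cache file
    var='/mnt/tmpfs/zeo-cache',       # directory for the cache file
    cache_size=4 * 1024 ** 3,         # 4 GB intermediary cache
)

# Re-serve that storage locally; clients on this machine connect to
# port 8101 and effectively share the intermediary's cache.
server = ZEO.StorageServer.StorageServer(
    ('127.0.0.1', 8101),
    {'1': upstream},
)
# ... then run the server's event loop.  In practice you would let
# runzeo do all of this, configuring the intermediary in zeo.conf with
# a <zeoclient> storage section rather than building it in Python.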

Laurence


Re: [ZODB-Dev] Server-side caching

2012-02-13 Thread Pedro Ferreira




between processes, which doesn't make them very useful for our setup, in which we


very useful *for our setup*


--
José Pedro Ferreira

Software Developer, Indico Project
http://indico-software.org

CERN - European Organization for Nuclear Research
1211 Geneve 23, Switzerland
IT-CIS-AVC, Office: 513-1-005
Tel. +41227677159


Re: [ZODB-Dev] Server-side caching

2012-02-13 Thread Jim Fulton
On Mon, Feb 13, 2012 at 5:06 AM, Pedro Ferreira
jose.pedro.ferre...@cern.ch wrote:
 Dear Jim,

 Thanks for your answer.


 The OS' file-system cache acts as a storage server cache.  The storage
 server does (essentially) no processing to data read from disk, so an
 application-level cache would add nothing over the disk cache provided by
 the storage server.


 I see; then I guess it would be good to have at least as much RAM as the
 total size of the DB, no? From what I see on our server, the Linux buffer
 cache takes around 13 GB of the 16 GB available, while the rest is mostly
 taken by the ZEO process (1.7 GB). The database is 17 GB on disk.

Having enough RAM to hold your entire database may not be practical.
Ideally, you want enough to hold the working set.  For many applications,
most of the database reads are from the later part of the file.  The working
set is often much smaller than the whole file.



 Also note that, for better or worse, FileStorage uses an in-memory index
 of current record positions, so no disk access is needed to find current
 data.


 Yes, but pickles still have to be retrieved, right?

Yes, but this is better than having to do disk accesses to get the metadata
needed to find the records.

 I guess this would mean
 random access (for a database like ours, in which we have many small
 objects), which doesn't favor cache performance.

I don't see how this follows.

...

 In general, I'd say no.  It can depend on lots of details, including:

 - database size
 - active set size
 - network speed
 - memory and disk speeds on clients and servers
 - ...


 In any case, from what I see, these client caches cannot be shared between
 processes, which doesn't make them very useful for our setup, in which we
 have many parallel processes asking for the same objects over and over again.

The caches are still probably providing benefit, depending on how large they
are.  If you haven't, you should probably try using the ZEO cache-analysis
scripts to get a better handle on how effective your cache is and whether it
should be larger.
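
Roughly, you turn on cache tracing with the ZEO_CACHE_TRACE environment
variable before the cache is opened, then feed the trace file to the
scripts shipped in ZEO/scripts.  Something like this (the address and
sizes are placeholders; check the details against your ZEO version):

import os

# Must be set before the ClientStorage (and its persistent cache) is
# opened; the trace file is written next to the cache file.
os.environ['ZEO_CACHE_TRACE'] = '1'

import ZEO.ClientStorage

storage = ZEO.ClientStorage.ClientStorage(
    ('zeo-main.example.org', 8100),   # placeholder address
    client='traced',                  # persistent, traceable cache
    cache_size=500 * 1024 ** 2,       # 500 MB client cache
)

# After exercising the application, analyze the trace, e.g.:
#   python ZEO/scripts/cache_stats.py traced-1.zec.trace
#   python ZEO/scripts/cache_simul.py -s 1024 traced-1.zec.trace
# (the second simulates hit rates for other cache sizes; exact options
# may vary between versions)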

It's true that storing the same data in many caches is inefficient.

I imagine that someone will eventually figure out how to use
memcached to implement a shared ZEO cache, as has been done
for RelStorage.

At PyCon, I'll be presenting work I've been doing on a load
balancer that seeks to avoid sharing the same data in multiple
caches by assigning different kinds of work to different workers.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [ZODB-Dev] Server-side caching

2012-02-13 Thread Pedro Ferreira

Having enough RAM to hold your entire database may not be practical.
Ideally, you want enough to hold the working set.  For many applications,
most of the database reads are from the later part of the file.  The working
set is often much smaller than the whole file.


That is a very good point. I will try to find that out; maybe I can take 
a FileStorage index file and calculate the distribution of current record 
offsets.
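
Something like this is what I have in mind, assuming the index can be
read back with fsIndex.load() and behaves like a dict mapping oid to
the file offset of the current record (true for the ZODB versions I've
looked at, but worth double-checking):

from ZODB.fsIndex import fsIndex

# Load the saved FileStorage index; on the versions I've checked this
# returns {'index': oid -> offset of current record, 'pos': end of
# data}, for both the old and the new index formats.
info = fsIndex.load('Data.fs.index')
index, end = info['index'], info['pos']

# Histogram of current-record offsets in 5% slices of the file.
buckets = [0] * 20
for oid, pos in index.items():
    buckets[min(pos * 20 // end, 19)] += 1

total = sum(buckets)
for i, n in enumerate(buckets):
    print('%3d%%-%3d%% of file: %5.1f%% of current records'
          % (i * 5, (i + 1) * 5, 100.0 * n / total))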



I guess this would mean
random access (for a database like ours, in which we have many small
objects), which doesn't favor cache performance.


I don't see how this follows.


I meant that if we have to retrieve many different small pickles from disk, 
this will result in continuous access to random disk locations, which 
can be bad (depending on the granularity of the cache). However, 
considering what you said above (that the working set tends to be 
located in the later part of the file), maybe that's not the case.



The caches are still probably providing benefit, depending on how large they
are.  If you haven't, you should probably try using the ZEO cache-analysis
scripts to get a better handle on how effective our cache is and whether it
should be larger.


Will do so.


I imagine that someone will eventually figure out how to use
memcached to implement a shared ZEO cache, as has been done
for relstorage.


That would be great.


At PyCon, I'll be presenting work I've been doing on a load
balancer that seeks to avoid sharing the same data in multiple
caches by assigning different kinds of work to different workers.


I will be at the conference, and will attend for sure :)

Cheers,

Pedro

--
José Pedro Ferreira

Software Developer, Indico Project
http://indico-software.org



[ZODB-Dev] Server-side caching

2012-02-10 Thread Pedro Ferreira

Hello all,

A (possibly silly) question: does ZEO have some kind of server-side 
cache? I mean, each time an oid is requested by one of the clients, is it 
retrieved from the DB file directly, or are some of the objects kept in 
memory? From what I see in the code, the latter doesn't seem to happen.


I know there are client-side caches, but in a multiple-client/server 
context I wonder whether it wouldn't be faster to ask the DB for an oid 
that is already in memory instead of retrieving it from the client cache?


Thanks in advance,

Pedro

--
José Pedro Ferreira

Software Developer, Indico Project
http://indico-software.org



Re: [ZODB-Dev] Server-side caching

2012-02-10 Thread Jim Fulton
On Fri, Feb 10, 2012 at 4:49 AM, Pedro Ferreira
jose.pedro.ferre...@cern.ch wrote:
 Hello all,

 A (possibly silly) question: does ZEO have some kind of server-side cache? I
 mean, each time an oid is requested by one of the clients, is it retrieved
 from the DB file directly, or are some of the objects kept in memory? From
 what I see in the code, the latter doesn't seem to happen.

No -- and yes. :)

The OS' file-system cache acts as a storage server cache.  The storage
server does (essentially) no processing to data read from disk, so an
application-level cache would add nothing over the disk cache provided by
the storage server.

Also note that, for better or worse, FileStorage uses an in-memory index
of current record positions, so no disk access is needed to find current data.
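
To make that concrete: a load boils down to an in-memory lookup plus one
seek and one read.  An illustrative sketch, not FileStorage's actual
code; the 42-byte data-record header layout is taken from the
FileStorage format documentation, so verify it against your version:

import struct

# oid, tid, previous-record pos, transaction pos, version length,
# data (pickle) length -- 42 bytes in all.
DATA_HDR = '>8s8sQQHQ'
DATA_HDR_LEN = struct.calcsize(DATA_HDR)  # 42

def load_current(data_file, index, oid):
    pos = index[oid]                  # in-memory index: no disk access
    data_file.seek(pos)               # one seek ...
    header = data_file.read(DATA_HDR_LEN)
    h_oid, tid, prev, tloc, vlen, plen = struct.unpack(DATA_HDR, header)
    assert h_oid == oid
    # A zero data length would mean a backpointer to an earlier record
    # (e.g. after undo); ignored in this sketch.
    return data_file.read(plen), tid  # ... and one read for the pickle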

 I know there are client-side caches, but in a multiple-client/server context
 I wonder whether it wouldn't be faster to ask the DB for an oid that is
 already in memory instead of retrieving it from the client cache?

In general, I'd say no.  It can depend on lots of details, including:

- database size
- active set size
- network speed
- memory and disk speeds on clients and servers
- ...

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev