Re: [ZODB-Dev] Server-side caching
Dear Jim,

Thanks for your answer.

> The OS' file-system cache acts as a storage server cache. The storage
> server does (essentially) no processing to data read from disk, so an
> application-level cache would add nothing over the disk cache provided
> by the storage server.

I see. Then I guess it would be good to have at least the same amount of
RAM as the total size of the DB, no? From what I see on our server, the
Linux buffer cache takes around 13 GB of the 16 GB available, while the
rest is mostly taken by the ZEO process (1.7 GB). The database is 17 GB
on disk.

> Also note that, for better or worse, FileStorage uses an in-memory
> index of current record positions, so no disk access is needed to find
> current data.

Yes, but pickles still have to be retrieved, right? I guess this would
mean random access (for a database like ours, in which we have many
small objects), which doesn't favor cache performance. I'm asking this
because in the tests we've made with SSDs we have seen a 20% decrease in
reading time for non-client-cached objects. So, there seems to be some
disk I/O going on.

> In general, I'd say no. It can depend on lots of details, including:
>
> - database size
> - active set size
> - network speed
> - memory and disk speeds on clients and servers
> - ...

In any case, from what I see, these client caches cannot be shared
between processes, which doesn't make them very useful for our setup, in
which we have many parallel processes asking for the same objects over
and over again.

Thanks once again,

Pedro

___
For more information about ZODB, see http://zodb.org/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Server-side caching
On 13 February 2012 10:06, Pedro Ferreira <jose.pedro.ferre...@cern.ch> wrote:

>> The OS' file-system cache acts as a storage server cache. The storage
>> server does (essentially) no processing to data read from disk, so an
>> application-level cache would add nothing over the disk cache provided
>> by the storage server.
>
> I see. Then I guess it would be good to have at least the same amount
> of RAM as the total size of the DB, no? From what I see on our server,
> the Linux buffer cache takes around 13 GB of the 16 GB available, while
> the rest is mostly taken by the ZEO process (1.7 GB). The database is
> 17 GB on disk.

Adding enough memory so the database fits in RAM is always a good idea.
Since the introduction of blobs, this should be possible (and relatively
cheap) for most ZODB deployments. For Plone sites, a 30 GB pre-blobs
Data.fs typically falls to 2-3 GB with blobs. There's also the wrapper
storage zc.zlibstorage, which compresses ZODB records, allowing more of
the database to fit in RAM. (RelStorage has an option to compress
records.)

>> Also note that, for better or worse, FileStorage uses an in-memory
>> index of current record positions, so no disk access is needed to find
>> current data.
>
> Yes, but pickles still have to be retrieved, right? I guess this would
> mean random access (for a database like ours, in which we have many
> small objects), which doesn't favor cache performance. I'm asking this
> because in the tests we've made with SSDs we have seen a 20% decrease
> in reading time for non-client-cached objects. So, there seems to be
> some disk I/O going on.

The mean performance improvement doesn't tell the whole story here. With
most of your database in the file-system cache, median read times will
be identical, but your 95th-percentile read times will show a huge
decrease, as the seek time of an SSD is orders of magnitude lower than
the seek time of a spinning disk. Even when you have enough RAM so the
OS can cache the database in memory, I still think SSDs are worthwhile.
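On the compression point above: ZODB data records are pickles, and repetitive pickle data tends to compress well with zlib, which is what makes a record-compressing wrapper worthwhile. A quick stdlib-only sketch (the sample object state is made up; zc.zlibstorage itself applies compression per record, transparently):

```python
import pickle
import zlib

# A toy "record": a small object state, roughly as ZODB would pickle it.
state = {"title": "Some document", "body": "text " * 200, "tags": ["a", "b"]}
record = pickle.dumps(state)

compressed = zlib.compress(record)

# Repetitive application data usually shrinks substantially, so more
# records fit into the same amount of file-system cache.
print(len(record), len(compressed))
assert len(compressed) < len(record)
assert pickle.loads(zlib.decompress(compressed)) == state  # lossless
```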
Packing the database, backing up, or any other operation that churns
through the disk can cause the database to drop out of the file-system
cache. Be sure to choose an SSD with capacitor backup so it won't lose
your data; see:
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html

>> In general, I'd say no. It can depend on lots of details, including:
>>
>> - database size
>> - active set size
>> - network speed
>> - memory and disk speeds on clients and servers
>> - ...
>
> In any case, from what I see, these client caches cannot be shared
> between processes, which doesn't make them very useful for our setup,
> in which we have many parallel processes asking for the same objects
> over and over again.

You could try a ZEO fanout setup too, where you have a ZEO server
running on each client machine. The intermediary ZEO's client cache
(which you could put on tmpfs if you have enough RAM) is then shared
between all the clients running on that machine.

Laurence
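A fanout setup might be configured roughly as follows. This is only a sketch: the host name, port numbers, paths, and cache size are made up, and the exact ZConfig option names should be checked against the ZEO documentation for your version.

```
# zeo.conf on each client machine (sketch): a local ZEO server whose
# backing storage is itself a ZEO client of the central server.
<zeo>
  address 127.0.0.1:9001
</zeo>

<zeoclient>
  server central-zeo.example.org:8100
  # One large cache shared (via this intermediary) by all local clients;
  # placing the cache directory on tmpfs keeps it in RAM.
  cache-size 2GB
  var /mnt/zeo-cache
</zeoclient>
```

Local application processes would then connect to 127.0.0.1:9001 instead of the central server, sharing the intermediary's cache.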
Re: [ZODB-Dev] Server-side caching
>> between processes, which doesn't make them very useful , in which we

very useful *for our setup*

--
José Pedro Ferreira
Software Developer, Indico Project
http://indico-software.org

CERN - European Organization for Nuclear Research
1211 Geneve 23, Switzerland
IT-CIS-AVC
Office: 513-1-005
Tel. +41227677159
Re: [ZODB-Dev] Server-side caching
On Mon, Feb 13, 2012 at 5:06 AM, Pedro Ferreira <jose.pedro.ferre...@cern.ch> wrote:

> Dear Jim,
>
> Thanks for your answer.
>
>> The OS' file-system cache acts as a storage server cache. The storage
>> server does (essentially) no processing to data read from disk, so an
>> application-level cache would add nothing over the disk cache provided
>> by the storage server.
>
> I see. Then I guess it would be good to have at least the same amount
> of RAM as the total size of the DB, no? From what I see on our server,
> the Linux buffer cache takes around 13 GB of the 16 GB available, while
> the rest is mostly taken by the ZEO process (1.7 GB). The database is
> 17 GB on disk.

Having enough RAM to hold your entire database may not be practical.
Ideally, you want enough to hold the working set. For many applications,
most of the database reads are from the later part of the file. The
working set is often much smaller than the whole file.

>> Also note that, for better or worse, FileStorage uses an in-memory
>> index of current record positions, so no disk access is needed to find
>> current data.
>
> Yes, but pickles still have to be retrieved, right?

Yes, but this is better than having to do disk accesses to get the
metadata needed to find the records.

> I guess this would mean random access (for a database like ours, in
> which we have many small objects), which doesn't favor cache
> performance.

I don't see how this follows.

...

>> In general, I'd say no. It can depend on lots of details, including:
>>
>> - database size
>> - active set size
>> - network speed
>> - memory and disk speeds on clients and servers
>> - ...
>
> In any case, from what I see, these client caches cannot be shared
> between processes, which doesn't make them very useful for our setup,
> in which we have many parallel processes asking for the same objects
> over and over again.

The caches are still probably providing benefit, depending on how large
they are.
If you haven't, you should probably try using the ZEO cache-analysis
scripts to get a better handle on how effective your cache is and
whether it should be larger.

It's true that storing the same data in many caches is inefficient. I
imagine that someone will eventually figure out how to use memcached to
implement a shared ZEO cache, as has been done for RelStorage.

At PyCon, I'll be presenting work I've been doing on a load balancer
that seeks to avoid sharing the same data in multiple caches by
assigning different kinds of work to different workers.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
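The shared-cache idea can be pictured with a toy sketch. Everything below is made up for illustration: a plain dict stands in for memcached, and a real implementation would have to drive invalidation from ZEO's invalidation messages.

```python
# Sketch: many client processes consult one shared store before asking
# the ZEO server, so a popular object is fetched from the server once
# rather than once per process cache.

shared = {}          # oid -> (tid, data); in reality: memcached
server_loads = 0     # counts round-trips to the "server"

def load_from_server(oid):
    """Stand-in for a ZEO load: expensive, so we count calls."""
    global server_loads
    server_loads += 1
    return ("tid-1", "pickle-for-%s" % oid)

def load(oid):
    if oid not in shared:            # miss: fall through to the server
        shared[oid] = load_from_server(oid)
    return shared[oid]

def invalidate(oids):
    for oid in oids:                 # on commit, drop stale entries
        shared.pop(oid, None)

# Two "processes" reading the same object hit the server only once.
load(42); load(42)
print(server_loads)  # 1
invalidate([42])     # a write elsewhere invalidates the shared entry
load(42)
print(server_loads)  # 2
```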
Re: [ZODB-Dev] Server-side caching
> Having enough RAM to hold your entire database may not be practical.
> Ideally, you want enough to hold the working set. For many
> applications, most of the database reads are from the later part of
> the file. The working set is often much smaller than the whole file.

That is a very good point. I will try to find that out; maybe I can take
a FileStorage index file and calculate the distribution.

>> I guess this would mean random access (for a database like ours, in
>> which we have many small objects), which doesn't favor cache
>> performance.
>
> I don't see how this follows.

I meant that if we have to retrieve different small pickles from disk,
this will result in continuous access to random disk locations, which
can be bad (depending on the granularity of the cache). However,
considering what you've said above (that the working set should be
located at the later part of the file), maybe that's not the case.

> The caches are still probably providing benefit, depending on how
> large they are. If you haven't, you should probably try using the ZEO
> cache-analysis scripts to get a better handle on how effective your
> cache is and whether it should be larger.

Will do so.

> I imagine that someone will eventually figure out how to use memcached
> to implement a shared ZEO cache, as has been done for RelStorage.

That would be great.

> At PyCon, I'll be presenting work I've been doing on a load balancer
> that seeks to avoid sharing the same data in multiple caches by
> assigning different kinds of work to different workers.

I will be at the conference; will for sure attend :)

Cheers,

Pedro
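The distribution calculation proposed above could be sketched like this. It is illustrative only: a real run would load the FileStorage `.index` file via ZODB's index machinery, whereas here a made-up oid-to-offset mapping stands in for it.

```python
from collections import Counter

# Hypothetical stand-in for a FileStorage index: oid -> file offset of
# the current record. A real index would come from the .index file.
index = dict(enumerate([10, 250, 300, 900, 910, 950, 980, 990]))

file_size = 1000   # made-up total Data.fs size
n_buckets = 4

# Histogram of current-record positions across the file. If most
# entries land in the last bucket, current data (and likely the working
# set) lives near the end of the file.
hist = Counter(min(off * n_buckets // file_size, n_buckets - 1)
               for off in index.values())

for bucket in range(n_buckets):
    lo = bucket * file_size // n_buckets
    hi = (bucket + 1) * file_size // n_buckets
    print(f"{lo:4d}-{hi:4d}: {hist[bucket]} records")
```

With the sample offsets above, the last quarter of the file holds most of the current records, which matches the "reads cluster in the later part of the file" observation.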
[ZODB-Dev] Server-side caching
Hello all,

A (possibly silly) question: does ZEO have some kind of server-side
cache? I mean, each time an oid is requested by one of the clients, is
it retrieved from the DB file directly, or are some of the objects kept
in memory? From what I see in the code, the latter doesn't seem to
happen.

I know there are client-side caches, but in a multiple-client/server
context I wonder if it's not faster to ask the DB for an oid that is
already in memory instead of retrieving it from the client cache?

Thanks in advance,

Pedro
Re: [ZODB-Dev] Server-side caching
On Fri, Feb 10, 2012 at 4:49 AM, Pedro Ferreira <jose.pedro.ferre...@cern.ch> wrote:

> Hello all,
>
> A (possibly silly) question: does ZEO have some kind of server-side
> cache? I mean, each time an oid is requested by one of the clients, is
> it retrieved from the DB file directly, or are some of the objects
> kept in memory? From what I see in the code, the latter doesn't seem
> to happen.

No -- and yes. :)

The OS' file-system cache acts as a storage server cache. The storage
server does (essentially) no processing to data read from disk, so an
application-level cache would add nothing over the disk cache provided
by the storage server.

Also note that, for better or worse, FileStorage uses an in-memory index
of current record positions, so no disk access is needed to find current
data.

> I know there are client-side caches, but in a multiple-client/server
> context I wonder if it's not faster to ask the DB for an oid that is
> already in memory instead of retrieving it from the client cache?

In general, I'd say no. It can depend on lots of details, including:

- database size
- active set size
- network speed
- memory and disk speeds on clients and servers
- ...

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
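The in-memory index described above can be pictured as a plain mapping from oid to file offset: finding a record is then one seek plus one read, with no scanning. This is a simplified sketch of the idea, not FileStorage's actual record format.

```python
import io

# A toy append-only data file: each "record" is a length-prefixed payload.
buf = io.BytesIO()
index = {}  # oid -> offset of the record's current revision

def store(oid, payload):
    buf.seek(0, io.SEEK_END)
    index[oid] = buf.tell()          # remember where this revision starts
    buf.write(len(payload).to_bytes(4, "big") + payload)

def load(oid):
    buf.seek(index[oid])             # O(1) lookup: no scan of the file
    size = int.from_bytes(buf.read(4), "big")
    return buf.read(size)

store(1, b"first")
store(2, b"second")
store(1, b"first, revised")          # new revision; index now points here

print(load(1))  # b'first, revised'
print(load(2))  # b'second'
```

The index says where the current pickle lives, but the pickle bytes themselves must still be read, which is why the file-system cache (or an SSD) still matters.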