Re: Optimising cache performance
Clinton Gormley wrote:
> For now it's not a distributed system, and I have been using
> Cache::FileCache. But that still means freezing and thawing objects -
> which I'm trying to minimise.

Other things (IPC::MM, MLDBM::Sync, Cache::Mmap, BerkeleyDB) are
significantly faster than Cache::FileCache. If you have tons of free
memory, then go ahead and cache things in memory. My feeling is that the
very small amount of time that the fastest of these systems spend freezing
and thawing is totally made up for by the huge memory savings, which allow
you to run more server processes.

> When you say that Cache::Mmap is only limited by the size of your disk,
> is that because the file in memory gets written to disk as part of VM?
> (I don't see any other mention of files in the docs.) Which presumably
> means resizing your VM to make space for the cache?

That's right, it uses your system's mmap() call. I've never needed to
adjust the amount of VM I have because of memory-mapping a file, but I
suppose it could happen. This would be a good question for the author of
the module, or an expert on your system's mmap() implementation.

> I see the author of IPC::MM has an e-toys address - was this something
> you used at e-toys?

It was used at one point, although not in the version of the system that
I wrote about. He originally wrote it as a wrapper around the mm library,
and I asked if he could put in a shared hash just for fun. It turned out
to be very fast, largely because the sharing and the hash (or btree) are
implemented in C. The Perl part is just an interface to it.

> I know very little about shared memory segments, but is MM used to
> share small data objects, rather than to keep large caches in shared
> memory?

It's a shared hash. You can put whatever you want into it. Apache uses mm
to share data between processes.

> Ralf Engelschall writes in the MM documentation: "The maximum size of a
> continuous shared memory segment one can allocate depends on the
> underlaying platform. This cannot be changed, of course. But currently
> the high-level malloc(3)-style API just uses a single shared memory
> segment as the underlaying data structure for an MM object which means
> that the maximum amount of memory an MM object represents also depends
> on the platform."
>
> What implications does this have on the size of the cache that can be
> created with IPC::MM?

It varies by platform, but I believe that on Linux it means each
individual hash is limited to 64MB. So maybe I spoke too soon about
having unlimited storage, but you should be able to have as many hashes
as you want. If you're seriously concerned about storage limits like
these, you could use one of the other options like MLDBM::Sync or
BerkeleyDB, which use disk storage.

- Perrin
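For illustration only, here is a minimal sketch of the shared hash Perrin
describes, based on the IPC::MM synopsis rather than on any code from this
thread; the pool size, file path, key and value are arbitrary examples, and
complex values still need to be serialised (e.g. with Storable) because the
shared hash stores plain strings:

    use strict;
    use IPC::MM;
    use Storable qw(freeze thaw);

    # Create a shared memory pool and a hash inside it. Under mod_perl this
    # would normally happen in the parent process (e.g. in startup.pl) so
    # that all children inherit the same segment. The pool size is bounded
    # by the platform limits discussed above.
    my $mm      = mm_create(1024 * 1024, '/tmp/mm_cache');   # 1MB pool
    my $mm_hash = mm_make_hash($mm);

    tie my %cache, 'IPC::MM::Hash', $mm_hash;

    # Values are plain strings, so objects are frozen on the way in and
    # thawed on the way out.
    $cache{'user:42'} = freeze({ name => 'example' });
    my $user = thaw($cache{'user:42'});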
Re: Optimising cache performance
> What implications does this have on the size of the cache that can be
> created with IPC::MM

I believe that documentation is telling you that each OS governs the
amount of shared memory you can have in different ways. Linux, for
example, has a variable called shmmax, accessible as
/proc/sys/kernel/shmmax, which controls how much shared memory you are
allowed to allocate. I think Solaris' setting lives in /etc/system
somewhere.

Cory 'G' Watson
http://gcdb.spleck.net
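As a quick illustration (mine, not from the post), checking the current
Linux limit Cory mentions is just a matter of reading that file; raising it
requires root, and the 128MB figure below is only an example value:

    use strict;

    # Read the current maximum size of a single shared memory segment.
    open my $fh, '<', '/proc/sys/kernel/shmmax' or die "shmmax: $!";
    print 'shmmax is ', scalar <$fh>;

    # To raise it (as root), something like:
    #   echo 134217728 > /proc/sys/kernel/shmmax    # 128MB, example only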
Re: Optimising cache performance
Thanks for your feedback - a couple more questions.

> First, I'm assuming this is for a distributed system running on
> multiple servers. If not, you should just download one of the cache
> modules from CPAN. They're good.

For now it's not a distributed system, and I have been using
Cache::FileCache. But that still means freezing and thawing objects -
which I'm trying to minimise.

> I suggest you use either Cache::Mmap or IPC::MM for your local cache.
> They are both very fast and will save you memory. Also, Cache::Mmap is
> only limited by the size of your disk, so you don't have to do any
> purging.

When you say that Cache::Mmap is only limited by the size of your disk,
is that because the file in memory gets written to disk as part of VM?
(I don't see any other mention of files in the docs.) Which presumably
means resizing your VM to make space for the cache?

> You seem to be taking a lot of care to ensure that everything always
> has the latest version of the data. If you can handle slightly
> out-of-date data [...]

Call me anal ;) Most of the time it wouldn't really matter, but sometimes
it could be extremely off-putting.

> If everything really does have to be 100% up-to-date, then what you're
> doing is reasonable. It would be nice to not do the step that checks
> for outdated objects before processing the request, but instead do it
> in a cleanup handler, although that could lead to stale data being used
> now and then.

Yes - had considered that.

> If you were using a shared cache like Cache::Mmap, you could have a
> cron job or a separate Perl daemon that simply purges outdated objects
> every minute or so, and leave that out of your mod_perl code
> completely.

I see the author of IPC::MM has an e-toys address - was this something
you used at e-toys?

I know very little about shared memory segments, but is MM used to share
small data objects, rather than to keep large caches in shared memory?

Ralf Engelschall writes in the MM documentation: "The maximum size of a
continuous shared memory segment one can allocate depends on the
underlaying platform. This cannot be changed, of course. But currently
the high-level malloc(3)-style API just uses a single shared memory
segment as the underlaying data structure for an MM object which means
that the maximum amount of memory an MM object represents also depends on
the platform."

What implications does this have on the size of the cache that can be
created with IPC::MM?

thanks

Clinton Gormley
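For what it's worth, a minimal sketch of the cleanup-handler idea discussed
above, under mod_perl 1; My::Cache, purge_outdated() and the handler layout
are hypothetical stand-ins, not code from this thread:

    package My::CacheHandler;
    use strict;
    use Apache::Constants qw(OK);

    # Hypothetical in-process cache handle.
    my $cache = My::Cache->new;

    sub handler {
        my $r = shift;

        # ... serve the request from cached objects as usual ...

        # Defer the outdated-object check until after the response has
        # been sent, so the client never waits on it. The trade-off, as
        # noted above, is that a stale object may occasionally be served
        # before the sweep runs.
        $r->register_cleanup(sub { $cache->purge_outdated; OK });

        return OK;
    }

    1;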
Re: Optimising cache performance
On Friday, March 7, 2003, at 02:20 PM, Perrin Harkins wrote:
> Cory 'G' Watson wrote:
>> I'm not sure if my way would fit in with your objects Clinton, but I
>> have some code in the commit() method of all my objects which, when it
>> is called, removes any cached copies of the object. That's how I stay
>> up to date.
>
> Why wouldn't it simply update the version in the cache when you commit?
> Also, do you have a way of synchronizing changes across multiple
> machines?

I suppose it could, but I use it as a poor man's cache cleaning. I
suppose it would boost performance to do what you suggest. I'll just
implement a cache cleaner elsewhere.

I only run on one machine, so I don't do any synchronization. I hope to
have that problem some day ;)

Cory 'G' Watson
http://gcdb.spleck.net
Re: Optimising cache performance
Cory 'G' Watson wrote:
> I'm not sure if my way would fit in with your objects Clinton, but I
> have some code in the commit() method of all my objects which, when it
> is called, removes any cached copies of the object. That's how I stay
> up to date.

Why wouldn't it simply update the version in the cache when you commit?
Also, do you have a way of synchronizing changes across multiple
machines?

- Perrin
Re: Optimising cache performance
On Friday, March 7, 2003, at 12:45 PM, Perrin Harkins wrote:
> You seem to be taking a lot of care to ensure that everything always
> has the latest version of the data. If you can handle slightly
> out-of-date data, I would suggest that you simply keep objects in the
> local cache with a time-to-live (which Cache::Mmap or Cache::FileCache
> can do for you) and just look at the local version until it expires.
> You would end up building the objects once per server, but that isn't
> so bad.

I'm not sure if my way would fit in with your objects Clinton, but I have
some code in the commit() method of all my objects which, when it is
called, removes any cached copies of the object. That's how I stay up to
date.

Cory 'G' Watson
http://gcdb.spleck.net
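A rough sketch (not Cory's actual code) of the commit-time approach being
described, assuming a Cache::Cache-style interface (set/remove); it shows
the remove-on-commit behaviour with the update-in-place variant Perrin asks
about left as a comment, and $self->{cache}, _save_to_db() and id() are
hypothetical names:

    sub commit {
        my $self = shift;

        $self->_save_to_db;                      # persist the object first

        my $key = ref($self) . ':' . $self->id;  # cache key for this object

        # Remove-on-commit: the next reader rebuilds and re-caches it.
        $self->{cache}->remove($key);

        # Or, update-in-place: refresh the cached copy and skip the rebuild.
        # $self->{cache}->set($key, $self);

        return $self;
    }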
Re: Optimising cache performance
Clinton Gormley wrote:
> I'd appreciate some feedback on my logic to optimise my cache (under
> mod_perl 1)

First, I'm assuming this is for a distributed system running on multiple
servers. If not, you should just download one of the cache modules from
CPAN. They're good.

> I'm planning a two level cache:
> 1) Live objects in each mod_perl process
> 2) Serialised objects in a database

I suggest you use either Cache::Mmap or IPC::MM for your local cache.
They are both very fast and will save you memory. Also, Cache::Mmap is
only limited by the size of your disk, so you don't have to do any
purging.

You seem to be taking a lot of care to ensure that everything always has
the latest version of the data. If you can handle slightly out-of-date
data, I would suggest that you simply keep objects in the local cache
with a time-to-live (which Cache::Mmap or Cache::FileCache can do for
you) and just look at the local version until it expires. You would end
up building the objects once per server, but that isn't so bad.

If everything really does have to be 100% up-to-date, then what you're
doing is reasonable. It would be nice to not do the step that checks for
outdated objects before processing the request, but instead do it in a
cleanup handler, although that could lead to stale data being used now
and then. If you were using a shared cache like Cache::Mmap, you could
have a cron job or a separate Perl daemon that simply purges outdated
objects every minute or so, and leave that out of your mod_perl code
completely.

Yet another way to handle a distributed cache is to have each write to
the cache send updates to the other caches using something like
Spread::Queue. This is a bit more complex, but it means you don't need a
second tier in your cache to share updates.

- Perrin
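To make the time-to-live suggestion concrete, here is a minimal sketch
using Cache::FileCache; build_object(), the namespace and the 300-second
TTL are arbitrary placeholders of mine, not anything from Clinton's system:

    use strict;
    use Cache::FileCache;

    # Per-server cache; objects are simply considered good for five minutes.
    my $cache = Cache::FileCache->new({
        namespace          => 'objects',
        default_expires_in => 300,          # seconds of staleness we accept
    });

    sub fetch_object {
        my ($id) = @_;

        # Use the local copy until its TTL expires...
        my $obj = $cache->get($id);
        return $obj if defined $obj;

        # ...then rebuild it once per server and re-cache it.
        $obj = build_object($id);
        $cache->set($id, $obj);
        return $obj;
    }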