Re: What to cache ? (was Re: Caching needs to be implemented in Rails application)

William Sobel Fri, 23 Jan 2009 09:15:46 -0800

This writeup was taken from the write-board we started with Akara back quite a few months ago. We tried to do what made sense for a Rails application using page/action/fragment caching. A lot of the event detail is fragment cached, whereas the home page is page cached for non-logged-in users and fragment cached for logged-in users. Are you doing a similar split?

Running with low db load is not invalid; many applications are attempting to do just that. I'm surprised it made such a large difference. What percentage of hits are home page hits for non-logged-in users?

I think Akara's idea is a good one. We should also try to stress other components as well. I think a memcache-heavy load would be interesting to test. We can create a special branch of the Rails application that caches the thumbs in memcached; this is no problem. Currently we're using the proxy server to serve the images. Aren't you doing the same with Apache?

I was thinking of setting nginx up to use the memcached module (it is reported to give a 4x improvement). This may be relevant for your tests as well.

http://wiki.codemongers.com/NginxHttpMemcachedModule
http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-boost/

As Shanti said, additional input would be greatly appreciated!

- Will

On Jan 23, 2009, at 8:56 AM, Shanti Subramanyam wrote:

Thanks Will. At this point, the PHP app is only caching the home page. We are wondering whether to even do the Event Detail page, as the load on the database has been drastically cut down just from the home page caching. It's a dilemma: if we cache too much, there is no load on the db. If we cache too little, there is nothing much in memcached. Of course, if we run a much larger scale (say, tens of systems for the web tier), then I'm sure we'll see increasing load on both tiers. But practically speaking, we need to be able to run a reasonable configuration.
Akara has another idea to use memcached more heavily, while at the same time not reducing the db load. Namely, cache the thumbnails in it. This will also reduce the load on the filestore (which currently is quite heavily stressed for the PHP app). But this strategy won't work for the Rails app, will it? I believe you're serving all static files out of the proxy server?
Would love to hear what others think as well.

Shanti

William Sobel wrote:
On Jan 22, 2009, at 5:11 PM, Shanti Subramanyam wrote:
Can you please elaborate on what exactly is cached? How is the cache managed (in terms of timeouts etc.)?
From the original writeup:

Cache Strategy for Web20Kit

Home Page

The home page will be cached in two forms:
1. Cached as a whole page accessed by users arriving at the site and users that are not logged on.
2. Cached as a page fragment, just for the content part. The page will be constructed from the dynamic header, which contains the user name of the current user, and the cached content fragment.
3. Paginations: these will be cached up to 5 pages. It is less likely for users to search for events beyond the fifth page.
Expiration and re-generation
The home page will expire every 120 seconds. Then the page will be re-generated by one of the first requests arriving after the expiration. To prevent all requests arriving after the expiration from re-generating, thus causing a stampede phenomenon, we will use a lock/semaphore control mechanism as follows:
1. The home page and/or home page fragment is cached with no timeout or a very large timeout (on the order of days) in memcached.
2. For each cached page, a small semaphore object is placed into memcached with a timeout of 120 seconds, the regeneration cycle.
3. After accessing the page/fragment in the cache and sending the response to the user, the cache client (web server) checks to see whether the semaphore is there or has timed out. If it is not there (timed out), the client will attempt to re-generate the page or fragment.
4. To prevent a stampede, the client 'adds' a lock entry into the cache. If the add succeeds, this thread has the lock. The lock times out after 20 seconds using the memcached timeout mechanism. This prevents a thread from holding a lock indefinitely.
5. After obtaining the lock, the thread generates the page or fragment and replaces the copy in memcached.
6. Then the generating thread places a new semaphore object with the same timeout period and removes the lock object.
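The six steps above can be sketched roughly as follows. This is only an illustration in Python, not the Web20Kit code: `FakeMemcached` is a hypothetical in-memory stand-in for a memcached client, the key names and `serve_home_page` are made up, and the cold-cache branch (no page cached at all) is an assumption, since the writeup does not describe it.

```python
import time

class FakeMemcached:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # entry has timed out
            return None
        return value

    def set(self, key, value, ttl=0):
        # ttl=0 means "no timeout", as in memcached
        expires_at = self._clock() + ttl if ttl else None
        self._data[key] = (value, expires_at)

    def add(self, key, value, ttl=0):
        # Like memcached 'add': store only if the key is absent
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


PAGE_KEY = "home:page"            # hypothetical key names
SEMAPHORE_KEY = "home:semaphore"
LOCK_KEY = "home:lock"
REGEN_SECONDS = 120               # regeneration cycle from the writeup
LOCK_SECONDS = 20                 # lock timeout from the writeup


def regenerate(cache, render):
    """Steps 4-6: regenerate only if this caller wins the lock."""
    if not cache.add(LOCK_KEY, 1, LOCK_SECONDS):
        return None                              # someone else is regenerating
    try:
        page = render()
        cache.set(PAGE_KEY, page)                # step 1: no timeout
        cache.set(SEMAPHORE_KEY, 1, REGEN_SECONDS)  # step 6: new semaphore
        return page
    finally:
        cache.delete(LOCK_KEY)


def serve_home_page(cache, render):
    page = cache.get(PAGE_KEY)
    if page is None:
        # Cold cache (assumption): render, caching the result if we get the lock
        return regenerate(cache, render) or render()
    response = page                              # step 3: respond from the cached copy
    if cache.get(SEMAPHORE_KEY) is None:         # semaphore timed out
        regenerate(cache, render)
    return response
```

Note that the request which notices the expired semaphore still returns the old copy; only later requests see the regenerated page.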
Event Detail Page
The event detail page is cached as both content and, if not logged on, the whole page as well.
Expiration and re-generation
Event detail page cache entries have a timeout of 30 seconds using the cache timeout mechanism of memcached. Thus only frequently accessed events will remain in the cache. The load generator will need to be designed to access event detail pages in a non-uniform manner, too. We will use a locking mechanism for the event detail page in a similar manner to the home page. However, we will not use an expiry semaphore and will let the page expire from the cache as a whole. Access to the entry should, however, renew the expiry time so that frequently accessed events will stay in cache. The mechanism will work as follows:
1. The event detail page and fragment are cached with a timeout of 30 seconds.
2. When a cache client needs to access the entry, it will try to read the entry from the cache. If the entry is available, it will extend the cache timeout. Otherwise, the event detail page is generated from the database.
3. To regenerate the page and prevent a stampede, the client 'adds' a lock entry into the cache. If the add succeeds, this thread has the lock. The lock times out after 20 seconds using the memcached timeout mechanism. This prevents a thread from holding a lock indefinitely.
4. After obtaining the lock, the thread proceeds with generating the page. After completion, the page gets placed into the cache and the lock gets removed from memcached.
5. If we do not get the lock (the add fails), we stay in a loop, sleep for 200ms, and check and re-check whether the page is available. We keep checking until a timeout of 5 seconds (25 iterations).
6. The attendee list and comments/rating fragments of this page are cached in the same manner. Those sections will be re-generated while holding a lock object in the same manner. They will be regenerated if the fragment is not in the cache, and on or after updating of those fragments (i.e. somebody makes a comment or signs up to attend this event).
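Steps 1 to 5 might look roughly like the sketch below. As before, this is a Python illustration under assumptions rather than the actual implementation: `FakeMemcached`, the key names and `fetch_event_page` are hypothetical, the expiry is renewed by simply re-storing the entry, and the fallback after the 5-second wait is a guess because the writeup does not say what happens then.

```python
import time

class FakeMemcached:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # entry has timed out
            return None
        return value

    def set(self, key, value, ttl=0):
        expires_at = self._clock() + ttl if ttl else None
        self._data[key] = (value, expires_at)

    def add(self, key, value, ttl=0):
        # Like memcached 'add': store only if the key is absent
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


PAGE_TTL = 30        # event detail timeout from the writeup
LOCK_TTL = 20        # lock timeout from the writeup
POLL_SECONDS = 0.2   # sleep between checks
MAX_POLLS = 25       # 25 x 200 ms = 5 seconds


def fetch_event_page(cache, event_id, render, sleep=time.sleep):
    key = "event:%d:page" % event_id       # hypothetical key names
    lock_key = "event:%d:lock" % event_id

    page = cache.get(key)
    if page is not None:
        cache.set(key, page, PAGE_TTL)     # step 2: a hit renews the expiry
        return page

    if cache.add(lock_key, 1, LOCK_TTL):   # step 3: we hold the lock
        try:
            page = render(event_id)        # step 4: generate from the database
            cache.set(key, page, PAGE_TTL)
        finally:
            cache.delete(lock_key)
        return page

    for _ in range(MAX_POLLS):             # step 5: wait for the lock holder
        sleep(POLL_SECONDS)
        page = cache.get(key)
        if page is not None:
            return page

    # Assumption: after 5 seconds, give up waiting and render without caching
    return render(event_id)
```

Because every hit re-stores the entry with a fresh 30-second timeout, an event requested at least once every 30 seconds never leaves the cache, while rarely requested events drop out on their own.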
Other Pages
At this point, none of the other pages and/or their fragments are cached. Most of the other pages are accessed at low frequency, with the exception of the tag search page. The tag search page is the next candidate for caching and pre-generation. The caching strategy is still to be determined.
Page Caches with Ruby on Rails
Ruby on Rails does not natively use memcached for whole page caches, although it can do so for page fragments. Instead, it will generate static pages as files, and the request will be routed to the corresponding file that represents a fully rendered page.
The Ruby on Rails implementation of Web20Kit will use the native Rails mechanism for full page caches. Expirations result in a call to remove the file and follow the same expiry policy defined for each page, above. The file must be removed as the page cache expires, either by a request arriving after expiry, or by a background job.
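The file-based scheme can be illustrated with a small sketch. It is written in Python for illustration only and is not the Rails API: the helper names, the `.html` file layout and the use of the file's modification time to decide expiry are all assumptions.

```python
import os
import time


def cache_path(root, url_path):
    """Map a URL path to a static file under the cache root (hypothetical layout)."""
    name = url_path.strip("/") or "index"
    return os.path.join(root, name + ".html")


def write_page(root, url_path, html):
    """Store a fully rendered page as a static file the web server can serve."""
    path = cache_path(root, url_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp, path)  # atomic swap so readers never see a partial file
    return path


def expire_if_stale(root, url_path, max_age, now=None):
    """Remove the cached file once it is older than max_age seconds.

    Could be called by a request arriving after expiry or by a background job.
    Returns True if a file was removed.
    """
    path = cache_path(root, url_path)
    now = time.time() if now is None else now
    try:
        if now - os.path.getmtime(path) >= max_age:
            os.remove(path)
            return True
    except FileNotFoundError:
        pass  # already expired by someone else
    return False
```

Once the file is gone, the next request falls through to the application, which renders the page and writes a fresh file.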
Cheers,
- Will Sobel



