on these "philosophical" notes, boring for some, exciting for yours 
truly, there are a few other pointers (just to point out that the concept of 
"caching" isn't something to discard beforehand)... be aware that on 
"caching" there are probably thousands of posts and pages of books written 
on the matter.

Like everything, it's a process that has "steps". Let's spend 10 seconds of 
silence on the sentence "premature optimization is the root of all evil". 
And another 10 on "There are only two hard things in Computer Science: 
cache invalidation and naming things". Let those sink in.

Ready? Let's go.

Step #0: assessment

Consider an app with a page that shows the records of a table that YOU 
KNOW (as you are the developer) gets refreshed once a day (e.g. the 
temperature recorded in LA for the previous day).
Or a page that shows the content of a row that never gets updated (e.g. a 
blogpost).
Given that the single most expensive operation in a traditional webapp is 
the database (just think of web2py requesting the data, the database 
reading it from disk, preparing it, sending it over the wire, web2py 
receiving it), developers should always find a way to spend the least 
possible time on those steps.
Optimizing queries (and/or database tuning, normalization, etc). Reducing 
the number of queries needed to render a page. Requesting just the amount 
of data needed (paging). Those are HUUUUGE topics (again, zillions of 
posts, books, years of expertise to master, etc etc etc). But - of course - 
not having to issue a query at all short-circuits all of the above!
Still at step #0: as users come by your app, every request made to those 
pages triggers a roundtrip to the database, always for the same data, over 
and over.
Granted, 50 req/s certainly won't hurt performance, but once they get 
to 500, it'll become pretty obvious that "a" short-circuit could save LOTS 
of processing power.
When you face the problem of scaling to serve more concurrent requests, 
you either spawn more processes or add servers.
Adding frontend servers is easy: the data is transactionally consistent as 
long as you have a single database instance. You put a load balancer in 
front of the frontends (it's relatively inexpensive) and go on.
Scaling databases by adding servers is NEVER easy (again, the interwebs and 
libraries are full of evidence, and a big part of nosql's "shiny" features 
is indeed horizontal scaling, with pros and cons).

Step #1: local caching
Back to your app without cache... wouldn't it be better to avoid calling 
the db for the same data 500 times per second? Sure. Cache it.
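A minimal sketch of the idea (a toy stand-in for what cache.ram offers - memoize a callable under a key with a time-to-live - with a hypothetical fetch_temperatures playing the role of the db roundtrip; this is not web2py's actual implementation):

```python
import time

# toy per-process cache: key -> (expiry timestamp, value)
_store = {}

def cache_ram(key, func, time_expire=3600):
    """Return the cached value for key, recomputing it only when expired."""
    now = time.time()
    hit = _store.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]                  # still fresh: no roundtrip
    value = func()                     # miss or expired: do the work once
    _store[key] = (now + time_expire, value)
    return value

# hypothetical expensive query, with a counter for the "roundtrips"
calls = {"n": 0}
def fetch_temperatures():
    calls["n"] += 1
    return [("LA", 21.5)]

# 500 requests for the same page, one actual query
for _ in range(500):
    rows = cache_ram("la_temps", fetch_temperatures, time_expire=3600)

print(calls["n"])  # → 1
```

In web2py you wouldn't write any of this yourself: you'd wrap the select in a `cache.ram('somekey', lambda: ..., time_expire=3600)` call and get the same behavior.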

Assuming you cache the database results, web2py still needs to compute the 
view, but that step, compared to the roundtrip it short-circuits, is orders 
of magnitude less expensive. (Yes, if you cache views, you're sidestepping 
web2py's rendering too, but let's keep as few variables as possible for the 
sake of this discussion.)
And there you are, at the first iteration of step #1, using 1MB more RAM 
to avoid hitting the database.
Cache it for just an hour, do the math on the simple example of 50 req/s, 
and you saved 50*60*60 - 1 = 179999 roundtrips. You can use the savings to 
do 179999 roundtrips you actually NEED in other places of your app, keeping 
the same performance at no additional cost.
Whoa! 
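The savings math, spelled out:

```python
req_per_sec = 50
ttl_seconds = 60 * 60                        # cache for one hour

total_requests = req_per_sec * ttl_seconds   # 180000 page hits in that hour
db_roundtrips = 1                            # only the first one queries the db
saved = total_requests - db_roundtrips

print(saved)  # → 179999
```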

Step #300
You start caching here and there, and you use 500MB of RAM. You're using 
cache.ram, everything is super-speedy, no third parties, just web2py 
features. 
Now you need to serve 100 req/s, so you spawn another process.... whoopsie 
.... 1GB of RAM. Or another server: 500MB on the first and 500MB on the 
second... 500 are "clearly" wasted, as they are a copy of the "original" 
500. 
And the second process (or server) still needs to do roundtrips if its 
local cache doesn't contain your already-cached-in-another-place query.
Also, something else "creeps in"... as your app grows, you start losing 
track of what you cached, when you cached it, how long it needs to be 
cached... a record fetched on the first server at 8:00AM could be updated 
in the meantime and fetched on the second server (because it isn't in its 
local cache) at 8:02AM... you're effectively serving different versions 
from cache!
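That staleness is easy to reproduce in a toy simulation (LocalCache stands in for each process's cache.ram, a plain dict for the database - names are made up for illustration):

```python
import time

# the single source of truth
database = {"post:1": "draft v1"}

class LocalCache:
    """Per-process cache, like cache.ram: each server holds its own copy."""
    def __init__(self):
        self._store = {}  # key -> (expiry timestamp, value)

    def get_or_fetch(self, key, time_expire=3600):
        now = time.time()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                  # serve the local copy, fresh or not
        value = database[key]              # roundtrip to the db
        self._store[key] = (now + time_expire, value)
        return value

server_a, server_b = LocalCache(), LocalCache()

v1 = server_a.get_or_fetch("post:1")   # 8:00AM: server A caches "draft v1"
database["post:1"] = "draft v2"        # the record is updated in the meantime
v2 = server_b.get_or_fetch("post:1")   # 8:02AM: server B caches "draft v2"

# server A will keep serving the stale v1 until its TTL expires
print(v1, v2)
```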

Step #301
To sidestep both issues, you use redis or memcached: they sit outside of 
the web2py process and consume only 500MB. 
For one, two, a zillion processes. And they are a single source of truth. 
And they are as speedy as cache.ram (or at least in the same order of 
magnitude).
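In web2py the swap is mostly configuration: the web2py book documents a Redis-backed cache along these lines (module paths and signatures may differ between versions, so treat this as a sketch and check your copy of the book before pasting it into a model file):

```python
# in models/db.py -- a Redis-backed cache shared by every process
from gluon.contrib.redis_utils import RConn
from gluon.contrib.redis_cache import RedisCache

rconn = RConn('localhost', 6379)          # one Redis instance for all workers
cache.redis = RedisCache(redis_conn=rconn)

# same calling convention as cache.ram:
rows = cache.redis('la_temps',
                   lambda: db(db.temperature).select(),
                   time_expire=3600)
```

The nice part is that callers don't change shape: swapping `cache.ram` for `cache.redis` (or even assigning one to the other in a model) moves you from per-process caches to a single shared one.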

On Wednesday, April 27, 2016 at 2:04:07 PM UTC+2, Anthony wrote:
>
> On Wednesday, April 27, 2016 at 7:00:53 AM UTC-4, Pierre wrote:
>>
>> I'm impressed Anthony...
>>
>> well, all of these - memcache-redis - seem to require lots of 
>> technicality and probably to set up your own deployment machine. I am not 
>> very enthusiastic about that option since the internet is full of endless 
>> technical setup-config issues. Given what's been said, I see only two 
>> kinds of pages of my app I could cache: general information and maybe 
>> forms (no db().select()) all shared and uniform data.
>>
>
> I'm not sure what you mean by caching forms, but you probably don't want 
> to do that (at least if we're talking about web2py forms, which each 
> include a unique hidden _formkey field to protect against CSRF attacks).
>  
>
>> There should be a simple way to achieve such a simple thing whatever the 
>> platform: pythonanywhere,vs......Is there one ? 
>>
>
> You can just use cache.ram. If running uWSGI with multiple processes, you 
> will have a separate cache for each, but that won't necessarily be a 
> problem (just not as efficient as it could be). You could also try 
> cache.disk and do some testing to see how it impacts performance.
>
> More generally, caching is something you do to improve efficiency, which 
> becomes important as you start to have lots of traffic. But if you've got 
> enough traffic where efficiency becomes so important, you should probably 
> be willing (and hopefully able) to put in some extra effort to set up 
> something like Memcached or Redis. Until you hit that point, don't worry 
> about it.
>
> Anthony
>

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.