Re: [PERFORM] Partitioning / Clustering

Alex Stapleton Wed, 11 May 2005 02:09:34 -0700


On 11 May 2005, at 09:50, Alex Stapleton wrote:

On 11 May 2005, at 08:57, David Roussel wrote:
For an interesting look at scalability, clustering, caching, etc. for a large site, have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf

I have implemented similar systems in the past. It's a pretty good technique; unfortunately it's not very "Plug-and-Play", as you have to base most of your API on memcached (I imagine MySQL's NDB tables might work as well, actually) for it to work well.
They have 2.6 Million active users, posting 200 new blog entries per
minute, plus many comments and countless page views.
Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale.

They use MySQL on Dell hardware! And found single master replication did not scale. There's a section on multimaster replication, not sure if they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews.
I don't think they are storing pre-rendered pages (or bits of them) in memcached, but are principally storing the data for the pages in it. Gluing pages together is not usually a hugely intensive process :) The only problem with memcached is that the client's clustering/partitioning system will probably break if a node dies, and will probably get confused if you add new nodes to it as well. Easily extensible clustering (no complete redistribution of data required when you add/remove nodes) with the data distributed across nodes seems to be nothing but a pipe dream right now.
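
To make the redistribution worry concrete, here is a minimal sketch. It assumes the client picks a node with a plain hash(key) modulo node-count scheme; that is an assumption for illustration, not the actual memcached client code, and the node names and the helpers node_for and moved_fraction are made up.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node by hashing the key and taking it modulo the node count.

    Assumed scheme for illustration only; real clients may differ.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def moved_fraction(keys, before, after):
    """Fraction of keys that land on a different node after the node list changes."""
    moved = sum(1 for k in keys if node_for(k, before) != node_for(k, after))
    return moved / len(keys)


keys = ["user:%d" % i for i in range(10000)]
three_nodes = ["cache-a", "cache-b", "cache-c"]   # hypothetical node names
four_nodes = three_nodes + ["cache-d"]            # one node added
two_nodes = three_nodes[:2]                       # one node lost

grow = moved_fraction(keys, three_nodes, four_nodes)
shrink = moved_fraction(keys, three_nodes, two_nodes)
print("moved after adding a node:  %.0f%%" % (grow * 100))
print("moved after losing a node:  %.0f%%" % (shrink * 100))
```

Under this assumed scheme, going from three nodes to four moves about three quarters of the keys, so most of the cache is effectively cold after the change. That is the "complete redistribution" problem described above.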

It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up.

Don't Google use indexing servers which keep track of where data is? So that you only need to update them when you add or move data, deletes don't even have to be propagated among indexes immediately really because you'll find out if data isn't there when you visit where it should be. Or am I talking crap?

That will teach me to RTFA first ;) Ok, so LJ maintain an index of which cluster each user is on, kind of like Google do :)
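
A rough sketch of that index idea, under stated assumptions: UserDirectory, the cluster names, and the helpers save_entry and move_user are all hypothetical, and plain dicts stand in for the database clusters. It is not how LJ actually implement it, just the shape of the lookup.

```python
class UserDirectory:
    """Hypothetical global index: user id -> name of the cluster holding that user."""

    def __init__(self):
        self._cluster_of = {}

    def assign(self, user_id, cluster):
        self._cluster_of[user_id] = cluster

    def lookup(self, user_id):
        return self._cluster_of[user_id]


# Each "cluster" is just a dict here; in practice it would be a database connection.
clusters = {"cluster1": {}, "cluster2": {}}
directory = UserDirectory()


def save_entry(user_id, entry):
    """Ask the index where the user lives, then write only to that cluster."""
    cluster = directory.lookup(user_id)
    clusters[cluster].setdefault(user_id, []).append(entry)


def move_user(user_id, target):
    """Copy one user's data to another cluster, then repoint the index."""
    source = directory.lookup(user_id)
    clusters[target][user_id] = clusters[source].pop(user_id, [])
    directory.assign(user_id, target)


directory.assign(1, "cluster1")
directory.assign(2, "cluster2")
save_entry(1, "first post")
save_entry(2, "hello")

# Adding a cluster touches nobody until a user is explicitly moved onto it.
clusters["cluster3"] = {}
move_user(1, "cluster3")
save_entry(1, "second post")
```

The point of the explicit index is that adding a cluster does not redistribute anything by itself; only the users you choose to move have their index entry updated.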

David
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend

