I am not sure I want to sit around defining an algorithm, because this really is a well-defined and solved problem. DNS is a well-implemented example: a monster phonebook distributed across a few root servers, working beautifully for the internet in an eventually consistent manner.
Also - algorithm optimization generally contributes to performance on a per-node basis. Scalability is just a price-performance equation: performance is usually compromised as scale increases, and price determines how much it is compromised by.

Twitter appears to have issues with members who have thousands of followers, and with handling the distribution of those tweets. Broadcasting is the issue, not receiving. It would be really bizarre for one user to follow 10 million people - that person would have no life! But 10 million people following 1 person is a real possibility.

One thing is clear: Twitter is a lesson in why architecture matters. It is not an O(N) system. It is tightly coupled, in the sense that increased traffic puts pressure on ALL tiers of the architecture simultaneously. This indicates short-sighted design.

Twitter is built on:
- Ruby
- Memcached
- MySQL

They use a stunning amount of Memcached - 16GB, I believe - to keep their service responsive when it is up (their uptime isn't so great). They also have a home-grown message queuing system (Starling) that works with Memcached.

All the crazy accusations that Ruby on Rails was the core reason for Twitter's scalability issues turned out to be false. Nobody really knows why Twitter is flaky, but I am pretty sure separation of concerns has more to do with it than the algorithm. If they knew the problem was O(2N) vs O(2^N), they have enough funding to buy more hardware and fix it! They simply cannot scale individual tiers in isolation at the moment to assess the impact on scalability.

I feel Twitter would be better served if the architecture looked like:
- Erlang + ErlyWeb + Mnesia (take a BASE, not ACID, mentality)
- XMPP protocol
- C++ for message logging/archival (because Erlang's I/O and string libraries are not where they should be)
- Clustered filesystem (no DB), large files, hierarchical storage

Facebook Chat utilizes a similar architecture for 60 million users. I believe they are re-engineering Twitter as we speak. Let's see where they go with it.
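To make the "broadcasting is the issue" point concrete, here is a minimal push-model (fan-out-on-write) sketch in Python. All names here (followers, timelines, post_tweet) are hypothetical, just to illustrate where the cost lands: reads become cheap cache lookups, but every tweet costs one write per follower - which is exactly why the 10-million-follower account is the hard case.

```python
from collections import defaultdict, deque

TIMELINE_LEN = 20  # keep only the latest 20 tweet ids per timeline

followers = defaultdict(set)    # author -> set of follower ids
timelines = defaultdict(deque)  # user -> deque of latest tweet ids, newest first

def follow(follower, author):
    followers[author].add(follower)

def post_tweet(author, tweet_id):
    # Broadcast: one write per follower -- O(#followers) work per tweet.
    for f in followers[author]:
        tl = timelines[f]
        tl.appendleft(tweet_id)
        while len(tl) > TIMELINE_LEN:
            tl.pop()  # drop the oldest entry

def render_timeline(user):
    # Read side is a single cheap lookup, Memcached-style.
    return list(timelines[user])

follow("alice", "bob")
post_tweet("bob", "t1")
post_tweet("bob", "t2")
print(render_timeline("alice"))  # -> ['t2', 't1']
```

In a real deployment the timelines dict would be the Memcached tier and the broadcast loop would run behind a queue like Starling, so a celebrity tweet doesn't block the request path.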
Cheers,
Zubin.

On Sun, Jun 8, 2008 at 9:38 AM, Saifi Khan <[EMAIL PROTECTED]> wrote:
> Hi all:
>
> Surely, most of us would have tried twitter atleast once.
> It is a cool 140 character (aka tweet) micro-blog facility
> which then members can follow and be updated.
>
> Interestingly enough this Ruby+MySQL implementation is
> plagued by outages given the usage load among other issues.
>
> i was just wondering if anybody would like to take a crack
> at the problem from an algorithm perspective.
>
> "There are M users and each user follows N other users.
> Each user's timeline is rendered with latest 20 tweets
> from all the friends."
>
> What is a good way to model this problem ?
>
> How to scale the solution when N is very large say 10 million ?
>
> All suggestions are welcome.
>
> thanks
> Saifi.
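For the pull-model framing of Saifi's question above, a minimal sketch (the render_timeline helper and its inputs are hypothetical, assuming each friend's tweets are already sorted newest-first): rendering the latest 20 is a k-way merge of N sorted streams, roughly O(20 log N) per render with a heap, independent of total tweet count - but it touches all N sources per read, which is what hurts when N is huge.

```python
import heapq
from itertools import islice

PAGE = 20  # latest 20 tweets per timeline render

def render_timeline(friend_tweets):
    """friend_tweets: one list per friend, each sorted newest-first
    as (timestamp, text) tuples. Returns the PAGE newest overall."""
    # heapq.merge wants ascending streams, so negate the timestamps,
    # merge, take the first PAGE items, then negate back.
    negated = ([(-ts, text) for ts, text in tweets] for tweets in friend_tweets)
    top = islice(heapq.merge(*negated), PAGE)
    return [(-neg_ts, text) for neg_ts, text in top]

latest = render_timeline([[(30, "a2"), (10, "a1")], [(20, "b1")]])
print(latest)  # -> [(30, 'a2'), (20, 'b1'), (10, 'a1')]
```

That per-read cost versus the per-write cost of fan-out is the real modelling trade-off here, and it is an architectural choice rather than an algorithmic one.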

