I am not sure I want to sit around defining an algorithm, because this really is a well-defined and solved problem. DNS is a well-implemented example: a monster phonebook distributed across a few root servers, working beautifully for the internet in an eventually consistent manner.
Also - algorithm optimization generally contributes to performance on a per-node basis. Scalability is just a price-performance equation: performance is usually compromised as scale increases, and price determines how much it is compromised by.

Twitter appears to have issues with members who have thousands of followers, and with handling the distribution of those tweets. Broadcasting is the issue, not receiving. It would be really bizarre for one user to follow 10 million people - that person would have no life! But 10 million people following 1 person is a real possibility.

One thing is clear: Twitter is a lesson in why architecture matters. It is not an O(N) system. It is tightly coupled, in the sense that increased traffic puts pressure on ALL tiers of the architecture simultaneously. This indicates short-sighted design.

Twitter is built on:
- Ruby
- Memcached
- MySQL

They use a stunning amount of Memcached - 16GB, I believe - to keep their service responsive when it is up (their uptime isn't so great). They also have a home-grown message queuing system (Starling) that works with Memcached.

All the crazy accusations that Ruby on Rails was the core reason for Twitter's scalability issues turned out to be false. Nobody really knows why Twitter is flaky, but I am pretty sure separation of concerns has more to do with it than the algorithm. If they knew the problem was O(2N) vs O(2^N), they have enough funding to buy more hardware and fix it! They simply cannot scale individual tiers in isolation at the moment to assess the impact on scalability.

I feel Twitter would be better served if the architecture looked like:
- Erlang + ErlyWeb + Mnesia (take a BASE, not ACID, mentality)
- XMPP protocol
- C++ for message logging/archival (because Erlang's I/O and string libraries are not where they should be)
- Clustered filesystem (no DB), large files, hierarchical storage

Facebook Chat utilizes a similar architecture for 60 million users. I believe they are re-engineering Twitter as we speak. Let's see where they go with it.
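To make the "broadcasting is the issue" point concrete, here is a minimal push-model (fan-out-on-write) sketch in Python. All names here (followers, timelines, post_tweet) are hypothetical, just to illustrate where the cost lands: reads become cheap cache lookups, but every tweet costs one write per follower - which is exactly why the 10-million-follower account is the hard case.

```python
from collections import defaultdict, deque

TIMELINE_LEN = 20  # keep only the latest 20 tweet ids per timeline

followers = defaultdict(set)    # author -> set of follower ids
timelines = defaultdict(deque)  # user -> deque of latest tweet ids, newest first

def follow(follower, author):
    followers[author].add(follower)

def post_tweet(author, tweet_id):
    # Broadcast: one write per follower -- O(#followers) work per tweet.
    for f in followers[author]:
        tl = timelines[f]
        tl.appendleft(tweet_id)
        while len(tl) > TIMELINE_LEN:
            tl.pop()  # drop the oldest entry

def render_timeline(user):
    # Read side is a single cheap lookup, Memcached-style.
    return list(timelines[user])

follow("alice", "bob")
post_tweet("bob", "t1")
post_tweet("bob", "t2")
print(render_timeline("alice"))  # -> ['t2', 't1']
```

In a real deployment the timelines dict would be the Memcached tier and the broadcast loop would run behind a queue like Starling, so a celebrity tweet doesn't block the request path.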
Cheers,
Zubin.

On Sun, Jun 8, 2008 at 9:38 AM, Saifi Khan <[EMAIL PROTECTED]> wrote:
> Hi all:
>
> Surely, most of us would have tried twitter atleast once.
> It is a cool 140 character (aka tweet) micro-blog facility
> which then members can follow and be updated.
>
> Interestingly enough this Ruby+MySQL implementation is
> plagued by outages given the usage load among other issues.
>
> i was just wondering if anybody would like to take a crack
> at the problem from an algorithm perspective.
>
> "There are M users and each user follows N other users.
> Each user's timeline is rendered with latest 20 tweets
> from all the friends."
>
> What is a good way to model this problem ?
>
> How to scale the solution when N is very large say 10 million ?
>
> All suggestions are welcome.
>
> thanks
> Saifi.
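For the pull-model framing of Saifi's question above, a minimal sketch (the render_timeline helper and its inputs are hypothetical, assuming each friend's tweets are already sorted newest-first): rendering the latest 20 is a k-way merge of N sorted streams, roughly O(20 log N) per render with a heap, independent of total tweet count - but it touches all N sources per read, which is what hurts when N is huge.

```python
import heapq
from itertools import islice

PAGE = 20  # latest 20 tweets per timeline render

def render_timeline(friend_tweets):
    """friend_tweets: one list per friend, each sorted newest-first
    as (timestamp, text) tuples. Returns the PAGE newest overall."""
    # heapq.merge wants ascending streams, so negate the timestamps,
    # merge, take the first PAGE items, then negate back.
    negated = ([(-ts, text) for ts, text in tweets] for tweets in friend_tweets)
    top = islice(heapq.merge(*negated), PAGE)
    return [(-neg_ts, text) for neg_ts, text in top]

latest = render_timeline([[(30, "a2"), (10, "a1")], [(20, "b1")]])
print(latest)  # -> [(30, 'a2'), (20, 'b1'), (10, 'a1')]
```

That per-read cost versus the per-write cost of fan-out is the real modelling trade-off here, and it is an architectural choice rather than an algorithmic one.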

