Flat-file generation and maintenance would be foolish at this stage. Separating out the individual data sets purely for the API, served by different clusters with server-side caching, may fit the bill - but honestly, if this isn't happening already I'll be shocked.
On Sep 7, 5:40 am, Jesse Stay <jesses...@gmail.com> wrote:
> As far as retrieving the large graphs from a DB, flat files are one way -
> another is to just store the full graph (of ids) in a single column in the
> database and parse on retrieval. This is what FriendFeed is doing
> currently, so they've said. Dewald and I are both talking about this
> because we're also having to duplicate this on our own servers, so we too
> have to deal with the pains of the social graph. (and oh the pain it is!)
>
> On Sun, Sep 6, 2009 at 8:44 PM, Dewald Pretorius <dpr...@gmail.com> wrote:
>
> > If I worked for Twitter, here's what I would have done.
> >
> > I would have grabbed the follower id list of the large accounts (those
> > that usually kicked back 502s) and written them to flat files once
> > every 5 or so minutes.
> >
> > When an API request comes in for that list, I'd just grab it from the
> > flat file, instead of asking the DB to select 2+ million ids from
> > amongst a few billion records, while it's trying to do a few thousand
> > other selects at the same time.
> >
> > That's one way of getting rid of 502s on large social graph lists.
> >
> > Okay, the data is going to be 5 minutes out-dated. To that I say, so
> > bloody what?
> >
> > Dewald
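For what it's worth, the "full graph in a single column" idea Jesse mentions can be sketched in a few lines. This is only an illustration of the general technique, not FriendFeed's actual schema - the table name, column names, and serialization format below are all invented, and SQLite stands in for whatever database you'd really use:

```python
# Hypothetical sketch: serialize the whole follower id list into one
# TEXT column (one row per user) and parse it back out on read.
# Table/column names are made up for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follower_graphs (user_id INTEGER PRIMARY KEY, follower_ids TEXT)"
)

def save_graph(user_id, follower_ids):
    # One write replaces the entire graph for this user.
    blob = ",".join(str(i) for i in follower_ids)
    conn.execute(
        "INSERT OR REPLACE INTO follower_graphs VALUES (?, ?)",
        (user_id, blob),
    )

def load_graph(user_id):
    # A single indexed row fetch instead of selecting millions of rows;
    # the id parsing happens in application code.
    row = conn.execute(
        "SELECT follower_ids FROM follower_graphs WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return [int(i) for i in row[0].split(",")] if row and row[0] else []

save_graph(42, [101, 102, 103])
print(load_graph(42))  # [101, 102, 103]
```

The trade-off is the obvious one: reads and writes of the graph are all-or-nothing, so incremental follow/unfollow updates mean rewriting the whole blob.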
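And Dewald's flat-file idea is just as simple to sketch. Again, everything here is an assumption for illustration - the file layout, the `fetch_from_db` callback, and the helper names are invented; the only part taken from the post is the shape of the scheme (dump big lists to files, serve reads from the file, tolerate ~5 minutes of staleness):

```python
# Sketch of the flat-file cache described above: each large account's
# follower ids live in a plain text file, regenerated at most every
# ~5 minutes, and API reads hit the file instead of the database.
import os
import tempfile
import time

REFRESH_SECONDS = 5 * 60          # acceptable staleness window
CACHE_DIR = tempfile.mkdtemp()    # stand-in for a real cache directory

def _path(user_id):
    return os.path.join(CACHE_DIR, f"{user_id}.ids")

def write_snapshot(user_id, follower_ids):
    # Write to a temp file and rename, so a concurrent reader never
    # sees a half-written id list.
    tmp = _path(user_id) + ".tmp"
    with open(tmp, "w") as f:
        f.write("\n".join(str(i) for i in follower_ids))
    os.replace(tmp, _path(user_id))

def read_followers(user_id, fetch_from_db):
    # Serve from the flat file while it is fresh; otherwise fall back
    # to the (slow) database query and rewrite the snapshot.
    # fetch_from_db is a hypothetical callback standing in for the
    # "select 2+ million ids" query.
    p = _path(user_id)
    try:
        if time.time() - os.path.getmtime(p) < REFRESH_SECONDS:
            with open(p) as f:
                return [int(line) for line in f if line.strip()]
    except FileNotFoundError:
        pass
    ids = fetch_from_db(user_id)
    write_snapshot(user_id, ids)
    return ids

print(read_followers(1, lambda uid: [10, 20, 30]))  # [10, 20, 30]
```

The second call within the five-minute window never touches `fetch_from_db`, which is exactly the point: the expensive query runs once per refresh interval, not once per API request.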