This issue was discussed at length in this thread: http://groups.google.com/group/twitter-development-talk/browse_thread/thread/8665766f5e262d60
On Sep 7, 9:33 am, Monica Keller <[email protected]> wrote:
> +1 definitely!
>
> I think everyone asks Twitter the same question, but the problem was
> that they developed the firehose prior to PSHB.
>
> What are the main cons of PSHB?
>
> On Mon, Sep 7, 2009 at 8:48 AM, Jesse Stay <[email protected]> wrote:
> > Not necessarily. See this document (which I've posted earlier on this
> > list) for details:
> > http://code.google.com/p/pubsubhubbub/wiki/PublisherEfficiency
> >
> > In essence, with PSHB (PubSubHubbub), Twitter would only have to
> > retrieve the latest data and add it to flat files on the server, or a
> > single column in a database somewhere, as a static RSS format. Then,
> > using a combination of persistent connections, HTTP pipelining, and
> > multiple cached and linked Atom feeds, return those feeds to either a
> > hub or the user. Atom feeds can be linked, and Twitter doesn't need
> > to return the entire dataset in each feed, just the latest data,
> > linked to older data on the server (if I understand Atom correctly -
> > someone correct me if I'm wrong).
> >
> > So in essence Twitter only needs to retrieve, and return to the user
> > or hub, the latest (cached) data, and can do so over a persistent
> > connection, multiple HTTP requests at a time. And of course this
> > doesn't take into account the biggest advantage of PSHB - the hub.
> > PSHB is built to be distributed. I know Twitter doesn't want to go
> > there, but if they wanted to they could allow other authorized hubs
> > to distribute the load of such data, and only the hubs would fetch
> > data from Twitter, significantly reducing the load for Twitter
> > regardless of the size of the request, and ensuring that a) users own
> > their data in a publicly owned format, and b) if Twitter ever goes
> > down, the content is still available via the API. IMO this is the
> > only way Twitter will become a "utility" as Jack Dorsey wants it to
> > be.
> >
> > I would love to see Twitter adopt a more publicly accepted standard
> > like this. Or, if it doesn't meet their needs, either create their
> > own public standard and take the lead in open real-time stream
> > standards, or join an existing one so the standards can be refined to
> > the point where a company like Twitter can handle them. I know it
> > would make my coding much easier as more companies adopt these
> > protocols, instead of my having to write separate code for each one.
> >
> > Leaving the data retrieval in a closed, proprietary format benefits
> > nobody.
> >
> > Jesse
> >
> > On Mon, Sep 7, 2009 at 7:52 AM, Dewald Pretorius <[email protected]> wrote:
> > > SUP will not work for Twitter or any other service that deals with
> > > very large data sets.
> > >
> > > In essence, a Twitter SUP feed would be one JSON array of all the
> > > Twitter users who have posted a status update in the past 60
> > > seconds.
> > >
> > > a) The SUP feed will consistently contain a few million array
> > > entries.
> > >
> > > b) To build that feed you must do a select against the tweets
> > > table, which contains a few billion records, and extract all the
> > > user IDs with a tweet that has a published time greater than
> > > now() - 60. Good luck asking any DB to do that kind of select once
> > > every 60 seconds.
> > >
> > > Dewald
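To make Jesse's publisher-side argument concrete, here is a minimal Python sketch of the PSHB publish ping. The hub and topic URLs are placeholders (pubsubhubbub.appspot.com was the reference hub of the time; the feed URL is hypothetical); the mechanics follow the PubSubHubbub 0.3 spec, where publishing is just a form-encoded POST and the hub handles fetching and fan-out.

    import urllib.parse
    import urllib.request

    HUB_URL = "https://pubsubhubbub.appspot.com/"        # reference hub
    TOPIC_URL = "http://example.com/statuses/public.atom"  # hypothetical feed

    def notify_hub(hub_url, topic_url):
        """Send a PSHB publish ping: tell the hub the topic has new content.

        Per the PubSubHubbub 0.3 spec this is a form-encoded POST with
        hub.mode=publish and hub.url=<topic>; the hub replies 204 No
        Content, then fetches the feed itself, once, on behalf of all
        subscribers.
        """
        body = urllib.parse.urlencode(
            {"hub.mode": "publish", "hub.url": topic_url}
        ).encode("ascii")
        request = urllib.request.Request(hub_url, data=body)  # data= makes it a POST
        with urllib.request.urlopen(request) as response:
            return response.status

    if __name__ == "__main__":
        print(notify_hub(HUB_URL, TOPIC_URL))  # expect 204 on success

The "linked" Atom feeds Jesse mentions correspond to RFC 5005 (Feed Paging and Archiving): the current feed document stays small and points at older, immutable archive documents via <link rel="prev-archive"/>, so the publisher and hub only ever exchange the newest page.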

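For contrast, here is a sketch of the SUP feed Dewald is objecting to. The table and column names (tweets, user_id, published_at) are assumptions, and the payload is simplified: real SUP lists [feed_id, update_id] pairs and uses RFC 3339 timestamps, not bare user IDs and Unix times. An index on published_at would keep the select itself bounded to the last minute's rows, but the result set, everyone who tweeted in that minute, is still millions of entries rebuilt every cycle, which is Dewald's point a).

    import json
    import sqlite3  # stand-in engine; the objection is about scale, not vendor
    import time

    def build_sup_feed(conn, period=60):
        """Build a SUP-style feed: which users posted in the last `period` seconds."""
        now = int(time.time())
        user_ids = [
            row[0]
            for row in conn.execute(
                "SELECT DISTINCT user_id FROM tweets WHERE published_at > ?",
                (now - period,),
            )
        ]
        return json.dumps({
            "period": period,
            "since_time": now - period,
            "until_time": now,
            "updates": user_ids,  # real SUP: [feed_id, update_id] pairs
        })

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE tweets (user_id INTEGER, published_at INTEGER)")
        conn.execute("INSERT INTO tweets VALUES (42, ?)", (int(time.time()),))
        print(build_sup_feed(conn))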