+1 definitely! I think everyone asks Twitter the same question, but the problem was they developed the firehose prior to PSHB.
What are the main cons of PSHB?

On Mon, Sep 7, 2009 at 8:48 AM, Jesse Stay <jesses...@gmail.com> wrote:
> Not necessarily. See this document (which I've posted earlier on this list)
> for details: http://code.google.com/p/pubsubhubbub/wiki/PublisherEfficiency
>
> In essence, with PSHB (PubSubHubbub), Twitter would only have to retrieve
> the latest data and add it to flat files on the server, or a single column
> in a database somewhere, as a static RSS format. Then, using a combination
> of persistent connections, HTTP pipelining, and multiple cached and linked
> ATOM feeds, return those feeds to either a hub or the user. ATOM feeds can
> be linked, and Twitter doesn't need to return the entire dataset in each
> feed, just the latest data, linked to older data on the server (if I
> understand ATOM correctly - someone correct me if I'm wrong).
>
> So in essence Twitter only needs to retrieve the latest (cached) data and
> return it to the user or hub, and can do so over a persistent connection,
> multiple HTTP requests at a time. And of course this doesn't take into
> account the biggest advantage of PSHB - the hub. PSHB is built to be
> distributed. I know Twitter doesn't want to go there, but if they wanted
> to they could allow other authorized hubs to distribute the load of such
> data, and only the hubs would fetch data from Twitter, significantly
> reducing the load for Twitter regardless of the size of the request and
> ensuring a) users own their data in a publicly owned format, and b) if
> Twitter ever goes down the content is still available via the API. IMO
> this is the only way Twitter will become a "utility" as Jack Dorsey wants
> it to be.
>
> I would love to see Twitter adopt a more publicly accepted standard like
> this. Or, if it's not meeting their needs, either create their own public
> standard and take the lead in open real-time stream standards, or join an
> existing one so the standards can be perfected in a manner a company like
> Twitter can handle.
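[The "linked ATOM feeds" idea in the message above can be sketched roughly as follows: the current feed holds only the newest entries plus a link pointing at an archive of older data, in the style of RFC 5005's `prev-archive` relation. The feed title and URLs here are hypothetical, not Twitter's actual feeds.]

```python
# Sketch of a "linked" Atom feed page: only the latest entries are
# inlined; older data is reachable through a rel="prev-archive" link
# (RFC 5005), so a hub or client never fetches the full dataset.
from xml.etree import ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def build_feed_page(entries, self_url, prev_archive_url=None):
    """Build a small Atom page: latest entries plus a link to older data."""
    ET.register_namespace("", ATOM)
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "Example updates"
    ET.SubElement(feed, f"{{{ATOM}}}link", rel="self", href=self_url)
    if prev_archive_url:
        # Pointer to the previous (older) page, instead of inlining it.
        ET.SubElement(feed, f"{{{ATOM}}}link",
                      rel="prev-archive", href=prev_archive_url)
    for entry_id, title in entries:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
        ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    return ET.tostring(feed, encoding="unicode")

xml = build_feed_page(
    [("tag:example.com,2009:2", "newest"), ("tag:example.com,2009:1", "older")],
    self_url="http://example.com/feed",
    prev_archive_url="http://example.com/feed?page=2",
)
print(xml)
```

[The point of the sketch: each 60-second window only requires serving the small current page; clients that want history walk the archive links at their own pace.]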
> I know it would make my coding much easier as more companies begin to
> adopt these protocols, rather than being stuck having to write the code
> for each one.
>
> Leaving the data retrieval in a closed, proprietary format benefits nobody.
>
> Jesse
>
> On Mon, Sep 7, 2009 at 7:52 AM, Dewald Pretorius <dpr...@gmail.com> wrote:
>>
>> SUP will not work for Twitter or any other service that deals with
>> very large data sets.
>>
>> In essence, a Twitter SUP feed would be one JSON array of all the
>> Twitter users who have posted a status update in the past 60 seconds.
>>
>> a) The SUP feed will consistently contain a few million array entries.
>>
>> b) To build that feed you must do a select against the tweets table,
>> which contains a few billion records, and extract all the user ids
>> with a tweet that has a published time greater than now() - 60. Good
>> luck asking any DB to do that kind of select once every 60 seconds.
>>
>> Dewald