This issue was discussed at considerable length in this thread:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/8665766f5e262d60



On Sep 7, 9:33 am, Monica Keller <[email protected]> wrote:
> +1 definitely !
>
> I think everyone asks Twitter the same question, but the problem is
> that they developed the firehose before PSHB existed.
>
> What are the main cons of PSHB?
>
> On Mon, Sep 7, 2009 at 8:48 AM, Jesse Stay <[email protected]> wrote:
> > Not necessarily.  See this document (which I've posted earlier on this
> > list) for details: http://code.google.com/p/pubsubhubbub/wiki/PublisherEfficiency
> > In essence, with PSHB (PubSubHubbub), Twitter would only have to take
> > the latest data and store it as a static Atom feed, whether in flat
> > files on the server or in a single column in a database somewhere.
> > Then, using a combination of persistent connections, HTTP pipelining,
> > and multiple cached, linked Atom feeds, it could return those feeds to
> > either a hub or the user.  Atom feeds can be linked, so Twitter doesn't
> > need to return the entire dataset in each feed, just the latest data,
> > linked to older data on the server (if I understand Atom correctly -
> > someone correct me if I'm wrong).
> > So in essence Twitter only needs to retrieve and return the latest
> > (cached) data to the user or hub, and can do so over a persistent
> > connection with multiple HTTP requests in flight at a time.  And none of
> > this even takes into account the biggest advantage of PSHB - the hub.
> > PSHB is built to be distributed.  I know Twitter doesn't want to go
> > there, but if they wanted to they could allow other authorized hubs to
> > distribute the load of serving that data, and only the hubs would fetch
> > data from Twitter.  That would significantly reduce the load on Twitter
> > regardless of request volume, while ensuring that a) users own their
> > data in a publicly owned format, and b) if Twitter ever goes down, the
> > content is still available via the API.  IMO this is the only way
> > Twitter will become a "utility" as Jack Dorsey wants it to be.
> > I would love to see Twitter adopt a widely accepted standard like this.
> > Or, if PSHB doesn't meet their needs, they could either create their own
> > public standard and take the lead in open real-time stream standards, or
> > join an existing effort so the standards can be refined into something a
> > company of Twitter's scale can handle.  It would certainly make my coding
> > much easier; as it stands, each company adopts its own protocol and I'm
> > stuck writing custom code for every one of them.
> > Leaving data retrieval in a closed, proprietary format benefits nobody.
> > Jesse
> > On Mon, Sep 7, 2009 at 7:52 AM, Dewald Pretorius <[email protected]> wrote:
>
> >> SUP will not work for Twitter or any other service that deals with
> >> very large data sets.
>
> >> In essence, a Twitter SUP feed would be one JSON array of all the
> >> Twitter users who have posted a status update in the past 60 seconds.
>
> >> a) The SUP feed will consistently contain a few million array entries.
>
> >> b) To build that feed you must do a select against the tweets table,
> >> which contains a few billion records, and extract all the user ids
> >> with a tweet that has a published time greater than now() - 60. Good
> >> luck asking any DB to do that kind of select once every 60 seconds.
>
> >> Dewald
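
To make Dewald's objection concrete: below is roughly what producing a
SUP feed looks like, sketched in Python against a toy SQLite schema.
The table and column names are made up, SUP's update entries are
simplified, and epoch seconds stand in for SUP's timestamp strings -
treat it as an illustration of the query he describes, not a real
implementation.

    # Rough sketch of a SUP feed generator, assuming a hypothetical
    # schema tweets(user_id, published); names are illustrative only.
    import json
    import sqlite3
    import time

    PERIOD = 60  # seconds covered by each feed snapshot

    def build_sup_feed(conn):
        now = int(time.time())
        # This is the query Dewald objects to: a scan over a table with
        # a few billion rows for everything published in the last 60
        # seconds, re-run once every 60 seconds.
        rows = conn.execute(
            "SELECT DISTINCT user_id FROM tweets WHERE published > ?",
            (now - PERIOD,),
        ).fetchall()
        return json.dumps({
            "period": PERIOD,
            "since_time": now - PERIOD,   # real SUP uses RFC 3339 strings
            "updated_time": now,
            # Real SUP pairs an opaque per-feed token (the "SUP-ID") with
            # an update id; the bare user id stands in for both here.
            "updates": [[str(user_id), str(now)] for (user_id,) in rows],
        })

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE tweets (user_id INTEGER, published INTEGER)")
        conn.execute("INSERT INTO tweets VALUES (42, ?)", (int(time.time()),))
        print(build_sup_feed(conn))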

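And for contrast, the publisher side of the PSHB flow Jesse describes
really is lightweight: when a feed changes, the publisher just POSTs a
tiny ping to a hub, and the hub takes care of fetching the feed and
fanning it out to subscribers.  A minimal sketch, assuming the reference
hub and a hypothetical feed URL:

    # Sketch of a PubSubHubbub publisher ping; the topic URL is a
    # placeholder, and error handling is omitted.
    from urllib.parse import urlencode
    from urllib.request import Request, urlopen

    HUB_URL = "http://pubsubhubbub.appspot.com/"   # reference hub
    TOPIC_URL = "http://example.com/updates.atom"  # hypothetical feed

    def notify_hub(hub_url, topic_url):
        # Per the spec, the ping is a small form-encoded POST; the hub
        # then fetches the topic feed once and pushes the new entries
        # to every subscriber.
        body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode()
        request = Request(hub_url, data=body)
        request.add_header("Content-Type", "application/x-www-form-urlencoded")
        with urlopen(request) as response:
            return response.status  # the spec calls for 204 No Content

    if __name__ == "__main__":
        print(notify_hub(HUB_URL, TOPIC_URL))

Everything heavier - the polling, the fan-out, the retries - lives at
the hub, which is the point of the PublisherEfficiency page Jesse links
above.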