+1, definitely!

I think everyone asks Twitter the same question, but the problem is that
they developed the firehose prior to PSHB.

What are the main cons of PSHB?

On Mon, Sep 7, 2009 at 8:48 AM, Jesse Stay<jesses...@gmail.com> wrote:
> Not necessarily.  See this document (which I've posted earlier on this list)
> for details: http://code.google.com/p/pubsubhubbub/wiki/PublisherEfficiency
> In essence, with PSHB (PubSubHubbub), Twitter would only have to retrieve
> the latest data, add it to flat files on the server or a single column in a
> database somewhere as a static RSS format.  Then, using a combination of
> persistent connections, HTTP Pipelining, and multiple, cached and linked
> ATOM feeds, return those feeds to either a hub or the user.  ATOM feeds can
> be linked, and Twitter doesn't need to return the entire dataset in each
> feed, just the latest data, linked to older data on the server (if I
> understand ATOM correctly - someone correct me if I'm wrong).
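To sketch the linked-feed idea above (assuming RFC 5005-style "prev-archive"
paging; the URLs and helper names here are illustrative, not Twitter's actual
endpoints): the current page holds only the newest entries and links back to
an older, immutable archive page, so the publisher never re-serializes the
full dataset.

```python
# Minimal sketch of a paged Atom feed (RFC 5005 "prev-archive" linking).
# All URLs and field names are hypothetical placeholders.
from xml.sax.saxutils import escape

def build_feed_page(entries, prev_archive_url=None):
    """Render a minimal Atom page holding only the newest entries."""
    links = ['<link rel="self" href="https://example.com/feed"/>']
    if prev_archive_url:
        # Pointer back to the older, already-cached archive document.
        links.append('<link rel="prev-archive" href="%s"/>' % prev_archive_url)
    items = ''.join(
        '<entry><id>%s</id><title>%s</title></entry>'
        % (e['id'], escape(e['title']))
        for e in entries
    )
    return ('<?xml version="1.0"?>'
            '<feed xmlns="http://www.w3.org/2005/Atom">%s%s</feed>'
            % (''.join(links), items))

# The "current" page carries one new entry and links to the older archive.
page = build_feed_page(
    [{'id': 'tag:example.com,2009:42', 'title': 'latest status'}],
    prev_archive_url='https://example.com/feed?page=2',
)
```

A consumer that already has the archive pages only ever fetches this small
current page, which is the efficiency claim being made above.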
> So in essence, Twitter only needs to retrieve the latest (cached) data and
> return it to the user or hub, and it can do so over a persistent connection,
> multiple HTTP requests at a time.  And of course this doesn't take into account the
> biggest advantage of PSHB - the hub.  PSHB is built to be distributed.  I
> know Twitter doesn't want to go there, but if they wanted to they could
> allow other authorized hubs to distribute the load of such data, and only
> the hubs would fetch data from Twitter, significantly reducing the load for
> Twitter regardless of the size of request and ensuring a) users own their
> data in a publicly owned format, and b) if Twitter ever goes down the
> content is still available via the API.  IMO this is the only way Twitter
> will become a "utility" as Jack Dorsey wants it to be.
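For reference, the hub relationship described above starts with a simple
subscription handshake (per the PubSubHubbub spec): the subscriber sends a
form-encoded POST to the hub naming the topic feed and a callback URL. This is
a sketch only; the URLs below are placeholders, and the request is built but
not sent.

```python
# Hedged sketch of a PSHB subscribe request (PubSubHubbub 0.x parameters).
# Hub, topic, and callback URLs are hypothetical examples.
from urllib.parse import urlencode

def subscription_request(hub_url, topic_url, callback_url):
    """Return the (url, body) pair for a hub subscribe POST."""
    body = urlencode({
        'hub.mode': 'subscribe',       # or 'unsubscribe'
        'hub.topic': topic_url,        # the Atom feed being watched
        'hub.callback': callback_url,  # where the hub pushes new entries
        'hub.verify': 'async',         # hub confirms intent with a challenge
    })
    return hub_url, body

url, body = subscription_request(
    'https://hub.example.com/',
    'https://twitter.example.com/statuses/user.atom',
    'https://subscriber.example.com/callback',
)
```

Once subscribed, only the hub polls (or is pinged by) the publisher; every
downstream consumer gets pushed copies from the hub, which is the
load-shedding argument made above.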
> I would love to see Twitter adopt a more publicly accepted standard like
> this.  Or, if it's not meeting their needs, either create their own public
> standard and take the lead in open real-time stream standards, or join an
> existing one so the standards can be refined into a form a company like
> Twitter can handle.  I know it would make my coding much easier as more
> companies adopt these protocols - right now I'm stuck writing separate
> code for each one.
> Leaving the data retrieval in a closed, proprietary format benefits nobody.
> Jesse
> On Mon, Sep 7, 2009 at 7:52 AM, Dewald Pretorius <dpr...@gmail.com> wrote:
>>
>> SUP will not work for Twitter or any other service that deals with
>> very large data sets.
>>
>> In essence, a Twitter SUP feed would be one JSON array of all the
>> Twitter users who have posted a status update in the past 60 seconds.
>>
>> a) The SUP feed will consistently contain a few million array entries.
>>
>> b) To build that feed you must do a select against the tweets table,
>> which contains a few billion records, and extract all the user ids
>> with a tweet that has a published time greater than now() - 60. Good
>> luck asking any DB to do that kind of select once every 60 seconds.
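To make the objection concrete: a SUP (Simple Update Protocol) feed is
roughly a JSON object whose "updates" list pairs a feed identifier with an
update token, and the naive way to build it is the per-minute query sketched
below. Table and column names here are hypothetical.

```python
# Sketch of a SUP feed body and the naive query that would feed it.
import json
import time

# The objection above: this scans a multi-billion-row table once a minute.
NAIVE_SQL = """
SELECT DISTINCT user_id
FROM tweets
WHERE published > NOW() - INTERVAL 60 SECOND
"""

def sup_feed(user_ids, period=60):
    """Serialize recently-updated user ids in a SUP-like JSON shape."""
    now = int(time.time())
    return json.dumps({
        'period': period,
        'since_time': now - period,
        # One [feed-id, update-token] pair per user who posted this period.
        'updates': [[str(uid), str(now)] for uid in user_ids],
    })

feed = sup_feed([101, 102, 103])
```

In practice a publisher would append to an in-memory buffer as updates arrive
rather than re-querying the main table, but at Twitter's write volume even
the resulting feed would hold millions of entries per period, which is the
scale problem being raised.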
>>
>> Dewald
>
>