Actually PSHB does use fat pings - subscribers are POSTed the delta of any feed when a Publisher notifies a Hub that a change has taken place.
As to efficiency, I think the caching mechanism is slightly off track. Serving a cached delta feed and sending a delta feed (which presumably is generated just once) are fairly equivalent. There are, however, differences in the backend request serving: whether the servicing is done using a proxy or a full application. In most cases, I'd assume the second is offloaded as much as possible to a basic task and not run through an application proper, so as to lower the request cost.

To your second point, Subscribers should never synchronously process updates. They should be dumped immediately to a job queue for asynchronous processing. This will help spread the processing load more evenly over time instead of it being clumped together, which I gather is what you're against. So it's: receive update, verify it is an update (input validation), dump update to queue, and respond with a 200 code.

So, I think overall it's still quite an efficient system. The main thing is making sure each party is being efficient about it, which is, of course, an implementation point the specification won't be commenting on. I think this will be the biggest mental block over time - web developers are pretty bad at thinking asynchronously ;).

Paddy

Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
OpenID Europe Foundation Irish Representative

________________________________
From: Alexis Richardson <[email protected]>
To: [email protected]
Sent: Mon, October 26, 2009 4:29:43 PM
Subject: [pubsubhubbub] Re: Are fat pings efficient?

Alex

PSHB is not using fat pings. There are use cases for fat pings that are under discussion, but fat pings are not in the spec at this time.

alexis

On Mon, Oct 26, 2009 at 4:12 PM, Alex Barth <[email protected]> wrote:
>
> I am *very* excited about the pubsubhubbub work I'm seeing. I consider
> making it a mainstay of our aggregation infrastructure.
>
> Reading the spec and some of the issues on project page, my main
> question is:
>
> Why does PuSH POST the entire feed to subscribers?
>
> To me it would seem more efficient that the hub exposes the updated
> feed on a URL and then POSTs only this URL to the subscribers. The
> subscribers would then GET the feed from the hub.
>
> The amount of data to be posted would be a fraction, the updated feed
> hosted by the hub could be cached with a reverse proxy like Varnish or
> Squid. Subscribers could queue URLs neatly, then work them off
> asynchronously.
>
> Further, allowing POSTing a URL where updated data can be fetched
> would open Pubsubhubbub to be applied in fields where the data feeds
> are large (look at http://data.gov).
>
> What are the reasons behind the design decision on PuSH posting fat
> pings? Is there an option to post light pings that I am overlooking?
> Are there threads I should be reading up?
>
> Alex
>
> --
> I'm one of the geeks at http://developmentseed.org and as such I do a
> lot of work with aggregation for news tracking and Open Data in
> Drupal. Recently we launched an open source news tracker called
> Managing News http://managingnews.com. I maintain and have helped
> maintain 3 aggregators for Drupal (e. g. http://drupal.org/project/feedapi
> and its reincarnation: http://drupal.org/project/feeds).
>
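
For reference, the subscriber flow described in the reply at the top of this thread (receive update, validate it, dump it to a queue, respond with a 200) could look roughly like the Python sketch below. Everything in it is an assumption made for illustration and is not taken from the spec or from any existing subscriber: the names handle_notification, looks_like_feed and drain are hypothetical, the in-process queue stands in for a real job queue, and answering 400 to a rejected body is just one possible choice.

```python
import queue
import xml.etree.ElementTree as ET

# Root element of an Atom feed, in ElementTree's {namespace}tag notation.
ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"


def looks_like_feed(body: bytes) -> bool:
    """Cheap input validation: the body must parse as XML with an Atom or RSS root.

    Note: this parses untrusted XML with the standard library parser; a real
    subscriber would probably want a hardened parser here.
    """
    if not body:
        return False
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag in (ATOM_FEED_TAG, "rss")


def handle_notification(body: bytes, jobs: queue.Queue) -> int:
    """Hypothetical callback handler: validate, enqueue, and return an HTTP status.

    No feed processing happens here, so the hub gets its answer quickly.
    """
    if not looks_like_feed(body):
        return 400  # assumed status for rejected input
    jobs.put(body)  # defer the real work to a worker
    return 200


def drain(jobs: queue.Queue, process) -> int:
    """Worker side: process everything currently queued and return the count."""
    handled = 0
    while True:
        try:
            body = jobs.get_nowait()
        except queue.Empty:
            return handled
        process(body)
        jobs.task_done()
        handled += 1
```

In a real deployment the queue would more likely be an external job queue and drain would run in a separate worker process, so that the cost of parsing and storing entries is spread over time rather than paid inside the HTTP request.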
