I am *very* excited about the PubSubHubbub work I'm seeing. I am considering making it a mainstay of our aggregation infrastructure.
Reading the spec and some of the issues on the project page, my main question is: why does PuSH POST the entire feed to subscribers?

To me it would seem more efficient for the hub to expose the updated feed at a URL and POST only that URL to the subscribers. The subscribers would then GET the feed from the hub. The amount of data POSTed would be a fraction of what it is now, and the updated feed hosted by the hub could be cached with a reverse proxy like Varnish or Squid. Subscribers could queue URLs neatly, then work them off asynchronously. Further, allowing the hub to POST a URL where the updated data can be fetched would open PubSubHubbub up to fields where the data feeds are large (look at http://data.gov).

What are the reasons behind the design decision to have PuSH POST fat pings? Is there an option to post light pings that I am overlooking? Are there threads I should be reading up on?

Alex

--
I'm one of the geeks at http://developmentseed.org, and as such I do a lot of work with aggregation for news tracking and Open Data in Drupal. Recently we launched an open source news tracker called Managing News (http://managingnews.com). I maintain or have helped maintain 3 aggregators for Drupal (e.g. http://drupal.org/project/feedapi and its reincarnation, http://drupal.org/project/feeds).
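
P.S. To make the light-ping idea concrete, here is a rough Python sketch of how I imagine the subscriber side. It is only an illustration under my own assumptions: as far as I can tell, light pings are not part of the spec, and the names handle_light_ping, fetch_feed and work_off_queue (as well as the hub.example.org URL in the usage note) are made up for this example.

```python
import queue
import urllib.request

# Hypothetical light-ping handling; none of this is defined by the PubSubHubbub spec.
pending = queue.Queue()


def handle_light_ping(body: str) -> None:
    """Called when the hub POSTs a light ping whose body is just the feed URL."""
    url = body.strip()
    # Only queue things that look like HTTP(S) URLs; ignore anything else.
    if url.startswith(("http://", "https://")):
        pending.put(url)


def fetch_feed(url: str) -> bytes:
    """GET the updated feed from the hub (or from a cache in front of it)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def work_off_queue(fetch=fetch_feed) -> list:
    """Drain the queue, fetching each feed; run this asynchronously (cron, worker)."""
    feeds = []
    while True:
        try:
            url = pending.get_nowait()
        except queue.Empty:
            break
        feeds.append((url, fetch(url)))
    return feeds
```

The POST handler would only call handle_light_ping("http://hub.example.org/feeds/123") and return immediately; a separate worker would call work_off_queue() later, so a slow or large feed never blocks the hub's request.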
