Hey all,

I wanted to ping on this thread. Has anyone had a chance to review this proposal? Does it sound sane? Would it help anyone out there deal with the load they're seeing from various Track APIs (like Superfeedr track)?
Thanks,
-Brett

On Wed, Oct 6, 2010 at 7:45 PM, Brett Slatkin <[email protected]> wrote:
> Hey all,
>
> Had some ideas I've been kicking around in discussions with various
> folks. Would love some early feedback.
>
>
> == Background
>
> The plan right now is to ditch the "Aggregated Content Distribution"
> section of the spec (see
> http://code.google.com/p/pubsubhubbub/issues/detail?id=105). There is
> a variety of issues with it and it's never been deployed. However, I
> believe there is still a need for efficient aggregated delivery that
> follows from Bob Wyman's ideas about content filtering
> (http://groups.google.com/group/pubsubhubbub/msg/820f7f29b7c22d46).
>
> Take the Google Buzz Track API for example
> (http://code.google.com/apis/buzz/v1/using_rest.html#activity-track).
> Let's say you have these two Track subscriptions registered (both
> PubSubHubbub topics):
>
>   https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo
>   https://www.googleapis.com/buzz/v1/activities/track?q=Baggins
>
> An item comes through that matches both terms (a post with the author
> "Bilbo Baggins"). Your PuSH subscriber will receive *two* copies of
> that message, one for each subscription, each on a different callback
> URL that was registered when you set up the PuSH subscription. This
> gets much worse as the number of Track queries and potential overlaps
> increases; it's *especially* awful for geographic queries which
> intrinsically overlap.
>
> Bob's solution is to deliver a single copy of the "Bilbo Baggins" post
> but annotate it with *which queries* it matched. I like this idea, but
> I want to 1) change how we express the annotation, 2) make it easy for
> existing clients to migrate to the new scheme, 3) not add any new
> parameters (e.g., "hub.filter") to the PuSH protocol.
>
>
> == The Proposal
>
> PubSubHubbub-enabled feeds will declare a new aggregation relation
> ("http://pubsubhubbub.org/aggregation"). The "href" is picked by the
> publisher and is a statement of "things with this aggregation URL I
> can batch together into aggregated delivery." For example, with the
> Buzz Track API feeds we could do:
>
>   <feed>
>     <link rel="self"
>       href="https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo"/>
>     <link rel="http://pubsubhubbub.org/aggregation"
>       href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>     ...
>   </feed>
>
> Subscribers would see this new "rel" link and know that they could
> subscribe to that new topic
> ("https://www.googleapis.com/buzz/v1/activities/combined") to get
> aggregated delivery. What does it mean to get aggregated delivery?
> Essentially, *all* of the subscriber's existing subscriptions with
> that same "aggregation" link value would *STOP* delivering, and
> instead the subscriber would get POSTs on a *single* callback that
> look like this:
>
>   POST /my-aggregated-callback HTTP/1.1
>   Link: <https://www.googleapis.com/buzz/v1/activities/combined>;
>       rel="http://pubsubhubbub.org/aggregation",
>     <https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo>;
>       rel="self",
>     <https://www.googleapis.com/buzz/v1/activities/track?q=Baggins>;
>       rel="self"
>   X-Hub-Signature: ...
>
>   <feed>
>     <link rel="self"
>       href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>     <link rel="http://pubsubhubbub.org/aggregation"
>       href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>     ...
>   </feed>
>
> Thus you will only get one copy of each item. The list of queries
> matched will be in the Link header so users know why they're getting
> the item.
>
> This proposal would fundamentally decouple subscription verification
> from event delivery. If the subscriber adds a new PuSH subscription
> with the same "aggregation" link value, non-obviously it will use the
> normal callback URL for PuSH verification but send all content
> delivery to the aggregated callback. Unsubscription will also use a
> separate callback URL for verification. If the subscriber unsubscribes
> from the aggregation URL, then all of the subscriptions will revert
> back to the old way of doing things.
>
>
> == Open questions
>
> Random list of questions:
>
> * What granularity do you use to move the existing subscriptions to
>   the aggregated endpoint? Does the publisher do it by domain, by URL
>   prefix, by some other token?
> * Should the "self" links in the aggregated delivery be for the feeds
>   you subscribed to, or should you instead pass through the callback
>   URLs that *would* have been used for normal delivery? The latter
>   approach could be useful for subscribers who put context data into
>   their callback URLs.
> * Will this allow us to finally put the Topic header in content
>   delivery as users have requested a million times?
>   (http://code.google.com/p/pubsubhubbub/issues/detail?id=79)
> * Can this scheme be reused for aggregated delivery across different
>   sites, so subscribers get fewer POSTs?
>
>
> Thanks for reading!
>
> -Brett
>
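
To make the subscriber side of the quoted proposal a bit more concrete, here is a rough Python sketch of reading the Link header on an aggregated delivery to recover which subscribed topics an item matched. Everything beyond the header format shown in the proposal is an assumption for illustration: the function name `parse_aggregated_links` is hypothetical, and the regex only handles the simple `<url>; rel="value"` shape from the example, not the full Link header grammar.

```python
import re

# The relation name comes from the proposal; everything else is illustrative.
AGGREGATION_REL = "http://pubsubhubbub.org/aggregation"

# Matches one `<url>; rel="value"` entry. Deliberately minimal: it ignores
# other link parameters and unquoted rel values.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_aggregated_links(link_header):
    """Return (aggregation_url, matched_topics) from a Link header value.

    Hypothetical helper: the aggregation URL is the link with the
    aggregation rel, and each rel="self" link is a topic that matched.
    """
    aggregation_url = None
    matched_topics = []
    for url, rel in _LINK_RE.findall(link_header):
        if rel == AGGREGATION_REL:
            aggregation_url = url
        elif rel == "self":
            matched_topics.append(url)
    return aggregation_url, matched_topics


# Header value from the example POST in the proposal, joined onto one line.
example_header = (
    '<https://www.googleapis.com/buzz/v1/activities/combined>; '
    'rel="http://pubsubhubbub.org/aggregation", '
    '<https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo>; rel="self", '
    '<https://www.googleapis.com/buzz/v1/activities/track?q=Baggins>; rel="self"'
)
aggregation, topics = parse_aggregated_links(example_header)
```

With the example header, `aggregation` is the combined URL and `topics` lists the Bilbo and Baggins track feeds, so a subscriber could process the item once and still attribute it to both queries. A real subscriber would probably want a proper Link header parser rather than this regex.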
