Hey Brett, I will review it this week. I hope I can give you some feedback.
Kang

On Tue, Nov 2, 2010 at 5:49 AM, Brett Slatkin <[email protected]> wrote:
> Hey all,
>
> I wanted to ping on this thread. Has anyone had a chance to review
> this proposal? Does it sound sane? Would it help anyone out there deal
> with the load they're seeing from various Track APIs (like Superfeedr
> track?).
>
> Thanks,
>
> -Brett
>
> On Wed, Oct 6, 2010 at 7:45 PM, Brett Slatkin <[email protected]> wrote:
> > Hey all,
> >
> > Had some ideas I've been kicking around in discussions with various
> > folks. Would love some early feedback.
> >
> >
> > == Background
> >
> > The plan right now is to ditch the "Aggregated Content Distribution"
> > section of the spec (see
> > http://code.google.com/p/pubsubhubbub/issues/detail?id=105). There is
> > a variety of issues with it and it's never been deployed. However, I
> > believe there is still a need for efficient aggregated delivery that
> > follows from Bob Wyman's ideas about content filtering
> > (http://groups.google.com/group/pubsubhubbub/msg/820f7f29b7c22d46).
> >
> > Take the Google Buzz Track API for example
> > (http://code.google.com/apis/buzz/v1/using_rest.html#activity-track).
> > Let's say you have these two Track subscriptions registered (both
> > PubSubHubbub topics):
> >
> > https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo
> > https://www.googleapis.com/buzz/v1/activities/track?q=Baggins
> >
> > An item comes through that matches both terms (a post with the author
> > "Bilbo Baggins"). Your PuSH subscriber will receive *two* copies of
> > that message, one for each subscription, each on a different callback
> > URL that was registered when you setup the PuSH subscription. This
> > gets much worse as the number of Track queries and potential overlaps
> > increases; it's *especially* awful for geographic queries which
> > intrinsically overlap.
> >
> > Bob's solution is to deliver a single copy of the "Bilbo Baggins" post
> > but annotate it with *which queries* it matched. I like this idea, but
> > I want to 1) change how we express the annotation, 2) make it easy for
> > existing clients to migrate to the new scheme, 3) not add any new
> > parameters (e.g., "hub.filter") to the PuSH protocol.
> >
> >
> > == The Proposal
> >
> > PubSubHubbub-enabled feeds will declare a new aggregation relation
> > ("http://pubsubhubbub.org/aggregation"). The "href" is picked by the
> > publisher and is a statement of "things with this aggregation URL I
> > can batch together into aggregated delivery." For example, with the
> > Buzz Track API feeds we could do:
> >
> > <feed>
> >   <link rel="self"
> >         href="https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo"/>
> >   <link rel="http://pubsubhubbub.org/aggregation"
> >         href="https://www.googleapis.com/buzz/v1/activities/combined"/>
> >   ...
> > </feed>
> >
> > Subscribers would see this new "rel" link and know that they could
> > subscribe to that new topic
> > ("https://www.googleapis.com/buzz/v1/activities/combined") to get
> > aggregated delivery. What does it mean to get aggregated delivery?
> > Essentially, *all* of the subscriber's existing subscriptions with
> > that same "aggregation" link value would *STOP* delivering, and
> > instead the subscriber would get POSTs on a *single* callback that
> > look like this:
> >
> > POST /my-aggregated-callback HTTP/1.1
> > Link: <https://www.googleapis.com/buzz/v1/activities/combined>;
> >   rel="http://pubsubhubbub.org/aggregation",
> >   <https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo>;
> >   rel="self",
> >   <https://www.googleapis.com/buzz/v1/activities/track?q=Baggins>;
> >   rel="self"
> > X-Hub-Signature: ...
> >
> > <feed>
> >   <link rel="self"
> >         href="https://www.googleapis.com/buzz/v1/activities/combined"/>
> >   <link rel="http://pubsubhubbub.org/aggregation"
> >         href="https://www.googleapis.com/buzz/v1/activities/combined"/>
> >   ...
> > </feed>
> >
> > Thus you will only get one copy of each item. The list of queries
> > matched will be in the Link header so users know why they're getting
> > the item.
> >
> > This proposal would fundamentally decouple subscription verification
> > from event delivery. If the subscriber adds a new PuSH subscription
> > with the same "aggregation" link value, non-obviously it will use the
> > normal callback URL for PuSH verification but send all content
> > delivery to the aggregated callback. Unsubscription will also use a
> > separate callback URL for verification. If the subscriber unsubscribes
> > from the aggregation URL, then all of the subscriptions will revert
> > back to the old way of doing things.
> >
> >
> > == Open questions
> >
> > Random list of questions:
> >
> > * What granularity do you use to move the existing subscriptions to
> > the aggregated endpoint? Does the publisher do it by domain, by URL
> > prefix, by some other token?
> > * Should the "self" links in the aggregated delivery be for the feeds
> > you subscribed to, or should you instead pass through the callback
> > URLs that *would* have been used for normal delivery? The latter
> > approach could be useful for subscribers who put context data into
> > their callback URLs.
> > * Will this allow us to finally put the Topic header in content
> > delivery as users have requested a million times?
> > (http://code.google.com/p/pubsubhubbub/issues/detail?id=79)
> > * Can this scheme be reused for aggregated delivery across different
> > sites, so subscribers get fewer POSTs?
> >
> >
> > Thanks for reading!
> >
> > -Brett
> >
>

-- 
Stay hungry, Stay foolish.
Twitter: http://twitter.com/lookon | Buzz: http://www.google.com/profiles/areyoulookon | Blog: http://throw-dice.appspot.com
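
For anyone following the quoted proposal, here is a rough sketch of how a subscriber might read the Link header on an aggregated delivery to find out which Track queries an item matched. It is only an illustration under stated assumptions: the function names (parse_link_header, matched_topics) are made up for this example, and the regular expression handles only the simple `<url>; rel="..."` form shown in Brett's sample POST, not the full Link header syntax.

```python
import re

# Relation name taken from the proposal.
AGGREGATION_REL = "http://pubsubhubbub.org/aggregation"

# Assumption: each link-value looks like <URL>; rel="..." with a quoted rel
# as its only parameter, as in the sample delivery. Real Link headers can
# carry more parameters and would need a fuller parser.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Return a list of (url, rel) pairs found in a Link header value."""
    return _LINK_RE.findall(value)


def matched_topics(link_header):
    """Split a Link header into (aggregation_url, [matched topic URLs]).

    Links with rel="self" are treated as the topics the item matched.
    """
    aggregation_url = None
    topics = []
    for url, rel in parse_link_header(link_header):
        if rel == AGGREGATION_REL:
            aggregation_url = url
        elif rel == "self":
            topics.append(url)
    return aggregation_url, topics


# The Link header from the sample aggregated delivery, unfolded to one value.
header = (
    '<https://www.googleapis.com/buzz/v1/activities/combined>; '
    'rel="http://pubsubhubbub.org/aggregation", '
    '<https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo>; '
    'rel="self", '
    '<https://www.googleapis.com/buzz/v1/activities/track?q=Baggins>; '
    'rel="self"'
)
aggregation_url, topics = matched_topics(header)
print(aggregation_url)
for topic in topics:
    print("matched:", topic)
```

With this, a subscriber could route the single delivered item to whatever handlers it had for each matched topic, instead of receiving one POST per subscription.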
