Hey all,

I wanted to ping on this thread. Has anyone had a chance to review
this proposal? Does it sound sane? Would it help anyone out there deal
with the load they're seeing from various Track APIs (like Superfeedr
track)?

Thanks,

-Brett

On Wed, Oct 6, 2010 at 7:45 PM, Brett Slatkin <[email protected]> wrote:
> Hey all,
>
> Had some ideas I've been kicking around in discussions with various
> folks. Would love some early feedback.
>
>
> == Background
>
> The plan right now is to ditch the "Aggregated Content Distribution"
> section of the spec (see
> http://code.google.com/p/pubsubhubbub/issues/detail?id=105). There are
> a variety of issues with it and it's never been deployed. However, I
> believe there is still a need for efficient aggregated delivery that
> follows from Bob Wyman's ideas about content filtering
> (http://groups.google.com/group/pubsubhubbub/msg/820f7f29b7c22d46).
>
> Take the Google Buzz Track API for example
> (http://code.google.com/apis/buzz/v1/using_rest.html#activity-track).
> Let's say you have these two Track subscriptions registered (both
> PubSubHubbub topics):
>
> https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo
> https://www.googleapis.com/buzz/v1/activities/track?q=Baggins
>
> An item comes through that matches both terms (a post with the author
> "Bilbo Baggins"). Your PuSH subscriber will receive *two* copies of
> that message, one for each subscription, each on a different callback
> URL that was registered when you set up the PuSH subscription. This
> gets much worse as the number of Track queries and potential overlaps
> increases; it's *especially* awful for geographic queries which
> intrinsically overlap.
>
> Bob's solution is to deliver a single copy of the "Bilbo Baggins" post
> but annotate it with *which queries* it matched. I like this idea, but
> I want to 1) change how we express the annotation, 2) make it easy for
> existing clients to migrate to the new scheme, 3) not add any new
> parameters (e.g., "hub.filter") to the PuSH protocol.
>
>
> == The Proposal
>
> PubSubHubbub-enabled feeds will declare a new aggregation relation
> ("http://pubsubhubbub.org/aggregation"). The "href" is picked by the
> publisher and is a statement of "things with this aggregation URL I
> can batch together into aggregated delivery." For example, with the
> Buzz Track API feeds we could do:
>
> <feed>
>  <link rel="self"
> href="https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo"/>
>  <link rel="http://pubsubhubbub.org/aggregation"
> href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>  ...
> </feed>
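
A rough sketch of how a subscriber might discover the aggregation link in a feed like the one above. The function name `find_aggregation_topic` is made up for illustration, and accepting both Atom-namespaced and bare `<link>` elements is my assumption, since the example feed is abbreviated:

```python
import xml.etree.ElementTree as ET

AGGREGATION_REL = "http://pubsubhubbub.org/aggregation"
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def find_aggregation_topic(feed_xml):
    """Return the href of the feed-level aggregation link, or None."""
    root = ET.fromstring(feed_xml)
    # Accept both namespaced Atom <link> elements and bare ones like
    # the abbreviated example above (an assumption, not part of the proposal).
    for tag in (ATOM_NS + "link", "link"):
        for link in root.findall(tag):
            if link.get("rel") == AGGREGATION_REL:
                return link.get("href")
    return None
```

A feed without the link returns None, so existing subscribers that never look for it would be unaffected.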
>
> Subscribers would see this new "rel" link and know that they could
> subscribe to that new topic
> ("https://www.googleapis.com/buzz/v1/activities/combined") to get
> aggregated delivery. What does it mean to get aggregated delivery?
> Essentially, *all* of the subscriber's existing subscriptions with
> that same "aggregation" link value would *STOP* delivering, and
> instead the subscriber would get POSTs on a *single* callback that
> look like this:
>
> POST /my-aggregated-callback HTTP/1.1
> Link: <https://www.googleapis.com/buzz/v1/activities/combined>;
> rel="http://pubsubhubbub.org/aggregation",
>        <https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo>;
> rel="self",
>        <https://www.googleapis.com/buzz/v1/activities/track?q=Baggins>;
> rel="self"
> X-Hub-Signature: ...
>
> <feed>
>  <link rel="self"
> href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>  <link rel="http://pubsubhubbub.org/aggregation"
> href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>  ...
> </feed>
>
> Thus you will only get one copy of each item. The list of queries
> matched will be in the Link header so users know why they're getting
> the item.
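
On the subscriber side, pulling the matched queries out of that Link header could look roughly like this. It is a sketch only: `matched_topics` is a hypothetical helper, and the regex handles just the `<url>; rel="..."` shape used in the example rather than the full Link header grammar:

```python
import re

AGGREGATION_REL = "http://pubsubhubbub.org/aggregation"

# Matches one `<url>; rel="value"` entry. Deliberately narrower than the
# full Link header grammar: other parameters and unquoted rels are ignored.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def matched_topics(link_header):
    """Split a Link header into (aggregation_url, [matched topic URLs])."""
    aggregation = None
    topics = []
    for url, rel in _LINK_RE.findall(link_header):
        if rel == AGGREGATION_REL:
            aggregation = url
        elif rel == "self":
            topics.append(url)
    return aggregation, topics
```

For the example POST above, this would give the "combined" URL as the aggregation value and the two Track URLs (q=Bilbo, q=Baggins) as the matched topics.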
>
> This proposal would fundamentally decouple subscription verification
> from event delivery. If the subscriber adds a new PuSH subscription
> with the same "aggregation" link value, the hub will (non-obviously)
> still use that subscription's own callback URL for PuSH verification
> but send all content delivery to the aggregated callback.
> Unsubscription will also be verified on that per-subscription callback
> URL rather than the aggregated one. If the subscriber unsubscribes
> from the aggregation URL, then all of the subscriptions will revert
> back to the old way of doing things.
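
To make the delivery rule concrete, here is a minimal sketch of how a hub might pick the delivery callback for one subscriber. The `Subscription` class and `delivery_plan` function are invented for illustration, and I'm assuming the combined feed declares itself as its own aggregation link, as in the example POST body. Verification would still go to each subscription's own `callback`; only content delivery is rerouted:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subscription:
    topic: str
    callback: str                      # own callback: verification, plus delivery when not aggregated
    aggregation: Optional[str] = None  # aggregation link value declared by the topic's feed


def delivery_plan(subs, matched):
    """Map callback URL -> list of matched topics for one item.

    subs: one subscriber's subscriptions; matched: set of topic URLs the item matched.
    """
    # Aggregation URLs this subscriber has subscribed to directly, with their callbacks.
    agg_callbacks = {s.topic: s.callback for s in subs if s.topic == s.aggregation}
    plan = {}
    for s in subs:
        if s.topic not in matched:
            continue
        # Reroute to the aggregated callback if one exists, else deliver as usual.
        target = agg_callbacks.get(s.aggregation, s.callback)
        plan.setdefault(target, []).append(s.topic)
    return plan
```

With the aggregation subscription present, both Track topics land on the one aggregated callback; drop it from `subs` and each topic goes back to its own callback, which is the "revert" behavior described above.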
>
>
> == Open questions
>
> Random list of questions:
>
> * What granularity do you use to move the existing subscriptions to
> the aggregated endpoint? Does the publisher do it by domain, by URL
> prefix, by some other token?
> * Should the "self" links in the aggregated delivery be for the feeds
> you subscribed to, or should you instead pass through the callback
> URLs that *would* have been used for normal delivery? The latter
> approach could be useful for subscribers who put context data into
> their callback URLs.
> * Will this allow us to finally put the Topic header in content
> delivery as users have requested a million times?
> (http://code.google.com/p/pubsubhubbub/issues/detail?id=79)
> * Can this scheme be reused for aggregated delivery across different
> sites, so subscribers get fewer POSTs?
>
>
> Thanks for reading!
>
> -Brett
>
