Re: [pubsubhubbub] Re: New proposal for PubSubHubbub aggregated delivery

Julien Genestoux Tue, 02 Nov 2010 04:02:19 -0700

Hey,

So, I reviewed it and I think it can be a good solution. We (Superfeedr)
currently don't have much of an issue with overlapping track queries.


One thing bugs me, though. I think it's *extremely convenient* to match
subscriptions using the callback URL rather than parsing the content to
extract and match URLs to see which feeds are concerned. With the current
proposal, I think we lose this... unless we use the callback URLs rather
than the matched queries:

POST /my-aggregated-callback HTTP/1.1
Link: <https://www.googleapis.com/buzz/v1/activities/combined>;
rel="http://pubsubhubbub.org/aggregation",
       <https://superfeedr.com/callbacks/12345>;
rel="self",
       <https://superfeedr.com/callbacks/12346>;
rel="self"
X-Hub-Signature: ...

where I used https://superfeedr.com/callbacks/12345 to subscribe to
https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo
and https://superfeedr.com/callbacks/12346 to subscribe to
https://www.googleapis.com/buzz/v1/activities/track?q=Baggins

This way we don't lose that, and we stay robust to topic URLs that differ
by trailing slashes, a missing www, https vs. http, etc.
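
To make that concrete, here is a minimal Python sketch of what the subscriber
side could look like if the rel="self" links carried callback URLs. The
`subscriptions` table and the function names are hypothetical, and the regex
only handles the simple `<url>; rel="..."` form shown above, not the full
Link header grammar.

```python
import re

# Matches one `<target>; rel="relation"` entry of a Link header value.
LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Return (target, rel) pairs in the order they appear in the header."""
    return LINK_ENTRY.findall(value)


def matched_topics(link_header, subscriptions):
    """Return the topics whose callback URLs appear as rel="self" links.

    `subscriptions` is a hypothetical subscriber-side table mapping each
    callback URL to the topic it was registered for.
    """
    topics = []
    for target, rel in parse_link_header(link_header):
        # An exact lookup on our own callback URL: no need to normalize
        # or compare topic URLs.
        if rel == "self" and target in subscriptions:
            topics.append(subscriptions[target])
    return topics


subscriptions = {
    "https://superfeedr.com/callbacks/12345":
        "https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo",
    "https://superfeedr.com/callbacks/12346":
        "https://www.googleapis.com/buzz/v1/activities/track?q=Baggins",
}

link_header = (
    '<https://www.googleapis.com/buzz/v1/activities/combined>; '
    'rel="http://pubsubhubbub.org/aggregation", '
    '<https://superfeedr.com/callbacks/12345>; rel="self", '
    '<https://superfeedr.com/callbacks/12346>; rel="self"'
)

print(matched_topics(link_header, subscriptions))
```

Since the subscriber generated the callback URLs itself, the lookup is a plain
dictionary hit and nothing depends on how the hub spells the topic URL.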

Also, and I'm putting on my implementer's hat here: I really think this spec
should contain only SHOULDs and no MUSTs.

Julien



On Tue, Nov 2, 2010 at 6:03 AM, Kang Lu <[email protected]> wrote:

> Hey Brett,
>
> I will review it this week. I hope I can give you some feedback.
>
> Kang
>
>
> On Tue, Nov 2, 2010 at 5:49 AM, Brett Slatkin <[email protected]> wrote:
>
>> Hey all,
>>
>> I wanted to ping on this thread. Has anyone had a chance to review
>> this proposal? Does it sound sane? Would it help anyone out there deal
>> with the load they're seeing from various Track APIs (like Superfeedr
>> track)?
>>
>> Thanks,
>>
>> -Brett
>>
>> On Wed, Oct 6, 2010 at 7:45 PM, Brett Slatkin <[email protected]> wrote:
>> > Hey all,
>> >
>> > Had some ideas I've been kicking around in discussions with various
>> > folks. Would love some early feedback.
>> >
>> >
>> > == Background
>> >
>> > The plan right now is to ditch the "Aggregated Content Distribution"
>> > section of the spec (see
>> > http://code.google.com/p/pubsubhubbub/issues/detail?id=105). There is
>> > a variety of issues with it and it's never been deployed. However, I
>> > believe there is still a need for efficient aggregated delivery that
>> > follows from Bob Wyman's ideas about content filtering
>> > (http://groups.google.com/group/pubsubhubbub/msg/820f7f29b7c22d46).
>> >
>> > Take the Google Buzz Track API for example
>> > (http://code.google.com/apis/buzz/v1/using_rest.html#activity-track).
>> > Let's say you have these two Track subscriptions registered (both
>> > PubSubHubbub topics):
>> >
>> > https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo
>> > https://www.googleapis.com/buzz/v1/activities/track?q=Baggins
>> >
>> > An item comes through that matches both terms (a post with the author
>> > "Bilbo Baggins"). Your PuSH subscriber will receive *two* copies of
>> > that message, one for each subscription, each on a different callback
>> > URL that was registered when you set up the PuSH subscription. This
>> > gets much worse as the number of Track queries and potential overlaps
>> > increases; it's *especially* awful for geographic queries which
>> > intrinsically overlap.
>> >
>> > Bob's solution is to deliver a single copy of the "Bilbo Baggins" post
>> > but annotate it with *which queries* it matched. I like this idea, but
>> > I want to 1) change how we express the annotation, 2) make it easy for
>> > existing clients to migrate to the new scheme, 3) not add any new
>> > parameters (e.g., "hub.filter") to the PuSH protocol.
>> >
>> >
>> > == The Proposal
>> >
>> > PubSubHubbub-enabled feeds will declare a new aggregation relation
>> > ("http://pubsubhubbub.org/aggregation"). The "href" is picked by the
>> > publisher and is a statement of "things with this aggregation URL I
>> > can batch together into aggregated delivery." For example, with the
>> > Buzz Track API feeds we could do:
>> >
>> > <feed>
>> >  <link rel="self"
>> > href="https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo"/>
>> >  <link rel="http://pubsubhubbub.org/aggregation"
>> > href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>> >  ...
>> > </feed>
>> >
>> > Subscribers would see this new "rel" link and know that they could
>> > subscribe to that new topic
>> > ("https://www.googleapis.com/buzz/v1/activities/combined") to get
>> > aggregated delivery. What does it mean to get aggregated delivery?
>> > Essentially, *all* of the subscriber's existing subscriptions with
>> > that same "aggregation" link value would *STOP* delivering, and
>> > instead the subscriber would get POSTs on a *single* callback that
>> > look like this:
>> >
>> > POST /my-aggregated-callback HTTP/1.1
>> > Link: <https://www.googleapis.com/buzz/v1/activities/combined>;
>> > rel="http://pubsubhubbub.org/aggregation",
>> >        <https://www.googleapis.com/buzz/v1/activities/track?q=Bilbo>;
>> > rel="self",
>> >        <https://www.googleapis.com/buzz/v1/activities/track?q=Baggins>;
>> > rel="self"
>> > X-Hub-Signature: ...
>> >
>> > <feed>
>> >  <link rel="self"
>> > href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>> >  <link rel="http://pubsubhubbub.org/aggregation"
>> > href="https://www.googleapis.com/buzz/v1/activities/combined"/>
>> >  ...
>> > </feed>
>> >
>> > Thus you will only get one copy of each item. The list of queries
>> > matched will be in the Link header so users know why they're getting
>> > the item.
>> >
>> > This proposal would fundamentally decouple subscription verification
>> > from event delivery. If the subscriber adds a new PuSH subscription
>> > with the same "aggregation" link value, non-obviously it will use the
>> > normal callback URL for PuSH verification but send all content
>> > delivery to the aggregated callback. Unsubscription will also use a
>> > separate callback URL for verification. If the subscriber unsubscribes
>> > from the aggregation URL, then all of the subscriptions will revert
>> > back to the old way of doing things.
>> >
>> >
>> > == Open questions
>> >
>> > Random list of questions:
>> >
>> > * What granularity do you use to move the existing subscriptions to
>> > the aggregated endpoint? Does the publisher do it by domain, by URL
>> > prefix, by some other token?
>> > * Should the "self" links in the aggregated delivery be for the feeds
>> > you subscribed to, or should you instead pass through the callback
>> > URLs that *would* have been used for normal delivery? The latter
>> > approach could be useful for subscribers who put context data into
>> > their callback URLs.
>> > * Will this allow us to finally put the Topic header in content
>> > delivery as users have requested a million times?
>> > (http://code.google.com/p/pubsubhubbub/issues/detail?id=79)
>> > * Can this scheme be reused for aggregated delivery across different
>> > sites, so subscribers get fewer POSTs?
>> >
>> >
>> > Thanks for reading!
>> >
>> > -Brett
>> >
>>
>
>
>
> --
> Stay hungry,Stay foolish.
>
> Twitter: http://twitter.com/lookon | Buzz:
> http://www.google.com/profiles/areyoulookon | Blog:
> http://throw-dice.appspot.com
>
