Re: [pubsubhubbub] Options for firehoses and filtering

Bob Wyman Mon, 15 Mar 2010 13:38:19 -0700

An issue that isn't addressed in Brett's base note for this thread is that
of telling a subscriber *why* a particular entry has been delivered to them
by a hub. This is a problem that doesn't arise when you are only doing
"topic-based" or "pure firehose" subscriptions since all entries specify
their topic. The problem arises when you allow content-based, filtered
tracking subscriptions.


Imagine that I run a service that aggregates subscriptions for thousands of
users. I'm likely to want to create a variety of subscriptions that
"overlap" with each other. Thus, I might have subscriptions to "Britney
Spears," "Broccoli Spears", "spears", etc. Then, let's imagine that someone
publishes something that mentions that "Britney Spears ate some broccoli
spears"... How many entries should the hub return to the subscriber? Should
it be three entries (massive, wasteful duplication but simple...) or just
one entry that somehow identifies the three distinct subscriptions that this
particular message matched? (Massively more efficient, but would require a
means to report the subscription identifiers *outside* of the actual entry
returned. And, we'd have to decide if the query itself was the subscription
identifier, but that could waste bandwidth, or if we'd have a distinct
identifier, perhaps a fingerprint of the query?)
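
To make the single-entry option concrete, here is a minimal sketch of a hub matching one entry against several overlapping queries and delivering it once, tagged with a short fingerprint per matching subscription. Everything in it is an assumption for illustration: the word-based matcher, the truncated SHA-1 fingerprint, and the envelope with a `matched_subscriptions` field are hypothetical and not part of any PubSubHubbub spec.

```python
import hashlib
import re


def fingerprint(query):
    """Hypothetical subscription identifier: truncated SHA-1 of the normalized query."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]


def matches(query, text):
    """Naive content filter: every query term must appear as a word in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term in words for term in query.lower().split())


def build_delivery(entry_text, queries):
    """Return one envelope listing every matching subscription, or None if nothing matches.

    The identifiers travel outside the entry itself, so the entry is sent only once.
    """
    matched = [fingerprint(q) for q in queries if matches(q, entry_text)]
    if not matched:
        return None
    return {"matched_subscriptions": matched, "entry": entry_text}


if __name__ == "__main__":
    subscriptions = ["Britney Spears", "Broccoli Spears", "spears"]
    print(build_delivery("Britney Spears ate some broccoli spears", subscriptions))
```

The subscriber would need to keep its own map from fingerprint back to query, which is the bandwidth trade-off against sending the full query text with every delivery.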

Content-based, filtered feeds introduce a number of issues that are
distinctly different from those already dealt with in the topic-based PSHB
system.

bob wyman

On Sun, Mar 14, 2010 at 3:17 PM, Brett Slatkin <[email protected]> wrote:

> Hey all,
>
> Here's some rough notes that Julien and I came up with at SxSW this
> year to talk about the options for using virtual feeds (eg, firehoses,
> filtering, track, geo boundaries) with PubSubHubbub. We got some nice
> input from bradfitz, Eric Marcoullier (from Gnip), Ilya Grigorik (from
> postrank), and of course, Mr. Filtering himself, Bob Wyman.
>
> Please note that order in this doc is not significant at all, we just
> wanted to get the options out there. If you have any additional
> variants of these specific options or a whole new option let us know.
>
> Thanks in advance for your feedback!
>
> -Brett
>
> ---------------
>
> 1. Use XRD
>
> - [email protected] has a feed
> - Could also work on an arbitrary URI for a domain
> - Could also work on the Hub URL
>
> Do some WebFinger: find example.com/.well-known/host-meta
>
> contains:
> <link rel="http://pubsubhubbub.org/full-feed"
> href="http://buzz.google.com/full-feed"/>
>
> This full feed URL could be a link to subscribe to or it could be an
> HTML page that says how to get approval for the firehose. You could
> have a click-through ToS to accept some terms, generate a one-off
> firehose URL, charge money, whatever you want.
>
> Good things
> - No change to hubbub protocol
>
> Bad
> - Have to fetch/parse XRD for discovery
> - Per-feed basis, not per-hub, if the discovery is not on the hub URL
> (so custom domains would require firehose discovery every time; would
> also like for one domain to have multiple different hubs for
> syndication)
>
>
> 2. Link relation in the feed itself
>
> Put something like:
>
> <atom:link rel="supersauce" href="http://buzz.google.com/full-feed"/>
>
> In every feed produced by a publisher.
>
> Good:
> - No new discovery document
> - Exactly the same discovery flow except different link relation
>
> Bad:
> - Have to add this link relation to every feed doc
> - New features for additional relation types require publisher to
> change their feed yet again (so hub functionality is too tightly
> coupled with the publisher's feed, as opposed to delegation to the hub
> for discovering what the hub can do on behalf of the feed)
>
>
> 3. Verification request includes discovery information
>
> You find a feed, it has some hub urls, you subscribe and then you see
> on the verification request something like:
>
> hub.extension.fullfeed=http://example.com/full-feed
>
> And then you know that you could go back to the hub and subscribe to
> the full firehose.
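
As a rough illustration of option 3 from the subscriber's side, the sketch below pulls any `hub.extension.*` parameters out of the query string of a verification request. The `hub.extension.` prefix and the `fullfeed` name come from the proposal above; the callback URL, the function name, and the choice to keep only the first value of each parameter are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlsplit

EXTENSION_PREFIX = "hub.extension."


def extract_extensions(verification_url):
    """Return {extension name: value} for every hub.extension.* query parameter."""
    query = parse_qs(urlsplit(verification_url).query)
    return {
        key[len(EXTENSION_PREFIX):]: values[0]  # assumption: first value wins
        for key, values in query.items()
        if key.startswith(EXTENSION_PREFIX)
    }


if __name__ == "__main__":
    # Hypothetical verification request as received at the subscriber's callback.
    url = (
        "http://subscriber.example/callback"
        "?hub.mode=subscribe"
        "&hub.topic=http%3A%2F%2Fexample.com%2Ffeed"
        "&hub.challenge=abc123"
        "&hub.extension.fullfeed=http%3A%2F%2Fexample.com%2Ffull-feed"
    )
    print(extract_extensions(url))
```
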
>
> Could also use URI templating in here for doing specific kinds of
> filtering (using the templating spec
>
> http://bitworking.org/projects/URI-Templates/spec/draft-gregorio-uritemplate-03.html
> )
>
> hub.extension.filter=
> http://example.com/filter?params={{params}}&box={{lat/lon,lat/lon}}
>
> Another variant is these extra params could be in the headers of a
> notification request.
>
> Good:
> - Decouples hub functionality from feed publisher so hub can add new
> features without publisher changes
> - No extra queries or polling to find the extra features of the hub
>
> Bad:
> - Mixing verification and feature discovery is kinda weird (subscriber
> would presumably unsubscribe from the same feed once they found the
> firehose and that's kinda weird)
> - Not clear at all how this would work with authorization of the subscriber
> - Unclear if this should be part of the base spec or if we should wait
>
>
> 4. Fuck it
>
> Don't define it. Everyone does virtual feeds/filtering/firehose
> declaration a little differently and users just figure out how to use
> their favorite provider.
>
> Pros:
> - Simplify the spec by taking out aggregated delivery (which is kind
> of broken in the base spec right now anyways because we're overriding
> what atom:source is actually for)
>
> Cons:
> - Different providers may completely diverge
>
>
> 5. Like #1 except skip XRD and use a new mode
>
> Do a query on the hub URL like:
>
> http://example.com/hub?hub.mode=whatsup
>
> This returns a 302 or an HTML doc or something that some human needs
> to inspect to figure out what they can do with this hub, some of which
> may be programmatic.
>
