[pubsubhubbub] Options for firehoses and filtering

Brett Slatkin Sun, 14 Mar 2010 12:17:13 -0700

Hey all,

Here are some rough notes that Julien and I came up with at SxSW this
year on the options for using virtual feeds (e.g., firehoses,
filtering, track, geo boundaries) with PubSubHubbub. We got some nice
input from bradfitz, Eric Marcoullier (from Gnip), Ilya Grigorik (from
PostRank), and of course, Mr. Filtering himself, Bob Wyman.


Please note that the order in this doc is not significant at all; we
just wanted to get the options out there. If you have any additional
variants of these specific options, or a whole new option, let us know.

Thanks in advance for your feedback!

-Brett

---------------

1. Use XRD

- [email protected] has a feed
- Could also work on an arbitrary URI for a domain
- Could also work on the Hub URL

Do some WebFinger: find example.com/.well-known/host-meta

contains:
<link rel="http://pubsubhubbub.org/full-feed"
href="http://buzz.google.com/full-feed"/>

This full feed URL could be a link to subscribe to or it could be an
HTML page that says how to get approval for the firehose. You could
have a click-through ToS to accept some terms, generate a one-off
firehose URL, charge money, whatever you want.
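
A minimal sketch of the discovery step above, assuming host-meta is an
XRD document carrying the link shown. The sample document and the
function name find_full_feed are made up for illustration, and fetching
the document over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Relation URI taken from the example above; it is a proposal, not a
# registered link relation.
FULL_FEED_REL = "http://pubsubhubbub.org/full-feed"

# Hypothetical host-meta body, as it might be served from
# example.com/.well-known/host-meta.
SAMPLE_HOST_META = """<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <Link rel="http://pubsubhubbub.org/full-feed"
        href="http://buzz.google.com/full-feed"/>
</XRD>
"""


def find_full_feed(host_meta_xml, rel=FULL_FEED_REL):
    """Return the href of the first link with the given rel, or None."""
    root = ET.fromstring(host_meta_xml)
    for element in root.iter():
        # Drop any "{namespace}" prefix and compare case-insensitively,
        # since the example above writes <link> rather than XRD's <Link>.
        local_name = element.tag.rsplit("}", 1)[-1].lower()
        if local_name == "link" and element.get("rel") == rel:
            return element.get("href")
    return None


print(find_full_feed(SAMPLE_HOST_META))
```

Whatever comes back is only a URL; whether it is something to subscribe
to or an HTML page for a human to read still has to be worked out by
fetching it.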

Good things
- No change to hubbub protocol

Bad
- Have to fetch/parse XRD for discovery
- Discovery is per feed rather than per hub if the discovery document
is not on the hub URL (so custom domains would require firehose
discovery every time; we would also like one domain to be able to have
multiple different hubs for syndication)


2. Link relation in the feed itself

Put something like:

<atom:link rel="supersauce" href="http://buzz.google.com/full-feed"/>

In every feed produced by a publisher.
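
A rough sketch of what "the same discovery flow" could look like for a
subscriber, assuming an Atom feed that carries both the usual hub link
and the extra relation. The sample feed and the helper name
links_by_rel are hypothetical, and "supersauce" is just the placeholder
relation name used above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed document for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="hub" href="http://example.com/hub"/>
  <link rel="supersauce" href="http://buzz.google.com/full-feed"/>
</feed>
"""


def links_by_rel(feed_xml, rel):
    """Return the hrefs of all feed-level atom:link elements with this rel."""
    root = ET.fromstring(feed_xml)
    return [
        link.get("href")
        for link in root.findall(ATOM_NS + "link")
        if link.get("rel") == rel
    ]


# The same lookup serves both the hub and the firehose relation.
print(links_by_rel(SAMPLE_FEED, "hub"))
print(links_by_rel(SAMPLE_FEED, "supersauce"))
```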

Good:
- No new discovery document
- Exactly the same discovery flow except different link relation

Bad:
- Have to add this link relation to every feed doc
- New features for additional relation types require the publisher to
change their feed yet again (so hub functionality is too tightly
coupled with the publisher's feed, as opposed to delegation to the hub
for discovering what the hub can do on behalf of the feed)


3. Verification request includes discovery information

You find a feed, it has some hub urls, you subscribe and then you see
on the verification request something like:

hub.extension.fullfeed=http://example.com/full-feed

And then you know that you could go back to the hub and subscribe to
the full firehose.

Could also use URI templating in here for doing specific kinds of
filtering (using the templating spec
http://bitworking.org/projects/URI-Templates/spec/draft-gregorio-uritemplate-03.html)

hub.extension.filter=http://example.com/filter?params={{params}}&box={{lat/lon,lat/lon}}

Another variant is these extra params could be in the headers of a
notification request.
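
A sketch of how a subscriber's callback might pick these parameters out
of a verification request, assuming they arrive as ordinary query
parameters next to hub.mode, hub.topic and hub.challenge. The function
names are hypothetical, and the {{name}} placeholder handling is a
simplified stand-in for real URI template expansion, using a made-up
{{box}} variable rather than the lat/lon pair above.

```python
import re
from urllib.parse import parse_qsl, quote, urlencode

EXTENSION_PREFIX = "hub.extension."


def extension_params(query_string):
    """Collect hub.extension.* parameters from a verification query string."""
    found = {}
    for key, value in parse_qsl(query_string):
        if key.startswith(EXTENSION_PREFIX):
            found[key[len(EXTENSION_PREFIX):]] = value
    return found


def fill_template(template, values):
    """Replace each {{name}} placeholder with its percent-encoded value."""
    def substitute(match):
        return quote(values[match.group(1)], safe="")
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# Hypothetical verification query string, built here so it is
# correctly percent-encoded.
sample_query = urlencode({
    "hub.mode": "subscribe",
    "hub.topic": "http://example.com/feed",
    "hub.challenge": "abc123",
    "hub.extension.fullfeed": "http://example.com/full-feed",
    "hub.extension.filter":
        "http://example.com/filter?params={{params}}&box={{box}}",
})

extensions = extension_params(sample_query)
print(extensions["fullfeed"])
print(fill_template(
    extensions["filter"],
    {"params": "cats", "box": "37.7,-122.5,37.8,-122.3"},
))
```

The subscriber still has to echo hub.challenge as usual; the extension
values are just extra information it can act on afterwards.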

Good:
- Decouples hub functionality from feed publisher so hub can add new
features without publisher changes
- No extra queries or polling to find the extra features of the hub

Bad:
- Mixing verification and feature discovery is kinda weird (subscriber
would presumably unsubscribe from the same feed once they found the
firehose and that's kinda weird)
- Not clear at all how this would work with authorization of the subscriber
- Unclear if this should be part of the base spec or if we should wait


4. Fuck it

Don't define it. Everyone does virtual feeds/filtering/firehose
declaration a little differently and users just figure out how to use
their favorite provider.

Pros:
- Simplify the spec by taking out aggregated delivery (which is kind
of broken in the base spec right now anyways because we're overriding
what atom:source is actually for)

Cons:
- Different providers may completely diverge


5. Like #1 except skip XRD and use a new mode

Do a query on the hub URL like:

http://example.com/hub?hub.mode=whatsup

This returns a 302 or an HTML doc or something that some human needs
to inspect to figure out what they can do with this hub, some of which
may be programmatic.
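
A small sketch of building that query, assuming the hub URL may already
carry its own query parameters. The helper name capabilities_url is
hypothetical and "whatsup" is only the placeholder mode name from
above; the request itself and the handling of the 302 or HTML response
are left out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def capabilities_url(hub_url, mode="whatsup"):
    """Append hub.mode=<mode> to a hub URL, keeping any existing query."""
    parts = urlsplit(hub_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("hub.mode", mode))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(capabilities_url("http://example.com/hub"))
```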
