[pubsubhubbub] Fat publishing, general HTTP pub/sub, Hub-to-Hub comm

Jeff Lindsay Mon, 11 Jan 2010 12:20:18 -0800

These are all relevant to each other, and all need basically the same thing:
a way for hubs to accept verified fat updates.


Here is an abbreviated discussion I had with Brett a while back:

From Jeff to Brett:
===
[truncated]

The other content types... first, I don't think the idea of differential
updates should be coupled with content types. pub/sub is not intrinsically
about updates/diffs, but messages ... in the case of Atom/RSS feeds, it just
happens to make sense to send only new entries as those messages. Second,
the fact that PSHB fetches the feed document is also entirely coupled to feeds,
although it obviously simplifies the security model for publishing. The only
way to make this a generic enough pubsub protocol for it to be useful beyond
feeds is to figure out how to securely accept "fat pings" for publishing.

Now, my idea of "fat pings" may be different than yours. Really, what we're
talking about is a system that just multicasts POSTs. That's what every HTTP
pubsub system that's sprung up is doing at its core ... it's what I want and
NEED ... and yet Hubbub does not do this!

My ideal core PSHB spec looks like this:

-Sure, yes, it uses the Link header or whatever relevant "link" mechanism
for that resource for discovery.
-Big one: It allows the publisher to POST using whatever content type and
body to a given topic.
-The subscribers then get that data POSTed to them nearly just as it came
in, content type and everything.
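
A minimal sketch of that core behavior, assuming an in-memory hub with no real HTTP calls. The names `MulticastHub`, `Delivery`, `subscribe`, and `publish` are hypothetical and not part of any PSHB spec; the point is only that the content type and body pass through untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Delivery:
    """One outgoing POST the hub would make to a subscriber."""
    callback_url: str
    content_type: str
    body: bytes


@dataclass
class MulticastHub:
    # topic URL -> subscriber callback URLs (hypothetical in-memory store)
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, topic: str, callback_url: str) -> None:
        self.subscribers.setdefault(topic, []).append(callback_url)

    def publish(self, topic: str, content_type: str, body: bytes) -> list:
        """Build one delivery per subscriber, passing the payload through as-is."""
        return [
            Delivery(callback_url, content_type, body)
            for callback_url in self.subscribers.get(topic, [])
        ]
```

The hub never inspects the body here, so a JSON payload, an Atom entry, or a binary blob would all be relayed the same way.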

This means, it is entirely content type agnostic. It's literally, at its
core, a system that multicasts POSTs. Because that's what we need. The hard
part is the security model: letting these open hubs accept data that they
can trust comes from the publisher/owner of the topic. Right? Well,
here's how we do this:

The idea is to give the publisher a URL it can POST whatever it wants to. It
can't be the primary Hub endpoint because that requires params to
use--meaning it eats up the content body. What's more, the hub has to know
that content posted to it is coming from the trusted publisher/owner for a given
topic URL. So we are basically going to use a capability URL! I'm sure
people have mentioned these to you -- it's an unguessable URL that
represents authorized access. Usually capability systems use these URLs to
wrap another URL, but here we're just using it to bundle in an authorization
token into the request without using parameters.
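
As a rough illustration of the capability-URL idea, here is a sketch that assumes the hub keeps a simple token-to-topic table. `HUB_BASE`, `mint_capability_url`, and `topic_for` are made-up names for illustration; only `secrets.token_urlsafe` is a real standard-library call.

```python
import secrets

HUB_BASE = "https://hub.example.com/publish/"  # hypothetical hub address

# token -> topic URL; a real hub would use persistent storage
_capabilities = {}


def mint_capability_url(topic: str) -> str:
    """Create an unguessable URL that authorizes publishing to `topic`."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
    _capabilities[token] = topic
    return HUB_BASE + token


def topic_for(url: str):
    """Return the topic a capability URL authorizes, or None if unknown."""
    if not url.startswith(HUB_BASE):
        return None
    return _capabilities.get(url[len(HUB_BASE):])
```

Because the authorization lives in the URL path, the request body stays entirely free for the published content.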

But how do we get the publisher this URL? Well, as it stands a publisher
owns a URL that represents a topic. We keep this. Even if this URL is not a
feed and not the source of the content we're publishing ... the way we need
to think of it is primarily a unique string for a topic AND the address to
the publisher we know owns it.

We use this to let the publisher tell the hub where it should send this
secret capability URL. In this process, we also use a verification token so
that when the Hub sends this URL, the publisher knows it really came from that Hub.

Now, here's the story:

-Publisher owns a topic resource.
-Publisher wants to publish content regarding this topic resource to Hub.
-Publisher tells Hub it wants to publish "fat pings" (read: arbitrary
content payloads) for the topic resource. It gives Hub a secret token.
-Hub goes to the topic resource. It finds a Link header with
rel=give-me-a-post-url (for lack of a better name).
-Hub then POSTs to this new URL (new because if the topic resource IS a
feed, it's silly to accept POSTs on it -- also allows publishing agents!)
with a secret "capability URL" and the secret token the Publisher gave it.
-Given the secret token matches what it gave the Hub, Publisher now has a
URL it can POST to knowing Subscribers of his topic on that Hub will get
these POSTs.
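
The story above can be sketched in memory roughly as follows. This is an assumption-laden model, not a wire format: the `Publisher` and `Hub` classes, their method names, and the hub address are hypothetical, and the discovery step (fetching the topic and reading the Link header) is replaced by a direct method call.

```python
import hmac
import secrets


class Publisher:
    def __init__(self, topic: str):
        self.topic = topic
        self.verify_token = secrets.token_urlsafe(16)  # secret handed to the hub
        self.publish_url = None

    def receive_publish_url(self, capability_url: str, token: str) -> bool:
        """Called when the hub POSTs to the rel=give-me-a-post-url endpoint."""
        # Constant-time comparison; accept the URL only if the token matches.
        if not hmac.compare_digest(token, self.verify_token):
            return False
        self.publish_url = capability_url
        return True


class Hub:
    def __init__(self):
        self.capabilities = {}  # capability URL -> topic

    def register(self, publisher: Publisher, token: str) -> bool:
        """Publisher asked to fat-publish; echo its token back with a new URL."""
        cap = "https://hub.example.com/publish/" + secrets.token_urlsafe(32)
        # Stand-in for: fetch the topic, read the Link header, POST to that URL.
        if not publisher.receive_publish_url(cap, token):
            return False
        self.capabilities[cap] = publisher.topic
        return True
```

The token check is what lets the publisher reject a capability URL sent by anyone other than the hub it registered with.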

If you have this system, you could make the whole feed fetching and diffing
a separate system entirely. I'd leave that up to you, but I assume you'd
want to keep the existing "feed shortcut" interface to keep all the feed
people happy (and not entirely shatter their idea of what PSHB really is --
an actual pubsub messaging hub).

Thoughts?

-jeff

From Brett to Jeff, Brad:
===
+brad

Hey Jeff,

Sorry again for the delay. I think that for the content-type stuff we
are roughly on the same page, so I'm going to only focus on fat
pushing for this email.


Brad and I discussed an approach like this at some point. This
multi-step dance is one way to avoid auth problems for fat pushing
from publishers to hubs. Other arguments aside (e.g., why not just use
OAuth), the problem is simple: This approach requires the publisher to
register their feed with the hub.

We ruled out this option (for the simplest case) because requiring
registration of a topic adds an enormous burden to large publishers
and hubs. The 80% of feeds they publish that nobody ever cares to
subscribe to would be registered and pushed; the hub would drop
updates on the floor after tons of work had already been done. Why
bother at all?

The naive ping from the publisher to the hub ensures the cost of
uninteresting topics is as low as possible. The publisher does not
have to form the content body, invalidate any cache, or send any data
to the hub. The hub does not have to receive any data, parse any
content, or verify signatures.


Two thoughts that follow (which I'd like your feedback on):

1) Every company I've talked to wants to do fat pushing differently.
At Google we would use our own proto-buffer-RPCs, Facebook would use
Thrift, others would use XML-RPC or JSON-RPC. The reason is that each
company has tools for deploying, monitoring, and debugging these
protocols in their production environment. For that reason, it seems
that dictating the format for fat pushes from publisher to hubs is of
limited value.

2) I wonder: What is the benefit of fat pinging to a third-party hub
over running/building your own hub (i.e., publisher and hub integrated
as one)? You would need all of the complexity of a feed registration
system *and* a way to generate/send full payloads to the hub on a
particular capability URL. It strikes me that this level of complexity
is roughly equivalent to building and running a Hub. If you run your
own hub, you can fat ping however you want.


Hopefully when we dig into these questions we can clarify some
assumptions and get to the meat of the reasoning here. I think #2 will
be a more reasonable claim when there are robust, open-source Hubs
available. I hope that Pádraic's hub and your hub could eventually
fill this role (potentially with non-standard, shared secret-based
fat-pinging extensions like this one:
http://code.google.com/p/pubsubhubbub/source/browse/trunk/nonstandard/fat_publish.py
)

-Brett

From Jeff to Brett, Brad:
===
Hmm, I see your point. However, registering a topic requires only two bits:

- Pinging the hub to tell it you have something to publish
- Listening for the publish URL

Then you just push. I imagine *that's* where the trouble is for large
hubs--not the registration part (since it's not much different from regular
PSHB pings), but sending all those pushes that the hub will drop.

But that seems easy to solve. You just tell the hub to defer sending you the
publish URL until there is a subscribe request.
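
One way the deferral could look, sketched under the assumption that the hub tracks registrations and subscriptions in memory. `DeferringHub` and its methods are hypothetical names, and handing the URL to the publisher is reduced to storing it in a dict.

```python
import secrets


class DeferringHub:
    def __init__(self):
        self.pending = set()      # topics registered but not yet subscribed to
        self.publish_urls = {}    # topic -> issued capability URL
        self.subscribers = {}     # topic -> callback URLs

    def register(self, topic: str) -> None:
        """Publisher announces it can fat-publish; issue a URL only if wanted."""
        if self.subscribers.get(topic):
            self._issue(topic)
        else:
            self.pending.add(topic)

    def subscribe(self, topic: str, callback_url: str) -> None:
        self.subscribers.setdefault(topic, []).append(callback_url)
        if topic in self.pending:
            self.pending.discard(topic)
            self._issue(topic)

    def _issue(self, topic: str) -> None:
        # Stand-in for POSTing the capability URL to the publisher.
        if topic not in self.publish_urls:
            self.publish_urls[topic] = (
                "https://hub.example.com/publish/" + secrets.token_urlsafe(32)
            )
```

Until the first subscriber shows up, the publisher has no URL to push to, so nothing is sent for topics nobody cares about.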

I'll hold off on discussing the way you do fat pings (xml-rpc, json-rpc,
etc) ... because that seems silly. The short of it is: If everybody does it
differently, you do it at the lowest common denominator: simple HTTP.
That's what this is.

-jeff

-- 
Jeff Lindsay
http://webhooks.org -- Make the web more programmable
http://shdh.org -- A party for hackers and thinkers
http://tigdb.com -- Discover indie games
http://progrium.com -- More interesting things
