These are all relevant to each other, and all need basically the same thing: a way for Hubs to accept verified fat updates.
Here is an abbreviated discussion I had with Brett a while back:

From Jeff to Brett:
===

[truncated]

The other content types... First, I don't think the idea of differential updates should be coupled with content types. Pub/sub is not intrinsically about updates or diffs, but about messages. In the case of Atom/RSS feeds, it just happens to make sense to send only new entries as those messages.

Second, the fact that PSHB fetches the feed document is also entirely coupled to feeds, although it obviously simplifies the security model for publishing. The only way to make this a generic enough pubsub protocol to be useful beyond feeds is to figure out how to securely accept "fat pings" for publishing.

Now, my idea of "fat pings" may be different from yours. Really, what we're talking about is a system that just multicasts POSTs. That's what every HTTP pubsub system that has sprung up is doing at its core. It's what I want and NEED, and yet Hubbub does not do this!

My ideal core PSHB spec looks like this:
- Sure, yes, it uses the Link header (or whatever "link" mechanism is relevant for that resource) for discovery.
- Big one: it allows the publisher to POST using whatever content type and body to a given topic.
- The subscribers then get that data POSTed to them nearly just as it came in, content type and everything.

This means it is entirely content-type agnostic. It's literally, at its core, a system that multicasts POSTs. Because that's what we need.

The hard part is the security model: these open hubs have to accept data that they can trust comes from the publisher that owns the topic. Right? Well, here's how we do this. The idea is to give the publisher a URL it can POST whatever it wants to. It can't be the primary Hub endpoint, because using that endpoint requires params, which means they eat up the content body. What's more, the hub has to know that content posted to it is coming from the trusted publisher/owner of a given topic URL.
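The "system that multicasts POSTs" described above can be sketched in a few lines. This is a minimal, hypothetical in-memory sketch, not Hubbub's code or any specified API: the names `MulticastHub`, `subscribe`, `publish`, and `http_deliver` are illustrative assumptions, and the HTTP front end that would receive the publisher's POST is omitted. What it shows is that the hub never parses the body; it forwards the bytes and the original Content-Type unchanged to every subscriber callback.

```python
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Delivery callback: (subscriber callback URL, Content-Type, raw body) -> None.
Deliver = Callable[[str, str, bytes], None]


def http_deliver(callback_url: str, content_type: str, body: bytes) -> None:
    """POST the body to one subscriber, preserving the publisher's Content-Type."""
    request = urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()  # drain the response; HTTP errors raise urllib.error exceptions


@dataclass
class MulticastHub:
    """Hypothetical content-type-agnostic hub: it just multicasts POSTs."""

    deliver: Deliver = http_deliver
    subscribers: Dict[str, List[str]] = field(default_factory=dict)  # topic -> callbacks

    def subscribe(self, topic: str, callback_url: str) -> None:
        callbacks = self.subscribers.setdefault(topic, [])
        if callback_url not in callbacks:
            callbacks.append(callback_url)

    def publish(self, topic: str, content_type: str, body: bytes) -> int:
        """Forward the body untouched to every subscriber; return how many were sent."""
        callbacks = self.subscribers.get(topic, [])
        for callback_url in callbacks:
            self.deliver(callback_url, content_type, body)
        return len(callbacks)
```

Delivery is injected as a function so a test can record calls instead of making network requests. A real hub would also need retries and subscription verification, neither of which is shown here.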
So we are basically going to use a capability URL! I'm sure people have mentioned these to you: it's an unguessable URL that represents authorized access. Capability systems usually use these URLs to wrap another URL, but here we're just using one to bundle an authorization token into the request without using parameters.

But how do we get the publisher this URL? Well, as it stands, a publisher owns a URL that represents a topic. We keep this. Even if this URL is not a feed and not the source of the content we're publishing, we need to think of it primarily as a unique string for a topic AND as the address of the publisher we know owns it. We use it to let the publisher tell the hub where to send this secret capability URL. In this process we also use a verification token, so that when the Hub sends the publisher this URL, the publisher knows it really came from that hub.

Now, here's the story:
- Publisher owns a topic resource.
- Publisher wants to publish content regarding this topic resource to Hub.
- Publisher tells Hub it wants to publish "fat pings" (read: arbitrary content payloads) for the topic resource. It gives Hub a secret token.
- Hub goes to the topic resource. It finds a Link header with rel=give-me-a-post-url (for lack of a better name).
- Hub then POSTs to this new URL (new because if the topic resource IS a feed, it's silly to accept POSTs on it; this also allows publishing agents!) with a secret "capability URL" and the secret token the Publisher gave it.
- Given that the secret token matches what it gave the Hub, Publisher now has a URL it can POST to, knowing that Subscribers of its topic on that Hub will get these POSTs.

If you have this system, you could make the whole feed fetching and diffing a separate system entirely. I'd leave that up to you, but I assume you'd want to keep the existing "feed shortcut" interface to keep all the feed people happy (and not entirely shatter their idea of what PSHB really is: an actual pubsub messaging hub). Thoughts?
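The capability-URL handshake in the story above might look roughly like the following, under stated assumptions. The class and method names are hypothetical; discovery of the publisher's `rel=give-me-a-post-url` link (itself a placeholder name in the proposal) and the actual HTTP POSTs are replaced by a plain callback so the flow runs in memory. The hub mints an unguessable path segment with `secrets.token_urlsafe`, and the publisher accepts the capability URL only if the hub echoes back the secret token, compared with `hmac.compare_digest`.

```python
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Stand-in for "Hub POSTs to the publisher's give-me-a-post-url endpoint":
# (endpoint URL, capability URL, publisher's secret token) -> anything.
SendCapability = Callable[[str, str, str], object]


@dataclass
class Publisher:
    """Publisher side: remembers the secret token it gave the hub."""

    verify_token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    capability_url: Optional[str] = None

    def receive_capability(self, capability_url: str, token: str) -> bool:
        # Accept the URL only if the hub echoed our secret token back.
        if not hmac.compare_digest(token.encode(), self.verify_token.encode()):
            return False
        self.capability_url = capability_url
        return True


@dataclass
class CapabilityHub:
    """Hub side: mints unguessable publish URLs and maps them back to topics."""

    base_url: str
    send_capability: SendCapability
    capabilities: Dict[str, str] = field(default_factory=dict)  # secret -> topic

    def register(self, topic: str, token: str, post_url_endpoint: str) -> str:
        # post_url_endpoint is what the hub would discover from the topic
        # resource's Link header; that HTTP discovery step is omitted here.
        secret = secrets.token_urlsafe(32)
        self.capabilities[secret] = topic
        capability_url = f"{self.base_url}/publish/{secret}"
        self.send_capability(post_url_endpoint, capability_url, token)
        return capability_url

    def topic_for(self, capability_url: str) -> Optional[str]:
        """Return the topic a POST to this URL publishes to, or None if unknown."""
        prefix = f"{self.base_url}/publish/"
        if not capability_url.startswith(prefix):
            return None
        return self.capabilities.get(capability_url[len(prefix):])
```

Once the publisher holds the capability URL, a POST to it would be looked up with `topic_for` and then multicast to that topic's subscribers; anyone who cannot guess the secret segment gets `None`.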
-jeff

From Brett to Jeff, Brad:
===

+brad

Hey Jeff,

Sorry again for the delay. I think that for the content-type stuff we are roughly on the same page, so I'm going to focus only on fat pushing in this email.

Brad and I discussed an approach like this at some point. This multi-step dance is one way to avoid auth problems for fat pushing from publishers to hubs. Other arguments aside (e.g., why not just use OAuth), the problem is simple: this approach requires the publisher to register their feed with the hub. We ruled out this option (for the simplest case) because requiring registration of a topic adds an enormous burden to large publishers and hubs. The 80% of feeds they publish that nobody ever cares to subscribe to would be registered and pushed; the hub would drop those updates on the floor after tons of work had already been done. Why bother at all?

The naive ping from the publisher to the hub ensures the cost of uninteresting topics is as low as possible. The publisher does not have to form the content body, invalidate any cache, or send any data to the hub. The hub does not have to receive any data, parse any content, or verify signatures.

Two thoughts that follow (which I'd like your feedback on):

1) Every company I've talked to wants to do fat pushing differently. At Google we would use our own protocol buffer RPCs, Facebook would use Thrift, and others would use XML-RPC or JSON-RPC. The reason is that each company has tools for deploying, monitoring, and debugging these protocols in their production environment. For that reason, dictating the format for fat pushes from publishers to hubs seems to be of limited value.

2) I wonder: what is the benefit of fat pinging to a third-party hub over running or building your own hub (i.e., publisher and hub integrated as one)? You would need all of the complexity of a feed registration system *and* a way to generate and send full payloads to the hub on a particular capability URL.
It strikes me that this level of complexity is roughly equivalent to building and running a Hub. If you run your own hub, you can fat ping however you want.

Hopefully when we dig into these questions we can clarify some assumptions and get to the meat of the reasoning here. I think #2 will be a more reasonable claim when there are robust, open-source Hubs available. I hope that Pádraic's hub and your hub could eventually fill this role (potentially with non-standard, shared-secret-based fat-pinging extensions like this one: http://code.google.com/p/pubsubhubbub/source/browse/trunk/nonstandard/fat_publish.py).

-Brett

From Jeff to Brett, Brad:
===

Hmm, I see your point. However, registering a topic requires only two bits:
- Pinging the hub to tell it you have something to publish
- Listening for the publish URL

Then you just push. I imagine *that's* where the trouble is for large hubs: not the registration part (since it's not much different from regular PSHB pings), but sending all those pushes that the hub will drop. But that seems easy to solve. You just tell the hub to defer sending you the publish URL until there is a subscribe request.

I'll hold off on discussing the way you do fat pings (XML-RPC, JSON-RPC, etc.), because that seems silly. The short of it is: if everybody does it differently, you do it at the lowest common denominator, simple HTTP. That's what this is.

-jeff

--
Jeff Lindsay
http://webhooks.org -- Make the web more programmable
http://shdh.org -- A party for hackers and thinkers
http://tigdb.com -- Discover indie games
http://progrium.com -- More interesting things
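Jeff's suggestion to defer handing out the publish URL until a subscribe request arrives targets Brett's cost concern: registration stays, but a publisher never receives a publish URL (and so never pushes a payload) for a topic nobody has subscribed to. A minimal sketch under assumptions: `DeferredHub` and `issue_capability` are hypothetical names, and `issue_capability` stands in for the whole capability-URL handshake for one topic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class DeferredHub:
    """Hypothetical hub that hands out publish URLs only once a topic has subscribers."""

    # Stand-in for the whole capability-URL handshake for one topic.
    issue_capability: Callable[[str], object]
    pending: Set[str] = field(default_factory=set)  # registered, nobody subscribed yet
    issued: Set[str] = field(default_factory=set)  # publish URL already handed out
    subscribers: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, topic: str) -> None:
        """Publisher ping: 'I have something to publish for this topic.'"""
        if topic in self.issued:
            return
        if self.subscribers.get(topic):
            self._issue(topic)
        else:
            self.pending.add(topic)  # cheap: just remember the topic

    def subscribe(self, topic: str, callback_url: str) -> None:
        self.subscribers.setdefault(topic, []).append(callback_url)
        if topic in self.pending:
            # First interested subscriber: now it is worth issuing the publish URL.
            self.pending.discard(topic)
            self._issue(topic)

    def _issue(self, topic: str) -> None:
        self.issued.add(topic)
        self.issue_capability(topic)
```

For the 80% of topics with no subscribers, the hub's work stays at one set insertion per registration, which is close to the cost of the naive ping Brett describes.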
