Very cool, and I do get the point of topics being implemented as content filters. In my world, though, topics are still useful as a sort of primary key to shard and route more efficiently. Otherwise you just have a firehose that you always have to filter, which is harder to do -- or at least adds complexity (throw Storm in the loop, maybe). So there is a balance. I usually implement limited content filtering on top of a topic stream.
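To make "limited content filtering on top of a topic stream" concrete, here is a minimal sketch. It assumes messages are plain dicts with "topic" and "body" keys; TopicRouter, subscribe, publish, and the where argument are hypothetical names for illustration only, not part of PSHB or any library.

```python
from collections import defaultdict
from typing import Callable, Optional

Message = dict  # assumed shape: {"topic": str, "body": str}
Predicate = Callable[[Message], bool]


class TopicRouter:
    """Route by topic first, then apply an optional per-subscriber filter."""

    def __init__(self) -> None:
        # topic -> list of (callback, optional content predicate)
        self._subs = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Message], None],
                  where: Optional[Predicate] = None) -> None:
        self._subs[topic].append((callback, where))

    def publish(self, message: Message) -> int:
        """Deliver to matching subscribers and return how many received it."""
        delivered = 0
        # The topic acts as the routing key: only subscriptions registered
        # under this topic are examined, never the whole subscriber set.
        for callback, where in self._subs.get(message["topic"], []):
            if where is None or where(message):
                callback(message)
                delivered += 1
        return delivered


router = TopicRouter()
inbox_all, inbox_obama = [], []
feed = "http://example.com/feed"

# Plain topic subscription: the simple case stays simple.
router.subscribe(feed, inbox_all.append)
# Same topic plus a limited content filter layered on top.
router.subscribe(feed, inbox_obama.append,
                 where=lambda m: "obama" in m["body"].lower())

router.publish({"topic": feed, "body": "Obama speaks in Ohio"})
router.publish({"topic": feed, "body": "Weather report"})
router.publish({"topic": "http://other.example/feed", "body": "Obama again"})
```

After these three publishes, inbox_all holds both messages from the feed and inbox_obama holds only the first. The third message never reaches either subscriber because it is routed away by topic before any filter runs, which is the sharding benefit described above.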
Anyway, I still feel like topic-based is the way to go because it makes the simple case very simple, it's familiar, and you *can* do content filtering on top of it. It's also more compatible with the nature of HTTP (single resource oriented operations), which is where I think PSHB should live -- not so much in the content. If we were to do content-based filtering, that would imply moving into the content-space, which is why I understand your advocacy for Atom. I just don't think we should.

-jeff

On Wed, Nov 30, 2011 at 3:15 PM, Bob Wyman <[email protected]> wrote:

> To better demonstrate what I'm talking about (re: Topic-based v
> Content-based), I've put up a little demo. (Code speaks louder than
> words...) Check out: http://frackim.appspot.com/ . "FrackIM" allows you
> to "Follow" or "Track" messages which originate either from XMPP IM or from
> PubSubHubbub. Read the instructions <http://frackim.appspot.com/> and
> then add [email protected] to your buddy list.
>
> The software uses the AppEngine Prospective Search Service and delivers
> results using AppEngine's XMPP service. Topic-based behavior is implemented
> by subscribing to messages using a query that constrains the "follow"
> attribute of messages. This attribute contains either an XMPP JID or an
> HTTP URL (the URL is, of course, a PSHB "topic"). For example:
> /subscribe follow:huffingtonpost.com is like a topic-based subscription
> in PSHB today.
> To get "content-based" behavior, you create a subscription that constrains
> the "track" attribute of a message. Thus:
> /subscribe track:obama is content-based and will match any IM or PSHB
> message that contains the word "Obama".
> Of course, you can combine the two together like this:
> /subscribe follow:huffingtonpost.com AND track:obama which would return
> only messages published by HuffingtonPost that contain the word Obama.
>
> The point here is to demonstrate that a content-based system can implement
> topic-based as a degenerate, trivial case.
> (i.e. in this example, a "topic
> based" system would only support the "follow" attribute.) However, such a
> system is easily extended to handle more complex applications by simply
> allowing more fields to match against and by allowing a greater variety of
> query operators. In such a system, you never even consider building a
> "firehose" since you essentially start off with one to begin with.
>
> Give the toy a try and see what you think. Note: it is only subscribed to
> a small number of feeds -- mostly political content. So, subscriptions like
> "Obama" are more likely to work than geeky stuff like "prospective search."
> If you have some PSHB topics you'd like me to add, just send a note
> off-list.
>
> On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> wrote:
> > Btw, are you in the area (SF)? It would be interesting to discuss
> I work in NYC so it would be hard to meet up in SF any time soon. There is
> always email...
>
> bob wyman
>
> On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> wrote:
>
>> Okay, I better understand your position and perspective on this. Btw, are
>> you in the area (SF)? It would be interesting to discuss topic vs content
>> based subscriptions in person because I have thought/worked with it a lot,
>> but not in those terms.
>>
>> -jeff
>>
>> On Tue, Nov 29, 2011 at 2:06 PM, Bob Wyman <[email protected]> wrote:
>>
>>> On Mon, Nov 28, 2011 at 8:45 PM, Jeff Lindsay <[email protected]> wrote:
>>>
>>>>> The idea was that the hub should publish Atom entries and only Atom
>>>>> entries. Of course, the entries would contain atom.source elements to show
>>>>> the feeds with which they were associated. Also, the hub should do
>>>>> de-duping to ensure that any particular entry isn't sent more than once.
>>>>
>>>> Yeah, I get the reasoning behind Atom and I understand its more
>>>> general use.
>>>> The problem is that in order to make something useful and easy to
>>>> adopt, you need to really facilitate what people are already doing and are
>>>> familiar with. Not everybody wants to work with Atom, despite all its
>>>> benefits. Having Atom as a representation or as a possible payload is
>>>> great, but depending on its semantics, forcing it to be required for PSHB
>>>> to be useful is not a great idea... or at least not a pragmatic one IMO.
>>>>
>>>>> We could build all the above things very easily based on systems that
>>>>> publish Atom feeds and allow content-based (query-based) subscriptions.
>>>>
>>>> Call me crazy, but I'm in love with the Unix philosophy of doing one
>>>> thing well and designing for composition of more complex systems from
>>>> simple parts.
>>>>
>>> "Designing for composition of more complex systems from simple parts" is
>>> an excellent goal. The problem is that in order to facilitate composition,
>>> you must have some idea of what kinds of complex systems you're going to
>>> compose. Given the application domain under discussion (Publish/Subscribe
>>> even if some other name is used), the problem here is that we know from
>>> many long years of experience that it is difficult to build a content-based
>>> system on top of a topic-based system yet it is trivial to build a
>>> topic-based system on top of a content-based system. It is important where
>>> you start when designing systems. Things get path-dependent very quickly.
>>>
>>> The problem is that design decisions made to facilitate topic-based
>>> system construction tend to make harder the job of building content-based
>>> systems. Take, for example, the regular discussion of "firehoses" which are
>>> almost always a common subject of discussion with topic-based systems but
>>> are generally irrelevant when discussing content-based systems.
>>> A firehose,
>>> which adds complexity to the topic-based implementation, is almost always
>>> needed when people want to do any kind of content-based work on top of a
>>> topic-based system. (That can include either real-time filtering or dumping
>>> of data into a database for later "content-based" retrieval or searching.)
>>> A firehose is simply a mechanism to de-mux or merge together the many
>>> topic-based streams that were created in order to provide a topic-based
>>> subscription model. If you start with a topic-based system, you almost
>>> always need to construct firehoses in order to make content-based routing
>>> possible. On the other hand, if you start with a content-based system and
>>> have "topic" as an attribute of each published item, then it is trivial to
>>> create "topic" streams since they are simply single-attribute subscriptions
>>> keyed on the "topic" attribute.
>>>
>>> If you start with a content-based model but want topic-based, then
>>> instead of subscribing to topic "foobar" you assume that all published
>>> items have an attribute named "topic" and you subscribe to "topic =
>>> 'foobar'". A "topic-based" system is thus nothing more than the most simple
>>> use of a content-based system. Of course, the advantage of using a trivial
>>> content-based interface to emulate a topic-based system is that you can
>>> then easily expand the capability of the base system to support more
>>> complex filters or queries. You can go from just a single attribute and
>>> exact-match to allowing full Boolean expressions, etc. without making a
>>> significant change to the subscription interface -- the changes are only to
>>> the subscription query syntax and those changes can all produce proper
>>> supersets of the trivial syntax.
>>>
>>> What I wonder is what, if any, benefit comes from baking "topic-based"
>>> into the subscription interface?
>>> Given that the alternative provides such
>>> flexibility down the road, what significant advantage do you get from
>>> limiting the system's expressiveness up-front?
>>>
>>>> Queries and filters, to me, are out of the scope of this protocol,
>>>> despite being very useful.
>>>>
>>> If you see my reasoning in the paragraphs above, you won't be surprised
>>> that I claim that in order to build a topic-based system, you already need
>>> to build "Queries and Filters." The only difference is that if you build
>>> something like PSHB, you are building a very simple filter language that
>>> happens to be hard to extend. When people subscribe to topic
>>> "http://example.com/feed" it is EXACTLY the same, semantically, as
>>> subscribing using the query "topic = 'http://example.com/feed'"...
>>> There is no significant introduction of complexity that results from going
>>> from topic-based to content-based -- only a much easier path to doing more
>>> interesting things in the future. (i.e. "topic='http://example.com/feed' AND
>>> content='foobar'" is just a step away...)
>>>
>>>> The reason is that anybody can create a subscriber or relay (perhaps
>>>> even a hub) that happens to do that filtering in its implementation.
>>>>
>>> Yes, anyone can build yet another aggregator to either consume firehoses
>>> or construct them and then filter them. But, just because a thing can be
>>> done, doesn't mean that we should insist that it be done -- unless there is
>>> a good reason not to allow alternatives. In this case, I can't see that
>>> there are. Building the basic system using the model of a trivial
>>> content-based system doesn't make it any more difficult to build other hubs
>>> or relays that can do arbitrary processing, however, it gives us the option
>>> of allowing a single system, with a standard interface, to do both the
>>> simple and the complex work in an integrated and more efficient manner.
>>>
>>>> That said, I'm assuming this was more just to defend Atom and
>>>> content-based subscriptions, to which I would say: those examples should be
>>>> possible *if* you use Atom as your content container and have access to or
>>>> can build a subscription querier node. But it should also be possible if
>>>> the content is *not* Atom using the same approach of putting the filtering
>>>> in an intermediate node (or potentially being an implementation detail of a
>>>> hub).
>>>>
>>>> I just think the core should be simple and neutral, allowing more
>>>> specialized extensions, additions, and combinability. And for that, my
>>>> experience (and general observations) suggest that we should focus on
>>>> content-type neutral HTTP-based mechanisms.
>>>>
>>>> -jeff
>>>>
>>>>> bob wyman
>>>>>
>>>>> On Mon, Nov 28, 2011 at 6:33 PM, Julien Genestoux <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Jeff, do you think you could help getting the folks at GitHub,
>>>>>> Twilio, FreshBooks, Pusher to come in here and participate? What would they
>>>>>> love to see in and out of PubSubHubbub so that it fits their needs?
>>>>>>
>>>>>> Bob, that's an interesting point. You said you wanted PSHB to be
>>>>>> about entries rather than feeds. I'm not sure I understand this. I guess
>>>>>> you would still need to subscribe to an endpoint that would emit a
>>>>>> collection of entries, right?
>>>>>>
>>>>>> Julien
>>>>>>
>>>>>> On Tue, Nov 29, 2011 at 12:16 AM, Bob Wyman <[email protected]> wrote:
>>>>>>
>>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > PubSubHubbub is currently too
>>>>>>> > much oriented toward data feeds
>>>>>>> Personally, I think that PSHB "went wrong" when folk insisted that
>>>>>>> it support RSS instead of just Atom.
>>>>>>> In the Atom format we had gone to
>>>>>>> great trouble to ensure that "entry" was a top-level item and that entries
>>>>>>> had the same semantics whether they were inside feeds or on their own. (Not
>>>>>>> the case with RSS.) One of the reasons that I worked to make this the case
>>>>>>> was that I've been wanting to do pubsub with arbitrary content for many
>>>>>>> years... The idea was that an Atom entry is a reasonable wrapper or
>>>>>>> container for just about any content you might want to publish. (MIME types
>>>>>>> distinguish the content type.) Thus, a system for syndicating Atom entries
>>>>>>> could be used to reasonably syndicate just about anything. But, when
>>>>>>> support for RSS feeds came into the PSHB spec, all sorts of things got
>>>>>>> confused... PSHB should have been about the entries, not the feeds...
>>>>>>>
>>>>>>> bob wyman
>>>>>>>
>>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien <[email protected]> wrote:
>>>>>>>
>>>>>>>> Jeff, thanks for sharing so quickly :)
>>>>>>>> I perfectly agree and acknowledge that PubSubHubbub is currently too
>>>>>>>> much oriented toward data feeds, and content in general, while it's
>>>>>>>> just a sub-case.
>>>>>>>> I also think the "realtime" aspect of things doesn't matter that much,
>>>>>>>> and is just a consequence of the "push" design. When you trigger
>>>>>>>> events, there is no reason to do it later rather than sooner.
>>>>>>>>
>>>>>>>> The spec should evolve into something that works as well for events as
>>>>>>>> for content.
>>>>>>>> It should be "subscribe to a web resource, get events". [this can be
>>>>>>>> decorated in any way people want to work with feeds, with publisher/
>>>>>>>> hubs merged or distinct, with no data... etc.]
>>>>>>>>
>>>>>>>> Julien
>>>>>>>>
>>>>>>>> On Nov 28, 11:21 pm, Jeff Lindsay <[email protected]> wrote:
>>>>>>>> > On Mon, Nov 28, 2011 at 2:02 PM, Julien Genestoux <
>>>>>>>> >
>>>>>>>> > [email protected]> wrote:
>>>>>>>> > > Jeff, please do share your feelings. Help us make PubSubHubbub better!
>>>>>>>> > > Bob, obviously PubSubHubbub should be less about blogging and/or news. I
>>>>>>>> > > started a thread about supporting any kind of arbitrary data, and this is
>>>>>>>> > > what I had in mind as a way to support any kind of content, and any type of
>>>>>>>> > > updates (with or without payload).
>>>>>>>> >
>>>>>>>> > To this point, my main feeling is that, yes, PSHB is focused too much on
>>>>>>>> > content. While I think this is useful (as it's been the primary use case),
>>>>>>>> > it's not a wide enough net to really have critical mass as a project. I
>>>>>>>> > originally thought it was good that it was very focused and didn't solve
>>>>>>>> > *my* particular problems. I also thought it was good it focused on a
>>>>>>>> > tangible goal of making feeds more realtime. However, I think time has
>>>>>>>> > shown it was not enough to be a big enough deal to sustain momentum as a
>>>>>>>> > project.
>>>>>>>> >
>>>>>>>> > The problem is that this general problem PSHB solves has many different
>>>>>>>> > views/perspectives/languages. For example, it can be message oriented and
>>>>>>>> > talk about pubsub. Or it can be event oriented and talk about events etc
>>>>>>>> > (the perspective used by Phil and them). Or it can even be thought of as
>>>>>>>> > callbacks or hooks (webhooks). There are other similar concepts with
>>>>>>>> > different language as well: updates/notifications, observers, etc.
>>>>>>>> > The two
>>>>>>>> > main ones seem to be events vs messages/pubsub, and I'm not sure which one
>>>>>>>> > is generally considered more general than the other. Ultimately, technically,
>>>>>>>> > they're more or less the same thing, but I think the framing makes a *big*
>>>>>>>> > difference.
>>>>>>>> >
>>>>>>>> > Anyway, that's the start of my ideas around this.
>>>>>>>> >
>>>>>>>> > -jeff
>>>>>>>> >
>>>>>>>> > > Julien
>>>>>>>> >
>>>>>>>> > > On Mon, Nov 28, 2011 at 9:33 PM, Bob Wyman <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > >> The site http://www.mostlybaked.com/ provides a number of quick sketches
>>>>>>>> > >> of applications that are things that I personally think should work well
>>>>>>>> > >> over PSHB if the focus of PSHB became less about blogging and more about
>>>>>>>> > >> the general case of publishing and subscribing to streams of data on the
>>>>>>>> > >> Internet. Also, Phil often talks about the kinds of things that he'd like
>>>>>>>> > >> to do with the EventedAPI on his blog. ex:
>>>>>>>> > >> http://www.windley.com/archives/2011/11/personal_event_networks_and_v...
>>>>>>>> >
>>>>>>>> > >> bob wyman
>>>>>>>> >
>>>>>>>> > >> On Mon, Nov 28, 2011 at 1:16 PM, Bob Wyman <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > >>> See: http://www.eventedapi.org/spec
>>>>>>>> >
>>>>>>>> > >>> As we consider what can be done to move PubSubHubbub forward, it might
>>>>>>>> > >>> make sense to take a look at some other protocols that folk have defined to
>>>>>>>> > >>> determine if there is anything in them that PubSubHubbub should
>>>>>>>> > >>> implement or if they do things better than PSHB does.
>>>>>>>> > >>> The folk at Kynetx (
>>>>>>>> > >>> http://apps.kynetx.com/) have been building up a PSHB-like system for
>>>>>>>> > >>> some time now... I'm not sure I understand why PSHB wouldn't, in fact,
>>>>>>>> > >>> serve their needs.
>>>>>>>> >
>>>>>>>> > >>> bob wyman
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Jeff Lindsay http://progrium.com
>>>>>>>>
>>>> --
>>>> Jeff Lindsay
>>>> http://progrium.com
>>>>
>>
>> --
>> Jeff Lindsay
>> http://progrium.com
>>

--
Jeff Lindsay
http://progrium.com
