Oh, and it would indeed be amazing if I could subscribe to Google's result pages :)
Julien On Thu, Dec 1, 2011 at 9:43 AM, Julien Genestoux <[email protected]> wrote: > Bob, > > I really like your example. I have a question though. One of my favorite > things about Google's search is that they use query params for search > rather than POST params, which means we have permalinks for searches. > > In my mind we could use topics to do content-based searches. For example, > I would rewrite your > /subscribe follow:huffingtonpost.com as /subscribe > http://frackim.appspot.com/?follow=huffingtonpost.com > > This way, there is no difference in subscription process between an actual > topic (feed) and a track/follow search. Other benefits include the fact that > we have a permalink to access historical results, but also that each > service can implement its very own operators and algorithms, without > affecting the subscription process. In my mind, this is a very powerful > decoupling, and really equivalent to a "shell" with pipes approach > described by Jeff in a previous message. > > So, if we can turn PubSubHubbub into a "generic" web resource subscription > service, there is no need to distinguish between pure topic-based and > content-based subscriptions. > > Please let me know if this is unclear. > > Julien > > > On Thu, Dec 1, 2011 at 12:31 AM, Jeff Lindsay <[email protected]> wrote: > >> Very cool, but I do get the point of topics being implemented as content >> filters. In my world, topics are still useful as a sort of primary key to >> shard and route more efficiently. Otherwise you just have a firehose that >> you always have to do filtering on, which is harder to do -- or at least >> adds complexity (throw Storm in the loop maybe). So there is a balance. I >> usually implement limited content filtering on top of a topic stream. >> >> Anyway, I still feel like topic-based is the way to go because it makes >> the simple case very simple, it's familiar, and you *can* do content >> filtering on top of it.
It's also more compatible with the nature of HTTP >> (single resource oriented operations), which is where I think PSHB should >> live -- not so much in the content. If we were to do content-based >> filtering, that would imply moving into the content-space, which is why I >> understand your advocacy for Atom. I just don't think we should. >> >> -jeff >> >> >> On Wed, Nov 30, 2011 at 3:15 PM, Bob Wyman <[email protected]> wrote: >> >>> To better demonstrate what I'm talking about (re: Topic-based v >>> Content-based), I've put up a little demo. (Code speaks louder than >>> words...) Check out: http://frackim.appspot.com/ . "FrackIM" allows you >>> to "Follow" or "Track" messages which originate either from XMPP IM or from >>> PubSubHubbub. Read the instructions <http://frackim.appspot.com/> and >>> then add [email protected] to your buddy list. >>> >>> The software uses the AppEngine Prospective Search Service and delivers >>> results using AppEngine's XMPP service. Topic-based behavior is implemented >>> by subscribing to messages using a query that constrains the "follow" >>> attribute of messages. This attribute contains either an XMPP JID or an >>> HTTP URL (the URL is, of course, a PSHB "topic"). For example: >>> /subscribe follow:huffingtonpost.com is like a topic-based subscription >>> in PSHB today. >>> To get "content-based" behavior, you create a subscription that >>> constrains the "track" attribute of a message. Thus: >>> /subscribe track:obama is content-based and will match any IM or PSHB >>> message that contains the word "Obama". >>> Of course, you can combine the two together like this: >>> /subscribe follow:huffingtonpost.com AND track:obama which would return >>> only messages published by HuffingtonPost that contain the word Obama. >>> >>> The point here is to demonstrate that a content-based system can >>> implement topic-based as a degenerate, trivial case. (i.e.
in this example, >>> a "topic based" system would only support the "follow" attribute.) However, >>> such a system is easily extended to handle more complex applications by >>> simply allowing more fields to match against and by allowing a greater >>> variety of query operators. In such a system, you never even consider >>> building a "firehose" since you essentially start off with one to begin >>> with. >>> >>> Give the toy a try and see what you think. Note: it is only subscribed >>> to a small number of feeds -- mostly political content. So, subscriptions >>> like "Obama" are more likely to work than geeky stuff like "prospective >>> search." If you have some PSHB topics you'd like me to add, just send a >>> note off-list. >>> >>> On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> >>> wrote: >>> > Btw, are you in the area (SF)? It would be interesting to discuss >>> I work in NYC so it would be hard to meet up in SF any time soon. There >>> is always email... >>> >>> bob wyman >>> >>> >>> On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]>wrote: >>> >>>> Okay, I better understand your position and perspective on this. Btw, >>>> are you in the area (SF)? It would be interesting to discuss topic vs >>>> content based subscriptions in person because I have thought/worked with it >>>> a lot, but not in those terms. >>>> >>>> -jeff >>>> >>>> >>>> On Tue, Nov 29, 2011 at 2:06 PM, Bob Wyman <[email protected]> wrote: >>>> >>>>> >>>>> >>>>> On Mon, Nov 28, 2011 at 8:45 PM, Jeff Lindsay <[email protected]>wrote: >>>>> >>>>>> The idea was that the hub should publish Atom entries and only Atom >>>>>>> entries. Of course, the entries would contain atom.source elements to >>>>>>> show >>>>>>> the feeds with which they were associated. Also, the hub should do >>>>>>> de-duping to ensure that any particular entry isn't sent more than once. >>>>>>> >>>>>> >>>>>> Yeah, I get the reasoning behind Atom and I understand it's more >>>>>> general use. 
The problem is that in order to make something useful and easy to >>>>>> adopt, you need to really facilitate what people are already doing and >>>>>> are >>>>>> familiar with. Not everybody wants to work with Atom, despite all its >>>>>> benefits. Having Atom as a representation or as a possible payload is >>>>>> great, but depending on its semantics, forcing it to be required for PSHB >>>>>> to be useful is not a great idea... or at least not a pragmatic one IMO. >>>>>> >>>>>> >>>>>>> We could build all the above things very easily based on systems >>>>>>> that publish Atom feeds and allow content-based (query-based) >>>>>>> subscriptions. >>>>>>> >>>>>> >>>>>> Call me crazy, but I'm in love with the Unix philosophy of doing one >>>>>> thing well and designing for composition of more complex systems from >>>>>> simple parts. >>>>>> >>>>> "Designing for composition of more complex systems from simple parts" >>>>> is an excellent goal. The problem is that in order to facilitate >>>>> composition, you must have some idea of what kinds of complex systems >>>>> you're going to compose. Given the application domain under discussion >>>>> (Publish/Subscribe even if some other name is used), the problem here is >>>>> that we know from many long years of experience that it is difficult to >>>>> build a content-based system on top of a topic-based system yet it is >>>>> trivial to build a topic-based system on top of a content-based system. It >>>>> is important where you start when designing systems. Things get >>>>> path-dependent very quickly. >>>>> >>>>> The problem is that design decisions made to facilitate topic-based >>>>> system construction tend to make the job of building content-based >>>>> systems harder. Take, for example, the regular discussion of "firehoses" which >>>>> are >>>>> almost always a common subject of discussion with topic-based systems but >>>>> are generally irrelevant when discussing content-based systems. A
A >>>>> firehose, >>>>> which adds complexity to the topic-based implementation, is almost always >>>>> needed when people want to do any kind of content-based work on top of a >>>>> topic-based system. (That can include either real-time filtering or >>>>> dumping >>>>> of data into a database for later "content-based" retrieval or searching.) >>>>> A firehose is simply a mechanism to de-mux or merge together the many >>>>> topic-based streams that were created in order to provide a topic-based >>>>> subscription model. If you start with a topic-based system, you almost >>>>> always need to construct firehoses in order to make content-based routing >>>>> possible. On the other hand, if you start with a content-based system and >>>>> have "topic" as an attribute of each published item, then it is trivial to >>>>> create "topic" streams since they are simply single-attribute >>>>> subscriptions >>>>> keyed on the "topic" attribute. >>>>> >>>>> If you start with a content-based model but want topic-based, then >>>>> instead of subscribing to topic "foobar" you assume that all published >>>>> items have an attribute named "topic" and you subscribe to "topic = >>>>> 'foobar'". A "topic-based" system is thus nothing more than the most >>>>> simple >>>>> use of a content-based system. Of course, the advantage of using a trivial >>>>> content-based interface to emulate a topic-based system is that you can >>>>> then easily expand the capability of the base system to support more >>>>> complex filters or queries. You can go from just a single attribute and >>>>> exact-match to allowing full Boolean expressions, etc. without making a >>>>> significant change to the subscription interface -- the changes are only >>>>> to >>>>> the subscription query syntax and those changes can all produce proper >>>>> supersets of the trival syntax. >>>>> >>>>> What I wonder is what, if any, benefit comes from baking "topic-based" >>>>> into the subscription interface? 
Given that the alternative provides such >>>>> flexibility down the road, what significant advantage do you get from >>>>> limiting the system's expressiveness up-front? >>>>> >>>>> >>>>>> Queries and filters, to me, are out of the scope of this protocol, >>>>>> despite being very useful. >>>>>> >>>>> If you see my reasoning in the paragraphs above, you won't be >>>>> surprised that I claim that in order to build a topic-based system, you >>>>> already need to build "Queries and Filters." The only difference is that >>>>> if you build something like PSHB, you are building a very simple filter >>>>> language that happens to be hard to extend. When people subscribe to >>>>> topic >>>>> "http://example.com/feed" it is EXACTLY the same, semantically, as >>>>> subscribing using the query "topic = 'http://example.com/feed'"... >>>>> There is no significant introduction of complexity that results from going >>>>> from topic-based to content-based -- only a much easier path to doing more >>>>> interesting things in the future. (e.g. "topic='http://example.com/feed' >>>>> AND content='foobar'" is just a step away...) >>>>> >>>>> >>>>>> The reason is that anybody can create a subscriber or relay (perhaps >>>>>> even a hub) that happens to do that filtering in its implementation. >>>>>> >>>>> Yes, anyone can build yet another aggregator to either consume >>>>> firehoses or construct them and then filter them. But, just because a >>>>> thing >>>>> can be done, doesn't mean that we should insist that it be done -- unless >>>>> there is a good reason not to allow alternatives. In this case, I can't >>>>> see >>>>> that there are.
Building the basic system using the model of a trivial >>>>> content-based system doesn't make it any more difficult to build other >>>>> hubs >>>>> or relays that can do arbitrary processing; however, it gives us the >>>>> option >>>>> of allowing a single system, with a standard interface, to do both the >>>>> simple and the complex work in an integrated and more efficient manner. >>>>> >>>>>> >>>>>> That said, I'm assuming this was more just to defend Atom and >>>>>> content-based subscriptions, to which I would say: those examples should >>>>>> be >>>>>> possible *if* you use Atom as your content container and have access to >>>>>> or >>>>>> can build a subscription querier node. But it should also be possible if >>>>>> the content is *not* Atom using the same approach of putting the >>>>>> filtering >>>>>> in an intermediate node (or potentially being an implementation detail >>>>>> of a >>>>>> hub). >>>>>> >>>>>> I just think the core should be simple and neutral, allowing more >>>>>> specialized extensions, additions, and combinability. And for that, my >>>>>> experience (and general observations) suggest that we should focus on >>>>>> content-type neutral HTTP-based mechanisms. >>>>>> >>>>>> -jeff >>>>>> >>>>>> >>>>>>> >>>>>>> bob wyman >>>>>>> >>>>>>> >>>>>>> On Mon, Nov 28, 2011 at 6:33 PM, Julien Genestoux < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Jeff, do you think you could help get the folks at GitHub, >>>>>>>> Twilio, FreshBooks, Pusher to come in here and participate? What would >>>>>>>> they >>>>>>>> love to see in and out of PubSubHubbub so that it fits their needs? >>>>>>>> >>>>>>>> Bob, that's an interesting point. You said you wanted PSHB to be >>>>>>>> about entries rather than feeds. I'm not sure I understand this. I >>>>>>>> guess >>>>>>>> you would still need to subscribe to an endpoint that would emit a >>>>>>>> collection of entries, right?
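Bob's description of an entry-oriented hub in this thread (publish Atom entries and only Atom entries, each carrying an atom:source element naming its feed, with the hub de-duplicating so that no entry is sent more than once) can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not part of the PubSubHubbub spec: the `Entry` and `EntryHub` names are hypothetical, and an entry's identity is assumed to be its atom:id, which Atom intends to be a permanent, universally unique identifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Entry:
    """Hypothetical minimal model of an Atom entry."""
    entry_id: str     # value of atom:id
    source_feed: str  # feed URL carried in atom:source
    content: str


class EntryHub:
    """Hypothetical entry-oriented hub: it delivers individual entries,
    not whole feeds, and never delivers the same atom:id twice."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._subscribers: List[Callable[[Entry], None]] = []

    def subscribe(self, callback: Callable[[Entry], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, entries: List[Entry]) -> int:
        """Fan out each not-yet-seen entry; return how many were new."""
        delivered = 0
        for entry in entries:
            if entry.entry_id in self._seen:
                continue  # duplicate, e.g. the same entry reached us via two feeds
            self._seen.add(entry.entry_id)
            for callback in self._subscribers:
                callback(entry)
            delivered += 1
        return delivered


if __name__ == "__main__":
    received: List[Entry] = []
    hub = EntryHub()
    hub.subscribe(received.append)
    first = Entry("tag:example.com,2011:1", "http://example.com/feed", "hello")
    hub.publish([first])
    # The same entry arriving again (here alongside a new one) is not re-sent.
    hub.publish([first, Entry("tag:example.com,2011:2", "http://example.org/feed", "world")])
    print([e.entry_id for e in received])
```

Keying on atom:id rather than on the feed is what lets one entry that arrives through two different feeds be delivered once. A real hub would also need to bound or expire the `_seen` set, which this sketch keeps forever.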
>>>>>>>> >>>>>>>> Julien >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Nov 29, 2011 at 12:16 AM, Bob Wyman <[email protected]> wrote: >>>>>>>> >>>>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>> > PubSubHubbub is currently too >>>>>>>>> > much oriented toward data feeds >>>>>>>>> Personally, I think that PSHB "went wrong" when folk insisted that >>>>>>>>> it support RSS instead of just Atom. In the Atom format we had gone to >>>>>>>>> great trouble to ensure that "entry" was a top-level item and that >>>>>>>>> entries >>>>>>>>> had the same semantics whether they were inside feeds or on their >>>>>>>>> own. (Not >>>>>>>>> the case with RSS.) One of the reasons that I worked to make this the >>>>>>>>> case >>>>>>>>> was that I've been wanting to do pubsub with arbitrary content for >>>>>>>>> many >>>>>>>>> years... The idea was that an Atom entry is a reasonable wrapper or >>>>>>>>> container for just about any content you might want to publish. (MIME >>>>>>>>> types >>>>>>>>> distinguish the content type.) Thus, a system for syndicating Atom >>>>>>>>> entries >>>>>>>>> could be used to reasonably syndicate just about anything. But, when >>>>>>>>> support for RSS feeds came into the PSHB spec, all sorts of things got >>>>>>>>> confused... PSHB should have been about the entries, not the feeds... >>>>>>>>> >>>>>>>>> bob wyman >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Jeff, thanks for sharing so quickly :) >>>>>>>>>> I perfectly agree and acknowledge that PubSubHubbub is currently >>>>>>>>>> too >>>>>>>>>> much oriented toward data feeds, and content in general, while >>>>>>>>>> it's >>>>>>>>>> just a sub-case. >>>>>>>>>> I also think the "realtime" aspect of things doesn't matter that >>>>>>>>>> much, >>>>>>>>>> and is just a consequence of the "push" design. 
When you trigger >>>>>>>>>> events, there is no reason to do it later rather than sooner. >>>>>>>>>> >>>>>>>>>> The spec should evolve into something that works as well for events >>>>>>>>>> as >>>>>>>>>> for content. >>>>>>>>>> It should be "subscribe to a web resource, get events". [this can >>>>>>>>>> be >>>>>>>>>> decorated in any way people want to work with feeds, with >>>>>>>>>> publisher/ >>>>>>>>>> hubs merged or distinct, with no data... etc.] >>>>>>>>>> >>>>>>>>>> Julien >>>>>>>>>> >>>>>>>>>> On Nov 28, 11:21 pm, Jeff Lindsay <[email protected]> wrote: >>>>>>>>>> > On Mon, Nov 28, 2011 at 2:02 PM, Julien Genestoux < >>>>>>>>>> > >>>>>>>>>> > [email protected]> wrote: >>>>>>>>>> > > Jeff, please do share your feelings. Help us make >>>>>>>>>> PubSubHubbub better! >>>>>>>>>> > > Bob, obviously PubSubHubbub should be less about blogging >>>>>>>>>> and/or news. I >>>>>>>>>> > > started a thread about supporting any kind of arbitrary data, >>>>>>>>>> and this is >>>>>>>>>> > > what I had in mind as a way to support any kind of content, >>>>>>>>>> and any type of >>>>>>>>>> > > updates (with or without payload). >>>>>>>>>> > >>>>>>>>>> > To this point, my main feeling is that, yes, PSHB is focused >>>>>>>>>> too much on >>>>>>>>>> > content. While I think this is useful (as it's been the primary >>>>>>>>>> use case), >>>>>>>>>> > it's not a wide enough net to really have critical mass as a >>>>>>>>>> project. I >>>>>>>>>> > originally thought it was good that it was very focused and >>>>>>>>>> didn't solve >>>>>>>>>> > *my* particular problems. I also thought it was good it focused >>>>>>>>>> on a >>>>>>>>>> > tangible goal of making feeds more realtime. However, I think >>>>>>>>>> time has >>>>>>>>>> > shown it was not enough to be a big enough deal to sustain >>>>>>>>>> momentum as a >>>>>>>>>> > project.
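Julien's framing, "subscribe to a web resource, get events", already matches the shape of a PubSubHubbub subscription request: a form-encoded POST to the hub that names a topic URL and a callback URL. The Python sketch below only builds that request body (it sends nothing) and shows that a plain feed URL and a query-style URL such as http://frackim.appspot.com/?follow=huffingtonpost.com, the form Julien proposes in his Dec 1 message, are subscribed to identically. The parameter names hub.mode, hub.topic, hub.callback and hub.lease_seconds come from the PubSubHubbub spec; the function name and the callback URL are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode


def build_subscription_body(topic: str, callback: str, mode: str = "subscribe",
                            lease_seconds: Optional[int] = None) -> str:
    """Return the form-encoded body of a PubSubHubbub (un)subscribe request.

    The request has the same shape whatever the topic URL points at:
    a plain feed or a URL whose query string encodes a search.
    """
    if mode not in ("subscribe", "unsubscribe"):
        raise ValueError("mode must be 'subscribe' or 'unsubscribe'")
    params: Dict[str, str] = {
        "hub.mode": mode,
        "hub.topic": topic,
        "hub.callback": callback,
    }
    if lease_seconds is not None:
        params["hub.lease_seconds"] = str(lease_seconds)
    return urlencode(params)


if __name__ == "__main__":
    callback = "http://subscriber.example/callback"  # hypothetical endpoint
    for topic in ("http://example.com/feed",
                  "http://frackim.appspot.com/?follow=huffingtonpost.com"):
        body = build_subscription_body(topic, callback)
        # Round-trip to show the topic survives URL-encoding intact.
        print(parse_qs(body)["hub.topic"][0])
```

The 0.3 draft of the spec that was current when this thread was written also required a hub.verify parameter ("sync" or "async"); it is left out here to keep the sketch focused on the topic/callback pair.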
>>>>>>>>>> > >>>>>>>>>> > The problem is that this general problem PSHB solves has many >>>>>>>>>> different >>>>>>>>>> > views/perspectives/languages. For example, it can be message >>>>>>>>>> oriented and >>>>>>>>>> > talk about pubsub. Or it can be event oriented and talk about >>>>>>>>>> events etc >>>>>>>>>> > (the perspective used by Phil and them). Or it can even be >>>>>>>>>> thought of as >>>>>>>>>> > callbacks or hooks (webhooks). There are other similar concepts >>>>>>>>>> with >>>>>>>>>> > different language as well: updates/notifications, observers, >>>>>>>>>> etc. The two >>>>>>>>>> > main ones seem to be events vs messages/pubsub, and I'm not >>>>>>>>>> sure which one >>>>>>>>>> > is generally considered more general than the other. Ultimately, >>>>>>>>>> technically, >>>>>>>>>> > they're more or less the same thing, but I think the framing >>>>>>>>>> makes a *big* >>>>>>>>>> > difference. >>>>>>>>>> > >>>>>>>>>> > Anyway, that's the start of my ideas around this. >>>>>>>>>> > >>>>>>>>>> > -jeff >>>>>>>>>> > >>>>>>>>>> > > Julien >>>>>>>>>> > >>>>>>>>>> > > On Mon, Nov 28, 2011 at 9:33 PM, Bob Wyman <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> > >>>>>>>>>> > >> The site http://www.mostlybaked.com/ provides a number of >>>>>>>>>> quick sketches >>>>>>>>>> > >> of applications that are things that I personally think >>>>>>>>>> should work well >>>>>>>>>> > >> over PSHB if the focus of PSHB became less about blogging >>>>>>>>>> and more about >>>>>>>>>> > >> the general case of publishing and subscribing to streams of >>>>>>>>>> data on the >>>>>>>>>> > >> Internet. Also, Phil often talks about the kinds of things >>>>>>>>>> that he'd like >>>>>>>>>> > >> to do with the EventedAPI on his blog, e.g.: >>>>>>>>>> > >> >>>>>>>>>> http://www.windley.com/archives/2011/11/personal_event_networks_and_v...
>>>>>>>>>> > >>>>>>>>>> > >> bob wyman >>>>>>>>>> > >>>>>>>>>> > >> On Mon, Nov 28, 2011 at 1:16 PM, Bob Wyman <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> > >>>>>>>>>> > >>> See: http://www.eventedapi.org/spec >>>>>>>>>> > >>>>>>>>>> > >>> As we consider what can be done to move PubSubHubbub >>>>>>>>>> forward, it might >>>>>>>>>> > >>> make sense to take a look at some other protocols that folk >>>>>>>>>> have defined to >>>>>>>>>> > >>> determine if there is anything in them that PubSubHubbub >>>>>>>>>> should > >>> implement or if they do things better than PSHB does. The >>>>>>>>>> folk at Kynetx ( >>>>>>>>>> > >>> http://apps.kynetx.com/) have been building up a PSHB-like >>>>>>>>>> system for >>>>>>>>>> > >>> some time now... I'm not sure I understand why PSHB >>>>>>>>>> wouldn't, in fact, >>>>>>>>>> > >>> serve their needs. >>>>>>>>>> > >>>>>>>>>> > >>> bob wyman >>>>>>>>>> > >>>>>>>>>> > -- >>>>>>>>>> > Jeff Lindsay http://progrium.com >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jeff Lindsay >>>>>> http://progrium.com >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Jeff Lindsay >>>> http://progrium.com >>>> >>> >>> >> >> >> -- >> Jeff Lindsay >> http://progrium.com >> > >
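Bob's central argument in this thread is that a topic subscription is the degenerate case of a content-based one: subscribing to topic "foobar" is the same as the single-attribute, exact-match query "topic = 'foobar'", and richer filters are supersets of that syntax. The Python sketch below illustrates this with plain predicates over messages, loosely modeled on the FrackIM follow/track description, and adds Jeff's counterpoint of using the topic as a routing key with content filtering layered on top. The message shape (a dict with "follow" and "track" keys), every function name, and the track= query parameter understood by from_topic_url are illustrative assumptions; FrackIM's actual implementation on the App Engine Prospective Search service is not reproduced here.

```python
import re
from typing import Callable, Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit

Message = Dict[str, str]               # attribute name -> value
Predicate = Callable[[Message], bool]


def attr_equals(name: str, value: str) -> Predicate:
    """Exact match on one attribute: all a 'topic-based' subscription is."""
    return lambda msg: msg.get(name) == value


def contains_word(name: str, word: str) -> Predicate:
    """Case-insensitive whole-word match, in the spirit of track:obama."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return lambda msg: pattern.search(msg.get(name, "")) is not None


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND; with no arguments it matches everything (a firehose)."""
    return lambda msg: all(p(msg) for p in preds)


def from_topic_url(url: str) -> Predicate:
    """Hypothetical reading of a query-string topic such as
    ?follow=huffingtonpost.com&track=obama as an AND of predicates."""
    query = parse_qs(urlsplit(url).query)
    preds = [attr_equals("follow", v) for v in query.get("follow", [])]
    preds += [contains_word("track", v) for v in query.get("track", [])]
    return all_of(*preds)


def route(messages: List[Message],
          subs: Dict[str, Predicate]) -> Dict[str, List[Message]]:
    """Content-based routing: test every subscription against every message."""
    out: Dict[str, List[Message]] = {name: [] for name in subs}
    for msg in messages:
        for name, pred in subs.items():
            if pred(msg):
                out[name].append(msg)
    return out


def route_indexed(messages: List[Message],
                  subs_by_topic: Dict[str, List[Tuple[str, Predicate]]]
                  ) -> Dict[str, List[Message]]:
    """Topic as routing key: a dict lookup on 'follow' picks the candidate
    subscriptions, then each one's content filter runs on that stream only."""
    out: Dict[str, List[Message]] = {}
    for msg in messages:
        for name, extra in subs_by_topic.get(msg.get("follow", ""), []):
            if extra(msg):
                out.setdefault(name, []).append(msg)
    return out


if __name__ == "__main__":
    msgs = [
        {"follow": "huffingtonpost.com", "track": "Obama speaks today"},
        {"follow": "example.org", "track": "Obama again"},
    ]
    subs = {
        "topic": attr_equals("follow", "huffingtonpost.com"),
        "both": from_topic_url(
            "http://frackim.appspot.com/?follow=huffingtonpost.com&track=obama"),
    }
    print({name: len(hits) for name, hits in route(msgs, subs).items()})
```

The trade-off is the one Jeff and Bob describe: `route` evaluates every subscription against every message, while `route_indexed` narrows the work with a lookup on the topic key but can only express filters that start from a topic.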
