Very cool, but I do get the point of topics being implemented as content
filters. In my world, topics are still useful as a sort of primary key to
shard and route more efficiently. Otherwise you just have a firehose that
you always have to do filtering on, which is harder to do -- or at least
adds complexity (throw Storm in the loop, maybe). So there is a balance. I
usually implement limited content filtering on top of a topic stream.
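
Roughly what I mean, as a minimal Python sketch. The names here (TopicRouter,
subscribe, publish, content_filter) and the message shape are made up for
illustration and are not taken from any PSHB implementation. The topic is the
routing key, and the content filter is an optional extra applied only to the
subscribers of that topic:

```python
from collections import defaultdict
from typing import Callable, Optional

Message = dict  # assumed shape: {"topic": str, "body": str}
Predicate = Callable[[Message], bool]


class TopicRouter:
    """Routes by topic first, then applies an optional per-subscriber filter."""

    def __init__(self) -> None:
        # topic -> list of (callback, optional content filter)
        self._subs = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Message], None],
                  content_filter: Optional[Predicate] = None) -> None:
        self._subs[topic].append((callback, content_filter))

    def publish(self, message: Message) -> int:
        """Deliver to matching subscribers and return how many were notified."""
        delivered = 0
        # Only subscribers of this one topic are examined; no firehose scan.
        for callback, content_filter in self._subs.get(message["topic"], []):
            if content_filter is None or content_filter(message):
                callback(message)
                delivered += 1
        return delivered


router = TopicRouter()
inbox = []
router.subscribe("http://example.com/feed", inbox.append,
                 content_filter=lambda m: "obama" in m["body"].lower())
router.publish({"topic": "http://example.com/feed", "body": "Obama speaks"})
router.publish({"topic": "http://example.com/feed", "body": "Weather report"})
print(len(inbox))
```

The lookup by topic is what lets you shard: each topic's subscriber list can
live on a different node, and the filter never sees unrelated traffic.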

Anyway, I still feel like topic-based is the way to go because it makes the
simple case very simple, it's familiar, and you *can* do content filtering
on top of it. It's also more compatible with the nature of HTTP (single
resource oriented operations), which is where I think PSHB should live --
not so much in the content. If we were to do content-based filtering, that
would imply moving into the content-space, which is why I understand your
advocacy for Atom. I just don't think we should.
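
For reference, here is how I read the equivalence argued in the quoted text
below, where a topic subscription is the single-attribute query
"topic = '...'". This is only a sketch under my own assumptions: the Query
class, its field names, and the item shape are hypothetical and do not come
from the PSHB spec or from FrackIM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A conjunction of constraints over item attributes (illustrative only)."""
    equals: tuple = ()    # ((attribute, exact value), ...)
    contains: tuple = ()  # ((attribute, case-insensitive substring), ...)

    def matches(self, item: dict) -> bool:
        exact_ok = all(item.get(attr) == value for attr, value in self.equals)
        text_ok = all(word.lower() in str(item.get(attr, "")).lower()
                      for attr, word in self.contains)
        return exact_ok and text_ok


def topic_subscription(topic: str) -> Query:
    # A topic subscription expressed as the query: topic = '<topic>'
    return Query(equals=(("topic", topic),))


item = {"topic": "http://example.com/feed", "content": "foobar and more"}
by_topic = topic_subscription("http://example.com/feed")
narrowed = Query(equals=(("topic", "http://example.com/feed"),),
                 contains=(("content", "foobar"),))
print(by_topic.matches(item), narrowed.matches(item))
```

The subscription interface is the same in both cases; only the query gets
richer. What the sketch does not show is the cost side: matching arbitrary
queries against every item is where the routing gets harder than a topic
lookup.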

-jeff

On Wed, Nov 30, 2011 at 3:15 PM, Bob Wyman <[email protected]> wrote:

> To better demonstrate what I'm talking about (re: Topic-based v
> Content-based), I've put up a little demo. (Code speaks louder than
> words...) Check out: http://frackim.appspot.com/ . "FrackIM" allows you
> to "Follow" or "Track" messages which originate either from XMPP IM or from
> PubSubHubbub. Read the instructions <http://frackim.appspot.com/> and
> then add [email protected] to your buddy list.
>
> The software uses the AppEngine Prospective Search Service and delivers
> results using AppEngine's XMPP service. Topic-based behavior is implemented
> by subscribing to messages using a query that constrains the "follow"
> attribute of messages. This attribute contains either an XMPP JID or an
> HTTP URL (the URL is, of course, a PSHB "topic"). For example:
> /subscribe follow:huffingtonpost.com is like a topic-based subscription
> in PSHB today.
> To get "content-based" behavior, you create a subscription that constrains
> the "track" attribute of a message. Thus:
> /subscribe track:obama is content-based and will match any IM or PSHB
> message that contains the word "Obama".
> Of course, you can combine the two together like this:
> /subscribe follow:huffingtonpost.com AND track:obama which would return
> only messages published by HuffingtonPost that contain the word Obama.
>
> The point here is to demonstrate that a content-based system can implement
> topic-based as a degenerate, trivial case. (i.e. in this example, a "topic
> based" system would only support the "follow" attribute.) However, such a
> system is easily extended to handle more complex applications by simply
> allowing more fields to match against and by allowing a greater variety of
> query operators. In such a system, you never even consider building a
> "firehose" since you essentially start off with one to begin with.
>
> Give the toy a try and see what you think. Note: it is only subscribed to
> a small number of feeds -- mostly political content. So, subscriptions like
> "Obama" are more likely to work than geeky stuff like "prospective search."
> If you have some PSHB topics you'd like me to add, just send a note
> off-list.
>
> On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> wrote:
> > Btw, are you in the area (SF)? It would be interesting to discuss
> I work in NYC so it would be hard to meet up in SF any time soon. There is
> always email...
>
> bob wyman
>
>
> On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> wrote:
>
>> Okay, I better understand your position and perspective on this. Btw, are
>> you in the area (SF)? It would be interesting to discuss topic vs content
>> based subscriptions in person because I have thought/worked with it a lot,
>> but not in those terms.
>>
>> -jeff
>>
>>
>> On Tue, Nov 29, 2011 at 2:06 PM, Bob Wyman <[email protected]> wrote:
>>
>>>
>>>
>>> On Mon, Nov 28, 2011 at 8:45 PM, Jeff Lindsay <[email protected]> wrote:
>>>
>>>>> The idea was that the hub should publish Atom entries and only Atom
>>>>> entries. Of course, the entries would contain atom.source elements to show
>>>>> the feeds with which they were associated. Also, the hub should do
>>>>> de-duping to ensure that any particular entry isn't sent more than once.
>>>>>
>>>>
>>>> Yeah, I get the reasoning behind Atom and I understand it's more
>>>> general use. The problem is in order to make something useful and easy to
>>>> adopt, you need to really facilitate what people are already doing and are
>>>> familiar with. Not everybody wants to work with Atom, despite all its
>>>> benefits. Having Atom as a representation or as a possible payload is
>>>> great, but depending on its semantics, forcing it to be required for PSHB
>>>> to be useful is not a great idea... or at least not a pragmatic one, IMO.
>>>>
>>>>
>>>>> We could build all the above things very easily based on systems that
>>>>> publish Atom feeds and allow content-based (query-based) subscriptions.
>>>>>
>>>>
>>>> Call me crazy, but I'm in love with the Unix philosophy of doing one
>>>> thing well and designing for composition of more complex systems from
>>>> simple parts.
>>>>
>>> "Designing for composition of more complex systems from simple parts" is
>>> an excellent goal. The problem is that in order to facilitate composition,
>>> you must have some idea of what kinds of complex systems you're going to
>>> compose. Given the application domain under discussion (Publish/Subscribe
>>> even if some other name is used), the problem here is that we know from
>>> many long years of experience that it is difficult to build a content-based
>>> system on top of a topic-based system yet it is trivial to build a
>>> topic-based system on top of a content-based system. It is important where
>>> you start when designing systems. Things get path-dependent very quickly.
>>>
>>> The problem is that design decisions made to facilitate topic-based
>>> system construction tend to make harder the job of building content based
>>> systems. Take, for example, the regular discussion of "firehoses" which are
>>> almost always a common subject of discussion with topic-based systems but
>>> are generally irrelevant when discussing content-based systems. A firehose,
>>> which adds complexity to the topic-based implementation, is almost always
>>> needed when people want to do any kind of content-based work on top of a
>>> topic-based system. (That can include either real-time filtering or dumping
>>> of data into a database for later "content-based" retrieval or searching.)
>>> A firehose is simply a mechanism to de-mux or merge together the many
>>> topic-based streams that were created in order to provide a topic-based
>>> subscription model. If you start with a topic-based system, you almost
>>> always need to construct firehoses in order to make content-based routing
>>> possible. On the other hand, if you start with a content-based system and
>>> have "topic" as an attribute of each published item, then it is trivial to
>>> create "topic" streams since they are simply single-attribute subscriptions
>>> keyed on the "topic" attribute.
>>>
>>> If you start with a content-based model but want topic-based, then
>>> instead of subscribing to topic "foobar" you assume that all published
>>> items have an attribute named "topic" and you subscribe to "topic =
>>> 'foobar'". A "topic-based" system is thus nothing more than the most simple
>>> use of a content-based system. Of course, the advantage of using a trivial
>>> content-based interface to emulate a topic-based system is that you can
>>> then easily expand the capability of the base system to support more
>>> complex filters or queries. You can go from just a single attribute and
>>> exact-match to allowing full Boolean expressions, etc. without making a
>>> significant change to the subscription interface -- the changes are only to
>>> the subscription query syntax and those changes can all produce proper
>>> supersets of the trivial syntax.
>>>
>>> What I wonder is what, if any, benefit comes from baking "topic-based"
>>> into the subscription interface? Given that the alternative provides such
>>> flexibility down the road, what significant advantage do you get from
>>> limiting the system's expressiveness up-front?
>>>
>>>
>>>> Queries and filters, to me, are out of the scope of this protocol,
>>>> despite being very useful.
>>>>
>>> If you see my reasoning in the paragraphs above, you won't be surprised
>>> that I claim that in order to build a topic-based system, you already need
>>> to build "Queries and Filters."  The only difference is that if you build
>>> something like PSHB, you are building a very simple filter language that
>>> happens to be hard to extend. When people subscribe to topic
>>> "http://example.com/feed", it is EXACTLY the same, semantically, as
>>> subscribing using the query "topic = 'http://example.com/feed'"...
>>> There is no significant introduction of complexity that results from going
>>> from topic-based to content-based -- only a much easier path to doing more
>>> interesting things in the future. (i.e. "topic='http://example.com/feed' AND
>>> content='foobar'" is just a step away...)
>>>
>>>
>>>> The reason is that anybody can create a subscriber or relay (perhaps
>>>> even a hub) that happens to do that filtering in its implementation.
>>>>
>>> Yes, anyone can build yet another aggregator to either consume firehoses
>>> or construct them and then filter them. But, just because a thing can be
>>> done, doesn't mean that we should insist that it be done -- unless there is
>>> a good reason not to allow alternatives. In this case, I can't see that
>>> there is one. Building the basic system using the model of a trivial
>>> content-based system doesn't make it any more difficult to build other hubs
>>> or relays that can do arbitrary processing; however, it gives us the option
>>> of allowing a single system, with a standard interface, to do both the
>>> simple and the complex work in an integrated and more efficient manner.
>>>
>>>>
>>>> That said, I'm assuming this was more just to defend Atom and
>>>> content-based subscriptions, to which I would say: those examples should be
>>>> possible *if* you use Atom as your content container and have access to or
>>>> can build a subscription querier node. But it should also be possible if
>>>> the content is *not* Atom using the same approach of putting the filtering
>>>> in an intermediate node (or potentially being an implementation detail of a
>>>> hub).
>>>>
>>>> I just think the core should be simple and neutral, allowing more
>>>> specialized extensions, additions, and combinability. And for that, my
>>>> experience (and general observations) suggest that we should focus on
>>>> content-type neutral HTTP-based mechanisms.
>>>>
>>>> -jeff
>>>>
>>>>
>>>>>
>>>>> bob wyman
>>>>>
>>>>>
>>>>> On Mon, Nov 28, 2011 at 6:33 PM, Julien Genestoux <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Jeff, do you think you could help getting the folks at GitHub,
>>>>>> Twilio, FreshBooks, Pusher to come in here and participate? What would
>>>>>> they love to see in and out of PubSubHubbub so that it fits their needs?
>>>>>>
>>>>>> Bob, that's an interesting point. You said you wanted PSHB to be
>>>>>> about entries rather than feeds. I'm not sure I understand this. I guess
>>>>>> you would still need to subscribe to an endpoint that would emit a
>>>>>> collection of entries, right?
>>>>>>
>>>>>> Julien
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 29, 2011 at 12:16 AM, Bob Wyman <[email protected]> wrote:
>>>>>>
>>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien <[email protected]> wrote:
>>>>>>>
>>>>>>> > PubSubHubbub is currently too
>>>>>>> > much oriented toward data feeds
>>>>>>> Personally, I think that PSHB "went wrong" when folk insisted that
>>>>>>> it support RSS instead of just Atom. In the Atom format we had gone to
>>>>>>> great trouble to ensure that "entry" was a top-level item and that
>>>>>>> entries had the same semantics whether they were inside feeds or on
>>>>>>> their own. (Not the case with RSS.) One of the reasons that I worked to
>>>>>>> make this the case was that I've been wanting to do pubsub with
>>>>>>> arbitrary content for many years... The idea was that an Atom entry is
>>>>>>> a reasonable wrapper or container for just about any content you might
>>>>>>> want to publish. (MIME types distinguish the content type.) Thus, a
>>>>>>> system for syndicating Atom entries could be used to reasonably
>>>>>>> syndicate just about anything. But, when support for RSS feeds came
>>>>>>> into the PSHB spec, all sorts of things got confused... PSHB should
>>>>>>> have been about the entries, not the feeds...
>>>>>>>
>>>>>>> bob wyman
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien <[email protected]> wrote:
>>>>>>>
>>>>>>>> Jeff, thanks for sharing so quickly :)
>>>>>>>> I perfectly agree and acknowledge that PubSubHubbub is currently too
>>>>>>>> much oriented toward data feeds, and content in general, while it's
>>>>>>>> just a sub-case.
>>>>>>>> I also think the "realtime" aspect of things doesn't matter that
>>>>>>>> much, and is just a consequence of the "push" design. When you trigger
>>>>>>>> events, there is no reason to do it later rather than sooner.
>>>>>>>>
>>>>>>>> The spec should evolve into something that works as well for events
>>>>>>>> as for content.
>>>>>>>> It should be "subscribe to a web resource, get events". [this can be
>>>>>>>> decorated in any way people want to work with feeds, with publisher/
>>>>>>>> hubs merged or distinct, with no data... etc.]
>>>>>>>>
>>>>>>>> Julien
>>>>>>>>
>>>>>>>>
>>>>>>>> On Nov 28, 11:21 pm, Jeff Lindsay <[email protected]> wrote:
>>>>>>>> > On Mon, Nov 28, 2011 at 2:02 PM, Julien Genestoux <[email protected]> wrote:
>>>>>>>> > > Jeff, please do share your feelings. Help us make PubSubHubbub better!
>>>>>>>> > > Bob, obviously PubSubHubbub should be less about blogging and/or news. I
>>>>>>>> > > started a thread about supporting any kind of arbitrary data, and this is
>>>>>>>> > > what I had in mind as a way to support any kind of content, and any type of
>>>>>>>> > > updates (with or without payload).
>>>>>>>> >
>>>>>>>> > To this point, my main feeling is that, yes, PSHB is focused too much on
>>>>>>>> > content. While I think this is useful (as it's been the primary use case),
>>>>>>>> > it's not a wide enough net to really have critical mass as a project. I
>>>>>>>> > originally thought it was good that it was very focused and didn't solve
>>>>>>>> > *my* particular problems. I also thought it was good it focused on a
>>>>>>>> > tangible goal of making feeds more realtime. However, I think time has
>>>>>>>> > shown it was not enough to be a big enough deal to sustain momentum as a
>>>>>>>> > project.
>>>>>>>> >
>>>>>>>> > The problem is that this general problem PSHB solves has many different
>>>>>>>> > views/perspectives/languages. For example, it can be message oriented and
>>>>>>>> > talk about pubsub. Or it can be event oriented and talk about events etc
>>>>>>>> > (the perspective used by Phil and them). Or it can even be thought of as
>>>>>>>> > callbacks or hooks (webhooks). There are other similar concepts with
>>>>>>>> > different language as well: updates/notifications, observers, etc. The two
>>>>>>>> > main ones seem to be events vs messages/pubsub, and I'm not sure which one
>>>>>>>> > is generally considered more general than the other. Ultimately, technically,
>>>>>>>> > they're more or less the same thing, but I think the framing makes a *big*
>>>>>>>> > difference.
>>>>>>>> >
>>>>>>>> > Anyway, that's the start of my ideas around this.
>>>>>>>> >
>>>>>>>> > -jeff
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > > Julien
>>>>>>>> >
>>>>>>>> > > On Mon, Nov 28, 2011 at 9:33 PM, Bob Wyman <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > >> The site http://www.mostlybaked.com/ provides a number of quick sketches
>>>>>>>> > >> of applications that are things that I personally think should work well
>>>>>>>> > >> over PSHB if the focus of PSHB became less about blogging and more about
>>>>>>>> > >> the general case of publishing and subscribing to streams of data on the
>>>>>>>> > >> Internet. Also, Phil often talks about the kinds of things that he'd like
>>>>>>>> > >> to do with the EventedAPI on his blog. ex:
>>>>>>>> > >> http://www.windley.com/archives/2011/11/personal_event_networks_and_v...
>>>>>>>> >
>>>>>>>> > >> bob wyman
>>>>>>>> >
>>>>>>>> > >> On Mon, Nov 28, 2011 at 1:16 PM, Bob Wyman <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > >>> See: http://www.eventedapi.org/spec
>>>>>>>> >
>>>>>>>> > >>> As we consider what can be done to move PubSubHubbub forward, it might
>>>>>>>> > >>> make sense to take a look at some other protocols that folk have defined to
>>>>>>>> > >>> determine if there is anything in them that PubSubHubbub should
>>>>>>>> > >>> implement or if they do things better than PSHB does. The folk at Kynetx
>>>>>>>> > >>> (http://apps.kynetx.com/) have been building up a PSHB-like system for
>>>>>>>> > >>> some time now... I'm not sure I understand why PSHB wouldn't, in fact,
>>>>>>>> > >>> serve their needs.
>>>>>>>> >
>>>>>>>> > >>> bob wyman
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Jeff Lindsay http://progrium.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Lindsay
>>>> http://progrium.com
>>>>
>>>
>>>
>>
>>
>> --
>> Jeff Lindsay
>> http://progrium.com
>>
>
>


-- 
Jeff Lindsay
http://progrium.com
