To better demonstrate what I'm talking about (re: topic-based vs.
content-based), I've put up a little demo. (Code speaks louder than
words...) Check out: http://frackim.appspot.com/ . "FrackIM" allows you to
"follow" or "track" messages that originate either from XMPP IM or from
PubSubHubbub. Read the instructions <http://frackim.appspot.com/> and then
add [email protected] to your buddy list.

The software uses the AppEngine Prospective Search Service and delivers
results using AppEngine's XMPP service. Topic-based behavior is implemented
by subscribing to messages with a query that constrains the "follow"
attribute of messages. This attribute contains either an XMPP JID or an
HTTP URL (the URL is, of course, a PSHB "topic"). For example, the command

    /subscribe follow:huffingtonpost.com

behaves like a topic-based subscription in PSHB today.

To get "content-based" behavior, you create a subscription that constrains
the "track" attribute of a message. Thus,

    /subscribe track:obama

is content-based and will match any IM or PSHB message that contains the
word "Obama". Of course, you can combine the two, like this:

    /subscribe follow:huffingtonpost.com AND track:obama

which would return only messages published by HuffingtonPost that contain
the word "Obama".
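The matching behind these commands can be sketched in a few lines of Python. This is a toy, in-memory stand-in: it does not use or imitate the AppEngine Prospective Search API, and the `Message` type and the `parse_query` and `matches` functions are hypothetical names for illustration. It assumes that `follow:X` matches when X appears anywhere in the message's source JID or URL, that `track:Y` matches a whole word in the message body, and that `AND` is the only operator.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    follow: str  # source of the message: an XMPP JID or a PSHB topic URL
    text: str    # message body; "track" terms are matched against its words


def parse_query(query):
    """Split 'follow:x AND track:y' into [('follow', 'x'), ('track', 'y')].

    Assumption: AND is the only operator supported by this sketch.
    """
    terms = []
    for clause in query.split(" AND "):
        attr, _, value = clause.strip().partition(":")
        if attr not in ("follow", "track") or not value:
            raise ValueError(f"unsupported clause: {clause!r}")
        terms.append((attr, value.lower()))
    return terms


def matches(terms, msg):
    """Return True only if the message satisfies every term (logical AND)."""
    words = set(re.findall(r"\w+", msg.text.lower()))
    for attr, value in terms:
        # Assumed semantics: follow = substring of the source, track = whole word.
        if attr == "follow" and value not in msg.follow.lower():
            return False
        if attr == "track" and value not in words:
            return False
    return True


if __name__ == "__main__":
    # Made-up sample messages, for illustration only.
    inbox = [
        Message("http://huffingtonpost.com/feed", "Obama speaks in Ohio"),
        Message("http://example.com/feed", "Obama visits Europe"),
        Message("http://huffingtonpost.com/feed", "Senate passes budget"),
    ]
    query = parse_query("follow:huffingtonpost.com AND track:obama")
    print([m.text for m in inbox if matches(query, m)])  # ['Obama speaks in Ohio']
```

In this sketch a "topic-based" subscription is simply a query with a single `follow` term; adding operators such as OR or NOT would only extend `parse_query`, not the way subscriptions are registered.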

The point here is to demonstrate that a content-based system can implement
topic-based as a degenerate, trivial case. (i.e., in this example, a
"topic-based" system would support only the "follow" attribute.) However,
such a system is easily extended to handle more complex applications by
simply allowing more fields to match against and by allowing a greater
variety of query operators. In such a system, you never even consider
building a "firehose" since you essentially start with one.
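To make the "degenerate case" argument concrete, here is a minimal Python sketch of a content-based broker in which a topic subscription is nothing more than an exact-match predicate on one attribute. `ContentBroker`, `subscribe_topic`, and the `topic`/`content` attribute names are hypothetical; they are not part of PSHB or of the FrackIM demo. Because every published item flows through a single `publish()` loop, that loop already plays the role of the firehose.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]
Predicate = Callable[[Item], bool]


class ContentBroker:
    """Toy content-based broker (hypothetical): each subscription is a predicate
    over an item's attributes, and every published item is tested against all of them."""

    def __init__(self) -> None:
        self._subs: List[Tuple[Predicate, List[Item]]] = []

    def subscribe(self, predicate: Predicate) -> List[Item]:
        """Register an arbitrary content-based query; returns the subscriber's inbox."""
        inbox: List[Item] = []
        self._subs.append((predicate, inbox))
        return inbox

    def subscribe_topic(self, topic: str) -> List[Item]:
        """Topic-based subscription as the degenerate case: exact match on 'topic'."""
        return self.subscribe(lambda item: item.get("topic") == topic)

    def publish(self, item: Item) -> None:
        # One stream of all published items: in effect, the firehose already exists.
        for predicate, inbox in self._subs:
            if predicate(item):
                inbox.append(item)


if __name__ == "__main__":
    broker = ContentBroker()
    feed = "http://example.com/feed"
    by_topic = broker.subscribe_topic(feed)
    by_content = broker.subscribe(lambda i: "foobar" in i.get("content", ""))
    by_both = broker.subscribe(
        lambda i: i.get("topic") == feed and "foobar" in i.get("content", "")
    )
    broker.publish({"topic": feed, "content": "foobar news"})
    broker.publish({"topic": "http://other.example/feed", "content": "more foobar"})
    print(len(by_topic), len(by_content), len(by_both))  # 1 2 1
```

Note that going from topic-based to content-based here needs no new interface: `subscribe_topic` is a thin wrapper around `subscribe`.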

Give the toy a try and see what you think. Note: it is subscribed to only a
small number of feeds -- mostly political content. So, subscriptions like
"Obama" are more likely to work than geeky stuff like "prospective search."
If you have some PSHB topics you'd like me to add, just send a note
off-list.

On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> wrote:
> Btw, are you in the area (SF)? It would be interesting to discuss
I work in NYC so it would be hard to meet up in SF any time soon. There is
always email...

bob wyman


On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> wrote:

> Okay, I better understand your position and perspective on this. Btw, are
> you in the area (SF)? It would be interesting to discuss topic vs content
> based subscriptions in person because I have thought/worked with it a lot,
> but not in those terms.
>
> -jeff
>
>
> On Tue, Nov 29, 2011 at 2:06 PM, Bob Wyman <[email protected]> wrote:
>
>>
>>
>> On Mon, Nov 28, 2011 at 8:45 PM, Jeff Lindsay <[email protected]> wrote:
>>
>>>> The idea was that the hub should publish Atom entries and only Atom
>>>> entries. Of course, the entries would contain atom.source elements to show
>>>> the feeds with which they were associated. Also, the hub should do
>>>> de-duping to ensure that any particular entry isn't sent more than once.
>>>>
>>>
>>> Yeah, I get the reasoning behind Atom and I understand it's more general
>>> purpose. The problem is that in order to make something useful and easy
>>> to adopt, you need to really facilitate what people are already doing and
>>> are familiar with. Not everybody wants to work with Atom, despite all its
>>> benefits. Having Atom as a representation or as a possible payload is
>>> great, but, depending on its semantics, requiring it for PSHB to be
>>> useful is not a great idea... or at least not a pragmatic one, IMO.
>>>
>>>
>>>> We could build all the above things very easily based on systems that
>>>> publish Atom feeds and allow content-based (query-based) subscriptions.
>>>>
>>>
>>> Call me crazy, but I'm in love with the Unix philosophy of doing one
>>> thing well and designing for composition of more complex systems from
>>> simple parts.
>>>
>> "Designing for composition of more complex systems from simple parts" is
>> an excellent goal. The problem is that in order to facilitate composition,
>> you must have some idea of what kinds of complex systems you're going to
>> compose. Given the application domain under discussion (publish/subscribe,
>> even if some other name is used), we know from many long years of
>> experience that it is difficult to build a content-based system on top of
>> a topic-based system, yet it is trivial to build a topic-based system on
>> top of a content-based system. It matters where you start when designing
>> systems. Things get path-dependent very quickly.
>>
>> The problem is that design decisions made to facilitate topic-based
>> system construction tend to make it harder to build content-based
>> systems. Take, for example, "firehoses": they are a recurring subject of
>> discussion with topic-based systems but are generally irrelevant when
>> discussing content-based systems. A firehose, which adds complexity to the
>> topic-based implementation, is almost always needed when people want to
>> do any kind of content-based work on top of a topic-based system. (That
>> can include either real-time filtering or dumping of data into a database
>> for later "content-based" retrieval or searching.) A firehose is simply a
>> mechanism to multiplex, or merge together, the many topic-based streams
>> that were created in order to provide a topic-based subscription model.
>> If you start with a topic-based system, you almost always need to
>> construct firehoses in order to make content-based routing possible. On
>> the other hand, if you start with a content-based system and have "topic"
>> as an attribute of each published item, then it is trivial to create
>> "topic" streams since they are simply single-attribute subscriptions keyed
>> on the "topic" attribute.
>>
>> If you start with a content-based model but want topic-based, then
>> instead of subscribing to topic "foobar" you assume that all published
>> items have an attribute named "topic" and you subscribe to "topic =
>> 'foobar'". A "topic-based" system is thus nothing more than the simplest
>> use of a content-based system. Of course, the advantage of using a trivial
>> content-based interface to emulate a topic-based system is that you can
>> then easily expand the capability of the base system to support more
>> complex filters or queries. You can go from just a single attribute and
>> exact-match to allowing full Boolean expressions, etc., without making a
>> significant change to the subscription interface -- the changes are only
>> to the subscription query syntax, and each of those changes can produce a
>> proper superset of the trivial syntax.
>>
>> What I wonder is what, if any, benefit comes from baking "topic-based"
>> into the subscription interface? Given that the alternative provides such
>> flexibility down the road, what significant advantage do you get from
>> limiting the system's expressiveness up-front?
>>
>>
>>> Queries and filters, to me, are out of the scope of this protocol,
>>> despite being very useful.
>>>
>> If you follow my reasoning in the paragraphs above, you won't be surprised
>> that I claim that in order to build a topic-based system, you already need
>> to build "Queries and Filters." The only difference is that if you build
>> something like PSHB, you are building a very simple filter language that
>> happens to be hard to extend. When people subscribe to topic
>> "http://example.com/feed", it is EXACTLY the same, semantically, as
>> subscribing using the query "topic = 'http://example.com/feed'"... There
>> is no significant introduction of complexity that results from going from
>> topic-based to content-based -- only a much easier path to doing more
>> interesting things in the future. (i.e. "topic='http://example.com/feed'
>> AND content='foobar'" is just a step away...)
>>
>>
>>> The reason is that anybody can create a subscriber or relay (perhaps
>>> even a hub) that happens to do that filtering in its implementation.
>>>
>> Yes, anyone can build yet another aggregator to either consume firehoses
>> or construct them and then filter them. But just because a thing can be
>> done doesn't mean that we should insist that it be done -- unless there is
>> a good reason not to allow alternatives. In this case, I can't see that
>> there is one. Building the basic system using the model of a trivial
>> content-based system doesn't make it any more difficult to build other
>> hubs or relays that can do arbitrary processing; however, it gives us the
>> option of allowing a single system, with a standard interface, to do both
>> the simple and the complex work in an integrated and more efficient manner.
>>
>>>
>>> That said, I'm assuming this was more just to defend Atom and
>>> content-based subscriptions, to which I would say: those examples should be
>>> possible *if* you use Atom as your content container and have access to or
>>> can build a subscription querier node. But it should also be possible if
>>> the content is *not* Atom using the same approach of putting the filtering
>>> in an intermediate node (or potentially being an implementation detail of a
>>> hub).
>>>
>>> I just think the core should be simple and neutral, allowing more
>>> specialized extensions, additions, and combinability. And for that, my
>>> experience (and general observations) suggest that we should focus on
>>> content-type neutral HTTP-based mechanisms.
>>>
>>> -jeff
>>>
>>>
>>>>
>>>> bob wyman
>>>>
>>>>
>>>> On Mon, Nov 28, 2011 at 6:33 PM, Julien Genestoux <
>>>> [email protected]> wrote:
>>>>
>>>>> Jeff, do you think you could help get the folks at GitHub,
>>>>> Twilio, FreshBooks, and Pusher to come in here and participate? What
>>>>> would they love to see in and out of PubSubHubbub so that it fits their
>>>>> needs?
>>>>>
>>>>> Bob, that's an interesting point. You said you wanted PSHB to be about
>>>>> entries rather than feeds. I'm not sure I understand this. I guess you
>>>>> would still need to subscribe to an endpoint that would emit a collection
>>>>> of entries, right?
>>>>>
>>>>> Julien
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 29, 2011 at 12:16 AM, Bob Wyman <[email protected]> wrote:
>>>>>
>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien <[email protected]> wrote:
>>>>>>
>>>>>> > PubSubHubbub is currently too
>>>>>> > much oriented toward data feeds
>>>>>> Personally, I think that PSHB "went wrong" when folk insisted that it
>>>>>> support RSS instead of just Atom. In the Atom format we had gone to great
>>>>>> trouble to ensure that "entry" was a top-level item and that entries had
>>>>>> the same semantics whether they were inside feeds or on their own. (Not
>>>>>> the case with RSS.) One of the reasons that I worked to make this the
>>>>>> case was that I've been wanting to do pubsub with arbitrary content for
>>>>>> many years... The idea was that an Atom entry is a reasonable wrapper or
>>>>>> container for just about any content you might want to publish. (MIME
>>>>>> types distinguish the content type.) Thus, a system for syndicating Atom
>>>>>> entries could be used to reasonably syndicate just about anything. But,
>>>>>> when support for RSS feeds came into the PSHB spec, all sorts of things
>>>>>> got confused... PSHB should have been about the entries, not the feeds...
>>>>>>
>>>>>> bob wyman
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien <[email protected]> wrote:
>>>>>>
>>>>>>> Jeff, thanks for sharing so quickly :)
>>>>>>> I completely agree and acknowledge that PubSubHubbub is currently too
>>>>>>> much oriented toward data feeds, and content in general, while that's
>>>>>>> just a sub-case.
>>>>>>> I also think the "realtime" aspect of things doesn't matter that much,
>>>>>>> and is just a consequence of the "push" design. When you trigger
>>>>>>> events, there is no reason to do it later rather than sooner.
>>>>>>>
>>>>>>> The spec should evolve into something that works as well for events as
>>>>>>> for content.
>>>>>>> It should be "subscribe to a web resource, get events". [This can be
>>>>>>> decorated in any way people want: to work with feeds, with publisher/
>>>>>>> hubs merged or distinct, with no data... etc.]
>>>>>>>
>>>>>>> Julien
>>>>>>>
>>>>>>> On Nov 28, 11:21 pm, Jeff Lindsay <[email protected]> wrote:
>>>>>>> > On Mon, Nov 28, 2011 at 2:02 PM, Julien Genestoux <[email protected]> wrote:
>>>>>>> > > Jeff, please do share your feelings. Help us make PubSubHubbub better!
>>>>>>> > > Bob, obviously PubSubHubbub should be less about blogging and/or news.
>>>>>>> > > I started a thread about supporting any kind of arbitrary data, and
>>>>>>> > > this is what I had in mind as a way to support any kind of content,
>>>>>>> > > and any type of updates (with or without payload).
>>>>>>> >
>>>>>>> > To this point, my main feeling is that, yes, PSHB is focused too much
>>>>>>> > on content. While I think this is useful (as it's been the primary use
>>>>>>> > case), it's not a wide enough net to really have critical mass as a
>>>>>>> > project. I originally thought it was good that it was very focused and
>>>>>>> > didn't solve *my* particular problems. I also thought it was good it
>>>>>>> > focused on a tangible goal of making feeds more realtime. However, I
>>>>>>> > think time has shown it was not a big enough deal to sustain momentum
>>>>>>> > as a project.
>>>>>>> >
>>>>>>> > The issue is that the general problem PSHB solves has many different
>>>>>>> > views/perspectives/languages. For example, it can be message oriented
>>>>>>> > and talk about pubsub. Or it can be event oriented and talk about
>>>>>>> > events, etc. (the perspective used by Phil and them). Or it can even
>>>>>>> > be thought of as callbacks or hooks (webhooks). There are other
>>>>>>> > similar concepts with different language as well:
>>>>>>> > updates/notifications, observers, etc. The two main ones seem to be
>>>>>>> > events vs messages/pubsub, and I'm not sure which one is generally
>>>>>>> > considered more general than the other. Ultimately, technically,
>>>>>>> > they're more or less the same thing, but I think the framing makes a
>>>>>>> > *big* difference.
>>>>>>> >
>>>>>>> > Anyway, that's the start of my ideas around this.
>>>>>>> >
>>>>>>> > -jeff
>>>>>>> >
>>>>>>> > > Julien
>>>>>>> >
>>>>>>> > > On Mon, Nov 28, 2011 at 9:33 PM, Bob Wyman <[email protected]> wrote:
>>>>>>> >
>>>>>>> > >> The site http://www.mostlybaked.com/ provides a number of quick
>>>>>>> > >> sketches of applications that I personally think should work well
>>>>>>> > >> over PSHB if the focus of PSHB became less about blogging and more
>>>>>>> > >> about the general case of publishing and subscribing to streams of
>>>>>>> > >> data on the Internet. Also, Phil often talks about the kinds of
>>>>>>> > >> things that he'd like to do with the EventedAPI on his blog, e.g.:
>>>>>>> > >>
>>>>>>> > >> http://www.windley.com/archives/2011/11/personal_event_networks_and_v...
>>>>>>> >
>>>>>>> > >> bob wyman
>>>>>>> >
>>>>>>> > >> On Mon, Nov 28, 2011 at 1:16 PM, Bob Wyman <[email protected]> wrote:
>>>>>>> >
>>>>>>> > >>> See: http://www.eventedapi.org/spec
>>>>>>> >
>>>>>>> > >>> As we consider what can be done to move PubSubHubbub forward, it
>>>>>>> > >>> might make sense to take a look at some other protocols that folk
>>>>>>> > >>> have defined to determine if there is anything in them that
>>>>>>> > >>> PubSubHubbub should implement or if they do things better than
>>>>>>> > >>> PSHB does. The folk at Kynetx (http://apps.kynetx.com/) have been
>>>>>>> > >>> building up a PSHB-like system for some time now... I'm not sure I
>>>>>>> > >>> understand why PSHB wouldn't, in fact, serve their needs.
>>>>>>> >
>>>>>>> > >>> bob wyman
>>>>>>> >
>>>>>>> > --
>>>>>>> > Jeff Lindsay
>>>>>>> > http://progrium.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Jeff Lindsay
>>> http://progrium.com
>>>
>>
>>
>
>
> --
> Jeff Lindsay
> http://progrium.com
>
