Oh, and it would indeed be amazing if I could subscribe to Google's result
pages :)

Julien



On Thu, Dec 1, 2011 at 9:43 AM, Julien Genestoux <[email protected]> wrote:

> Bob,
>
> I really like your example. I have a question though. One of my favorite
> things about Google's search is that they use query params for search
> rather than POST params, which means we have permalinks for searches.
>
> In my mind we could use topics to do content based searches. For example,
> I would rewrite your
> /subscribe follow:huffingtonpost.com as /subscribe
> http://frackim.appspot.com/?follow=huffingtonpost.com
>
> This way, there is no difference in subscription process between an actual
> topic (feed) or track/follow search. Other benefits include the fact that
> we have a permalink to access historical results, but also that each
> service can implement their very own operators and algorithm, without
> affecting the subscription process. In my mind, this is a very powerful
> decoupling, and really equivalent to a "shell" with pipes approach
> described by Jeff in a previous message.
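
As a rough illustration of the rewrite described above, here is a minimal
sketch that encodes a search as query params so it becomes an ordinary topic
URL. The base URL and the "follow" operator come from the thread; the helper
names and the callback URL are hypothetical, and nothing here is a confirmed
FrackIM or hub API. Only the hub.mode, hub.topic and hub.callback fields of a
PubSubHubbub subscription request are shown.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Base URL taken from the thread; treating it as a search endpoint is an assumption.
SEARCH_BASE = "http://frackim.appspot.com/"


def query_to_topic(operators):
    """Encode search operators as query params so the search has a permalink."""
    return SEARCH_BASE + "?" + urlencode(sorted(operators.items()))


def topic_to_query(topic_url):
    """Recover the operators on the service side; the hub never parses them."""
    return dict(parse_qsl(urlsplit(topic_url).query))


def subscription_form(topic_url, callback_url):
    """Form fields of a plain subscribe request; the hub only sees a URL."""
    return {
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    }


topic = query_to_topic({"follow": "huffingtonpost.com"})
form = subscription_form(topic, "http://subscriber.example/callback")
print(topic)
```

The subscription step is identical whether the topic is a feed or a search;
only the service behind the URL interprets the operators.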
>
> So, if we can turn PubSubHubbub into a "generic" web resource subscription
> service, there is no need to distinguish between pure topic based and
> content based subscriptions.
>
> Please let me know if this is unclear.
>
> Julien
>
>
> On Thu, Dec 1, 2011 at 12:31 AM, Jeff Lindsay <[email protected]> wrote:
>
>> Very cool, but I do get the point of topics being implemented as content
>> filters. In my world, topics are still useful as a sort of primary key to
>> shard and route more efficiently. Otherwise you just have a firehose that
>> you always have to do filtering on, which is harder to do -- or at least
>> adds complexity (throw Storm in the loop maybe). So there is a balance. I
>> usually implement limited content filtering on top of a topic stream.
>>
>> Anyway, I still feel like topic-based is the way to go because it makes
>> the simple case very simple, it's familiar, and you *can* do content
>> filtering on top of it. It's also more compatible with the nature of HTTP
>> (single resource oriented operations), which is where I think PSHB should
>> live -- not so much in the content. If we were to do content-based
>> filtering, that would imply moving into the content-space, which is why I
>> understand your advocacy for Atom. I just don't think we should.
>>
>> -jeff
>>
>>
>> On Wed, Nov 30, 2011 at 3:15 PM, Bob Wyman <[email protected]> wrote:
>>
>>> To better demonstrate what I'm talking about (re: Topic-based v
>>> Content-based), I've put up a little demo. (Code speaks louder than
>>> words...) Check out: http://frackim.appspot.com/ . "FrackIM" allows you
>>> to "Follow" or "Track" messages which originate either from XMPP IM or from
>>> PubSubHubbub. Read the instructions <http://frackim.appspot.com/> and
>>> then add [email protected] to your buddy list.
>>>
>>> The software uses the AppEngine Prospective Search Service and delivers
>>> results using AppEngine's XMPP service. Topic-based behavior is implemented
>>> by subscribing to messages using a query that constrains the "follow"
>>> attribute of messages. This attribute contains either an XMPP JID or an
>>> HTTP URL (the URL is, of course, a PSHB "topic"). For example:
>>> /subscribe follow:huffingtonpost.com is like a topic-based subscription
>>> in PSHB today.
>>> To get "content-based" behavior, you create a subscription that
>>> constrains the "track" attribute of a message. Thus:
>>> /subscribe track:obama is content-based and will match any IM or PSHB
>>> message that contains the word "Obama".
>>> Of course, you can combine the two together like this:
>>> /subscribe follow:huffingtonpost.com AND track:obama which would return
>>> only messages published by HuffingtonPost that contain the word Obama.
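
The three subscriptions above can be sketched as a tiny matcher. This is only
an illustration under stated assumptions: it is not the AppEngine Prospective
Search query language, it supports AND only, it treats "follow" as a
case-insensitive substring test on the source and "track" as a whole-word
test on the text, and the message dictionaries are made up for the example.

```python
import re


def parse_query(query):
    """Split 'field:value AND field:value' into lowercase (field, value) pairs."""
    terms = []
    for part in re.split(r"\s+AND\s+", query.strip()):
        field, _, value = part.partition(":")
        terms.append((field.lower(), value.lower()))
    return terms


def matches(message, query):
    """Return True only if every term of the query matches the message."""
    for field, value in parse_query(query):
        if field == "follow":
            # Topic-like constraint: the source JID or URL must contain the value.
            if value not in message["follow"].lower():
                return False
        elif field == "track":
            # Content constraint: the value must appear as a word in the text.
            if value not in re.findall(r"\w+", message["track"].lower()):
                return False
        else:
            return False  # unknown operators never match in this sketch
    return True


huffpo = {"follow": "http://www.huffingtonpost.com/feeds/index.xml",
          "track": "Obama signs bill"}
other = {"follow": "http://example.com/feed", "track": "Obama speaks"}
```

With these two messages, "follow:huffingtonpost.com" selects only the first,
"track:obama" selects both, and the combined query selects only the first.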
>>>
>>> The point here is to demonstrate that a content-based system can
>>> implement topic-based as a degenerate, trivial case. (i.e. in this example,
>>> a "topic based" system would only support the "follow" attribute.) However,
>>> such a system is easily extended to handle more complex applications by
>>> simply allowing more fields to match against and by allowing a greater
>>> variety of query operators. In such a system, you never even consider
>>> building a "firehose" since you essentially start off with one to begin
>>> with.
>>>
>>> Give the toy a try and see what you think. Note: it is only subscribed
>>> to a small number of feeds -- mostly political content. So, subscriptions
>>> like "Obama" are more likely to work than geeky stuff like "prospective
>>> search." If you have some PSHB topics you'd like me to add, just send a
>>> note off-list.
>>>
>>> On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> wrote:
>>> > Btw, are you in the area (SF)? It would be interesting to discuss
>>> I work in NYC so it would be hard to meet up in SF any time soon. There
>>> is always email...
>>>
>>> bob wyman
>>>
>>>
>>> On Tue, Nov 29, 2011 at 7:42 PM, Jeff Lindsay <[email protected]> wrote:
>>>
>>>> Okay, I better understand your position and perspective on this. Btw,
>>>> are you in the area (SF)? It would be interesting to discuss topic vs
>>>> content based subscriptions in person because I have thought/worked with it
>>>> a lot, but not in those terms.
>>>>
>>>> -jeff
>>>>
>>>>
>>>> On Tue, Nov 29, 2011 at 2:06 PM, Bob Wyman <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Nov 28, 2011 at 8:45 PM, Jeff Lindsay <[email protected]> wrote:
>>>>>
>>>>>>> The idea was that the hub should publish Atom entries and only Atom
>>>>>>> entries. Of course, the entries would contain atom.source elements to
>>>>>>> show the feeds with which they were associated. Also, the hub should do
>>>>>>> de-duping to ensure that any particular entry isn't sent more than once.
>>>>>>>
>>>>>>
>>>>>> Yeah, I get the reasoning behind Atom and I understand its more
>>>>>> general use. The problem is in order to make something useful and easy to
>>>>>> adopt, you need to really facilitate what people are already doing and
>>>>>> are familiar with. Not everybody wants to work with Atom, despite all its
>>>>>> benefits. Having Atom as a representation or as a possible payload is
>>>>>> great, but depending on its semantics, forcing it to be required for PSHB
>>>>>> to be useful is not a great idea... or at least not a pragmatic one IMO.
>>>>>>
>>>>>>
>>>>>>> We could build all the above things very easily based on systems
>>>>>>> that publish Atom feeds and allow content-based (query-based)
>>>>>>> subscriptions.
>>>>>>>
>>>>>>
>>>>>> Call me crazy, but I'm in love with the Unix philosophy of doing one
>>>>>> thing well and designing for composition of more complex systems from
>>>>>> simple parts.
>>>>>>
>>>>> "Designing for composition of more complex systems from simple parts"
>>>>> is an excellent goal. The problem is that in order to facilitate
>>>>> composition, you must have some idea of what kinds of complex systems
>>>>> you're going to compose. Given the application domain under discussion
>>>>> (Publish/Subscribe even if some other name is used), the problem here is
>>>>> that we know from many long years of experience that it is difficult to
>>>>> build a content-based system on top of a topic-based system yet it is
>>>>> trivial to build a topic-based system on top of a content-based system. It
>>>>> is important where you start when designing systems. Things get
>>>>> path-dependent very quickly.
>>>>>
>>>>> The problem is that design decisions made to facilitate topic-based
>>>>> system construction tend to make harder the job of building content-based
>>>>> systems. Take, for example, the regular discussion of "firehoses" which
>>>>> are almost always a common subject of discussion with topic-based systems
>>>>> but are generally irrelevant when discussing content-based systems. A
>>>>> firehose, which adds complexity to the topic-based implementation, is
>>>>> almost always needed when people want to do any kind of content-based work
>>>>> on top of a topic-based system. (That can include either real-time
>>>>> filtering or dumping of data into a database for later "content-based"
>>>>> retrieval or searching.) A firehose is simply a mechanism to de-mux or
>>>>> merge together the many topic-based streams that were created in order to
>>>>> provide a topic-based subscription model. If you start with a topic-based
>>>>> system, you almost always need to construct firehoses in order to make
>>>>> content-based routing possible. On the other hand, if you start with a
>>>>> content-based system and have "topic" as an attribute of each published
>>>>> item, then it is trivial to create "topic" streams since they are simply
>>>>> single-attribute subscriptions keyed on the "topic" attribute.
>>>>>
>>>>> If you start with a content-based model but want topic-based, then
>>>>> instead of subscribing to topic "foobar" you assume that all published
>>>>> items have an attribute named "topic" and you subscribe to "topic =
>>>>> 'foobar'". A "topic-based" system is thus nothing more than the simplest
>>>>> use of a content-based system. Of course, the advantage of using a trivial
>>>>> content-based interface to emulate a topic-based system is that you can
>>>>> then easily expand the capability of the base system to support more
>>>>> complex filters or queries. You can go from just a single attribute and
>>>>> exact-match to allowing full Boolean expressions, etc. without making a
>>>>> significant change to the subscription interface -- the changes are only
>>>>> to the subscription query syntax and those changes can all produce proper
>>>>> supersets of the trivial syntax.
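
To make the "topic = 'foobar'" argument concrete, here is a minimal sketch of
a content-based hub in which a topic subscription is just a single-attribute
predicate. The ContentHub class, its method names and the item dictionaries
are hypothetical illustrations, not part of PSHB or of any existing hub.

```python
class ContentHub:
    """Toy in-memory hub: every subscription is a predicate over an item."""

    def __init__(self):
        self._subscriptions = []  # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        """Content-based subscription: deliver items the predicate accepts."""
        self._subscriptions.append((predicate, callback))

    def subscribe_topic(self, topic, callback):
        """Topic-based subscription as the trivial case: topic == value."""
        self.subscribe(lambda item: item.get("topic") == topic, callback)

    def publish(self, item):
        """Offer the item to every subscription; return the delivery count."""
        delivered = 0
        for predicate, callback in self._subscriptions:
            if predicate(item):
                callback(item)
                delivered += 1
        return delivered


hub = ContentHub()
topic_inbox, combined_inbox = [], []
hub.subscribe_topic("http://example.com/feed", topic_inbox.append)
# A richer filter uses the same interface; only the predicate changes.
hub.subscribe(
    lambda item: item.get("topic") == "http://example.com/feed"
    and "foobar" in item.get("content", ""),
    combined_inbox.append,
)
first = hub.publish({"topic": "http://example.com/feed", "content": "hello"})
second = hub.publish({"topic": "http://example.com/feed", "content": "foobar here"})
third = hub.publish({"topic": "http://other.example/feed", "content": "foobar"})
```

The topic subscriber and the combined subscriber go through the same
subscribe call, which is the sense in which the richer syntax is a superset of
the trivial one.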
>>>>>
>>>>> What I wonder is what, if any, benefit comes from baking "topic-based"
>>>>> into the subscription interface? Given that the alternative provides such
>>>>> flexibility down the road, what significant advantage do you get from
>>>>> limiting the system's expressiveness up-front?
>>>>>
>>>>>
>>>>>> Queries and filters, to me, are out of the scope of this protocol,
>>>>>> despite being very useful.
>>>>>>
>>>>> If you see my reasoning in the paragraphs above, you won't be
>>>>> surprised that I claim that in order to build a topic-based system, you
>>>>> already need to build "Queries and Filters."  The only difference is that
>>>>> if you build something like PSHB, you are building a very simple filter
>>>>> language that happens to be hard to extend. When people subscribe to
>>>>> topic "http://example.com/feed" it is EXACTLY the same, semantically, as
>>>>> subscribing using the query "topic = 'http://example.com/feed'"...
>>>>> There is no significant introduction of complexity that results from going
>>>>> from topic-based to content-based -- only a much easier path to doing more
>>>>> interesting things in the future. (i.e. "topic='http://example.com/feed'
>>>>> AND content='foobar'" is just a step away...)
>>>>>
>>>>>
>>>>>> The reason is that anybody can create a subscriber or relay (perhaps
>>>>>> even a hub) that happens to do that filtering in its implementation.
>>>>>>
>>>>> Yes, anyone can build yet another aggregator to either consume
>>>>> firehoses or construct them and then filter them. But, just because a
>>>>> thing can be done, doesn't mean that we should insist that it be done --
>>>>> unless there is a good reason not to allow alternatives. In this case, I
>>>>> can't see that there is one. Building the basic system using the model of
>>>>> a trivial content-based system doesn't make it any more difficult to build
>>>>> other hubs or relays that can do arbitrary processing; however, it gives
>>>>> us the option of allowing a single system, with a standard interface, to
>>>>> do both the simple and the complex work in an integrated and more
>>>>> efficient manner.
>>>>>
>>>>>>
>>>>>> That said, I'm assuming this was more just to defend Atom and
>>>>>> content-based subscriptions, to which I would say: those examples should
>>>>>> be possible *if* you use Atom as your content container and have access
>>>>>> to or can build a subscription querier node. But it should also be
>>>>>> possible if the content is *not* Atom using the same approach of putting
>>>>>> the filtering in an intermediate node (or potentially being an
>>>>>> implementation detail of a hub).
>>>>>>
>>>>>> I just think the core should be simple and neutral, allowing more
>>>>>> specialized extensions, additions, and combinability. And for that, my
>>>>>> experience (and general observations) suggest that we should focus on
>>>>>> content-type neutral HTTP-based mechanisms.
>>>>>>
>>>>>> -jeff
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> bob wyman
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 28, 2011 at 6:33 PM, Julien Genestoux <[email protected]> wrote:
>>>>>>>
>>>>>>>> Jeff, do you think you could help getting the folks at GitHub,
>>>>>>>> Twilio, FreshBooks, Pusher to come in here and participate? What would
>>>>>>>> they love to see in and out of PubSubHubbub so that it fits their needs?
>>>>>>>>
>>>>>>>> Bob, that's an interesting point. You said you wanted PSHB to be
>>>>>>>> about entries rather than feeds. I'm not sure I understand this. I
>>>>>>>> guess you would still need to subscribe to an endpoint that would emit a
>>>>>>>> collection of entries, right?
>>>>>>>>
>>>>>>>> Julien
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 29, 2011 at 12:16 AM, Bob Wyman <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> > PubSubHubbub is currently too
>>>>>>>>> > much oriented toward data feeds
>>>>>>>>> Personally, I think that PSHB "went wrong" when folk insisted that
>>>>>>>>> it support RSS instead of just Atom. In the Atom format we had gone to
>>>>>>>>> great trouble to ensure that "entry" was a top-level item and that
>>>>>>>>> entries had the same semantics whether they were inside feeds or on
>>>>>>>>> their own. (Not the case with RSS.) One of the reasons that I worked to
>>>>>>>>> make this the case was that I've been wanting to do pubsub with
>>>>>>>>> arbitrary content for many years... The idea was that an Atom entry is
>>>>>>>>> a reasonable wrapper or container for just about any content you might
>>>>>>>>> want to publish. (MIME types distinguish the content type.) Thus, a
>>>>>>>>> system for syndicating Atom entries could be used to reasonably
>>>>>>>>> syndicate just about anything. But, when support for RSS feeds came
>>>>>>>>> into the PSHB spec, all sorts of things got confused... PSHB should
>>>>>>>>> have been about the entries, not the feeds...
>>>>>>>>>
>>>>>>>>> bob wyman
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 28, 2011 at 5:31 PM, Julien <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Jeff, thanks for sharing so quickly :)
>>>>>>>>>> I perfectly agree and acknowledge that PubSubHubbub is currently too
>>>>>>>>>> much oriented toward data feeds, and content in general, while it's
>>>>>>>>>> just a sub-case.
>>>>>>>>>> I also think the "realtime" aspect of things doesn't matter that much,
>>>>>>>>>> and is just a consequence of the "push" design. When you trigger
>>>>>>>>>> events, there is no reason to do it later rather than sooner.
>>>>>>>>>>
>>>>>>>>>> The spec should evolve into something that works as well for events as
>>>>>>>>>> for content.
>>>>>>>>>> It should be "subscribe to a web resource, get events". [this can be
>>>>>>>>>> decorated in any way people want to work with feeds, with publisher/
>>>>>>>>>> hubs merged or distinct, with no data... etc.]
>>>>>>>>>>
>>>>>>>>>> Julien
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Nov 28, 11:21 pm, Jeff Lindsay <[email protected]> wrote:
>>>>>>>>>> > On Mon, Nov 28, 2011 at 2:02 PM, Julien Genestoux <[email protected]> wrote:
>>>>>>>>>> > > Jeff, please do share your feelings. Help us make PubSubHubbub better!
>>>>>>>>>> > > Bob, obviously PubSubHubbub should be less about blogging and/or news. I
>>>>>>>>>> > > started a thread about supporting any kind of arbitrary data, and this is
>>>>>>>>>> > > what I had in mind as a way to support any kind of content, and any type of
>>>>>>>>>> > > updates (with or without payload).
>>>>>>>>>> >
>>>>>>>>>> > To this point, my main feeling is that, yes, PSHB is focused too much on
>>>>>>>>>> > content. While I think this is useful (as it's been the primary use case),
>>>>>>>>>> > it's not a wide enough net to really have critical mass as a project. I
>>>>>>>>>> > originally thought it was good that it was very focused and didn't solve
>>>>>>>>>> > *my* particular problems. I also thought it was good it focused on a
>>>>>>>>>> > tangible goal of making feeds more realtime. However, I think time has
>>>>>>>>>> > shown it was not enough to be a big enough deal to sustain momentum as a
>>>>>>>>>> > project.
>>>>>>>>>> >
>>>>>>>>>> > The problem is that this general problem PSHB solves has many different
>>>>>>>>>> > views/perspectives/languages. For example, it can be message oriented and
>>>>>>>>>> > talk about pubsub. Or it can be event oriented and talk about events etc
>>>>>>>>>> > (the perspective used by Phil and them). Or it can even be thought of as
>>>>>>>>>> > callbacks or hooks (webhooks). There are other similar concepts with
>>>>>>>>>> > different language as well: updates/notifications, observers, etc. The two
>>>>>>>>>> > main ones seem to be events vs messages/pubsub, and I'm not sure which one
>>>>>>>>>> > is generally considered more general than the other. Ultimately, technically,
>>>>>>>>>> > they're more or less the same thing, but I think the framing makes a *big*
>>>>>>>>>> > difference.
>>>>>>>>>> >
>>>>>>>>>> > Anyway, that's the start of my ideas around this.
>>>>>>>>>> >
>>>>>>>>>> > -jeff
>>>>>>>>>> >
>>>>>>>>>> > > Julien
>>>>>>>>>> >
>>>>>>>>>> > > On Mon, Nov 28, 2011 at 9:33 PM, Bob Wyman <[email protected]> wrote:
>>>>>>>>>> >
>>>>>>>>>> > >> The site http://www.mostlybaked.com/ provides a number of quick sketches
>>>>>>>>>> > >> of applications that are things that I personally think should work well
>>>>>>>>>> > >> over PSHB if the focus of PSHB became less about blogging and more about
>>>>>>>>>> > >> the general case of publishing and subscribing to streams of data on the
>>>>>>>>>> > >> Internet. Also, Phil often talks about the kinds of things that he'd like
>>>>>>>>>> > >> to do with the EventedAPI on his blog. ex:
>>>>>>>>>> > >> http://www.windley.com/archives/2011/11/personal_event_networks_and_v...
>>>>>>>>>> >
>>>>>>>>>> > >> bob wyman
>>>>>>>>>> >
>>>>>>>>>> > >> On Mon, Nov 28, 2011 at 1:16 PM, Bob Wyman <[email protected]> wrote:
>>>>>>>>>> >
>>>>>>>>>> > >>> See: http://www.eventedapi.org/spec
>>>>>>>>>> >
>>>>>>>>>> > >>> As we consider what can be done to move PubSubHubbub forward, it might
>>>>>>>>>> > >>> make sense to take a look at some other protocols that folk have defined to
>>>>>>>>>> > >>> determine if there is anything in them that PubSubHubbub should
>>>>>>>>>> > >>> implement or if they do things better than PSHB does. The folk at Kynetx
>>>>>>>>>> > >>> (http://apps.kynetx.com/) have been building up a PSHB-like system for
>>>>>>>>>> > >>> some time now... I'm not sure I understand why PSHB wouldn't, in fact,
>>>>>>>>>> > >>> serve their needs.
>>>>>>>>>> >
>>>>>>>>>> > >>> bob wyman
>>>>>>>>>> >
>>>>>>>>>> > --
>>>>>>>>>> > Jeff Lindsay http://progrium.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jeff Lindsay
>>>>>> http://progrium.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Lindsay
>>>> http://progrium.com
>>>>
>>>
>>>
>>
>>
>> --
>> Jeff Lindsay
>> http://progrium.com
>>
>
>
