Re: [PubSub] collection node definition

Ralph Meijer Wed, 11 Nov 2009 10:47:26 -0800

On Thu, 2009-09-17 at 00:10 +0200, Fabio Forno wrote:
> 2009/9/16 Peter Saint-Andre <[email protected]>:
> >>     Agreed. The only reason I champion collection nodes as I do is that
> >> I don't know what "node as code" means. It might be better, but in the
> >> mean time the only solution I have is collection nodes.
> >
> > Ralph? ;-)
> >
> 
> I try to guess ;) Though I don't see a direct connection with
> collection nodes, this is a concept we are exploring thanks to Ralph's
> implementation in wokkel which allows implementing custom logic behind
> a node. At present we all think of pubsub nodes as simple dispatchers
> that follow rules defined by node configuration and affiliations, i.e.
> we publish an event and, by applying those rules, the event is
> delivered as it is to a set of subscribers, always to the same ones
> for any event. Things become more interesting when we can customize
> the behavior of a node by writing our own code to process a publish or
> delete of an item, so that we can transform the item itself before
> delivering it, aggregate the events, implement our delivery policy
> (e.g. content based delivery, or things like delivering to just the
> highest priority online subscriber). Perhaps the possibility of doing
> content based delivery is something which is similar to collection
> nodes, but this would require specifying some selection options in the
> subscription.


Yeah, mostly this.

The connection with collection nodes is that they both allow for
subscribing to a node that will cause the subscriber to receive
notifications from events that can also be subscribed to more
specifically. Let's make that more concrete:

Say I model my blog posts as individual nodes (e.g. 'blog:fosdem_2010').
You can subscribe to each of those separately, getting updates whenever
a post changes. Assume the payload is an Atom Entry Document. This in
itself is nice, and has use cases like keeping remote copies in sync,
e.g. to show summaries.

A problem with the above is that you have to know about the existence of
a particular post. You won't be notified about new ones. What you would
like is to subscribe to a node that will either yield notifications of
the fact that a new node (blog post) exists, or direct notifications of
the blog post itself.

In come collections. A collection node (as it is currently defined)
allows nodes to be associated with it. E.g. the node 'blog:fosdem_2009'
would be associated with the collection node 'blog'.

A subscriber can choose between two models of subscription: 'nodes' or
'items'. The former will send notifications about changed associations
(new blog posts), while the latter will simply make notifications from
the associated nodes also go to the subscriber of the collection. In that
case, the notifications carry a 'Collection' SHIM header that holds
the actual node subscribed to.
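To make the two subscription types concrete, here is a minimal in-memory model of a collection node in Python. It is a sketch under stated assumptions, not XEP-0248 itself: the names `Collection`, `Notification`, `associate` and `publish` are hypothetical, subscribers are plain JID strings, and a name-to-value dict stands in for the SHIM headers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Notification:
    event: str    # "associate" (new child node) or "item" (published item)
    node: str     # node the event is about
    payload: str  # new child node ID, or published item ID
    headers: Dict[str, str] = field(default_factory=dict)  # stand-in for SHIM headers


@dataclass
class Collection:
    node: str
    subscribers: Dict[str, str]  # subscriber JID -> "nodes" or "items" (hypothetical model)
    children: Set[str] = field(default_factory=set)

    def associate(self, child: str) -> List[Tuple[str, Notification]]:
        """Associate a leaf node; only 'nodes' subscribers are told about it."""
        self.children.add(child)
        return [(jid, Notification("associate", self.node, child))
                for jid, kind in self.subscribers.items() if kind == "nodes"]

    def publish(self, child: str, item_id: str) -> List[Tuple[str, Notification]]:
        """Forward an item from an associated node to 'items' subscribers,
        tagged with a Collection header naming the collection subscribed to."""
        if child not in self.children:
            return []
        return [(jid, Notification("item", child, item_id, {"Collection": self.node}))
                for jid, kind in self.subscribers.items() if kind == "items"]


blog = Collection("blog", {"alice@example.org": "nodes", "bob@example.org": "items"})
blog.associate("blog:fosdem_2010")               # alice hears about the new post
blog.publish("blog:fosdem_2010", "entry-1")      # bob gets the item itself
```

Note that the association has to be made explicitly before 'items' subscribers see anything from the new node.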

Another solution is 'code-as-node'. Basically the system will magically
also send out notifications to subscribers of 'blog' whenever
'blog:fosdem_2010' gets updated, or when 'blog:oscon_2010' appears. As
it is currently done (at least by me), 'blog' looks like a leaf node,
just like the others.

The code-as-node model has several advantages over collections as they
are defined now. It allows for more dynamic associations, or even
content-based subscriptions (prospective search, like Collecta). You
don't need to make the associations explicit, because the logic is in
the system.
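A rough Python sketch of a code-as-node service, for comparison. `Service`, `add_virtual_node` and the predicate signature are hypothetical names for illustration, and addressing each delivery as the original node is an assumption of this sketch. A prefix predicate gives the 'blog' behaviour; a predicate over the payload gives a content-based subscription.

```python
from typing import Callable, Dict, List, Tuple

# (source node, payload) -> should this event also go to the virtual node?
Predicate = Callable[[str, str], bool]


class Service:
    def __init__(self) -> None:
        self.subscribers: Dict[str, List[str]] = {}  # node ID -> subscriber JIDs
        self.virtual: Dict[str, Predicate] = {}      # code-as-node node ID -> logic

    def subscribe(self, node: str, jid: str) -> None:
        self.subscribers.setdefault(node, []).append(jid)

    def add_virtual_node(self, node: str, matches: Predicate) -> None:
        # To subscribers this looks like any leaf node; no associations are
        # stored, the predicate itself decides what belongs to it.
        self.virtual[node] = matches

    def publish(self, node: str, payload: str) -> List[Tuple[str, str, str]]:
        """Return (recipient JID, node, payload) for every delivery.

        Deliveries are addressed as the original node (an assumption here).
        """
        recipients = list(self.subscribers.get(node, []))
        for vnode, matches in self.virtual.items():
            if matches(node, payload):
                recipients.extend(self.subscribers.get(vnode, []))
        return [(jid, node, payload) for jid in recipients]


service = Service()
service.add_virtual_node("blog", lambda n, p: n.startswith("blog:"))
service.add_virtual_node("search:fosdem", lambda n, p: "FOSDEM" in p)
service.subscribe("blog", "alice@example.org")
service.subscribe("search:fosdem", "bob@example.org")
deliveries = service.publish("blog:oscon_2010", "<entry>OSCON recap</entry>")
# alice matches on the node prefix; bob does not, the payload lacks "FOSDEM"
```

If alice were also subscribed to 'blog:oscon_2010' directly, this sketch would deliver the same event to her twice.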

On the other hand, a subscriber to both 'blog' and 'blog:fosdem_2010'
has no way to detect duplicate notifications other than looking at the
payload, or maybe through service-wide unique item identifiers.
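Both options for spotting duplicates can be sketched on the subscriber side. This assumes the service guarantees item identifiers that are unique across all nodes, which is a hypothetical guarantee here, not something the spec promises; without one, the sketch falls back to hashing the payload, which only catches byte-identical payloads. `Deduplicator` is an illustrative name.

```python
import hashlib
from typing import Optional, Set, Tuple


class Deduplicator:
    """Drop notifications that were already seen (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def accept(self, payload: str, item_id: Optional[str] = None) -> bool:
        # Prefer an (assumed) service-wide unique item ID; otherwise
        # fall back to a hash of the payload itself.
        if item_id is not None:
            key = ("id", item_id)
        else:
            key = ("payload", hashlib.sha256(payload.encode("utf-8")).hexdigest())
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

The seen-set grows without bound here; a real client would want to expire old keys.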

Having some implementation experience with both, I am thinking we should
try to define collections more loosely in XEP-0248, allowing for making
'code-as-node' type nodes that act as a collection. I.e. notifications
would be sent out as the 'original' node, but include information on the
subscriptions that caused the notifications to be sent to the recipient.
Currently, I believe a combination of 'Collection' and 'SubID' headers
is ambiguous in some cases. It would be nice if we could simply send
along combinations of subscribed-to-node and SubID. Maybe in a new SHIM
header.
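To illustrate what such a header might look like, here is a Python sketch using `xml.etree.ElementTree` that builds and parses a notification sent as the original node, with one header per (subscribed-to node, SubID) pair. The header name `Subscription` and the `"<SubID> <NodeID>"` value encoding are hypothetical; no such header is defined anywhere yet.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

EVENT_NS = "http://jabber.org/protocol/pubsub#event"
SHIM_NS = "http://jabber.org/protocol/shim"


def build_notification(to: str, node: str, item_id: str,
                       matches: List[Tuple[str, str]]) -> ET.Element:
    """Notification sent as the original node, carrying every
    (subscribed-to node, SubID) pair that caused this delivery."""
    msg = ET.Element("message", {"to": to})
    event = ET.SubElement(msg, f"{{{EVENT_NS}}}event")
    items = ET.SubElement(event, f"{{{EVENT_NS}}}items", {"node": node})
    ET.SubElement(items, f"{{{EVENT_NS}}}item", {"id": item_id})
    headers = ET.SubElement(msg, f"{{{SHIM_NS}}}headers")
    for subscribed_node, subid in matches:
        # HYPOTHETICAL header name and value encoding.
        header = ET.SubElement(headers, f"{{{SHIM_NS}}}header", {"name": "Subscription"})
        header.text = f"{subid} {subscribed_node}"
    return msg


def parse_matches(msg: ET.Element) -> List[Tuple[str, str]]:
    """Recover the (subscribed-to node, SubID) pairs from the headers."""
    pairs = []
    for header in msg.iter(f"{{{SHIM_NS}}}header"):
        if header.get("name") == "Subscription" and header.text:
            # Split on the first space only, so node IDs may contain spaces.
            subid, subscribed_node = header.text.split(" ", 1)
            pairs.append((subscribed_node, subid))
    return pairs
```

Because each pair travels together, a recipient with several subscriptions can tell exactly which ones matched, which separate 'Collection' and 'SubID' headers cannot always express.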

Also, pubsub is often useless without a way to retrieve previously
published items. For this, items requests would need to be allowed for
collection nodes. An implementation can decide for itself how to
actually implement this, but caching the last n items sent out for a
particular node comes to mind. In more complicated use cases, this could
be done per subscription. Think inboxes with a sliding window.
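One way to do that caching, sketched in Python with a bounded `collections.deque` per key. The `ItemCache` name, the value of n and the choice of key are assumptions for illustration: keying on the node ID gives a per-node cache, keying on a subscription ID gives the inbox-with-sliding-window behaviour.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class ItemCache:
    """Keep the last n items sent out, per key (node ID or SubID)."""

    def __init__(self, n: int) -> None:
        # Each key gets its own deque; appending past n drops the oldest.
        self._items: Dict[str, Deque[Tuple[str, str]]] = defaultdict(
            lambda: deque(maxlen=n))

    def record(self, key: str, origin_node: str, item_id: str) -> None:
        self._items[key].append((origin_node, item_id))

    def latest(self, key: str) -> List[Tuple[str, str]]:
        # Oldest first; an unknown key yields an empty list.
        return list(self._items.get(key, ()))


cache = ItemCache(2)
cache.record("blog", "blog:fosdem_2010", "1")
cache.record("blog", "blog:oscon_2010", "2")
cache.record("blog", "blog:fosdem_2010", "3")   # evicts item "1"
```

Recording the origin node alongside each item ID matters for collections, since the cached items come from different leaf nodes.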

As noted by Brian Cully back in June [1], we would need to be able to
represent items from different nodes in one response. I could also
imagine having an empty result for such a query, triggering the sending
of notifications for the matching items asynchronously. This also
prevents very large stanzas.
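A sketch of both response strategies, again hypothetical: `answer_collection_items` and the `max_inline` threshold are invented for illustration. Small results are grouped by the node each item was originally published to; larger ones produce an empty result, and the items are handed back to be sent as ordinary notifications afterwards.

```python
from typing import Dict, List, Tuple


def answer_collection_items(
    cached: List[Tuple[str, str]], max_inline: int
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Answer an items request on a collection node.

    cached holds (origin node, item ID) pairs. Returns (inline_result,
    deferred): either item IDs grouped by origin node, or an empty result
    plus every item, to be pushed out as notifications asynchronously.
    """
    if len(cached) > max_inline:
        # Too big for one stanza: reply empty, notify later.
        return {}, list(cached)
    grouped: Dict[str, List[str]] = {}
    for origin_node, item_id in cached:
        grouped.setdefault(origin_node, []).append(item_id)
    return grouped, []
```

A fixed item count is only a crude proxy for stanza size; a service could just as well decide on the serialized size of the payloads.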

Some of the other concerns I noted earlier in this thread still need to
be looked into.

ralphm

[1] http://mail.jabber.org/pipermail/pubsub/2009-June/000227.html
