Re: [PubSub] collection node definition

Ralph Meijer Tue, 24 Nov 2009 01:45:11 -0800

On Sat, 2009-11-21 at 10:08 -0500, Robin Collier wrote:
> [..]
> 
> > I assume you mean generic vs. custom where you talk about open vs.
> > closed, because we are talking about a protocol here. Even though most
> > of our specifications go into detail about possible business rules in
> > implementations, the focus is still about the relation between inputs
> > and outputs. This is especially true for publish-subscribe.
> > 
> Actually, I was referring to an open vs. closed system deployment. [..]
>
> Personally, I see collections as a simple organizational tool for nodes
> and I don't think they need to be any more than that.  My own messaging
> background is in the enterprise using JMS, and that spec also allows for 
> such a grouping (hierarchical only), although it is considered optional.
> (Sidenote: This optional part kind of stinks though since it means you cannot
> make your code vendor neutral since there is no discovery mechanism for such
> functionality.)


I think we mean the same thing here. A generic pubsub service is one
that does not have any application specific logic behind it.

While I understand that one could define how to create node hierarchies
(the current version of XEP-0248 does that), and there is prior art, I
am questioning the actual usefulness of it.

Again, real-world use cases would help here. Without such justification,
collections, as currently defined, are a waste of time. I find them too
complex, even though they are theoretically feasible.

> > [..]
> > 
> > Collections as I see them are really just abstractions of content-based
> > pubsub systems (hi Bob Wyman!), where you basically assign a fixed name
> > (node identifier) to a particular query into the notification plasm. I
> > am still interested in explicitly defining the minimally subscribe-able
> > unit (like a blog post), so I want to pass along a specific node from
> > where a notification originates, though. 
> 
> That is an interesting concept, correct me if I am wrong, but this sounds
> an awful lot like a view in a relational database.  I am not sure if I would 
> consider this to be a collection though, it seems to me like another concept
> which would be better called an aggregation node.  I guess I would
> distinguish them by defining a collection node as a collection of nodes,
> whereas an aggregation node is a collection of items from multiple nodes.

I'm ok with giving it another name. In the end, though, the only
difference is in how publish actions on a particular leaf node cause a
notification on the aggregate node. The notifications could be mostly
identical.

For Collection Nodes, as currently defined in XEP-0248, you have a
static configuration of the associations, where the whole system is a
DAG. Both configuring this and implementing the mechanics of pushing
notifications out to the correct subscribers are painful.
Authorization is ill-defined. There are probably other issues. As said
above, if there is no actual use for this, I would move to do away with
these aspects of the specification.
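
To give a feel for the mechanics, here is a rough Python sketch of
working out who gets notified for a publish on a leaf node. The
`parents` and `subscriptions` mappings and both function names are
made up for illustration; they are not the XEP-0248 data model, and
the sketch ignores authorization and any subscription options.

```python
from collections import deque


def nodes_to_notify(parents, leaf):
    """Return the leaf plus every collection reachable via parent links.

    `parents` is a hypothetical mapping: node id -> list of parent
    collection ids. A node may have several parents, hence a DAG.
    """
    seen = {leaf}
    queue = deque([leaf])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:  # two paths may reach the same ancestor
                seen.add(parent)
                queue.append(parent)
    return seen


def subscriptions_for_publish(parents, subscriptions, leaf):
    """Map each subscriber to every (node, subid) that matches a publish.

    `subscriptions` is a hypothetical mapping:
    node id -> list of (subscriber JID, subscription id).
    """
    targets = {}
    for node in nodes_to_notify(parents, leaf):
        for jid, subid in subscriptions.get(node, ()):
            targets.setdefault(jid, []).append((node, subid))
    return targets
```

Note that one subscriber can end up with several matching
subscriptions for a single publish, so the service has to decide
whether to send one notification or several.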

Other aspects are worth keeping for Aggregate Nodes. E.g. the concept of
subscribing to the appearance of new nodes.

But even if we wanted to keep Collection Nodes, and invent a sibling
concept for Aggregate Nodes, there are some issues to be resolved for
both.

First of all, finding out which subscription(s) caused a notification is
ambiguous, especially when the implementation supports subscription
identifiers. I think we should include subscription information in a
different way than with the current SubID and Collection SHIM headers. A
possible solution is to use XMPP URIs to hold this information:

  <header name="Subscription">xmpp:pubsub.example.org?;node=mynode;subid=1234</header>

The advantage of this is that you have exactly one such header per
subscription, and you can specify both the node and the subscription
identifier in one value. Other suggestions are welcome.
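
On the receiving side, pulling such a value apart would be simple. A
minimal Python sketch, assuming the URI shape shown above with an
empty query type followed by ';'-separated key=value pairs; the
function name is hypothetical and this is not a general XMPP URI
parser:

```python
from urllib.parse import unquote


def parse_subscription_header(value):
    """Split a Subscription header value into (service, node, subid)."""
    scheme, _, rest = value.partition(":")
    if scheme != "xmpp":
        raise ValueError("not an XMPP URI: %r" % (value,))
    service, _, query = rest.partition("?")
    params = {}
    # The text before the first ';' is the query type (empty in the
    # example above), so skip it and read the key=value pairs.
    for part in query.split(";")[1:]:
        key, _, val = part.partition("=")
        if key:
            params[key] = unquote(val)
    # Missing keys come back as None, e.g. when no subid is in use.
    return service, params.get("node"), params.get("subid")
```

With one header per subscription, a client can call this for each
header and look up the matching local subscription by the node and
subid pair.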

Second, there is currently no way to retrieve items from a collection
node. As I mentioned before, this would need modification to the result
structure of the <items/> request. It cannot currently hold items for
different nodes. I suggested returning an empty result and then sending,
as messages, delayed notifications that match the items request. This
prevents very large stanzas and avoids the issue with multiple origin
nodes. We probably need some way to distinguish this from having no
results at all.
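
As a rough illustration of that suggestion, here is a Python sketch
of a handler for an items request on a collection. The mappings, the
function name and the dict layout of a notification are all
hypothetical, and only direct children are considered:

```python
def handle_items_request(collection, children, items_by_node):
    """Return (result_items, delayed_notifications) for a collection.

    `children` maps a collection id to its child node ids, and
    `items_by_node` maps a node id to its stored items.
    """
    # The IQ result itself stays empty, as it cannot hold items for
    # different nodes.
    result_items = []
    # Each stored item goes out separately, tagged with the node it
    # originated from and marked as a delayed notification.
    notifications = [
        {"node": leaf, "item": item, "delayed": True}
        for leaf in children.get(collection, ())
        for item in items_by_node.get(leaf, ())
    ]
    return result_items, notifications
```

A collection without stored items yields an empty result and no
notifications here, which is exactly the case that cannot be told
apart from "items will follow" without an extra signal.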

ralphm
