Re: [PubSub] collection node definition

Brian Cully Tue, 24 Nov 2009 07:36:55 -0800

On 16-Nov-2009, at 12:01, Ralph Meijer wrote:
> So yes, code-as-node is custom by nature. The assumption is that for
> systems that would benefit most from the concept of collections, having
> static configurations for parent-child node association is cumbersome at
> best. For most applications, it doesn't really matter what the precise
> associations of nodes are. The application just wants to subscribe to a
> particular set of updates.


        It is completely true that applications don't need to care about the 
DAG except, potentially, in esoteric situations. However, for a system 
architect, the DAG is the key to the implementation. The reason I like it is 
that it is not custom. Anyone who has put together a system via collection 
nodes can trivially port that system to any XMPP server that also supports 
collection nodes.

        From my understanding so far, `code-as-node' requires, at a minimum, 
some kind of API, which in turn depends on whatever languages a given server 
happens to support, and those are undefined. Porting such a system between 
server implementations becomes a monumentally more difficult task. The only way 
around it that I see is to define a `code-as-node' language which servers must 
implement. I firmly believe that such a language is not, and should not be, in 
the scope of XMPP. On top of my aesthetic argument, if such a language were 
defined it would almost certainly be orders of magnitude more difficult to 
implement in the server than collection nodes, which are merely DAGs.

> Continuing with the example of modeling blog posts as leaf nodes and the
> whole blog as a collection, you would need to reconfigure either the
> parent or child node to associate the new blog post. If you do in-band
> publishing (i.e. you use a generic pubsub service and actively post to
> it from whatever blogging backend you have), you can also include the
> parent node when creating the new node for the blog post.

        Including parent node information in the publish notification 
side-steps both `code-as-node' and collection nodes, IMHO, since the publisher 
specifies the relationship, not the node owners. Were one to go this route, 
I would think that content-based subscriptions would fit the bill better than 
either of the other solutions.
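
        To make the content-based alternative concrete, here is a minimal 
sketch, assuming each subscriber registers a predicate over the published 
item and the publisher supplies a "parent" field. The field names, JIDs, and 
the matching_subscribers helper are all hypothetical; none of this is defined 
by XEP-0060 or by any particular server.

```python
def matching_subscribers(item, subscriptions):
    """Return the subscribers whose predicate accepts the published item."""
    return {jid for jid, predicate in subscriptions.items() if predicate(item)}


# Hypothetical subscriptions: each one is a query over item content,
# not an association stored on a node.
subscriptions = {
    "alice@example.com": lambda item: item.get("parent") == "blog",
    "bob@example.com": lambda item: "xmpp" in item.get("tags", ()),
}

# Hypothetical published item; the publisher, not the node owner,
# states which parent it belongs to.
item = {"node": "post-1", "parent": "blog", "tags": ["pubsub"]}

notified = matching_subscribers(item, subscriptions)
```

        Note that no parent-child association is stored anywhere in this 
sketch: the relationship exists only in what the publisher sends and in what 
each subscriber asks for.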

        As an aside, I kind of want to kill pubsub#children from collection 
node configuration, as it makes implementation harder for negligible benefit 
(maybe someone has a compelling use case that I'm missing).

> On top of that, we don't need to remember
> the associations, as this can be calculated at run time.

        I don't regard this as a particularly compelling trade-off. Memory is 
cheaper than CPU cycles unless you have a large number of nodes with a low 
number of publish events.

> After talking to a bunch of people about modeling their application, I
> have the idea that a generic solution for collections is not practical
> for anything but toy projects. I'd like to be proven wrong at this.
> Please come up with useful, real-world examples where static node
> configuration is required and feasible to implement.

        We use collection nodes for our real-time call information dashboard 
(https://my.onsip.com/). It works exceedingly well and imposes very little 
run-time overhead while saving an enormous amount of memory (due to lack of 
data duplication).

> In any case, while I'd like to support the smaller use-case, I think
> that implementing the whole scheme of recording associations and
> traversing DAGs at publish time, along with checking authorization and
> more is not worth the trouble. My suggested alternative, though, was a
> breeze to implement. So I have to disagree here.

        Traversing a DAG is a simple and fairly light-weight operation. DAGs 
can be hard to serialize into SQL, but otherwise there's not much difficulty 
with them. The authorization issue is troublesome, but, IMHO, deserves nothing 
more than a paragraph or two in the `Security Considerations' section of the 
XEP. If the system designer makes sure authorization is set up appropriately 
(again, I don't consider this to be too difficult) then there isn't much of an 
issue.
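
        As an illustration of how little is involved, here is a minimal 
sketch of the publish-time traversal, assuming the associations are kept as a 
plain mapping from each node to its parent collections. The node names, JIDs, 
and both helper functions are hypothetical, and authorization checks are left 
out entirely.

```python
from collections import deque


def ancestors(node, parents):
    """Return the node plus every collection reachable via parent links."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            # A DAG may reach the same collection by several paths;
            # the seen set ensures each one is visited only once.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def recipients(node, parents, subscribers):
    """Collect subscribers of the node and of all its ancestor collections."""
    result = set()
    for name in ancestors(node, parents):
        result |= subscribers.get(name, set())
    return result


# Hypothetical associations: two leaf nodes under one collection, which
# itself has two parents that share a common root.
parents = {
    "post-1": ["blog"],
    "post-2": ["blog"],
    "blog": ["all-blogs", "featured"],
    "all-blogs": ["root"],
    "featured": ["root"],
}
subscribers = {
    "post-1": {"alice@example.com"},
    "blog": {"bob@example.com"},
    "root": {"carol@example.com"},
}

notified = recipients("post-1", parents, subscribers)
```

        Each node is visited at most once, so the cost of a publish is 
bounded by the number of ancestors and parent links of the publishing node, 
not by the size of the whole graph.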

> Collections as I see them are really just abstractions of content-based
> pubsub systems (hi Bob Wyman!), where you basically assign a fixed name
> (node identifier) to a particular query into the notification plasm. I
> am still interested in explicitly defining the minimally subscribe-able
> unit (like a blog post), so I want to to pass along a specific node from
> where a notification originates, though.

        The distinction is who defines the relationship: owner, publisher, or 
subscriber? I think there are valid cases for all three. The last two can be 
defined in terms of content-based systems (as I understand them, anyway), 
whereas the first is defined by the owner at creation time and can effectively 
be forced on any user of the system (the top-level collection is open, children 
are closed).

-bjc
