Re: [PubSub] collection node definition

Brian Cully Tue, 24 Nov 2009 08:39:17 -0800

On 24-Nov-2009, at 10:43, Andy Skelton wrote:
> The WordPress.com PubSub service runs on a customized Ejabberd
> installation. If we had wanted to create firehose (collection) nodes
> for certain types of nodes (blogs and their comments) and not had the
> ability to customize the code underlying nodes, we would have had to
> rely on some form of Collection Nodes and we would have been stuck
> with whatever support the software offered. Custom code might always
> be the best way to implement systems with complex information.


        *koff*

        Ejabberd supports collection nodes in 2.1. =)

        But yes, collection nodes don't solve every problem. For things that 
require Turing-complete solutions you're going to need a Turing-complete 
language to implement them, and collection nodes are no such beast. They do, 
however, solve many kinds of problems and have defined semantics, allowing one 
to move systems between servers trivially (assuming the servers all implement 
collection nodes, anyway). The right trade-off depends on the kind of system one 
needs to put together, but, in general, I'd rather use the defined and portable 
solution if at all possible.

> The main downside is that it would be a terrible task to port our
> system, which I hope never to do. We could switch to a strict
> collection node graph but custom logic would still be required to
> handle privacy. I have not seen how the switch could be worthwhile.

        If I'm understanding correctly, it is privacy that is the crux of the 
problem. Were it not for the constraint that items within a node have different 
privacy settings, this could be trivially implemented with collection nodes 
in a simple hierarchy. If this is the case, I think one could still go with 
collection nodes by letting the node graph be a DAG rather than a strict tree.

        We have a similar issue for our call center product. Let me describe 
the basic architecture and requirements:

        A user MUST be able to see all their own call information. A user MUST 
be able to grant access to a contact to see the user's call information. A 
contact MUST NOT be able to see call information from a user unless access has 
been granted by that user.

        Call information is published once and only once to leaf nodes in the 
form of /domain/user (mapping to u...@domain SIP addresses).

        So, we have a hierarchy as follows:

        Collection /$jid (one-to-many)-> Collection /$jid/$domain 
(many-to-many)-> Leaf /$domain/$user.

        The many-to-many relationship is the key to the authorization system. 
As users authorize contacts to view their information, the user's leaf node is 
linked under the contact's /$jid/$domain collection node. This is /not/ a 
strict tree, but it /is/ a DAG. Technically, we don't need the /$jid/$domain 
node (we could link straight to /$jid), but since we're providing an API with 
our product we thought it made sense to allow subscription at any one of the 
three levels for completeness and because it was trivial for us to implement. 
Since the leaf looks like `/kublai.com/bjc' we wanted you to be able to 
subscribe to `/' or `/kublai.com', mimicking traditional path hierarchies 
(subscriptions to `/' and `/kublai.com' actually issue redirects to the 
$jid-prefixed nodes).
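
        To make the linking concrete, here is a minimal Python sketch of the 
graph described above. It is only an illustration, not our actual 
implementation: the class name `CallGraph`, its methods, and the example JIDs 
are all hypothetical, and it models only which leaves are reachable from a 
collection node, not delivery or redirects.

```python
from collections import defaultdict


class CallGraph:
    """Toy model of the /$jid -> /$jid/$domain -> /$domain/$user DAG.

    All names here are hypothetical; this only tracks reachability.
    """

    def __init__(self):
        # collection node name -> set of child node names
        self.children = defaultdict(set)
        # names of leaf nodes that call information is published to
        self.leaf_nodes = set()

    def add_user(self, jid, domain, user):
        """Create the user's leaf and link it under the user's own collections."""
        leaf = f"/{domain}/{user}"
        self.leaf_nodes.add(leaf)
        self._link(jid, domain, leaf)
        return leaf

    def grant(self, contact_jid, domain, user):
        """The owner of /domain/user authorizes a contact to see their calls."""
        self._link(contact_jid, domain, f"/{domain}/{user}")

    def _link(self, jid, domain, leaf):
        # One-to-many: /$jid -> /$jid/$domain
        self.children[f"/{jid}"].add(f"/{jid}/{domain}")
        # Many-to-many: a leaf may sit under several /$jid/$domain nodes
        self.children[f"/{jid}/{domain}"].add(leaf)

    def visible_leaves(self, node):
        """Return every leaf reachable from the given node."""
        if node in self.leaf_nodes:
            return {node}
        found = set()
        for child in self.children.get(node, ()):
            found |= self.visible_leaves(child)
        return found
```

        In this sketch a contact's /$jid collection reaches another user's 
leaf only after `grant` has been called for that contact, which is the MUST 
NOT requirement above expressed purely as graph structure.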

        I believe an architecture like this can be used in a wide variety of 
circumstances to allow fine-grained authorization in even complex permission 
systems. The only constraint is that items with different authorization levels 
must be published to nodes which share that authorization level, since the node 
is the unit of transmission.
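
        As a rough illustration of that constraint, the Python sketch below 
splits a stream of items into one node per authorization level before 
publishing. The function name `route_items`, the level labels, and the 
`<base>/<level>` node naming are assumptions made up for this example; they 
are not part of our product or of any PubSub specification.

```python
from collections import defaultdict


def route_items(base_node, items):
    """Group (level, payload) pairs into one node per authorization level.

    Because a subscriber receives everything published to a node, items
    that need different access rules must not share a node.
    The "<base_node>/<level>" naming is a hypothetical convention.
    """
    nodes = defaultdict(list)
    for level, payload in items:
        nodes[f"{base_node}/{level}"].append(payload)
    return dict(nodes)
```

        Each resulting node can then be linked into collection nodes 
independently, so access is granted per level rather than per item.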

-bjc
