On 24-Nov-2009, at 10:43, Andy Skelton wrote:
> The WordPress.com PubSub service runs on a customized Ejabberd
> installation. If we had wanted to create firehose (collection) nodes
> for certain types of nodes (blogs and their comments) and not had the
> ability to customize the code underlying nodes, we would have had to
> rely on some form of Collection Nodes and we would have been stuck
> with whatever support the software offered. Custom code might always
> be the best way to implement systems with complex information.
*koff*
Ejabberd supports collection nodes in 2.1. =)
But yes, collection nodes don't solve every problem. For things that
require Turing-complete solutions you're going to need a Turing-complete
language to implement them, and collection nodes are no such beast. They do,
however, solve many kinds of problems and have defined semantics, allowing one
to move systems between servers trivially (assuming the servers all implement
collection nodes anyway). The right trade-off depends on the kind of system one
needs to put together, but, in general, I'd rather use the defined and portable
solution if at all possible.
> The main downside is that it would be a terrible task to port our
> system, which I hope never to do. We could switch to a strict
> collection node graph but custom logic would still be required to
> handle privacy. I have not seen how the switch could be worthwhile.
If I'm understanding correctly, privacy is the crux of the problem. Were
it not for the constraint that items within a node have different privacy
settings, this could be trivially implemented with collection nodes in a
simple hierarchy. If that is the case, I think one could still go with
collection nodes by using the DAG properly.
We have a similar issue for our call center product. Let me describe
the basic architecture and requirements:
A user MUST be able to see all their own call information. A user MUST
be able to grant access to a contact to see the user's call information. A
contact MUST NOT be able to see call information from a user unless access has
been granted by that user.
Call information is published once and only once to leaf nodes in the
form of /domain/user (mapping to u...@domain SIP addresses).
So, we have a hierarchy as follows:
Collection /$jid (one-to-many)-> Collection /$jid/$domain
(many-to-many)-> Leaf /$domain/$user.
The many-to-many relationship is the key to the authorization system.
As users authorize contacts to view their information, the user's leaf is
linked into the contact's /$jid/$domain collection node. This is /not/ a strict tree, but it
/is/ a DAG. Technically, we don't need the /$jid/$domain node (we could link
straight to /$jid), but since we're providing an API with our product we
thought it made sense to allow subscription at any one of the three levels for
completeness and because it was trivial for us to implement. Since the leaf
looks like `/kublai.com/bjc' we wanted you to be able to subscribe to `/' or
`/kublai.com', mimicking traditional path hierarchies (subscriptions to those
two are actually redirected to the $jid-prefixed nodes).
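To make the linking concrete, here is a rough in-memory sketch of the graph described above. It is not how our server or any XMPP library implements it; the `PubSubDAG` class, the `grant` helper, and the example.com JIDs are all made up for illustration. It only shows that a grant is one extra edge in the DAG and that delivery follows the edges upward from the leaf.

```python
from collections import defaultdict


class PubSubDAG:
    """Toy model of collection and leaf nodes (hypothetical, not an XMPP API)."""

    def __init__(self):
        # child node -> set of parent collections (a leaf may have many parents)
        self.parents = defaultdict(set)
        # node -> set of subscriber JIDs
        self.subscribers = defaultdict(set)

    def link(self, parent, child):
        self.parents[child].add(parent)

    def unlink(self, parent, child):
        self.parents[child].discard(parent)

    def subscribe(self, jid, node):
        self.subscribers[node].add(jid)

    def recipients(self, leaf):
        """Every JID subscribed to the leaf or to any ancestor collection."""
        seen, stack, result = set(), [leaf], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            result |= self.subscribers[node]
            stack.extend(self.parents[node])
        return result


def grant(dag, viewer_jid, user, domain):
    """Let viewer_jid see call information published to /domain/user."""
    domain_node = f"/{viewer_jid}/{domain}"
    dag.link(f"/{viewer_jid}", domain_node)       # one-to-many
    dag.link(domain_node, f"/{domain}/{user}")    # many-to-many


dag = PubSubDAG()

# Each user subscribes to their own /$jid collection and is linked to
# their own leaf, so they always see their own call information.
for user in ("alice", "bob"):
    jid = f"{user}@example.com"
    dag.subscribe(jid, f"/{jid}")
    grant(dag, jid, user, "example.com")

# alice authorizes bob: her leaf gains a second parent under bob's tree.
grant(dag, "bob@example.com", "alice", "example.com")

print(sorted(dag.recipients("/example.com/alice")))
print(sorted(dag.recipients("/example.com/bob")))
```

Revoking access in this sketch is just `dag.unlink("/bob@example.com/example.com", "/example.com/alice")`; nothing about the published items or the leaf itself has to change.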
I believe an architecture like this can be used in a wide variety of
circumstances to allow fine-grained authorization in even complex permission
systems. The only constraint is that items with different authorization levels
must be published to nodes which share that authorization level, since the node
is the unit of transmission.
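As a small illustration of that constraint, the sketch below routes each item to a leaf chosen by its authorization level, so that no single node ever mixes levels. The node naming (`<base>/<level>`), the `publish` helper, and the "public"/"private" levels are assumptions for the example, not something our product or WordPress.com actually uses.

```python
def node_for(base, level):
    """Pick the leaf node for an item; one leaf per authorization level."""
    return f"{base}/{level}"


def publish(store, base, item, level):
    """Append item to the leaf matching its level (store is a plain dict)."""
    store.setdefault(node_for(base, level), []).append(item)


store = {}
publish(store, "/example.com/blog", "public post", "public")
publish(store, "/example.com/blog", "draft post", "private")
publish(store, "/example.com/blog", "another public post", "public")

for node, items in sorted(store.items()):
    print(node, items)
```

Each of those leaves can then be linked into collection nodes independently, so only the collections of authorized subscribers get the private leaf as a child.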
-bjc