On 16-Nov-2009, at 12:01, Ralph Meijer wrote:
> So yes, code-as-node is custom by nature. The assumption is that for
> systems that would benefit most from the concept of collections, having
> static configurations for parent-child node association is cumbersome at
> best. For most applications, it doesn't really matter what the precise
> associations of nodes are. The application just wants to subscribe to a
> particular set of updates.
It is completely true that applications don't need to care about the
DAG except, potentially, in esoteric situations. However, for a system
architect, the DAG is the key to the implementation. The reason I like it is
that it is not custom. Anyone who has put together a system via collection
nodes can trivially port that system to any XMPP server that also supports
collection nodes.
From my understanding so far, `node-as-code' requires some kind of API,
at a minimum, which then relies on specific language implementations, which are
undefined, in a given server. Porting such a system between server
implementations becomes a monumentally more difficult task. The only way around
it, that I see, is to define a `node-as-code' language which must be
implemented by servers. I firmly believe that such a language is not, and
should not be, in the scope of XMPP. On top of my aesthetic argument, if such a
language were defined it would almost certainly be orders of magnitude more
difficult to implement in the server than collection nodes, which are merely
DAGs.
> Continuing with the example of modeling blog posts as leaf nodes and the
> whole blog as a collection, you would need to reconfigure either the
> parent or child node to associate the new blog post. If you do in-band
> publishing (i.e. you use a generic pubsub service and actively post to
> it from whatever blogging backend you have), you can also include the
> parent node when creating the new node for the blog post.
Including parent node information in the publish notification
side-steps both `node-as-code' and collection nodes, IMHO, since the publisher
specifies the relationship, and not the node owners. Were one to go this route,
I would think that content-based subscriptions would fit the bill better than
either of the other solutions.
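To make the content-based route concrete, here is a minimal Python sketch. It assumes the publisher attaches a `parent` field to each notification and that each subscription is just a predicate over the notification. The field names, JIDs, and the `matching_subscribers` helper are all hypothetical and do not come from any XEP or server implementation.

```python
# Hypothetical sketch: content-based matching on a publisher-supplied field.
# Nothing here is defined by XMPP; the names are illustrative only.

def matching_subscribers(notification, subscriptions):
    """Return the subscribers whose predicate accepts the notification."""
    return [jid for jid, predicate in subscriptions if predicate(notification)]


# Each subscription pairs a subscriber with a query over notification content.
subscriptions = [
    # "Everything the publisher filed under the blog"
    ("alice@example.com", lambda n: n.get("parent") == "blog"),
    # "Only this one specific post"
    ("bob@example.com", lambda n: n.get("node") == "post-2"),
]

# The publisher, not the node owner, states the relationship.
notification = {"node": "post-1", "parent": "blog", "payload": "..."}

recipients = matching_subscribers(notification, subscriptions)
```

Note that no parent-child association is stored anywhere in this sketch: the relationship exists only in the published data and in the subscribers' queries.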
As an aside, I kind of want to kill pubsub#children from collection
node configuration, as it makes implementation harder for negligible benefit
(maybe someone has a compelling use case that I'm missing).
> On top of that, we don't need to remember
> the associations, as this can be calculated at run time.
I don't regard this as a particularly compelling trade-off. Memory is
cheaper than CPU cycles unless you have a large number of nodes with a low
number of publish events.
> After talking to a bunch of people about modeling their application, I
> have the idea that a generic solution for collections is not practical
> for anything but toy projects. I'd like to be proven wrong at this.
> Please come up with useful, real-world examples where static node
> configuration is required and feasible to implement.
We use collection nodes for our real-time call information dashboard
(https://my.onsip.com/). It works exceedingly well, imposing very little
run-time overhead while saving an enormous amount of memory (because no data
is duplicated).
> In any case, while I'd like to support the smaller use-case, I think
> that implementing the whole scheme of recording associations and
> traversing DAGs at publish time, along with checking authorization and
> more is not worth the trouble. My suggested alternative, though, was a
> breeze to implement. So I have to disagree here.
Traversing a DAG is a simple and fairly light-weight operation. DAGs
can be hard to serialize into SQL, but otherwise there's not much difficulty
with them. The authorization issue is troublesome, but, IMHO, deserves nothing
more than a paragraph or two in the `Security Considerations' section of the
XEP. If the system designer makes sure authorization is set up appropriately
(again, I don't consider this to be too difficult) then there isn't much issue.
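As a rough illustration of how small the publish-time traversal is, here is a minimal Python sketch. It assumes parent links are held in a plain in-memory dict; the node names and the `notification_targets` helper are hypothetical, and authorization checks are deliberately left out.

```python
from collections import deque

# Hypothetical sketch: find every node whose subscribers should hear about
# a publish, by walking parent links upward through the collection DAG.


def notification_targets(node, parents):
    """Return the published node plus every collection above it."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            # A node can be reached by several paths in a DAG; visit it once.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Assumed layout: child node -> list of parent collection nodes.
parents = {
    "post-1": ["blog"],
    "post-2": ["blog"],
    "blog": ["all-blogs", "team-feed"],
    "team-feed": ["all-blogs"],
}

targets = notification_targets("post-1", parents)
```

Each node and each parent link is visited at most once, so the cost is linear in the size of the ancestry above the published node, not in the size of the whole graph.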
> Collections as I see them are really just abstractions of content-based
> pubsub systems (hi Bob Wyman!), where you basically assign a fixed name
> (node identifier) to a particular query into the notification plasm. I
> am still interested in explicitly defining the minimally subscribe-able
> unit (like a blog post), so I want to pass along a specific node from
> where a notification originates, though.
The distinction is who defines the relationship: owner, publisher, or
subscriber? I think there are valid cases for all three. The last two can be
defined in terms of content-based systems (as I understand them, anyway), but
the first is defined by the owner at creation time and can effectively be
forced on any user of the system (the top-level collection is open, children
are closed).
-bjc