On Sat, Nov 14, 2009 at 8:00 PM, Robin Collier <[email protected]> wrote:
> While quite powerful, doesn't this make a system quite custom in nature
> to the point where discovery of capabilities and configuration become
> quite useless? I also implies access to the backend system to be to
> insert the custom logic. In the end, wouldn't this only be useful in a
> very closed system. It strikes me that this would not be too useful
> to an open system where you would not want the users to be able
> insert code on a server.

The WordPress.com PubSub service runs on a customized Ejabberd installation. If we had wanted to create firehose (collection) nodes for certain types of nodes (blogs and their comments) and had not been able to customize the code underlying the nodes, we would have had to rely on some form of Collection Nodes, and we would have been stuck with whatever support the software offered. Custom code may always be the best way to implement systems with complex information.

> I guess I am thinking that this capability should be determined by
> server implementations as an extended capability, and not necessarily
> as part of the spec itself.

This is exactly what I'm thinking. I can see a spec being useful for more generic systems but it almost certainly wouldn't be useful to me. I'll describe our service as a use case.

tl;dr: pubsub.im.wordpress.com uses custom code to implement a complex node tree.

WordPress.com pushes new blog posts and comments through a PubSub service in Ejabberd. For the convenience of users familiar with WordPress feeds, A.K.A. laziness, we decided to mimic the standard feed URLs in our node names.

In WordPress the most commonly used feed URL path is /feed/, which delivers an RSS feed of recent posts. The URL /comments/feed/ delivers recent comments. Adding /feed/ after any post URL (/permalink/feed/) fetches the comments on that post only. Appending /atom/ changes the format.

We mapped these onto PubSub node names by prefixing /blogs/ and dropping /feed/:

All posts:
http://domain/feed/ -> /blogs/domain
All comments:
http://domain/comments/feed/ -> /blogs/domain/comments
One post's comments:
http://domain/permalink/feed/ -> /blogs/domain/permalink

A subscription to /blogs/domain/ would not be equivalent to a subscription to each of its sub-nodes. We could have used a stricter hierarchy with all items flowing down toward root but we felt a more familiar scheme would be more user-friendly. (Of course we may be proven wrong.)

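The URL-to-node mapping above is simple enough to sketch in a few lines. This is illustrative Python only, not the code running on the service; the function name feed_url_to_node is made up for the example, and the /atom/ format variants are deliberately left out because the mapping for them isn't described here.

```python
from urllib.parse import urlsplit


def feed_url_to_node(url: str) -> str:
    """Map a WordPress feed URL to a PubSub node name (hypothetical helper).

    Follows the rule stated above: prefix /blogs/ and drop the trailing /feed/.
    """
    parts = urlsplit(url)
    # Split the path into non-empty segments, e.g. "/comments/feed/" -> ["comments", "feed"].
    segments = [s for s in parts.path.split("/") if s]
    if not segments or segments[-1] != "feed":
        raise ValueError(f"not a feed URL: {url!r}")
    segments.pop()  # drop the trailing "feed"
    # Node name is /blogs/<domain>[/<remaining path segments>].
    return "/".join(["", "blogs", parts.netloc] + segments)


if __name__ == "__main__":
    for u in (
        "http://example.com/feed/",
        "http://example.com/comments/feed/",
        "http://example.com/2009/11/14/hello/feed/",
    ):
        print(u, "->", feed_url_to_node(u))
```

Here example.com and the permalink path are placeholders; any post permalink would be carried over unchanged.
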
We also have nodes that function sort of like collections without their names being prefixes of the blog nodes. Maybe they would be described better as compilation nodes. These two nodes push public WordPress.com posts and comments to services like Collecta. This is the mapping:

/blogs/*[private=false] -> /firehose
/blogs/*/comments[private=false] -> /gusher

Let me explain a little about /blogs/*. There are millions of blogs on WordPress.com. Some of them are private, i.e. access-controlled. We have no intention of implementing subscription or browsing of /blogs/ as a collection node. We don't even want to store its sub-nodes (blogs) in the PubSub system if we can use WordPress.com as a back end. Thus all of the blog nodes are virtual (they only exist while performing a task). The instantiation of nodes is handled by custom code that uses WordPress.com as a back end via php_app[1].

The collection-ish nodes, /firehose and /gusher, are access-controlled, notification only, and deliver payloads. These high-volume nodes will likely never support item browsing. So we figured the most efficient way to feed them was to send them items internally from the most specific nodes. We don't bother showing the origin node in a SHIM header because our subscribers don't need it and because each Atom item contains data sufficient to reconstruct the origin node path.

Items are published to the comments, firehose, and gusher nodes internally. For example, when a comment is published to /blogs/domain/permalink (the origin), our virtual node module directly calls the function that publishes the comment in /blogs/domain/comments. Then it checks the blog's privacy settings via php_app, and if it's a public blog the virtual node module publishes the comment in /gusher.

The main downside is that it would be a terrible task to port our system, which I hope never to do. We could switch to a strict collection node graph but custom logic would still be required to handle privacy.
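
For anyone who wants the internal re-publication flow in concrete terms, here is a rough Python sketch of the fan-out decision. It is not the actual virtual node module; republication_targets and the is_public callback (standing in for the privacy check made via php_app) are hypothetical names, and the post-to-/firehose branch is inferred from the mapping given earlier rather than spelled out step by step.

```python
from typing import Callable, List


def republication_targets(origin: str, is_public: Callable[[str], bool]) -> List[str]:
    """Return every node that should receive an item published to `origin`.

    `origin` is /blogs/<domain> for a post or /blogs/<domain>/<permalink>
    for a comment. `is_public(domain)` is an assumed stand-in for the
    blog privacy lookup.
    """
    parts = origin.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "blogs":
        raise ValueError(f"not a blog node: {origin!r}")
    if parts[2:] == ["comments"]:
        # The per-blog comments node is fed internally, never published to directly.
        raise ValueError(f"not an origin node: {origin!r}")

    domain = parts[1]
    is_comment = len(parts) > 2
    targets = [origin]
    if is_comment:
        # Every comment is also published in the blog-wide comments node.
        targets.append(f"/blogs/{domain}/comments")
    if is_public(domain):
        # Only public blogs feed the compilation nodes.
        targets.append("/gusher" if is_comment else "/firehose")
    return targets


if __name__ == "__main__":
    public = lambda domain: True
    print(republication_targets("/blogs/example.com/2009/11/14/hello", public))
    print(republication_targets("/blogs/example.com", public))
```

The privacy check happens once per item and only gates the last hop, which is the part that would still need custom logic under a strict collection node graph.
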
I have not seen how the switch could be worthwhile. Our custom nodes-as-code are working just fine. If anything, I'll just rewrite the re-publication code to avoid data duplication.

If you would like more details on our implementation, please ask.

/tl;dr

Andy

[1] http://github.com/skeltoac/php_app/
