The results of last week's XMPP Summit are beginning to trickle out as Ralphm
blogs the first of a promised series of notes on the event.
See: http://ralphm.net/blog/2008/07/26/xmpp_summit_5
and http://ralphm.net/blog/2008/07/26/xmpp_social_networks_1

Not surprisingly, it seems that those at the Summit agreed that the most
sensible way to federate XMPP PubSub servers is to have various servers
subscribe to each other. Thus, if I were running a microblogging service that
provided open access to "public" posts on my service, I might set up a node
to which I published all such "public" posts. Other microblogging services,
search engines, etc. would then subscribe to that node and, by doing so,
could mix messages published to my service with those published to their own
service.

This approach of "Federation via Subscription" has some distinct advantages
over the alternative, "Federation via Publishing", particularly in that it
eases spam control and management of server resources. However, it has a
distinct disadvantage in that it makes it somewhat harder to form networks
of cooperating servers.

In a system which relies on Federation via Subscription, all servers that
receive messages must have knowledge of potential publishers prior to any
data flowing between them. Given two servers, A and B, no data will flow
from A to B unless B first becomes aware of A and subsequently subscribes to
at least one node on A. The interesting question becomes: "How does B become
aware of A?" Since no data can flow between the two servers until a
subscription is established, if there are no other mechanisms provided, one
must assume that B discovers A via "out-of-band" communications such as
email messages, phone calls, directory lookups, etc.  These are, of course,
rather crude discovery methods and require manual configuration upon
discovery to establish federating subscriptions.
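
To make the subscription step concrete, here is a minimal sketch of the
request that B would send to A once it knows which node to subscribe to. It
builds a XEP-0060 <subscribe/> request with Python's standard ElementTree
module. The JIDs, node name and helper name are hypothetical, chosen only for
illustration.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def build_subscribe_iq(subscriber_jid, service_jid, node, stanza_id):
    """Build the request by which server B subscribes to a node on server A.

    The stream-level namespace of <iq/> (jabber:client or jabber:server)
    is left out of this sketch.
    """
    iq = ET.Element("iq", {
        "type": "set",
        "from": subscriber_jid,
        "to": service_jid,
        "id": stanza_id,
    })
    pubsub = ET.SubElement(iq, "{%s}pubsub" % PUBSUB_NS)
    ET.SubElement(pubsub, "{%s}subscribe" % PUBSUB_NS, {
        "node": node,           # must already be known to B
        "jid": subscriber_jid,  # the entity that will receive notifications
    })
    return iq


# Hypothetical services: B subscribes to the public-posts node on A.
request = build_subscribe_iq(
    "pubsub.b.example", "pubsub.a.example", "public_posts", "sub1")
```

Note that both the service address and the node name are inputs to the
request; neither can be learned from the subscription protocol itself.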

An alternative means for facilitating discovery would be to extend the
XEP-0060 PubSub specification to support a means for servers to publish
"Advertisements" which announce the availability of nodes for federation.
Advertisements would specify which nodes are available for federation and
what data will be published over those nodes. In order to reuse as much
existing framework as possible, Advertisements would be published just like
normal events, but they would be published to a "well known node" that is
commonly available on all services that support advertisements. This node
might be named "http://jabber.org/protocol/pubsub#advertisements" and would
be like any other pubsub node in that it could be subscribed to, read, etc.
However, it would only support publishing <advertisement/>s, not <event/>s.

The basic assumption behind federation is that two services will be
publishing data which is similar. For instance, that two micro-blogging
services will both be publishing micro-blogging entries that are formatted
as Atom entries. Agreement on the payload formats is essential to enable
federation. On the other hand, it is unreasonable to insist that all servers
use common node names. Thus, a mechanism is needed to provide a mapping between
some commonly agreed name for a stream of data and the node name that is
used on any particular server. This can be accomplished by having the
Advertisement provide a mapping from commonly understood logical node names
to local concrete names. Thus, those creating micro-blogging standards might
say that the logical node name for publishing public posts is:
http://example.com/PublicMicroBloggingPosts. Then, a server that published
public posts on a node named "987ye879799wwww00" would simply provide both
the local and logical name for the node in the advertisement.

Given this introduction, an advertisement might look like the following:
(but, use of an xdata form might be more appropriate and more flexible...)

<iq type='set'
    from='[EMAIL PROTECTED]'
    to='old_service.shakespeare.lit'
    id='ad1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>

    <publish node='http://jabber.org/protocol/pubsub#advertisements'>
      <advertisement xmlns='http://jabber.org/protocol/pubsub#advertisements'
                     id="tag:[EMAIL PROTECTED],2008-07-24:1234">
        <local node='987ye879799wwww00'
               format='http://example.com/post_format'/>
        <common node='http://example.com/PublicMicroBloggingPosts'/>
        <description>All public posts on this server.</description>
      </advertisement>
    </publish>
  </pubsub>
</iq>
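
On the receiving side, a subscriber would need to turn such an advertisement
into a mapping from the logical node name to the advertiser's local node
name. The following is a rough sketch of that step, assuming the
<advertisement/> layout proposed above (which is not part of XEP-0060). The
sample id value and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

AD_NS = "http://jabber.org/protocol/pubsub#advertisements"

# A well-formed advertisement payload in the proposed (hypothetical) format.
SAMPLE_AD = """
<advertisement xmlns='http://jabber.org/protocol/pubsub#advertisements'
               id='ad-1234'>
  <local node='987ye879799wwww00'
         format='http://example.com/post_format'/>
  <common node='http://example.com/PublicMicroBloggingPosts'/>
  <description>All public posts on this server.</description>
</advertisement>
"""


def parse_advertisement(xml_text):
    """Extract the logical-to-local node mapping from one advertisement."""
    ad = ET.fromstring(xml_text.strip())
    local = ad.find("{%s}local" % AD_NS)
    common = ad.find("{%s}common" % AD_NS)
    if local is None or common is None:
        raise ValueError("advertisement needs both <local/> and <common/>")
    return {
        "id": ad.get("id"),
        "logical_node": common.get("node"),
        "local_node": local.get("node"),
        "format": local.get("format"),
        "description": ad.findtext("{%s}description" % AD_NS, default=""),
    }


def resolve_local_node(ads, logical_node):
    """Return the local node advertised for a logical name, or None."""
    for ad in ads:
        if ad["logical_node"] == logical_node:
            return ad["local_node"]
    return None


ad = parse_advertisement(SAMPLE_AD)
node = resolve_local_node([ad], "http://example.com/PublicMicroBloggingPosts")
```

The value returned by resolve_local_node() is what a subscriber would put in
the node attribute of its subscription request to the advertising service.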

If the Advertisement node is supported as a normal node, then it should be
possible for others to subscribe to the node and thus monitor advertisements
as they are published. Using filters, subscribers would either subscribe to
all advertisements published to the remote node or only to those
advertisements that are specific to that node. This permits advertisements
to flow to nodes not known to the advertiser and also permits servers to
ensure that they are rapidly made aware of changes to servers in which they
have an interest. Additional metadata such as keywords, etc. could be added
to make filtering easier and more effective.

Of course, "Advertisers" shouldn't expect that the mere act of advertising
will always result in a federating subscription. Server managers will still
often want to moderate the lists of nodes they subscribe to. Nonetheless,
the mechanism provides a foundation on which automatic subscription will sometimes
reasonably be built. For instance, I might wish to build a microblogging
aggregator that automatically subscribes to all remote services that claim
support for microblogging. Or, I might have a strong trust relationship with
some other service and decide that I would like to have my service subscribe
to anything advertised by that service -- while manually reviewing
advertisements from other services... Many patterns are possible and
reasonable.
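
The two patterns just described could be combined in a small policy
function. This is only a sketch of one possible policy, not a proposed
standard; the service names, the Decision type and the function name are all
hypothetical.

```python
from enum import Enum


class Decision(Enum):
    SUBSCRIBE = "subscribe"  # establish a federating subscription now
    REVIEW = "review"        # hold for a server manager to approve


def decide(advertiser, logical_node, trusted_services, auto_nodes):
    """Decide what to do with an advertisement from a remote service."""
    # Strong trust relationship: take anything this service advertises.
    if advertiser in trusted_services:
        return Decision.SUBSCRIBE
    # Aggregator pattern: take any node that claims a logical name we want.
    if logical_node in auto_nodes:
        return Decision.SUBSCRIBE
    # Everything else is moderated manually.
    return Decision.REVIEW


MICROBLOG = "http://example.com/PublicMicroBloggingPosts"
trusted = {"pubsub.partner.example"}

# A trust-based policy: only the partner is subscribed to automatically.
first = decide("pubsub.partner.example", "http://example.com/Other",
               trusted, set())
second = decide("pubsub.stranger.example", MICROBLOG, trusted, set())

# An aggregator policy: any service advertising microblog posts qualifies.
third = decide("pubsub.stranger.example", MICROBLOG, set(), {MICROBLOG})
```

Under the trust-based policy the first advertisement is subscribed to and
the second is held for review; under the aggregator policy the third is
subscribed to even though the advertiser is unknown.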

Those familiar with blogging infrastructure will recognize a great deal of
similarity between the idea of Advertisements and that of "pinging." Within
the blogging world, pinging is probably the most common and useful means
available to blog aggregators to discover new blogs. In fact,
it can be argued that the introduction of pinging and its use by blog
aggregators was probably one of the most essential steps in building the
blogging infrastructure as we know it today. Before pinging, the process of
discovering new blogs was horribly difficult, inaccurate and expensive for
service providers.

Comments? Does this sound reasonable?

bob wyman
