On 07/13/2010 05:36 PM, Danny Briere wrote:
We are looking at applications of PubSubHubbub that are mission
critical for businesses. So while businesses will tolerate some level
of variable delivery delay, they won't tolerate downtime. They are
used to 100% or near 100% (five nines) type reliability.
Does PubSubHubbub have support for any type of load distribution, load
balancing, or fault tolerance? For example, what appears to be a
single hub to the outside world really consists of several servers
that share the load, and that keep the hub up and running even if one
of the servers stops working. Or, if the load to capture the
publisher's source info backs up, is there some mechanism to share the
load out?
I can see firms listing multiple hubs instead of one, thinking that
this somehow provides greater reliability, but I'm not sure that
does anything but create multiple feeds that subscribers then have to
true up. Thoughts?
A PubSubHubbub hub is essentially just a normal webapp that happens to
be accessed by other software rather than by users with browsers. The
usual best practices for web application scalability therefore apply:
publish only one hub endpoint to the world, but have it actually be
serviced by a farm of machines in the background.
(Sidebar: the reference hub implementation is in fact already doing
this, using Google's App Engine infrastructure to handle the
distribution.)
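
To make the "one endpoint, many machines" idea concrete, here is a rough
sketch of the selection logic a front end could use to spread requests
over a farm and skip machines that have stopped working. It is only an
illustration: the BackendPool class and the backend names are made up,
and in a real deployment an off-the-shelf load balancer or a platform
like App Engine would normally do this for you.

```python
import itertools


class BackendPool:
    """Hypothetical round-robin pool of hub backends behind one public endpoint."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # Called when a health check or a request to this backend fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a previously failed backend passes a health check again.
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, skipping unhealthy ones.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


pool = BackendPool(["hub-1", "hub-2", "hub-3"])
first_round = [pool.pick() for _ in range(3)]
pool.mark_down("hub-2")
after_failure = [pool.pick() for _ in range(4)]
print(first_round)
print(after_failure)
```

Publishers and subscribers only ever see the single hub URL; which
machine handles a given request is an internal detail.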
I think in practice there's little reason for a feed to have more than
one hub. Subscriber implementations will generally only create
subscriptions to the first hub (in document order) anyway.
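
As a minimal sketch of that "first hub in document order" behaviour,
assuming an Atom feed that advertises hubs with feed-level
<link rel="hub"> elements: the function name, the sample feed, and the
example.com URLs below are all invented for illustration, and real
subscriber libraries may well do this differently.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def first_hub_url(feed_xml):
    """Return the href of the first feed-level hub link, or None if there is none."""
    root = ET.fromstring(feed_xml)
    # findall() returns matching direct children in document order.
    for link in root.findall(ATOM_NS + "link"):
        if link.get("rel") == "hub" and link.get("href"):
            return link.get("href")
    return None


SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="http://example.com/feed.atom"/>
  <link rel="hub" href="http://hub-a.example.com/"/>
  <link rel="hub" href="http://hub-b.example.com/"/>
</feed>"""

print(first_hub_url(SAMPLE_FEED))  # prints http://hub-a.example.com/
```

A subscriber written this way never contacts the second hub at all, so
listing it buys little in practice.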
A robust subscriber will still poll a PubSubHubbub-enabled feed at a
reduced polling frequency to detect a change in hub URL or the removal
of PubSubHubbub support.
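
The decision made on each of those low-frequency polls could look
roughly like the following. This is a sketch under assumptions, not a
prescribed algorithm: the function name, the action labels, and the
choice to resubscribe to the first advertised hub are mine, and the
list of advertised hubs is assumed to have been extracted from the
freshly polled feed already.

```python
def reconcile_hub(subscribed_hub, advertised_hubs):
    """Decide what to do after a fallback poll of the feed.

    subscribed_hub: hub URL we currently hold a subscription with.
    advertised_hubs: hub URLs found in the freshly polled feed, in document order.
    Returns an (action, hub_url) pair.
    """
    if not advertised_hubs:
        # Hub support was removed: go back to ordinary polling.
        return ("poll_only", None)
    if subscribed_hub in advertised_hubs:
        # Our hub is still listed: nothing to change.
        return ("keep", subscribed_hub)
    # The hub URL changed: move the subscription to the first listed hub.
    return ("resubscribe", advertised_hubs[0])


print(reconcile_hub("http://hub-a.example.com/", ["http://hub-a.example.com/"]))
print(reconcile_hub("http://hub-a.example.com/", ["http://hub-b.example.com/"]))
print(reconcile_hub("http://hub-a.example.com/", []))
```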