[pubsubhubbub] Re: Spec 0.4 review

Roman Thu, 28 Jun 2012 08:48:14 -0700

Hi Julien,

Several more thoughts on the hubbub spec.


*1.* Both the 0.3 and 0.4 specs say that hubs MAY ignore hub.lease_seconds in
subscription requests. I think that's too permissive: hubs should be allowed
to reduce this value but not to increase it.
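A minimal Python sketch of the proposed rule, assuming a hub that has its own default and maximum lease. The constant names, their values, and the treatment of a missing or non-positive request are illustrative assumptions, not taken from either spec:

```python
from typing import Optional

# Hypothetical hub policy values; neither spec defines these names.
DEFAULT_LEASE_SECONDS = 5 * 24 * 3600   # used when the request carries no usable value
MAX_LEASE_SECONDS = 30 * 24 * 3600      # longest lease this hub is willing to grant


def effective_lease_seconds(requested: Optional[int]) -> int:
    """Lease the hub grants: it may shorten the request but never lengthen it."""
    if requested is None or requested <= 0:
        # Assumption: no usable value, so fall back to the hub default (capped).
        return min(DEFAULT_LEASE_SECONDS, MAX_LEASE_SECONDS)
    # Reduce if needed, but never exceed what the subscriber asked for.
    return min(requested, MAX_LEASE_SECONDS)
```

Under this rule a subscriber can always rely on the granted lease being no longer than the one it requested.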

*2.* I think we need to specify what the hub is allowed to do w.r.t. feed
deduping.

Consider the following scenario.

   - http://*foo.com*/atom.xml redirects to http://*foo.com/feeds*/atom.xml.
   - The latter contains <link rel="self" href="http://*bar.com*/atom.xml"/>.
   - There are subscribers for all three topics.
   - http://*foo.com*/atom.xml gets published.

Questions:

   1. Should the http://*foo.com/feeds*/atom.xml subscriber get notified?
      1. If yes, what should the subscriber receive? Should it be told
      that http://*foo.com/feeds*/atom.xml or http://*foo.com*/atom.xml has
      changed?
   2. Should the http://*bar.com*/atom.xml subscriber get notified?
      1. If yes, what should the subscriber receive?

I think the answers should be:

   1. The hub isn't required to notify this subscriber, but doing so is a
   good idea.
      1. The subscriber should be told that http://*foo.com/feeds*/atom.xml
      has changed, because that's what the subscriber subscribed to. If the
      notification was triggered by a publication on a different topic, the
      subscriber doesn't need to know that.
   2. The hub can't blindly trust the self link in the feed, because it
   might point to completely different content. Therefore, by default, the
   hub shouldn't notify the http://*bar.com*/atom.xml subscriber. However,
   if the hub is certain that http://*bar.com*/atom.xml and
   http://*foo.com*/atom.xml serve the same feed (establishing this is a
   security concern), it may notify the subscriber.
      1. If the hub does notify the subscriber, it should say that
      http://*bar.com*/atom.xml has changed, because that's what the
      subscriber subscribed to.

*To summarize*: when the hub distributes content, the Self Link MUST be
equal to the hub.topic of the corresponding subscription.
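A rough Python sketch of the fan-out described by the answers above. It assumes the hub already knows the publisher's redirect (a single hop only) and keeps a set of URLs it has independently verified to serve the same feed as the published topic; `topics_to_notify`, `build_notifications` and the data shapes are hypothetical. Each notification carries the subscriber's own hub.topic as its Self Link, never the topic that was actually published:

```python
from typing import Dict, List, Set, Tuple


def topics_to_notify(
    published: str,
    redirects: Dict[str, str],
    self_link: str,
    verified_same: Set[str],
) -> List[str]:
    """Topics whose subscribers hear about a publication of `published`."""
    topics = [published]
    # Following the publisher's redirect is optional but a good idea (answer 1).
    target = redirects.get(published)
    if target is not None and target not in topics:
        topics.append(target)
    # The feed's self link is untrusted by default (answer 2); include it only
    # if the hub has verified that it serves the same feed as `published`.
    if self_link in verified_same and self_link not in topics:
        topics.append(self_link)
    return topics


def build_notifications(
    published: str,
    redirects: Dict[str, str],
    self_link: str,
    verified_same: Set[str],
    subscriptions: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (callback, self_link_to_send) pairs.

    The Self Link sent to each callback equals that subscription's hub.topic.
    """
    notifications = []
    for topic in topics_to_notify(published, redirects, self_link, verified_same):
        for callback in subscriptions.get(topic, []):
            notifications.append((callback, topic))
    return notifications
```

With the scenario above and an empty `verified_same`, only the *foo.com* and *foo.com/feeds* subscribers are notified, each with its own topic; the *bar.com* subscriber is added only once the hub has verified that URL.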

*3.* Retries.

It's important to make the protocol as simple as possible for the
publishers because we'd love to see more publishers participating in the
hubbub communication. We don't expect the number of subscribers to be very
high, so it's OK if their part is a bit more involved.

Permanent subscriptions require retries of subscription verification
requests because without retries they can't be relied upon and subscribers
will have to issue resubscription requests on their own. So if we allow
permanent subscriptions, we also must require the hub to retry subscription
verification requests.

Subscribers MUST keep track of the topics they are subscribed to in order
to handle subscription verification requests correctly. Thus, it's not too
much extra work for them to periodically resubscribe to all topics.
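A minimal Python sketch of that bookkeeping on the subscriber side. It assumes the query parameters of the hub's verification GET have already been parsed into a dict; hub.mode, hub.topic and hub.challenge are the parameter names used in the drafts, while `handle_verification` and the `wanted` table are hypothetical. The drafts call for a 404 when the subscriber does not agree with the requested action:

```python
from typing import Dict, Tuple


def handle_verification(query: Dict[str, str], wanted: Dict[str, str]) -> Tuple[int, str]:
    """Answer a hub's verification request.

    `query` holds the request's query parameters; `wanted` maps each topic
    to the mode ("subscribe" or "unsubscribe") this subscriber last asked for.
    Returns an (HTTP status, response body) pair.
    """
    mode = query.get("hub.mode")
    topic = query.get("hub.topic")
    challenge = query.get("hub.challenge")
    if challenge is None or topic is None or mode not in ("subscribe", "unsubscribe"):
        return 404, ""
    # Only confirm actions this subscriber actually requested for this topic.
    if wanted.get(topic) != mode:
        return 404, ""
    # Echoing the challenge confirms the subscriber's intent.
    return 200, challenge
```

The `wanted` table is exactly the state a subscriber needs anyway, which is why periodically resubscribing to every topic in it adds little work.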

Even if permanent subscriptions were allowed, I expect non-trivial
subscribers to periodically resubscribe to all their topics anyway for
increased robustness. And since the number of subscribers is likely to stay
relatively low, most of them will be non-trivial and thus can easily afford
that.

*Conclusion*: I'm convinced that it's the responsibility of subscribers to
retry subscription requests. The hub should not do it. The "Automatic
Subscription Refreshing" section should be removed from the spec.
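A sketch of subscriber-driven refreshing, assuming the subscriber records when each subscription was last verified and the lease the hub granted. The `Subscription` record, `due_for_resubscription` and the 10% renewal margin are hypothetical choices; each topic the function returns would get a fresh subscription request:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subscription:
    topic: str
    verified_at: float   # Unix time when the hub last verified this subscription
    lease_seconds: int   # lease the hub granted in its verification request


def due_for_resubscription(
    subs: List[Subscription], now: float, margin: float = 0.1
) -> List[str]:
    """Topics whose lease ends within the final `margin` fraction of the lease."""
    due = []
    for s in subs:
        expires_at = s.verified_at + s.lease_seconds
        # Renew early so a failed attempt still leaves time for another try.
        if now >= expires_at - margin * s.lease_seconds:
            due.append(s.topic)
    return due
```

Running this periodically (e.g. from a cron-style job) covers both renewals and retries: a topic whose resubscription failed simply stays due on the next run.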

*4.* Subscription validation and denied subscriptions.

The hub is allowed to deny a subscription at any point. For example, when a
social network user makes her profile private, all the profile's
subscribers will get denied. There is a period of time when the
subscription is already invalidated by the hub but the subscriber isn't yet
notified. Although this period is usually short, it still exists, meaning
that the subscriber can't assume that it's receiving updates on a topic
just because it hasn't seen the deny notification yet.

Now, let's consider two subscription protocols.

Subscription protocol A:

   - Hub receives a subscription request.
   - It validates the subscription (is this subscriber allowed to
   subscribe?). If validation fails, it notifies the subscriber.
   - The hub issues a subscription verification request to verify the
   intent of the subscriber. If it fails, the subscription request is ignored.
   - The subscription is marked as validated. The subscriber starts
   receiving content updates.

Subscription protocol B:

   - Hub receives a subscription request.
   - It issues a subscription verification request. If it fails, the
   subscription request is ignored.
   - The hub validates the subscription. If validation fails, it notifies
   the subscriber.
   - The subscription is marked as validated. The subscriber starts
   receiving content updates.

Both protocols also have a separate background process that can invalidate
any subscription at any moment (described at the start of this section).

First of all, please note that protocol B is simpler for the subscriber:
the subscriber is guaranteed to receive a subscription verification request
from the hub after sending a subscription request. In protocol A this is
not the case: the subscriber can also receive the deny notification
instead. In both protocols the subscriber can receive a deny notification
after the verification request.
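To illustrate the difference, here is a small Python model of the first messages a subscriber can see under each protocol. It is a simplification: `allowed` stands for the outcome of validation and `intent_confirmed` for the subscriber's answer to verification, and the message labels are hypothetical:

```python
from typing import List

# Hypothetical labels for what the subscriber's callback receives.
VERIFY = "verification request"
DENY = "deny notification"
UPDATES = "content updates"


def protocol_a(allowed: bool, intent_confirmed: bool) -> List[str]:
    """Messages the subscriber sees under protocol A (validate, then verify)."""
    if not allowed:
        # Validation fails first: a denial arrives and no verification follows.
        return [DENY]
    if not intent_confirmed:
        # The subscriber refused verification; the request is ignored.
        return [VERIFY]
    return [VERIFY, UPDATES]


def protocol_b(allowed: bool, intent_confirmed: bool) -> List[str]:
    """Messages the subscriber sees under protocol B (verify, then validate)."""
    messages = [VERIFY]  # Always the first thing the subscriber sees.
    if not intent_confirmed:
        return messages
    if not allowed:
        return messages + [DENY]
    return messages + [UPDATES]
```

Neither function models the background invalidation, which can append a deny notification after content updates in both protocols.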

Protocol A, while being more complex, doesn't have any advantages that I
can see. It may look like it gives the subscriber more information, but
it's not actually the case. In both protocols there is a period of time
when the hub considers a subscription invalid but the subscriber doesn't
know about it yet.

*My suggestion*: pick protocol B.

Roman.
