[pubsubhubbub] Re: Spec 0.4 review

Julien Genestoux Thu, 05 Jul 2012 10:34:44 -0700

Roman,
Sorry for the late response.

On Thu, Jun 28, 2012 at 8:47 AM, Roman <[email protected]> wrote:


> Hi Julien,
>
> Several more thoughts on the hubbub spec.
>
> *1.* Both 0.3 and 0.4 specs say that hubs MAY ignore hub.lease_seconds
> from subscription requests. I think it's too permissive; hubs should be
> allowed to reduce this value but not increase it.
>

I don't think it matters if they increase it either. Subscribers can just
cancel their subscription or reconfirm it at any time anyway.


> *2.* I think we need to specify what the hub is allowed to do w.r.t. feed
> deduping.
>
> Consider the following scenario.
>
>    - http://*foo.com*/atom.xml redirects to
>    http://*foo.com/feeds*/atom.xml.
>    - The latter contains
>    <link rel="self" href="http://*bar.com*/atom.xml"/>.
>    - There are subscribers for all three topics.
>    - http://*foo.com*/atom.xml gets published.
>
> Questions:
>
>    1. Should http://*foo.com/feeds*/atom.xml subscriber get notified?
>
No. In my mind, the only URL that matters is the self URL (whether in
<link> or in the HTTP headers).
Subscribers should respect the spec in that regard and cannot expect
things to work if they don't.

>
>    1. If yes, what should the subscriber receive? Should it be told
>       that http://*foo.com/feeds*/atom.xml or http://*foo.com*/atom.xml
>       has changed?
>    1. b. Should http://*bar.com*/atom.xml subscriber get notified?
>
No, because that resource (or one that it redirects to) was not updated.
Publishers should also respect the spec,
and if they don't, they can't expect it to work.

>
>    1. If yes, what should the subscriber receive?
>
> I think the answers should be:
>
>    1. The hub doesn't have to do this, but it's a good thing.
>
Agreed, but then that shouldn't be in the spec (that's what we do at
Superfeedr).

>
>    1. The subscriber should be told that http://*foo.com/feeds*/atom.xml
>       has changed because that's what the subscriber has subscribed to. If the
>       notification was triggered by a publication on a different topic, the
>       subscriber doesn't need to know that.
>
I'm with you here... but again, that's outside the spec in my mind.


>
>    1. The hub can't blindly trust the self link from the feed because it
>    might point to a completely different content. Therefore, by default, the
>    hub shouldn't notify the http://*bar.com*/atom.xml subscriber.
>    Although if the hub is certain that http://*bar.com*/atom.xml
>    and http://*foo.com*/atom.xml are the same (security concern), it may
>    notify the subscriber.
>
I disagree: the hub should 'blindly' trust the self link. Subscribers MUST
subscribe to the self URL, publishers MUST ping with the self URL, and
everything will work. If either fails, then things are expected to go wrong.


>
>    1. If the hub does notify the subscriber, it should say that
>       http://*bar.com*/atom.xml has changed because that's what the
>       subscriber has subscribed to.
>
In all notifications the hub should send the self link (again, in <link>
or header, based on the content type).


>  *To summarize*: When the hub is distributing content, the Self Link MUST
> be equal to hub.topic of the corresponding subscription.
>
Yes.
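
As a rough illustration of that rule only, here is a minimal Python sketch.
The function names, the (callback, hub.topic) tuples, and the callback URLs
are all hypothetical, and exact string comparison (no URL normalization) is
an assumption. It says nothing about whether a ping on a different URL
should trigger distribution in the first place.

```python
def may_distribute(self_link, hub_topic):
    # Assumption: topics are compared as exact strings, with no normalization.
    return self_link == hub_topic


def recipients(subscriptions, self_link):
    """Return the callbacks whose hub.topic equals the content's self link.

    `subscriptions` is a list of (callback_url, hub_topic) pairs; this data
    layout is hypothetical and chosen only to keep the sketch small.
    """
    return [
        callback
        for callback, hub_topic in subscriptions
        if may_distribute(self_link, hub_topic)
    ]


# The three topics from the scenario above, each with one made-up subscriber.
subscriptions = [
    ("http://sub-a.example/callback", "http://foo.com/atom.xml"),
    ("http://sub-b.example/callback", "http://foo.com/feeds/atom.xml"),
    ("http://sub-c.example/callback", "http://bar.com/atom.xml"),
]

# Content whose self link is bar.com may only go to the bar.com subscription.
matching = recipients(subscriptions, "http://bar.com/atom.xml")
```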


>
> *3.* Retries.
>
> It's important to make the protocol as simple as possible for the
> publishers because we'd love to see more publishers participating in the
> hubbub communication. We don't expect the number of subscribers to be very
> high, so it's OK if their part is a bit more involved.
>

I guess that's your specific Google use case, but other people disagree
with this :)


>
> Permanent subscriptions require retries of subscription verification
> requests because without retries they can't be relied upon and subscribers
> will have to issue resubscription requests on their own. So if we allow
> permanent subscriptions, we also must require the hub to retry subscription
> verification requests.
>
I thought we agreed on forgetting about "permanent" subscriptions. In
practice, some subscriptions may be infinite, but only because they are
renewed by the subscriber before they expire.


> Subscribers MUST keep track of the topics they are subscribed to in order
> to handle subscription verification requests correctly. Thus, it's not too
> much extra work for them to periodically resubscribe to all topics.
>
Agreed again. Some hubs may decide, as a courtesy, to provide automatic
retries on their end so that subscribers don't need to run their own
cronjob.
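
To make the subscriber-side renewal concrete, here is a minimal Python
sketch of what such a cronjob might compute. The Subscription record, the
topics_to_renew helper, and the one-hour safety margin are assumptions made
for illustration; none of them come from the spec. The lease stored is the
one the hub actually granted, since the hub may change the requested value.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    topic: str            # the hub.topic the subscriber asked for
    confirmed_at: float   # Unix time of the last successful verification
    lease_seconds: int    # lease granted by the hub, not the one requested

    @property
    def expires_at(self):
        return self.confirmed_at + self.lease_seconds


def topics_to_renew(subscriptions, now, margin_seconds=3600):
    """Return topics whose lease ends within `margin_seconds` of `now`.

    Already-expired subscriptions are included as well, since they also
    need a fresh subscription request.
    """
    return [
        s.topic
        for s in subscriptions
        if s.expires_at - now <= margin_seconds
    ]
```

A periodic job would call topics_to_renew with the current time and send a
new subscription request for each returned topic.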


> Even if permanent subscriptions were allowed, I expect non-trivial
> subscribers to periodically resubscribe to all their topics anyway for
> increased robustness. And since the number of subscribers is likely to stay
> relatively low, most of them will be non-trivial and thus can easily afford
> that.
>
That's fine with me.


>
> *Conclusion*: I'm convinced that it's a responsibility of subscribers to
> retry subscription requests. The hub should not do it. Section Automatic
> Subscription Refreshing should be removed from the spec.
>
I would rephrase this as: the subscriber should not expect the hub to do
that, unless otherwise stated (outside of the spec).


>
> *4.* Subscription validation and denied subscriptions.
>
> The hub is allowed to deny a subscription at any point. For example, when
> a social network user makes her profile private, all the profile's
> subscribers will get denied. There is a period of time when the
> subscription is already invalidated by the hub but the subscriber isn't yet
> notified.
>
Seconds at most?


> Although this period is usually short, it still exists, meaning that the
> subscriber can't assume that it's receiving updates on a topic just because
> it hasn't seen the deny notification yet.
>
I'd argue that this doesn't matter, because the publisher's denial will be
received by the subscriber before the publisher publishes anything new.
Both messages take the same "route" (publisher -> hub -> subscriber), and
since PubSubHubbub is intended for 'low' frequency messages, the subscriber
would know about the denial before anything else is published by the
publisher.


>
> Now, let's consider two subscription protocols.
>
> Subscription protocol A:
>
>    - Hub receives a subscription request.
>    - It validates the subscription (is this subscriber allowed to
>    subscribe?). If validation fails, it notifies the subscriber.
>    - The hub issues a subscription verification request to verify the
>    intent of the subscriber. If it fails, the subscription request is ignored.
>    - The subscription is marked as validated. The subscriber starts
>    receiving content updates.
>
> Subscription protocol B:
>
>    - Hub receives a subscription request.
>    - It issues a subscription verification request. If it fails, the
>    subscription request is ignored.
>    - The hub validates the subscription. If validation fails, it notifies
>    the subscriber.
>    - The subscription is marked as validated. The subscriber starts
>    receiving content updates.
>
> Both protocols also have a separate background process that can invalidate
> any subscription at any moment (described at the start of this section).
>
> First of all, please note that protocol B is simpler for the subscriber:
> the subscriber is guaranteed to receive a subscription verification request
> from the hub after sending a subscription request.
>
True.

> In protocol A this is not the case: the subscriber can also receive the
> deny notification instead. In both protocols the subscriber can receive a
> deny notification after the verification request.
>
>

> Protocol A, while being more complex, doesn't have any advantages that I
> can see. It may look like it gives the subscriber more information, but
> it's not actually the case. In both protocols there is a period of time
> when the hub considers a subscription invalid but the subscriber doesn't
> know about it yet.
>
> *My suggestion*: pick protocol B.
>

I understand your point better now: in other words, if the subscription is
not denied, then it is considered accepted. My only concern with B is
that there is a period (between the verification request and the first
notification) during which the subscriber doesn't know whether it is going
to get data or not.
I think this is bad, because it means that subscribers may retry over
and over again, as they do not know whether things are in place...
We need a way to tell the subscriber that the subscription is good.
This is where protocol A wins (for me at least!), because if the
verification is successful, then the subscriber _knows_ the state of the
subscription...
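
The difference can be sketched as what the subscriber is able to conclude
from the events it has seen so far. This is a simplified Python model; the
State names, the event strings, and the subscriber_view function are
hypothetical. "ACTIVE" here only means "as far as the subscriber can tell",
since in both protocols the hub may still deny the subscription later.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"  # request sent, nothing heard back yet
    UNKNOWN = "unknown"      # intent verified, validation outcome not known
    ACTIVE = "active"        # subscriber believes updates will arrive
    DENIED = "denied"        # hub sent a deny notification


def subscriber_view(protocol, events):
    """Fold the events seen by the subscriber into its view of the state.

    `protocol` is "A" (validate, then verify) or "B" (verify, then validate).
    `events` is a sequence of "verified", "denied", or "content".
    """
    state = State.REQUESTED
    for event in events:
        if event == "denied":
            state = State.DENIED
        elif event == "verified":
            # In A, validation has already passed by the time verification
            # happens; in B, validation is still pending at this point.
            state = State.ACTIVE if protocol == "A" else State.UNKNOWN
        elif event == "content" and state is not State.DENIED:
            state = State.ACTIVE
    return state
```

Under this model, a protocol B subscriber that has only seen the
verification request stays in UNKNOWN until either content or a deny
notification arrives, which is the window described above.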


Julien



>
> Roman.
>
