Hi Julien,

Thanks for your response; you raise some excellent points. Comments inline.
On Jul 15, 2:21 pm, Julien Genestoux <[email protected]> wrote:
> Dan,
> Sorry for the late reply. First, I'd like to thank you and the FeedBurner
> team for the efforts.
> A few comments:
>
> - *When a hub crawls a proxied feed, the proxy should treat it as a ping.*
> I need to think about this a bit. Superfeedr acts as a "default" hub, which
> means that we work for any feed, by polling on behalf of the subscriber(s).
> This may cause issues, as we guarantee our subscribers that we will poll the
> feed at least once a day, to tell them about the status of the feed. I
> understand this goes a little bit beyond the regular PubSubHubbub approach,
> but we may need to implement something specific on our end so we don't
> identify our pollers as the hub when we're not the designated hub.
> Another solution would be that when the proxy gets a request from a hub, it
> should check whether or not the hub is the designated hub.

If a hub crawls a feed once or a few times a day without actually having
received a ping, I don't think that should pose a problem for a feed proxy
or for the platform hosting the source feed -- it's not very much traffic,
and if the proxy winds up crawling the source feed, it shouldn't cause any
trouble. What's important is that feed proxies don't crawl source feeds
constantly / every time they get a request for the feed, since that hurts
feed serving performance and can DoS the source feed platform.

> - *When hub links appear in the source feed, proxies should subscribe to
> them.*
> While I agree with this, your comment that the proxy may consider the
> fat ping as a light ping is worrisome, as there are cases where the source
> feed isn't updated yet (caches...), and polling it (by not trusting the
> hub) could result in inaccurate content and a delay as long as the proxy
> didn't poll the source feed while it hasn't been updated.
> [This is the behavior that Superfeedr (and I believe other subscribers)
> has had for a long time, which caused us a lot of issues with FeedBurner
> specifically.]

I think this would lead to a problem when either:

- the hub crawls the source feed and gets updates, but the proxy crawls it
  and sees stale content, or
- the platform sends a fat ping to the hub directly, but crawls of the feed
  yield stale content

In an ideal platform, neither of these things should happen -- if caches
are used by the platform hosting a source feed, they should be invalidated
when a new post appears in it and the ping to the hub goes out, so that
subsequent crawls will return the new post. While FeedBurner used to have
some problems in this area, they were fixed several months ago. (Please let
us know if you're still seeing any problems with latency in updates.)

One reason that FeedBurner can't use the fat pings from hubs directly is
that the pings contain only deltas, and don't tell us:

- which items have fallen off the feed, as opposed to items that remain
  unchanged but are still present in the feed
- where items that were added to the feed should be inserted relative to
  existing, unmodified items in the feed

So it's not possible to reconstruct the source feed with the same content
you would get if you crawled it. This may not matter for some applications,
but it does matter to ours, since clients crawling feeds expect the full
feed.

Another reason is that if something goes wrong and we don't receive a ping
from a hub, and then receive another from it, we would miss the new items
or updates sent in the first ping if we only applied the deltas to state we
maintained internally.

Crawling the source feed when we get a fat ping from a hub, caching issues
aside, ensures we get all of the latest content for the source feed. So far
this approach has been working well for us. Are there other situations in
particular that you're concerned about?
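To make the delta-ambiguity point concrete, here's a toy illustration (not real PubSubHubbub code; the item names and the `consistent` helper are invented for this sketch). Given the proxy's last-crawled state plus the entries in a fat ping, two different source feeds are equally consistent with what the proxy knows, so the delta alone can't tell it which one the publisher actually serves:

```python
# Toy illustration of why fat-ping deltas alone can't reconstruct a feed.
cached = ["item-3", "item-2", "item-1"]  # proxy's last crawl of the feed
delta = ["item-4"]                       # entries carried in the fat ping

# Possibility A: item-4 was prepended and nothing fell off the feed.
source_a = ["item-4", "item-3", "item-2", "item-1"]
# Possibility B: item-4 was prepended and item-1 fell off the feed.
source_b = ["item-4", "item-3", "item-2"]

def consistent(source, cached, delta):
    # All the ping tells us: every delta entry is in the source feed, and
    # every source entry is either from the delta or was previously cached.
    return (all(d in source for d in delta) and
            all(e in delta or e in cached for e in source))

print(consistent(source_a, cached, delta))  # True
print(consistent(source_b, cached, delta))  # True -- ambiguous
```

Both possibilities check out, which is why (in our setup) the safe move on a fat ping is to re-crawl the source feed for the authoritative full state.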
> Apart from this, it would be very interesting if proxies, when they
> subscribe to the source feed's designated hub, mentioned the number of
> subscribers that they themselves have for the proxied feed. This way the
> original hub can report that number to the publisher. I understand there
> may be some dupes, but at least there will be data. However, this is
> certainly a secondary issue.

Sounds like an excellent suggestion. Is this something we would embed into
the User-Agent header for the request when making the hub subscription? (I
wasn't aware of a protocol for doing this in the spec.)

> Thanks again for the hard work on this.
>
> Cheers,
>
> Julien
>
> On Wed, Jul 14, 2010 at 10:48 PM, Dan <[email protected]> wrote:
> > Hi all,
> >
> > I wrote up a doc on the Pubsubhubbub wiki describing some problems
> > that arise getting hubs and feed proxies to work together nicely, and
> > an approach for solving those problems:
> >
> > http://code.google.com/p/pubsubhubbub/wiki/HubsAndFeedProxies
> >
> > The approach described is (roughly speaking) the approach taken by
> > FeedBurner and the reference hub.
> >
> > Any comments you could provide would be great. One thing in particular
> > we need to figure out: what convention we should adopt for the HTTP
> > headers used when hubs crawl feeds. Some possible conventions are
> > described in the "Identifying hubs" section... please let me know what
> > you think a good convention would be.
> >
> > Thanks!
> >
> > Dan Rodney
> > FeedBurner
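For what it's worth, here's one way the subscriber-count suggestion above could look if we did embed it in the User-Agent header. The `hub.*` form fields follow the PubSubHubbub subscription request; the User-Agent convention, the product name, and all the example URLs are invented here purely for illustration -- nothing in the spec defines this:

```python
# Hypothetical sketch: a feed proxy advertising its downstream subscriber
# count in the User-Agent when it subscribes at the source feed's hub.
from urllib.parse import urlencode

def build_subscribe_request(hub_url, topic, callback, subscriber_count):
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # Invented convention: report downstream subscribers in the UA.
        "User-Agent": "ExampleFeedProxy/1.0 (subscribers=%d)" % subscriber_count,
    }
    body = urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
        "hub.verify": "async",
    })
    return hub_url, headers, body

url, headers, body = build_subscribe_request(
    "https://hub.example.com/", "https://example.com/feed",
    "https://proxy.example.com/callback", 1234)
print(headers["User-Agent"])  # ExampleFeedProxy/1.0 (subscribers=1234)
```

A hub could then parse the `subscribers=` token out of the User-Agent and add it to the counts it reports to the publisher, with the understood caveat about dupes.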
