Re: Autodiscovery
Phil Ringnalda wrote: Arve Bersvendsen wrote: On Tue, 03 May 2005 18:52:59 +0200, Tim Bray [EMAIL PROTECTED] wrote: http://diveintomark.org/rfc/draft-ietf-atompub-autodiscovery-01.txt 1) Change the attribute value for the rel from alternate to feed, Don't forget, since you would be doing that primarily for people who think too much, that you'll also need to include a profile [1] URI and make a guess at what dereferencing that URI ought to return, and probably take a stab at explaining how to deal with multiple profiles, since the HTML spec punted on that. This would not be necessary if 'feed' were added to the HTML standard directly. Popularizing feed would have one benefit outside Atom's scope, though: there's currently no useful way for an RSS 1.0 feed to do autodiscovery with type=application/rdf+xml since it could be any alternate RDF, not just RSS: if Atom breaks the ice with feed then RSS 1.0 wins. 'feed' is not really defining a /relation/, it's defining a sort of meta-content-type... But I would much prefer that to forcing 'alternate' on non-'alternate' links. ~fantasai (Copying to WHATWG mailing list: http://www.whatwg.org/ )
Re: Autodiscovery
On May 4, 2005, at 02:56, David Nesting wrote: Plus, feed is kind of application-specific. What about related? It's a spec for discovering *feeds*. It is proper to have an app-specific rel value to avoid feed-specific apps downloading non-feed related documents. -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
On Apr 29, 2005, at 12:17, Martin Duerst wrote: Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words. It doesn't work for languages that don't have spaces between words (Chinese, Japanese, Thai,...). If Text elements are only used for short things such as names or titles, that's not a big issue, the text in question can just be put on a single line. However, when the texts in question are long, it's a serious issue, and should be fixed. You seem to be assuming that the length of a line is restricted in XML source. Why? As far as I can tell, it should be permissible to produce Atom documents that contain no LF or CR characters. Can't languages without spaces use long source lines and apply soft wrapping in a source view if necessary? Why is this a wire format problem? -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: Autodiscovery
On 4/5/05 3:52 PM, fantasai [EMAIL PROTECTED] wrote: 'feed' is not really defining a /relation/, it's defining a sort of meta-content-type... But I would much prefer that to forcing 'alternate' on non-'alternate' links. instead of feed, consider updates, which gets closer to the gist of the sense e.
Re: AutoDiscovery
Randy Charles Morin wrote: +1 to adding lang as an attribute to link thanks Robert link lang='en' ... The HTML and XHTML specifications already define that. -- Anne van Kesteren http://annevankesteren.nl/
Re: Atom feed refresh rates
Brett Lindsley wrote: Andy, I recall bringing up the same issue with respect to portable devices. My angle was that firing up the transmitter, making a network connection and connecting to the server is still an expensive operation in time and power (for a portable device) - even if the server returns nothing . There is no reason to check feeds that are not being updated, but then, there currently is no way to know this. I recall there was a proposal on cache control. That seemed like a good direction, but I don't recall it being discussed. As you indicated, if the feed had some element that indicated it won't be updated (for example) for another day (e.g. a daily news summary), then the end client would need to only check once a day. Brett Lindsley, Motorola Labs Isn't this what the HTTP Expires header is for (http://greenbytes.de/tech/webdav/rfc2616.html#header.expires)? Best regards, Julian
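As a rough illustration of Julian's point, a client could derive its next poll time from the Expires header roughly as follows. This is only a sketch: the function name `next_poll_time` and the one-hour fallback are assumptions made for the example, not anything taken from the thread or from a spec.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def next_poll_time(expires_header, now, fallback=timedelta(hours=1)):
    """Return the earliest time a polite client should fetch the feed again.

    expires_header: raw value of the HTTP Expires header, or None.
    now: timezone-aware datetime for the current time.
    fallback: hypothetical default interval when Expires is unusable.
    """
    if expires_header:
        try:
            expires = parsedate_to_datetime(expires_header)
        except (TypeError, ValueError):
            # Unparseable values (e.g. "0") are treated as already expired.
            expires = None
        if expires is not None:
            if expires.tzinfo is None:
                # HTTP dates are always GMT; make naive results comparable.
                expires = expires.replace(tzinfo=timezone.utc)
            if expires > now:
                return expires
    return now + fallback
```

A server publishing a daily summary could then send an Expires value one day ahead, and a client using a rule like this would not come back before then.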
Re: Atom feed refresh rates
In reviewing the protocol spec (and the basic protocol spec), there is no mention of recommended HTTP headers. There are examples in the basic protocol spec that show ETag and Last-Modified but not Expires. Maybe there should be a section in the protocol spec showing recommended headers (a SHOULD) for use with Atom feeds. This would encourage the use of these three headers. Brett Lindsley, Motorola Labs. Julian Reschke wrote: Brett Lindsley wrote: Andy, I recall bringing up the same issue with respect to portable devices. My angle was that firing up the transmitter, making a network connection and connecting to the server is still an expensive operation in time and power (for a portable device) - even if the server returns nothing. There is no reason to check feeds that are not being updated, but then, there currently is no way to know this. I recall there was a proposal on cache control. That seemed like a good direction, but I don't recall it being discussed. As you indicated, if the feed had some element that indicated it won't be updated (for example) for another day (e.g. a daily news summary), then the end client would need to only check once a day. Brett Lindsley, Motorola Labs Isn't this what the HTTP Expires header is for (http://greenbytes.de/tech/webdav/rfc2616.html#header.expires)? Best regards, Julian
RE: Atom feed refresh rates
Isn't this what the HTTP Expires header is for (http://greenbytes.de/tech/webdav/rfc2616.html#header.expires)? I don't think this helps a lot with my original issue because in many cases a feed's updater will either not know when they will next update the feed, or will be updating the feed frequently throughout the day. Andy
Re: Atom feed refresh rates
On 4 May 2005, at 9:10 am, Andy Henderson wrote: I am adding Atom support to my Agg. For RSS feeds, I have used the ttl and sy:updatePeriod / sy:updateFrequency elements to allow feed providers to limit refresh rates. Why? I have, in any case, imposed a minimum refresh rate of one hour - because that seemed the decent thing to do. This is a myth perpetuated by cheapskate bloggers. There's no technical reason for it beyond I bought a lousy hosting package. However, I'm coming under pressure to reduce that minimum limit for feeds that are clearly designed for shorter refresh periods - such as the Gmail Atom feeds. I'm reluctant to implement a free-for-all so I'm looking for guidance on how I should tackle this issue. Keep the global setting for all feeds limited to 60 (or 30) minutes, but allow the setting for individual feeds to be set lower. Graham
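Graham's suggestion (a global minimum that individual feeds may override downward) can be sketched in a few lines. The names `GLOBAL_MIN_MINUTES` and `effective_interval` are made up for this example, and the 60-minute figure is simply the one mentioned in the message.

```python
GLOBAL_MIN_MINUTES = 60  # default floor applied to every feed


def effective_interval(requested_minutes, per_feed_min=None,
                       global_min=GLOBAL_MIN_MINUTES):
    """Return the refresh interval (in minutes) the aggregator will use.

    requested_minutes: what the user asked for.
    per_feed_min: optional lower floor set explicitly for one feed
                  (e.g. a Gmail feed designed for short refresh periods).
    """
    floor = per_feed_min if per_feed_min is not None else global_min
    return max(requested_minutes, floor)
```

With this rule a request for 5 minutes is raised to 60 by default, but is honoured down to whatever floor has been set for that particular feed.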
Re: Atom feed refresh rates
Andy Henderson wrote: Isn't this what the HTTP Expires header is for (http://greenbytes.de/tech/webdav/rfc2616.html#header.expires)? I don't think this helps a lot with my original issue because in many cases a feed's updater will either not know when they will next update the feed, or will be updating the feed frequently throughout the day. If they don't know that, how can the previous response you got help you in determining when to poll next? Best regards, Julian
Re: Atom feed refresh rates
On 4 May 2005, at 11:44 am, Brett Lindsley wrote: There is no reason to check feeds that are not being updated, but then, there currently is no way to know this. plug plug: http://www.fondantfancies.com/apps/shrook/distfaq.php As you indicated, if the feed had some element that indicated it won't be updated (for example) for another day (e.g. a daily news summary), then the end client would need to only check once a day. Please don't confuse bandwidth (number of posts per day) with latency (checking rate). They're largely unrelated. You could only check once per day if the daily summary appeared at an exact, known time, was never late, and was never updated later. Graham
Re: Autodiscovery
On Tuesday, May 3, 2005, at 11:41 PM, fantasai wrote: David Nesting wrote: I expect that many of my implementations will utilize content negotiation (using the same URL as an HTML representation, where needed), so I expect that I'll have some links like: link rel=alternate href=/ type=application/atom+xml link rel=alternate href=/ type=application/rss+xml Or even link rel=alternate href= type=application/atom+xml link rel=alternate href= type=application/rss+xml That won't work, because content negotiation will continue to return the same thing it returned just now. You must somehow tell the server to return a specific other version of the current document, and you do that typically by sending a GET request with a different URL -- one that specifies a particular version of the resource. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14 GET /path-to-the-feed HTTP/1.1 Accept: application/atom+xml ... You don't have to change the URL--just list only the format you want in the Accept header. If the autodiscovery link was lying/mistaken and that format really isn't available at that URL, you should get a 406 (not acceptable) response.
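The server-side behaviour Antone describes (serve the one format listed in Accept, or answer 406) might look roughly like the sketch below. It is deliberately simplified: `negotiate` is a hypothetical helper, it only understands media ranges and q-values, and it assumes the client really does send a restricted Accept header, which nothing in the thread guarantees.

```python
def negotiate(accept_header, available):
    """Choose a representation for a simplified Accept header.

    available: media types the resource can be served as, in server
               preference order.
    Returns (status_code, media_type_or_None).
    """
    if not accept_header.strip():
        accept_header = "*/*"  # no preference expressed: anything goes

    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((q, media))

    # Highest q first; the sort is stable, so header order breaks ties.
    prefs.sort(key=lambda pref: -pref[0])

    for q, media in prefs:
        if q <= 0:
            continue
        for candidate in available:
            if media in ("*/*", candidate):
                return 200, candidate
            if media.endswith("/*") and candidate.startswith(media[:-1]):
                return 200, candidate
    return 406, None
```

For example, `negotiate("application/atom+xml", ["text/html", "application/atom+xml"])` selects the Atom representation, while asking for `application/rss+xml` against the same list yields the 406 case.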
Re: Autodiscovery
On 5/4/05, fantasai [EMAIL PROTECTED] wrote: Who's to say we can't overload it a little for this case? You are not writing the HTML 4.01 spec, you're writing an autodiscovery spec that takes advantage of the syntax *and semantics* given in HTML 4. Your specification should be consistent with HTML 4, not contradictory to it. The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. It is not contradictory. Robert Sayre
Re: Atom feed refresh rates
PaceCaching uses the HTTP model for Atom, whether Atom is used over HTTP or some other protocol. PaceCaching was rejected by the editors because it was too late (two months ago) and non-core. I think that: a) it is never too late to get it right, and b) scalability is core. The PACE describes why refresh rates do not solve the problem adequately. wunder --On May 4, 2005 5:44:18 AM -0500 Brett Lindsley [EMAIL PROTECTED] wrote: Andy, I recall bringing up the same issue with respect to portable devices. My angle was that firing up the transmitter, making a network connection and connecting to the server is still an expensive operation in time and power (for a portable device) - even if the server returns nothing . There is no reason to check feeds that are not being updated, but then, there currently is no way to know this. I recall there was a proposal on cache control. That seemed like a good direction, but I don't recall it being discussed. As you indicated, if the feed had some element that indicated it won't be updated (for example) for another day (e.g. a daily news summary), then the end client would need to only check once a day. Brett Lindsley, Motorola Labs Andy Henderson wrote: If I'm asking this in the wrong place, sorry; please redirect me if you can. I am the author of an Aggregator and I'm looking for advice on refresh rates. There was some discussion in this group back in June about a possible 'Refresh rate' element. That seems to have been dismissed in favour of bandwidth throttling techniques, notably etag, last-modified and compression. I already support all these plus some additional ones. I am uncomfortable, though, with the implication that refresh rates don't matter and should be left to the end-user to decide. I am adding Atom support to my Agg. For RSS feeds, I have used the ttl and sy:updatePeriod / sy:updateFrequency elements to allow feed providers to limit refresh rates. 
I have, in any case, imposed a minimum refresh rate of one hour - because that seemed the decent thing to do. However, I'm coming under pressure to reduce that minimum limit for feeds that are clearly designed for shorter refresh periods - such as the Gmail Atom feeds. I'm reluctant to implement a free-for-all so I'm looking for guidance on how I should tackle this issue. Andy Henderson Constructive IT Advice -- Walter Underwood Principal Architect, Verity
Re: Atom feed refresh rates
On 5/4/05, Walter Underwood [EMAIL PROTECTED] wrote: PaceCaching uses the HTTP model for Atom, whether Atom is used over HTTP or some other protocol. PaceCaching was rejected by the editors because it was too late (two months ago) and non-core. In this WG, the editors don't reject proposals or schedule issues. Those tasks fall to the chairs and secretary, respectively. Robert Sayre
Re: Atom feed refresh rates
On 5/5/05 12:44 AM, Graham [EMAIL PROTECTED] wrote: uses 3GB a day, or about $1.20 at current prices. only in some parts of the world. over here I'm paying 13.2 cents per K and reading from a recent bill 2,982.61 Kbytes cost me $393.79 AUD. e.
Re: Autodiscovery
On 5/4/05, Robert Sayre [EMAIL PROTECTED] wrote: On 5/4/05, fantasai [EMAIL PROTECTED] wrote: Who's to say we can't overload it a little for this case? You are not writing the HTML 4.01 spec, you're writing an autodiscovery spec that takes advantage of the syntax *and semantics* given in HTML 4. Your specification should be consistent with HTML 4, not contradictory to it. The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. It is not contradictory. +1 -- Joe Gregorio http://bitworking.org
Re: Atom feed refresh rates
On May 4, 2005, at 7:44 AM, Graham wrote: A quick look at that site turned up only one other site actually complaining, MSDN, and they changed their minds: Actually, as I recall, last time this came up (proposed by Walter Underwood), someone pointed out accurately that RSS2 has had this functionality for a long time and that nobody ever really implemented it; thus there was a strong vote from experience against such a feature. -Tim
Re: Atom feed refresh rates
This is a myth perpetuated by cheapskate bloggers. There's no technical reason for it beyond I bought a lousy hosting package. Graham: I disagree. In a time where referrer and trackback spam agents are hammering servers everywhere, it's quite reasonable for aggregator developers to exhibit restraint and not add to the burden that the blogosphere has unintentionally created. That's not to say that there's something necessarily wrong with an aggregator that allows users to pull feeds every five minutes. If you're building something for people who are going to be subscribing to Gmail feeds or referrer logs (I'm subscribed to both in Newzcrawler), then you have to cater to those needs. The most anyone can ask is that you provide reasonable defaults and leave it at that. But I've got my own code set to limit refreshes to an hour or more, and don't foresee changing it. It's the right thing for *me* to do. -- Roger Benningfield
Re: Autodiscovery
Arve Bersvendsen wrote: On Wed, 04 May 2005 09:43:38 +0200, Eric Scheid [EMAIL PROTECTED] wrote: instead of feed, consider updates, which gets closer to the gist of the sense No. To me 'Updates' signifies that something is 'updated'. Even posting new content falls outside of that definition. That would signify updates to my document. If I'm linking to the CNN news feed, or my-favorite-blog, that wouldn't be appropriate. For this purpose, the syntax needs to signify that this is a feed, that it needs to be handled as such.. and that there is no other significant relationship between the document and the feed it links to (unless otherwise specified). ~fantasai
Re: Autodiscovery
Robert Sayre wrote: On 5/4/05, fantasai [EMAIL PROTECTED] wrote: Who's to say we can't overload it a little for this case? You are not writing the HTML 4.01 spec, you're writing an autodiscovery spec that takes advantage of the syntax *and semantics* given in HTML 4. Your specification should be consistent with HTML 4, not contradictory to it. The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. It is not contradictory. The definition of 'alternate' is not one line long on my screen, but here's the first sentence of it: # Alternate # Designates substitute versions for the document in which the link occurs. -- http://www.w3.org/TR/REC-html40/types.html#h-6.12 How is a link from the top of my homepage to my friend's weblog feed designating a substitute version for the document in which the link occurs? Note that we are not arguing the semantics of the link element in an Atom document, but the semantics of the link element in an HTML document. ~fantasai
Re: Autodiscovery
On 5/4/05, fantasai [EMAIL PROTECTED] wrote: The definition of 'alternate' is not one line long on my screen, but here's the first sentence of it: # Alternate # Designates substitute versions for the document in which the link occurs. -- http://www.w3.org/TR/REC-html40/types.html#h-6.12 How is a link from the top of my homepage to my friend's weblog feed designating a substitute version for the document in which the link occurs? I don't know, but I'm not sure why you think that's what the autodiscovery spec endorses. Is there some part of the spec that endorses that? The autodiscovery spec is for use by UAs like Mozilla and Safari that present little icons alerting the user to a feed version of a page. Often, I never visit the page again, once I've subscribed to the feed. The feed is a substitute. Note that we are not arguing the semantics of the link element in an Atom document, but the semantics of the link element in an HTML document. Yes, I caught that. Robert Sayre
Re: Autodiscovery
On 4/5/05 11:11 PM, Robert Sayre [EMAIL PROTECTED] wrote: The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. how is a feed of recent entries a substitute version for the document in which the link occurs when that document is some blog post long since dropped out of the feed? Alternate Designates substitute versions for the document in which the link occurs. When used together with the lang attribute, it implies a translated version of the document. When used together with the media attribute, it implies a version designed for a different medium (or media).
Re: Autodiscovery
Robert Sayre wrote: The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. It is not contradictory. But a feed is not a substitute version of an archive page as most archived entries are not in the feed anymore. That said, I'm totally in favor of using rel=alternate to link to a feed from the _alternate_ HTML version. From an archive page, you should rather use rel=start. Actually, here's my view of those things: In the latest news page (generally the homepage for a weblog): link rel=alternate type=application/atom+xml href=feed.atom In a category page: link rel=start type=text/html href=../index.html link rel=start type=application/atom+xml href=../feed.atom link rel=alternate type=application/atom+xml href=category.atom link rel=section type=text/html href=category/index.html link rel=section type=application/atom+xml href=category/category.atom In a single-entry archive page: link rel=start type=text/html href=../../index.html link rel=start type=application/atom+xml href=../../feed.atom link rel=section type=text/html href=../index.html link rel=section type=application/atom+xml href=../category.atom !-- no alternate -- However, is this enough for the autodiscovery purpose? -- Thomas Broyer
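To make the rel/type combinations above concrete, here is a minimal sketch of how a client might pull feed links out of a page using Python's standard html.parser module. `FeedLinkFinder`, `find_feeds` and the `wanted_rels` parameter are names invented for this example; which rel values a consumer should accept is exactly the open question in this thread, so the default of "alternate" is an assumption, not a recommendation.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect <link> elements whose type attribute names a feed format."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (rels, media_type, href)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        media_type = attr.get("type", "").strip().lower()
        if media_type in FEED_TYPES and "href" in attr:
            self.links.append((rels, media_type, attr["href"]))


def find_feeds(html, wanted_rels=("alternate",)):
    """Return feed links whose rel contains any of wanted_rels."""
    finder = FeedLinkFinder()
    finder.feed(html)
    wanted = set(wanted_rels)
    return [link for link in finder.links if wanted & set(link[0])]
```

Run against the category-page markup above, the default call returns only category.atom; passing `wanted_rels=("alternate", "start")` would also return ../feed.atom.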
Re: Autodiscovery
how is a feed of recent entries a substitute version for the document in which the link occurs when that document is some blog post long since dropped out of the feed? Eric: A devil's advocacy moment... if I change the published date for the document to today's date, it will suddenly spring forward into my feed of recent entries. And at some point in the past, it was already in that feed. -- Roger Benningfield
Re: Atom feed refresh rates
On May 4, 2005, at 3:44 AM, Brett Lindsley wrote: Andy, I recall bringing up the same issue with respect to portable devices. My angle was that firing up the transmitter, making a network connection and connecting to the server is still an expensive operation in time and power (for a portable device) - even if the server returns nothing. There is no reason to check feeds that are not being updated, but then, there currently is no way to know this. As the author of an aggregator app for a portable wireless device I can tell you that this is a serious problem for this class of products. In my app I've implemented every trick in the book to try and reduce the amount of data that I have to pull through the radio and parse. I use If-None-Match and If-Modified-Since headers in my requests, I support compression, I respect caching hints from the servers. It doesn't help in all cases. I have 112 feeds loaded up in my aggregator and only 74 of the servers hosting those feeds ever return a 304. The rest give me a 200 and gladly hand me everything regardless of whether it has changed or not. 17 of the servers don't bother supplying an ETag header. My feed list amounts to about 20 MB of data per day when polling once per hour. That is a lot of air time for a small radio, and a lot of time spent grinding in an XML parser for a small CPU. This is especially upsetting because by my measurements only about 2 MB of data is fresh for any given day. The main hit is in battery life: the above stats can trivially knock HOURS off of the life of a small battery. I've written extensively about this problem here: http://www.desalvo.org/blog/?p=230 with a real-world example studied here: http://www.desalvo.org/blog/?p=232 So, I guess I'd like to see an optional update-frequency hint element. Thanks, Chris
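For readers unfamiliar with the conditional-GET technique Chris refers to, the client side can be sketched as below. The standard HTTP request headers are If-None-Match and If-Modified-Since; the helper names `conditional_headers` and `handle_response` and the shape of the `cache` dictionary are assumptions for illustration only, and no network call is made here.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on the previous fetch."""
    headers = {"Accept-Encoding": "gzip"}  # ask for a compressed body
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def handle_response(status, response_headers, body, cache):
    """Update the per-feed cache; return True if the body needs parsing."""
    if status == 304:
        # Not modified: nothing was transferred, keep the cached copy.
        return False
    if status == 200:
        # Remember the validators so the next request can be conditional.
        cache["etag"] = response_headers.get("ETag")
        cache["last_modified"] = response_headers.get("Last-Modified")
        cache["body"] = body
        return True
    return False
```

The saving only materialises when the server cooperates: a server that never sends ETag or Last-Modified, or that always answers 200, forces a full download and parse on every poll, which is the situation described for the 38 of 112 feeds that never return a 304.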
Re: Autodiscovery
* Eric Scheid [EMAIL PROTECTED] [2005-05-05 02:35+1000] On 4/5/05 11:11 PM, Robert Sayre [EMAIL PROTECTED] wrote: The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. how is a feed of recent entries a substitute version for the document in which the link occurs when that document is some blog post long since dropped out of the feed? Because the HTML definition is close to meaningless. I can substitute any document for another, and the 2nd is a substitution not through any intrinsic characteristics, but because it was substituted. Many of the HTML link type definitions don't bear up under detailed scrutiny... Dan
Re: Atom feed refresh rates
On 4 May 2005, at 7:11 pm, Chris DeSalvo wrote: My feed list amounts to about 20 MB of data per day when polling once per hour. That is a lot of air time for a small radio, and a lot of time spent grinding in an XML parser for a small CPU. This is especially upsetting because by my measurements only about 2 MB of data is fresh for any given day. The main hit is in battery life: the above stats can trivially knock HOURS off of the life of a small battery. So you're saying the first smartphone aggregator that uses a gateway server to move the heavy lifting off of the device is going to clean up the market. What's this got to do with Atom? So, I guess I'd like to see an optional update-frequency hint element. Why? Graham
Re: Autodiscovery
Dan Brickley wrote: * Eric Scheid [EMAIL PROTECTED] [2005-05-05 02:35+1000] On 4/5/05 11:11 PM, Robert Sayre [EMAIL PROTECTED] wrote: The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. how is a feed of recent entries a substitute version for the document in which the link occurs when that document is some blog post long since dropped out of the feed? Because the HTML definition is close to meaningless. I can substitute any document for another, and the 2nd is a substitution not through any intrinsic characteristics, but because it was substituted. Many of the HTML link type definitions don't bear up under detailed scrutiny... I think you're taking your anarchic interpretation a little too far there. Especially there: if you read the *spec*, you might notice that the definition of 'alternate' continues: # When used together with the media attribute, it implies a version # designed for a different medium (or media). From section 12.2.4, we also have # The rel attribute specifies the relationship of the linked document # with the current document. So, according to HTML 4.01 -- which is the definitive spec as far as HTML is concerned -- the following link link rel=alternate type=application/atom+xml href=feed.atom designates a link to a version of the linking document that is application/atom+xml. Again, my friend's blog feed is not an Atom version of /my/ web page; linking to it as alternate would be wrong. ~fantasai
Re: Atom feed refresh rates
On May 4, 2005, at 11:35 AM, Graham wrote: On 4 May 2005, at 7:11 pm, Chris DeSalvo wrote: My feed list amounts to about 20 MB of data per day when polling once per hour. That is a lot of air time for a small radio, and a lot of time spent grinding in an XML parser for a small CPU. This is especially upsetting because by my measurements only about 2 MB of data is fresh for any given day. The main hit is in battery life: the above stats can trivially knock HOURS off of the life of a small battery. So you're saying the first smartphone aggregator that uses a gateway server to move the heavy lifting off of the device is going to clean up the market. What's this got to do with Atom? So, I guess I'd like to see an optional update-frequency hint element. Why? If the feed provided a hint for a reasonable polling frequency, it would be a plus for limited-resource devices. I hate to suggest that the format be changed as a prophylactic measure against bad-citizen servers, but that is the problem that I have to solve for my platform and applications. In case anyone cares, this is for the T-Mobile Sidekick. I work at Danger, Inc, the developer of the OS and hardware. I work on the OS and applications. -chris p.s. And yes, someone providing a good gateway, with a snazzy push protocol would make my life a lot easier.
Re: Autodiscovery
On May 4, 2005, at 11:02 AM, Robert Sayre wrote: I think it would be a mistake to see this as an opportunity to invent a supremely capable and expressive autodiscovery spec. I've seen mozilla, safari, NNW do autodiscovery. I'm sure bots from PubSub, Technorati, Yahoo, etc do it as well. We should document what they do, and settle on one arbitrary choice where they differ for no good reason. Mark's draft does an excellent job of documenting that reality. +1. It's Good Enough. -Tim
Re: Autodiscovery
Antone Roundy wrote: On Tuesday, May 3, 2005, at 11:41 PM, fantasai wrote: David Nesting wrote: I expect that many of my implementations will utilize content negotiation (using the same URL as an HTML representation, where needed), so I expect that I'll have some links like: link rel=alternate href=/ type=application/atom+xml link rel=alternate href=/ type=application/rss+xml Or even link rel=alternate href= type=application/atom+xml link rel=alternate href= type=application/rss+xml That won't work, because content negotiation will continue to return the same thing it returned just now. You must somehow tell the server to return a specific other version of the current document, and you do that typically by sending a GET request with a different URL -- one that specifies a particular version of the resource. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14 GET /path-to-the-feed HTTP/1.1 Accept: application/atom+xml ... You don't have to change the URL--just list only the format you want in the Accept header. If the autodiscovery link was lying/mistaken and that format really isn't available at that URL, you should get a 406 (not acceptable) response. Where does it say that including a 'type' attribute on a link forces the UA to send a restricted Accept header? ~fantasai
Re: Autodiscovery
Antone Roundy wrote: On Wednesday, May 4, 2005, at 12:59 PM, fantasai wrote: Again, my friend's blog feed is not an Atom version of /my/ web page; linking to it as alternate would be wrong. To me, this raises a red flag, suggesting that using an autodiscovery link from your web page to your friend's feed is not what autodiscovery is intended for. +1 Julian
Re: Autodiscovery
Antone Roundy wrote: On Wednesday, May 4, 2005, at 12:59 PM, fantasai wrote: Again, my friend's blog feed is not an Atom version of /my/ web page; linking to it as alternate would be wrong. To me, this raises a red flag, suggesting that using an autodiscovery link from your web page to your friend's feed is not what autodiscovery is intended for. Probably not. But the same argument applies if I have an autodiscovery link from a single entry in my blog to the main blog feed (which is a valid alternate version of my weblog's front page, but not of that single entry). ~fantasai
Re: Autodiscovery
On 5/4/05, Robert Sayre [EMAIL PROTECTED] wrote: Mark's draft does an excellent job of documenting that reality. +1 -joe -- Joe Gregorio http://bitworking.org
Re: Atom feed refresh rates
I do not disagree. I just wanted to get my $0.02 in for completeness. I'm happy as a clam with atom as it is now. -chris On May 4, 2005, at 12:52 PM, Robert Sayre wrote: No one is denying the existence of the problem you're describing. However, this WG has consistently decided that an optional XML element of the kind you're describing wouldn't solve the problem. Essentially, we'd be trading one evangelism problem for another.
Re: Autodiscovery
fantasai wrote: Arve Bersvendsen wrote: On Wed, 04 May 2005 09:43:38 +0200, Eric Scheid [EMAIL PROTECTED] wrote: instead of feed, consider updates, which gets closer to the gist of the sense No. To me 'Updates' signifies that something is 'updated'. Even posting new content falls outside of that definition. That would signify updates to my document. If I'm linking to the CNN news feed, or my-favorite-blog, that wouldn't be appropriate. For this purpose, the syntax needs to signify that this is a feed, that it needs to be handled as such.. and that there is no other significant relationship between the document and the feed it links to (unless otherwise specified). ~fantasai These are both valid interpretations of updates. From Princeton's WordNet: update - n - news that updates your information - v - 1: modernize or bring up to date; We updated the kitchen in the old house 2: bring up to date; supply with recent information 3: bring to the latest state of technology As this definition suggests, most people think of updates as modifications of items that already exist first and completely new items second. In the land of feeds, the frequency is reversed (most updates in feeds are new items, not modifications to existing ones). -Nikolas 'Atrus' Coukouma
Re: Autodiscovery
Eric Scheid wrote: On 4/5/05 11:11 PM, Robert Sayre [EMAIL PROTECTED] wrote: The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. how is a feed of recent entries a substitute version for the document in which the link occurs when that document is some blog post long since dropped out of the feed? I'd suggest placing the link element only on the front page of your blog if this is a concern. The feed usually is a substitute version for the document in which the link occurs for that, at least. There's nothing in the spec that even suggests you place the autodiscovery information in archive pages. In practice, people probably will, but I'm not sure it's worth worrying about. Do you have some example that's more generally applicable? -Nikolas 'Atrus' Coukouma
Re: Autodiscovery
On Wednesday, May 4, 2005, at 04:49 PM, Nikolas 'Atrus' Coukouma wrote: Eric Scheid wrote: On 4/5/05 11:11 PM, Robert Sayre [EMAIL PROTECTED] wrote: The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. how is a feed of recent entries a substitute version for the document in which the link occurs when that document is some blog post long since dropped out of the feed? I'd suggest placing the link element only on the front page of your blog if this is a concern. The feed usually is a substitute version for the document in which the link occurs for that, at least. There's nothing in the spec that even suggests you to place the autodiscovery information in archive pages. In practice, people probably will, but I'm not sure it's worth worrying about. There is a good reason for putting the link in the individual entry pages: if people get to your blog via some location other than your blog homepage, you don't want them to have to go to your homepage to subscribe to your blog's feed. In such a case, sure, alternate wouldn't be descriptive of the feed's relationship to the isolated page, but the way that such links will be processed by browsers will match the intent for publishing the link - if you find this entry interesting enough to want to subscribe to my feed, here's where to do it. I personally don't care whether it's alternative or something like feed. Alternative is a more generally applicable term, but yeah, it doesn't sound quite right on individual entry pages.
Re: Autodiscovery
On 5/5/05 4:02 AM, Thomas Broyer [EMAIL PROTECTED] wrote: Robert Sayre wrote: The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. It is not contradictory. But a feed is not a substitute version of an archive page as most archived entries are not in the feed anymore. That said, I'm totally in favor of using rel=alternate to link to a feed from the _alternate_ HTML version. From an archive page, you should rather use rel=start. The problem is, an automaton wouldn't know which to use as it wouldn't know if the page it is looking at is an entry archive page or a recent entries page, which rather defeats the purpose of auto-discovery. Also, it would be entirely reasonable to use @rel='alternate' to point to an @type='application/atom+xml' Atom Entry Document from an archive page. Furthermore, from a recent entries page it would also be entirely reasonable to use @rel='start' to point to the first archive entry page. Thus, the meanings of 'alternate' and 'start' would be *reversed* depending on what kind of page you were looking at. This is not conducive to hands-free auto discovery. Using @rel='feed' from both kinds of pages fixes that problem. e.
Re: Atom feed refresh rates
On 5/4/05, Chris DeSalvo [EMAIL PROTECTED] wrote: If the feed provided a hint for a reasonable polling frequency, it would be a plus for limited-resource devices. I hate to suggest that the format be changed as a prophylactic measure against bad-citizen servers, but that is the problem that I have to solve for my platform and applications. No one is denying the existence of the problem you're describing. However, this WG has consistently decided that an optional XML element of the kind you're describing wouldn't solve the problem. Essentially, we'd be trading one evangelism problem for another. Robert Sayre
Re: Atom feed refresh rates
On 5/4/05, Roger B. [EMAIL PROTECTED] wrote: That's not to say that there's something necessarily wrong with an aggregator that allows users to pull feeds every five minutes. If In the toy aggregator I wrote I played with a scheduler that tried to throttle itself based on the feed's response. That is to say it started polling every ten minutes. If the feed returned a 304 (or the corresponding ETag "I haven't changed" response) then it extended that to every 20. Then 30... The problem I had was deciding what the maximum should be (1 hour? 2? 24?). Upon getting a 'fresh' feed it reset the interval to 10 minutes and started over again. I'm certain I got this idea from someone else, but don't recall who originated the idea. Lance Lavandowska
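The throttling scheduler described above can be sketched roughly as follows. This is only an illustration, not the original toy aggregator: the class name, the 10-minute step, and especially the 120-minute cap are assumptions (the message explicitly leaves the maximum undecided), and 304 Not Modified is taken as the "unchanged" signal.

```python
class PollSchedule:
    """Back off while a feed is unchanged; reset when it returns fresh content."""

    def __init__(self, base_minutes=10, step_minutes=10, max_minutes=120):
        self.base_minutes = base_minutes   # starting interval from the message
        self.step_minutes = step_minutes   # 10 -> 20 -> 30 ...
        self.max_minutes = max_minutes     # hypothetical cap, not from the thread
        self.interval = base_minutes

    def record(self, status):
        """Update the polling interval from an HTTP status code and return it."""
        if status == 304:
            # Unchanged feed: wait longer next time, up to the cap.
            self.interval = min(self.interval + self.step_minutes,
                                self.max_minutes)
        elif status == 200:
            # Fresh feed: start over at the base interval.
            self.interval = self.base_minutes
        # Any other status leaves the interval untouched in this sketch.
        return self.interval


schedule = PollSchedule()
for status in (304, 304, 200):
    print(status, "->", schedule.record(status), "minutes")
```

A linear step mirrors the "every 20, then 30" description; an exponential step would reach the cap sooner, at the price of slower pickup when a quiet feed wakes up.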
Re: Autodiscovery
On 5/5/05 4:17 AM, Dan Brickley [EMAIL PROTECTED] wrote: The autodiscovery spec is a reasonable interpretation of the *one line* definition of the 'alternate' relation. how is a feed of recent entries a substitute version for the document in which the link occurs when that document is some blog post long since dropped out of the feed? Because the HTML definition is close to meaningless. I can substitute any document for another, and the 2nd is a substitution not through any intrinsic characteristics, but because it was substituted. Many of the HTML link type definitions don't bear up under detailed scrutiny... Only if you take the broadest sense of the word 'substitute'. This is like saying that not only is olive oil a substitute for butter in cooking, but so is engine oil, concrete, a 400 pound gorilla, and the square root of the gross national product of Madagascar. No, I suspect they used the word 'substitute' in its narrower sense, and they used the word 'substitute' because they didn't want to write alternate: Designates an alternate version for the document in which the link occurs which would be circular. e.
Re: Autodiscovery
On 5/5/05 5:20 AM, Antone Roundy [EMAIL PROTECTED] wrote: On Wednesday, May 4, 2005, at 12:59 PM, fantasai wrote: Again, my friend's blog feed is not an Atom version of /my/ web page; linking to it as alternate would be wrong. To me, this raises a red flag, suggesting that using an autodiscovery link from your web page to your friend's feed is not what autodiscovery is intended for. I agree. However, using a link from an archive page is common practice (very!), but is one that would confound the use of Atom Entry Documents as @rel='alternate'. e.
Re: Autodiscovery
Eric Scheid wrote: On 5/5/05 4:38 AM, Nikolas 'Atrus' Coukouma [EMAIL PROTECTED] wrote: Do you have some example that's more generally applicable? in practice, people will put a link to the feed from which this page, and others like it, are likely to be found, into entry only pages. otherwise auto-discovery doesn't work unless you first navigate to the front page of someone's blog. people want to be able to say here's a link to my feed from entry pages. e. As I said I'm not sure it's worth worrying about. My current opinion is that it's just not worth making this change at this point, if this is in fact the only concern. It applies to a large number of pages with a small number of views and those are done for usability. Even if pages only had the link on the main page, I think it would allow 95% of users to find it. Maybe this is more of a concern for blogs where the archives are a major entry point. Perhaps this is even the usual case and I'm just used to people coming in the front door. What are the chances of someone subscribing to your feed if they never even look at the front page? -Nikolas 'Atrus' Coukouma
Re: rel profiles [was Autodiscovery]
We have published profiles for both license and nofollow: http://developers.technorati.com/wiki/RelLicense http://developers.technorati.com/wiki/RelNoFollow feel free to use them... On May 3, 2005, at 11:16 PM, Mark Pilgrim wrote: On 5/4/05, Henri Sivonen [EMAIL PROTECTED] wrote No you don't. rel='license' and rel='nofollow' have been deployable without a profile. You just release running code that hard-codes rel='feed' and, boom, no profile needed. Then I'm confused as to why you can't just release running code that hard-codes rel=alternate. You know, like people have already done. -- Cheers, -Mark
Re: Autodiscovery and alternate
How about alternate be recommended for only true substitutes; a feed for comments or pictures should not be labelled alternate, as it is not a substitute. feed is appealing, but does fly in the face of practice. There are existing rel values that could apply to qualify other kinds of feeds, or we could suggest new ones. eg, if it is a titles-only feed, rel=contents would apply If you had both full-content and summary feeds available, this could be indicated in a machine readable way (I appreciate that Atom handles this properly within the format, unlike RSS, but offering both versions is something I see many sites doing). I am amazed that there was no rel=summary defined by the w3c; this would be a useful extension to consider. http://www.w3.org/TR/1999/REC-html401-19991224/types.html#type-links On May 3, 2005, at 10:29 PM, fantasai wrote: Arve Bersvendsen wrote: On Tue, 03 May 2005 18:52:59 +0200, Tim Bray [EMAIL PROTECTED] wrote: http://diveintomark.org/rfc/draft-ietf-atompub-autodiscovery-01.txt 1) Change the attribute value for the rel from alternate to feed, or some similar wording. A feed is not always an alternate of the HTML document in which it occurs. As I mentioned last November [1] I agree with not requiring the 'alternate' rel value for the reasons stated in http://fantasai.inkedblade.net/weblog/2004/linking-feeds/ Briefly, it is an abuse of its semantics because many feed links are not links to alternate representations of the current page. [1] http://www.imc.org/atom-syntax/mail-archive/msg11705.html ~fantasai
PaceOriginalAttribute (was: PaceDuplicateIDWithSource2)
http://www.intertwingly.net/wiki/pie/PaceOriginalAttribute On 5/3/05, Martin Duerst [EMAIL PROTECTED] wrote: I'm not really happy with this. I found Martin's comments (copied in full below) to be accurate. So, I thought I would try another approach. Comments, suggestions, and alterations are welcome. Robert Sayre == Abstract == Preserve the original ID elsewhere, and require republishers to mint new IDs for *their entries*. == Status == Open == Rationale == Duplicate entry ids in feeds are too easy to create unintentionally, and the legitimate uses can't be verified as updates unless they come from the originating feed. == Proposal == Add an 'original' attribute to atom:source and reword as follows: {{{ If an atom:entry is copied from one feed into another feed, then the source atom:feed's metadata (all child elements of atom:feed other than the atom:entry elements) MAY be preserved within the copied entry by adding an atom:source child element, if it is not already present in the entry, and including some or all of the source feed's metadata elements as the atom:source element's children. Such metadata SHOULD be preserved if the source atom:feed contains any of the child elements atom:author, atom:contributor, atom:copyright, or atom:category and those child elements are not present in the source atom:entry. 4.2.11.1 The 'original' Attribute Atom entries can be republished and altered by intermediaries, but Atom feeds MUST NOT contain duplicate atom:id values. The 'original' attribute contains the entry's initial atom:id value. atom:source elements MUST have an 'original' attribute. }}} == Impacts == == Notes == CategoryProposals On 5/3/05, Martin Duerst [EMAIL PROTECTED] wrote: I'm not really happy with this. Conceptually, it seems to replace an ID for an entry with a pair (ID,feed). As IDs are URIs/IRIs (remember what they expand to), that doesn't make sense. What guarantee do we have that two feeds will be different? 
(yes, these days, we get feeds over http, but there are other ways to get feeds, where things may not be that easy). If we don't have a solution for the malicious case, and we think we need one, we should work on some security stuff. If we think that accidental ID duplication is a problem, then let's look at how we can improve the explanation. After that, there may still be an occasional accident, but the spec should be worded to catch that, not to provide a loophole. If we have to allow duplicate IDs, I'd rather prefer we do it without all this feed/source/... stuff: I.e. if you are an aggregator and can't manage to do duplicate elimination, you can just delegate the problem to the next place in the feeding chain. Regards, Martin.
Atom on portable wireless device (was: RE: Atom feed refresh rates)
Chris DeSalvo wrote: As the author of an aggregator app for a portable wireless device I can tell you that this is a serious problem for this class of products. You didn't list support for RFC3229+feed[1,2] as one of the things you are doing. This would help you drastically reduce the bandwidth needed when you find a feed that actually has new content. If you use RFC3229+feed to pull a feed, then you will only get the new entries in the feed -- not ones that you've copied over before. It's one step beyond If-None-Match, etc. But, the real problem with your approach is that you have apparently coded the device so that it goes out and polls large numbers of feeds. This doesn't make sense. For a portable wireless device with limited bandwidth and limited connectivity, you should be accessing feeds via an intermediary proxy that gathers up all your updates into a *single* feed. That feed should be served using RFC3229+feed to ensure that you only copy from it the updated entries since you last pulled from it. Of course, it would also make sense to support compression on the results. There is no more efficient mechanism for polling for feeds from the kind of device you describe. You say that you're reading about 20MB per day but you're only able to harvest 2MB of fresh data from it? This 1/10 harvesting yield is actually pretty normal when polling RSS/Atom feeds served without RFC3229+feed. If you used RFC3229+feed, you would find that your yield would start to approach 100% rather than the 10% you are at now. Additionally, given the efficiencies here, you would be able to increase your polling frequency almost arbitrarily without significantly increasing the bandwidth consumption of your system. Thus, you could cut latency below the average of 30 minutes which is implied by a polling frequency of 1 hour. You've written on your blog that you want to see more 304 responses. 
Well, I would suggest that what you *really* should want is more 226 responses -- 226 is the success code for an RFC3229+feed GET operation. bob wyman [1] http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html [2] http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html Original Message == In my app I've implemented every trick in the book to try and reduce the amount of data that I have to pull through the radio and parse. I use If-None-Match and If-Modified-Since headers in my requests, I support compression, I respect caching hints from the servers. It doesn't help in all cases. I have 112 feeds loaded up in my aggregator and only 74 of the servers hosting those feeds ever return a 304. The rest give me a 200 and gladly hand me everything regardless of whether it has changed or not. 17 of the servers don't bother supplying an ETag header. My feed list amounts to about 20 MB of data per day when polling once per hour. That is a lot of air time for a small radio, and a lot of time spent grinding in an XML parser for a small CPU. This is especially upsetting because by my measurements only about 2 MB of data is fresh for any given day. The main hit is in battery life - the above stats can trivially knock HOURS off of the life of a small battery. I've written extensively about this problem here: http://www.desalvo.org/blog/?p=230 with a real-world example studied here: http://www.desalvo.org/blog/?p=232 So, I guess I'd like to see an optional update-frequency hint element. Thanks, Chris
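For readers who have not seen the request side of this exchange, here is a minimal sketch of a conditional GET that also asks for an RFC3229+feed delta. It only builds the request and does not contact a server. The function names and the example.org URL are hypothetical; the `A-IM: feed` header and the 226 status follow RFC 3229 and the posts cited above, and a server is free to ignore the hint and answer 200 or 304.

```python
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Headers for a polite feed poll: delta request, validators, compression."""
    headers = {
        "A-IM": "feed",             # ask for only the entries newer than our copy
        "Accept-Encoding": "gzip",  # compressed bodies save radio time
    }
    if etag:
        headers["If-None-Match"] = etag               # validator from last poll
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # date from last poll
    return headers


def build_request(url, etag=None, last_modified=None):
    """Build (but do not send) the GET request for a feed URL."""
    return urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))


def classify(status):
    """Map a response status to what the body contains."""
    if status == 226:
        return "delta"      # IM Used: only the new entries
    if status == 304:
        return "unchanged"  # Not Modified: no body at all
    if status == 200:
        return "full"       # whole feed, changed or not
    return "other"


request = build_request("http://example.org/feed.atom", etag='"abc123"')
print(request.full_url, conditional_headers(etag='"abc123"'))
```

The ETag and Last-Modified values would be whatever the server sent on the previous poll, stored per feed.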
http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
+1 with a comment: If this Pace is accepted (and I hope it will be) the issue of Duplicate IDs should probably be dealt with in Mark's Implementation Guide.[1] Atom supports the publishing of newer versions of an entry which use the same atom:id as earlier versions of the same entry. It is not required that atom:updated be modified when a newer version is written. If the PaceAllowDuplicateIDs is accepted, it will be permitted to have multiple entries with the same atom:id in a single feed. However, the Pace language says processors SHOULD regard as feed generation errors any entries which duplicate both the atom:id and atom:updated of another entry in the same feed. Thus, feed authors who wish to publish feeds with duplicate atom:ids should ensure that any entry which duplicates an entry already in the feed has a different value for atom:updated. This constraint is not a requirement of the language, but it is a clear derivative of it. Basically, you don't have to update atom:updated unless you think it makes sense OR you are publishing to a feed that already has an entry with the same atom:id as the atom:id of the entry you are currently publishing. bob wyman [1] http://diveintomark.org/rfc/draft-ietf-atompub-impl-guide-00.html
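The processing rule described above could be sketched like this. It is one possible reading of the Pace, not text from it: entries are modelled as hypothetical (atom_id, updated, content) tuples rather than parsed XML, the example IDs are invented, and every atom:updated value is assumed to carry an explicit numeric UTC offset so the timestamps compare cleanly.

```python
from datetime import datetime


def select_latest(entries):
    """Return ({atom_id: latest content}, [flagged (atom_id, updated) pairs]).

    An entry repeating both the atom:id and the atom:updated of an earlier
    entry is flagged as a feed generation error and otherwise ignored.
    """
    seen = set()    # (atom_id, timestamp) pairs already encountered
    latest = {}     # atom_id -> (timestamp, content) of the newest version
    errors = []
    for atom_id, updated, content in entries:
        stamp = datetime.fromisoformat(updated)  # needs an explicit offset
        if (atom_id, stamp) in seen:
            errors.append((atom_id, updated))
            continue
        seen.add((atom_id, stamp))
        if atom_id not in latest or stamp > latest[atom_id][0]:
            latest[atom_id] = (stamp, content)
    return {key: value[1] for key, value in latest.items()}, errors


quotes = [
    ("tag:example.org,2005:msft", "2005-05-03T10:00:00-05:00", "Last: 25.20"),
    ("tag:example.org,2005:msft", "2005-05-03T12:00:00-05:00", "Last: 25.10"),
    ("tag:example.org,2005:msft", "2005-05-03T12:00:00-05:00", "Last: 25.10"),
]
print(select_latest(quotes))
```

Software that wants to show every version would keep the full list and use only the error check.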
Re: http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
On May 4, 2005, at 6:20 PM, Bob Wyman wrote: +1 with a comment: If this Pace is accepted (and I hope it will be) the issue of Duplicate IDs should probably be dealt with in Mark's Implementation Guide.[1] Er, I had planned to refine this a bit and then announce it to the group with some explanations and some other background research I did; so how about I promise to do that later this evening; and please consider waiting for that before you all pile in, pro or contra. -Tim
Re: Autodiscovery
On 5/5/05 5:36 AM, fantasai [EMAIL PROTECTED] wrote: - specify that UAs MAY also recognize the rel=alternate and type=application/atom+xml combination as an autodiscoverable Atom feed even if 'feed' is not among the rel values, and that UA should check that the representation returned when requesting that resource is an Atom Feed Document, and not an Atom Entry Document. e.
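To make the proposal above concrete, here is a rough sketch of how a user agent might sort link elements under it. The class name and the sample markup are hypothetical. Links whose rel includes 'feed' are accepted directly, while the rel=alternate plus application/atom+xml combination is only recorded as a candidate, because the proposal asks the UA to confirm that the fetched resource is a Feed Document rather than an Entry Document. That fetch-and-check step is not shown.

```python
from html.parser import HTMLParser

ATOM_TYPE = "application/atom+xml"


class FeedLinkFinder(HTMLParser):
    """Collect autodiscovery links from an HTML page."""

    def __init__(self):
        super().__init__()
        self.feeds = []       # hrefs explicitly marked rel="feed"
        self.candidates = []  # rel="alternate" Atom links, still unverified

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of link types, compared case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").strip().lower()
        if "feed" in rels:
            self.feeds.append(href)
        elif "alternate" in rels and mime == ATOM_TYPE:
            self.candidates.append(href)


page = """
<html><head>
<link rel="feed" type="application/atom+xml" href="/index.atom">
<link rel="alternate" type="application/atom+xml" href="/entry/42.atom">
<link rel="stylesheet" href="/style.css">
</head></html>
"""
finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds, finder.candidates)
```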
PaceAllowDuplicateIDs
<co-chair-hat status="OFF"> http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs This Pace was motivated by a talk I had with Bob Wyman today about the problems the synthofeed-generator community has. Summary: 1. There are multiple plausible use-cases for feeds with duplicate IDs 2. Pro and Contra 3. Alternate Paces 4. Details about this Pace 1. Use-Cases Here's a stream of stock-market quotes.
<feed><title>My Portfolio</title>
<entry><title>MSFT</title> <updated>2005-05-03T10:00:00-05:00</updated> <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content> </entry>
<entry><title>MSFT</title> <updated>2005-05-03T11:00:00-05:00</updated> <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content> </entry>
<entry><title>MSFT</title> <updated>2005-05-03T12:00:00-05:00</updated> <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content> </entry>
</feed>
You could also imagine a stream of weather readings. Bob's actual here-and-now today use-case from PubSub is earthquakes, an entry describes an earthquake and they keep re-issuing it as new info about strength/location comes in. Some people only care about the most recent version of the entry, others might want to see all of them. Basically, each atom:entry element describes the same Entry, only at a different point in time. You could argue that in some cases, these are representations of the Web resources identified by the atom:id URI, but I don't think we need to say that explicitly. Yes, you could think of alternate ways of representing stock quotes or any of the other use-cases but this is simple and direct and idiomatic. 2. Pro and Contra Given that I issued the consensus call rejecting the last attempt to do this, which was PaceRepeatIdInDocument, I felt nervous about revisiting the issue. So I went and reviewed the discussion around that one, which I extracted and placed at http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience. Reviewing that discussion, I'm actually not impressed. 
There were a few -1's but very few actual technical arguments about why this shouldn't be done. The most common was Software will screw this up. On reflection, I don't believe that. You have a bunch of Entries, some of them have the same ID and are distinguished by datestamp. Some software will show the latest, some will show all of them, the good software will allow switching back and forth. Doesn't seem like rocket science to me. So here's how I see it: there are plausible use cases for doing this, and one of the leading really large-scale implementors in the space (PubSub) wants to do this right now. Bob's been making strong claims about not being able to use Atom if this restriction remains in place. I believe strongly that if there's something that implementors want to do, standards shouldn't get in the way unless there's real interoperability damage. I'm certainly prepared to believe that this could cause interoperability damage, but to date I haven't seen any convincing arguments that it will. I think that if we nonetheless forbid it, people who want to do this will (a) use RSS instead of Atom, (b) cook up horrible kludges, or (c) ignore us and just do it. So my best estimate is that the cost of allowing dupes is probably much lower than the cost of forbidding them. Finally, our charter does say that we're also supposed to specify how you'd go about archiving feeds, and AllowDuplicateIDs makes this trivial. I looked around and failed to find how we claimed we were going to do that while still forbidding duplicates, but it's possible I missed that. 3. Alternate Paces I didn't want to just revive PaceRepeatIdInDocument, because it used the word version in what I thought was kind of a sloppy way, and because it wasn't current against format-08. I don't like either PaceDuplicateIDWithSource or ...WithSource2, they are complicated and don't really meet PubSub's needs anyhow. So I'm strongly -1 on both of those. 
Yes, that means that if this Pace fails, we'll allow no duplicates at all. I prefer either dupes OK or no dupes to dupes OK in the following circumstances; cleaner. 4. Details Section 4.1.2 of format-08 says that atom:entry represents an individual entry. The Pace says that if you have dupes, they represent the same entry, which I think is consistent with both the letter and spirit of 4.1.2. The Pace discourages duplicate timestamps without resorting to MUST language, because accidents can happen; this allows software to throw such entries on the floor while positively encouraging noisy complaining. On the other hand, if the WG wanted either to insist on a MUST here or remove the discouragement altogether I could live with that. Finally, it makes it clear that if there are entries with duplicate atom:id, software is free to display all or a subset, and calls out the likely common case where you discard all but the most recent. If I were Brent Simmons or equivalent, I'd be coding up a button where you