Re: Fyi, Apache project proposal
--On May 23, 2006 3:18:18 PM +0200 Ugo Cei [EMAIL PROTECTED] wrote: Demokritos might be quite well advanced but unfortunately Python code is not very suited for us poor souls who still have to struggle with java environments ;-) The goal is a reference implementation. The goal is to be exactly correct. Being in a particular language, or even being fast enough to be usable, is beside the point. In particular, a reference implementation should always choose code readability over speed. If the goal is to have a standard, free implementation that everyone uses, that is different from a reference implementation and the goals should say that. wunder -- Walter Underwood Principal Software Architect, Autonomy (Ultraseek)
Re: Atom syndication schema
--On March 15, 2006 4:25:40 PM +1100 Eric Scheid [EMAIL PROTECTED] wrote: Since the original discussion I've stumbled across something extra that makes xml:lang relevant for atom:name. Seems that in writing Hungarian names, the pattern is always surname followed by forename - e.g. Bartók Béla, where Béla is the personal name and Bartók is the family name. Or Margittai Neumann János vs. John von Neumann. It can be more complicated than first/last or last/first. I'm pretty sure that I brought this up and the WG decided to punt. Representing personal names well means starting with X.500 and asking around to see what could be improved. That is well outside the Atom charter. Punting was the right thing to do, but it means that atom:name is minimal. xml:lang isn't enough information to sort out given name and family name. About all you can do with atom:name is print it out. xml:lang could be useful in deciding between Chinese and Japanese variants of a character for names. wunder -- Walter Underwood Principal Software Architect, Autonomy
Re: wiki mime type
It isn't wiki. Those are used in blogs, and I use Markdown for simple HTML memos. Don't use x-, either. Register a real type. wunder --On March 7, 2006 5:51:42 PM +0100 Henry Story [EMAIL PROTECTED] wrote: On 6 Mar 2006, at 18:54, James Tauber wrote: Agreed that this would be very useful and also that it needs to be done on a per wiki format basis. Is there a forum a large number of them tend to hang out on, so that one could ask them to think about this? What would be the best to do in the meantime? Something like text/x-wiki+textile text/x-wiki+markdown perhaps? I think, however, that this is something the format creators should be encouraged to register, or at least suggest a convention for. James On Mon, 06 Mar 2006 07:59:10 -0800, Walter Underwood [EMAIL PROTECTED] said: --On March 6, 2006 3:59:39 PM +0100 Henry Story [EMAIL PROTECTED] wrote: Silly question probably, but is there a wiki mime type? I was thinking of text/wiki or text/x-wiki or something. I want people to be able to edit their blogs in wiki format in BlogEd and be able to distinguish when they do that from when they enter plain text, html or xhtml. Perhaps this is also useful for the protocol. It would be really useful, especially for feeds that archive the content of a blog. It would be best to use the official names of the formats, like text/markdown or text/textile. The wikis and blogs that I use can be configured to accept different formats, so text/wiki doesn't work. wunder -- Walter Underwood Principal Software Architect, Autonomy -- James Tauber http://jtauber.com/ journeyman of somehttp://jtauber.com/blog/ -- Walter Underwood Principal Software Architect, Autonomy
Re: wiki mime type
--On March 6, 2006 3:59:39 PM +0100 Henry Story [EMAIL PROTECTED] wrote: Silly question probably, but is there a wiki mime type? I was thinking of text/wiki or text/x-wiki or something. I want people to be able to edit their blogs in wiki format in BlogEd and be able to distinguish when they do that from when they enter plain text, html or xhtml. Perhaps this is also useful for the protocol. It would be really useful, especially for feeds that archive the content of a blog. It would be best to use the official names of the formats, like text/markdown or text/textile. The wikis and blogs that I use can be configured to accept different formats, so text/wiki doesn't work. wunder -- Walter Underwood Principal Software Architect, Autonomy
Re: Atom logo where?
--On March 6, 2006 7:02:23 PM +0100 A. Pagaltzis [EMAIL PROTECTED] wrote: For that matter, who has seen Mena Trott’s alternative Atom logo design and what do people think about it? 1. I don't see why Atom needs a logo. 2. The proposed logo is probably too close to the Autonomy logo. I cannot speak for Autonomy lawyers, but companies are faced with defend it or lose it on their trademarks. Autonomy is in the unstructured info business, so there is probably a conflict. It also looks like the logo for the Austin-Bergstrom International Airport, but that doesn't conflict. wunder -- Walter Underwood Principal Software Architect, Autonomy
Re: atom:updated handling
It doesn't hurt to point it out. It could catch some developer errors. But it doesn't make an invalid feed. --wunder --On February 15, 2006 4:25:35 PM -0800 James M Snell [EMAIL PROTECTED] wrote: I personally think that the feedvalidator is being too anal about updated handling. Entries with the same atom:id value MUST have different updated values, but the spec says nothing about entries with different atom:id's. - James James Yenne wrote: I'm using the feedvalidator.org to validate a feed with entries containing atom:updated that may have the same datetime, although different atom:id. The validator complains that two entries cannot have the same value for atom:updated. I generate these feeds and the generator uses the current datetime, which may be exactly the same. I don't understand why the validator should care about these updated values from different entries per atom:id - these are totally unrelated entries. Is the validator wrong? It seems that otherwise I have to play tricks to make these entries have different updated within the feed. I'm not sure how this relates to the thread More on atom:id handling Thanks, James -- Walter Underwood Principal Software Architect, Autonomy
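The rule at issue (entries sharing an atom:id must have distinct atom:updated values, while identical timestamps under different ids are fine) can be sketched as a small check. This is a hypothetical helper, not the feedvalidator's actual code:

```python
from collections import defaultdict

def check_updated_rule(entries):
    """entries: iterable of (atom_id, atom_updated) pairs.
    Flag only entries that repeat BOTH the id and the updated value;
    identical timestamps under different ids pass, per the spec."""
    seen = defaultdict(set)
    errors = []
    for atom_id, updated in entries:
        if updated in seen[atom_id]:
            errors.append((atom_id, updated))
        seen[atom_id].add(updated)
    return errors
```

With this reading, James Yenne's feed (same datetime, different ids) produces no errors; only a repeated (id, updated) pair is flagged.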
Re: [Fwd: Re: todo: add language encoding information]
--On December 23, 2005 11:31:22 PM +0100 Henry Story [EMAIL PROTECTED] wrote: So you can't have a link pointing from an entry to an id, without losing some very important information. We need something more specific. We need a link pointing from A to C as shown by the blue line. Some people will need that in the guts of their publishing system. Why do we need it in Atom? Is there something essential that subscribers cannot do because this isn't represented? This sounds like something needed for the publishing/translation workflow, not for the general readership. Extended provenance information is sometimes needed, but there is almost no limit to that. It certainly does not stop at translation, source, and translator. I'm reading a new translation of Andersen's tales where Thumbelina is Inchelina because the translator knew the right dialect of Danish. That is significant, but does it need to be in Atom? The semantics here should be exactly the same as for dates -- the date means what the publisher thinks it means. Same for language info. Trying to get more exact means that the model will be wrong for some publishers that generate completely legal Atom. wunder -- Walter Underwood Principal Software Architect, Verity
Re: ACE - Atom Common Extensions Namespace
--On October 2, 2005 9:35:28 AM +0200 Anne van Kesteren [EMAIL PROTECTED] wrote: Having a file and folder of the same name is not technically possible. (Although you could emulate the effect of course with some mod_rewrite.) Namespaces aren't files, only names. So the limitations of some particular file name implementation are meaningless for namespaces. Also, some filesystem implementations do allow a file and a folder with the same name. wunder -- Walter Underwood Principal Software Architect, Verity
Re: Arr! Avast me hearties!
I think we just got a nomination for an April 1 RFC. Nice job. More accurate than the x-hacker locale on Google, because that is really still english, not some other hacker language. Besides, they didn't make the spell suggest work in l33t. wunder --On September 20, 2005 3:09:56 AM +0100 James Holderness [EMAIL PROTECTED] wrote: A conforming client SHOULD perform an HTTP request for the feed with the Accept-Language header set to en-pirate (or whatever the standard RFC 3066 language tag for the pirate dialect of english). A conforming server SHOULD return the pirate version of the feed with the Content-Language header set to en-pirate and/or the xml:lang attribute set to en-pirate in the root element. -- Walter Underwood Principal Software Architect, Verity
Re: Top 10 and other lists should be entries, not feeds.
--On August 30, 2005 1:49:57 AM -0400 Bob Wyman [EMAIL PROTECTED] wrote: I’m sorry, but I can’t go on without complaining. Microsoft has proposed extensions which turn RSS V2.0 feeds into lists and we’ve got folk who are proposing much the same for Atom (i.e. stateful, incremental or partitioned feeds)… I think they are wrong. Feeds aren’t lists and Lists aren’t feeds. The Atom spec says: This specification assigns no significance to the order of atom:entry elements within the feed. One could read that to mean that feeds are fundamentally unordered or that Atom doesn't say what the order means. Other RSS formats are ordered, either implicitly or explicitly (RSS 1.0). For interoperability, lots of software is going to treat Atom as ordered. Otherwise, it is not possible to go from Atom to RSS 1.0. What is a search engine or a matching engine supposed to return as a result if it finds a match for a user query in an entry that comes from a list-feed? Maybe the list feed should have a noindex flag. Should it return the entire feed or should it return just the entry/item that contained the stuff in the users’ query? I'd return the entry. It is all about the entries. If the list position is semantically important to the entry, then include a link from the entry to the list. This is movie 312 in wunder's queue. wunder -- Walter Underwood Principal Software Architect, Verity
Re: Top 10 and other lists should be entries, not feeds.
--On August 30, 2005 3:50:45 PM -0600 Peter Saint-Andre [EMAIL PROTECTED] wrote: One could read that to mean that feeds are fundamentally unordered or that Atom doesn't say what the order means. Is not logical order, if any, determined by the datetime of the published (or updated) element? That is one kind of order. Other kinds are relevance to a search term (A9 OpenSearch), editorial importance (BBC News feeds), or datetime of original publication (nearly all blog feeds, not the same as last update). wunder -- Walter Underwood Principal Software Architect, Verity
Re: Top 10 and other lists should be entries, not feeds.
--On August 30, 2005 3:50:45 PM -0600 Peter Saint-Andre [EMAIL PROTECTED] wrote: Otherwise, it is not possible to go from Atom to RSS 1.0. I assume you mean from RSS 1.0 to Atom. :-) No. You can go from a Bag to List by ignoring the order. RSS 1.0 is a List, so you would need to invent an order to put unordered items in it. wunder -- Walter Underwood Principal Software Architect, Verity
Re: Don't Aggregrate Me
--On Monday, August 29, 2005 10:39:33 AM -0600 Antone Roundy [EMAIL PROTECTED] wrote: As has been suggested, to inline images, we need to add frame documents, stylesheets, Java applets, external JavaScript code, objects such as Flash files, etc., etc., etc. The question is, with respect to feed readers, do external feed content (content src=... /), enclosures, etc. fall into the same exceptions category or not? Of course a feed reader can read the feed, and anything required to make it readable. Duh. And all this time, I thought robots.txt was simple. robots.txt is a polite hint from the publisher that a robot (not a human) probably should avoid those URLs. Humans can do any stupid thing they want, and probably will. The robots.txt spec is silent on what to do with URLs manually added to a robot. The normal approach is to deny those, with a message that they are disallowed by robots.txt, and offer some way to override that. wunder -- Walter Underwood Principal Architect Verity Ultraseek
Re: Don't Aggregrate Me
--On August 30, 2005 11:39:04 AM +1000 Eric Scheid [EMAIL PROTECTED] wrote: Someone wrote up A Robots Processing Instruction for XML Documents http://atrus.org/writings/technical/robots_pi/spec-199912__/ That's a PI though, and I have no idea how well supported they are. I'd prefer a namespaced XML vocabulary. That was me. I think it makes perfect sense as a PI. But I think reuse via namespaces is oversold. For example, we didn't even try to use Dublin Core tags in Atom. PI support is required by the XML spec -- must be passed to the application. wunder -- Walter Underwood Principal Software Architect, Verity
Re: Don't Aggregrate Me
--On August 29, 2005 7:05:09 PM -0700 James M Snell [EMAIL PROTECTED] wrote: x:index=no|yes doesn't seem to make a lot of sense in this case. It makes just as much sense as it does for HTML files. Maybe it is a whole group of Atom test cases. Maybe it is a feed of reboot times for the server. wunder -- Walter Underwood Principal Software Architect, Verity
Re: Don't Aggregrate Me
There are no wildcards in /robots.txt, only path prefixes and user-agent names. There is one special user-agent, *, which means all. I can't think of any good reason to always ignore the disallows for *. I guess it is OK to implement the parts of a spec that you want. Just don't answer yes when someone asks if you honor robots.txt. A lot of spiders allow the admin to override /robots.txt for specific sites, or better, for specific URLs. wunder --On August 25, 2005 11:47:18 PM -0500 Roger B. [EMAIL PROTECTED] wrote: Bob: It's one thing to ignore a wildcard rule in robots.txt. I don't think it's a good idea, but I can at least see a valid argument for it. However, if I put something like: User-agent: PubSub Disallow: / ...in my robots.txt and you ignore it, then you very much belong on the Bad List. -- Roger Benningfield -- Walter Underwood Principal Software Architect, Verity
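The matching the original robots.txt spec describes is just this: no wildcards in paths, plain prefix matching, with a record for the specific user-agent taking precedence over the '*' record. A minimal sketch (the rules dict is a hypothetical pre-parsed form; real parsing also handles comments, blank-line record separation, and case-insensitive agent tokens):

```python
def allowed(rules, agent, path):
    """rules maps a user-agent token (or '*') to its list of
    Disallow path prefixes. Per the 1994 spec, matching is plain
    prefix matching, and the record for the specific agent
    overrides the '*' record."""
    disallows = rules.get(agent, rules.get('*', []))
    # An empty Disallow line means "allow everything", so skip it.
    return not any(path.startswith(p) for p in disallows if p)

# Roger's example: PubSub is shut out entirely, everyone else
# is only kept away from /cgi-bin/ (hypothetical second record).
rules = {'PubSub': ['/'], '*': ['/cgi-bin/']}
```

Here `allowed(rules, 'PubSub', '/feed.xml')` is False, so a spider honoring robots.txt must not fetch that feed for PubSub even though other agents may.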
Re: Don't Aggregrate Me
--On August 26, 2005 9:51:10 AM -0700 James M Snell [EMAIL PROTECTED] wrote: Add a new link rel=readers whose href points to a robots.txt-like file that either allows or disallows the aggregator for specific URI's and establishes polling rate preferences User-agent: {aggregator-ua} Origin: {ip-address} Allow: {uri} Disallow: {uri} Frequency: {rate} [{penalty}] Max-Requests: {num-requests} {period} [{penalty}] No, on several counts. 1. Big, scalable spiders don't work like that. They don't do aggregate frequencies or rates. They may have independent crawlers visiting the same host. Yes, they try to be good citizens, but you can't force WWW search folk to redesign their spiders. 2. Frequencies and rates don't work well with either HTTP caching or with publishing schedules. Things are much cleaner with a single model (max-age and/or expires). 3. This is trying to be a remote-control for spiders instead of describing some characteristic of the content. We've rejected the remote control approach in Atom. 4. What happens when there are conflicting specs in this file, in robots.txt, and in a Google Sitemap? 5. Specifying all this detail is pointless if the spider ignores it. You still need to have enforceable rate controls in your webserver to handle busted or bad citizen robots. 6. Finally, this sort of thing has been proposed a few times and never caught on. By itself, that is a weak argument, but I think the causes are pretty strong (above). There are some proprietary extensions to robots.txt: Yahoo crawl-delay: http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html Google wildcard disallows: http://www.google.com/remove.html#images It looks like MSNbot does crawl-delay and an extension-only wildcard: http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm wunder -- Walter Underwood Principal Software Architect, Verity
Re: Don't Aggregrate Me
I'm adding robots@mccmedia.com to this discussion. That is the classic list for robots.txt discussion. Robots list: this is a discussion about the interactions of /robots.txt and clients or robots that fetch RSS feeds. Atom is a new format in the RSS family. --On August 26, 2005 8:39:59 PM +1000 Eric Scheid [EMAIL PROTECTED] wrote: While true that each of these scenarios involve crawling new links, the base principle at stake is to prevent harm caused by automatic or robotic behaviour. That can include extremely frequent periodic re-fetching, a scenario which didn't really exist when robots.txt was first put together. It was a problem then: In 1993 and 1994 there have been occasions where robots have visited WWW servers where they weren't welcome for various reasons. Sometimes these reasons were robot specific, e.g. certain robots swamped servers with rapid-fire requests, or retrieved the same files repeatedly. In other situations robots traversed parts of WWW servers that weren't suitable, e.g. very deep virtual trees, duplicated information, temporary information, or cgi-scripts with side-effects (such as voting). http://www.robotstxt.org/wc/norobots.html I see /robots.txt as a declaration by the publisher (webmaster) that robots are not welcome at those URLs. Web robots do not solely depend on automatic link discovery, and haven't for at least ten years. Infoseek had a public Add URL page. /robots.txt was honored regardless of whether the link was manually added or automatically discovered. A crawling service (robot) should warn users that the URL, Atom or otherwise, is disallowed by robots.txt. Report that on the status page for that feed. wunder -- Walter Underwood Principal Software Architect, Verity
Re: Don't Aggregrate Me
--On August 25, 2005 3:43:03 PM -0400 Karl Dubost [EMAIL PROTECTED] wrote: On 05-08-25 at 12:51, Walter Underwood wrote: /robots.txt is one approach. Wouldn't hurt to have a recommendation for whether Atom clients honor that. Not many honor it. I'm not surprised. There seems to be a new generation of robots that hasn't learned much from the first generation. The Robots mailing list is silent these days. That is why we should make a recommendation about it. wunder -- Walter Underwood Principal Software Architect, Verity
Re: Don't Aggregrate Me
I would call desktop clients clients not robots. The distinction is how they add feeds to the polling list. Clients add them because of human decisions. Robots discover them mechanically and add them. So, clients should act like browsers, and ignore robots.txt. Robots.txt is not very widely deployed (around 5% of sites), but it does work OK for general web content. wunder --On August 25, 2005 10:25:08 PM +0200 Henry Story [EMAIL PROTECTED] wrote: Mhh. I have not looked into this. But is not every desktop aggregator a robot? Henry On 25 Aug 2005, at 22:18, James M Snell wrote: At the very least, aggregators should respect robots.txt. Doing so would allow publishers to restrict who is allowed to pull their feed. - James -- Walter Underwood Principal Software Architect, Verity
Re: If you want Fat Pings just use Atom!
--On August 23, 2005 9:40:44 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote: There's nothing in the XML spec requiring the app to throw away the data structures it has already built when the parser reports the error. There is also nothing requiring it. It is optional. The only required behavior is to report the error and stop creating parsed information. Otherwise, results are undefined according to the spec. The spec does require that normal processing stop at the error. The parser can make data past the error available, but it must not continue to pass character data and information about the document's logical structure to the application in the normal way. This still feels like a hack to me. An unterminated document is not well-formed, and is not XML or Atom. Doing this should require another RFC that says, we didn't really mean that it had to be XML. wunder -- Walter Underwood Principal Software Architect, Verity
Re: If you want Fat Pings just use Atom!
--On August 22, 2005 12:36:17 AM -0400 Sam Ruby [EMAIL PROTECTED] wrote: With a HTTP client library and SAX, the absolute simplest solution is what Bob is describing: a single document that never completes. Except that an endless document can't be legal XML, because XML requires the root element to balance. An endless document never closes it. So, the endless document cannot be legal Atom. Worse, there is no chance for error recovery. One error, and the rest of the stream might not be parsable. So, it is simple, but busted. The standard trick here is to use a sequence of small docs, separated by ASCII form-feed characters. That character is not legal within an XML document, so it allows the stream to resynchronize on that character. Besides, form-feed actually has almost the right semantics -- start a new page. wunder -- Walter Underwood Principal Software Architect, Verity
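The form-feed trick can be sketched in a few lines. This is a hypothetical consumer using Python's standard ElementTree; the key property is that a malformed chunk is dropped and parsing resynchronizes at the next form feed, which an endless single document cannot do:

```python
import xml.etree.ElementTree as ET

def parse_ff_stream(data):
    """Split a byte stream on ASCII form feed (0x0C), which cannot
    appear in a well-formed XML 1.0 document, and parse each chunk
    as an independent document. One bad chunk is skipped; the rest
    of the stream stays parsable."""
    docs = []
    for chunk in data.split(b'\x0c'):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            docs.append(ET.fromstring(chunk))
        except ET.ParseError:
            continue  # drop the broken document, keep the stream alive
    return docs
```

Feeding it three chunks where the middle one is truncated yields the two good documents, which is exactly the error recovery the single-endless-document design lacks.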
Re: If you want Fat Pings just use Atom!
--On August 22, 2005 2:01:45 PM -0400 Joe Gregorio [EMAIL PROTECTED] wrote: Interestingly enough the FF separated entries method would also work when storing a large quantity of entries in a single flat file where appending an entry needs to be fast. The original application was logfiles in XML. wunder -- Walter Underwood Principal Software Architect, Verity
Re: If you want Fat Pings just use Atom!
--On August 23, 2005 12:01:11 PM +0900 Martin Duerst [EMAIL PROTECTED] wrote: Well, modulo character encoding issues, that is. An FF will look different in UTF-16 than in ASCII-based encodings. Fine. Use two NULs. That is either one illegal UTF-16 (BE or LE) character or two illegal characters in ASCII or UTF-8. Of course, a transport-level multi-payload system would be preferred. wunder -- Walter Underwood Principal Software Architect, Verity
Re: FYI: Expires Extension Draft
RSS 3? Eh? The RSS ttl element is a mess. RSS 3 Lite (could we spell that word correctly?) specifies it not as information about the feed, but as an attempt to remotely control robots. RSS 2 specifies it as a caching hint, but in minutes, not seconds. Regardless, it is useless for a feed with a dedicated update schedule, because it requires updating the feed every second (or minute) as the publish time approaches. For more detail, see: http://www.intertwingly.net/wiki/pie/PaceCaching That was a proposal, and is *not* part of Atom, but it does have some useful discussion of cache hints. For caching, use the native HTTP cache features. wunder --On August 18, 2005 2:20:21 PM -0400 Elias Torres [EMAIL PROTECTED] wrote: I tried commenting on your site, but I have to register to comment. :-( You linked to RSS3 [1] and I spotted something related to this extension that could be used instead. <ttl span="days">7</ttl> It seems more elegant than having to convert to whatever you specified in your spec. Just a thought. Elias [1] http://www.rss3.org/rss3lite.html On 8/17/05, James M Snell [EMAIL PROTECTED] wrote: http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-expires-00.txt Example: <entry> ... <t:expires xmlns:t="...">2005-08-16T12:00:00Z</t:expires> ... </entry> or <entry> ... <updated>2005-08-16T12:00:00Z</updated> <t:max-age>2</t:max-age> ... </entry> This is not to be used for caching of Atom documents; nor is it to be used as a mechanism for scheduling updates of local copies of Atom documents. - James -- Walter Underwood Principal Software Architect, Verity
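A sketch of why an absolute expiry beats a relative ttl for scheduled feeds: for a feed regenerated at a fixed hour (the daily noon schedule here is hypothetical), an Expires header can name the next publish time exactly, while a ttl in minutes would have to shrink, and be rewritten, as that time approaches:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def expires_header(now, publish_hour=12):
    """Hypothetical feed regenerated daily at publish_hour UTC.
    The Expires header simply names the next publish time; no
    per-minute rewriting of the feed is needed."""
    nxt = now.replace(hour=publish_hour, minute=0, second=0, microsecond=0)
    if nxt <= now:
        nxt += timedelta(days=1)
    return format_datetime(nxt, usegmt=True)
```

Caches then hold the document until the next publish, exactly matching the schedule, which is the behavior the native HTTP cache features already give you.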
Re: Expires extension draft (was Re: Feed History -02)
--On August 10, 2005 1:56:05 PM +1000 Eric Scheid [EMAIL PROTECTED] wrote: Aside: a perfect example of what sense of 'expires' is in the I-D itself... Network Working Group Internet-Draft Expires: January 2, 2006 Especially perfect because the HTTP header does not reflect the expiration. Honestly, another reason to put expiration inside the feed is that HTTP caching is just not used. Well, except to force reloads and show you new ads. But it is extremely rare to see per-document cache information. wunder -- Walter Underwood Principal Architect, Verity
Re: Feed History -02
--On August 9, 2005 1:07:29 PM +0200 Henry Story [EMAIL PROTECTED] wrote: But I would really like some way to specify that the next feed document is an archive (i.e. won't change). This would make it easy for clients to know when to stop following the links, i.e., when they have caught up with the changes since they last looked at the feed. I made some proposals for cache control info (expires and max-age). That might work for this. wunder -- Walter Underwood Principal Architect, Verity
Re: FormatTests
--On July 17, 2005 3:45:26 PM +0100 Graham [EMAIL PROTECTED] wrote: Now do you see why canonical ids are stupid and irrelevant? Not unless the robustness principle is stupid and irrelevant. Canonical IDs are more robust. Feeds that use them will work better in the quick-and-dirty, Desperate Perl Hacker environment of the internet. The updated warning is just right. Thank you for using Atom, here is how you can do a better job. wunder -- Walter Underwood Principal Architect, Verity
Re: Evangelism, etc.
--On July 16, 2005 11:16:44 AM -0400 Robert Sayre [EMAIL PROTECTED] wrote: I found the criticism pathetic. A little lame, at least. You can't add precision and interoperability with innovation and extension. But there is a point buried under all that. What are the changes required to support Atom? It looks complicated, but how hard is it? Here is a shot at that information. For publishers, you need to be precise about the content. There are fallbacks, where if it is any sort of HTML, send it as HTML, and if it isn't, send it as text. The XHTML and XML options are there for extra control. Also, add an ID. It is OK for this to be a URL to the article as long as it doesn't change later. That is, the article can move to a different URL, but keep the ID the same. Add a modified date. The software probably already has this, and you can fall back to the file last-modified if you have to. But if there is a better date available, use it. The ID and date are required because they allow Atom clients and aggregators to get it right when tracking entries, either in the same feed or when the same entry shows up in multiple feeds. Extending Atom is different from extending RSS, because there are more options. The mechanical part of extensions is covered in the spec, to guarantee that an Atom feed is still interoperable when it includes extensions. The political part of extensions has two options: free innovation and standardization. Anyone can write an extension to Atom and use it. Or, they can propose a standard to the IETF (or another body). The standards process usually means more review, more interoperability, and more delay in deploying it. Sometimes, the delay is worth it, and we hope that is true for Atom. wunder -- Walter Underwood Principal Architect, Verity
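The publisher checklist above (typed content, a stable ID, an updated date) fits in a few lines. This is a hypothetical sketch using Python's ElementTree; the element names follow the Atom format spec, but the helper and its arguments are invented for illustration:

```python
import xml.etree.ElementTree as ET

ATOM = 'http://www.w3.org/2005/Atom'

def make_entry(title, url, updated, html_body):
    """Build a minimal Atom entry: a stable id (here the article
    URL, which must never change even if the article moves), an
    updated date, and content with an explicit type -- falling
    back to "html" for any sort of HTML, per the advice above."""
    entry = ET.Element(f'{{{ATOM}}}entry')
    ET.SubElement(entry, f'{{{ATOM}}}title').text = title
    ET.SubElement(entry, f'{{{ATOM}}}id').text = url
    ET.SubElement(entry, f'{{{ATOM}}}updated').text = updated
    content = ET.SubElement(entry, f'{{{ATOM}}}content', type='html')
    content.text = html_body
    return entry
```

The point of the sketch: the id and updated values are what let clients and aggregators track an entry across feeds, so they come from the publishing system's database, not from the serializer.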
Re: The Atomic age
--On July 14, 2005 11:37:05 PM -0700 Tim Bray [EMAIL PROTECTED] wrote: So, implementors... to work. Do we have a list of who is implementing it? That could be used in the Deployment section of http://www.tbray.org/atom/RSS-and-Atom. Ultraseek will implement Atom. We need to think more about exactly what it means for a search engine to implement it, but we'll at least spider it. wunder Creature with the Atom Brain, why is he acting so strange? Roky Erickson -- Walter Underwood Principal Architect, Verity
Mystery abbreviations in draft 9
In 4.2.6 atom:id, the last sentence is: o Ensure that all components of the IRI are appropriately character-normalized, e.g. by using NFC or NFKC. NFC and NFKC need to be defined, with a reference to the Unicode spec. wunder -- Walter Underwood Principal Architect, Verity
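For reference, NFC and NFKC are the Unicode normalization forms defined in UAX #15: Form C composes characters, and Form KC additionally folds compatibility characters such as ligatures. The distinction, in a few lines of Python:

```python
import unicodedata

composed = '\u00e9'        # 'é' as a single code point
decomposed = 'e\u0301'     # 'e' plus combining acute accent
assert composed != decomposed
# NFC composes: both spellings of é normalize to the same string,
# so two IRIs differing only in this way compare equal.
assert unicodedata.normalize('NFC', decomposed) == composed
# NFKC also folds compatibility characters: the 'fi' ligature
# U+FB01 survives NFC but becomes plain "fi" under NFKC.
assert unicodedata.normalize('NFC', '\ufb01') == '\ufb01'
assert unicodedata.normalize('NFKC', '\ufb01') == 'fi'
```

This is exactly why the draft cares: byte comparison of atom:id values only works if both producers normalized the same way.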
RE: Roll-up of proposed changes to atompub-format section 5
--On Tuesday, July 05, 2005 11:48:44 AM -0700 Paul Hoffman [EMAIL PROTECTED] wrote: At 2:24 PM -0400 7/5/05, Bob Wyman wrote: I find it hard to imagine what harm could be done by providing this recommendation. Timing. If we change text other than because of an IESG note, there is a strong chance we will have to delay being finalized by two weeks, possibly more. I'm fine with the delay. Two or three weeks on top of 18 months is not a big deal. wunder -- Walter Underwood Principal Architect Verity Ultraseek
Re: Clearing a discuss vote on the Atom format
--On July 1, 2005 4:44:23 PM +0900 Martin Duerst [EMAIL PROTECTED] wrote: The reason for this is to make sure we have interoperability with a mandatory-to-implement (and default-to-use) canonicalization, but that we don't disallow other canonicalizations that for one or the other as of now not yet clear reason may be preferable in some cases in the future (but in your wording would prohibit the result to be called Atom at all). A potential future reason that we can't even characterize isn't enough reason for me to support this. If we discover weaknesses in the canonicalization, we'll need to change Atom anyway. Explicitly making room for future incompatible canonicalizations doesn't make any sense to me. What is the point of calling something Atom when it uses a canonicalization which prevents interop with legal Atom implementations? wunder -- Walter Underwood Principal Architect, Verity
Re: Google Sitemaps: Yet another RSS or site-metadata format and Atom competitor
--On June 7, 2005 3:17:04 AM -0700 gstein [EMAIL PROTECTED] wrote: proprietary connotes closed. We published the spec and encourage other search engines to use it. There is no intent to close or control it. Proprietary means owned. Google clearly owns Google Sitemaps. The license requires derivative works to keep the same license. That is control. It was designed in isolation, for Google's use. That is a closed spec. For example, the priority element is not specified well enough for another engine to implement it compatibly. Does it apply to ranking, crawl order or duplicate preference? An open process would have at least looked at the proposed extensions for robots.txt and earlier formats like Infoseek sitelist.txt. wunder -- Walter Underwood Principal Architect, Verity
Re: Last and final consensus pronouncement
The atom:author element name is embarrassing. Make it atom:creator. There were no objections to that. wunder --On May 26, 2005 10:26:54 AM -0700 Tim Bray [EMAIL PROTECTED] wrote: co-chair-mode On behalf of Paul and myself: This is it. The initial phase of the WG's work in designing the Atompub data format specification is finished over, pining for the fjords, etc. Please everyone reach around and pat yourselves on the back, I think the community will generally view this as a fine piece of work. Stand by for announcements on buckling down on Atom-Protocol. Note that this is a pronouncement, not a call for further debate. Here are the next steps: 1. Editors take the assembled changes and produce a format-09 I-D. Sooner is better. 2. They post the I-D. 3. Paul sends Scott a message, cc'ing the WG, that we're done. 4. At this point there may be objections from the WG. We decide whether to accept the objections and pull the draft back, or tell the objectors they'll have to pursue the appeal process. 5. The IESG process takes over at this point and we'll eventually hear back from them. Last two draft changes: 1. PaceAtomIdDOS We think that the WG has consensus that it is of benefit to add a warning to section 8 Security Considerations. The language from PaceAtomIdDos is mostly OK, except that the late suggestion of talking about spoofing instead of DOS seemed to get general support. I reworded slightly. We'll leave it up to the editors to decide whether a new subsection of section 8 is required. Atom Processors should be aware of the potential for spoofing attacks where the attacker publishes an atom:entry with the atom:id value of an entry from another feed, perhaps with a falsified atom:source element duplicating the atom:id of the other feed. 
Atom Processors which, for example, suppress display of duplicate entries by displaying only one entry with a particular atom:id value, perhaps by selecting the one with the latest atom:updated value, might also take steps to determine whether the entries originated from the same publisher before considering them to be duplicates. 2. PaceAtom10 http://www.intertwingly.net/wiki/pie/PaceAtom10 We just missed this one in the previous consensus call; seeing lots of +1's and no pushback, it's accepted. /co-chair-mode -- Walter Underwood Principal Architect, Verity
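The duplicate-suppression behavior described here (keep one entry per atom:id, preferring the latest atom:updated) can be sketched in a few lines. The dict-based entry representation is an illustrative assumption, not an Atom API:

```python
def dedupe(entries):
    """Keep one entry per atom:id, preferring the latest atom:updated.

    A minimal sketch of the processor behavior described above. A
    careful processor would also check provenance (atom:source) before
    treating same-id entries as true duplicates, as the draft warns.
    """
    best = {}
    for entry in entries:
        kept = best.get(entry["id"])
        # RFC 3339 timestamps in the same (UTC) offset compare
        # correctly as plain strings.
        if kept is None or entry["updated"] > kept["updated"]:
            best[entry["id"]] = entry
    return list(best.values())
```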
Re: Consensus snapshot, 2005/05/25

--On Wednesday, May 25, 2005 11:03:46 AM -0700 Tim Bray [EMAIL PROTECTED] wrote: Have I missed any? Yes, there has been high-volume debate on several other issues; but have there been any other outcomes where we can reasonably claim consensus exists? Changing atom:author to atom:creator? No objections, so far. I'll paste together a PACE with the official Dublin Core definition. Should we mention DC for atom:contributor? wunder -- Walter Underwood Principal Architect Verity Ultraseek
Re: posted PaceAuthorContributor
--On May 23, 2005 10:52:47 AM -0700 Tim Bray [EMAIL PROTECTED] wrote: If you're worried, one good way to address the issue would be to say that the semantics of this element are based on the Dublin Core's [dc:creator], DC is pretty clear as I recall. I've been thinking that would be a good idea anyhow. Let's call it atom:creator, then, and actually use the DC definition. Not because DC is better, but because it makes the metadata crosswalks (interoperability) work smoothly. wunder -- Walter Underwood Principal Architect, Verity
PaceCaching
--On Tuesday, May 17, 2005 09:13:37 PM -0700 Tim Bray [EMAIL PROTECTED] wrote: PaceCaching Multiple -1's, it fails. I'll address the objections anyway, because I (still) think this is important. 1. This introduces multiple caching schemes. Wrong. Right now we have multiple schemes, with HTTP caching, ad hoc client caching, and ad hoc server-side load shedding. This recommends one consistent scheme, which we know will work. The current multi-scheme approach is a mess, and we can be sure that it will have problems. 2. This applies protocol caching to a client. True, but not really an issue. HTTP caching does work when used to manage a client cache. Compare a client working through an HTTP cache to one which checks the cache information internally before issuing HTTP requests. The HTTP server will see the same series of requests. Effectively, the client will run a virtual HTTP cache internally. 3. Server-side parsing is too much overhead. Maybe with 90 MHz Pentiums, but XML parsing is really fast these days. Parse the file, cache the values, and toss them if the file has changed when you stat it. Or, the blog server software can set the cache info out-of-band to the server. 4. This requires synchronized clocks. Those are a SHOULD for HTTP, too. And they ought to be a SHOULD for Atom anyway, because you cannot date-sort entries from two servers with unsynchronized clocks. 5. This is just like HTTP-EQUIV and that has failed. Yes and no. Most HTTP servers ignore HTTP-EQUIV, but it is still useful for passing through things like content-language when there is no HTTP header present. For Atom, the caching info would be valid when there is no HTTP cache header. This is exactly where HTTP-EQUIV is effective today. wunder -- Walter Underwood Principal Architect Verity Ultraseek
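The "virtual HTTP cache" a client would run internally boils down to the HTTP freshness check. A minimal sketch, assuming the client saves the response headers alongside the cached feed as a plain dict (Cache-Control max-age winning over Expires, and no freshness info meaning revalidate):

```python
import email.utils
import time

def is_fresh(fetched_at, headers, now=None):
    """Return True if a cached feed is still fresh under HTTP rules.

    Illustrative sketch of the HTTP freshness model: max-age in
    Cache-Control takes precedence over Expires; with neither present,
    the cached copy is treated as stale and should be revalidated.
    """
    now = now if now is not None else time.time()
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name == "max-age" and value.isdigit():
            return (now - fetched_at) < int(value)
    expires = headers.get("Expires")
    if expires:
        exp = email.utils.parsedate_to_datetime(expires).timestamp()
        return now < exp
    return False  # no freshness info: revalidate with the server
```

A client that consults this before issuing a GET produces the same request series the origin server would see from a client behind a real HTTP cache.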
Re: PaceAllowDuplicateIdsWithModified
--On Thursday, May 19, 2005 01:12:22 AM +1000 Eric Scheid [EMAIL PROTECTED] wrote: (See the wiki for a survey of tools and the dates they support.) hmmm ... Blogger, Movable Type, JournURL, Blosxom, ExpressionEngine, ongoing, Roller, Macsanomat, WordPress, and BigBlogTool all provide dates which represent the last date/time the entry was modified, and there is no info for LiveJournal. We abandoned full LiveJournal compatibility a long time ago by requiring time zones. Older LJ posts do not have time zones. Don't know about the current ones. wunder -- Walter Underwood Principal Architect Verity Ultraseek
RE: Last Call: 'The Atom Syndication Format' to Proposed Standard
--On May 10, 2005 8:57:47 AM -0400 Scott Hollenbeck [EMAIL PROTECTED] wrote: I have to agree with Paul. I don't believe that the issue of white space in the syndicated content is really an Atompub issue. It might be an issue for the content creator. It might be an issue for the reader. As long as the pipe between the two passes the content as submitted, though, the pipe has done its job. If publishers and subscribers have obstacles to using Atom, that sounds like a problem to me. Everyone has this problem is not a good reason to ignore it. Someone has to be the first to solve it, might as well be us. It is not acceptable to build formats for the English Wide Web. That doesn't exist any more. wunder -- Walter Underwood Principal Architect, Verity
Re: Atom 1.0?
--On Tuesday, May 10, 2005 09:12:09 AM -0700 Paul Hoffman [EMAIL PROTECTED] wrote: At 9:09 PM -0700 5/9/05, Walter Underwood wrote: Seriously, I don't mind Atom 1.0 as long as the next version is Atom 2.0. +12 I'd also be happy with just Atom and saying RFC Atom when pressed for a version. Even with Atom 1.0 we'll need to say which RFC. If we choose a specific name, it *must* be in the RFC. Because the RFC must be a hit for that search. wunder -- Walter Underwood Principal Architect Verity Ultraseek
Re: Atom 1.0?
--On May 9, 2005 7:29:58 PM -0700 Tim Bray [EMAIL PROTECTED] wrote: Anyone have a better idea? --Tim Hey, let's vote on a *new* name. I'm +1 on Naked News, because it delivers the news without chrome and crap. Or maybe that is what you get when Atom (Adam?) goes public. Or because sex sells. Seriously, I don't mind Atom 1.0 as long as the next version is Atom 2.0. Please don't increment the right-of-the-dot part forever, because I just had to fix some software that made the (reasonable) assumption that 5.10==5.1, even though 5.10 is really Solaris 10. wunder -- Walter Underwood Principal Architect, Verity
Re: PaceCaching
--On May 6, 2005 4:28:44 PM -0700 Paul Hoffman [EMAIL PROTECTED] wrote: -1. Having two mechanisms in two different layers is a recipe for disaster. If HTTP headers are good enough for everything else on the web, they're good enough for Atom. That would be a problem. But this is one mechanism with two ways to specify it. One is out-of-band in a server-specific way, the other is in the document in a standard way. Either way, it is HTTP rules for caching at all intermediate caches and at the client. Architecturally, this is exactly the same as HTTP-EQUIV meta tags for HTTP headers, and very similar to the ROBOTS meta tag for /robots.txt. In both cases, they provide a way for the document author to specify something without having permissions on the server software config. Further, these should be implemented exactly like HTTP-EQUIV, where the server software reads them and sets the header. The HTTP-EQUIV meta tag is proof that "put it in the header" is not good enough for everything else. If that weren't needed, it would be deprecated by now. There is a problem here, though. We need to specify the priority of the in-document specs vs. the HTTP header specs. I propose following the HTTP standard, in saying that the HTTP headers trump anything in the body. I'll even assume that following the HTTP spec is non-controversial, and go update the PACE. wunder -- Walter Underwood Principal Architect, Verity
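The proposed precedence rule is simple enough to sketch. This is an illustrative assumption about how a consumer might merge the two sources, not text from the Pace; the dict representation is hypothetical:

```python
def effective_caching(http_headers, in_document):
    """Merge in-document (HTTP-EQUIV-style) caching values with real
    HTTP response headers, following the precedence rule proposed
    above: headers trump anything in the body; document values apply
    only where no corresponding header is present."""
    merged = dict(in_document)   # start from the document's values...
    merged.update(http_headers)  # ...and let real headers override
    return merged
```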
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
--On May 7, 2005 11:29:07 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote: Why would you put line breaks in the CJK source, then? Isn't the problem solved with the least heuristics by the producer not putting breaks there? It would be even better if they would just speak English. :-) White space is not particularly meaningful in some of these languages, so we cannot expect them to suddenly pay attention to that just so they can use Atom. There will be plenty of content from other formats with this linguistically meaningless white space. If we get this wrong, Atom-delivered content will look broken in some languages, and a bunch of extra-spec practice will build up about how to fix it. Much better to get it right in 1.0. wunder -- Walter Underwood Principal Architect, Verity
Re: Atom feed refresh rates
--On May 5, 2005 10:53:48 AM -0700 John Panzer [EMAIL PROTECTED] wrote: I assume an HTTP Expires header for Atom content will work and play well with caches such as the Google Accelerator (http://webaccelerator.google.com/). I'd also guess that a syntax-level tag won't. Is this important? The syntax-level tag is useful inside a client program with a cache. It can reduce the number of requests at the source, rather than reducing them in the middle of the network at an HTTP cache. There is extra benefit from putting that info into the HTTP headers, because the HTTP cache is shared between multiple clients. The source webserver sees one GET per HTTP cache instead of one GET per Atom client. The syntax-level tag also provides a way for the feed author to specify the info without depending on webserver-specific controls. It does depend on some extra bit of software to take that info and put it in the HTTP Expires or Cache-control headers. wunder -- Walter Underwood Principal Architect, Verity
RE: Selfish Feeds...
--On May 6, 2005 4:37:23 PM -0400 Bob Wyman [EMAIL PROTECTED] wrote: Frankly, I really wish that we had done the blog architecture work many months ago so that we would all have a shared understanding of the system-wide issues and components rather than the widely divergent personal and partial views that are obvious in many of our conversations today... Agreed. A conceptual model of a resource is up there at the front of our charter, and if we don't have that, it doesn't seem like the WG is done. wunder -- Walter Underwood Principal Architect, Verity
RE: Atom feed refresh rates
--On May 5, 2005 8:15:10 AM +0100 Andy Henderson [EMAIL PROTECTED] wrote: There is no RSS2 feature I can see that allows feed providers to tell aggregators the minimum refresh period. There's the ttl tag. That was, I believe, introduced for a different purpose and determines the maximum time a feed should be cached in a certain situation. We need both a ttl (max-age) and expires. One or the other is appropriate for different publishing needs. We also need to specify what you do with those values, or you end up with a mess, like the RSS2 ttl meaning reversing over an undocumented value (Yikes!). What has yet to be tried is a specific tag in the core feed standard that promotes and determines good behaviour for aggregators refreshing their feeds. Even if it were to prove only a limited benefit, it would still be a benefit. It has been tried several ways, originally in robots.txt extensions and also in RSS. It doesn't work. The model is not rich enough for publishers or for spiders/aggregators. Max-age/expires is already designed and proven. By page count, 20% of the HTTP 1.1 spec is about caching. If we want to write a new caching/scheduling approach, we can expect it to be a 20 page spec, plus an additional 10 pages on how to work with the HTTP model. See the Notes section here for details on when to use max-age or expires, and on the problems with calendar-based schemes. http://www.intertwingly.net/wiki/pie/PaceCaching wunder -- Walter Underwood Principal Architect, Verity
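The "we need both a ttl (max-age) and expires" point can be made concrete with a scheduling sketch. The function name and the tie-break rule (honor the later, more conservative hint) are illustrative assumptions, not from any spec:

```python
from datetime import datetime, timedelta, timezone

def next_poll(last_fetch, ttl_minutes=None, expires=None):
    """Compute when an aggregator might next fetch a feed, combining
    the two styles of cache hint: a relative ttl/max-age (minutes
    since the last fetch) and an absolute Expires time.

    Hypothetical sketch: when both hints are present, the later one
    wins, which is the conservative choice toward the publisher.
    """
    candidates = []
    if ttl_minutes is not None:
        candidates.append(last_fetch + timedelta(minutes=ttl_minutes))
    if expires is not None:
        candidates.append(expires)
    if not candidates:
        # No hints at all: fall back to an arbitrary default interval.
        return last_fetch + timedelta(hours=1)
    return max(candidates)
```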
Re: Atom feed refresh rates
--On May 5, 2005 8:07:15 AM -0500 Mark Pilgrim [EMAIL PROTECTED] wrote: Not to be flippant, but we have one that's widely available. It's called the Expires header. You need the information outside of HTTP. To quote from the RSS spec for ttl: This makes it possible for RSS sources to be managed by a file-sharing network such as Gnutella. Caching information is about knowing when your client cache is stale, regardless of how you got the feed. wunder -- Walter Underwood Principal Architect, Verity
Re: AtomPubIssuesList for 2005/05/05
--On May 5, 2005 7:17:00 AM -0400 Sam Ruby [EMAIL PROTECTED] wrote: Demonstrate that you have revisited the previous discussion, and that you either have something new to add, or can point out some evidence that the previous consensus call was made in error. PaceCaching was not discussed; it was rejected based on false information. It was rejected because it was HTTP-specific (it is not), and because it was non-core (similar features are common in other RSS specs). It does not interact with other features, so it should be a fairly clean, quick discussion. wunder -- Walter Underwood Principal Architect, Verity
Re: Atom feed refresh rates
PaceCaching uses the HTTP model for Atom, whether Atom is used over HTTP or some other protocol. PaceCaching was rejected by the editors because it was too late (two months ago) and non-core. I think that: a) it is never too late to get it right, and b) scalability is core. The PACE describes why refresh rates do not solve the problem adequately. wunder --On May 4, 2005 5:44:18 AM -0500 Brett Lindsley [EMAIL PROTECTED] wrote: Andy, I recall bringing up the same issue with respect to portable devices. My angle was that firing up the transmitter, making a network connection and connecting to the server is still an expensive operation in time and power (for a portable device) - even if the server returns nothing. There is no reason to check feeds that are not being updated, but then, there currently is no way to know this. I recall there was a proposal on cache control. That seemed like a good direction, but I don't recall it being discussed. As you indicated, if the feed had some element that indicated it won't be updated (for example) for another day (e.g. a daily news summary), then the end client would need to only check once a day. Brett Lindsley, Motorola Labs Andy Henderson wrote: If I'm asking this in the wrong place, sorry; please redirect me if you can. I am the author of an aggregator and I'm looking for advice on refresh rates. There was some discussion in this group back in June about a possible 'Refresh rate' element. That seems to have been dismissed in favour of bandwidth throttling techniques, notably etag, last-modified and compression. I already support all these plus some additional ones. I am uncomfortable, though, with the implication that refresh rates don't matter and should be left to the end-user to decide. I am adding Atom support to my Agg. For RSS feeds, I have used the ttl and sy:updatePeriod / sy:updateFrequency elements to allow feed providers to limit refresh rates.
I have, in any case, imposed a minimum refresh rate of one hour - because that seemed the decent thing to do. However, I'm coming under pressure to reduce that minimum limit for feeds that are clearly designed for shorter refresh periods - such as the Gmail Atom feeds. I'm reluctant to implement a free-for-all so I'm looking for guidance on how I should tackle this issue. Andy Henderson Constructive IT Advice -- Walter Underwood Principal Architect, Verity
Re: FYI: More on duplicates in feeds: DoubleClick does ads the WRONG way!
--On May 2, 2005 5:32:22 PM +1000 Eric Scheid [EMAIL PROTECTED] wrote: Counting impressions is essential to their trade, and you'll find that it is industry standard practice. Make that was essential, and should be a dying practice. Ads have moved to results-based billing, paying for clickthrough and conversion. wunder -- Walter Underwood Principal Architect, Verity
Re: PaceOptionalFeedLink
--On April 30, 2005 3:03:50 PM -0400 Robert Sayre [EMAIL PROTECTED] wrote: atom:feed elements MUST NOT contain more than one atom:link element with a rel attribute value of alternate that has the same combination of type and hreflang attribute values. That actually specifies something different, the duplication, without saying whether atom:link is recommended. I recommend adding this text: An atom:feed element SHOULD/MAY contain one such atom:link element. I'll let other people contribute on whether it is SHOULD or MAY. wunder -- Walter Underwood Principal Architect, Verity
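The MUST NOT quoted above is straightforward to check mechanically. A sketch, assuming links are plain dicts (and using RFC 4287's rule that a missing rel attribute defaults to "alternate"):

```python
def violates_alternate_rule(links):
    """Return True if more than one rel="alternate" link shares the
    same (type, hreflang) combination -- the constraint quoted above.

    Illustrative validator sketch, not code from any Atom tool.
    """
    seen = set()
    for link in links:
        # Atom treats a link with no rel attribute as rel="alternate".
        if link.get("rel", "alternate") == "alternate":
            key = (link.get("type"), link.get("hreflang"))
            if key in seen:
                return True
            seen.add(key)
    return False
```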
Re: HTML/XHTML type issues, was: FW: XML Directorate Reviewer Comments
--On April 13, 2005 9:06:59 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote: Instead of saying XHTML it would be clearer to say XHTML 1.x or defining it in terms of the XHTML 1.x namespace URI. This could work. XHTML 1.0 will not be confused with a media type. When XHTML 2.0 is ready, we can add a supplemental RFC which defines a new attribute value for that. wunder -- Walter Underwood Principal Architect, Verity
Re: PaceCoConstraintsAreBad
--On April 8, 2005 8:29:52 PM -0400 Robert Sayre [EMAIL PROTECTED] wrote: Please don't respond to me by saying that accessibility is important. I would never say that. Required or essential, but not merely important. wunder -- Walter Underwood Principal Architect, Verity
Re: PaceCoConstraintsAreBad
--On April 8, 2005 6:59:47 PM -0400 Robert Sayre [EMAIL PROTECTED] wrote: Walter, you are missing my point. You've said it yourself: Maybe summaries are optional, but not because accessibility is optional.[0] That was in reply to a proposal to make accessibility an optional profile, and to make summaries required only in that profile. That approach is unacceptable. I would read my comment as regardless of your position on summaries, accessibility is required. Local textual summaries are rather common on the web. The a tag, for example. Current accessibility practice is to make the anchor text understandable out of context. In other words, to make it a summary of the linked resource. Even if the remote resource is text! For the img tag, the alt tag is used to provide a local, textual equivalent. Again, this is required practice for accessibility. Same thing for graphs, charts, audio, and video. These are top-level requirements. They fit on the WAI pocket card. There are ten quick tips and five of them are about local textual equivalents: http://www.w3.org/WAI/References/QuickTips/ wunder -- Walter Underwood Principal Architect, Verity
Re: Spaces supports slash:comments. Result = Duplicates Galore!
One way to look at this is to define what parts are local content as opposed to caches of remote content, and base the Etag or other hash on that. I still think we should address caching in Atom 1.0. This would have been part of that. Scaling is an essential thing for syndication, and caching is the best known way to scale. wunder --On Thursday, April 07, 2005 02:48:07 PM -0400 Bob Wyman [EMAIL PROTECTED] wrote: Spaces.msn.com recently announced support for slash:comments, an element which shows how many comments an RSS item has associated with it. As Dare Obasanjo explains[1]: Another cool RSS enhancement is that the number of comments on each post is now provided using the slash:comments elements. Now users of aggregators like RSS Bandit can track the comment counts on various posts on a space. I've been wanting that since last year. Of course, the side effect of this change is that any aggregator that uses an MD5-like approach to detect changes will now think that an entry has been updated every time a new comment is made. This may or may not be what is desired by consumers of feeds... In any case, there are now millions of blogs whose entries are changed every time anyone comments on them. Should aggregators ignore changes that are limited to the slash:comments element? If so, are there other elements that should be ignored? Now, Spaces only publishes RSS feeds... However, if similar atom extensions were to be defined, the problem would appear with Atom feeds as well. bob wyman [1] http://spaces.msn.com/members/carnage4life/Blog/cns%211piiOwAp2SJRIfUfD95CnRLw%21430.entry -- Walter Underwood Principal Architect Verity Ultraseek
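The hash-only-the-local-parts idea can be sketched directly. Entries are plain dicts and the field list is an illustrative assumption, not an Atom or RSS schema:

```python
import hashlib

def entry_signature(entry, local_fields=("id", "title", "content", "updated")):
    """Compute a change signature over only the publisher's own
    ("local") fields, so volatile remote-derived data such as a
    slash:comments count does not register as an update.

    Hypothetical sketch of the approach suggested above.
    """
    h = hashlib.md5()
    for field in local_fields:
        h.update(str(entry.get(field, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so field boundaries can't collide
    return h.hexdigest()
```

With this, a new comment changes slash:comments but leaves the signature, and therefore the aggregator's notion of "updated," untouched.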
Re: Alternative to the date regex
+1 on dropping the regex. It isn't from any of the other specs, it isn't specifically called out as explanatory and non-normative, and it is too long to be clear. Some examples would be nice, along with some examples of things which do not conform. wunder --On March 25, 2005 5:11:09 PM +0000 Graham [EMAIL PROTECTED] wrote: Currently we have this A Date construct is an element whose content MUST conform to the date-time BNF rule in [RFC3339]. I.e., the content of this element matches this regular expression: [0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|[\+\-][0-9]{2}:[0-9]{2}) As a result, the date values conform to the following specifications... The problem with the regex is that it's entirely redundant. If we look at Norm's message where the regex was suggested [1], he intends it as a profile of xsd:dateTime, which allows a variety of date formats. However we're using it as a profile of RFC3339, which already requires that date-times match the regex 100%. Having the regex there as well is just confusing - until preparing this email I was under the impression it made some additional restrictions on RFC3339. The nearest thing I see to an additional restriction is that there must be a capital T between the date and time, which the date-time BNF rule we mention also requires, but the prose later mentions you might be allowed to use something different. Proposal: Replace the first para and regex with: A Date construct is an element whose content MUST conform to the date-time BNF rule in [RFC3339]. Note this requires an uppercase letter T between the date and time sections. Secondly, *all* RFC3339 date-times are compatible with the 4 specs mentioned, so the wording of the second paragraph (As a result...) is a bit strange, since it's not as a result of anything we've done. Just say Date values expressed in this way are also compatible with Graham [1]http://www.imc.org/atom-syntax/mail-archive/msg13116.html -- Walter Underwood Principal Architect, Verity
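The requested examples of conforming and non-conforming values can be sketched by transcribing the draft's regex into Python. This is illustrative only; the date-time BNF rule in RFC 3339 remains the normative definition:

```python
import re

# RFC 3339 date-time as a regular expression (illustrative check only):
# full-date "T" partial-time, optional fraction, then "Z" or an offset.
RFC3339 = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}"    # full-date: YYYY-MM-DD
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}"    # uppercase T, then HH:MM:SS
    r"(\.[0-9]+)?"                    # optional fractional seconds
    r"(Z|[+-][0-9]{2}:[0-9]{2})$"     # "Z" or a numeric UTC offset
)
```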
Re: new issues in draft -06, was: Updated issues list
--On March 20, 2005 11:44:30 AM -0800 Tim Bray [EMAIL PROTECTED] wrote: Good point. My impression is that we do currently have SHOULD-level mandate to serve valid HTML; recognizing that most real-world implementors do make a best-effort with tag soup. Anyone who thinks that the language needs improving should suggest improvements. I support a SHOULD on that. The Robustness Principle would suggest exactly that. Consumers of Atom may make an attempt to parse arbitrary HTML-like content, but producers should make the effort to serve clean HTML. That free-range HTML is nasty stuff. In the past week, we had two customers freely mixing slash and backslash in their URL paths. Sigh. wunder -- Walter Underwood Principal Architect, Verity
Re: PaceRepeatIdInDocument solution
About logical clocks in atom:modified: --On February 21, 2005 3:30:13 AM +1100 Eric Scheid [EMAIL PROTECTED] wrote: Semantically, it would work ... for comparing two instances of one entry. It wouldn't work for establishing if an entry was modified before or after [some event moment] (eg. close of the stock exchange). Establishing sequences of events is rather tricky. See Leslie Lamport's Time, Clocks, and the Ordering of Events in Distributed Systems for how to do it with logical clocks. The core part of the paper is short, maybe five pages, and definitely worth reading if you care about this stuff. http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf Synchronized clocks make this simpler. If Atom depends on comparing timestamps from different servers, then synchronized clocks are a SHOULD. See the text in PaceCaching for an example. Synchronized clocks are already a SHOULD for HTTP. wunder -- Walter Underwood Principal Architect, Verity
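The core mechanism of Lamport's paper fits in a few lines. A minimal illustrative sketch, not anything proposed for Atom itself:

```python
class LamportClock:
    """Lamport's logical clock: tick on local events; on receiving a
    message, jump past the sender's timestamp. The resulting order
    respects causality (a send always precedes its receive) but says
    nothing about wall-clock time -- which is why synchronized clocks
    still matter for cross-server timestamp comparison."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event and return the new timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Stamp an outgoing message (a send is a local event)."""
        return self.tick()

    def receive(self, stamp):
        """Merge an incoming timestamp so the receive orders after it."""
        self.time = max(self.time, stamp) + 1
        return self.time
```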
Re: atom:entry elements MUST contain an atom:summary element in any of the following cases
I don't think that accessibility is optional. It isn't a profile, it is a requirement. Maybe summaries are optional, but not because accessibility is optional. wunder --On February 14, 2005 8:48:08 PM -0800 James M Snell [EMAIL PROTECTED] wrote: At the risk of beating the PaceProfile drum to death, I would think that an Accessibility profile could be used to specify specific requirements for accessible feeds. The core could do exactly as you suggest below -- not require summary. -- Walter Underwood Principal Architect, Verity
RE: PaceHeadless
--On Tuesday, February 08, 2005 08:39:42 AM -0500 Bob Wyman [EMAIL PROTECTED] wrote: Linking to the feed is not an acceptable solution. It must be possible to embed feed metadata in an entry in a feed and in an Entry document. +1 The feed document *must* be standalone. Everything required to interpret the feed has to be in the feed. wunder -- Walter Underwood Principal Architect Verity Ultraseek
Re: PaceClarifyDateUpdated
--On February 6, 2005 1:07:42 PM +0200 Henri Sivonen [EMAIL PROTECTED] wrote: Yes. Also as a spec expectation--that is, how often is the SHOULD NOT expected to be violated. Will the SHOULD NOT be violated so often that it dilutes the meaning of all SHOULD NOTs? Roughly, a SHOULD or SHOULD NOT can be violated when the implementer understands and accepts the interoperability limitations of that decision. So, the spec should (must?) explain what those are. wunder -- Walter Underwood Principal Architect, Verity
RE: PaceArchiveDocument posted
I agree, but I would put it another way. The charter requires support for archives, but we don't have a clear model for those. Without a model, we can't spec syntax. So, it is not possible for the current doc to fulfill the charter, and this document is not ready for last call. wunder --On February 6, 2005 2:00:20 AM -0500 Bob Wyman [EMAIL PROTECTED] wrote: -1. The use cases for archiving have not been well defined or well discussed on this list. It is, I believe, inappropriate and unwise to try to rush through something this major at the last moment before a pending Last Call. bob wyman -- Walter Underwood Principal Architect, Verity
Re: PaceCaching posted
This is not restricted to HTTP. It uses HTTP's cache age algorithms, because they are very carefully designed and have proven effective. But it can be used for any local copy in an Atom client. wunder --On Monday, February 07, 2005 10:08:48 AM -0800 Paul Hoffman [EMAIL PROTECTED] wrote: At 9:38 AM -0800 2/7/05, Walter Underwood wrote: I was holding this back as out of scope and too close to the deadline, but now that we are talking about sliding windows and delayed, cached state, it is quite relevant. Sorry, this is too late for consideration for the Atom core. Even if you had turned it in on time, I would give it a -1 for not being essential to the core for the Atom format. Atom will be distributed over many protocols, HTTP being one of them. Having said that, I think this would be an excellent extension, one that might keep the folks who don't understand HTTP scalability but feel free to talk about it anyway at bay. --Paul Hoffman, Director --Internet Mail Consortium -- Walter Underwood Principal Architect Verity Ultraseek
Re: PaceEntryOrder
--On February 7, 2005 1:06:49 PM -0500 Robert Sayre [EMAIL PROTECTED] wrote: Paul Hoffman wrote: +1. It is a simple clarification that shows the intention without restricting anyone. +1. Agree in full. -1. I don't see the benefit. Clients MAY re-order them, but that doesn't mean they MUST ignore the order. The publisher may prefer an order which cannot be expressed in the attributes. The Macintouch and BBC News feeds cited before are good examples. wunder -- Walter Underwood Principal Architect, Verity
Re: PaceEntryOrder
--On Monday, February 07, 2005 12:24:15 PM -0800 Paul Hoffman [EMAIL PROTECTED] wrote: At 11:07 AM -0800 2/7/05, Walter Underwood wrote: -1. I don't see the benefit. Clients MAY re-order them, but that doesn't mean they MUST ignore the order. The publisher may prefer an order which cannot be expressed in the attributes. The Macintouch and BBC News feeds cited before are good examples. I'm very confused. Clients that show the entries of those feeds in the received order are perfectly acceptable according to the wording of this Pace. Correct, clients may choose any order, including the original. This is about the publisher's order preference. The Pace says that the publisher cannot indicate a preferred order in the Atom format. The order is not significant. This is clearly counter to normal use, where the order does have some meaning. The meaning varies by publisher, but it is usually significant. wunder -- Walter Underwood Principal Architect Verity Ultraseek
RE: Entry order
--On February 3, 2005 11:21:50 PM -0500 Bob Wyman [EMAIL PROTECTED] wrote: David Powell wrote: It looks like this might have got lost accidently when the atom:head element was introduced. Previously Atom 0.3 said [1]: Ordering of the element children of atom:feed element MUST NOT be considered significant. +1. The order of entries in an Atom feed should NOT be significant. This is, I think, a very, very important point to make. -1 Is this a joke? This is like saying that the order of the entries in my mailbox is not significant. Note that ordering a mailbox by date is not the same thing as its native order. Feed order is the only way we have to show the publication order of items in a feed. I just looked at all my subscriptions, and there is only one where the order might not be relevant, a security test for RSS readers. That is clearly not within Atom's charter, so it doesn't count. wunder -- Walter Underwood Principal Architect, Verity
Re: Entry order
--On February 4, 2005 11:44:31 AM -0800 Tim Bray [EMAIL PROTECTED] wrote: On Feb 4, 2005, at 11:27 AM, Walter Underwood wrote: Is this a joke? This is like saying that the order of the entries in my mailbox is not significant. Note that ordering a mailbox by date is not the same thing as its native order. Except for, Atom entries have a *compulsory* updated date. So I have no idea what semantics you'd attach to the natural order... -Tim Order the publisher wants to present them in. Conventionally, most recently published first. Entries may be updated without being reordered. If clients are told to ignore the order, and given only an updated timestamp, there is no way to show most recent headlines, which is the primary purpose of the whole family of RSS formats. Right now, you can shuffle the entries and Atom says it is the same feed. Either we need a published date stamp or we need to honor the order. wunder -- Walter Underwood Principal Architect, Verity
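The argument can be made concrete with hypothetical data (titles and timestamps invented for illustration): an older entry edited after a newer one was published makes document order and atom:updated order diverge.

```python
# Document order, newest publication first -- but B, an older post,
# was edited after C appeared, so its atom:updated is the latest.
entries = [
    {"title": "C", "updated": "2005-02-03T09:00:00Z"},
    {"title": "B", "updated": "2005-02-04T12:00:00Z"},  # old post, recent edit
    {"title": "A", "updated": "2005-02-01T08:00:00Z"},
]

feed_order = [e["title"] for e in entries]

# A client that discards document order and sorts on atom:updated
# shows the edited old post first, not the newest headline:
updated_order = [e["title"]
                 for e in sorted(entries, key=lambda e: e["updated"],
                                 reverse=True)]
```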
Re: Entry order
--On February 4, 2005 4:28:53 PM -0600 Roger B. [EMAIL PROTECTED] wrote: If clients are told to ignore the order, and given only an updated timestamp, there is no way to show most recent headlines... At a single moment within a feedstream, sure... but the next time an entry is added to that feed, I'll have no problem letting the user know that this is new stuff. But if three are added, you can't order those three. wunder -- Walter Underwood Principal Architect, Verity
Re: Format spec vs Protocol spec
On the other hand, the original plan was to publish both specs at the same time, which I still think is a good idea. Do we think there will be any dependencies in the other direction? That is, when we work on the protocol, will we need to add things to feed or entry? If there is a reasonable chance of that, then Atom 1.0 is a temporary thing, and Atom 1.1 will be the real one. If that is the case, we should leave in the dependency and publish them together. wunder --On Tuesday, February 01, 2005 09:40:38 PM -0800 Paul Hoffman [EMAIL PROTECTED] wrote: At 10:05 PM +0100 2/1/05, Julian Reschke wrote: As far as I understand the IETF publication process, this means that draft-ietf-atompub-format can't be published until the protocol spec is ready as well. Others have said we can and should remove the dependency, which is fine. Wearing my nitpicky-IETF-geek hat, I would point out that specs with dangling dependencies can be made standards without clearing the dependencies; they simply can't be published as RFCs with them. There are dozens (possibly over a hundred) IETF standards-track documents that have not yet been published as RFCs for a variety of reasons, many of them quite lame. --Paul Hoffman, Director --Internet Mail Consortium -- Walter Underwood Principal Architect Verity Ultraseek
Re: Format spec vs Protocol spec
--On Wednesday, February 02, 2005 11:53:29 AM -0700 Antone Roundy [EMAIL PROTECTED] wrote: On Wednesday, February 2, 2005, at 11:56 AM, Walter Underwood wrote: We are assuming that Atom will need extensions for new applications, but it should not need extensions for editing blog entries. I'd have to disagree. I don't think it inappropriate for elements that exist for use by the publishing protocol to live in a separate namespace from the feed itself. Rather, I think that a clean separation between the two would be desirable. Why require the feed format to be revised if we really just want to alter the publishing protocol? I'm not talking about altering the publishing protocol, I'm talking about things needed to make 1.0 work. It would seem a little odd if we needed extensions or a 1.1 for the 1.0 publishing protocol. wunder -- Walter Underwood Principal Architect Verity Ultraseek
Re: Issues with draft-ietf-atompub-format-04
--On January 30, 2005 10:06:23 PM +0200 Henri Sivonen [EMAIL PROTECTED] wrote: So how many European sites besides the EU have the resources to provide translations of the *same* content in multiple languages at the same time? Pretty common in Quebec. We see English and Spanish in the US from Texas to California. California has voter guides in seven languages. It isn't limited to governments: UBS's site is in four languages, and the San Jose Mercury News has editions in Spanish and Vietnamese. How many of those can't provide multiple feed links and really want to stuff everything in a single feed? Good question. The answer probably depends on how much client software allows you to select a preferred locale. All browsers do, so they could easily do that with Atom feeds. Locales aren't just language. You could offer English in US, UK, and Australian versions. I was completely mystified about what the Aussies might mean by footy tipping. wunder -- Walter Underwood Principal Architect, Verity
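[Editor's note] The "multiple feed links, client selects a locale" approach can be sketched client-side. Assuming the publisher exposes one alternate link per language with an hreflang attribute (the link dicts and preference list here are hypothetical, purely to show the matching, not any aggregator's actual API):

```python
def pick_feed(links, preferred):
    """Choose the alternate feed whose hreflang best matches the
    reader's ordered locale preferences, falling back from an exact
    locale match (es-MX) to a bare language match (es), then to the
    first link offered."""
    for lang in preferred:
        for link in links:
            hreflang = link.get("hreflang", "")
            if hreflang == lang or hreflang.split("-")[0] == lang.split("-")[0]:
                return link
    return links[0] if links else None
```

This is the same negotiation browsers already do with Accept-Language, which is Walter's point: the client machinery largely exists.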
Re: PaceOrderSpecAlphabetically
--On January 30, 2005 12:34:42 PM -0500 Sam Ruby [EMAIL PROTECTED] wrote: == Abstract == Order the Element Definitions in the specification alphabetically. +1. Yes, please. It works fine for the HTTP spec. wunder -- Walter Underwood Principal Architect, Verity
Re: PaceMustBeWellFormed status
--On Monday, January 24, 2005 04:17:40 PM -0800 Tim Bray [EMAIL PROTECTED] wrote: If there were no further discussion: The WG completely failed to converge to consensus on these issues last time around. Consensus can still be found here. -Tim I'm +1 on this, and feel that it belongs in the spec. This is a constraint on the format of the feed document, and is testable. Forbidding re-parsing (6.2) is OK, and not a restatement of the XML spec. If you use a parser which isn't an XML parser, it might process the doc. This says you can't do that. I think that the rationale misstates the Pace. It says that Atom feeds must always be ASCII, but the proposal only requires that for text/xml feeds. application/xml feeds may use UTF-8, either in an encoding declaration or with a charset parameter. I would add a note that 3023 is normative, and maybe move the notes in 6.1 to an appendix. Are we sure we want "RFC 3023 or its successor" rather than just RFC 3023? A successor could make some Atom feeds illegal without a change to the Atom spec. wunder -- Walter Underwood Principal Architect Verity Ultraseek
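[Editor's note] The charset distinction Walter draws can be made concrete. A simplified sketch of the RFC 3023 rules the Pace leans on (real conformance has more media types and cases; this covers only the two discussed, and the function is illustrative, not from any library):

```python
def effective_charset(media_type, charset=None, xml_decl_encoding=None):
    """Apply the simplified RFC 3023 rules: a charset parameter on the
    media type always wins; text/xml without one defaults to us-ascii,
    ignoring any XML encoding declaration; application/xml falls back
    to the encoding declaration, then to UTF-8."""
    if charset:
        return charset
    if media_type == "text/xml":
        return "us-ascii"
    return xml_decl_encoding or "utf-8"
```

So a UTF-8 feed served as bare text/xml is effectively restricted to ASCII, while the same bytes as application/xml are fine, which is the asymmetry the Pace's rationale glosses over.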
Re: AtomOWL AtomIsRDF
--On Monday, January 17, 2005 12:16:36 PM -0500 Dan Brickley [EMAIL PROTECTED] wrote: I fear [2] is unfortunately named. Atom is RDF-like in some ways, but until the Atom spec says Atom is RDF, Atom isn't RDF. Call it AtomAsRDF. wunder -- Walter Underwood Principal Architect Verity Ultraseek
Re: PaceMustUnderstandElement
Excellent examples. Each of these could be handled without mustUnderstand. Define an extension for entries. Put the restricted content inside the extension. The extension would include the display constraints between the content portions and the disclaimer or authentication portions. This could mean duplicating some content elements inside the extension. On the other hand, it would add support for restricted content instead of redefining regular content as potentially restricted. wunder --On Thursday, January 13, 2005 02:46:06 PM -0800 Tim Bray [EMAIL PROTECTED] wrote: On Jan 13, 2005, at 2:29 PM, David Powell wrote: Does anyone have any example use cases for mustUnderstand? 1. A stream of financial disclosures from a public company in a highly-regulated industry. The legislation is very clear that they may not say anything in public unaccompanied by disclaimers and limitation-of-liability statements. The financial industry gets together and introduces an extension that requests clients to display these disclaimers in a fashion that meets the regulatory requirements. If Atom has MustUnderstand, compliant clients that can't do this will never fail to display the appropriate material, and this reduces the risk of litigation and makes it more likely that such feeds will be created. 2. A stream of information that uses a special-purpose digital-signature scheme to establish the authenticity of the information. People should not act on this information without checking the signature. A person using a conformant Atom client can be sure that they won't see anything that hasn't been checked. -Tim -- Walter Underwood Principal Architect Verity Ultraseek
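[Editor's note] A sketch of the extension approach Walter describes, using a made-up namespace (http://example.org/restricted and its element names are hypothetical, not a registered extension): the disclaimer and the restricted content travel together inside one extension element, so a client that doesn't recognize the namespace drops both together, never showing unguarded content:

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace for illustration only.
EXT = "http://example.org/restricted"

def wrap_restricted(entry, disclaimer, body):
    """Attach restricted content to an entry element inside a single
    extension wrapper, pairing it with its mandatory disclaimer so
    the two can only be displayed (or skipped) as a unit."""
    wrapper = ET.SubElement(entry, f"{{{EXT}}}restricted")
    ET.SubElement(wrapper, f"{{{EXT}}}disclaimer").text = disclaimer
    ET.SubElement(wrapper, f"{{{EXT}}}content").text = body
    return wrapper
```

This is why no mustUnderstand flag is needed: ordinary ignore-unknown-markup behavior already produces the safe outcome.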
Re: PaceFeedRecursive
--On Thursday, January 13, 2005 06:55:53 PM -0500 Sam Ruby [EMAIL PROTECTED] wrote: The proposal apparently is for feeds to contain other feeds by containment. My question is whether it would make sense to also support feeds containing other feeds by reference, perhaps via a link element or via a src= attribute (analogous to content/@src defined in 5.12.2). As someone maintaining a spider and search engine, I'm -1 on inclusion by reference. The additional complexity is just not worth it. If a referenced doc is a 404, is the feed state OK? Should I index it or not? Do I always need to re-visit referred docs when I revisit the main one? What do I do with loops? Do I index all the content as belonging to the main doc? What if that is included in something else by reference, can it still be a standalone item? For the same reasons, I'm not very hot on any sort of content by reference. But recursive feeds by reference opens up lots more issues. Basically, if you want your content in a search engine, it had better be accessible with a single GET. wunder -- Walter Underwood Principal Architect Verity Ultraseek
Re: Hash for Links [Was: Re: Posted PaceEnclosuresAndPix]
This is really cache management. Use ETags, from the HTTP 1.1 spec. Or use an HTTP cache, which would require no changes to Atom. Going through a client-side HTTP 1.1 cache would automatically take advantage of the ETags and other caching information in HTTP. wunder --On Saturday, January 08, 2005 06:14:50 PM -0800 James Snell [EMAIL PROTECTED] wrote: I really don't want to be going down the road of requiring HTTP header equivalents in the Atom feed, etc. All I want is the ability to specify a hash of whatever it is that is being linked to. It could work in both link and content elements and one could easily use the Content-MD5 header to verify whether or not the resource referenced has been modified since the time it was included in the Entry. The URI and the length of the file do not guarantee that the content has not changed and yes, I had considered this as a possible non-core extension but wanted to float it as a core item first. On Sat, 08 Jan 2005 15:02:27 -0500, Robert Sayre [EMAIL PROTECTED] wrote: Bill de hÓra wrote: <link rel="enclosure" href="http://example.com/somefile.mp3" hash="{generated_hash_value}" hashalg="{uri_identifying_the_hash_algorithm_used}" /> The hash and hashalg attributes would be optional but MUST appear together. Thoughts? (If we have more than two people respond favorably to this, I'll write up a Pace for it) Seems like a good idea - would it be possible to move them into elements? Well, Content-Length lives in the attributes as length, but I don't think we need to make a home for every HTTP header. Content-MD5 will work just fine; it would probably be wise to send a HEAD request before automatically downloading a giant mp3. Furthermore, you'll get a good enough identifier by concatenating the URI and the length. Something more accurate will require a HEAD request. Thirdly, there's absolutely no reason to have this in core.
Robert Sayre -- - James Snell http://www.snellspace.com [EMAIL PROTECTED] -- Walter Underwood Principal Architect Verity Ultraseek
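[Editor's note] For the Content-MD5 route Robert mentions, verification needs no Atom extension at all. A minimal sketch of the check (the header format follows HTTP/1.1: base64 of the binary MD5 digest of the body; the function name is illustrative):

```python
import base64
import hashlib

def matches_content_md5(body, content_md5_header):
    """Check fetched bytes against a Content-MD5 header value:
    base64-encode the raw MD5 digest of the body and compare it to
    the header string."""
    digest = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return digest == content_md5_header
```

A client would issue a HEAD, compare the header against the digest it recorded when the entry was first seen, and only GET the enclosure on a mismatch, which is the HTTP-level equivalent of the proposed hash attribute.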