Re: Atom 1.0 xml:base/URI funnies
Tuesday, July 19, 2005, 12:44:51 AM, A. Pagaltzis wrote: You misunderstood what I said. The point is that regardless of how the base URI is determined (whether it is embedded in content or otherwise), it *means* that the content it applies to was actually found at the base URI. It’s not simply any arbitrary old prefix defined for convenience. Why does xml:base allow for relative base URIs and stacking then? If xml:base can only describe the actual source URI of the document, then these features don't make sense. The example in the xml:base spec [1] uses a relative URI in the <olist xml:base="/hotpicks/"> element, after defining an absolute URI in <doc xml:base="http://example.org/today/"> at the top of the document. If xml:base can only describe the source URI, then one of them must be lying? [1] http://www.w3.org/TR/xmlbase/#syntax -- Dave
Re: Atom 1.0 xml:base/URI funnies
* David Powell [EMAIL PROTECTED] [2005-07-19 08:25]: Why does xml:base allow for relative base URIs and stacking then? If xml:base can only describe the actual source URI of the document, then these features don't make sense. Indeed, they don’t. The example in the xml:base spec [1] uses a relative URI in the <olist xml:base="/hotpicks/"> element, after defining an absolute URI in <doc xml:base="http://example.org/today/"> at the top of the document. [1] http://www.w3.org/TR/xmlbase/#syntax That example says: the content of the root element can be found in the resource at http://example.org/today/, and the content of the olist tag can be found in the resource at http://example.org/hotpicks/. xml:base is quite apparently being used as “a prefix for calculating relative URIs” instead of “the source URI for the material found inside this tag.” It makes me wonder whether the person who wrote the example was unaware of the consequences of the same-document reference specifications in the URI RFC. Surely, the xml:base WG must have noticed this issue and discussed it? If xml:base can only describe the source URI, then one of them must be lying? xml:base provides a mechanism to describe the base URI of any part of an XML document. If you copy bits from another document and don’t want to munge contained URI references, then you will need to set an xml:base on the copied container element for these copied bits. Notice that xml:base describes **a base URI**. The xml:base TR does not define what a base URI is. It’s RFC3986 (and originally, RFC2396) which does, while describing just what an URI is, to begin with. The same-document reference stanza in the URI RFCs is clear evidence that in the spirit of the spec, “base URI” means “the source URI of the content,” not “the prefix I wish to apply to relative references.” Now, xml:base appears to try to address the situation where an aggregate document may contain fragments from many sources, and each of which thus has its own base URI.
But the devilish detail is that RFC-specified behaviour means that if a useragent were to find a link to http://example.org/today/ somewhere inside the example document except inside the olist tag, or a link to http://example.org/hotpicks/ inside the olist tag, it may not retrieve that URL – instead it would have to consider the XML document itself to be the document found at the respective URL. This is what RFC3986 says. The xml:base TR contains no language that would contradict, or even enforce, or in fact at all address this point. We therefore have to go by the behaviour specified in RFC3986 when we determine how a user agent resolves URIs. At first, I thought the RFC-specified same-document reference stanza made no sense. But then I realized it is perfectly fine and absolutely desirable for the case where the “base URI embedded in content” applies to the entire document. It is the xml:base TR which is at odds with this; applying same-document reference behaviour to fragments of an aggregate document is non-sensical. The more I think about it, the more it seems like this interaction is Broken As Designed. xml:base should not have adopted the “base URI” term – basically, it appears that the very attribute name itself is a misnomer. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
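To make the stacking and the same-document question concrete, here is a minimal Python sketch. It assumes the standard library's `urljoin` as a stand-in for RFC 3986 reference resolution; the helper names `effective_base` and `is_same_document` and the document URI `http://example.org/index.xml` are made up for illustration and come from neither spec.

```python
from urllib.parse import urljoin, urldefrag


def effective_base(document_uri, xml_base_chain):
    """Fold nested xml:base values (outermost first) into one absolute base URI."""
    base = document_uri
    for value in xml_base_chain:
        # Each xml:base is itself resolved against the base in effect around it.
        base = urljoin(base, value)
    return base


def is_same_document(base, reference):
    """True if the reference, ignoring any fragment, resolves to the base URI itself."""
    target = urljoin(base, reference)
    return urldefrag(target).url == urldefrag(base).url


# Hypothetical retrieval URI; the xml:base values are those from the spec example.
doc_base = effective_base("http://example.org/index.xml",
                          ["http://example.org/today/"])
olist_base = effective_base("http://example.org/index.xml",
                            ["http://example.org/today/", "/hotpicks/"])

print(doc_base)    # http://example.org/today/
print(olist_base)  # http://example.org/hotpicks/

# Inside <olist>, a link to /hotpicks/ is a same-document reference,
# while a link to /today/ is not; outside <olist> it is the other way round.
print(is_same_document(olist_base, "http://example.org/hotpicks/"))
print(is_same_document(olist_base, "http://example.org/today/"))
```

Read this way, the same absolute link is or is not a same-document reference depending only on which element it sits in, which is the oddity described above.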
Re: Feed History -02
On 18.07.2005 at 23:21, Mark Nottingham wrote: On 18/07/2005, at 2:17 PM, Stefan Eissing wrote: On a more semantic issue: The described sync algorithm will work. In most scenarios the abort condition (e.g. all items on a historical feed are known) will also do the job. However this still means that clients need to check the first fh:prev document if they know all entries there - if my understanding is correct. This is one of the unanswered questions that I left out of scope. The consumer can examine the previous archive's URI and decide as to whether it's seen it or not before, and therefore avoid fetching it if it already has seen it. However, in this approach, it won't see changes that are made in the archive (e.g., if a revision -- even a spelling correction -- is made to an old entry); to do that it either has to walk back the *entire* archive each time, or the feed has to publish all changes -- even to old entries -- at the head of the feed. I left it out because it has more to do with questions about entry deleting and ordering than with recovering state. It's an arbitrary decision (I had language about this in the original Pace I made), but it seemed like a good trade-off between complexity and capability. It is a valid starting point. I am just wondering what consequences it has for client implementations. Let's say CNN goes stateful: how would a client handle a history which soon consists of thousands of entries? How would a server best offer such a history to avoid clients retrieving it over and over again? Probably nobody has a good idea on that one, or do they? I have the feeling that clients will need to protect themselves from servers with almost infinite histories. So a client will probably offer an "XX days into the past, max NN entries" setting in its UI. Maybe that is all that's needed. How about: In case feeds are served via HTTP, server implementations SHOULD offer ETag and Last-Modified headers on history documents (see RFC 2616 xxx).
Clients SHOULD persist ETag and Last-Modified information and use If-* headers to ease server load on history synchronization. //Stefan
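As a rough illustration of the proposed wording, a client could persist the validators per history document and replay them as conditional request headers. The following Python sketch is only an assumption about how that might look; the function names and the shape of the stored dictionary are invented here, and no actual HTTP request is made.

```python
def remember_validators(response_headers):
    """Extract the validators worth persisting from a response's headers."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return {
        "etag": lowered.get("etag"),
        "last_modified": lowered.get("last-modified"),
    }


def conditional_headers(validators):
    """Build If-* request headers from previously persisted validators."""
    headers = {}
    if validators.get("etag"):
        headers["If-None-Match"] = validators["etag"]
    if validators.get("last_modified"):
        headers["If-Modified-Since"] = validators["last_modified"]
    return headers


# Example: headers seen on an earlier fetch of a history document (made-up values).
stored = remember_validators({
    "ETag": '"abc123"',
    "Last-Modified": "Mon, 18 Jul 2005 21:17:00 GMT",
})
print(conditional_headers(stored))
```

A server that honours these headers can answer 304 Not Modified, so an unchanged history document costs one small round trip instead of a full transfer.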
Re: Feed History -02
On 18 Jul 2005, at 23:21, Mark Nottingham wrote: On 18/07/2005, at 2:17 PM, Stefan Eissing wrote: On a more semantic issue: The described sync algorithm will work. In most scenarios the abort condition (e.g. all items on a historical feed are known) will also do the job. However this still means that clients need to check the first fh:prev document if they know all entries there - if my understanding is correct. This is one of the unanswered questions that I left out of scope. The consumer can examine the previous archive's URI and decide as to whether it's seen it or not before, and therefore avoid fetching it if it already has seen it. However, in this approach, it won't see changes that are made in the archive (e.g., if a revision -- even a spelling correction -- is made to an old entry); to do that it either has to walk back the *entire* archive each time, or the feed has to publish all changes -- even to old entries -- at the head of the feed. Clearly the archive feed will work best if archive documents, once completed (containing a given number of entries) never change. Readers of the archive will have a simple way to know when to stop reading: there should never be a need to re-read an archive page - they just never change. The archive provides a history of the feed's evolution. Earlier changes to the resources described by the feed will be found in older archive documents and newer changes in the later ones. One should expect some entries to be referenced in multiple archive feed documents. These will be entries that have been changed over time. Archives *should not* change. I think any librarian will agree with that. I left it out because it has more to do with questions about entry deleting and ordering than with recovering state. it's an arbitrary decision (I had language about this in the original Pace I made), but it seemed like a good trade-off between complexity and capability. Does that make sense, or am I way off-base? 
Is it worth thinking of something to spare clients and servers this lookup? Are the HTTP caching and If-* header mechanisms good enough to save network bandwidth? An alternate strategy would be to require that fh:prev documents never change once created. Then a client can terminate the sync once it sees a URI it already knows. And most clients would not do more lookups than they are doing now... I think this would be the correct strategy. Henry Story
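The "stop at the first known URI" strategy can be sketched in a few lines of Python. This is a sketch under stated assumptions, not anything from the Feed History draft: `fetch` is a hypothetical callable returning the entries of a document and the URI of its fh:prev link (or None), and archive documents are assumed never to change once published.

```python
def sync_archives(head_uri, fetch, seen):
    """Walk the fh:prev chain from the head, stopping at the first known archive.

    fetch(uri) is assumed to return (entries, prev_uri_or_None).
    seen is the persisted set of archive URIs already processed.
    """
    new_entries = []
    visited = set()  # guards against a chain that loops back on itself
    uri = head_uri
    while uri is not None and uri not in visited:
        if uri != head_uri and uri in seen:
            break  # immutable archive already processed: nothing older is new
        entries, prev_uri = fetch(uri)
        new_entries.extend(entries)
        visited.add(uri)
        if uri != head_uri:
            seen.add(uri)  # the head keeps changing, so it is never marked as seen
        uri = prev_uri
    return new_entries


# Tiny in-memory stand-in for a feed with two archive documents.
documents = {
    "feed": (["e5"], "archive2"),
    "archive2": (["e4", "e3"], "archive1"),
    "archive1": (["e2", "e1"], None),
}
fetched = []

def fetch(uri):
    fetched.append(uri)
    return documents[uri]

seen = set()
first = sync_archives("feed", fetch, seen)
second = sync_archives("feed", fetch, seen)
```

On the second run only the head document is fetched, so a client does no more lookups than it does today.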
Re: Feed History -02
On 19 Jul 2005, at 01:52, A. Pagaltzis wrote: * Mark Nottingham [EMAIL PROTECTED] [2005-07-18 23:30]: This is one of the unanswered questions that I left out of scope. The consumer can examine the previous archive's URI and decide as to whether it's seen it or not before, and therefore avoid fetching it if it already has seen it. However, in this approach, it won't see changes that are made in the archive (e.g., if a revision -- even a spelling correction -- is made to an old entry); to do that it either has to walk back the *entire* archive each time, or the feed has to publish all changes -- even to old entries -- at the head of the feed. These are the kinds of things my “hub archive feed” situation was supposed to address. Because the links are all in one place, the consumer only has to suck down one document in order to be informed of all archive feeds and being able to decide which ones he wants to re-/get. I wonder if what you are trying to describe here is not a different concept altogether from an archive feed. I guess that both are completely orthogonal concepts. Feeds tend to specialize in a number of resources they track. What would also be useful would be a document that described the resources tracked by a feed. This would be closer to a directory listing. It would help point to the current state of the resources tracked by the feed. So when one subscribed to a feed one could then quickly get a list of all the resources that the feed had responsibility for. As this could be quite large some form of navigation may be necessary. Perhaps this is the type of thing that the protocol group is working on. Henry Story Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
Re: Atom 1.0 xml:base/URI funnies
A. Pagaltzis wrote: It makes me wonder whether the person who wrote the example was unaware of the consequences of the same-document reference specifications in the URI RFC. Surely, the xml:base WG must have noticed this issue and discussed it? I wonder how many people are aware of it. I wonder if we managed to convince any readers on the atom list at all. Tim Bray hasn't responded yet, so I guess he is still in doubt. I found out about it through Mozilla Bug 241981. https://bugzilla.mozilla.org/show_bug.cgi?id=241981 Mozilla had implemented that the http content-location header sets the base href. But as Mozilla has no same-document reference support, it would navigate to the content-location document when you clicked on an internal link. The solution was to revert content-location support, and noting that it was broken-by-design. I found this hard to believe, and tried figuring out what was really supposed to happen. Notice that xml:base describes **a base URI**. The xml:base TR does not define what a base URI is. It’s RFC3986 (and originally, RFC2396) which does, while describing just what an URI is, to begin with. The same-document reference stanza in the URI RFCs is clear evidence that in the spirit of the spec, “base URI” means “the source URI of the content,” not “the prefix I wish to apply to relative references.” RFC2396 is not the original spec, HTML 2 is. Later versions of HTML screwed up. I wrote up the history of fragment identifiers here: http://w3future.com/weblog/2005/01/ At first, I thought the RFC-specified same-document reference stanza made no sense. But then I realized it is perfectly fine and absolutely desirable for the case where the “base URI embedded in content” applies to the entire document. It is the xml:base TR which is at odds with this; applying same-document reference behaviour to fragments of an aggregate document is non-sensical. The more I think about it, the more it seems like this interaction is Broken As Designed. 
xml:base should not have adopted the “base URI” term – basically, it appears that the very attribute name itself is a misnomer. I don't find applying same-document reference behaviour to fragments of an aggregate document non-sensical. If I XInclude a piece of XHTML that has same-document references in it, I still want them to be same-document references, and they should not link back to the original file. -- Sjoerd Visscher http://w3future.com/weblog/
Re: Feed History -02
On Monday, July 18, 2005, at 01:59 AM, Stefan Eissing wrote: Ch 3. fh:stateful seems to be only needed for a newborn stateful feed. As an alternative one could drop fh:stateful and define that an empty fh:prev (referring to itself) is the last document in a stateful feed. That would eliminate the cases of wrong mixes of fh:stateful and fh:prev. The problem is that an empty @href in fh:prev is subject to xml:base processing, and who knows what the current xml:base is going to be when you get to it. Is there a way to explicitly make xml:base undefined? If I'm not mistaken xml:base="" doesn't do it--it just adds nothing to the existing xml:base. If there is a way, you could say <link rel="fhprev" href="" xml:base="[whatever value sets it to undefined]" />, but otherwise, using an empty @href is probably overloading the wrong attribute. A different @rel value like fh:noprev (with an empty link, since it doesn't matter what it actually points to) might be a step up, but using any kind of link to indicate the lack of a link is a little odd.
Re: Feed History -02
On Tuesday, July 19, 2005, at 12:29 PM, Antone Roundy wrote: On Monday, July 18, 2005, at 01:59 AM, Stefan Eissing wrote: Ch 3. fh:stateful seems to be only needed for a newborn stateful feed. As an alternative one could drop fh:stateful and define that an empty fh:prev (referring to itself) is the last document in a stateful feed. That would eliminate the cases of wrong mixes of fh:stateful and fh:prev. The problem is that an empty @href in fh:prev is subject to xml:base processing, and who knows what the current xml:base is going to be when you get to it. Is there a way to explicitly make xml:base undefined? If I'm not mistaken xml:base="" doesn't do it--it just adds nothing to the existing xml:base. If there is a way, you could say <link rel="fhprev" href="" xml:base="[whatever value sets it to undefined]" />, but otherwise, using an empty @href is probably overloading the wrong attribute. A different @rel value like fh:noprev (with an empty link, since it doesn't matter what it actually points to) might be a step up, but using any kind of link to indicate the lack of a link is a little odd. Yikes, I should have caught up on the xml:base thread first! Looks like the jury's out, or at least hung, on this issue.
Re: Atom 1.0 xml:base/URI funnies
If anyone comes to a definitive conclusion on this, would they post to the list, or a website please. TIA -- Regards, Dave Pawson XSLT + Docbook FAQ http://www.dpawson.co.uk
Re: Atom 1.0 xml:base/URI funnies
* Sjoerd Visscher [EMAIL PROTECTED] [2005-07-19 12:35]: I don't find applying same-document reference behaviour to fragments of an aggregate document non-sensical. If I XInclude a piece of XHTML that has same-document references in it, I still want them to be same-document references, and they should not link back to the original file. It is and isn’t. I thought about it more, and found that there are cases where it is non-sensical and cases where it’s desirable, but I couldn’t verbalize the difference. Antone filled the gap in his reply below. I am not so negative about the xml:base TR anymore; both the TR as well as the RFC are to blame, to an extent, but it’s neither’s fault. * Antone Roundy [EMAIL PROTECTED] [2005-07-19 22:45]: That example says: the content of the root element can be found in the resource at http://example.org/today/, and the content of the olist tag can be found in the resource at http://example.org/hotpicks/. xml:base is quite apparently being used as “a prefix for calculating relative URIs” instead of “the source URI for the material found inside this tag.” As you can see above, I reached the opposite conclusion. I’m not sure if I didn’t explain myself well (likely), or you misunderstood my (very brief) explanation, but I don’t think we’re in disagreement. Everything you’ve said about your own example document is exactly in line with my thinking. The problem lies not in applying same-document reference behavior, but in copying EXCERPTS from source documents that have links to fragments that aren't part of the excerpt. The same-document reference behavior is desirable if both the link and the fragment it links to are copied into the destination document. Yes!! Exactly. Thank you for finding that distinction. It is the point I could feel and sense as I thought about this issue more, but couldn’t quite pin down. But there is no way to link to non-excerpted fragments.
The URI spec would have to say that if the fragment isn't found in the current document, you can fetch the base URI to see if it exists there (it could even say that you can only do this if the current base URI was embedded in the content). If the fragment doesn't exist at the base URI, it's a broken link. Indeed, that would be the correct fix. A hackish solution to the Tim's Feed Conundrum would be to set xml:base not to 'http://www.tbray.org/ongoing/', but to 'http://www.tbray.org/ongoing/foo', where foo doesn't actually exist, but is just used to ensure that relative references don't end up being identical to the base URI. Then, instead of <link href='' /> (which would be a same-document reference...I think I was wrong in the other thread), you could say <link href='./' />. That’s hackish, but almost correct. Now substitute foo in that base URI for ongoing.atom and you get the real base URI for the Atom document. Further, <link href="./" /> then produces a correct alternate link, and <link rel="self" href="" /> is then a correct self-link. That is exactly what I proposed a few messages up in this thread. :-) Although at the time, I hadn’t cleared my thinking enough, so proposed the wrong xml:base for individual atom:entry tags, which was corrected by Sjoerd. The other solution I can think of would be for the Atom spec to say that the same-document reference rule from the URI spec does not apply to the atom:link element. But that's kinda lame too--it would basically mean that Atom uses base URIs as prefixes for convenience, rather than to rectify the base URI of data taken from somewhere else, which seems to me to be their intent. Yes, I proposed the same. :-) Clearly, we are on the same page. And yes, it would be lame. Not an undue burden on implementors, as I also already argued, but conceptually it is lame indeed. Finally, I share the dismay you expressed at the beginning of your mail. This kind of sucks… Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
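For anyone who wants to check the arithmetic of that fix, here is a small Python sketch. It assumes `urllib.parse.urljoin` behaves as an RFC 3986 resolver for these inputs; the function name `resolve_links` and the dictionary keys are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_links(base):
    """Resolve the two relative hrefs discussed above against a given xml:base."""
    return {
        "alternate": urljoin(base, "./"),  # <link href="./" />
        "self": urljoin(base, ""),         # <link rel="self" href="" />
    }


# With the directory as base, both hrefs collapse onto the base URI itself,
# so the alternate link reads as a same-document reference.
print(resolve_links("http://www.tbray.org/ongoing/"))

# With the feed document as base, the two links come apart as intended.
print(resolve_links("http://www.tbray.org/ongoing/ongoing.atom"))
```

With the feed document's own URI as base, the empty href points at the feed and "./" points at the weblog, which matches the alternate and self links described in the message.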
Re: Atom 1.0 xml:base/URI funnies
* Graham [EMAIL PROTECTED] [2005-07-20 01:20]: While I agree this interpretation is potentially correct, it moves us pretty far away from the idea of a self-contained document with a singular embedded base URI, which is all that RFC2396 ever discusses. That is pretty much what I said; yes. The “base URI embedded in content” which the RFC describes is really one which has to apply to the entire document. Whether the idea same-document references still make sense when the document isn't a document but an XML element buried deep inside an actual document. It partially does – see Sjoerd’s and Antone’s replies and my reply to them. But the language in RFC3986 does not consider this use case, and the language in the xml:base TR does not address same-document references at all. So there are things possible in the scope of the xml:base TR, for whose behaviour it defers to the RFC, which only considers a small subset of the possible use cases. So we have a mismatched layering of specs, for a certain class of use cases… ugh. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
Notes on the latest draft.
I took some notes while reading the specification. Not all of them are good notes, and I was cranky while writing them. Still, they do raise some issues or slightly vague points about the spec from my viewpoint.

Section 1.2: http://www.w3.org/2005/Atom I guess consistency is not a requirement of the Atom spec. By convention, this should be all lowercase. Existing software for Atom 0.3 has to be recoded for Atom 1.0, so this change has no real cost. True, URI references shouldn't change; however, that only applies to stable resources. Atom is explicitly unstable until standardized.

Section 2: -- Any element defined by this specification MAY have an xml:base attribute. When xml:base is used in an Atom Document, it serves the function described in section 5.1.1 of RFC 3986, establishing the base URI (or IRI) for resolving any relative references found within the effective scope of the xml:base attribute. xml:base is a broken specification. At the simplest, it's just a lame attempt at abbreviating strings. However, it solves that problem in the worst possible manner. As the RDF serializations show, what is needed is a name/value pair similar to entities or XML namespaces. In fact, the general solution would combine all three. This is a case where, in an attempt to simplify problems by segregating them into domains (i.e. namespaces, URI abbreviations, SGML compatibility, and others), the solutions actually complicate things to the point of absurdity! Of course, it is too late to fix for Atom 1.0, XML 1.0, and others. :-(

Section 3.1.1.2: HTML has many entities predefined. If you use HTML content, are those entities allowed (after being escaped, of course)? That would make it really really hard to normalize to text or XML without doctype processing. I feel that HTML entities other than numeric references, &gt;, &lt;, &amp;, &apos;, and &quot; should be deprecated in HTML content.

Section 3.1.1.3: Atom should explicitly endorse XHTML over HTML as the preferred form of content. HTML is just really hard to process compared with XML. Also, how does Atom interact with HTML 5, XHTML 2, or future versions of the specs? I still say that version or doctype attributes should be allowed to resolve ambiguities and allow compatibility with future versions of those specs. Public/System identifiers can still be used to identify type without validating it - that is why RDF and namespaces work at all.

Section 3.2.2: -- The atom:uri element's content conveys an IRI associated with the person. Person constructs MAY contain an atom:uri element, but MUST NOT contain more than one. The content of atom:uri in a Person construct MUST be an IRI reference. There is no reason *not* to change this to atom:id. It is lazy and dangerous to have an element lie about the type of its content. Furthermore, the whole point of atom:uri is the same as atom:id - to identify the thing they refer to (author or entry) - and their content is likewise identical.

Section 3.2.3: -- The atom:email element's content conveys an e-mail address associated with the person. Person constructs MAY contain an atom:email element, but MUST NOT contain more than one. Its content MUST conform to the addr-spec production in RFC 2822. OTOH, is there a reason that atom:uri (which should be atom:identifier) and atom:email are not attributes of a person construct? It is easier to process with CSS, but not harder for other processes, if they are attributes. Especially if white space is not significant and should be normalized. Also, by making them attributes you get the XML processor to enforce the cardinality of these constructs.

Section 4.1.1: -- * atom:feed elements SHOULD contain one atom:link element with a rel attribute value of self. This is the preferred URI for retrieving Atom Feed Documents representing this Atom feed. There is a mistake. atom:link identifies things using an IRI, not a URI.

Section 4.1.3.3: Clarify this point: This section applies to atom:content when the src attribute is not present. If it is present, then the content of the external file must be valid for whatever MIME type you use (of course). But it could be valid HTML with a doctype or valid XHTML with a root html element.

Section 4.2.5: -- No way of defining the preferred media type of the icon as with links?

Section 4.2.6: -- How does this interact with xml:base? Are relative atom:ids allowed? How are they compared? Are entries information resources? If so, should something retrievable be placed at their URI?

Section 4.2.8: -- No way of defining the preferred media type of the logo (or icon) as with links?

Section 7: -- Fragment identifiers: As specified for application/xml in RFC 3023,