Re: Atom 1.0 xml:base/URI funnies

2005-07-19 Thread David Powell


Tuesday, July 19, 2005, 12:44:51 AM, A. Pagaltzis wrote:

 You misunderstood what I said. The point is that regardless of
 how the base URI is determined (whether it is embedded in content
 or otherwise), it *means* that the content it applies to was
 actually found at the base URI. It’s not simply any arbitrary old
 prefix defined for convenience.

Why does xml:base allow for relative base URIs and stacking then? If
xml:base can only describe the actual source URI of the document, then
these features don't make sense.

The example in the xml:base spec [1] uses a relative URI in the
<olist xml:base="/hotpicks/"> element, after defining an absolute URI in
<doc xml:base="http://example.org/today/"> at the top of the document.
If xml:base can only describe the source URI, then one of them must be
lying?

[1] http://www.w3.org/TR/xmlbase/#syntax

-- 
Dave




Re: Atom 1.0 xml:base/URI funnies

2005-07-19 Thread A. Pagaltzis

* David Powell [EMAIL PROTECTED] [2005-07-19 08:25]:
 Why does xml:base allow for relative base URIs and stacking
 then? If xml:base can only describe the actual source URI of
 the document, then these features don't make sense.

Indeed, they don’t.

 The example in the xml:base spec [1] uses a relative URI in the
 <olist xml:base="/hotpicks/"> element, after defining an
 absolute URI in <doc xml:base="http://example.org/today/"> at
 the top of the document.
 
 [1] http://www.w3.org/TR/xmlbase/#syntax

That example says: the content of the root element can be found
in the resource at http://example.org/today/, and the content
of the olist tag can be found in the resource at
http://example.org/hotpicks/. xml:base is quite apparently
being used as “a prefix for calculating relative URIs” instead of
“the source URI for the material found inside this tag.”
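
A minimal sketch of the scoping behaviour being discussed, assuming
Python's standard urllib.parse.urljoin as the RFC 3986 resolver. The
document is a cut-down, hypothetical stand-in for the spec example (the
href values are invented for illustration), not the example itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical, simplified document; only the two xml:base values
# come from the thread.
DOC = """<doc xml:base="http://example.org/today/">
  <olist xml:base="/hotpicks/">
    <item><link href="pick1.xml"/></item>
  </olist>
  <link href="new.xml"/>
</doc>"""

def effective_bases(elem, inherited=""):
    """Yield (element, base URI in scope), stacking nested xml:base values."""
    base = urljoin(inherited, elem.get(XML_BASE, ""))
    yield elem, base
    for child in elem:
        yield from effective_bases(child, base)

resolved = {}
for elem, base in effective_bases(ET.fromstring(DOC)):
    if elem.tag == "link":
        resolved[elem.get("href")] = urljoin(base, elem.get("href"))

for href, target in resolved.items():
    print(href, "->", target)
```

The relative xml:base on olist is itself resolved against the base
inherited from doc, which is the "stacking" in question.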

It makes me wonder whether the person who wrote the example was
unaware of the consequences of the same-document reference
specifications in the URI RFC. Surely, the xml:base WG must have
noticed this issue and discussed it?

 If xml:base can only describe the source URI, then one of them
 must be lying?

xml:base provides a mechanism to describe the base URI of any
part of an XML document. If you copy bits from another document
and don’t want to munge contained URI references, then you will
need to set an xml:base on the copied container element for these
copied bits.

Notice that xml:base describes **a base URI**. The xml:base TR
does not define what a base URI is. It’s RFC3986 (and originally,
RFC2396) which does, while describing just what an URI is, to
begin with. The same-document reference stanza in the URI RFCs is
clear evidence that in the spirit of the spec, “base URI” means
“the source URI of the content,” not “the prefix I wish to apply
to relative references.”

Now, xml:base appears to try to address the situation where an
aggregate document may contain fragments from many sources, and
each of which thus has its own base URI. But the devilish detail
is that RFC-specified behaviour means that if a useragent were to
find a link to http://example.org/today/ somewhere inside the
example document except inside the olist tag, or a link to
http://example.org/hotpicks/ inside the olist tag, it may not
retrieve that URL – instead it would have to consider the XML
document itself to be the document found at the respective URL.

This is what RFC3986 says.
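
As a rough illustration of that rule (a sketch only, assuming plain
string comparison after resolution and no URI normalization; the helper
name is made up here):

```python
from urllib.parse import urljoin, urldefrag

def is_same_document(base_uri, reference):
    """True if the reference, once resolved, equals the base URI
    aside from any fragment (the RFC 3986 same-document case)."""
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url

# Inside the olist scope the base URI is http://example.org/hotpicks/,
# so a link to that URI would not be dereferenced.
print(is_same_document("http://example.org/hotpicks/",
                       "http://example.org/hotpicks/"))
# Outside that scope the same link is an ordinary retrieval.
print(is_same_document("http://example.org/today/",
                       "http://example.org/hotpicks/"))
```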

The xml:base TR contains no language that would contradict, or
even enforce, or in fact at all address this point. We therefore
have to go by the behaviour specified in RFC3986 when we
determine how a user agent resolves URIs.

At first, I thought the RFC-specified same-document reference
stanza made no sense. But then I realized it is perfectly fine
and absolutely desirable for the case where the “base URI
embedded in content” applies to the entire document.

It is the xml:base TR which is at odds with this; applying
same-document reference behaviour to fragments of an aggregate
document is non-sensical. The more I think about it, the more it
seems like this interaction is Broken As Designed. xml:base
should not have adopted the “base URI” term – basically, it
appears that the very attribute name itself is a misnomer.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: Feed History -02

2005-07-19 Thread Stefan Eissing



Am 18.07.2005 um 23:21 schrieb Mark Nottingham:

On 18/07/2005, at 2:17 PM, Stefan Eissing wrote:


On a more semantic issue:

The described sync algorithm will work. In most scenarios the abort 
condition (e.g. all items on a historical feed are known) will also 
do the job. However this still means that clients need to check the 
first fh:prev document if they know all entries there - if my 
understanding is correct.


This is one of the unanswered questions that I left out of scope. The 
consumer can examine the previous archive's URI and decide as to 
whether it's seen it or not before, and therefore avoid fetching it if 
it already has seen it. However, in this approach, it won't see 
changes that are made in the archive (e.g., if a revision -- even a 
spelling correction -- is made to an old entry); to do that it either 
has to walk back the *entire* archive each time, or the feed has to 
publish all changes -- even to old entries -- at the head of the feed.


I left it out because it has more to do with questions about entry 
deleting and ordering than with recovering state. It's an arbitrary 
decision (I had language about this in the original Pace I made), but 
it seemed like a good trade-off between complexity and capability.


It is a valid starting point. I am just wondering what consequences it
has on client implementations. Let's say CNN goes stateful: how would a
client handle a history which soon consists of thousands of entries?
How would a server best offer such a history to avoid clients
retrieving it over and over again? Probably nobody has a good idea on
that one, right?


I have the feeling that clients will need to protect themselves from
servers with almost infinite histories. So a client will probably offer
a "XX days into the past, max NN entries" setting in its UI. Maybe that
is all that's needed.


How about:
In case feeds are served via HTTP, server implementations SHOULD offer
ETag and Last-Modified headers on history documents (see RFC 2616 xxx).
Clients SHOULD persist ETag and Last-Modified information and use If-*
headers to ease server load on history synchronization.
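
A small sketch of the client side of this, assuming the client has
stored the validators from an earlier response. The function name and
the sample values are hypothetical; only the header names come from
HTTP.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build If-* request headers from validators saved earlier."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

# Hypothetical validators persisted after the previous fetch.
saved = {"etag": '"abc123"',
         "last_modified": "Mon, 18 Jul 2005 21:17:00 GMT"}

print(conditional_headers(saved["etag"], saved["last_modified"]))
```

A 304 Not Modified answer to such a request tells the client its stored
copy of the history document is still current, so no body is sent.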


//Stefan



Re: Feed History -02

2005-07-19 Thread Henry Story



On 18 Jul 2005, at 23:21, Mark Nottingham wrote:

On 18/07/2005, at 2:17 PM, Stefan Eissing wrote:


On a more semantic issue:

The described sync algorithm will work. In most scenarios the  
abort condition (e.g. all items on a historical feed are known)  
will also do the job. However this still means that clients need  
to check the first fh:prev document if they know all entries there  
- if my understanding is correct.




This is one of the unanswered questions that I left out of scope.  
The consumer can examine the previous archive's URI and decide as  
to whether it's seen it or not before, and therefore avoid fetching  
it if it already has seen it. However, in this approach, it won't  
see changes that are made in the archive (e.g., if a revision --  
even a spelling correction -- is made to an old entry); to do that  
it either has to walk back the *entire* archive each time, or the  
feed has to publish all changes -- even to old entries -- at the  
head of the feed.


Clearly the archive feed will work best if archive documents, once
completed (containing a given number of entries) never change. Readers
of the archive will have a simple way to know when to stop reading:
there should never be a need to re-read an archive page - they just
never change.


The archive provides a history of the feed's evolution. Earlier
changes to the resources described by the feed will be found in older
archive documents and newer changes in the later ones. One should
expect some entries to be referenced in multiple archive feed
documents. These will be entries that have been changed over time.

Archives *should not* change. I think any librarian will agree with  
that.



I left it out because it has more to do with questions about entry  
deleting and ordering than with recovering state. It's an arbitrary  
decision (I had language about this in the original Pace I made),  
but it seemed like a good trade-off between complexity and capability.


Does that make sense, or am I way off-base?

Is it worth thinking of something to spare clients and servers
this lookup? Are the HTTP caching and If-* header mechanisms good
enough to save network bandwidth?


An alternate strategy would be to require that fh:prev documents
never change once created. Then a client can terminate the sync
once it sees a URI it already knows. And most clients would not do
more lookups than they are doing now...




I think this would be the correct strategy.
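
A sketch of that termination rule, assuming archive documents are
immutable once published. The in-memory FEED table, the URIs, and the
fetch callable are all hypothetical stand-ins for real HTTP retrieval
and fh:prev parsing.

```python
def sync(head_uri, fetch, seen_archives):
    """Fetch the head, then follow prev links until a known archive."""
    entries, prev = fetch(head_uri)      # the head changes, so always fetch it
    collected = list(entries)
    fetched = [head_uri]
    while prev is not None and prev not in seen_archives:
        entries, next_prev = fetch(prev)
        collected.extend(entries)
        seen_archives.add(prev)          # never needs re-reading afterwards
        fetched.append(prev)
        prev = next_prev
    return collected, fetched

# Hypothetical feed: URI -> (entries, URI of previous archive or None).
FEED = {
    "feed.atom":     (["e5"], "archive2.atom"),
    "archive2.atom": (["e4", "e3"], "archive1.atom"),
    "archive1.atom": (["e2", "e1"], None),
}

seen = {"archive1.atom"}                 # already read on an earlier sync
collected, fetched = sync("feed.atom", FEED.__getitem__, seen)
print(collected, fetched)
```

With this rule a client that is up to date fetches only the head
document, which is no more than it does today.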


Henry Story



Re: Feed History -02

2005-07-19 Thread Henry Story



On 19 Jul 2005, at 01:52, A. Pagaltzis wrote:


* Mark Nottingham [EMAIL PROTECTED] [2005-07-18 23:30]:


This is one of the unanswered questions that I left out of
scope. The  consumer can examine the previous archive's URI and
decide as to  whether it's seen it or not before, and therefore
avoid fetching it  if it already has seen it. However, in this
approach, it won't see  changes that are made in the archive
(e.g., if a revision -- even a  spelling correction -- is made
to an old entry); to do that it either  has to walk back the
*entire* archive each time, or the feed has to  publish all
changes -- even to old entries -- at the head of the feed.



These are the kinds of things my “hub archive feed” situation was
supposed to address. Because the links are all in one place, the
consumer only has to suck down one document in order to be
informed of all archive feeds and being able to decide which ones
he wants to re-/get.


I wonder if what you are trying to describe here is not a different
concept altogether from an archive feed. I guess that both are
completely orthogonal concepts.

Feeds tend to specialize in a number of resources they track. What would
also be useful would be a document that described the resources tracked
by a feed. This would be closer to a directory listing. It would help
point to the current state of the resources tracked by the feed.

So when one subscribed to a feed one could then quickly get a list of
all the resources that the feed had responsibility for. As this could be
quite large some form of navigation may be necessary. Perhaps this is
the type of thing that the protocol group is working on.

Henry Story



Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/







Re: Atom 1.0 xml:base/URI funnies

2005-07-19 Thread Sjoerd Visscher


A. Pagaltzis wrote:
It makes me wonder whether the person who wrote the example was 
unaware of the consequences of the same-document reference 
specifications in the URI RFC. Surely, the xml:base WG must have 
noticed this issue and discussed it?


I wonder how many people are aware of it. I wonder if we managed to
convince any readers on the atom list at all. Tim Bray hasn't responded
yet, so I guess he is still in doubt.

I found out about it through Mozilla Bug 241981.
https://bugzilla.mozilla.org/show_bug.cgi?id=241981
Mozilla had implemented that the HTTP Content-Location header sets the
base href. But as Mozilla has no same-document reference support, it
would navigate to the Content-Location document when you clicked on an
internal link. The solution was to revert Content-Location support and
to note that it was broken-by-design. I found this hard to believe, and
tried figuring out what was really supposed to happen.


Notice that xml:base describes **a base URI**. The xml:base TR does
not define what a base URI is. It’s RFC3986 (and originally, RFC2396)
which does, while describing just what an URI is, to begin with. The
same-document reference stanza in the URI RFCs is clear evidence that
in the spirit of the spec, “base URI” means “the source URI of the
content,” not “the prefix I wish to apply to relative references.”


RFC2396 is not the original spec, HTML 2 is. Later versions of HTML 
screwed up. I wrote up the history of fragment identifiers here:

http://w3future.com/weblog/2005/01/


At first, I thought the RFC-specified same-document reference stanza
made no sense. But then I realized it is perfectly fine and
absolutely desirable for the case where the “base URI embedded in
content” applies to the entire document.

It is the xml:base TR which is at odds with this; applying
same-document reference behaviour to fragments of an aggregate
document is non-sensical. The more I think about it, the more it
seems like this interaction is Broken As Designed. xml:base should
not have adopted the “base URI” term – basically, it appears that the
very attribute name itself is a misnomer.


I don't find applying same-document reference behaviour to fragments of
an aggregate document non-sensical. If I XInclude a piece of XHTML that
has same-document references in it, I still want them to be
same-document references, and they should not link back to the original
file.

--
Sjoerd Visscher
http://w3future.com/weblog/



Re: Feed History -02

2005-07-19 Thread Antone Roundy


On Monday, July 18, 2005, at 01:59  AM, Stefan Eissing wrote:
Ch 3. fh:stateful seems to be only needed for a newborn stateful feed. 
As an alternative one could drop fh:stateful and define that an empty 
fh:prev (referring to itself) is the last document in a stateful feed. 
That would eliminate the cases of wrong mixes of fh:stateful and 
fh:prev.


The problem is that an empty @href in fh:prev is subject to xml:base
processing, and who knows what the current xml:base is going to be when
you get to it.  Is there a way to explicitly make xml:base undefined?
If I'm not mistaken, xml:base="" doesn't do it--it just adds nothing to
the existing xml:base.  If there is a way, you could say <link
rel="fhprev" href="" xml:base="[whatever value sets it to
undefined]" />, but otherwise, using an empty @href is probably
overloading the wrong attribute.  A different @rel value like
"fh:noprev" (with an empty link, since it doesn't matter what it
actually points to) might be a step up, but using any kind of link to
indicate the lack of a link is a little odd.
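
To make the concern concrete, here is a sketch assuming
urllib.parse.urljoin as the resolver; the URIs are hypothetical.

```python
from urllib.parse import urljoin

outer_base = "http://example.org/archive/2005/"   # some xml:base already in scope

# An empty xml:base resolves to the inherited base: it resets nothing.
inner_base = urljoin(outer_base, "")

# An empty href then resolves to whatever base happens to be in scope,
# not necessarily to the document that contains the link.
empty_href_target = urljoin(inner_base, "")

print(inner_base)
print(empty_href_target)
```

So an empty @href means "this document" only when no xml:base in scope
says otherwise, which is why it is a fragile end-of-history marker.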




Re: Feed History -02

2005-07-19 Thread Antone Roundy


On Tuesday, July 19, 2005, at 12:29  PM, Antone Roundy wrote:

On Monday, July 18, 2005, at 01:59  AM, Stefan Eissing wrote:
Ch 3. fh:stateful seems to be only needed for a newborn stateful
feed. As an alternative one could drop fh:stateful and define that an
empty fh:prev (referring to itself) is the last document in a stateful
feed. That would eliminate the cases of wrong mixes of fh:stateful
and fh:prev.


The problem is that an empty @href in fh:prev is subject to xml:base
processing, and who knows what the current xml:base is going to be
when you get to it.  Is there a way to explicitly make xml:base
undefined?  If I'm not mistaken, xml:base="" doesn't do it--it just
adds nothing to the existing xml:base.  If there is a way, you could
say <link rel="fhprev" href="" xml:base="[whatever value sets it to
undefined]" />, but otherwise, using an empty @href is probably
overloading the wrong attribute.  A different @rel value like
"fh:noprev" (with an empty link, since it doesn't matter what it
actually points to) might be a step up, but using any kind of link to
indicate the lack of a link is a little odd.


Yikes, I should have caught up on the xml:base thread first!  Looks 
like the jury's out, or at least hung, on this issue.




Re: Atom 1.0 xml:base/URI funnies

2005-07-19 Thread Dave Pawson

If anyone comes to a definitive conclusion on this,
would they post to the list, or a website please.

TIA



-- 
Regards, 

Dave Pawson
XSLT + Docbook FAQ
http://www.dpawson.co.uk



Re: Atom 1.0 xml:base/URI funnies

2005-07-19 Thread A. Pagaltzis

* Sjoerd Visscher [EMAIL PROTECTED] [2005-07-19 12:35]:
 I don't find applying same-document reference behaviour to
 fragments of an aggregate document non-sensical. If I XInclude
 a piece of XHTML that has same-document references in it, I
 still want them to be same-document references, and they should
 not link back to the original file.

It is and isn’t. I thought about it more, and found that there
are cases where it is non-sensical and cases where it’s
desirable, but I couldn’t verbalize the difference. Antone filled
the gap in his reply below.

I am not so negative about the xml:base TR anymore; both the TR
as well as the RFC are to blame, to an extent, but it’s neither’s
fault.

* Antone Roundy [EMAIL PROTECTED] [2005-07-19 22:45]:
 That example says: the content of the root element can be
 found in the resource at http://example.org/today/, and the
 content of the olist tag can be found in the resource at
 http://example.org/hotpicks/. xml:base is quite apparently
 being used as “a prefix for calculating relative URIs” instead
 of “the source URI for the material found inside this tag.”
 As you can see above, I reached the opposite conclusion.

I’m not sure if I didn’t explain myself well (likely), or you
misunderstood my (very brief) explanation, but I don’t think
we’re in disagreement. Everything you’ve said about your own
example document is exactly in line with my thinking.

 The problem lies not in applying same-document reference
 behavior, but in copying EXCERPTS from source documents that
 have links to fragments that aren't part of the excerpt.  The
 same-document reference behavior is desirable if both the link
 and the fragment it links to are copied into the destination
 document.

Yes!! Exactly. Thank you for finding that distinction. It is the
point I could feel and sense as I thought about this issue more,
but couldn’t quite pin down.

 But there is no way to link to non-excerpted fragments.  The
 URI spec would have to say that if the fragment isn't found in
 the current document, you can fetch the base URI to see if it
 exists there (it could even say that you can only do this if
 the current base URI was embedded in the content). If the
 fragment doesn't exist at the base URI, it's a broken link.

Indeed, that would be the correct fix.
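
A sketch of the fallback Antone describes. This is not behaviour the
URI spec defines; the function name, the id sets, and the fetch_ids
callable are assumptions made for illustration.

```python
def locate_fragment(fragment, local_ids, base_uri, fetch_ids):
    """Look for a fragment locally first, then at the base URI."""
    if fragment in local_ids:
        return "local"            # link and target were both copied over
    if fragment in fetch_ids(base_uri):
        return "remote"           # target was not excerpted; fetch the source
    return "broken"               # exists in neither place

# Hypothetical source document and the ids it defines.
SOURCE_IDS = {"http://example.org/source": {"intro", "details"}}
fetch = lambda uri: SOURCE_IDS.get(uri, set())

excerpt_ids = {"intro"}           # only part of the source was copied
for frag in ("intro", "details", "missing"):
    print(frag, locate_fragment(frag, excerpt_ids,
                                "http://example.org/source", fetch))
```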

 A hackish solution to the Tim's Feed Conundrum would be to
 set xml:base not to 'http://www.tbray.org/ongoing/', but to
 'http://www.tbray.org/ongoing/foo', where foo doesn't
 actually exist, but is just used to ensure that relative
 references don't end up being identical to the base URI.  Then,
 instead of <link href='' /> (which would be a same-document
 reference...I think I was wrong in the other thread), you could
 say <link href='./' />.

That’s hackish, but almost correct. Now substitute ongoing.atom for
foo in that base URI and you get the real base URI for the Atom
document. Further, <link href="./" /> then produces a correct
alternate link, and <link rel="self" href="" /> is then a correct
self-link.
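
The arithmetic behind this can be checked with a short sketch, assuming
urllib.parse.urljoin as the resolver and taking ongoing.atom as the
feed's file name as proposed above.

```python
from urllib.parse import urljoin

feed_base = "http://www.tbray.org/ongoing/ongoing.atom"

alternate = urljoin(feed_base, "./")   # directory the feed lives in
self_link = urljoin(feed_base, "")     # empty reference: the feed itself

print(alternate)
print(self_link)

# With the directory as base, "./" and "" both collapse onto the base
# URI, so the alternate link becomes a same-document reference.
dir_base = "http://www.tbray.org/ongoing/"
print(urljoin(dir_base, "./") == dir_base)
```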

That is exactly what I proposed a few messages up in this thread.
:-)

Although at the time, I hadn’t cleared my thinking enough, so
proposed the wrong xml:base for individual atom:entry tags, which
was corrected by Sjoerd.

 The other solution I can think of would be for the Atom spec to
 say that the same-document reference rule from the URI spec
 does not apply to the atom:link element.  But that's kinda lame
 too--it would basically mean that Atom uses base URIs as
 prefixes for convenience, rather than to rectify the base URI
 of data taken from somewhere else, which seems to me to be
 their intent.

Yes, I proposed the same. :-)  Clearly, we are on the same page.

And yes, it would be lame. Not an undue burden on implementors,
as I also already argued, but conceptually it is lame indeed.

Finally, I share the dismay you expressed at the beginning of
your mail. This kind of sucks…

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: Atom 1.0 xml:base/URI funnies

2005-07-19 Thread A. Pagaltzis

* Graham [EMAIL PROTECTED] [2005-07-20 01:20]:
 While I agree this interpretation is potentially correct, it
 moves us  pretty far away from the idea of a self-contained
 document with a  singular embedded base URI, which is all that
 RFC2396 ever discusses.

That is pretty much what I said; yes. The “base URI embedded in
content” which the RFC describes is really one which has to apply
to the entire document.

 Whether the idea of same-document references still makes sense
 when the document isn't a document but an XML element buried
 deep inside an actual document.

It partially does – see Sjoerd’s and Antone’s replies and my
reply to them.

But the language in RFC3986 does not consider this use case, and
the language in the xml:base TR does not address same-document
references at all. So there are things possible in the scope of
the xml:base TR, for whose behaviour it defers to the RFC, which
only considers a small subset of the possible use cases. So we
have a mismatched layering of specs, for a certain class of use
cases… ugh.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Notes on the latest draft.

2005-07-19 Thread James Cerra

I took some notes while reading the specification.  Not all of them are good 
notes, and I was cranky while writing them.  Still, they do have some issues 
or slightly vague points about the spec from my view point.

Section 1.2:

 http://www.w3.org/2005/Atom

I guess consistency is not a requirement of the Atom spec.  By convention, 
this should be all lowercase.  Existing software for Atom 0.3 has to be 
recoded for Atom 1.0, so this change has no real cost.  True, URI references 
shouldn't change; however, that only applies to stable resources.  Atom is 
explicitly unstable until standardized.

Section 2:
--
 Any element defined by this specification MAY have an xml:base attribute.  
 When xml:base is used in an Atom Document, it serves the function described 
 in section 5.1.1 of RFC 3986, establishing the base URI (or IRI) for 
 resolving any relative references found within the effective scope of the 
 xml:base attribute.

xml:base is a broken specification.  At the simplest, it's just a lame attempt 
at abbreviating strings.  However, it solves that problem in the worst 
possible manner.  As the RDF serializations show, what is needed is a 
name/value pair similar to entities or xml namespaces.  In fact, the general 
solution would combine all three.  This is a case where, in an attempt to 
simplify problems by segregating them into domains (i.e. namespaces, URI 
abbreviations, SGML compatibility, and others), the solutions actually 
complicate things to the point of absurdity!  Of course, it is too late to fix 
for Atom 1.0, XML 1.0, and others.  :-(

Section 3.1.1.2:

HTML has many entities predefined.  If you use HTML content, are those
entities allowed (after being escaped, of course)?  That would make it really
really hard to normalize to text or XML without doctype processing.  I feel
that HTML entities other than numeric references, &amp;gt;, &amp;lt;,
&amp;amp;, &amp;apos;, and &amp;quot; should be deprecated in HTML
content.

Section 3.1.1.3:

Atom should explicitly endorse XHTML over HTML as the preferred form of
content.  HTML is just really hard to process compared with XML.

Also, how does Atom interact with HTML 5, XHTML 2, or future versions of 
the specs?  I still say that version or doctype attributes should be allowed 
to resolve ambiguities and allow compatibility with future versions of those 
specs.  Public/System identifiers can still be used to identify type without 
validating it - that is why RDF and namespaces work at all.

Section 3.2.2:
--
 The atom:uri element's content conveys an IRI associated with the person.  
 Person constructs MAY contain an atom:uri element, but MUST NOT contain more 
 than one.  The content of atom:uri in a Person construct MUST be an IRI 
 reference.

There is no reason *not* to change this to atom:id.  It is lazy and 
dangerous to have an element lie about the type of its content.  Furthermore, 
the whole point of atom:uri is the same as atom:id - to identify the thing 
they refer to (author or entry) - and their content is likewise identical.

Section 3.2.3:
--
 The atom:email element's content conveys an e-mail address associated with 
 the person.  Person constructs MAY contain an atom:email element, but MUST 
 NOT contain more than one.  Its content MUST conform to the addr-spec 
 production in RFC 2822.

OTOH, is there a reason that atom:uri (which should be atom:identifier) and 
atom:email are not attributes of a person construct?  It is easier to process 
with CSS, but not harder for other processes, if they are attributes.  
Especially if white space is not significant and should be normalized.  Also, 
by making them attributes you get the XML processor to enforce the 
cardinality of these constructs.
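
The cardinality point can be seen with any conforming XML parser. A
sketch using Python's xml.etree.ElementTree; the attribute-based author
element is hypothetical syntax, not something the Atom draft defines.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Two email child elements: well-formed, so only an Atom-aware
# validator can catch the "MUST NOT contain more than one" violation.
as_elements = ("<author><name>A</name>"
               "<email>a@example.org</email>"
               "<email>b@example.org</email></author>")

# Two email attributes: rejected by the parser itself, because
# duplicate attributes are a well-formedness error.
as_attributes = ('<author name="A" email="a@example.org" '
                 'email="b@example.org"/>')

print(parses(as_elements), parses(as_attributes))
```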

Section 4.1.1:
--
 * atom:feed elements SHOULD contain one atom:link element with a rel 
   attribute value of self.  This is the preferred URI for retrieving Atom 
   Feed Documents representing this Atom feed.

There is a mistake.  atom:link identifies things using an IRI, not a URI.

Section 4.1.3.3:

Clarify this point:  This section applies to atom:content when the src
attribute is not present.  If it is present, then the content of the external
file must be valid for whatever MIME type you use (of course).  But
it could be valid HTML with a doctype or valid XHTML with a root html
element.

Section 4.2.5:
--
No way of defining the preferred media type of the icon as with links?

Section 4.2.6:
--
How does this interact with xml:base?  Are relative atom:ids allowed?  How are 
they compared?  Are entries information resources?  If so, should something 
retrievable be placed at their URI?

Section 4.2.8:
--
No way of defining the preferred media type of the logo (or icon) as with links?

Section 7:
--
 Fragment identifiers: As specified for application/xml in RFC 3023,