Re: base within HTML content
On Tue, 02 Jan 2007 01:21:09 +0100, Henri Sivonen [EMAIL PROTECTED] wrote:
>> I suppose you could raise this on the WHATWG list, asking what happens if you set innerHTML of a div where the value being set has both a `<base>` and an `<a>`, for instance.
> Interesting. I hadn't thought that Atom was supposed to use innerHTML parsing. I'd have said that you prepend `<!DOCTYPE html><title></title><div>` to what travels in the feed and append `</div>` to it, parse the resulting string and grab the first div in the document order.

That could work as well. In that case base would most certainly apply. But nothing like that is defined...

--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/
Re: base within HTML content
* Henri Sivonen [EMAIL PROTECTED] [2007-01-02 01:35]:
> I hadn't thought that Atom was supposed to use innerHTML parsing. I'd have said that you prepend `<!DOCTYPE html><title></title><div>` to what travels in the feed and append `</div>` to it, parse the resulting string and grab the first div in the document order.

That will lead to silent data loss if the content is malformed such that it contains an extraneous `</div>`.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: base within HTML content
On Tue, 02 Jan 2007 02:12:14 +0100, James Holderness [EMAIL PROTECTED] wrote:
> Well, that's not really what I've learnt. I've learnt that there are a lot of broken feeds out there (Atom as well as RSS) and that users are less than impressed when you tell them it's not your fault and they should complain to someone else.

So if a feed is non-well-formed you should just parse it as well using some tag soup parser for XML? I don't necessarily disagree with that, but I'd like error handling to be defined in great detail. Everyone just doing what is best for their users will lead you to where HTML is now (at best).

--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/
Re: base within HTML content
On Tue, 02 Jan 2007 11:40:53 +0100, A. Pagaltzis [EMAIL PROTECTED] wrote:
> * Henri Sivonen [EMAIL PROTECTED] [2007-01-02 01:35]:
>> I hadn't thought that Atom was supposed to use innerHTML parsing. I'd have said that you prepend `<!DOCTYPE html><title></title><div>` to what travels in the feed and append `</div>` to it, parse the resulting string and grab the first div in the document order.
> That will lead to silent data loss if the content is malformed such that it contains an extraneous `</div>`.

Yeah, it's probably better to take the first and only body element.

--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/
Re: base within HTML content
On Jan 2, 2007, at 12:40, A. Pagaltzis wrote:
> * Henri Sivonen [EMAIL PROTECTED] [2007-01-02 01:35]:
>> I hadn't thought that Atom was supposed to use innerHTML parsing. I'd have said that you prepend `<!DOCTYPE html><title></title><div>` to what travels in the feed and append `</div>` to it, parse the resulting string and grab the first div in the document order.
> That will lead to silent data loss if the content is malformed such that it contains an extraneous `</div>`.

Good point. Prepending `<!DOCTYPE html><title></title>` and grabbing the contents of body would work better.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
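To make the difference between the two strategies concrete, here is a rough Python sketch. It is only a model: the standard library's `html.parser` is a tokenizer, not an HTML5 tree builder, so the hypothetical `extract_text` helper below merely tracks `<div>` nesting and collects text. In a real HTML5 parser the stray `</div>` likewise closes the wrapper div and what follows lands in body as a sibling, so "first div" drops it while "contents of body" keeps it.

```python
from html.parser import HTMLParser

# Prefix suggested in the thread; the fragment from the feed goes after it.
PREFIX = "<!DOCTYPE html><title></title>"


class _TextCollector(HTMLParser):
    """Collects text from a wrapped feed fragment (hypothetical helper).

    mode="div":  keep only text inside the first <div>, pairing <div>/</div> by depth.
    mode="body": keep all text after </title>; a stray </div> is simply ignored.
    """

    def __init__(self, mode):
        super().__init__()
        self.mode = mode
        self.after_title = False
        self.depth = 0               # current <div> nesting depth
        self.wrapper_closed = False  # set once the first <div> has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and not self.wrapper_closed:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.after_title = True
        elif tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.wrapper_closed = True

    def handle_data(self, data):
        if self.mode == "div":
            keep = self.depth > 0 and not self.wrapper_closed
        else:
            keep = self.after_title
        if keep:
            self.parts.append(data)


def extract_text(fragment, mode):
    """Wrap `fragment` as proposed for `mode` and return the text that survives."""
    if mode == "div":
        markup = PREFIX + "<div>" + fragment + "</div>"
    else:
        markup = PREFIX + fragment
    collector = _TextCollector(mode)
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


malformed = "<p>one</p></div><p>two</p>"
print(extract_text(malformed, "div"))   # one
print(extract_text(malformed, "body"))  # onetwo
```

With well-formed content both strategies agree; only the extraneous `</div>` separates them.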
Re: base within HTML content
Anne van Kesteren wrote:
> So if a feed is non-well-formed you should just parse it as well using some tag soup parser for XML?

Well, that's what I do. The Google Reader blog post I quoted estimated that about seven percent of feeds contained XML errors of some kind. That's a lot of feeds for me to ignore. I'm sure other aggregator authors will choose otherwise. It just depends on their needs and the needs of their users.

> I'd like error handling to be defined in great detail. Everyone just doing what is best for their users will lead you to where HTML is now (at best).

To be honest, I don't care. I'm not trying to make policy here. I was just offering my advice to one particular person in response to one particular query.

Regards
James
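The "strict XML first, forgiving parser only when that fails" approach can be sketched in Python as follows. This is a deliberately crude illustration, not how any particular aggregator works: the fallback is the standard library's lenient `html.parser` standing in for a real tag soup parser, and the hypothetical `feed_titles` helper only pulls out `<title>` text. The flag in the return value reports whether the fallback was used, so a consumer can at least tell the two cases apart.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ATOM = "{http://www.w3.org/2005/Atom}"


class _SoupTitles(HTMLParser):
    """Lenient fallback: collects the text of every <title> element.
    HTMLParser never rejects input, so this runs even on non-well-formed XML."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data


def feed_titles(text):
    """Return (titles, recovered): strict XML parse first, tag-soup fallback second."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        soup = _SoupTitles()
        soup.feed(text)
        soup.close()
        return soup.titles, True
    return [t.text or "" for t in root.iter(ATOM + "title")], False


# A bare "&" makes this feed non-well-formed XML.
broken = ('<feed xmlns="http://www.w3.org/2005/Atom"><title>Feed</title>'
          '<entry><title>One & Two</title></entry></feed>')
print(feed_titles(broken))  # (['Feed', 'One & Two'], True)
```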
Re: base within HTML content
On Fri, 22 Dec 2006 18:38:33 +0100, Geoffrey Sneddon [EMAIL PROTECTED] wrote:
> If we come across something like: `<description type="html"><![CDATA[<base url="http://example.com/"><a href="test.html">Test Link</a>]]></description>`,

Yikes!

> I assume the link should point to http://example.com/test.html, due to the base element? I assume, likewise, that base would take precedence over xml:base, as it is directly within the content.

Like James Holderness wrote, the base element has no place in an HTML fragment, so its meaning is (although most browsers wrongly support its presence anywhere in an HTML document) unspecified. The correct base URI to use here is the closest xml:base in the ancestor vector or the document's base URI. What's the use case for not using xml:base here?

--
Asbjørn Ulsberg -=|=- http://virtuelvis.com/quark/
«He's a loathsome offensive brute, yet I can't look away»
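Asbjørn's rule (use the closest `xml:base` among the ancestors, else the document's base URI) can be sketched with Python's standard library. Each `xml:base` is itself resolved against its parent's base, so walking down from the root with `urljoin` gives every element its in-scope base. The helper name `effective_bases` and the sample feed are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the predeclared XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM = "{http://www.w3.org/2005/Atom}"


def effective_bases(root, document_uri):
    """Map every element to its in-scope base URI (hypothetical helper)."""
    bases = {}

    def walk(elem, inherited):
        own = elem.get(XML_BASE)
        # A relative xml:base is resolved against the parent's base.
        base = urljoin(inherited, own) if own is not None else inherited
        bases[elem] = base
        for child in elem:
            walk(child, base)

    walk(root, document_uri)
    return bases


FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.org/blog/">
  <entry xml:base="2007/01/">
    <content type="html">&lt;a href="test.html"&gt;Test Link&lt;/a&gt;</content>
  </entry>
</feed>"""

root = ET.fromstring(FEED)
content = root.find(f"{ATOM}entry/{ATOM}content")
base = effective_bases(root, "http://example.org/feed.atom")[content]
print(urljoin(base, "test.html"))  # http://example.org/blog/2007/01/test.html
```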
Re: base within HTML content
On 1 Jan 2007, at 16:59, Asbjørn Ulsberg wrote:
> Like James Holderness wrote,

Eek! I should keep up with emails better!

> the base element has no place in an HTML fragment, so its meaning is (although most browsers wrongly support its presence anywhere in an HTML document) unspecified.

Web Applications 1.0 (keeping with the real world) defines that it should be moved to HEAD within the DOM tree.

Why, may I ask, MUST (under the RFC 2119 definition) HTML content be a fragment? The spec says "HTML markup within SHOULD be such that it could validly appear directly within an HTML DIV element, after unescaping." Note the word SHOULD, not MUST, implying that you can have a full HTML document within.

> The correct base URI to use here is the closest xml:base in the ancestor vector or the document's base URI. What's the use case for not using xml:base here?

I don't know; this is just an example of a feed I came across a few weeks back.

- Geoffrey Sneddon
Re: base within HTML content
2007/1/1, Geoffrey Sneddon:
> On 1 Jan 2007, at 16:59, Asbjørn Ulsberg wrote:
>> the base element has no place in an HTML fragment, so its meaning is (although most browsers wrongly support its presence anywhere in an HTML document) unspecified.
> Web Applications 1.0 (keeping with the real world) defines that it should be moved to HEAD within the DOM tree.

I suppose HTML within Atom is rather processed as innerHTML, so there is no head pointer, and the base element is just appended as a child of the current node (along with a parse error!)

> Why, may I ask, MUST (under the RFC 2119 definition) HTML content be a fragment? The spec says "HTML markup within SHOULD be such that it could validly appear directly within an HTML DIV element, after unescaping." Note the word SHOULD, not MUST, implying that you can have a full HTML document within.

Yes, you could, in the sense that the Atom document wouldn't be invalid, but you shouldn't expect it to be processed as a full HTML document. The SHOULD implies that Atom processors are OK if they process HTML content as innerHTML on a div element.

--
Thomas Broyer
Re: base within HTML content
* Geoffrey Sneddon [EMAIL PROTECTED] [2007-01-01 19:00]:
> On 1 Jan 2007, at 16:59, Asbjørn Ulsberg wrote:
>> the base element has no place in an HTML fragment, so its meaning is (although most browsers wrongly support its presence anywhere in an HTML document) unspecified.
> Web Applications 1.0 (keeping with the real world) defines that it should be moved to HEAD within the DOM tree.

Thereby, of course, breaking the links in any other entries rendered in the same page by a web-based aggregator, for example.

> Why, may I ask, MUST (under the RFC 2119 definition) HTML content be a fragment? The spec says "HTML markup within SHOULD be such that it could validly appear directly within an HTML DIV element, after unescaping." Note the word SHOULD, not MUST, implying that you can have a full HTML document within.

Because many aggregators (most, very likely) do not render items in isolation, but rather in some sort of collection, either across feeds as a “river of news” or even just several within a single feed. (Weblog engines do that when showing the front page of the weblog or the archive for particular intervals.) They will usually strip any header-level information from your entry, so putting such elements in the content will usually fail to achieve what you wanted; hence the SHOULD.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: base within HTML content
On 1/1/07, Geoffrey Sneddon [EMAIL PROTECTED] wrote:
> Why, may I ask, MUST (under the RFC 2119 definition) HTML content be a fragment? The spec says "HTML markup within SHOULD be such that it could validly appear directly within an HTML DIV element, after unescaping." Note the word SHOULD, not MUST, implying that you can have a full HTML document within.

What would you do if you wanted to display a feed of 10 entries in newspaper style (i.e. all entries in a single HTML page) yet each of the entries had a different BASE defined? It wouldn't do you much good to move all the base elements to the HEAD of the DOM tree; you'd just end up with a mess. If you want a local base, then use xml:base. That's what it is for.

The same problem exists for other page-global stuff. For instance, XHTML modularization is useless if you're creating Atom entries, since that stuff relies on elements in HEAD, but an Atom entry ain't got no head.

Remember as well that not all of the entries in a feed document need be created by the same person. For instance, with aggregated or synthetic feeds, you end up with entries written by many different authors who have no chance of negotiating how they will divide the global resources that might be used to display their entries. Because some entries may be signed, you can't simply say something like "just rewrite the entries"; that would break the signatures.

It is good that Atom entries should be fragments. That greatly increases the variety of environments in which Atom entries are useful. If you feel constrained by this, I would suggest that you push on those who define HTML and get them to provide mechanisms for allowing fragment-local expression of things that at this time can only be expressed as page-global. (Yes, I realize this will take some time.)

bob wyman
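The "newspaper style" problem can be shown with a few lines of Python's `urllib.parse.urljoin`. The two entries and their base URIs are hypothetical; the point is only that hoisting one entry's base to a page-wide HEAD changes where the other entry's relative links resolve, while per-entry (xml:base-style) scoping keeps each link pointing where its author intended.

```python
from urllib.parse import urljoin

# Two hypothetical entries rendered on one "newspaper style" page. Each was
# written against its own base URI and uses the same relative link.
ENTRIES = [
    {"base": "http://first.example/posts/", "href": "photo.jpg"},
    {"base": "http://second.example/blog/2007/", "href": "photo.jpg"},
]


def resolve_per_entry(entries):
    """Resolve each link against its own entry's base (xml:base-style scoping)."""
    return [urljoin(e["base"], e["href"]) for e in entries]


def resolve_page_global(entries):
    """Resolve every link against a single page-wide base, as if the first
    entry's <base> had been hoisted into the page's HEAD."""
    page_base = entries[0]["base"]
    return [urljoin(page_base, e["href"]) for e in entries]


print(resolve_per_entry(ENTRIES))
print(resolve_page_global(ENTRIES))  # second entry's link now points into first.example
```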
Re: base within HTML content
On 1 Jan 2007, at 19:22, Bob Wyman wrote:
> If you want a local base, then use xml:base. That's what it is for.

When the spec says you SHOULD treat html content as if it were in a DIV, it leaves some uncertainty as to how such Atom feeds should be parsed. I'm asking merely to see if there's any consensus as to how it should be done. I have no control over the vast majority of feeds out there; telling me to use xml:base will make no difference, as I have no control over the feed in which I found a base.

- Geoffrey Sneddon
Re: base within HTML content
On Mon, 01 Jan 2007 21:22:33 +0100, Geoffrey Sneddon [EMAIL PROTECTED] wrote:
>> If you want a local base, then use xml:base. That's what it is for.
> When the spec says you SHOULD treat html content as if it were in a DIV, it leaves some uncertainty as to how such Atom feeds should be parsed. I'm asking merely to see if there's any consensus as to how it should be done. I have no control over the vast majority of feeds out there; telling me to use xml:base will make no difference, as I have no control over the feed in which I found a base.

Hmm, the same is true for a large number of other cases. Atom in general is ambiguous at best in terms of error handling. I suppose you could raise this on the WHATWG list, asking what happens if you set innerHTML of a div where the value being set has both a `<base>` and an `<a>`, for instance.

--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/
Re: base within HTML content
Geoffrey Sneddon wrote:
> When the spec says you SHOULD treat html content as if it were in a DIV, it leaves some uncertainty as to how such Atom feeds should be parsed. I'm asking merely to see if there's any consensus as to how it should be done. I have no control over the vast majority of feeds out there; telling me to use xml:base will make no difference, as I have no control over the feed in which I found a base.

Do you still have a copy of the feed you encountered that was using a base element? I'd be curious to see whether its links and images would fail to work if you didn't take that base into account. Because if that's the case, I'd recommend supporting it (i.e. the base element takes precedence over xml:base or however else the current base URI is determined). In other words, do whatever it takes to get that particular feed to work.

This obviously isn't a common scenario, and it's arguably not a valid feed, so whatever you do you can't be faulted. Unless you find more data suggesting this is a bad idea, it seems to me it would make sense to at least get your one known example to work. MHO.

Regards
James
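If a consumer does follow the suggestion to let an embedded base element override xml:base, one way to sketch it in Python is below. The `content_base` helper is hypothetical. HTML defines the target of `<base>` as its `href` attribute; the sketch also accepts `url` only because that is what Geoffrey's example feed uses, which is an assumption rather than anything HTML defines. A consumer that takes the opposite position in this thread would skip this step and use the xml:base-derived URI directly.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _BaseFinder(HTMLParser):
    """Records the target of the first <base> start tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing <base ... /> tags here by default.
        if tag == "base" and self.target is None:
            values = dict(attrs)
            # href is the HTML attribute; url is accepted only because the
            # example feed in this thread uses it (an assumption, not HTML).
            self.target = values.get("href") or values.get("url")


def content_base(html_fragment, xml_base):
    """Base URI for relative links in the fragment: an embedded <base> wins
    (resolved against xml_base); otherwise xml_base is used unchanged."""
    finder = _BaseFinder()
    finder.feed(html_fragment)
    finder.close()
    return urljoin(xml_base, finder.target) if finder.target else xml_base


fragment = '<base url="http://example.com/"><a href="test.html">Test Link</a>'
base = content_base(fragment, "http://example.org/feed/")
print(urljoin(base, "test.html"))  # http://example.com/test.html
```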
Re: base within HTML content
-1. If there's anything we can learn from the mess that is RSS, it's that at a certain point feed consumers should be allowed to say simply that a buggy feed is a buggy feed and that the responsibility falls on the feed publisher to get things right.

- James

James Holderness wrote:
> [snip]
> Do you still have a copy of the feed you encountered that was using a base element? I'd be curious to see whether its links and images would fail to work if you didn't take that base into account. Because if that's the case, I'd recommend supporting it (i.e. the base element takes precedence over xml:base or however else the current base URI is determined). In other words, do whatever it takes to get that particular feed to work.
> This obviously isn't a common scenario, and it's arguably not a valid feed, so whatever you do you can't be faulted. Unless you find more data suggesting this is a bad idea, it seems to me it would make sense to at least get your one known example to work. MHO.
> Regards
> James
Re: base within HTML content
On Jan 1, 2007, at 22:46, Anne van Kesteren wrote:
> I suppose you could raise this on the WHATWG list, asking what happens if you set innerHTML of a div where the value being set has both a `<base>` and an `<a>`, for instance.

Interesting. I hadn't thought that Atom was supposed to use innerHTML parsing. I'd have said that you prepend `<!DOCTYPE html><title></title><div>` to what travels in the feed and append `</div>` to it, parse the resulting string and grab the first div in the document order.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
Re: base within HTML content
James M Snell wrote:
> -1. If there's anything we can learn from the mess that is RSS, it's that at a certain point feed consumers should be allowed to say simply that a buggy feed is a buggy feed and that the responsibility falls on the feed publisher to get things right.

Well, that's not really what I've learnt. I've learnt that there are a lot of broken feeds out there (Atom as well as RSS) and that users are less than impressed when you tell them it's not your fault and they should complain to someone else. In the words of Mihai Parparita (Google Reader): "as anyone who has attempted to implement a feed parser knows, there are many subtle deviations from the spec that you have to handle if you want to have any hope of satisfying the needs of your users." [1]

I'm not saying that aggregators MUST support this particular buggy feed. I just got the impression that Geoffrey WANTED to support it. I think that's his choice to make.

Regards
James

[1] http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html
Re: base within HTML content
Geoffrey Sneddon wrote:
> If we come across something like: `<description type="html"><![CDATA[<base url="http://example.com/"><a href="test.html">Test Link</a>]]></description>`, I assume the link should point to http://example.com/test.html, due to the base element?

Not necessarily. The base element should only appear in the head section of an HTML document, so it's not actually valid in an HTML fragment like this. I suppose you could support it if you wanted to, but I don't think you'd be wrong to ignore it.

Regards
James