Re: base within HTML content

2007-01-02 Thread Anne van Kesteren


On Tue, 02 Jan 2007 01:21:09 +0100, Henri Sivonen [EMAIL PROTECTED] wrote:
>> I suppose you could raise this on the WHATWG list, asking what happens
>> if you set innerHTML of a <div> where the assigned value has both a
>> <base> and an <a>, for instance.


> Interesting. I hadn't thought that Atom was supposed to use innerHTML
> parsing. I'd have said that you prepend <!DOCTYPE html><title></title><div>
> to what travels in the feed and append </div> to it, parse the resulting
> string and grab the first <div> in document order.


That could work as well. In that case <base> would most certainly apply.
But nothing like that is defined...



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/



Re: base within HTML content

2007-01-02 Thread A. Pagaltzis

* Henri Sivonen [EMAIL PROTECTED] [2007-01-02 01:35]:
> I hadn't thought that Atom was supposed to use innerHTML
> parsing. I'd have said that you prepend
> <!DOCTYPE html><title></title><div> to what travels in the
> feed and append </div> to it, parse the resulting string and
> grab the first <div> in document order.

That will lead to silent data loss if the content is malformed
such that it contains an extraneous `</div>`.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: base within HTML content

2007-01-02 Thread Anne van Kesteren


On Tue, 02 Jan 2007 02:12:14 +0100, James Holderness  
[EMAIL PROTECTED] wrote:
> Well that's not really what I've learnt. I've learnt that there are a
> lot of broken feeds out there (Atom as well as RSS) and that users are
> less than impressed when you tell them it's not your fault and they
> should complain to someone else.


So if a feed is non-well-formed you should just parse it as well using  
some tag soup parser for XML? I don't necessarily disagree with that, but  
I'd like error handling to be defined in great detail. Everyone just doing  
what is best for their users will lead you to where HTML is now (at best).



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/



Re: base within HTML content

2007-01-02 Thread Anne van Kesteren


On Tue, 02 Jan 2007 11:40:53 +0100, A. Pagaltzis [EMAIL PROTECTED] wrote:

> * Henri Sivonen [EMAIL PROTECTED] [2007-01-02 01:35]:
>
>> I hadn't thought that Atom was supposed to use innerHTML
>> parsing. I'd have said that you prepend
>> <!DOCTYPE html><title></title><div> to what travels in the
>> feed and append </div> to it, parse the resulting string and
>> grab the first <div> in document order.
>
> That will lead to silent data loss if the content is malformed
> such that it contains an extraneous `</div>`.


Yeah, it's probably better to take the first and only <body> element.


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/



Re: base within HTML content

2007-01-02 Thread Henri Sivonen


On Jan 2, 2007, at 12:40, A. Pagaltzis wrote:


> * Henri Sivonen [EMAIL PROTECTED] [2007-01-02 01:35]:
>
>> I hadn't thought that Atom was supposed to use innerHTML
>> parsing. I'd have said that you prepend
>> <!DOCTYPE html><title></title><div> to what travels in the
>> feed and append </div> to it, parse the resulting string and
>> grab the first <div> in document order.
>
> That will lead to silent data loss if the content is malformed
> such that it contains an extraneous `</div>`.


Good point.

Prepending <!DOCTYPE html><title></title> and grabbing the contents
of <body> would work better.
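The following rough Python sketch shows why grabbing the contents of <body> is more robust than grabbing the first <div>. It uses only the standard library's html.parser plus a toy tree builder. The builder is our own simplification, not the HTML5 tree-construction algorithm: an end tag closes the nearest open element with that name, and it is ignored if no such element is open. That is roughly what happens to a stray </div> in body content. The toy builder does not infer the implied <body>, so the second function writes it out, where a real HTML5 parser would create it for you. All function and class names are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they are not kept on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # Node objects and text strings, in order

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class ToyTreeBuilder(HTMLParser):
    """Toy stand-in for HTML tree construction: an end tag closes the nearest
    open element with that name and is ignored if no such element is open.
    It does not synthesize implied <html>, <head> or <body> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Search open elements from the innermost outwards (never the root).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return
        # Stray end tag with nothing to close: ignore it.

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(markup):
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def find_first(node, tag):
    """Depth-first search for the first element named `tag` in document order."""
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                return child
            found = find_first(child, tag)
            if found is not None:
                return found
    return None


def first_div_text(content):
    # Wrapper approach: <!DOCTYPE html><title></title><div> + content + </div>
    doc = parse("<!DOCTYPE html><title></title><div>" + content + "</div>")
    return find_first(doc, "div").text()


def body_text(content):
    # Body approach; <body> is spelled out because the toy builder won't infer it.
    doc = parse("<!DOCTYPE html><title></title><body>" + content)
    return find_first(doc, "body").text()
```

Consider the entry content `<p>Hello</p></div><p>World</p>`. For it, first_div_text() returns only "Hello": the stray </div> closes the wrapper, so "World" ends up outside it and is silently lost. body_text() returns "HelloWorld".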


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: base within HTML content

2007-01-02 Thread James Holderness


Anne van Kesteren wrote:
> So if a feed is non-well-formed you should just parse it as well using
> some tag soup parser for XML?


Well that's what I do. The Google Reader blog post I quoted estimated that 
about seven percent of feeds contained XML errors of some kind. That's a lot 
of feeds for me to ignore. I'm sure other aggregator authors will choose 
otherwise. It just depends on their needs and the needs of their users.


> I'd like error handling to be defined in great detail. Everyone just doing
> what is best for their users will lead you to where HTML is now (at best).


To be honest, I don't care. I'm not trying to make policy here. I was just 
offering my advice to one particular person in response to one particular 
query.


Regards
James



Re: base within HTML content

2007-01-01 Thread Asbjørn Ulsberg


On Fri, 22 Dec 2006 18:38:33 +0100, Geoffrey Sneddon  
[EMAIL PROTECTED] wrote:


> If we come across something like:
> <description type="html"><![CDATA[<base url="http://example.com/"><a href="test.html">Test Link</a>]]></description>,


Yikes!


> I assume the link should point to http://example.com/test.html, due
> to the <base> element? I assume, likewise, that <base> would take
> precedence over xml:base, as it is directly within the content.


Like James Holderness wrote, the <base> element has no place in an HTML
fragment, so its meaning is (although most browsers wrongly support
its presence anywhere in an HTML document) unspecified. The correct base
URI to use here is the closest xml:base in the ancestor vector or the
document's base URI.


What's the use case for not using xml:base here?
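Resolving against "the closest xml:base in the ancestor vector or the document's base URI" can be sketched with Python's standard library. Per XML Base, each relative xml:base is itself resolved against the base in scope for its parent. xml.etree.ElementTree exposes the attribute under the reserved XML namespace URI. The feed, the URLs and the function names below are made-up examples.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM_CONTENT = "{http://www.w3.org/2005/Atom}content"

# Hypothetical feed: the entry's base is built up from two xml:base values.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.org/blog/">
  <entry xml:base="2007/01/">
    <content type="html">&lt;a href="test.html"&gt;Test Link&lt;/a&gt;</content>
  </entry>
</feed>"""


def walk_with_base(elem, base):
    """Yield (element, base URI in scope) pairs in document order."""
    if XML_BASE in elem.attrib:
        # A relative xml:base is resolved against the parent's base.
        base = urljoin(base, elem.attrib[XML_BASE])
    yield elem, base
    for child in elem:
        yield from walk_with_base(child, base)


def content_base(feed_xml, document_uri):
    """Return the base URI in scope for the first atom:content element."""
    root = ET.fromstring(feed_xml)
    for elem, base in walk_with_base(root, document_uri):
        if elem.tag == ATOM_CONTENT:
            return base
    return None


base = content_base(FEED, "http://example.org/feed.atom")
print(base)                        # http://example.org/blog/2007/01/
print(urljoin(base, "test.html"))  # http://example.org/blog/2007/01/test.html
```

When no xml:base appears anywhere, content_base() simply returns the document URI it was given.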

--
Asbjørn Ulsberg -=|=-http://virtuelvis.com/quark/
«He's a loathsome offensive brute, yet I can't look away»



Re: base within HTML content

2007-01-01 Thread Geoffrey Sneddon



On 1 Jan 2007, at 16:59, Asbjørn Ulsberg wrote:


> Like James Holderness wrote,


Eek! I should keep up with emails better!

> the <base> element has no place in an HTML fragment, so its meaning
> is (although most browsers wrongly support its presence
> anywhere in an HTML document) unspecified.


Web Applications 1.0 (in keeping with the real world) defines that it
should be moved to HEAD within the DOM tree.


Why, may I ask, MUST (under the RFC 2119 definition) HTML content be
a fragment ("HTML markup within SHOULD be such that it could validly
appear directly within an HTML <DIV> element, after unescaping." Note
the word SHOULD, not MUST, implying that you can have a full HTML
document within)?


> The correct base URI to use here is the closest xml:base in the
> ancestor vector or the document's base URI.
>
> What's the use case for not using xml:base here?


I don't know - this is just an example of a feed I came across a few  
weeks back.



- Geoffrey Sneddon





Re: base within HTML content

2007-01-01 Thread Thomas Broyer


2007/1/1, Geoffrey Sneddon:


> On 1 Jan 2007, at 16:59, Asbjørn Ulsberg wrote:
>
>> the <base> element has no place in an HTML fragment, so its meaning
>> is (although most browsers wrongly support its presence
>> anywhere in an HTML document) unspecified.
>
> Web Applications 1.0 (in keeping with the real world) defines that it
> should be moved to HEAD within the DOM tree.


I suppose HTML within Atom is processed more like innerHTML, so
there is no head element pointer, and the <base> element is just appended
as a child of the current node (along with a parse error!).


> Why, may I ask, MUST (under the RFC 2119 definition) HTML content be
> a fragment ("HTML markup within SHOULD be such that it could validly
> appear directly within an HTML <DIV> element, after unescaping." Note
> the word SHOULD, not MUST, implying that you can have a full HTML
> document within)?


Yes, you could, in the sense that the Atom document wouldn't be
invalid, but you shouldn't expect it to be processed as a full HTML
document.

The SHOULD implies that Atom processors are OK if they process HTML
content as innerHTML on a <div> element.

--
Thomas Broyer



Re: base within HTML content

2007-01-01 Thread A. Pagaltzis

* Geoffrey Sneddon [EMAIL PROTECTED] [2007-01-01 19:00]:
> On 1 Jan 2007, at 16:59, Asbjørn Ulsberg wrote:
>> the <base> element has no place in an HTML fragment, so its
>> meaning is (although most browsers wrongly support its
>> presence anywhere in an HTML document) unspecified.
>
> Web Applications 1.0 (in keeping with the real world) defines that it
> should be moved to HEAD within the DOM tree.

Thereby, of course, breaking the links in any other entries
rendered in the same page by a web-based aggregator, for example.

> Why, may I ask, MUST (under the RFC 2119 definition) HTML content be
> a fragment ("HTML markup within SHOULD be such that it could validly
> appear directly within an HTML <DIV> element, after unescaping." Note
> the word SHOULD, not MUST, implying that you can have a full HTML
> document within)?

Because many aggregators (most, very likely) do not render items
in isolation, but rather in some sort of collection, either
across feeds as a “river of news” or even just several within a
single feed. (Weblog engines do that when showing the front page
of the weblog or archive for particular intervals.) They will
usually strip any header-level information from your entry, so
putting such elements in the content will usually fail to achieve
what you wanted – hence the SHOULD.
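An aggregator, or a feed checker, could at least detect such elements before splicing an entry into a shared page. Here is a minimal sketch using Python's html.parser. Which elements count as page-level is our own assumption. We take the elements HTML 4 allows only in the document head (title, base, meta, link, style) plus the structural html, head and body. The names PAGE_LEVEL and page_level_elements are hypothetical.

```python
from html.parser import HTMLParser

# Elements treated here as page-level (our own list): they only make sense
# once per document, so they are likely to be stripped or ignored when
# several entries are rendered together on one page.
PAGE_LEVEL = {"html", "head", "body", "title", "base", "meta", "link", "style"}


class PageLevelFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = []  # page-level tag names, first occurrence order

    def handle_starttag(self, tag, attrs):
        if tag in PAGE_LEVEL and tag not in self.found:
            self.found.append(tag)


def page_level_elements(html_content):
    """Return the page-level elements used in an entry's (unescaped) HTML."""
    finder = PageLevelFinder()
    finder.feed(html_content)
    finder.close()
    return finder.found


print(page_level_elements('<base href="http://example.com/"><a href="test.html">Test Link</a>'))
# ['base']
```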

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: base within HTML content

2007-01-01 Thread Bob Wyman

On 1/1/07, Geoffrey Sneddon [EMAIL PROTECTED] wrote:
> Why, may I ask, MUST (under the RFC 2119 definition) HTML
> content be a fragment ("HTML markup within SHOULD be such
> that it could validly appear directly within an HTML <DIV>
> element, after unescaping." Note the word SHOULD, not
> MUST, implying that you can have a full HTML document within)?


What would you do if you wanted to display a feed of 10 entries in
newspaper style (i.e. all entries in a single HTML page) yet each of the
entries had a different BASE defined? It wouldn't do you much good to move
all the base elements to the HEAD of the DOM tree -- you'd just end up with
a mess. If you want a local base, then use xml:base. That's what it is for.
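Here is a small worked example of that mess, in Python. HTML uses the first <base> element with an href for the whole document. So if every entry's <base> is hoisted into one HEAD, the first entry's base governs every entry's links. Scoping the base per entry, which is what xml:base gives you, keeps each link pointing where its author meant. The hosts, paths and function names are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical river-of-news page: two entries from different authors,
# each with its own base and the same relative link.
entries = [
    {"base": "http://alice.example/posts/", "link": "photo.jpg"},
    {"base": "http://bob.example/journal/", "link": "photo.jpg"},
]


def resolve_hoisted(entries):
    # All <base> elements moved into the page's single HEAD:
    # only the first one with an href counts, for every entry.
    page_base = entries[0]["base"]
    return [urljoin(page_base, e["link"]) for e in entries]


def resolve_per_entry(entries):
    # Base scoped to each entry, as xml:base allows.
    return [urljoin(e["base"], e["link"]) for e in entries]


print(resolve_hoisted(entries))
# ['http://alice.example/posts/photo.jpg', 'http://alice.example/posts/photo.jpg']
print(resolve_per_entry(entries))
# ['http://alice.example/posts/photo.jpg', 'http://bob.example/journal/photo.jpg']
```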

The same problem exists for other page-global stuff. For instance, XHTML
modularization is useless if you're creating Atom entries, since that stuff
relies on elements in HEAD, but an Atom entry "ain't got no head."

Remember as well that not all of the entries in a feed document need be
created by the same person. For instance, with aggregated or synthetic
feeds, you end up with entries written by many different authors who have no
chance of negotiating how they will divide the global resources that might
be used to display their entries. Because some entries may be signed, you
can't simply say something like "just rewrite the entries" -- that would
break the signatures.

It is good that Atom entries should be fragments. That increases to a great
degree the variety of environments in which Atom entries are useful. If you
feel constrained by this, I would suggest that you push on those who define
HTML and get them to provide mechanisms for allowing fragment-local
expression of things that at this time can only be expressed as page-global.
(Yes, I realize this will take some time.)

bob wyman


Re: base within HTML content

2007-01-01 Thread Geoffrey Sneddon



On 1 Jan 2007, at 19:22, Bob Wyman wrote:


> If you want a local base, then use xml:base. That's what it is for.


When the spec says you SHOULD treat html content as if it were in a
DIV, it adds a certain amount of unclarity as to how such Atom feeds
should be parsed. I'm asking merely to see if there's any consensus
as to how it should be done. I have no control over the vast majority
of feeds out there; telling me to use xml:base will make no
difference, as I have no control over the feed in which I found a
<base>.



- Geoffrey Sneddon




Re: base within HTML content

2007-01-01 Thread Anne van Kesteren


On Mon, 01 Jan 2007 21:22:33 +0100, Geoffrey Sneddon  
[EMAIL PROTECTED] wrote:

>> If you want a local base, then use xml:base. That's what it is for.
>
> When the spec says you SHOULD treat html content as if it were in a
> DIV, it adds a certain amount of unclarity as to how such Atom feeds
> should be parsed. I'm asking merely to see if there's any consensus as
> to how it should be done. I have no control over the vast majority of
> feeds out there; telling me to use xml:base will make no difference, as
> I have no control over the feed in which I found a <base>.


Hmm, the same is true for a large number of other cases. Atom in general
is ambiguous at best in terms of error handling.


I suppose you could raise this on the WHATWG list, asking what happens if
you set innerHTML of a <div> where the assigned value has both a <base> and
an <a>, for instance.



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/



Re: base within HTML content

2007-01-01 Thread James Holderness


Geoffrey Sneddon wrote:
> When the spec says you SHOULD treat html content as if it were in a
> DIV, it adds a certain amount of unclarity as to how such Atom feeds
> should be parsed. I'm asking merely to see if there's any consensus as to
> how it should be done. I have no control over the vast majority of feeds
> out there; telling me to use xml:base will make no difference, as I have
> no control over the feed in which I found a <base>.


Do you still have a copy of the feed you encountered that was using a <base>
element? I'd be curious to see whether its links and images would fail to
work if you didn't take that base into account. Because if that's the case,
I'd recommend supporting it (i.e. the <base> element takes precedence over
xml:base or however else the current base URI is determined).


In other words, do whatever it takes to get that particular feed to work. 
This obviously isn't a common scenario, and it's arguably not a valid feed, 
so whatever you do you can't be faulted. Unless you find more data 
suggesting this is a bad idea, it seems to me it would make sense to at 
least get your one known example to work.
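The suggested precedence can be written as a small policy function: an embedded <base> wins, and otherwise you fall back to xml:base or however else the base URI was determined. This is a sketch under assumptions. The caller is assumed to have already pulled out the href of the first <base> in the content, since HTML's <base> carries its URI in href. The honour_embedded switch is our own addition, because the thread disagrees on whether to honour such a <base> at all. The feed URL is hypothetical.

```python
from urllib.parse import urljoin


def effective_base(base_in_scope, embedded_base_href=None, honour_embedded=True):
    """Choose the base URI for links inside one piece of html content.

    base_in_scope: the xml:base (or document URI) already in effect.
    embedded_base_href: href of a <base> found inside the content, if any.
    """
    if honour_embedded and embedded_base_href:
        # A <base href> may itself be relative, so resolve it against the outer base.
        return urljoin(base_in_scope, embedded_base_href)
    return base_in_scope


feed_base = "http://feeds.example.net/some/feed.xml"  # hypothetical feed URL
print(urljoin(effective_base(feed_base, "http://example.com/"), "test.html"))
# http://example.com/test.html
print(urljoin(effective_base(feed_base, "http://example.com/", honour_embedded=False), "test.html"))
# http://feeds.example.net/some/test.html
```

Keeping the policy behind one flag makes it easy to honour the embedded <base> only for the feeds that need it.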


MHO.

Regards
James



Re: base within HTML content

2007-01-01 Thread James M Snell

-1. If there's anything we can learn from the mess that is RSS, it's that
at a certain point feed consumers should be allowed to say simply that a
buggy feed is a buggy feed and that it is the responsibility of the feed
publisher to get things right.

- James

James Holderness wrote:
> [snip]
> Do you still have a copy of the feed you encountered that was using a
> <base> element? I'd be curious to see whether its links and images would
> fail to work if you didn't take that base into account. Because if
> that's the case, I'd recommend supporting it (i.e. the <base> element
> takes precedence over xml:base or however else the current base URI is
> determined).
>
> In other words, do whatever it takes to get that particular feed to
> work. This obviously isn't a common scenario, and it's arguably not a
> valid feed, so whatever you do you can't be faulted. Unless you find
> more data suggesting this is a bad idea, it seems to me it would make
> sense to at least get your one known example to work.
>
> MHO.
>
> Regards
> James



Re: base within HTML content

2007-01-01 Thread Henri Sivonen


On Jan 1, 2007, at 22:46, Anne van Kesteren wrote:

> I suppose you could raise this on the WHATWG list, asking what
> happens if you set innerHTML of a <div> where the assigned value has
> both a <base> and an <a>, for instance.


Interesting. I hadn't thought that Atom was supposed to use innerHTML
parsing. I'd have said that you prepend <!DOCTYPE html><title></title><div>
to what travels in the feed and append </div> to it, parse the resulting
string and grab the first <div> in document order.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: base within HTML content

2007-01-01 Thread James Holderness


James M Snell wrote:

> -1. If there's anything we can learn from the mess that is RSS, it's that
> at a certain point feed consumers should be allowed to say simply that a
> buggy feed is a buggy feed and that it is the responsibility of the feed
> publisher to get things right.


Well that's not really what I've learnt. I've learnt that there are a lot of 
broken feeds out there (Atom as well as RSS) and that users are less than 
impressed when you tell them it's not your fault and they should complain to 
someone else.


In the words of Mihai Parparita (Google Reader): "as anyone who has
attempted to implement a feed parser knows, there are many subtle deviations
from the spec that you have to handle if you want to have any hope of
satisfying the needs of your users." [1]


I'm not saying that aggregators MUST support this particular buggy feed. I 
just got the impression that Geoffrey WANTED to support it. I think that's 
his choice to make.


Regards
James

[1] http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html



Re: base within HTML content

2006-12-22 Thread James Holderness


Geoffrey Sneddon wrote:
> If we come across something like:
> <description type="html"><![CDATA[<base url="http://example.com/"><a href="test.html">Test Link</a>]]></description>,
> I assume the link should point to http://example.com/test.html, due to
> the <base> element?


Not necessarily. The <base> element should only appear in the head section of
an HTML document, so it's not actually valid in an HTML fragment like this.
I suppose you could support it if you wanted to, but I don't think you'd be
wrong to ignore it.


Regards
James