Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-29 Thread Bill de hÓra


James M Snell wrote:

Antone,

Very good write up.  The fact that xml:base on div is not valid XHTML is
somewhat irrelevant given that there is an identical problem with
xml:lang. For instance, if I have <content xml:lang="en"><div
xml:lang="fr">...</div></content> and I drop the div silently, then I've
got a problem.  Granted, the producer of the atom feed really shouldn't
have done this, but we still need to be able to handle it properly if it
does happen.  


I don't agree that bug compliance is the way to go.  If downstream code has 
to patch against broken providers, that's a race to the bottom - it's a 
culture where specs cease to matter because they can be mercilessly E 
and E'd. File a bug report instead.


Otoh if we have spec'd in a feature here which doesn't sit on top of XML 
infrastructure properly, that's another matter - "hey, xml lib, handle 
this element special like, cos atom markup don't care about clean 
layering" sounds like a problem. We seem to keep doing that with xml:* 
features (lang, include, base). Atom is to XML as HTML is to SGML?


cheers
Bill



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Sylvain Hellegouarch


 On 6/27/06, James M Snell [EMAIL PROTECTED] wrote:
 Please define conformance in regards to this test.  That is, what is
 the exact behavior that a library must perform when a code library has
 an API like, getContent on the content element.

 e.g., is a parser not conformant if it passes the DIV on to the
 consuming application with the expectation that the application is
 responsible for doing the right thing with it?

 Don't be dense. Would the parser be conformant if it passed on the
 feed, entry, and div elements with that expectation? I'll file a bug
 on UFP and I bet you it'll get fixed without a question, because there
 won't be a bad-faith interpretation to fight. That's two demerits this
 week for you. Tsk tsk.


Could this teasing please stop? It adds noise to the debate and is getting
*really* annoying for all of us. If you two have issues, sort them out in
private, as this is not the right place.

Thanks,
- Sylvain



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Sam Ruby


Robert Sayre wrote:


I'll file a bug
on UFP and I bet you it'll get fixed without a question


http://sourceforge.net/tracker/index.php?func=detail&aid=1474256&group_id=112328&atid=661937

- Sam Ruby



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Henri Sivonen


On Jun 28, 2006, at 02:27, James M Snell wrote:


That is, what is
the exact behavior that a library must perform when a code library has
an API like, getContent on the content element.


One sane behaviour is to return an org.w3c.dom.DocumentFragment with  
the deep copies of the children of the namespace div with the  
xml:base and xml:lang context pushed down on each child.


That's a bit awkward, so I guess using a placeholder root element  
with the xml:base and xml:lang context would make sense, provided  
that the API doc says that the root is not part of the logical  
content. This could be emphasized by using a root in a private  
namespace instead of an XHTML div. (Just to be obnoxious enough to  
make sure users of the API take note. :-)


Or, alternatively, the API could construct a full XHTML
nu.xom.Document or org.w3c.dom.Document and thereby unify the return
value for type="application/xhtml+xml", type="text/html",
type="xhtml" and type="html". (Assuming, that is, that the library
runs TagSoup and automatically converts HTML to XHTML.) Actually, I
think this would be the best way.


In any case, returning a String as the value of the content means  
that the library is not fully doing its job when the logical value is  
an XML document fragment.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread James M Snell

Hey Sylvain,

On this one, I'm being very serious.  I need to know what conformance
means.  Hiding the div completely from users of Abdera would mean
potentially losing important data (e.g. the div may contain an xml:lang
or xml:base) or forcing me to perform additional processing (pushing the
in-scope xml:lang/xml:base down to child elements of the div).  It also
has ease-of-use ramifications on the API.  So I really do need a solid
answer on this one.

- James




Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Sylvain Hellegouarch

 Hey Sylvain,

 On this one, I'm being very serious.  I need to know what conformance
 means.  Hiding the div completely from users of Abdera would mean
 potentially losing important data (e.g. the div may contain an xml:lang
 or xml:base) or forcing me to perform additional processing (pushing the
 in-scope xml:lang/xml:base down to child elements of the div.  It also
 has ease-of-use ramifications on the API.  So I really do need a solid
 answer on this one.

 - James

Hi James,

I could be totally wrong, but since the div tag is added by the Atom processor
when creating content of type XHTML, I believe it should be stripped out
when extracting the content, to avoid altering the original meaning of the
message. That's my understanding anyway.

- Sylvain



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread A. Pagaltzis

* James M Snell [EMAIL PROTECTED] [2006-06-28 14:35]:
 Hiding the div completely from users of Abdera would mean
 potentially losing important data (e.g. the div may contain an
 xml:lang or xml:base) or forcing me to perform additional
 processing (pushing the in-scope xml:lang/xml:base down to
 child elements of the div.

How is that any different from having to find ways to pass any
in-scope xml:lang/xml:base down to API clients when the content
is type="html" or type="text"? I hope you didn’t punt on those?

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread James M Snell

Our Content interface has methods for getting to that information.

- James




Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread David Powell


Wednesday, June 28, 2006, 1:22:00 PM, James Snell wrote:

 Hiding the div completely from users of Abdera would mean
 potentially losing important data (e.g. the div may contain an xml:lang
 or xml:base)

I don't think that the div should contain an xml:base, because it
isn't valid to use xml:base in XHTML 1.x. As the xhtml:div is added by
the producer, it should be removed by the consumer, so there shouldn't
be an xml:lang in there either. I wouldn't expect consumers to handle
either consistently, so if you are a producer don't do it. I think in
my implementation I handle lang and base on the div, and store them
out-of-band, but it is more by accident than anything.

I would hope that any other xmlns:* declarations on xhtml:div are
honoured. Namespaces are so core to XML that making any
recommendations about their placement is asking for trouble.

 or forcing me to perform additional processing (pushing the
 in-scope xml:lang/xml:base down to child elements of the div.

I avoid that; it isn't nice, as the xml:base will make the XHTML
invalid and browser-dependent. In my RDF implementation, I store the
lang context, base context, content model, and other stuff out-of-band
from the content itself. I do rely on RDF's exclusive canonicalization
rules though, to preserve the in-scope namespace decls.

(I assume that namespace decls aren't strictly allowed in valid XHTML
either? Oh well...)

 It also has ease-of-use ramifications on the API. So I really do
 need a solid answer on this one.

You need to preserve a load of context in addition to the content
string itself, so expect to have to return these extra properties for
each use of Text Constructs in your API.  It is a bit of a
high barrier to entry really.

(If Atom had been designed in JSON, instead of XML, I wonder if it
would have been more sympathetic to the OO/RDBMS crowd, and whether we
would have bothered with such fine-grained language tagging?)

-- 
Dave



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread A. Pagaltzis

* James M Snell [EMAIL PROTECTED] [2006-06-28 20:00]:
 A. Pagaltzis wrote:
  * James M Snell [EMAIL PROTECTED] [2006-06-28 14:35]:
  Hiding the div completely from users of Abdera would mean
  potentially losing important data (e.g. the div may contain
  an xml:lang or xml:base) or forcing me to perform additional
  processing (pushing the in-scope xml:lang/xml:base down to
  child elements of the div.
  
  How is that any different from having to find ways to pass
  any in-scope xml:lang/xml:base down to API clients when the
  content is type=html or type=text? I hope you didn’t punt
  on those?

 Our Content interface has methods for getting to that
 information.

Then stripping the `div` is not an issue, is it?

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Antone Roundy


On Jun 28, 2006, at 12:06 PM, A. Pagaltzis wrote:

* James M Snell [EMAIL PROTECTED] [2006-06-28 20:00]:

A. Pagaltzis wrote:

* James M Snell [EMAIL PROTECTED] [2006-06-28 14:35]:

Hiding the div completely from users of Abdera would mean
potentially losing important data (e.g. the div may contain
an xml:lang or xml:base) or forcing me to perform additional
processing (pushing the in-scope xml:lang/xml:base down to
child elements of the div.


How is that any different from having to find ways to pass
any in-scope xml:lang/xml:base down to API clients when the
content is type=html or type=text? I hope you didn’t punt
on those?


Our Content interface has methods for getting to that
information.


Then stripping the `div` is not an issue, is it?


Consider this:

<entry xml:lang="en" xml:base="http://example.com/foo/">
  ...
  <content type="xhtml">
    <xhtml:div xml:lang="fr" xml:base="http://example.com/feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>

Whether there's a problem depends on whether one requests the  
xml:base, xml:lang, or whatever for the atom:content element itself  
or for the CONTENT OF the atom:content element, in which case the  
library could return the values it got from the xhtml:div.  Except in  
unusual cases like this, the result would be identical.


Certainly a distinction could be made between how an XML library  
would handle this vs. how an Atom library would handle it.  An Atom  
processing library might be expected to be able to do things like:


* give me the raw contents of the atom:content element
* give me the contents of the atom:content element converted to
  well-formed XHTML (whether it started as text, escaped tag soup, or
  inline xhtml)


In the former case, keeping the div feels like the right thing to do-- 
the consuming app would have to know to remove it.  In the latter  
case, removing the div from xhtml content feels like the right thing  
to do.  But unless the library gives me the xml:base, for example,  
which applies to the content of the atom:content element (as  
converted to well-formed xhtml or whatever), as opposed to the  
xml:base which applied to the atom:content element itself, there's  
potential for trouble.


...now that I think about it, if the library always returns the  
xml:base which applies to the content of the element, that could  
cause trouble too:


<entry xml:lang="en" xml:base="http://example.com/">
  ...
  <content type="xhtml">
    <xhtml:div xml:lang="fr" xml:base="feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>

Here, if I get xml:base for the content of content, it will be
"http://example.com/feu/".  Then, if I get the raw content of the
element, strip the div, and apply xml:base myself, I'll erroneously
use "http://example.com/feu/feu/" as the base URI unless I know to
ignore the xml:base attribute on the div.
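
The double-resolution pitfall described here can be reproduced with plain URL arithmetic. What follows is only a minimal sketch, not code from any library mentioned in this thread; it assumes Python's standard urllib.parse.urljoin as the relative-reference resolver, and the variable names are illustrative.

```python
from urllib.parse import urljoin

entry_base = "http://example.com/"  # xml:base on atom:entry
div_base = "feu/"                   # xml:base on the xhtml:div

# Base URI that applies to the content of atom:content
# (the div's xml:base has already been resolved once here).
content_base = urljoin(entry_base, div_base)

# Correct: resolve the link against the already-resolved base.
correct_link = urljoin(content_base, "axe.html")

# Mistake: the client takes the raw content, sees xml:base="feu/" on the
# div, and applies it a second time on top of the resolved base.
double_base = urljoin(content_base, div_base)
wrong_link = urljoin(double_base, "axe.html")
```

A library that hands out both the resolved base and the raw div would need to document which of the two the caller is expected to use.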




Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread James M Snell

Antone,

Very good write up.  The fact that xml:base on div is not valid XHTML is
somewhat irrelevant given that there is an identical problem with
xml:lang. For instance, if I have <content xml:lang="en"><div
xml:lang="fr">...</div></content> and I drop the div silently, then I've
got a problem.  Granted, the producer of the atom feed really shouldn't
have done this, but we still need to be able to handle it properly if it
does happen.  The solution I think I'm going to go with is to support
both approaches.  Our default behavior will be to return the div.  A
separate API will provide the content without the div.  When in doubt,
do both.

- James

 



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread James M Snell

David,

you're right, ideally the xhtml container div would be nothing but the
div, but if it's not, we still need to be prepared to handle it.  Silent
data loss sucks, even if it's silly data :-)

- James




Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Robert Sayre


On 6/28/06, James M Snell [EMAIL PROTECTED] wrote:


Our default behavior will be to return the div.  A
separate API will provide the content without the div.


So, standards-off-by-default then? Unbelievable.

--

Robert Sayre



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread James Holderness


Antone Roundy wrote:

Consider this:

<entry xml:lang="en" xml:base="http://example.com/foo/">
  ...
  <content type="xhtml">
    <xhtml:div xml:lang="fr" xml:base="http://example.com/feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>


Another observation for those of you curious about interoperability... the 
last time I tested xml:base conformance (which admittedly was a while back) 
I couldn't find a single aggregator that supported xml:base on the div 
element.


Regards
James



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Robert Sayre


Irrelevant. The content in the entries below should be handled the same way:

<entry xml:lang="en" xml:base="http://example.com/foo/">
  ...
  <content type="xhtml">
    <xhtml:div xml:lang="fr" xml:base="http://example.com/feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>

<entry xml:lang="en" xml:base="http://example.com/foo/">
  ...
  <content type="xhtml" xml:lang="fr" xml:base="http://example.com/feu/">
    <xhtml:div><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>
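
One way to read "handled the same way" is that the language and base URI in scope for the children of the div come out identical for both entries. The sketch below illustrates that reading under stated assumptions: it uses Python's xml.etree.ElementTree, the helper div_context is hypothetical (it belongs to no library in this thread), and the Atom and XHTML namespace declarations are added because the examples above leave them out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "http://www.w3.org/XML/1998/namespace"
ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

NS_DECLS = f'xmlns="{ATOM}" xmlns:xhtml="{XHTML}"'
LINK = '<xhtml:a href="axe.html">axe</xhtml:a>'

# xml:lang / xml:base placed on the xhtml:div
ON_DIV = (
    f'<entry {NS_DECLS} xml:lang="en" xml:base="http://example.com/foo/">'
    '<content type="xhtml">'
    f'<xhtml:div xml:lang="fr" xml:base="http://example.com/feu/">{LINK}</xhtml:div>'
    '</content></entry>'
)

# the same attributes placed on atom:content instead
ON_CONTENT = (
    f'<entry {NS_DECLS} xml:lang="en" xml:base="http://example.com/foo/">'
    '<content type="xhtml" xml:lang="fr" xml:base="http://example.com/feu/">'
    f'<xhtml:div>{LINK}</xhtml:div>'
    '</content></entry>'
)

def div_context(entry_xml):
    """Return (lang, base) in scope for the children of the xhtml:div."""
    entry = ET.fromstring(entry_xml)
    content = entry.find(f"{{{ATOM}}}content")
    div = content.find(f"{{{XHTML}}}div")
    lang, base = None, ""
    # Walk from the outermost element down to the div; inner values override.
    for el in (entry, content, div):
        lang = el.get(f"{{{XML_NS}}}lang", lang)
        base = urljoin(base, el.get(f"{{{XML_NS}}}base", ""))
    return lang, base
```

Calling div_context on ON_DIV and on ON_CONTENT gives the same pair, which is the property a consumer would want regardless of whether it strips the div.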







--

Robert Sayre

I would have written a shorter letter, but I did not have the time.



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread James M Snell

Actually, switch this.  I realized after I sent this that I had it
backwards.  The default behavior will be to not return the div. A
separate API will provide the content with the div.

- James

James M Snell wrote:
 [snip]...The solution I think I'm going to go with is to support
 both approaches.  Our default behavior will be to return the div.  A
 separate API will provide the content without the div.  When in doubt,
 do both.
 
 - James
 



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread David Powell


Wednesday, June 28, 2006, 9:55:29 PM, James Snell wrote:

 David,

 you're right, ideally the xhtml container div would be nothing but the
 div, but if it's not, we still need to be prepared to handle it.  Silent
 data loss sucks, if it's silly data :-)

I'm just looking at it from the perspective of the producer and the
consumer.

In my consumer implementation, I take the resolved base URI of the div
(including any xml:base there), and the language context of the div,
discard the div, and store them both out-of-band of the content, with
namespace prefixes inline. That's probably good enough. Some
post-processing is used to convert the data in the store into a form
that allows it to be safely embedded in an HTML page - I've been
trying XSLT (with TagSoup for HTML content).

I don't think that the div should have lang or base attached, but if
they are there, it is better to use them than ignore them, because they
are likely there for a reason. I wouldn't produce feeds like that though.

If people start using CSS links in feeds (or even just CSS styling in
aggregators), discarding the div could be important.

If you're going to supply an API for extracting usable
[X]HTML, there are a number of features that consumers might want in
some combination:

* Forcing the XHTML to use a blank namespace prefix to make it DTD
  compatible, and removing unused prefixes.

* Resolving relative references (which will inevitably be a lossy
  process)

* Removing XSS risks (intentionally lossy)

I still keep the original content in a reasonably accurate form
though.

-- 
Dave



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Antone Roundy


On Jun 28, 2006, at 3:10 PM, Robert Sayre wrote:

The content in the entries below should be handled the same way:

<entry xml:lang="en" xml:base="http://example.com/foo/">
  ...
  <content type="xhtml">
    <xhtml:div xml:lang="fr" xml:base="http://example.com/feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>

<entry xml:lang="en" xml:base="http://example.com/foo/">
  ...
  <content type="xhtml" xml:lang="fr" xml:base="http://example.com/feu/">
    <xhtml:div><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>


Of course the end result of both should be identical.  Is that what
you mean by "should be handled the same way"?  The question is, if
the xhtml:div is stripped by the library before handing it off to the
app, how is the app going to get the attributes that were on the
div?  Is the library going to push those values down into the content
or act as if they were on the atom:content element (or something
similar to that)?


BTW, it just occurred to me that pushing them down into the content  
won't work.  Here's an example where that would fail:


<entry xml:lang="en">
  ...
  <content type="xhtml">
    <xhtml:div xml:lang="fr">Oui!</xhtml:div>
  </content>
</entry>

Notice that there are no elements inside the xhtml:div for xml:lang  
to be attached to (and even if there were any, any text appearing  
outside of them would not have the correct xml:lang attached to it).


So it looks like the options (both of which a single library could
support, of course) are:


* Strip the div, but provide a way to get the attributes that were on it
or
* Leave the div



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Henri Sivonen


On Jun 28, 2006, at 23:53, James M Snell wrote:


For instance, if I have <content xml:lang="en"><div
xml:lang="fr">...</div></content> and I drop the div silently, then
I've got a problem.


Dropping the div shouldn't mean dropping the language and base URL  
context. You need to communicate those anyway in the case they are  
inherited from higher up in the document tree.


(When the script that generates my feed copies a node from one document
tree to another, it checks the inherited language of the node being
copied. If it differs from the inherited language of the insertion
target, the newly inserted copy gets an explicit xml:lang.)
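
The copying rule in the parenthetical could look roughly like the sketch below. This is an illustration only, not the actual feed script: it assumes Python's xml.etree.ElementTree, the function name insert_copy is made up, and because ElementTree keeps no parent pointers the caller is assumed to pass in the inherited language of both the source node and the insertion target.

```python
import copy
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def insert_copy(node, node_lang, target, target_lang):
    """Append a deep copy of node to target.

    node_lang and target_lang are the inherited languages at the source
    and at the insertion point. If they differ, the copy gets an explicit
    xml:lang so its language does not silently change.
    """
    clone = copy.deepcopy(node)
    if node_lang is not None and node_lang != target_lang:
        clone.set(XML_LANG, node_lang)
    target.append(clone)
    return clone

source = ET.fromstring("<p>Oui!</p>")   # inherits "fr" from its own tree
target = ET.Element("div")              # sits in an "en" context

explicit = insert_copy(source, "fr", target, "en")   # languages differ
implicit = insert_copy(source, "en", target, "en")   # languages match
```

The same comparison could be applied to the base URI, although relative references inside the copied subtree would still need separate care.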


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Henry Story


Thanks everyone for this really interesting discussion. I have added  
a note to this effect to the latest atom-owl ontology [1].


In Atom-Owl we could easily do both.

[] :content xhtml:div xml:lang=frOui!/xhtml:div^:xhtml.

or we could have

[] :content Oui!@fr^:xhtml .

or

[] :content [ :xhtml oui;
  :lang en ].


It would be simplest I suppose to have the :xhtml type be defined as
always having a <div>...</div> element. Except that of course it would
look odd for xhtml content that contains an html base tag such as


<div>
  <html><head>...<body>...</body></head>
</div>

From this discussion it looks like the most reasonable would be to  
strip the div element. In which case one may wonder what the whole  
purpose of putting the div in the content really was in the first  
place.


Henry

[1] https://sommer.dev.java.net/atom/2006-06-06/awol.html#term_xhtml





Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Henry Story


On the other hand, if one strips the div element, then :xhtml can
no longer be an inverse functional property, as contents with
different bases could have very different meanings. Just think of
xhtml content with a picture, which in one subtree points to Bush,
and in another one points to Gore, the relative URI references being
the same in both cases.


This seems to make it more reasonable to create a new literal type  
which contains the div. (it makes finding duplicates in an rdf  
database easier).


On that topic are there not xhtml ways to create xml:base and  
xml:lang elements? Should those not perhaps be used instead on the  
div element?


Henry

On 29 Jun 2006, at 00:11, Henry Story wrote:



Thanks everyone for this really interesting discussion. I have  
added a note to this effect to the latest atom-owl ontology [1].


In Atom-Owl we could easily do both.

[] :content xhtml:div xml:lang=frOui!/xhtml:div^:xhtml.

or we could have

[] :content Oui!@fr^:xhtml .

or

[] :content [ :xhtml oui;
  :lang en ].


It would be simplest I suppose to have the :xhtml type be defined  
as always having an div ... element. Except that of course it  
would look odd for xhtml content that contains an html base tag  
such as


div
  htmlhead...body.../body/head
/div

From this discussion it looks like the most reasonable would be to  
strip the div element. In which case one may wonder what the  
whole purpose of putting the div in the content really was in the  
first place.


Henry

[1] https://sommer.dev.java.net/atom/2006-06-06/awol.html#term_xhtml


On 28 Jun 2006, at 23:53, Antone Roundy wrote:



On Jun 28, 2006, at 3:10 PM, Robert Sayre wrote:

The content in the entries below should be handled the same way:

<entry xml:lang="en" xml:base="http://example.com/foo/">
  ...
  <content type="xhtml">
  <xhtml:div xml:lang="fr" xml:base="http://example.com/feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>

<entry xml:lang="en" xml:base="http://example.com/foo/">
  ...
  <content type="xhtml" xml:lang="fr" xml:base="http://example.com/feu/">
  <xhtml:div><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
  </content>
</entry>


Of course the end result of both should be identical.  Is that  
what you mean by should be handled the same way?  The question  
is, if the xhtml:div is stripped by the library before handing it  
off to the app, how is the app going to get the attributes that  
were on the div?  Is the library going to push those values down  
into the content or act as if they were on the atom:content  
element (or something similar to that)?


BTW, it just occurred to me that pushing them down into the  
content won't work.  Here's an example where that would fail:


<entry xml:lang="en">
  ...
  <content type="xhtml">
  <xhtml:div xml:lang="fr">Oui!</xhtml:div>
  </content>
</entry>

Notice that there are no elements inside the xhtml:div for  
xml:lang to be attached to (and even if there were any, any text  
appearing outside of them would not have the correct xml:lang  
attached to it).


So it looks like the options (both of which a single library
could support, of course) are:


* Strip the div, but provide a way to get the attributes that were  
on it

or
* Leave the div




Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread James M Snell

I've just made a change in our implementation that in the following case...

  <content type="xhtml" xml:lang="en" xml:base="foo/">
    <div xml:lang="fr" xml:base="bar">Oui!</div>
  </content>

Content.getLanguage() will return fr
Content.getBaseUri() will return foo/bar

The other possible attributes are still available via a Div interface,
but in the default case, things just sort of work themselves out.
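
A rough sketch of that resolution rule in Python, for anyone who wants to try it outside Abdera. It is not Abdera's code: the function name, the inherited_base parameter and the absolute base http://example.com/ are assumptions added for illustration. The div's xml:lang overrides the one on content, and the div's xml:base is resolved against the base in scope on content.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "http://www.w3.org/XML/1998/namespace"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def effective_lang_and_base(content, inherited_base=""):
    """Return the (lang, base) pair that applies to what the div contains.

    `content` is an atom:content element with type="xhtml".
    `inherited_base` is whatever base URI is in scope above it (assumed
    to be supplied by the caller).
    """
    lang = content.get(f"{{{XML_NS}}}lang")
    base = urljoin(inherited_base, content.get(f"{{{XML_NS}}}base", ""))

    div = content.find(f"{{{XHTML_NS}}}div")
    if div is not None:
        # Attributes on the wrapper div take effect for the content inside it.
        lang = div.get(f"{{{XML_NS}}}lang", lang)
        base = urljoin(base, div.get(f"{{{XML_NS}}}base", ""))
    return lang, base


doc = ET.fromstring(
    '<content xmlns="http://www.w3.org/2005/Atom" type="xhtml" '
    'xml:lang="en" xml:base="foo/">'
    '<div xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" xml:base="bar">'
    'Oui!</div></content>'
)

print(effective_lang_and_base(doc, "http://example.com/"))
# ('fr', 'http://example.com/foo/bar')
```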

- James

Antone Roundy wrote:
 
 On Jun 28, 2006, at 3:10 PM, Robert Sayre wrote:
 The content in the entries below should be handled the same way:

 <entry xml:lang="en" xml:base="http://example.com/foo/">
   ...
   <content type="xhtml">
   <xhtml:div xml:lang="fr" xml:base="http://example.com/feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
   </content>
 </entry>

 <entry xml:lang="en" xml:base="http://example.com/foo/">
   ...
   <content type="xhtml" xml:lang="fr" xml:base="http://example.com/feu/">
   <xhtml:div><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
   </content>
 </entry>
 
 Of course the end result of both should be identical.  Is that what you
 mean by should be handled the same way?  The question is, if the
 xhtml:div is stripped by the library before handing it off to the app,
 how is the app going to get the attributes that were on the div?  Is the
 library going to push those values down into the content or act as if
 they were on the atom:content element (or something similar to that)?
 
 BTW, it just occurred to me that pushing them down into the content
 won't work.  Here's an example where that would fail:
 
 <entry xml:lang="en">
   ...
   <content type="xhtml">
   <xhtml:div xml:lang="fr">Oui!</xhtml:div>
   </content>
 </entry>
 
 Notice that there are no elements inside the xhtml:div for xml:lang to
 be attached to (and even if there were any, any text appearing outside
 of them would not have the correct xml:lang attached to it).
 
 So it looks like the options (both of which a single library could
 support, of course) are:
 
 * Strip the div, but provide a way to get the attributes that were on it
 or
 * Leave the div
 
 



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread A. Pagaltzis

* Henri Sivonen [EMAIL PROTECTED] [2006-06-29 00:20]:
 On Jun 28, 2006, at 23:53, James M Snell wrote:
 For instance, if I have <content xml:lang="en"><div
 xml:lang="fr">...</div></content> and I drop the div silently,
 then I've got a problem.
 
 Dropping the div shouldn't mean dropping the language and base
 URL context. You need to communicate those anyway in the case
 they are  inherited from higher up in the document tree.

Exactly.
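
A sketch of what "communicate those anyway" could look like, assuming a plain ElementTree document and a hypothetical helper name. It walks from a node up through its ancestors and returns the nearest xml:lang, so a value set on feed or entry is reported just like one set on the div.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"


def lang_in_scope(root, target):
    """Return the nearest xml:lang on `target` or any of its ancestors."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parents.get(node)
    return None


feed = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">'
    '<entry><content type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">Yes!</div>'
    '</content></entry></feed>'
)
div = feed.find(f".//{XHTML_DIV}")
print(lang_in_scope(feed, div))  # en
```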

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread A. Pagaltzis

* Antone Roundy [EMAIL PROTECTED] [2006-06-28 21:30]:
 Consider this:
 
 <entry xml:lang="en" xml:base="http://example.com/foo/">
   ...
   <content type="xhtml">
   <xhtml:div xml:lang="fr" xml:base="http://example.com/feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
   </content>
 </entry>
 
 Whether there's a problem depends on whether one requests the
 xml:base, xml:lang, or whatever for the atom:content element
 itself or for the CONTENT OF the atom:content element, in which
 case the library could return the values it got from the
 xhtml:div. Except in unusual cases like this, the result would
 be identical.

I can see your argument, but I find this too fine a distinction.
The `div` is part of the container when `type=xhtml` as far as
I’m concerned. I’d just merge the information with that on the
`content` element and pretend there’s no difference. As far as
the feed’s *meaning* is concerned, there isn’t, after all.

 * give me the raw contents of the atom:content element
 * give me the contents of the atom:content element converted to
   well-formed XHTML (whether it started as text, escaped tag
   soup, or inline xhtml)
 
 In the former case, keeping the div feels like the right thing
 to do -- the consuming app would have to know to remove it. In
 the latter case, removing the div from xhtml content feels
 like the right thing to do.

Yes, that sounds sane. “Give me the raw contents” would be
something only an Atom-aware API client would want to do, so it
is reasonable to expect that the client knows what to do with the
`div` when it finds that the content type was `xhtml`. Anyone who
just wants to use the data and doesn’t want to have to care about
how Atom works should just ask for XHTML and not care what it was
originally packaged as.

 ...now that I think about it, if the library always returns the
 xml:base which applies to the content of the element, that
 could cause trouble too:
 
 <entry xml:lang="en" xml:base="http://example.com/">
   ...
   <content type="xhtml">
   <xhtml:div xml:lang="fr" xml:base="feu/"><xhtml:a href="axe.html">axe</xhtml:a></xhtml:div>
   </content>
 </entry>
 
 Here, if I get xml:base for the content of content, it will be
 "http://example.com/feu/". Then, if I get the raw content of
 the element, strip the div, and apply xml:base myself, I'll
 erroneously use "http://example.com/feu/feu/" as the base URI
 unless I know to ignore the xml:base attribute on the div.

I agree, but I don’t see how that’s at all to the point. Such an
API client is just buggy. If they ask for the raw `content`
content, then they should also ask for the `content` base URI,
not for the content’s base URI.

Guiding API clients to avoid such a mistake should be reasonably
easy by naming the methods appropriately, ie something like
`get_container_content` and `get_container_base` vs `get_content`
and `get_base`. (That the first pair of names is so long is fully
intentional… :-) )
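
To make the pairing concrete, here is a minimal sketch of such an API in Python. Only the four method names come from the paragraph above; the class, its fields and the sample values are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass
class XhtmlContent:
    """Hypothetical holder for one atom:content element with type="xhtml"."""
    container_base: str  # base URI in scope on atom:content itself
    div_base: str        # xml:base attribute on the wrapper div, "" if absent
    div_markup: str      # raw child of atom:content, wrapper div included
    inner_markup: str    # what the div contains, wrapper removed

    # Raw pair: for Atom-aware clients that handle the div themselves.
    def get_container_content(self) -> str:
        return self.div_markup

    def get_container_base(self) -> str:
        return self.container_base

    # Cooked pair: the div is gone and its xml:base is already applied.
    def get_content(self) -> str:
        return self.inner_markup

    def get_base(self) -> str:
        return urljoin(self.container_base, self.div_base)


c = XhtmlContent(
    container_base="http://example.com/",
    div_base="feu/",
    div_markup='<div xml:base="feu/"><a href="axe.html">axe</a></div>',
    inner_markup='<a href="axe.html">axe</a>',
)

print(c.get_base())                     # http://example.com/feu/
# The buggy mix: cooked base plus the div's xml:base applied a second time.
print(urljoin(c.get_base(), "feu/"))    # http://example.com/feu/feu/
```

Keeping each pair together avoids the double application: raw content goes with get_container_base, stripped content goes with get_base.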

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-28 Thread Robert Sayre


On 6/28/06, James M Snell [EMAIL PROTECTED] wrote:


Actually, switch this.  I realized after I sent this that I had it
backwards.  The default behavior will be to not return the div. A
separate API will provide the content with the div.



Next time, don't start out with egregious obfuscation, and then kick
and scream through tons of list traffic with beyond-bogus arguments.
Here's how it started:

http://mail-archives.apache.org/mod_mbox/incubator-abdera-dev/200606.mbox/[EMAIL 
PROTECTED]

It's a waste of other people's time. Once or twice is understandable,
reasonable people sometimes disagree on basic things. I think we're up
to 20 or 30 of these incidents with you, though.

At this point, it's not something to be replied to with a smart
remark. It belies contempt for your colleagues. We shouldn't have to
sit here and listen to specious tripe because it sounds semi-plausible
to a non-implementor.

It's abusive, and it's much worse than the nasty messages so many of
us have sent.

--

Robert Sayre



http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-27 Thread Robert Sayre


When XHTML content is used,

The XHTML div element itself MUST NOT be considered part of the content.

http://atompub.org/rfc4287.html#rfc.section.4.1.3.3



This is hard to test with aggregators, but conforming libraries
definitely need to get this right.

http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests
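
As an illustration of the behaviour being tested (a sketch only, not the tests on the wiki page), a library's content accessor might drop the wrapper like this. The function name is made up, and the serialization is whatever ElementTree happens to emit.

```python
import xml.etree.ElementTree as ET

XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"


def xhtml_content_without_div(content):
    """Serialize what the wrapper div contains, leaving out the div itself."""
    div = content.find(XHTML_DIV)
    if div is None:
        raise ValueError('type="xhtml" content needs an xhtml:div child')
    # Leading text, then each child element (tostring includes tail text).
    parts = [div.text or ""]
    for child in div:
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


plain = ET.fromstring(
    '<content xmlns="http://www.w3.org/2005/Atom" type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr">Oui!</div>'
    '</content>'
)
linked = ET.fromstring(
    '<content xmlns="http://www.w3.org/2005/Atom" type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">feu '
    '<a href="axe.html">axe</a></div>'
    '</content>'
)

print(xhtml_content_without_div(plain))   # Oui!
print(xhtml_content_without_div(linked))
```

Note that the xml:lang on the div is lost in this output, so a real accessor would have to expose it some other way.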

--

Robert Sayre



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-27 Thread James M Snell

Please define conformance in regards to this test.  That is, what is
the exact behavior that a library must perform when a code library has
an API like, getContent on the content element.

e.g., is a parser not conformant if it passes the DIV on to the
consuming application with the expectation that the application is
responsible for doing the right thing with it?

Robert Sayre wrote:
 
 When XHTML content is used,
 
 The XHTML div element itself MUST NOT be considered part of the content.
 
 http://atompub.org/rfc4287.html#rfc.section.4.1.3.3
 
 
 
 This is hard to test with aggregators, but conforming libraries
 definitely need to get this right.
 
 http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests
 



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-27 Thread Robert Sayre


On 6/27/06, James M Snell [EMAIL PROTECTED] wrote:

Please define conformance in regards to this test.  That is, what is
the exact behavior that a library must perform when a code library has
an API like, getContent on the content element.

e.g., is a parser not conformant if it passes the DIV on to the
consuming application with the expectation that the application is
responsible for doing the right thing with it?


Don't be dense. Would the parser be conformant if it passed on the
feed, entry, and div elements with that expectation? I'll file a bug
on UFP and I bet you it'll get fixed without a question, because there
won't be a bad-faith interpretation to fight. That's two demerits this
week for you. Tsk tsk.

--

Robert Sayre

I would have written a shorter letter, but I did not have the time.



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-27 Thread James M Snell

I'm shooting for at least five demerits. Otherwise, the week will be
completely sunk.  And yes, the parser would be conformant.  Abdera is
conformant even tho it is possible to use Abdera to produce and read
invalid Atom.  Returning the div in the getContent method is incorrect
and I'm fixing that now; making the div available for the application
using Abdera should be ok.  I want to make sure this conformance test
isn't saying that the parser must hide the div completely.

- James

Robert Sayre wrote:
 On 6/27/06, James M Snell [EMAIL PROTECTED] wrote:
 Please define conformance in regards to this test.  That is, what is
 the exact behavior that a library must perform when a code library has
 an API like, getContent on the content element.

 e.g., is a parser not conformant if it passes the DIV on to the
 consuming application with the expectation that the application is
 responsible for doing the right thing with it?
 
 Don't be dense. Would the parser be conformant if it passed on the
 feed, entry, and div elements with that expectation? I'll file a bug
 on UFP and I bet you it'll get fixed without a question, because there
 won't be a bad-faith interpretation to fight. That's two demerits this
 week for you. Tsk tsk.
 



Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests

2006-06-27 Thread James Holderness


Robert Sayre wrote:

When XHTML content is used,

The XHTML div element itself MUST NOT be considered part of the content.

http://atompub.org/rfc4287.html#rfc.section.4.1.3.3


FWIW, most aggregators that I've tested do not strip the div element.

Regards
James