Re: atom:name ... text or html?

2006-03-23 Thread Anne van Kesteren


Quoting Eric Scheid [EMAIL PROTECTED]:

If I have an author with the name Bertrand Café, is it acceptable to put
that into atom:author like this;

   <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>

or should I be using the unicode numeric entity instead?


Even if it was HTML you couldn't really use the entity, could you? I think you
have to use a character reference or the actual character instead, yes.
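A minimal Python sketch of the difference (Python 3 standard library only; the helper name parsed_name is just for illustration) showing what an XML parser reports for each way of writing the name. The character reference and the literal character give the same string, while the CDATA version keeps the entity text verbatim.

```python
import xml.etree.ElementTree as ET

# Three ways of writing the author's name inside a <name> element.
with_char_ref = "<name>Bertrand Caf&#233;</name>"                # numeric character reference
with_literal = "<name>Bertrand Café</name>"                      # the character itself
with_cdata = "<name><![CDATA[Bertrand Caf&eacute;]]></name>"     # CDATA-wrapped HTML entity


def parsed_name(xml_text: str) -> str:
    """Return the character data the XML parser reports for <name>."""
    return ET.fromstring(xml_text).text


print(parsed_name(with_char_ref))  # Bertrand Café
print(parsed_name(with_literal))   # Bertrand Café
print(parsed_name(with_cdata))     # Bertrand Caf&eacute;
```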


--
Anne van Kesteren
http://annevankesteren.nl/




Re: atom:name ... text or html?

2006-03-23 Thread James M Snell

+1 to what Anne says.  If I received that Atom author name, I would
display it exactly as presented: Bertrand Caf&eacute;

- James

Anne van Kesteren wrote:
 
 Quoting Eric Scheid [EMAIL PROTECTED]:
 If I have an author with the name Bertrand Café, is it acceptable to
 put that into atom:author like this;

   <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>

 or should I be using the unicode numeric entity instead?
 
 Even if it was HTML you couldn't really use the entity, could you? I
 think you have to use a character reference or the actual character
 instead, yes.
 
 



Re: atom:name ... text or html?

2006-03-23 Thread James Holderness


Hahaha! It's RSS all over again. In the words of Mark Pilgrim: "Here's 
something that might be HTML. Or maybe not. I can't tell you, and you can't 
guess." :-)


Seriously though, the atom:name element is described as a human-readable 
name, so unless your name really is Bertrand Caf&eacute;, that can't be 
right. If RFC4287 had intended to allow markup in the element it would have 
used atomTextConstruct.


Regards
James

Eric Scheid wrote:

If I have an author with the name Bertrand Café, is it acceptable to put
that into atom:author like this;

   <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>




Re: atom:name ... text or html?

2006-03-23 Thread A. Pagaltzis

* Eric Scheid [EMAIL PROTECTED] [2006-03-23 17:30]:
If I have an author with the name Bertrand Café, is it
acceptable to put that into atom:author like this;

<author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>

No. That means the author’s name is Bertrand Caf&eacute; (he must
have had very cruel parents), not Bertrand Café.

or should I be using the unicode numeric entity instead?

Yes. Or use a literal é as you did in this mail, provided you
emit the feed as UTF-8 (or ISO-8859-1, if you must).
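A minimal Python sketch of that advice (Python 3.8+ standard-library ElementTree; build_author is a hypothetical helper): the same name is serialized once as UTF-8 and once as ISO-8859-1. The bytes differ, but the parsed name does not, because each document carries a matching encoding declaration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_author(name: str) -> ET.Element:
    """Build <author><name>...</name></author> in the Atom namespace."""
    author = ET.Element(f"{{{ATOM_NS}}}author")
    ET.SubElement(author, f"{{{ATOM_NS}}}name").text = name
    return author


author = build_author("Bertrand Café")

# Serialize the same tree twice; each output starts with a matching
# <?xml version='1.0' encoding='...'?> declaration.
as_utf8 = ET.tostring(author, encoding="utf-8", xml_declaration=True)
as_latin1 = ET.tostring(author, encoding="iso-8859-1", xml_declaration=True)

for data in (as_utf8, as_latin1):
    # The parser reads the declaration and recovers the identical name.
    print(ET.fromstring(data).find(f"{{{ATOM_NS}}}name").text)  # Bertrand Café
```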

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: atom:name ... text or html?

2006-03-23 Thread Sylvain Hellegouarch





Seriously though, the atom:name element is described as a 
human-readable name, 

Do you mean that human-readable is equivalent to solely English? 
Because as a French, having accents in names is so natural that I see it 
as human readable too ;)


- Sylvain




Re: atom:name ... text or html?

2006-03-23 Thread James Holderness


Sylvain Hellegouarch wrote:
Do you mean that human-readable is equivalent to solely English? Because 
as a French, having accents in names is so natural that I see it as human 
readable too ;)


No. I mean that the literal sequence of characters & e a c u t e ; is not 
human-readable (or at least isn't intended to be).


Regards
James



Re: atom:name ... text or html?

2006-03-23 Thread Stephane Bortzmeyer

On Fri, Mar 24, 2006 at 03:16:18AM +1100,
 Eric Scheid [EMAIL PROTECTED] wrote 
 a message of 10 lines which said:

 or should I be using the unicode numeric entity instead?

Or the character itself, in UTF-8 or any other encoding (but UTF-8 is
the most widely implemented, so you limit the risks).

(That's what I do with http://www.bortzmeyer.org/feed.atom and it
seems OK in every aggregator and it validates.)



Re: atom:name ... text or html?

2006-03-23 Thread David Powell


Thursday, March 23, 2006, 4:57:11 PM, you wrote:

 On 24/3/06 3:21 AM, Anne van Kesteren [EMAIL PROTECTED] wrote:

 <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>
 
 Even if it was HTML you couldn't really use the entity, could you? I think
 you have to use a character reference or the actual character instead, yes.
 

 It's true that XML has only a half dozen or so entities defined, meaning
 most interesting entities from html can't exist in XML ... unless maybe they
 are wrapped like in CDATA block like above?

atom:name is not intended to contain HTML; the spec for it doesn't
mention HTML. It is no more correct to put HTML in it than it is to
put base64'd PDF in there.

 I'm getting the data by scraping an html page, so I'm expecting it to be
 acceptable html code, including html entities.

Your HTML parser should decode the entities for you and return a
string. Your Atom generator should then encode or escape that string,
using numeric character references where needed.
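A minimal sketch of those two steps in Python, assuming the scraped value is a tag-free HTML text fragment (html_to_atom_name is a hypothetical name; html.unescape and xml.sax.saxutils.escape are standard-library functions). Writing non-ASCII characters as numeric character references keeps the output pure ASCII, so it is safe under any ASCII-compatible declared encoding; a real feed generator would normally do this escaping for you.

```python
import html
from xml.sax.saxutils import escape


def html_to_atom_name(scraped: str) -> str:
    """Turn a scraped HTML text fragment into character data for atom:name.

    1. Decode HTML entities (&eacute;, &#233;, &amp;, ...) to a plain string.
    2. Escape &, < and > for XML, then write each non-ASCII character as a
       numeric character reference so the result is pure ASCII.
    """
    text = html.unescape(scraped)
    escaped = escape(text)  # & -> &amp;, < -> &lt;, > -> &gt;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(html_to_atom_name("Bertrand Caf&eacute;"))     # Bertrand Caf&#233;
print(html_to_atom_name("Smith &amp; Caf&eacute;"))  # Smith &amp; Caf&#233;
```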

If you really need to use HTML entities directly, then you could put:

<!DOCTYPE feed [
<!ENTITY eacute "&#233;">
]>

at the top of your feed and get rid of that CDATA. XML processors are
REQUIRED [1] to process internal DTD subsets.

[Hmm, internal DTD subsets completely fail in IE7's feed reader,
throwing up a friendly error message]
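A minimal Python sketch of the internal-subset approach, assuming the standard-library ElementTree parser (built on expat, which expands entities declared in an internal subset). As the IE7 note shows, not every consumer accepts this, so the sketch only illustrates what a conforming XML processor does.

```python
import xml.etree.ElementTree as ET

# eacute is declared in an internal DTD subset, so no CDATA is needed.
feed = """<!DOCTYPE feed [
<!ENTITY eacute "&#233;">
]>
<feed xmlns="http://www.w3.org/2005/Atom">
  <author><name>Bertrand Caf&eacute;</name></author>
</feed>"""

ns = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(feed)
name = root.find("atom:author/atom:name", ns).text
print(name)  # Bertrand Café
```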

[1] http://www.w3.org/TR/2004/REC-xml-20040204/#proc-types

-- 
Dave



Re: atom:name ... text or html?

2006-03-23 Thread Stephane Bortzmeyer

On Thu, Mar 23, 2006 at 05:01:03PM +0100,
 Sylvain Hellegouarch [EMAIL PROTECTED] wrote 
 a message of 11 lines which said:

 Because as a French, having accents in names is so natural that I
 see it as human readable too ;)

As I wrote and used and tested on my blog, there is no problem in Atom
to have a first name with accent like mine. Atom is XML and therefore
Unicode rules.



Re: atom:name ... text or html?

2006-03-23 Thread A. Pagaltzis

* Eric Scheid [EMAIL PROTECTED] [2006-03-23 18:05]:
It's true that XML has only a half dozen or so entities defined,
meaning most interesting entities from html can't exist in XML
... unless maybe they are wrapped like in CDATA block like
above?

No, a CDATA block simply means that characters like <, & and >
stand for themselves.
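A short Python illustration of that point (standard-library ElementTree; the sample strings are made up): CDATA is only an alternative to escaping, so the parser hands over exactly the same string as the &amp;-escaped form, with no entity decoding.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, '<' and '&' are ordinary characters: nothing is decoded.
cdata = ET.fromstring("<name><![CDATA[Caf&eacute; <b>x</b>]]></name>").text
# Outside CDATA the same characters must be escaped; the parser decodes them.
escaped = ET.fromstring("<name>Caf&amp;eacute; &lt;b&gt;x&lt;/b&gt;</name>").text

print(cdata)             # Caf&eacute; <b>x</b>
print(cdata == escaped)  # True
```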

I'm getting the data by scraping an html page, so I'm expecting
it to be acceptable html code, including html entities.

Then decode the entities to a Unicode string and emit the feed as
Unicode. Simplest thing that will work reliably.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: atom:name ... text or html?

2006-03-23 Thread A. Pagaltzis

* Sylvain Hellegouarch [EMAIL PROTECTED] [2006-03-23 18:15]:
Do you mean that human-readable is equivalent to solely
English? Because as a French, having accents in names is so
natural that I see it as human readable too ;)

Even as a French, you probably write é, not &eacute;. :-)

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: atom:name ... text or html?

2006-03-23 Thread Antone Roundy


On Mar 23, 2006, at 9:48 AM, James Holderness wrote:
Hahaha! It's RSS all over again. In the words of Mark Pilgrim:
"Here's something that might be HTML. Or maybe not. I can't tell
you, and you can't guess." :-)


Seriously though, the atom:name element is described as a
human-readable name, so unless your name really is Bertrand
Caf&eacute;, that can't be right. If RFC4287 had intended to allow
markup in the element it would have used atomTextConstruct.


I agree with James here--if we had intended for the name to be able  
to include markup, we should have used the construct we created to  
allow that.  This from RFC 4287 (section 3.2):


   element atom:name { text }

would have been this:

   element atom:name { atomTextConstruct }

if we had intended for it to be able to contain anything but literal  
text after XML un-escaping, right?
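A minimal consumer-side Python sketch of why the distinction matters (standard-library html and ElementTree; text_construct_to_plain is a hypothetical helper that handles only type="text" and type="html" and does not strip tags). The same escaped string gets one extra round of HTML entity decoding in an atom:title with type="html", but none in atom:name.

```python
import html
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# The same escaped text in a Text Construct (atom:title) and in atom:name.
entry = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title type="html">Caf&amp;eacute; reviews</title>'
    "<author><name>Caf&amp;eacute; Owner</name></author>"
    "</entry>"
)


def text_construct_to_plain(elem: ET.Element) -> str:
    """Decode an Atom Text Construct of type "text" or "html" (sketch only).

    For type="html" the XML-decoded value is itself escaped HTML, so HTML
    entities are decoded once more. Tags are NOT stripped here.
    """
    value = elem.text or ""
    if elem.get("type", "text") == "html":
        value = html.unescape(value)
    return value


title = text_construct_to_plain(entry.find(f"{ATOM}title"))
# atom:name is declared as plain text, so its XML-decoded value is final.
name = entry.find(f"{ATOM}author/{ATOM}name").text

print(title)  # Café reviews
print(name)   # Caf&eacute; Owner
```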


On Mar 23, 2006, at 9:57 AM, Eric Scheid wrote:
It's true that XML has only a half dozen or so entities defined,
meaning most interesting entities from html can't exist in XML ...
unless maybe they are wrapped like in CDATA block like above?

If they're wrapped in a CDATA block, then they don't trigger an XML
parsing error, but wrapping something in CDATA isn't a license to
enter data in a format other than what the RFC allows.


I'm getting the data by scraping an html page, so I'm expecting it
to be acceptable html code, including html entities.

You, the producer, are getting the data from an HTML page, so you
should certainly be prepared to handle HTML entities in it. But you,
the Atom publisher, are responsible for making sure that you've made
any changes to the data that are necessary for it to be proper Atom
before you publish it. The consumer of the Atom feed doesn't know
where you got the data, and thus can't be expected to decide how to
process it based on where you got it.




Re: atom:name ... text or html?

2006-03-23 Thread James Holderness


David Powell wrote:

[Hmm, internal DTD subsets completely fail in IE7's feed reader,
throwing up a friendly error message]


If I remember correctly they considered that a feature. Something to do with 
DTDs being a security risk. I'm not sure if this also meant they were 
incapable of processing Netscape RSS 0.91 feeds. All I know is that if I 
ever have a blog, I'll be sure to include a DTD at the top of my feed.


Regards
James



Re: atom:name ... text or html?

2006-03-23 Thread Tim Bray



On Mar 23, 2006, at 8:01 AM, Sylvain Hellegouarch wrote:

Seriously though, the atom:name element is described as a
human-readable name,

Do you mean that human-readable is equivalent to solely English?
Because as a French, having accents in names is so natural that I
see it as human readable too ;)


You can have accents, you just can't use HTML entities to get them. -Tim



Re: atom:name ... text or html?

2006-03-23 Thread Tim Bray



On Mar 23, 2006, at 8:57 AM, Eric Scheid wrote:



On 24/3/06 3:21 AM, Anne van Kesteren [EMAIL PROTECTED] wrote:

<author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>

Even if it was HTML you couldn't really use the entity, could
you? I think you have to use a character reference or the actual
character instead, yes.




It's true that XML has only a half dozen or so entities defined


To be precise, 5: &lt; &amp; &gt; &apos; &quot; -Tim
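A quick Python check of this (standard-library ElementTree; is_well_formed is a hypothetical helper): the five predefined entities parse without any DTD, while &eacute; makes the document not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The five entities XML predefines, usable without any DTD.
predefined = "<name>&lt; &amp; &gt; &apos; &quot;</name>"
print(ET.fromstring(predefined).text)  # < & > ' "

# eacute is an HTML entity, not an XML one, so this is not well-formed.
print(is_well_formed("<name>Bertrand Caf&eacute;</name>"))  # False
```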



Re: atom:name ... text or html?

2006-03-23 Thread Tim Bray


On Mar 23, 2006, at 8:16 AM, Eric Scheid wrote:

If I have an author with the name Bertrand Café, is it acceptable
to put that into atom:author like this;

<author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>

or should I be using the unicode numeric entity instead?


The key point is that the atom:name element, described in RFC4287
3.2.1, is not a Text Construct, as defined in 3.1, so you can't say
<atom:name type="html">, and no markup is allowed.  So just say
Bertrand Café.  -Tim





Re: atom:name ... text or html?

2006-03-23 Thread Eric Scheid

On 24/3/06 4:42 AM, A. Pagaltzis [EMAIL PROTECTED] wrote:

 I'm getting the data by scraping an html page, so I'm expecting
 it to be acceptable html code, including html entities.
 
 Then decode the entities to a Unicode string and emit the feed as
 Unicode. Simplest thing that will work reliably.

I figured as much. Oh well, now to track down a list of html entities and
their corresponding unicodes ...

e.



Re: atom:name ... text or html?

2006-03-23 Thread Tim Bray



On Mar 23, 2006, at 2:20 PM, Eric Scheid wrote:


Oh well, now to track down a list of html entities and
their corresponding unicodes ...


http://www.google.com/search?q=xhtml%20entities



Re: atom:name ... text or html?

2006-03-23 Thread A. Pagaltzis

* Eric Scheid [EMAIL PROTECTED] [2006-03-23 23:30]:
Oh well, now to track down a list of html entities and their
corresponding unicodes ...

That would be in the spec.
http://www.w3.org/TR/REC-html40/sgml/entities.html

But you shouldn’t have to. Any self-respecting language has a
library for that somewhere.
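In Python, for example, the standard library ships that HTML 4 table as html.entities.name2codepoint (and html.unescape decodes whole strings). A minimal sketch; numeric_ref is a hypothetical helper name:

```python
from html.entities import name2codepoint


def numeric_ref(entity_name: str) -> str:
    """Map an HTML 4 entity name (without '&' and ';') to an XML numeric character reference."""
    return f"&#{name2codepoint[entity_name]};"


print(name2codepoint["eacute"])       # 233
print(chr(name2codepoint["eacute"]))  # é
print(numeric_ref("eacute"))          # &#233;
```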

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/