Re: atom:name ... text or html?
Quoting Eric Scheid [EMAIL PROTECTED]:

> If I have an author with the name Bertrand Café, is it acceptable to put that into atom:author like this:
>
> <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>
>
> or should I be using the unicode numeric entity instead?

Even if it was HTML you couldn't really use the entity, could you? I think you have to use a character reference or the actual character instead, yes.

--
Anne van Kesteren http://annevankesteren.nl/
Re: atom:name ... text or html?
+1 to what Anne says. If I received that Atom author name, I would display it exactly as presented: Bertrand Caf&eacute;

- James

Anne van Kesteren wrote:
> Quoting Eric Scheid [EMAIL PROTECTED]:
>> If I have an author with the name Bertrand Café, is it acceptable to put that into atom:author like this:
>>
>> <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>
>>
>> or should I be using the unicode numeric entity instead?
>
> Even if it was HTML you couldn't really use the entity, could you? I think you have to use a character reference or the actual character instead, yes.
Re: atom:name ... text or html?
Hahaha! It's RSS all over again. In the words of Mark Pilgrim: "Here's something that might be HTML. Or maybe not. I can't tell you, and you can't guess." :-)

Seriously though, the atom:name element is described as a human-readable name, so unless your name really is "Bertrand Caf&eacute;" that can't be right. If RFC 4287 had intended to allow markup in the element it would have used atomTextConstruct.

Regards
James

Eric Scheid wrote:
> If I have an author with the name Bertrand Café, is it acceptable to put that into atom:author like this:
>
> <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>
Re: atom:name ... text or html?
* Eric Scheid [EMAIL PROTECTED] [2006-03-23 17:30]:
> If I have an author with the name Bertrand Café, is it acceptable to put that into atom:author like this:
>
> <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>

No. That means the author’s name is “Bertrand Caf&eacute;” (he must have had very cruel parents), not Bertrand Café.

> or should I be using the unicode numeric entity instead?

Yes. Or use a literal é as you did in this mail, provided you emit the feed as UTF-8 (or ISO-8859-1, if you must).

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
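The point made in this reply can be checked with any conforming XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree. The two fragments are simplified stand-ins for a real feed (the Atom namespace is omitted for brevity) and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Inside CDATA nothing is decoded, so the ampersand stays a literal character.
cdata_doc = "<author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>"

# A numeric character reference outside CDATA is decoded by the XML parser.
ref_doc = "<author><name>Bertrand Caf&#233;</name></author>"

cdata_name = ET.fromstring(cdata_doc).findtext("name")
ref_name = ET.fromstring(ref_doc).findtext("name")

print(cdata_name)  # the name as a consumer would see it, entity text included
print(ref_name)    # the name with a real e-acute character
```

A consumer that treats atom:name as plain text would therefore show the first value with the entity text visible, which is the outcome described above.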
Re: atom:name ... text or html?
> Seriously though, the atom:name element is described as a human-readable name,

Do you mean that human-readable is equivalent to solely English? Because as a French, having accents in names is so natural that I see it as human readable too ;)

- Sylvain
Re: atom:name ... text or html?
Sylvain Hellegouarch wrote:
> Do you mean that human-readable is equivalent to solely English? Because as a French, having accents in names is so natural that I see it as human readable too ;)

No. I mean that the literal sequence of characters & e a c u t e ; is not human-readable (or at least isn't intended to be).

Regards
James
Re: atom:name ... text or html?
On Fri, Mar 24, 2006 at 03:16:18AM +1100, Eric Scheid [EMAIL PROTECTED] wrote a message of 10 lines which said:

> or should I be using the unicode numeric entity instead?

Or the character itself, in UTF-8 or any other encoding (but UTF-8 is the most widely implemented, so you limit the risks).

(That's what I do with http://www.bortzmeyer.org/feed.atom and it seems OK in every aggregator and it validates.)
Re: atom:name ... text or html?
Thursday, March 23, 2006, 4:57:11 PM, you wrote:

> On 24/3/06 3:21 AM, Anne van Kesteren [EMAIL PROTECTED] wrote:
>>> <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>
>> Even if it was HTML you couldn't really use the entity, could you? I think you have to use a character reference or the actual character instead, yes.
>
> It's true that XML has only a half dozen or so entities defined, meaning most interesting entities from html can't exist in XML ... unless maybe they are wrapped in a CDATA block like above?

atom:name is not intended to contain HTML, and the spec for it doesn't mention HTML. It is no more correct to put HTML in it than it is to put a base64'd PDF in there.

> I'm getting the data by scraping an html page, so I'm expecting it to be acceptable html code, including html entities.

Your HTML parser should decode the entities for you and return a string. Your Atom generator should encode or escape the string using numeric character references.

If you really need to use HTML entities directly, then you could put:

<!DOCTYPE feed [ <!ENTITY eacute "&#233;"> ]>

at the top of your feed and get rid of that CDATA. XML processors are REQUIRED [1] to process internal DTD subsets.

[Hmm, internal DTD subsets completely fail in IE7's feed reader, throwing up a friendly error message]

[1] http://www.w3.org/TR/2004/REC-xml-20040204/#proc-types

--
Dave
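As an illustration of the internal DTD subset approach suggested in this message, here is a small sketch using Python's standard xml.etree.ElementTree, which is built on expat and expands entities declared in the internal subset. The feed below is a deliberately incomplete Atom document made up for this example, and as noted above not every feed reader accepts a DOCTYPE.

```python
import xml.etree.ElementTree as ET

# Hypothetical, incomplete feed: the internal subset declares eacute so the
# entity reference in atom:name becomes legal XML.
feed = (
    '<!DOCTYPE feed [ <!ENTITY eacute "&#233;"> ]>'
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<author><name>Bertrand Caf&eacute;</name></author>"
    "</feed>"
)

ns = {"atom": "http://www.w3.org/2005/Atom"}
name = ET.fromstring(feed).findtext("atom:author/atom:name", namespaces=ns)

print(name)  # the entity has been expanded by the XML parser itself
```

Here the expansion happens at the XML layer, so the consumer still receives plain text in atom:name and never has to know about HTML.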
Re: atom:name ... text or html?
On Thu, Mar 23, 2006 at 05:01:03PM +0100, Sylvain Hellegouarch [EMAIL PROTECTED] wrote a message of 11 lines which said:

> Because as a French, having accents in names is so natural that I see it as human readable too ;)

As I wrote and used and tested on my blog, there is no problem in Atom to have a first name with accent like mine. Atom is XML and therefore Unicode rules.
Re: atom:name ... text or html?
* Eric Scheid [EMAIL PROTECTED] [2006-03-23 18:05]:
> It's true that XML has only a half dozen or so entities defined, meaning most interesting entities from html can't exist in XML ... unless maybe they are wrapped in a CDATA block like above?

No, a CDATA block simply means that characters like <, > and & stand for themselves.

> I'm getting the data by scraping an html page, so I'm expecting it to be acceptable html code, including html entities.

Then decode the entities to a Unicode string and emit the feed as Unicode. Simplest thing that will work reliably.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
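The "decode, then emit as Unicode" advice could look roughly like the sketch below, written in Python with only the standard library. The helper name author_element and the sample input are assumptions made for illustration; a real scraper would pass in whatever text it extracted from the HTML page.

```python
import html
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def author_element(scraped_name: str) -> ET.Element:
    """Build an atom:author element from a name scraped out of HTML."""
    # Decode HTML entities (named or numeric) into a plain Unicode string.
    name_text = html.unescape(scraped_name)
    author = ET.Element(f"{{{ATOM_NS}}}author")
    # ElementTree escapes <, > and & on output, so no manual escaping is needed.
    ET.SubElement(author, f"{{{ATOM_NS}}}name").text = name_text
    return author


ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
author = author_element("Bertrand Caf&eacute;")
xml_bytes = ET.tostring(author, encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```

Decoding once at the HTML boundary and letting the XML serializer handle escaping keeps the two layers separate, which is the division of labour recommended in this thread.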
Re: atom:name ... text or html?
* Sylvain Hellegouarch [EMAIL PROTECTED] [2006-03-23 18:15]:
> Do you mean that human-readable is equivalent to solely English? Because as a French, having accents in names is so natural that I see it as human readable too ;)

Even as a French, you probably write é, not &eacute;. :-)

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: atom:name ... text or html?
On Mar 23, 2006, at 9:48 AM, James Holderness wrote:
> Hahaha! It's RSS all over again. In the words of Mark Pilgrim: "Here's something that might be HTML. Or maybe not. I can't tell you, and you can't guess." :-)
>
> Seriously though, the atom:name element is described as a human-readable name, so unless your name really is "Bertrand Caf&eacute;" that can't be right. If RFC 4287 had intended to allow markup in the element it would have used atomTextConstruct.

I agree with James here: if we had intended for the name to be able to include markup, we would have used the construct we created to allow that. This from RFC 4287 (section 3.2):

   element atom:name { text }

would have been this:

   element atom:name { atomTextConstruct }

if we had intended for it to be able to contain anything but literal text after XML un-escaping, right?

On Mar 23, 2006, at 9:57 AM, Eric Scheid wrote:
> It's true that XML has only a half dozen or so entities defined, meaning most interesting entities from html can't exist in XML ... unless maybe they are wrapped in a CDATA block like above?

If they're wrapped in a CDATA block, then they don't trigger an XML parsing error, but wrapping something in CDATA isn't a license to enter data in a format other than what the RFC allows.

> I'm getting the data by scraping an html page, so I'm expecting it to be acceptable html code, including html entities.

You, the producer, are getting the data from an HTML page, so you should certainly be prepared to handle HTML entities in it. But you, the Atom publisher, are responsible for making sure that you've made any changes to the data that are necessary for it to be proper Atom before you publish it. The consumer of the Atom feed doesn't know where you got the data, and thus can't be expected to decide how to process it based on where you got it.
Re: atom:name ... text or html?
David Powell wrote:
> [Hmm, internal DTD subsets completely fail in IE7's feed reader, throwing up a friendly error message]

If I remember correctly they considered that a feature. Something to do with DTDs being a security risk. I'm not sure if this also meant they were incapable of processing Netscape RSS 0.91 feeds. All I know is that if I ever have a blog, I'll be sure to include a DTD at the top of my feed.

Regards
James
Re: atom:name ... text or html?
On Mar 23, 2006, at 8:01 AM, Sylvain Hellegouarch wrote:
>> Seriously though, the atom:name element is described as a human-readable name,
>
> Do you mean that human-readable is equivalent to solely English? Because as a French, having accents in names is so natural that I see it as human readable too ;)

You can have accents, you just can't use HTML entities to get them.

-Tim
Re: atom:name ... text or html?
On Mar 23, 2006, at 8:57 AM, Eric Scheid wrote:
> On 24/3/06 3:21 AM, Anne van Kesteren [EMAIL PROTECTED] wrote:
>>> <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>
>> Even if it was HTML you couldn't really use the entity, could you? I think you have to use a character reference or the actual character instead, yes.
>
> It's true that XML has only a half dozen or so entities defined

To be precise, 5: &lt; &amp; &gt; &apos; &quot;

-Tim
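A quick way to see the difference between the five predefined entities and an HTML-only one is to feed both to an XML parser. This sketch uses Python's standard xml.etree.ElementTree. The element name and variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# The five entities every XML parser knows without any DTD.
predefined = ET.fromstring("<name>&lt; &amp; &gt; &apos; &quot;</name>").text
print(predefined)

# An HTML entity with no declaration in scope is a well-formedness error.
try:
    ET.fromstring("<name>Bertrand Caf&eacute;</name>")
    undefined_ok = True
except ET.ParseError as err:
    undefined_ok = False
    print("parse error:", err)
```
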
Re: atom:name ... text or html?
On Mar 23, 2006, at 8:16 AM, Eric Scheid wrote:
> If I have an author with the name Bertrand Café, is it acceptable to put that into atom:author like this:
>
> <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>
>
> or should I be using the unicode numeric entity instead?

The key point is that the atom:name element, described in RFC 4287 3.2.1, is not a Text Construct, as defined in 3.1, so you can't say <atom:name type="html">; so no markup allowed. So just say Bertrand Café.

-Tim
Re: atom:name ... text or html?
On 24/3/06 4:42 AM, A. Pagaltzis [EMAIL PROTECTED] wrote:
>> I'm getting the data by scraping an html page, so I'm expecting it to be acceptable html code, including html entities.
>
> Then decode the entities to a Unicode string and emit the feed as Unicode. Simplest thing that will work reliably.

I figured as much. Oh well, now to track down a list of html entities and their corresponding unicodes ...

e.
Re: atom:name ... text or html?
On Mar 23, 2006, at 2:20 PM, Eric Scheid wrote:
> Oh well, now to track down a list of html entities and their corresponding unicodes ...

http://www.google.com/search?q=xhtml%20entities
Re: atom:name ... text or html?
* Eric Scheid [EMAIL PROTECTED] [2006-03-23 23:30]:
> Oh well, now to track down a list of html entities and their corresponding unicodes ...

That would be in the spec.

http://www.w3.org/TR/REC-html40/sgml/entities.html

But you shouldn’t have to. Any self-respecting language has a library for that somewhere.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
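As one example of such a library, Python ships the HTML 4 entity table in its standard html.entities module. The sketch below turns an entity name into a numeric character reference. The helper name to_numeric_reference is made up for this illustration.

```python
from html.entities import name2codepoint


def to_numeric_reference(entity_name: str) -> str:
    """Map an HTML entity name such as 'eacute' to an XML numeric reference."""
    # name2codepoint maps HTML 4 entity names to Unicode code points.
    return f"&#{name2codepoint[entity_name]};"


print(to_numeric_reference("eacute"))
print(chr(name2codepoint["eacute"]))  # or use the character itself
```
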