Re: clarification: "escaped"

2006-07-26 Thread James Holderness


Antone Roundy wrote:

Converting & to &amp; and < to &lt; is sufficient


People keep missing this so I'm going to point it out one more time: there 
are certain rare circumstances when a right angle bracket (>) MUST be 
escaped so if you're just doing ampersands and left angle brackets that 
WON'T always be sufficient. To be safe it's best to always encode all three.
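
For illustration, here is a minimal Python sketch of escaping all three characters. The rare circumstance is presumably the sequence `]]>`, which XML does not allow to appear literally in character data. The helper name escape_html_for_atom and the sample payload are made up for this example; xml.sax.saxutils.escape is the standard-library function that replaces &, < and >.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def escape_html_for_atom(html):
    # escape() replaces "&" first, then "<" and ">", so the "&" inside
    # the generated references is not escaped a second time.
    return escape(html)

# Hypothetical HTML payload that happens to contain "]]>".
payload = "<p>a ]]> b &amp; c</p>"
escaped = escape_html_for_atom(payload)

# Wrap it in an element and let an XML parser unescape it again.
doc = '<content type="html">' + escaped + "</content>"
roundtrip = ET.fromstring(doc).text
```

After parsing, roundtrip should equal the original payload, and the escaped form no longer contains a literal `]]>`.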


As for CDATA sections, it's worth noting that you wouldn't have been able to 
syndicate this message thread if you always escaped everything with CDATA.


Regards
James



Re: clarification: "escaped"

2006-07-26 Thread A. Pagaltzis

* Antone Roundy <[EMAIL PROTECTED]> [2006-07-26 16:45]:
> Or you put the whole thing in a CDATA block.

Which is the easiest option, so long as you remember the edge
case of having to turn any `]]>` sequences in the input into
`]]>]]>
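
One common way to handle that edge case, sketched in Python, is to end the CDATA section after `]]` and open a new one before `>`. This is an assumption about the general technique, not necessarily the exact replacement string meant above; the function name wrap_in_cdata and the sample payload are made up for the example.

```python
import xml.etree.ElementTree as ET

def wrap_in_cdata(text):
    # Split every "]]>" so that one section ends right after "]]"
    # and the next section starts right before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Hypothetical payload containing the problematic sequence.
cdata_payload = "<b>x</b> ]]> &nbsp;"
cdata_doc = '<content type="html">' + wrap_in_cdata(cdata_payload) + "</content>"

# The parser joins the adjacent CDATA sections back into one string.
cdata_roundtrip = ET.fromstring(cdata_doc).text
```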

Re: clarification: "escaped"

2006-07-26 Thread Antone Roundy


On Jul 26, 2006, at 3:19 AM, Bill de hÓra wrote:

A. Pagaltzis wrote:

* Robert Sayre <[EMAIL PROTECTED]> [2006-07-26 01:45]:

On 7/25/06, Bill de hÓra <[EMAIL PROTECTED]> wrote:

And I didn't know whether Atom code could get away with
escaping < and &.

 hmm

that is an XML fatal error, no doubt, as the ampersand before
"nbsp" must be escaped.

But he did say “escaping < and &”, so it would be. I’m not sure
what Bill’s question even is.


What do I escape, so I know what to unescape?


The point is that after your XML parser has unescaped the content of  
the element, it should be suitable for handling as HTML.  Escape  
whatever you have to ensure that the consumer gets HTML from their  
XML parser.  Converting & to &amp; and < to &lt; is sufficient  
(assuming that you've started with HTML--if you've started with plain  
text, then you need to double escape, but in that case, you should be  
using type="text" anyway to save yourself the trouble).  You could  
also convert > to &gt;, " to &quot;, ' to &apos; and any other  
characters to numeric character references.  Or you put the whole  
thing in a CDATA block.
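
To make the difference between single and double escaping concrete, here is a small Python sketch. The sample string and variable names are made up, and using xml.sax.saxutils.escape for the HTML-level escaping step is a simplification that only covers &, < and >.

```python
from html import unescape
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

plain = "AT&T <3"  # hypothetical plain-text title

# type="text": escape once; the XML parser hands back the plain text.
as_text = '<title type="text">' + escape(plain) + "</title>"

# type="html": the plain text is first turned into HTML (first escape),
# then that HTML is escaped again for XML (second escape).
as_html = '<title type="html">' + escape(escape(plain)) + "</title>"

text_value = ET.fromstring(as_text).text  # plain text again
html_value = ET.fromstring(as_html).text  # HTML source, still escaped once
```

Here text_value is the original string, while html_value is HTML that a consumer still has to interpret (html.unescape stands in for that step) before getting the same string back.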




Re: clarification: "escaped"



A. Pagaltzis wrote:

* Robert Sayre <[EMAIL PROTECTED]> [2006-07-26 01:45]:

On 7/25/06, Bill de hÓra <[EMAIL PROTECTED]> wrote:

And I didn't know whether Atom code could get away with
escaping < and &.

 hmm

that is an XML fatal error, no doubt, as the ampersand before
"nbsp" must be escaped.


But he did say “escaping < and &”, so it would be. I’m not sure
what Bill’s question even is.


What do I escape, so I know what to unescape?

cheers
Bill



Re: clarification: "escaped"


* Robert Sayre <[EMAIL PROTECTED]> [2006-07-26 01:45]:
> On 7/25/06, Bill de hÓra <[EMAIL PROTECTED]> wrote:
> >And I didn't know whether Atom code could get away with
> >escaping < and &.
> 
>  hmm
> 
> that is an XML fatal error, no doubt, as the ampersand before
> "nbsp" must be escaped.

But he did say “escaping < and &”, so it would be. I’m not sure
what Bill’s question even is.

Regards,
-- 
Aristotle Pagaltzis // 



Re: Re: clarification: "escaped"



On 7/25/06, Bill de hÓra <[EMAIL PROTECTED]> wrote:


It came up on django irc. I'd assumed for whatever reason that escaping
was limited to the usual XML suspects, but when asked about html content
I knew I didn't know for sure, especially wrt HTML character entities.
And I didn't know whether Atom code could get away with escaping < and &.


I'm not certain I understand the issue, but if the question concerns
what happens when an Atom processor encounters a document with no
declared entities and contains a title like this:

 hmm

that is an XML fatal error, no doubt, as the ampersand before "nbsp"
must be escaped. Concretely, Mozilla will give you a DOM with a
non-breaking space if you write this:

 hmm
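
The same behaviour can be checked with any conforming XML parser; a rough Python sketch follows. The two title elements are assumed markup written for this illustration, not the text of the examples above.

```python
from html import unescape
import xml.etree.ElementTree as ET

# Assumed example: a bare HTML entity with no entity declarations in scope.
bad = '<title type="html">&nbsp;hmm</title>'
# Assumed example: the same title with the ampersand escaped.
good = '<title type="html">&amp;nbsp;hmm</title>'

try:
    ET.fromstring(bad)
    bad_is_fatal = False
except ET.ParseError:
    # The undeclared entity makes the document not well-formed.
    bad_is_fatal = True

html_source = ET.fromstring(good).text  # what the XML parser returns
rendered = unescape(html_source)        # stand-in for HTML processing
```

The escaped form parses cleanly and yields the HTML source "&nbsp;hmm", which an HTML processor then turns into a non-breaking space followed by "hmm".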

--

Robert Sayre

"I would have written a shorter letter, but I did not have the time."



Re: clarification: "escaped"



Robert Sayre wrote:

On 7/25/06, Bill de hÓra <[EMAIL PROTECTED]> wrote:


The RFC says that the content should be 'escaped' for type 'text/html'
in 3.1.1.2, but doesn't define what that is.


IIRC, WG discussion touched on this point, and the WG decided the
definition wasn't important, given the example. Is there a problem
you're hoping to clear up?


It came up on django irc. I'd assumed for whatever reason that escaping 
was limited to the usual XML suspects, but when asked about html content 
I knew I didn't know for sure, especially wrt HTML character entities. 
And I didn't know whether Atom code could get away with escaping < and &.


cheers
Bill



Re: clarification: "escaped"



On 7/25/06, Bill de hÓra <[EMAIL PROTECTED]> wrote:


The RFC says that the content should be 'escaped' for type 'text/html'
in 3.1.1.2, but doesn't define what that is.


IIRC, WG discussion touched on this point, and the WG decided the
definition wasn't important, given the example. Is there a problem
you're hoping to clear up?

--

Robert Sayre

"I would have written a shorter letter, but I did not have the time."



clarification: "escaped"



The RFC says that the content should be 'escaped' for type 'text/html' 
in 3.1.1.2, but doesn't define what that is. Is it as defined in XML:


http://www.w3.org/TR/REC-xml/#dt-escape

?

cheers
Bill