Re: [Quixote-users] urllib.quote() and cgi.escape()

Patrik Simons Thu, 26 Jan 2006 09:12:23 -0800

On Thu, 26 Jan 2006 13:16:02 +0100 mario ruggier <[EMAIL PROTECTED]> wrote:


> 
> On Jan 24, 2006, at 8:21 PM, Titus Brown wrote:
> 
> > So, should htmlescape deal with this differently?
> >
> > right now it does this:
> >
> > >>> print str(htmlescape("'"))
> > '
> > >>> print str(htmlescape('"'))
> > &quot;
> 
> If you were trying to use these characters in a URI value (for their 
> normal meaning in that context!) then my understanding is that you have 
> to use their HTML char entities: &amp; &lt; &gt; &quot;. This way, the  
> HTML document can be valid.
> 
> If however you are trying to use them as a string literal value in a 
> URI context, then you should use the %xx mechanism.

And here quixote.html.url_quote does it wrong, imho. If you set
quixote.DEFAULT_CHARSET to 'utf-8' and then url_quote a unicode string,
url_quote should first encode the string as utf-8 and then quote it.

It doesn't, and Quixote breaks with a UnicodeDecodeError on URLs like
this one: u'/component?test=\xc4'

Compare:
>>> url_quote(u'\xc4')
'%C4'
>>> url_quote(u'\xc4'.encode('utf-8'))
'%C3%84'

The problem happens in the functions quixote.http_request.parse_query
and _decode_string: '\xc4'.decode('utf-8') -> UnicodeDecodeError.
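
To illustrate the encode-then-quote order, here is a minimal sketch. It
uses Python 3's urllib.parse as a stand-in, not Quixote's own url_quote;
the helper names url_quote_charset and decode_component are hypothetical
and only mimic the behaviour described above.

```python
from urllib.parse import quote, unquote_to_bytes


def url_quote_charset(text, charset="utf-8"):
    """Hypothetical helper: encode text in the charset first, then %xx-quote the bytes."""
    return quote(text.encode(charset))


def decode_component(quoted, charset="utf-8"):
    """Hypothetical stand-in for the request side: unquote to bytes, then decode."""
    return unquote_to_bytes(quoted).decode(charset)


# Quoting the Latin-1 byte directly reproduces the problematic '%C4' form.
latin1_quoted = quote("\xc4", encoding="latin-1")

# Encoding as UTF-8 before quoting gives the two-byte form.
utf8_quoted = url_quote_charset("\xc4")

# The UTF-8 form round-trips; the Latin-1 form is not valid UTF-8.
round_tripped = decode_component(utf8_quoted)
try:
    decode_component(latin1_quoted)
    latin1_decodes = True
except UnicodeDecodeError:
    latin1_decodes = False
```

The point of the sketch is only that the quoting side and the decoding
side have to agree on one charset; whichever one DEFAULT_CHARSET names
should be applied before the %xx step.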

> 
> (I was however unable to easily find a clear and convenient statement 
> of the above in RFC 2396).
> 
> Now, in your original question, you were actually trying to use such 
> characters in the literal value attribute of an input element. As 
> this value can become part of the URL for the page (e.g. in the 
> querystring), it should follow that it should be escaped with 
> urllib.quote(), i.e. the %xx mechanism.
> 
> So, similar to your original example:
> '<input name="one" value="%s" />' 
> %("""contains'different"quotes&stuff""")
> 
> and assume some other input field:
> '<input name="two" value="normal" />'
> 
> if we submit the form (or specify the fields in the querystring for the 
> page) we should end up with a querystring such as:
> ?one=contains%27different%22quotes%26stuff&amp;two=normal
> 
> Note that the &amp; (as delimiter!) is HTML escaped as it should be, 
> but the & as literal value (%26) is URL escaped (as it should be?).
> 
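
The two layers in that querystring can be sketched as follows. This is
a hedged illustration using Python 3's urllib.parse.quote and
html.escape rather than urllib.quote() and cgi.escape(); the variable
names are made up for the example.

```python
from html import escape
from urllib.parse import quote

# Field names and literal values, as in the example form above.
fields = [
    ("one", "contains'different\"quotes&stuff"),
    ("two", "normal"),
]

# Layer 1: %xx-quote each name and value on its own (safe="" quotes '/' too),
# then join the pairs with a bare '&' delimiter.
pairs = ["%s=%s" % (quote(name, safe=""), quote(value, safe=""))
         for name, value in fields]
query = "?" + "&".join(pairs)

# Layer 2: HTML-escape the finished URL only when it is written into markup,
# which turns the delimiter '&' into '&amp;' and leaves %26 alone.
href = escape(query, quote=True)
link = '<a href="%s">next</a>' % href
```

Here query is what goes over the wire and href is what appears in the
HTML source; the literal & survives both layers as %26.
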
> But, re your actual question above, I was under the impression that the 
> "'" character should also be escaped with &apos; ... but, I see that 
> this char entity is not even listed in 
> <http://www.w3.org/TR/REC-html40/sgml/entities.html>. So, maybe not.
> 
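
On the single-quote question, a small sketch with Python 3's
html.escape (used here as a stand-in; it is neither Quixote's
htmlescape nor cgi.escape) shows one way a library sidesteps &apos;:
it emits a numeric character reference instead. Note also that for the
value attribute itself it is HTML escaping that keeps the markup
well-formed; the browser applies the %xx encoding when the form is
submitted.

```python
from html import escape

value = "contains'different\"quotes&stuff"

# quote=True escapes both quote characters in addition to &, < and >.
escaped_all = escape(value, quote=True)

# quote=False leaves both quote characters untouched.
escaped_min = escape(value, quote=False)

# The fully escaped form is safe inside a double-quoted attribute.
input_tag = '<input name="one" value="%s" />' % escaped_all
```
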
> mario
> 
> _______________________________________________
> Quixote-users mailing list
> [email protected]
> http://mail.mems-exchange.org/mailman/listinfo/quixote-users
> 


-- 
Patrik
