On Jan 24, 2006, at 8:21 PM, Titus Brown wrote:

So, should htmlescape deal with this differently?

right now it does this:

print str(htmlescape("'"))
'
print str(htmlescape('"'))
&quot;

If you were trying to use these characters in a URI value (for their normal meaning in that context!) then my understanding is that you have to use their HTML char entities: &amp; &lt; &gt; &quot;. This way, the HTML document can be valid.

If, however, you are trying to use them as a literal string value in a URI context, then you should use the %xx mechanism.
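
A minimal sketch of the two mechanisms side by side. It assumes Python 3's standard library (html.escape and urllib.parse.quote) rather than Quixote's htmlescape or the Python 2 urllib.quote(); the variable names are only illustrative.

```python
from html import escape
from urllib.parse import quote

raw = """contains'different"quotes&stuff"""

# %xx mechanism: the characters are carried as literal data in a URI
# component. safe="" means no reserved character is left unencoded.
percent_encoded = quote(raw, safe="")

# Entity mechanism: the characters are hidden from the HTML parser,
# e.g. when the text sits inside an attribute value.
entity_escaped = escape(raw, quote=True)

print(percent_encoded)
print(entity_escaped)
```

The two results are not interchangeable: the first is meant for the URI layer, the second for the HTML layer.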

(I was however unable to easily find a clear and convenient statement of the above in RFC 2396).

Now, in your original question, you were actually trying to use such characters in the literal value attribute of an input element. As this value can become a part of the URL for the page (e.g. in the querystring), it follows that it should be escaped with urllib.quote(), i.e. the %xx mechanism.

So, similar to your original example:
'<input name="one" value="%s" />' %("""contains'different"quotes&stuff""")

and assume some other input field:
'<input name="two" value="normal" />'

if we submit the form (or specify the fields in the querystring for the page) we should end up with a querystring such as:
?one=contains%27different%22quotes%26stuff&amp;two=normal

Note that the &amp; (as delimiter!) is HTML-escaped as it should be, but the & as literal value (%26) is URL-escaped (as it should be?).
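
The querystring above can be reproduced with a short sketch. This assumes Python 3, where urllib.quote() lives at urllib.parse.quote, and uses html.escape as a stand-in for the HTML-escaping step; the field names are the ones from the example, and the variable names are only illustrative.

```python
from html import escape
from urllib.parse import quote, urlencode

fields = [
    ("one", """contains'different"quotes&stuff"""),
    ("two", "normal"),
]

# Step 1: percent-encode each name and value, then join the pairs with "&".
query = "?" + urlencode(fields, quote_via=quote)

# Step 2: HTML-escape the finished querystring before writing it into a
# document (e.g. an href attribute). Only the delimiter "&" is affected,
# because the literal "&" has already become %26.
href_value = escape(query, quote=True)

print(query)
print(href_value)
```

The order matters in this sketch: percent-encoding happens per field, and HTML escaping happens once on the whole string.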

But, re your actual question above, I was under the impression that the "'" character should also be escaped with &apos; ... but, I see that this char entity is not even listed in <http://www.w3.org/TR/REC-html40/sgml/entities.html>. So, maybe not.
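
If the single quote were to be escaped as well, one option that does not depend on a named entity is a numeric character reference. The helper below is purely hypothetical (escape_attr is not Quixote's htmlescape and not a proposal for its implementation); it only sketches what such behaviour could look like.

```python
def escape_attr(text):
    """Hypothetical helper: escape text for use in an HTML attribute value."""
    return (
        text.replace("&", "&amp;")   # must come first, or later entities get re-escaped
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace('"', "&quot;")
            .replace("'", "&#39;")   # numeric reference instead of &apos;
    )

print(escape_attr("'"))
print(escape_attr('"'))
```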

mario
