On Oct 23, 2008, at 2:10 PM, Jochem Maas wrote:

>> The order is reversed, so if $host has a non-zero length, it is not
>> escaped.

> First thing that I noticed. Second, I was wondering why no charset was
> specified, and third, why it's not plain:
>
> $host = htmlentities($host);
>
> But nonetheless your point stands. :-)

Yeah, fair enough.

To my credit, I also noticed the problem without spending more than a second or two on that line, but I also recognized how it could be missed. To me, it's similar to missing it when someone calls a function with the arguments in the wrong order. You can tell what they meant, so the error doesn't stand out as boldly. Perhaps you subconsciously expect them to be right, because in most of the code, they are.

The challenge of being perfect is why I've developed a number of tools to help me out. I'm going to release one of the best of these as open source in a few months, and I might mention it on this list, since that seems appropriate. Hopefully no one will mind the "advertising" too much. :-)

> Now, about that charset: your blog post uses UTF-7 to demonstrate the
> potential for problems, but htmlentities() doesn't support that charset,
> at least not according to the docs. In fact, the list of supported
> charsets is quite limited. Out of curiosity, what would you recommend
> if one is faced with having to 'htmlentize' a string encoded in UTF-7
> or some other charset not supported by htmlentities()?

That's a good question. I would probably convert it to something like UTF-8, escape it, and then convert it back. I've never faced this situation, though. The scenario I was recreating in my post was an attack on Google that used UTF-7, an encoding Google didn't actually want to support.
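A minimal sketch of that convert, escape, convert-back approach, assuming the mbstring extension is available (it lists UTF-7 among its supported encodings). The helper name escape_via_utf8() is hypothetical, not a PHP built-in:

```php
<?php
// Hypothetical helper: escape a string whose charset htmlentities() does not
// support by round-tripping it through UTF-8. Requires the mbstring extension.
function escape_via_utf8(string $input, string $charset): string
{
    // 1. Decode from the source charset (e.g. UTF-7) into UTF-8.
    $utf8 = mb_convert_encoding($input, 'UTF-8', $charset);

    // 2. Escape with an explicit charset, so htmlentities() never has to guess.
    $escaped = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

    // 3. Re-encode the escaped text back into the source charset.
    return mb_convert_encoding($escaped, $charset, 'UTF-8');
}

// UTF-7 for "<script>alert(1)</script>": no literal '<' or '>' bytes appear.
$payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';
$safe = escape_via_utf8($payload, 'UTF-7');

// Decode the result to inspect it: the tags are now inert entities.
echo mb_convert_encoding($safe, 'UTF-8', 'UTF-7'), "\n";
// &lt;script&gt;alert(1)&lt;/script&gt;
```

If the page can simply be served as UTF-8, step 3 can be dropped and the escaped UTF-8 output used directly.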

If you specify ISO-8859-1 in your Content-Type header, it's actually fine to omit the character encoding in htmlentities(), because it uses that by default. (Also, not all mismatches are exploitable.) However, it always catches my eye, because it demonstrates a lax treatment of character encoding in general. I like to see it explicitly declared everywhere.
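One way to follow that advice is to declare the charset once and pass the same value to both the Content-Type header and every htmlentities() call. The constant PAGE_CHARSET and the helper escape_html() are hypothetical names for illustration. Note that the ISO-8859-1 default mentioned above was PHP's behavior at the time; PHP 5.4 changed the default to UTF-8, which is one more reason not to rely on it:

```php
<?php
// Hypothetical names: declare the page charset once and reuse it everywhere.
const PAGE_CHARSET = 'ISO-8859-1';

// Always pass the charset explicitly instead of relying on PHP's default.
function escape_html(string $text): string
{
    return htmlentities($text, ENT_QUOTES, PAGE_CHARSET);
}

// Tell the browser which charset the page uses.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=' . PAGE_CHARSET);
}

$host = "<b title=\"x\">caf\xE9</b>"; // \xE9 is "é" in ISO-8859-1
echo escape_html($host), "\n";
// &lt;b title=&quot;x&quot;&gt;caf&eacute;&lt;/b&gt;
```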

> A second question: strip_tags() doesn't have a charset parameter, so how
> does it manage to cope without knowing the input string's encoding? Or
> does it not, and is it actually vulnerable to maliciously encoded input?

My guess would be that it doesn't cope. :-) I never use strip_tags(), so someone else might be able to offer a much better answer.
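A quick way to probe that guess, sketched under the assumption that mbstring is available: strip_tags() looks for literal '<' and '>' bytes, so markup that only appears after decoding, such as the UTF-7 payload from the blog post scenario, passes through unchanged:

```php
<?php
// UTF-7 for "<script>alert(1)</script>": there is no literal '<' byte to find.
$utf7 = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

// strip_tags() has nothing to strip, so the string comes back unchanged.
$stripped = strip_tags($utf7);
var_dump($stripped === $utf7); // bool(true)

// A browser that decodes the page as UTF-7 would still see the full tags.
echo mb_convert_encoding($stripped, 'UTF-8', 'UTF-7'), "\n";
// <script>alert(1)</script>

// Stripping after decoding does remove them, leaving only the text content.
echo strip_tags(mb_convert_encoding($utf7, 'UTF-8', 'UTF-7')), "\n";
// alert(1)
```

This only exercises one encoding; it doesn't settle how strip_tags() behaves with every multibyte charset.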

Hope that helps, and thanks for the discussion.

Chris

--
Chris Shiflett
http://shiflett.org/

--
PHP General Mailing List (http://www.php.net/)