Re: [W3af-develop] A huge problems with Unicode support in w3af

Andres Riancho Thu, 16 Feb 2012 06:37:03 -0800

Daniel,

On Thu, Feb 16, 2012 at 10:38 AM, Daniel Zulla
<[email protected]> wrote:
>    All software has vulnerabilities, it's in their nature :)
>
>
> Right.
>
>    Not really. As soon as the byte string enters w3af, the best
> thing to do is to decode it using the best encoding available (the
> charset in the Content-Type header, or some other hint we might have
> in the HTTP response); after that, all the rest of w3af's code simply
> forgets about encodings and uses the unicode string.
>
>
> Cool.
>
>    Vulnerable to what?
>
>
> A forced crash. I can't see any validation of the incoming data. E.g.:
> is resp.code really an integer with 100 < code < 900?


That's because the validation is done in httplib; please see
"def _read_status(self):" in httplib.py. We use urllib2, which uses
httplib, so we don't have to worry about that. The worst thing that
can happen is that we get a BadStatusLine exception, and we're handling
those in our code in order to avoid crashes.
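A minimal sketch of that behaviour, written against Python 3's http.client (the successor of httplib) so it runs on a current interpreter. FakeSocket and safe_status are hypothetical names made up for this example; they are not w3af code, and the sketch only illustrates that a malformed status line surfaces as a BadStatusLine exception that the caller can catch:

```python
import io
from http.client import BadStatusLine, HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a real socket; replays canned bytes."""

    def __init__(self, raw_bytes):
        self._raw = raw_bytes

    def makefile(self, *args, **kwargs):
        # HTTPResponse only needs a readable binary file object.
        return io.BytesIO(self._raw)


def safe_status(raw_bytes):
    """Return the parsed status code, or None if the status line is rejected."""
    resp = HTTPResponse(FakeSocket(raw_bytes))
    try:
        resp.begin()  # parses the status line and headers
    except BadStatusLine:
        # A hostile or broken server ends up here instead of crashing us.
        return None
    return resp.status


if __name__ == "__main__":
    print(safe_status(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"))
    print(safe_status(b"HTTP/1.1 99999 Boom\r\n\r\n"))
    print(safe_status(b"not http at all\r\n\r\n"))
```

The point is that a non-numeric or out-of-range status code never reaches the calling code as resp.code; the library raises first.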

>    We're not assuming that, if the response is not HTTP then httplib,
> or urllib, or urllib2 (don't really know which one) will fail and
> raise an exception.
>
>
> That's my point. I would like to be sure about that. For example, if
> there is additional C++-based code in w3af one day, and there are ways
> to bypass filters or to cause exceptions, a Python exception could
> quickly turn into a really dangerous, exploitable flaw in code that
> references PyQt4 or Cython.

    Could be, but we ARE doing proper error handling in xUrllib and httplib.py

>    Could you explain this to me a little bit more? I tried to
> google for ChunkOfUnidentified or ChunkOfUnidentifiedData and found
> nothing.
>
>
> http://docs.python.org/release/3.0.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

Quoting you: "Everything is a ChunkOfUnidentified data until it gets
converted to a string. If it's a string, it's Unicode and everything is
fine. If not, everything breaks immediately."

"Everything is a ChunkOfUnidentified data until it gets converted to a
string. If it's a string, it's Unicode and everything is fine." That's
what we're doing now in w3af. We receive a string of bytes and convert
it to a unicode string based on the encoding indicated by the HTTP
response. In some cases the conversion fails (for various reasons that
would also apply in py3k), and that's why we have those bugs.

"If not, everything breaks immediately." We're trying to avoid that :)
The problem is that if we use errors=ignore/replace, we end up in a
situation where we don't know about the errors and can't fix them.
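One possible middle ground, as a rough sketch rather than what w3af actually does: decode strictly first, log every failure, and only fall back to errors="replace" as a last resort while telling the caller about it. The names charset_from_headers and decode_body, the logger name, and the UTF-8 fallback are all assumptions made for this example, and it uses Python 3 syntax so it can be run today:

```python
import logging
from email import message_from_string

log = logging.getLogger("decode_sketch")  # hypothetical logger name


def charset_from_headers(raw_headers):
    """Return the charset declared in Content-Type, or None if absent."""
    return message_from_string(raw_headers).get_content_charset()


def decode_body(raw_body, raw_headers, fallback="utf-8"):
    """Return (text, had_errors); strict decoding is tried before replacing."""
    tried = []
    for charset in (charset_from_headers(raw_headers), fallback):
        if not charset or charset in tried:
            continue
        tried.append(charset)
        try:
            return raw_body.decode(charset), False
        except (UnicodeDecodeError, LookupError) as exc:
            # Keep a record, so the failure is visible and can be fixed.
            log.warning("decoding as %r failed: %s", charset, exc)
    # Last resort: lossy decoding, flagged to the caller.
    return raw_body.decode(fallback, errors="replace"), True


if __name__ == "__main__":
    headers = "Content-Type: text/html; charset=ISO-8859-1\n\n"
    print(decode_body(b"caf\xe9", headers))
```

This way the scan keeps running on a badly encoded response, but the error still shows up in the log instead of disappearing silently.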

PS: Please check how to properly answer emails inline so that it is
then easier to answer back :)

> Regards,
> Daniel
> On 16.02.2012 at 14:26, Andres Riancho wrote:
>
> Daniel,
>
> On Thu, Feb 16, 2012 at 10:07 AM, Daniel Zulla
> <[email protected]> wrote:
>
> I have analyzed some closed source vulnerability scanners, and audited open
> source scanners like skipfish.
>
> Some of them are ironically vulnerable. Somebody may create an apache2
> module that recognizes attacks in order to force penetration testers'
> software to crash (or worse, e.g. to execute arbitrary code).
>
>
>    All software has vulnerabilities, it's in their nature :)
>
> errors=ignore or errors=replace may be a nice way to go, but - here are my
> two cents:
>
> Treating HTTP responses as an "UnidentifiedChunkOfPossiblyMaliciousData"
> for as long as possible is definitely the right way to go.
>
>
>    Not really. As soon as the byte string enters w3af, the best
> thing to do is to decode it using the best encoding available (the
> charset in the Content-Type header, or some other hint we might have
> in the HTTP response); after that, all the rest of w3af's code simply
> forgets about encodings and uses the unicode string.
>
> I haven't audited or reviewed httplib, but the "from_httplib_resp"
> method looks extremely vulnerable:
>
>
>    Vulnerable to what?
>
>    resp = httplibresp
>    code, msg, hdrs, body = (resp.code, resp.msg, resp.info(), resp.read())
>
>    if original_url:
>        url_inst = url_object(resp.geturl(), original_url.encoding)
>    else:
>        url_inst = original_url = url_object(resp.geturl())
>
>    charset = getattr(httplibresp, 'encoding', None)
>    return httpResponse(code, body, hdrs, url_inst,
>                        original_url, msg, charset=charset)
>
>
> I am just skeptical about assuming that the response of a webserver is valid
> HTTP.
>
>
>    We're not assuming that, if the response is not HTTP then httplib,
> or urllib, or urllib2 (don't really know which one) will fail and
> raise an exception.
>
> That's why I mentioned py3k. It's exactly how Python 3 handles external
> data:
>
> Everything is a ChunkOfUnidentified data until it gets converted to a
> string. If it's a string, it's Unicode and everything is fine. If not,
> everything breaks immediately.
>
>
>    Could you explain this to me a little bit more? I tried to
> google for ChunkOfUnidentified or ChunkOfUnidentifiedData and found
> nothing.
>
>
> Regards,
>
> Daniel
>
>
> On 16.02.2012 at 13:33, Andres Riancho wrote:
>
>
> sends a string of bytes back to you in the HTTP response.
>
>
> Do you have some code or an example where those exceptions usually appear
> in the current w3af code?
>
>
> Regards,
>
> Daniel
>
>
> On 15.02.2012 at 22:06, Javier Andalia wrote:
>
>
> Hello Daniel,
>
>
> On Wed, Feb 15, 2012 at 5:11 PM, Daniel Zulla
> <[email protected]> wrote:
>
> What about switching over to Python3?
>
> It solves the UnicodeDecodeError madness.
>
>
> Can you please be more specific? What exactly do you have in mind?
>
>
> Maybe I'm wrong, but the way I see it w3af would still
> receive/transmit encoded bytes so there's no way to skip the
> bytestring_to_unicode and unicode_to_bytestring conversions. Not even
> in py3k.
>
>
> Regards,
>
>
> Javier
>
> --
> Andrés Riancho
> Director of Web Security at Rapid7 LLC
> Founder at Bonsai Information Security
> Project Leader at w3af
>
>



-- 
Andrés Riancho
Director of Web Security at Rapid7 LLC
Founder at Bonsai Information Security
Project Leader at w3af
