Nevertheless, I just wanted to point out that not every library seems to validate/sanitize all of its input properly (core/data/url/handlers/redirect.py):

    # fix a possible malformed URL
    urlparts = urlparse.urlparse(newurl)
    if not urlparts.path:
        urlparts = list(urlparts)
        urlparts[2] = "/"
    newurl = urlparse.urlunparse(urlparts)

urlparse, for example, won't complain about a URL like "http://foobar.com:some_non_integer_input/foo...".

    >>> urlparse.urlparse("http://w3af.org:fooo:myhost.com/foo?bar=bla").netloc
    'w3af.org:fooo:myhost.com'
    >>> urlparse.urlparse("http://w3af.org:fooo:myhost.com/foo?bar=bla").hostname
    'w3af.org'
    >>>

It will raise an exception when you access the "port" attribute, but no type casting or validation is performed for the "netloc" attribute. Worse: no validation is performed when you unparse the urlparse object. So in the end, newurl could be just about anything.

I'm kind of afraid of bugs like that, but this topic isn't related to the UTF-8 stuff anymore...

Regards,
Daniel

On 16.02.2012 at 15:35, Andres Riancho wrote:

> Daniel,
>
> On Thu, Feb 16, 2012 at 10:38 AM, Daniel Zulla
> <daniel.zu...@googlemail.com> wrote:
>> All software has vulnerabilities, it's in their nature :)
>>
>>
>> Right.
>>
>> Not really. As soon as the byte string enters w3af, the best
>> thing to do is to decode it using the best encoding available (the one
>> in Content-Encoding header, or some other we might have in the HTTP
>> response) and after that all the rest of w3af's code simply forgets
>> about encodings and uses the unicode string.
>>
>>
>> Cool.
>>
>> Vulnerable to what?
>>
>>
>> A forced crash. I can't see any validation of the incoming data. E.g.:
>> Is resp.code really an integer > 100 and < 900?
>
> That's because the validation is done in httplib, please see "def
> _read_status(self):" in httplib.py. We use urllib2, which uses
> httplib, so we don't have to worry about that. The worst thing that
> can happen is that we get a BadStatusLine exception and we're handling
> those in our code in order to avoid crashes.
>
>> We're not assuming that, if the response is not HTTP then httplib,
>> or urllib, or urllib2 (don't really know which one) will fail and
>> raise an exception.
>>
>>
>> That's my point. I would like to be sure about that. Because, for example,
>> if there is additional C++ based code in w3af one day, and there are
>> chances to bypass filters or to cause exceptions, a Python exception could
>> turn into a really dangerous exploitable flaw in PyQt4 or Cython referenced
>> code really quickly.
>
> Could be, but we ARE doing proper error handling in xUrllib and httplib.py
>
>> Could you explain to me a little bit more about this? I tried to
>> google for ChunkOfUnidentified or ChunkOfUnidentifiedData and found
>> nothing.
>>
>>
>> http://docs.python.org/release/3.0.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
>
> Quoting you: "Everything is a ChunkOfUnidentified data until it gets
> converted to a string. If it's a string, it's Unicode and everything is
> fine. If not, everything breaks immediately."
>
> "Everything is a ChunkOfUnidentified data until it gets converted to a
> string. If it's a string, it's Unicode and everything is fine." That's
> what we're doing now at w3af. We receive a string of bytes and convert
> it to a unicode string based on the encoding that was indicated by the
> HTTP response. In some cases we're having errors in the conversion
> (because of various reasons that would also happen in py3k), that's
> why we have those bugs.
>
> "If not, everything breaks immediately." We're trying to avoid that :)
> The problem is that if we use errors=ignore/replace we end up in a
> situation where we don't know about the errors and can't fix them.
>
> PS: Please check how to properly answer emails inline so that it is
> then easier to answer back :)
>
>> Regards,
>> Daniel
>>
>> On 16.02.2012 at 14:26, Andres Riancho wrote:
>>
>> Daniel,
>>
>> On Thu, Feb 16, 2012 at 10:07 AM, Daniel Zulla
>> <daniel.zu...@googlemail.com> wrote:
>>
>> I have analyzed some closed source vulnerability scanners, and audited open
>> source scanners like skipfish.
>>
>> Some of them are ironically vulnerable. Somebody may create an apache2
>> module that recognizes attacks in order to force penetration testers'
>> software to crash (or worse, e.g. to execute arbitrary code).
>>
>>
>> All software has vulnerabilities, it's in their nature :)
>>
>> errors=ignore or errors=replace may be a nice way to go, but - here are my
>> two cents:
>>
>> Treating HTTP Responses as an "UnidentifiedChunkOfPossiblyMaliciousData" as
>> long as possible is definitely the right way to go.
>>
>>
>> Not really. As soon as the byte string enters w3af, the best
>> thing to do is to decode it using the best encoding available (the one
>> in Content-Encoding header, or some other we might have in the HTTP
>> response) and after that all the rest of w3af's code simply forgets
>> about encodings and uses the unicode string.
>>
>> I haven't audited or reviewed the httplib, but the "from_httplib_resp"
>> method looks extremely vulnerable:
>>
>>
>> Vulnerable to what?
>>
>> resp = httplibresp
>> code, msg, hdrs, body = (resp.code, resp.msg, resp.info(), resp.read())
>>
>> if original_url:
>>     url_inst = url_object(resp.geturl(), original_url.encoding)
>> else:
>>     url_inst = original_url = url_object(resp.geturl())
>>
>> charset = getattr(httplibresp, 'encoding', None)
>> return httpResponse(code, body, hdrs, url_inst,
>>                     original_url, msg, charset=charset)
>>
>>
>> I am just skeptical about assuming that the response of a webserver is valid
>> HTTP.
>>
>>
>> We're not assuming that, if the response is not HTTP then httplib,
>> or urllib, or urllib2 (don't really know which one) will fail and
>> raise an exception.
>>
>> That's why I mentioned py3k - it's exactly how Python3 handles external
>> data:
>>
>> Everything is a ChunkOfUnidentified data until it gets converted to a
>> string. If it's a string, it's Unicode and everything is fine. If not,
>> everything breaks immediately.
>>
>>
>> Could you explain to me a little bit more about this? I tried to
>> google for ChunkOfUnidentified or ChunkOfUnidentifiedData and found
>> nothing.
>>
>>
>> Regards,
>>
>> Daniel
>>
>>
>> On 16.02.2012 at 13:33, Andres Riancho wrote:
>>
>>
>> sends a string of bytes back to you in the HTTP response.
>>
>>
>> Do you have some code / an example where those exceptions usually appear in
>> the current w3af code?
>>
>>
>> Regards,
>>
>> Daniel
>>
>>
>> On 15.02.2012 at 22:06, Javier Andalia wrote:
>>
>>
>> Hello Daniel,
>>
>>
>> On Wed, Feb 15, 2012 at 5:11 PM, Daniel Zulla
>> <daniel.zu...@googlemail.com> wrote:
>>
>> What about switching over to Python3?
>>
>> It solves the UnicodeDecodeException madness.
>>
>>
>> Can you please be more specific? What exactly do you have in mind?
>>
>>
>> Maybe I'm wrong, but the way I see it w3af would still
>> receive/transmit encoded bytes so there's no way to skip the
>> bytestring_to_unicode and unicode_to_bytestring conversions. Not even
>> in py3k.
>>
>>
>> Regards,
>>
>> Javier
>>
>> --
>> Andrés Riancho
>> Director of Web Security at Rapid7 LLC
>> Founder at Bonsai Information Security
>> Project Leader at w3af
>
> --
> Andrés Riancho
> Director of Web Security at Rapid7 LLC
> Founder at Bonsai Information Security
> Project Leader at w3af

_______________________________________________
W3af-develop mailing list
W3af-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/w3af-develop
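
Returning to the urlparse point at the top of this message: one possible way to close that gap is to force the deferred checks before unparsing. The following is only a minimal sketch, not w3af code. It assumes Python 3's urllib.parse rather than the Python 2 urlparse module used in the thread, and the helper name normalize_redirect_url is hypothetical.

```python
from urllib.parse import urlparse, urlunparse


def normalize_redirect_url(newurl):
    """Hypothetical helper (not w3af code): validate the netloc,
    then default an empty path to "/" before unparsing."""
    parts = urlparse(newurl)
    try:
        # Accessing .port forces the integer conversion that
        # urlparse() itself defers until the attribute is read.
        parts.port
    except ValueError:
        raise ValueError("invalid port in URL: %r" % (newurl,))
    if not parts.hostname:
        raise ValueError("missing host in URL: %r" % (newurl,))
    if not parts.path:
        # Same fix-up as the redirect handler: empty path becomes "/".
        parts = parts._replace(path="/")
    return urlunparse(parts)
```

With this sketch, a URL such as "http://w3af.org:fooo:myhost.com/foo?bar=bla" is rejected with a ValueError instead of being passed through urlunparse unchanged. Whether rejecting or repairing such a URL is the right policy for a redirect handler is a separate design decision.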