Taras,

On Fri, Apr 13, 2012 at 5:53 AM, Taras <ox...@oxdef.info> wrote:
> Andres,
>
> my point is that one scan of a "classic" web app with *tuned scanner
> settings* must run no more than 3-4 hours.
Why? Scan duration, if you want the scan to analyze the whole site,
depends on many factors that we don't control:

* Network speed
* Number of URLs in the site
* Number of parameters in the site

If by "classic" web app you mean "small", then we agree :)

> We can do it with:
> * improving the discovery stage

Yes, agreed. Please identify one or more particular issues and I'll
try to fix them during the next week.

> * reducing the number of requests per detected flaw for audit plugins
> (especially for the xss plugin).

I worked a little bit on this and never got to a final version; I'll
try to restart my work there.

> * extensive use of the cache - w3af already has it

But maybe it is slow? Maybe we could implement it in a completely
different way? I'll think about that too. I have some ideas related to
caching in a shelve instead of multiple files, and also caching the
document parser so that we don't parse things more than once.

> Reducing MAX_VARIANTS to 5 is a good idea. I also agree that we need
> to make it user-controlled. My idea is to move the variant-limiting
> code from w3afCore and the webSpider plugin to the baseDiscovery
> class, or even a separate class, to consolidate this important part
> in a single place. All discovery plugins could then use it instead of
> a list. But this task, imho, is not for the nearest release of w3af.
> Currently we can improve the most important audit plugins like xss
> (we have already discussed it) and sqli.
>
> Currently my tests show that w3af spends most of its time, resources
> and traffic in the *audit stage* (the usual size of output-http.txt
> for me is about 1GB!). A great setting which I use here is the limit
> on discovery time. It is really useful!

The good thing is that of the 1GB you have in output-http.txt, only a
part was sent over the network. Please remember that output-http.txt
logs both cached and network responses.

> Also, I want to say that my patch to w3afCore was only a PoC.
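The "cache in a shelve instead of multiple files" idea could be sketched roughly like this. This is a minimal sketch, not w3af's real cache code: the `ResponseCache` class and its MD5 key scheme are hypothetical names invented here for illustration.

```python
# Minimal sketch: one shelve database instead of one file per cached
# HTTP response. "ResponseCache" and the key scheme are hypothetical,
# not w3af's real cache implementation.
import hashlib
import os
import shelve
import tempfile

class ResponseCache:
    def __init__(self, path):
        # A single on-disk DB replaces thousands of small cache files
        self._db = shelve.open(path)

    def _key(self, method, url):
        # shelve keys must be strings; hash the request line
        request_line = '%s %s' % (method, url)
        return hashlib.md5(request_line.encode('utf-8')).hexdigest()

    def get(self, method, url):
        # Returns the cached body, or None on a cache miss
        return self._db.get(self._key(method, url))

    def put(self, method, url, body):
        self._db[self._key(method, url)] = body

    def close(self):
        self._db.close()

# Tiny demo
cache = ResponseCache(os.path.join(tempfile.mkdtemp(), 'http_cache'))
cache.put('GET', 'http://moth/article.php?id=1', '<html>article 1</html>')
hit = cache.get('GET', 'http://moth/article.php?id=1')
miss = cache.get('GET', 'http://moth/article.php?id=2')
```

Whether this is faster than one-file-per-response would of course need to be measured, which is the open question in the thread.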
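The variant-limiting idea behind MAX_VARIANTS, discussed above and later in the quoted thread, can be sketched as follows. This is a hypothetical simplification for illustration only, not the real variant_db class from webSpider.py: URLs that differ only in parameter *values* count as variants of the same resource, and once max_variants of a given form have been seen, the rest are dropped.

```python
# Sketch of MAX_VARIANTS-style limiting. "VariantDB.need_more_of" is a
# hypothetical simplification, not w3af's real variant_db class.
from collections import Counter
from urllib.parse import parse_qsl, urlparse

class VariantDB:
    def __init__(self, max_variants=5):
        self._seen = Counter()
        self._max = max_variants

    def _form(self, url):
        # "article.php?id=25&foo=bar" and "article.php?id=99&foo=x"
        # share the same form: same path, same sorted parameter names
        parts = urlparse(url)
        names = tuple(sorted(n for n, _ in parse_qsl(parts.query)))
        return (parts.scheme, parts.netloc, parts.path, names)

    def need_more_of(self, url):
        # True while fewer than max_variants of this form were seen
        self._seen[self._form(url)] += 1
        return self._seen[self._form(url)] <= self._max

db = VariantDB(max_variants=5)
decisions = [db.need_more_of('http://moth/article.php?id=%d' % i)
             for i in range(10)]
# A URL with a different parameter set is a new form, so it passes
other = db.need_more_of('http://moth/article.php?id=1&foo=bar')
```

Making max_variants user-controlled, as proposed above, would then just mean exposing that constructor argument as a plugin option.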
On 04/11/2012 07:50 PM, Andres Riancho wrote:
>> Taras,
>>
>> On Wed, Apr 11, 2012 at 12:11 PM, Andres Riancho
>> <andres.rian...@gmail.com> wrote:
>>> On Wed, Apr 11, 2012 at 4:56 AM, Taras <ox...@oxdef.info> wrote:
>>>> Andres,
>>>>
>>>>>>> If the framework IS working like this, I think that the shared
>>>>>>> fuzzable request list wouldn't do much good. If it is not
>>>>>>> working like this (and I would love to get an output log to
>>>>>>> show it), it seems that we have a lot of work ahead of us.
>>>>>>
>>>>>> And w3afCore needs to filter requests from discovery plugins on
>>>>>> every loop in _discover_and_bruteforce(), am I right?
>>>>>
>>>>> It should filter things as they come out of the plugin and before
>>>>> adding them to the fuzzable request list,
>>>>
>>>> Agree, but as I see in w3afCore.py there is no filtering in it.
>>>> I have just added it [0]. It shows good results on the test suite
>>>> (see attachment).
>>>>
>>>> Without filtering:
>>>> Found 2 URLs and 87 different points of injection.
>>>> ...
>>>> Scan finished in 3 minutes 30 seconds.
>>>>
>>>> With filtering:
>>>> Found 2 URLs and 3 different points of injection.
>>>> ...
>>>> Scan finished in 11 seconds.
>>>
>>> Reviewing this and reproducing in my environment. Will have some
>>> opinions in ~1h.
>>
>> All right... now I see your concern and understand it. I ran the
>> scan you proposed and was able to reproduce the issue, which is
>> actually caused by a simple constant:
>>
>> webSpider.py:
>> MAX_VARIANTS = 40
>>
>> Let me explain what is going on here and what your patch is doing:
>>
>> #1 In the current trunk version, w3af's webSpider parses the
>> index.php file you sent and identifies many links, most of them
>> variants of each other.
>> Before returning them to the w3afCore, the webSpider uses the
>> variant_db class and MAX_VARIANTS to decide whether enough variants
>> of that link have been analyzed. If there are not enough, the
>> variant still needs to be analyzed, so it is returned to the core.
>> Given that MAX_VARIANTS is 40 [Note: I changed this to 5 in the
>> latest commit.], the webSpider returns all/most of the links in your
>> index.php to the core.
>>
>> a) This makes sense, since a link to a previously unknown section
>> might be present in "article.php?id=25" and NOT present in
>> "article.php?id=35", so w3af needs to make a choice about how many
>> of those variants are going to be analyzed and how many are going to
>> be left out.
>>
>> b) The same happens with vulnerabilities: there might be a
>> vulnerability in the foo parameter of "article.php?id=28&foo=bar"
>> when id=28, and the vulnerability might NOT be present when the id
>> is 32.
>>
>> #2 With your patch, which filters all variants and "flattens" the
>> previously found ones, w3afCore only ends up with
>> "article.php?id=number" and "article.php?id=number&foo=string",
>> which won't allow other discovery plugins to analyze the variants
>> (#1 - a) or audit plugins to identify the more complex
>> vulnerabilities (#1 - b). What will happen (of course) is that the
>> scanner will be VERY fast.
>>
>> But let's try to understand what happens with the audit plugins when
>> they are presented with multiple variants. According to 1-b they
>> should send multiple requests, and those should generate a lot of
>> network traffic, slowing the scan down.
>> Here is a grep of a scan with the audit.sqli plugin enabled:
>>
>> dz0@dz0-laptop:~/workspace/w3af$ grep "d'z\"0" output-w3af.txt
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=145&foo=d'z"0
>> returned HTTP code "200" - id: 93
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 94
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 96
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 98 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 100 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 102 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 104 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 106 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=122&foo=d'z"0
>> returned HTTP code "200" - id: 107
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 109 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=119&foo=d'z"0
>> returned HTTP code "200" - id: 110
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 112 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=82&foo=d'z"0
>> returned HTTP code "200" - id: 113
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 115 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=75&foo=d'z"0
>> returned HTTP code "200" - id: 116
>>
>> The most important things to notice here are the repeated HTTP
>> requests to the variants and the "from cache" strings at the end of
>> the repeated requests. For example:
>>
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 93
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 95 - from cache.
>>
>> And then, following the logic from #1-b, we actually send these two
>> requests to the remote web application:
>>
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=215&foo=d'z"0
>> returned HTTP code "200" - id: 96
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=29&foo=d'z"0
>> returned HTTP code "200" - id: 105
>>
>> I'm not saying that this is all perfect. The downsides of this scan
>> strategy are:
>> * Slow
>>   - Because more HTTP requests are sent
>>   - Because more pattern matching is applied to more HTTP responses
>>   - Because (maybe) the responses that are retrieved from the cache
>>     are slow to get
>>   - Because "MAX_VARIANTS = 40" was too high
>>
>> But of course this has good things like #1-a and #1-b, which provide
>> the scanner with better code coverage at the end.
>>
>> Maybe we could have different scan strategies, or change MAX_VARIANTS
>> to be a user-defined parameter, or... (please send your ideas).
>>
>> Regards,
>>>>
>>>>> Please let me know if the discovery process is NOT working as we
>>>>> expect and if we have to filter stuff somewhere
>>>>
>>>> See above.
>>>>
>>>> [0] https://sourceforge.net/apps/trac/w3af/changeset/4861
>>>> --
>>>> Taras
>>>> http://oxdef.info
>>>
>>> --
>>> Andrés Riancho
>>> Project Leader at w3af - http://w3af.org/
>>> Web Application Attack and Audit Framework
>
> --
> Taras
> http://oxdef.info

--
Andrés Riancho
Project Leader at w3af - http://w3af.org/
Web Application Attack and Audit Framework

_______________________________________________
W3af-develop mailing list
W3af-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/w3af-develop