On Wed, Apr 11, 2012 at 12:50 PM, Andres Riancho
<andres.rian...@gmail.com> wrote:
> Taras,
>
> On Wed, Apr 11, 2012 at 12:11 PM, Andres Riancho
> <andres.rian...@gmail.com> wrote:
>> On Wed, Apr 11, 2012 at 4:56 AM, Taras <ox...@oxdef.info> wrote:
>>> Andres,
>>>
>>>>>> If the framework IS working like this, I think that the shared
>>>>>> fuzzable request list wouldn't do much good. If it is not working like
>>>>>> this (and I would love to get an output log to show it), it seems that
>>>>>> we have a lot of work ahead of us.
>>>>>
>>>>> And w3afCore need to filter requests from discovery plugins on every
>>>>> loop in _discover_and_bruteforce(), am I right?
>>>>
>>>> It should filter things as they come out of the plugin and before
>>>> adding them to the fuzzable request list,
>>>
>>> Agree, but as I see in w3afCore.py there is no filtering in it.
>>> I have just added it [0]. It shows good results on the test suite (see
>>> attachment).
>>>
>>> Without filtering:
>>> Found 2 URLs and 87 different points of injection.
>>> ...
>>> Scan finished in 3 minutes 30 seconds.
>>>
>>> With filtering:
>>> Found 2 URLs and 3 different points of injection.
>>> ...
>>> Scan finished in 11 seconds.
>>
>> Reviewing this and reproducing in my environment. Will have some
>> opinions in ~1h.
>
> All right... now I see your concern and understand it. I ran the scan
> you proposed and was able to reproduce the issue, which is actually
> caused by a simple constant:
>
> webSpider.py:
> MAX_VARIANTS = 40
>
> Let me explain what is going on here and what your patch is doing:
>
> #1 In the current trunk version, w3af's webSpider parses the index.php
> file you sent and identifies many links, most of them variants of each
> other. Before returning them to the w3afCore, the webSpider uses the
> variant_db class and MAX_VARIANTS to decide whether enough variants of
> that link have already been analyzed.
> If there are not enough, the variant still needs to be analyzed, so it
> is returned to the core. Given that MAX_VARIANTS is 40 [Note: I changed
> this to 5 in the latest commit.], the webSpider returns all/most of the
> links in your index.php to the core.
>
> a) This makes sense, since a link to a previously unknown section
> might be present in "article.php?id=25" and NOT present in
> "article.php?id=35", so w3af needs to make a choice about how many of
> those variants are going to be analyzed and how many are going to be
> left out.
>
> b) The same happens with vulnerabilities: there might be a
> vulnerability in the foo parameter of "article.php?id=28&foo=bar" when
> the id is 28, and the vulnerability might NOT be present when the id
> is 32.
>
> #2 With your patch, which filters all variants and "flattens" the
> previously found ones, w3afCore only ends up with
> "article.php?id=number" and "article.php?id=number&foo=string", which
> won't allow other discovery plugins to analyze the variants (#1-a) or
> audit plugins to identify the more complex vulnerabilities (#1-b).
> What will happen (of course) is that the scanner will be VERY fast.
>
> But let's try to understand what happens when the audit plugins are
> presented with multiple variants. According to #1-b, they should send
> multiple requests, and those should generate a lot of network traffic,
> slowing the scan down.
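To make the discussion above concrete, here is a rough Python sketch of the variant bookkeeping from #1 and the "flattening" from #2. All names here (VariantDB, need_more_variants, _clean) are illustrative only, not the actual variant_db API from trunk:

```python
# Hypothetical sketch, NOT the real w3af variant_db implementation.
from collections import Counter
from urllib.parse import urlparse, parse_qsl

MAX_VARIANTS = 5  # the value the constant was lowered to in the latest commit


class VariantDB:
    def __init__(self):
        self._seen = Counter()

    def _clean(self, url):
        # Flatten a URL to its "shape": keep the path, replace each query
        # value with a type tag, e.g.
        #   article.php?id=25&foo=bar -> ('id', 'number'), ('foo', 'string')
        parts = urlparse(url)
        params = tuple(
            (name, 'number' if value.isdigit() else 'string')
            for name, value in parse_qsl(parts.query)
        )
        return (parts.path, params)

    def need_more_variants(self, url):
        """Return True until MAX_VARIANTS variants of this URL shape were seen."""
        key = self._clean(url)
        self._seen[key] += 1
        return self._seen[key] <= MAX_VARIANTS


db = VariantDB()
urls = ['http://moth/article.php?id=%d' % i for i in range(50)]
# Only the first MAX_VARIANTS variants of the same shape get returned
# to the core; the remaining 45 are dropped.
accepted = [u for u in urls if db.need_more_variants(u)]
```

With MAX_VARIANTS = 0 this degenerates into the full flattening of #2 (one request per shape is too few: zero); with a large value you get the old trunk behavior and the slow scans.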
> Here is a grep of a scan with the audit.sqli plugin enabled:
>
> dz0@dz0-laptop:~/workspace/w3af$ grep "d'z\"0" output-w3af.txt
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=145&foo=d'z"0
> returned HTTP code "200" - id: 93
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
> returned HTTP code "200" - id: 94
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
> returned HTTP code "200" - id: 96
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
> returned HTTP code "200" - id: 98 - from cache.
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
> returned HTTP code "200" - id: 100 - from cache.
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
> returned HTTP code "200" - id: 102 - from cache.
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
> returned HTTP code "200" - id: 104 - from cache.
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
> returned HTTP code "200" - id: 106 - from cache.
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=122&foo=d'z"0
> returned HTTP code "200" - id: 107
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
> returned HTTP code "200" - id: 109 - from cache.
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=119&foo=d'z"0
> returned HTTP code "200" - id: 110
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
> returned HTTP code "200" - id: 112 - from cache.
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=82&foo=d'z"0
> returned HTTP code "200" - id: 113
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
> returned HTTP code "200" - id: 115 - from cache.
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=75&foo=d'z"0
> returned HTTP code "200" - id: 116
>
> The most important things to notice here are the repeated HTTP
> requests to the variants and the "from cache" strings at the end of
> the repeated requests. For example:
>
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
> returned HTTP code "200" - id: 93
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
> returned HTTP code "200" - id: 95 - from cache.
>
> And then, following the logic from #1-b, we actually send these two
> requests to the remote web application:
>
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=215&foo=d'z"0
> returned HTTP code "200" - id: 96
> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=29&foo=d'z"0
> returned HTTP code "200" - id: 105
>
> I'm not saying that this is all perfect. The downsides of this scan
> strategy are:
> * Slow
>   - Because more HTTP requests are sent
>   - Because more pattern matching is applied to more HTTP responses
>   - Because (maybe) the responses that are retrieved from the cache
>     are slow to get
>   - Because "MAX_VARIANTS = 40" was too high
>
> But of course this strategy has good things going for it, like #1-a
> and #1-b, which give the scanner better code coverage in the end.
>
> Maybe we could have different scan strategies, or change MAX_VARIANTS
> to be a user-defined parameter, or... (please send your ideas).-
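For anyone wondering why the repeated requests come back "from cache": the idea is a response cache keyed by method and full URL, so only the first request for a given URL hits the network. A minimal sketch under that assumption (illustrative only, not w3af's actual cache implementation):

```python
# Hypothetical sketch of a (method, URL)-keyed response cache; w3af's
# real caching lives elsewhere, this only illustrates the behavior seen
# in the grep output above.
class CachingClient:
    def __init__(self, fetch):
        self._fetch = fetch          # the real network call
        self._cache = {}
        self.network_requests = 0

    def get(self, url):
        """Return (response, from_cache); hit the network only once per URL."""
        key = ('GET', url)
        if key in self._cache:
            return self._cache[key], True
        self.network_requests += 1
        response = self._fetch(url)
        self._cache[key] = response
        return response, False


# Simulate the repeated audit-plugin requests to the same variant URL.
client = CachingClient(lambda url: 'HTTP 200 for %s' % url)
u = 'http://moth/w3af/discovery/web_spider/variants/article.php?id=d\'z"0&foo=bar'
first = client.get(u)    # goes to the network (id: 93 in the log)
second = client.get(u)   # served from cache (id: 95 - from cache.)
```

This is why the repeated variant requests mostly cost local pattern matching rather than network traffic, which matches the "maybe the responses retrieved from the cache are slow to get" caveat above.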
Forgot to mention that this can be reproduced with an updated trunk and
moth. I committed all the test scripts, so you guys can run:

    ./w3af_console -s scripts/script-web_spider-variants.w3af

and get the same results I did.

> Regards,
>
>>>> Please let me know if the discovery process is NOT working as we
>>>> expect and if we have to filter stuff somewhere
>>>
>>> See above.
>>>
>>> [0] https://sourceforge.net/apps/trac/w3af/changeset/4861
>>> --
>>> Taras
>>> http://oxdef.info
>>
>> --
>> Andrés Riancho
>> Project Leader at w3af - http://w3af.org/
>> Web Application Attack and Audit Framework
>
> --
> Andrés Riancho
> Project Leader at w3af - http://w3af.org/
> Web Application Attack and Audit Framework

--
Andrés Riancho
Project Leader at w3af - http://w3af.org/
Web Application Attack and Audit Framework

_______________________________________________
W3af-develop mailing list
W3af-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/w3af-develop