Taras,

On Fri, Apr 13, 2012 at 5:53 AM, Taras <ox...@oxdef.info> wrote:
> Andres,
>
> my point is that one scan of a "classic" web app, with *tuned scanner
> settings*, must run no more than 3-4 hours.

    Why? Scan duration, if you want the scan to analyze the whole
site, depends on many factors that we don't control:
        * Network speed
        * Number of URLs in the site
        * Number of parameters in the site

    If by "classic" web app you mean "small", then we agree :)

> We can do it with:
> * improving discovery state

    Yes, agreed. Please identify one or more specific issues and
I'll try to fix them over the next week.

> * reducing the number of requests needed to detect a flaw in audit
> plugins (especially the xss plugin).

    I worked a little on this but never got to a final version;
I'll try to restart that work.
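
For example, one way to cut the number of requests (just a sketch of
the idea, not current w3af code; probe_reflection and url_template are
made-up names) is to send a single probe containing several special
characters and check which of them are reflected without encoding,
instead of sending one request per payload:

    import random
    import string
    import urllib
    import urllib2

    # Characters whose unencoded reflection makes xss possible
    SPECIAL_CHARS = ['<', '>', '"', "'", '(', ')']

    def probe_reflection(url_template):
        # A random token lets us locate our probe in the response body
        token = ''.join(random.choice(string.ascii_lowercase)
                        for _ in range(8))
        probe = token + ''.join(c + token for c in SPECIAL_CHARS)
        body = urllib2.urlopen(url_template % urllib.quote(probe)).read()
        # Any special char found between two copies of the token was
        # echoed back raw
        return [c for c in SPECIAL_CHARS if token + c + token in body]

A single request then tells us which metacharacters survive the
application's output encoding, and only those need payload-specific
follow-up requests.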

> * extensive use of the cache - w3af already has one

    But maybe it is slow? Maybe we could implement it in a completely
different way? I'll think about that too. I have some ideas related to
caching in a shelve instead of multiple files, and also caching the
document parser so that we don't parse anything more than once.
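
The shelve idea could look roughly like this (a minimal sketch with
made-up names, not w3af's cache API; I'm assuming a getBody() accessor
that returns the raw response body):

    import hashlib
    import shelve

    class ParserCache(object):
        def __init__(self, filename='parser-cache.db'):
            # One shelve file instead of one cache file per response
            self._db = shelve.open(filename)

        def get_links(self, http_response, extract_links):
            # Key on a hash of the body: identical documents are
            # parsed exactly once
            key = hashlib.md5(http_response.getBody()).hexdigest()
            if key not in self._db:
                # Store the parse *result* (e.g. a list of URL
                # strings), since shelve values must be picklable
                self._db[key] = extract_links(http_response)
            return self._db[key]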

> Reducing MAX_VARIANTS to 5 is a good idea. I also agree that we need
> to make it user-controlled. My idea is to move the variant handling
> and limiting code from w3afCore and the webSpider plugin into the
> baseDiscovery class, or even a separate class, to consolidate this
> important part in a single place. All discovery plugins could then use
> it as a result container instead of a plain list. But this task, imho,
> is not for the nearest release of w3af. For now we can improve the
> most important audit plugins like xss (we have already discussed it)
> and sqli.
>
> Currently my tests show that w3af spends most of its time, resources,
> and traffic in the *audit stage* (the usual size of output-http.txt
> for me is about 1GB!). A great setting that I use here is the limit
> on discovery time. It is really useful!

The good thing is that of the 1GB you have in output-http.txt, only
part was actually sent over the network. Please remember that
output-http.txt logs both cached and network responses.

> Also, I want to note that my patch to w3afCore was only a PoC.
>
>
> On 04/11/2012 07:50 PM, Andres Riancho wrote:
>>
>> Taras,
>>
>> On Wed, Apr 11, 2012 at 12:11 PM, Andres Riancho
>> <andres.rian...@gmail.com> wrote:
>>>
>>> On Wed, Apr 11, 2012 at 4:56 AM, Taras <ox...@oxdef.info> wrote:
>>>>
>>>> Andres,
>>>>
>>>>
>>>>>>>     If the framework IS working like this, I think that the shared
>>>>>>> fuzzable request list wouldn't do much good. If it is not working
>>>>>>> like
>>>>>>> this (and I would love to get an output log to show it), it seems
>>>>>>> that
>>>>>>> we have a lot of work ahead of us.
>>>>>>
>>>>>> And w3afCore needs to filter requests from discovery plugins on
>>>>>> every loop in _discover_and_bruteforce(), am I right?
>>>>>
>>>>> It should filter things as they come out of the plugin and before
>>>>> adding them to the fuzzable request list,
>>>>
>>>>
>>>> Agreed, but as far as I can see there is no such filtering in
>>>> w3afCore.py. I have just added it [0]. It shows good results on the
>>>> test suite (see attachment).
>>>>
>>>> Without filtering:
>>>>  Found 2 URLs and 87 different points of injection.
>>>>  ...
>>>>  Scan finished in 3 minutes 30 seconds.
>>>>
>>>> With filtering:
>>>>  Found 2 URLs and 3 different points of injection.
>>>>  ...
>>>>  Scan finished in 11 seconds.
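>>>>
>>>> The idea, roughly, is something like this (a sketch only, not the
>>>> actual changeset 4861 code; I'm assuming the getURL() and getDc()
>>>> accessors of the fuzzable request API):
>>>>
>>>>     def filter_new_fuzzable_requests(new_frs, known_frs):
>>>>         # Keep only requests whose (URL, parameter names)
>>>>         # combination has not been seen before
>>>>         seen = set((fr.getURL(), tuple(sorted(fr.getDc().keys())))
>>>>                    for fr in known_frs)
>>>>         result = []
>>>>         for fr in new_frs:
>>>>             key = (fr.getURL(), tuple(sorted(fr.getDc().keys())))
>>>>             if key not in seen:
>>>>                 seen.add(key)
>>>>                 result.append(fr)
>>>>         return result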
>>>
>>>
>>> Reviewing this and reproducing it in my environment. I'll have some
>>> opinions in ~1h.
>>
>>
>> All right... now I see your concern and understand it. I ran the scan
>> you proposed and was able to reproduce the issue, which is actually
>> caused by a simple constant:
>>
>>     webSpider.py:
>>     MAX_VARIANTS = 40
>>
>> Let me explain what is going on here and what your patch is doing:
>>     #1 In the current trunk version, w3af's webSpider parses the
>> index.php file you sent and identifies many links, most of them
>> variants of each other. Before returning them to the w3afCore, the
>> webSpider uses the variant_db class and MAX_VARIANTS to decide
>> whether enough variants of that link have already been analyzed (a
>> concrete sketch of this check follows point #2 below). If not, the
>> variant still needs to be analyzed, so it is returned to the core.
>> Given that MAX_VARIANTS is 40 [Note: I changed this to 5 in the
>> latest commit.], the webSpider returns all/most of the links in your
>> index.php to the core.
>>
>>     a) This makes sense, since a link to a previously unknown section
>> might be present in "article.php?id=25" and NOT present in
>> "article.php?id=35", so w3af needs to choose how many of those
>> variants are going to be analyzed and how many are going to be left
>> out.
>>
>>     b) The same happens with vulnerabilities: there might be a
>> vulnerability in the foo parameter of "article.php?id=28&foo=bar"
>> when id=28, and the vulnerability might NOT be present when the id
>> is 32.
>>
>>     #2 With your patch, which filters out all variants and "flattens"
>> the previously found ones, w3afCore only ends up with
>> "article.php?id=number" and "article.php?id=number&foo=string",
>> which won't allow other discovery plugins to analyze the variants
>> (#1-a) or audit plugins to identify the more complex vulnerabilities
>> (#1-b). What will happen (of course) is that the scanner will be
>> VERY fast.
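>>
>> To make the variant check concrete, it works roughly like this (a
>> simplified sketch, not the actual variant_db code; the class and
>> method names are my own):
>>
>>     MAX_VARIANTS = 5
>>
>>     class VariantDB(object):
>>         def __init__(self):
>>             self._counter = {}
>>
>>         def need_more_variants(self, path, param_names):
>>             # Two URLs are variants of each other when they share
>>             # the path and the parameter names and differ only in
>>             # the values: article.php?id=25 and article.php?id=35
>>             # map to the same key
>>             key = (path, tuple(sorted(param_names)))
>>             count = self._counter.get(key, 0)
>>             if count >= MAX_VARIANTS:
>>                 return False
>>             self._counter[key] = count + 1
>>             return True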
>>
>> But let's try to understand what happens with the audit plugins when
>> they are presented with multiple variants. According to #1-b they
>> should send multiple requests, and those should generate a lot of
>> network traffic, slowing the scan down. Here is a grep of a scan with
>> the audit.sqli plugin enabled:
>>
>> dz0@dz0-laptop:~/workspace/w3af$ grep "d'z\"0" output-w3af.txt
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=145&foo=d'z"0
>> returned HTTP code "200" - id: 93
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 94
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 96
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 98 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 100 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 102 - from cache.
>> GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
>> returned HTTP code "200" - id: 104 - from cache.
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 106 - from cache.
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=122&foo=d'z"0
>> returned HTTP code "200" - id: 107
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 109 - from cache.
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=119&foo=d'z"0
>> returned HTTP code "200" - id: 110
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 112 - from cache.
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=82&foo=d'z"0
>> returned HTTP code "200" - id: 113
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 115 - from cache.
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=75&foo=d'z"0
>> returned HTTP code "200" - id: 116
>>
>> The most important things to notice here are the repeated HTTP
>> requests to the variants and the "from cache" strings at the end of
>> the repeated requests. For example:
>>
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 93
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
>> returned HTTP code "200" - id: 95 - from cache.
>>
>> And then, following the logic from #1-b, we actually send these two
>> requests to the remote web application:
>>
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=215&foo=d'z"0
>> returned HTTP code "200" - id: 96
>> GET
>> http://moth/w3af/discovery/web_spider/variants/article.php?id=29&foo=d'z"0
>> returned HTTP code "200" - id: 105
>>
>> I'm not saying that this is all perfect. The downsides of this scan
>> strategy are:
>>     * It is slow:
>>         - More HTTP requests are sent
>>         - More pattern matching is applied to more HTTP responses
>>         - (Maybe) the responses retrieved from the cache are slow
>> to fetch
>>         - "MAX_VARIANTS = 40" was too high
>>
>> But of course this approach has good points like #1-a and #1-b,
>> which give the scanner better code coverage in the end.
>>
>> Maybe we could have different scan strategies, or change MAX_VARIANTS
>> to be a user-defined parameter, or... (please send your ideas). A
>> rough sketch of how that option could look is below.
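>>
>> If we went the user-defined route, webSpider could expose the value
>> through the getOptions()/setOptions() pattern that plugins already
>> use, something like this (a sketch; the option name is my invention,
>> and option/optionList are the usual imports from core.data.options):
>>
>>     def getOptions(self):
>>         d1 = 'Maximum number of variants of the same URL to analyze'
>>         o1 = option('maxVariants', self._max_variants, d1, 'integer')
>>         ol = optionList()
>>         ol.add(o1)
>>         return ol
>>
>>     def setOptions(self, optionsMap):
>>         self._max_variants = optionsMap['maxVariants'].getValue()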
>>
>> Regards,
>>>>
>>>>> Please let me know if the discovery process is NOT working as we
>>>>> expect and if we have to filter stuff somewhere
>>>>
>>>>
>>>> See above.
>>>>
>>>> [0] https://sourceforge.net/apps/trac/w3af/changeset/4861
>>>> --
>>>> Taras
>>>> http://oxdef.info
>>>
>>> --
>>> Andrés Riancho
>>> Project Leader at w3af - http://w3af.org/
>>> Web Application Attack and Audit Framework
>>
>
>
> --
> Taras
> http://oxdef.info



-- 
Andrés Riancho
Project Leader at w3af - http://w3af.org/
Web Application Attack and Audit Framework

_______________________________________________
W3af-develop mailing list
W3af-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/w3af-develop
