List,

    Today I worked on a performance improvement: I replaced the
"disk_list", which had a sqlite3 database backend, with a Bloom filter
[0]. The short story: your scans will be faster, w3af will start in
less time, and changing between profiles doesn't take 5 seconds
anymore.

    The long story:

- In many plugins we want to analyze URLs only once. The initial
approach for doing this was:

    def __init__(self...):
        self.already_analyzed = []

    def grep(self...):
        ...
        if url not in self.already_analyzed:
            self._analyze( request, response )
            self.already_analyzed.append( url )

- The problem with that, of course, is that the
"self.already_analyzed" list will grow and consume memory in an
unbounded way. But... well... it was the easiest thing to do at first.

- A year ago we introduced "disk_lists", which were a way of saving
those URLs to disk using a sqlite3 database. This approach was great
because of the *very* low memory use, which solved our first problem;
but introduced yet another one: reading from disk and sqlite3's poor
performance for this situation.

- While trying to fix a totally unrelated issue, Javier and I realized
that "disk_list" took 15 seconds to insert 10k integers and test
if 10k of them were there or not (5k were, 5k were not). At the same
time we were playing with Bloom filters, which performed the same
actions in 2 seconds.

- After 4 hours of development, testing, etc. I replaced disk_list
with a ScalableBloomFilter, improving the performance of those calls
(which are *very* common) by 8 to 10 times, depending on the amount of
information being stored.

- This affects the performance of the whole scanner, because many of
these "if url not in ..." checks were used in grep plugins, which
slowed down the whole scan. This also fixed an annoyance in which
loading a profile would take (on some boxes) up to 8 seconds. The
reason behind that was that the plugins inside the profile had to
initialize the disk_list and its sqlite3 db, which was a
time-consuming process.
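
    For anyone who hasn't played with Bloom filters, here is a minimal
sketch of the idea in plain Python. It is NOT the ScalableBloomFilter
that went into w3af: SimpleBloomFilter and count_new are hypothetical
names made up for this example, the filter has a fixed size instead of
growing, and the hashing scheme (SHA-256 split in two halves) is just
one reasonable choice. It only shows why the "if url not in ..." check
keeps working while memory use stays small.

```python
import hashlib
import math


class SimpleBloomFilter:
    """Fixed-size Bloom filter (illustrative only, not the w3af class)."""

    def __init__(self, capacity=10000, error_rate=0.001):
        # Standard sizing: m bits and k hash functions for the expected
        # number of items and the target false-positive rate.
        m = -capacity * math.log(error_rate) / (math.log(2) ** 2)
        self.num_bits = max(8, int(math.ceil(m)))
        k = self.num_bits / capacity * math.log(2)
        self.num_hashes = max(1, int(round(k)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one digest (double hashing).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits
                for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def count_new(urls, seen):
    """Mirror the grep() check: count the URLs that were not seen before."""
    analyzed = 0
    for url in urls:
        if url not in seen:
            analyzed += 1  # stand-in for self._analyze( request, response )
            seen.add(url)
    return analyzed
```

    The trade-off is that a Bloom filter can return false positives,
so once in a while a URL that was never analyzed will be reported as
already seen and skipped. It never returns false negatives, so nothing
is analyzed twice.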

    If you're interested in the changes I've introduced, please take a
look at these commits [1][2]

[0] http://en.wikipedia.org/wiki/Bloom_filter
[1] http://w3af.svn.sourceforge.net/w3af/?rev=3787&view=rev
[2] http://w3af.svn.sourceforge.net/w3af/?rev=3788&view=rev

Regards,
-- 
Andrés Riancho
Director of Web Security at Rapid7 LLC
Founder at Bonsai Information Security
Project Leader at w3af
