Hi!

It is a really good and interesting improvement!
I have just tested it and w3af now starts much faster!
I tried to understand the principle of the Bloom filter, but it looks like
5 minutes is too little time for it. I will read about it once more.
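
A minimal sketch of the principle, for anyone else who is new to it. This is a toy, fixed-size filter and not the ScalableBloomFilter that w3af now uses; the class name SimpleBloomFilter and the num_bits / num_hashes parameters are made up for illustration. Each item sets a few bits in a fixed bit array, so memory stays constant. A lookup can return a false positive (an unseen URL reported as seen), but never a false negative.

```python
import hashlib


class SimpleBloomFilter:
    """Toy fixed-size Bloom filter (hypothetical, not w3af code)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # The bit array never grows, no matter how many items are added.
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for salt in range(self.num_hashes):
            data = ("%d:%s" % (salt, item)).encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # All bits set: "probably seen". Any bit clear: "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Same "analyze each URL only once" pattern as in the quoted plugin code.
already_analyzed = SimpleBloomFilter()
url = "http://example.com/index.php"
if url not in already_analyzed:
    already_analyzed.add(url)
```

The trade-off is that a false positive would make a plugin skip a URL it has not actually analyzed, so the filter has to be sized for the expected number of URLs.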

On Fri, 2010-11-26 at 16:41 -0300, Andres Riancho wrote:
> List,
> 
>     Today I've worked on a performance improvement: I replaced the
> "disk_list" that had a sqlite3 database backend with a Bloom filter
> [0]. The short story: your scans will be faster, w3af will start in
> less time, and changing between profiles doesn't take 5 seconds
> anymore.
> 
>     The long story:
> 
> - In many plugins we want to analyze URLs only once. The initial
> approach for doing this was:
> 
>     def __init__(self...):
>         self.already_analyzed = []
> 
>     def grep(self...):
>         ...
>         if url not in self.already_analyzed:
>             self._analyze( request, response )
>             self.already_analyzed.append( url )
> 
> - The problem with that, of course, is that the
> "self.already_analyzed" list will grow and consume memory in an
> unbounded way. But... well... it was the easiest thing to do at first.
> 
> - A year ago we introduced "disk_lists", which were a way of saving
> those URLs to disk using a sqlite3 database. This approach was great
> because of the *very* low memory use, which solved our first problem,
> but it introduced yet another one: reading from disk, and sqlite3's
> poor performance for this situation.
> 
> - While trying to fix a totally unrelated issue, Javier and I realized
> that "disk_list" took 15 seconds to insert 10k integers and test
> if 10k of them were there or not (5k were, 5k were not). At the same
> time we were playing with Bloom filters, which performed the same
> actions in 2 seconds.
> 
> - After 4 hours of development, testing, etc. I replaced disk_list
> with a ScalableBloomFilter, improving the performance of those calls
> (which are *very* common) by 8 to 10 times, depending on the amount of
> information being stored.
> 
> - This affects the performance of the whole scanner, as many of these
> "if url not in ..." checks were used in grep plugins, which slowed down
> the whole scan. This also fixed an annoyance in which loading a profile
> would take (on some boxes) up to 8 seconds. The reason behind that was
> that the plugins inside the profile had to init the disk_list -->
> sqlite3 db, which was a time-consuming process.
> 
>     If you're interested in the changes I've introduced, please take a
> look at these commits [1][2]
> 
> [0] http://en.wikipedia.org/wiki/Bloom_filter
> [1] http://w3af.svn.sourceforge.net/w3af/?rev=3787&view=rev
> [2] http://w3af.svn.sourceforge.net/w3af/?rev=3788&view=rev
> 
> Regards,
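
The comparison described in the quoted message (insert N integers, then test N of them, half present and half not) could be reproduced roughly like this. It is only a sketch under assumptions: the DiskList class, its table layout and the benchmark() helper are hypothetical and are not w3af's actual disk_list code, and a plain set stands in for the in-memory side. It makes no attempt to reproduce the timings quoted above.

```python
import os
import sqlite3
import tempfile
import time


class DiskList:
    """Hypothetical sqlite3-backed list (not w3af's actual disk_list)."""

    def __init__(self):
        fd, self._path = tempfile.mkstemp(suffix=".db")
        os.close(fd)
        self._conn = sqlite3.connect(self._path)
        self._conn.execute("CREATE TABLE items (value TEXT)")

    def append(self, value):
        self._conn.execute("INSERT INTO items VALUES (?)", (str(value),))

    def __contains__(self, value):
        cur = self._conn.execute(
            "SELECT 1 FROM items WHERE value = ? LIMIT 1", (str(value),))
        return cur.fetchone() is not None

    def close(self):
        self._conn.close()
        os.remove(self._path)


def benchmark(container, add, n=1000):
    """Insert 0..n-1, then test n values of which half are present."""
    start = time.time()
    for i in range(n):
        add(i)
    hits = sum(1 for i in range(n // 2, n + n // 2) if i in container)
    return hits, time.time() - start


if __name__ == "__main__":
    disk = DiskList()
    print("sqlite3-backed list:", benchmark(disk, disk.append))
    disk.close()

    in_memory = set()
    print("in-memory set:", benchmark(in_memory, in_memory.add))
```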

-- 
Taras
http://oxdef.info
----
"Software is like sex: it's better when it's free." - Linus Torvalds



_______________________________________________
W3af-develop mailing list
W3af-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/w3af-develop
