Hi! This is a really good and interesting improvement! I have just tested it, and w3af now starts much faster! I tried to understand the principle behind the Bloom filter, but it looks like 5 minutes is too little time for that. I will read about it once more.
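For reference, the basic idea can be sketched in a few lines: a Bloom filter keeps a fixed array of bits, sets a few hash-derived bits for every item added, and reports an item as present only if all of its bits are set. That is why it can return false positives but never false negatives. This is toy Python for illustration only, not the ScalableBloomFilter that w3af uses; the class name ToyBloomFilter, the default sizes, and the SHA-256 double-hashing scheme are all assumptions made for this sketch.

```python
import hashlib


class ToyBloomFilter:
    """Fixed-size Bloom filter (hypothetical, illustrative only)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes; starts out all zero.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from a single SHA-256 digest
        # using double hashing: pos_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        # Set every bit that belongs to this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Present only if all of the item's bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Same "analyze each URL only once" pattern as in the quoted mail.
seen = ToyBloomFilter()
url = "http://example.com/index.php?id=1"
if url not in seen:
    seen.add(url)
```

Unlike a plain list, the memory used stays fixed no matter how many URLs are added; the price is a small chance that a new URL is wrongly reported as already seen.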
On Fri, 2010-11-26 at 16:41 -0300, Andres Riancho wrote:
> List,
>
> Today I've worked on a performance improvement: I replaced the
> "disk_list" that had a sqlite3 database backend with a Bloom filter
> [0]. The short story: your scans will be faster, w3af will start in
> less time, and changing between profiles doesn't take 5 seconds
> anymore.
>
> The long story:
>
> - In many plugins we want to analyze URLs only once. The initial
> approach for doing this was:
>
>     def __init__(self...):
>         self._already_analyzed = []
>
>     def grep(self...):
>         ...
>         if url not in self._already_analyzed:
>             self._analyze( request, response )
>             self._already_analyzed.append( url )
>
> - The problem with that, of course, is that the
> "self._already_analyzed" list will grow and consume memory in an
> unbounded way. But... well... it was the easiest thing to do at first.
>
> - A year ago we introduced "disk_lists", which were a way of saving
> those URLs to disk using a sqlite3 database. This approach was great
> because of the *very* low memory use, which solved our first problem,
> but it introduced yet another one: reading from disk and sqlite3's
> poor performance for this situation.
>
> - While trying to fix a totally unrelated issue, Javier and I realized
> that "disk_list" took 15 seconds to insert 10k integers and test
> if 10k of them were there or not (5k were, 5k were not). At the same
> time we were playing with Bloom filters, which performed the same
> actions in 2 seconds.
>
> - After 4 hours of development, testing, etc. I replaced disk_list
> with a ScalableBloomFilter, improving performance of those calls
> (which are *very* common) by 8 to 10 times depending on the amount of
> information being stored.
>
> - This affects the performance of the whole scanner, as many of these
> "if url not in ..." checks were used in grep plugins, which slowed
> down the whole scan. This also fixed an annoyance in which loading a
> profile would take (on some boxes) up to 8 seconds.
> The reason behind that was that the plugins inside the profile had
> to init the disk_list --> sqlite3 db, which was a time consuming
> process.
>
> If you're interested in the changes I've introduced, please take a
> look at these commits [1][2]
>
> [0] http://en.wikipedia.org/wiki/Bloom_filter
> [1] http://w3af.svn.sourceforge.net/w3af/?rev=3787&view=rev
> [2] http://w3af.svn.sourceforge.net/w3af/?rev=3788&view=rev
>
> Regards,

--
Taras
http://oxdef.info
----
"Software is like sex: it's better when it's free." - Linus Torvalds

_______________________________________________
W3af-develop mailing list
W3af-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/w3af-develop