Tomas,

On Wed, Oct 31, 2012 at 10:39 AM, Tomas Velazquez
<tomas.velazqu...@gmail.com> wrote:
> Andres,
>
> mmm, code_repository? As you like xD
You're the one that said the plugin needed a name change; I'm fine with
code_repository.

> Could you create svn and cvs repos at
> https://sourceforge.net/apps/trac/w3af/browser/extras/testEnv/webroot/moth/w3af/crawl/find_dvcs
> ?

Done, svn up your test directory.

> Great fixes Andres :D
>
> All extract functions work ok. Maybe test_find_dvcs.py is very slow
> because it uses web_spider.

I noticed that it is slow too, but it's actually because of this code,
which can be modified:

    for domain_path in fuzzable_request.getURL().getDirectories():

In our test case, even if we specify "onlyForward" in the web_spider
configuration, the find_dvcs plugin will try to find repos in the
following directories:

    http://moth/w3af/crawl/find_dvcs/
    http://moth/w3af/crawl/
    http://moth/w3af/
    http://moth/
    http://moth/w3af/crawl/find_dvcs/cvs/
    http://moth/w3af/crawl/find_dvcs/svn/
    http://moth/w3af/crawl/find_dvcs/.../

Our initial expectation would be for it to find repos only in
"http://moth/w3af/crawl/find_dvcs/", and that's why we think it is
slow. Instead of sending 10 HTTP requests (1 for each entry in
self._dvcs, 1 directory) it is sending 40 requests (1 for each entry in
self._dvcs, 4 directories).

So, we can either change the getDirectories() call to something that
simply returns the domain/path/ URL and have a "faster" find_dvcs, or
just leave it as-is. Please note the quotes around "faster": in most
scans the two approaches even out, since getDirectories() performs more
tests the first times it is called and maybe none the last times it is
called (because /w3af/ was already analyzed in the first call), while
getDomainPath() performs the same number of HTTP requests each time.

After writing all that, and running some tests that took 60 seconds, I
replaced the getDirectories() loop with:

    domain_path = fuzzable_request.getURL().getDomainPath()

in order to have faster unittests (there's a short standalone sketch of
the difference right after the duplicates discussion below).

After some debugging I was able to make all tests pass! The code, which
should be almost final, is available here [0]. The most important
changes were:

* _clean_filenames: _analyzed_filenames was storing the bare
"index.html" filename, disregarding which directory it was actually
tested in, which led to false negatives.

* The data was stored in the kb under the key repo.upper(), where repo
contained "git repository", while the test retrieved it with
get("GIT"), so the keys never matched.

* Some file parsing functions were very strict and did a "return set()"
even after successfully extracting a couple of references from the
file. Removed those or replaced them with "continue".

The only pending task is to decide what to do with duplicates. The code
(as it is) will report one vulnerability for each directory where
repository metadata is found; for example, if the site has this
structure:

    http://host.tld/
    http://host.tld/foo/
    http://host.tld/foo/bar/
    http://host.tld/spam/
    http://host.tld/eggs/
    http://host.tld/eggs/1/
    http://host.tld/eggs/2/
    http://host.tld/eggs/3/
    http://host.tld/eggs/4/

and the whole webroot was generated using a "svn co", then we'll report
9 vulnerabilities. Imagine this in a complex site with 50
directories...
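Here's the sketch I mentioned: a minimal, standalone illustration
(plain Python, no w3af imports) of why the getDirectories() loop costs
4x the requests in our test case. get_directories() and
get_domain_path() are simplified stand-ins for the real w3af URL
methods, and they assume the request URL already points at a directory:

    from urllib.parse import urlparse

    def get_directories(url):
        # Stand-in for getURL().getDirectories(): the directory itself
        # plus every parent directory up to the webroot.
        parsed = urlparse(url)
        base = '%s://%s' % (parsed.scheme, parsed.netloc)
        parts = [p for p in parsed.path.split('/') if p]
        dirs = [base + '/']
        for i in range(1, len(parts) + 1):
            dirs.append(base + '/' + '/'.join(parts[:i]) + '/')
        return dirs

    def get_domain_path(url):
        # Stand-in for getURL().getDomainPath(): only the deepest dir.
        return get_directories(url)[-1]

    DVCS_TESTS = 10  # one test per entry in self._dvcs

    url = 'http://moth/w3af/crawl/find_dvcs/'
    print(len(get_directories(url)) * DVCS_TESTS)    # 40 requests
    print(len([get_domain_path(url)]) * DVCS_TESTS)  # 10 requests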
Back to the duplicates: one option we have is to store the shortest
path for each repo type and report only that, which would result in 1
vuln report for http://host.tld/ ; but this wouldn't be enough if there
is something like this:

    http://host.tld/           <------- No repo here
    http://host.tld/app1/      <------- svn checkout https://internal.svn.server/app1
    http://host.tld/app1/foo/  (generated by the app1 checkout)
    http://host.tld/app2/      <------- svn checkout https://internal.svn.server/app2
    http://host.tld/app2/bar/  (generated by the app2 checkout)

Here there are two different vulns to report.

@Tomas: What do you think? What should we do?

[0] https://sourceforge.net/apps/trac/w3af/changeset/6086
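Just to make the "shortest path" idea concrete, here's a rough
standalone sketch of that reduction. The caveat is right in the code:
it keeps one directory per repo type, so the app1/app2 case above would
incorrectly collapse into a single finding; telling the two checkouts
apart would probably require grouping by the remote URL extracted from
the repository metadata instead.

    from urllib.parse import urlparse

    def shortest_paths(findings):
        # findings: iterable of (repo_type, directory_url) tuples,
        # e.g. ('SVN', 'http://host.tld/foo/bar/').
        best = {}
        for repo_type, directory in findings:
            depth = urlparse(directory).path.count('/')
            if repo_type not in best or depth < best[repo_type][0]:
                best[repo_type] = (depth, directory)
        # Keep only the shallowest directory per repo type.
        return dict((rt, d) for rt, (_, d) in best.items())

    findings = [('SVN', 'http://host.tld/'),
                ('SVN', 'http://host.tld/foo/'),
                ('SVN', 'http://host.tld/eggs/1/')]
    print(shortest_paths(findings))  # {'SVN': 'http://host.tld/'}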
> http://tvelazquez.googlecode.com/files/code_repository.tgz
>
>> At this moment it is impossible to achieve that. We could do it once
>> the threading2 branch is done... but it doesn't make much sense
>> either. Weekly releases mean one of two things: unstable or tons of
>> work. And the tons of work required to make a weekly release stable
>> makes no sense to me.
>
> Once the threading2 branch is done the release cycle could be every 3
> months :)
>
> Regards,
>
> On Tue, Oct 30, 2012 at 2:52 PM, Andres Riancho
> <andres.rian...@gmail.com> wrote:
>>
>> Tomas,
>>
>> On Tue, Oct 30, 2012 at 9:55 AM, Andres Riancho
>> <andres.rian...@gmail.com> wrote:
>> > Tomas,
>> >
>> > On Mon, Oct 29, 2012 at 10:35 PM, Tomas Velazquez
>> > <tomas.velazqu...@gmail.com> wrote:
>> >> Andres,
>> >>
>> >> Sorry for the delay, but I was developing other, more interesting
>> >> plugins; give me a few weeks and you'll see. :>
>> >
>> > Nice, I would like to see that :)
>> >
>> >> I don't like the word "find" in the plugin name, and maybe it is
>> >> wrong to call it dvcs as it supports svn and cvs. I don't know
>> >> what the correct name would be.
>> >
>> > Some ideas:
>> > code_repository
>> > source_repository
>> > code_repo
>> > source_repo
>> > code_leak
>> > source_leak
>> >
>> > The last two names are mostly related to the fact that, based on
>> > the metadata, one could steal the code using
>> > https://github.com/evilpacket/DVCS-Pillage which was written by
>> > the original author of find_dvcs.py, Adam Baldwin.
>> >
>> >> These new files are more accurate than the others and ensure the
>> >> existence of a repository.
>> >>
>> >> http://code.google.com/p/tvelazquez/source/browse/pentest/w3af-plugins/crawl/find_dvcs.py
>>
>> Code review:
>>
>> * self._analyzed_dirs = set() can't be used, it simply takes too
>> much memory. It is fine for a site with 30 directories, but we
>> should always think about scanning huge sites. Using
>> scalable_bloomfilter (fixed)
>>
>> * You sometimes use \t for indents, please use 4 spaces (fixed)
>>
>> * I don't like the fact that you use _analyzed_dirs for two
>> different things:
>> - self._analyzed_dirs.add( domain_path )
>> - self._analyzed_dirs.add(filename)
>> (fixed)
>>
>> * _get_and_parse() duplicated code from _send_and_check() (fixed)
>>
>> * Plugin failed to pass the pre-existing unittest [0] which scans
>> this directory [1] that contains hg, bzr, git and svn repos.
>> (pending)
>>
>> * Wrote some unittests for the functions which extract filenames
>> from the repo files. If possible, add the rest, since some seem to
>> be working in an unexpected way: Is the svn_entries function
>> returning garbage? See the unittest code. (pending)
>>
>> Since I know you'll deliver the required fixes to make the unittest
>> that scans [1] pass, I'm committing this to the threading2 branch.
>>
>> [0] https://sourceforge.net/apps/trac/w3af/browser/branches/threading2/plugins/tests/crawl/test_find_dvcs.py
>> [1] https://sourceforge.net/apps/trac/w3af/browser/extras/testEnv/webroot/moth/w3af/crawl/find_dvcs
>>
>> > Great, reviewing right now. Will write some unittests and let you
>> > know if there is anything else that needs to be done,
>> >
>> >> Regards,
>> >>
>> >> PS: I would love a w3af stable version :>
>> >
>> > That will be achieved when I finish my TODO
>> > https://sourceforge.net/apps/trac/w3af/wiki/andres%27-TODO
>> >
>> > It shouldn't be long before I finish it, the only problem is that
>> > it seems to grow in number of items instead of getting smaller ;)
>> > Now for real, it shouldn't be long, and it will be a great way to
>> > start over with the project since it is a huge rewrite.
>> >
>> >> Is there a roadmap?
>> >
>> > At one point in time I created one, but it is outdated now :( You
>> > can see a part of it here:
>> >
>> > https://sourceforge.net/apps/trac/w3af/query?status=new&status=accepted&status=reopened&group=milestone&component=w3af-plugins&order=priority
>> >
>> > But be aware! Most of those tickets are outdated: the code was
>> > already written, or the ticket was replaced by something else,
>> > etc. Before starting to write anything send me an email and I'll
>> > let you know.
>> >
>> >> I think short development cycles would be a good idea.
>> >> http://zaproxy.blogspot.com.es/2012/10/zap-weekly-releases.html
>> >
>> > At this moment it is impossible to achieve that. We could do it
>> > once the threading2 branch is done... but it doesn't make much
>> > sense either. Weekly releases mean one of two things: unstable or
>> > tons of work. And the tons of work required to make a weekly
>> > release stable makes no sense to me.
>> >
>> > Regards,
>> >
>> >> On Mon, Oct 29, 2012 at 11:34 PM, Andres Riancho
>> >> <andres.rian...@gmail.com> wrote:
>> >>>
>> >>> Tomas,
>> >>>
>> >>> On Fri, Oct 12, 2012 at 12:02 PM, Andres Riancho
>> >>> <andres.rian...@gmail.com> wrote:
>> >>> > Tomas,
>> >>> >
>> >>> > On Sun, Oct 7, 2012 at 2:55 PM, Tomas Velazquez
>> >>> > <tomas.velazqu...@gmail.com> wrote:
>> >>> >> Andres,
>> >>> >>
>> >>> >> I don't touch find_dvcs because it is Adam Baldwin's code
>> >>> >> and I don't know if he would let me change it ... ok, I will
>> >>> >> add my code to find_dvcs :)
>> >>> >
>> >>> > It is open source, and if you're improving it... nobody will
>> >>> > complain.
>> >>> >
>> >>> > I'm not saying that you HAVE to use find_dvcs, I was just
>> >>> > mentioning that the plugins look alike and that before
>> >>> > replacing one with the other (or something similar) we should
>> >>> > understand what each provides. Note that we shouldn't keep
>> >>> > both, that would only confuse users.
>> >>> >
>> >>> >> find_dvcs uses these strings to check for the existence of
>> >>> >> repositories:
>> >>> >> .git/HEAD
>> >>> >> .hg/requires
>> >>> >> .bzr/README
>> >>> >>
>> >>> >> I use the repository index files to check this. Should I
>> >>> >> keep the previously mentioned files?
>> >>> >
>> >>> > You should use the files you think are more convenient to
>> >>> > reduce the amount of HTTP requests and increase the quality
>> >>> > of the detection. For example, could .bzr/README be removed
>> >>> > and the bzr repository still work? Could the content be
>> >>> > edited manually and make the detection fail? In the case of
>> >>> > the "repository index files" it sounds like the repository
>> >>> > will not work if you remove/edit those.
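Since we keep going back and forth on which fingerprint files to use,
here's a small standalone sketch (plain Python, no w3af imports) of
content-based detection that combines both approaches. The
expected-content markers and the example URL are illustrative
assumptions on my part, not the plugin's actual tables:

    import urllib.request
    import urllib.error

    # .git/HEAD, .hg/requires and .bzr/README are what find_dvcs
    # checks today; .svn/entries and CVS/Entries are the kind of
    # "repository index files" Tomas' code relies on.
    FINGERPRINTS = {
        'git': ('.git/HEAD', 'ref:'),      # repo breaks without it
        'hg':  ('.hg/requires', 'revlog'), # required by hg to operate
        'bzr': ('.bzr/README', ''),        # informational, may be absent
        'svn': ('.svn/entries', ''),       # repository index file
        'cvs': ('CVS/Entries', ''),        # repository index file
    }

    def check_repos(base_url):
        # Return the repo types whose fingerprint file exists under
        # base_url (a directory URL ending in '/').
        found = []
        for repo, (path, marker) in FINGERPRINTS.items():
            try:
                body = urllib.request.urlopen(base_url + path,
                                              timeout=5).read()
            except (urllib.error.URLError, OSError):
                continue
            # Match on expected content, not just a 200 response, to
            # avoid false positives from catch-all error pages.
            if marker.encode() in body:
                found.append(repo)
        return found

    print(check_repos('http://moth/w3af/crawl/find_dvcs/'))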
>> >>> Did you have the time to merge these two plugins? I would love
>> >>> to review that code, add it to the threading2 branch and remove
>> >>> this from my TODO list :)
>> >>>
>> >>> Regards,
>> >>>
>> >>> >> Regards
>> >>> >>
>> >>> >> On Fri, Oct 5, 2012 at 9:44 PM, Andres Riancho
>> >>> >> <andres.rian...@gmail.com> wrote:
>> >>> >>>
>> >>> >>> List, Tomas,
>> >>> >>>
>> >>> >>> > -
>> >>> >>> > https://code.google.com/p/tvelazquez/source/browse/pentest/w3af-plugins/crawl/rcs.py
>> >>> >>>
>> >>> >>> I noticed that this is an improvement over find_dvcs [0],
>> >>> >>> which adds features for detecting SVN, CVS, etc. and also
>> >>> >>> parses some of the identified files; neat! What else is in
>> >>> >>> this file? Why a rewrite instead of just adding stuff to
>> >>> >>> find_dvcs?
>> >>> >>>
>> >>> >>> [0] https://sourceforge.net/apps/trac/w3af/browser/branches/threading2/plugins/crawl/find_dvcs.py
>> >>> >>>
>> >>> >>> Regards,

--
Andrés Riancho
Project Leader at w3af - http://w3af.org/
Web Application Attack and Audit Framework
Twitter: @w3af
GPG: 0x93C344F3