Javier,

On Tue, Jul 19, 2011 at 12:41 PM, Javier Andalia <javier_anda...@rapid7.com> wrote:
> On 07/19/2011 04:29 PM, Andres Riancho wrote:
>>
>> Javier,
>>
>> On Tue, Jul 19, 2011 at 12:18 PM, Javier Andalia
>> <javier_anda...@rapid7.com> wrote:
>>>
>>> On 07/19/2011 02:54 PM, Andres Riancho wrote:
>>>>
>>>> Javier,
>>>>
>>>> On Tue, Jul 19, 2011 at 10:21 AM, Javier Andalia
>>>> <javier_anda...@rapid7.com> wrote:
>>>>>
>>>>> List,
>>>>>
>>>>> This is my attempt to improve the performance of the xpath evaluation
>>>>> given a DOM Element.
>>>>> The original (and current) version is in httpResponse.py. Examples of
>>>>> how this is used can be found at:
>>>>> ajax.py, fileUpload.py, formAutocomplete.py, etc.
>>>>>
>>>>> def getDOM2(self):
>>>>>     '''
>>>>>     TODO: Put docstring here
>>>>>     '''
>>>>>     class DOM(object):
>>>>>         def xpath(self, tag, xpathpredicate='.'):
>>>>>             xpath = etree.XPath(xpathpredicate)
>>>>>             root = etree.fromstring(self.body,
>>>>>                                     etree.HTMLParser(recover=True))
>>>>>             context = etree.iterwalk(root, events=('start',), tag=tag)
>>>>>             try:
>>>>>                 for evt, elem in context:
>>>>>                     if xpath(elem):
>>>>>                         yield elem
>>>>>                     while elem.getprevious() is not None:
>>>>>                         del elem.getparent()[0]
>>>>>             except etree.XPathSyntaxError:
>>>>>                 om.out.debug('Invalid XPath expression: "%s"' %
>>>>>                              xpathpredicate)
>>>>>                 raise
>>>>>             del context
>>>>
>>>> Are you sure that this is equivalent to the old implementation?
>>>
>>> What do you mean? It is certainly a little more complex but still
>>> equivalent.
>>
>> Sorry for not being clear enough! My question was: is your
>> implementation going to return the same result as the old
>> implementation for ALL inputs?
>>
>
> Pretty sure! Note there's a slight variation on the way the 'xpath' method
> is called in the experimental implementation though.
>
> Typical lines such as:
>
> dom.xpath("//input[translate(@type,'PASWORD','pasword')='password']")
>
> were converted to:
>
> dom.xpath(tag='input',
>           xpathpredicate="translate(@type,'PASWORD','pasword')='password'")

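One way to check the equivalence claim on a concrete document is sketched below. It is only a minimal sketch, assuming lxml is installed; the sample markup and the names HTML, PREDICATE, old_style and new_style are made up for illustration and are not part of w3af. It runs the single "//input[...]" query and the per-element predicate over the same body and compares the matches.

```python
from lxml import etree

# Hypothetical sample document, not the attachment from this thread.
HTML = """
<html><body>
  <form>
    <input type="text" name="user">
    <input type="PassWord" name="pass">
    <div><input type="password" name="pass2"></div>
  </form>
</body></html>
"""

PREDICATE = "translate(@type,'PASWORD','pasword')='password'"


def old_style(body):
    # One XPath evaluation over the whole tree.
    root = etree.fromstring(body, etree.HTMLParser(recover=True))
    return root.xpath("//input[%s]" % PREDICATE)


def new_style(body, tag='input', predicate=PREDICATE):
    # Walk the tree and evaluate the compiled predicate once per matching tag.
    # The sibling deletion from the experimental version is left out here.
    check = etree.XPath(predicate)
    root = etree.fromstring(body, etree.HTMLParser(recover=True))
    walker = etree.iterwalk(root, events=('start',), tag=tag)
    return [elem for _evt, elem in walker if check(elem)]


old_names = [e.get('name') for e in old_style(HTML)]
new_names = [e.get('name') for e in new_style(HTML)]
print(old_names == new_names)
```

Agreement on one document does not settle the "ALL inputs" question. Predicates that depend on context position (position(), last(), or a bare number such as "2") are one case where the two forms can differ, because the per-element evaluation always sees a context of size one.
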
Interesting. I would still like to know where the majority of the CPU
time goes. gprof2dot can tell you that in a very visual way.

>
>>>> I'm guessing that the old implementation is faster because it's C
>>>> with a Python wrapper and this is "python calling many times different
>>>
>>> That makes sense. Additionally, I think it is slower because the xpath
>>> evaluation occurs *only once* in the original implementation. I
>>> definitely misunderstood what was explained in section "Finding
>>> elements quickly" of [1], where they focus on the use of 'find' and
>>> 'findall' vs more efficient alternatives. In our code we use simple
>>> and direct xpath evaluation. It seems nothing can be faster than that.
>>>
>>> Javier
>>>
>>>
>>> [1] http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
>>>
>>>> C functions" ? Have you tested [0] to see WHERE the CPU is consumed?
>>>>
>>>> [0] http://code.google.com/p/jrfonseca/wiki/Gprof2Dot
>>>>
>>>>>     dom = DOM()
>>>>>     dom.body = self.body
>>>>>     return dom
>>>>>
>>>>>
>>>>> Unfortunately this didn't work out as expected. It is slower.
>>>>>
>>>>> >>> code = '''
>>>>> f = open("index-form-two-fields.html")
>>>>> html = f.read()
>>>>> f.close()
>>>>> u = url_object('http://w3af.com')
>>>>> res = core.data.url.httpResponse.httpResponse(200, html,
>>>>>     {'content-type': 'text/html'}, u, u)
>>>>> for i in res.getDOM2().xpath('input',
>>>>>         "translate(@type,'PASWORD','pasword')='password'"):
>>>>>     pass
>>>>> '''
>>>>> >>> setup = '''import sys
>>>>> sys.path.append('/home/jandalia/workspace/w3af.unicode');
>>>>> from core.data.parsers.urlParser import url_object;
>>>>> import core.data.url.httpResponse
>>>>> '''
>>>>> >>> t = timeit.Timer(code, setup)
>>>>> >>> min(t.repeat(repeat=3, number=10000))
>>>>> 27.584304094314575
>>>>>
>>>>>
>>>>> Using the original version:
>>>>>
>>>>> >>> code = '''
>>>>> f = open("/home/jandalia/Desktop/index-form-two-fields.html")
>>>>> html = f.read()
>>>>> f.close()
>>>>> u = url_object('http://w3af.com')
>>>>> res = core.data.url.httpResponse.httpResponse(200, html,
>>>>>     {'content-type': 'text/html'}, u, u)
>>>>> dom = res.getDOM()
>>>>> for i in dom.xpath(
>>>>>         "//input[translate(@type,'PASWORD','pasword')='password']"):
>>>>>     pass
>>>>> '''
>>>>> >>> t = timeit.Timer(code, setup)
>>>>> >>> min(t.repeat(repeat=3, number=10000))
>>>>> 3.8396580219268799
>>>>>
>>>>>
>>>>> In other words, it is about 7 times slower.
>>>>> If anyone has an idea on how to improve this code it would be very
>>>>> much appreciated. The HTML doc used for the tests is attached.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Javier
>>>>>
>>>>> Note: Some useful info can be found here:
>>>>> http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Magic Quadrant for Content-Aware Data Loss Prevention
>>>>> Research study explores the data loss prevention market. Includes
>>>>> in-depth analysis on the changes within the DLP market, and the
>>>>> criteria used to evaluate the strengths and weaknesses of these DLP
>>>>> solutions.
>>>>> http://www.accelacomm.com/jaw/sfnl/114/51385063/
>>>>> _______________________________________________
>>>>> W3af-develop mailing list
>>>>> W3af-develop@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/w3af-develop

-- 
Andrés Riancho
Director of Web Security at Rapid7 LLC
Founder at Bonsai Information Security
Project Leader at w3af