Javier,

On Tue, Jul 19, 2011 at 10:21 AM, Javier Andalia
<javier_anda...@rapid7.com> wrote:
> List,
>
> This is my attempt to improve the performance of the xpath evaluation given
> a DOM Element.
> The original (and current) version is in httpResponse.py. Examples of how
> this is used can be found at:
> ajax.py, fileUpload.py, formAutocomplete.py, etc
>
>
>    def getDOM2(self):
>        '''
>        TODO: Put docstring here
>        '''
>        class DOM(object):
>
>            def xpath(self, tag, xpathpredicate='.'):
>                xpath = etree.XPath(xpathpredicate)
>                root = etree.fromstring(self.body,
>                                        etree.HTMLParser(recover=True))
>
>                context = etree.iterwalk(root, events=('start',), tag=tag)
>                try:
>                    for evt, elem in context:
>                        if xpath(elem):
>                            yield elem
>                        while elem.getprevious() is not None:
>                            del elem.getparent()[0]
>                except etree.XPathSyntaxError:
>                    om.out.debug('Invalid XPath expression: "%s"' %
>                                 xpathpredicate)
>                    raise
>                del context

    Are you sure that this is equivalent to the old implementation?

    My guess is that the old implementation is faster because it is C
code behind a thin Python wrapper, while this one is Python calling many
different C functions many times. Have you profiled it [0] to see WHERE
the CPU time is spent?

[0] http://code.google.com/p/jrfonseca/wiki/Gprof2Dot
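
    As a rough sketch of that kind of check, using only the standard
library: cProfile records where the time goes and pstats prints the most
expensive calls. The names profile_call and sample_workload are
hypothetical, and the workload is only a stand-in for the real loop over
res.getDOM2().xpath(...).

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the 15 most expensive entries, sorted by cumulative time.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(15)
    return result, buf.getvalue()


def sample_workload(n=20000):
    # Hypothetical stand-in for the code being measured.
    return sum(len(str(i)) for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(sample_workload)
    print(report)
```

Saving the data with profiler.dump_stats("out.pstats") instead of printing
it should give a file that gprof2dot can read through its pstats input
format (-f pstats).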

>        dom = DOM()
>        dom.body = self.body
>        return dom
>
> Unfortunately this didn't work out as expected. It is slower.
>
>>>>  code = '''
> f = open("index-form-two-fields.html")
> html = f.read()
> f.close()
> u = url_object('http://w3af.com')
> res = core.data.url.httpResponse.httpResponse(200, html, {'content-type': 'text/html'}, u, u)
> for i in res.getDOM2().xpath('input', "translate(@type,'PASWORD','pasword')='password'"):
>    pass
> '''
>>>>  setup = '''import sys
> sys.path.append('/home/jandalia/workspace/w3af.unicode');
> from core.data.parsers.urlParser import url_object;
> import core.data.url.httpResponse
> '''
>>>>  t = timeit.Timer(code, setup)
>>>>  min(t.repeat(repeat=3, number=10000))
> 27.584304094314575
>
> Using the original version:
>
>>>>  code = '''
> f = open("/home/jandalia/Desktop/index-form-two-fields.html")
> html = f.read()
> f.close()
> u = url_object('http://w3af.com')
> res = core.data.url.httpResponse.httpResponse(200, html, {'content-type': 'text/html'}, u, u)
> dom = res.getDOM()
> for i in dom.xpath("//input[translate(@type,'PASWORD','pasword')='password']"):
>    pass
> '''
>>>>  t = timeit.Timer(code, setup)
>>>>  min(t.repeat(repeat=3, number=10000))
> 3.8396580219268799
>
>
> In other words, it is about 7 times slower.
> If anyone has an idea on how to improve this code, it would be much
> appreciated. The HTML doc used for the tests is attached.
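
    The "about 7 times" figure follows from the two timings quoted above
(27.58 vs 3.84 seconds for 10000 runs). A small sketch for re-running this
kind of comparison with timeit follows; best_time and slowdown are
hypothetical helper names, not part of w3af.

```python
import timeit


def best_time(func, repeat=3, number=1000):
    """Minimum total time over `repeat` rounds of `number` calls each."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


def slowdown(candidate, baseline, repeat=3, number=1000):
    """How many times slower `candidate` is than `baseline`."""
    return (best_time(candidate, repeat, number) /
            best_time(baseline, repeat, number))


# Ratio of the two timings reported in this thread.
reported_ratio = 27.584304094314575 / 3.8396580219268799
print(round(reported_ratio, 2))  # 7.18
```

Taking the minimum of the repeats, as the original session does, keeps
noise from other processes out of the comparison.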
>
> Thanks!
>
> Javier
>
> Note: Some useful info can be found here:
> http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
>
>
>
> ------------------------------------------------------------------------------
> Magic Quadrant for Content-Aware Data Loss Prevention
> Research study explores the data loss prevention market. Includes in-depth
> analysis on the changes within the DLP market, and the criteria used to
> evaluate the strengths and weaknesses of these DLP solutions.
> http://www.accelacomm.com/jaw/sfnl/114/51385063/
> _______________________________________________
> W3af-develop mailing list
> W3af-develop@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/w3af-develop
>
>



-- 
Andrés Riancho
Director of Web Security at Rapid7 LLC
Founder at Bonsai Information Security
Project Leader at w3af
