I'm doing some application/acceptance-level testing of web apps. Selenium is one option, but for things like "check that every item has X attribute set" I really want a program. JavaScript isn't a big deal for these apps, so I wrote up some Mechanize scripts.
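For example, here's the sort of check I mean; a minimal sketch using Test::WWW::Mechanize, where the URL and the "every image has alt text" rule are stand-ins for the real app and the real rule:

    use strict;
    use warnings;
    use Test::More;
    use Test::WWW::Mechanize;

    my $mech = Test::WWW::Mechanize->new;

    # Placeholder URL; the real tests would hit the app under test.
    $mech->get_ok("http://example.com/items");

    # "Every item has X attribute set", e.g. every image has alt text.
    for my $img ( $mech->find_all_images ) {
        ok( defined $img->alt, "image " . $img->url . " has alt text" );
    }

    done_testing;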
A lot of the information isn't particularly friendly to being scraped: tables and table cells and whatnot. What I'd like to do is use XPath to narrow down the amount of HTML I'm looking through: run an XPath query, get back the HTML sub-tree which matches, then do more XPath queries inside of that. Or maybe flatten it into text and just use regexes. Something like this:

    # Load a page for testing
    $mech->get_ok($url);

    # Run an XPath query on $mech->content. Return the resulting
    # XHTML nodes. Fail if it doesn't match.
    my $row = $mech->xpath_ok("//tr[@id='thing']");

    # Perform further queries on the XHTML row we found.
    # Render into text for more convenient testing.
    like $row->xpath("/td[1]")->as_text, qr/Foo/, "First cell";
    like $row->xpath("/td[2]")->as_text, qr/Bar/, "Second cell";

A combination of Test::WWW::Mechanize + HTML::TreeBuilder::XPath seems like just what I need. Test::HTML::Content can check whether an XPath query matches, but nothing more. Web::Scraper looks interesting, but a bit too high level.

Before I go ahead and code it up, I thought I'd ask: is there something out there which already does this? Or a better technique?

-- The interface should be as clean as newly fallen snow and its behavior as explicit as Japanese eel porn.
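PS: To make the question concrete, here's roughly the glue I'm picturing; a minimal sketch assuming Test::WWW::Mechanize and HTML::TreeBuilder::XPath, where xpath_ok is a helper I made up, not a method either module actually provides:

    use strict;
    use warnings;
    use Test::More;
    use Test::WWW::Mechanize;
    use HTML::TreeBuilder::XPath;

    my $mech = Test::WWW::Mechanize->new;
    $mech->get_ok("http://example.com/things");    # placeholder URL

    # Parse the fetched page into an XPath-queryable tree.
    my $tree = HTML::TreeBuilder::XPath->new_from_content( $mech->content );

    # xpath_ok: made-up helper. Runs a query from the given node,
    # passes if it matches, and returns the first matching node.
    sub xpath_ok {
        my ($node, $xpath, $name) = @_;
        my ($found) = $node->findnodes($xpath);
        ok( defined $found, $name || "xpath matched: $xpath" );
        return $found;
    }

    my $row = xpath_ok( $tree, '//tr[@id="thing"]', "found the row" );

    # Further queries are relative to the row we found.
    like( ($row->findnodes('td[1]'))[0]->as_text, qr/Foo/, "First cell" );
    like( ($row->findnodes('td[2]'))[0]->as_text, qr/Bar/, "Second cell" );

    done_testing;

The helper really wants to live on the Mech object itself, as in the wish-list API above; that's the part I'd have to code up.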