I'm doing some application/acceptance-level testing of web apps. Selenium is one option, but for things like "check that every item has X attribute set" I really want a program. JavaScript isn't a big deal for these apps, so I wrote up some Mechanize scripts.
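For example, here's the sort of check I mean; a minimal sketch using Test::WWW::Mechanize, where the URL and the "every image has alt text" rule are stand-ins for the real app and the real rule:

    use strict;
    use warnings;
    use Test::More;
    use Test::WWW::Mechanize;

    my $mech = Test::WWW::Mechanize->new;

    # Placeholder URL; the real tests would hit the app under test.
    $mech->get_ok("http://example.com/items");

    # "Every item has X attribute set", e.g. every image has alt text.
    for my $img ( $mech->find_all_images ) {
        ok( defined $img->alt, "image " . $img->url . " has alt text" );
    }

    done_testing;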
A lot of the information isn't particularly friendly to being scraped: tables and table cells and whatnot. What I'd like to do is use XPath to narrow down the amount of HTML I'm looking through: run an XPath query, get back the HTML sub-tree which matches, then do more XPath queries inside of that. Or maybe flatten it into text and just use regexes. Something like this:

    # Load a page for testing
    $mech->get_ok($url);

    # Run an XPath query on $mech->content. Return the resulting
    # XHTML nodes. Fail if it doesn't match.
    my $row = $mech->xpath_ok("//tr[@id='thing']");

    # Perform further queries on the XHTML row we found.
    # Render into text for more convenient testing.
    like $row->xpath("/td[1]")->as_text, qr/Foo/, "First cell";
    like $row->xpath("/td[2]")->as_text, qr/Bar/, "Second cell";

A combination of Test::WWW::Mechanize + HTML::TreeBuilder::XPath seems like just what I need. Test::HTML::Content can check whether an XPath query matches, but nothing more. Web::Scraper looks interesting, but a bit too high level.

Before I go ahead and code it up, I thought I'd ask: is there something out there which already does this? Or a better technique?

-- The interface should be as clean as newly fallen snow and its behavior as explicit as Japanese eel porn.
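PS: To make the question concrete, here's roughly the glue I'm picturing; a minimal sketch assuming Test::WWW::Mechanize and HTML::TreeBuilder::XPath, where xpath_ok is a helper I made up, not a method either module actually provides:

    use strict;
    use warnings;
    use Test::More;
    use Test::WWW::Mechanize;
    use HTML::TreeBuilder::XPath;

    my $mech = Test::WWW::Mechanize->new;
    $mech->get_ok("http://example.com/things");    # placeholder URL

    # Parse the fetched page into an XPath-queryable tree.
    my $tree = HTML::TreeBuilder::XPath->new_from_content( $mech->content );

    # xpath_ok: made-up helper. Runs a query from the given node,
    # passes if it matches, and returns the first matching node.
    sub xpath_ok {
        my ($node, $xpath, $name) = @_;
        my ($found) = $node->findnodes($xpath);
        ok( defined $found, $name || "xpath matched: $xpath" );
        return $found;
    }

    my $row = xpath_ok( $tree, '//tr[@id="thing"]', "found the row" );

    # Further queries are relative to the row we found.
    like( ($row->findnodes('td[1]'))[0]->as_text, qr/Foo/, "First cell" );
    like( ($row->findnodes('td[2]'))[0]->as_text, qr/Bar/, "Second cell" );

    done_testing;

The helper really wants to live on the Mech object itself, as in the wish-list API above; that's the part I'd have to code up.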