Thanks Thomas,

No, I'm not wed to Template::Extract at all. I was drawn to it because I'm going to be doing a lot of scraping for a project, and I wanted to externalize the template for each target page rather than embed it in the code for one particular page format. Do you know of any other modules that might be able to accomplish this? My goal is basically to extract certain data from a given URL and create an RSS feed from it.
Thanks again,
Chad

On 8/18/05, Thomas, Mark - BLS CTR <[EMAIL PROTECTED]> wrote:
> Chad,
>
> Sorry that this is a somewhat OT remark, but I like to use XPath to parse
> HTML using XML::LibXML. This code does the same thing:
>
> #!/usr/bin/perl -w
> use strict;
> use LWP::Simple;
> use XML::LibXML;
>
> my $parser = XML::LibXML->new(XML_LIBXML_RECOVER => 2);
> my $doc = $parser->parse_html_string(
>     get "http://www.timeanddate.com/worldclock/"
> );
>
> my @data;
>
> # Find the table cells containing time (the only cells with a class of "r")
> foreach my $time_cell ($doc->findnodes('//tr/td[@class="r"]')) {
>     push @data, {
>         # The city name is in the preceding cell
>         city => $time_cell->findvalue('preceding-sibling::td[1]/a'),
>         time => $time_cell->textContent,
>     };
> }
>
> use Data::Dumper;
> print Dumper(\@data);
> __END__
>
> Result:
>
> $VAR1 = [
>     {
>         'city' => 'Abu Dhabi',
>         'time' => 'Thu 11:34 PM'
>     },
>     {
>         'city' => 'Halifax',
>         'time' => 'Thu 4:34 PM'
>     },
>     {
>         'city' => 'New Orleans',
>         'time' => 'Thu 2:34 PM'
>     },
>     ...
> ];

_______________________________________________
templates mailing list
[email protected]
http://lists.template-toolkit.org/mailman/listinfo/templates
