Thanks Thomas,
  No, I'm not wed to Template::Extract at all. I was drawn to it
because I am going to be doing a lot of scraping for a project, and I
wanted to externalize the template for each target page rather than
embed it in code written for one particular page format. Do you know
of any other modules that can do this? My goal is basically to fetch a
given URL, extract certain data from the page, and create an RSS feed
from it.

thanks again
Chad

On 8/18/05, Thomas, Mark - BLS CTR <[EMAIL PROTECTED]> wrote:
> Chad,
> 
> Sorry that this is a somewhat OT remark, but I like to use XPath to parse
> HTML using XML::LibXML. This code does the same thing:
> 
> #!/usr/bin/perl -w
> use strict;
> use LWP::Simple;
> use XML::LibXML;
> 
> my $parser = XML::LibXML->new(recover => 2);
> my $doc = $parser->parse_html_string(
>     get("http://www.timeanddate.com/worldclock/")
> );
> 
> my @data;
> 
> # Find the table cells containing time (the only cells with a class of "r")
> foreach my $time_cell ($doc->findnodes('//tr/td[@class="r"]')) {
>     push @data, {
>         # The city name is in the preceding cell
>         city => $time_cell->findvalue('preceding-sibling::td[1]/a'),
>         time => $time_cell->textContent,
>     };
> }
> 
> use Data::Dumper;
> print Dumper(\@data);
> __END__
> 
> Result:
> 
> $VAR1 = [
>           {
>             'city' => 'Abu Dhabi',
>             'time' => 'Thu 11:34 PM'
>           },
>           {
>             'city' => 'Halifax',
>             'time' => 'Thu 4:34 PM'
>           },
>           {
>             'city' => 'New Orleans',
>             'time' => 'Thu 2:34 PM'
>           },
>           ...
> ];
> 
>
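
For what it's worth, the XPath approach quoted above could probably be
adapted to the "externalized template" idea by keeping the XPath
expressions in data rather than in the scraping code. Below is a minimal
sketch under some assumptions: the %templates hash, its "row" and
"fields" keys, the "worldclock" site name, and the extract() helper are
all hypothetical names made up for illustration, and the inline HTML is
stand-in markup so the example runs without network access. It is not
part of Template::Extract or XML::LibXML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical per-site "templates": the XPath expressions live in data,
# not in the scraping code. This hash could just as well be loaded from
# a config file; the key names here are illustrative assumptions.
my %templates = (
    worldclock => {
        # XPath selecting one node per record
        row    => '//tr/td[@class="r"]',
        # XPath expressions evaluated relative to each record node
        fields => {
            city => 'preceding-sibling::td[1]/a',
            time => '.',
        },
    },
);

# Apply one template to an HTML string; returns a list of hashrefs,
# one per matched record node.
sub extract {
    my ($html, $template) = @_;

    my $parser = XML::LibXML->new(recover => 2);
    my $doc    = $parser->parse_html_string($html);

    my @items;
    foreach my $node ($doc->findnodes($template->{row})) {
        my %item;
        foreach my $name (keys %{ $template->{fields} }) {
            $item{$name} = $node->findvalue($template->{fields}{$name});
        }
        push @items, \%item;
    }
    return @items;
}

# Stand-in markup (an assumption, not the real page source).
my $html = <<'HTML';
<html><body><table>
<tr><td><a href="#">Abu Dhabi</a></td><td class="r">Thu 11:34 PM</td></tr>
<tr><td><a href="#">Halifax</a></td><td class="r">Thu 4:34 PM</td></tr>
</table></body></html>
HTML

my @data = extract($html, $templates{worldclock});
printf "%s: %s\n", $_->{city}, $_->{time} for @data;
```

Supporting a new target page would then mean adding another entry to
%templates rather than writing new code, and the resulting list of
hashes could be handed to an RSS module (XML::RSS on CPAN is one
option) to build the feed.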

_______________________________________________
templates mailing list
[email protected]
http://lists.template-toolkit.org/mailman/listinfo/templates