Sorry, I called you Thomas ;) Your email fooled me.
OK, you almost have me sold. The only thing that still raises concerns
about externalizability is this bit:
findvalue('preceding-sibling::td[1]/a')
It seems like there are multiple script points that are tied to a
particular page, not just the XPath query. I guess that just means I
would need multiple configuration strings for a single target URL.
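
Roughly what I have in mind, as a sketch only: one config entry per
target URL, holding the item query plus one relative XPath string per
field, so that the findvalue() calls never hard-code a page-specific
path. The URL, the "date" class, the field names, and extract_items()
are all made up for illustration; only findnodes() and findvalue() come
from XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical per-site config: every page-specific string lives here,
# so it could just as well be loaded from a file or a database.
my %config = (
    'http://example.com/news' => {
        items  => '//td[@class="date"]',    # one context node per feed item
        fields => {                         # XPath relative to that node
            date  => '.',
            title => 'preceding-sibling::td[1]/a',
            link  => 'preceding-sibling::td[1]/a/@href',
        },
    },
);

# Generic extraction: knows nothing about any particular page layout.
sub extract_items {
    my ($doc, $site) = @_;
    my @items;
    for my $node ($doc->findnodes($site->{items})) {
        my %item;
        for my $field (keys %{ $site->{fields} }) {
            $item{$field} = $node->findvalue($site->{fields}{$field});
        }
        push @items, \%item;
    }
    return @items;
}

# Inline sample markup stands in for a fetched page.
my $html = '<table><tr><td><a href="/a1">First story</a></td>'
         . '<td class="date">2005-08-18</td></tr></table>';
my $doc   = XML::LibXML->new->parse_html_string($html);
my @items = extract_items($doc, $config{'http://example.com/news'});

print "$_->{date} $_->{title} $_->{link}\n" for @items;
```

If that holds up, a layout change on the target site should mean
editing the strings in %config rather than touching the code.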
Thanks for the help, Mark. This is getting me close to my goal.
Chad
On 8/18/05, Thomas, Mark - BLS CTR <[EMAIL PROTECTED]> wrote:
> Chad Armstrong wrote:
> > Thanks Thomas,
> > No no, I'm not wed to Template::Extract at all, but the reason I was
> > drawn to it is because I am going to be doing a lot of scraping for a
> > project and wanted to be able to externalize the template for the
> > various target pages, rather than embedding it for a particular page
> > format. Do you know of any other modules that might be able to
> > accomplish this? My goal is basically to extract certain data and
> > create an rss feed given a URL.
>
> Actually, I do that kind of thing a lot. I use XML::LibXML to parse URLs,
> and Template Toolkit to format the data into RSS, HTML snippets for a
> portal, etc.
>
> Anyway, I find XPath expressions to be very nice because they are simple
> strings and therefore externalizable (place into a config file, database,
> etc) so that maintenance is simpler. This is useful when the site's layout
> changes, which it will inevitably do from time to time. Also, I often find
> that just one well-crafted XPath expression is all that is needed to extract
> exactly what you want.
>
> If there's something else as convenient as XPath for parsing HTML, I haven't
> found it yet.
>
> - Mark.
>
>