Re: [RLUG] html parser for automated web site interaction? includes JavaScript handler?

Brian Chrisman Mon, 23 May 2005 14:10:18 -0700

On Mon, May 23, 2005 at 01:58:37PM -0700, Ben Johnson wrote:
> Hey.
> 
> When I demo'd my POS system at the last meeting I was asked by a couple
> people if I was using a certain package to parse the html.  I answered
> "no" and that I was using the LWP stuff, to which the response was
> "yeah, this package is built on top of the LWP stuff."  I don't remember
> the name of that package.  Will someone please refresh my memory?


btw, another option for doing this is to take the html, push it through
'tidy' or 'xmllint' to get well-formed markup, and perform xpath queries
on the result.  I used to do that to webscrape more reliably than what I
was doing before, i.e. regexp'ing for various strings within the html.
I would expect something like that is being used behind WWW::Mechanize,
though I'm not sure.
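
A minimal sketch of that tidy-then-xpath approach, in Python rather than
Perl and under a few assumptions: the 'tidy' command-line tool is installed,
its -asxml output is XHTML in the usual http://www.w3.org/1999/xhtml
namespace, and the limited XPath subset in the standard library's
ElementTree is enough for the queries. The function names and the "orders"
table id are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET

# tidy -asxml emits XHTML, so element names live in this namespace.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def tidy_to_xhtml(html: str) -> str:
    """Push messy HTML through the 'tidy' CLI to get well-formed XHTML."""
    if shutil.which("tidy") is None:
        raise RuntimeError("the 'tidy' command was not found on PATH")
    # --numeric-entities avoids named entities (e.g. &nbsp;) that a plain
    # XML parser does not know about.
    result = subprocess.run(
        ["tidy", "-q", "-asxml", "--numeric-entities", "yes"],
        input=html,
        capture_output=True,
        text=True,
    )
    # tidy exits with 1 for warnings and 2 for errors; only 2 is fatal here.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def link_targets(xhtml: str) -> list[str]:
    """Return the href of every <a> element that has one."""
    root = ET.fromstring(xhtml)
    return [a.get("href") for a in root.findall(".//x:a[@href]", XHTML_NS)]


def cell_text(xhtml: str, table_id: str) -> list[str]:
    """Return the stripped text of each <td> inside the table with this id."""
    root = ET.fromstring(xhtml)
    cells = root.findall(f".//x:table[@id='{table_id}']//x:td", XHTML_NS)
    return ["".join(cell.itertext()).strip() for cell in cells]
```

Usage would look like link_targets(tidy_to_xhtml(page_source)), where
page_source is whatever the HTTP client returned. Because the queries
follow the document structure, they tend to survive whitespace and
attribute-order changes that break a regexp.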

> 
> My greatest hope for this thing is that it can run the javascript found
> in a page.  I played around with adding some code to my parser that fed
> the parsed javascript into the perl JavaScript module.  It wouldn't run
> though because the 'document' object didn't exist.  Making a 'document'
> object seemed a bit overwhelming, so I stopped pursuing that strategy.
> If something like this already exists though,  and it works even
> moderately well, I think my script would be much more tolerant to small
> changes in the site I'm trying to interact with.
> 
> - Ben
> 
> 
> _______________________________________________
> RLUG mailing list
> [email protected]
> http://lists.rlug.org/mailman/listinfo/rlug
