> I have been using Perl's WWW::Mechanize to scrape a series of web pages.
>
> Unfortunately the web page now includes some Javascript, which
> mechanize does not handle. Suggestions?
>
> All of my code is in shell script and perl, so I'd like to stick with
> those.
>
> Suggestions?

+1 for Selenium with a real browser. I have used it with Python quite extensively over the last year, but I imagine the Perl API is reasonable as well. The Python bindings are more polished, though, and you could invoke a Python script from Perl.

It is somewhat heavyweight and slow, but you can point the browser's DISPLAY at an Xvfb virtual display and run it headless, which at the very least saves you the on-screen graphics rendering. You can do all kinds of cool things: wait for the browser to render a certain element, inject your own JavaScript, take screenshots. You can also combine simple scraping with Selenium-style scraping, e.g. if you detect something that you know you cannot handle with a simple HTML tree parser, you just delegate it to Selenium.

Another nice thing is that you can add an option to your script to show you the visuals and pause at certain breakpoints. I do this all the time when doing testing and web development. E.g. I have some code already that gets me to a certain part of the UI, and after that I do not know what to do. Instead of having to get there manually (which means I have to remember how, and then actually do it), I just have my script take me there and then pause, waiting for a key press in the terminal. Once it gets there, I use the DOM inspector in the browser to figure out the ID or XPath, or maybe even some custom JavaScript, that I need to advance to the next stage, and also what to validate and how.

In fact, nowadays I do most of my web development with the help of Selenium. This way my brain is relieved of the tedium of having to click and type the same thing over and over, and when I am done I have an automated regression test to ensure I will never break what I just coded without the test suite raising a red flag.

And, for a bonus, if you want to demonstrate a web UI failure or feature to a coworker, you can get their desktop set up with a local instance of Selenium, and then you just give your test an argument to point to their instance.
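
To make the hybrid approach a bit more concrete, here is a rough Python sketch, assuming the Selenium 4 Python bindings and Firefox. The names has_element_id, make_driver, render_with_browser, fetch, and the --remote and --pause options are made up for illustration; they are not from any existing script. It tries a plain HTTP fetch first and only falls back to a browser when the element you want is missing from the static HTML.

```python
import argparse
import urllib.request
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any tag in the document carries the wanted id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_element_id(html, wanted_id):
    """Cheap check: is the element already present in the static HTML?"""
    finder = IdFinder(wanted_id)
    finder.feed(html)
    return finder.found


def make_driver(remote_url=None):
    """Start a local Firefox, or attach to a Selenium server elsewhere."""
    # Imported lazily so the plain-HTML path works without Selenium installed.
    from selenium import webdriver

    if remote_url:
        return webdriver.Remote(command_executor=remote_url,
                                options=webdriver.FirefoxOptions())
    return webdriver.Firefox()


def render_with_browser(driver, url, wanted_id, timeout=10, pause=False):
    """Load the page in a real browser and wait for the element to appear."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.ID, wanted_id)))
    if pause:
        # Breakpoint: poke around with the DOM inspector, then continue.
        input("Paused; inspect the browser, then press Enter... ")
    return driver.page_source


def fetch(url, wanted_id, remote_url=None, pause=False):
    """Simple scrape first; delegate to Selenium only when that falls short."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    if has_element_id(html, wanted_id):
        return html
    driver = make_driver(remote_url)
    try:
        return render_with_browser(driver, url, wanted_id, pause=pause)
    finally:
        driver.quit()


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hybrid scraper sketch")
    parser.add_argument("url")
    parser.add_argument("element_id")
    parser.add_argument("--remote", default=None,
                        help="URL of a Selenium server to use instead of a local browser")
    parser.add_argument("--pause", action="store_true",
                        help="stop after the page renders and wait for Enter")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print(fetch(args.url, args.element_id, args.remote, args.pause))
```

If you saved this as scrape.py (a hypothetical name) and called main() at the bottom, something like "python scrape.py http://example.com/ results --pause" would stop once the element with id "results" has rendered, and adding "--remote http://coworker-box:4444" (hostname made up) would drive the browser on a coworker's Selenium instance instead of your own.
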
Of course in that case you need to make sure that unwanted parties do not have access to the Selenium port.

--
Sasha Pachev
Fast Running Blog.
http://fastrunningblog.com
Run. Blog. Improve. Repeat.

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
