One option is to write a custom protocol implementation which uses the Selenium API to navigate the page, let the JavaScript execute, and return the resulting content as bytes for the parser to process. You do indeed need to have a Selenium server running. We used ChromeDriver as a Selenium-compatible server to do some bespoke navigation from a page, and that worked fine.
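
A rough sketch of the idea, not a working Nutch plugin: the class and interface names below (RenderedFetcher, PageRenderer) are made up for illustration, and the Nutch protocol plumbing is left out entirely. The renderer is kept behind a small interface so the sketch runs without a browser. The Selenium wiring shown in the comment assumes the Selenium Java client is on the classpath and that ChromeDriver is listening on localhost:9515, which I believe is its default port.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

/** Hypothetical sketch: fetch a page through a JavaScript-capable renderer. */
final class RenderedFetcher {

    /** Anything that can load a URL, run its scripts, and return the resulting HTML. */
    interface PageRenderer {
        String render(String url);
    }

    private final PageRenderer renderer;

    RenderedFetcher(PageRenderer renderer) {
        this.renderer = Objects.requireNonNull(renderer);
    }

    /** Returns the rendered page as UTF-8 bytes, ready to hand to a parser. */
    byte[] fetch(String url) {
        String html = renderer.render(url);
        if (html == null) {
            return new byte[0]; // nothing rendered, hand back empty content
        }
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stub renderer so the sketch runs on its own.
        // With Selenium available, the renderer could instead look like this
        // (assumed setup, URL and port are placeholders):
        //   WebDriver driver = new RemoteWebDriver(
        //       new URL("http://localhost:9515"), new ChromeOptions());
        //   PageRenderer r = u -> { driver.get(u); return driver.getPageSource(); };
        PageRenderer stub = u -> "<html><body>rendered " + u + "</body></html>";
        byte[] content = new RenderedFetcher(stub).fetch("http://example.com/");
        System.out.println(content.length + " bytes");
    }
}
```

Keeping the browser behind an interface also means the Selenium server stays outside the MapReduce job, with the fetcher only talking to it over HTTP.
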

On 19 December 2013 14:00, Markus Jelsma <[email protected]> wrote:

> From what i understood about Selenium is that it requires Selenium to run
> as service somewhere outside MapReduce, which is a problem in itself.
> Please correct me if i am wrong. If Selenium can emulate the DOM as just a
> library we could indeed process AJAX websites.
>
> I've did tests once in Nutch with SpiderMonkey and Rhino but didn't get it
> to work that time. Using SpiderMonkey or another Javascript engine is quite
> easy but without the DOM we're helpless.
>
>
> -----Original message-----
> > From:Lewis John Mcgibbney <[email protected]>
> > Sent: Thursday 19th December 2013 14:31
> > To: [email protected]
> > Subject: Re: In reference to http://www.mail-archive.com/[email protected]/msg09999.html (Get HTML content generated by Javascript)
> >
> > Hi Nibal,
> >
> > On Sun, Dec 15, 2013 at 11:26 PM, <[email protected]> wrote:
> >
> > >
> > > of Single Page Web-apps and JavaScript-only web-applications is
> > > sky-rocketing.....well, isn't this a high priority issue????
> > >
> >
> > It would appear not. Unless folk provide patches then core contributers
> > have not got around to addressing this particular issue.
> >
> >
> > > If I had the technical knowledge, I would have contributed, but I don't
> > > think I have clearly gotten my head around understanding
> > > Nutch fully yet.
> > >
> >
> > That is a real shame. Its always nice to get contributions :)
> >
> >
> > >
> > > Note: my small research led me to a lot of Java based implementation
> > > including Selenium, HttpUnit and CrawlAjax being alternatives.
> > > I was wondering if in case this does not appear to be a high priority, does
> > > someone have any guidance to offer regarding this matter?
> > >
> >
> > Personally no i don't but maybe others do.
> >
> > Lewis
> >
>

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

