One option is to write a custom protocol implementation that uses the
Selenium API to navigate to the page, let the JavaScript execute, and return
the resulting content as bytes for the parser to process. You do indeed need
a Selenium server running. We used ChromeDriver as a Selenium-compatible
server to do some bespoke navigation from a page, and that worked fine.
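A minimal sketch of that idea in Java follows. All names in it (`RenderedPageFetcher`, `Browser`, `FetchedContent`) are hypothetical and are not Nutch's actual protocol plugin interfaces. The browser is hidden behind a small interface, on the assumption that in a real deployment it would be backed by a Selenium `WebDriver` session connected to a running server such as ChromeDriver. The fetcher's only job is to turn the rendered HTML into bytes plus a content type that a parser can consume.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch of a fetcher that lets a JavaScript-capable browser render a page
 * and hands the resulting HTML to the parser as bytes.
 * All names are hypothetical; this is not Nutch's Protocol API.
 */
public class RenderedPageFetcher {

    /** Hypothetical abstraction over a browser session (e.g. a Selenium WebDriver). */
    public interface Browser {
        /** Loads the URL, lets its scripts run and returns the current DOM serialized as HTML. */
        String renderedSource(String url);
    }

    /** Hypothetical result holder: the raw bytes plus the content type the parser needs. */
    public static final class FetchedContent {
        public final String url;
        public final byte[] bytes;
        public final String contentType;

        FetchedContent(String url, byte[] bytes, String contentType) {
            this.url = url;
            this.bytes = bytes;
            this.contentType = contentType;
        }
    }

    private final Browser browser;

    public RenderedPageFetcher(Browser browser) {
        this.browser = browser;
    }

    /** Fetches a page through the browser so the parser sees the HTML after JavaScript ran. */
    public FetchedContent fetch(String url) {
        String html = browser.renderedSource(url);
        if (html == null) {
            throw new IllegalStateException("Browser returned no content for " + url);
        }
        // The browser hands back a String, so encode it explicitly and
        // advertise the same charset to the parser.
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        return new FetchedContent(url, bytes, "text/html; charset=UTF-8");
    }

    public static void main(String[] args) {
        // Stand-in browser for illustration only; a real one would drive Selenium.
        Browser fake = url -> "<html><body><p>rendered " + url + "</p></body></html>";
        FetchedContent content = new RenderedPageFetcher(fake).fetch("http://example.com/");
        System.out.println(content.contentType + ", " + content.bytes.length + " bytes");
    }
}
```

With Selenium on the classpath, the `Browser` could be supplied as `url -> { driver.get(url); return driver.getPageSource(); }`, where `driver` is a `WebDriver` connected to the Selenium server. Note that `getPageSource()` may return before asynchronous requests have finished, so a real implementation would probably need an explicit wait before reading the DOM.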


On 19 December 2013 14:00, Markus Jelsma <[email protected]> wrote:

> From what I understand, Selenium requires a Selenium server to run as a
> service somewhere outside MapReduce, which is a problem in itself.
> Please correct me if I am wrong. If Selenium could emulate the DOM as
> just a library, we could indeed process AJAX websites.
>
> I once did some tests in Nutch with SpiderMonkey and Rhino but didn't get
> it to work at the time. Using SpiderMonkey or another JavaScript engine is
> quite easy, but without the DOM we're helpless.
>
>
> -----Original message-----
> > From: Lewis John Mcgibbney <[email protected]>
> > Sent: Thursday 19th December 2013 14:31
> > To: [email protected]
> > Subject: Re: In reference to
> > http://www.mail-archive.com/[email protected]/msg09999.html (Get HTML
> > content generated by Javascript)
> >
> > Hi Nibal,
> >
> > On Sun, Dec 15, 2013 at 11:26 PM, <[email protected]> wrote:
> >
> > >
> > > of single-page web apps and JavaScript-only web applications is
> > > sky-rocketing. Well, isn't this a high-priority issue?
> > >
> >
> > It would appear not. In the absence of patches from the community, the
> > core contributors have not yet got around to addressing this particular
> > issue.
> >
> >
> > > If I had the technical knowledge, I would have contributed, but I don't
> > > think I have fully got my head around Nutch yet.
> > >
> >
> > That is a real shame. It's always nice to get contributions :)
> >
> >
> > >
> > > Note: my brief research turned up several Java-based alternatives,
> > > including Selenium, HttpUnit and Crawljax. In case this does not turn
> > > out to be a high priority, does anyone have any guidance to offer on
> > > this matter?
> > >
> >
> > Personally, no, I don't, but maybe others do.
> >
> > Lewis
> >
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
