msg09999.html (Get HTML content generated by Javascript)

Markus Jelsma Thu, 19 Dec 2013 06:02:35 -0800

>From what I understand, Selenium has to run as a service somewhere outside
>MapReduce, which is a problem in itself. Please correct me if I am wrong. If
>Selenium can emulate the DOM as just a library, we could indeed process AJAX
>websites.


I once ran tests in Nutch with SpiderMonkey and Rhino but didn't get them to 
work at the time. Using SpiderMonkey or another JavaScript engine is quite 
easy, but without the DOM we're helpless.
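
To illustrate the point about a bare engine lacking the DOM, here is a minimal
sketch. It is not Nutch code. It assumes the JDK's javax.script API and that a
JavaScript engine happens to be registered (older JDKs bundled Rhino or
Nashorn; recent JDKs ship none, in which case the sketch just reports that).
The class name NoDomDemo and the helper tryEval are hypothetical.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class NoDomDemo {

    /**
     * Evaluates a script and reports the outcome as a short string.
     * Hypothetical helper for illustration only.
     */
    public static String tryEval(ScriptEngine engine, String script) {
        if (engine == null) {
            // No JavaScript engine is registered with this JDK.
            return "no-engine";
        }
        try {
            return "ok: " + engine.eval(script);
        } catch (ScriptException e) {
            // A bare engine defines no `document`, so DOM access ends up here.
            return "error";
        }
    }

    public static void main(String[] args) {
        // May be null on JDKs that do not bundle a JavaScript engine.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");

        // Plain JavaScript needs nothing but the engine itself.
        System.out.println(tryEval(engine, "1 + 2"));

        // Page scripts that touch the DOM fail: nothing provides `document`.
        System.out.println(tryEval(engine, "document.getElementById('content').innerHTML"));
    }
}
```

When an engine is available, the first call evaluates normally while the second
fails, because the engine only implements the language; the browser objects a
page script expects would have to be supplied by a separate DOM emulation.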
 
 
-----Original message-----
> From:Lewis John Mcgibbney <[email protected]>
> Sent: Thursday 19th December 2013 14:31
> To: [email protected]
> Subject: Re: In reference to 
> http://www.mail-archive.com/[email protected]/msg09999.html (Get HTML 
> content generated by Javascript)
> 
> Hi Nibal,
> 
> On Sun, Dec 15, 2013 at 11:26 PM, <[email protected]> wrote:
> 
> >
> > of Single Page Web-apps and JavaScript-only web-applications is
> > sky-rocketing.....well, isn't this a high priority issue????
> >
> 
> It would appear not. Core contributors have not got around to addressing
> this particular issue, so it will wait unless folk provide patches.
> 
> 
> > If I had the technical knowledge, I would have contributed, but I don't
> > think I have clearly gotten my head around understanding
> > Nutch fully yet.
> >
> 
> That is a real shame. It's always nice to get contributions :)
> 
> 
> >
> > Note: my brief research led me to a lot of Java-based implementations,
> > with Selenium, HttpUnit and CrawlAjax among the alternatives.
> > I was wondering: in case this does not appear to be a high priority, does
> > someone have any guidance to offer regarding this matter?
> >
> 
> Personally, no, I don't, but maybe others do.
> 
> Lewis
>
