Is that something that could work on a massive scale? If not, I'd prefer a
JavaScript engine and a DOM environment such as EnvJS for it to run in. It
is just very unfortunate that EnvJS hasn't been worked on for quite some time
now.
-----Original message-----
> From:Julien Nioche <[email protected]>
> Sent: Thursday 19th December 2013 15:05
> To: [email protected]
> Subject: Re: In reference to
> http://www.mail-archive.com/[email protected]/msg09999.html (Get HTML
> content generated by Javascript)
>
> One option is to write a custom protocol implementation which uses the
> Selenium API to navigate / resolve the JavaScript and return some byte
> content for the parser to process. You do need a Selenium server running,
> though. We used ChromeDriver as a Selenium-compatible server to do some
> bespoke navigation from a page, and that worked fine.
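A minimal Java sketch of the custom-protocol idea above, assuming a hypothetical `HtmlRenderer` interface and `RenderingFetcher` class (neither is part of Nutch's actual Protocol API). The Selenium calls shown in the comment (`RemoteWebDriver`, `JavascriptExecutor.executeScript`) are real WebDriver APIs, but the server address `http://localhost:4444/wd/hub` is an assumption. The `main` method uses a stub renderer so the sketch runs without a browser.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: hand the URL to a JavaScript-capable renderer
 * (e.g. a browser driven through Selenium) and return the rendered HTML
 * as raw bytes for a downstream parser. Not Nutch's Protocol interface.
 */
public class RenderingFetcher {

    /** Hypothetical abstraction over whatever executes the page's JavaScript. */
    public interface HtmlRenderer {
        /** Loads the URL, lets scripts run, and returns the resulting HTML. */
        String render(String url) throws Exception;
    }

    // A Selenium-backed renderer could look roughly like this (needs the
    // selenium-java dependency and a running Selenium/ChromeDriver server;
    // the hub URL below is an assumption):
    //
    //   WebDriver driver = new RemoteWebDriver(
    //       new URL("http://localhost:4444/wd/hub"), new ChromeOptions());
    //   HtmlRenderer selenium = url -> {
    //       driver.get(url); // waits for page load, not for later AJAX calls
    //       // serialize the live DOM after scripts have modified it
    //       return (String) ((JavascriptExecutor) driver)
    //           .executeScript("return document.documentElement.outerHTML;");
    //   };

    private final HtmlRenderer renderer;

    public RenderingFetcher(HtmlRenderer renderer) {
        this.renderer = renderer;
    }

    /** Returns the rendered page as UTF-8 bytes, the form a parser would consume. */
    public byte[] fetch(String url) throws Exception {
        String html = renderer.render(url);
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Stub renderer standing in for a real browser, so this runs without Selenium.
        RenderingFetcher fetcher = new RenderingFetcher(
            url -> "<html><body>rendered " + url + "</body></html>");
        byte[] content = fetcher.fetch("http://example.com/");
        System.out.println(new String(content, StandardCharsets.UTF_8));
    }
}
```

Keeping the renderer behind an interface means the external Selenium server can be replaced by a stub in tests, or by an embedded engine with a DOM later, without touching the code that hands bytes to the parser.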
>
>
> On 19 December 2013 14:00, Markus Jelsma <[email protected]> wrote:
>
> > From what I understand, Selenium needs to run as a service somewhere
> > outside MapReduce, which is a problem in itself. Please correct me if I am
> > wrong. If Selenium could emulate the DOM as just a library, we could indeed
> > process AJAX websites.
> >
> > I did some tests once in Nutch with SpiderMonkey and Rhino but couldn't get
> > it to work at the time. Using SpiderMonkey or another JavaScript engine is
> > quite easy, but without the DOM we're helpless.
> >
> >
> > -----Original message-----
> > > From:Lewis John Mcgibbney <[email protected]>
> > > Sent: Thursday 19th December 2013 14:31
> > > To: [email protected]
> > > Subject: Re: In reference to
> > > http://www.mail-archive.com/[email protected]/msg09999.html (Get HTML
> > > content generated by Javascript)
> > >
> > > Hi Nibal,
> > >
> > > On Sun, Dec 15, 2013 at 11:26 PM, <[email protected]> wrote:
> > >
> > > >
> > > > of Single Page Web-apps and JavaScript-only web applications is
> > > > skyrocketing... well, isn't this a high-priority issue?
> > > >
> > >
> > > It would appear not. Short of folks providing patches, the core
> > > contributors have not yet gotten around to addressing this particular issue.
> > >
> > >
> > > > If I had the technical knowledge, I would have contributed, but I don't
> > > > think I have fully gotten my head around Nutch yet.
> > > >
> > >
> > > That is a real shame. It's always nice to get contributions :)
> > >
> > >
> > > >
> > > > Note: my brief research turned up a number of Java-based
> > > > implementations as alternatives, including Selenium, HttpUnit and
> > > > CrawlAjax. In case this does not turn out to be a high priority, does
> > > > anyone have any guidance to offer on this matter?
> > > >
> > >
> > > Personally, no, I don't, but maybe others do.
> > >
> > > Lewis
> > >
> >
>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>