Btw, i mixed up some things here (it was quite some time ago). I didn't use
SpiderMonkey at all, that is C/C++. What i didn't manage to get running was
Rhino/Java ScriptEngine and envjs. It also seemed that dom.js does not support
Rhino. Neither envjs nor dom.js is maintained anymore.
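
To illustrate the missing-DOM problem discussed in this thread, here is a
minimal sketch (not the code from the ParseFilter attempt). It assumes a
JavaScript engine is registered with javax.script under the name "javascript",
which is the case for the Rhino- or Nashorn-based engines bundled with older
JDKs but not for newer JDKs without an extra engine on the classpath. The class
and method names are made up for the example.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class NoDomDemo {
    /**
     * Asks a bare script engine for the type of the global `document`.
     * Returns null when no JavaScript engine is registered (assumption:
     * the engine, if any, is looked up by the name "javascript").
     */
    static String typeOfDocument() {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
        if (engine == null) {
            return null; // no engine available on this JVM
        }
        try {
            // A plain engine has no browser environment, so there is no
            // `document`, `window`, etc. unless something like envjs adds them.
            return String.valueOf(engine.eval("typeof document"));
        } catch (ScriptException e) {
            throw new IllegalStateException("script evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String result = typeOfDocument();
        System.out.println(result == null
                ? "no JavaScript engine found"
                : "typeof document = " + result);
    }
}
```

Running the page's scripts is the easy part; any script that touches the DOM
fails until a DOM implementation is loaded into the engine first.
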
-----Original message-----
> From:Markus Jelsma <[email protected]>
> Sent: Thursday 19th December 2013 15:43
> To: [email protected]
> Subject: RE: In reference to
> http://www.mail-archive.com/[email protected]/msg09999.html (Get HTML
> content generated by Javascript)
>
> All i did with Rhino was attempt to get it up and running inside a
> ParseFilter plugin. I did not succeed that time and didn't try it again. The
> Rhino website is quite confusing. I should work on it again some time.
>
> -----Original message-----
> > From:Patrick Kirsch <[email protected]>
> > Sent: Thursday 19th December 2013 15:39
> > To: [email protected]
> > Subject: Re: In reference to
> > http://www.mail-archive.com/[email protected]/msg09999.html (Get HTML
> > content generated by Javascript)
> >
> > Hey Markus,
> >
> > On 19.12.2013 15:25, Markus Jelsma wrote:
> > > Looks like PhantomJS is Qt/C++ based, which is not something i think we
> > > can use from Nutch's HtmlParser implementation. Please correct me again if
> > > i am wrong :) I think it must be entirely Java based, or we need a DOM
> > > environment written in Javascript such as EnvJS that we can run inside
> > > SpiderMonkey together with the page's Javascript.
> > Rhino is Java based, did you try it, and with what results?
> > Can you share that experience?
> >
> >
> > >
> > > Cheers
> > >
> > >
> > Regards
> > >
> > > -----Original message-----
> > >> From:Patrick Kirsch <[email protected]>
> > >> Sent: Thursday 19th December 2013 15:19
> > >> To: [email protected]
> > >> Subject: Re: In reference to
> > >> http://www.mail-archive.com/[email protected]/msg09999.html (Get
> > >> HTML content generated by Javascript)
> > >>
> > >> Hey,
> > >> On 19.12.2013 15:00, Markus Jelsma wrote:
> > >>> From what i understand, Selenium has to
> > >>> run as a service somewhere outside MapReduce, which is a problem in
> > >>> itself. Please correct me if i am wrong. If Selenium can emulate the
> > >>> DOM as just a library we could indeed process AJAX websites.
> > >> Selenium is intended for click automation and for simulating
> > >> (pre-defined) workflows usually done by users (e.g. a testing process).
> > >> So I'm not sure how this will work.
> > >> Given a random single-page site, it is not (definitely) clear which click
> > >> will produce ajax/json requests that result in changing the DOM
> > >> significantly.
> > >>
> > >>>
> > >>> I once ran tests in Nutch with SpiderMonkey and Rhino but didn't get
> > >>> it to work that time. Using SpiderMonkey or another Javascript engine
> > >>> is quite easy but without the DOM we're helpless.
> > >> Usually I use phantomjs, did you also try that?
> > >>
> > >> At least Selenium has waitFor() events (e.g. with XPATHs or IDs), so it
> > >> is possible to trigger ajax/json events and collect the rendered (html)
> > >> result.
> > >>>
> > >>>
> > >>> -----Original message-----
> > >>>> From:Lewis John Mcgibbney <[email protected]>
> > >>>> Sent: Thursday 19th December 2013 14:31
> > >>>> To: [email protected]
> > >>>> Subject: Re: In reference to
> > >>>> http://www.mail-archive.com/[email protected]/msg09999.html (Get
> > >>>> HTML content generated by Javascript)
> > >>>>
> > >>>> Hi Nibal,
> > >>>>
> > >>>> On Sun, Dec 15, 2013 at 11:26 PM, <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>>
> > >>>>> of Single Page Web-apps and JavaScript-only web-applications is
> > >>>>> sky-rocketing.....well, isn't this a high priority issue????
> > >>>>>
> > >>>>
> > >>>> It would appear not. Unless folk provide patches, it stays open: core
> > >>>> contributors have not got around to addressing this particular issue.
> > >>>>
> > >>>>
> > >>>>> If I had the technical knowledge, I would have contributed, but I
> > >>>>> don't think I have clearly gotten my head around understanding
> > >>>>> Nutch fully yet.
> > >>>>>
> > >>>>
> > >>>> That is a real shame. It's always nice to get contributions :)
> > >>>>
> > >>>>
> > >>>>>
> > >>>>> Note: my small research led me to a lot of Java based implementations,
> > >>>>> including Selenium, HttpUnit and CrawlAjax, as alternatives.
> > >>>>> I was wondering, in case this does not appear to be a high
> > >>>>> priority, does someone have any guidance to offer regarding this
> > >>>>> matter?
> > >>>>>
> > >>>>
> > >>>> Personally no i don't but maybe others do.
> > >>>>
> > >>>> Lewis
> > >>>>
> > >>>
> > >>
> > >>
> > >
> >
> >
>