Hey,
On 19.12.2013 15:00, Markus Jelsma wrote:
 From what I understand, Selenium has to run as a service somewhere outside 
MapReduce, which is a problem in itself. Please correct me if I am wrong. 
If Selenium can emulate the DOM as just a library, we could indeed process 
AJAX websites.
Selenium is intended for click automation and for simulating (pre-defined) workflows usually performed by users (e.g. in a testing process).
So I'm not sure how this would work.
Given a random single-page site, it is not (definitely) clear which click will trigger AJAX/JSON requests that change the DOM significantly.


I did tests once in Nutch with SpiderMonkey and Rhino but didn't get it to 
work at the time. Using SpiderMonkey or another JavaScript engine is quite easy, 
but without the DOM we're helpless.
Usually I use PhantomJS; did you also try that?

At least Selenium has waitFor() events (e.g. with XPaths or IDs), so it is possible to trigger AJAX/JSON events and collect the rendered (HTML) result.
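
To make the waitFor() idea concrete, here is a minimal Java sketch of the general pattern: poll a snapshot of the page until a condition holds or a timeout expires, then keep the rendered HTML. This is not Selenium's API. The names WaitForSketch, waitFor, snapshot and ready are hypothetical, and the snapshot supplier stands in for whatever browser or DOM emulation actually produces the markup.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper illustrating a "wait for" poll loop; not part of Selenium or Nutch.
final class WaitForSketch {

    /**
     * Polls snapshot until ready accepts the returned HTML or timeoutMillis elapses.
     * Returns the accepted HTML, or null on timeout or interruption.
     */
    static String waitFor(Supplier<String> snapshot, Predicate<String> ready,
                          long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            String html = snapshot.get();          // current state of the (emulated) DOM
            if (ready.test(html)) {
                return html;                       // condition met: this is the rendered result
            }
            if (System.currentTimeMillis() >= deadline) {
                return null;                       // gave up waiting
            }
            try {
                Thread.sleep(pollMillis);          // back off before the next poll
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return null;
            }
        }
    }
}
```

A caller would pass something like `html -> html.contains("id='results'")` as the condition, which is the plain-string equivalent of waiting for an element ID. The hard part described above remains: knowing which click to trigger before waiting.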


-----Original message-----
From:Lewis John Mcgibbney <[email protected]>
Sent: Thursday 19th December 2013 14:31
To: [email protected]
Subject: Re: In reference to 
http://www.mail-archive.com/[email protected]/msg09999.html (Get HTML 
content generated by Javascript)

Hi Nibal,

On Sun, Dec 15, 2013 at 11:26 PM, <[email protected]> wrote:


of Single Page Web-apps and JavaScript-only web-applications is
sky-rocketing.....well, isn't this a high priority issue????


It would appear not. Core contributors have not got around to addressing
this particular issue, so it will likely wait until folk provide patches.


If I had the technical knowledge, I would have contributed, but I don't
think I have clearly gotten my head around understanding
Nutch fully yet.


That is a real shame. It's always nice to get contributions :)



Note: my small research led me to a lot of Java-based implementations,
with Selenium, HttpUnit and CrawlAjax among the alternatives.
I was wondering: in case this does not appear to be a high priority, does
someone have any guidance to offer regarding this matter?


Personally no, I don't, but maybe others do.

Lewis


