Re: Web forum crawling using nutch

Patrick Kirsch Mon, 01 Sep 2014 07:22:08 -0700

On 06.08.2014 10:24, Ali Nazemian wrote:
> Dear all,
> Hi,
> - Some forums use JavaScript for paging, and JavaScript is a client-side
> programming language. Somehow it has to be parsed by Nutch.
Parsing plain JavaScript files (plain links) is possible.
The situation is difficult when links are generated at runtime (e.g. by
click events) through a JavaScript framework like jQuery.
In that case Nutch needs to behave more like a browser and needs the help
of Selenium, PhantomJS, XULRunner, etc.
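
To illustrate the difference, here is a minimal sketch in Python (not Nutch's
own code). The regex, the function name and the example.com URLs are
assumptions made up for this example; it only shows why a link written
literally in a script can be found by static parsing, while a link assembled
inside a click handler cannot.

```python
import re

# Matches absolute http(s) URLs that appear inside single- or double-quoted
# JavaScript string literals. This pattern is an illustrative assumption,
# not the one Nutch uses.
URL_IN_STRING = re.compile(r"""["'](https?://[^"'\s]+)["']""")


def extract_plain_links(js_source):
    """Return URLs that appear literally in the JavaScript source,
    in order of appearance and without duplicates."""
    seen = []
    for url in URL_IN_STRING.findall(js_source):
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical forum script with the next page written out as a plain link.
STATIC_JS = 'var next = "http://forum.example.com/thread/42?page=2";'

# Hypothetical jQuery handler that builds the next-page URL at click time.
DYNAMIC_JS = (
    '$("#next").click(function () {'
    ' load("/thread/" + id + "?page=" + (p + 1)); });'
)

print(extract_plain_links(STATIC_JS))   # the literal URL is found
print(extract_plain_links(DYNAMIC_JS))  # nothing to find without running the script
```

The second case is the one that needs a real browser engine: the URL only
exists after the script has been executed.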
> - Nutch's depth setting for crawling becomes useless, since each page
> is counted as a new depth level. Infinite depth is not an option either,
> because it could leave us with infinite crawling!
> - More...
> I would really appreciate it if somebody could guide me through this subject.
> Best regards.
> 
Regards,
 Patrick
