Re: Distributed Crawling

Sebastian Nagel Tue, 12 Jan 2016 05:21:44 -0800

Hi,

Nutch was designed as a *distributed* crawler.


This tutorial should help:
 https://wiki.apache.org/nutch/NutchHadoopTutorial
(it may be a little outdated, esp. for 1.11,
 which switched from Hadoop 1.2 to 2.4;
 we are grateful for any updates and additions.
 Thanks!)

It's not easy to manage a Hadoop cluster, so
- you may want to start by learning how to run
  Nutch in pseudo-distributed mode:
  http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
- or run Nutch on a Hadoop cluster in the cloud (e.g., on AWS)

There are many people sharing their experience out there,
just google for:
 nutch distributed crawling
 nutch aws
or have a look at Julien's recent video tutorial:
 https://www.youtube.com/watch?v=v9zjcTjjjyU

Cheers,
Sebastian

On 01/12/2016 01:19 AM, Manish Verma wrote:
> Hello Friends,
> 
> I am using Nutch 1.10 and want to do distributed crawling for speed. Is this
> supported in Nutch 1.x or 2.x?
> Is there any documentation on this?
> 
> Thanks Manish
>
