Here is an unpleasant truth: there is no up-to-date tutorial for Nutch. To 
make it even more interesting, the tutorial sometimes contradicts the actual 
behavior of Nutch because of recently introduced features or bugs. If you find 
such cases, please try to fix them and contribute to the project.

Welcome to the open source world.

That said, here are my recommendations as someone who started with Nutch less 
than a year ago:
1) If you just need a simple crawl, you are in luck. Simply run the crawl 
script, or run the individual steps described in the Nutch crawl tutorial.
2) If your use case is a bit more complex, you will start to run into problems, 
either with configuration or with bugs. So first have a look at the Nutch 
mailing list archive. If that doesn't help, try to figure it out yourself, and 
if that doesn't work either, ask here or on the developer list.
3) In most cases, you HAVE to open the code and fix or discover something. Nutch 
is a really complicated system, and you can easily spend 2-3 months just getting 
a full basic understanding of it. It gets even worse if you don't know Hadoop. 
If you don't, I do recommend reading "Hadoop: The Definitive Guide", because, 
well, Nutch is Hadoop.

So here we are: no pain, no gain.

Sent: Tuesday, March 06, 2018 at 7:42 PM
From: "Eric Valencia" <>
Subject: Re: Need Tutorial on Nutch
Thank you kindly, Yash. Yes, I did actually try some of the tutorials, but
they seem to be missing some of the steps required to successfully scrape
with Nutch.

On Tue, Mar 6, 2018 at 10:37 AM Yash Thenuan Thenuan <> wrote:

> I would suggest starting with the documentation on Nutch's website.
> It gives you an idea of how to start crawling and so on.
> Apart from that, there are no proper tutorials as such.
> Just start crawling, and if you get stuck somewhere, try to find something
> related to it on Google and in the Nutch mailing list archives.
> Ask questions if nothing helps.
> On 7 Mar 2018 00:01, "Eric Valencia" <> wrote:
> I'm a beginner with Nutch and need the best tutorials to get started. Can
> you guys let me know what you would advise yourselves if you were starting
> today (like me)?
> Eric
