Thanks!

-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]] 
Sent: Sunday, May 31, 2015 21:56
To: [email protected]
Subject: Re: Nutch 2.X vs. 1.X

Hi Chaushu,

On Sun, May 31, 2015 at 12:30 AM, <[email protected]> wrote:

>
> I'm using Nutch 1.9 with Solr 4.10
> I wanted to ask what the advantages of Nutch 2 over Nutch 1 are, and, if 
> I use Solr, whether there is any reason why I should use Nutch 2.
>

The Nutch 1.X branch is the more actively maintained of the two Nutch codebases. It 
sees more community contributions and has had more releases recently. Nutch 2.X 
should be used if you have a justified reason to access Nutch crawl data from 
one of the Gora-supported datastores, such as HBase. Both scale very well and 
work well on official Hadoop 1.X distributions. Nutch 2.X also works on 
Hadoop 2.X; I think we are still not quite at the point where Nutch 1.X is fully 
supported on Hadoop 2.X.


> (I understand that the difference is that Nutch 2 uses NoSQL, but if I 
> use Solr, I can access the data from there.)
>
>
Correct. There is a gora-solr module to which you can map your Nutch WebPages and 
Web Graph (WebDB), as well as your Host DB.
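As a rough illustration of the "access the data from Solr" case raised in the question, here is a minimal Python sketch that reads crawl data back out of Solr through its standard /select handler. It assumes a local Solr 4.x instance at localhost:8983 with the default example core "collection1", populated by Nutch 1.X indexing, and that the schema contains "url" and "title" fields; the host, core name, field names, and the helper names build_query_url, parse_docs, and fetch_pages are all hypothetical and should be adjusted to your setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: local Solr 4.x, default example core "collection1",
# holding documents indexed by Nutch 1.X. Adjust for your deployment.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_query_url(query, fields=("url", "title"), rows=10, base=SOLR_SELECT):
    """Build a URL for Solr's standard /select handler, requesting JSON output."""
    params = urlencode({
        "q": query,              # Lucene query syntax, e.g. "title:nutch"
        "fl": ",".join(fields),  # fields to return (assumed to exist in the schema)
        "rows": rows,            # maximum number of documents to return
        "wt": "json",            # response writer: JSON
    })
    return f"{base}?{params}"


def parse_docs(body):
    """Extract the list of matching documents from a Solr JSON response body."""
    return json.loads(body)["response"]["docs"]


def fetch_pages(query, **kwargs):
    """Send the query to the running Solr instance and return matching documents."""
    with urlopen(build_query_url(query, **kwargs)) as resp:
        return parse_docs(resp.read().decode("utf-8"))
```

With Solr running, something like `fetch_pages("title:nutch", rows=5)` would return a list of dicts holding the requested fields. Keeping URL building and response parsing separate from the network call makes each piece easy to check without a live server.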
hth
Lewis
---------------------------------------------------------------------
Intel Electronics Ltd.