Hi Chaushu,

On Sun, May 31, 2015 at 12:30 AM, <[email protected]> wrote:

>
> I'm using Nutch 1.9 with Solr 4.10
> I wanted to ask what the advantages of Nutch 2 over Nutch 1 are, and
> whether, if I use Solr, there is any reason why I should use Nutch 2.
>

The Nutch 1.X branch is the better maintained of the two Nutch codebases. It
sees more community contributions and has had more releases recently. Nutch
2.X should be used if you have a justified reason to access Nutch crawl
data from one of the Gora-supported datastores, such as HBase. Both scale
very well and work well on official Hadoop 1.X distributions. Nutch 2.X
also works on Hadoop 2.X. I think we are still not quite at a point where
Nutch 1.X is fully supported on Hadoop 2.X.


> (I understand that the difference is that Nutch 2 uses NoSQL - but if I use
> Solr, I can access the data from there..)
>
>
Correct. There is a gora-solr module to which you can map your Nutch WebPages
and Web Graph (WebDB), as well as your Host DB.
hth
Lewis
