Thanks!

-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]]
Sent: Sunday, May 31, 2015 21:56
To: [email protected]
Subject: Re: Nutch 2.X vs. 1.X

Hi Chaushu,

On Sun, May 31, 2015 at 12:30 AM, <[email protected]> wrote:

> I'm using Nutch 1.9 with Solr 4.10.
> I wanted to ask what the advantages of Nutch 2 are vs. Nutch 1, and
> whether, if I use Solr, there is a reason why I should use Nutch 2.

The Nutch 1.X branch is the more actively maintained of the two Nutch codebases. It sees more community contributions and has seen more releases recently.
Nutch 2.X should be used if you have a justified reason to access Nutch crawl data from one of the Gora-supported datastores, such as HBase.
Both scale very well and work well on official Hadoop 1.X distributions. Nutch 2.X works on Hadoop 2.X. I think we are still not quite at a point where Nutch 1.X is fully supported on Hadoop 2.X.

> (I understand that the difference is that Nutch 2 uses NoSQL - but if I
> use Solr, I can access the data from there..)

Correct. There is a gora-solr module to which you can map your Nutch WebPages and Web Graph (WebDB), as well as your Host DB.

hth
Lewis

