Hi - it depends on the estimated size of your data and the available hardware. For Hadoop you can simply take the current 1.0.x stable or the 1.1.x beta release; both will run fine. The real choice is which Nutch to use. 1.x is very stable, has more features, and can be used for very large-scale crawls, although you might need a bit more hardware. 2.x is more efficient at reading and writing data but is also less stable, so you will run into more problems that divert you from your core tasks.
If you have a few powerful machines and your data is in the TB range, 1.x is fine. If you like a challenge, 2.x is the way to go. We process many TBs each month on just a few powerful machines and run a modified 1.x.

-----Original message-----
> From:許懷文 <[email protected]>
> Sent: Mon 24-Dec-2012 18:17
> To: [email protected]
> Subject: About the version of the nutch
>
> Dear Nutch Project Team:
>
> I am interested in Nutch and Hadoop and want to use them to apply to big
> data analysis; but I have some problems with the version of them.
> I want to set up a search engine by myself, and I also choose the
> Hadoop+Nutch+Solr+Hbase to implement it.
> Would you mind give me the suitable version of them to set them up? I will
> appreciate your kind reply and helpful suggestions.
> Thanks!
> Best regards,
> Kevin Hsu.
>

