Hi - it depends on the estimated size of your data and the available hardware.
You can simply get the current 1.0.x stable or 1.1.x beta Hadoop version; both
will run fine. The real choice is which Nutch to use. 1.x is very stable, has
more features, and can be used for very large scale crawls, although you might
have to use a bit more hardware. 2.x is more efficient in writing and reading
data but is also less stable, so you will run into more problems that divert
you from your core tasks.

If you have a few powerful machines and your data is in the TB range, 1.x is
fine. If you like a challenge, 2.x is the way to go. We process many TBs each
month on just a few powerful machines and run a modified 1.x.
 
-----Original message-----
> From:許懷文 <[email protected]>
> Sent: Mon 24-Dec-2012 18:17
> To: [email protected]
> Subject: About the version of the nutch
> 
> Dear Nutch Project Team:
> 
> I am interested in Nutch and Hadoop and want to apply them to big
> data analysis, but I have some problems with their versions.
> I want to set up a search engine by myself, and I have chosen
> Hadoop+Nutch+Solr+Hbase to implement it.
> Would you mind suggesting suitable versions of them for this setup? I will
> appreciate your kind reply and helpful suggestions.
> Thanks!
> Best regards,
> Kevin Hsu.
> 
