When you say that 1.x is more stable, what does that mean?

From: Markus Jelsma <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Monday, October 31, 2016 9:39 AM
Subject: RE: Nutch 1.x or 2.x
   
Hello - if you want to crawl at large scale, performance is not really a problem,
especially when using Hadoop output file compression. We chose 1.x simply because
it is more stable and feature-rich.

Using 1.x, it is quite easy to crawl a billion records.

Also, do not run on many small machines; the overhead will kill your
cluster-wide performance. It is a complete waste of resources.
 
-----Original message-----
> From: Michael Coffey <[email protected]>
> Sent: Sunday 30th October 2016 18:22
> To: [email protected]
> Subject: Re: Nutch 1.x or 2.x
> 
> Newbie question: I am trying to decide between Nutch 1.x or 2.x. The 
> application is to crawl a large portion of the www using a massive number 
> (thousands) of small machines (<= 2GB RAM each). I like the idea of the 
> simpler architecture and pluggable storage backend of 2.x. However, I am 
> concerned about things I've read about 2.x being less stable and possibly 
> less efficient than 1.x. Are these concerns valid at this time?

   
