When you say that 1.x is more stable, what does that mean?
From: Markus Jelsma <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Monday, October 31, 2016 9:39 AM
Subject: RE: Nutch 1.x or 2.x
Hello - if you want to crawl at large scale, performance is not really a
problem, especially when using Hadoop output file compression. We chose 1.x
simply because it is more stable and feature-rich.
Using 1.x, it is quite easy to crawl a billion records.
Also, do not run on many small machines; the overhead will kill your
cluster-wide performance. It is a complete waste of resources.
-----Original message-----
> From: Michael Coffey <[email protected]>
> Sent: Sunday 30th October 2016 18:22
> To: [email protected]
> Subject: Re: Nutch 1.x or 2.x
>
> Newbie question: I am trying to decide between Nutch 1.x or 2.x. The
> application is to crawl a large portion of the www using a massive number
> (thousands) of small machines (<= 2GB RAM each). I like the idea of the
> simpler architecture and pluggable storage backend of 2.x. However, I am
> concerned about things I've read about 2.x being less stable and possibly
> less efficient than 1.x. Are these concerns valid at this time?