For sure, 1.x is more stable and that’s been the case for years. 2.x is great, and has some interesting functionality and adaptability, but it is not as scalable as 1.x.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-502
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 10/31/16, 11:20 AM, "Michael Coffey" <[email protected]> wrote:

When you say that 1.x is more stable, what does that mean?

From: Markus Jelsma <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Monday, October 31, 2016 9:39 AM
Subject: RE: Nutch 1.x or 2.x

Hello - if you want to crawl big, performance is not really a problem, especially using Hadoop output file compression. We chose 1.x, simply because it is more stable and feature rich. Using 1.x, it is quite easy to crawl a billion records.

Also, do not run on many small machines, your overhead will kill your cluster wide performance. It is a complete waste of resources.

-----Original message-----
> From:Michael Coffey <[email protected]>
> Sent: Sunday 30th October 2016 18:22
> To: [email protected]
> Subject: Re: Nutch 1.x or 2.x
>
> Newbie question: I am trying to decide between Nutch 1.x or 2.x. The application is to crawl a large portion of the www using a massive number (thousands) of small machines (<= 2GB RAM each). I like the idea of the simpler architecture and pluggable storage backend of 2.x. However, I am concerned about things I've read about 2.x being less stable and possibly less efficient than 1.x. Are these concerns valid at this time?

