For sure, 1.x is more stable, and that’s been the case for years. 2.x is great
and has some interesting functionality and adaptability, but it is not as
scalable as 1.x.


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-502
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

On 10/31/16, 11:20 AM, "Michael Coffey" <[email protected]> wrote:

    When you say that 1.x is more stable, what does that mean?
    
    
    From: Markus Jelsma <[email protected]>
    To: "[email protected]" <[email protected]>
    Sent: Monday, October 31, 2016 9:39 AM
    Subject: RE: Nutch 1.x or 2.x

    Hello. If you want to crawl at large scale, performance is not really a
    problem, especially when using Hadoop output file compression. We chose
    1.x simply because it is more stable and feature-rich.
    
    Using 1.x, it is quite easy to crawl a billion records.
    
    Also, do not run on many small machines; the overhead will kill your
    cluster-wide performance. It is a complete waste of resources.
     
    -----Original message-----
    > From: Michael Coffey <[email protected]>
    > Sent: Sunday 30th October 2016 18:22
    > To: [email protected]
    > Subject: Re: Nutch 1.x or 2.x
    > 
    > Newbie question: I am trying to decide between Nutch 1.x and 2.x. The
    > application is to crawl a large portion of the WWW using a massive number
    > (thousands) of small machines (<= 2 GB RAM each). I like the idea of the
    > simpler architecture and pluggable storage backend of 2.x. However, I am
    > concerned about things I've read about 2.x being less stable and possibly
    > less efficient than 1.x. Are these concerns valid at this time?