Hi Markus,

This is very useful, thank you.

Lewis

On Mon, Feb 25, 2013 at 3:08 PM, Markus Jelsma <[email protected]> wrote:
> Something seems to be missing here. It's clear that 1.x has more features
> and is a lot more stable than 2.x. Nutch 2.x can theoretically perform a
> lot better if you are going to crawl on a very large scale, but I still
> haven't seen any numbers to support this assumption. Nutch 1.x can easily
> deal with many millions of records, and with billions if you throw some
> hardware at it.
>
> Most users are not going to crawl millions of records. In that case I
> personally choose 1.x. I prefer stability and predictability over some
> performance you are not likely to need anyway.
>
> Besides our large 1.x research cluster we still use 1.x in production for
> all our customers, running locally on a 2 core 512MB RAM VPS with a crawldb
> of over 5 million records and it runs fine, fast and keeps up with newly
> discovered URLs. The only significant improvements were a better scoring
> filter and integrating indexing in the fetcher.
>
> -----Original message-----
> > From:Lewis John Mcgibbney <[email protected]>
> > Sent: Mon 25-Feb-2013 23:37
> > To: [email protected]
> > Subject: Re: Differences between 2.1 and 1.6
> >
> > Hi Danilo,
> >
> > You can check out the architecture changes here
> > http://wiki.apache.org/nutch/#Nutch_2.x
> >
> > Nutch trunk (1.7-SNAPSHOT) is here
> > http://svn.apache.org/repos/asf/nutch/trunk/
> >
> > 2.x is here
> > http://svn.apache.org/repos/asf/nutch/branches/2.x/
> >
> > On Mon, Feb 25, 2013 at 1:56 PM, Danilo Fernandes <
> > [email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > Can somebody tell me about the differences between 2.1 and 1.6?
> > >
> > > Is the SVN trunk 1.* or 2.*?
> > >
> > > Thanks,
> > > Danilo Fernandes
> > >
> >
> > --
> > *Lewis*
>

--
*Lewis*

