Hi Markus,
This is very useful, thank you.
Lewis

On Mon, Feb 25, 2013 at 3:08 PM, Markus Jelsma
<[email protected]> wrote:

> Something seems to be missing here. It's clear that 1.x has more features
> and is a lot more stable than 2.x. Nutch 2.x can theoretically perform a
> lot better if you are going to crawl on a very large scale, but I still
> haven't seen any numbers to support this assumption. Nutch 1.x can easily
> deal with many millions of records and deal with billions if you throw some
> hardware at it.
>
> Most users are not going to crawl millions of records. In that case I
> personally choose 1.x. I prefer stability and predictability over some
> performance you are unlikely to need anyway.
>
> Besides our large 1.x research cluster, we still use 1.x in production for
> all our customers, running locally on a 2-core, 512 MB RAM VPS with a crawldb
> of over 5 million records. It runs fine, is fast, and keeps up with newly
> discovered URLs. The only significant improvements were a better scoring
> filter and integrating indexing in the fetcher.
>
> -----Original message-----
> > From: Lewis John Mcgibbney <[email protected]>
> > Sent: Mon 25-Feb-2013 23:37
> > To: [email protected]
> > Subject: Re: Differences between 2.1 and 1.6
> >
> > Hi Danilo,
> >
> > You can check out the architecture changes here
> > http://wiki.apache.org/nutch/#Nutch_2.x
> >
> > Nutch trunk (1.7-SNAPSHOT) is here
> > http://svn.apache.org/repos/asf/nutch/trunk/
> >
> > 2.x is here
> > http://svn.apache.org/repos/asf/nutch/branches/2.x/
> >
> > On Mon, Feb 25, 2013 at 1:56 PM, Danilo Fernandes <
> > [email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > Can somebody tell me about the differences between 2.1 and 1.6?
> > >
> > > Is the SVN trunk 1.* or 2.*?
> > >
> > > Thanks,
> > > Danilo Fernandes
> > >
> > >
> >
> >
> > --
> > *Lewis*
> >
>



-- 
*Lewis*
