I'm confused about what the significant differences between 1.x and
2.x are.
Is there a bit of history that I could read about why the two came to
be developed in parallel?

As I'm just starting out with Nutch/Solr/Hadoop, I'd like to know which
path would be best for me to follow. So far, 1.x has appeared to be the
best choice for me, but is that going to change in the next iteration?
Confused. And a little scared.

Guy McDowell
[email protected]
http://www.GuyMcDowell.com





On Fri, Aug 29, 2014 at 11:29 AM, Mattmann, Chris A (3980) <
[email protected]> wrote:

> +1, great.
>
> I'd like to have a conversation about versioning.
>
> Since we're at 1.9, my suggestion would be that the next
> release in the trunk series (1.x), after 1.9, moves to
> version 3.x.
>
> Nutch2 remains Nutch and can be worked on there. That
> would give us a nice split between the divergent branch
> paths for Nutch.
>
> Cheers,
> Chris
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: Julien Nioche <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, August 29, 2014 1:35 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: [RELEASE] Apache Nutch 1.9
>
> >Hi Lewis,
> >
> >A few comments below.
> >
> >> I use Nutch 2.x as it enables me to do analytics over the data I am
> >> crawling. This is my justification for trying to maintain and further the
> >> development on that branch over the last while.
> >>
> >
> >Just out of interest, what sort of analytics do you do and why is it
> >better to do it in 2.x than 1.x?
> >
> >
> >> I am also extremely interested in the technologies supported within the
> >> Nutch 2.X stack and I like keeping up with their development and using
> >> them to fix my problems if and when the problems arise.
> >> I like having fine-grained control over my storage architecture. This is
> >> also a pro for me.
> >>
> >
> >Another way to look at it is that having to maintain 2 versions in Nutch
> >is an absolute pain, especially given that there aren't very many active
> >committers.
> >IMHO the mistake we made a few years ago was to name the GORA-based branch
> >'2.x' as it leads people to think that it is an improvement over 1.x. We
> >should have called it something like Nutch-GORA or something along these
> >lines (the original version was called NutchBase) to underline that it is
> >a different beast, not necessarily a better one.
> >
> >Most users are probably not that bothered about the underlying
> >technologies and just want the stuff to work, not fix problems. In my view
> >2.x is not production ready, but rather an experimental branch.
> >
> >
> >
> >> The performance Julien talks about (and please correct me if I am wrong
> >> Julien) is not so much Nutch related as it is Gora. Different Gora
> >> backends perform differently; this is itself driven by who wishes to
> >> maintain them.
> >>
> >
> >Not really. The overall performance has improved a bit with the latest
> >version of GORA but is not that different from what we reported in
> >http://digitalpebble.blogspot.co.uk/2013/09/nutch-fight-17-vs-221.html.
> >Some backends are probably better than others indeed but all of them are
> >atrocious compared to 1.x. I think the reason for that is that these NoSQL
> >tools are optimized to provide random reads/writes to the data and in Nutch
> >we use them mostly in a sequential manner. Whether the functionalities we
> >gain are worth the effort depends on everyone's use case.
> >
> >
> >> On another note, we've identified that for users, Nutch 2.X is a bloody
> >> pain to provision and get running. This is a problem for this branch and
> >> for the people that invest and possibly waste time trying to determine
> >> revisions, etc.
> >>
> >
> >Could not agree more. That and the fact that it puts additional
> >constraints on the hardware and means servers with bigger specs (££££)
> >
> >
> >>
> >> It is my intention to build different Vagrant flavours for each Nutch
> >> 2.X stack.
> >> https://issues.apache.org/jira/browse/NUTCH-1812
> >>
> >> If ANYONE on this list is interested in helping with this effort then I
> >> would dedicate some time to document the process on the wiki so that it
> >> can be reproduced for everyone's benefit. I feel that this would be a huge
> >> move forward for the 2.X branch.
> >>
> >
> > Thanks for your enthusiasm and efforts Lewis!
> >
> >For anyone interested in 2.x - there are quite a few issues you can help
> >with if you feel so inclined, see
> >
> >https://issues.apache.org/jira/browse/NUTCH/fixforversion/12324325/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
> >
> >Julien
> >
> >--
> >
> >Open Source Solutions for Text Engineering
> >
> >http://digitalpebble.blogspot.com/
> >http://www.digitalpebble.com
> >http://twitter.com/digitalpebble
>
>
