From what I remember of the list discussions: Nutch shouldn't implement everything under the sun, and doesn't need to reinvent (and worse, maintain) the wheel. Many existing projects now handle big chunks of the problems that Nutch originally solved internally.

* Nutch no longer implements Map-Reduce; that was spun off as Hadoop (as I understand it).
* Tika started off by somebody taking the Nutch parsers and turning them into an independent project.
* Nutch no longer indexes directly with Lucene; instead it lets Solr handle that.

Nutch implemented a lot of useful and reusable infrastructure, which others noticed and spun off into separate projects (and, in Hadoop's case, ecosystems).

I am pretty sure that Julien's quote is about which pieces of the puzzle Nutch is going to do the heavy lifting for itself, and which pieces it is going to delegate to some other project. Even the crawler-commons project mentioned on the list is all about spinning out useful, reusable components.

The problem Nutch is tackling is large and difficult. The number of code contributors is actually fairly small, hence the extreme focus on reusing high-quality code.

All that is to say, Nutch still has the same goals and ultimately provides all the same functionality; it just isn't going to suffer from "Not Invented Here" syndrome.

Kirby

On Wed, Jul 6, 2011 at 6:04 PM, Mattmann, Chris A (388J) <[email protected]> wrote:
> Also note that quotes can easily be taken out of context. Let's let Julien be
> specific and explain what he means rather than interpret his quotes.
>
> I'm not sure many of the high level goals of Nutch have changed one bit since
> Doug started the project. The means, and the mechanism for getting there, have
> a little bit, hopefully to its benefit.
>
> You can read about some of this in my ApacheCon NA 2010 presentation:
>
> http://s.apache.org/UvU
>
> Cheers,
> Chris
>
> On Jul 6, 2011, at 1:21 PM, <[email protected]>
> <[email protected]> wrote:
>
>> Julien Nioche, wrote:
>>
>> "This is a change in the scope of the project from being an open source
>> large scale search engine to an open source crawler indeed. We should make
>> this clearer on the website."
>>
>> Just a crawler? That is what worries me. When I knew nutch 0.3, I loved
>> its original purpose. I think that most users, like me, do not have the
>> technical abilities to deal with further issues, quite complicated for
>> non-programmers.
>>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

