Hi Kirby,

I couldn't have expressed it more clearly myself.

Thanks

Julien

On 7 July 2011 00:30, Kirby Bohling <[email protected]> wrote:

> What I remember from the list discussions was:
>
> Nutch shouldn't implement everything under the sun, and doesn't need
> to invent (and worse, maintain) the wheel. Lots of the existing
> projects now handle big chunks of the problems that Nutch originally
> implemented internally.
>
> * Nutch no longer implements Map-Reduce, that was spun off as Hadoop
>   (as I understand it).
> * Tika started off by somebody taking the Nutch parsers and turning
>   them into an independent project.
> * Nutch no longer directly indexes using Lucene, instead it lets Solr
>   handle that.
>
> Nutch implemented a lot of useful and reusable infrastructure which
> others noticed and spun off into separate projects (and, in Hadoop's
> case, ecosystems). I am pretty sure that Julien's quote is about
> which pieces of the puzzle Nutch is going to do the heavy lifting on
> itself, and which pieces it is going to delegate to some other
> project. Even the crawler-commons project mentioned on the list is
> all about spinning out useful re-usable components.
>
> The problem Nutch is tackling is large and difficult. The number of
> code contributors is actually fairly small, hence the extreme focus on
> re-using high quality code.
>
> All that is to say, Nutch still has the same goals and ultimately
> provides all the same functionality, it just isn't going to suffer
> from "Not Invented Here" syndrome.
>
> Kirby
>
>
> On Wed, Jul 6, 2011 at 6:04 PM, Mattmann, Chris A (388J)
> <[email protected]> wrote:
> > Also note that quotes can easily be taken out of context. Let's let
> > Julien be specific and explain what he means rather than interpret
> > his quotes.
> >
> > I'm not sure many of the high level goals of Nutch have changed one
> > bit since Doug started the project. The means, and the mechanism for
> > getting there, have a little bit, hopefully to its benefit.
> >
> > You can read about some of this in my ApacheCon NA 2010 presentation:
> >
> > http://s.apache.org/UvU
> >
> > Cheers,
> > Chris
> >
> > On Jul 6, 2011, at 1:21 PM, <[email protected]> <[email protected]> wrote:
> >
> >> Julien Nioche wrote:
> >>
> >> "This is a change in the scope of the project from being an open source
> >> large scale search engine to an open source crawler indeed. We should
> >> make this clearer on the website."
> >>
> >> Just a crawler? That is what worries me. When I knew Nutch 0.3, I loved
> >> its original purpose. I think that most users, like me, do not have the
> >> technical abilities to deal with further issues, quite complicated for
> >> non-programmers.
> >>
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: [email protected]
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
>

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

