Hi Kirby,

On Jul 6, 2011, at 4:30 PM, Kirby Bohling wrote:
> From what I remember about the list discussions was:

Quotes and links would help here.

>
> Nutch shouldn't implement everything under the sun, and doesn't need
> to invent (and worse maintain) the wheel. Lots of the existing
> projects now handle big chunks of the problems that Nutch originally
> implemented internally.
>
> * Nutch no longer implements Map-Reduce, that was spun off as Hadoop
> (as I understand it).

Right, yep.

> * Tika started off by somebody taking the Nutch parsers and turning
> them into an independent project.

Yep, that was me and a few other people.

> * Nutch no longer directly indexes using Lucene, instead it lets Solr
> handle that.

Sure, that's been done recently, from Nutch 1.2 on.

>
> Nutch implemented a lot of useful and reusable infrastructure which
> others noticed, spun off and created separate projects (and in
> Hadoop's case ecosystems). I am pretty sure that Julien's quote is
> about the piece of puzzle that Nutch is going to contribute the heavy
> lifting, and which pieces it is going to delegate the heavy lifting to
> some other project.

Right, and if you check out my ApacheCon NA slides, you'll also find a
similar quote in there. Ever since the Nutch2 design discussions [1], a
number of us have had that as a vision.

> Even the crawler-commons project mentioned on the
> list is all about spinning out useful re-usable components.

Sure, that's probably one part of their goals.

>
> The problem Nutch is tackling is large and difficult. The number of
> code contributors is actually fairly small, hence the extreme focus on
> re-using high quality code.

Where are you getting "the number of code contributors is actually fairly
small"? We've added 3 significantly active committers over the past 2 years
including Markus, Julien, Lewis and others. I've been doing a ton of releasing.
We get updates and fixes from folks more than ever now that we are releasing
(again, I point you to the ApacheCon presentation for some thoughts on this).
Nutch has had, and maintains, a tremendous community and a number of active
users. For a while it was definitely in coast mode, but I think we've made
great strides over the past 2 years to rectify that.

>
> All that is to say, Nutch still has the same goals and ultimately
> provides all the same functionality, it just isn't going to suffer
> from "Not Invented Here" syndrome.

Sure, it wouldn't suffer from it because most of the others that are inventing
elsewhere were also original contributors to Nutch.

Cheers,
Chris

[1] http://s.apache.org/B7u

>
> Kirby
>
>
> On Wed, Jul 6, 2011 at 6:04 PM, Mattmann, Chris A (388J)
> <[email protected]> wrote:
>> Also note that quotes can easily be taken out of context. Let's let Julien
>> be specific and explain what he means rather than interpret his quotes.
>>
>> I'm not sure many of the high level goals of Nutch have changed one bit since
>> Doug started the project. The means, and the mechanism for getting there,
>> have a little bit, hopefully to its benefit.
>>
>> You can read about some of this in my ApacheCon NA 2010 presentation:
>>
>> http://s.apache.org/UvU
>>
>> Cheers,
>> Chris
>>
>> On Jul 6, 2011, at 1:21 PM, <[email protected]>
>> <[email protected]> wrote:
>>
>>> Julien Nioche wrote:
>>>
>>> "This is a change in the scope of the project from being an open source
>>> large scale search engine to an open source crawler indeed. We should make
>>> this clearer on the website."
>>>
>>> Just a crawler? That is what worries me. When I knew nutch 0.3, I loved
>>> its original purpose. I think that most users, like me, do not have the
>>> technical abilities to deal with further issues, quite complicated for
>>> non-programmers.
>>>
>>>
>>>
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

