From what I remember of the list discussions:

Nutch shouldn't implement everything under the sun, and doesn't need
to reinvent (and, worse, maintain) the wheel.  Many existing projects
now handle big chunks of the problems that Nutch originally solved
internally.

* Nutch no longer implements Map-Reduce; that was spun off as Hadoop
(as I understand it).
* Tika started off by somebody taking the Nutch parsers and turning
them into an independent project.
* Nutch no longer directly indexes using Lucene, instead it lets Solr
handle that.

Nutch implemented a lot of useful and reusable infrastructure which
others noticed and spun off into separate projects (and, in Hadoop's
case, ecosystems).  I am pretty sure that Julien's quote is about
which pieces of the puzzle Nutch is going to do the heavy lifting on
itself, and which pieces it is going to delegate to some other
project.  Even the crawler-commons project mentioned on the list is
all about spinning out useful reusable components.

The problem Nutch is tackling is large and difficult.  The number of
code contributors is actually fairly small, hence the extreme focus on
re-using high quality code.

All that is to say, Nutch still has the same goals and ultimately
provides all the same functionality; it just isn't going to suffer
from "Not Invented Here" syndrome.

Kirby


On Wed, Jul 6, 2011 at 6:04 PM, Mattmann, Chris A (388J)
<[email protected]> wrote:
> Also note that quotes can easily be taken out of context. Let's let Julien be
> specific and explain what he means rather than interpret his quotes.
>
> I'm not sure many of the high level goals of Nutch have changed one bit since
> Doug started the project. The means, and the mechanism for getting there, have
> a little bit, hopefully to its benefit.
>
> You can read about some of this in my ApacheCon NA 2010 presentation:
>
> http://s.apache.org/UvU
>
> Cheers,
> Chris
>
> On Jul 6, 2011, at 1:21 PM, <[email protected]> 
> <[email protected]> wrote:
>
>> Julien Nioche, wrote:
>>
>> "This is a change in the scope of the project from being an open source
>> large scale search engine to an open source crawler indeed. We should make
>> this clearer on the website."
>>
>> Just a crawler? That is what worries me. When I knew nutch 0.3, I loved
>> its original purpose.  I think that most users, like me, do not have the
>> technical abilities to deal with further issues, quite complicated for
>> non-programmers.
>>
>>
>>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
