Hi Kirby,

On Jul 6, 2011, at 4:30 PM, Kirby Bohling wrote:

> From what I remember of the list discussions:

Quotes and links would help here.

> 
> Nutch shouldn't implement everything under the sun, and doesn't need
> to reinvent (and worse, maintain) the wheel.  Lots of the existing
> projects now handle big chunks of the problems that Nutch originally
> implemented internally.
> 
> * Nutch no longer implements Map-Reduce, that was spun off as Hadoop
> (as I understand it).

Right, yep.

> * Tika started off by somebody taking the Nutch parsers and turning
> them into an independent project.

Yep, that was me and a few other people.

> * Nutch no longer directly indexes using Lucene, instead it lets Solr
> handle that.

Sure, that's been the case from Nutch 1.2 onward.

> 
> Nutch implemented a lot of useful and reusable infrastructure which
> others noticed, spun off and created separate projects (and in
> Hadoop's case ecosystems).  I am pretty sure that Julien's quote is
> about which pieces of the puzzle Nutch is going to do the heavy
> lifting on, and which pieces it is going to delegate to some other
> project.

Right, and if you check out my ApacheCon NA slides, you'll also find a
similar quote in there. Ever since the Nutch2 design discussions [1],
a number of us have had that as a vision.

> Even the crawler-commons project mentioned on the
> list is all about spinning out useful re-usable components.

Sure, that's probably one part of their goals.

> 
> The problem Nutch is tackling is large and difficult.  The number of
> code contributors is actually fairly small, hence the extreme focus on
> re-using high quality code.

Where are you getting "the number of code contributors is actually fairly small"?

We've added 3 significantly active committers over the past 2 years, including
Markus, Julien, Lewis and others. I've been doing a ton of releasing. We get
more updates and fixes from folks than ever now that we are releasing
(again, I point you to the ApacheCon presentation for some thoughts on this).

Nutch has had and maintains a tremendous community and a number of active
users. For a while, it was definitely in coast mode, but I think we've made
great strides over the past 2 years to rectify that.

> 
> All that is to say, Nutch still has the same goals and ultimately
> provides all the same functionality, it just isn't going to suffer
> from "Not Invented Here" syndrome.

Sure, it wouldn't suffer from it because most of the others who are inventing
elsewhere were also original contributors to Nutch.

Cheers,
Chris

[1] http://s.apache.org/B7u

> 
> Kirby
> 
> 
> On Wed, Jul 6, 2011 at 6:04 PM, Mattmann, Chris A (388J)
> <[email protected]> wrote:
>> Also note that quotes can easily be taken out of context. Let's let Julien
>> be specific and explain what he means rather than interpret his quotes.
>> 
>> I'm not sure many of the high level goals of Nutch have changed one bit since
>> Doug started the project. The means, and the mechanism for getting there,
>> have a little bit, hopefully to its benefit.
>> 
>> You can read about some of this in my ApacheCon NA 2010 presentation:
>> 
>> http://s.apache.org/UvU
>> 
>> Cheers,
>> Chris
>> 
>> On Jul 6, 2011, at 1:21 PM, <[email protected]> wrote:
>> 
>>> Julien Nioche wrote:
>>> 
>>> "This is a change in the scope of the project from being an open source
>>> large scale search engine to an open source crawler indeed. We should make
>>> this clearer on the website."
>>> 
>>> Just a crawler? That is what worries me. When I knew Nutch 0.3, I loved
>>> its original purpose. I think that most users, like me, do not have the
>>> technical abilities to deal with further issues, quite complicated for
>>> non-programmers.
>>> 
>>> 
>>> 
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [email protected]
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
>> 


