Personally, I don't see the advantage of Nutch going for a TLP.  It's not like 
new committers are having a hard time getting in today, it's not like they are 
being proposed and rejected.  I also don't feel like Nutch lacks 
exposure/visibility -- lots of people know about it.  It's just that very few 
people need the kind of massively scalable, web-wide crawling machinery that Nutch provides.

Otis
----
Sematext :: Solr - Lucene - Nutch
Hadoop ecosystem search

>From: "Mattmann, Chris A (388J)"
>Sent: Sat, March 20, 2010 7:30:54 PM
>Subject: Re: [DISCUSS] Nutch as a top level project (TLP)?
>Hey Andrzej,
>
>I’d be +1 for Nutch being a TLP. I don’t think it’ll change much (other than
>to provide more visibility/etc., and to allow more focused decision making by
>the folks in the Nutch community). The infrastructure moves required to move
>to TLP status are moving mailing lists, moving JIRA, moving SVN, and moving
>the website (a bit of redesign/etc.), which shouldn’t be that hard, and the
>infra team can probably help with (at least the first 3 parts if we file
>issues for them).
>
>I’d volunteer to help with things like list moderation, or whatever else I
>can do to help.
>
>The important things to decide would be:
>
>  * Who’s on the PMC (my suggestion, similar to Tika, make existing Nutch
>    committers PMC members)
>  * Who’s the VP (my +1 for you)
>
>On 3/19/10 12:51 PM, "Andrzej Bialecki" wrote:
>>Hi devs,
>>
>>The ASF Board indicated recently that so called "umbrella" projects,
>>i.e. projects that host many significant sub-projects, should examine
>>their structure towards simplification, such as merging or splitting out
>>
>>Lucene TLP is such a project. Recently the Lucene PMC accepted the merge
>>of Solr and Lucene core projects. Mahout project will most likely split
>>to its own TLP soon. Which leaves Nutch as a sort of odd duck ;)
>>
>>Moving Nutch to its own TLP has some advantages, mostly an easier
>>decision process - voting on new committers and new releases involves
>>then only those who participate directly in Nutch dev., i.e. the Nutch
>>
>>Also, from the coding point of view, Nutch is not intrinsically tied to
>>the Lucene development as if both would require some careful
>>coordination - we just use Lucene as one of many dependencies, and in
>>fact we aim to cleanly separate Nutch search API from Lucene-based API.
>>I can easily imagine Nutch dropping completely the low-level
>>Lucene-based components and moving to a more general search fabric (e.g.
>>
>>Being its own TLP could also give Nutch more exposure and help to
>>crystallize our mission.
>>
>>There are some disadvantages to such a split, too: we would need to
>>spend some more effort on various administrative tasks, and maintain a
>>separate web site (under Apache, but not under Lucene), and probably
>>some other tasks that I'm not yet aware of. This would also mean that
>>Nutch would have to stand on its own merit, which considering the small
>>number of active committers may be challenging.
>>
>>Let's discuss this, and after we collect some pros and cons I'm going to
>>call for a vote.
>>
>>Best regards,
>>Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>>[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>>___|||__||  \|  ||  |  Embedded Unix, System Integration
>>  Contact: info at sigram dot com
>
>Chris Mattmann, Ph.D.
>Senior Computer Scientist
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 171-266B, Mailstop: 171-246
>Adjunct Assistant Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA