Re: The Future of Nutch, reactivated

2009-05-23 Thread Otis Gospodnetic
Message From: Andrzej Bialecki a...@getopt.org To: nutch-dev@lucene.apache.org Sent: Thursday, May 14, 2009 9:59:11 AM Subject: The Future of Nutch, reactivated Hi all, I'd like to revive this thread and gather additional feedback so that we end up with concrete conclusions. Much of what

Re: The Future of Nutch, reactivated

2009-05-19 Thread Andrzej Bialecki
Aaron Binns wrote: Our usage of Nutch is focused on index building and search services. We don't use the crawling/fetching features at all. We use Heritrix. Typically, our large-scale harvests are performed over 8-12 week periods, then the archived data is handed off to me for full-text

Re: The Future of Nutch, reactivated

2009-05-19 Thread Aaron Binns
Andrzej Bialecki a...@getopt.org writes: One of the biggest boons of Nutch is the Hadoop infrastructure. When indexing massive data sets, being able to fire up 60+ nodes in a Hadoop system helps tremendously. Are you familiar with the distributed indexing package in Hadoop contrib/ ?

Re: The Future of Nutch, reactivated

2009-05-19 Thread Mark Olson
- Original Message - From: Aaron Binns aa...@archive.org To: nutch-dev@lucene.apache.org nutch-dev@lucene.apache.org Sent: Tue May 19 13:23:37 2009 Subject: Re: The Future of Nutch, reactivated Andrzej Bialecki a...@getopt.org writes: One of the biggest boons of Nutch

Re: The Future of Nutch, reactivated

2009-05-19 Thread Mark Olson
R - Original Message - From: Aaron Binns aa...@archive.org To: nutch-dev@lucene.apache.org nutch-dev@lucene.apache.org Sent: Tue May 19 13:23:37 2009 Subject: Re: The Future of Nutch, reactivated Andrzej Bialecki a...@getopt.org writes: One of the biggest boons of Nutch is the Hadoop

Re: The Future of Nutch, reactivated

2009-05-19 Thread Bradford Stephens
- From: Aaron Binns aa...@archive.org To: nutch-dev@lucene.apache.org nutch-dev@lucene.apache.org Sent: Tue May 19 13:23:37 2009 Subject: Re: The Future of Nutch, reactivated Andrzej Bialecki a...@getopt.org writes: One of the biggest boons of Nutch is the Hadoop infrastructure. When

Re: The Future of Nutch, reactivated

2009-05-18 Thread Aaron Binns
Andrzej Bialecki a...@getopt.org writes: Target audience === I think that the Nutch project is experiencing a crisis of personality now: we are not sure what the target audience is, and we cannot satisfy everyone. I think that there are the following groups of Nutch users: 1.

The Future of Nutch, reactivated

2009-05-14 Thread Andrzej Bialecki
Hi all, I'd like to revive this thread and gather additional feedback so that we end up with concrete conclusions. Much of what I write below others have said before; I'm trying here to express it as it looks from my point of view. Target audience === I think that the Nutch

Re: The Future of Nutch, reactivated

2009-05-14 Thread Mattmann, Chris A
Hi Andrzej, Great summary. My general feeling on this is similar to my prior comments on similar threads from Otis and from Dennis. My personal pet projects for Nutch2: * refactored Nutch core data structures, modeled as POJOs * refactored Nutch architecture where

The Future of Nutch, reactivated

2009-05-14 Thread Kirby Bohling
All, Sorry that I didn't reply, and thus this isn't threaded properly. I've lurked on the list via the RSS feed; I subscribed so I could put in my two cents' worth. I've recently started using git to maintain a local branch of Nutch. My hope is to get my employer to let me contribute just