Message
From: Andrzej Bialecki a...@getopt.org
To: nutch-dev@lucene.apache.org
Sent: Thursday, May 14, 2009 9:59:11 AM
Subject: The Future of Nutch, reactivated
Hi all,
I'd like to revive this thread and gather additional feedback so that we
end up with concrete conclusions. Much of what
Aaron Binns wrote:
Our usage of Nutch is focused on index building and search services. We
don't use the crawling/fetching features at all. We use Heritrix.
Typically, our large-scale harvests are performed over 8-12 week
periods, then the archived data is handed off to me for full-text
Andrzej Bialecki a...@getopt.org writes:
One of the biggest boons of Nutch is the Hadoop infrastructure. When
indexing massive data sets, being able to fire up 60+ nodes in a
Hadoop system helps tremendously.
Are you familiar with the distributed indexing package in Hadoop
contrib/ ?
- Original Message -
From: Aaron Binns aa...@archive.org
To: nutch-dev@lucene.apache.org
Sent: Tue May 19 13:23:37 2009
Subject: Re: The Future of Nutch, reactivated
Andrzej Bialecki a...@getopt.org writes:
One of the biggest boons of Nutch
Andrzej Bialecki a...@getopt.org writes:
Target audience
===============
I think that the Nutch project is experiencing a crisis of personality
right now: we are not sure who the target audience is, and we cannot
satisfy everyone. I think there are the following groups of Nutch users:
1.
Hi all,
I'd like to revive this thread and gather additional feedback so that we
end up with concrete conclusions. Much of what I write below others have
said before; here I'm trying to express it as it looks from my point
of view.
Target audience
===============
I think that the Nutch
Hi Andrzej,
Great summary. My general feeling on this is similar to my prior comments on
similar threads from Otis and from Dennis. My personal pet projects for
Nutch2:
* refactored Nutch core data structures, modeled as POJOs
* refactored Nutch architecture where
All,
Sorry that I didn't reply, and thus this isn't threaded properly.
I've lurked on the list via the RSS feed; I subscribed so I could put
in my two cents' worth. I've recently started using git to maintain a
local branch of Nutch. My hope is to get my employer to let me
contribute just