Nutch Newbie wrote:
Again, I'm not really proposing a new project, but rather easier-to-use,
reusable code. IMHO, Nutch will be an umbrella project for
"à la Google" and Solr will be for "à la Enterprise", where Lucene
is the index lib and Hadoop is the Mapred/DFS lib... What is missing is a
Common Crawler lib, Com...
Doug:
I agree with all of your comments except the following:
Third, part of the problem seems to be that there are too few
contributors--that the challenges are big and the resources limited.
Splitting the project will only spread those resources more thinly.
IMHO, there is a lot of duplicated effort...
Doug Cutting wrote:
> Branching doesn't sound like the right solution here. ...
I couldn't agree more that the not-splitting-up approach is indeed better
for resource-utilization, but how do we get round the problems that we keep
encountering?
We haven't managed to run a script without Hadoop pop...
he.org
Sent: Monday, January 22, 2007 1:40:30 PM
Subject: Re: Reviving Nutch 0.7
[EMAIL PROTECTED] wrote:
> Yes, certainly, anything that can be shared and decoupled from pieces that
> make each branch (not SVN/CVS branch) different, should be decoupled. But I
> was really curious about whether...
On 1/22/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
Finally, web crawling, indexing and searching are data-intensive.
Before long, users will want to index tens or hundreds of millions of
pages. Distributed operation is soon required at this scale, and
batch mode is an order of magnitude faster...
[EMAIL PROTECTED] wrote:
Yes, certainly, anything that can be shared and decoupled from pieces that make
each branch (not SVN/CVS branch) different, should be decoupled. But I was
really curious about whether people think this is a valid idea/direction, not
necessarily immediately how things...
utch-dev@lucene.apache.org
Sent: Monday, January 22, 2007 10:52:47 AM
Subject: Re: Reviving Nutch 0.7
Chris Mattmann wrote:
> In any case, I think that, if we are going to maintain separate branches of
> the source, in fact, really parallel projects, then an undertaking such as
> Tika is properly...
Chris Mattmann wrote:
> In any case, I think that, if we are going to maintain separate branches of
> the source, in fact, really parallel projects, then an undertaking such as
> Tika is properly needed ...
I still don't think we need a separate project to start with; IMO, the right
mode of mind is enough...
> Before doubling (or after 0.9.0 tripling?) the maintenance/development work
> please consider the following:
>
> One option would be refactoring the code in a way that the parts that are
> usable by other projects, like protocols(?) and parsers (this was actually
> proposed by
> Jukka Zitting some...
2007/1/22, Otis Gospodnetic <[EMAIL PROTECTED]>:
Hi,
I've been meaning to write this message for a while, and Andrzej's
StrategicGoals made me compose it, finally.
Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop
stabilizes, it will be even more valuable than it is today. How...
...g WWW-crawling.
Best regards,
Alan
Alan Tanaman
iDNA Solutions
http://blog.idna-solutions.com
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: 22 January 2007 06:48
To: Nutch Developer List
Subject: Reviving Nutch 0.7
Hi,
I've been meaning...
On 1/22/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Hi,
I've been meaning to write this message for a while, and Andrzej's
StrategicGoals made me compose it, finally.
Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes,
it will be even more valuable than it is today...
Otis,
Some time ago, people on the list said that they were willing to at
least maintain the Nutch 0.7 branch. As a committer (not very active
recently), I volunteered to commit patches when they appear; I do not
have enough time at the moment for active coding. I have created a
7.3 release in JIRA so...
Hi,
I've been meaning to write this message for a while, and Andrzej's
StrategicGoals made me compose it, finally.
Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes,
it will be even more valuable than it is today. However, I think there is
still a need for something...