Re: Reviving Nutch 0.7

2007-01-23 Thread J. Delgado
Nutch Newbie wrote: Again not really proposing a new project but more easy to use re-usable code. IMHO, Nutch will be an umbrella project for "ala-Google" and Solr will be for "ala-Enterprise" where Lucene is the index lib, Hadoop is the Mapred/DFS lib ..what is missing is Common Crawler lib, Com

Re: Reviving Nutch 0.7

2007-01-23 Thread Nutch Newbie
Doug: I agree with all of your comments except the following. Third, part of the problem seems like there are too few contributors--that the challenges are big and the resources limited. Splitting the project will only spread those resources more thinly. IMHO, there are a lot of duplicated effo

RE: Reviving Nutch 0.7

2007-01-23 Thread Alan Tanaman
Doug Cutting wrote: > Branching doesn't sound like the right solution here. ... I couldn't agree more that the not-splitting-up approach is indeed better for resource-utilization, but how do we get round the problems that we keep encountering? We haven't managed to run a script without Hadoop pop

Re: Reviving Nutch 0.7

2007-01-22 Thread ogjunk-nutch
he.org Sent: Monday, January 22, 2007 1:40:30 PM Subject: Re: Reviving Nutch 0.7 [EMAIL PROTECTED] wrote: > Yes, certainly, anything that can be shared and decoupled from pieces that > make each branch (not SVN/CVS branch) different, should be decoupled. But I > was really curious about wheth

Re: Reviving Nutch 0.7

2007-01-22 Thread AJ Chen
On 1/22/07, Doug Cutting <[EMAIL PROTECTED]> wrote: Finally, web crawling, indexing and searching are data-intensive. Before long, users will want to index tens or hundreds of millions of pages. Distributed operation is soon required at this scale, and batch-mode is an order-of-magnitude faste

Re: Reviving Nutch 0.7

2007-01-22 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Yes, certainly, anything that can be shared and decoupled from pieces that make each branch (not SVN/CVS branch) different, should be decoupled. But I was really curious about whether people think this is a valid idea/direction, not necessarily immediately how things

Re: Reviving Nutch 0.7

2007-01-22 Thread ogjunk-nutch
utch-dev@lucene.apache.org Sent: Monday, January 22, 2007 10:52:47 AM Subject: Re: Reviving Nutch 0.7 Chris Mattmann wrote: > In any case, I think that, if we are going to maintain separate branches of > the source, in fact, really parallel projects, then an undertaking such as > Tika is prope

Re: Reviving Nutch 0.7

2007-01-22 Thread Sami Siren
Chris Mattmann wrote: > In any case, I think that, if we are going to maintain separate branches of > the source, in fact, really parallel projects, then an undertaking such as > Tika is properly needed ... I still don't think we need separate project to start with, IMO right mode of mind is enoug

Re: Reviving Nutch 0.7

2007-01-22 Thread Chris Mattmann
> Before doubling (or after 0.9.0 tripling?) the maintenance/development work > please consider the following: > > One option would be refactoring the code in a way that the parts that are > usable to other projects like protocols?, parsers (this actually was > proposed by > Jukka Zitting some

Re: Reviving Nutch 0.7

2007-01-22 Thread Sami Siren
2007/1/22, Otis Gospodnetic <[EMAIL PROTECTED]>: Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is today. How

RE: Reviving Nutch 0.7

2007-01-22 Thread Alan Tanaman
g WWW-crawling. Best regards, Alan _ Alan Tanaman iDNA Solutions http://blog.idna-solutions.com -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: 22 January 2007 06:48 To: Nutch Developer List Subject: Reviving Nutch 0.7 Hi, I've been m

Re: Reviving Nutch 0.7

2007-01-22 Thread Zaheed Haque
On 1/22/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is tod

Re: Reviving Nutch 0.7

2007-01-22 Thread Piotr Kosiorowski
Otis, Some time ago people on the list said that they are willing to at least maintain Nutch 0.7 branch. As a committer (not very active recently) I volunteered to commit patches when they appear - I do not have enough time at the moment to do active coding. I have created a 7.3 release in JIRA so

Reviving Nutch 0.7

2007-01-21 Thread Otis Gospodnetic
Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is today. However, I think there is still a need for something