Re: Reviving Nutch 0.7

2007-01-23 Thread J. Delgado
Nutch Newbie wrote: Again not really proposing a new project but more easy to use re-usable code. IMHO, Nutch will be an umbrella project for ala-Google and Solr will be for ala-Enterprise where Lucene is the index lib, Hadoop is the Mapred/DFS lib ..what is missing is Common Crawler lib, Common

Re: Reviving Nutch 0.7

2007-01-22 Thread Piotr Kosiorowski
Otis, Some time ago people on the list said that they are willing to at least maintain Nutch 0.7 branch. As a committer (not very active recently) I volunteered to commit patches when they appear - I do not have enough time at the moment to do active coding. I have created a 7.3 release in JIRA

Re: Reviving Nutch 0.7

2007-01-22 Thread Zaheed Haque
On 1/22/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is

RE: Reviving Nutch 0.7

2007-01-22 Thread Alan Tanaman
Hello, I'm writing this on behalf of both Armel Nene and myself. We think that you and those who have responded have a point. We've been experiencing quite a number of problems with getting Nutch 0.8 adapted for our needs, and making changes to support evolving business requirements as they

Re: Reviving Nutch 0.7

2007-01-22 Thread Sami Siren
2007/1/22, Otis Gospodnetic [EMAIL PROTECTED]: Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is today.

Re: Reviving Nutch 0.7

2007-01-22 Thread Chris Mattmann
Before doubling (or after 0.9.0 tripling?) the maintenance/development work please consider the following: One option would be refactoring the code in a way that the parts that are usable to other projects like protocols?, parsers (this actually was proposed by Jukka Zitting some time

Re: Reviving Nutch 0.7

2007-01-22 Thread Sami Siren
Chris Mattmann wrote: In any case, I think that, if we are going to maintain separate branches of the source, in fact, really parallel projects, then an undertaking such as Tika is properly needed ... I still don't think we need separate project to start with, IMO right mode of mind is enough

Re: Reviving Nutch 0.7

2007-01-22 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Yes, certainly, anything that can be shared and decoupled from pieces that make each branch (not SVN/CVS branch) different, should be decoupled. But I was really curious about whether people think this is a valid idea/direction, not necessarily immediately how things

Re: Reviving Nutch 0.7

2007-01-22 Thread AJ Chen
On 1/22/07, Doug Cutting [EMAIL PROTECTED] wrote: Finally, web crawling, indexing and searching are data-intensive. Before long, users will want to index tens or hundreds of millions of pages. Distributed operation is soon required at this scale, and batch-mode is an order-of-magnitude