Re: Reviving Nutch 0.7

2007-01-22 Thread Piotr Kosiorowski
Otis, some time ago people on the list said that they were willing to at least maintain the Nutch 0.7 branch. As a committer (not very active recently) I volunteered to commit patches when they appear - I do not have enough time at the moment to do active coding. I have created a 7.3 release in JIRA

Re: How to Become a Nutch Developer

2007-01-22 Thread Zaheed Haque
On 1/21/07, Andrzej Bialecki [EMAIL PROTECTED] wrote: Well ... so far this process was very informal, because there were so few key developers that they more or less knew what needs to be done, and who is doing what. Hadoop follows a much stricter and formalized model, which we could adopt,

Re: Reviving Nutch 0.7

2007-01-22 Thread Zaheed Haque
On 1/22/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is

RE: Reviving Nutch 0.7

2007-01-22 Thread Alan Tanaman
Hello, I'm writing this on behalf of both Armel Nene and myself. We think that you and those who have responded have a point. We've been experiencing quite a number of problems with getting Nutch 0.8 adapted for our needs, and making changes to support evolving business requirements as they

Re: Fetcher2

2007-01-22 Thread chee wu
Fetcher2 should be a great help for me, but it seems it can't be integrated with Nutch 0.8.1. Any advice on how to use it with 0.8.1? - Original Message - From: Andrzej Bialecki [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org Sent: Thursday, January 18, 2007 5:18 AM Subject: Fetcher2 Hi all,

Re: Reviving Nutch 0.7

2007-01-22 Thread Sami Siren
2007/1/22, Otis Gospodnetic [EMAIL PROTECTED]: Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is today.

Re: Reviving Nutch 0.7

2007-01-22 Thread Chris Mattmann
Before doubling (or after 0.9.0 tripling?) the maintenance/development work please consider the following: One option would be refactoring the code in a way that the parts that are usable to other projects like protocols?, parsers (this actually was proposed by Jukka Zitting some time

Re: Reviving Nutch 0.7

2007-01-22 Thread Sami Siren
Chris Mattmann wrote: In any case, I think that, if we are going to maintain separate branches of the source, in fact, really parallel projects, then an undertaking such as Tika is properly needed ... I still don't think we need a separate project to start with; IMO the right frame of mind is enough

Re: How to Become a Nutch Developer

2007-01-22 Thread Dennis Kubes
Thanks to everyone for the input. I know some of these questions are obvious but I wanted to take it from the lowest possible level. Part of the document is already posted to the wiki here. http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer It seems like I am getting a section done each

Re: Fetcher2

2007-01-22 Thread Andrzej Bialecki
chee wu wrote: Fetcher2 should be a great help for me, but it seems it can't be integrated with Nutch 0.8.1. Any advice on how to use it with 0.8.1? You would have to port it to Nutch 0.8.1 - e.g. change all Text occurrences to UTF8, and most likely make other changes too ... -- Best regards, Andrzej

Re: How to Become a Nutch Developer

2007-01-22 Thread Andrzej Bialecki
Dennis Kubes wrote: What does the Hadoop project do differently than Nutch? I thought they both were run about the same way? Is it that all communication on issues goes through the JIRA? The workflow is different - I'm not sure about the details, perhaps Doug can correct me if I'm wrong

Re: java.io.EOFException in latest nightly in mergesegs from hadoop.io.DataOutputBuffer

2007-01-22 Thread Brian Whitman
On Jan 21, 2007, at 6:47 AM, Sami Siren wrote: However, I cannot find in the Hadoop change logs which change is causing Nutch these problems. It's HADOOP-331, so I guess at least the changes/additions in map() are required. Hi, just following up here-- does this indicate

Re: How to Become a Nutch Developer

2007-01-22 Thread Doug Cutting
Andrzej Bialecki wrote: The workflow is different - I'm not sure about the details, perhaps Doug can correct me if I'm wrong ... and yes, it uses JIRA extensively. 1. An issue is created 2. patches are added, removed, commented, etc... 3. finally, a candidate patch is selected, and the issue is

Re: Reviving Nutch 0.7

2007-01-22 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Yes, certainly, anything that can be shared and decoupled from pieces that make each branch (not SVN/CVS branch) different, should be decoupled. But I was really curious about whether people think this is a valid idea/direction, not necessarily immediately how things

Re: java.io.EOFException in latest nightly in mergesegs from hadoop.io.DataOutputBuffer

2007-01-22 Thread Sami Siren
Brian Whitman wrote: On Jan 21, 2007, at 6:47 AM, Sami Siren wrote: However, I cannot find in the Hadoop change logs which change is causing Nutch these problems. It's HADOOP-331, so I guess at least the changes/additions in map() are required. Hi, just following up

Re: java.io.EOFException in latest nightly in mergesegs from hadoop.io.DataOutputBuffer

2007-01-22 Thread Andrzej Bialecki
Sami Siren wrote: Brian Whitman wrote: On Jan 21, 2007, at 6:47 AM, Sami Siren wrote: However, I cannot find in the Hadoop change logs which change is causing Nutch these problems. It's HADOOP-331, so I guess at least the changes/additions in map() are

Re: How to Become a Nutch Developer

2007-01-22 Thread Dennis Kubes
+1 for adopting the same types of process with Nutch. Doug Cutting wrote: Andrzej Bialecki wrote: The workflow is different - I'm not sure about the details, perhaps Doug can correct me if I'm wrong ... and yes, it uses JIRA extensively. 1. An issue is created 2. patches are added, removed

Re: Reviving Nutch 0.7

2007-01-22 Thread AJ Chen
On 1/22/07, Doug Cutting [EMAIL PROTECTED] wrote: Finally, web crawling, indexing and searching are data-intensive. Before long, users will want to index tens or hundreds of millions of pages. Distributed operation is soon required at this scale, and batch-mode is an order-of-magnitude

Re: How to Become a Nutch Developer

2007-01-22 Thread Dennis Kubes
Doug, can you answer the question of how to add developer names to JIRA or if that is only for committers? Dennis Doug Cutting wrote: Andrzej Bialecki wrote: The workflow is different - I'm not sure about the details, perhaps Doug can correct me if I'm wrong ... and yes, it uses JIRA

Re: How to Become a Nutch Developer

2007-01-22 Thread Doug Cutting
Dennis Kubes wrote: Can you answer the question of how to add developer names to JIRA or if that is only for committers? It's not just for committers, but also for regular contributors. I have added you. Anyone else? Doug

Finished How to Become a Nutch Developer

2007-01-22 Thread nutch-dev
All, Draft version of How to Become a Nutch Developer is on the wiki at: http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer Please take a look and if you think anything needs to be added, removed, or changed let me know. Dennis Kubes