Otis,
Some time ago people on the list said that they are willing to at
least maintain the Nutch 0.7 branch. As a committer (not very active
recently) I volunteered to commit patches when they appear - I do not
have enough time at the moment to do active coding. I have created a
7.3 release in JIRA
On 1/21/07, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Well ... so far this process was very informal, because there were so
few key developers that they more or less knew what needs to be done,
and who is doing what.
Hadoop follows a much stricter and formalized model, which we could
adopt,
On 1/22/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:
Hi,
I've been meaning to write this message for a while, and Andrzej's
StrategicGoals made me compose it, finally.
Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes,
it will be even more valuable than it is
Hello,
I'm writing this on behalf of both Armel Nene and myself.
We think that you and those who have responded have a point. We've been
experiencing quite a number of problems with getting Nutch 0.8 adapted for
our needs, and making changes to support evolving business requirements as
they
Fetcher2 should be a great help for me, but it seems it can't integrate with Nutch 0.8.1.
Any advice on how to use it with 0.8.1?
- Original Message -
From: Andrzej Bialecki [EMAIL PROTECTED]
To: nutch-dev@lucene.apache.org
Sent: Thursday, January 18, 2007 5:18 AM
Subject: Fetcher2
Hi all,
2007/1/22, Otis Gospodnetic [EMAIL PROTECTED]:
Hi,
I've been meaning to write this message for a while, and Andrzej's
StrategicGoals made me compose it, finally.
Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop
stabilizes, it will be even more valuable than it is today.
Before doubling (or after 0.9.0 tripling?) the maintenance/development work
please consider the following:
One option would be refactoring the code in a way that the parts that are
usable to other projects like protocols?, parsers (this actually was
proposed by
Jukka Zitting some time
Chris Mattmann wrote:
In any case, I think that, if we are going to maintain separate branches of
the source, in fact, really parallel projects, then an undertaking such as
Tika is properly needed ...
I still don't think we need a separate project to start with; IMO the right
mode of mind is enough
Thanks to everyone for the input. I know some of these questions are
obvious but I wanted to take it from the lowest possible level.
Part of the document is already posted to the wiki here.
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer
It seems like I am getting a section done each
chee wu wrote:
Fetcher2 should be a great help for me, but it seems it can't integrate with Nutch 0.8.1.
Any advice on how to use it with 0.8.1?
You would have to port it to Nutch 0.8.1 - e.g. change all Text
occurrences to UTF8, and most likely make other changes too ...
--
Best regards,
Andrzej
Dennis Kubes wrote:
What does the Hadoop project do differently than Nutch? I thought
they both were run about the same way? Is it that all communication
on issues goes through the JIRA?
The workflow is different - I'm not sure about the details, perhaps Doug
can correct me if I'm wrong
On Jan 21, 2007, at 6:47 AM, Sami Siren wrote:
However, I cannot find in the Hadoop change logs which change
is causing Nutch these problems.
It's HADOOP-331, so I guess at least the changes/additions in map() are
required.
Hi, just following up here-- does this indicate
Andrzej Bialecki wrote:
The workflow is different - I'm not sure about the details, perhaps Doug
can correct me if I'm wrong ... and yes, it uses JIRA extensively.
1. An issue is created
2. patches are added, removed, commented, etc...
3. finally, a candidate patch is selected, and the issue is
[EMAIL PROTECTED] wrote:
Yes, certainly, anything that can be shared and decoupled from pieces that make
each branch (not SVN/CVS branch) different, should be decoupled. But I was
really curious about whether people think this is a valid idea/direction, not
necessarily immediately how things
Brian Whitman wrote:
On Jan 21, 2007, at 6:47 AM, Sami Siren wrote:
However, I cannot find in the Hadoop change logs which change
is causing Nutch these problems.
It's HADOOP-331, so I guess at least the changes/additions in map() are
required.
Hi, just following up
Sami Siren wrote:
Brian Whitman wrote:
On Jan 21, 2007, at 6:47 AM, Sami Siren wrote:
However, I cannot find in the Hadoop change logs which change
is causing Nutch these problems.
It's HADOOP-331, so I guess at least the changes/additions in map() are
+1 for adopting the same types of process with Nutch.
Doug Cutting wrote:
Andrzej Bialecki wrote:
The workflow is different - I'm not sure about the details, perhaps
Doug can correct me if I'm wrong ... and yes, it uses JIRA extensively.
1. An issue is created
2. patches are added, removed
On 1/22/07, Doug Cutting [EMAIL PROTECTED] wrote:
Finally, web crawling, indexing and searching are data-intensive.
Before long, users will want to index tens or hundreds of millions of
pages. Distributed operation is soon required at this scale, and
batch-mode is an order-of-magnitude
Doug
Can you answer the question of how to add developer names to JIRA or if
that is only for committers?
Dennis
Doug Cutting wrote:
Andrzej Bialecki wrote:
The workflow is different - I'm not sure about the details, perhaps
Doug can correct me if I'm wrong ... and yes, it uses JIRA
Dennis Kubes wrote:
Can you answer the question of how to add developer names to JIRA or if
that is only for committers?
It's not just for committers, but also for regular contributors. I have
added you. Anyone else?
Doug
All,
Draft version of How to Become a Nutch Developer is on the wiki at:
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer
Please take a look and if you think anything needs to be added, removed,
or changed let me know.
Dennis Kubes