Hello,

 I've been following this conversation for the past week and decided that
I'd go ahead and chime in now. I think that honestly this whole thread of
discussion needs to be taken off list, because it doesn't really have
anything to do with the "use" of Nutch: what it boils down to is a list of
complaints, requests for improvements and what not. Nutch's goal is to be a
large-scale, open source search engine: it's not a PDF parsing framework,
nor is it as thoroughly documented as some commercial software -- although
I've run into many commercial software products whose documentation doesn't
match the quality of what Nutch has even now, in its nascent stages.

> Now that I have said that, I want to express my feeling that it's hard
> when it takes a week to figure out that invertlinks only applies to
> version 0.8. and when you ask to become a volunteer, you are met with no
> response.  

You don't need to "ask" to become a volunteer: just do it. As Doug said,
create a patch, submit the patch to JIRA and let the community look at it.
Change something on the Wiki if you don't think that the documentation is
particularly good there. Use Nutch to do whatever you like, and if you feel
that you contributed something that is applicable to a broader community
outside of your domain, let people know about it. If it's really cool, I
wouldn't worry about people ignoring you: they'll come around.

> It's also frustrating when you share some hard earned
> insights into something that nutch needs to work on, like pdf parsing,
> and your comments don't get a single good response from the nutch dev
> team.  

The nutch "dev team" isn't focused on PDF parsing. Nutch is a search engine
framework, and to Nutch, a PDF parser is a "black box" that conforms to a
standard parsing interface that can be swapped out as technology evolves.
Right now, Nutch uses PDFBox, but in a week it could use "hot super new rad
PDF parsing technology X.1", or some other, better PDF parser. If you feel
that PDFBox isn't getting the job done for your particular domain, then post
an actual question, not pointers to documents for the Nutch developers to go
read. Honestly, I'm guessing they have neither the time nor the desire to go
read a whole bunch of PDF documentation unless there's a real use case, and
a real need to upgrade the existing parser. Empirically show that Nutch's
PDF capabilities aren't getting the job done, post your results to the list,
and let the community look at them. I'd guess you'd generate more interest
and probably get a better response that way.

> 
> Sometimes, in OS projects I get the feeling that the developers breathe
> different air than users, and that our help is not wanted or that our
> questions are stupid and not worth their time to answer.

As far as I can tell the Nutch developers all breathe the same air as us
(and moreover, I believe they put on their pants "one leg at a time").

> 
> Nutch is nowhere near being a dead project, that is not what I said (I
> said it was close, not closed), it's just that I don't feel that it's
> something that anyone can just download and use without running into
> problems.  

"Problems" is a generic word: I would agree with your statement if you
qualified what "problems" means. Small problems like configuration issues?
I'd buy that. Exception messages not providing super detailed information
about the error? Sure, I'd even buy that in some cases. However, larger
problems that generally fall in the class of "bugs"? I would say the answer
to that is probably a "no".

> Problems always exist, but need to be documented correctly so
> that they can be solved quickly.  I think nutch has a long way to go
> before it is comparable to tomcat or httpd, which are both production
> ready and have literally volumes of information on using in every manner
> possible.  

Check out the committers list on Tomcat
(http://tomcat.apache.org/whoweare.html) versus that of Nutch
(http://lucene.apache.org/nutch/credits.html). There are 21 active
committers on the Tomcat PMC and many more emeritus committers. Nutch has
fewer than 10. Given the wealth of capability and functionality that Nutch
provides, its ease of use, and the ability to deploy it in production
quality environments (of which, I can assure you after having been on the
mailing lists for the better part of a year, there are plenty), I would have
to respectfully disagree with the majority of your assertions and say that
the Nutch folks are doing a great job.

Now, can we please take this discussion off the public mailing lists? I
would think that the majority of folks on the list would like to move on. I
know that I would.

Cheers,
  Chris


> 
> I am sorry if you don't like my opinion or the way it is expressed.
> 
> -----Original Message-----
> From: carmmello [mailto:[EMAIL PROTECTED]
> Sent: Saturday, March 04, 2006 10:54 AM
> To: [email protected]
> Subject: RE: project vitality?
> 
> 
> I really can not agree with the way Mr. Richard Braman express his
> views.  I have tried Nutch since version 0.3 and I could not make the
> 0.8 release  work (Nutch is becoming a little bit complicated with all
> those map reduce, hadoop, and so on, that I can't deal with).  I
> understand, however,  that if a product is not finished yet,  some times
> it may fail with the lack of some fundamental documentation, but, if
> there is a bunch of people who develops, for free, a product that is
> commercially worth some thousands of dollars and may fit our purposes,
> we have to say thanks.  After that we can, of course, express our views,
> complaints and suggestions, but we should refrain from some hard, non
> relevant comments, that goes nowhere, like this, non technical, post of
> mine. I, myself, have my own experimental implementation of Nutch
> 0.7.1.x (a nightly version), with more than 400,000 pages, that can be,
> sometimes, viewed at Brazilian working hours, at
> http://www.qualidade.eng.br/constelacao.htm .  It is in Portuguese, but
> English terms related to quality, standards and environment can be
> searched.
> 

