Re: project vitality? / less documentation is more!

2006-03-07 Thread Franz Werfel
Hello, Just my 2 cents: the Intranet crawl functionality is VERY confusing. If it were just taken out of the tutorial, and out of the set of commands, that would actually help A LOT: I understood many, many things about Nutch once I tried so-called whole-web crawling, where one has to use every

RE: project vitality? / less documentation is more!

2006-03-07 Thread Vanderdray, Jacob
Werfel [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 3:01 AM To: nutch-user@lucene.apache.org Subject: Re: project vitality? / less documentation is more! Hello, Just my 2 cents: the Intranet crawl functionality is VERY confusing. If it were just taken out of the tutorial, and out

RE: project vitality? / less documentation is more!

2006-03-07 Thread Vanderdray, Jacob
on the current tutorial. Feel free to edit. http://wiki.apache.org/nutch/NutchTutorial Thanks, Jake. -Original Message- From: Franz Werfel [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 10:11 AM To: nutch-user@lucene.apache.org Subject: Re: project vitality? / less documentation

RE: project vitality? / less documentation is more!

2006-03-07 Thread Richard Braman
+1 -Original Message- From: Franz Werfel [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 3:01 AM To: nutch-user@lucene.apache.org Subject: Re: project vitality? / less documentation is more! Hello, Just my 2 cents: the Intranet crawl functionality is VERY confusing

RE: project vitality? / less documentation is more!

2006-03-07 Thread Richard Braman
+1 -Original Message- From: Franz Werfel [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 10:11 AM To: nutch-user@lucene.apache.org Subject: Re: project vitality? / less documentation is more! Hello, single site crawling wouldn't address the confusion that results from the fact

Re: project vitality?

2006-03-06 Thread TDLN
Stefan. I know people having a 500-million-page index, and I personally run crawls with ~300 pages per second. Sorry, but I have to ask: what kind of setup do you have (network, hw, nutch version) that you manage so many pages per second? Unless this is a company secret, it would be very nice to know

Re: project vitality?

2006-03-06 Thread Stefan Groschupf
Hi Thomas, for this crawl setup we have a test environment of nutch 0.8, 10xAMD's, custom linux build, 100Mbit eth1, 1Gb eth0, each box has a 'caching' dns server. Stefan On 06.03.2006 at 15:59, TDLN wrote: Stefan. I know people having a 500-million-page index, and I personally run crawls

Re: project vitality?

2006-03-06 Thread mos
On 3/4/06, Stefan Groschupf: Just a general note, jira has a voting functionality. This allows everybody to vote on an issue and can show in a very compressed style what the community is looking for. However it is not used that often yet. It would be great if more users would use it. That's a

Re: project vitality?

2006-03-06 Thread Doug Cutting
Richard Braman wrote: I really do think nutch is great, but I echo Matthias's comments that the community needs to come together and contribute more back. And that comes with the requirement of making sure volunteers are given access to make their contributions part of the project. Here's how

Re: project vitality?

2006-03-06 Thread Doug Cutting
David Wallace wrote: Also, I've lost count of the number of times someone has posted something to the effect of "I'll pay someone to give me Nutch support", simply because they find the existing documentation and mailing lists inadequate. Usually, that person gets told that the best way to get

Re: project vitality?

2006-03-05 Thread Byron Miller
I like to think of it as a framework. Building blocks to build what you ultimately need. If you're after the one-stop shop, plug and play, no development necessary, then perhaps some other commercial systems may be your best bet. The mailing list is very active; most people get responses fairly quickly.

Re: [Nutch-general] Re: project vitality?

2006-03-05 Thread Greg Boulter
Hi, I think that this is my first post. I follow the mailing list and read as many of the emails as I can. I'm going to make a few proposals. I have obtained some money to spend on them. I use and get paid for my nutch expertise. I have some experience. I don't just speak for myself but also for

RE: project vitality?

2006-03-05 Thread David Wallace
Hello all, I think Nutch is a fantastic product. I used 0.6 initially, then 0.7. My 0.7 installation is in production, and mostly works really well. I haven't made the move to 0.8 yet, because the direction that Nutch has gone for 0.8 is quite different from what my organisation requires from

Re: project vitality?

2006-03-05 Thread Chris Lamprecht
I think of the Nutch project as a marathon, not a sprint. Nutch's stated goals include: * Scale to entire web - pages on millions of different servers - billions of pages * Support high traffic - thousands of searches per second * State-of-the-art search quality (see

Re: [Nutch-general] Re: project vitality?

2006-03-05 Thread Greg Boulter
Hello again. OK - first of all I hate mailing lists. I don't consider them to be a valid form of communication for anything but the people doing the coding and don't really consider them of much use at all unless there is no other alternative. Except one - and that is when there needs to be

RE: [Nutch-general] Re: project vitality?

2006-03-05 Thread Richard Braman
I'll take part in your forum. Just added first post. -Original Message- From: Greg Boulter [mailto:[EMAIL PROTECTED] Sent: Sunday, March 05, 2006 6:33 PM To: nutch-user@lucene.apache.org Subject: Re: [Nutch-general] Re: project vitality? Hello again. OK - first of all I hate mailing

RE: project vitality?

2006-03-04 Thread Richard Braman
skin a little bit. -Original Message- From: sudhendra seshachala [mailto:[EMAIL PROTECTED] Sent: Saturday, March 04, 2006 1:26 AM To: nutch-user@lucene.apache.org Subject: Re: project vitality? I could not agree with Doug more. This is one of the best.. am trying UIMA too... though

Re: project vitality?

2006-03-04 Thread Stefan Groschupf
Hi Richard, I told you I was more than willing to help, and I think many users feel the same way, but I for one feel that there is a lack of documentation and support. This isn't meant to offend anyone, if you are offended you need to toughen up your skin a little bit. Here you can find

RE: project vitality?

2006-03-04 Thread Howie Wang
I agree that the doc could be better, but I still take issue with the earlier use of the phrase "proof-of-concept". If there are dozens of sites using it in production, several of them indexing hundreds of millions of pages, I don't know how you can call it proof-of-concept. Honestly, I'm not sure if

RE: project vitality?

2006-03-04 Thread Richard Braman
manner possible. I am sorry if you don't like my opinion or the way it is expressed. -Original Message- From: carmmello [mailto:[EMAIL PROTECTED] Sent: Saturday, March 04, 2006 10:54 AM To: nutch-user@incubator.apache.org Subject: RE: project vitality? I really cannot agree

RE: project vitality?

2006-03-04 Thread Richard Braman
parsing algorithms that aren't being used. Google does a good job parsing pdf; nutch has to as well if it's going to compete. -Original Message- From: Chris Mattmann [mailto:[EMAIL PROTECTED] Sent: Saturday, March 04, 2006 4:10 PM To: nutch-user@lucene.apache.org Subject: Re: project vitality

Re: project vitality?

2006-03-04 Thread Matthias Jaekle
I am sorry if you don't like my opinion or the way it is expressed. Hi Richard, most of your opinion I think is the same as mine. I have been using nutch since spring 2004 for our page http://www.umkreisfinder.de It was a big effort to learn how nutch works and also a big effort to learn how

RE: project vitality?

2006-03-04 Thread Richard Braman
started learning nutch. -Original Message- From: Matthias Jaekle [mailto:[EMAIL PROTECTED] Sent: Saturday, March 04, 2006 5:27 PM To: nutch-user@lucene.apache.org Subject: Re: project vitality? I am sorry if you don't like my opinion or the way it is expressed. Hi Richard, most

Re: project vitality?

2006-03-04 Thread Stefan Groschupf
Maybe we should organize ourselves a little bit better on this point. What do you think? Just a general note, jira has a voting functionality. This allows everybody to vote on an issue and can show in a very compressed style what the community is looking for. However it is not used that often

Re: project vitality?

2006-03-04 Thread Chris Mattmann
[mailto:[EMAIL PROTECTED] Sent: Saturday, March 04, 2006 4:10 PM To: nutch-user@lucene.apache.org Subject: Re: project vitality? Hello, I've been following this conversation for the past week and decided that I'd go ahead and chime in now. I think that honestly this whole thread

project vitality?

2006-03-03 Thread Matt Wilkie
Hi there, I'm new around here. The mailing lists seem to have a pretty steady stream of traffic but the website hasn't been updated since August, and there's only a handful of news items before that. What is the vitality of the Nutch project? Is it basically a laboratory proof of concept or a mature

RE: project vitality?

2006-03-03 Thread Richard Braman
Message- From: Matt Wilkie [mailto:[EMAIL PROTECTED] Sent: Friday, March 03, 2006 6:34 PM To: nutch-user@lucene.apache.org Subject: project vitality? Hi there, I'm new around here. The mailing lists seem to have a pretty steady stream of traffic but the website hasn't been updated since August

RE: project vitality?

2006-03-03 Thread Howie Wang
. -Original Message- From: Matt Wilkie [mailto:[EMAIL PROTECTED] Sent: Friday, March 03, 2006 6:34 PM To: nutch-user@lucene.apache.org Subject: project vitality? Hi there, I'm new around here. The mailing lists seem to have a pretty steady stream of traffic but the website hasn't been

Re: project vitality?

2006-03-03 Thread gekkokid
@lucene.apache.org Sent: Saturday, March 04, 2006 1:09 AM Subject: RE: project vitality? I wouldn't call Nutch 0.7.x proof-of-concept. There are several production sites running it already: http://wiki.apache.org/nutch/PublicServers Plus I think technorati is built on either Nutch and/or Lucene. That said

Re: project vitality?

2006-03-03 Thread sudhendra seshachala
I could not agree with Doug more. This is one of the best.. am trying UIMA too... though UIMA also uses Lucene...as of today, it is still a framework and community in early stages.. In fact the nightly builds have good improvements over 0.71. Any serious user or adopter should be trying