Hello,
Just my 2 cents: the Intranet crawl functionality is VERY confusing.
If it was just taken out of the tutorial, and out of the set of
commands, that would actually help A LOT: I understood many many
things about Nutch once I tried so-called whole-web crawling, where
one has to use every
Werfel [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 07, 2006 3:01 AM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality? / less documentation is more!
Hello,
Just my 2 cents: the Intranet crawl functionality is VERY confusing.
If it was just taken out of the tutorial, and out
on the current tutorial. Feel free to edit.
http://wiki.apache.org/nutch/NutchTutorial
Thanks,
Jake.
-Original Message-
From: Franz Werfel [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 07, 2006 10:11 AM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality? / less documentation
+1
-Original Message-
From: Franz Werfel [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 07, 2006 3:01 AM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality? / less documentation is more!
Hello,
Just my 2 cents: the Intranet crawl functionality is VERY confusing
+1
-Original Message-
From: Franz Werfel [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 07, 2006 10:11 AM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality? / less documentation is more!
Hello,
single site crawling wouldn't address the confusion that results from
the fact
Stefan.
I know people who have a 500 million page index, and I personally run crawls at
~300 pages per second.
Sorry, but I have to ask: what kind of setup do you have (network, hw, nutch
version) that you manage so many pages per second?
Unless this is a company secret, it would be very nice to know
Hi Thomas,
for this crawl setup we have a test environment of Nutch 0.8,
10x AMDs, a custom Linux build, 100 Mbit eth1, 1 Gbit eth0; each box has a
'caching' DNS server.
Stefan
On 06.03.2006 at 15:59, TDLN wrote:
Stefan.
I know people who have a 500 million page index, and I personally run
crawls
On 3/4/06, Stefan Groschupf:
Just a general note, jira has a voting functionality.
This allows everybody to vote on an issue and can show in a very
compressed style what the community is looking for.
However, it is not used that often yet. It would be great if more
users would use it.
That's a
Richard Braman wrote:
I really do think Nutch is great, but I echo Matthias's comments that the
community needs to come together and contribute more back. And that
comes with the requirement of making sure volunteers are given access to
make their contributions part of the project.
Here's how
David Wallace wrote:
Also, I've lost count of the number of times someone has posted
something to the effect of I'll pay someone to give me Nutch support,
simply because they find the existing documentation and mailing lists
inadequate. Usually, that person gets told that the best way to get
I like to think of it as a framework. Building blocks
to build what you ultimately need.
If you're after a one-stop shop, plug and play, with no
development necessary, then perhaps some other
commercial system may be your best bet.
Mailing list is very active, most people get responses
fairly quickly.
Hi,
I think that this is my first post. I follow the mailing list and read as
many of the emails as I can.
I'm going to make a few proposals.
I have obtained some money to spend on them.
I use and get paid for my nutch expertise.
I have some experience.
I don't just speak for myself but also for
Hello all,
I think Nutch is a fantastic product. I used 0.6 initially, then 0.7.
My 0.7 installation is in production, and mostly works really well. I
haven't made the move to 0.8 yet, because the direction that Nutch has
gone for 0.8 is quite different from what my organisation requires from
I think of the Nutch project as a marathon, not a sprint. Nutch's
stated goals include:
* Scale to entire web
- pages on millions of different servers
- billions of pages
* Support high traffic
- thousands of searches per second
* State-of-the-art search quality
(see
Hello again.
OK - first of all I hate mailing lists. I don't consider them to be a valid
form of communication for anything but the people doing the coding and don't
really consider them of much use at all unless there is no other
alternative. Except one - and that is when there needs to be
I'll take part in your forum. Just added first post.
-Original Message-
From: Greg Boulter [mailto:[EMAIL PROTECTED]
Sent: Sunday, March 05, 2006 6:33 PM
To: nutch-user@lucene.apache.org
Subject: Re: [Nutch-general] Re: project vitality?
Hello again.
OK - first of all I hate mailing
skin a little bit.
-Original Message-
From: sudhendra seshachala [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 04, 2006 1:26 AM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality?
I could not agree with Doug more. This is one of the best.. am trying
UIMA too... though
Hi Richard,
I told you I was more than willing to help, and I think many users feel
the same way, but I for one feel that there is a lack of documentation
and support. This isn't meant to offend anyone; if you are offended, you
need to toughen up your skin a little bit.
Here you can find
I agree that the doc could be better, but I still take issue with
the earlier use of the phrase proof-of-concept. If there are
dozens of sites using it in production, several of them indexing
100's of millions of pages, I don't know how you can call it
proof-of-concept.
Honestly, I'm not sure if
manner
possible.
I am sorry if you don't like my opinion or the way it is expressed.
-Original Message-
From: carmmello [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 04, 2006 10:54 AM
To: nutch-user@incubator.apache.org
Subject: RE: project vitality?
I really can not agree
parsing algorithms that aren't being used. Google does a good job
parsing PDF; Nutch has to as well if it's going to compete.
-Original Message-
From: Chris Mattmann [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 04, 2006 4:10 PM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality
I am sorry if you don't like my opinion or the way it is expressed.
Hi Richard,
I think I share most of your opinion. I have been using Nutch since
spring 2004 for our site http://www.umkreisfinder.de
It was a big effort to learn how Nutch works and also a big effort
to learn how
started learning nutch.
-Original Message-
From: Matthias Jaekle [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 04, 2006 5:27 PM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality?
I am sorry if you don't like my opinion or the way it is expressed.
Hi Richard,
most
Maybe we should organize ourselves a little better on this point.
What do you think?
Just a general note, jira has a voting functionality.
This allows everybody to vote on an issue and can show in a very
compressed style what the community is looking for.
However, it is not used that often
[mailto:[EMAIL PROTECTED]
Sent: Saturday, March 04, 2006 4:10 PM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality?
Hello,
I've been following this conversation for the past week and decided
that I'd go ahead and chime in now. I think that honestly this whole
thread
Hi there, I'm new around here. The mailing lists seem to have a pretty
steady stream of traffic, but the website hasn't been updated since
August, and there's only a handful of news items before that. What is
the vitality of the Nutch project? Is it basically a laboratory proof of
concept or a mature
Message-
From: Matt Wilkie [mailto:[EMAIL PROTECTED]
Sent: Friday, March 03, 2006 6:34 PM
To: nutch-user@lucene.apache.org
Subject: project vitality?
Hi there, I'm new around here. The mailing lists seem to have a pretty
steady stream of traffic, but the website hasn't been updated since
August
.
-Original Message-
From: Matt Wilkie [mailto:[EMAIL PROTECTED]
Sent: Friday, March 03, 2006 6:34 PM
To: nutch-user@lucene.apache.org
Subject: project vitality?
Hi there, I'm new around here. The mailing lists seem to have a pretty
steady stream of traffic but the website hasn't been
@lucene.apache.org
Sent: Saturday, March 04, 2006 1:09 AM
Subject: RE: project vitality?
I wouldn't call Nutch 0.7.x proof-of-concept. There are several
production sites running it already:
http://wiki.apache.org/nutch/PublicServers
Plus I think Technorati is built on Nutch and/or Lucene.
That said
I could not agree with Doug more. This is one of the best.. am trying UIMA
too... though UIMA also uses Lucene...as of today, it is still a framework and
community in early stages..
In fact, the nightly builds have good improvements over 0.7.1.
Any serious user or adopter should be trying