On Wed, Apr 28, 2010 at 10:27 AM, matthew a. grisius
<mgris...@comcast.net> wrote:

> I also share many of Phil's sentiments. I really want the project
> (bin/nutch crawl) to work for me as well and I want to help somehow. I
> would like to share a 5gb 'intranet' web site with ~50 people. And I
> have not graduated to making the 'deepcrawl' script work yet either, as
> I'm thinking that maybe Nutch might not be the 'right tool' for 'little
> projects' based on documentation, discussion list feedback, etc. . . .
>

I think Nutch is exactly the right tool for that. I was able to get the 1.0
release working pretty quickly. Working 8-hour days, I had a server built
and Nutch crawling sites within 40 hours. In fact, now that I have found one
specific tutorial, I can get Nutch running in a basic bin/nutch crawl sort of
way in about an hour. I wish I had found that site the first day...

Going through that documentation, I found that it lacked one step, and I fed
that back to the author. He has already fixed it for 1.0, and if you follow
his steps from top to bottom, you will get Nutch 1.0 running.

The site is here:

http://centoshelp.org/servers/installing-configuring-nutch-nutch-gui-sun-jdk-tomcat-6-on-centos-5.x

Nutch 1.1 also follows the same installation steps and you get a working
interface, but the crawls don't work well enough to get data into the
indexes.
