On Wed, Apr 28, 2010 at 10:27 AM, matthew a. grisius <mgris...@comcast.net> wrote:
> I also share many of Phil's sentiments. I really want the project
> (bin/nutch crawl) to work for me as well and I want to help somehow. I
> would like to share a 5gb 'intranet' web site with ~50 people. And I
> have not graduated to making the 'deepcrawl' script work yet either, as
> I'm thinking that maybe Nutch might not be the 'right tool' for 'little
> projects' based on documentation, discussion list feedback, etc. . . .

I think it's exactly what you need for that. I was able to get the 1.0 release to work pretty quickly. Working 8-hour days, I had a server built and Nutch crawling sites within 40 hours. Actually, once I found one specific tutorial, I could get Nutch running in a basic bin/nutch crawl sort of way in about an hour. Wish I had found that site the first day...

Going through that documentation, I found that it lacked one step, and I fed that back to the author. He has already fixed it for 1.0, and if you follow his steps from top to bottom, you will get Nutch 1.0 running. The site is here:

http://centoshelp.org/servers/installing-configuring-nutch-nutch-gui-sun-jdk-tomcat-6-on-centos-5.x

Nutch 1.1 follows the same installation steps and gives you a working interface, but the crawls don't work well enough to get data into the indexes.