Oh yeah, I built a presentation and gave it at our local Linux User Group meeting. You might find it useful:
http://leap-cf.org/presentations/nutch/NutchWebCrawler.odp

On Sat, May 1, 2010 at 2:10 AM, Phil Barnett <ph...@philb.us> wrote:

> On Wed, Apr 28, 2010 at 10:27 AM, matthew a. grisius <mgris...@comcast.net> wrote:
>
>> I also share many of Phil's sentiments. I really want the project
>> (bin/nutch crawl) to work for me as well and I want to help somehow. I
>> would like to share a 5gb 'intranet' web site with ~50 people. And I
>> have not graduated to making the 'deepcrawl' script work yet either, as
>> I'm thinking that maybe Nutch might not be the 'right tool' for 'little
>> projects' based on documentation, discussion list feedback, etc. . . .
>
> I think it's exactly what you need to do that. I was able to get the 1.0
> release to work pretty quickly. Working 8 hour days, I had a server built
> and Nutch crawling sites within 40 hours. Actually after I found one
> specific tutorial I can get Nutch running in a basic bin/nutch crawl sort of
> way in about an hour. Wish I had found that site the first day...
>
> Going through that documentation, I found that it lacked one step and I fed
> that back to the author. He has already fixed it for 1.0 and if you follow
> his steps from top to bottom, you will get Nutch 1.0 running.
>
> The site is here:
>
> http://centoshelp.org/servers/installing-configuring-nutch-nutch-gui-sun-jdk-tomcat-6-on-centos-5.x
>
> Nutch 1.1 also follows the same installation steps and you get a working
> interface, but the crawls don't work well enough to get data into the
> indexes.