Oh yeah, I built a presentation and gave it at our local Linux User Group
meeting. You might find it useful:

http://leap-cf.org/presentations/nutch/NutchWebCrawler.odp

On Sat, May 1, 2010 at 2:10 AM, Phil Barnett <ph...@philb.us> wrote:

>
>
> On Wed, Apr 28, 2010 at 10:27 AM, matthew a. grisius <mgris...@comcast.net> wrote:
>
>> I also share many of Phil's sentiments. I really want the project
>> (bin/nutch crawl) to work for me as well and I want to help somehow. I
>> would like to share a 5gb 'intranet' web site with ~50 people. And I
>> have not graduated to making the 'deepcrawl' script work yet either, as
>> I'm thinking that maybe Nutch might not be the 'right tool' for 'little
>> projects' based on documentation, discussion list feedback, etc. . . .
>>
>
> I think it's exactly what you need to do that. I was able to get the 1.0
> release to work pretty quickly. Working 8 hour days, I had a server built
> and Nutch crawling sites within 40 hours. Actually, now that I have found one
> specific tutorial, I can get Nutch running in a basic bin/nutch crawl sort of
> way in about an hour. Wish I had found that site the first day...
>
> Going through that documentation, I found that it lacked one step and I fed
> that back to the author. He has already fixed it for 1.0 and if you follow
> his steps from top to bottom, you will get Nutch 1.0 running.
>
> The site is here:
>
>
> http://centoshelp.org/servers/installing-configuring-nutch-nutch-gui-sun-jdk-tomcat-6-on-centos-5.x
>
> Nutch 1.1 also follows the same installation steps and you get a working
> interface, but the crawls don't work well enough to get data into the
> indexes.
>
