setting search dir for nutch web app

Mark Lim Fri, 12 Mar 2010 10:48:57 -0800

Just sharing my experience with setting the search directory for the
nutch webapp.  Getting it wrong is a leading cause of the disappointing
"Hits 0-0 (out of about 0 total matching pages)" message.

I had a situation like Noah Silverman:

> On Thu, 2009-12-17 at 16:32 -0800, Noah Silverman wrote:
>   
>> Hello,
>>
>> Just to summarize.
>>
>> 1) Nutch crawl completes without error.
>>
>> 2) I can search from command line and see results.  (Assume this means
>> that index is created.)
>>     bin/nutch org.apache.nutch.searcher.NutchBean foobar
>>
>> 3) Tomcat configured through nutch-site file to point to nutch/crawl
>> directory
>>
>> 4) catalina.out logfile indicates that tomcat is opening nutch/crawl
>>     2009-12-16 22:00:39,740 INFO SearchBean - opening indexes in
>> /home/noah/Documents/nutch/crawl/indexes
>>
>> 5) No results when searching in web front end
>>
>> 6) No errors in the logs
>>
>> Is there some way to debug this?  Perhaps more verbose logging?
>>
>> Thanks!!!
>>
>> -N

The log message in step 4 is only somewhat helpful: it is printed
whether or not the indexes can actually be opened, and if anything goes
wrong, nothing more is logged.  Noah's problem was that he needed to
point to the top-level directory.  In my case, I needed to set the
permissions correctly.

I had crawled as root, so the crawl directory was owned by root:root
with permissions 544 (at least readable).  I moved it to $TOMCAT/work
and gave it ownership $TOMCAT_USER:$TOMCAT_GROUP with permissions 755.
Now it works.
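
To see why "at least readable" was not enough, it helps to spell out the
mode bits.  Here is a small sketch in Python (my choice for illustration,
nothing to do with the Nutch setup itself; the function names are made
up) that decodes both modes as they apply to a directory:

```python
import stat

def describe_dir_mode(mode):
    """Return the ls-style permission string for a directory with these bits."""
    return stat.filemode(stat.S_IFDIR | mode)

def others_can_enter(mode):
    """True if users other than the owner/group have the execute (search) bit."""
    return bool(mode & stat.S_IXOTH)

# Compare the mode the crawl left behind with the mode that worked.
for mode in (0o544, 0o755):
    print(oct(mode), describe_dir_mode(mode),
          "others can enter:", others_can_enter(mode))
```

With 544 only the owner has the execute bit on the directory, so any
other user can list the names in it but cannot open anything inside it.
With 755 everyone can both list and enter it.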

In any case, the nutch web app will simply log at INFO level that it's
opening indexes at $DIR.  If the permissions are wrong, or the directory
doesn't exist, it will say nothing more, not even with debug logging
enabled.  No exceptions will be thrown.
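
Since the web app stays silent, one option is to check the directory
yourself before starting Tomcat.  Below is a rough sketch, again in
Python; check_search_dir is a hypothetical helper of mine, not part of
Nutch.  It only tests what the *current* user can read, so it has to be
run as the Tomcat user to tell you anything useful.

```python
import os

def check_search_dir(path):
    """Return a list of problems that would stop the current user reading `path`."""
    if not os.path.isdir(path):
        # Also triggered when a parent directory cannot be traversed.
        return [f"{path}: missing, not a directory, or not reachable"]
    problems = []
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path}: directory not readable/traversable")
    for root, dirs, files in os.walk(path):
        # Directories need both read (list) and execute (enter) permission.
        for name in dirs:
            sub = os.path.join(root, name)
            if not os.access(sub, os.R_OK | os.X_OK):
                problems.append(f"{sub}: directory not readable/traversable")
        # Regular files only need read permission.
        for name in files:
            item = os.path.join(root, name)
            if not os.access(item, os.R_OK):
                problems.append(f"{item}: file not readable")
    return problems
```

An empty list means the user running the script can read everything
under the given path.  It does not prove that the path is the one the
web app is actually configured to use.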
