On Thu, May 1, 2008 at 3:17 PM, ili chimad <[EMAIL PROTECTED]> wrote:
> Thanks S.P. for your quick response.
>
> > 1. Check the logs/hadoop.log file. Do you see any lines containing the
> > string "fetching"? Such lines should clearly show what URLs have been
> > fetched.
>
> There are many "fetching" lines there, so I think this is not the reason.
>
> > 2. One reason may be that all URLs are blocked in
> > conf/crawl-urlfilter.txt. Did you edit this file as per the tutorial?
> > If not, this is most certainly the problem. An easy way to allow all
> > URLs would be to replace the '-.' at the end with '+.'
>
> Yes, like this:
>
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*hustoo.net/
> # skip everything else
> +.
>
> What do you think about Tomcat
> 6.0\webapps\nutch-0.9\WEB-INF\classes\nutch-site.xml:
>
> <configuration>
>   <property>
>     <name>searcher.dir</name>
>     <value>C:\nutch-0.9\crawl\</value>
>   </property>
> </configuration>
>
> At first I thought it was a "\" problem, or perhaps the path in general?
>
> Thanks for any suggestions.
That can be the reason. I haven't used Nutch on Windows, so I don't know what kind of issues one might face there. The default value for this property is 'crawl'. You can try removing this property from nutch-site.xml so that the default value from nutch-default.xml is used. Then change your current directory to the directory that contains the 'crawl' directory and restart Tomcat. If it works, then most certainly the absolute path you have given is causing the problem. You could then try something like C:/nutch-0.9/crawl/ and see if that works.

By the way, did you try searching from the command prompt using bin/nutch org.apache.nutch.searcher.NutchBean? That will confirm that your index is correct and actually returns results.

Regards,
Susam Pal
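For anyone following along, here is a minimal sketch of the nutch-site.xml suggested above, using forward slashes instead of backslashes. The path C:/nutch-0.9/crawl/ comes from the thread; point the value at wherever your crawl directory actually lives:

<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- Java's file APIs accept forward slashes even on Windows -->
    <value>C:/nutch-0.9/crawl/</value>
  </property>
</configuration>

If you drop the property instead, Nutch falls back to the default value 'crawl' from nutch-default.xml, and that relative path is resolved against the working directory of the Tomcat process, which is why the advice above is to start Tomcat from the directory that contains crawl/.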

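And a sketch of the command-prompt check mentioned above, assuming Nutch 0.9 and a Cygwin shell (bin/nutch is a bash script, so it will not run under plain cmd.exe); 'apache' is just a sample query term, so use any term you know you crawled:

$ cd /cygdrive/c/nutch-0.9        # the directory that contains crawl/
$ bin/nutch org.apache.nutch.searcher.NutchBean apache

If this reports a non-zero hit count, the index itself is fine and the problem is on the webapp side (searcher.dir or the path); if it reports zero hits, the crawl or the URL filter is the thing to fix first.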