On Thu, May 1, 2008 at 3:17 PM, ili chimad <[EMAIL PROTECTED]> wrote:
> Thanks S.P. for your quick response.
>
> > 1. Check the logs/hadoop.log file. Do you see any lines containing the
> > string "fetching"? Such lines should clearly show what URLs have been
> > fetched.
>
> There are many "fetching" lines there, so I think this is not the reason.
>
> > 2. One reason may be that all URLs are blocked in
> > conf/crawl-urlfilter.txt. Did you edit this file as per the tutorial?
> > If not, this is most certainly the problem. An easy way to allow all
> > URLs would be to replace the '-.' at the end with '+.'
>
> Yes, like this:
>
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*hustoo.net/
> # skip everything else
> +.
>
> What do you think about Tomcat
> 6.0\webapps\nutch-0.9\WEB-INF\classes\nutch-site.xml:
>
> <configuration>
>   <property>
>     <name>searcher.dir</name>
>     <value>C:\nutch-0.9\crawl\</value>
>   </property>
> </configuration>
>
> At first I thought it was a "\" problem, or perhaps the path in general?
>
> Thanks for any suggestions.
That can be the reason. I haven't used Nutch on Windows, so I don't know what kind of issues one might face there. The default value for this property is 'crawl'. You can try removing this property from nutch-site.xml so that the default value from nutch-default.xml is used. Then change your current directory to the directory that contains the 'crawl' directory and restart Tomcat. If it works, then most certainly the absolute path you have given is causing the problem. You could then try something like C:/nutch-0.9/crawl/ and see if that works.

By the way, did you try searching from the command prompt using bin/nutch org.apache.nutch.searcher.NutchBean? That will confirm that your index is correct and actually returns results.

Regards,
Susam Pal
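For anyone following along, here is a minimal sketch of the nutch-site.xml suggested above, using forward slashes instead of backslashes. The path C:/nutch-0.9/crawl/ comes from the thread; point the value at wherever your crawl directory actually lives:

<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- Java's file APIs accept forward slashes even on Windows -->
    <value>C:/nutch-0.9/crawl/</value>
  </property>
</configuration>

If you drop the property instead, Nutch falls back to the default value 'crawl' from nutch-default.xml, and that relative path is resolved against the working directory of the Tomcat process, which is why the advice above is to start Tomcat from the directory that contains crawl/.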

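And a sketch of the command-prompt check mentioned above, assuming Nutch 0.9 and a Cygwin shell (bin/nutch is a bash script, so it will not run under plain cmd.exe); 'apache' is just a sample query term, so use any term you know you crawled:

$ cd /cygdrive/c/nutch-0.9        # the directory that contains crawl/
$ bin/nutch org.apache.nutch.searcher.NutchBean apache

If this reports a non-zero hit count, the index itself is fine and the problem is on the webapp side (searcher.dir or the path); if it reports zero hits, the crawl or the URL filter is the thing to fix first.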