No results on sites other than www.apache.org

Daniel Garcia Tue, 10 Jun 2008 16:52:16 -0700

I've followed the tutorial on the Wiki site and have successfully indexed a few 
pages on www.apache.org with the command


bin/nutch crawl /etc/opt/nutch/urls -dir /var/lib/nutch-crawls/test1 -depth 3 -topN 50

A query for "apache" on my local nutch/tomcat installation gives me 52 
matching pages. Next I changed

/usr/local/nutch/conf/crawl-urlfilter.txt

to allow www.circuitcity.com with the rule +^http://www.circuitcity.com/. I also 
added the root page to /etc/opt/nutch/urls/circuitcity. I cleared out my test run 
with

rm /var/lib/nutch-crawls/test1/* -Rf

and reran my crawl

bin/nutch crawl /etc/opt/nutch/urls -dir /var/lib/nutch-crawls/test1 -depth 3 -topN 50

It looks like it downloads plenty of pages (all from circuitcity). When I try 
searching for anything on the tomcat/nutch app I get 0 results every time. I 
can switch back to apache and the index turns up results. Is there a config 
file I missed somewhere?
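
For reference, here is a rough way to sanity-check the filter rule outside Nutch. It is only a sketch: it assumes the filter file is read top to bottom, that the first rule whose regex is found in the URL decides (+ accepts, - rejects), and that a URL matching no rule is ignored. Python's re module stands in for the Java regex engine, the trailing "-." catch-all rule is my assumption about the rest of the file, and is_allowed is a made-up helper name, not part of Nutch.

```python
import re

# Rules in the "+regex" / "-regex" form used by crawl-urlfilter.txt.
# Only the circuitcity rule comes from the post; the trailing "-."
# catch-all is an assumption about the rest of the file.
RULES = [
    "+^http://www.circuitcity.com/",
    "-.",
]


def is_allowed(url, rules=RULES):
    """Hypothetical helper: apply first-match-wins filtering to one URL."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # re.search looks for the pattern anywhere in the URL; the "^"
        # in the rule itself anchors it to the start.
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched: treat the URL as ignored.
    return False


if __name__ == "__main__":
    for url in (
        "http://www.circuitcity.com/",
        "http://circuitcity.com/",
        "http://www.apache.org/",
    ):
        print(url, is_allowed(url))
```

Under these assumptions the rule only accepts URLs that begin with http://www.circuitcity.com/, so a URL without the www. prefix or one on https would be rejected.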

Regards,
Daniel Garcia
