Dear all,

I have a problem with my Nutch Internet crawl/recrawl script (I wanted to understand how it works, so I wrote it myself).
After I merge the indexes (merging segments seems to be fine), search doesn't work for me:

$ bin/nutch org.apache.nutch.searcher.NutchBean http
Total hits: 0

Before recrawling I was able to search (the index was placed at crawl/indexes).

My script:
---------------------------------------------
#!/bin/bash
export JAVA_HOME=/usr/lib/jvm/java-6-sun

#Inject new urls
bin/nutch inject crawl/crawldb dmoz/urls
echo "new URLs injected (dmoz/urls)"

#generate segments
bin/nutch generate crawl/crawldb crawl/segments -topN $3
echo "segments generated"

#generate fetch-list
s1=`ls -d crawl/segments/2* | tail -1`
echo $s1
echo "fetch-list generated"

#fetch
bin/nutch fetch $s1 -threads $2
echo "fetching done"

#update the database with results of fetch
bin/nutch updatedb crawl/crawldb $s1
echo "database updated"

#merge segments
bin/nutch mergesegs crawl/MERGEDsegments crawl/segments/*
rm -r crawl/segments
mv crawl/MERGEDsegments crawl/segments
echo "segments merged"

#inverting links
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
echo "links inverted"

#indexing
bin/nutch index crawl/NEWindexes crawl/crawldb crawl/linkdb crawl/segments/*
echo "indexing done"

#dedup - delete duplicate documents in the index
bin/nutch dedup crawl/NEWindexes
echo "dedup done"

#merging indexes
bin/nutch merge crawl/MERGEDindexes crawl/NEWindexes
echo "indexes merged"

# replace indexes with indexes_merged
mv --verbose crawl/indexes crawl/OLDindexes
mv --verbose crawl/MERGEDindexes crawl/indexes/part-00000

#clean up
rm -rf crawl/NEWindexes
rm -rf crawl/OLDindexes
-------------------------------------------------

What's wrong with the script?

Thank you in advance,
Kind regards,

--
Andrey Sapegin,
Software Developer,
Unister GmbH
[email protected]
www.unister.de
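
One step of the script that can be checked in isolation, without Nutch, is the final "replace indexes" swap. The sketch below is only an illustration on empty stand-in directories created in a temporary location (the `work` and `first_attempt` variables are made up for this example). It shows how `mv` behaves when the parent directory `crawl/indexes` has just been renamed away. Whether this is what causes "Total hits: 0" here is not confirmed; it is simply one thing worth ruling out.

```shell
#!/bin/bash
# Sketch: replay the two "replace indexes" mv commands on empty stand-in
# directories. No Nutch is involved; only the directory names match the script.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p crawl/indexes crawl/MERGEDindexes

# Same first step as in the script: move the old indexes out of the way.
mv crawl/indexes crawl/OLDindexes

# crawl/indexes no longer exists now, so the target has no parent directory.
if mv crawl/MERGEDindexes crawl/indexes/part-00000 2>/dev/null; then
    first_attempt="ok"
else
    first_attempt="failed"
fi
echo "first attempt: $first_attempt"

# Recreating the parent directory first lets the same mv succeed.
mkdir -p crawl/indexes
mv crawl/MERGEDindexes crawl/indexes/part-00000
ls crawl/indexes
```

If the first attempt fails in the same way on a real crawl directory, the `--verbose` output of the second `mv` in the script should show an error, and the cleanup step would then remove the old indexes while the merged index stays in `crawl/MERGEDindexes`.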

