I tried to open the index with Luke. It reports "Incompatible format version: 2
expected 1 or lower" for both the working and the non-working indexes.
P.S. I'm using Nutch 1.2.
On 01/20/2011 02:45 AM, 黄淑明 wrote:
I can't tell for sure what happened. Check the "indexes" dir and see
whether it is empty,
or use an index management tool such as Luke to check whether
the index files are broken.
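A minimal sketch of the first check, assuming the index lives at crawl/indexes as in the original message. The function name index_dir_status is hypothetical and only illustrates the idea; it says nothing about whether the index files themselves are valid:

```shell
#!/bin/bash
# Hypothetical helper: report whether a directory is missing, empty,
# or contains at least one entry (including hidden ones).
index_dir_status() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing"
  elif [ -z "$(ls -A "$dir")" ]; then
    echo "empty"
  else
    echo "non-empty"
  fi
}
```

It could be called as `index_dir_status crawl/indexes` before and after the recrawl to see whether the merge step left anything in place.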
Tiger
2011-1-20
2011/1/18 Andrey Sapegin<[email protected]>
Dear all.
I have a problem with my Nutch Internet crawl/recrawl script (I wanted to
understand how it works, so I wrote it myself).
After I merge the indexes (merging the segments seems to be fine), search doesn't
work for me:
$ bin/nutch org.apache.nutch.searcher.NutchBean http
Total hits: 0
Before recrawling I was able to search (the index was located at crawl/indexes).
My script:
---------------------------------------------
#!/bin/bash
export JAVA_HOME=/usr/lib/jvm/java-6-sun
#Inject new urls
bin/nutch inject crawl/crawldb dmoz/urls
echo "new URLs injected (dmoz/urls)"
#generate segments
bin/nutch generate crawl/crawldb crawl/segments -topN $3
echo "segments generated"
#generate fetch-list
s1=`ls -d crawl/segments/2* | tail -1`
echo $s1
echo "fetch-list generated"
#fetch
bin/nutch fetch $s1 -threads $2
echo "fetching done"
#update the database with results of fetch
bin/nutch updatedb crawl/crawldb $s1
echo "database updated"
#merge segments
bin/nutch mergesegs crawl/MERGEDsegments crawl/segments/*
rm -r crawl/segments
mv crawl/MERGEDsegments crawl/segments
echo "segments merged"
#inverting links
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
echo "links inverted"
#indexing
bin/nutch index crawl/NEWindexes crawl/crawldb crawl/linkdb crawl/segments/*
echo "indexing done"
#dedup - delete duplicate documents in the index
bin/nutch dedup crawl/NEWindexes
echo "dedup done"
#merging indexes
bin/nutch merge crawl/MERGEDindexes crawl/NEWindexes
echo "indexes merged"
# replace indexes with indexes_merged
mv --verbose crawl/indexes crawl/OLDindexes
mv --verbose crawl/MERGEDindexes crawl/indexes/part-00000
#clean up
rm -rf crawl/NEWindexes
rm -rf crawl/OLDindexes
-------------------------------------------------
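One detail of the last step may be worth checking. Once crawl/indexes has been moved to crawl/OLDindexes, the directory crawl/indexes no longer exists, and mv cannot move a directory to a path whose parent directory is missing. Whether this explains the zero hits is only an assumption. The sketch below shows one way to make the swap robust; replace_index is a hypothetical helper, not part of Nutch or of the script above:

```shell
#!/bin/bash
# Hypothetical helper: move the live index aside, then install the merged
# index as <live>/part-00000, creating the parent directory first.
replace_index() {
  local merged="$1" live="$2" old="$3"
  # Move the current index out of the way if there is one.
  if [ -d "$live" ]; then
    mv "$live" "$old" || return 1
  fi
  # "$live" does not exist at this point, so recreate it; otherwise the
  # mv below would fail with "No such file or directory".
  mkdir -p "$live" || return 1
  mv "$merged" "$live/part-00000"
}
```

With the paths from the script, the call would be `replace_index crawl/MERGEDindexes crawl/indexes crawl/OLDindexes`. The helper assumes that the third argument does not already exist; if it does, mv places the old index inside it instead of renaming.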
What's wrong with the script?
Thank you in advance.
Kind regards,
--
Andrey Sapegin,
Software Developer,
Unister GmbH
[email protected]
www.unister.de