Hello

I'm trying to re-index my filesystem.
First I create an index with the normal crawl command. This works, and in
the end I can search my index with Luke.
But if I start my re-index script to re-index the same filesystem, I end
up with an index that Luke reports as invalid.
I searched for a while to find the error, but I didn't find one.

Maybe someone could help me.

Here is my little script:

------------------------------------------------------------------------
cd C:/eclipse_projects/nutchTrunk/

webdb_dir=c:/nutchIndexFile/crawldb
segments_dir=c:/nutchIndexFile/segments
index_dir=c:/nutchIndexFile/index
link_dir=c:/nutchIndexFile/linkdb
indexes_dir=c:/nutchIndexFile/indexes/

# The generate/fetch/update cycle with depth 2
for ((i=1; i <= 2 ; i++))
do
  bin/nutch generate $webdb_dir $segments_dir
  segment=`ls -d $segments_dir/* | tail -1`
  bin/nutch fetch $segment
  bin/nutch updatedb $webdb_dir $segment
done

# Index the newest segments; the 2 in "tail -2" is the depth used above
for segment in `ls -d $segments_dir/* | tail -2`
do
  bin/nutch index $indexes_dir $webdb_dir $link_dir $segment
done

# De-duplicate indexes
bin/nutch dedup $indexes_dir

mkdir c:/tmpNutch

bin/nutch merge -workingdir c:/tmpNutch/ $index_dir $indexes_dir
------------------------------------------------------------------------
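
The loops in the script pick segments by sorting directory names, which
works because Nutch names each segment after its creation timestamp. Here
is a minimal sketch of just that selection step. The temporary directory
and the three segment names are made up for illustration; they are not my
real data.

```shell
#!/bin/sh
# Sketch only: the directory and the segment names are hypothetical.
segments_dir=$(mktemp -d)
mkdir "$segments_dir/20060101120000" \
      "$segments_dir/20060102120000" \
      "$segments_dir/20060103120000"

# ls sorts names lexically, so the latest timestamp comes last.
newest=$(ls -d "$segments_dir"/* | tail -1)

# The two newest segments, as selected by the index loop (depth 2).
last_two=$(ls -d "$segments_dir"/* | tail -2)

echo "newest: $(basename "$newest")"
for segment in $last_two
do
  echo "to index: $(basename "$segment")"
done

# Clean up the temporary directory.
rm -rf "$segments_dir"
```

On a re-index the segments directory still holds the segments from the
first crawl, so "tail -2" selects only the two newest ones and the older
segments are not passed to the index command.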

Thx a lot
Alain


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
