Hello, I'm trying to re-index my filesystem. First I create an index with the normal crawl command. This works, and at the end I can search my index with Luke. But if I then start my re-index script to re-index the same filesystem, Luke reports an invalid index at the end. I searched for a while to find the failure, but I didn't find one.
Maybe someone could help me. Here is my little script:

------------------------------------------------------------------------
cd C:/eclipse_projects/nutchTrunk/

webdb_dir=c:/nutchIndexFile/crawldb
segments_dir=c:/nutchIndexFile/segments
index_dir=c:/nutchIndexFile/index
link_dir=c:/nutchIndexFile/linkdb
indexes_dir=c:/nutchIndexFile/indexes/

# The generate/fetch/update cycle with depth 2
for ((i=1; i <= 2 ; i++))
do
    bin/nutch generate $webdb_dir $segments_dir
    segment=`ls -d $segments_dir/* | tail -1`
    bin/nutch fetch $segment
    bin/nutch updatedb $webdb_dir $segment
done

# the 2 represents the depth
for segment in `ls -d $segments_dir/* | tail -2`
do
    bin/nutch index $indexes_dir $webdb_dir $link_dir $segment
done

# De-duplicate indexes
bin/nutch dedup $indexes_dir

mkdir c:/tmpNutch
bin/nutch merge -workingdir c:/tmpNutch/ $index_dir $indexes_dir
------------------------------------------------------------------------

Thanks a lot,
Alain

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
