Hi,

I'm following the Nutch tutorial (http://wiki.apache.org/nutch/NutchTutorial) and everything seems to be working fine, except that when I run

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*

the document count on my Solr server doesn't change (I'm viewing /solr/admin/stats.jsp). I've even gone so far as to explicitly issue a <commit /> using curl, with no success.

It seems like my fetch routine grabs a ton of documents, but only a few make it to Solr, if any at all (there are about 2000 in there already from a previous nutch solrindex that added a few).

How can I tell how many documents Nutch is sending to Solr? Should I just modify the solrindex driver program?

Just for reference, my Nutch cycle looks like this:

$ bin/nutch inject crawlwi/crawldb wiurls/
$ bin/nutch generate crawlwi/crawldb crawlwi/segments

Then I ran the following a few times, with the newest segment in a variable:

$ s1=`ls -d crawlwi/segments/2* | tail -1`
$ echo $s1
$ bin/nutch fetch $s1 -threads 15
$ bin/nutch updatedb crawlwi/crawldb $s1
$ bin/nutch generate crawlwi/crawldb crawlwi/segments -topN 5000

Then

$ bin/nutch invertlinks crawlwi/linkdb -dir crawlwi/segments
$ bin/nutch index crawlwi/indexes crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*
$ bin/nutch solrindex http://127.0.0.1/solr/ crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*

But the new documents don't make it into the index. Any ideas?

Thanks.
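
In case it helps anyone reproduce this, here is a rough sketch of how the document count could be compared before and after solrindex instead of eyeballing stats.jsp. It assumes the standard select handler is reachable at /solr/select and that the response is XML carrying a numFound attribute. The helper names num_found and count_docs, and the SOLR_URL variable, are made up for illustration.

```shell
# Assumed Solr base URL; adjust host and port to match your setup.
SOLR_URL="http://127.0.0.1:8983/solr"

# Hypothetical helper: read a Solr XML response on stdin and
# print the value of the first numFound="..." attribute.
num_found() {
  sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: ask Solr for all documents with rows=0,
# so only the total count comes back, then extract it.
count_docs() {
  curl -s "$SOLR_URL/select?q=*:*&rows=0&wt=xml" | num_found
}

# Intended usage (not run here):
#   before=$(count_docs)
#   bin/nutch solrindex "$SOLR_URL/" crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*
#   after=$(count_docs)
#   echo "before=$before after=$after"
```

If the two numbers match after a commit, then either nothing new was sent or the documents sent overwrote existing ones with the same unique key. This sketch cannot tell those two cases apart.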

