Did you set up the Solr mappings? When you index into Nutch, do the documents appear when you query Nutch's interface?
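
To see whether anything is reaching Solr at all, it may help to compare the document count before and after a solrindex run. Below is a rough sketch, not a tested recipe for your setup: it assumes Solr is at http://127.0.0.1:8983/solr (the URL from your first command), that the stock /select and /update handlers are enabled, and that responses come back as XML. The function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: base URL taken from the first solrindex command in the thread.
SOLR_URL="${SOLR_URL:-http://127.0.0.1:8983/solr}"

# Build a match-all query URL that returns no rows, only the total count.
solr_count_url() {
    printf '%s/select?q=*:*&rows=0\n' "$1"
}

# Extract the numFound attribute from a Solr XML response read on stdin.
parse_num_found() {
    sed -n 's/.*numFound="\([0-9]*\)".*/\1/p'
}

# Print the number of documents currently searchable in Solr.
solr_count() {
    curl -s "$(solr_count_url "$1")" | parse_num_found
}

# Send an explicit commit so newly added documents become searchable.
solr_commit() {
    curl -s "$1/update" -H 'Content-Type: text/xml' --data-binary '<commit/>'
}

# Example usage (uncomment to run against a live server):
#   before=$(solr_count "$SOLR_URL")
#   bin/nutch solrindex "$SOLR_URL/" crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*
#   solr_commit "$SOLR_URL"
#   after=$(solr_count "$SOLR_URL")
#   echo "documents before: $before, after: $after"
```

If the count does not move even after the explicit commit, the documents are probably not being sent (or are being rejected), and the Nutch and Solr logs would be the next place to look.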
On Jul 31, 2010, at 5:12 PM, Max Lynch <[email protected]> wrote:
> Hi,
> I'm following the nutch tutorial (http://wiki.apache.org/nutch/NutchTutorial)
> and everything seems to be working fine, except when I try to run
>
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb
> crawl/segments/*
>
> The document count on my solr server doesn't change (I'm viewing
> /solr/admin/stats.jsp). I've even gone so far as to explicitly issue a
> <commit /> using curl, with no success.
>
> It seems like my fetch routine grabs a ton of documents, but only a few make
> it to solr if at all (there are about 2000 in there already from a previous
> nutch solrindex that added a few). How can I tell how many documents nutch
> is sending to solr? Should I just modify the solrindex driver program?
>
> Just for reference, my nutch cycle looks like this:
>
> $ bin/nutch inject crawlwi/crawldb wiurls/
> $ bin/nutch generate crawlwi/crawldb crawlwi/segments
>
> Then I ran the following a few times, with the newest segment in a variable:
> $ s1=`ls -d crawlwi/segments/2* | tail -1`
> $ echo $s1
> $ bin/nutch fetch $s1 -threads 15
> $ bin/nutch updatedb crawlwi/crawldb $s1
> $ bin/nutch generate crawlwi/crawldb crawlwi/segments -topN 5000
>
> Then
> $ bin/nutch invertlinks crawlwi/linkdb -dir crawlwi/segments
> $ bin/nutch index crawlwi/indexes crawlwi/crawldb crawlwi/linkdb
> crawlwi/segments/*
> $ bin/nutch solrindex http://127.0.0.1/solr/ crawlwi/crawldb crawlwi/linkdb
> crawlwi/segments/*
>
> But the new documents don't make the index.
>
> Any ideas?
> Thanks.

