Did you set up the solr mappings? When you index into nutch, do the documents
appear there when you query nutch's interface?
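
On the question of how many documents actually reach solr: one rough way to check, without modifying the solrindex driver, is to ask solr for its document count before and after the run. The sketch below assumes solr is listening on port 8983 (as in your first solrindex command) and that the standard /select handler returns XML with a numFound attribute. The function names and the SOLR_URL variable are made up for illustration.

```shell
# Assumed Solr base URL; override by exporting SOLR_URL before sourcing this.
SOLR_URL="${SOLR_URL:-http://127.0.0.1:8983/solr}"

# Read a Solr XML response on stdin and print the value of numFound.
solr_num_found() {
  sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Query for all documents but return zero rows, so only the count comes back.
solr_doc_count() {
  curl -s "$SOLR_URL/select?q=*:*&rows=0&wt=xml" | solr_num_found
}

# Usage (commented out so sourcing this file has no side effects):
#   before=$(solr_doc_count)
#   bin/nutch solrindex "$SOLR_URL/" crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*
#   curl -s "$SOLR_URL/update" -H 'Content-Type: text/xml' --data-binary '<commit/>'
#   after=$(solr_doc_count)
#   echo "documents added: $((after - before))"
```

If the count does not move, it may also be worth comparing it with what nutch thinks it fetched; `bin/nutch readseg -list -dir crawlwi/segments` should print fetched and parsed counts per segment.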

On Jul 31, 2010, at 5:12 PM, Max Lynch <[email protected]> wrote:

> Hi,
> I'm following the nutch tutorial (http://wiki.apache.org/nutch/NutchTutorial)
> and everything seems to be working fine, except when I try to run
> 
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb
> crawl/segments/*
> 
> The document count on my solr server doesn't change (I'm viewing
> /solr/admin/stats.jsp).  I've even gone so far as to explicitly issue a
> <commit /> using curl, with no success.
> 
> It seems like my fetch routine grabs a ton of documents, but only a few, if
> any, make it to solr (there are about 2000 in there already from a previous
> nutch solrindex that added a few).  How can I tell how many documents nutch
> is sending to solr?  Should I just modify the solrindex driver program?
> 
> Just for reference, my nutch cycle looks like this:
> 
> $ bin/nutch inject crawlwi/crawldb wiurls/
> $ bin/nutch generate crawlwi/crawldb crawlwi/segments
> 
> Then I ran the following a few times, with the newest segment in a variable:
> $ s1=`ls -d crawlwi/segments/2* | tail -1`
> $ echo $s1
> $ bin/nutch fetch $s1 -threads 15
> $ bin/nutch updatedb crawlwi/crawldb $s1
> $ bin/nutch generate crawlwi/crawldb crawlwi/segments -topN 5000
> 
> Then
> $ bin/nutch invertlinks crawlwi/linkdb -dir crawlwi/segments
> $ bin/nutch index crawlwi/indexes crawlwi/crawldb crawlwi/linkdb
> crawlwi/segments/*
> $ bin/nutch solrindex http://127.0.0.1/solr/ crawlwi/crawldb crawlwi/linkdb
> crawlwi/segments/*
> 
> But the new documents don't make the index.
> 
> Any ideas?
> Thanks.
