The Solr schema and mappings all seem to work fine. The problem is that sometimes I run solrindex and no documents get added to the Solr index, with no indication of why. I see my fetcher grabbing thousands of pages, yet the doc count on Solr doesn't increase.
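For what it's worth, here's how I've been sanity-checking the count, hitting Solr directly with curl rather than watching stats.jsp (the live query is commented out below; the sed line just extracts numFound from a response I captured earlier, so the snippet stands on its own):

```shell
# Live query (commented out here): ask Solr for zero rows, just the count.
#   curl -s 'http://127.0.0.1:8983/solr/select?q=*:*&rows=0'
# The response carries the index size in the numFound attribute, e.g.:
response='<result name="response" numFound="2000" start="0"/>'
# Extract numFound from the XML with sed:
echo "$response" | sed -n 's/.*numFound="\([0-9]*\)".*/\1/p'
```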
I've cleared my index and have been following the steps here:
http://wiki.apache.org/nutch/RunningNutchAndSolr and it seems to be working
better. I'm just not sure why these steps work better when the Nutch tutorial
steps before didn't. The only difference I can see is the added -noParse and
parse steps. I think it's the non-determinism, or the lack of output, that
unsettles me. Can I enable debugging output or something?

On Sat, Jul 31, 2010 at 8:34 PM, Scott Gonyea <[email protected]> wrote:

> Did you set up the Solr mappings? When you index into Nutch, do they appear
> there when you query Nutch's interface?
>
> On Jul 31, 2010, at 5:12 PM, Max Lynch <[email protected]> wrote:
>
> > Hi,
> > I'm following the Nutch tutorial
> > (http://wiki.apache.org/nutch/NutchTutorial)
> > and everything seems to be working fine, except when I try to run
> >
> >   bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> >
> > The document count on my Solr server doesn't change (I'm viewing
> > /solr/admin/stats.jsp). I've even gone so far as to explicitly issue a
> > <commit /> using curl, with no success.
> >
> > It seems like my fetch routine grabs a ton of documents, but only a few
> > make it to Solr, if any at all (there are about 2000 in there already
> > from a previous nutch solrindex run that added a few). How can I tell
> > how many documents Nutch is sending to Solr? Should I just modify the
> > solrindex driver program?
> >
> > Just for reference, my Nutch cycle looks like this:
> >
> >   $ bin/nutch inject crawlwi/crawldb wiurls/
> >   $ bin/nutch generate crawlwi/crawldb crawlwi/segments
> >
> > Then I ran the following a few times, with the newest segment in a
> > variable:
> >
> >   $ s1=`ls -d crawlwi/segments/2* | tail -1`
> >   $ echo $s1
> >   $ bin/nutch fetch $s1 -threads 15
> >   $ bin/nutch updatedb crawlwi/crawldb $s1
> >   $ bin/nutch generate crawlwi/crawldb crawlwi/segments -topN 5000
> >
> > Then:
> >
> >   $ bin/nutch invertlinks crawlwi/linkdb -dir crawlwi/segments
> >   $ bin/nutch index crawlwi/indexes crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*
> >   $ bin/nutch solrindex http://127.0.0.1/solr/ crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*
> >
> > But the new documents don't make it into the index.
> >
> > Any ideas?
> > Thanks.
>
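P.S. Partially answering my own question about debugging output: Nutch's logging goes through log4j, so something like the following in conf/log4j.properties should turn up the indexer chatter. These logger names are my guess at the relevant packages; I haven't confirmed exactly which classes log what:

```properties
# conf/log4j.properties -- guessed logger names, adjust as needed.
# Output should end up in logs/hadoop.log by default.
log4j.logger.org.apache.nutch.indexer=DEBUG
log4j.logger.org.apache.nutch.indexer.solr=DEBUG
```

Then I can tail logs/hadoop.log while solrindex runs and see whether any documents are actually being submitted.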

