I set multiValued="true" in my schema and I no longer see the error. Could it be an interaction with the parse-feed plugin?
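For anyone following along, the change amounts to something like this in the Solr schema.xml. The field type and the stored/indexed attributes below are illustrative only; check them against your own schema:

```xml
<!-- Allow multiple title values so solrindex stops failing with
     multiple_values_encountered_for_non_multiValued_field_title.
     Attributes other than multiValued are illustrative. -->
<field name="title" type="text" stored="true" indexed="true" multiValued="true"/>
```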
Either way, it's working so I'm happy. I'm on Nutch 1.1 and Solr 1.4.1.

On Mon, Aug 2, 2010 at 12:03 PM, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> It makes no sense indeed. But check your solrindex-mapping.xml in the Nutch
> configuration directory, it might copy the field. Also, check your
> schema.xml in the Solr configuration, for it might do the same.
>
> To make it a bit more complicated, don't you have some deduplication
> mechanism somewhere? It can prevent any additions to the index if you
> didn't properly configure it, such as a recurring field value as a source
> for the signature.
>
> And, what Nutch and Solr versions are you using? I have had multiple
> setups with Nutch 1.0, 1.1 and trunk, and Solr 1.4 and 1.4.1, but never
> came across your error for the title field. Some shipped Nutch
> configurations did actually mess up the url and id fields in the Solr
> index, which are not multi-valued.
>
> Cheers,
>
> -----Original message-----
> From: Max Lynch <[email protected]>
> Sent: Mon 02-08-2010 18:32
> To: [email protected]
> Subject: Re: Nutch SolrIndex command not adding documents
>
> So, I figured out the log debugging stuff (just had to modify some
> settings in log4j.properties), and I've found the source of my solrindex
> errors. First of all, many dates in my index fail to parse properly in
> MoreIndexingFilter.java, so I added another date format of the type
> "EEE MMM dd HH:mm:ss zzz yyyy", which I will make a bug tracker entry and
> a patch for.
>
> However, I've also encountered this issue:
> "multiple_values_encountered_for_non_multiValued_field_title"
> which crashes the job. In my Solr schema I don't allow multiple values for
> the "title" field (as per the Nutch default). Why would the parser find
> multiple title values? Seems to be another bug.
>
> Any ideas?
>
> Thanks.
>
> On Sat, Jul 31, 2010 at 9:11 PM, Max Lynch <[email protected]> wrote:
>
> > The Solr schema and mappings all seem to work fine.
> > It's just that sometimes I run solrindex and no documents get added to
> > the Solr index, and I have no indication of why that might be. I see my
> > fetcher grabbing thousands of pages, and yet my doc count on Solr
> > doesn't increase.
> >
> > I've cleared my index and have been following the steps here:
> > http://wiki.apache.org/nutch/RunningNutchAndSolr and it seems to be
> > working better. I'm just not sure why these steps seem to work better,
> > yet the Nutch tutorial steps before didn't. The only difference I can
> > see is the -noParse and parse steps added.
> >
> > I think it's the non-determinism, or lack of output, that unsettles me.
> > Can I enable debugging output or something?
> >
> > On Sat, Jul 31, 2010 at 8:34 PM, Scott Gonyea <[email protected]> wrote:
> >
> >> Did you set up the Solr mappings? When you index into Nutch, do they
> >> appear there when you query Nutch's interface?
> >>
> >> On Jul 31, 2010, at 5:12 PM, Max Lynch <[email protected]> wrote:
> >>
> >> > Hi,
> >> > I'm following the Nutch tutorial
> >> > (http://wiki.apache.org/nutch/NutchTutorial) and everything seems to
> >> > be working fine, except when I try to run
> >> >
> >> > bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> >> >
> >> > The document count on my Solr server doesn't change (I'm viewing
> >> > /solr/admin/stats.jsp). I've even gone so far as to explicitly issue
> >> > a <commit /> using curl, with no success.
> >> >
> >> > It seems like my fetch routine grabs a ton of documents, but only a
> >> > few make it to Solr, if at all (there are about 2000 in there already
> >> > from a previous nutch solrindex run that added a few). How can I tell
> >> > how many documents Nutch is sending to Solr? Should I just modify the
> >> > solrindex driver program?
> >> >
> >> > Just for reference, my Nutch cycle looks like this:
> >> >
> >> > $ bin/nutch inject crawlwi/crawldb wiurls/
> >> > $ bin/nutch generate crawlwi/crawldb crawlwi/segments
> >> >
> >> > Then I ran the following a few times, with the newest segment in a
> >> > variable:
> >> > $ s1=`ls -d crawlwi/segments/2* | tail -1`
> >> > $ echo $s1
> >> > $ bin/nutch fetch $s1 -threads 15
> >> > $ bin/nutch updatedb crawlwi/crawldb $s1
> >> > $ bin/nutch generate crawlwi/crawldb crawlwi/segments -topN 5000
> >> >
> >> > Then:
> >> > $ bin/nutch invertlinks crawlwi/linkdb -dir crawlwi/segments
> >> > $ bin/nutch index crawlwi/indexes crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*
> >> > $ bin/nutch solrindex http://127.0.0.1/solr/ crawlwi/crawldb crawlwi/linkdb crawlwi/segments/*
> >> >
> >> > But the new documents don't make it into the index.
> >> >
> >> > Any ideas?
> >> > Thanks.
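As an aside, for anyone hitting the same date-parse failures discussed above: the extra "EEE MMM dd HH:mm:ss zzz yyyy" format can be exercised with a small standalone sketch like the one below. This is not the actual MoreIndexingFilter patch; the class name and the fallback list of formats are mine, purely for illustration of the try-each-format approach.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DateFormatSketch {
    // Candidate formats tried in order; the last one is the format
    // added for dates like "Mon Aug 02 12:03:00 PDT 2010".
    private static final String[] FORMATS = {
        "yyyy-MM-dd'T'HH:mm:ss",
        "EEE, dd MMM yyyy HH:mm:ss zzz",   // RFC 1123-style dates
        "EEE MMM dd HH:mm:ss zzz yyyy"     // the newly added pattern
    };

    /** Try each candidate format; return null if none parses the value. */
    public static Date parseDate(String value) {
        for (String f : FORMATS) {
            try {
                return new SimpleDateFormat(f, Locale.ENGLISH).parse(value);
            } catch (ParseException ignored) {
                // fall through to the next candidate format
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A date of the shape that was failing before the extra format
        System.out.println(parseDate("Mon Aug 02 12:03:00 PDT 2010"));
    }
}
```

Note that SimpleDateFormat is not thread-safe, so in a real indexing filter each thread should get its own instance rather than sharing the formats statically.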

