Strange. Do you use the stock solrindex-mapping.xml config? Any custom indexing plugins that might be copying fields?
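
One quick way to check the mapping side is to look for destination fields that are mapped more than once. This is only a rough sketch, not a diagnosis. It assumes the mapping file lives at conf/solrindex-mapping.xml under NUTCH_HOME and that mappings are written as dest="..." attributes. The dup_dests helper is a hypothetical name, not part of Nutch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print every dest="..." value
# that occurs more than once in the given mapping file.
dup_dests() {
    grep -o 'dest="[^"]*"' "$1" | sort | uniq -d
}

# Assumed location of the mapping file; adjust to your installation.
NUTCH_HOME="${NUTCH_HOME:-/home/nutch-nuevo/runtime/local}"
MAPPING="${NUTCH_HOME}/conf/solrindex-mapping.xml"

if [ -f "$MAPPING" ]; then
    dup_dests "$MAPPING"
fi
```

If this prints a field such as boost or segment, two mappings are writing to the same destination, which would be worth comparing against a field that schema.xml does not declare as multiValued. An empty result only rules out this one possibility.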

On Monday 18 July 2011 15:54:29 Germán Biozzoli wrote:
> Hi everybody
>
> I have a small batch process, based on Nutch 1.2, that crawls a site and
> then inserts the data into a Solr instance. After updating to version 1.3
> it started to cause problems with the Solr configuration. It seems to be
> generating duplicate url, segment, boost and other fields that were not
> configured as multiValued. Additionally, it seems to be generating
> duplicate values.
>
> hadoop.log:
>
> org.apache.solr.common.SolrException: ERROR: [http://www.marketshare.com/] multiple values encountered for non multiValued field boost: [1.0985885, 1.0985885]
>
> ERROR: [http://www.marketshare.com/] multiple values encountered for non multiValued field boost: [1.0985885, 1.0985885]
>
> request: http://localhost:8983/solr/market/update?wt=javabin&version=2
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>
> My process does the following:
>
> rm -rf crawl
> rm *.out
>
> NUTCH_HOME="/home/nutch-nuevo/runtime/local"
> ${NUTCH_HOME}/bin/nutch crawl urls -dir crawl -depth 10 -topN 300000 > log.out
>
> segment=`ls -d crawl/segments/*`
> ${NUTCH_HOME}/bin/nutch updatedb crawl/crawldb $segment
> ${NUTCH_HOME}/bin/nutch invertlinks crawl/linkdb -dir crawl/segments > invert.out
>
> ${NUTCH_HOME}/bin/nutch solrindex http://localhost:8983/solr/bolsa/ crawl/crawldb crawl/linkdb crawl/segments/* > index.out
>
> Additionally, the crawl process starts by saying that solrUrl is not
> configured. Does that mean I can index directly into Solr without the
> previous Lucene step?
>
> Any hint?
> Thank you in advance
> Germán

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

