Strange. Do you use a stock solrmapping.xml config? Custom indexing plugins 
that might be copying fields?
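
A quick way to look for candidates is to list every destination field that is named more than once in the mapping file. This is only a rough sketch: the helper name duplicate_dests is made up for illustration, it assumes the mapping entries carry a dest="..." attribute, and a field that shows up twice is a hint rather than proof, since an indexing plugin could add the duplicate as well.

```shell
# Hypothetical helper (not part of Nutch): print every dest="..." target
# that occurs more than once in the file given as the first argument.
duplicate_dests() {
    grep -o 'dest="[^"]*"' "$1" |      # pull out each dest="..." attribute
        sed 's/^dest="//; s/"$//' |    # keep only the field name
        sort |
        uniq -d                        # show names that appear at least twice
}

# Example call; the path is an assumption, adjust it to your install:
# duplicate_dests "${NUTCH_HOME}/conf/solrindex-mapping.xml"
```

If this prints boost, url or segment, the mapping file is the first place to look. If it prints nothing, the plugins are the more likely suspects.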

On Monday 18 July 2011 15:54:29 Germán Biozzoli wrote:
> Hi everybody
> 
> I have a small batch process based on Nutch 1.2 that crawls a site and
> then inserts the data into a Solr instance. After updating to version
> 1.3, it started to cause problems with the Solr configuration. It seems
> to be generating duplicate url, segment, boost and other fields that
> were not configured as multiValued. Additionally, it seems to be
> generating duplicate values.
> 
> hadoop.log:
> 
> 
> 
> org.apache.solr.common.SolrException: ERROR:
> [http://www.marketshare.com/] multiple values encountered for non
> multiValued field boost: [1.0985885, 1.0985885]
> 
> ERROR: [http://www.marketshare.com/] multiple values encountered for
> non multiValued field boost: [1.0985885, 1.0985885]
> 
> request: http://localhost:8983/solr/market/update?wt=javabin&version=2
>       at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
>       at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>       at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>       at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> 
> My process does the following:
> 
> rm -rf crawl
> rm *.out
> 
> NUTCH_HOME="/home/nutch-nuevo/runtime/local"
> ${NUTCH_HOME}/bin/nutch crawl urls -dir crawl -depth 10 -topN 300000 > log.out
> 
> segment=`ls -d crawl/segments/*`
> ${NUTCH_HOME}/bin/nutch updatedb crawl/crawldb $segment
> ${NUTCH_HOME}/bin/nutch invertlinks crawl/linkdb -dir crawl/segments > invert.out
> 
> ${NUTCH_HOME}/bin/nutch solrindex http://localhost:8983/solr/bolsa/ crawl/crawldb crawl/linkdb crawl/segments/* > index.out
> 
> 
> Additionally, the crawl process starts by saying that solrUrl is not
> configured. Does that mean I could index directly into Solr without
> the previous Lucene indexing step?
> 
> Any hint?
> Thank you in advance
> Germán

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
