Hi everybody, I have a small batch Nutch 1.2-based process that crawls a site and then inserts the data into a Solr instance. After updating to version 1.3 it started producing errors on the Solr side: it seems to be sending duplicate values for url, segment, boost and other fields that are not configured as multiValued.
From hadoop.log:

org.apache.solr.common.SolrException: ERROR: [http://www.marketshare.com/] multiple values encountered for non multiValued field boost: [1.0985885, 1.0985885]
request: http://localhost:8983/solr/market/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)

My process does the following:

rm -rf crawl
rm *.out
NUTCH_HOME="/home/nutch-nuevo/runtime/local"
${NUTCH_HOME}/bin/nutch crawl urls -dir crawl -depth 10 -topN 300000 > log.out
segment=`ls -d crawl/segments/*`
${NUTCH_HOME}/bin/nutch updatedb crawl/crawldb $segment
${NUTCH_HOME}/bin/nutch invertlinks crawl/linkdb -dir crawl/segments > invert.out
${NUTCH_HOME}/bin/nutch solrindex http://localhost:8983/solr/bolsa/ crawl/crawldb crawl/linkdb crawl/segments/* > index.out

Additionally, the crawl step starts by warning that solrUrl is not set. Does that mean I could now index directly into Solr, without the separate Lucene indexing step?

Any hint? Thank you in advance,
Germán
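P.S. To be clear, the one-step variant I am asking about would look something like this. This is only a sketch based on my reading of the "solrUrl is not set" warning; I am assuming the crawl command accepts a -solr option that takes the Solr core URL, and the paths and URL are the ones from my script above:

```shell
#!/bin/sh
# Sketch: crawl and index into Solr in one step (assumes -solr is the
# option the "solrUrl is not set" warning refers to).
NUTCH_HOME="/home/nutch-nuevo/runtime/local"

${NUTCH_HOME}/bin/nutch crawl urls \
    -solr http://localhost:8983/solr/bolsa/ \
    -dir crawl -depth 10 -topN 300000 > log.out
```

If that works, it would replace the separate updatedb / invertlinks / solrindex steps, but I am not sure whether it hits the same multiValued problem.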

