Hi Germán,

To answer the last part of your thread: yes, you can pass the -solr parameter to the nutch crawl command, just as documented here [1]. This saves you the annoying extra step of running a separate solrindex command when you want to index to Solr.
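As a rough sketch, the one-step version of your script would look something like the following. The Solr URL and paths are taken from the script in your mail and are assumptions about your setup; check [1] for the exact -solr syntax in your Nutch version.

```shell
# One-step crawl + index: the -solr flag tells the crawl command to
# index into Solr itself, replacing the separate solrindex invocation.
# NUTCH_HOME and the Solr core URL below come from the original script
# and may need adjusting for your installation.
NUTCH_HOME="/home/nutch-nuevo/runtime/local"

"${NUTCH_HOME}/bin/nutch" crawl urls \
    -dir crawl \
    -solr http://localhost:8983/solr/bolsa/ \
    -depth 10 \
    -topN 300000 \
    > log.out
```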
I've not seen this particular error before, but perhaps Markus' comments can help.

[1] http://wiki.apache.org/nutch/bin/nutch_crawl

On Mon, Jul 18, 2011 at 2:54 PM, Germán Biozzoli <[email protected]> wrote:

> Hi everybody
>
> I have a small batch process based on Nutch 1.2 that crawls a site and
> then inserts the data into a Solr instance. After upgrading to version
> 1.3 it started causing problems on the Solr side: it appears to produce
> duplicate url, segment, boost and other fields that are not configured
> as multiValued, and it also seems to produce duplicate values.
>
> hadoop.log:
>
> org.apache.solr.common.SolrException: ERROR:
> [http://www.marketshare.com/] multiple values encountered for non
> multiValued field boost: [1.0985885, 1.0985885]
>
> ERROR: [http://www.marketshare.com/] multiple values encountered for
> non multiValued field boost: [1.0985885, 1.0985885]
>
> request: http://localhost:8983/solr/market/update?wt=javabin&version=2
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>
> My process does the following:
>
> rm -rf crawl
> rm *.out
>
> NUTCH_HOME="/home/nutch-nuevo/runtime/local"
> ${NUTCH_HOME}/bin/nutch crawl urls -dir crawl -depth 10 -topN 300000 > log.out
>
> segment=`ls -d crawl/segments/*`
> ${NUTCH_HOME}/bin/nutch updatedb crawl/crawldb $segment
> ${NUTCH_HOME}/bin/nutch invertlinks crawl/linkdb -dir crawl/segments > invert.out
>
> ${NUTCH_HOME}/bin/nutch solrindex http://localhost:8983/solr/bolsa/ crawl/crawldb crawl/linkdb crawl/segments/* > index.out
>
> Additionally, the crawl process starts by saying that solrUrl is not
> configured; does that mean I could index directly to Solr without the
> previous Lucene step?
>
> Any hint?
> Thank you in advance
> Germán

--
*Lewis*

