Hi Germán,

To answer the last part of your thread: yes, you can pass the -solr parameter
to the nutch crawl command, as documented here [1]. This would save you
the annoying task of running a separate solrindex command when you wish to
index to Solr.

I've not seen this particular error before; however, maybe Markus' comments
can help.

[1] http://wiki.apache.org/nutch/bin/nutch_crawl
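For illustration, your existing script could then collapse into a single crawl
invocation along these lines (a sketch only, assuming the local core `bolsa`
from your script; adjust the Solr URL, depth, and topN to your setup):

```shell
# Crawl and index to Solr in one step (Nutch 1.3).
# The -solr parameter tells the crawl command where to send documents,
# so a separate solrindex invocation should no longer be needed.
${NUTCH_HOME}/bin/nutch crawl urls -dir crawl \
    -solr http://localhost:8983/solr/bolsa/ \
    -depth 10 -topN 300000 > log.out
```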

On Mon, Jul 18, 2011 at 2:54 PM, Germán Biozzoli
<[email protected]>wrote:

> Hi everybody
>
> I have a small batch process based on Nutch 1.2 that crawls a site and
> then inserts the data into a Solr instance. After updating to version
> 1.3, it started generating problems with the Solr configuration: it
> seems to be producing duplicate url, segment, boost and other fields
> that were not configured as multiValued, and it also seems to be
> generating duplicate values.
>
> hadoop.log:
>
>
>
> org.apache.solr.common.SolrException: ERROR:
> [http://www.marketshare.com/] multiple values encountered for non
> multiValued field boost: [1.0985885, 1.0985885]
>
> ERROR: [http://www.marketshare.com/] multiple values encountered for
> non multiValued field boost: [1.0985885, 1.0985885]
>
> request: http://localhost:8983/solr/market/update?wt=javabin&version=2
>        at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
>        at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>        at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>
> My process does the following:
>
> rm -rf crawl
> rm *.out
>
> NUTCH_HOME="/home/nutch-nuevo/runtime/local"
> ${NUTCH_HOME}/bin/nutch crawl urls -dir crawl -depth 10 -topN 300000 >
> log.out
>
> segment=`ls -d crawl/segments/*`
> ${NUTCH_HOME}/bin/nutch updatedb crawl/crawldb $segment
> ${NUTCH_HOME}/bin/nutch invertlinks crawl/linkdb -dir crawl/segments >
> invert.out
>
> ${NUTCH_HOME}/bin/nutch solrindex http://localhost:8983/solr/bolsa/
> crawl/crawldb crawl/linkdb crawl/segments/* > index.out
>
>
> Additionally, the crawl process starts by saying that solrUrl is not
> configured. Does this mean that I could index directly to Solr, skipping
> the previous Lucene indexing step?
>
> Any hint?
> Thank you in advance
> Germán
>



-- 
*Lewis*
