I've tried using both Solr 6 and 5 with the latest Nutch 2, and with
both I am getting an error from Nutch's bin/crawl.
/mnt/nutch/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/nutch
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local2123017879_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:383)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:393)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:403)
Error running:
/mnt/nutch/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/nutch
Failed with exit value 1.
hadoop.log says:
java.lang.Exception: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecordReader.nextKeyValue(SolrDeleteDuplicates.java:233)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
The exception in SolrRecordReader.nextKeyValue appears to be related to the digest field somehow.
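To illustrate my guess (a standalone sketch, not Nutch's actual code; the class and method names below are ones I made up): if a Solr field comes back multi-valued, SolrJ hands the caller a List rather than a String, and a direct cast fails with exactly the ClassCastException in the trace above.

```java
import java.util.ArrayList;
import java.util.List;

public class DigestCastDemo {
    // Mirrors the pattern of the failing cast: SolrJ's getFieldValue()
    // returns Object, and the value is cast straight to String.
    static String readDigest(Object fieldValue) {
        return (String) fieldValue;
    }

    public static void main(String[] args) {
        // A single-valued field arrives as a String: the cast succeeds.
        System.out.println(readDigest("0123abcd"));

        // A multi-valued field arrives as a List: the cast throws.
        List<String> multi = new ArrayList<>();
        multi.add("0123abcd");
        try {
            readDigest(multi);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

So if the managed Solr schema happens to declare digest as multiValued="true" (e.g. <field name="digest" type="string" indexed="true" stored="true" multiValued="true"/>), that could explain the ArrayList showing up where a String is expected. That's an assumption on my part, though.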
Is this a known bug? Do I need to pair a particular version of Nutch with a particular version of Solr?
--
*Tom Chiverton*
Lead Developer
e: [email protected] <mailto:[email protected]>
p: 0161 817 2922
t: @extravision <http://www.twitter.com/extravision>
w: www.extravision.com <http://www.extravision.com/>
Extravision - email worth seeing <http://www.extravision.com/>
Registered in the UK at: 107 Timber Wharf, 33 Worsley Street,
Manchester, M15 4LD.
Company Reg No: 05017214 VAT: GB 824 5386 19
This e-mail is intended solely for the person to whom it is addressed
and may contain confidential or privileged information.
Any views or opinions presented in this e-mail are solely of the author
and do not necessarily represent those of Extravision Ltd.