Hey

While running the whole Nutch process flow (Inject, Generate, Fetch, Parse, Update), the following errors are being logged:

*Generator Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:34)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2017-03-07 15:28:07,696 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[rss_new]generate: 1488880683-1996901673, jobid=job_local78754654_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:121)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:233)
        at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:262)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:328)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:336)

*Fetcher Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.nutch.fetcher.FetcherJob$FetcherMapper.map(FetcherJob.java:96)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

*Parser Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:80)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The plugin.folder directory specified in conf/nutch-site.xml is correct. When I check the code, the line numbers in the stack traces point to the line where the class is declared, for example the line containing public class GeneratorMapper.

What changes need to be made in the configuration files?

--
Thanks and Regards,
Shubham Gupta
