Hey
I was inserting the data into a table named rss_webpage (the "webpage"
suffix is appended automatically by Nutch), but when I changed the table to
rss_one_webpage the error disappeared. Is the cause of this in Nutch or in
MongoDB?
Thanks and Regards,
Shubham Gupta
On Wednesday 08 March 2017 12:44 PM, shubham.gupta wrote:
Hey
While running the whole Nutch process flow, i.e. Inject, Generate, Fetch,
Parse, Update, the following errors are logged:
*Generator Job:*
java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
    at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:34)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2017-03-07 15:28:07,696 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[rss_new]generate: 1488880683-1996901673, jobid=job_local78754654_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:121)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:233)
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:262)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:328)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:336)
*Fetcher Job:*
java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
    at org.apache.nutch.fetcher.FetcherJob$FetcherMapper.map(FetcherJob.java:96)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
*Parser Job:*
java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
    at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:80)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
The plugin.folders directory specified in conf/nutch-site.xml is
correct. When I check the code, each stack trace points to the line where
the class is declared, e.g. public class GeneratorMapper. What changes need
to be made in the configuration files?
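
For anyone reading this thread later, here is a small sketch of one possible
reading of the traces above. It is an assumption, not a confirmed diagnosis:
the collection may hold documents whose _id is an ObjectId (for example, rows
written by something other than Nutch), while the reading code expects a
String key. The ObjectId class below is a hypothetical stand-in for
org.bson.types.ObjectId so that the example runs without the MongoDB driver,
and readKey, hasStringKey and the sample key values are made up for
illustration; none of this is Nutch or Gora code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ObjectIdCastDemo {

    // Hypothetical stand-in for org.bson.types.ObjectId (assumption:
    // only its "not a String" nature matters for this illustration).
    public static final class ObjectId {
        private final String hex;

        public ObjectId(String hex) {
            this.hex = hex;
        }

        @Override
        public String toString() {
            return hex;
        }
    }

    // Mimics a reader that assumes every document key is a String.
    // The cast fails at runtime when the stored _id has another type.
    public static String readKey(Map<String, Object> doc) {
        return (String) doc.get("_id");
    }

    // A defensive check that could be run before the cast.
    public static boolean hasStringKey(Map<String, Object> doc) {
        return doc.get("_id") instanceof String;
    }

    public static void main(String[] args) {
        // A document keyed by a String (illustrative value only).
        Map<String, Object> stringKeyed = new LinkedHashMap<>();
        stringKeyed.put("_id", "com.example:http/");

        // A document that was given an ObjectId key instead.
        Map<String, Object> objectIdKeyed = new LinkedHashMap<>();
        objectIdKeyed.put("_id", new ObjectId("58bfa1c2e4b0a1b2c3d4e5f6"));

        System.out.println("String key read: " + readKey(stringKeyed));

        try {
            readKey(objectIdKeyed);
        } catch (ClassCastException e) {
            // Same exception type as in the Generator/Fetcher/Parser traces.
            System.out.println("Cast failed; has String key? "
                    + hasStringKey(objectIdKeyed));
        }
    }
}
```

If this reading applies, it would also be consistent with the error
disappearing after switching to a fresh table, since a new collection would
contain only the rows Nutch itself wrote. Checking the old collection for
documents with a non-String _id would confirm or rule this out.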