Hey

I was inserting the data into the table rss_webpage (the "webpage" part is appended automatically by Nutch), but when I changed the table to rss_one_webpage, the error disappeared. Is the cause on the Nutch side or the MongoDB side?

Thanks and Regards,
Shubham Gupta

On Wednesday 08 March 2017 12:44 PM, shubham.gupta wrote:
Hey

While running the whole Nutch process flow (Inject, Generate, Fetch, Parse, Update), the following errors are logged:

*Generator Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:34)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2017-03-07 15:28:07,696 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[rss_new]generate: 1488880683-1996901673, jobid=job_local78754654_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:121)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:233)
        at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:262)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:328)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:336)

*Fetcher Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.nutch.fetcher.FetcherJob$FetcherMapper.map(FetcherJob.java:96)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

*Parser Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
        at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:80)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The plugin.folders directory specified in conf/nutch-site.xml is correct. When I check the code, the line number in each trace points to the line where the class is declared, e.g. public class GeneratorMapper.

What changes need to be made in the configuration files?
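A minimal Java sketch of one possible explanation, offered as an assumption rather than a confirmed diagnosis. Every trace fails while casting a key inside a mapper (GeneratorMapper.map, FetcherMapper.map, ParserMapper.map), and these mappers expect String row keys. If some documents in the collection carry a MongoDB-generated ObjectId as their _id instead of a String key, the cast would fail exactly as logged. That would also fit the error disappearing with the fresh rss_one_webpage collection. FakeObjectId, asRowKey and findNonStringKeys are hypothetical names used only for illustration. The sketch does not use the MongoDB driver or Gora.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyCastSketch {

    // Hypothetical stand-in for org.bson.types.ObjectId, so the sketch runs without the MongoDB driver.
    static final class FakeObjectId {
        private final String hex;

        FakeObjectId(String hex) {
            this.hex = hex;
        }

        @Override
        public String toString() {
            return "ObjectId(" + hex + ")";
        }
    }

    // Assumption: each document's _id reaches a mapper whose key type is String,
    // which amounts to this cast. A non-String _id throws ClassCastException here.
    static String asRowKey(Object rawId) {
        return (String) rawId;
    }

    // Collects the raw ids that would make a String-keyed job throw ClassCastException.
    static List<Object> findNonStringKeys(List<Object> rawIds) {
        List<Object> bad = new ArrayList<>();
        for (Object id : rawIds) {
            if (!(id instanceof String)) {
                bad.add(id);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Illustrative ids only: a reversed-URL style String key and a driver-style ObjectId.
        List<Object> rawIds = Arrays.asList(
                "com.example.www:http/",
                new FakeObjectId("58be7f2ae4b0c1a2b3c4d5e6"));

        System.out.println("Non-String keys: " + findNonStringKeys(rawIds));

        // Casting each id the way a String-keyed mapper would.
        for (Object id : rawIds) {
            try {
                System.out.println("OK: " + asRowKey(id));
            } catch (ClassCastException e) {
                System.out.println("ClassCastException for " + id);
            }
        }
    }
}
```

If this assumption holds, the fix would lie in the data rather than in the configuration files. Either the jobs would read a collection whose documents all have String keys, or the ObjectId-keyed documents would need to be removed.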

