Hi all,
I use nutch-0.8-dev with metadata. When I update the crawldb, a NullPointerException (NPE) occurs on line 71 of CrawlDbReducer.

060412 163435 job_yh0f7t
java.lang.NullPointerException
        at org.apache.nutch.crawl.CrawlDbReducer.reduce(CrawlDbReducer.java:71)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:283)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:144)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
        at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:102)


The reason is that the "highest" CrawlDatum has no metadata (it is null), and this null metadata is copied into the "result" CrawlDatum.
Line 67:
    result.set(highest);


After that, the metadata of the "result" CrawlDatum, which is now null, is dereferenced.
Line 71:
        result.getMetaData().putAll(old.getMetaData());
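
To make the failure easier to reproduce, here is a minimal sketch of what I think happens. It does not use the real Nutch classes: Datum and MetaDataNpeSketch are hypothetical stand-ins, and I am assuming that set() copies the metadata reference as-is, null included. mergeGuarded() is only one possible null check, not a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CrawlDatum; only the metadata handling is modelled.
class Datum {
    private Map<String, String> metaData; // may be null, as for "highest"

    Datum(Map<String, String> metaData) { this.metaData = metaData; }

    Map<String, String> getMetaData() { return metaData; }

    void setMetaData(Map<String, String> metaData) { this.metaData = metaData; }

    // Assumption: like result.set(highest), this copies the reference, null included.
    void set(Datum other) { this.metaData = other.metaData; }
}

class MetaDataNpeSketch {
    // Same shape as line 71: throws NullPointerException when result has no metadata.
    static void mergeUnguarded(Datum result, Datum old) {
        result.getMetaData().putAll(old.getMetaData());
    }

    // Guarded variant: nothing to do if old has no metadata,
    // and a fresh map is created if result has none.
    static void mergeGuarded(Datum result, Datum old) {
        if (old.getMetaData() == null) {
            return;
        }
        if (result.getMetaData() == null) {
            result.setMetaData(new HashMap<String, String>());
        }
        result.getMetaData().putAll(old.getMetaData());
    }

    public static void main(String[] args) {
        Map<String, String> oldMeta = new HashMap<String, String>();
        oldMeta.put("key", "value");
        Datum old = new Datum(oldMeta);

        Datum highest = new Datum(null); // "highest" carries no metadata
        Datum result = new Datum(new HashMap<String, String>());
        result.set(highest);             // result's metadata is now null

        try {
            mergeUnguarded(result, old);
        } catch (NullPointerException e) {
            System.out.println("NPE in unguarded merge");
        }

        mergeGuarded(result, old);
        System.out.println(result.getMetaData());
    }
}
```

With a check like the one in mergeGuarded(), the metadata of "old" would survive the update even when "highest" has none.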


Is this a bug?

Marko
