Marko Bauhardt wrote:
Hi all,
I use nutch-0.8-dev with metadata. When I update the crawldb, an NPE occurs on
line 71 of CrawlDbReducer.
060412 163435 job_yh0f7t
java.lang.NullPointerException
at org.apache.nutch.crawl.CrawlDbReducer.reduce(CrawlDbReducer.java:71)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:283)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:144)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:102)
The reason is that the "highest" CrawlDatum has no metadata (null),
and this null metadata is copied to the "result" CrawlDatum.
Line 67:
result.set(highest);
After that, the metadata of the "result" CrawlDatum is dereferenced.
Line 71:
result.getMetaData().putAll(old.getMetaData());
Is this a bug?
Yes. I think this should be fixed in CrawlDbReducer. Having this
CrawlDatum field null seems awkward, and I would rather initialize it
when creating the CrawlDatum, but that may lead to the unnecessary
creation of many MapWritable objects. So we'll fix it here.
Please try this patch:
Index: src/java/org/apache/nutch/crawl/CrawlDbReducer.java
===================================================================
--- src/java/org/apache/nutch/crawl/CrawlDbReducer.java (revision 393266)
+++ src/java/org/apache/nutch/crawl/CrawlDbReducer.java (working copy)
@@ -68,6 +68,7 @@
if (old != null) {
// copy metadata from old, if exists
if (old.getMetaData() != null) {
+ if (result.getMetaData() == null) result.setMetaData(new MapWritable());
result.getMetaData().putAll(old.getMetaData());
// overlay with new, if any
if (highest.getMetaData() != null)
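
For anyone who wants to see the guard in isolation, here is a minimal sketch of the same idea outside of Nutch. The Datum class, the mergeMetaData helper, and the use of a plain HashMap in place of MapWritable are all hypothetical stand-ins for illustration; this is not the actual CrawlDatum or CrawlDbReducer code.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataMergeSketch {

    /** Hypothetical stand-in for CrawlDatum: the metadata map stays null until needed. */
    static class Datum {
        private Map<String, String> metaData; // null by default, as in the reported case

        Map<String, String> getMetaData() { return metaData; }
        void setMetaData(Map<String, String> m) { metaData = m; }
    }

    /** Copies metadata from old into result, then overlays the values from highest. */
    static void mergeMetaData(Datum result, Datum old, Datum highest) {
        if (old != null && old.getMetaData() != null) {
            // Guard: result may carry a null map, so create one lazily before putAll.
            if (result.getMetaData() == null) {
                result.setMetaData(new HashMap<>());
            }
            result.getMetaData().putAll(old.getMetaData());
            // Overlay with the newer values, if any.
            if (highest.getMetaData() != null) {
                result.getMetaData().putAll(highest.getMetaData());
            }
        }
    }

    public static void main(String[] args) {
        Datum old = new Datum();
        old.setMetaData(new HashMap<>());
        old.getMetaData().put("lang", "en");

        Datum highest = new Datum(); // no metadata, as in the failing update
        Datum result = new Datum();  // models the state after result.set(highest)

        // Without the guard inside mergeMetaData, this call would throw an NPE.
        mergeMetaData(result, old, highest);
        System.out.println(result.getMetaData());
    }
}
```

The map is only allocated when there is old metadata to copy, which keeps the lazy behaviour and avoids creating a map for every datum.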
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers