I got this exception a lot, too. I haven't tested the patch by Andrzej
yet; instead I just put the doc.add() lines in the indexer's reduce
function in a try-catch block. This way the indexing finishes even with
a null value, and I can see in the log file which documents haven't
been indexed.
Wouldn't it be a good idea to catch every exception that only affects
one document in loops like this? At least I don't like it when an
indexing process dies after a few hours because one document triggers
such an exception.
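A minimal, self-contained sketch of the per-document try-catch idea
described above. It is not the actual Nutch Indexer.reduce() code:
the class SkipBadDocuments and the methods makeField() and indexAll()
are hypothetical stand-ins, with makeField() only imitating the way
Lucene's Field constructor rejects a null value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipBadDocuments {

    // Hypothetical stand-in for building a Lucene Field: like the
    // constructor in the stack trace, it rejects a null value.
    static String makeField(String name, String value) {
        if (value == null) {
            throw new NullPointerException("value cannot be null");
        }
        return name + "=" + value;
    }

    // Builds one "document" per map entry and appends it to index.
    // A failure while building a document is logged and that document
    // is skipped, so the loop continues with the next one.
    // Returns the URLs of the skipped documents.
    static List<String> indexAll(Map<String, String> titlesByUrl, List<String> index) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, String> entry : titlesByUrl.entrySet()) {
            try {
                List<String> doc = new ArrayList<>();
                doc.add(makeField("url", entry.getKey()));
                doc.add(makeField("title", entry.getValue()));
                index.add(String.join(";", doc));
            } catch (RuntimeException ex) {
                // Only this document is affected; record it and move on.
                System.err.println("skipping " + entry.getKey() + ": " + ex);
                skipped.add(entry.getKey());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, String> input = new LinkedHashMap<>();
        input.put("http://example.com/a", "Page A");
        input.put("http://example.com/b", null); // triggers the exception
        input.put("http://example.com/c", "Page C");

        List<String> index = new ArrayList<>();
        List<String> skipped = indexAll(input, index);
        System.out.println("indexed " + index.size() + ", skipped " + skipped);
    }
}
```

The try block wraps the whole construction of one document, so a
half-built document never reaches the index. Errors that are not tied
to a single document (an IOException on the index itself, for example)
are deliberately not caught here and would still stop the job.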
best regards,
Dominik
Byron Miller wrote:
060111 103432 reduce > reduce
060111 103432 Optimizing index.
060111 103433 closing > reduce
060111 103434 closing > reduce
060111 103435 closing > reduce
java.lang.NullPointerException: value cannot be null
        at org.apache.lucene.document.Field.<init>(Field.java:469)
        at org.apache.lucene.document.Field.<init>(Field.java:412)
        at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
        at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
        at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
        at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
[EMAIL PROTECTED]:/data/nutch/trunk$
Pulled today's build and got the above error. No problems
with running out of disk space or anything like that. This
is a single instance using local file systems.
Is there any way to recover the crawl and finish the reduce
job from where it failed?