Hi, I am facing this error as well. I have now located one particular document that is causing it (an MS Word document which can't be properly parsed by the parser). I have sent it to Andrzej in a separate email. Let's see if that helps... Lukas
On 1/11/06, Dominik Friedrich <[EMAIL PROTECTED]> wrote:
> I got this exception a lot, too. I haven't tested the patch by Andrzej
> yet, but instead I just put the doc.add() lines in the indexer reduce
> function in a try-catch block. This way the indexing finishes even with
> a null value, and I can see which documents haven't been indexed in the
> log file.
>
> Wouldn't it be a good idea to catch every exception that only affects
> one document in loops like this? At least I don't like it when an indexing
> process dies after a few hours because one document triggers such an
> exception.
>
> best regards,
> Dominik
>
> Byron Miller wrote:
> > 060111 103432 reduce > reduce
> > 060111 103432 Optimizing index.
> > 060111 103433 closing > reduce
> > 060111 103434 closing > reduce
> > 060111 103435 closing > reduce
> > java.lang.NullPointerException: value cannot be null
> >         at org.apache.lucene.document.Field.<init>(Field.java:469)
> >         at org.apache.lucene.document.Field.<init>(Field.java:412)
> >         at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
> >         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
> >         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
> >         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
> >         at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
> > [EMAIL PROTECTED]:/data/nutch/trunk$
> >
> > Pulled today's build and got the above error. No problems
> > running out of disk space or anything like that. This
> > is a single instance, local file systems.
> >
> > Any way to recover the crawl/finish the reduce job from
> > where it failed?
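
For reference, the workaround Dominik describes would look roughly like the sketch below. This is a minimal illustration, not the actual Indexer.reduce() code: it assumes it sits inside the reduce loop where a Lucene Document `doc`, a `parseData` object, the document `url`, and a `LOG` handle are already in scope, and the "title" field is just one example of the fields being added.

    // Hypothetical per-document guard; variable and field names are illustrative.
    try {
      String title = parseData.getTitle();          // may be null for documents
                                                    // the parser couldn't handle
      if (title != null) {                          // Field.UnIndexed() throws the
        doc.add(Field.UnIndexed("title", title));   // NullPointerException on null
      }
      // ... add the remaining fields the same way ...
    } catch (Exception e) {
      // Log and skip this one document instead of letting the exception
      // kill the whole reduce task after hours of indexing.
      LOG.warning("Skipping unindexable document " + url + ": " + e.getMessage());
      return;
    }

The null check addresses the specific NPE in the trace above (Field.UnIndexed() rejects a null value), while the surrounding try-catch keeps any other single-document failure from aborting the whole job, which is the behavior Dominik is asking for.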
