Re: Error while indexing (mapred)

Raghavendra Prabhu Tue, 14 Feb 2006 07:38:53 -0800

Hi Florent,

Does the MapReduce job go into a loop?

Can you let us know the environment?

Are you running on Windows or Linux?

If on Windows, you should use Cygwin.
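
On the loop question: the reduce percentage in your output steps backwards several times before the job fails. A rough sketch for counting those backward steps is below. It assumes the "YYMMDD HHMMSS  reduce N%" line format seen in your log, with the quoting prefixes removed. The function name is my own and is not part of Nutch.

```python
import re

# Matches progress lines such as "060214 062914  reduce 11%".
# The format is assumed from the log output quoted in this thread.
PROGRESS = re.compile(r"^\d{6} \d{6}\s+reduce (\d+)%")


def count_backward_steps(lines):
    """Count how often the reported reduce percentage drops
    between two consecutive progress lines."""
    drops = 0
    previous = None
    for line in lines:
        match = PROGRESS.match(line.strip())
        if not match:
            continue  # skip anything that is not a reduce progress line
        current = int(match.group(1))
        if previous is not None and current < previous:
            drops += 1
        previous = current
    return drops


# Progress lines copied from the 14 Feb run quoted below.
log = """\
060214 062901  reduce 9%
060214 062905  reduce 10%
060214 062908  reduce 11%
060214 062911  reduce 12%
060214 062914  reduce 11%
060214 062917  reduce 10%
060214 062918  reduce 9%
060214 062919  reduce 10%
060214 062923  reduce 9%
060214 062924  reduce 10%
""".splitlines()

print(count_backward_steps(log))
```

A count above zero only shows that the reported progress went backwards. It does not tell you why. Comparing the timestamps of the drops with the "has been lost" messages in the jobtracker log may show whether they line up.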

Rgds
Prabhu


On 2/14/06, Florent Gluck <[EMAIL PROTECTED]> wrote:
>
> Chris,
>
> I bumped the maximum number of open file descriptors to 32k, but still
> no luck:
>
> ...
> 060214 062901  reduce 9%
> 060214 062905  reduce 10%
> 060214 062908  reduce 11%
> 060214 062911  reduce 12%
> 060214 062914  reduce 11%
> 060214 062917  reduce 10%
> 060214 062918  reduce 9%
> 060214 062919  reduce 10%
> 060214 062923  reduce 9%
> 060214 062924  reduce 10%
> Exception in thread "main" java.io.IOException: Job failed!
>        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:310)
>        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:329)
>        at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:349)
>
> Exactly the same error messages as before.
> I guess I'll take my chances with the latest revision in trunk and try
> again :-/
>
> --Florent
>
> Chris Schneider wrote:
>
> >Florent,
> >
> >You might want to try increasing the number of open files allowed on your
> >master machine. We've increased this twice now, and each time it solved
> >similar problems. We now have it at 16K. See my other post today (re:
> >Corrupt NDFS?) for more details.
> >
> >Good Luck,
> >
> >- Chris
> >
> >At 11:07 AM -0500 2/10/06, Florent Gluck wrote:
> >
> >
> >>Hi,
> >>
> >>I have 4 boxes (1 master, 3 slaves), about 33GB worth of segment data
> >>and 4.6M fetched urls in my crawldb.  I'm using the mapred code from
> >>trunk  (revision 374061, Wed, 01 Feb 2006).
> >>I was able to generate the indexes from the crawldb and linkdb, but I
> >>started to see this error recently while  running a dedup on my indexes:
> >>
> >>....
> >>060210 061707  reduce 9%
> >>060210 061710  reduce 10%
> >>060210 061713  reduce 11%
> >>060210 061717  reduce 12%
> >>060210 061719  reduce 11%
> >>060210 061723  reduce 10%
> >>060210 061725  reduce 11%
> >>060210 061726  reduce 10%
> >>060210 061729  reduce 11%
> >>060210 061730  reduce 9%
> >>060210 061732  reduce 10%
> >>060210 061736  reduce 11%
> >>060210 061739  reduce 12%
> >>060210 061742  reduce 10%
> >>060210 061743  reduce 9%
> >>060210 061745  reduce 10%
> >>060210 061746  reduce 100%
> >>Exception in thread "main" java.io.IOException: Job failed!
> >> at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:310)
> >> at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:329)
> >> at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:349)
> >>
> >>I can see a lot of these messages in the jobtracker log on the master:
> >>...
> >>060210 061743 Task 'task_r_4t50k4' has been lost.
> >>060210 061743 Task 'task_r_79vn7i' has been lost.
> >>...
> >>
> >>On every single slave, I get this file not found exception in the
> >>tasktracker log:
> >>060210 061749 Server handler 0 on 50040 caught:
> >>java.io.FileNotFoundException: /var/epile/nutch/mapred/local/task_m_273opj/part-4.out
> >>java.io.FileNotFoundException: /var/epile/nutch/mapred/local/task_m_273opj/part-4.out
> >>       at org.apache.nutch.fs.LocalFileSystem.openRaw(LocalFileSystem.java:121)
> >>       at org.apache.nutch.fs.NFSDataInputStream$Checker.<init>(NFSDataInputStream.java:45)
> >>       at org.apache.nutch.fs.NFSDataInputStream.<init>(NFSDataInputStream.java:226)
> >>       at org.apache.nutch.fs.NutchFileSystem.open(NutchFileSystem.java:160)
> >>       at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:93)
> >>       at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:121)
> >>       at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:68)
> >>       at org.apache.nutch.ipc.Server$Handler.run(Server.java:215)
> >>
> >>I used to be able to complete the index dedup successfully when my
> >>segments/crawldb were smaller, but I don't see why this would be related
> >>to the FileNotFoundException.  I'm nowhere near running out of disk space
> >>and my hard disks work properly.
> >>
> >>Has anyone encountered a similar issue or has a clue about what's
> >>happening?
> >>
> >>Thanks,
> >>Florent
> >>
> >>
> >
> >
> >
>
>
>
