Please ignore my earlier message, I think it is due to some other reason ....
Rgds
Prabhu

On 2/14/06, Raghavendra Prabhu <[EMAIL PROTECTED]> wrote:
>
> Hi Florent
>
> Does the mapreduce go in a loop?
>
> Can you let us know the environment?
>
> Are you running on Windows or Linux?
>
> If on Windows, you should use Cygwin.
>
> Rgds
> Prabhu
>
> On 2/14/06, Florent Gluck <[EMAIL PROTECTED]> wrote:
> >
> > Chris,
> >
> > I bumped the maximum number of open file descriptors to 32k, but still
> > no luck:
> >
> > ...
> > 060214 062901 reduce 9%
> > 060214 062905 reduce 10%
> > 060214 062908 reduce 11%
> > 060214 062911 reduce 12%
> > 060214 062914 reduce 11%
> > 060214 062917 reduce 10%
> > 060214 062918 reduce 9%
> > 060214 062919 reduce 10%
> > 060214 062923 reduce 9%
> > 060214 062924 reduce 10%
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:310)
> >         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:329)
> >         at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:349)
> >
> > Exactly the same error messages as before.
> > I guess I'll take my chances with the latest revision in trunk and try
> > again :-/
> >
> > --Florent
> >
> > Chris Schneider wrote:
> >
> > > Florent,
> > >
> > > You might want to try increasing the number of open files allowed on
> > > your master machine. We've increased this twice now, and each time it
> > > solved similar problems. We now have it at 16K. See my other post today
> > > (re: Corrupt NDFS?) for more details.
> > >
> > > Good Luck,
> > >
> > > - Chris
> > >
> > > At 11:07 AM -0500 2/10/06, Florent Gluck wrote:
> > >
> > >> Hi,
> > >>
> > >> I have 4 boxes (1 master, 3 slaves), about 33GB worth of segment data
> > >> and 4.6M fetched urls in my crawldb. I'm using the mapred code from
> > >> trunk (revision 374061, Wed, 01 Feb 2006).
> > >> I was able to generate the indexes from the crawldb and linkdb, but I
> > >> started to see this error recently while running a dedup on my indexes:
> > >>
> > >> ....
> > >> 060210 061707 reduce 9%
> > >> 060210 061710 reduce 10%
> > >> 060210 061713 reduce 11%
> > >> 060210 061717 reduce 12%
> > >> 060210 061719 reduce 11%
> > >> 060210 061723 reduce 10%
> > >> 060210 061725 reduce 11%
> > >> 060210 061726 reduce 10%
> > >> 060210 061729 reduce 11%
> > >> 060210 061730 reduce 9%
> > >> 060210 061732 reduce 10%
> > >> 060210 061736 reduce 11%
> > >> 060210 061739 reduce 12%
> > >> 060210 061742 reduce 10%
> > >> 060210 061743 reduce 9%
> > >> 060210 061745 reduce 10%
> > >> 060210 061746 reduce 100%
> > >> Exception in thread "main" java.io.IOException: Job failed!
> > >>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:310)
> > >>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:329)
> > >>         at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:349)
> > >>
> > >> I can see a lot of these messages in the jobtracker log on the master:
> > >> ...
> > >> 060210 061743 Task 'task_r_4t50k4' has been lost.
> > >> 060210 061743 Task 'task_r_79vn7i' has been lost.
> > >> ...
> > >>
> > >> On every single slave, I get this file not found exception in the
> > >> tasktracker log:
> > >> 060210 061749 Server handler 0 on 50040 caught:
> > >> java.io.FileNotFoundException:
> > >> /var/epile/nutch/mapred/local/task_m_273opj/part-4.out
> > >> java.io.FileNotFoundException:
> > >> /var/epile/nutch/mapred/local/task_m_273opj/part-4.out
> > >>         at org.apache.nutch.fs.LocalFileSystem.openRaw(LocalFileSystem.java:121)
> > >>         at org.apache.nutch.fs.NFSDataInputStream$Checker.<init>(NFSDataInputStream.java:45)
> > >>         at org.apache.nutch.fs.NFSDataInputStream.<init>(NFSDataInputStream.java:226)
> > >>         at org.apache.nutch.fs.NutchFileSystem.open(NutchFileSystem.java:160)
> > >>         at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:93)
> > >>         at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:121)
> > >>         at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:68)
> > >>         at org.apache.nutch.ipc.Server$Handler.run(Server.java:215)
> > >>
> > >> I used to be able to complete the index deduping successfully when my
> > >> segments/crawldb was smaller, but I don't see why that would be related
> > >> to the FileNotFoundException. I'm far from running out of disk space
> > >> and my hard disks work properly.
> > >>
> > >> Has anyone encountered a similar issue or has a clue about what's
> > >> happening?
> > >>
> > >> Thanks,
> > >> Florent
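P.S. On the open-file-descriptor advice above: after raising the limit (ulimit -n, or
/etc/security/limits.conf), it can be worth confirming that the limit is actually visible
to the JVM that runs the tasktracker, and not just to your login shell. Below is a minimal
diagnostic sketch of my own (the class name FdCheck is just an example; it assumes a
Sun JVM on a Unix-like box, where com.sun.management.UnixOperatingSystemMXBean is
available):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical diagnostic: print the open and maximum file descriptor
// counts as seen by this JVM. Run it as the same user (and via the same
// startup path) as the tasktracker daemon.
public class FdCheck {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
      System.out.println("open file descriptors: " + unix.getOpenFileDescriptorCount());
      System.out.println("max file descriptors:  " + unix.getMaxFileDescriptorCount());
    } else {
      System.out.println("File descriptor counts not exposed on this platform/JVM.");
    }
  }
}

If the reported maximum doesn't match what you configured, the daemon isn't picking up
the new limit (e.g. it was started before the change, or from an init script that doesn't
go through PAM), which could explain why bumping the limit appears to make no difference.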
