Hi,

Where do we change the number of open files? Where do we set it on the
master system?
Rgds,
Prabhu

On 2/12/06, Chris Schneider <[EMAIL PROTECTED]> wrote:
>
> Florent,
>
> You might want to try increasing the number of open files allowed on your
> master machine. We've increased this twice now, and each time it solved
> similar problems. We now have it at 16K. See my other post today (re:
> Corrupt NDFS?) for more details.
>
> Good Luck,
>
> - Chris
>
> At 11:07 AM -0500 2/10/06, Florent Gluck wrote:
> >Hi,
> >
> >I have 4 boxes (1 master, 3 slaves), about 33GB worth of segment data
> >and 4.6M fetched urls in my crawldb. I'm using the mapred code from
> >trunk (revision 374061, Wed, 01 Feb 2006).
> >I was able to generate the indexes from the crawldb and linkdb, but I
> >started to see this error recently while running a dedup on my indexes:
> >
> >...
> >060210 061707 reduce 9%
> >060210 061710 reduce 10%
> >060210 061713 reduce 11%
> >060210 061717 reduce 12%
> >060210 061719 reduce 11%
> >060210 061723 reduce 10%
> >060210 061725 reduce 11%
> >060210 061726 reduce 10%
> >060210 061729 reduce 11%
> >060210 061730 reduce 9%
> >060210 061732 reduce 10%
> >060210 061736 reduce 11%
> >060210 061739 reduce 12%
> >060210 061742 reduce 10%
> >060210 061743 reduce 9%
> >060210 061745 reduce 10%
> >060210 061746 reduce 100%
> >Exception in thread "main" java.io.IOException: Job failed!
> >        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:310)
> >        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:329)
> >        at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:349)
> >
> >I can see a lot of these messages in the jobtracker log on the master:
> >...
> >060210 061743 Task 'task_r_4t50k4' has been lost.
> >060210 061743 Task 'task_r_79vn7i' has been lost.
> >...
> >
> >On every single slave, I get this FileNotFoundException in the
> >tasktracker log:
> >060210 061749 Server handler 0 on 50040 caught:
> >java.io.FileNotFoundException:
> >/var/epile/nutch/mapred/local/task_m_273opj/part-4.out
> >java.io.FileNotFoundException:
> >/var/epile/nutch/mapred/local/task_m_273opj/part-4.out
> >        at org.apache.nutch.fs.LocalFileSystem.openRaw(LocalFileSystem.java:121)
> >        at org.apache.nutch.fs.NFSDataInputStream$Checker.<init>(NFSDataInputStream.java:45)
> >        at org.apache.nutch.fs.NFSDataInputStream.<init>(NFSDataInputStream.java:226)
> >        at org.apache.nutch.fs.NutchFileSystem.open(NutchFileSystem.java:160)
> >        at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:93)
> >        at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:121)
> >        at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:68)
> >        at org.apache.nutch.ipc.Server$Handler.run(Server.java:215)
> >
> >I used to be able to complete the index dedupping successfully when my
> >segments/crawldb were smaller, but I don't see why that would be related
> >to the FileNotFoundException. I'm nowhere near running out of disk space,
> >and my hard disks are working properly.
> >
> >Has anyone encountered a similar issue, or does anyone have a clue about
> >what's happening?
> >
> >Thanks,
> >Florent
>
> --
> ------------------------
> Chris Schneider
> TransPac Software, Inc.
> [EMAIL PROTECTED]
> ------------------------
>
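
To answer the question at the top: on Linux the per-process open-file limit
is usually raised in two places. A minimal sketch follows -- the 16384 value
matches Chris's 16K, while the "nutch" user name is an assumed example, not
something given in this thread. It would need to be applied on the master
and on every slave:

    # Check the current limit in the shell that launches the daemons
    ulimit -n

    # Raise it for the current session before starting Nutch
    # (only root can raise it beyond the existing hard limit)
    ulimit -n 16384

    # To make it permanent, add these lines to /etc/security/limits.conf,
    # assuming the Nutch daemons run as a "nutch" user (requires the
    # pam_limits module to be enabled):
    nutch  soft  nofile  16384
    nutch  hard  nofile  16384

After editing limits.conf, log the user out and back in, then restart the
jobtracker and tasktracker daemons so they pick up the new limit. If the
system-wide ceiling is also low, it can be raised as root with
sysctl -w fs.file-max=<value>.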