Please ignore my earlier message, I think it is due to some other reason ....
Rgds
Prabhu

On 2/14/06, Raghavendra Prabhu <[EMAIL PROTECTED]> wrote:
>
> Hi Florent
>
> Does the mapreduce go in a loop?
>
> Can you let us know the environment?
>
> Are you running on Windows or Linux?
>
> If on Windows, you should use Cygwin.
>
> Rgds
> Prabhu
>
> On 2/14/06, Florent Gluck <[EMAIL PROTECTED]> wrote:
> >
> > Chris,
> >
> > I bumped the maximum number of open file descriptors to 32k, but still
> > no luck:
> >
> > ...
> > 060214 062901 reduce 9%
> > 060214 062905 reduce 10%
> > 060214 062908 reduce 11%
> > 060214 062911 reduce 12%
> > 060214 062914 reduce 11%
> > 060214 062917 reduce 10%
> > 060214 062918 reduce 9%
> > 060214 062919 reduce 10%
> > 060214 062923 reduce 9%
> > 060214 062924 reduce 10%
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:310)
> >         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:329)
> >         at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:349)
> >
> > Exactly the same error messages as before.
> > I guess I'll take my chances with the latest revision in trunk and try
> > again :-/
> >
> > --Florent
> >
> > Chris Schneider wrote:
> >
> > > Florent,
> > >
> > > You might want to try increasing the number of open files allowed on
> > > your master machine. We've increased this twice now, and each time it
> > > solved similar problems. We now have it at 16K. See my other post today
> > > (re: Corrupt NDFS?) for more details.
> > >
> > > Good Luck,
> > >
> > > - Chris
> > >
> > > At 11:07 AM -0500 2/10/06, Florent Gluck wrote:
> > >
> > >> Hi,
> > >>
> > >> I have 4 boxes (1 master, 3 slaves), about 33GB worth of segment data
> > >> and 4.6M fetched urls in my crawldb. I'm using the mapred code from
> > >> trunk (revision 374061, Wed, 01 Feb 2006).
> > >> I was able to generate the indexes from the crawldb and linkdb, but I
> > >> started to see this error recently while running a dedup on my indexes:
> > >>
> > >> ....
> > >> 060210 061707 reduce 9%
> > >> 060210 061710 reduce 10%
> > >> 060210 061713 reduce 11%
> > >> 060210 061717 reduce 12%
> > >> 060210 061719 reduce 11%
> > >> 060210 061723 reduce 10%
> > >> 060210 061725 reduce 11%
> > >> 060210 061726 reduce 10%
> > >> 060210 061729 reduce 11%
> > >> 060210 061730 reduce 9%
> > >> 060210 061732 reduce 10%
> > >> 060210 061736 reduce 11%
> > >> 060210 061739 reduce 12%
> > >> 060210 061742 reduce 10%
> > >> 060210 061743 reduce 9%
> > >> 060210 061745 reduce 10%
> > >> 060210 061746 reduce 100%
> > >> Exception in thread "main" java.io.IOException: Job failed!
> > >>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:310)
> > >>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:329)
> > >>         at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:349)
> > >>
> > >> I can see a lot of these messages in the jobtracker log on the master:
> > >> ...
> > >> 060210 061743 Task 'task_r_4t50k4' has been lost.
> > >> 060210 061743 Task 'task_r_79vn7i' has been lost.
> > >> ...
> > >>
> > >> On every single slave, I get this file not found exception in the
> > >> tasktracker log:
> > >> 060210 061749 Server handler 0 on 50040 caught:
> > >> java.io.FileNotFoundException:
> > >> /var/epile/nutch/mapred/local/task_m_273opj/part-4.out
> > >> java.io.FileNotFoundException:
> > >> /var/epile/nutch/mapred/local/task_m_273opj/part-4.out
> > >>         at org.apache.nutch.fs.LocalFileSystem.openRaw(LocalFileSystem.java:121)
> > >>         at org.apache.nutch.fs.NFSDataInputStream$Checker.<init>(NFSDataInputStream.java:45)
> > >>         at org.apache.nutch.fs.NFSDataInputStream.<init>(NFSDataInputStream.java:226)
> > >>         at org.apache.nutch.fs.NutchFileSystem.open(NutchFileSystem.java:160)
> > >>         at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:93)
> > >>         at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:121)
> > >>         at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:68)
> > >>         at org.apache.nutch.ipc.Server$Handler.run(Server.java:215)
> > >>
> > >> I used to be able to complete the index deduping successfully when my
> > >> segments/crawldb was smaller, but I don't see why that would be related
> > >> to the FileNotFoundException. I'm far from running out of disk space
> > >> and my hard disks work properly.
> > >>
> > >> Has anyone encountered a similar issue or has a clue about what's
> > >> happening?
> > >>
> > >> Thanks,
> > >> Florent
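P.S. On the open-file-descriptor advice above: after raising the limit (ulimit -n, or
/etc/security/limits.conf), it can be worth confirming that the limit is actually visible
to the JVM that runs the tasktracker, and not just to your login shell. Below is a minimal
diagnostic sketch of my own (the class name FdCheck is just an example; it assumes a
Sun JVM on a Unix-like box, where com.sun.management.UnixOperatingSystemMXBean is
available):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical diagnostic: print the open and maximum file descriptor
// counts as seen by this JVM. Run it as the same user (and via the same
// startup path) as the tasktracker daemon.
public class FdCheck {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
      System.out.println("open file descriptors: " + unix.getOpenFileDescriptorCount());
      System.out.println("max file descriptors:  " + unix.getMaxFileDescriptorCount());
    } else {
      System.out.println("File descriptor counts not exposed on this platform/JVM.");
    }
  }
}

If the reported maximum doesn't match what you configured, the daemon isn't picking up
the new limit (e.g. it was started before the change, or from an init script that doesn't
go through PAM), which could explain why bumping the limit appears to make no difference.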
