hi there, there was a little improvement; at least it's not running out of RAM anymore. but you're right, there does seem to be a side effect.
i am now having what looks like disk issues! i am running on a VPS, so i suspect that might have something to do with it - but what is the cause now? (i have put my current conf, and a guess about the local dirs, below the quoted logs.)

==>>

00:36:28,912 INFO [TaskRunner] Task 'attempt_local_0001_m_000064_0' done.
00:36:29,104 INFO [MapTask] numReduceTasks: 1
00:36:29,104 INFO [MapTask] io.sort.mb = 100
00:36:29,240 INFO [MapTask] data buffer = 79691776/99614720
00:36:29,240 INFO [MapTask] record buffer = 262144/327680
00:36:29,260 INFO [CodecPool] Got brand-new decompressor
00:36:29,264 INFO [MapTask] Starting flush of map output
00:36:29,276 INFO [MapTask] Finished spill 0
00:36:29,280 INFO [TaskRunner] Task:attempt_local_0001_m_000065_0 is done. And is in the process of commiting
00:36:29,280 INFO [LocalJobRunner] file:/home/meda/workspace/web/crawl/segments/20091101171338/parse_text/part-00000/data:0+12655
00:36:29,280 INFO [TaskRunner] Task 'attempt_local_0001_m_000065_0' done.
00:36:38,533 WARN [LocalJobRunner] job_local_0001
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/file.out in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:50)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:150)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)

On Tue, 2009-11-03 at 10:28 -0500, Kalaimathan Mahenthiran wrote:
> if you set mapred.child.java.opts with the additional value
> "-XX:-UseGCOverheadLimit" you can bypass this exception. I don't know
> if it has any side effects as a result of this..
>
> ex.
> -Xmx512m -XX:-UseGCOverheadLimit
>
>
> On Tue, Nov 3, 2009 at 7:50 AM, Fadzi Ushewokunze
> <fa...@butterflycluster.net> wrote:
> > hi,
> >
> > i am running on a single machine with 2G RAM and the java heap space
> > set at 1024m; the segments are quite tiny - less than 100 urls - and
> > during mergeSegments i get the exception below;
> >
> > i have set mapred.child.java.opts=-Xmx512m but there is no change;
> >
> > any suggestions?
> >
> > ====>
> >
> > 2009-11-03 17:58:28,971 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
> > 2009-11-03 17:58:38,448 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
> > 2009-11-03 17:58:57,085 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
> > 2009-11-03 17:59:34,723 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
> > 2009-11-03 18:02:09,660 INFO [org.apache.hadoop.mapred.TaskRunner] Communication exception: java.lang.OutOfMemoryError: Java heap space
> >         at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:327)
> >         at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:494)
> >         at org.apache.hadoop.mapred.Counters.sum(Counters.java:506)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.statusUpdate(LocalJobRunner.java:222)
> >         at org.apache.hadoop.mapred.Task$1.run(Task.java:418)
> >         at java.lang.Thread.run(Thread.java:619)
> >
> > 2009-11-03 18:02:10,376 WARN [org.apache.hadoop.mapred.LocalJobRunner] job_local_0001
> > java.lang.ThreadDeath
> >         at java.lang.Thread.stop(Thread.java:715)
> >         at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
> >         at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
> >         at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
> >         at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)
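
for reference, this is roughly what i have in conf/nutch-site.xml after applying the suggestion - a sketch, the heap value is just what i picked for this box:

  <!-- heap size plus the GC overhead override suggested above -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m -XX:-UseGCOverheadLimit</value>
  </property>

(one thing i noticed: job_local_0001 means this is the LocalJobRunner, where the map/reduce tasks run inside the same JVM as the nutch script itself, so i suspect mapred.child.java.opts is never actually applied in this mode - which might explain why setting it earlier made no difference. raising the heap of the nutch JVM itself, e.g. via NUTCH_HEAPSIZE in bin/nutch, is probably what took effect.)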
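re the DiskErrorException: i am only guessing here, but since LocalDirAllocator says it cannot find the map output "in any of the configured local directories", maybe the local dirs are getting cleaned up (or filling up) on the VPS mid-job. mapred.local.dir defaults to ${hadoop.tmp.dir}/mapred/local, and hadoop.tmp.dir defaults to a directory under /tmp, which some VPS setups purge aggressively. if that is the cause, pinning both to a persistent directory might help - the paths below are made-up examples, point them anywhere with enough free space:

  <!-- hypothetical paths - use a directory the VPS will not clean up -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/meda/hadoop-tmp</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/meda/hadoop-tmp/mapred/local</value>
  </property>

it would also be worth watching df -h while the merge runs, in case the spill files are simply filling the disk.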