On Thu, Sep 18, 2008 at 4:19 PM, Edward Quick <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I'm getting java.lang.OutOfMemoryError: Java heap space errors when running
> nutch in a hadoop cluster.
> I have doubled the heap by setting export HADOOP_HEAPSIZE=2048 in
> hadoop-env.sh but this doesn't seem to make a difference.
>
> I'm new to hadoop so I'd appreciate any help.
>
Are you parsing during fetching? If so, try disabling that and running
parsing as a separate job. At the least, you won't lose the results of
fetching :)

> Thanks,
>
> Ed.
>
> 2008-09-18 14:13:30,274 INFO fetcher.Fetcher - fetch of
> http://somedomain.com/general/aptrix/aptprop.nsf/Content/CityFlyer+Cabin+Crew+Home%5CLibrary?OpenDocument
> failed with: java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:30,276 ERROR httpclient.Http - java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:30,635 INFO fetcher.Fetcher - fetch of
> http://somedomain.com/general/aptrix/aptcsops.nsf/Content/Inflight+Services+Home%5CPeople+%26+Training%5CCrewcare?OpenDocument?OpenDocument
> failed with: java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:30,635 INFO fetcher.Fetcher - fetch of
> http://somedomain.com/general/aptrix/apteba.nsf/Content/Commercial+Chat+-+February+7+2008?OpenDocument
> failed with: java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:30,635 ERROR httpclient.Http - java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:30,636 WARN mapred.TaskTracker - Error running child
> java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - java.io.IOException: Stream closed
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - java.io.IOException: Stream closed
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1378)
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1337)
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at java.io.DataInputStream.readInt(DataInputStream.java:353)
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1847)
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1877)
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1378)
> 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1782)
> 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1337)
> 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1828)
> 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at java.io.DataInputStream.readInt(DataInputStream.java:353)
> 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1847)
> 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:31)
> 2008-09-18 14:13:30,638 ERROR httpclient.Http - java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:30,638 INFO fetcher.Fetcher - fetch of
> http://somedomain.com/general/aptrix/aptrix.nsf/Content/Corporate+Principles?OpenDocument
> failed with: java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:31,048 FATAL fetcher.Fetcher - java.io.IOException: Stream closed
> 2008-09-18 14:13:31,048 FATAL fetcher.Fetcher - at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1378)
> 2008-09-18 14:13:31,048 FATAL fetcher.Fetcher - at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1337)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at java.io.DataInputStream.readInt(DataInputStream.java:353)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1847)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1877)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1782)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1828)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:31)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:127)
> 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - fetcher caught:java.io.IOException: Stream closed
> 2008-09-18 14:13:31,050 FATAL util.LogUtil - Cannot log with method [public abstract void org.apache.commons.logging.Log.fatal(java.lang.Object)]
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:585)
>         at org.apache.nutch.util.LogUtil$1.flush(LogUtil.java:103)
>         at java.io.PrintStream.write(PrintStream.java:414)
>         at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
>         at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
>         at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:115)
>         at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:169)
>         at java.io.PrintStream.newLine(PrintStream.java:478)
>         at java.io.PrintStream.println(PrintStream.java:740)
>         at java.lang.Throwable.printStackTrace(Throwable.java:465)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:132)
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 2008-09-18 14:13:31,050 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1782)
> 2008-09-18 14:13:31,050 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1828)
> 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:31)
> 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
> 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:127)
> 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - fetcher caught:java.io.IOException: Stream closed
> 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
> 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:127)
> 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - fetcher caught:java.io.IOException: Stream closed

--
Doğacan Güney
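For anyone hitting the same thing: the advice above can be sketched as two config
overrides. This is only a sketch for a Nutch/Hadoop 0.x-era setup, not tested against
this exact cluster, and the -Xmx value is an example. `fetcher.parse` (in
nutch-site.xml) turns off parsing inside the fetcher so parsing can run as its own
job afterwards; `mapred.child.java.opts` (in hadoop-site.xml) matters because
HADOOP_HEAPSIZE in hadoop-env.sh only sizes the Hadoop daemons, while the fetcher
actually runs in map/reduce child JVMs whose heap comes from this property.

```xml
<!-- nutch-site.xml: don't parse while fetching;
     run parsing afterwards as a separate job on the segment -->
<property>
  <name>fetcher.parse</name>
  <value>false</value>
</property>

<!-- hadoop-site.xml: heap for the map/reduce child JVMs (the fetcher tasks).
     HADOOP_HEAPSIZE only affects the namenode/datanode/tracker daemons. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

With fetcher.parse=false, a parse failure (including an OOM) no longer throws away
the fetched content, since fetching has already been committed to the segment before
the parse job starts.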
