2008/9/18 Edward Quick <[EMAIL PROTECTED]>:
>
>>
>> On Thu, Sep 18, 2008 at 4:19 PM, Edward Quick <[EMAIL PROTECTED]> wrote:
>> >
>> > Hi,
>> >
>> > I'm getting java.lang.OutOfMemoryError: Java heap space errors when 
>> > running nutch in a hadoop cluster.
>> > I have doubled the heap by setting export HADOOP_HEAPSIZE=2048 in 
>> > hadoop-env.sh but this doesn't seem to make a difference.
>> >
>> > I'm new to hadoop so I'd appreciate any help.
>> >
>>
>> Are you parsing during fetching? If so try disabling that and run
>> parsing as a separate job. At least, you
>> won't lose the results of fetching :)
>
>
> The threads in nutch-site.xml were set too high (at 50) so I put those down 
> to 10 and it seems ok now.
>
> How do you run fetching and parsing separately? Does that use up more space?
>
>

No, it doesn't use more space, but you need to run two jobs, so it may
take more time. Just pass the -noParsing switch to the fetcher, i.e.

bin/nutch fetch .... -noParsing
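
To spell out the two-job sequence, here is a rough sketch rather than a
tested recipe. It assumes the fetcher takes the segment directory as its
first argument and that the separate parsing job is started with
"bin/nutch parse" on that same segment. The function name
fetch_then_parse is made up for illustration, and the segment path is
whatever your own generate step produced.

```shell
# Sketch only: fetch a segment without parsing, then parse it as a
# second job. fetch_then_parse is a hypothetical helper name.
#   $1 = nutch launcher (assumed to be bin/nutch in a standard install)
#   $2 = segment directory created by your generate step
fetch_then_parse() {
    # Job 1: fetch only. -noParsing keeps parsing out of the fetcher.
    # Stop here if the fetch fails, so a broken segment is never parsed.
    "$1" fetch "$2" -noParsing || return 1

    # Job 2: parse the content that job 1 stored in the segment.
    "$1" parse "$2"
}
```

You would call it as "fetch_then_parse bin/nutch <your segment dir>". If
the parse step dies, you can rerun just that step, because the fetched
content is already stored in the segment.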

>
> Thanks for your help.
>
>
>
> Ed.
>
>>
>> > Thanks,
>> >
>> > Ed.
>> >
>> >
>> > 2008-09-18 14:13:30,274 INFO  fetcher.Fetcher - fetch of 
>> > http://somedomain.com/general/aptrix/aptprop.nsf/Content/CityFlyer+Cabin+Crew+Home%5CLibrary?OpenDocument
>> >  failed with: java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:30,276 ERROR httpclient.Http - 
>> > java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:30,635 INFO  fetcher.Fetcher - fetch of 
>> > http://somedomain.com/general/aptrix/aptcsops.nsf/Content/Inflight+Services+Home%5CPeople+%26+Training%5CCrewcare?OpenDocument?OpenDocument
>> >  failed with: java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:30,635 INFO  fetcher.Fetcher - fetch of 
>> > http://somedomain.com/general/aptrix/apteba.nsf/Content/Commercial+Chat+-+February+7+2008?OpenDocument
>> >  failed with: java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:30,635 ERROR httpclient.Http - 
>> > java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:30,636 WARN  mapred.TaskTracker - Error running child
>> > java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - java.io.IOException: 
>> > Stream closed
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - java.io.IOException: 
>> > Stream closed
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1378)
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1337)
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at 
>> > java.io.DataInputStream.readInt(DataInputStream.java:353)
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1847)
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1877)
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1378)
>> > 2008-09-18 14:13:30,637 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1782)
>> > 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1337)
>> > 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1828)
>> > 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at 
>> > java.io.DataInputStream.readInt(DataInputStream.java:353)
>> > 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>> > 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1847)
>> > 2008-09-18 14:13:30,638 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:31)
>> > 2008-09-18 14:13:30,638 ERROR httpclient.Http - 
>> > java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:30,638 INFO  fetcher.Fetcher - fetch of 
>> > http://somedomain.com/general/aptrix/aptrix.nsf/Content/Corporate+Principles?OpenDocument
>> >  failed with: java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:31,048 FATAL fetcher.Fetcher - java.io.IOException: 
>> > Stream closed
>> > 2008-09-18 14:13:31,048 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1378)
>> > 2008-09-18 14:13:31,048 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1337)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > java.io.DataInputStream.readInt(DataInputStream.java:353)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1847)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1877)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1782)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1828)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:31)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - at 
>> > org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:127)
>> > 2008-09-18 14:13:31,049 FATAL fetcher.Fetcher - fetcher 
>> > caught:java.io.IOException: Stream closed
>> > 2008-09-18 14:13:31,050 FATAL util.LogUtil - Cannot log with method 
>> > [public abstract void 
>> > org.apache.commons.logging.Log.fatal(java.lang.Object)]
>> > java.lang.reflect.InvocationTargetException
>> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >        at 
>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >        at 
>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >        at java.lang.reflect.Method.invoke(Method.java:585)
>> >        at org.apache.nutch.util.LogUtil$1.flush(LogUtil.java:103)
>> >        at java.io.PrintStream.write(PrintStream.java:414)
>> >        at 
>> > sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
>> >        at 
>> > sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
>> >        at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:115)
>> >        at 
>> > java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:169)
>> >        at java.io.PrintStream.newLine(PrintStream.java:478)
>> >        at java.io.PrintStream.println(PrintStream.java:740)
>> >        at java.lang.Throwable.printStackTrace(Throwable.java:465)
>> >        at 
>> > org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:132)
>> > Caused by: java.lang.OutOfMemoryError: Java heap space
>> > 2008-09-18 14:13:31,050 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1782)
>> > 2008-09-18 14:13:31,050 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1828)
>> > 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>> > 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:31)
>> > 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
>> > 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at 
>> > org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:127)
>> > 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - fetcher 
>> > caught:java.io.IOException: Stream closed
>> > 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at 
>> > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
>> > 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - at 
>> > org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:127)
>> > 2008-09-18 14:13:31,051 FATAL fetcher.Fetcher - fetcher 
>> > caught:java.io.IOException: Stream closed
>> >
>>
>>
>>
>> --
>> Doğacan Güney
>



-- 
Doğacan Güney
