Hi Renaud,

Actually I don't have enough memory to do that. My servers have only 1 GB of RAM.
I'm able to crawl a lot of websites without any issue. It's only when I try to
merge two crawls that I get this error. Any ideas?
E

> Hi Emmanuel,
>
> does it work if you try to allocate more memory to Java in the bin/nutch
> script? (using e.g. '-Xms512M -Xmx2048M')
>
> HTH,
> Renaud
>
>
> Emmanuel wrote:
>> Hi Guys,
>>
>> I tried to merge two crawls of about 200,000 fetched pages each and I got
>> the following error:
>>
>> 2007-08-15 09:47:43,472 WARN mapred.TaskTracker - Error running child
>> java.lang.OutOfMemoryError: Java heap space
>>     at java.util.Arrays.copyOf(Arrays.java:2786)
>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>>     at org.apache.nutch.protocol.Content.write(Content.java:163)
>>     at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:100)
>>     at org.apache.nutch.metadata.MetaWrapper.write(MetaWrapper.java:107)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:365)
>>     at org.apache.nutch.segment.SegmentMerger.map(SegmentMerger.java:338)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1707)
>>
>> I used the trunk version on Linux 2.6.22 and Java 1.6.
>>
>> Does it mean anything to you?
>> Any help would be appreciated.
>> Thanks
>> E
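P.S. For anyone else hitting this, here is a rough sketch of the heap tweak Renaud
suggests, assuming your copy of bin/nutch honours the NUTCH_HEAPSIZE environment
variable (check the script before relying on it); the segment paths below are only
placeholders, and the exact mergesegs arguments may differ between Nutch versions:

    # Give the JVM started by bin/nutch a larger heap (value in MB).
    # This assumes the script builds -Xmx from NUTCH_HEAPSIZE; verify in bin/nutch.
    export NUTCH_HEAPSIZE=2048

    # Merge the segments of the two crawls into a new output directory
    # (hypothetical paths -- adjust to the actual crawl layout).
    bin/nutch mergesegs crawl-merged/segments \
        crawl1/segments/SEGMENT_A crawl2/segments/SEGMENT_B

Two caveats: with only 1 GB of physical RAM a 2 GB heap will mostly end up in swap,
and since the trace comes from a TaskTracker child, flags set in bin/nutch may not
reach the map task JVMs at all; on a Hadoop cluster the child heap is normally set
via mapred.child.java.opts (e.g. -Xmx512m) in hadoop-site.xml, if I remember correctly.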
