Hi Guys,
I tried to merge two crawls of about 200,000 fetched pages each and got the
following error:
2007-08-15 09:47:43,472 WARN mapred.TaskTracker - Error running child
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
at org.apache.nutch.protocol.Content.write(Content.java:163)
at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:100)
at org.apache.nutch.metadata.MetaWrapper.write(MetaWrapper.java:107)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:365)
at org.apache.nutch.segment.SegmentMerger.map(SegmentMerger.java:338)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1707)
I used the trunk version on Linux 2.6.22 and Java 1.6.
Does this mean anything to you?
Any help would be appreciated.
Thanks
E