I received the following error during the linkdb stage of indexing.  Has
anyone encountered this before?  Is there a way to increase the memory
available to this stage in the config file?  Is there a known memory leak
in linkdb?

 

2007-10-09 10:56:37,787 INFO  crawl.LinkDb - LinkDb: starting
2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: URL normalize: true
2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: URL filter: true
2007-10-09 10:56:37,886 INFO  crawl.LinkDb - LinkDb: adding segment: /user/daclark/crawl/segments/20071008185033
2007-10-09 10:56:39,977 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2007-10-09 10:56:42,495 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2007-10-09 10:56:51,415 WARN  mapred.TaskTracker - Error running child
java.lang.OutOfMemoryError: Java heap space
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:95)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.writeString(Text.java:399)
        at org.apache.nutch.crawl.Inlink.write(Inlink.java:48)
        at org.apache.nutch.crawl.Inlinks.write(Inlinks.java:54)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
        at org.apache.nutch.crawl.LinkDb.map(LinkDb.java:167)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)

2007-10-09 10:57:40,654 FATAL crawl.LinkDb - LinkDb: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
        at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:377)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:333)

~~~~~~~~~~~~~~~~~~~~~

Daniel Clark, President

DAC Systems, Inc.

(703) 403-0340

~~~~~~~~~~~~~~~~~~~~~

 
