Any other thoughts on the out-of-memory error in linkdb? Thanks.

Sathyam Y <[EMAIL PROTECTED]> wrote:
Around 200,000 pages when it failed.
Dennis Kubes wrote:
How many pages are in your database?

Dennis Kubes

Sathyam Y wrote:
> I am getting the same out-of-memory exception in linkdb. I have a
> configuration of 4 machines running Nutch 0.9 trunk.
>
> Please let me know if you found a way to resolve this issue. All tasks
> (master and slaves) are running with the -Xmx1000m option and I am
> reluctant to increase the heap size further.
>
> Thanks.
>
> Dennis Kubes wrote:
> Try setting your child opts to -Xmx512M or higher. This config variable
> is found in hadoop-default.xml. AFAIK there is no way to change the
> memory options for a single stage.
>
> Dennis Kubes
>
> Daniel Clark wrote:
>> I received the following error during the linkdb stage of indexing. Has
>> anyone encountered this before? Is there a way of increasing memory for
>> this stage in a config file? Is there a known linkdb memory leak problem?
>>
>> 2007-10-09 10:56:37,787 INFO crawl.LinkDb - LinkDb: starting
>> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
>> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: URL normalize: true
>> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: URL filter: true
>> 2007-10-09 10:56:37,886 INFO crawl.LinkDb - LinkDb: adding segment:
>>   /user/daclark/crawl/segments/20071008185033
>> 2007-10-09 10:56:39,977 WARN util.NativeCodeLoader - Unable to load
>>   native-hadoop library for your platform... using builtin-java classes
>>   where applicable
>> 2007-10-09 10:56:42,495 WARN util.NativeCodeLoader - Unable to load
>>   native-hadoop library for your platform... using builtin-java classes
>>   where applicable
>> 2007-10-09 10:56:51,415 WARN mapred.TaskTracker - Error running child
>> java.lang.OutOfMemoryError: Java heap space
>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:95)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>     at org.apache.hadoop.io.Text.writeString(Text.java:399)
>>     at org.apache.nutch.crawl.Inlink.write(Inlink.java:48)
>>     at org.apache.nutch.crawl.Inlinks.write(Inlinks.java:54)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
>>     at org.apache.nutch.crawl.LinkDb.map(LinkDb.java:167)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> 2007-10-09 10:57:40,654 FATAL crawl.LinkDb - LinkDb: java.io.IOException:
>>   Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>>     at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:377)
>>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>>     at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:333)
>>
>> ~~~~~~~~~~~~~~~~~~~~~
>> Daniel Clark, President
>> DAC Systems, Inc.
>> (703) 403-0340
>> ~~~~~~~~~~~~~~~~~~~~~
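For reference, the child-heap change Dennis describes is not edited in hadoop-default.xml itself but overridden in hadoop-site.xml via mapred.child.java.opts. A minimal sketch of the two overrides discussed in this thread, assuming a Hadoop 0.x-era hadoop-site.xml and, as an additional (hedged) suggestion not confirmed in this thread, Nutch's db.max.inlinks property in nutch-site.xml, which caps how many inlinks are kept per URL and so bounds the size of the Inlinks record that overflowed in the trace above:

```xml
<!-- hadoop-site.xml: raise the heap for every map/reduce child JVM.
     Applies to all stages; per-stage limits are not supported here. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1000m</value>
</property>

<!-- nutch-site.xml: cap inlinks stored per URL in the linkdb
     (assumes a Nutch version that ships db.max.inlinks in
     nutch-default.xml; check your defaults file first). -->
<property>
  <name>db.max.inlinks</name>
  <value>10000</value>
</property>
```

Lowering db.max.inlinks may help when a few very popular URLs accumulate huge inlink lists that no reasonable child heap can hold.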
