Any other thoughts on the out-of-memory error in linkdb?
   
  Thanks.
  

Sathyam Y <[EMAIL PROTECTED]> wrote:
  Around 200,000 pages when it failed. 

Dennis Kubes wrote: How many pages are in your database?

Dennis Kubes

Sathyam Y wrote:
> I am getting the same out-of-memory exception in linkdb. I have a 
> configuration of 4 machines running Nutch 0.9 trunk.
> 
> Please let me know if you have found a way to resolve this issue. All tasks 
> (master and slaves) are running with the -Xmx1000m option, and I am reluctant 
> to increase the heap size further.
> 
> Thanks.
> 
> Dennis Kubes wrote:
> Try setting your child opts to -Xmx512M or higher. This config variable 
> is found in hadoop-default.xml. AFAIK there is no way to change the 
> memory options for a single stage.
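> For reference, a sketch of that override. The property name below, 
> mapred.child.java.opts, is the standard Hadoop setting for per-task child 
> JVM flags; the usual practice is to put the override in hadoop-site.xml 
> rather than editing hadoop-default.xml in place:
> 
> ```xml
> <!-- hadoop-site.xml: heap for each map/reduce child JVM -->
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx512m</value>
> </property>
> ```
> 
> Note this applies per child task, so each concurrent map/reduce slot on a 
> node can claim that much heap at once — budget node RAM accordingly.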
> 
> Dennis Kubes
> 
> Daniel Clark wrote:
>> I received the following error during the linkdb stage of indexing. Has
>> anyone encountered this before? Is there a way of increasing memory for
>> this stage in the config file? Is there a known linkdb memory-leak problem?
>>
>>
>>
>> 2007-10-09 10:56:37,787 INFO crawl.LinkDb - LinkDb: starting
>> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
>> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: URL normalize: true
>> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: URL filter: true
>> 2007-10-09 10:56:37,886 INFO crawl.LinkDb - LinkDb: adding segment:
>> /user/daclark/crawl/segments/20071008185033
>> 2007-10-09 10:56:39,977 WARN util.NativeCodeLoader - Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 2007-10-09 10:56:42,495 WARN util.NativeCodeLoader - Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 2007-10-09 10:56:51,415 WARN mapred.TaskTracker - Error running child
>> java.lang.OutOfMemoryError: Java heap space
>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:95)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>     at org.apache.hadoop.io.Text.writeString(Text.java:399)
>>     at org.apache.nutch.crawl.Inlink.write(Inlink.java:48)
>>     at org.apache.nutch.crawl.Inlinks.write(Inlinks.java:54)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
>>     at org.apache.nutch.crawl.LinkDb.map(LinkDb.java:167)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> 2007-10-09 10:57:40,654 FATAL crawl.LinkDb - LinkDb: java.io.IOException:
>> Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>>     at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:377)
>>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>>     at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:333)
>>
>>
>> ~~~~~~~~~~~~~~~~~~~~~
>>
>> Daniel Clark, President
>>
>> DAC Systems, Inc.
>>
>> (703) 403-0340
>>
>> ~~~~~~~~~~~~~~~~~~~~~
>>
> 
> 
> 



