Have you set the Java heap options in the 'mapred.child.java.opts' property (in conf/hadoop-site.xml)? On a machine with 1 GB of RAM and 1 GB of swap space, I set it to '-Xms1024m -Xmx2048m'.
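
For reference, here is a minimal sketch of how that entry might look in conf/hadoop-site.xml. The heap values are simply the ones quoted above for a 1 GB machine, not a recommendation for every node, and the file is assumed to use the standard <configuration> wrapper. This property sets the JVM options for the child task processes that the tasktracker spawns. As far as I understand it, the -Xmx1000m passed to the other java processes therefore does not carry over to the fetch tasks; if I remember correctly, the shipped hadoop-default.xml uses a much smaller -Xmx200m for them.

```xml
<?xml version="1.0"?>
<!-- conf/hadoop-site.xml: site-specific overrides of hadoop-default.xml -->
<configuration>
  <property>
    <!-- JVM options for the child processes that run map/reduce tasks -->
    <name>mapred.child.java.opts</name>
    <!-- Values from the message above; adjust to the memory of each slave -->
    <value>-Xms1024m -Xmx2048m</value>
  </property>
</configuration>
```

Since each slave reads its own copy of conf/hadoop-site.xml, one possible way to handle nodes with different amounts of memory is to give the 1.5 GB and 4 GB machines different values here. That is an assumption about your setup rather than something I have tested on a mixed cluster.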

Best,
Siddhartha

On Jan 31, 2008 3:23 AM, John Mendenhall <[EMAIL PROTECTED]> wrote:
> > >>>The one task crawls about 3% of my topN and stops
> > >>>eventually with java.lang.OutOfMemoryError: Java heap space
> > >>>errors.
> > >>Are you running Fetcher in parsing mode? Try to use the -noParsing
> > >>option, and then parse the content in a separate step.
>
> I am now running generate/fetch/parse/updatedb.
> The fetch process still only gets about 3%-4% of
> the URLs in the topN of the generate.
> The fetch process logs similar messages as before:
>
> -----
> fetch of http://www.example.com/public/page.asp/85491 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/16154 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/20208 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/15411 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/178293 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/843060 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/967264 failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> fetcher caught:java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/97401 failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> fetcher caught:java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/1585146 failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> fetcher caught:java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/11 failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> fetcher caught:java.lang.OutOfMemoryError: Java heap space
> -----
>
> The first few entries are just fetch of X failed with: Y
> After a few of these, it changes to a set of 3 error messages
> like 'fetcher caught: java.lang... ; java.lang... ; fetch of X
> failed with: java.lang...'.
>
> I am not seeing any errors in the parse process.
>
> How do I hunt down the java heap space error
> further? This only occurs in the fetch process.
> Do I have too many threads?
>
> I have it set to 24 threads, 32 max on a single
> host.
>
> I have the std memory option on the java runs.
> Every java process has the -Xmx1000m option.
> Should this be increased?
>
> How do you deal with slaves that have different
> amounts of memory. I have some with 1.5gb ram,
> and others with 4gb ram.
>
> Sorry for all the questions. The fetch issue is
> my current wall I am trying to overcome.
>
> Should this be debugged in the fetch process or
> is it possible the generate process is only
> outputting 3%-4% of the topN value?
>
> Thanks in advance for any pointers.
>
> JohnM
>
> --
> john mendenhall
> [EMAIL PROTECTED]
> surf utopia
> internet services
>

--
http://sids.in
"If you are not having fun, you are not doing it right."
