Do you have the Java heap space options set in the 'mapred.child.java.opts'
property (in conf/hadoop-site.xml)? For a machine with 1 GB of RAM and 1 GB of
swap space, I set this to '-Xms1024m -Xmx2048m'.
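
As far as I know, the map/reduce child task JVMs take their heap settings from
this property rather than from the -Xmx option on the main java processes, so
it is worth checking. A rough sketch of the entry in conf/hadoop-site.xml is
below. The values are the ones from my machine, not a recommendation; size
them to the memory actually available on each slave.

```xml
<configuration>
  <!-- JVM options passed to each child task process.
       Values are from my 1 GB RAM + 1 GB swap machine; adjust per node. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xms1024m -Xmx2048m</value>
  </property>
</configuration>
```

Since each slave reads its own copy of conf/hadoop-site.xml, I would assume a
1.5 GB node and a 4 GB node can carry different values here, but I have not
tested a mixed setup myself.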

Best,
Siddhartha

On Jan 31, 2008 3:23 AM, John Mendenhall <[EMAIL PROTECTED]> wrote:

> > >>>The one task crawls about 3% of my topN and stops
> > >>>eventually with java.lang.OutOfMemoryError: Java heap space
> > >>>errors.
> > >>Are you running Fetcher in parsing mode? Try to use the -noParsing
> > >>option, and then parse the content in a separate step.
>
> I am now running generate/fetch/parse/updatedb.
> The fetch process still only gets about 3%-4% of
> the URLs in the topN of the generate.
> The fetch process logs similar messages as before:
>
> -----
> fetch of http://www.example.com/public/page.asp/85491 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/16154 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/20208 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/15411 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/178293 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/843060 failed with:
> java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/967264 failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> fetcher caught:java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/97401 failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> fetcher caught:java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/1585146 failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> fetcher caught:java.lang.OutOfMemoryError: Java heap space
> fetch of http://www.example.com/public/page.asp/11 failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> fetcher caught:java.lang.OutOfMemoryError: Java heap space
> -----
>
> The first few entries are just 'fetch of X failed with: Y'.
> After a few of these, it changes to sets of three error messages
> like 'fetcher caught: java.lang... ; java.lang... ; fetch of X
> failed with: java.lang...'.
>
> I am not seeing any errors in the parse process.
>
> How do I hunt down the java heap space error
> further?  This only occurs in the fetch process.
> Do I have too many threads?
>
> I have it set to 24 threads, 32 max on a single
> host.
>
> I have the standard memory option on the Java runs.
> Every Java process has the -Xmx1000m option.
> Should this be increased?
>
> How do you deal with slaves that have different
> amounts of memory?  I have some with 1.5 GB of RAM,
> and others with 4 GB of RAM.
>
> Sorry for all the questions.  The fetch issue is
> my current wall I am trying to overcome.
>
> Should this be debugged in the fetch process or
> is it possible the generate process is only
> outputting 3%-4% of the topN value?
>
> Thanks in advance for any pointers.
>
> JohnM
>
> --
> john mendenhall
> [EMAIL PROTECTED]
> surf utopia
> internet services
>



-- 
http://sids.in
"If you are not having fun, you are not doing it right."
