> >>>The one task crawls about 3% of my topN and stops
> >>>eventually with java.lang.OutOfMemoryError: Java heap space
> >>>errors.
> >>Are you running Fetcher in parsing mode? Try to use the -noParsing
> >>option, and then parse the content in a separate step.
I am now running generate/fetch/parse/updatedb as separate steps. The fetch step still only fetches about 3%-4% of the topN URLs selected by generate, and it logs the same kind of messages as before:

-----
fetch of http://www.example.com/public/page.asp/85491 failed with: java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/16154 failed with: java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/20208 failed with: java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/15411 failed with: java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/178293 failed with: java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/843060 failed with: java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/967264 failed with: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
fetcher caught:java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/97401 failed with: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
fetcher caught:java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/1585146 failed with: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
fetcher caught:java.lang.OutOfMemoryError: Java heap space
fetch of http://www.example.com/public/page.asp/11 failed with: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
fetcher caught:java.lang.OutOfMemoryError: Java heap space
-----

The first few entries are just 'fetch of X failed with: Y'. After a few of these, the pattern changes to a group of three messages per URL: 'fetch of X failed with: java.lang...', then 'java.lang...', then 'fetcher caught: java.lang...'. I am not seeing any errors in the parse step.
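Since every failure above is a heap OutOfMemoryError, one cheap first check is whether the JVM running the fetch really gets the -Xmx1000m limit, and how thin that heap is spread across 24 fetcher threads. The following is a minimal standalone sketch, not Nutch code: the class name HeapCheck and the even-split "per thread" figure are illustrative assumptions (real fetcher threads do not get fixed shares of the heap), and the thread count of 24 is just the value from this setup. Run it with the same JVM options as the fetch task.

```java
import java.util.Locale;

/**
 * Hypothetical diagnostic sketch (not part of Nutch): prints the heap limits
 * the current JVM actually sees, plus a crude "average heap per thread"
 * figure obtained by dividing the maximum heap evenly among N threads.
 */
public class HeapCheck {

    static final long MB = 1024L * 1024L;

    /** Average heap in MB per thread if the max heap were split evenly (an assumption). */
    static double mbPerThread(long maxHeapBytes, int threads) {
        return (double) maxHeapBytes / MB / threads;
    }

    public static void main(String[] args) {
        // Thread count defaults to 24, the fetcher setting described in this post.
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 24;

        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // maximum heap the JVM will try to use (governed by -Xmx)
        long total = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();   // unused part of the reserved heap
        long used = total - free;

        System.out.printf(Locale.ROOT, "max heap:   %d MB%n", max / MB);
        System.out.printf(Locale.ROOT, "used heap:  %d MB%n", used / MB);
        System.out.printf(Locale.ROOT, "threads:    %d%n", threads);
        System.out.printf(Locale.ROOT, "per thread: %.1f MB%n", mbPerThread(max, threads));
    }
}
```

For example, `java -Xmx1000m HeapCheck 24` should report a max heap close to (often slightly under) 1000 MB and roughly 41 MB per thread; if the reported maximum is very different, the -Xmx setting is probably not reaching the JVM you think it is.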
How do I hunt down the Java heap space error further? It only occurs in the fetch step.

Do I have too many threads? I have it set to 24 threads, with a maximum of 32 on a single host. I use the standard memory option on the Java runs: every Java process gets -Xmx1000m. Should this be increased? And how do you deal with slaves that have different amounts of memory? Some of mine have 1.5 GB of RAM and others have 4 GB.

Sorry for all the questions. The fetch issue is the wall I am currently trying to get past. Should this be debugged in the fetch step, or is it possible that generate is only outputting 3%-4% of the topN value?

Thanks in advance for any pointers.

JohnM

--
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
