Dear developers,

I have installed Nutch on a Linux enterprise server with 8 GB of RAM.
The Java VM is given 4 GB of heap when Nutch starts.
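For reference, the heap is set in the bin/nutch start script roughly like this (variable name from the standard script, value from memory):

    JAVA_HEAP_MAX=-Xmx4096m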

I have configured the crawler to fetch PDF documents (about 3,000) on our intranet.
After roughly 100 PDFs, there is always an OutOfMemoryError.
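The PDF parse plugin is enabled in plugin.includes, and my conf/nutch-site.xml overrides roughly the following (the value is from memory, so treat it as approximate):

    <!-- approximate value, copied from memory -->
    <property>
      <name>http.content.limit</name>
      <!-- -1 = no truncation, so whole PDFs are fetched -->
      <value>-1</value>
    </property>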

I tried the following workaround.

In index.html, I generate links to a set of HTML pages (link1.html, 
link2.html, etc.). 
Each linkN.html contains links to 20 PDFs. But this workaround also fails.
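For concreteness, I generate these pages with a small throwaway program along these lines (file names and the PDF URL pattern are only illustrative, not my real paths):

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;

    /** Throwaway generator: index.html -> linkN.html pages, each linking to 20 PDFs. */
    public class LinkPageGenerator {

        static final int PDFS_PER_PAGE = 20;   // PDFs per link page
        static final int TOTAL_PDFS = 3000;    // approximate number of documents

        public static void main(String[] args) throws IOException {
            int pages = (TOTAL_PDFS + PDFS_PER_PAGE - 1) / PDFS_PER_PAGE;

            // index.html only points to the link pages
            try (PrintWriter index = new PrintWriter(new FileWriter("index.html"))) {
                index.println("<html><body>");
                for (int p = 1; p <= pages; p++) {
                    index.printf("<a href=\"link%d.html\">link%d</a><br>%n", p, p);
                }
                index.println("</body></html>");
            }

            // each linkN.html points to 20 PDFs
            for (int p = 1; p <= pages; p++) {
                try (PrintWriter page = new PrintWriter(new FileWriter("link" + p + ".html"))) {
                    page.println("<html><body>");
                    for (int i = 0; i < PDFS_PER_PAGE; i++) {
                        int doc = (p - 1) * PDFS_PER_PAGE + i + 1;
                        if (doc > TOTAL_PDFS) break;
                        page.printf("<a href=\"pdfs/doc%d.pdf\">doc%d.pdf</a><br>%n", doc, doc);
                    }
                    page.println("</body></html>");
                }
            }
        }
    }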

Can someone give me some advice or point me to something to read?


Best regards,

Dulip Withanage, M.Sc 


Cluster of Excellence 
Karl Jaspers Centre
Heidelberg

Fax: +49-6221 - 54 4012
e-mail: withan...@asia-europe.uni-heidelberg.de


