Dear developers,

I have installed Nutch on a Linux enterprise server with 8 GB of RAM, and the Java VM is started with a 4 GB heap when Nutch runs.

I have configured the web crawler to scan PDF documents (about 3000) on our intranet. After roughly 100 PDF documents the crawl always fails with an OutOfMemoryError. I tried the following trick: in index.html I generate links to a set of HTML pages (link1.html, link2.html, etc.), and each linkN.html links to 20 PDFs. But this trick also fails.

Can someone give me an idea or point me to something to read?

Best regards,
Dulip Withanage, M.Sc.
Cluster of Excellence, Karl Jaspers Centre, Heidelberg
Fax: +49-6221-54 4012
E-mail: withan...@asia-europe.uni-heidelberg.de
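P.S. For clarity, this is roughly how the link pages are generated. It is only a simplified sketch: the class name, output directory, and PDF URLs below are placeholders, not the real intranet paths.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the batching trick: instead of one index.html pointing at
    // ~3000 PDFs, write linkN.html pages with 20 PDF links each and let
    // index.html point only at those pages.
    public class LinkPageGenerator {

        private static final int PDFS_PER_PAGE = 20;

        public static void main(String[] args) throws IOException {
            Path outDir = Paths.get("generated");   // placeholder output directory
            Files.createDirectories(outDir);

            // In the real setup this list comes from the intranet; here it is faked.
            List<String> pdfUrls = new ArrayList<>();
            for (int i = 1; i <= 3000; i++) {
                pdfUrls.add("http://intranet.example/docs/doc" + i + ".pdf");
            }

            int pageCount = (pdfUrls.size() + PDFS_PER_PAGE - 1) / PDFS_PER_PAGE;

            // One linkN.html per batch of 20 PDFs.
            for (int page = 0; page < pageCount; page++) {
                try (PrintWriter out = new PrintWriter(
                        Files.newBufferedWriter(outDir.resolve("link" + (page + 1) + ".html")))) {
                    out.println("<html><body>");
                    int from = page * PDFS_PER_PAGE;
                    int to = Math.min(from + PDFS_PER_PAGE, pdfUrls.size());
                    for (String url : pdfUrls.subList(from, to)) {
                        out.println("<a href=\"" + url + "\">" + url + "</a><br/>");
                    }
                    out.println("</body></html>");
                }
            }

            // index.html only links to the linkN.html pages.
            try (PrintWriter out = new PrintWriter(
                    Files.newBufferedWriter(outDir.resolve("index.html")))) {
                out.println("<html><body>");
                for (int page = 1; page <= pageCount; page++) {
                    out.println("<a href=\"link" + page + ".html\">link" + page + ".html</a><br/>");
                }
                out.println("</body></html>");
            }
        }
    }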