You could write a shell script, run from a cron job every minute or so, that checks the size of the temp/crawl directory and, if it is over your limit, kills the Java process. Or, if you can program sufficiently, add some Java to the crawler code itself to enforce the limit.
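For example, a rough (untested) sketch of the cron approach; the crawl directory, the 500 MB limit, and the process pattern are placeholders you would adjust for your own setup:

#!/bin/sh
# check_crawl_size.sh - run from cron, e.g. "* * * * * /path/to/check_crawl_size.sh"
CRAWL_DIR=/path/to/crawl      # directory the crawl writes into (placeholder)
LIMIT_KB=512000               # roughly 500 MB

SIZE_KB=`du -sk "$CRAWL_DIR" | awk '{print $1}'`
if [ "$SIZE_KB" -gt "$LIMIT_KB" ]; then
    # kill the crawl's Java process; adjust the pattern to match how you launch it
    pkill -f 'org.apache.nutch.crawl.Crawl'
fi

Note that killing the process mid-fetch is not a clean shutdown (the segment being fetched may be left incomplete), which is one reason a check added to the crawler code itself would be the tidier option.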
As far as I know, there is no setting in the configuration files that lets you do this.

Regards,
Alexander

-----Original Message-----
From: Olena Medelyan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 21 March 2006 3:46 PM
To: [email protected]
Subject: How to terminate the crawl?

Hi,

I'm using the crawl tool in Nutch to crawl the web starting from a set of URL seeds. The crawl normally finishes after the specified depth has been reached. Is it possible to terminate it after a pre-defined number of pages, or after text data of a pre-defined size (e.g. 500 MB), has been crawled?

Thank you for any hints!

Regards,
Olena
