There's one more thing I'd like to ask that I haven't found clearly explained in the FAQ or on the mailing list:
Nutch STOP conditions, i.e. "how to stop a running Nutch crawl". In other words, how do I define crawl limits such as:

1) "time limit": crawl for Q hours, then stop.
2) "segments limit": stop after generating N segments.
3) "space limit": stop after M megabytes of DFS space have been used.
4) "input URLs limit": stop after crawling Z URLs from the original (seed) input set.
5) "depth limit": stop after reaching crawl depth X away from the original input URL list.

Further questions or suggestions about other "limits" are welcome ;)

If you don't mind, I'll put the answer(s) on the Nutch wiki (FAQ section). I think it could clarify this point for lots of people on the mailing list (me included! :-S).
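To make the five conditions concrete, here is a minimal sketch of how they could be checked between crawl rounds. It is not Nutch code or a Nutch API: `CrawlLimits`, `CrawlState` and `stop_reason` are hypothetical names, and the sketch assumes the crawl driver can measure elapsed time, segment count, DFS usage, fetched seed URLs and current depth after each generate/fetch round.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlLimits:
    """Hypothetical stop limits (not Nutch API); None means 'no limit'."""
    max_hours: Optional[float] = None      # 1) time limit: Q hours
    max_segments: Optional[int] = None     # 2) segments limit: N segments
    max_megabytes: Optional[float] = None  # 3) space limit: M MB used on DFS
    max_seed_urls: Optional[int] = None    # 4) input URLs limit: Z seed URLs crawled
    max_depth: Optional[int] = None        # 5) depth limit: X hops from the seeds


@dataclass
class CrawlState:
    """Hypothetical progress snapshot, assumed to be taken after each round."""
    elapsed_hours: float = 0.0
    segments: int = 0
    megabytes_used: float = 0.0
    seed_urls_fetched: int = 0
    depth: int = 0


def stop_reason(limits: CrawlLimits, state: CrawlState) -> Optional[str]:
    """Return the name of the first limit reached, or None to keep crawling."""
    checks = [
        ("time", limits.max_hours, state.elapsed_hours),
        ("segments", limits.max_segments, state.segments),
        ("space", limits.max_megabytes, state.megabytes_used),
        ("seed urls", limits.max_seed_urls, state.seed_urls_fetched),
        ("depth", limits.max_depth, state.depth),
    ]
    for name, limit, value in checks:
        # A limit only applies when it is set; reaching it triggers a stop.
        if limit is not None and value >= limit:
            return name
    return None


if __name__ == "__main__":
    limits = CrawlLimits(max_hours=6, max_depth=3)
    print(stop_reason(limits, CrawlState(elapsed_hours=2.5, depth=3)))  # depth
    print(stop_reason(limits, CrawlState(elapsed_hours=1.0, depth=1)))  # None
```

Checking the limits in a fixed order means that when several are reached in the same round, the first one listed is reported; any of them is enough to stop.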
