> > Edward Quick wrote:
> > Ahh, I see this was already discussed in a recent thread:
> >
> > http://www.mail-archive.com/[email protected]/msg11812.html
> >
> > So in conclusion, is this saying it's not possible to fetch from the same
> > site at the same time on multiple nodes, or is there a way to override that?
>
> Currently there is no way to override this behavior (unless you're
> willing to modify the Generator class to use a different Partitioner).
> The only thing you can do now to speed it up is to allow more threads
> per host in the config. This is set to 1 by default, but since you
> control the target server you can increase it to e.g. 10 and see how it
> works.
>

Thanks Andrzej. I wasn't sure whether I should start a new thread for this, but there is a related problem: because the fetch only runs on one host, the filesystem on that host fills up after it has crawled 400000 links, and the fetch then falls over instead of continuing on one of the other nodes. Is that expected behaviour?

Also, I have set fetcher.store.content to false in nutch-site.xml to try to save space. Am I right in thinking that this will just stop the cached page from working in the search results?

Cheers,

Ed
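
For reference, the two settings discussed in this thread might look roughly like this in nutch-site.xml. This is only a sketch: fetcher.store.content is the property named above, while fetcher.threads.per.host is assumed to be the property behind the "threads per host" setting Andrzej describes (worth checking against nutch-default.xml for the version in use). The value 10 is simply the example figure suggested above, not a tested recommendation.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name for the "threads per host" setting mentioned
       above; the thread says it defaults to 1. Raise it only when you
       control the target server. -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>10</value>
  </property>

  <!-- Named in this thread: do not store fetched page content, to save
       disk space on the fetching node. -->
  <property>
    <name>fetcher.store.content</name>
    <value>false</value>
  </property>
</configuration>
```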
