> 
> Edward Quick wrote:
> > Ahh, I see this was already discussed in a recent thread:
> > 
> > http://www.mail-archive.com/[email protected]/msg11812.html
> > 
> > So in conclusion, is this saying it's not possible to fetch from the same 
> > site at the same time on multiple nodes, or is there a way to override that?
> 
> Currently there is no way to override this behavior (unless you're 
> willing to modify the Generator class to use a different Partitioner). 
> The only thing you can do now to speed it up is to allow more threads 
> per host in the config. This is set to 1 by default, but since you 
> control the target server you can increase it to e.g. 10 and see how it 
> works.
> 

Thanks Andrzej,

I wasn't sure whether I should start a new thread for this, but because the
fetch is only running on one host, its filesystem fills up after it has
crawled 400,000 links, and the fetch then falls over instead of continuing on
one of the other nodes. Is that expected behaviour?

Also, I have set fetcher.store.content to false in nutch-site.xml to try to
save space. Am I right in thinking that this will only stop the cached page
from working in the search results?
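
For reference, the two settings mentioned in this thread would look roughly
like this in nutch-site.xml. This is only a sketch: the property name
fetcher.threads.per.host is assumed from nutch-default.xml and may differ
between Nutch versions, and the value 10 is just the example figure suggested
above, not a tested recommendation.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name: threads allowed to fetch from one host at once.
       The default is 1; raise it only for a server you control. -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>10</value>
  </property>

  <!-- Do not store the raw fetched content, to save disk space. -->
  <property>
    <name>fetcher.store.content</name>
    <value>false</value>
  </property>
</configuration>
```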

Cheers,

Ed
