Gang, I can see lots of discussion about fetching a large site like Wikipedia, but none of it gives a concrete picture of how to do the fetch without problems. I have six fetcher jobs running, yet all URLs end up assigned to a single job: as I understand it, Nutch partitions URLs by hostname, so every page from the site goes to the same fetcher. If I use the generate.max.per.host parameter to restrict the number of URLs per job, will it be able to distribute URLs uniformly across all jobs? Since this is going to be a major issue, I am also thinking of tweaking Nutch so that URLs are assigned by count rather than by host, i.e., once a job reaches a certain number of URLs, further URLs are handed to the other jobs for fetching. I'm not sure which of these approaches I should follow to fetch a large site successfully. - RB
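For reference, by generate.max.per.host I mean setting something like the following in conf/nutch-site.xml (the value of 100 is only a placeholder, not a number I've settled on):

```xml
<!-- Cap the number of URLs selected per host in each generated
     fetch segment. 100 is just an example value. -->
<property>
  <name>generate.max.per.host</name>
  <value>100</value>
</property>
```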