I am not sure how to do a db per domain per thread in a single process. If you instead run a db per domain in multiple processes, each process is generally limited to the hosts of one domain, and for politeness reasons you should not run many threads per host. Each process can then keep only a few threads busy, which is not a very efficient use of threads.
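To make the politeness constraint concrete, here is a minimal sketch in plain Java. It is not Nutch code: `HostLimiter`, `tryAcquire` and `release` are hypothetical names made up for illustration. It only shows why extra threads aimed at the same host buy nothing once a per-host cap is in place.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical helper (not part of Nutch): caps concurrent fetches per host. */
public class HostLimiter {
    private final int maxPerHost;
    // One semaphore per host, created lazily the first time the host is seen.
    private final Map<String, Semaphore> slots = new ConcurrentHashMap<>();

    public HostLimiter(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    /** Returns true if a fetch slot for this host was free; the caller now holds it. */
    public boolean tryAcquire(String host) {
        return slots.computeIfAbsent(host, h -> new Semaphore(maxPerHost)).tryAcquire();
    }

    /** Gives back a slot previously obtained with tryAcquire. */
    public void release(String host) {
        Semaphore s = slots.get(host);
        if (s != null) {
            s.release();
        }
    }
}
```

With a cap of 1, a second thread asking for the same host is turned away until the first one releases its slot, while a thread asking for a different host proceeds. A process that only ever sees one domain therefore has little work for more than a few threads.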
Paul

On Nov 21, 2010 10:43 PM, "Sourabh Kasliwal" <[email protected]> wrote:

Hi,

Thanks for the reply. Can you please elaborate on the crawl speed problem to some extent? I want to run multiple crawls together in different threads (not processes) in order to fetch pages from multiple domains in parallel, and have a separate crawl database for each domain. I am doing it on a single machine with Hadoop in standalone mode. I am familiar with the issue that Hadoop will serialize the Nutch jobs. Is there any other major problem that I should handle?

regards
Sourabh

On Mon, Nov 22, 2010 at 11:56 AM, Paul Dhaliwal <[email protected]> wrote:
> It is possible to do ...

