Re: Multiple Crawl database ??

Paul Dhaliwal Sun, 21 Nov 2010 23:02:27 -0800

I am not sure how to run a db per domain, one per thread, in a single process.
If you run a db per domain in multiple processes instead, you generally have to
limit the number of threads per host, because for politeness reasons you should
not run many threads against one host. That is not a very efficient use of threads.
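
To illustrate the politeness point, here is a minimal sketch in Python. It is
not Nutch code: the names (polite_fetch, crawl, MAX_THREADS_PER_HOST) and the
limit of one thread per host are assumptions made up for the example, and
fake_fetch only sleeps instead of doing a real HTTP request. It shares one
thread pool across many hosts while a per-host semaphore caps concurrency.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Assumed politeness limit for this sketch (hypothetical value).
MAX_THREADS_PER_HOST = 1

# One semaphore per host; created lazily under a lock.
host_slots = defaultdict(lambda: threading.Semaphore(MAX_THREADS_PER_HOST))
slots_lock = threading.Lock()

# Bookkeeping so we can check how many fetches ran at once per host.
active = defaultdict(int)
peak = defaultdict(int)
stats_lock = threading.Lock()


def fake_fetch(url):
    """Stand-in for a real HTTP request; just waits briefly."""
    time.sleep(0.01)
    return url


def polite_fetch(url):
    """Fetch url while holding one of its host's politeness slots."""
    host = urlparse(url).hostname
    with slots_lock:
        slot = host_slots[host]
    with slot:
        with stats_lock:
            active[host] += 1
            peak[host] = max(peak[host], active[host])
        try:
            return fake_fetch(url)
        finally:
            with stats_lock:
                active[host] -= 1


def crawl(urls, total_threads=8):
    """Fetch all urls with one shared pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=total_threads) as pool:
        return list(pool.map(polite_fetch, urls))
```

With URLs spread over many hosts, the shared pool stays busy even though each
host only sees one request at a time. With a single host, all but one worker
would sit blocked on the semaphore, which is the inefficiency described above.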


Paul

On Nov 21, 2010 10:43 PM, "Sourabh Kasliwal" <[email protected]>
wrote:
Hi,
Thanks for the reply.
Can you please elaborate on the crawl speed problem to some extent?

I want to run multiple crawls together in different threads (not processes)
in order to fetch pages from multiple domains in parallel, with a separate
crawl database for each domain.

I am doing this on a single machine with Hadoop in standalone mode. I am
aware that Hadoop will serialize the Nutch jobs. Is there any other major
problem that I should handle?

regards
Sourabh


On Mon, Nov 22, 2010 at 11:56 AM, Paul Dhaliwal <[email protected]> wrote:

> It is possible to do ...
