Hi,
I have a Nutch 2.3.1 installation with MongoDB.

I want to understand what scalability options I have.

1. The number of threads used during one job can be set in nutch-site.xml:
        a. fetcher.threads.per.queue - the maximum number of threads allowed to access a queue at one time.
        b. fetcher.threads.fetch - the number of FetcherThreads the fetcher should use.
Are there other configuration parameters that affect scalability?
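For reference, this is roughly how the two properties above would appear in nutch-site.xml. It is only a sketch: the values are placeholders for illustration, not tuned settings.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Placeholder value: total number of fetcher threads for the job -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>
  <!-- Placeholder value: maximum threads allowed on one queue at a time -->
  <property>
    <name>fetcher.threads.per.queue</name>
    <value>1</value>
  </property>
</configuration>
```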

2. Ability to run the same job on different hosts.
        Is this supported by Nutch?
3. Ability to run jobs in parallel.
        Example: I run a “fetch” job. It produces new URLs that have not been crawled yet.
        Can I run another job to process these uncrawled URLs before the first job is done?
4. Database scalability.
        Can I use multiple MongoDB instances for crawling?

Thank you in advance,
Vladimir.
