Hi Roland and lufeng,

Thank you very much for your replies. I have already tested lufeng's advice,
and the results were pretty much as expected.

By the way, my Nutch installation is based on version 2.1, with HBase as the
crawldb storage.

Roland, maybe the fetcher.server.delay param has something to do with that as
well. I set it to 3 secs; would setting it to 0 be impolite?

All the info you provided has helped me a lot. Only one issue remains unfixed:
there are more than 60 URLs from different hosts in my seed file, but only 20
queues. It may seem that the other 40 hosts simply have no more URLs to
generate, but I really haven't seen any URL coming from those hosts since
the creation of the crawldb.

Based on my limited experience, the following params should allow 60 queues
for my vertical crawl. Am I missing something?

topN = 1 million
fetcher.threads.per.queue = 3
fetcher.threads.per.host = 3 (just in case; I remember you told me to use
per.queue instead)
fetcher.threads.fetch = 200
seed URLs from different hosts = 60 or more (regex-urlfilter.txt allows only
URLs from these hosts; they're all there, I checked)
crawldb record count > 1 million
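
One way to narrow this down would be to count how many distinct hosts actually
appear in a batch of URLs, since a queue can only exist for a host that has at
least one URL in the batch. Below is a minimal sketch, assuming the fetcher
keeps one queue per host. The count_hosts helper and the sample URLs are
hypothetical and not part of Nutch; in practice the list would come from
whatever dump of the batch is available.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_hosts(urls):
    """Return a Counter mapping each host to the number of URLs seen for it."""
    hosts = Counter()
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        host = urlsplit(url).hostname  # None when the line has no scheme/host
        if host:
            hosts[host] += 1
    return hosts


# Hypothetical sample; replace with the URLs of a real batch.
sample = [
    "http://a.example.com/page1",
    "http://a.example.com/page2",
    "http://b.example.org/index.html",
]

hosts = count_hosts(sample)
print(len(hosts), "distinct hosts")
for host, n in hosts.most_common():
    print(host, n)
```

If the number of distinct hosts in a batch is about 20, the queue count would
follow from the selection of URLs rather than from the thread settings above.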

Thanks again for all your help.

Regards,
JC



--
View this message in context: 
http://lucene.472066.n3.nabble.com/a-lot-of-threads-spinwaiting-tp4043801p4043988.html
Sent from the Nutch - User mailing list archive at Nabble.com.
