Hi jc,
One more thing to add: check the robots.txt files of the hosts you crawl.
They may be throttling your fetches with a crawl delay, for example:
Crawl-delay: 10
--Roland
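
A quick way to check this outside of Nutch is Python's standard urllib.robotparser. This is only a sketch: the robots.txt body is a made-up example, and crawl_delay_for is a hypothetical helper name, not part of Nutch.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt body for illustration; a real check would download
# http://<host>/robots.txt from each crawled host.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def crawl_delay_for(robots_txt, agent="*"):
    """Return the Crawl-delay (seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


delay = crawl_delay_for(ROBOTS_TXT)
print(delay)
```

If this returns a number for a host, each request to that host is expected to wait at least that many seconds, regardless of how many fetcher threads are available.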
On 01.03.2013 03:32, feng lu wrote:
Hi jc
<<
I don't understand why there are 19 queues, is it maybe that only 19
websites are being fetched?
Because each queue handles the FetchItems that share the same Queue ID (be
it a proto/hostname, proto/IP or proto/domain pair), and the Queue ID
is created based on the queueMode argument. So there are probably 19
different Queue IDs in FetchItemQueues.
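
The grouping described above can be sketched roughly like this. It is an illustration of the byHost idea only, not Nutch's actual code: queue_id and group_into_queues are hypothetical names, and the URLs are made up.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def queue_id(url):
    """Build a byHost-style queue ID: protocol plus lower-cased hostname."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return f"{parts.scheme}://{host}"


def group_into_queues(urls):
    """Put every URL into the queue that matches its queue ID."""
    queues = defaultdict(list)
    for url in urls:
        queues[queue_id(url)].append(url)
    return dict(queues)


# Made-up URLs: five items, but only three distinct protocol/host pairs.
urls = [
    "http://example.com/a",
    "http://example.com/b",
    "http://EXAMPLE.com/c",
    "https://example.com/a",
    "http://example.org/",
]
queues = group_into_queues(urls)
print(len(queues))
```

The number of queues therefore depends on how many distinct protocol/host pairs are in the fetch list, not on how many URLs it contains.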
<<
Anyways, why is it that there are 194 spinwaiting out of 200 active
threads?
First of all, I see that the parameter "fetcher.threads.per.host" has been
replaced by "fetcher.threads.per.queue" in Nutch 1.6. There are
200 fetcher threads, and each of them can fetch items from any host. However,
all remaining items come from just 19 different hosts, and the total URL count
is 10000. Each queue holds the items of one Queue ID. So the log indicates that
only 6 threads are fetching and another 13 threads have finished fetching.
Maybe those other 13 queues were small and did not take much time.
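
The arithmetic behind this can be sketched as follows. With fetcher.threads.per.queue = 1, at most one thread works on a queue at a time, so 19 queues can keep at most 19 of the 200 threads busy. The 5-second per-queue delay used below is an assumed value for illustration only (check your own delay settings and any Crawl-delay in robots.txt), and the function names are hypothetical.

```python
def max_busy_threads(total_threads, queues, threads_per_queue):
    """Upper bound on threads that can fetch at the same time."""
    return min(total_threads, queues * threads_per_queue)


def max_pages_per_second(queues, delay_seconds):
    """Upper bound on throughput with one thread per queue and a fixed
    wait between two requests to the same queue (fetch time ignored)."""
    return queues / delay_seconds


# Values taken from the log line "194/200 spinwaiting/active ... 19 queues".
active, spinwaiting, queue_count = 200, 194, 19

busy_now = active - spinwaiting                    # threads actually fetching
busy_limit = max_busy_threads(active, queue_count, 1)
idle_at_least = active - busy_limit                # threads that can never be busy

# ASSUMPTION: 5 seconds between requests to the same queue.
rate_limit = max_pages_per_second(queue_count, 5.0)

print(busy_now, busy_limit, idle_at_least, rate_limit)
```

So even in the best case at least 181 threads have nothing to do, and raising fetcher.threads.fetch will not help as long as the number of queues and the per-queue delay stay the same.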
Thanks
lufeng
On Fri, Mar 1, 2013 at 6:44 AM, jc wrote:
Hi guys,
I'm sorry if this question has been answered before, I looked but didn't
find anything.
This is my scenario (only the relevant settings, I think):
seed urls: about 60 homepages from different domains
generate.max.count = 10000
fetcher.threads.per.host = 3 (I'm trying to be polite here :-)
partition.url.mode = byHost
fetcher.threads.fetch = 200
fetcher.threads.per.queue = 1
topN = 1000000
depth = 1
Since the very beginning I've had a lot of spinwaiting threads (I'm not sure
whether those are threads, because the log doesn't really say):
194/200 spinwaiting/active, 166 pages, 3 errors, 4.7 3.8 pages/s, 1471 1412 kb/s, 10000 URLs in 19 queues
I don't understand why there are 19 queues, is it maybe that only 19
websites are being fetched? Anyways, why is it that there are 194
spinwaiting out of 200 active threads?
Thanks a lot in advance for your time.
Regards,
jc
--
View this message in context:
http://lucene.472066.n3.nabble.com/a-lot-of-threads-spinwaiting-tp4043801.html
Sent from the Nutch - User mailing list archive at Nabble.com.