Those settings limit the concurrent requests per ip/domain *per downloader*. Since each process has its own downloader, the effective limit is multiplied by the number of running processes.
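For example, with your numbers (2 scrapyd instances, max_proc=8 each), a quick back-of-the-envelope calculation looks like this (the variable names are just made up for the illustration):

    # Rough arithmetic only.
    scrapyd_instances = 2
    max_proc = 8                        # processes per scrapyd instance
    concurrent_requests_per_domain = 1  # CONCURRENT_REQUESTS_PER_DOMAIN

    # Every process runs its own downloader, so the per-domain setting
    # is enforced independently in each of the 2 * 8 = 16 processes.
    effective_limit = scrapyd_instances * max_proc * concurrent_requests_per_domain
    print(effective_limit)  # 16 concurrent requests against a single domain

That matches the 8x2 concurrent downloads you observed even with both settings set to 1.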
What you want to do makes total sense. You are crawling from the same machine, and if multiple processes start crawling the same host you may get blocked. There are several ways to solve this:

- You can ensure that your crawls don't overlap (in terms of crawled hosts) by configuring your spiders and whatever schedules their runs.

- You can write a custom downloader that communicates with the other instances, so that together they keep track of some global slot utilization (per ip/domain). There is a rough sketch of that idea after the quoted message below. I wonder if the developers would be interested in integrating such a solution into Scrapy.

- Finally, it's theoretically possible at a router/firewall. Traffic shaping usually works the other way around, so I don't know if you'll find enough information on this. Note that this is different from limiting the number of requests per ip; it would only limit the bandwidth per ip.

On Friday, 6 January 2017 19:53:43 UTC+2, k bez wrote:
>
> I have 2 scrapyd instances with max_proc=8 each.
> I am aware of the CONCURRENT_REQUESTS_PER_IP and
> CONCURRENT_REQUESTS_PER_DOMAIN settings, but I read in a previous post that
> they apply per slot/proc.
> I tested it, and even if I set both to 1 it keeps downloading 8x2
> concurrent items from the same domain.
> Is there some way to limit concurrent requests when I have max_proc > 1 in
> scrapyd?
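To make the second suggestion a bit more concrete, here is a very rough, untested sketch. I'm hanging the logic on a downloader middleware rather than a full custom downloader, and the setting names (GLOBAL_CONCURRENT_PER_DOMAIN, GLOBAL_LIMIT_REDIS_URL), the Redis key layout and the TTL are all made up for the example. It keeps a shared per-domain counter of in-flight requests in Redis, so every process sees the same budget. A real implementation would reschedule requests with a delay instead of dropping them when the budget is exceeded.

    from urllib.parse import urlparse

    import redis
    from scrapy.exceptions import IgnoreRequest


    class GlobalPerDomainLimitMiddleware:
        """Sketch of a shared per-domain concurrency cap backed by Redis."""

        def __init__(self, limit, redis_url):
            self.limit = limit
            self.redis = redis.StrictRedis.from_url(redis_url)

        @classmethod
        def from_crawler(cls, crawler):
            return cls(
                limit=crawler.settings.getint("GLOBAL_CONCURRENT_PER_DOMAIN", 8),
                redis_url=crawler.settings.get("GLOBAL_LIMIT_REDIS_URL",
                                               "redis://localhost:6379/0"),
            )

        def _key(self, request):
            return "inflight:%s" % urlparse(request.url).hostname

        def process_request(self, request, spider):
            key = self._key(request)
            # Reserve a slot atomically; the TTL keeps crashed processes
            # from leaking slots forever.
            count = self.redis.incr(key)
            self.redis.expire(key, 120)
            request.meta["global_slot_key"] = key
            if count > self.limit:
                # Over the shared budget. Dropping is the simplest thing for
                # a sketch; rescheduling with a delay would be the real fix.
                raise IgnoreRequest("global per-domain limit reached for %s" % key)
            return None

        def _release(self, request):
            # Release the slot exactly once, and only if we actually took it.
            key = request.meta.pop("global_slot_key", None)
            if key is not None:
                self.redis.decr(key)

        def process_response(self, request, response, spider):
            self._release(request)
            return response

        def process_exception(self, request, exception, spider):
            self._release(request)
            return None

You would enable it through DOWNLOADER_MIDDLEWARES in settings.py and point every process of both scrapyd instances at the same Redis server, so they all increment and decrement the same counters.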