I am trying to create a downloader middleware that assigns a random proxy IP address every 10 requests. However, I noticed that when I assign a new proxy address after the 10th request, the spider continues to use the original proxy.
Does Scrapy pool connections and re-use the proxies? If so, how do I change this?

Here is a simple test spider that shows the issue. The spider correctly sets the first proxy to 191.101.55.67. However, after the proxy is switched to https://79.110.19.164 (confirmed by printing response.meta in the spider's parse function), the IP reported by the API still shows up as 191.101.55.67:

    api.ipify.org response body: {"ip":"191.101.55.67"}
    response.meta: {'proxy': 'https://79.110.19.164:8085'}

Spider:

    import scrapy

    class TestSpider(scrapy.Spider):
        name = "test"
        allowed_domains = ["api.ipify.org"]
        # Request the same URL 20 times; start_urls requests are made
        # with dont_filter=True, so all 20 are downloaded.
        start_urls = [
            'https://api.ipify.org/?format=json',
        ] * 20

        def parse(self, response):
            print response.body, response.meta

Downloader middleware:

    import random
    import re

    from scrapy import log

    class RandomProxy(object):
        def __init__(self, settings):
            self.proxy_list = settings.get('PROXY_LIST')
            self.proxies = {}
            fin = open(self.proxy_list)
            for line in fin:
                line = line.strip()
                if not line:
                    continue
                parts = re.match(r'(\w+://)(\w+:\w+@)?(.+)', line)
                # Cut trailing @
                if parts.group(2):
                    user_pass = parts.group(2)[:-1]
                else:
                    user_pass = ''
                self.proxies[parts.group(1) + parts.group(3)] = user_pass
            fin.close()
            self.proxy_address = None
            self._get_new_proxy()

        def _get_new_proxy(self):
            # Discard the proxy we were using and pick a fresh one at random.
            if self.proxy_address is not None:
                del self.proxies[self.proxy_address]
            self.proxy_address = random.choice(self.proxies.keys())
            self.proxy_request_count = 0
            log.msg('Using proxy <%s>, (%d proxies left)' %
                    (self.proxy_address, len(self.proxies)))
            return self.proxy_address

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler.settings)

        def process_request(self, request, spider):
            new_proxy = self.proxy_address
            self.proxy_request_count += 1
            # Rotate to a new proxy on every 10th request.
            if self.proxy_request_count == 10:
                new_proxy = self._get_new_proxy()
            request.meta['proxy'] = new_proxy
            print "ASSIGNED PROXY %s" % new_proxy

        def process_exception(self, request, exception, spider):
            self._get_new_proxy()
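If pooling is indeed the culprit, one workaround I'm considering (a minimal, untested sketch; it assumes the downloader is re-using a keep-alive connection to the old proxy) is to disable keep-alive from the same middleware so every request opens a fresh connection:

    def process_request(self, request, spider):
        new_proxy = self.proxy_address
        self.proxy_request_count += 1
        if self.proxy_request_count == 10:
            new_proxy = self._get_new_proxy()
        request.meta['proxy'] = new_proxy
        # Untested assumption: "Connection: close" should keep the
        # downloader from holding the old proxy's connection open, so
        # the next request has to connect through the new proxy.
        request.headers['Connection'] = 'close'

Closing every connection obviously defeats keep-alive entirely, so I'd only use this to test whether pooling is the cause rather than as a final fix.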