I am trying to create a downloader middleware that assigns a random proxy 
IP address every 10 requests. However, I noticed that when I assign a new 
proxy address after the 10th request, the spider continues to use the 
original proxy.

Does Scrapy pool connections and re-use the proxies? If so, how do I change 
this?
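
If the answer is yes, the workaround I have in mind is to force 
non-persistent connections by subclassing the HTTP/1.1 download handler. 
This is only a sketch based on my reading of the source: _pool is an 
internal attribute of HTTP11DownloadHandler, and myproject.handlers is 
just a placeholder path.

from scrapy.core.downloader.handlers.http11 import HTTP11DownloadHandler


class NonPersistentDownloadHandler(HTTP11DownloadHandler):

    def __init__(self, settings):
        super(NonPersistentDownloadHandler, self).__init__(settings)
        # _pool is internal; disabling persistence should force a fresh
        # connection (and therefore a fresh proxy) for every request
        self._pool.persistent = False

registered via the DOWNLOAD_HANDLERS setting:

DOWNLOAD_HANDLERS = {
    'http': 'myproject.handlers.NonPersistentDownloadHandler',
    'https': 'myproject.handlers.NonPersistentDownloadHandler',
}

Would that be the right approach, or is there a supported way to do this?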

Here is a simple test spider that shows an example of the issue. The spider 
correctly sets the first proxy IP to *191.101.55.67*. However, when the 
proxy is set to *https://79.110.19.164* (confirmed by printing 
response.meta within the spider's parse function), the IP continues to 
show up as *191.101.55.67*.


*api.ipify.org response:* {"ip":"191.101.55.67"}

*response.meta:* {'proxy': 'https://79.110.19.164:8085'}

*Spider*

import scrapy


class TestSpider(scrapy.Spider):
    name = "test"
    allowed_domains = ["api.ipify.org"]
    start_urls = [
        'https://api.ipify.org/?format=json',
    ] * 20

    def parse(self, response):
        print response.body, response.meta
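
The middleware shown below is enabled in settings.py roughly like this; 
the module path and priority value are placeholders for this example (the 
custom middleware needs to run before the built-in HttpProxyMiddleware, 
which sits at 750):

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RandomProxy': 100,
}

PROXY_LIST = '/path/to/proxy_list.txt'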

*Downloader Middleware*

import random
import re

from scrapy import log


class RandomProxy(object):

    def __init__(self, settings):
        self.proxy_list = settings.get('PROXY_LIST')

        # Parse "scheme://[user:pass@]host:port" lines into a dict
        # mapping proxy URL -> credentials
        self.proxies = {}
        with open(self.proxy_list) as fin:
            for line in fin:
                parts = re.match(r'(\w+://)(\w+:\w+@)?(.+)', line.strip())
                if not parts:
                    continue

                # Cut the trailing @ off the credentials, if any
                if parts.group(2):
                    user_pass = parts.group(2)[:-1]
                else:
                    user_pass = ''

                self.proxies[parts.group(1) + parts.group(3)] = user_pass

        self.proxy_address = None
        self._get_new_proxy()

    def _get_new_proxy(self):
        # Retire the current proxy and pick a fresh one at random
        if self.proxy_address is not None:
            del self.proxies[self.proxy_address]
        self.proxy_address = random.choice(list(self.proxies.keys()))
        self.proxy_request_count = 0
        log.msg('Using proxy <%s>, (%d proxies left)' %
                (self.proxy_address, len(self.proxies)))
        return self.proxy_address

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

    def process_request(self, request, spider):
        # Rotate to a fresh proxy on every 10th request
        new_proxy = self.proxy_address
        self.proxy_request_count += 1
        if self.proxy_request_count == 10:
            new_proxy = self._get_new_proxy()

        request.meta['proxy'] = new_proxy
        print "ASSIGNED PROXY %s" % new_proxy

    def process_exception(self, request, exception, spider):
        # Assume the current proxy is bad and retire it immediately
        self._get_new_proxy()
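
For completeness, the PROXY_LIST file is expected to contain one proxy per 
line in the form matched by the regex above, optionally with credentials, 
e.g. (using the addresses from the output above for illustration):

https://79.110.19.164:8085
http://user:pass@191.101.55.67:8085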
