Implement process_request(), process_response(), and process_exception() in your own proxy middleware and disable the built-in proxy middleware. If a proxy is outdated, return the Request again from process_response() (based on the HTTP status code) or from process_exception(); the Request will then be processed again, including being assigned a new proxy.
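
A minimal sketch of what that could look like. The class name, the proxy URLs, the status-code set, the retry cap, and the "proxy_retries" meta key are all assumptions made for illustration, not part of Scrapy. The class deliberately imports nothing from Scrapy, because a downloader middleware is just an object exposing these three methods.

```python
import random


class RotatingProxyMiddleware:
    """Sketch: assign a proxy per request and swap it out when it fails."""

    # Hypothetical values; replace with your own pool and status codes.
    PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
    BAD_STATUSES = {403, 407, 502, 503, 504}
    MAX_PROXY_RETRIES = 5

    def __init__(self, proxies=None):
        self.proxies = list(proxies or self.PROXIES)

    def process_request(self, request, spider):
        # Assign a proxy only if the request does not carry one yet.
        if "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxies)
        return None  # let the request continue through the chain

    def process_response(self, request, response, spider):
        # A "bad" status is treated as a sign the proxy is outdated.
        if response.status in self.BAD_STATUSES:
            return self._retry(request) or response
        return response

    def process_exception(self, request, exception, spider):
        # Timeouts and refused connections arrive here, not in
        # process_response(), because there is no response at all.
        return self._retry(request)

    def _retry(self, request):
        retries = request.meta.get("proxy_retries", 0)
        if retries >= self.MAX_PROXY_RETRIES:
            return None  # give up; let other middlewares handle it
        bad = request.meta.get("proxy")
        # Drop the failing proxy, but never empty the pool.
        if bad in self.proxies and len(self.proxies) > 1:
            self.proxies.remove(bad)
        # dont_filter=True keeps the dupe filter from discarding the retry.
        new_request = request.replace(dont_filter=True)
        new_request.meta.pop("proxy", None)  # forces a fresh proxy
        new_request.meta["proxy_retries"] = retries + 1
        return new_request
```

To use something like this, register the class in DOWNLOADER_MIDDLEWARES and map the built-in HttpProxyMiddleware to None (in recent Scrapy releases it lives under scrapy.downloadermiddlewares.httpproxy; older releases used a scrapy.contrib path). Lowering DOWNLOAD_TIMEOUT may also help dead proxies fail faster, so the retry kicks in sooner.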

On October 14, 2014, at 8:32 AM, Sungmin Lee <id.sungmin....@gmail.com> wrote:

> Hi all,
>
> I'm implementing a spider working over proxy, so I've overridden
> proxymiddleware. it works so far so good.
>
> What I want to ultimately achieve is that,
>
> 1) assign a proxy
> 2) start scraping
> 3) when proxy address is out-dated, broken, etc., apply new healthy proxy.
> 4) continue scraping
>
>
> The problem is that, whenever a proxy address becomes corrupted, scrapy just
> hangs there waiting for TCP response.
> I wanted to utilize httpRetryMiddleware but it doesn't help as scrapy doesn't
> return response.status.
>
> 2014-10-13 16:46:22-0700 [proxy_test] INFO: Crawled 0 pages (at 0 pages/min),
> scraped 0 items (at 0 items/min)
> 2014-10-13 16:46:53-0700 [proxy_test] DEBUG: Retrying <GET
> http://some/website> (failed 1 times): TCP connection timed out: 60:
> Operation timed out.
> 2014-10-13 16:46:54-0700 [proxy_test] DEBUG: Retrying <GET
> http://some/website> (failed 1 times): TCP connection timed out: 60:
> Operation timed out.
> 2014-10-13 16:46:54-0700 [proxy_test] DEBUG: Retrying <GET
> http://some/website> (failed 1 times): TCP connection timed out: 60:
> Operation timed out.
> 2014-10-13 16:46:55-0700 [proxy_test] DEBUG: Retrying <GET
> http://some/website> (failed 1 times): TCP connection timed out: 60:
> Operation timed out.
> 2014-10-13 16:46:56-0700 [proxy_test] DEBUG: Retrying <GET
> http://some/website> (failed 1 times): TCP connection timed out: 60:
> Operation timed out.
> 2014-10-13 16:46:57-0700 [proxy_test] DEBUG: Retrying <GET
> http://some/website> (failed 1 times): TCP connection timed out: 60:
> Operation timed out.
>
>
> Is there any way that I can handle this timeout issue?
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scrapy-users+unsubscr...@googlegroups.com.
> To post to this group, send email to scrapy-users@googlegroups.com.
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.