Implement process_request(), process_response(), and process_exception() in your 
own proxy middleware and disable the built-in proxy middleware. If a proxy is 
outdated, return the Request again, either from process_response() (based on the 
HTTP status code) or from process_exception(). The Request will then be processed 
again, which includes being assigned a new proxy.
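
A minimal sketch of that idea follows. It is not a definitive implementation: 
the class name, the PROXY_LIST setting, the "proxy_retries" meta key, the set of 
"bad" status codes, and the myproject.middlewares path are all assumptions made 
for illustration. The method signatures use the classic form with a spider 
argument.

```python
import random


class RotatingProxyMiddleware:
    """Sketch: assign a proxy per request and swap it out when it fails."""

    # Assumed set of status codes that suggest a dead or blocked proxy.
    BAD_STATUSES = {403, 407, 502, 503, 504}

    def __init__(self, proxies, max_retries=5):
        self.proxies = list(proxies)
        self.max_retries = max_retries

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, e.g. ["http://host:port", ...]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Runs for every request, including rescheduled ones, so a retried
        # request picks up a fresh proxy here.
        if self.proxies:
            request.meta["proxy"] = random.choice(self.proxies)
        return None

    def process_response(self, request, response, spider):
        if response.status in self.BAD_STATUSES:
            retry = self._retry(request)
            if retry is not None:
                return retry
        return response

    def process_exception(self, request, exception, spider):
        # Timeouts and connection errors end up here; returning a Request
        # reschedules it, returning None lets other middlewares handle it.
        return self._retry(request)

    def _retry(self, request):
        failed = request.meta.get("proxy")
        if failed in self.proxies:
            self.proxies.remove(failed)  # drop the proxy that just failed
        retries = request.meta.get("proxy_retries", 0) + 1
        if retries > self.max_retries or not self.proxies:
            return None
        retry = request.replace(dont_filter=True)  # bypass the duplicate filter
        retry.meta["proxy_retries"] = retries
        return retry


# settings.py (module paths are assumptions; older Scrapy releases used
# scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware):
# DOWNLOADER_MIDDLEWARES = {
#     "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": None,
#     "myproject.middlewares.RotatingProxyMiddleware": 750,
# }
```

Lowering the DOWNLOAD_TIMEOUT setting may also shorten how long each request 
waits on a dead proxy before process_exception() gets a chance to run.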

On October 14, 2014, at 8:32 AM, Sungmin Lee <id.sungmin....@gmail.com> wrote:

> Hi all,
> 
> I'm implementing a spider that works through a proxy, so I've overridden the 
> proxy middleware. So far it works well.
> 
> What I want to ultimately achieve is that,
> 
> 1) assign a proxy
> 2) start scraping
> 3) when proxy address is out-dated, broken, etc., apply new healthy proxy.
> 4) continue scraping
> 
> 
> The problem is that whenever a proxy address stops working, Scrapy just 
> hangs there waiting for a TCP response.
> I wanted to use RetryMiddleware, but it doesn't help because Scrapy never 
> returns a response.status.
> 
> 2014-10-13 16:46:22-0700 [proxy_test] INFO: Crawled 0 pages (at 0 pages/min), 
> scraped 0 items (at 0 items/min)
> 2014-10-13 16:46:53-0700 [proxy_test] DEBUG: Retrying <GET 
> http://some/website> (failed 1 times): TCP connection timed out: 60: 
> Operation timed out.
> 2014-10-13 16:46:54-0700 [proxy_test] DEBUG: Retrying <GET 
> http://some/website> (failed 1 times): TCP connection timed out: 60: 
> Operation timed out.
> 2014-10-13 16:46:54-0700 [proxy_test] DEBUG: Retrying <GET 
> http://some/website> (failed 1 times): TCP connection timed out: 60: 
> Operation timed out.
> 2014-10-13 16:46:55-0700 [proxy_test] DEBUG: Retrying <GET 
> http://some/website> (failed 1 times): TCP connection timed out: 60: 
> Operation timed out.
> 2014-10-13 16:46:56-0700 [proxy_test] DEBUG: Retrying <GET 
> http://some/website> (failed 1 times): TCP connection timed out: 60: 
> Operation timed out.
> 2014-10-13 16:46:57-0700 [proxy_test] DEBUG: Retrying <GET 
> http://some/website> (failed 1 times): TCP connection timed out: 60: 
> Operation timed out.
> 
> 
> Is there any way that I can handle this timeout issue?
> 
> Thanks!
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to scrapy-users+unsubscr...@googlegroups.com.
> To post to this group, send email to scrapy-users@googlegroups.com.
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
