I tried your snippet, but Scrapy seems to hang:
2015-05-25 13:32:46+0530 [scrapy] INFO: Scrapy 0.24.4 started (bot: proxy)
2015-05-25 13:32:46+0530 [scrapy] INFO: Optional features available: ssl,
http11, boto
2015-05-25 13:32:46+0530 [scrapy] INFO: Overridden settings:
{'NEWSPIDER_MODULE': 'proxy.spiders', 'SPIDER_MODULES': ['proxy.spiders'],
'BOT_NAME': 'proxy'}
2015-05-25 13:32:46+0530 [scrapy] INFO: Enabled extensions: LogStats,
TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-05-25 13:32:46+0530 [scrapy] INFO: Enabled downloader middlewares:
HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware,
RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware,
HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware,
ChunkedTransferMiddleware, DownloaderStats
2015-05-25 13:32:46+0530 [scrapy] INFO: Enabled spider middlewares:
HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
UrlLengthMiddleware, DepthMiddleware
2015-05-25 13:32:46+0530 [scrapy] INFO: Enabled item pipelines:
2015-05-25 13:32:46+0530 [proxy] INFO: Spider opened
2015-05-25 13:32:46+0530 [proxy] INFO: Crawled 0 pages (at 0 pages/min),
scraped 0 items (at 0 items/min)
2015-05-25 13:32:46+0530 [scrapy] DEBUG: Telnet console listening on
127.0.0.1:6023
2015-05-25 13:32:46+0530 [scrapy] DEBUG: Web service listening on
127.0.0.1:6080
2015-05-25 13:33:46+0530 [proxy] INFO: Crawled 0 pages (at 0
pages/min), scraped 0 items (at 0 items/min)
2015-05-25 13:34:46+0530 [proxy] INFO: Crawled 0 pages (at 0 pages/min),
scraped 0 items (at 0 items/min)
2015-05-25 13:34:54+0530 [proxy] DEBUG: Retrying <GET
https://check.torproject.org/> (failed 1 times): TCP connection timed out:
110: Connection timed out.
2015-05-25 13:34:54+0530 [proxy] DEBUG: Retrying <GET
http://my-ip.heroku.com> (failed 1 times): TCP connection timed out: 110:
Connection timed out.
Any ideas?
Also, could you describe the authentication step and how you're using it?
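
As far as I understand it, the AUTHENTICATE line in the quoted snippet sends the Tor control port password (the plaintext that matches HashedControlPassword in torrc), and SIGNAL NEWNYM then asks Tor for a fresh circuit. Below is a minimal sketch of that exchange over a plain socket, assuming the control port is 9051 as in the snippet. The function names are my own and not part of Scrapy or Tor, and the reply handling is deliberately naive.

```python
import socket


def quote_password(password):
    # Wrap the password in double quotes, escaping backslashes and quotes,
    # which is how the control protocol expects a quoted string.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_newnym_commands(password):
    # The three control-port lines: authenticate, request a new circuit, quit.
    return [
        "AUTHENTICATE " + quote_password(password) + "\r\n",
        "SIGNAL NEWNYM\r\n",
        "QUIT\r\n",
    ]


def request_new_identity(password, host="127.0.0.1", port=9051, timeout=5.0):
    # Send each command in turn and collect whatever Tor answers.
    # Assumption: each reply fits in a single recv() call.
    replies = []
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        for command in build_newnym_commands(password):
            sock.sendall(command.encode("ascii"))
            replies.append(sock.recv(1024).decode("ascii", "replace").strip())
    finally:
        sock.close()
    return replies
```

Calling request_new_identity("267765") against a running Tor should return replies starting with 250 for the first two commands if the password is accepted. A 515 reply to AUTHENTICATE would mean the password does not match what torrc expects.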
On Friday, May 18, 2012 at 1:36:54 PM UTC+5:30, Максим Горковский wrote:
>
> You would have to install tor and polipo from packages and write a simple
> middleware that forces tor to change its route whenever scrapy retries a
> page:
>
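> import telnetlib
> import time
>
> from scrapy import log
> from scrapy.contrib.downloadermiddleware.retry import RetryMiddleware
>
>
> class RetryChangeProxyMiddleware(RetryMiddleware):
>     def _retry(self, request, reason, spider):
>         log.msg('Changing proxy')
>         # Talk to the Tor control port (port 9051 here).
>         tn = telnetlib.Telnet('127.0.0.1', 9051)
>         tn.read_until("Escape character is '^]'.", 2)
>         # Authenticate with the control password, then ask for a new circuit.
>         tn.write('AUTHENTICATE "267765"\r\n')
>         tn.read_until("250 OK", 2)
>         tn.write("signal NEWNYM\r\n")
>         tn.read_until("250 OK", 2)
>         tn.write("quit\r\n")
>         tn.close()
>         # Give Tor a moment to build the new circuit before retrying.
>         time.sleep(3)
>         log.msg('Proxy changed')
>         return RetryMiddleware._retry(self, request, reason, spider)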
>
> then use it in settings.py:
>
> DOWNLOADER_MIDDLEWARES = {
>     'spider.middlewares.RetryChangeProxyMiddleware': 600,
> }
>
> Then you just need to send the requests through the local tor proxy (polipo),
> which can be done with:
> tsocks scrapy crawl spider
>
> I did this a long time ago, but the solution seems OK and, most importantly,
> it works.
>
>
> 2012/5/18 Peter Chatzilampros <[email protected]>
>
>> Hello,
>> does anybody know how to use TOR with scrapy in ubuntu?
>> I don't know much about TOR.
>> Do I have to make a tor account and install some special packages?
>> Do I have to write a middleware?
>> Is there any sample code?
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/scrapy-users?hl=en.
>>
>
>
>
> --
> Best regards,
> Максим Горковский
>
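
Given the connection timeouts in the log above, one thing that might be worth trying instead of tsocks is to point Scrapy at polipo explicitly through request.meta['proxy']. The sketch below assumes polipo listens on its default address 127.0.0.1:8123 and forwards to Tor's SOCKS port. PolipoProxyMiddleware and FakeRequest are hypothetical names of mine. FakeRequest only stands in for a Scrapy request so the logic can be tried without Scrapy installed.

```python
class PolipoProxyMiddleware(object):
    """Hypothetical downloader middleware that routes requests via polipo."""

    def __init__(self, proxy_url="http://127.0.0.1:8123"):
        # Assumption: polipo's default listening address.
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Scrapy's downloader reads the proxy from request.meta['proxy'].
        # setdefault keeps any proxy a spider already chose for this request.
        request.meta.setdefault("proxy", self.proxy_url)
        # Returning None lets Scrapy continue processing the request.
        return None


class FakeRequest(object):
    """Stand-in for a Scrapy request, used only to exercise the middleware."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


request = FakeRequest("http://my-ip.heroku.com")
PolipoProxyMiddleware().process_request(request, spider=None)
print(request.meta["proxy"])
```

To use it in a project, the class would be listed in DOWNLOADER_MIDDLEWARES next to the retry middleware. The module path, for example 'proxy.middlewares.PolipoProxyMiddleware', depends on the project layout and is an assumption here.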