Re: How to use TOR ?

Mayank Chutani Mon, 25 May 2015 02:23:53 -0700

I tried your snippet, but Scrapy seems to hang:

2015-05-25 13:32:46+0530 [scrapy] INFO: Scrapy 0.24.4 started (bot: proxy)
2015-05-25 13:32:46+0530 [scrapy] INFO: Optional features available: ssl, 
http11, boto
2015-05-25 13:32:46+0530 [scrapy] INFO: Overridden settings: 
{'NEWSPIDER_MODULE': 'proxy.spiders', 'SPIDER_MODULES': ['proxy.spiders'], 
'BOT_NAME': 'proxy'}
2015-05-25 13:32:46+0530 [scrapy] INFO: Enabled extensions: LogStats, 
TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-05-25 13:32:46+0530 [scrapy] INFO: Enabled downloader middlewares: 
HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, 
RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, 
HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, 
ChunkedTransferMiddleware, DownloaderStats
2015-05-25 13:32:46+0530 [scrapy] INFO: Enabled spider middlewares: 
HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, 
UrlLengthMiddleware, DepthMiddleware
2015-05-25 13:32:46+0530 [scrapy] INFO: Enabled item pipelines: 
2015-05-25 13:32:46+0530 [proxy] INFO: Spider opened
2015-05-25 13:32:46+0530 [proxy] INFO: Crawled 0 pages (at 0 pages/min), 
scraped 0 items (at 0 items/min)
2015-05-25 13:32:46+0530 [scrapy] DEBUG: Telnet console listening on 
127.0.0.1:6023
2015-05-25 13:32:46+0530 [scrapy] DEBUG: Web service listening on 
127.0.0.1:6080
2015-05-25 13:33:46+0530 [proxy] INFO: Crawled 0 pages (at 0 pages/min), 
scraped 0 items (at 0 items/min)
2015-05-25 13:34:46+0530 [proxy] INFO: Crawled 0 pages (at 0 pages/min), 
scraped 0 items (at 0 items/min)
2015-05-25 13:34:54+0530 [proxy] DEBUG: Retrying <GET 
https://check.torproject.org/> (failed 1 times): TCP connection timed out: 
110: Connection timed out.
2015-05-25 13:34:54+0530 [proxy] DEBUG: Retrying <GET 
http://my-ip.heroku.com> (failed 1 times): TCP connection timed out: 110: 
Connection timed out.


Any ideas?
Also, please describe the authentication and how you're using it.
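
On the authentication part: the quoted snippet below sends `AUTHENTICATE "267765"` to Tor's control port (9051). That string is the control password, which has to match a `HashedControlPassword` entry in torrc (the hash can be generated with `tor --hash-password`). Here is a rough sketch of the same exchange over a plain socket. The function names are made up for illustration, and the host, port, and timeout defaults are assumptions rather than anything confirmed in this thread.

```python
import socket


def control_commands(password):
    """Return the control-port lines that authenticate and ask for a new circuit."""
    # Escape backslashes and double quotes so the password stays one quoted string.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return [
        'AUTHENTICATE "%s"\r\n' % escaped,
        "SIGNAL NEWNYM\r\n",
        "QUIT\r\n",
    ]


def reply_ok(reply):
    """True if a control-port reply line reports success (status code 250)."""
    return reply.startswith("250")


def request_new_identity(password, host="127.0.0.1", port=9051, timeout=5):
    """Send the commands to the control port; raise if a reply is not a 250."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        reader = sock.makefile("rb")
        for command in control_commands(password):
            sock.sendall(command.encode("ascii"))
            reply = reader.readline().decode("ascii", "replace")
            if not reply_ok(reply):
                # Report only the command keyword so the password is not logged.
                raise RuntimeError(
                    "Tor refused %s: %s" % (command.split()[0], reply.strip())
                )
    finally:
        sock.close()
```

Calling `request_new_identity("267765")` only works if Tor is running with `ControlPort 9051` and a matching hashed password. A refused connection or a non-250 reply would point at the Tor configuration rather than at Scrapy.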

On Friday, May 18, 2012 at 1:36:54 PM UTC+5:30, Максим Горковский wrote:
>
> You would have to install tor and polipo from packages and write a simple 
> middleware that forces tor to change its route whenever scrapy retries a 
> page:
>
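> import telnetlib
> import time
>
> from scrapy import log
> from scrapy.contrib.downloadermiddleware.retry import RetryMiddleware
>
> class RetryChangeProxyMiddleware(RetryMiddleware):
>     def _retry(self, request, reason, spider):
>         # Ask Tor for a new circuit via its control port before each retry.
>         log.msg('Changing proxy')
>         tn = telnetlib.Telnet('127.0.0.1', 9051)
>         tn.read_until("Escape character is '^]'.", 2)
>         # The password must match HashedControlPassword in torrc.
>         tn.write('AUTHENTICATE "267765"\r\n')
>         tn.read_until("250 OK", 2)
>         tn.write("signal NEWNYM\r\n")
>         tn.read_until("250 OK", 2)
>         tn.write("quit\r\n")
>         tn.close()
>         time.sleep(3)  # give Tor a moment to build the new circuit
>         log.msg('Proxy changed')
>         return RetryMiddleware._retry(self, request, reason, spider)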
>
> then use it in settings.py:
>
> DOWNLOADER_MIDDLEWARES = {
>     'spider.middlewares.RetryChangeProxyMiddleware': 600,
> }
>
> Then you just need to send the requests through the local tor proxy 
> (polipo), which can be done with:
> tsocks scrapy crawl spider
>
> I did this a long time ago, but the solution seems OK and, most 
> importantly, it works.
>
>
> 2012/5/18 Peter Chatzilampros <[email protected]>
>
>> Hello,
>> Does anybody know how to use Tor with Scrapy on Ubuntu?
>> I don't know much about Tor.
>> Do I have to create a Tor account and install some special packages?
>> Do I have to write a middleware?
>> Is there any sample code?
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "scrapy-users" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to 
>> [email protected].
>> For more options, visit this group at 
>> http://groups.google.com/group/scrapy-users?hl=en.
>>
>
>
>
> -- 
> Best regards,
> Максим Горковский
>  
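
As an alternative to wrapping the whole process in tsocks, requests can be pointed at polipo from inside Scrapy by setting `request.meta['proxy']`. A minimal sketch of a downloader middleware that does this follows. The class name is hypothetical, and the address assumes polipo is listening on its default port 8123 and is itself configured to forward to Tor (`socksParentProxy = "localhost:9050"`, `socksProxyType = socks5`). Neither assumption is confirmed anywhere in this thread.

```python
class PolipoProxyMiddleware(object):
    """Hypothetical downloader middleware that routes requests through polipo."""

    # Assumed address: polipo's default listening port is 8123.
    proxy_url = "http://127.0.0.1:8123"

    def process_request(self, request, spider):
        # Scrapy reads the proxy for a request from request.meta['proxy'].
        # setdefault keeps any proxy that was already chosen for this request.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # let Scrapy continue processing the request normally
```

It would be enabled through `DOWNLOADER_MIDDLEWARES` in settings.py in the same way as the retry middleware above. Fetching https://check.torproject.org/ is a quick way to see whether traffic really leaves through Tor.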

