Re: Scrapy with https proxy

Palash Jain Tue, 21 Jun 2016 01:05:43 -0700

Hi, did you ever get this to work?
I am facing the same issue and can't get it to work either. Any help would be 
appreciated.


On Friday, August 26, 2011 at 11:14:47 PM UTC+5:30, Pablo Hoffman wrote:
>
> HTTPS proxies are not supported yet. There's more information in this 
> ticket:
> http://dev.scrapy.org/ticket/159
>
> On Thu, Aug 25, 2011 at 08:04:23PM -0700, Oana Goga wrote:
> > Hi,
> > 
> > I am trying to use scrapy to access https web pages through a proxy,
> > but I cannot get it to work.
> > When I try to fetch/view https://www.paypal.com with scrapy I get a
> > 501 error (Not Implemented), but when I fetch the page with wget
> > everything works.  Here are the steps I am following:
> > 
> > $ export http_proxy="http://us.proxymesh.com:31280";
> > $ export https_proxy="http://us.proxymesh.com:31280";
> > $ scrapy view https://www.paypal.com
> > 2011-08-25 19:41:43-0700 [scrapy] INFO: Scrapy 0.12.0.2545 started
> > (bot: nice_bot)
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled extensions:
> > FeedExporter, TelnetConsole, SpiderContext, WebService, CoreStats,
> > MemoryUsage, CloseSpider
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled scheduler
> > middlewares: DuplicatesFilterMiddleware
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled downloader
> > middlewares: HttpProxyMiddleware, HttpAuthMiddleware,
> > DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware,
> > DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware,
> > HttpCompressionMiddleware, DownloaderStats
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled spider middlewares:
> > HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
> > UrlCanonicalizerMiddleware, UrlLengthMiddleware, DepthMiddleware
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled item pipelines:
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Telnet console listening on
> > 0.0.0.0:6023
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Web service listening on
> > 0.0.0.0:6080
> > 2011-08-25 19:41:43-0700 [default] INFO: Spider opened
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Cookie: None for
> > https://www.paypal.com
> > 2011-08-25 19:41:44-0700 [scrapy] INFO: Set-Cookie: [] from
> > https://www.paypal.com
> > 2011-08-25 19:41:44-0700 [default] *DEBUG: Crawled (501) <GET
> > https://www.paypal.com>* (referer: None)
> > 2011-08-25 19:41:44-0700 [default] INFO: Closing spider (finished)
> > 2011-08-25 19:41:48-0700 [default] INFO: Spider closed (finished)
> > 
> > 
> > $ wget https://www.paypal.com
> > --2011-08-25 19:44:08--  https://www.paypal.com/
> > Resolving us.proxymesh.com... 184.106.76.204
> > Connecting to us.proxymesh.com|184.106.76.204|:31280... connected.
> > Proxy request sent, awaiting response*... 200 OK*
> > Length: unspecified [text/html]
> > Saving to: `index.html'
> > 
> > I have scrapy 0.12.0.2545, twisted 11.0.0 and python 2.7.
> > 
> > After some investigation, it appears that scrapy, instead of issuing
> > a CONNECT request and then doing a GET, issues only a GET request,
> > which causes the fetch to fail.
> > 
> > Do you have any idea why this happens and how it can be fixed?
> > 
> > Thanks,
> > Oana
>
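
The diagnosis in the quoted message (a plain GET sent where the proxy
expects a CONNECT first) can be illustrated with a minimal Python 3 sketch.
It only builds the two request preambles as bytes and sends nothing over
the network. The helper names build_connect_request and build_proxied_get
are hypothetical and are not part of Scrapy or Twisted.

```python
# Sketch only: these helpers are hypothetical, not Scrapy or Twisted APIs.
from urllib.parse import urlsplit


def build_connect_request(url):
    """Bytes a client sends to an HTTP proxy to open a tunnel for an https URL."""
    parts = urlsplit(url)
    port = parts.port or 443  # default https port when the URL has none
    authority = "%s:%d" % (parts.hostname, port)
    lines = ["CONNECT %s HTTP/1.1" % authority, "Host: %s" % authority, "", ""]
    return "\r\n".join(lines).encode("ascii")


def build_proxied_get(url):
    """Bytes for a proxied GET that puts the absolute URL in the request line.

    This is the form used for plain http URLs; per the quoted message, it is
    what was being sent for the https URL as well.
    """
    parts = urlsplit(url)
    lines = ["GET %s HTTP/1.1" % url, "Host: %s" % parts.hostname, "", ""]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_connect_request("https://www.paypal.com").decode("ascii"))
    print(build_proxied_get("https://www.paypal.com").decode("ascii"))
```

Once the proxy answers the CONNECT with a 2xx status, the client performs
the TLS handshake through the tunnel and sends an ordinary GET inside it.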

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to scrapy-users+unsubscr...@googlegroups.com.
To post to this group, send email to scrapy-users@googlegroups.com.
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
