Hi, did you manage to get this working? I am facing the same issue and can't get it to work either. Any help would be appreciated.
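For anyone comparing behaviour, the quoted message below notes that scrapy sends a plain GET to the proxy instead of a CONNECT followed by a GET. Here is a minimal sketch of that CONNECT-then-GET sequence using only the Python 3 standard library, outside scrapy. It is not how scrapy or wget implement it; the function names `build_connect_request` and `fetch_via_proxy` are my own, and it assumes a proxy that needs no authentication and a target on port 443.

```python
import socket
import ssl


def build_connect_request(host: str, port: int = 443) -> bytes:
    """Build the CONNECT request that asks an HTTP proxy to open a tunnel."""
    target = f"{host}:{port}"
    return (
        f"CONNECT {target} HTTP/1.1\r\n"
        f"Host: {target}\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_via_proxy(proxy_host: str, proxy_port: int, host: str,
                    path: str = "/", timeout: float = 10.0) -> bytes:
    """Open a tunnel with CONNECT, wrap it in TLS, then send the GET."""
    sock = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    try:
        # Step 1: ask the proxy for a raw tunnel to host:443.
        sock.sendall(build_connect_request(host))
        reply = b""
        while b"\r\n\r\n" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
        status_line = reply.split(b"\r\n", 1)[0]
        parts = status_line.split()
        if len(parts) < 2 or parts[1] != b"200":
            raise ConnectionError(f"proxy refused tunnel: {status_line!r}")

        # Step 2: the TLS handshake happens inside the tunnel, with the
        # target site rather than with the proxy.
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)

        # Step 3: only now is the GET sent, encrypted, to the target.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).encode("ascii")
        sock.sendall(request)

        response = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
        return response
    finally:
        sock.close()
```

With the proxy from the quoted message this would be called as `fetch_via_proxy("us.proxymesh.com", 31280, "www.paypal.com")`. If that returns a normal response while scrapy still gets a 501, it supports the explanation that the missing CONNECT step is the cause.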
On Friday, August 26, 2011 at 11:14:47 PM UTC+5:30, Pablo Hoffman wrote:
>
> https proxies are not supported yet. There's more information on this
> ticket:
> http://dev.scrapy.org/ticket/159
>
> On Thu, Aug 25, 2011 at 08:04:23PM -0700, Oana Goga wrote:
> > Hi,
> >
> > I am trying to use scrapy to access https web pages over a proxy and
> > I have some problems getting it to work.
> > When I am trying to fetch/view https://www.paypal.com with scrapy I
> > am getting the 501 error (Not Implemented), but when I fetch the
> > page with wget everything is working well. Here are the steps that
> > I am doing:
> >
> > $ export http_proxy="http://us.proxymesh.com:31280"
> > $ export https_proxy="http://us.proxymesh.com:31280"
> > $ scrapy view https://www.paypal.com
> > 2011-08-25 19:41:43-0700 [scrapy] INFO: Scrapy 0.12.0.2545 started
> > (bot: nice_bot)
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled extensions:
> > FeedExporter, TelnetConsole, SpiderContext, WebService, CoreStats,
> > MemoryUsage, CloseSpider
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled scheduler
> > middlewares: DuplicatesFilterMiddleware
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled downloader
> > middlewares: HttpProxyMiddleware, HttpAuthMiddleware,
> > DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware,
> > DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware,
> > HttpCompressionMiddleware, DownloaderStats
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled spider middlewares:
> > HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
> > UrlCanonicalizerMiddleware, UrlLengthMiddleware, DepthMiddleware
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled item pipelines:
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Telnet console listening on
> > 0.0.0.0:6023
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Web service listening on
> > 0.0.0.0:6080
> > 2011-08-25 19:41:43-0700 [default] INFO: Spider opened
> > 2011-08-25 19:41:43-0700 [scrapy] DEBUG: Cookie: None for
> > https://www.paypal.com
> > 2011-08-25 19:41:44-0700 [scrapy] INFO: Set-Cookie: [] from
> > https://www.paypal.com
> > 2011-08-25 19:41:44-0700 [default] *DEBUG: Crawled (501) <GET
> > https://www.paypal.com>* (referer: None)
> > 2011-08-25 19:41:44-0700 [default] INFO: Closing spider (finished)
> > 2011-08-25 19:41:48-0700 [default] INFO: Spider closed (finished)
> >
> >
> > $ wget https://www.paypal.com
> > --2011-08-25 19:44:08-- https://www.paypal.com/
> > Resolving us.proxymesh.com... 184.106.76.204
> > Connecting to us.proxymesh.com|184.106.76.204|:31280... connected.
> > Proxy request sent, awaiting response*... 200 OK*
> > Length: unspecified [text/html]
> > Saving to: `index.html'
> >
> > I have scrapy 0.12.0.2545 , twisted 11.0.0 and python 2.7.
> >
> > After some investigation, it appears that scrapy instead of issuing
> > a CONNECT method and then doing a GET it is only issuing a GET
> > requests which causes the fetch to fail.
> >
> > Do you have any idea why this happens and how it can be fixed?
> >
> > Thanks,
> > Oana

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users+unsubscr...@googlegroups.com.
To post to this group, send email to scrapy-users@googlegroups.com.
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.