I have made some headway. It seems things are not working because Scrapy / Splash is sending a POST request, as seen in the Splash log:
Scrapy output:

    2016-06-03 19:43:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2016-06-03 19:43:37 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
    2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (404) <GET https://sapui5.hana.ondemand.com/robots.txt> (referer: None)
    2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (200) <GET https://sapui5.hana.ondemand.com/> (referer: None)
    2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (404) <GET http://localhost:8050/robots.txt> (referer: None)
    2016-06-03 19:43:42 [scrapy] DEBUG: Crawled (200) <GET https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html via http://localhost:8050/render.html> (referer: None)

Splash window:

    2016-06-04 02:43:42.574895 [pool] [140619310439728] SLOT 10 done with <splash.qtrender.HtmlRender object at 0x7fe4341c00b8>
    2016-06-04 02:43:42.576237 [events] {"active": 0, "path": "/render.html", "rendertime": 5.003755807876587, "maxrss": 94368, "client_ip": "172.17.0.1", "qsize": 0, "method": "POST", "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36", "timestamp": 1465008222, "load": [0.09, 0.05, 0.05], "status_code": 200, "fds": 19, "_id": 140619310439728, "args": {"height": 768, "headers": {"Accept-Encoding": "gzip,deflate", "Referer": "https://sapui5.hana.ondemand.com/", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en"}, "uid": 140619310439728, "png": 1, "iframes": 1, "wait": 5.0, "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html", "http_method": "GET", "timeout": 10, "script": 1, "width": 1024, "html": 1, "console": 1}}
    2016-06-04 02:43:42.576691 [-] "172.17.0.1" - - [04/Jun/2016:02:43:41 +0000] "POST /render.html HTTP/1.1" 200 1830 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
    2016-06-04 02:43:42.577109 [pool] SLOT 10 is available

When I use this curl request:

    curl 'http://localhost:8050/render.html?url=https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html&iframe=1&html=1&png=1&width=1024&height=768&script=1&console=1&timeout=10&wait=0.5'

it uses a GET request, and the data is rendered appropriately:

    2016-06-04 02:45:06.550405 [pool] [140619313333752] SLOT 11 done with <splash.qtrender.HtmlRender object at 0x7fe47c038390>
    2016-06-04 02:45:06.551410 [events] {"active": 0, "path": "/render.html", "rendertime": 0.7969868183135986, "maxrss": 94368, "client_ip": "172.17.0.1", "qsize": 0, "method": "GET", "user-agent": "curl/7.47.0", "timestamp": 1465008306, "load": [0.23, 0.11, 0.07], "status_code": 200, "fds": 19, "_id": 140619313333752, "args": {"height": "768", "console": "1", "iframe": "1", "uid": 140619313333752, "png": "1", "width": "1024", "wait": "0.5", "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html", "timeout": "10", "script": "1", "html": "1"}}
    2016-06-04 02:45:06.552238 [-] "172.17.0.1" - - [04/Jun/2016:02:45:05 +0000] "GET /render.html?url=https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html&iframe=1&html=1&png=1&width=1024&height=768&script=1&console=1&timeout=10&wait=0.5 HTTP/1.1" 200 5562 "-" "curl/7.47.0"
    2016-06-04 02:45:06.552681 [pool] SLOT 11 is available

No matter what I try in my spider, it always sends a POST request. Here is my latest code:

    def parse(self, response):
        #url = 'https://sapui5.hana.ondemand.com/sdk/#docs/api/symbols/sap.html'
        url = 'https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html'
        yield SplashRequest(url, self.parse_page,
            args={
                'http_method': 'GET',
                'timeout': 10,
                'wait': 5.,
                'iframes': 1,
                'html': 1,
                'png': 1,
                'script': 1,
                'console': 1,
                'width': 1024,
                'height': 768,
            },
            endpoint='render.html')

Whether I use render.json or render.html, the result is the same: a POST request is sent. Any idea how to change that?

David
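
P.S. To compare the two cases side by side outside of Scrapy, here is a minimal sketch that builds both kinds of request against the same Splash endpoint. It assumes Splash accepts its arguments either as a GET query string or as a JSON body POSTed with Content-Type: application/json; the helper names (build_get, build_post) and the ARGS dict are mine, for illustration only. Nothing is sent until urlopen is called. Two things I notice in the logs: my curl command passes iframe=1 while the spider passes iframes=1, and in the spider's log entry http_method appears inside "args", so it may describe the request Splash makes to the target page rather than the request made to Splash itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Splash endpoint taken from the logs above.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"

# Arguments shared by the curl command and the spider (illustrative subset).
ARGS = {
    "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html",
    "html": 1,
    "png": 1,
    "width": 1024,
    "height": 768,
    "script": 1,
    "console": 1,
    "timeout": 10,
    "wait": 0.5,
}


def build_get(args):
    """Curl-style request: arguments go in the query string."""
    return Request(SPLASH_RENDER_URL + "?" + urlencode(args), method="GET")


def build_post(args):
    """JSON POST, matching the method shown in the spider's Splash log entry."""
    body = json.dumps(args).encode("utf-8")
    return Request(
        SPLASH_RENDER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


get_req = build_get(ARGS)
post_req = build_post(ARGS)

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data.decode("utf-8"))

# To actually send one of them (requires a running Splash instance):
#   from urllib.request import urlopen
#   html = urlopen(post_req).read()
```

If both requests return the same rendered page when sent with identical arguments, the POST method itself is probably not what breaks the spider, and the remaining differences (wait, iframe vs. iframes, the forwarded headers) would be the next things to check.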
