Hi, here is the code and logs:
https://gist.github.com/rolando/e3da0515aff240dde3e790196809b4d6

I had to increase the wait time to 10 seconds, as I was getting an empty result with 5.

Best,

Rolando

On Fri, Jun 3, 2016 at 11:46 PM, David Fishburn <[email protected]> wrote:

> I have made some headway.
>
> It seems things are not working because Scrapy / Splash is sending a POST
> request, as seen in the Splash log:
>
> Scrapy output:
>
> 2016-06-03 19:43:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
> 2016-06-03 19:43:37 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
> 2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (404) <GET https://sapui5.hana.ondemand.com/robots.txt> (referer: None)
> 2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (200) <GET https://sapui5.hana.ondemand.com/> (referer: None)
> 2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (404) <GET http://localhost:8050/robots.txt> (referer: None)
> 2016-06-03 19:43:42 [scrapy] DEBUG: Crawled (200) <GET https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html via http://localhost:8050/render.html> (referer: None)
>
>
> Splash window output:
>
> 2016-06-04 02:43:42.574895 [pool] [140619310439728] SLOT 10 done with <splash.qtrender.HtmlRender object at 0x7fe4341c00b8>
> 2016-06-04 02:43:42.576237 [events] {"active": 0, "path": "/render.html", "rendertime": 5.003755807876587, "maxrss": 94368, "client_ip": "172.17.0.1", "qsize": 0, "method": "POST", "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36", "timestamp": 1465008222, "load": [0.09, 0.05, 0.05], "status_code": 200, "fds": 19, "_id": 140619310439728, "args": {"height": 768, "headers": {"Accept-Encoding": "gzip,deflate", "Referer": "https://sapui5.hana.ondemand.com/", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en"}, "uid": 140619310439728, "png": 1, "iframes": 1, "wait": 5.0, "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html", "http_method": "GET", "timeout": 10, "script": 1, "width": 1024, "html": 1, "console": 1}}
> 2016-06-04 02:43:42.576691 [-] "172.17.0.1" - - [04/Jun/2016:02:43:41 +0000] "POST /render.html HTTP/1.1" 200 1830 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
> 2016-06-04 02:43:42.577109 [pool] SLOT 10 is available
>
>
> When I use this curl request:
>
> curl 'http://localhost:8050/render.html?url=https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html&iframe=1&html=1&png=1&width=1024&height=768&script=1&console=1&timeout=10&wait=0.5'
>
>
>
> When I use curl, it uses a GET request, and the data is rendered
> appropriately.
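For replaying that GET request from Python with different arguments, the query string can be built with the standard library. This is a minimal sketch, assuming Splash listens on `http://localhost:8050` as in the logs above; `build_render_url` is a hypothetical helper name. Unlike the unquoted curl command, `urlencode` percent-encodes the target `url` value, so a `&` or `#` in it (as in the commented-out `.../sdk/#docs/...` URL in the spider) cannot cut the query short.

```python
from urllib.parse import urlencode

SPLASH = "http://localhost:8050"  # assumed Splash address, as in the logs above


def build_render_url(target, **args):
    """Build a GET URL for Splash's render.html endpoint (hypothetical helper).

    Every argument travels as a query-string parameter, like the curl command.
    urlencode percent-encodes the target URL, so '&' or '#' inside it is safe.
    """
    params = {"url": target}
    params.update(args)
    return f"{SPLASH}/render.html?{urlencode(params)}"


# Same arguments as the curl command above (including its 'iframe' spelling)
render_url = build_render_url(
    "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html",
    iframe=1, html=1, png=1, width=1024, height=768,
    script=1, console=1, timeout=10, wait=0.5,
)
print(render_url)
# With Splash running, fetch it with urllib.request.urlopen(render_url).read()
```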
>
>
> 2016-06-04 02:45:06.550405 [pool] [140619313333752] SLOT 11 done with <splash.qtrender.HtmlRender object at 0x7fe47c038390>
> 2016-06-04 02:45:06.551410 [events] {"active": 0, "path": "/render.html", "rendertime": 0.7969868183135986, "maxrss": 94368, "client_ip": "172.17.0.1", "qsize": 0, "method": "GET", "user-agent": "curl/7.47.0", "timestamp": 1465008306, "load": [0.23, 0.11, 0.07], "status_code": 200, "fds": 19, "_id": 140619313333752, "args": {"height": "768", "console": "1", "iframe": "1", "uid": 140619313333752, "png": "1", "width": "1024", "wait": "0.5", "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html", "timeout": "10", "script": "1", "html": "1"}}
> 2016-06-04 02:45:06.552238 [-] "172.17.0.1" - - [04/Jun/2016:02:45:05 +0000] "GET /render.html?url=https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html&iframe=1&html=1&png=1&width=1024&height=768&script=1&console=1&timeout=10&wait=0.5 HTTP/1.1" 200 5562 "-" "curl/7.47.0"
> 2016-06-04 02:45:06.552681 [pool] SLOT 11 is available
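Putting the `args` of the spider's `[events]` entry next to the curl one shows that the HTTP method is not the only difference between the two requests. The sketch below uses a subset of the logged values, transcribed by hand from the two log excerpts (it does not contact Splash; `diff_args` is a hypothetical helper), and lists the keys whose values differ. Values are compared as strings because query-string arguments always arrive as strings, while JSON arguments keep their types.

```python
# Subset of "args" copied from the two Splash [events] log entries above
post_args = {  # from the spider's SplashRequest (JSON POST body, typed values)
    "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html",
    "http_method": "GET", "timeout": 10, "wait": 5.0, "iframes": 1,
    "html": 1, "png": 1, "script": 1, "console": 1, "width": 1024, "height": 768,
}
get_args = {  # from the curl command (query string, so every value is a string)
    "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html",
    "timeout": "10", "wait": "0.5", "iframe": "1",
    "html": "1", "png": "1", "script": "1", "console": "1", "width": "1024", "height": "768",
}


def diff_args(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ.

    A key missing from one side shows up as None on that side.
    """
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if str(va) != str(vb):  # compare as strings: '10' and 10 count as equal
            diffs[key] = (va, vb)
    return diffs


for key, (spider, curl) in diff_args(post_args, get_args).items():
    print(f"{key}: spider={spider!r} curl={curl!r}")
```

On these values it reports `http_method` (sent only by the spider), the `iframes`/`iframe` spelling difference, and `wait` (5.0 from the spider versus 0.5 from curl). The logged `rendertime` of roughly 5.0 s versus 0.8 s is consistent with that `wait` difference.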
>
>
>
>
> No matter what I try in my spider, it always sends a POST request.
> Here is my latest code:
>
>     # SplashRequest comes from: from scrapy_splash import SplashRequest
>     def parse(self, response):
>         # url = 'https://sapui5.hana.ondemand.com/sdk/#docs/api/symbols/sap.html'
>         url = 'https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html'
>         yield SplashRequest(url, self.parse_page,
>                             args={
>                                 'http_method': 'GET',
>                                 'timeout': 10,
>                                 'wait': 5.,
>                                 'iframes': 1,
>                                 'html': 1,
>                                 'png': 1,
>                                 'script': 1,
>                                 'console': 1,
>                                 'width': 1024,
>                                 'height': 768,
>                             },
>                             endpoint='render.html')
>
>
> Whether I use render.json or render.html, the result is the same: a POST
> request is sent.
>
> Any idea how to change that?
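As far as I understand the Splash HTTP API, endpoint arguments can be passed either as query parameters (as curl does) or as a JSON body in a POST with `Content-Type: application/json`, which is the form the spider's requests take according to the log above. The `http_method` argument sets the method Splash uses when it fetches the target page, not the method used to reach Splash, which matches the log showing `"http_method": "GET"` inside a POST. The sketch below builds, without sending, the equivalent JSON POST using only the standard library, so both forms can be tried with identical arguments; the Splash address and the helper name `build_render_post` are assumptions.

```python
import json
from urllib.request import Request

SPLASH = "http://localhost:8050"  # assumed Splash address, as in the logs above


def build_render_post(target, **args):
    """Build (but do not send) a JSON POST to render.html (hypothetical helper).

    The Splash arguments go in the JSON body instead of the query string;
    'http_method' in the body only controls how Splash fetches `target`.
    """
    payload = {"url": target}
    payload.update(args)
    return Request(
        f"{SPLASH}/render.html",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_render_post(
    "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html",
    http_method="GET",
    wait=10.0,    # the 10-second wait mentioned at the top of this reply
    timeout=30,   # keep timeout well above wait: the whole render must finish within it
    html=1, width=1024, height=768,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# With Splash running: urllib.request.urlopen(req).read()
```

If this POST renders the page as well as the curl GET does when both carry the same arguments, the method used to reach Splash is probably not the cause, and the remaining difference lies in the arguments themselves (for example `wait`).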
>
> David
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>
