Re: [R] How to download this data
This is not a simple question. The data are in an HTML-formatted web page. You must scrape the HTML for the data and read it into an R table (or other appropriate R data structure). Searching the web for "scrape data from html into R" lists several packages that claim to enable you to do this easily. Choose what seems best for you. You should also install and read the documentation for the XML package, which is also used for this purpose, though those you find above may be slicker.

Disclaimer: I have no direct experience with this. I'm just pointing out what I believe are relevant resources.

Cheers,
Bert

Bert Gunter
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to download this data
Looks like you can get what you need from http://www.nseindia.com/homepage/Indices1.json on that page.
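If the JSON route mentioned above is usable, parsing it in R could look roughly like the sketch below. It assumes the jsonlite package is installed, and the payload is hypothetical: the thread does not show what Indices1.json actually contains, so the field names and values are invented purely for illustration.

```r
library(jsonlite)

# Hypothetical payload: the real structure of Indices1.json is not shown in
# the thread, so these field names and values are invented for illustration.
json_text <- '[{"name": "INDEX A", "last": 7880.7},
               {"name": "INDEX B", "last": 17130.3}]'

# fromJSON() simplifies an array of flat records into a data frame.
indices <- fromJSON(json_text)
print(indices)
```

A real response may be nested rather than a flat array, in which case the result would be a list that needs further unpacking.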
Re: [R] How to download this data
If there's no API available, I would use Selenium to grab what I need and pipe it to R. Let me know if you need further assistance.

Cheers!
-- H
[R] How to download this data
Hi,

I would like to download data from the page below directly into R:

http://www.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm

Could you please advise how I can do that programmatically?

Thanks for your time.
Re: [R] How to download this data
I agree that this is a tricky task... even more so than using a scraping package, because the page is built dynamically. It will take someone with skills in multiple web technologies to decipher the web page scripts and figure out how to get the server to give you the data, because the data isn't actually in the web page.

Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.
Re: [R] How to download this data
Actually, in looking again, I noticed a "download in csv" link on the page, and this appears to provide a CSV-formatted table that can then trivially be read into R by, e.g., read.csv(). So maybe all the HTML (or JSON) stuff can be ignored.

-- Bert
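As a minimal sketch of the read.csv() step suggested above: the snippet writes a small stand-in file and reads it back. The file contents and column names are hypothetical (the thread does not show the real CSV layout); in practice `csv_path` would point at the file saved from the page's download link.

```r
# Hypothetical stand-in for the file saved from the page's CSV download link;
# the column names and values below are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Symbol,Open,High,Low",
             "AAA,100.5,101.0,99.8",
             "BBB,20.1,20.9,19.7"), csv_path)

# read.csv() parses the file into a data frame; stringsAsFactors = FALSE
# keeps the ticker symbols as plain character strings.
quotes <- read.csv(csv_path, stringsAsFactors = FALSE)
str(quotes)
```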
Re: [R] How to download this data
Hello,

There might be a problem:

    url <- "http://www.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm"
    readLines(url)
    Error in file(con, "r") : cannot open the connection
    In addition: Warning message:
    In file(con, "r") : cannot open: HTTP status was '403 Forbidden'

So I've downloaded the CSV file with the data, but that's not done programmatically.

Rui Barradas
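A failure like the one above stops a script at the readLines() call. One way to keep going is to wrap the read in tryCatch(), sketched below with a hypothetical helper named `safe_read_lines` (not part of any package). The demonstration uses local files so that it runs without network access; the same wrapper would accept a URL.

```r
# Hypothetical helper: returns the lines read from `src`, or NULL if the
# source cannot be opened (e.g. a missing file or a refused HTTP request).
safe_read_lines <- function(src) {
  tryCatch(
    readLines(src, warn = FALSE),
    warning = function(w) {
      message("Could not read ", src, ": ", conditionMessage(w))
      NULL
    },
    error = function(e) {
      message("Could not read ", src, ": ", conditionMessage(e))
      NULL
    }
  )
}

# A readable file comes back as a character vector.
ok_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), ok_path)
ok_result <- safe_read_lines(ok_path)

# An unreadable source yields NULL instead of aborting the script.
bad_result <- safe_read_lines(file.path(tempdir(), "does-not-exist.txt"))
```

Catching the error does not make a refused request succeed; it only lets the calling code decide what to do next.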
Re: [R] How to download this data
FWIW, this violates their terms of service unless you have their permission:

http://www.nseindia.com/global/content/termsofuse.htm

<quote>
You may not conduct any systematic or automated data collection activities (including scraping, data mining, data extraction and data harvesting) on or in relation to our website without our express written consent.
</quote>

--
Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.

Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha!
John McKown
Re: [R] How to download this data
On August 25, 2015 12:11:15 PM PDT, Bert Gunter wrote:
> Actually, in looking again, I noticed a "download in csv" link on the page, and this appears to provide a CSV-formatted table that can then trivially be read into R by, e.g., read.csv().

... but not programmatically.

Jeff Newmiller
Sent from my phone. Please excuse my brevity.
[R] HOW TO DOWNLOAD INTRADAY DATA AT ONE TIME
Is there any way, via R, to download all available intraday data for stocks at once (for example, all the data available at the Indian stock exchange)? I need to make a comparative analysis, and downloading the data ticker by ticker is too time-consuming. I would also like to know whether any website stores historical intraday data; other sites delete the data gradually.
Re: [R] How to download this data?
Hello Duncan,

Thank you very much for your pointer. However, when I tried to run your code, I got the following error:

    rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
    Error in function (type, msg, asError = TRUE) :
      SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Can someone help me understand what could be causing this error?

Thank you.
Re: [R] How to download this data?
In the meantime I have sorted this problem out; hopefully I did it correctly. I modified the line of your code as follows:

    rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry", ssl.verifypeer = FALSE)

However, I then ran into another problem when executing:

    u = sprintf("<a href="https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219">https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219</a>", jsession)
    Error: unexpected symbol in "u = sprintf("<a href="https"

Can you or someone else help me get past this error?

My other question is: where did you get this expression from?

    <a href="https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219">https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219</a>

I would really appreciate it if someone could help me understand that.

Thank you.
Re: [R] How to download this data?
Hi Ron,

Yes, you can use ssl.verifypeer = FALSE. Alternatively, you can also use

    getURLContent(, cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

to specify where libcurl can find the certificates to verify the SSL signature.

The error you are encountering appears to be coming from a garbled R expression. This may have arisen as a result of an HTML mailer adding the <a href=...> into the expression where it found an https://...

What we want to end up with is a string of the form

    https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=adasdasdad?expiryDates=&specId=219

We have to substitute the text adasdasdad with the value we assigned to jsession in a previous command. So, take the literal text

    c("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=", jsession, "?expiryDates=&specId=219")

and combine it into a single string with paste0. We need the literal strings as they appear when you view the mail for R to make sense of them, not what the mailer adds.

As to where I found this, it is in the source of the original HTML page in rawDoc:

    scripts = getNodeSet(rawDoc, "//body//script")
    scripts[[ length(scripts) ]]

Look at the text, specifically app.urls and its 'expiry' field:

    <script type="text/javascript"><![CDATA[
    var app = {};
    app.isOption = false;
    app.urls = {
      'spec':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?details=&specId=219',
      'data':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?data=&specId=219',
      'confirm':'/reports/dealreports/getSampleConfirm.do;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?hubId=403&productId=254',
      'reports':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?reports=&specId=219',
      'expiry':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?expiryDates=&specId=219'
    };
    app.Router = Backbone.Router.extend({
      routes:{
        "spec":"spec",
        "data":"data",
        "confirm":"confirm",
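The paste0 step described above can be sketched as follows. The session id is the one visible in the page source quoted in this thread and is used only as a placeholder; a live session would have a different value, and the `&` in the query string is assumed to have been dropped by the mail archive.

```r
# Placeholder session id copied from the page source quoted in the thread;
# a real session id is issued per visit and will differ.
jsession <- "22E9BE9DB19FC6F3446C9ED4AFF2BE3F"

# Join the fixed prefix, the session id, and the fixed query string
# into one URL string.
u <- paste0("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=",
            jsession,
            "?expiryDates=&specId=219")
print(u)
```

This builds the same string as the sprintf() call with a `%s` placeholder; paste0 simply avoids having to type a long format string that a mailer might mangle.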
Re: [R] How to download this data?
Hi Duncan, Thank you very much for your prompt help. Now all worked very smoothly. Thank you. - Original Message - From: Duncan Temple Lang dtemplel...@ucdavis.edu To: Ron Michael ron_michae...@yahoo.com Cc: r-help@r-project.org r-help@r-project.org Sent: Saturday, 3 August 2013 7:43 PM Subject: Re: [R] How to download this data? Hi Ron Yes, you can use ssl.verifypeer = FALSE. Or alternatively, you can use also use getURLContent(, cainfo = system.file(CurlSSL, cacert.pem, package = RCurl)) to specify where libcurl can find the certificates to verify the SSL signature. The error you are encountering appears to becoming from a garbled R expression. This may have arisen as a result of an HTML mailer adding the a href= into the expression where it found an https://... What we want to do is end up with a string of the form https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=adasdasdad?expiryData=specId=219 We have to substitute the text adasdasdad which we assigned to jsession in a previous command. So, take the literal text c(https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=;, jsession, ?expiryData=specId=219) and combine it into a single string with paste0. We need the literal strings as they appear when you view the mail for R to make sense of them, not what the mailer adds. As to where I found this, it is in the source of the original HTML page in rawDoc scripts = getNodeSet(rawDoc, //body//script) scripts[[ length(scripts) ]] and look at the text, specifically the app.urls and its 'expiry' field. 
script type=text/javascript![CDATA[ var app = {}; app.isOption = false; app.urls = { 'spec':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?details=specId=219', 'data':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?data=specId=219', 'confirm':'/reports/dealreports/getSampleConfirm.do;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?hubId=403productId=254', 'reports':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?reports=specId=219', 'expiry':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?expiryDates=specId=219' }; app.Router = Backbone.Router.extend({ routes:{ spec:spec, data:data, confirm:confirm, On 8/3/13 1:05 AM, Ron Michael wrote: In the mean time I have this problem sorted out, hopefully I did it correctly. I have modified the line of your code as: rawOrig = getURLContent(https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry;, ssl.verifypeer = FALSE) However next I faced with another problem to executing: u = sprintf(a href=https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;, jsession) Error: unexpected symbol in u = sprintf(a href=https Can you or someone else help me to get out of this error? Also, my another question is: from where you got the expression: a href=https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219; I really appreciate if someone help me to understand that. Thank you. - Original Message - From: Ron Michael ron_michae...@yahoo.com To: Duncan Temple Lang dtemplel...@ucdavis.edu; r-help@r-project.org r-help@r-project.org Cc: Sent: Saturday, 3 August 2013 12:58 PM Subject: Re: [R] How to download this data? Hello Duncan, Thank you very much for your pointer. 
However, when I tried to run your code, I got the following error:

    rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
    Error in function (type, msg, asError = TRUE) :
      SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Can someone help me understand what could be causing this error?

Thank you.

- Original Message -
From: Duncan Temple Lang dtemplel...@ucdavis.edu
To: r-help@r-project.org
Cc:
Sent: Saturday, 3 August 2013 4:33 AM
Subject: Re: [R] How to download this data?

That URL is an HTTPS (secure HTTP) URL, not an HTTP one. The XML parser cannot retrieve the file. Instead, use the RCurl package to get the file.

However, it is more complicated than that. If you look at the source of the HTML page in a browser, you'll see a jsessionid, which is a session identifier. The following retrieves the content of your URL, parses it, and extracts the value of the jsessionid. Then we create the full URL to the actual data page (which
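For the app.urls lookup mentioned in the quoted reply above, here is a minimal base-R sketch of pulling the 'expiry' entry out of the script text once you have it as a string. It runs offline on a made-up sample: the session id ABC123, the helper name get_app_url, and the `&` between query parameters are all assumptions for illustration (the archive appears to have dropped some characters from the quoted script), so treat this as one possible approach rather than what was actually run in the thread.

```r
# Sample script text shaped like the excerpt quoted in this thread.
# The session id "ABC123" is made up for illustration.
script_text <- paste0(
  "var app = {}; app.urls = { ",
  "'spec':'/productguide/ProductSpec.shtml;jsessionid=ABC123?details=&specId=219', ",
  "'expiry':'/productguide/ProductSpec.shtml;jsessionid=ABC123?expiryDates=&specId=219' };"
)

# Hypothetical helper: return the path stored under `key` in app.urls,
# or NA if the key does not occur in the text.
get_app_url <- function(txt, key) {
  pattern <- sprintf("'%s'[[:space:]]*:[[:space:]]*'([^']+)'", key)
  m <- regmatches(txt, regexec(pattern, txt))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

expiry_path <- get_app_url(script_text, "expiry")

# The paths in app.urls are relative, so prepend the host to get a full URL.
full_url <- paste0("https://www.theice.com", expiry_path)
print(full_url)
```

With the real page you would pass the text of the last script node (for example via xmlValue() from the XML package) instead of script_text. Reading the URL from the page this way avoids hard-coding the query string, at the cost of depending on how the site lays out its JavaScript.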
[R] How to download this data?
Hi all,

I need to download the data from this web page: https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry

I used the function readHTMLTable() from the XML package, but could not retrieve the data. Can somebody help me get the data into my R session?

Thank you.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to download this data?
That URL is an HTTPS (secure HTTP) URL, not an HTTP one. The XML parser cannot retrieve the file. Instead, use the RCurl package to get the file.

However, it is more complicated than that. If you look at the source of the HTML page in a browser, you'll see a jsessionid, which is a session identifier. The following retrieves the content of your URL, parses it, and extracts the value of the jsessionid. Then we create the full URL to the actual data page (which is in the HTML content, but inside JavaScript code).

    library(RCurl)
    library(XML)

    # Fetch the landing page and parse it
    rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
    rawDoc = htmlParse(rawOrig)

    # Take the first href that carries a jsessionid and pull out its value
    tmp = getNodeSet(rawDoc, "//@href[contains(., 'jsessionid=')]")[[1]]
    jsession = gsub(".*jsessionid=([^?]+)\\?.*", "\\1", tmp)

    # Build the URL of the expiry-dates page and read its tables
    u = sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219", jsession)
    doc = htmlParse(getURLContent(u))
    tbls = readHTMLTable(doc)
    data = tbls[[1]]
    dim(data)

I did this quickly, so it may not be the best way or completely robust, but hopefully it gets the point across and does get the data.

D.

On 8/2/13 2:42 PM, Ron Michael wrote:

Hi all, I need to download the data from this web page: https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry I used the function readHTMLTable() from package XML, however could not download that. Can somebody help me how to get the data onto my R window? Thank you.
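The two string-handling steps in the reply above (extracting the jsessionid, then building the data URL) can be tried without touching the network. The sketch below uses base R only and a sample href shaped like the ones quoted elsewhere in this thread. The helper names extract_jsession and build_expiry_url are hypothetical, and the `&` separating the query parameters is an assumption, since the archive appears to have dropped that character.

```r
# Sample href shaped like those in the page source quoted in this thread.
href <- "/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?details=&specId=219"

# Hypothetical helper: capture the text between "jsessionid=" and the next "?"
# (or the end of the string). Returns NA when there is no jsessionid at all.
extract_jsession <- function(x) {
  x <- as.character(x)
  ifelse(grepl("jsessionid=", x, fixed = TRUE),
         sub(".*jsessionid=([^?]+).*", "\\1", x),
         NA_character_)
}

# Hypothetical helper: build the expiry-dates URL for a session id.
build_expiry_url <- function(jsession, spec_id = 219) {
  sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=%d",
          jsession, as.integer(spec_id))
}

jsession <- extract_jsession(href)
u <- build_expiry_url(jsession)
print(u)
```

Returning NA instead of the unchanged input makes a missing session id easy to detect before the second request is made. A session id is typically short-lived, so it should be extracted fresh on each run rather than stored.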