Thanks for looking into it. Is there a way to avoid the HEAD request in R 3.3.0? I'm asking because if there isn't, then I'll add a workaround in a package I'm working on.
-Winston

On Tue, Jun 21, 2016 at 9:45 PM, Martin Morgan <martin.mor...@roswellpark.org> wrote:
> On 06/21/2016 09:35 PM, Winston Chang wrote:
>>
>> In R 3.2.4, if you run download.file(method="libcurl"), it issues an
>> HTTP GET request for the file. However, in R 3.3.0, it issues an HTTP
>> HEAD request first, and then a GET request. This can result in problems
>> when the web server gives an error for a HEAD request, even if the
>> file is available with a GET request.
>>
>> Is it possible to tell download.file to simply send a GET request,
>> without first sending a HEAD request?
>>
>>
>> In theory, web servers should give the same response for HEAD and GET
>> requests, except that for a HEAD request, it sends only headers, and
>> not the content. However, not all web servers do this for all files.
>> I've seen this problem come up in two different places.
>>
>> The first is from an issue that someone filed for the downloader
>> package. The following works in R 3.2.4, but in R 3.3.0, it fails with
>> a 404 (tested on a Mac):
>>   options(internet.info=1) # Show verbose download info
>>   url <- "https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011_gen.zip"
>>   download.file(url, destfile = "out.zip", method="libcurl")
>>
>> In R 3.3.0, the download succeeds with method="wget", and
>> method="curl". It's only method="libcurl" that has problems.
>>
>>
>> The second place I've encountered a problem is in downloading attached
>> files from a GitHub release.
>>   options(internet.info=1) # Show verbose download info
>>   url <- "https://github.com/wch/webshot/releases/download/v0.3/phantomjs-2.1.1-macosx.zip"
>>   download.file(url, destfile = "out.zip")
>>
>> This one fails with a 403 Forbidden because it gets redirected to a
>> URL in Amazon S3, where a signature of the file is embedded in the
>> URL. However, the signature is computed with the request type (HEAD
>> vs. GET), and so the same URL doesn't work for both. (See
>> http://stackoverflow.com/a/20580036/412655)
>>
>> Any help would be appreciated!
>
>
> I think I introduced this, in
>
> ------------------------------------------------------------------------
> r69280 | morgan | 2015-09-03 06:24:49 -0400 (Thu, 03 Sep 2015) | 4 lines
>
> don't create empty file on 404 and similar errors
>
> - download.file(method="libcurl")
>
> ------------------------------------------------------------------------
>
> The idea was to test that the file can be downloaded before trying to
> download it; previously R would download the error page as though it were
> the content.
>
> I'll give this some thought.
>
> Martin Morgan
>
>
>> -Winston
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
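
For illustration only, here is a minimal sketch of the kind of package-side workaround mentioned at the top of this message. It does not stop method="libcurl" from sending the HEAD request; it simply tries the other methods that the report above says still succeed in R 3.3.0 (method="curl" and method="wget"). The function name download_with_fallback and its downloader argument are hypothetical and not part of R or of any package discussed in this thread, and the sketch assumes the curl or wget command-line tools are installed.

```r
# Hypothetical helper, not part of R: try each download method in turn and
# stop at the first one that reports success.
download_with_fallback <- function(url, destfile,
                                   methods = c("libcurl", "curl", "wget"),
                                   downloader = utils::download.file,
                                   ...) {
  for (m in methods) {
    # download.file() returns 0 on success; failures surface either as an
    # error or as a warning plus a non-zero status, so handle both.
    status <- tryCatch(
      suppressWarnings(downloader(url, destfile, method = m, ...)),
      error = function(e) 1L
    )
    if (identical(as.integer(status), 0L)) {
      return(invisible(m))  # name of the method that worked
    }
    # Remove any partial or empty file left behind by the failed attempt.
    if (file.exists(destfile)) unlink(destfile)
  }
  stop("all download methods failed for ", url)
}
```

A call might look like download_with_fallback(url, "out.zip", mode = "wb"), where mode is passed through to download.file. The downloader argument exists only so the fallback logic can be exercised without network access.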