On Mon, 6 May 2019 11:12:25 +0200 Ralf Stubner <ralf.stub...@daqana.com> wrote:
> On 04.05.19 19:04, Stephen Berman wrote:
>> In versions of R prior to 3.6.0 the following invocation succeeds,
>> returning the data frame shown:
>>
>>> read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text",
>>> header=TRUE)
>>    Dekade   Anzahl
>> 1    1900 11467254
>> 2    1910 13023370
>> 3    1920 13434601
>> 4    1930 13296355
>> 5    1940 12121250
>> 6    1950 13191131
>> 7    1960 10587420
>> 8    1970 10944129
>> 9    1980 11279439
>> 10   1990 12052652
>>
>> But in version 3.6.0 it fails:
>>
>>> read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text",
>>> header=TRUE)
>> Error in file(file, "rt") :
>>   cannot open the connection to
>>   'https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text'
>> In addition: Warning message:
>> In file(file, "rt") :
>>   cannot open URL
>>   'https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text':
>>   HTTP status was '403 Forbidden'
>
> I can reproduce the behavior on Debian using the CRAN supplied package
> for R 3.6.0. Trying to read the page with 'curl' produces also a 403
> error plus some HTML text (in German) explaining that I am treated as a
> 'robot' due to the supplied User-Agent (here: curl/7.52.1). One
> suggested solution is to adjust that value which does solve the issue:
>
> > options(HTTPUserAgent='mozilla')

I confirm that works for me, too.  Thanks!

FWIW, the default value of HTTPUserAgent in R 3.6 here is
"R (3.6.0 x86_64-suse-linux-gnu x86_64 linux-gnu)", and using this (in
R 3.6) fails as I reported, while the default value of HTTPUserAgent in
R 3.5 here is "R (3.5.0 x86_64-suse-linux-gnu x86_64 linux-gnu)" and
using that (in R 3.5) succeeds.  However, setting HTTPUserAgent in R 3.5
to "libcurl/7.60.0" fails just as it does in 3.6.

It's not clear to me if this particular website is being too restrictive
or if R 3.6 should deal with it, or at least mention the issue in NEWS
or somewhere else.
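
In case it is useful to others, here is a minimal sketch of how the
workaround could be limited to a single call instead of changing the
option for the whole session.  The helper name read_table_with_agent is
hypothetical (it is not part of R), and whether the server keeps
accepting the agent string 'mozilla' is an assumption based only on what
worked in this thread.

```r
# Hypothetical helper (not part of R): call read.table() with a temporary
# HTTPUserAgent and restore the previous value afterwards.
read_table_with_agent <- function(file, agent, ...) {
  old <- options(HTTPUserAgent = agent)  # options() returns the old values
  on.exit(options(old), add = TRUE)      # restore even if the read fails
  read.table(file, ...)
}

# Inspect the session default before changing anything.
getOption("HTTPUserAgent")

# Intended use (needs network access, so it is left commented out here):
# df <- read_table_with_agent(
#   "https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text",
#   agent = "mozilla", header = TRUE)
```

Restoring the option via on.exit() means later downloads in the same
session still identify themselves with R's default agent string.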

Steve Berman

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel