Re: [Rd] URL checks
One other failure mode: SSL certificates that browsers trust but that are not installed on the check machine, e.g. the "GEANT Vereniging" certificate from https://relational.fit.cvut.cz/ .

K

On 07.01.21 12:14, Kirill Müller via R-devel wrote:
> [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] URL checks
Hi

The URL checks in R CMD check test all links in the README and vignettes for broken or redirected links. In many cases this improves documentation, but I see problems with this approach, which I have detailed below.

I'm writing to this mailing list because I think the change needs to happen in R's check routines. I propose to introduce an "allow-list" for URLs, to reduce the burden on both CRAN and package maintainers.

Comments are greatly appreciated.

Best regards

Kirill

# Problems with the detection of broken/redirected URLs

## 301 should often be 307, how to change?

Many web sites use a 301 redirection code that probably should be a 307. For example, https://www.oracle.com and https://www.oracle.com/ both redirect to https://www.oracle.com/index.html with a 301. I suspect the company still wants oracle.com to be recognized as the primary entry point of their web presence (to reserve the right to move the redirection to a different location later); I haven't checked with their PR department though. If that's true, the redirect probably should be a 307, which should be fixed by their IT department, which I haven't contacted yet either.

$ curl -i https://www.oracle.com
HTTP/2 301
server: AkamaiGHost
content-length: 0
location: https://www.oracle.com/index.html
...

## User agent detection

twitter.com responds with a 400 error for requests without a user agent string hinting at an accepted browser.

$ curl -i https://twitter.com/
HTTP/2 400
...
...Please switch to a supported browser..

$ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1
HTTP/2 200

# Impact

While the latter problem *could* be fixed by supplying a browser-like user agent string, the former problem is virtually unfixable -- so many web sites should use 307 instead of 301 but don't. The above list is also incomplete -- think of unreliable links, HTTP links, other failure modes...
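To make the two cases above concrete, here is a minimal sketch in R that reads a response head as printed by `curl -i` and sorts it into "permanent redirect", "temporary redirect", "error" or "ok". The helpers `parse_response_head()` and `classify_response()` are hypothetical names made up for this illustration; they are not part of R or of R CMD check, and the grouping of status codes is an assumption, not a description of what the check routines do.

```r
# Hypothetical helper (not part of R or R CMD check): extract the status code
# and the Location header from a response head as printed by `curl -i`.
parse_response_head <- function(head_lines) {
  status <- as.integer(
    sub("^HTTP/[0-9.]+ +([0-9]{3}).*$", "\\1", head_lines[[1]])
  )
  is_loc <- grepl("^location:", head_lines, ignore.case = TRUE)
  location <- if (any(is_loc)) {
    trimws(sub("^location:", "", head_lines[is_loc][[1]], ignore.case = TRUE))
  } else {
    NA_character_
  }
  list(status = status, location = location)
}

# Hypothetical classification; the grouping of codes is an assumption made
# for this sketch.
classify_response <- function(resp) {
  if (resp$status %in% c(301L, 308L)) {
    "permanent redirect"
  } else if (resp$status %in% c(302L, 303L, 307L)) {
    "temporary redirect"
  } else if (resp$status >= 400L) {
    "error"
  } else {
    "ok"
  }
}

# The two response heads quoted above, fed in as plain text (no network access).
oracle <- parse_response_head(c(
  "HTTP/2 301",
  "server: AkamaiGHost",
  "content-length: 0",
  "location: https://www.oracle.com/index.html"
))
twitter <- parse_response_head("HTTP/2 400")

classify_response(oracle)
classify_response(twitter)
```

Working on the text of the response head keeps the sketch independent of any particular HTTP client.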
This affects me as a package maintainer: I have the choice to either change the links to incorrect versions, or remove them altogether. I can also choose to explain each broken link to CRAN, but I think this subjects the team to undue burden. Submitting a package with NOTEs delays the release of a package which I must release very soon to avoid having it pulled from CRAN; I'd rather not risk that -- hence I need to remove the link and put it back later.

I'm aware of https://github.com/r-lib/urlchecker; this alleviates the problem but ultimately doesn't solve it.

# Proposed solution

## Allow-list

A file inst/URL that lists all URLs where failures are allowed -- possibly with a list of the HTTP codes accepted for that link.

Example:

https://oracle.com/ 301
https://twitter.com/drob/status/1224851726068527106 400
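A rough sketch of how such a file could be read and consulted, assuming one entry per line with the URL first and zero or more whitespace-separated HTTP codes after it. The function names `parse_url_allow_list()` and `is_allowed()` are hypothetical, and treating an entry without codes as "any failure is accepted" is an assumption of this sketch, not part of the proposal.

```r
# Hypothetical helper: parse allow-list lines of the form "<url> [<code> ...]"
# into a named list mapping each URL to its accepted HTTP codes.
parse_url_allow_list <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                  # skip blank lines
  fields <- strsplit(lines, "[[:space:]]+")
  codes <- lapply(fields, function(f) as.integer(f[-1]))
  names(codes) <- vapply(fields, function(f) f[[1]], character(1))
  codes
}

# Hypothetical helper: is a failing status for this URL excused by the list?
# Assumption: an entry that lists no codes excuses any failure for that URL.
is_allowed <- function(allow_list, url, status) {
  if (!url %in% names(allow_list)) {
    return(FALSE)
  }
  accepted <- allow_list[[url]]
  length(accepted) == 0L || status %in% accepted
}

# In a package this might be readLines("inst/URL"); inlined here so the
# sketch runs on its own.
allow_list <- parse_url_allow_list(c(
  "https://oracle.com/ 301",
  "https://twitter.com/drob/status/1224851726068527106 400"
))

is_allowed(allow_list, "https://oracle.com/", 301L)
is_allowed(allow_list, "https://oracle.com/", 404L)
```

Matching on the exact URL string keeps the file easy to audit; whether prefixes or patterns should also be accepted is left open here.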
Re: [Rd] Printing Unicode escapes with 6 digits may be problematic
I see that this was only a passing issue. R-devel r79638 and greater (also tested with r79801) print six digits inside curly brace delimiters, like so: "\U{016fe4}1" (using the example below). This ensures compatibility between output and input.

- Mikko

-----Original Message-----
From: R-devel On Behalf Of Korpela Mikko (MML)
Sent: Monday, 14 December 2020 11:51
To: r-devel
Subject: [Rd] Printing Unicode escapes with 6 digits may be problematic

A recent R-devel commit introduces a change in the way non-printable Unicode characters are shown as an escape code. Whereas large code points were previously printed using an escape code of 8 hexadecimal digits, with initial zeros, the present code (tested with R-devel r79623 on Ubuntu Linux) only prints 6 hex digits. I think this may be problematic: it is now possible that R prints a character string which is not valid when reused as an input. See the following example.

"\U{16FE4}1"
# [1] "\U016fe41"
"\U016fe41"
# Error: invalid \U value 16fe41 (line 1)

Best regards,

- Mikko Korpela
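The difference between the two output forms can be sketched in a few lines of R. The helpers `escape_code_point()` and `parses_ok()` are hypothetical and written only for this illustration; they are not how R's own printing code works. The sketch builds both spellings for U+16FE4 followed by the digit "1" and asks the parser whether each one can be read back.

```r
# Hypothetical helper: spell a code point as a six-digit \U escape,
# with or without curly brace delimiters.
escape_code_point <- function(cp, braces = TRUE) {
  hex <- sprintf("%06x", as.integer(cp))
  if (braces) paste0("\\U{", hex, "}") else paste0("\\U", hex)
}

# Hypothetical helper: TRUE if the text parses as R source, FALSE otherwise.
parses_ok <- function(src) {
  !inherits(try(parse(text = src), silent = TRUE), "try-error")
}

# Source text for a string literal: the escape followed by the digit "1".
unbraced <- paste0('"', escape_code_point(0x16FE4, braces = FALSE), '1"')
braced   <- paste0('"', escape_code_point(0x16FE4, braces = TRUE), '1"')

cat(unbraced, "\n")  # the trailing "1" runs into the hex digits
cat(braced, "\n")    # the braces end the escape before the "1"

parses_ok(unbraced)
parses_ok(braced)
```

Without the braces there is nothing to mark where the escape ends, so a following hex digit is absorbed into the code point, which is the failure shown in the report above.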