Re: [Rd] download.file does not process gz files correctly (truncates them?)
Thanks for the comments, feedback, and improvements. I still argue that the current behavior causes more harm than it helps.

First of all, it increases the risk of code that does not work on all platforms, and working on all platforms is, I'd say, one of the strengths and design goals of R. To write cross-platform code, a developer basically needs to specify argument 'mode'.

A second problem is that people who work on non-Windows platforms will not be aware of this problem. Yes, adding this Windows-specific behavior to the help on all platforms will help a bit (thanks for doing that). However, since there are so many non-Windows users out there who write documentation, vignettes, and blog posts, and who host classes and workshops, it is quite likely that you'll see things like "Download the data file using `download.file(url, file)` and then ...". Boom, a "beginner" on Windows will have problems, even the non-Windows instructor may not know what's going on, and quickly lots of time is wasted.

A third problem is wasted bandwidth, because the same file has to be downloaded a second time. If the default is changed to mode="wb" and someone truly needs mode="w", the penalty should be smaller, because such text-based files are likely to be much smaller than binary files, which are often several GiB these days.

What could lower the risk of the above, and help users and helpers, is to give an informative warning whenever 'mode' is not specified, e.g.

  The file 'NNN' is downloaded as a text file (mode = "w"). If you meant to download it as a binary file, specify mode = "wb".

Deprecating the default mode="w" on Windows can be done in steps, e.g. by making the argument mandatory for a while. This could be done on all platforms because we're already all affected, i.e. we need to specify 'mode' to avoid surprises.

Even if the default won't change, below are some more comments/observations related to the current implementation of download.file() on Windows:

ADD MORE EXTENSIONS?
What about case-insensitive matching, e.g. data.ZIP and data.Rdata? A quick scan of the R source code suggests that R also works with the following filename extensions (in various case styles):

* Rbin (src/library/tools/R/install.R)
* rda, Rda (tests/reg-tests-1a.R)
* rdb (src/library/tools/R/install.R)
* rds, RDS, Rds (src/library/tools/R/install.R)
* rdx (src/library/tools/R/install.R)
* RData, Rdata, rdata (src/library/tools/R/install.R)

Should the tar extension also be added? What about binary image formats that R produces, e.g. filename extensions bmp, jpg, jpeg, pdf, png, tif, tiff? What about all the other file extensions that we know for sure are binary?

VECTORIZATION: For some values of the 'method' argument, the current implementation will download the same file differently depending on which other files are downloaded at the same time. For example, here a PNG file is downloaded in text mode and its content is translated (48281 bytes on disk versus the 48148 bytes transferred):

> urls <- c("https://www.r-project.org/logo/Rlogo.png")
> download.file(urls, destfile = basename(urls), method = "libcurl")
trying URL 'https://www.r-project.org/logo/Rlogo.png'
Content length 48148 bytes (47 KB)
downloaded 47 KB
> file.size(basename(urls))
[1] 48281

But if we throw in a "known" binary extension, the PNG file is downloaded as binary:

> urls <- c("https://www.r-project.org/logo/Rlogo.png",
>           "https://cran.r-project.org/bin/windows/contrib/3.6/future_1.8.1.zip")
> download.file(urls, destfile = basename(urls), method = "libcurl")
trying URL 'https://www.r-project.org/logo/Rlogo.png'
trying URL 'https://cran.r-project.org/bin/windows/contrib/3.6/future_1.8.1.zip'
> file.size(basename(urls))
[1]  48148 527069

Best,
Henrik

On Fri, May 4, 2018 at 1:18 AM, Martin Maechler wrote:
>> Joris Meys
>>     on Fri, 4 May 2018 10:00:07 +0200 writes:
>
> > On Fri, May 4, 2018 at 8:34 AM, Tomas Kalibera
> > wrote:
>
> >> The current heuristic/hack is in line with the
> >> compatibility approach: it detects files that are
> >> obviously binary, so it changes the default behavior only
> >> for cases when it would obviously cause damage.
> >>
> >> Tomas
> >
> > Well, I was trying to download a .gz file and
> > download.file() didn't detect that. Reason for that is
> > obviously that the link doesn't contain .gz but %2Egz ,
> > using the ASCII code for the dot instead of the dot
> > itself. That's general practice in a lot of links.
>
> > Hence I propose to change the line in download.file() that
> > does this check to:
>
> > if (missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$",
> > URLdecode(url
>
> > using URLdecode() ensures that .gz, .RData etc will be
> > detected correctly in an encoded URL.
>
> > Cheers Joris
>
> Makes sense to me and I plan to add it when also adding '.rds'
>
> { OTOH, after reading the thread about this:
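Pulling the suggestions in this message together (warn when 'mode' is missing, decode the URL before matching as Joris proposes, match extensions case-insensitively, and decide the mode per URL rather than for the whole vector), here is a minimal R sketch. It is not the utils::download.file() implementation: binary_exts, guess_mode() and download_files() are hypothetical names, the extension list is illustrative, and the 'downloader' argument exists only so the logic can be exercised with a stub instead of the network.

```r
## Illustrative list only; NOT the list used by utils::download.file().
binary_exts <- c("gz", "bz2", "xz", "tgz", "zip", "tar",
                 "rda", "rdata", "rds", "rdb", "rdx", "rbin",
                 "bmp", "jpg", "jpeg", "pdf", "png", "tif", "tiff")

## Hypothetical helper: choose "wb" or "w" separately for each URL.
## URLdecode() turns e.g. "%2Egz" into ".gz" before matching, and
## ignore.case = TRUE makes "data.ZIP" and "data.Rdata" match as well.
guess_mode <- function(url) {
  decoded <- vapply(url, utils::URLdecode, character(1), USE.NAMES = FALSE)
  pattern <- sprintf("\\.(%s)$", paste(binary_exts, collapse = "|"))
  ifelse(grepl(pattern, decoded, ignore.case = TRUE), "wb", "w")
}

## Hypothetical wrapper: if 'mode' is not given, guess it per URL and warn
## for every file that ends up in text mode; then download one URL at a
## time so that each file gets its own mode.  'downloader' is injectable
## so the logic can be tried without network access.
download_files <- function(url, destfile, mode, ...,
                           downloader = utils::download.file) {
  if (missing(mode)) {
    mode <- guess_mode(url)
    for (f in destfile[mode == "w"])
      warning(sprintf(paste0(
        "The file '%s' is downloaded as a text file (mode = \"w\"). ",
        "If you meant to download it as a binary file, specify mode = \"wb\"."),
        f), call. = FALSE)
  }
  mode <- rep_len(mode, length(url))
  status <- integer(length(url))
  for (i in seq_along(url))
    status[i] <- downloader(url[i], destfile[i], mode = mode[i], ...)
  invisible(status)
}
```

Downloading one URL at a time gives up the simultaneous transfers that method = "libcurl" can do for a vector of URLs; in exchange, the mode chosen for one file no longer depends on which other files happen to be in the same call.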
Re: [Bioc-devel] Switch to SSH protocol for git clone instructions on package landing pages?
On 04/30/2018 08:17 AM, Kasper Daniel Hansen wrote:
Still, it is convenient for some of us to have copy+paste code on the landing page. How about having both https and ssh?

Supporting https:// would require account and password management. I guess we have moved closer to that than originally anticipated, but we were trying to avoid getting involved in that. Also, we had not anticipated that some organizations would block ssh activity. At the moment and for the foreseeable future, https is read-only.

Martin

On Sun, Apr 29, 2018 at 8:57 AM, Peter Hickey wrote:
Ah, thanks both Joris and Nitesh. I didn't appreciate that SSH access is limited to those with a public key registered on the git server.

On Sun, 29 Apr 2018 at 11:50 Turaga, Nitesh wrote:
The one-liner on the package landing page describing how to check out a package from the git repo uses HTTPS rather than ssh, e.g.:

# From https://bioconductor.org/packages/bsseq/
git clone https://git.bioconductor.org/packages/bsseq

However, as a developer we should be using the SSH protocol (https://bioconductor.org/developers/how-to/git/faq/). Is there any reason not to use the SSH protocol (i.e. git clone g...@git.bioconductor.org:packages/bsseq) in the instructions given on the landing page? It seems to me an unnecessary source of friction, particularly for new developers who will end up with the dreaded "fatal: remote error: FATAL: W any packages/myPackage nobody DENIED by fallthru (or you mis-spelled the reponame)" error message if they don't know to switch protocols (https://bioconductor.org/developers/how-to/git/faq/)

Cheers,
Pete
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

This email message may contain legally privileged and/or confidential information.
If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
[Bioc-devel] BioC 2018 poster / talk / scholarship / workshop application deadline May 17
Join us for our annual conference BioC 2018: Where Software and Biology Connect, at Victoria University on the University of Toronto campus:

http://bioc2018.bioconductor.org

The deadline for poster, talk, scholarship (travel, accommodation, and registration), and workshop applications is May 17; see

http://bioc2018.bioconductor.org/call-for-abstracts
http://bioc2018.bioconductor.org/scholarships

Martin Morgan
Bioconductor
[Rd] Sys.timezone (timedatectl) unnecessarily warns loudly
Dear R-devels,

The timedatectl binary used by Sys.timezone does not always work reliably. When it fails, a warning is raised, unnecessarily, because Sys.timezone later gets the timezone successfully from /etc/timezone. This might not hold on other Linux distributions, but the patch below solves the issue for a simple dockerized Ubuntu 16.04.

Current behavior:

R Under development (unstable) (2018-05-04 r74695) -- "Unsuffered Consequences"

Sys.timezone()
#Failed to create bus connection: No such file or directory
#[1] "Etc/UTC"
#Warning message:
#In system("timedatectl", intern = TRUE) :
#  running command 'timedatectl' had status 1

There was a small discussion where I initially commented on this:
https://github.com/wch/r-source/commit/9866ac2ad1e2f1c4565ae829ba33b5b98a08d10d#r28867164

The patch below makes the timedatectl call silent. Both suppressWarnings and ignore.stderr are required: the former deals with the R warning, the latter with the message that timedatectl prints directly to the console.

diff --git src/library/base/R/datetime.R src/library/base/R/datetime.R
index 6b34267936..b81c049f3e 100644
--- src/library/base/R/datetime.R
+++ src/library/base/R/datetime.R
@@ -73,7 +73,7 @@ Sys.timezone <- function(location = TRUE)
     ## First try timedatectl: should work on any modern Linux
     ## as part of systemd (and probably nowhere else)
     if (nzchar(Sys.which("timedatectl"))) {
-        inf <- system("timedatectl", intern = TRUE)
+        inf <- suppressWarnings(system("timedatectl", intern = TRUE, ignore.stderr=TRUE))
         ## typical format:
         ## " Time zone: Europe/London (GMT, +)"
         ## " Time zone: Europe/Vienna (CET, +0100)"

Regards,
Jan Gorecki
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
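A minimal sketch of the same idea outside of base R, assuming a Unix-alike with /bin/sh. quiet_system(), parse_timedatectl_tz() and guess_timezone() are hypothetical helpers, not part of R; the fallback to /etc/timezone only mirrors the order described above and is not the actual Sys.timezone() code.

```r
## Hypothetical helper: capture stdout while silencing both the R-level
## warning about a non-zero exit status (suppressWarnings) and the
## command's own stderr output (ignore.stderr = TRUE).
quiet_system <- function(command) {
  suppressWarnings(system(command, intern = TRUE, ignore.stderr = TRUE))
}

## Hypothetical parser for the "Time zone:" line of timedatectl output,
## e.g. " Time zone: Europe/Vienna (CET, +0100)" gives "Europe/Vienna".
## Returns NA when no such line is present (e.g. timedatectl failed).
parse_timedatectl_tz <- function(lines) {
  hit <- grep("Time zone:", lines, fixed = TRUE, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  sub("^ *Time zone: *([^ ]+).*$", "\\1", hit[1L])
}

## Sketch of the fallback order: timedatectl first (quietly), then
## the first line of /etc/timezone.
guess_timezone <- function() {
  tz <- NA_character_
  if (nzchar(Sys.which("timedatectl")))
    tz <- parse_timedatectl_tz(quiet_system("timedatectl"))
  if (is.na(tz) && file.exists("/etc/timezone")) {
    first <- readLines("/etc/timezone", n = 1L, warn = FALSE)
    if (length(first) == 1L && nzchar(trimws(first))) tz <- trimws(first)
  }
  tz
}
```

In a container without systemd, quiet_system("timedatectl") then returns an empty character vector (with a "status" attribute) instead of printing the D-Bus error and an R warning, and the lookup falls through to /etc/timezone.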
Re: [Rd] length of `...`
Has anyone noticed the r-devel thread "stopifnot() does not stop at first non-TRUE argument" starting with https://stat.ethz.ch/pipermail/r-devel/2017-May/074179.html ?

I have mentioned (function(...)nargs())(...) in https://stat.ethz.ch/pipermail/r-devel/2017-May/074294.html .

Something like ..elt(n) can be written as switch(n, ...) . I have mentioned it in https://stat.ethz.ch/pipermail/r-devel/2017-May/074270.html . See also the response in https://stat.ethz.ch/pipermail/r-devel/2017-May/074282.html .

By the way, because 'stopifnot' in R 3.5.0 has an argument other than '...', it might be better to use match.call(expand.dots=FALSE)$... instead of match.call()[-1L] .

---

> Joris Meys
>     on Fri, 4 May 2018 15:37:27 +0200 writes:

> The one difference I see, is the necessity to pass the dots to the function
> dotlength :

> dotlength <- function(...) nargs()

> myfun <- function(..., someArg = 1){
>   n1 <- ...length()
>   n2 <- dotlength()
>   n3 <- dotlength(...)
>   return(c(n1, n2, n3))
> }

> myfun(stop("A"), stop("B"), someArg = stop("c"))

> I don't really see immediately how one can replace the C definition with
> Hadley's solution without changing how the function has to be used.

Yes, of course: nargs() only applies to the function inside which it is used, and hence n2 <- dotlength() must be 0. Thank you, Joris.

> Personally, I have no preference over the use, but changing it now would
> break code dependent upon ...length() imho. Unless I'm overlooking
> something of course.

Yes. OTOH, as it is very new, one could consider deprecating it and advertising, say, .length(...) instead of ...length() [yes, in spite of the fact that the pure-R solution is slower than a primitive; both are fast enough for all purposes].

But such a deprecation cycle typically entails more time, writing, etc., not something I have time for just these days.
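The points above can be tried directly. This sketch assumes R >= 3.5.0 (for ...length()); dotlength() and myfun() are taken from the quoted message, while dotelt() and dots_exprs() are hypothetical names for the switch(n, ...) and match.call(expand.dots = FALSE)$... idioms mentioned above.

```r
## Count the arguments in `...` without forcing them (Hadley's dotlength()).
dotlength <- function(...) nargs()

## Pure-R analogue of ..elt(n) via switch(n, ...):
## only the n-th argument in `...` is evaluated.
dotelt <- function(n, ...) switch(n, ...)

## Unevaluated `...` arguments only, even when the function has other
## formal arguments (unlike match.call()[-1L], which would include 'x').
dots_exprs <- function(x, ...) match.call(expand.dots = FALSE)$...

## Joris's example: ...length() and dotlength(...) agree, while
## dotlength() called without passing the dots counts nothing.
myfun <- function(..., someArg = 1) {
  n1 <- ...length()      # primitive, available since R 3.5.0
  n2 <- dotlength()      # always 0: nargs() sees no arguments here
  n3 <- dotlength(...)   # promises are passed on but not forced
  c(n1, n2, n3)
}
```

None of the stop() calls are triggered, because none of these approaches forces the promises it does not need: myfun(stop("A"), stop("B"), someArg = stop("c")) returns c(2L, 0L, 2L).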
Martin

> On Fri, May 4, 2018 at 3:02 PM, Martin Maechler
> wrote:

>> > Hervé Pagès
>> >     on Thu, 3 May 2018 08:55:20 -0700 writes:
>>
>> > Hi,
>> > It would be great if one of the experts could comment on the
>> > difference between Hadley's dotlength and ...length? The fact
>> > that someone bothered to implement a new primitive for that
>> > when there seems to be a very simple and straightforward R-only
>> > solution suggests that there might be some gotchas/pitfalls with
>> > the R-only solution.
>>
>> Namely
>>
>> > dotlength <- function(...) nargs()
>>
>> > (This is subtly different from calling nargs() directly as it will
>> > only count the elements in ...)
>>
>> > Hadley
>>
>>
>> Well, I was the "someone". In the past I had seen (and used myself)
>>
>> length(list(...))
>>
>> and of course that was not usable.
>> I knew of some substitute() / match.call() tricks [but I think
>> did not know Bill's cute substitute(...()) !] at the time, but
>> found them too esoteric.
>>
>> Additionally and importantly, ...length() and ..elt(n) were
>> developed "synchronously", and the R-substitutes for ..elt()
>> definitely are less trivial (I did not find one at the time), as
>> Duncan's example to Bill's proposal has shown, so I had looked
>> at .Primitive() solutions of both.
>>
>> In hindsight I should have asked here for advice, but maybe at
>> the time I had been a bit frustrated by the results of some of
>> my RFCs ((nothing specific in mind !))
>>
>> But __if__ there's really no example where current (3.5.0 and newer)
>>
>> ...length()
>>
>> differs from Hadley's dotlength()
>> I'd be very happy to replace ...length 's C based definition by
>> Hadley's beautiful minimal solution.
>>
>> Martin
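To illustrate why length(list(...)) was not usable, here is a small comparison with the two non-forcing alternatives mentioned in this thread, nargs() (Hadley's dotlength()) and Bill's substitute(...()) trick. The function names len_list() and len_subst() are hypothetical.

```r
## length(list(...)) evaluates every argument, so a single argument
## that signals an error makes the whole count fail.
len_list <- function(...) length(list(...))

## nargs() counts the arguments without forcing them (Hadley's dotlength()).
dotlength <- function(...) nargs()

## Bill's substitute(...()) returns the unevaluated dots as a pairlist,
## so its length is the number of arguments, again without forcing them.
len_subst <- function(...) length(substitute(...()))
```

With an argument such as stop("A"), len_list() fails with that error, while dotlength() and len_subst() both simply count it.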