Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-06 Thread Henrik Bengtsson
Thanks for the comments, feedback, and improvements.

I still argue that the current behavior causes more harm than good.

First of all, it increases the risk of code that does not work on all
platforms, even though cross-platform support is, I'd say, one of the
strengths and design goals of R.  To write cross-platform code, a
developer basically needs to specify argument 'mode'.
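
A minimal sketch of what specifying 'mode' looks like in practice.  The
wrapper name download_binary() is hypothetical and not part of R; it only
forwards to utils::download.file() with mode = "wb", so the file is
written byte for byte regardless of platform.

```r
# Hypothetical helper (not part of base R): always request binary mode so
# the downloaded bytes are identical on Windows, Linux and macOS.
download_binary <- function(url, destfile = basename(url), ...) {
  utils::download.file(url, destfile = destfile, mode = "wb", ...)
}

# Example call (needs network access), using a URL from this thread:
# download_binary("https://www.r-project.org/logo/Rlogo.png")
```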

A second problem is that people who work on non-Windows platforms will
not be aware of this problem.  Yes, adding this Windows-specific
behavior to the help on all platforms will help a bit (thanks for
doing that).  However, since there are so many non-Windows users out
there that write documentation, vignettes, blog posts, host classes
and workshops, it is quite likely that you'll see things like
"Download the data file using `download.file(url, file)` and then
...".  Boom, a "beginner" on Windows will have problems and even the
non-Windows instructor may not know what's going on, and quickly lots
of time is wasted.

A third problem is wasted bandwidth because the same file has to be
downloaded a second time.  If the default is changed to mode="wb" and
someone truly needs mode="w", the penalty should be smaller because
such text-based files are likely to be much smaller than binary files,
which are often several GiB these days.

What could lower the risk for the above, and help the user and helpers,
is to give an informative warning whenever 'mode' is not specified,
e.g.

   The file 'NNN' is downloaded as a text file (mode = "w"). If you
meant to download it as a binary file, specify mode = "wb".
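
A rough sketch of how such a warning could behave, written as a wrapper
around utils::download.file() rather than a change to it.  The name
download_file_warn() is hypothetical; the message text is the one
proposed above.

```r
# Hypothetical wrapper (not the behaviour of utils::download.file itself):
# warn when 'mode' is left unspecified, then fall back to the current
# default, mode = "w".
download_file_warn <- function(url, destfile, mode, ...) {
  if (missing(mode)) {
    msg <- sprintf(
      paste0("The file '%s' is downloaded as a text file (mode = \"w\"). ",
             "If you meant to download it as a binary file, ",
             "specify mode = \"wb\"."),
      destfile)
    warning(msg, call. = FALSE)
    mode <- "w"
  }
  utils::download.file(url, destfile = destfile, mode = mode, ...)
}
```

Calling it with an explicit mode = "wb" or mode = "w" stays silent, so
only code that relies on the default is affected.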

Deprecating the default mode="w" on Windows can be done in steps, e.g.
by making the argument mandatory for a while. This could be done on
all platforms because we're already all affected, i.e. we need to
specify 'mode' to avoid surprises.

Even if the default won't change, below are some more
comments/observations that are related to the current implementation of
download.file() on Windows:

ADD MORE EXTENSIONS?

What about case-insensitive matching, e.g. data.ZIP and data.Rdata?
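
A minimal sketch of a case-insensitive check, assuming the extension list
from the pattern quoted further down in this thread plus 'rds'.  The
helper name has_binary_ext() is hypothetical; URLdecode() is applied per
element so that encoded dots such as %2E are also matched.

```r
# Hypothetical helper: TRUE for URLs whose decoded name ends in one of the
# extensions discussed in this thread, ignoring case.
has_binary_ext <- function(url) {
  decoded <- vapply(url, utils::URLdecode, character(1), USE.NAMES = FALSE)
  grepl("\\.(gz|bz2|xz|tgz|zip|rda|rds|RData)$", decoded, ignore.case = TRUE)
}

has_binary_ext(c("data.ZIP", "data.Rdata", "archive%2Egz", "Rlogo.png"))
# [1]  TRUE  TRUE  TRUE FALSE
```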

A quick scan of the R source code suggests that R is also working with
the following filename extensions (using various case styles):

* Rbin (src/library/tools/R/install.R)
* rda, Rda (tests/reg-tests-1a.R)
* rdb (src/library/tools/R/install.R)
* rds, RDS, Rds (src/library/tools/R/install.R)
* rdx (src/library/tools/R/install.R)
* RData, Rdata, rdata (src/library/tools/R/install.R)

Should the tar extension also be added?

What about binary image formats that R produces, e.g. filename
extensions bmp, jpg, jpeg, pdf, png, tif, tiff?

What about all the other file extensions that we know for sure are binary?


VECTORIZATION:

For some value of the 'method' argument, the current implementation
will download the same file differently depending on other files
downloaded at the same time.  For example, here a PNG file is
downloaded in text mode and its content is translated:

> urls <- c("https://www.r-project.org/logo/Rlogo.png")
> download.file(urls, destfile = basename(urls), method = "libcurl")
trying URL 'https://www.r-project.org/logo/Rlogo.png'
Content length 48148 bytes (47 KB)
downloaded 47 KB
> file.size(basename(urls))
[1] 48281
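
The size difference above is consistent with text-mode newline
translation, where each LF byte (0x0A) is written as CR LF:
48281 - 48148 = 133 extra bytes would correspond to 133 LF bytes in the
PNG.  That count is inferred from the two sizes, not stated in the
thread.  A small sketch for checking it against a correctly downloaded
copy; count_lf() and text_mode_size() are hypothetical helpers:

```r
# Count LF bytes (0x0A) in a file read as raw bytes.
count_lf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  sum(bytes == as.raw(0x0a))
}

# Predicted file size if every LF were expanded to CR LF.
text_mode_size <- function(path) file.size(path) + count_lf(path)

# With a binary-mode copy of the logo in the working directory:
# text_mode_size("Rlogo.png")
```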

But if we throw in a "known" binary extension, the PNG file will be
downloaded as binary:

> urls <- c("https://www.r-project.org/logo/Rlogo.png",
+ "https://cran.r-project.org/bin/windows/contrib/3.6/future_1.8.1.zip")
> download.file(urls, destfile = basename(urls), method = "libcurl")
trying URL 'https://www.r-project.org/logo/Rlogo.png'
trying URL 'https://cran.r-project.org/bin/windows/contrib/3.6/future_1.8.1.zip'
> file.size(basename(urls))
[1]  48148 527069

Best,

Henrik

On Fri, May 4, 2018 at 1:18 AM, Martin Maechler
 wrote:
>> Joris Meys 
>> on Fri, 4 May 2018 10:00:07 +0200 writes:
>
> > On Fri, May 4, 2018 at 8:34 AM, Tomas Kalibera
> >  wrote:
>
> >> The current heuristic/hack is in line with the
> >> compatibility approach: it detects files that are
> >> obviously binary, so it changes the default behavior only
> >> for cases when it would obviously cause damage.
> >>
> >> Tomas
>
>
> > Well, I was trying to download a .gz file and
> > download.file() didn't detect that. Reason for that is
> > obviously that the link doesn't contain .gz but %2Egz ,
> > using the ASCII code for the dot instead of the dot
> > itself. That's general practice in a lot of links.
>
> > Hence I propose to change the line in download.file() that
> > does this check to:
>
> >   if (missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$",
> >   URLdecode(url
>
> > using URLdecode() ensures that .gz, .RData etc will be
> > detected correctly in an encoded URL.
>
> > Cheers Joris
>
> Makes sense to me and I plan to add it when also adding '.rds'
>
> { OTOH, after reading the thread about this: 

Re: [Bioc-devel] Switch to SSH protocol for git clone instructions on package landing pages?

2018-05-06 Thread Martin Morgan



On 04/30/2018 08:17 AM, Kasper Daniel Hansen wrote:

Still, it is convenient for some of us to have copy+paste code on the
landing page.  How about having both https and ssh?


Supporting https:// would require account and password management. I 
guess we have moved closer to that than originally anticipated, but we 
were trying to avoid getting involved in that. Also, we had not 
anticipated that some organizations would block ssh activity. At the 
moment and for the foreseeable future, https is read-only.


Martin



On Sun, Apr 29, 2018 at 8:57 AM, Peter Hickey 
wrote:


Ah, thanks both Joris and Nitesh. I didn't appreciate that SSH access is
limited to those with a public key registered on the git server.

On Sun, 29 Apr 2018 at 11:50 Turaga, Nitesh 

wrote:


The one-liner on the package landing page describing how to check out
a package from the git repo uses HTTPS rather than ssh, e.g.:

# From https://bioconductor.org/packages/bsseq/
git clone https://git.bioconductor.org/packages/bsseq

However, as developers we should be using the SSH protocol
(https://bioconductor.org/developers/how-to/git/faq/).

Is there any reason not to use the SSH protocol (i.e. git clone
g...@git.bioconductor.org:packages/bsseq) in the instructions given on
the landing page? It seems to me an unnecessary source of friction,
particularly for new developers who will end up with the dreaded
"fatal: remote error: FATAL: W any packages/myPackage nobody DENIED by
fallthru (or you mis-spelled the reponame)" error message if they
don't know to switch protocols
(https://bioconductor.org/developers/how-to/git/faq/)

Cheers,
Pete

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel






[Bioc-devel] BioC 2018 poster / talk / scholarship / workshop application deadline May 17

2018-05-06 Thread Martin Morgan
Join us for our annual conference, BioC 2018: Where Software and Biology 
Connect, at Victoria University on the University of Toronto campus.


  http://bioc2018.bioconductor.org

The deadline for poster, talk, scholarship (travel, accommodation, and 
registration), and workshop applications is May 17, see


  http://bioc2018.bioconductor.org/call-for-abstracts
  http://bioc2018.bioconductor.org/scholarships

Martin Morgan
Bioconductor




[Rd] Sys.timezone (timedatectl) unnecessarily warns loudly

2018-05-06 Thread Jan Gorecki
Dear R-devels,

The timedatectl binary used by Sys.timezone() does not always work
reliably.  When it fails, a warning is raised unnecessarily, because
Sys.timezone() subsequently gets the time zone successfully from
/etc/timezone.  This might not hold for every Linux distribution, but
it solves the issue for a simple dockerized Ubuntu 16.04.

Current behavior R Under development (unstable) (2018-05-04 r74695) --
"Unsuffered Consequences"

  Sys.timezone()
  #Failed to create bus connection: No such file or directory
  #[1] "Etc/UTC"
  #Warning message:
  #In system("timedatectl", intern = TRUE) :
  #  running command 'timedatectl' had status 1

There was a short discussion where I initially commented on it:
https://github.com/wch/r-source/commit/9866ac2ad1e2f1c4565ae829ba33b5b98a08d10d#r28867164

The patch below makes the timedatectl call silent.  Both
suppressWarnings() and ignore.stderr are required: the former handles
the R warning, the latter the message that timedatectl prints directly
to the console.

diff --git src/library/base/R/datetime.R src/library/base/R/datetime.R
index 6b34267936..b81c049f3e 100644
--- src/library/base/R/datetime.R
+++ src/library/base/R/datetime.R
@@ -73,7 +73,7 @@ Sys.timezone <- function(location = TRUE)
 ## First try timedatectl: should work on any modern Linux
 ## as part of systemd (and probably nowhere else)
 if (nzchar(Sys.which("timedatectl"))) {
-inf <- system("timedatectl", intern = TRUE)
+inf <- suppressWarnings(system("timedatectl", intern = TRUE, ignore.stderr=TRUE))
 ## typical format:
 ## "   Time zone: Europe/London (GMT, +0000)"
 ## "   Time zone: Europe/Vienna (CET, +0100)"
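
The effect of the patched call can be tried outside Sys.timezone() with a
small sketch.  The helper name run_quietly() is hypothetical; it combines
the same two pieces as the patch, suppressWarnings() and
ignore.stderr = TRUE.

```r
# Hypothetical helper mirroring the patched call: run a shell command,
# capture stdout, and silence both stderr and the R warning that
# system(intern = TRUE) raises on a non-zero exit status.
run_quietly <- function(command) {
  suppressWarnings(system(command, intern = TRUE, ignore.stderr = TRUE))
}

if (nzchar(Sys.which("timedatectl"))) {
  out <- run_quietly("timedatectl")
  # On failure the exit status is still attached to the result:
  print(attr(out, "status"))
}
```

Since the exit status stays available as an attribute, the caller can
still detect the failure and fall back to another source such as
/etc/timezone.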

Regards,
Jan Gorecki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] length of `...`

2018-05-06 Thread Suharto Anggono Suharto Anggono via R-devel
Has anyone noticed the r-devel thread "stopifnot() does not stop at first 
non-TRUE argument", starting with 
https://stat.ethz.ch/pipermail/r-devel/2017-May/074179.html ?

I have mentioned
(function(...)nargs())(...)
in https://stat.ethz.ch/pipermail/r-devel/2017-May/074294.html .

Something like ..elt(n) is switch(n, ...) . I have mentioned it in 
https://stat.ethz.ch/pipermail/r-devel/2017-May/074270.html . See also response 
in https://stat.ethz.ch/pipermail/r-devel/2017-May/074282.html .
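
A minimal sketch of the switch(n, ...) idiom mentioned above.  The helper
name nth_dot() is hypothetical; with a numeric first argument, switch()
evaluates only the selected alternative, so the other arguments stay
unevaluated.

```r
# Hypothetical helper: pick the n-th element of '...' via switch().
nth_dot <- function(n, ...) switch(n, ...)

nth_dot(2, "a", "b", "c")
# [1] "b"

# The first argument is never evaluated, so no error is raised:
nth_dot(2, stop("never evaluated"), "b")
# [1] "b"
```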

By the way, because 'stopifnot' in R 3.5.0 has arguments other than '...', 
it might be better to use
match.call(expand.dots=FALSE)$...
instead of
match.call()[-1L] .
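
The difference can be seen with a toy function that has one named
argument besides '...'.  The function show_calls() is hypothetical and
only illustrates the two expressions; it is not taken from stopifnot().

```r
# Toy function: compare the two ways of capturing the call's arguments.
show_calls <- function(..., local = TRUE) {
  list(all  = as.list(match.call()[-1L]),
       dots = match.call(expand.dots = FALSE)$...)
}

# x and y need not exist: match.call() does not evaluate the arguments.
res <- show_calls(x > 0, y, local = FALSE)
length(res$all)   # 3: also contains 'local'
length(res$dots)  # 2: only the arguments matched to '...'
```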

---
> Joris Meys 
> on Fri, 4 May 2018 15:37:27 +0200 writes:

> The one difference I see, is the necessity to pass the dots to the
> function dotlength :

> dotlength <- function(...) nargs()

> myfun <- function(..., someArg = 1){
> n1 <- ...length()
> n2 <- dotlength()
> n3 <- dotlength(...)
> return(c(n1, n2, n3))
> }

> myfun(stop("A"), stop("B"), someArg = stop("c"))

> I don't really see immediately how one can replace the C definition with
> Hadley's solution without changing how the function has to be used.

Yes, of course:  nargs() can only be applied to the function inside
which it is used, and hence  n2 <- dotlength()  must be 0.
Thank you, Joris
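
For reference, a runnable version of the quoted example with the result
it gives, assuming R 3.5.0 or later so that ...length() is available.
None of the stop() calls is triggered, because neither ...length() nor
nargs() evaluates the arguments.

```r
dotlength <- function(...) nargs()

myfun <- function(..., someArg = 1) {
  n1 <- ...length()     # primitive, counts the elements of '...'
  n2 <- dotlength()     # nothing passed on, so nargs() is 0
  n3 <- dotlength(...)  # dots passed on and counted without evaluation
  c(n1, n2, n3)
}

myfun(stop("A"), stop("B"), someArg = stop("c"))
# [1] 2 0 2
```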

> Personally, I have no preference over the use, but changing it now would
> break code dependent upon ...length() imho. Unless I'm overlooking
> something of course.

Yes.  OTOH, as it's been very new, one could consider
deprecating it, and advertize say,  .length(...) instead of ...length()
[yes, in spite of the fact that the pure-R solution is slower
 than a primitive; both are fast enough for all purposes]

But such a deprecation cycle typically entails time more writing
etc, not something I've time for just these days.

Martin


> On Fri, May 4, 2018 at 3:02 PM, Martin Maechler 
> wrote:

>> > Hervé Pagès 
>> > on Thu, 3 May 2018 08:55:20 -0700 writes:
>> 
>> > Hi,
>> > It would be great if one of the experts could comment on the
>> > difference between Hadley's dotlength and ...length? The fact
>> > that someone bothered to implement a new primitive for that
>> > when there seems to be a very simple and straightforward R-only
>> > solution suggests that there might be some gotchas/pitfalls with
>> > the R-only solution.
>> 
>> Namely
>> 
>> > dotlength <- function(...) nargs()
>> 
>> > (This is subtly different from calling nargs() directly as it will
>> > only count the elements in ...)
>> 
>> > Hadley
>> 
>> 
>> Well,  I was the "someone".  In the past I had seen (and used myself)
>> 
>> length(list(...))
>> 
>> and of course that was not usable.
>> I knew of some substitute() / match.call() tricks [but I think
>> did not know Bill's cute substitute(...()) !] at the time, but
>> found them too esoteric.
>> 
>> Additionally and importantly,  ...length()  and  ..elt(n)  were
>> developed  "synchronously",  and the R-substitutes for ..elt()
>> definitely are less trivial (I did not find one at the time), as
>> Duncan's example to Bill's proposal has shown, so I had looked
>> at .Primitive() solutions of both.
>> 
>> In hindsight I should have asked here for advice,  but maybe at
>> the time I had been a bit frustrated by the results of some of
>> my RFCs ((nothing specific in mind !))
>> 
>> But __if__ there's really no example where current (3.5.0 and newer)
>> 
>> ...length()
>> 
>> differs from Hadley's  dotlength()
>> I'd be very happy to replace ...length 's C based definition by
>> Hadley's beautiful minimal solution.
>> 
>> Martin
