Re: [Rd] The case for freezing CRAN
On Thu, 20 Mar 2014, Dirk Eddelbuettel wrote:

o Roger correctly notes that R scripts and packages are just one issue. Compilers, libraries and the OS matter. To me, the natural approach these days would be to think of something based on Docker or Vagrant or (if you must) VirtualBox. The newer alternatives make snapshotting very cheap (eg by using Linux LXC). That approach reproduces a full environment as best we can while still ignoring the hardware layer (and some readers may recall the infamous Pentium bug of two decades ago).

At one of my previous jobs we did effectively this (albeit in a lower-tech fashion). Every project had its own environment, complete with an exact snapshot of the R version and packages used. All scripts and code were kept in that environment in a versioned fashion, such that at any point one could go back to any stage of development of that paper or project's analysis and reproduce it exactly. It was hugely inefficient in terms of storage, but it solved the problem we're discussing here. As you note, with the tools available today it would be trivial to distribute that environment for people to reproduce results.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
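[Editor's note: a minimal sketch of the lower-tech end of this idea, recording the exact R and package versions alongside an analysis so the environment can later be reconstructed. Uses only base/utils functions; the file name "manifest.txt" is an arbitrary choice for illustration.]

```r
# Record the R version and the exact version of every installed package
# next to the analysis, as a low-tech complement to a full Docker/Vagrant
# snapshot.  "manifest.txt" is an arbitrary illustrative file name.
pkgs <- installed.packages()[, c("Package", "Version"), drop = FALSE]
writeLines(
  c(paste("R version:", as.character(getRversion())),
    paste(pkgs[, "Package"], pkgs[, "Version"])),
  con = "manifest.txt"
)
```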
Re: [Rd] The case for freezing CRAN
On 21 March 2014 at 07:43, Therneau, Terry M., Ph.D. wrote:
| This has been a fascinating discussion.

I am not so sure. It seems more like a rehashing of old and known arguments, while some folks try to push their work (Hi Jeroen :) onto already overloaded others. The only real thing I have learned so far is that Philippe is busy earning publication credits along the lines of the 'damn, just go and test it' suggestion I made (somewhat flippantly) in my last email.

| I maintain the survival package which currently has 246 reverse dependencies and take a
| slightly different view, which could be described as "the price of fame". I feel a
| responsibility to not break R. I have automated scripts which download the latest copy of
| all 246, using the install-tests option, and run them all. Most updates have 1-3 issues.

Same here, but as a somewhat younger package Rcpp is so far "only" at 189 and counting, with pretty decent growth. My experience has been positive too, and CRAN appears appreciative of us doing preemptive work and trying to be careful about not introducing breaking changes. I too see the latter part as something we owe the users of our package: a "promise" not to mess with the interface unless we absolutely must.

| but also worth it. I've built the test scripts over several years, with help from several
| others; a place to share this information would be a useful addition.

I put my script on GitHub next to Rcpp itself; it turns out that another thread participant had a need for exactly that script just yesterday.

Dirk

--
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
Re: [Rd] The case for freezing CRAN
On Fri, Mar 21, 2014 at 8:43 AM, Therneau, Terry M., Ph.D. <thern...@mayo.edu> wrote:
[...]
> Gabor Csardi discussed the problems with maintaining a package with lots
> of dependencies. I maintain the survival package which currently has 246
> reverse dependencies and take a slightly different view, which could be
> described as "the price of fame". I feel a responsibility to not break R.
> I have automated scripts which download the latest copy of all 246, using
> the install-tests option, and run them all. Most updates have 1-3 issues.
> About 25% of the time it turns out to be a problem that I introduced, and
> in all the others I have found the other package authors to be responsive.
> It is a nuisance, yes, but also worth it. I've built the test scripts
> over several years, with help from several others; a place to share this
> information would be a useful addition.

Well, maybe you are just a better programmer and maintainer than me, and I am alone with my problems. I hope that this is the case.

I actually do run automated tests against the reverse dependencies. It downloads ~3GB of packages, the output is 500KB (much of it is the compilation of my package, though), and it contains the word 'error' ~80 times and the word 'warning' ~270 times:
http://pave.igraph.org/job/igraph-r-check-deps/15/consoleFull

> This process also keeps me honest about any updates that are not backwards
> compatible.

Not really; this would only be true if all 246 packages had proper tests for all of their uses of survival. Unlikely. It definitely helps, I am not saying that it does not, but I also think that it is up to the maintainer of the package to test it, including testing it against newer versions of its dependencies. Simply because the maintainers know best how their packages are supposed to work, and how they are supposed to be tested.

The other thing is that quite often I do want to break the API, and this would be much easier with a CRAN-devel, so that there is some time for the problems to come up.

Gabor

> There is hardly a single option that is not used by some other package,
> somewhere.

[...]
Re: [Rd] The case for freezing CRAN
This has been a fascinating discussion.

Carl Boettiger replied with a set of examples where the world is much more fragile than my examples. That was useful. It seems that people in my area (medical research and survival analysis) are more careful with their packages (whew!). Gabor Csardi discussed the problems with maintaining a package with lots of dependencies.

I maintain the survival package, which currently has 246 reverse dependencies, and take a slightly different view, which could be described as "the price of fame". I feel a responsibility to not break R. I have automated scripts which download the latest copy of all 246, using the install-tests option, and run them all. Most updates have 1-3 issues. About 25% of the time it turns out to be a problem that I introduced, and in all the others I have found the other package authors to be responsive. It is a nuisance, yes, but also worth it. I've built the test scripts over several years, with help from several others; a place to share this information would be a useful addition.

This process also keeps me honest about any updates that are not backwards compatible. There is hardly a single option that is not used by some other package, somewhere.

Terry Therneau
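[Editor's note: a hedged sketch of a reverse-dependency check along the lines Terry describes, using standard facilities from the `tools` package (`package_dependencies()` and `check_packages_in_dir()`, the latter available since R 3.0.2). The function name `check_revdeps` and directory name are invented for illustration; Terry's actual scripts are not shown in the thread.]

```r
library(tools)

# check_revdeps(): download every CRAN package that depends on 'pkg'
# and run R CMD check over each tarball.  A sketch only -- it needs
# network access and a configured CRAN mirror to actually run.
check_revdeps <- function(pkg, dir = "revdep-checks") {
  db <- available.packages()
  revdeps <- package_dependencies(pkg, db = db,
                                  which = c("Depends", "Imports", "LinkingTo"),
                                  reverse = TRUE)[[pkg]]
  dir.create(dir, showWarnings = FALSE)
  download.packages(revdeps, destdir = dir, type = "source")
  # Runs R CMD check on every source tarball found in 'dir'
  check_packages_in_dir(dir)
}

# Usage (not run here): check_revdeps("survival")
```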
Re: [Rd] The case for freezing CRAN
On Mar 20, 2014, at 1:02 PM, Marc Schwartz wrote: > > On Mar 20, 2014, at 12:23 PM, Greg Snow <538...@gmail.com> wrote: > >> On Thu, Mar 20, 2014 at 7:32 AM, Dirk Eddelbuettel wrote: >> [snip] >> >>>(and some readers >>> may recall the infamous Pentium bug of two decades ago). >> >> It was a "Flaw" not a "Bug". At least I remember the Intel people >> making a big deal about that distinction. >> >> But I do remember the time well, I was a biostatistics Ph.D. student >> at the time and bought one of the flawed pentiums. My attempts at >> getting the chip replaced resulted in a major run around and each >> person that I talked to would first try to explain that I really did >> not need the fix because the only people likely to be affected were >> large corporations and research scientists. I will admit that I was >> not a large corporation, but if a Ph.D. student in biostatistics is >> not a research scientist, then I did not know what they defined one >> as. When I pointed this out they would usually then say that it still >> would not matter, unless I did a few thousand floating point >> operations I was unlikely to encounter one of the problematic >> divisions. I would then point out that some days I did over 10,000 >> floating point operations before breakfast (I had checked after the >> 1st person told me this and 10,000 was a low estimate of a lower bound >> of one set of simulations) at which point they would admit that I had >> a case and then send me to talk to someone else who would start the >> process over. > > > Further segue: > > That (1994) was a watershed moment for Intel as a company. A time during > which Intel's future was quite literally at stake. Intel's internal response > to that debacle, which fundamentally altered their own perception of just who > their customer was (the OEM's like IBM, COMPAQ and Dell versus the end users > like us), took time to be realized, as the impact of increasingly negative PR > took hold. 
It was also a good example of the impact of public perception (a
> flawed product) versus the realities of how infrequently the flaw would be
> observed in "typical" computing. "Perception is reality", as some would
> observe.
>
> Intel ultimately spent somewhere in the neighborhood of $500 million (in 1994
> U.S. dollars), as I recall, to implement a large scale Pentium chip
> replacement infrastructure targeted to end users. The "Intel Inside"
> marketing campaign was also an outgrowth of that time period.

Quick correction, thanks to Peter, on my assertion that the "Intel Inside" campaign arose from the 1994 Pentium issue. It actually started in 1991. I had a faulty recollection, from my long-ago reading of Andy Grove's 1996 book "Only The Paranoid Survive", that the slogan arose from Intel's reaction to the Pentium fiasco. It actually pre-dated that time frame by a few years.

Thanks Peter!

Regards,

Marc
Re: [Rd] The case for freezing CRAN
Given versioned / dated snapshots of CRAN, and an agreement that reproducibility is the responsibility of the study author, the author simply needs to sync all their packages to a chosen date, run the analysis, and publish the chosen date. It is true that this doesn't include compilers, the OS, system packages etc., but in my experience those are significantly more stable than CRAN packages.

Also, my previous description of how to serve up a dated CRAN was way too complicated. Since most of the files on CRAN never change, they don't need version control. Only the metadata about which versions are current really needs to be tracked, and that's small enough that it could be stored in static files.

On Thu, Mar 20, 2014 at 6:32 AM, Dirk Eddelbuettel wrote:
>
> No attempt to summarize the thread, but a few highlighted points:
>
> o Karl's suggestion of versioned / dated access to the repo by adding a
>   layer to webaccess is (as usual) nice. It works on the 'supply' side. But
>   Jeroen's problem is on the demand side. Even when we know that an
>   analysis was done on 20xx-yy-zz, and we reconstruct CRAN as of that day,
>   it only gives us a 'ceiling' estimate of what was on the machine. In
>   production or lab environments, installations get stale. Maybe packages
>   were already a year old? To me, this is an issue that needs to be
>   addressed on the 'demand' side of the user. But just writing out version
>   numbers is not good enough.
>
> o Roger correctly notes that R scripts and packages are just one issue.
>   Compilers, libraries and the OS matter. To me, the natural approach these
>   days would be to think of something based on Docker or Vagrant or (if you
>   must) VirtualBox. The newer alternatives make snapshotting very cheap
>   (eg by using Linux LXC). That approach reproduces a full environment as
>   best we can while still ignoring the hardware layer (and some readers
>   may recall the infamous Pentium bug of two decades ago).
>
> o Reproducibility will probably remain the responsibility of study
>   authors. If an investigator on a mega-grant wants to (or needs to)
>   freeze, they do have the tools now. Requiring the few who are already
>   overloaded (ie CRAN) to push work, and changing the workflow of
>   everybody, is a non-starter.
>
> o As Terry noted, Jeroen made some strong claims about exactly how flawed
>   the existing system is and keeps coming back to the example of 'a JSS
>   paper that cannot be re-run'. I would really like to see empirics on
>   this. Studies of reproducibility appear to be publishable these days, so
>   maybe some enterprising grad student wants to run with the idea of
>   actually _testing_ this. We may be above Terry's 0/30 and nearer to
>   Kevin's 'low'/30. But let's bring some data to the debate.
>
> o Overall, I would tend to think that our CRAN standards of releasing with
>   tests, examples, and checks on every build and release already do a much
>   better job of keeping things tidy and workable than in most if not all
>   other related / similar open source projects. I would of course welcome
>   contradictory examples.
>
> Dirk
>
> --
> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
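[Editor's note: a hedged sketch of how an author would pin an analysis to a published snapshot date, assuming a hypothetical dated-snapshot mirror that serves the CRAN state as of a given day. The host name below is invented for illustration; no such service is named in the thread.]

```r
# Pin the CRAN repository to a dated snapshot (hypothetical mirror URL),
# so package installs resolve against the frozen metadata for that day.
snapshot <- "https://cran-snapshots.example.org/2014-03-20"
options(repos = c(CRAN = snapshot))

# Subsequent installs then pull the versions current on 2014-03-20:
# install.packages("survival")
```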
Re: [Rd] The case for freezing CRAN
On Mar 20, 2014, at 12:23 PM, Greg Snow <538...@gmail.com> wrote:
> On Thu, Mar 20, 2014 at 7:32 AM, Dirk Eddelbuettel wrote:
> [snip]
>
>> (and some readers
>> may recall the infamous Pentium bug of two decades ago).
>
> It was a "Flaw" not a "Bug". At least I remember the Intel people
> making a big deal about that distinction.
>
> But I do remember the time well; I was a biostatistics Ph.D. student
> at the time and bought one of the flawed Pentiums. My attempts at
> getting the chip replaced resulted in a major run-around, and each
> person that I talked to would first try to explain that I really did
> not need the fix because the only people likely to be affected were
> large corporations and research scientists. I will admit that I was
> not a large corporation, but if a Ph.D. student in biostatistics is
> not a research scientist, then I did not know what they defined one
> as. When I pointed this out they would usually then say that it still
> would not matter: unless I did a few thousand floating point
> operations, I was unlikely to encounter one of the problematic
> divisions. I would then point out that some days I did over 10,000
> floating point operations before breakfast (I had checked after the
> 1st person told me this, and 10,000 was a low estimate of a lower bound
> of one set of simulations), at which point they would admit that I had
> a case and then send me to talk to someone else who would start the
> process over.

Further segue:

That (1994) was a watershed moment for Intel as a company, a time during which Intel's future was quite literally at stake. Intel's internal response to that debacle, which fundamentally altered their own perception of just who their customer was (the OEMs like IBM, COMPAQ and Dell versus the end users like us), took time to be realized, as the impact of increasingly negative PR took hold.

It was also a good example of the impact of public perception (a flawed product) versus the realities of how infrequently the flaw would be observed in "typical" computing. "Perception is reality", as some would observe.

Intel ultimately spent somewhere in the neighborhood of $500 million (in 1994 U.S. dollars), as I recall, to implement a large-scale Pentium chip replacement infrastructure targeted to end users. The "Intel Inside" marketing campaign was also an outgrowth of that time period.

Regards,

Marc Schwartz

> [snip]
>> --
>> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com
Re: [Rd] The case for freezing CRAN
There seems to be some question of how frequently changes to software packages result in irreproducible results. I am sure Terry is correct that research using functions like `glm` and other functions that ship with base R is quite reliable; after all, they already benefit from being versioned with R releases, as Jeroen argues.

In my field of ecology and evolution, the situation is quite different. Packages are frequently developed by scientists without any background in programming and become widely used, such as [geiger](http://cran.r-project.org/web/packages/geiger/), with 463 papers citing it and probably many more using it that do not cite it (both because it is sometimes used only as a dependency of another package and because our community isn't great at citing packages). The package has changed substantially over the time it has been on CRAN, and many functions that would once run on older versions can no longer run on newer ones. Its dependencies, notably the phylogenetics package ape, have changed continually over that interval, with both bug fixes and substantial changes to the basic data structure. The ape package has 1,276 citations (again a lower bound). I suspect that correctly identifying the right version of the software used in any of these thousands of papers would prove difficult, and for a large fraction the results would simply not execute successfully. It would be much harder to track down cases where the bug fixes would have any impact on the result. I have certainly seen both problems in the hundreds of Sweave/knitr files I have produced over the years that use these packages.

Even work that simply relies on a package that has been archived becomes a substantial challenge to reproducibility for other scientists, even when an expert familiar with the packages (e.g. the original author) would not have a problem, as the informatics team at the Evolutionary Synthesis Center recently concluded in an exercise trying to reproduce several papers, including my own, that used a package that had been archived (odesolve, whose replacement, deSolve, does not use quite the same function call for the same `lsoda` function).

New methods are being published all the time, and I think it is excellent that in ecology and evolution it is increasingly standard to publish R packages implementing those methods, as a scan of any table of contents in "Methods in Ecology and Evolution", for instance, will quickly show. But unlike `glm`, these methods have a long way to go before they are fully tested and debugged, and reproducing any work based on them requires a close eye to the versions (particularly when unit tests and even detailed changelogs are not common). The methods are invariably built by "user-developers", researchers developing the code for their own needs, and thus these packages can themselves fall afoul of changes as they depend and build upon the work of other nascent ecology and evolution packages.

Detailed reproducibility studies of published work in this area are still hard to come by, not least because the actual code used by researchers is seldom published (other than when it is published as its own R package). But incompatibilities between successive versions of the hundreds of packages in our domain, along with the interdependencies of those packages, might provide some window into the difficulties of computational reproducibility. I suspect changes in these fast-moving packages are far more often the culprit than differences in compilers and operating systems.

Cheers, Carl

On Thu, Mar 20, 2014 at 10:23 AM, Greg Snow <538...@gmail.com> wrote:
> On Thu, Mar 20, 2014 at 7:32 AM, Dirk Eddelbuettel wrote:
> [snip]
>
>> (and some readers
>> may recall the infamous Pentium bug of two decades ago).
>
> It was a "Flaw" not a "Bug". At least I remember the Intel people
> making a big deal about that distinction.
>
> But I do remember the time well; I was a biostatistics Ph.D. student
> at the time and bought one of the flawed Pentiums. My attempts at
> getting the chip replaced resulted in a major run-around, and each
> person that I talked to would first try to explain that I really did
> not need the fix because the only people likely to be affected were
> large corporations and research scientists. I will admit that I was
> not a large corporation, but if a Ph.D. student in biostatistics is
> not a research scientist, then I did not know what they defined one
> as. When I pointed this out they would usually then say that it still
> would not matter: unless I did a few thousand floating point
> operations, I was unlikely to encounter one of the problematic
> divisions. I would then point out that some days I did over 10,000
> floating point operations before breakfast (I had checked after the
> 1st person told me this, and 10,000 was a low estimate of a lower bound
> of one set of simulations), at which point they would admit that I had
> a case and then send me to talk to someone else who would start the
> process over.
Re: [Rd] The case for freezing CRAN
On Thu, Mar 20, 2014 at 7:32 AM, Dirk Eddelbuettel wrote:
[snip]
> (and some readers
> may recall the infamous Pentium bug of two decades ago).

It was a "Flaw" not a "Bug". At least I remember the Intel people making a big deal about that distinction.

But I do remember the time well; I was a biostatistics Ph.D. student at the time and bought one of the flawed Pentiums. My attempts at getting the chip replaced resulted in a major run-around, and each person that I talked to would first try to explain that I really did not need the fix because the only people likely to be affected were large corporations and research scientists. I will admit that I was not a large corporation, but if a Ph.D. student in biostatistics is not a research scientist, then I did not know what they defined one as. When I pointed this out they would usually then say that it still would not matter: unless I did a few thousand floating point operations, I was unlikely to encounter one of the problematic divisions. I would then point out that some days I did over 10,000 floating point operations before breakfast (I had checked after the 1st person told me this, and 10,000 was a low estimate of a lower bound of one set of simulations), at which point they would admit that I had a case and then send me to talk to someone else who would start the process over.

[snip]
> --
> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
Re: [Rd] The case for freezing CRAN
No attempt to summarize the thread, but a few highlighted points:

o Karl's suggestion of versioned / dated access to the repo by adding a layer to webaccess is (as usual) nice. It works on the 'supply' side. But Jeroen's problem is on the demand side. Even when we know that an analysis was done on 20xx-yy-zz, and we reconstruct CRAN as of that day, it only gives us a 'ceiling' estimate of what was on the machine. In production or lab environments, installations get stale. Maybe packages were already a year old? To me, this is an issue that needs to be addressed on the 'demand' side of the user. But just writing out version numbers is not good enough.

o Roger correctly notes that R scripts and packages are just one issue. Compilers, libraries and the OS matter. To me, the natural approach these days would be to think of something based on Docker or Vagrant or (if you must) VirtualBox. The newer alternatives make snapshotting very cheap (eg by using Linux LXC). That approach reproduces a full environment as best we can while still ignoring the hardware layer (and some readers may recall the infamous Pentium bug of two decades ago).

o Reproducibility will probably remain the responsibility of study authors. If an investigator on a mega-grant wants to (or needs to) freeze, they do have the tools now. Requiring the few who are already overloaded (ie CRAN) to push work, and changing the workflow of everybody, is a non-starter.

o As Terry noted, Jeroen made some strong claims about exactly how flawed the existing system is and keeps coming back to the example of 'a JSS paper that cannot be re-run'. I would really like to see empirics on this. Studies of reproducibility appear to be publishable these days, so maybe some enterprising grad student wants to run with the idea of actually _testing_ this. We may be above Terry's 0/30 and nearer to Kevin's 'low'/30. But let's bring some data to the debate.
o Overall, I would tend to think that our CRAN standards of releasing with tests, examples, and checks on every build and release already do a much better job of keeping things tidy and workable than in most if not all other related / similar open source projects. I would of course welcome contradictory examples.

Dirk

--
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
Re: [Rd] The case for freezing CRAN
On 3/20/2014 9:00 AM, Therneau, Terry M., Ph.D. wrote:
> On 03/20/2014 07:48 AM, Michael Weylandt wrote:
>> On Mar 20, 2014, at 8:19, "Therneau, Terry M., Ph.D." wrote:
>>> There is a central assertion to this argument that I don't follow:
>>> "At the end of the day most published results obtained with R just
>>> won't be reproducible."
>>> This is a very strong assertion. What is the evidence for it?
>>
>> If I've understood Jeroen correctly, his point might be alternatively
>> phrased as "won't be reproducED" (i.e., end user difficulties, not
>> software availability).
>>
>> Michael
>
> That was my point as well. Of the 30+ Sweave documents that I've produced,
> I can't think of one that will change its output with a new version of R.
> My 0/30 estimate is at odds with the "nearly all" assertion. Perhaps I
> only do dull things?
>
> Terry T.

The only concrete example that comes to mind from my own Sweave reports was actually caused by BioConductor and not CRAN. I had a set of analyses that used DNAcopy, and the results changed substantially with a new release of the package in which they changed the default values to the main function call. As a result, I've taken to writing out more of the defaults that I previously just accepted. There have been a few minor issues similar to this one (with changes to parts of the Mclust package ??). So my estimates are somewhat higher than 0/30 but are still a long way from "almost all".

Kevin
Re: [Rd] The case for freezing CRAN
On 03/20/2014 07:48 AM, Michael Weylandt wrote:
> On Mar 20, 2014, at 8:19, "Therneau, Terry M., Ph.D." wrote:
>> There is a central assertion to this argument that I don't follow:
>> "At the end of the day most published results obtained with R just
>> won't be reproducible."
>> This is a very strong assertion. What is the evidence for it?
>
> If I've understood Jeroen correctly, his point might be alternatively
> phrased as "won't be reproducED" (i.e., end user difficulties, not
> software availability).
>
> Michael

That was my point as well. Of the 30+ Sweave documents that I've produced, I can't think of one that will change its output with a new version of R. My 0/30 estimate is at odds with the "nearly all" assertion. Perhaps I only do dull things?

Terry T.
Re: [Rd] The case for freezing CRAN
On Mar 20, 2014, at 8:19, "Therneau, Terry M., Ph.D." wrote:
> There is a central assertion to this argument that I don't follow:
>
>> At the end of the day most published results obtained with R just won't be
>> reproducible.
>
> This is a very strong assertion. What is the evidence for it?

If I've understood Jeroen correctly, his point might be alternatively phrased as "won't be reproducED" (i.e., end user difficulties, not software availability).

Michael
Re: [Rd] The case for freezing CRAN
There is a central assertion to this argument that I don't follow:

"At the end of the day most published results obtained with R just won't be reproducible."

This is a very strong assertion. What is the evidence for it? I write a lot of Sweave/knitr in house as a way of documenting complex analyses, and a glm() based logistic regression looks the same yesterday as it will tomorrow.

Terry Therneau