Michael Weylandt <michael.weyla...@gmail.com> writes:

> On Mar 19, 2014, at 22:17, Gavin Simpson <ucfa...@gmail.com> wrote:
>
>> Michael,
>>
>> I think the issue is that Jeroen wants to take that responsibility out
>> of the hands of the person trying to reproduce a work. If it used R
>> 3.0.x and packages A, B and C then it would be trivial to install
>> that version of R and then pull down the stable versions of A B and C
>> for that version of R. At the moment, one might note the packages used
>> and even their versions, but what about the versions of the packages
>> that the used packages rely upon & so on? What if developers don't
>> state known working versions of dependencies?
>
> Doesn't sessionInfo() give all of this?
>
> If you want to be very worried about every last bit, I suppose it
> should also include options(), compiler flags, compiler version, BLAS
> details, etc. (Good talk on the dregs of a floating point number and
> how hard it is to reproduce them across processors
> http://www.youtube.com/watch?v=GIlp4rubv8U)

In principle, yes. But this calls specifically for a package that
extracts this information and stores it in a human-readable format,
which can then be used to re-install all those versions automatically
for (hopefully) reproducibility, because as soon as external libraries
are involved, you HAVE problems.

>
>>
>> The problem is how the heck do you know which versions of packages are
>> needed if developers don't record these dependencies in sufficient
>> detail? The suggested solution is to freeze CRAN at intervals
>> alongside R releases. Then you'd know what the stable versions were.
>
> Only if you knew which R release was used.

Well, that would be easier to specify in a paper than the version
information of all the packages needed. And which of the installed
packages are actually needed? OK, the ones specified in library()
calls. But wait, there are dependencies, imports, ... That is a lot of
digging. I would not know how to do this off the top of my head, except
by going through the DESCRIPTION files of the packages...

>
>>
>> Or we could just get package developers to be more thorough in
>> documenting dependencies. Or R CMD check could refuse to pass if a
>> package is listed as a dependency but with no version qualifiers. Or
>> have R CMD build add an upper bound (from the current, at build-time
>> version of dependencies on CRAN) if the package developer didn't
>> include an upper bound. Or... The first is unlikely to happen
>> consistently, and no-one wants *more* checks and hoops to jump through
>> :-)
>>
>> To my mind it is incumbent upon those wanting reproducibility to build
>> the tools to enable users to reproduce works.
>
> But the tools already allow it with minimal effort. If the author
> can't even include session info, how can we be sure the version of R
> is known. If we can't know which version of R, can we ever change R at
> all? Etc to absurdity.
>
> My (serious) point is that the tools are in place, but ramming them
> down folks' throats by intentionally keeping them on older versions by
> default is too much.
>
>> When you write a paper
>> or release a tool, you will have tested it with a specific set of
>> packages. It is relatively easy to work out what those versions are
>> (there are tools in R for this). What is required is an automated way
>> to record that info in an agreed upon way in an approved
>> file/location, and have a tool that facilitates setting up a package
>> library sufficient with which to reproduce a work. That approval
>> doesn't need to come from CRAN or R Core - we can store anything in
>> ./inst.
>
> I think the package version and published paper cases are different.
>
> For the latter, the recipe is simple: if you want the same results,
> use the same software (as noted by sessionInfoPlus() or equiv)

Dependencies, imports, package versions, ... not that straightforward,
I would say.

>
> For the former, I think you start straying into this NP complete problem:
> http://people.debian.org/~dburrows/model.pdf
>
> Yes, a good config can (and should be recorded) but isn't that exactly what
> sessionInfo() gives?
>
>>
>> Reproducibility is a very important part of doing "science", but not
>> everyone using CRAN is doing that. Why force everyone to march to the
>> reproducibility drum? I would place the onus elsewhere to make this
>> work.
>>
>
> Agreed: reproducibility is the onus of the author, not the reader

Exactly, but it is also the responsibility of the authors of software
that is meant to be used in the context of reproducibility: the tools
should be there to make it easy!

My points are:

1) I think the idea of CRAN snapshots is a good one and should be
pursued.
2) The snapshots should be hosted at CRAN, as I assume that CRAN will
be around longer than any third-party repository.
3) The default for the user should *not* change, i.e.
normal users will always get the newest packages, as is the case now.
4) If this cannot or will not be done because of workload, storage
space, ..., commands should be incorporated in a package (preferably
one that becomes part of the core packages) to store a snapshot of the
installed packages and the R version information as a human-readable
text file, which can then be parsed by a second command to re-create
this setup.

Cheers, and thanks for this important discussion (could have been a
GSoC project?),

Rainer

>
>
>> Gavin
>> A scientist, very much interested in reproducibility of my work and others.
>
> Michael
> In finance, where we call it "Auditability" and care very much as well :-)

-- 
Rainer M. Krug
email: Rainer<at>krugs<dot>de
PGP: 0x0F52F982
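
Point 4 above can be sketched in a few lines of base R. This is only a
minimal sketch, not an existing tool: the function names
write_snapshot(), read_snapshot() and check_snapshot() are hypothetical,
and the choice of the DCF format (the one DESCRIPTION files use) is an
assumption made here because it is both human readable and parseable
with read.dcf().

```r
# Hypothetical helpers: none of these three functions exist in base R.

# Store the R version and all installed packages as a DCF text file,
# the "Field: value" format that DESCRIPTION files also use.
write_snapshot <- function(file, lib.loc = NULL) {
  pkgs <- installed.packages(lib.loc = lib.loc)[, c("Package", "Version"),
                                                drop = FALSE]
  # Keep only the first copy found on the library path.
  pkgs <- pkgs[!duplicated(pkgs[, "Package"]), , drop = FALSE]
  snap <- rbind(cbind(Package = "R", Version = as.character(getRversion())),
                pkgs)
  rownames(snap) <- NULL
  write.dcf(snap, file = file)
  invisible(snap)
}

# Parse the text file back into a data frame of package names and versions.
read_snapshot <- function(file) {
  as.data.frame(read.dcf(file, fields = c("Package", "Version")),
                stringsAsFactors = FALSE)
}

# Report every recorded package that is missing or installed in a
# different version; an empty result means the library matches.
check_snapshot <- function(file, lib.loc = NULL) {
  want <- read_snapshot(file)
  want <- want[want$Package != "R", ]
  have <- installed.packages(lib.loc = lib.loc)[, "Version"]
  want$Installed <- unname(have[want$Package])  # NA if not installed
  want[is.na(want$Installed) | want$Installed != want$Version, ]
}
```

An author would call write_snapshot("packages.dcf") next to the
analysis, and a reader would call check_snapshot("packages.dcf") to see
what differs. Re-installing the recorded versions is deliberately left
out of the sketch: install.packages() installs whatever version the
repository currently offers, which is exactly the gap the snapshot
proposal is meant to close.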
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel