Gavin Simpson <ucfagls <at> gmail.com> writes:

> ...
>
> To my mind it is incumbent upon those wanting reproducibility to build
> the tools to enable users to reproduce works. When you write a paper
> or release a tool, you will have tested it with a specific set of
> packages. It is relatively easy to work out what those versions are
> (there are tools in R for this). What is required is an automated way
> to record that info in an agreed upon way in an approved
> file/location, and have a tool that facilitates setting up a package
> library sufficient with which to reproduce a work. That approval
> doesn't need to come from CRAN or R Core - we can store anything in
> ./inst.

Gavin,

Thanks for contributing useful insights. With reference to Jeroen's
proposal and the discussion so far, I can see where the problem lies, but
the proposed solutions are very invasive. A less invasive resolution might
be a robust and predictable schema for sessionInfo() content, permitting
ready parsing, so that (using Hadley's interjection) the reproducer could
reconstruct the original execution environment, at least as far as R and
package versions are concerned. In fact, I'd argue that the responsibility
for securing reproducibility lies with the originating author or
organisation, so that work where reproducibility is desired should include
such a standardised record.

There is an additional problem, not addressed directly in this thread but
mentioned in some contributions, that lies upstream of R: the external
dependencies and compilers, and beyond those the hardware. So raising
consciousness about the importance of being able to query version
information to enable reproducibility is important.

Next, encapsulating the information in a parseable form would perhaps
enable the original execution environment to be reconstructed locally by
installing external dependencies, then R, then packages from source, using
the same versions of build train components if possible (and noting
mismatches if not). Maybe resurrect StatDataML in addition to RData
serialization of the version dependencies?

Of course, current R and package versions may provide reproducibility, but
if they don't, one would use the parseable record of the original
development environment.

> Reproducibility is a very important part of doing "science", but not
> everyone using CRAN is doing that. Why force everyone to march to the
> reproducibility drum? I would place the onus elsewhere to make this
> work.

Exactly.

Roger

> Gavin
> A scientist, very much interested in reproducibility of my work and others.
> ...
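
As a rough illustration of the parseable sessionInfo() record suggested
above, here is a minimal sketch in base R. It is not an agreed schema or an
existing tool: the function names record_versions() and compare_versions()
and the field names "component" and "version" are made up for this example,
and DCF (the format of DESCRIPTION files) is only one possible container.
It covers R and package versions only, not external dependencies or
compilers.

```r
# Hypothetical helpers for illustration; not part of R or of any package.

# Write the R version and the versions of all attached and loaded
# packages, as reported by sessionInfo(), to a DCF file.
record_versions <- function(file) {
  si <- sessionInfo()
  # otherPkgs and loadedOnly are lists of package descriptions (or NULL).
  pkgs <- c(si$otherPkgs, si$loadedOnly)
  rec <- data.frame(
    component = c("R", as.character(names(pkgs))),
    version = c(
      paste(si$R.version$major, si$R.version$minor, sep = "."),
      vapply(pkgs, function(p) p$Version, character(1), USE.NAMES = FALSE)
    ),
    stringsAsFactors = FALSE
  )
  write.dcf(rec, file = file)
  invisible(rec)
}

# Look up the locally installed version of one recorded component,
# without loading the package; NA if it is not installed.
installed_version <- function(nm) {
  if (nm == "R") return(as.character(getRversion()))
  if (!nzchar(system.file(package = nm))) return(NA_character_)
  utils::packageDescription(nm, fields = "Version")
}

# Read a record back and note mismatches against the local installation.
compare_versions <- function(file) {
  rec <- as.data.frame(read.dcf(file), stringsAsFactors = FALSE)
  rec$current <- vapply(rec$component, installed_version, character(1),
                        USE.NAMES = FALSE)
  rec$match <- !is.na(rec$current) & rec$current == rec$version
  rec
}
```

The originating author would call record_versions("versions.dcf") at the
end of the analysis and ship the file with the work; the reproducer would
call compare_versions("versions.dcf") and install from source whatever is
reported as a mismatch.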

> > ______________________________________________
> > R-devel <at> r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel