----- Original Message -----
> From: "David Winsemius" <dwinsem...@comcast.net>
> To: "Jeroen Ooms" <jeroen.o...@stat.ucla.edu>
> Cc: "r-devel" <r-devel@r-project.org>
> Sent: Wednesday, March 19, 2014 11:03:32 PM
> Subject: Re: [Rd] [RFC] A case for freezing CRAN
>
>
> On Mar 19, 2014, at 7:45 PM, Jeroen Ooms wrote:
>
> > On Wed, Mar 19, 2014 at 6:55 PM, Michael Weylandt
> > <michael.weyla...@gmail.com> wrote:
> >> Reading this thread again, is it a fair summary of your position
> >> to say "reproducibility by default is more important than giving
> >> users access to the newest bug fixes and features by default?"
> >> It's certainly arguable, but I'm not sure I'm convinced: I'd
> >> imagine that the ratio of new work being done vs reproductions is
> >> rather high and the current setup optimizes for that already.
> >
> > I think that separating development from released branches can give
> > us both reliability/reproducibility (stable branch) as well as new
> > features (unstable branch). The user gets to pick (and you can pick
> > both!). The same is true for r-base: when using a 'released' version
> > you get 'stable' base packages that are up to 12 months old. If you
> > want to have the latest stuff you download a nightly build of
> > r-devel. For regular users and reproducible research it is
> > recommended to use the stable branch. However if you are a developer
> > (e.g. package author) you might want to develop/test/check your work
> > with the latest r-devel.
> >
> > I think that extending the R release cycle to CRAN would result both
> > in more stable released versions of R, as well as more freedom for
> > package authors to implement rigorous change in the unstable branch.
> > When writing a script that is part of a production pipeline, or a
> > Sweave paper that should be reproducible 10 years from now, or a book
> > on using R, you use a stable version of R, which is guaranteed to
> > behave the same over time.
> > However when developing packages that should be
> > compatible with the upcoming release of R, you use r-devel which has
> > the latest versions of other CRAN and base packages.
>
>
> As I remember ... The example demonstrating the need for this was an
> XML package that caused an extract from a website where the headers
> were misinterpreted as data in one version of pkg:XML and not in
> another. That seems fairly unconvincing. Data cleaning and
> validation is a basic task of data analysis. It also seems excessive
> to assert that it is the responsibility of CRAN to maintain a synced
> binary archive that will be available in ten years.

CRAN already does this: the bin/windows/contrib directory has
subdirectories going back to 1.7, with packages dated October 2004. I
don't see why it is burdensome to continue to archive these. It would be
nice if source versions had a similar archive.

Dan

> Bug fixes would
> be inhibited for years.... not unlike SAS and Excel. What next?
> Perhaps all bugs should be labeled as features? Surely this
> CRAN-of-the-future would be offering something that no other
> statistical package currently offers, nicht wahr?
>
> Why not leave it to the authors to specify the packages and version
> numbers that were used in their publications? The authors of the
> packages would get recognition and the dependencies would be recorded.
>
> --
> David.
>
>
> >
> >> What I'm trying to figure out is why the standard "install the
> >> following list of package versions" isn't good enough in your
> >> eyes?
> >
> > Almost nobody does this because it is cumbersome and impractical. We
> > can do so much better than this. Note that in order to install old
> > packages you also need to investigate which versions of dependencies
> > of those packages were used. On win/osx, users need to manually build
> > those packages which can be a pain. All in all it makes reproducible
> > research difficult and expensive and error prone. At the end of the
> > day most published results obtained with R just won't be
> > reproducible.
> >
> > Also I believe that keeping it simple is essential for solutions to
> > be practical. If every script has to be run inside an environment
> > with custom libraries, it takes away much of its power. Running a
> > bash or python script in Linux is so easy and reliable that entire
> > distributions are based on it. I don't understand why we make our
> > lives so difficult in R.
> >
> > In my estimation, a system where stable versions of R pull packages
> > from a stable branch of CRAN will naturally resolve the majority of
> > the reproducibility and reliability problems with R. And in contrast
> > to what some people here are suggesting it does not introduce any
> > limitations. If you want to get the latest stuff, you either grab a
> > copy of r-devel, or just enable the testing branch and off you go.
> > Debian 'testing' works in a similar way, see
> > http://www.debian.org/devel/testing.
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> David Winsemius
> Alameda, CA, USA