Given the versioned / dated snapshots of CRAN, and an agreement that
reproducibility is the responsibility of the study author, the author
simply needs to sync all their packages to a chosen date, run the analysis,
and publish the chosen date.  It is true that this doesn't cover
compilers, the OS, system packages, etc., but in my experience those are
significantly more stable than CRAN packages.
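
For instance, in the common case where all packages come from CRAN, the
workflow could look roughly like this (a minimal sketch; the dated mirror
URL is hypothetical and only stands in for whatever serves the snapshot):

    ## Pin the repository to a chosen date and sync installed packages to it.
    snapshot_date <- "2014-03-20"
    options(repos = c(CRAN =
        paste0("http://cran-snapshot.example.org/", snapshot_date)))
    update.packages(ask = FALSE)   # bring installed packages up to that date
    install.packages("MASS")       # further installs come from the same snapshot

    ## Publish snapshot_date alongside the results; sessionInfo() records
    ## the versions that were actually used.
    sessionInfo()

Note that update.packages() only upgrades, so any package already newer
than the snapshot would have to be reinstalled against it separately.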


Also, my previous description of how to serve up a dated CRAN was way too
complicated.  Since most of the files on CRAN never change, they don't need
version control.  Only the metadata about which package versions are current
on a given date really needs to be tracked, and that is small enough to be
stored in static files.
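
As a sketch of what the client side could then look like: the per-date
index location and its one-package-per-line format below are made up, but
the archive URLs follow the standard CRAN layout, and those tarballs do
not change once published.

    ## Read a small, static, per-date index of package versions
    ## (hypothetical location and format).
    snapshot <- read.table("http://example.org/cran-dated/2014-03-20.txt",
                           col.names = c("package", "version"),
                           stringsAsFactors = FALSE)

    ## The tarballs themselves never change, so they can come straight
    ## from the CRAN source archive.
    archive_url <- function(pkg, ver)
      paste0("http://cran.r-project.org/src/contrib/Archive/",
             pkg, "/", pkg, "_", ver, ".tar.gz")

    ## Install the listed versions in the order given (the index would
    ## have to be written in dependency order, or sorted first).
    for (i in seq_len(nrow(snapshot))) {
      tarball <- file.path(tempdir(),
                           paste0(snapshot$package[i], "_",
                                  snapshot$version[i], ".tar.gz"))
      download.file(archive_url(snapshot$package[i],
                                snapshot$version[i]), tarball)
      install.packages(tarball, repos = NULL, type = "source")
    }

One wrinkle: a version that is still the current release lives under
src/contrib/ rather than src/contrib/Archive/<pkg>/, so a real client
would try both locations (or the index could simply store full paths).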




On Thu, Mar 20, 2014 at 6:32 AM, Dirk Eddelbuettel <e...@debian.org> wrote:

>
> No attempt to summarize the thread, but a few highlighted points:
>
>  o Karl's suggestion of versioned / dated access to the repo by adding a
>    layer to web access is (as usual) nice.  It works on the 'supply' side.
>    But Jeroen's problem is on the demand side.  Even when we know that an
>    analysis was done on 20xx-yy-zz, and we reconstruct CRAN as of that day,
>    it only gives us a 'ceiling' estimate of what was on the machine.  In
>    production or lab environments, installations get stale.  Maybe the
>    packages were already a year old?  To me, this is an issue that needs to
>    be addressed on the 'demand' side, by the user.  But just writing out
>    version numbers is not good enough.
>
>  o Roger correctly notes that R scripts and packages are just one issue.
>    Compilers, libraries and the OS matter too.  To me, the natural approach
>    these days would be to think of something based on Docker or Vagrant or
>    (if you must) VirtualBox.  The newer alternatives make snapshotting very
>    cheap (eg by using Linux LXC).  That approach reproduces a full
>    environment as best we can while still ignoring the hardware layer (and
>    some readers may recall the infamous Pentium bug of two decades ago).
>
>  o Reproducibility will probably remain the responsibility of study
>    authors.  If an investigator on a mega-grant wants to (or needs to)
>    freeze, they do have the tools now.  Letting the needs of a few push
>    extra work onto those who are already overloaded (ie CRAN), and change
>    the workflow of everybody, is a non-starter.
>
>  o As Terry noted, Jeroen made some strong claims about exactly how flawed
>    the existing system is and keeps coming back to the example of 'a JSS
>    paper that cannot be re-run'.  I would really like to see empirics on
>    this.  Studies of reproducibility appear to be publishable these days,
>    so maybe some enterprising grad student wants to run with the idea of
>    actually _testing_ this.  We may be above Terry's 0/30 and nearer to
>    Kevin's 'low'/30.  But let's bring some data to the debate.
>
>  o Overall, I would tend to think that our CRAN standards of releasing with
>    tests, examples, and checks on every build and release already do a much
>    better job of keeping things tidy and workable than most if not all
>    other related / similar open source projects.  I would of course welcome
>    contradictory examples.
>
> Dirk
>
> --
> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel