Re: [Rd] R CMD check for the R code from vignettes

Carl Boettiger Fri, 30 May 2014 09:30:07 -0700

Hi Yihui,

I agree with you (and with your comments in [knitr issue 784]) that it seems
wrong for R CMD check to use tangle (purl, etc.) to check the R code in a
vignette, when the standard and expected way to check a vignette is already
to weave it with knit or Sweave.
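The mismatch can be reproduced outside R CMD check. The following is a minimal sketch that uses only base R (`utils::Sweave()` and `utils::Stangle()`). The helper `check_rnw()` and the file names are hypothetical. The sketch weaves a document, tangles it, and then sources the tangled script in an environment that sees only base R. That roughly imitates the fresh session R CMD check uses, but it is not identical to it. The sketch runs Yihui's example from further down this thread. It also runs a variant in the style Kevin describes, with the assignment moved into a hidden chunk and \Sexpr used only to show a value.

```r
# Minimal sketch (not R CMD check itself): weave a .Rnw document, tangle it,
# then source the tangled script with only base R visible.
# check_rnw() and the file names below are hypothetical, for illustration.
# Note: Sweave() evaluates code in the global environment, so running this
# defines `x` in your workspace as a side effect.
check_rnw <- function(lines, name) {
  old <- setwd(tempdir())
  on.exit(setwd(old))
  rnw <- paste0(name, ".Rnw")
  writeLines(lines, rnw)
  Sweave(rnw, quiet = TRUE)   # weave: evaluates \Sexpr{} and the code chunks
  Stangle(rnw, quiet = TRUE)  # tangle: writes only the code chunks to <name>.R
  tryCatch({
    # An environment whose parent is baseenv() cannot see the global `x`
    # that weaving created, much like a new R process.
    source(paste0(name, ".R"), local = new.env(parent = baseenv()))
    "tangled code ran"
  }, error = function(e) conditionMessage(e))
}

# Yihui's example: the assignment lives only in the inline \Sexpr.
yihui_doc <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
)

# Kevin's style: side effects go in a (hidden) chunk; \Sexpr only shows a value.
kevin_doc <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<echo=FALSE>>=",
  "x <- 1",
  "@",
  "The value of x is \\Sexpr{x}.",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
)

print(check_rnw(yihui_doc, "inline-side-effect"))  # an error message about `x`
print(check_rnw(kevin_doc, "chunk-side-effect"))   # "tangled code ran"
```

For the first document, weaving succeeds but the tangled script fails, because `x` is never defined in it. For the second document, both steps succeed.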


I also agree with the perspective that the tangle function no longer plays
the crucial role it did when we were using noweb and C programs that
couldn't be compiled without tangle.

However, I would be hesitant to see tangle removed entirely, as it is
occasionally a convenient way to create an R script from a dynamic
document.  Pure R scripts are still much more widely recognized than
dynamic documents. I will sometimes just tangle out the R code because
a collaborator would have no idea what to do with a .Rmd file (though
RStudio is certainly improving this situation).  Tangle-like functions also
provide a nice complement to "stitch" and friends, which make dynamic
documents from the ubiquitous R scripts.
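Tangling out a plain script for a collaborator can be as short as the following sketch. It assumes the knitr package is installed, and the file names are hypothetical. With `documentation = 0L`, `knitr::purl()` writes only the code from the chunks and drops the prose. knitr's `spin()` works in the other direction: it turns an R script with `#'` comments into a report.

```r
# Minimal sketch: tangle an R Markdown file into a plain R script.
# Assumes knitr is installed; the file names are hypothetical.
library(knitr)

# Build the chunk fence programmatically so this listing stays readable.
fence <- strrep("`", 3)

rmd <- file.path(tempdir(), "analysis.Rmd")
writeLines(c(
  "---",
  "title: Demo analysis",
  "---",
  "",
  "Some explanatory prose for the reader.",
  "",
  paste0(fence, "{r}"),
  "x <- rnorm(10)",
  "mean(x)",
  fence
), rmd)

# documentation = 0L keeps only the code (no chunk headers, no prose).
out <- file.path(tempdir(), "analysis.R")
purl(rmd, output = out, documentation = 0L, quiet = TRUE)
cat(readLines(out), sep = "\n")
```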

[knitr issue 784]: https://github.com/yihui/knitr/issues/784


- Carl



On Fri, May 30, 2014 at 6:21 AM, Kevin Coombes <kevin.r.coom...@gmail.com>
wrote:

> Hi,
>
> Unless someone is planning to change Stangle to include inline expressions
> (which I am *not* advocating), I think that relying on side-effects within
> an \Sexpr construction is a bad idea. So, my own coding style is to
> restrict my use of \Sexpr to calls of the form
> \Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less
> believe that having R CMD check use Stangle and report an error is probably
> a good thing.
>
> There is a completely separate question about the relationship between
> Sweave/Stangle or knit/purl and literate programming that is linked to your
> question about whether to use Stangle on vignettes. The underlying model(s)
> in R have drifted away from Knuth's original conception, for some good
> reasons.
>
> The original goal of literate programming was to be able to explain the
> algorithms and data structures in the code to humans.  For that purpose, it
> was important to have named code chunks that you could move around, which
> would allow you to describe the algorithm starting from a high level
> overview and then drilling down into the details. From this perspective,
> "tangle" was critical to being able to reconstruct a program that would
> compile and run correctly.
>
> The vast majority of applications of Sweave/Stangle or knit/purl in modern
> R have a completely different goal: to produce some sort of document that
> describes the results of an analysis to a non-programmer or
> non-statistician.  For this goal, "weave" is much more important than
> "tangle", because the most important aspect is the ability to integrate the
> results (figures, tables, etc.) of running the code into the document that
> gets passed off to the person for whom the analysis was prepared. As a
> result, the number of times in my daily work that I need to explicitly
> invoke Stangle (or purl) is many orders of magnitude smaller than the
> number of times that I invoke Sweave (or knitr).
>
>   -- Kevin
>
>
>
> On 5/30/2014 1:04 AM, Yihui Xie wrote:
>
>> Hi,
>>
>> Recently I saw a couple of cases in which the package vignettes were
>> somewhat complicated so that Stangle() (or knitr::purl() or other
>> tangling functions) can fail to produce the exact R code that is
>> executed by the weaving function Sweave() (or knitr::knit(), ...). For
>> example, this is a valid document that can pass the weaving process
>> but cannot generate a valid R script to be source()d:
>>
>> \documentclass{article}
>> \begin{document}
>> Assign 1 to x: \Sexpr{x <- 1}
>> <<>>=
>> x + 1
>> @
>> \end{document}
>>
>> That is because the inline R code is not written to the R script
>> during the tangling process. When an R package vignette contains
>> inline R code expressions that have significant side effects, R CMD
>> check can fail because the tangled output is not correct. What I
>> showed here is only a trivial example, and I have seen two packages
>> that have more complicated scenarios than this. Anyway, the key thing
>> that I want to discuss here is, since the R code in the vignette has
>> been executed once during the weaving process, does it make much sense
>> to execute the code generated from the tangle function? In other
>> words, if the weaving process has succeeded, is it necessary to
>> source() the R script again?
>>
>> The two options here are:
>>
>> 1. Do not check the R code from vignettes;
>> 2. Or fix the tangle function so that it produces exactly what was
>> executed in the weaving process. If this is done, I'm back to my
>> previous question: does it make sense to run the code twice?
>>
>> To push this a little further, personally I do not quite appreciate
>> literate programming in R as two separate steps, namely weave and
>> tangle. In particular, I do not see the value of tangle, considering
>> Sweave() (or knitr::knit()) as the new "source()". Therefore
>> eventually I tend to just drop tangle, but perhaps I missed something
>> here, and I'd like to hear what other people think about it.
>>
>> Regards,
>> Yihui
>> --
>> Yihui Xie <xieyi...@gmail.com>
>> Web: http://yihui.name
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>



-- 
Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/
