A working example of knitr caching across workflows is now available at https://github.com/LTLA/BiocWorkCache.
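
The core idea, compiling an upstream vignette in a separate R session rather than from inside the downstream vignette, can be sketched roughly as follows. This is only a minimal sketch and not the code in the repository: the helper names render_call(), run_in_new_session() and compile_upstream() are made up for illustration, and it assumes that rmarkdown is installed and that the upstream Rmd sets cache=TRUE on its chunks.

```r
# Hypothetical helper: build the expression that the child session evaluates.
# deparse() converts the path into an escaped R string literal.
render_call <- function(rmd) {
    sprintf("rmarkdown::render(%s, quiet = TRUE)", deparse(rmd))
}

# Hypothetical helper: evaluate an expression (given as a string) in a fresh
# R session and return the exit status of that session (0 means success).
run_in_new_session <- function(expr) {
    rscript <- file.path(R.home("bin"), "Rscript")
    system2(rscript, c("-e", shQuote(expr)))
}

# Compile one upstream vignette via a system call, as opposed to calling
# rmarkdown::render() directly inside the downstream vignette.
compile_upstream <- function(rmd) {
    rmd <- normalizePath(rmd, mustWork = TRUE)
    status <- run_in_new_session(render_call(rmd))
    if (status != 0L) {
        stop("failed to compile upstream vignette: ", rmd)
    }
    invisible(rmd)
}

# Possible use in a chunk of a downstream vignette:
# compile_upstream("test1.Rmd")
```

The repository should be treated as the reference here; the sketch leaves out everything else it does.
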
It uses “~/chipseq.log” as a log to demonstrate that the code in the most-upstream workflow (“test1.Rmd”) is indeed executed only once during R CMD BUILD.

Note that upstream vignettes are compiled via a system call to a separate R session. This avoids some difficult caching issues that arise when an Rmd file is compiled from within another Rmd file: calling rmarkdown::render() on the upstream vignette from within a downstream vignette does not generate a cache that is recognized when BUILD goes on to compile the upstream vignette.

-A

> On 23 Dec 2018, at 01:24, Aaron Lun
> <infinite.monkeys.with.keyboa...@gmail.com> wrote:
>
> Yes, I had noticed the vignette.rds as well, and I figured that would be a
> problem.
>
> I just tried setting cache=TRUE in my vignettes, implemented such that
> BUILDing each downstream vignette will also run all upstream vignettes on
> which it depends (that haven’t already been compiled). If an upstream
> vignette is run in this manner, it caches the results of each code chunk to
> avoid repeated work when it gets compiled “for real” by R CMD BUILD.
>
> This seems to work on initial inspection (the caches are produced for the
> upstream vignettes upon running one downstream vignette). I’ll have to check
> whether this plays nice with R CMD BUILD. I will probably have to write a
> function to isolate the scope of the execution of each upstream vignette, to
> avoid polluting the namespace and cache of each downstream vignette.
>
> -A
>
>> On 22 Dec 2018, at 19:22, Henrik Bengtsson <henrik.bengts...@gmail.com>
>> wrote:
>>
>> On Sat, Dec 22, 2018 at 10:56 AM Michael Lawrence
>> <lawrence.mich...@gene.com> wrote:
>>>
>>> Anything that eventually lands in inst/doc is a vignette, I think, so
>>> there might be a hack around that.
>>
>> Just so this is not misread - it's *not* possible to just hack your
>> vignette "product" files (PDF or HTML) into inst/doc and think
>> you're good. R keeps track of package vignettes in a "vignette
>> index", e.g.
>>
>> > readRDS(system.file(package = "utils", "Meta", "vignette.rds"))
>>         File              Title        PDF        R Depends Keywords
>> 1 Sweave.Rnw Sweave User Manual Sweave.pdf Sweave.R   tools
>>
>> which is created during 'R CMD build' by parsing and compiling the
>> vignettes
>> (https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/tools/R/build.R#L283-L393).
>> This vignette index is used to find package vignettes (e.g.
>> utils::vignette()) and build the HTML vignette index.
>>
>> Also, one vignette source (e.g. Rnw, Rmd, ...) can only produce one
>> vignette product (PDF or HTML) in the vignette index. You can output
>> other files (e.g. image files) in a relative folder that the vignette
>> references, which is why for instance non-self-contained HTML files
>> work. Thus, one ad-hoc, not-so-nice hack that OP could do is to have
>> a single main vignette that produces and links to all child vignettes.
>> However, personally, I'd aim for using memoization/caching (to file)
>> such that each vignette can be compiled independently of the others
>> (and in any order), while still reusing intermediate
>> results/calculations produced by earlier vignettes.
>>
>> /Henrik
>>
>>>
>>> On Fri, Dec 21, 2018 at 11:26 PM Aaron Lun
>>> <infinite.monkeys.with.keyboa...@gmail.com> wrote:
>>>>
>>>> I gave it a shot:
>>>>
>>>> https://github.com/LTLA/DrakeTest
>>>>
>>>> This uses a single “controller” Rmd file to trigger drake::make().
>>>> Running this file will instruct drake to compile all of the other
>>>> vignettes following the desired dependency structure.
>>>>
>>>> The current sticking point is that I need to move the drake-controlled Rmd
>>>> files out of “vignettes/”, otherwise they’ll just be compiled as usual
>>>> without consideration of their dependencies. This causes problems as R CMD
>>>> BUILD only recognizes the controller Rmd file as the sole vignette, and
>>>> doesn’t retain or index the HTML files produced from the other Rmd files
>>>> as side-effects of running the controller.
>>>>
>>>> Are there any better ways to subvert the vignette building procedure to
>>>> get the desired effect of running drake::make() and recognition of the
>>>> resulting HTMLs as vignettes?
>>>>
>>>> -A
>>>>
>>>>> On 18 Dec 2018, at 17:41, Michael Lawrence <lawrence.mich...@gene.com>
>>>>> wrote:
>>>>>
>>>>> Sounds like a use case for drake...
>>>>>
>>>>> On Tue, Dec 18, 2018 at 6:58 AM Aaron Lun
>>>>> <infinite.monkeys.with.keyboa...@gmail.com> wrote:
>>>>> @Michael In this case, the resource produced by vignette X is a
>>>>> SingleCellExperiment object containing the results of various processing
>>>>> steps (normalization, clustering, etc.) described in that vignette.
>>>>>
>>>>> I can imagine a lazy evaluation model for this, but it wouldn’t be
>>>>> pretty. If I had another vignette Y that depended on the SCE produced by
>>>>> vignette X, I would need Y to execute all of the steps in X if X hadn’t
>>>>> already been run before Y. This gets us into the territory of
>>>>> Makefile-like dependencies, which seems even more complicated than simply
>>>>> specifying a compilation order.
>>>>>
>>>>> You might ask why X and Y are split into two separate vignettes.
>>>>> The use of different vignettes is motivated by the complexity of the
>>>>> workflows:
>>>>>
>>>>> - Vignette 1 demonstrates core processing steps for one read-based
>>>>>   single-cell RNAseq dataset.
>>>>> - Vignette 2 demonstrates (slightly different) core steps for a UMI-based
>>>>>   dataset.
>>>>> - … so on for a bunch of other core steps for different types of data.
>>>>> - Vignette 6 demonstrates extra optional steps for the two SCEs produced
>>>>>   by vignettes 1 & 3.
>>>>> - … and so on for a bunch of other optional steps.
>>>>>
>>>>> The separation between core and optional steps into separate documents is
>>>>> desirable. From a pedagogical perspective, I would very much like to get
>>>>> the reader through all the core steps before even considering the extra
>>>>> steps, which would just be confusing if presented so early on.
>>>>> Previously, everything was in a single document, which was difficult to
>>>>> read (for users) and to debug (for me), especially because I had to use
>>>>> contrived variable names to avoid clashes between different sections of
>>>>> the workflow that did similar things.
>>>>>
>>>>> @Martin I’ve been using BiocFileCache for all of the online resources
>>>>> that are used in the workflow. However, this is only for my (and the
>>>>> reader’s) convenience. I use a local cache rather than the system
>>>>> default, to ensure that the downloaded files are removed after package
>>>>> build. This is intentional as it forces the package builder to try to
>>>>> re-download resources when compiling the vignette, thus ensuring the
>>>>> validity of the URLs. For a similar reason, I would prefer not to cache
>>>>> the result objects for use in different R sessions. I could imagine
>>>>> caching the result objects for use by a different vignette in the same
>>>>> build session, but this gets back to the problem of ensuring that the
>>>>> result object is generated by one vignette before it is needed by another
>>>>> vignette.
>>>>>
>>>>> -A
>>>>>
>>>>>> On 18 Dec 2018, at 14:14, Martin Morgan <mtmorgan.b...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Also perhaps using BiocFileCache so that the result object is only
>>>>>> generated once, then cached for future (different session) use.
>>>>>>
>>>>>> On 12/18/18, 8:35 AM, "Bioc-devel on behalf of Michael Lawrence"
>>>>>> <bioc-devel-boun...@r-project.org on behalf of
>>>>>> lawrence.mich...@gene.com> wrote:
>>>>>>
>>>>>> I would recommend against dependencies across vignettes. Ideally someone
>>>>>> can pick up a vignette and execute the code independently of any other
>>>>>> documentation. Perhaps you could move the code generating those shared
>>>>>> resources to the package. They could behave lazily, only generating the
>>>>>> resource if necessary, otherwise reusing it. That would also make it easy
>>>>>> for people to write their own documents using those resources.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>> On Tue, Dec 18, 2018 at 5:22 AM Aaron Lun <
>>>>>> infinite.monkeys.with.keyboa...@gmail.com> wrote:
>>>>>>
>>>>>>> In a number of my workflow packages (e.g., simpleSingleCell), I rely on a
>>>>>>> specific compilation order for my vignettes. This is because some
>>>>>>> vignettes set up resources or objects that are to be used by later
>>>>>>> vignettes.
>>>>>>>
>>>>>>> From what I understand, vignettes are compiled in alphanumeric ordering of
>>>>>>> their file names.
>>>>>>> As such, I give my vignettes fairly structured names,
>>>>>>> e.g., “work-1-reads.Rmd”, “work-2-umi.Rmd” and so on.
>>>>>>>
>>>>>>> However, it becomes rather annoying when I want to add a new vignette in
>>>>>>> the middle somewhere. This results in some unnatural numberings, e.g.,
>>>>>>> “work-0”, “3b”, which are ugly and unintuitive. This is relevant as
>>>>>>> BiocStyle::Biocpkg() links between vignettes require you to use the
>>>>>>> destination vignette’s file name; so difficult names complicate linking,
>>>>>>> especially if the names continually change to reflect new orderings.
>>>>>>>
>>>>>>> Is there an easier way to control vignette compilation order? WRE provides
>>>>>>> no (obvious) guidance, so I would like to know what non-standard hacks are
>>>>>>> known to work on the build machines. I can imagine something dirty whereby
>>>>>>> one “reference” vignette contains code to “rmarkdown::render” all other
>>>>>>> vignettes in the specified order… ugh.
>>>>>>>
>>>>>>> -A
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel

	[[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel