Re: [Bioc-devel] Reducing dependencies

2020-06-03 Thread Sean Davis
GitHub Actions offers several advantages over Travis CI, including longer
sessions and more resources. In addition, applying package caching to a
GitHub Actions workflow can essentially eliminate the time associated with
package installation after the first build. See here for an example (that
has some extra bells and whistles):

https://github.com/seandavi/BuildABiocWorkshop2020/blob/master/.github/workflows/basic_checks.yaml#L9-L21

Sean


On Tue, Jun 2, 2020 at 5:45 PM Koen Van den Berge wrote:

> Dear All,
>
> We have recently extended our Bioconductor package tradeSeq <
> https://bioconductor.org/packages/devel/bioc/html/tradeSeq.html> to allow
> different input formats and accommodate extended downstream analyses, by
> building on other R/Bioconductor packages.
> However this has resulted in a significant increase in the number of
> dependencies due to relying on other packages that also have many
> dependencies, for example causing very long build times on Travis <
> https://travis-ci.com/github/statOmics/tradeSeq>.
>
> We are therefore wondering about current recommendations to reduce the
> dependency load. We have moved some larger packages from ‘Imports’ to
> ‘Suggests’, but to no avail.
>
> Best,
> Koen
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Sean Davis, MD, PhD
Center for Cancer Research
National Cancer Institute
National Institutes of Health
Bethesda, MD 20892
https://seandavi.github.io/
https://twitter.com/seandavis12



Re: [Bioc-devel] Reducing dependencies

2020-06-03 Thread Aaron Lun

> We have recently extended our Bioconductor package tradeSeq
> <https://bioconductor.org/packages/devel/bioc/html/tradeSeq.html> to allow
> different input formats and accommodate extended downstream analyses, by
> building on other R/Bioconductor packages.


I would guess that the problem starts here. Having a mega-package that 
does everything but walk the dog is antithetical to the Bioconductor 
philosophy of an ecosystem of interoperable packages.


From what you've described, a better architecture would be to have a 
separate package to convert multiple formats into a standard format 
(e.g., SCE), use tradeSeq to do the number crunching, and then emit 
another standard format for downstream methods to operate on.


This is compartmentalized for easier development and maintenance; 
reduces dependencies for all packages; and provides multiple entry 
points for other packages to use part or all of your workflow.


If you need to demonstrate how to use all of these packages in tandem to 
answer a complex scientific question, a vignette or book is usually 
better than writing wrappers. Teach a user to fish, etc.



> However this has resulted in a significant increase in the number of
> dependencies due to relying on other packages that also have many
> dependencies, for example causing very long build times on Travis
> <https://travis-ci.com/github/statOmics/tradeSeq>.


Just get rid of all the tidyverse packages, you don't really need those.


> We are therefore wondering about current recommendations to reduce the
> dependency load. We have moved some larger packages from ‘Imports’ to
> ‘Suggests’, but to no avail.


I consider plots to be an optional functionality of any package doing 
serious computation. Very few of the packages I am involved in have 
plotting functionality (unless that is their primary purpose, e.g., 
iSEE). In fact, the only one I can recall is SingleR, and I was dragged 
kicking and screaming into including plotting functions there. Even so, 
I shoved all the plot-related packages into "Suggests:" because I 
couldn't stand the thought of always importing them for the sake of art.


tl;dr chuck ggplot2 into "Suggests:" and shave off ~20 dependencies. Or 
even better, make a new package for "trajectory-related plots" and then 
other people can use them even if they don't care for tradeSeq's math.
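
A minimal sketch of what the "Suggests:" route might look like in package code, assuming ggplot2 is listed under Suggests in DESCRIPTION. The function name plotCurves() and its arguments are hypothetical and not part of tradeSeq; requireNamespace() is the standard base R check for an optional package.

```r
# Hypothetical plotting helper for a package that only *suggests* ggplot2.
# 'plotCurves' and its arguments are made up for illustration.
plotCurves <- function(x, y) {
  # Fail with a clear message if the optional dependency is missing.
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("Package 'ggplot2' is needed for plotting; please install it.",
         call. = FALSE)
  }
  df <- data.frame(x = x, y = y)
  # Use fully qualified calls so nothing needs to be imported in NAMESPACE.
  ggplot2::ggplot(df, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_line()
}
```

With this pattern the computational functions load without ggplot2, and only users who call the plotting helper need it installed.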


-A



Re: [Bioc-devel] Reducing dependencies

2020-06-03 Thread Robert Castelo

hi Koen,

you can do some analysis of the dependencies using the BiocPkgTools 
package as follows:


library(BiocPkgTools)

depdf <- buildPkgDependencyDataFrame(repo=c("BioCsoft", "CRAN"),
                                     dependencies=c("Depends", "Imports"))
## if you get this error
##
## Error in readRDS(gzcon(con)) :
##   cannot open the connection to
##   'https://packagemanager.rstudio.com/all/__linux__/bionic/latest/web/packages/packages.rds'
##
## please change the CRAN mirror and choose anything but RStudio, by
## doing ..


chooseCRANmirror()

## then call the function 'pkgDepMetrics()'
pdm <- pkgDepMetrics("tradeSeq", depdf)
pdm
                     ImportedAndUsed Exported Usage DepOverlap DepGainIfExcluded
S4Vectors                          1      275  0.36       0.09                 0
dplyr                              2      261  0.77       0.30                 4
mgcv                               3      172  1.74       0.13                 0
ggplot2                           10      504  1.98       0.53                11
magrittr                           1       35  2.86       0.01                 0
BiocParallel                       2       67  2.99       0.12                 6
pbapply                            1       17  5.88       0.03                 1
SummarizedExperiment               6       79  7.59       0.32                 0
SingleCellExperiment               5       55  9.09       0.33                 0
slingshot                          4       23 17.39       0.43                 3
princurve                          1        5 20.00       0.08                 0
Biobase                           NA      128    NA       0.08                 0
edgeR                             NA      234    NA       0.13                 3
matrixStats                       NA      105    NA       0.01                 0
RColorBrewer                      NA        4    NA       0.01                 0
tibble                            NA       42    NA       0.24                 0

in the help page of 'pkgDepMetrics' and the section "7 Dependency 
burden" from the BiocPkgTools vignette, you can find a description of 
these columns, but essentially we see that 'ggplot2' is the dependency 
that has the largest overlap with the dependency graph of 'tradeSeq' 
(DepOverlap 0.53) and by removing it you would have the largest 
reduction in dependencies (DepGainIfExcluded 11). however, you're also 
using 10 functions from this package so this is not a dependency you can 
easily replace. you can try to explore whether you could get rid of the 
dependencies for which 'BiocPkgTools' could not identify the 
functionality imported, which are those with NA values in the column 
'Usage'. you can explore what functions you're actually using with 
'pkgDepImports()', for instance:


imp <- pkgDepImports("tradeSeq")
imp[imp$pkg %in% "dplyr", ]
# A tibble: 2 x 2
  pkg   fun
  <chr> <chr>
1 dplyr filter
2 dplyr mutate

this means that if you avoided using 'filter()' and 'mutate()', you 
could in principle remove 'dplyr' as a dependency.
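
as a rough illustration of what replacing those two calls could look like, here is a small base R sketch. the data frame and its column names are invented for the example and have nothing to do with tradeSeq's actual code.

```r
# Toy data frame; the column names are made up for illustration.
df <- data.frame(gene = c("a", "b", "c"),
                 pvalue = c(0.01, 0.20, 0.03),
                 stringsAsFactors = FALSE)

# Base R stand-in for dplyr::filter(df, pvalue < 0.05).
# which() drops rows where the comparison is NA, as filter() does.
sig <- df[which(df$pvalue < 0.05), , drop = FALSE]

# Base R stand-in for dplyr::mutate(sig, logp = -log10(pvalue)).
sig$logp <- -log10(sig$pvalue)

print(sig)
```

one difference to keep in mind is that base R subsetting keeps the original row names, whereas 'filter()' renumbers the rows.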


you also mentioned below that you moved packages from imports to 
suggests. to do this kind of analysis including packages in 'suggests', 
you need to call 'buildPkgDependencyDataFrame()' again, adding 
'Suggests' to the 'dependencies' argument, and then call 
'pkgDepMetrics()'. however, i guess the packages in suggests are used 
only in the vignette, so the solution there would be to try to simplify 
the vignette.
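
a sketch of that second pass, wrapped in a helper so nothing is downloaded until it is called. the helper name 'suggestsMetrics' is made up; the two BiocPkgTools calls are the same ones used above, with only the 'dependencies' argument extended.

```r
# Hypothetical helper: dependency metrics that also count 'Suggests'.
# Calling it needs BiocPkgTools and network access to the repositories.
suggestsMetrics <- function(pkg) {
  if (!requireNamespace("BiocPkgTools", quietly = TRUE)) {
    stop("Package 'BiocPkgTools' is needed for this analysis.", call. = FALSE)
  }
  # Same call as before, with "Suggests" added to the dependency types.
  depdf <- BiocPkgTools::buildPkgDependencyDataFrame(
    repo = c("BioCsoft", "CRAN"),
    dependencies = c("Depends", "Imports", "Suggests")
  )
  BiocPkgTools::pkgDepMetrics(pkg, depdf)
}

# Example call (not run here because it downloads repository metadata):
# suggestsMetrics("tradeSeq")
```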


cheers,

robert.


On 02/06/2020 23:18, Koen Van den Berge wrote:

> Dear All,
>
> We have recently extended our Bioconductor package tradeSeq
> <https://bioconductor.org/packages/devel/bioc/html/tradeSeq.html> to allow
> different input formats and accommodate extended downstream analyses, by
> building on other R/Bioconductor packages.
> However this has resulted in a significant increase in the number of
> dependencies due to relying on other packages that also have many
> dependencies, for example causing very long build times on Travis
> <https://travis-ci.com/github/statOmics/tradeSeq>.
>
> We are therefore wondering about current recommendations to reduce the
> dependency load. We have moved some larger packages from ‘Imports’ to
> ‘Suggests’, but to no avail.
>
> Best,
> Koen


Re: [Bioc-devel] Reducing dependencies

2020-06-02 Thread Henrik Bengtsson
RStudio provides pre-built R packages for Linux and, as of a few weeks
ago, they can be used on GitHub Actions
(https://github.com/r-lib/actions).  In addition, the run-time limit
on GitHub Actions is several hours compared to the 50 minutes you've
got on Travis, so even if you install from source, you're less likely
to hit these limits on GitHub Actions.

Also, you might be able to tweak/trick Travis into installing the above
Linux binary packages.
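
A minimal sketch of pointing R at such a binary repository, assuming an Ubuntu 18.04 ("bionic") build machine. The URL is the RStudio Package Manager address that shows up in the error message quoted elsewhere in this thread; whether binaries rather than source tarballs are actually served can depend on the HTTP user agent R sends, so treat this as a starting point rather than a tested CI recipe.

```r
# Assumed repository URL for Ubuntu 18.04 ("bionic") binaries.
rspm <- "https://packagemanager.rstudio.com/all/__linux__/bionic/latest"

# Make it the CRAN repository for this session, keeping any other repos.
repos <- getOption("repos")
repos["CRAN"] <- rspm
options(repos = repos)

# Later calls such as install.packages("ggplot2") would now use this
# repository instead of a source-only CRAN mirror.
print(getOption("repos")["CRAN"])
```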

My $.02

/Henrik

On Tue, Jun 2, 2020 at 2:45 PM Koen Van den Berge wrote:
>
> Dear All,
>
> We have recently extended our Bioconductor package tradeSeq
> <https://bioconductor.org/packages/devel/bioc/html/tradeSeq.html> to allow
> different input formats and accommodate extended downstream analyses, by
> building on other R/Bioconductor packages.
> However this has resulted in a significant increase in the number of
> dependencies due to relying on other packages that also have many
> dependencies, for example causing very long build times on Travis
> <https://travis-ci.com/github/statOmics/tradeSeq>.
>
> We are therefore wondering about current recommendations to reduce the 
> dependency load. We have moved some larger packages from ‘Imports’ to 
> ‘Suggests’, but to no avail.
>
> Best,
> Koen


[Bioc-devel] Reducing dependencies

2020-06-02 Thread Koen Van den Berge
Dear All,

We have recently extended our Bioconductor package tradeSeq
<https://bioconductor.org/packages/devel/bioc/html/tradeSeq.html> to allow
different input formats and accommodate extended downstream analyses, by
building on other R/Bioconductor packages.
However this has resulted in a significant increase in the number of
dependencies due to relying on other packages that also have many dependencies,
for example causing very long build times on Travis
<https://travis-ci.com/github/statOmics/tradeSeq>.

We are therefore wondering about current recommendations to reduce the 
dependency load. We have moved some larger packages from ‘Imports’ to 
‘Suggests’, but to no avail.

Best,
Koen