[Bioc-devel] Request to add maintainers for diffHic

2024-03-25 Thread Aaron Lun
Could Hannah and Gordon (in cc) be given push access to Bioc's diffHic 
repository? Note, this is in addition to my current push access, as I 
will be responsible for the large body of C++ code still in the package.


Thanks,

-A

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Python module "tensorflow_probability" not found

2023-07-08 Thread Aaron Lun

Not hard to have OS-specific environments, see for example:

https://github.com/alanocallaghan/snifter/blob/devel/R/basilisk.R
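
For instance, a minimal sketch of the idea (the pins and names below are
illustrative, not the actual snifter manifest):

    # Choose conda packages based on the OS at environment-definition time:
    pkgs <- if (.Platform$OS.type == "windows") {
        c("tensorflow=2.10.*")  # older pin where newer Windows builds are missing
    } else {
        c("tensorflow=2.11.*")
    }
    env <- basilisk::BasiliskEnvironment("tf_env", pkgname="mypkg", packages=pkgs)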

-A

On 7/6/23 20:23, Kasper Daniel Hansen wrote:

This sounds excellent Kim!

Here you can get 2.10 for Windows: https://anaconda.org/anaconda/tensorflow,
although in my experience mixing channels on conda is a pain. It is also
quite interesting that this conda package likewise has an older version on
Windows (though only 2.10 vs. 2.12).

This really speaks to a potential need for having basilisk dependencies
being platform specific. This would also come in handy for macOS. But
AFAIK, this is not supported by basilisk currently. Might be something we
need to address.

Best,
Kasper



On Thu, Jul 6, 2023 at 5:38 PM Kim Philipp Jablonski <
kim.philipp.jablon...@gmail.com> wrote:


Thank you all so much for your input and the references!

@Kasper: I mostly rely on tensorflow and tensorflow-probability, so I might
somehow get rid of the keras dependency but it would require some work.

After being inspired by the lovely orthos package (thanks Charlotte!), I
decided to play around further in the basilisk direction and updated my
project (https://github.com/cbg-ethz/pareg/tree/feature-basilisk).
The hardest part was figuring out a set of package versions which satisfy
conda's package manager (@Vincent, I feel you!).
But then it just magically worked on my local machine.

When testing with GitHub Actions, the windows runner crashes with
PackagesNotFoundError: The following packages are not available from
current channels:
   - tensorflow=2.11.1
A look at conda-forge (https://anaconda.org/conda-forge/tensorflow)
reveals
that for win-64, there's only v1.14.0 available... I guess ignoring the
windows build is not an option for my bioc package?

For the ubuntu runner, my vignettes were created successfully ("* creating
vignettes ... OK"). My tests still fail, but that is expected because I
have not wrapped them with basiliskRun. Do I have to do this manually for
every function call which may interact with tensorflow (so much
boilerplate), or can I somehow implicitly use the created conda env for
every function in my package?

On Thu, Jul 6, 2023 at 2:08 PM Vincent Carey 
wrote:


That's great news.  FWIW I am finding that the advice at
https://rstudio.github.io/reticulate/articles/python_dependencies.html
can work to produce properly resolved python dependencies.  Just don't
follow the example literally; the requested
scipy version may not exist.  Version 1.11.1 does. Stay tuned.

On Thu, Jul 6, 2023 at 7:43 AM Charlotte Soneson <charlottesone...@gmail.com>
wrote:


Hi,

in case it's useful: we have a package (orthos) in review
(https://github.com/Bioconductor/Contributions/issues/3042) which uses
basilisk to set up a conda environment with tensorflow and keras. It builds
and runs fine both on GitHub Actions (GitHub repo here:
https://github.com/fmicompbio/orthos) and on the Single Package Builder.
We have also tested (locally) that it will use GPUs if available, and that
the GPU configuration can be controlled from the R session (outside of the
package), e.g. by setting the CUDA_VISIBLE_DEVICES environment variable.


Charlotte


On 5 Jul 2023, at 23:12, Kasper Daniel Hansen <kasperdanielhan...@gmail.com>
wrote:


So I think Kim is interfacing to tensorflow by using the keras package from
CRAN (partly authored by the Rstudio people). This package leaves it to the
user to install tensorflow, which is a highly non-trivial installation task.
There are some partly helpful instructions for using conda together with
reticulate (see the macOS tab on
https://tensorflow.rstudio.com/install/local_gpu.html). This is the job that
basilisk handles for you. In essence, basilisk allows the developer to
specify an R-package-specific conda. Tensorflow can be run on a CPU or a
GPU. Getting it to run on a user-GPU is extra complicated and I am not sure
basilisk can handle this.

Going forward, we (Bioc) want to decide if we want to support keras on our
build system. This will require some work, because it is definitely not
trivial to get to work (but much more possible if we limit ourselves to
running on CPU). If we decide to support keras, we should try to figure out
how to wrap keras into a basilisk container; perhaps something like creating
a keras-basilisk R package, because IF we decide to support keras, this is
going to be a major headache (to add to the frustration, tensorflow often
rearranges everything so I foresee future issues keeping it operational).

For Kim: I think you should consider if there are any alternatives to keras.
Even if we get it to work on our build system, users will have a major
headache getting this to work (I think). If there are no alternatives to
keras, you should perhaps think about doing the keras-basilisk option I
outline above (assuming that is feasible; I don't know how keras interfaces
with tensorflow. You might also have major

[Bioc-devel] Updates to basilisk on BioC-devel

2022-09-08 Thread Aaron Lun
basilisk 1.9.6 on BioC-devel now uses an updated version of the
Miniconda installer, in order to support the new Arm-based Macs. The
update of the conda version is accompanied by a change in the default
Python version, which is now 3.8 as opposed to 3.7.

For developers of basilisk client packages, there are a few things to
watch out for:

1) If you didn't pin the Python version in your BasiliskEnvironment
manifest, you may be encountering BioC build system failures due to
version resolution failures, specifically between Python 3.8 and your
requested Python packages. There are two solutions:

- Pin the Python version to 3.7 in the manifest.
- Update your manifest to use 3.8-compatible versions of requested
Python packages.

Either of these would be fine, depending on how conservative you want to be.
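
As a sketch, a pinned manifest might look like this (package names and
versions are illustrative only):

    env <- basilisk::BasiliskEnvironment("env1", pkgname="mypkg",
        packages=c("python=3.7", "pandas=1.0.5"))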

2) basiliskStart and basiliskRun provide a new testload= option to
catch GLIBCXX errors from mismatches with the system shared libraries.
Upon error, basilisk will fall back to an internal copy of R to run
the requested function. Typical usage is to set something like
'testload="scipy.optimize"' in your basiliskStart/Run calls to catch
common scipy loading errors. See ?basiliskStart and
https://github.com/LTLA/basilisk/issues/20 for more details.
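
For example (a sketch; 'env' is your BasiliskEnvironment, and the toy scipy
call is just for illustration):

    proc <- basiliskStart(env, testload="scipy.optimize")
    on.exit(basiliskStop(proc))
    out <- basiliskRun(proc, function() {
        scipy <- reticulate::import("scipy.optimize")
        scipy$brentq(function(x) x^2 - 2, 0, 2)  # root of x^2 - 2 on [0, 2]
    })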

3) basiliskRun provides a new persist= option to more robustly persist
variables across calls for the same process. This avoids
inefficiencies from continually transferring data/results between
processes for multi-step workflows. See the BioC-devel vignette for
more details.
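
A sketch of the pattern from the vignette, as I understand it (the helper
functions in the bodies are hypothetical):

    proc <- basiliskStart(env)
    basiliskRun(proc, function(store) {
        store$model <- expensive_setup()  # hypothetical; kept alive in 'store'
        invisible(NULL)
    }, persist=TRUE)
    res <- basiliskRun(proc, function(store) {
        predict_with(store$model)  # hypothetical; reuses the persisted object
    }, persist=TRUE)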

-A

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] C++ parallel computing

2021-05-26 Thread Aaron Lun
Incidentally, I was reflecting on this topic the other day and was 
wondering whether BiocParallel could have something like OpenMPParam() 
that sets the number of threads to some non-zero value via 
omp_set_num_threads(). This would provide a consistent framework through 
which users could control OpenMP behavior in suitably written functions.


One could even imagine having a composition design where a caller could 
assemble a BPPARAM object like:


bplapply(..., BPPARAM=OpenMPParam(SnowParam(5), 2))

which tells bplapply to spin up 5 workers, each of which is allowed 
to use up to 2 threads. Implementation-wise, it would be a 
relatively simple matter of stuffing an extra set-up command into 
.composeTry; the nthread-setting code can be borrowed from ShortRead.
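
In the meantime, a caller can approximate the same effect by hand inside the
worker function (a sketch; set_threads() stands in for whatever wrapper
around omp_set_num_threads() a package might export, and both mypkg
functions are hypothetical):

    library(BiocParallel)
    res <- bplapply(seq_len(10), function(i) {
        mypkg::set_threads(2)        # hypothetical omp_set_num_threads() wrapper
        mypkg::heavy_openmp_fun(i)   # hypothetical OpenMP-enabled function
    }, BPPARAM=SnowParam(5))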


For context: I am planning on moving more parallelization in my packages 
into OpenMP to get around the overhead of the other backends. Forking is 
the only approach that is remotely fast enough, but the interaction of 
forks with the GC is too chaotic in memory-limited environments.


-A

On 5/25/21 10:39 AM, Martin Morgan wrote:

If the BAM files are each processed independently, and each processing task 
takes a while, then it is probably 'good enough' to use R-level parallel 
evaluation using BiocParallel (currently the recommendation for Bioconductor 
packages) or other evaluation framework. Also, presumably you will use Rhtslib, 
which provides C-level access to the hts library. This will require writing C 
/ C++ code to interface between R and the hts library, and will of course be a 
significant undertaking.
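
For the R-level option, a minimal sketch of the one-worker-per-BAM pattern
(file paths illustrative; countBam() stands in for the real processing):

    library(BiocParallel)
    bams <- c("sample1.bam", "sample2.bam")
    res <- bplapply(bams, function(bam) {
        Rsamtools::countBam(bam)  # any independent per-file processing step
    }, BPPARAM=MulticoreParam(workers=2))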

It might be worth outlining in a bit more detail what your task is and how (not 
too much detail!) you've tried to implement this in Rsamtools.

Martin Morgan

On 5/24/21, 10:01 AM, "Bioc-devel on behalf of Oleksii Nikolaienko" 
 wrote:

 Dear Bioc team,
 I'd like to ask for your advice on the parallelization within a Bioc
 package. Please point me to a better place if this mailing list is not
 appropriate.
 After a bit of thinking I decided that I'd like to parallelize processing
 at the level of C++ code. Would you strongly recommend not to and use an R
 approach instead (e.g. "future")?
 If parallel C++ is ok, what would be the best solution for all major OSs?
 My initial choice was OpenMP, but then it seems that Apple has something
 against it (https://mac.r-project.org/openmp/). My own dev environment is
 mostly Big Sur/ARM64, but I wouldn't want to drop its support anyway.

 (On the actual task: loading and specific processing of very large BAM
 files, ideally significantly faster than by means of Rsamtools as a 
backend)

 Best,
 Oleksii Nikolaienko

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Updates to BiocFileCache, AnnotationHub, and ExperimentHub

2021-04-07 Thread Aaron Lun
> There is no guarantee we would be under the right user to have permissions to 
> move the cache automatically, and we would not want to leave it in a broken state.

Well, can't you try? If people follow your 4.1 instructions and they
don't have permissions, the cache will be broken anyway.

But let's say you can't move it, and your worst-case scenario comes to
pass. EVEN THEN: I would expect a deprecation warning, no error, and
BiocFileCache continuing to pull from the old cache for 6 months.

Every previous non-transparent change to BioC's core infrastructure
has come with a deprecation warning. I don't see why this is any
different. An error is particularly galling given that the package was
working fine before, it's not like you're doing some kind of critical
bugfix.

> This should not affect any cache that is explicitly stated with a different 
> name in the constructor or using environment variables; the error only arises 
> for the default BiocFileCache().  Most package-specific caches create their own 
> cache in the constructor, so it should not cause the ERROR in that case.

If Vince's last email is any indication, and calling ExperimentHub()
or AnnotationHub() causes an error... this will be a disaster. I'm
going to get a lot of emails, unnecessary emails, from users wondering
why scRNAseq and celldex don't work anymore. It'll be like our
AWS-China problems multiplied by 10.

Why not just make a new cache and populate it? Well, I don't really
care what you do, as long as I don't get an error.

-A

> ________
> From: Aaron Lun 
> Sent: Wednesday, April 7, 2021 11:41 AM
> To: Kern, Lori 
> Cc: bioc-devel@r-project.org 
> Subject: Re: [Bioc-devel] Updates to BiocFileCache, AnnotationHub, and 
> ExperimentHub
>
> Woah, I missed the part where you said that there would be an error.
>
> This does not sound good. Users are going to flip out, especially when
> EHub and AHub are not visible dependencies (e.g., scRNAseq, celldex).
> It also sounds completely unnecessary for EHub and AHub given that the
> new cache can just be populated by fresh downloads. Similarly,
> BiocFileCache::bfcrpath should not be affected, and people using that
> shouldn't be getting an error.
>
> Why not just move the old default cache into the new location
> automatically? This seems like the simplest solution given that
> everyone accessing BFC resources should be doing so through the BFC
> API. And most files are not position-dependent, unless people are
> putting shared libraries in there.
>
> But even if you can't, an error is just too much. We use BiocFileCache
> a lot in our company infrastructure and the brown stuff will hit the
> fan if we have to find every old default cache and delete it. The
> package should handle this for us.
>
> -A
>
> On Wed, Apr 7, 2021 at 4:46 AM Kern, Lori  
> wrote:
> >
> > Mostly to lighten the dependency tree: using tools, which is built into R, 
> > would remove one additional dependency.  Also clarity: the tools directory 
> > adds an R folder to make it clear that the files are used with R packages, 
> > so if a user were ever investigating, they would have a better idea 
> > where those files came from.
> >
> >
> > Lori Shepherd
> >
> > Bioconductor Core Team
> >
> > Roswell Park Comprehensive Cancer Center
> >
> > Department of Biostatistics & Bioinformatics
> >
> > Elm & Carlton Streets
> >
> > Buffalo, New York 14263
> >
> > 
> > From: Bioc-devel  on behalf of Aaron Lun 
> > 
> > Sent: Wednesday, April 7, 2021 4:10 AM
> > To: bioc-devel@r-project.org 
> > Subject: Re: [Bioc-devel] Updates to BiocFileCache, AnnotationHub, and 
> > ExperimentHub
> >
> > rebook and basilisk are also currently using rappdirs. I would be
> > interested in the motivation behind the switch for the Hubs and whether
> > that is applicable to those two packages as well.
> >
> > -A
> >
> > On 4/5/21 6:41 AM, Kern, Lori wrote:
> > > We are in process of making some major updates to the caching in 
> > > BiocFileCache, AnnotationHub, and ExperimentHub.  Namely, the default 
> > > caching location will change from using rappdirs::user_cache_dir   to 
> > > using  tools::R_user_dir  eventually relieving the dependency on 
> > > rappdirs.  To avoid conflicting default caches, if anyone used an old 
> > > default caching directory, there will be an error to decide how to deal 
> > > with the old location before proceeding and documentation in the 
> > > vignettes for how to resolve.  Currently I have updated BiocFileCache; the 
> > > changes were just pushed to the devel branch and should propagate tonight.

Re: [Bioc-devel] Updates to BiocFileCache, AnnotationHub, and ExperimentHub

2021-04-07 Thread Aaron Lun
Woah, I missed the part where you said that there would be an error.

This does not sound good. Users are going to flip out, especially when
EHub and AHub are not visible dependencies (e.g., scRNAseq, celldex).
It also sounds completely unnecessary for EHub and AHub given that the
new cache can just be populated by fresh downloads. Similarly,
BiocFileCache::bfcrpath should not be affected, and people using that
shouldn't be getting an error.

Why not just move the old default cache into the new location
automatically? This seems like the simplest solution given that
everyone accessing BFC resources should be doing so through the BFC
API. And most files are not position-dependent, unless people are
putting shared libraries in there.

But even if you can't, an error is just too much. We use BiocFileCache
a lot in our company infrastructure and the brown stuff will hit the
fan if we have to find every old default cache and delete it. The
package should handle this for us.

-A

On Wed, Apr 7, 2021 at 4:46 AM Kern, Lori  wrote:
>
> Mostly to lighten the dependency tree: using tools, which is built into R, 
> would remove one additional dependency.  Also clarity: the tools directory 
> adds an R folder to make it clear that the files are used with R packages, 
> so if a user were ever investigating, they would have a better idea 
> where those files came from.
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Comprehensive Cancer Center
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> ____
> From: Bioc-devel  on behalf of Aaron Lun 
> 
> Sent: Wednesday, April 7, 2021 4:10 AM
> To: bioc-devel@r-project.org 
> Subject: Re: [Bioc-devel] Updates to BiocFileCache, AnnotationHub, and 
> ExperimentHub
>
> rebook and basilisk are also currently using rappdirs. I would be
> interested in the motivation behind the switch for the Hubs and whether
> that is applicable to those two packages as well.
>
> -A
>
> On 4/5/21 6:41 AM, Kern, Lori wrote:
> > We are in process of making some major updates to the caching in 
> > BiocFileCache, AnnotationHub, and ExperimentHub.  Namely, the default 
> > caching location will change from using rappdirs::user_cache_dir   to using 
> >  tools::R_user_dir  eventually relieving the dependency on rappdirs.  To 
> > avoid conflicting default caches, if anyone used an old default caching 
> > directory, there will be an error to decide how to deal with the old 
> > location before proceeding and documentation in the vignettes for how to 
> > resolve.  Currently I have updated BiocFileCache; the changes were just 
> > pushed to the devel branch and should propagate tonight.  I plan on doing 
> > the same for both AnnotationHub and ExperimentHub within the next few days. 
> >  We appreciate any feedback or questions with regards to these updates.
> >
> > This is only relevant when using the default cache location; if a user 
> > manually specified a unique location, used environment variables, or 
> > created a package-specific cache, the code/location is not affected.  Anyone 
> > using package-specific caching that utilizes rappdirs is also encouraged to 
> > consider changing package code to use the now-available function in tools.
> >
> > Cheers,
> >
> >
> > Lori Shepherd
> >
> > Bioconductor Core Team
> >
> > Roswell Park Comprehensive Cancer Center
> >
> > Department of Biostatistics & Bioinformatics
> >
> > Elm & Carlton Streets
> >
> > Buffalo, New York 14263
> >
> >
> > This email message may contain legally privileged and/or confidential 
> > information.  If you are not the intended recipient(s), or the employee or 
> > agent responsible for the delivery of this message to the intended 
> > recipient(s), you are hereby notified that any disclosure, copying, 
> > distribution, or use of this email message is prohibited.  If you have 
> > received this message in error, please notify the sender immediately by 
> > e-mail and delete this email message from your computer. Thank you.
> >[[alternative HTML version deleted]]
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Updates to BiocFileCache, AnnotationHub, and ExperimentHub

2021-04-07 Thread Aaron Lun
rebook and basilisk are also currently using rappdirs. I would be 
interested in the motivation behind the switch for the Hubs and whether 
that is applicable to those two packages as well.


-A

On 4/5/21 6:41 AM, Kern, Lori wrote:

We are in the process of making some major updates to the caching in 
BiocFileCache, AnnotationHub, and ExperimentHub.  Namely, the default caching 
location will change from using rappdirs::user_cache_dir to using 
tools::R_user_dir, eventually relieving the dependency on rappdirs.  To avoid 
conflicting default caches, if anyone used an old default caching directory, 
there will be an error prompting them to decide how to deal with the old 
location before proceeding, with documentation in the vignettes for how to 
resolve it.  Currently I have updated BiocFileCache; the changes were just 
pushed to the devel branch and should propagate tonight.  I plan on doing the 
same for both AnnotationHub and ExperimentHub within the next few days.  We 
appreciate any feedback or questions with regards to these updates.

This is only relevant when using the default cache location; if a user manually 
specified a unique location, used environment variables, or created a 
package-specific cache, the code/location is not affected.  Anyone using 
package-specific caching that utilizes rappdirs is also encouraged to consider 
changing package code to use the now-available function in tools.

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.
[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Rd] Feature request: adding a log.p= option to stats::p.adjust?

2020-12-14 Thread Aaron Lun

Dear list,

Is there any interest in adding a log.p= option to p.adjust() so that it 
can accept log-transformed p-values and return log-transformed adjusted 
p-values?


I have some functions that, on occasion, return very low p-values. To 
avoid underflow in such cases, I allow my users to set log.p=TRUE, which 
is passed on to pt(), pchisq(), etc. to return log-transformed p-values. 
However, currently, these log-p-values need to be run through exp() 
before being used in p.adjust(). This re-introduces the possibility of 
underflow errors, sometimes leading to adjusted p-values of zero.


The proposed log.p= mode for p.adjust() would perform all operations in 
the log-scale to avoid this - see 
https://github.com/MarioniLab/scran/blob/f9ace20d74c7fdbde613aaf91311cdfc6dbe0feb/R/utils_markers.R#L184-L190 
for an example with the BH correction. If it were available, we could 
conveniently pass along the log.p=TRUE option to p.adjust() as we do for 
pt() and friends, and everything would "just work".
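
For concreteness, a minimal sketch of the BH step done entirely in log-space
(an illustration of the idea, not the scran code linked above):

    log_bh <- function(log.p) {
        o <- order(log.p)
        n <- length(log.p)
        # log(p * n / rank) = log(p) + log(n) - log(rank)
        adj <- log.p[o] + log(n) - log(seq_len(n))
        adj <- rev(cummin(rev(adj)))  # enforce monotonicity, still in log-space
        adj <- pmin(adj, 0)           # cap at log(1) = 0
        adj[order(o)]                 # restore the input order
    }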


If this sounds sensible, I would be happy to put a patch together; the 
modifications involved seem simple enough.


Best,

Aaron

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Bioc-devel] maintainer access to scater

2020-09-07 Thread Aaron Lun

For record-keeping purposes:

We are transitioning maintainership of scater from myself to Alan 
O'Callaghan. Can Alan be given access to the scater BioC git repo?


-A

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Migration of many scater utilities to scuttle

2020-06-03 Thread Aaron Lun

Dear list,

scater's functions have now been split into two packages: the plotting 
functions stay in scater while many of the non-plotting utilities (QC, 
normalization, aggregation) move into the new scuttle package.


This has been done to prune the dependency tree by removing 
visualization-related packages that are not necessary for downstream 
packages relying only on the utilities. Some estimates put the reduction 
in the number of dependencies to be almost 40 packages due to the 
removal of the ggplot2 stack and its various accessories.


All migrated functions in scuttle have been re-exported in scater so no 
action is required from users or developers. The re-exported functions 
in the scater namespace should be considered to be soft-deprecated; 
developers should start to import from scuttle instead, even though a 
hard deprecation will not be done in the foreseeable future. Proactive 
developers may also consider switching their Imports: from scater to 
scuttle in order to prune their own dependency trees.
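
For example, the switch for a downstream package amounts to (logNormCounts
being one of the migrated functions):

    # DESCRIPTION:  Imports: scuttle    # instead of: Imports: scater
    # NAMESPACE:
    importFrom(scuttle, logNormCounts)  # instead of: importFrom(scater, logNormCounts)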


Note that there was a bug in the re-exporting for 1.17.1, so some of 
scater's downstream packages may observe build failures in tomorrow's 
reports. This has been fixed in 1.17.2 so just sit tight.


-A

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Reducing dependencies

2020-06-03 Thread Aaron Lun

We have recently extended our Bioconductor package tradeSeq to allow
different input formats and accommodate extended downstream analyses, by building on 
other R/Bioconductor packages.


I would guess that the problem starts here. Having a mega-package that 
does everything but walk the dog is antithetical to the Bioconductor 
philosophy of an ecosystem of interoperable packages.


From what you've described, a better architecture would be to have a 
separate package to convert multiple formats into a standard format 
(e.g., SCE), use tradeSeq to do the number crunching, and then emit 
another standard format for downstream methods to operate on.


This is compartmentalized for easier development and maintenance; 
reduces dependencies for all packages; and provides multiple entry 
points for other packages to use part or all of your workflow.


If you need to demonstrate how to use all of these packages in tandem to 
answer a complex scientific question, a vignette or book is usually 
better than writing wrappers. Teach a user to fish, etc.



However this has resulted in a significant increase in the number of dependencies due 
to relying on other packages that also have many dependencies, for example causing 
very long build times on Travis.


Just get rid of all the tidyverse packages, you don't really need those.


We are therefore wondering about current recommendations to reduce the 
dependency load. We have moved some larger packages from ‘Imports’ to 
‘Suggests’, but to no avail.


I consider plots to be an optional functionality of any package doing 
serious computation. Very few of the packages I am involved in have 
plotting functionality (unless that is their primary purpose, e.g., 
iSEE). In fact, the only one I can recall is SingleR, and I was dragged 
kicking and screaming into including plotting functions there. Even so, 
I shoved all the plot-related packages into "Suggests:" because I 
couldn't stand the thought of always importing them for the sake of art.
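
The usual pattern for such a Suggests-ed dependency is a run-time check
(a sketch with a made-up function name; 'df' needs 'x' and 'y' columns):

    plotSomething <- function(df) {
        if (!requireNamespace("ggplot2", quietly=TRUE)) {
            stop("install 'ggplot2' to use plotSomething()")
        }
        ggplot2::ggplot(df, ggplot2::aes(x=x, y=y)) + ggplot2::geom_point()
    }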


tl;dr chuck ggplot2 into "Suggests:" and shave off ~20 dependencies. Or 
even better, make a new package for "trajectory-related plots" and then 
other people can use them even if they don't care for tradeSeq's math.


-A

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] RcppAnnoy changed in CRAN from 0.0.14 to 0.0.15, and might have broken some packages

2020-02-27 Thread Aaron Lun

This is a minor RcppAnnoy issue that should be fixed soon, hopefully.

Incidentally, this little episode has highlighted the advantages of 
Bioconductor's release/devel configuration. If the upstream dependency 
was a Bioc package, its updates are unlikely to break downstream 
packages in release. We are only broken in release right now because 
RcppAnnoy is a CRAN package and lies outside the release/devel cycle.


Of course, that is not to say that we shouldn't use CRAN packages. That 
brings me to my second point: BiocNeighbors explicitly serves as a 
wrapper for this dependency, and if I did not have faith in the 
RcppAnnoy maintainers to respond in a timely manner, I could just modify 
BiocNeighbors so that its Annoy functionality diverts to some other 
algorithm. This ensures that downstream packages that were using Annoy 
via BiocNeighbors can continue to operate - albeit with altered results 
due to the change in algorithm, but at least they don't break.


Right now, I do have faith in the RcppAnnoy maintainers so I have 
refrained from modifying BiocNeighbors in BioC-release. But the key 
point is that I can pull the trigger at any time.
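
To illustrate the wrapper pattern: code written against the BiocNeighbors
API can swap backends without touching callers, e.g.

    library(BiocNeighbors)
    X <- matrix(rnorm(10000), ncol=10)  # 1000 points in 10 dimensions
    res1 <- findKNN(X, k=10, BNPARAM=AnnoyParam())  # approximate, via Annoy
    res2 <- findKNN(X, k=10, BNPARAM=KmknnParam())  # exact alternative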


-A

On 2/27/20 12:22 PM, Martin Morgan wrote:

You can't use a package version other than the current release, so 
BiocNeighbors needs to be updated (assuming that's the problem); I'm sure it 
will be. Martin

On 2/28/20, 2:51 AM, "Bioc-devel on behalf of Leonardo Collado Torres" 
 wrote:

 Hi,
 
 When installing BiocNeighbors (1.4.1, latest release from Bioconductor

 version 3.10) from source on linux I noticed the issue as a few hours
 ago it all worked fine but now it doesn't. Locally, I noticed that I
 had to update from RcppAnnoy 0.0.14 to 0.0.15, and while at my macOS I
 can install the BiocNeighbors 1.4.1 binary, I did the actual tests at
 https://github.com/LTLA/BiocNeighbors/issues/10 and noticed that
 RcppAnnoy's change lead to this. I have no idea how RcppAnnoy works,
 but well, maybe if you use it in your package you do.
 
 Now I need to google how to set on my DESCRIPTION to use the CRAN

 archived version of RcppAnnoy 0.0.14 that you can install with
 
 packageurl <- "https://cran.r-project.org/src/contrib/Archive/RcppAnnoy/RcppAnnoy_0.0.14.tar.gz"

 install.packages(packageurl, repos=NULL, type="source")
 ## From 
https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
 
 Best,

 Leo
 
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Updates to SingleCellExperiment class structure - no action required

2020-02-16 Thread Aaron Lun

That would be correct.

On 2/16/20 6:38 AM, Kasper Daniel Hansen wrote:
It sounds to me that the class definition / structure is updated. You 
should also have an associated updateObject() method to deal with this; 
that's tradition. Of course, you may already have addressed this and 
just not written about it in your email.


On Sun, Feb 16, 2020 at 3:51 AM Aaron Lun 
<infinite.monkeys.with.keyboa...@gmail.com> wrote:


Dear list,

As of version 1.9.2, the SingleCellExperiment class structure has been
updated so that "sizeFactors()<-" will store values in a "sizeFactor"
field in the "colData()", rather than the internal metadata. This aims
to improve interoperability of the SCE size factors with functions that
retrieve information from the column metadata; see GitHub for details.

This change occurs under the hood in the "sizeFactors()" getter and
setter functions, so no action should be required from users who were
already using those functions to manipulate the SCE size factors.
Nonetheless, downstream developers should keep an eye on their unit
tests as some of the more fragile checks may fail, e.g., if they
hard-coded the expected column names of the "colData" of an SCE.
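
For example (a minimal illustration of the new storage location):

    library(SingleCellExperiment)
    sce <- SingleCellExperiment(list(counts=matrix(rpois(20, 5), ncol=4)))
    sizeFactors(sce) <- runif(4)
    colData(sce)$sizeFactor  # where the values now live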

Version 1.9.2 also removes a bunch of deprecated functionality from
the
package, which may cause unrelated failures; though this was pretty
esoteric stuff that didn't see a lot of use in the first place.

Best,

-A

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Best,
Kasper


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BiocManager not installing latest release version of DepecheR

2019-11-14 Thread Aaron Lun
I would assume that "BiocManager::install()" won't update an existing
set of BioC packages unless explicitly prompted. If that's the case,
shouldn't the instructions at http://bioconductor.org/install/ specify
"BiocManager::install(version='3.10')"?

On Wed, Nov 13, 2019 at 1:04 PM Shepherd, Lori
 wrote:
>
> BiocManager should recognize versions automatically for the version of R.  I 
> believe we had a message that would be displayed in this case where there was 
> a more recent version available
> Something along the lines of:
> Bioconductor version '3.9' is out-of-date; the current release version '3.10'
>   is available with R version '3.6'; see https://bioconductor.org/install
>
> It is broken in this latest version of BiocManager but will be correct 
> shortly.
>
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Comprehensive Cancer Center
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> ________
> From: Jakob Theorell 
> Sent: Wednesday, November 13, 2019 3:59 PM
> To: Aaron Lun ; Shepherd, Lori 
> 
> Cc: bioc-devel@r-project.org 
> Subject: Re: [Bioc-devel] BiocManager not installing latest release version 
> of DepecheR
>
> Yes, this sounds like a very good solution. Thank you all for your input, and 
> I hope this silly mistake of mine might have some positive consequence.
> Best
> Jakob
> 
> From: Bioc-devel  on behalf of Aaron Lun 
> 
> Sent: 13 November 2019 18:56
> To: Shepherd, Lori 
> Cc: bioc-devel@r-project.org 
> Subject: Re: [Bioc-devel] BiocManager not installing latest release version 
> of DepecheR
>
> Perhaps the installation instructions on each package's landing page
> should explicitly specify 'version=3.10' (or whatever happens to be
> the latest release) in the install() call. This avoids ambiguities
> with using the latest version when the current and previous releases
> are on the same version of R.
>
> -A
>
> On Wed, Nov 13, 2019 at 8:25 AM Shepherd, Lori
>  wrote:
> >
> > It looks like you are still installing release 3.9 versions of packages.  
> > The latest version is release 3.10.
> >
> > If you do
> > BiocManager::version()
> > Does it show "3.9"  or "3.10"?
> >
> > I'm betting "3.9"
> >
> > You can do
> >
> > BiocManager::install(version="3.10")
> > BiocManager::install()
> >
> >
> > To update all packages to 3.10 versions
> >
> >
> > Cheers,
> >
> >
> > Lori Shepherd
> >
> > Bioconductor Core Team
> >
> > Roswell Park Comprehensive Cancer Center
> >
> > Department of Biostatistics & Bioinformatics
> >
> > Elm & Carlton Streets
> >
> > Buffalo, New York 14263
> >
> > 
> > From: Bioc-devel  on behalf of Jakob 
> > Theorell 
> > Sent: Wednesday, November 13, 2019 11:06 AM
> > To: bioc-devel@r-project.org 
> > Subject: [Bioc-devel] BiocManager not installing latest release version of 
> > DepecheR
> >
> > Dear all,
> > This is probably a mistake on my side, but I just now tried to install the 
> > last release version of DepecheR (for which I am the maintainer), and 
> > although the source package on BioConductor is 1.2, this is the text I get 
> > from BiocManager when installing:
> >
> > Bioconductor version 3.9 (BiocManager 1.30.9), R 3.6.1 (2019-07-05)
> > Installing package(s) 'DepecheR'
> > trying URL 
> > 'https://bioconductor.org/packages/3.9/bioc/bin/macosx/el-capitan/contrib/3.6/DepecheR_1.0.3.tgz'
> >
> > This is clearly the old version, that does not contain all the updates that 
> > the source package contains. Have I missed something I should have done to 
> > prevent this and what can I in that case do now?
> > Best regards
> > Jakob Theorell, MD/PhD
> > Autoimmune Neurology Group
> > Nuffield Department of Clinical Neurosciences
> > University of Oxford
> >
> >
> > [[alternative HTML version deleted]]
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> >
> > This email message may contain legally privileged and/or confidential 
> > information.  If you are not the intended recipient(s), or the employee or 
> > agent responsible for the delivery of this message to the intended 
> > recipient(s), you a

Re: [Bioc-devel] BiocManager not installing latest release version of DepecheR

2019-11-13 Thread Aaron Lun
Perhaps the installation instructions on each package's landing page
should explicitly specify 'version=3.10' (or whatever happens to be
the latest release) in the install() call. This avoids ambiguities
with using the latest version when the current and previous releases
are on the same version of R.

-A

On Wed, Nov 13, 2019 at 8:25 AM Shepherd, Lori
 wrote:
>
> It looks like you are still installing release 3.9 versions of packages.  The 
> latest version is release 3.10.
>
> If you do
> BiocManager::version()
> Does it show "3.9"  or "3.10"?
>
> I'm betting "3.9"
>
> You can do
>
> BiocManager::install(version="3.10")
> BiocManager::install()
>
>
> To update all packages to 3.10 versions
>
>
> Cheers,
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Comprehensive Cancer Center
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> 
> From: Bioc-devel  on behalf of Jakob 
> Theorell 
> Sent: Wednesday, November 13, 2019 11:06 AM
> To: bioc-devel@r-project.org 
> Subject: [Bioc-devel] BiocManager not installing latest release version of 
> DepecheR
>
> Dear all,
> This is probably a mistake on my side, but I just now tried to install the 
> last release version of DepecheR (for which I am the maintainer), and 
> although the source package on BioConductor is 1.2, this is the text I get 
> from BiocManager when installing:
>
> Bioconductor version 3.9 (BiocManager 1.30.9), R 3.6.1 (2019-07-05)
> Installing package(s) 'DepecheR'
> trying URL 
> 'https://bioconductor.org/packages/3.9/bioc/bin/macosx/el-capitan/contrib/3.6/DepecheR_1.0.3.tgz'
>
> This is clearly the old version, that does not contain all the updates that 
> the source package contains. Have I missed something I should have done to 
> prevent this and what can I in that case do now?
> Best regards
> Jakob Theorell, MD/PhD
> Autoimmune Neurology Group
> Nuffield Department of Clinical Neurosciences
> University of Oxford
>
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
> This email message may contain legally privileged and/or confidential 
> information.  If you are not the intended recipient(s), or the employee or 
> agent responsible for the delivery of this message to the intended 
> recipient(s), you are hereby notified that any disclosure, copying, 
> distribution, or use of this email message is prohibited.  If you have 
> received this message in error, please notify the sender immediately by 
> e-mail and delete this email message from your computer. Thank you.
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Is magrittr "%>%" acceptable in bioc packages?

2019-08-17 Thread Aaron Lun
Vince's answer gives the official position, to which I'll add my 
personal opinions for my little corner of the world.


Every previous release, I had routine rounds of "depiping" where I go 
through any package that I'm heavily involved in and strip out any %>% 
calls. I think I've gotten the last of them in this release, but one 
can never be too careful. The reasons for this behavior are 2-fold:


- Pipes encourage these long expressions where A %>% B %>% ... %>% Z. 
These are hard to debug. Sure, you could use debug_pipeline() or 
debug_pipe() but then you need to copy the whole chain and wrap it in a 
debug_pipeline() or stick a debug_pipe() in the middle... geez. I just 
want to step through a debug()'d function and look at each step.


And sometimes debug()ing isn't even possible (e.g., S4 method bodies, or 
complex scenarios involving repeated calls to a problematic function). 
In such cases, I spam print() statements on the intermediate objects for 
debugging. If I had pipes, this would require me to break up the chain 
to extract the intermediate construct for interrogation.


- They are hard to read. While technically you only have to think about 
each step of the chain as you go along, the way that it's presented 
means that, in practice, you're forced to reason about the whole chain 
at once. I'd also rather avoid having to even think about the many faces 
of "." when looking at code that I'm not familiar with.
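
Concretely, the depiped style I end up with for the debugging point above
(A/B/C standing in for arbitrary functions):

    out1 <- A(x)
    print(summary(out1))  # trivial to inspect each intermediate
    out2 <- B(out1)
    out3 <- C(out2)
    # versus: x %>% A %>% B %>% C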


The use of %>% chains is tolerable in some cases, e.g., interactive 
analysis code that you're writing from scratch where brevity of 
expression is important. But package code is "write once, read hundreds 
of times" where it helps to be more verbose.


So, to answer your question: not in my backyard.

-A


On 8/17/19 1:52 AM, Vincent Carey wrote:

On Sat, Aug 17, 2019 at 4:40 AM Venu Thatikonda 
wrote:


Hi,

As the title says, is %>% acceptable in bioc packages? With BiocCheck, I
get a WARNING that is "Add non-empty \\value sections to the following man
pages: man/pipe.Rd".



Yes magrittr is acceptable.  The warning says (I think) that you have a man
page for
a topic 'pipe' but did not put a \value section for the man page.  If using
roxygen you
would want to have a @return element in your documentation, to avoid this
warning.



Is it okay even if this warning appears ? `R CMD check` didn't give
warnings about it.



Try to avoid the warning.  If you have difficulty, report back.




Thank you.

--
Best regards
Venu Thatikonda
https://itsvenu.github.io/

 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Build error - C stack usage is too close to the limit

2019-08-02 Thread Aaron Lun
> I can confirm this.  Would it then be appropriate for scMerge to add a
> (>= 1.7.3) after its Imports: entry for SingleCellExperiment?

I don't think this is really necessary; 1.7.3 will propagate soon
enough, at which point people just need to stay updated.

> Basically, before we commit changes to a class, it should not be too hard
> to assess the downstream effects and notify relevant developers.

Well, it *should* have been a transparent change (even for serialized
objects), it's just that there was a bug. This is not surprising - who
knows how many times S4Vectors has broken one of my packages - and my
attitude is to not do anything unless the failure persists for >=3 build
cycles.

-A

On Fri, Aug 2, 2019 at 9:19 AM Pages, Herve  wrote:
>
> This is a really important point.
>
> Finding and updating serialized S4 instances that are lying around
> as they evolve can be painful and very time-consuming.
>
> We should definitely avoid storing serialized S4 objects on the Hub.
> I don't know about ExperimentHub but at least for AnnotationHub I
> believe that we've been careful to avoid storing serialized S4
> instances there. The S4 objects that land on the user space are
> generally assembled on the user side from raw files downloaded from
> the Hub.
>
> H.
>
>
> On 8/2/19 02:58, Vincent Carey wrote:
> > On Thu, Aug 1, 2019 at 10:36 PM Aaron Lun <
> > infinite.monkeys.with.keyboa...@gmail.com> wrote:
> >
> >> One possibility is that this is due to a regression in
> >> SingleCellExperiment, caused by the altexp updates and other
> >> refactoring. This should be fixed in 1.7.3, you can check this for
> >> yourself by installing drisso/SingleCellExperiment off GitHub.
> >>
> >
> > I can confirm this.  Would it then be appropriate for scMerge to add a
> > (>= 1.7.3) after its Imports: entry for SingleCellExperiment?
> >
> >
> >>
> >> The other moral of the story is to not use serialized high-level
> >> objects. Serializing basic objects is fine, but the higher up you go,
> >> the more fragile your code becomes to refactoring. See, for example, the
> >> scRNAseq data package for how to deliver a SingleCellExperiment to an R
> >> session without relying on serialized SingleCellExperiments.
> >>
> >
> > I have run into this issue also.  It is convenient to serialize a high-level
> > object.  Breaking it down to its constituents, and assembling it, is
> > a lot more effort.  scRNAseq:::.create_sce shows how to reassemble for
> > SingleCellExperiments.
> >
> > Is this a principle we want to adopt?  Avoid serializing "non-basic"
> > objects?
> > updateObject methods should help centralize the effort to managing object
> > designs and their implementation.
> >
> > It seems it would be wise to implement this principle for any resource that
> > might be used by both release and devel ... even in devel we may see more
> > breaking changes propagating as S4 classes evolve, if objects are
> > serialized.
> > Adding validObject tests in tests/ could reduce surprises.  Helping class
> > maintainers know what packages have serialized fragile objects might be a
> > task for BiocPkgTools -- but a nontrivial one.  Maybe a BiocDevTools package
> > would be more appropriate.  And checking hub elements seems relevant too.
> >
> > Basically, before we commit changes to a class, it should not be too hard
> > to assess the downstream effects and notify relevant developers.  The
> > alternative
> > of banning serialized S4 objects seems too harsh and possibly ambiguous.
> > On the other hand it may be necessary to ban serialized S4 in the *Hubs?
> >
> >
> >
> >
> >>
> >> -A
> >>
> >> On 8/1/19 7:14 PM, Kevin Wang wrote:
> >>> Hi all,
> >>>
> >>> I am getting a strange build error message for scMerge (
> >> http://bioconductor.org/checkResults/devel/bioc-LATEST/scMerge/malbec1-buildsrc.html)
> >> that reads
> >>>
> >>> + "C stack usage is too close to the limit” on Linux and Mac and
> >>> + "evaluation nested too deeply: infinite recursion” on Windows, when
> >> building the vignette file “scMerge.Rmd”.
> >>>
> >>> However, when I was building the Rmd locally and also on Travis +
> >> pkgdown under BioC3.10 (
&

Re: [Bioc-devel] Build error - C stack usage is too close to the limit

2019-08-01 Thread Aaron Lun
One possibility is that this is due to a regression in 
SingleCellExperiment, caused by the altexp updates and other 
refactoring. This should be fixed in 1.7.3, you can check this for 
yourself by installing drisso/SingleCellExperiment off GitHub.


The other moral of the story is to not use serialized high-level 
objects. Serializing basic objects is fine, but the higher up you go, 
the more fragile your code becomes to refactoring. See, for example, the 
scRNAseq data package for how to deliver a SingleCellExperiment to an R 
session without relying on serialized SingleCellExperiments.
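
Roughly, the pattern is to serialize only basic components and rebuild at
load time (a sketch; the file names are hypothetical):

    library(SingleCellExperiment)
    counts <- readRDS("counts.rds")    # a plain matrix, robust to class changes
    coldata <- readRDS("coldata.rds")  # a plain DataFrame
    sce <- SingleCellExperiment(list(counts=counts), colData=coldata)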


-A

On 8/1/19 7:14 PM, Kevin Wang wrote:

Hi all,

I am getting a strange build error message for scMerge 
(http://bioconductor.org/checkResults/devel/bioc-LATEST/scMerge/malbec1-buildsrc.html)
 that reads

+ "C stack usage is too close to the limit” on Linux and Mac and
+ "evaluation nested too deeply: infinite recursion” on Windows, when building 
the vignette file “scMerge.Rmd”.

However, when I was building the Rmd locally and also on Travis + pkgdown under 
BioC3.10 (https://travis-ci.org/SydneyBioX/scMerge/builds/566753523), I had no 
errors. This file has not been edited for 2 months 
(https://github.com/SydneyBioX/scMerge/blob/master/vignettes/scMerge.Rmd).

Any help would be appreciated.

Thank you
Best Wishes
Kevin

PhD Candidate
Faculty of Science, School of Mathematics and Statistics
THE UNIVERSITY OF SYDNEY

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] SingleCellExperiment refactoring

2019-07-21 Thread Aaron Lun

Dear list,

We are planning to modify the SingleCellExperiment class to better 
accommodate alternative feature sets from CITE-seq and Perturb-seq 
experiments. A new "altExps" concept has been added to store 
experimental data for alternative feature sets as nested 
SummarizedExperiment instances within a SingleCellExperiment. This aims 
to provide a flexible and lightweight approach to storing multiple 
Experiments without requiring major changes to user workflows when only 
the main feature set (i.e., endogenous genes) is of interest.


The "altExps" concept can also be extended to storage of spike-in 
transcripts. In fact, it is more convenient than the current "isSpike" 
approach, as the latter requires subsetting to remove the spike-ins 
prior to performing gene-only operations on the expression matrix (e.g., 
clustering). For this reason, we are planning to deprecate the "isSpike" 
functionality for marking rows as spike-ins. This will be replaced with 
the more general "SingleCellExperiment::splitSCEByAlt" function, which 
splits a SCE into a main SCE and nested alternative SCEs for minority 
features like spike-in transcripts, antibody or CRISPR tags, etc.
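
A rough sketch of the intended usage, per the proposal (the exact signature
may still change; see the pull request linked below):

    is.spike <- grepl("^ERCC-", rownames(sce))
    sce <- splitSCEByAlt(sce, ifelse(is.spike, "ERCC", "gene"), ref="gene")
    altExp(sce, "ERCC")  # nested experiment holding the spike-ins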


These proposed changes are expected to have the following effects on 
packages downstream of SingleCellExperiment:


- No change is required for packages that do not use spike-in 
information or multiple size factor settings.
- Packages using spike-in transcripts via "isSpike" should switch to 
"altExps" to retrieve spike-in data, with experiment-specific size 
factors to perform spike-in-specific normalization.
- Packages using other features (e.g., antibody tags) should consider 
using "altExps" to retrieve/store this data.


More technical details can be found in the discussion at 
https://github.com/drisso/SingleCellExperiment/pull/32, which also 
contains a testable implementation of the proposed change. Comments and 
other feedback on the proposed plan should be directed there.


-A

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] package test failing due to CSAW

2019-06-26 Thread Aaron Lun
This behavior has been deprecated since May 18 2018, looking at the Git 
logs here:


https://github.com/LTLA/csaw/commits/master/R/scaledAverage.R

I eventually got around to removing it last month, so the deprecation 
warning would have existed for 2 release cycles. That's fair game to me.


The reason for this is that filtering in the csaw workflow is usually 
done before conversion to a DGEList, so it makes more sense to apply 
scaledAverage() on SE objects. In fact, you're *meant* to run it on SE 
objects, hence the deprecation of the DGEList inputs.


In your case, you have scale=1 so you might as well just call 
edgeR::aveLogCPM() directly. scaledAverage() just mimics this function 
with some careful scaling of the prior count when scale!=1.


-A

On 6/25/19 1:16 AM, Vivek Bhardwaj wrote:

Hi All

The check of my package is failing locally due to the error in
"csaw::scaledAverage" function. Untill version 3.9, I used to pass a
DGElist object to this function call as follows:

/dat.y <- csaw::asDGEList(data, assay = assay.data)//# create DGElist
from SE object
//dat.y <- edgeR::estimateCommonDisp(dat.y)//# estimate dispersion
/

/data.ab <- csaw::scaledAverage(dat.y, scale = 1, prior.count = 1) # get
scaled average
/

Now in version 3.10 I get the following error:

Error in (function (classes, fdef, mtable) :
    unable to find an inherited method for function ‘assay’ for signature
‘"DGEList", "character"’

Is CSAW not accepting DGElist anymore? How shall I replace this function
call here?


Thanks,

Vivek


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] scRNAseq upgrade: give us your single-cell data!

2019-06-14 Thread Aaron Lun
We have recently repurposed the scRNAseq package to serve as a general 
location for any publicly available single-cell count data. The aim of 
this package is to provide convenient functions that directly yield 
nicely formatted SingleCellExperiment objects containing count matrices 
and relevant metadata, via BioC's ExperimentHub system for easy and 
rapid access anywhere in your computer. Our USP compared to other 
single-cell data packages is that we'll take anything - and, in fact, 
the more customized the original data is, the more we want it!


So, if you find an interesting public dataset that has been - ahem - 
"imaginatively" formatted by the original authors, we would welcome a 
contribution to the scRNAseq package to make the count data nice and 
pretty. This will save other members of the R/Bioconductor community 
from pulling their hair out (there's not much left!) if they want to 
make use of that data. Contribution guidelines are described in the 
scRNAseq vignette at 
http://bioconductor.org/packages/devel/data/experiment/html/scRNAseq.html, 
and if you more-or-less follow the suggestions, we can do the rest 
pretty quickly.


So, give us your tired, your poor, your huddled datasets yearning to 
breathe free!


-A & D

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] InteractionSet for structural variants

2019-05-21 Thread Aaron Lun

Thanks for your response. So far my intention is just to plot them and I
do not intend on performing any other operation. The first step would be
to read in the VCF file and transform it into a meaningful object, and I was
hoping there was a core package already taking care of that, but I get
from your answer that there's no such functionality implemented.


Not to my knowledge... but if you're planning on writing some relevant 
functions, I'm sure we could find a home for it somewhere.


-A


On 5/18/19 at 4:47 AM, Aaron Lun wrote:

I would say that it depends on what operations you intend to perform
on them. You can _store_ things any way you like, but the trick is to
ensure that operations and manipulations on those things are
consistent and meaningful. It is not obvious that there are meaningful
common operations that one might want to apply to all structural
variants.

For example, translocations involve two genomic regions (i.e., the two
bits that get stuck together) and so are inherently two-dimensional. A
lot of useful operations will be truly translocation-specific, e.g.,
calculation of distances between anchor regions, identification of
bounding boxes in two-dimensional space. These operations will be
meaningless to 1-dimensional variants on the linear genome, e.g.,
CNVs, inversions. The converse also applies where operations on the
linear genome have no single equivalent in the two-dimensional case.

So, I would be inclined to store them separately. If you must keep
them in one object, just lump them into a List with "translocation"
(GInteractions), "cnv" (GRanges) and "inversion" (another GRanges)
elements, and people/programs can pull out bits and pieces as needed.

-A


On 5/17/19 4:38 AM, Bernat Gel Moreno wrote:

Hi all,

Is there any standard recommended container for genomic structural
variants? I think InteractionSet would work fine for translocation and
GRanges for inversions and copy number changes, but I don't know what
would be the recommended way to store them all together using standard
Bioconductor objects.

And actually, is there any package that would load a SV VCF by lumpy or
delly and build that object?

Thanks!

Bernat


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] InteractionSet for structural variants

2019-05-17 Thread Aaron Lun
I would say that it depends on what operations you intend to perform on 
them. You can _store_ things any way you like, but the trick is to 
ensure that operations and manipulations on those things are consistent 
and meaningful. It is not obvious that there are meaningful common 
operations that one might want to apply to all structural variants.


For example, translocations involve two genomic regions (i.e., the two 
bits that get stuck together) and so are inherently two-dimensional. A 
lot of useful operations will be truly translocation-specific, e.g., 
calculation of distances between anchor regions, identification of 
bounding boxes in two-dimensional space. These operations will be 
meaningless to 1-dimensional variants on the linear genome, e.g., CNVs, 
inversions. The converse also applies where operations on the linear 
genome have no single equivalent in the two-dimensional case.


So, I would be inclined to store them separately. If you must keep them 
in one object, just lump them into a List with "translocation" 
(GInteractions), "cnv" (GRanges) and "inversion" (another GRanges) 
elements, and people/programs can pull out bits and pieces as needed.
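
A minimal sketch of that List-based container, where 'tr.gi', 'cnv.gr' 
and 'inv.gr' are hypothetical placeholders for the parsed variants:

library(S4Vectors)
sv <- List(
    translocation=tr.gi,  # a GInteractions of anchor pairs
    cnv=cnv.gr,           # a GRanges of copy number changes
    inversion=inv.gr      # a GRanges of inversions
)
sv$translocation # each program pulls out the piece it understands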


-A


On 5/17/19 4:38 AM, Bernat Gel Moreno wrote:

Hi all,

Is there any standard recommended container for genomic structural
variants? I think InteractionSet would work fine for translocation and
GRanges for inversions and copy number changes, but I don't know what
would be the recommended way to store them all together using standard
Bioconductor objects.

And actually, is there any package that would load a SV VCF by lumpy or
delly and build that object?

Thanks!

Bernat


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] SummarizedExperiments not equal after serialisation

2019-05-11 Thread Aaron Lun

I would say it's much worse than mismatching class definitions.

https://github.com/Bioconductor/SummarizedExperiment/issues/16
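
In the meantime, a content-level comparison side-steps the reference 
class environments entirely; a minimal sketch, using the 'se1' and 
'se2' from Laurent's example below:

all.equal(assay(se1), assay(se2)) &&
    identical(rowData(se1), rowData(se2)) &&
    identical(colData(se1), colData(se2))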

-A

On 5/11/19 5:07 AM, Martin Morgan wrote:

I think it has to do with the use of reference classes in the assay slot, which 
have different environments

   se = SummarizedExperiment()
   saveRDS(se, fl <- tempfile())
   se1 = readRDS(fl)

and then


all.equal(se@assays, se1@assays)

[1] "Class definitions are not identical"

all.equal(se@assays@.xData, se1@assays@.xData)

[1] "Component \".self\": Class definitions are not identical"

se@assays@.xData



se1@assays@.xData



Martin

On 5/11/19, 6:38 AM, "Bioc-devel on behalf of Laurent Gatto" 
 wrote:

 I would appreciate some background about the following:
 
 > suppressPackageStartupMessages(library("SummarizedExperiment"))

 > set.seed(1L)
 > m <- matrix(rnorm(16), ncol = 4, dimnames = list(letters[1:4], 
LETTERS[1:4]))
 > rowdata <- DataFrame(X = 1:4, row.names = letters[1:4])
 > se1 <- SummarizedExperiment(m, rowData = rowdata)
 > se2 <- SummarizedExperiment(m, rowData = rowdata)
 > all.equal(se1, se2)
 [1] TRUE
 
 But after serialising and reading se2, the two instances aren't equal any more:
 
 > saveRDS(se2, file = "se2.rds")

 > rm(se2)
 > se2 <- readRDS("se2.rds")
 > all.equal(se1, se2)
 [1] "Attributes: < Component “assays”: Class definitions are not identical 
>"
 
 Session information provided below.
 
 Thank you in advance,
 
 Laurent
 
 
 R version 3.6.0 RC (2019-04-21 r76417)

 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 18.04.2 LTS
 
 Matrix products: default

 BLAS:   /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
 LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
 
 locale:

  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=fr_FR.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=fr_FR.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=fr_FR.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
 
 attached base packages:

 [1] parallel  stats4stats graphics  grDevices utils datasets
 [8] methods   base
 
 other attached packages:

  [1] SummarizedExperiment_1.14.0 DelayedArray_0.10.0
  [3] BiocParallel_1.18.0 matrixStats_0.54.0
  [5] Biobase_2.44.0  GenomicRanges_1.36.0
  [7] GenomeInfoDb_1.20.0 IRanges_2.18.0
  [9] S4Vectors_0.22.0BiocGenerics_0.30.0
 
 loaded via a namespace (and not attached):

  [1] lattice_0.20-38bitops_1.0-6   grid_3.6.0
  [4] zlibbioc_1.30.0XVector_0.24.0 Matrix_1.2-17
  [7] tools_3.6.0RCurl_1.95-4.12compiler_3.6.0
 [10] GenomeInfoDbData_1.2.1
 
 
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-26 Thread Aaron Lun
Thanks Daniel. Glad to see the end of that monkey business, my analyses
were going bananas.

On Fri, Apr 26, 2019 at 3:41 PM Van Twisk, Daniel <
daniel.vantw...@roswellpark.org> wrote:

> I've pushed new 3.8.2 orgdbs that should propagate soon. They do not have
> this issue.
> --
> *From:* Bioc-devel  on behalf of Pages,
> Herve 
> *Sent:* Thursday, April 25, 2019 9:19:35 PM
> *To:* Aaron Lun; Vincent Carey
> *Cc:* Bioc-devel; jmac...@u.washington.edu
> *Subject:* Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
>
> Hi Aaron,
>
> On 4/25/19 16:44, Aaron Lun wrote:
>
> It doesn't seem like it - on my installation, org.Hs.eg.db is still...
> monkeying around.
>
>
>   __
>  w  c(..)o   (
>   \__(-)__)
>   /\   (
>  /(_)___)
>  w /|
>   | \
>  m  m
>
> Daniel has prepared a new batch of *.db0 and org.* packages (v 3.8.1). The
> new packages are on their way and should become available via
> BiocManager::install() in the next 12 hours or so.
>
> Hopefully they'll put an end to the Great Monkey Conspiracy!
>
> Unfortunately we won't see the effect on tomorrow's build report, only on
> Saturday's report.
>
> Cheers,
>
> H.
>
>
>
>
>
> On Thu, Apr 25, 2019 at 9:17 AM Vincent Carey <st...@channing.harvard.edu> wrote:
>
>
>
> Has this situation been rectified?
>
> On Tue, Apr 23, 2019 at 11:40 AM Van Twisk, Daniel
> <daniel.vantw...@roswellpark.org> wrote:
>
>
>
> We've made some changes to our annotation generation scripts this release
> and it seems these may have introduced some errors. Thank you for
> identifying this issue and I will try to have some fixes out asap.
>
> 
> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of James
> W. MacDonald <jmac...@uw.edu>
> Sent: Tuesday, April 23, 2019 11:03:02 AM
> To: Aaron Lun
> Cc: Bioc-devel
> Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
>
> Looks like the ensembl table of the human.db0 package got polluted with
> *Pan
> troglodytes* genes:
>
>
>
> con <- dbConnect(SQLite(),
>     "/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
> dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")
>   count(*)
> 116207
> dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
>   count(*)
> 128973
>
> On Mon, Apr 22, 2019 at 11:54 PM Aaron Lun
> <infinite.monkeys.with.keyboa...@gmail.com> wrote:
>
>
>
> Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG...?
>
>  > library(org.Hs.eg.db)
>  > mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
> 'select()' returned 1:many mapping between keys and columns
>   GCG
> "ENSPTRG777"
>
> Well, at least it still recovers the right identifier... eventually.
>
>  > select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
> 'select()' returned 1:many mapping between keys and columns
>SYMBOLENSEMBL
> 1GCG ENSPTRG777
> 2GCGENSG0115263
>
> The SYMBOL->Entrez ID relational table seems to be okay:
>
>  > Y <- toTable(org.Hs.egSYMBOL)
>  > Y[which(Y[,2]=="GCG"),]
>   gene_id symbol
> 21522641GCG
>
> So the cause is the Ensembl->Entrez mappings:
>
>  > Z <- toTable(org.Hs.egENSEMBL2EG)
>  > Z[Z[,1]==2641,]
>   gene_id ensembl_id
> 30282641 ENSPTRG777
> 30292641ENSG0115263
>
> Googling suggests that ENSPTRG777 is an identifier for some
> other gene in one of the other monkeys. Hardly "Hs" stuff.
>
> Session info (not technically R 3.6, but I didn't think that would have
> been the cause):
>
>
>
> R Under development (unstable) (2019-04-11 r76379)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
>
> Matrix products: default
> BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
> LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRES

[Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-22 Thread Aaron Lun

Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG...?

> library(org.Hs.eg.db)
> mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
 GCG
"ENSPTRG777"

Well, at least it still recovers the right identifier... eventually.

> select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
  SYMBOLENSEMBL
1GCG ENSPTRG777
2GCGENSG0115263

The SYMBOL->Entrez ID relational table seems to be okay:

> Y <- toTable(org.Hs.egSYMBOL)
> Y[which(Y[,2]=="GCG"),]
 gene_id symbol
21522641GCG

So the cause is the Ensembl->Entrez mappings:

> Z <- toTable(org.Hs.egENSEMBL2EG)
> Z[Z[,1]==2641,]
 gene_id ensembl_id
30282641 ENSPTRG777
30292641ENSG0115263

Googling suggests that ENSPTRG777 is an identifier for some 
other gene in one of the other monkeys. Hardly "Hs" stuff.


Session info (not technically R 3.6, but I didn't think that would have 
been the cause):



R Under development (unstable) (2019-04-11 r76379)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   


attached base packages:
[1] parallel  stats4stats graphics  grDevices utils datasets 
[8] methods   base 


other attached packages:
[1] org.Hs.eg.db_3.8.0   AnnotationDbi_1.45.1 IRanges_2.17.5  
[4] S4Vectors_0.21.23Biobase_2.43.1   BiocGenerics_0.29.2 


loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1  digest_0.6.18   DBI_1.0.0   RSQLite_2.1.1  
 [5] blob_1.1.1  bit64_0.9-7 bit_1.1-14  compiler_3.7.0 
 [9] pkgconfig_2.0.2 memoise_1.1.0


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Problem with non-portable compiler flags on package test

2019-04-13 Thread Aaron Lun
To contribute another data point on the use of architecture-specific 
compilation settings, see the discussion at:


https://www.mail-archive.com/bioc-devel@r-project.org/msg10525.html

This discrepancy took two months to track down. Two months! During my 
holidays! And it wasn't even my fault!


So, if you're putting in architecture-specific flags, (i) it had better 
be worth it, and (ii) a few CHECK warnings are and will be the least of 
your concerns.


-A

On 4/12/19 2:32 AM, Martin Morgan wrote:

Your configure outsmarts the check system, so your options are to omit the 
flags or to live with the warning. Either solution is fine with Bioconductor.

Martin

On 4/12/19, 4:17 AM, "Bioc-devel on behalf of Jochen Knaus" 
 wrote:

 Hi everybody,
 
 for our new R package "Netboost" we have a problem about non portable

 compiler flags. Basically we support AVX in our own C++ code (using
 compiler intrinsics to use the AVX units). Of course we have a non-AVX
 code path.
 
 For building we use autotools and configure to determine at installation

 time if the AVX unit is available and supported by the given compilers.
 If available then compilation is done with AVX, if not, then ordinary
 code path is used.
 
 The problem is the R package test, which does see the set AVX flag in

 "Makevars" (as Makevars.in is configured to use AVX if the test is
 executed on a machine supporting AVX).
 
 Note: due to bundled software, this is a Linux only package, so no

 support for Microsoft compilers is required (with other flag names).
 
 Is there any way around this warning, which is a real false-positive, as

 the flag is not set in environments not suitable.
 
 Thanks a lot for any help!
 
 Jo
 
 
 Details:
 
 for testing we use GNU Autotools and the AX_EXT M4-macroset to determine

 the hardware and compiler support for additional features:
 https://www.gnu.org/software/autoconf-archive/ax_ext.html
 
 configure.ac:
 
 m4_include([m4/ax_gcc_x86_avx_xgetbv.m4])

 m4_include([m4/ax_gcc_x86_cpuid.m4])
 m4_include([m4/ax_check_compile_flag.m4])
 m4_include([m4/ax_ext.m4])
 
 # Probe CPU and compilers

 AX_EXT
 
 src/Makevars.in:
 
 PKG_CXXFLAGS=`${R_HOME}/bin/Rscript -e "Rcpp:::CxxFlags()"` @SIMD_FLAGS@
 
 Running R CMD CHECK with --as-cran, we get the warning:
 
 http://bioconductor.org/spb_reports/netboost_buildreport_20190412033232.html
 
 * checking compilation flags used ... WARNING Compilation used the

 following non-portable flag(s): -Wno-deprecated -maes -mavx -mavx2 -mfma
 -mmmx -msse -msse3 -msse4.1 -msse4.2 -mssse3
 
 (Basically we only need -mavx and optionally FMA, but AX_EXT sets all).
 
 --

 Jochen Knaus
 Institute of Biometry and Statistics
 Faculty of Medicine and Medical Center - University of Freiburg
 Office: IMBI library
 Postal address: Stefan-Meier-Str. 26, D-79104 Freiburg
 Phone: +49/761/203-5528
 Mail: j...@imbi.uni-freiburg.de
 Homepage: http://www.imbi.uni-freiburg.de
 
 
 	[[alternative HTML version deleted]]
 
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Call for collaborators/advice

2019-04-02 Thread Aaron Lun

Breaker breaker.

1) A package to define a base (virtual) "Interactions" class. This would 
basically have a constant "Vector" store with a "Hits" object specifying 
the pairwise interactions between elements in the constant store. One 
could also distinguish between "SelfInteractions" (constant store) and 
the more general "Interactions" (two stores, possibly of different 
types, e.g., genomic interval -> protein interactions). A variety of 
methods would be available here to do manipulations and such.


https://github.com/LTLA/IndexedRelations (WIP)

2) A package to define an "Interactions" subclass where the store is a 
genomic interval, with basic methods to operate on such classes. Methods 
such as findOverlaps(), linkOverlaps() and boundingBox() would probably 
go here. @Luke, a binning method could also conceivably go here.


https://github.com/ComputationalRegulatoryGenomicsICL/GenomicInteractions/issues/37

All of this is open for discussion, if people are interested and willing 
to volunteer. These changes will not make the next release anyway.


What he said.

-A


On 22/03/2019 19:54, Aaron Lun wrote:

Hi Luke,

Do you mean bins or bin pairs?

If you want to just bin the coverage in terms of the linear genome, 
there should be ways to do that outside of InteractionSet or 
GenomicInteractions. This is just dealing with standard genomic 
interval data; extract the anchor coordinates and plug it in elsewhere.


If you want to collate region pairs into bin pairs, I don't know of a 
dedicated function to do this from a GInteractions object (diffHic 
only does this from raw read data). You'll need to figure out what to 
do to regions that cross bin boundaries.


The simplest way to mimic this behaviour right now is to generate 
another GInteractions object containing ALL POSSIBLE bin pairs (use 
combn with a constant set of bin regions) and plug that into 
countOverlaps. This will generate loads of zeroes, though, so is not 
the most efficient way to do this. You could get a sparser form with 
linkOverlaps but this requires more work to get the counts.


I have some more thoughts about the Bioconductor Hi-C infrastructure, 
but my laptop battery's running out and I left my charger in my new 
apartment. So that'll have to wait until tomorrow.


-A


On 22/03/2019 09:31, Luke Klein wrote:
I am writing a package that will extend the GenomicInteractions 
class.   I am a statistician, so I may not know best practices when 
it comes to extending existing classes (eg. should I make a new slot 
or simply add a column to the `elementMetadata`?  Are there existing 
functions that already do what I am attempting?).


I am not familiar with Bioc-Devel decorum, so if asking this here is 
inappropriate, kindly let me know.


About my project:

In the first step, I am hoping to implement a HiC binning function on 
HiC data contained in a GenomicInteractions set.  I aim to:


- Reorder the anchor pairs (I will explain in more detail to anyone 
that wants to help)

- Collapse the regions to the desired bin width
- Sum the counts within each bin
- Update the anchors to agree with the new/updated regions

This will set the stage for the new class that I hope to create for 
HiC domain calling, but I need to achieve the above tasks first.


All the best to everyone!

—*Luke Klein*
     PhD Student
     Department of Statistics
     University of California, Riverside
lklei...@ucr.edu <mailto:lklei...@ucr.edu>








___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] package RBGL requires CRAN dependency on devel branch

2019-03-26 Thread Aaron Lun

well, we can't fix this in old branches of Bioc.


Sure, but one could say that about breaking changes to any CRAN package. 
Nothing particularly special about BH on that point.


My POV is that we need to consider - in some sense - lock down CRAN with 
a Bioc release.


That's probably worth considering. For example, it would be pretty cool 
to just supply a date and BiocManager could figure out the last 
successful release versions of all CRAN/BioC packages at that time.


-A

On Tue, Mar 26, 2019 at 12:45 AM Aaron Lun
<infinite.monkeys.with.keyboa...@gmail.com> wrote:


My 2 cents - API-breaking changes to BH are no more of an issue than
breaking changes to any other CRAN package. We just hope that it
doesn't
happen too often and we deal with it when the time comes; that's the
whole point of getting frequent release builds to check for these cases.

If we were discussing a package that distributed a shared library, then
greater concern would be warranted if updates to the library
resulted in
ABI incompatibilities. This could result in very cryptic errors at link
time, load time, or possibly segmentation faults, who knows.

But BH is a header-only library, so breaking changes will most likely
cause compilation errors that are obvious and easy to fix. Well, easy
enough if you were able to write C++ code in the first place.

-A


On 25/03/2019 08:39, Vincent Carey wrote:
 > On Mon, Mar 25, 2019 at 10:57 AM Kasper Daniel Hansen <
 > kasperdanielhan...@gmail.com> wrote:
 >
 >> There are no issues with depending on CRAN packages.
 >>
 >> But I would advise caution. On one hand it is great that boost gets
 >> updated regularly. On the other hand, it could lead to
incompatibilities
 >> with RBGL and then you have to update that package rapidly. Also
- and this
 >> is something we could consider addressing - the CRAN imports of
Bioc are
 >> not locked down. By which I mean, you release RBGL in
Bioconductor X. After
 >> release (or perhaps even after next Bioc release) BH is updated in a
 >> non-backwards compatible way and now the old code is hosed.
Having said
 >> that, so far we have been ignoring it (I think) and the same
issue arises
 >> with Rcpp.
 >>
 >> Do you have any idea how often Boost breaks compatibility?  I would
 >> strongly advise to download the last couple of BH releases and
test with
 >> RBGL. While kind of irrelevant in some sense, it will give you
an idea of
 >> how fast Boost / BH evolves.
 >>
 >
 > These are good points.  In this particular case I believe that
Boost Graph
 > Library evolves very slowly and
 > backwards compatibility is not endangered.  It is an early
component of
 > Boost.  On the other hand, BH has
 > no obligation to provide the graph (BGL) headers, and I believe
that in
 > early incarnations of BH, some headers
 > needed for RBGL were not there.  So there are maintenance
vulnerabilities
 > to this approach, but I think it is better
 > if we stick with the maintained BH as long as this works.  Should
this
 > approach fail (and your scenario of
 > CRAN package changes breaking bioc must be kept in mind) we can
go back to
 > tarball distribution if necessary.
 >
 >
 >>
 >> On Mon, Mar 25, 2019 at 8:03 AM Martin Morgan
 >> <mtmorgan.b...@gmail.com>
 >> wrote:
 >>
 >>> ...also Bioconductor knows all about CRAN -- see the repositories
 >>> returned by
 >>>
 >>>> BiocManager::repositories()
 >>>                                                 BioCsoft
 >>>             "https://bioconductor.org/packages/3.9/bioc"
 >>>                                                  BioCann
 >>> "https://bioconductor.org/packages/3.9/data/annotation"
 >>>                                                  BioCexp
 >>> "https://bioconductor.org/packages/3.9/data/experiment"
 >>>                                            BioCworkflows
 >>>        "https://bioconductor.org/packages/3.9/workflows"
 >>>                                                     CRAN
 >>>                               "https://cran.rstudio.com"
 >>>>
 >>>
 >>> On 3/25/19, 7:42 AM, "Martin Morgan" <mtmorgan.b...@gmail.com> wrote:
 >>>
 >>>      I think the usual incantation in c

Re: [Bioc-devel] package RBGL requires CRAN dependency on devel branch

2019-03-25 Thread Aaron Lun
My 2 cents - API-breaking changes to BH are no more of an issue than 
breaking changes to any other CRAN package. We just hope that it doesn't 
happen too often and we deal with it when the time comes; that's the 
whole point of getting frequent release builds to check for these cases.


If we were discussing a package that distributed a shared library, then 
greater concern would be warranted if updates to the library resulted in 
ABI incompatibilities. This could result in very cryptic errors at link 
time, load time, or possibly segmentation faults, who knows.


But BH is a header-only library, so breaking changes will most likely 
cause compilation errors that are obvious and easy to fix. Well, easy 
enough if you were able to write C++ code in the first place.


-A


On 25/03/2019 08:39, Vincent Carey wrote:

On Mon, Mar 25, 2019 at 10:57 AM Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:


There are no issues with depending on CRAN packages.

But I would advise caution. On one hand it is great that boost gets
updated regularly. On the other hand, it could lead to incompatibilities
with RBGL and then you have to update that package rapidly. Also - and this
is something we could consider addressing - the CRAN imports of Bioc are
not locked down. By which I mean, you release RBGL in Bioconductor X. After
release (or perhaps even after next Bioc release) BH is updated in a
non-backwards compatible way and now the old code is hosed. Having said
that, so far we have been ignoring it (I think) and the same issue arises
with Rcpp.

Do you have any idea how often Boost breaks compatibility?  I would
strongly advise to download the last couple of BH releases and test with
RBGL. While kind of irrelevant in some sense, it will give you an idea of
how fast Boost / BH evolves.



These are good points.  In this particular case I believe that Boost Graph
Library evolves very slowly and
backwards compatibility is not endangered.  It is an early component of
Boost.  On the other hand, BH has
no obligation to provide the graph (BGL) headers, and I believe that in
early incarnations of BH, some headers
needed for RBGL were not there.  So there are maintenance vulnerabilities
to this approach, but I think it is better
if we stick with the maintained BH as long as this works.  Should this
approach fail (and your scenario of
CRAN package changes breaking bioc must be kept in mind) we can go back to
tarball distribution if necessary.




On Mon, Mar 25, 2019 at 8:03 AM Martin Morgan 
wrote:


...also Bioconductor knows all about CRAN -- see the repositories
returned by


BiocManager::repositories()

BioCsoft
"https://bioconductor.org/packages/3.9/bioc"
 BioCann
"https://bioconductor.org/packages/3.9/data/annotation"
 BioCexp
"https://bioconductor.org/packages/3.9/data/experiment"
   BioCworkflows
   "https://bioconductor.org/packages/3.9/workflows"
CRAN
  "https://cran.rstudio.com"




On 3/25/19, 7:42 AM, "Martin Morgan"  wrote:

 I think the usual incantation in configure files is ${R_HOME}/bin/R
... R_HOME is the path to R set by the command that starts to build or
install the package, whereas Rscript is found on the search path.

 Martin

 On 3/25/19, 7:33 AM, "Bioc-devel on behalf of Vincent Carey" <
bioc-devel-boun...@r-project.org on behalf of st...@channing.harvard.edu>
wrote:

 The error on linux for 3.9:


##

##
 ###
 ### Running command:
 ###
 ###   /home/biocbuild/bbs-3.9-bioc/R/bin/R CMD INSTALL RBGL
 ###

##

##


 * installing to library ‘/home/biocbuild/bbs-3.9-bioc/R/library’
 * installing *source* package ‘RBGL’ ...
 ** using staged installation
 checking R package BH ... no
 configure: error: R package BH not found.
 ERROR: configuration failed for package ‘RBGL’
 * removing ‘/home/biocbuild/bbs-3.9-bioc/R/library/RBGL’
 * restoring previous ‘/home/biocbuild/bbs-3.9-bioc/R/library/RBGL’

 Note that BiocParallel also uses BH and succeeds

 configure: creating ./config.status
 config.status: creating src/Makevars
 ** libs
 g++ -std=gnu++11 -I"/home/biocbuild/bbs-3.9-bioc/R/include"
-DNDEBUG
 -I"/home/biocbuild/bbs-3.9-bioc/R/library/BH/include"
 -I/usr/local/include  -fpic  -g -O2  -Wall -c ipcmutex.cpp -o
 ipcmutex.o
 In 

Re: [Bioc-devel] Call for collaborators/advice

2019-03-25 Thread Aaron Lun

Power's back, so continuing on:

The Bioconductor Hi-C infrastructure should probably be consolidated 
into packages with more clearly defined boundaries:


1) A package to define a base (virtual) "Interactions" class. This would 
basically have a constant "Vector" store with a "Hits" object specifying 
the pairwise interactions between elements in the constant store. One 
could also distinguish between "SelfInteractions" (constant store) and 
the more general "Interactions" (two stores, possibly of different 
types, e.g., genomic interval -> protein interactions). A variety of 
methods would be available here to do manipulations and such.
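
A minimal sketch of what those virtual classes might look like (slot 
names are illustrative only, not a settled design):

library(S4Vectors)
setClass("Interactions", contains="VIRTUAL",
    slots=c(store1="Vector", store2="Vector", hits="Hits"))
setClass("SelfInteractions", contains="VIRTUAL",
    slots=c(store="Vector", hits="Hits")) # both anchors index 'store'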


2) A package to define an "Interactions" subclass where the store is a 
genomic interval, with basic methods to operate on such classes. Methods 
such as findOverlaps(), linkOverlaps() and boundingBox() would probably 
go here. @Luke, a binning method could also conceivably go here.


3) A package to define the "InteractionSet" and "ContactMatrix" classes. 
Basically just the "InteractionSet" package with the "GInteractions" 
class stripped out and moved into (2).


4) Additional packages for higher-level analysis, e.g., diffHic. These 
won't need much change beyond fiddling with the Imports.


So, (2) depends on (1), (3) depends on (2), and (4) depends on (3). (1) 
could either be S4Vectors itself, or we could take out the "Pairs" class 
from S4Vectors and put it into a separate package that provides data 
structures for interaction-esque thingies.


@Liz, "GenomicInteractions" (the package) would be a natural home for 
the class/methods in (2). It would also resolve the confusion between 
the "GInteractions" class and "GenomicInteractions" (the class) by 
making these one thing. There are two obvious hurdles:


- I'm not familiar with the requirements for the class specialization in 
"GenomicInteractions", but anything really custom would not belong in (2).
- Any methods for specialized data analysis would need to go into 
another package for (4). I don't have a good definition of what is 
specialized; but if there's statistical inference, it shouldn't be in (2).


All of this is open for discussion, if people are interested and willing 
to volunteer. These changes will not make the next release anyway.


-A


On 22/03/2019 19:54, Aaron Lun wrote:

Hi Luke,

Do you mean bins or bin pairs?

If you want to just bin the coverage in terms of the linear genome, 
there should be ways to do that outside of InteractionSet or 
GenomicInteractions. This is just dealing with standard genomic interval 
data; extract the anchor coordinates and plug it in elsewhere.


If you want to collate region pairs into bin pairs, I don't know of a 
dedicated function to do this from a GInteractions object (diffHic only 
does this from raw read data). You'll need to figure out what to do to 
regions that cross bin boundaries.


The simplest way to mimic this behaviour right now is to generate 
another GInteractions object containing ALL POSSIBLE bin pairs (use 
combn with a constant set of bin regions) and plug that into 
countOverlaps. This will generate loads of zeroes, though, so is not the 
most efficient way to do this. You could get a sparser form with 
linkOverlaps but this requires more work to get the counts.


I have some more thoughts about the Bioconductor Hi-C infrastructure, 
but my laptop battery's running out and I left my charger in my new 
apartment. So that'll have to wait until tomorrow.


-A


On 22/03/2019 09:31, Luke Klein wrote:
I am writing a package that will extend the GenomicInteractions class. 
  I am a statistician, so I may not know best practices when it comes 
to extending existing classes (eg. should I make a new slot or simply 
add a column to the `elementMetadata`?  Are there existing functions 
that already do what I am attempting?).


I am not familiar with Bioc-Devel decorum, so if asking this here is 
inappropriate, kindly let me know.


About my project:

In the first step, I am hoping to implement a HiC binning function on 
HiC data contained in a GenomicInteractions set.  I aim to:


- Reorder the anchor pairs (I will explain in more detail to anyone 
that wants to help)

- Collapse the regions to the desired bin width
- Sum the counts within each bin
- Update the anchors to agree with the new/updated regions

This will set the stage for the new class that I hope to create for 
HiC domain calling, but I need to achieve the above tasks first.


All the best to everyone!

—*Luke Klein*
     PhD Student
     Department of Statistics
     University of California, Riverside
lklei...@ucr.edu <mailto:lklei...@ucr.edu>








___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Call for collaborators/advice

2019-03-22 Thread Aaron Lun

Hi Luke,

Do you mean bins or bin pairs?

If you want to just bin the coverage in terms of the linear genome, 
there should be ways to do that outside of InteractionSet or 
GenomicInteractions. This is just dealing with standard genomic interval 
data; extract the anchor coordinates and plug it in elsewhere.


If you want to collate region pairs into bin pairs, I don't know of a 
dedicated function to do this from a GInteractions object (diffHic only 
does this from raw read data). You'll need to figure out what to do to 
regions that cross bin boundaries.


The simplest way to mimic this behaviour right now is to generate 
another GInteractions object containing ALL POSSIBLE bin pairs (use 
combn with a constant set of bin regions) and plug that into 
countOverlaps. This will generate loads of zeroes, though, so is not the 
most efficient way to do this. You could get a sparser form with 
linkOverlaps but this requires more work to get the counts.
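
A minimal sketch of the combn() approach with toy 2-bp bins, where 'gi' 
stands in for the observed GInteractions object:

library(InteractionSet)
bins <- GRanges("chrA", IRanges(seq(1, 19, by=2), width=2))
idx <- combn(length(bins), 2) # every unordered bin pair, no diagonal
all.pairs <- GInteractions(idx[1,], idx[2,], bins)
counts <- countOverlaps(all.pairs, gi, use.region="both")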


I have some more thoughts about the Bioconductor Hi-C infrastructure, 
but my laptop battery's running out and I left my charger in my new 
apartment. So that'll have to wait until tomorrow.


-A


On 22/03/2019 09:31, Luke Klein wrote:
I am writing a package that will extend the GenomicInteractions class. 
  I am a statistician, so I may not know best practices when it comes to 
extending existing classes (eg. should I make a new slot or simply add a 
column to the `elementMetadata`?  Are there existing functions that 
already do what I am attempting?).


I am not familiar with Bioc-Devel decorum, so if asking this here is 
inappropriate, kindly let me know.


About my project:

In the first step, I am hoping to implement a HiC binning function on 
HiC data contained in a GenomicInteractions set.  I aim to:


- Reorder the anchor pairs (I will explain in more detail to anyone that 
wants to help)

- Collapse the regions to the desired bin width
- Sum the counts within each bin
- Update the anchors to agree with the new/updated regions

This will set the stage for the new class that I hope to create for HiC 
domain calling, but I need to achieve the above tasks first.


All the best to everyone!

—*Luke Klein*
     PhD Student
     Department of Statistics
     University of California, Riverside
lklei...@ucr.edu 








___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] set.seed and BiocParallel

2019-03-12 Thread Aaron Lun
I think Kylie is saying that she wants to use the same seed for each 
feature across different runs, but the seed can be different across 
features - which would make more sense.


Multi-worker reproducibility is an issue that we discussed before (the 
link goes into the middle of the thread):


https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014505.html

The key thing is that, in addition to reproducibility, there is the 
issue of correctness with guaranteed independent streams.


Some food for thought: in the vast majority of my parallelized 
applications, the heavy lifting (including the RNG'ing) is done in C++. 
If this is also the case for you, consider using the dqrng package to 
provide the C++ PRNG. I usually generate all my seeds in the serial part 
of the code, and then distribute seeds to the jobs where each job is set 
to a different "stream" value so that the sequence of random numbers is 
always different, regardless of the seed. As the serial seed generation 
is under the control of set.seed(), this provides correctness and 
reproducibility no matter how the jobs are distributed across workers.
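
A rough R-level sketch of that pattern; my real code drives dqrng from 
C++, but dqset.seed() exposes the same seed-plus-stream idea:

library(BiocParallel)
set.seed(100) # set.seed() controls the serially generated job seed
job.seed <- sample.int(.Machine$integer.max, 1)
res <- bplapply(1:4, function(stream, seed) {
    dqrng::dqset.seed(seed, stream) # same seed, distinct stream per job
    dqrng::dqrnorm(5)
}, seed=job.seed, BPPARAM=SnowParam(2))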


-A

On 12/03/2019 17:42, Kasper Daniel Hansen wrote:

But why do you want the same seed for the different features? That is not
the right way to use stochastic methods.

Best,
Kasper

On Tue, Mar 12, 2019 at 5:20 PM Bemis, Kylie 
wrote:


Hi all,

I remember similar questions coming up before, but couldn’t track any down
that directly pertain to my situation.

Suppose I want to use bplapply() in a function to fit models to many
features, and I am applying over features. The models are stochastic, and I
want the results to be reproducible, and preferably use the same RNG seed
for each feature. So I could do:

fitModels <- function(object, seed=1, BPPARAM=bpparam()) {
bplapply(object, function(x) {
set.seed(seed)
fitModel(x)
}, BPPARAM=BPPARAM)
}

But the BioC guidelines say not to use set.seed() inside function code,
and I’ve seen other questions answered saying not to use “seed” as a
function parameter in this way.

Is it preferable to check and modify .Random.seed directly, or is there
some other standard way of doing this?

Thanks,
Kylie

~~~
Kylie Ariel Bemis
Khoury College of Computer Sciences
Northeastern University
kuwisdelu.github.io









___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Merging GInteraction/GenomicInteractions ranges

2019-02-20 Thread Aaron Lun
> One thing I’m not clear on is how to expand the ranges from one step
> to the next.  The way GenomicInteractions are structured, there is a
> GRanges object with all possible ranges, and the GInteractions object
> is populated by reference to said interactions.

If you're already in C++, then life is easy. Just compute the minimum
bounding box within each run of interactions (if you want empirical
bounds) or compute the theoretical bounds based on the bin widths and
offsets. Collect all of these ranges and report them at the end of the
function to construct a new G(enomic)Interactions object.
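
For reference, InteractionSet already does the empirical version at the 
R level; a minimal sketch, where 'gi' and the grouping factor 'run.id' 
are illustrative:

library(InteractionSet)
bb <- boundingBox(gi, f=run.id) # one minimum bounding box per run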

Note that the theoretical bounds will be better at leveraging the
memory-saving internal structure of the GInteractions object, as it is
more likely that the anchor regions will be shared between different
bin pairs at the same height of the tree. With empirical bounds, this
is unlikely to be the case.

> What I am going to need is a new GRanges object with the new set of
> (expanded) ranges, and a way to map the prior ranges to the new,
> wider range.

If you're constructing the quad-tree using the recursive algorithm
discussed below, mapping is trivial. At each step of the recursion, you
pass the identity of the parent interaction and record it for each new
child interaction that is generated.

-A

> On Feb 13, 2019, at 3:34 AM, Aaron Lun <a...@gmail.com> wrote:
> > 
> > Note that your visual won't show up for many (all?) of us.
> > Nonetheless,
> > I think I know what you want to do.
> > 
> > Your task does not lend itself to vectorization, which makes it
> > difficult to write efficient R code. It's not impossible, but it
> > would
> > be quite hard to read and debug, and your maintenance programmer
> > will
> > be cursing you somewhere down the line.
> > 
> > If speed is truly a concern, I would write this code in C++. This
> > would
> > probably be several lines' worth of code:
> > 
> > 1. Compute a pair of bin IDs for each interaction by dividing each
> > anchor coordinate by the bin width and truncating the result.
> > (You'll
> > need to decide if you want to use the midpoint/start/end, etc.)
> > 2. Sort the interactions by the paired bin IDs, e.g., with
> > std::sort.
> > 3. Identify each "run" of interactions with the same paired IDs.
> > 4. Repeat step 1 within each run (you'll need to offset the anchor
> > coordinate before dividing this time). Append the current quadrant
> > to
> > the quadrant sequence for return to R at the end of recursion.
> > 
> > Clear, concise, and can be slapped together in less than half an
> > hour
> > with Rcpp and C++11, if you know what you're doing.
> > 
> > -A
> > 
> > On Tue, 2019-02-12 at 11:34 -0800, Luke Klein wrote:
> > > Hello.  I am planning to develop a new package which extends the
> > > GenomicInteractions package.  I would like some help/advice on
> > > implementing the following functionality.
> > > 
> > > Consider the following GenomicInteractions object
> > > 
> > > GenomicInteractions object with 10 interactions and 1 metadata
> > > column:
> > >    seqnames1   ranges1 seqnames2   ranges2 |counts
> > >        | 
> > >    [1]  chrA   1-2 ---  chrA  9-10 | 1
> > >    [2]  chrA   1-2 ---  chrA 15-16 | 1
> > >    [3]  chrA   3-4 ---  chrA   3-4 | 1
> > >    [4]  chrA   5-6 ---  chrA   7-8 | 1
> > >    [5]  chrA   5-6 ---  chrA  9-10 | 1
> > >    [6]  chrA   7-8 ---  chrA   7-8 | 1
> > >    [7]  chrA   7-8 ---  chrA 11-12 | 1
> > >    [8]  chrA   7-8 ---  chrA 17-18 | 1
> > >    [9]  chrA  9-10 ---  chrA  9-10 | 1
> > >   [10]  chrA  9-10 ---  chrA 15-16 | 1
> > >   ---
> > >   regions: 8 ranges and 0 metadata columns
> > >   seqinfo: 1 sequence from an unspecified genome; no seqlengths
> > > 
> > > 
> > > Which is visually represented thusly
> > > 
> > > 
> > > 
> > > I would like to do the following:
> > > 
> > > 1) I want to group the regions into bins of WxW (in this case, W
> > > will
> > > be 3), as in a quad-tree structure <https://en.wikipedia.org/wiki/Quadtree>
> > > with the final group being WxW (instead of 2x2).  This
> > > will
> > > involve 
> > >   - iteratively dividi

Re: [Bioc-devel] Merging GInteraction/GenomicInteractions ranges

2019-02-13 Thread Aaron Lun
Note that your visual won't show up for many (all?) of us. Nonetheless,
I think I know what you want to do.

Your task does not lend itself to vectorization, which makes it
difficult to write efficient R code. It's not impossible, but it would
be quite hard to read and debug, and your maintenance programmer will
be cursing you somewhere down the line.

If speed is truly a concern, I would write this code in C++. This would
probably be several lines' worth of code:

1. Compute a pair of bin IDs for each interaction by dividing each
anchor coordinate by the bin width and truncating the result. (You'll
need to decide if you want to use the midpoint/start/end, etc.)
2. Sort the interactions by the paired bin IDs, e.g., with std::sort.
3. Identify each "run" of interactions with the same paired IDs.
4. Repeat step 1 within each run (you'll need to offset the anchor
coordinate before dividing this time). Append the current quadrant to
the quadrant sequence for return to R at the end of recursion.

Clear, concise, and can be slapped together in less than half an hour
with Rcpp and C++11, if you know what you're doing.
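
For comparison, an R prototype of steps 1-3 ('gi' and the bin width 'W'
are illustrative; the recursion itself is what really wants C++):

library(InteractionSet)
W <- 1000L
bin1 <- mid(ranges(anchors(gi, type="first"))) %/% W  # step 1: bin IDs
bin2 <- mid(ranges(anchors(gi, type="second"))) %/% W
o <- order(bin1, bin2)                                # step 2: sort
runs <- rle(paste(bin1[o], bin2[o]))                  # step 3: runs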

-A

On Tue, 2019-02-12 at 11:34 -0800, Luke Klein wrote:
> Hello.  I am planning to develop a new package which extends the
> GenomicInteractions package.  I would like some help/advice on
> implementing the following functionality.
> 
> Consider the following GenomicInteractions object
> 
> GenomicInteractions object with 10 interactions and 1 metadata
> column:
>    seqnames1   ranges1 seqnames2   ranges2 |counts
>        | 
>    [1]  chrA   1-2 ---  chrA  9-10 | 1
>    [2]  chrA   1-2 ---  chrA 15-16 | 1
>    [3]  chrA   3-4 ---  chrA   3-4 | 1
>    [4]  chrA   5-6 ---  chrA   7-8 | 1
>    [5]  chrA   5-6 ---  chrA  9-10 | 1
>    [6]  chrA   7-8 ---  chrA   7-8 | 1
>    [7]  chrA   7-8 ---  chrA 11-12 | 1
>    [8]  chrA   7-8 ---  chrA 17-18 | 1
>    [9]  chrA  9-10 ---  chrA  9-10 | 1
>   [10]  chrA  9-10 ---  chrA 15-16 | 1
>   ---
>   regions: 8 ranges and 0 metadata columns
>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> 
> Which is visually represented thusly
> 
> 
> 
> I would like to do the following:
> 
> 1) I want to group the regions into bins of WxW (in this case, W will
> be 3), as in a quad-tree structure <https://en.wikipedia.org/wiki/Quadtree> with the final group being WxW (instead of 2x2).  This will
> involve 
>   - iteratively dividing the matrix into quadrants {upper-left
> (0), upper-right (1), lower-left (2), lower-right(3)} .
>   - labeling each subdivision in a new column until the final WxW
> resolution is reached.
>   - sorting by the columns
> 
> 
> 
> 
> GenomicInteractions object with 10 interactions and 1 metadata column:
>        seqnames1   ranges1     seqnames2   ranges2 | counts quad1 quad2
>    [1]      chrA       1-2 ---      chrA      9-10 |      1     0     1
>    [2]      chrA       1-2 ---      chrA     15-16 |      1     1     0
>    [3]      chrA       3-4 ---      chrA       3-4 |      1     0     0
>    [4]      chrA       5-6 ---      chrA       7-8 |      1     0     1
>    [5]      chrA       5-6 ---      chrA      9-10 |      1     0     1
>    [6]      chrA       7-8 ---      chrA       7-8 |      1     0     3
>    [7]      chrA       7-8 ---      chrA     11-12 |      1     0     3
>    [8]      chrA       7-8 ---      chrA     17-18 |      1     1     2
>    [9]      chrA      9-10 ---      chrA      9-10 |      1     0     3
>   [10]      chrA      9-10 ---      chrA     15-16 |      1     1     2
>   ---
>   regions: 8 ranges and 0 metadata columns
>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> 
> Sorting by the two columns yields what I am after.  Of course, I
> include the “quadX” column for illustration only.  Upon
> implementation, I would like these columns hidden from the user.
> 
> GenomicInteractions object with 10 interactions and 1 metadata column:
>        seqnames1   ranges1     seqnames2   ranges2 | counts quad1 quad2
>    [1]      chrA       3-4 ---      chrA       3-4 |      1     0     0
>    [2]      chrA       1-2 ---      chrA      9-10 |      1     0     1
>    [3]      chrA       5-6 ---      chrA       7-8 |      1     0     1
>    [4]      chrA       5-6 ---      chrA      9-10 |      1     0     1
>    [5]      chrA       7-8 ---      chrA 

Re: [Bioc-devel] Pushing towards a better home for matrix generics

2019-02-10 Thread Aaron Lun
>> might need to be changed from \alias{colSums,dgCMatrix,ANY,ANY-method}
>> to \alias{colSums,dgCMatrix-method}).
>>
>> Anybody wants to try to make a patch for this?
>>
>> H.
>
> I've already replied without having read the above two messages.  In my
> reply I had indeed more or less argued as Hervé does above.
>
> Michael, Hervé, ..: Why is it really so much better to disallow dispatch
> for the other compulsory arguments?  Dispatch there allows the use of
> methods for class "missing", which is nicer in my eyes than the
> traditional default argument + missing() "tricks".
>
> Is it mainly speed you are concerned about?  If yes, do we have data
> (and data analysis) about performance here?
>
> Martin
>
>> On 1/28/19 19:00, Michael Lawrence wrote:
>>> I agree (2) is a good compromise. CC'ing Martin for his perspective.
>>>
>>> Michael
>>>
>>> On Mon, Jan 28, 2019 at 6:58 PM Pages, Herve wrote:
>>>> Hi Aaron,
>>>>
>>>> The 4 matrix summarization generics currently defined in BiocGenerics
>>>> are defined as follows:
>>>>
>>>> setGeneric("rowSums", signature="x")
>>>> setGeneric("colSums", signature="x")
>>>> setGeneric("rowMeans", signature="x")
>>>> setGeneric("colMeans", signature="x")
>>>>
>>>> The only reason for having these definitions in BiocGenerics is to
>>>> restrict dispatch to the first argument. This is cleaner than what we
>>>> would get with the implicit generics, where dispatch is on all
>>>> arguments (it doesn't really make sense to dispatch on toggles like
>>>> 'na.rm' or 'dims'). Sticking to simple dispatch when possible makes
>>>> life easier for the developer (especially in times of troubleshooting)
>>>> and for the user (methods are easier to discover and their man pages
>>>> easier to access).
>>>>
>>>> However, the 4 statements above create new generics that mask the
>>>> implicit generics defined in the Matrix package (Matrix doesn't
>>>> contain any setGeneric statements for these generics, only setMethod
>>>> statements). This is a very unsatisfying situation and it has hit me
>>>> repeatedly over the last couple of years.
>>>>
>>>> We have basically 3 ways to go. From simpler to more complicated:
>>>>
>>>> 1) Give up on single dispatch for these generics. That is, we remove
>>>> the 4 statements above from BiocGenerics. Then we use setMethod() in
>>>> package code like Matrix does.
>>>>
>>>> 2) Convince the Matrix folks to put the 4 statements above in Matrix.
>>>> Then any BioC package that needs to define methods for these generics
>>>> would just need to import them from the Matrix package. Maybe we could
>>>> even push this one step

[Bioc-devel] Pushing towards a better home for matrix generics

2019-01-27 Thread Aaron Lun
This is a resurrection of some old threads:

https://stat.ethz.ch/pipermail/bioc-devel/2017-November/012273.html

https://github.com/Bioconductor/MatrixGenerics/issues

For those who are unfamiliar with this, the basic issue is that various
Matrix and BiocGenerics functions mask each other. This is mildly
frustrating in interactive sessions:

> library(Matrix)
> library(DelayedArray)
> x <- rsparsematrix(10, 10, 0.1)
> colSums(x) # fails
> Matrix::colSums(x) # okay

... but quite annoying during package development, requiring code like
this:

if (is(x, "Matrix")) {
z <- Matrix::colSums(x)
} else {
z <- colSums(x) # assuming DelayedArray does the masking.
}

... which defeats the purpose of using S4 dispatch in the first place.

I have been encountering this issue with increasing frequency in my
packages, as a lot of my code base needs to be able to interface with
both Matrix and Bioconductor objects (e.g., DelayedMatrices) at the
same time. What needs to happen so that I can just write:

z <- colSums(x)

... and everything will work for both Matrix and Bioconductor classes?
It seems that many of these function names are implicit generics
anyway; can BiocGenerics take advantage of that for the time being?
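
For context, the single-dispatch definitions in BiocGenerics look like 
this (dispatch restricted to 'x' only):

setGeneric("colSums", signature="x")

... while Matrix only defines methods on the implicit generic, so the 
two end up masking each other depending on load order.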

Best,

Aaron

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Plans for multi-feature SingleCellExperiment?

2019-01-22 Thread Aaron Lun
For 10X experiments, the Bioc-devel version of DropletUtils will read in
the additional features as extra rows in the count matrix. This reflects
how they are stored in the 10X output format. The row metadata will
record the nature of the feature.

In some cases it may be desirable to keep all the features together. For
starters, it seems like many of the biases are likely to be shared
(w.r.t. library preparation and capture efficiency), so one could
imagine using the same scaling factors for normalization of both
antibody-based features and endogenous mRNAs. In addition, all of the
scater visualization methods rely on SCE inputs, so if you want to
overlay them with protein marker intensities, they'll need to be in the
same matrix.

If you really need to only use mRNAs or antibody-based features, (i) you
can explicitly subset the SCE based on the rowData, or (ii) pass a
subsetting vector to the various scran/scater/whatever functions to tell
them to only use the specified features. Admittedly, if you're going to
be doing this a lot, it would be more convenient to form a MAE
containing two SCEs so that you only have to pass the SCE you want into
those functions.
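
A minimal sketch of the explicit subsetting in (i), assuming the row 
metadata carries a "Type" column along the lines of the 10X feature 
annotation (the exact column name may differ):

is.adt <- rowData(sce)$Type == "Antibody Capture"
adt <- sce[is.adt,]  # protein markers only
rna <- sce[!is.adt,] # endogenous mRNAs only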

To that end I would be willing to entertain a PR to DropletUtils to
create a MAE from an SCE. I'm more reluctant to add an isSpike()-like
function. The rationale behind isSpike() was that spike-ins are constant
across cells (theoretically) and thus a function could use this
information to improve its calculations. It's less clear what
mathematically useful information can be gained from protein markers -
biological info, yes, but nothing that you would use to change your
algorithm.

-A

Steve Lianoglou wrote:
> Comrades,
>
> Sorry if I'm out of the loop and have missed anything obvious.
>
> I was curious what the plans are in the single-cell bioconductor-verse
> to support single cell experiments that produce counts from different
> feature-spaces, such as those produced by CITE-seq / REAP-seq, for
> instance.
>
> In these types of experiments, I'm pretty sure we want the counts
> generated from those "features" (oligo-conjugated Antibodies, for
> instance) to be kept in a separate space than the mRNA counts. I think
> we would most  naturally want to put these in something like an
> `assay()` matrix with a different (rowwise) dimmension than the gene
> count matrix, but that can't work since all matrices in the assay()
> list need to be of the same dimensions.
>
> Another option might be to just add them as rows to the assay
> matrices, but keep some type of feature space meta-information akin to
> what `isSpike()` currently does;
>
> or add a new slot to SingleCellExperiment to hold counts from
> different feature spaces, perhaps?;
>
> Or rely on something like a MultiAssayExperiment?
>
> Or?
>
> Curious to learn which way you folks are leaning ...
>
> Thanks!
> -steve
>
> ps - sorry if this email came through twice, it was somehow magically
> sent from an email address I don't have access to anymore.
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] What is good convention for package-local BiocParallel param?

2019-01-15 Thread Aaron Lun
> From the above example, if I had the BPPARAM argument, it’d also clearly add 
> a lot of code noise:
> 
> data %>%
> apply_methods(method_list1, BPPARAM = MulticoreParam(stop.on.error=FALSE)) %>%
> apply_methods(method_list2, BPPARAM = MulticoreParam(stop.on.error=FALSE)) %>%
> apply_methods(method_list3, BPPARAM = MulticoreParam(stop.on.error=FALSE))
> 
> The compromise is to have
> 
> my_bpparam = MulticoreParam(stop.on.error=FALSE)
> 
> data %>%
> apply_methods(method_list1, BPPARAM = my_bpparam) %>%
> apply_methods(method_list2, BPPARAM = my_bpparam) %>%
> apply_methods(method_list3, BPPARAM = my_bpparam)

This actually looks like the best option to me. Nice and explicit, amenable to 
static code analysis (manual or automated).

> But really at that point why not just have
> 
> set_cellbench_bpparam(MulticoreParam(stop.on.error=FALSE))
> 
> data %>%
> apply_methods(method_list1) %>%
> apply_methods(method_list2) %>%
> apply_methods(method_list3)
> 
> Which I guess is the point of this mail chain. Is there a reason why I 
> shouldn’t?

Can’t hurt to have fewer globals, I guess?

-A

>> On 12 Jan 2019, at 5:43 pm, Aaron Lun wrote:
>> 
>> My current set-up in a variety of packages is that every parallelizable 
>> function has a BPPARAM= argument. This makes it explicit about which steps 
>> are being parallelized. Requiring users to respecify BPPARAM= in each 
>> function isn’t as annoying as you’d think, because not many steps are 
>> actually heavy enough to warrant parallelization.
>> 
>> Ideally, I'd have BPPARAM=bpparam() by default, allowing me to both respond 
>> to the register()'d bpparam() as well as any custom argument that might be 
>> supplied by the user, e.g., if they don't want to change bpparam(). However, 
>> for various reasons (discussed in the other SerialParam() thread), the 
>> current default is BPPARAM=SerialParam().
>> 
>> To be honest, I've never thought it necessary to have a global 
>> package-specific parameter for parallelization as you've done (for scPipe, 
>> presumably). The current options - global across all packages with 
>> register(), or local to individual functions with BPPARAM= - seem to be 
>> satisfactory in the vast majority of cases. At least to me.
>> 
>> And at least for RNGs, if a function from another package is giving greatly 
>> different results upon parallelization (excepting some numerical error with 
>> changed order of summation), I'd say that's a bug of some sort. That should 
>> be fixed on their end, rather than requiring other packages and users to 
>> tiptoe around them.
>> 
>> -A
>> 
>>> On 10 Jan 2019, at 23:59, Shian Su  wrote:
>>> 
>>> Hello Developers,
>>> 
>>> I’m using BiocParallel for parallelism, and I understand that register() is 
>>> the recommended method for setting threads. But I am reluctant to ask 
>>> people to run code for my package which changes how other packages operate, 
>>> so I figured I’d use local bp params. Recent discussions of RNG has made me 
>>> worried there may be hidden state gotcha’s I’ve not considered. The current 
>>> implementation is
>>> 
>>> set_mypkg_threads <- function(n) {
>>>     if (n == 1) {
>>>         options("mypkg.bpparam" = SerialParam())
>>>     } else if (n > 1) {
>>>         if (.Platform$OS.type == "windows") {
>>>             options("mypkg.bpparam" = SnowParam(n))
>>>         } else {
>>>             options("mypkg.bpparam" = MulticoreParam(n))
>>>         }
>>>     }
>>> }
>>> 
>>> Then elsewhere in my package I make use of parallelism as follows
>>> 
>>> bplapply(
>>> BPPARAM = getOption("mypkg.bpparam", bpparam()),
>>> …
>>> )
>>> 
>>> Where getOption() either retrieves my set option or the default value given 
>>> by bpparam(). So the behaviour is that if users have not registered params 
>>> for my package specifically then it will take the BiocParallel default, but 
>>> otherwise it will use my package’s local bpparam.
>>> 
>>> Also I know that as currently implemented, I preclude cluster parallelism 
>>> on non-Windows machines. But it’s easy to fix. Just looking for feedback on 
>>> the approach.
>>> 
>>> Kind regards,
>>> Shian Su
>>> 

Re: [Bioc-devel] What is good convention for package-local BiocParallel param?

2019-01-11 Thread Aaron Lun
My current set-up in a variety of packages is that every parallelizable 
function has a BPPARAM= argument. This makes it explicit about which steps are 
being parallelized. Requiring users to respecify BPPARAM= in each function 
isn’t as annoying as you’d think, because not many steps are actually heavy 
enough to warrant parallelization.
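
For concreteness, a minimal sketch of this pattern (the function and worker 
names are hypothetical, not from any particular package):

    myAnalysis <- function(x, BPPARAM = SerialParam()) {
        # heavy_step() stands in for the expensive per-element work
        bplapply(x, heavy_step, BPPARAM = BPPARAM)
    }

Callers then opt in explicitly, e.g., myAnalysis(x, BPPARAM = MulticoreParam(4)).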

Ideally, I'd have BPPARAM=bpparam() by default, allowing me to both respond to 
the register()'d bpparam() as well as any custom argument that might be 
supplied by the user, e.g., if they don't want to change bpparam(). However, 
for various reasons (discussed in the other SerialParam() thread), the current 
default is BPPARAM=SerialParam().

To be honest, I've never thought it necessary to have a global package-specific 
parameter for parallelization as you've done (for scPipe, presumably). The 
current options - global across all packages with register(), or local to 
individual functions with BPPARAM= - seem to be satisfactory in the vast 
majority of cases. At least to me.

And at least for RNGs, if a function from another package is giving greatly 
different results upon parallelization (excepting some numerical error with 
changed order of summation), I'd say that's a bug of some sort. That should be 
fixed on their end, rather than requiring other packages and users to tiptoe 
around them.

-A

> On 10 Jan 2019, at 23:59, Shian Su  wrote:
> 
> Hello Developers,
> 
> I’m using BiocParallel for parallelism, and I understand that register() is 
> the recommended method for setting threads. But I am reluctant to ask people 
> to run code for my package which changes how other packages operate, so I 
> figured I’d use local bp params. Recent discussions of RNG have made me 
> worried there may be hidden state gotchas I’ve not considered. The current 
> implementation is
> 
> set_mypkg_threads <- function(n) {
>     if (n == 1) {
>         options("mypkg.bpparam" = SerialParam())
>     } else if (n > 1) {
>         if (.Platform$OS.type == "windows") {
>             options("mypkg.bpparam" = SnowParam(n))
>         } else {
>             options("mypkg.bpparam" = MulticoreParam(n))
>         }
>     }
> }
> 
> Then elsewhere in my package I make use of parallelism as follows
> 
> bplapply(
>     BPPARAM = getOption("mypkg.bpparam", bpparam()),
>     …
> )
> 
> Where getOption() either retrieves my set option or the default value given 
> by bpparam(). So the behaviour is that if users have not registered params 
> for my package specifically then it will take the BiocParallel default, but 
> otherwise it will use my package’s local bpparam.
> 
> Also I know that as currently implemented, I preclude cluster parallelism on 
> non-Windows machines. But it’s easy to fix. Just looking for feedback on the 
> approach.
> 
> Kind regards,
> Shian Su
> 
> ___
> 
> The information in this email is confidential and intended solely for the 
> addressee.
> You must not disclose, forward, print or use it without the permission of the 
> sender.
> 
> The Walter and Eliza Hall Institute acknowledges the Wurundjeri people of the 
> Kulin
> Nation as the traditional owners of the land where our campuses are located 
> and
> the continuing connection to country and community.
> ___
> 
>   [[alternative HTML version deleted]]
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Using SerialParam() as the registered back-end for all platforms

2019-01-08 Thread Aaron Lun
; *invariant to the number of workers* (amount of chunking), I think the
>> only solution is to pregenerated RNG seeds (using
>> parallel::nextRNGStream()) for each individual iteration (element).
>> In other words, if a worker will process K elements, then the main R
>> process needs to generate K RNG seeds and pass those along to the
>> work.  I use this approach for future.apply::future_lapply(...,
>> future.seed = TRUE/), which then produce identical RNG
>> results regardless of backend and amount of chunking.  In the past, I
>> think I've seen Martin suggesting something similar as a manual
>> approach to some users.
>> 
>> 2. The above approach is obviously expensive, especially when there
>> are a large number of elements to iterate over.  Because of this I'm
>> thinking providing an option to use only one RNG seed per worker
>> (which is the common approach used elsewhere)
>> [https://github.com/HenrikBengtsson/future.apply/issues/20].  This
>> won't be invariant to the number of workers, but it "should" still be
>> statistically sound.  This approach will give reproducible RNG results
>> given the same initial seed and the same amount of chunking.
>> 
>> 3. For algorithms which do not rely on RNG, we can ignore both of the
>> above.  The problem is that it's not always known to the
>> user/developer which methods depend on RNG or not.  The above 'RNG
>> tracker' helps to identify some, but things might also change over
>> time.  I believe there's room for automating this in one way or the
>> other.  For instance, having a way to declare a function being
>> dependent on RNG or not could help.  Static code inspection could also
>> do it, e.g. when an R package is built and it could be part of the R
>> CMD checks to validate.
>> 
>> 4. Are there other approaches?
>> 
>> /Henrik
>> 
>>> 
>>> The documented behavior is to use the RNGseed= argument to *Param, but I
>> think it could be made consistent (by default, obey the global random
>> number seed on workers) at least on a single machine (where the default
>> number of cores is constant).
>>> 
>>> I have not (yet?) changed the default behavior to SerialParam. I guess
>> the cost of SerialParam is from the dependent packages that need to be
>> loaded
>>> 
>>>> system.time(suppressPackageStartupMessages(library(DelayedArray)))
>>>   user  system elapsed
>>>  3.068   0.082   3.150
>>> 
>>> If fastMNN() makes several calls to bplapply(), it might make sense to
>> start the default cluster at the top of the function once
>>> 
>>>    if (!bpisup(bpparam())) {
>>>        bpstart(bpparam())
>>>        on.exit(bpstop(bpparam()))
>>>    }
>>> 
>>> Martin
>>> 
>>> On 1/6/19, 11:16 PM, "Bioc-devel on behalf of Aaron Lun" <
>> bioc-devel-boun...@r-project.org on behalf of
>> infinite.monkeys.with.keyboa...@gmail.com> wrote:
>>> 
>>>As we know, the default BiocParallel backends are currently set to
>> MulticoreParam (Linux/Mac) or SnowParam (Windows). I can understand this to
>> some extent because a new user running, say, bplapply() without additional
>> arguments or set-up would expect some kind of parallelization. However,
>> from a developer’s perspective, I would argue that it makes more sense to
>> use SerialParam() by default.
>>> 
>>>1. It avoids problems with MulticoreParam stalling (especially on
>> Macs) when the randomly chosen port is already in use. This used to be a
>> major problem, to the point that all my BiocParallel-using functions in
>> scran passed BPPARAM=SerialParam() by default. Setting SerialParam() as
>> package default would ensure BiocParallel functions run properly in the
>> first place; if the code stalls due to switching to MulticoreParam, then
>> it’s obvious where the problem lies (and how to fix it).
>>> 
>>>2. It avoids the alteration of the random seed when the
>> MulticoreParam instance is constructed for the first time.
>>> 
>>>library(BiocParallel) # new R session
>>>set.seed(100)
>>>invisible(bplapply(1:5, identity))
>>>rnorm(1) # 0.1315312
>>>set.seed(100)
>>>invisible(bplapply(1:5, identity))
>>>rnorm(1) # -0.5021924
>>> 
>>>This is because the first bplapply() call calls bpparam(), which
>> constructs a MulticoreParam() for the first time; this calls the PRNG to
>> choose a random port number. Ensuing random numbers a

Re: [Bioc-devel] Controlling vignette compilation order

2019-01-07 Thread Aaron Lun
Agreed. And the BioC build system doesn’t even CHECK workflow packages, so 
r75944 actually wouldn’t have an effect at all.

P.S. simpleSingleCell has successfully built on the BioC-devel with my custom 
inter-vignette compilation set-up, so that’s a relief.

> On 7 Jan 2019, at 20:13, Pages, Herve  wrote:
> 
> This changes the default for _R_CHECK_BUILD_VIGNETTES_SEPARATELY_ from 
> false to true so only affects the re-built of the vignettes during 'R 
> CMD check'. While this is a step in the right direction, it would be 
> good if  'R CMD build' was modified accordingly i.e. to also build 
> vignettes in separate processes. With the current inconsistency, there 
> will be situations where 'R CMD check' will fail to re-build vignettes 
> that were just built by 'R CMD build'. Also, even though 'R CMD check' 
> now avoids the MAX_DLL problem, it is not that useful if 'R CMD build' 
> still has the problem and fails to build the package in the 1st place.
> 
> H.
> 
> On 1/2/19 17:01, Martin Morgan wrote:
>> r75944 | ripley | 2019-01-02 03:37:21 -0500 (Wed, 02 Jan 2019) | 1 line
>> 
>> making re-building vignettes in separate processes the default
>> 
>> from R-devel suggests that stand-alone vignettes are now necessary.
>> 
>> Martin
>> 
>> On 12/24/18, 3:02 AM, "Bioc-devel on behalf of Aaron Lun" 
>> > infinite.monkeys.with.keyboa...@gmail.com> wrote:
>> 
>> A working example of knitr caching across workflows is now available at 
>> https://github.com/LTLA/BiocWorkCache.
>> 
>> It uses “~/chipseq.log” as a log to demonstrate that the code in the 
>> most-upstream workflow (“test1.Rmd”) is indeed only executed once during the 
>> BUILD.
>> 
>> Note that the compilation of upstream vignettes involves a system call 
>> out to a separate R session. This avoids some difficult issues with caching 
>> when an Rmd file is compiled from within another Rmd file - trying to use 
>> rmarkdown::render() on the upstream vignette within a downstream vignette 
>> does not generate a cache that is recognized when BUILD goes on to compile 
>> the upstream vignette.
>> 
>> -A
>> 
>>> On 23 Dec 2018, at 01:24, Aaron Lun 
>>>  wrote:
>>> 
>>> Yes, I had noticed the vignettes.rds as well, and I figured that would be a 
>>> problem.
>>> 
>>> I just tried setting cache=TRUE in my vignettes, implemented such that 
>>> BUILDing each downstream vignette will also run all upstream vignettes on 
>>> which it depends (that haven’t already been compiled). If an upstream 
>>> vignette is run in this manner, it caches the results of each code chunk to 
>>> avoid repeated work when it gets compiled “for real” by R CMD BUILD.
>>> 
>>> This seems to work on initial inspection (the caches are produced for the 
>>> upstream vignettes upon running one downstream vignette). I’ll have to 
>>> check whether this plays nice with R CMD BUILD. I will probably have to 
>>> write a function to isolate the scope of the execution of each upstream 
>>> vignette, to avoid polluting the namespace and cache of each downstream 
>>> vignette.
>>> 
>>> -A
>>> 
>>>> On 22 Dec 2018, at 19:22, Henrik Bengtsson wrote:
>>>> 
>>>> On Sat, Dec 22, 2018 at 10:56 AM Michael Lawrence wrote:
>>>>> 
>>>>> Anything that eventually lands in inst/doc is a vignette, I think, so
>>>>> there might be a hack around that.
>>>> 
>>>> Just so this is not misread - it's *not* possible to just hack your
>>>> vignette "product" files (PDF or HTML) into inst/doc and thinking
>>>> you're good.  R keeps track of package vignettes in a "vignette
>>>> index", e.g.
>>>> 
>>>>> readRDS(system.file(package = "utils", "Meta", "vignette.rds"))
>>>>         File              Title        PDF        R Depends Keywords
>>

Re: [Bioc-devel] Using SerialParam() as the registered back-end for all platforms

2019-01-07 Thread Aaron Lun
The main problem I’ve described refers to changes in the random seed due to the 
MulticoreParam() constructor, prior to dispatch to workers. For the 
related-but-separate problem of obtaining consistent random results within each 
worker, we’ve been discussing the possible solutions on another Bioc-devel 
thread (https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014498.html).

-A

> On 7 Jan 2019, at 15:03, Ryan Thompson  wrote:
> 
> I don't know if this is helpful for BiocParallel, but there's an extension 
> for the foreach package that ensures reproducible RNG behavior for all 
> parallel backends: https://cran.r-project.org/web/packages/doRNG/index.html 
> <https://cran.r-project.org/web/packages/doRNG/index.html>
> 
> Perhaps some of the principles from that package can be re-used?
> 
> On Mon, Jan 7, 2019 at 9:37 AM Aaron Lun wrote:
> > I hope for 1. to have a 'local socket' (i.e., not using ports) 
> > implementation shortly.
> 
> Yes, that would be helpful.
> 
> > I committed a patch in 1.17.6 for the wrong-seeming behavior of 2. We now 
> > have
> > 
> >> library(BiocParallel)
> >> set.seed(1); p = bpparam(); rnorm(1)
> > [1] -0.6264538
> >> set.seed(1); p = bpparam(); rnorm(1)
> > [1] -0.6264538
> > 
> at the expense of using the generator when the package is loaded.
> > 
> >> set.seed(1); rnorm(1)
> > [1] -0.6264538
> >> set.seed(1); library(BiocParallel); rnorm(1)
> > [1] 0.1836433
> > 
> > Is that bad? It will be consistent across platforms.
> 
> Hm. I guess the changed behaviour is… better, in the sense that the second 
> scenario (setting the seed before loading the package) is less likely in real 
> analysis code.
> 
> Even so, there are probably some edge cases where this could cause issues, 
> e.g., when:
> 
> set.seed(1)
> MyPackage::SomeFun()
> 
> … where MyPackage causes BiocParallel to be attached, which presumably 
> changes the seed.
> 
> Having thought about it for a while, the fact that bpparam() changes the 
> random seed is only a secondary issue. The main issue is that it doesn’t 
> change the seed reproducibly. So I wouldn’t mind *as much* if repeated calls 
> of:
> 
> set.seed(1)
> bpparam()
> rnorm(1)
> 
> … gave the same result, even if it were different from just running 
> “set.seed(1); rnorm(1)”. (Mind you, I’d still mind a little, but it wouldn’t 
> be so bad.) The biggest problem with the current state of affairs is that the 
> first call gives different results to all subsequent calls, which really 
> interferes with debugging attempts.
> 
> > This behavior
> > 
> >> set.seed(1); unlist(bplapply(1:2, function(i) rnorm(1)))
> > [1] 0.9624337 0.8925947
> >> set.seed(1); unlist(bplapply(1:2, function(i) rnorm(1)))
> > [1] -0.5703597  0.1102093
> > 
> > seems wrong, but is consistent with mclapply
> > 
> >> set.seed(1); unlist(mclapply(1:2, function(i) rnorm(1)))
> > [1] -0.02704527  0.40721777
> >> set.seed(1); unlist(mclapply(1:2, function(i) rnorm(1)))
> > [1] -0.8239765  1.2957928
> > 
> > The documented behavior is to use the RNGseed= argument to *Param, but I 
> > think it could be made consistent (by default, obey the global random 
> > number seed on workers) at least on a single machine (where the default 
> > number of cores is constant).
> 
> I’m less concerned with that behaviour, given it’s inherently hard to take 
> randomization code written for serial execution and make it give the same 
> results on multiple cores (as we discussed elsewhere).
> 
> > I have not (yet?) changed the default behavior to SerialParam. I guess the 
> > cost of SerialParam is from the dependent packages that need to be loaded
> > 
> >> system.time(suppressPackageStartupMessages(library(DelayedArray)))
> >   user  system elapsed
> >  3.068   0.082   3.150
> 
> Does calling “SerialParam()” cause DelayedArray to be attached? That seems 
> odd.
> 
> > If fastMNN() makes several calls to bplapply(), it might make sense to 
> > start the default cluster at the top of the function once
> > 
> >    if (!bpisup(bpparam())) {
> >        bpstart(bpparam())
> >        on.exit(bpstop(bpparam()))
> >    }
> 
> This is probably a good idea to do in general to all of my parallelized 
> functions, though I don’t know how much this will solve the time problem. 
> Perhaps I should just do it and see.
> 
> -A
> 
> > On 1/6/19, 11:

Re: [Bioc-devel] Using SerialParam() as the registered back-end for all platforms

2019-01-07 Thread Aaron Lun
> I hope for 1. to have a 'local socket' (i.e., not using ports) implementation 
> shortly.

Yes, that would be helpful.

> I committed a patch in 1.17.6 for the wrong-seeming behavior of 2. We now have
> 
>> library(BiocParallel)
>> set.seed(1); p = bpparam(); rnorm(1)
> [1] -0.6264538
>> set.seed(1); p = bpparam(); rnorm(1)
> [1] -0.6264538
> 
> at the expense of using the generator when the package is loaded.
> 
>> set.seed(1); rnorm(1)
> [1] -0.6264538
>> set.seed(1); library(BiocParallel); rnorm(1)
> [1] 0.1836433
> 
> Is that bad? It will be consistent across platforms.

Hm. I guess the changed behaviour is… better, in the sense that the second 
scenario (setting the seed before loading the package) is less likely in real 
analysis code.

Even so, there are probably some edge cases where this could cause issues, 
e.g., when:

set.seed(1)
MyPackage::SomeFun()

… where MyPackage causes BiocParallel to be attached, which presumably changes 
the seed.

Having thought about it for a while, the fact that bpparam() changes the random 
seed is only a secondary issue. The main issue is that it doesn’t change the 
seed reproducibly. So I wouldn’t mind *as much* if repeated calls of:

set.seed(1)
bpparam()
rnorm(1)

… gave the same result, even if it were different from just running 
“set.seed(1); rnorm(1)”. (Mind you, I’d still mind a little, but it wouldn’t be 
so bad.) The biggest problem with the current state of affairs is that the 
first call gives different results to all subsequent calls, which really 
interferes with debugging attempts.

> This behavior
> 
>> set.seed(1); unlist(bplapply(1:2, function(i) rnorm(1)))
> [1] 0.9624337 0.8925947
>> set.seed(1); unlist(bplapply(1:2, function(i) rnorm(1)))
> [1] -0.5703597  0.1102093
> 
> seems wrong, but is consistent with mclapply
> 
>> set.seed(1); unlist(mclapply(1:2, function(i) rnorm(1)))
> [1] -0.02704527  0.40721777
>> set.seed(1); unlist(mclapply(1:2, function(i) rnorm(1)))
> [1] -0.8239765  1.2957928
> 
> The documented behavior is to use the RNGseed= argument to *Param, but I think 
> it could be made consistent (by default, obey the global random number seed 
> on workers) at least on a single machine (where the default number of cores 
> is constant).

I’m less concerned with that behaviour, given it’s inherently hard to take 
randomization code written for serial execution and make it give the same 
results on multiple cores (as we discussed elsewhere).

> I have not (yet?) changed the default behavior to SerialParam. I guess the 
> cost of SerialParam is from the dependent packages that need to be loaded
> 
>> system.time(suppressPackageStartupMessages(library(DelayedArray)))
>   user  system elapsed
>  3.068   0.082   3.150

Does calling “SerialParam()” cause DelayedArray to be attached? That seems odd.

> If fastMNN() makes several calls to bplapply(), it might make sense to start 
> the default cluster at the top of the function once
> 
>    if (!bpisup(bpparam())) {
>        bpstart(bpparam())
>        on.exit(bpstop(bpparam()))
>    }

This is probably a good idea to do in general to all of my parallelized 
functions, though I don’t know how much this will solve the time problem. 
Perhaps I should just do it and see.
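
Something along these lines in each function, say (a sketch only; bpisup() is 
BiocParallel’s test for whether a backend has been started, and some_task() is 
a placeholder for the actual work):

    parallelFun <- function(x, BPPARAM = bpparam()) {
        if (!bpisup(BPPARAM)) {
            bpstart(BPPARAM)
            on.exit(bpstop(BPPARAM))
        }
        bplapply(x, some_task, BPPARAM = BPPARAM)
    }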

-A

> On 1/6/19, 11:16 PM, "Bioc-devel on behalf of Aaron Lun" 
>  infinite.monkeys.with.keyboa...@gmail.com> wrote:
> 
>As we know, the default BiocParallel backends are currently set to 
> MulticoreParam (Linux/Mac) or SnowParam (Windows). I can understand this to 
> some extent because a new user running, say, bplapply() without additional 
> arguments or set-up would expect some kind of parallelization. However, from 
> a developer’s perspective, I would argue that it makes more sense to use 
> SerialParam() by default. 
> 
>1. It avoids problems with MulticoreParam stalling (especially on Macs) 
> when the randomly chosen port is already in use. This used to be a major 
> problem, to the point that all my BiocParallel-using functions in scran 
> passed BPPARAM=SerialParam() by default. Setting SerialParam() as package 
> default would ensure BiocParallel functions run properly in the first place; 
> if the code stalls due to switching to MulticoreParam, then it’s obvious 
> where the problem lies (and how to fix it).
> 
>2. It avoids the alteration of the random seed when the MulticoreParam 
> instance is constructed for the first time. 
> 
>library(BiocParallel) # new R session
>set.seed(100)
>invisible(bplapply(1:5, identity))
>rnorm(1) # 0.1315312
>set.seed(100)
>invisible(bplapply(1:5, identity))
>rnorm(1) # -0.5021924
> 
>This is because the first bplapply() call calls bpparam(), which 

[Bioc-devel] Using SerialParam() as the registered back-end for all platforms

2019-01-06 Thread Aaron Lun
As we know, the default BiocParallel backends are currently set to 
MulticoreParam (Linux/Mac) or SnowParam (Windows). I can understand this to 
some extent because a new user running, say, bplapply() without additional 
arguments or set-up would expect some kind of parallelization. However, from a 
developer’s perspective, I would argue that it makes more sense to use 
SerialParam() by default. 

1. It avoids problems with MulticoreParam stalling (especially on Macs) when 
the randomly chosen port is already in use. This used to be a major problem, to 
the point that all my BiocParallel-using functions in scran passed 
BPPARAM=SerialParam() by default. Setting SerialParam() as package default 
would ensure BiocParallel functions run properly in the first place; if the 
code stalls due to switching to MulticoreParam, then it’s obvious where the 
problem lies (and how to fix it).

2. It avoids the alteration of the random seed when the MulticoreParam instance 
is constructed for the first time. 

library(BiocParallel) # new R session
set.seed(100)
invisible(bplapply(1:5, identity))
rnorm(1) # 0.1315312
set.seed(100)
invisible(bplapply(1:5, identity))
rnorm(1) # -0.5021924

This is because the first bplapply() call calls bpparam(), which constructs a 
MulticoreParam() for the first time; this calls the PRNG to choose a random 
port number. Ensuing random numbers are altered, as seen above. To avoid this, 
I need to define the MulticoreParam() object prior to set.seed(), which 
undermines the utility of a default-defined bpparam().
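
That is, reproducibility currently forces a restructuring of the toy example 
above into something like:

library(BiocParallel) # new R session
p <- MulticoreParam() # the PRNG draw for the port happens here...
set.seed(100) # ...so seeding afterwards gives stable results
invisible(bplapply(1:5, identity, BPPARAM = p))
rnorm(1) # same value on every repetition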

3. Job dispatch via SnowParam() is quite slow, which potentially makes Windows 
package builds run slower by default. A particularly bad example is that of 
scran::fastMNN(), which has a few matrix multiplications that use 
DelayedArray’s %*%. The %*% is parallelized with the default bpparam(), resulting 
in SNOW parallelization on Windows. This slowed down fastMNN()’s examples from 
4 seconds (unix) to ~100 seconds (windows). Clearly, serial execution is the 
faster option here. A related problem is MulticoreParam()’s tendency to copy 
the environment, which may result in problems from inflated memory consumption.

So, can we default to SerialParam() on all platforms? And by this I mean the 
BiocParallel in-built default - I don’t want to have to instruct all my users 
to put a “register(SerialParam())” at the start of their analysis scripts. I 
feel that BiocParallel’s job is to provide downstream code with the potential 
for parallelization. If end-users want actual parallelization, they had better 
be prepared to specify an appropriate scheme via *Param() objects. 
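
For an end-user, that opt-in would be a single explicit line at the top of the 
analysis script, e.g.:

register(MulticoreParam(workers = 4))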

-A




[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Controlling vignette compilation order

2019-01-04 Thread Aaron Lun
Thanks for the update Martin. At least separate processes will avoid the 
MAX_DLL problems we had earlier.

“simpleSingleCell” is now set up so that each vignette will try to compile any 
upstream vignettes that it depends on (if those upstream targets haven’t 
already been built). This is done in a separate R process via callr to avoid 
caching difficulties as described previously. knitr caching is used to avoid 
redundant work when an already-compiled upstream vignette is recompiled by R 
CMD BUILD to create the HTML.
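
Roughly, the idea is as follows - a sketch only, not the exact implementation 
(the file-existence check and rendering details are illustrative):

.compile <- function(rmd) {
    html <- sub("\\.Rmd$", ".html", rmd)
    if (!file.exists(html)) {
        # render in a fresh R process, so that the knitr cache is laid down
        # the same way as when R CMD BUILD later compiles the vignette itself
        callr::r(function(f) rmarkdown::render(f), args = list(rmd))
    }
    invisible(html)
}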

This setup seems to behave on the Bioc build system (note that the package 
itself is failing for other reasons that should be fixed in the latest 
version). There are no longer any implicit dependencies between vignettes based 
on the compilation order, which is probably a good thing. All vignettes must 
explicitly “simpleSingleCell:::.compile()” their upstream targets.

It’s worth pointing out that my old vignette setup would have withstood r75944, 
as everything necessary was saved to file before being reloaded by each 
vignette. As it is now, the set-up can also withstand arbitrary changes to the 
compilation order. Now I only have to worry if R should decide to compile 
vignettes in different directories (or empty out the directory before compiling 
each vignette).

-A

> On 3 Jan 2019, at 01:01, Martin Morgan  wrote:
> 
> r75944 | ripley | 2019-01-02 03:37:21 -0500 (Wed, 02 Jan 2019) | 1 line
> 
> making re-building vignettes in separate processes the default
> 
> from R-devel suggests that stand-alone vignettes are now necessary.
> 
> Martin
> 
> On 12/24/18, 3:02 AM, "Bioc-devel on behalf of Aaron Lun" 
>  infinite.monkeys.with.keyboa...@gmail.com> wrote:
> 
>A working example of knitr caching across workflows is now available at 
> https://github.com/LTLA/BiocWorkCache.
> 
>It uses “~/chipseq.log” as a log to demonstrate that the code in the 
> most-upstream workflow (“test1.Rmd”) is indeed only executed once during the 
> BUILD.
> 
>Note that the compilation of upstream vignettes involves a system call out 
> to a separate R session. This avoids some difficult issues with caching when 
> an Rmd file is compiled from within another Rmd file - trying to use 
> rmarkdown::render() on the upstream vignette within a downstream vignette 
> does not generate a cache that is recognized when BUILD goes on to compile the 
> upstream vignette.
> 
>-A
> 
>> On 23 Dec 2018, at 01:24, Aaron Lun 
>>  wrote:
>> 
>> Yes, I had noticed the vignettes.rds as well, and I figured that would be a 
>> problem.
>> 
>> I just tried setting cache=TRUE in my vignettes, implemented such that 
>> BUILDing each downstream vignette will also run all upstream vignettes on 
>> which it depends (that haven’t already been compiled). If an upstream 
>> vignette is run in this manner, it caches the results of each code chunk to 
>> avoid repeated work when it gets compiled “for real” by R CMD BUILD.
>> 
>> This seems to work on initial inspection (the caches are produced for the 
>> upstream vignettes upon running one downstream vignette). I’ll have to check 
>> whether this plays nice with R CMD BUILD. I will probably have to write a 
>> function to isolate the scope of the execution of each upstream vignette, to 
>> avoid polluting the namespace and cache of each downstream vignette.
>> 
>> -A
>> 
>>> On 22 Dec 2018, at 19:22, Henrik Bengtsson wrote:
>>> 
>>> On Sat, Dec 22, 2018 at 10:56 AM Michael Lawrence wrote:
>>>> 
>>>> Anything that eventually lands in inst/doc is a vignette, I think, so
>>>> there might be a hack around that.
>>> 
>>> Just so this is not misread - it's *not* possible to just hack your
>>> vignette "product" files (PDF or HTML) into inst/doc and thinking
>>> you're good.  R keeps track of package vignettes in a "vignette
>>> index", e.g.
>>> 
>>>> readRDS(system.file(package = "utils", "Meta", "vignette.rds"))
>>>         File              Title        PDF        R Depends Keywords
>>> 1 Sweave.Rnw Sweave User Manual Sweave.pdf Sweave.R   tools
>>> 
>>> which is created during 'R CMD build' by parsing and compiling the
>>> vignettes 
>>> (https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/tools/R/build.R#L283-L393).
>>> This

Re: [Bioc-devel] how to achieve reproducibility with BiocParallel regardless of number of threads and OS (set.seed is disallowed)

2019-01-02 Thread Aaron Lun
I’ll also back-track a bit from my advice in the original support site posting, 
as it turns out C++11’s <random> is not guaranteed to be reproducible across 
platforms. 

That is to say, the RNG engines are portably defined across implementations, 
but the distribution classes (that convert the random stream into values of the 
desired distribution) turn out to be implementation-defined. As such, you can 
get different distribution values from the same seed with different compilers. 
Fun, huh? So much for “standard”! So, by using <random>, I ended up trading 
irreproducibility across workers for irreproducibility across platforms. 

Switching to boost::random (as provided by the BH package) seems to fix the 
problem, though one wonders how this was ever allowed to happen in the first 
place.

-A

> On 2 Jan 2019, at 14:45, Martin Morgan  wrote:
> 
> I'll back-track on my advice a little, and say that the right way to enable 
> the user to get reproducible results is to respect the setting the user makes 
> outside your function. So for
> 
> your = function()
> unlist(bplapply(1:4, rnorm))
> 
> The user will
> 
> register(MulticoreParam(2, RNGseed=123))
> your()
> 
> to always produces the identical result.
> 
> Following Aaron's strategy, the R-level approach to reproducibility might be 
> along the lines of 
> 
> - tell the user to set parallel::RNGkind("L'Ecuyer-CMRG") and set.seed()
> - In your function, generate seeds for each job
> 
>n = 5; seeds <- vector("list", n)
>seeds[[1]] = .Random.seed  # FIXME fails if set.seed or random nos. have 
> not been generated...
>for (i in tail(seq_len(n), -1)) seeds[[i]] = nextRNGStream(seeds[[i - 1]])
> 
> - send these, along with the job, to the workers, setting .Random.seed on 
> each worker
> 
>bpmapply(function(i, seed, ...) {
>oseed <- get(".Random.seed", envir = .GlobalEnv)
>on.exit(assign(".Random.seed", oseed, envir = .GlobalEnv))
>assign(".Random.seed", seed, envir = .GlobalEnv)
>...
>}, seq_len(n), seeds, ...)
> 
> The use of L'Ecuyer-CMRG and `nextRNGStream()` means that the streams on each 
> worker are independent. Using on.exit means that, even on the worker, the 
> state of the random number generator is not changed by the evaluation. This 
> means that even with SerialParam() the generator is well-behaved. I don’t 
> know how BiocCheck responds to use of .Random.seed, which in general would be 
> a bad thing to do but in this case with the use of on.exit() the usage seems 
> ok.
> 
> Martin
> 
> 
> On 12/31/18, 3:17 PM, "Lulu Chen"  wrote:
> 
>Hi Martin,
> 
> 
>Thanks for your help. But setting different number of workers will 
> generate different results:
> 
> 
>> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(1, RNGseed=123)))
> [1]  1.0654274 -1.2421454  1.0523311 -0.7744536  1.3081934 -1.5305223  
> 1.1525356  0.9287607 -0.4355877  1.5055436
>> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(2, RNGseed=123)))
> [1] -0.9685927  0.7061091  1.4890213 -0.4094454  0.8909694 -0.8653704  
> 1.4642711  1.2674845 -0.2220491  2.4505322
>> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(3, RNGseed=123)))
> [1] -0.96859273 -0.40944544  0.89096942 -0.86537045  1.46427111  
> 1.26748453 -0.48906078  0.43304237 -0.03195349
>[10]  0.14670372
>> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(4, RNGseed=123)))
> [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 
> -0.03195349 -1.03886641  1.57451249  0.74708204
>[10]  0.67187201
> 
> 
> 
>Best,
>Lulu
> 
> 
> 
>On Mon, Dec 31, 2018 at 1:12 PM Martin Morgan  
> wrote:
> 
> 
>The major BiocParallel objects (SnowParam(), MulticoreParam()) and use of 
> bplapply() allow fully repeatable randomizations, e.g.,
> 
>> library(BiocParallel)
>> unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
> [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 
> -0.03195349
> [7] -1.03886641  1.57451249  0.74708204  0.67187201
>> unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
> [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 
> -0.03195349
> [7] -1.03886641  1.57451249  0.74708204  0.67187201
>> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(RNGseed=123)))
>[1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
> [7] -1.03886641  1.57451249  0.74708204  0.67187201
> 
>The idea then would be to tell the user to register() such a param, or to 
> write your function to accept an argument rngSeed along the lines of
> 
>f = function(..., rngSeed = NULL) {
>if (!is.null(rngSeed)) {
>param = bpparam()  # user's preferred back-end
>oseed = bpRNGseed(param)
>on.exit(bpRNGseed(param) <- oseed)
>bpRNGseed(param) = rngSeed
>}
>bplapply(1:4, rnorm)
>}
> 
>(actually, this exercise illustrates a problem with bpRNGseed<-() when the 
> original seed is NULL; this 

Re: [Bioc-devel] Controlling vignette compilation order

2018-12-24 Thread Aaron Lun
A working example of knitr caching across workflows is now available at 
https://github.com/LTLA/BiocWorkCache.

It uses “~/chipseq.log” as a log to demonstrate that the code in the 
most-upstream workflow (“test1.Rmd”) is indeed only executed once during the 
BUILD.

Note that the compilation of upstream vignettes involves a system call out to a 
separate R session. This avoids some difficult issues with caching when an Rmd 
file is compiled from within another Rmd file - trying to use 
rmarkdown::render() on the upstream vignette within a downstream vignette does 
not generate a cache that is recognized when BUILD goes on to compile the 
upstream vignette.

-A

> On 23 Dec 2018, at 01:24, Aaron Lun 
>  wrote:
> 
> Yes, I had noticed the vignettes.rds as well, and I figured that would be a 
> problem.
> 
> I just tried setting cache=TRUE in my vignettes, implemented such that 
> BUILDing each downstream vignette will also run all upstream vignettes on 
> which it depends (that haven’t already been compiled). If an upstream 
> vignette is run in this manner, it caches the results of each code chunk to 
> avoid repeated work when it gets compiled “for real” by R CMD BUILD.
> 
> This seems to work on initial inspection (the caches are produced for the 
> upstream vignettes upon running one downstream vignette). I’ll have to check 
> whether this plays nice with R CMD BUILD. I will probably have to write a 
> function to isolate the scope of the execution of each upstream vignette, to 
> avoid polluting the namespace and cache of each downstream vignette.
> 
> -A
> 
>> On 22 Dec 2018, at 19:22, Henrik Bengtsson wrote:
>> 
>> On Sat, Dec 22, 2018 at 10:56 AM Michael Lawrence wrote:
>>> 
>>> Anything that eventually lands in inst/doc is a vignette, I think, so
>>> there might be a hack around that.
>> 
>> Just so this is not misread - it's *not* possible to just hack your
>> vignette "product" files (PDF or HTML) into inst/doc and thinking
>> you're good.  R keeps track of package vignettes in a "vignette
>> index", e.g.
>> 
>>> readRDS(system.file(package = "utils", "Meta", "vignette.rds"))
>>         File              Title        PDF        R Depends Keywords
>> 1 Sweave.Rnw Sweave User Manual Sweave.pdf Sweave.R   tools
>> 
>> which is created during 'R CMD build' by parsing and compiling the
>> vignettes 
>> (https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/tools/R/build.R#L283-L393).
>> This vignette index is used to find package vignettes (e.g.
>> utils::vignette()) and build the HTML vignette index.
>> 
>> Also, one vignette source (e.g. Rnw, Rmd, ...) can only produce one
>> vignette product (PDF or HTML) in the vignette index.  You can output
>> other files (e.g. image files) in a relative folder that the vignette
>> references, which is why for instance non-self-contained HTML files
>> work.  Thus, one ad-hoc, not-so-nice hack that OP could do is to have
>> a single main vignette that produces and links to all child vignettes.
>> However, personally, I'd aim for using memoization/caching (to file)
>> such that each vignette can be compiled independently of the others
>> (and in any order), while still reusing intermediate
>> results/calculations produced by earlier vignettes.
>> 
>> /Henrik
>> 
>>> 
>>> On Fri, Dec 21, 2018 at 11:26 PM Aaron Lun wrote:
>>>> 
>>>> I gave it a shot:
>>>> 
>>>> https://github.com/LTLA/DrakeTest
>>>> 
>>>> This uses a single “controller” Rmd file to trigger Drake::make. Running 
>>>> this file will instruct Drake to compile all of the other vignettes 
>>>> following the desired dependency structure.
>>>> 
>>>> The current sticking point is that I need to move the Drake-controlled Rmd 
>>>> files out of “vignettes/“, otherwise they’ll just be compiled as usual 
>>>> without consideration of their dependencies. This causes problems as R CMD 
>>>> BUILD only recognizes the controller Rmd file as the sole vignette, and 
>>>> doesn’t retain or index the HTML files

Re: [Bioc-devel] Controlling vignette compilation order

2018-12-22 Thread Aaron Lun
Yes, I had noticed the vignettes.rds as well, and I figured that would be a 
problem.

I just tried setting cache=TRUE in my vignettes, implemented such that 
BUILDing each downstream vignette will also run all upstream vignettes on which 
it depends (that haven’t already been compiled). If an upstream vignette is run 
in this manner, it caches the results of each code chunk to avoid repeated work 
when it gets compiled “for real” by R CMD BUILD.
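
For reference, a cache-enabled chunk in an upstream vignette is declared along 
these lines (the chunk label and its contents here are illustrative):

```{r core-processing, cache=TRUE}
sce <- computeSumFactors(sce) # expensive step, replayed from the cache later
```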

This seems to work on initial inspection (the caches are produced for the 
upstream vignettes upon running one downstream vignette). I’ll have to check 
whether this plays nice with R CMD BUILD. I will probably have to write a 
function to isolate the scope of the execution of each upstream vignette, to 
avoid polluting the namespace and cache of each downstream vignette.

-A

> On 22 Dec 2018, at 19:22, Henrik Bengtsson  wrote:
> 
> On Sat, Dec 22, 2018 at 10:56 AM Michael Lawrence wrote:
>> 
>> Anything that eventually lands in inst/doc is a vignette, I think, so
>> there might be a hack around that.
> 
> Just so this is not misread - it's *not* possible to just hack your
> vignette "product" files (PDF or HTML) into inst/doc and thinking
> you're good.  R keeps track of package vignettes in a "vignette
> index", e.g.
> 
>> readRDS(system.file(package = "utils", "Meta", "vignette.rds"))
>         File              Title        PDF        R Depends Keywords
> 1 Sweave.Rnw Sweave User Manual Sweave.pdf Sweave.R   tools
> 
> which is created during 'R CMD build' by parsing and compiling the
> vignettes 
> (https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/tools/R/build.R#L283-L393).
> This vignette index is used to find package vignettes (e.g.
> utils::vignette()) and build the HTML vignette index.
> 
> Also, one vignette source (e.g. Rnw, Rmd, ...) can only produce one
> vignette product (PDF or HTML) in the vignette index.  You can output
> other files (e.g. image files) in a relative folder that the vignette
> references, which is why for instance non-self-contained HTML files
> work.  Thus, one ad-hoc, not-so-nice hack that OP could do is to have
> a single main vignette that produces and links to all child vignettes.
> However, personally, I'd aim for using memoization/caching (to file)
> such that each vignette can be compiled independently of the others
> (and in any order), while still reusing intermediate
> results/calculations produced by earlier vignettes.
> 
> /Henrik
> 
>> 
>> On Fri, Dec 21, 2018 at 11:26 PM Aaron Lun
>>  wrote:
>>> 
>>> I gave it a shot:
>>> 
>>> https://github.com/LTLA/DrakeTest
>>> 
>>> This uses a single “controller” Rmd file to trigger Drake::make. Running 
>>> this file will instruct Drake to compile all of the other vignettes 
>>> following the desired dependency structure.
>>> 
>>> The current sticking point is that I need to move the Drake-controlled Rmd 
>>> files out of “vignettes/“, otherwise they’ll just be compiled as usual 
>>> without consideration of their dependencies. This causes problems as R CMD 
>>> BUILD only recognizes the controller Rmd file as the sole vignette, and 
>>> doesn’t retain or index the HTML files produced from the other Rmd files as 
>>> side-effects of running the controller.
>>> 
>>> Are there any better ways to subvert the vignette building procedure to get 
>>> the desired effect of running drake::make() and recognition of the 
>>> resulting HTMLs as vignettes?
>>> 
>>> -A
>>> 
>>>> On 18 Dec 2018, at 17:41, Michael Lawrence  
>>>> wrote:
>>>> 
>>>> Sounds like a use case for drake...
>>>> 
>>>> On Tue, Dec 18, 2018 at 6:58 AM Aaron Lun wrote:
>>>> @Michael In this case, the resource produced by vignette X is a 
>>>> SingleCellExperiment object containing the results of various processing 
>>>> steps (normalization, clustering, etc.) described in that vignette.
>>>> 
>>>> I can imagine a lazy evaluation model for this, but it wouldn’t be pretty. 
>>>> If I had another vignette Y that depended on the SCE produced by vignette 
>>>> X, I would need Y to execute all of the steps in X if X hadn’t already 
>>>> been run before Y. This gets us into the ter

Re: [Bioc-devel] Controlling vignette compilation order

2018-12-22 Thread Aaron Lun
Yes, that is the simplest solution, and it’s what I’m doing now.

It’s not overly confusing for a reader, but it’s awkward to add new vignettes 
in the middle of the compilation order, as I then have to rename the others (or 
give the new vignette a weird name, e.g., “xtra-3b-de.Rmd” to get it to fall 
behind “xtra-3-var.Rmd”).

From a practical perspective, this becomes particularly annoying when writing 
links between vignettes, which requires knowing the destination file name to 
construct the URL (see BiocStyle::Biocpkg()). If the name is unintuitive and/or 
changes all the time, these links become difficult to write. External links to 
vignettes (e.g., in support site posts) would also become invalidated upon name 
changes.

I guess I find it unpleasant that the file name has to conflate the vignette’s 
description with its compilation order, and I would like a more explicit 
mechanism for specifying the latter.

-A

> On 22 Dec 2018, at 19:00, Martin Morgan  wrote:
> 
> ...but in the end isn't it just simpler to name your vignettes in collation 
> order? Who other than you will be able to parse what you've done?
> 
> Martin
> 
> On 12/22/18, 1:56 PM, "Bioc-devel on behalf of Michael Lawrence" 
>  
> wrote:
> 
>Anything that eventually lands in inst/doc is a vignette, I think, so
>there might be a hack around that.
> 
>On Fri, Dec 21, 2018 at 11:26 PM Aaron Lun
> wrote:
>> 
>> I gave it a shot:
>> 
>> https://github.com/LTLA/DrakeTest
>> 
>> This uses a single “controller” Rmd file to trigger Drake::make. Running 
>> this file will instruct Drake to compile all of the other vignettes 
>> following the desired dependency structure.
>> 
>> The current sticking point is that I need to move the Drake-controlled Rmd 
>> files out of “vignettes/“, otherwise they’ll just be compiled as usual 
>> without consideration of their dependencies. This causes problems as R CMD 
>> BUILD only recognizes the controller Rmd file as the sole vignette, and 
>> doesn’t retain or index the HTML files produced from the other Rmd files as 
>> side-effects of running the controller.
>> 
>> Are there any better ways to subvert the vignette building procedure to get 
>> the desired effect of running drake::make() and recognition of the resulting 
>> HTMLs as vignettes?
>> 
>> -A
>> 
>>> On 18 Dec 2018, at 17:41, Michael Lawrence  
>>> wrote:
>>> 
>>> Sounds like a use case for drake...
>>> 
>>> On Tue, Dec 18, 2018 at 6:58 AM Aaron Lun wrote:
>>> @Michael In this case, the resource produced by vignette X is a 
>>> SingleCellExperiment object containing the results of various processing 
>>> steps (normalization, clustering, etc.) described in that vignette.
>>> 
>>> I can imagine a lazy evaluation model for this, but it wouldn’t be pretty. 
>>> If I had another vignette Y that depended on the SCE produced by vignette 
>>> X, I would need Y to execute all of the steps in X if X hadn’t already been 
>>> run before Y. This gets us into the territory of Makefile-like 
>>> dependencies, which seems even more complicated than simply specifying a 
>>> compilation order.
>>> 
>>> You might ask why X and Y are split into two separate vignettes. The use of 
>>> different vignettes is motivated by the complexity of the workflows:
>>> 
>>> - Vignette 1 demonstrates core processing steps for one read-based 
>>> single-cell RNAseq dataset.
>>> - Vignette 2 demonstrates (slightly different) core steps for a UMI-based 
>>> dataset.
>>> - … so on for a bunch of other core steps for different types of data.
>>> - Vignette 6 demonstrates extra optional steps for the two SCEs produced by 
>>> vignettes 1 & 3.
>>> - … and so on for a bunch of other optional steps.
>>> 
>>> The separation between core and optional steps into separate documents is 
>>> desirable. From a pedagogical perspective, I would very much like to get 
>>> the reader through all the core steps before even considering the extra 
>>> steps, which would just be confusing if presented so early on. Previously, 
>>> everything was in a single document, which was difficult to read (for 
>>> users) and to debug (for me), especially because I had to use contrived 
>>> variable names to avoid clashes between different sections of the workflow 
>>> that did similar things.
>>> 
>>> @Martin I’ve been using Bi

Re: [Bioc-devel] Compilation flags, CHECK errors and BiocNeighbors

2018-12-21 Thread Aaron Lun
Thanks Val. Looks like BiocNeighbors is all green again in the latest build, so 
that’s a relief.

One down, two to go - Windows CHECK failures seem to be tokay2’s idea of 
Christmas presents.

-A

> On 20 Dec 2018, at 19:52, Obenchain, Valerie 
>  wrote:
> 
> The problem is that during the nightly builds, one of the Bioconductor 
> packages writes out a .R/Makevars.win in biocbuild's HOME during R CMD 
> build.
> 
> Yesterday I removed the .R/ directory before the builds started and, as 
> expected, today's NodeInfo on tokay2 and packages using the C++11 show 
> the correct flags.
> 
> If this .R/Makevars.win is not removed, it will (and did in the past) 
> pollute the next build cycle such that the NodeInfo and all packages 
> using C++11 would report/use the wrong flags.
> 
> I think I've narrowed down which package is doing this and will contact 
> the maintainer. We'll also implement some sanitation code in the BBS to 
> prevent this from happening again.
> 
> The reason HOME is writable is that many applications need to create 
> files (often hidden) such as lock files, cache, config files etc. If 
> they can't, they'll break and they will sometimes break in a subtle way 
> that is not immediately obvious.
> 
> One last follow up is to explain why the previous iteration of the 
> NodeInfo on the build report reported the incorrect C++11 flags. The 
> problem there was that previously we were only picking up CXX1XFLAGS 
> instead of the individual CXX11FLAGS, CXX14FLAGS etc.
> 
> Thanks for being persistent on this issue and for bringing the 
> conversation to bioc-devel.
> 
> Val
> 
> 
> 
> On 12/18/18 8:39 AM, Obenchain, Valerie wrote:
>> The devel build report hasn't posted yet but I took a look at the new
>> compiler flag output Herve implemented. The results show tokay2 is
>> indeed using
>> 
>> CXX11FLAGS: -O3 -march=native -mtune=native
>> 
>> This is inconsistent with what we have in the R/etc//Makeconf for
>> both architectures on both tokay1 and tokay2. The Makeconf looks like this:
>> 
>> CXX11 = $(BINPREF)g++ $(M_ARCH)
>> CXX11FLAGS = -O2 -Wall $(DEBUGFLAG) -mtune=generic
>> CXX11PICFLAGS =
>> CXX11STD = -std=gnu++11
>> 
>> I don't know why the Makeconf is not being respected on tokay2. I can
>> confirm the inconsistency in an R session -
>> 
>> tokay2:
>> 
>> PS C:\Users\biocbuild\bbs-3.9-bioc\R> ./bin/R CMD config CXX11FLAGS
>> -O3 -march=native -mtune=native
>> 
>> tokay1:
>> 
>> PS C:\Users\biocbuild\bbs-3.8-bioc\R> ./bin/R CMD config CXX11FLAGS
>> -O2 -Wall -mtune=generic
>> 
>> I'll work with Herve to resolve this.
>> 
>> Val
>> 
>> 
>> 
>> On 12/17/18 5:05 PM, Aaron Lun wrote:
>>> Thanks Val. I don’t think it’s a BiocNeighbors thing, as it doesn’t try 
>>> to customize the compilation flags or have its own Makevars. Moreover, 
>>> the “-O3 -mtune=native -mtune=generic” flags seem to show up on all of 
>>> my packages containing C++11 code. Some cursory checks of other packages 
>>> suggest that the correct flags (“-O2 -mtune=generic”) are used for C++98 
>>> code.
>>> 
>>> -A
>>> 
>>>> On 17 Dec 2018, at 17:47, Obenchain, Valerie 
>>>>  wrote:
>>>> 
>>>> Hi Aaron,
>>>> 
>>>> The only compilation flags that are different for tokay1 (release) and
>>>> tokay2 (devel) are C++14 flags. BiocNeighbors is not using C++14 but
>>>> C++11 so I think the changes we discussed previously actually don't
>>>> apply to your case.
>>>> 
>>>> All compilation flags we use are listed at the top of the build report,
>>>> e.g., for tokay2:
>>>> 
>>>> https://www.bioconductor.org/checkResults/devel/bioc-LATEST/tokay2-NodeInfo.html
>>>> 
>>>> I can look into this further but right now I'm not sure where the '-O3
>>>> -march=native -mtune=native' is coming from in the check output for
>>>> BiocNeighbors. We don't use 'native' on the builders for build/check or
>>>> for creating binaries.
>>>> 
>>>> Herve might have more insight on this.
>>>> 
>>>> Val
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 12/15/18 10:56 PM, Aaron Lun w

Re: [Bioc-devel] New ExperimentHub resource and some related questions

2018-12-20 Thread Aaron Lun
I presume your package is not actually called “SingleCell” (in point 1). This 
would be pretty confusing when compared to the simpleSingleCell package, the 
SingleCellExperiment package, and the SingleCell biocViews term itself. It 
would probably make more sense to call it BUStoolsR or some other appropriate 
pun (e.g., RBUS, which is funniest when it gets to version 3.8.0.).

Also, at first glance, the BUS format seems pretty similar to 10X’s molecule 
information file, for which the DropletUtils package has a series of reader 
functions. You may find some of the code there useful for your package. I might 
also add a readBUS() function to DropletUtils if this turns out to be a popular 
format for droplet data, though TBH the sparse matrix is a much more common 
starting point.

-A

> On 20 Dec 2018, at 01:42, Lu, Dongyi (Lambda)  wrote:
> 
> Hi everyone,
> 
> I’m writing a package (biocViews SingleCell) that converts files of the BUS 
> format (standing for Barcode, UMI, Set, see 
> https://www.biorxiv.org/content/early/2018/11/21/472571) into a sparse matrix 
> in R that can be used in Seurat and SingleCellExperiment. In order to write 
> the examples and the vignette, I’m also putting the data itself into a 
> package for ExperimentHub. The data used here are some mixed human and mouse 
> cells from 10x. Here are my questions:
> 
> 
>  1.  In the documentation for `ExperimentHubData::makeExperimentHubMetadata`, 
> the fields `RDataClass` and `DispatchClass` are required. However, this 
> accompanying dataset package is meant to download text files (generated by 
> command line tools outside R) to disk rather than into the R session, and 
> it’s the job of the SingleCell package to converts the text files into a 
> sparse matrix. There is a website documenting how the command line tools were 
> used to generate the text files. So is this dataset still appropriate for 
> ExperimentHub?
>  2.  If it is appropriate, then what shall I put in `RDataClass` and 
> `DispatchClass`?
> 
> Thanks,
> Lambda
> 
>   [[alternative HTML version deleted]]
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Controlling vignette compilation order

2018-12-18 Thread Aaron Lun
@Michael In this case, the resource produced by vignette X is a 
SingleCellExperiment object containing the results of various processing steps 
(normalization, clustering, etc.) described in that vignette. 

I can imagine a lazy evaluation model for this, but it wouldn’t be pretty. If I 
had another vignette Y that depended on the SCE produced by vignette X, I would 
need Y to execute all of the steps in X if X hadn’t already been run before Y. 
This gets us into the territory of Makefile-like dependencies, which seems even 
more complicated than simply specifying a compilation order.

You might ask why X and Y are split into two separate vignettes. The use of 
different vignettes is motivated by the complexity of the workflows:

- Vignette 1 demonstrates core processing steps for one read-based single-cell 
RNAseq dataset. 
- Vignette 2 demonstrates (slightly different) core steps for a UMI-based 
dataset.
- … so on for a bunch of other core steps for different types of data.
- Vignette 6 demonstrates extra optional steps for the two SCEs produced by 
vignettes 1 & 3.
- … and so on for a bunch of other optional steps.

The separation between core and optional steps into separate documents is 
desirable. From a pedagogical perspective, I would very much like to get the 
reader through all the core steps before even considering the extra steps, 
which would just be confusing if presented so early on. Previously, everything 
was in a single document, which was difficult to read (for users) and to debug 
(for me), especially because I had to use contrived variable names to avoid 
clashes between different sections of the workflow that did similar things.

@Martin I’ve been using BiocFileCache for all of the online resources that are 
used in the workflow. However, this is only for my (and the reader’s) 
convenience. I use a local cache rather than the system default, to ensure that 
the downloaded files are removed after package build. This is intentional as it 
forces the package builder to try to re-download resources when compiling the 
vignette, thus ensuring the validity of the URLs. For a similar reason, I would 
prefer not to cache the result objects for use in different R sessions. I could 
imagine caching the result objects for use by a different vignette in the same 
build session, but this gets back to the problem of ensuring that the result 
object is generated by one vignette before it is needed by another vignette.
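
For reference, the local-cache pattern amounts to something like this (the 
cache directory name and URL are illustrative):

library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE) # local cache, discarded after the build
path <- bfcrpath(bfc, "https://example.com/dataset.txt.gz") # downloads on first use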

-A

> On 18 Dec 2018, at 14:14, Martin Morgan  wrote:
> 
> Also perhaps using BiocFileCache so that the result object is only generated 
> once, then cached for future (different session) use.
> 
> On 12/18/18, 8:35 AM, "Bioc-devel on behalf of Michael Lawrence" 
>  
> wrote:
> 
>I would recommend against dependencies across vignettes. Ideally someone
>can pick up a vignette and execute the code independently of any other
>documentation. Perhaps you could move the code generating those shared
>resources to the package. They could behave lazily, only generating the
>resource if necessary, otherwise reusing it. That would also make it easy
>for people to write their own documents using those resources.
> 
>Michael
> 
>On Tue, Dec 18, 2018 at 5:22 AM Aaron Lun <
>infinite.monkeys.with.keyboa...@gmail.com> wrote:
> 
>> In a number of my workflow packages (e.g., simpleSingleCell), I rely on a
>> specific compilation order for my vignettes. This is because some vignettes
>> set up resources or objects that are to be used by later vignettes.
>> 
>> From what I understand, vignettes are compiled in alphanumeric ordering of
>> their file names. As such, I give my vignettes fairly structured names,
>> e.g., “work-1-reads.Rmd”, “work-2-umi.Rmd” and so on.
>> 
>> However, it becomes rather annoying when I want to add a new vignette in
>> the middle somewhere. This results in some unnatural numberings, e.g.,
>> “work-0”, “3b”, which are ugly and unintuitive. This is relevant as
>> BiocStyle::Biocpkg() links between vignettes require you to use the
>> destination vignette’s file name; so difficult names complicate linking,
>> especially if the names continually change to reflect new orderings.
>> 
>> Is there an easier way to control vignette compilation order? WRE provides
>> no (obvious) guidance, so I would like to know what non-standard hacks are
>> known to work on the build machines. I can imagine something dirty whereby
>> one ”reference” vignette contains code to “rmarkdown::render" all other
>> vignettes in the specified order… ugh.
>> 
>> -A
>> 
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> 
>> 

[Bioc-devel] Controlling vignette compilation order

2018-12-18 Thread Aaron Lun
In a number of my workflow packages (e.g., simpleSingleCell), I rely on a 
specific compilation order for my vignettes. This is because some vignettes set 
up resources or objects that are to be used by later vignettes. 

From what I understand, vignettes are compiled in alphanumeric ordering of 
their file names. As such, I give my vignettes fairly structured names, e.g., 
“work-1-reads.Rmd”, “work-2-umi.Rmd” and so on.

However, it becomes rather annoying when I want to add a new vignette in the 
middle somewhere. This results in some unnatural numberings, e.g., “work-0”, 
“3b”, which are ugly and unintuitive. This is relevant as BiocStyle::Biocpkg() 
links between vignettes require you to use the destination vignette’s file 
name; so difficult names complicate linking, especially if the names 
continually change to reflect new orderings.

Is there an easier way to control vignette compilation order? WRE provides no 
(obvious) guidance, so I would like to know what non-standard hacks are known 
to work on the build machines. I can imagine something dirty whereby one 
”reference” vignette contains code to “rmarkdown::render" all other vignettes 
in the specified order… ugh. 
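
For the record, that hack would amount to a single chunk along these lines 
(file names from the example above; I have not tested this on the build 
machines):

    # first chunk of the "reference" vignette
    for (v in c("work-1-reads.Rmd", "work-2-umi.Rmd")) {
        rmarkdown::render(v, output_format = "BiocStyle::html_document")
    }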

-A

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Compilation flags, CHECK errors and BiocNeighbors

2018-12-17 Thread Aaron Lun
Thanks Val. I don’t think it’s a BiocNeighbors thing, as it doesn’t try to 
customize the compilation flags or have its own Makevars. Moreover, the “-O3 
-march=native -mtune=native” flags seem to show up on all of my packages 
containing C++11 code. Some cursory checks of other packages suggest that the 
correct flags (“-O2 -mtune=generic”) are used for C++98 code.

-A

> On 17 Dec 2018, at 17:47, Obenchain, Valerie 
>  wrote:
> 
> Hi Aaron,
> 
> The only compilation flags that are different for tokay1 (release) and 
> tokay2 (devel) are C++14 flags. BiocNeighbors is not using C++14 but 
> C++11 so I think the changes we discussed previously actually don't 
> apply to your case.
> 
> All compilation flags we use are listed at the top of the build report, 
> e.g., for tokay2:
> 
> https://www.bioconductor.org/checkResults/devel/bioc-LATEST/tokay2-NodeInfo.html
> 
> I can look into this further but right now I'm not sure where the '-O3 
> -march=native -mtune=native' is coming from in the check output for 
> BiocNeighbors. We don't use 'native' on the builders for build/check or 
> for creating binaries.
> 
> Herve might have more insight on this.
> 
> Val
> 
> 
> 
> 
> 
> 
> 
> On 12/15/18 10:56 PM, Aaron Lun wrote:
>> Sometime between 6-18 November, BiocNeighbors� BioC-devel builds began 
>> failing on Windows 64-bit, and have continued to fail since:
>> 
>> http://bioconductor.org/checkResults/devel/bioc-LATEST/BiocNeighbors/
>> 
>> The most interesting part is the nature of the failures. They are not 
>> segmentation faults but rather �incorrect� output in the unit tests:
>> 
>> - BiocNeighbors uses the Annoy algorithm for approximate nearest neighbor 
>> search, which is provided as a header-only C++ library in the RcppAnnoy 
>> package.
>> 
>> - I have compiled the BiocNeighbors C++ code with an “#include” for these 
>> libraries to use the Annoy routines. For testing, I compared the output of 
>> my C++ code to the output of the code in the RcppAnnoy package.
>> 
>> - It is these tests that are failing (i.e., the output does not match up) 
>> during CHECK on Windows 64-bit only, despite the fact that the same library 
>> is being “#include”d in both the BiocNeighbors and RcppAnnoy sources!
>> 
>> What makes this particularly intriguing is that the differences between 
>> BiocNeighbors and RcppAnnoy are very minor. Less than 1% of the neighbor 
>> identities differ, and only for some of the scenarios, so it�s not an 
>> obvious bug that would be changing the output en masse. Now, the package 
>> also uses/tests Annoy in BioC-release but builds fine on tokay1:
>> 
>> http://bioconductor.org/checkResults/release/bioc-LATEST/BiocNeighbors/
>> 
>> The major difference between the Bioc-release/devel builds is the 
>> compilation flags, which have changed from “-O2 -mtune=generic” to “-O3 
>> -march=native -mtune=native” in tokay2. I am told (thanks Val) that the 
>> timing of this change is consistent with the start of the BiocNeighbors 
>> build failures on tokay2. I would guess that RcppAnnoy is also compiled with 
>> “-O2 -mtune=generic” on the CRAN build systems, introducing differences in 
>> optimization levels between the BiocNeighbors and RcppAnnoy binaries. These 
>> could be responsible for the discrepancies in the search results.
>> 
>> I was able to reproduce this on my Unix cluster (gcc 6.5.0) where setting 
>> “-march=native” with either “-O3” or “-O2” caused a difference in the 
>> calculations. After much trial and error, I eventually narrowed this down to 
>> the “-mfma” flag, which seems to change the precision of multiply-and-add 
>> operations and thus the search results. This occurs even when AVX support is 
>> turned off; I guess the compiler tries to be smart if it detects you are 
>> doing some kind of simultaneous multiply and addition, which is a pretty 
>> common thing to do when computing Euclidean distances.
>> 
>> In summary: can we not use “-march=native” on tokay2? (Val, 

[Bioc-devel] Compilation flags, CHECK errors and BiocNeighbors

2018-12-15 Thread Aaron Lun
Sometime between 6-18 November, BiocNeighbors’ BioC-devel builds began failing 
on Windows 64-bit, and have continued to fail since:

http://bioconductor.org/checkResults/devel/bioc-LATEST/BiocNeighbors/ 


The most interesting part is the nature of the failures. They are not 
segmentation faults but rather “incorrect” output in the unit tests:

- BiocNeighbors uses the Annoy algorithm for approximate nearest neighbor 
search, which is provided as a header-only C++ library in the RcppAnnoy package.

- I have compiled the BiocNeighbors C++ code with an “#include” for these 
libraries to use the Annoy routines. For testing, I compared the output of my 
C++ code to the output of the code in the RcppAnnoy package.

- It is these tests that are failing (i.e., the output does not match up) 
during CHECK on Windows 64-bit only, despite the fact that the same library is 
being “#include”d in both the BiocNeighbors and RcppAnnoy sources!

What makes this particularly intriguing is that the differences between 
BiocNeighbors and RcppAnnoy are very minor. Less than 1% of the neighbor 
identities differ, and only for some of the scenarios, so it’s not an obvious 
bug that would be changing the output en masse. Now, the package also 
uses/tests Annoy in BioC-release but builds fine on tokay1:

http://bioconductor.org/checkResults/release/bioc-LATEST/BiocNeighbors/ 


The major difference between the Bioc-release/devel builds is the compilation 
flags, which have changed from “-O2 -mtune=generic” to “-O3 -march=native 
-mtune=native” in tokay2. I am told (thanks Val) that the timing of this change 
is consistent with the start of the BiocNeighbors build failures on tokay2. I 
would guess that RcppAnnoy is also compiled with “-O2 -mtune=generic” on the 
CRAN build systems, introducing differences in optimization levels between the 
BiocNeighbors and RcppAnnoy binaries. These could be responsible for the 
discrepancies in the search results.

I was able to reproduce this on my Unix cluster (gcc 6.5.0) where setting 
“-march=native” with either “-O3” or “-O2” caused a difference in the 
calculations. After much trial and error, I eventually narrowed this down to 
the “-mfma” flag, which seems to change the precision of multiply-and-add 
operations and thus the search results. This occurs even when AVX support is 
turned off; I guess the compiler tries to be smart if it detects you are doing 
some kind of simultaneous multiply and addition, which is a pretty common thing 
to do when computing Euclidean distances. 
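
(As an aside, a package that wants bit-identical floating-point results 
regardless of the builder's flags could, in principle, disable the contraction 
explicitly in its own src/Makevars; a GCC/Clang-specific sketch, and not 
something BiocNeighbors currently does:

    # src/Makevars
    PKG_CXXFLAGS = -ffp-contract=off

This only papers over the symptom, though; the root cause is the builder-wide 
-march=native.)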

In summary: can we not use “-march=native” on tokay2? (Val, I know we discussed 
this, but whatever changes you made to the compilation flags don’t seem to have 
propagated to the build machines.) As the case study with BiocNeighbors shows, 
this leads to inconsistencies between the CRAN and BioC-devel binaries for the 
same code, which unnecessarily complicates downstream usage and unit tests. I 
also wonder how binaries specialized for tokay2’s architecture would behave on 
other CPUs with different instruction sets, if they would run at all.

Cheers,

Aaron

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BiocInstaller: next generation

2018-05-09 Thread Aaron Lun
This all sounds pretty reasonable to me. The ability to choose the
version in install() is nice, especially if we can easily flip between
versions in different install locations. I presume that
version="release" will be the default?

As for the names - BiocManager seems the most sober of the lot. And
thematically appropriate - you might have an orchestra and conductor,
but you still need a manager to get everyone paid, fed and on the stage.

-Aaron

Martin Morgan wrote:
> Developers --
>
> A preliminary heads-up and request for comments.
>
> Almost since project inception, we've used the commands
>
>   source("https://bioconductor.org/biocLite.R;)
>   biocLite(pkgs)
>
> to install packages. This poses security risks (e.g., typos in the
> url) and deviates from standard R package installation procedures.
>
>
> We'd like to move to a different system where a base package, call it
> 'BiocManager', is installed from CRAN and used to install Bioconductor
> packages
>
>   if (!"BiocManager" %in% rownames(installed.packages()))
>   install.packages("BiocManager")
>   BiocManager::install(pkgs)
>
> This establishes a secure chain from user R session to Bioconductor
> package installation. It is also more consistent with base R package
> installation procedures.
>
> BiocManager exposes four functions
>
>   - install() or update packages
>
>   - version() version of Bioconductor in use
>
>   - valid() are all Bioconductor packages from the same Bioconductor
> version?
>
>   - repositories() url location for Bioconductor version-specific
> repositories
>
> install() behaves like biocLite(), using the most current version of
> Bioconductor for the version of R in use. It stores this state using a
> Bioconductor package 'BiocVersion', which is nothing more than a
> sentinel for the version in use. One can also 'use devel' or a
> particular version of Bioconductor (consistent with the version of R)
> with
>
>   BiocManager::install(version = "3.8")   # or the synonym "devel"
>
>
> We intend to phase this in over several release cycles, and to
> continue to support the traditional biocLite() route for versions
> before BiocManager becomes available.
>
> We also intend to change the overall versioning of 'Bioconductor'
> itself, where releases are always even (3.8, 3.10, 3.12, ...) and
> 'devel' always odd.
>
> Obviously this is a large change, eventually requiring updates to many
> locations on our web site and individual vignettes.
>
>
> Of course the key question is the name of the 'BiocManager' package.
> It cannot easily be 'BiocInstaller', because of the differences in way
> CRAN and Bioconductor version packages. Some possible names are:
> BiocInstall::install()
> BiocPackages::install()
> BiocManager
> BiocMaestro
>
>
> Your comments are welcome...
>
> Martin
>
>
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
___

Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] problem with class definitions between S4Vectors and RNeXML in using Summarized Experiment

2018-04-12 Thread Aaron Lun
Well, it's not really SingleCellExperiment's problem, either.

library(S4Vectors)
DataFrame(1:5) # Silent, okay.
library(RNeXML)
DataFrame(1:5) # Prints out the message
## Found more than one class "Annotated" in cache; using the first,
##  from namespace 'S4Vectors'
## Also defined by ‘RNeXML’

Session information attached below.

-Aaron

> sessionInfo()
R Under development (unstable) (2018-03-26 r74466)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /home/cri.camres.org/lun01/Software/R/trunk/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/trunk/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_GB.UTF-8LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8   LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] RNeXML_2.0.8        ape_5.1             S4Vectors_0.17.41  
[4] BiocGenerics_0.25.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16        compiler_3.6.0      pillar_1.2.1       
 [4] plyr_1.8.4          bindr_0.1.1         iterators_1.0.9    
 [7] tools_3.6.0         uuid_0.1-2          jsonlite_1.5       
[10] tibble_1.4.2        nlme_3.1-137        lattice_0.20-35    
[13] pkgconfig_2.0.1     rlang_0.2.0         foreach_1.4.4      
[16] crul_0.5.2          curl_3.2            bindrcpp_0.2.2     
[19] httr_1.3.1          stringr_1.3.0       dplyr_0.7.4        
[22] xml2_1.2.0          grid_3.6.0          reshape_0.8.7      
[25] glue_1.2.0          data.table_1.10.4-3 R6_2.2.2           
[28] XML_3.98-1.10       purrr_0.2.4         reshape2_1.4.3     
[31] tidyr_0.8.0         magrittr_1.5        codetools_0.2-15   
[34] assertthat_0.2.0    bold_0.5.0          taxize_0.9.3       
[37] stringi_1.1.7       lazyeval_0.2.1      zoo_1.8-1          


On Thu, 2018-04-12 at 17:40 +0200, Elizabeth Purdom wrote:
> Just to follow up on my previous post. I am able to replicate the
> problem like in the github post from 2 years ago
> (https://github.com/epurdom/clusterExperiment/issues/66), only now it
> is not the SummarizedExperiment class but the SingleCellExperiment
> class that has the problem. [And I was incorrect, the problem does
> occur in development version 2018-03-22 r74446]. 
> 
> So this is actually a problem with the SingleCellExperiment package —
> sorry for the incorrect subject line.
> 
> All of the best,
> Elizabeth
> 
> > 
> > > 
> > > library(SingleCellExperiment)
> > > SingleCellExperiment()
> > class: SingleCellExperiment 
> > dim: 0 0 
> > metadata(0):
> > assays(0):
> > rownames: NULL
> > rowData names(0):
> > colnames: NULL
> > colData names(0):
> > reducedDimNames(0):
> > spikeNames(0):
> > > 
> > > library(RNeXML)
> > Loading required package: ape
> > > 
> > > 
> > > SingleCellExperiment()
> > Found more than one class "Annotated" in cache; using the first,
> > from namespace 'S4Vectors'
> > Also defined by ‘RNeXML’
> > Found more than one class "Annotated" in cache; using the first,
> > from namespace 'S4Vectors'
> > Also defined by ‘RNeXML’
> > class: SingleCellExperiment 
> > dim: 0 0 
> > metadata(0):
> > assays(0):
> > rownames: NULL
> > rowData names(0):
> > colnames: NULL
> > colData names(0):
> > reducedDimNames(0):
> > spikeNames(0):
> 
> 
> 
> > 
> > > 
> > > sessionInfo()
> > R Under development (unstable) (2018-03-22 r74446)
> > Platform: x86_64-apple-darwin15.6.0 (64-bit)
> > Running under: OS X El Capitan 10.11.6
> > 
> > Matrix products: default
> > BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> > LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
> > 
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> > 
> > attached base packages:
> > [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
> > 
> > other attached packages:
> >  [1] RNeXML_2.0.8                ape_5.1                     SingleCellExperiment_1.1.2 
> >  [4] SummarizedExperiment_1.9.16 DelayedArray_0.5.30         BiocParallel_1.13.3        
> >  [7] matrixStats_0.53.1          Biobase_2.39.2              GenomicRanges_1.31.23      
> > [10] GenomeInfoDb_1.15.5         IRanges_2.13.28             S4Vectors_0.17.41          
> > [13] BiocGenerics_0.25.3
> > 
> > loaded via a namespace (and not attached):
> >  [1] Rcpp_0.12.16   pillar_1.2.1   bindr_0.1.1    compiler_3.5.0
> >  [5] plyr_1.8.4 

Re: [Bioc-devel] Workflows are now in git (and other Important workflow-related changes)

2018-04-03 Thread Aaron Lun
It would also be nice to have a mechanism to obtain the underlying *.md
files used to generate the compiled workflow HTMLs. I've been using
this to check that the results have not changed since the last run.

Of course, I could also do this locally, but it would be reassuring to
confirm that the workflow is building correctly on the BioC machines.

-A

On Mon, 2018-04-02 at 11:42 +, Shepherd, Lori wrote:
> For now the page at 
> https://www.bioconductor.org/help/workflows/
> will remain and be updated accordingly however there is discussion
> about having this page be removed or redirected.  
> 
> In the next few days I hope to have the new landing pages up and
> running and will make an announcement when they are live. The new
> changes involve having a link to the workflows on the biocViews page,
> that will directly link to a workflow package landing page (like the
> software, annotation, and experiment package do). 
> 
> Until the release your changes will be visible in devel which will
> transition over to the new release just as done for software packages
> - 
> The workflow are built monday, wednesday, and friday and the new
> landing pages will reflect when the package is propagated like the
> other types of packages. 
> http://bioconductor.org/checkResults/3.7/workflows-LATEST/
> 
> Lori Shepherd
> Bioconductor Core Team
> Roswell Park Cancer Institute
> Department of Biostatistics & Bioinformatics
> Elm & Carlton Streets
> Buffalo, New York 14263
> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of
> Aaron Lun <a...@wehi.edu.au>
> Sent: Sunday, April 1, 2018 2:17:09 PM
> To: Hervé Pagès; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Workflows are now in git (and other
> Important workflow-related changes)
>  
> Thanks everybody, this is much appreciated.
> 
> On that note, will the compiled workflows shown at:
> 
> https://www.bioconductor.org/help/workflows/
> 
> ... be updated to reflect changes in the git repositories for the
> workflows? 
> 
> Or will the workflow page just directly link to the landing page for
> each package? This would be very convenient, not least because it
> will
> avoid me having to make pull requests to the bioconductor.org
> repository every time I want to change my workflow blurb.
> 
> I'm in the process of fixing and updating my various workflows, and
> I'm
> wondering when/how my changes will be visible to users.
> 
> Cheers,
> 
> Aaron
> 
> On Fri, 2018-03-30 at 13:10 -0700, Hervé Pagès wrote:
> > To the authors/maintainers of the workflows:
> > 
> > 
> > Following the svn-to-git migration of the software and data
> > experiment
> > packages last summer, we've completed the migration of the workflow
> > packages.
> > 
> > The canonical location for the workflow source code now is
> > git.bioconductor.org
> > 
> > Please use your git client to access/maintain your workflow the
> same
> > way you would do it for a software or data-experiment package.
> > 
> > We've also migrated the workflows to our in-house build system.
> > Starting with Bioc 3.7, the build report for the devel versions of
> > the workflows can be found here:
> > 
> >    https://bioconductor.org/checkResults/devel/workflows-LATEST/
> > 
> > We run these builds every other day (Mondays, Wednesdays, Fridays).
> > Because of limited build resources, we now run the data-experiment
> > builds on Sundays, Tuesdays, and Thursdays only (instead of daily).
> > 
> > The links to the package landing pages are not working yet. This
> > will be addressed in the next few days.
> > 
> > Please address any error you see on the report for the workflow
> > you maintain.
> > 
> > Note that, from now on, we're also following the same version
> scheme
> > for these packages as for the software and data-experiment
> packages.
> > That is, we're using an even y (in x.y.z) in release and an odd y
> in
> > devel. We'll take care of bumping y at release time (like we do for
> > software and data-experiment packages).
> > 
> > After the next Bioconductor release (scheduled for May 1), we'll
> > start
> > building the release versions of the workflows in addition to the
> > devel versions. The build report for the release versions will be
> > here:
> > 
> >    https://bioconductor.org/checkResults/release/workflows-LATEST/
> > 
> > Finally, please note that with the latest version of BiocInstaller
> > (1.29.5), workflow packages can be installed with biocLite(), like
> > any other Bioconductor package. We'll depreca

Re: [Bioc-devel] Workflows are now in git (and other Important workflow-related changes)

2018-04-01 Thread Aaron Lun
Thanks everybody, this is much appreciated.

On that note, will the compiled workflows shown at:

https://www.bioconductor.org/help/workflows/

... be updated to reflect changes in the git repositories for the
workflows? 

Or will the workflow page just directly link to the landing page for
each package? This would be very convenient, not least because it will
avoid me having to make pull requests to the bioconductor.org
repository every time I want to change my workflow blurb.

I'm in the process of fixing and updating my various workflows, and I'm
wondering when/how my changes will be visible to users.

Cheers,

Aaron

On Fri, 2018-03-30 at 13:10 -0700, Hervé Pagès wrote:
> To the authors/maintainers of the workflows:
> 
> 
> Following the svn-to-git migration of the software and data
> experiment
> packages last summer, we've completed the migration of the workflow
> packages.
> 
> The canonical location for the workflow source code now is
> git.bioconductor.org
> 
> Please use your git client to access/maintain your workflow the same
> way you would do it for a software or data-experiment package.
> 
> We've also migrated the workflows to our in-house build system.
> Starting with Bioc 3.7, the build report for the devel versions of
> the workflows can be found here:
> 
>    https://bioconductor.org/checkResults/devel/workflows-LATEST/
> 
> We run these builds every other day (Mondays, Wednesdays, Fridays).
> Because of limited build resources, we now run the data-experiment
> builds on Sundays, Tuesdays, and Thursdays only (instead of daily).
> 
> The links to the package landing pages are not working yet. This
> will be addressed in the next few days.
> 
> Please address any error you see on the report for the workflow
> you maintain.
> 
> Note that, from now on, we're also following the same version scheme
> for these packages as for the software and data-experiment packages.
> That is, we're using an even y (in x.y.z) in release and an odd y in
> devel. We'll take care of bumping y at release time (like we do for
> software and data-experiment packages).
> 
> After the next Bioconductor release (scheduled for May 1), we'll
> start
> building the release versions of the workflows in addition to the
> devel versions. The build report for the release versions will be
> here:
> 
>    https://bioconductor.org/checkResults/release/workflows-LATEST/
> 
> Finally, please note that with the latest version of BiocInstaller
> (1.29.5), workflow packages can be installed with biocLite(), like
> any other Bioconductor package. We'll deprecate the old mechanism
> (workflowInstall()) at some point in the future.
> 
> Thanks to Andrzej, Lori, Nitesh, and Valerie for working on this
> migration.
> 
> Let us know if you have any question about this.
> 
> H.
> 
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] R version check in BiocCheck

2018-02-19 Thread Aaron Lun
>  > Personally, I haven't found it to be particularly difficult to update R,
>  > or to run R-devel in parallel with R 3.4, even without root privileges.
> 
> I find it much harder for a normal user to install R-devel (and update 
> it properly, because it's a development version) and then run 
> 'devtools::install_github("blabla/my_package")'.

There seem to be two issues here.

The first is regarding the usability of your specific package. For this, 
Kevin's suggestion (and what you are already doing) is pretty 
reasonable. It's just a branch with a single altered commit (>= 3.5 to 
>= 3.4); it costs nothing, and you can delete it later.

However, this "solution" will only last until the next BioC release, at 
which point biocLite() will only work on R 3.5.*. So, sooner or later, 
your users will have to update their versions of R.

Which leads us to the second question. Should Bioconductor, as a 
project, enforce the use of the latest R version? The core team will 
have better things to say than me on this topic, but for me, the answer 
is an unqualified yes. We get the latest features, bugfixes and 
improvements; a considerable set of benefits, IMHO.

>  > I think many people underappreciate the benefits of moving to the latest
>  > version of R.
> 
> Don't you think it should be a developer's choice whether to use such 
> new features or ignore them and have a potentially bigger audience?

It's true that a developer might not need the latest cutting-edge 
features in the latest version of R. But they should incorporate bug 
fixes to the underlying infrastructure, or changes to existing 
functionality that result in different behaviour.

Of course, it would be difficult to ask every developer to read through 
the NEWS to see if the changes affect their package. It is much easier 
for everyone to just use the latest version of R; then we only have to 
deal with bugs in the latest version, not previously solved ones.

And besides; let's say, hypothetically, BioC didn't have a R version 
requirement. Unless you're using a quite restricted subset of packages, 
you'll encounter a package somewhere that requires the latest R version. 
In my workflows, I know that I load at least 100 packages; only one of 
them needs to have R (>= 3.5) to force me to upgrade anyway.

>  > Enforcing version consistency avoids heartache during release and
>  > debugging.
> 
> But it's a developer's heartache. As I said, it even can't be attributed 
> to Bioconductor at all, as it's not possible to install the package from 
> bioc-devel, unless you have the corresponding R version.

Yes, that's the point. To paraphrase what I tell my colleagues:

Bugs in a BioC-release package with R 3.4 = my problem
Bugs in a BioC-devel package with R 3.5 = my problem
Bugs in a BioC-devel package with R 3.4 = not my problem

From my perspective, the version requirements in biocLite() ensure that 
the user is doing things properly; and if they follow the rules, any 
bugs are therefore the fault of my package. If the users don't follow 
the rules, they're on their own - but at least they know what the rules 
are, because it's pretty inconvenient to break them.

Cheers,

Aaron

> On Mon, Feb 19, 2018 at 6:38 PM, Aaron Lun <a...@wehi.edu.au> wrote:
> 
> I'll just throw in my two cents here.
> 
> I think many people underappreciate the benefits of moving to the latest
> version of R. If you inspect the R-devel NEWS file, there's a couple of
> nice fixes/features that a developer might want to take advantage of:
> 
> - sum() doesn't give NAs upon integer overflow anymore.
> - New ...elt(n) and ...length() functions for dealing with ellipses.
> - ALTREP support for 1:n sequences (wow!)
> - zero length subassignment in a non-zero index fails correctly.
> 
> The previous 3.4.0 release also added support for more DLLs being loaded
> at once, which was otherwise causing headaches in workflows. And 3.4.2
> had a bug fix to LAPACK, which did result in a few user-level changes in
> some packages like edgeR. So there are considerable differences between
> the versions of R, especially if one is a package developer.
> 
> Enforcing version consistency avoids heartache during release and
> debugging. There's a choice between users getting annoyed about having
> to update R, and then updating R, and everything working as a result; or
> everyone (developers/users) wasting some time figuring out whether a bug
> in a package is due to the code in the package itself or the version of
> R. The brief annoyance in the first option is better than the chronic
> grief of the second option, especially given that the solution to the
> problem in the second 

Re: [Bioc-devel] R version check in BiocCheck

2018-02-19 Thread Aaron Lun
I'll just throw in my two cents here.

I think many people underappreciate the benefits of moving to the latest 
version of R. If you inspect the R-devel NEWS file, there's a couple of 
nice fixes/features that a developer might want to take advantage of:

- sum() doesn't give NAs upon integer overflow anymore.
- New ...elt(n) and ...length() functions for dealing with ellipses.
- ALTREP support for 1:n sequences (wow!)
- zero length subassignment in a non-zero index fails correctly.
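
As a tiny illustration of the ellipsis helpers (requires R >= 3.5.0):

    f <- function(...) c(n = ...length(), second = ...elt(2))
    f(10, 20, 30)
    ##      n second 
    ##      3     20 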

The previous 3.4.0 release also added support for more DLLs being loaded 
at once, which was otherwise causing headaches in workflows. And 3.4.2 
had a bug fix to LAPACK, which did result in a few user-level changes in 
some packages like edgeR. So there are considerable differences between 
the versions of R, especially if one is a package developer.

Enforcing version consistency avoids heartache during release and 
debugging. There's a choice between users getting annoyed about having 
to update R, and then updating R, and everything working as a result; or 
everyone (developers/users) wasting some time figuring out whether a bug 
in a package is due to the code in the package itself or the version of 
R. The brief annoyance in the first option is better than the chronic 
grief of the second option, especially given that the solution to the 
problem in the second option would be to update R anyway.

Personally, I haven't found it to be particularly difficult to update R, 
or to run R-devel in parallel with R 3.4, even without root privileges.

-Aaron

On 19/02/18 14:55, Kevin RUE wrote:
> Hi Alexey,
> 
> I do agree with you that there is no harm in testing against other version
> of R. In a way, that is even good practice, considering that many HPC users
> do not always have access to the latest version of R, and that Travis is
> making this fairly easy.
> 
> Now, with regard to your latest reply, I am wondering whether we're having
> confusion here between the "R≥x.x" requirement, and the version(s) of R
> that you use to develop/test your package (the version of R installed on
> your own machine).
> 
> First, I think the "R≥x.x" does not have an explicit rule.
> To me, the point of this requirement is to declare the oldest version of R
> that the package has been tested/validated for. This does not necessarily
> have to be the _next_ version of R (see the core Bioc package S4Vectors:
> https://bioconductor.org/packages/release/bioc/html/S4Vectors.html, and I
> am sure there are older requirements in other packages).
> Here, I think the decision here boils down to how far back in terms of R
> versions the developer is willing to support the package. I suppose one
> could state R≥2.3 if they're confident about it.
> 
> On a separate note, going back to the Bioc guideline that I initially
> highlighted ("Package authors should develop against the version of *R* that
> will be available to users when the *Bioconductor* devel branch becomes the
> *Bioconductor* release branch."), this rather refers to the forward-looking
> guideline that the cutting-edge version of any R package should be
> compatible with the cutting edge version of R, and that developers should
> be working with R-devel to ensure this.
> In other words, this only refers to the version of R that the developer
> should have installed on their own machine. It does not request users to
> make R-devel a _requirement_ of their package.
> 
> I hope this addresses your question better, and I am curious to hear if
> anyone else has an opinion or precisions to weigh in on this topic.
> 
> Best,
> Kevin
> 
> 
> On Mon, Feb 19, 2018 at 12:19 PM, Alexey Sergushichev 
> wrote:
> 
>> Hello Kevin,
>>
>> Well, bioc-devel packages are tested against bioc-devel (and R-3.5) in any
>> case. What I'm saying is that aside from testing the package against
>> bioc-devel, I can as well test against bioc-release too on my own. If the
>> package doesn't work with bioc-devel it shouldn't pass bioc-devel checks,
>> if the package is properly developed and has a good test coverage. So I see
>> no problem in allowing developers to test against other versions, on top of
>> developing against bioc-devel. And as it's only possible to install the
>> package from github and not from Bioconductor, the developer alone is
>> responsible for the package to work properly.
>>
>> I can't really see a scenario, where requiring R >= 3.5 helps to improve
>> the package quality.
>>
>>> A short-term workaround can be to create a git branch (e.g. "3.4").
>>
>> That's the way I'm doing too, but supporting two branches different only
>> in R version looks ridiculous and unnecessary.
>>
>> --
>> Alexey
>>
>>
>>
>>
>>
>> On Mon, Feb 19, 2018 at 12:48 PM, Kevin RUE  wrote:
>>
>>> Dear Alexey,
>>>
>>> The reason is somewhat implicitly given at
>>> https://www.bioconductor.org/developers/how-to/useDevel/ :
>>> "Package authors should develop against the version of *R* 

[Bioc-devel] Alpha release of the iSEE visualization tool

2018-01-31 Thread Aaron Lun
Dear list,

At the European Bioconductor Meeting in December 2017, a group of us 
decided to develop a common interface for interactive visualization of 
single-cell ‘omics data. Two months (and over 500 commits) later, we are 
proud to present the alpha release of iSEE, a package for interactive 
visualization of ‘omics (meta)data contained in a SummarizedExperiment 
(or SingleCellExperiment) object.

iSEE (“interactive SummarizedExperiment Explorer”) provides a flexible, 
powerful and reactive framework to examine reduced dimension results, 
column metadata, gene expression and gene-specific statistics. 
Functionalities include table-to-plot links for gene selection, 
multi-plot brushing relationships for subsetting and highlighting, 
adaptive panel placement and resizing, as well as code tracking for full 
reproducibility.

iSEE is currently available from GitHub at 
https://github.com/csoneson/iSEE and can be installed with the usual 
methods (i.e., via biocLite with devtools and R-devel). We would 
appreciate feedback from the Bioconductor community regarding bugs or 
(well-considered) potential new features, which can be raised as GitHub 
issues. We hope to submit to Bioconductor before the next release.
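
For the impatient, installation should boil down to something like the 
following, assuming biocLite's GitHub passthrough via devtools:

    source("https://bioconductor.org/biocLite.R")
    biocLite("csoneson/iSEE")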

Regards,

The iSEE team (Aaron, Charlotte, Federico and Kevin)

P.S. Yes, we will put in heatmaps eventually.
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Changes to rhdf5

2018-01-05 Thread Aaron Lun
Great work Mike; looking forward to it on BioC-devel.

When are persistently open file handles generated? All the time, or on 
specific uses of the upper-case H5 functions?

I ask because I recall that the HDF5 library permits multiple open file 
handles, but it throws an error if the access flags are not consistent. 
For example, when developing beachmat, I initially used H5F_ACC_TRUNC 
for opening a new file for HDF5Matrix output. However, this seemed to be 
incompatible with h5createFile(), which opened the file in H5F_ACC_RDWR. 
It took me a while to get around that.
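
To make the clash concrete, here is a sketch of the pattern I mean (the exact 
behaviour depends on how the HDF5 library treats mismatched access flags):

    library(rhdf5)
    h5createFile("out.h5") # creates the file, closing its own handle
    fid <- H5Fopen("out.h5", flags = "H5F_ACC_RDWR") # open read/write
    # while 'fid' is alive, a second open with different flags
    # (e.g. "H5F_ACC_RDONLY") may be refused by the library
    H5Fclose(fid)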

At any rate, currently beachmat works with the github version of rhdf5 
on my computer, so at least that's reassuring. And rhdf5 compiles so 
much faster now!

Cheers,

-A

On 05/01/18 12:02, Mike Smith wrote:
> Dear BioC Developers,
> 
> Just a heads up to point out that I've made a fundamental change to the
> rhdf5 package over the last week.  I've directly emailed developers of
> packages that I think this affects, but thought I would post here too in
> case I missed someone.
> 
> These changes are mostly to make it link against Rhdf5lib, which in turn
> updates the version of the HDF5 library we're using.  For the most part
> this seems like it shouldn't disrupt things too much, but it has a really
> dramatic effect on the H5close() function.  The previously advertised use
> was to do some house keeping and close any HDF5 references that are left
> open.  However if you run it with the updated version, it now shuts down
> the HDF5 interface and you pretty much have to restart your R session for
> any further rhdf5 commands to work.  This is obviously not ideal behaviour
> if it's already incorporated into your code!
> 
> Despite this, I'm going to leave H5close() in the package since it mirrors
> part of the HDF5 API, but that may change as it feels pretty useless to a
> package developer.  With this in mind, if you are using it I think there's
> three options you can choose to update your code:
> 
> - There is a straight drop in replacement to close everything with the
> new h5closeAll() function.
> - Check if you need to close things at all.  There were a number of
> other functions that left open handles if they exited under error
> conditions that have now been updated to exit more gracefully.  If you 
> were
> only using H5close() as a safety net it might now not be needed at all.
> - Use the appropriate close function for the HDF5 type e.g. H5Fclose(),
> H5Dclose() etc.  If you're using higher level functions like h5create()
> and h5write() this doesn't apply.
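
A minimal before-and-after for that first option, assuming rhdf5 >= 2.23.3:

    library(rhdf5)
    h5createFile("example.h5")
    h5write(matrix(1:6, nrow = 2), "example.h5", "mat")
    # previously: H5close()
    h5closeAll() # closes any dangling handles without shutting down HDF5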
> 
> The changes are now in the Bioc devel branch (rhdf5 version 2.23.3), and
> should appear fairly soon.  Please let me know if I can provide any more
> info, or things start behaving unexpectedly.  I've tried to test this
> thoroughly, but there's always cases that I will have missed.
> 
> Cheers,
> Mike
> 
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] workflow page reorganization

2017-12-15 Thread Aaron Lun
Thanks Andrzej. And yes, I just put in a PR; I hope itemized sublists 
for a particular workflow (i.e., mine) aren't too ambitious.

I'm thinking about whether "Epigenetics" is the right section heading. 
For greatest generality, you could call it "Genome regulation", which 
bundles all genomic binding/accessibility/methylation stuff together.

I was also pondering whether cytof belongs in Proteomics or Single-cell, 
given the equivalent FACS workflow lives in Proteomics and the nature of 
the data are fundamentally different from single cell genomics datasets. 
But I suppose that's none of my business, I'll leave it to Lukas & co.

It would be nice to see an image analysis workflow up there!

-A

On 15/12/17 15:27, Andrzej Oleś wrote:
> Hi Aaron,
> 
> thank you for taking the lead. I've merged your suggested categories 
> with my preliminary arrangement (which just went online). I've also 
> included an index at the beginning of the page.
> 
> Any refinements are of course welcome, e.g. via PR to 
> https://github.com/Bioconductor/bioconductor.org
> 
> Cheers,
> Andrzej
> 
> 
> On Fri, Dec 15, 2017 at 2:08 PM, Aaron Lun <a...@wehi.edu.au> wrote:
> 
> My proposed categories reflect my vested interests, but here goes:
> 
> - Gene expression:
>      - rnaseqGene
>      - RNAseq123
>      - ExpressionNormalizationWorkflow
>      - RnaSeqGeneEdgeRQL
> 
> - Epigenomics: (not quite sure what to call this)
>      - chipseqDB
>      - methylationArrayAnalysis
>      - generegulation
> 
> - Single cell:
>      - simpleSingleCell
>      - cytofWorkflow (or in proteomics?)
> 
> - Proteomics:
>      - proteomics
>      - highthroughputassays
>      - cytofWorkflow (see above)
> 
> - Variant calling:
>      - Variant calling
>      - Nucleotide tallies
>      - eQTL
> 
> - Resource querying: (needs a better name)
>      - recountWorkflow
>      - TCGAWorkflow
> 
> - Other:
>      - everything else.
> 
> I haven't looked at the Basic workflows, which are probably basic enough
> to be lumped together in that existing section.
> 
> -A
> 
> On 15/12/17 12:33, Shepherd, Lori wrote:
>  > Hello all,
>  >
>  >
>  > There has been a request to reorganize the workflow page as
> workflows have grown past basic and advanced.
>  >
>  >
>  > http://bioconductor.org/help/workflows/
>  >
>  >
>  > We wanted to check with the community what your thoughts were for
> categories.
>  >
>  > Thank you for your suggestions.
>  >
>  >
>  >
>  > Lori Shepherd
>  >
>  > Bioconductor Core Team
>  >
>  > Roswell Park Cancer Institute
>  >
>  > Department of Biostatistics & Bioinformatics
>  >
>  > Elm & Carlton Streets
>  >
>  > Buffalo, New York 14263
>  >
>  >
>  >
>  > ___
>  > Bioc-devel@r-project.org mailing list
>  > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>  >
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> 
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] workflow page reorganization

2017-12-15 Thread Aaron Lun
My proposed categories reflect my vested interests, but here goes:

- Gene expression:
- rnaseqGene
- RNAseq123
- ExpressionNormalizationWorkflow
- RnaSeqGeneEdgeRQL

- Epigenomics: (not quite sure what to call this)
- chipseqDB
- methylationArrayAnalysis
- generegulation

- Single cell:
- simpleSingleCell
- cytofWorkflow (or in proteomics?)

- Proteomics:
- proteomics
- highthroughputassays
- cytofWorkflow (see above)

- Variant calling:
- Variant calling
- Nucleotide tallies
- eQTL

- Resource querying: (needs a better name)
- recountWorkflow
- TCGAWorkflow

- Other:
- everything else.

I haven't looked at the Basic workflows, which are probably basic enough 
to be lumped together in that existing section.

-A

On 15/12/17 12:33, Shepherd, Lori wrote:
> Hello all,
> 
> 
> There has been a request to reorganize the workflow page as workflows have 
> grown past basic and advanced.
> 
> 
> http://bioconductor.org/help/workflows/
> 
> 
> We wanted to check with the community what your thoughts were for categories.
> 
> Thank you for your suggestions.
> 
> 
> 
> Lori Shepherd
> 
> Bioconductor Core Team
> 
> Roswell Park Cancer Institute
> 
> Department of Biostatistics & Bioinformatics
> 
> Elm & Carlton Streets
> 
> Buffalo, New York 14263
> 
> 
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] splitting simpleSingleCell into self-contained vignettes

2017-12-13 Thread Aaron Lun
While we wait for the changes to come online: would you be open to PRs 
to workflow.md? I was thinking of making a nested list for the 
Introduction/part 1/part 2/part 3, which is a bit nicer to read.

-A

On 12/12/17 22:11, Andrzej Oleś wrote:
> Thanks for you feedback Aaron!
> 
> On Tue, Dec 12, 2017 at 9:49 PM, Aaron Lun <a...@wehi.edu.au> wrote:
> 
> Thanks Andrzej.
> 
> > Thank you. I've edited the workflow index page by introducing a separate
> > "Single-cell Workflows" section, and by substituting the previous link 
> to
> > your workflow by links to the individual parts.
> 
> Great, I'm looking forward to seeing it. Do you know how frequently the
> index page (I assume we're talking about
>     https://bioconductor.org/help/workflows/) updates? I assume your edits
> haven't propagated through the system yet.
> 
> 
> Not sure, should be online by now
> https://github.com/Bioconductor/bioconductor.org/commit/a60c46f0942d9825f9a643321890ba5987de109b
> 
> 
> > As discussed during EuroBioc, I'm happy to restructure the index page by
> > grouping workflows by topic. It would be really helpful if authors would
> > chime in to suggest the most relevant sections for their workflows.
> 
> I can chip in with two that I'm involved in:
> 
> "Differential Binding from ChIP-seq data
> <https://bioconductor.org/help/workflows/chipseqDB/
> <https://bioconductor.org/help/workflows/chipseqDB/>>" => ChIP-seq
> workflows
> "Gene-level RNA-seq differential expression and pathway analysis
> <https://bioconductor.org/help/workflows/RnaSeqGeneEdgeRQL/
> <https://bioconductor.org/help/workflows/RnaSeqGeneEdgeRQL/>>" =>
> RNA-seq
> workflows
> 
> Of course, it depends on how granular you want the topics to be. For
> example, I only see one ChIP-seq workflow, so that particular section
> might be a bit lonely for a while (I am planning to split that into two
> workflows later).
> 
> 
> Right, we should probably avoid hair-splitting. We can start with a few, 
> say 6, and split up further according to demand as new ones are introduced.
> 
> Best,
> Andrzej
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] splitting simpleSingleCell into self-contained vignettes

2017-12-12 Thread Aaron Lun
Thanks Andrzej.

> Thank you. I've edited the workflow index page by introducing a separate
> "Single-cell Workflows" section, and by substituting the previous link to
> your workflow by links to the individual parts.

Great, I'm looking forward to seeing it. Do you know how frequently the
index page (I assume we're talking about
https://bioconductor.org/help/workflows/) updates? I assume your edits
haven't propagated through the system yet.

> As discussed during EuroBioc, I'm happy to restructure the index page by
> grouping workflows by topic. It would be really helpful if authors would
> chime in to suggest the most relevant sections for their workflows.

I can chip in with two that I'm involved in:

"Differential Binding from ChIP-seq data
<https://bioconductor.org/help/workflows/chipseqDB/>" => ChIP-seq workflows
"Gene-level RNA-seq differential expression and pathway analysis
<https://bioconductor.org/help/workflows/RnaSeqGeneEdgeRQL/>" => RNA-seq
workflows

Of course, it depends on how granular you want the topics to be. For
example, I only see one ChIP-seq workflow, so that particular section
might be a bit lonely for a while (I am planning to split that into two
workflows later).

Cheers,

Aaron

> On Tue, Dec 12, 2017 at 7:19 PM, Aaron Lun <aaron@cruk.cam.ac.uk> wrote:
>
>> The split-up workflows seem to have built successfully:
>>
>> http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/
>>
>> Is there something I have to do to get a blurb specific to each
>> vignette, as observed for "Annotation_Resources" vs
>> "Annotating_Genomic_Ranges"?
>>
>> The various vignettes are ordered pedagogically, so the order in which
>> they are presented in the workflow page might require some manual
>> specification. It would also be nice if the multiple simpleSingleCell
>> workflows are grouped together, to avoid being intermingled with other
>> workflows on the page.
>>
>> Finally, could we get a separate "single-cell workflows" section? The
>> current "Basic/Advanced" partition is pretty crude, and I can see
>> opportunities for more detailed stratification, e.g., by ChIP-seq,
>> RNA-seq, single-cell RNA-seq, proteomics (including mass cytometry).
>>
>> Cheers,
>>
>> Aaron
>>
>>
>> On 11/12/17 20:24, Aaron Lun wrote:
>>> Thanks Val:
>>>
>>> Obenchain, Valerie wrote:
>>>> Hi,
>>>>
>>>> On 12/11/2017 08:49 AM, Aaron Lun wrote:
>>>>> Following up on our earlier discussion:
>>>>>
>>>>> https://stat.ethz.ch/pipermail/bioc-devel/2017-October/011949.html
>>>>>
>>>>> I have split the simpleSingleCell workflow into three (four, if you
>>>>> include the introductory overview) self-contained Rmarkdown files. I am
>>>>> preparing them for submission to BioC's workflow builder, and I would
>>>>> like to check what is the best way to do this:
>>>>>
>>>>> i) Each workflow file goes into its own package.
>>>>>
>>>>> ii) All workflow files go into a single package.
>>>>>
>>>>> Option (i) is logistically easier but probably a bit odd conceptually,
>>>>> especially if users need to download "simpleSingleCell1",
>>>>> "simpleSingleCell2", "simpleSingleCell3", etc.
>>>>> Option (ii) is nicer but requires more coordination, as the BioC
>> webpage
>>>>> builder needs to know that that multiple HTMLs have been generated.
>> It's
>>>>> also unclear to me whether this will run into problems with the DLL
>>>>> limit - does R restart when compiling each vignette?
>>>> You could do either but I'd say option 2 is easier from a maintenance
>>>> standpoint and probably for the user. Maybe you've seen this but an
>>>> example is the annotation workflow package which houses 2 workflows:
>>>>
>>>> ~/repos/svn/workflows >ls annotation/vignettes/
>>>> Annotating_Genomic_Ranges.Rmd  Annotation_Resources.Rmd
>>>> databaseTypes.png  display.png
>>>>
>>>> Each has an informative name and is presented on the website as an
>>>> individual workflow:
>>>>
>>>> https://bioconductor.org/help/workflows/
>>> I didn't know that, thanks.
>>>
>>>> I don't think more coordination is involved - you just have multiple
>>>> files in vignettes/. And, as you mentioned, it's a bonus that when a
>


Re: [Bioc-devel] splitting simpleSingleCell into self-contained vignettes

2017-12-12 Thread Aaron Lun
The split-up workflows seem to have built successfully:

http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/

The various vignettes are ordered pedagogically, so the order in which 
they are presented in the workflow page might require some manual 
specification. It would also be nice if the multiple simpleSingleCell 
workflows are grouped together, to avoid being intermingled with other 
workflows on the page.

Is there something I have to do to get a blurb specific to each 
vignette, as observed for "Annotation_Resources" vs 
"Annotating_Genomic_Ranges"? I'm happy to only have a blurb for the 
first workflow, given that I'd be just repeating myself for the others; 
but this depends on how it's organized on the webpage.

Finally, could we get a separate "single-cell workflows" section? The 
current "Basic/Advanced" partition is pretty crude, and I can see 
opportunities for more detailed stratification, e.g., by ChIP-seq, 
RNA-seq, single-cell RNA-seq, proteomics (including mass cytometry).

Cheers,

Aaron

On 11/12/17 20:24, Aaron Lun wrote:
> Thanks Val:
> 
> Obenchain, Valerie wrote:
>> Hi,
>>
>> On 12/11/2017 08:49 AM, Aaron Lun wrote:
>>> Following up on our earlier discussion:
>>>
>>> https://stat.ethz.ch/pipermail/bioc-devel/2017-October/011949.html
>>>
>>> I have split the simpleSingleCell workflow into three (four, if you
>>> include the introductory overview) self-contained Rmarkdown files. I am
>>> preparing them for submission to BioC's workflow builder, and I would
>>> like to check what is the best way to do this:
>>>
>>> i) Each workflow file goes into its own package.
>>>
>>> ii) All workflow files go into a single package.
>>>
>>> Option (i) is logistically easier but probably a bit odd conceptually,
>>> especially if users need to download "simpleSingleCell1",
>>> "simpleSingleCell2", "simpleSingleCell3", etc.
>>> Option (ii) is nicer but requires more coordination, as the BioC webpage
>>> builder needs to know that multiple HTMLs have been generated. It's
>>> also unclear to me whether this will run into problems with the DLL
>>> limit - does R restart when compiling each vignette?
>> You could do either but I'd say option 2 is easier from a maintenance
>> standpoint and probably for the user. Maybe you've seen this but an
>> example is the annotation workflow package which houses 2 workflows:
>>
>> ~/repos/svn/workflows >ls annotation/vignettes/
>> Annotating_Genomic_Ranges.Rmd  Annotation_Resources.Rmd
>> databaseTypes.png  display.png
>>
>> Each has an informative name and is presented on the website as an
>> individual workflow:
>>
>> https://bioconductor.org/help/workflows/
> 
> I didn't know that, thanks.
> 
>> I don't think more coordination is involved - you just have multiple
>> files in vignettes/. And, as you mentioned, it's a bonus that when a
>> user downloads the annotation package they get all related workflows.
>>
>> A fresh R session is started for each package but not for each
>> vignette in the package.
> 
> Ah. That's a shame, I was hoping to reduce the sensitivity to the DLL limit.
> 
> But now that I think about it: maybe that's not actually a problem,
> provided the BioC workflow builders have a high DLL limit. The main
> issue was that *users* were running into the DLL limit; by splitting the
>> workflow up, users should not be tempted to run everything at once, thus
> avoiding the limit on their machines. Of course, Bioconductor can
> control its own build machines, so as long as they set the MAX_DLLs
> high, it should still build and show up on the website.
> 
>>> Any thoughts would be appreciated. I'm also happy to be a guinea pig for
>>> any SVN->Git transition for the workflow packages, if that's on the radar.
>>
>> Nitesh has created git repos for the workflow packages and Andrzej is
>> adapting the BBS code to incorporate them into the builds. We
>> guesstimate this will be done by the end of the year. You shouldn't
>> have to do anything on your end - once we're ready to switch over
>> we'll let you know and send the new location of the workflow in git.
> 
> Cool, looking forward to it.
> 
> -A
> 
>> Val
>>> Cheers,
>>>
>>> Aaron
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>

Re: [Bioc-devel] splitting simpleSingleCell into self-contained vignettes

2017-12-11 Thread Aaron Lun
Thanks Val:

Obenchain, Valerie wrote:
> Hi,
>
> On 12/11/2017 08:49 AM, Aaron Lun wrote:
>> Following up on our earlier discussion:
>>
>> https://stat.ethz.ch/pipermail/bioc-devel/2017-October/011949.html
>>
>> I have split the simpleSingleCell workflow into three (four, if you 
>> include the introductory overview) self-contained Rmarkdown files. I am 
>> preparing them for submission to BioC's workflow builder, and I would 
>> like to check what is the best way to do this:
>>
>> i) Each workflow file goes into its own package.
>>
>> ii) All workflow files go into a single package.
>>
>> Option (i) is logistically easier but probably a bit odd conceptually, 
>> especially if users need to download "simpleSingleCell1", 
>> "simpleSingleCell2", "simpleSingleCell3", etc.
>> Option (ii) is nicer but requires more coordination, as the BioC webpage 
>> builder needs to know that multiple HTMLs have been generated. It's
>> also unclear to me whether this will run into problems with the DLL 
>> limit - does R restart when compiling each vignette?
> You could do either but I'd say option 2 is easier from a maintenance
> standpoint and probably for the user. Maybe you've seen this but an
> example is the annotation workflow package which houses 2 workflows:
>
> ~/repos/svn/workflows >ls annotation/vignettes/
> Annotating_Genomic_Ranges.Rmd  Annotation_Resources.Rmd 
> databaseTypes.png  display.png
>
> Each has an informative name and is presented on the website as an
> individual workflow:
>
> https://bioconductor.org/help/workflows/

I didn't know that, thanks.

> I don't think more coordination is involved - you just have multiple
> files in vignettes/. And, as you mentioned, it's a bonus that when a
> user downloads the annotation package they get all related workflows.
>
> A fresh R session is started for each package but not for each
> vignette in the package.

Ah. That's a shame, I was hoping to reduce the sensitivity to the DLL limit.

But now that I think about it: maybe that's not actually a problem,
provided the BioC workflow builders have a high DLL limit. The main
issue was that *users* were running into the DLL limit; by splitting the
workflow up, users should not be tempted to run everything at once, thus
avoiding the limit on their machines. Of course, Bioconductor can
control its own build machines, so as long as they set the MAX_DLLs
high, it should still build and show up on the website.
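
For reference, a quick way to check how close a session is to the limit, and
the knob for raising it (a minimal sketch, assuming R >= 3.4 where
R_MAX_NUM_DLLS is honoured):

```r
# Count the DLLs currently loaded in this session:
length(getLoadedDLLs())

# Raise the ceiling for future sessions by adding a line to ~/.Renviron
# (read before R starts up):
#   R_MAX_NUM_DLLS=150
```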

>> Any thoughts would be appreciated. I'm also happy to be a guinea pig for 
>> any SVN->Git transition for the workflow packages, if that's on the radar.
>
> Nitesh has created git repos for the workflow packages and Andrzej is
> adapting the BBS code to incorporate them into the builds. We
> guesstimate this will be done by the end of the year. You shouldn't
> have to do anything on your end - once we're ready to switch over
> we'll let you know and send the new location of the workflow in git.

Cool, looking forward to it.

-A

> Val
>> Cheers,
>>
>> Aaron
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] splitting simpleSingleCell into self-contained vignettes

2017-12-11 Thread Aaron Lun
Following up on our earlier discussion:

https://stat.ethz.ch/pipermail/bioc-devel/2017-October/011949.html

I have split the simpleSingleCell workflow into three (four, if you 
include the introductory overview) self-contained Rmarkdown files. I am 
preparing them for submission to BioC's workflow builder, and I would 
like to check what is the best way to do this:

i) Each workflow file goes into its own package.

ii) All workflow files go into a single package.

Option (i) is logistically easier but probably a bit odd conceptually, 
especially if users need to download "simpleSingleCell1", 
"simpleSingleCell2", "simpleSingleCell3", etc.

Option (ii) is nicer but requires more coordination, as the BioC webpage 
builder needs to know that multiple HTMLs have been generated. It's 
also unclear to me whether this will run into problems with the DLL 
limit - does R restart when compiling each vignette?

Any thoughts would be appreciated. I'm also happy to be a guinea pig for 
any SVN->Git transition for the workflow packages, if that's on the radar.

Cheers,

Aaron
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] possible bug in Rhtslib::pkgconfig

2017-10-27 Thread Aaron Lun
The "Re:" in the title of this e-mail? I add an "Re:" to all my titles,
and I assume that the mailing list dispatcher adds the "[Bioc-devel]"
tag (cleverly inserting it between the "Re:" and the actual title).

Just looked it up - apparently it stands for "in re" in Latin. I guess I
learn something new every day.

Martin Morgan wrote:
> Thanks Aaron I'll follow up on the support site.
>
> Can you clarify where the 'Re:' came from in the title? I can't find a
> previous post with similar title.
>
> Martin
>
> On 10/27/2017 05:42 AM, Aaron Lun wrote:
>> Dear list,
>>
>> It seems that there is an issue with Rhtslib::pkgconfig() regarding the
>> identification of the location of the shared library on some systems:
>>
>> https://support.bioconductor.org/p/102248/
>>
>> To summarize: on this system, R is putting the shared library in lib64/,
>> while pkgconfig() looks for it in lib/. This results in linkage errors
>> for all packages depending on Rhtslib on this system. I imagine that the
>> same would happen for all library packages, e.g., beachmat, Rhdf5lib.
>>
>> Looking at the Makevars for Rhtslib suggests that the shared library is
>> stored in ${R_PACKAGE_DIR}/lib${R_ARCH}, while pkgconfig only ever looks
>> in lib/. I assume that this usually works because ${R_ARCH} is empty on
>> most linux systems, though perhaps this cannot be guaranteed.
>>
>> Cheers,
>>
>> Aaron
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
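
To make the path mismatch above concrete, here is a sketch of the two
locations involved (illustrative only; the actual Makevars/pkgconfig() code
differs):

```r
# Where Makevars installs the shared library vs where pkgconfig() looks.
# Illustrative sketch only, not the real Rhtslib code.
pkgdir <- system.file(package = "Rhtslib")
file.path(pkgdir, paste0("lib", Sys.getenv("R_ARCH")))  # install location
file.path(pkgdir, "lib")                                # search location
```

On systems where these differ (e.g., lib64/), linkage fails as described.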
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] clarification of the BiocParallel vignette for SnowParam() usage

2017-10-10 Thread Aaron Lun
Dear list;


Currently the vignette for the BiocParallel package states that the functions 
to be executed should contain the necessary library() calls if bplapply is used 
with SnowParam(). Now, I use bplapply in several of my packages with internal 
helper/wrapper functions, and that piece of advice caused me to worry whether I 
should have library() calls in all of these functions.


After some testing and contemplation, I realised that the internal functions 
would pass along the package namespace to each worker, and so no extra loading 
was required. Perhaps it would be a good idea to mention this in the vignette, 
to soothe the hearts of package developers like me.
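
For illustration, a minimal sketch of the situation described above (the
helper name is hypothetical):

```r
library(BiocParallel)

# .helper would normally live inside a package namespace; the namespace is
# shipped to each worker along with the function, so the helper needs no
# library() call of its own.
.helper <- function(i) stats::rnorm(1L, mean = i)
res <- bplapply(1:4, .helper, BPPARAM = SnowParam(workers = 2))
```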


Some mention of the memory consumption behaviour of MulticoreParam would also 
be useful (as discussed in https://support.bioconductor.org/p/70196/#70509), 
especially if people are choosing between MulticoreParam and SnowParam.


Cheers,


Aaron


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-05 Thread Aaron Lun
Thanks Val, and no worries. I'm looking forward to seeing some updated 
guidelines for the workflows, incorporating what we've discussed on the mailing 
list; if I know what to do, I'm happy to put in the effort on my end.


-Aaron


From: Obenchain, Valerie <valerie.obench...@roswellpark.org>
Sent: Friday, 6 October 2017 6:36:44 AM
To: Aaron Lun; Wolfgang Huber; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

Glad this sparked some interesting dialogue!

Regardless of what's decided, this change should have been announced on the 
list before implementing it. Sorry for changing the package out from under you. 
I recognize these workflows are the author's unique work and style will vary 
with personal preferences.

I've reverted simpleSingleCell to -r132062 which was prior to us making any 
vignette/DESCRIPTION changes.

Valerie


On 10/04/2017 05:12 PM, Aaron Lun wrote:

Here's another two cents from me:

The explicit library() calls allow for easy copy-pasting if people only want to 
use/adapt a section of the workflow. In such cases, calling 
"library(simpleSingleCell)" could drag in a lot of unnecessary packages (e.g., 
which could hit the DLL limit). Reading through the text to figure out the 
requirements for each code chunk seems like a pain, and lots of "::" are 
unwieldy.

More generally, the removal of individual library() calls seems to encourage 
the use of a single "library(simpleSingleCell)" call at the top of any 
user-developed custom analysis scripts based on the workflow. This seems 
conceptually odd to me - the simpleSingleCell package is simply a vehicle for 
the compiled workflow, it shouldn't be involved in analyses of other data.

-Aaron


From: Bioc-devel 
<bioc-devel-boun...@r-project.org><mailto:bioc-devel-boun...@r-project.org> on 
behalf of Wolfgang Huber <wolfgang.hu...@embl.de><mailto:wolfgang.hu...@embl.de>
Sent: Thursday, 5 October 2017 8:26 AM
To: bioc-devel@r-project.org<mailto:bioc-devel@r-project.org>
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow


I find `eval=FALSE` chunks not a good idea, since
- they confuse users who only see the rendered HTML/PDF (where this flag
is not shown)
- they are not tested, so more prone to code rot.

I'd also like to object to the idea that proximity of a `library` call
to code that uses a package is somehow didactic. It's actually a bad
habit: the R interpreter does not care. The relevant package
- can be mentioned in the narrative,
- stated in the code with the pkgname:: prefix.
The latter is good didactics to get people used to the idea of
namespaces, especially since there is an increasing frequency of name
clashes in CRAN, tidyverse, BioC (e.g. consider the various functions
named 'filter' and the obscure malbehaviors that can result from these).

Best wishes
Wolfgang

On 04/10/2017 22:20, Turaga, Nitesh wrote:


Hi Aaron,


A workaround solution may be to put all libraries in an `eval=FALSE` block in 
the R code chunk:

```{r, eval=FALSE}
library(scran)
library(scater)
```

etc.


This way the users can see the library() calls in the vignette.

Best,

Nitesh



On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie 
<valerie.obench...@roswellpark.org><mailto:valerie.obench...@roswellpark.org> 
wrote:

Hi guys,

A little background on this vignette -> package conversion. The workflows were 
converted to package form because we want to integrate them into the nightly 
build system instead of supporting separate machines as we're now doing.

As part of this conversion, packages loaded in workflow vignettes were moved to 
Depends in DESCRIPTION. This enables the user to load a single package instead 
of many. Packages were moved to Depends instead of Suggests (as is usually done 
with software packages) because the vignette is the only thing these workflow 
packages have going - no defined classes or methods. This seemed a more tidy 
approach and the dependencies are listed in Depends for the user to see. This 
was my (maybe bad?) idea and Nitesh was the messenger. If you feel the 
individual loading of packages in the vignette is a key part of the 
instruction/learning we can leave them as is and list the packages in Suggests.

I should also mention that incorporating the workflows into the build system 
won't happen until after the release. At that time we'll move the repositories 
from svn to git and it's likely we'll have to ask maintainers to abide by some 
time/space guidelines. At that point the build machines will be building 
software, experimental data and workflows and resources aren't unlimited. When 
that time comes we'll update the workflow guidelines and contact maintainers.

Thanks.
Valerie



On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:

yeah, that is super super useful to

Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-05 Thread Aaron Lun
Just had a look at the use of Makefiles in vignettes/; it seems like the 
Makefile controls the conversion to PDF after Sweave'ing. So it probably can't 
be used to control the order of Sweave'ing, though there doesn't seem to be any 
mention of how the order is controlled. If it's alphanumeric, we could name the 
files "1-start.Rmd", "2-next.Rmd", "3-final.Rmd" and so on.


-Aaron




From: Wolfgang Huber <wolfgang.hu...@embl.de>
Sent: Friday, 6 October 2017 5:22:44 AM
To: Aaron Lun; Laurent Gatto
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow


Breaking up long workflows into several smaller "modules" each with a
clearly defined input and output is a good idea, certainly for didactic
& maintenance reasons.

It doesn't "solve" the DLL issue though, it only avoids it (for now)...

I believe you can use a Makefile for your vignettes
(https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
and this might be a good way of managing which depends on which. For
passing along output/input, perhaps local .RData files are good enough,
perhaps some wheel-reinventing can also be avoided by using
https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
(haven't actually used it yet, though).
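
Something along these lines, perhaps (untested sketch; the object name
'fit' is hypothetical):

```r
library(BiocFileCache)

# One vignette caches an intermediate result...
bfc <- BiocFileCache(ask = FALSE)
tmp <- tempfile(fileext = ".rds")
saveRDS(fit, tmp)
rid <- bfcadd(bfc, rname = "workflow-step1", fpath = tmp)

# ...and a later vignette retrieves it by name.
fit <- readRDS(bfcrpath(bfc, rnames = "workflow-step1"))
```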

Wolfgang



5.10.17 20:02, Aaron Lun scripsit:
> This may relate to what I was thinking with respect to solving the DLL
> problem, by breaking up large workflows into modules that can be
> executed in separate R sessions. The same approach would also make it
> easier to associate package dependencies with specific parts of the
> workflow.
>
>
> In my particular situation, it is easy to break up the workflow into
> sections that can be executed completely independently. However, I can
> also imagine situations where dependencies on previous objects, etc.
> make it difficult to break up the workflow. If multiple files are
> present in vignettes/, can they be directed to execute in a specific
> order, and would output files from one vignette persist during the
> execution of another?
>
>
> -Aaron
>
> 
> *From:* Wolfgang Huber <wolfgang.hu...@embl.de>
> *Sent:* Thursday, 5 October 2017 6:23:47 PM
> *To:* Laurent Gatto; Aaron Lun
> *Cc:* bioc-devel@r-project.org
> *Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell
> workflow
>
> I agree it is nice to be able to only load the packages needed for a
> certain section of a vignette and not the whole thing. And that too many
> `::` can make code look unwieldy (though some may actually increase
> readability).
>
> But relying on manually sprinkled in `library` calls seems like a hack
> prone to error. And there are always bound to be dependencies that are
> non-local, e.g. on general infrastructure like SummarizedExperiment,
> ggplot2, dplyr.
>
> So: do we need a way to computationally determine the dependencies of a
> vignette section, including highlighting/eliminating potential name
> clashes (b/c the warnings about masking emitted at package loading are
> easily ignored)? This seems like a straightforward engineering task.
>
> Eventually with such code analysis we could get rid of explicit
> `library` calls altogether :)
>
>  Wolfgang
>
>
>
>
>
> 5.10.17 08:53, Laurent Gatto scripsit:
>>
>> On  5 October 2017 00:11, Aaron Lun wrote:
>>
>>> Here's another two cents from me:
>>>
>>> The explicit library() calls allow for easy copy-pasting if people
>>> only want to use/adapt a section of the workflow. In such cases,
>>> calling "library(simpleSingleCell)" could drag in a lot of unnecessary
>>> packages (e.g., which could hit the DLL limit). Reading through the
>>> text to figure out the requirements for each code chunk seems like a
>>> pain, and lots of "::" are unwieldy.
>>>
>>> More generally, the removal of individual library() calls seems to
>>> encourage the use of a single "library(simpleSingleCell)" call at the
>>> top of any user-developed custom analysis scripts based on the
>>> workflow. This seems conceptually odd to me - the simpleSingleCell
>>> package is simply a vehicle for the compiled workflow, it shouldn't be
>>> involved in analyses of other data.
>>
>> I can confirm that this is a possibility.
>>
>> Before workflows became available, I created the RforProteomics package
>> that essentially provided one relatively large vignette to demonstrate a
>> variety of applications of R/Bioconductor for mass 

Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-05 Thread Aaron Lun
This may relate to what I was thinking with respect to solving the DLL problem, 
by breaking up large workflows into modules that can be executed in separate R 
sessions. The same approach would also make it easier to associate package 
dependencies with specific parts of the workflow.


In my particular situation, it is easy to break up the workflow into sections 
that can be executed completely independently. However, I can also imagine 
situations where dependencies on previous objects, etc. make it difficult to 
break up the workflow. If multiple files are present in vignettes/, can they be 
directed to execute in a specific order, and would output files from one 
vignette persist during the execution of another?


-Aaron


From: Wolfgang Huber <wolfgang.hu...@embl.de>
Sent: Thursday, 5 October 2017 6:23:47 PM
To: Laurent Gatto; Aaron Lun
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow


I agree it is nice to be able to only load the packages needed for a
certain section of a vignette and not the whole thing. And that too many
`::` can make code look unwieldy (though some may actually increase
readability).

But relying on manually sprinkled in `library` calls seems like a hack
prone to error. And there are always bound to be dependencies that are
non-local, e.g. on general infrastructure like SummarizedExperiment,
ggplot2, dplyr.

So: do we need a way to computationally determine the dependencies of a
vignette section, including highlighting/eliminating potential name
clashes (b/c the warnings about masking emitted at package loading are
easily ignored)? This seems like a straightforward engineering task.

Eventually with such code analysis we could get rid of explicit
`library` calls altogether :)
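
A crude starting point might look like this (toy sketch; it only catches
top-level library()/require() calls and ignores :: usage):

```r
# Extract the code from a vignette and list the packages it loads via
# top-level library()/require() calls.
findDeps <- function(rmd) {
    script <- knitr::purl(rmd, output = tempfile(fileext = ".R"), quiet = TRUE)
    exprs <- parse(file = script)
    hits <- vapply(exprs, function(e) {
        is.call(e) && as.character(e[[1]])[1] %in% c("library", "require")
    }, logical(1))
    unique(vapply(exprs[hits], function(e) as.character(e[[2]]), character(1)))
}
```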

Wolfgang





5.10.17 08:53, Laurent Gatto scripsit:
>
> On  5 October 2017 00:11, Aaron Lun wrote:
>
>> Here's another two cents from me:
>>
>> The explicit library() calls allow for easy copy-pasting if people
>> only want to use/adapt a section of the workflow. In such cases,
>> calling "library(simpleSingleCell)" could drag in a lot of unnecessary
>> packages (e.g., which could hit the DLL limit). Reading through the
>> text to figure out the requirements for each code chunk seems like a
>> pain, and lots of "::" are unwieldy.
>>
>> More generally, the removal of individual library() calls seems to
>> encourage the use of a single "library(simpleSingleCell)" call at the
>> top of any user-developed custom analysis scripts based on the
>> workflow. This seems conceptually odd to me - the simpleSingleCell
>> package is simply a vehicle for the compiled workflow, it shouldn't be
>> involved in analyses of other data.
>
> I can confirm that this is a possibility.
>
> Before workflows became available, I created the RforProteomics package
> that essentially provided one relatively large vignette to demonstrate a
> variety of applications of R/Bioconductor for mass spectrometry and
> proteomics. I think this has been a useful way to disseminate R and
> Bioconductor in these respective communities, but also lead to the
> confusion that it was that package that "did all the stuff", i.e. people
> saying that they were using RforProteomics to do a task that was
> described in the vignette. The RforProteomics vignette does explicitly
> call library at the beginning of each section and explained that the
> package was only a collection of analyses stemming from other packages,
> but that wasn't enough apparently.
>
> Laurent
>
>
>> -Aaron
>>
>> 
>> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Wolfgang 
>> Huber <wolfgang.hu...@embl.de>
>> Sent: Thursday, 5 October 2017 8:26 AM
>> To: bioc-devel@r-project.org
>> Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell 
>> workflow
>>
>>
>> I find `eval=FALSE` chunks not a good idea, since
>> - they confuse users who only see the rendered HTML/PDF (where this flag
>> is not shown)
>> - they are not tested, so more prone to code rot.
>>
>> I'd also like to object to the idea that proximity of a `library` call
>> to code that uses a package is somehow didactic. It's actually a bad
>> habit: the R interpreter does not care. The relevant package
>> - can be mentioned in the narrative,
>> - stated in the code with the pkgname:: prefix.
>> The latter is good didactics to get people used to the idea of
>> namespaces, especially since there is an increasing frequency of name
>> clashes in CRAN, tidyverse, BioC (e.g. consider the various functions
>>

Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-04 Thread Aaron Lun
Here's another two cents from me:

The explicit library() calls allow for easy copy-pasting if people only want to 
use/adapt a section of the workflow. In such cases, calling 
"library(simpleSingleCell)" could drag in a lot of unnecessary packages (e.g., 
which could hit the DLL limit). Reading through the text to figure out the 
requirements for each code chunk seems like a pain, and lots of "::" are 
unwieldy.

More generally, the removal of individual library() calls seems to encourage 
the use of a single "library(simpleSingleCell)" call at the top of any 
user-developed custom analysis scripts based on the workflow. This seems 
conceptually odd to me - the simpleSingleCell package is simply a vehicle for 
the compiled workflow, it shouldn't be involved in analyses of other data.

-Aaron


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Wolfgang Huber 
<wolfgang.hu...@embl.de>
Sent: Thursday, 5 October 2017 8:26 AM
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow


I find `eval=FALSE` chunks not a good idea, since
- they confuse users who only see the rendered HTML/PDF (where this flag
is not shown)
- they are not tested, so more prone to code rot.

I'd also like to object to the idea that proximity of a `library` call
to code that uses a package is somehow didactic. It's actually a bad
habit: the R interpreter does not care. The relevant package
- can be mentioned in the narrative,
- stated in the code with the pkgname:: prefix.
The latter is good didactics to get people used to the idea of
namespaces, especially since there is an increasing frequency of name
clashes in CRAN, tidyverse, BioC (e.g. consider the various functions
named 'filter' and the obscure malbehaviors that can result from these).
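
For example (a classic clash, shown for concreteness):

```r
# stats::filter() (linear filtering) and dplyr::filter() (row subsetting)
# share a name; which one plain filter() means depends on load order.
x <- 1:10
stats::filter(x, rep(1/3, 3))   # moving average from the stats package
# dplyr::filter(df, x > 5)      # row filtering, if dplyr were attached
```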

Best wishes
Wolfgang

On 04/10/2017 22:20, Turaga, Nitesh wrote:
> Hi Aaron,
>
>
> A workaround solution may be to put all libraries in an `eval=FALSE` block in 
> the R code chunk:
>
> ```{r, eval=FALSE}
> library(scran)
> library(scater)
> ```
>
> etc.
>
>
> This way the users can see the library() calls in the vignette.
>
> Best,
>
> Nitesh
>
>> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie 
>> <valerie.obench...@roswellpark.org> wrote:
>>
>> Hi guys,
>>
>> A little background on this vignette -> package conversion. The workflows 
>> were converted to package form because we want to integrate them into the 
>> nightly build system instead of supporting separate machines as we're now 
>> doing.
>>
>> As part of this conversion, packages loaded in workflow vignettes were moved 
>> to Depends in DESCRIPTION. This enables the user to load a single package 
>> instead of many. Packages were moved to Depends instead of Suggests (as is 
>> usually done with software packages) because the vignette is the only 
>> thing these workflow packages have going - no defined classes or methods. 
>> This seemed a more tidy approach and the dependencies are listed in Depends 
>> for the user to see. This was my (maybe bad?) idea and Nitesh was the 
>> messenger. If you feel the individual loading of packages in the vignette is 
>> a key part of the instruction/learning we can leave them as is and list the 
>> packages in Suggests.
>>
>> I should also mention that incorporating the workflows into the build system 
>> won't happen until after the release. At that time we'll move the 
>> repositories from svn to git and it's likely we'll have to ask maintainers 
>> to abide by some time/space guidelines. At that point the build machines 
>> will be building software, experimental data and workflows and resources 
>> aren't unlimited. When that time comes we'll update the workflow guidelines 
>> and contact maintainers.
>>
>> Thanks.
>> Valerie
>>
>>
>>
>> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
>>
>> yeah, that is super super useful to people. In my vignettes (granted, not
>> workflows) I have a separate "Dependencies" section which is basically a
>> series of library() calls.
>>
>> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun 
>> <a...@wehi.edu.au><mailto:a...@wehi.edu.au> wrote:
>>
>>
>>
>> Dear Nitesh, list;
>>
>>
>> The library() calls in the simpleSingleCell workflow have been removed.
>> Why is this? I find explicit library() calls to be quite useful for readers
>> of the compiled vignette, because it makes it easier for them to determine
>> the packages that are required to adapt parts of the workflow fo

Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-04 Thread Aaron Lun
Dear Nitesh, list;


The library() calls in the simpleSingleCell workflow have been removed. Why is 
this? I find explicit library() calls to be quite useful for readers of the 
compiled vignette, because it makes it easier for them to determine the 
packages that are required to adapt parts of the workflow for their own 
analyses. If it doesn't hurt the build system, I would prefer to have these 
library() calls in the vignette.


Cheers,


Aaron


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

2017-09-29 Thread Aaron Lun
Thanks Martin. Looks like it's building happily now, which gives us some 
breathing space.


-Aaron


From: Martin Morgan <martin.mor...@roswellpark.org>
Sent: Tuesday, 26 September 2017 6:05:22 PM
To: Aaron Lun; Hervé Pagès; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange 
error in Jenkins build for singleCellWorkflow

On 09/26/2017 03:04 AM, Aaron Lun wrote:
> Hi Herve,
>
>
> I tried out the .BBSoptions approach, but it seems that the build system
> is still having some trouble:
>
>
> http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/label=master/59/console
>
>
> I bumped up the maximum number of DLLs to 200 in .BBSoptions, but to no
> effect. Any ideas?

This is my bad advice; as Herve mentions the workflow builders do not
respect BBS options. We will adjust the max. DLLs on our end. Please be
patient.

Martin

>
>
> -Aaron
>
> 
> *From:* Hervé Pagès <hpa...@fredhutch.org>
> *Sent:* Thursday, 21 September 2017 3:06:18 PM
> *To:* Aaron Lun; Martin Morgan; bioc-devel@r-project.org
> *Subject:* Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re:
> strange error in Jenkins build for singleCellWorkflow
> Hi,
>
> @Martin: It's good news that the workflows have been standardized as
> packages but aren't we still using the traditional workflow builder?
> AFAIK .BBSoptions files are only honoured on the main build system
> (a.k.a. BBS).
>
> @Aaron: If we decide to use BBS (our main build system) to build the
> workflows, then you'll be able to control R_MAX_NUM_DLLS by putting
> the following lines to your .BBSoptions file:
>
> RbuildPrepend: R_MAX_NUM_DLLS=150
> RbuildPrepend.win: set R_MAX_NUM_DLLS=150&&
> RcheckPrepend: R_MAX_NUM_DLLS=150
> RcheckPrepend.win: set R_MAX_NUM_DLLS=150&&
>
> You might not need all of them but it doesn't hurt to have them
> all. Note that you should not try to put a space before && in the
> RbuildPrepend.win or RcheckPrepend.win value.
>
> H.
>
> On 09/19/2017 05:51 PM, Aaron Lun wrote:
>> Thanks Martin. I think I will stick to one workflow for now, until the
>> BioC-workflows page provides some formal support for multiple workflows
>> representing different components of the same workflow (i.e., other than
>> me manually writing in the abstract that "This workflow is based on the
>> concepts introduced in the previous workflow X").
>>
>>
>> @Herve can you help me out with the .BBSoptions configuration for
>> R_MAX_NUM_DLLS? I guess we should also indicate to the user that this
>> needs to be increased in order for the workflow to run.
>>
>>
>> -Aaron
>>
>>
>>
>> 
>> *From:* Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of
>> Martin Morgan <martin.mor...@roswellpark.org>
>> *Sent:* Wednesday, 20 September 2017 2:16 AM
>> *To:* Wolfgang Huber; bioc-devel@r-project.org
>> *Subject:* Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re:
>> strange error in Jenkins build for singleCellWorkflow
>> On 09/19/2017 09:50 AM, Wolfgang Huber wrote:
>>>
>>> My 3 cents:
>>> - I think this is a more and more common problem that I'm also
>>> encountering in everyday work and that asks for a general solution.
>>> - I agree with Martin that setting R_MAX_NUM_DLLS is better than
>>> unloading. AfaIk it is not even possible to cleanly unload every package
>>> ('as if it had never been loaded') due to irreversible global effects;
>>> although I'd happy to be educated otherwise.
>>> - R_MAX_NUM_DLLS is not a sustainable solution either: the current
>>> default is 100, but e.g. on my MacOS 10.12 any value >152 leads to an
>>> error. Upping to the maximum 152 will give us some temporary respite but
>>> seems not really future-proof.
>>
>> This was the R-core motivation for increasing the max to only 100, but
>> it's still surprising to me that a modern OS has such a tight limit.
>> I'll see if there are ideas in R-core.
>>
>>   From our internal discussions there is some willingness to (continue)
>> supporting large and complicated work flows, but it is valuable to think
>> carefully about the consequences for users following along. Maybe part
>> of this is clearly alerting the user to the fact that 500G of data are
>> going to be downloaded, the workflow requires advanced configuration of
>> R, etc.
>>
>> @Aaron -- if you'd

Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

2017-09-26 Thread Aaron Lun
Hi Herve,


I tried out the .BBSoptions approach, but it seems that the build system is 
still having some trouble:


http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/label=master/59/console


I bumped up the maximum number of DLLs to 200 in .BBSoptions, but to no effect. 
Any ideas?


-Aaron


From: Hervé Pagès <hpa...@fredhutch.org>
Sent: Thursday, 21 September 2017 3:06:18 PM
To: Aaron Lun; Martin Morgan; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange 
error in Jenkins build for singleCellWorkflow

Hi,

@Martin: It's good news that the workflows have been standardized as
packages but aren't we still using the traditional workflow builder?
AFAIK .BBSoptions files are only honoured on the main build system
(a.k.a. BBS).

@Aaron: If we decide to use BBS (our main build system) to build the
workflows, then you'll be able to control R_MAX_NUM_DLLS by putting
the following lines to your .BBSoptions file:

RbuildPrepend: R_MAX_NUM_DLLS=150
RbuildPrepend.win: set R_MAX_NUM_DLLS=150&&
RcheckPrepend: R_MAX_NUM_DLLS=150
RcheckPrepend.win: set R_MAX_NUM_DLLS=150&&

You might not need all of them but it doesn't hurt to have them
all. Note that you should not try to put a space before && in the
RbuildPrepend.win or RcheckPrepend.win value.

H.

On 09/19/2017 05:51 PM, Aaron Lun wrote:
> Thanks Martin. I think I will stick to one workflow for now, until the
> BioC-workflows page provides some formal support for multiple workflows
> representing different components of the same workflow (i.e., other than
> me manually writing in the abstract that "This workflow is based on the
> concepts introduced in the previous workflow X").
>
>
> @Herve can you help me out with the .BBSoptions configuration for
> R_MAX_NUM_DLLS? I guess we should also indicate to the user that this
> needs to be increased in order for the workflow to run.
>
>
> -Aaron
>
>
>
> 
> *From:* Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of
> Martin Morgan <martin.mor...@roswellpark.org>
> *Sent:* Wednesday, 20 September 2017 2:16 AM
> *To:* Wolfgang Huber; bioc-devel@r-project.org
> *Subject:* Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re:
> strange error in Jenkins build for singleCellWorkflow
> On 09/19/2017 09:50 AM, Wolfgang Huber wrote:
>>
>> My 3 cents:
>> - I think this is a more and more common problem that I'm also
>> encountering in everyday work and that asks for a general solution.
>> - I agree with Martin that setting R_MAX_NUM_DLLS is better than
>> unloading. AfaIk it is not even possible to cleanly unload every package
>> ('as if it had never been loaded') due to irreversible global effects;
>> although I'd happy to be educated otherwise.
>> - R_MAX_NUM_DLLS is not a sustainable solution either: the current
>> default is 100, but e.g. on my MacOS 10.12 any value >152 leads to an
>> error. Upping to the maximum 152 will give us some temporary respite but
>> seems not really future-proof.
>
> This was the R-core motivation for increasing the max to only 100, but
> it's still surprising to me that a modern OS has such a tight limit.
> I'll see if there are ideas in R-core.
>
>   From our internal discussions there is some willingness to (continue)
> supporting large and complicated work flows, but it is valuable to think
> carefully about the consequences for users following along. Maybe part
> of this is clearly alerting the user to the fact that 500G of data are
> going to be downloaded, the workflow requires advanced configuration of
> R, etc.
>
> @Aaron -- if you'd like to continue with one work flow, contact Herve
> (cc'd) and he'll provide the .BBSoptions configuration to allow the
> build system to use an appropriate R_MAX_NUM_DLLS. If instead you'd like
> to produce two workflows, then the best strategy in your case would be
> to simply have two independent packages (DESCRIPTION + vignettes/) each
> with more modest numbers of DLLs; contact Lori (cc'd) when you've
> decided on a second name, and we'll create the svn location for you.
>
> Martin
>
>>
>>  Wolfgang
>>
>> 19.9.17 12:02, Martin Morgan scripsit:
>>> On 09/18/2017 10:42 PM, Shian Su wrote:
>>>> Hi Aaron,
>>>>
>>>> Would you mind sharing the code for flushing DLLs? This is a problem
>>>> that others working with single cells and I have faced.
>>>>
>>>
>>> For the user encountering this problem I think a better solution is to
>>> increase the number of DLLs allowed by R, for instance editing
>>

Re: [Bioc-devel] assay dimnames in SingleCellExperiment / SummarizedExperiment

2017-09-23 Thread Aaron Lun
Well, I guess it probably wouldn't be too bad to have "Feature" for the 
SingleCellExperiment "names(dimnames(...))[1]". The SCE inherits from an RSE 
anyway so we do imply that the rows represent some kind of genomic feature. 
Maybe this is too presumptive, but it doesn't seem to clash with any of the 
current applications of SCE. Of course, we could always just leave the first 
name empty, but I'm not sure whether this would cause Kevin's use cases to do 
funny things.
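
For concreteness, the kind of thing being discussed (toy matrix only):

```r
# Named dimnames on an assay-like matrix:
m <- matrix(rpois(6, lambda = 10), nrow = 3,
            dimnames = list(Feature = paste0("gene", 1:3),
                            Cell    = paste0("cell", 1:2)))
names(dimnames(m))
#> [1] "Feature" "Cell"
```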


On another note, what is the easiest way to enforce dimnames names at the SCE 
level? Do I need to implement new methods for assay<- or assays<-, to overwrite 
the names(dimnames()) of incoming matrices? That seems like a pain. If the base 
SE is going to check/enforce consistent names of dimnames, perhaps this could 
be made into a method that SCE can specialize to coerce the names to something 
else.


-Aaron


From: Hervé Pagès <hpa...@fredhutch.org>
Sent: Saturday, 23 September 2017 8:47:14 AM
To: Kevin RUE; Aaron Lun
Cc: bioc-devel; da...@ebi.ac.uk; risso.dav...@gmail.com; Maintainer
Subject: Re: assay dimnames in SingleCellExperiment / SummarizedExperiment

Hi guys,

On 09/16/2017 03:49 AM, Kevin RUE wrote:
> Hi Aaron,
>
> Yes - sorry, I meant the names of dimnames. Dimnames are indeed checked,
> but my code was meant to demonstrate that names of dimnames aren't.
> Obviously, it's not the end of the world, but just something I noticed
> while I was investigating the glitch.

Sounds like a good idea to me to check for consistent names of
dimnames across assays. I'll add this to the validity method of
SummarizedExperiment objects.

>
> My second point is not that much about calling dim or dimnames, but
> rather about the side-effects of having names(dimnames(x)) not NULL,
> such as the case of `reshape2::melt`.
> I think it'd be one worry less for downstream methods to 'know' the
> colnames of a melted assay(x, 1) instead of having "Var1, Var2, value"
> if names(dimnames) is NULL, and "something else" if not NULL.
>
> Beyond aesthetics, it's really just semantics, but I do think small
> stuff like that, if handled at a higher class level, can encourage
> downstream developers to work off a more consistent mental
> and computational model (my take from Michael Lawrence's BOF at
> Bioc2017). In other words, it has a small cost to implement in the
> parent class, instead of if-else statements in each child class.
>
> It could be something as simple as :
>
>   * c("Feature", "Sample") at the `SummarizedExperiment` level
>   * overriden by c("Feature", "Cell") in `SingleCellExperiment`
>   * overriden by developer's choice in other dependent packages.

I'm not too keen on enforcing this at the SummarizedExperiment level.
The rows of a SummarizedExperiment object sometimes correspond to
bins or to a running window. You could even imagine use cases where
they correspond to reads or groups of reads or protein IDs. As general
as "Feature" might sound, it would feel a little bit like a misnomer
for these use cases. This could slightly hurt the re-usability appeal
of SummarizedExperiment objects.

I could see the same argument being made about enforcing
names(dimnames(x))[1] to "Feature" for a SingleCellExperiment
object. However enforcing names(dimnames(x))[2] to "Cell" is
probably fine and "Cell" seems like the natural choice given
that this is hardcoded in the name of the class.

Note that technically you cannot have SummarizedExperiment
enforce c("Feature", "Sample") and SingleCellExperiment enforce
something else. That's because S4 doesn't let you override validity
criteria defined by an ancestor class. And that in turn is because
in S4 validation is *incremental*. This means that the validity
method for a subclass only needs to worry about validating what's
not already covered by the validity methods of all the ancestors
class. When one calls validObject(x), first the validity methods
for all 'x' ancestor classes are called (from the most distant
ancestor to the direct parent), and the validity method for the
class of 'x' is finally called. This means that you cannot write
a validity method for 'x' that contradicts what the validity
methods for the ancestor classes expect. In other words, if B
extends A, an object of class B must be a valid A object (remember
that is(x, "A") is TRUE) before it can be considered to be a
valid B object. In the (almost) real world this is just saying
that before a cat can be considered to be a valid red cat it must
first be considered to be a valid cat. (Don't ask me what a valid
cat is.)
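
A toy illustration of the incremental behaviour (sketch only, with made-up
classes):

```r
# B's validity runs only after A's has already passed, so B cannot loosen
# the criteria that A imposes.
setClass("A", representation(x = "numeric"), validity = function(object) {
    if (length(object@x) > 0) TRUE else "'x' must be non-empty"
})
setClass("B", contains = "A", validity = function(object) {
    if (all(object@x > 0)) TRUE else "'x' must be positive"
})
validObject(new("B", x = 1:3))  # checks A's validity first, then B's
```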

Cheers,
H.

>
>
> All the best,
> Kevin
>
>
> On Sat, Sep 16, 2017 at 6:43 AM, Aaron Lun <a...@wehi.edu.au
> <mailto:a..

Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

2017-09-19 Thread Aaron Lun
Thanks Martin. I think I will stick to one workflow for now, until the 
BioC-workflows page provides some formal support for multiple workflows 
representing different components of the same workflow (i.e., other than me 
manually writing in the abstract that "This workflow is based on the concepts 
introduced in the previous workflow X").


@Herve can you help me out with the .BBSoptions configuration for 
R_MAX_NUM_DLLS? I guess we should also indicate to the user that this needs to 
be increased in order for the workflow to run.


-Aaron



From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Martin Morgan 
<martin.mor...@roswellpark.org>
Sent: Wednesday, 20 September 2017 2:16 AM
To: Wolfgang Huber; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange 
error in Jenkins build for singleCellWorkflow

On 09/19/2017 09:50 AM, Wolfgang Huber wrote:
>
> My 3 cents:
> - I think this is a more and more common problem that I'm also
> encountering in everyday work and that asks for a general solution.
> - I agree with Martin that setting R_MAX_NUM_DLLS is better than
> unloading. AfaIk it is not even possible to cleanly unload every package
> ('as if it had never been loaded') due to irreversible global effects;
> although I'd happy to be educated otherwise.
> - R_MAX_NUM_DLLS is not a sustainable solution either: the current
> default is 100, but e.g. on my MacOS 10.12 any value >152 leads to an
> error. Upping to the maximum 152 will give us some temporary respite but
> seems not really future-proof.

This was the R-core motivation for increasing the max to only 100, but
it's still surprising to me that a modern OS has such a tight limit.
I'll see if there are ideas in R-core.

 From our internal discussions there is some willingness to (continue)
supporting large and complicated work flows, but it is valuable to think
carefully about the consequences for users following along. Maybe part
of this is clearly alerting the user to the fact that 500G of data are
going to be downloaded, the workflow requires advanced configuration of
R, etc.

@Aaron -- if you'd like to continue with one work flow, contact Herve
(cc'd) and he'll provide the .BBSoptions configuration to allow the
build system to use an appropriate R_MAX_NUM_DLLS. If instead you'd like
to produce two workflows, then the best strategy in your case would be
to simply have two independent packages (DESCRIPTION + vignettes/) each
with more modest numbers of DLLs; contact Lori (cc'd) when you've
decided on a second name, and we'll create the svn location for you.

Martin

>
>  Wolfgang
>
> 19.9.17 12:02, Martin Morgan scripsit:
>> On 09/18/2017 10:42 PM, Shian Su wrote:
>>> Hi Aaron,
>>>
>>> Would you mind sharing the code for flushing DLLs? This is a problem
>>> that others working with single cells and I have faced.
>>>
>>
>> For the user encountering this problem I think a better solution is to
>> increase the number of DLLs allowed by R, for instance editing
>> .Renviron to contain the line
>>
>> R_MAX_NUM_DLLS=120
>>
>> or similar. This can be on an installation-wide, user-wise, or
>> project-specific basis, as described in ?Startup
>>
>> @Aaron -- we are still discussing things internally; for instance it
>> is possible to set the maximum number of DLLs in the build system.
>>
>> Martin
>>
>>> Better yet, would anyone know of code that would allow unused DLLs to
>>> be identified and unloaded? I suspect not, as it would require keeping
>>> track of the dependency tree of your current environment, but I'm
>>> hopeful.
>>>
>>> Kind regards,
>>> Shian Su
>>>
>>>> On 19 Sep 2017, at 12:30 pm, Aaron Lun <a...@wehi.edu.au> wrote:
>>>>
>>>> Well, inertia won out in the end, and so I've just moved a whole
>>>> stack of packages into "Suggests" for now. This is probably not a
>>>> sustainable solution as the workflow can potentially get larger over
>>>> time; I would prefer to have some formal support for splitting up
>>>> the workflow into modules that can be independently installed.
>>>>
>>>> -Aaron
>>>> 
>>>> From: Vincent Carey <st...@channing.harvard.edu>
>>>> Sent: Saturday, 16 September 2017 10:08:13 PM
>>>> To: Aaron Lun
>>>> Cc: Martin Morgan; bioc-devel@r-project.org
>>>> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in
>>>> Jenkins build for singleCellWorkflow
>>>>
>>>> IMHO the pedagogic v

Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

2017-09-19 Thread Aaron Lun
The simplest approach is to try unloading each package in turn (it will fail if 
there are dependencies) and repeat until all desired packages are unloaded. 
After this, you can call gcDLLs() from the R.utils package. There is a code 
chunk in my workflow.Rmd file from lines 1588 to 1605 to do this, see 
https://github.com/MarioniLab/BiocWorkflow2016.
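
In outline, the strategy is something like this (rough sketch; the linked
workflow.Rmd has the actual code):

```r
# Repeatedly try to unload namespaces until no further progress is made,
# then garbage-collect the orphaned DLLs. Assumes R.utils is installed.
flushDLLs <- function(keep = c("AnnotationDbi", "GenomeInfoDb")) {
    base_pkgs <- c("base", "stats", "utils", "methods", "graphics",
                   "grDevices", "datasets", "tools")
    repeat {
        candidates <- setdiff(loadedNamespaces(), c(keep, base_pkgs))
        progress <- FALSE
        for (pkg in candidates) {
            ok <- tryCatch({ unloadNamespace(pkg); TRUE },
                           error = function(e) FALSE)
            progress <- progress || ok
        }
        if (!progress) break
    }
    R.utils::gcDLLs()
}
```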




However, this is not without problems, as some packages do some funky 
database-related things upon loading and don't get unloaded properly. Trial and 
error suggests that AnnotationDbi and GenomeInfoDb (and maybe more) should not 
be unloaded, as they can't be properly loaded again in the same session.


-Aaron


From: Shian Su
Sent: Tuesday, 19 September 2017 12:42:47 PM
To: Aaron Lun
Cc: Vincent Carey; bioc-devel@r-project.org
Subject: Re: [Untrusted Server]Re: [Bioc-devel] [Untrusted Server]Re: strange 
error in Jenkins build for singleCellWorkflow

Hi Aaron,

Would you mind sharing the code for flushing DLLs? This is a problem that 
others working with single cells and I have faced.

Better yet, would anyone know of code that would allow unused DLLs to be 
identified and unloaded? I suspect not, as it would require keeping track of the 
dependency tree of your current environment, but I'm hopeful.

Kind regards,
Shian Su

> On 19 Sep 2017, at 12:30 pm, Aaron Lun <a...@wehi.edu.au> wrote:
>
> Well, inertia won out in the end, and so I've just moved a whole stack of 
> packages into "Suggests" for now. This is probably not a sustainable solution 
> as the workflow can potentially get larger over time; I would prefer to have 
> some formal support for splitting up the workflow into modules that can be 
> independently installed.
>
> -Aaron
> 
> From: Vincent Carey <st...@channing.harvard.edu>
> Sent: Saturday, 16 September 2017 10:08:13 PM
> To: Aaron Lun
> Cc: Martin Morgan; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in Jenkins 
> build for singleCellWorkflow
>
> IMHO the pedagogic value of a unified document that treats a topic thoroughly
> is quite high.  Building the whole workflow on an arbitrary user's system 
> seems to
> me to be a lower priority.  Thus using the environment variable in the build 
> system
> to avoid this limit seems an appropriate solution.
>
> On Sat, Sep 16, 2017 at 7:43 AM, Aaron Lun 
> <a...@wehi.edu.au<mailto:a...@wehi.edu.au>> wrote:
> Thanks Martin. Yes, it's quite unfortunate that scater drags in dplyr and 
> ggplot2, which - combined with Bioconductor's core packages - already puts us 
> pretty close to the limit without doing anything else!
>
>
> A solution might be to split my workflow into self-contained components, each 
> of which can become its own workflow package (e.g., simpleSingleCell1, 
> simpleSingleCell2, simpleSingleCell3 and so on). This should avoid all of the 
> problems and our associated hacks.
>
>
> I'm happy to do this, but is it possible for the website to indicate that 
> there is a connection between the component workflows? For example, the link 
> that ordinarily goes to the compiled workflow could instead go to an indexing 
> page, which contains links to individual component workflows.
>
>
> -Aaron
>
>
> ____
> From: Martin Morgan 
> <martin.mor...@roswellpark.org<mailto:martin.mor...@roswellpark.org>>
> Sent: Saturday, 16 September 2017 8:18:09 PM
> To: Aaron Lun; bioc-devel@r-project.org<mailto:bioc-devel@r-project.org>
> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in Jenkins 
> build for singleCellWorkflow
>
> On 09/16/2017 01:53 AM, Aaron Lun wrote:
>> Bumping this rather old thread. To re-iterate, I'm updating my 
>> simpleSingleCell workflow and I'm running into R's DLL limit. I've added a 
>> code block halfway through the workflow that unloads all DLLs and cleans 
>> them out, and this works fine during compilation on my local machine.
>>
>>
>> However, it seems that the BioC workflow builder uses a pre-processing step 
>> whereby it first tries to load all packages contained within library() 
>> calls. This hits the DLL limit as it doesn't execute the protective code 
>> block, which defeats the purpose of all my fiddling in the first place.
>>
>>
>> What options are there? I'm happy to split my workflow into multiple 

Re: [Bioc-devel] [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

2017-09-18 Thread Aaron Lun
Well, inertia won out in the end, and so I've just moved a whole stack of 
packages into "Suggests" for now. This is probably not a sustainable solution 
as the workflow can potentially get larger over time; I would prefer to have 
some formal support for splitting up the workflow into modules that can be 
independently installed.

-Aaron

From: Vincent Carey <st...@channing.harvard.edu>
Sent: Saturday, 16 September 2017 10:08:13 PM
To: Aaron Lun
Cc: Martin Morgan; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in Jenkins build 
for singleCellWorkflow

IMHO the pedagogic value of a unified document that treats a topic thoroughly
is quite high.  Building the whole workflow on an arbitrary user's system seems 
to
me to be a lower priority.  Thus using the environment variable in the build 
system
to avoid this limit seems an appropriate solution.

On Sat, Sep 16, 2017 at 7:43 AM, Aaron Lun 
<a...@wehi.edu.au<mailto:a...@wehi.edu.au>> wrote:
Thanks Martin. Yes, it's quite unfortunate that scater drags in dplyr and 
ggplot2, which - combined with Bioconductor's core packages - already puts us 
pretty close to the limit without doing anything else!


A solution might be to split my workflow into self-contained components, each 
of which can become its own workflow package (e.g., simpleSingleCell1, 
simpleSingleCell2, simpleSingleCell3 and so on). This should avoid all of the 
problems and our associated hacks.


I'm happy to do this, but is it possible for the website to indicate that there 
is a connection between the component workflows? For example, the link that 
ordinarily goes to the compiled workflow could instead go to an indexing page, 
which contains links to individual component workflows.


-Aaron



From: Martin Morgan 
<martin.mor...@roswellpark.org<mailto:martin.mor...@roswellpark.org>>
Sent: Saturday, 16 September 2017 8:18:09 PM
To: Aaron Lun; bioc-devel@r-project.org<mailto:bioc-devel@r-project.org>
Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in Jenkins build 
for singleCellWorkflow

On 09/16/2017 01:53 AM, Aaron Lun wrote:
> Bumping this rather old thread. To re-iterate, I'm updating my 
> simpleSingleCell workflow and I'm running into R's DLL limit. I've added a 
> code block halfway through the workflow that unloads all DLLs and cleans them 
> out, and this works fine during compilation on my local machine.
>
>
> However, it seems that the BioC workflow builder uses a pre-processing step 
> whereby it first tries to load all packages contained within library() calls. 
> This hits the DLL limit as it doesn't execute the protective code block, 
> which defeats the purpose of all my fiddling in the first place.
>
>
> What options are there? I'm happy to split my workflow into multiple smaller 
> Rmarkdown files that get compiled separately, provided there is appropriate 
> support for this setup from the build system

The workflows have been standardized as packages. The packages put the
workflow dependencies in the 'Depends:' field, with the idea being that
the user installing the workflow package 'in the usual way' will get the
packages used in the vignette installed in their system 'in the usual
way' without having to execute special variants of biocLite() /
install.packages() / funky code in the vignette itself to be able to
build the vignette.

Loading a package loads its Depends: (and Imports:) so triggers the problem.

Writing separate vignettes would not help with this (but might make the
workflow more palatable; I'm not 100% sure of support for separate workflows
in a single package, though there is no problem with having multiple
workflow packages on the same general topic).

One could move (some?) packages to Suggests: and use your trick of
unloading packages part-way through the vignette. But then users will
find that they need to install packages to complete the vignette.

'We' could add support for a BBS option that increases R_MAX_NUM_DLLS,
but that would only allow the workflow to build on the build system, not
on the users' systems.

The R-core approach to this
(https://stat.ethz.ch/pipermail/r-devel/2016-December/073529.html,
https://github.com/wch/r-source/commit/757bfa1d7ff373a604d6d34617f9cad78e0c875e)
is also insightful: one could imagine increasing the default
R_MAX_NUM_DLLS, but apparently on some OSes DLLs compete for the number of
open files, which in turn can be quite low.

I note that users have already struggled with the DLL problem 'in the
wild' https://stackoverflow.com/a/45552926/547331. This seems
particularly problematic for workflows, which are appealing to
relatively novice users.

At the end of the day I think the workflows should make realistic use of
R resources. I think this means modifying the workflow to use fewer
DLLs. (this general comment is relevant to other workflows, which for
instance start by downloading very large data sets -- I know that less
constrained use of computing resources is supposed to be a selling point
of the workflows, but in excess this seems counter-productive to their
primary use as pedagogic tools [rather than, for instance, comprehensive
exemplars of reproducible research]).

Re: [Bioc-devel] [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

2017-09-16 Thread Aaron Lun
Thanks Martin. Yes, it's quite unfortunate that scater drags in dplyr and 
ggplot2, which - combined with Bioconductor's core packages - already puts us 
pretty close to the limit without doing anything else!


A solution might be to split my workflow into self-contained components, each 
of which can become its own workflow package (e.g., simpleSingleCell1, 
simpleSingleCell2, simpleSingleCell3 and so on). This should avoid all of the 
problems and our associated hacks.


I'm happy to do this, but is it possible for the website to indicate that there 
is a connection between the component workflows? For example, the link that 
ordinarily goes to the compiled workflow could instead go to an indexing page, 
which contains links to individual component workflows.


-Aaron



From: Martin Morgan <martin.mor...@roswellpark.org>
Sent: Saturday, 16 September 2017 8:18:09 PM
To: Aaron Lun; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in Jenkins build 
for singleCellWorkflow

On 09/16/2017 01:53 AM, Aaron Lun wrote:
> Bumping this rather old thread. To re-iterate, I'm updating my 
> simpleSingleCell workflow and I'm running into R's DLL limit. I've added a 
> code block halfway through the workflow that unloads all DLLs and cleans them 
> out, and this works fine during compilation on my local machine.
>
>
> However, it seems that the BioC workflow builder uses a pre-processing step 
> whereby it first tries to load all packages contained within library() calls. 
> This hits the DLL limit as it doesn't execute the protective code block, 
> which defeats the purpose of all my fiddling in the first place.
>
>
> What options are there? I'm happy to split my workflow into multiple smaller 
> Rmarkdown files that get compiled separately, provided there is appropriate 
> support for this setup from the build system

The workflows have been standardized as packages. The packages put the
workflow dependencies in the 'Depends:' field, with the idea being that
the user installing the workflow package 'in the usual way' will get the
packages used in the vignette installed in their system 'in the usual
way' without having to execute special variants of biocLite() /
install.packages() / funky code in the vignette itself to be able to
build the vignette.

Loading a package loads its Depends: (and Imports:) so triggers the problem.
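Schematically, the trade-off is just which DESCRIPTION field the workflow's
packages occupy -- a hypothetical excerpt, with comments added here for
annotation only (real DESCRIPTION files do not allow them):

Depends: R (>= 3.4), scran, scater   # attached with the package, so every DLL counts at load time
Suggests: Rtsne, pheatmap            # not attached (and not necessarily installed); the vignette must load these itself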

Writing separate vignettes would not help with this (but might make the
workflow more palatable; I'm not 100% sure of support for separate workflows
in a single package, though there is no problem with having multiple
workflow packages on the same general topic).

One could move (some?) packages to Suggests: and use your trick of
unloading packages part-way through the vignette. But then users will
find that they need to install packages to complete the vignette.

'We' could add support for a BBS option that increases R_MAX_NUM_DLLS,
but that would only allow the workflow to build on the build system, not
on the users' systems.

The R-core approach to this
(https://stat.ethz.ch/pipermail/r-devel/2016-December/073529.html,
https://github.com/wch/r-source/commit/757bfa1d7ff373a604d6d34617f9cad78e0c875e)
is also insightful: one could imagine increasing the default
R_MAX_NUM_DLLS, but apparently on some OSes DLLs compete for the number of
open files, which in turn can be quite low.
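For reference, R_MAX_NUM_DLLS is read at R startup, so it cannot be changed
with Sys.setenv() mid-session; it has to go in a startup file, e.g.:

# in ~/.Renviron -- values much above the default of 100 may hit the
# per-process open-file limit mentioned above
R_MAX_NUM_DLLS=150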

I note that users have already struggled with the DLL problem 'in the
wild' https://stackoverflow.com/a/45552926/547331. This seems
particularly problematic for workflows, which are appealing to
relatively novice users.

At the end of the day I think the workflows should make realistic use of
R resources. I think this means modifying the workflow to use fewer
DLLs. (this general comment is relevant to other workflows, which for
instance start by downloading very large data sets -- I know that less
constrained use of computing resources is supposed to be a selling point
of the workflows, but in excess this seems counter-productive to their
primary use as pedagogic tools [rather than, for instance, comprehensive
exemplars of reproducible research]).

Maybe there is additional discussion about some of the technical aspects
of workflows that others might contribute.

Martin

>
>
> Cheers
>
>
> Aaron
>
> ____
> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Aaron Lun 
> <a...@wehi.edu.au>
> Sent: Wednesday, 21 June 2017 12:09:13 AM
> To: bioc-devel@r-project.org
> Subject: [Untrusted Server]Re: [Bioc-devel] strange error in Jenkins build 
> for singleCellWorkflow
>
> Hi all,
>
>
> I'm getting a curious error in the Jenkins log when I try to build the 
> singleCellWorkflow:
>
>
> http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/48/label=master/console

Re: [Bioc-devel] [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

2017-09-15 Thread Aaron Lun
Bumping this rather old thread. To re-iterate, I'm updating my simpleSingleCell 
workflow and I'm running into R's DLL limit. I've added a code block halfway 
through the workflow that unloads all DLLs and cleans them out, and this works 
fine during compilation on my local machine.


However, it seems that the BioC workflow builder uses a pre-processing step 
whereby it first tries to load all packages contained within library() calls. 
This hits the DLL limit as it doesn't execute the protective code block, which 
defeats the purpose of all my fiddling in the first place.


What options are there? I'm happy to split my workflow into multiple smaller 
Rmarkdown files that get compiled separately, provided there is appropriate 
support for this setup from the build system.


Cheers


Aaron


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Aaron Lun 
<a...@wehi.edu.au>
Sent: Wednesday, 21 June 2017 12:09:13 AM
To: bioc-devel@r-project.org
Subject: [Untrusted Server]Re: [Bioc-devel] strange error in Jenkins build 
for singleCellWorkflow

Hi all,


I'm getting a curious error in the Jenkins log when I try to build the 
singleCellWorkflow:


http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/48/label=master/console


The key part is at the bottom:


Error: package or namespace load failed for 'GenomicFeatures' in dyn.load(file, 
DLLpath = DLLpath, ...):
 unable to load shared object 
'/var/lib/jenkins/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/libs/Rsamtools.so':
  `maximal number of DLLs reached...


The workflow had previously been running fine on the build system; I'm not 
quite sure what's going on here, given that it's not even failing at the point 
where I made the latest changes.

Cheers,

Aaron

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] assay dimnames in SingleCellExperiment / SummarizedExperiment

2017-09-15 Thread Aaron Lun
I'll leave the first point to the SummarizedExperiment maintainers, though I  
note that your code seems to be about the names of the dimnames rather than the 
dimnames themselves. (I'm under the impression that consistency in the actual 
dimnames is enforced somehow by the SE constructor.)


As for the second point: I suppose we could set the second name of the 
dimnames to "Cells" in SingleCellExperiment, though the choice for the first 
name is more ambiguous. This request has come up before, and I've never been 
entirely convinced of its necessity. It seems mostly aesthetic to me; honestly, 
if a user doesn't already know that rows are genes and columns are cells, I 
can't see them flailing away at the keyboard until they call dim() to tell them 
what the dimensions correspond to.


But I guess other people like aesthetics, so if you want, you can put in a PR 
to override dim() and dimnames() for SingleCellExperiment to put some names on 
the returned vectors or lists. If I had to choose, I would go with "Features" 
and "Cells" for the rows and columns, respectively. (We already use an RSE, so 
we're already implicitly assuming genomic features.)
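A sketch of what such an override might look like -- hypothetical, not 
currently in the package:

setMethod("dimnames", "SingleCellExperiment", function(x) {
    dn <- callNextMethod()  # the usual SummarizedExperiment dimnames
    if (!is.null(dn)) names(dn) <- c("Features", "Cells")
    dn
})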


-Aaron


From: Kevin RUE <kevinru...@gmail.com>
Sent: Thursday, 14 September 2017 10:57:39 PM
To: bioc-devel
Cc: da...@ebi.ac.uk; risso.dav...@gmail.com; Aaron Lun; Maintainer
Subject: assay dimnames in SingleCellExperiment / SummarizedExperiment

Dear all,

I cc-ed the individual package maintainers on this email to directly 'notify' 
them of this thread and get their respective opinions, but I thought the common 
use of SummarizedExperiment was worth involving the community as well.

Background: I was updating one of my workflows from SCESet to the 
SingleCellExperiment class recently introduced on the development branch.

1)
One thing led to another, and I ended up noticing that there is no validity 
check on the dimnames of the various assays in SummarizedExperiment. In other 
words, different assays can have different `dimnames` (or some assays can 
have NULL dimnames). Using the example code from SummarizedExperiment:

nrows <- 200; ncols <- 6
counts3 <- counts2 <- counts <-
  matrix(runif(nrows * ncols, 1, 1e4), nrows)

rnames <- paste0("F_", sprintf("%03.f", seq_len(nrows)))
cnames <- LETTERS[1:6]

dimnames(counts) <- list(rnames, cnames)
dimnames(counts2) <- list(Tags = rnames, Samples = cnames)
dimnames(counts3) <- list(Features = rnames, Cells = cnames)

colData <- DataFrame(row.names=cnames)

rse <- SummarizedExperiment(assays=SimpleList(c1=counts, c2=counts2, 
c3=counts3), colData=colData)

assayNames(rse)
names(dimnames(assay(rse, "c1"))) # NULL
names(dimnames(assay(rse, "c2"))) # [1] "Tags""Samples"
names(dimnames(assay(rse, "c3"))) # [1] "Features" "Cells"

Although not critical, it'd probably be best practice to have a validity check 
for identical dimnames across all assays, so that one does not have to worry 
later about `melt` calls returning different column names depending on whether 
each assay has proper dimnames or not.
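A minimal sketch of the kind of check I mean (hypothetical, not part of 
SummarizedExperiment):

.validAssayDimnames <- function(x) {
    # withDimnames=FALSE inspects the dimnames actually stored on each
    # assay, rather than the ones copied over from the SE object itself
    dn <- lapply(assays(x, withDimnames = FALSE), dimnames)
    if (length(unique(dn)) > 1L)
        return("assays do not all have identical dimnames")
    TRUE
}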


2)
The initial glitch that prompted this email related to the `reshape2::melt` 
method that extracts dimnames, if available, in the `scater::plotHighestExprs` 
function. Anyway, Davis has already prepared a fix to deal with the scenario 
whereby the assay does have dimnames (e.g. counts in the edgeR::DGEList class 
that I generally use to import counts). Somehow that wasn't an issue with the 
SCESet that I was using previously (probably a side-effect of ExpressionSet).

The point is, the glitch prompted me to consider whether a potential 
standardisation of names(dimnames) could be beneficial, perhaps more 
specifically in the new `SingleCellExperiment` class (as SummarizedExperiment 
has a much more general purpose). Considering the fairly specific purpose of 
the former, I was wondering whether it would be worth:

  *   enforcing names(dimnames(x)) to be "Features" and "Cells" (bearing in mind 
that features could still be genes, transcripts, ...)
  *   or maybe dropping dimnames altogether, storing them only once elsewhere 
(although a slot for that seems overkill)

There may be other possibilities that I haven't thought of yet, but I thought 
I'd get the ball rolling.
Having well-defined dimnames sounds like good practice, with the added benefit 
of generating aesthetically pleasing column names in melted data frames as a 
by-product.
However, I can't tell whether the handling of dimnames is something that needs 
to be handled by individual downstream package developers, or whether standards 
should be set in parent classes.


Thanks for your time!

Best,
Kevin

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BiocStyle not acknowledging fig.wide=TRUE in knitr

2017-09-15 Thread Aaron Lun
Dear list,


I'm playing around with the BiocStyle aesthetics for my csaw user's guide, and 
it looks pretty good in general. However, setting fig.wide=TRUE doesn't seem to 
be respected by knitr during compilation. To give an example:


%%% Latex code start %%%

\documentclass{article}

<<style-knitr, eval=TRUE, echo=FALSE, results="asis">>=
BiocStyle::latex()
@

\bioctitle[Test]{Testing}
%% also: \bioctitle{Title used for both header and title page}
%% or... \title{Title used for both header and title page}
\author{Aaron Lun}

\begin{document}
\maketitle

\begin{abstract}
yayayayaya
\end{abstract}

\packageVersion{\Sexpr{BiocStyle::pkg_ver("csaw")}}

\tableofcontents
\newpage

\chapter{blasdh}
\section{Introduction}
\subsection{Introduction}
\subsubsection{Introduction}
\paragraph{XXX}

asdasd

asdasd

<<testfig, fig.wide=TRUE>>=
plot(1, 1)
@

\end{document}

%%% End %%%

Running this with "R CMD Sweave --engine=knitr::knitr --pdf " gives 
me a TeX file. But setting "fig.wide=FALSE" in the testfig chunk and 
re-Sweaving gives me the exact same TeX file.

I would have expected something to manifest differently when I ask for a figure 
to be wide. In this case, it seems like it never becomes wide, regardless of 
how much I ask. Am I missing something?
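For what it's worth, my understanding is that fig.wide only has an effect if a 
chunk hook has been registered for it; a minimal sketch of such a hook, 
assuming a LaTeX environment like changepage's adjustwidth is available, would 
be:

knitr::knit_hooks$set(fig.wide = function(before, options, envir) {
    # written around the chunk output whenever fig.wide is not NULL
    if (before) "\\begin{adjustwidth}{-1in}{-1in}" else "\\end{adjustwidth}"
})

So perhaps BiocStyle::latex() is not setting one up in this configuration.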

Cheers,

Aaron

P.S. Running on BiocStyle 2.5.37.

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] cbind for DataFrame no longer behaving as expected

2017-09-08 Thread Aaron Lun
Thanks Michael. Also for the Github tip.


-Aaron


From: Michael Lawrence <lawrence.mich...@gene.com>
Sent: Saturday, 9 September 2017 1:41:24 AM
To: Aaron Lun
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] cbind for DataFrame no longer behaving as expected

This is a problem with DataFrame(). Base R's data.frame uses
as.data.frame() to coerce its arguments, passing optional=TRUE. With
optional=TRUE, vectors and other things without column names coerced
to data.frames will have NULL names, so data.frame() can tell the
difference between real/original names and concocted names, for which
it uses the argument name or the deparsed argument as a last resort.
Since DataFrame does not support NULL names, we have long relied on a
heuristic that breaks in this case:

> DataFrame(list(data.frame(foo=1)))
  list.data.frame.foo...1..
                  <numeric>
1                         1


> data.frame(list(data.frame(foo=1)))
  foo
1   1

I will fix this somehow. Btw, you can submit these types of issues
through github now.

Michael

On Fri, Sep 8, 2017 at 6:25 AM, Aaron Lun <a...@wehi.edu.au> wrote:
> Dear list,
>
> It seems that an alteration to the cbind method for DataFrame objects in 
> S4Vectors (probably d595a19b19df9b9c9aaef71e9c1cd1bdc681bfb1) has led to some 
> strange behaviour. In particular, if I run this code with S4Vectors 0.15.7, I 
> get the following output:
>
> require(S4Vectors)
> cbind(DataFrame(score=1, xxx=1), DataFrame(row.names=1)) # okay
>
> ## DataFrame with 1 row and 2 columns
> ##       score       xxx
> ##   <numeric> <numeric>
> ## 1         1         1
>
> cbind(DataFrame(score=1), DataFrame(row.names=1)) # strange
>
> ## DataFrame with 1 row and 1 column
> ##         dfs
> ##   <numeric>
> ## 1         1
>
> The first cbind() call works as expected, but the named "score" field 
> disappears in the output object of the second cbind() call, which is rather 
> surprising. This is the source of at least a few failed tests in the 
> InteractionSet package.
>
> Cheers,
>
> Aaron
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] cbind for DataFrame no longer behaving as expected

2017-09-08 Thread Aaron Lun
Dear list,

It seems that an alteration to the cbind method for DataFrame objects in 
S4Vectors (probably d595a19b19df9b9c9aaef71e9c1cd1bdc681bfb1) has led to some 
strange behaviour. In particular, if I run this code with S4Vectors 0.15.7, I 
get the following output:

require(S4Vectors)
cbind(DataFrame(score=1, xxx=1), DataFrame(row.names=1)) # okay

## DataFrame with 1 row and 2 columns
##       score       xxx
##   <numeric> <numeric>
## 1         1         1

cbind(DataFrame(score=1), DataFrame(row.names=1)) # strange

## DataFrame with 1 row and 1 column
##         dfs
##   <numeric>
## 1         1

The first cbind() call works as expected, but the named "score" field 
disappears in the output object of the second cbind() call, which is rather 
surprising. This is the source of at least a few failed tests in the 
InteractionSet package.

Cheers,

Aaron
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] transitioning scater/scran to SingleCellExperiment

2017-08-08 Thread Aaron Lun
> I guess this would be a question for the
> SummarizedExperiment developers, though personally, I never liked
> ExpressionSet's inclination to slap names on everything.
> 
> Too bad we’re bound to SummarizedExperiment’s “rows” and “cols”. Since 
> they always refer to features and samples, respectively: Why not name 
> them that?
> 
> There’s already too many APIs in too many programming languages that 
> confusingly have one or the other convention – if whe know which is 
> which, why not name them after that knowledge?

*shrug* + *meh*. As I said, I'm the wrong person to complain to about 
this. Though I don't have particularly strong feelings either way.

> It probably wouldn't be a good idea to store distances as expression
> matrices. However, if there is a need for it, we can add a new slot
> for distance matrices. I think SC3 has a similar requirement, so
> perhaps this would be more generally useful than I first thought.
> You can post an issue on the github repository to remind Davide or
> me to do it.
> 
> Distance matrices (cell×cell) can’t only come from cell×gene matrices. 
> You can e.g. use dynamic time warping to create them from cell×gene×time 
> arrays.

I don't think there's direct support for >2-dimensional arrays in SE 
objects. You might be able to put them in, but I don't know how well it 
will interact with the subsetting machinery. One solution is to split it 
up by the third dimension and store each matrix as a separate assay.
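For example, a hypothetical genes-by-cells-by-time array A could be flattened 
into per-timepoint assays along these lines:

per_time <- lapply(seq_len(dim(A)[3]), function(t) A[, , t])  # one matrix per timepoint
names(per_time) <- paste0("time", seq_along(per_time))        # hypothetical assay names
sce <- SingleCellExperiment(assays = per_time)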

In any case, a distance matrix calculated from such an array would be 
fine, as long as the dimensions are equal to the number of cells. The 
question is whether it is needed by enough packages to warrant a slot in 
the base SCE class; I will discuss this with Davide and Vlad.
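In the meantime, a package that needs such a slot can extend the class itself; 
a minimal sketch, with a hypothetical class name:

setClass("DistanceSCE",
    contains = "SingleCellExperiment",
    slots = c(cellDist = "matrix"))

setValidity("DistanceSCE", function(object) {
    # the distance matrix should be cell-by-cell
    if (!all(dim(object@cellDist) == ncol(object)))
        return("'cellDist' should be a cell-by-cell matrix")
    TRUE
})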

> Finally, I'm not sure what advantages those ergonomics provide.
> Indeed, if every package defines its own plot() S4 method for
> SingleCellExperiment, they will clobber each other in the dispatch
> table, resulting in some interesting results dependent on package
> loading order. If you have destiny-specific data and methods, best
> to keep them separate rather than stuffing them into the SCE object.
> 
> I wrote that I could e.g. create a plot_dm method, which plots a 
> diffusion map stored in a SCE.
> 
> Also, I didn't mean the plot method by "ergonomics". I meant `fortify`, 
> `names`, `$`, and `[[`. Those would be very useful, as you could just do 
> things like the following, and have autocompletion:
> 
> sce$Predicate1 <- sce$SampleMeta1 > 40  # `$` accesses counts (by gene) and rowData; `$<-` sets rowData
> qplot(Gene1, Gene2, colour = Predicate1, data = sce)  # fortify creates a data.frame containing cbind(t(counts), rowData)

The SingleCellExperiment package makes no statement on whether downstream 
users/packages should use the tidyverse or ggplot2 (or not). It simply 
provides the minimal class and methods; convenience wrappers are left to 
the discretion of each package developer. scater, for example, implements 
a few dplyr verbs for SCE objects.

Cheers,

Aaron
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] transitioning scater/scran to SingleCellExperiment

2017-08-07 Thread Aaron Lun
perspective, if the DiffusionMap() function vomits out a 
lot of metadata fields, that might not be desirable if only the final 
diffusion coordinates are of interest. In such cases, I would find it 
easier to just extract the coordinates and store it in reducedDim<- 
manually. Whether this is done from a DiffusionMap or 
SingleCellExperiment output makes little difference to me.

Finally, I'm not sure what advantages those ergonomics provide. Indeed, 
if every package defines its own plot() S4 method for 
SingleCellExperiment, they will clobber each other in the dispatch 
table, resulting in some interesting results dependent on package 
loading order. If you have destiny-specific data and methods, best to 
keep them separate rather than stuffing them into the SCE object.

Our vision for the SCE class is to coordinate inputs into many packages 
across a long, long workflow. A little detour into destiny's classes for 
a small portion of the workflow doesn't pose much trouble, as long as 
any relevant statistics can be extracted and stored in the SCE object 
when it moves to the next stage of the workflow.

-Aaron

> 
> *From: *"Aaron Lun" <a...@wehi.edu.au>
> *To: *"bioc-devel" <bioc-devel@r-project.org>
> *Sent: *Monday, 31 July 2017 10:38:03
> *Subject: *Re: [Bioc-devel] transitioning scater/scran to 
> SingleCellExperiment
> 
> Dear developers,
> 
> Both scater and scran will be migrating to the SingleCellExperiment
> class (https://bioconductor.org/packages/SingleCellExperiment) in the
> next BioC release. This is based on a SummarizedExperiment and provides
> a more modern user interface, as well as supporting different matrix
> representations (e.g., dgCMatrix, HDF5Matrix).
> 
> We note that there are a number of Bioconductor packages that depend
> on/import/suggest scater or scran, which we have listed below:
> 
> scDD
> scone
> SIMLR
> splatter
> Glimma
> SC3
> phenopath
> switchde
> 
> To the maintainers of these packages, we advise switching from SCESet to
> SingleCellExperiment as soon as possible; the former will be deprecated
> in the next release cycle. There are several things to note here:
> 
> - The SCESet previously contained a number of slots relating to
> distances and clustering results. These are no longer present in the
> SingleCellExperiment, in line with the minimalist design philosophy of
> that package. If these are necessary, we suggest extending the
> SingleCellExperiment class in your own packages(*).
> 
> - For packages that depend directly on methods in scater or scran, a
> number of methods have been removed. This aims to simplify the analysis
> workflow and code maintenance by reducing redundancy. Please ensure that
> your package does not need those missing methods by CHECKing it against
> the experimental versions(**) of these two packages:
> 
> https://github.com/LTLA/scran
> https://github.com/davismcc/scater/tree/future
> 
> If there are any issues with the switch, please let us know and we will
> do our best to figure out the most appropriate fix.
> 
> Regards,
> 
> Aaron, Davis and Davide
> 
> (*): If there is popular demand for some slots, we may consider
> including it in the base SingleCellExperiment object.
> 
> (**): These versions are highly experimental and fluid, and results are
> likely to be unstable over the coming month. Nonetheless, if something
> is breaking, it is best that we know sooner rather than later. Or in
> other words, don't start complaining when it's close to release time.
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> 
> 
> Helmholtz Zentrum München
> Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
> Ingolstädter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Chair of the Supervisory Board: MinDir'in Bärbel Brumme-Bothe
> Managing Directors: Prof. Dr. Günther Wess, Heinrich Baßler, Dr. Alfons Enhsen
> Court of registration: Amtsgericht München HRB 6466
> VAT ID: DE 129521671
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] improving navigation within workflows

2017-08-04 Thread Aaron Lun
Regarding the metadata: I think that the installation instructions
should always be visible. However, the rest of the details could be
stored in a hidden section that can be clicked to expose the full
details (e.g., expands upon clicking a link saying "Click here for more
details").

Andrzej Oleś wrote:
> Hi Vince,
>
> thanks for your feedback. The floating TOC would be technically
> possible; the question is more about the aesthetics of such a solution.
> As we thought of using the right column for the supplementary stuff
> such as figure/table captions or footnotes, this space cannot harbor a
> floating TOC as it would obscure the view of these items.
>
> Regarding the document metadata section at the top: this is a fair
> point and I'm interested to hear the opinion of others. Personally I
> would argue that some of this information is quite important and
> should be emphasized at the beginning, such as installation
> instructions or recent modification and compilation dates. I agree
> that links to package tarballs are more obscure and could maybe be
> moved somewhere downstream. As workflow pages tend to be relatively
> long documents, I'm a bit worried though whether burying it at the
> very bottom after the output of `sessionInfo()` won't virtually
> prevent people from reaching it at all. I'm looking forward to the
> thoughts of other developers on this.
>
> Cheers,
> Andrzej
>
> On Fri, Aug 4, 2017 at 5:01 PM, Vincent Carey
> <st...@channing.harvard.edu <mailto:st...@channing.harvard.edu>> wrote:
>
> interesting.  is the floating toc not an option for this?  also,
> IMHO the "about this document" should go at the end.  one wants
>     the content to hit the reader right away, not the provenance, i think.
>
> On Fri, Aug 4, 2017 at 9:33 AM, Aaron Lun <a...@wehi.edu.au
> <mailto:a...@wehi.edu.au>> wrote:
>
> Mind -> blown. Yes, that's exactly what I wanted, thanks.
>
> -Aaron
>
> On 04/08/17 14:19, Andrzej Oleś wrote:
> > Hi Aaron,
> >
> > I'm happy to inform that I've implemented your suggestion to add
> > navigation links at workflow pages. Feel free to have a look
> at the
> > working prototype under
> >
> https://www.bioconductor.org/help/workflows/testproj/testfile/
> <https://www.bioconductor.org/help/workflows/testproj/testfile/>
> >
> > The new workflow-rendering engine which includes the
> navigation links
> > will be enabled soon after some additional testing is
>     finished. The
> > transition will be announced in a separate email to the
> devel mailing list.
> >
> > Cheers,
> > Andrzej
> >
> >
> > On Mon, Jul 17, 2017 at 10:15 AM, Aaron Lun
> <a...@wehi.edu.au <mailto:a...@wehi.edu.au>
> > <mailto:a...@wehi.edu.au <mailto:a...@wehi.edu.au>>> wrote:
> >
> > Hi Andrzej,
> >
> > Interesting. I also didn't realize that we could get
> figure referencing
> > via bookdown, that's nice to know. I was thinking that
> the floating TOC
> > could go onto the right margin, but I guess that doesn't
> fit anymore now
> > that you have the footnotes and figure captions.
> >
> > CRAN-style links might be the best compromise, e.g., if
> they can be
> > placed in the right margin. A simple set of three links
> > () beside each
> section/subsection heading
> > would probably be sufficient for effective navigation,
> without using up too
> > much space.
> >
> > Cheers,
> >
> > Aaron
> >
> > On 17/07/17 00:35, Andrzej Oleś wrote:
> > > Hi Aaron,
> > >
> > > thanks for your feedback. We are currently looking
> into ways of
> > > improving the building of workflows for the website in
> order to enable
> > > cross references, html widgets, and similar. So far
> these were not
> > > supported because of some technical constraints of the
> current
> > > implementation which involves rendering the workflows
> into an
> > > intermediate .md file. The new approach will ov

Re: [Bioc-devel] improving navigation within workflows

2017-08-04 Thread Aaron Lun
Mind -> blown. Yes, that's exactly what I wanted, thanks.

-Aaron

On 04/08/17 14:19, Andrzej Oleś wrote:
> Hi Aaron,
> 
> I'm happy to inform that I've implemented your suggestion to add 
> navigation links at workflow pages. Feel free to have a look at the 
> working prototype under 
> https://www.bioconductor.org/help/workflows/testproj/testfile/
> 
> The new workflow-rendering engine which includes the navigation links 
> will be enabled soon after some additional testing is finished. The 
> transition will be announced in a separate email to the devel mailing list.
> 
> Cheers,
> Andrzej
> 
> 
> On Mon, Jul 17, 2017 at 10:15 AM, Aaron Lun <a...@wehi.edu.au 
> <mailto:a...@wehi.edu.au>> wrote:
> 
> Hi Andrzej,
> 
> Interesting. I also didn't realize that we could get figure referencing
> via bookdown, that's nice to know. I was thinking that the floating TOC
> could go onto the right margin, but I guess that doesn't fit anymore now
> that you have the footnotes and figure captions.
> 
> CRAN-style links might be the best compromise, e.g., if they can be
> placed in the right margin. A simple set of three links
> () beside each section/subsection heading
> would probably be sufficient for effective navigation, without using up too
> much space.
> 
> Cheers,
> 
> Aaron
> 
> On 17/07/17 00:35, Andrzej Oleś wrote:
> > Hi Aaron,
> >
> > thanks for your feedback. We are currently looking into ways of
> > improving the building of workflows for the website in order to enable
> > cross references, html widgets, and similar. So far these were not
> > supported because of some technical constraints of the current
> > implementation which involves rendering the workflows into an
> > intermediate .md file. The new approach will overcome these limitations
> > by skipping this intermediate step and rendering directly to html. For a
> > preview see:
> > https://www.bioconductor.org/help/workflows/testproj/testfile/
> <https://www.bioconductor.org/help/workflows/testproj/testfile/> (it
> also
> > uses a slightly modified layout with some of the supporting information
> > such as figure and table captions, or footnotes moved to the right 
> column)
> >
> > Regarding the navigation, a floating TOC would be in principle possible
> > in the new implementation. It is not entirely clear to me, however,
> > whether this will visually work with the bioconductor.org 
> <http://bioconductor.org>
>  > <http://bioconductor.org> website template. Links to the TOC, such as
> > ones in e.g.
> > https://www.bioconductor.org/help/workflows/highthroughputassays
> <https://www.bioconductor.org/help/workflows/highthroughputassays> could
> > be relatively easily inserted automatically after each section. Static
> > CRAN-like solution as you mentioned it would probably require some more
> > work.
> >
> > Cheers,
> > Andrzej
> >
> > On Sun, Jul 16, 2017 at 9:36 PM, Aaron Lun <a...@wehi.edu.au 
> <mailto:a...@wehi.edu.au>
> > <mailto:a...@wehi.edu.au <mailto:a...@wehi.edu.au>>> wrote:
> >
> > Indeed, that's exactly what I was thinking of. I have a floating 
> TOC in
> > my own Rmarkdown files, but I'm not sure if it's supported by the
> > workflow builder, given that it adds a separate hyperlinked TOC to 
> the
> > start of the workflow page.
> >
> > On 16/07/17 19:29, Vincent Carey wrote:
> > > like this?
> > >
> > > 
> http://bioconductor.org/packages/release/bioc/vignettes/BiocStyle/inst/doc/AuthoringRmdVignettes.html
> 
> <http://bioconductor.org/packages/release/bioc/vignettes/BiocStyle/inst/doc/AuthoringRmdVignettes.html>
> > 
> <http://bioconductor.org/packages/release/bioc/vignettes/BiocStyle/inst/doc/AuthoringRmdVignettes.html
> 
> <http://bioconductor.org/packages/release/bioc/vignettes/BiocStyle/inst/doc/AuthoringRmdVignettes.html>>
> > >
> > > On Sun, Jul 16, 2017 at 11:53 AM, Aaron Lun <a...@wehi.edu.au 
> <mailto:a...@wehi.edu.au> <mailto:a...@wehi.edu.au
> <mailto:a...@wehi.edu.au>>
>  > > <mailto:a...@wehi.edu.au <mailto:a...@wehi.edu.au>
> <mailto:a...@wehi.edu.au <mailto:a...@wehi.edu.au>>>> wrote:
>  > >
>  > > Hello all,
> 

Re: [Bioc-devel] transitioning scater/scran to SingleCellExperiment

2017-07-31 Thread Aaron Lun
Dear developers,

Both scater and scran will be migrating to the SingleCellExperiment 
class (https://bioconductor.org/packages/SingleCellExperiment) in the
next BioC release. This is based on a SummarizedExperiment and provides 
a more modern user interface, as well as supporting different matrix
representations (e.g., dgCMatrix, HDF5Matrix).

We note that there are a number of Bioconductor packages that depend 
on/import/suggest scater or scran, which we have listed below:

scDD
scone
SIMLR
splatter
Glimma
SC3
phenopath
switchde

To the maintainers of these packages, we advise switching from SCESet to
SingleCellExperiment as soon as possible; the former will be deprecated
in the next release cycle. There are several things to note here:

- The SCESet previously contained a number of slots relating to
distances and clustering results. These are no longer present in the
SingleCellExperiment, in line with the minimalist design philosophy of
that package. If these are necessary, we suggest extending the
SingleCellExperiment class in your own packages(*).

- For packages that depend directly on methods in scater or scran, a
number of methods have been removed. This aims to simplify the analysis
workflow and code maintenance by reducing redundancy. Please ensure that
your package does not need those missing methods by CHECKing it against 
the experimental versions(**) of these two packages:

https://github.com/LTLA/scran
https://github.com/davismcc/scater/tree/future

If there are any issues with the switch, please let us know and we will
do our best to figure out the most appropriate fix.

Regards,

Aaron, Davis and Davide

(*): If there is popular demand for some slots, we may consider 
including it in the base SingleCellExperiment object.

(**): These versions are highly experimental and fluid, and results are
likely to be unstable over the coming month. Nonetheless, if something
is breaking, it is best that we know sooner rather than later. Or in
other words, don't start complaining when it's close to release time.
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] improving navigation within workflows

2017-07-17 Thread Aaron Lun
Hi Andrzej,

Interesting. I also didn't realize that we could get figure referencing 
via bookdown, that's nice to know. I was thinking that the floating TOC 
could go onto the right margin, but I guess that doesn't fit anymore now 
that you have the footnotes and figure captions.

CRAN-style links might be the best compromise, e.g., if they can be 
placed in the right margin. A simple set of three links 
() beside each section/subsection heading 
would probably be sufficient for effective navigation, without using up too 
much space.

Cheers,

Aaron

On 17/07/17 00:35, Andrzej Oleś wrote:
> Hi Aaron,
> 
> thanks for your feedback. We are currently looking into ways of 
> improving the building of workflows for the website in order to enable 
> cross references, html widgets, and similar. So far these were not 
> supported because of some technical constraints of the current 
> implementation which involves rendering the workflows into an 
> intermediate .md file. The new approach will overcome these limitations 
> by skipping this intermediate step and rendering directly to html. For a 
> preview see: 
> https://www.bioconductor.org/help/workflows/testproj/testfile/ (it also 
> uses a slightly modified layout with some of the supporting information 
> such as figure and table captions, or footnotes moved to the right column)
> 
> Regarding the navigation, a floating TOC would be in principle possible 
> in the new implementation. It is not entirely clear to me, however, 
> whether this will visually work with the bioconductor.org 
> <http://bioconductor.org> website template. Links to the TOC, such as 
> ones in e.g. 
> https://www.bioconductor.org/help/workflows/highthroughputassays could 
> be relatively easily inserted automatically after each section. Static 
> CRAN-like solution as you mentioned it would probably require some more 
> work.
> 
> Cheers,
> Andrzej
> 
> On Sun, Jul 16, 2017 at 9:36 PM, Aaron Lun <a...@wehi.edu.au 
> <mailto:a...@wehi.edu.au>> wrote:
> 
> Indeed, that's exactly what I was thinking of. I have a floating TOC in
> my own Rmarkdown files, but I'm not sure if it's supported by the
> workflow builder, given that it adds a separate hyperlinked TOC to the
> start of the workflow page.
> 
> On 16/07/17 19:29, Vincent Carey wrote:
> > like this?
> >
> > 
> http://bioconductor.org/packages/release/bioc/vignettes/BiocStyle/inst/doc/AuthoringRmdVignettes.html
> 
> <http://bioconductor.org/packages/release/bioc/vignettes/BiocStyle/inst/doc/AuthoringRmdVignettes.html>
> >
> > On Sun, Jul 16, 2017 at 11:53 AM, Aaron Lun <a...@wehi.edu.au 
> <mailto:a...@wehi.edu.au>
> > <mailto:a...@wehi.edu.au <mailto:a...@wehi.edu.au>>> wrote:
> >
> > Hello all,
> >
> > I was wondering if there's any plans to improve the navigation for 
> the
> > BioC workflows. I was looking at my simpleSingleCell workflow:
> >
> > https://www.bioconductor.org/help/workflows/simpleSingleCell/
> <https://www.bioconductor.org/help/workflows/simpleSingleCell/>
> > <https://www.bioconductor.org/help/workflows/simpleSingleCell/
> <https://www.bioconductor.org/help/workflows/simpleSingleCell/>>
> >
> > ... and I've realized that it's gotten pretty long. It's a pain to 
> keep
> > on scrolling up and down when I'm stuck in the middle of the 
> document
> > and I want to jump somewhere else quickly (or even just go to the 
> TOC).
> >
> > It would be nice to have some sort of floating navigation bar 
> containing
> > links to every section, rather than a TOC at the top. Failing that,
> > section numbers and hyperlinks to the top or end (a la
> > https://cran.r-project.org/doc/manuals/r-release/R-exts.html
> <https://cran.r-project.org/doc/manuals/r-release/R-exts.html>
>  > <https://cran.r-project.org/doc/manuals/r-release/R-exts.html
> <https://cran.r-project.org/doc/manuals/r-release/R-exts.html>>)
> would be
>  > useful.
>  >
>  > Cheers,
>  >
>  > Aaron
>  > ___
>  > Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
> <mailto:Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>>
> mailing list
>  > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>  > <https://stat.ethz.ch/mailman/listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
>  >
>  >
> ___
> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
> 
> 
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] improving navigation within workflows

2017-07-16 Thread Aaron Lun
Indeed, that's exactly what I was thinking of. I have a floating TOC in 
my own Rmarkdown files, but I'm not sure if it's supported by the 
workflow builder, given that it adds a separate hyperlinked TOC to the 
start of the workflow page.
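(The floating TOC I mean is the stock rmarkdown one, which can be requested at 
render time, e.g.:

rmarkdown::render("my_workflow.Rmd",  # hypothetical file name
    output_format = rmarkdown::html_document(toc = TRUE, toc_float = TRUE))

or equivalently via toc_float: true in the Rmd's YAML header.)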

On 16/07/17 19:29, Vincent Carey wrote:
> like this?
> 
> http://bioconductor.org/packages/release/bioc/vignettes/BiocStyle/inst/doc/AuthoringRmdVignettes.html
> 
> On Sun, Jul 16, 2017 at 11:53 AM, Aaron Lun <a...@wehi.edu.au 
> <mailto:a...@wehi.edu.au>> wrote:
> 
> Hello all,
> 
> I was wondering if there's any plans to improve the navigation for the
> BioC workflows. I was looking at my simpleSingleCell workflow:
> 
> https://www.bioconductor.org/help/workflows/simpleSingleCell/
> <https://www.bioconductor.org/help/workflows/simpleSingleCell/>
> 
> ... and I've realized that it's gotten pretty long. It's a pain to keep
> on scrolling up and down when I'm stuck in the middle of the document
> and I want to jump somewhere else quickly (or even just go to the TOC).
> 
> It would be nice to have some sort of floating navigation bar containing
> links to every section, rather than a TOC at the top. Failing that,
> section numbers and hyperlinks to the top or end (a la
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html
> <https://cran.r-project.org/doc/manuals/r-release/R-exts.html>) would be
> useful.
> 
> Cheers,
> 
> Aaron
> ___
> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
> 
> 
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] improving navigation within workflows

2017-07-16 Thread Aaron Lun
Hello all,

I was wondering if there's any plans to improve the navigation for the 
BioC workflows. I was looking at my simpleSingleCell workflow:

https://www.bioconductor.org/help/workflows/simpleSingleCell/

... and I've realized that it's gotten pretty long. It's a pain to keep 
on scrolling up and down when I'm stuck in the middle of the document 
and I want to jump somewhere else quickly (or even just go to the TOC).

It would be nice to have some sort of floating navigation bar containing 
links to every section, rather than a TOC at the top. Failing that, 
section numbers and hyperlinks to the top or end (a la 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html) would be 
useful.

Cheers,

Aaron
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] strange error in Jenkins build for singleCellWorkflow

2017-06-20 Thread Aaron Lun
Hi all,


I'm getting a curious error in the Jenkins log when I try to build the 
singleCellWorkflow:


http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/48/label=master/console


The key part is at the bottom:


Error: package or namespace load failed for 'GenomicFeatures' in dyn.load(file, 
DLLpath = DLLpath, ...):
 unable to load shared object 
'/var/lib/jenkins/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/libs/Rsamtools.so':
  `maximal number of DLLs reached...


The workflow had previously been running fine on the build system; I'm not 
quite sure what's going on here, given that it's not even failing at the point 
where I made the latest changes.
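For anyone hitting the same error, a quick way to see how close a session is 
to this limit:

length(getLoadedDLLs())                      # DLLs currently loaded
Sys.getenv("R_MAX_NUM_DLLS", unset = "100")  # the ceiling; 100 by default in R 3.4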

Cheers,

Aaron

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

