from:"Wolfgang Huber"

[Bioc-devel] Ascona Workshop: Spatial and temporal statistical modeling in molecular biology. 8-13 Sep 2024 in Ascona, Switzerland

2023-11-23 Thread Wolfgang Huber

We are excited to announce the Ascona Workshop 

“Spatial and Temporal Statistical Modeling in Molecular Biology” 

in Ascona, Switzerland, 8-13 September 2024, looking at statistical, 
computational and machine learning methods and applications to spatial 
biological data at different length scales, from the nucleus and spatial omics 
of tissues to ecosystems and planetary-scale biology.  More info at 
https://spatialbio.net

Confirmed speakers:
- Peer Bork, EMBL
- Maria Cristina Gambetta, Univ. of Lausanne
- Shila Ghazanfar, Univ. of Sydney
- Stefanie Hicks, Johns Hopkins University
- Dmitry Kobak, Univ. of Tübingen
- Anna Kreshuk, EMBL
- Andreas Moor, ETH
- Emma Schymanski, Univ. of Luxembourg
- Ewa Szczurek, Univ. of Warsaw
- Virginie Uhlmann, Univ. of Zürich
- Lara Urban, Technical University Munich / Helmholtz Center

We invite you to submit proposals for contributed presentations and posters!

Pre-registration is open now until 22 April 2024. Instructions at 
https://spatialbio.net

With kind regards from the organizers
Niko Beerenwinkel, ETH
Valentina Boeva, ETH 
Peter Bühlmann, ETH
Wolfgang Huber, EMBL

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Package problems due to results() function from other package?

2023-10-31 Thread Wolfgang Huber

Dear Christian

If your vignette attaches another package that exports a “results” function 
after it has attached SNPhood, which defines its own results function, then the 
later package masks SNPhood's version on the search path, and the R interpreter 
has no choice but to call that one.

Other people adding additional functionality to their packages is probably not 
something one can really complain about, so I see three options
- you use SNPhood::results in your vignette
- you don’t attach the other package, and rather just use what you need from it 
using “::”
- you convince Hervé to add ‘results’ to BiocGenerics, and everyone who exports 
such a function converts it to a method for that generic.
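
The masking described above can be reproduced without either package. The following is a minimal sketch, assuming two stand-in environments (`pkgA` playing the role of SNPhood, `pkgB` the later-attached package; both names and both function bodies are made up for illustration) that are attached to the search path in that order:

```r
# Stand-in for SNPhood: results() accepts a 'type' argument (hypothetical body).
pkgA <- new.env()
pkgA$results <- function(SNPhood.o, type, elements = NULL) paste("pkgA:", type)

# Stand-in for the other package: its results() has no 'type' argument.
pkgB <- new.env()
pkgB$results <- function(object) "pkgB"

# Attach in the same order as in the vignette; the later one masks the earlier.
attach(pkgA, name = "pkgA", warn.conflicts = FALSE)
attach(pkgB, name = "pkgB", warn.conflicts = FALSE)

# The unqualified call now resolves to pkgB's function and fails.
msg <- tryCatch(results(1, type = "allelicBias"),
                error = function(e) conditionMessage(e))
print(msg)

# Naming the source explicitly (the analogue of SNPhood::results) works.
explicit <- get("results", pos = "pkgA")(1, type = "allelicBias")
print(explicit)

# Clean up the search path.
detach("pkgB")
detach("pkgA")
```

With real packages the explicit form is simply `SNPhood::results(...)`, which does not depend on the order in which packages were attached.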

Thank you and kind regards
Wolfgang

--
Wolfgang Huber
EMBL
https://www.huber.embl.de/









> On 2023-10-28, at 16:15, Christian Arnold wrote:
> 
> For my package SNPhood that did not receive any code changes or updates
> in quite a while, I suddenly see errors with Bioc 3.18:
> https://master.bioconductor.org/checkResults/3.18/bioc-LATEST/SNPhood/nebbiolo2-buildsrc.html
> 
> Error: processing vignette 'workflow.Rmd' failed with diagnostics:
> unused argument (type = "allelicBias")
> 
> This comes from this line I think:
> 
> names(results(SNPhood.o, type = "allelicBias"))
> 
> For literally years, this didn't cause any problems, and the results
> function is actually (re)defined in the SNPhood package:
> 
> results <- function(SNPhood.o, type, elements = NULL)
> 
> I am not sure now what causes this. Should I use the syntax
> SNPhood::results to make it clear, or am I wrongly assuming that the
> wrong results function is picked up and causes the error?
> 
> Any pointers?
> 
> 
> Best
> 
> Christian
> 
> 

Re: [Bioc-devel] BiocManager::install

2023-05-14 Thread Wolfgang Huber

Dear Klaus

Good points. I am not trying to “poach” anything here, and guess you have 
discussed this, but with this number of reverse dependencies, it sounds like 
the maintainers, contributors and users of ape and phangorn could benefit from 
putting the packages on Bioconductor?

From a quick look it seems that a good fraction of their reverse dependencies 
are on Bioconductor, and also the remaining ones might benefit from you being 
able to follow the established and transparent devel/release cycle?

Thanks and best wishes
Wolfgang







> On 12.05.2023, at 17:19, Klaus Schliep wrote:
> 
> Hi all,
> no real contribution to this discussion, just some appreciation to all your
> hard work.
> I just wish CRAN had something like a proper release cycle for packages. I
> contribute to two packages phangorn and ape, which have a decent number of
> reverse dependencies. So you can't introduce any breaking changes (as there
> is no devel branch) without contacting all maintainers and fixing some/most
> reverse dependencies yourself. Also whenever CRAN introduces a change to
> R-devel which causes an error in your package, one and sometimes all
> maintainers of dependent packages might get an email stating to fix this
> within 2 weeks. This however has nothing to do with their release cycle of
> the new R version itself. I wonder how many packages have been taken from
> CRAN and how much total unnecessary frustration this causes?
> Have a nice weekend.
> Kind regards,
> Klaus Schliep
> 
> On Sat, May 6, 2023 at 10:40 AM Wolfgang Huber 
> wrote:
> 
>> Hi,
>> 
>> I am wondering whether:
>> 1. it could be easier to install Bioconductor packages (devel or release)
>> on R-devel (or other non-standard R versions) using BiocManager::install (I
>> may be stirring a hornet’s nest with that:)
>> 2. whether its documentation needs to be updated and/or its implementation
>> could be deconvoluted (hopefully that’s uncontroversial).
>> 
>> Re the first point, I appreciate that we’re trying to help non-expert
>> users with simple use cases, and that we had/have a lot of trouble with
>> users working with out-of-sync versions. OTOH, the current solution (rigid,
>> confusing documentation, seemingly buggy implementation) seems to be
>> standing in the way for developers, a dichotomy that we do not really want.
>> 
>> Of course, a workaround is
>> ```{r}
>>> install.packages("ggtree", repos = c("@CRAN@",
>>>     "https://bioconductor.org/packages/3.18/bioc"))
>> ```
>> and maybe this is just the answer. So far, my workflows have been based on
>> BiocManager::install, but I get (and cannot seem to get rid of):
>> 
>> ```{r}
>>> options(BIOCONDUCTOR_ONLINE_VERSION_DIAGNOSIS = FALSE)
>>> BiocManager::install("ggtree", version = "devel")
>> Error: Bioconductor does not yet build and check packages for R version
>> 4.4; see
>>  https://bioconductor.org/install
>> 
>>> sessionInfo()
>> R Under development (unstable) (2023-05-05 r84398)
>> Platform: aarch64-apple-darwin20 (64-bit)
>> Running under: macOS Ventura 13.3.1
>> 
>> Matrix products: default
>> BLAS:
>> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
>> 
>> LAPACK:
>> /Users/whuber/R.framework/Versions/4.4/Resources/lib/libRlapack.dylib;
>> LAPACK version 3.11.0
>> 
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> 
>> time zone: Europe/Berlin
>> tzcode source: internal
>> 
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>> 
>> other attached packages:
>> [1] BiocManager_1.30.20 fortunes_1.5-4
>> 
>> loaded via a namespace (and not attached):
>> [1] compiler_4.4.0  tools_4.4.0 rstudioapi_0.14
>> ```
>> 
>> I noted some discussion on this here:
>> https://github.com/Bioconductor/BiocManager/issues/13 but this was 5
>> years ago.
>> It appears that the documentation of BiocManager::install mismatches its
>> implementation, and overall the process for something that's conceptually
>> quite simple seems to have become convoluted.
>> 
>> One of the most helpful documentation resources on this topic btw is
>> https://solutions.posit.co/envs-pkgs/bioconductor/ which cheerfully
>> concludes "Working with BioConductor packages for code development is
>> possible."
>> 
>> Thanks and best wishes
>> Wolfgang
>> 
>> --
>> Wolfgang Hu

Re: [Bioc-devel] BiocManager::install

2023-05-12 Thread Wolfgang Huber



> On 12.05.2023, at 04:43, Kasper Daniel Hansen wrote:
> 
> It seems totally sensible to be able to use BiocManager to install either 
> bioc-release or bioc-devel at any time, provided you're running R-devel. 
> First, by definition, R-devel is always >= the R used for release / devel and 
> Second, it is reasonable to assume users of R-devel to know what they are 
> doing.
> 
> I am unsure if you're arguing for anything else.

Hi Kasper,

Nope, that’s it. Was I so unclear?

Wolfgang.

Re: [Bioc-devel] BiocManager::install

2023-05-11 Thread Wolfgang Huber

Hi Kasper

My use case is simple: anyone who works with R-devel and wants to use a package 
on Bioconductor from April to October.
Many of the 2230 packages in our repository are useful outside of the 
BiocGenerics, IRanges, SummarizedExperiment core world. 
E.g., to name a few, BiocParallel, illuminaio, rhdf5, EBImage, ggtree, edgeR, 
limma, qvalue, sparseMatrixStats, … and I do not think “we” should dictate to 
people who want to use these which version of R they must use. Btw these 
examples are all “highly downloaded”.

I fully understand the wish to make people use coherent versions of packages 
and R for situations where lots of interdependent packages, classes, methods 
etc. are imported. 
But sometimes, people just need one or two packages, and then R’s built-in 
dependency management works just fine and the current BiocManager approach is 
needlessly intrusive.

It has become bad enough to make me wonder whether to recommend that authors of 
packages that do not directly build upon BiocGenerics, IRanges etc. submit them 
to CRAN, to increase the potential user base (b/c installation from Bioconductor 
can be such a pain). And that’s really not the place I want to be.

Thanks and best wishes
Wolfgang





> On 10.05.2023, at 17:12, Kasper Daniel Hansen wrote:
> 
> Could we get a list of use cases from Wolfgang? I am confused about what
> the issue is. Is the issue that it is painful to work with R-devel in the
> "off" 6-months? If so, I agree that it should be easier (even if we don't
> recommend it). But I am having a hard time parsing the email.
> 
> I can recognize Martin M's wish: a way to run Bioc-release on R-devel; that
> seems sensible to me.
> 
> Best,
> Kasper
> 
> On Tue, May 9, 2023 at 3:46 AM Martin Maechler 
> wrote:
> 
>>>>>>> Wolfgang Huber
>>>>>>>on Sun, 7 May 2023 14:29:37 +0200 writes:
>> 
>>> Hi Martin As you correctly point out, Bioconductor package
>>> developers are probably not those with the most relevant
>>> use cases. I think there are use cases for everyone
>>> else—anyone who decides to write code on R-devel, for
>>> whatever reason, and just wants to use a Bioconductor
>>> package between mid-April to mid-October (they could
>>> develop for CRAN, or just be a user and write scripts and
>>> packages for a private project). There are many useful
>>> packages on Bioconductor that are of general interest,
>>> even for people whose work does not center around
>>> Bioconductor or biology (say, ggtree, rhdf5,
>>> sparseMatrixStats, EBImage, …)
>> 
>>> I added these ponderings also to
>>> https://github.com/Bioconductor/pkgrevdocs/issues/108
>> 
>>> Thanks and best wishes Wolfgang
>> 
>> As the older ones among you know, I've been a BioC developer
>> only many years ago ('hexbin' e.g.), but as an R package
>> maintainer and co-maintainer and R Core team member,
>> I really like to chime in here, declaring that it *has* been
>> quite painful for me over the years to test CRAN packages which
>> depend on BioC packages - with R-devel -- which is my primary R
>> version for testing, notably also for testing potential changes in R
>> across many packages, etc.
>> Notably during this half of the year where there is no
>> "official" way how to correctly install current Bioconductor packages
>> (in their own package library, as I always do) under R-devel.
>> 
>> If I'd be able to sum up the time lost over this issue for the last say 10
>> years, it would add to a full working day at least. ...
>> 
>> (and I have added a comment also in the above issue #108)
>> 
>> 
>>> (PS in my particular case yesterday, it was just that my
>>> R-devel is better maintained (built from source etc) and
>>> has in its library some (non-BioC) packages with complex
>>> systems dependencies that I need for a workflow I am
>>> working on, packages that currently elude me on my binary
>>> installation of R4.3. And then in addition I just wanted
>>> to *use* a package from Bioconductor and didn’t like how
>>> clumsy that experience was.)
>> 
>> My other experience is that I always have to help people in my
>> group to install our pcalg CRAN package because it depends
>> e.g. on Bioc packages 'graph' and 'Rgraphviz' .. and on their
>> laptops they somehow don't have the correct  getOption("repos")
>> or there are other reasons why install.packages('pcalg')
>> does not find its Bioc dependencies.
>> On our Linux desktop+server environment, I do setup
>>options(repos = )
>>

Re: [Bioc-devel] BiocManager::install

2023-05-07 Thread Wolfgang Huber

Hi Martin

As you correctly point out, Bioconductor package developers are probably not 
those with the most relevant use cases. I think there are use cases for 
everyone else—anyone who decides to write code on R-devel, for whatever reason, 
and just wants to use a Bioconductor package between mid-April to mid-October 
(they could develop for CRAN, or just be a user and write scripts and packages 
for a private project). There are many useful packages on Bioconductor that are 
of general interest, even for people whose work does not center around 
Bioconductor or biology (say, ggtree, rhdf5, sparseMatrixStats, EBImage, …)

I added these ponderings also to 
https://github.com/Bioconductor/pkgrevdocs/issues/108 

Thanks and best wishes
Wolfgang


(PS in my particular case yesterday, it was just that my R-devel is better 
maintained (built from source etc) and has in its library some (non-BioC) 
packages with complex systems dependencies that I need for a workflow I am 
working on, packages that currently elude me on my binary installation of R4.3. 
And then in addition I just wanted to *use* a package from Bioconductor and 
didn’t like how clumsy that experience was.)



> On 06.05.2023, at 16:45, Martin Morgan wrote:
> 
> I opened two issues for further discussion of the technical aspects.
>  https://github.com/Bioconductor/BiocManager/issues/165
> https://github.com/Bioconductor/pkgrevdocs/issues/108
>  Just to be clear, as noted at the end of the second issue and on the web 
> page you mention, a Bioconductor package developer wishing to use 
> 'Bioc-devel' should, during the mid-April to mid-October release cycle, be 
> using the **release** version of R. This combination of R and Bioconductor is 
> supported by BiocManager. Similarly, in the mid-October to mid-April release 
> cycle, the Bioconductor developer should be using R-devel, and BiocManager supports 
> this, too.
>  There are scenarios where a developer might wish to combine R-devel and 
> Bioc-devel in the mid-May, to mid-October time frame, e.g., when developing a 
> CRAN package with Bioconductor dependencies, or when conscientiously testing 
> CRAN packages that depend on Bioconductor packages. One may also just want to 
> be on the bleeding edge, so using R-devel and living with any consequences 
> that arise from R / Bioconductor version mismatches. Are these less-common 
> scenarios the ones that you are engaged in?
>  Martin
>  From: Bioc-devel  on behalf of Wolfgang 
> Huber 
> Date: Saturday, May 6, 2023 at 9:43 AM
> To: Vincent Carey 
> Cc: bioc-devel@r-project.org 
> Subject: Re: [Bioc-devel] BiocManager::install
> Dear Martin and Vince
> 
> thank you, very insightful points. Indeed I think it’s primarily a matter of 
> documentation and priming, and, e.g., adding Martin's lines prominently 
> enough e.g. to https://contributions.bioconductor.org/use-devel.html and a 
> reference to it into the manpage of BiocManager::install.  
> 
> I acknowledge that installation and dealing with dependencies is *hard*. The 
> relatively smooth user experience of Bioconductor, compared to other 
> projects, is one of our greatest assets. I guess it needs constant attention 
> on our side. One of the slogans of R/Bioconductor is “turning users into 
> developers” and therefore something that has useful defaults but is easy 
> enough to customize seems desirable. In that sense, it’d be great to be able 
> to stay with BiocManager::install and not have to abandon it in favour of 
> utils::install.packages.
> 
> The codebase behind BiocManager::install seems to have become a little…. 
> complicated.
> 
> The documentation clarification re BIOCONDUCTOR_ONLINE_VERSION_DIAGNOSIS that 
> Martin suggests would be welcome.
> 
> Kind regards
> Wolfgang
> 
> 
> 
> 
> 
> 
> > On 06.05.2023, at 13:05, Vincent Carey wrote:
> > 
> > Thanks for these observations Wolfgang, I am glad I read to the end,
> > because as you say,
> > 
> > https://solutions.posit.co/envs-pkgs/bioconductor/
> > 
> > has lots of interesting information.  As I personally have no
> > experience with renv or Connect
> > much of the motivating detail is opaque to me.
> > 
> > I would question the proposition
> > 
> > "Given the structural differences between BioConductor and CRAN
> > repositories, it is not straightforward to work with both types. "
> > 
> > with at least 10 years of history of effective usage of both together
> > by many hundreds of users.  "Straightforward" is
> > subjective.  The existence of some shortcomings, like the specific
> > ones you mention, is acknowledged, and setting
> > up priorities to amelio

Re: [Bioc-devel] BiocManager::install

2023-05-06 Thread Wolfgang Huber

Dear Martin and Vince

thank you, very insightful points. Indeed I think it’s primarily a matter of 
documentation and priming, e.g., adding Martin's lines prominently enough 
to https://contributions.bioconductor.org/use-devel.html and a reference 
to it in the manpage of BiocManager::install.

I acknowledge that installation and dealing with dependencies is *hard*. The 
relatively smooth user experience of Bioconductor, compared to other projects, 
is one of our greatest assets. I guess it needs constant attention on our side. 
One of the slogans of R/Bioconductor is “turning users into developers” and 
therefore something that has useful defaults but is easy enough to customize 
seems desirable. In that sense, it’d be great to be able to stay with 
BiocManager::install and not have to abandon it in favour of 
utils::install.packages.

The codebase behind BiocManager::install seems to have become a little…. 
complicated.

The documentation clarification re BIOCONDUCTOR_ONLINE_VERSION_DIAGNOSIS that 
Martin suggests would be welcome.

Kind regards
Wolfgang






> On 06.05.2023, at 13:05, Vincent Carey wrote:
> 
> Thanks for these observations Wolfgang, I am glad I read to the end,
> because as you say,
> 
> https://solutions.posit.co/envs-pkgs/bioconductor/
> 
> has lots of interesting information.  As I personally have no
> experience with renv or Connect
> much of the motivating detail is opaque to me.
> 
> I would question the proposition
> 
> "Given the structural differences between BioConductor and CRAN
> repositories, it is not straightforward to work with both types. "
> 
> with at least 10 years of history of effective usage of both together
> by many hundreds of users.  "Straightforward" is
> subjective.  The existence of some shortcomings, like the specific
> ones you mention, is acknowledged, and setting
> up priorities to ameliorate them would be worthwhile.  Part of the
> prioritization would need to be based on user
> data and user experiences.  In the case of this posit.co article, what
> is known about the significance of Connect
> for genomic data science?  I have not had great difficulty publishing
> apps to shinyapps.io that use Bioconductor
> and CRAN, but perhaps it can be made easier if that is a key concern.
> 
> The problem of smoothly supporting multiple versions of R/Bioc
> simultaneously is also acknowledged.  At this
> time we do not have sufficient resources to make a big charge in the
> direction of increasing support for this
> "use case".  Users and sysadmins with sufficient expertise can
> definitely accomplish much in this area, see
> https://bioconductor.org/about/release-announcements/ for the map of
> resources supporting this going back to
> 2005.  If there is a way to simplify this by using recently developed
> package management strategies it would
> be good to know and document.
> 
> This is a good place to continue the discussion from a developer's
> perspective, but how can we get more
> input from non-developer users?  And from posit.co?
> 
> "Publishing Shiny Apps that make use of BioConductor packages to
> Connect is not possible for this setup.
> BiocManager::install() temporarily adds the BioConductor repository
> for the duration of the install process.
> During the publishing process rsconnect no longer has any knowledge
> about BioConductor." -- is this something
> that can be remedied in BiocManager?  Are we able to test Connect for
> this use case?
> 
> 
> On Sat, May 6, 2023 at 4:40 AM Wolfgang Huber  wrote:
>> 
>> Hi,
>> 
>> I am wondering whether:
>> 1. it could be easier to install Bioconductor packages (devel or release) on 
>> R-devel (or other non-standard R versions) using BiocManager::install (I may 
>> be stirring a hornet’s nest with that:)
>> 2. whether its documentation needs to be updated and/or its implementation 
>> could be deconvoluted (hopefully that’s uncontroversial).
>> 
>> Re the first point, I appreciate that we’re trying to help non-expert users 
>> with simple use cases, and that we had/have a lot of trouble with users 
>> working with out-of-sync versions. OTOH, the current solution (rigid, 
>> confusing documentation, seemingly buggy implementation) seems to be 
>> standing in the way for developers, a dichotomy that we do not really want.
>> 
>> Of course, a workaround is
>> ```{r}
>>> install.packages("ggtree", repos = c("@CRAN@", 
>>>     "https://bioconductor.org/packages/3.18/bioc"))
>> ```
>> and maybe this is just the answer. So far, my workflows have been based on 
>> BiocManager::install, but I get (and cannot see

[Bioc-devel] BiocManager::install

2023-05-06 Thread Wolfgang Huber

Hi,

I am wondering whether:
1. it could be easier to install Bioconductor packages (devel or release) on 
R-devel (or other non-standard R versions) using BiocManager::install (I may be 
stirring a hornet’s nest with that:)
2. its documentation needs to be updated and/or its implementation 
could be deconvoluted (hopefully that’s uncontroversial). 

Re the first point, I appreciate that we’re trying to help non-expert users 
with simple use cases, and that we had/have a lot of trouble with users working 
with out-of-sync versions. OTOH, the current solution (rigid, confusing 
documentation, seemingly buggy implementation) seems to be standing in the way 
of developers, a dichotomy that we do not really want.

Of course, a workaround is
```{r}
> install.packages("ggtree", repos = c("@CRAN@", 
>     "https://bioconductor.org/packages/3.18/bioc"))
``` 
and maybe this is just the answer. So far, my workflows have been based on 
BiocManager::install, but I get (and cannot seem to get rid of):

```{r}
> options(BIOCONDUCTOR_ONLINE_VERSION_DIAGNOSIS = FALSE)
> BiocManager::install("ggtree", version = "devel")
Error: Bioconductor does not yet build and check packages for R version 4.4; see
  https://bioconductor.org/install

> sessionInfo()
R Under development (unstable) (2023-05-05 r84398)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.3.1

Matrix products: default
BLAS:   
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
 
LAPACK: /Users/whuber/R.framework/Versions/4.4/Resources/lib/libRlapack.dylib;  
LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] BiocManager_1.30.20 fortunes_1.5-4 

loaded via a namespace (and not attached):
[1] compiler_4.4.0  tools_4.4.0 rstudioapi_0.14
``` 
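
The install.packages workaround mentioned above can be written out as a small sketch. It assumes Bioconductor version "3.18" as in the message and the `https://bioconductor.org/packages/<version>/...` layout; the annotation and experiment sub-repositories, the CRAN mirror URL and the variable names are assumptions added here for illustration. The install call itself is commented out because it needs network access.

```r
# Bioconductor version to install from (assumption: matches the 3.18 URL above).
bioc_version <- "3.18"

# Named repository vector: Bioconductor software, annotation and experiment
# data repositories, plus one CRAN mirror for the CRAN dependencies.
bioc_base <- sprintf("https://bioconductor.org/packages/%s", bioc_version)
repos <- c(
  BioCsoft = paste0(bioc_base, "/bioc"),
  BioCann  = paste0(bioc_base, "/data/annotation"),
  BioCexp  = paste0(bioc_base, "/data/experiment"),
  CRAN     = "https://cloud.r-project.org"
)
print(repos)

# Not run here: installs ggtree and its dependencies from the listed
# repositories, without going through BiocManager's R version check.
# install.packages("ggtree", repos = repos)
```

Because this bypasses BiocManager, nothing checks that the chosen Bioconductor version matches the running R version; that responsibility moves to the user.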

I noted some discussion on this here: 
https://github.com/Bioconductor/BiocManager/issues/13 but this was 5 years ago.
It appears that the documentation of BiocManager::install mismatches its 
implementation, and overall the process for something that's conceptually quite 
simple seems to have become convoluted. 

One of the most helpful documentation resources on this topic btw is 
https://solutions.posit.co/envs-pkgs/bioconductor/ which cheerfully concludes 
"Working with BioConductor packages for code development is possible."

Thanks and best wishes
Wolfgang

--
Wolfgang Huber
EMBL
https://www.embl.org/groups/huber

Re: [Bioc-devel] Question on prospective Bioc package

2023-02-22 Thread Wolfgang Huber

Dear Greg

This sounds like a typical use case for a Bioconductor package.
Simplicity is not a bad thing: robust, well-engineered building blocks that do 
one thing really well are IMHO often more useful to many than big, integrated, 
complex jacks-of-all-trades.


Thanks and best wishes
Wolfgang





> On 17.02.2023, at 18:26, Flanigan, Greg (NIH/NCI) [C] via Bioc-devel wrote:
> 
> Dear Bioconductor Developer team,
> 
> I’m part of a team at the NCI that is looking at developing a Bioconductor 
> package, but I want to be sure it is a good fit with the goals of the 
> Bioconductor project.
> 
> The core of the package is a custom data object that allows users to query 
> predicted drug performance based on target gene, whether wild-type or mutant, 
> or to query predicted gene target fitness by drug. The data object is 
> precomputed based on a variety of sources, and we include methods to modify 
> the object based on custom data provided by the user.
> 
> If a more detailed dive would help, here is the preprint of the accompanying 
> paper: 
> Link
> 
> I think this tool will offer good impact and utility for researchers. Its 
> predictions are well-validated and it offers a novel method for examining 
> drug performance on various cancers. However, given the simplicity of this 
> tool, would it be a good fit for the Bioconductor project? I would appreciate 
> your opinion. We will be developing it into a package hosted on GitHub either 
> way. Thank you for your time!
> 
> Sincerely,
> Greg
> 
> 
> 

Re: [Bioc-devel] Package name

2021-11-08 Thread Wolfgang Huber

A more specific name seems appropriate here anyway: when I look at 
https://www.google.com/search?q=PSM or https://en.wikipedia.org/wiki/PSM, 
neither your nor Mortensen and Klim’s expansion of this abbreviation seems 
to be the most obvious one to most people.

Kind regards
Wolfgang




> On 24 Oct 2021, at 21:15, Laurent Gatto wrote:
> 
> My specific example falls in Henrik's category.
> 
> I am working on a package that handles peptide-spectrum matches, commonly 
> called PSMs in proteomics. I realised, to my great dismay, that there used to 
> be a PSM package on CRAN 
> (https://cran.r-project.org/web/packages/PSM/index.html) for non-linear 
> mixed-effects modelling using stochastic differential equations for 
> population stochastic modelling. As you might imagine, that name is very far 
> fetched in my view.
> 
> I renamed my package.
> 
> 
> From: Bioc-devel  on behalf of Henrik 
> Bengtsson 
> Sent: 22 October 2021 14:02
> To: Wolfgang Huber
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Package name
> 
> For CRAN packages it's easy. Packages on CRAN are eternal. They may be
> archived, but they are never removed, so in a sense they're always
> "currently on CRAN". Archived packages may still be installed, but only
> with some effort by the user. Some packages go in and out of "archived"
> status depending on how quickly the maintainer fixes issues. Because of this, I
> cannot really see how a CRAN package name can be "reused" by anyone else
> without a formal handover agreement between old and new maintainers. Even
> so, I think CRAN needs to approve the "update" in order to unarchive it.
> 
> Personally, I'd argue the same should apply to Bioconductor packages.
> Reusing package names for other purposes/different APIs is just asking for
> trouble, e.g. when it comes to future scientists trying to reproduce
> legacy results.
> 
> /Henrik
> 
> On Fri, Oct 22, 2021, 03:02 Wolfgang Huber  wrote:
> 
>> This is probably a niche concern, but  I’d find it a pity if a good
>> package name (*) became unavailable forever, esp. if it refers to a
>> real-world concept not owned by the authors of the original package.
>> Perhaps we could allow re-using a name after a grace period (say 1 or 2
>> years)?
>> To be extra safe, one could also require the first version number of the
>> new package be much higher than the last version of the old (dead) package.
>> 
>> (*) One example I have in mind where we re-used the name of an extinct
>> project is rhdf5.
>> 
>> Kind regards
>> Wolfgang
>> 
>>> On 21 Oct 2021, at 13:39, Kern, Lori wrote:
>>> 
>>> Good point.  I'll open an issue on the github to fix.
>>> 
>>> 
>>> Lori Shepherd
>>> 
>>> Bioconductor Core Team
>>> 
>>> Roswell Park Comprehensive Cancer Center
>>> 
>>> Department of Biostatistics & Bioinformatics
>>> 
>>> Elm & Carlton Streets
>>> 
>>> Buffalo, New York 14263
>>> 
>>> 
>>> From: Bioc-devel  on behalf of
>> Laurent Gatto 
>>> Sent: Thursday, October 21, 2021 12:53 AM
>>> To: bioc-devel@r-project.org 
>>> Subject: [Bioc-devel] Package name
>>> 
>>> The Package Guidelines for Developers and Reviewers say that:
>>> 
>>> A package name should be descriptive and should not already exist as a
>> current package (case-insensitive) in Bioconductor nor CRAN.
>>> 
>>> The sentence says current packages - does that imply that names of
>> packages that have been archived (on CRAN) or deprecated (on Bioconductor)
>> are available? This is likely to lead to serious confusion.
>>> 
>>> Laurent
>>> 

Re: [Bioc-devel] Package name

2021-10-22 Thread Wolfgang Huber

This is probably a niche concern, but  I’d find it a pity if a good package 
name (*) became unavailable forever, esp. if it refers to a real-world concept 
not owned by the authors of the original package.
Perhaps we could allow re-using a name after a grace period (say 1 or 2 years)?
To be extra safe, one could also require the first version number of the new 
package be much higher than the last version of the old (dead) package.

(*) One example I have in mind where we re-used the name of an extinct project 
is rhdf5.
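
The guideline quoted in this thread asks for a case-insensitive comparison against existing package names, and the suggestion above adds a version-number safeguard. A minimal sketch of both checks in base R, assuming a hypothetical helper `name_is_taken()` and a hand-written vector of names (a real check would presumably query the repositories instead):

```r
# Hypothetical helper: TRUE if `candidate` collides, ignoring case,
# with any name in `existing`.
name_is_taken <- function(candidate, existing) {
  tolower(candidate) %in% tolower(existing)
}

# Illustrative list only; this is not a real repository query.
existing <- c("rhdf5", "DESeq2", "GenomicFeatures")
name_is_taken("RHDF5", existing)
name_is_taken("mynewpkg", existing)

# Version safeguard: the first version of a package that re-uses a name
# should sort clearly above the last version of the old package.
# Both version strings are made up for illustration.
package_version("10.0.0") > package_version("1.8.2")
```

Comparing through `package_version()` rather than as strings matters here, because "10.0.0" would sort below "9.0.0" as plain text.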

Kind regards
Wolfgang

> Il giorno 21ott2021, alle ore 13:39, Kern, Lori 
>  ha scritto:
> 
> Good point.  I'll open an issue on the github to fix.
> 
> 
> Lori Shepherd
> 
> Bioconductor Core Team
> 
> Roswell Park Comprehensive Cancer Center
> 
> Department of Biostatistics & Bioinformatics
> 
> Elm & Carlton Streets
> 
> Buffalo, New York 14263
> 
> 
> From: Bioc-devel  on behalf of Laurent 
> Gatto 
> Sent: Thursday, October 21, 2021 12:53 AM
> To: bioc-devel@r-project.org 
> Subject: [Bioc-devel] Package name
> 
> The Package Guidelines for Developers and Reviewers say that:
> 
> A package name should be descriptive and should not already exist as a 
> current package (case-insensitive) in Bioconductor nor CRAN.
> 
> The sentence says current packages - does that imply that names of packages 
> that have been archived (on CRAN) or deprecated (on Bioconductor) are 
> available? This is likely to lead to serious confusion.
> 
> Laurent
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> 
> 
> This email message may contain legally privileged and/or confidential 
> information.  If you are not the intended recipient(s), or the employee or 
> agent responsible for the delivery of this message to the intended 
> recipient(s), you are hereby notified that any disclosure, copying, 
> distribution, or use of this email message is prohibited.  If you have 
> received this message in error, please notify the sender immediately by 
> e-mail and delete this email message from your computer. Thank you.
>   [[alternative HTML version deleted]]
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Can anyone reproduce this? Possible bug or typo in BiocManager:::.version_map() or upstream

2021-05-04 Thread Wolfgang Huber

Hi,

I get

 > BiocManager::install()
Error: Bioconductor version '3.13' requires R version '4.1'; R version is too 
new; see
  https://bioconductor.org/install

and this appears to be because of the last line in

> BiocManager:::.version_map()
   BiocR  BiocStatus RSPM MRAN
…(29 lines omitted)….
30 3.10  3.6 out-of-date  
31 3.11  4.0 out-of-date  
32 3.12  4.0 release  
33 3.13  4.1   devel  
34 3.13  4.2  future  

Shouldn’t it say “3.14” in the second element of the last line?
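
One way to spot the suspicious row is to look for a Bioconductor version that is mapped to more than one R version. The sketch below rebuilds the visible rows of the table by hand (only the Bioc, R and BiocStatus columns, values copied from the output above); it does not call BiocManager, and the object names are made up for illustration.

```r
# Hand-copied reconstruction of the last rows of the table shown above.
version_map <- data.frame(
  Bioc = c("3.10", "3.11", "3.12", "3.13", "3.13"),
  R = c("3.6", "4.0", "4.0", "4.1", "4.2"),
  BiocStatus = c("out-of-date", "out-of-date", "release", "devel", "future"),
  stringsAsFactors = FALSE
)

# Bioconductor versions that appear in more than one row.
dup <- unique(version_map$Bioc[duplicated(version_map$Bioc)])
version_map[version_map$Bioc %in% dup, ]
```

Two Bioconductor versions sharing one R version (3.11 and 3.12 on R 4.0) is expected; the reverse, one Bioconductor version on two R versions, is what the check flags.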


> sessionInfo()
R Under development (unstable) (2021-05-04 r80261)
Platform: x86_64-apple-darwin20.4.0 (64-bit)
Running under: macOS Big Sur 11.3

Matrix products: default
BLAS:   /Users/whuber/R/lib/libRblas.dylib
LAPACK: /Users/whuber/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] BiocManager_1.30.12 fortunes_1.5-4

loaded via a namespace (and not attached):
[1] compiler_4.2.0 tools_4.2.0

and I seem to have the most recent version of BiocManager as of today’s 
https://cran.r-project.org/web/packages/BiocManager/index.html 

——

Btw the result of

> BiocManager:::.get_R_version()
[1] ‘4.2.0’

is probably technically not correct, as it’s r-devel.
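
For reference, base R exposes the development status separately from the version number. A small sketch, assuming a hypothetical helper `describe_r()`; how BiocManager itself derives its value is not shown in this message.

```r
# Version number only; carries no development-status information.
getRversion()

# Free-text status field: empty for a released R and, as far as I know,
# "Under development (unstable)" for r-devel builds.
R.version$status

# Hypothetical helper combining the two.
describe_r <- function() {
  status <- R.version$status
  suffix <- if (nzchar(status)) paste0(" (", status, ")") else ""
  paste0(as.character(getRversion()), suffix)
}
describe_r()
```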

Kind regards
Wolfgang

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] How to integrate code from a package not available on bioconductor

2021-02-10 Thread Wolfgang Huber




> Il giorno 10feb2021, alle ore 14:31, Kern, Lori 
>  ha scritto:
> 
> Have you reached out to the maintainer of the github package to see if they 
> would plan on submitting to CRAN or Bioconductor?
> If they do not,  you could see if they are okay with you including the code 
> in your package and then clearly indicate their authorship in the man pages 
> and by giving contributor credit in the DESCRIPTION. They might also include 
> CITATION information in their package to include?
> 
> Others might have additional thoughts?

Ana, 

if you are in a position to take that other code “as is” and then assume 
maintainership over it, this may be a way forward. However, consider these 
issues in advance:
- What happens if these other authors change their code on GitHub? Will you 
also synchronize the copy in your own package, or leave it as is? This requires 
a process and potentially continuous resources. 
- Who is responsible for fixing bugs in that copied code? 
- There are many reasons why people might not put their code on CRAN and 
Bioconductor, but in case it is a sign of low confidence in the quality of the 
code or low commitment to maintain it, depending on it incurs additional 
technical debt for you.

Are there maybe other packages (on CRAN/Bioconductor) that provide what you 
need?
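
If the code is copied with the original authors' agreement, the contributor credit that Lori mentions can be expressed with `person()` entries, as used in the `Authors@R` field of DESCRIPTION. A sketch with placeholder names, email address and file path (all hypothetical); the exact roles to assign are a judgment call.

```r
# Placeholder people throughout; substitute the real names.
authors <- c(
  person("Jane", "Doe", role = c("aut", "cre"),
         email = "jane.doe@example.org"),
  # Authors of the copied code, credited as contributors and
  # copyright holders of their portion.
  person("Original", "Developer", role = c("ctb", "cph"),
         comment = "author of the code adapted in R/imported_code.R")
)
format(authors, include = c("given", "family", "role"))
```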

Kind regards
Wolfgang


> 
> Cheers,
> 
> 
> Lori Shepherd
> 
> Bioconductor Core Team
> 
> Roswell Park Comprehensive Cancer Center
> 
> Department of Biostatistics & Bioinformatics
> 
> Elm & Carlton Streets
> 
> Buffalo, New York 14263
> 
> 
> From: Bioc-devel  on behalf of Ana Carolina 
> Leote 
> Sent: Wednesday, February 10, 2021 5:14 AM
> To: bioc-devel@r-project.org 
> Subject: [Bioc-devel] How to integrate code from a package not available on 
> bioconductor
> 
> Dear all,
> 
> I am a package maintainer and would like to add a functionality that
> depends on another package which is not available in bioconductor, only via
> GitHub. I would like to make the necessary changes in order to be able to
> use their code in my package. Can anyone point me towards the best way to
> proceed without undermining the authorship of the original developers? Is
> copying their code to my package and editing it to bioconductor standards
> appropriate?
> 
> Thank you and best wishes,
> Carolina
> 
> *Ana Carolina Leote*
> *MSc. Biological Engineering*
> *PhD student at the Cologne Graduate School for Ageing Research*
> *Cellular networks and systems biology (Beyer group), **CECAD Research
> Center*
> https://www.linkedin.com/in/anacarolinaleote/
> 
>[[alternative HTML version deleted]]
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> 
> 
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] RFC: Bioc repository for single-version packages

2018-11-13 Thread Wolfgang Huber

t rises to the level of 
securing separate funding?

Martin
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
With thanks in advance-

Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

My book with Susan Holmes: http://www.huber.embl.de/msmb






___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Ascona Workshop 2019 -- Statistical Challenges in Medical Data Science, 16–21 June 2019

2018-10-25 Thread Wolfgang Huber


Save the date!

We would like to announce and invite you to the workshop on Statistical 
Challenges in Medical Data Science, to be held on 16 – 21 June 2019 at 
Monte Verità, Ascona, Switzerland. The purpose of the workshop is to 
bring together participants from statistics, computational sciences, 
biology and medicine, and to encourage interaction in an informal and 
cooperative atmosphere.


Confirmed invited speakers: Karsten Borgwardt (ETH Zurich), Tianxi Cai 
(Harvard), Christina Curtis (Stanford), Anna Goldenberg (U Toronto), 
Jennifer Listgarten (UC Berkeley), Marylyn Ritchie (U Penn), Michael 
Snyder (Stanford), Matthew Stephens (U Chicago), Yinyin Yuan (ICR London).


Application will open in January 2019. For more information, please 
refer to https://www.bsse.ethz.ch/cbg/cbg-news/ascona-2019.html


Niko Beerenwinkel (ETH Zurich)
Peter Bühlmann (ETH Zurich)
Wolfgang Huber (EMBL)

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Can ExperimentHub packages contain non-R data formats?

2018-09-11 Thread Wolfgang Huber


Asking for a friend :)

The vignette [1] seems not completely explicit about this, and the 
example [2] contains mandatory-looking(?) columns RDataClass, 
DispatchClass, RDataPath.


[1] 
https://www.bioconductor.org/packages/devel/bioc/vignettes/ExperimentHub/inst/doc/CreateAnExperimentHubPackage.html
[2] 
https://github.com/Bioconductor/GSE62944/blob/master/inst/extdata/metadata.csv
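
To make the question concrete, here is a sketch of what a metadata row for a non-R file might look like. The column names are the ones mentioned above plus Title; every value is a placeholder, and using "FilePath" as the dispatch class is an assumption that should be checked against the ExperimentHub / AnnotationHubData documentation.

```r
# Hypothetical metadata row for a resource stored in a non-R format.
meta <- data.frame(
  Title = "Example raw image archive",
  RDataClass = "character",
  DispatchClass = "FilePath",   # assumption, see lead-in
  RDataPath = "myPackage/v1/images.zip",
  stringsAsFactors = FALSE
)

# Write and re-read the file to check the columns survive a round trip.
out <- file.path(tempdir(), "metadata.csv")
write.csv(meta, out, row.names = FALSE)
read.csv(out, stringsAsFactors = FALSE)
```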



With thanks in advance-

Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

My book with Susan Holmes: http://www.huber.embl.de/msmb

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Workflows are now in git (and other Important workflow-related changes)

2018-03-31 Thread Wolfgang Huber



Thank you, Lori, Valerie, Andrzej, Nitesh and Hervé, this is great 
news! This should help further rationalize the authoring and maintenance 
of workflows.



--
I'm sure many have also seen the call to submit workflows to the 
Bioconductor gateway on F1000Research 
https://support.bioconductor.org/p/107477/ - which will make your 
workflow a 'peer-reviewed' publication and gives it additional visibility.


 Best wishes
Wolfgang

30.3.18 22:10, Hervé Pagès scripsit:

To the authors/maintainers of the workflows:


Following the svn-to-git migration of the software and data experiment
packages last summer, we've completed the migration of the workflow
packages.

The canonical location for the workflow source code now is
git.bioconductor.org

Please use your git client to access/maintain your workflow the same
way you would do it for a software or data-experiment package.

We've also migrated the workflows to our in-house build system.
Starting with Bioc 3.7, the build report for the devel versions of
the workflows can be found here:

   https://bioconductor.org/checkResults/devel/workflows-LATEST/

We run these builds every other day (Mondays, Wednesdays, Fridays).
Because of limited build resources, we now run the data-experiment
builds on Sundays, Tuesdays, and Thursdays only (instead of daily).

The links to the package landing pages are not working yet. This
will be addressed in the next few days.

Please address any error you see on the report for the workflow
you maintain.

Note that, from now on, we're also following the same version scheme
for these packages as for the software and data-experiment packages.
That is, we're using an even y (in x.y.z) in release and an odd y in
devel. We'll take care of bumping y at release time (like we do for
software and data-experiment packages).
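
The even/odd rule described above can be checked mechanically. A minimal sketch, assuming a hypothetical helper `bioc_branch()` and plain x.y.z version strings:

```r
# Hypothetical helper: "release" if the middle component y is even,
# "devel" if it is odd, following the x.y.z scheme described above.
bioc_branch <- function(version) {
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  if (parts[2] %% 2 == 0) "release" else "devel"
}

bioc_branch("1.31.7")  # odd y
bioc_branch("1.30.2")  # even y
```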

After the next Bioconductor release (scheduled for May 1), we'll start
building the release versions of the workflows in addition to the
devel versions. The build report for the release versions will be here:

   https://bioconductor.org/checkResults/release/workflows-LATEST/

Finally, please note that with the latest version of BiocInstaller
(1.29.5), workflow packages can be installed with biocLite(), like
any other Bioconductor package. We'll deprecate the old mechanism
(workflowInstall()) at some point in the future.

Thanks to Andrzej, Lori, Nitesh, and Valerie for working on this
migration.

Let us know if you have any question about this.

H.




--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Should GenomicFeatures really depend on RMySQL ? Is it time to migrate to RMariaDB ?

2018-02-02 Thread Wolfgang Huber


Thanks Mike and Hervé!

Somehow, errors in examples that are caused by the state (or absence) of 
things on the internet should have a different status in my view than 
ones that reflect local state of R and packages - and distinguishing 
this could make maintenance of the package corpus & builds more modular.


But I'm not entirely confident whether this view is coherent or 
practicable, what do others think?
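
One way the distinction could look in code is to give failures of remote resources their own condition class, so that a caller (or a build system) can handle them separately. This is only a sketch: `with_remote()` and the class name `remote_resource_error` are made up, and the wrapper relabels every error raised inside it, not only genuine network failures.

```r
# Hypothetical wrapper: run code that needs a remote resource and re-signal
# any error with the extra class "remote_resource_error".
with_remote <- function(expr) {
  tryCatch(
    expr,
    error = function(e) {
      stop(structure(
        list(message = conditionMessage(e), call = conditionCall(e)),
        class = c("remote_resource_error", "error", "condition")
      ))
    }
  )
}

# The stop() call stands in for a web query that fails.
result <- tryCatch(
  with_remote(stop("server did not respond")),
  remote_resource_error = function(e) "skipped: remote resource unavailable"
)
result
```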


Wolfgang

2.2.18 12:19, Hervé Pagès scripsit:

Hi Wolfgang, Mike,

We didn't have a successful build of GenomicFeatures in devel
for many days because of all kinds of problems with the examples
that try to access the Ensembl marts. The latest of which being:

   https://support.bioconductor.org/p/105565/

(which also affects GenomicFeatures in release).

Thanks Mike for looking at the timeout issue (FWIW I can reproduce
it). I thought this timeout was a new setting on the Ensembl mart
server side :-)

In GenomicFeatures 1.31.7 I replaced RMySQL with RMariaDB:


https://github.com/Bioconductor/GenomicFeatures/commit/08dd24296d94ef31b5f5685240b871c79a160e91 



I also made another small speed improvement to makeTxDbFromUCSC().

H.


On 02/02/2018 02:33 AM, Mike Smith wrote:

The error TxDbFromBiomart looks like it might be related to a biomaRt
change I made recently to submit queries using httr rather than RCurl.
Others have reported something similar (e.g.
https://support.bioconductor.org/p/104502/) and I raised the timeout from
10 to 60 seconds.  I guess with the old version it was even longer than
that.

I haven't been able to recreate the problem at my end, I think the time
taken is related as much to the internet connection as to the query, but
I'll take a look at the failing example to see if I can shed any more
light on it.

Mike

On 2 February 2018 at 10:41, Wolfgang Huber <wolfgang.hu...@embl.de> 
wrote:



Thanks Hervé!

This seems to take a long time to propagate. As of now,
https://bioconductor.org/packages/devel/bioc/html/GenomicFeatures.html
still shows 1.31.3.

( Btw, there's also an error in the build report which seems to come
from a rather excessive-looking example in the makeTxDbFromBiomart man
page - that maybe could better live in a vignette, not least for
reducing brittleness? )

 Best wishes
 Wolfgang

30.1.18 19:00, Hervé Pagès scripsit:

This is done in GenomicFeatures 1.31.6.


Note that I also made a few changes to makeTxDbFromUCSC() to make it
a little bit faster (about 2x).

@Kasper: moving the makeTxDb* functions to a GenomicFeaturesBuildTools
or GenomicFeaturesForge package is maybe an idea to explore...

H.

On 01/26/2018 06:09 PM, Kasper Daniel Hansen wrote:

As an alternative to Suggests, perhaps make a GenomicFeaturesBuildTools.
  Not sure if it is better or worse, just different

On Fri, Jan 26, 2018 at 2:39 PM, Wolfgang Huber 
<wolfgang.hu...@embl.de>

wrote:




26.1.18 14:59, Martin Morgan scripsit:

On 01/24/2018 03:38 PM, Wolfgang Huber wrote:


GenomicFeatures_1.31.3 imports RMySQL.


I'm having great trouble installing RMySQL from source on a recent
MacOS (10.13.3) with homebrew.

The package's homepage says "The 'RMySQL' package contains an old
implementation based on legacy code from S-PLUS which is being phased
out. A modern 'MySQL' client based on 'Rcpp' is available from the
'RMariaDB' package"
https://cran.r-project.org/web/packages/RMySQL/index.html

So is it time to heed that advice and migrate GenomicFeatures to
RMariaDB ?



Out of curiosity, is MariaDB easier to install on your system? Its
system dependencies are described at
https://CRAN.R-project.org/package=RMariaDB

I have no problems installing RMariaDB on MacOS (10.13.3) on an R-devel
from source, after installing mariadb-connector-c with homebrew.

OTOH, I have not figured out a way to install RMySQL, either on my
R-devel from source (various complaints about missing .h files)

Re: [Bioc-devel] Should GenomicFeatures really depend on RMySQL ? Is it time to migrate to RMariaDB ?

2018-02-02 Thread Wolfgang Huber


Thanks Hervé!

This seems to take a long time to propagate. As of now, 
https://bioconductor.org/packages/devel/bioc/html/GenomicFeatures.html 
still shows 1.31.3.


( Btw, there's also an error in the build report which seems to come 
from a rather excessive-looking example in the makeTxDbFromBiomart man 
page - that maybe could better live in a vignette, not least for 
reducing brittleness? )


Best wishes
Wolfgang

30.1.18 19:00, Hervé Pagès scripsit:

This is done in GenomicFeatures 1.31.6.

Note that I also made a few changes to makeTxDbFromUCSC() to make it
a little bit faster (about 2x).

@Kasper: moving the makeTxDb* functions to a GenomicFeaturesBuildTools
or GenomicFeaturesForge package is maybe an idea to explore...

H.

On 01/26/2018 06:09 PM, Kasper Daniel Hansen wrote:

As an alternative to Suggests, perhaps make a GenomicFeaturesBuildTools.
  Not sure if it is better or worse, just different

On Fri, Jan 26, 2018 at 2:39 PM, Wolfgang Huber <wolfgang.hu...@embl.de>
wrote:




26.1.18 14:59, Martin Morgan scripsit:


On 01/24/2018 03:38 PM, Wolfgang Huber wrote:


GenomicFeatures_1.31.3 imports RMySQL.

I'm having great trouble installing RMySQL from source on a recent
MacOS (10.13.3) with homebrew.

The package's homepage says "The 'RMySQL' package contains an old
implementation based on legacy code from S-PLUS which is being phased
out. A modern 'MySQL' client based on 'Rcpp' is available from the
'RMariaDB' package"
https://cran.r-project.org/web/packages/RMySQL/index.html



So is it time to heed that advice and migrate GenomicFeatures to
RMariaDB ?



Out of curiosity, is MariaDB easier to install on your system? Its
system dependencies are described at
https://CRAN.R-project.org/package=RMariaDB





I have no problems installing RMariaDB on MacOS (10.13.3) on an R-devel
from source, after installing mariadb-connector-c with homebrew.

OTOH, I have not figured out a way to install RMySQL, either on my
R-devel from source (various complaints about missing .h files) or on a
binary R 3.4.2 with the binary package download (complaints about
missing system libraries / wrong versions).

Thanks and kind regards
 Wolfgang

FWIW MySQL is a relatively recent addition as a dependency to
GenomicFeatures; it enables `makeTxDbFromEnsembl()`, which is probably a
much more stable solution than `makeTxDbFromBiomart()`. On the other hand
Johannes does an excellent job on the ensembldb packages, so perhaps
this code could really be conditional with the RMySQL dependency moved
to Suggests:

Martin



With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel









--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Should GenomicFeatures really depend on RMySQL ? Is it time to migrate to RMariaDB ?

2018-01-26 Thread Wolfgang Huber




26.1.18 14:59, Martin Morgan scripsit:

On 01/24/2018 03:38 PM, Wolfgang Huber wrote:

GenomicFeatures_1.31.3 imports RMySQL.

I'm having great trouble installing RMySQL from source on a recent 
MacOS (10.13.3) with homebrew.


The package's homepage says "The 'RMySQL' package contains an old 
implementation based on legacy code from S-PLUS which is being phased 
out. A modern 'MySQL' client based on 'Rcpp' is available from the 
'RMariaDB' package" 
https://cran.r-project.org/web/packages/RMySQL/index.html


So is it time to heed that advice and migrate GenomicFeatures to 
RMariaDB ?


Out of curiosity, is MariaDB easier to install on your system? Its 
system dependencies are described at 
https://CRAN.R-project.org/package=RMariaDB


I have no problems installing RMariaDB on MacOS (10.13.3) on an R-devel 
from source, after installing mariadb-connector-c with homebrew.


OTOH, I have not figured out a way to install RMySQL, either on my 
R-devel from source (various complaints about missing .h files) or on a 
binary R 3.4.2 with the binary package download (complaints about 
missing system libraries / wrong versions).


Thanks and kind regards
Wolfgang

FWIW MySQL is a relatively recent addition as a dependency to 
GenomicFeatures; it enables `makeTxDbFromEnsembl()`, which is probably a 
much more stable solution than `makeTxDbFromBiomart()`. On the other hand 
Johannes does an excellent job on the ensembldb packages, so perhaps 
this code could really be conditional with the RMySQL dependency moved 
to Suggests:
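
The usual pattern for a dependency moved to Suggests: is to test for the package at the point of use. A minimal sketch, assuming a hypothetical internal helper `require_suggested()`; this is not the code GenomicFeatures uses.

```r
# Hypothetical helper for functions whose back end lives in a package
# listed under Suggests: rather than Imports:.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function; ",
         "please install it.", call. = FALSE)
  }
  invisible(TRUE)
}

# Demonstration with a package name that is assumed not to be installed.
msg <- tryCatch(require_suggested("notARealPackage123"),
                error = function(e) conditionMessage(e))
msg
```

A database-backed function would call `require_suggested("RMariaDB")` (or the RMySQL equivalent) as its first line, so users who never touch that function do not need the system libraries at all.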


Martin



With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] search - was: differential expression tools for proteins

2018-01-07 Thread Wolfgang Huber




7.1.18 12:46, Wolfgang Huber scripsit:
Thank you for your question. It would however be more appropriate for 
the support forum, not for the developer mailing list. Would you mind 
moving it there, perhaps also the responses so far?


I just saw you posted on the forum, after browsing it.
I had searched for your name using the "search" function of the forum 
webpage before sending the previous, but that did not turn up this post.


Wouldn't it be great if our search box worked better? Or would just 
convert the search into a URL like


https://www.google.com/search?q=Cardin+site%3Asupport.bioconductor.org=date
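
Building such a URL takes only a few lines. A sketch, assuming a hypothetical helper `support_search_url()`; it produces only the basic site-restricted query and does not attempt any sort-by-date parameter.

```r
# Hypothetical helper: build a Google query restricted to the support site.
support_search_url <- function(term, site = "support.bioconductor.org") {
  query <- paste0(term, " site:", site)
  paste0("https://www.google.com/search?q=",
         utils::URLencode(query, reserved = TRUE))
}

support_search_url("Cardin")
```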

Wolfgang

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] differential expression tools for proteins

2018-01-07 Thread Wolfgang Huber


Dear Julie

Thank you for your question. It would however be more appropriate for 
the support forum, not for the developer mailing list. Would you mind 
moving it there, perhaps also the responses so far?


There is no "in-principle" reason why DESeq2 shouldn't produce useful 
results also for count data from technologies that are not 
DNA-sequencing based. Its error model (Gamma-Poisson, GP) is quite generic.
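
To illustrate what the GP assumption implies, the sketch below simulates Gamma-Poisson (negative binomial) counts in base R and compares the sample variance with the model's mean-variance relation. The values of `mu` and `alpha` are arbitrary and chosen only for illustration.

```r
set.seed(1)
mu <- 100      # mean count
alpha <- 0.1   # dispersion

# Gamma-Poisson (negative binomial) draws; `size` is 1 / dispersion.
x <- rnbinom(1e5, mu = mu, size = 1 / alpha)

# Under this model the variance is mu + alpha * mu^2.
c(mean = mean(x), var = var(x), expected_var = mu + alpha * mu^2)
```

If replicate counts for a protein show a variance far from this relation, or a clearly bimodal shape, that is a hint the model does not fit.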


As always, you should do model fit diagnostics though, to see whether 
the residuals for each protein across replicates and conditions (after 
fitting the GLM) are reasonably consistent with the GP, in particular, 
that they look unimodal.


One issue to check is also whether the normalization (size factors) is 
appropriate.
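
As I understand it, DESeq2's default size factors follow a median-of-ratios scheme. The base-R sketch below shows that idea on a toy matrix so the normalization can be inspected by hand; `size_factors()` is a hypothetical re-implementation for illustration, not DESeq2's own function, and the counts are made up.

```r
# Median-of-ratios size factors, sketched in base R.
# Rows are features (here: proteins), columns are samples.
size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)   # drop rows that contain a zero
  apply(counts, 2, function(col) {
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

counts <- cbind(s1 = c(10, 20, 30, 40),
                s2 = c(20, 40, 60, 80))   # s2 is s1 scaled by 2
sf <- size_factors(counts)
sf
```

The scheme assumes that most features do not change between samples; if a few very abundant proteins dominate the counts, that assumption deserves a check.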


There is another bit of irony afaIu: If you have enough replicates (or: 
degrees of freedom) that you can actually "see" deviations from the GP 
assumption (i.e. >=dozens), then you probably don't need a parametric 
method, and could switch to something non-parametric.


Kind regards
Wolfgang

6.1.18 18:45, Cardin Julie via Bioc-devel scripsit:

Hi,
I have experienced very good results with DESeq2 for my RNASeq analysis. As 
far as I understand, it is a tool that normalises our data from sequencing to 
make them comparable.

I have a new project involving protein counts.
I have a couple of data sets. For each sample we have:
rows with protein names (instead of genes), with their respective counts.

My goal is again to do a differential expression analysis between treated 
groups and controls.
I wonder if I can use DESeq2 to do a differential expression analysis for proteins?
Or is the correcting factor that DESeq2 uses to correct counts for RNASeq 
specific to DNA sequencing and not applicable to proteins?

Is there a tool that does the exact same thing as DESeq2 but for proteins?

Thank you very much for your help and time,
Best regards and happy new year!

Julie Cardin
Bioinformatician
IRCM



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] workflow page reorganization

2017-12-17 Thread Wolfgang Huber

Thank you all for the efforts on this! I agree with Mike that there's a 
lot of yet untapped potential in the workflows, for all levels of BioC 
users: for beginners, to make their first steps, for more experienced 
users, to learn new stuff.


Mike's points 1-5 are great, I second them.

And then some polite pushback on categorization... I think for a volume 
of one-two dozen workflows, browsing based on short descriptions (Mike's 
point 3) will often be preferable. And if the volume gets larger, I 
noted that "search" rather than "manually curated hierarchical menu 
trees" drives many successful websites (Amazon, Google) and there's 
probably a lesson in there.


Best wishes
Wolfgang


15.12.17 16:55, Michael Love scripsit:

This already looks much improved, thanks Andrzej and Aaron. I think
workflows are where it's at, and this page is probably
underappreciated by Bioconductor users and the outside community.

My wishlist for the workflows page, which may exceed what is available
for the current effort:

1) It should say at the top which version of R/Bioconductor the
workflows are being built on.

2) On the main page, for each workflow:

* A thumbnail (could live in a pre-specified location in the package)
* Author list (autopopulated from DESCRIPTION)
* Version
* Link to the (most current) F1000Research articles for those which
are published (new field in DESCRIPTION?)
* Some kind of CI "buttony" thing, to indicate to users that these are
live documents
* Key Bioc/R packages used in this workflow (could this also be an
additional DESCRIPTION field?)

3) I think it would be good to encourage the more stubby workflow
descriptions to add more text, and maybe to shorten the very wordy
ones, so that it's more consistent. Wow, that's pretty obsessive of
me, but I think it would make the page look more professional.

4) Text somewhere with a link to the support site and how to ask for
help on workflows (e.g. vignette(), ?functionName)

5) An advertisement somewhere for submitting a workflow, link to more
detailed doc elsewhere




--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de


Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-06 Thread Wolfgang Huber


Interesting! In iTerm2, I get
$ ulimit -Sn
4864

and
env R_MAX_NUM_DLLS=1000 R

works, which means that on Mac it IS possible to have many more DLLs 
open than 100 if R is started in the right way.


Wolfgang

PS I meant OS X 10.12.6, too. Sorry for the typo.


6.10.17 14:50, Kasper Daniel Hansen scripsit:

On OS X 10.12.6 (I don't think 10.12.16 exists), I get

$ ulimit -Sn
7168

Interestingly, this is because I use iTerm2 for my command line prompt.  
If I do the same command in Terminal I get 256.  If I start R inside of 
Emacs I get 256 as well.  I don't know anything about ulimit and how it 
is set, but that is a pretty stark difference.


Best,
Kasper



On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber <wolfgang.hu...@embl.de> wrote:


On Mac OSX 10.12.16:
$ ulimit -Sn
256

so the maximum value of R_MAX_NUM_DLLS is 153 ...

         Wolfgang

5.10.17 23:02, Henrik Bengtsson scripsit:

About the DLL limit:

Just wanna make sure you're aware of "new" environment variable
R_MAX_NUM_DLLS available in R (>= 3.4.0).  It allows you to push the
current default limit of 100 open DLLs a bit higher.  It can be
set in
.Renviron or before, e.g.

$ R_MAX_NUM_DLLS=500 R

This, of course, assumes that you can set it, which you might not be
able to do on build servers.  Also, there is an upper limit
min(0.6*fd_limit,1000) that depends on the number of files you can
have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've
got:

$ ulimit -Sn
1024

so R_MAX_NUM_DLLS=614 is the maximum for me.
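
The arithmetic behind these numbers can be sketched in R. This is only an illustration of the min(0.6*fd_limit, 1000) rule as stated above, assuming the product is rounded down; `max_num_dlls` is a hypothetical helper, not a function from R itself, and the three inputs are the `ulimit -Sn` values reported in this thread.

```r
# Upper bound on R_MAX_NUM_DLLS as described in the message above:
# min(0.6 * fd_limit, 1000), where fd_limit is the output of `ulimit -Sn`.
# Assumption: the product is rounded down to a whole number.
max_num_dlls <- function(fd_limit) {
  as.integer(min(floor(0.6 * fd_limit), 1000))
}

# Soft limits on open files reported in this thread
limits <- c(terminal_default = 256, ubuntu_16.04 = 1024, iterm2 = 4864)
vapply(limits, max_num_dlls, integer(1))
```

Under that assumption the three limits give 153, 614 and 1000, which agrees with the values quoted in the thread.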

/Henrik

On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber <wolfgang.hu...@embl.de> wrote:


Breaking up long workflows into several smaller "modules" each with a
clearly defined input and output is a good idea, certainly for didactic &
maintenance reasons.

It doesn't "solve" the DLL issue though, it only avoids it (for now)...

I believe you can use a Makefile for your vignettes
(https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
and this might be a good way of managing which depends on which. For passing
along output/input, perhaps local .RData files are good enough, perhaps some
wheel-reinventing can also be avoided by using
https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
(haven't actually used it yet, though).

          Wolfgang



5.10.17 20:02, Aaron Lun scripsit:


This may relate to what I was thinking with respect to solving the DLL
problem, by breaking up large workflows into modules that can be executed in
separate R sessions. The same approach would also make it easier to
associate package dependencies with specific parts of the workflow.


In my particular situation, it is easy to break up the workflow into
sections that can be executed completely independently. However, I can also
imagine situations where dependencies on previous objects, etc. make it
difficult to break up the workflow. If multiple files are present in
vignettes/, can they be directed to execute in a specific order, and would
output files from one vignette persist during the execution of another?


-Aaron


----
*From:* Wolfgang Huber <wolfgang.hu...@embl.de>
*Sent:* Thursday, 5 October 2017 6:23:47 PM
*To:* Laurent Gatto; Aaron Lun
*Cc:* bioc-devel@r-project.org
*Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell
workflow


I agree it is nice to be able to only load the packages needed for a
certain section of a vignette and not the whole thing. And that too many
`::` can make code look unwieldy (though some may actually increase

Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-06 Thread Wolfgang Huber


On Mac OSX 10.12.16:
$ ulimit -Sn
256

so the maximum value of R_MAX_NUM_DLLS is 153 ...

Wolfgang

5.10.17 23:02, Henrik Bengtsson scripsit:

About the DLL limit:

Just wanna make sure you're aware of "new" environment variable
R_MAX_NUM_DLLS available in R (>= 3.4.0).  It allows you to push the
current default limit of 100 open DLLs a bit higher.  It can be set in
.Renviron or before, e.g.

$ R_MAX_NUM_DLLS=500 R

This, of course, assumes that you can set it, which you might not be
able to do on build servers.  Also, there is an upper limit
min(0.6*fd_limit,1000) that depends on the number of files you can
have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've
got:

$ ulimit -Sn
1024

so R_MAX_NUM_DLLS=614 is the maximum for me.

/Henrik

On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber <wolfgang.hu...@embl.de> wrote:


Breaking up long workflows into several smaller "modules" each with a
clearly defined input and output is a good idea, certainly for didactic &
maintenance reasons.

It doesn't "solve" the DLL issue though, it only avoids it (for now)...

I believe you can use a Makefile for your vignettes
(https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
and this might be a good way of managing which depends on which. For passing
along output/input, perhaps local .RData files are good enough, perhaps some
wheel-reinventing can also be avoided by using
https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
(haven't actually used it yet, though).

 Wolfgang



5.10.17 20:02, Aaron Lun scripsit:


This may relate to what I was thinking with respect to solving the DLL
problem, by breaking up large workflows into modules that can be executed in
separate R sessions. The same approach would also make it easier to
associate package dependencies with specific parts of the workflow.


In my particular situation, it is easy to break up the workflow into
sections that can be executed completely independently. However, I can also
imagine situations where dependencies on previous objects, etc. make it
difficult to break up the workflow. If multiple files are present in
vignettes/, can they be directed to execute in a specific order, and would
output files from one vignette persist during the execution of another?


-Aaron

----
*From:* Wolfgang Huber <wolfgang.hu...@embl.de>
*Sent:* Thursday, 5 October 2017 6:23:47 PM
*To:* Laurent Gatto; Aaron Lun
*Cc:* bioc-devel@r-project.org
*Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell
workflow


I agree it is nice to be able to only load the packages needed for a
certain section of a vignette and not the whole thing. And that too many
`::` can make code look unwieldy (though some may actually increase
readability).

But relying on manually sprinkled in `library` calls seems like a hack
prone to error. And there are always bound to be dependencies that are
non-local, e.g. on general infrastructure like SummarizedExperiment,
ggplot2, dplyr.

So: do we need a way to computationally determine the dependencies of a
vignette section, including highlighting/eliminating potential name
clashes (b/c the warnings about masking emitted at package loading are
easily ignored)? This seems like a straightforward engineering task.

Eventually with such code analysis we could get rid of explicit
`library` calls altogether :)

  Wolfgang





5.10.17 08:53, Laurent Gatto scripsit:



On  5 October 2017 00:11, Aaron Lun wrote:


Here's another two cents from me:

The explicit library() calls allow for easy copy-pasting if people
only want to use/adapt a section of the workflow. In such cases,
calling "library(simpleSingleCell)" could drag in a lot of unnecessary
packages (e.g., which could hit the DLL limit). Reading through the
text to figure out the requirements for each code chunk seems like a
pain, and lots of "::" are unwieldy.

More generally, the removal of individual library() calls seems to
encourage the use of a single "library(simpleSingleCell)" call at the
top of any user-developed custom analysis scripts based on the
workflow. This seems conceptually odd to me - the simpleSingleCell
package is simply a vehicle for the compiled workflow, it shouldn't be
involved in analyses of other data.



I can confirm that this is a possibility.

Before workflows became available, I created the RforProteomics package
that essentially provided one relatively large vignette to demonstrate a
variety of applications of R/Bioconductor for mass spectrometry and
proteomics. I think this has been a useful way to disseminate R and
Bioconductor in these respective communities, but also led to the
confusion that it was that package that "did all the stuff", i.e. people
saying that they were using RforProteomics to do a task that was
described i

Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-05 Thread Wolfgang Huber



Breaking up long workflows into several smaller "modules" each with a 
clearly defined input and output is a good idea, certainly for didactic 
& maintenance reasons.


It doesn't "solve" the DLL issue though, it only avoids it (for now)...

I believe you can use a Makefile for your vignettes 
(https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes), 
and this might be a good way of managing which depends on which. For 
passing along output/input, perhaps local .RData files are good enough, 
perhaps some wheel-reinventing can also be avoided by using 
https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html 
(haven't actually used it yet, though).
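
A minimal sketch of the "local .RData files" idea, assuming two modules that share a directory. The file name, the object `counts` and the use of `tempdir()` as the shared location are all made up for illustration; a real workflow would pick a location that persists between R sessions.

```r
# Module 1: compute an intermediate result and write it to disk.
shared_dir <- tempdir()                               # stand-in for a shared directory
result_file <- file.path(shared_dir, "module1_result.RData")
counts <- data.frame(gene = c("a", "b"), n = c(10L, 20L))
save(counts, file = result_file)

# Module 2 (possibly a separate R session or vignette): read it back.
# Loading into a dedicated environment avoids silently overwriting
# objects that already exist in the workspace.
stopifnot(file.exists(result_file))
module1 <- new.env()
load(result_file, envir = module1)
all.equal(counts, module1$counts)
```

The explicit `file.exists()` check is one simple way to make the dependency of module 2 on module 1 visible when the modules are run out of order.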


Wolfgang



5.10.17 20:02, Aaron Lun scripsit:
This may relate to what I was thinking with respect to solving the DLL 
problem, by breaking up large workflows into modules that can be 
executed in separate R sessions. The same approach would also make it 
easier to associate package dependencies with specific parts of the 
workflow.



In my particular situation, it is easy to break up the workflow into 
sections that can be executed completely independently. However, I can 
also imagine situations where dependencies on previous objects, etc. 
make it difficult to break up the workflow. If multiple files are 
present in vignettes/, can they be directed to execute in a specific 
order, and would output files from one vignette persist during the 
execution of another?



-Aaron

----
*From:* Wolfgang Huber <wolfgang.hu...@embl.de>
*Sent:* Thursday, 5 October 2017 6:23:47 PM
*To:* Laurent Gatto; Aaron Lun
*Cc:* bioc-devel@r-project.org
*Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell 
workflow


I agree it is nice to be able to only load the packages needed for a
certain section of a vignette and not the whole thing. And that too many
`::` can make code look unwieldy (though some may actually increase
readability).

But relying on manually sprinkled in `library` calls seems like a hack
prone to error. And there are always bound to be dependencies that are
non-local, e.g. on general infrastructure like SummarizedExperiment,
ggplot2, dplyr.

So: do we need a way to computationally determine the dependencies of a
vignette section, including highlighting/eliminating potential name
clashes (b/c the warnings about masking emitted at package loading are
easily ignored)? This seems like a straightforward engineering task.

Eventually with such code analysis we could get rid of explicit
`library` calls altogether :)

     Wolfgang





5.10.17 08:53, Laurent Gatto scripsit:


On  5 October 2017 00:11, Aaron Lun wrote:


Here's another two cents from me:

The explicit library() calls allow for easy copy-pasting if people
only want to use/adapt a section of the workflow. In such cases,
calling "library(simpleSingleCell)" could drag in a lot of unnecessary
packages (e.g., which could hit the DLL limit). Reading through the
text to figure out the requirements for each code chunk seems like a
pain, and lots of "::" are unwieldy.

More generally, the removal of individual library() calls seems to
encourage the use of a single "library(simpleSingleCell)" call at the
top of any user-developed custom analysis scripts based on the
workflow. This seems conceptually odd to me - the simpleSingleCell
package is simply a vehicle for the compiled workflow, it shouldn't be
involved in analyses of other data.


I can confirm that this is a possibility.

Before workflows became available, I created the RforProteomics package
that essentially provided one relatively large vignette to demonstrate a
variety of applications of R/Bioconductor for mass spectrometry and
proteomics. I think this has been a useful way to disseminate R and
Bioconductor in these respective communities, but also led to the
confusion that it was that package that "did all the stuff", i.e. people
saying that they were using RforProteomics to do a task that was
described in the vignette. The RforProteomics vignette does explicitly
call library at the beginning of each section and explained that the
package was only a collection of analyses stemming from other packages,
but that wasn't enough apparently.

Laurent



-Aaron


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Wolfgang Huber 
<wolfgang.hu...@embl.de>
Sent: Thursday, 5 October 2017 8:26 AM
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow


I find `eval=FALSE` chunks not a good idea, since
- they confuse users who only see the rendered HTML/PDF (where this flag
is not shown)
- they are not tested, so more prone to code rot.

I'd also like to object to the idea that proximity of a `library` call
to code that uses a package is somehow

Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-05 Thread Wolfgang Huber



I agree it is nice to be able to only load the packages needed for a 
certain section of a vignette and not the whole thing. And that too many 
`::` can make code look unwieldy (though some may actually increase 
readability).


But relying on manually sprinkled in `library` calls seems like a hack 
prone to error. And there are always bound to be dependencies that are 
non-local, e.g. on general infrastructure like SummarizedExperiment, 
ggplot2, dplyr.


So: do we need a way to computationally determine the dependencies of a 
vignette section, including highlighting/eliminating potential name 
clashes (b/c the warnings about masking emitted at package loading are 
easily ignored)? This seems like a straightforward engineering task.


Eventually with such code analysis we could get rid of explicit 
`library` calls altogether :)


Wolfgang





5.10.17 08:53, Laurent Gatto scripsit:


On  5 October 2017 00:11, Aaron Lun wrote:


Here's another two cents from me:

The explicit library() calls allow for easy copy-pasting if people
only want to use/adapt a section of the workflow. In such cases,
calling "library(simpleSingleCell)" could drag in a lot of unnecessary
packages (e.g., which could hit the DLL limit). Reading through the
text to figure out the requirements for each code chunk seems like a
pain, and lots of "::" are unwieldy.

More generally, the removal of individual library() calls seems to
encourage the use of a single "library(simpleSingleCell)" call at the
top of any user-developed custom analysis scripts based on the
workflow. This seems conceptually odd to me - the simpleSingleCell
package is simply a vehicle for the compiled workflow, it shouldn't be
involved in analyses of other data.


I can confirm that this is a possibility.

Before workflows became available, I created the RforProteomics package
that essentially provided one relatively large vignette to demonstrate a
variety of applications of R/Bioconductor for mass spectrometry and
proteomics. I think this has been a useful way to disseminate R and
Bioconductor in these respective communities, but also led to the
confusion that it was that package that "did all the stuff", i.e. people
saying that they were using RforProteomics to do a task that was
described in the vignette. The RforProteomics vignette does explicitly
call library at the beginning of each section and explained that the
package was only a collection of analyses stemming from other packages,
but that wasn't enough apparently.

Laurent



-Aaron


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Wolfgang Huber 
<wolfgang.hu...@embl.de>
Sent: Thursday, 5 October 2017 8:26 AM
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow


I find `eval=FALSE` chunks not a good idea, since
- they confuse users who only see the rendered HTML/PDF (where this flag
is not shown)
- they are not tested, so more prone to code rot.

I'd also like to object to the idea that proximity of a `library` call
to code that uses a package is somehow didactic. It's actually a bad
habit: the R interpreter does not care. The relevant package
- can be mentioned in the narrative,
- stated in the code with the pkgname:: prefix.
The latter is good didactics to get people used to the idea of
namespaces, especially since there is an increasing frequency of name
clashes in CRAN, tidyverse, BioC (e.g. consider the various functions
named 'filter' and the obscure malbehaviors that can result from these).
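
To illustrate the pkgname:: point with something that runs in a plain R session, here is a small sketch using only packages that ship with R. With dplyr attached, an unqualified filter() would resolve to dplyr's version instead, which is the kind of clash meant above; dplyr itself is deliberately not loaded here.

```r
x <- c(1, 2, 3, 4, 5)

# stats::filter() applies a linear filter; here a 3-point moving average.
# The prefix makes clear which 'filter' is meant, whatever else is attached.
smoothed <- stats::filter(x, rep(1 / 3, 3))

# base::Filter() is a different function: it keeps elements matching a predicate.
kept <- base::Filter(function(v) v > 2, x)

# utils::find() lists the attached packages that define a given name,
# which helps to spot masking after several library() calls.
utils::find("filter")
```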

Best wishes
 Wolfgang

On 04/10/2017 22:20, Turaga, Nitesh wrote:

Hi Aaron,


A workaround may be to put all library() calls in an eval=FALSE code 
chunk:

```{r, eval=FALSE}
library(scran)
library(scater)
```

etc.


This way the users can see the library() calls in the vignette.

Best,

Nitesh


On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie 
<valerie.obench...@roswellpark.org> wrote:

Hi guys,

A little background on this vignette -> package conversion. The workflows were 
converted to package form because we want to integrate them into the nightly build 
system instead of supporting separate machines as we're now doing.

As part of this conversion, packages loaded in workflow vignettes were moved to 
Depends in DESCRIPTION. This enables the user to load a single package instead 
of many. Packages were moved to Depends instead of Suggests (as is usually done 
with software packages) because the vignette is the only thing these workflow 
packages have going - no defined classes or methods. This seemed a more tidy 
approach and the dependencies are listed in Depends for the user to see. This 
was my (maybe bad?) idea and Nitesh was the messenger. If you feel the 
individual loading of packages in the vignette is a key part of the 
instruction/learning we can leave them as is and list the pa

Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow

2017-10-04 Thread Wolfgang Huber




--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de


Re: [Bioc-devel] strange error in Jenkins build forsingleCellWorkflow

2017-09-19 Thread Wolfgang Huber



19.9.17 18:16, Martin Morgan scripsit:

On 09/19/2017 09:50 AM, Wolfgang Huber wrote:

My 3 cents:
- I think this is a more and more common problem that I'm also 
encountering in everyday work and that asks for a general solution.
- I agree with Martin that setting R_MAX_NUM_DLLS is better than 
unloading. AFAIK it is not even possible to cleanly unload every 
package ('as if it had never been loaded') due to irreversible global 
effects; although I'd be happy to be educated otherwise.
- R_MAX_NUM_DLLS is not a sustainable solution either: the current 
default is 100, but e.g. on my MacOS 10.12 any value >152 leads to an 
error. Upping to the maximum 152 will give us some temporary respite 
but seems not really future-proof.


This was the R-core motivation for increasing the max to only 100, but 
it's still surprising to me that a modern OS has such a tight limit. 
I'll see if there are ideas in R-core.


 From our internal discussions there is some willingness to (continue) 
supporting large and complicated work flows, 


Doesn't need to be particularly large and complicated. Just the 
following in a fresh R session leads to 61 DLLs being loaded:


  library("tidyverse")
  library("DESeq2")
  .dynLibs()

and when I add a

  library("caret")

we're up to 72, and plus "scater", at 78.
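
For anyone who wants to repeat this count, a minimal sketch using base R only. It reports whatever is loaded in the current session, so the numbers will differ from the ones above depending on R version and installed packages.

```r
# Count the DLLs registered in the current session (base R only).
dlls <- getLoadedDLLs()
n_loaded <- length(dlls)

# R_MAX_NUM_DLLS is only visible here if it was set in the environment
# or in .Renviron; NA means R is using its built-in default.
configured <- Sys.getenv("R_MAX_NUM_DLLS", unset = NA)

message("DLLs loaded: ", n_loaded)
message("R_MAX_NUM_DLLS: ", if (is.na(configured)) "not set" else configured)

# Names of the first few loaded DLLs, to see which packages contribute
head(names(dlls))
```

Running this before and after each library() call shows how many DLLs a given package and its dependencies bring in.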

Best-
Wolfgang

but it is valuable to think
carefully about the consequences for users following along. Maybe part 
of this is clearly alerting the user to the fact that 500G of data are 
going to be downloaded, the workflow requires advanced configuration of 
R, etc.


@Aaron -- if you'd like to continue with one work flow, contact Herve 
(cc'd) and he'll provide the .BBSoptions configuration to allow the 
build system to use an appropriate R_MAX_NUM_DLLS. If instead you'd like 
to produce two workflows, then the best strategy in your case would be 
to simply have two independent packages (DESCRIPTION + vignettes/) each 
with more modest numbers of DLLs; contact Lori (cc'd) when you've 
decided on a second name, and we'll create the svn location for you.


Martin



 Wolfgang

19.9.17 12:02, Martin Morgan scripsit:

On 09/18/2017 10:42 PM, Shian Su wrote:

Hi Aaron,

Would you mind sharing the code for flushing DLLs? This is a problem 
that others working with single cells and I have faced.




For the user encountering this problem I think a better solution is 
to increase the number of DLLs allowed by R, for instance editing 
.Renviron to contain the line


R_MAX_NUM_DLLS=120

or similar. This can be on an installation-wide, user-wide, or 
project-specific basis, as described in ?Startup


@Aaron -- we are still discussing things internally; for instance it 
is possible to set the maximum number of DLLs in the build system.


Martin

Better yet, would anyone know of code that would allow unused DLLs to 
be identified and unloaded? I suspect not, as it would require 
keeping track of the dependency tree of your current environment, but 
I’m hopeful.


Kind regards,
Shian Su


On 19 Sep 2017, at 12:30 pm, Aaron Lun <a...@wehi.edu.au> wrote:

Well, inertia won out in the end, and so I've just moved a whole 
stack of packages into "Suggests" for now. This is probably not a 
sustainable solution as the workflow can potentially get larger 
over time; I would prefer to have some formal support for splitting 
up the workflow into modules that can be independently installed.


-Aaron

From: Vincent Carey <st...@channing.harvard.edu>
Sent: Saturday, 16 September 2017 10:08:13 PM
To: Aaron Lun
Cc: Martin Morgan; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in 
Jenkins build forsingleCellWorkflow


IMHO the pedagogic value of a unified document that treats a topic
thoroughly is quite high.  Building the whole workflow on an arbitrary
user's system seems to me to be a lower priority.  Thus using the
environment variable in the build system to avoid this limit seems an
appropriate solution.

On Sat, Sep 16, 2017 at 7:43 AM, Aaron Lun <a...@wehi.edu.au> wrote:
Thanks Martin. Yes, it's quite unfortunate that scater drags in 
dplyr and ggplot2, which - combined with Bioconductor's core 
packages - already puts us pretty close to the limit without doing 
anything else!



A solution might be to split my workflow into self-contained 
components, each of which can become its own workflow package 
(e.g., simpleSingleCell1, simpleSingleCell2, simpleSingleCell3 and 
so on). This should avoid all of the problems and our associated 
hacks.



I'm happy to do this, but is it possible for the website to 
indicate that there is a connection between the component 
workflows? For example, the link that ordinarily goes to the 
compiled workflow could instead go to an indexing page, which 
contains links to individual component workflows.

Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange error in Jenkins build forsingleCellWorkflow

2017-09-19 Thread Wolfgang Huber

 without having to execute special variants of biocLite() /
install.packages() / funky code in the vignette itself to be able to
build the vignette.

Loading a package loads its Depends: (and Imports:) so triggers the 
problem.


Writing separate vignettes would not help with this (but might make the
workflow more palatable; I'm not 100% sure of support for separate work
flows in a single package, there is no problem with having multiple
workflow packages on the same general topic).

One could move (some?) packages to Suggests: and use your trick of
unloading packages part-way through the vignette. But then users will
find that they need to install packages to complete the vignette.

'We' could add a support for a BBS option that increases R_MAX_NUM_DLLS,
but that would allow the workflow to build on the build system, but not
on the users' system.

I think also the R-core approach to this
(https://stat.ethz.ch/pipermail/r-devel/2016-December/073529.html,
https://github.com/wch/r-source/commit/757bfa1d7ff373a604d6d34617f9cad78e0c875e)
is a little insightful, where one could imagine increasing the default
R_MAX_NUM_DLLS, but apparently on some OS these compete for number of
open files, and this in turn can be quite low.

I note that users have already struggled with the DLL problem 'in the
wild' https://stackoverflow.com/a/45552926/547331. This seems
particularly problematic for workflows, which are appealing to
relatively novice users.

At the end of the day I think the workflows should make realistic use of
R resources. I think this means modifying the workflow to use fewer
DLLs. (this general comment is relevant to other workflows, which for
instance start by downloading very large data sets -- I know that less
constrained use of computing resources is supposed to be a selling point
of the workflows, but in excess this seems counter-productive to their
primary use as pedagogic tools [rather than, for instance, comprehensive
exemplars of reproducible research]).

Maybe there is additional discussion about some of the technical aspects
of workflows that others might contribute.

Martin




Cheers


Aaron


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Aaron Lun <a...@wehi.edu.au>
Sent: Wednesday, 21 June 2017 12:09:13 AM
To: bioc-devel@r-project.org
Subject: [Untrusted Server]Re: [Bioc-devel] strange error in Jenkins build forsingleCellWorkflow


Hi all,


I'm getting a curious error in the Jenkins log when I try to build 
the singleCellWorkflow:



http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/48/label=master/console 




The key part is at the bottom:


Error: package or namespace load failed for 'GenomicFeatures' in dyn.load(file, DLLpath = DLLpath, ...):
  unable to load shared object '/var/lib/jenkins/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/libs/Rsamtools.so':
  `maximal number of DLLs reached...


The workflow had previously been running fine on the build system; 
I'm not quite sure what's going on here, given that it's not even 
failing at the point where I made the latest changes.


Cheers,

Aaron


--
With thanks in advance-
Wolfgang

---
W

Re: [Bioc-devel] adding multiple vignettes not all should be processed by the build-system

2017-07-21 Thread Wolfgang Huber



Hi Maarten

The most pragmatic way is perhaps to add a regular .Rmd file into the 
"vignettes/" directory, which does nothing but link (or even redirect?) 
to the prebuilt HTML file. (Not sure whether that can even be in the 
same directory, or whether it's better to put it somewhere under "inst/"?)


There's probably a way to copy the pre-built HTML files into the right 
place in "vignettes/" directory using a Makefile and making sure the 
indexing and landing page display is done. This used to work for vsn, 
but more recently I concluded this could be too unrobust and unportable. 
There's probably also value for your users in clearly seeing a 
difference between the dynamically built and the static documents, and 
in not confusing them with non-standard Makefile magic.


Wolfgang



19.7.17 09:07, Maarten van Iterson scripsit:

Dear all,

I've created a package that contains one vignette showing its core
functionality that I would like to be processed by the build-system and two
other vignettes showing more comprehensive examples, e.g. including
downloading publicly available data from GEO and 1000G vcfs which take too
much time and memory resources. I would like to show these two vignettes on
the landing page using prebuilt HTML but don't want them to be processed by
the build-system.

Is this possible and if yes, what is the appropriate way to accomplish this?

The package is currently under review and called omicsPrint.

Regards,
Maarten




--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de


Re: [Bioc-devel] Bioconductor issue 396: Vignette building problem for OPWpaper package

2017-07-17 Thread Wolfgang Huber


Hi Shakil

Why don't you use the regular mechanism for data in packages, e.g.: 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Data-in-packages
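
A small sketch of what the regular mechanism looks like from the user's side. The `datasets` package and `mtcars` are only stand-ins that ship with R; for a package of your own, the data files would live in its `data/` directory and be loaded by name in the same way.

```r
# Load a data set shipped in a package's data/ directory by name.
# 'datasets' and 'mtcars' are stand-ins; replace them with your own
# package and data set names.
e <- new.env()
data("mtcars", package = "datasets", envir = e)
ls(e)

# Files under inst/ are better addressed by their full installed path
# than via setwd(); system.file() returns "" if the file is not found.
f <- system.file("DESCRIPTION", package = "datasets")
file.exists(f)
```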


Btw, is the preprint of the associated paper out already?

best wishes
Wolfgang


17.7.17 10:25, Shakil Mohamad scripsit:

Hi,

I manually stored some .RDATA files in one of my package's folders, named
'results', located in the 'inst' folder (
https://github.com/mshasan/OPWpaper/inst/simulations/results). I need to
use those .RDATA files to build the vignettes. For example, one of my vignettes
is named 'FWER_and_FDR', in which I tried the following code to load the .RDATA.
It passed R CMD check but did not pass BiocCheck. Can you tell me
what I can do?

```{r load_fwer_data}
fwer_dat <- system.file("simulations/results", package = "OPWpaper")
setwd(fwer_dat)
load("simu_fwer.RDATA")

# another way-
# load(system.file("simulations/results", "simu_fwer.RDATA",
#      package = "OPWpaper"), envir = environment())
```

Thank you, I appreciate your time and considerations.

-
Shakil




[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] invalid PDF version and NULL vector errors in unchanged package

2017-05-18 Thread Wolfgang Huber


Dear Sokratis

What do you get when you run the package vignette on a current R-devel 
on your own machine?


Even if your code didn't change, that of R, or of packages your package 
depends on, will have changed and may challenge assumptions that you 
make in your code.


best wishes
Wolfgang

17.5.17 15:39, Sokratis Kariotis scripsit:

Hey all,

I maintain 'pathprint', a package added to bioconductor in the last
release. Now the BioC 3.6 BUILD (
http://bioconductor.org/checkResults/3.6/bioc-LATEST/pathprint/)
<http://bioconductor.org/checkResults/3.6/bioc-LATEST/pathprint/> is
producing an error in all 3 platforms (despite it used to be error-free and
there were no changes to the code after that). *malbec1* and *tokay1* share
the same error:

Error: processing vignette 'exampleFingerprint.Rnw' failed with diagnostics:
invalid PDF version
Execution halted

while *veracruz1* provides a different error

Error: processing vignette 'exampleFingerprint.Rnw' failed with diagnostics:
  chunk 2 (label = data)
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  :
   'data' must be of a vector type, was 'NULL'
Execution halted

The two errors seem to be different, the first one appears at the very
beginning and is general,
while the second one is having a problem with the data chunk of the file.

Could any changes in BioC 3.6 be responsible, since the code didn't
change at all? Thanks in advance for any help!

Regards,

Sokratis

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
With thanks in advance-
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Question about R functions

2017-03-31 Thread Wolfgang Huber



Thanks Juan; the .point used to be a way to do this, but since the 
introduction of namespaces to R, it is neither necessary nor sufficient 
for private functions. See e.g. .Hub in the AnnotationHub package, or 
the .Call function in base.


See 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-namespaces

http://r-pkgs.had.co.nz/namespace.html
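
A small sketch of the point, using the stats package as an example. The NAMESPACE line in the comment assumes the hypothetical function names FA, FB, FC and FD from the question; whether a function is private depends on that file, not on a leading dot.

```r
# NAMESPACE of the hypothetical package (FD is simply not listed):
#   export(FA, FB, FC)

# A dot prefix does not hide a function: stats exports .lm.fit.
dot_is_exported <- ".lm.fit" %in% getNamespaceExports("stats")

# Conversely, t.test.default has no dot but is not exported;
# it is only defined inside the namespace.
plain_is_exported <- "t.test.default" %in% getNamespaceExports("stats")
plain_exists <- exists("t.test.default", envir = asNamespace("stats"),
                       inherits = FALSE)

print(c(dot_is_exported, plain_is_exported, plain_exists))
```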

Wolfgang

30.3.17 22:46, Juan David Henao Sanchez scripsit:

Hi Wang

You can create internal functions calling them as ".function", the point is
necessary to declare an internal function. Additionally, you can put all
your internal functions in the same R file and is not necessary create the
documentation for this functions.

Best regards.

Juan D. Henao

2017-03-30 15:02 GMT-05:00 Jing Wang :


Hi,



I have three functions (FA,FB,FC) in the package and all these functions
need to call another function (FD). But I do not want other users to use
the function FD and thus I do not want to create the document for FD in the
R package.



Could you please give me some suggestion how to do that?



Thanks,








[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel







___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] R long vectors not supported yet on R 3.3.2

2017-03-10 Thread Wolfgang Huber



The human genome is arranged in 46 chromosomes. The longest is ~250 Mb 
(~2^28). While a Hilbert curve layout of a single chromosome tends to be 
informative, there is no obvious meaning in treating the complete human 
genome as a single 3 Gb linear sequence.


Wolfgang

On 10/03/2017 21:54, Wolfgang Huber wrote:

Two replies:

1. Downsampling?
In case you want to use the Hilbert curve for visualisation, please note
that you will need a graphics device with resolution 65536 x 65536 to
display it. Many people have smaller screens, so binning the genome
(e.g. into bins of 10x10=100nt) could be a practical solution, and more
efficient than computing some large intermediate thing that your
graphics device will then downsample anyway.

2. Long vector
In case you really need the big curve: I just had a look at the C code
in the "HilbertVis" package, which anyway uses long ints, and it does
not look difficult to modify the R wrapper so that it uses a long
vector. I assume that Simon would welcome the patch.

Wolfgang





9.3.17 08:44, Sohaib Ghani scripsit:

I am trying to simulate hilbertcurve (of Bioconductor package) of
level 16 in R. It takes about 4^16=4 Billion points. I want to
generate the hilbert curve of genome (size about 3 billion).

But I am getting this error

long vectors not supported yet: memory.c:1668

I am using 64 bit version (R 3.3.2) so my guess is I can use vectors
of length > 2^31. Also, my RAM is about 350GB.

The command I am using is

itr=4^16
hc = HilbertCurve(1, itr, 16, mode = "pixel", title = "pixel mode",
                  start_from = "topleft")

Even when I am reading the whole genome sometimes R is crashing in the
process.

I have read the other similar questions on this topic but could not
find the solution. Please help me what should I use for this problem.


Thanks

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Best wishes
Wolfgang

---
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] R long vectors not supported yet on R 3.3.2

2017-03-10 Thread Wolfgang Huber


Two replies:

1. Downsampling?
In case you want to use the Hilbert curve for visualisation, please note 
that you will need a graphics device with resolution 65536 x 65536 to 
display it. Many people have smaller screens, so binning the genome 
(e.g. into bins of 10x10=100nt) could be a practical solution, and more 
efficient than computing some large intermediate thing that your 
graphics device will then downsample anyway.
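
A rough sketch of the arithmetic behind the binning suggestion, in base R only. The bin width of 100 nt and the chromosome length of 250 Mb are the figures used in this thread; the positions are toy values and bin_index is a hypothetical helper, not part of any package.

```r
bin_width    <- 100      # nt per bin, as in the 10x10 example
chrom_length <- 250e6    # roughly the longest human chromosome

n_bins <- ceiling(chrom_length / bin_width)
level  <- ceiling(log(n_bins, base = 4))  # smallest level with 4^level >= n_bins
side   <- 2^level                         # pixels per side at that level

# Map 1-based positions to bins and count hits per bin.
bin_index <- function(pos, width) (pos - 1) %/% width + 1
pos    <- c(1, 100, 101, 250)             # toy positions
counts <- tabulate(bin_index(pos, bin_width), nbins = 3)

# The unbinned level-16 curve has more points than a 32-bit integer can index.
too_long <- 4^16 > .Machine$integer.max

print(c(level = level, side = side))
print(counts)
print(too_long)
```

With these numbers a single binned chromosome fits into a much lower curve level than 16, which is the practical argument for binning first.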


2. Long vector
In case you really need the big curve: I just had a look at the C code 
in the "HilbertVis" package, which anyway uses long ints, and it does 
not look difficult to modify the R wrapper so that it uses a long 
vector. I assume that Simon would welcome the patch.


Wolfgang





9.3.17 08:44, Sohaib Ghani scripsit:

I am trying to simulate hilbertcurve (of Bioconductor package) of level 16 in 
R. It takes about 4^16=4 Billion points. I want to generate the hilbert curve 
of genome (size about 3 billion).

But I am getting this error

long vectors not supported yet: memory.c:1668

I am using 64 bit version (R 3.3.2) so my guess is I can use vectors of length 
> 2^31. Also, my RAM is about 350GB.

The command I am using is

itr=4^16
hc = HilbertCurve(1, itr, 16, mode = "pixel", title = "pixel mode",
                  start_from = "topleft")

Even when I am reading the whole genome sometimes R is crashing in the process.

I have read the other similar questions on this topic but could not find the 
solution. Please help me what should I use for this problem.


Thanks

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] any interest in a BiocMatrix core package?

2017-03-03 Thread Wolfgang Huber


Dear Aaron

Thank you. I think it's an important simplification of a potential API 
when you are saying that what you mostly need are accessors

  m[i, ] and m[, i]
with i scalar or a short contiguous range, such that the value of that 
could be a relatively small ordinary matrix. (Compared to operations 
like matrix multiplication, SVD or other decompositions.)
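
To illustrate the access pattern, here is a sketch in plain R that walks over a matrix in short contiguous column ranges, so that each piece is a small ordinary matrix. The function process_by_column_block is hypothetical, and an ordinary in-memory matrix stands in for whatever sparse or disk-backed representation would sit behind m[, i] in practice.

```r
# Apply f to every column of m, reading the columns in contiguous blocks.
process_by_column_block <- function(m, block_size, f) {
  starts <- seq(1L, ncol(m), by = block_size)
  unlist(lapply(starts, function(s) {
    idx   <- s:min(s + block_size - 1L, ncol(m))
    block <- m[, idx, drop = FALSE]   # small ordinary matrix
    apply(block, 2, f)
  }))
}

m   <- matrix(1:12, nrow = 3)         # toy 3 x 4 matrix
res <- process_by_column_block(m, block_size = 2, f = sum)
print(res)
```

Only the line that extracts the block touches the storage, so the rest of the code would not need to know how the matrix is represented.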


Wolfgang

PS Loops per se in today's R are not as slow as some think: depending on 
the algorithm, the time "wasted" by the R interpreter on looking up 
symbols etc may (or may not) be negligible compared to the actual 
computations that are done at the C level anyway:


g = function(n) {
  s = 0
  for (i in seq_len(n))
    s = s + i
  s
}

cg = compiler::cmpfun(g)

print(system.time( g(1e6)))
   user  system elapsed
  0.161   0.000   0.161

print(system.time(cg(1e6)))
  user  system elapsed
  0.043   0.000   0.043



2.3.17 20:05, Aaron Lun scripsit:

I'll give two examples from the scran package. In both cases, the count
matrix is such that rows are genes and columns are cells. The first
example involves cell cycle phase assignment (from the cyclone()
function, FYI). Briefly, upon entry to C++, the function:

1. Loops through the cells, one at a time.
2. For each cell, it applies a classifier to the counts for that cell
(i.e., a column of the count matrix). This is not a straightforward
operation and also involves a number of random permutations.
3. Returns a set of scores representing the phase assignment.

For a few cells, I could conceivably move the loop into R and just
supply the column counts for each cell via .Call, which would avoid the
need to interact with the matrix in C++. However, if I were to process
one million cells, the slowness of R's loops would really hurt.

The second example involves normalization using a pooling and
deconvolution algorithm (from the computeSumFactors() function). Upon
entry into C++, the function:

1. Loops through an ordered set of cells.
2. At each cell, the neighbouring set of 20-100 cells defines a sliding
window. Counts for all cells in the window are summed together to create
a pooled expression profile.
3. The pooled profile is used to obtain a size factor, by computing the
median of the ratios between the pool and a pseudo-cell.
4. This is repeated for all cells in the set (i.e., all positions of the
window). Each window corresponds to a pool; the function stores the
identity of the cells in the pool and the size factor for the pool.

The output is used to construct a linear system at the R level, which is
solved to obtain cell-specific size factors. Again, the work done within
the loop is not obviously vectorizable with standard functions.

All of the cases I work with involve processing one row or column at a
time; I generally don't do matrix operations that require random access,
at least not at the C++ level.

Another motivation for moving into C++ is the greater control over
memory management. For a decent number of cells, this can make the
difference between something being runnable or not.

Cheers,

Aaron

On 02/03/17 18:09, Wolfgang Huber wrote:

Aaron

Can you describe use cases, i.e. intended computations on these
matrices, esp. those for which C++ access is needed?

I'm asking b/c the goals of efficient code and abstraction from how the
data are stored may be conflicting - in which case critical algorithms
may end up circumventing a prematurely defined API.

Wolfgang


25.2.17 00:37, Aaron Lun scripsit:

Yes, I think double-precision would be necessary for general use. Only
the raw count data would be integer, and even then that's not
guaranteed (e.g., if people are using kallisto or salmon for
quantification).


-Aaron



From: Vincent Carey <st...@channing.harvard.edu>
Sent: Saturday, 25 February 2017 9:25 AM
To: Aaron Lun
Cc: Tim Triche, Jr.; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] any interest in a BiocMatrix core package?

What is the data type for an expression value?  Is it assumed that
double precision will be needed?

On Fri, Feb 24, 2017 at 4:50 PM, Aaron Lun
<a...@wehi.edu.au<mailto:a...@wehi.edu.au>> wrote:
It's a good place to start, though it would be very handy to have a
C(++) API that can be linked against. I'm not sure how much work that
would entail but it would give downstream developers a lot more
options. Sort of like how we can link to Rhtslib, which speeds up a
lot of BAM file processing, instead of just relying on Rsamtools.


-Aaron


From: Tim Triche, Jr. <tim.tri...@gmail.com<mailto:tim.tri...@gmail.com>>
Sent: Saturday, 25 February 2017 8:34:58 AM
To: Aaron Lun
Cc: bioc-devel@r-project.org<mailto:bioc-devel@r-project.org>
Subject: Re: [Bioc-devel] any interest in a BiocMatrix core package?

yes

the DelayedArray framework that handles HDF5Array, etc. seems like the
right choice?

--t

On Fri, Feb 24, 2017 at

Re: [Bioc-devel] any interest in a BiocMatrix core package?

2017-03-02 Thread Wolfgang Huber

Aaron

Can you describe use cases, i.e. intended computations on these 
matrices, esp. those for which C++ access is needed?

I'm asking b/c the goals of efficient code and abstraction from how the 
data are stored may be conflicting - in which case critical algorithms 
may end up circumventing a prematurely defined API.

Wolfgang

25.2.17 00:37, Aaron Lun scripsit:

Yes, I think double-precision would be necessary for general use. Only the raw 
count data would be integer, and even then that's not guaranteed (e.g., if 
people are using kallisto or salmon for quantification).

-Aaron

From: Vincent Carey 
Sent: Saturday, 25 February 2017 9:25 AM
To: Aaron Lun
Cc: Tim Triche, Jr.; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] any interest in a BiocMatrix core package?

What is the data type for an expression value?  Is it assumed that double 
precision will be needed?

On Fri, Feb 24, 2017 at 4:50 PM, Aaron Lun wrote:
It's a good place to start, though it would be very handy to have a C(++) API 
that can be linked against. I'm not sure how much work that would entail but it 
would give downstream developers a lot more options. Sort of like how we can 
link to Rhtslib, which speeds up a lot of BAM file processing, instead of just 
relying on Rsamtools.

-Aaron

From: Tim Triche, Jr.
Sent: Saturday, 25 February 2017 8:34:58 AM
To: Aaron Lun
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] any interest in a BiocMatrix core package?

yes

the DelayedArray framework that handles HDF5Array, etc. seems like the right 
choice?

--t

On Fri, Feb 24, 2017 at 1:26 PM, Aaron Lun wrote:
Hi everyone,

I just attended the Human Cell Atlas meeting in Stanford, and people were talking 
about gene expression matrices for >1 million cells. If we assume that we can 
get non-zero expression profiles for ~5000 genes, we'd be talking about a 5000 x 1 
million matrix for the raw count data. This would be 20-40 GB in size, which would 
clearly benefit from sparse (via Matrix) or disk-backed representations 
(bigmatrix, BufferedMatrix, rhdf5, etc.).

I'm wondering whether there is any appetite amongst us for making a consistent 
BioC API to handle these matrices, sort of like what BiocParallel does for 
multicore and snow. It goes without saying that the different matrix 
representations should have consistent functions at the R level (rbind/cbind, 
etc.) but it would also be nice to have an integrated C/C++ API (accessible via 
LinkingTo). There's many non-trivial things that can be done with this type of 
data, and it is often faster and more memory efficient to do these complex 
operations in compiled code.

I was thinking of something that you could supply any supported matrix 
representation to a registered function via .Call; the C++ constructor would 
recognise the type of matrix during class instantiation; and operations 
(row/column/random read access, also possibly various ways of writing a matrix) 
would be overloaded and behave as required for the class. Only the 
implementation of the API would need to care about the nitty gritty of each 
representation, and we would all be free to write code that actually does the 
interesting analytical stuff.

Anyway, just throwing some thoughts out there. Any comments appreciated.

Cheers,

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Add journal citation to a package

2016-12-28 Thread Wolfgang Huber

Dear Andrea

Have a look at this:
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#CITATION-files
Many CRAN and Bioconductor packages are doing this, you could look at some of 
them for examples.
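
For orientation, a sketch of what such a file could contain. It would live at inst/CITATION in the package source; every field value below is a placeholder, not the actual reference.

```r
# Contents of a hypothetical inst/CITATION file; all field values are placeholders.
entry <- utils::bibentry(
  bibtype = "Article",
  title   = "Title of the paper",
  author  = utils::person("Firstname", "Lastname"),
  journal = "Name of the Journal",
  year    = "2016",
  volume  = "1",
  pages   = "1--10"
)
print(entry, style = "text")
```

Once the package is installed, citation("packagename") returns this entry instead of the one generated automatically from the DESCRIPTION file.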

Wolfgang


> On 28 Dec 2016, at 15:49, Rodriguez Martinez, Andrea 
>  wrote:
> 
> Hi,
> 
> I'd like to add the corresponding journal citation to my package. How can I 
> do that?
> 
> Thanks very much in advance,
> 
> Best wishes,
> 
> Andrea
> 
>   [[alternative HTML version deleted]]
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] .git folders in the repository

2016-12-15 Thread Wolfgang Huber

Today I get, after “svn up”:
whuber@boltzmann:~/madman/Rpacks$ find . -name .git -exec du -sh {} \;
 17M    ./anamiR/anamiR/.git
 56K    ./RareVariantVis/.git

Probably the “.git” folders should not be checked in?

Best wishes
Wolfgang

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] FlipFlop Ignores Read Strand and Requires Antiquated File Formats

2016-11-01 Thread Wolfgang Huber

Dear Dario

These all look like legitimate suggestions, and probably the package author has 
something to say on their substance.

But independent of the specifics here, I think we should also keep in mind that 
the motivations for package submission vary across the project.
Some contributors really take it on themselves to continuously develop and 
extend a solution for a particular field of science or technology.
This is great, in particular for users.

Others provide a ‘snapshot’ of their work at a particular point in time (e.g. 
end of PhD thesis) and make their research reproducible, so that others can 
build on top of it. This is the same as with research papers, which are also 
rarely supposed to be the “final word” on something but an important advance.
I think this is also great, for the scientific record, and for other method 
developers who can start from such a snapshot (if not literally then at least 
conceptually) and move on to do even greater things.

Best wishes
Wolfgang

> On Nov 1, 2016, at 6:00 GMT+1, Dario Strbenac <dstr7...@uni.sydney.edu.au> 
> wrote:
> 
> Hello,
> 
> The package FlipFlop is made for isoform quantitiation. Why are there no 
> options to specify the RNA-seq read strand ? Otherwise, the method produces 
> incorrect counts where overlapping genes on both strands are being 
> transcribed. Also, the software requires a SAM file as input. This is 
> inefficient, since most mapping results are stored as BAM files. It would be 
> better if FlipFlop made more use of the import and export functions available 
> in Rsamtools. Also, requiring the gene database to be in BED12 format creates 
> more unnecessary work for the user. ENSEMBL and GENCODE both provide GTF and 
> GFF3 files, which can easily be imported into R with functions provided by 
> rtracklayer.
> 
> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Ascona Workshop 2017: Call for abstracts

2016-10-18 Thread Wolfgang Huber

Ascona Workshop 2017 -- Statistical Challenges in Single-Cell Biology

Abstract Submission now open!

We would like to announce and invite your participation in a workshop on 
Statistical Challenges in Single-Cell Biology, to be held from April 30 to May 
5, 2017, at Monte Verità, Ascona, in the Italian-speaking part of Switzerland. 
The purpose of the workshop is to bring together participants from statistics, 
computational sciences, bioinformatics and biology, and to encourage 
interaction in an informal and cooperative atmosphere.

Confirmed invited speakers: Nicolas Aceto (University of Basel), Kobi Benenson 
(ETH Zurich), Bernd Bodenmiller (University of Zurich), Petra Dittrich (ETH 
Zurich), Raphael Gottardo (Fred Hutchinson Cancer Research Center), Takashi 
Hiiragi (EMBL Heidelberg), Peter Kharchenko (Harvard Medical School), Smita 
Krishnaswamy (Yale School of Medicine), Prisca Liberali (FMI Basel), John 
Marioni (Cancer Research UK, Cambridge), Nick Navin  (MD Anderson), Magnus 
Rattray (University of Manchester), Oliver Stegle (EMBL-EBI Hinxton), Valérie 
Taly (Paris Descartes)

We welcome your submissions for proposals for contributed presentations. All 
abstract submissions must be made online through the following form:
https://docs.google.com/forms/d/e/1FAIpQLSfUoJY3HMI4FPAS4knXww3y-wA_OG_uMPVzwA070o7AfnRXEg/viewform

Application closes January 15, 2017. Notifications will be sent by January 30, 
2017. More details are available at 
https://www.bsse.ethz.ch/cbg/cbg-news/ascona-2017.html

Niko Beerenwinkel
Peter Bühlmann
Wolfgang Huber

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Statistical Scalability for Streaming Data - 1/2 day workshop in London

2016-10-04 Thread Wolfgang Huber

Perhaps interesting for some:
 http://www.turing-gateway.cam.ac.uk/tgmw39.shtml

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Ascona workshop: Statistical Challenges in Single-Cell Biology, April 30 to May 5, 2017, Monte Verità, Ascona, CH

2016-08-29 Thread Wolfgang Huber


We would like to announce and invite your participation in a workshop on 
Statistical Challenges in Single-Cell Biology, held from April 30 to May 5, 
2017, at Villa Monte Verità, Ascona, on Lago Maggiore and in the foothills of 
the Alps in Switzerland. The purpose of the workshop is to bring together 
participants from statistics, computational sciences, bioinformatics and 
biology, and to encourage interaction in an informal and collegial atmosphere.

Confirmed invited speakers: Petra Dittrich (ETH Zurich), Nicolas Aceto 
(University of Basel), Smita Krishnaswamy (Yale School of Medicine), Oliver 
Stegle (EMBL/EBI Cambridge), Peter Kharchenko (Harvard Medical School), John 
Marioni (Cancer Research UK, Cambridge), Nick Navin  (MD Anderson), Kobi 
Benenson (ETH Zurich), Magnus Rattray (University of Manchester), Prisca 
Liberali (FMI Basel), Takashi Hiiragi (EMBL), Valérie Taly (Paris Descartes).

We welcome your submissions for proposals for contributed presentations.
More details are available at 
https://www.bsse.ethz.ch/cbg/cbg-news/ascona-2017.html

Registration to the workshop and submission of abstracts opens in October 2016.

Niko Beerenwinkel
Peter Bühlmann
Wolfgang Huber
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Is it OK for Rmd package vignettes to be rendered as PDF?

2016-08-18 Thread Wolfgang Huber



> On 17 Aug 2016, at 13:02, Henrik Bengtsson  wrote:
> 
> R CMD build, which is what triggers vignette  building, only supports one
> output file (HTML or PDF) per vignette. It will basically ignore duplicate
> output formats. This is by design / legacy reasons. Technically it wouldn't
> be hard to add support for multiple output formats, but that would require
> changes to R itself - I think it could be a useful feature.

Henrik, I’m sure you have more experience and insight with this than I, but I 
wonder when (at what stage) and what for R needs to be changed? It seems there 
are several issues:
(a) having both the PDF and HTML be built by the build system and be shipped 
with the package
(b) making them discoverable on the Bioc package landing page, and on the index 
page of the R-help system.
(c) making (a) and (b) easy and standardized for package authors

Re (a), on first sight, it seems that simply adding the YAML lines Ramon 
mentioned to the vignette will NOT achieve this (it looks like only whatever is 
the first output format stated, is produced), but  according to 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Writing-package-vignettes
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Non_002dSweave-vignettes
I expect that with sufficient cleverness with (i) a Makefile and/or (ii) 
registering your own VignetteBuilder (some wrapper around rmarkdown::render that 
makes sure both outputs are built, with only one run of the R code) it should 
be possible to achieve (a).

For something almost as good as (b) [or better?], you could have the HTML 
indexed, and in it e.g. at the top have a button with a link to the PDF file, 
for those who want to print it.

For (c), I suppose changing R would be handy. Or BiocStyle?

Wolfgang


> 

> A related question is where some prefer to have access to also the
> intermediate plain Markdown / TeX rather than the final HTML / PDF product,
> e.g. because they work better with screen readers.
> 
> The only way I see you can have a PDF and a HTML version at the same time
> is to create to identical vignettes each outputting a specific format.
> 
> Henrik
> 
> On Aug 17, 2016 12:17, "Ramon Diaz-Uriarte"  wrote:
> 
>> 
>> Dear All,
>> 
>> I am considering rewriting the vignette of one BioC package I maintain as
>> Rmd (it is currently Rnw). But I would like that the entry under
>> "Documentation" contain a PDF of the vignette; it can ideally also contain
>> the HTML version too, but I do not want it to not have the PDF[1].
>> 
>> 
>> I know I can add entries to the document header such as
>> 
>> output:
>>   BiocStyle::pdf_document:
>>     toc: true
>>   BiocStyle::html_document:
>>     toc: true
>> 
>> 
>> that will, when run locally via "render('file.Rmd', output_format =
>> 'all')", produce both formats.
>> 
>> 
>> 
>> I've googled around, but I am not sure about:
>> 
>> 1. If I have both output formats specified in the document header, will the
>> BioC page of the package actually show both the PDF and the HTML of the
>> vignette?
>> 
>> 
>> 2. Is it OK (in conforming with BioC policies, sensible[1], whatever) to
>> even try/want this? My reading of the doc for the BiocStyle
>> (https://www.bioconductor.org/packages/devel/bioc/vignettes/
>> BiocStyle/inst/doc/HtmlStyle.html)
>> seems to suggest that the "natural" thing for Rmd vignettes is to be
>> rendered as HTML, but I have not seen that producing PDF is discouraged
>> explicitly.
>> 
>> 
>> Best,
>> 
>> 
>> R.
>> 
>> 
>> [1] Why do I want to get a PDF if I am using Rmd? I want a PDF because this
>> is a fairly long document that some users want to be able to print. I want
>> HTML because some users prefer HTML and because I'd like to also place the
>> vignette as HTML in Github Pages. I think that the only way to accomplish
>> both is to use Rmd (not Rnw, even if I really, really, prefer LaTeX :-).
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Ramon Diaz-Uriarte
>> Department of Biochemistry, Lab B-25
>> Facultad de Medicina
>> Universidad Autónoma de Madrid
>> Arzobispo Morcillo, 4
>> 28029 Madrid
>> Spain
>> 
>> Phone: +34-91-497-2412
>> 
>> Email: rdia...@gmail.com
>>   ramon.d...@iib.uam.es
>> 
>> http://ligarto.org/rdiaz
>> 
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
>   [[alternative HTML version deleted]]
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] problem in building package GSAR

2016-08-06 Thread Wolfgang Huber

Dear Yasir
Did you actually upload the source code of these new functions to the subversion 
server?
I get, right now

huber@boltzmann:~/madman/Rpacks/GSAR/R$ svn info .
…
URL: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GSAR/R
Revision: 119920
…

"grep AggrF *" in that directory does not yield any results. 
"grep radial *" finds radial_ranking but not radial.ranking
etc.
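
The same comparison can be scripted. The sketch below collects the names assigned at top level in the .R files of a directory and reports exports that have no definition. The helper top_level_names and the toy directory are assumptions made for illustration; the export list is shortened to two of the names from the message below.

```r
# Collect names assigned at top level in the .R files of a directory.
top_level_names <- function(dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  nms <- character(0)
  for (f in files) {
    for (e in as.list(parse(file = f, keep.source = FALSE))) {
      is_assign <- is.call(e) &&
        as.character(e[[1]])[1] %in% c("<-", "=") &&
        is.name(e[[2]])
      if (is_assign) nms <- c(nms, as.character(e[[2]]))
    }
  }
  unique(nms)
}

# Toy source directory standing in for GSAR/R.
src <- file.path(tempdir(), "R_demo")
dir.create(src, showWarnings = FALSE)
writeLines(c("radial_ranking <- function(x) x",
             "WWtest <- function(x) x"),
           file.path(src, "funs.R"))

exports <- c("radial.ranking", "WWtest")   # as listed in NAMESPACE
missing_exports <- setdiff(exports, top_level_names(src))
print(missing_exports)
```

The code is parsed rather than sourced, so the check does not need the package's dependencies to be installed.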

Wolfgang

> On Aug 6, 2016, at 0:34 GMT+2, Rahmatallah, Yasir <yrahmatal...@uams.edu> 
> wrote:
> 
> Hi,
> 
> I recently added 6 more functions to package GSAR. I added the R files and 
> manual pages for the new functions. I updated the namespace file accordingly. 
> I can build the package successfully on my windows PC and 'R CMD check' 
> returns no errors or warnings. I can install the package and it works fine on 
> my PC. I use svn to update the development version (current development 
> version is 1.7.2). The last update returned the following error
> 
> ** testing if installed package can be loaded
> Error in namespaceExport(ns, exports) :
>  undefined exports: AggrFtest, radial.ranking, MDtest, RMDtest, findMST2.PPI, 
> TestGeneSets
> Error: loading failed
> 
> The undefined exports are the 6 new functions I added to the package. It's 
> like they are declared in the namespace file but they don't exist elsewhere. 
> Any idea what causes this error? Help is appreciated. The contents of the 
> namespace file are shown below:
> 
> export(
>  AggrFtest,
>  HDP.ranking,
>  radial.ranking,
>  WWtest,
>  KStest,
>  MDtest,
>  RKStest,
>  RMDtest,
>  GSNCAtest,
>  findMST2,
>  findMST2.PPI,
>  plotMST2.pathway,
>  TestGeneSets
>  )
> 
> import(igraph)
> importFrom("stats", "cor", "dist", "sd", "var.test", "p.adjust")
> importFrom("graphics", "legend", "mtext", "par", "plot", "title")
> 
> Thank you,
> Yasir
> 
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

Wolfgang

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Announcement of a new package called bacon

2016-06-11 Thread Wolfgang Huber


There exist people that use Bioconductor devel branch packages even though they 
are not developers and don’t subscribe to bioc-devel, so I think announcing on 
the support site can be appropriate.

I understand that problems might arise with & for users who do not easily 
handle versioning, but IMHO they are more than offset by highlighting a 
potentially useful package to as many people as possible as early as possible. 
(I haven’t checked this for a while - does ‘biocLite’ give a useful error 
message when asked to install a ‘devel only’ package on a system that is set up 
for release?)

Best wishes
Wolfgang

> On Jun 9, 2016, at 16:48 GMT+2, Maarten van Iterson <mviter...@gmail.com> 
> wrote:
> 
> So the policy could be: before release to bioc-devel and after release to
> the support-site. Maybe it is more useful to post only on the support-site
> on release? I assume most bioc-devel subscribers follow the support-site.
> 
> Maarten
> 
> On Thu, Jun 9, 2016 at 3:29 PM, James W. MacDonald <jmac...@uw.edu> wrote:
> 
>> I agree, but bacon is part of release. As Maarten said "
>> *I would like to introduce the package bacon that has been added to
>> bioconductor just before the release of version 3.3." *
>> 
>> On Thu, Jun 9, 2016 at 10:09 AM, Martin Morgan <
>> martin.mor...@roswellpark.org> wrote:
>> 
>>> On 06/09/2016 10:06 AM, James W. MacDonald wrote:
>>> 
>>>> You should post this on the support site (
>>>> https://support.bioconductor.org),
>>>> using the 'News' item description. This bioc-devel is intended for
>>>> discussion of issues with development of packages, not really for
>>>> announcing new packages.
>>>> 
>>> 
>>> actually the email to developers on package acceptance encourages them to
>>> post to bioc-devel.
>>> 
>>> I think the rationale was that the package is only available in devel, so
>>> advertising on the support site would just lead to disappointment.
>>> 
>>> Open to revised policies / suggestions, though.
>>> 
>>> Martin
>>> 
>>> 
>>>> Jim
>>>> 
>>>> 
>>>> 
>>>> On Thu, Jun 9, 2016 at 9:37 AM, Maarten van Iterson <mviter...@gmail.com
>>>>> 
>>>> wrote:
>>>> 
>>>> Dear list,
>>>>> 
>>>>> I would like to introduce the package bacon that has been added to
>>>>> bioconductor just before the release of version 3.3.
>>>>> 
>>>>> bacon can be used to estimate and control for bias and inflation often
>>>>> present in epigenome- and transcriptome-wide association
>>>>> studies(EWAS/TWAS). The idea behind bacon is to estimate the empirical
>>>>> null
>>>>> distribution from the data (using Bayesian statistics) and use the
>>>>> empirical null i.s.o. a theoretical null for inference. Bacon supports
>>>>> bias- and inflation-controlled fixed-effect meta-analysis that can be
>>>>> executed in parallel. A manuscript is available from biorxiv (
>>>>> http://biorxiv.org/content/early/2016/05/27/055772).
>>>>> 
>>>>> Kind regards,
>>>>> 
>>>>> Maarten van Iterson
>>>>> 
>> 
>> 
>> 
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>> 
> 

Wolfgang

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Workflows for Bioconductor channel at F1000R

2016-06-02 Thread Wolfgang Huber

Susan

I’m giving my reply via the Bioconductor mailing list to also include the 
others, both for input from the core team, and for information of other 
workflow authors.

The preferred approach is to host a live version of the workflow on 
http://www.bioconductor.org/help/workflows/ - with the obvious advantages of 
easy installation for users (e.g. 
http://www.bioconductor.org/help/workflows/rnaseqGene) and continuous testing.

The .Rmds are (and can be maintained by the authors) at 
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/workflows . The 
Bioconductor project is moving from subversion to github, maybe Martin or 
someone can outline the ramifications, if any, for authors.

To get the document from there into F1000R, Mike Love made notes: 
https://docs.google.com/document/d/1SC1Z6TS6eepNelu4Eyng7cdbJBMXzZZoMOzfLEYe0LE/edit
The main effort being in manually uploading the figures to Overleaf and in 
manually adding figure captions. I wish this could be further automated.

Kind regards
Wolfgang

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
http://www.huber.embl.de

> On 1 Jun 2016, at 16:52, Susan Holmes <susanats...@gmail.com> wrote:
> 
> Hi there,
> So I managed to get our workflow paper in before midnight last night (it is 
> quite long, mainly because of the pictures), I am in touch with their office 
> through
> someone called Paola and she says they prefer a GitHub repository,
> did you do that? Is there a preferred directory/document structure?
> Should I make a Bioconductor Workflow like document to publish alongside?
> 
> 
> Best
> Susan
> 
> 
> Susan Holmes
> Professor, Statistics and BioX
> John Henry Samter University Fellow
> coDirector, Math Comp Sci
> 102, Sequoia Hall,  
> 390 Serra Mall,
> Stanford
> http://www-stat.stanford.edu/~susan/
> 
> 

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] downloads number status shields for packages

2016-05-08 Thread Wolfgang Huber

>>>>>>> works for CRAN packages http://cranlogs.r-pkg.org/
>>>>>>> 
>>>>>>> It could take total `Nb of downloads` from the right table like here
>>>>>>> http://bioconductor.org/packages/stats/bioc/RTCGA.html
>>>>>>> 
>>>>>>> Best,
>>>>>>> Marcin
>>>>>>> 
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
> 

Wolfgang

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Bioconductor-related papers -- special F1000R collection for Bioc2016

2016-03-25 Thread Wolfgang Huber

Bioconductor Developers:

The Bioconductor channel on F1000Research< 
http://f1000research.com/channels/bioconductor > will be publishing a 
collection of articles in time for the BioC 2016 conference, and we would like 
to invite you to contribute. We are specifically looking to publish articles on 
the following:
- Cross-package workflows
- Package-based vignettes that aim to solve scientific problems
- Teaching labs
- Benchmark studies
- Methodological reviews
- Perspectives relating to bioinformatics software

We’ll use the rapid and transparent publication process of F1000Research for 
this article collection: versionable articles are published within a few days 
of your submission, together with associated source code and data, before 
open peer review by invited referees.

The submission deadline for this collection is May 31st. This is a firm 
deadline to ensure papers are published before BioC starts.  Articles will be 
published as and when they are ready, and we expect all contributions will be 
published by June 17th.
Tom Ingraham, who will be helping to coordinate the submissions on the 
F1000Research side of things, will follow up shortly with further details and 
instructions.

We look forward to hearing from you

Kind regards
Wolfgang
(For all the editors)
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Package reference manuals in html

2016-03-19 Thread Wolfgang Huber

Dear Andrzej

Thank you. If automagic resolution of cross-links is the one and major 
bottleneck, then maybe we can drop this? 
(The perfect being the enemy of the good…). Assuming a universal namespace, 
i.e. where symbols are mapped to other packages and where this is considered to 
be universal and session-independent anyway seems dubious to me.

Re resources, we (I) should be able to find them (both CPU and disk) at EMBL HD 
or EBI.

Kind regards
Wolfgang

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
http://www.huber.embl.de
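
For readers following the cross-link discussion, a minimal sketch (not from the thread) of rendering a single Rd page with tools::Rd2HTML(), the plain renderer mentioned in the quoted message below. The Rd content and the temporary file names are made up for illustration. The link target is spelled out as \link[stats]{sd}, so the package is named explicitly rather than resolved automatically.

```r
# Write a tiny, made-up Rd file to a temporary location.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{A demo page}",
  "\\description{See also \\code{\\link[stats]{sd}}.}"
), rd_file)

# Render it to static HTML with the plain renderer from the tools package.
html_file <- tempfile(fileext = ".html")
tools::Rd2HTML(tools::parse_Rd(rd_file), out = html_file)

# Check that the page title made it into the generated HTML.
html <- readLines(html_file)
has_title <- any(grepl("A demo page", html, fixed = TRUE))
print(has_title)
```

Links written without a package, such as \linkS4class{ClassName}, are the case that needs an R installation aware of the other packages.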

> On 16 Mar 2016, at 16:33, Andrzej Oleś <andrzej.o...@gmail.com> wrote:
> 
> Hi all,
> 
> I had a discussion earlier today with Martin and Dan on providing online man 
> pages for Bioconductor packages. As we dived into implementation details, it 
> turned out that this idea is a little bit more complex and resource-intensive 
> than originally anticipated.
> 
> The main problem in generating man pages in a repository-wide fashion seems 
> to be the cross-linking of packages. Briefly, in order to generate the links, 
> apparently one needs to generate the html pages in an R installation which is 
> aware of the other packages. For example, the Rd macro 
> \linkS4class{ClassName} takes as argument only the class name, and the 
> corresponding package containing the class definition is "automagically" 
> resolved by R. I'm not sure how this could be done manually, on a per-package 
> basis. So by the end of the day, in order to generate static man pages, we 
> would need to maintain a complete BioC repo installation, possibly on a 
> system with the --enable-prebuilt-html configure option. Unfortunately, it 
> seems unfeasible to exploit the build servers for this, as this would 
> significantly increase the computational burden. This is because currently 
> only around 2/5 of all software and data packages are actually being 
> installed by the build system. The rest which does not have any reverse 
> dependencies is skipped. Installing the remaining 3/5 of  packages on a 
> regular basis, not to mention the heavy annotation packages, is a little bit 
> of an overkill. So piggy-backing on the existing infrastructure doesn't seem 
> realistic.
> 
> On top of this, even if we would have access to a machine with a complete, 
> up-to-date BioC installation (maybe by just updating the packages after the 
> repo gets rebuilt rather than re-installing them each time from scratch), it 
> remains an open question how "external" links to, let's say, CRAN packages, 
> or even base R packages, should be handled.
> 
> A lightweight and easy to implement alternative for those willing to share 
> self-hosted documentation of their packages, could be to provide in the 
> package DESCRIPTION file a "Documentation" field containing a link to 
> external resource, which would then appear on the package landing page next 
> to the vignettes and pdf manual. The obvious downsides of this solution are: 
> 1. no package cross-links, and 2. the burden of keeping the documentation in 
> sync with the package version on BioC would be in maintainer's hands...
> 
> I will try to contact the authors of rdocumentation.org 
> <http://rdocumentation.org/> - maybe they have some useful comments or even 
> code which they would be willing to share. In any case, it would be good to 
> know what their experience is and why they stopped maintaining their 
> service. Maybe the BioC community could jump in and help them to resolve the 
> bottlenecks and keep the website up to date.
> 
> Cheers,
> Andrzej
> 
> 
> On Tue, Mar 8, 2016 at 4:36 PM, Andrzej Oleś <andrzej.o...@gmail.com 
> <mailto:andrzej.o...@gmail.com>> wrote:
> Hi Martin,
> 
> thank you for your suggestions - I would be happy to contribute to this! I 
> could help with developing the scripts for generating man pages, and 
> integrating them with the website layout.
> 
> As for rendering the man pages, I suggest that we try a similar approach to 
> the one used by knitr::knit_rd() rather than plain tools::Rd2HTML(). It has 
> the advantage that the examples are actually run, and the results, e.g. 
> plots, are included in the output documents. I hope you can appreciate the 
> added value by comparing the following man page rendered using 
> tools::Rd2HTML() and knitr::knit_rd(), respectively.
> http://www.huber.embl.de/users/aoles/man/Image.html 
> <http://www.huber.embl.de/users/aoles/man/Image.html>
> http://www.huber.embl.de/users/aoles/man/Image-knitr.html 
> <http://www.huber.embl.de/users/aoles/man/Image-knitr.html>
> Regarding the additional dependencies: we kind of already rely on knitr when 
> compiling vi

Re: [Bioc-devel] specify the color in plotMutationSpectrum() of SomaticSignatures library

2016-02-18 Thread Wolfgang Huber

Dear Rebecca

can you please post this in the Bioconductor user forum - this is not really a 
developer question. 

> On Feb 17, 2016, at 20:33 GMT+1, sun <firefly...@gmail.com> wrote:
> 
> Hi All,
> 
> How can I specify the color that I would like to use in
> plotMutationSpectrum()?
> 
> eg.
> 
> plotMutationSpectrum(sca_motifs, "study", normalize = TRUE), I would like
> to only use "red" color here, how should I do?
> 
> Thanks,
> 
> Rebecca
> 

Wolfgang

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Classes to be supported by ggraph

2016-02-06 Thread Wolfgang Huber

Dear Thomas

thank you for asking about this. 

The main information in a graphNEL is, of course, the vector of nodes 
(character) and a list of edges. That information can for sure go into an 
igraph. 

Then there is the slot `renderInfo`, which afaIu is (or can be) used to control 
how the graph should be rendered when plotted e.g. on a 2D screen with 
Rgraphviz. The last big refactoring seems (according to svn blame) due to Seth 
Falcon, Nishant Gopalakrishnan and Florian Hahne in 2010/11. (The package has 
been worked on by a variety of people over the years; e.g. my own 
involvement dates back to two bursts in 2004 and 07).

It is hard to tell how much of the renderInfo functionality is used out there 
in the wild, but the fact that its code hasn’t been touched for 5 years is 
consistent with assuming that interest in it is limited (otherwise there tend 
to be bug reports or requests for more functionality).

Kasper and Florian are probably lurking on this list and waiting to make 
additional comments.

In summary: I think your proposal is great, and you should go ahead. I expect 
the downsides, if any, to be minuscule.

Hope this helps
Wolfgang
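
To make the first point concrete, a minimal base-R sketch (not from the thread): a node vector plus a per-node list of edges, flattened into a from/to table. The toy data and variable names are assumptions. A table of this shape is what igraph::graph_from_data_frame() accepts, and igraph also has a direct graphNEL converter; neither is called here so that the snippet runs without extra packages.

```r
# Toy graph: a character vector of nodes and, per node, its outgoing edges.
nodes <- c("a", "b", "c")
edge_list <- list(a = c("b", "c"), b = "c", c = character(0))

# Flatten the per-node lists into one row per edge.
edges <- data.frame(
  from = rep(names(edge_list), lengths(edge_list)),
  to   = unlist(edge_list, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Every endpoint should be a known node.
stopifnot(all(c(edges$from, edges$to) %in% nodes))
print(edges)
```

Rendering hints of the kind stored in `renderInfo` have no place in this table and would have to be carried separately, for example as extra columns.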

> On Feb 6, 2016, at 12:20 GMT+1, Thomas Lin Pedersen <thomas...@gmail.com> 
> wrote:
> 
> I’m in the process of developing ggraph (https://github.com/thomasp85/ggraph) 
> - An extensive plotting framework for graph/tree/network data based on 
> ggplot2. Currently the class support is focused on igraph and dendrogram, but 
> the idea is to extend it with support for additional graph representations in 
> R. The easy way to do this is through conversion to one of the two currently 
> supported classes, but I would like to hear from the developers and users of 
> especially the graph package, whether there are any downsides to this 
> approach, i.e. are there features/information captured by the graphNEL class 
> that would disappear with conversion to an igraph object.
> 
> Also if someone is working with, or developing other graph representations 
> and would like to see support in ggraph, please let me know…
> 
> best
> 
> Thomas

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Update released version of Package

2016-01-04 Thread Wolfgang Huber

Something for a developer FAQ?
Kind regards
Wolfgang


> On 4 Jan 2016, at 19:35, Dan Tenenbaum  wrote:
> 
> Check out the release version from SVN at 
> 
> https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_2/madman/Rpacks/YOURPACKAGENAME
> 
> Make the fixes. 
> 
> Be sure to bump the version number accordingly, that is, do NOT make it the 
> same as the version number in devel.
> Bump only the last segment (z of the x.y.z version number).  For example, if 
> the version number is 1.0.0, bump it to 1.0.1.
> 
> Then commit the changes.
> 
> The changes in release should be minimal; change only what is necessary to 
> fix bugs/errors, or to enhance documentation.
> Other changes/new features/experiments in devel should not be backported to 
> release.
> 
> More information:
> 
> http://bioconductor.org/developers/how-to/source-control/
> http://bioconductor.org/developers/how-to/version-numbering/
> 
> Dan
> 
> - Original Message -
>> From: "Venkat Malladi" 
>> To: "bioc-devel" 
>> Sent: Monday, January 4, 2016 10:31:33 AM
>> Subject: [Bioc-devel] Update released version of Package
> 
>> Hi,
>> 
>> I noticed that in release 3.2 of my package some functions were never 
>> exported.
>> I have fixed this on the devel branch, but what is the best way for me to fix
>> this on the release 3.3?
>> 
>> Venkat
>> 
>> 
>> 
>> UT Southwestern
>> 
>> 
>> Medical Center
>> 
>> 
>> 
>> The future of medicine, today.
>> 

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
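
The version-bump rule in the quoted instructions (increase only z of x.y.z) can be sketched in base R as follows. The helper name bump_patch() is hypothetical and not part of any Bioconductor tool.

```r
# Hypothetical helper: increase only the last segment of an x.y.z version.
bump_patch <- function(version) {
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3L, !anyNA(parts))
  parts[3] <- parts[3] + 1L
  paste(parts, collapse = ".")
}

print(bump_patch("1.0.0"))  # "1.0.1", the example given in the message
```

Working on integer segments rather than on the string avoids mistakes such as turning 1.7.9 into 1.7.1 instead of 1.7.10.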

[Bioc-devel] Rendering of vignette R code on the website

2015-06-18 Thread Wolfgang Huber

Perhaps a buglet in the build system / the website code? --

When I follow the link to “R Script” on the page 
http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html
(i.e. 
http://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.R
 ) the result is a nearly empty file.

With other packages, e.g. BiocStyle, this does not happen, and code is shown: 
http://bioconductor.org/packages/devel/bioc/vignettes/BiocStyle/inst/doc/LatexStyle.R

I am also confused about the first lines of both of these files:
### R code from vignette source 'vignettes/DESeq2/inst/doc/DESeq2.Rnw'
### R code from vignette source 'vignettes/BiocStyle/inst/doc/LatexStyle.Rnw'
as the stated paths do (should) not exist.

Thanks and best wishes
Wolfgang

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
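
A sketch (not from the thread) of how such an "R Script" file can be produced locally: utils::Stangle() extracts the code chunks of an .Rnw vignette. The toy vignette and the temporary paths are made up for illustration, and whether the website's build system calls Stangle() in this way is an assumption.

```r
# Write a tiny, made-up Sweave vignette with a single code chunk.
rnw_file <- tempfile(fileext = ".Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<chunk1>>=",
  "x <- 1 + 1",
  "@",
  "\\end{document}"
), rnw_file)

# Extract the R code of all chunks into a script file.
r_file <- tempfile(fileext = ".R")
utils::Stangle(rnw_file, output = r_file, quiet = TRUE)

script <- readLines(r_file)
print(script)
```

The first line of the output is a "### R code from vignette source" header naming the input file, so an unexpected path there may simply reflect the location from which the build system ran the extraction.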

Re: [Bioc-devel] best way to transition users to new version of package

2015-05-27 Thread Wolfgang Huber

Robert

with the packages cellHTS, cellHTS2 and DESeq, DESeq2 (and with the functions 
vsn, vsn2 in the vsn package) I three times chose route 1, and am generally 
happy about it. In due time, you can deprecate and then defunct the old one.

Option 2 seems needlessly disruptive (potentially). A large fraction of users 
you never ‘see’ or get in contact with. Not sure how that translates into 
absolute numbers of course.

With option 3 it seems difficult to implement the exact same (and probably 
unsatisfactory) behaviour that people are used to.

People also seem used to that from other products (Windows 3.1, or now soon 10; 
iphone 6; X11; HTML5; …) 

Best wishes
Wolfgang
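
A minimal sketch (not from the thread) of the "deprecate, then defunct" step with base R's .Deprecated(). The function names old_summary() and new_summary() are hypothetical placeholders.

```r
# New implementation (placeholder logic).
new_summary <- function(x) sort(unique(x))

# Old entry point kept for one release cycle: warn, then delegate.
old_summary <- function(x) {
  .Deprecated("new_summary")
  new_summary(x)
}

# Existing user code keeps working; the warning is silenced here only
# to keep the example output short.
result <- suppressWarnings(old_summary(c(3, 1, 3)))
print(result)
```

In a later release the body of old_summary() can be replaced by .Defunct("new_summary"), which raises an error instead of a warning.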

 On 27 May 2015, at 17:10, Robert M. Flight rfligh...@gmail.com wrote:
 
 I am the author and maintainer of the categoryCompare package on
 Bioconductor. As I and others have used it over the years, I am seeing that
 there are a lot of design mistakes in the code, and that it was not
 extensible in it's current form. Therefore, I decided to do a complete
 rewrite starting from scratch. Because of a new logic, I decided on a
 completely new function naming scheme, class names, etc.
 
 There are currently no packages on Bioconductor that depend on my package,
 and I only know of a handful of other users that are actively using it (I
 have no posts on the support forum, and I've only gotten three emails
 directly with questions about using it).
 
 I'm trying to figure out how best to transition the few users who may have
 analysis code relying on the package. I have three possibilities in mind,
 ranging from what I consider most radical to least, and probably least
 amount of work on my part to most:
 
 1 - change name of new package to categoryCompare2 or something else. May
 lose old users who don't find the package. Could be mitigated by adding
 startupMessage to the next iteration of the original package, and adding
 information to Bioconductor landing page
 
 2 - add startupMessage's pointing users to vignettes with new workflow and
 functions, warning that old functions are completely gone.
 
 3 - provide wrappers with the same names as old package functions that use
 new functions under the hood, with warning that they will be deprecated in
 next version.
 
 I'd appreciate feedback on what the best approach would be in this case.
 
 Cheers,
 
 -Robert
 
 Robert M Flight, PhD
 Bioinformatics Research Associate
 Resource Center for Stable Isotope Resolved Metabolomics
 Markey Cancer Center
 University of Kentucky
 Lexington, KY
 
 Twitter: @rmflight
 Web: rmflight.github.io
 EM rfligh...@gmail.com
 PH 502-509-1827
 
 The most exciting phrase to hear in science, the one that heralds new
 discoveries, is not Eureka! (I found it!) but That's funny ... - Isaac
 Asimov
 

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Use and Usability metrics / shields

2015-05-14 Thread Wolfgang Huber



Can it be that the “in Bioc” shield is incorrect?
For instance, it says “9.98 years” for vsn but the first commit was in Oct 2002
Curiously “9.98 years” is stated for many old packages - surely we can use R 
for more precise date arithmetic?

Cheers
Wolfgang
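
A sketch (not from the thread) of the date arithmetic in question, in base R. The helper name years_between() is hypothetical, and the day of the month for the first vsn commit is an assumption, since the message only says October 2002. The end date is the date of this message.

```r
# Hypothetical helper: elapsed time between two dates in (average) years.
years_between <- function(start, end = Sys.Date()) {
  days <- as.numeric(difftime(as.Date(end), as.Date(start), units = "days"))
  days / 365.25
}

# Assumed first-commit date (day of month unknown) up to 14 May 2015.
vsn_years <- years_between("2002-10-01", "2015-05-14")
print(round(vsn_years, 2))
```

Under these assumptions the result is about 12.6 years, well above the 9.98 shown on the shield.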

 On May 13, 2015, at 21:36 GMT+2, Dan Tenenbaum dtene...@fredhutch.org wrote:
 
 
 
 - Original Message -
 From: Henrik Bengtsson henrik.bengts...@ucsf.edu
 To: COMMO Frederic frederic.co...@gustaveroussy.fr
 Cc: bioc-devel@r-project.org
 Sent: Wednesday, May 13, 2015 12:28:54 PM
 Subject: Re: [Bioc-devel] Use and Usability metrics / shields
 
 Sweet; you went live with the badges/shields, e.g.
 
  http://bioconductor.org/packages/release/bioc/html/affxparser.html
 
 A positive side effect is that now there's a link from the package
 page to the package's check results, which I always wanted :)
 
 
 That was there before (and still is, see the bottom of the Details section). 
 But yes, it was not very visible.
 
 Dan
 
 
 Thanks for adding this
 
 /Henrik
 
 
 On Sun, May 10, 2015 at 11:39 AM, COMMO Frederic
 frederic.co...@gustaveroussy.fr wrote:
 Dear Martin,
 
 All of these suggestions sound good.
 
 Wolfgang's suggestion regarding possible associated papers might be
 also great.
 
 Another useful information would be to point to other publications
 where a given package was used, and cited.
 I don't know if it's technically possible, but it would be greatly
 informative to know how frequently a package is used, and how it
 performs, in real contexts.
 
 Frederic Commo
 Bioinformatics, U981
 Gustave Roussy
 
 
 De : Bioc-devel [bioc-devel-boun...@r-project.org] de la part de
 Wolfgang Huber [whu...@embl.de]
 Date d'envoi : samedi 9 mai 2015 19:57
 À : Martin Morgan
 Cc: bioc-devel@r-project.org
 Objet : Re: [Bioc-devel] Use and Usability metrics / shields
 
 Dear Martin
 
 great idea.
 “Current build status” could perhaps be wrapped with
 “Cross-platform availability” into some sort of “Availability /
 Accessibility”?
 
 I wonder how informative it would be to make metrics such as
 (i) citations of the associated paper
 (ii) full-text mentions e.g. in PubmedCentral
 actually useful. (i) could be flawed if package and paper are
 diverged; (ii) would require good disambiguation, e.g. like
 bioNerDS http://www.biomedcentral.com/1471-2105/14/194 (or other
 tools? not my expertise). Do we have someone with capabilities in
 this area on this list?
 
 PS  Martin you’ll like Fig. 2 of their paper.
 
 Wolfgang
 
 
 
 
 
 On May 9, 2015, at 19:15 GMT+2, Martin Morgan
 mtmor...@fredhutch.org wrote:
 
 Bioc developers!
 
 It's important that our users be able to identify packages that
 are suitable for their research question. Obviously a first step
 is to identify packages in the appropriate research domain, for
 instance through biocViews.
 
 http://bioconductor.org/packages/release/
 
 We'd like to help users further prioritize their efforts by
 summarizing use and usability. Metrics include:
 
 - Cross-platform availability -- biocLite()-able from all or only
 some platforms
 - Support forum activity -- questions and comments / responses, 6
 month window
 - Download percentile -- top 5, 20, 50%, or 'available'
 - Current build status -- errors or warnings on some or all
 platforms
 - Developer activity -- commits in the last 6 months
 - Historical presence -- years in Bioconductor
 
 Obviously the metrics are imperfect, so constructive feedback
 welcome -- we think the above capture in a more-or-less objective
 and computable way the major axes influencing use and usability.
 
 We initially intend to prominently display 'shields' (small
 graphical icons) on package landing pages.
 
 Thanks in advance for your comments,
 
 Martin Morgan
 Bioconductor
 --
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109
 
 Location: Arnold Building M1 B861
 Phone: (206) 667-2793
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Use and Usability metrics / shields

2015-05-09 Thread Wolfgang Huber

Dear Martin

great idea.
“Current build status” could perhaps be wrapped with “Cross-platform 
availability” into some sort of “Availability / Accessibility”?

I wonder how informative it would be to make metrics such as
(i) citations of the associated paper
(ii) full-text mentions e.g. in PubmedCentral
actually useful. (i) could be flawed if package and paper are diverged; (ii) 
would require good disambiguation, e.g. like bioNerDS 
http://www.biomedcentral.com/1471-2105/14/194 (or other tools? not my 
expertise). Do we have someone with capabilities in this area on this list?

PS  Martin you’ll like Fig. 2 of their paper.

Wolfgang





 On May 9, 2015, at 19:15 GMT+2, Martin Morgan mtmor...@fredhutch.org wrote:
 
 Bioc developers!
 
 It's important that our users be able to identify packages that are suitable 
 for their research question. Obviously a first step is to identify packages 
 in the appropriate research domain, for instance through biocViews.
 
  http://bioconductor.org/packages/release/
 
 We'd like to help users further prioritize their efforts by summarizing use 
 and usability. Metrics include:
 
 - Cross-platform availability -- biocLite()-able from all or only some 
 platforms
 - Support forum activity -- questions and comments / responses, 6 month window
 - Download percentile -- top 5, 20, 50%, or 'available'
 - Current build status -- errors or warnings on some or all platforms
 - Developer activity -- commits in the last 6 months
 - Historical presence -- years in Bioconductor
 
 Obviously the metrics are imperfect, so constructive feedback welcome -- we 
 think the above capture in a more-or-less objective and computable way the 
 major axes influencing use and usability.
 
 We initially intend to prominently display 'shields' (small graphical icons) 
 on package landing pages.
 
 Thanks in advance for your comments,
 
 Martin Morgan
 Bioconductor
 -- 
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109
 
 Location: Arnold Building M1 B861
 Phone: (206) 667-2793
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Docker granularity: containers for individual R packages, running on a normal R installation?

2015-04-14 Thread Wolfgang Huber

Dear Sean
I understand the second point. As for .Call not being the right paradigm, then 
maybe some other method invocation mechanism? In essence, my question is 
whether someone already has figured out whether new virtualisation tools can 
help avoid some of the traditional Makevars/configure pain.
Wolfgang 






 On Apr 14, 2015, at 13:52 GMT+2, Sean Davis seand...@gmail.com wrote:
 
 Hi, Wolfgang.
 
 One way to think of docker is as a very efficient, self-contained virtual 
 machine.  The operative term is self-contained.  The docker containers 
 resemble real machines from the inside and the outside.  These machines can 
 expose ports and can mount file systems, but something like .Call would need 
 to use a network protocol, basically.  So, I think the direct answer to your 
 question is no.  
 
 That said, there is no reason that a docker container containing all complex 
 system dependencies for the Bioc build system, for example, couldn't be 
 created with a minimal R installation.  Such a system could then become the 
 basis for further installations, perhaps even package-specific ones (though 
 those would need to include all R package dependencies, also).  R would need 
 to run INSIDE the container, though, to get the benefits of the installed 
 complex dependencies.
 
 I imagine Dan or others might have other thoughts to contribute.  
 
 Sean
 
 
 On Tue, Apr 14, 2015 at 7:23 AM, Wolfgang Huber whu...@embl.de wrote:
 Is it possible to ship individual R packages (that e.g. contain complex, 
 tricky to compile C/C++ libraries or other system resources) as Docker 
 containers (or analogous) so that they would still run under a “normal”, 
 system-installed R. Or, is it possible to provide a Docker container that 
 contains such complex system dependencies such that a normal R package can 
 access it e.g. via .Call ?
 
 (This question exposes my significant ignorance on the topic, I’m still 
 asking it for the potential benefit of a potential answer.)
 
 Wolfgang
 

Re: [Bioc-devel] SummarizedExperiment subset of 4 dimensions

2015-03-31 Thread Wolfgang Huber


Hi Michael

where would you put the “colData”-style metadata for the 3rd, 4th, … dimensions?

As an (ex-)physicist of course I like arrays, and the more dimensions the 
better, but in practical work I’ve consistently been bitten by the rigidity of 
such a design choice too early in a process.

Wolfgang

 On 31 Mar 2015, at 13:32, Michael Lawrence lawrence.mich...@gene.com wrote:
 
 Taken in the abstract, the tidy data argument is one for consistent data 
 structures that enable interoperability, which is what we have with 
 SummarizedExperiment. The long form or tidy data frame is an effective 
 general representation, but if there is additional structure in your data, 
 why not represent it formally? Given the way R lays out the data in arrays, 
 it should be possible to add that fourth dimension, in an assay array, while 
 still using the colData to annotate that structure. It does not make the data 
 any less tidy, but it does make it more structured.
 
 On Tue, Mar 31, 2015 at 4:14 AM, Wolfgang Huber whu...@embl.de 
 mailto:whu...@embl.de wrote:
 Dear Jesper
 
 this is maybe not the answer you want to hear, but stuffing in 4, 5, … 
 dimensions may not be all that useful, as you can always roll out these 
 higher dimensions into the existing third (or even into the second, the 
 SummarizedExperiment columns). There is Hadley’s concept of “tidy data” (see 
 e.g. http://www.jstatsoft.org/v59/i10, a paper that is really worthwhile 
 to read), which implies that the tidy way 
 forward is to stay with 2 (or maybe 3) dimensions in SummarizedExperiment, 
 and to record the information that you’d otherwise stuff into the higher 
 dimensions in the colData covariates.
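
A minimal sketch of this rolling-out in base R, assuming a hypothetical 4-dimensional array (features x samples x alleles x time points; the dimension meanings and the names `a4`, `m` and `col_info` are made up for illustration). The extra dimensions become additional columns, and a data frame with one row per column records what each column stands for, which is the role colData would play:

```r
# Hypothetical 4-d array: 3 features x 3 samples x 2 alleles x 2 time points
a4 <- array(seq_len(3 * 3 * 2 * 2), dim = c(3, 3, 2, 2))

# Roll dimensions 2..4 out into a single column dimension.
# R stores arrays column-major, so reshaping needs no reordering.
m <- matrix(a4, nrow = dim(a4)[1])

# One row of column metadata per rolled-out column.
# expand.grid() varies its first factor fastest, matching the array layout.
col_info <- expand.grid(sample = paste0("s", 1:3), allele = 1:2, time = 1:2)

# Look up the column that holds sample s2, allele 2, time point 1
k <- which(col_info$sample == "s2" & col_info$allele == 2 & col_info$time == 1)
m[, k]
```

The same matrix and metadata could then be handed to a two-dimensional container; that step is left out here because it depends on the installed Bioconductor version.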
 
 Wolfgang
 
 Wolfgang Huber
 Principal Investigator, EMBL Senior Scientist
 Genome Biology Unit
 European Molecular Biology Laboratory (EMBL)
 Heidelberg, Germany
 
 T +49-6221-3878823 tel:%2B49-6221-3878823
 wolfgang.hu...@embl.de mailto:wolfgang.hu...@embl.de
 http://www.huber.embl.de http://www.huber.embl.de/
 
 
 
 
 
  On 30 Mar 2015, at 12:38, Jesper Gådin jesper.ga...@gmail.com 
  mailto:jesper.ga...@gmail.com wrote:
 
  Hi!
 
  The SummarizedExperiment class is an extremely powerful container for
  biological data(thank you!), and all my thinking nowadays is just circling
  around how to stuff it as effectively as possible.
 
  Have been using 3 dimension for a long time, which has been very
  successful. Now I also have a case for using 4 dimensions. Everything
  seemed to work as expected until I tried to subset my object, see example.
 
  library(GenomicRanges)
 
  rowRanges <- GRanges(
      seqnames="chrx",
      ranges=IRanges(start=1:3, end=4:6),
      strand="*"
  )
 
  coldata <- DataFrame(row.names=paste("s", 1:3, sep=""))
 
  assays <- SimpleList()
 
  #two dim
  assays[["dim2"]] <- array(0, dim=c(3,3))
  se <- SummarizedExperiment(assays, rowRanges = rowRanges, colData=coldata)
  se[1]
  #works
 
  #three dim
  assays[["dim3"]] <- array(0, dim=c(3,3,3))
  se <- SummarizedExperiment(assays, rowRanges = rowRanges, colData=coldata)
  se[1]
  #works
 
  #four dim
  assays[["dim4"]] <- array(0, dim=c(3,3,3,3))
  se <- SummarizedExperiment(assays, rowRanges = rowRanges, colData=coldata)
  se[1]
  #does not work
  #Error in x[i, , , drop = FALSE] : incorrect number of dimensions
 
  This is also the case for rbind and cbind. Would it be appropriate to ask
  you to update the SE functions to handle subset, rbind, cbind also for 4
  dimensions? I know the time for next release is very soon, so maybe it is
  better to wait until after April 16. Just let me know your thoughts about
  it.
 
  Jesper
 

Re: [Bioc-devel] Short URLs for packages?

2015-03-24 Thread Wolfgang Huber

Before we start a religious war, can we make progress on the pragmatic goal of 
making it possible to provide such URLs to people?

There are two concepts
- ‘the package' - a specific version, running in a specific environment, 
‘frozen’, etc. (Gabe)
- ‘the package’ - as a concept and a living artifact (me, Bernd, Tim)
Both are useful. And having URLs for both would also be useful.

Wolfgang







 On Mar 23, 2015, at 18:43 GMT+1, Tim Triche, Jr. tim.tri...@gmail.com wrote:
 
 I just meant that the mnemonic link
 
 http://www.bioconductor.org/limma/  (SEO version of limma ;-))
 
 could dump people at something like
 
 http://www.bioconductor.org/release/limma/3.22.7/   (I'd prefer this)
 
 or if need be for backwards compatibility,
 
 http://www.bioconductor.org/packages/3.0/limma/3.22.7/  (seems less good)
 
 instead of
 
 http://www.bioconductor.org/packages/3.0/bioc/html/limma.html  (current)
 
 and furthermore the specific version page could note more prominently that
 the build of limma being referenced at that particular instance in time may
 or may not be the same as was cited in a paper, used in an analysis,
 available for download the previous evening, etc. thus citation("limma") is
 a Very Good Idea when writing up results that depend upon it.  Because even
 the WEHI guys could theoretically have a bug that impacted someone's
 results (as opposed to the usual case of Didn't Read The Fine Limma Manual)
 
 Does that make more sense?  (Probably not, but worth a try)
 
 Statistics is the grammar of science.
 Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science
 
 On Mon, Mar 23, 2015 at 9:29 AM, Dan Tenenbaum dtene...@fredhutch.org
 wrote:
 
 
 
 On March 23, 2015 9:18:57 AM PDT, Tim Triche, Jr. tim.tri...@gmail.com
 wrote:
 
 Packages are (read: should be, IMHO) published, citable pieces of
 research, though. Imagine if a paper you cite were silently updated
 without the doi/citation changing. That wouldn't be good
 
 I don't disagree, but the existing setup does nothing to address that.
 Citation('limma'), for example, does.
 
 .../release/... and .../devel/... can change at any time, potentially
 overnight (with or without a new BioC release).  The only real way to
 cite an exact version is to cite that exact version, which is already
 the proper way to do things and would remain unaffected by this, at
 least AFAIK.
 
 Perhaps a useful addendum would be for the mnemonic
 
 http://bioconductor.org/limma
 
 To redirect to
 
 
 http://bioconductor.org/packages/limma/whateverTheMostRecentStableVersionMayBe/
 
 And then everything is explicit.
 
 Does that address the competing issues discussed herein?
 
 
 Note that 'release' and 'devel' are just symlinks to the current release
 and devel versions. I.e. currently 3.0 and 3.1 respectively. So you can
 always link directly to a specific version.
 
 Dan
 
 
 Best,
 
 --t

[Bioc-devel] Short URLs for packages?

2015-03-23 Thread Wolfgang Huber

I wonder whether it’d be possible to have the website understand URLs like
http://www.bioconductor.org/pkgname

This could resolve to 
http://www.bioconductor.org/packages/release/bioc/html/pkgname.html
or
http://www.bioconductor.org/packages/devel/bioc/html/pkgname.html
depending on whether the package was yet released.
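
As a small illustration of the mapping being asked for, here is a sketch in R that builds the long landing-page URL from a package name. The helper name `bioc_landing_url` is hypothetical and this is not how the website itself resolves anything; it only spells out the two target patterns given above.

```r
# Hypothetical helper: expand a package name to its long landing-page URL.
# 'branch' selects between the release and devel patterns quoted above.
bioc_landing_url <- function(pkg, branch = c("release", "devel")) {
    branch <- match.arg(branch)
    sprintf("http://www.bioconductor.org/packages/%s/bioc/html/%s.html",
            branch, pkg)
}

bioc_landing_url("limma")
bioc_landing_url("limma", "devel")
```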

This could be handy in papers or grants that mention packages.

Wolfgang



Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

T +49-6221-3878823
wolfgang.hu...@embl.de
http://www.huber.embl.de


Re: [Bioc-devel] recalling methods

2014-12-06 Thread Wolfgang Huber


Also some interest on our side to contribute.
Perhaps in particular the rendering of a useful index (or graph) of man pages on 
the fly in HTML / graphically.

Is it too ambitious to “learn” which methods are most important for objects of 
a particular class from analysing (running) a large code base (or even 
injecting a hook to that effect into a user’s R)? 

Wolfgang






 On Dec 6, 2014, at 1:19 GMT+1, Michael Love michaelisaiahl...@gmail.com 
 wrote:
 
 nice. I will play around with this. thanks Gabe!
 
 On Fri, Dec 5, 2014 at 6:37 PM, Gabe Becker becker.g...@gene.com wrote:
 Hey guys,
 
 Surgically removed from promptClass:
 
 classInSig <- function(g, where, cl) {
     cl %in% unique(unlist(findMethods(g, where)@signatures))
 }
 genWithClass <- function(cl, where) {
     allgen <- getGenerics(where = where)
     ok <- as.logical(unlist(lapply(allgen, classInSig, cl = cl,
                                    where = where)))
     allgen[ok]
 }
 
 genWithClass("IRanges", find(classMetaName("IRanges")))
  [1] "c"            "coerce"       "end<-"        "gaps"         "intersect"
  [6] "isNormal"     "names<-"      "names"        "pgap"         "pintersect"
 [11] "psetdiff"     "punion"       "reduce"       "reverse"      "setdiff"
 [16] "start<-"      "start"        "threebands"   "union"        "updateObject"
 [21] "update"       "width<-"      "width"
 
 
 For semantic guessing of which ones will be useful, I've got nothing (for
 now).
 
 ~G
 
 On Fri, Dec 5, 2014 at 11:28 AM, Michael Lawrence
 lawrence.mich...@gene.com wrote:
 
 Cool. I see hypertext as being useful here, because the generics and
 classes form an intricate and sometimes ambiguous web, especially when
 multiple inheritance and dispatch are involved. I think we should first
 build better tooling for introspecting S4 and for graph-based modeling and
 analysis of S4 architecture. For example, could we statically detect
 whether a dispatch ambiguity exists, knowing all of the methods and
 classes? And based on that build one or more end-user UIs?
 
 
 
 On Fri, Dec 5, 2014 at 11:05 AM, Michael Love
 michaelisaiahl...@gmail.com
 wrote:
 
 On Thu, Dec 4, 2014 at 4:01 PM, Michael Lawrence
 lawrence.mich...@gene.com wrote:
 
 I think this gets at the heart of at least one of the usability issues
 in Bioconductor: interface discoverability. Many simpler command line
 tools
 have a single-faceted interface for which it is easy to enumerate a list
 of
 features. There's definitely room for better ways to interrogate our
 object-oriented APIs, but it's challenging. Essentially we need a way
 for
 the user to ask "what can I do with this object?". Yes, we need better
 introspection utilities, but we also need to integrate the query with
 documentation. In other words, we need a more dynamic, more fluid help
 system, oriented around S4.
 
 
 I would be interested in working on this. A minimal goal for me is a
 function that just returns a character vector of the names of the
 generics defined for the object. Filtering that down to give methods
 which are likely relevant using the documentation will definitely be
 a bigger challenge.
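
A minimal sketch of that "minimal goal", using only the methods package and a toy class. The class `Toy`, the generics `area` and `perimeter`, and the candidate list are all hypothetical; a real tool would enumerate generics from the attached packages instead of a hand-written vector, and would also have to consider inherited methods, which `existsMethod()` deliberately ignores.

```r
library(methods)

# Toy S4 class with one method defined directly on it
setClass("Toy", representation(x = "numeric"))
setGeneric("area", function(obj) standardGeneric("area"))
setGeneric("perimeter", function(obj) standardGeneric("perimeter"))
setMethod("area", "Toy", function(obj) obj@x^2)

# Candidate generics to interrogate (hand-picked for this sketch)
candidates <- c("area", "perimeter", "show")

# existsMethod() is TRUE only for methods defined directly for the signature
hasToy <- vapply(candidates, existsMethod, logical(1), signature = "Toy")
candidates[hasToy]
```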
 
 
 
 
 
 On Thu, Dec 4, 2014 at 9:56 AM, Michael Love 
 michaelisaiahl...@gmail.com wrote:
 
 I was thinking about a request from someone at Bioc2014 (I can't
 remember at the moment)
 
 As an end-user, if I have an object x, how can I *quickly* recall the
 main methods for that? As in, without breaking my flow and going to
 ?myClass or help("myClass-class"). Suppose x is a GRanges, how can I
 remember that there is a method called narrow() which works on x?
 
 showMethods(classes=class(x)) will print out a huge list for many
 complex Bioc classes. And printTo=FALSE turns this huge list into an
 unhelpful character vector, e.g.:
 
 head(showMethods(classes="GRanges", printTo=FALSE), 8)
 [1] Function \.asSpace\:
  [3]  not an S4 generic function   
  [5] Function \.linkToCachedObject-\:  not an S4 generic
 function
  [7] Function
 \.replaceSEW\:
 
 any ideas?
 
 
 
 
 
 --
 Computational Biologist
 Genentech Research
 

[Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Wolfgang Huber

A colleague and I are designing a package for quantitative proteomics data, and 
we are debating whether to base it on the SummarizedExperiment or the 
ExpressionSet class. 

There is no immediate use for the ranges aspect of SummarizedExperiment, so 
that would have to be carried around with NAs, and this is a parsimony argument 
for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is 
cleaner, its code more modern and more likely to be updated, and users of the 
Bioconductor project are likely to benefit from having to deal with a single 
interface that works the same or similarly across packages, rather than a 
variety of formats; which argues that new packages should converge towards 
SummarizedExperiment(’s interface).

Are there any pertinent insights from this group?

Thanks and best wishes
Wolfgang


Re: [Bioc-devel] plotPCA for BiocGenerics

2014-11-02 Thread Wolfgang Huber

Just to bring the discussion back to the fact that there is a need to do 
/something/. A function plotPCA is defined in packages EDASeq, DESeq2, DESeq, 
affycoretools, Rcade, facopy, CopyNumber450k, netresponse, MAIT, with a real 
potential for needless user confusion. And BiocGenerics already defines the 
generics plotMA and plotDispEsts.

The need for BiocGenerics in the first place is a consequence of the S4 / Dylan 
/ Common LISP object system and the fact that our project releases more than 
one package. We should not confuse that with the other issues that came up in 
the thread.

To what extent functions that do related things should have the same name seems 
a matter of taste. Reducing the number of function names that are around, but 
increasing the number of classes, seems pretty much a zero-sum game to me. 
<irony> We could have a ‘compute’ generic, for all functions that compute 
something? Might make things easier for some users. Until some authors start 
using its argument ‘what’ to say what it should compute if it’s not already 
clear from the class of its argument(s). </irony> 

I second Mike’s suggestion and Kasper’s points.

Best wishes
Wolfgang

 On 1 Nov 2014, at 19:46, Kasper Daniel Hansen kasperdanielhan...@gmail.com 
 wrote:
 
 I see the argument for separating plotting and computation.
 
 I don't see the argument for changing plotPCA to plot.  base R has things
 that work either way; we all know hist(), boxplot() etc etc.  And for this
 specific case there are (good) arguments for the fact that one could
 envision several plots on a PCA object.
 
 But while I see the argument, by having a common class which all packages
 should use, it becomes pretty hard to have package specific customization
 (colors, phenodata etc etc), or it will at least require some thinking.
 
 Best,
 Kasper
 
 On Sat, Nov 1, 2014 at 2:21 PM, Michael Love michaelisaiahl...@gmail.com
 wrote:
 
 On Nov 1, 2014 1:29 PM, Michael Love michaelisaiahl...@gmail.com
 wrote:
 
 As far as the proposal of using the plot() function for all plots, I
 think for the biologists who are struggling already to get R going,
 and to figure out what kinds of plots are possible, plotMA (and
 knowing that the help is available at ?plotMA) is just so much simpler
 than the alternative (isn't it ?plot,MA-method for S4?).
 
 
 Scratch that... I forgot that finding help has to be ugly either way.
 
 
 
 
 On Fri, Oct 31, 2014 at 9:10 PM, Michael Lawrence
 lawrence.mich...@gene.com wrote:
 Sure, the ggplot model (returning an abstract representation of a plot,
 and
 then rendering it when requested, i.e., printed) is preferable to the
 side
 effects of base graphics. Unfortunately, plot() implies the side
 effect,
 which motivated the introduction of autoplot() in ggbio, and in fact we
 used Steve's type= parameter idea in many of the autoplot methods.
 While I
 agree that plotScree() could be preferable to plot(type="scree"), it's
 still beneficial to have the abstraction, if only for convenience and
 to
 support generic code. Btw, a (S3) pca object already exists: see
 ?princomp.
 
 Michael
 
 On Fri, Oct 31, 2014 at 3:53 PM, Ryan C. Thompson 
 r...@thompsonclan.org
 wrote:
 
 I'd just like to chime in that regardless of what approach is chosen,
 I
 definitely would appreciate a way to get the plot data without
 actually
 making the plot. I often end up reimplementing plots in ggplot so that
 I
 can easily customize some aspect of them, so in such cases I need a
 way to
 just get the plot data/coordinates.
 
 For example, if I have an edgeR DGEList and I want to get the X and Y
 coordinates for the MDS plot, I need to do something like:
 
 dev.new()
 mds.coords <- plotMDS(dge)
 dev.off()
 
 which is kind of unfortunate.
 
 So I guess this is more a reminder to people implementing plots to
 also
 implement a way to get the plot data.
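
To make the separation of computation and rendering concrete, here is a sketch with base R's `prcomp()` on simulated data. The matrix `x` and the names `scores` and `var_explained` are assumptions for illustration and are not taken from any of the packages mentioned in this thread. The coordinates and the scree data are plain objects that can be handed to any plotting system.

```r
set.seed(1)
# Simulated data: 20 samples (rows) x 10 features (columns)
x <- matrix(rnorm(200), nrow = 20, ncol = 10)

# Computation only: no graphics device is opened here
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Plot data for a scores plot: coordinates on the first two components
scores <- as.data.frame(pca$x[, 1:2])

# Plot data for a scree plot: fraction of variance per component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Rendering is a separate, optional step, e.g.
# plot(scores$PC1, scores$PC2)
# barplot(var_explained)
```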
 
 -Ryan
 
 
 On Fri 31 Oct 2014 03:43:04 PM PDT, Steve Lianoglou wrote:
 
 Hi,
 
 On Fri, Oct 31, 2014 at 2:35 PM, Thomas Lin Pedersen
 thomas...@gmail.com wrote:
 
 With regards to abstraction - I would personally much rather read
 and
 write code that contained plotScores() and plotScree() etc. where
 the
 intent of the code is clearly communicated, instead of relying on a
 plot()
 function whose result is only known from experience. Trying to
 squeeze
 every kind of visual output into the same plot generic seems
 artificial and
 constrained to me. I totally agree on the plotPCA critique on the
 other
 hand...
 
 
 If we've bought a ticket to ride on Kevin's and Michael's (and
 whoever
 else) train of thought, wouldn't plot(pca(x), type='scree') or
 plot(pca(x), type='scores') be the preferred way to go ... for some
 definition of preferable?
 
 -steve
 
 
 Thomas
 
 
 On 31 Oct 2014, at 22:09, Michael Lawrence 
 lawrence.mich...@gene.com
 wrote:
 
 I strongly agree with Kevin's position. plotPCA() represents two
 separate concerns in its very name: the computation and the
 rendering.
 Those need to be

Re: [Bioc-devel] rhdf5 help

2014-11-01 Thread Wolfgang Huber

For the record: see https://support.bioconductor.org/p/62283 which includes a 
reply.


 On 25 Oct 2014, at 21:07, Joseph Nathaniel Paulson jpaul...@umiacs.umd.edu 
 wrote:
 
 Hello,
 
 I'm in the process of writing a few wrappers for loading and writing out
 files in the biom-format
 http://biom-format.org/documentation/format_versions/biom-2.1.html that
 happens to be in HDF5 format. The rhdf5 package is great, but in
 particular, the beginning of every file (as an example:
 https://github.com/biocore/biom-format/blob/master/examples/rich_sparse_otu_table_hdf5.biom
 )
 has missing information that I can get running the command-line version of
 h5dump
 
 Running h5dump version 1.8.7 I'm able to see *creation-date*, *format-url*,
 *format-version*, etc (see below).
 
 However, running h5read/ls on the same object none of these
 categories/groups come up. My goal is to get the format-version, etc groups
 that are not showing up.
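
In the h5dump output further down, these fields are listed as ATTRIBUTEs of the root group rather than as groups or datasets, which would explain why h5read and h5ls do not list them. A hedged sketch of one way to look at them from R, assuming the installed rhdf5 provides `h5readAttributes()` and that the example file is in the working directory (this is not necessarily what the reply on the support site suggests):

```r
biom_file <- "./rich_sparse_otu_table_hdf5.biom"  # path from the example in this message
root_attrs <- NULL

# Guarded so that the sketch does nothing if rhdf5 or the file is missing
if (requireNamespace("rhdf5", quietly = TRUE) && file.exists(biom_file)) {
    # Read the attributes attached to the root group "/"
    root_attrs <- rhdf5::h5readAttributes(biom_file, "/")
    print(names(root_attrs))
}
```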
 
 
 Thank you,
 
 Joseph Paulson
 
 
 *Example:*
 
 *# in R*
 
 str(h5read("./rich_sparse_otu_table_hdf5.biom", "/"))
 List of 2
 $ observation:List of 4
  ..$ group-metadata: NULL
  ..$ ids   : chr [1:5(1d)] GG_OTU_1 GG_OTU_2 GG_OTU_3
 GG_OTU_4 ...
  ..$ matrix:List of 3
  .. ..$ data   : num [1:15(1d)] 1 5 1 2 3 1 1 4 2 2 ...
  .. ..$ indices: int [1:15(1d)] 2 0 1 3 4 5 2 3 5 0 ...
  .. ..$ indptr : int [1:6(1d)] 0 1 6 9 13 15
  ..$ metadata  :List of 1
  .. ..$ taxonomy: chr [1:7, 1:5] k__Bacteria p__Proteobacteria
 c__Gammaproteobacteria o__Enterobacteriales ...
 $ sample :List of 4
  ..$ group-metadata: NULL
  ..$ ids   : chr [1:6(1d)] Sample1 Sample2 Sample3 Sample4
 ...
  ..$ matrix:List of 3
  .. ..$ data   : num [1:15(1d)] 5 2 1 1 1 1 1 1 1 2 ...
  .. ..$ indices: int [1:15(1d)] 1 3 1 3 4 0 2 3 4 1 ...
  .. ..$ indptr : int [1:7(1d)] 0 2 5 9 11 12 15
  ..$ metadata  :List of 4
  .. ..$ BODY_SITE   : chr [1:6(1d)] gut gut gut skin ...
  .. ..$ BarcodeSequence : chr [1:6(1d)] CGCTTATCGAGA CATACCAGTAGC
 CTCTCTACCTGT CTCTCGGCCTGT ...
  .. ..$ Description : chr [1:6(1d)] human gut human gut human
 gut human skin ...
  .. ..$ LinkerPrimerSequence: chr [1:6(1d)] CATGCTGCCTCCCGTAGGAGT
 CATGCTGCCTCCCGTAGGAGT CATGCTGCCTCCCGTAGGAGT CATGCTGCCTCCCGTAGGAGT ...
 
 sessionInfo()
 R version 3.1.0 (2014-04-10)
 Platform: x86_64-apple-darwin10.8.0 (64-bit)
 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 other attached packages:
 [1]* rhdf5_2.10.0 *BiocInstaller_1.16.0
 loaded via a namespace (and not attached):
 [1] tools_3.1.0 zlibbioc_1.12.0
 
 *# Terminal *
 
 ./hdf5-1.8.7-mac-intel-x86_64-static/bin/h5dump ./rich_sparse_otu_table_hdf5.biom
 HDF5 "./rich_sparse_otu_table_hdf5.biom" {
 GROUP / {
   ATTRIBUTE creation-date {
  DATATYPE  H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
 }
  DATASPACE  SCALAR
  DATA {
  (0): 2014-07-29T16:16:36.617320
  }
   }
   ATTRIBUTE format-url {
  DATATYPE  H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
 }
  DATASPACE  SCALAR
  DATA {
   (0): "http://biom-format.org"
  }
   }
   ATTRIBUTE format-version {
  DATATYPE  H5T_STD_I64LE
  DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
  DATA {
  (0): 2, 1
  }
   }
   ATTRIBUTE generated-by {
  DATATYPE  H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
 }
  DATASPACE  SCALAR
  DATA {
  (0): example
  }
   }
   ATTRIBUTE id {
  DATATYPE  H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
 }
  DATASPACE  SCALAR
  DATA {
  (0): No Table ID
  }
   }
   ATTRIBUTE nnz {
  DATATYPE  H5T_STD_I64LE
  DATASPACE  SCALAR
  DATA {
  (0): 15
  }
   }
   ATTRIBUTE shape {
  DATATYPE  H5T_STD_I64LE
  DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
  DATA {
  (0): 5, 6
  }
   }
   ATTRIBUTE type {
  DATATYPE  H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
 }
  DATASPACE  SCALAR
  DATA {
  (0): otu table
  }
   }
 .
 
   [[alternative HTML version deleted]]
 

Re: [Bioc-devel] Add a new developer/maintainer

2014-10-03 Thread Wolfgang Huber

Dear Setia

just edit the “DESCRIPTION” file.

For maintainers, note the CRAN Repository Policy at 
http://cran.r-project.org/web/packages/policies.html which I think we generally 
also apply in Bioconductor: The package’s DESCRIPTION file must show both the 
name and email address of a single designated maintainer (a person, not a 
mailing list). That contact address must be kept up to date, and be usable for 
information mailed by the CRAN team without any form of filtering, confirmation 


See also 
http://www.bioconductor.org/developers/package-guidelines/#responsibilities
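
For illustration, a sketch of how an author list with exactly one maintainer can be written and checked with `person()` from the utils package, which is the notation an Authors@R field in DESCRIPTION uses. The names and the address are placeholders, and whether a given package uses Authors@R or the plain Author/Maintainer fields is an assumption to check against the package itself.

```r
# Placeholder people; the "cre" role marks the single designated maintainer
authors <- c(
    person("Ada", "Example", email = "ada@example.org", role = c("aut", "cre")),
    person("Ben", "Example", role = "aut")
)

# Check that exactly one person carries the maintainer role
is_cre <- vapply(seq_along(authors),
                 function(i) "cre" %in% authors[i]$role,
                 logical(1))
sum(is_cre)

# The maintainer entry as it would be shown
format(authors[is_cre], include = c("given", "family", "email"))
```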
 
Wolfgang




Il giorno Oct 3, 2014, alle ore 9:10 GMT+2, Setia Pramana setia.pram...@ki.se 
ha scritto:

 Dear Bioc Developer,
 I would like to add another person as developer and maintainer of my package 
 IsoGeneGUI. Please let me know the procedure for that. Look forward to your 
 suggestion. Thank you.
 Best regards,
 Setia

[Bioc-devel] Workflows

2014-09-29 Thread Wolfgang Huber


Sorry if I have overlooked something… referring to 
http://www.bioconductor.org/developers/how-to/workflows Is there a standardized 
way to manage
- dependencies
- versions

In principle, these could be automagically computed (?), but would still have 
to be exposed to workflow users using an afaIcs not yet existing mechanism (?)

Wolfgang


Re: [Bioc-devel] Workflows

2014-09-29 Thread Wolfgang Huber

Dear Dan

Thanks. What is the recommended procedure for people wanting to run the 
workflow on their own computer? (E.g. for teaching).
Or even to prevent them from doing odd things like using the wrong versions of 
R and packages?

Best wishes
Wolfgang






Il giorno Sep 29, 2014, alle ore 21:32 GMT+2, Dan Tenenbaum 
dtene...@fhcrc.org ha scritto:

 
 
 - Original Message -
 From: Wolfgang Huber whu...@embl.de
 To: bioc-devel@r-project.org
 Sent: Monday, September 29, 2014 12:24:54 PM
 Subject: [Bioc-devel] Workflows
 
 
 Sorry if I have overlooked something… referring to
 http://www.bioconductor.org/developers/how-to/workflows Is there a
 standardized way to manage
 - dependencies
 - versions
 
 In principle, these could be automagically computed (?), but would
 still have to be exposed to workflow users using an afaIcs not yet
 existing mechanism (?)
 
 The builder simply notes any package that you invoke with library() or 
 require() and then it creates a package and makes sure these packages are all 
 added as dependencies in the DESCRIPTION file.
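
A rough sketch of the kind of scan described here, not the builder's actual code: it picks package names out of `library()` and `require()` calls with a regular expression. The example lines and the names `code`, `pat` and `deps` are made up, and a pattern like this will miss less regular usages (for instance `pkg::fun` calls).

```r
# Hypothetical lines of workflow code
code <- c('library(limma)',
          'require("DESeq2")',
          'x <- 1',
          '  library( GenomicRanges )')

# Capture the package name inside library(...) or require(...)
pat <- "^\\s*(library|require)\\(\\s*[\"']?([A-Za-z0-9.]+)[\"']?\\s*\\).*$"

hits <- grepl(pat, code, perl = TRUE)
deps <- unique(sub(pat, "\\2", code[hits], perl = TRUE))
deps
```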
 
 Dan
 
 
 
 Wolfgang
 

Re: [Bioc-devel] support site

2014-09-26 Thread Wolfgang Huber


0. Overall, the new site is great. Some of the threads now turn into valuable 
little mini-blogs or discussion pages on a specialized topic.

1. I tend to agree with Kasper. The site layout looks a bit too cluttered to 
me, with all these boxes of votes, views and response counts in different 
colours, breathless reporting of recent locations, bold face subject headers 
and what not. Of course, these things are a matter of taste, although I am sure 
there also now lots of people who approach this topic scientifically and 
rationally.

2. I agree that “LATEST” and “OPEN” could be more prominent (and “Messages” 
less so). It took me a great while to discover them (cf. point 1.)

Best wishes
Wolfgang


Il giorno 26 Sep 2014, alle ore 20:35, Kasper Daniel Hansen 
kasperdanielhan...@gmail.com ha scritto:

 1. When you click new post I think there should be some mention that if
 you report an issue with a specific package, you should tag the post with
 package name.
 
 2. Is there a way to have default signature?
 
 3. Is there some way to modify the layout.  I am getting confused by all
 the colors and text and stuff.  I really want to see the title of the posts
 much more clearly.
 
 4. It was not immediately clear to me that the big Messages really are
 private messages and that I should click on LATEST or OPEN (which
 appear smaller than messages in my browser) to see the forum.  In fact,
 there seems to be three buttons which are critical to the new user (like
 me) and they are LATEST, (perhaps) OPEN and New Post.  Everything
 else should be made smaller and less in your face, I think.
 
 Best,
 Kasper
 
   [[alternative HTML version deleted]]
 

Re: [Bioc-devel] r+w permissions in release branches

2014-04-24 Thread Wolfgang Huber

Hi Martin
to come back to the original trigger for this thread: it was not concerns for 
reproducibility, but the fact that a Bioc package in the current release 
stopped working because a CRAN package has changed in the meanwhile.
What’s the most practical solution to this specific problem?
Best wishes
Wolfgang




On 23 Apr 2014, at 19:41, Martin Morgan mtmor...@fhcrc.org wrote:

 On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
 I think we should have a CRAN snapshot (or a subset of CRAN used in Bioc)
 inside each Bioc release; I don't know how hard that is to manage from a
 technical point of view.
 
 I followed this thread with some interest.
 
 It would be surprisingly challenging to update even a 2.13 package -- the 
 build machines have moved on to other tasks, unconstrained by the unique 
 system dependencies needed for 2.13 builds.
 
 The idea of a 'forever' repository snapshot seems possible, but would the 
 snapshot be at the beginning of the release and hence miss the few but 
 important bug fixes introduced during the release, or at the end of the 
 release, which might be after the time required for the purposes of 
 replication? Either way it is certain that the peanut butter would land face 
 down for one's particular need. Also, the need for the user to satisfy system 
 dependencies becomes increasingly challenging, even with a binary repository. 
 I don't think a central 'Bioc' solution would really address the problem of 
 reproducibility.
 
 It is not that 'hard' for an individual group to create a snapshot of Bioc 
 and CRAN, using rsync
 
  http://www.bioconductor.org/about/mirrors/mirror-how-to/
  http://cran.r-project.org/mirror-howto.html
 
 and to use install.packages() or even biocLite to access these (see 
 ?setRepositories). This would again require that the system dependencies for 
 these packages are satisfied in some kind of frozen fashion.
 
 A more robust possibility is of course a virtual machine, such as the AMI (or 
 a customized version) we provide
 
  http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids
 
 although these have only a subset of packages installed by default.
 
 The CRAN thread referenced earlier included this post
 
  https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html
 
 which I think makes an important distinction between exact replication and 
 scientific reproducibility; it is the latter that must be the most 
 interesting, and the former that we somehow seem to stumble over. The thread 
 also mentions best practices -- version control
 
  http://bioconductor.org/developers/how-to/source-control/
 
 disciplined approach to deprecation
 
  http://bioconductor.org/developers/how-to/deprecation/
 
 package versioning
 
  http://bioconductor.org/developers/how-to/version-numbering/
 
 and the Bioc-style approach to release that we as developers can act on to 
 enhance reproducibility. What other best practices can we more forcefully / 
 conveniently adopt within the project?
 
 Martin
 
 
 Best,
 Kasper
 
 
 On Tue, Apr 22, 2014 at 6:06 PM, Julian Gehring 
 julian.gehr...@embl.de wrote:
 
 Hi,
 
 For most problems discussed here, it seems that having a fixed version of a
 package is sufficient rather than a specific version.  If the idea of a
 snapshot with each bioc release would work (which still means one version
 per package), so would requiring that version within the package (one would
 just need to agree which version this is).
 
 Best wishes
 
 Julian
 
 
  what if two Bioc packages require different versions of the ‘same’ CRAN
 package?
 AfaIu, the infrastructure is not designed to deal with multiple versions
 of a package.
 
 Nor would I as a user expect to have less-than-the-most recent versions
 of CRAN packages in my library just because some other package says so…
 
 Just to throw in another, and probably silly suggestion: the Bioconductor
 repository could keep ‘snapshots’ of CRAN packages compatible with each
 release, but they would have to be name-mangled in some way. The potential
 for confusion is enormous.
 
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 
 
 
 
 -- 
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109
 
 Location: Arnold Building M1 B861
 Phone: (206) 667-2793
 

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] CITATION on the Bioc package landing page?

2014-04-24 Thread Wolfgang Huber

I wonder whether the software that makes the package landing pages (e.g.: 
http://bioconductor.org/packages/release/bioc/html/minfi.html ) could be 
tweaked to display the actual citation suggested in a package CITATION file.

Right now, it says ‘To cite this package in a publication, start R and enter: 
citation("minfi")’. This is already a good start, but it requires the reader to 
have an R session available, install the package, and type these words. These 
steps could easily be automated, and there is no obvious benefit from having 
the user do these computations, as their result is predictable anyway.
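
As a rough illustration of why the result is predictable: a CITATION file can be 
parsed and rendered without installing the package, for example with 
utils::readCitationFile(). The file content and the package name examplePkg 
below are made up for the sketch, and whether the landing-page software could 
work this way is an assumption.

```r
# Write a small, hypothetical CITATION file to a temporary location.
citation_file <- tempfile("CITATION")
writeLines(c(
  'bibentry(',
  '  bibtype = "Manual",',
  '  title   = "examplePkg: a hypothetical package",',
  '  author  = person("Jane", "Doe"),',
  '  year    = "2014",',
  '  note    = "R package version 0.0.1"',
  ')'
), citation_file)

# Parse the file directly; no installed package is needed for this step.
cit <- utils::readCitationFile(citation_file, meta = list(Encoding = "UTF-8"))

# Render the entry as plain text, as a landing page might display it.
cit_text <- format(cit, style = "text")
cat(cit_text, sep = "\n")
```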

What do you think?

Kind regards
Wolfgang

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] 'droplevels' argument in `[` method for SummarizedExperiment?

2014-03-12 Thread Wolfgang Huber

Hi Martin, Mike

a DESeq2 user brought up the observation that when he subsets a ‘DESeqDataSet’ 
object (the class inherits from ‘SummarizedExperiment’) by samples, he often 
ends up with unused factor levels in the colData. (Esp. since the subsetting is 
often to select certain subgroups). Would either of the following two make 
sense:

- a ‘droplevels’ method for ‘SummarizedExperiment’ that efficiently and 
conveniently removes unused levels, i.e.
 x = x[, x$tissue %in% c("guts", "brains")]
 x = droplevels(x)
- a ‘droplevels’ argument (default: FALSE)
 x = x[, x$tissue %in% c("guts", "brains"), droplevels=TRUE]
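
To illustrate the issue without any Bioconductor classes, here is a small sketch 
that uses a plain data.frame as a stand-in for the colData. The sample names and 
tissue values are invented; the droplevels() call is the base R method for data 
frames, not a SummarizedExperiment method.

```r
# Stand-in for colData(x): one row per sample, with a factor column.
col_data <- data.frame(
  tissue = factor(c("guts", "brains", "liver", "guts")),
  row.names = paste0("sample", 1:4)
)

# Subsetting by samples keeps the unused level "liver".
keep <- col_data$tissue %in% c("guts", "brains")
subset_data <- col_data[keep, , drop = FALSE]
levels_before <- levels(subset_data$tissue)

# droplevels() removes levels that no longer occur in the data.
subset_data <- droplevels(subset_data)
levels_after <- levels(subset_data$tissue)

print(levels_before)
print(levels_after)
```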

Wolfgang

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Nature Genetics - Call for data analysis papers

2014-02-27 Thread Wolfgang Huber

Perhaps of interest to some:
http://www.nature.com/ng/journal/v46/n3/full/ng.2914.html

Community standards for data access, interoperability and metadata only make 
sense if data are creatively reused to further research. We are therefore 
inviting the submission of Analysis papers that reformat and integrate existing 
data sets to generate substantial novel insights into gene expression in cell 
differentiation transitions and different cell fates.
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] org-mode vignettes

2013-08-28 Thread Wolfgang Huber


Hi

is there already a best practice (example?) for how to deliver vignettes 
written in org-mode (http://orgmode.org) in Bioconductor packages?

(This would also require that emacs and its ESS and org modes are installed on 
the build servers.)

Best wishes
Wolfgang
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] svn and package version numbers

2013-06-30 Thread Wolfgang Huber

Hi All,

just a reminder that it is good practice to bump up the package version when 
you commit a change to a package's source, even if you consider it 'trivial'. 
Version numbers are free, while the confusion ensuing from there being 
different versions of the software with ostensibly the same version can waste a 
great deal of someone's time.
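
A small sketch of why distinct version numbers help: R compares versions 
numerically, so tools can only tell that a package changed if its version 
did. The version strings below are hypothetical.

```r
committed <- package_version("1.41.0")  # hypothetical version before the change
bumped    <- package_version("1.41.1")  # hypothetical version after the change

# With a bump, the two states of the source are distinguishable.
is_newer <- bumped > committed
print(is_newer)

# compareVersion() returns 1, 0 or -1 for later, equal or earlier versions.
cmp <- utils::compareVersion("1.41.1", "1.41.0")
print(cmp)
```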

Dan / Bioc-Core team: would it be good to mention this somewhere on 
http://bioconductor.org/developers/source-control ?

Best wishes
Wolfgang
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] biocLite should warn when called from a non-current R version

2013-06-20 Thread Wolfgang Huber

Hi,
there is benefit in having newest versions, but I think we shouldn't get 
carried away, and find Martin's suggestion reasonable. I don't want to be told 
to go get an iPhone 5 every time I update an app on my iPhone 4 - even if that 
same app would work much better there. People may have legitimate reasons to 
run an 'old' R, and we are not the version police.

Simon, I very much sympathise with your frustration about thoughtless or 
ignorant user questions, and I suppose you hope that this proposed change will 
provide a quick fix. I would have said something different above if I shared 
that hope, but I am not sure it would make much difference to them in practice: 
many would not follow that message even if it were in ALL CAPS ALL ACROSS THEIR 
SCREEN. And at the same time shouting at people who know what they're doing is 
not a desirable approach.

Best wishes
Wolfgang
 
On Jun 20, 2013, at 9:03 am, Simon Anders and...@embl.de wrote:

 Hi Martin
 
 good to see that Herve agrees with me, and I reiterate my point, because I 
 consider this issue very important.
 
 The average user does _not_ expect that a function like 'biocLite', which has 
 the express purpose of downloading and installing packages, does not pull the 
 newest package. I know that to you, as an experienced Bioc person, this 
 has become second nature, but believe me: It is unusual and very surprising 
 for anybody used to other systems.
 
 The message hence has to very clearly and unambiguously state the following 
 fact: The biocLite function will NOT install the most recent released 
 versions of Bioconductor packages.
 
 I insist that this should be mentioned in this direct manner. Your 
 formulation may imply it to the careful reader but not to a user in a hurry. 
 Merely mentioning that there are newer versions will _not_ bring across the 
 point that calling the biocLite installation script will not install these!
 
 I really do not see the problem with saying clearly that biocLite will not 
 pull the newest version. Is this something we are ashamed of and don't want 
 to admit straight out?
 
 And I frankly see no need to warn users already in this message that updates 
 can break existing workflows. Everybody who has ever used any software knows 
 this.
 
  Simon
 

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] \VignetteIndexEntry

2013-05-20 Thread Wolfgang Huber

I am puzzled by the interpretation of the %\VignetteIndexEntry directive by the 
build system. On 
http://www.bioconductor.org/packages/devel/bioc/html/genefilter.html of the 
four vignettes for this package, two are listed with titles
1.  empty title
2.  Additional plots for: Independent filtering increases power for detecting 
differentially expressed genes

whereas in the .Rnw files, these two lines are provided ( 
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/genefilter/inst/doc 
 )

1. %\VignetteIndexEntry{Diagnostics for independent filtering}
2. %\VignetteIndexEntry{Additional plots for: Independent filtering increases 
power for detecting differentially expressed genes, Bourgon et al., PNAS (2010)}

Neither of these matches what is rendered (the first more obviously so). 
Does anybody have an idea what's wrong or how to do this properly?

Best wishes
Wolfgang
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] vectorize default dist2 function in genefilter

2013-03-14 Thread Wolfgang Huber

Hi James

I added your suggestion to the package, it is in svn revision 74349 and should 
be on the server in the devel section soon.

This is an old package - I noted the first commit to it had revision number 24 and 
was by Robert Gentleman on 2001-08-01.

Best wishes
Wolfgang



On Mar 13, 2013, at 10:41 am, James F. Reid rei...@gmail.com wrote:

 Hi Wolfgang,
 
 On 12/03/13 21:11, Wolfgang Huber wrote:
 
 Dear James
 
 Thank you. What would the saved time be (e.g. compared to the overall 
 runtime of arrayQualityMetrics)? I would be surprised if the saving was 
 worth the added complexity, but am always happy to be surprised.
 
 I believe so but maybe I'm missing something.
 Here are the system times on a 2*100 matrix on my laptop using just dist2, which 
 would also have a consequence on the overall runtime of arrayQualityMetrics 
 in aqm.heatmap.
 
 
 dist2 <- function (x,
   fun = function(a, b) mean(abs(a - b), na.rm = TRUE),
   diagonal = 0) {
 
   # TRUE when the caller did not supply 'fun', so the fast path can be used
   defaultFun <- ifelse(missing(fun), TRUE, FALSE)
 
   if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
     stop("'diagonal' must be a numeric scalar.")
   res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
   colnames(res) = rownames(res) = colnames(x)
   if (ncol(x) >= 2) {
 
     if (defaultFun) {
       # vectorised: distances from every column to column i in one call
       res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
     } else {
       for (j in 2:ncol(x))
         for (i in 1:(j - 1))
           res[i, j] = res[j, i] = fun(x[, i], x[, j])
     }
   }
 
   return(res)
 }
 
 y <- matrix(rnorm(2 * 100), 2, 100)
 
 system.time(dist2(x = y, fun = function(a, b) mean(abs(a - b), na.rm=TRUE)))
 ##   user  system elapsed
 ## 11.664   0.060  11.800
 
 system.time(dist2(x = y))
 ##  user  system elapsed
 ## 5.201   0.348   5.600
 
 Not sure what you'd want to change to the Rd.
 
 Best,
 James.
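
 A minimal sketch for checking that the vectorised default agrees with the 
 loop, using standalone helper names (dist2_loop, dist2_vec) that are not part 
 of genefilter. It assumes at least two columns, and it resets the diagonal 
 explicitly, since the apply() result otherwise always has zeros there.

```r
# Loop-based reference: mean absolute difference for every pair of columns.
dist2_loop <- function(x, diagonal = 0) {
  res <- matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
  colnames(res) <- rownames(res) <- colnames(x)
  for (j in 2:ncol(x))
    for (i in 1:(j - 1))
      res[i, j] <- res[j, i] <- mean(abs(x[, i] - x[, j]), na.rm = TRUE)
  res
}

# Vectorised variant: for each column i, compare it against all columns at once.
dist2_vec <- function(x, diagonal = 0) {
  res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm = TRUE))
  diag(res) <- diagonal  # apply() leaves zeros here, so honour 'diagonal'
  res
}

set.seed(1)
y <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8,
            dimnames = list(NULL, paste0("s", 1:8)))

# Both versions should give the same symmetric distance matrix.
same_result <- isTRUE(all.equal(dist2_loop(y), dist2_vec(y)))
print(same_result)
```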
 
 
 A patch of the .R and .Rd file would be most welcome and expedite the change.
 
 Btw, colSums apparently also works with 3-dim arrays, so both loops (over i 
 and j) could be vectorised, however afaIcs at the cost of constructing an 
 object of size nrow(x)^3 in memory, which might again break performance.
 
  Best wishes
  Wolfgang
 
 On Mar 12, 2013, at 4:43 PM, James F. Reid rei...@gmail.com 
 wrote:
 
 Dear bioc-devel,
 
 the dist2 function in genefilter defined as:
 
 dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm = TRUE), 
 diagonal = 0) {
 
   if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
     stop("'diagonal' must be a numeric scalar.")
   res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
   colnames(res) = rownames(res) = colnames(x)
   if (ncol(x) >= 2) {
     for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
       i] = fun(x[, i], x[, j])
   }
   return(res)
 }
 
 could have its default function vectorized as:
 
 res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
 
 to improve performance for example in the ArrayQualityMetrics package.
 
 Best.
 James.
 
 
 

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] vectorize default dist2 function in genefilter

2013-03-12 Thread Wolfgang Huber


Dear James

Thank you. What would the saved time be (e.g. compared to the overall runtime 
of arrayQualityMetrics)? I would be surprised if the saving was worth the added 
complexity, but am always happy to be surprised.

A patch of the .R and .Rd file would be most welcome and expedite the change.

Btw, colSums apparently also works with 3-dim arrays, so both loops (over i and 
j) could be vectorised, however afaIcs at the cost of constructing an object of 
size nrow(x)^3 in memory, which might again break performance.

Best wishes
Wolfgang

On Mar 12, 2013, at 4:43 PM, James F. Reid rei...@gmail.com 
wrote:

 Dear bioc-devel,
 
 the dist2 function in genefilter defined as:
 
 dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm = TRUE), 
 diagonal = 0) {
 
   if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
     stop("'diagonal' must be a numeric scalar.")
   res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
   colnames(res) = rownames(res) = colnames(x)
   if (ncol(x) >= 2) {
     for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
       i] = fun(x[, i], x[, j])
   }
   return(res)
 }
 
 could have its default function vectorized as:
 
 res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
 
 to improve performance for example in the ArrayQualityMetrics package.
 
 Best.
 James.
 

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Webpage spring cleaning - Package submission

2013-02-22 Thread Wolfgang Huber


I would like to suggest clarification of the information in the Package 
Guidelines [1] and the Package Submission [2] pages. 

1. Information that is actually of type guideline is (only) stated on the 
submission page, which is confusing. Shouldn't one page describe the applied 
criteria, and the other the submission procedure?

2. We state: "Packages should also conform to the following: • Use S4 classes 
and methods." This is vague and confusing. I think what it should say is that 
*existing* S4 classes and generic functions, and existing methods, should be 
re-used. Many developers seem to interpret this as meaning that they should 
come up with lots of their own, new, idiosyncratic S4 classes and methods, 
which mostly only adds overhead and complexity, and rarely any benefit. I would 
like to suggest removing that statement, or clarifying it.

Best wishes
Wolfgang


[1] http://www.bioconductor.org/developers/package-guidelines/
[2] http://www.bioconductor.org/developers/package-submission/
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Request to add 'normalize' to BiocGenerics

2013-02-21 Thread Wolfgang Huber

 Gentleman wrote:
 my 2c worth
 
 On Wed, Feb 20, 2013 at 10:45 AM, Hervé Pagès hpa...@fhcrc.org wrote:
 Hi,
 
 I agree with Laurent that we can't really play the semantic and concept
 police. It's the responsibility of package authors to decide whether
 it's appropriate or not to call normalization that particular
 transformation they're implementing.
 
 However I hope that we all agree on the following rule regarding the
 generics that make it into BiocGenerics:
 
   If foo() is a generic function defined in BiocGenerics, no
   BioC package should redefine the function (either as a generic
   or an ordinary function). It can only define methods for it,
   or move away and use a different name for this functionality.
 
  but really the point of namespaces is that you don't need to do that.
  And we really don't want to be the naming police.
   The sole advantage of BiocGenerics, I think, is that there is a common
 and standard location for a set of generic functions that get used in
 different packages.  This allows package authors to add methods that
 specialize the behavior of a generic function.  They have some confidence
 that the generic will always exist and hence can plan accordingly.  It
 hopefully reduces dependencies between packages.
 
   I don't think it should define a set of reserved words, that seems
 counterproductive.
   There are often good reasons why the same name is used for different
 concepts (normalize being one of them).  And in some cases a single
 generic suffices, but in others it will not.  Places where a single
 generic falls apart are when there are really different argument lists, and
 where inheritance (and hence things like NextMethod) is going to get
 messed up if the disparate methods are all linked to a single generic.
 Generics are really concepts - and the methods are realizations of those
 concepts.
 
   Of course, packages that define functions whose names clash with
 BiocGenerics will cause problems, and it would generally be best
 to avoid that, but really I don't think I would advocate any sort of
 prohibition.
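
 A minimal sketch of the "define methods, do not redefine the generic" pattern 
 discussed above. The generic is declared locally here as a stand-in for one 
 that would live in BiocGenerics, and the class IntensityMatrix and its 
 column-centering behaviour are hypothetical.

```r
library(methods)

# Stand-in for a shared generic; a package would import it rather than define it.
setGeneric("normalize", function(object, ...) standardGeneric("normalize"))

# Hypothetical class owned by the package author.
setClass("IntensityMatrix", representation(values = "matrix"))

# The package adds a method for its own class and leaves the generic untouched.
setMethod("normalize", "IntensityMatrix", function(object, ...) {
  # Column-centering chosen only for illustration.
  object@values <- sweep(object@values, 2, colMeans(object@values))
  object
})

im <- new("IntensityMatrix", values = matrix(c(1, 2, 3, 10, 20, 30), nrow = 3))
centered <- normalize(im)
print(colMeans(centered@values))
```

 Other packages can attach their own methods to the same generic for their own 
 classes, which is where differing argument lists or semantics start to matter.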
 
 
 
 Does that sound reasonable? Otherwise that would kind of defeat the
 purpose of having the BiocGenerics package in the 1st place.
 
 To me, having 10 BioC packages defining a normalize() function is far
 from being ideal. I think having it defined in BiocGenerics would
 improve things a little bit. Also one potential positive side effect
 I see is that it would give an opportunity to the authors of those
 10 packages to reconsider if they still want to ride the normalize()
 pony or not. Maybe some of them won't and they'll pick up another
 name. Not something we can really decide for them...
 
 H.
 
 
 
 On 02/20/2013 09:47 AM, Laurent Gautier wrote:
 
 On 2013-02-20 17:32, Schalkwyk, Leonard wrote:
 
 
 Is this not just an indication that normalize is now a poor choice of
 a function name?
 
 
 If the package authors called the functions "normalize", this means
 either:
 1- at least some of the package authors have named a function performing
 an action that is inappropriately described as "normalize"
 2- all functions "normalize" do perform an action that can be described
 with that verb
 
 Without more details, I'd vote for 2.
 
 (more below)
 
 
 LEo
 
 On 20 Feb 2013, at 16:14, Wolfgang Huber wrote:
 
 Hi
 
 is it clear that all these different functions (methods) share
 similar semantics and enough (conceptually) of their interface?
 
 
 Playing the semantic and concept police would come after defining things
 like ontologies of data processing; I am not sure this should be a
 priority.
 I'd see working out a minimal common signature that keeps everyone going
 with a minimal fuss come first.
 
 
 Wouldn't the implication be that preemptively every possible string
 of characters should already be defined as a generic function in
 BiocGenerics?
 
 
 No. Otherwise this would probably also mean that R's S4 system should in
 fact define all possible strings as generics, which by extension would
 also mean that generic functions do not need to be explicitly declared:
 since all possible generics would be declared, it is more practical to
 implicitly assume any given function already has a generic declared. S4
 has notions about implicit generic functions; a starting point is the
 man page for setGeneric().
 
 
 
 
 Best wishes
 Wolfgang
 
 On Feb 20, 2013, at 11:04 AM, Laurent Gatto
 lg...@cam.ac.uk wrote:
 
 On 19 February 2013 22:44, Hervé Pagès hpa...@fhcrc.org wrote:
 
 Hi Laurent, and maintainers of packages with a normalize()
 function,
 
 
 On 02/15/2013 04:28 AM, Laurent Gatto wrote:
 
 A quick (and incomplete) manual search using
 http://search.bioconductor.jp/ suggest the following usage of
 normalize:
 
 As a function:
 xps::normalize
 codelink::normalize
 EBImage::normalize
 diffGeneAnalysis::normalize
 
 Defining a generic and methods:
 oligo::normalize
 flowCore::normalize
 MSnbase::normalize
 isobar
