Re: [Bioc-devel] Extending Views to support GRanges (and possibly BigWig files)

2016-06-28 Thread Michael Lawrence
Yes, I am generally in favor of the GViews, and I've been a fan of
Views since they were invented. Was just pointing out that we can do a
lot with Lists these days.

On Tue, Jun 28, 2016 at 4:56 PM, Hervé Pagès  wrote:
> On 06/28/2016 05:43 AM, Michael Lawrence wrote:
>>
>> Thanks for these use cases.
>>
>> I've been wondering about the usefulness of Views given how far along
>> Lists have come since the invention of Views. Instead of
>> viewSums(Views(cov, gr)) we could just do sum(cov[gr]). The latter
>> works today. The difference between the Views and List approach is
>> that the Views data structure defers the extraction until
>> summarization.
>
>
> I think this is an important feature.
>
>> So we can always retrieve the entire underlying vector,
>> and the ranges of interest. But for summarize-and-forget use cases,
>> Lists should work fine.
>
>
> They work, but they're not super convenient. The user needs to know
> how to import data for his/her regions of interest only. The way to
> do this can vary across the type of file (e.g. 2bit vs BigWig).
> Having GViews objects hides these details and provides a unified
> mode of operating on a set of genomic regions of interest.
>
> Also, unlike a List, a GViews object bundles together the genomic
> regions and the data associated with them. This makes the object
> self-descriptive and thus reduces the risk of errors.
>
>>
>> I like the idea of pushing the aggregation down to the data. BigWig
>> files are particularly well suited for this, because they have
>> precomputed summary statistics. See summary,BigWigFile. It would take
>> some hacking of the Kent library to expose everything we want, like
>> the sums.
>
>
> This sounds like an argument in favor of using GViews objects to me.
>
> Thanks,
> H.
>
>
>>
>> Michael
>>
>> On Tue, Jun 28, 2016 at 3:54 AM, Johnston, Jeffrey 
>> wrote:
>>>
>>> During the BioC developer day, Hervé brought up the idea of possibly
>>> extending the concept of GenomicViews to other use cases. I'd like to share
>>> how we currently use GenomicRanges, Views and RleLists to perform certain
>>> analyses, and hopefully demonstrate that a way to directly use Views with
>>> GRanges would be quite useful.
>>>
>>> As an example, let's say we have 2 ChIP-seq samples (as BigWig files) and
>>> a set of genomic ranges we are interested in (as a BED file). We want to
>>> find the sum of the ChIP-seq signal found in our regions of interest for
>>> each of the two samples. We'd first import the BED file as a GRanges object.
>>> Annotating the GRanges with two metadata columns representing the ChIP-seq
>>> signal for the two samples seems like a straightforward use for Views (in
>>> particular, viewSums), but it is a bit convoluted.
>>>
>>> The main problem is that Views work with RangesLists, not GRanges.
>>> Coercing a GRanges to a RangesList possibly disrupts the order, so we have
>>> to first reorder the GRanges to match the order it will be given after
>>> coercion (keeping track so we can later revert back to the original order):
>>>
>>> regions <- import("regions_of_interest.bed")
>>> sample1_cov <- import("sample1.bw", as="RleList")
>>> sample2_cov <- import("sample2.bw", as="RleList")
>>> oo <- order(regions)
>>> regions <- regions[oo]
>>>
>>> Now we can construct a View and use viewSums to obtain our values
>>> (unlisting them as they are grouped by seqnames) and add them as a metadata
>>> column in our GRanges, restoring the original order of the GRanges when we
>>> are done:
>>>
>>> v <- Views(sample1_cov, as(regions, "RangesList"))
>>> mcols(regions)$sample1_signal <- unlist(viewSums(v))
>>> regions[oo] <- regions
>>>
>>> We then repeat the process to add another metadata column for sample2.
>>>
>>> To me, having the ability to use a GRanges as a view into an RleList
>>> makes a lot more sense. That would allow us to reduce all the above
>>> complexity to something like:
>>>
>>> regions$sample1_signal <- viewSums(Views(sample1_cov, regions))
>>> regions$sample2_signal <- viewSums(Views(sample2_cov, regions))
>>>
>>> That alone would be great! But, there's a way to make it even better.
>>> Storing these RleLists in memory for each of our samples is quite
>>> inefficient, especially since our regions of interest are only a small
>>> portion of them. The rtracklayer package already has some very useful
>>> functionality for instantiating an RleList with only the data from specific
>>> ranges of a BigWig file. Taking advantage of that, we can dramatically
>>> reduce our memory usage and increase our performance like so:
>>>
>>> regions <- import("regions_of_interest.bed")
>>> sample1_cov <- import("sample1.bw", as="RleList", which=regions)
>>> sample2_cov <- import("sample2.bw", as="RleList", which=regions)
>>> regions$sample1_signal <- viewSums(Views(sample1_cov, regions))
>>> regions$sample2_signal <- viewSums(Views(sample2_cov, regions))
>>>
>>> But can't this functionality be included in 

Re: [Bioc-devel] Extending Views to support GRanges (and possibly BigWig files)

2016-06-28 Thread Hervé Pagès

On 06/28/2016 03:54 AM, Johnston, Jeffrey wrote:

During the BioC developer day, Hervé brought up the idea of possibly extending 
the concept of GenomicViews to other use cases. I'd like to share how we 
currently use GenomicRanges, Views and RleLists to perform certain analyses, 
and hopefully demonstrate that a way to directly use Views with GRanges would 
be quite useful.

As an example, let's say we have 2 ChIP-seq samples (as BigWig files) and a set 
of genomic ranges we are interested in (as a BED file). We want to find the sum 
of the ChIP-seq signal found in our regions of interest for each of the two 
samples. We'd first import the BED file as a GRanges object. Annotating the 
GRanges with two metadata columns representing the ChIP-seq signal for the two 
samples seems like a straightforward use for Views (in particular, viewSums), 
but it is a bit convoluted.

The main problem is that Views work with RangesLists, not GRanges. Coercing a 
GRanges to a RangesList possibly disrupts the order, so we have to first 
reorder the GRanges to match the order it will be given after coercion (keeping 
track so we can later revert back to the original order):

regions <- import("regions_of_interest.bed")
sample1_cov <- import("sample1.bw", as="RleList")
sample2_cov <- import("sample2.bw", as="RleList")
oo <- order(regions)
regions <- regions[oo]

Now we can construct a View and use viewSums to obtain our values (unlisting 
them as they are grouped by seqnames) and add them as a metadata column in our 
GRanges, restoring the original order of the GRanges when we are done:

v <- Views(sample1_cov, as(regions, "RangesList"))
mcols(regions)$sample1_signal <- unlist(viewSums(v))
regions[oo] <- regions

We then repeat the process to add another metadata column for sample2.

To me, having the ability to use a GRanges as a view into an RleList makes a 
lot more sense. That would allow us to reduce all the above complexity to 
something like:

regions$sample1_signal <- viewSums(Views(sample1_cov, regions))
regions$sample2_signal <- viewSums(Views(sample2_cov, regions))

That alone would be great! But, there's a way to make it even better. Storing 
these RleLists in memory for each of our samples is quite inefficient, 
especially since our regions of interest are only a small portion of them. The 
rtracklayer package already has some very useful functionality for 
instantiating an RleList with only the data from specific ranges of a BigWig 
file. Taking advantage of that, we can dramatically reduce our memory usage and 
increase our performance like so:

regions <- import("regions_of_interest.bed")
sample1_cov <- import("sample1.bw", as="RleList", which=regions)
sample2_cov <- import("sample2.bw", as="RleList", which=regions)
regions$sample1_signal <- viewSums(Views(sample1_cov, regions))
regions$sample2_signal <- viewSums(Views(sample2_cov, regions))

But can't this functionality be included in Views? Why not have it accept a 
BigWig file in place of an RleList and have it selectively load the portions of 
the BigWig it needs based on the provided GRanges? That would allow this:

regions <- import("regions_of_interest.bed")
regions$sample1_signal <- viewSums(Views("sample1.bw", regions))
regions$sample2_signal <- viewSums(Views("sample2.bw", regions))


Exactly. Thanks Jeff for taking the time to put this so nicely together.
That's what I've been having in mind for years. The above is very
intuitive and frees the user of having to deal with a lot of low-level
details.

The ability to subset with a GRanges subscript already provides some
level of convenience but having the ability to formally define views
on genomic data goes even further. In particular, as you pointed out,
it allows the use of delayed views realization strategies behind the
scene which can lead to important performance boosts.

H.



The above is quite close to how I use GRanges and BigWigs now, except I have to 
write and maintain all the hackish code to link BigWig files, GRanges, Views, 
RangesLists and RleLists together into something that behaves as one would 
intuitively expect.

I’d welcome any thoughts on how people perform similar analyses that involve 
GRanges and data stored in BigWig files or RleLists, and whether this would 
also be useful to them.

Thanks,
Jeff Johnston
Zeitlinger Lab
Stowers Institute

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Survey results

2016-06-28 Thread Martin Morgan


The results are at https://www.surveymonkey.com/results/SM-FKMZZ2QT/

Martin


This email message may contain legally privileged and/or...{{dropped:2}}

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] NotNeeded in build report RSS feeds

2016-06-28 Thread Dan Tenenbaum
Hi Antti,

I'm not sure I follow. NotNeeded is a status seen in the INSTALL phase of the 
build, and it means your package did not need to be installed because no other 
package built by the build system depends on it. 

This status should not change from one day to the next. (It would change if 
someone were to depend on your package.)

But the basic behavior of the RSS feeds is supposed to be to only notify you 
when there is a change in build status. So you should not get new feeds when 
the build status stays the same.

Dan


- Original Message -
> From: "Antti Honkela" 
> To: "bioc-devel" 
> Sent: Tuesday, June 28, 2016 4:45:12 AM
> Subject: [Bioc-devel] NotNeeded in build report RSS feeds

> Would it be possible to change the build report RSS feed generation to
> treat "NotNeeded" similarly as "OK"?
> 
> This would make the feed more useful for spotting real problems.
> 
> 
> Thanks!
> 
> Antti
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Confusion on biocLite() with github and annotation data repo

2016-06-28 Thread Dan Tenenbaum
I think you need to add the 

dependencies=TRUE

argument which gets passed to devtools::install_github() and thence to 
devtools::install().

Dan


- Original Message -
> From: "Martin Morgan" 
> To: "James W. MacDonald" , "Sean Davis" 
> 
> Cc: "bioc-devel" 
> Sent: Tuesday, June 28, 2016 8:11:26 AM
> Subject: Re: [Bioc-devel] Confusion on biocLite() with github and annotation 
> data repo

> On 06/28/2016 10:56 AM, James W. MacDonald wrote:
>> I had a similar experience, where the dependencies were not found upon
>> installation. I didn't do anything to fix it - instead it seemed that just
>> re-running biocLite after the initial failed install ended up working.
> 
> Installing from github delegates to devtools::install_github, and that
> the annotation repository is not found. So something like
> 
> > options(repos=BiocInstaller::biocinstallRepos())
> > biocLite("jmacdon/BiocAnno2016")
> 
> I think the code is trying to do that, though
> 
> https://github.com/Bioconductor-mirror/BiocInstaller/blob/master/R/biocLite.R#L72
> 
> so don't really understand why it fails...
> 
> Martin
> 
>>
>> Maybe the same will work for you?
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On Mon, Jun 27, 2016 at 5:03 PM, Davis, Sean (NIH/NCI) [E] <
>> sdav...@mail.nih.gov> wrote:
>>
>>> I am trying to install from Jim’s annotation workflow from github (
>>> https://github.com/jmacdon/BiocAnno2016), but biocLite() fails because it
>>> cannot find annotation data packages.  I *can* go back and install the
>>> annotation data package with a separate call to biocLite().  Is this
>>> expected behavior?  If so, is it possible and desirable to install from
>>> github and have it “do the right thing” to get Bioc dependencies?
>>>
>>> Thanks,
>>> Sean
>>>
>>>
 biocLite('jmacdon/BiocAnno2016',dependencies=TRUE,build_vignettes=TRUE)
>>> BioC_mirror: https://bioconductor.org
>>> Using Bioconductor 3.3 (BiocInstaller 1.22.3), R 3.3.0 (2016-05-03).
>>> Installing github package(s) ‘jmacdon/BiocAnno2016’
>>> Downloading GitHub repo jmacdon/BiocAnno2016@master
>>> from URL https://api.github.com/repos/jmacdon/BiocAnno2016/zipball/master
>>> Installing BiocAnno2016
>>> Skipping 7 unavailable packages: BSgenome.Hsapiens.UCSC.hg19,
>>> EnsDb.Hsapiens.v79, EnsDb.Mmusculus.v79, Homo.sapiens,
>>> hugene20sttranscriptcluster.db, org.Hs.eg.db,
>>> TxDb.Hsapiens.UCSC.hg19.knownGene
>>> Skipping 19 packages ahead of CRAN: AnnotationDbi, AnnotationHub, Biobase,
>>> BiocGenerics, BiocParallel, biomaRt, Biostrings, BSgenome,
>>> GenomicAlignments, GenomicRanges, interactiveDisplayBase, IRanges,
>>> rmarkdown, Rsamtools, rtracklayer, S4Vectors, SummarizedExperiment,
>>> XVector, zlibbioc
>>> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file
>>> --no-environ  \
>>>--no-save --no-restore --quiet CMD build  \
>>>
>>> '/private/var/folders/21/b_rp6qyj1_b1j5cp8qby0tnrgn/T/Rtmpq8yntE/devtools32fa3fcc4899/jmacdon-BiocAnno2016-7317126'
>>> \
>>>--no-resave-data --no-manual
>>>
>>> * checking for file
>>> ‘/private/var/folders/21/b_rp6qyj1_b1j5cp8qby0tnrgn/T/Rtmpq8yntE/devtools32fa3fcc4899/jmacdon-BiocAnno2016-7317126/DESCRIPTION’
>>> ... OK
>>> * preparing ‘BiocAnno2016’:
>>> * checking DESCRIPTION meta-information ... OK
>>> * installing the package to process help pages
>>>---
>>> ERROR: dependencies ‘EnsDb.Hsapiens.v79’, ‘Homo.sapiens’,
>>> ‘EnsDb.Mmusculus.v79’ are not available for package ‘BiocAnno2016’
>>> * removing
>>> ‘/private/var/folders/21/b_rp6qyj1_b1j5cp8qby0tnrgn/T/RtmpBduTCP/Rinst33cf24872f4e/BiocAnno2016’
>>>---
>>> ERROR: package installation failed
>>> Error: Command failed (1)
 biocLite('Homo.sapiens')
>>> BioC_mirror: https://bioconductor.org
>>> Using Bioconductor 3.3 (BiocInstaller 1.22.3), R 3.3.0 (2016-05-03).
>>> Installing package(s) ‘Homo.sapiens’
>>> installing the source package ‘Homo.sapiens’
>>>
>>> trying URL '
>>> https://bioconductor.org/packages/3.3/data/annotation/src/contrib/Homo.sapiens_1.3.1.tar.gz
>>> '
>>> Content type 'application/x-gzip' length 1617 bytes
>>> ==
>>> downloaded 1617 bytes
>>>
>>> * installing *source* package ‘Homo.sapiens’ ...
>>> ** R
>>> ** data
>>> ** preparing package for lazy loading
>>> Warning: package ‘GenomicRanges’ was built under R version 3.3.1
>>> ** help
>>> *** installing help indices
>>> ** building package indices
>>> ** testing if installed package can be loaded
>>> Warning: package ‘GenomicRanges’ was built under R version 3.3.1
>>> * DONE (Homo.sapiens)
>>>
>>> The downloaded source packages are in
>>>
>>> ‘/private/var/folders/21/b_rp6qyj1_b1j5cp8qby0tnrgn/T/Rtmpq8yntE/downloaded_packages’
>>> Old packages: 'airway', 'BiocStyle', 'ComplexHeatmap', 'dendextend',
>>> 'DESeq2',
>>>'devtools', 'docopt', 

Re: [Bioc-devel] Confusion on biocLite() with github and annotation data repo

2016-06-28 Thread James W. MacDonald
I had a similar experience, where the dependencies were not found upon
installation. I didn't do anything to fix it - instead it seemed that just
re-running biocLite after the initial failed install ended up working.

Maybe the same will work for you?

Best,

Jim



On Mon, Jun 27, 2016 at 5:03 PM, Davis, Sean (NIH/NCI) [E] <
sdav...@mail.nih.gov> wrote:

> I am trying to install from Jim’s annotation workflow from github (
> https://github.com/jmacdon/BiocAnno2016), but biocLite() fails because it
> cannot find annotation data packages.  I *can* go back and install the
> annotation data package with a separate call to biocLite().  Is this
> expected behavior?  If so, is it possible and desirable to install from
> github and have it “do the right thing” to get Bioc dependencies?
>
> Thanks,
> Sean
>
>
> > biocLite('jmacdon/BiocAnno2016',dependencies=TRUE,build_vignettes=TRUE)
> BioC_mirror: https://bioconductor.org
> Using Bioconductor 3.3 (BiocInstaller 1.22.3), R 3.3.0 (2016-05-03).
> Installing github package(s) ‘jmacdon/BiocAnno2016’
> Downloading GitHub repo jmacdon/BiocAnno2016@master
> from URL https://api.github.com/repos/jmacdon/BiocAnno2016/zipball/master
> Installing BiocAnno2016
> Skipping 7 unavailable packages: BSgenome.Hsapiens.UCSC.hg19,
> EnsDb.Hsapiens.v79, EnsDb.Mmusculus.v79, Homo.sapiens,
> hugene20sttranscriptcluster.db, org.Hs.eg.db,
> TxDb.Hsapiens.UCSC.hg19.knownGene
> Skipping 19 packages ahead of CRAN: AnnotationDbi, AnnotationHub, Biobase,
> BiocGenerics, BiocParallel, biomaRt, Biostrings, BSgenome,
> GenomicAlignments, GenomicRanges, interactiveDisplayBase, IRanges,
> rmarkdown, Rsamtools, rtracklayer, S4Vectors, SummarizedExperiment,
> XVector, zlibbioc
> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file
> --no-environ  \
>   --no-save --no-restore --quiet CMD build  \
>
> '/private/var/folders/21/b_rp6qyj1_b1j5cp8qby0tnrgn/T/Rtmpq8yntE/devtools32fa3fcc4899/jmacdon-BiocAnno2016-7317126'
> \
>   --no-resave-data --no-manual
>
> * checking for file
> ‘/private/var/folders/21/b_rp6qyj1_b1j5cp8qby0tnrgn/T/Rtmpq8yntE/devtools32fa3fcc4899/jmacdon-BiocAnno2016-7317126/DESCRIPTION’
> ... OK
> * preparing ‘BiocAnno2016’:
> * checking DESCRIPTION meta-information ... OK
> * installing the package to process help pages
>   ---
> ERROR: dependencies ‘EnsDb.Hsapiens.v79’, ‘Homo.sapiens’,
> ‘EnsDb.Mmusculus.v79’ are not available for package ‘BiocAnno2016’
> * removing
> ‘/private/var/folders/21/b_rp6qyj1_b1j5cp8qby0tnrgn/T/RtmpBduTCP/Rinst33cf24872f4e/BiocAnno2016’
>   ---
> ERROR: package installation failed
> Error: Command failed (1)
> > biocLite('Homo.sapiens')
> BioC_mirror: https://bioconductor.org
> Using Bioconductor 3.3 (BiocInstaller 1.22.3), R 3.3.0 (2016-05-03).
> Installing package(s) ‘Homo.sapiens’
> installing the source package ‘Homo.sapiens’
>
> trying URL '
> https://bioconductor.org/packages/3.3/data/annotation/src/contrib/Homo.sapiens_1.3.1.tar.gz
> '
> Content type 'application/x-gzip' length 1617 bytes
> ==
> downloaded 1617 bytes
>
> * installing *source* package ‘Homo.sapiens’ ...
> ** R
> ** data
> ** preparing package for lazy loading
> Warning: package ‘GenomicRanges’ was built under R version 3.3.1
> ** help
> *** installing help indices
> ** building package indices
> ** testing if installed package can be loaded
> Warning: package ‘GenomicRanges’ was built under R version 3.3.1
> * DONE (Homo.sapiens)
>
> The downloaded source packages are in
>
> ‘/private/var/folders/21/b_rp6qyj1_b1j5cp8qby0tnrgn/T/Rtmpq8yntE/downloaded_packages’
> Old packages: 'airway', 'BiocStyle', 'ComplexHeatmap', 'dendextend',
> 'DESeq2',
>   'devtools', 'docopt', 'dplyr', 'ensembldb', 'genefilter',
> 'GenomicFeatures',
>   'GEOquery', 'Gviz', 'h2o', 'HSMMSingleCell', 'lfa', 'limma', 'miRLAB',
>   'monocle', 'nlme', 'oligo', 'pracma', 'purrr', 'RBGL', 'RcppArmadillo',
>   'Rhtslib', 'robustbase', 'rstudioapi', 'rvest', 'sevenbridges',
>   'SomaticCancerAlterations', 'survival', 'tidyr', 'tximport',
>   'VariantAnnotation', 'vegan', 'VGAM', 'withr', 'XLConnect',
> 'XLConnectJars',
>   'xml2'
> Update all/some/none? [a/s/n]: n
> > sessionInfo()
> R version 3.3.0 (2016-05-03)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.11 (El Capitan)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> other attached packages:
> [1] BiocInstaller_1.22.3
>
> loaded via a namespace (and not attached):
>  [1] httr_1.2.0  R6_2.1.2tools_3.3.0 withr_1.0.1
>  [5] curl_0.9.7  memoise_1.0.0   knitr_1.13  git2r_0.15.0
>  [9] digest_0.6.9devtools_1.11.1
>



-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 

Re: [Bioc-devel] Extending Views to support GRanges (and possibly BigWig files)

2016-06-28 Thread Michael Lawrence
Thanks for these use cases.

I've been wondering about the usefulness of Views given how far along
Lists have come since the invention of Views. Instead of
viewSums(Views(cov, gr)) we could just do sum(cov[gr]). The latter
works today. The difference between the Views and List approach is
that the Views data structure defers the extraction until
summarization. So we can always retrieve the entire underlying vector,
and the ranges of interest. But for summarize-and-forget use cases,
Lists should work fine.

I like the idea of pushing the aggregation down to the data. BigWig
files are particularly well suited for this, because they have
precomputed summary statistics. See summary,BigWigFile. It would take
some hacking of the Kent library to expose everything we want, like
the sums.

Michael

On Tue, Jun 28, 2016 at 3:54 AM, Johnston, Jeffrey  wrote:
> During the BioC developer day, Hervé brought up the idea of possibly 
> extending the concept of GenomicViews to other use cases. I'd like to share 
> how we currently use GenomicRanges, Views and RleLists to perform certain 
> analyses, and hopefully demonstrate that a way to directly use Views with 
> GRanges would be quite useful.
>
> As an example, let's say we have 2 ChIP-seq samples (as BigWig files) and a 
> set of genomic ranges we are interested in (as a BED file). We want to find 
> the sum of the ChIP-seq signal found in our regions of interest for each of 
> the two samples. We'd first import the BED file as a GRanges object. 
> Annotating the GRanges with two metadata columns representing the ChIP-seq 
> signal for the two samples seems like a straightforward use for Views (in 
> particular, viewSums), but it is a bit convoluted.
>
> The main problem is that Views work with RangesLists, not GRanges. Coercing a 
> GRanges to a RangesList possibly disrupts the order, so we have to first 
> reorder the GRanges to match the order it will be given after coercion 
> (keeping track so we can later revert back to the original order):
>
> regions <- import("regions_of_interest.bed")
> sample1_cov <- import("sample1.bw", as="RleList")
> sample2_cov <- import("sample2.bw", as="RleList")
> oo <- order(regions)
> regions <- regions[oo]
>
> Now we can construct a View and use viewSums to obtain our values (unlisting 
> them as they are grouped by seqnames) and add them as a metadata column in 
> our GRanges, restoring the original order of the GRanges when we are done:
>
> v <- Views(sample1_cov, as(regions, "RangesList"))
> mcols(regions)$sample1_signal <- unlist(viewSums(v))
> regions[oo] <- regions
>
> We then repeat the process to add another metadata column for sample2.
>
> To me, having the ability to use a GRanges as a view into an RleList makes a 
> lot more sense. That would allow us to reduce all the above complexity to 
> something like:
>
> regions$sample1_signal <- viewSums(Views(sample1_cov, regions))
> regions$sample2_signal <- viewSums(Views(sample2_cov, regions))
>
> That alone would be great! But, there's a way to make it even better. Storing 
> these RleLists in memory for each of our samples is quite inefficient, 
> especially since our regions of interest are only a small portion of them. 
> The rtracklayer package already has some very useful functionality for 
> instantiating an RleList with only the data from specific ranges of a BigWig 
> file. Taking advantage of that, we can dramatically reduce our memory usage 
> and increase our performance like so:
>
> regions <- import("regions_of_interest.bed")
> sample1_cov <- import("sample1.bw", as="RleList", which=regions)
> sample2_cov <- import("sample2.bw", as="RleList", which=regions)
> regions$sample1_signal <- viewSums(Views(sample1_cov, regions))
> regions$sample2_signal <- viewSums(Views(sample2_cov, regions))
>
> But can't this functionality be included in Views? Why not have it accept a 
> BigWig file in place of an RleList and have it selectively load the portions 
> of the BigWig it needs based on the provided GRanges? That would allow this:
>
> regions <- import("regions_of_interest.bed")
> regions$sample1_signal <- viewSums(Views("sample1.bw", regions))
> regions$sample2_signal <- viewSums(Views("sample2.bw", regions))
>
> The above is quite close to how I use GRanges and BigWigs now, except I have 
> to write and maintain all the hackish code to link BigWig files, GRanges, 
> Views, RangesLists and RleLists together into something that behaves as one 
> would intuitively expect.
>
> I’d welcome any thoughts on how people perform similar analyses that involve 
> GRanges and data stored in BigWig files or RleLists, and whether this would 
> also be useful to them.
>
> Thanks,
> Jeff Johnston
> Zeitlinger Lab
> Stowers Institute
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list

[Bioc-devel] NotNeeded in build report RSS feeds

2016-06-28 Thread Antti Honkela
Would it be possible to change the build report RSS feed generation to 
treat "NotNeeded" similarly as "OK"?


This would make the feed more useful for spotting real problems.


Thanks!

Antti

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Extending Views to support GRanges (and possibly BigWig files)

2016-06-28 Thread Johnston, Jeffrey
During the BioC developer day, Hervé brought up the idea of possibly extending 
the concept of GenomicViews to other use cases. I'd like to share how we 
currently use GenomicRanges, Views and RleLists to perform certain analyses, 
and hopefully demonstrate that a way to directly use Views with GRanges would 
be quite useful.

As an example, let's say we have 2 ChIP-seq samples (as BigWig files) and a set 
of genomic ranges we are interested in (as a BED file). We want to find the sum 
of the ChIP-seq signal found in our regions of interest for each of the two 
samples. We'd first import the BED file as a GRanges object. Annotating the 
GRanges with two metadata columns representing the ChIP-seq signal for the two 
samples seems like a straightforward use for Views (in particular, viewSums), 
but it is a bit convoluted.

The main problem is that Views work with RangesLists, not GRanges. Coercing a 
GRanges to a RangesList possibly disrupts the order, so we have to first 
reorder the GRanges to match the order it will be given after coercion (keeping 
track so we can later revert back to the original order):

regions <- import("regions_of_interest.bed")
sample1_cov <- import("sample1.bw", as="RleList")
sample2_cov <- import("sample2.bw", as="RleList")
oo <- order(regions)
regions <- regions[oo]

Now we can construct a View and use viewSums to obtain our values (unlisting 
them as they are grouped by seqnames) and add them as a metadata column in our 
GRanges, restoring the original order of the GRanges when we are done:

v <- Views(sample1_cov, as(regions, "RangesList"))
mcols(regions)$sample1_signal <- unlist(viewSums(v))
regions[oo] <- regions

We then repeat the process to add another metadata column for sample2.

To me, having the ability to use a GRanges as a view into an RleList makes a 
lot more sense. That would allow us to reduce all the above complexity to 
something like:

regions$sample1_signal <- viewSums(Views(sample1_cov, regions))
regions$sample2_signal <- viewSums(Views(sample2_cov, regions))

That alone would be great! But, there's a way to make it even better. Storing 
these RleLists in memory for each of our samples is quite inefficient, 
especially since our regions of interest are only a small portion of them. The 
rtracklayer package already has some very useful functionality for 
instantiating an RleList with only the data from specific ranges of a BigWig 
file. Taking advantage of that, we can dramatically reduce our memory usage and 
increase our performance like so:

regions <- import("regions_of_interest.bed")
sample1_cov <- import("sample1.bw", as="RleList", which=regions)
sample2_cov <- import("sample2.bw", as="RleList", which=regions)
regions$sample1_signal <- viewSums(Views(sample1_cov, regions))
regions$sample2_signal <- viewSums(Views(sample2_cov, regions))

But can't this functionality be included in Views? Why not have it accept a 
BigWig file in place of an RleList and have it selectively load the portions of 
the BigWig it needs based on the provided GRanges? That would allow this:

regions <- import("regions_of_interest.bed")
regions$sample1_signal <- viewSums(Views("sample1.bw", regions))
regions$sample2_signal <- viewSums(Views("sample2.bw", regions))

The above is quite close to how I use GRanges and BigWigs now, except I have to 
write and maintain all the hackish code to link BigWig files, GRanges, Views, 
RangesLists and RleLists together into something that behaves as one would 
intuitively expect.

I’d welcome any thoughts on how people perform similar analyses that involve 
GRanges and data stored in BigWig files or RleLists, and whether this would 
also be useful to them.

Thanks,
Jeff Johnston
Zeitlinger Lab
Stowers Institute

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel