Re: [Bioc-devel] Package problems due to results() function from other package?

2023-10-31 Thread Michael Love
Fine with me to make results() into a generic.

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] BiocViews for long read technology

2023-03-06 Thread Michael Love
hi,

On the new #longread slack channel, we have been discussing how it
would be helpful to have a BiocViews dedicated to software for data
from long read technology, and LongRead was suggested. Perhaps under
Software > Technology > Sequencing?

Best,
Mike

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] ExperimentHub: bad restore file magic number (file may be corrupted)

2023-01-10 Thread Michael Love
In the latest release, one of my files in ExperimentHub is giving the
following error on Bioc machines. I haven't touched these files since
they were submitted many years ago to EHub.

http://bioconductor.org/checkResults/release/data-experiment-LATEST/alpineData/nebbiolo2-buildsrc.html

...
--- re-building ‘alpineData.Rmd’ using knitr
Quitting from lines 18-22 (alpineData.Rmd)
Error: processing vignette 'alpineData.Rmd' failed with diagnostics:
failed to load resource
  name: EH167
  title: ERR188088
  reason: error in evaluating the argument 'x' in selecting a method
for function 'get': bad restore file magic number (file may be
corrupted) -- no data loaded
--- failed re-building ‘alpineData.Rmd’

On my machine I can load this resource:

> library(alpineData)
> ERR188088()
snapshotDate(): 2022-10-31
see ?alpineData and browseVignettes('alpineData') for documentation
loading from cache
GAlignmentPairs object with 28576 pairs, strandMode=1, and 0 metadata columns:
          seqnames strand :              ranges --              ranges
             <Rle>  <Rle> :           <IRanges> --           <IRanges>
      [1]        1      - : 108565979-108566053 -- 108565846-108565920
      [2]        1      - : 108573341-108573415 -- 108573234-108573308
      [3]        1      + : 108581087-108581161 -- 108581239-108581313
      [4]        1      + : 108601105-108601179 -- 108601196-108601270
      [5]        1      - : 108603628-108603701 -- 108603540-108603614
      ...      ...    ... .                 ... ...                 ...
  [28572]        X      - : 119791266-119791340 -- 119791130-119791204
  [28573]        X      - : 119791431-119791505 -- 119791358-119791432
  [28574]        X      - : 119791593-119791667 -- 119786691-119789940
  [28575]        X      - : 119791629-119791703 -- 119789951-119791587
  [28576]        X      - : 119791637-119791711 -- 119789976-119791612
  -------
  seqinfo: 194 sequences from an unspecified genome

> eh <- ExperimentHub()

!> eh[["EH167"]]
 see ?alpineData and browseVignettes('alpineData') for documentation
 loading from cache
 GAlignmentPairs object with 28576 pairs, strandMode=1, and 0 metadata columns:
   seqnames strand   :  ranges  --  ranges
   : --   
   [1]1  -   : 108565979-108566053  -- 108565846-108565920
   [2]1  -   : 108573341-108573415  -- 108573234-108573308
   [3]1  +   : 108581087-108581161  -- 108581239-108581313
   [4]1  +   : 108601105-108601179  -- 108601196-108601270
   [5]1  -   : 108603628-108603701  -- 108603540-108603614
   ...  ...... ... ... ... ...
   [28572]X  -   : 119791266-119791340  -- 119791130-119791204
   [28573]X  -   : 119791431-119791505  -- 119791358-119791432
   [28574]X  -   : 119791593-119791667  -- 119786691-119789940
   [28575]X  -   : 119791629-119791703  -- 119789951-119791587
   [28576]X  -   : 119791637-119791711  -- 119789976-119791612
   ---
   seqinfo: 194 sequences from an unspecified genome

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.0.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
 [1] rtracklayer_1.58.0          alpineData_1.24.0           GenomicAlignments_1.34.0
 [4] Rsamtools_2.14.0            Biostrings_2.66.0           XVector_0.38.0
 [7] SummarizedExperiment_1.28.0 Biobase_2.58.0              MatrixGenerics_1.10.0
[10] matrixStats_0.63.0          GenomicRanges_1.50.1        GenomeInfoDb_1.34.4
[13] IRanges_2.32.0              S4Vectors_0.36.0            ExperimentHub_2.6.0
[16] AnnotationHub_3.6.0         BiocFileCache_2.6.0         dbplyr_2.2.1
[19] BiocGenerics_0.44.0         pkgdown_2.0.6               testthat_3.1.5
[22] rmarkdown_2.18              devtools_2.4.5              usethis_2.1.6
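
For anyone debugging a similar "bad restore file magic number" report, one minimal check of the cached file itself might look like the following. This is a sketch only, not what was run on the build machines; it assumes the resource is stored as a serialized .rda and uses the Hub cache() accessor to obtain the local path.

library(ExperimentHub)
eh <- ExperimentHub()
path <- cache(eh["EH167"])            # local path of the cached file
tryCatch(load(path, envir = new.env()),
         error = function(e) message("corrupt cache file: ", conditionMessage(e)))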

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Mac ARM64 binaries available

2022-09-27 Thread Michael Love
Thanks everyone! This is a huge help.

Mike

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] pre-receive hook declined

2021-07-01 Thread Michael Love
That works for me. Thanks.

Mike

On Thu, Jul 1, 2021 at 2:07 PM Nitesh Turaga  wrote:
>
> Hi
>
> Please try again. You should not see this error anymore.
>
> This was a mistake on our end.
>
> Best,
>
> Nitesh
>
>
>
> Nitesh Turaga
> Scientist II, Department of Data Science,
> Bioconductor Core Team Member
> Dana Farber Cancer Institute
>
> > On Jul 1, 2021, at 1:49 AM, Michael Love  
> > wrote:
> >
> > I'm getting a similar error, trying to push commits for a new package
> > submission. So this is my first attempt to push to Bioc git servers
> > for this package.
> >
> > After following instructions from
> > https://bioconductor.org/developers/how-to/git/new-package-workflow/
> > ...
> >
> > spatialDmelxsim git:(main) > git push upstream main:master
> > Enumerating objects: 20, done.
> > Counting objects: 100% (20/20), done.
> > Delta compression using up to 8 threads
> > Compressing objects: 100% (10/10), done.
> > Writing objects: 100% (12/12), 1.60 KiB | 1.60 MiB/s, done.
> > Total 12 (delta 5), reused 0 (delta 0), pack-reused 0
> > remote: Traceback (most recent call last):
> > remote:   File "hooks/pre-receive.h00-pre-receive-hook-dataexp-workflow",
> > line 90, in <module>
> > remote: apply_hooks(hooks_dict)
> > remote:   File "hooks/pre-receive.h00-pre-receive-hook-dataexp-workflow",
> > line 72, in apply_hooks
> > remote: if hooks_dict["pre-receive-hook-merge-markers"]:  # enable hook
> > remote: KeyError: 'pre-receive-hook-merge-markers'
> > To git.bioconductor.org:packages/spatialDmelxsim
> > ! [remote rejected] main -> master (pre-receive hook declined)
> > error: failed to push some refs to
> > 'git.bioconductor.org:packages/spatialDmelxsim'
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] pre-receive hook declined

2021-06-30 Thread Michael Love
I'm getting a similar error, trying to push commits for a new package
submission. So this is my first attempt to push to Bioc git servers
for this package.

After following instructions from
https://bioconductor.org/developers/how-to/git/new-package-workflow/
...

spatialDmelxsim git:(main) > git push upstream main:master
Enumerating objects: 20, done.
Counting objects: 100% (20/20), done.
Delta compression using up to 8 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 1.60 KiB | 1.60 MiB/s, done.
Total 12 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Traceback (most recent call last):
remote:   File "hooks/pre-receive.h00-pre-receive-hook-dataexp-workflow",
line 90, in <module>
remote: apply_hooks(hooks_dict)
remote:   File "hooks/pre-receive.h00-pre-receive-hook-dataexp-workflow",
line 72, in apply_hooks
remote: if hooks_dict["pre-receive-hook-merge-markers"]:  # enable hook
remote: KeyError: 'pre-receive-hook-merge-markers'
To git.bioconductor.org:packages/spatialDmelxsim
 ! [remote rejected] main -> master (pre-receive hook declined)
error: failed to push some refs to
'git.bioconductor.org:packages/spatialDmelxsim'

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] proposal for additional seqlevelsStyle

2020-07-03 Thread Michael Love
This looks great Hervé!

I will work on incorporating this into tximeta when RefSeq
transcriptomes are used.

I second Robert's position that seqlevelsStyle() is one of the most
influential contributions to human genetics.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] DESeq2 use of assays<-, fixed in v1.27.26

2020-02-20 Thread Michael Love
hi,

Just a short note to developers who have a dependency on DESeq2. The
development version of DESeq2 was giving errors over the past day because of
a new change to assays<- regarding dimnames. I've just now fixed this
use of assays() throughout, so DESeq2 1.27.26 should build without
error and propagate tomorrow.

Mike

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Updating for 3.11

2020-01-07 Thread Michael Love
Thanks for the note Karl, and sorry for the break.

For others, the change in DESeq2::lfcShrink is that now type="apeglm"
is the default instead of type="normal".

For the past two releases, users running with the default argument have
been getting a message that type="normal" performs worse than the other
two shrinkage options available.
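
For illustration (a self-contained sketch, not taken from any package's vignette; the coefficient name comes from the example data's resultsNames()):

library(DESeq2)
dds <- makeExampleDESeqDataSet(n = 100, m = 6)
dds <- DESeq(dds)
resultsNames(dds)                                   # includes "condition_B_vs_A"
## keep the previous behavior explicitly:
res_normal <- lfcShrink(dds, coef = "condition_B_vs_A", type = "normal")
## or accept the new default (requires the apeglm package):
res_apeglm <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm")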

best,
Mike

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Block bootstrap for GenomicRanges

2018-08-14 Thread Michael Love
Maybe we can break off to a #blockboot channel on the Bioc-community Slack.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Block bootstrap for GenomicRanges

2018-08-14 Thread Michael Love
dear Hervé,

Thanks again for the quick and useful reply!

I think that the theory behind the block bootstrap [Kunsch (1989), Liu
and Singh (1992), Politis and Romano (1994)] requires that the blocks be
drawn with replacement (you can get some features twice) and that the
blocks can be overlapping. In a hand-waving way, I think it's "good"
for the variance estimation of any statistic of interest that y' may
have more or fewer features than y.

I will explore a bit using the solutions you've laid out.

Now that I think about it, the start-position based solution that I
was thinking about will break if two features in y share the same
start position, so that's not good.

On Mon, Aug 13, 2018 at 11:58 PM, Hervé Pagès  wrote:
> That helps. I think I start to understand what you are after.
>
> See below...
>
>
> On 08/13/2018 06:07 PM, Michael Love wrote:
>>
>> dear Hervé,
>>
>> Thanks for the quick reply about directions to take this.
>>
>> I'm sorry for not providing sufficient detail about the goal of block
>> bootstrapping in my initial post. Let me try again. For a moment, let
>> me ignore multiple chromosomes/seqs and just focus on a single set of
>> IRanges.
>>
>> The point of the block bootstrap is: Let's say we want to find the
>> number of overlaps of x and y, and then assess how surprised we are at
>> how large that overlap is. Both of them may have features that tend to
>> cluster together along the genome (independently). One method would
>> just be to move the features in y around to random start sites, making
>> y', say B times, and then calculate each of the B times the number of
>> overlaps between x and y'. Or we might make this better by having
>> blacklisted sites where the randomly shuffled features in y cannot go.
>>
>> The block bootstrap is an alternative to randomly moving the start
>> sites, where instead we create random data, by taking big "blocks" of
>> features in y. Each block is a lot like a View. And the ultimate goal
>> is to make B versions of the data y where the features have been
>> shuffled around, but by taking blocks, we preserve the clumpiness of
>> the features in y.
>>
>> Let me give some numbers to make this more concrete, so say we're
>> making a single block bootstrap sample of a chromosome that is 1000 bp
>> long. Here is the original y:
>>
>> y <- IRanges(c(51,61,71,111,121,131,501,511,521,921,931,941),width=5)
>>
>> If I go with my coverage approach, I should extend it all the way to
>> the end of the chromosome. Here I lose information if there are
>> overlaps of features in y, and I'm thinking of a fix I'll describe
>> below.
>>
>> cov <- c(coverage(y), Rle(rep(0,55)))
>>
>> I could make one block bootstrap sample of y (this is 1 out of B in
>> the ultimate procedure) by taking 10 blocks of width 100. The blocks
>> have random start positions from 1 to 901.
>>
>> y.boot.1 <- unlist(Views(cov, start=round(runif(10,1,901)), width=100))
>
>
> Choosing blocks that can overlap with each others could make y' appear
> to have more features than y (by repeating some of the original
> features). Also choosing blocks that can leave big gaps in the
> chromosome could make y' appear to have less features than y
> (by dropping some of the original ranges). Isn't that a problem?
>
> Have you considered choosing a set of blocks that represent a
> partitioning of the chromosome? This would make y' look more like y
> by preserving the number of original ranges.
>
> In other words, if you can assign each range in y to one block and
> one block only, then you could generate y' by just shifting the
> ranges in y. The exact amount of shifting (positive or negative)
> would only depend on the block that the range belongs to. This would
> give you an y' that is parallel to y (i.e. same number of ranges
> and y'[i] corresponds to y[i]).
>
> Something like this:
>
> 1) Define the blocks (must be a partitioning of the chromosome):
>
>   blocks <- successiveIRanges(rep(100, 10))
>
> 2) Assign each range in y to the block it belongs to:
>
>   mcols(y)$block <- findOverlaps(y, blocks, select="first")
>
> 1) and 2) are preliminary steps that you only need to do once,
> before you generate B versions of the shuffled data (what you
> call y'). The next steps are for generating one version of the
> shuffled data so would need to be repeated B times.
>
> 3) Shuffle the blocks:
>
>   perm <- sample(length(blocks))
>   perm
>   # [1]  6  5  8  3  2  7  1  9  4 10
>
>   permuted_blocks <- successiveIRanges(width(blocks)[perm])
>   permute
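
A hedged sketch of how step 3 above might be completed under the same setup. This is an illustration following the partitioning idea, not Hervé's original code; the mapping from original block to permuted position is the only non-obvious part:

library(IRanges)

y <- IRanges(c(51,61,71,111,121,131,501,511,521,921,931,941), width=5)
blocks <- successiveIRanges(rep(100, 10))
mcols(y)$block <- findOverlaps(y, blocks, select="first")

perm <- sample(length(blocks))
permuted_blocks <- successiveIRanges(width(blocks)[perm])
## permuted_blocks[i] is where original block perm[i] lands, so the shift
## for original block j is looked up at position match(j, perm):
shift_by_block <- start(permuted_blocks)[match(seq_along(blocks), perm)] - start(blocks)
y_prime <- shift(y, shift_by_block[mcols(y)$block])   # y' parallel to y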

Re: [Bioc-devel] Block bootstrap for GenomicRanges

2018-08-13 Thread Michael Love
dear Hervé,

Thanks for the quick reply about directions to take this.

I'm sorry for not providing sufficient detail about the goal of block
bootstrapping in my initial post. Let me try again. For a moment, let
me ignore multiple chromosomes/seqs and just focus on a single set of
IRanges.

The point of the block bootstrap is: Let's say we want to find the
number of overlaps of x and y, and then assess how surprised we are at
how large that overlap is. Both of them may have features that tend to
cluster together along the genome (independently). One method would
just be to move the features in y around to random start sites, making
y', say B times, and then calculate, for each of the B replicates, the
number of overlaps between x and y'. Or we might make this better by having
blacklisted sites where the randomly shuffled features in y cannot go.

The block bootstrap is an alternative to randomly moving the start
sites, where instead we create random data by taking big "blocks" of
features in y. Each block is a lot like a View. And the ultimate goal
is to make B versions of the data y where the features have been
shuffled around, but by taking blocks, we preserve the clumpiness of
the features in y.

Let me give some numbers to make this more concrete, so say we're
making a single block bootstrap sample of a chromosome that is 1000 bp
long. Here is the original y:

y <- IRanges(c(51,61,71,111,121,131,501,511,521,921,931,941),width=5)

If I go with my coverage approach, I should extend it all the way to
the end of the chromosome. Here I lose information if there are
overlaps of features in y, and I'm thinking of a fix I'll describe
below.

cov <- c(coverage(y), Rle(rep(0,55)))

I could make one block bootstrap sample of y (this is 1 out of B in
the ultimate procedure) by taking 10 blocks of width 100. The blocks
have random start positions from 1 to 901.

y.boot.1 <- unlist(Views(cov, start=round(runif(10,1,901)), width=100))

This last line below is a hack to get back to the ranges for a single
block bootstrap sample of y. It works here, but only because none of
the features in y were overlapping each other.

Instead of coverage(), if I'd made an Rle where the non-zero elements
are located at the start of ranges, and the value is the width of the
range, I think this Views approach would actually work.

y.boot.1.rng <- IRanges(start(y.boot.1)[runValue(y.boot.1) == 1],
  width=runLength(y.boot.1)[runValue(y.boot.1) == 1])

I'm interested in building a function that takes in IRanges and
outputs these shuffled sets of IRanges.
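
A rough sketch of such a function, using the width-at-start Rle idea described above. This is an illustration only, not the eventual package code; it assumes a single chromosome of known length and that no two features share a start position:

library(IRanges)

bootRangesOnce <- function(y, chrom_len, n_blocks = 10, block_width = 100) {
  ## Rle holding each feature's width at its start position, 0 elsewhere
  v <- integer(chrom_len)
  v[start(y)] <- width(y)              # breaks if two features share a start
  ws <- Rle(v)
  block_starts <- round(runif(n_blocks, 1, chrom_len - block_width + 1))
  boot <- unlist(Views(ws, start = block_starts, width = block_width))
  pos <- which(as.vector(boot) > 0)    # starts of bootstrapped features
  IRanges(start = pos, width = as.vector(boot)[pos])
}

set.seed(1)
y <- IRanges(c(51,61,71,111,121,131,501,511,521,921,931,941), width = 5)
y.boot <- bootRangesOnce(y, chrom_len = 1000)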

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Block bootstrap for GenomicRanges

2018-08-13 Thread Michael Love
dear Bioc-devels,

From a conversation on Twitter [1], I've been wondering if there has
been any work to implement block bootstrap for GRanges.

I guess that's my main question, and then the rest is details and
ideas in case it hasn't already been done.

The block bootstrap helps to create perhaps more realistic null data,
in a way that deals with clumping of features. You take many random,
potentially overlapping blocks from the original data (ranges) to fill
up a new set of data (ranges). There's a 2011 paper from Peter Bickel
about this topic [2], which had some python software associated with
it.

Aside from the statistical considerations, one would want to do each
bootstrap quickly so you can make many bootstrap samples, and so
having vectorized code would be an obvious desirable thing. I was
thinking about how the Views infrastructure is kind of what one would
want, but it operates on Rle's if I remember correctly, and the
desired output here would be ranges. Below is kind of getting what I
want, as a hack I'd then convert the Rle's back to ranges and
concatenate them if I wanted 5 x 40bp = 200 bp of bootstrap data.

> ir <- IRanges(c(51,61,71,111,121,131,200),width=5) # make some clumpy data
> Views(coverage(ir), start=round(runif(5,1,150)), width=40) # e.g. 5 random 40 bp blocks
Views on a 204-length Rle subject

views:
    start end width
[1]    97 136    40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 ...]
[2]    50  89    40 [0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ...]
[3]    18  57    40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 ...]
[4]     8  47    40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[5]    97 136    40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 ...]

Another option would be to convert to time-series data, then use an
existing package like 'boot', and then convert back to ranges.
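
That route might look roughly like this (a sketch, assuming a single 204 bp chromosome and a toy x made up for illustration; the statistic simply counts base pairs of overlap with x):

library(boot)
library(IRanges)

ir <- IRanges(c(51,61,71,111,121,131,200), width = 5)        # clumpy y
x  <- IRanges(c(60, 120, 190), width = 10)                    # features to compare against
ycov <- as.integer(as.vector(coverage(ir, width = 204)) > 0)
xcov <- as.integer(as.vector(coverage(x,  width = 204)) > 0)

overlap_bp <- function(v) sum(v & xcov)                       # bp overlapping x
b <- tsboot(ycov, statistic = overlap_bp, R = 200, l = 40, sim = "fixed")
b$t0                              # observed overlap
quantile(b$t, c(0.025, 0.975))    # block-bootstrap interval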

[1] https://twitter.com/mikelove/status/1027296115346092032
[2] https://arxiv.org/abs/1101.0947

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] workflows are missing from landing pages

2018-06-07 Thread Michael Love
It seems like the vignettes are not showing up here:

https://bioconductor.org/packages/release/workflows/

https://bioconductor.org/packages/release/workflows/html/rnaseqGene.html

https://bioconductor.org/packages/release/workflows/html/RnaSeqGeneEdgeRQL.html

https://bioconductor.org/packages/release/workflows/html/simpleSingleCell.html

Sorry if I missed an announcement and this is known.

-Mike

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Suggested edits to support site posting guide

2018-04-30 Thread Michael Love
hi Leo,

We currently have "Rudeness and ad hominem comments are not
acceptable" in the section on replying. This is nice and simple.

I think most of the people reading the guide will be first-time
question-askers, but I'm glad we have the above bullet point so we can
say someone was breaking the site guidelines if they were rude.

I don't think we need to link to the anti-RTFM piece, because I don't
think I've ever seen that phrase used on Bioc support (or if I saw it,
it was a while ago and I forgot about it). If you want to incorporate
any of the positive suggestions from there into the section on replying,
go for it.

Anyone should feel free to keep editing the Gdoc. I'll wait a week or
so and then do a PR to the site.

best,
Mike


On Mon, Apr 30, 2018 at 3:30 PM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:
> well, Hadley usually walks people through the steps required to 1) file a
> bug reproducibly or 2) learn how to do the thing in question organically,
> and his universe is doing OK, so N=1 in favor of that approach.  Also a lot
> of people helped me learning Perl (!) on Usenet (!!!) so yeah I'm old but
> it seems like civility was once a little more common.  Bioc-devel and
> support.bioconductor.com are healthy communities, I would hope that nobody
> minds this being spelled out explicitly.
>
> I love Prof Ripley and he's never been anything but nice to me, still,
> r-help's reputation took a beating from his insistence on self-reliance.
> Several times when I thought I understood a library or process, I'd write
> up an example and discover that what I believed to be true was not, in
> fact, true -- perhaps an example was cordoned off with \{dontrun} or some
> such, perhaps an upstream package broke unwritten assumptions, whatever.
> My point is that it can be beneficial for the answerer as for the asker,
> and perhaps if one does not have time to provide a working example in an
> answer, then it might be better to let another person with sufficient time
> go ahead and address that question.
>
> JMHO, since you asked "everyone else".
>
> Best,
>
>
> --t
>
> On Mon, Apr 30, 2018 at 2:58 PM, Leonardo Collado Torres <lcoll...@jhu.edu>
> wrote:
>
>> Hi Mike, Lori and everyone else,
>>
>> I recently saw this tweet
>> https://twitter.com/aprilwensel/status/989248246878035972 that links
>> to https://medium.com/compassionate-coding/its-time-
>> to-retire-rtfm-31acdfef654f#---0-162
>>
>> First, I'm curious if you've read it and if you like it. Second, maybe
>> we could include some of it (or link to it) in the suggested changes
>> to the posting guide that Mike is working on.
>>
>> Best,
>> Leo
>>
>>
>>
>> On Fri, Apr 20, 2018 at 10:10 AM, Shepherd, Lori
>> <lori.sheph...@roswellpark.org> wrote:
>> > May I also suggested when you are satisfied with user feedback and
>> alterations to create a pull request for the website repository
>> >
>> >
>> > https://github.com/Bioconductor/bioconductor.org
>> >
>> >
>> >
>> >
>> > Lori Shepherd
>> >
>> > Bioconductor Core Team
>> >
>> > Roswell Park Cancer Institute
>> >
>> > Department of Biostatistics & Bioinformatics
>> >
>> > Elm & Carlton Streets
>> >
>> > Buffalo, New York 14263
>> >
>> > 
>> > From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of
>> Michael Love <michaelisaiahl...@gmail.com>
>> > Sent: Thursday, April 12, 2018 11:23:35 AM
>> > To: bioc-devel
>> > Subject: [Bioc-devel] Suggested edits to support site posting guide
>> >
>> > dear all,
>> >
>> > I've edited the text from the posting guide
>> > <http://www.bioconductor.org/help/support/posting-guide/> to update it a
>> > bit.
>> >
>> > For example, some of the text still referred to the mailing list
>> > :
>> > "Compose a new message with a new subject line".
>> >
>> > Mostly, though, I wanted to emphasize posting all of their R code and
>> > information about the experiment, which is often missing from support
>> site
>> > posts. The self-contained example with data and code is rarely possible
>> > because many Bioc users have large datasets that can't be shared.
>> >
>> > Curious if this is useful or if others have suggestions. The following
>> > link allows edits:
>> >
>> > https://docs.google.com/document/d/1baiBUYB8E02KMbaojjoo-
>> > tKV3Ctd9sA5lVCvwFIWquA/e

[Bioc-devel] Suggested edits to support site posting guide

2018-04-12 Thread Michael Love
dear all,

I've edited the text from the posting guide
<http://www.bioconductor.org/help/support/posting-guide/> to update it a
bit.

For example, some of the text still referred to the mailing list:
"Compose a new message with a new subject line".

Mostly, though, I wanted to emphasize posting all of their R code and
information about the experiment, which is often missing from support site
posts. The self-contained example with data and code is rarely possible
because many Bioc users have large datasets that can't be shared.

Curious if this is useful or if others have suggestions. The following
link allows edits:

https://docs.google.com/document/d/1baiBUYB8E02KMbaojjoo-
tKV3Ctd9sA5lVCvwFIWquA/edit?usp=sharing

best,
Mike

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Thanks to Andrzej...

2018-03-01 Thread Michael Love
Thanks Andrzej! these have been really useful contributions to the Bioc
ecosystem

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Nightly builds

2018-02-07 Thread Michael Love
I've found this Makefile from Laurent et al. makes it a lot easier to
do build/check, getting closer to the ease of devtools while still
building and checking from the command line:

https://github.com/ComputationalProteomicsUnit/maker

You do:

git clone git@github.com:ComputationalProteomicsUnit/maker.git
ln -s maker/Makefile .

Then configure maker/Makefile for your package and point it to your
local R-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] workflow page reorganization

2017-12-15 Thread Michael Love
This already looks much improved, thanks Andrzej and Aaron. I think
workflows are where it's at, and this page is probably
underappreciated by Bioconductor users and the outside community.

My wishlist for the workflows page, which may exceed what is available
for the current effort:

1) It should say at the top which version of R/Bioconductor the
workflows are being built on.

2) On the main page, for each workflow:

* A thumbnail (could live in a pre-specified location in the package)
* Author list (autopopulated from DESCRIPTION)
* Version
* Link to the (most current) F1000Research articles for those which
are published (new field in DESCRIPTION?)
* Some kind of CI "buttony" thing, to indicate to users that these are
live documents
* Key Bioc/R packages used in this workflow (could this also be an
additional DESCRIPTION field?)

3) I think it would be good to encourage the more stubby workflow
descriptions to add more text, and maybe to trim the very wordy
ones, so that it's more consistent. Wow, that's pretty obsessive of
me, but I think it would make the page look more professional.

4) Text somewhere with a link to the support site and how to ask for
help on workflows (e.g. vignette(), ?functionName)

5) An advertisement somewhere for submitting a workflow, link to more
detailed doc elsewhere

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BiocFileCache for developers

2017-12-09 Thread Michael Love
thanks Henrik,

I like the explicitness of the `R.cache` approach and I copied it for
my current implementation.

For the BiocFileCache location that should be used for this package
I'm developing, `tximeta`, I'm now using the following logic (sketched
in code after the list):

* If run non-interactively, `tximeta` uses a temporary directory.
* If run interactively, and a location has not been previously saved,
  the user is prompted if she wants to use (1) the default directory or
  a (2) temporary directory.
- If (1), then use the default directory, and save this choice.
- If (2), then use a temporary directory for the rest of this R
  session, and ask again next R session.
* The prompt above also mentions that a specific function can be used
  to manually set the directory at any time point, and this choice is
  saved.
* The default directory is given by `rappdirs::user_cache_dir("BiocFileCache")`.
* The choice itself of the BiocFileCache directory that `tximeta`
  should use is saved in a JSON file here
  `rappdirs::user_cache_dir("tximeta")`.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BiocFileCache for developers

2017-12-01 Thread Michael Love
user_cache_dir(appname="mikes-package-name")

wow, how did you guess it?

I'm storing TxDb's for use across sessions with `rname` set to the
basename of the GTF file, e.g. "gencode.v27.annotation.gtf.gz". I want
to encourage the serendipitous case that there is already a
BiocFileCache entry with this `rname` created outside of the use of my
package. I can see this happening, especially if I mention this naming
pattern in the vignette.
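
In code, the lookup-or-create pattern might look roughly like this (a sketch only; `bfcloc` stands in for the user's chosen cache location and `gtf_url` for a remote GTF, both assumptions):

library(BiocFileCache)
bfc <- BiocFileCache(bfcloc)
rname <- "gencode.v27.annotation.gtf.gz"        # basename of the GTF
hits <- bfcquery(bfc, rname, field = "rname")
if (bfccount(hits) == 0) {
  gtf_path <- bfcadd(bfc, rname, fpath = gtf_url)   # add it to the cache
} else {
  gtf_path <- bfcrpath(bfc, rnames = rname)         # reuse the existing entry
}
txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_path)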

I'm thinking I will encourage the user to pick a good BiocFileCache
location by not setting a default value. Potentially multiple users
could be sharing the same BiocFileCache location, e.g. a lab space on
HPC.

And then actively specifying NULL for the location (or something like
this) could switch the location to:

user_cache_dir(appname = "BiocFileCache")

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BiocFileCache for developers

2017-12-01 Thread Michael Love
One solution if a developer really wants to make sure the user knows
that the function will store a cache somewhere would be to leave the
BiocFileCache location argument without a default value.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] BiocFileCache for developers

2017-12-01 Thread Michael Love
hi,

I'm writing a function which currently uses BiocFileCache to store a
small data.frame and one or more TxDb objects, so that these objects
are persistent and available across sessions (or possibly available to
multiple users).

In the simplest case, I would call

bfc <- BiocFileCache()

inside my function, which will check the default location:

user_cache_dir(appname = "BiocFileCache")

In general, should developers also support the user specifying a
specific location for the BiocFileCache? That is, should functions using
BiocFileCache have an argument that overrides the above location?
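
One possibility, purely for illustration (the function and argument names are made up): expose the location as an argument that defaults to the shared cache.

library(BiocFileCache)

storeObject <- function(obj, rname,
                        bfcloc = rappdirs::user_cache_dir(appname = "BiocFileCache")) {
  bfc <- BiocFileCache(bfcloc)
  path <- bfcnew(bfc, rname, ext = ".rds")   # new cache entry, returns a file path
  saveRDS(obj, path)
  invisible(path)
}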

thanks,
Mike

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] workflow builder error: winbuilder1 says "pdflatex is not available"

2017-09-14 Thread Michael Love
Thank you for the quick fix and reply Andrzej. best, Mike

On Thu, Sep 14, 2017 at 6:30 PM, Andrzej Oleś <andrzej.o...@gmail.com> wrote:
> Hi Mike,
>
> thanks for reporting this. Strange, not sure what the problem was because
> now I was able to rebuild the workflow without actually reinstalling or
> reconfiguring anything on the builder. Maybe this was a transient issue
> related to the fact that I updated the LaTeX distro on the builder some days
> ago.
>
> Best,
> Andrzej
>
>
> On Thu, Sep 14, 2017 at 6:08 PM, Michael Love <michaelisaiahl...@gmail.com>
> wrote:
>>
>> hi,
>>
>> I pushed a change to rnaseqGene workflow and only winbuilder1 had a
>> problem:
>>
>> * checking PDF version of manual ...Warning: running command
>> '"C:/Progra~1/R/R-3.4.0/bin/x64/Rcmd.exe" Rd2pdf  --batch --no-preview
>> --build-dir="C:/Windows/TEMP/RtmpE9T1fB/Rd2pdff6059a733c8" --no-clean
>> -o  rnaseqGene-manual.pdf
>>
>> "C:/DOCBUILDER_TEMP/jenkins-rnaseqGene-label=winbuilder1-81/rnaseqGene.Rcheck/rnaseqGene"'
>> had status 1
>>  WARNING
>> LaTeX errors when creating PDF version.
>> This typically indicates Rd problems.
>> * checking PDF version of manual without hyperrefs or index
>> ...Warning: running command '"C:/Progra~1/R/R-3.4.0/bin/x64/Rcmd.exe"
>> Rd2pdf  --batch --no-preview
>> --build-dir="C:/Windows/TEMP/RtmpE9T1fB/Rd2pdff603f1c7162" --no-clean
>> --no-index -o  rnaseqGene-manual.pdf
>>
>> C:/DOCBUILDER_TEMP/jenkins-rnaseqGene-label=winbuilder1-81/rnaseqGene.Rcheck/rnaseqGene'
>> had status 1
>>  ERROR
>> Re-running with no redirection of stdout/stderr.
>> Hmm ... looks like a package
>> Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet,
>> :
>>   pdflatex is not available
>> Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet,
>> :
>>   pdflatex is not available
>> Error in running tools::texi2pdf()
>>
>> The full report:
>>
>>
>> http://docbuilder.bioconductor.org:8080/job/rnaseqGene/label=winbuilder1/lastBuild/console
>>
>> thanks,
>> Mike
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] workflow builder error: winbuilder1 says "pdflatex is not available"

2017-09-14 Thread Michael Love
hi,

I pushed a change to rnaseqGene workflow and only winbuilder1 had a problem:

* checking PDF version of manual ...Warning: running command
'"C:/Progra~1/R/R-3.4.0/bin/x64/Rcmd.exe" Rd2pdf  --batch --no-preview
--build-dir="C:/Windows/TEMP/RtmpE9T1fB/Rd2pdff6059a733c8" --no-clean
-o  rnaseqGene-manual.pdf
"C:/DOCBUILDER_TEMP/jenkins-rnaseqGene-label=winbuilder1-81/rnaseqGene.Rcheck/rnaseqGene"'
had status 1
 WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
* checking PDF version of manual without hyperrefs or index
...Warning: running command '"C:/Progra~1/R/R-3.4.0/bin/x64/Rcmd.exe"
Rd2pdf  --batch --no-preview
--build-dir="C:/Windows/TEMP/RtmpE9T1fB/Rd2pdff603f1c7162" --no-clean
--no-index -o  rnaseqGene-manual.pdf
C:/DOCBUILDER_TEMP/jenkins-rnaseqGene-label=winbuilder1-81/rnaseqGene.Rcheck/rnaseqGene'
had status 1
 ERROR
Re-running with no redirection of stdout/stderr.
Hmm ... looks like a package
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet,  :
  pdflatex is not available
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet,  :
  pdflatex is not available
Error in running tools::texi2pdf()

The full report:

http://docbuilder.bioconductor.org:8080/job/rnaseqGene/label=winbuilder1/lastBuild/console

thanks,
Mike

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta

2017-04-17 Thread Michael Love
alpine, which had this error as of Sunday, is now cleared (I didn't
make any changes to the code)

http://master.bioconductor.org/checkResults/devel/bioc-LATEST/alpine/

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] tximport fails to build on Windows

2017-03-20 Thread Michael Love
Anyone have any ideas for how to debug this?

I get an error from the tximport vignette on tokay2, but I can't
figure out what could be the issue:

...
1 Quitting from lines 185-189 (tximport.Rmd)
Error: processing vignette 'tximport.Rmd' failed with diagnostics:
cannot open the connection
Execution halted

http://master.bioconductor.org/checkResults/devel/bioc-LATEST/tximport/tokay2-buildsrc.html

Those lines look innocuous to me, and work fine on Linux and Mac:

184 ```{r}
185 files <- file.path(dir,"sailfish", samples$run, "quant.sf")
186 names(files) <- paste0("sample",1:6)
187 txi.sailfish <- tximport(files, type="sailfish", tx2gene=tx2gene)
188 head(txi.sailfish$counts)
189 ```

I tried bumping the version and I still get the error from tokay2.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] "access forbidden" trying to add new data to experiment package

2016-12-09 Thread Michael Love
That worked.

Thanks Martin

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] "access forbidden" trying to add new data to experiment package

2016-12-09 Thread Michael Love
hi,

After checking out the data_store version, I get an svn error "access
forbidden". Can you remind me of the proper steps (I tried following
variations on the Source Code instructions for experiment packages)?

love:test/ $ svn co
https://hedgehog.fhcrc.org/bioc-data/trunk/experiment/data_store/tximportData
A    tximportData/inst
A    tximportData/inst/extdata
A    tximportData/inst/extdata/cufflinks
A    tximportData/inst/extdata/cufflinks/isoforms.count_table
A    tximportData/inst/extdata/cufflinks/isoforms.attr_table
A    tximportData/inst/extdata/cufflinks/isoforms.fpkm_table
A    tximportData/inst/extdata/rsem
A    tximportData/inst/extdata/rsem/ERR188329
A    tximportData/inst/extdata/rsem/ERR188329/ERR188329.genes.results
A    tximportData/inst/extdata/rsem/ERR188356
A    tximportData/inst/extdata/rsem/ERR188356/ERR188356.genes.results
A    tximportData/inst/extdata/rsem/ERR188088
A    tximportData/inst/extdata/rsem/ERR188088/ERR188088.genes.results
A    tximportData/inst/extdata/rsem/ERR188288
A    tximportData/inst/extdata/rsem/ERR188288/ERR188288.genes.results
A    tximportData/inst/extdata/rsem/ERR188297
A    tximportData/inst/extdata/rsem/ERR188297/ERR188297.genes.results
A    tximportData/inst/extdata/rsem/ERR188021
A    tximportData/inst/extdata/rsem/ERR188021/ERR188021.genes.results
A    tximportData/inst/extdata/tx2gene.csv
A    tximportData/inst/extdata/salmon
A    tximportData/inst/extdata/salmon/ERR188329
A    tximportData/inst/extdata/salmon/ERR188329/cmd_info.json
A    tximportData/inst/extdata/salmon/ERR188329/quant.sf
A    tximportData/inst/extdata/salmon/ERR188356
A    tximportData/inst/extdata/salmon/ERR188356/quant.sf
A    tximportData/inst/extdata/salmon/ERR188356/cmd_info.json
A    tximportData/inst/extdata/salmon/ERR188088
A    tximportData/inst/extdata/salmon/ERR188088/quant.sf
A    tximportData/inst/extdata/salmon/ERR188088/cmd_info.json
A    tximportData/inst/extdata/salmon/ERR188288
A    tximportData/inst/extdata/salmon/ERR188288/quant.sf
A    tximportData/inst/extdata/salmon/ERR188288/cmd_info.json
A    tximportData/inst/extdata/salmon/ERR188297
A    tximportData/inst/extdata/salmon/ERR188297/cmd_info.json
A    tximportData/inst/extdata/salmon/ERR188297/quant.sf
A    tximportData/inst/extdata/salmon/ERR188021
A    tximportData/inst/extdata/salmon/ERR188021/cmd_info.json
A    tximportData/inst/extdata/salmon/ERR188021/quant.sf
A    tximportData/inst/extdata/kallisto
A    tximportData/inst/extdata/kallisto/ERR188329
A    tximportData/inst/extdata/kallisto/ERR188329/run_info.json
A    tximportData/inst/extdata/kallisto/ERR188329/abundance.tsv
A    tximportData/inst/extdata/kallisto/ERR188356
A    tximportData/inst/extdata/kallisto/ERR188356/run_info.json
A    tximportData/inst/extdata/kallisto/ERR188356/abundance.tsv
A    tximportData/inst/extdata/kallisto/ERR188088
A    tximportData/inst/extdata/kallisto/ERR188088/run_info.json
A    tximportData/inst/extdata/kallisto/ERR188088/abundance.tsv
A    tximportData/inst/extdata/kallisto/ERR188288
A    tximportData/inst/extdata/kallisto/ERR188288/abundance.tsv
A    tximportData/inst/extdata/kallisto/ERR188288/run_info.json
A    tximportData/inst/extdata/kallisto/ERR188297
A    tximportData/inst/extdata/kallisto/ERR188297/run_info.json
A    tximportData/inst/extdata/kallisto/ERR188297/abundance.tsv
A    tximportData/inst/extdata/kallisto/ERR188021
A    tximportData/inst/extdata/kallisto/ERR188021/run_info.json
A    tximportData/inst/extdata/kallisto/ERR188021/abundance.tsv
A    tximportData/inst/extdata/sailfish
A    tximportData/inst/extdata/sailfish/ERR188329
A    tximportData/inst/extdata/sailfish/ERR188329/cmd_info.json
A    tximportData/inst/extdata/sailfish/ERR188329/quant.sf
A    tximportData/inst/extdata/sailfish/ERR188356
A    tximportData/inst/extdata/sailfish/ERR188356/quant.sf
A    tximportData/inst/extdata/sailfish/ERR188356/cmd_info.json
A    tximportData/inst/extdata/sailfish/ERR188088
A    tximportData/inst/extdata/sailfish/ERR188088/cmd_info.json
A    tximportData/inst/extdata/sailfish/ERR188088/quant.sf
A    tximportData/inst/extdata/sailfish/ERR188288
A    tximportData/inst/extdata/sailfish/ERR188288/quant.sf
A    tximportData/inst/extdata/sailfish/ERR188288/cmd_info.json
A    tximportData/inst/extdata/sailfish/ERR188297
A    tximportData/inst/extdata/sailfish/ERR188297/quant.sf
A    tximportData/inst/extdata/sailfish/ERR188297/cmd_info.json
A    tximportData/inst/extdata/sailfish/ERR188021
A    tximportData/inst/extdata/sailfish/ERR188021/quant.sf
A    tximportData/inst/extdata/sailfish/ERR188021/cmd_info.json
A    tximportData/inst/extdata/samples.txt
A    tximportData/inst/extdata/samples_extended.txt
Checked out revision 4006.

love:tximportData/ $ cd inst/extdata
[15:06:02]


Then I add new files:


love:extdata/ $ svn add kallisto_h5
[15:06:19]
A kallisto_h5
A kallisto_h5/ERR188329
A kallisto_h5/ERR188329/run_info.json
A  (bin)  kallisto_h5/ERR188329/abundance.h5
A 

Re: [Bioc-devel] "single cell" BiocViews

2016-09-20 Thread Michael Love
That looks right to me, as it spans across transcription, etc. There are so
many great single cell methods now on Bioconductor.

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] "single cell" BiocViews

2016-09-20 Thread Michael Love
It would be useful to have a "single cell" BiocViews

-Mike

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Experimental data package or ExperimentHub

2016-07-14 Thread Michael Love
dear Valerie,

This is a simple R question, but I can't seem to find the answer. I'm
assuming ExperimentHub wants objects created by save() and not
saveRDS().

In make-data.R, I have a loop where I programmatically make a
GAlignmentPairs object for each of 4 samples. I'm then using assign()
to assign the value to an appropriate name, e.g.

assign(sample.name[i], x)

where x is a GAlignmentPairs object I've made.

I'm having trouble though saving this to a file. My first try was:

save(sample.name[i], file=paste0(sample.name[i],".rda"))

But this gives:

Error in save(sample.name[i], file = ...
  object 'sample.name[i]' not found

I also tried get(), but it's also giving the "not found" error.

save(get(sample.name[i]), file=paste0(sample.name[i],".rda"))
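
For readers who hit the same error: the list= argument of save() is the usual way to save an object by a name held in a character vector. A small self-contained sketch (the names and the data.frame stand in for the real samples and GAlignmentPairs objects):

sample.name <- c("ERR188088", "ERR188297")   # example names
x <- data.frame(a = 1:3)                     # stand-in for the GAlignmentPairs object
i <- 1
assign(sample.name[i], x)
save(list = sample.name[i], file = paste0(sample.name[i], ".rda"))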


thanks
Mike

On Thu, Jun 30, 2016 at 6:41 PM, Obenchain, Valerie
<valerie.obench...@roswellpark.org> wrote:
>
> Hi Mike,
>
> Yes, let's go the ExperimentHub route. The focus is on detailed
> documentation / real use cases so the vignette you describe sounds like
> a good fit.
>
> Essentially you'll put a package together that is very similar to an
> experimental data package with a few extra files described in the
> ExperimentHubData vignette:
>
>
> http://www.bioconductor.org/packages/3.4/bioc/vignettes/ExperimentHubData/inst/doc/ExperimentHubData.html
>
> Let me know when you've got the data objects ready and I'll put them in
> S3. Once that's done we'll test the package functions and put the
> package through a review on the new tracker.
>
> Val
>
>
>
> On 06/30/2016 09:27 AM, Michael Love wrote:
> > I want to generate an experimental data package with a small subset of
> > alignments. The end goal is a software vignette, where the user would
> > specify BAM files. I'd like to try out the ExperimentHub submission
> > process, which is for R data objects. Does it sound reasonable if I
> > were to have an initial code chunk that loads GAlignments and writes
> > out BAM files using rtracklayer::export to then show the use of
> > software with BAM files? There would be so few alignments that it
> > wouldn't be a large computational or storage expense. Or should I
> > submit a standard Bioc experimental data package with files in
> > inst/extdata?
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
>
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Experimental data package or ExperimentHub

2016-06-30 Thread Michael Love
I want to generate an experimental data package with a small subset of
alignments. The end goal is a software vignette, where the user would
specify BAM files. I'd like to try out the ExperimentHub submission
process, which is for R data objects. Does it sound reasonable if I
were to have an initial code chunk that loads GAlignments and writes
out BAM files using rtracklayer::export to then show the use of
software with BAM files? There would be so few alignments that it
wouldn't be a large computational or storage expense. Or should I
submit a standard Bioc experimental data package with files in
inst/extdata?

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Listing package-specific methods

2016-06-27 Thread Michael Love
This seems like a pretty simple solution, so I like that :)

On Mon, Jun 27, 2016 at 10:46 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:

> Hi Mike,
>
> IIUC you want to do something like
>
>   setdiff(methods(class="DESeqDataSet"),
>   methods(class="RangedSummarizedExperiment"))
>
> Maybe this could be handled by methods() itself e.g. with
> an extra argument that lets the user choose if s/he wants to see
> all the methods (i.e. specific + inherited) or only the specific
> ones?
>
> H.
>
> On 06/27/2016 09:39 AM, Michael Love wrote:
>
>> hi,
>>
>> Following on a conversation from Bioc2016, I think it would be good to
>> have
>> a function available to Bioconductor users that helps in the following
>> situation:
>>
>> I'm a user, trying out a new package 'foo', which defines the FooData
>> class, that builds on top of SummarizedExperiment. The package author has
>> defined a set of methods for the FooData class in the 'foo' package. From
>> the R console, I want to see a list of these methods, but I don't want to
>> look through the entire list of methods defined for SummarizedExperiment,
>> Vector, and Annotated.
>>
>> I think such a function should live somewhere easily accessible for
>> Bioconductor users. I worked out two implementations of this function
>> which
>> you can see here:
>>
>> http://rpubs.com/mikelove/pkgmethods
>>
>> [[alternative HTML version deleted]]
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Listing package-specific methods

2016-06-27 Thread Michael Love
1) I guess if someone else defined methods for the FooData class then those
could be of interest, but I think nearly all of the time I want to know
just what this specific package author has defined.

2) This is getting down to my (perhaps idiosyncratic) wants, but I would
like a quick and short printout to the console, so that I can recall the
names (maybe I just forgot) or to look up the man page. So the signatures
then just take up extra space IMO.

On Mon, Jun 27, 2016 at 10:08 AM, Vincent Carey <st...@channing.harvard.edu>
wrote:

> 1) is the package connection key, or is it the 'directness' of the method
> in connection
>
> with the class of interest?  (sorry to be so vague, there must be a more
> scientific term...)
>
> 2) it seems useful to get the signature too.  this uses string operations
> to get at something that
>
> the class system likely "knows about" but seems to get close to your target
>
> > directMethods = function(cl) {
>
> +  grep(cl, methods(class=cl), value=TRUE)
>
> + }
>
> > directMethods(class(dds))
>
>  [1] "coef.DESeqDataSet"
>
>  [2] "coerce,DESeqDataSet,RangedSummarizedExperiment-method"
>
>  [3] "coerce<-,DESeqDataSet,RangedSummarizedExperiment-method"
>
>  [4] "counts,DESeqDataSet-method"
>
>  [5] "counts<-,DESeqDataSet,matrix-method"
>
>  [6] "design,DESeqDataSet-method"
>
>  [7] "design<-,DESeqDataSet,formula-method"
>
>  [8] "dispersionFunction,DESeqDataSet-method"
>
>  [9] "dispersionFunction<-,DESeqDataSet,function-method"
>
> [10] "dispersions,DESeqDataSet-method"
>
> [11] "dispersions<-,DESeqDataSet,numeric-method"
>
> [12] "estimateDispersions,DESeqDataSet-method"
>
> [13] "estimateSizeFactors,DESeqDataSet-method"
>
> [14] "normalizationFactors,DESeqDataSet-method"
>
> [15] "normalizationFactors<-,DESeqDataSet,matrix-method"
>
> [16] "plotDispEsts,DESeqDataSet-method"
>
> [17] "plotMA,DESeqDataSet-method"
>
> [18] "sizeFactors,DESeqDataSet-method"
>
> [19] "sizeFactors<-,DESeqDataSet,numeric-method"
>
> On Mon, Jun 27, 2016 at 9:39 AM, Michael Love <michaelisaiahl...@gmail.com
> > wrote:
>
>> hi,
>>
>> Following on a conversation from Bioc2016, I think it would be good to
>> have
>> a function available to Bioconductor users that helps in the following
>> situation:
>>
>> I'm a user, trying out a new package 'foo', which defines the FooData
>> class, that builds on top of SummarizedExperiment. The package author has
>> defined a set of methods for the FooData class in the 'foo' package. From
>> the R console, I want to see a list of these methods, but I don't want to
>> look through the entire list of methods defined for SummarizedExperiment,
>> Vector, and Annotated.
>>
>> I think such a function should live somewhere easily accessible for
>> Bioconductor users. I worked out two implementations of this function
>> which
>> you can see here:
>>
>> http://rpubs.com/mikelove/pkgmethods
>>
>> [[alternative HTML version deleted]]
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Listing package-specific methods

2016-06-27 Thread Michael Love
hi,

Following on a conversation from Bioc2016, I think it would be good to have
a function available to Bioconductor users that helps in the following
situation:

I'm a user, trying out a new package 'foo', which defines the FooData
class, that builds on top of SummarizedExperiment. The package author has
defined a set of methods for the FooData class in the 'foo' package. From
the R console, I want to see a list of these methods, but I don't want to
look through the entire list of methods defined for SummarizedExperiment,
Vector, and Annotated.

I think such a function should live somewhere easily accessible for
Bioconductor users. I worked out two implementations of this function which
you can see here:

http://rpubs.com/mikelove/pkgmethods
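
One of many possible implementations, sketched here for readers who don't follow the link (this is not necessarily what the rpubs post does; it follows Hervé's setdiff() suggestion quoted earlier in this thread, dropping everything inherited from a given parent class):

specificMethods <- function(class, parent) {
  setdiff(methods(class = class), methods(class = parent))
}
## e.g. specificMethods("DESeqDataSet", "RangedSummarizedExperiment")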

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] workflow builder "Failed to check repository revision..."

2016-03-01 Thread Michael Love
hi,

I committed a new version of the rnaseqGene workflow about an hour ago and
am waiting on the workflow builder to start running.

There may be something going on here:

Started on Mar 1, 2016 10:28:00 AM
Received SCM poll call on master for rnaseqGene on Mar 1, 2016 10:28:01 AM
ERROR: Failed to check repository revision for
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/workflows/rnaseqGene
org.tmatesoft.svn.core.SVNCancelException: svn: E200015: authentication
cancelled

http://docbuilder.bioconductor.org:8080/job/rnaseqGene/scmPollLog/

thanks,
Mike

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] workflows running Bioc-devel

2016-02-28 Thread Michael Love
+1

I'm also interested in having the option of devel / release workflow
branches for the reasons Aaron mentions.

I develop software and workflow in tandem but currently there is a
disconnect in showing publicly how software changes will impact a workflow.

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] package citation and version is not updated

2016-02-15 Thread Michael Love
hi Dan,

I'm having the same issue with tximport:

http://bioconductor.org/packages/devel/bioc/html/tximport.html

The CITATION file has been in there since the package was added to devel
last week

https://github.com/Bioconductor-mirror/tximport/blob/master/inst/CITATION

citation("tximport") works

thanks,
Mike

On Fri, Jan 29, 2016 at 5:56 PM, Dan Tenenbaum 
wrote:

> This is fixed now. Thanks for the report.
> Dan
>
>
> - Original Message -
> > From: "Ludwig Geistlinger" 
> > To: "bioc-devel" 
> > Sent: Friday, January 29, 2016 2:06:37 AM
> > Subject: [Bioc-devel] package citation and version is not updated
>
> > Dear Dan,
> >
> > For some reason the citation on my package's landing page
> >
> > http://bioconductor.org/packages/EnrichmentBrowser.html
> >
> > is not displayed in agreement with my inst/CITATION file.
> >
> > Furthermore, the version number is also not up to date here, although
> > displayed correctly at the bottom of the page (--> Package Archives).
> >
> > How to resolve this?
> >
> > Thx,
> > Ludwig
> >
> > --
> > Dipl.-Bioinf. Ludwig Geistlinger
> >
> > Lehr- und Forschungseinheit für Bioinformatik
> > Institut für Informatik
> > Ludwig-Maximilians-Universität München
> > Amalienstrasse 17, 2. Stock, Büro A201
> > 80333 München
> >
> > Tel.: 089-2180-4067
> > eMail: ludwig.geistlin...@bio.ifi.lmu.de
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] TIMEOUTs on Morelia

2016-02-01 Thread Michael Love
Thanks Stephen and Dan, sorry I haven't chased this down recently.

I have access to a Mac and will look into it now.

-Mike

On Mon, Feb 1, 2016 at 1:45 PM, Dan Tenenbaum 
wrote:

> Actually, it does seem like there is a more complicated issue going on. I
> will investigate and report back here.
>
> Thanks,
> Dan
>
>
> - Original Message -
> > From: "Dan Tenenbaum" 
> > To: "Hartley, Stephen (NIH/NHGRI) [F]" 
> > Cc: "bioc-devel" 
> > Sent: Monday, February 1, 2016 10:11:54 AM
> > Subject: Re: [Bioc-devel] TIMEOUTs on Morelia
>
> > I think the morelia server was overloaded, so it was restarted yesterday
> around
> > 11PM Seattle time, so I hope we will see fewer TIMEOUTs in today's build
> > report.
> > Dan
> >
> >
> > - Original Message -
> >> From: "Hartley, Stephen (NIH/NHGRI) [F]" 
> >> To: "bioc-devel" 
> >> Sent: Monday, February 1, 2016 10:06:11 AM
> >> Subject: [Bioc-devel] TIMEOUTs on Morelia
> >
> >> The current devel build for DESeq2 has status TIMEOUT on Morelia. This
> is odd
> >> since the package builder gives it 40 minutes to complete, and it
> finishes in
> >> less than 5 on zin2 and muscato2.
> >>
> >> A bunch of packages that depend on or suggest DESeq2 also timeout on
> Morelia:
> >> DiffBind, derfinder, DChIPRep, gage, phyloseq, rgsepd, systemPipeR, and
> TCC.
> >> All of them time out after 40 minutes, but take less than 10 minutes to
> build
> >> on zin2 and moscato2.
> >>
> >> So I guess the question is: is it DESeq2, or is something wrong with
> the Morelia
> >> server? I don't have an OSX-mavericks machine to test it all out on.
> >>
> >> Is there any way to get previous daily build reports, or are they
> deleted daily?
> >>
> >> -Steve
> >>
> >>  [[alternative HTML version deleted]]
> >>
> >> ___
> >> Bioc-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] TIMEOUTs on Morelia

2016-02-01 Thread Michael Love
I was able to construct a minimal example where it seems the bug originates
from SummarizedExperiment:

library(SummarizedExperiment)
se <- SummarizedExperiment(matrix(1:6,ncol=2))
rse <- as(se, "RangedSummarizedExperiment")
rse[1:2,] # hangs


On Mon, Feb 1, 2016 at 2:57 PM, Michael Love <michaelisaiahl...@gmail.com>
wrote:

> ​Thanks Stephen and Dan, sorry I haven't chased this down recently.
>
> I have access to a Mac and will look into it now.
>
> -Mike​
>
> On Mon, Feb 1, 2016 at 1:45 PM, Dan Tenenbaum <dtene...@fredhutch.org>
> wrote:
>
>> Actually, it does seem like there is a more complicated issue going on. I
>> will investigate and report back here.
>>
>> Thanks,
>> Dan
>>
>>
>> - Original Message -
>> > From: "Dan Tenenbaum" <dtene...@fredhutch.org>
>> > To: "Hartley, Stephen (NIH/NHGRI) [F]" <stephen.hart...@nih.gov>
>> > Cc: "bioc-devel" <bioc-devel@r-project.org>
>> > Sent: Monday, February 1, 2016 10:11:54 AM
>> > Subject: Re: [Bioc-devel] TIMEOUTs on Morelia
>>
>> > I think the morelia server was overloaded, so it was restarted
>> yesterday around
>> > 11PM Seattle time, so I hope we will see fewer TIMEOUTs in today's build
>> > report.
>> > Dan
>> >
>> >
>> > - Original Message -
>> >> From: "Hartley, Stephen (NIH/NHGRI) [F]" <stephen.hart...@nih.gov>
>> >> To: "bioc-devel" <bioc-devel@r-project.org>
>> >> Sent: Monday, February 1, 2016 10:06:11 AM
>> >> Subject: [Bioc-devel] TIMEOUTs on Morelia
>> >
>> >> The current devel build for DESeq2 has status TIMEOUT on Morelia. This
>> is odd
>> >> since the package builder gives it 40 minutes to complete, and it
>> finishes in
>> >> less than 5 on zin2 and moscato2.
>> >>
>> >> A bunch of packages that depend on or suggest DESeq2 also timeout on
>> Morelia:
>> >> DiffBind, derfinder, DChIPRep, gage, phyloseq, rgsepd, systemPipeR,
>> and TCC.
>> >> All of them time out after 40 minutes, but take less than 10 minutes
>> to build
>> >> on zin2 and moscato2.
>> >>
>> >> So I guess the question is: is it DESeq2, or is something wrong with
>> the Morelia
>> >> server? I don't have an OSX-mavericks machine to test it all out on.
>> >>
>> >> Is there any way to get previous daily build reports, or are they
>> deleted daily?
>> >>
>> >> -Steve
>> >>
>> >>  [[alternative HTML version deleted]]
>> >>
>> >> ___
>> >> Bioc-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >
>> > ___
>> > Bioc-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] TIMEOUTs on Morelia

2016-02-01 Thread Michael Love
Thanks for the quick fix!
On Feb 1, 2016 6:36 PM, "Hervé Pagès" <hpa...@fredhutch.org> wrote:

> Hi all,
>
> The problem was in S4Vectors where I introduced a regression about 10
> days ago. Thanks to Martin for spotting the bug for me. Should be fixed
> in S4Vectors 0.9.27.
>
> Sorry for the inconvenience,
> H.
>
>
> On 02/01/2016 01:02 PM, Morgan, Martin wrote:
>
>> I don't see an error, but on linux under valgrind and current R-devel /
>> bioc packages and especially
>>
>> packageVersion("S4Vectors")
>>>
>> [1] '0.9.26'
>>
>>
>> I see
>>
>> rse[1:2,] # hangs
>>>
>> ==23105== Invalid read of size 4
>> ==23105==at 0xD0B8D4D: int_bsearch (Rle_class.c:606)
>> ==23105==by 0xD0B8C45: find_window_runs2 (Rle_class.c:654)
>> ==23105==by 0xD0B868B: find_runs_of_ranges2 (Rle_class.c:719)
>> ==23105==by 0xD0B6F5A: find_runs_of_ranges (Rle_class.c:837)
>> ==23105==by 0xD0B75CA: _subset_Rle_by_ranges (Rle_class.c:996)
>> ==23105==by 0xD0B7A42: Rle_extract_ranges (Rle_class.c:1018)
>> ==23105==by 0x4F24411: R_doDotCall (dotcode.c:582)
>> ==23105==by 0x4F31621: do_dotcall (dotcode.c:1251)
>> ==23105==by 0x4F6FA8A: Rf_eval (eval.c:713)
>> ==23105==by 0x4F88101: do_begin (eval.c:1806)
>> ==23105==by 0x4F6F8B7: Rf_eval (eval.c:685)
>> ==23105==by 0x4F85303: Rf_applyClosure (eval.c:1134)
>> ==23105==  Address 0x80f1760 is 0 bytes after a block of size 0 alloc'd
>> ==23105==at 0x4C2AB80: malloc (in
>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==23105==by 0xD0B8549: find_runs_of_ranges2 (Rle_class.c:704)
>> ==23105==by 0xD0B6F5A: find_runs_of_ranges (Rle_class.c:837)
>> ==23105==by 0xD0B75CA: _subset_Rle_by_ranges (Rle_class.c:996)
>> ==23105==by 0xD0B7A42: Rle_extract_ranges (Rle_class.c:1018)
>> ==23105==by 0x4F24411: R_doDotCall (dotcode.c:582)
>> ==23105==by 0x4F31621: do_dotcall (dotcode.c:1251)
>> ==23105==by 0x4F6FA8A: Rf_eval (eval.c:713)
>> ==23105==by 0x4F88101: do_begin (eval.c:1806)
>> ==23105==by 0x4F6F8B7: Rf_eval (eval.c:685)
>> ==23105==by 0x4F85303: Rf_applyClosure (eval.c:1134)
>> ==23105==by 0x4F6FB9C: Rf_eval (eval.c:732)
>> ==23105==
>> ==23105== Invalid read of size 4
>> ==23105==at 0xD0B8D7D: int_bsearch (Rle_class.c:612)
>> ==23105==by 0xD0B8C45: find_window_runs2 (Rle_class.c:654)
>> ==23105==by 0xD0B868B: find_runs_of_ranges2 (Rle_class.c:719)
>> ==23105==by 0xD0B6F5A: find_runs_of_ranges (Rle_class.c:837)
>> ==23105==by 0xD0B75CA: _subset_Rle_by_ranges (Rle_class.c:996)
>> ==23105==by 0xD0B7A42: Rle_extract_ranges (Rle_class.c:1018)
>> ==23105==by 0x4F24411: R_doDotCall (dotcode.c:582)
>> ==23105==by 0x4F31621: do_dotcall (dotcode.c:1251)
>> ==23105==by 0x4F6FA8A: Rf_eval (eval.c:713)
>> ==23105==by 0x4F88101: do_begin (eval.c:1806)
>> ==23105==by 0x4F6F8B7: Rf_eval (eval.c:685)
>> ==23105==by 0x4F85303: Rf_applyClosure (eval.c:1134)
>> ==23105==  Address 0x8ba088c is 4 bytes before a block of size 0 alloc'd
>> ==23105==at 0x4C2AB80: malloc (in
>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==23105==by 0xD0B8549: find_runs_of_ranges2 (Rle_class.c:704)
>> ==23105==by 0xD0B6F5A: find_runs_of_ranges (Rle_class.c:837)
>> ==23105==by 0xD0B75CA: _subset_Rle_by_ranges (Rle_class.c:996)
>> ==23105==by 0xD0B7A42: Rle_extract_ranges (Rle_class.c:1018)
>> ==23105==by 0x4F24411: R_doDotCall (dotcode.c:582)
>> ==23105==by 0x4F31621: do_dotcall (dotcode.c:1251)
>> ==23105==by 0x4F6FA8A: Rf_eval (eval.c:713)
>> ==23105==by 0x4F88101: do_begin (eval.c:1806)
>> ==23105==by 0x4F6F8B7: Rf_eval (eval.c:685)
>> ==23105==by 0x4F85303: Rf_applyClosure (eval.c:1134)
>> ==23105==by 0x4F6FB9C: Rf_eval (eval.c:732)
>> ==23105==
>>
>>
>>
>> 
>> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Michael
>> Love <michaelisaiahl...@gmail.com>
>> Sent: Monday, February 1, 2016 3:35 PM
>> To: Dan Tenenbaum
>> Cc: bioc-devel; Hervé Pagès
>> Subject: Re: [Bioc-devel] TIMEOUTs on Morelia
>>
>> I was able to construct a minimal example where it seems the bug
>> originates
>> from SummarizedExperiment:
>>
>> library(SummarizedExperiment)
>> se <- SummarizedExperiment(matrix(1:6,ncol=2))
>> rse <- as(se, "RangedSummarizedExperiment")
>> rse[1:2,] # hangs
>

Re: [Bioc-devel] ggplot2 2.0.0 export Position now, conflict with BiocGeneric::Position?

2015-12-19 Thread Michael Love
thanks for the heads up Lorena and Martin

I've just checked in a DESeq2 update that uses importFrom() for the ggplot2 functions it needs

I now just have one warning for DESeq2 CHECK:

Warning: multiple methods tables found for ‘relistToClass’
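
For anyone hitting the same warning: the fix amounts to importing only what is
needed from ggplot2 in the package NAMESPACE. A sketch of what that looks like
(the particular ggplot2 functions listed here are illustrative placeholders,
not necessarily the ones DESeq2 actually imports):

## NAMESPACE -- selective import, sketch only
importFrom(ggplot2, ggplot, aes_string, geom_point, scale_y_log10, xlab, ylab)
## rather than the blanket
##   import(ggplot2)
## which also pulls in ggplot2::Position and collides with the Position
## generic exported by BiocGenerics, producing the "replacing previous
## import" warning at load time.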

On Sat, Dec 19, 2015 at 2:02 PM, Morgan, Martin <
martin.mor...@roswellpark.org> wrote:

> You are right that BiocGenerics creates and exports a generic Position.
>
> I think you are right that the warning is coming from DESeq2, which
> imports(BiocGenerics) and imports(ggplot2).
>
> The solution is for the DESeq2 author to do as you did, importFrom()
> selectively, or continue to declare Imports: in the DESCRIPTION file but
> use ggplot2::Position, etc., in the R code.
>
> I am confident that this will be cleaned up by the DESeq2 author in the
> near future; there are a number of other problems likely to be introduced
> by ggplot2 changes in other Bioconductor packages.
>
> Martin
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Lo [
> lorena.pant...@gmail.com]
> Sent: Saturday, December 19, 2015 12:35 PM
> To: bioc-devel@r-project.org
> Subject: [Bioc-devel] ggplot2 2.0.0 export Position now, conflict with
> BiocGeneric::Position?
>
> Hi all,
>
> maybe this is so stupid, I just updated to R 3.3 and all packages,
> including gglot2 (2.0.0, two days ago).
>
> I noticed ggplot2 now exports Position, and I think BiocGenerics as well
> has that?
>
> Now I'm having problems with my package that is under review in the
> package builder with this warning:
>
> Warning: replacing previous import by ‘ggplot2::Position’ when loading
> ‘DESeq2’
>
> I import DESeq2 and DESeq2 import ggplot2. I was importing as well
> ggplot2, so I was seeing this warning twice. I had to use importFrom to
> fix it. Now I only see the warnings once.
>
> Could someone try to replicate?
>
> I just updated ggplot2 to 2.0.0 in R3.3 and installed devel version of
> DESeq2, and I get the warning. I guess any package importing ggplot2
> should do the same.
>
> I just want to know if it is some stupid thing I am doing, or this will
> start happening always in the future?
>
> sorry if it was stupid.
>
> thanks
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
> This email message may contain legally privileged and/or confidential
> information.  If you are not the intended recipient(s), or the employee or
> agent responsible for the delivery of this message to the intended
> recipient(s), you are hereby notified that any disclosure, copying,
> distribution, or use of this email message is prohibited.  If you have
> received this message in error, please notify the sender immediately by
> e-mail and delete this email message from your computer. Thank you.
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Bioc twitter bot is on repeat

2015-11-24 Thread Michael Love
best said by James Eddy:

"Could someone please put out a new @Bioconductor package so @Bioconductor
stops tweeting about the same 3 every day?"

https://twitter.com/jamesaeddy/status/668911442007949313

https://twitter.com/Bioconductor

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] as.character method for GenomicRanges?

2015-09-15 Thread Michael Love
+1

I was in need of this function yesterday and generally about once a week,
when looking up ranges in IGV or UCSC.

On Tue, Sep 15, 2015 at 12:59 PM, Michael Lawrence <
lawrence.mich...@gene.com> wrote:

> Did this as.character method ever get added? It was a good idea, and we
> should add it even though we haven't figured out the table stuff yet. It's
> fine if it appends the strand whenever there is at least one range with
> +/-.
>
> Michael
>
> On Mon, Apr 27, 2015 at 2:23 PM, Hervé Pagès  wrote:
>
> > On 04/27/2015 02:15 PM, Michael Lawrence wrote:
> >
> >> It would be nice to have a single function call that would hide these
> >> details. It could probably be made more efficient also by avoiding
> >> multiple matching, unnecessary revmap lists, etc. tableAsGRanges() is
> >> not a good name but it conveys what I mean (does that make it actually
> >> good?).
> >>
> >
> > There is nothing specific to GRanges here. We're just reporting the
> > frequency of unique elements in a metadata column so this belongs to
> > the "extended" Vector API in the same way that findMatches/countMatches
> > do.
> >
> > H.
> >
> >
> >> On Mon, Apr 27, 2015 at 12:23 PM, Hervé Pagès  >> > wrote:
> >>
> >> On 04/24/2015 11:41 AM, Michael Lawrence wrote:
> >>
> >> Taking this a bit off topic but it would be nice if we could get
> >> the
> >> GRanges equivalent of as.data.frame(table(x)), i.e., unique(x)
> >> with a
> >> count mcol. Should be easy to support but what should the API be
> >> like?
> >>
> >>
> >> This was actually the motivating use case for introducing
> >> findMatches/countMatches a couple of years ago:
> >>
> >>ux <- unique(x)
> >>mcols(ux)$Freq <- countMatches(ux, x)
> >>
> >> Don't know what a good API would be to make this even more
> >> straightforward though. Maybe via some extra argument to unique()
> >> e.g. 'with.freq'? This is kind of similar to the 'with.revmap'
> >> argument of reduce(). Note that unique() could also support the
> >> 'with.revmap' arg. Once it does, the 'with.freq' arg can also
> >> be implemented by just calling elementLengths() on the "revmap"
> >> metadata column.
> >>
> >> H.
> >>
> >>
> >> On Fri, Apr 24, 2015 at 10:54 AM, Hervé Pagès
> >> 
> >> >>
> >> wrote:
> >>
> >>  On 04/24/2015 10:18 AM, Michael Lawrence wrote:
> >>
> >>  It is a great idea, but I'm not sure I would use it to
> >> implement
> >>  table(). Allocating those strings will be costly. Don't
> >> we
> >>  already have
> >>  the 4-way int hash? Of course, my intuition might be
> >> completely
> >>  off here.
> >>
> >>
> >>  It does use the 4-way int hash internally. as.character()
> >> is only used
> >>  at the very-end to stick the names on the returned table
> >> object.
> >>
> >>  H.
> >>
> >>
> >>
> >>  On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès
> >>  
> >> >
> >>   >>   >>  >>
> >>   Hi Pete,
> >>
> >>   Excellent idea. That will make things like table()
> >> work
> >>  out-of-the-box
> >>   on GenomicRanges objects. I'll add that.
> >>
> >>   Thanks,
> >>   H.
> >>
> >>
> >>
> >>   On 04/24/2015 09:43 AM, Peter Haverty wrote:
> >>
> >>   Would people be interested in having this:
> >>
> >>   setMethod("as.character", "GenomicRanges",
> >>   function(x) {
> >>   paste0(seqnames(x), ":",
> >> start(x), "-",
> >>  end(x))
> >>   })
> >>
> >>   ?
> >>
> >>   I find myself doing that a lot to make unique
> >> names or for
> >>   output that
> >>   goes to collaborators.  I suppose we might
> >> want to tack
> >>  on the
> >>   strand if it
> >>   isn't "*".  I have some code for going the
> other
> >>  direction too,
> >>   if there is
> >>   interest.
> >>
> >>
> >>
> >>  

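
Putting Michael's suggestion together with Pete's original method, a minimal
sketch could look like the following (this is only an illustration, not the
method that was eventually committed to GenomicRanges; the strand is appended
only when at least one range is stranded):

library(GenomicRanges)

setMethod("as.character", "GenomicRanges", function(x, ...) {
    out <- paste0(seqnames(x), ":", start(x), "-", end(x))
    if (any(strand(x) != "*"))          # only decorate when strand is informative
        out <- paste0(out, ":", strand(x))
    out
})

gr <- GRanges("chr1", IRanges(c(10, 20), width = 5), strand = c("+", "*"))
as.character(gr)   # "chr1:10-14:+" "chr1:20-24:*"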
[Bioc-devel] Bioc workflows: using rmarkdown as vignette engine

2015-08-14 Thread Michael Love
hi Dan,

(I add bioc-devel as this might be useful info for others)

Is it possible to specify rmarkdown as the vignette engine for workflows?

The citation functionality [1] of rmarkdown would be useful, as there
is discussion to have workflows possible submitted as peer reviewed
papers.

I tried adding:

<!--
%\VignetteEngine{knitr::rmarkdown}
-->

as recommended here [2] but the auto package builder adds this on top:

<!--
 %\VignetteEngine{knitr::knitr}
-->

and my test of using a citation fails [3] (although locally running
rmarkdown::render(), the citation and bibliography works). My test
citation looks like: [@love2014]

best,

Mike

[1] http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html
[2] http://www.bioconductor.org/packages/release/bioc/vignettes/BiocStyle/inst/doc/HtmlStyle.html
[3] http://bioconductor.org/help/workflows/rnaseqGene/
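
For anyone trying the same thing, the relevant pieces of the vignette look
roughly like this (a sketch only; 'refs.bib' and the 'love2014' key are
placeholders, and whether the workflow builder honors the engine line was
exactly the open question above):

---
title: "My workflow"
bibliography: refs.bib
output: rmarkdown::html_document
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{My workflow}
-->

As shown by @love2014 ... or in brackets [@love2014].

# References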

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] testing class and length of function args

2015-07-23 Thread Michael Love
Thanks all for the pointers
On Jul 23, 2015 10:51 AM, Charles C. Berry ccbe...@ucsd.edu wrote:

 On Wed, 22 Jul 2015, Michael Love wrote:

  it's slightly annoying to write

 foo <- function(x) {
  if ( ! is.numeric(x) ) stop("x should be numeric")
  if ( ! length(x) == 2 ) stop("x should be length 2")
  c(x[2], x[1])
 }

 i wonder if we could have some core functions that test the class and
 the length in one and give the appropriate stop message.

 maybe this exists already



 Not what you are asking, but perhaps this saves you a step:

 stopifnot( is.numeric(x), length(x)==2 )

 stopifnot( ... ) will stop and issue an error msg for the first arg that
 is FALSE.

 HTH,

 Chuck
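
A small wrapper along the lines asked about here is easy to build on top of
such checks; the following is only a sketch, not an existing base or
Bioconductor function:

check_arg <- function(x, predicate, len, what) {
  arg <- deparse(substitute(x))
  if (!predicate(x))
    stop(arg, " should be ", what, call. = FALSE)
  if (length(x) != len)
    stop(arg, " should be length ", len, call. = FALSE)
  invisible(x)
}

foo <- function(x) {
  check_arg(x, is.numeric, 2, "numeric")
  c(x[2], x[1])
}

foo(c(1, 2))   # returns 2 1
## foo("a")    # Error: x should be numeric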


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] testing class and length of function args

2015-07-22 Thread Michael Love
it's slightly annoying to write

foo <- function(x) {
  if ( ! is.numeric(x) ) stop("x should be numeric")
  if ( ! length(x) == 2 ) stop("x should be length 2")
  c(x[2], x[1])
}

i wonder if we could have some core functions that test the class and
the length in one and give the appropriate stop message.

maybe this exists already

-Mike

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] making txdb, and propagating metadata from AnnotationHub to GenomicFeatures

2015-06-18 Thread Michael Love
that's great Hervé and Sonali!

thanks for the quick response.

best, m

On Thu, Jun 18, 2015 at 10:07 AM, Sonali  Arora sar...@fredhutch.org wrote:
 Hi Michael, Herve,


 On 6/17/2015 9:43 PM, Hervé Pagès wrote:

 Hi Michael,

 On 06/17/2015 12:35 AM, Michael Love wrote:

 Background:

 With previous approaches that I would recommend to users for building
 txdb along the way of making count tables, it was desirable that the
 GTF release information would *automatically* be passed into the
 metadata of the rowRanges of the SummarizedExperiment.

 for example, in parathyroidSE, using makeTxDbFromBiomart:

 ...
 BioMart database version : chr ENSEMBL GENES 72 (SANGER UK)
 BioMart dataset : chr hsapiens_gene_ensembl
 BioMart dataset description : chr Homo sapiens genes (GRCh37.p11)

 or, alternatively, using makeTxDbFromGFF, it would be present in the
 name of the GTF file:

 ...
 Data source: chr Homo_sapiens.GRCh37.75.gtf


 Question:

 I'm now interested in switching over to using GTF files available
 through AnnotationHub, and wondering how we can maintain this
 automatic propagation of metadata.

 here's some example code:

 library(GenomicFeatures)
 library(AnnotationHub)
 ah <- AnnotationHub()
 z <- query(ah, c("Ensembl","gtf","Caenorhabditis elegans","release-80"))
 stopifnot(length(z) == 1)
 z$title

   [1] Caenorhabditis_elegans.WBcel235.80.gtf

 gtf <- ah[[names(z)]]
 metadata(gtf)

   $AnnotationHubName
   [1] AH47045


 A first (and hopefully easy) improvement would be that the GRanges and
 other objects on AnnotationHub had more information in their metadata().


 The GRanges from AnnotationHub now include more information in their
 metadata()
 slot. This is added in devel (2.1.27).

 library(AnnotationHub)
 ah = AnnotationHub()
 packageVersion('AnnotationHub')
 gtf <- query(ah, c('gtf', 'homo sapiens', '77', 'ensembl'))
 gr <- gtf[[1]]
 gr
 metadata(gr)

 metadata(gr)
 $AnnotationHubName
 [1] AH28812

 $`File Name`
 [1] Homo_sapiens.GRCh38.77.gtf.gz

 $`Data Source`
 [1]
 "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz"

 $Provider
 [1] Ensembl

 $Organism
 [1] Homo sapiens

 $`Taxonomy ID`
 [1] 9606


 Only some of the fields have been added here, but note that all others can
 be accessed with -

 mcols(ah["AH47045"])

 or

 ah[metadata(gr)$AnnotationHubName]


 Thanks,
 Sonali.


 In the mean time, here is a workaround (in 2 steps):

 1) Add some useful metadata to 'gtf' (grabbed from the AnnotationHub
object):

   addMetadataFromAnnotationHub <- function(x, ah)
   {
 stopifnot(length(ah) == 1L)
 metadata0 <- list(
 `Data source`=ah$sourceurl,
 `Provider`=ah$dataprovider,
 `Organism`=ah$species,
 `Taxonomy ID`=ah$taxonomyid
 )
 metadata(x) <- c(metadata0, metadata(x))
 x
   }

   gtf <- addMetadataFromAnnotationHub(gtf, z)

   metadata(gtf)
   # $`Data source`
   # [1]
   # "ftp://ftp.ensembl.org/pub/release-80/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.80.gtf.gz"
   #
   # $Provider
   # [1] Ensembl
   #
   # $Organism
   # [1] Caenorhabditis elegans
   #
   # $`Taxonomy ID`
   # [1] 6239
   #
   # $AnnotationHubName
   # [1] AH47045

 2) Pass the metadata to makeTxDbFromGRanges():

   txdb <- makeTxDbFromGRanges(gtf,
   metadata=data.frame(
   name=names(metadata(gtf)),
   value=as.character(metadata(gtf))))
   txdb
   # TxDb object:
   # Db type: TxDb
   # Supporting package: GenomicFeatures
   # Data source:
 ftp://ftp.ensembl.org/pub/release-80/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.80.gtf.gz
   # Provider: Ensembl
   # Organism: Caenorhabditis elegans
   # Taxonomy ID: 6239
   # AnnotationHubName: AH47045
   # Genome: WBcel235
   # transcript_nrow: 57834
   # exon_nrow: 173506
   # cds_nrow: 131562
   # Db created by: GenomicFeatures package from Bioconductor
   # Creation time: 2015-06-17 12:24:51 -0700 (Wed, 17 Jun 2015)
   # GenomicFeatures version at creation time: 1.21.13
   # RSQLite version at creation time: 1.0.0
   # DBSCHEMAVERSION: 1.1

   organism(txdb)
   # [1] Caenorhabditis elegans
   taxonomyId(txdb)
   # [1] 6239

 Step 2) should be made easier because the metadata is already in 'gtf'
 so there is no reason why the user would need to pass it again thru the
 'metadata' argument. I'll made that change to makeTxDbFromGRanges().

 H.


 txdb - makeTxDbFromGRanges(gtf)
 metadata(txdb)

  name
   1   Db type
   2Supporting package
   3Genome
   4   transcript_nrow
   5 exon_nrow
   6  cds_nrow
   7 Db created by
   8 Creation time
   9  GenomicFeatures version at creation time
   10 RSQLite version at creation time
   11

[Bioc-devel] making txdb, and propagating metadata from AnnotationHub to GenomicFeatures

2015-06-17 Thread Michael Love
Background:

With previous approaches that I would recommend to users for building
txdb along the way of making count tables, it was desirable that the
GTF release information would *automatically* be passed into the
metadata of the rowRanges of the SummarizedExperiment.

for example, in parathyroidSE, using makeTxDbFromBiomart:

...
BioMart database version : chr ENSEMBL GENES 72 (SANGER UK)
BioMart dataset : chr hsapiens_gene_ensembl
BioMart dataset description : chr Homo sapiens genes (GRCh37.p11)

or, alternatively, using makeTxDbFromGFF, it would be present in the
name of the GTF file:

...
Data source: chr Homo_sapiens.GRCh37.75.gtf


Question:

I'm now interested in switching over to using GTF files available
through AnnotationHub, and wondering how we can maintain this
automatic propagation of metadata.

here's some example code:

library(GenomicFeatures)
library(AnnotationHub)
ah <- AnnotationHub()
z <- query(ah, c("Ensembl","gtf","Caenorhabditis elegans","release-80"))
stopifnot(length(z) == 1)
z$title

 [1] Caenorhabditis_elegans.WBcel235.80.gtf

gtf <- ah[[names(z)]]
metadata(gtf)

 $AnnotationHubName
 [1] AH47045

txdb <- makeTxDbFromGRanges(gtf)
metadata(txdb)

name
 1   Db type
 2Supporting package
 3Genome
 4   transcript_nrow
 5 exon_nrow
 6  cds_nrow
 7 Db created by
 8 Creation time
 9  GenomicFeatures version at creation time
 10 RSQLite version at creation time
 11  DBSCHEMAVERSION
   value
 1  TxDb
 2   GenomicFeatures
 3  WBcel235
 4 57834
 5173506
 6131562
 7 GenomicFeatures package from Bioconductor
 8  2015-06-17 09:31:10 +0200 (Wed, 17 Jun 2015)
 9   1.21.13
 101.0.0
 11  1.1


sessionInfo()

 R Under development (unstable) (2015-04-29 r68278)
 Platform: x86_64-apple-darwin12.5.0 (64-bit)
 Running under: OS X 10.10.3 (Yosemite)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats4parallel  stats graphics  grDevices datasets  utils
 [8] methods   base

 other attached packages:
  [1] AnnotationHub_2.1.26GenomicFeatures_1.21.13 AnnotationDbi_1.31.16
  [4] Biobase_2.29.1  GenomicRanges_1.21.15   GenomeInfoDb_1.5.7
  [7] IRanges_2.3.11  S4Vectors_0.7.5 BiocGenerics_0.15.2
 [10] testthat_0.10.0 knitr_1.10  BiocInstaller_1.19.6

 loaded via a namespace (and not attached):
  [1] Rcpp_0.11.6  compiler_3.3.0
  [3] futile.logger_1.4.1  XVector_0.9.1
  [5] bitops_1.0-6 futile.options_1.0.0
  [7] tools_3.3.0  zlibbioc_1.15.0
  [9] biomaRt_2.25.1   digest_0.6.8
 [11] RSQLite_1.0.0memoise_0.2.1
 [13] shiny_0.12.1 DBI_0.3.1
 [15] rtracklayer_1.29.10  httr_0.6.1
 [17] stringr_1.0.0Biostrings_2.37.2
 [19] R6_2.0.1 XML_3.98-1.2
 [21] BiocParallel_1.3.26  lambda.r_1.1.7
 [23] magrittr_1.5 Rsamtools_1.21.8
 [25] htmltools_0.2.6  GenomicAlignments_1.5.9
 [27] SummarizedExperiment_0.1.5   xtable_1.7-4
 [29] mime_0.3 interactiveDisplayBase_1.7.0
 [31] httpuv_1.3.2 stringi_0.4-1
 [33] RCurl_1.95-4.6   crayon_1.3.0

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] RSS package feeds not updated

2015-05-26 Thread Michael Love
I have another small request: when the RSS feed returns to normal, it says
"No build problems for myPackage". Could this title indicate whether it is
referring to release or devel?

I'm totally happy with the feed behavior otherwise.

On Wed, Dec 3, 2014 at 3:07 PM, Dan Tenenbaum dtene...@fredhutch.org
wrote:



 - Original Message -
  From: Dan Tenenbaum dtene...@fredhutch.org
  To: Kevin Rue-Albrecht kevin@ucdconnect.ie
  Cc: bioc-devel@r-project.org
  Sent: Wednesday, December 3, 2014 7:49:10 AM
  Subject: Re: [Bioc-devel] RSS package feeds not updated
 
 
 
  - Original Message -
   From: Kevin Rue-Albrecht kevin@ucdconnect.ie
   To: Stephanie M. Gogarten sdmor...@u.washington.edu
   Cc: bioc-devel@r-project.org, Dan Tenenbaum
   dtene...@fredhutch.org
   Sent: Wednesday, December 3, 2014 3:09:58 AM
   Subject: Re: [Bioc-devel] RSS package feeds not updated
  
  
  
  
   Hi Stephanie,
  
   I am using rss2email (
   http://www.allthingsrss.com/rss2email/getting-started-with-rss2email/
   ),
 
  (incidentally written by the late Aaron Swartz)
 
 
   a script that I installed to track my package feed, and that is
   run by my Windows task scheduler every day, to send me the RSS news
   in an email (not as an RSS feed per se)
  
  
   Now, just like you I get the good and the bad news following the
   nightly build. But since I get them as emails, I just set up a
   filter to mark good emails as read and I move them directly in a
   separate email folder (you could also delete them).
  
   This way, only the bad news show up as unread messages which catch
   my
   attention.
  
  
   Took a bit of time to set up, but then it ran seamlessly ever
   since.
  
   No extra work for Dan, and developers who might want to log the
   successful builds can do so.
 
 
  I don't consider it 'extra work' to restore a feature that was there
  and used to work. I do think 'no news is good news' is a good
  feature (I think it's also good to be notified when the build has
  returned to normal, so the system should only notify when build
  status changes) but it needs to work properly, so I'll try and
  figure out why it wasn't working and restore it. Then I'll let you
  know so you can remove your filter if you want.
 

 OK, I have overhauled the logic that decides when to update the rss feeds.
 So we should be back to no news is good news (it may take a cycle or two
 to actually get there). The logic is a bit different than it was before. I
 think before you would only get an everything is fine notification if the
 rss file did not exist. But now you get notifications every time the build
 status changes, but not when it was the same as yesterday. So if your
 package has an error, you'll get one notification, and when the problem is
 fixed you will get another notification saying everything is ok with the
 package. This is in line with continuous integration systems such as
 Jenkins which notify you on error and notify you again when the build has
 returned to normal.
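
(For readers skimming the archive: the logic described above amounts to keeping
yesterday's status per package and emitting a feed item only when it changes.
A toy sketch of that idea in R, purely illustrative and not the actual
build-system code:)

## toy sketch, not the real BBS/RSS code
notify_needed <- function(pkg, today_status,
                          store = file.path(tempdir(), paste0(pkg, ".rds"))) {
  yesterday <- if (file.exists(store)) readRDS(store) else NA_character_
  saveRDS(today_status, store)
  !identical(today_status, yesterday)   # TRUE when a problem appears *or* clears
}

notify_needed("DESeq2", "ERROR")   # TRUE  (status changed from unknown)
notify_needed("DESeq2", "ERROR")   # FALSE (same as yesterday, stay quiet)
notify_needed("DESeq2", "OK")      # TRUE  (back to normal, send the all-clear)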

 Thanks to all for the feedback, it's helpful to know whether and how
 people are using this feature. Let me know if you discover further issues.

 Dan



  Dan
 
 
  
  
   Hope that helps.
  
   Kevin
  
  
  
  
  
   On 3 December 2014 at 00:19, Stephanie M. Gogarten 
   sdmor...@u.washington.edu  wrote:
  
  
  
  
  
  
   On 12/2/14 8:17 AM, Dan Tenenbaum wrote:
  
  
  
  
   - Original Message -
  
  
   From: Julian Gehring  julian.gehr...@embl.de 
   To: bioc-devel@r-project.org
   Cc: Felix Klein  fkl...@embl.de 
   Sent: Tuesday, December 2, 2014 2:19:01 AM
   Subject: [Bioc-devel] RSS package feeds not updated
  
   Hi,
  
   some/many of the package build RSS feeds [1] don't receive updates.
   For
   example,
  
   http://bioconductor.org/rss/ build/packages/DESeq2.rss (from
   2014-11-04)
  
   http://bioconductor.org/rss/ build/packages/BiocInstaller. rss
   (from
   2014-10-15)
  
   report package builds as broken, although the current builds are
   fine.
  
   In contrast,
  
   http://bioconductor.org/rss/ build/packages/GenomicRanges. rss
  
   was recently updated (2014-12-01) and also contains the build
   report
   of
   both the devel and release branch (the broken ones listed above
   don't).
  
   Most of the packages that I checked manually seem to be
   outdated/broken
   - I guess there is a issue with the underlying feed system.
  
  
   Thanks for the report. I've had a look. I've changed the system so
   it
   doesn't try to be smart about not updating feeds for packages that
   do not have issues.
   This means all package feeds will be updated daily, whether there
   is
   an issue or not. Not sure if this will be annoying to feed users;
   we
   can revisit it if it's an issue.
  
   I have to say, I liked the old system better. I have a feed for
   each
   of my packages in the bottom of my email program, so I could see at
   a glance if there was a problem with one of the packages 

Re: [Bioc-devel] request for more notices about build issues

2015-04-14 Thread Michael Love
works for me. Thanks Dan.

On Tue, Apr 14, 2015 at 2:34 PM, Dan Tenenbaum dtene...@fredhutch.org wrote:


 - Original Message -
 From: Michael Love michaelisaiahl...@gmail.com
 To: bioc-devel@r-project.org
 Sent: Wednesday, April 8, 2015 11:03:02 AM
 Subject: [Bioc-devel] request for more notices about build issues

 dear core Bioc-ers,

 It would be nice to have more devel mailing list notices about large
 scale build issues, just to keep developers in the loop. (e.g. I
 often
 get a number of emails when GenomicRanges is down, asking why I'm
 breaking someone's downstream package).

 Or if you don't want to push these out to the list, to have a URL
 where core team could post information about what's going on in devel
 and rough timelines about upcoming changes. Then developers can
 pull
 this information when they need to.

 thanks for all your work! There are lots of exciting things in the
 next release, and I know that it's a lot of work to get to the
 finish.


 If it's ok with everyone, I'll address this by just sending email to 
 bioc-devel when there are
 large-scale build issues. If you see an issue and I don't address it, please 
 email the list.

 Dan


 best,

 Mike

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] request for more notices about build issues

2015-04-08 Thread Michael Love
dear core Bioc-ers,

It would be nice to have more devel mailing list notices about large
scale build issues, just to keep developers in the loop. (e.g. I often
get a number of emails when GenomicRanges is down, asking why I'm
breaking someone's downstream package).

Or if you don't want to push these out to the list, to have a URL
where core team could post information about what's going on in devel
and rough timelines about upcoming changes. Then developers can pull
this information when they need to.

thanks for all your work! There are lots of exciting things in the
next release, and I know that it's a lot of work to get to the
finish.

best,

Mike

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] recalling methods

2015-04-06 Thread Michael Love
I just noticed in the newsletter [1] that Martin fixed this in R-devel / R
3.2

 methods(class="GenomicRanges")
  [1] !=      [       [<-     %in%
   
...

thanks Martin!

[1]
http://www.bioconductor.org/help/newsletters/2015_April/#new-and-noteworthy

On Sat, Dec 6, 2014 at 5:32 PM, Michael Lawrence lawrence.mich...@gene.com
wrote:



 On Sat, Dec 6, 2014 at 9:23 AM, Wolfgang Huber whu...@embl.de wrote:


 Also some interest on our side to contribute.
 Perhaps in particular the rendering a useful index (or graph) of man
 pages on the fly in HTML / graphically.


 Great, that's the sort of thing I had in mind.


 Is it too ambitious to “learn” which methods are most important for
 objects of a particular class from analysing (running) a large code base
 (or even injecting a hook to that effect into a user’s R)?


 It's a good idea and one that Eclipse and other IDEs use for
 auto-completion. We'd just have to find the right codebase, i.e., something
 with a lot of end-user analysis code, instead of infrastructure.

 Wolfgang






  On Dec 6, 2014, at 1:19 GMT+1, Michael Love 
 michaelisaiahl...@gmail.com wrote:
 
  nice. I will play around with this. thanks Gabe!
 
  On Fri, Dec 5, 2014 at 6:37 PM, Gabe Becker becker.g...@gene.com
 wrote:
  Hey guys,
 
  Surgically removed from promptClass:
 
    classInSig <- function(g, where, cl) {
  cl %in% unique(unlist(findMethods(g, where)@signatures))
  }
  genWithClass <- function(cl, where) {
  allgen <- getGenerics(where = where)
  ok <- as.logical(unlist(lapply(allgen, classInSig, cl = cl,
  where = where)))
  allgen[ok]
  }
 
  genWithClass("IRanges", find(classMetaName("IRanges")))
   [1] "c"            "coerce"       "end<-"        "gaps"         "intersect"
   [6] "isNormal"     "names<-"      "names"        "pgap"         "pintersect"
  [11] "psetdiff"     "punion"       "reduce"       "reverse"      "setdiff"
  [16] "start<-"      "start"        "threebands"   "union"        "updateObject"
  [21] "update"       "width<-"      "width"
 
 
  For semantic guessing of which ones will be useful, I've got nothing
 (for
  now).
 
  ~G
 
  On Fri, Dec 5, 2014 at 11:28 AM, Michael Lawrence
  lawrence.mich...@gene.com wrote:
 
  Cool. I see hypertext as being useful here, because the generics and
  classes form an intricate and sometimes ambiguous web, especially when
  multiple inheritance and dispatch are involved. I think we should
 first
  build better tooling for introspecting S4 and for graph-based
 modeling and
  analysis of S4 architecture. For example, could we statically detect
  whether a dispatch ambiguity exists, knowing all of the methods and
  classes? And based on that build one or more end-user UIs?
 
 
 
  On Fri, Dec 5, 2014 at 11:05 AM, Michael Love
  michaelisaiahl...@gmail.com
  wrote:
 
  On Thu, Dec 4, 2014 at 4:01 PM, Michael Lawrence
  lawrence.mich...@gene.com wrote:
 
  I think this gets at the heart of at least one of the usability
 issues
  in Bioconductor: interface discoverability. Many simpler command line
  tools
  have a single-faceted interface for which it is easy to enumerate a
 list
  of
  features. There's definitely room for better ways to interrogate our
  object-oriented APIs, but it's challenging. Essentially we need a way
  for
  the user to ask what can I do with this object?. Yes, we need
 better
  introspection utilities, but we also need to integrate the query with
  documentation. In other words, we need a more dynamic, more fluid
 help
  system, oriented around S4.
 
 
  I would be interested in working on this. A minimal goal for me is a
  function that just returns a character vector of the names of the
  generics defined for the object. Filtering that down to give methods
  which are likely relevant using the documentation will definitely
 be
  a bigger challenge.
 
 
 
 
 
  On Thu, Dec 4, 2014 at 9:56 AM, Michael Love 
  michaelisaiahl...@gmail.com wrote:
 
  I was thinking about a request from someone at Bioc2014 (I can't
  remember at the moment)
 
  As an end-user, if I have an object x, how can I *quickly* recall
 the
  main methods for that? As in, without breaking my flow and going to
  ?myClass or help(myClass-class). Suppose x is a GRanges, how can
 I
  remember that there is a method called narrow() which works on x?
 
  showMethods(classes=class(x)) will print out a huge list for many
  complex Bioc classes. And printTo=FALSE turns this huge list into
 an
  unhelpful character vector, e.g.:
 
  head(showMethods(classes="GRanges", printTo=FALSE), 8)
   [1] "Function \".asSpace\":"
   [3] "<not an S4 generic function>"
   [5] "Function \".linkToCachedObject<-\":"  "<not an S4 generic function>"
   [7] "Function \".replaceSEW\":"
 
  any ideas?
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 
 
 
[[alternative HTML version deleted]]

[Bioc-devel] SummarizedExperiment, differing assay dimnames

2015-04-03 Thread Michael Love
hi Martin,

I noticed with GenomicRanges 1.19.51, a new, or newly functional,
check on dimnames differing between assays.

It might be convenient to add an option that assays() will take an
incoming matrix without error no matter what, even if the dimnames on
the incoming matrix are not the same as those of the
SummarizedExperiment.

Currently, it appears if there are no dimnames on the incoming matrix,
withDimnames=FALSE allows assignment (without this arg we get an
error), but with mismatched dimnames, withDimnames=FALSE gives an
error.

Also, the help file should probably be updated, as it seems the
argument is not actually ignored for the setter. Unless this
functionality above was unintended?

withDimnames: A ‘logical(1)’, indicating whether dimnames should be
  applied to extracted assay elements (this argument is ignored
  for the setter ‘assays<-’). Setting ‘withDimnames=FALSE’
  increases the speed and memory efficiency with which assays
  are extracted.

best,

Mike

~


 (se <- SummarizedExperiment(matrix(1:6, ncol=2, dimnames=list(1:3,1:2)), 
 colData=DataFrame(x=1:2)))
class: SummarizedExperiment
dim: 3 2
exptData(0):
assays(1): ''
rownames(3): 1 2 3
rowRanges metadata column names(0):
colnames: NULL
colData names(1): x

 assays(se)[["test"]] <- matrix(7:12, ncol=2)
Error in `assays<-`(`*tmp*`, value = <S4 object of class "SimpleList">) :
  'dimnames' differ between assay elements
Calls: assays<- -> assays<-

# error because the incoming matrix doesn't have the same dimnames

 assays(se, withDimnames=FALSE)[["test"]] <- matrix(7:12, ncol=2)

# no error, as expected because withDimnames=FALSE

 assays(se)[["test"]] <- matrix(7:12, ncol=2, dimnames=list(4:6,3:4))
Error in `assays<-`(`*tmp*`, value = <S4 object of class "SimpleList">) :
  'dimnames' differ between assay elements
Calls: assays<- -> assays<-

# again, mismatched dimnames, so error

 assays(se, withDimnames=FALSE)[["test"]] <- matrix(7:12, ncol=2, 
 dimnames=list(4:6,3:4))
Error in `assays<-`(`*tmp*`, withDimnames = FALSE, value = <S4 object
of class "SimpleList">) :
  'dimnames' differ between assay elements
Calls: assays<- -> assays<-

# error despite asking withDimnames=FALSE
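
A workaround in the meantime (sketch only) is to drop or align the dimnames on
the incoming matrix before the assignment, which reduces it to the case above
that already works:

m2 <- matrix(7:12, ncol=2, dimnames=list(4:6, 3:4))
dimnames(m2) <- NULL                                # drop the clashing dimnames
assays(se, withDimnames=FALSE)[["test"]] <- m2      # no error, as in the case above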

 sessionInfo()
R Under development (unstable) (2015-01-19 r67547)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices datasets  utils
   methods
[9] base

other attached packages:
[1] GenomicRanges_1.19.50 GenomeInfoDb_1.3.16   IRanges_2.1.43
[4] S4Vectors_0.5.22  BiocGenerics_0.13.10  RUnit_0.4.28
[7] devtools_1.7.0knitr_1.9 BiocInstaller_1.17.7

loaded via a namespace (and not attached):
[1] evaluate_0.5.5 formatR_1.0stringr_0.6.2  tools_3.2.0XVector_0.7.4

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] SummarizedExperiment, differing assay dimnames

2015-04-03 Thread Michael Love
On Fri, Apr 3, 2015 at 7:32 AM, Michael Love
michaelisaiahl...@gmail.com wrote:
 hi Martin,

 I noticed with GenomicRanges 1.19.51, a new, or newly functional,
 check on dimnames differing between assays.

ah, I'm still using 1.19.50, maybe that's the issue. But I'd still
like to hear what the new behavior will be so I can build without
error :)



 It might be convenient to add an option that assays() will take an
 incoming matrix without error no matter what, even if the dimnames on
 the incoming matrix are not the same as those of the
 SummarizedExperiment.

 Currently, it appears if there are no dimnames on the incoming matrix,
 withDimnames=FALSE allows assignment (without this arg we get an
 error), but with mismatched dimnames, withDimnames=FALSE gives an
 error.

 Also, the help file should probably be updated, as it seems the
 argument is not actually ignored for the setter. Unless this
 functionality above was unintended?

 withDimnames: A ‘logical(1)’, indicating whether dimnames should be
   applied to extracted assay elements (this argument is ignored
   for the setter ‘assays<-’). Setting ‘withDimnames=FALSE’
   increases the speed and memory efficiency with which assays
   are extracted.

 best,

 Mike

 ~


 (se <- SummarizedExperiment(matrix(1:6, ncol=2, dimnames=list(1:3,1:2)), 
 colData=DataFrame(x=1:2)))
 class: SummarizedExperiment
 dim: 3 2
 exptData(0):
 assays(1): ''
 rownames(3): 1 2 3
 rowRanges metadata column names(0):
 colnames: NULL
 colData names(1): x

 assays(se)[["test"]] <- matrix(7:12, ncol=2)
 Error in `assays<-`(`*tmp*`, value = <S4 object of class "SimpleList">) :
   'dimnames' differ between assay elements
 Calls: assays<- -> assays<-

 # error because the incoming matrix doesn't have the same dimnames

 assays(se, withDimnames=FALSE)[["test"]] <- matrix(7:12, ncol=2)

 # no error, as expected because withDimnames=FALSE

 assays(se)[["test"]] <- matrix(7:12, ncol=2, dimnames=list(4:6,3:4))
 Error in `assays<-`(`*tmp*`, value = <S4 object of class "SimpleList">) :
   'dimnames' differ between assay elements
 Calls: assays<- -> assays<-

 # again, mismatched dimnames, so error

 assays(se, withDimnames=FALSE)[["test"]] <- matrix(7:12, ncol=2, 
 dimnames=list(4:6,3:4))
 Error in `assays<-`(`*tmp*`, withDimnames = FALSE, value = <S4 object
 of class "SimpleList">) :
   'dimnames' differ between assay elements
 Calls: assays<- -> assays<-

 # error despite asking withDimnames=FALSE

 sessionInfo()
 R Under development (unstable) (2015-01-19 r67547)
 Platform: x86_64-unknown-linux-gnu (64-bit)
 Running under: Ubuntu 14.04.2 LTS

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 LC_TIME=en_US.UTF-8
  [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
 LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 LC_ADDRESS=C
 [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats4parallel  stats graphics  grDevices datasets  utils
methods
 [9] base

 other attached packages:
 [1] GenomicRanges_1.19.50 GenomeInfoDb_1.3.16   IRanges_2.1.43
 [4] S4Vectors_0.5.22  BiocGenerics_0.13.10  RUnit_0.4.28
 [7] devtools_1.7.0knitr_1.9 BiocInstaller_1.17.7

 loaded via a namespace (and not attached):
 [1] evaluate_0.5.5 formatR_1.0stringr_0.6.2  tools_3.2.0XVector_0.7.4

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Build error for DESeq2 dev causing error in other package builds

2015-04-03 Thread Michael Love
hi Andrea,

I sent out an email to bioc-devel at the same time as you :)

https://stat.ethz.ch/pipermail/bioc-devel/2015-April/007269.html

I think this might be fixed already, but I didn't have time to try out
GenomicRanges 1.19.51 yet.

I will try soon, and if this doesn't fix it, I have a hack to ensure
dimnames are consistent for every set. So either way everything should
be resolved today.

best

Mike


On Fri, Apr 3, 2015 at 7:37 AM, Andrea Rau andrea@jouy.inra.fr wrote:
 Hello,

 I am the maintainer of a Bioconductor package called 'HTSFilter' that
 imports DESeq2. On today's build report, I see that my package (as well as
 DESeq2 and all of the other packages that import it) is showing an error
 message which seems to arise from a recent change in the DESeq2 code dealing
 with the dimnames of a DESeqDataSet object. The error message in my package
 reads:

 Error in `assays<-`(`*tmp*`, value = <S4 object of class
 structure("SimpleList", package = "S4Vectors")>) :
   'dimnames' differ between assay elements

 and seems to occur when calling the DESeq function (right after the
 gene-wise dispersion estimates message is printed). I know that today is
 the deadline for fixing errors prior to release, so the time-frame is
 somewhat short to fix errors. To ensure that my package isn't withheld from
 release, should I retrograde to the last stable version of DESeq2 rather
 than using dev?

 Thanks for your help,
 Andrea

 Here is some code + sessionInfo that illustrate the error:

 library(HTSFilter)
 library(DESeq2)
 library(Biobase)
 data(sultan)
 conds <- pData(sultan)$cell.line
 dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
 +colData = data.frame(cell.line = conds),
 +design = ~ cell.line)
 dds <- DESeq(dds)
 estimating size factors
 estimating dispersions
 gene-wise dispersion estimates
 Error in `assays<-`(`*tmp*`, value = <S4 object of class "SimpleList">) :
   'dimnames' differ between assay elements
 sessionInfo()
 R version 3.2.0 beta (2015-04-01 r68134)
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 Running under: Windows 7 x64 (build 7601) Service Pack 1

 locale:
 [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
 LC_MONETARY=French_France.1252
 [4] LC_NUMERIC=C   LC_TIME=French_France.1252

 attached base packages:
 [1] stats4parallel  stats graphics  grDevices utils datasets
 methods   base

 other attached packages:
  [1] DESeq2_1.7.46 RcppArmadillo_0.4.650.1.1 Rcpp_0.11.5
 GenomicRanges_1.19.50
  [5] GenomeInfoDb_1.3.16   IRanges_2.1.43S4Vectors_0.5.22
 HTSFilter_1.7.0
  [9] Biobase_2.27.3BiocGenerics_0.13.10

 loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-2futile.logger_1.4 plyr_1.8.1
 XVector_0.7.4 tools_3.2.0
  [6] futile.options_1.0.0  rpart_4.1-9   digest_0.6.8
 annotate_1.45.4   RSQLite_1.0.0
 [11] gtable_0.1.2  lattice_0.20-31   DBI_0.3.1
 proto_0.3-10  DESeq_1.19.0
 [16] cluster_2.0.1 genefilter_1.49.2 stringr_0.6.2
 locfit_1.5-9.1nnet_7.3-9
 [21] grid_3.2.0AnnotationDbi_1.29.20 XML_3.98-1.1
 survival_2.38-1   BiocParallel_1.1.21
 [26] foreign_0.8-61latticeExtra_0.6-26   Formula_1.2-0
 limma_3.23.11 geneplotter_1.45.0
 [31] ggplot2_1.0.1 reshape2_1.4.1lambda.r_1.1.7
 edgeR_3.9.14  Hmisc_3.15-0
 [36] MASS_7.3-40   scales_0.2.4  splines_3.2.0
 colorspace_1.2-6  xtable_1.7-4
 [41] acepack_1.3-3.3   munsell_0.4.2




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-04-01 Thread Michael Love
Yes, you're right! Sorry for the noise. I forgot this was how it
always behaved. All I had to do was change the argument name.

On Wed, Apr 1, 2015 at 3:51 PM, Hervé Pagès hpa...@fredhutch.org wrote:
 Hi Michael,

 On 04/01/2015 07:17 AM, Michael Love wrote:

 I'll retract those last two emails about empty GRanges. That's simply:

 se <- SummarizedExperiment(assays, colData=colData)
 mcols(se) <- myDataFrame
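
Spelled out with toy objects (a sketch only; 'assays', 'colData', and
'myDataFrame' here are stand-ins, and the SummarizedExperiment class is assumed
to come from the GenomicRanges devel package of that time):

library(GenomicRanges)
assays  <- SimpleList(counts = matrix(1:20, ncol = 2))
colData <- DataFrame(condition = c("A", "B"))
se <- SummarizedExperiment(assays, colData = colData)
mcols(se) <- DataFrame(symbol = letters[1:10])   # per-row annotation, no GRanges needed
mcols(se)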


 Glad you found a simple way to do what you wanted.

 More below...


 On Tue, Mar 31, 2015 at 4:40 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:

 Would this code inspired by the release version of GenomicRanges work?
 e.g. if I want to add a DataFrame with 10 rows:

 names <- letters[1:10]
 x <- relist(GRanges(), PartitioningByEnd(integer(10), names=names))
 mcols(x) <- DataFrame(foo=1:10)

 Then give x to the rowRanges argument of SummarizedExperiment?

 On Tue, Mar 31, 2015 at 3:49 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:

 I forgot to ask my other question. I had gone in early March and fixed
 my code to eliminate rowData<-, but the argument to SummarizedExperiment
 was still called rowData, and a DataFrame could be provided. Then I
 didn't check for a few weeks, but the argument for the rowData slot is
 now called rowRanges. What's the trick to putting a DataFrame on an
 empty GRanges, so I can get the old behavior but now using the rowRanges
 argument?


 I'm not sure what you meant by so I can get the old behavior but
 now using the rowRanges argument.

 Just to clarify: the renaming of rowData to rowRanges is a change
 of name only, not a change of behavior. More precisely the new
 rowRanges() accessor should behave exactly as the old rowData()
 accessor. The same applies to the 'rowRanges' argument of the
 SummarizedExperiment() constructor. So whatever you were passing
 before to the 'rowData' argument, you should still be able to pass
 it to the new 'rowRanges' argument. Please let us know if it's not
 the case as this is certainly not intended.

 Thanks,
 H.



 On Tue, Mar 31, 2015 at 3:40 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:

 With GenomicRanges 1.19.48, I'm still having issues with re-naming the
 first assay and duplication of memory from my March 9 email. I tried
 assayNames- as well. My use case is if I am given a
 SummarizedExperiment where the first element is not named counts
 (albeit the SE is most likely coming from summarizeOverlaps() and
 already named counts...).

 sessionInfo()

 R Under development (unstable) (2015-03-31 r68129)
 Platform: x86_64-apple-darwin12.5.0 (64-bit)
 Running under: OS X 10.8.5 (Mountain Lion)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats4parallel  stats graphics  grDevices datasets  utils
 methods   base

 other attached packages:
 [1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16   IRanges_2.1.43
 S4Vectors_0.5.22
 [5] BiocGenerics_0.13.10  testthat_0.9.1devtools_1.7.0
 knitr_1.9
 [9] BiocInstaller_1.17.6

 loaded via a namespace (and not attached):
 [1] formatR_1.1XVector_0.7.4  tools_3.3.0stringr_0.6.2
 evaluate_0.5.5

 On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:



 On Mar 9, 2015 12:36 PM, Martin Morgan mtmor...@fredhutch.org
 wrote:


 On 03/09/2015 08:07 AM, Michael Love wrote:


 Some guidance on how to avoid duplication of the matrix for
 developers
 would be greatly appreciated.



 It's unsatisfactory, but using withDimnames=FALSE avoids duplication
 on extraction of assays (but obviously you don't have dimnames on the
 matrix). Row or column subsetting necessarily causes the subsetted assay
 data to be duplicated. There should not be any duplication when 
 rowRanges()
 or colData() are changed without changing their dimension / ordering.


 Thanks Martin for checking into the regression.

 Sorry, I should have been more specific earlier, I meant more
 guidance/documentation in the man page for SE. I scanned the 'Extension'
 section but didn't find a note on withDimnames for extracting the matrix 
 or
 this example of renaming the assays (it seems like this could easily be
 relevant for other package authors).

 A prominent note there might help devs write more memory efficient
 packages.

 The argument section mentions speed but I'd explicitly mention memory
 given that we're often storing big matrices:

 Setting withDimnames=FALSE  increases the speed with which assays are
 extracted.

 (its entirely possible the info is there but i missed it)

 Best,

 Mike


 Another example of a trouble point, is that if I am given an SE with
 an unnamed assay and I need to give the assay a name, this also can
 expand the memory used. I had found a solution (which works with
 GenomicRanges 1.18 / current release) with:

  names(assays(se, withDimnames=FALSE))[1] <- "foo"

 But now I'm looking in devel and this appears to no longer work. The
 memory used expands, equivalent to:

 names(assays(se

Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-31 Thread Michael Love
With GenomicRanges 1.19.48, I'm still having issues with re-naming the
first assay and duplication of memory from my March 9 email. I tried
assayNames<- as well. My use case is if I am given a
SummarizedExperiment where the first element is not named counts
(albeit the SE is most likely coming from summarizeOverlaps() and
already named counts...).
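
For anyone wanting to check this on their end, a rough way to see whether the
rename duplicates the assay matrix (a sketch only; gc() totals are approximate
and the numbers will vary):

library(GenomicRanges)   # the devel SummarizedExperiment class lived here at the time

mb_used <- function() sum(gc()[, 2])     # rough total Mb currently in use

m  <- matrix(1:1e7, ncol = 10, dimnames = list(1:1e6, 1:10))
se <- SummarizedExperiment(m)

before <- mb_used()
names(assays(se, withDimnames = FALSE))[1] <- "counts"
mb_used() - before    # ~0 Mb with the 1.18 release series; grows with the devel versions discussed here

before <- mb_used()
names(assays(se))[1] <- "counts"
mb_used() - before    # roughly the size of 'm' (~40 Mb) when the matrix gets copied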

 sessionInfo()
R Under development (unstable) (2015-03-31 r68129)
Platform: x86_64-apple-darwin12.5.0 (64-bit)
Running under: OS X 10.8.5 (Mountain Lion)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4parallel  stats graphics  grDevices datasets  utils
   methods   base

other attached packages:
[1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16   IRanges_2.1.43
S4Vectors_0.5.22
[5] BiocGenerics_0.13.10  testthat_0.9.1devtools_1.7.0knitr_1.9
[9] BiocInstaller_1.17.6

loaded via a namespace (and not attached):
[1] formatR_1.1XVector_0.7.4  tools_3.3.0stringr_0.6.2  evaluate_0.5.5

On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
michaelisaiahl...@gmail.com wrote:


 On Mar 9, 2015 12:36 PM, Martin Morgan mtmor...@fredhutch.org wrote:
 
  On 03/09/2015 08:07 AM, Michael Love wrote:
 
  Some guidance on how to avoid duplication of the matrix for developers
  would be greatly appreciated.
 
 
  It's unsatisfactory, but using withDimnames=FALSE avoids duplication on 
  extraction of assays (but obviously you don't have dimnames on the matrix). 
  Row or column subsetting necessarily causes the subsetted assay data to be 
  duplicated. There should not be any duplication when rowRanges() or 
  colData() are changed without changing their dimension / ordering.
 

 Thanks Martin for checking into the regression.

 Sorry, I should have been more specific earlier, I meant more 
 guidance/documentation in the man page for SE. I scanned the 'Extension' 
 section but didn't find a note on withDimnames for extracting the matrix or 
 this example of renaming the assays (it seems like this could easily be 
 relevant for other package authors).

 A prominent note there might help devs write more memory efficient packages.

 The argument section mentions speed but I'd explicitly mention memory given 
 that we're often storing big matrices:

 Setting withDimnames=FALSE  increases the speed with which assays are 
 extracted.

 (its entirely possible the info is there but i missed it)

 Best,

 Mike

 
  Another example of a trouble point, is that if I am given an SE with
  an unnamed assay and I need to give the assay a name, this also can
  expand the memory used. I had found a solution (which works with
  GenomicRanges 1.18 / current release) with:
 
   names(assays(se, withDimnames=FALSE))[1] <- "foo"
 
  But now I'm looking in devel and this appears to no longer work. The
  memory used expands, equivalent to:
 
   names(assays(se))[1] <- "foo"
 
  Here's some code to try this:
 
  m <- matrix(1:1e7,ncol=10,dimnames=list(1:1e6,1:10))
  se <- SummarizedExperiment(m)
  names(assays(se, withDimnames=FALSE))[1] <- "foo"
  names(assays(se))[1] <- "foo"
 
  while running gc() in between steps.
 
 
  I think this is a regression of some sort, and I'll look into it. Thanks 
  for the heads-up.
 
  Martin
 
 
 
 
  On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
  kasperdanielhan...@gmail.com wrote:
 
  On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey 
  st...@channing.harvard.edu
  wrote:
 
  I am glad you are keeping this discussion alive Kasper.
 
  On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen 
  kasperdanielhan...@gmail.com wrote:
 
  It sounds like the proposed changes are already made.  However (like
  others) I am still a bit mystified why this was necessary.  The old
  version
  did allow for a GRanges inside the DataFrame of the rowData, as far as I
  recall.  So I assume this is for efficiency.  But why?  What kind of
  data/use cases is this for?
 
  I am happy to hear that SummarizedExperiment is going to be spun out 
  into
  its own package.  When that happens, I have some comments, which I'll
  include here in anticipation
 1) I now very strongly believe it was a design mistake to not have
  colnames on the assays.  The advantage of this choice is that 
  sampleNames
  are only stored one place.  The extreme disadvantage is the high
  ineffeciency when you want colnames on an extracted assay.
 
 
  after example(SummarizedExperiment)
 
  colnames(assays(se1)[[1]])
 
  [1] A B C D E F
 
  so this seems to be optional.  But attempts to set rownames will fail
  silently
 
  rownames(assays(se1)[[1]]) = as.character(1:200)
 
 
  rownames(assays(se1)[[1]])
 
 
  NULL
  seems we could issue a warning there
 
 
 
  Vince, you need to be careful here.
 
  The assays are stored without colnames (unless something has recently
  changed).  The default is to - upon extraction - set the colnames of the
  matrix.  This however requires a copy of the entire matrix.  So
  essentially, upon extraction

Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-31 Thread Michael Love
Would this code inspired by the release version of GenomicRanges work?
e.g. if I want to add a DataFrame with 10 rows:

names <- letters[1:10]
x <- relist(GRanges(), PartitioningByEnd(integer(10), names=names))
mcols(x) <- DataFrame(foo=1:10)

Then give x to the rowRanges argument of SummarizedExperiment?
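
Something like this, for the record (an untested sketch; the 10x3 assay below is made up):

library(GenomicRanges)
names <- letters[1:10]
rr <- relist(GRanges(), PartitioningByEnd(integer(10), names=names))
mcols(rr) <- DataFrame(foo=1:10)
m <- matrix(rnorm(30), nrow=10, dimnames=list(names, NULL))
se <- SummarizedExperiment(assays=list(counts=m), rowRanges=rr)
rowRanges(se)  # the empty GRangesList, carrying 'foo' in its mcols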

On Tue, Mar 31, 2015 at 3:49 PM, Michael Love
michaelisaiahl...@gmail.com wrote:
 I forgot to ask my other question. I had gone in early March and fixed
 my code to eliminate rowData<-, but the argument to SummarizedExperiment
 was still called rowData, and a DataFrame could be provided. Then I
 didn't check for a few weeks, but the argument for the rowData slot is
 now called rowRanges. What's the trick to putting a DataFrame on an
 empty GRanges, so I can get the old behavior but now using the rowRanges
 argument?

 On Tue, Mar 31, 2015 at 3:40 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:
 With GenomicRanges 1.19.48, I'm still having issues with re-naming the
 first assay and duplication of memory from my March 9 email. I tried
 assayNames<- as well. My use case is if I am given a
 SummarizedExperiment where the first element is not named "counts"
 (albeit the SE is most likely coming from summarizeOverlaps() and
 already named "counts"...).

 sessionInfo()
 R Under development (unstable) (2015-03-31 r68129)
 Platform: x86_64-apple-darwin12.5.0 (64-bit)
 Running under: OS X 10.8.5 (Mountain Lion)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices datasets  utils
 [8] methods   base

 other attached packages:
 [1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16   IRanges_2.1.43
 S4Vectors_0.5.22
 [5] BiocGenerics_0.13.10  testthat_0.9.1devtools_1.7.0
 knitr_1.9
 [9] BiocInstaller_1.17.6

 loaded via a namespace (and not attached):
 [1] formatR_1.1    XVector_0.7.4  tools_3.3.0    stringr_0.6.2  evaluate_0.5.5

 On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:


 On Mar 9, 2015 12:36 PM, Martin Morgan mtmor...@fredhutch.org wrote:
 
  On 03/09/2015 08:07 AM, Michael Love wrote:
 
  Some guidance on how to avoid duplication of the matrix for developers
  would be greatly appreciated.
 
 
  It's unsatisfactory, but using withDimnames=FALSE avoids duplication on 
  extraction of assays (but obviously you don't have dimnames on the 
  matrix). Row or column subsetting necessarily causes the subsetted assay 
  data to be duplicated. There should not be any duplication when 
  rowRanges() or colData() are changed without changing their dimension / 
  ordering.
 

 Thanks Martin for checking into the regression.

 Sorry, I should have been more specific earlier, I meant more 
 guidance/documentation in the man page for SE. I scanned the 'Extension' 
 section but didn't find a note on withDimnames for extracting the matrix or 
 this example of renaming the assays (it seems like this could easily be 
 relevant for other package authors).

 A prominent note there might help devs write more memory efficient packages.

 The argument section mentions speed but I'd explicitly mention memory given 
 that we're often storing big matrices:

 Setting withDimnames=FALSE  increases the speed with which assays are 
 extracted.

 (it's entirely possible the info is there but I missed it)

 Best,

 Mike

 
  Another example of a trouble point, is that if I am given an SE with
  an unnamed assay and I need to give the assay a name, this also can
  expand the memory used. I had found a solution (which works with
  GenomicRanges 1.18 / current release) with:
 
  names(assays(se, withDimnames=FALSE))[1] <- "foo"
 
  But now I'm looking in devel and this appears to no longer work. The
  memory used expands, equivalent to:
 
  names(assays(se))[1] <- "foo"
 
  Here's some code to try this:
 
  m <- matrix(1:1e7, ncol=10, dimnames=list(1:1e6, 1:10))
  se <- SummarizedExperiment(m)
  names(assays(se, withDimnames=FALSE))[1] <- "foo"
  names(assays(se))[1] <- "foo"
 
  while running gc() in between steps.
 
 
  I think this is a regression of some sort, and I'll look into it. Thanks 
  for the heads-up.
 
  Martin
 
 
 
 
  On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
  kasperdanielhan...@gmail.com wrote:
 
  On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey 
  st...@channing.harvard.edu
  wrote:
 
  I am glad you are keeping this discussion alive Kasper.
 
  On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen 
  kasperdanielhan...@gmail.com wrote:
 
  It sounds like the proposed changes are already made.  However (like
  others) I am still a bit mystified why this was necessary.  The old
  version
  did allow for a GRanges inside the DataFrame of the rowData, as far 
  as I
  recall.  So I assume this is for efficiency.  But why?  What kind of
  data/use cases is this for?
 
  I am happy to hear that SummarizedExperiment is going to be spun out 
  into
  its own package.  When that happens, I have

[Bioc-devel] testthat instructions for website

2015-03-13 Thread Michael Love
hi Bioc team,

Can we add some testthat instructions to the unit testing page:

http://bioconductor.org/developers/how-to/unitTesting-guidelines/

I presume the instructions would be simply to follow the basic
testthat instructions, e.g.:

https://github.com/hadley/testthat/blob/master/README.md
http://r-pkgs.had.co.nz/tests.html

...add testthat to Suggests, put tests in files starting with 'test'
inside tests/testthat/ and write a file tests/testthat.R with specific
code in it. I think this is what the packages here are all doing:

http://bioconductor.org/help/search/index.html?q=testthat
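
(For reference, the tests/testthat.R file is typically just these three lines; "mypackage" below is a placeholder name:)

library(testthat)
library(mypackage)
test_check("mypackage")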

best,

Mike

 end of request, details continue below 

My reason for wanting Bioconductor to support testthat in addition
to RUnit is that RUnit does not make it very simple to run all tests,
which should be a very simple thing to do.

With testthat, this is simply a call to test() which lives in
devtools. With RUnit you have to do:

runTestSuite(defineTestSuite("myTest", "inst/unitTests/", "test_"))

Although that's not quite enough. That will not run the same test as
Bioconductor, or as if you ran the code in the test files in your
console, because defineTestSuite picks rngKind =
"Marsaglia-Multicarry" and rngNormalKind = "Kinderman-Ramage", which
are not the current R default random number generators.

So then you have to do:

runTestSuite(defineTestSuite("myTest", "inst/unitTests/", "test_",
rngKind = "default", rngNormalKind = "default"))

Or you could call BiocGenerics:::testPackage(), but we probably all
agree it's better that unit testing have a documented, exported
function.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-09 Thread Michael Love
Some guidance on how to avoid duplication of the matrix for developers
would be greatly appreciated.

Another example of a trouble point, is that if I am given an SE with
an unnamed assay and I need to give the assay a name, this also can
expand the memory used. I had found a solution (which works with
GenomicRanges 1.18 / current release) with:

names(assays(se, withDimnames=FALSE))[1] <- "foo"

But now I'm looking in devel and this appears to no longer work. The
memory used expands, equivalent to:

names(assays(se))[1] <- "foo"

Here's some code to try this:

m <- matrix(1:1e7, ncol=10, dimnames=list(1:1e6, 1:10))
se <- SummarizedExperiment(m)
names(assays(se, withDimnames=FALSE))[1] <- "foo"
names(assays(se))[1] <- "foo"

while running gc() in between steps.
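
For anyone reproducing this, a rough way to watch the memory (a sketch only; gc(reset=TRUE) zeroes the "max used" columns so a copy of the ~40 Mb matrix stands out):

gc(reset=TRUE)
names(assays(se, withDimnames=FALSE))[1] <- "foo"
gc()  # "max used" stays near baseline if the assay was not copied
gc(reset=TRUE)
names(assays(se))[1] <- "foo"
gc()  # "max used" grows by roughly the size of the matrix if it was duplicated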


On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
kasperdanielhan...@gmail.com wrote:
 On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey st...@channing.harvard.edu
 wrote:

 I am glad you are keeping this discussion alive Kasper.

 On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen 
 kasperdanielhan...@gmail.com wrote:

 It sounds like the proposed changes are already made.  However (like
 others) I am still a bit mystified why this was necessary.  The old
 version
 did allow for a GRanges inside the DataFrame of the rowData, as far as I
 recall.  So I assume this is for efficiency.  But why?  What kind of
 data/use cases is this for?

 I am happy to hear that SummarizedExperiment is going to be spun out into
 its own package.  When that happens, I have some comments, which I'll
 include here in anticipation
   1) I now very strongly believe it was a design mistake to not have
 colnames on the assays.  The advantage of this choice is that sampleNames
 are only stored one place.  The extreme disadvantage is the high
 inefficiency when you want colnames on an extracted assay.


 after example(SummarizedExperiment)

  colnames(assays(se1)[[1]])
 [1] "A" "B" "C" "D" "E" "F"

 so this seems to be optional.  But attempts to set rownames will fail
 silently

  rownames(assays(se1)[[1]]) = as.character(1:200)

  rownames(assays(se1)[[1]])

 NULL
 seems we could issue a warning there



 Vince, you need to be careful here.

 The assays are stored without colnames (unless something has recently
 changed).  The default is to - upon extraction - set the colnames of the
 matrix.  This however requires a copy of the entire matrix.  So
 essentially, upon extraction, each assay is needlessly duplicated to add
 the colnames.  This is what I mean by inefficient. I would prefer to store
 the assays with colnames.  This means that changing sampleNames of the
 object will be inefficient (as it is for eSets) since it would require a
 complete copy of everything.  But I would rather - much rather - copy when
 setting sampleNames than copy when extracting an assay.

 Best,
 Kasper

 [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-09 Thread Michael Love
On Mar 9, 2015 12:36 PM, Martin Morgan mtmor...@fredhutch.org wrote:

 On 03/09/2015 08:07 AM, Michael Love wrote:

 Some guidance on how to avoid duplication of the matrix for developers
 would be greatly appreciated.


 It's unsatisfactory, but using withDimnames=FALSE avoids duplication on
extraction of assays (but obviously you don't have dimnames on the matrix).
Row or column subsetting necessarily causes the subsetted assay data to be
duplicated. There should not be any duplication when rowRanges() or
colData() are changed without changing their dimension / ordering.


Thanks Martin for checking into the regression.

Sorry, I should have been more specific earlier, I meant more
guidance/documentation in the man page for SE. I scanned the 'Extension'
section but didn't find a note on withDimnames for extracting the matrix or
this example of renaming the assays (it seems like this could easily be
relevant for other package authors).

A prominent note there might help devs write more memory efficient
packages.

The argument section mentions speed but I'd explicitly mention memory given
that we're often storing big matrices:

Setting withDimnames=FALSE  increases the speed with which assays are
extracted.

(it's entirely possible the info is there but I missed it)

Best,

Mike


 Another example of a trouble point, is that if I am given an SE with
 an unnamed assay and I need to give the assay a name, this also can
 expand the memory used. I had found a solution (which works with
 GenomicRanges 1.18 / current release) with:

 names(assays(se, withDimnames=FALSE))[1] <- "foo"

 But now I'm looking in devel and this appears to no longer work. The
 memory used expands, equivalent to:

 names(assays(se))[1] <- "foo"

 Here's some code to try this:

 m <- matrix(1:1e7, ncol=10, dimnames=list(1:1e6, 1:10))
 se <- SummarizedExperiment(m)
 names(assays(se, withDimnames=FALSE))[1] <- "foo"
 names(assays(se))[1] <- "foo"

 while running gc() in between steps.


 I think this is a regression of some sort, and I'll look into it. Thanks
for the heads-up.

 Martin




 On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
 kasperdanielhan...@gmail.com wrote:

 On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey 
st...@channing.harvard.edu
 wrote:

 I am glad you are keeping this discussion alive Kasper.

 On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen 
 kasperdanielhan...@gmail.com wrote:

 It sounds like the proposed changes are already made.  However (like
 others) I am still a bit mystified why this was necessary.  The old
 version
 did allow for a GRanges inside the DataFrame of the rowData, as far
as I
 recall.  So I assume this is for efficiency.  But why?  What kind of
 data/use cases is this for?

 I am happy to hear that SummarizedExperiment is going to be spun out
into
 its own package.  When that happens, I have some comments, which I'll
 include here in anticipation
1) I now very strongly believe it was a design mistake to not have
 colnames on the assays.  The advantage of this choice is that
sampleNames
 are only stored one place.  The extreme disadvantage is the high
 inefficiency when you want colnames on an extracted assay.


 after example(SummarizedExperiment)

 colnames(assays(se1)[[1]])

 [1] "A" "B" "C" "D" "E" "F"

 so this seems to be optional.  But attempts to set rownames will fail
 silently

 rownames(assays(se1)[[1]]) = as.character(1:200)


 rownames(assays(se1)[[1]])


 NULL
 seems we could issue a warning there



 Vince, you need to be careful here.

 The assays are stored without colnames (unless something has recently
 changed).  The default is to - upon extraction - set the colnames of the
 matrix.  This however requires a copy of the entire matrix.  So
 essentially, upon extraction, each assay is needlessly duplicated to add
 the colnames.  This is what I mean by inefficient. I would prefer to
store
 the assays with colnames.  This means that changing sampleNames of the
 object will be inefficient (as it is for eSets) since it would require a
 complete copy of everything.  But I would rather - much rather - copy
when
 setting sampleNames than copy when extracting an assay.

 Best,
 Kasper

  [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel


 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel



 --
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] recalling methods

2014-12-05 Thread Michael Love
nice. I will play around with this. thanks Gabe!

On Fri, Dec 5, 2014 at 6:37 PM, Gabe Becker becker.g...@gene.com wrote:
 Hey guys,

 Surgically removed from promptClass:

   classInSig <- function(g, where, cl) {
       cl %in% unique(unlist(findMethods(g, where)@signatures))
   }
   genWithClass <- function(cl, where) {
       allgen <- getGenerics(where = where)
       ok <- as.logical(unlist(lapply(allgen, classInSig, cl = cl,
                                      where = where)))
       allgen[ok]
   }

  genWithClass("IRanges", find(classMetaName("IRanges")))
   [1] "c"            "coerce"       "end<-"        "gaps"         "intersect"
   [6] "isNormal"     "names<-"      "names"        "pgap"         "pintersect"
  [11] "psetdiff"     "punion"       "reduce"       "reverse"      "setdiff"
  [16] "start<-"      "start"        "threebands"   "union"        "updateObject"
  [21] "update"       "width<-"      "width"


 For semantic guessing of which ones will be useful, I've got nothing (for
 now).

 ~G

 On Fri, Dec 5, 2014 at 11:28 AM, Michael Lawrence
 lawrence.mich...@gene.com wrote:

 Cool. I see hypertext as being useful here, because the generics and
 classes form an intricate and sometimes ambiguous web, especially when
 multiple inheritance and dispatch are involved. I think we should first
 build better tooling for introspecting S4 and for graph-based modeling and
 analysis of S4 architecture. For example, could we statically detect
 whether a dispatch ambiguity exists, knowing all of the methods and
 classes? And based on that build one or more end-user UIs?



 On Fri, Dec 5, 2014 at 11:05 AM, Michael Love
 michaelisaiahl...@gmail.com
 wrote:

  On Thu, Dec 4, 2014 at 4:01 PM, Michael Lawrence
  lawrence.mich...@gene.com wrote:
  
   I think this gets at the heart of at least one of the usability issues
  in Bioconductor: interface discoverability. Many simpler command line
  tools
  have a single-faceted interface for which it is easy to enumerate a list
  of
  features. There's definitely room for better ways to interrogate our
  object-oriented APIs, but it's challenging. Essentially we need a way
  for
  the user to ask "what can I do with this object?". Yes, we need better
  introspection utilities, but we also need to integrate the query with
  documentation. In other words, we need a more dynamic, more fluid help
  system, oriented around S4.
  
 
  I would be interested in working on this. A minimal goal for me is a
  function that just returns a character vector of the names of the
  generics defined for the object. Filtering that down to give methods
  which are likely relevant using the documentation will definitely be
  a bigger challenge.
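  
   Something along these lines maybe (a rough, untested sketch using only base
   'methods' tools; 'generics4' is just a placeholder name):
  
   generics4 <- function(x) {
       cl <- class(x)
       gens <- methods::getGenerics()@.Data  # names of S4 generics on the search path
       has <- vapply(gens, function(g)
           cl %in% unlist(methods::findMethods(g)@signatures), logical(1))
       sort(unique(gens[has]))
   }
   ## e.g. generics4(GRanges()) would give the generic names with a GRanges method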
 
 
  
  
  
   On Thu, Dec 4, 2014 at 9:56 AM, Michael Love 
  michaelisaiahl...@gmail.com wrote:
  
   I was thinking about a request from someone at Bioc2014 (I can't
   remember at the moment)
  
   As an end-user, if I have an object x, how can I *quickly* recall the
   main methods for that? As in, without breaking my flow and going to
   ?myClass or help(myClass-class). Suppose x is a GRanges, how can I
   remember that there is a method called narrow() which works on x?
  
   showMethods(classes=class(x)) will print out a huge list for many
   complex Bioc classes. And printTo=FALSE turns this huge list into an
   unhelpful character vector, e.g.:
  
    head(showMethods(classes="GRanges", printTo=FALSE), 8)
    [1] "Function \".asSpace\":"
    [3] "<not an S4 generic function>"
    [5] "Function \".linkToCachedObject<-\":"  "<not an S4 generic function>"
    [7] "Function \".replaceSEW\":"
  
   any ideas?
  
   ___
   Bioc-devel@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/bioc-devel
  
  
 

 [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel




 --
 Computational Biologist
 Genentech Research

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] recalling methods

2014-12-04 Thread Michael Love
I was thinking about a request from someone at Bioc2014 (I can't
remember at the moment)

As an end-user, if I have an object x, how can I *quickly* recall the
main methods for that? As in, without breaking my flow and going to
?myClass or help(myClass-class). Suppose x is a GRanges, how can I
remember that there is a method called narrow() which works on x?

showMethods(classes=class(x)) will print out a huge list for many
complex Bioc classes. And printTo=FALSE turns this huge list into an
unhelpful character vector, e.g.:

head(showMethods(classes="GRanges", printTo=FALSE), 8)
[1] "Function \".asSpace\":"
[3] "<not an S4 generic function>"
[5] "Function \".linkToCachedObject<-\":"  "<not an S4 generic function>"
[7] "Function \".replaceSEW\":"

any ideas?

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] plotPCA for BiocGenerics

2014-11-01 Thread Michael Love
 of PCA there is a number of different 
 plot-types
 that can all lay claim to the plot function of a PCA class, for instance
 scoreplot, scatterplot matrix of all scores, biplot, screeplot, 
 accumulated
 R^2 barplot, leverage vs. distance-to-model... (you get the idea). So 
 while
 having some very well-thought out classes for very common result types 
 such
 as PCA, this class would still need a lot of different plot methods such 
 as
 plotScores, plotScree etc (or plot(..., type='score'), but I don't find
 that very appealing). Expanding beyond PCA only muddles the water even 
 more
 - there are very few interesting data structures that only have one visual
 representation to-rule-them-all...

 just my 2c

 best
 Thomas


  Date: Mon, 20 Oct 2014 18:50:48 -0400
 From: Kevin Coombes kevin.r.coom...@gmail.com mailto:
 kevin.r.coom...@gmail.com

 Well. I have two responses to that.

 First, I think it would be a lot better/easier for users if (most)
 developers could make use of the same plot function for basic classes
 like PCA.

 Second, if you think the basic PCA plotting routine needs enhancements,
 you still have two options.  On the one hand, you could (as you said)
 try to convince the maintainer of PCA to add what you want.  If it's
 generally valuable, then he'd probably do it --- and other classes that
 use it would benefit.  On the other hand, if it really is a special
 enhancement that only makes sense for your class, then you can derive a
 class from the basic PCA class
   setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
   and implement your own version of the plot generic for this class.
 And you could tweak the as.PCA function so it returns an object of
 the
 mySpecialPCA class. And the user could still just plot the result
 without having to care what's happening behind the scenes.

 On 10/20/2014 5:59 PM, Michael Love wrote:

 Ah, I see now. Personally, I don't think Bioconductor developers
 should have to agree on single plotting functions for basic classes
 like 'PCA' (because this logic applies equally to the situation of all
 Bioconductor developers agreeing on single MA-plot, a single
 variance-mean plot, etc). I think letting developers define their
 plotPCA makes contributions easier (I don't have to ask the owner of
 plot.PCA to incorporate something), even though it means we have a
 growing list of generics.

 Still you have a good point about splitting computation and plotting.
 In practice, we subset the rows so PCA is not laborious.


 On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
 kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com
 mailto:kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com
 wrote:

 Hi,

 I don't see how it needs more functions (as long as you can get
 developers to agree).  Suppose that someone can define a reusable
 PCA class.  This will contain a single plot generic function,
 defined once and reused by other classes. The existing plotPCA
 interface can also be implemented just once, in this class, as

 plotPCA <- function(object, ...) plot(as.PCA(object), ...)

 This can be exposed to users of your class through namespaces.
 Then the only thing a developer needs to implement in his own
 class is the single as.PCA function.  And he/she would have
 already been rquired to implement this as part of the old
 plotPCA function.  So it can be extracted from that, and the
 developer doesn't have to reimplement the visualization code from
 the PCA class.

 Best,
   Kevin


 On 10/20/2014 5:15 PM, davide risso wrote:

 Hi Kevin,

 I see your points and I agree (especially for the specific case
 of plotPCA that involves some non trivial computations).

 On the other hand, having a wrapper function that starting from
 the raw data gives you a pretty picture (with virtually zero
 effort by the user) using a sensible choice of parameters that
 are more or less OK for RNA-seq data is useful for practitioners
 that just want to look for patterns in the data.

 I guess it would be the same to have a PCA method for each of the
 objects and then using the plot method on those new objects, but
 that would just create a lot more objects and functions than the
 current approach (like Mike was saying).

 Your as.pca or performPCA approach would be definitely better
 if all the different methods would create objects of the *same*
 PCA class, but since we are talking about different packages, I
 don't know how easy it would be to coordinate. But perhaps this
 is the way we should go.

 Best,
 davide



 On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
 kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com
 mailto:kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com
 wrote:

 Hi,

 It depends.

 The traditional R approach to these matters is that you (a)
 first

Re: [Bioc-devel] GenomicFiles: chunking

2014-10-27 Thread Michael Love
 approaches were not intended to be equivalent ways of
 arriving at the same end. The idea was that the user had a specific use
 case in mind - they either wanted to collapse the data across or within
 files.


 (4) return object from coverage(BigWigFileViews):

 Previous comment:

 ...
 coverage(BigWigFileViews) returns a wrong assay object in my opinion,
 ...
 Specifically, each (i,j) entry in the object is an RleList with a single
 element with a name equal to the seqnames of the i'th entry in the query
 GRanges.  To me, this extra nestedness is unnecessary; I would have
 expected an Rle instead of an RleList with 1 element.
 ...

 The return value from coverage(x) is an RleList with one coverage vector
 per seqlevel in 'x'. Even if there is only one seqlevel, the result still
 comes back as an RleList. This is just the default behavior.


 (5) separate the 'read' function from the MAP step

 Previous comment:

 ...
 Also, something completely different, it seems like it would be convenient
 for stuff like BigWigFileViews to not have to actually parse the file in
 the MAP step.  Somehow I would envision some kind of reading function,
 stored inside the object, which just returns an Rle when I ask for a
 (range, file).  Perhaps this is better left for later.
 ...

 The current approach for the reduce* functions is for MAP to both extract
 and manipulate data. The idea of separating the extraction step is actually
 implemented in reduceByYield(). (This function used to be yieldReduce() in
 Rsamtools in past releases.) For reduceByYield() the user must specify
 YIELD (a reader function), MAP, REDUCE and DONE (criteria to stop
 iteration).

 I'm not sure what is best here. I thought the many-argument approach of
 reduceByYield() was possibly confusing or burdensome and so didn't use it
 in the other GenomicFiles functions. Maybe it's not confusing but instead
 makes the individual steps more clear. What do you think,

 - Should the reader function be separate from the MAP? What are the
 advantages?

 - Should READER, MAP, REDUCE be stored inside the GenomicFiles object or
 supplied as arguments to the functions?


 (6) unnamed assay in SummarizedExperiment

 Previous comment:

 ...
 The return object of reduceByRange / reduceByFile with summarize = TRUE is
 a SummarizedExperiment with an unnamed assay.  I was surprised to see that
 this is even possible.
 ...

 There is no default name for SummarizedExperiment in general. I've named
 the assay 'data' for lack of a better term. We could also go with
 'reducedData' or another suggestion.


 Thanks for the feedback.

 Valerie




 On 10/01/2014 08:30 AM, Michael Love wrote:

 hi Kasper,

 For a concrete example, I posted a R and Rout file here:

 https://gist.github.com/mikelove/deaff84dc75f125d

 Things to note: 'ranges' is a GRangesList, I cbind() the numeric
 vectors in the REDUCE, and then rbind() the final list to get the
 desired matrix.

 Other than the weird column name 'init', does this give you what you want?

 best,

 Mike

 On Tue, Sep 30, 2014 at 2:08 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:

 hi Kasper and Valerie,

 In Kasper's original email:

 I would like to be able to write a MAP function which takes
ranges, file
 instead of just
range, file
 And then chunk over say 1,000s of ranges. I could then have an
 argument to reduceByXX called something like rangeSize, which is kind
 of yieldSize.

 I was thinking, for your BigWig example, we get part of the way just
 by splitting the ranges into a GRangesList of your desired chunk size.
 Then the parallel iteration is over chunks of GRanges. Within your MAP
 function:

  import(file, which = ranges, as = "Rle", format = "bw")[ranges]

 returns an RleList, and calling mean() on this gives a numeric vector
 of the mean of the coverage for each range.

 Then the only work needed is how to package the result into something
 reasonable. What you would get now is a list (length of GRangesList)
 of lists (length of files) of vectors (length of GRanges).

 best,

 Mike

 On Mon, Sep 29, 2014 at 7:09 PM, Valerie Obenchain voben...@fhcrc.org
 wrote:

 Hi Kasper,

 The reduceBy* functions were intended to combine data across ranges or
 files. If we have 10 ranges and 3 files you can think of it as a 10 x 3
 grid
 where we'll have 30 queries and therefore 30 pieces of information that
 can
 be combined across different dimensions.

 The request to 'processes ranges in one operation' is a good suggestion
 and,
 as you say, may even be the more common use case.

 Some action items and explanations:

 1)  treat ranges / files as a group

 I'll add functions to treat all ranges / files as a group; essentially
 no
 REDUCER other than concatenation. Chunking (optional) will occur as
 defined
 by 'yieldSize' in the BamFile.

 2)  class clean-up

 The GenomicFileViews class was the first iteration. It was overly
 complicated and the complication didn't gain us much. In my mind the
 GenomicFiles class

Re: [Bioc-devel] plotPCA for BiocGenerics

2014-10-20 Thread Michael Love
hi Kevin,

that would imply there is only one way to plot an object of a given class.
Additionally, it would break a lot of code.​

best,

Mike

On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes kevin.r.coom...@gmail.com
wrote:

 But shouldn't they all really just be named plot for the appropriate
 objects?  In which case, there would already be a perfectly good generic
 On Oct 20, 2014 10:27 AM, Michael Love michaelisaiahl...@gmail.com
 wrote:

 I noticed that 'plotPCA' functions are defined in EDASeq, DESeq2, DESeq,
 affycoretools, Rcade, facopy, CopyNumber450k, netresponse, MAIT (maybe
 more).

 Sounds like a case for BiocGenerics.
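 
 (In practice the change would amount to a single shared generic, something
 like the sketch below, with each package then defining methods for it rather
 than its own plotPCA function:)
 
 setGeneric("plotPCA", function(object, ...) standardGeneric("plotPCA"))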

 best,

 Mike

 [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] plotPCA for BiocGenerics

2014-10-20 Thread Michael Love
I agree it depends on programming style. But to continue with the example,
it would have to be:

plot(as.mySpecialObjectPCA(mySpecialObject))

because we have many packages here with their own functions.

Wouldn't this just proliferate the number of classes instead of functions?
For every type of plot in my package, I would have to add a special class
branching off of mySpecialObject. Users would have to remember to convert
from mySpecialObject to mySpecialObjectPCA or mySpecialObjectMAPlot or
mySpecialObjectVarianceMeanPlot, etc.

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] plotPCA for BiocGenerics

2014-10-20 Thread Michael Love
): The worst offender in base R -- because it doesn't use
 this typical approach -- is the heatmap function.  Having tried to teach
 this function in several different classes, I have come to the conclusion
 that it is basically unusable by mortals.  And I think the problem is that
 it tries to combine too many steps -- clustering rows, clustering columns,
 scaling, visualization -- all in a single function


 On 10/20/2014 3:47 PM, davide risso wrote:

 Hi Kevin,

  I don't agree. In the case of EDASeq (as I suppose it is the case for
 DESeq/DESeq2) plotting the principal components of the count matrix is only
 one of the possible exploratory plots (RLE plots, MA plots, etc.).
 So, in my opinion, it makes more sense from an object oriented point of
 view to have multiple plotting methods for a single RNA-seq experiment
 object.

  In addition, this is the same strategy adopted elsewhere in
 Bioconductor, e.g., for the plotMA method.

  Just my two cents.

  Best,
 davide

 On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes 
 kevin.r.coom...@gmail.com wrote:

  I understand that breaking code is a problem, and that is admittedly
 the main reason not to immediately adopt my suggestion.

 But as a purely logical exercise, creating a PCA object X or something
 similar and using either
 plot(X)
 or
 plot(as.PCA(mySpecialObject))
 is a much more sensible use of object-oriented programming/design. This
 requires no new generics (to write or to learn).

 And you could use it to transition away from the current system by
 convincing the various package maintainers to re-implement plotPCA as
 follows:

 plotPCA <- function(object, ...) {
   plot(as.PCA(object), ...)
 }

 This would be relatively easy to eventually deprecate and teach users to
 switch to the alternative.


 On 10/20/2014 1:07 PM, Michael Love wrote:

  hi Kevin,

  that would imply there is only one way to plot an object of a given
 class. Additionally, it would break a lot of code.

  best,

  Mike

 On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes 
 kevin.r.coom...@gmail.com wrote:

 But shouldn't they all really just be named plot for the appropriate
 objects?  In which case, there would already be a perfectly good 
 generic
  On Oct 20, 2014 10:27 AM, Michael Love michaelisaiahl...@gmail.com
 wrote:

  I noticed that 'plotPCA' functions are defined in EDASeq, DESeq2,
 DESeq,
 affycoretools, Rcade, facopy, CopyNumber450k, netresponse, MAIT (maybe
 more).

 Sounds like a case for BiocGenerics.

 best,

 Mike

  [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel









  --
 Davide Risso, PhD
 Post Doctoral Scholar
 Division of Biostatistics
 School of Public Health
 University of California, Berkeley
 344 Li Ka Shing Center, #3370
 Berkeley, CA 94720-3370
 E-mail: davide.ri...@berkeley.edu








  --
 Davide Risso, PhD
 Post Doctoral Scholar
 Division of Biostatistics
 School of Public Health
 University of California, Berkeley
 344 Li Ka Shing Center, #3370
 Berkeley, CA 94720-3370
 E-mail: davide.ri...@berkeley.edu







[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] GenomicFiles: chunking

2014-10-01 Thread Michael Love
hi Kasper,

For a concrete example, I posted a R and Rout file here:

https://gist.github.com/mikelove/deaff84dc75f125d

Things to note: 'ranges' is a GRangesList, I cbind() the numeric
vectors in the REDUCE, and then rbind() the final list to get the
desired matrix.

Other than the weird column name 'init', does this give you what you want?

best,

Mike

On Tue, Sep 30, 2014 at 2:08 PM, Michael Love
michaelisaiahl...@gmail.com wrote:
 hi Kasper and Valerie,

 In Kasper's original email:

 I would like to be able to write a MAP function which takes
   ranges, file
 instead of just
   range, file
 And then chunk over say 1,000s of ranges. I could then have an
 argument to reduceByXX called something like rangeSize, which is kind
 of yieldSize.

 I was thinking, for your BigWig example, we get part of the way just
 by splitting the ranges into a GRangesList of your desired chunk size.
 Then the parallel iteration is over chunks of GRanges. Within your MAP
 function:

  import(file, which = ranges, as = "Rle", format = "bw")[ranges]

 returns an RleList, and calling mean() on this gives a numeric vector
 of the mean of the coverage for each range.

 Then the only work needed is how to package the result into something
 reasonable. What you would get now is a list (length of GRangesList)
 of lists (length of files) of vectors (length of GRanges).
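 
  Packaging it is then a couple of lines, something like (a sketch; 'res'
  stands for that list-of-lists):
 
  ## one row per range, one column per file
  mat <- do.call(rbind, lapply(res, function(chunk) do.call(cbind, chunk)))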

 best,

 Mike

 On Mon, Sep 29, 2014 at 7:09 PM, Valerie Obenchain voben...@fhcrc.org wrote:
 Hi Kasper,

 The reduceBy* functions were intended to combine data across ranges or
 files. If we have 10 ranges and 3 files you can think of it as a 10 x 3 grid
 where we'll have 30 queries and therefore 30 pieces of information that can
 be combined across different dimensions.

 The request to 'processes ranges in one operation' is a good suggestion and,
 as you say, may even be the more common use case.

 Some action items and explanations:

 1)  treat ranges / files as a group

 I'll add functions to treat all ranges / files as a group; essentially no
 REDUCER other than concatenation. Chunking (optional) will occur as defined
 by 'yieldSize' in the BamFile.

 2)  class clean-up

 The GenomicFileViews class was the first iteration. It was overly
 complicated and the complication didn't gain us much. In my mind the
 GenomicFiles class is a stripped-down version that should replace the *FileViews
 classes. I plan to deprecate the *FileViews classes.

 Ideally the class(s) in GenomicFiles would inherit from
 SummarizedExperiment. A stand-alone package for SummarizedExperiment is in
 the works for the near future. Until those plans are finalized I'd rather
 not change the inheritance structure in GenomicFiles.

 3) pack / unpack

 I experimented with this and came to the conclusion that packing / unpacking
 should be left to the user vs being done behind the scenes with reduceBy*.
 The primary difficulty is that you don't have pre-knowledge of the output
 format of the user's MAPPER. If MAPPER outputs an Rle, unpacking may be
 straightforward. If MAPPER outputs a NumericList or matrix or vector with no
 genomic coordinates then things are more complicated.

 I'm open if others have suggestions / prototypes.

 4) reduceByYield

 This was intended for chunking through ranges in a single file. You can
 imagine using bplapply() over files where each file is chunked through with
 reduceByYield(). All ranges are chunked by 'yieldSize' defined in the
 BamFile unless a 'param' dictates a subset of ranges.

 5) additional tidy

 I'll add a name to the 'assays' when summarize=TRUE, make sure return types
 are consistent when summarize=FALSE, etc.

 Thanks for the testing and feedback.


 Valerie


 On 09/29/2014 07:18 AM, Michael Love wrote:

 On Mon, Sep 29, 2014 at 9:09 AM, Kasper Daniel Hansen
 kasperdanielhan...@gmail.com wrote:

 I don't fully understand the use case for reducing by range is when the
 entire dataset won't fit into memory.  The basic assumption of these
 functions (as far as I can see) is that the output data fits in memory.
 What may not fit in memory is various earlier iterations of the data.
 For
 example, in my use case, if I just read in all the data in all the ranges
 in
 all the samples it is basically Rle's across 450MB times 38 files, which
 is
 not small.  What fits in memory is smaller chunks of this; that is true
 for
 every application.


 I was unclear. I meant that, in approach1, you have an object,
 all.Rle, which contains Rles for every range over every file. Can you
 actually run this approach on the full dataset?

 Reducing by range (or file) only makes sense when the final output
 includes
 one entity for several ranges/files ... right?  So I don't see how reduce
 would help me.


 Yes, I think we agree. This is not a good use case for reduce by range
 as now implemented.

 This is a use case which would benefit from the user-facing function
 internally calling pack()/unpack() to reduce the number of import()
 calls, and then in the end giving back

Re: [Bioc-devel] GenomicFiles: chunking

2014-09-29 Thread Michael Love
Thanks for checking it out and benchmarking. We should be more clear
in the docs that the use case for reducing by range is when the entire
dataset won't fit into memory. Also, we had some discussion and
Valerie had written up methods for packing up the ranges supplied by
the user into a better form for querying files. In your case it would
have packed many ranges together, to reduce the number of import()
calls like your naive approach. See the pack/unpack functions, which
are not in the vignette but are in the man pages. If I remember,
writing code to unpack() the result was not so simple, and development
of these functions was set aside for the moment.

Mike

On Sun, Sep 28, 2014 at 10:49 PM, Kasper Daniel Hansen
kasperdanielhan...@gmail.com wrote:

 I am testing GenomicFiles.

 My use case: I have 231k ranges of average width 1.9kb and total width 442
 MB.  I also have 38 BigWig files.  I want to compute the average coverage
 of the 38 BigWig files inside each range.  This is similar to wanting to
 get coverage of - say - all promoters in the human genome.

 My data is residing on a file server which is connected to the compute node
 through ethernet, so basically I have very slow file access.  Also, the
 BigWig files are (perhaps unusually) big: ~2GB / piece.

 Below I have two approaches, one using basically straightforward code and
 the other using GenomicFiles (examples are 10 / 100 ranges on 5 files).
 Basically GenomicFiles is 10-20x slower than the straightforward approach.
 This is most likely because reduceByRange/reduceByFile processes each
 (range, file) separately.

 It seems natural (to me) to allow some chunking of the mapping of the
 ranges.  My naive approach is fast (I assume) because I read multiple
 ranges through one import() command.  I know I have random access to the
 BigWig files, but there must still be some overhead of seeking and perhaps
 more importantly instantiating/moving stuff back and forth to R.  So
 basically I would like to be able to write a MAP function which takes
   ranges, file
 instead of just
   range, file
 And then chunk over say 1,000s of ranges.  I could then have an argument to
 reduceByXX called something like rangeSize, which is kind of yieldSize.

 Perhaps this is what is intended for the reduceByYield on BigWig files?

 In a way, this is what is done in the vignette with the coverage(BAMFILE)
 example where tileGenome is essentially constructed by the user to chunk
 the coverage computation.

 I think the example of a set of regions I am querying on the files, will be
 an extremely common usecase going forward. The raw data for all the regions
 together is too big but I do a computation on each region to reduce the
 size.  In this situation, all the focus is on the MAP step.  I see the need
 for REDUCE in the case of the t-test example in the vignette, where the
 return object is a single thing for each base.  But in general, I think
 we will use these file structures a lot to construct things without REDUCE
 (across neither files nor ranges).

 Also, something completely different, it seems like it would be convenient
 for stuff like BigWigFileViews to not have to actually parse the file in
 the MAP step.  Somehow I would envision some kind of reading function,
 stored inside the object, which just returns an Rle when I ask for a
 (range, file).  Perhaps this is better left for later.

 Best,
 Kasper

 Examples

 approach1 <- function(ranges, files) {
     ## By hand
     all.Rle <- lapply(files, function(file) {
         rle <- import(file, as = "Rle", which = ranges, format = "bw")[ranges]
         rle
     })
     print(object.size(all.Rle), units = "auto")
     mat <- do.call(cbind, lapply(all.Rle, function(xx) {
         sapply(xx, mean)
     }))
     invisible(mat)
 }

 system.time({
     mat1 <- approach1(all.grs[1:10], all.files[1:5])
 })

 160.9 Kb
    user  system elapsed
   1.109   0.001   1.113

 system.time({
     mat1 <- approach1(all.grs[1:100], all.files[1:5])
 }) # less than 4x slower than previous call

 3 Mb
    user  system elapsed
   4.101   0.019   4.291


 approach2 <- function(ranges, files) {
     gf <- GenomicFiles(rowData = ranges, files = files)
     MAPPER <- function(range, file, ...) {
         library(rtracklayer)
         rle <- import(file, which = range, as = "Rle", format = "bw")[range]
         mean(rle)
     }
     sExp <- reduceByRange(gf, MAP = MAPPER, summarize = TRUE, BPPARAM = SerialParam())
     sExp
 }

 system.time({
     mat2 <- approach2(all.grs[1:10], all.files[1:5])
 })  # 8-9x slower than approach1

    user  system elapsed
   9.349   0.280   9.581

 system.time({
     mat2 <- approach2(all.grs[1:100], all.files[1:5])
 })  # 9x slower than previous call, 20x slower than approach1 on same input

    user  system elapsed
  89.310   0.627  91.044

 [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] GenomicFiles: chunking

2014-09-29 Thread Michael Love
On Mon, Sep 29, 2014 at 9:09 AM, Kasper Daniel Hansen
kasperdanielhan...@gmail.com wrote:
 I don't fully understand the use case for reducing by range is when the
 entire dataset won't fit into memory.  The basic assumption of these
 functions (as far as I can see) is that the output data fits in memory.
 What may not fit in memory is various earlier iterations of the data.  For
 example, in my use case, if I just read in all the data in all the ranges in
 all the samples it is basically Rle's across 450MB times 38 files, which is
 not small.  What fits in memory is smaller chunks of this; that is true for
 every application.


I was unclear. I meant that, in approach1, you have an object,
all.Rle, which contains Rles for every range over every file. Can you
actually run this approach on the full dataset?

 Reducing by range (or file) only makes sense when the final output includes
 one entity for several ranges/files ... right?  So I don't see how reduce
 would help me.


Yes, I think we agree. This is not a good use case for reduce by range
as now implemented.

This is a use case which would benefit from the user-facing function
internally calling pack()/unpack() to reduce the number of import()
calls, and then in the end giving back the mean coverage over the
input ranges. I want this too.

https://github.com/Bioconductor/GenomicFileViews/issues/2#issuecomment-32625456
(link to the old github repo, the new github repo is named GenomicFiles)

 As I see the pack()/unpack() paradigm, it just re-orders the query ranges
 (which is super nice and matters a lot for speed for some applications).  As
 I understand the code (and my understanding is developing) we need an extra
 layer to support processing multiple ranges in one operation.

 I am happy to help apart from complaining.

 Best,
 Kasper

 On Mon, Sep 29, 2014 at 8:55 AM, Michael Love michaelisaiahl...@gmail.com
 wrote:

 Thanks for checking it out and benchmarking. We should be more clear
 in the docs that the use case for reducing by range is when the entire
 dataset won't fit into memory. Also, we had some discussion and
 Valerie had written up methods for packing up the ranges supplied by
 the user into a better form for querying files. In your case it would
 have packed many ranges together, to reduce the number of import()
 calls like your naive approach. See the pack/unpack functions, which
 are not in the vignette but are in the man pages. If I remember,
 writing code to unpack() the result was not so simple, and development
 of these functions was set aside for the moment.

 Mike

 On Sun, Sep 28, 2014 at 10:49 PM, Kasper Daniel Hansen
 kasperdanielhan...@gmail.com wrote:
 
  I am testing GenomicFiles.
 
  My use case: I have 231k ranges of average width 1.9kb and total width
  442
  MB.  I also have 38 BigWig files.  I want to compute the average
  coverage
  of the 38 BigWig files inside each range.  This is similar to wanting to
  get coverage of - say - all promoters in the human genome.
 
  My data is residing on a file server which is connected to the compute
  node
  through ethernet, so basically I have very slow file access.  Also. the
  BigWig files are (perhaps unusually) big: ~2GB / piece.
 
  Below I have two approaches, one using basically straightforward code
  and
  the other using GenomicFiles (examples are 10 / 100 ranges on 5 files).
  Basically GenomicFiles is 10-20x slower than the straightforward
  approach.
  This is most likely because reduceByRange/reduceByFile processes each
  (range, file) separately.
 
  It seems natural (to me) to allow some chunking of the mapping of the
  ranges.  My naive approach is fast (I assume) because I read multiple
  ranges through one import() command.  I know I have random access to the
  BigWig files, but there must still be some overhead of seeking and
  perhaps
  more importantly instantiating/moving stuff back and forth to R.  So
  basically I would like to be able to write a MAP function which takes
ranges, file
  instead of just
range, file
  And then chunk over say 1,000s of ranges.  I could then have an argument
  to
  reduceByXX called something like rangeSize, which is kind of yieldSize.
 
  Perhaps this is what is intended for the reduceByYield on BigWig files?
 
  In a way, this is what is done in the vignette with the
  coverage(BAMFILE)
  example where tileGenome is essentially constructed by the user to chunk
  the coverage computation.
 
  I think the example of a set of regions I am querying on the files, will
  be
  an extremely common usecase going forward. The raw data for all the
  regions
  together is too big but I do a computation on each region to reduce
  the
  size.  In this situation, all the focus is on the MAP step.  I see the
  need
  for REDUCE in the case of the t-test example in the vignette, where the
  return object is a single thing for each base.  But in general, I
  think
  we will use these file structures a lot to construct things

[Bioc-devel] Submitting a workflow

2014-09-03 Thread Michael Love
hi,

I'd like to contribute a workflow, but I don't see a guide on how to
submit a workflow on the website.

Do the workflows have the same versioning system x.y.z as software packages?

Is it possible to use the github bridge with workflows?

thanks,

Mike

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Submitting a workflow

2014-09-03 Thread Michael Love
Thanks. I didn't poke around enough. Maybe there could be a link from

http://www.bioconductor.org/help/workflows/

for How to contribute a workflow

On Wed, Sep 3, 2014 at 9:05 AM, Andrzej Oleś andrzej.o...@gmail.com wrote:
 Hi Mike,

 see also http://www.bioconductor.org/developers/how-to/workflows/

 If you author your vignette in markdown you could try using BiocStyle
 formatting, see
 http://www.bioconductor.org/packages/devel/bioc/vignettes/BiocStyle/inst/doc/HtmlStyle.html

 Best,
 Andrzej


 On Wed, Sep 3, 2014 at 2:59 PM, Vincent Carey
 st...@channing.harvard.edu wrote:
 here's the guide i am aware of.  a link should be on the basic workflow
 page as a 'how to'.

 https://hedgehog.fhcrc.org/bioconductor/trunk/madman/workflows/README.md


 On Wed, Sep 3, 2014 at 7:47 AM, Michael Love michaelisaiahl...@gmail.com
 wrote:

 hi,

 I'd like to contribute a workflow, but I don't see a guide on how to
 submit a workflow on the website.

 Do the workflows have the same versioning system x.y.z as software
 packages?

 Is it possible to use the github bridge with workflows?

 thanks,

 Mike

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel


 [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] [BioC] More than one package document with the same name

2014-08-06 Thread Michael Love
(copying over to bioc-devel). I do appreciate the very few keystrokes
it takes to do, e.g.

vignette('affy')

of course I can find them all with browseVignettes('affy') and
sometimes the above is not the right name, but I like that the primary
documentation is so close at hand. this is slightly more of a pain:

vignette('affy-vignette')

On Tue, Aug 5, 2014 at 2:03 PM, Julian Gehring julian.gehr...@embl.de wrote:
 Hi,

 For the BiocCheck of the vignette name, one could also check that the
 vignette file name is not 'package.[Rnw|Rmd]'.  This would give more
 flexibility, and avoid a fixed naming scheme.

 Since the issue affects all R packages - bioc or not -, I still think it is
 worth trying to address this at the R level.

 Best wishes
 Julian



 On 05.08.2014 10:36, Dan Tenenbaum wrote:


 I'm not sure this is true. The vignette can be named whatever the package
 author wants, as long as it has the appropriate file extension.
 So BiocCheck could suggest that it be named package-vignette.Rnw instead
 of package.Rnw.
 And the compiled (PDF) reference manual doesn't exist in a source package.
 So (this is probably better than my previous suggestion) BioC could rename
 the compiled manuals on the package landing pages to be (e.g.)
 package-manual.pdf instead of package.pdf.

 Dan


 ___
 Bioconductor mailing list
 bioconduc...@r-project.org
 https://stat.ethz.ch/mailman/listinfo/bioconductor
 Search the archives:
 http://news.gmane.org/gmane.science.biology.informatics.conductor

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Deseq2 and differentia expression

2014-07-11 Thread Michael Love
hi Jarod,

This is more of a main Bioc mailing list question, so you can address
future questions there.

On Fri, Jul 11, 2014 at 6:05 AM, jarod...@libero.it jarod...@libero.it wrote:
 Dear Dr,
 Thanks so much for clarification!!!
 So I tried the test of log fold change but I'm a bit confused about the results:
 if I am interested in the genes that have a fold change of more than 0.5 and 2, do I need
 to use this command, is that right?

the second and third results() commands below give you this.

 ddsNoPrior <- DESeq(ddHTSeq, betaPrior=FALSE) # only for lessAbs

 resGA <- results(ddsNoPrior, lfcThreshold=.5, altHypothesis="lessAbs") # greater tdi
 resGA2 <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs") # greater tdi
 resGA3 <- results(dds, lfcThreshold=2, altHypothesis="greaterAbs") # greater tdi

 dim(resGA)
 [1] 62893 6
 dim(resGA2)
 [1] 62893 6
 dim(resGA3)
 [1] 62893 6

 The number of genes selected is always the same... Where is my mistake?
 Thanks in advance!


DESeq2 returns the results for all the genes in the same order as the
original object. You need to specify a threshold on adjusted p-value.

table(res$padj < 0.1)

You can use subset(res, padj < 0.1) to filter the DataFrame.


Original Message
From: michaelisaiahl...@gmail.com
Date: 10/07/2014 14.46
To: jarod...@libero.it
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Deseq2 and differentia expression

hi Jarod,

On Thu, Jul 10, 2014 at 7:59 AM, jarod...@libero.it jarod...@libero.it
 wrote:
 Hi there!!!

 I did this code:
 SampleTable <- data.frame(SampleName=metadata$ID_CLINICO,
                           fileName=metadata$NOME,
                           condition=metadata$CONDITION, prim=metadata$CDT)
 ddHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=SampleTable, directory="Count/",
                                       design= ~condition) # effect of the mutation
 ddHTSeq$condition <- relevel(ddHTSeq$condition, "NVI") # i.e., versus non-visceral
 dds <- DESeq(ddHTSeq)
 res <- results(dds)

 resOrdered <- res[order(res$padj),]
 head(resOrdered)
 ResSig <- res[ which(res$padj < 0.1), ]


 I want to select some data. How can I do that? Which is a good cut-off on FDR
 values?

The code above does the selection on adjusted p-value. The right FDR
cutoff is up to you, what percent of false discoveries is tolerable in
the final list of genes? The considerations are: the cost of
validation or following up on a false discovery, versus the cost of a
missed discovery. These are hard to quantify even if you know all the
details of an experiment.

 All the data have an FDR less than 0.1:
 Is this command right?
 res[ which(res$padj < 0.1), ]


yes. The which() is necessary because some of the res$padj have NA. If
you have a logical vector with NA, you cannot directly index a
DataFrame, but you can index after calling which(), which will return
the numeric index of the TRUE's. You could also subset with:
subset(res, padj < 0.1).
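
A toy illustration of the difference (nothing DESeq2-specific here):

x <- c(1, 2, 3)
keep <- c(TRUE, NA, FALSE)
x[which(keep)]  # 1    -- which() silently drops the NA
x[keep]         # 1 NA -- and a DataFrame refuses this kind of subscript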

The reason for the NAs is explained in the vignette: Note that some
values in the results table can be set to NA, for either one of the
following reasons:...


 How many significant genes are there with FDR less than 0.1 and an absolute
 value of fold change of more than 1? I have an error on this. I have many NA
 values.

 If I try this code I have the follow errors
 significant.genes = res[(res$padj  .05  abs(res$log2FoldChange) = 1 ),]
 #
 Set thethreshold for the log2 fold change.
 Error in normalizeSingleBracketSubscript(i, x, byrow = TRUE, exact = FALSE)
 :
   subscript contains NAs


This is not the recommended way to filter on large log fold changes.
We have implemented a test specifically for this; check the vignette
section on "Tests of log2 fold change above or below a threshold".
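
For concreteness, a minimal sketch of that threshold test (the threshold and
FDR cutoff here are arbitrary):

  res2 <- results(dds, lfcThreshold=1, altHypothesis="greaterAbs")
  table(res2$padj < 0.1)   # genes whose |log2 fold change| is significantly above 1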

Mike

 How can I resolve this problem?
 thanks in advance for the help



 R version 3.1.0 (2014-04-10)
 Platform: i686-pc-linux-gnu (32-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] splines   parallel  stats graphics  grDevices utils datasets
 [8] methods   base

 other attached packages:
  [1] annotate_1.40.1      RColorBrewer_1.0-5    gplots_2.14.1
  [4] org.Hs.eg.db_2.10.1  ReportingTools_2.4.0  AnnotationDbi_1.24.0
  [7] RSQLite_0.11.4       DBI_0.2-7             knitr_1.6
 [10] biomaRt_2.18.0       DESeq2_1.4.5          RcppArmadillo_0.4.320.0
 [13] Rcpp_0.11.2          GenomicRanges_1.14.4  XVector_0.2.0
 [16] IRanges_1.20.7       affy_1.40.0           NOISeq_2.6.0
 [19] Biobase_2.22.0       BiocGenerics_0.8.0

 loaded via a namespace (and not attached):
  [1] affyio_1.30.0         AnnotationForge_1.4.4 BiocInstaller_1.12.1
  [4] Biostrings_2.30.1     biovizBase_1.10.8     bitops_1.0-6
  [7] BSgenome_1.30.0

[Bioc-devel] makeTranscriptDbFrom... AnnotationHub

2014-07-08 Thread Michael Love
The recent TranscriptDb thread reminded me of a question: are there
plans (or am I missing the function) to easily get a TranscriptDb out
of the AnnotationHub objects? It would be great to have a preprocessed
Ensembl txdb like we have for UCSC.

 ah <- AnnotationHub()
 gr <- ah$ensembl.release.73.gtf.homo_sapiens.Homo_sapiens.GRCh37.73.gtf_0.0.1.RData
 gr
GRanges with 2268089 ranges and 12 metadata columns:
seqnames ranges strand   | source
   Rle  IRanges  Rle   |   factor
[1]1 [11869, 12227]  +   |   processed_transcript
[2]1 [12613, 12721]  +   |   processed_transcript
[3]1 [13221, 14409]  +   |   processed_transcript
[4]1 [11872, 12227]  +   | unprocessed_pseudogene
[5]1 [12613, 12721]  +   | unprocessed_pseudogene
...  ......... ......
  [2268085]   MT [14747, 15887]  +   | protein_coding
  [2268086]   MT [14747, 15887]  +   | protein_coding
  [2268087]   MT [14747, 14749]  +   | protein_coding
  [2268088]   MT [15888, 15953]  +   |Mt_tRNA
  [2268089]   MT [15956, 16023]  -   |Mt_tRNA
   type score phase gene_id   transcript_id
   factor numeric integer character character
[1]exon  NA  NA ENSG0223972 ENST0456328
[2]exon  NA  NA ENSG0223972 ENST0456328
[3]exon  NA  NA ENSG0223972 ENST0456328
[4]exon  NA  NA ENSG0223972 ENST0515242
[5]exon  NA  NA ENSG0223972 ENST0515242
... ...   ...   ... ... ...
  [2268085]exon  NA  NA ENSG0198727 ENST0361789
  [2268086] CDS  NA 0 ENSG0198727 ENST0361789
  [2268087] start_codon  NA 0 ENSG0198727 ENST0361789
  [2268088]exon  NA  NA ENSG0210195 ENST0387460
  [2268089]exon  NA  NA ENSG0210196 ENST0387461
exon_number   gene_name   gene_biotype transcript_name
  numeric charactercharacter character
[1]   1 DDX11L1 pseudogene DDX11L1-002
[2]   2 DDX11L1 pseudogene DDX11L1-002
[3]   3 DDX11L1 pseudogene DDX11L1-002
[4]   1 DDX11L1 pseudogene DDX11L1-201
[5]   2 DDX11L1 pseudogene DDX11L1-201
... ... ...... ...
  [2268085]   1  MT-CYB protein_coding  MT-CYB-201
  [2268086]   1  MT-CYB protein_coding  MT-CYB-201
  [2268087]   1  MT-CYB protein_coding  MT-CYB-201
  [2268088]   1   MT-TTMt_tRNA   MT-TT-201
  [2268089]   1   MT-TPMt_tRNA   MT-TP-201
exon_id  protein_id
character character
[1] ENSE2234944NA
[2] ENSE3582793NA
[3] ENSE2312635NA
[4] ENSE2234632NA
[5] ENSE3608237NA
... ... ...
  [2268085] ENSE1436074NA
  [2268086]NA ENSP0354554
  [2268087]NANA
  [2268088] ENSE1544475NA
  [2268089] ENSE1544473NA
  ---
  seqlengths:
 1   2 ...  MT
NA  NA ...  NA
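
A minimal sketch of one way to do this, assuming a GenomicFeatures version
that provides makeTxDbFromGRanges() (which postdates this post):

  library(GenomicFeatures)
  txdb <- makeTxDbFromGRanges(gr)   # 'gr' is the Ensembl GTF GRanges shown above
  exonsBy(txdb, by="tx")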

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Problem on Installation of DESeq2

2014-06-18 Thread Michael Love
hi Jarod,

What version of Rcpp and RcppArmadillo is installed?

Mike

On Wed, Jun 18, 2014 at 11:35 AM, jarod...@libero.it jarod...@libero.it wrote:

I can't install DESeq2 on the RHEL 5.5 server:

* installing *source* package ‘RcppArmadillo’ ...
** package ‘RcppArmadillo’ successfully unpacked and MD5 sums checked
* checking LAPACK_LIBS: divide-and-conquer complex SVD unavailable via R-supplied LAPACK
* divide-and-conquer algorithm for complex SVD will be redirected to default
** libs
g++ -I/illumina/software/PROG/R302/lib64/R/include -DNDEBUG -I/usr/local/include -I/illumina/software/PROG/R302/lib64/R/library/Rcpp/include -I../inst/include -fpic -g -O2 -c RcppArmadillo.cpp -o RcppArmadillo.o
In file included from ../inst/include/armadillo:42,
 from ../inst/include/RcppArmadilloForward.h:37,
 from ../inst/include/RcppArmadillo.h:30,
 from RcppArmadillo.cpp:22:
../inst/include/armadillo_bits/compiler_setup.hpp:119:6: error: #error *** Need a newer compiler ***
/illumina/software/PROG/R302/lib64/R/library/Rcpp/include/Rcpp/Date.h: In member function ‘void Rcpp::Date::update_tm()’:
/illumina/software/PROG/R302/lib64/R/library/Rcpp/include/Rcpp/Date.h:108: warning: converting to ‘time_t’ from ‘double’
/illumina/software/PROG/R302/lib64/R/library/Rcpp/include/Rcpp/Date.h: In function ‘Rcpp::Date Rcpp::operator+(const Rcpp::Date, int)’:
/illumina/software/PROG/R302/lib64/R/library/Rcpp/include/Rcpp/Date.h:139: warning: converting to ‘time_t’ from ‘double’
make: *** [RcppArmadillo.o] Error 1
ERROR: compilation failed for package ‘RcppArmadillo’
* removing ‘/illumina/software/PROG/R302/lib64/R/library/RcppArmadillo’
* restoring previous ‘/illumina/software/PROG/R302/lib64/R/library/RcppArmadillo’

The downloaded source packages are in
  ‘/tmp/Rtmp30ZzKT/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning messages:
1: In install.packages(pkgs = pkgs, lib = lib, repos = repos, ...) :
  installation of package ‘RcppArmadillo’ had non-zero exit status
2: In install.packages(pkgs = pkgs, lib = lib, repos = repos, ...) :
  installation of package ‘DESeq2’ had non-zero exit status
3: In install.packages(update[instlib == l, Package], l, contriburl

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
 base

other attached packages:
[1] BiocInstaller_1.12.1

loaded via a namespace (and not attached):
[1] tools_3.0.
Any Idea how can resolve?








 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Problem on Installation of DESeq2

2014-06-18 Thread Michael Love
are you installing with:

biocLite("DESeq2")

hmm these don't look like Rcpp* version numbers. What do you get with

library(Rcpp)
library(RcppArmadillo)
sessionInfo()



On Wed, Jun 18, 2014 at 1:13 PM, jarod...@libero.it jarod...@libero.it wrote:
 Thanks for your help.
 I have the 2.0; when I try to install 3.8 I get the error about the gcc version
 on RHEL 5.5.



Original message
From: michaelisaiahl...@gmail.com
Date: 18/06/2014 17.44
To: jarod...@libero.it
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Problem on Installation of DESeq2

hi Jarod,

What version of Rcpp and RcppArmadillo is installed?

Mike

On Wed, Jun 18, 2014 at 11:35 AM, jarod...@libero.it jarod...@libero.it
 wrote:

I can't install DESeq2 on the RHEL 5.5 server:

* installing *source* package ‘RcppArmadillo’ ...
** package ‘RcppArmadillo’ successfully unpacked and MD5 sums checked
* checking LAPACK_LIBS: divide-and-conquer complex SVD unavailable via R-supplied LAPACK
* divide-and-conquer algorithm for complex SVD will be redirected to default
** libs
g++ -I/illumina/software/PROG/R302/lib64/R/include -DNDEBUG -I/usr/local/include -I/illumina/software/PROG/R302/lib64/R/library/Rcpp/include -I../inst/include -fpic -g -O2 -c RcppArmadillo.cpp -o RcppArmadillo.o
In file included from ../inst/include/armadillo:42,
 from ../inst/include/RcppArmadilloForward.h:37,
 from ../inst/include/RcppArmadillo.h:30,
 from RcppArmadillo.cpp:22:
../inst/include/armadillo_bits/compiler_setup.hpp:119:6: error: #error *** Need a newer compiler ***
/illumina/software/PROG/R302/lib64/R/library/Rcpp/include/Rcpp/Date.h: In member function ‘void Rcpp::Date::update_tm()’:
/illumina/software/PROG/R302/lib64/R/library/Rcpp/include/Rcpp/Date.h:108: warning: converting to ‘time_t’ from ‘double’
/illumina/software/PROG/R302/lib64/R/library/Rcpp/include/Rcpp/Date.h: In function ‘Rcpp::Date Rcpp::operator+(const Rcpp::Date, int)’:
/illumina/software/PROG/R302/lib64/R/library/Rcpp/include/Rcpp/Date.h:139: warning: converting to ‘time_t’ from ‘double’
make: *** [RcppArmadillo.o] Error 1
ERROR: compilation failed for package ‘RcppArmadillo’
* removing ‘/illumina/software/PROG/R302/lib64/R/library/RcppArmadillo’
* restoring previous ‘/illumina/software/PROG/R302/lib64/R/library/RcppArmadillo’

The downloaded source packages are in
  ‘/tmp/Rtmp30ZzKT/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning messages:
1: In install.packages(pkgs = pkgs, lib = lib, repos = repos, ...) :
  installation of package ‘RcppArmadillo’ had non-zero exit status
2: In install.packages(pkgs = pkgs, lib = lib, repos = repos, ...) :
  installation of package ‘DESeq2’ had non-zero exit status
3: In install.packages(update[instlib == l, Package], l, contriburl

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
 base

other attached packages:
[1] BiocInstaller_1.12.1

loaded via a namespace (and not attached):
[1] tools_3.0.
Any Idea how can resolve?








 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] change names(assays(SummarizedExperiment)) w/o copy?

2014-05-07 Thread Michael Love
hi,

Is there a way that I can change the names of the assays slot of a
SummarizedExperiment, without making a new copy of the data contained
within? Assume I get an SE which has already been constructed, but no
names on the assays() SimpleList.

thanks,

Mike

 library(GenomicRanges)
  gc()
   used (Mb) gc trigger (Mb) max used (Mb)
 Ncells 1291106   691710298 91.4  1590760 85.0
 Vcells 117861991925843 14.7  1724123 13.2
  m <- matrix(1:2e7, ncol=10)
  gc()
used (Mb) gc trigger  (Mb) max used  (Mb)
 Ncells  129 69.01967602 105.1  1590760  85.0
 Vcells 11178604 85.3   22482701 171.6 21178631 161.6

# made a ~75 Mb matrix

  colnames(m) <- letters[1:10]
  gc()
used (Mb) gc trigger  (Mb) max used  (Mb)
 Ncells  1291149 69.01967602 105.1  1590760  85.0
 Vcells 11178679 85.3   22482701 171.6 21179851 161.6
  se <- SummarizedExperiment(m)
  gc()
used (Mb) gc trigger  (Mb) max used  (Mb)
 Ncells  1302603 69.61967602 105.1  1623929  86.8
 Vcells 12189777 93.1   22482701 171.6 21179851 161.6

# so far no copying

  names(assays(se)) <- "counts"
  gc()
used  (Mb) gc trigger  (Mb) max used  (Mb)
 Ncells  1303174  69.61967602 105.1  1623929  86.8
 Vcells 22190847 169.4   23686836 180.8 22203423 169.4

# last step made a copy

 sessionInfo()
 R Under development (unstable) (2014-05-07 r65539)
 Platform: x86_64-apple-darwin12.5.0 (64-bit)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] parallel  stats graphics  grDevices utils datasets  methods
 [8] base

 other attached packages:
 [1] GenomicRanges_1.17.12 GenomeInfoDb_1.1.3IRanges_1.99.13
 [4] S4Vectors_0.0.6   BiocGenerics_0.11.2

 loaded via a namespace (and not attached):
 [1] RCurl_1.95-4.1 stats4_3.2.0   XVector_0.5.6

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] 'droplevels' argument in `[` method for SummarizedExperiment?

2014-03-12 Thread Michael Love
At the least, I will add the use of droplevels on colData to the vignette.
On Mar 12, 2014 7:28 PM, Steve Lianoglou lianoglou.st...@gene.com wrote:

 On Wed, Mar 12, 2014 at 3:52 PM, Michael Lawrence
 lawrence.mich...@gene.com wrote:
  On Wed, Mar 12, 2014 at 3:45 PM, Martin Morgan mtmor...@fhcrc.org
 wrote:
 
  On 03/12/2014 03:02 PM, Wolfgang Huber wrote:
 
  Hi Martin, Mike
 
  a DESeq2 user brought up the observation that when he subsets a
  'DESeqDataSet' object (the class inherits from 'SummarizedExperiment')
 by
  samples, he often ends up with unused factor levels in the colData.
 (Esp.
  since the subsetting is often to select certain subgroups). Would
 either of
  the following two make sense:
 
  - a 'droplevels' method for 'SummarizedExperiment' that efficiently and
  conveniently removes unused levels, i.e.
    x = x[, x$tissue %in% c("guts", "brains")]
    x = droplevels(x)
 
 
  vs. x$tissue = droplevels(x$tissue)

 Or do:

 colData(x) <- droplevels(colData(x))

  there are a surprising number of places where levels could be dropped --
  each column of colData, each column of (possibly two levels of) 'mcols' on
  the row data, and the seqlevels of the row data.

 Perhaps true; however, in Wolfgang's case (the DESeqDataSet) I think
 most people would want that to work over the colData of the object.

 In which case, perhaps DESeq2 should just define
 droplevels,DESeqDataSet to work over the colData of the dds to
 enable that convenience.
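
(A hypothetical sketch of such a convenience method -- not part of DESeq2 --
assuming droplevels() works on a DataFrame as suggested above:)

  setGeneric("droplevels")   # promote the base function to an S4 generic
  setMethod("droplevels", "DESeqDataSet", function(x, ...) {
    colData(x) <- droplevels(colData(x))  # drop unused levels column-wise
    x
  })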

 -steve

 --
 Steve Lianoglou
 Computational Biologist
 Genentech

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] BigWigViews

2013-11-18 Thread Michael Love
a discussion came up on devel last year about looking at a genomic range
over multiple samples and multiple experiments (
https://stat.ethz.ch/pipermail/bioc-devel/attachments/20120920/93a4fb61/attachment.pl
 )

Setting aside the multiple-experiment part, I'm interested in
BigWigViews() with fixed ranges across samples. Has there been any more
thought in this direction?

BigWigViews would be incredibly useful for genomics applications where we
want to scan along the genome looking at lots of samples. BigWig offers a
concise representation of the information compared to BAM files.

What I am trying now is using import(BigWigFile, which=gr) on files one by
one, and then binding the coverage together.
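
(A minimal sketch of that one-file-at-a-time approach, assuming rtracklayer
and a character vector 'files' of BigWig paths, which is hypothetical here:)

  library(rtracklayer)
  gr <- GRanges("chr1", IRanges(1e6, 2e6))
  covs <- lapply(files, function(f)
    import(BigWigFile(f), which=gr, as="NumericList"))
  ## per-base coverage over 'gr', one column per sample
  mat <- do.call(cbind, lapply(covs, function(x) as.numeric(x[[1]])))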

best,

Mike

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BigWigViews

2013-11-18 Thread Michael Love
I'm happy to contribute as well.

We will send something along.

Best,

Mike
On Nov 18, 2013 8:09 PM, Kasper Daniel Hansen 
kasperdanielhan...@gmail.com wrote:

 tileGenome?

 Michael, making us do a prototype in R is a very reasonable request.  We
 should do that.

 Best,
 Kasper


 On Mon, Nov 18, 2013 at 7:45 PM, Tim Triche, Jr. tim.tri...@gmail.com
 wrote:

  Doesn't tileGenome or whatever it's called help with the binning?  It's
  not too hard to bolt multiple tracks into a SummarizedExperiment at that
  point.
 
  --t
 
   On Nov 18, 2013, at 4:33 PM, Kasper Daniel Hansen 
  kasperdanielhan...@gmail.com wrote:
  
   (Michael Love and I had some discussion on this Friday)
  
   I also think it would be a very convenient class/method.  A lot of data
   these days are naturally represented (and are available from say GEO)
 as
   bigWig files (essentially coverage tracks), for example ChIP-seq.  This
   would be much more efficient than converting BAM to coverage on the
 fly.
  
   It seems to me that bigWig ought to be efficient for this, but I am not
   very familiar with its performance.  What we want is really to be able
 to
   chunk multiple coverage profiles over the genome, and do computations
 on
   each of the chunks.  Any idea on efficiency?  I am happy to contribute
 a
   bit, at least with design.
  
   Best,
   Kasper
  
  
   On Mon, Nov 18, 2013 at 6:11 PM, Michael Lawrence 
  lawrence.mich...@gene.com
   wrote:
  
   Aggregating coverage over multiple samples is a popular request
  recently.
    I'm happy to support this effort, but I think someone in Seattle is
    going to have to take the lead on it.
  
  
   On Mon, Nov 18, 2013 at 2:36 PM, Michael Love
   michaelisaiahl...@gmail.comwrote:
  
   a discussion came up on devel last year about looking at a genomic
  range
   over multiple samples and multiple experiments (
  
 
 https://stat.ethz.ch/pipermail/bioc-devel/attachments/20120920/93a4fb61/attachment.pl
   )
  
   stepping aside the multiple experiment part, I'm interested in
   BigWigViews() with fixed ranges across samples. Has there been any
 more
   thoughts in this direction?
  
   BigWigViews would be incredibly useful for genomics applications
 where
  we
   want to scan along the genome looking at lots of samples. BigWig
  offers a
   concise representation of the information compared to BAM files.
  
   What I am trying now is using import(BigWigFile, which=gr) on files
 one
   by
   one, and then binding the coverage together.
  
   best,
  
   Mike
  
  [[alternative HTML version deleted]]
  
   ___
   Bioc-devel@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/bioc-devel
  
  [[alternative HTML version deleted]]
  
   ___
   Bioc-devel@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/bioc-devel
  
  [[alternative HTML version deleted]]
  
   ___
   Bioc-devel@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/bioc-devel
 

 [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] renameModelMatrixColumns mishap + patch for DESeq2

2013-07-15 Thread Michael Love
hi Steve,

I have fixed this bug in versions 1.0.18 and 1.1.22, and I added a unit
test for such a design formula in the devel version.

best,

Mike

On Wed, Jul 10, 2013 at 8:52 AM, Michael Love
michaelisaiahl...@gmail.com wrote:
 Hi Steve,

 Thanks for pointing out this bug in the renaming code. I will fix this as
 soon as possible.

 Best,

 Mike

 On Jul 10, 2013 1:00 AM, Steve Lianoglou lianoglou.st...@gene.com wrote:

 Greetings DESeq2ers,

 I've been playing around with different ways to encode a 2x2 factorial
 design I'm working with: 4 cell types, 4 treatments and exploring
 what's cooking in there.

 I thought that the nested interaction formula used in the limma user's
 guide (section 8.5.3) would be an easy way to explore some obvious
 questions, since differential expression from different treatments
 within each cell type (among other things) shakes out quite cleanly
 from the results of running the Wald test, since these end up as
 columns in the design matrix, e.g.:

 ~ cell + cell:treatment

 as long as the control treatment is the first level of the treatment
 factor, we got columns for all cell:treatment effects (e.g.
 cellA:treatment1, cellA:treatment2, ..., cellD:treatment4)

 Yay!

 ... almost ...

 The problem is that the column-renaming code in the fitNbinomGLMs
 function seems to assume that we'll have columns for all main effects
 in the design matrix, so the following line triggers an error:

 modelMatrixNames[match(convertNames$from, modelMatrixNames)] <-
 convertNames$to

 since you will find elements like treatment2, ..., treatment4 among
 the elements in `convertNames$from`. The call to match then returns NA
 for these, and *boom*.

 An easy fix would be to add the following line after the call to
 `renameModelMatrixColumns`:

 convertNames <- subset(convertNames, from %in% modelMatrixNames)

 So the if (renameCols) block (line 1066 in core.R) now becomes:

   if (renameCols) {
     names(modelMatrixNames) <- modelMatrixNames
     convertNames <- renameModelMatrixColumns(modelMatrixNames,
                                              as.data.frame(colData(object)),
                                              modelFormula)
     convertNames <- subset(convertNames, from %in% modelMatrixNames)
     modelMatrixNames[match(convertNames$from, modelMatrixNames)] <- convertNames$to
   }



 --
 Steve Lianoglou
 Computational Biologist
 Bioinformatics and Computational Biology
 Genentech

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] DESeq2 and miscTransformation(s)

2013-04-22 Thread Michael Love
hi Steve,

Thanks for pointing me to this bug. I have fixed this in version 1.0.9. So
now rlogTransformation should accept the given dds just like
varianceStabilizingTransformation.

To avoid making a copy of the dds but still re-estimate dispersions, one
can do:

design(dds) <- ~ 1
mcols(dds)$dispersion <- NULL

then either rlogTransformation or varianceStabilizingTransformation, which
will call estimateDispersions internally.

Mike



On Sun, Apr 21, 2013 at 8:13 PM, Steve Lianoglou 
mailinglist.honey...@gmail.com wrote:

 Hi Michael,

 On Sat, Apr 20, 2013 at 12:18 AM, Michael Love
 michaelisaiahl...@gmail.com wrote:
  hi Steve,
 
  Thanks. Can you send a minimal example of the error? I can provide an
  alternate formula without error:
 
  dds <- makeExampleDESeqDataSet()
  design(dds) <- formula(~ condition)
  rld <- rlogTransformation(dds)

 I'll see your `condition` and raise you a `treatment` :-)

 ===
 R> set.seed(123)
 R> library(DESeq2)
 R> dds <- makeExampleDESeqDataSet()
 R> colData(dds)$treatment <- factor(sample(c("x", "y"), ncol(dds), replace=TRUE))
 R> design(dds) <- formula(~ condition + treatment)

 ## Fails for rlogTransformation
 R> dds.rlog <- rlogTransformation(dds)
 gene-wise dispersion estimates
 mean-dispersion relationship
 final dispersion estimates
 Error in SummarizedExperiment(assays = rlogData(object, samplesVector,  :
   error in evaluating the argument 'assays' in selecting a method for
 function 'SummarizedExperiment': Error in
 modelMatrixNames[match(convertNames$from, modelMatrixNames)] <- convertNames$to :
   NAs are not allowed in subscripted assignments

 ## vst seems OK
 R> dds.vst <- varianceStabilizingTransformation(dds)
 gene-wise dispersion estimates
 mean-dispersion relationship
 final dispersion estimates
 

 I'm using DESeq2_1.0.7 on R-3.0.0

 Thanks,
 -steve

 --
 Steve Lianoglou
 Computational Biologist
 Department of Bioinformatics and Computational Biology
 Genentech


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel