Re: [Bioc-devel] Error in HDF5 - Package submission - Not detected locally

2019-09-16 Thread Peter Hickey
Hi Tiago,

The following will create a similarly sized subset of the PBMC3k
dataset with the counts in-memory as a sparse matrix:

pbmc3k <- TENxPBMCData::TENxPBMCData("pbmc3k")
mini_pbmc3k <- pbmc3k[1:1700, 1:600]
assay(mini_pbmc3k) <- as(assay(mini_pbmc3k), "dgCMatrix")
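
A minimal sketch of why the coercion helps, assuming the HDF5Array and SummarizedExperiment packages are installed. The toy matrix, the temporary file, and the names `toy`, `h5`, `se`, `mini` and `rds` are made up for illustration and have nothing to do with the real PBMC3k data. A subset of an HDF5-backed object is still a DelayedMatrix that refers to the file on disk; after coercion to "dgCMatrix" the values live in the object itself:

```r
library(SummarizedExperiment)
library(HDF5Array)

# Toy counts written to a temporary HDF5 file
# (a stand-in for the file in the ExperimentHub cache)
set.seed(1)
toy <- matrix(as.numeric(rpois(200, lambda = 0.5)), nrow = 20)
h5 <- writeHDF5Array(toy, filepath = tempfile(fileext = ".h5"), name = "counts")
se <- SummarizedExperiment(assays = list(counts = h5))

# Subsetting keeps the assay as a DelayedMatrix pointing at the file
mini <- se[1:5, 1:4]
backed_before <- is(assay(mini), "DelayedMatrix")

# Coerce to an in-memory sparse matrix, as in the snippet above
assay(mini) <- as(assay(mini), "dgCMatrix")
backed_after <- is(assay(mini), "DelayedMatrix")

# The serialized object now carries the counts themselves
rds <- tempfile(fileext = ".rds")
saveRDS(mini, rds)
```

Here `backed_before` should be TRUE and `backed_after` FALSE, so an object saved at this point no longer depends on the HDF5 file being at its original path.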

Cheers,
Pete

On Mon, 16 Sep 2019 at 23:38, Tiago Lubiana Alves
 wrote:
>
> Hello Mike,
>
> Thank you for the detailed explanation.
>
> You are right: for the vignette, I can download the pbmc3k dataset from
> ExperimentHub and subset it in the first few lines. The main point of having
> a new dataset was to use it in the examples of functions. The full dataset
> is too big and the examples would take too long otherwise.
>
> Still, if someone knows how to "transform from an HDF5-backed
> SingleCellExperiment to using the data in memory", it would be of great
> help.
> Best,
> Tiago
>
>
> Tiago Lubiana
> Mestrando em Bioinformática, FCF/USP
> *Computational Systems Biology Laboratory (CSBL)*
> Telefone (laboratório): +55 (11) 2648-0240
> Telefone (pessoal): +55 (11) 954258000
>
>
> On Mon, Sep 16, 2019 at 5:39 AM Mike Smith  wrote:
>
> > Hi Tiago,
> >
> > I suspect what has happened here is that when you create the mini_pbmc3k
> > object, you're doing this by subsetting the PBMC 3k scRNA-seq data from
> > ExperimentHub. The assay data for that are actually stored in an HDF5
> > file which will be downloaded and stored in your ExperimentHub cache on
> > your local machine. When you take the subset for your new object it still
> > points to the same HDF5 file, and when you save mini_pbmc3k all it actually
> > saves is the location of the HDF5 file rather than putting the data in the
> > object. This works locally, but naturally the BioC build server doesn't
> > have the original HDF5 file in exactly the same location, and so it fails.
> >
> > Is there a reason you don't want to use the whole dataset and make the
> > first section of the vignette demonstrate how to download it via
> > ExperimentHub? Alternatively there is a way to transform from an
> > HDF5-backed SingleCellExperiment to using the data in memory, but I can't
> > remember it right now - hopefully someone else will be along shortly.
> >
> > This also seems like the perfect opportunity to point out that our next
> > BioC Developers' Forum (
> > https://stat.ethz.ch/pipermail/bioc-devel/2019-September/015499.html) will
> > have a discussion on object serialisation which should cover exactly this
> > type of issue.
> >
> > Best,
> > Mike
> >
> >
> >
> > On Sat, 14 Sep 2019 at 23:26, Tiago Lubiana Alves <
> > tiago.lubiana.al...@usp.br> wrote:
> >
> >> Hello,
> >>
> >> I am having a problem with a package submission build.
> >>
> >> This is the package issue:
> >> https://github.com/Bioconductor/Contributions/issues/1241
> >> And this is the ERROR:
> >>
> >> HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
> >>   #000: C:/hdf5_build/CMake-hdf5-1.10.5/hdf5-1.10.5/src/H5F.c line 509
> >> in H5Fopen(): unable to open file
> >> major: File accessibilty
> >> minor: Unable to open file
> >>   #001: C:/hdf5_build/CMake-hdf5-1.10.5/hdf5-1.10.5/src/H5Fint.c line
> >> 1498 in H5F_open(): unable to open file: time = Fri Sep 13 15:05:35
> >> 2019
> >> , name = '/home/lubianat/.cache/ExperimentHub/5486ffbe0e3_1605',
> >> tent_flags = 0
> >> (...)
> >>
> >> Quitting from lines 46-51 (fcoex.Rmd)
> >> Error: processing vignette 'fcoex.Rmd' failed with diagnostics:
> >> failed to open file '/home/lubianat/.cache/ExperimentHub/5486ffbe0e3_1605'
> >> --- failed re-building 'fcoex.Rmd'
> >>
> >>
> >> It looks like something is pointing in the wrong direction, but I have not
> >> been able to figure out exactly what. I've tried re-saving the data file
> >> ("mini_pbmc3k"), which is loaded in the vignette chunk of the error, but
> >> that did not help.
> >>
> >> Does anyone have a suggestion of what might be happening?
> >>
> >> Thank you very much for your time,
> >> Best,
> >> Tiago
> >>
> >> ___
> >> Bioc-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
>


Re: [Bioc-devel] Bioconductor Developers Forum

2019-08-08 Thread Peter Hickey
Great initiative, Mike!
Could these meetings please be recorded?
These will be in the middle of the night in my time zone and I'm sure
others would appreciate being able to view them if they can't make the call.
rOpenSci do something similar with their community calls (
https://ropensci.org/commcalls/).

Cheers,
Pete

On Thu., 8 Aug. 2019, 6:31 pm Mike Smith,  wrote:

> Dear all,
>
> I am excited to announce a new initiative within the Bioconductor project -
> the Bioconductor Developers' Forum.  This monthly teleconference is
> intended as a platform for Bioconductor developers to describe existing
> software infrastructure to other members of the BioC community, to present
> plans for future developments, and discuss changes that may impact other
> developers or software tools within Bioconductor.
>
> The intended audience is anyone interested in software development and
> infrastructure, whether you're a member of the BioC core team with
> responsibility for multiple packages, or you're just getting started with
> creating a Bioconductor package.
>
> Our first meeting will take place on Thursday 15th August at 09:00 PDT/
> 12:00 EDT / 18:00 CEST using BlueJeans and can be joined via:
>
> https://bluejeans.com/136043474?src=join_info (Meeting ID: 136 043 474)
>
> More details on the intentions for this initiative, including a list of
> proposed topics, can be found at:
>
>
> https://www.huber.embl.de/users/msmith/Bioconductor-Developers-Forum-Proposal.pdf
>
>
> The agenda for the first meeting is still open, so if you have a proposal
> or a particular topic you wish to prioritise please reach out to me.
>
> Best wishes,
>
> Mike
>


Re: [Bioc-devel] any interest in a BiocMatrix core package?

2018-05-21 Thread Peter Hickey
A belated follow-up on this thread.

I've created a minimal package and GitHub repo at
https://github.com/Bioconductor/MatrixGenerics; might I suggest we
move the discussion there for the time being?

I've created some issues already to discuss the main points. These
would really benefit from input by experts on S4 and the methods
package, as well as anyone invested in the original subject of the
thread.
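
To make the idea concrete, here is a minimal sketch of an S4 generic for a row summary, using only the methods package that ships with R. The generic `rowSpread` and the class `ToyMatrix` are hypothetical names chosen for illustration; they are not part of MatrixGenerics, matrixStats, or any other package, and the real generics may well use a different signature.

```r
library(methods)

# Hypothetical generic; name and signature are illustrative only
setGeneric("rowSpread", function(x, ...) standardGeneric("rowSpread"))

# Method for ordinary matrices: plain base R computation
setMethod("rowSpread", "matrix", function(x, ...) {
    apply(x, 1, function(r) max(r) - min(r))
})

# Toy container class standing in for a non-base matrix-like object
setClass("ToyMatrix", representation(data = "matrix"))

# Method for the toy class: unwrap the data and reuse the matrix method
setMethod("rowSpread", "ToyMatrix", function(x, ...) {
    rowSpread(x@data, ...)
})

m <- matrix(c(1, 5, 2, 8, 3, 3), nrow = 2)
spread_matrix <- rowSpread(m)
spread_toy <- rowSpread(new("ToyMatrix", data = m))
```

Both calls go through the same generic, so code written against `rowSpread()` does not need to know which matrix-like class it was given. The cost is the dispatch overhead that matrixStats deliberately avoids by using plain functions.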

Cheers,
Pete

On 4 November 2017 at 19:33, Henrik Bengtsson
<henrik.bengts...@gmail.com> wrote:
> As Peter points out, the 'matrixStats' package provides an API with
> plain functions - not generic functions.  This is intentional; the
> main purpose is to keep the overhead at an absolute minimum.
> This is also in line with the overall philosophy of 'matrixStats'
> where speed is maximized and memory usage is minimized to the point
> where you cannot do much better if you'd use native code.   The user
> should be able to call the same matrixStats function thousands of
> times even on rather small matrices without getting killed by overhead
> due to dispatching or internal copies, e.g. [toy example] resampling
> 'cols' B=10,000 times in calls such as `matrixStats::rowMeans2(X, cols
> = cols)`.  You can find extensive benchmark reports at
> https://github.com/HenrikBengtsson/matrixStats/wiki/Benchmark-reports.
>
> From my perspective, the role of 'matrixStats' in a software stack is
> a rather low-level role where it can serve higher-level APIs that
> either replicate its API or reuse it internally, e.g. those that
> dispatch on S3 and S4 etc.  Peter's 'DelayedMatrixStats' is one
> example.
>
> On Thu, Nov 2, 2017 at 2:00 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:
> [...]
>> Honestly, I (as co-maintainer of Matrix, principal maintainer
>>  for several years now)
>> had been a bit surprised and frustrated that the 'matrixStats'
>> initiative had started w/o any contact with the Matrix package
>> maintainers and initially has not ever tried to use Matrix
>> package classes or functionality
>> (and this is still the case now AFAICS).
>
> Oh no, I'm sorry that I/we've caused frustration with 'matrixStats'.
> I'm not sure I understand though - the overlap in API and
> functionality between 'matrixStats' and 'Matrix' is basically zero(?).
> I think of 'Matrix' as a higher-level package.  Do my comments above put
> it in a different light?  Or are you saying that what's in
> 'matrixStats' should really have been in 'Matrix'?
>
> All the best,
>
> Henrik
>
> On Fri, Nov 3, 2017 at 7:16 AM, Martin Morgan
> <martin.mor...@roswellpark.org> wrote:
>> On 11/02/2017 06:20 PM, Peter Hickey wrote:
>>>
>>> As Michael notes, I think the scope here is broader than considering S4
>>> generics for functions in base R. To summarise, I think we would be
>>> looking
>>> to have S4 generics for the following:
>>>
>>> - All(?) the row*/col* functions in matrixStats (NB: matrixStats uses
>>> plain
>>> old functions with no S3 or S4, which I believe was to avoid any overhead
>>> of method dispatch since it is explicitly targeting ordinary matrix
>>> objects
>>> as input)
>>> - Potentially new row*/col* summaries (i.e. that don't currently exist in
>>> matrixStats)
>>> - Perhaps moving from BiocGenerics the S4 generics defined in
>>> R/matrix-summary.R?
>>> - Perhaps apply() (E.g., DelayedArray defines an S4 generic for this)
>>>
>>> Having these as part of base R or in a recommended package would be
>>> great,
>>> but of course comes with its own challenges. The alternative is a
>>> lightweight package, likely better hosted on CRAN than BioC to assist with
>>> wider adoption and integration with Matrix, matrixStats, and other
>>> non-BioC
>>> packages.
>>>
>>> As Michael notes, getting the generic signature 'right' will be important
>>> and there are undoubtedly other challenges ahead (I've started a TODO).
>>>
>>> Might Bioconductor open up a GitHub repo (MatrixGenerics?) where this can
>>> be discussed with accompanying code? I've made the skeleton of a
>>> MatrixGenerics package that I could upload to kick things off, along with
>>> adding my TODOs as Issues on GitHub for further discussion.
>>
>>
>> I did start this repository as a place to develop more concrete ideas; I
>> think that a Bioconductor MatrixGenerics solution would not be optimal, so I
>> think of this repository as a place to develop ideas rather than a precursor
>> to an actual package.
>>
>> I invited Pete as a 

Re: [Bioc-devel] bsseqData

2018-05-02 Thread Peter Hickey
Successfully pushed to both devel and release. Thanks, Nitesh!

On Wed., 2 May 2018, 1:35 pm Turaga, Nitesh, <nitesh.tur...@roswellpark.org>
wrote:

> Try one more time please.
>
> I made some changes after seeing your output.
>
> Best,
>
> Nitesh
>
> > On May 2, 2018, at 1:21 PM, Peter Hickey <peter.hic...@gmail.com> wrote:
> >
> > Hi Nitesh,
> >
> > I did a fresh clone, updated the package, and committed the changes,
> > but am still unable to push to the bioc git repo. Requested output
> > below.
> >
> > $ git clone g...@git.bioconductor.org:packages/bsseqData
> > # Make updates and commit
> > $ git push origin master
> > Counting objects: 7, done.
> > Delta compression using up to 4 threads.
> > Compressing objects: 100% (7/7), done.
> > Writing objects: 100% (7/7), 67.52 MiB | 1.35 MiB/s, done.
> > Total 7 (delta 2), reused 0 (delta 0)
> > remote: Error: file larger than 5 Mb.
> > remote:
> > remote: File name: 'data/BS.cancer.ex.fit.rda'
> > remote: File size: 41.5 Mb
> > remote:
> > remote: Please see Biocondcutor guidelines
> > remote: https://bioconductor.org/developers/package-guidelines/
> > remote:
> > To git.bioconductor.org:packages/bsseqData
> > ! [remote rejected] master -> master (pre-receive hook declined)
> > error: failed to push some refs to 'g...@git.bioconductor.org:packages/bsseqData'
> >
> > $ git remote -v
> > origin g...@git.bioconductor.org:packages/bsseqData (fetch)
> > origin g...@git.bioconductor.org:packages/bsseqData (push)
> >
> > $ tree .git/hooks/
> > .git/hooks/
> > ├── applypatch-msg.sample
> > ├── commit-msg.sample
> > ├── post-update.sample
> > ├── pre-applypatch.sample
> > ├── pre-commit.sample
> > ├── pre-push.sample
> > ├── pre-rebase.sample
> > ├── pre-receive.sample
> > ├── prepare-commit-msg.sample
> > └── update.sample
> >
> > $ git branch -a
> > * master
> >  remotes/origin/HEAD -> origin/master
> >  remotes/origin/RELEASE_2_13
> >  remotes/origin/RELEASE_2_14
> >  remotes/origin/RELEASE_3_0
> >  remotes/origin/RELEASE_3_1
> >  remotes/origin/RELEASE_3_2
> >  remotes/origin/RELEASE_3_3
> >  remotes/origin/RELEASE_3_4
> >  remotes/origin/RELEASE_3_5
> >  remotes/origin/RELEASE_3_6
> >  remotes/origin/RELEASE_3_7
> >  remotes/origin/master
> >
> > $ git log
> > commit 5eb4c1ebc03fb1d713e873a77570744b94dc7659
> > Author: Peter Hickey <peter.hic...@gmail.com>
> > Date:   Wed May 2 13:14:24 2018 -0400
> >
> >Run updateObject() on serialized objects
> >
> > commit 153db702379b926743f26012a34c8076b37f91f9
> > Author: Nitesh Turaga <nitesh.tur...@gmail.com>
> > Date:   Mon Apr 30 10:34:47 2018 -0400
> >
> >bump x.y.z versions to odd y after creation of RELEASE_3_7 branch
> >
> > commit e02c0503b1f75b5ec4caffd055645d3b060d2af9
> > Author: Nitesh Turaga <nitesh.tur...@gmail.com>
> > Date:   Mon Apr 30 10:31:27 2018 -0400
> >
> >bump x.y.z versions to even y prior to creation of RELEASE_3_7 branch
> >
> > commit 3f87e4d027264396bdeeeb87baa68fb3b6662b46
> > Author: Peter Hickey <peter.hic...@gmail.com>
> > Date:   Sun Apr 29 12:31:34 2018 -0400
> >
> >Bump minimum version of bsseq
> >
> > # git log output continues
> >
> >
> > On Wed, 2 May 2018 at 11:36 Turaga, Nitesh
> > <nitesh.tur...@roswellpark.org> wrote:
> >>
> >> Hi Pete,
> >>
> >> Can you try once again please? And this time send me the following
> >> information if it doesn’t work.
> >>
> >>git remote -v
> >>
> >> and
> >>
> >>ls -r (or tree) .git/hooks/
> >>
> >> and
> >>
> >>git branch -a
> >>
> >>
> >> Thanks,
> >>
> >> Nitesh
> >>
> >>
> >>> On Apr 29, 2018, at 11:22 PM, Peter Hickey <peter.hic...@gmail.com> wrote:
> >>>
> >>> Sure thing, we'll wait. Thanks, Val
> >>>
> >>> On Sun., 29 Apr. 2018, 9:01 pm Obenchain, Valerie,
> >>> <valerie.obench...@roswellpark.org> wrote:
> >>> Hi guys,
> >>>
> >>> I'm not sure if this got resolved. If it didn't, I'd recommend waiting
> >>> until after the 3.7 branching tomorrow.
> >>>
> >>> Thanks.
> >>> Val
> >>>
> >>>
> >>>
> >>> On 04/29/20

Re: [Bioc-devel] bsseqData

2018-05-02 Thread Peter Hickey
Hi Nitesh,

I did a fresh clone, updated the package, and committed the changes,
but am still unable to push to the bioc git repo. Requested output
below.

$ git clone g...@git.bioconductor.org:packages/bsseqData
# Make updates and commit
$ git push origin master
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 67.52 MiB | 1.35 MiB/s, done.
Total 7 (delta 2), reused 0 (delta 0)
remote: Error: file larger than 5 Mb.
remote:
remote: File name: 'data/BS.cancer.ex.fit.rda'
remote: File size: 41.5 Mb
remote:
remote: Please see Biocondcutor guidelines
remote: https://bioconductor.org/developers/package-guidelines/
remote:
To git.bioconductor.org:packages/bsseqData
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 
'g...@git.bioconductor.org:packages/bsseqData'

$ git remote -v
origin g...@git.bioconductor.org:packages/bsseqData (fetch)
origin g...@git.bioconductor.org:packages/bsseqData (push)

$ tree .git/hooks/
.git/hooks/
├── applypatch-msg.sample
├── commit-msg.sample
├── post-update.sample
├── pre-applypatch.sample
├── pre-commit.sample
├── pre-push.sample
├── pre-rebase.sample
├── pre-receive.sample
├── prepare-commit-msg.sample
└── update.sample

$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/RELEASE_2_13
  remotes/origin/RELEASE_2_14
  remotes/origin/RELEASE_3_0
  remotes/origin/RELEASE_3_1
  remotes/origin/RELEASE_3_2
  remotes/origin/RELEASE_3_3
  remotes/origin/RELEASE_3_4
  remotes/origin/RELEASE_3_5
  remotes/origin/RELEASE_3_6
  remotes/origin/RELEASE_3_7
  remotes/origin/master

$ git log
commit 5eb4c1ebc03fb1d713e873a77570744b94dc7659
Author: Peter Hickey <peter.hic...@gmail.com>
Date:   Wed May 2 13:14:24 2018 -0400

Run updateObject() on serialized objects

commit 153db702379b926743f26012a34c8076b37f91f9
Author: Nitesh Turaga <nitesh.tur...@gmail.com>
Date:   Mon Apr 30 10:34:47 2018 -0400

bump x.y.z versions to odd y after creation of RELEASE_3_7 branch

commit e02c0503b1f75b5ec4caffd055645d3b060d2af9
Author: Nitesh Turaga <nitesh.tur...@gmail.com>
Date:   Mon Apr 30 10:31:27 2018 -0400

bump x.y.z versions to even y prior to creation of RELEASE_3_7 branch

commit 3f87e4d027264396bdeeeb87baa68fb3b6662b46
Author: Peter Hickey <peter.hic...@gmail.com>
Date:   Sun Apr 29 12:31:34 2018 -0400

Bump minimum version of bsseq

# git log output continues


On Wed, 2 May 2018 at 11:36 Turaga, Nitesh
<nitesh.tur...@roswellpark.org> wrote:
>
> Hi Pete,
>
> Can you try once again please? And this time send me the following 
> information if it doesn’t work.
>
> git remote -v
>
> and
>
> ls -r (or tree) .git/hooks/
>
> and
>
> git branch -a
>
>
> Thanks,
>
> Nitesh
>
>
> > On Apr 29, 2018, at 11:22 PM, Peter Hickey <peter.hic...@gmail.com> wrote:
> >
> > Sure thing, we'll wait. Thanks, Val
> >
> > On Sun., 29 Apr. 2018, 9:01 pm Obenchain, Valerie, 
> > <valerie.obench...@roswellpark.org> wrote:
> > Hi guys,
> >
> > I'm not sure if this got resolved. If it didn't, I'd recommend waiting 
> > until after the 3.7 branching tomorrow.
> >
> > Thanks.
> > Val
> >
> >
> >
> > On 04/29/2018 09:37 AM, Peter Hickey wrote:
> >> I'm still unable to push large files after a fresh clone. I *am* able to 
> >> push smaller changes (I just tweaked the DESCRIPTION to test this). But I 
> >> get the "Error: file larger than 5 Mb" error when I try to update larger 
> >> objects (e.g., BS.cancer.ex.fit.rda which is 40.8 Mb).
> >>
> >> On Sun, 29 Apr 2018 at 11:43 Turaga, Nitesh 
> >> <nitesh.tur...@roswellpark.org> wrote:
> >> Can you try with a fresh clone of the repo?
> >>
> >> Best,
> >>
> >> Nitesh
> >>
> >> > On Apr 29, 2018, at 10:21 AM, Peter Hickey <peter.hic...@gmail.com> 
> >> > wrote:
> >> >
> >> > Nitesh, I am still getting the "Error: file larger than 5 Mb" error.
> >> >
> >> > On Sun, 29 Apr 2018 at 09:59 Turaga, Nitesh 
> >> > <nitesh.tur...@roswellpark.org> wrote:
> >> > Hi Pete,
> >> >
> >> > This should be resolved now.
> >> >
> >> > Best,
> >> >
> >> > Nitesh
> >> > > On Apr 29, 2018, at 9:50 AM, Peter Hickey <peter.hic...@gmail.com> 
> >> > > wrote:
> >> > >
> >> > > Thanks, Val. I'm getting an error about "too large files" when I try to
> >> > > push (see be

Re: [Bioc-devel] Virtual class for `matrix` and `DelayedArray`? (or better strategy for dealing with them both)

2018-04-30 Thread Peter Hickey
Tim: As the developer of DelayedMatrixStats (and enthusiastic 'canary down
the coal mine' user-dev of DelayedArray) I'm obviously invested in reducing
the confusion around these packages.

I'm going to write some blog posts-cum-vignettes-cum-F1000 around these
issues over the coming weeks, with the ultimate goal of improving the
packages themselves.

Pete


On Mon., 30 Apr. 2018, 12:11 pm Tim Triche, Jr., 
wrote:

> But if you merge methods like that, the error method can be that much more
> difficult to identify. It took a couple of weeks to chase that bug down
> properly, and it ended up down to rowMeans2 vs rowMeans.
>
> I suppose the merged/abstracted method allows one to centralize any such
> dispatch into one place and swap out ill-behaved methods once identified,
> so as long as DelayedArray/DelayedMatrixStats quirks are
> documented/understood, maybe it is better to create this union class?
>
> The Matrix/matrixStats/DelayedMatrix/DelayedMatrixStats situation has been
> "interesting" in practical terms, as seemingly simple abstractions appear
> to require more thought. That was my only point.
>
>
> --t
>
> On Mon, Apr 30, 2018 at 11:28 AM, Martin Morgan <
> martin.mor...@roswellpark.org> wrote:
>
> > But that issue will be fixed, so Tim's advice is inappropriate.
> >
> >
> > On 04/30/2018 10:42 AM, Tim Triche, Jr. wrote:
> >
> >> Don't do that.  Seriously, just don't.
> >>
> >> https://github.com/Bioconductor/DelayedArray/issues/16
> >>
> >> --t
> >>
> >> On Mon, Apr 30, 2018 at 10:02 AM, Elizabeth Purdom <
> >> epur...@stat.berkeley.edu> wrote:
> >>
> >>> Hello,
> >>>
> >>> I am trying to extend my package to handle the `HDF5Matrix` class (or more
> >>> generally `DelayedArray`). I currently have S4 functions for the `matrix`
> >>> class. Usually I have a method for `SummarizedExperiment`, which will call
> >>> the method on `assay(x)`, and I want the method to be able to deal with
> >>> the case where `assay(x)` is a `DelayedArray`.
> >>>
> >>> Most of my functions, however, do not require separate code depending on
> >>> whether `x` is a `matrix` or `DelayedArray`. They make use of existing
> >>> functions that will make that choice for me, e.g. rowMeans or subsetting.
> >>> My goal right now is compatibility, not cleverness, and I'm not creating
> >>> HDF5 methods to handle other cases. (If something doesn't currently exist,
> >>> then I just enclose `x` with `data.matrix` or `as.matrix` and call the
> >>> matrix into memory. For cleanliness and ease in updating with appropriate
> >>> methods in future, I could make separate S4 functions for these specific
> >>> tasks to dispatch, but that's outside of the scope of my question.) So for
> >>> simplicity assume I don't really need to dispatch *my code* -- that the
> >>> methods I'm going to use do that.
> >>>
> >>> The natural solution for me seems to be to use `setClassUnion`, and I was
> >>> wondering if such a virtual class already exists? Or is there a better way
> >>> to handle this?
> >>>
> >>> Here's a simple example, using `rowMeans` as my example:
> >>>
> >>> ```
> >>> setGeneric("myNewRowMeans", function(x, ...) {
> >>>     standardGeneric("myNewRowMeans")
> >>> })
> >>> setClassUnion("matrixOrDelayed", members = c("matrix", "DelayedArray"))
> >>>
> >>> #' @importFrom DelayedArray rowMeans
> >>> setMethod("myNewRowMeans",
> >>>     signature = "matrixOrDelayed",
> >>>     definition = function(x, ...) {
> >>>         # a lot of code independent of x
> >>>         print("This is a lot of code shared regardless of class of x\n")
> >>>         # a lot of code that depends on x, but is dispatched by the
> >>>         # functions called
> >>>         out <- rowMeans(x)
> >>>         # a lot of code based on output of out
> >>>         out <- out + 1
> >>>         return(out)
> >>>     }
> >>> )
> >>> ```
> >>>
> >
> > This email message may contain legally privileged and/or confidential
> > information.  If you are not the intended recipient(s), or the employee
> or
> > agent responsible for the delivery of this message to the intended
> > recipient(s), you are hereby notified that any disclosure, copying,
> > distribution, or use of this email message is prohibited.  If you have
> > received this message in error, please notify the sender immediately by
> > e-mail and delete this email message from your computer. Thank you.
> >
>

Re: [Bioc-devel] bsseqData

2018-04-29 Thread Peter Hickey
Sure thing, we'll wait. Thanks, Val

On Sun., 29 Apr. 2018, 9:01 pm Obenchain, Valerie, <
valerie.obench...@roswellpark.org> wrote:

> Hi guys,
>
> I'm not sure if this got resolved. If it didn't, I'd recommend waiting
> until after the 3.7 branching tomorrow.
>
> Thanks.
> Val
>
>
>
> On 04/29/2018 09:37 AM, Peter Hickey wrote:
>
> I'm still unable to push large files after a fresh clone. I *am* able to
> push smaller changes (I just tweaked the DESCRIPTION to test this). But I
> get the "Error: file larger than 5 Mb" error when I try to update larger
> objects (e.g., BS.cancer.ex.fit.rda which is 40.8 Mb).
>
> On Sun, 29 Apr 2018 at 11:43 Turaga, Nitesh <nitesh.tur...@roswellpark.org>
> wrote:
>
>> Can you try with a fresh clone of the repo?
>>
>> Best,
>>
>> Nitesh
>>
>> > On Apr 29, 2018, at 10:21 AM, Peter Hickey <peter.hic...@gmail.com> wrote:
>> >
>> > Nitesh, I am still getting the "Error: file larger than 5 Mb" error.
>> >
>> > On Sun, 29 Apr 2018 at 09:59 Turaga, Nitesh
>> > <nitesh.tur...@roswellpark.org> wrote:
>> > Hi Pete,
>> >
>> > This should be resolved now.
>> >
>> > Best,
>> >
>> > Nitesh
>> > > On Apr 29, 2018, at 9:50 AM, Peter Hickey <peter.hic...@gmail.com> wrote:
>> > >
>> > > Thanks, Val. I'm getting an error about "too large files" when I try to
>> > > push (see below). Is there a different workflow for pushing an experiment
>> > > data package or am I doing something else wrong? bsseqData predates the
>> > > svn -> git transition.
>> > >
>> > > Thanks,
>> > > Pete
>> > >
>> > > $ git push origin master
>> > > Counting objects: 7, done.
>> > > Delta compression using up to 4 threads.
>> > > Compressing objects: 100% (7/7), done.
>> > > Writing objects: 100% (7/7), 66.80 MiB | 693.00 KiB/s, done.
>> > > Total 7 (delta 2), reused 0 (delta 0)
>> > > remote: Error: file larger than 5 Mb.
>> > > remote:
>> > > remote: File name: 'data/BS.cancer.ex.fit.rda'
>> > > remote: File size: 40.8 Mb
>> > > remote:
>> > > remote: Please see Biocondcutor guidelines
>> > > remote: https://bioconductor.org/developers/package-guidelines/
>> > > remote:
>> > > To git.bioconductor.org:packages/bsseqData.git
>> > > ! [remote rejected] master -> master (pre-receive hook declined)
>> > > error: failed to push some refs to
>> > > 'g...@git.bioconductor.org:packages/bsseqData.git'
>> > >
>> > >
>> > > On Sat, 28 Apr 2018 at 19:17 Obenchain, Valerie <
>> > > valerie.obench...@roswellpark.org> wrote:
>> > >
>> > >> I've just given you and Pete access. Please give it a try and let me
>> > >> know if you have problems.
>> > >>
>> > >> Val
>> > >>
>> > >> On 04/28/2018 10:50 AM, Kasper Daniel Hansen wrote:
>> > >>
>> > >> Could Pete and I get write access to bsseqData?
>> > >>
>> > >> Best,
>> > >> Kasper
>> > >>

Re: [Bioc-devel] bsseqData

2018-04-29 Thread Peter Hickey
I'm still unable to push large files after a fresh clone. I *am* able to
push smaller changes (I just tweaked the DESCRIPTION to test this). But I
get the "Error: file larger than 5 Mb" error when I try to update larger
objects (e.g., BS.cancer.ex.fit.rda which is 40.8 Mb).

On Sun, 29 Apr 2018 at 11:43 Turaga, Nitesh <nitesh.tur...@roswellpark.org>
wrote:

> Can you try with a fresh clone of the repo?
>
> Best,
>
> Nitesh
>
> > On Apr 29, 2018, at 10:21 AM, Peter Hickey <peter.hic...@gmail.com> wrote:
> >
> > Nitesh, I am still getting the "Error: file larger than 5 Mb" error.
> >
> > On Sun, 29 Apr 2018 at 09:59 Turaga, Nitesh
> > <nitesh.tur...@roswellpark.org> wrote:
> > Hi Pete,
> >
> > This should be resolved now.
> >
> > Best,
> >
> > Nitesh
> > > On Apr 29, 2018, at 9:50 AM, Peter Hickey <peter.hic...@gmail.com> wrote:
> > >
> > > Thanks, Val. I'm getting an error about "too large files" when I try to
> > > push (see below). Is there a different workflow for pushing an experiment
> > > data package or am I doing something else wrong? bsseqData predates the
> > > svn -> git transition.
> > >
> > > Thanks,
> > > Pete
> > >
> > > $ git push origin master
> > > Counting objects: 7, done.
> > > Delta compression using up to 4 threads.
> > > Compressing objects: 100% (7/7), done.
> > > Writing objects: 100% (7/7), 66.80 MiB | 693.00 KiB/s, done.
> > > Total 7 (delta 2), reused 0 (delta 0)
> > > remote: Error: file larger than 5 Mb.
> > > remote:
> > > remote: File name: 'data/BS.cancer.ex.fit.rda'
> > > remote: File size: 40.8 Mb
> > > remote:
> > > remote: Please see Biocondcutor guidelines
> > > remote: https://bioconductor.org/developers/package-guidelines/
> > > remote:
> > > To git.bioconductor.org:packages/bsseqData.git
> > > ! [remote rejected] master -> master (pre-receive hook declined)
> > > error: failed to push some refs to
> > > 'g...@git.bioconductor.org:packages/bsseqData.git'
> > >
> > >
> > > On Sat, 28 Apr 2018 at 19:17 Obenchain, Valerie <
> > > valerie.obench...@roswellpark.org> wrote:
> > >
> > >> I've just given you and Pete access. Please give it a try and let me
> > >> know if you have problems.
> > >>
> > >> Val
> > >>
> > >> On 04/28/2018 10:50 AM, Kasper Daniel Hansen wrote:
> > >>
> > >> Could Pete and I get write access to bsseqData?
> > >>
> > >> Best,
> > >> Kasper
> > >>


Re: [Bioc-devel] Switch to SSH protocol for git clone instructions on package landing pages?

2018-04-29 Thread Peter Hickey
Ah, thanks both Joris and Nitesh. I didn't appreciate that SSH access is
limited to those with a public key registered on the git server.

On Sun, 29 Apr 2018 at 11:50 Turaga, Nitesh <nitesh.tur...@roswellpark.org>
wrote:

> Hi Pete,
>
> For developers there is no reason not to use the SSH protocol. But there
> are many people who’d like to clone the repo and look at it on their local
> machine.
>
> Take, for example, packages that are not maintained on both the
> Bioconductor server and GitHub. For these packages, a good way for users to
> browse the source code on their local machine is to clone via HTTPS,
> since they will not be able to download it via SSH (you need
> permissions to do this).
>
> We always advise developers to use only SSH, though. For everyone else,
> HTTPS is the best option.
>
> Best,
>
> Nitesh
>
> > On Apr 29, 2018, at 10:03 AM, Peter Hickey <peter.hic...@gmail.com> wrote:
> >
> > The one-liner on the package landing page describing how to check out
> > a package from the git repo uses HTTPS rather than ssh, e.g.:
> >
> > # From https://bioconductor.org/packages/bsseq/
> > git clone https://git.bioconductor.org/packages/bsseq
> >
> > However, as a developer we should be using the SSH protocol
> > (https://bioconductor.org/developers/how-to/git/faq/).
> >
> > Is there any reason not to use the SSH protocol (i.e. git clone
> > g...@git.bioconductor.org:packages/bsseq) in the instructions given on
> > the landing page? It seems to me an unnecessary source of friction,
> > particularly for new developers who will end up with the dreaded
> > "fatal: remote error: FATAL: W any packages/myPackage nobody DENIED by
> > fallthru (or you mis-spelled the reponame)" error message if they
> > don't know to switch protocols
> > (https://bioconductor.org/developers/how-to/git/faq/)
> >
> > Cheers,
> > Pete


Re: [Bioc-devel] bsseqData

2018-04-29 Thread Peter Hickey
Nitesh, I am still getting the "Error: file larger than 5 Mb" error.

On Sun, 29 Apr 2018 at 09:59 Turaga, Nitesh <nitesh.tur...@roswellpark.org>
wrote:

> Hi Pete,
>
> This should be resolved now.
>
> Best,
>
> Nitesh
> > On Apr 29, 2018, at 9:50 AM, Peter Hickey <peter.hic...@gmail.com> wrote:
> >
> > Thanks, Val. I'm getting an error about "too large files" when I try to
> > push (see below). Is there a different workflow for pushing an experiment
> > data package or am I doing something else wrong? bsseqData predates the
> > svn -> git transition.
> >
> > Thanks,
> > Pete
> >
> > $ git push origin master
> > Counting objects: 7, done.
> > Delta compression using up to 4 threads.
> > Compressing objects: 100% (7/7), done.
> > Writing objects: 100% (7/7), 66.80 MiB | 693.00 KiB/s, done.
> > Total 7 (delta 2), reused 0 (delta 0)
> > remote: Error: file larger than 5 Mb.
> > remote:
> > remote: File name: 'data/BS.cancer.ex.fit.rda'
> > remote: File size: 40.8 Mb
> > remote:
> > remote: Please see Biocondcutor guidelines
> > remote: https://bioconductor.org/developers/package-guidelines/
> > remote:
> > To git.bioconductor.org:packages/bsseqData.git
> > ! [remote rejected] master -> master (pre-receive hook declined)
> > error: failed to push some refs to
> > 'g...@git.bioconductor.org:packages/bsseqData.git'
> >
> >
> > On Sat, 28 Apr 2018 at 19:17 Obenchain, Valerie <
> > valerie.obench...@roswellpark.org> wrote:
> >
> >> I've just given you and Pete access. Please give it a try and let me
> >> know if you have problems.
> >>
> >> Val
> >>
> >> On 04/28/2018 10:50 AM, Kasper Daniel Hansen wrote:
> >>
> >> Could Pete and I get write access to bsseqData?
> >>
> >> Best,
> >> Kasper


[Bioc-devel] Switch to SSH protocol for git clone instructions on package landing pages?

2018-04-29 Thread Peter Hickey
The one-liner on the package landing page describing how to check out
a package from the git repo uses HTTPS rather than ssh, e.g.:

# From https://bioconductor.org/packages/bsseq/
git clone https://git.bioconductor.org/packages/bsseq

However, as a developer we should be using the SSH protocol
(https://bioconductor.org/developers/how-to/git/faq/).

Is there any reason not to use the SSH protocol (i.e. git clone
g...@git.bioconductor.org:packages/bsseq) in the instructions given on
the landing page? It seems to me an unnecessary source of friction,
particularly for new developers who will end up with the dreaded
"fatal: remote error: FATAL: W any packages/myPackage nobody DENIED by
fallthru (or you mis-spelled the reponame)" error message if they
don't know to switch protocols
(https://bioconductor.org/developers/how-to/git/faq/)

Cheers,
Pete
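
A developer who has already cloned with the HTTPS one-liner could repoint the
existing clone at the SSH form of the URL instead of cloning again. The
following is a minimal shell sketch of that URL rewrite. The function name
https_to_ssh_url, the SSH user "git" and the host example.org are illustrative
assumptions, not values taken from the Bioconductor documentation.

```shell
#!/bin/sh
# Rewrite an HTTPS remote URL as the equivalent scp-style SSH URL.
# Usage: https_to_ssh_url SSH_USER HTTPS_URL
# NOTE: hypothetical helper; SSH_USER and the example host are assumptions.
https_to_ssh_url() {
    ssh_user=$1
    url=$2
    # Accept only URLs of the form https://HOST/PATH.
    case "$url" in
        https://*/*) ;;
        *) echo "not an HTTPS URL with a path: $url" >&2; return 1 ;;
    esac
    rest=${url#https://}   # strip the scheme, leaving HOST/PATH
    host=${rest%%/*}       # everything before the first slash
    path=${rest#*/}        # everything after the first slash
    printf '%s@%s:%s\n' "$ssh_user" "$host" "$path"
}

# Placeholder URL on example.org, not a real package location.
https_to_ssh_url git https://example.org/packages/myPackage
```

The printed URL could then be passed to `git remote set-url origin` inside the
clone. Pushing over SSH still requires write access on the server.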



Re: [Bioc-devel] bsseqData

2018-04-29 Thread Peter Hickey
Thanks, Val. I'm getting an error about "too large files" when I try to
push (see below). Is there a different workflow for pushing an experiment
data package or am I doing something else wrong? bsseqData predates the svn
-> git transition.

Thanks,
Pete

$ git push origin master
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 66.80 MiB | 693.00 KiB/s, done.
Total 7 (delta 2), reused 0 (delta 0)
remote: Error: file larger than 5 Mb.
remote:
remote: File name: 'data/BS.cancer.ex.fit.rda'
remote: File size: 40.8 Mb
remote:
remote: Please see Biocondcutor guidelines
remote: https://bioconductor.org/developers/package-guidelines/
remote:
To git.bioconductor.org:packages/bsseqData.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to
'g...@git.bioconductor.org:packages/bsseqData.git'
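
A rejection like the one above can be anticipated by scanning the working tree
for oversized files before pushing. The following is a minimal shell sketch,
assuming the size limit is given in bytes. The function name list_large_files
is hypothetical, and the server-side hook inspects the pushed commits rather
than the working tree, so this only approximates its check.

```shell
#!/bin/sh
# List regular files under DIR that are larger than LIMIT_BYTES,
# skipping git's own object store under .git/.
# Usage: list_large_files DIR LIMIT_BYTES
# NOTE: hypothetical helper; it looks at the working tree only, not at
# files that exist solely in earlier commits.
list_large_files() {
    dir=$1
    limit_bytes=$2
    # "-size +Nc" matches files whose size exceeds N bytes.
    find "$dir" -type f ! -path '*/.git/*' -size +"${limit_bytes}c"
}
```

For example, `list_large_files . $((5 * 1024 * 1024))` lists files over 5 MiB.
Whether the hook counts "5 Mb" as exactly that many bytes is an assumption.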


On Sat, 28 Apr 2018 at 19:17 Obenchain, Valerie <
valerie.obench...@roswellpark.org> wrote:

> I've just given you and Pete access. Please give it a try and let me know
> if you have problems.
>
> Val
>
> On 04/28/2018 10:50 AM, Kasper Daniel Hansen wrote:
>
> Could Pete and I get write access to bsseqData?
>
> Best,
> Kasper


Re: [Bioc-devel] Problem with saveHDF5SummarizedExperiment in HDF5Array package

2018-04-17 Thread Peter Hickey
Hi Elizabeth,

Aaron and I were hit by this same error message. As a workaround,
installing DelayedArray and HDF5Array from the git(hub) repo fixed the
issue (https://github.com/Bioconductor/HDF5Array/issues/6). But this
fix needs to be propagated to the versions made available via biocLite().

Martin: Might it be possible to trigger a re-build of these 2 packages
on the BioC build machines?

Cheers,
Pete

On 17 April 2018 at 06:14, Elizabeth Purdom  wrote:
> Hello,
>
> When I try to run the example code in the saveHDF5SummarizedExperiment
> function, I get the error "Error: C stack usage  7969416 is too close to the
> limit". I am working with development R and have incorporated HDF5
> functionality in my package. I did so many weeks ago on earlier versions of
> the packages and didn't get this error then, but now my tests are failing,
> etc., since I can't create a basic object.
>
> Perhaps I'm unknowingly using the wrong version, or is there some other
> problem? Otherwise, I expect this is already known by the authors since it's
> their own example, but in that case I am also wondering if I should roll
> back to an earlier version for now, and if so, which one, so that I'm still
> reasonably current?
>
> Thanks,
> Elizabeth Purdom
>
> Following example from the help pages of saveHDF5SummarizedExperiment:
>> library(HDF5Array)
>> library(SummarizedExperiment)
>> nrows <- 200; ncols <- 6
>> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
>> colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
> +  row.names=LETTERS[1:6])
>> se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
> + colData=colData)
>> se0
> class: SummarizedExperiment
> dim: 200 6
> metadata(0):
> assays(1): counts
> rownames: NULL
> rowData names(0):
> colnames(6): A B ... E F
> colData names(1): Treatment
>>
>> ## Save 'se0' as an HDF5-based SummarizedExperiment object:
>> dir <- sub("file", "h5_se0_", tempfile())
>> h5_se0 <- saveHDF5SummarizedExperiment(se0, dir)
> Error: C stack usage  7969416 is too close to the limit
> #only showing part of traceback, because as expected by error, hitting some 
> kind of loop
>> traceback()
> …..
> 28: nrow(x)
> 27: nrow(x)
> 26: dim(x)
> 25: dim(x)
> 24: nrow(x)
> 23: nrow(x)
> 22: dim(x)
> 21: dim(x)
> 20: nrow(x)
> 19: nrow(x)
> 18: dim(assay)
> 17: dim(assay)
> 16: FUN(X[[i]], ...)
> 15: lapply(as.list(X), match.fun(FUN), ...)
> 14: lapply(as.list(X), match.fun(FUN), ...)
> 13: lapply(X = X, FUN = FUN, ...)
> 12: lapply(X = X, FUN = FUN, ...)
> 11: sapply(assays, function(assay) dim(assay)[1:2])
> 10: sapply(assays, function(assay) dim(assay)[1:2])
> 9: valid.func(object)
> 8: validityMethod(as(object, superClass))
> 7: isTRUE(x)
> 6: anyStrings(validityMethod(as(object, superClass)))
> 5: validObject(ans)
> 4: `[[<-`(`*tmp*`, i, value = new("HDF5Matrix", seed = new("HDF5ArraySeed",
>filepath = 
> "/private/var/folders/h4/xtpbyfq55qd3rc882bm4zfjwgn/T/RtmpKIQALa/h5_se0_7d29f927618/assays.h5",
>name = "assay001", dim = c(200L, 6L), first_val = 2481.95574347652,
>chunkdim = c(200L, 6L
> 3: `[[<-`(`*tmp*`, i, value = new("HDF5Matrix", seed = new("HDF5ArraySeed",
>filepath = 
> "/private/var/folders/h4/xtpbyfq55qd3rc882bm4zfjwgn/T/RtmpKIQALa/h5_se0_7d29f927618/assays.h5",
>name = "assay001", dim = c(200L, 6L), first_val = 2481.95574347652,
>chunkdim = c(200L, 6L
> 2: .write_h5_assays(x@assays, h5_path, chunkdim, level, verbose)
> 1: saveHDF5SummarizedExperiment(se0, dir)
>> sessionInfo()
> R Under development (unstable) (2018-03-22 r74446)
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> Running under: OS X El Capitan 10.11.6
>
> Matrix products: default
> BLAS: 
> /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
> LAPACK: 
> /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
>  [1] SummarizedExperiment_1.9.16 Biobase_2.39.2
>  [3] GenomicRanges_1.31.23       GenomeInfoDb_1.15.5
>  [5] HDF5Array_1.7.10            rhdf5_2.23.8
>  [7] DelayedArray_0.5.30         BiocParallel_1.13.3
>  [9] IRanges_2.13.28             S4Vectors_0.17.42
> [11] BiocGenerics_0.25.3         matrixStats_0.53.1
>
> loaded via a namespace (and not attached):
>  [1] lattice_0.20-35        bitops_1.0-6           grid_3.5.0
>  [4] zlibbioc_1.25.0        XVector_0.19.9         Matrix_1.2-14
>  [7] Rhdf5lib_1.1.5         tools_3.5.0            RCurl_1.95-4.10
> [10] compiler_3.5.0         GenomeInfoDbData_1.1.0
>>


Re: [Bioc-devel] problem in rbind with DelayedArray / HDF5Array package

2018-04-04 Thread Peter Hickey
Does `rbind(testhdf, DelayedArray(testdata))` give you what you want?

On Wed, 4 Apr 2018 at 14:58 Elizabeth Purdom 
wrote:

> Hello,
>
> I am trying to rbind a normal (in-memory) matrix with an HDF5Matrix
> object or DelayedArray object and I am hitting problems. I'm using the
> development version of R and Bioconductor (as of 2 weeks ago) and
> HDF5Array_1.7.9, DelayedArray_0.5.23 (see sessionInfo at end of email).
>
> Basically, if I apply rbind between my normal matrix and an HDF5Matrix, I
> get an error that a method doesn't exist for a DataTable class. If I force
> my HDF5Matrix object into a DelayedMatrix object using 'as', I either get
> 1) a warning that it doesn't know how to select a method (even if I use
> DelayedArray::rbind in my command) or 2) in addition to the warning, an
> error that the column names don't match, even though neither object has
> colnames (they are NULL). Which of these I get depends on the order of the
> entries to rbind: the warning-only version occurs if the DelayedMatrix
> object is first.
>
> The warning-only version gives the correct answer, but I want to
> understand how to avoid the warning, since I am using this in a package.
>
> It also seems like a bug that the order of the arguments matters, and I
> don't understand why I need to manually coerce the HDF5Matrix into a
> DelayedMatrix object in order to use rbind.
>
> Thanks,
> Elizabeth Purdom
>
> Here is my code setting up the objects:
>
> > testdata<-matrix(rnorm(1000),nrow=100,ncol=10)
> > testhdf<-HDF5Array::writeHDF5Array(testdata, "./trash.h5")
> > class(testhdf)
> [1] "HDF5Matrix"
> attr(,"package")
> [1] "HDF5Array"
>
> Here are the errors/warnings I’m getting, some of which depend on the
> order of the entries into rbind:
>
> > test1<-rbind(testhdf,testdata) #error is independent of order of entries
> Error in rbind(...) :
>   missing 'rbind' method for DataTable class HDF5Matrix
> > test2<-DelayedArray::rbind(as(testhdf,"DelayedMatrix"),testdata)
> Warning message:
> In methods:::.selectDotsMethod(classes, .MTable, .AllMTable) :
>   multiple direct matches: "DelayedMatrix", "DataFrame"; using the first
> of these
> > test3<-DelayedArray::rbind(testdata,as(testhdf,"DelayedMatrix"))
> Error in rbind(...) :
>   column names for arg 2 do not match those of first arg
> In addition: Warning message:
> In methods:::.selectDotsMethod(classes, .MTable, .AllMTable) :
>   multiple direct matches: "DataFrame", "DelayedMatrix"; using the first
> of these
>
> > colnames(testdata)
> NULL
> > colnames(as(testhdf,"DelayedMatrix"))
> NULL
>
> Here is my sessionInfo:
>
> > sessionInfo()
> R Under development (unstable) (2018-03-22 r74446)
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> Running under: OS X El Capitan 10.11.6
>
> Matrix products: default
> BLAS:
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> LAPACK:
> /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] HDF5Array_1.7.9     rhdf5_2.23.5        DelayedArray_0.5.23
> [4] BiocParallel_1.13.3 IRanges_2.13.28     S4Vectors_0.17.38
> [7] BiocGenerics_0.25.3 matrixStats_0.53.1
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.0 tools_3.5.0    Rhdf5lib_1.1.5


Re: [Bioc-devel] any interest in a BiocMatrix core package?

2017-11-01 Thread Peter Hickey
I think that's a good idea, Kylie.
Pete (DelayedMatrixStats developer)

On Thu., 2 Nov. 2017, 6:09 am Kasper Daniel Hansen, <
kasperdanielhan...@gmail.com> wrote:

> I think it makes sense. A lot of sense. Might be useful to involve Henrik
> (matrixStats) as well.
>
> Who are the players, apart from DelayedArray/DelayedMatrixStats and matter?
> (and some very old stuff in Biobase which should really be deprecated in
> favor of matrixStats).
>
> Best,
> Kasper
>
> On Wed, Nov 1, 2017 at 3:03 PM, Bemis, Kylie 
> wrote:
>
> > Hi all,
> >
> > To continue a variant of this conversation, with the latest BioC release,
> > we now have quite a few packages that are implementing various
> > matrix-related S4 generic functions, many of them relying on matrixStats
> > as a template.
> >
> > I was wondering if there is any interest or intention to create a common
> > MatrixGenerics/ArrayGenerics package on which we can depend to import the
> > relevant S4 generic functions. Although BiocGenerics has a few like
> > 'rowSums()' and 'colMeans()', etc., there are many more that are
> > implemented across 'DelayedArray', 'DelayedMatrixStats', my own package
> > 'matter', etc., including 'apply()', 'rowSds()', 'colVars()', and so
> > forth.
> >
> > It would be nice to have a single package with minimal additional
> > dependencies (a la BiocGenerics) where we could import the various S4
> > generics and avoid unwanted namespace collisions.
> >
> > Have there been any thoughts on this?
> >
> > Many thanks,
> > Kylie
> >
> > ~~~
> > Kylie Ariel Bemis
> > Future Faculty Fellow
> > College of Computer and Information Science
> > Northeastern University
> > kuwisdelu.github.io
> >
> >
> >
> >
> > On Mar 3, 2017, at 11:27 AM, Kasper Daniel Hansen <
> > kasperdanielhan...@gmail.com>
> wrote:
> >
> >
> >
> > On Fri, Mar 3, 2017 at 10:22 AM, Vincent Carey <
> st...@channing.harvard.edu
> > > wrote:
> >
> >
> > On Fri, Mar 3, 2017 at 10:07 AM, Kasper Daniel Hansen <
> > kasperdanielhan...@gmail.com>
> wrote:
> > Some comment on Aaron's stuff
> >
> > One possibility for doing things like this is if your code can be done in
> > C++ using a subset of rows or columns.  That can sometimes give the
> > necessary speed up.  What I mean is this
> >
> > Say you can safely process 1000 cells (not matrix cells, but biological
> > cells, aka columns) at a time in RAM
> >
> > iterate in R:
> >   get chunk i containing 1000 cells from the backend data storage
> >   do something on this sub matrix where everything is in a normal matrix
> > and you just use C++
> >   write results out to whatever backend you're using
> >
> > Then, with a million cells you iterate over 1000 chunks in R.  And you
> > don't need to "touch" the full dataset which can be stored on an
> > arbitrary backend.
> >
> > you "touch" it, but you never ingest the whole thing at any time, is that
> > what you mean?
> >
> > Yes, you load the chunk into RAM and then just deal with it.
> >
> > Think of doing 10^10 linear models.  If this was 10^6 I would just use
> > lmFit.  But 10^10 doesn't fit into memory.  So I load 10^7 into memory,
> > run lmFit, store results, redo.  This is bound to be much more efficient
> > than loading a single row into memory and doing lm 10^10 times, because
> > lmFit is written to do many linear models at the same time.
> >
> > I am suggesting that this is a potential general strategy.
> >
> >
> > And this approach could be run even (potentially) with different chunks
> > on different nodes.
> >
> > that seems to me to be an important if not essential desideratum.
> >
> > what then is the role of C++?  extracting a chunk?  preexisting
> > utilities?
> >
> > When I say C++ I just mean write an efficient implementation that works
> > on a chunk, like lmFit.  It is true that anything that works on a chunk
> > will work on a single row/column (like lmFit) but there are possibilities
> > for optimization when you work at the chunk level.
> >
> > Obviously not all computations can be done chunkwise.  But for those that
> > can, this is a strategy which is independent of the data backend.
> >
> > I wonder whether this "obviously not" needs to be rethought.  Algorithms
> > that are implemented to work with data holistically may need
> > to be reexpressed so that they can succeed with chunkwise access.  Is
> > this a new mindset needed for holist developers, or can the
> > effective data decompositions occur autonomously?
> >
> > Well, I would say it is obvious that not all computations can be done
> > chunkwise.  But of course, in the limit of extremely large data,
> > algorithms which need to cycle over everything no longer scale.  So in
> > that case all practical computations can be done chunkwise, out of
> > necessity.  For single cell right now where it is just millions of cells 

Re: [Bioc-devel] Long-form documentation for DelayedArray?

2017-10-29 Thread Peter Hickey
FYI I also began a project to support an additional backend;
https://github.com/PeteHaitch/matterArray. It's incomplete and may not work
with the current version of DelayedArray (it's ~3 months old and I was
naughtily using some internal functions of DelayedArray). I hope to return
to this soon and I have plans for 1-2 other backends, so some additional
documentation would also be appreciated by me :)

On Mon, 30 Oct 2017 at 08:01 Hervé Pagès  wrote:

> Hi Francesco,
>
> On 10/29/2017 10:10 AM, Francesco Napolitano wrote:
> > Hi all,
> >
> > packages submitted to Bioconductor are required to include at least
> > one vignette. However, it seems that this rule does not hold for some
> > core packages, such as HDF5Array and DelayedArray. Is there any
> > special reason for this?
>
> Infrastructure packages are not strictly required to have a vignette.
> However that doesn't mean they shouldn't have one ;-)
>
> >
> > In particular, I'd like to read more about how to create a backend for
> > DelayedArray. Is there any documentation available beyond the
> > reference manual?
>
> I'm guilty. I plan to remedy this ASAP. In the mean time I'll be glad
> to help. Note that other people are already working (or planning to
> work) on other backends:
>
> Backend for remote HDF5 data:
>
>    https://github.com/vjcitn/RemoteArray
>
>    See issues #1, #2, #3 for some discussion about this.
>
> Backend for GDS files:
>
>    https://github.com/Bioconductor/VariantExperiment/issues/1
>
> Cheers,
> H.
>
> >
> > Thank you very much,
> > Francesco.
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319

Re: [Bioc-devel] "extra" unit tests

2017-10-24 Thread Peter Hickey
A partial answer if you are using the 'testthat' framework: you can use
`testthat::skip_on_bioc()` to specify that a test should be skipped if it
is running on the BioC build machines. The test will otherwise be run
(e.g., during local development). There are some other `testthat::skip*()`
functions that may also be useful.
Cheers,
Pete

On Wed, 25 Oct 2017 at 12:47 Levi Waldron 
wrote:

> Any thoughts about how to implement optional or "extra" unit tests, that
> are too resource intensive to be part of the Bioconductor daily builds, but
> that should be run once in a while, say with major updates?


Re: [Bioc-devel] Unable to push to https://git.bioconductor.org/packages/GenomicTuples

2017-09-21 Thread Peter Hickey
Doh, I literally spotted that the second I hit send. Sorry for the noise
and thanks!

On Thu, 21 Sep 2017 at 13:55 Turaga, Nitesh <nitesh.tur...@roswellpark.org>
wrote:

> Hi Peter,
>
> If you notice, 
>
> On Sep 21, 2017, at 1:53 PM, Peter Hickey <peter.hic...@gmail.com> wrote:
>
> Hi Nitesh,
>
> I'm unable to push changes to the GenomicTuples package to the BioC git
> host.
>
> $ git push
> fatal: remote error: FATAL: W any packages/GenomicTuples nobody DENIED by
> fallthru
> (or you mis-spelled the reponame)
>
> Following the FAQ (http://bioconductor.org/developers/how-to/git/faq/) I've
> run the following to try to diagnose the issue:
>
> $ git remote -v
> origin https://git.bioconductor.org/packages/GenomicTuples (fetch)
> origin https://git.bioconductor.org/packages/GenomicTuples (push)
>
>
> This is supposed to be SSH, i.e
>
> origin g...@git.bioconductor.org:packages/GenomicTuples (fetch)
> origin g...@git.bioconductor.org:packages/GenomicTuples (push)
>
>
> $ ssh -T g...@git.bioconductor.org
> hello p.hickey, this is git@ip-172-30-0-33 running gitolite3
> v3.6.6-6-g7c8f0ab on git 2.13.0
>
> R admin/..*
> R packages/..*
> R   admin/manifest
> 
> R W packages/GenomicTuples
> 
> R W packages/bsseq
> 
>
> This all seems to be in order, so I'm stumped. Can you please help.
>
> Thanks,
> Pete


[Bioc-devel] Unable to push to https://git.bioconductor.org/packages/GenomicTuples

2017-09-21 Thread Peter Hickey
Hi Nitesh,

I'm unable to push changes to the GenomicTuples package to the BioC git
host.

$ git push
fatal: remote error: FATAL: W any packages/GenomicTuples nobody DENIED by
fallthru
(or you mis-spelled the reponame)

Following the FAQ (http://bioconductor.org/developers/how-to/git/faq/) I've
run the following to try to diagnose the issue:

$ git remote -v
origin https://git.bioconductor.org/packages/GenomicTuples (fetch)
origin https://git.bioconductor.org/packages/GenomicTuples (push)

$ ssh -T g...@git.bioconductor.org
hello p.hickey, this is git@ip-172-30-0-33 running gitolite3
v3.6.6-6-g7c8f0ab on git 2.13.0

 R admin/..*
 R packages/..*
 R   admin/manifest

 R W packages/GenomicTuples

 R W packages/bsseq


This all seems to be in order, so I'm stumped. Can you please help.

Thanks,
Pete
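
One quick check when a push is denied is which protocol the push URL actually
uses. The following is a minimal shell sketch, assuming the URL is passed in as
a string. remote_protocol is a hypothetical helper and the example.org URLs are
placeholders.

```shell
#!/bin/sh
# Classify a git remote URL by transport protocol.
# Usage: remote_protocol URL
# Prints one of: https, http, ssh, other
# NOTE: hypothetical helper for illustration only.
remote_protocol() {
    case "$1" in
        https://*)      echo https ;;
        http://*)       echo http ;;
        ssh://*|*@*:*)  echo ssh ;;    # ssh:// URLs and scp-style user@host:path
        *)              echo other ;;
    esac
}

# Placeholder URLs, not real package locations.
remote_protocol https://example.org/packages/myPackage
remote_protocol git@example.org:packages/myPackage
```

Inside a clone, the argument could come from `git remote get-url --push origin`.
A result of "https" on a repository you expect to push to over SSH points at
the remote URL rather than at your access rights.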



Re: [Bioc-devel] GitHub mirror not synced for GenomicTuples

2017-01-22 Thread Peter Hickey
Thanks, Martin!

On 22 January 2017 at 13:10, Martin Morgan
<martin.mor...@roswellpark.org> wrote:
> On 01/22/2017 10:18 AM, Peter Hickey wrote:
>>
>> Hi,
>>
>> Recent changes (last few days) that successfully synced from my own GitHub
>> repo to the BioC SVN and consequent builds don't seem to have propagated
>> to the BioC GitHub mirror. Anything I should/can be doing to address this?
>
>
> Thanks; I think these are back in sync?
>
> The problem likely comes when there are many quick svn commits, each getting
> synced to the Bioconductor git mirror independently. Squashing commits that
> you then sync to Bioconductor would be one approach to avoid this. (We are
> making progress on direct support for git.)
>
> Martin
>
>>
>> Thanks,
>> Pete


Re: [Bioc-devel] GitHub mirror not synced for GenomicTuples

2017-01-22 Thread Peter Hickey
Sorry, I forgot to say this is the GenomicTuples package

On 22 January 2017 at 10:18, Peter Hickey <peter.hic...@gmail.com> wrote:

> Hi,
>
> Recent changes (last few days) that successfully synced from my own GitHub
> repo to the BioC SVN and consequent builds don't seem to have propagated to
> the BioC GitHub mirror. Anything I should/can be doing to address this?
>
> Thanks,
> Pete
>



[Bioc-devel] GitHub mirror not synced for GenomicTuples

2017-01-22 Thread Peter Hickey
Hi,

Recent changes (last few days) that successfully synced from my own GitHub
repo to the BioC SVN and consequent builds don't seem to have propagated to
the BioC GitHub mirror. Anything I should/can be doing to address this?

Thanks,
Pete



Re: [Bioc-devel] Feedback wanted on design of fixed-width Ranges class

2016-11-23 Thread Peter Hickey
Gabe - very cool! I'll be following this with interest.

Ryan - conceptually I haven't been thinking of fixed-width ranges as
different from general ranges, hence why I think it'd be neat if the
user just got the benefits of space-efficient representation without
having to know/care about the underlying representation. My
delineation by class is more of prototyping/conceptual convenience and
my thinking of IRanges/FWRanges as being concrete implementations of
the (virtual) Ranges class (albeit with FWRanges subject to additional
constraints).

Cheers,
Pete

On Thu, 24 Nov 2016 at 14:14 Ryan <r...@thompsonclan.org> wrote:
>
> Hi all,
>
> In addition to the technical concerns, I suppose we should consider
> whether fixed-width ranges are conceptually different enough from
> general ranges to warrant a separate class, or whether this is just
> being considered for purely technical reasons. My feeling is that
> fixed-width ranges aren't sufficiently different from general ranges to
> justify a separate class. The two main uses I can think of for
> fixed-width ranges are genomic positions (i.e. length 1 ranges) and
> cases like "1kb upstream of" or "1kb radius around" a set of specified
> positions. But even for that case, fixed-width ranges are not
> necessarily usable because a position less than 1kb from the end of a
> chromosome would require a truncated range. (What behavior would we
> expect from a hypothetical FWRanges class in this case?)
>
> -Ryan
>
> On 11/23/16 8:01 PM, Ryan wrote:
> > Is it possible to allow the width slot of IRanges to be either a
> > normal vector or an Rle?
> >
> >
> > On 11/23/16 6:18 PM, Peter Hickey wrote:
> >> I've been toying with the idea of a fixed/constant width Ranges
> >> subclass. The motivation comes from storing DNA methylation data at CH
> >> loci (non-CpG methylation): there are 1.1 billion CH loci in the human
> >> genome, so to store these as a GRanges object requires 2 x 1.1 billion
> >> integer vectors, one for the @start and one for the @width slots of
> >> the IRanges object in the @ranges slot. But in this case, and perhaps
> >> others, such as storing SNP data, we have a situation where all loci
> >> have the same width, namely 1. Of course, you might argue such a
> >> 2-fold reduction in size is purely academic, but I think it could be a
> >> nice efficiency that's worth pursuing.
> >>
> >> I've sketched out two different prototypes, neither of which I've
> >> worked up to a complete implementation; I'd like to get some feedback
> >> on these two designs, along with a variation that I've not yet even
> >> tried implementing, before I decide how/whether to proceed.
> >>
> >> The two approaches are:
> >>
> >> 1. A new Ranges subclass, FWRanges (fixed-width Ranges, open to better
> >> name suggestions).
> >> a. The @width slot would be an integer vector of length 1
> >> b. [variation not yet implemented] The @width slot would be an Rle
> >> vector parallel to @start
> >> 2. Modifying the IRanges class. The @width slot may be an integer
> >> vector of length 1 or a vector parallel to @start
> >>
> >> [Upon reflection, I suppose there could be a '2b' where the @width
> >> slot is an Rle, but I'm going to ignore this for now since in general
> >> it would be inefficient when the ranges have (random) variable widths]
> >>
> >> # Pros of 1
> >>
> >> - It seems the proper thing is to create a new Ranges subclass
> >> - No dangers associated with stuffing around with internals of the
> >> IRanges class and clean code separation
> >>
> >> # Pros of 1b compared to 1a
> >>
> >> - Like for IRanges, the @width slot would remain parallel to the
> >> @start slot
> >>
> >> # Cons of 1
> >>
> >> - Can't immediately use in a GRanges object because the @ranges slot
> >> is classed as an IRanges object
> >> - Perhaps this could be changed to allow a Ranges object in the
> >> @ranges slot of a GRanges object?
> >> - Otherwise, would also need to implement a subclass of GenomicRanges
> >> (say, FWGRanges) that used a FWRanges object in the @ranges slot. This
> >> would necessitate a fair bit of code duplicated from GRanges methods.
> >> - Methods like start<-, end<-, width<- would either have to
> >> - (A) return an error if the new object no longer has fixed/constant
> >> widths
> >> - (B) coerce it to an IRanges object (with or without warning) thus
> >> meaning thes

Re: [Bioc-devel] Feedback wanted on design of fixed-width Ranges class

2016-11-23 Thread Peter Hickey
Vince - From my understanding GPos mostly gains its efficiencies when
positions are adjacent, which is generally not the case for the types of
positions I'm considering. In fact, the @ranges slot of the @pos_runs slot
in a GPos object is just an IRanges object where n adjacent positions are
'compressed' into a single width-n range.

(Also, FWRanges could generalise to intervals with fixed-width > 1)

On Thu, 24 Nov 2016 at 10:36 Vincent Carey <st...@channing.harvard.edu>
wrote:

> pace Wolfgang Huber ...
>
> Peter I don't mean to be rude.  Your comments deserve more study.  But it
> was fun to remember GPos, which I had forgotten.
>
> On Wed, Nov 23, 2016 at 6:34 PM, Vincent Carey <st...@channing.harvard.edu
> > wrote:
>
> library(GenomicRanges)
> class?GPos
>
> On Wed, Nov 23, 2016 at 6:18 PM, Peter Hickey <peter.hic...@gmail.com>
> wrote:
>
> I've been toying with the idea of a fixed/constant width Ranges
> subclass. The motivation comes from storing DNA methylation data at CH
> loci (non-CpG methylation): there are 1.1 billion CH loci in the human
> genome, so to store these as a GRanges object requires 2 x 1.1 billion
> integer vectors, one for the @start and one for the @width slots of
> the IRanges object in the @ranges slot. But in this case, and perhaps
> others, such as storing SNP data, we have a situation where all loci
> have the same width, namely 1. Of course, you might argue such a
> 2-fold reduction in size is purely academic, but I think it could be a
> nice efficiency that's worth pursuing.
>
> I've sketched out two different prototypes, neither of which I've
> worked up to a complete implementation; I'd like to get some feedback
> on these two designs, along with a variation that I've not yet even
> tried implementing, before I decide how/whether to proceed.
>
> The two approaches are:
>
> 1. A new Ranges subclass, FWRanges (fixed-width Ranges, open to better
> name suggestions).
> a. The @width slot would be an integer vector of length 1
> b. [variation not yet implemented] The @width slot would be an Rle
> vector parallel to @start
> 2. Modifying the IRanges class. The @width slot may be an integer
> vector of length 1 or a vector parallel to @start
>
> [Upon reflection, I suppose there could be a '2b' where the @width
> slot is an Rle, but I'm going to ignore this for now since in general
> it would be inefficient when the ranges have (random) variable widths]
>
> # Pros of 1
>
> - It seems the proper thing is to create a new Ranges subclass
> - No dangers associated with stuffing around with internals of the
> IRanges class and clean code separation
>
> # Pros of 1b compared to 1a
>
> - Like for IRanges, the @width slot would remain parallel to the @start
> slot
>
> # Cons of 1
>
> - Can't immediately use in a GRanges object because the @ranges slot
> is classed as an IRanges object
> - Perhaps this could be changed to allow a Ranges object in the
> @ranges slot of a GRanges object?
> - Otherwise, would also need to implement a subclass of GenomicRanges
> (say, FWGRanges) that used a FWRanges object in the @ranges slot. This
> would necessitate a fair bit of code duplicated from GRanges methods.
> - Methods like start<-, end<-, width<- would either have to
> - (A) return an error if the new object no longer has fixed/constant widths
> - (B) coerce it to an IRanges object (with or without warning) thus
> meaning these operations would not be strict endomorphisms
> - Users would only get the space-savings of the FWRanges class if they
> explicitly construct a FWRanges object or coerce a compatible IRanges
> object to an FWRanges object
> - Clean code separation from the IRanges class may also lead to duplicated
> code
>
> # Cons of 1b compared to 1a
>
> - Endomorphic versions of methods like start<-, end<-, width<- could
> create a @width slot that is twice the 'necessary' size (e.g., an Rle
> representation of a vector that contains no 'runs').
>
> # Pros of 2
>
> - If properly implemented, the user wouldn't need to think about
> whether the ranges were fixed or variable width, they'd just get the
> most efficient representation
>
> # Cons of 2
>
> - This is fairly obvious, 2 would be a major (internal) change to a
> core Bioconductor class
> - The @width slot would no longer necessarily be parallel to @start
> slot, e.g., code that does direct slot access via @width could easily
> break (of course, the width() getter would be modified to return a
> parallel vector to the @start slot, but people (*cough* me) have code
> that does the wrong thing with respect to the use of getters vs.
> direct slot access)
> - New IRanges objects may be incompatible with e

[Bioc-devel] Feedback wanted on design of fixed-width Ranges class

2016-11-23 Thread Peter Hickey
I've been toying with the idea of a fixed/constant width Ranges
subclass. The motivation comes from storing DNA methylation data at CH
loci (non-CpG methylation): there are 1.1 billion CH loci in the human
genome, so to store these as a GRanges object requires two integer vectors
of length 1.1 billion each, one for the @start and one for the @width slot of
the IRanges object in the @ranges slot. But in this case, and perhaps
others, such as storing SNP data, we have a situation where all loci
have the same width, namely 1. Of course, you might argue such a
2-fold reduction in size is purely academic, but I think it could be a
nice efficiency that's worth pursuing.
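
As a rough back-of-envelope check of the 2-fold figure, the sketch below
assumes 4-byte integers and counts only the @start and @width vectors
(seqnames, strand, metadata columns and object overhead are ignored). The
variable names are purely illustrative.

```r
# Back-of-envelope memory estimate; assumes 4-byte integers and ignores
# everything except the @start and @width vectors.
n_loci <- 1.1e9
bytes_per_int <- 4

# IRanges-style storage: one @start and one @width vector, each of length n_loci
two_vectors_bytes <- 2 * n_loci * bytes_per_int

# Fixed-width storage: one @start vector plus a single shared width
fixed_width_bytes <- n_loci * bytes_per_int + bytes_per_int

gib <- function(bytes) bytes / 2^30
round(gib(two_vectors_bytes), 1)  # roughly 8.2 GiB
round(gib(fixed_width_bytes), 1)  # roughly 4.1 GiB
```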

I've sketched out two different prototypes, neither of which I've
worked up to a complete implementation; I'd like to get some feedback
on these two designs, along with a variation that I've not yet even
tried implementing, before I decide how/whether to proceed.

The two approaches are:

1. A new Ranges subclass, FWRanges (fixed-width Ranges, open to better
name suggestions).
a. The @width slot would be an integer vector of length 1
b. [variation not yet implemented] The @width slot would be an Rle
vector parallel to @start
2. Modifying the IRanges class. The @width slot may be an integer
vector of length 1 or a vector parallel to @start

[Upon reflection, I suppose there could be a '2b' where the @width
slot is an Rle, but I'm going to ignore this for now since in general
it would be inefficient when the ranges have (random) variable widths]

# Pros of 1

- It seems the proper thing is to create a new Ranges subclass
- No dangers associated with stuffing around with internals of the
IRanges class and clean code separation

# Pros of 1b compared to 1a

- Like for IRanges, the @width slot would remain parallel to the @start slot

# Cons of 1

- Can't immediately use in a GRanges object because the @ranges slot
is classed as an IRanges object
- Perhaps this could be changed to allow a Ranges object in the
@ranges slot of a GRanges object?
- Otherwise, would also need to implement a subclass of GenomicRanges
(say, FWGRanges) that used a FWRanges object in the @ranges slot. This
would necessitate a fair bit of code duplicated from GRanges methods.
- Methods like start<-, end<-, width<- would either have to
- (A) return an error if the new object no longer has fixed/constant widths
- (B) coerce it to an IRanges object (with or without warning) thus
meaning these operations would not be strict endomorphisms
- Users would only get the space-savings of the FWRanges class if they
explicitly construct a FWRanges object or coerce a compatible IRanges
object to an FWRanges object
- Clean code separation from the IRanges class may also lead to duplicated code

# Cons of 1b compared to 1a

- Endomorphic versions of methods like start<-, end<-, width<- could
create a @width slot that is twice the 'necessary' size (e.g., an Rle
representation of a vector that contains no 'runs').

# Pros of 2

- If properly implemented, the user wouldn't need to think about
whether the ranges were fixed or variable width, they'd just get the
most efficient representation

# Cons of 2

- This is fairly obvious, 2 would be a major (internal) change to a
core Bioconductor class
- The @width slot would no longer necessarily be parallel to @start
slot, e.g., code that does direct slot access via @width could easily
break (of course, the width() getter would be modified to return a
parallel vector to the @start slot, but people (*cough* me) have code
that does the wrong thing with respect to the use of getters vs.
direct slot access)
- New IRanges objects may be incompatible with earlier versions of IRanges
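
To make option 1a concrete, here is a minimal toy sketch using only the
methods package. The class name FWRangesSketch and the fw_*() generics are
hypothetical and are not part of IRanges; the point is only that the width can
be stored once while the getter still returns a vector parallel to the starts.

```r
library(methods)

# Hypothetical toy class, NOT the IRanges API: starts are stored in full,
# the width is stored once (option 1a above).
setClass("FWRangesSketch",
         representation(start = "integer", width = "integer"),
         validity = function(object) {
           if (length(object@width) != 1L) "'width' must have length 1" else TRUE
         })

setGeneric("fw_start", function(x) standardGeneric("fw_start"))
setGeneric("fw_width", function(x) standardGeneric("fw_width"))
setGeneric("fw_end", function(x) standardGeneric("fw_end"))

setMethod("fw_start", "FWRangesSketch", function(x) x@start)
# The getter expands the single stored width so it is parallel to the starts
setMethod("fw_width", "FWRangesSketch",
          function(x) rep.int(x@width, length(x@start)))
setMethod("fw_end", "FWRangesSketch", function(x) x@start + x@width - 1L)

x <- new("FWRangesSketch", start = c(5L, 20L, 42L), width = 1L)
fw_width(x)  # three 1s, although only a single width is stored
fw_end(x)    # equal to the starts, because every range has width 1
```

Code that goes through the getter never sees the compact storage, which is
the same reason direct @width access is listed as a risk for option 2.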

Your feedback is very appreciated,
Pete



Re: [Bioc-devel] Fast check of GenomicRanges equality to speed up cbind, SummarizedExperiment

2016-08-30 Thread Peter Hickey
Wonderful. Thanks, Hervé!

On 30 August 2016 at 20:45, Hervé Pagès <hpa...@fredhutch.org> wrote:
> Hi Pete,
>
> Thanks for suggesting this fast method. I've formalized this a little
> bit by using a generic (identicalVals) + methods. I also tweaked it
> in order to avoid false negatives that can occur when 'x' and 'y' have
> different names or different seqlevels. So no more fallback to
> 'all(x == y)'.
>
> Committed in SummarizedExperiment 1.3.82.
>
> BTW please note that 'x == y' and 'identicalVals(x, y)' both ignore
> circularity of the underlying sequences e.g. ranges [1, 10] and
> [101, 110] represent the same position on a circular sequence of
> length 100 so should be considered equal. However for 'x == y' and
> 'identicalVals(x, y)', they are not. Something we should address at
> some point...
>
> Cheers,
> H.
>
>
> On 08/30/2016 05:57 AM, Peter Hickey wrote:
>>
>> The cbind,SummarizedExperiment-method checks that the rowRanges slots
>> are equal by calling `all(x == x1)`, where x and x1 are GenomicRanges
>> objects. This can be kind of slow and makes a large, temporary vector
>> when length(x) is large.
>>
>> I wrote a fast method to check equality of two GenomicRanges objects,
>> see https://gist.github.com/PeteHaitch/13787125a165928e652dcfea2a8d166a.
>> It takes it from 13.7 seconds to 0.004 seconds for a GenomicRanges
>> object with 100M elements on my machine. It uses identical() on key
>> slots of the GenomicRanges objects, and I'm not sure if this could
>> return false negatives, so I fall back to all(x == x1) if the fast
>> method returns FALSE.
>>
>> Could cbind,SummarizedExperiment-method be updated to use something like
>> this?
>>
>> Cheers,
>> Pete
>>
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319


[Bioc-devel] Increasing MAX_BUFLENGTH in S4Vectors src/AEbufs.c

2016-07-19 Thread Peter Hickey
I hit an error when calling reduce() on a very big GRanges object
(length = 1170402558). The error was:

Error in .Call2("CompressedIRangesList_reduce", x, drop.empty.ranges,  :
   _get_new_buflength(): MAX_BUFLENGTH reached

I found MAX_BUFLENGTH is defined in S4Vectors in the file src/AEbufs.c as:

#define MAX_BUFLENGTH_INC (32 * 1024 * 1024)
#define MAX_BUFLENGTH (32 * MAX_BUFLENGTH_INC)

So I experimentally increased the limit in a local copy. I first set
it to twice the current value (which errored on R CMD check, I think
because that makes MAX_BUFLENGTH > the maximum allowable integer), but
by setting it to 1.5 times its current limit I got (A) an apparently
working copy of S4Vectors and (B) the original code now ran without
error.

So, is it safe to increase MAX_BUFLENGTH or am I missing some important details?
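
The arithmetic behind those two experiments can be checked directly in R. This
is only a sketch of the numbers, under the assumption (my guess above) that the
limit has to fit in a signed 32-bit integer; the variable names are
illustrative.

```r
# Values copied from the #defines quoted above
max_buflength_inc <- 32 * 1024 * 1024
max_buflength <- 32 * max_buflength_inc   # 2^30 = 1073741824

int_max <- .Machine$integer.max           # 2^31 - 1 = 2147483647

1170402558 > max_buflength       # TRUE: the GRanges is longer than the cap
2 * max_buflength > int_max      # TRUE: doubling gives 2^31, one past int_max
1.5 * max_buflength <= int_max   # TRUE: 1610612736 still fits
```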

Thanks,
Pete



[Bioc-devel] Some granges() accessors broken in devel

2016-06-20 Thread Peter Hickey
I think this is a recent break. I'm mostly concerned because I need to
use this "broken" functionality in a tutorial for BioC2016 this week
and it would require changes to package internals, not the vignette,
in order to fix this at my end.

library(SummarizedExperiment)
se <- SummarizedExperiment(rowRanges = GRanges(1, IRanges(1, 10)))
granges(se)
# Error in granges(se) :
#   could not find symbol "use.names" in environment of the generic function
# This still works
rowRanges(se)

Perhaps we should be using rowRanges() rather than granges() but this
is breaking a few things in minfi and bsseq. I've noticed this on
granges,RangedSummarizedExperiment-method, but it may affect other
methods (haven't tested).

Cheers,
Pete

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] SummarizedExperiment_1.3.5 Biobase_2.33.0
[3] GenomicRanges_1.25.4       GenomeInfoDb_1.9.1
[5] IRanges_2.7.6              S4Vectors_0.11.5
[7] BiocGenerics_0.19.1        repete_0.0.0.9004
[9] devtools_1.11.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5      codetools_0.2-14 digest_0.6.9     withr_1.0.1
 [5] plyr_1.8.4       magrittr_1.5     scales_0.4.0     zlibbioc_1.19.0
 [9] stringi_1.1.1    XVector_0.13.2   pryr_0.1.2       tools_3.3.0
[13] stringr_1.0.0    munsell_0.4.3    colorspace_1.2-6 memoise_1.0.0



Re: [Bioc-devel] SVN and GitHub mirror out-of-sync

2016-04-18 Thread Peter Hickey
Typo - github version is 1.5.23

On Mon, 18 Apr 2016 at 09:26 Peter Hickey <peter.hic...@gmail.com> wrote:

> Hi,
>
> The current version of GenomicTuples on the official SVN is 1.5.24
> (
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicTuples/DESCRIPTION
> ),
> however, the version available via the GitHub mirror is only 1.5.21
> (
> https://github.com/Bioconductor-mirror/GenomicTuples/blob/master/DESCRIPTION
> ).
> Shouldn't the latest version be automatically propagated to the GitHub
> mirror?
>
> Thanks,
> Pete
>



[Bioc-devel] SVN and GitHub mirror out-of-sync

2016-04-18 Thread Peter Hickey
Hi,

The current version of GenomicTuples on the official SVN is 1.5.24
(https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicTuples/DESCRIPTION),
however, the version available via the GitHub mirror is only 1.5.21
(https://github.com/Bioconductor-mirror/GenomicTuples/blob/master/DESCRIPTION).
Shouldn't the latest version be automatically propagated to the GitHub
mirror?

Thanks,
Pete



Re: [Bioc-devel] Making GenomicAlignments::readGAlignmentPairs() fail fast if given bad seqlevels in `which`

2016-03-19 Thread Peter Hickey
Thanks, Aaron. I implemented a similar workaround, but I think it
would be nice to have in the core Bioconductor implementation. I had a
quick poke around GenomicAlignments::readGAlignmentPairs(), but it
looked like I'd have to learn a bit too much about the underlying
Rsamtools::scanBam() in order to implement a quick fix.

>
> Hi Peter,
>
> I had the same problem a while ago and solved it by first reading only the
> header of the BAM file, extracting the chromosomes that are available and
> generating a warning for all given chromosomes that are not available. That
> worked for my purposes. I have implemented this in a function (
> https://github.com/ataudt/aneufinder/blob/master/R/importReads.R)
>
> Aaron
>


[Bioc-devel] Making GenomicAlignments::readGAlignmentPairs() fail fast if given bad seqlevels in `which`

2016-03-19 Thread Peter Hickey
Hi,

GenomicAlignments::readGAlignmentPairs() can take a while to
(correctly) fail if the `which` parameter contains a "bad" seqlevel.
It'd be great if it failed early in the following scenario (just
experienced).

An example BAM is available from
https://www.dropbox.com/sh/4avqxuqnhlv3r9c/AADqx-XqXV6c7Dc_SaSUq324a?dl=0
(110 MB; sorry, it needs to be large-ish in order to notice the
problem). The following code ought to reproduce the problem. Here I am
taking the example BAM of mouse data mapped to mm10 and using a
`which` based on hg19 (it was mistakenly assuming all my data were
human that led me to this problem). When a single "bad" seqlevel is
provided via `which` then it errors fast and with a helpful error.
However, if the `which` contains multiple seqlevels, some "good" and
some "bad", then it seemingly just hangs. I initially thought it had
just frozen indefinitely but it actually eventually returns the
correct error.

It'd be great if it failed fast in this situation.
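
For what it's worth, the kind of up-front check I have in mind could look
roughly like the sketch below. check_which_seqlevels() is a hypothetical helper,
not something in GenomicAlignments or Rsamtools, and the seqlevels are toy
stand-ins; in practice the BAM seqlevels would presumably come from the file
header (e.g. names(Rsamtools::scanBamHeader(file)[[1]]$targets)) and the
requested ones from seqlevels() of the `which` ranges.

```r
# Hypothetical helper (not part of GenomicAlignments or Rsamtools).
# which_seqlevels: seqlevels requested via `which`
# bam_seqlevels:   seqlevels listed in the BAM header
check_which_seqlevels <- function(which_seqlevels, bam_seqlevels) {
  bad <- setdiff(which_seqlevels, bam_seqlevels)
  if (length(bad) > 0L) {
    stop("seqlevels in 'which' not found in BAM header: ",
         paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy stand-in for a mouse (mm10) header: chr1..chr19, X, Y, M
bam_seqlevels <- paste0("chr", c(1:19, "X", "Y", "M"))

# All requested seqlevels exist, so this passes silently
check_which_seqlevels("chr19", bam_seqlevels)

# A mix of "good" and "bad" seqlevels errors immediately, before any reads
# would be scanned
res <- tryCatch(check_which_seqlevels(c("chr19", "chr20"), bam_seqlevels),
                error = function(e) conditionMessage(e))
res
```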

Thanks,
Pete

library(GenomicAlignments)
library(BSgenome.Hsapiens.UCSC.hg19)
si <- seqinfo(BSgenome.Hsapiens.UCSC.hg19)

# mouse data mapped to mm10 (temporarily available from
# https://www.dropbox.com/sh/4avqxuqnhlv3r9c/AADqx-XqXV6c7Dc_SaSUq324a?dl=0)
file <- "~/Desktop/tmp/SRR1781315.markdup.bam"

# Errors fast and helpfully because chr20 doesn't exist in mouse
readGAlignmentPairs(file, param = ScanBamParam(which = as(si, "GRanges")[20]))

# Takes a long time to error if some seqlevels exist (chr19) and some
# don't exist (chr20) in sample
readGAlignmentPairs(file, param = ScanBamParam(which = as(si, "GRanges")[19:20]))

> sessionInfo()
R Under development (unstable) (2016-03-11 r70310)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.39.4                   rtracklayer_1.31.7
 [4] GenomicAlignments_1.7.20          Rsamtools_1.23.3                  Biostrings_2.39.12
 [7] XVector_0.11.7                    SummarizedExperiment_1.1.22       Biobase_2.31.3
[10] GenomicRanges_1.23.24             GenomeInfoDb_1.7.6                IRanges_2.5.40
[13] S4Vectors_0.9.42                  BiocGenerics_0.17.3               repete_0.0.0.9000
[16] devtools_1.10.0

loaded via a namespace (and not attached):
[1] XML_3.98-1.4        digest_0.6.9        bitops_1.0-6        zlibbioc_1.17.0     BiocParallel_1.5.20
[6] tools_3.3.0         RCurl_1.95-4.8      memoise_1.0.0



Re: [Bioc-devel] Making GenomicAlignments::readGAlignmentPairs() fail fast if given bad seqlevels in `which`

2016-03-19 Thread Peter Hickey
Wonderful. Thanks, Martin

> in svn at 1.23.5, and in Bioc-devel hopefully after tonight's build.
>
> Martin



Re: [Bioc-devel] Behaviour of rbind/cbind on assays slot of SummarizedExperiment with multidimensional assays

2016-03-03 Thread Peter Hickey
Hi Herve,

I agree, the abind::abind() signature is rather verbose and much of it is not
required in the context of a SummarizedExperiment. Perhaps "overriding"
abind::abind() with an S4 generic with a different signature isn't a good idea
and it would be better to have our own generic.

I quite like arbind() and acbind() as names. I guess these would live in the
SummarizedExperiment package?

Happy to do further work on this but I won't have time until the weekend or
next week.

Cheers,
Pete

On Thu, 3 Mar 2016 at 13:31 Hervé Pagès <hpa...@fredhutch.org> wrote:
>
> Hi Pete,
>
> On 03/02/2016 12:42 PM, Peter Hickey wrote:
> > This is mostly directed to Herve and/or Martin, but I'd be interested
> > in others' input too.
> >
> > The SummarizedExperiment package defines rbind,Assays-method and
> > cbind,Assays-method that are called when rbind() or cbind() is called
> > on a SummarizedExperiment object. In the case of two-dimensional assay
> > (matrix) these work much as if rbind/cbind were called on the matrix:
> >
> >> library(SummarizedExperiment)
> >> m <- matrix(rnorm(100), nrow = 4, ncol = 25)
> >> se1 <- SummarizedExperiment(m)
> >> dim(assay(rbind(se1, se1)))
> > [1]  8 25
> >> dim(rbind(assay(se1), assay(se1)))
> > [1]  8 25
> >> dim(assay(cbind(se1, se1)))
> > [1]  4 50
> >> dim(cbind(assay(se1), assay(se1)))
> > [1]  4 50
> >
> > When an assay is an array with more than 2 dimensions, however, the
> > result of the rbind,Assays-method (resp. cbind,Assays-method) differs
> > from the rbind,array-method (resp. cbind,array-method). This is for a
> > good reason because it preserves the dimensionality of the assay in
> > the SummarizedExperiment object. So in fact the "rbind(...)" of the
> > assay is more like abind::abind(..., along = 1) and the "cbind(...)"
> > of the assay is more like abind::abind(..., along = 2):
> >
> >> x <- array(rnorm(100), dim = c(4, 5, 5))
> >> se2 <- SummarizedExperiment(x)
> >> dim(assay(rbind(se2, se2)))
> > [1] 8 5 5
> >> dim(rbind(assay(se2), assay(se2)))
> > [1]   2 100
> >> dim(abind::abind(assay(se2), assay(se2), along = 1))
> > [1] 8 5 5
> >> identical(assay(rbind(se2, se2)), abind::abind(assay(se2), assay(se2), along = 1))
> > [1] TRUE
> >> dim(assay(cbind(se2, se2)))
> > [1]  4 10  5
> >> dim(cbind(assay(se2), assay(se2)))
> > [1] 100   2
> >> dim(abind::abind(assay(se2), assay(se2), along = 2))
> > [1]  4 10  5
> >> identical(assay(cbind(se2, se2)), abind::abind(assay(se2), assay(se2), along = 2))
> > [1] TRUE
> >
> > rbind/cbind does not work for other "array-like" objects with > 2
> > dimensions in the assays slot of a SummarizedExperiment because the
> > internal function SummarizedExperiment:::.bind_assay_elements()
> > constructs a new array via array() if the assay has more than 2
> > dimensions, thus destroying the original class of the array-like
> > object.
> >
> > What I'm wondering is whether there is a way to generalise rbind/cbind
> > of Assays to other array-like objects provided that they have a suitable
> > method defined. It seems to me that a good candidate would be to
> > require that an object in the assays slot has an abind(..., along = 1)
> > and abind(..., along = 2) method defined if it has more than 2
> > dimensions. It might even be worth using abind::abind() for when the
> > assay is an array with more than 2 dimensions to simplify the code
> > somewhat.
> >
> > Thoughts? I'd be happy to work on a patch.
>
> Requiring that abind(..., along=1) and abind(..., along=2) work on
> assays of dim > 2 would work. Note that abind() has a complicated
> signature (many extra arguments) but the "abind" methods that one
> would need to implement wouldn't need to satisfy the full abind()
> contract (in the context of SummarizedExperiment assays, satisfying
> the full contract is not needed and would be too much work).
>
> Alternatively we can introduce our own generics for that e.g.
> abind1() and abind2(), or arbind() and acbind() (for "assay rbind"
> and "assay cbind"). Advantages: the signatures would be cleaner,
> the contracts simpler, and the methods easier to implement. Also
> we wouldn't need to depend on the abind package.
>
> What do you think?
>
> H.
>
> >
> > Cheers,
> > Pete
> >
> >
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319


[Bioc-devel] Behaviour of rbind/cbind on assays slot of SummarizedExperiment with multidimensional assays

2016-03-02 Thread Peter Hickey
This is mostly directed to Herve and/or Martin, but I'd be interested
in others' input too.

The SummarizedExperiment package defines rbind,Assays-method and
cbind,Assays-method that are called when rbind() or cbind() is called
on a SummarizedExperiment object. In the case of a two-dimensional assay
(matrix) these work much as if rbind/cbind were called on the matrix:

> library(SummarizedExperiment)
> m <- matrix(rnorm(100), nrow = 4, ncol = 25)
> se1 <- SummarizedExperiment(m)
> dim(assay(rbind(se1, se1)))
[1]  8 25
> dim(rbind(assay(se1), assay(se1)))
[1]  8 25
> dim(assay(cbind(se1, se1)))
[1]  4 50
> dim(cbind(assay(se1), assay(se1)))
[1]  4 50

When an assay is an array with more than 2 dimensions, however, the
result of the rbind,Assays-method (resp. cbind,Assays-method) differs
from the rbind,array-method (resp. cbind,array-method). This is for a
good reason because it preserves the dimensionality of the assay in
the SummarizedExperiment object. So in fact the "rbind(...)" of the
assay is more like abind::abind(..., along = 1) and the "cbind(...)"
of the assay is more like abind::abind(..., along = 2):

> x <- array(rnorm(100), dim = c(4, 5, 5))
> se2 <- SummarizedExperiment(x)
> dim(assay(rbind(se2, se2)))
[1] 8 5 5
> dim(rbind(assay(se2), assay(se2)))
[1]   2 100
> dim(abind::abind(assay(se2), assay(se2), along = 1))
[1] 8 5 5
> identical(assay(rbind(se2, se2)), abind::abind(assay(se2), assay(se2), along = 1))
[1] TRUE
> dim(assay(cbind(se2, se2)))
[1]  4 10  5
> dim(cbind(assay(se2), assay(se2)))
[1] 100   2
> dim(abind::abind(assay(se2), assay(se2), along = 2))
[1]  4 10  5
> identical(assay(cbind(se2, se2)), abind::abind(assay(se2), assay(se2), along = 2))
[1] TRUE

rbind/cbind does not work for other "array-like" objects with > 2
dimensions in the assays slot of a SummarizedExperiment because the
internal function SummarizedExperiment:::.bind_assay_elements()
constructs a new array via array() if the assay has more than 2
dimensions, thus destroying the original class of the array-like
object.

What I'm wondering is whether there is a way to generalise rbind/cbind
of Assays to other array-like objects provided that they have a suitable
method defined. It seems to me that a good candidate would be to
require that an object in the assays slot has an abind(..., along = 1)
and abind(..., along = 2) method defined if it has more than 2
dimensions. It might even be worth using abind::abind() for when the
assay is an array with more than 2 dimensions to simplify the code
somewhat.
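
To make the "bind along one dimension while keeping the other
dimensions intact" idea concrete, here is a minimal base-R sketch. The
helper name bind_along() is made up for illustration; it is not part of
SummarizedExperiment or abind, it ignores dimnames, and it only handles
two ordinary arrays:

```r
# Hypothetical helper (illustration only): bind two arrays along dimension
# `along` while preserving the number of dimensions. Dimnames are dropped.
bind_along <- function(x, y, along = 1L) {
  dx <- dim(x)
  dy <- dim(y)
  stopifnot(length(dx) == length(dy), all(dx[-along] == dy[-along]))
  n <- length(dx)
  # Move the binding dimension to the end so that c() concatenates along it.
  perm <- c(setdiff(seq_len(n), along), along)
  z <- array(c(aperm(x, perm), aperm(y, perm)),
             dim = c(dx[-along], dx[along] + dy[along]))
  # Undo the permutation to restore the original dimension order.
  aperm(z, order(perm))
}

x <- array(seq_len(100), dim = c(4, 5, 5))
dim(bind_along(x, x, along = 1L))  # 8 5 5, like the rbind of Assays
dim(bind_along(x, x, along = 2L))  # 4 10 5, like the cbind of Assays
dim(rbind(x, x))                   # 2 100: base rbind flattens each array
```

An array-like class would only need to supply its own equivalent of
this operation for along = 1 and along = 2 for the proposal above to
apply to it.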

Thoughts? I'd be happy to work on a patch.

Cheers,
Pete

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] dimnames of multidimensional assays in SummarizedExperiment

2016-02-10 Thread Peter Hickey
The assays slot in a SummarizedExperiment object supports elements
with up to 4 dimensions [*]

library(SummarizedExperiment)
makeSE <- function(n) {
  assay <- array(1:2^n,
                 dim = rep(2, n),
                 dimnames = split(letters[1:(2 * n)], seq_len(n)))
  SummarizedExperiment(assay)
}
x <- makeSE(4)

However, the "higher-order" dimnames of the assays aren't preserved
when calling the `assays` or `assay` getters:

> dimnames(assay(x, withDimnames = TRUE))
[[1]]
[1] "a" "e"

[[2]]
[1] "b" "f"

[[3]]
NULL

[[4]]
NULL

This is despite the data still being available in the assays slot:

> dimnames(x@assays[[1]])
$`1`
[1] "a" "e"

$`2`
[1] "b" "f"

$`3`
[1] "c" "g"

$`4`
[1] "d" "h"

The following patch fixes this by only touching the rownames and
colnames and leaving the "higher-order" dimnames alone. Does this seem
reasonable?

Index: R/SummarizedExperiment-class.R
===================================================================
--- R/SummarizedExperiment-class.R (revision 113505)
+++ R/SummarizedExperiment-class.R (working copy)
@@ -174,7 +174,10 @@
 {
     assays <- as(x@assays, "SimpleList")
     if (withDimnames)
-        endoapply(assays, "dimnames<-", dimnames(x))
+        endoapply(assays, function(assay) {
+            dimnames(assay)[1:2] <- dimnames(x)
+            assay
+        })
     else
         assays
 })
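
The difference between the two replacement forms can be seen on a plain
array, without SummarizedExperiment. This is only a sketch: se_dimnames
is a stand-in for what dimnames(x) returns for the SummarizedExperiment
(a list of rownames and colnames), and the names in it are made up:

```r
# A 4-dimensional array with dimnames on every dimension.
a <- array(seq_len(16), dim = rep(2, 4),
           dimnames = list(c("a", "e"), c("b", "f"), c("c", "g"), c("d", "h")))
# Stand-in (assumption) for dimnames(x), i.e. list(rownames(x), colnames(x)).
se_dimnames <- list(c("row1", "row2"), c("col1", "col2"))

# Current behaviour: replacing all dimnames with a 2-element list leaves
# dimensions 3 and 4 without names.
b <- a
dimnames(b) <- se_dimnames
dimnames(b)[3:4]

# Patched behaviour: only the first two elements are replaced, so the
# "higher-order" dimnames survive.
p <- a
dimnames(p)[1:2] <- se_dimnames
dimnames(p)[3:4]
```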

[*] In fact, the assay elements can have more than 4 dimensions when
constructed, although subsetting with `[` isn't supported (possibly
things other than subsetting break as well in this case).

# No error
y <- makeSE(5)
y

# Error
y[1, ]

Perhaps there should be a check in the constructor that all assay
elements have < 5 dimensions?
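
Such a check might look something like the following sketch. The
function name check_assay_dims() and its max_dims argument are
hypothetical; this is not the package's actual validity code, just the
shape of a validity-style function that returns NULL when all is well
and a message otherwise:

```r
# Hypothetical check (illustration only): return a message if any assay
# element has more than `max_dims` dimensions, and NULL otherwise.
check_assay_dims <- function(assays, max_dims = 4L) {
  n_dims <- vapply(assays, function(a) length(dim(a)), integer(1))
  too_many <- n_dims > max_dims
  if (any(too_many)) {
    return(paste0("assay element(s) ", paste(which(too_many), collapse = ", "),
                  " have more than ", max_dims, " dimensions"))
  }
  NULL
}

check_assay_dims(list(array(0, dim = rep(2, 4))))  # NULL
check_assay_dims(list(array(0, dim = rep(2, 5))))  # a character message
```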

Cheers,
Pete



Re: [Bioc-devel] Use of Matrix inside SummarizedExperiment

2016-01-26 Thread Peter Hickey
Thanks, Hervé!

On 26/01/2016, Hervé Pagès <hpa...@fredhutch.org> wrote:
> Hi Pete,
>
> On 01/25/2016 12:32 PM, Peter Hickey wrote:
>> The Matrix virtual class in the Matrix package seems to mostly work as
>> an assays element in a SummarizedExperiment. This is especially useful
>> for data that can be efficiently represented as a sparse matrix, e.g.,
>> using the dgCMatrix class.
>>
>> My understanding is that this works because the (concrete subclasses
>> of) Matrix implement the necessary basic S4 methods to form a basic,
>> matrix-like API. However, there are a couple of edge cases that I'm
>> hoping it might be possible to smoothen out. Ideally, I'd love if this
>> could work for any class that implements a minimal matrix-like API
>> (I'm currently experimenting with such a class) and not just for the
>> Matrix virtual class and its concrete subclasses. From reading the
>> SummarizedExperiment code, it looks like the minimal methods required
>> for an element of a (concrete subclass of) Assays object would be dim,
>> dimnames, [, [<-, rbind, cbind, length. I suppose that if any
>> additional methods are added for the Assays virtual class (e.g., I
>> have an almost-complete combine,SummarizedExperiment-method that calls
>> a combine,Assays-method) then these matrix-like objects must also have
>> such methods defined to ensure relatively straightforward inheritance.
>>
>> Here are a couple of instances where a matrix and a Matrix behave
>> (understandably) differently but where it would be nice if it "just
>> worked". There may well be others, but I'd be interested to know
>> whether this is worth further pursuing.
>>
>> library(SummarizedExperiment)
>> library(Matrix)
>> m <- matrix(1:10, ncol = 2)
>> m2 <- Matrix(m)
>>
>> # SummarizedExperiment constructor has specialised matrix method.
>> se <- SummarizedExperiment(m)
>> # This won't work because there is no Matrix specialisation
>> se2 <- SummarizedExperiment(m2)
>> # But can get around this by wrapping the Matrix in a SimpleList to defer to
>> # the SummarizedExperiment,SimpleList-method
>> se2 <- SummarizedExperiment(SimpleList(m2))
>
> Note that wrapping the Matrix in an ordinary list also works.
>
>> # I guess the only way around this is to write a SummarizedExperiment method
>> # for every matrix-like class, which might be too much overhead for the
>> # SummarizedExperiment package to maintain. Perhaps there is another solution,
>> # e.g., try wrapping the input in a call to SimpleList if no method found and
>> # then deferring to the SimpleList method? Could be too messy to be worth it ...
>
> The method for matrix already does this wrapping into a SimpleList
> object and then defers to the method for SimpleList. I just
> replaced the current method for matrix by a method for ANY that does
> exactly the same thing. With this change, SummarizedExperiment() takes
> any matrix-like object.
>
>>
>> # assay<- dispatches on value (which must be a matrix)
>> assay(se) <- assay(se)
>> # Won't work because there is no Matrix specialisation
>> assay(se2) <- assay(se2)
>> # But using assays() does work
>> assays(se2)[[1]] <- assays(se2)[[1]]
>> # Could value be dropped from the assay<- signature and the object validated
>> # during/following the consequent call to assays<-?
>
> That makes a lot of sense. Having the assay() setter dispatch on 'x',
> 'i', and 'value' has no real benefit. Dispatching on 'x' and 'i' is
> enough and allows the assay() setter to take any matrix-like object as
> long as the resulting SummarizedExperiment object is valid.
>
> These 2 changes are in SummarizedExperiment 1.1.17.
>
> Thanks for the suggestions,
> H.
>
>>
>> Cheers,
>> Pete
>>
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>


Re: [Bioc-devel] do SummarizedExperiments really need colnames?

2015-12-06 Thread Peter Hickey
While on the topic of SummarizedExperiment colnames, the circumstances in
which these are stripped from the assays and overridden by colData are
confusing to me, particularly case 2 below (a warning in case 3 might be
useful too).

> m1 <- matrix(1:10, ncol = 2)
> m2 <- m1
> colnames(m2) <- c("A", "B")
>
> se1 <- SummarizedExperiment(m1, colData = DataFrame(row.names = c("A", "B")))
> se2 <- SummarizedExperiment(m2)
> se3 <- SummarizedExperiment(m2, colData = DataFrame(row.names = c("C", "D")))
>
> # Case 1: colnames correctly set to c("A", "B") and stripped from assays
> colnames(se1)
[1] "A" "B"
> se1@assays[[1]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> # Case 2: colnames correctly set to c("A", "B") but not stripped from assays
> colnames(se2)
[1] "A" "B"
> se2@assays[[1]]
     A  B
[1,] 1  6
[2,] 2  7
[3,] 3  8
[4,] 4  9
[5,] 5 10
> # Case 3: colnames set to c("C", "D") (without warning about mismatch) and
> # stripped from assays
> se3@assays[[1]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
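
For the third case, the kind of warning I have in mind could be
sketched as below. The helper check_colnames() is hypothetical and works
on plain character vectors rather than on SummarizedExperiment
internals; it only illustrates when a warning would fire:

```r
# Hypothetical helper (illustration only): compare the colnames of an assay
# with the rownames of colData and warn if both are present but differ.
check_colnames <- function(assay_colnames, coldata_rownames) {
  if (is.null(assay_colnames) || is.null(coldata_rownames))
    return(invisible(TRUE))
  ok <- identical(as.character(assay_colnames), as.character(coldata_rownames))
  if (!ok)
    warning("assay colnames differ from rownames of colData; ",
            "using rownames of colData")
  invisible(ok)
}

m2 <- matrix(1:10, ncol = 2, dimnames = list(NULL, c("A", "B")))
check_colnames(colnames(m2), c("A", "B"))  # silent: names agree
check_colnames(colnames(m2), c("C", "D"))  # warns: names disagree
```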

> sessionInfo()
R Under development (unstable) (2015-11-28 r69714)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] SummarizedExperiment_1.1.5 Biobase_2.31.1
[3] GenomicRanges_1.23.4 GenomeInfoDb_1.7.3
[5] IRanges_2.5.9 S4Vectors_0.9.11
[7] BiocGenerics_0.17.2

loaded via a namespace (and not attached):
[1] zlibbioc_1.17.0 XVector_0.11.1



Re: [Bioc-devel] segfault when using RleList in DataFrames

2015-12-06 Thread Peter Hickey
Hi Leonard,

I'm seeing what I think is a related problem in the devel branch. I
think it derives from some issue with List-based classes. E.g., a
simplified version of your example errors for me (although without a
segfault):

> library(IRanges)
# snip - this produces a possibly related warning on my machine
"Warning message: multiple methods tables found for ‘unlist’"

> x <- RleList(IntegerList(vector("list", 3)))
> x
RleList of length 3
Error in show(as.list(head(object, k))) :
  error in evaluating the argument 'object' in selecting a method for
function 'show': Error: evaluation nested too deeply: infinite
recursion / options(expressions=)?

I first noticed it this morning when working with GRangesList objects, e.g.:

> library(GenomicRanges)
# snip - this now twice produces the  "Warning message: multiple
methods tables found for ‘unlist’" message

> GRangesList(GRanges())
GRangesList object of length 1:
[[1]]
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

> sessionInfo()
R Under development (unstable) (2015-11-28 r69714)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.23.4 GenomeInfoDb_1.7.3   IRanges_2.5.9
[4] S4Vectors_0.9.11 BiocGenerics_0.17.2

loaded via a namespace (and not attached):
[1] zlibbioc_1.17.0 XVector_0.11.1

Cheers,
Pete

> Hi all,
>
> I ran into problems when using an RleList as column in a DataFrame
> object (see example below).
>
> Many thanks in advance for your help.
>
> Leonard
>
>
>> sessionInfo()
> R version 3.2.2 (2015-08-14)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
> [1] IRanges_2.4.4   S4Vectors_0.8.3 BiocGenerics_0.16.1
>>
>> df <- DataFrame(ID = 1:3)
>> x <- RleList(IntegerList(vector("list", 3)))
>> df$rle <- x
>> df
> DataFrame with 3 rows and 2 columns
>
>  *** caught segfault ***
> address 0x9b00364, cause 'memory not mapped'
>
> Traceback:
>  1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>  2: .Call2("Rle_getStartEndRunAndOffset", x, start, end, PACKAGE = "S4Vectors")
>  3: S4Vectors:::getStartEndRunAndOffset(x, start(solved_SEW), end(solved_SEW))
>  4: .local(x, ...)
>  5: window(x, start = 1L, width = n)
>  6: window(x, start = 1L, width = n)
>  7: .local(x, ...)
>  8: head(x, 3)
>  9: head(x, 3)
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> Selection:


Re: [Bioc-devel] is.unsorted method for GRanges objects

2015-11-03 Thread Peter Hickey
Hi Michael,

Sorry, I took this off-list with Hervé. I've written a prototype
is.unsorted,GenomicRanges-method.

I structured it following the lead of order,GenomicRanges-method, so
there's an outer R-level that "translates" the GenomicRanges object to
4 integer vectors (actually the slowest part of the whole process, I
think). These 4 integer vectors are then passed to a lower-level
function that does the actual looped comparisons. Side note, this
lower-level function, isUnsortedIntegerQuads(), is perhaps better
housed in S4Vectors.
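
To illustrate what the lower-level comparison does, here is a pure-R
sketch. It is not the prototype itself (which uses compiled code and
stops at the first offending pair); the function name
is_unsorted_quads() and the argument names k1 to k4 are made up, and the
four keys are simply compared lexicographically in the order given:

```r
# Pure-R sketch (illustration only). k1, k2, k3, k4 are equal-length integer
# vectors; element i is the quad (k1[i], k2[i], k3[i], k4[i]) and quads are
# compared lexicographically.
is_unsorted_quads <- function(k1, k2, k3, k4, strictly = FALSE) {
  n <- length(k1)
  stopifnot(length(k2) == n, length(k3) == n, length(k4) == n)
  if (n <= 1L)
    return(FALSE)
  i <- seq_len(n - 1L)
  j <- i + 1L
  # Sign of the first non-zero difference between quad i + 1 and quad i.
  cmp <- sign(as.numeric(k1[j]) - k1[i])
  for (v in list(k2, k3, k4)) {
    tie <- cmp == 0
    cmp[tie] <- sign(as.numeric(v[j][tie]) - v[i][tie])
  }
  if (strictly) any(cmp <= 0) else any(cmp < 0)
}

seqid  <- c(1L, 1L, 2L, 2L)
strand <- c(1L, 1L, 1L, 2L)
start  <- c(5L, 5L, 1L, 1L)
width  <- c(3L, 4L, 2L, 2L)
is_unsorted_quads(seqid, strand, start, width)       # FALSE
is_unsorted_quads(rev(seqid), strand, start, width)  # TRUE
```

Unlike this vectorised version, a C-level loop can return as soon as it
sees one out-of-order pair and needs no temporary vectors.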

isUnsortedIntegerQuads() currently calls an Rcpp function, but I'll
convert that to a plain C function callable using .Call2() tomorrow
morning (Melbourne, Australia time).

I'll then try to figure out the necessary plumbing to get it all up
and running as part of the GenomicRanges package. I hope to finish it
all and send through a patch tomorrow, but it depends on what else hits
my desk. (I'll add docs and unit tests if the patch is considered
helpful.)

Cheers,
Pete

On 3 November 2015 at 22:41, Michael Lawrence <lawrence.mich...@gene.com> wrote:
> If we're going to do that, it brings up the question of whether
> is.unsorted() could be made to handle multiple vectors like order(). It
> would be nice to implement that logic only once. Suggestions for the API?
> New function? Additional argument? Patches welcome ;)
>
> Michael
>
> On Mon, Nov 2, 2015 at 10:45 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>>
>> OK. Thanks Pete for the timings. The fact that the relative difference
>> in speed is larger for small n in your brief tests is because one
>> performs roughly in n*log(n) (quicksort-based) and the other one is
>> linear in time. Which is why I assumed (but without doing any testing)
>> that the latter was going to perform better. Anyway it seems that there
>> is just too much overhead involved in that solution to make it a good
>> candidate.
>>
>> So back to square one and to the business of trying to come up with
>> something even more efficient than is.unsorted(order(x)) for
>> GenomicRanges objects. It's indeed important that is.unsorted() be
>> as fast and as memory efficient as possible since it is typically
>> used as a quick/cheap way of checking whether a costly sort is
>> required or not (e.g. with something like if (is.unsorted(x))
>> x <- sort(x)).
>>
>> So it seems that unfortunately we won't be able to do it without
>> writing some C code. Your proposal sounds very reasonable to me. It
>> will perform in linear time (in the worst case) and avoid any copy
>> of the object (that we get with the expensive calls to head() and
>> tail() in my solution). So will be much faster than the 2 R solutions
>> whatever n is. Should work on GenomicRanges objects, not just GRanges
>> (this is easily achieved by passing S4Vectors:::decodeRle(seqnames(x)),
>> start(x), width(x), and S4Vectors:::decodeRle(strand(x)) to the .Call
>> entry point).
>>
>> I'll take your patch if you want to work on this or I can add this
>> to GenomicRanges, let me know. We should probably take this off-list.
>>
>> Thanks,
>> H.
>>
>>
>> On 11/02/2015 09:43 PM, Peter Hickey wrote:
>>>
>>> Thanks for everyone's input.
>>>
>>> @Hervé FWIW, the below benchmark suggests that unfortunately this is a
>>> fair bit slower than is.unsorted(order(gr)) when the length of the
>>> GRanges object is < 10,000,000 (the relative difference in speed is
>>> larger for small n in my brief tests; I didn't check n >
>>> 10,000,000)
>>>
>>> ```r
>>> # GenomicRanges_1.23.1
>>> library(GenomicRanges)
>>>
>>> # Simulate some random ranges
>>> sim_gr <- function(n) {
>>>GRanges(seqnames = sample(paste0("chr", 1:22), n, replace = TRUE),
>>>ranges = IRanges(sample(n * 10, size = n, replace = TRUE),
>>>width = runif(n, 1, 10)),
>>>strand = sample(c("+", "-", "*"), n, replace = TRUE),
>>>seqinfo = Seqinfo(paste0("chr", 1:22)))
>>>
>>> }
>>>
>>> gr <- sim_gr(1000)
>>>
>>> herve <- function(x, na.rm=FALSE, strictly=FALSE) {
>>>if (length(x) <= 1L)
>>>  return(FALSE)
>>>x1 <- head(x, n=-1)
>>>x2 <- tail(x, n=-1)
>>>if (strictly)
>>>  return(any(x1 >= x2))
>>>any(x1 > x2)
>>> }
>>>
>>> # 22 seconds
>>> system.time(herve(gr))
>>> # 11.3 seconds
>>>

Re: [Bioc-devel] is.unsorted method for GRanges objects

2015-11-02 Thread Peter Hickey
Thanks for everyone's input.

@Hervé FWIW, the below benchmark suggests that unfortunately this is a
fair bit slower than is.unsorted(order(gr)) when the length of the
GRanges object is < 10,000,000 (the relative difference in speed is
larger for small n in my brief tests; I didn't check n >
10,000,000)

```r
# GenomicRanges_1.23.1
library(GenomicRanges)

# Simulate some random ranges
sim_gr <- function(n) {
  GRanges(seqnames = sample(paste0("chr", 1:22), n, replace = TRUE),
          ranges = IRanges(sample(n * 10, size = n, replace = TRUE),
                           width = runif(n, 1, 10)),
          strand = sample(c("+", "-", "*"), n, replace = TRUE),
          seqinfo = Seqinfo(paste0("chr", 1:22)))
}

gr <- sim_gr(1000)

herve <- function(x, na.rm=FALSE, strictly=FALSE) {
  if (length(x) <= 1L)
    return(FALSE)
  x1 <- head(x, n=-1)
  x2 <- tail(x, n=-1)
  if (strictly)
    return(any(x1 >= x2))
  any(x1 > x2)
}

# 22 seconds
system.time(herve(gr))
# 11.3 seconds
system.time(is.unsorted(order(gr)))

# And when it's already sorted
gr2 <- sort(gr)

# 4.3 seconds
system.time(herve(gr2))
# 0.2 seconds
system.time(is.unsorted(order(gr2)))
```

Roughly, it looks like the head(), tail() calls take approximately 1/4
of the time each, while the any() call takes the remaining 1/2 of the
time. I was thinking it might be possible to make this quite fast by
looping over the GRanges object at the C-level and breaking out of the
loop if gr[i+1] <= gr[i] or gr[i+1] < gr[i], as appropriate. Does this
sound reasonable?
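
In R terms, the logic of that loop would be roughly the following
sketch (the function name is_unsorted_scan() is made up, and it is shown
on a plain integer vector; the real thing would do the same comparison
at the C level on the components of the GRanges object):

```r
# Pure-R sketch of the early-exit scan (illustration only). `x` is any vector
# for which `<` and `<=` are defined element-wise.
is_unsorted_scan <- function(x, strictly = FALSE) {
  for (i in seq_len(max(length(x) - 1L, 0L))) {
    out_of_order <- if (strictly) x[i + 1L] <= x[i] else x[i + 1L] < x[i]
    if (out_of_order)
      return(TRUE)  # stop at the first offending pair
  }
  FALSE
}

is_unsorted_scan(c(1L, 2L, 2L, 3L))                   # FALSE
is_unsorted_scan(c(1L, 2L, 2L, 3L), strictly = TRUE)  # TRUE
is_unsorted_scan(c(3L, 1L, 2L))                       # TRUE
```

The attraction is that unsorted input is usually detected after a
handful of comparisons, and sorted input costs one pass with no copies.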

Cheers,
Pete

On 3 November 2015 at 14:06, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>
>
> On Mon, Nov 2, 2015 at 6:39 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>>
>> Hi,
>>
>> @Pete:
>>
>> 2a- I would just compare pairs of adjacent elements, taking
>> advantage of the fact that <= is vectorized and cheap. So something
>> like:
>>
>>   setMethod("is.unsorted", "Vector",
>>       function(x, na.rm=FALSE, strictly=FALSE)
>>       {
>>           if (length(x) <= 1L)
>>               return(FALSE)
>>           x1 <- head(x, n=-1)
>>           x2 <- tail(x, n=-1)
>>           if (strictly)
>>               return(any(x1 >= x2))
>>           any(x1 > x2)
>>       }
>>   )
>>
>> Since this will work on any Vector derivative for which <= and
>> subsetting are defined, it's a good candidate for being the default
>> "is.unsorted" method for Vector objects. I'll add it to S4Vectors.
>>
>> 2b- The semantics of is.unsorted() on a GRangesList object or any
>> List object in general should be sapply(x, is.unsorted), for
>> consistency with order(), sort(), etc:
>>
>>   > sort(IntegerList(4:3, 1:-2))
>>   IntegerList of length 2
>>   [[1]] 3 4
>>   [[2]] -2 -1 0 1
>>
>> I'll add this too.
>>
>> 2c - That won't be needed. The default method for Vector objects will
>> work on RangedSummarizedExperiment objects (<= and 1D subsetting are
>> defined and along the same dimension).
>>
>> @Gabe:
>>
>> See ?`GenomicRanges-comparison` for how the order of genomic ranges
>> is defined.
>>
>> @Michael:
>>
>> Calling base::.gt() in a loop sounds indeed very inefficient. What
>> about having base::is.unsorted() do the above on "objects" instead?
>> base::.gt() seems to also require that the object is subsettable so
>> the requirements are the same.
>>
>> Then we wouldn't need the default "is.unsorted" method for Vector
>> objects, only a default "anyNA" method for Vector objects that always
>> returns FALSE (plus some specific ones for Rle and other Vector
>> derivatives that support NAs).
>>
>
> Yea, I assumed it did what you suggested before looking at it. It would be
> the right place to fix this.
>
>>
>> Thanks,
>> H.
>>
>>
>> On 11/02/2015 05:35 PM, Michael Lawrence wrote:
>>>
>>> The notion of sortedness is already formally defined, which is why we have
>>> an order method, etc.
>>>
>>> The base is.unsorted implementation for "objects" ends up calling
>>> base::.gt() for each adjacent pair of elements, which is likely too slow to
>>> be practical, so we probably should add a custom method.
>>>
>>> This does bring up the tangential question of whether GenomicRanges should
>>> have an anyNA method that returns FALSE (and similarly an is.na() method),
>>> although we have
[Bioc-devel] is.unsorted method for GRanges objects

2015-11-02 Thread Peter Hickey
Hi all,

I sometimes want to test whether a GRanges object (or some object with
a GRanges slot, e.g., a SummarizedExperiment object) is (un)sorted.
There is no is.unsorted,GRanges-method or, rather, it defers to
is.unsorted,ANY-method. I'm unsure that deferring to the
is.unsorted,ANY-method is what is really desired when a user calls
is.unsorted on a GRanges object, and it will certainly return a
(possibly unrelated) warning - "In is.na(x) : is.na() applied to
non-(list or vector) of type 'S4'".


For this reason, I tend to use is.unsorted(order(x)) when x is a
GRanges object. This workaround is also used, for example, by minfi
(https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R#L121).
However, this is slow because it essentially sorts the object to test
whether it is already sorted.
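
The same idiom on plain vectors shows both why it works and why it is
wasteful. This is a base-R sketch with made-up data, using two integer
keys in place of a GRanges object:

```r
chr    <- c(1L, 1L, 2L, 2L)
start  <- c(10L, 20L, 5L, 1L)
start2 <- c(10L, 20L, 1L, 5L)

# order() computes the full sorting permutation, which costs a sort. The data
# are sorted exactly when that permutation is already 1, 2, ..., n.
order(chr, start)                # 1 2 4 3
is.unsorted(order(chr, start))   # TRUE: the last two elements are swapped
is.unsorted(order(chr, start2))  # FALSE
```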


So, to my questions:

1. Have I overlooked a fast way to test whether a GRanges object is sorted?
2a. Could an is.unsorted,GenomicRanges-method be added to the
GenomicRanges package? Side note, I'm unsure at which level to define
this method, e.g., GRanges vs. GenomicRanges.
2b. Is it possible to have a sensible definition and implementation
for is.unsorted,GRangesList-method?
2c. Could an is.unsorted,RangedSummarizedExperiment-method be added to
the SummarizedExperiment package?

I started working on a patch for 2a/2c, but wanted to ensure I hadn't
overlooked something obvious. Also, I'm sure 2a/2b/2c could be written
much more efficiently at the C-level but I'm afraid this might be a
bit beyond my abilities to integrate nicely with the existing code.

Thanks,
Pete



[Bioc-devel] Help using Git with Bioconductor SVN repositories

2015-10-18 Thread Peter Hickey
I used to use the git-svn bridge for my GenomicTuples package, which I
develop on GitHub. Several months ago I attempted to switch to the new
method described at
http://bioconductor.org/developers/how-to/git-mirrors/ but made a
complete mess of it. This wasn't so important at the time since I
didn't actually have any changes to add, but now I want to do some
development on the package so I'm revisiting the issue. Unfortunately
I'm still making a complete mess of it and would appreciate some help.

I first tried to use my existing GitHub-hosted repo
(https://github.com/PeteHaitch/GenomicTuples). This resulted in a
horrendous pile of merge conflicts that scared me off.

Then I tried forking the repo available from bioconductor-mirror
(https://github.com/Bioconductor-mirror/GenomicTuples). This resulted
in an error due to a non-existent devel branch.

I've posted the complete set of commands and output to
https://gist.github.com/PeteHaitch/c633527fc4610de1832e

Thanks in advance for any help with my self-inflicted mess,
Pete



[Bioc-devel] A method for combining SummarizedExperiment objects

2015-10-14 Thread Peter Hickey
I often find myself with multiple `SE` objects (I'm using `SE` as a
shorthand for the `SummarizedExperiment0` and `RangedSummarizedExperiment`
classes), each with different samples but possibly non-overlapping
features/ranges. Currently, it is difficult to combine these objects:
`rbind()` can only combine objects with the same samples but different
features/ranges and `cbind()` can only combine objects with the same
features/ranges but different samples. I think it would be useful to have a
"combine" method for `SE` objects that handles the situation where each
object has different samples but with possibly non-overlapping
features/ranges.

I've written a first pass at a method to do this at
https://gist.github.com/PeteHaitch/8993b096cfa7ccd08c13.
Is this a method other people find themselves in need of and, if so, might
we add something like this to the SummarizedExperiment package? As noted in
the gist, there are a few things I'd like to address to make it more robust
and complete (probably some optimisations too).

Cheers,
Pete



Re: [Bioc-devel] A method for combining SummarizedExperiment objects

2015-10-14 Thread Peter Hickey
Sorry, the URL may have been mangled. It's
https://gist.github.com/PeteHaitch/8993b096cfa7ccd08c13

On Thu, 15 Oct 2015 at 12:52 Peter Hickey <peter.hic...@gmail.com> wrote:

> I often find myself with multiple `SE` objects (I'm using `SE` as a
> shorthand for the `SummarizedExperiment0` and `RangedSummarizedExperiment`
> classes), each with different samples but possibly non-overlapping
> features/ranges. Currently, it is difficult to combine these objects:
> `rbind()` can only combine objects with the same samples but different
> features/ranges and `cbind()` can only combine objects with the same
> features/ranges but different samples. I think it would be useful to have a
> "combine" method for `SE` objects that handles the situation where each
> object has different samples but with possibly non-overlapping
> features/ranges.
>
> I've written a first pass at a method to do this at
> https://gist.github.com/PeteHaitch/8993b096cfa7ccd08c13.
> Is this a method other people find themselves in need of and, if so, might
> we add something like this to the SummarizedExperiment package? As noted in
> the gist, there are a few things I'd like to address to make it more robust
> and complete (probably some optimisations too).
>
> Cheers,
> Pete
>



[Bioc-devel] Minor (impossible to trigger?) bug in assay, SummarizedExperiment0, character-method

2015-10-09 Thread Peter Hickey
See 
https://github.com/Bioconductor-mirror/SummarizedExperiment/blob/744eea36e9f8ee4daea00baa7a1d9eea68d957ca/R/SummarizedExperiment0-class.R#L210

I think it should be 'i = assayNames(x)[1]'.

I say it is impossible to trigger because I don't think this method is
ever called since if 'i' is missing the
assay,SummarizedExperiment0,missing-method will be called, but I point
it out in case I'm mistaken.

Cheers,
Pete



Re: [Bioc-devel] Adding a lengths() method to List class

2015-10-02 Thread Peter Hickey
Thanks, Michael!

On Fri, 2 Oct 2015 at 13:54 Michael Lawrence <lawrence.mich...@gene.com>
wrote:

> Change was made. Should dispatch to length and [[ methods.
>
> On Wed, Sep 30, 2015 at 9:37 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>
>> On 09/30/2015 05:28 PM, Michael Lawrence wrote:
>>
>>> It wasn't a conscious choice, but it would slow things down a bit. Not
>>> by much though, since we're already attempting dispatch on length(). I
>>> can make the change.
>>>
>>
>> That would be great. Thanks Michael!
>>
>> H.
>>
>>
>>> On Wed, Sep 30, 2015 at 1:33 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>>>
>>> Hi Michael,
>>>
>>> I was expecting this to just work:
>>>
>>>base::lengths(IntegerList(1:4, 1:6))
>>>
>>> but it doesn't:
>>>
>>>Error in base::lengths(IntegerList(1:4, 1:6)) :
>>>  'x' must be a list or atomic vector
>>>
>>> The man page says:
>>>
>>>   This function loops over ‘x’ and returns a compatible vector
>>>   containing the length of each element in ‘x’.  Effectively,
>>>   ‘length(x[[i]])’ is called for all ‘i’, so any methods on ‘length’
>>>   are considered.
>>>
>>> If length(x[[i]]) is called for all i then it should work on any object
>>> for which [[ is defined. Note that this is what happens with
>>> base::sapply(), base::mapply(), etc... they all use [[ internally.
>>>
>>> Do you know of any reason why lengths() doesn't do this?
>>>
>>> Thanks,
>>> H.
>>>
>>>
>>> On 09/28/2015 09:51 PM, Michael Lawrence wrote:
>>>
>>> That is the plan. Note that we already have elementLengths() that serves
>>> the same purpose. It was the direct inspiration for lengths().
>>>
>>> On Mon, Sep 28, 2015 at 9:41 PM, Peter Hickey <peter.hic...@gmail.com> wrote:
>>>
>>> The lengths() function was added in R 3.2 to "get the length of each
>>> element of a list or atomic vector (is.atomic) as an integer or numeric
>>> vector." It seems useful to me to have also a similar method defined for
>>> the S4Vectors::List class (and subclasses). What do others think?
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpa...@fredhutch.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>>
>>>
>>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpa...@fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:(206) 667-1319
>>
>
>



Re: [Bioc-devel] Missed change in S4Vectors

2015-08-12 Thread Peter Hickey
Thanks, Hervé, that has indeed fixed my problem.
Pete

On Wed, 12 Aug 2015 at 02:12 Hervé Pagès <hpa...@fredhutch.org> wrote:

 Hi Peter,

 Yesterday I realized that I didn't bump S4Vectors version properly
 after I moved selectSome() from BiocGenerics to S4Vectors so this
 could explain the problem you're seeing. I think that if you just
 re-install S4Vectors locally (without making the change you proposed)
 the issue will go away. Hopefully...

 H.

 On 08/10/2015 06:46 PM, Peter Hickey wrote:
  Sorry, that should say once I made the proposed change to S4Vectors,
  not IRanges.
 
 
  On Tue, 11 Aug 2015 8:51 am Peter Hickey <peter.hic...@gmail.com> wrote:
 
  Hi Hervé,
 
 
  Hmm, sorry I may have misdiagnosed my problem. I was having problems
  with some code in the bsseq vignette.
 
 
  The following demonstrates what was happening:
 
 
suppressPackageStartupMessages(library(bsseq))
 
  Warning message:
 
  In .recacheSubclasses(def@className, def, doSubclasses, env) :
 
 undefined subclass externalRefMethod of class
  expressionORfunction; definition not updated
 
data(BS.chr22)
 
head(seqnames(BS.chr22), n = 4)
 
  factor-Rle of length 4 with 1 run
 
  Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
 
 object 'labeledLine' not found
 
sessionInfo()
 
  R version 3.2.1 (2015-06-18)
 
  Platform: x86_64-apple-darwin13.4.0 (64-bit)
 
  Running under: OS X 10.10.4 (Yosemite)
 
 
  locale:
 
  [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
 
 
  attached base packages:
 
  [1] stats4    parallel  stats     graphics  grDevices utils     datasets
 
  [8] methods   base
 
 
  other attached packages:
 
  [1] bsseq_1.5.5SummarizedExperiment_0.3.3
 
  [3] Biobase_2.29.1 GenomicRanges_1.21.18
 
  [5] GenomeInfoDb_1.5.10IRanges_2.3.18
 
  [7] S4Vectors_0.7.12   matrixStats_0.14.2
 
  [9] BiocGenerics_0.15.6
 
 
  loaded via a namespace (and not attached):
 
[1] locfit_1.5-9.1   Rcpp_0.12.0  lattice_0.20-33  gtools_3.5.0
 
[5] chron_2.3-47 plyr_1.8.3   grid_3.2.1   magrittr_1.5
 
[9] scales_0.2.5 stringi_0.5-5reshape2_1.4.1
  XVector_0.9.1
 
  [13] data.table_1.9.4 tools_3.2.1  stringr_1.0.0munsell_0.4.2
 
  [17] colorspace_1.2-6
 
 
  Once I made that proposed change to IRanges (locally) and re-install
  then it works as expected.
 
 
  Any ideas what I'm doing wrong?
 
 
  Thanks,
 
  Pete
 
 
 
  Hi Peter,
 
 
  Starting with S4Vectors 0.7.12, labeledLine() belongs to S4Vectors so
 
  using the triple colon should not be necessary (and doing so will
 
  actually trigger a note from R CMD check). Can you provide more
 
  details on why you need this?
 
 
  Thanks,
 
  H.
 
 
 
  On 08/09/2015 09:16 PM, Peter Hickey wrote:
 
  Hi Hervé,
 
 
  I was having trouble with some devel code of mine and tracked it
  down to some recent updates moving the internal utility
  labeledLine() from BiocGenerics to S4Vectors. The labeledLine()
  internal function wasn’t being found when called in certain
  circumstances. Here’s an svn diff to fix the bug in the S4Vectors
  package.
 
 
  Cheers,
 
  Pete
 
 
  Index: DESCRIPTION
  ===
  --- DESCRIPTION (revision 107278)
  +++ DESCRIPTION (working copy)
  @@ -8,7 +8,7 @@
     interest (e.g. DataFrame, Rle, and Hits) are implemented in the
     S4Vectors package itself (many more are implemented in the IRanges
     package and in other Bioconductor infrastructure packages).
  -Version: 0.7.12
  +Version: 0.7.13
   Author: H. Pages, M. Lawrence and P. Aboyoun
   Maintainer: Bioconductor Package Maintainer maintai...@bioconductor.org
   biocViews: Infrastructure, DataRepresentation
  Index: R/List-class.R
  ===
  --- R/List-class.R (revision 107278)
  +++ R/List-class.R (working copy)
  @@ -86,7 +86,7 @@
                 cat(classNameForDisplay(object), " of length ", lo,
                     "\n", sep = "")
                 if (!is.null(names(object)))
  -                  cat(labeledLine("names", names(object)))
  +                  cat(S4Vectors:::labeledLine("names", names(object)))
             })
 
 
 
  
 
  Peter Hickey,
 
  PhD Student/Research Assistant,
 
  Bioinformatics Division,
 
  Walter and Eliza Hall Institute of Medical Research,
 
  1G Royal Parade, Parkville, Vic 3052, Australia.
 
  Ph: +613 9345 2324
 
 
  hic

Re: [Bioc-devel] Missed change in S4Vectors

2015-08-10 Thread Peter Hickey
Hi Hervé,


Hmm, sorry I may have misdiagnosed my problem. I was having problems with
some code in the bsseq vignette.


The following demonstrates what was happening:


> suppressPackageStartupMessages(library(bsseq))
Warning message:
In .recacheSubclasses(def@className, def, doSubclasses, env) :
  undefined subclass "externalRefMethod" of class "expressionORfunction";
  definition not updated
> data(BS.chr22)
> head(seqnames(BS.chr22), n = 4)
factor-Rle of length 4 with 1 run
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object 'labeledLine' not found
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] bsseq_1.5.5                SummarizedExperiment_0.3.3
[3] Biobase_2.29.1             GenomicRanges_1.21.18
[5] GenomeInfoDb_1.5.10        IRanges_2.3.18
[7] S4Vectors_0.7.12           matrixStats_0.14.2
[9] BiocGenerics_0.15.6

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1   Rcpp_0.12.0      lattice_0.20-33  gtools_3.5.0
 [5] chron_2.3-47     plyr_1.8.3       grid_3.2.1       magrittr_1.5
 [9] scales_0.2.5     stringi_0.5-5    reshape2_1.4.1   XVector_0.9.1
[13] data.table_1.9.4 tools_3.2.1      stringr_1.0.0    munsell_0.4.2
[17] colorspace_1.2-6


Once I made that proposed change to IRanges (locally) and re-install then
it works as expected.


Any ideas what I'm doing wrong?


Thanks,

Pete



Hi Peter,


Starting with S4Vectors 0.7.12, labeledLine() belongs to S4Vectors so

using the triple colon should not be necessary (and doing so will

actually trigger a note from R CMD check). Can you provide more

details on why you need this?


Thanks,

H.



On 08/09/2015 09:16 PM, Peter Hickey wrote:

Hi Hervé,


I was having trouble with some devel code of mine and tracked it down to
some recent updates moving the internal utility labeledLine() from
BiocGenerics to S4Vectors. The labeledLine() internal function wasn’t being
found when called in certain circumstances. Here’s an svn diff to fix the
bug in the S4Vectors package.


Cheers,

Pete


Index: DESCRIPTION
===
--- DESCRIPTION (revision 107278)
+++ DESCRIPTION (working copy)
@@ -8,7 +8,7 @@
   interest (e.g. DataFrame, Rle, and Hits) are implemented in the
   S4Vectors package itself (many more are implemented in the IRanges
   package and in other Bioconductor infrastructure packages).
-Version: 0.7.12
+Version: 0.7.13
 Author: H. Pages, M. Lawrence and P. Aboyoun
 Maintainer: Bioconductor Package Maintainer maintai...@bioconductor.org
 biocViews: Infrastructure, DataRepresentation
Index: R/List-class.R
===
--- R/List-class.R (revision 107278)
+++ R/List-class.R (working copy)
@@ -86,7 +86,7 @@
               cat(classNameForDisplay(object), " of length ", lo,
                   "\n", sep = "")
               if (!is.null(names(object)))
-                  cat(labeledLine("names", names(object)))
+                  cat(S4Vectors:::labeledLine("names", names(object)))
           })





Peter Hickey,

PhD Student/Research Assistant,

Bioinformatics Division,

Walter and Eliza Hall Institute of Medical Research,

1G Royal Parade, Parkville, Vic 3052, Australia.

Ph: +613 9345 2324


hic...@wehi.edu.au

http://www.wehi.edu.au


__

The information in this email is confidential and intended solely for the
addressee.

You must not disclose, forward, print or use it without the permission of
the sender.

__



-- 

Hervé Pagès


Program in Computational Biology

Division of Public Health Sciences

Fred Hutchinson Cancer Research Center

1100 Fairview Ave. N, M1-B514

P.O. Box 19024

Seattle, WA 98109-1024


E-mail: hpa...@fredhutch.org

Phone:  (206) 667-5791

Fax:(206) 667-1319

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Missed change in S4Vectors

2015-08-10 Thread Peter Hickey
Sorry, that should say once I made the proposed change to S4Vectors, not
IRanges.

On Tue, 11 Aug 2015 8:51 am Peter Hickey peter.hic...@gmail.com wrote:

 Hi Hervé,


 Hmm, sorry I may have misdiagnosed my problem. I was having problems with
 some code in the bsseq vignette.


 The following demonstrates what was happening:


 > suppressPackageStartupMessages(library(bsseq))
 Warning message:
 In .recacheSubclasses(def@className, def, doSubclasses, env) :
   undefined subclass "externalRefMethod" of class "expressionORfunction";
   definition not updated
 > data(BS.chr22)
 > head(seqnames(BS.chr22), n = 4)
 factor-Rle of length 4 with 1 run
 Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
   object 'labeledLine' not found
 > sessionInfo()
 R version 3.2.1 (2015-06-18)
 Platform: x86_64-apple-darwin13.4.0 (64-bit)
 Running under: OS X 10.10.4 (Yosemite)

 locale:
 [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices utils     datasets
 [8] methods   base

 other attached packages:
 [1] bsseq_1.5.5                SummarizedExperiment_0.3.3
 [3] Biobase_2.29.1             GenomicRanges_1.21.18
 [5] GenomeInfoDb_1.5.10        IRanges_2.3.18
 [7] S4Vectors_0.7.12           matrixStats_0.14.2
 [9] BiocGenerics_0.15.6

 loaded via a namespace (and not attached):
  [1] locfit_1.5-9.1   Rcpp_0.12.0      lattice_0.20-33  gtools_3.5.0
  [5] chron_2.3-47     plyr_1.8.3       grid_3.2.1       magrittr_1.5
  [9] scales_0.2.5     stringi_0.5-5    reshape2_1.4.1   XVector_0.9.1
 [13] data.table_1.9.4 tools_3.2.1      stringr_1.0.0    munsell_0.4.2
 [17] colorspace_1.2-6


 Once I made that proposed change to IRanges (locally) and re-install then
 it works as expected.


 Any ideas what I'm doing wrong?


 Thanks,

 Pete



 Hi Peter,


 Starting with S4Vectors 0.7.12, labeledLine() belongs to S4Vectors so

 using the triple colon should not be necessary (and doing so will

 actually trigger a note from R CMD check). Can you provide more

 details on why you need this?


 Thanks,

 H.



 On 08/09/2015 09:16 PM, Peter Hickey wrote:

 Hi Hervé,


 I was having trouble with some devel code of mine and tracked it down to
 some recent updates moving the internal utility labeledLine() from
 BiocGenerics to S4Vectors. The labeledLine() internal function wasn’t being
 found when called in certain circumstances. Here’s an svn diff to fix the
 bug in the S4Vectors package.


 Cheers,

 Pete


 Index: DESCRIPTION
 ===
 --- DESCRIPTION (revision 107278)
 +++ DESCRIPTION (working copy)
 @@ -8,7 +8,7 @@
    interest (e.g. DataFrame, Rle, and Hits) are implemented in the
    S4Vectors package itself (many more are implemented in the IRanges
    package and in other Bioconductor infrastructure packages).
 -Version: 0.7.12
 +Version: 0.7.13
  Author: H. Pages, M. Lawrence and P. Aboyoun
  Maintainer: Bioconductor Package Maintainer maintai...@bioconductor.org
  biocViews: Infrastructure, DataRepresentation
 Index: R/List-class.R
 ===
 --- R/List-class.R (revision 107278)
 +++ R/List-class.R (working copy)
 @@ -86,7 +86,7 @@
                cat(classNameForDisplay(object), " of length ", lo,
                    "\n", sep = "")
                if (!is.null(names(object)))
 -                  cat(labeledLine("names", names(object)))
 +                  cat(S4Vectors:::labeledLine("names", names(object)))
            })



 

 Peter Hickey,

 PhD Student/Research Assistant,

 Bioinformatics Division,

 Walter and Eliza Hall Institute of Medical Research,

 1G Royal Parade, Parkville, Vic 3052, Australia.

 Ph: +613 9345 2324


 hic...@wehi.edu.au

 http://www.wehi.edu.au





 --

 Hervé Pagès


 Program in Computational Biology

 Division of Public Health Sciences

 Fred Hutchinson Cancer Research Center

 1100 Fairview Ave. N, M1-B514

 P.O. Box 19024

 Seattle, WA 98109-1024


 E-mail: hpa...@fredhutch.org

 Phone:  (206) 667-5791

 Fax:(206) 667-1319




Re: [Bioc-devel] Google hangout on Wed December 10th for new package authors

2014-12-02 Thread Peter Hickey
Hi Marc,

Will a recording be made available? This sounds quite useful, but not quite 
staying up until 2am Australian time useful :)

Thanks,
Pete 

 Message: 5
 Date: Tue, 2 Dec 2014 15:08:08 -0800
 From: Marc Carlson mcarl...@fredhutch.org
 To: bioc-devel bioc-de...@stat.math.ethz.ch
 Subject: [Bioc-devel] Google hangout on Wed December 10th for new
   package authors
 Message-ID: 547e4658.5050...@fredhutch.org
 Content-Type: text/plain; charset=utf-8; format=flowed
 
 Hello new package authors,
 
 Based on the number of new software packages being submitted to the 
 project it seems that Bioconductor is more popular than ever.  Last 
 release we added a hundred and ten new packages (a new record).
 
 A lot of the popularity of this project is because Bioconductor packages 
 have to live up to certain minimal standards (Nature Genetics thinks so 
 too, e.g., http://www.nature.com/ng/journal/v46/n1/full/ng.2869.html). 
 For example every Bioconductor package is expected to:
 
 1) provide complete documentation so that new users will know how to use 
 them
 2) contain working examples that are run when the package is checked by 
 the build system so that failure can be detected early.
 3) cooperate with related packages within the project so as to 
 facilitate code reuse and support reproducible research.
 
 We hope you will agree that having such package guidelines is a big win 
 for the whole community.
 
 To help *you* contribute to Bioconductor, we are going to have a Google 
 hangout (on air) to allow you to tune in, listen to some tips from 
 Bioconductor package reviewers and then open up the forum for questions.
 
 Webinar Invitation: Contributing your package to Bioconductor: 
 guidelines and overview
 Date: December 10, 2014
 Time: 8:00 AM PST /11:00 AM  EST
 
 Please 'tune in' December 10th at 8AM PST for a Google Hangout to 
 discuss new package contributions.  And learn how to maximize the value 
 of your package contribution to the Bioconductor community.



Re: [Bioc-devel] Changes to Hits class in devel branch affecting distanceToNearest method

2014-11-10 Thread Peter Hickey
Thanks, Val and Herve.
On 08/11/2014, at 8:47 AM, Valerie Obenchain voben...@fredhutch.org wrote:

 These issues are fixed in IRanges 2.1.8 and GenomicRanges 1.19.5.
 
 Valerie
 
 
 On 11/05/14 18:32, Hervé Pagès wrote:
 Hi Peter,
 
 The new validity method for Hits revealed some issues with
 the behavior of distanceToNearest method for GRanges objects.
 The major issue being that it sometimes returns a Hits
 object with NAs in it. Val will address this in the next few
 days.
 
 Cheers,
 H.
 
 
 On 11/05/2014 06:21 PM, Peter Hickey wrote:
 This message may be a bit premature or redundant since after writing
 it I now see that Herve is in the midst of work on the Hits class in
 S4Vectors.
 
 My package GenomicTuples is currently failing R CMD check in the devel
 branch
 (http://bioconductor.org/checkResults/devel/bioc-LATEST/GenomicTuples/zin2-checksrc.html).
 I tracked down the error to a problem with my
 distanceToNearest,GTuples,GTuples-method. However, this is simply
 defined via inheritance to the
 distanceToNearest,GenomicRanges,GenomicRanges-method and so I believe
 the problem lies with the
 distanceToNearest,GenomicRanges,GenomicRanges-method.
 
 The following example adapted from nearest-methods {GenomicRanges}
 docs demonstrates the change in behaviour between the release and
 devel branch. It appears to be due to a new validity check of the Hits
 object returned by distanceToNearest. As I said, I now see that Herve
 recently started some work on the Hits class in S4Vectors, which
 includes this validity check, so this is really just a heads up that
 these changes affect the distanceToNearest method, in case this hasn't
 already been noted.
 
 Thanks,
 Pete
 
 #---
 
 # BioC release
 > library(GenomicRanges)
 > query <- GRanges(c("A", "B"), IRanges(c(1, 5), width=1))
 > subject <- GRanges("A", IRanges(c(6, 5, 13), c(10, 10, 15)))
 > distanceToNearest(query, subject)
 Hits of length 2
 queryLength: 2
 subjectLength: 3
   queryHits subjectHits  distance
   <integer>   <integer> <integer>
 1         1           2         3
 2         2          NA        NA
 > sessionInfo()
 R version 3.1.1 (2014-07-10)
 Platform: x86_64-unknown-linux-gnu (64-bit)
 
 locale:
  [1] LC_CTYPE=en_AU.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_AU.UTF-8LC_COLLATE=en_AU.UTF-8
  [5] LC_MONETARY=en_AU.UTF-8LC_MESSAGES=en_AU.UTF-8
  [7] LC_PAPER=en_AU.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices utils     datasets
 [8] methods   base
 
 other attached packages:
 [1] GenomicRanges_1.18.1 GenomeInfoDb_1.2.2   IRanges_2.0.0
 [4] S4Vectors_0.4.0  BiocGenerics_0.12.0
 
 loaded via a namespace (and not attached):
 [1] XVector_0.6.0
 #---
 
 
 #---
 
 # BioC devel
 > library(GenomicRanges)
 > query <- GRanges(c("A", "B"), IRanges(c(1, 5), width=1))
 > subject <- GRanges("A", IRanges(c(6, 5, 13), c(10, 10, 15)))
 > distanceToNearest(query, subject)
 Error in validObject(.Object) :
   invalid class “Hits” object: 'subjectHits(x)' must contain non-NA
 values >= 1 and <= 'subjectLength(x)'
 > sessionInfo()
 R Under development (unstable) (2014-10-29 r66891)
 Platform: x86_64-unknown-linux-gnu (64-bit)
 
 locale:
  [1] LC_CTYPE=en_AU.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_AU.UTF-8LC_COLLATE=en_AU.UTF-8
  [5] LC_MONETARY=en_AU.UTF-8LC_MESSAGES=en_AU.UTF-8
  [7] LC_PAPER=en_AU.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices utils     datasets
 [8] methods   base
 
 other attached packages:
 [1] GenomicRanges_1.19.4 GenomeInfoDb_1.3.6   IRanges_2.1.5
 [4] S4Vectors_0.5.3  BiocGenerics_0.13.0
 
 loaded via a namespace (and not attached):
 [1] XVector_0.7.1
 #---
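
The kind of validity check described above can be sketched with the methods package alone. HitsSketch below is a hypothetical miniature class, not the S4Vectors definition of Hits; it only illustrates why a result containing NA, which the release branch returned without complaint, is rejected once a validity method is attached to the class.

```r
library(methods)

# Hypothetical miniature of a Hits-like class; NOT the S4Vectors definition.
setClass("HitsSketch",
         representation(subjectHits = "integer", subjectLength = "integer"),
         validity = function(object) {
           sh <- object@subjectHits
           if (anyNA(sh) || any(sh < 1L) || any(sh > object@subjectLength))
             return("'subjectHits' must contain non-NA values >= 1 and <= 'subjectLength'")
           TRUE
         })

# A fully populated result passes validation.
ok <- new("HitsSketch", subjectHits = 2L, subjectLength = 3L)

# A query with no neighbour on its sequence produces an NA hit, which the
# validity method rejects as soon as the object is constructed.
res <- tryCatch(
  new("HitsSketch", subjectHits = c(2L, NA), subjectLength = 3L),
  error = function(e) conditionMessage(e)
)
res
```

Under this sketch, a method that wants to keep returning a valid object would have to drop the queries without a hit rather than encode them as NA.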
 
 
 
 Peter Hickey,
 PhD Student/Research Assistant,
 Bioinformatics Division,
 Walter and Eliza Hall Institute of Medical Research,
 1G Royal Parade, Parkville, Vic 3052, Australia.
 Ph: +613 9345 2324
 
 hic...@wehi.edu.au
 http://www.wehi.edu.au
 
 


Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall

[Bioc-devel] Bug in GenomicRanges:::compare

2014-10-05 Thread Peter Hickey
Hi Martin, 

The last element of 'x' is never accessed in a call to the internal function 
GenomicRanges:::.compare when 'GenomicRanges = TRUE'. The attached patch fixes 
this.

Cheers,
Pete
===
--- R/SummarizedExperiment-class.R  (revision 94987)
+++ R/SummarizedExperiment-class.R  (working copy)
@@ -635,7 +635,7 @@
             x <- lapply(x, unlist)
             x1 <- x[[1]]
         }
-        for (i in seq_along(x[-1])) {
+        for (i in seq_along(x)[-1]) {
             if (length(x1) != length(x[[i]]))
                 return(FALSE)
             ok <- x1 == x[[i]]
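
The difference between the two loop ranges in the patch can be seen in plain R. The list x and the helper all_same() below are made up for illustration and are not the package code; they only mimic the shape of the loop.

```r
# Toy list standing in for 'x'; only the last element differs.
x <- list(1:3, 1:3, c(1L, 2L, 4L))

seq_along(x[-1])   # indices 1 and 2: revisits x[[1]], never reaches x[[3]]
seq_along(x)[-1]   # indices 2 and 3: every element after the first

# Hypothetical helper mirroring the loop: compare x[[1]] against x[[i]]
# for each i in 'idx'.
all_same <- function(x, idx) {
  x1 <- x[[1]]
  for (i in idx) {
    if (length(x1) != length(x[[i]]) || !all(x1 == x[[i]]))
      return(FALSE)
  }
  TRUE
}

all_same(x, seq_along(x[-1]))  # TRUE: the mismatch in x[[3]] goes unnoticed
all_same(x, seq_along(x)[-1])  # FALSE: the last element is now compared
```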



[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Peter Hickey
Just a vote for still allowing for multiple genomes in a Seqinfo object (in a 
GRanges object). My use case is in bisulfite-sequencing experiments where there 
is often a spike-in of a lambda phage genome along with the genome of interest 
(human or mouse). It's often useful to keep all data from a single library 
together in the same object but process according to genome(x) for each seqlevel.

FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the show 
method.
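
To make the spike-in use case concrete, here is a rough sketch in base R. The named character vector stands in for the result of genome(x), which has one entry per seqlevel; the seqlevel names and genome labels are invented for illustration.

```r
# Stand-in for genome(x): one genome label per seqlevel (made-up values).
genome_by_seqlevel <- c(chr1 = "hg19", chr2 = "hg19", lambda = "lambda")

# Group the seqlevels by genome, e.g. to process the lambda spike-in
# separately from the genome of interest.
seqlevels_by_genome <- split(names(genome_by_seqlevel), genome_by_seqlevel)
seqlevels_by_genome

# What a show method could report when more than one genome is present.
unique(unname(genome_by_seqlevel))
```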

Cheers,
Pete


 I might have requested the genome annotation, but I'm pretty sure it wasn't
 me who decided on tracking it on a per-sequence basis. I could imagine use
 cases for that though, e.g., when diagnosing sequencing contamination (like
 human vs. mouse). But most other tools and file formats expect a single
 genome per track, so, for example, rtracklayer has an internal function
 singleGenome() to take care of this.
 
 On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès hpa...@fhcrc.org wrote:
 
 Hi Vince,
 
 Yes it would make sense to have the show method report the genome
 when genome(x) contains a unique non-NA value. I think the main
 use case for having the genome defined at the sequence level instead
 of the whole object level is metagenomics. Maybe Michael has some other
 good use cases to share since IIRC he requested the addition of the
 genome field a couple of years ago and made the case for having it
 defined at the sequence level.
 
 Cheers,
 H.
 
 
 On 09/08/2014 07:21 AM, Vincent Carey wrote:
 
 For GRanges x, my naive expectation is that genome(x) returns a length-one
 tag identifying the genome to which chromosomal coordinates correspond.
 The genome() method seems to have sequence-specific semantics, which makes
 sense, but when we identify sequence with chromosome, it seems too
 complicated.  Is there a use case for a GRanges with sequences from
 several different genomes?
 
 One reason I am inquiring is that I feel it would be nice to have the
 GRanges show() method report, prominently, the genome in use (or NA
 if unspecified).  This could be accomplished by reporting
 unique(genome(x)), and perhaps that would be satisfactory.
 
 after example(genome) :
 
 > seqinfo(txdb)
 Seqinfo of length 15
 seqnames seqlengths isCircular genome
 CH2L       23011544      FALSE    dm3
 CH2R       21146708      FALSE    dm3
 CH3L       24543557      FALSE    dm3
 CH3R       27905053      FALSE    dm3
 CH4         1351857      FALSE    dm3
 ...             ...        ...    ...
 CH3LHet     2555491      FALSE    dm3
 CH3RHet     2517507      FALSE    dm3
 CHXHet       204112      FALSE    dm3
 CHYHet       347038      FALSE    dm3
 CHUextra   29004656      FALSE    dm3
 
 genome(seqinfo(txdb))
 
 
 CH2L CH2R CH3L CH3R  CH4  CHX  CHUM
 
dm3dm3dm3dm3dm3dm3dm3dm3
 
  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra
 
dm3dm3dm3dm3dm3dm3dm3
 
 
 
 --
 Hervé Pagès
 
 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024
 
 E-mail: hpa...@fhcrc.org
 Phone:  (206) 667-5791
 Fax:(206) 667-1319
 
 
 


Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au



Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Peter Hickey
Perhaps it might be useful to have some way of highlighting if any of the 
chromosomes are circular or highlighting if there are multiple genomes present? 
Otherwise this information might be hidden in the …

Cheers,
Pete


On 09/09/2014, at 9:44 AM, Hervé Pagès hpa...@fhcrc.org wrote:

 On 09/08/2014 02:28 PM, Peter Hickey wrote:
 Just a vote for still allowing for multiple genomes in a Seqinfo object (in 
 a GRanges object). My use case is in bisulfite-sequencing experiments where 
 there is often a spike-in of a lambda phage genome along with the genome of 
 interest (human or mouse). It's often useful to keep all data from a single 
 library together in the same object but process according to genome(x) for 
 each seqlevel.
 
 Note taken. Thanks Pete! It's always great to know about concrete use
 cases.
 
 
 FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the show 
 method.
 
 Or what about displaying the genome next to the seqlevel it's
 associated with? Like e.g.:
 
   > gr
   GRanges with 3 ranges and 0 metadata columns:
         seqnames               ranges strand
            <Rle>            <IRanges>  <Rle>
     [1]    chr14 [19069583, 19069654]      +
     [2]    chr14 [19363738, 19363809]      +
     [3]    chr14 [19363755, 19363826]      -
     [4]    chr14 [19369799, 19369870]      +
     ---
     seqinfo:
            seqlevels seqlengths isCircular genome
                 chr1  249250621         NA   hg19
                chr10  135534747         NA   hg19
                chr11  135006516         NA   hg19
                  ...        ...        ...    ...
       chrUn_gl000249      38502         NA   hg19
                 chrX  155270560         NA   hg19
                 chrY   59373566         NA   hg19
 
 That way, we also raise awareness about the isCircular field.
 The current choice to only display the seqlengths pre-dates the
 existence of the seqinfo slot but might be a little bit misleading
 these days since it only exposes some arbitrary seqinfo fields.
 
 H.
 
 
 Cheers,
 Pete
 
 
 I might have requested the genome annotation, but I'm pretty sure it wasn't
 me who decided on tracking it on a per-sequence basis. I could imagine use
 cases for that though, e.g., when diagnosing sequencing contamination (like
 human vs. mouse). But most other tools and file formats expect a single
 genome per track, so, for example, rtracklayer has an internal function
 singleGenome() to take care of this.
 
 On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès hpa...@fhcrc.org wrote:
 
 Hi Vince,
 
 Yes it would make sense to have the show method report the genome
 when genome(x) contains a unique non-NA value. I think the main
 use case for having the genome defined at the sequence level instead
 of the whole object level is metagenomics. Maybe Michael has some other
 good use cases to share since IIRC he requested the addition of the
 genome field a couple of years ago and made the case for having it
 defined at the sequence level.
 
 Cheers,
 H.
 
 
 On 09/08/2014 07:21 AM, Vincent Carey wrote:
 
 For GRanges x, my naive expectation is that genome(x) returns a length-one
 tag identifying the genome to which chromosomal coordinates correspond.
 The genome() method seems to have sequence-specific semantics, which makes
 sense, but when we identify sequence with chromosome, it seems too
 complicated.  Is there a use case for a GRanges with sequences from
 several different genomes?
 
 One reason I am inquiring is that I feel it would be nice to have the
 GRanges show() method report, prominently, the genome in use (or NA
 if unspecified).  This could be accomplished by reporting
 unique(genome(x)), and perhaps that would be satisfactory.
 
 after example(genome) :
 
 > seqinfo(txdb)
 Seqinfo of length 15
 seqnames seqlengths isCircular genome
 CH2L       23011544      FALSE    dm3
 CH2R       21146708      FALSE    dm3
 CH3L       24543557      FALSE    dm3
 CH3R       27905053      FALSE    dm3
 CH4         1351857      FALSE    dm3
 ...             ...        ...    ...
 CH3LHet     2555491      FALSE    dm3
 CH3RHet     2517507      FALSE    dm3
 CHXHet       204112      FALSE    dm3
 CHYHet       347038      FALSE    dm3
 CHUextra   29004656      FALSE    dm3
 
 genome(seqinfo(txdb))
 
 
 CH2L CH2R CH3L CH3R  CH4  CHX  CHUM
 
dm3dm3dm3dm3dm3dm3dm3dm3
 
  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra
 
dm3dm3dm3dm3dm3dm3dm3
 
 
 
 --
 Hervé Pagès
 
 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024
 
 E

Re: [Bioc-devel] Valid classes for extraColumnSlots

2014-09-02 Thread Peter Hickey
Hi Michael, 

Sorry to bring this up again. I think the S4Vectors version number needs to be 
bumped to propagate the addition of replaceROWS,NULL-method to the build system 
- I can use it if I build S4Vectors from source but not if I install via 
biocLite() [with useDevel()]. 

Thanks,
Pete


On 29/08/2014, at 1:18 PM, Michael Lawrence lawrence.mich...@gene.com wrote:

 Added to S4Vectors. Thanks!
 
 
 On Thu, Aug 28, 2014 at 5:04 PM, Peter Hickey hic...@wehi.edu.au wrote:
 Thanks, Michael. Do you think there's a general use case for a replaceROWs, 
 NULL method or shall I just specify that in my package? I require it because 
 the slot is matrixOrNULL via a setClassUnion but I don't know how common that 
 is amongst other BioC devels.
 
 setMethod("replaceROWS",
           "NULL",
           function(x, i, value) {
             NULL
           }
 )
 
 Thanks,
 Pete
 
 
 On 29/08/2014, at 8:34 AM, Michael Lawrence lawrence.mich...@gene.com wrote:
 
  Sorry it took so long. Fixed in S4Vectors 0.1.3. Surprisingly, we were 
  missing a replaceROWs,matrix method.
 
 
 
  On Tue, Aug 26, 2014 at 5:13 PM, Peter Hickey hic...@wehi.edu.au wrote:
  Hi Michael,
 
  Thanks for your patience. Here is a self-contained example with comments 
  https://gist.github.com/PeteHaitch/fdb66d360446ff96ed4b
 
  Thanks,
  Pete
 
 
  On 27/08/2014, at 1:43 AM, Michael Lawrence lawrence.mich...@gene.com 
  wrote:
 
   Do you have the code that actually fails? Then I could use it to 
   reproduce the problem and fix things.
  
   Thanks,
   Michael
  
  
   On Tue, Aug 26, 2014 at 4:25 AM, Peter Hickey hic...@wehi.edu.au wrote:
   Hi Michael,
  
   Sorry for my misunderstanding. Here is some code describing the class
   https://github.com/PeteHaitch/GenomicTuples/blob/master/R/GTuples-class.R
   (the package is not yet installable but hopefully the in-progress code
   shows you what I'm trying to achieve).
  
   The relevant slot is called internalPos and extraColumnSlotNames does
   indeed return this as a character vector. What I meant is that originally
   the internalPos slot was a matrix (or NULL). I switched to DataFrame
   (or NULL) because I was running into some problems related to replaceROWS
   when it was a matrix.
  
   Thanks,
   Pete
  
   - Original Message -
   From: Michael Lawrence lawrence.mich...@gene.com
   To: Peter Hickey hic...@wehi.edu.au
   Cc: bioc-devel@r-project.org
   Sent: Tue, 26 Aug 2014 13:35:35 +1000 (EST)
   Subject: Re: [Bioc-devel] Valid classes for extraColumnSlots
   Hi Peter,
   Some code would help here.  I'm not sure what you mean by having a matrix 
   as your extraColumnSlots. A derivative of GenomicRanges should define a 
   method for extraColumnSlotNames that returns a character vector of names 
   for actual slots that the class defines. It sounds like you're trying to 
   represent all of the extra column slots with a single matrix slot, which 
   is not how the mechanism was designed.
   Michael
  
   On Mon, Aug 25, 2014 at 7:57 PM, Peter Hickey hic...@wehi.edu.au wrote:
   Are the extraColumnSlots of a class that extends GenomicRanges limited to 
   DataFrame objects?
  
   Background: I wrote a class that extends the GRanges class. It has a 
   matrix as the extraColumnSlots. When I use 
   replaceROWS,GenomicRanges,GenomicRanges-method (via inheritance) it 
   extracts this extraColumnSlots as a DataFrame object by use of 
   GenomicRanges:::extraColumnSlotsAsDF. This means that the subsequent call 
   to update() in replaceROWS,GenomicRanges,GenomicRanges-method fails 
   because the class definition expects a matrix for the extraSlotNames but 
   gets a DataFrame.
  
   In this case, it's not a problem for me to change my extraColumnSlots 
   element to a DataFrame in the class definition. However, more generally, 
   some guidance on what classes are and are not allowed in extraColumnSlots 
   would be appreciated.
  
   Thanks,
   Pete
  
   This is using BioC devel:
   sessionInfo()
   R version 3.1.1 (2014-07-10)
   Platform: x86_64-apple-darwin13.1.0 (64-bit)
  
   locale:
   [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
  
   attached base packages:
   [1] parallel  stats graphics  grDevices utils datasets  methods
   [8] base
  
   other attached packages:
   [1] GenomicTuples_0.1.0   GenomicRanges_1.17.35 GenomeInfoDb_1.1.18
   [4] IRanges_1.99.24   S4Vectors_0.1.2   BiocGenerics_0.11.4
   [7] devtools_1.5
  
   loaded via a namespace (and not attached):
   [1] Biobase_2.25.0   digest_0.6.4     evaluate_0.5.5   httr_0.4
   [5] memoise_0.2.1    packrat_0.4.0.12 Rcpp_0.11.2      RCurl_1.95-4.3
   [9] stats4_3.1.1     stringr_0.6.2    tools_3.1.1      whisker_0.3-2
  
   
   Peter Hickey,
   PhD Student/Research Assistant,
   Bioinformatics Division,
   Walter and Eliza Hall Institute of Medical Research,
   1G Royal Parade, Parkville, Vic 3052, Australia.
   Ph: +613 9345 2324
   hic

Re: [Bioc-devel] Valid classes for extraColumnSlots

2014-08-28 Thread Peter Hickey
Thanks, Michael. Do you think there's a general use case for a replaceROWs, 
NULL method or shall I just specify that in my package? I require it because 
the slot is matrixOrNULL via a setClassUnion but I don't know how common that 
is amongst other BioC devels.

setMethod("replaceROWS",
          "NULL",
          function(x, i, value) {
            NULL
          }
)
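
For context, here is a self-contained sketch of the situation, using only the methods package so it runs without Bioconductor. replaceRowsSketch() is a hypothetical stand-in for the replaceROWS() generic and TuplesSketch is a made-up class; neither is the real S4Vectors or GenomicTuples code.

```r
library(methods)

# A slot type that may hold either a matrix or NULL, as described above.
setClassUnion("matrixOrNULL", c("matrix", "NULL"))

# Hypothetical stand-in for the replaceROWS() generic.
setGeneric("replaceRowsSketch",
           function(x, i, value) standardGeneric("replaceRowsSketch"))

# Matrix case: overwrite the selected rows.
setMethod("replaceRowsSketch", "matrix", function(x, i, value) {
  x[i, ] <- value
  x
})

# NULL case: there are no rows to replace, so the result stays NULL.
setMethod("replaceRowsSketch", "NULL", function(x, i, value) NULL)

# Made-up class whose slot uses the class union.
setClass("TuplesSketch", representation(internalPos = "matrixOrNULL"))

with_pos    <- new("TuplesSketch", internalPos = matrix(1:6, nrow = 3))
without_pos <- new("TuplesSketch", internalPos = NULL)

replaceRowsSketch(with_pos@internalPos, 2L, c(0L, 0L))
replaceRowsSketch(without_pos@internalPos, 2L, c(0L, 0L))
```

Without the NULL method, the second call would fail to dispatch, which is why a slot declared as matrixOrNULL needs both methods to be available.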

Thanks,
Pete


On 29/08/2014, at 8:34 AM, Michael Lawrence lawrence.mich...@gene.com wrote:

 Sorry it took so long. Fixed in S4Vectors 0.1.3. Surprisingly, we were 
 missing a replaceROWs,matrix method.
 
 
 
 On Tue, Aug 26, 2014 at 5:13 PM, Peter Hickey hic...@wehi.edu.au wrote:
 Hi Michael,
 
 Thanks for your patience. Here is a self-contained example with comments 
 https://gist.github.com/PeteHaitch/fdb66d360446ff96ed4b
 
 Thanks,
 Pete
 
 
 On 27/08/2014, at 1:43 AM, Michael Lawrence lawrence.mich...@gene.com wrote:
 
  Do you have the code that actually fails? Then I could use it to reproduce 
  the problem and fix things.
 
  Thanks,
  Michael
 
 
  On Tue, Aug 26, 2014 at 4:25 AM, Peter Hickey hic...@wehi.edu.au wrote:
  Hi Michael,
 
  Sorry for my misunderstanding. Here is some code describing the class
  https://github.com/PeteHaitch/GenomicTuples/blob/master/R/GTuples-class.R
  (the package is not yet installable but hopefully the in-progress code
  shows you what I'm trying to achieve).
 
  The relevant slot is called internalPos and extraColumnSlotNames does
  indeed return this as a character vector. What I meant is that originally
  the internalPos slot was a matrix (or NULL). I switched to DataFrame
  (or NULL) because I was running into some problems related to replaceROWS
  when it was a matrix.
 
  Thanks,
  Pete
 
  - Original Message -
  From: Michael Lawrence lawrence.mich...@gene.com
  To: Peter Hickey hic...@wehi.edu.au
  Cc: bioc-devel@r-project.org
  Sent: Tue, 26 Aug 2014 13:35:35 +1000 (EST)
  Subject: Re: [Bioc-devel] Valid classes for extraColumnSlots
  Hi Peter,
  Some code would help here.  I'm not sure what you mean by having a matrix 
  as your extraColumnSlots. A derivative of GenomicRanges should define a 
  method for extraColumnSlotNames that returns a character vector of names 
  for actual slots that the class defines. It sounds like you're trying to 
  represent all of the extra column slots with a single matrix slot, which is 
  not how the mechanism was designed.
  Michael
 
  On Mon, Aug 25, 2014 at 7:57 PM, Peter Hickey hic...@wehi.edu.au wrote:
  Are the extraColumnSlots of a class that extends GenomicRanges limited to 
  DataFrame objects?
 
  Background: I wrote a class that extends the GRanges class. It has a matrix 
  as the extraColumnSlots. When I use 
  replaceROWS,GenomicRanges,GenomicRanges-method (via inheritance) it 
  extracts this extraColumnSlots as a DataFrame object by use of 
  GenomicRanges:::extraColumnSlotsAsDF. This means that the subsequent call 
  to update() in replaceROWS,GenomicRanges,GenomicRanges-method fails because 
  the class definition expects a matrix for the extraSlotNames but gets a 
  DataFrame.
 
  In this case, it's not a problem for me to change my extraColumnSlots 
  element to a DataFrame in the class definition. However, more generally, 
  some guidance on what classes are and are not allowed in extraColumnSlots 
  would be appreciated.
 
  Thanks,
  Pete
 
  This is using BioC devel:
  sessionInfo()
  R version 3.1.1 (2014-07-10)
  Platform: x86_64-apple-darwin13.1.0 (64-bit)
 
  locale:
  [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
 
  attached base packages:
  [1] parallel  stats     graphics  grDevices utils     datasets  methods
  [8] base
 
  other attached packages:
  [1] GenomicTuples_0.1.0   GenomicRanges_1.17.35 GenomeInfoDb_1.1.18
  [4] IRanges_1.99.24       S4Vectors_0.1.2       BiocGenerics_0.11.4
  [7] devtools_1.5
 
  loaded via a namespace (and not attached):
   [1] Biobase_2.25.0   digest_0.6.4     evaluate_0.5.5   httr_0.4
   [5] memoise_0.2.1    packrat_0.4.0.12 Rcpp_0.11.2      RCurl_1.95-4.3
   [9] stats4_3.1.1     stringr_0.6.2    tools_3.1.1      whisker_0.3-2
 
  
  Peter Hickey,
  PhD Student/Research Assistant,
  Bioinformatics Division,
  Walter and Eliza Hall Institute of Medical Research,
  1G Royal Parade, Parkville, Vic 3052, Australia.
  Ph: +613 9345 2324
  hic...@wehi.edu.au
  http://www.wehi.edu.au
 
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
  __
  The information in this email is confidential and intended solely for the 
  addressee.
  You must not disclose, forward, print or use it without the permission of 
  the sender

Re: [Bioc-devel] Valid classes for extraColumnSlots

2014-08-26 Thread Peter Hickey

Hi Michael,
Sorry for my misunderstanding. Here is some code describing the class 
https://github.com/PeteHaitch/GenomicTuples/blob/master/R/GTuples-class.R (the 
package is not yet installable but hopefully the in-progress code shows you 
what I'm trying to achieve).

The relevant slot is called internalPos and extraColumnSlotNames does indeed 
return this as a character vector. What I meant is that originally the 
internalPos slot was a matrix (or NULL). I switched to DataFrame (or NULL) 
because I was running into some problems related to replaceROWS when it was a 
matrix.

Thanks,
Pete


- Original Message -
From: Michael Lawrence lawrence.mich...@gene.com
To: Peter Hickey hic...@wehi.edu.au
Cc: bioc-devel@r-project.org
Sent: Tue, 26 Aug 2014 13:35:35 +1000 (EST)
Subject: Re: [Bioc-devel] Valid classes for extraColumnSlots

Hi Peter,

Some code would help here.  I'm not sure what you mean by having a matrix as 
your extraColumnSlots. A derivative of GenomicRanges should define a method 
for extraColumnSlotNames that returns a character vector of names for actual 
slots that the class defines. It sounds like you're trying to represent all of 
the extra column slots with a single matrix slot, which is not how the 
mechanism was designed.

Michael


On Mon, Aug 25, 2014 at 7:57 PM, Peter Hickey hic...@wehi.edu.au wrote:
Are the extraColumnSlots of a class that extends GenomicRanges limited to 
DataFrame objects?

Background: I wrote a class that extends the GRanges class. It has a matrix as 
the extraColumnSlots. When I use replaceROWS,GenomicRanges,GenomicRanges-method 
(via inheritance) it extracts this extraColumnSlots as a DataFrame object by 
use of GenomicRanges:::extraColumnSlotsAsDF. This means that the subsequent 
call to update() in replaceROWS,GenomicRanges,GenomicRanges-method fails 
because the class definition expects a matrix for the extraSlotNames but gets a 
DataFrame.

In this case, it's not a problem for me to change my extraColumnSlots element 
to a DataFrame in the class definition. However, more generally, some guidance 
on what classes are and are not allowed in extraColumnSlots would be 
appreciated.

Thanks,
Pete

This is using BioC devel:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicTuples_0.1.0   GenomicRanges_1.17.35 GenomeInfoDb_1.1.18
[4] IRanges_1.99.24       S4Vectors_0.1.2       BiocGenerics_0.11.4
[7] devtools_1.5

loaded via a namespace (and not attached):
 [1] Biobase_2.25.0   digest_0.6.4     evaluate_0.5.5   httr_0.4
 [5] memoise_0.2.1    packrat_0.4.0.12 Rcpp_0.11.2      RCurl_1.95-4.3
 [9] stats4_3.1.1     stringr_0.6.2    tools_3.1.1      whisker_0.3-2

Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324
hic...@wehi.edu.au
http://www.wehi.edu.au



Re: [Bioc-devel] Valid classes for extraColumnSlots

2014-08-26 Thread Peter Hickey
Hi Michael,

Thanks for your patience. Here is a self-contained example with comments 
https://gist.github.com/PeteHaitch/fdb66d360446ff96ed4b

Thanks,
Pete


On 27/08/2014, at 1:43 AM, Michael Lawrence lawrence.mich...@gene.com wrote:

 Do you have the code that actually fails? Then I could use it to reproduce 
 the problem and fix things.
 
 Thanks,
 Michael
 
 
 On Tue, Aug 26, 2014 at 4:25 AM, Peter Hickey hic...@wehi.edu.au wrote:
 Hi Michael,
 
 Sorry for my misunderstanding. Here is some code describing the class 
 https://github.com/PeteHaitch/GenomicTuples/blob/master/R/GTuples-class.R
 (the package is not yet installable but hopefully the in-progress code 
 shows you what I'm trying to achieve).
 
 The relevant slot is called internalPos and extraColumnSlotNames does 
 indeed return this as a character vector. What I meant is that originally 
 the internalPos slot was a matrix (or NULL). I switched to DataFrame 
 (or NULL) because I was running into some problems related to replaceROWS 
 when it was a matrix.
 
 Thanks,
 Pete
 
 - Original Message -
 From: Michael Lawrence lawrence.mich...@gene.com
 To: Peter Hickey hic...@wehi.edu.au
 Cc: bioc-devel@r-project.org
 Sent: Tue, 26 Aug 2014 13:35:35 +1000 (EST)
 Subject: Re: [Bioc-devel] Valid classes for extraColumnSlots
 Hi Peter,
 Some code would help here.  I'm not sure what you mean by having a matrix as 
 your extraColumnSlots. A derivative of GenomicRanges should define a method 
 for extraColumnSlotNames that returns a character vector of names for actual 
 slots that the class defines. It sounds like you're trying to represent all 
 of the extra column slots with a single matrix slot, which is not how the 
 mechanism was designed.
 Michael
 
 On Mon, Aug 25, 2014 at 7:57 PM, Peter Hickey hic...@wehi.edu.au wrote:
 Are the extraColumnSlots of a class that extends GenomicRanges limited to 
 DataFrame objects?
 
 Background: I wrote a class that extends the GRanges class. It has a matrix 
 as the extraColumnSlots. When I use 
 replaceROWS,GenomicRanges,GenomicRanges-method (via inheritance) it extracts 
 this extraColumnSlots as a DataFrame object by use of 
 GenomicRanges:::extraColumnSlotsAsDF. This means that the subsequent call to 
 update() in replaceROWS,GenomicRanges,GenomicRanges-method fails because the 
 class definition expects a matrix for the extraSlotNames but gets a DataFrame.
 
 In this case, it's not a problem for me to change my extraColumnSlots element 
 to a DataFrame in the class definition. However, more generally, some 
 guidance on what classes are and are not allowed in extraColumnSlots would be 
 appreciated.
 
 Thanks,
 Pete
 
 This is using BioC devel:
 sessionInfo()
 R version 3.1.1 (2014-07-10)
 Platform: x86_64-apple-darwin13.1.0 (64-bit)
 
 locale:
 [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
 
 attached base packages:
 [1] parallel  stats     graphics  grDevices utils     datasets  methods
 [8] base
 
 other attached packages:
 [1] GenomicTuples_0.1.0   GenomicRanges_1.17.35 GenomeInfoDb_1.1.18
 [4] IRanges_1.99.24       S4Vectors_0.1.2       BiocGenerics_0.11.4
 [7] devtools_1.5
 
 loaded via a namespace (and not attached):
  [1] Biobase_2.25.0   digest_0.6.4     evaluate_0.5.5   httr_0.4
  [5] memoise_0.2.1    packrat_0.4.0.12 Rcpp_0.11.2      RCurl_1.95-4.3
  [9] stats4_3.1.1     stringr_0.6.2    tools_3.1.1      whisker_0.3-2
 
 
 Peter Hickey,
 PhD Student/Research Assistant,
 Bioinformatics Division,
 Walter and Eliza Hall Institute of Medical Research,
 1G Royal Parade, Parkville, Vic 3052, Australia.
 Ph: +613 9345 2324
 hic...@wehi.edu.au
 http://www.wehi.edu.au
 
 


Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au



[Bioc-devel] seqnames of SNPlocs.*

2014-06-17 Thread Peter Hickey
Is there a reason why the seqnames of SNPlocs.Hsapiens.dbSNP.20120608 (and 
possibly the other SNPlocs.* packages) use the prefix "ch" instead of "chr", 
e.g. "ch1" instead of "chr1"? It doesn't seem to fit any standard 
chromosome-naming convention and means that the seqnames need to be renamed 
before use with most other Bioconductor data sources.
Thanks,
Pete

Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au




Re: [Bioc-devel] seqnames of SNPlocs.*

2014-06-17 Thread Peter Hickey
Thanks for the explanation, Vincent. GenomeInfoDb has NCBI and UCSC support, 
but doesn't seem to support the dbSNP format. Perhaps this should be added?

> seqlevelsStyle(seqnames(SNPlocs.Hsapiens.dbSNP.20120608))
Error in .guessSpeciesStyle(seqnames) : 
  The style does not have a compatible entry for the species supported by Seqname. Please
  see genomeStyles() for supported species/style
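
Until such a style is supported, the relabelling can be done by hand. The following is a minimal base R sketch; the dbsnp_names vector is a made-up example of "ch"-prefixed names (including the assumption that the mitochondrial sequence is labelled "chMT"), not output taken from the package.

```r
# Hypothetical seqnames in the dbSNP "ch" style (illustrative values only)
dbsnp_names <- c("ch1", "ch2", "ch10", "chX", "chY", "chMT")

# Swap the leading "ch" for "chr" to get UCSC-style names
ucsc_names <- sub("^ch", "chr", dbsnp_names)

# UCSC labels the mitochondrial sequence chrM rather than chrMT
ucsc_names[ucsc_names == "chrMT"] <- "chrM"

ucsc_names
```

The resulting vector could then be assigned back to the seqlevels of whatever object holds the SNP locations.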

On 18/06/2014, at 12:40 PM, Vincent Carey st...@channing.harvard.edu wrote:

 it is the convention used in dbSNP, just propagated directly.  indeed one 
 typically has to relabel, but there
 is seqnamesStyle infrastructure in GenomeInfoDb that may help.
 
 
 On Tue, Jun 17, 2014 at 8:17 PM, Peter Hickey hic...@wehi.edu.au wrote:
 Is there a reason why the seqnames of SNPlocs.Hsapiens.dbSNP.20120608 (and 
 possibly the other SNPlocs.* packages) use the prefix "ch" instead of "chr", 
 e.g. "ch1" instead of "chr1"? It doesn't seem to fit any standard 
 chromosome-naming convention and means that the seqnames need to be renamed 
 before use with most other Bioconductor data sources.
 Thanks,
 Pete
 
 Peter Hickey,
 PhD Student/Research Assistant,
 Bioinformatics Division,
 Walter and Eliza Hall Institute of Medical Research,
 1G Royal Parade, Parkville, Vic 3052, Australia.
 Ph: +613 9345 2324
 
 hic...@wehi.edu.au
 http://www.wehi.edu.au
 
 


Re: [Bioc-devel] My "==" method breaks existing "==" method for signature c("Rle", "vector")

2014-05-28 Thread Peter Hickey
Thanks for making that change, Hervé. It allows my package to avoid the bug 
that I reported (tested with GenomicRanges_1.17.17 built from source). However, 
I can still reproduce the simplified bug using the latest devel versions (see 
below) but I'm happy that I now no longer get bitten by it.

Thanks,
Pete

--
# A simplified example
# Fresh R session
library(IRanges)
fix <- Rle('end', 10)

# Works when repeatedly called
fix == 'end' # Works
fix == 'end' # Works

# The method definition that breaks things
# I haven't included the MTuples class definition or the .MTuples.compare function
# But that shouldn't matter in order to highlight the problem, should it?
setMethod("==", c("MTuples", "MTuples"), function(e1, e2) {
  .MTuples.compare(e1, e2) == 0L
})

# Works the first time, but not the second
fix == 'end' # Works
fix == 'end' # Breaks

My session info is:
--
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] IRanges_1.99.15     S4Vectors_0.0.7     BiocGenerics_0.11.2

loaded via a namespace (and not attached):
 [1] devtools_1.5   digest_0.6.4   evaluate_0.5.5 httr_0.3       memoise_0.2.1  RCurl_1.95-4.1
 [7] stats4_3.1.0   stringr_0.6.2  tools_3.1.0    whisker_0.3-2
--

On 28/05/2014, at 3:23 AM, Hervé Pagès hpa...@fhcrc.org wrote:

 Hi Peter,
 
 On 05/26/2014 04:37 PM, Peter Hickey wrote:
 Thanks for the suggested work-around, Martin. In order to define the method 
 on the group generic 'Ops' rather than '==' I will need to generalise 
 .MTuples.compare to the 'Arith', 'Compare' and 'Logic' sub-groups listed in 
 ?Ops, won't I? I'll try to implement the comparison methods for MTuples via 
 the 'Ops' group generic but I want to first make sure that I'm not missing 
 out on getting some of these for free. The rest of this post might be 
 tangential to the issue I first raised, but it also explains why I 
 implemented '==' rather than on the group generic 'Ops'.
 
 The MTuples class extends GRanges:
 
 setClass('MTuples', representation(extraPos = "matrix"), contains = 
 "GRanges", validity = .valid.MTuples)
 
 I defined .MTuples.compare because while MTuples is based on GRanges, the 
 GRanges 'compare' method is inappropriate for MTuples (see 
 https://github.com/PeteHaitch/cometh/blob/8be01b8b37b6288db5493f53188a193a423ab69f/R/methods-MTuples-class.R
  if interested). Based on reading ?compare, I'd hoped that the binary 
 parallel comparison operators (<=, ==, !=, >=, < and >) would work 
 out-of-the-box. But I already found that I needed to include explicit method 
 definitions for '<=' and '==' because otherwise these (and the other binary 
 comparison operators) would dispatch on the GRanges methods rather than the 
 MTuples methods. This is because the GRanges '==' and '<=' methods are 
 explicitly implemented as:
 
 setMethod("==", signature(e1="GenomicRanges", e2="GenomicRanges"), 
 function(e1, e2) { .GenomicRanges.compare(e1, e2) == 0L })
 setMethod("<=", signature(e1="GenomicRanges", e2="GenomicRanges"), 
 function(e1, e2) { .GenomicRanges.compare(e1, e2) <= 0L })
 
 I don't understand why these explicit definitions are necessary. The 
 'compare' method for GRanges is already defined, so why don't these work 
 out-of-the-box, which is what I expected based on my reading of ?compare 
 (specifically, point 1 in the Note sub-section)?
 
 You're right, they are not needed. I just removed them from
 GenomicRanges 1.17.17. See if that helps with the original problem
 you reported.
 
 Cheers,
 H.
 
 
 Because these are explicitly defined, and not simply defined via 
 inheritance, I must also explicitly define '<=' and '==' methods for MTuples 
 (although I do get '>=', '!=', '<' and '>' for free, which is great!). 
 Unfortunately, as you can see, the MTuples '==' method breaks '==' for other 
 signatures. Hope that wasn't too tangential.
 
 Cheers,
 Pete
 
 
 On 27/05/2014, at 2:44 AM, Martin Morgan mtmor...@fhcrc.org wrote:
 
 On 05/25/2014 09:39 PM, Peter Hickey wrote:
 The "==" method that I have defined for my class, MTuples, is breaking the 
 "==" method for signature c("Rle", "vector"). I discovered this when 
 working on something quite unrelated, namely, I couldn't resize IRanges 
 with fix = "end" when my package was loaded. The attached code 
 highlights the initial problem.
 
 The error message and traceback were a little hairy - could someone please

Re: [Bioc-devel] My "==" method breaks existing "==" method for signature c("Rle", "vector")

2014-05-26 Thread Peter Hickey
Thanks for the suggested work-around, Martin. In order to define the method on 
the group generic 'Ops' rather than '==' I will need to generalise 
.MTuples.compare to the 'Arith', 'Compare' and 'Logic' sub-groups listed in 
?Ops, won't I? I'll try to implement the comparison methods for MTuples via the 
'Ops' group generic but I want to first make sure that I'm not missing out on 
getting some of these for free. The rest of this post might be tangential to 
the issue I first raised, but it also explains why I implemented '==' rather 
than on the group generic 'Ops'.
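
For concreteness, here is a minimal sketch of what a group-generic method restricted to the 'Compare' sub-group could look like, so that 'Arith' and 'Logic' are left alone. ToyCmp and .toy.compare are hypothetical stand-ins for MTuples and .MTuples.compare, and this uses only the methods package; it is not the cometh code.

```r
library(methods)

# Hypothetical toy class standing in for MTuples: one integer 'pos' slot
setClass("ToyCmp", representation(pos = "integer"))

# Hypothetical stand-in for .MTuples.compare(): element-wise, returns a
# negative, zero or positive integer
.toy.compare <- function(e1, e2) e1@pos - e2@pos

# One method for the whole 'Compare' sub-group; .Generic holds the name of
# the operator that was actually called
setMethod("Compare", c("ToyCmp", "ToyCmp"), function(e1, e2) {
  d <- .toy.compare(e1, e2)
  switch(.Generic,
         "==" = d == 0L, "!=" = d != 0L,
         "<=" = d <= 0L, ">=" = d >= 0L,
         "<"  = d <  0L, ">"  = d >  0L)
})

x <- new("ToyCmp", pos = c(1L, 5L, 9L))
y <- new("ToyCmp", pos = c(1L, 7L, 3L))

eq <- x == y
le <- x <= y
```

Whether defining the method on 'Compare' rather than 'Ops' also sidesteps the dispatch problem is something I have not verified.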

The MTuples class extends GRanges:

setClass('MTuples', representation(extraPos = "matrix"), contains = "GRanges", 
validity = .valid.MTuples)

I defined .MTuples.compare because while MTuples is based on GRanges, the 
GRanges 'compare' method is inappropriate for MTuples (see 
https://github.com/PeteHaitch/cometh/blob/8be01b8b37b6288db5493f53188a193a423ab69f/R/methods-MTuples-class.R
 if interested). Based on reading ?compare, I'd hoped that the binary 
parallel comparison operators (<=, ==, !=, >=, < and >) would work 
out-of-the-box. But I already found that I needed to include explicit method 
definitions for '<=' and '==' because otherwise these (and the other binary 
comparison operators) would dispatch on the GRanges methods rather than the 
MTuples methods. This is because the GRanges '==' and '<=' methods are 
explicitly implemented as:

setMethod("==", signature(e1="GenomicRanges", e2="GenomicRanges"), function(e1, 
e2) { .GenomicRanges.compare(e1, e2) == 0L })
setMethod("<=", signature(e1="GenomicRanges", e2="GenomicRanges"), function(e1, 
e2) { .GenomicRanges.compare(e1, e2) <= 0L })

I don't understand why these explicit definitions are necessary. The 'compare' 
method for GRanges is already defined, so why don't these work out-of-the-box, 
which is what I expected based on my reading of ?compare (specifically, point 1 
in the Note sub-section)? Because these are explicitly defined, and not simply 
defined via inheritance, I must also explicitly define '<=' and '==' methods 
for MTuples (although I do get '>=', '!=', '<' and '>' for free, which is 
great!). Unfortunately, as you can see, the MTuples '==' method breaks '==' for 
other signatures. Hope that wasn't too tangential.
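
To illustrate what I had expected, here is a minimal sketch in plain S4. ToyBase, ToyDerived and toyCompare are hypothetical names (not the GenomicRanges internals): when the parent's "==" is written in terms of a generic compare function rather than a private helper, a subclass only has to supply its own compare method and inherits "==" unchanged.

```r
library(methods)

# Hypothetical parent and child classes
setClass("ToyBase", representation(start = "integer"))
setClass("ToyDerived", contains = "ToyBase",
         representation(extra = "integer"))

# Hypothetical generic playing the role of compare()
setGeneric("toyCompare", function(x, y) standardGeneric("toyCompare"))

# Parent compares on 'start' only
setMethod("toyCompare", c("ToyBase", "ToyBase"),
          function(x, y) x@start - y@start)

# Child breaks ties on 'start' using 'extra'
setMethod("toyCompare", c("ToyDerived", "ToyDerived"), function(x, y) {
  d <- x@start - y@start
  ifelse(d != 0L, d, x@extra - y@extra)
})

# "==" is defined once, for the parent, via the generic
setMethod("==", c("ToyBase", "ToyBase"),
          function(e1, e2) toyCompare(e1, e2) == 0L)

a <- new("ToyDerived", start = c(1L, 2L), extra = c(10L, 20L))
b <- new("ToyDerived", start = c(1L, 2L), extra = c(10L, 99L))

# Inherited "==" calls toyCompare(), which dispatches to the child method
res <- a == b
```

Had the parent's "==" called a fixed helper that looks only at 'start', both elements would compare equal, which is the behaviour I was trying to avoid for MTuples.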

Cheers,
Pete


On 27/05/2014, at 2:44 AM, Martin Morgan mtmor...@fhcrc.org wrote:

 On 05/25/2014 09:39 PM, Peter Hickey wrote:
 The "==" method that I have defined for my class, MTuples, is breaking the 
 "==" method for signature c("Rle", "vector"). I discovered this when working 
 on something quite unrelated, namely, I couldn't resize IRanges with fix 
 = "end" when my package was loaded. The attached code highlights the initial 
 problem.
 
 The error message and traceback were a little hairy - could someone please 
 help me figure out what's going wrong?
 
 
 I don't have an immediate answer. A work-around is to define the method on 
 the group generic 'Ops' rather than '==', though maybe the generalization of 
 .MTuples.compare to other members of the Ops family is not easy?
 
 setMethod("Ops", c("MTuples", "MTuples"), function(e1, e2) TRUE)
 
 Martin
 
 Thanks,
 Pete
 
 --
 # Mimics the initial problem
 # Fresh R session
 library(IRanges)
 ir <- IRanges(start = 11, end = rep.int(20, 5))
 
 # Works when repeatedly called
 resize(ir, 1, 'end') # Works
 resize(ir, 1, 'end') # Works
 
 # The method definition that breaks things
 # I haven't included the MTuples class definition or the .MTuples.compare function
 # But that shouldn't matter in order to highlight the problem, should it?
 setMethod("==", c("MTuples", "MTuples"), function(e1, e2) {
   .MTuples.compare(e1, e2) == 0L
 })
 
 # No longer works
 resize(ir, 1, 'end')
 --
 
 
 I managed to simplify the reproducible example to the following, but I can't 
 figure out what's going wrong:
 --
 # A simplified example
 # Fresh R session
 library(IRanges)
 fix <- Rle('end', 10)
 
 # Works when repeatedly called
 fix == 'end' # Works
 fix == 'end' # Works
 
 # The method definition that breaks things
 # I haven't included the MTuples class definition or the .MTuples.compare function
 # But that shouldn't matter in order to highlight the problem, should it?
 setMethod("==", c("MTuples", "MTuples"), function(e1, e2) {
   .MTuples.compare(e1, e2) == 0L
 })
 
 # Works the first time, but not the second
 fix == 'end' # Works
 fix == 'end' # Breaks
 --
 
 The same problem occurs if the vector is numeric and not character, e.g. 
 using 7 instead of 'end'.
 
 When this breaks I get the error

[Bioc-devel] My "==" method breaks existing "==" method for signature c("Rle", "vector")

2014-05-25 Thread Peter Hickey
The "==" method that I have defined for my class, MTuples, is breaking the "==" 
method for signature c("Rle", "vector"). I discovered this when working on 
something quite unrelated, namely, I couldn't resize IRanges with fix = "end" 
when my package was loaded. The attached code highlights the initial problem.

The error message and traceback were a little hairy - could someone please help 
me figure out what's going wrong?

Thanks,
Pete

--
# Mimics the initial problem
# Fresh R session
library(IRanges)
ir <- IRanges(start = 11, end = rep.int(20, 5))

# Works when repeatedly called
resize(ir, 1, 'end') # Works
resize(ir, 1, 'end') # Works

# The method definition that breaks things
# I haven't included the MTuples class definition or the .MTuples.compare function
# But that shouldn't matter in order to highlight the problem, should it?
setMethod("==", c("MTuples", "MTuples"), function(e1, e2) {
  .MTuples.compare(e1, e2) == 0L
})

# No longer works
resize(ir, 1, 'end')
--


I managed to simplify the reproducible example to the following, but I can't 
figure out what's going wrong:
--
# A simplified example
# Fresh R session
library(IRanges)
fix <- Rle('end', 10)

# Works when repeatedly called
fix == 'end' # Works
fix == 'end' # Works

# The method definition that breaks things
# I haven't included the MTuples class definition or the .MTuples.compare function
# But that shouldn't matter in order to highlight the problem, should it?
setMethod("==", c("MTuples", "MTuples"), function(e1, e2) {
  .MTuples.compare(e1, e2) == 0L
})

# Works the first time, but not the second
fix == 'end' # Works
fix == 'end' # Breaks
--

The same problem occurs if the vector is numeric and not character, e.g. using 
7 instead of 'end'.

When this breaks I get the error:
--
Error in Rle(values = callGeneric(runValue(e1)[which1], runValue(e2)[which2]),  : 
  error in evaluating the argument 'values' in selecting a method for function 'Rle': 
  Error in as.character(call[[1L]]) : 
  cannot coerce type 'builtin' to vector of type 'character'
--

My session info is:
--
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] IRanges_1.22.7  BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] stats4_3.1.0
--



Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au




[Bioc-devel] Restrictions on findOverlaps parameters with SummarizedExperiment objects

2014-04-11 Thread Peter Hickey
Could the findOverlaps method with signatures involving SummarizedExperiment 
objects please be extended to allow the full range of type and select 
arguments? Please see the examples below for situations where the parameter 
choices seem unduly restrictive. Are there reasons for these restrictions?

Thanks,
Pete

These examples use GenomicRanges_1.14.4:

### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
## Returns error: Error in match.arg(type) : 'arg' should be one of "any", 
## "start", "end", "within"
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowData <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                   IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                   strand=sample(c("+", "-"), 200, TRUE))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
sset <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowData=rowData, colData=colData)
findOverlaps(sset, sset, type = 'equal')

### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
## But this works fine (ostensibly doing the same thing as the above but via a 
## different signature)
findOverlaps(sset, rowData(sset), type = 'equal')

### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
## I looked at the method definitions for various signatures 
## This revealed that only a subset of type and select arguments are 
## allowed for certain signatures involving SummarizedExperiment objects
## E.g. Does not allow type = 'equal' or select = 'last' or select = 'arbitrary'
selectMethod('findOverlaps', c('SummarizedExperiment', 'SummarizedExperiment'))
Method Definition:

function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", 
    "start", "end", "within"), select = c("all", "first"), ...) 
{
    .local <- function (query, subject, maxgap = 0L, minoverlap = 1L, 
        type = c("any", "start", "end", "within"), select = c("all", 
            "first"), ignore.strand = FALSE) 
    {
        findOverlaps(rowData(query), rowData(subject), maxgap = maxgap, 
            minoverlap = minoverlap, type = match.arg(type), 
            select = match.arg(select), ignore.strand = ignore.strand)
    }
    .local(query, subject, maxgap, minoverlap, type, select, 
        ...)
}
<environment: namespace:GenomicRanges>

Signatures:
        query                  subject               
target  "SummarizedExperiment" "SummarizedExperiment"
defined "SummarizedExperiment" "SummarizedExperiment"
### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
## E.g. Does allow type = 'equal' but does not allow select = 'last' or select 
## = 'arbitrary'
selectMethod('findOverlaps', c('SummarizedExperiment', 'Vector'))
Method Definition:

function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", 
    "start", "end", "within", "equal"), select = c("all", "first"), 
    ...) 
{
    .local <- function (query, subject, maxgap = 0L, minoverlap = 1L, 
        type = c("any", "start", "end", "within", "equal"), select = c("all", 
            "first"), ignore.strand = FALSE) 
    {
        findOverlaps(rowData(query), subject, maxgap = maxgap, 
            minoverlap = minoverlap, type = match.arg(type), 
            select = match.arg(select), ignore.strand = ignore.strand)
    }
    .local(query, subject, maxgap, minoverlap, type, select, 
        ...)
}
<environment: namespace:GenomicRanges>

Signatures:
        query                  subject 
target  "SummarizedExperiment" "Vector"
defined "SummarizedExperiment" "Vector"

### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
## E.g. Does allow type = 'equal' but does not allow select = 'last' or select 
## = 'arbitrary'
selectMethod('findOverlaps', c('Vector', 'SummarizedExperiment'))
Method Definition:

function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", 
    "start", "end", "within", "equal"), select = c("all", "first"), 
    ...) 
{
    .local <- function (query, subject, maxgap = 0L, minoverlap = 1L, 
        type = c("any", "start", "end", "within", "equal"), select = c("all", 
            "first"), ignore.strand = FALSE) 
    {
        findOverlaps(query, rowData(subject), maxgap = maxgap, 
            minoverlap = minoverlap, type = match.arg(type), 
            select = match.arg(select), ignore.strand = ignore.strand)
    }
    .local(query, subject, maxgap, minoverlap, type, select, 
        ...)
}
<environment: namespace:GenomicRanges>

Signatures:
        query    subject               
target  "Vector" "SummarizedExperiment"
defined "Vector" "SummarizedExperiment"
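
The mechanism behind the error can be reproduced in base R without any Bioconductor packages. In the sketch below, inner_fun and outer_restricted are hypothetical names: the wrapper's own match.arg() call rejects "equal" before the inner function, which does support it, is ever reached.

```r
# Hypothetical inner function that supports all five 'type' values
inner_fun <- function(type = c("any", "start", "end", "within", "equal")) {
  match.arg(type)
}

# Hypothetical wrapper whose default vector omits "equal"; match.arg()
# validates against this shorter list, so "equal" never reaches inner_fun()
outer_restricted <- function(type = c("any", "start", "end", "within")) {
  inner_fun(type = match.arg(type))
}

ok <- inner_fun("equal")

# Capture the error object rather than stopping
err <- tryCatch(outer_restricted("equal"), error = function(e) e)
```

So the restriction comes purely from the default argument vectors in the wrapper methods, not from the underlying findOverlaps method on the row data.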


Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au



Re: [Bioc-devel] Restrictions on findOverlaps parameters with SummarizedExperiment objects

2014-04-11 Thread Peter Hickey
Thanks for the explanation and the update, Martin.
Cheers,
Pete
- Original Message -
From: Martin Morgan mtmor...@fhcrc.org
To: Peter Hickey hic...@wehi.edu.au, bioc-devel@r-project.org
Sent: Sat, 12 Apr 2014 06:45:46 +1000 (EST)
Subject: Re: [Bioc-devel] Restrictions on findOverlaps parameters with 
SummarizedExperiment objects

On 04/10/2014 11:44 PM, Peter Hickey wrote:
 Could the findOverlaps method with signatures involving SummarizedExperiment 
 objects please be extended to allow the full range of type and select 
 arguments? Please see the examples below for situations where the parameter 
 choices seem unduly restrictive. Are there reasons for these restrictions?


Hi Pete -- Updated in 1.15.46, thanks. I think the original rationale had been 
that the rowData can be a GRanges or GRangesList, and type and select are not 
equally supported across these types, so only those values implemented for all 
row data were available. Martin
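
The restriction described above can be reproduced with match.arg() alone. The following is a minimal base-R sketch, not GenomicRanges code: overlap_type_restricted() and overlap_type_full() are hypothetical stand-ins whose 'type' defaults mirror the restricted and the full set of choices. match.arg() only accepts values listed in the default vector of the calling function, so 'equal' is rejected by the first wrapper and accepted by the second.

```r
# Hypothetical stand-ins (not part of GenomicRanges): the set of accepted
# values is determined entirely by the default vector of 'type'.
overlap_type_restricted <- function(type = c("any", "start", "end", "within")) {
  match.arg(type)
}
overlap_type_full <- function(type = c("any", "start", "end", "within", "equal")) {
  match.arg(type)
}

# With no argument supplied, match.arg() returns the first choice.
default_type <- overlap_type_restricted()

# 'equal' is accepted only when it appears among the defaults.
full_result <- overlap_type_full("equal")
restricted_result <- tryCatch(overlap_type_restricted("equal"),
                              error = function(e) "rejected")
```

Under this reading, extending the defaults of a method is enough to widen what it accepts, provided the method it dispatches to supports the extra values.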


> Thanks,
> Pete
>
> These examples use GenomicRanges_1.14.4:
>
> ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ## Returns error: Error in match.arg(type) : 'arg' should be one of "any",
> ## "start", "end", "within"
> nrows <- 200; ncols <- 6
> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
> rowData <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
>                    IRanges(floor(runif(200, 1e5, 1e6)), width=100),
>                    strand=sample(c("+", "-"), 200, TRUE))
> colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
>                      row.names=LETTERS[1:6])
> sset <- SummarizedExperiment(assays=SimpleList(counts=counts),
>                              rowData=rowData, colData=colData)
> findOverlaps(sset, sset, type = 'equal')
>
> ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ## But this works fine (ostensibly doing the same thing as the above but via
> ## a different signature)
> findOverlaps(sset, rowData(sset), type = 'equal')

> ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ## I looked at the method definitions for various signatures
> ## This revealed that only a subset of type and select arguments are
> ## allowed for certain signatures involving SummarizedExperiment objects
> ## E.g. Does not allow type = 'equal' or select = 'last' or select =
> ## 'arbitrary'
> selectMethod('findOverlaps', c('SummarizedExperiment',
>                                'SummarizedExperiment'))
> Method Definition:
>
> function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any",
>     "start", "end", "within"), select = c("all", "first"), ...)
> {
>     .local <- function (query, subject, maxgap = 0L, minoverlap = 1L,
>         type = c("any", "start", "end", "within"), select = c("all",
>             "first"), ignore.strand = FALSE)
>     {
>         findOverlaps(rowData(query), rowData(subject), maxgap = maxgap,
>             minoverlap = minoverlap, type = match.arg(type),
>             select = match.arg(select), ignore.strand = ignore.strand)
>     }
>     .local(query, subject, maxgap, minoverlap, type, select,
>         ...)
> }
> <environment: namespace:GenomicRanges>
>
> Signatures:
>         query                  subject
> target  "SummarizedExperiment" "SummarizedExperiment"
> defined "SummarizedExperiment" "SummarizedExperiment"

> ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ## E.g. Does allow type = 'equal' but does not allow select = 'last' or
> ## select = 'arbitrary'
> selectMethod('findOverlaps', c('SummarizedExperiment', 'Vector'))
> Method Definition:
>
> function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any",
>     "start", "end", "within", "equal"), select = c("all", "first"),
>     ...)
> {
>     .local <- function (query, subject, maxgap = 0L, minoverlap = 1L,
>         type = c("any", "start", "end", "within", "equal"), select = c("all",
>             "first"), ignore.strand = FALSE)
>     {
>         findOverlaps(rowData(query), subject, maxgap = maxgap,
>             minoverlap = minoverlap, type = match.arg(type),
>             select = match.arg(select), ignore.strand = ignore.strand)
>     }
>     .local(query, subject, maxgap, minoverlap, type, select,
>         ...)
> }
> <environment: namespace:GenomicRanges>
>
> Signatures:
>         query                  subject
> target  "SummarizedExperiment" "Vector"
> defined "SummarizedExperiment" "Vector"

> ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ## E.g. Does allow type = 'equal' but does not allow select = 'last' or
> ## select = 'arbitrary'
> selectMethod('findOverlaps', c('Vector', 'SummarizedExperiment'))
> Method Definition:
>
> function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any",
>     "start", "end", "within", "equal"), select = c("all", "first"),
>     ...)
> {
>     .local <- function (query, subject, maxgap = 0L, minoverlap = 1L,
>         type = c("any", "start", "end", "within", "equal"), select = c("all",
>             "first"), ignore.strand = FALSE)
>     {
>         findOverlaps(query, rowData(subject), maxgap = maxgap,
>             minoverlap

Re: [Bioc-devel] Help in designing class based on SummarizedExperiment

2014-02-14 Thread Peter Hickey
Apologies for the bump, but is anyone able to help me on this? Or are these 
questions more appropriate for the general Bioconductor mailing list rather 
than Bioc-Devel?

Many thanks,
Pete

- Original Message -
Date: Mon, 10 Feb 2014 13:20:47 +1100
From: Peter Hickey hic...@wehi.edu.au
To: bioc-devel@r-project.org
Subject: [Bioc-devel] Help in designing class based on
SummarizedExperiment
Message-ID: e3127fd3-87e5-43da-b056-a633525ee...@wehi.edu.au
Content-Type: text/plain

Hi all,

Apologies up front for the rather long post.

I'm designing a class to store what I call co-methylation m-tuples. These are 
based on a very simple tab-delimited file format. 
For example, here are 1-tuples (m = 1):
chr   pos1   M  U
chr1  57691  0  1
chr1  59276  1  0
chr1  60408  1  0
chr1  63495  1  0
chr1  63568  2  0
chr1  63627  3  0

2-tuples (m = 2):
chr   pos1    pos2    MM  MU  UM  UU
chr1  567438  567570  0   0   0   2
chr1  567501  567549  0   0   0   35
chr1  567549  567558  0   1   0   139

3-tuples (m = 3):
chr   pos1   pos2   pos3   MMM  MMU  MUM  MUU  UMM  UMU  UUM  UUU
chr1  13644  13823  13828  1    0    0    0    0    0    0    0
chr1  14741  14747  14773  1    0    0    0    0    0    0    0

etc.

1-tuples are basically the standard input to an analysis of BS-seq data. 

I think of these files as being comprised of 3 parts: the 'chr' column (chr), 
the 'pos' matrix (pos1, pos2, pos3) and the 'counts' matrix (MMM, MMU, MUM, 
MUU, UMM, UMU, UUM, UUU), when m = 3. For a given value of 'm' there is one 
'chr' column, m 'pos' columns and 2^m 'counts' columns.
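
The column layout just described can be sketched in base R. This is only an illustration: mtuple_colnames() is a hypothetical helper, not part of any package, and it assumes the counts columns are ordered as in the examples above (M before U, first position varying slowest).

```r
# Hypothetical helper: expected column names of an m-tuple file, i.e.
# one 'chr' column, m 'pos' columns and 2^m 'counts' columns.
mtuple_colnames <- function(m) {
  stopifnot(length(m) == 1, m >= 1)
  pos <- paste0("pos", seq_len(m))
  # expand.grid() varies its first column fastest, so the columns are pasted
  # in reverse order to get MMM, MMU, MUM, ... rather than MMM, UMM, MUM, ...
  grid <- expand.grid(rep(list(c("M", "U")), m), stringsAsFactors = FALSE)
  reversed <- unname(as.list(grid[, rev(seq_len(m)), drop = FALSE]))
  counts <- do.call(paste0, reversed)
  c("chr", pos, counts)
}

cols_m1 <- mtuple_colnames(1)
cols_m3 <- mtuple_colnames(3)
```

For m = 3 this gives 1 + 3 + 8 = 12 names, matching the 3-tuple header shown earlier.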

I want to implement a class for these objects as I'm writing a package for the 
analysis of this type of data. I'd like a GRanges-type object storing the 
genomic information and a matrix-like object storing the counts. After 
tinkering around for a while, and doing some reading of the code in packages 
such as GenomicRanges and bsseq, I decided to extend the SummarizedExperiment 
class.  I now have a prototype but I have some questions and would appreciate 
feedback on some of my design choices before I translate my existing functions 
to work with this class of object.

Here is the code for the prototype:
#
library(GenomicRanges)

setClass("CoMeth", contains = "SummarizedExperiment")

CoMeth <- function(seqnames, pos, counts, m, methylation_type, sample_name,
                   strand = "*", seqlengths = NULL, seqinfo = NULL) {

  # Argument checks, etc. go here #

  # The width of each element is defined by the first and last 'pos',
  # e.g. for 3-tuples it is defined by pos1 and pos3.
  gr <- GRanges(seqnames = seqnames,
                ranges = IRanges(start = pos[[1]], end = pos[[length(pos)]]),
                strand = strand, seqlengths = seqlengths, seqinfo = seqinfo)
  # Need to store the extra positions if m > 2. Each additional position is
  # stored as a separate assay
  if (m > 2) {
    extra_pos <- lapply(seq(2, m - 1, 1), function(i, pos) {
      pos[[i]]
    }, pos = pos)
    names(extra_pos) <- names(pos)[2:(m - 1)]
  } else {
    extra_pos <- NULL
  }
  assays <- SimpleList(c(counts, extra_pos))
  colData <- DataFrame(sample_name = sample_name, m = m,
                       methylation_type = paste0(sort(methylation_type),
                                                 collapse = '/'))
  cometh <- SummarizedExperiment(assays = assays, rowData = gr,
                                 colData = colData)
  cometh <- as(cometh, "CoMeth")

  return(cometh)
}

And here's some example data:
# A function that roughly imitates the output of a call to scan() to read in
# BS-seq m-tuple data
# m is the size of the m-tuples
# n is the number of m-tuples
# z is the proportion of each column of 'counts' that is zero
make_test_data <- function(m, n, z) {
  seqnames <- list(seqnames = rep('chr1', n))
  # Need these to be matrices rather than vectors
  pos <- lapply(1:m, function(x, n) {
    matrix(seq(from = 1 + x - 1, to = n + x - 1, by = 1), ncol = 1)
  }, n = n)
  names(pos) <- paste0('pos', 1:m)
  # A rough hack to simulate counts where a proportion (z) are 0 and the rest
  # are sampled from Poisson(lambda). Small values of lambda will inflate the
  # zero-count.
  # Need these to be matrices rather than vectors
  counts <- mapply(FUN = function(i, z, n, lambda) {
    nz <- floor(n * (1 - z))
    matrix(sample(c(rpois(nz, lambda), rep(0, n - nz))), ncol = 1)
  }, i = 1:(2 ^ m), z = z, n = n, lambda = 4, SIMPLIFY = FALSE)
  names(counts) <- sort(do.call(paste0, expand.grid(lapply(seq_len(m),
    function(x) {c('M', 'U')}))))
  return(c(seqnames, pos, counts))
}

m <- 3 # An example using 3-tuples
n <- 1000 # A typical value for 3-tuples from a methylC-seq experiment is n = 17,000,000
z <- c(0.2, 0.6, 0.6, 0.7, 0.6, 0.8, 0.8, 0.7) # Typical

[Bioc-devel] Help in designing class based on SummarizedExperiment

2014-02-09 Thread Peter Hickey
/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.14.4 XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] stats4_3.0.2 tools_3.0.2
#

Questions
1. How can I move the 'extra_pos' columns out of the assays slot but keep
the copy-on-change behaviour? From a design perspective, I think it would make
more sense for the 'extra_pos' columns, i.e. ('pos2') for 3-tuples and ('pos2',
'pos3') for 4-tuples etc., to be in their own slot rather than in the assays
slot; after all, they aren't assays but rather additional genomic
co-ordinates. The 'extra_pos' fields are fixed (at least until I start
subsetting or combining multiple CoMeth objects). My understanding of the
SummarizedExperiment class is that the assays slot is a reference class to
avoid excessive copying when changing other slots of a SummarizedExperiment
object. So if the 'extra_pos' columns were stored outside of the assays slot
then these would have to be copied whenever changes are made to the other
slots of a CoMeth object, correct? Is there a way to avoid this, i.e. so that
these 'extra_pos' columns are stored separately from the assays slot but with
the copy-on-change behaviour of the assays slot?
2. Is this the correct way to compute something based on the 'counts' data,
i.e. via the assay() accessor? For example, I might want a helper function
getCounts(cometh) that does the equivalent of sapply(X = 1:(2^m), function(i,
cometh){assay(cometh, i)}, cometh = cometh). Similarly, I might want to compute
the coverage of an m-tuple, which would be the equivalent of
rowSums(getCounts(cometh)). Is this the correct way to do this sort of thing?
3. How do I measure the size of a SummarizedExperiment/CoMeth object?
For example, with the test data, print(object.size(cometh), units = "auto") <
print(object.size(assays(cometh)), units = "auto"), so it seems that the size
of the assays slot isn't counted by object.size().
4. Is it possible to store an Rle-type object in the assays slot of a 
SummarizedExperiment? 20-80% of the entries in each column of 'counts' are zero 
and there are often runs of zeros. So I thought that perhaps an Rle 
representation (column-wise) might be more (memory) efficient. But I can't seem 
to get an Rle object in the assays slot (I tried via DataFrame); is it even 
possible?
5. Are there matrix-like objects with Rle columns? I found this thread
started by Kasper Hansen
(https://stat.ethz.ch/pipermail/bioconductor/2012-June/046473.html) discussing
the idea of a matrix-like object where the columns are Rle's; I could imagine
using such an object for a CoMeth object containing multiple samples, i.e. MMM
is a matrix-like object with ncol = # of samples, MMU is a matrix-like object
with ncol = # of samples, etc. Was anything like this ever implemented? My
reading of the previous thread was to use a DataFrame, but the matrix API,
e.g. rowSums, doesn't work with DataFrames (and see (4) as to whether it's even
possible to store such objects in the assays slot).
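
On question 3, one possible explanation can be sketched in base R. The sketch assumes the assays slot is held by reference in an environment-like container (as reference classes are). This is an assumption about the cause, not a statement about the SummarizedExperiment internals. object.size() measures the environment handle itself but does not follow it to the objects stored inside. The names store and payload are illustrative only.

```r
# Illustrative only: 'store' stands in for a reference-like container and
# 'payload' for the data it holds.
payload <- matrix(0, nrow = 1000, ncol = 8)
store <- new.env()
assign("counts", payload, envir = store)

# object.size() of the environment does not include its contents ...
size_of_handle <- as.numeric(object.size(store))
# ... whereas extracting the contents first measures the data itself.
size_of_payload <- as.numeric(object.size(get("counts", envir = store)))
```

If that assumption holds, measuring the extracted contents, as object.size(assays(cometh)) does, would be the more informative number for the counts.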

Many thanks for your help in answering these questions. Any other suggestions 
on the design of the CoMeth class are appreciated.

Thanks,
Pete

Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel