Re: [Bioc-devel] DataFrameList to Wide Format DataFrame

2021-12-17 Thread Michael Lawrence via Bioc-devel
This is more of a support site question.

The stack() function is relevant here, but it won't fill in the missing columns.

Note though that there are some conveniences that might help a tiny
bit, like how colnames(DFL) returns a CharacterList, so you can do
unique(unlist(colnames(DFL))).

In theory we could make [<-() on a DataFrameList behave more like its
SplitDataFrameList derivative and insert columns into each of its
elements, so you could do something like:

DFL[,psetdiff(unique(unlist(colnames(DFL))), colnames(DFL))] <- NA

I don't know if psetdiff() would work in that way, but it could.

Michael

On Thu, Dec 16, 2021 at 11:01 PM Dario Strbenac via Bioc-devel
 wrote:
>
> Hello,
>
> Ah, yes, the sample names should of course be in the rows - Friday afternoon 
> error. In the question, I specified "largely the same set of features", 
> implying that the overlap is not complete. So, the example below will error.
>
> DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = 
> LETTERS[1:3]),
>  Y = DataFrame(b = 4:6, c = 6:4, row.names = 
> LETTERS[20:22]))
> unlist(DFL)
> Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = 
> strict.colnames) :
>   the DFrame objects to combine must have the same column names
>
> This is long but works:
>
> allFeatures <- unique(unlist(lapply(DFL, colnames)))
> DFL <- lapply(DFL, function(DF)
> {
>   missingFeatures <- setdiff(allFeatures, colnames(DF))
>   DF[missingFeatures] <- NA
>   DF
> })
> DFLflattened <- do.call(rbind, DFL)
>
> Is there a one-line function for it?
>
> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Principal Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] S4 Method Slow Execution if Signature Has Multiple Class Unions

2021-12-03 Thread Michael Lawrence via Bioc-devel
Hi Dario,

Thanks for the reproducible example. The time comes from building all
the possible signatures when dispatching the first time.  The expense
comes from the length of the signature. After the first dispatch, the
table is cached and everything is fast.

Have you considered just using an ordinary function as a constructor?
Usually, there is no need for polymorphism during construction, and
any that is necessary can be handled through explicit conditions. One
reason for this is that it is rare to desire extensibility of
construction across packages, since the constructor makes assumptions
about the internal structure of the class.

In this case, you are using dispatch to condition on argument
missingness, which I would argue is more clearly implemented just
using `if(missing())` and/or putting default values in the constructor
formals. There are very few valid use cases for generics with
signatures this long.

As an aside, the behavior (at least in this example) seems confusing.
For example, ParamSet() uses 'M' as the default for `A`, while
specifying any parameter results in `A` defaulting to 'L'.

Hope this helps,
Michael



Michael

On Wed, Nov 24, 2021 at 3:02 AM Dario Strbenac via Bioc-devel
 wrote:
>
> Hello,
>
> Thanks. It was difficult to pinpoint, but I was able to make a minimal 
> example. It happens only if SummarizedExperiment is pre-loaded. The 
> difference is 0.2 seconds versus 32 seconds on my modest Windows 10 laptop 
> computer - a 150 times slowdown. Can you reproduce it?
>
> library(SummarizedExperiment)
>
> setClassUnion("characterOrMissing", c("character", "missing"))
> setClassUnion("integerOrMissing", c("integer", "missing"))
> setClass("ParamsSet", representation(A = "characterOrMissing", B = "integer"))
> setGeneric("ParamsSet", function(A, B, C, D, E, F, G, H) 
> standardGeneric("ParamsSet"))
>
> setMethod("ParamsSet", c("missing", "missing", "missing", "missing", 
> "missing", "missing", "missing", "missing"),
> function() # Empty constructor
> {
>   new("ParamsSet", A = 'M', B = 300L)
> })
>
> setMethod("ParamsSet", c("characterOrMissing", "integerOrMissing", 
> "integerOrMissing", "integerOrMissing",
>  "characterOrMissing", "integerOrMissing", 
> "integerOrMissing", "integerOrMissing"),
> function(A = c('L', 'M', 'N'), B = 500, C = 100, D, E, F, G, H)
> {
>   if(missing(A)) A <- 'L' # Mimick match.arg.
>   if(missing(B)) B <- 500L # Hack to implement parameter defaults not 
> specified by generic.
>   if(missing(C)) C <- 100L
>   new("ParamsSet", A = A, B = B)
> })
>
> system.time(ParamsSet(B = 999L)) # Slow or fast, depending on 
> SummarizedExperiment presence.
>
> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Michael Lawrence
Principal Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] S4 Method Slow Execution if Signature Has Multiple Class Unions

2021-11-23 Thread Michael Lawrence via Bioc-devel
Hi Dario,

Thanks for bringing this up. The second time speed could be explained
by the caching that happens inside the methods package. If you are
able to come up with a smaller reproducible example, I would be happy
to help out.

Michael

On Mon, Nov 22, 2021 at 1:00 PM Dario Strbenac via Bioc-devel
 wrote:
>
> Good day,
>
> I created two constructor methods for a generic function. One is for the 
> default empty constructor and the other is a constructor when any one or more 
> parameters is specified by the user. The method signatures are:
>
> 1. c("missing", "missing", "missing", "missing", "missing", "missing", 
> "missing", "missing"),
> 2. c("characterOrMissing", "numericOrMissing", "numericOrMissing", 
> "numericOrMissing", "numericOrMissing", "characterOrMissing", 
> "BiocParallelParamOrMissing", "numericOrMissing")
>
> The class unions are defined as you might expect.
>
> setClassUnion("characterOrMissing", c("character", "missing"))
> setClassUnion("numericOrMissing", c("numeric", "missing"))
> setClassUnion("BiocParallelParamOrMissing", c("BiocParallelParam", "missing"))
>
> The first method works as expected:
>
> > system.time(CrossValParams())
>user  system elapsed
>   0.165   0.000   0.165
>
> The second takes over ten minutes and constantly uses 100% CPU usage, 
> according to top.
>
> > system.time(CrossValParams("Leave-k-Out", leave = 2))
>user  system elapsed
> 760.018  15.093 775.090
>
> Strangely, if I rerun this code again, it works quickly the second time.
>
> > system.time(CrossValParams("Leave-k-Out", leave = 2))
>user  system elapsed
>   0.145   0.000   0.145
>
> I haven't been able to come up with a minimal reproducile example of the 
> issue. How can this be done consistently and efficiently?
>
> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Principal Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Too many dependencies / MultiAssayExperiment + rtracklayer

2021-09-20 Thread Michael Lawrence via Bioc-devel
Hi Shraddha,

>From the rtracklayer perspective, it sounds like Rsamtools is
(indirectly) bringing in those system libraries. I would have expected
zlibbioc to cover the zlib dependency, and perhaps bz2 and lzma
support is optional. Perhaps a core member could comment on that.

In the past, I've used this package
https://github.com/Bioconductor/codetoolsBioC to identify missing
NAMESPACE imports. In theory, you could remove the rtracklayer import
and run functions in that package to identify the symbol-level
dependencies. The output is a bit noisy though.

Btw, using @importFrom only allows you to be selective of symbol-level
dependencies, not package-level.

Michael

On Mon, Sep 20, 2021 at 11:37 AM Shraddha Pai  wrote:
>
> Hello again,
> I'm trying to simplify the dependencies for my package "netDx", make it
> easier to install. It's currently got over 200(!) + some Unix libraries
> that need to be installed.
>
> 1. I ran pkgDepMetrics() from BiocPkgTools to find less-needed pkgs, and
> the package with the most dependencies is MultiAssayExperiment (see below
> email). I'm using MAE to construct a container - is there a way to use
> @importFrom calls to reduce MAE dependencies?
>
> 2. Another problem package is rtracklayer which requires Rhtslib, which
> requires some unix libraries: zlib1g-dev libbz2-dev liblzma-dev. I'm not
> sure which functionality in the package requires rtracklayer - how can I
> tell? Is there a way to simplify / reduce these deps so the user doesn't
> have to install all these unix packages?
>
> 3. Are there other "problem packages" you can see that I can remove? Let's
> assume for now ggplot2 stays because people find it useful to have plotting
> functions readily available.
>
> Thanks very much in advance,
> Shraddha
> ---
> "ImportedAndUsed" "Exported" "Usage" "DepOverlap" "DepGainIfExcluded"
> "igraph" 1 782 0.13 0.05 0
> "ggplot2" 1 520 0.19 0.19 0
> "pracma" 1 448 0.22 0.03 0
> "plotrix" 1 160 0.62 0.03 1
> "S4Vectors" 2 283 0.71 0.03 0
> "grDevices" 1 112 0.89 0.01 0
> "httr" 1 91 1.1 0.05 0
> "scater" 1 85 1.18 0.4 0
> "utils" 3 217 1.38 0.01 0
> "GenomeInfoDb" 1 60 1.67 0.06 0
> "stats" 12 449 2.67 0.01 0
> "bigmemory" 1 35 2.86 0.03 3
> "RCy3" 12 386 3.11 0.32 18
> "BiocFileCache" 1 29 3.45 0.23 3
> "glmnet" 1 24 4.17 0.07 2
> "parallel" 2 33 6.06 0.01 0
> "combinat" 1 13 7.69 0.01 1
> "MultiAssayExperiment" 4 46 8.7 0.22 1
> "foreach" 2 23 8.7 0.02 0
> "graphics" 8 87 9.2 0.01 0
> "GenomicRanges" 15 106 14.15 0.08 0
> "rappdirs" 1 7 14.29 0.01 0
> "reshape2" 1 6 16.67 0.05 0
> "RColorBrewer" 1 4 25 0.01 0
> "netSmooth" 1 3 33.33 0.82 3
> "Rtsne" 1 3 33.33 0.02 0
> "doParallel" 1 2 50 0.03 0
> "ROCR" 2 3 66.67 0.05 4
> "clusterExperiment" NA 122 NA 0.74 0
> "IRanges" NA 255 NA 0.04 0
>
>
> --
>
> *Shraddha Pai, PhD*
> Principal Investigator, OICR
> Assistant Professor, Department of Molecular Biophysics, University of
> Toronto
> shraddhapai.com; @spaiglass on Twitter
> https://pailab.oicr.on.ca
>
>
> *Ontario Institute for Cancer Research*
> MaRS Centre, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G
> 0A3
> *@OICR_news*  | *www.oicr.on.ca*
> 
>
>
>
> *Collaborate. Translate. Change lives.*
>
>
>
> This message and any attachments may contain confident...{{dropped:22}}

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] rtracklayer::import (v 1.53.1) fails on xz-compressed files.

2021-09-02 Thread Michael Lawrence via Bioc-devel
Hi Charles,

Thanks for reaching out. Please ensure that you have built R with xz
(lzma) support. If this does not resolve your problem, please post on
support.bioconductor.org.

Michael

On Thu, Sep 2, 2021 at 2:54 PM Charles Plessy
 wrote:
>
> Hello
>
> on my R 4.1.0 system, rtracklayer version 1.53.1 refuses to load
> XZ-compressed GFF files, throwing the error "has unsupported type:
> xzfile".  As the function's manual page states that xz-compressed files
> are handled transparently, I suppose it is a bug?
>
> Have a ncie day,
>
> Charles Plessy
>
> --
> Charles Plessy - - ~ ~ ~ ~ ~  ~ ~ ~ ~ ~ - - charles.ple...@oist.jp
> Okinawa  Institute  of  Science  and  Technology  Graduate  University
> Staff scientist in the Luscombe Unit - ~ - https://groups.oist.jp/grsu
> Toots from work - ~ ~~ ~ - https://mastodon.technology/@charles_plessy
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Principal Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] git permissions

2021-03-01 Thread Michael Lawrence via Bioc-devel
Just got this when trying to push a change to IRanges:

Pushing to g...@git.bioconductor.org:packages/IRanges.git
FATAL: W any packages/IRanges m.lawrence DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Have git permissions changed?

Thanks,
Michael

-- 
Michael Lawrence
Senior Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] merging DFrames

2020-10-21 Thread Michael Lawrence via Bioc-devel
Laurent,

Thanks for bringing this up and offering to help. Yes, please raise an
issue. There's an opportunity to implement faster matching than
base::merge(), using stuff like matchIntegerQuads(), findMatches(), and
grouping().

grouping() can be really fast for character vectors, since it takes
advantage of string internalization. For example, let's say you're merging
on three character vector keys. Concatenate the keys of 'y' onto they keys
of 'x'. Then call grouping(k1, k2, k3) and you effectively have a matching.
Should be way faster than the paste() approach used by base::merge(). Would
be interesting to see.

Michael

On Wed, Oct 21, 2020 at 9:37 AM Pages, Herve  wrote:

> Hi Laurent,
>
> I think the current implementation was just an expedient to have
> something that works (in most cases). I don't know if a proper
> implementation that doesn't go thru data.frame is on the TODO list.
> Michael?
>
> I suggest you open an issue on GitHub under S4Vectors.
>
> Cheers,
> H.
>
> PS: Note that you can pass the list elements directly to the List()
> constructor, no need to construct an ordinary list first:
>
>List(1, 1:2, 1:3)  # same as List(list(1, 1:2, 1:3)))
>
>
> On 10/21/20 08:35, Laurent Gatto wrote:
> > When merging DFrame instances, the *List types are lost:
> >
> > The following two instances have NumericList columns (y and z)
> > d1 <- DataFrame(x = letters[1:3], y = List(list(1, 1:2, 1:3)))
> > d2 <- DataFrame(x = letters[1:3], z = List(list(1:3, 1:2, 1)))
> >
> > d1
> > ## DataFrame with 3 rows and 2 columns
> > ## x y
> > ##
> > ## 1   a 1
> > ## 2   b   1,2
> > ## 3   c 1,2,3
> >
> > That are however converted to list when merged
> >
> > merge(d1, d2, by = "x")
> > ## DataFrame with 3 rows and 3 columns
> > ## x  y  z
> > ## 
> > ## 1   a  1  1,2,3
> > ## 2   b1,21,2
> > ## 3   c  1,2,3  1
> >
> > Looking at merge,DataTable,DataTable (form with merge,DFrame,DFrame
> inherits), this makes sense given that they are converted to data.frames,
> merged with merge,data.frame,data.frame and the results is coerced back to
> DFrame:
> >
> >> getMethod("merge", c("DataTable", "DataTable"))
> > Method Definition:
> >
> > function (x, y, ...)
> > {
> >  .local <- function (x, y, by, ...)
> >  {
> >  if (is(by, "Hits")) {
> >  return(.mergeByHits(x, y, by, ...))
> >  }
> >  as(merge(as(x, "data.frame"), as(y, "data.frame"), by,
> >  ...), class(x))
> >  }
> >  .local(x, y, ...)
> > }
> > 
> > 
> >
> > Signatures:
> >  x   y
> > target  "DataTable" "DataTable"
> > defined "DataTable" "DataTable"
> >
> > I would like not to loose the *List classes in the individual DFrames.
> >
> > Am I missing something? Is this something that is on the todo list, or
> that I could help with?
> >
> > Best wishes,
> >
> > Laurent
> >
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel=DwICAg=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=TUxwEgK30pAlKpQ6SAJcnT6kPVktHlJ-9R_Al6ri-Mg=uqmel2bDfLejAXpRYsi-PFcGqjn8b6W-JmfpZDhOF7U=
> >
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Michael Lawrence
Senior Scientist, Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Resource temporarily unavailable error with rtracklayer

2020-07-20 Thread Michael Lawrence via Bioc-devel
Please raise an issue on the rtracklayer GitHub.

Thanks,
Michael

On Sun, Jul 19, 2020 at 8:20 PM Charles Plessy  wrote:
>
> On 15/07/2020 10:53, Charles Plessy wrote:
> > On 14/07/2020 19:56, Vincent Carey wrote:
> >> Can you post an example dataset that triggers the error reproducibly?
> >
> > here it is:
> >
> > https://www.dropbox.com/s/ai9itm586clt4y9/gp.Rds?dl=0
> >
> > It is a GPos object; interestingly I just realised that export.bw will
> > not crash if I convert it to GRanges first...
>
> Dear Vincent and everybody,
>
> as wrapping my GPos objects in a GRanges() call suppresses the error
> effectively, I wonder if I should just push a CAGEr update with that
> fix, or if it is more a bug in rtracklayer or GenomicRanges ?
>
> Have a nice day,
>
> Charles
>
> --
> Charles Plessy - - ~ ~ ~ ~ ~  ~ ~ ~ ~ ~ - - charles.ple...@oist.jp
> Okinawa  Institute  of  Science  and  Technology  Graduate  University
> Staff scientist in the Luscombe Unit - ~ - https://groups.oist.jp/grsu
> Toots from work - ~ ~~ ~ - https://mastodon.technology/@charles_plessy
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Senior Scientist, Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to import a setAs method in one's package?

2020-05-08 Thread Michael Lawrence via Bioc-devel
setAs() defines methods on the coerce() generic, so you could write
importMethodsFrom(pkg, coerce). However, as() searches the namespace
associated with class(from) for coerce() methods. As long as that
namespace hasn't selectively imported methods on coerce(), it should
end up using the global table and find the appropriate method.

It's best to just importFrom() a generic to get access to its global
table, or import() the entire package. So Hervé's suggestion of
import(methods) is the right way to go

Michael.

On Fri, May 8, 2020 at 2:07 AM Bhagwat, Aditya
 wrote:
>
> Dear BioC compatriots,
>
> How does one import a setAs method (which is generally not exported)?
> I in particular need to import as(., DataFrame) and am a bit puzzled on how 
> to do this.
>
> Thank you!
>
> Aditya
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] regionReport error leads to ggbio & biovizBase issues, confusingly not on all OS though

2020-04-09 Thread Michael Lawrence via Bioc-devel
I bumped the version of ggbio which should propagate a recent fix.

Michael

On Thu, Apr 9, 2020 at 7:11 AM Leonardo Collado Torres <
lcollado...@gmail.com> wrote:

> Hi BioC-devel,
>
> I was able to trace a regionReport error back to ggbio & biovizBase.
> However, I'm completely clueless as to why (a) it only appears in some
> OS (though this has changed over time) and (b) it does not seem to be
> a problem with derfinderPlot on the build reports for the same OS.
>
>
>
> ## Short version
>
> library('ggbio')
> data(hg19IdeogramCyto, package = 'biovizBase')
> p.ideo <- plotIdeogram(hg19IdeogramCyto, 'chr21')
> tracks(p.ideo)
>
> This code on linux leads to 3 warnings from plotIdeogram() and an
> error from tracks(). Also, validObject(hg19IdeogramCyto) fails (this
> also fails on macOS).
>
> One warning (about seqlengths) and validObject() can be fixed with:
>
> hg19IdeogramCytoFixed <- updateObject(hg19IdeogramCyto)
> seqlengths(hg19IdeogramCytoFixed) <-
> seqlengths(getChromInfoFromUCSC('hg19', as.Seqinfo =
> TRUE))[seqlevels(hg19IdeogramCytoFixed)]
>
>
> This code is currently run from derfinderPlot::plotCluster() and for
> some reason, the examples don't fail.
>
>
>
> ## Long version
>
> My package regionReport has been failing for a while with the same
> error on Bioc-release (3.10) and bioc-devel (3.11). I've been watching
> the fails for a while and it's mostly:
>
> * linux on 3.10, occasionally Windows
> * linux on 3.11
>
> Right now, it's only linux on both 3.10 and 3.11.
>
> The error always is:
>
> Quitting from lines 224-229 (basicExploration.Rmd)
> Error in magick_image_trim(image, fuzz) :
>   R: unable to get registry ID `cache:hosts' @
> error/registry.c/GetImageRegistry/202
> Calls: render ...  -> assert_image ->  ->
> magick_image_trim
>
>
> A while back, I reported that I couldn't reproduce this on 3.10 on
> Windows (when it was failing). I've looked at the magick package and
> it didn't seem to be the issue
> mhttps://
> github.com/ropensci/magick/search?q=magick_image_trim_q=magick_image_trim
> .
>
> After getting R 4.0.0 setup on my macOS, I also couldn't reproduce it.
> So I finally looked at Nitesh's work on dockers at
> http://bioconductor.org/help/docker/
>
>  docker run \
>  -e PASSWORD=bioc \
>  -p 8787:8787 \
>  bioconductor/bioconductor_docker:devel
>
> There I could reproduce the build problem from regionReport and got a
> more detailed error message. Using an interactive session where I ran
> the internal code piece by piece, I got to the biovizBase + ggbio
> error.
>
> derfinderPlot::plotCluster(), which is also a package I made, runs the
> exact same code and has not failed at all on both bioc-release (3.10)
> and devel (3.11) all this time. I would have expected derfinderPlot to
> fail also (in which case it would have provided a more informative
> error than the one hidden by rmarkdown in regionReport's build
> reports). If I run the example for derfinderPlot::plotCluster() on
> linux it does fail, however this was not picked up by the BioC build
> report. Hence why I'm even more confused.
>
> Anyway, hopefully fixing the ggbio/biovizBase issue will fix
> derfinderPlot::plotCluster() and in turn regionReport.
>
> As to the inconsistencies across OS or between the BioC build report
> output for derfinderPlot and what I see on linux, I have no idea
> though.
>
>
> ## Gist with details
>
> I have a Gist with all the output logs and the R session information
> at https://gist.github.com/lcolladotor/b7585962082263b377c02fa4451f27e6.
> Here's the brief description for each file.
>
> * macOS_regionReport_build.txt: runs R CMD build for regionReport
> without a problem
> * linux_regionReport_build.txt: fails to run R CMD build for
> regionReport and provides more details and the BioC build report. This
> is where we can start to see errors related to the ideograms. I
> thought that fixing this would only involve a updateObject() call at
> this point.
> * reprex_ggbio_code.R: smallest code I have for reproducing this issue
> * macOS_reprex_ggbio_output.txt: runs without errors (though it still
> has warnings about seqlengths and other ggbio parts)
> * linux_reprex_ggbio_output.txt: fails to run the small reprex
> * reprex_derfinderPlot_code.R: runs the example for
> derfinderPlot::plotCluster()
> * macOS_reprex_derfinderPlot_output.txt: similar story to
> macOS_reprex_ggbio_output.txt:
> * linux_reprex_derfinderPlot_output.txt: the example fails here too!
> Which makes me even more confused. I mean, it's consistent with
> regionReport and the ggbio reprex. However, I then don't understand
> why this hasn't failed in the BioC builds.
>
>
>
> Best,
> Leo
>
>
> Leonardo Collado Torres, Ph. D., Research Scientist
> Lieber Institute for Brain Development
> 855 N Wolfe St, Suite 300
> Baltimore, MD 21205
> http://lcolladotor.github.io
>
>
> On Tue, Mar 17, 2020 at 6:35 PM Leonardo Collado Torres
>  wrote:
> >
> > Hi BioC-devel,
> >
> > I spent some time yesterday getting 

Re: [Bioc-devel] Installing libsbml for rsbml on a mac?

2020-04-03 Thread Michael Lawrence via Bioc-devel
Does this work for you?

https://github.com/mythosil/homebrew-libsbml

Michael

On Fri, Apr 3, 2020 at 2:20 AM Martin Morgan 
wrote:

> The INSTALL file for rsbml says
>
>   ~/b/git/rsbml master$ cat INSTALL
>   ...
>
>   To build rsbml on the Mac, first install libsbml with:
>   `brew install homebrew/science/libsbml`.
>
> but
>
>   ~/b/git/rsbml master$ brew install homebrew/science/libsbml
>   Error: homebrew/science was deprecated. This tap is now empty as all its
> formulae were
>   migrated.
>
> If I download and build from source
>
>   curl -O
> https://s3.amazonaws.com/linux-provisioning/libSBML-5.10.2-core-src.tar.gz
>   tar xzf libSBML-5.10.2-core-src.tar.gz
>   ./configure --enable-layout
>   make
>   make install
>
> when I install rsbml things go well enough until
>
>   ** testing if installed package can be loaded from temporary location
>   Error: package or namespace load failed for ‘rsbml’:
>.onLoad failed in loadNamespace() for 'rsbml', details:
> call: dyn.load(file, DLLpath = DLLpath, ...)
> error: unable to load shared object
>  
> '/Users/ma38727/Library/Caches/pkgserver/library/78c7d37adb9cb6df9d8f1ec10fed882b/00LOCK-rsbml/00new/rsbml/libs/rsbml.so':
>
> dlopen(/Users/ma38727/Library/Caches/pkgserver/library/78c7d37adb9cb6df9d8f1ec10fed882b/00LOCK-rsbml/00new/rsbml/libs/rsbml.so,
> 6): Symbol not found: _ASTNode_isName
>   Referenced from: /usr/local/lib/libsbml.5.dylib
>   Expected in: flat namespace
>  in /usr/local/lib/libsbml.5.dylib
>  Error: loading failed
>  Execution halted
>
> Any hints?
>
> Thanks, Martin
>


-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Question regarding the error from build report

2020-03-27 Thread Michael Lawrence via Bioc-devel
Try resending the email as plain text. The list strips HTML content,
so if the message has no text part, there will be no message.

On Thu, Mar 26, 2020 at 9:35 PM Hervé Pagès  wrote:
>
> What is the question?
>
> On 3/26/20 17:36, 유도영 wrote:
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel=DwICAg=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=yOswU0969aOLHa3jBIrzVj8fAiukIOoj9_7XnCcbRqI=zbtkDPRVf5duX0qU9xSZy3MqdeBSIo3Bpl7L6zEdNBE=
> >
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] proper way to define an S4 method for 'plot'

2020-03-16 Thread Michael Lawrence via Bioc-devel
At least with 2020-03-16 r77987 (Mac OS) I wasn't able to reproduce this.
Btw, there's no real need for an S4 method in this case. An S3 method would
work just as well.

On Mon, Mar 16, 2020 at 4:12 PM Henrik Bengtsson 
wrote:

> Maybe it's related to:
>
> * The plot() S3 generic function is now in package base rather than
> package graphics, as it is reasonable to have methods that do not use
> the graphics package. The generic is currently re-exported from the
> graphics namespace to allow packages importing it from there to
> continue working, but this may change in future.
>
> mentioned in the R 4.0.0 NEWS
> (https://cran.r-project.org/doc/manuals/r-devel/NEWS.html)?
>
> /Henrik
>
> On Mon, Mar 16, 2020 at 3:52 PM Vincent Carey
>  wrote:
> >
> > I just updated my R and I am getting into trouble with MLInterfaces
> > maintenance.
> >
> > > BiocManager::install("MLInterfaces")
> >
> > *Bioconductor version 3.11 (BiocManager 1.30.10), R Under development
> > (unstable)*
> >
> > *  (2020-03-15 r77975)*
> >
> > *Installing package(s) 'MLInterfaces'*
> >
> > *Warning: unable to access index for repository
> > https://cran.rstudio.com/bin/macosx/el-capitan/contrib/4.0
> > :*
> >
> > *  cannot open URL
> > 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/4.0/PACKAGES
> > '*
> >
> >
> >   There is a binary version available but the source version is later:
> >
> >  binary source needs_compilation
> >
> > MLInterfaces 1.67.2 1.67.5 FALSE
> >
> >
> > *installing the source package ‘MLInterfaces’*
> >
> >
> > *trying URL
> > '
> https://bioconductor.org/packages/3.11/bioc/src/contrib/MLInterfaces_1.67.5.tar.gz
> > <
> https://bioconductor.org/packages/3.11/bioc/src/contrib/MLInterfaces_1.67.5.tar.gz
> >'*
> >
> > *Content type 'application/x-gzip' length 1071876 bytes (1.0 MB)*
> >
> > *==*
> >
> > *downloaded 1.0 MB*
> >
> >
> > * installing *source* package ‘MLInterfaces’ ...
> >
> > ** using staged installation
> >
> > ** R
> >
> > ** inst
> >
> > ** byte-compile and prepare package for lazy loading
> >
> > Error in getGeneric(f, TRUE, envir, package) :
> >
> >   no generic function found for ‘plot’
> >
> > Calls:  ... namespaceImportFrom -> .mergeImportMethods ->
> >  -> getGeneric
> >
> > recover called non-interactively; frames dumped, use debugger() to view
> >
> > ** help
> >
> > *** installing help indices
> >
> > ** building package indices
> >
> > ** installing vignettes
> >
> > ** testing if installed package can be loaded from temporary location
> >
> > Error: package or namespace load failed for ‘MLInterfaces’ in
> getGeneric(f,
> > TRUE, envir, package):
> >
> >  no generic function found for ‘plot’
> >
> >
> > ...
> >
> >
> > > sessionInfo()
> >
> > R Under development (unstable) (2020-03-15 r77975)
> >
> > Platform: x86_64-apple-darwin15.6.0 (64-bit)
> >
> > Running under: macOS Mojave 10.14.6
> >
> >
> > Matrix products: default
> >
> > BLAS:
> >
> /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.0.dylib
> >
> > LAPACK:
> >
> /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
> >
> >
> > locale:
> >
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> >
> > attached base packages:
> >
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> >
> > other attached packages:
> >
> > [1] rmarkdown_2.1
> >
> >
> > loaded via a namespace (and not attached):
> >
> >  [1] BiocManager_1.30.10 compiler_4.0.0  startup_0.14.0
> >
> >  [4] tools_4.0.0 htmltools_0.4.0 Rcpp_1.0.3
> >
> >  [7] codetools_0.2-16knitr_1.28  xfun_0.12
> >
> > [10] digest_0.6.25   rlang_0.4.5 evaluate_0.14
> >
> > --
> > The information in this e-mail is intended only for th...{{dropped:6}}
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods

2020-02-06 Thread Michael Lawrence via Bioc-devel
I would urge you to make the package _directly_ compatible with
standard Bioconductor data structures; no explicit conversion. But you
can create wrapper methods (even on an S3 generic) that perform the
conversion automatically. You'll probably want two separate APIs
though (in different styles), for one thing automatic conversion is
obviously not possible for return values.

Michael

On Thu, Feb 6, 2020 at 5:34 PM stefano  wrote:
>
> Thanks Michael,
>
> yes in a sense, ttBulk and SummariseExperiment can be considere as two 
> interfaces. Would be fair enough to create a function that convert from one 
> to the other, although the default would be ttBulk?
>
> > I'm not sure the tidyverse is a great answer to the user interface, because 
> > it lacks domain semantics
>
> Would be fair to say that ttBulk class could be considered a tibble with 
> specific semantics? In the sense that it holds information about key column 
> names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and 
> has a validator (that is triggered at every ttBulk function).
>
> I think at the moment, given (i) S3 problem, and (ii) the lack of formal 
> foundation on SummaisedExperiment interface (that maybe would require an S4 
> technology itself, where SummariseExperiment could be a slot?) my package 
> would belong more to CRAN, until those two issues will have been resolved.
>
> I imagine there are not many cases where a CRAN package migrated to 
> Bioconductor after complying with the ecosystem policies.
>
> Thanks a lot.
>
> Best wishes.
>
> Stefano
>
>
>
> Stefano Mangiola | Postdoctoral fellow
>
> Papenfuss Laboratory
>
> The Walter Eliza Hall Institute of Medical Research
>
> +61 (0)466452544
>
>
>
> Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence 
>  ha scritto:
>>
>> There's a difference between implementing software, where one wants
>> formal data structures, and providing a convenient user interface.
>> Software needs to interface with other software, so a package could
>> provide both types of interfaces, one based on rich (S4) data
>> structures, another on simpler structures with an API more amenable to
>> analysis. I'm not sure the tidyverse is a great answer to the user
>> interface, because it lacks domain semantics. This is still an active
>> area of research (see Stuart Lee's plyranges, for example). I hope you
>> can find a reasonable compromise that enables you to integrate ttBulk
>> into Bioconductor, so that it can take advantage of the synergies the
>> ecosystem provides.
>>
>> PS: There is no simple fix for your example.
>>
>> Michael
>>
>> On Thu, Feb 6, 2020 at 4:12 PM stefano  wrote:
>> >
>> > Thanks a lot for your comment Martin and Michael,
>> >
>> > Here I reply to Marti's comment. Michael I will try to implement your
>> > solution!
>> >
>> > I think a key point from
>> > https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
>> > (that I was under-looking) is
>> >
>> > *>> "So to sum up: if you submit a package to Bioconductor, there is an
>> > expectation that your package can work seamlessly with other Bioconductor
>> > packages, and your implementation should support that. The safest and
>> > easiest way to do that is to use Bioconductor data structures"*
>> >
>> > In this case my package would not be suited as I do not use pre-existing
>> > Bioconductor data structures, but instead i see value in using a simple
>> > tibble, for the reasons in part explained in the README
>> > https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse
>> > and friends for bulk transcriptomic analyses).
>> >
>> > *>> "with the minimum standard of being able to accept such objects even if
>> > you do not rely on them internally (though you should)"*
>> >
>> > With this I can comply in the sense that I can built converters to and from
>> > SummarizedExperiment (for example).
>> >
>> > * >> "If you don't want to do that, then that's a shame, but it would
>> > suggest that Bioconductor would not be the right place to host this
>> > package."*
>> >
>> > Well said.
>> >
>> > In summary, I do not rely on Bioconductor data structure, as I am proposing
>> > another paradigm, but my back end is made of largely Bioconductor analysis
>> > packages that I would like to interface with tidyverse. So
>> >
>> > 1) Should I build converters to Bioc. data structures, and force the use of
>> > S3 object (needed to tiidyverse to work), or
>> > 2) Submit to CRAN
>> >
>> > I don't have strong feeling for either, although I think Bioconductor would
>> > be a good fit. Please community give me your honest opinions, I will take
>> > them seriously and proceed.
>> >
>> >
>> >
>> > Best wishes.
>> >
>> > *Stefano *
>> >
>> >
>> >
>> > Stefano Mangiola | Postdoctoral fellow
>> >
>> > Papenfuss Laboratory
>> >
>> > The Walter Eliza Hall Institute of Medical Research
>> >
>> > +61 (0)466452544
>> >
>> >
>> > Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan <
>> > 

Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods

2020-02-06 Thread Michael Lawrence via Bioc-devel
There's a difference between implementing software, where one wants
formal data structures, and providing a convenient user interface.
Software needs to interface with other software, so a package could
provide both types of interfaces, one based on rich (S4) data
structures, another on simpler structures with an API more amenable to
analysis. I'm not sure the tidyverse is a great answer to the user
interface, because it lacks domain semantics. This is still an active
area of research (see Stuart Lee's plyranges, for example). I hope you
can find a reasonable compromise that enables you to integrate ttBulk
into Bioconductor, so that it can take advantage of the synergies the
ecosystem provides.

PS: There is no simple fix for your example.

Michael

On Thu, Feb 6, 2020 at 4:12 PM stefano  wrote:
>
> Thanks a lot for your comment Martin and Michael,
>
> Here I reply to Marti's comment. Michael I will try to implement your
> solution!
>
> I think a key point from
> https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
> (that I was under-looking) is
>
> *>> "So to sum up: if you submit a package to Bioconductor, there is an
> expectation that your package can work seamlessly with other Bioconductor
> packages, and your implementation should support that. The safest and
> easiest way to do that is to use Bioconductor data structures"*
>
> In this case my package would not be suited as I do not use pre-existing
> Bioconductor data structures, but instead i see value in using a simple
> tibble, for the reasons in part explained in the README
> https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse
> and friends for bulk transcriptomic analyses).
>
> *>> "with the minimum standard of being able to accept such objects even if
> you do not rely on them internally (though you should)"*
>
> With this I can comply in the sense that I can built converters to and from
> SummarizedExperiment (for example).
>
> * >> "If you don't want to do that, then that's a shame, but it would
> suggest that Bioconductor would not be the right place to host this
> package."*
>
> Well said.
>
> In summary, I do not rely on Bioconductor data structure, as I am proposing
> another paradigm, but my back end is made of largely Bioconductor analysis
> packages that I would like to interface with tidyverse. So
>
> 1) Should I build converters to Bioc. data structures, and force the use of
> S3 object (needed to tiidyverse to work), or
> 2) Submit to CRAN
>
> I don't have strong feeling for either, although I think Bioconductor would
> be a good fit. Please community give me your honest opinions, I will take
> them seriously and proceed.
>
>
>
> Best wishes.
>
> *Stefano *
>
>
>
> Stefano Mangiola | Postdoctoral fellow
>
> Papenfuss Laboratory
>
> The Walter Eliza Hall Institute of Medical Research
>
> +61 (0)466452544
>
>
> Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan <
> mtmorgan.b...@gmail.com> ha scritto:
>
> > The idea isn't to use S4 at any cost, but to 'play well' with the
> > Bioconductor ecosystem, including writing robust and maintainable code.
> >
> > This comment
> > https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
> > provides some motivation; there was also an interesting exchange on the
> > Bioconductor community slack about this (join at
> > https://bioc-community.herokuapp.com/; discussion starting with
> > https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800).
> > The plyranges package http://bioconductor.org/packages/plyranges and
> > recently accepted fluentGenomics workflow
> > https://github.com/Bioconductor/Contributions/issues/1350 provide
> > illustrations.
> >
> > In your domain it's really surprising that your package does not use
> > (Import or Depend on) SummarizedExperiment or GenomicRanges packages. From
> > a superficial look at your package, it seems like something like
> > `reduce_dimensions()` could be defined to take & return a
> > SummarizedExperiment and hence benefit from some of the points in the
> > github issue comment mentioned above.
> >
> > Certainly there is a useful transition, both 'on the way in' to a
> > SummarizedExperiment, and after leaving the more specialized bioinformatic
> > computations to, e.g., display a pairs plot of the reduced dimensions,
> > where one might re-shape the data to a tidy format and use 'plain old'
> > tibbles; the fluentGenomics workflow might provide some guidance.
> >
> > At the end of the day it would not be surprising for Bioconductor packages
> > to make use of tidy concepts and data structures, particularly in the
> > vignette, and it would be a mistake for Bioconductor to exclude
> > well-motivated 'tidy' representations.
> >
> > Martin Morgan
> >
> > On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" <
> > bioc-devel-boun...@r-project.org on behalf of mangiolastef...@gmail.com>
> > wrote:
> >
> > Hello,
> >
> > I have a package (ttBulk) under 

Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods

2020-02-06 Thread Michael Lawrence via Bioc-devel
Martin's comments are great.

I'll just give some technical help. The "tbl_df" S4 class is already
defined by dplyr (and more properly), so omit the call to
setOldClass(). Then, things work a bit better.

> my %>% nest(-b) # had to fix this from your example
[some ugly result]
Warning messages:
1: In class(x) <- c(subclass, tibble_class) :
  Setting class(x) to multiple strings ("tbl_df", "tbl", ...); result
will no longer be an S4 object
2: In split.default(data[nest_vars], idx) :
  data length is not a multiple of split variable

I would argue that this is an issue with the tidyverse. They provide
an S4 class, which should in principle work, because S4 is compatible
enough with S3, but the use of class<-() breaks it. There may be a way
to make class()<- work in such cases. I will think about it.

The right way to do it with S4 would be to just call initialize(x,
...) in new_tibble(). They have to implement the initialize() logic
themselves using update_tibble_attrs(). S4 gives you that for free.

And there are more issues but I think this is a good example of the
difficulties.

Michael

On Thu, Feb 6, 2020 at 2:46 PM stefano  wrote:
>
> Hello,
>
> I have a package (ttBulk) under review. I have been told to replace the S3
> system to S4. My package is based on the class tbl_df and must be fully
> compatible with tidyverse methods (inheritance). After some tests and
> research I understood that tidyverse ecosystem is not compatible with S4
> classes.
>
>  For example, several methos do not apparently handle S4 objects based on
> S3 tbl_df
>
> ```library(tidyverse)setOldClass("tbl_df")
> setClass("test2", contains = "tbl_df")
> my <- new("test2",  tibble(a = 1))
> my %>%  mutate(b = 3)
>
>a b
> 1 1 3
> ```
>
>  ```my <- new("test2",  tibble(a = rnorm(100), b = 1))
> my %>% nest(data = -b)
> Error: `x` must be a vector, not a `test2` object
> Run `rlang::last_error()` to see where the error occurred.
> ```
>
> Could you please advise whether a tidyverse based package can be hosted on
> Bioconductor, and if S4 classes are really mandatory? I need to understand
> if I am forced to submit to CRAN instead (although Bioconductor would be a
> good fit, sice I try to interface transcriptional analysis tools to tidy
> universe)
>
>
> Thanks a lot.
> Stefano
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Compatibility of S4 and tidyverse

2020-02-05 Thread Michael Lawrence via Bioc-devel
Yep that about sums it up.

On Wed, Feb 5, 2020 at 8:37 PM Stefano Mangiola 
wrote:

>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-30 Thread Michael Lawrence via Bioc-devel
That sucks. It was broken since it was added in 2017... now fixed.

On Thu, Jan 30, 2020 at 11:20 AM Pages, Herve  wrote:
>
> On 1/30/20 11:10, Hervé Pagès wrote:
> > Yes poverlaps() is a good option, as mentioned earlier.
>
> Well actually not. Looks like it's broken:
>
>  > poverlaps(GRanges("chr1:11-15"), GRanges("chr1:16-20"))
> Error in isSingleNumber(minoverlap) : object 'minoverlaps' not found
>
> H.



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-30 Thread Michael Lawrence via Bioc-devel
What happened to just poverlaps()?

On Thu, Jan 30, 2020 at 10:34 AM Pages, Herve  wrote:
>
> On 1/29/20 23:31, web working wrote:
> > Hi Herve,
> >
> > Thank you for your answer. pcompare works fine for me. Here my solution:
> >
> > query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
> > subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))
> > out <- vector("numeric", length(query))
> > out[(which(abs(pcompare(query, subject))<5))] <- 1
> > out
>
> Why not just
>
>out <- abs(pcompare(query, subject)) < 5
>
> In any case you should use integer instead of numeric (twice more
> compact in memory).
>
> H.
>
> >
> > Carey was right that this here is off list. Next time I will pose my
> > question on support.bioconductor.org
> > <https://urldefense.proofpoint.com/v2/url?u=http-3A__support.bioconductor.org=DwMCaQ=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=VZ2-jg7W_Ctrav2BpPVUPpvJlyISX3QVwFAzTnDnNTs=Jdmp3dD6ubzPdE8KjFJ3urOav62YTOmxYcYZ4000MY8=>.
> >
> > Best,
> >
> > Tobias
> >
> > Am 29.01.20 um 18:02 schrieb Pages, Herve:
> >> Yes poverlaps().
> >>
> >> Or pcompare(), which should be even faster. But only if you are not
> >> afraid to go low-level. See ?rangeComparisonCodeToLetter for the meaning
> >> of the codes returned by pcompare().
> >>
> >> H.
> >>
> >> On 1/29/20 08:01, Michael Lawrence via Bioc-devel wrote:
> >>> poverlaps()?
> >>>
> >>> On Wed, Jan 29, 2020 at 7:50 AM web working  wrote:
> >>>> Hello,
> >>>>
> >>>> I have two big GRanges objects and want to search for an overlap of  the
> >>>> first range of query with the first range of subject. Then take the
> >>>> second range of query and compare it with the second range of subject
> >>>> and so on. Here an example of my problem:
> >>>>
> >>>> # GRanges objects
> >>>> query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10,
> >>>> 22)), id=1:4)
> >>>> subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2,
> >>>> 21)), id=1:4)
> >>>>
> >>>> # The 2 overlaps at the first position should not be counted, because
> >>>> these ranges are at different rows.
> >>>> countOverlaps(query, subject)
> >>>>
> >>>> # Approach 1 (bad style. I have simplified it to understand)
> >>>> dat <- as.data.frame(findOverlaps(query, subject))
> >>>> indexDat <- apply(dat, 1, function(x) x[1]==x[2])
> >>>> indexBool <- dat[indexDat,1]
> >>>> out <- rep(FALSE, length(query))
> >>>> out[indexBool] <- TRUE
> >>>> as.numeric(out)
> >>>>
> >>>> # Approach 2 (bad style and takes too long)
> >>>> out <- vector("numeric", 4)
> >>>> for(i in seq_along(query)) out[i] <- (overlapsAny(query[i], subject[i]))
> >>>> out
> >>>>
> >>>> # Approach 3 (wrong results)
> >>>> as.numeric(overlapsAny(query, subject))
> >>>> as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))
> >>>>
> >>>>
> >>>> Maybe someone has an idea to speed this up?
> >>>>
> >>>>
> >>>> Best,
> >>>>
> >>>> Tobias
> >>>>
> >>>> ___
> >>>> Bioc-devel@r-project.org  mailing list
> >>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel=DwICAg=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=FSrHBK59_OMc6EbEtcPhkTVO0cfDgSbQBGFOXWyHhjc=3tZpvRAw7T5dP21u32TRTf4lZ4QFLtmkouKR7TUlJws=
> >>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Michael Lawrence via Bioc-devel
poverlaps()?

On Wed, Jan 29, 2020 at 7:50 AM web working  wrote:
>
> Hello,
>
> I have two big GRanges objects and want to search for an overlap of  the
> first range of query with the first range of subject. Then take the
> second range of query and compare it with the second range of subject
> and so on. Here an example of my problem:
>
> # GRanges objects
> query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10,
> 22)), id=1:4)
> subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2,
> 21)), id=1:4)
>
> # The 2 overlaps at the first position should not be counted, because
> these ranges are at different rows.
> countOverlaps(query, subject)
>
> # Approach 1 (bad style. I have simplified it to understand)
> dat <- as.data.frame(findOverlaps(query, subject))
> indexDat <- apply(dat, 1, function(x) x[1]==x[2])
> indexBool <- dat[indexDat,1]
> out <- rep(FALSE, length(query))
> out[indexBool] <- TRUE
> as.numeric(out)
>
> # Approach 2 (bad style and takes too long)
> out <- vector("numeric", 4)
> for(i in seq_along(query)) out[i] <- (overlapsAny(query[i], subject[i]))
> out
>
> # Approach 3 (wrong results)
> as.numeric(overlapsAny(query, subject))
> as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))
>
>
> Maybe someone has an idea to speed this up?
>
>
> Best,
>
> Tobias
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] rtracklayer::import.bed(genome = Seqinfo)

2019-10-23 Thread Michael Lawrence via Bioc-devel
Hi Aditya,

This is probably best asked on the support site. When you post there,
please include your sessionInfo().

Thanks,
Michael

On Tue, Oct 22, 2019 at 9:50 AM Bhagwat, Aditya
 wrote:
>
> Dear Michael,
>
> Sorry for my incomplete email - the send button got hit too fast. Better this 
> time.
>
> rtracklayer::import.bed mentions the argument "genome" to be either a genome 
> identifier (like 'mm10') or a Seqinfo object.
>
> I notice that the second option does not work on my BED file (in attach).
>
> # This works
> rtracklayer::import.bed('SRF.bed', genome = 'mm10') # this works
>
> # But this doesn't
> seqinfo1<- 
> GenomeInfoDb::seqinfo(BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10)
> rtracklayer::import.bed('SRF.bed', genome = seqinfo1)
>
> So I am requesting feedback.
> I thought to use this channel
>
> Aditya



-- 
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] plyranges group_by

2019-10-17 Thread Michael Lawrence via Bioc-devel
I replied on the support site. Let's move the discussion there.

On Thu, Oct 17, 2019 at 1:24 AM Bhagwat, Aditya <
aditya.bhag...@mpi-bn.mpg.de> wrote:

> Thank you Stuart and Michael for your feedback.
>
> Stuart, in response to your request for more context regarding my use
> case, I have updated my recent BioC support post
> , now providing all use-case
> details.
>
> Michael, I didn't selfmatch yet, but Stuart's reply seems to suggest that
> it would not get the data.table performance (which is literally
> instantaneous).
>
> As a general question, do you think it would be useful to add a
> data.table-based split-apply-combine functionality to plyranges (such that
> end user operations remain on GRanges-only)? I wouldn't mind writing a
> function to do that (in github), but first need your feedback as to whether
> you think that would be useful :-)
>
> Aditya
>
>
> --
> *From:* Stuart Lee [le...@wehi.edu.au]
> *Sent:* Thursday, October 17, 2019 3:01 AM
> *To:* Michael Lawrence
> *Cc:* Bhagwat, Aditya; bioc-devel@r-project.org
> *Subject:* Re: plyranges group_by
>
> Currently, the way grouping indices are generated is pretty slow if you’re
> doing stuff rowwise. Michael’s suggestion for using selfmatch should speed
> things up a bit. What are you planning to do after grouping? I’ve found
> there’s usually to do stuff without rowwise grouping but really depends on
> what you’re after. Re your other issue would you mind putting it on as a
> GitHub issue.
> —
> Stuart Lee
> Visiting PhD Student - Ritchie Lab
>
>
>
> On 16 Oct 2019, at 22:54, Michael Lawrence 
> wrote:
>
> Just a note that in this particular case, selfmatch(annotatedsrf) would
> be a fast way to generate a grouping vector, like
> plyranges::group_by(annotatedsrf, selfmatch(annotatedsrf)).
>
> Michael
>
> On Wed, Oct 16, 2019 at 2:48 AM Bhagwat, Aditya <
> aditya.bhag...@mpi-bn.mpg.de> wrote:
>
>> Hi Stuart, Michael,
>>
>> Your plyranges package is really cool - now I am using it for left
>> joining GRanges (I am facing a minor issue there
>> , but that is not the topic
>> of this email - I have been asked by Lori not to double-post :-)).
>>
>> This email is about the plyranges functionality for grouping GRanges.
>> That is cool, but I found it to be not so performant for large numbers of
>> ranges.
>> My R session hangs when I do:
>>
>> bedfile <- paste0('
>> https://gitlab.gwdg.de/loosolab/software/multicrispr/wikis',
>>   '/uploads/a51e98516c1e6b71441f5b5a5f741fa1/SRF.bed')
>> srfranges <- rtracklayer::import.bed(bedfile, genome = 'mm10')
>> txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
>> generanges <- GenomicFeatures::genes(txdb)
>> annotatedsrf <- plyranges::join_overlap_left(srfranges, generanges)
>> plyranges::group_by(annotatedsrf, seqnames, start, end, strand)
>>
>> For my purposes, I worked around it by performing a groupby in data.table:
>>
>> data.table::as.data.table(annotatedsrf)[
>> !is.na(gene_id),
>> gene_id := paste0(gene_id, collapse = ';'),
>> by = c('seqnames', 'start', 'end', 'strand'))
>>
>> And was wondering, in general, whether it would be useful to have a
>> data.table-based backend for plyranges::groupby()
>> And, whether all of this is actually a on-issue due to my improper use
>> of plyranges::group_by properly.
>>
>> Thank you for feebdack :-)
>>
>> Aditya
>>
>>
>>
>
> --
> Michael Lawrence
> Scientist, Bioinformatics and Computational Biology
> Genentech, A Member of the Roche Group
> Office +1 (650) 225-7760
> micha...@gene.com
>
> Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
>
>
> ___
>
> The information in this email is confidential and inte...{{dropped:26}}

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] plyranges group_by

2019-10-16 Thread Michael Lawrence via Bioc-devel
Just a note that in this particular case, selfmatch(annotatedsrf) would be
a fast way to generate a grouping vector, like
plyranges::group_by(annotatedsrf, selfmatch(annotatedsrf)).

Michael

On Wed, Oct 16, 2019 at 2:48 AM Bhagwat, Aditya <
aditya.bhag...@mpi-bn.mpg.de> wrote:

> Hi Stuart, Michael,
>
> Your plyranges package is really cool - now I am using it for left joining
> GRanges (I am facing a minor issue there
> , but that is not the topic
> of this email - I have been asked by Lori not to double-post :-)).
>
> This email is about the plyranges functionality for grouping GRanges.
> That is cool, but I found it to be not so performant for large numbers of
> ranges.
> My R session hangs when I do:
>
> bedfile <- paste0('
> https://gitlab.gwdg.de/loosolab/software/multicrispr/wikis',
>   '/uploads/a51e98516c1e6b71441f5b5a5f741fa1/SRF.bed')
> srfranges <- rtracklayer::import.bed(bedfile, genome = 'mm10')
> txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
> generanges <- GenomicFeatures::genes(txdb)
> annotatedsrf <- plyranges::join_overlap_left(srfranges, generanges)
> plyranges::group_by(annotatedsrf, seqnames, start, end, strand)
>
> For my purposes, I worked around it by performing a groupby in data.table:
>
> data.table::as.data.table(annotatedsrf)[
> !is.na(gene_id),
> gene_id := paste0(gene_id, collapse = ';'),
> by = c('seqnames', 'start', 'end', 'strand'))
>
> And was wondering, in general, whether it would be useful to have a
> data.table-based backend for plyranges::groupby()
> And, whether all of this is actually a on-issue due to my improper use of
> plyranges::group_by properly.
>
> Thank you for feebdack :-)
>
> Aditya
>
>
>

-- 
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] read_bed()

2019-09-18 Thread Michael Lawrence via Bioc-devel
I'd suggest separating the operations on the data from the interface.
You can have both, one layer for programming and another for
interactive analysis.

On Wed, Sep 18, 2019 at 5:05 AM Bhagwat, Aditya
 wrote:
>
> In the end I endeavour to end up with a handful of verbs, with which I can do 
> all tasks in a project.
>
> Regarding the BED files: they're basic bed files, with additional metadata 
> columns to allow traceback. But for the purpose of multicrispr, non need to 
> restrict to those files only. You extraCols works great for me. And for 
> multicrispr examples, I have removed the metadata cols to keep things simple. 
> You were right btw that things went wrong earlier in the column stripping 
> process.
>
> Aditya
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 1:57 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Hi Michael,
>
> That's a software design dilemma I've encountered a few times.
>
> One approach is to keep the "verb" functions bare. E.g. read_bed would only 
> read a bedfile, and plot_bed would somehow plot it. Advantage: if read_bed 
> doesn't depend on anything else, other functions can depend on it, which 
> makes dependency handling easier.
>
> Another intention is to make verb functions "intuitive". In that scenario, I 
> try for each operation to also output a visual image of the operation, to 
> make it easier to see at a glance what is happening. E.g. for the range 
> operations in multicrispr, the function plot_intervals visually shows what 
> operation is being performed, making it easier to both spot errors as well as 
> maintain focus.
>
> In the case of read_bed, I thought of wrapping around your excellent 
> core-level rtracklayer::import(), additionally providing the textual and 
> visual feedback which I intent to give.
>
> Interesting to hear your suggestions on this topic, though.
>
> Aditya
>
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Wednesday, September 18, 2019 1:33 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I'm not sure if a function called read_bed() should be plotting or
> printing. Is your BED file a known BED variant, i.e., maybe there is a
> better name for the file type than "bed"?
>
>
> On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya
>  wrote:
> >
> > Actually,
> >
> > I will keep multicrispr::read_bed(), but wrap it around 
> > rtracklayer::import.bed, and additionally plot and print range summaries.
> >
> > Aditya
> >
> > 
> > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> > Aditya [aditya.bhag...@mpi-bn.mpg.de]
> > Sent: Wednesday, September 18, 2019 11:31 AM
> > To: Michael Lawrence
> > Cc: bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > (Typo corrected to avoid confusion)
> >
> > Michael,
> >
> > rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
> > multicrispr::read_bed().
> >
> > In order to avoid the overkill of `require(tracklayer)` for multicrispr 
> >  users, does it make 
> > sense to import/re-export import.bed() in multicrispr? What is BioC 
> > convention/best practice in such cases?
> >
> > Aditya
> >
> >
> >
> > 
> > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> > Aditya [aditya.bhag...@mpi-bn.mpg.de]
> > Sent: Wednesday, September 18, 2019 8:35 AM
> > To: Michael Lawrence
> > Cc: bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > Thank you Michael :-)
> >
> > Aditya
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 8:49 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > I think you probably made a mistake when dropping the columns. When I
> > provide the extraCols= argument (inventing my own names for things),
> > it almost works, but it breaks due to NAs in the extra columns. The
> > "." character is the standard way to express NA in BED files. I've
> > added support for extra na.strings to version 1.45.6.
> >
> > For reference, the call is like:
> >
> > import("SRF.bed", extraCols=c(chr2="character", start2="integer",
> > end2="integer", mDux="factor", type="factor", pos1="integer",
> > pos2="integer", strand2="factor", from="factor", n="integer",
> > code="character", anno="factor", id="character", biotype="character",
> > score2="numeric" ), na.strings="NA")
> >
> >
> > On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Michael,
> > >
> > > I 

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Michael Lawrence via Bioc-devel
I'm not sure if a function called read_bed() should be plotting or
printing. Is your BED file a known BED variant, i.e., maybe there is a
better name for the file type than "bed"?


On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya
 wrote:
>
> Actually,
>
> I will keep multicrispr::read_bed(), but wrap it around 
> rtracklayer::import.bed, and additionally plot and print range summaries.
>
> Aditya
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 11:31 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> (Typo corrected to avoid confusion)
>
> Michael,
>
> rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
> multicrispr::read_bed().
>
> In order to avoid the overkill of `require(tracklayer)` for multicrispr 
>  users, does it make 
> sense to import/re-export import.bed() in multicrispr? What is BioC 
> convention/best practice in such cases?
>
> Aditya
>
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 8:35 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Thank you Michael :-)
>
> Aditya
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 8:49 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I think you probably made a mistake when dropping the columns. When I
> provide the extraCols= argument (inventing my own names for things),
> it almost works, but it breaks due to NAs in the extra columns. The
> "." character is the standard way to express NA in BED files. I've
> added support for extra na.strings to version 1.45.6.
>
> For reference, the call is like:
>
> import("SRF.bed", extraCols=c(chr2="character", start2="integer",
> end2="integer", mDux="factor", type="factor", pos1="integer",
> pos2="integer", strand2="factor", from="factor", n="integer",
> code="character", anno="factor", id="character", biotype="character",
> score2="numeric" ), na.strings="NA")
>
>
> On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > I removed the additional metadata columns in SRF.bed
> > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> >
> > But still can't get rtracklayer::import.bed working:
> >
> > > rtracklayer::import.bed(bedfile)
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec, : scan() expected 'a real', got '1.168.595'
> > > bedfile
> > [1] 
> > "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
> >
> > Never mind, multicrispr function read_bed, based on data.table::fread is 
> > doing the job, so I will stick to that .
> >
> > Thank you for all feedback,
> >
> > Cheers,
> >
> > Aditya
> >
> >
> > 
> > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> > Aditya [aditya.bhag...@mpi-bn.mpg.de]
> > Sent: Tuesday, September 17, 2019 2:48 PM
> > To: Michael Lawrence
> > Cc: bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > Oh :-) - Thankyou for explaining!
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:40 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > Having a "." in the function name does not make something "S3".
> > There's no dispatch from import() to import.bed(). Had I not been a
> > total newb when I created rtracklayer, I would have called the
> > function importBed() or something like that. Sorry for the confusion.
> >
> > On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Oh, superb, thx!
> > >
> > > Interesting ... here you use S3 rather than S4 - I wonder the design 
> > > intention underlying these choices (I'm asking because I am trying to 
> > > figure out myself when to use S3 and when to use S4 and whether to mix 
> > > the two).
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 2:23 PM
> > > To: Bhagwat, Aditya
> > > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > The generic documentation does not mention it, but see ?import.bed.
> > > It's similar to colClasses on read.table().
> > >
> > > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> > >  wrote:
> > > >
> > > > 

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files.  I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that .
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thankyou for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Thankyou Michael,
> > >
> > > How do I use the extraCols argument? The documentation does not mention 
> > > an `extraCols` argument explicitly, so it must be one of the ellipsis 
> > > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > > extraCols = 10 (ten extra columns) or so?
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 2:05 PM
> > > To: Bhagwat, Aditya
> > > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > It breaks it because it's not standard BED; however, using the
> > > extraCols= argument should work in this case. Requiring an explicit
> > > format specification is intentional, because it provides validation
> > > and type safety, and it communicates the format to a future reader.
> > > This also looks a bit like a bedPE file, so you might consider using
> > > the Pairs data structure.
> > >
> > > Michael
> > >
> > > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> > >  wrote:
> > > >
> > > > Hi Michael,
> > > >
> > > > Yeah, I also noticed that the attachment was eaten when it entered the 
> > > > bio-devel list.
> > > >
> > > > The file is also accessible in the extdata of the multicrispr:
> > > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> > > >
> > > > A bedfile to GRanges importer requires columns 1 (chrom), 2 
> > > > (chromStart), 3 (chromEnd), and column 6 (strand). All of these are 
> > > > present in SRF.bed.
> > > >
> > > > I am curious as to why you feel that having additional columns in a 
> > > > bedfile would break it?
> > > >
> > > > Cheers,
> > > >
> > > > Aditya
> > > >
> > > > 
> > > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > > Sent: 

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
Having a "." in the function name does not make something "S3".
There's no dispatch from import() to import.bed(). Had I not been a
total newb when I created rtracklayer, I would have called the
function importBed() or something like that. Sorry for the confusion.

On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
 wrote:
>
> Oh, superb, thx!
>
> Interesting ... here you use S3 rather than S4 - I wonder the design 
> intention underlying these choices (I'm asking because I am trying to figure 
> out myself when to use S3 and when to use S4 and whether to mix the two).
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:23 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> The generic documentation does not mention it, but see ?import.bed.
> It's similar to colClasses on read.table().
>
> On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
>  wrote:
> >
> > Thankyou Michael,
> >
> > How do I use the extraCols argument? The documentation does not mention an 
> > `extraCols` argument explicitly, so it must be one of the ellipsis 
> > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > extraCols = 10 (ten extra columns) or so?
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:05 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > It breaks it because it's not standard BED; however, using the
> > extraCols= argument should work in this case. Requiring an explicit
> > format specification is intentional, because it provides validation
> > and type safety, and it communicates the format to a future reader.
> > This also looks a bit like a bedPE file, so you might consider using
> > the Pairs data structure.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Michael,
> > >
> > > Yeah, I also noticed that the attachment was eaten when it entered the 
> > > bio-devel list.
> > >
> > > The file is also accessible in the extdata of the multicrispr:
> > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> > >
> > > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 
> > > 3 (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> > >
> > > I am curious as to why you feel that having additional columns in a 
> > > bedfile would break it?
> > >
> > > Cheers,
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 1:41 PM
> > > To: Bhagwat, Aditya
> > > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > I don't see an attachment, nor can I find the multicrispr package
> > > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > > which was addressed many years ago, so I've updated the documentation.
> > > Broken BED files will never be supported.
> > >
> > > Michael
> > >
> > > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> > >  wrote:
> > > >
> > > > Hi Lori,
> > > >
> > > > I remember now - I tried this function earlier, but it does not work 
> > > > for my bedfiles, like the one in attach.
> > > >
> > > > > bedfile  <- system.file('extdata/SRF.bed', package = 
> > > > > 'multicrispr')
> > > > >
> > > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > > 'mm10')
> > > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > > dec,  : scan() expected 'an integer', got 'chr2'
> > > > >
> > > >
> > > > Perhaps this sentence in `?rtracklayer::import` points to the source of 
> > > > the error?
> > > >
> > > > many tools and organizations have extended BED with additional columns. 
> > > > These are not officially valid BED files, and as such rtracklayer does 
> > > > not yet support them (this will be addressed soon).
> > > >
> > > > Which brings the question: how soon is soon :-D ?
> > > >
> > > > Aditya
> > > >
> > > >
> > > > 
> > > > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > > > Sent: Tuesday, September 17, 2019 1:02 PM
> > > > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > > > Subject: Re: read_bed()
> > > >
> > > > Please look at rtracklayer::import()  function that we recommend for 
> > > > reading of BAM files along with other common formats.
> > > >
> > > > Cheers,
> > > >
> > > >
> > > > Lori Shepherd
> > > >
> > > > Bioconductor Core Team
> > > >
> > > > Roswell Park Cancer Institute
> > > >
> > > > Department of Biostatistics & Bioinformatics
> > > >
> > > > Elm & Carlton Streets
> > > >
> > > > Buffalo, New York 

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
The generic documentation does not mention it, but see ?import.bed.
It's similar to colClasses on read.table().

On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
 wrote:
>
> Thankyou Michael,
>
> How do I use the extraCols argument? The documentation does not mention an 
> `extraCols` argument explicitly, so it must be one of the ellipsis arguments, 
> but `?rtracklayer::import` does not mention it. Should I say extraCols = 10 
> (ten extra columns) or so?
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:05 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> It breaks it because it's not standard BED; however, using the
> extraCols= argument should work in this case. Requiring an explicit
> format specification is intentional, because it provides validation
> and type safety, and it communicates the format to a future reader.
> This also looks a bit like a bedPE file, so you might consider using
> the Pairs data structure.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > Yeah, I also noticed that the attachment was eaten when it entered the 
> > bio-devel list.
> >
> > The file is also accessible in the extdata of the multicrispr:
> > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> >
> > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> > (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> >
> > I am curious as to why you feel that having additional columns in a bedfile 
> > would break it?
> >
> > Cheers,
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 1:41 PM
> > To: Bhagwat, Aditya
> > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > I don't see an attachment, nor can I find the multicrispr package
> > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > which was addressed many years ago, so I've updated the documentation.
> > Broken BED files will never be supported.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Lori,
> > >
> > > I remember now - I tried this function earlier, but it does not work for 
> > > my bedfiles, like the one in attach.
> > >
> > > > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > > >
> > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > 'mm10')
> > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > dec,  : scan() expected 'an integer', got 'chr2'
> > > >
> > >
> > > Perhaps this sentence in `?rtracklayer::import` points to the source of 
> > > the error?
> > >
> > > many tools and organizations have extended BED with additional columns. 
> > > These are not officially valid BED files, and as such rtracklayer does 
> > > not yet support them (this will be addressed soon).
> > >
> > > Which brings the question: how soon is soon :-D ?
> > >
> > > Aditya
> > >
> > >
> > > 
> > > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > > Sent: Tuesday, September 17, 2019 1:02 PM
> > > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > > Subject: Re: read_bed()
> > >
> > > Please look at rtracklayer::import()  function that we recommend for 
> > > reading of BAM files along with other common formats.
> > >
> > > Cheers,
> > >
> > >
> > > Lori Shepherd
> > >
> > > Bioconductor Core Team
> > >
> > > Roswell Park Cancer Institute
> > >
> > > Department of Biostatistics & Bioinformatics
> > >
> > > Elm & Carlton Streets
> > >
> > > Buffalo, New York 14263
> > >
> > > 
> > > From: Bioc-devel  on behalf of Bhagwat, 
> > > Aditya 
> > > Sent: Tuesday, September 17, 2019 6:58 AM
> > > To: bioc-devel@r-project.org 
> > > Subject: [Bioc-devel] read_bed()
> > >
> > > Dear bioc-devel,
> > >
> > > I had two feedback requests regarding the function read_bed().
> > >
> > > 1) Did I overlook, and therefore, re-invent existing functionality?
> > > 2) If not, would `read_bed` be suited for existence in a more 
> > > foundational package, e.g. `GenomicRanges`, given the rather basal nature 
> > > of this functionality?
> > >
> > > It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> > > (bedfile) to 1-based (GRanges), adds 
> > > BSgenome info (to allow for implicit range validity 
> > > checking) and plots the 
> > > karyogram.
> > >
> > > Thank you for your feedback.
> > >
> > > Cheers,
> > >
> > > Aditya
> > >
> > >
> > > #' Read bedfile into GRanges
> > > #'
> > > #' 

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
It breaks it because it's not standard BED; however, using the
extraCols= argument should work in this case. Requiring an explicit
format specification is intentional, because it provides validation
and type safety, and it communicates the format to a future reader.
This also looks a bit like a bedPE file, so you might consider using
the Pairs data structure.

Michael

On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> Yeah, I also noticed that the attachment was eaten when it entered the 
> bio-devel list.
>
> The file is also accessible in the extdata of the multicrispr:
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
>
> I am curious as to why you feel that having additional columns in a bedfile 
> would break it?
>
> Cheers,
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 1:41 PM
> To: Bhagwat, Aditya
> Cc: Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I don't see an attachment, nor can I find the multicrispr package
> anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> which was addressed many years ago, so I've updated the documentation.
> Broken BED files will never be supported.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Lori,
> >
> > I remember now - I tried this function earlier, but it does not work for my 
> > bedfiles, like the one in attach.
> >
> > > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > >
> > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > 'mm10')
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec,  : scan() expected 'an integer', got 'chr2'
> > >
> >
> > Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> > error?
> >
> > many tools and organizations have extended BED with additional columns. 
> > These are not officially valid BED files, and as such rtracklayer does not 
> > yet support them (this will be addressed soon).
> >
> > Which brings the question: how soon is soon :-D ?
> >
> > Aditya
> >
> >
> > 
> > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > Sent: Tuesday, September 17, 2019 1:02 PM
> > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > Subject: Re: read_bed()
> >
> > Please look at rtracklayer::import()  function that we recommend for 
> > reading of BAM files along with other common formats.
> >
> > Cheers,
> >
> >
> > Lori Shepherd
> >
> > Bioconductor Core Team
> >
> > Roswell Park Cancer Institute
> >
> > Department of Biostatistics & Bioinformatics
> >
> > Elm & Carlton Streets
> >
> > Buffalo, New York 14263
> >
> > 
> > From: Bioc-devel  on behalf of Bhagwat, 
> > Aditya 
> > Sent: Tuesday, September 17, 2019 6:58 AM
> > To: bioc-devel@r-project.org 
> > Subject: [Bioc-devel] read_bed()
> >
> > Dear bioc-devel,
> >
> > I had two feedback requests regarding the function read_bed().
> >
> > 1) Did I overlook, and therefore, re-invent existing functionality?
> > 2) If not, would `read_bed` be suited for existence in a more foundational 
> > package, e.g. `GenomicRanges`, given the rather basal nature of this 
> > functionality?
> >
> > It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> > (bedfile) to 1-based (GRanges), adds 
> > BSgenome info (to allow for implicit range validity 
> > checking) and plots the 
> > karyogram.
> >
> > Thank you for your feedback.
> >
> > Cheers,
> >
> > Aditya
> >
> >
> > #' Read bedfile into GRanges
> > #'
> > #' @param bedfilefile path
> > #' @param bsgenome   BSgenome, e.g. 
> > BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> > #' @param zero_based logical(1): whether bedfile GRanges are 0-based
> > #' @param rm_duplicates  logical(1)
> > #' @param plot   logical(1)
> > #' @param verboselogical(1)
> > #' @return \code{\link[GenomicRanges]{GRanges-class}}
> > #' @note By convention BED files are 0-based. GRanges are always 1-based.
> > #'   A good discussion on these two alternative codings is given
> > #'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> > #' @examples
> > #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> > #' (gr <- read_bed(bedfile, bsgenome))
> > #' @importFrom  data.table  :=
> > #' @export
> > read_bed <- function(
> > bedfile,
> > bsgenome,
> > zero_based= TRUE,
> > rm_duplicates = TRUE,
> > plot  = 

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
I don't see an attachment, nor can I find the multicrispr package
anywhere. The "addressed soon" was referring to the BEDX+Y formats,
which was addressed many years ago, so I've updated the documentation.
Broken BED files will never be supported.

Michael

On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
 wrote:
>
> Hi Lori,
>
> I remember now - I tried this function earlier, but it does not work for my 
> bedfiles, like the one in attach.
>
> > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> >
> > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > 'mm10')
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  
> : scan() expected 'an integer', got 'chr2'
> >
>
> Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> error?
>
> many tools and organizations have extended BED with additional columns. These 
> are not officially valid BED files, and as such rtracklayer does not yet 
> support them (this will be addressed soon).
>
> Which brings the question: how soon is soon :-D ?
>
> Aditya
>
>
> 
> From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> Sent: Tuesday, September 17, 2019 1:02 PM
> To: Bhagwat, Aditya; bioc-devel@r-project.org
> Subject: Re: read_bed()
>
> Please look at rtracklayer::import()  function that we recommend for reading 
> of BAM files along with other common formats.
>
> Cheers,
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> 
> From: Bioc-devel  on behalf of Bhagwat, 
> Aditya 
> Sent: Tuesday, September 17, 2019 6:58 AM
> To: bioc-devel@r-project.org 
> Subject: [Bioc-devel] read_bed()
>
> Dear bioc-devel,
>
> I had two feedback requests regarding the function read_bed().
>
> 1) Did I overlook, and therefore, re-invent existing functionality?
> 2) If not, would `read_bed` be suited for existence in a more foundational 
> package, e.g. `GenomicRanges`, given the rather basal nature of this 
> functionality?
>
> It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> (bedfile) to 1-based (GRanges), adds 
> BSgenome info (to allow for implicit range validity 
> checking) and plots the 
> karyogram.
>
> Thank you for your feedback.
>
> Cheers,
>
> Aditya
>
>
> #' Read bedfile into GRanges
> #'
> #' @param bedfilefile path
> #' @param bsgenome   BSgenome, e.g. 
> BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' @param zero_based logical(1): whether bedfile GRanges are 0-based
> #' @param rm_duplicates  logical(1)
> #' @param plot   logical(1)
> #' @param verboselogical(1)
> #' @return \code{\link[GenomicRanges]{GRanges-class}}
> #' @note By convention BED files are 0-based. GRanges are always 1-based.
> #'   A good discussion on these two alternative codings is given
> #'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> #' @examples
> #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' (gr <- read_bed(bedfile, bsgenome))
> #' @importFrom  data.table  :=
> #' @export
> read_bed <- function(
> bedfile,
> bsgenome,
> zero_based= TRUE,
> rm_duplicates = TRUE,
> plot  = TRUE,
> verbose   = TRUE
> ){
> # Assert
> assert_all_are_existing_files(bedfile)
> assert_is_a_bool(verbose)
> assert_is_a_bool(rm_duplicates)
> assert_is_a_bool(zero_based)
>
> # Comply
> seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL
>
> # Read
> if (verbose) cmessage('\tRead %s', bedfile)
> dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
> col.names = c('seqnames', 'start', 'end', 'strand'))
> data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))
>
> # Transform coordinates: 0-based -> 1-based
> if (zero_based){
> if (verbose)cmessage('\t\tConvert 0 -> 1-based')
> dt[, start := start + 1]
> }
>
> if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
> nrow(dt), length(unique(dt$seqnames)))
>
> # Drop duplicates
> if (rm_duplicates){
> is_duplicated <- cduplicated(dt)
> if (any(is_duplicated)){
> if (verbose) cmessage('\t\t%d after removing duplicates')
> dt %<>% extract(!duplicated)
> }
> }
>
> # Turn into GRanges
> gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)
>
> # Plot and return
> title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
> if (plot) plot_karyogram(gr, title)
> gr
> }
>
>
> [[alternative HTML version deleted]]
>

Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

2019-09-16 Thread Michael Lawrence via Bioc-devel
It's on the right track. The case where two ranges are produced is
problematic, because we would want this to be a parallel vector
operation, where the length of the input is the same as the length of
the output. So that last case I think might just ignore the leftend
and rightstart arguments with a warning, returning a result with the
gap filled.

Michael

On Mon, Sep 16, 2019 at 1:48 AM Bhagwat, Aditya
 wrote:
>
> Michael, actually, such a generic straddle() could be useful:
>
> straddle(leftstart=-100, rightend=100)  # extended range
> straddle(leftstart=-100, leftend=-1)   # left flank
> straddle(rightstart=1, rightend=100) # right flank
> straddle(leftstart=-100, leftend=-1, rightstart=1, rightend=100) # left and 
> right flanks
>
> What do you think?
>
> Aditya
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Monday, September 16, 2019 10:30 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`
>
> Hmm no that wouldn't work, it would become messy trying to figure out when 
> incompatible arguments are provided.
>
> Aditya
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Monday, September 16, 2019 10:09 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`
>
> Hi Michael,
>
> Thank you for the pointer to plyranges - looks very useful!
>
> > Maybe a better name is "straddle" for when ranges
> > straddle one of the endpoints? In keeping with the current pattern of
> > Ranges API, there would be a single function: straddle(x, side, left,
> > right, ignore.strand=FALSE). So straddle(x, "start", -100, 10) would
> > be like promoters(x, 100, 10) for a positive or "*" strand range.
>
> Cool suggestion, and a really fitting verb :-)
> Just slightly modifying your suggestion makes the API fully generic (waaw!), 
> generalizing over left_flank, right_flank, as well as slop:
>
> straddle(leftstart, leftend, rightstart, rightend)
>
> Would it be worth having such functionality in GenomicRanges or plyranges, 
> rather than multicrispr?
>
>
> > That brings up strandedness, which needs to be considered here. For
> > unstranded ranges, it may be that direct start() and end()
> > manipulation is actually more transparent than a special verb.
>
> I ended up using left/right for unstranded, and up/down for stranded 
> operations.
>
>
> > The functions that involve reduce() wouldn't fit into the intrarange
> > operations, as they are summarizing ranges, not transforming them.
> > They may be going too far.
>
> True. Actually, the functions would be cleaner without the reduce(), I think 
> I'll take that out.
>
> Cheers,
>
> Aditya
>
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

2019-09-13 Thread Michael Lawrence via Bioc-devel
Thanks for these suggestions; I think they're worth considering.

I've never been totally satisfied with (my function) flank(), because
it's limited and its arguments are somewhat obscure in meaning. You
can check out what we did in plyranges:
https://rdrr.io/bioc/plyranges/man/flank-ranges.html. Your functions
are more flexible, because they are two-way about the endpoint, like
promoters(). Sometimes I've solved that with resize(flank()), but
that's not ideal.  Maybe a better name is "straddle" for when ranges
straddle one of the endpoints? In keeping with the current pattern of
Ranges API, there would be a single function: straddle(x, side, left,
right, ignore.strand=FALSE). So straddle(x, "start", -100, 10) would
be like promoters(x, 100, 10) for a positive or "*" strand range. That
brings up strandedness, which needs to be considered here. For
unstranded ranges, it may be that direct start() and end()
manipulation is actually more transparent than a special verb. I
wonder what Stuart Lee thinks?

The functions that involve reduce() wouldn't fit into the intrarange
operations, as they are summarizing ranges, not transforming them.
They may be going too far.

Michael

On Fri, Sep 13, 2019 at 4:48 AM Bhagwat, Aditya
 wrote:
>
> Dear bioc-devel,
>
> The ?GenomicRanges::`intra-range-methods` are very useful for range 
> arithmetic
>
> Feedback request: would it be of general use to add the methods below to the 
> GenomicRanges::`intra-range-methods` palette (after properly S4-ing them)?
> Or shall I keep them in 
> multicrispr?
> Additional feedback welcome as well (e.g. re-implementation of already 
> existing functionality).
>
>
> 1) Left flank
>
> #' Left flank
> #' @param gr   \code{\link[GenomicRanges]{GRanges-class}}
> #' @param leftstart number: flank start (relative to range start)
> #' @param leftend   number: flank end   (relative to range start)
> #' @return a \code{\link[GenomicRanges]{GRanges-class}}
> #' @export
> #' @examples
> #' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' gr <- read_bed(bedfile, bsgenome)
> #' left_flank(gr)
> left_flank <- function(gr, leftstart = -200, leftend   = -1){
>
> # Assert
> assert_is_identical_to_true(is(gr, 'GRanges'))
> assert_is_a_number(leftstart)
> assert_is_a_number(leftend)
>
> # Flank
> newranges <- gr
> end(newranges)   <- start(gr) + leftend
> start(newranges) <- start(gr) + leftstart
>
> # Return
> newranges
> }
>
>
> 2) Right flank
>
> #' Right flank
> #' @param gr\code{\link[GenomicRanges]{GRanges-class}}
> #' @param rightstart number: flank start (relative to range end)
> #' @param rightend   number: flank end   (relative to range end)
> #' @return \code{\link[GenomicRanges]{GRanges-class}}
> #' @export
> #' @examples
> #' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' gr <- read_bed(bedfile, bsgenome)
> #' right_flank(gr)
> #' @export
> right_flank <- function(gr, rightstart = 1, rightend   = 200){
>
> # Assert
> assert_is_identical_to_true(is(gr, 'GRanges'))
> assert_is_a_number(rightstart)
> assert_is_a_number(rightend)
> assert_is_a_bool(verbose)
>
> # Flank
> newranges <- gr
> start(newranges) <- end(newranges) + rightstart
> end(newranges)   <- end(newranges) + rightend
>
> # Plot
> if (plot)  plot_intervals(GRangesList(sites = gr, rightflanks = 
> newranges))
>
> # Return
> cmessage('\t\t%d right flanks : [end%s%d, end%s%d]',
> length(newranges),
> csign(rightstart),
> abs(rightstart),
> csign(rightend),
> abs(rightend))
> newranges
> }
>
>
> 3) Slop
>
> #' Slop (i.e. extend left/right)
> #' @param gr\code{\link[GenomicRanges]{GRanges-class}}
> #' @param leftstart number: flank start (relative to range start)
> #' @param rightend  number: flank end   (relative to range end)
> #' @return \code{\link[GenomicRanges]{GRanges-class}}
> #' @export
> #' @examples
> #' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' gr <- read_bed(bedfile, bsgenome)
> #' slop(gr)
> #' @export
> slop <- function(gr, leftstart = -22, rightend  =  22){
>
> # Assert
> assert_is_identical_to_true(methods::is(gr, 'GRanges'))
> assert_is_a_number(leftstart)
> assert_is_a_number(rightend)
> assert_is_a_bool(verbose)
>
> # Slop
> newranges <- gr
> start(newranges) <- start(newranges) + leftstart
> end(newranges)   <- end(newranges)   + rightend
>
> # Return
> newranges
> }
>
>
> 4) Flank fourways
>
> #' Flank fourways
> #'
> #' Flank left and right, for both 

Re: [Bioc-devel] Duplicated method names in purrr and GenomicRanges

2019-09-12 Thread Michael Lawrence via Bioc-devel
Third option: use Reduce() from base instead of purr::reduce().

On Thu, Sep 12, 2019 at 2:54 AM O'CALLAGHAN Alan
 wrote:
>
> Hi,
>
> Two options.
>
> First option: import either purrr::reduce or GenomicRanges::reduce, and
> call the other with [pkg]::reduce.
>
> Second option: remove the import for both of these. Use purrr::reduce
> and GenomicRanges::reduce to call both functions.
>
> I think the second option leads to clearer code and would be my definite
> preference.
>
>
> On 12/09/2019 10:07, bio...@posteo.de wrote:
> > Dear all,
> >
> > I am developing a Bioconductor package and have a problem with two
> > methods which have the same name. I am using the reduce() function
> > from the R packages GenomicRanges and purrr. All methods from other
> > packages are imported with @importFrom in all of my functions.
> >
> >
> > During devtools::document() I get the following Warning:
> >
> > ...
> >
> > replacing previous import ‘GenomicRanges::reduce’ by ‘purrr::reduce’
> > when loading ‘testPackage’
> >
> > ...
> >
> >
> > Here are my NAMESPACE entries:
> >
> > # Generated by roxygen2: do not edit by hand
> >
> > export(mergeDataFrameList)
> > export(reduceDummy)
> > importFrom(GenomicRanges,GRanges)
> > importFrom(GenomicRanges,reduce)
> > importFrom(IRanges,IRanges)
> > importFrom(dplyr,"%>%")
> > importFrom(dplyr,left_join)
> > importFrom(dplyr,mutate)
> > importFrom(dplyr,pull)
> > importFrom(magrittr,"%<>%")
> > importFrom(purrr,reduce)
> > importFrom(tibble,tibble)
> >
> >
> > I am not using both reduce functions in the same function. To use the
> > GenomicRanges reduce function, I have to call this function like this:
> > GenomicRanges::reduce().
> >
> > I understand the warning and why I have to call the reduce function
> > like this. Is there a solution for this problem? Compiling a R package
> > with warnings and calling functions like this is not the best way I
> > guess.
> >
> > I am using R version 3.6.1 (2019-07-05)
> >
> > Thanks for help!
> >
> > Best,
> >
> > Tobias
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> The University of Edinburgh is a charitable body, registered in Scotland, 
> with registration number SC005336.
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-11 Thread Michael Lawrence via Bioc-devel
Good call. Didn't know about seqlevelsInUse().

On Wed, Sep 11, 2019 at 8:29 AM Pages, Herve  wrote:
>
> Or more accurately:
>
>as(seqinfo(bsgenome)[seqlevelsInUse(grl)], "GRanges")
>
> since not all seqlevels are necessarily "in use" (i.e. not necessarily
> represented in seqnames(grl)).
>
> H.
>
> On 9/11/19 08:26, Hervé Pagès wrote:
> > The unique seqnames is what we call the seqlevels. So just:
> >
> >as(seqinfo(bsgenome)[seqlevels(grl)], "GRanges")
> >
> > H.
> >
> > On 9/11/19 07:42, Michael Lawrence wrote:
> >> So why not just do:
> >>
> >> as(seqinfo(bsgenome)[unique(unlist(seqnames(grl)))], "GRanges")
> >>
> >> Michael
> >>
> >> On Wed, Sep 11, 2019 at 5:55 AM Bhagwat, Aditya
> >>  wrote:
> >>>
> >>> Thanks Michael,
> >>>
> >>> The important detail is that I want to plot the relevant chromosomes
> >>> only
> >>>
> >>>  relevant_chromosomes <- GenomeInfoDb::seqnames(grangeslist)  %>%
> >>>  S4Vectors::runValue() %>%
> >>>  Reduce(union, .) %>%
> >>>  unique()
> >>>
> >>>  genomeranges <- GenomeInfoDb::seqinfo(grangeslist) %>%
> >>>  as('GRanges') %>%
> >>> (function(gr){
> >>> gr [ as.character(GenomeInfoDb::seqnames(gr))
> >>> %in%
> >>>  relevant_chromosomes ]
> >>> })
> >>>
> >>>  kp <- karyoploteR::plotKaryotype(genomeranges)
> >>>  karyoploteR::kpPlotRegions(kp, grangeslist) # grangeslist
> >>> contains crispr target sites
> >>>
> >>>
> >>> And, this process required as("GRanges")
> >>>
> >>>  #' Convert BSgenome into GRanges
> >>>  #' @param from BSgenome, e.g.
> >>> BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> >>>  #' @examples
> >>>  #' require(magrittr)
> >>>  #' BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10 %>%
> >>>  #' as('GRanges')
> >>>  #' @importClassesFrom BSgenome BSgenome
> >>>  #' @export
> >>>  methods::setAs( "BSgenome",
> >>>  "GRanges",
> >>>  function(from)  from %>%
> >>>  GenomeInfoDb::seqinfo() %>%
> >>>  as('GRanges'))
> >>>
> >>> Thankyou for feedback,
> >>>
> >>> Aditya
> >>>
> >>> 
> >>> From: Michael Lawrence [lawrence.mich...@gene.com]
> >>> Sent: Wednesday, September 11, 2019 2:31 PM
> >>> To: Bhagwat, Aditya
> >>> Cc: Pages, Herve; bioc-devel@r-project.org
> >>> Subject: Re: [Bioc-devel] Import BSgenome class without attaching
> >>> BiocGenerics (and others)?
> >>>
> >>> I'm pretty surprised that the karyoploteR package does not accept a
> >>> Seqinfo since it is plotting chromosomes. But again, please consider
> >>> just doing as(seqinfo(bsgenome), "GRanges").
> >>>
> >>> On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
> >>>  wrote:
> 
>  Hi Herve,
> 
>  Thank you for your responses.
>   From your response, it is clear that the vcountPDict use case does
>  not need a BSgenome -> GRanges coercer.
> 
>  The karyoploteR use case still requires it, though, to allow
>  plotting of only the chromosomal BSgenome portions:
> 
>   chromranges <- as(bsegenome, "GRanges")
>   kp <- karyoploteR::plotKaryotype(chromranges)
>   karyoploteR::kpPlotRegions(kp, crispr_target_sites)
> 
>  Or do you see any alternative for this purpose too?
> 
>  Aditya
> 
>  
>  From: Pages, Herve [hpa...@fredhutch.org]
>  Sent: Wednesday, September 11, 2019 12:24 PM
>  To: Bhagwat, Aditya; bioc-devel@r-project.org
>  Subject: Re: [Bioc-devel] Import BSgenome class without attaching
>  BiocGenerics (and others)?
> 
>  Hi Aditya,
> 
>  On 9/11/19 01:31, Bhagwat, Aditya wrote:
> > Hi Herve,
> >
> >
> >   > It feels that a coercion method from BSgenome to GRanges should
> > rather be defined in the BSgenome package itself.
> >
> > :-)
> >
> >
> >   > Patch/PR welcome on GitHub.
> >
> > Owkies. What pull/fork/check/branch protocol to be followed?
> >
> >
> >   > Is this what you have in mind for this coercion?
> >   > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
> >
> > Yes.
> >
> > Perhaps also useful to share the wider context, allowing your and
> > others
> > feedback for improved software design.
> > I wanted to subset a
> > BSgenome
> >
> > (without the _random or _unassigned), but Lori explained this is not
> > possible.
> > 

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-11 Thread Michael Lawrence via Bioc-devel
So why not just do:

as(seqinfo(bsgenome)[unique(unlist(seqnames(grl)))], "GRanges")

Michael

On Wed, Sep 11, 2019 at 5:55 AM Bhagwat, Aditya
 wrote:
>
> Thanks Michael,
>
> The important detail is that I want to plot the relevant chromosomes only
>
> relevant_chromosomes <- GenomeInfoDb::seqnames(grangeslist)  %>%
> S4Vectors::runValue() %>%
> Reduce(union, .) %>%
> unique()
>
> genomeranges <- GenomeInfoDb::seqinfo(grangeslist) %>%
> as('GRanges') %>%
>(function(gr){
>gr [ as.character(GenomeInfoDb::seqnames(gr)) %in%
> relevant_chromosomes ]
>})
>
> kp <- karyoploteR::plotKaryotype(genomeranges)
> karyoploteR::kpPlotRegions(kp, grangeslist) # grangeslist contains crispr 
> target sites
>
>
> And, this process required as("GRanges")
>
> #' Convert BSgenome into GRanges
> #' @param from BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' @examples
> #' require(magrittr)
> #' BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10 %>%
> #' as('GRanges')
> #' @importClassesFrom BSgenome BSgenome
> #' @export
> methods::setAs( "BSgenome",
> "GRanges",
> function(from)  from %>%
> GenomeInfoDb::seqinfo() %>%
> as('GRanges'))
>
> Thankyou for feedback,
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Wednesday, September 11, 2019 2:31 PM
> To: Bhagwat, Aditya
> Cc: Pages, Herve; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> BiocGenerics (and others)?
>
> I'm pretty surprised that the karyoploteR package does not accept a
> Seqinfo since it is plotting chromosomes. But again, please consider
> just doing as(seqinfo(bsgenome), "GRanges").
>
> On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Herve,
> >
> > Thank you for your responses.
> > From your response, it is clear that the vcountPDict use case does not need 
> > a BSgenome -> GRanges coercer.
> >
> > The karyoploteR use case still requires it, though, to allow plotting of 
> > only the chromosomal BSgenome portions:
> >
> > chromranges <- as(bsegenome, "GRanges")
> > kp <- karyoploteR::plotKaryotype(chromranges)
> > karyoploteR::kpPlotRegions(kp, crispr_target_sites)
> >
> > Or do you see any alternative for this purpose too?
> >
> > Aditya
> >
> > 
> > From: Pages, Herve [hpa...@fredhutch.org]
> > Sent: Wednesday, September 11, 2019 12:24 PM
> > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> > BiocGenerics (and others)?
> >
> > Hi Aditya,
> >
> > On 9/11/19 01:31, Bhagwat, Aditya wrote:
> > > Hi Herve,
> > >
> > >
> > >  > It feels that a coercion method from BSgenome to GRanges should
> > > rather be defined in the BSgenome package itself.
> > >
> > > :-)
> > >
> > >
> > >  > Patch/PR welcome on GitHub.
> > >
> > > Owkies. What pull/fork/check/branch protocol to be followed?
> > >
> > >
> > >  > Is this what you have in mind for this coercion?
> > >  > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
> > >
> > > Yes.
> > >
> > > Perhaps also useful to share the wider context, allowing your and others
> > > feedback for improved software design.
> > > I wanted to subset a
> > > BSgenome
> > > (without the _random or _unassigned), but Lori explained this is not
> > > possible.
> > > 
> > >
> > > Instead Lori suggested to coerce a BSgenome into a GRanges
> > > ,
> > > which is a useful solution, but for which currently no exported S4
> > > method exists
> > > 
> > > So I defined an S4 coercer in my multicrispr package, making sure to
> > > 

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-11 Thread Michael Lawrence via Bioc-devel
I'm pretty surprised that the karyoploteR package does not accept a
Seqinfo since it is plotting chromosomes. But again, please consider
just doing as(seqinfo(bsgenome), "GRanges").

On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
 wrote:
>
> Hi Herve,
>
> Thank you for your responses.
> From your response, it is clear that the vcountPDict use case does not need a 
> BSgenome -> GRanges coercer.
>
> The karyoploteR use case still requires it, though, to allow plotting of only 
> the chromosomal BSgenome portions:
>
> chromranges <- as(bsegenome, "GRanges")
> kp <- karyoploteR::plotKaryotype(chromranges)
> karyoploteR::kpPlotRegions(kp, crispr_target_sites)
>
> Or do you see any alternative for this purpose too?
>
> Aditya
>
> 
> From: Pages, Herve [hpa...@fredhutch.org]
> Sent: Wednesday, September 11, 2019 12:24 PM
> To: Bhagwat, Aditya; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> BiocGenerics (and others)?
>
> Hi Aditya,
>
> On 9/11/19 01:31, Bhagwat, Aditya wrote:
> > Hi Herve,
> >
> >
> >  > It feels that a coercion method from BSgenome to GRanges should
> > rather be defined in the BSgenome package itself.
> >
> > :-)
> >
> >
> >  > Patch/PR welcome on GitHub.
> >
> > Owkies. What pull/fork/check/branch protocol to be followed?
> >
> >
> >  > Is this what you have in mind for this coercion?
> >  > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
> >
> > Yes.
> >
> > Perhaps also useful to share the wider context, allowing your and others
> > feedback for improved software design.
> > I wanted to subset a
> > BSgenome
> > (without the _random or _unassigned), but Lori explained this is not
> > possible.
> > 
> >
> > Instead Lori suggested to coerce a BSgenome into a GRanges
> > ,
> > which is a useful solution, but for which currently no exported S4
> > method exists
> > 
> > So I defined an S4 coercer in my multicrispr package, making sure to
> > properly import the Bsgenome class
> > .
> > Then, after coercing a BSgenome into a GRanges, I can extract the
> > chromosomes, after properly importing IRanges::`%in%`
> > 
>
> Looks like you don't need to coerce the BSgenome object to GRanges. See
> https://support.bioconductor.org/p/123489/#124581
>
> H.
>
> > Which I can then on end to karyoploteR
> > ,
> > for genome-wide plots of crispr target sites.
> >
> > A good moment also to say thank you to all of you who helped me out, it
> > helps me to make multicrispr fit nicely into the BioC ecosystem.
> >
> > Speeking of BioC design philosophy, can any of you suggest concise and
> > to-the-point reading material to deepen my understanding of the core
> > BioC software design philosophy?
> > I am trying to understand that better (which was the context for asking
> > recently why there are three Vector -> data.frame coercers in S4Vectors
> > )
> >
> > Cheers,
> >
> > Aditya
> >
> >
> >
> >
> > 
> 

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-06 Thread Michael Lawrence via Bioc-devel
There's never a need to attach a package to satisfy the dependencies
of another package. That would defeat the purpose of namespaces.

The three coercion functions gets at the heart of the S3/S4 mess.
setAs() provides a dynamic coercion mechanism, so is useful for as(x,
class) when class can be anything. as.data.frame() is an S3 generic
defined by the base package, so every package sees it. Something
promotes as.data.frame() to an S4 generic, but only packages that
import the generic can see it. That likely excludes the vast majority
of CRAN packages. Thus, we define an S3 method for calls to the S3
generic. The S4 generic will fall back to the S3 methods; however, it
will first check all S4 methods. Defining as.data.frame,Vector()
defends against the definition of a method above it, such as
as.data.frame,Annotated(), which would intercept dispatch to the S3
as.data.frame.Vector().

Michael

On Fri, Sep 6, 2019 at 10:08 AM Bhagwat, Aditya
 wrote:
>
> Thanks Kasper and Michael.
>
>
>
> The importClassesFrom  sounds like something that would allow an 
> attachment-free S4 class import, will check them out.
>
> Michael, the current auto-attach is causing 66 namespace clashes – not 
> feeling very comfortable about that, so trying to avoid them.
>
>
>
> I also think there’s something about S4 coercion that I don’t yet fully 
> understand.
>
> For instance: the S4Vectors package has three different versions of a 
> S4Vectors::Vector -> data.frame coercer. Why? Any useful pointers?
>
>
>
> setAs("Vector", "data.frame", function(from) as.data.frame(from))
>
>
>
> as.data.frame.Vector <- function(x, row.names=NULL, optional=FALSE, ...) {
>
> as.data.frame(x, row.names=NULL, optional=optional, ...)
>
> }
>
>
>
> setMethod("as.data.frame", "Vector",
>
>   function(x, row.names=NULL, optional=FALSE, ...)
>
>   {
>
>   x <- as.vector(x)
>
>   as.data.frame(x, row.names=row.names, optional=optional, ...)
>
>   })
>
>
>
>
>
>
>
> From: Kasper Daniel Hansen 
> Sent: Freitag, 6. September 2019 17:49
> To: Michael Lawrence 
> Cc: Bhagwat, Aditya ; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> BiocGenerics (and others)?
>
>
>
> There are
>
>   importMethodsFrom(PACKAGE, NAME_OF_METHODS)
>
>   importClassesFrom(PACKAGE, NAME_OF_METHODS)
>
> to help with selective importing S4 methods/classes.  See section 1.5.6 of 
> WRE.
>
>
>
> On Fri, Sep 6, 2019 at 9:24 AM Michael Lawrence via Bioc-devel 
>  wrote:
>
> It sounds like you are trying to defer loading of namespaces in order
> to save time when they are unnecessary? That's probably going to end
> up a losing battle.
>
> On Fri, Sep 6, 2019 at 5:47 AM Bhagwat, Aditya
>  wrote:
> >
> > Thank you Michael,
> >
> > Appreciate your time for helping me fill the gaps in my understanding of 
> > the S4 flow :-).
> >
> > It all started when I defined (in my multicrispr package) the S4 coercer :
> > methods::setAs( "BSgenome",
> >
> > "GRanges",
> > function(from) as(GenomeInfoDb::seqinfo(from), "GRanges")
> >
> > When building, I noticed the message
> > in method for 'coerce' with signature '"BSgenome","GRanges"': no definition 
> > for class "BSgenome"
> >
> > So, I added
> > BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
> >
> > That loads all these dependencies.
> > From your answer, I understand that there is currently no alternative to 
> > loading all these dependencies.
> > I guess because these dependencies are needed to provide for all required 
> > S4 methods for the BSgenome class, am I right?
> >
> > Is there a way to define a methods::setAs without loading the class 
> > definition?
> >
> > Aditya
> >
> >
> >
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Friday, September 06, 2019 1:09 PM
> > To: Bhagwat, Aditya
> > Cc: bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> > BiocGenerics (and others)?
> >
> > The way to keep a "clean namespace" is to selectively import symbols
> > into your namespace, not to import _nothing_ into your namespace.
> > Otherwise, your code will fill with namespace qualifications that
> > distract from what is more important to communicate: the intent of the
> > code. And no, there's

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-06 Thread Michael Lawrence via Bioc-devel
It sounds like you are trying to defer loading of namespaces in order
to save time when they are unnecessary? That's probably going to end
up a losing battle.

On Fri, Sep 6, 2019 at 5:47 AM Bhagwat, Aditya
 wrote:
>
> Thank you Michael,
>
> Appreciate your time for helping me fill the gaps in my understanding of the 
> S4 flow :-).
>
> It all started when I defined (in my multicrispr package) the S4 coercer :
> methods::setAs( "BSgenome",
>
> "GRanges",
> function(from) as(GenomeInfoDb::seqinfo(from), "GRanges")
>
> When building, I noticed the message
> in method for 'coerce' with signature '"BSgenome","GRanges"': no definition 
> for class "BSgenome"
>
> So, I added
> BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
>
> That loads all these dependencies.
> From your answer, I understand that there is currently no alternative to 
> loading all these dependencies.
> I guess because these dependencies are needed to provide for all required S4 
> methods for the BSgenome class, am I right?
>
> Is there a way to define a methods::setAs without loading the class 
> definition?
>
> Aditya
>
>
>
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Friday, September 06, 2019 1:09 PM
> To: Bhagwat, Aditya
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> BiocGenerics (and others)?
>
> The way to keep a "clean namespace" is to selectively import symbols
> into your namespace, not to import _nothing_ into your namespace.
> Otherwise, your code will fill with namespace qualifications that
> distract from what is more important to communicate: the intent of the
> code. And no, there's no way to define method signatures using
> anything other than simple class names.
>
> It would be interesting to explore alternative ways of specifying
> method signatures. One way would be if every package exported a "class
> reference" (class name with package attribute, at least) for each of
> its classes. Those could be treated like any other exported object,
> and referenced via namespace qualification. It would require major
> changes to the methods package but that should probably happen anyway
> to support disambiguation when two packages define a class of the same
> name. It would be nice to get away from the exportClasses() and
> importClasses() stuff. File that under the "rainy year" category.
>
> Michael
>
> On Fri, Sep 6, 2019 at 3:39 AM Bhagwat, Aditya
>  wrote:
> >
> > Dear Bioc devel,
> >
> > Is it possible to import the BSgenome class without attaching BiocGenerics 
> > (to keep a clean namespace during the development of 
> > multicrispr).
> >
> > BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
> >
> > (Posted earlier on BioC support 
> > and redirected here following Martin's suggestion)
> >
> > Thankyou :-)
> >
> > Aditya
> >
> > [[alternative HTML version deleted]]
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> --
> Michael Lawrence
> Scientist, Bioinformatics and Computational Biology
> Genentech, A Member of the Roche Group
> Office +1 (650) 225-7760
> micha...@gene.com
>
> Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube



-- 
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-06 Thread Michael Lawrence via Bioc-devel
The way to keep a "clean namespace" is to selectively import symbols
into your namespace, not to import _nothing_ into your namespace.
Otherwise, your code will fill with namespace qualifications that
distract from what is more important to communicate: the intent of the
code. And no, there's no way to define method signatures using
anything other than simple class names.

It would be interesting to explore alternative ways of specifying
method signatures. One way would be if every package exported a "class
reference" (class name with package attribute, at least)  for each of
its classes. Those could be treated like any other exported object,
and referenced via namespace qualification. It would require major
changes to the methods package but that should probably happen anyway
to support disambiguation when two packages define a class of the same
name. It would be nice to get away from the exportClasses() and
importClasses() stuff. File that under the "rainy year" category.

Michael

On Fri, Sep 6, 2019 at 3:39 AM Bhagwat, Aditya
 wrote:
>
> Dear Bioc devel,
>
> Is it possible to import the BSgenome class without attaching BiocGenerics 
> (to keep a clean namespace during the development of 
> multicrispr).
>
> BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
>
> (Posted earlier on BioC support 
> and redirected here following Martin's suggestion)
>
> Thankyou :-)
>
> Aditya
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] rtracklayer::import.bw error situation when there's repeated input ranges in the "selection", leads to S4Vectors

2019-08-24 Thread Michael Lawrence via Bioc-devel
Sorry about that. I bumped the version and ported to release.

On Thu, Aug 22, 2019 at 7:50 AM Leonardo Collado Torres
 wrote:
>
> Hi Michael,
>
> That was using rtracklayer 1.45.3. However I see at
> https://github.com/lawremi/rtracklayer/commit/90d4209eae05ae2a4af550072c35cada662b4c89
> that you made a recent change. If I use the GitHub version, it all
> works =)
>
> > result <- rtracklayer::import(sampleFile,
> + selection = regs,
> + as = 'RleList'
> + )
> > result2 <- rtracklayer::import(sampleFile,
> + selection = GenomicRanges::reduce(regs),
> + as = 'RleList'
> + )
> > identical(result, result2)
> [1] TRUE
>
> library('sessioninfo')
> options(width = 120)
> session_info()
>
> ## Same as yesterday except:
>  rtracklayer1.45.32019-08-22 [1] Github
> (lawremi/rtracklayer@2dac472)
>
> Best,
> Leo
>
> On Wed, Aug 21, 2019 at 3:33 PM Michael Lawrence
>  wrote:
> >
> > Sorry, please try 1.45.3. If that works then I'll push it over to release.
> >
> > On Wed, Aug 21, 2019 at 11:48 AM Leonardo Collado Torres
> >  wrote:
> > >
> > > Dear Bioc-devel,
> > >
> > > In BioC 3.9 and 3.10 I've noticed some errors on recount which today I
> > > finally traced. It looks like the internals of
> > > rtracklayer::import.bw() changed (or mabye S4Vectors:::normarg_names)
> > > in such a way that if you specify as input to rtracklayer::import()
> > > the "selection" argument with a named GRanges object that has repeated
> > > ranges, the function call fails. This can be avoided from a user's
> > > perspective by using GenomicRanges::reduce() on the input to
> > > "selection", which I guess is ultimately the best option. I now use
> > > GenomicRanges::reduce() on derfinder version 1.18.4 (BioC 3.9) and
> > > 1.19.4 (BioC 3.10) to solve this issue for recount. But well, I
> > > thought that it would be best to share this with all of you.
> > >
> > > Best,
> > > Leo
> > >
> > > Here's the actual R code for reproducing this situation:
> > >
> > >
> > >
> > > sampleFile <- c(
> > > 'SRR38' = 
> > > 'http://duffel.rail.bio/recount/SRP009615/bw/SRR38.bw'
> > > )
> > > regs <- GenomicRanges::GRanges(
> > > 'chrY',
> > > IRanges::IRanges(start = c(1, 1), width = 10),
> > > strand = '-'
> > > )
> > > names(regs) <- c(1:2)
> > > result <- rtracklayer::import(sampleFile,
> > > selection = regs,
> > > as = 'RleList'
> > > )
> > > # Error in S4Vectors:::normarg_names(value, class(x), length(x)) :
> > > #   attempt to set too many names (2) on IRanges object of length 1
> > >
> > > # 12: stop(wmsg("attempt to set too many names (", names_len, ") ",
> > > # "on ", x_class, " object of length ", x_len))
> > > # 11: S4Vectors:::normarg_names(value, class(x), length(x))
> > > # 10: `names<-`(`*tmp*`, value = nm)
> > > # 9: `names<-`(`*tmp*`, value = nm)
> > > # 8: setNames(ranges(x), value)
> > > # 7: `names<-`(`*tmp*`, value = names(flatWhich))
> > > # 6: `names<-`(`*tmp*`, value = names(flatWhich))
> > > # 5: .local(con, format, text, ...)
> > > # 4: import(FileForFormat(con), ...)
> > > # 3: import(FileForFormat(con), ...)
> > > # 2: rtracklayer::import(sampleFile, selection = regs, as = "RleList")
> > > # 1: rtracklayer::import(sampleFile, selection = regs, as = "RleList")
> > >
> > > result <- rtracklayer::import(sampleFile,
> > > selection = GenomicRanges::reduce(regs),
> > > as = 'RleList'
> > > )
> > >
> > > library('sessioninfo')
> > > options(width = 120)
> > > session_info()
> > >
> > > # ─ Session info
> > > ───
> > > #  setting  value
> > > #  version  R version 3.6.1 (2019-07-05)
> > > #  os   macOS Mojave 10.14.6
> > > #  system   x86_64, darwin15.6.0
> > > #  ui   X11
> > > #  language (EN)
> > > #  collate  en_US.UTF-8
> > > #  ctypeen_US.UTF-8
> > > #  tz   America/New_York
> > > #  date 2019-08-21
> > > #
> > > # ─ Packages 
> > > ───
> > > #  package  * version   date   lib source
> > > #  assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
> > > #  backports  1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
> > > #  Biobase2.45.02019-05-02 [1] Bioconductor
> > > #  BiocGenerics   0.31.52019-07-03 [1] Bioconductor
> > > #  BiocParallel   1.19.22019-08-07 [1] Bioconductor
> > > #  Biostrings 2.53.22019-07-09 [1] Bioconductor
> > > #  bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
> > > #  callr  3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
> > > #  cli1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
> > > #  colorout * 1.2-1 2019-07-27 [1] Github
> > > (jalvesaq/colorout@7ea9440)
> > > #  crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
> > > #  DelayedArray   

Re: [Bioc-devel] rtracklayer::import.bw error situation when there's repeated input ranges in the "selection", leads to S4Vectors

2019-08-21 Thread Michael Lawrence via Bioc-devel
Sorry, please try 1.45.3. If that works then I'll push it over to release.

On Wed, Aug 21, 2019 at 11:48 AM Leonardo Collado Torres
 wrote:
>
> Dear Bioc-devel,
>
> In BioC 3.9 and 3.10 I've noticed some errors on recount which today I
> finally traced. It looks like the internals of
> rtracklayer::import.bw() changed (or mabye S4Vectors:::normarg_names)
> in such a way that if you specify as input to rtracklayer::import()
> the "selection" argument with a named GRanges object that has repeated
> ranges, the function call fails. This can be avoided from a user's
> perspective by using GenomicRanges::reduce() on the input to
> "selection", which I guess is ultimately the best option. I now use
> GenomicRanges::reduce() on derfinder version 1.18.4 (BioC 3.9) and
> 1.19.4 (BioC 3.10) to solve this issue for recount. But well, I
> thought that it would be best to share this with all of you.
>
> Best,
> Leo
>
> Here's the actual R code for reproducing this situation:
>
>
>
> sampleFile <- c(
> 'SRR38' = 'http://duffel.rail.bio/recount/SRP009615/bw/SRR38.bw'
> )
> regs <- GenomicRanges::GRanges(
> 'chrY',
> IRanges::IRanges(start = c(1, 1), width = 10),
> strand = '-'
> )
> names(regs) <- c(1:2)
> result <- rtracklayer::import(sampleFile,
> selection = regs,
> as = 'RleList'
> )
> # Error in S4Vectors:::normarg_names(value, class(x), length(x)) :
> #   attempt to set too many names (2) on IRanges object of length 1
>
> # 12: stop(wmsg("attempt to set too many names (", names_len, ") ",
> # "on ", x_class, " object of length ", x_len))
> # 11: S4Vectors:::normarg_names(value, class(x), length(x))
> # 10: `names<-`(`*tmp*`, value = nm)
> # 9: `names<-`(`*tmp*`, value = nm)
> # 8: setNames(ranges(x), value)
> # 7: `names<-`(`*tmp*`, value = names(flatWhich))
> # 6: `names<-`(`*tmp*`, value = names(flatWhich))
> # 5: .local(con, format, text, ...)
> # 4: import(FileForFormat(con), ...)
> # 3: import(FileForFormat(con), ...)
> # 2: rtracklayer::import(sampleFile, selection = regs, as = "RleList")
> # 1: rtracklayer::import(sampleFile, selection = regs, as = "RleList")
>
> result <- rtracklayer::import(sampleFile,
> selection = GenomicRanges::reduce(regs),
> as = 'RleList'
> )
>
> library('sessioninfo')
> options(width = 120)
> session_info()
>
> # ─ Session info
> ───
> #  setting  value
> #  version  R version 3.6.1 (2019-07-05)
> #  os   macOS Mojave 10.14.6
> #  system   x86_64, darwin15.6.0
> #  ui   X11
> #  language (EN)
> #  collate  en_US.UTF-8
> #  ctypeen_US.UTF-8
> #  tz   America/New_York
> #  date 2019-08-21
> #
> # ─ Packages 
> ───
> #  package  * version   date   lib source
> #  assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
> #  backports  1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
> #  Biobase2.45.02019-05-02 [1] Bioconductor
> #  BiocGenerics   0.31.52019-07-03 [1] Bioconductor
> #  BiocParallel   1.19.22019-08-07 [1] Bioconductor
> #  Biostrings 2.53.22019-07-09 [1] Bioconductor
> #  bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
> #  callr  3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
> #  cli1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
> #  colorout * 1.2-1 2019-07-27 [1] Github
> (jalvesaq/colorout@7ea9440)
> #  crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
> #  DelayedArray   0.11.42019-07-03 [1] Bioconductor
> #  desc   1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
> #  devtools * 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
> #  digest 0.6.202019-07-04 [1] CRAN (R 3.6.0)
> #  fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
> #  GenomeInfoDb   1.21.12019-05-16 [1] Bioconductor
> #  GenomeInfoDbData   1.2.1 2019-07-27 [1] Bioconductor
> #  GenomicAlignments  1.21.42019-06-28 [1] Bioconductor
> #  GenomicRanges  1.37.14   2019-06-24 [1] Bioconductor
> #  glue   1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
> #  IRanges2.19.10   2019-06-11 [1] Bioconductor
> #  lattice0.20-38   2018-11-04 [1] CRAN (R 3.6.1)
> #  magrittr   1.5   2014-11-22 [1] CRAN (R 3.6.0)
> #  Matrix 1.2-172019-03-22 [1] CRAN (R 3.6.1)
> #  matrixStats0.54.02018-07-23 [1] CRAN (R 3.6.0)
> #  memoise1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
> #  pkgbuild   1.0.4 2019-08-05 [1] CRAN (R 3.6.0)
> #  pkgload1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
> #  prettyunits1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
> #  processx   3.4.1 

Re: [Bioc-devel] InteractionSet for structural variants

2019-05-21 Thread Michael Lawrence via Bioc-devel
The new package StructuralVariantAnnotation is worth mentioning. It
operates on the general "breakend" notation so should be able to represent
any type of structural variant.

On Tue, May 21, 2019 at 3:22 AM Sean Davis  wrote:

> On Tue, May 21, 2019 at 2:54 AM Aaron Lun <
> infinite.monkeys.with.keyboa...@gmail.com> wrote:
>
> > > Thanks for your response. So far my intention is to to plot them and I
> > > do not intend on performing any other operation. The first step would
> be
> > > read in the VCF file and transform it into a meaningful object and I
> was
> > > hoping there was a core package already taking care of that, but I get
> > > from your answer that there's no such functionality implemented.
> >
> > Not to my knowledge... but if you're planning on writing some relevant
> > functions, I'm sure we could find a home for it somewhere.
> >
>
> I do have a couple of simple functions in VCFWrenchR (not in Bioc), but
> like much VCF code, it probably misses a bunch of edge cases. The functions
> target VRanges, not interactionsets.
>
> https://github.com/seandavi/VCFWrenchR
>
> Sean
>
>
> > -A
> >
> > > El 5/18/19 a las 4:47 AM, Aaron Lun escribió:
> > >> I would say that it depends on what operations you intend to perform
> > >> on them. You can _store_ things any way you like, but the trick is to
> > >> ensure that operations and manipulations on those things are
> > >> consistent and meaningful. It is not obvious that there are meaningful
> > >> common operations that one might want to apply to all structural
> > >> variants.
> > >>
> > >> For example, translocations involve two genomic regions (i.e., the two
> > >> bits that get stuck together) and so are inherently two-dimensional. A
> > >> lot of useful operations will be truly translocation-specific, e.g.,
> > >> calculation of distances between anchor regions, identification of
> > >> bounding boxes in two-dimensional space. These operations will be
> > >> meaningless to 1-dimensional variants on the linear genome, e.g.,
> > >> CNVs, inversions. The converse also applies where operations on the
> > >> linear genome have no single equivalent in the two-dimensional case.
> > >>
> > >> So, I would be inclined to store them separately. If you must keep
> > >> them in one object, just lump them into a List with "translocation"
> > >> (GInteractions), "cnv" (GRanges) and "inversion" (another GRanges)
> > >> elements, and people/programs can pull out bits and pieces as needed.
> > >>
> > >> -A
> > >>
> > >>
> > >> On 5/17/19 4:38 AM, Bernat Gel Moreno wrote:
> > >>> Hi all,
> > >>>
> > >>> Is there any standard recommended container for genomic structural
> > >>> variants? I think InteractionSet would work fine for translocation
> and
> > >>> GRanges for inversions and copy number changes, but I don't know what
> > >>> would be the recommended way to store them all together using
> standard
> > >>> Bioconductor objects.
> > >>>
> > >>> And actually, is there any package that would load a SV VCF by lumpy
> or
> > >>> delly and build that object?
> > >>>
> > >>> Thanks!
> > >>>
> > >>> Bernat
> >
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 

Michael Lawrence

Scientist, Bioinformatics and Computational Biology

Genentech , A Member of the Roche Group

Office +1 (650) 225-7760

micha...@gene.com 


Join Genentech on LinkedIn  |
Twitter

| Facebook  | Instagram
 | YouTube


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] When to use generics?

2019-05-01 Thread Michael Lawrence via Bioc-devel
They're symbols in the namespace, so namespace qualification is possible.

On Wed, May 1, 2019 at 10:15 AM Boris Steipe 
wrote:

> I am more and more emphasizing ::() syntax in my
> teaching, to avoid the issues that library() have wrt. order of
> loading, and clarity of code. It's my understanding that that's a general
> trend. Now, do correct me if I'm wrong but IIRC,  that doesn't work well
> with generics because they don't get registered. If so, that might be
> another consideration.
>
> Cheers,
> Boris
>
>
>
>
> > On 2019-05-01, at 12:18, Michael Lawrence via Bioc-devel <
> bioc-devel@r-project.org> wrote:
> >
> > This is good advice. The purpose of the software also matters. The main
> > advantage of a generic is extensibility. If extensibility is important,
> > generics are appropriate, even if there is no immediate need for
> > polymorphism. Bioconductor emphasizes extensibility, and thus generics,
> but
> > it's not the only goal.
> >
> > On Wed, May 1, 2019 at 8:55 AM Kasper Daniel Hansen <
> > kasperdanielhan...@gmail.com> wrote:
> >
> >> This is a common situation. Most packages might have a need for
> something
> >> that looks like a generic, but it really only has a usage inside your
> own
> >> package(s).
> >>
> >> In many ways, the pseudo-generic you define using if() statements is
> often
> >> easier to work with, debugging wise and code-reading wise. However,
> there
> >> is a certain beauty to generics. IME many newcomers to S4 overdo it with
> >> generics. My rule of thumb is don't do a generic unless you have at
> least 3
> >> different classes you're doing dispatch on. Well, you have 3 classes,
> so I
> >> would start thinking about it. For better or worse, this is unlikely to
> >> make much difference so you should choose the one that makes the most
> sense
> >> to you as a writer.
> >>
> >> If you go down the generic route, think hard about the name.
> >>
> >>
> >> On Mon, Apr 29, 2019 at 10:38 AM Michael Lawrence via Bioc-devel <
> >> bioc-devel@r-project.org> wrote:
> >>
> >>> On Mon, Apr 29, 2019 at 6:55 AM Pages Gallego, M. <
> >>> m.pagesgall...@umcutrecht.nl> wrote:
> >>>
> >>>> Dear all,
> >>>>
> >>>> I am currently developing a package and I am a bit confused in when
> one
> >>>> should define a generic. Let me propose an example:
> >>>> Suppose I have defined 3 classes: one for proteomics data, one for
> >>>> metabolomics data and one for transcriptomics data. I have a function
> >>>> “foo(x)” that will do the same to any of the 3 classes, but requires a
> >>>> slightly different implementation for each one. I could do something
> >>> like
> >>>> that:
> >>>>
> >>>> ```
> >>>> foo <- function(x) {
> >>>>if (is(x, ‘proteomics’)) {
> >>>>  foo.protein(x)
> >>>>} else if (is(x, ‘metabolomics’)) {
> >>>>  foo.metabo(x)
> >>>>} else if (is(x, ‘transcriptomics’)) {
> >>>>  foo.trans(x)
> >>>>}
> >>>> }
> >>>> ```
> >>>>
> >>>> And then define foo.protein, foo.metabo and foo.trans.
> >>>> But this is basically the same as defining the generic “foo” and then
> >>>> defining a method for each of the classes.
> >>>>
> >>>> The problem is that “foo” is not generic “enough” in terms that
> outside
> >>>> the package it has no use. Therefore, defining the generic seems like
> a
> >>> big
> >>>> responsibility in that it should have a use outside the package. Would
> >>> you
> >>>> still define “foo” as generic in this case?
> >>>
> >>>
> >>> Not exactly sure what you mean, but if a function has no use outside
> of a
> >>> package, just avoid exporting it.
> >>>
> >>> Are there any guidelines in when one should or should not define a
> >>> generic?
> >>>>
> >>>> From:
> >>>>
> >>>
> https://www.bioconductor.org/help/course-materials/2017/Zurich/S4-classes-and-methods.html#extending-a-class-without-adding-any-slot
> >>>> in section 2.2.2 it recommends to reuse existing generics (which
> makes a
> >>>> lot of sense),

Re: [Bioc-devel] When to use generics?

2019-05-01 Thread Michael Lawrence via Bioc-devel
This is good advice. The purpose of the software also matters. The main
advantage of a generic is extensibility. If extensibility is important,
generics are appropriate, even if there is no immediate need for
polymorphism. Bioconductor emphasizes extensibility, and thus generics, but
it's not the only goal.

On Wed, May 1, 2019 at 8:55 AM Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:

> This is a common situation. Most packages might have a need for something
> that looks like a generic, but it really only has a usage inside your own
> package(s).
>
> In many ways, the pseudo-generic you define using if() statements is often
> easier to work with, debugging wise and code-reading wise. However, there
> is a certain beauty to generics. IME many newcomers to S4 overdo it with
> generics. My rule of thumb is don't do a generic unless you have at least 3
> different classes you're doing dispatch on. Well, you have 3 classes, so I
> would start thinking about it. For better or worse, this is unlikely to
> make much difference so you should choose the one that makes the most sense
> to you as a writer.
>
> If you go down the generic route, think hard about the name.
>
>
> On Mon, Apr 29, 2019 at 10:38 AM Michael Lawrence via Bioc-devel <
> bioc-devel@r-project.org> wrote:
>
>> On Mon, Apr 29, 2019 at 6:55 AM Pages Gallego, M. <
>> m.pagesgall...@umcutrecht.nl> wrote:
>>
>> > Dear all,
>> >
>> > I am currently developing a package and I am a bit confused in when one
>> > should define a generic. Let me propose an example:
>> > Suppose I have defined 3 classes: one for proteomics data, one for
>> > metabolomics data and one for transcriptomics data. I have a function
>> > “foo(x)” that will do the same to any of the 3 classes, but requires a
>> > slightly different implementation for each one. I could do something
>> like
>> > that:
>> >
>> > ```
>> > foo <- function(x) {
>> > if (is(x, ‘proteomics’)) {
>> >   foo.protein(x)
>> > } else if (is(x, ‘metabolomics’)) {
>> >   foo.metabo(x)
>> > } else if (is(x, ‘transcriptomics’)) {
>> >   foo.trans(x)
>> > }
>> > }
>> > ```
>> >
>> > And then define foo.protein, foo.metabo and foo.trans.
>> > But this is basically the same as defining the generic “foo” and then
>> > defining a method for each of the classes.
>> >
>> > The problem is that “foo” is not generic “enough” in terms that outside
>> > the package it has no use. Therefore, defining the generic seems like a
>> big
>> > responsibility in that it should have a use outside the package. Would
>> you
>> > still define “foo” as generic in this case?
>>
>>
>> Not exactly sure what you mean, but if a function has no use outside of a
>> package, just avoid exporting it.
>>
>> Are there any guidelines in when one should or should not define a
>> generic?
>> >
>> > From:
>> >
>> https://www.bioconductor.org/help/course-materials/2017/Zurich/S4-classes-and-methods.html#extending-a-class-without-adding-any-slot
>> > in section 2.2.2 it recommends to reuse existing generics (which makes a
>> > lot of sense), but then it also says that you can define new generics
>> for
>> > specialized operations, what if the operation is very specialized?
>> >
>>
>> One recommendation is that if an operation is very specialized it should
>> have a very special name. In the context of generics, that helps avoid
>> conflicts.
>>
>> From:
>> >
>> https://www.bioconductor.org/help/course-materials/2011/AdvancedRFeb2011Seattle/ImplementingS4Objects-lab.pdf
>> > in section 2.3 it says that the recommended way to implement accessors
>> is
>> > through methods (which require generics if none exist), is this always
>> the
>> > case for all S4 classes?
>> > I apologize in advance if these are very naïve questions.
>> >
>> >
>> At the very least you'll want some level of abstraction around the object
>> representation, even for internal classes. You could take a lazy approach,
>> first making ordinary functions, then transforming them into generics
>> later
>> when it becomes necessary. For public accessors, it's hard to predict when
>> it will become necessary, since generics are meant to be reused by others,
>> thus it is often best to make them generic initially. It is OK for APIs to
>> evolve over time though. In practice, once generics b

Re: [Bioc-devel] When to use generics?

2019-04-29 Thread Michael Lawrence via Bioc-devel
On Mon, Apr 29, 2019 at 6:55 AM Pages Gallego, M. <
m.pagesgall...@umcutrecht.nl> wrote:

> Dear all,
>
> I am currently developing a package and I am a bit confused in when one
> should define a generic. Let me propose an example:
> Suppose I have defined 3 classes: one for proteomics data, one for
> metabolomics data and one for transcriptomics data. I have a function
> “foo(x)” that will do the same to any of the 3 classes, but requires a
> slightly different implementation for each one. I could do something like
> that:
>
> ```
> foo <- function(x) {
> if (is(x, ‘proteomics’)) {
>   foo.protein(x)
> } else if (is(x, ‘metabolomics’)) {
>   foo.metabo(x)
> } else if (is(x, ‘transcriptomics’)) {
>   foo.trans(x)
> }
> }
> ```
>
> And then define foo.protein, foo.metabo and foo.trans.
> But this is basically the same as defining the generic “foo” and then
> defining a method for each of the classes.
>
> The problem is that “foo” is not generic “enough” in terms that outside
> the package it has no use. Therefore, defining the generic seems like a big
> responsibility in that it should have a use outside the package. Would you
> still define “foo” as generic in this case?


Not exactly sure what you mean, but if a function has no use outside of a
package, just avoid exporting it.

Are there any guidelines in when one should or should not define a generic?
>
> From:
> https://www.bioconductor.org/help/course-materials/2017/Zurich/S4-classes-and-methods.html#extending-a-class-without-adding-any-slot
> in section 2.2.2 it recommends to reuse existing generics (which makes a
> lot of sense), but then it also says that you can define new generics for
> specialized operations, what if the operation is very specialized?
>

One recommendation is that if an operation is very specialized it should
have a very special name. In the context of generics, that helps avoid
conflicts.

From:
> https://www.bioconductor.org/help/course-materials/2011/AdvancedRFeb2011Seattle/ImplementingS4Objects-lab.pdf
> in section 2.3 it says that the recommended way to implement accessors is
> through methods (which require generics if none exist), is this always the
> case for all S4 classes?
> I apologize in advance if these are very naïve questions.
>
>
At the very least you'll want some level of abstraction around the object
representation, even for internal classes. You could take a lazy approach,
first making ordinary functions, then transforming them into generics later
when it becomes necessary. For public accessors, it's hard to predict when
it will become necessary, since generics are meant to be reused by others,
thus it is often best to make them generic initially. It is OK for APIs to
evolve over time though. In practice, once generics become generally
adopted, they need to move to a more general package.

Best,
> Marc
>
>
>
> --
>
> De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is
> uitsluitend bestemd voor de geadresseerde. Indien u dit bericht onterecht
> ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender
> direct
> te informeren door het bericht te retourneren. Het Universitair Medisch
> Centrum Utrecht is een publiekrechtelijke rechtspersoon in de zin van de
> W.H.W.
> (Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat geregistreerd
> bij
> de Kamer van Koophandel voor Midden-Nederland onder nr. 30244197.
>
> Denk s.v.p aan het milieu voor u deze e-mail afdrukt.
>
>
> --
>
> This message may contain confidential information and ...{{dropped:22}}

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Pushing towards a better home for matrix generics

2019-04-09 Thread Michael Lawrence via Bioc-devel
This should be in trunk and the 3.6 release branch.

On Thu, Mar 28, 2019 at 9:57 AM Michael Lawrence  wrote:

> Not yet. Too busy breaking other things ;) I'll move it up on my TODO list.
>
> On Thu, Mar 28, 2019 at 9:16 AM Pages, Herve  wrote:
>
>> Hi Michael,
>>
>> Did you get a chance to make this change?
>>
>> Thanks,
>>
>> H.
>>
>> On 2/11/19 08:07, Michael Lawrence wrote:
>> > I propose that we just fix the signatures in the methods package and
>> > deal with the consequences. If Martin's OK with that, I'll make the
>> > change.
>> >
>> > Michael
>> >
>> > On Mon, Feb 11, 2019 at 7:45 AM McDavid, Andrew
>> >  wrote:
>> >> As a casual observer of this thread, my takeaway is that 1. the
>> current situation is untenable (e.g. it means that Aaron would have to
>> essentially reimplement S4 method dispatch) 2.  Given the long history, and
>> extent of reverse dependencies to Matrix, there are good reasons to not
>> mess with the signature of its implicit generic (though I don't see much
>> utility in multiple dispatch here) and 3.  therefore the least-bad
>> alternative may be to eliminate the call to setGeneric('colSums'), etc, in
>> BiocGenerics, hopefully with some fixes to the help system to make it more
>> tolerant to S4 method naming.  I appreciate that until these fixes are
>> forthcoming it means more work maintaining the help aliases for some
>> methods.  How often do we think the aliases would be break?
>> >>
>> >> Andrew McDavid
>> >> Biostatistics and Computational Biology
>> >> Office: SRB 4.206 Ph: 585.275.5983
>> >>
>> >> Message: 1
>> >> Date: Sun, 10 Feb 2019 13:36:43 +
>> >> From: Aaron Lun > infinite.monkeys.with.keyboa...@gmail.com>>
>> >> To: "Pages, Herve" mailto:hpa...@fredhutch.org>>,
>> Martin Maechler
>> >> mailto:maech...@stat.math.ethz.ch>>
>> >> Cc: Michael Lawrence > lawrence.mich...@gene.com>>,
>> >> "bioc-devel@r-project.org" <
>> bioc-devel@r-project.org>
>> >> Subject: Re: [Bioc-devel] Pushing towards a better home for matrix
>> >> generics
>> >> Message-ID: <1549805803.3935.11.ca...@gmail.com> 1549805803.3935.11.ca...@gmail.com>>
>> >> Content-Type: text/plain; charset="utf-8"
>> >>
>> >> Returning to this topic:
>> >>
>> >> It's good to hear some of the rationale behind the current state of
>> >> affairs. That said, the set-up we have now is quite difficult to work
>> >> with; as mentioned before, I've had to hack around it like:
>> >>
>> >> # Example from "BiocSingular",
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_LTLA_BiocSingular=DwIGaQ=4sF48jRmVAe_CH-k9mXYXEGfSnM3bY53YSKuLUQRxhA=YOujD4cGNGWN0ZskTj96dUHU2a4CkraklBd4BndS21sf3_emyYLIG1llnUarWCda=VBZgu2tHS9vBNQ2d8RSG4ijIcp3jyZkfCsRnA1mk7XI=FN4WdWx8gfnjFoLRQuBaIHaKrAjONVa9hsbAyIXSwGo=
>> >> .safe_colSums <- function(x) {
>> >>  if (is(x, "Matrix")) {
>> >>  Matrix::colSums(x)
>> >>  } else {
>> >>  colSums(x)
>> >>  }
>> >> }
>> >>
>> >> ... which is ugly, and even worse, still incorrect, e.g., for non-
>> >> Matrix classes that have methods for the implicit colSums generic. This
>> >> situation is not sustainable for further package development.
>> >>
>> >> Is there a path forward that is palatable to everyone? Or perhaps these
>> >> conversations are already happening on R-devel?
>> >>
>> >> -A
>> >>
>> >> On Tue, 2019-01-29 at 18:46 +, Pages, Herve wrote:
>> >> Yes the help system could enforce the full signature for the aliases
>> >> but
>> >> that means the end user then will have to always do
>> >> ?`colSums,SomeClass,ANY,ANY-method`, which feels unnecessary
>> >> complicated
>> >> and confusing in the case of a generic where dispatching on the 2nd
>> >> and
>> >> 3rd arguments hardly makes sense.
>> >>
>> >> Or are you saying that the help system should enforce an alias that
>> >> strictly matches the signature explicitly used in the setMethod
>> >> statement? Problem with this is that then there is no easy way for
>> >> the
>> >> end user to know a priori which form to use to access the man page.
>> >> Is
>> >> it ?`colSums,dgCMatrix,ANY,ANY-method` or is it
>> >> ?`colSums,dgCMatrix-method`. Right now when you type colSums
>> >> (after loading the Matrix package), you get this:
>> >>
>> >> > library(Matrix)
>> >> > colSums
>> >> standardGeneric for "colSums" defined from package "base"
>> >>
>> >> function (x, na.rm = FALSE, dims = 1, ...)
>> >> standardGeneric("colSums")
>> >> 
>> >> 
>> >> Methods may be defined for arguments: x, na.rm, dims
>> >> Use  showMethods("colSums")  for currently available ones.
>> >>
>> >> This suggests that the correct form is ?`colSums,dgCMatrix,ANY,ANY-
>> >> method`.
>> >>
>> >> All this confusion can be avoided by specifying signature="x" in the
>> >> definition of the implicit generic. It formalizes where dispatch
>> >> really
>> >> happens and sets expectations upfront. No loose ends.
>> >>
>> >> Hope this makes sense,

Re: [Bioc-devel] DataFrame: replacement error

2019-04-01 Thread Michael Lawrence via Bioc-devel
Thanks for the report. It should be fixed in S4Vectors 0.21.22.

On Mon, Apr 1, 2019 at 12:01 PM Ludwig Geistlinger <
ludwig.geistlin...@sph.cuny.edu> wrote:

> I have a SummarizedExperiment with putatively user-annoted rowData.
> I have a function that does computation on this SE and appends
> the results to the rowData.
> When (repeatedly) called on the SE, the function just overwrites /
> replaces already existing columns.
>
> In R release and S4Vectors_0.20.1
>
> > sessionInfo()
> R version 3.5.2 (2018-12-20)
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> Running under: macOS Mojave 10.14.3
>
> Matrix products: default
> BLAS:
> /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
> LAPACK:
> /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats4stats graphics  grDevices utils datasets
> [8] methods   base
>
> other attached packages:
> [1] S4Vectors_0.20.1BiocGenerics_0.28.0
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.2
>
> this works:
>
> # my rowData
> > d <- DataFrame(a=c(1,2), b=c(3,4), c=c(5,6))
> DataFrame with 2 rows and 3 columns
>   a b c
> 
> 1 1 3 5
> 2 2 4 6
>
> # output of my function
> > d2 <- DataFrame(b=c(7,8), c=c(9,10), d=c(11,12))
> DataFrame with 2 rows and 3 columns
>   b c d
> 
> 1 7 911
> 2 81012
>
> # replace and simultaneously append
> > d[,colnames(d2)] <- d2
> > d
> DataFrame with 2 rows and 4 columns
>   a b c d
>  
> 1 1 7 911
> 2 2 81012
>
>
>
>
> However, using R-devel and S4Vectors_0.21.20
>
> > sessionInfo()
> R Under development (unstable) (2019-01-18 r75994)
> Platform: x86_64-apple-darwin18.2.0 (64-bit)
> Running under: macOS Mojave 10.14.3
>
> Matrix products: default
> BLAS: /Users/ludwig/Downloads/R-devel/lib/libRblas.dylib
> LAPACK: /Users/ludwig/Downloads/R-devel/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats4stats graphics  grDevices utils datasets
> [8] methods   base
>
> other attached packages:
> [1] S4Vectors_0.21.20   BiocGenerics_0.29.2
>
> loaded via a namespace (and not attached):
> [1] compiler_3.6.0
>
>
> The same replacement operation causes an error
>
> > d[,colnames(d2)] <- d2
> Error in names(x@listData) <- value :
>   'names' attribute [3] must be the same length as the vector [1]
>
>
> For data.frame, this still works in R-devel:
> > dd <- as.data.frame(d)
> > dd2 <- as.data.frame(d2)
> > dd[,colnames(dd2)] <- dd2
> > dd
>   a b  c  d
> 1 1 7  9 11
> 2 2 8 10 12
>
> How to best address this issue?
>
>
> --
> Dr. Ludwig Geistlinger
> CUNY School of Public Health
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Pushing towards a better home for matrix generics

2019-03-28 Thread Michael Lawrence via Bioc-devel
Not yet. Too busy breaking other things ;) I'll move it up on my TODO list.

On Thu, Mar 28, 2019 at 9:16 AM Pages, Herve  wrote:

> Hi Michael,
>
> Did you get a chance to make this change?
>
> Thanks,
>
> H.
>
> On 2/11/19 08:07, Michael Lawrence wrote:
> > I propose that we just fix the signatures in the methods package and
> > deal with the consequences. If Martin's OK with that, I'll make the
> > change.
> >
> > Michael
> >
> > On Mon, Feb 11, 2019 at 7:45 AM McDavid, Andrew
> >  wrote:
> >> As a casual observer of this thread, my takeaway is that 1. the current
> situation is untenable (e.g. it means that Aaron would have to essentially
> reimplement S4 method dispatch) 2.  Given the long history, and extent of
> reverse dependencies to Matrix, there are good reasons to not mess with the
> signature of its implicit generic (though I don't see much utility in
> multiple dispatch here) and 3.  therefore the least-bad alternative may be
> to eliminate the call to setGeneric('colSums'), etc, in BiocGenerics,
> hopefully with some fixes to the help system to make it more tolerant to S4
> method naming.  I appreciate that until these fixes are forthcoming it
> means more work maintaining the help aliases for some methods.  How often
> do we think the aliases would be break?
> >>
> >> Andrew McDavid
> >> Biostatistics and Computational Biology
> >> Office: SRB 4.206 Ph: 585.275.5983
> >>
> >> Message: 1
> >> Date: Sun, 10 Feb 2019 13:36:43 +
> >> From: Aaron Lun  infinite.monkeys.with.keyboa...@gmail.com>>
> >> To: "Pages, Herve" mailto:hpa...@fredhutch.org>>,
> Martin Maechler
> >> mailto:maech...@stat.math.ethz.ch>>
> >> Cc: Michael Lawrence  lawrence.mich...@gene.com>>,
> >> "bioc-devel@r-project.org" <
> bioc-devel@r-project.org>
> >> Subject: Re: [Bioc-devel] Pushing towards a better home for matrix
> >> generics
> >> Message-ID: <1549805803.3935.11.ca...@gmail.com 1549805803.3935.11.ca...@gmail.com>>
> >> Content-Type: text/plain; charset="utf-8"
> >>
> >> Returning to this topic:
> >>
> >> It's good to hear some of the rationale behind the current state of
> >> affairs. That said, the set-up we have now is quite difficult to work
> >> with; as mentioned before, I've had to hack around it like:
> >>
> >> # Example from "BiocSingular",
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_LTLA_BiocSingular=DwIGaQ=4sF48jRmVAe_CH-k9mXYXEGfSnM3bY53YSKuLUQRxhA=YOujD4cGNGWN0ZskTj96dUHU2a4CkraklBd4BndS21sf3_emyYLIG1llnUarWCda=VBZgu2tHS9vBNQ2d8RSG4ijIcp3jyZkfCsRnA1mk7XI=FN4WdWx8gfnjFoLRQuBaIHaKrAjONVa9hsbAyIXSwGo=
> >> .safe_colSums <- function(x) {
> >>  if (is(x, "Matrix")) {
> >>  Matrix::colSums(x)
> >>  } else {
> >>  colSums(x)
> >>  }
> >> }
> >>
> >> ... which is ugly, and even worse, still incorrect, e.g., for non-
> >> Matrix classes that have methods for the implicit colSums generic. This
> >> situation is not sustainable for further package development.
> >>
> >> Is there a path forward that is palatable to everyone? Or perhaps these
> >> conversations are already happening on R-devel?
> >>
> >> -A
> >>
> >> On Tue, 2019-01-29 at 18:46 +, Pages, Herve wrote:
> >> Yes the help system could enforce the full signature for the aliases
> >> but
> >> that means the end user then will have to always do
> >> ?`colSums,SomeClass,ANY,ANY-method`, which feels unnecessary
> >> complicated
> >> and confusing in the case of a generic where dispatching on the 2nd
> >> and
> >> 3rd arguments hardly makes sense.
> >>
> >> Or are you saying that the help system should enforce an alias that
> >> strictly matches the signature explicitly used in the setMethod
> >> statement? Problem with this is that then there is no easy way for
> >> the
> >> end user to know a priori which form to use to access the man page.
> >> Is
> >> it ?`colSums,dgCMatrix,ANY,ANY-method` or is it
> >> ?`colSums,dgCMatrix-method`. Right now when you type colSums
> >> (after loading the Matrix package), you get this:
> >>
> >> > library(Matrix)
> >> > colSums
> >> standardGeneric for "colSums" defined from package "base"
> >>
> >> function (x, na.rm = FALSE, dims = 1, ...)
> >> standardGeneric("colSums")
> >> 
> >> 
> >> Methods may be defined for arguments: x, na.rm, dims
> >> Use  showMethods("colSums")  for currently available ones.
> >>
> >> This suggests that the correct form is ?`colSums,dgCMatrix,ANY,ANY-
> >> method`.
> >>
> >> All this confusion can be avoided by specifying signature="x" in the
> >> definition of the implicit generic. It formalizes where dispatch
> >> really
> >> happens and sets expectations upfront. No loose ends.
> >>
> >> Hope this makes sense,
> >>
> >> H.
> >>
> >>
> >>  [[alternative HTML version deleted]]
> >>
> >> ___
> >> Bioc-devel@r-project.org mailing list
> >>
> 

Re: [Bioc-devel] Unclear build failure ‘logical subscript contains NAs’

2019-03-26 Thread Michael Lawrence via Bioc-devel
You'll just need to wait for that build error to resolve.

On Tue, Mar 26, 2019 at 10:45 AM Yue Zhao (Jason)  wrote:

> Thanks, Michael. I saw this error in the bioC 3.9 package building step for
> my package, so I'm not sure how to specify the S4Vector version in that.
>
> On Tue, Mar 26, 2019 at 1:35 PM Michael Lawrence <
> lawrence.mich...@gene.com>
> wrote:
>
> > DESeq2 seems to be working with S4Vectors 0.21.19. If you're using an
> > older version, you'll need to update once that version appears in the
> > repository. Sorry for the mess. Trying to clean up the [<- stuff on
> > DataFrame and Vector to make them easier to extend.
> >
> > On Tue, Mar 26, 2019 at 10:19 AM Yue Zhao (Jason)  wrote:
> >
> >> I see this error (logical subscript contains NAs) happened in DESeq2,
> I'm
> >> wondering if anyone knows what happened with the new BioC? Thanks!
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ___
> >> Bioc-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Unclear build failure ‘logical subscript contains NAs’

2019-03-26 Thread Michael Lawrence via Bioc-devel
DESeq2 seems to be working with S4Vectors 0.21.19. If you're using an older
version, you'll need to update once that version appears in the repository.
Sorry for the mess. Trying to clean up the [<- stuff on DataFrame and
Vector to make them easier to extend.

On Tue, Mar 26, 2019 at 10:19 AM Yue Zhao (Jason)  wrote:

> I see this error (logical subscript contains NAs) happened in DESeq2, I'm
> wondering if anyone knows what happened with the new BioC? Thanks!
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Unclear build failure ‘appending gaps’

2019-03-23 Thread Michael Lawrence via Bioc-devel
This is due to some buggy changes to the internals of the S4Vectors
package. I should be able to fix these today.

On Sat, Mar 23, 2019 at 7:41 AM Karl Stamm  wrote:

> My package rgsepd has failed build recently.
>
> I don't understand the error message, and need some guidance. It says Error
> building vignette, "appending gaps is not supported"
>
> I can't find that statement anywhere in Google, so I don't know what it's
> referring to.  Unfortunately, my package is an automation pipeline, so
> there's one line of code that runs many others, and the line triggering the
> error could represent any part of my pipeline.
>
> I've checked the build on my end before uploading code changes, so I cant
> reproduce the error. My build was okay until very recently, I have added a
> CITATION file which is rather lengthy, and may be impacting the vignette
> build?
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] SetMethod to dispatch on class in unattached package

2019-01-30 Thread Michael Lawrence via Bioc-devel
Hopefully it could be implemented with minimal bloat, but I get your
point. The intent is to clarify the source of the class, since it
would not otherwise be imported. I think this is a fairly common case
that is worth solving if it doesn't add too much complexity.

On Wed, Jan 30, 2019 at 2:31 PM Pages, Herve  wrote:
>
> Hi Michael,
>
> So IIUC this upfront class registration wouldn't do much beside making
> the "no definition for class 'seurat'" message issued by setMethod
> disappear. Maybe that's not enough to justify adding more bloat to the
> methods package?
>
> My 2 cents,
>
> H.
>
>
> On 1/30/19 13:35, Michael Lawrence via Bioc-devel wrote:
> > Unrelated to the specific question, a generic function introduces more
> > vocabulary into the ecosystem. Since SingleCellExperiment defines the
> > standard interface, why not make a new method on logcounts()?
> >
> > Back on topic, what if the methods package were to support forward
> > class declarations? For example,
> >
> > setExternalClass("seurat", package="seurat")
> >
> > I have no idea how to implement that, but perhaps we could at least
> > agree on the API. Honestly, it may be that just some dummy class would
> > work for method registration. But it's nice to be explicit and to list
> > the source package, even if nothing is formally checked.
> >
> > Michael
> >
> >
> >
> > On Wed, Jan 30, 2019 at 12:41 PM Brendan Innes
> >  wrote:
> >> Thanks Martin!
> >>
> >>
> >> The fun strategy (zzz.R .onLoad setHook trick) worked well, except for 
> >> setting this particular method:
> >>
> >>
> >>  setMethod(scClustViz::getDR,"SingleCellExperiment",function(x,DRtype) 
> >> {
> >>  if 
> >> (any(grepl(DRtype,SingleCellExperiment::reducedDimNames(x),ignore.case=T)))
> >>  {
> >>SingleCellExperiment::reducedDim(x,grep(DRtype,
> >>
> >> SingleCellExperiment::reducedDimNames(x),
> >>ignore.case=T,value=T))
> >>  } else {
> >>stop(paste(paste0("DRtype '",DRtype,"' not found."),
> >>   "The following cell embeddings are available in this 
> >> object:",
> >>   
> >> paste0(SingleCellExperiment::reducedDimNames(x),collapse=", "),
> >>   sep="\n  "))
> >>  }
> >>},
> >>where=.GlobalEnv)
> >>
> >>
> >> which gives the following error when SingleCellExperiment is loaded:
> >>
> >> Error in as.vector(x, "character") :
> >>cannot coerce type 'closure' to vector of type 'character'
> >>
> >>
> >> But if I just run that code, it doesn't give the error.  I thought this 
> >> might be due to where in the order of loading and attaching namespaces the 
> >> hook is running, but changing the packageEvent event to "attach" makes no 
> >> difference.
> >>
> >>
> >> Anyway, suppressing the message works, so I'm happy.  Just wanted to 
> >> report back on the partial success of doing it the cool way, in case 
> >> anyone was interested.
> >>
> >>
> >> Thanks again!
> >>
> >> Brendan
> >>
> >>
> >> www.baderlab.org/BrendanInnes<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.baderlab.org_BrendanInnes=DwIFaQ=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=HwMn126h-7G4XmAqvYBr8lGjzQbgLqGq4-KkAOUH9r0=OIIsBqppQCVjbRlE5mpw0XnWEc-hiPwpjOCLNdiu4Mg=>
> >>
> >> 
> >> From: Martin Morgan 
> >> Sent: Wednesday, January 30, 2019 12:16:20 PM
> >> To: Brendan Innes; bioc-devel@r-project.org
> >> Subject: Re: [Bioc-devel] SetMethod to dispatch on class in unattached 
> >> package
> >>
> >> A simple approach is to suppress the message (I think this is pragmatic, 
> >> rather than dumb __) about unknown class (along with DESCRIPTION Suggests: 
> >> Seurat)
> >>
> >> setGeneric("getGE", function(x) standardGeneric("getGE"))
> >>
> >> suppressMessages({
> >>  setMethod("getGE", "seurat", function(x) Seurat::GetAssayData(x))
> >> })
> >>
> >> One fun t

Re: [Bioc-devel] SetMethod to dispatch on class in unattached package

2019-01-30 Thread Michael Lawrence via Bioc-devel
Unrelated to the specific question, a generic function introduces more
vocabulary into the ecosystem. Since SingleCellExperiment defines the
standard interface, why not make a new method on logcounts()?

Back on topic, what if the methods package were to support forward
class declarations? For example,

setExternalClass("seurat", package="seurat")

I have no idea how to implement that, but perhaps we could at least
agree on the API. Honestly, it may be that just some dummy class would
work for method registration. But it's nice to be explicit and to list
the source package, even if nothing is formally checked.

Michael



On Wed, Jan 30, 2019 at 12:41 PM Brendan Innes
 wrote:
>
> Thanks Martin!
>
>
> The fun strategy (zzz.R .onLoad setHook trick) worked well, except for 
> setting this particular method:
>
>
> setMethod(scClustViz::getDR,"SingleCellExperiment",function(x,DRtype) {
> if 
> (any(grepl(DRtype,SingleCellExperiment::reducedDimNames(x),ignore.case=T))) {
>   SingleCellExperiment::reducedDim(x,grep(DRtype,
>   
> SingleCellExperiment::reducedDimNames(x),
>   ignore.case=T,value=T))
> } else {
>   stop(paste(paste0("DRtype '",DRtype,"' not found."),
>  "The following cell embeddings are available in this 
> object:",
>  
> paste0(SingleCellExperiment::reducedDimNames(x),collapse=", "),
>  sep="\n  "))
> }
>   },
>   where=.GlobalEnv)
>
>
> which gives the following error when SingleCellExperiment is loaded:
>
> Error in as.vector(x, "character") :
>   cannot coerce type 'closure' to vector of type 'character'
>
>
> But if I just run that code, it doesn't give the error.  I thought this might 
> be due to where in the order of loading and attaching namespaces the hook is 
> running, but changing the packageEvent event to "attach" makes no difference.
>
>
> Anyway, suppressing the message works, so I'm happy.  Just wanted to report 
> back on the partial success of doing it the cool way, in case anyone was 
> interested.
>
>
> Thanks again!
>
> Brendan
>
>
> www.baderlab.org/BrendanInnes
>
> 
> From: Martin Morgan 
> Sent: Wednesday, January 30, 2019 12:16:20 PM
> To: Brendan Innes; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] SetMethod to dispatch on class in unattached package
>
> A simple approach is to suppress the message (I think this is pragmatic, 
> rather than dumb __) about unknown class (along with DESCRIPTION Suggests: 
> Seurat)
>
> setGeneric("getGE", function(x) standardGeneric("getGE"))
>
> suppressMessages({
> setMethod("getGE", "seurat", function(x) Seurat::GetAssayData(x))
> })
>
> One fun thing to try is use hooks for package load; I would say that this is 
> rarely used so may have surprising issues... Note the need to specify the 
> location of the generic `PkgA::getGE` and that the method is defined in the 
> global environment, rather than in the package environment, so would not have 
> access to non-exported functions. The DESCRIPTION file has Suggests: Seurat
>
> Something along the lines of
>
> setGeneric("getGE", function(x) standardGeneric("getGE"))
>
> .onLoad <-
> function(...)
> {
> if ("Seurat" %in% loadedNamespaces()) {
> setMethod(
> getGE, "seurat",
> function(x) Seurat::GetAssayData(x)
> )
> } else {
> setHook(
> packageEvent("Seurat", "onLoad"),
> function(...) {
> setMethod(
> PkgA::getGE, "seurat",
> function(x) Seurat::GetAssayData(x),
> where = .GlobalEnv
> )
> }
> )
> }
> }
>
> Martin
>
> On 1/29/19, 9:27 PM, "Bioc-devel on behalf of Brendan Innes" 
>  brendan.in...@mail.utoronto.ca> wrote:
>
> Hi friendly Bioc gang! I'm struggling with what seems like a silly 
> problem. I'm trying to write a simple wrapper S4 generic that accesses the 
> data slot of various S4 objects (seurat and SingleCellExperiment objects). So 
> the generic is:
>
> setGeneric("getGE",function(x) standardGeneric("getGE"))
>
>
> And the methods are:
>
> setMethod("getGE","seurat",
>   function(x) Seurat::GetAssayData(x))
> setMethod("getGE","SingleCellExperiment",
>   function(x) SingleCellExperiment::logcounts(x))
>
>
> Problem is that when I install the package, I get the warning
>
> > in method for �getGE� with signature �"seurat"�: no definition for 
> class �seurat�
>
>
> This isn't surprising, since Seurat isn't imported, but I don't want the 
> user to have to import it if their data is in a SingleCellExperiment object. 
> The function still seems to work fine if I attach Seurat and load a seurat 
> object, so I'm tempted to just suppress 

Re: [Bioc-devel] Pushing towards a better home for matrix generics

2019-01-29 Thread Michael Lawrence via Bioc-devel
Along like the lines of Martin Morgan's comment, simpler signatures
mean simpler software. We can typically limit to single dispatch, and
it indicates how the generic is _meant_ to be used, separating the
data arguments from the modulating accessory parameters. I'm not sure
how valuable dispatching on the "missing" class is, as it encourages
telescoping method dispatch, which is not always the clearest mode of
expression.

Michael

On Tue, Jan 29, 2019 at 11:38 AM Martin Morgan  wrote:
>
> Multiple dispatch also makes it difficult to reason about method selection, 
> at least for me.
>
> > setGeneric(f, function(a, b, c) standardGeneric("f"))
> [1] "f"
> > setMethod(f, c(b="missing"), function(a, b, c) {})
> > setMethod(f, c(c="missing"), function(a, b, c) {})
> > f()
> Note: method with signature 'ANY#missing#ANY' chosen for function 'f',
>  target signature 'missing#missing#missing'.
>  "ANY#ANY#missing" would also be valid
>
> I believe these rules are also sensitive to other packages that are 
> installed, and one ends up implementing methods for the exact signature one 
> wishes to support, rather than allowing any kind of inheritance / dispatch.
>
> Martin Morgan
>
> On 1/29/19, 12:10 PM, "Bioc-devel on behalf of Pages, Herve" 
>  wrote:
>
> Hi Martin.
>
> Speed is not the concern: I just did some quick benchmarking and didn't
> observe any significant difference in method dispatch performance after
> doing setGeneric("toto", function(x, a=0, b=0, c=0)
> standardGeneric("toto")) vs doing setGeneric("toto", signature="x",
> function(x, a=0, b=0, c=0) standardGeneric("toto")).
>
> Here is the real concern to me:
>
> Aliases of the form \alias{colSums,dgCMatrix,ANY,ANY-method} are a real
> pain to maintain. It's also a pain for the end user to have to do
> ?`colSums,dgCMatrix,ANY,ANY-method` to access the man page for a
> particular method. I know Matrix uses "short" aliases i.e. aliases of
> the form \alias{colSums,dgCMatrix-method} so the user only needs to do
> ?`colSums,dgCMatrix-method` but there is a lot of fragility to the
> situation.
>
> Here is why: The exact form that needs to be used for these aliases can
> change anytime depending on what happens in one of the upstream packages
> (not a concern for the Matrix package because no upstream package
> defines colSums methods). More precisely: If all the colSums methods
> defined in the upstream packages and in my own package are defined with
> setMethod statements of the form:
>
> setMethod("colSums", signature(x="SomeClass"), ...)
>
> then the aliases in the man pages must be of the form
> \alias{colSums,SomeClass-method} and the end user can just do
> ?`colSums,SomeClass-method`, which is good. But if **one** upstream
> package decides to use a setMethod statement of the form:
>
>setMethod("colSums", signature(x="SomeClass", na.rm="ANY",
> dims="ANY"), ...)
>
> then the aliases for **all** the colSums methods in **all** the
> downstream packages now must be of the form
> \alias{colSums,SomeOtherClass,ANY,ANY-method}, even if the method for
> SomeOtherClass is defined with signature(x="SomeOtherClass"). Also, as a
> consequence, now the end user has to use the long syntax to access the
> man pages for these methods. And if later the author of the upstream
> package decides to switch back to the setMethod("colSums",
> signature(x="SomeClass"), ...) form, then I have to switch back all the
> aliases in all my downstream packages to the short form again!
>
> This fragility of the alias syntax was one of the motivations for me to
> put many setGeneric statements of the form setGeneric("someGeneric",
> signature="x") in BiocGenerics over the years. So I don't have many
> dozens of aliases that suddenly break for mysterious reasons ('R CMD
> check' would suddenly starts reporting warnings for these aliases
> despite no changes in my package or in R).
>
> Best,
>
> H.
>
> On 1/29/19 03:16, Martin Maechler wrote:
> >> Michael Lawrence
> >>  on Mon, 28 Jan 2019 20:47:58 -0800 writes:
> >  > That will have some consequences; for example,
> >  > documentation aliases will need to change. Not sure how
> >  > many packages will need to be fixed outside of Matrix, but
> >  > it's not an isolated change. Martin might comment on the
> >  > rationale for the full signature, since he defined those
> >  > generics.
> >
> >  > On Mon, Jan 28, 2019 at 7:21 PM Pages, Herve
> >  >  wrote:
> >  >>
> >  >> Actually there is a 4th solution which is to modify the
> >  >> definition of the implicit generics in the methods
> >  >> package (file makeBasicFunsList.R) to make them dispatch
> >  >> on their 1st arg only. Should be easy. Then no package
> >  

Re: [Bioc-devel] Pushing towards a better home for matrix generics

2019-01-28 Thread Michael Lawrence via Bioc-devel
That will have some consequences; for example, documentation aliases
will need to change. Not sure how many packages will need to be fixed
outside of Matrix, but it's not an isolated change. Martin might
comment on the rationale for the full signature, since he defined
those generics.

On Mon, Jan 28, 2019 at 7:21 PM Pages, Herve  wrote:
>
> Actually there is a 4th solution which is to modify the definition of
> the implicit generics in the methods package (file makeBasicFunsList.R)
> to make them dispatch on their 1st arg only. Should be easy. Then no
> package will need to use a setGeneric statement anymore. Everybody will
> automatically get a clean implicit generic. Including the Matrix package
> which shouldn't need to be touched (except maybe for some aliases in its
> man page that might need to be changed from
> \alias{colSums,dgCMatrix,ANY,ANY-method} to
> \alias{colSums,dgCMatrix-method}).
>
> Anybody wants to try to make a patch for this?
>
> H.
>
> On 1/28/19 19:00, Michael Lawrence wrote:
> > I agree (2) is a good compromise. CC'ing Martin for his perspective.
> >
> > Michael
> >
> > On Mon, Jan 28, 2019 at 6:58 PM Pages, Herve  wrote:
> >> Hi Aaron,
> >>
> >> The 4 matrix summarization generics currently defined in BiocGenerics
> >> are defined as followed:
> >>
> >> setGeneric("rowSums", signature="x")
> >> setGeneric("colSums", signature="x")
> >> setGeneric("rowMeans", signature="x")
> >> setGeneric("colMeans", signature="x")
> >>
> >> The only reason for having these definitions in BiocGenerics is to
> >> restrict dispatch the first argument. This is cleaner than what we would
> >> get with the implicit generics where dispatch is on all arguments (it
> >> doesn't really make sense to dispatch on toggles like 'na.rm' or
> >> 'dims'). Sticking to simple dispatch when possible makes life easier for
> >> the developer (especially in times of troubleshooting) and for the user
> >> (methods are easier to discover and their man pages easier to access).
> >>
> >> However, the 4 statements above create new generics that mask the
> >> implicit generics defined in the Matrix package (Matrix doesn't contain
> >> any setGeneric statements for these generics, only setMethod
> >> statements). This is a very unsatisfying situation and it has hit me
> >> repeatedly over the last couple of years.
> >>
> >> We have basically 3 ways to go. From simpler to more complicated:
> >>
> >> 1) Give up on single dispatch for these generics. That is, we remove the
> >> 4 statements above from BiocGenerics. Then we use setMethod() in package
> >> code like Matrix does.
> >>
> >> 2) Convince the Matrix folks to put the 4 statements above in Matrix.
> >> Then any BioC package that needs to define methods for these generics
> >> would just need to import them from the Matrix package. Maybe we could
> >> even push this one step further by having BiocGenerics import and
> >> re-export these generics. This would make them "available" in BioC as
> >> soon as the BiocGenerics is loaded (and any package that needs to define
> >> methods on them would just need to import them from BiocGenerics).
> >>
> >> 3) Put the 4 statements above in a MatrixGenerics package. Then convince
> >> the Matrix folks to define methods on the generics defined in
> >> MatrixGenerics. Very unlikely to happen!
> >>
> >> IMO 2) is the best compromise. Want to give it a shot?
> >>
> >> H.
> >>
> >>
> >> On 1/27/19 13:45, Aaron Lun wrote:
> >>> This is a resurrection of some old threads:
> >>>
> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_bioc-2Ddevel_2017-2DNovember_012273.html=DwIDaQ=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk=pcpUyjpkQe6U79lZ_n2SANNp6Zj_s6i1Sq2yZx2NSjw=
> >>>
> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Bioconductor_MatrixGenerics_issues=DwIDaQ=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk=NrmcVnmvgkDp3p64J-izZU9VD5nFsFCWOTI-TsnzCpY=
> >>>
> >>> For those who are unfamiliar with this, the basic issue is that various
> >>> Matrix and BiocGenerics functions mask each other. This is mildly
> >>> frustrating in interactive sessions:
> >>>
>  library(Matrix)
>  library(DelayedArray)
>  x <- rsparsematrix(10, 10, 0.1)
>  colSums(x) # fails
>  Matrix::colSums(x) # okay
> >>> ... but quite annoying during package development, requiring code like
> >>> this:
> >>>
> >>>   if (is(x, "Matrix")) {
> >>>   z <- Matrix::colSums(x)
> >>>   } else {
> >>>   z <- colSums(x) # assuming DelayedArray does the masking.
> >>>   }
> >>>
> >>> ... which defeats the purpose of using S4 dispatch in the first place.
> >>>
> >>> I have been encountering this issue with increasing frequency in my
> >>> packages, as a lot of my code base needs to be able to interface with
> >>> both Matrix and Bioconductor 

Re: [Bioc-devel] Pushing towards a better home for matrix generics

2019-01-28 Thread Michael Lawrence via Bioc-devel
I agree (2) is a good compromise. CC'ing Martin for his perspective.

Michael

On Mon, Jan 28, 2019 at 6:58 PM Pages, Herve  wrote:
>
> Hi Aaron,
>
> The 4 matrix summarization generics currently defined in BiocGenerics
> are defined as followed:
>
>setGeneric("rowSums", signature="x")
>setGeneric("colSums", signature="x")
>setGeneric("rowMeans", signature="x")
>setGeneric("colMeans", signature="x")
>
> The only reason for having these definitions in BiocGenerics is to
> restrict dispatch the first argument. This is cleaner than what we would
> get with the implicit generics where dispatch is on all arguments (it
> doesn't really make sense to dispatch on toggles like 'na.rm' or
> 'dims'). Sticking to simple dispatch when possible makes life easier for
> the developer (especially in times of troubleshooting) and for the user
> (methods are easier to discover and their man pages easier to access).
>
> However, the 4 statements above create new generics that mask the
> implicit generics defined in the Matrix package (Matrix doesn't contain
> any setGeneric statements for these generics, only setMethod
> statements). This is a very unsatisfying situation and it has hit me
> repeatedly over the last couple of years.
>
> We have basically 3 ways to go. From simpler to more complicated:
>
> 1) Give up on single dispatch for these generics. That is, we remove the
> 4 statements above from BiocGenerics. Then we use setMethod() in package
> code like Matrix does.
>
> 2) Convince the Matrix folks to put the 4 statements above in Matrix.
> Then any BioC package that needs to define methods for these generics
> would just need to import them from the Matrix package. Maybe we could
> even push this one step further by having BiocGenerics import and
> re-export these generics. This would make them "available" in BioC as
> soon as the BiocGenerics is loaded (and any package that needs to define
> methods on them would just need to import them from BiocGenerics).
>
> 3) Put the 4 statements above in a MatrixGenerics package. Then convince
> the Matrix folks to define methods on the generics defined in
> MatrixGenerics. Very unlikely to happen!
>
> IMO 2) is the best compromise. Want to give it a shot?
>
> H.
>
>
> On 1/27/19 13:45, Aaron Lun wrote:
> > This is a resurrection of some old threads:
> >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_bioc-2Ddevel_2017-2DNovember_012273.html=DwIDaQ=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk=pcpUyjpkQe6U79lZ_n2SANNp6Zj_s6i1Sq2yZx2NSjw=
> >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Bioconductor_MatrixGenerics_issues=DwIDaQ=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk=NrmcVnmvgkDp3p64J-izZU9VD5nFsFCWOTI-TsnzCpY=
> >
> > For those who are unfamiliar with this, the basic issue is that various
> > Matrix and BiocGenerics functions mask each other. This is mildly
> > frustrating in interactive sessions:
> >
> >> library(Matrix)
> >> library(DelayedArray)
> >> x <- rsparsematrix(10, 10, 0.1)
> >> colSums(x) # fails
> >> Matrix::colSums(x) # okay
> > ... but quite annoying during package development, requiring code like
> > this:
> >
> >  if (is(x, "Matrix")) {
> >  z <- Matrix::colSums(x)
> >  } else {
> >  z <- colSums(x) # assuming DelayedArray does the masking.
> >  }
> >
> > ... which defeats the purpose of using S4 dispatch in the first place.
> >
> > I have been encountering this issue with increasing frequency in my
> > packages, as a lot of my code base needs to be able to interface with
> > both Matrix and Bioconductor objects (e.g., DelayedMatrices) at the
> > same time. What needs to happen so that I can just write:
> >
> >  z <- colSums(x)
> >
> > ... and everything will work for both Matrix and Bioconductor classes?
> > It seems that many of these function names are implicit generics
> > anyway, can BiocGenerics take advantage of that for the time being?
> >
> > Best,
> >
> > Aaron
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel=DwIDaQ=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk=JtgGBnaZJH44fV8OUp-SwnHxhD_i_mdVkqoMfUoA5tM=
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org 

Re: [Bioc-devel] Plans for multi-feature SingleCellExperiment?

2019-01-22 Thread Michael Lawrence via Bioc-devel
Hi Steve,

Pretty sure MultiAssayExperiment, containing SingleCellExperiments,
would fit the bill, but I'm not familiar with those data types.

Michael

On Tue, Jan 22, 2019 at 2:18 PM Steve Lianoglou
 wrote:
>
> Comrades,
>
> Sorry if I'm out of the loop and have missed anything obvious.
>
> I was curious what the plans are in the single-cell bioconductor-verse
> to support single cell experiments that produce counts from different
> feature-spaces, such as those produced by CITE-seq / REAP-seq, for
> instance.
>
> In these types of experiments, I'm pretty sure we want the counts
> generated from those "features" (oligo-conjugated Antibodies, for
> instance) to be kept in a separate space than the mRNA counts. I think
> we would most  naturally want to put these in something like an
> `assay()` matrix with a different (rowwise) dimmension than the gene
> count matrix, but that can't work since all matrices in the assay()
> list need to be of the same dimensions.
>
> Another option might be to just add them as rows to the assay
> matrices, but keep some type of feature space meta-information akin to
> what `isSpike()` currently does;
>
> or add a new slot to SingleCellExperiment to hold counts from
> different feature spaces, perhaps?;
>
> Or rely on something like a MultiAssayExperiment?
>
> Or?
>
> Curious to learn which way you folks are leaning ...
>
> Thanks!
> -steve
>
> ps - sorry if this email came through twice, it was somehow magically
> sent from an email address I don't have access to anymore.
>
> --
> Steve Lianoglou
> Denali Therapeutics
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel