[Bioc-devel] Whether to copy unexported BSgenome function into multicrispr

2019-08-28 Thread Bhagwat, Aditya
Dear BioC developers,

BSgenome::getBSgenome('mm10') returns BSgenome.Mmusculus.UCSC.mm10::Mmusculus.

But that function also attaches BiocGenerics, S4Vectors, IRanges and 
Biostrings, which is unfortunate from a keep-the-namespace-clean perspective.

I could instead use the clean alternative
bsname <- BSgenome:::.getInstalledPkgnameFromProviderVersion('mm10')
utils::getFromNamespace(bsname, bsname)

But that, of course, (BioC)Check does not like, since using unexported 
functions doesn't fit into the R software development paradigm.

So instead I could copy-paste the latter function into my multicrispr 
package (which is being readied for BioC) and add its author, Herve Pages, 
as a co-author of my package (after, of course, informing him). Is this the 
way such things are normally dealt with? Or how would you deal with a 
situation like this?

A function-level question regarding this topic was posted earlier on BioC 
support. The current 
question is more of a package development question, though, and I thought it 
would be nice to get feedback from more experienced BioC developers :-)

Thank you for your feedback!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-06 Thread Bhagwat, Aditya
Thank you Michael,

Appreciate your time for helping me fill the gaps in my understanding of the S4 
flow :-).

It all started when I defined (in my multicrispr package) the S4 coercer:
methods::setAs( "BSgenome",
                "GRanges",
                function(from) as(GenomeInfoDb::seqinfo(from), "GRanges"))

When building, I noticed the message
in method for 'coerce' with signature '"BSgenome","GRanges"': no definition for 
class "BSgenome"

So, I added
BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')

That loads all these dependencies.
From your answer, I understand that there is currently no alternative to 
loading all these dependencies.
I guess this is because these dependencies are needed to provide all the S4 
methods required for the BSgenome class, am I right?

Is there a way to define a methods::setAs without loading the class definition?

Aditya
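
The "no definition for class" message quoted above is typically emitted when setAs() is evaluated somewhere the class is not known; elsewhere in this thread the roxygen tag @importClassesFrom BSgenome BSgenome is used to make the class known to the package. As a minimal, self-contained sketch of how setAs() registers a coercion once both classes are defined (ToyGenome and ToyRanges are hypothetical stand-ins, not BSgenome or GRanges):

```r
library(methods)

# Hypothetical stand-ins for the "from" and "to" classes
setClass("ToyGenome", representation(seqlengths = "integer"))
setClass("ToyRanges", representation(seqnames = "character",
                                     start    = "integer",
                                     end      = "integer"))

# Register the coercion: one range per sequence, spanning its full length
setAs("ToyGenome", "ToyRanges", function(from) {
    n <- length(from@seqlengths)
    new("ToyRanges",
        seqnames = names(from@seqlengths),
        start    = rep(1L, n),
        end      = unname(from@seqlengths))
})

genome <- new("ToyGenome", seqlengths = c(chrI = 15072423L, chrM = 13794L))
ranges <- as(genome, "ToyRanges")
```

Because both classes exist when setAs() runs, no class-definition message is expected here.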





From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Friday, September 06, 2019 1:09 PM
To: Bhagwat, Aditya
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

The way to keep a "clean namespace" is to selectively import symbols
into your namespace, not to import _nothing_ into your namespace.
Otherwise, your code will fill with namespace qualifications that
distract from what is more important to communicate: the intent of the
code. And no, there's no way to define method signatures using
anything other than simple class names.

It would be interesting to explore alternative ways of specifying
method signatures. One way would be if every package exported a "class
reference" (class name with package attribute, at least) for each of
its classes. Those could be treated like any other exported object,
and referenced via namespace qualification. It would require major
changes to the methods package but that should probably happen anyway
to support disambiguation when two packages define a class of the same
name. It would be nice to get away from the exportClasses() and
importClasses() stuff. File that under the "rainy year" category.

Michael
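
As a rough sketch of what selective importing could look like for the coercion discussed in this thread, a package NAMESPACE file might contain something like the following. The exact set of imports is an assumption; only the BSgenome class import appears elsewhere in the thread (as a roxygen tag).

```r
# NAMESPACE (sketch): import only the symbols the package actually uses
importClassesFrom(BSgenome, BSgenome)
importClassesFrom(GenomicRanges, GRanges)
importFrom(GenomeInfoDb, seqinfo)
importFrom(methods, as, setAs)
```

With imports like these, package code can call seqinfo() and as() without namespace qualification.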

On Fri, Sep 6, 2019 at 3:39 AM Bhagwat, Aditya
 wrote:
>
> Dear Bioc devel,
>
> Is it possible to import the BSgenome class without attaching BiocGenerics 
> (to keep a clean namespace during the development of 
> multicrispr<https://gitlab.gwdg.de/loosolab/software/multicrispr>).
>
> BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
>
> (Posted earlier on BioC support<https://support.bioconductor.org/p/124442/> 
> and redirected here following Martin's suggestion)
>
> Thankyou :-)
>
> Aditya
>



--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com



Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-06 Thread Bhagwat, Aditya
I noticed the unfriendly indentation and formatting of my response, so I 
updated my original question on BioC support (with much more eye-friendly 
formatting):

https://support.bioconductor.org/p/124442





Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-12 Thread Bhagwat, Aditya
Thank you Bernat, 

I saw your email just now, since I have the "digest" option. Thanks for 
bringing the chromosomes parameter to my attention; I should be able to use 
that!

Aditya


-----Original Message-----
From: Bernat Gel Moreno  
Sent: Thursday, 12 September 2019 08:47
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

Hi all,

I'm the developer of karyoploteR.

@Michael: I never thought about using seqinfo as the source for the genome 
information. I'll add this as an option to define the genome. 
Thanks for the suggestion.

@Aditya: If you want to plot just your relevant chromosomes, you don't need to 
alter the genome. You can use the "chromosomes" parameter to give a vector of 
chromosome names. Is it not working for you for some reason?

Bernat
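
A minimal sketch of that suggestion, assuming karyoploteR is installed. The wrapper name plot_target_sites and the default chromosome names are hypothetical; plotKaryotype() and kpPlotRegions() are the karyoploteR functions already used in this thread.

```r
# Hypothetical wrapper: plot target sites on selected chromosomes only,
# naming the genome instead of passing a custom GRanges
plot_target_sites <- function(target_sites,
                              genome      = "mm10",
                              chromosomes = c("chr1", "chr2")) {
    # Restrict the karyotype to the requested chromosomes
    kp <- karyoploteR::plotKaryotype(genome = genome,
                                     chromosomes = chromosomes)
    # Draw the target sites as regions on the karyotype
    karyoploteR::kpPlotRegions(kp, data = target_sites)
    invisible(kp)
}

# Usage (not run): plot_target_sites(crispr_target_sites)
```

Since the genome is given by name, the genome itself does not need to be altered or coerced.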


On 9/11/19 at 2:31 PM, Michael Lawrence via Bioc-devel wrote:
> I'm pretty surprised that the karyoploteR package does not accept a 
> Seqinfo since it is plotting chromosomes. But again, please consider 
> just doing as(seqinfo(bsgenome), "GRanges").
>
> On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya 
>  wrote:
>> Hi Herve,
>>
>> Thank you for your responses.
>>  From your response, it is clear that the vcountPDict use case does not need 
>> a BSgenome -> GRanges coercer.
>>
>> The karyoploteR use case still requires it, though, to allow plotting of 
>> only the chromosomal BSgenome portions:
>>
>>  chromranges <- as(bsegenome, "GRanges")
>>  kp <- karyoploteR::plotKaryotype(chromranges)
>>  karyoploteR::kpPlotRegions(kp, crispr_target_sites)
>>
>> Or do you see any alternative for this purpose too?
>>
>> Aditya
>>
>> 
>> From: Pages, Herve [hpa...@fredhutch.org]
>> Sent: Wednesday, September 11, 2019 12:24 PM
>> To: Bhagwat, Aditya; bioc-devel@r-project.org
>> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
>> BiocGenerics (and others)?
>>
>> Hi Aditya,
>>
>> On 9/11/19 01:31, Bhagwat, Aditya wrote:
>>> Hi Herve,
>>>
>>>
>>>   > It feels that a coercion method from BSgenome to GRanges should 
>>> rather be defined in the BSgenome package itself.
>>>
>>> :-)
>>>
>>>
>>>   > Patch/PR welcome on GitHub.
>>>
>>> Owkies. What pull/fork/check/branch protocol to be followed?
>>>
>>>
>>>   > Is this what you have in mind for this coercion?
>>>   > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
>>>
>>> Yes.
>>>
>>> Perhaps also useful to share the wider context, allowing your and 
>>> others feedback for improved software design.
>>> I wanted to subset a BSgenome 
>>> <https://support.bioconductor.org/p/124367> (without the _random or 
>>> _unassigned), but Lori explained this is not possible.
>>>
>>> Instead Lori suggested to coerce a BSgenome into a GRanges 
>>> <https://support.bioconductor.org/p/123489>, which is a useful 
>>> solution, but for which currently no exported S4 method exists 
>>> <https://support.bioconductor.org/p/124416>
>>> So I defined an S4 coercer in my multicrispr package, making sure to 
>>> properly import the Bsgenome class 
>>> <https://support.bioconductor.org/p/124442>

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-12 Thread Bhagwat, Aditya
Thank you Bernat!

-----Original Message-----
From: Bernat Gel Moreno  
Sent: Thursday, 12 September 2019 11:26
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

I have updated karyoploteR and it's now (from version 1.11.9 in devel) possible 
to use a BSgenome object or a seqinfo object as the genome definition in 
plotKaryotype. In both cases it will, by default and where possible, 
automatically filter the chromosomes to the canonical ones (if defined) and 
retrieve the cytobands for the genome. Or you can specify the exact 
chromosomes you want to plot. I think this should help with the specific 
question at hand.

Bernat


On 9/12/19 at 10:09 AM, Bernat Gel Moreno wrote:
> Oh, and Aditya, take into account that if you give karyoploteR a 
> custom genome as you are planning to do, it will not paint the 
> cytobands by default, you'll have to get them yourself and give them to 
> plotKaryotype.
>
> If possible, I would recommend giving the genome by name ("hg19") and 
> selecting the chromosomes to plot using "chromosomes".
>
> Bernat
>
>
>
>
> On 9/12/19 at 8:47 AM, Bernat Gel Moreno wrote:
>> Hi all,
>>
>> I'm the developer of karyoploteR.
>>
>> @Michael: I never thought about using seqinfo as the source for the 
>> genome information. I'll add this as an option to define the genome.
>> Thanks for the suggestion.
>>
>> @Aditya: If you want to plot just your relevant chromosomes, you 
>> don't need to alter the genome. You can use the "chromosomes" 
>> parameter to give a vector of chromosome names. Is it not working for 
>> you for some reason?
>>
>> Bernat
>>
>>
>> On 9/11/19 at 2:31 PM, Michael Lawrence via Bioc-devel wrote:
>>> I'm pretty surprised that the karyoploteR package does not accept a 
>>> Seqinfo since it is plotting chromosomes. But again, please consider 
>>> just doing as(seqinfo(bsgenome), "GRanges").
>>>
>>> On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya 
>>>  wrote:
>>>> Hi Herve,
>>>>
>>>> Thank you for your responses.
>>>> From your response, it is clear that the vcountPDict use case does not 
>>>> need a BSgenome -> GRanges coercer.
>>>>
>>>> The karyoploteR use case still requires it, though, to allow plotting of 
>>>> only the chromosomal BSgenome portions:
>>>>
>>>>  chromranges <- as(bsegenome, "GRanges")
>>>>  kp <- karyoploteR::plotKaryotype(chromranges)
>>>>  karyoploteR::kpPlotRegions(kp, crispr_target_sites)
>>>>
>>>> Or do you see any alternative for this purpose too?
>>>>
>>>> Aditya
>>>>
>>>> 
>>>> From: Pages, Herve [hpa...@fredhutch.org]
>>>> Sent: Wednesday, September 11, 2019 12:24 PM
>>>> To: Bhagwat, Aditya; bioc-devel@r-project.org
>>>> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
>>>> BiocGenerics (and others)?
>>>>
>>>> Hi Aditya,
>>>>
>>>> On 9/11/19 01:31, Bhagwat, Aditya wrote:
>>>>> Hi Herve,
>>>>>
>>>>>
>>>>> > It feels that a coercion method from BSgenome to GRanges 
>>>>> should rather be defined in the BSgenome package itself.
>>>>>
>>>>> :-)
>>>>>
>>>>>
>>>>> > Patch/PR welcome on GitHub.
>>>>>
>>>>> Owkies. What pull/fork/check/branch protocol to be followed?
>>>>>
>>>>>
>>>>> > Is this what you have in mind for this coercion?
>>>>> > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
>>>>>
>>>>> Yes.
>>>>>
>>>>> Perhaps also useful to share the wider context, allowing your and 
>>>>> others feedback for improved software design.
>>>>> I wanted to subset a BSgenome 
>>>>> <https://support.bioconductor.org/p/124367> (without the _random or 
>>>>> _unassigned), but Lori explained this is not possible.

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-12 Thread Bhagwat, Aditya
Thanks Michael and Herve, 

Will do that then. 

I gather from this discussion that exporting a function from a core BioC 
package is reserved for functions 
(1) whose name unambiguously communicates what they do
(2) that have the potential to be broadly used

And that as(BSgenome, 'GRanges') is felt not to comply with these.

Thanks for all the feedback - it has been very helpful.

Aditya


From: Pages, Herve [hpa...@fredhutch.org]
Sent: Wednesday, September 11, 2019 5:29 PM
To: Michael Lawrence; Bhagwat, Aditya
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

Or more accurately:

   as(seqinfo(bsgenome)[seqlevelsInUse(grl)], "GRanges")

since not all seqlevels are necessarily "in use" (i.e. not necessarily
represented in seqnames(grl)).

H.
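
Put together as a small helper, the one-liner above could be wrapped as follows. This is only a sketch, assuming GenomeInfoDb and GenomicRanges are installed; the function name ranges_in_use is hypothetical, and grl stands for the GRangesList of target sites.

```r
# Hypothetical helper: full-length ranges for only those chromosomes
# that are actually represented in grl
ranges_in_use <- function(bsgenome, grl) {
    # Seqlevels that occur in seqnames(grl)
    used <- GenomeInfoDb::seqlevelsInUse(grl)
    # Subset the genome's Seqinfo to those seqlevels
    si <- GenomeInfoDb::seqinfo(bsgenome)[used]
    # Seqinfo -> GRanges coercion (method provided by GenomicRanges,
    # which is loaded whenever grl is a GRangesList)
    methods::as(si, "GRanges")
}

# Usage (not run):
# ranges_in_use(BSgenome.Mmusculus.UCSC.mm10::Mmusculus, grl)
```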

On 9/11/19 08:26, Hervé Pagès wrote:
> The unique seqnames is what we call the seqlevels. So just:
>
>as(seqinfo(bsgenome)[seqlevels(grl)], "GRanges")
>
> H.
>
> On 9/11/19 07:42, Michael Lawrence wrote:
>> So why not just do:
>>
>> as(seqinfo(bsgenome)[unique(unlist(seqnames(grl)))], "GRanges")
>>
>> Michael
>>
>> On Wed, Sep 11, 2019 at 5:55 AM Bhagwat, Aditya
>>  wrote:
>>>
>>> Thanks Michael,
>>>
>>> The important detail is that I want to plot the relevant chromosomes
>>> only
>>>
>>>  relevant_chromosomes <- GenomeInfoDb::seqnames(grangeslist)  %>%
>>>      S4Vectors::runValue() %>%
>>>      Reduce(union, .) %>%
>>>      unique()
>>>
>>>  genomeranges <- GenomeInfoDb::seqinfo(grangeslist) %>%
>>>      as('GRanges') %>%
>>>      (function(gr){
>>>          gr[as.character(GenomeInfoDb::seqnames(gr)) %in%
>>>                 relevant_chromosomes]
>>>      })
>>>
>>>  kp <- karyoploteR::plotKaryotype(genomeranges)
>>>  # grangeslist contains crispr target sites
>>>  karyoploteR::kpPlotRegions(kp, grangeslist)
>>>
>>>
>>> And, this process required as("GRanges")
>>>
>>>  #' Convert BSgenome into GRanges
>>>  #' @param from BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
>>>  #' @examples
>>>  #' require(magrittr)
>>>  #' BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10 %>%
>>>  #'     as('GRanges')
>>>  #' @importClassesFrom BSgenome BSgenome
>>>  #' @export
>>>  methods::setAs( "BSgenome",
>>>                  "GRanges",
>>>                  function(from)  from %>%
>>>                                  GenomeInfoDb::seqinfo() %>%
>>>                                  as('GRanges'))
>>>
>>> Thankyou for feedback,
>>>
>>> Aditya
>>>
>>> 
>>> From: Michael Lawrence [lawrence.mich...@gene.com]
>>> Sent: Wednesday, September 11, 2019 2:31 PM
>>> To: Bhagwat, Aditya
>>> Cc: Pages, Herve; bioc-devel@r-project.org
>>> Subject: Re: [Bioc-devel] Import BSgenome class without attaching
>>> BiocGenerics (and others)?
>>>
>>> I'm pretty surprised that the karyoploteR package does not accept a
>>> Seqinfo since it is plotting chromosomes. But again, please consider
>>> just doing as(seqinfo(bsgenome), "GRanges").
>>>
>>> On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
>>>  wrote:
>>>>
>>>> Hi Herve,
>>>>
>>>> Thank you for your responses.
>>>>  From your response, it is clear that the vcountPDict use case does
>>>> not need a BSgenome -> GRanges coercer.
>>>>
>>>> The karyoploteR use case still requires it, though, to allow
>>>> plotting of only the chromosomal BSgenome portions:
>>>>
>>>>  chromranges <- as(bsegenome, "GRanges")
>>>>  kp <- karyoploteR::plotKaryotype(chromranges)
>>>>  karyoploteR::kpPlotRegions(kp, crispr_target_sites)
>>>>
>>>> Or do you see any alternative for this purpose too?
>>>>
>>>> Aditya
>>>>
>>>> 
>>>> From: Pages, Herve [hpa...@fredhutch.org]
>>>&g

[Bioc-devel] Extending GenomicRanges::`intra-range-methods`

2019-09-13 Thread Bhagwat, Aditya
Dear bioc-devel,

The ?GenomicRanges::`intra-range-methods` are very useful for range 
arithmetic.

Feedback request: would it be of general use to add the methods below to the 
GenomicRanges::`intra-range-methods` palette (after properly S4-ing them)?
Or shall I keep them in multicrispr?
Additional feedback is welcome as well (e.g. if they re-implement already 
existing functionality).


1) Left flank

#' Left flank
#' @param gr        \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart number: flank start (relative to range start)
#' @param leftend   number: flank end   (relative to range start)
#' @return a \code{\link[GenomicRanges]{GRanges-class}}
#' @export
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' left_flank(gr)
left_flank <- function(gr, leftstart = -200, leftend = -1){

    # Assert
    assert_is_identical_to_true(is(gr, 'GRanges'))
    assert_is_a_number(leftstart)
    assert_is_a_number(leftend)

    # Flank
    # Move the start first: setting the end first would give a negative
    # width (and an error) whenever leftend < -1
    newranges <- gr
    start(newranges) <- start(gr) + leftstart
    end(newranges)   <- start(gr) + leftend

    # Return
    newranges
}


2) Right flank

#' Right flank
#' @param gr         \code{\link[GenomicRanges]{GRanges-class}}
#' @param rightstart number: flank start (relative to range end)
#' @param rightend   number: flank end   (relative to range end)
#' @param plot       TRUE or FALSE: whether to plot the flanks
#' @param verbose    TRUE or FALSE: whether to report the flanks
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' right_flank(gr)
#' @export
right_flank <- function(
    gr, rightstart = 1, rightend = 200, plot = FALSE, verbose = TRUE
){
    # Note: plot and verbose were used but not declared in the posted
    # version; they are added as arguments here, with assumed defaults.

    # Assert
    assert_is_identical_to_true(is(gr, 'GRanges'))
    assert_is_a_number(rightstart)
    assert_is_a_number(rightend)
    assert_is_a_bool(plot)
    assert_is_a_bool(verbose)

    # Flank
    # Move the end first: setting the start first would give a negative
    # width (and an error) whenever rightstart > 1
    newranges <- gr
    end(newranges)   <- end(gr) + rightend
    start(newranges) <- end(gr) + rightstart

    # Plot
    if (plot)  plot_intervals(GRangesList(sites = gr, rightflanks = newranges))

    # Return
    if (verbose)  cmessage('\t\t%d right flanks : [end%s%d, end%s%d]',
                           length(newranges),
                           csign(rightstart),
                           abs(rightstart),
                           csign(rightend),
                           abs(rightend))
    newranges
}


3) Slop

#' Slop (i.e. extend left/right)
#' @param gr        \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart number: flank start (relative to range start)
#' @param rightend  number: flank end   (relative to range end)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' slop(gr)
#' @export
slop <- function(gr, leftstart = -22, rightend = 22){

    # Assert
    # (the check on 'verbose' in the posted version is dropped here:
    #  this function has no such argument)
    assert_is_identical_to_true(methods::is(gr, 'GRanges'))
    assert_is_a_number(leftstart)
    assert_is_a_number(rightend)

    # Slop
    newranges <- gr
    start(newranges) <- start(newranges) + leftstart
    end(newranges)   <- end(newranges)   + rightend

    # Return
    newranges
}


4) Flank fourways

#' Flank fourways
#'
#' Flank left and right, for both strands, and merge overlaps
#' @param gr          \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart   number: left flank start  (relative to range start)
#' @param leftend     number: left flank end    (relative to range start)
#' @param rightstart  number: right flank start (relative to range end)
#' @param rightend    number: right flank end   (relative to range end)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' granges <- read_bed(bedfile, bsgenome)
#' flank_fourways(granges)
#' @export
flank_fourways <- function(
    gr, leftstart = -200, leftend = -1, rightstart = 1, rightend = 200
){

    # Comply
    . <- NULL

    # Flank
    left  <- left_flank( gr, leftstart,  leftend)
    right <- right_flank(gr, rightstart, rightend)
    newranges <- c(left, right)

    # Complement
    newranges %<>% c(invertStrand(.))

    # Merge overlaps
    newranges %<>% reduce() # GenomicRanges::reduce

    # Return
    newranges
}
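
On the question of already existing functionality: with their default arguments, the first three functions appear to match existing strand-agnostic intra-range operations. The following is a sketch for comparison only, assuming GenomicRanges is installed; the helper name flank_equivalents is hypothetical and not part of multicrispr or GenomicRanges.

```r
# Hypothetical helper showing existing counterparts of the defaults above
flank_equivalents <- function(gr) {
    list(
        # [start - 200, start - 1], like left_flank(gr, -200, -1)
        left  = GenomicRanges::flank(gr, width = 200, start = TRUE,
                                     ignore.strand = TRUE),
        # [end + 1, end + 200], like right_flank(gr, 1, 200)
        right = GenomicRanges::flank(gr, width = 200, start = FALSE,
                                     ignore.strand = TRUE),
        # [start - 22, end + 22], like slop(gr, -22, 22)
        slop  = gr + 22
    )
}

# Small check on a toy range chr1:1000-1100 (run only if available)
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
    gr <- GenomicRanges::GRanges("chr1",
                                 IRanges::IRanges(start = 1000, end = 1100))
    eq <- flank_equivalents(gr)
    # left: 800-999, right: 1101-1300, slop: 978-1122
}
```

The custom functions remain more general, since they accept arbitrary offsets (for example a flank that does not touch the range), which flank() alone does not express.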



5) Slop fourways

#' Slop granges for both strands, merging overlaps
#' @param gr   \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart number
#' @param rightend  number
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- 

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-11 Thread Bhagwat, Aditya
Hi Herve,


> It feels that a coercion method from BSgenome to GRanges should rather be 
> defined in the BSgenome package itself.

:-)


> Patch/PR welcome on GitHub.

Owkies. What pull/fork/check/branch protocol to be followed?


> Is this what you have in mind for this coercion?
> as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")

Yes.

Perhaps it is also useful to share the wider context, allowing feedback from 
you and others for improved software design.
I wanted to subset a <https://support.bioconductor.org/p/124367> BSgenome 
(without the _random or _unassigned), but Lori explained this is not 
possible.<https://support.bioconductor.org/p/124367>
Instead Lori suggested coercing a BSgenome into a 
GRanges<https://support.bioconductor.org/p/123489>, which is a useful solution, 
but for which currently no exported S4 method 
exists<https://support.bioconductor.org/p/124416>.
So I defined an S4 coercer in my multicrispr package, making sure to properly 
import the BSgenome class<https://support.bioconductor.org/p/124442>.
Then, after coercing a BSgenome into a GRanges, I can extract the chromosomes, 
after properly importing 
IRanges::`%in%`<https://support.bioconductor.org/p/124367>,
which I can then pass on to 
karyoploteR<https://support.bioconductor.org/p/124328>, for genome-wide plots 
of crispr target sites.

A good moment also to say thank you to all of you who helped me out, it helps 
me to make multicrispr fit nicely into the BioC ecosystem.

Speaking of BioC design philosophy, can any of you suggest concise and 
to-the-point reading material to deepen my understanding of the core BioC 
software design philosophy?
I am trying to understand that better (which was the context for asking 
recently why there are three Vector -> data.frame coercers in 
S4Vectors<https://support.bioconductor.org/p/124491>)

Cheers,

Aditya





From: Pages, Herve [hpa...@fredhutch.org]
Sent: Tuesday, September 10, 2019 6:45 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

Hi Aditya,


More generally speaking, coercion methods should be defined in a place
that is "as close as possible" to the "from" or "to" classes rather than
in a package that doesn't own any of the 2 classes involved.
Is this what you have in mind for this coercion?

> as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
GRanges object with 7 ranges and 0 metadata columns:
         seqnames     ranges strand
            <Rle>  <IRanges>  <Rle>
    chrI     chrI 1-15072423      *
   chrII    chrII 1-15279345      *
  chrIII   chrIII 1-13783700      *
   chrIV    chrIV 1-17493793      *
    chrV     chrV 1-20924149      *
    chrX     chrX 1-17718866      *
    chrM     chrM    1-13794      *
  -------
  seqinfo: 7 sequences (1 circular) from ce10 genome

Thanks,
H.


On 9/6/19 03:39, Bhagwat, Aditya wrote:
> Dear Bioc devel,
>
> Is it possible to import the BSgenome class without attaching BiocGenerics 
> (to keep a clean namespace during the development of 
> multicrispr<https://gitlab.gwdg.de/loosolab/software/multicrispr>).
>
> BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
>
> (Posted earlier on BioC support<https://support.bioconductor.org/p/124442/> 
> and redirected here following Martin's suggestion)
>
> Thankyou :-)
>
> Aditya
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319



Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-11 Thread Bhagwat, Aditya
Hi Herve,

Thank you for your responses. 
From your response, it is clear that the vcountPDict use case does not need a 
BSgenome -> GRanges coercer.

The karyoploteR use case still requires it, though, to allow plotting of only 
the chromosomal BSgenome portions:

chromranges <- as(bsgenome, "GRanges")
kp <- karyoploteR::plotKaryotype(chromranges)
karyoploteR::kpPlotRegions(kp, crispr_target_sites)

Or do you see any alternative for this purpose too?

Aditya


From: Pages, Herve [hpa...@fredhutch.org]
Sent: Wednesday, September 11, 2019 12:24 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

Hi Aditya,

On 9/11/19 01:31, Bhagwat, Aditya wrote:
> Hi Herve,
>
>
>  > It feels that a coercion method from BSgenome to GRanges should
> rather be defined in the BSgenome package itself.
>
> :-)
>
>
>  > Patch/PR welcome on GitHub.
>
> Owkies. What pull/fork/check/branch protocol to be followed?
>
>
>  > Is this what you have in mind for this coercion?
>  > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
>
> Yes.
>
> Perhaps also useful to share the wider context, allowing your and others
> feedback for improved software design.
> I wanted to subset a BSgenome <https://support.bioconductor.org/p/124367>
> (without the _random or _unassigned), but Lori explained this is not
> possible. <https://support.bioconductor.org/p/124367>
>
> Instead Lori suggested to coerce a BSgenome into a GRanges
> <https://support.bioconductor.org/p/123489>,
> which is a useful solution, but for which currently no exported S4
> method exists <https://support.bioconductor.org/p/124416>
> So I defined an S4 coercer in my multicrispr package, making sure to
> properly import the Bsgenome class
> <https://support.bioconductor.org/p/124442>.
> Then, after coercing a BSgenome into a GRanges, I can extract the
> chromosomes, after properly importing IRanges::`%in%`
> <https://support.bioconductor.org/p/124367>

Looks like you don't need to coerce the BSgenome object to GRanges. See
https://support.bioconductor.org/p/123489/#124581

H.

> Which I can then pass on to karyoploteR
> <https://support.bioconductor.org/p/124328>,
> for genome-wide plots of crispr target sites.
>
> A good moment also to say thank you to all of you who helped me out, it
> helps me to make multicrispr fit nicely into the BioC ecosystem.
>
> Speaking of BioC design philosophy, can any of you suggest concise and
> to-the-point reading material to deepen my understanding of the core
> BioC software design philosophy?
> I am trying to understand that better (which was the context for asking
> recently why there are three Vector -> data.frame coercers in S4Vectors
> <https://support.bioconductor.org/p/124491>)
>
> Cheers,
>
> Aditya
>
>
>
>
> 
> From: Pages, Herve [hpa...@fredhutch.org]
> Sent: Tuesday, September 10, 2019 6:45 PM
> To: Bhagwat, Aditya; bioc-devel@r-pro

Re: [Bioc-devel] Collapsing a GRangesList into a GRanges without loosing names(GRangesList)

2019-09-11 Thread Bhagwat, Aditya
Oh, that's pretty cool :-)
I knew I was overlooking something!
Thanks Lori

Aditya

From: Shepherd, Lori [lori.sheph...@roswellpark.org]
Sent: Wednesday, September 11, 2019 2:54 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: Collapsing a GRangesList into a GRanges without loosing 
names(GRangesList)

In what way do you feel they lose names?

> grlist <- GenomicRanges::GRangesList(
+     gr1 = GenomicRanges::GRanges('chr1', '1-100',   strand = '-'),
+     gr2 = GenomicRanges::GRanges('chr1', '101-200', strand = '-'))

> names(grlist)
[1] "gr1" "gr2"

> temp = unlist(grlist)
> names(temp)
[1] "gr1" "gr2"

> temp
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  gr1     chr1     1-100      -
  gr2     chr1   101-200      -
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> temp["gr1"]
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  gr1     chr1     1-100      -
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths



The names "gr1" and "gr2"  were preserved. You can see them as the first entry 
in my temp object.
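
If the goal is to carry the list names along as a metadata column (rather than only as names on the result), a minimal sketch could look like the following. It assumes GenomicRanges is installed; the column name group_name is borrowed from the collapse() function quoted further down, and the toy ranges are purely illustrative.

```r
library(GenomicRanges)   # also attaches S4Vectors and IRanges

# Toy input: the second list element holds two ranges
grlist <- GRangesList(
    gr1 = GRanges("chr1", "1-100", strand = "-"),
    gr2 = GRanges("chr1", c("101-200", "301-400"), strand = "-"))

# Flatten without names, then repeat each list name once per range it held
gr <- unlist(grlist, use.names = FALSE)
gr$group_name <- rep(names(grlist), elementNROWS(grlist))
gr
```

Building the column from elementNROWS() keeps the bookkeeping explicit, so it does not depend on how unlist() composes names when the inner ranges carry names of their own.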



Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Bhagwat, 
Aditya 
Sent: Wednesday, September 11, 2019 8:47 AM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] Collapsing a GRangesList into a GRanges without loosing 
names(GRangesList)

Dear bioc-devel,

When using unlist to collapse a GRangesList into a 
GRanges <https://support.bioconductor.org/p/83599>, one loses 
names(GRangesList).
Since I need a name-preserving collapser, I defined the following in 
multicrispr<https://gitlab.gwdg.de/loosolab/software/multicrispr>.
My feedback request is: did I overlook existing similar functionality?

#' Collapse a GRangesList
#' @param grangeslist GenomicRanges::GRangesList
#' @examples
#' # Consider a GRangesList
#' grlist <- GenomicRanges::GRangesList(
#'     gr1 = GenomicRanges::GRanges('chr1', '1-100',   strand = '-'),
#'     gr2 = GenomicRanges::GRanges('chr1', '101-200', strand = '-'))
#'
#' # unlist() drops names(grlist)
#' unlist(grlist)
#'
#' # collapse() preserves them
#' collapse(grlist)
#'
#' # in a way similar to as.data.frame()
#' as.data.frame(grlist)
#' @importFrom magrittr %>%
#' @export
collapse <- function(grangeslist){
    # Tag every range with the name of the list element it came from
    add_series <- function(granges, group_name){
        granges$group_name <- group_name
        granges
    }
    S4Vectors::mendoapply(add_series, grangeslist, names(grangeslist)) %>%
        unlist() %>%
        sort()
}


Aditya


This email message may contain legally privileged and/or confidential 
information. If you are not the intended recipient(s), or the employee or agent 
responsible for the delivery of this message to the intended recipient(s), you 
are hereby notified that any disclosure, copying, distribution, or use of this 
email message is prohibited. If you have received this message in error, please 
notify the sender immediately by e-mail and delete this email message from your 
computer. Thank you.



[Bioc-devel] Collapsing a GRangesList into a GRanges without loosing names(GRangesList)

2019-09-11 Thread Bhagwat, Aditya
Dear bioc-devel,

When using unlist to collapse a GRangesList into a GRanges, one loses 
names(GRangesList).
Since I need a name-preserving collapser, I defined the following in 
multicrispr.
My feedback request is: did I overlook existing similar functionality?

#' Collapse a GRangesList
#' @param grangeslist GenomicRanges::GRangesList
#' @examples
#' # Consider a GRangesList
#' grlist <- GenomicRanges::GRangesList(
#'     gr1 = GenomicRanges::GRanges('chr1', '1-100',   strand = '-'),
#'     gr2 = GenomicRanges::GRanges('chr1', '101-200', strand = '-'))
#'
#' # unlist() drops names(grlist)
#' unlist(grlist)
#'
#' # collapse() preserves them
#' collapse(grlist)
#'
#' # in a way similar to as.data.frame()
#' as.data.frame(grlist)
#' @importFrom magrittr %>%
#' @export
collapse <- function(grangeslist){
    # Tag every range with the name of the list element it came from
    add_series <- function(granges, group_name){
        granges$group_name <- group_name
        granges
    }
    S4Vectors::mendoapply(add_series, grangeslist, names(grangeslist)) %>%
        unlist() %>%
        sort()
}


Aditya



Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-11 Thread Bhagwat, Aditya
Thanks Michael, 

The important detail is that I want to plot the relevant chromosomes only

relevant_chromosomes <- GenomeInfoDb::seqnames(grangeslist) %>%
    S4Vectors::runValue() %>%
    Reduce(union, .) %>%
    unique()

genomeranges <- GenomeInfoDb::seqinfo(grangeslist) %>%
    as('GRanges') %>%
    (function(gr){
        gr[as.character(GenomeInfoDb::seqnames(gr)) %in% relevant_chromosomes]
    })

kp <- karyoploteR::plotKaryotype(genomeranges)
# grangeslist contains crispr target sites
karyoploteR::kpPlotRegions(kp, grangeslist)


And, this process required as("GRanges")

#' Convert BSgenome into GRanges
#' @param from BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @examples
#' require(magrittr)
#' BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10 %>%
#'     as('GRanges')
#' @importClassesFrom BSgenome BSgenome
#' @export
methods::setAs(
    "BSgenome",
    "GRanges",
    function(from) from %>%
        GenomeInfoDb::seqinfo() %>%
        as('GRanges'))

Thank you for the feedback,

Aditya


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Wednesday, September 11, 2019 2:31 PM
To: Bhagwat, Aditya
Cc: Pages, Herve; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

I'm pretty surprised that the karyoploteR package does not accept a
Seqinfo since it is plotting chromosomes. But again, please consider
just doing as(seqinfo(bsgenome), "GRanges").
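
A minimal sketch of that suggestion, assuming GenomicRanges is installed. To stay self-contained it builds a toy Seqinfo by hand instead of calling seqinfo() on a real BSgenome; the sequence names, lengths and the "relevant" vector are made up for illustration.

```r
library(GenomicRanges)   # also attaches GenomeInfoDb

# Toy stand-in for seqinfo(bsgenome); names and lengths are made up
si <- Seqinfo(
    seqnames   = c("chr1", "chr2", "chr1_random"),
    seqlengths = c(1000L, 800L, 50L),
    genome     = "toy")

# One range per sequence, spanning its full length
chromranges <- as(si, "GRanges")

# Keep only the sequences of interest
relevant <- c("chr1", "chr2")
chromranges <- chromranges[as.character(seqnames(chromranges)) %in% relevant]
chromranges
```

With a real genome, seqinfo(bsgenome) would take the place of the hand-built object, and no BSgenome-specific coercion method is involved.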

On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
 wrote:
>
> Hi Herve,
>
> Thank you for your responses.
> From your response, it is clear that the vcountPDict use case does not need a 
> BSgenome -> GRanges coercer.
>
> The karyoploteR use case still requires it, though, to allow plotting of only 
> the chromosomal BSgenome portions:
>
> chromranges <- as(bsgenome, "GRanges")
> kp <- karyoploteR::plotKaryotype(chromranges)
> karyoploteR::kpPlotRegions(kp, crispr_target_sites)
>
> Or do you see any alternative for this purpose too?
>
> Aditya
>
> 
> From: Pages, Herve [hpa...@fredhutch.org]
> Sent: Wednesday, September 11, 2019 12:24 PM
> To: Bhagwat, Aditya; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> BiocGenerics (and others)?
>
> Hi Aditya,
>
> On 9/11/19 01:31, Bhagwat, Aditya wrote:
> > Hi Herve,
> >
> >
> >  > It feels that a coercion method from BSgenome to GRanges should
> > rather be defined in the BSgenome package itself.
> >
> > :-)
> >
> >
> >  > Patch/PR welcome on GitHub.
> >
> > Owkies. What pull/fork/check/branch protocol to be followed?
> >
> >
> >  > Is this what you have in mind for this coercion?
> >  > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
> >
> > Yes.
> >
> > Perhaps it is also useful to share the wider context, to allow feedback
> > from you and others on improved software design.
> > I wanted to subset a BSgenome
> > <https://support.bioconductor.org/p/124367>
> > (without the _random or _unassigned), but Lori explained this is not
> > possible.
> > <https://support.bioconductor.org/p/124367>
> >
> > Instead Lori suggested to coerce a BSgenome into a GRanges
> > <https://support.bioconductor.org/p/123489>,
> > which is a useful solution, but for which currently no exported S4
> > method exists
> > <https://support.bioconductor.org/p/124416>.
> > So I defin

[Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges), adds BSgenome 
info (to allow for implicit range validity 
checking) and plots the 
karyogram.
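
The coordinate conversion itself can be sketched in base R, without any Bioconductor classes. This is only an illustration: the helper name bed_to_one_based and the toy table are hypothetical and not part of multicrispr. BED intervals are 0-based and half-open, so only the start shifts by one while the end stays as it is.

```r
# Hypothetical helper: convert BED-style intervals to 1-based closed intervals
bed_to_one_based <- function(bed){
    data.frame(
        seqnames = bed$chrom,
        start    = bed$chromStart + 1L,  # 0-based start -> 1-based start
        end      = bed$chromEnd,         # exclusive end == inclusive 1-based end
        strand   = bed$strand,
        stringsAsFactors = FALSE)
}

# Toy BED records (columns 1, 2, 3 and 6 of a BED file)
bed <- data.frame(
    chrom      = c("chr1", "chr2"),
    chromStart = c(0L, 99L),
    chromEnd   = c(100L, 200L),
    strand     = c("+", "-"),
    stringsAsFactors = FALSE)

ranges <- bed_to_one_based(bed)
ranges
```

The interval width is unchanged by the conversion: chromEnd - chromStart in BED terms equals end - start + 1 in 1-based terms.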

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfile        file path
#' @param bsgenome       BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based     logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot           logical(1)
#' @param verbose        logical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'       A good discussion on these two alternative codings is given
#'       by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @importFrom  magrittr    %<>%  extract
#' @export
read_bed <- function(
    bedfile,
    bsgenome,
    zero_based    = TRUE,
    rm_duplicates = TRUE,
    plot          = TRUE,
    verbose       = TRUE
){
    # Assert
    assert_all_are_existing_files(bedfile)
    assert_is_a_bool(verbose)
    assert_is_a_bool(rm_duplicates)
    assert_is_a_bool(zero_based)
    assert_is_a_bool(plot)

    # Comply
    seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

    # Read columns 1-3 (chrom, chromStart, chromEnd) and 6 (strand)
    if (verbose) cmessage('\tRead %s', bedfile)
    dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
            col.names = c('seqnames', 'start', 'end', 'strand'))
    data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

    # Transform coordinates: 0-based -> 1-based
    if (zero_based){
        if (verbose) cmessage('\t\tConvert 0 -> 1-based')
        dt[, start := start + 1]
    }

    if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
                        nrow(dt), length(unique(dt$seqnames)))

    # Drop duplicates
    if (rm_duplicates){
        is_duplicated <- cduplicated(dt)
        if (any(is_duplicated)){
            if (verbose) cmessage('\t\t%d after removing duplicates',
                                sum(!is_duplicated))
            dt %<>% extract(!is_duplicated)
        }
    }

    # Turn into GRanges
    gr <- add_seqinfo(as(dt, 'GRanges'), bsgenome)

    # Plot and return
    title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
    if (plot) plot_karyogram(gr, title)
    gr
}




Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Aha - thx!

Aditya

From: Shepherd, Lori [lori.sheph...@roswellpark.org]
Sent: Tuesday, September 17, 2019 1:02 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: read_bed()

Please look at the rtracklayer::import() function, which we recommend for 
reading BED files along with other common formats.

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Bhagwat, 
Aditya 
Sent: Tuesday, September 17, 2019 6:58 AM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] read_bed()

Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome 
info (to allow for implicit range validity 
checking<https://support.bioconductor.org/p/124250>) and plots the 
karyogram<https://support.bioconductor.org/p/124328>.

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfile        file path
#' @param bsgenome       BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based     logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot           logical(1)
#' @param verbose        logical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'   A good discussion on these two alternative codings is given
#'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
bedfile,
bsgenome,
zero_based= TRUE,
rm_duplicates = TRUE,
plot  = TRUE,
verbose   = TRUE
){
# Assert
assert_all_are_existing_files(bedfile)
assert_is_a_bool(verbose)
assert_is_a_bool(rm_duplicates)
assert_is_a_bool(zero_based)

# Comply
seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

# Read
if (verbose) cmessage('\tRead %s', bedfile)
dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
col.names = c('seqnames', 'start', 'end', 'strand'))
data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

# Transform coordinates: 0-based -> 1-based
if (zero_based){
if (verbose)cmessage('\t\tConvert 0 -> 1-based')
dt[, start := start + 1]
}

if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
nrow(dt), length(unique(dt$seqnames)))

# Drop duplicates
if (rm_duplicates){
    is_duplicated <- cduplicated(dt)
    if (any(is_duplicated)){
        if (verbose) cmessage('\t\t%d after removing duplicates',
                            sum(!is_duplicated))
        dt %<>% extract(!is_duplicated)
    }
}

# Turn into GRanges
gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

# Plot and return
title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
if (plot) plot_karyogram(gr, title)
gr
}




Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Hi Lori,

I remember now - I tried this function earlier, but it does not work for my 
bedfiles, like the attached one.

> bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
>
> targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 'mm10')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
scan() expected 'an integer', got 'chr2'
>

Perhaps this sentence in `?rtracklayer::import` points to the source of the 
error?

many tools and organizations have extended BED with additional columns. These 
are not officially valid BED files, and as such rtracklayer does not yet 
support them (this will be addressed soon).

Which brings the question: how soon is soon :-D ?

Aditya



From: Shepherd, Lori [lori.sheph...@roswellpark.org]
Sent: Tuesday, September 17, 2019 1:02 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: read_bed()

Please look at the rtracklayer::import() function, which we recommend for 
reading BED files along with other common formats.

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Bhagwat, 
Aditya 
Sent: Tuesday, September 17, 2019 6:58 AM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] read_bed()

Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome 
info (to allow for implicit range validity 
checking<https://support.bioconductor.org/p/124250>) and plots the 
karyogram<https://support.bioconductor.org/p/124328>.

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfile        file path
#' @param bsgenome       BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based     logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot           logical(1)
#' @param verbose        logical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'   A good discussion on these two alternative codings is given
#'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
bedfile,
bsgenome,
zero_based= TRUE,
rm_duplicates = TRUE,
plot  = TRUE,
verbose   = TRUE
){
# Assert
assert_all_are_existing_files(bedfile)
assert_is_a_bool(verbose)
assert_is_a_bool(rm_duplicates)
assert_is_a_bool(zero_based)

# Comply
seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

# Read
if (verbose) cmessage('\tRead %s', bedfile)
dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
col.names = c('seqnames', 'start', 'end', 'strand'))
data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

# Transform coordinates: 0-based -> 1-based
if (zero_based){
if (verbose)cmessage('\t\tConvert 0 -> 1-based')
dt[, start := start + 1]
}

if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
nrow(dt), length(unique(dt$seqnames)))

# Drop duplicates
if (rm_duplicates){
    is_duplicated <- cduplicated(dt)
    if (any(is_duplicated)){
        if (verbose) cmessage('\t\t%d after removing duplicates',
                            sum(!is_duplicated))
        dt %<>% extract(!is_duplicated)
    }
}

# Turn into GRanges
gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

# Plot and return
title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
if (plot) plot_karyogram(gr, title)
gr
}



Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

2019-09-17 Thread Bhagwat, Aditya
Owkies, will file a PR in one of the coming days. And continue the discussion 
when I do so.

Cheers!

Aditya


From: Stuart Lee [le...@wehi.edu.au]
Sent: Tuesday, September 17, 2019 5:33 AM
To: Michael Lawrence
Cc: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

Hi Aditya,

I think straddle would be a great addition to plyranges. Happy for you to put 
in a PR and add you as a contributor.

Maybe instead of specifying the start etc. we could dispatch on anchored 
ranges? So we'd follow the anchor_start(gr) %>% straddle() pattern. We could 
also have the directed version for considering strands.

https://github.com/sa-lee/plyranges

Thanks,
Stuart

---
Stuart Lee
Visiting PhD Student - Ritchie Lab



On 13 Sep 2019, at 22:38, Michael Lawrence 
<lawrence.mich...@gene.com> wrote:

Thanks for these suggestions; I think they're worth considering.

I've never been totally satisfied with (my function) flank(), because
it's limited and its arguments are somewhat obscure in meaning. You
can check out what we did in plyranges:
https://rdrr.io/bioc/plyranges/man/flank-ranges.html. Your functions
are more flexible, because they are two-way about the endpoint, like
promoters(). Sometimes I've solved that with resize(flank()), but
that's not ideal.  Maybe a better name is "straddle" for when ranges
straddle one of the endpoints? In keeping with the current pattern of
Ranges API, there would be a single function: straddle(x, side, left,
right, ignore.strand=FALSE). So straddle(x, "start", -100, 10) would
be like promoters(x, 100, 10) for a positive or "*" strand range. That
brings up strandedness, which needs to be considered here. For
unstranded ranges, it may be that direct start() and end()
manipulation is actually more transparent than a special verb. I
wonder what Stuart Lee thinks?
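
To make the proposed semantics concrete, here is a rough base-R sketch operating on plain coordinate vectors rather than on GRanges. It is not an existing GenomicRanges or plyranges function; the name, the argument order and the endpoint convention (both offsets are added to the anchor and both resulting endpoints are inclusive) are assumptions for illustration only.

```r
# Hypothetical sketch of straddle() on plain integer vectors.
# left/right are offsets relative to the anchored endpoint; on the minus
# strand the anchor flips to the other end and the offsets are mirrored.
straddle <- function(start, end, strand, side = c("start", "end"),
                     left, right, ignore.strand = FALSE){
    side  <- match.arg(side)
    minus <- (!ignore.strand) & (strand == "-")
    anchor <- if (side == "start") ifelse(minus, end, start)
              else                 ifelse(minus, start, end)
    data.frame(
        start = ifelse(minus, anchor - right, anchor + left),
        end   = ifelse(minus, anchor - left,  anchor + right))
}

# Same range once on the plus and once on the minus strand
flanks <- straddle(start = c(1000, 1000), end = c(1100, 1100),
                   strand = c("+", "-"), side = "start",
                   left = -100, right = 10)
flanks
```

Under this convention the plus-strand range straddles its start and the minus-strand range straddles its end, which is the strand-aware behaviour that promoters() also follows; the exact inclusive/exclusive convention would still need to be agreed on.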

The functions that involve reduce() wouldn't fit into the intrarange
operations, as they are summarizing ranges, not transforming them.
They may be going too far.

Michael

On Fri, Sep 13, 2019 at 4:48 AM Bhagwat, Aditya
<aditya.bhag...@mpi-bn.mpg.de> wrote:

Dear bioc-devel,

The ?GenomicRanges::`intra-range-methods` are very useful for range 
arithmetic<https://genomicsclass.github.io/book/pages/figure/bioc1_igranges-unnamed-chunk-6-1.png>

Feedback request: would it be of general use to add the methods below to the 
GenomicRanges::`intra-range-methods` palette (after properly S4-ing them)?
Or shall I keep them in 
multicrispr<https://gitlab.gwdg.de/loosolab/software/multicrispr>?
Additional feedback welcome as well (e.g. re-implementation of already existing 
functionality).


1) Left flank

#' Left flank
#' @param gr   \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart number: flank start (relative to range start)
#' @param leftend   number: flank end   (relative to range start)
#' @return a \code{\link[GenomicRanges]{GRanges-class}}
#' @export
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' left_flank(gr)
left_flank <- function(gr, leftstart = -200, leftend   = -1){

   # Assert
   assert_is_identical_to_true(is(gr, 'GRanges'))
   assert_is_a_number(leftstart)
   assert_is_a_number(leftend)

   # Flank
   newranges <- gr
   end(newranges)   <- start(gr) + leftend
   start(newranges) <- start(gr) + leftstart

   # Return
   newranges
}


2) Right flank

#' Right flank
#' @param gr          \code{\link[GenomicRanges]{GRanges-class}}
#' @param rightstart  number: flank start (relative to range end)
#' @param rightend    number: flank end   (relative to range end)
#' @param plot        logical(1): whether to plot sites and flanks
#' @param verbose     logical(1): whether to report the flanks created
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' right_flank(gr)
#' @export
right_flank <- function(
    gr, rightstart = 1, rightend = 200, plot = TRUE, verbose = TRUE
){

   # Assert
   assert_is_identical_to_true(is(gr, 'GRanges'))
   assert_is_a_number(rightstart)
   assert_is_a_number(rightend)
   assert_is_a_bool(plot)
   assert_is_a_bool(verbose)

   # Flank: set the end first, so that the range never gets a negative width
   newranges <- gr
   end(newranges)   <- end(gr) + rightend
   start(newranges) <- end(gr) + rightstart

   # Plot
   if (plot)  plot_intervals(GRangesList(sites = gr, rightflanks = newranges))

   # Return
   if (verbose) cmessage('\t\t%d right flanks : [end%s%d, end%s%d]',
       length(newranges),
       csign(rightstart),
       abs(rightstart),
       csign(rightend),
       abs(rightend))
   newranges
}


3) Slop

#' Slop (i.e. extend left/right)
#' @param gr\code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart number: flank start (relative to range start)
#' @param righte

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Hi Michael,

Yeah, I also noticed that the attachment was eaten when it entered the 
bioc-devel list. 

The file is also accessible in the extdata of multicrispr:
https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed

A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
(chromEnd), and column 6 (strand). All of these are present in SRF.bed.

I am curious as to why you feel that having additional columns in a bedfile 
would break it?

Cheers,

Aditya


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 1:41 PM
To: Bhagwat, Aditya
Cc: Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I don't see an attachment, nor can I find the multicrispr package
anywhere. The "addressed soon" was referring to the BEDX+Y formats,
which was addressed many years ago, so I've updated the documentation.
Broken BED files will never be supported.

Michael

On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
 wrote:
>
> Hi Lori,
>
> I remember now - I tried this function earlier, but it does not work for my 
> bedfiles, like the one in attach.
>
> > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> >
> > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > 'mm10')
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  
> : scan() expected 'an integer', got 'chr2'
> >
>
> Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> error?
>
> many tools and organizations have extended BED with additional columns. These 
> are not officially valid BED files, and as such rtracklayer does not yet 
> support them (this will be addressed soon).
>
> Which brings the question: how soon is soon :-D ?
>
> Aditya
>
>
> ____
> From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> Sent: Tuesday, September 17, 2019 1:02 PM
> To: Bhagwat, Aditya; bioc-devel@r-project.org
> Subject: Re: read_bed()
>
> Please look at the rtracklayer::import() function, which we recommend for 
> reading BED files along with other common formats.
>
> Cheers,
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> 
> From: Bioc-devel  on behalf of Bhagwat, 
> Aditya 
> Sent: Tuesday, September 17, 2019 6:58 AM
> To: bioc-devel@r-project.org 
> Subject: [Bioc-devel] read_bed()
>
> Dear bioc-devel,
>
> I had two feedback requests regarding the function read_bed().
>
> 1) Did I overlook, and therefore, re-invent existing functionality?
> 2) If not, would `read_bed` be suited for existence in a more foundational 
> package, e.g. `GenomicRanges`, given the rather basal nature of this 
> functionality?
>
> It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds 
> BSgenome info (to allow for implicit range validity 
> checking<https://support.bioconductor.org/p/124250>) and plots the 
> karyogram<https://support.bioconductor.org/p/124328>.
>
> Thank you for your feedback.
>
> Cheers,
>
> Aditya
>
>
> #' Read bedfile into GRanges
> #'
> #' @param bedfile        file path
> #' @param bsgenome       BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' @param zero_based     logical(1): whether bedfile GRanges are 0-based
> #' @param rm_duplicates  logical(1)
> #' @param plot           logical(1)
> #' @param verbose        logical(1)
> #' @return \code{\link[GenomicRanges]{GRanges-class}}
> #' @note By convention BED files are 0-based. GRanges are always 1-based.
> #'   A good discussion on these two alternative codings is given
> #'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> #' @examples
> #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' (gr <- read_bed(bedfile, bsgenome))
> #' @importFrom  data.table  :=
> #' @export
> read_bed <- function(
> bedfile,
> bsgenome,
> zero_based= TRUE,
> rm_duplicates = TRUE,
> plot  = TRUE,
> verbose   = TRUE
> ){
> # Assert
> assert_all_are_existing_files(bedfile)
> assert_is_a_bool(verbose)
> assert_is_a_bool(rm_duplicates)
> assert_is_a_bool(zero_based)
>
> # Comply
> seqnames <- start <- end <- strand <- .N <- g

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Thank you Michael, 

How do I use the extraCols argument? The documentation does not mention an 
`extraCols` argument explicitly, so it must be one of the ellipsis arguments, 
but `?rtracklayer::import` does not mention it. Should I say extraCols = 10 
(ten extra columns) or so?

Aditya


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:05 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

It breaks it because it's not standard BED; however, using the
extraCols= argument should work in this case. Requiring an explicit
format specification is intentional, because it provides validation
and type safety, and it communicates the format to a future reader.
This also looks a bit like a bedPE file, so you might consider using
the Pairs data structure.

Michael

On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> Yeah, I also noticed that the attachment was eaten when it entered the 
> bioc-devel list.
>
> The file is also accessible in the extdata of the multicrispr:
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
>
> I am curious as to why you feel that having additional columns in a bedfile 
> would break it?
>
> Cheers,
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 1:41 PM
> To: Bhagwat, Aditya
> Cc: Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I don't see an attachment, nor can I find the multicrispr package
> anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> which was addressed many years ago, so I've updated the documentation.
> Broken BED files will never be supported.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Lori,
> >
> > I remember now - I tried this function earlier, but it does not work for my 
> > bedfiles, like the one attached.
> >
> > > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > >
> > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > 'mm10')
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec,  : scan() expected 'an integer', got 'chr2'
> > >
> >
> > Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> > error?
> >
> > many tools and organizations have extended BED with additional columns. 
> > These are not officially valid BED files, and as such rtracklayer does not 
> > yet support them (this will be addressed soon).
> >
> > Which brings the question: how soon is soon :-D ?
> >
> > Aditya
> >
> >
> > 
> > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > Sent: Tuesday, September 17, 2019 1:02 PM
> > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > Subject: Re: read_bed()
> >
> > Please look at rtracklayer::import()  function that we recommend for 
> > reading of BAM files along with other common formats.
> >
> > Cheers,
> >
> >
> > Lori Shepherd
> >
> > Bioconductor Core Team
> >
> > Roswell Park Cancer Institute
> >
> > Department of Biostatistics & Bioinformatics
> >
> > Elm & Carlton Streets
> >
> > Buffalo, New York 14263
> >
> > 
> > From: Bioc-devel  on behalf of Bhagwat, 
> > Aditya 
> > Sent: Tuesday, September 17, 2019 6:58 AM
> > To: bioc-devel@r-project.org 
> > Subject: [Bioc-devel] read_bed()
> >
> > Dear bioc-devel,
> >
> > I had two feedback requests regarding the function read_bed().
> >
> > 1) Did I overlook, and therefore, re-invent existing functionality?
> > 2) If not, would `read_bed` be suited for existence in a more foundational 
> > package, e.g. `GenomicRanges`, given the rather basal nature of this 
> > functionality?
> >
> > It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> > (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds 
> > BSgenome info (to allow for implicit range validity 
> > checking<https://support.bioconductor.org/p/124250>) and plots the 
> > karyogram<https://support.bioconductor

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Oh, superb, thx!

Interesting ... here you use S3 rather than S4 - I wonder about the design intention 
underlying these choices (I'm asking because I am trying to figure out myself 
when to use S3 and when to use S4 and whether to mix the two).

Aditya


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:23 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

The generic documentation does not mention it, but see ?import.bed.
It's similar to colClasses on read.table().
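
To make the colClasses analogy concrete, here is a minimal base-R sketch. The column names and the two-line input are invented for illustration; the point is only that a named character vector maps each column name to the class it should be parsed as, which is the same shape of value that extraCols= takes for the non-standard columns.

```r
# Two tab-separated records: three BED-like columns plus two extra columns.
# (Hypothetical data, not taken from SRF.bed.)
bed_lines <- c("chr1\t100\t200\tgeneA\t1.5",
               "chr2\t300\t450\tgeneB\t2.5")

# colClasses as a named vector: names are matched against col.names.
tab <- read.table(
    text       = bed_lines,
    sep        = "\t",
    col.names  = c("chrom", "chromStart", "chromEnd", "gene", "signal"),
    colClasses = c(chrom      = "character",
                   chromStart = "integer",
                   chromEnd   = "integer",
                   gene       = "character",
                   signal     = "numeric"))

col_types <- vapply(tab, class, character(1))
print(col_types)
```

So a plain count such as extraCols = 10 would not carry enough information: each extra column needs a name and a class.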

On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
 wrote:
>
> Thank you Michael,
>
> How do I use the extraCols argument? The documentation does not mention an 
> `extraCols` argument explicitly, so it must be one of the ellipsis arguments, 
> but `?rtracklayer::import` does not mention it. Should I say extraCols = 10 
> (ten extra columns) or so?
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:05 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> It breaks it because it's not standard BED; however, using the
> extraCols= argument should work in this case. Requiring an explicit
> format specification is intentional, because it provides validation
> and type safety, and it communicates the format to a future reader.
> This also looks a bit like a bedPE file, so you might consider using
> the Pairs data structure.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > Yeah, I also noticed that the attachment was eaten when it entered the 
> > bioc-devel list.
> >
> > The file is also accessible in the extdata of the multicrispr:
> > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> >
> > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> > (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> >
> > I am curious as to why you feel that having additional columns in a bedfile 
> > would break it?
> >
> > Cheers,
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 1:41 PM
> > To: Bhagwat, Aditya
> > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > I don't see an attachment, nor can I find the multicrispr package
> > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > which was addressed many years ago, so I've updated the documentation.
> > Broken BED files will never be supported.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Lori,
> > >
> > > I remember now - I tried this function earlier, but it does not work for 
> > > my bedfiles, like the one attached.
> > >
> > > > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > > >
> > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > 'mm10')
> > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > dec,  : scan() expected 'an integer', got 'chr2'
> > > >
> > >
> > > Perhaps this sentence in `?rtracklayer::import` points to the source of 
> > > the error?
> > >
> > > many tools and organizations have extended BED with additional columns. 
> > > These are not officially valid BED files, and as such rtracklayer does 
> > > not yet support them (this will be addressed soon).
> > >
> > > Which brings the question: how soon is soon :-D ?
> > >
> > > Aditya
> > >
> > >
> > > 
> > > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > > Sent: Tuesday, September 17, 2019 1:02 PM
> > > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > > Subject: Re: read_bed()
> > >
> > > Please look at rtracklayer::import()  function that we recommend for 
> > > reading of BAM files along with other common formats.
> > >
> > > Cheers,
> > >
> > >
> > > Lori Shepherd
> > >
> > > Bioconductor Core Team
> > >
> > > Roswel

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Oh :-) - Thank you for explaining!

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:40 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Having a "." in the function name does not make something "S3".
There's no dispatch from import() to import.bed(). Had I not been a
total newb when I created rtracklayer, I would have called the
function importBed() or something like that. Sorry for the confusion.

On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
 wrote:
>
> Oh, superb, thx!
>
> Interesting ... here you use S3 rather than S4 - I wonder about the design 
> intention underlying these choices (I'm asking because I am trying to figure 
> out myself when to use S3 and when to use S4 and whether to mix the two).
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:23 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> The generic documentation does not mention it, but see ?import.bed.
> It's similar to colClasses on read.table().
>
> On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
>  wrote:
> >
> > Thank you Michael,
> >
> > How do I use the extraCols argument? The documentation does not mention an 
> > `extraCols` argument explicitly, so it must be one of the ellipsis 
> > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > extraCols = 10 (ten extra columns) or so?
> >
> > Aditya
> >
> > ________
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:05 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > It breaks it because it's not standard BED; however, using the
> > extraCols= argument should work in this case. Requiring an explicit
> > format specification is intentional, because it provides validation
> > and type safety, and it communicates the format to a future reader.
> > This also looks a bit like a bedPE file, so you might consider using
> > the Pairs data structure.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Michael,
> > >
> > > Yeah, I also noticed that the attachment was eaten when it entered the 
> > > bioc-devel list.
> > >
> > > The file is also accessible in the extdata of the multicrispr:
> > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> > >
> > > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 
> > > 3 (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> > >
> > > I am curious as to why you feel that having additional columns in a 
> > > bedfile would break it?
> > >
> > > Cheers,
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 1:41 PM
> > > To: Bhagwat, Aditya
> > > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > I don't see an attachment, nor can I find the multicrispr package
> > > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > > which was addressed many years ago, so I've updated the documentation.
> > > Broken BED files will never be supported.
> > >
> > > Michael
> > >
> > > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> > >  wrote:
> > > >
> > > > Hi Lori,
> > > >
> > > > I remember now - I tried this function earlier, but it does not work 
> > > > for my bedfiles, like the one attached.
> > > >
> > > > > bedfile  <- system.file('extdata/SRF.bed', package = 
> > > > > 'multicrispr')
> > > > >
> > > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > > 'mm10')
> > > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > > dec,  : scan() expected 'an integer', got 'chr2'
> > > > >
> > > >
> > > > Perhaps th

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-06 Thread Bhagwat, Aditya
Waaw, thanks Michael, that is really clarifying. 


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Friday, September 06, 2019 7:32 PM
To: Bhagwat, Aditya
Cc: Kasper Daniel Hansen; Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

There's never a need to attach a package to satisfy the dependencies
of another package. That would defeat the purpose of namespaces.
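
As a hedged sketch of what this can look like in a package NAMESPACE file (assuming BSgenome, GenomicRanges and GenomeInfoDb are listed under Imports in DESCRIPTION; the exact set of imports multicrispr needs is not known here), importing the class makes its definition visible to setAs() without attaching anything to the user's search path:

```r
# NAMESPACE (fragment; an illustration, not multicrispr's actual file)
importClassesFrom(BSgenome, BSgenome)
importClassesFrom(GenomicRanges, GRanges)
importFrom(GenomeInfoDb, seqinfo)
importFrom(methods, setAs, as)
```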

The three coercion functions get at the heart of the S3/S4 mess.
setAs() provides a dynamic coercion mechanism, so is useful for as(x,
class) when class can be anything. as.data.frame() is an S3 generic
defined by the base package, so every package sees it. Something
promotes as.data.frame() to an S4 generic, but only packages that
import the generic can see it. That likely excludes the vast majority
of CRAN packages. Thus, we define an S3 method for calls to the S3
generic. The S4 generic will fall back to the S3 methods; however, it
will first check all S4 methods. Defining as.data.frame,Vector()
defends against the definition of a method above it, such as
as.data.frame,Annotated(), which would intercept dispatch to the S3
as.data.frame.Vector().
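
The three routes can be tried out on a toy class. The sketch below uses only the methods package; the class name Toy and its slot are hypothetical stand-ins for Vector, and the method bodies are simplified compared with what S4Vectors does.

```r
library(methods)

# Toy S4 class standing in for Vector (hypothetical, for illustration only).
setClass("Toy", representation(values = "numeric"))

# (1) S3 method: found by every caller of the base S3 generic.
as.data.frame.Toy <- function(x, row.names = NULL, optional = FALSE, ...) {
    data.frame(values = x@values, row.names = row.names)
}

# (2) setAs(): supports the dynamic form as(x, "data.frame").
setAs("Toy", "data.frame", function(from) as.data.frame(from))

x <- new("Toy", values = c(1, 2, 3))
df_s3 <- as.data.frame(x)      # dispatches to the S3 method
df_as <- as(x, "data.frame")   # uses the coerce method registered by setAs()

# (3) S4 method: promotes as.data.frame() to an S4 generic in this session.
# S4 methods are checked before the generic falls back to S3 methods.
setMethod("as.data.frame", "Toy",
    function(x, row.names = NULL, optional = FALSE, ...) {
        data.frame(values = x@values, row.names = row.names)
    })
df_s4 <- as.data.frame(x)      # now dispatches to the S4 method
```

In a real package the S4 method would only be seen by code that imports the S4 generic, which is why the S3 method is kept alongside it.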

Michael

On Fri, Sep 6, 2019 at 10:08 AM Bhagwat, Aditya
 wrote:
>
> Thanks Kasper and Michael.
>
>
>
> The importClassesFrom  sounds like something that would allow an 
> attachment-free S4 class import, will check them out.
>
> Michael, the current auto-attach is causing 66 namespace clashes – not 
> feeling very comfortable about that, so trying to avoid them.
>
>
>
> I also think there’s something about S4 coercion that I don’t yet fully 
> understand.
>
> For instance: the S4Vectors package has three different versions of a 
> S4Vectors::Vector -> data.frame coercer. Why? Any useful pointers?
>
>
>
> setAs("Vector", "data.frame", function(from) as.data.frame(from))
>
>
>
> as.data.frame.Vector <- function(x, row.names=NULL, optional=FALSE, ...) {
>
> as.data.frame(x, row.names=NULL, optional=optional, ...)
>
> }
>
>
>
> setMethod("as.data.frame", "Vector",
>
>   function(x, row.names=NULL, optional=FALSE, ...)
>
>   {
>
>   x <- as.vector(x)
>
>   as.data.frame(x, row.names=row.names, optional=optional, ...)
>
>   })
>
>
>
>
>
>
>
> From: Kasper Daniel Hansen 
> Sent: Freitag, 6. September 2019 17:49
> To: Michael Lawrence 
> Cc: Bhagwat, Aditya ; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> BiocGenerics (and others)?
>
>
>
> There are
>
>   importMethodsFrom(PACKAGE, NAME_OF_METHODS)
>
>   importClassesFrom(PACKAGE, NAME_OF_CLASSES)
>
> to help with selectively importing S4 methods/classes.  See section 1.5.6 of 
> WRE.
>
>
>
> On Fri, Sep 6, 2019 at 9:24 AM Michael Lawrence via Bioc-devel 
>  wrote:
>
> It sounds like you are trying to defer loading of namespaces in order
> to save time when they are unnecessary? That's probably going to end
> up a losing battle.
>
> On Fri, Sep 6, 2019 at 5:47 AM Bhagwat, Aditya
>  wrote:
> >
> > Thank you Michael,
> >
> > Appreciate your time for helping me fill the gaps in my understanding of 
> > the S4 flow :-).
> >
> > It all started when I defined (in my multicrispr package) the S4 coercer :
> > methods::setAs( "BSgenome",
> >
> > "GRanges",
> > function(from) as(GenomeInfoDb::seqinfo(from), "GRanges"))
> >
> > When building, I noticed the message
> > in method for 'coerce' with signature '"BSgenome","GRanges"': no definition 
> > for class "BSgenome"
> >
> > So, I added
> > BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
> >
> > That loads all these dependencies.
> > From your answer, I understand that there is currently no alternative to 
> > loading all these dependencies.
> > I guess because these dependencies are needed to provide for all required 
> > S4 methods for the BSgenome class, am I right?
> >
> > Is there a way to define a methods::setAs without loading the class 
> > definition?
> >
> > Aditya
> >
> >
> >
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Friday, September 06, 2019 1:09 PM
> > To: Bhagwat, Aditya
> > Cc: bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> > BiocGenerics 

[Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-06 Thread Bhagwat, Aditya
Dear Bioc devel,

Is it possible to import the BSgenome class without attaching BiocGenerics (to 
keep a clean namespace during the development of 
multicrispr)?

BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')

(Posted earlier on BioC support and 
redirected here following Martin's suggestion)

Thank you :-)

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
Thank you Michael :-)

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 8:49 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files.  I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")
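
A smaller, self-contained variant of the same idea is sketched below. It assumes rtracklayer is installed; the file contents and the extra column names (gene, signal) are invented for illustration and are not the SRF.bed columns. The six standard BED columns are parsed as usual and the two trailing columns are described by extraCols=.

```r
library(rtracklayer)

# Write a small BED6+2 file to a temporary location (hypothetical data).
bed <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200\tpeak1\t0\t+\tgeneA\t1.5",
             "chr2\t300\t450\tpeak2\t0\t-\tgeneB\t2.5"), bed)

# Name and class of each non-standard column, in file order.
gr <- import(bed, format = "BED",
             extraCols = c(gene = "character", signal = "numeric"))

# BED starts are 0-based; the imported GRanges is 1-based.
starts      <- BiocGenerics::start(gr)
extra_names <- names(S4Vectors::mcols(gr))
```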


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that.
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thank you for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder about the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > ____________
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Thank you Michael,
> > >
> > > How do I use the extraCols argument? The documentation does not mention 
> > > an `extraCols` argument explicitly, so it must be one of the ellipsis 
> > > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > > extraCols = 10 (ten extra columns) or so?
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 2:05 PM
> > > To: Bhagwat, Aditya
> > > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > It breaks it because it's not standard BED; however, using the
> > > extraCols= argument should work in this case. Requiring an explicit
> > > format specification is intentional, because it provides validation
> > > and type safety, and it communicates the format to a future reader.
> > > This also looks a bit like a be

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
(Typo corrected to avoid confusion)

Michael,

rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
multicrispr::read_bed().

In order to avoid the overkill of `require(rtracklayer)` for multicrispr 
<https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
sense to import/re-export import.bed() in multicrispr? What is BioC 
convention/best practice in such cases?
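
For reference, one idiom that can be used for this is a roxygen2 re-export. It is shown here only as a sketch of the mechanics, not as an established Bioconductor convention, and it assumes rtracklayer is listed under Imports in DESCRIPTION:

```r
# In an R/ file of the re-exporting package (e.g. multicrispr):
#' @importFrom rtracklayer import.bed
#' @export
rtracklayer::import.bed
```

With this in place, roxygen2 writes the matching importFrom() and export() directives into NAMESPACE, so users can call import.bed() after attaching only the re-exporting package.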

Aditya




From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 8:35 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Thank you Michael :-)

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 8:49 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files. I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that.
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thank you for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder about the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > ________
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Thank you Michael,
> > >
> > > How do I use the extraCols argument? The documentation does not mention 
> > > an `extraCols` argument explicitly, so it must be one of the ellipsis 
> > > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > > extraCols = 10 (ten extra columns) or so?
> > >
> > > Aditya
> > >
> > > 

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
Michael,

rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
multicrispr::read_bed.

In order to avoid the overkill of `require(rtracklayer)` for multicrispr 
<https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
sense to import/re-export read.bed in multicrispr? What is BioC convention/best 
practice in such cases?

Aditya




From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 8:35 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Thank you Michael :-)

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 8:49 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files. I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that.
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thank you for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder about the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > ________
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Thank you Michael,
> > >
> > > How do I use the extraCols argument? The documentation does not mention 
> > > an `extraCols` argument explicitly, so it must be one of the ellipsis 
> > > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > > extraCols = 10 (ten extra columns) or so?
> > >
> > > Aditya
> > >
> > > 
> > >

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
In the end I endeavour to end up with a handful of verbs, with which I can do 
all tasks in a project.

Regarding the BED files: they're basic BED files, with additional metadata 
columns to allow traceback. But for the purpose of multicrispr, there is no 
need to restrict to those files only. Your extraCols suggestion works great 
for me. And for the multicrispr examples, I have removed the metadata cols to 
keep things simple. You were right btw that things went wrong earlier in the 
column stripping process.

Aditya



From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 1:57 PM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Hi Michael,

That's a software design dilemma I've encountered a few times.

One approach is to keep the "verb" functions bare. E.g. read_bed would only 
read a bedfile, and plot_bed would somehow plot it. Advantage: if read_bed 
doesn't depend on anything else, other functions can depend on it, which makes 
dependency handling easier.

Another intention is to make verb functions "intuitive". In that scenario, I 
try for each operation to also output a visual image of the operation, to make 
it easier to see at a glance what is happening. E.g. for the range operations 
in multicrispr, the function plot_intervals visually shows what operation is 
being performed, making it easier to both spot errors as well as maintain focus.

In the case of read_bed, I thought of wrapping around your excellent core-level 
rtracklayer::import(), additionally providing the textual and visual feedback 
which I intend to give.
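
The two approaches need not exclude each other. The base-R sketch below keeps a bare reading verb and a separate reporting verb, and composes them in a thin convenience wrapper. All function names are hypothetical, the reader handles only a three-column BED file, and base read.table() stands in for rtracklayer::import() to keep the example dependency-free.

```r
# Bare verb: only reads, and converts 0-based BED starts to 1-based starts.
read_bed_table <- function(bedfile) {
    bed <- utils::read.table(
        bedfile, sep = "\t", stringsAsFactors = FALSE,
        col.names  = c("chrom", "start", "end"),
        colClasses = c("character", "integer", "integer"))
    bed$start <- bed$start + 1L
    bed
}

# Feedback verb: only reports, and returns its input invisibly.
report_ranges <- function(bed) {
    message(nrow(bed), " ranges on ",
            length(unique(bed$chrom)), " chromosomes")
    invisible(bed)
}

# Convenience wrapper: composes the two, feedback is optional.
read_bed_verbose <- function(bedfile, verbose = TRUE) {
    bed <- read_bed_table(bedfile)
    if (verbose) report_ranges(bed)
    bed
}

# Usage on a small temporary file (hypothetical data).
bedfile <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t100", "chr2\t50\t150"), bedfile)
bed <- read_bed_verbose(bedfile)
```

Other functions can then depend on the bare reader alone, while interactive users get the textual (or, analogously, visual) feedback from the wrapper.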

Interesting to hear your suggestions on this topic, though.

Aditya



From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Wednesday, September 18, 2019 1:33 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I'm not sure if a function called read_bed() should be plotting or
printing. Is your BED file a known BED variant, i.e., maybe there is a
better name for the file type than "bed"?


On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya
 wrote:
>
> Actually,
>
> I will keep multicrispr::read_bed(), but wrap it around 
> rtracklayer::import.bed, and additionally plot and print range summaries.
>
> Aditya
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 11:31 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> (Typo corrected to avoid confusion)
>
> Michael,
>
> rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
> multicrispr::read_bed().
>
> In order to avoid the overkill of `require(rtracklayer)` for multicrispr 
> <https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
> sense to import/re-export import.bed() in multicrispr? What is BioC 
> convention/best practice in such cases?
>
> Aditya
>
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 8:35 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Thank you Michael :-)
>
> Aditya
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 8:49 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I think you probably made a mistake when dropping the columns. When I
> provide the extraCols= argument (inventing my own names for things),
> it almost works, but it breaks due to NAs in the extra columns. The
> "." character is the standard way to express NA in BED files. I've
> added support for extra na.strings to version 1.45.6.
>
> For reference, the call is like:
>
> import("SRF.bed", extraCols=c(chr2="character", start2="integer",
> end2="integer", mDux="factor", type="factor", pos1="integer",
> pos2="integer", strand2="factor", from="factor", n="integer",
> code="character", anno="factor", id="character", biotype="character",
> score2="numeric" ), na.strings="NA")
>
>
> On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > I removed the additi

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
Actually,

I will keep multicrispr::read_bed(), but wrap it around 
rtracklayer::import.bed, and additionally plot and print range summaries.

Aditya


From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 11:31 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

(Typo corrected to avoid confusion)

Michael,

rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
multicrispr::read_bed().

In order to avoid the overkill of `require(rtracklayer)` for multicrispr 
<https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
sense to import/re-export import.bed() in multicrispr? What is BioC 
convention/best practice in such cases?

Aditya




From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 8:35 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Thank you Michael :-)

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 8:49 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files. I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")
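
Since extraCols is described elsewhere in this thread as similar to colClasses in read.table(), the idea can be sketched in base R without rtracklayer. This is only an analogue under stated assumptions: the file content and the extra column names `anno` and `n` are made up for illustration, and `na.strings = "."` follows the remark above that "." is the usual way to write NA in BED files.

```r
# Hypothetical BED-like content: 3 standard columns plus 2 extra columns.
bed_lines <- c(
  "chr1\t100\t200\tpromoter\t3",
  "chr2\t300\t450\t.\t7"
)

# Name and class of every column, in file order (illustrative names).
col_classes <- c(chrom = "character", start = "integer", end = "integer",
                 anno = "character", n = "integer")

bed <- read.table(text = bed_lines, sep = "\t",
                  col.names = names(col_classes),
                  colClasses = col_classes,
                  na.strings = ".",          # treat "." as a missing value
                  stringsAsFactors = FALSE)
str(bed)
```

Declaring the classes up front means a malformed value fails at read time instead of silently turning a numeric column into text.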


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, the multicrispr function read_bed, based on data.table::fread, is 
> doing the job, so I will stick to that.
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thank you for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder about the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
>

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
Hi Michael, 

That's a software design dilemma I've encountered a few times.

One approach is to keep the "verb" functions bare. E.g. read_bed would only 
read a bedfile, and plot_bed would somehow plot it. Advantage: if read_bed 
doesn't depend on anything else, other functions can depend on it, which makes 
dependency handling easier.

Another approach is to make verb functions "intuitive". In that scenario, I 
try for each operation to also output a visual image of the operation, to make 
it easier to see at a glance what is happening. E.g. for the range operations 
in multicrispr, the function plot_intervals visually shows what operation is 
being performed, making it easier to both spot errors as well as maintain focus.

In the case of read_bed, I thought of wrapping around your excellent core-level 
rtracklayer::import(), additionally providing the textual and visual feedback 
which I intend to give.
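
The two design options above can be combined, as in this minimal base-R sketch. All function names here (`read_ranges`, `summarize_ranges`, `read_ranges_verbose`) are hypothetical and are not multicrispr or rtracklayer functions; the three-column file is made up for illustration.

```r
# Bare verb: only reads, so other functions can depend on it freely.
read_ranges <- function(file) {
  read.table(file, sep = "\t", stringsAsFactors = FALSE,
             col.names = c("chrom", "start", "end"),
             colClasses = c("character", "integer", "integer"))
}

# Separate verb: total covered width per chromosome.
summarize_ranges <- function(ranges) {
  widths <- data.frame(chrom = ranges$chrom,
                       width = ranges$end - ranges$start)
  aggregate(width ~ chrom, data = widths, FUN = sum)
}

# Convenience wrapper: composes the two; the feedback can be switched off.
read_ranges_verbose <- function(file, verbose = TRUE) {
  ranges <- read_ranges(file)
  if (verbose) print(summarize_ranges(ranges))
  ranges
}

tmp <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200", "chr1\t500\t650", "chr2\t10\t40"), tmp)
ranges <- read_ranges_verbose(tmp)
```

Keeping the reader free of side effects leaves it reusable, while the wrapper still gives the at-a-glance feedback for interactive use.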

Interesting to hear your suggestions on this topic, though.

Aditya



From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Wednesday, September 18, 2019 1:33 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I'm not sure if a function called read_bed() should be plotting or
printing. Is your BED file a known BED variant, i.e., maybe there is a
better name for the file type than "bed"?


On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya
 wrote:
>
> Actually,
>
> I will keep multicrispr::read_bed(), but wrap it around 
> rtracklayer::import.bed, and additionally plot and print range summaries.
>
> Aditya
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 11:31 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> (Typo corrected to avoid confusion)
>
> Michael,
>
> rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
> multicrispr::read_bed().
>
> In order to avoid the overkill of `require(rtracklayer)` for multicrispr 
> <https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
> sense to import/re-export import.bed() in multicrispr? What is BioC 
> convention/best practice in such cases?
>
> Aditya
>
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 8:35 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Thank you Michael :-)
>
> Aditya
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 8:49 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I think you probably made a mistake when dropping the columns. When I
> provide the extraCols= argument (inventing my own names for things),
> it almost works, but it breaks due to NAs in the extra columns. The
> "." character is the standard way to express NA in BED files. I've
> added support for extra na.strings to version 1.45.6.
>
> For reference, the call is like:
>
> import("SRF.bed", extraCols=c(chr2="character", start2="integer",
> end2="integer", mDux="factor", type="factor", pos1="integer",
> pos2="integer", strand2="factor", from="factor", n="integer",
> code="character", anno="factor", id="character", biotype="character",
> score2="numeric" ), na.strings="NA")
>
>
> On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > I removed the additional metadata columns in SRF.bed
> > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> >
> > But still can't get rtracklayer::import.bed working:
> >
> > > rtracklayer::import.bed(bedfile)
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec, : scan() expected 'a real', got '1.168.595'
> > > bedfile
> > [1] 
> > "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
> >
> > Never mind, the multicrispr function read_bed, based on data.table::fread, is 
> > doing the job, so I will stick to that.
> >
> > Thank you for all feedback,
> >
> > Cheers,
> >
> > Aditya

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Hi Michael,

I removed the additional metadata columns in SRF.bed
https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed

But still can't get rtracklayer::import.bed working:

> rtracklayer::import.bed(bedfile)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : 
scan() expected 'a real', got '1.168.595'
> bedfile
[1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"

Never mind, the multicrispr <https://gitlab.gwdg.de/loosolab/software/multicrispr> 
function read_bed 
<https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/R/03_read.R>, 
based on data.table::fread, is doing the job, so I will stick to that.
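
The value '1.168.595' in the error above looks like a number written with "." as a thousands separator, which a numeric parser rejects. If that reading is right (an assumption, not something confirmed in this thread), one workaround is to read the column as text and strip the separators before converting. The lines below are made-up data that only resemble the problem.

```r
# Hypothetical BED lines: the score column (5th) uses "." as thousands separator.
bed_lines <- c(
  "chr1\t100\t200\tsite1\t1.168.595\t+",
  "chr2\t300\t450\tsite2\t2.500\t-"
)

# Read everything as character first, so nothing fails during parsing.
bed <- read.table(text = bed_lines, sep = "\t", colClasses = "character",
                  col.names = c("chrom", "start", "end",
                                "name", "score", "strand"))

# Assumption: every "." in the score column is a separator, never a decimal point.
bed$score <- as.numeric(gsub(".", "", bed$score, fixed = TRUE))
bed$start <- as.integer(bed$start)
bed$end   <- as.integer(bed$end)
str(bed)
```

This only works when the column never contains genuine decimals; otherwise "2.500" would be ambiguous.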

Thank you for all feedback,

Cheers,

Aditya



From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Tuesday, September 17, 2019 2:48 PM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Oh :-) - Thank you for explaining!

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:40 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Having a "." in the function name does not make something "S3".
There's no dispatch from import() to import.bed(). Had I not been a
total newb when I created rtracklayer, I would have called the
function importBed() or something like that. Sorry for the confusion.
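
The point about dots in function names can be illustrated in base R. The names `ingest` and `ingest2` are hypothetical and have nothing to do with rtracklayer; the sketch only shows that a dot in a name creates no dispatch unless the generic calls UseMethod().

```r
# Two plain functions; the second merely has a dot in its name.
ingest     <- function(con) "ingest() body ran"
ingest.bed <- function(con) "ingest.bed() body ran"

x <- structure(list(), class = "bed")

# ingest() never calls UseMethod(), so ingest.bed() is not reached.
res_plain <- ingest(x)

# S3 dispatch only happens when the generic calls UseMethod().
ingest2     <- function(con) UseMethod("ingest2")
ingest2.bed <- function(con) "ingest2.bed() was dispatched"
res_s3 <- ingest2(x)

print(res_plain)
print(res_s3)
```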

On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
 wrote:
>
> Oh, superb, thx!
>
> Interesting ... here you use S3 rather than S4 - I wonder about the design 
> intention underlying these choices (I'm asking because I am trying to figure 
> out myself when to use S3 and when to use S4 and whether to mix the two).
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:23 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> The generic documentation does not mention it, but see ?import.bed.
> It's similar to colClasses on read.table().
>
> On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
>  wrote:
> >
> > Thank you Michael,
> >
> > How do I use the extraCols argument? The documentation does not mention an 
> > `extraCols` argument explicitly, so it must be one of the ellipsis 
> > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > extraCols = 10 (ten extra columns) or so?
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:05 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > It breaks it because it's not standard BED; however, using the
> > extraCols= argument should work in this case. Requiring an explicit
> > format specification is intentional, because it provides validation
> > and type safety, and it communicates the format to a future reader.
> > This also looks a bit like a bedPE file, so you might consider using
> > the Pairs data structure.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Michael,
> > >
> > > Yeah, I also noticed that the attachment was eaten when it entered the 
> > > bioc-devel list.
> > >
> > > The file is also accessible in the extdata of the multicrispr:
> > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> > >
> > > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 
> > > 3 (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> > >
> > > I am curious as to why you feel that having additional columns in a 
> > > bedfile would break it?
> > >
> > > Cheers,
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 1:41 PM
> > > To: Bhagwat, Aditya
> > > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > I don't see an attachment, nor can I find the multicrispr package
> > > anywhere. The "addressed soon" was referring t

Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

2019-09-16 Thread Bhagwat, Aditya
Michael, actually, such a generic straddle() could be useful:

straddle(leftstart=-100, rightend=100)  # extended range
straddle(leftstart=-100, leftend=-1)   # left flank
straddle(rightstart=1, rightend=100) # right flank
straddle(leftstart=-100, leftend=-1, rightstart=1, rightend=100) # left and 
right flanks

What do you think?
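
A minimal sketch of how the four calls above could behave, written for plain integer start/end vectors rather than GRanges. Everything here is an assumption for illustration: the explicit `start`/`end` arguments, the `part` column, and the argument checks are not part of GenomicRanges, plyranges, or multicrispr, and strand is ignored.

```r
straddle <- function(start, end,
                     leftstart = NULL, leftend = NULL,
                     rightstart = NULL, rightend = NULL) {
  has <- function(x) !is.null(x)

  # Extended range: only the two outermost offsets are given.
  if (has(leftstart) && has(rightend) && !has(leftend) && !has(rightstart)) {
    return(data.frame(part = "extended",
                      start = start + leftstart, end = end + rightend))
  }

  parts <- list()
  if (has(leftstart) || has(leftend)) {
    if (!(has(leftstart) && has(leftend)))
      stop("left flank needs both leftstart and leftend")
    # Left flank offsets are relative to the range start.
    parts[[length(parts) + 1]] <- data.frame(
      part = "left", start = start + leftstart, end = start + leftend)
  }
  if (has(rightstart) || has(rightend)) {
    if (!(has(rightstart) && has(rightend)))
      stop("right flank needs both rightstart and rightend")
    # Right flank offsets are relative to the range end.
    parts[[length(parts) + 1]] <- data.frame(
      part = "right", start = end + rightstart, end = end + rightend)
  }
  if (length(parts) == 0) stop("no offsets supplied")
  do.call(rbind, parts)
}

ext   <- straddle(1000, 2000, leftstart = -100, rightend = 100)
left  <- straddle(1000, 2000, leftstart = -100, leftend = -1)
right <- straddle(1000, 2000, rightstart = 1, rightend = 100)
both  <- straddle(1000, 2000, leftstart = -100, leftend = -1,
                  rightstart = 1, rightend = 100)
```

The explicit checks show where the incompatible-argument handling has to live, which is the main cost of folding all four cases into one verb.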

Aditya


From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Monday, September 16, 2019 10:30 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

Hmm no that wouldn't work, it would become messy trying to figure out when 
incompatible arguments are provided.

Aditya



From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Monday, September 16, 2019 10:09 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

Hi Michael,

Thank you for the pointer to plyranges - looks very useful!

> Maybe a better name is "straddle" for when ranges
> straddle one of the endpoints? In keeping with the current pattern of
> Ranges API, there would be a single function: straddle(x, side, left,
> right, ignore.strand=FALSE). So straddle(x, "start", -100, 10) would
> be like promoters(x, 100, 10) for a positive or "*" strand range.

Cool suggestion, and a really fitting verb :-)
Just slightly modifying your suggestion makes the API fully generic (waaw!), 
generalizing over left_flank, right_flank, as well as slop:

straddle(leftstart, leftend, rightstart, rightend)

Would it be worth having such functionality in GenomicRanges or plyranges, 
rather than multicrispr <https://gitlab.gwdg.de/loosolab/software/multicrispr>?


> That brings up strandedness, which needs to be considered here. For
> unstranded ranges, it may be that direct start() and end()
> manipulation is actually more transparent than a special verb.

I ended up using left/right for unstranded, and up/down for stranded operations.


> The functions that involve reduce() wouldn't fit into the intrarange
> operations, as they are summarizing ranges, not transforming them.
> They may be going too far.

True. Actually, the functions would be cleaner without the reduce(), I think 
I'll take that out.

Cheers,

Aditya


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

2019-09-16 Thread Bhagwat, Aditya
Hmm no that wouldn't work, it would become messy trying to figure out when 
incompatible arguments are provided. 

Aditya



From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Monday, September 16, 2019 10:09 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

Hi Michael,

Thank you for the pointer to plyranges - looks very useful!

> Maybe a better name is "straddle" for when ranges
> straddle one of the endpoints? In keeping with the current pattern of
> Ranges API, there would be a single function: straddle(x, side, left,
> right, ignore.strand=FALSE). So straddle(x, "start", -100, 10) would
> be like promoters(x, 100, 10) for a positive or "*" strand range.

Cool suggestion, and a really fitting verb :-)
Just slightly modifying your suggestion makes the API fully generic (waaw!), 
generalizing over left_flank, right_flank, as well as slop:

straddle(leftstart, leftend, rightstart, rightend)

Would it be worth having such functionality in GenomicRanges or plyranges, 
rather than multicrispr <https://gitlab.gwdg.de/loosolab/software/multicrispr>?


> That brings up strandedness, which needs to be considered here. For
> unstranded ranges, it may be that direct start() and end()
> manipulation is actually more transparent than a special verb.

I ended up using left/right for unstranded, and up/down for stranded operations.


> The functions that involve reduce() wouldn't fit into the intrarange
> operations, as they are summarizing ranges, not transforming them.
> They may be going too far.

True. Actually, the functions would be cleaner without the reduce(), I think 
I'll take that out.

Cheers,

Aditya


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



Re: [Bioc-devel] Extending GenomicRanges::`intra-range-methods`

2019-09-16 Thread Bhagwat, Aditya
Hi Michael,

Thank you for the pointer to plyranges - looks very useful!

> Maybe a better name is "straddle" for when ranges
> straddle one of the endpoints? In keeping with the current pattern of
> Ranges API, there would be a single function: straddle(x, side, left,
> right, ignore.strand=FALSE). So straddle(x, "start", -100, 10) would
> be like promoters(x, 100, 10) for a positive or "*" strand range.

Cool suggestion, and a really fitting verb :-)
Just slightly modifying your suggestion makes the API fully generic (waaw!), 
generalizing over left_flank, right_flank, as well as slop:

straddle(leftstart, leftend, rightstart, rightend)

Would it be worth having such functionality in GenomicRanges or plyranges, 
rather than multicrispr?


> That brings up strandedness, which needs to be considered here. For
> unstranded ranges, it may be that direct start() and end()
> manipulation is actually more transparent than a special verb.

I ended up using left/right for unstranded, and up/down for stranded operations.


> The functions that involve reduce() wouldn't fit into the intrarange
> operations, as they are summarizing ranges, not transforming them.
> They may be going too far.

True. Actually, the functions would be cleaner without the reduce(), I think 
I'll take that out.

Cheers,

Aditya


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-06 Thread Bhagwat, Aditya
Thanks Kasper and Michael.

importClassesFrom sounds like something that would allow an 
attachment-free S4 class import; I will check it out.
Michael, the current auto-attach is causing 66 namespace clashes 
<https://support.bioconductor.org/p/124442/>. I am not very 
comfortable with that, so I am trying to avoid them.

I also think there’s something about S4 coercion that I don’t yet fully 
understand.
For instance: the S4Vectors package has three different versions of a 
S4Vectors::Vector -> data.frame coercer. Why? Any useful pointers?

setAs("Vector", "data.frame", function(from) as.data.frame(from))

as.data.frame.Vector <- function(x, row.names=NULL, optional=FALSE, ...) {
as.data.frame(x, row.names=NULL, optional=optional, ...)
}

setMethod("as.data.frame", "Vector",
  function(x, row.names=NULL, optional=FALSE, ...)
  {
  x <- as.vector(x)
  as.data.frame(x, row.names=row.names, optional=optional, ...)
  })
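
One common reading of why three definitions can coexist (not confirmed in this thread) is that each serves a different entry point: as(x, "data.frame") uses the setAs() coercion, code calling the base S3 generic finds the S3 method, and code calling the S4 generic finds the S4 method. A minimal sketch using only the methods package, with a hypothetical class `Track` that is unrelated to S4Vectors:

```r
library(methods)

# Hypothetical S4 class, for illustration only.
setClass("Track", representation(pos = "integer"))

# 1. S4 method: turn as.data.frame into an S4 generic, then add a method.
setGeneric("as.data.frame")
setMethod("as.data.frame", "Track",
          function(x, row.names = NULL, optional = FALSE, ...) {
            data.frame(pos = x@pos)
          })

# 2. S3 method: reached when code calls the base S3 generic directly.
as.data.frame.Track <- function(x, row.names = NULL, optional = FALSE, ...) {
  as.data.frame(x, row.names = row.names, optional = optional, ...)
}

# 3. Coercion method: reached via as(x, "data.frame").
setAs("Track", "data.frame", function(from) as.data.frame(from))

trk <- new("Track", pos = 1:3)
via_s4 <- as.data.frame(trk)        # S4 generic defined above
via_s3 <- base::as.data.frame(trk)  # base S3 generic
via_as <- as(trk, "data.frame")     # setAs() coercion
```

All three routes end in the same S4 method, so the conversion logic is written once.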



From: Kasper Daniel Hansen 
Sent: Friday, 6 September 2019 17:49
To: Michael Lawrence 
Cc: Bhagwat, Aditya ; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics 
(and others)?

There are
  importMethodsFrom(PACKAGE, NAME_OF_METHODS)
  importClassesFrom(PACKAGE, NAME_OF_CLASSES)
to help with selectively importing S4 methods/classes.  See section 1.5.6 of WRE.
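
As a rough sketch, a NAMESPACE fragment along these lines would make the BSgenome and GRanges class definitions visible to a setAs() call without attaching the packages on the search path. The choice of symbols is an assumption based on the coercer quoted in this thread; whether it removes the build message in multicrispr is not confirmed here, and the packages would also need to be listed under Imports in DESCRIPTION.

```r
# NAMESPACE (fragment, illustrative only)
importClassesFrom(BSgenome, BSgenome)      # class used in the setAs() signature
importClassesFrom(GenomicRanges, GRanges)  # target class of the coercion
importFrom(GenomeInfoDb, seqinfo)          # only the accessor that is needed
importFrom(methods, setAs, as)
```

Importing loads the namespaces but does not attach them, so the user's search path stays unchanged.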

On Fri, Sep 6, 2019 at 9:24 AM Michael Lawrence via Bioc-devel 
<bioc-devel@r-project.org> wrote:
It sounds like you are trying to defer loading of namespaces in order
to save time when they are unnecessary? That's probably going to end
up a losing battle.

On Fri, Sep 6, 2019 at 5:47 AM Bhagwat, Aditya
<aditya.bhag...@mpi-bn.mpg.de> wrote:
>
> Thank you Michael,
>
> Appreciate your time for helping me fill the gaps in my understanding of the 
> S4 flow :-).
>
> It all started when I defined (in my multicrispr package) the S4 coercer :
> methods::setAs("BSgenome",
>                "GRanges",
>                function(from) as(GenomeInfoDb::seqinfo(from), "GRanges"))
>
> When building, I noticed the message
> in method for 'coerce' with signature '"BSgenome","GRanges"': no definition 
> for class "BSgenome"
>
> So, I added
> BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
>
> That loads all these dependencies.
> From your answer, I understand that there is currently no alternative to 
> loading all these dependencies.
> I guess because these dependencies are needed to provide for all required S4 
> methods for the BSgenome class, am I right?
>
> Is there a way to define a methods::setAs without loading the class 
> definition?
>
> Aditya
>
>
>
>
> 
> From: Michael Lawrence 
> [lawrence.mich...@gene.com]
> Sent: Friday, September 06, 2019 1:09 PM
> To: Bhagwat, Aditya
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
> BiocGenerics (and others)?
>
> The way to keep a "clean namespace" is to selectively import symbols
> into your namespace, not to import _nothing_ into your namespace.
> Otherwise, your code will fill with namespace qualifications that
> distract from what is more important to communicate: the intent of the
> code. And no, there's no way to define method signatures using
> anything other than simple class names.
>
> It would be interesting to explore alternative ways of specifying
> method signatures. One way would be if every package exported a "class
> reference" (class name with package attribute, at least) for each of
> its classes. Those could be treated like any other exported object,
> and referenced via namespace qualification. It would require major
> changes to the methods package but that should probably happen anyway
> to support disambiguation when two packages define a class of the same
> name. It would be nice to get away from the exportClasses() and
> importClasses() stuff. File that under the "rainy year" category.
>
> Michael
>
> On Fri, Sep 6, 2019 at 3:39 AM Bhagwat, Aditya
> <aditya.bhag...@mpi-bn.mpg.de> wrote:
> >
> > Dear Bioc devel,
> >
> > Is it possible to import the BSgenome class without attaching BiocGenerics 
> > (to keep a clean namespace during the development of 
> > multicrispr <https://gitlab.gwdg.de/loosolab/software/multicrispr>)?
> >
> > BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')
> >
> > (Posted earlier on BioC support<https://support.bioconductor.or

Re: [Bioc-devel] From Biostring matching to short read mapping

2019-11-11 Thread Bhagwat, Aditya
True :-)

From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Éric Fournier 
[fournier.eri...@crchudequebec.ulaval.ca]
Sent: Saturday, November 09, 2019 5:12 PM
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] From Biostring matching to short read mapping

Hi,

it might be worthwhile to note that the concern about different chromosome 
sizes only applies if you have more workers than chromosomes. If you're running 
on 2-8 threads, the longer chromosome might hold up a thread while another 
processes two short ones.

Cheers,
-Eric




Date: Fri, 8 Nov 2019 18:19:27 +
From: "Pages, Herve" 
To: "Bhagwat, Aditya" ,
"bioc-devel@r-project.org" 
Subject: Re: [Bioc-devel] From Biostring matching to short read
mapping
Message-ID: <84550bd2-9ded-04a3-6ef6-52746c66f...@fredhutch.org>
Content-Type: text/plain; charset="windows-1252"

Hi Aditya,

Should not be too hard to parallelize. With some gotchas: using one
worker per chromosome (which is the easy way to go) wouldn't be optimal
because of the size differences between the chromosomes. So a better
approach is to try to give each worker the same amount of work by
splitting the set of chromosomes in groups of more or less equal sizes.
The split can either preserve full chromosomes or break them in smaller
pieces. The latter will allow using a lot more workers than the former.
I'll try to come up with some code that I'll share here.
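
In the meantime, the balancing idea described above can be sketched in base R. This is not the code promised above, only an illustration under stated assumptions: a greedy heuristic (largest chromosome first, always to the least-loaded worker), a hypothetical helper name `balance_groups`, and rough, illustrative chromosome sizes in Mb.

```r
# Greedy balancing: assign the largest remaining chromosome to the worker
# with the smallest total so far. Keeps whole chromosomes together.
balance_groups <- function(sizes, n_workers) {
  load  <- numeric(n_workers)
  group <- integer(length(sizes))
  for (i in order(sizes, decreasing = TRUE)) {
    w <- which.min(load)
    group[i] <- w
    load[w]  <- load[w] + sizes[i]
  }
  split(names(sizes), factor(group, levels = seq_len(n_workers)))
}

# Illustrative sizes in Mb (assumed values, not taken from any BSgenome object).
sizes <- c(chr1 = 195, chr2 = 182, chr3 = 160, chr4 = 157, chr5 = 152,
           chr6 = 150, chr7 = 145, chrX = 171, chrY = 92)

groups <- balance_groups(sizes, n_workers = 3)
loads  <- sapply(groups, function(g) sum(sizes[g]))
print(groups)
print(loads)
```

Each group could then be handed to one worker; splitting chromosomes into smaller pieces, as mentioned above, would allow finer balancing but needs care at the piece boundaries.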

BTW the *PDict() family in Biostrings is for finding the matches of a
collection of patterns. You say you want to find "all genomic
(mis)matches of a 23-bp candidate Cas9 sequence". Any reason you're not
using vmatchPattern() (or vcountPattern()) for that?

Cheers,
H.


On 11/7/19 02:11, Bhagwat, Aditya wrote:
> Dear bioc-devel,
>
> multicrispr <https://gitlab.gwdg.de/loosolab/software/multicrispr> provides
> functions for Crispr/Cas9 gRNA design (and is being prepared for BioC).
> One task involves finding all genomic (mis)matches of a 23-bp candidate
> Cas9 sequence. Currently this is done with `Biostrings::vcountPDict`, an
> approach that is successful, though not fast. An alternative would be to
> switch to short read mapping rather than (Bio)string matching, which
> involves a one-time indexing effort, but subsequent fast alignment.
>
> `Rsubread::align` seems to be limited to max. 16 `nBestLocations`,
> whereas I know from vcountPDict that some Cas9 candidates have hundreds
> of genomic matches.
>
> `QuasR::qAlign` (connecting to Bowtie) does not mention an upper limit
> on `maxHits`.
>
> Feedback request
>
> Michael, would QuasR/(R)bowtie be a good approach to do this?
>
> Wei, did I overlook a way to do this with Rsubread?
>
> Herve, is there an elegant way to speed up vcountPDict (parallelize?)
>
> Thank you :-)
>
> Aditya
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] From Biostring matching to short read mapping

2019-11-10 Thread Bhagwat, Aditya
Thank you Wei,

I actually love Rsubread, and use it with much appreciation on RNAseq projects, 
thank you for its creation :-)

Cheers,

Aditya



From: Wei Shi [s...@wehi.edu.au]
Sent: Saturday, November 09, 2019 12:02 PM
To: Bhagwat, Aditya; Pages, Herve; bioc-devel@r-project.org
Cc: Michael Stadler (michael.stad...@fmi.ch)
Subject: Re: From Biostring matching to short read mapping

Hi Aditya,

Yes, you are correct that Subread reports no more than 16 alignments per read. 
One reason for this limitation is that Subread detects indels in the read 
(Bowtie does not detect indels), so it has to limit the number of 
candidate locations considered to keep the computational cost manageable.

Thanks for considering Subread and good luck for your project.

Wei


From: Bhagwat, Aditya 
Sent: Saturday, November 9, 2019 8:06 pm
To: Pages, Herve; bioc-devel@r-project.org
Cc: Wei Shi; Michael Stadler (michael.stad...@fmi.ch)
Subject: RE: From Biostring matching to short read mapping

Thank you Michael, I got Rbowtie working and am now functionalizing it for use within 
multicrispr. I noticed that in QuasR, you actually create a package with bowtie 
indices which you then use for future purposes. Interesting workflow, I think I 
will make use of that functionality.

Thank you Herve. Yes, parallelizing would speed things up. I use `vcountPDict` 
because I want to do the off-target analysis for a set of 23 bp Cas9 sites. 
I thought vcountPDict must be more efficient than looping, but maybe this is only 
marginally so: I noticed there's an sapply underlying vcountPDict. Is there a 
BSgenome way to parallelize, like a parallel bsapply or so?

And Rsubread, I concluded, is really limited to only a small number of 
co-alignments, and so is not suited for off-target analysis.

Cheers,

Aditya


From: Pages, Herve [hpa...@fredhutch.org]
Sent: Friday, November 08, 2019 7:19 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Cc: Wei Shi (s...@wehi.edu.au); Michael Stadler (michael.stad...@fmi.ch)
Subject: Re: From Biostring matching to short read mapping

Hi Aditya,

Should not be too hard to parallelize. With some gotchas: using one
worker per chromosome (which is the easy way to go) wouldn't be optimal
because of the size differences between the chromosomes. So a better
approach is to try to give each worker the same amount of work by
splitting the set of chromosomes in groups of more or less equal sizes.
The split can either preserve full chromosomes or break them in smaller
pieces. The latter will allow using a lot more workers than the former.
I'll try to come up with some code that I'll share here.
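
A minimal sketch of the balanced split described above, in base R only. It is not the code promised in this message: the helper name `balance_chromosomes` and the chromosome lengths are hypothetical, and the greedy "largest first, to the least-loaded worker" rule is just one simple way to get groups of roughly equal size while preserving full chromosomes.

```r
# Hypothetical helper: assign whole chromosomes to workers so that the total
# sequence length per worker is roughly equal (greedy, largest first).
balance_chromosomes <- function(chr_lengths, n_workers) {
    stopifnot(is.numeric(chr_lengths), !is.null(names(chr_lengths)),
              n_workers >= 1)
    load   <- numeric(n_workers)          # bases assigned to each worker
    groups <- vector("list", n_workers)   # chromosome names per worker
    for (chr in names(sort(chr_lengths, decreasing = TRUE))) {
        w <- which.min(load)              # least-loaded worker so far
        groups[[w]] <- c(groups[[w]], chr)
        load[w] <- load[w] + chr_lengths[[chr]]
    }
    list(groups = groups, load = load)
}

# Illustrative lengths only (not real genome data)
chr_lengths <- c(chrA = 100, chrB = 90, chrC = 60,
                 chrD = 50,  chrE = 40, chrF = 20)
assignment <- balance_chromosomes(chr_lengths, n_workers = 3)
assignment$groups
assignment$load
```

Each element of `assignment$groups` could then be handed to one worker of a parallel apply. Breaking chromosomes into smaller pieces, the second option mentioned above, would additionally need overlapping chunk boundaries so that matches spanning a boundary are not lost.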

BTW the *PDict() family in Biostrings is for finding the matches of a
collection of patterns. You say you want to find "all genomic
(mis)matches of a 23-bp candidate Cas9 sequence". Any reason you're not
using vmatchPattern() (or vcountPattern()) for that?

Cheers,
H.


On 11/7/19 02:11, Bhagwat, Aditya wrote:
> Dear bioc-devel,
>
> multicrispr <https://gitlab.gwdg.de/loosolab/software/multicrispr> provides
> functions for Crispr/Cas9 gRNA design (and is being prepared for BioC).
> One task involves finding all genomic (mis)matches of a 23-bp candidate
> Cas9 sequence. Currently this is done with `Biostrings::vcountPDict`, an
> approach that is successful, though not fast. An alternative would be to
> switch to short read mapping rather than (Bio)string matching, which
> involves a one-time indexing effort, but subsequent fast alignment.
>
> `Rsubread::align` seems to be limited to max. 16 `nBestLocations`,
> whereas I know from vcountPDict that some Cas9 candidates have hundreds
> of genomic matches.
>
> `QuasR::qAlign` (connecting to Bowtie) does not mention an upper limit
> on `maxHits`.
>
> Feedback request...
>
> Michael, would QuasR/(R)bowtie be a good approach to do this?
>
> Wei, did I overlook a way to do this with Rsubread?
>
> Herve, is there an elegant way to speed up vcountPDict (parallelize?)
>
> Thank you :)
>
> Aditya
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] From Biostring matching to short read mapping

2019-11-09 Thread Bhagwat, Aditya
Thank you Michael, I got Rbowtie working and am now wrapping it in functions 
for use within multicrispr. I noticed that in QuasR you actually create a 
package with bowtie indices, which you then reuse later. Interesting workflow, 
I think I will make use of that functionality.

Thank you Herve. Yes, parallelizing would speed things up. I use `vcountPDict` 
because I want to do the off-target analysis for a set of 23-bp Cas9 sites. 
I assumed vcountPDict would be more efficient than looping, but maybe it is 
only marginally so: I noticed there is an sapply underlying vcountPDict. Is 
there a BSgenome way to parallelize, like a parallel bsapply or so?

And Rsubread, I concluded, is really limited to only a small number of 
co-alignments, and so not suited for off-target analysis.

Cheers,

Aditya


From: Pages, Herve [hpa...@fredhutch.org]
Sent: Friday, November 08, 2019 7:19 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Cc: Wei Shi (s...@wehi.edu.au); Michael Stadler (michael.stad...@fmi.ch)
Subject: Re: From Biostring matching to short read mapping

Hi Aditya,

Should not be too hard to parallelize. With some gotchas: using one
worker per chromosome (which is the easy way to go) wouldn't be optimal
because of the size differences between the chromosomes. So a better
approach is to try to give each worker the same amount of work by
splitting the set of chromosomes in groups of more or less equal sizes.
The split can either preserve full chromosomes or break them in smaller
pieces. The latter will allow using a lot more workers than the former.
I'll try to come up with some code that I'll share here.

BTW the *PDict() family in Biostrings is for finding the matches of a
collection of patterns. You say you want to find "all genomic
(mis)matches of a 23-bp candidate Cas9 sequence". Any reason you're not
using vmatchPattern() (or vcountPattern()) for that?

Cheers,
H.


On 11/7/19 02:11, Bhagwat, Aditya wrote:
> Dear bioc-devel,
>
> multicrispr <https://gitlab.gwdg.de/loosolab/software/multicrispr> provides
> functions for Crispr/Cas9 gRNA design (and is being prepared for BioC).
> One task involves finding all genomic (mis)matches of a 23-bp candidate
> Cas9 sequence. Currently this is done with `Biostrings::vcountPDict`, an
> approach that is successful, though not fast. An alternative would be to
> switch to short read mapping rather than (Bio)string matching, which
> involves a one-time indexing effort, but subsequent fast alignment.
>
> `Rsubread::align` seems to be limited to max. 16 `nBestLocations`,
> whereas I know from vcountPDict that some Cas9 candidates have hundreds
> of genomic matches.
>
> `QuasR::qAlign` (connecting to Bowtie) does not mention an upper limit
> on `maxHits`.
>
> Feedback request…
>
> Michael, would QuasR/(R)bowtie be a good approach to do this?
>
> Wei, did I overlook a way to do this with Rsubread?
>
> Herve, is there an elegant way to speed up vcountPDict (parallelize?)
>
> Thank you :)
>
> Aditya
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] From Biostring matching to short read mapping

2019-11-07 Thread Bhagwat, Aditya
Dear bioc-devel,

multicrispr provides 
functions for Crispr/Cas9 gRNA design (and is being prepared for BioC). One 
task involves finding all genomic (mis)matches of a 23-bp candidate Cas9 
sequence. Currently this is done with `Biostrings::vcountPDict`, an approach 
that is successful, though not fast. An alternative would be to switch to short 
read mapping rather than (Bio)string matching, which involves a one-time 
indexing effort, but subsequent fast alignment.

`Rsubread::align` seems to be limited to max. 16 `nBestLocations`, whereas I 
know from vcountPDict that some Cas9 candidates have hundreds of genomic 
matches.

`QuasR::qAlign` (connecting to Bowtie) does not mention an upper limit on 
`maxHits`.

Feedback request...

Michael, would QuasR/(R)bowtie be a good approach to do this?
Wei, did I overlook a way to do this with Rsubread?
Herve, is there an elegant way to speed up vcountPDict (parallelize?)

Thank you :)

Aditya



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] plyranges group_by

2019-10-16 Thread Bhagwat, Aditya
Hi Stuart, Michael,

Your plyranges package is really cool - now I am using it for left joining 
GRanges (I am facing a minor issue there, but that is not the topic of this 
email - I have been asked by Lori not to double-post :-)).

This email is about the plyranges functionality for grouping GRanges.
That is cool, but I found it to be not so performant for large numbers of 
ranges.
My R session hangs when I do:

bedfile <- paste0('https://gitlab.gwdg.de/loosolab/software/multicrispr/wikis',
  '/uploads/a51e98516c1e6b71441f5b5a5f741fa1/SRF.bed')
srfranges <- rtracklayer::import.bed(bedfile, genome = 'mm10')
txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
generanges <- GenomicFeatures::genes(txdb)
annotatedsrf <- plyranges::join_overlap_left(srfranges, generanges)
plyranges::group_by(annotatedsrf, seqnames, start, end, strand)

For my purposes, I worked around it by performing a groupby in data.table:

data.table::as.data.table(annotatedsrf)[
    !is.na(gene_id),
    gene_id := paste0(gene_id, collapse = ';'),
    by = c('seqnames', 'start', 'end', 'strand')]

And I was wondering, in general, whether it would be useful to have a 
data.table-based backend for plyranges::group_by().
And whether all of this is actually a non-issue caused by my improper use of 
plyranges::group_by.

Thank you for feedback :-)

Aditya



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] (no subject)

2019-10-22 Thread Bhagwat, Aditya
Dear Michael,

# This works
rtracklayer::import.bed('SRF.bed', genome = 'mm10') # this works

# But this doesn't
seqinfo1<- 
GenomeInfoDb::seqinfo(BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10)
rtracklayer::import.bed('SRF.bed', genome = seqinfo1)

# Neither does
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] plyranges group_by

2019-10-17 Thread Bhagwat, Aditya
Thank you Michael,

Attached is the example file, since I noticed you were unable to download it 
from gitlab.
Will continue the discussion there, then :-)

Aditya


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Thursday, October 17, 2019 11:45 AM
To: Bhagwat, Aditya
Cc: Stuart Lee; Michael Lawrence; bioc-devel@r-project.org
Subject: Re: plyranges group_by

I replied on the support site. Let's move the discussion there.

On Thu, Oct 17, 2019 at 1:24 AM Bhagwat, Aditya 
<aditya.bhag...@mpi-bn.mpg.de> wrote:
Thank you Stuart and Michael for your feedback.

Stuart, in response to your request for more context regarding my use case, I 
have updated my recent BioC support 
post<https://support.bioconductor.org/p/125623/>, now providing all use-case 
details.

Michael, I didn't try selfmatch yet, but Stuart's reply seems to suggest that 
it would not match the data.table performance (which is literally instantaneous).

As a general question, do you think it would be useful to add a 
data.table-based split-apply-combine functionality to plyranges (such that end 
user operations remain on GRanges-only)? I wouldn't mind writing a function to 
do that (in github), but first need your feedback as to whether you think that 
would be useful :-)

Aditya



From: Stuart Lee [le...@wehi.edu.au]
Sent: Thursday, October 17, 2019 3:01 AM
To: Michael Lawrence
Cc: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: plyranges group_by

Currently, the way grouping indices are generated is pretty slow if you’re 
doing stuff rowwise. Michael’s suggestion of using selfmatch should speed 
things up a bit. What are you planning to do after grouping? I’ve found there’s 
usually a way to do stuff without rowwise grouping, but it really depends on 
what you’re after. Re your other issue, would you mind putting it on as a 
GitHub issue?
—
Stuart Lee
Visiting PhD Student - Ritchie Lab



On 16 Oct 2019, at 22:54, Michael Lawrence 
<lawrence.mich...@gene.com> wrote:

Just a note that in this particular case, selfmatch(annotatedsrf) would be a 
fast way to generate a grouping vector, like plyranges::group_by(annotatedsrf, 
selfmatch(annotatedsrf)).
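
To illustrate what such a grouping vector looks like, here is a small base R sketch on a toy data.frame. It assumes that selfmatch(x) behaves like match(x, x), i.e. that each element is mapped to the index of its first identical occurrence; the column values and gene ids below are made up for the example.

```r
# Toy stand-in for overlap-joined ranges: one row per (range, gene) pair
ranges <- data.frame(
    seqnames = c("chr1", "chr1", "chr2", "chr1"),
    start    = c(10L, 10L, 5L, 30L),
    end      = c(20L, 20L, 9L, 40L),
    strand   = c("+", "+", "-", "+"),
    gene_id  = c("g1", "g2", "g3", NA),
    stringsAsFactors = FALSE
)

# Grouping vector: index of the first row with identical coordinates
key   <- paste(ranges$seqnames, ranges$start, ranges$end, ranges$strand,
               sep = ":")
group <- match(key, key)

# Collapse gene ids within each group; groups without any gene stay NA
collapsed <- tapply(ranges$gene_id, group, function(ids) {
    ids <- ids[!is.na(ids)]
    if (length(ids)) paste(ids, collapse = ";") else NA_character_
})
```

Rows 1 and 2 share coordinates and therefore share a group, which is the same collapse that the data.table workaround in the quoted message performs with `by = c('seqnames', 'start', 'end', 'strand')`.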

Michael

On Wed, Oct 16, 2019 at 2:48 AM Bhagwat, Aditya 
<aditya.bhag...@mpi-bn.mpg.de> wrote:
Hi Stuart, Michael,

Your plyranges package is really cool - now I am using it for left joining 
GRanges (I am facing a minor issue 
there<https://support.bioconductor.org/p/125623/>, but that is not the topic of 
this email - I have been asked by Lori not to double-post :-)).

This email is about the plyranges functionality for grouping GRanges.
That is cool, but I found it to be not so performant for large numbers of 
ranges.
My R session hangs when I do:

bedfile <- paste0('https://gitlab.gwdg.de/loosolab/software/multicrispr/wikis',
  '/uploads/a51e98516c1e6b71441f5b5a5f741fa1/SRF.bed')
srfranges <- rtracklayer::import.bed(bedfile, genome = 'mm10')
txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
generanges <- GenomicFeatures::genes(txdb)
annotatedsrf <- plyranges::join_overlap_left(srfranges, generanges)
plyranges::group_by(annotatedsrf, seqnames, start, end, strand)

For my purposes, I worked around it by performing a groupby in data.table:

data.table::as.data.table(annotatedsrf)[
    !is.na(gene_id),
    gene_id := paste0(gene_id, collapse = ';'),
    by = c('seqnames', 'start', 'end', 'strand')]

And I was wondering, in general, whether it would be useful to have a 
data.table-based backend for plyranges::group_by().
And whether all of this is actually a non-issue caused by my improper use of 
plyranges::group_by.

Thank you for feedback :-)

Aditya




--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube


___

The information in this email is confidential and intended solely for the 
addressee.
You must not disclose, forward, print or use it without the permission of the 
sender.

The Walter and Eliza Hall Institute acknowledges the Wurundjeri people of the 
Kulin
Nation as the traditional owners of the land where our campuses are located and
the continuing connection to country and community.
___


--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] plyranges group_by

2019-10-17 Thread Bhagwat, Aditya
Thank you Stuart and Michael for your feedback.

Stuart, in response to your request for more context regarding my use case, I 
have updated my recent BioC support 
post<https://support.bioconductor.org/p/125623/>, now providing all use-case 
details.

Michael, I didn't try selfmatch yet, but Stuart's reply seems to suggest that 
it would not match the data.table performance (which is literally instantaneous).

As a general question, do you think it would be useful to add a 
data.table-based split-apply-combine functionality to plyranges (such that end 
user operations remain on GRanges-only)? I wouldn't mind writing a function to 
do that (in github), but first need your feedback as to whether you think that 
would be useful :-)

Aditya



From: Stuart Lee [le...@wehi.edu.au]
Sent: Thursday, October 17, 2019 3:01 AM
To: Michael Lawrence
Cc: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: plyranges group_by

Currently, the way grouping indices are generated is pretty slow if you’re 
doing stuff rowwise. Michael’s suggestion of using selfmatch should speed 
things up a bit. What are you planning to do after grouping? I’ve found there’s 
usually a way to do stuff without rowwise grouping, but it really depends on 
what you’re after. Re your other issue, would you mind putting it on as a 
GitHub issue?
--
Stuart Lee
Visiting PhD Student - Ritchie Lab



On 16 Oct 2019, at 22:54, Michael Lawrence 
<lawrence.mich...@gene.com> wrote:

Just a note that in this particular case, selfmatch(annotatedsrf) would be a 
fast way to generate a grouping vector, like plyranges::group_by(annotatedsrf, 
selfmatch(annotatedsrf)).

Michael

On Wed, Oct 16, 2019 at 2:48 AM Bhagwat, Aditya 
<aditya.bhag...@mpi-bn.mpg.de> wrote:
Hi Stuart, Michael,

Your plyranges package is really cool - now I am using it for left joining 
GRanges (I am facing a minor issue 
there<https://support.bioconductor.org/p/125623/>, but that is not the topic of 
this email - I have been asked by Lori not to double-post :-)).

This email is about the plyranges functionality for grouping GRanges.
That is cool, but I found it to be not so performant for large numbers of 
ranges.
My R session hangs when I do:

bedfile <- paste0('https://gitlab.gwdg.de/loosolab/software/multicrispr/wikis',
  '/uploads/a51e98516c1e6b71441f5b5a5f741fa1/SRF.bed')
srfranges <- rtracklayer::import.bed(bedfile, genome = 'mm10')
txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
generanges <- GenomicFeatures::genes(txdb)
annotatedsrf <- plyranges::join_overlap_left(srfranges, generanges)
plyranges::group_by(annotatedsrf, seqnames, start, end, strand)

For my purposes, I worked around it by performing a groupby in data.table:

data.table::as.data.table(annotatedsrf)[
    !is.na(gene_id),
    gene_id := paste0(gene_id, collapse = ';'),
    by = c('seqnames', 'start', 'end', 'strand')]

And I was wondering, in general, whether it would be useful to have a 
data.table-based backend for plyranges::group_by().
And whether all of this is actually a non-issue caused by my improper use of 
plyranges::group_by.

Thank you for feedback :-)

Aditya




--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Converting gene ids to GRanges - ensembl centric TxDb missing for human

2019-10-15 Thread Bhagwat, Aditya
Dear BioC devel,

I want to convert geneids to GRanges by doing:
GenomicFeatures::genes(txdb)[geneids]

Works wonderfully for mouse, with Entrez-gene-centric as well as Ensembl-gene-centric TxDbs:
txdb <- 
TxDb.Mmusculus.UCSC.mm10.knownGene::TxDb.Mmusculus.UCSC.mm10.knownGene
GenomicFeatures::genes(txdb)[c('19600', '99889', '99982')]

txdb <- 
TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
GenomicFeatures::genes(txdb)[c('ENSMUSG001', 
'ENSMUSG003')]

For human, however, Ensembl-centric TxDbs seem to be missing:
txdb <- 
TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene
GenomicFeatures::genes(txdb)[c('1', '10', '100')]

   # No TxDb.Hsapiens.UCSC.hg38.ensGene::TxDb.Hsapiens.UCSC.hg38.ensGene

Has this been a (perhaps recent) design choice to no longer offer the 
Ensembl-centric TxDbs?

(The larger context of this question is the development of multicrispr 
(https://gitlab.gwdg.de/loosolab/software/multicrispr))

Thank you for feedback!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] rtracklayer::import.bed(genome = Seqinfo)

2019-10-22 Thread Bhagwat, Aditya
Dear Michael,

Sorry for my incomplete email - the send button got hit too fast. Better this 
time.

rtracklayer::import.bed mentions the argument "genome" to be either a genome 
identifier (like 'mm10') or a Seqinfo object.

I notice that the second option does not work on my BED file (attached).

# This works
rtracklayer::import.bed('SRF.bed', genome = 'mm10') # this works

# But this doesn't
seqinfo1<- 
GenomeInfoDb::seqinfo(BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10)
rtracklayer::import.bed('SRF.bed', genome = seqinfo1)

So I am requesting feedback.
I thought to use this channel

Aditya
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] bsapply and vcountPattern: sapply -> bplapply

2020-01-02 Thread Bhagwat, Aditya
Dear Herve & co,

Wish you a happy new year :-).

I am optimizing code that calls BSgenome::vcountPDict.
BSgenome::vcountPDict calls bsapply, which calls sapply.

Could this sapply be replaced by a bplapply instead? That would turn all the 
vcount, vmatch, etc. operations into parallelized operations with just a 
minimal intervention (sapply -> bplapply).

Or is there a reason why this was not done, or perhaps is there an alternative 
paradigm which you use for speeding up things?
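
As a rough sketch of the sapply -> parallel-lapply swap asked about above, the following uses `parallel::mclapply` from base R as a stand-in for bplapply. The wrapper `par_sapply` and the toy function `count_gc` are hypothetical and say nothing about how bsapply is actually implemented; the point is only that a parallel lapply returns a list, so the result has to be simplified afterwards to mimic sapply.

```r
library(parallel)

# Hypothetical drop-in: like sapply(), but the per-element work goes through
# a parallel lapply and the list result is simplified afterwards.
par_sapply <- function(X, FUN, ..., cores = 2L) {
    # mclapply relies on forking, which is unavailable on Windows
    if (.Platform$OS.type == "windows") cores <- 1L
    res <- mclapply(X, FUN, ..., mc.cores = cores)
    simplify2array(res)
}

# Toy per-sequence computation standing in for a per-chromosome count
seqs <- c(chrA = "ACGTACGT", chrB = "GGGGCC", chrC = "ATAT")
count_gc <- function(s) sum(strsplit(s, "")[[1]] %in% c("G", "C"))

serial_counts   <- sapply(seqs, count_gc)
parallel_counts <- par_sapply(seqs, count_gc)
```

Whether this pays off in practice depends on how expensive each element is relative to the cost of shipping data to the workers, which for whole chromosomes can be substantial.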

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] How to import a setAs method in one's package?

2020-05-08 Thread Bhagwat, Aditya
Dear BioC compatriots,

How does one import a setAs method (which is generally not exported)?
I in particular need to import as(., DataFrame) and am a bit puzzled on how to 
do this.

Thank you!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to import a setAs method in one's package?

2020-05-08 Thread Bhagwat, Aditya
Looks like this can be successfully done by

@importFrom   methods   as
@importFrom   S4Vectors   DataFrame

Cheers,

Aditya


From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Friday, May 08, 2020 11:04 AM
To: bioc-devel@r-project.org
Subject: [Bioc-devel] How to import a setAs method in one's package?

Dear BioC compatriots,

How does one import a setAs method (which is generally not exported)?
I in particular need to import as(., DataFrame) and am a bit puzzled on how to 
do this.

Thank you!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to import a setAs method in one's package?

2020-05-09 Thread Bhagwat, Aditya
Thank you guys for your quick and helpful feedback!

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Friday, May 08, 2020 7:35 PM
To: Bhagwat, Aditya
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] How to import a setAs method in one's package?

setAs() defines methods on the coerce() generic, so you could write
importMethodsFrom(pkg, coerce). However, as() searches the namespace
associated with class(from) for coerce() methods. As long as that
namespace hasn't selectively imported methods on coerce(), it should
end up using the global table and find the appropriate method.

It's best to just importFrom() a generic to get access to its global
table, or import() the entire package. So Hervé's suggestion of
import(methods) is the right way to go.
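
The relationship between setAs(), coerce() and as() described above can be seen in a small self-contained sketch that uses only the methods package. The class `Celsius` and its coercion are hypothetical and exist only for this illustration.

```r
library(methods)

# Hypothetical class used only for this illustration
setClass("Celsius", representation(degrees = "numeric"))

# setAs() registers a method on the coerce() generic for (from, to)
setAs("Celsius", "character", function(from) paste0(from@degrees, " C"))

temp  <- new("Celsius", degrees = 21.5)
label <- as(temp, "character")   # as() looks up the coerce() method above

# TRUE if a coerce() method is registered for exactly this signature
registered <- existsMethod("coerce",
                           signature(from = "Celsius", to = "character"))
```

In a package, the corresponding NAMESPACE directive would be `import(methods)`, as recommended above; no separate export or import of the individual coercion method is needed for as() to find it.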

Michael.

On Fri, May 8, 2020 at 2:07 AM Bhagwat, Aditya
 wrote:
>
> Dear BioC compatriots,
>
> How does one import a setAs method (which is generally not exported)?
> I in particular need to import as(., DataFrame) and am a bit puzzled on how 
> to do this.
>
> Thank you!
>
> Aditya
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] multicrispr: builds fine on all test platforms except one (nebbiolo)

2020-10-14 Thread Bhagwat, Aditya
Dear BiocCore,

multicrispr is building fine on all test platforms except one (nebbiolo).
Could the build error be arising from some limitation of the test platform 
rather than the multicrispr package?

Thank you for your feedback!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in bioc-devel?

2020-07-07 Thread Bhagwat, Aditya
It's the BSgenome to GenomeDescription coercer that seems to be missing - is 
this on purpose?

# bsgenomeName(BSgenomeObj) FAILS
#---
bsname <- GenomeInfoDb::bsgenomeName(bsgenome)
index_genome(bsgenome, indexedgenomesdir = tempdir())
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 
'bsgenomeName': no method or default for coercing "BSgenome" to 
"GenomeDescription"

# as(bsgenome, 'GenomeDescription') also FAILS
#
bsname <- GenomeInfoDb::bsgenomeName(as(bsgenome, 'GenomeDescription'))  # ALSO FAILS
index_genome(bsgenome, indexedgenomesdir = tempdir())
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 
'bsgenomeName': no method or default for coercing "BSgenome" to 
"GenomeDescription"

Aditya


From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Tuesday, July 07, 2020 10:22 AM
To: bioc-devel@r-project.org
Subject: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in bioc-devel?

Dear bioc-devel,

multicrispr is having an error on the bioc-devel build 
machines<https://bioconductor.org/checkResults/3.12/bioc-LATEST/multicrispr/malbec1-checksrc.html>,
 caused by:

unable to find an inherited method for function 'bsgenomeName' for signature 
'"BSgenome"

This is a bit strange, because normally a BSgenome object gets automatically 
converted to a GenomeDescription object before being sent to the method 
bsgenomeName. In bioc-devel, for some reason this mechanism seems to be broken. 
Is it on purpose? What would be the best fix/patch?

Right now, I'm checking whether explicit coercion fixes things:
bsname <- GenomeInfoDb::bsgenomeName(bsgenome)  # FAILS
bsname <- GenomeInfoDb::bsgenomeName(as(bsgenome, 'GenomeDescription'))  # WORKS? (VERIFYING)

Thank you for feedback :-)

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in bioc-devel?

2020-07-07 Thread Bhagwat, Aditya
Sorry, my earlier email had a copy/paste error - corrected now

From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Tuesday, July 07, 2020 10:50 AM
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in 
bioc-devel?

It's the BSgenome to GenomeDescription coercer that seems to be missing - is 
this on purpose?

# bsgenomeName(BSgenomeObj) FAILS
#---
bsname <- GenomeInfoDb::bsgenomeName(bsgenome)
index_genome(bsgenome, indexedgenomesdir = tempdir())
Error in h(simpleError(msg, call)) :
  unable to find an inherited method for function 'bsgenomeName' for signature 
'"BSgenome"

# as(bsgenome, 'GenomeDescription') also FAILS
#
bsname <- GenomeInfoDb::bsgenomeName(as(bsgenome, 'GenomeDescription'))  # ALSO FAILS
index_genome(bsgenome, indexedgenomesdir = tempdir())
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 
'bsgenomeName': no method or default for coercing "BSgenome" to 
"GenomeDescription"

Aditya


From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Tuesday, July 07, 2020 10:22 AM
To: bioc-devel@r-project.org
Subject: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in bioc-devel?

Dear bioc-devel,

multicrispr is having an error on the bioc-devel build 
machines<https://bioconductor.org/checkResults/3.12/bioc-LATEST/multicrispr/malbec1-checksrc.html>,
 caused by:

unable to find an inherited method for function 'bsgenomeName' for signature 
'"BSgenome"

This is a bit strange, because normally a BSgenome object gets automatically 
converted to a GenomeDescription object before being sent to the method 
bsgenomeName. In bioc-devel, for some reason this mechanism seems to be broken. 
Is it on purpose? What would be the best fix/patch?

Right now, I'm checking whether explicit coercion fixes things:
bsname <- GenomeInfoDb::bsgenomeName(bsgenome)  # FAILS
bsname <- GenomeInfoDb::bsgenomeName(as(bsgenome, 'GenomeDescription'))  # WORKS? (VERIFYING)

Thank you for feedback :-)

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in bioc-devel?

2020-07-07 Thread Bhagwat, Aditya
Dear bioc-devel,

multicrispr is having an error on the bioc-devel build machines, caused by:

unable to find an inherited method for function 'bsgenomeName' for signature 
'"BSgenome"

This is a bit strange, because normally a BSgenome object gets automatically 
converted to a GenomeDescription object before being sent to the method 
bsgenomeName. In bioc-devel, for some reason this mechanism seems to be broken. 
Is it on purpose? What would be the best fix/patch?
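
The mechanism referred to here, where a method defined for a parent class is also used for objects of a subclass, can be sketched with plain S4 classes. The classes `Description` and `Genome` and the generic `pkg_name` below are hypothetical stand-ins, not the actual GenomeInfoDb/BSgenome definitions.

```r
library(methods)

# Hypothetical stand-ins for a parent class and a class that extends it
setClass("Description", representation(pkgname = "character"))
setClass("Genome", contains = "Description")

# The generic only has a method for the parent class
setGeneric("pkg_name", function(x) standardGeneric("pkg_name"))
setMethod("pkg_name", "Description", function(x) x@pkgname)

g <- new("Genome", pkgname = "BSgenome.Toy")
via_inheritance <- pkg_name(g)                     # inherited method is used
via_coercion    <- pkg_name(as(g, "Description"))  # explicit upcast, same result
```

Both calls only work because `Genome` is declared with `contains = "Description"`. If that relationship were dropped, one would expect failures of the kind reported in this thread: no inherited method for the first call, and no coercion available for the second.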

Right now, I'm checking whether explicit coercion fixes things:
bsname <- GenomeInfoDb::bsgenomeName(bsgenome)  # FAILS
bsname <- GenomeInfoDb::bsgenomeName(as(bsgenome, 'GenomeDescription'))  # WORKS? (VERIFYING)

Thank you for feedback :-)

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in bioc-devel?

2020-07-07 Thread Bhagwat, Aditya
Thank you Herve!

I guess that means that I can pull the fix in a day or two, is that right?

Aditya

From: Hervé Pagès [hpa...@fredhutch.org]
Sent: Tuesday, July 07, 2020 11:45 AM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in 
bioc-devel?

Hi Aditya,

This change was not intended, sorry. Should be fixed in BSgenome 1.57.4.

Cheers,
H.

On 7/7/20 01:53, Bhagwat, Aditya wrote:
> Sorry, my earlier email had a copy/paste error - corrected now
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, July 07, 2020 10:50 AM
> To: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in 
> bioc-devel?
>
> It's the BSgenome to GenomeDescription coercer that seems to be missing - is 
> this on purpose?
>
> # bsgenomeName(BSgenomeObj) FAILS
> #---
> bsname <- GenomeInfoDb::bsgenomeName(bsgenome)
> index_genome(bsgenome, indexedgenomesdir = tempdir())
> Error in h(simpleError(msg, call)) :
>unable to find an inherited method for function 'bsgenomeName' for 
> signature '"BSgenome"
>
> # as(bsgenome, 'GenomeDescription') also FAILS
> #
> bsname <- GenomeInfoDb::bsgenomeName(as(bsgenome, 'GenomeDescription'))  # ALSO FAILS
> index_genome(bsgenome, indexedgenomesdir = tempdir())
> Error in h(simpleError(msg, call)) :
>error in evaluating the argument 'x' in selecting a method for function 
> 'bsgenomeName': no method or default for coercing "BSgenome" to 
> "GenomeDescription"
>
> Aditya
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, July 07, 2020 10:22 AM
> To: bioc-devel@r-project.org
> Subject: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in bioc-devel?
>
> Dear bioc-devel,
>
> multicrispr is having an error on the bioc-devel build 
> machines<https://bioconductor.org/checkResults/3.12/bioc-LATEST/multicrispr/malbec1-checksrc.html>,
>  caused by:
>
> unable to find an inherited method for function 'bsgenomeName' for signature 
> '"BSgenome"
>
> This is a bit strange, because normally a BSgenome object gets automatically 
> converted to a GenomeDescription object before being sent to the method 
> bsgenomeName. In bioc-devel, for some reason this mechanism seems to be 
> broken. Is it on purpose? What would be the best fix/patch?
>
> Right now, I'm checking whether making the coercion explicit fixes things:
> bsname <- GenomeInfoDb::bsgenomeName(bsgenome)  # FAILS
> bsname <- GenomeInfoDb::bsgenomeName(as(bsgenome, 'GenomeDescription'))  # WORKS? (VERIFYING)
>
> Thank you for feedback :-)
>
> Aditya
>
>  [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in bioc-devel?

2020-07-07 Thread Bhagwat, Aditya
Fantastic - thank you Hervé!

Aditya

From: Hervé Pagès [hpa...@fredhutch.org]
Sent: Tuesday, July 07, 2020 11:55 AM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] bsGenomeName(BSgenomeObject) disappeared in 
bioc-devel?

On 7/7/20 02:50, Hervé Pagès wrote:
>
>
> On 7/7/20 02:46, Bhagwat, Aditya wrote:
>> Thank you Hervé!
>>
>> I guess that means that I can pull the fix in a day or two, is that
>> right?
>
> Yes, as soon as BSgenome 1.57.4 propagates, which should take between
> 24h and 48h.

To be clear: BSgenome 1.57.4 will become available for installation with
BiocManager::install("BSgenome") only in a couple of days but you can
get it now by installing directly from GitHub with
BiocManager::install("Bioconductor/BSgenome").

H.

>
> H.
>
>>
>> Aditya

[Bioc-devel] fatal: 'upstream/RELEASE_3_11' is not a commit and a branch 'RELEASE_3_11' cannot be created from it

2020-06-11 Thread Bhagwat, Aditya
Dear Bioc Core,

Thank you for accepting multicrispr on BioC.
I was going through the instructions on how to sync existing repositories.
All instructions up to point 9 work.

Point 10, however, fails when creating the release branch:
$ git checkout -b RELEASE_3_11 upstream/RELEASE_3_11
fatal: 'upstream/RELEASE_3_11' is not a commit and a branch 'RELEASE_3_11' 
cannot be created from it

I wonder what I am doing wrong (seems like it should be something trivial, but 
I am not seeing it).

Thank you for helping!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] fatal: 'upstream/RELEASE_3_11' is not a commit and a branch 'RELEASE_3_11' cannot be created from it

2020-06-11 Thread Bhagwat, Aditya
Thank you Lori, Nitesh,

So what you are saying is that

  *   upstream/RELEASE_*_* is created on your end with the next BioC release.
  *   The git error message arises because upstream/RELEASE_3_11 does not exist 
yet, so there is nothing to create the local RELEASE_3_11 branch from.

The points under http://bioconductor.org/developers/how-to/git/faq/ #14 did 
work for me, but the above explanation makes sense.
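
A minimal sketch of how one could guard against this error, assuming the 
Bioconductor remote is called `upstream` as in the sync instructions. The 
function name `checkout_release_branch` is hypothetical and not part of any 
Bioconductor tooling; it only creates the local branch when the remote 
branch actually exists.

```shell
# Hypothetical helper: create a local release branch only if the remote
# already has a branch of that name (remote name is an assumption).
checkout_release_branch() {
    remote="$1"
    branch="$2"

    # Refresh the remote-tracking refs first.
    git fetch "$remote" || return 1

    # rev-parse --verify --quiet exits non-zero when the ref is missing.
    if git rev-parse --verify --quiet "refs/remotes/$remote/$branch" >/dev/null; then
        git checkout -b "$branch" "$remote/$branch"
    else
        echo "No branch '$branch' on remote '$remote' yet." >&2
        return 2
    fi
}
```

Called as `checkout_release_branch upstream RELEASE_3_11`, this returns 
status 2 for a package that exists in devel only, instead of the fatal git 
error.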

Thanks!

Aditya



From: Shepherd, Lori [lori.sheph...@roswellpark.org]
Sent: Thursday, June 11, 2020 6:18 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] fatal: 'upstream/RELEASE_3_11' is not a commit and a 
branch 'RELEASE_3_11' cannot be created from it

Newly accepted packages are added into devel only.  They will get a release 
branch as soon as the next scheduled Bioconductor release occurs.


Lori Shepherd

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Bhagwat, 
Aditya 
Sent: Thursday, June 11, 2020 12:15 PM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] fatal: 'upstream/RELEASE_3_11' is not a commit and a 
branch 'RELEASE_3_11' cannot be created from it

Dear Bioc Core,

Thank you for accepting multicrispr on 
BioC<https://github.com/Bioconductor/Contributions/issues/1486>.
I was going through the instructions on how to sync existing 
repositories<https://bioconductor.org/developers/how-to/git/sync-existing-repositories/>.
All instructions up to point 9 work.

Point 10, however, fails when creating the release branch:
$ git checkout -b RELEASE_3_11 upstream/RELEASE_3_11
fatal: 'upstream/RELEASE_3_11' is not a commit and a branch 'RELEASE_3_11' 
cannot be created from it

I wonder what I am doing wrong (seems like it should be something trivial, but 
I am not seeing it).

Thank you for helping!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


This email message may contain legally privileged and/or confidential 
information. If you are not the intended recipient(s), or the employee or agent 
responsible for the delivery of this message to the intended recipient(s), you 
are hereby notified that any disclosure, copying, distribution, or use of this 
email message is prohibited. If you have received this message in error, please 
notify the sender immediately by e-mail and delete this email message from your 
computer. Thank you.

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] fatal: 'upstream/RELEASE_3_11' is not a commit and a branch 'RELEASE_3_11' cannot be created from it

2020-06-12 Thread Bhagwat, Aditya
Thank you Lori,

Is there a way to get a visual overview of recent commits at 
https://git.bioconductor.org/packages/multicrispr? (as in e.g. github)
Asking because I cannot see recent commits (pushed to upstream successfully 
yesterday) reflected in build 
reports<https://bioconductor.org/checkResults/3.12/bioc-LATEST/> or landing 
page<https://bioconductor.org/packages/devel/bioc/html/multicrispr.html>.
A related question: the CI/CD feature during package submission (master branch 
push to github triggers build attempt on the BioC servers) was actually very 
useful, but I can understand that you don't provide this for accepted packages 
to keep the load on your servers manageable.
What CI/CD setup would you recommend (from the perspective of being maximally 
relevant for BioC)?

Travis?

Thank you for helping!

Aditya


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] fatal: 'upstream/RELEASE_3_11' is not a commit and a branch 'RELEASE_3_11' cannot be created from it

2020-06-12 Thread Bhagwat, Aditya
Owkies, thx!

Aditya

From: Shepherd, Lori 
Sent: Friday, 12 June 2020 17:42
To: Bhagwat, Aditya ; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] fatal: 'upstream/RELEASE_3_11' is not a commit and a 
branch 'RELEASE_3_11' cannot be created from it

If you git clone the git.bioconductor.org repository for your package in a 
different folder location you would be able to see the copy and commits we have 
on our server (I understand not very convenient)
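
A rough sketch of that suggestion, assuming read access to the repository 
URL. The function name `show_server_commits` and the default of ten commits 
are illustrative choices, not an official workflow.

```shell
# Hypothetical helper: clone the server copy into a scratch directory,
# print its most recent commits, then remove the scratch clone again.
show_server_commits() {
    url="$1"
    count="${2:-10}"

    scratch=$(mktemp -d) || return 1
    git clone --quiet "$url" "$scratch/repo" || { rm -rf "$scratch"; return 1; }

    # One line per commit, with branch and tag names shown next to the hash.
    git -C "$scratch/repo" log --oneline --decorate -n "$count"
    status=$?

    rm -rf "$scratch"
    return "$status"
}
```

For example: `show_server_commits https://git.bioconductor.org/packages/multicrispr`. 
In a working copy that already has the `upstream` remote configured, 
`git fetch upstream` followed by `git log upstream/master` should show the 
same history without a second clone.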

There is a delay from commits to updates on the building reports and landing 
page.  It is discussed at the top of the page and has been answered numerous 
times on the mailing list

http://bioconductor.org/developers/how-to/troubleshoot-build-report/

We are in the process of potentially expanding the submission process tracker 
building on commit to the daily builder.  There is still no time period on when 
that would be available as it has just started being explored.

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263

[Bioc-devel] autonomics: warnings only on tokay2

2021-02-09 Thread Bhagwat, Aditya
Dear Bioc-devel,

Only on tokay2 (windows server), autonomics is giving warnings:
http://bioconductor.org/spb_reports/autonomics_buildreport_20210209130019.html

It looks like something strange is happening on tokay2, or have I overlooked 
something?

Thank you for feedback!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] autonomics DOI link inactive

2021-03-25 Thread Bhagwat, Aditya
Dear bioc-devel,

Thank you for having autonomics on bioc-devel.
The DOI link seems to be inactive; is this normal?
https://doi.org/doi:10.18129/B9.bioc.autonomics

Thank you,

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Converting gene ids to GRanges - ensembl centric TxDb missing for human

2019-10-15 Thread Bhagwat, Aditya via Bioc-devel
Thank you Lori,

Cheers,

Aditya



From: Shepherd, Lori [lori.sheph...@roswellpark.org]
Sent: Tuesday, October 15, 2019 2:22 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: Converting gene ids to GRanges - ensembl centric TxDb missing for 
human

Again we would not recommend posting these types of questions to both the 
mailing list and the support site.
Since this is not a developer question -  the support site was the appropriate 
place -
https://support.bioconductor.org/p/125609/

We have members of the team working on developing the new TxDbs for the release 
and will look into the reasoning and post on the support site thread.



Lori Shepherd

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Bhagwat, 
Aditya 
Sent: Tuesday, October 15, 2019 8:15 AM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] Converting gene ids to GRanges - ensembl centric TxDb 
missing for human

Dear BioC devel,

I want to convert geneids to GRanges by doing:
GenomicFeatures::genes(txdb)[geneids]

Works wonderfully for mouse, with entrezgene- as well as ensemblgene-centric TxDbs:
txdb <- 
TxDb.Mmusculus.UCSC.mm10.knownGene::TxDb.Mmusculus.UCSC.mm10.knownGene
GenomicFeatures::genes(txdb)[c('19600', '99889', '99982')]

txdb <- 
TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
GenomicFeatures::genes(txdb)[c('ENSMUSG001', 
'ENSMUSG003')]

For human, however, Ensembl-centric TxDbs seem to be missing:
txdb <- 
TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene
GenomicFeatures::genes(txdb)[c('1', '10', '100')]

   # No TxDb.Hsapiens.UCSC.hg38.ensGene::TxDb.Hsapiens.UCSC.hg38.ensGene

Has this been a (perhaps recent) design choice to no longer offer the 
Ensembl-centric TxDbs?

(The larger context of this question is the development of multicrispr 
(https://gitlab.gwdg.de/loosolab/software/multicrispr))

Thank you for feedback!

Aditya

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel