Re: [Bioc-devel] read_bed()

2019-09-18 Thread Michael Lawrence via Bioc-devel
I'd suggest separating the operations on the data from the interface.
You can have both, one layer for programming and another for
interactive analysis.

On Wed, Sep 18, 2019 at 5:05 AM Bhagwat, Aditya
 wrote:
>
> In the end I endeavour to end up with a handful of verbs, with which I can do 
> all tasks in a project.
>
> Regarding the BED files: they're basic bed files, with additional metadata 
> columns to allow traceback. But for the purpose of multicrispr, non need to 
> restrict to those files only. You extraCols works great for me. And for 
> multicrispr examples, I have removed the metadata cols to keep things simple. 
> You were right btw that things went wrong earlier in the column stripping 
> process.
>
> Aditya
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 1:57 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Hi Michael,
>
> That's a software design dilemma I've encountered a few times.
>
> One approach is to keep the "verb" functions bare. E.g. read_bed would only 
> read a bedfile, and plot_bed would somehow plot it. Advantage: if read_bed 
> doesn't depend on anything else, other functions can depend on it, which 
> makes dependency handling easier.
>
> Another intention is to make verb functions "intuitive". In that scenario, I 
> try for each operation to also output a visual image of the operation, to 
> make it easier to see at a glance what is happening. E.g. for the range 
> operations in multicrispr, the function plot_intervals visually shows what 
> operation is being performed, making it easier to both spot errors as well as 
> maintain focus.
>
> In the case of read_bed, I thought of wrapping around your excellent 
> core-level rtracklayer::import(), additionally providing the textual and 
> visual feedback which I intent to give.
>
> Interesting to hear your suggestions on this topic, though.
>
> Aditya
>
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Wednesday, September 18, 2019 1:33 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I'm not sure if a function called read_bed() should be plotting or
> printing. Is your BED file a known BED variant, i.e., maybe there is a
> better name for the file type than "bed"?
>
>
> On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya
>  wrote:
> >
> > Actually,
> >
> > I will keep multicrispr::read_bed(), but wrap it around 
> > rtracklayer::import.bed, and additionally plot and print range summaries.
> >
> > Aditya
> >
> > ____
> > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> > Aditya [aditya.bhag...@mpi-bn.mpg.de]
> > Sent: Wednesday, September 18, 2019 11:31 AM
> > To: Michael Lawrence
> > Cc: bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > (Typo corrected to avoid confusion)
> >
> > Michael,
> >
> > rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
> > multicrispr::read_bed().
> >
> > In order to avoid the overkill of `require(tracklayer)` for multicrispr 
> > <https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
> > sense to import/re-export import.bed() in multicrispr? What is BioC 
> > convention/best practice in such cases?
> >
> > Aditya
> >
> >
> >
> > 
> > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> > Aditya [aditya.bhag...@mpi-bn.mpg.de]
> > Sent: Wednesday, September 18, 2019 8:35 AM
> > To: Michael Lawrence
> > Cc: bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > Thank you Michael :-)
> >
> > Aditya
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 8:49 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > I think you probably made a mistake when dropping the columns. When I
> > provide the extraCols= argument (inventing my own names for things),
> > it almost works, but it breaks due to NAs in the extra columns. The
> > "." character is the standard way to expres

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
In the end I endeavour to end up with a handful of verbs, with which I can do 
all tasks in a project.

Regarding the BED files: they're basic bed files, with additional metadata 
columns to allow traceback. But for the purpose of multicrispr, non need to 
restrict to those files only. You extraCols works great for me. And for 
multicrispr examples, I have removed the metadata cols to keep things simple. 
You were right btw that things went wrong earlier in the column stripping 
process.

Aditya



From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 1:57 PM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Hi Michael,

That's a software design dilemma I've encountered a few times.

One approach is to keep the "verb" functions bare. E.g. read_bed would only 
read a bedfile, and plot_bed would somehow plot it. Advantage: if read_bed 
doesn't depend on anything else, other functions can depend on it, which makes 
dependency handling easier.

Another intention is to make verb functions "intuitive". In that scenario, I 
try for each operation to also output a visual image of the operation, to make 
it easier to see at a glance what is happening. E.g. for the range operations 
in multicrispr, the function plot_intervals visually shows what operation is 
being performed, making it easier to both spot errors as well as maintain focus.

In the case of read_bed, I thought of wrapping around your excellent core-level 
rtracklayer::import(), additionally providing the textual and visual feedback 
which I intent to give.

Interesting to hear your suggestions on this topic, though.

Aditya



From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Wednesday, September 18, 2019 1:33 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I'm not sure if a function called read_bed() should be plotting or
printing. Is your BED file a known BED variant, i.e., maybe there is a
better name for the file type than "bed"?


On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya
 wrote:
>
> Actually,
>
> I will keep multicrispr::read_bed(), but wrap it around 
> rtracklayer::import.bed, and additionally plot and print range summaries.
>
> Aditya
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 11:31 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> (Typo corrected to avoid confusion)
>
> Michael,
>
> rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
> multicrispr::read_bed().
>
> In order to avoid the overkill of `require(tracklayer)` for multicrispr 
> <https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
> sense to import/re-export import.bed() in multicrispr? What is BioC 
> convention/best practice in such cases?
>
> Aditya
>
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 8:35 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Thank you Michael :-)
>
> Aditya
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 8:49 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I think you probably made a mistake when dropping the columns. When I
> provide the extraCols= argument (inventing my own names for things),
> it almost works, but it breaks due to NAs in the extra columns. The
> "." character is the standard way to express NA in BED files. I've
> added support for extra na.strings to version 1.45.6.
>
> For reference, the call is like:
>
> import("SRF.bed", extraCols=c(chr2="character", start2="integer",
> end2="integer", mDux="factor", type="factor", pos1="integer",
> pos2="integer", strand2="factor", from="factor", n="integer",
> code="character", anno="factor", id="character", biotype="character",
> score2="numeric" ), na.strings="NA")
>
>
> On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > I removed the additi

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
Hi Michael, 

That's a software design dilemma I've encountered a few times.

One approach is to keep the "verb" functions bare. E.g. read_bed would only 
read a bedfile, and plot_bed would somehow plot it. Advantage: if read_bed 
doesn't depend on anything else, other functions can depend on it, which makes 
dependency handling easier.

Another intention is to make verb functions "intuitive". In that scenario, I 
try for each operation to also output a visual image of the operation, to make 
it easier to see at a glance what is happening. E.g. for the range operations 
in multicrispr, the function plot_intervals visually shows what operation is 
being performed, making it easier to both spot errors as well as maintain focus.

In the case of read_bed, I thought of wrapping around your excellent core-level 
rtracklayer::import(), additionally providing the textual and visual feedback 
which I intent to give.

Interesting to hear your suggestions on this topic, though.

Aditya



From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Wednesday, September 18, 2019 1:33 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I'm not sure if a function called read_bed() should be plotting or
printing. Is your BED file a known BED variant, i.e., maybe there is a
better name for the file type than "bed"?


On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya
 wrote:
>
> Actually,
>
> I will keep multicrispr::read_bed(), but wrap it around 
> rtracklayer::import.bed, and additionally plot and print range summaries.
>
> Aditya
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 11:31 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> (Typo corrected to avoid confusion)
>
> Michael,
>
> rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
> multicrispr::read_bed().
>
> In order to avoid the overkill of `require(tracklayer)` for multicrispr 
> <https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
> sense to import/re-export import.bed() in multicrispr? What is BioC 
> convention/best practice in such cases?
>
> Aditya
>
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 8:35 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Thank you Michael :-)
>
> Aditya
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 8:49 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I think you probably made a mistake when dropping the columns. When I
> provide the extraCols= argument (inventing my own names for things),
> it almost works, but it breaks due to NAs in the extra columns. The
> "." character is the standard way to express NA in BED files. I've
> added support for extra na.strings to version 1.45.6.
>
> For reference, the call is like:
>
> import("SRF.bed", extraCols=c(chr2="character", start2="integer",
> end2="integer", mDux="factor", type="factor", pos1="integer",
> pos2="integer", strand2="factor", from="factor", n="integer",
> code="character", anno="factor", id="character", biotype="character",
> score2="numeric" ), na.strings="NA")
>
>
> On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > I removed the additional metadata columns in SRF.bed
> > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> >
> > But still can't get rtracklayer::import.bed working:
> >
> > > rtracklayer::import.bed(bedfile)
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec, : scan() expected 'a real', got '1.168.595'
> > > bedfile
> > [1] 
> > "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
> >
> > Never mind, multicrispr function read_bed, based on data.table::fread is 
> > doing the job, so I will stick to that .
> >
> > Thank you for all feedback,
> >
> > Cheers,
> >
> > Aditya
> &

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Michael Lawrence via Bioc-devel
I'm not sure if a function called read_bed() should be plotting or
printing. Is your BED file a known BED variant, i.e., maybe there is a
better name for the file type than "bed"?


On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya
 wrote:
>
> Actually,
>
> I will keep multicrispr::read_bed(), but wrap it around 
> rtracklayer::import.bed, and additionally plot and print range summaries.
>
> Aditya
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 11:31 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> (Typo corrected to avoid confusion)
>
> Michael,
>
> rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
> multicrispr::read_bed().
>
> In order to avoid the overkill of `require(tracklayer)` for multicrispr 
> <https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
> sense to import/re-export import.bed() in multicrispr? What is BioC 
> convention/best practice in such cases?
>
> Aditya
>
>
>
> 
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Wednesday, September 18, 2019 8:35 AM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Thank you Michael :-)
>
> Aditya
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 8:49 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I think you probably made a mistake when dropping the columns. When I
> provide the extraCols= argument (inventing my own names for things),
> it almost works, but it breaks due to NAs in the extra columns. The
> "." character is the standard way to express NA in BED files. I've
> added support for extra na.strings to version 1.45.6.
>
> For reference, the call is like:
>
> import("SRF.bed", extraCols=c(chr2="character", start2="integer",
> end2="integer", mDux="factor", type="factor", pos1="integer",
> pos2="integer", strand2="factor", from="factor", n="integer",
> code="character", anno="factor", id="character", biotype="character",
> score2="numeric" ), na.strings="NA")
>
>
> On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > I removed the additional metadata columns in SRF.bed
> > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> >
> > But still can't get rtracklayer::import.bed working:
> >
> > > rtracklayer::import.bed(bedfile)
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec, : scan() expected 'a real', got '1.168.595'
> > > bedfile
> > [1] 
> > "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
> >
> > Never mind, multicrispr function read_bed, based on data.table::fread is 
> > doing the job, so I will stick to that .
> >
> > Thank you for all feedback,
> >
> > Cheers,
> >
> > Aditya
> >
> >
> > ____
> > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> > Aditya [aditya.bhag...@mpi-bn.mpg.de]
> > Sent: Tuesday, September 17, 2019 2:48 PM
> > To: Michael Lawrence
> > Cc: bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > Oh :-) - Thankyou for explaining!
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:40 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > Having a "." in the function name does not make something "S3".
> > There's no dispatch from import() to import.bed(). Had I not been a
> > total newb when I created rtracklayer, I would have called the
> > function importBed() or something like that. Sorry for the confusion.
> >
> > On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Oh, superb, thx!
> > >
> > > I

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
Actually,

I will keep multicrispr::read_bed(), but wrap it around 
rtracklayer::import.bed, and additionally plot and print range summaries.

Aditya


From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 11:31 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

(Typo corrected to avoid confusion)

Michael,

rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
multicrispr::read_bed().

In order to avoid the overkill of `require(tracklayer)` for multicrispr 
<https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
sense to import/re-export import.bed() in multicrispr? What is BioC 
convention/best practice in such cases?

Aditya




From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 8:35 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Thank you Michael :-)

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 8:49 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files. I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that .
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> ____
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thankyou for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
>

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
(Typo corrected to avoid confusion)

Michael,

rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
multicrispr::read_bed().

In order to avoid the overkill of `require(tracklayer)` for multicrispr 
<https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
sense to import/re-export import.bed() in multicrispr? What is BioC 
convention/best practice in such cases?

Aditya




From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 8:35 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Thank you Michael :-)

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 8:49 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files. I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that .
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> ____
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thankyou for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Thankyou Michael,
> > >
> > > How do I use the extraCols argument? The documentation does not mention 
> > > an `extraCols` argument explicitly, so it must be one of the ellipsis 
> > > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > > extraCols = 10 (ten extra columns) or so?
> > >
> > > Aditya
> > >
> > > 

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
Michael,

rtracklayer::import.bed() indeed works perfectly for me, so I am dropping 
multicrispr::read_bed.

In order to avoid the overkill of `require(tracklayer` for multicrispr 
<https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make 
sense to import/re-export read.bed in multicrispr? What is BioC convention/best 
practice in such cases?

Aditya




From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Wednesday, September 18, 2019 8:35 AM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Thank you Michael :-)

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 8:49 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files. I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that .
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> ____
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thankyou for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Thankyou Michael,
> > >
> > > How do I use the extraCols argument? The documentation does not mention 
> > > an `extraCols` argument explicitly, so it must be one of the ellipsis 
> > > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > > extraCols = 10 (ten extra columns) or so?
> > >
> > > Aditya
> > >
> > > 
> > >

Re: [Bioc-devel] read_bed()

2019-09-18 Thread Bhagwat, Aditya
Thank you Michael :-)

Aditya

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 8:49 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files.  I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that .
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> ________
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thankyou for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Thankyou Michael,
> > >
> > > How do I use the extraCols argument? The documentation does not mention 
> > > an `extraCols` argument explicitly, so it must be one of the ellipsis 
> > > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > > extraCols = 10 (ten extra columns) or so?
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 2:05 PM
> > > To: Bhagwat, Aditya
> > > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > It breaks it because it's not standard BED; however, using the
> > > extraCols= argument should work in this case. Requiring an explicit
> > > format specification is intentional, because it provides validation
> > > and type safety, and it communicates the format to a future reader.
> > > This also looks a bit like a be

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
I think you probably made a mistake when dropping the columns. When I
provide the extraCols= argument (inventing my own names for things),
it almost works, but it breaks due to NAs in the extra columns. The
"." character is the standard way to express NA in BED files.  I've
added support for extra na.strings to version 1.45.6.

For reference, the call is like:

import("SRF.bed", extraCols=c(chr2="character", start2="integer",
end2="integer", mDux="factor", type="factor", pos1="integer",
pos2="integer", strand2="factor", from="factor", n="integer",
code="character", anno="factor", id="character", biotype="character",
score2="numeric" ), na.strings="NA")


On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> I removed the additional metadata columns in SRF.bed
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> But still can't get rtracklayer::import.bed working:
>
> > rtracklayer::import.bed(bedfile)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
> : scan() expected 'a real', got '1.168.595'
> > bedfile
> [1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"
>
> Never mind, multicrispr function read_bed, based on data.table::fread is 
> doing the job, so I will stick to that .
>
> Thank you for all feedback,
>
> Cheers,
>
> Aditya
>
>
> ________
> From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
> Aditya [aditya.bhag...@mpi-bn.mpg.de]
> Sent: Tuesday, September 17, 2019 2:48 PM
> To: Michael Lawrence
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Oh :-) - Thankyou for explaining!
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:40 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> Having a "." in the function name does not make something "S3".
> There's no dispatch from import() to import.bed(). Had I not been a
> total newb when I created rtracklayer, I would have called the
> function importBed() or something like that. Sorry for the confusion.
>
> On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
>  wrote:
> >
> > Oh, superb, thx!
> >
> > Interesting ... here you use S3 rather than S4 - I wonder the design 
> > intention underlying these choices (I'm asking because I am trying to 
> > figure out myself when to use S3 and when to use S4 and whether to mix the 
> > two).
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:23 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > The generic documentation does not mention it, but see ?import.bed.
> > It's similar to colClasses on read.table().
> >
> > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Thankyou Michael,
> > >
> > > How do I use the extraCols argument? The documentation does not mention 
> > > an `extraCols` argument explicitly, so it must be one of the ellipsis 
> > > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > > extraCols = 10 (ten extra columns) or so?
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 2:05 PM
> > > To: Bhagwat, Aditya
> > > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > It breaks it because it's not standard BED; however, using the
> > > extraCols= argument should work in this case. Requiring an explicit
> > > format specification is intentional, because it provides validation
> > > and type safety, and it communicates the format to a future reader.
> > > This also looks a bit like a bedPE file, so you might consider using
> > > the Pairs data structure.
> > >
> > > Michael
> > >
> > > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> > >  wrote:
> > > >
> > > > Hi Michael,
>

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Hi Michael,

I removed the additional metadata columns in SRF.bed
https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed

But still can't get rtracklayer::import.bed working:

> rtracklayer::import.bed(bedfile)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : 
scan() expected 'a real', got '1.168.595'
> bedfile
[1] "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed"

Never mind, multicrispr<https://gitlab.gwdg.de/loosolab/software/multicrispr> 
function 
read_bed<https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/R/03_read.R>,
 based on data.table::fread is doing the job, so I will stick to that .

Thank you for all feedback,

Cheers,

Aditya



From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, 
Aditya [aditya.bhag...@mpi-bn.mpg.de]
Sent: Tuesday, September 17, 2019 2:48 PM
To: Michael Lawrence
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Oh :-) - Thankyou for explaining!

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:40 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Having a "." in the function name does not make something "S3".
There's no dispatch from import() to import.bed(). Had I not been a
total newb when I created rtracklayer, I would have called the
function importBed() or something like that. Sorry for the confusion.

On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
 wrote:
>
> Oh, superb, thx!
>
> Interesting ... here you use S3 rather than S4 - I wonder the design 
> intention underlying these choices (I'm asking because I am trying to figure 
> out myself when to use S3 and when to use S4 and whether to mix the two).
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:23 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> The generic documentation does not mention it, but see ?import.bed.
> It's similar to colClasses on read.table().
>
> On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
>  wrote:
> >
> > Thankyou Michael,
> >
> > How do I use the extraCols argument? The documentation does not mention an 
> > `extraCols` argument explicitly, so it must be one of the ellipsis 
> > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > extraCols = 10 (ten extra columns) or so?
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:05 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > It breaks it because it's not standard BED; however, using the
> > extraCols= argument should work in this case. Requiring an explicit
> > format specification is intentional, because it provides validation
> > and type safety, and it communicates the format to a future reader.
> > This also looks a bit like a bedPE file, so you might consider using
> > the Pairs data structure.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Michael,
> > >
> > > Yeah, I also noticed that the attachment was eaten when it entered the 
> > > bio-devel list.
> > >
> > > The file is also accessible in the extdata of the multicrispr:
> > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> > >
> > > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 
> > > 3 (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> > >
> > > I am curious as to why you feel that having additional columns in a 
> > > bedfile would break it?
> > >
> > > Cheers,
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 1:41 PM
> > > To: Bhagwat, Aditya
> > > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > I don't see an attachment, nor can I find the multicrispr package
> > > anywhere. The "addressed soon" was referring t

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Oh :-) - Thankyou for explaining!

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:40 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Having a "." in the function name does not make something "S3".
There's no dispatch from import() to import.bed(). Had I not been a
total newb when I created rtracklayer, I would have called the
function importBed() or something like that. Sorry for the confusion.

On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
 wrote:
>
> Oh, superb, thx!
>
> Interesting ... here you use S3 rather than S4 - I wonder the design 
> intention underlying these choices (I'm asking because I am trying to figure 
> out myself when to use S3 and when to use S4 and whether to mix the two).
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:23 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> The generic documentation does not mention it, but see ?import.bed.
> It's similar to colClasses on read.table().
>
> On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
>  wrote:
> >
> > Thankyou Michael,
> >
> > How do I use the extraCols argument? The documentation does not mention an 
> > `extraCols` argument explicitly, so it must be one of the ellipsis 
> > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > extraCols = 10 (ten extra columns) or so?
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:05 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > It breaks it because it's not standard BED; however, using the
> > extraCols= argument should work in this case. Requiring an explicit
> > format specification is intentional, because it provides validation
> > and type safety, and it communicates the format to a future reader.
> > This also looks a bit like a bedPE file, so you might consider using
> > the Pairs data structure.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Michael,
> > >
> > > Yeah, I also noticed that the attachment was eaten when it entered the 
> > > bio-devel list.
> > >
> > > The file is also accessible in the extdata of the multicrispr:
> > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> > >
> > > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 
> > > 3 (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> > >
> > > I am curious as to why you feel that having additional columns in a 
> > > bedfile would break it?
> > >
> > > Cheers,
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 1:41 PM
> > > To: Bhagwat, Aditya
> > > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > I don't see an attachment, nor can I find the multicrispr package
> > > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > > which was addressed many years ago, so I've updated the documentation.
> > > Broken BED files will never be supported.
> > >
> > > Michael
> > >
> > > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> > >  wrote:
> > > >
> > > > Hi Lori,
> > > >
> > > > I remember now - I tried this function earlier, but it does not work 
> > > > for my bedfiles, like the one in attach.
> > > >
> > > > > bedfile  <- system.file('extdata/SRF.bed', package = 
> > > > > 'multicrispr')
> > > > >
> > > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > > 'mm10')
> > > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > > dec,  : scan() expected 'an integer', got 'chr2'
> > > > >
> > > >
> > > > Perhaps th

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
Having a "." in the function name does not make something "S3".
There's no dispatch from import() to import.bed(). Had I not been a
total newb when I created rtracklayer, I would have called the
function importBed() or something like that. Sorry for the confusion.

On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
 wrote:
>
> Oh, superb, thx!
>
> Interesting ... here you use S3 rather than S4 - I wonder the design 
> intention underlying these choices (I'm asking because I am trying to figure 
> out myself when to use S3 and when to use S4 and whether to mix the two).
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:23 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> The generic documentation does not mention it, but see ?import.bed.
> It's similar to colClasses on read.table().
>
> On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
>  wrote:
> >
> > Thankyou Michael,
> >
> > How do I use the extraCols argument? The documentation does not mention an 
> > `extraCols` argument explicitly, so it must be one of the ellipsis 
> > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > extraCols = 10 (ten extra columns) or so?
> >
> > Aditya
> >
> > 
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:05 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > It breaks it because it's not standard BED; however, using the
> > extraCols= argument should work in this case. Requiring an explicit
> > format specification is intentional, because it provides validation
> > and type safety, and it communicates the format to a future reader.
> > This also looks a bit like a bedPE file, so you might consider using
> > the Pairs data structure.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Michael,
> > >
> > > Yeah, I also noticed that the attachment was eaten when it entered the 
> > > bio-devel list.
> > >
> > > The file is also accessible in the extdata of the multicrispr:
> > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> > >
> > > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 
> > > 3 (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> > >
> > > I am curious as to why you feel that having additional columns in a 
> > > bedfile would break it?
> > >
> > > Cheers,
> > >
> > > Aditya
> > >
> > > 
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 1:41 PM
> > > To: Bhagwat, Aditya
> > > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > I don't see an attachment, nor can I find the multicrispr package
> > > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > > which was addressed many years ago, so I've updated the documentation.
> > > Broken BED files will never be supported.
> > >
> > > Michael
> > >
> > > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> > >  wrote:
> > > >
> > > > Hi Lori,
> > > >
> > > > I remember now - I tried this function earlier, but it does not work 
> > > > for my bedfiles, like the one in attach.
> > > >
> > > > > bedfile  <- system.file('extdata/SRF.bed', package = 
> > > > > 'multicrispr')
> > > > >
> > > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > > 'mm10')
> > > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > > dec,  : scan() expected 'an integer', got 'chr2'
> > > > >
> > > >
> > > > Perhaps this sentence in `?rtracklayer::import` points to the source of 
> > > > the error?
> > > >
> > > > many tools and organizations have extended BED with additional columns. 
> > > > These are not officially valid BED files, and as such rtrackla

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Oh, superb, thx!

Interesting ... here you use S3 rather than S4 - I wonder the design intention 
underlying these choices (I'm asking because I am trying to figure out myself 
when to use S3 and when to use S4 and whether to mix the two).

Aditya


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:23 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

The generic documentation does not mention it, but see ?import.bed.
It's similar to colClasses on read.table().

On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
 wrote:
>
> Thankyou Michael,
>
> How do I use the extraCols argument? The documentation does not mention an 
> `extraCols` argument explicitly, so it must be one of the ellipsis arguments, 
> but `?rtracklayer::import` does not mention it. Should I say extraCols = 10 
> (ten extra columns) or so?
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:05 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> It breaks it because it's not standard BED; however, using the
> extraCols= argument should work in this case. Requiring an explicit
> format specification is intentional, because it provides validation
> and type safety, and it communicates the format to a future reader.
> This also looks a bit like a bedPE file, so you might consider using
> the Pairs data structure.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > Yeah, I also noticed that the attachment was eaten when it entered the 
> > bio-devel list.
> >
> > The file is also accessible in the extdata of the multicrispr:
> > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> >
> > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> > (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> >
> > I am curious as to why you feel that having additional columns in a bedfile 
> > would break it?
> >
> > Cheers,
> >
> > Aditya
> >
> > ____
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 1:41 PM
> > To: Bhagwat, Aditya
> > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > I don't see an attachment, nor can I find the multicrispr package
> > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > which was addressed many years ago, so I've updated the documentation.
> > Broken BED files will never be supported.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Lori,
> > >
> > > I remember now - I tried this function earlier, but it does not work for 
> > > my bedfiles, like the one in attach.
> > >
> > > > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > > >
> > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > 'mm10')
> > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > dec,  : scan() expected 'an integer', got 'chr2'
> > > >
> > >
> > > Perhaps this sentence in `?rtracklayer::import` points to the source of 
> > > the error?
> > >
> > > many tools and organizations have extended BED with additional columns. 
> > > These are not officially valid BED files, and as such rtracklayer does 
> > > not yet support them (this will be addressed soon).
> > >
> > > Which brings the question: how soon is soon :-D ?
> > >
> > > Aditya
> > >
> > >
> > > 
> > > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > > Sent: Tuesday, September 17, 2019 1:02 PM
> > > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > > Subject: Re: read_bed()
> > >
> > > Please look at rtracklayer::import()  function that we recommend for 
> > > reading of BAM files along with other common formats.
> > >
> > > Cheers,
> > >
> > >
> > > Lori Shepherd
> > >
> > > Bioconductor Core Team
> > >
> > > Roswel

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
The generic documentation does not mention it, but see ?import.bed.
It's similar to colClasses on read.table().

On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
 wrote:
>
> Thankyou Michael,
>
> How do I use the extraCols argument? The documentation does not mention an 
> `extraCols` argument explicitly, so it must be one of the ellipsis arguments, 
> but `?rtracklayer::import` does not mention it. Should I say extraCols = 10 
> (ten extra columns) or so?
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:05 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> It breaks it because it's not standard BED; however, using the
> extraCols= argument should work in this case. Requiring an explicit
> format specification is intentional, because it provides validation
> and type safety, and it communicates the format to a future reader.
> This also looks a bit like a bedPE file, so you might consider using
> the Pairs data structure.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Michael,
> >
> > Yeah, I also noticed that the attachment was eaten when it entered the 
> > bio-devel list.
> >
> > The file is also accessible in the extdata of the multicrispr:
> > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> >
> > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> > (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> >
> > I am curious as to why you feel that having additional columns in a bedfile 
> > would break it?
> >
> > Cheers,
> >
> > Aditya
> >
> > ____
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 1:41 PM
> > To: Bhagwat, Aditya
> > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > I don't see an attachment, nor can I find the multicrispr package
> > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > which was addressed many years ago, so I've updated the documentation.
> > Broken BED files will never be supported.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> >  wrote:
> > >
> > > Hi Lori,
> > >
> > > I remember now - I tried this function earlier, but it does not work for 
> > > my bedfiles, like the one in attach.
> > >
> > > > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > > >
> > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > 'mm10')
> > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > dec,  : scan() expected 'an integer', got 'chr2'
> > > >
> > >
> > > Perhaps this sentence in `?rtracklayer::import` points to the source of 
> > > the error?
> > >
> > > many tools and organizations have extended BED with additional columns. 
> > > These are not officially valid BED files, and as such rtracklayer does 
> > > not yet support them (this will be addressed soon).
> > >
> > > Which brings the question: how soon is soon :-D ?
> > >
> > > Aditya
> > >
> > >
> > > 
> > > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > > Sent: Tuesday, September 17, 2019 1:02 PM
> > > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > > Subject: Re: read_bed()
> > >
> > > Please look at rtracklayer::import()  function that we recommend for 
> > > reading of BAM files along with other common formats.
> > >
> > > Cheers,
> > >
> > >
> > > Lori Shepherd
> > >
> > > Bioconductor Core Team
> > >
> > > Roswell Park Cancer Institute
> > >
> > > Department of Biostatistics & Bioinformatics
> > >
> > > Elm & Carlton Streets
> > >
> > > Buffalo, New York 14263
> > >
> > > 
> > > From: Bioc-devel  on behalf of Bhagwat, 
> > > Aditya 
> > > Sent: Tuesday, September 17, 2019 6:58 AM
> > > To: bioc-devel@r-project.org 
> > > Subject: [Bioc-devel] read_bed()
> >

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Thankyou Michael, 

How do I use the extraCols argument? The documentation does not mention an 
`extraCols` argument explicitly, so it must be one of the ellipsis arguments, 
but `?rtracklayer::import` does not mention it. Should I say extraCols = 10 
(ten extra columns) or so?

Aditya


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:05 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

It breaks it because it's not standard BED; however, using the
extraCols= argument should work in this case. Requiring an explicit
format specification is intentional, because it provides validation
and type safety, and it communicates the format to a future reader.
This also looks a bit like a bedPE file, so you might consider using
the Pairs data structure.

Michael

On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> Yeah, I also noticed that the attachment was eaten when it entered the 
> bio-devel list.
>
> The file is also accessible in the extdata of the multicrispr:
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
>
> I am curious as to why you feel that having additional columns in a bedfile 
> would break it?
>
> Cheers,
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 1:41 PM
> To: Bhagwat, Aditya
> Cc: Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I don't see an attachment, nor can I find the multicrispr package
> anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> which was addressed many years ago, so I've updated the documentation.
> Broken BED files will never be supported.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Lori,
> >
> > I remember now - I tried this function earlier, but it does not work for my 
> > bedfiles, like the one in attach.
> >
> > > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > >
> > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > 'mm10')
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec,  : scan() expected 'an integer', got 'chr2'
> > >
> >
> > Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> > error?
> >
> > many tools and organizations have extended BED with additional columns. 
> > These are not officially valid BED files, and as such rtracklayer does not 
> > yet support them (this will be addressed soon).
> >
> > Which brings the question: how soon is soon :-D ?
> >
> > Aditya
> >
> >
> > 
> > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > Sent: Tuesday, September 17, 2019 1:02 PM
> > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > Subject: Re: read_bed()
> >
> > Please look at rtracklayer::import()  function that we recommend for 
> > reading of BAM files along with other common formats.
> >
> > Cheers,
> >
> >
> > Lori Shepherd
> >
> > Bioconductor Core Team
> >
> > Roswell Park Cancer Institute
> >
> > Department of Biostatistics & Bioinformatics
> >
> > Elm & Carlton Streets
> >
> > Buffalo, New York 14263
> >
> > 
> > From: Bioc-devel  on behalf of Bhagwat, 
> > Aditya 
> > Sent: Tuesday, September 17, 2019 6:58 AM
> > To: bioc-devel@r-project.org 
> > Subject: [Bioc-devel] read_bed()
> >
> > Dear bioc-devel,
> >
> > I had two feedback requests regarding the function read_bed().
> >
> > 1) Did I overlook, and therefore, re-invent existing functionality?
> > 2) If not, would `read_bed` be suited for existence in a more foundational 
> > package, e.g. `GenomicRanges`, given the rather basal nature of this 
> > functionality?
> >
> > It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> > (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds 
> > BSgenome info (to allow for implicit range validity 
> > checking<https://support.bioconductor.org/p/124250>) and plots the 
> > karyogram<https://support.bioconductor

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
It breaks it because it's not standard BED; however, using the
extraCols= argument should work in this case. Requiring an explicit
format specification is intentional, because it provides validation
and type safety, and it communicates the format to a future reader.
This also looks a bit like a bedPE file, so you might consider using
the Pairs data structure.

Michael

On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
 wrote:
>
> Hi Michael,
>
> Yeah, I also noticed that the attachment was eaten when it entered the 
> bio-devel list.
>
> The file is also accessible in the extdata of the multicrispr:
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
>
> I am curious as to why you feel that having additional columns in a bedfile 
> would break it?
>
> Cheers,
>
> Aditya
>
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 1:41 PM
> To: Bhagwat, Aditya
> Cc: Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I don't see an attachment, nor can I find the multicrispr package
> anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> which was addressed many years ago, so I've updated the documentation.
> Broken BED files will never be supported.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
>  wrote:
> >
> > Hi Lori,
> >
> > I remember now - I tried this function earlier, but it does not work for my 
> > bedfiles, like the one in attach.
> >
> > > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > >
> > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > 'mm10')
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec,  : scan() expected 'an integer', got 'chr2'
> > >
> >
> > Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> > error?
> >
> > many tools and organizations have extended BED with additional columns. 
> > These are not officially valid BED files, and as such rtracklayer does not 
> > yet support them (this will be addressed soon).
> >
> > Which brings the question: how soon is soon :-D ?
> >
> > Aditya
> >
> >
> > 
> > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > Sent: Tuesday, September 17, 2019 1:02 PM
> > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > Subject: Re: read_bed()
> >
> > Please look at rtracklayer::import()  function that we recommend for 
> > reading of BAM files along with other common formats.
> >
> > Cheers,
> >
> >
> > Lori Shepherd
> >
> > Bioconductor Core Team
> >
> > Roswell Park Cancer Institute
> >
> > Department of Biostatistics & Bioinformatics
> >
> > Elm & Carlton Streets
> >
> > Buffalo, New York 14263
> >
> > 
> > From: Bioc-devel  on behalf of Bhagwat, 
> > Aditya 
> > Sent: Tuesday, September 17, 2019 6:58 AM
> > To: bioc-devel@r-project.org 
> > Subject: [Bioc-devel] read_bed()
> >
> > Dear bioc-devel,
> >
> > I had two feedback requests regarding the function read_bed().
> >
> > 1) Did I overlook, and therefore, re-invent existing functionality?
> > 2) If not, would `read_bed` be suited for existence in a more foundational 
> > package, e.g. `GenomicRanges`, given the rather basal nature of this 
> > functionality?
> >
> > It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> > (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds 
> > BSgenome info (to allow for implicit range validity 
> > checking<https://support.bioconductor.org/p/124250>) and plots the 
> > karyogram<https://support.bioconductor.org/p/124328>.
> >
> > Thank you for your feedback.
> >
> > Cheers,
> >
> > Aditya
> >
> >
> > #' Read bedfile into GRanges
> > #'
> > #' @param bedfilefile path
> > #' @param bsgenome   BSgenome, e.g. 
> > BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> > #' @param zero_based logical(1): whether bedfile GRanges are 0-based
> > #' @param rm_duplicates  logical(1)
> > #' @param plot   logical(1)
> > #' @para

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Hi Michael,

Yeah, I also noticed that the attachment was eaten when it entered the 
bio-devel list. 

The file is also accessible in the extdata of the multicrispr:
https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed

A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
(chromEnd), and column 6 (strand). All of these are present in SRF.bed.

I am curious as to why you feel that having additional columns in a bedfile 
would break it?

Cheers,

Aditya


From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 1:41 PM
To: Bhagwat, Aditya
Cc: Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I don't see an attachment, nor can I find the multicrispr package
anywhere. The "addressed soon" was referring to the BEDX+Y formats,
which was addressed many years ago, so I've updated the documentation.
Broken BED files will never be supported.

Michael

On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
 wrote:
>
> Hi Lori,
>
> I remember now - I tried this function earlier, but it does not work for my 
> bedfiles, like the one in attach.
>
> > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> >
> > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > 'mm10')
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  
> : scan() expected 'an integer', got 'chr2'
> >
>
> Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> error?
>
> many tools and organizations have extended BED with additional columns. These 
> are not officially valid BED files, and as such rtracklayer does not yet 
> support them (this will be addressed soon).
>
> Which brings the question: how soon is soon :-D ?
>
> Aditya
>
>
> 
> From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> Sent: Tuesday, September 17, 2019 1:02 PM
> To: Bhagwat, Aditya; bioc-devel@r-project.org
> Subject: Re: read_bed()
>
> Please look at rtracklayer::import()  function that we recommend for reading 
> of BAM files along with other common formats.
>
> Cheers,
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> ____
> From: Bioc-devel  on behalf of Bhagwat, 
> Aditya 
> Sent: Tuesday, September 17, 2019 6:58 AM
> To: bioc-devel@r-project.org 
> Subject: [Bioc-devel] read_bed()
>
> Dear bioc-devel,
>
> I had two feedback requests regarding the function read_bed().
>
> 1) Did I overlook, and therefore, re-invent existing functionality?
> 2) If not, would `read_bed` be suited for existence in a more foundational 
> package, e.g. `GenomicRanges`, given the rather basal nature of this 
> functionality?
>
> It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds 
> BSgenome info (to allow for implicit range validity 
> checking<https://support.bioconductor.org/p/124250>) and plots the 
> karyogram<https://support.bioconductor.org/p/124328>.
>
> Thank you for your feedback.
>
> Cheers,
>
> Aditya
>
>
> #' Read bedfile into GRanges
> #'
> #' @param bedfilefile path
> #' @param bsgenome   BSgenome, e.g. 
> BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' @param zero_based logical(1): whether bedfile GRanges are 0-based
> #' @param rm_duplicates  logical(1)
> #' @param plot   logical(1)
> #' @param verboselogical(1)
> #' @return \code{\link[GenomicRanges]{GRanges-class}}
> #' @note By convention BED files are 0-based. GRanges are always 1-based.
> #'   A good discussion on these two alternative codings is given
> #'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> #' @examples
> #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' (gr <- read_bed(bedfile, bsgenome))
> #' @importFrom  data.table  :=
> #' @export
> read_bed <- function(
> bedfile,
> bsgenome,
> zero_based= TRUE,
> rm_duplicates = TRUE,
> plot  = TRUE,
> verbose   = TRUE
> ){
> # Assert
> assert_all_are_existing_files(bedfile)
> assert_is_a_bool(verbose)
> assert_is_a_bool(rm_duplicates)
> assert_is_a_bool(zero_based)
>
> # Comply
> seqnames <- start <- end <- strand <- .N <- g

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Michael Lawrence via Bioc-devel
I don't see an attachment, nor can I find the multicrispr package
anywhere. The "addressed soon" was referring to the BEDX+Y formats,
which was addressed many years ago, so I've updated the documentation.
Broken BED files will never be supported.

Michael

On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
 wrote:
>
> Hi Lori,
>
> I remember now - I tried this function earlier, but it does not work for my 
> bedfiles, like the one in attach.
>
> > bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> >
> > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > 'mm10')
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  
> : scan() expected 'an integer', got 'chr2'
> >
>
> Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> error?
>
> many tools and organizations have extended BED with additional columns. These 
> are not officially valid BED files, and as such rtracklayer does not yet 
> support them (this will be addressed soon).
>
> Which brings the question: how soon is soon :-D ?
>
> Aditya
>
>
> 
> From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> Sent: Tuesday, September 17, 2019 1:02 PM
> To: Bhagwat, Aditya; bioc-devel@r-project.org
> Subject: Re: read_bed()
>
> Please look at rtracklayer::import()  function that we recommend for reading 
> of BAM files along with other common formats.
>
> Cheers,
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> ____
> From: Bioc-devel  on behalf of Bhagwat, 
> Aditya 
> Sent: Tuesday, September 17, 2019 6:58 AM
> To: bioc-devel@r-project.org 
> Subject: [Bioc-devel] read_bed()
>
> Dear bioc-devel,
>
> I had two feedback requests regarding the function read_bed().
>
> 1) Did I overlook, and therefore, re-invent existing functionality?
> 2) If not, would `read_bed` be suited for existence in a more foundational 
> package, e.g. `GenomicRanges`, given the rather basal nature of this 
> functionality?
>
> It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds 
> BSgenome info (to allow for implicit range validity 
> checking<https://support.bioconductor.org/p/124250>) and plots the 
> karyogram<https://support.bioconductor.org/p/124328>.
>
> Thank you for your feedback.
>
> Cheers,
>
> Aditya
>
>
> #' Read bedfile into GRanges
> #'
> #' @param bedfilefile path
> #' @param bsgenome   BSgenome, e.g. 
> BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' @param zero_based logical(1): whether bedfile GRanges are 0-based
> #' @param rm_duplicates  logical(1)
> #' @param plot   logical(1)
> #' @param verboselogical(1)
> #' @return \code{\link[GenomicRanges]{GRanges-class}}
> #' @note By convention BED files are 0-based. GRanges are always 1-based.
> #'   A good discussion on these two alternative codings is given
> #'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> #' @examples
> #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' (gr <- read_bed(bedfile, bsgenome))
> #' @importFrom  data.table  :=
> #' @export
> read_bed <- function(
> bedfile,
> bsgenome,
> zero_based= TRUE,
> rm_duplicates = TRUE,
> plot  = TRUE,
> verbose   = TRUE
> ){
> # Assert
> assert_all_are_existing_files(bedfile)
> assert_is_a_bool(verbose)
> assert_is_a_bool(rm_duplicates)
> assert_is_a_bool(zero_based)
>
> # Comply
> seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL
>
> # Read
> if (verbose) cmessage('\tRead %s', bedfile)
> dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
> col.names = c('seqnames', 'start', 'end', 'strand'))
> data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))
>
> # Transform coordinates: 0-based -> 1-based
> if (zero_based){
> if (verbose)cmessage('\t\tConvert 0 -> 1-based')
> dt[, start := start + 1]
> }
>
> if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
> nrow(dt), length(unique(dt$seqnames)))
>
> # Drop duplicates
> if (rm_duplicates){
>   

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Hi Lori,

I remember now - I tried this function earlier, but it does not work for my 
bedfiles, like the one in attach.

> bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
>
> targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 'mm10')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
scan() expected 'an integer', got 'chr2'
>

Perhaps this sentence in `?rtracklayer::import` points to the source of the 
error?

many tools and organizations have extended BED with additional columns. These 
are not officially valid BED files, and as such rtracklayer does not yet 
support them (this will be addressed soon).

Which brings the question: how soon is soon :-D ?

Aditya



From: Shepherd, Lori [lori.sheph...@roswellpark.org]
Sent: Tuesday, September 17, 2019 1:02 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: read_bed()

Please look at rtracklayer::import()  function that we recommend for reading of 
BAM files along with other common formats.

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Bhagwat, 
Aditya 
Sent: Tuesday, September 17, 2019 6:58 AM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] read_bed()

Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome 
info (to allow for implicit range validity 
checking<https://support.bioconductor.org/p/124250>) and plots the 
karyogram<https://support.bioconductor.org/p/124328>.

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfilefile path
#' @param bsgenome   BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot   logical(1)
#' @param verboselogical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'   A good discussion on these two alternative codings is given
#'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
bedfile,
bsgenome,
zero_based= TRUE,
rm_duplicates = TRUE,
plot  = TRUE,
verbose   = TRUE
){
# Assert
assert_all_are_existing_files(bedfile)
assert_is_a_bool(verbose)
assert_is_a_bool(rm_duplicates)
assert_is_a_bool(zero_based)

# Comply
seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

# Read
if (verbose) cmessage('\tRead %s', bedfile)
dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
col.names = c('seqnames', 'start', 'end', 'strand'))
data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

# Transform coordinates: 0-based -> 1-based
if (zero_based){
if (verbose)cmessage('\t\tConvert 0 -> 1-based')
dt[, start := start + 1]
}

if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
nrow(dt), length(unique(dt$seqnames)))

# Drop duplicates
if (rm_duplicates){
is_duplicated <- cduplicated(dt)
if (any(is_duplicated)){
if (verbose) cmessage('\t\t%d after removing duplicates')
dt %<>% extract(!duplicated)
}
}

# Turn into GRanges
gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

# Plot and return
title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
if (plot) plot_karyogram(gr, title)
gr
}


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

This email message may contain legally privileged and/or confidential 
information. If you are not the intended recipient(s), or the employee or agent 
responsible for the delivery of this message to the intended recipient(s), you 
are hereby notified that any disclosure, copying, distribution, or use of this 
email message is prohibited. I

Re: [Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Aha - thx!

Aditya

From: Shepherd, Lori [lori.sheph...@roswellpark.org]
Sent: Tuesday, September 17, 2019 1:02 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: read_bed()

Please look at rtracklayer::import()  function that we recommend for reading of 
BAM files along with other common formats.

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Bhagwat, 
Aditya 
Sent: Tuesday, September 17, 2019 6:58 AM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] read_bed()

Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome 
info (to allow for implicit range validity 
checking<https://support.bioconductor.org/p/124250>) and plots the 
karyogram<https://support.bioconductor.org/p/124328>.

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfilefile path
#' @param bsgenome   BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot   logical(1)
#' @param verboselogical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'   A good discussion on these two alternative codings is given
#'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
bedfile,
bsgenome,
zero_based= TRUE,
rm_duplicates = TRUE,
plot  = TRUE,
verbose   = TRUE
){
# Assert
assert_all_are_existing_files(bedfile)
assert_is_a_bool(verbose)
assert_is_a_bool(rm_duplicates)
assert_is_a_bool(zero_based)

# Comply
seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

# Read
if (verbose) cmessage('\tRead %s', bedfile)
dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
col.names = c('seqnames', 'start', 'end', 'strand'))
data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

# Transform coordinates: 0-based -> 1-based
if (zero_based){
if (verbose)cmessage('\t\tConvert 0 -> 1-based')
dt[, start := start + 1]
}

if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
nrow(dt), length(unique(dt$seqnames)))

# Drop duplicates
if (rm_duplicates){
is_duplicated <- cduplicated(dt)
if (any(is_duplicated)){
if (verbose) cmessage('\t\t%d after removing duplicates')
dt %<>% extract(!duplicated)
}
}

# Turn into GRanges
gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

# Plot and return
title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
if (plot) plot_karyogram(gr, title)
gr
}


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

This email message may contain legally privileged and/or confidential 
information. If you are not the intended recipient(s), or the employee or agent 
responsible for the delivery of this message to the intended recipient(s), you 
are hereby notified that any disclosure, copying, distribution, or use of this 
email message is prohibited. If you have received this message in error, please 
notify the sender immediately by e-mail and delete this email message from your 
computer. Thank you.

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] read_bed()

2019-09-17 Thread Shepherd, Lori
Please look at rtracklayer::import()  function that we recommend for reading of 
BAM files along with other common formats.

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Bhagwat, 
Aditya 
Sent: Tuesday, September 17, 2019 6:58 AM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] read_bed()

Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome 
info (to allow for implicit range validity 
checking<https://support.bioconductor.org/p/124250>) and plots the 
karyogram<https://support.bioconductor.org/p/124328>.

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfilefile path
#' @param bsgenome   BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot   logical(1)
#' @param verboselogical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'   A good discussion on these two alternative codings is given
#'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
bedfile,
bsgenome,
zero_based= TRUE,
rm_duplicates = TRUE,
plot  = TRUE,
verbose   = TRUE
){
# Assert
assert_all_are_existing_files(bedfile)
assert_is_a_bool(verbose)
assert_is_a_bool(rm_duplicates)
assert_is_a_bool(zero_based)

# Comply
seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

# Read
if (verbose) cmessage('\tRead %s', bedfile)
dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
col.names = c('seqnames', 'start', 'end', 'strand'))
data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

# Transform coordinates: 0-based -> 1-based
if (zero_based){
if (verbose)cmessage('\t\tConvert 0 -> 1-based')
dt[, start := start + 1]
}

if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
nrow(dt), length(unique(dt$seqnames)))

# Drop duplicates
if (rm_duplicates){
is_duplicated <- cduplicated(dt)
if (any(is_duplicated)){
if (verbose) cmessage('\t\t%d after removing duplicates')
dt %<>% extract(!duplicated)
}
}

# Turn into GRanges
gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

# Plot and return
title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
if (plot) plot_karyogram(gr, title)
gr
}


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.
[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] read_bed()

2019-09-17 Thread Bhagwat, Aditya
Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges), adds BSgenome 
info (to allow for implicit range validity 
checking) and plots the 
karyogram.

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfilefile path
#' @param bsgenome   BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot   logical(1)
#' @param verboselogical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'   A good discussion on these two alternative codings is given
#'   by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
bedfile,
bsgenome,
zero_based= TRUE,
rm_duplicates = TRUE,
plot  = TRUE,
verbose   = TRUE
){
# Assert
assert_all_are_existing_files(bedfile)
assert_is_a_bool(verbose)
assert_is_a_bool(rm_duplicates)
assert_is_a_bool(zero_based)

# Comply
seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

# Read
if (verbose) cmessage('\tRead %s', bedfile)
dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
col.names = c('seqnames', 'start', 'end', 'strand'))
data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

# Transform coordinates: 0-based -> 1-based
if (zero_based){
if (verbose)cmessage('\t\tConvert 0 -> 1-based')
dt[, start := start + 1]
}

if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
nrow(dt), length(unique(dt$seqnames)))

# Drop duplicates
if (rm_duplicates){
is_duplicated <- cduplicated(dt)
if (any(is_duplicated)){
if (verbose) cmessage('\t\t%d after removing duplicates')
dt %<>% extract(!duplicated)
}
}

# Turn into GRanges
gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

# Plot and return
title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
if (plot) plot_karyogram(gr, title)
gr
}


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel