Re: [Bioc-devel] readGAlignmentPairs with discordant strand

2015-10-17 Thread Michael Lawrence
This might have been lost in this thread. If there is an easy way to coerce
a GAlignmentsList where all(lengths(x) == 2) to a GAlignmentPairs, please
let me know. Otherwise, I'll add a coerce method. Debating whether it
should discard elements of length != 2, or if it should fail.

On Fri, Oct 16, 2015 at 9:58 AM, Michael Lawrence  wrote:

> Sure, "*" makes more sense for strand, given the precedent.
>
> On Fri, Oct 16, 2015 at 9:55 AM, Hervé Pagès  wrote:
> > On 10/16/2015 09:28 AM, Michael Lawrence wrote:
> >>
> >> I kind of wish it would return NA for things like seqnames and strand,
> >> but yes that would be very useful.
> >
> >
> > Could do this for seqnames() but I'm hesitant to do this for strand().
> > If you look at ?strand in BiocGenerics, ‘*’ is used when the exact
> > strand of the location is unknown, or irrelevant, or when the "feature"
> > at that location belongs to both strands. A pair with discordant strand
> > belongs to both strands. Also there is a lot of code around that
> > assumes strand() never returns NAs.
> >
> >
> > H.
> >
> >>
> >> On Fri, Oct 16, 2015 at 9:15 AM, Hervé Pagès 
> wrote:
> >>>
> >>> Hi Michael, Martin,
> >>>
> >>> On 10/16/2015 06:48 AM, Michael Lawrence wrote:
> 
> 
>  It does seem like starting with the more general data structure is the
>  better approach, but I couldn't find an easy way to move the paired
>  subset
>  of GAlignmentsList to GAlignmentPairs. You mention a coercion, but
> it's
>  not
>  obvious to me, unfortunately.
> 
>  Another approach would be a GAlignmentPairs where the unpaired reads
>  have
>  "missing" mates. I know GAlignments has no concept of missing, but it
>  would
>  get everything into a single data structure that is convenient for
>  computing on pairs.
> >>>
> >>>
> >>>
> >>> I could modify readGAlignmentPairs() to have the discordant and/or
> >>> ambiguous pairs end up in th GAlignmentPairs. The ambiguous pairs
> >>> could be marked as such thru a metadata col of the object or thru
> >>> a proper slot. The seqnames() and strand() accessors will return
> >>> * on discordant pairs. Does that sound reasonable?
> >>>
> >>> H.
> >>>
> >>>
> 
>  On Fri, Oct 16, 2015 at 5:21 AM, Morgan, Martin <
>  martin.mor...@roswellpark.org> wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Bioc-devel [mailto:bioc-devel-boun...@r-project.org] On
> Behalf
> >> Of
> >> Michael Lawrence
> >> Sent: Friday, October 16, 2015 7:41 AM
> >> To: bioc-devel@r-project.org
> >> Subject: [Bioc-devel] readGAlignmentPairs with discordant strand
> >>
> >> Now that GAlignmentPairs supports discordant strand between mates,
> how
> >> hard would it be to relax that restriction on readGAlignmentPairs()?
> >>
> >> Also, would be nice if getDumpedAlignments() returned those dumped
> by
> >> readGAlignmentPairs(). Right now, I'm reading a GAlignments (with
> the
> >
> >
> > extra
> >>
> >>
> >> mcols) and calling makeGAlignmentPairs(). Not so convenient.
> >
> >
> >
> > I'm not sure whether this is relevant to your use case but
> > readGAlignmentsList returns a list of paired mates, and if
> appropriate
> > (based on ScanBamParam) list elements with solo travelers. The paired
> > portion of the list can be coerced to GAlignmentPairs if the
> additional
> > structure of that class is required.
> >
> > Martin
> >
> >>
> >> Michael
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ___
> >> Bioc-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> >
> >
> >
> > This email message may contain legally privileged and/or confidential
> > information.  If you are not the intended recipient(s), or the
> employee
> > or
> > agent responsible for the delivery of this message to the intended
> > recipient(s), you are hereby notified that any disclosure, copying,
> > distribution, or use of this email message is prohibited.  If you
> have
> > received this message in error, please notify the sender immediately
> by
> > e-mail and delete this email message from your computer. Thank you.
> >
> 
>   [[alternative HTML version deleted]]
> 
>  ___
>  Bioc-devel@r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> >>>
> >>> --
> >>> Hervé Pagès
> >>>
> >>> Program in Computational Biology
> >>> Division of Public Health Sciences
> >>> Fred Hutchinson Cancer Research Center
> >>> 1100 Fairview Ave. N, M1-B514
> >>> P.O. Box 19024
> >>> Seattle, WA 98109-1024
> >>>
> >>> E-mail: hpa...@fredhutch.org
> >>> Phone:  (206) 667-5791
> 

Re: [Bioc-devel] readGAlignmentPairs with discordant strand

2015-10-17 Thread Michael Lawrence
Cool, thanks. So does it make sense to have that as the coercion from
GAlignmentsList to GAlignmentPairs? I can add it, if it seems reasonable.
I'm kind of torn on whether a coercion should discard records.

On Sat, Oct 17, 2015 at 8:37 AM, Morgan, Martin <
martin.mor...@roswellpark.org> wrote:

> Val and I discussed this a bit, with the resolution that calling the
> GAlignmentPairs() constructor on the first and the second element of the
> GAlignmentsList was the 'easiest' way to go.
>
>   ga = unlist(gal[mcols(gal)$mate_status == "mated"])
>   GAlignmentPairs(ga[c(TRUE, FALSE)], ga[c(FALSE, TRUE)])
>
> (completely speculative code). If I understand correctly, there's a check
> in for discordant pairs in readGAlignmentPairs, but not in GAlignmentPairs
> itself; could be mistaken though...
>
> Martin
> 
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Saturday, October 17, 2015 9:48 AM
> To: Hervé Pagès
> Cc: Michael Lawrence; Morgan, Martin; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] readGAlignmentPairs with discordant strand
>
> This might have been lost in this thread. If there is an easy way to
> coerce a GAlignmentsList where all(lengths(x) == 2) to a GAlignmentPairs,
> please let me know. Otherwise, I'll add a coerce method. Debating whether
> it should discard elements of length != 2, or if it should fail.
>
> On Fri, Oct 16, 2015 at 9:58 AM, Michael Lawrence  > wrote:
> Sure, "*" makes more sense for strand, given the precedent.
>
> On Fri, Oct 16, 2015 at 9:55 AM, Hervé Pagès > wrote:
> > On 10/16/2015 09:28 AM, Michael Lawrence wrote:
> >>
> >> I kind of wish it would return NA for things like seqnames and strand,
> >> but yes that would be very useful.
> >
> >
> > Could do this for seqnames() but I'm hesitant to do this for strand().
> > If you look at ?strand in BiocGenerics, ‘*’ is used when the exact
> > strand of the location is unknown, or irrelevant, or when the "feature"
> > at that location belongs to both strands. A pair with discordant strand
> > belongs to both strands. Also there is a lot of code around that
> > assumes strand() never returns NAs.
> >
> >
> > H.
> >
> >>
> >> On Fri, Oct 16, 2015 at 9:15 AM, Hervé Pagès  > wrote:
> >>>
> >>> Hi Michael, Martin,
> >>>
> >>> On 10/16/2015 06:48 AM, Michael Lawrence wrote:
> 
> 
>  It does seem like starting with the more general data structure is the
>  better approach, but I couldn't find an easy way to move the paired
>  subset
>  of GAlignmentsList to GAlignmentPairs. You mention a coercion, but
> it's
>  not
>  obvious to me, unfortunately.
> 
>  Another approach would be a GAlignmentPairs where the unpaired reads
>  have
>  "missing" mates. I know GAlignments has no concept of missing, but it
>  would
>  get everything into a single data structure that is convenient for
>  computing on pairs.
> >>>
> >>>
> >>>
> >>> I could modify readGAlignmentPairs() to have the discordant and/or
> >>> ambiguous pairs end up in th GAlignmentPairs. The ambiguous pairs
> >>> could be marked as such thru a metadata col of the object or thru
> >>> a proper slot. The seqnames() and strand() accessors will return
> >>> * on discordant pairs. Does that sound reasonable?
> >>>
> >>> H.
> >>>
> >>>
> 
>  On Fri, Oct 16, 2015 at 5:21 AM, Morgan, Martin <
>  martin.mor...@roswellpark.org>
> wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Bioc-devel [mailto:bioc-devel-boun...@r-project.org bioc-devel-boun...@r-project.org>] On Behalf
> >> Of
> >> Michael Lawrence
> >> Sent: Friday, October 16, 2015 7:41 AM
> >> To: bioc-devel@r-project.org
> >> Subject: [Bioc-devel] readGAlignmentPairs with discordant strand
> >>
> >> Now that GAlignmentPairs supports discordant strand between mates,
> how
> >> hard would it be to relax that restriction on readGAlignmentPairs()?
> >>
> >> Also, would be nice if getDumpedAlignments() returned those dumped
> by
> >> readGAlignmentPairs(). Right now, I'm reading a GAlignments (with
> the
> >
> >
> > extra
> >>
> >>
> >> mcols) and calling makeGAlignmentPairs(). Not so convenient.
> >
> >
> >
> > I'm not sure whether this is relevant to your use case but
> > readGAlignmentsList returns a list of paired mates, and if
> appropriate
> > (based on ScanBamParam) list elements with solo travelers. The paired
> > portion of the list can be coerced to GAlignmentPairs if the
> additional
> > structure of that class is required.
> >
> > Martin
> >
> >>
> >> Michael
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> 

Re: [Bioc-devel] readGAlignmentPairs with discordant strand

2015-10-17 Thread Morgan, Martin
Val and I discussed this a bit, with the resolution that calling the 
GAlignmentPairs() constructor on the first and the second element of the 
GAlignmentsList was the 'easiest' way to go.

  ga = unlist(gal[mcols(gal)$mate_status == "mated"])
  GAlignmentPairs(ga[c(TRUE, FALSE)], ga[c(FALSE, TRUE)])

(completely speculative code). If I understand correctly, there's a check in 
for discordant pairs in readGAlignmentPairs, but not in GAlignmentPairs itself; 
could be mistaken though...

Martin

From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Saturday, October 17, 2015 9:48 AM
To: Hervé Pagès
Cc: Michael Lawrence; Morgan, Martin; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] readGAlignmentPairs with discordant strand

This might have been lost in this thread. If there is an easy way to coerce a 
GAlignmentsList where all(lengths(x) == 2) to a GAlignmentPairs, please let me 
know. Otherwise, I'll add a coerce method. Debating whether it should discard 
elements of length != 2, or if it should fail.

On Fri, Oct 16, 2015 at 9:58 AM, Michael Lawrence 
> wrote:
Sure, "*" makes more sense for strand, given the precedent.

On Fri, Oct 16, 2015 at 9:55 AM, Hervé Pagès 
> wrote:
> On 10/16/2015 09:28 AM, Michael Lawrence wrote:
>>
>> I kind of wish it would return NA for things like seqnames and strand,
>> but yes that would be very useful.
>
>
> Could do this for seqnames() but I'm hesitant to do this for strand().
> If you look at ?strand in BiocGenerics, ‘*’ is used when the exact
> strand of the location is unknown, or irrelevant, or when the "feature"
> at that location belongs to both strands. A pair with discordant strand
> belongs to both strands. Also there is a lot of code around that
> assumes strand() never returns NAs.
>
>
> H.
>
>>
>> On Fri, Oct 16, 2015 at 9:15 AM, Hervé Pagès 
>> > wrote:
>>>
>>> Hi Michael, Martin,
>>>
>>> On 10/16/2015 06:48 AM, Michael Lawrence wrote:


 It does seem like starting with the more general data structure is the
 better approach, but I couldn't find an easy way to move the paired
 subset
 of GAlignmentsList to GAlignmentPairs. You mention a coercion, but it's
 not
 obvious to me, unfortunately.

 Another approach would be a GAlignmentPairs where the unpaired reads
 have
 "missing" mates. I know GAlignments has no concept of missing, but it
 would
 get everything into a single data structure that is convenient for
 computing on pairs.
>>>
>>>
>>>
>>> I could modify readGAlignmentPairs() to have the discordant and/or
>>> ambiguous pairs end up in th GAlignmentPairs. The ambiguous pairs
>>> could be marked as such thru a metadata col of the object or thru
>>> a proper slot. The seqnames() and strand() accessors will return
>>> * on discordant pairs. Does that sound reasonable?
>>>
>>> H.
>>>
>>>

 On Fri, Oct 16, 2015 at 5:21 AM, Morgan, Martin <
 martin.mor...@roswellpark.org> wrote:

>
>
>> -Original Message-
>> From: Bioc-devel 
>> [mailto:bioc-devel-boun...@r-project.org]
>>  On Behalf
>> Of
>> Michael Lawrence
>> Sent: Friday, October 16, 2015 7:41 AM
>> To: bioc-devel@r-project.org
>> Subject: [Bioc-devel] readGAlignmentPairs with discordant strand
>>
>> Now that GAlignmentPairs supports discordant strand between mates, how
>> hard would it be to relax that restriction on readGAlignmentPairs()?
>>
>> Also, would be nice if getDumpedAlignments() returned those dumped by
>> readGAlignmentPairs(). Right now, I'm reading a GAlignments (with the
>
>
> extra
>>
>>
>> mcols) and calling makeGAlignmentPairs(). Not so convenient.
>
>
>
> I'm not sure whether this is relevant to your use case but
> readGAlignmentsList returns a list of paired mates, and if appropriate
> (based on ScanBamParam) list elements with solo travelers. The paired
> portion of the list can be coerced to GAlignmentPairs if the additional
> structure of that class is required.
>
> Martin
>
>>
>> Michael
>>
>> [[alternative HTML version deleted]]
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
> This email message may contain legally privileged and/or confidential
> information.  If you are not the intended recipient(s), or the employee
> or
> agent responsible for the delivery of this message to the intended
> recipient(s), you are hereby notified that any disclosure,