Re: [Bioc-devel] restrictToSNV for VCF

2014-04-09 Thread Valerie Obenchain

Update on these tasks.

1) XStringSetList now has an nchar() method (as of Biostrings 2.31.17)

2) restrictToSNV() was removed from VariantAnnotation

3) The following generics and methods for VCF and VRanges have been 
added to VariantAnnotation 1.9.50:


isSNV()
isInsertion()
isDeletion()
isIndel()
isSubstitution()
isTransition()

I've held off on adding

isSV()
isSVPrecise()

until we have a way to distinguish structural from non-structural ALT. 
Currently if any of the ALT values are structural, all are coerced to 
character. It would be good to have a way to distinguish a mixture of 
ALT values so we can compute on the nucleotides and do whatever else on 
the structural variants. This may be a project for the next dev cycle.


Valerie



Re: [Bioc-devel] restrictToSNV for VCF

2014-03-21 Thread Hervé Pagès

Hi Martin,

On 03/21/2014 01:45 PM, Martin Morgan wrote:

On 03/20/2014 05:20 PM, Hervé Pagès wrote:

Hi,

On 03/19/2014 01:10 PM, Michael Lawrence wrote:

You can apparently use 1D extraction for VCF, which is a little
surprising;
I learned it from restrictToSNV.


This is inherited from SummarizedExperiment:

example(SummarizedExperiment)

se1
   class: SummarizedExperiment
   dim: 200 6
   exptData(0):
   assays(1): counts
   rownames: NULL
   rowData metadata column names(0):
   colnames(6): A B ... E F
   colData names(1): Treatment

se1[1:4]
   class: SummarizedExperiment
   dim: 4 6
   exptData(0):
   assays(1): counts
   rownames: NULL
   rowData metadata column names(0):
   colnames(6): A B ... E F
   colData names(1): Treatment

To me that means that a SummarizedExperiment has a length
(conceptually), and that this length is the number of rows.
It would actually help if a length method was defined:

length(se1)
   [1] 1


I think of a SummarizedExperiment as fundamentally a matrix with row and
column annotations. 'length' would then be prod(dim(se1))


But it's not defined as such either.

Note that findOverlaps() on SummarizedExperiment objects returns a
Hits object with indices in the 1:nrow(query) or 1:nrow(subject)
range. I'd like to be able to say in the seq_along(query) or
seq_along(subject) range because that's what findOverlaps() does
on any other object defined in IRanges/GenomicRanges/GenomicAlignments.
But I can't because that would be inaccurate.

However, it's conceptually true: I can use the indices in the Hits
object to do 1D extractions from the query or subject. This is good
and consistent with any other type of query or subject.


col- and
rownames() are defined but names() is NULL. I guess 1-D sub-setting
isn't matrix-like, but I don't think that removing this 'feature' simply
for consistency's sake is worth it;


I was not suggesting that.


I guess the subsetting logic was
copy/pasted from other code without enough thought. head(), tail() could
be implemented if this were somehow useful (I usually use these for
compact display, and that's irrelevant here...);


I still find it sometimes useful to be able to do head() on a big
object when I just want to try things on a few elements first:

  dim(vcf)
  [1] 100   3

  toy <- head(vcf)
  rowData(toy)
  assay(toy)
  isSNV(toy)
  findOverlaps(toy, exons)

It's more convenient and much quicker than having to truncate the
individual results:

  head(rowData(vcf))
  head(assay(vcf))
  head(isSNV(vcf))
  head(findOverlaps(vcf, exons))

I guess what I'm trying to say is that while it helps to think of
a SummarizedExperiment as fundamentally a matrix, there are already
enough differences with the matrix API to suggest that, unlike for a
matrix, the length of a SummarizedExperiment object is its number of rows.
It's implicit in many ways and I think that formalizing it would help
rather than hurt. It will still be somewhat of a surprise for the
end-user, but not a bigger surprise than the ones s/he gets right
now with seq_along(vcf), vcf[i], isSNV(vcf), findOverlaps(), head(),
etc. And once s/he gets over it, there won't be any more surprises:
all these things will be in agreement with length(vcf) and behave
as expected.

Thanks,
H.



rev() on a matrix
doesn't do anything useful.

Martin



That would automatically fix many convenience [ wrappers like head(),
tail(), rev(), etc...

head(se1)
   class: SummarizedExperiment
   dim: 1 6
   exptData(0):
   assays(1): counts
   rownames: NULL
   rowData metadata column names(0):
   colnames(6): A B ... E F
   colData names(1): Treatment

rev(se1)
   class: SummarizedExperiment
   dim: 1 6
   exptData(0):
   assays(1): counts
   rownames: NULL
   rowData metadata column names(0):
   colnames(6): A B ... E F
   colData names(1): Treatment

Following that logic names(se1) should also probably return colnames(se1).

H.







Re: [Bioc-devel] restrictToSNV for VCF

2014-03-21 Thread Michael Lawrence
Some of the inconsistency emerges from wrappers that correspond to
operations on the rowData. I think that's fine as long as it's obvious (as
in the case of findOverlaps and isSNV). The head and tail functions are by
convention row-based for rectangular objects. I agree though that if we
keep 1D extraction then the behavior of length() should be changed.
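
For reference, the plain-matrix conventions being compared against can be sketched as follows. This uses an ordinary base R matrix only, not a SummarizedExperiment or VCF object, and the object names are purely illustrative.

```r
# Plain base R matrix, used only to illustrate the matrix conventions
# under discussion (not a SummarizedExperiment or VCF object).
m <- matrix(1:12, nrow = 4, ncol = 3)

len      <- length(m)   # number of cells, i.e. prod(dim(m))
top      <- head(m, 2)  # head() is row-based for rectangular objects
one_d    <- m[1:4]      # 1D extraction returns the first four cells, not rows
reversed <- rev(m)      # rev() drops dim and returns a plain vector

print(len)
print(dim(top))
print(one_d)
print(is.null(dim(reversed)))
```

On a matrix, then, length() and 1D extraction count cells while head() counts rows, which is the mismatch that a row-based length() for SummarizedExperiment would avoid.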



Re: [Bioc-devel] restrictToSNV for VCF

2014-03-20 Thread Hervé Pagès

On 03/20/2014 05:20 PM, Hervé Pagès wrote:
[...]


Following that logic names(se1)  also probably return colnames(se1).
                                /\
                              should

H.




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel













--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Bioc-devel] restrictToSNV for VCF

2014-03-19 Thread Michael Lawrence
Also, the code for DNAStringSetList is too low-level. There should just be
an nchar,XStringSetList that does the same thing as
nchar,CompressedCharacterList. Then restrictToSNV or whatever just does
any(nchar(x) == 1L) for any List.
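
A rough illustration of that idea in base R, using an ordinary list of character vectors as a stand-in for a DNAStringSetList. The helper name nchar_list is hypothetical and is not the Biostrings method; it only mimics the approach of calling nchar() once on the unlisted data and then regrouping the result.

```r
# Hypothetical helper: list-wise nchar() computed on the flattened data,
# then regrouped so the result has the same shape as the input list.
nchar_list <- function(x) {
  flat <- unlist(x, use.names = FALSE)
  grp  <- factor(rep(seq_along(x), lengths(x)), levels = seq_along(x))
  unname(split(nchar(flat), grp))
}

# Stand-in for the ALT column: one character vector per variant.
alt <- list("A", "TT", c("G", "A"), c("TT", "C"))

widths <- nchar_list(alt)

# TRUE when at least one alternate allele is a single base.
has_single_base <- vapply(widths, function(w) any(w == 1L), logical(1))
print(has_single_base)
```

With a list-aware nchar() in place, the per-variant test reduces to a one-line any() over each element, which is the simplification suggested above.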

Michael




Re: [Bioc-devel] restrictToSNV for VCF

2014-03-19 Thread Vincent Carey
On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence lawrence.mich...@gene.com
 wrote:

 It would be nice to have functions like isSNV, isIndel, isDeletion, etc
 that at least provide precise definitions of the terminology. I've added
 these, but they're designed only for VRanges. Should work for ExpandedVCF.

 Also, it would be nice if restrictToSNV just assumed that alt(x) must be
 something with nchar() support (with special handling for any List), so
 that the 'character' vector of alt,VRanges would work immediately.
 Basically restrictToSNV should just be x[isSNV(x)]. Is there even a
 use-case for the restrictToSNV abstraction if we did that?


For a VCF instance it would be x[isSNV(x), ], and indeed I think that would be
sufficient. I like the idea of having this family of predicates for
variant classes to allow such selections.



 Michael



 On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain voben...@fhcrc.org wrote:

 Hi,

 I've added a restrictToSNV() function to VariantAnnotation (1.9.46). The
 return value is a subset VCF object containing SNVs only. The function
 operates on CollapsedVCF or ExpandedVCF and the alt(vcf) value must be
 nucleotides (i.e., no structural variants).

 A variant is considered a SNV if the nucleotide sequences in both
 ref(vcf) and alt(vcf) are of length 1. I have a question about how variants
 with multiple 'ALT' values should be handled.

 Should we consider row 4 a SNV? One 'ALT' is length 1, the other is not.

 ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
 REF <- DNAStringSet(c("G", c("AA"), "T", "G"))

 DataFrame(REF, ALT)

 DataFrame with 4 rows and 2 columns
              REF                ALT
   <DNAStringSet> <DNAStringSetList>
 1              G                  A
 2             AA                 TT
 3              T                G,A
 4              G               TT,C
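
The two possible answers to the row 4 question can be compared with a small base R sketch. Plain character vectors and a list stand in for the DNAStringSet and DNAStringSetList above; the function name is_snv_rows and its all_alt flag are hypothetical, chosen only to illustrate "every ALT must qualify" versus "at least one ALT qualifies".

```r
# Stand-ins for the REF and ALT columns in the table above.
ref <- c("G", "AA", "T", "G")
alt <- list("A", "TT", c("G", "A"), c("TT", "C"))

# Hypothetical helper: a row counts as a SNV when REF has length 1 and
# either all (all_alt = TRUE) or any (all_alt = FALSE) ALT has length 1.
is_snv_rows <- function(ref, alt, all_alt = TRUE) {
  combine <- if (all_alt) all else any
  alt_ok  <- vapply(alt, function(a) combine(nchar(a) == 1L), logical(1))
  nchar(ref) == 1L & alt_ok
}

strict  <- is_snv_rows(ref, alt, all_alt = TRUE)   # row 4 is FALSE
lenient <- is_snv_rows(ref, alt, all_alt = FALSE)  # row 4 is TRUE
print(rbind(strict, lenient))
```

Rows 1 to 3 get the same answer either way; only row 4, with its mixed ALT values, depends on the choice.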



 Thanks.
 Valerie



Re: [Bioc-devel] restrictToSNV for VCF

2014-03-19 Thread Michael Lawrence
You can apparently use 1D extraction for VCF, which is a little surprising;
I learned it from restrictToSNV.






Re: [Bioc-devel] restrictToSNV for VCF

2014-03-19 Thread Michael Lawrence
Thanks Sean. Probably also need an isSubstitution for any substitution,
either SNV or complex.


On Wed, Mar 19, 2014 at 3:20 PM, Sean Davis sdav...@mail.nih.gov wrote:



 On Wed, Mar 19, 2014 at 4:26 PM, Valerie Obenchain voben...@fhcrc.org wrote:

 Thanks for the feedback.

 I'll look into nchar for XStringSetList.

 I'm in favor of supporting isDeletion(), isInsertion(), isIndel() and
 isSNV() for the VCF classes and removing restrictToSNV(). I could add an
 argument 'all_alt' or 'all_alt_agreement' to be used with CollapsedVCF in
 the case where not all alternate alleles meet the criteria.

 Here are the current definitions:

 isDeletion <- function(x) {
   nchar(alt(x)) == 1L & nchar(ref(x)) > 1L &
     substring(ref(x), 1, 1) == alt(x)
 }

 isInsertion <- function(x) {
   nchar(ref(x)) == 1L & nchar(alt(x)) > 1L &
     substring(alt(x), 1, 1) == ref(x)
 }

 isIndel <- function(x) {
   isDeletion(x) | isInsertion(x)
 }

 isSNV <- function(x) {
   nchar(alt(x)) == 1L & nchar(ref(x)) == 1L
 }
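
To see how these definitions classify a few variants, here is a small base R sketch that applies the same logic to plain character vectors rather than to the ref() and alt() accessors. The function names and the example alleles are made up for illustration.

```r
# Same rules as the definitions above, written for parallel character
# vectors of reference and alternate alleles (one ALT per row).
is_deletion <- function(ref, alt) {
  nchar(alt) == 1L & nchar(ref) > 1L & substring(ref, 1, 1) == alt
}
is_insertion <- function(ref, alt) {
  nchar(ref) == 1L & nchar(alt) > 1L & substring(alt, 1, 1) == ref
}
is_indel <- function(ref, alt) is_deletion(ref, alt) | is_insertion(ref, alt)
is_snv   <- function(ref, alt) nchar(alt) == 1L & nchar(ref) == 1L

# Illustrative alleles: a SNV, a deletion, an insertion, and a
# two-base substitution that matches none of the predicates.
ref <- c("G", "GA", "T",  "AA")
alt <- c("A", "G",  "TC", "TT")

res <- data.frame(
  ref, alt,
  snv       = is_snv(ref, alt),
  deletion  = is_deletion(ref, alt),
  insertion = is_insertion(ref, alt),
  indel     = is_indel(ref, alt)
)
print(res)
```

The last row (AA to TT) is flagged by none of the four predicates, which is the kind of case a separate substitution predicate would cover.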



 To be thorough:

 isTransition()

 isSV()

 isSVPrecise()

 We haven't been using VCF for SVs much yet, but there are probably some
 fun things to be done on that front.

 Sean





 Valerie




 --
 Valerie Obenchain
 Program in Computational Biology
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B155
 P.O. Box 19024
 Seattle, WA 98109-1024

 E-mail: voben...@fhcrc.org
 Phone:  (206) 667-3158
 Fax:(206) 667-1319

