Hi,

I'm working on a comparison of variant callers (details below; see 1). The Samtools 1.1 results are within 0.2% of the maximum sensitivity (congrats!). Each of the other callers I'm looking at has a default minimal filter strategy, and I'm hoping you can suggest one for Samtools.
We work on many different kinds of experiments, so we're looking for a robust initial filter that can be applied broadly. Under the conditions of my comparison, the other callers' default filters reduce sensitivity by roughly 1% and improve specificity by a similar amount. Samtools doesn't need filtering under my current conditions, because its unfiltered specificity is comparable to the filtered values for the other callers (again, congrats). However, I expect specificity to worsen for all callers when I don't have a defined set of callable regions (particularly with less well-annotated model organisms) or when an experiment calls for analysis of more difficult regions.

Although they weren't intended for this purpose, I applied your example filters (2), but they're far more stringent than the minimal filters used by the other callers.

Thanks and best regards,
Holly

Details:

(1) I'm comparing the high-confidence genotype calls on NA12878 from the Genome in a Bottle consortium (GiaB) with variant calls made by Freebayes, Platypus, and GATK on chr20 of NA12878 at 40x coverage, excluding regions not defined as callable by GiaB. I use vcfallelicprimitives from vcflib to try to standardize the representation of indels and complex variants.

(2) -g3 -G10 %QUAL<10 || (RPB<0.1 && %QUAL<15) || (AC<2 && %QUAL<15) || %MAX(DV)<=3 || %MAX(DV)/%MAX(DP)<=0.3

*Holly Beale, PhD*
Computational Biologist
hbe...@maverixbio.com
Maverix Biomics, Inc.
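
P.S. In case it helps pin down why the example expression in (2) is so stringent, here is a rough Python sketch of how I read its logical part (the gap options are not modeled). The function and argument names are hypothetical. I'm also assuming that the expression marks sites to be excluded, that a missing RPB value makes its clause false, and that a site with zero depth does not trigger the DV/DP clause. None of this is meant as a statement of how the tool actually evaluates it.

```python
def matches_example_filter(qual, rpb, ac, dv, dp):
    """Return True if a site matches the expression in (2).

    qual -- site QUAL value
    rpb  -- INFO/RPB value, or None if the tag is absent (assumption:
            a missing value makes the RPB clause false)
    ac   -- INFO/AC, the alternate allele count
    dv   -- list of per-sample DV values (reads supporting the variant)
    dp   -- list of per-sample DP values (read depth)
    """
    max_dv = max(dv)
    max_dp = max(dp)
    # Assumption: with zero depth the ratio clause is treated as false.
    low_fraction = max_dp > 0 and max_dv / max_dp <= 0.3
    return (
        qual < 10
        or (rpb is not None and rpb < 0.1 and qual < 15)
        or (ac < 2 and qual < 15)
        or max_dv <= 3
        or low_fraction
    )


# A well-supported site: none of the clauses match.
print(matches_example_filter(qual=50, rpb=0.5, ac=2, dv=[20], dp=[40]))
# A site with AC=1 and QUAL=12 matches the (AC<2 && %QUAL<15) clause.
print(matches_example_filter(qual=12, rpb=0.5, ac=1, dv=[20], dp=[40]))
```

Read this way, any site with three or fewer variant-supporting reads, or with at most 30% of reads supporting the variant, matches regardless of its QUAL.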