[Samtools-help] bams with malformed region names lead to variant calls using -l to be incorrect

Liyang Diao Thu, 23 Mar 2017 20:25:42 -0700

Dear all,

I have a large number of bam files, where the number of reference "genomes"
is very large, about 1M (bacterial marker gene alignments). A small
fraction of these genomes is poorly named, resulting in the following error
when I run mpileup:


Could not parse the header line: "##contig=<ID=BADNAMES>"

Since this was a small fraction of the references and I am only interested
in a preliminary exploratory analysis, I went ahead and looked into the
VCFs that were generated, assuming (wrongly?) that alignments to these
areas would simply be ignored.

What I found, however, was that the called variants are incorrect: for
example, I have high-confidence SNPs in regions of zero coverage.

So I am wondering if there is an easy workaround to this problem, or if I
will have to perform realignments of the data, removing or renaming the
culprit references.
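
For anyone trying to locate the culprit references first, here is a minimal
sketch in Python. It assumes (this is not confirmed above) that the bad names
are ones containing characters outside the reference-name pattern given in the
SAM specification, and that the header text was obtained separately, for
example with `samtools view -H`. The function name `suspicious_contigs` is
hypothetical.

```python
import re

# Reference-name pattern from the SAM specification (RNAME / @SQ SN field).
# Assumption: names outside this pattern are the ones that break the VCF header.
VALID_NAME = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)


def suspicious_contigs(header_text):
    """Return @SQ sequence names that do not match the SAM name pattern."""
    bad = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                name = field[3:]
                if not VALID_NAME.fullmatch(name):
                    bad.append(name)
    return bad


if __name__ == "__main__":
    example = (
        "@HD\tVN:1.5\n"
        "@SQ\tSN:geneA\tLN:100\n"
        "@SQ\tSN:gene<B>,x\tLN:50\n"
    )
    print(suspicious_contigs(example))
```

The resulting list only tells you which names to inspect; whether renaming
them in the header is enough, or a realignment is needed, is a separate
question.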

I found that, for some reason, using the -r POSITION flag in mpileup
appears to give reasonable results, whereas -l produces bad results as
before. However, from searching this help archive, I found that -r cannot
read multiple positions from a file.
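
One possible stopgap, sketched in Python under stated assumptions, is to turn
each line of the positions file into its own -r region and run mpileup once
per region. The sketch assumes a tab-separated file with the reference name in
column 1 and a 1-based position in column 2, and the helper names
`region_strings` and `mpileup_commands` are hypothetical. It only builds the
command lines and does not run them. One call per position will be slow for
many positions, and reference names that themselves contain ":" may not parse
as regions.

```python
def region_strings(positions_text):
    """Convert 'name<TAB>pos' lines into 'name:pos-pos' region strings."""
    regions = []
    for line in positions_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        regions.append(f"{chrom}:{int(pos)}-{int(pos)}")
    return regions


def mpileup_commands(bam, ref, positions_text):
    """Build one 'samtools mpileup -r REGION' argument list per position."""
    return [
        ["samtools", "mpileup", "-f", ref, "-r", region, bam]
        for region in region_strings(positions_text)
    ]


if __name__ == "__main__":
    for cmd in mpileup_commands("sample.bam", "ref.fa", "geneA\t10\ngeneB\t250\n"):
        print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run`; the BAM file must
be indexed for -r to work.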

Any help would be greatly appreciated!
Thanks


_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
