Dear all,

I have a large number of BAM files in which the number of reference
"genomes" is very large, about 1M (bacterial marker gene alignments). A
small fraction of these references are poorly named, resulting in the
following error when I run mpileup:

Could not parse the header line: "##contig=<ID=BADNAMES>"

Since this was a small fraction of the references and I am only interested
in a preliminary exploratory analysis, I went ahead and looked into the
VCFs that were generated, assuming (wrongly?) that alignments to these
areas would simply be ignored.

What I found, however, was that the variants called are incorrect--for
example, I have high-confidence SNPs found in regions of zero coverage.

So I am wondering if there is an easy workaround to this problem, or if I
will have to perform realignments of the data, removing or renaming the
culprit references.
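
To make the renaming option concrete, here is a minimal sketch of rewriting
the reference names in a SAM header dumped as text. The function name
sanitize_header and the set of characters treated as "safe" are assumptions
for illustration only; which characters actually trip the parser is not
something I have confirmed.

```python
import re

# Assumption: anything outside this conservative set counts as a "bad" character.
_UNSAFE = re.compile(r"[^0-9A-Za-z_.|-]")


def sanitize_header(header_text):
    """Clean the SN: names on @SQ lines of a SAM header.

    Returns the new header text and a dict mapping each changed
    old name to its new name. Line order and LN: values are untouched.
    """
    seen = set()
    mapping = {}
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = line.split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    old = field[3:]
                    new = _UNSAFE.sub("_", old)
                    # Keep names unique if two references clean to the same string.
                    base, n = new, 1
                    while new in seen:
                        n += 1
                        new = f"{base}_{n}"
                    seen.add(new)
                    if new != old:
                        mapping[old] = new
                    fields[i] = "SN:" + new
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n", mapping
```

If something like this is viable, the header could presumably be dumped with
"samtools view -H", cleaned, and swapped back in with "samtools reheader",
which would avoid realigning since only the names change and not the order
or lengths of the references.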

I found that, for some reason, using the -r POSITION flag in mpileup
appears to give reasonable results, whereas -l produces the same bad
results as before. From searching this help archive, however, it seems
that -r cannot take a file listing multiple positions.
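
One stopgap I can imagine is turning a positions file into individual region
strings and running mpileup once per region with -r. The sketch below only
builds the region strings; the helper name regions_from_positions and the
assumed input layout (reference name and 1-based position separated by
whitespace, one pair per line) are my own illustration, and one call per
position may well be too slow for this many references.

```python
def regions_from_positions(text):
    """Convert 'name position' lines into 'name:pos-pos' region strings.

    Blank lines and lines starting with '#' are skipped. Assumes reference
    names contain no whitespace.
    """
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, pos = line.split()[:2]
        # int() rejects malformed positions early instead of passing them on.
        regions.append(f"{chrom}:{int(pos)}-{int(pos)}")
    return regions
```

Each returned string would then be passed as the argument to -r in a separate
mpileup call.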

Any help would be greatly appreciated!
Thanks
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
