Hi Tom, Thanks for your help.
I think Bowtie2 is not changing base qualities. To verify that I extracted from Bowtie2 bam file back the fastqs (bedtools bamtofastq) and looked base scores in FastQC - they look the same as before alignment. Bowtie2 is assigning alignment quality scores (bam fifth column) and those are mostly perfect (151364 from 158220 bam/sam lines have score 42). mpileup of samtools is changing the qualities. Below is one line from mpileup file (samtools mpileup -Q 0 -d 1000000 -f hg38.fa PREFIX.bam > PREFIX.pileup) . I folded long fifth and sixth column for better readibility. This is the position where the reads have C instead of T that is in reference. You might see about half of the bases have qualities higher than 0 (!), most frequent quality is m that would correspond to 109-33=76 on Illumina 1.8+ scale. Now original qualities both in fastqs and in bam/sam file don't have zeroes and the most common quality is G that is 38 on Illumina 1.8+ scale (lowest I can see just by looking at random places in bam file is + at the end of reads that is quality 10). Jiri Nehyba chr16 3243407 T 928 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC.CCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCACCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC.CCCCCCCCCCCCCCCcccccccccccccccc cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc cccccccccccccccccccccccccccccccccccccccccccccccccccc,ccccccccccccccccccccccccccc cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccacccccccccccccccccc cccccccccccccccccccccccccccccccccccccccccccccccc mhmmmmklgmlmmlmmmmmlmmmmmmmimlmmmljmml_mmmmm\llmmRkmmkjm_mmimmmlmmkmmlmmkmmfllmm m`mmlmmmmmmm[m_mmmbkkmimmmmemmmRmmmkiml`glmlmmmmhi`_mm]gmmmbmkll]emLlkmmmmmkchgi jmlmlkm_ilmmmmmmmmmikmXjmhjmm\jjjm`llmmimZmmh`mmmmmllRmjkmkmmkmllmmmmmmmmmmmmm[m l^mmmmlmmlmhmmmmmmmmmlmmmmlmmYmmiYmmmjmmmmkmmjImmmkmmmmmgmmmklmbimmlmkmmmmmlmmim emmmmmlmmmlmmmmmmmmmmmmmmkimmmkhmmjlGlhmllmmmmllmm[mmmgmeimimm_bmmkmQmamjmmlmmmf mmhimmmmmlmmmmimmmml_mmjmmmmmlmmlmmmmimhemhR_^mm)mmjlmmmmkmmmm`m!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! On 2/20/2017 6:38 AM, Thomas W. Blackwell wrote: > > I think this is a bowtie2 question, not a samtools question. Why is > bowtie2 setting those base call qualities to zero ? > > - tom blackwell - > > On Sun, 19 Feb 2017, Jiri Nehyba wrote: > >> Hi, >> >> I would like to ask for help with mutation calling using >> samtools-bcftools. Specifically my problem is that I am loosing >> (almost) all my reverse bases and mutations are called only on >> forward bases (reads). >> >> I have installed last version of samtools and bcftools (1.3.1). >> >> I have paired fastqs from amplicon panel libraries sequenced on >> Illumina MiSeq. I cut off the primer sequences using Cutadapt, >> aligned reads to hg38 genome with Bowtie2 (paired alignment), sorted >> and indexed bam file with samtools and then I used following command: >> >> samtools mpileup -u -v -d 1000000 -f hg38.fa PREFIX.bam | bcftools >> call -c -v >>> PREFIX.vcf >> >> Resulting vcf file looks like this (showing just three rows and only >> some of the fields) >> >> chr16 3243407 . T C 221.999 . DP=928; . . . >> ;DP4=1,0,462,0;MQ=42;FQ=-281.989;PV4=1,1,0.454462,1 GT:PL >> 1/1:255,255,0 >> chr16 3243888 . C T 221.999 . DP=2982; . . . >> ;DP4=2,0,1486,2;MQ=42;FQ=-281.989;PV4=1,0.458362,0.4298,1 GT:PL >> 1/1:255,255,0 >> chr16 3243922 . A T 221.999 . DP=2982; . . . >> ;DP4=1,0,1486,4;MQ=42;FQ=-281.989;PV4=1,0.338485,0.450054,1 GT:PL >> 1/1:255,255,0 >> >> What I don't like is the DP4 field that suggests extreme strand bias. >> >> I tried to find out why are those reverse bases discarded - there is >> no reason for generally lower quality score - amplicons are ligated >> randomly in both orientations and I am getting both sides of each >> amplicon in R1 and R2 reads. >> >> The only thing that prevents the "loss" of the bases is setting the Q >> to 0, like that: >> >> samtools mpileup -u -v -Q 0 -d 1000000 -f hg38.fa PREFIX.bam | >> bcftools call -c -v > PREFIX.vcf >> >> Result: >> >> chr16 3243407 . T C 221.999 . DP=928; . . . >> ;DP4=2,1,462,463;MQ=42;FQ=-281.989;PV4=1,1,0.419714,1 GT:PL >> 1/1:255,255,0 >> chr16 3243888 . C T 221.999 . DP=2982; . . . >> ;DP4=2,2,1489,1489;MQ=42;FQ=-281.989;PV4=1,0.493227,0.40081,1 GT:PL >> 1/1:255,255,0 >> chr16 3243922 . A T 221.999 . DP=2982; . . . >> ;DP4=4,3,1487,1488;MQ=42;FQ=-281.989;PV4=1,1,1,1 GT:PL 1/1:255,255,0 >> >> If I set Q to more than 0, for example 1 I will again loose almost >> all reverse bases. >> >> Finally, a look at mpileup file obtained by this command: samtools >> mpileup -Q 0 -d 1000000 -f hg38.fa PREFIX.bam > PREFIX.pileup >> >> I see that half of the bases is getting score ! that is 0 : >> >> >> >> Thank you for your help, >> >> Jiri >> ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help