Thanks for the suggestion, Petr. But no, BAQs are not the culprits. I made four mpileup files using these four commands:
1 mpileup -d 1000000 -f hg38.fa PREFIX.bam > PREFIX.pileup 2 mpileup --no-BAQ -d 1000000 -f hg38.fa PREFIX.bam > PREFIX_B0.pileup 3 mpileup -Q 0 -d 1000000 -f hg38.fa PREFIX.bam > PREFIX_Q0.pileup 4 mpileup --no-BAQ -Q 0 -d 1000000 -f hg38.fa PREFIX.bam > PREFIX_B0Q0.pileup For majority of lines there was no difference after --no-BAQ, for example that line I showed in last e-mail ( chr16 3243407 T ...) was exactly same after command 1 and 2 and after command 3 and 4, so only -Q 0 made the difference. There were some lines that changed, usualy a the edge of amplicon, and that is likely what is BAQ used for - to prevent misreading around indel ? I am starting to think, if this weird "recalibration" of base qualities is not somehow related to the fact, that I am analyzing amplicon library. Amplicon library looks like a bunch of PCR duplicates, in this case all those thousands reads look like duplicates of 28 reads (there are 28 amplicons there). If this is true it is not Bowtie2 fault, Bowtie does not flag them as duplicates. I used today different library, that was constructed by random fragmentation of a large 8 kb PCR fragment and I did not have "strand bias" problem like I have with amplicon library. Anyway, any suggestion how to make samtools work for my amplicon case are welcome. Jiri On 2/20/2017 12:40 PM, Petr Danecek wrote: > Could this be because of BAQ perhaps? Try to run with --no-BAQ > > Petr > > On Mon, 2017-02-20 at 10:31 -0600, Jiri Nehyba wrote: >> Hi Tom, >> >> Thanks for your help. >> >> I think Bowtie2 is not changing base qualities. To verify that I >> extracted from Bowtie2 bam file back the fastqs (bedtools >> bamtofastq) >> and looked base scores in FastQC - they look the same as before >> alignment. Bowtie2 is assigning alignment quality scores (bam fifth >> column) and those are mostly perfect (151364 from 158220 bam/sam >> lines >> have score 42). >> >> mpileup of samtools is changing the qualities. Below is one line >> from >> mpileup file (samtools mpileup -Q 0 -d 1000000 -f hg38.fa PREFIX.bam >>> >> PREFIX.pileup) . I folded long fifth and sixth column for better >> readibility. This is the position where the reads have C instead of >> T >> that is in reference. You might see about half of the bases have >> qualities higher than 0 (!), most frequent quality is m that would >> correspond to 109-33=76 on Illumina 1.8+ scale. Now original >> qualities >> both in fastqs and in bam/sam file don't have zeroes and the most >> common >> quality is G that is 38 on Illumina 1.8+ scale (lowest I can see just >> by >> looking at random places in bam file is + at the end of reads that >> is >> quality 10). >> >> >> Jiri Nehyba >> >> chr16 3243407 T 928 >> CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC >> CCCCCCCCCCC >> CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC.C >> CCCCCCCCCCC >> CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC >> CCCCCCCCCCC >> CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC >> CCCCCCCCCCC >> CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC >> CCCCCCCACCC >> CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC.CCCCCCCCCCCCCCCccccc >> ccccccccccc >> ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc >> ccccccccccc >> cccccccccccccccccccccccccccccccccccccccccccccccccccc,cccccccccccccccc >> ccccccccccc >> ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc >> ccccccccccc >> ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc >> ccccccccccc >> cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccaccccccc >> ccccccccccc >> cccccccccccccccccccccccccccccccccccccccccccccccc >> mhmmmmklgmlmmlmmmmmlmmmmmmmimlmmmljmml_mmmmm\llmmRkmmkjm_mmimmmlmmkmm >> lmmkmmfllmm >> m`mmlmmmmmmm[m_mmmbkkmimmmmemmmRmmmkiml`glmlmmmmhi`_mm]gmmmbmkll]emLl >> kmmmmmkchgi >> jmlmlkm_ilmmmmmmmmmikmXjmhjmm\jjjm`llmmimZmmh`mmmmmllRmjkmkmmkmllmmmm >> mmmmmmmmm[m >> l^mmmmlmmlmhmmmmmmmmmlmmmmlmmYmmiYmmmjmmmmkmmjImmmkmmmmmgmmmklmbimmlm >> kmmmmmlmmim >> emmmmmlmmmlmmmmmmmmmmmmmmkimmmkhmmjlGlhmllmmmmllmm[mmmgmeimimm_bmmkmQ >> mamjmmlmmmf >> mmhimmmmmlmmmmimmmml_mmjmmmmmlmmlmmmmimhemhR_^mm)mmjlmmmmkmmmm`m!!!!! >> !!!!!!!!!!! >> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >> !!!!!!!!!!! >> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >> !!!!!!!!!!! >> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >> !!!!!!!!!!! >> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >> !!!!!!!!!!! >> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >> !!!!!!!!!!! >> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >> >> >> On 2/20/2017 6:38 AM, Thomas W. Blackwell wrote: >>> >>> I think this is a bowtie2 question, not a samtools question. Why >>> is >>> bowtie2 setting those base call qualities to zero ? >>> >>> - tom blackwell - >>> >>> On Sun, 19 Feb 2017, Jiri Nehyba wrote: >>> >>>> Hi, >>>> >>>> I would like to ask for help with mutation calling using >>>> samtools-bcftools. Specifically my problem is that I am loosing >>>> (almost) all my reverse bases and mutations are called only on >>>> forward bases (reads). >>>> >>>> I have installed last version of samtools and bcftools (1.3.1). >>>> >>>> I have paired fastqs from amplicon panel libraries sequenced on >>>> Illumina MiSeq. I cut off the primer sequences using Cutadapt, >>>> aligned reads to hg38 genome with Bowtie2 (paired alignment), >>>> sorted >>>> and indexed bam file with samtools and then I used following >>>> command: >>>> >>>> samtools mpileup -u -v -d 1000000 -f hg38.fa PREFIX.bam | >>>> bcftools >>>> call -c -v >>>>> PREFIX.vcf >>>> Resulting vcf file looks like this (showing just three rows and >>>> only >>>> some of the fields) >>>> >>>> chr16 3243407 . T C 221.999 . DP=928; . . . >>>> ;DP4=1,0,462,0;MQ=42;FQ=-281.989;PV4=1,1,0.454462,1 GT:PL >>>> 1/1:255,255,0 >>>> chr16 3243888 . C T 221.999 . DP=2982; . . >>>> . >>>> ;DP4=2,0,1486,2;MQ=42;FQ=-281.989;PV4=1,0.458362,0.4298,1 GT:PL >>>> 1/1:255,255,0 >>>> chr16 3243922 . A T 221.999 . DP=2982; . . >>>> . >>>> ;DP4=1,0,1486,4;MQ=42;FQ=-281.989;PV4=1,0.338485,0.450054,1 >>>> GT:PL >>>> 1/1:255,255,0 >>>> >>>> What I don't like is the DP4 field that suggests extreme strand >>>> bias. >>>> >>>> I tried to find out why are those reverse bases discarded - there >>>> is >>>> no reason for generally lower quality score - amplicons are >>>> ligated >>>> randomly in both orientations and I am getting both sides of >>>> each >>>> amplicon in R1 and R2 reads. >>>> >>>> The only thing that prevents the "loss" of the bases is setting >>>> the Q >>>> to 0, like that: >>>> >>>> samtools mpileup -u -v -Q 0 -d 1000000 -f hg38.fa PREFIX.bam | >>>> bcftools call -c -v > PREFIX.vcf >>>> >>>> Result: >>>> >>>> chr16 3243407 . T C 221.999 . DP=928; . . . >>>> ;DP4=2,1,462,463;MQ=42;FQ=-281.989;PV4=1,1,0.419714,1 GT:PL >>>> 1/1:255,255,0 >>>> chr16 3243888 . C T 221.999 . DP=2982; . . >>>> . >>>> ;DP4=2,2,1489,1489;MQ=42;FQ=-281.989;PV4=1,0.493227,0.40081,1 >>>> GT:PL >>>> 1/1:255,255,0 >>>> chr16 3243922 . A T 221.999 . DP=2982; . . >>>> . >>>> ;DP4=4,3,1487,1488;MQ=42;FQ=-281.989;PV4=1,1,1,1 GT:PL >>>> 1/1:255,255,0 >>>> >>>> If I set Q to more than 0, for example 1 I will again loose >>>> almost >>>> all reverse bases. >>>> >>>> Finally, a look at mpileup file obtained by this command: >>>> samtools >>>> mpileup -Q 0 -d 1000000 -f hg38.fa PREFIX.bam > PREFIX.pileup >>>> >>>> I see that half of the bases is getting score ! that is 0 : >>>> >>>> >>>> >>>> Thank you for your help, >>>> >>>> Jiri >>>> >> >> ------------------------------------------------------------------- >> ----------- >> Check out the vibrant tech community on one of the world's most >> engaging tech sites, SlashDot.org! http://sdm.link/slashdot >> _______________________________________________ >> Samtools-help mailing list >> Samtools-help@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/samtools-help > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help