Hi Colin,
here are couple of alignments from sam file. I used name sorting for
this demonstration to show reads of both orientations together.
Otherwise for mpile I am using the same bam file sorted by plain sort
command.
Jiri
M03964:25:000000000-AKT18:1:1101:1792:12037 83 chr16 3249344
42 173M = 3249342 -175
ACAACCCAGAGTTGTTGGGAAAATGAAGTAAGGCCCAGTGTGTCCAAGTGCCTGGCAGAGAAGAGCCCACAGGCAGGGAGTGCCTACCTTGTGTTCCAGGGCGACCTCCTCAATGGGGCGCACCCGGTGGCCTTGGTGCTCCTGACTCAGACTGCAGATGAGGCAGATGGGCT
GGGFGGGGFGGGGGGGGGFGGGGGGGFGFFGGFGGFCDGGFGGGGGGGGGGGGGGGFGGGGGGGGGGFFGGGFFGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGGGGGGGGEGGGGGFGGG
AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:173 YS:i:0 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:1792:12037 163 chr16 3249342
42 173M = 3249344 175
CAACAACCCAGAGTTGTTGGGAAAATGAAGTAAGGCCCAGTGTGTCCAAGTGCCTGGCAGAGAAGAGCCCACAGGCAGGGAGTGCCTACCTTGTGTTCCAGGGCGACCTCCTCAATGGGGCGCACCCGGTGGCCTTGGTGCTCCTGACTCAGACTGCAGATGAGGCAGATGGG
GGGGGGGGGDGGGGGGGGGCDGGF?FFFGAFGGGGGGGGFGGGGGGGGGGGGGGGGGGGEGGGGGGGGGFFECFF@EG7FCGGGGFGGGDFFGFFGGGGGG?CCCEFE>FGGGFFFCD8CEECGGGGGGGGGGGGGGGFCFCCGGFGF>F6F>FGGFGFDFFFAD>CAFFDAF
AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:173 YS:i:0 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:1856:12139 83 chr16 3248843
42 198M = 3248841 -200
ACCCCTGCTCACTCTTCCCACCTTCCTCCCAGGGACGGATGGGCCATCAGCCACCTCTGACCTTACCAGAAAGCTCACTGCCTTCTCCTCCCCATAGGATCGCTGCTCCTCCCCTGATTTTCTCAGCTTCTTCAGATGCTCCAGCTGCTTCTGAATTTTCTTCTGGAAAAACAGCACTTGTTGAAAAGCTTGAATTTG
FDAFFFFGGGFGGDF@5C77GGC>GEGEGGFEEGGGGGGGGFGGGGFDEF6E8FGGGGFGGGGGFFGGGEA9GGGGFGGGGGGGGGGGGGGDGGFGGGFGGGGGGGEGGGGGFGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGG
AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:198 YS:i:-3 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:1856:12139 163 chr16 3248841
42 198M = 3248843 200
GGACCCCTGCTCACTCTTCCCACCTTCCTCCCAGGGACGGATGGGCCATCAGCCACCTCTGACCTTACCAGAAAGCTCACTGCCTTCTCCTCCCCATAGGATCGCTGCTCCTCCCCTGATTTTCTCAGCTTCTTCAGATGCTCCAGCTGCTTCTGACTTTTCTTCTGGAAAAACAGCACTTGTTGAAAAGCTTGAATT
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGDEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGFGGGGGGFGGGGEGGGCFGGGGGGGGGGGGFGGGGDGG+<D7FFGGFGFFFGFGFGF?5?FFFCFFF<@C5==@5@=CFF
AS:i:-3 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:156A41 YS:i:0 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:1936:12864 83 chr16 3254291
42 159M = 3254289 -161
AATTTCTGGATTTGCGGGCGCCTTCTCCCCTGTAGAAATGGTGACCTCAAGGCTTCTAGGTCGCATCTTTCCCGAGGGCAGGTACACTTCGAAGGGCCTGCACTCCTTCTGCCCCGGGGCGCCCCCCGCCAGCCCCTGCAGCCTCCCCGCGGAGCTGGC
?CGC:<FF<7:E>GGC>>DF88ECEGGEC<<+7@F<CC?CFGGGDFGGGGGGGF9DGEE>E:C7FGGFEGGDEE<EGFGGFGGGGGEFFGE6G?;5@CGGGGGGGGGGCEEE:GGGGGCCGGGGGCGGEDGF@7FGGGEGDGGGGGEEGCEE<GGGGDF
AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:159 YS:i:0 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:1936:12864 163 chr16 3254289
42 159M = 3254291 161
AGAATTTCTGGATTTGCGGGCGCCTTCTCCCCTGTAGAAATGGTGACCTCAAGGCTTCTAGGTCGCATCTTTCCCGAGGGCAGGTACACTTCGAAGGGCCTGCACTCCTTCTGCCCCGGGGCGCCCCCCGCCAGCCCCTGCAGCCTCCCCGCGGAGCTG
FCFGAEFCF9CEGGFFFGGGGEEGG:<@<:@B<FGGGGGGAF,C<EGFGD@FCA@FFFFGC,9C:@C@:FFG,EFGGGCB@F?@EFGFGCFDEG,BCGGGCEDCF>DFG9E9@DDFGC=6+@EEGG:@:EGGG*=:8**?F4D=<49EE7DDG557C46
AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:159 YS:i:0 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:1992:12226 83 chr16 3243623
42 218M = 3243621 -220
GTCTAACACTCTTCAGATCATCAGAGAAGATGAGGTTGGGGTAAGCGGTTTCTGCATCCAGAATCACATTAACTGCAAAGAAAATTTGAATACCTAGGTAGGGGTCCATGGGCAACATCCCTACAGGGTTCTCCCCACCTGCAGGAAACAGGGACAGGGTAGTTCTTCTGGAACGTGGTAGGGGAGAGCACAGGGATCCAGCAGGCCAGGGCCACTTG
AFFFAFB:;+AFFA6FFFGFFGFGFFFGGGFFC>GGD?C6FEGGGED@CFGGGFDCFDECE,38FFEAGGGFGGGGGGGGGGECGGGGGCEFF@F9A=GGECFGGGGGDGEFGGGGGGGGGGGGE@CGGGGGGGGGGGGGGGGGGFGGGGFFGGGFGGFGGFGGGGGFGGFCGGGGGGGGDGGFCGFGGGFCFE@EFFDGFGFDCGGFF?GFFFFGFF
AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:218 YS:i:-8 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:1992:12226 163 chr16 3243621
42 218M = 3243623 220
AAGTCTAACACTCTTCAGATCATCAGAGAAGATGAGGTTGGGGTAAGCGGTTTCTGCATCCAGAATCACATTAACTGCAAAGAAAATTTGAATACCGAGGTAGGGGTCCATGGGCAACATCCCTACAGGGTTATCCCCACCTGCAGGAAACAGGGACAGGGTAGTTCTTCTGGAACGTGGTAGGGGAGAGGACAGGGATCCAGCAGGCCAGGGCCACT
EEGCFDGFFEFFFD9CDEEFGGGFGGFGE,@@<AEF?C@E@FCFDGFFFE,@:EEFGFAE,B@FF,?AAFA9FFEFG,EFEFFF,,AFC,A<EFG,,@>CC4C@BC=?F?9EDCGGGF=,@DFGGGEC88D,+6@=2@D>E6=EFGGCG61=DDD?=*=?8*8=?F7CA+3<8@?5<F?>5@A5?)*:8))0::@BAD94>0*7((/58(/885>?04
AS:i:-8 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:96T35C57C27
YS:i:0 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:2092:11235 99 chr16 3254128
42 189M = 3254130 191
CGATATAAAGTAGGAAAGAACACAATTTACCGGTGACCGAATTTTCTGGATTTCCAGGGCCTTCCTTCAGGTCCGCAGATGCCCCTCCATCCGGCGTGGGCCTTGCCCGGGGTTCTGTTGCCGAGTCCAGATTCGCAGCTGTCTTTTCTTCTAGAGTCAGGAGAGTTTCTGGATTTGCGGGCGCCTTCT
@+F@:FF@EFFG9F9FE<88FEF8,;C6CFG@E:FGGFG@@C,6CFGFFFF9FFGGGDGGGGFFDFDF9FE,9EEE7++:,5,BFFGDG<EFGE+FEGG+:AFCF@,==FEF+8A,<FEG,?F7F@,3,8733EEG@+>EGFFCCFEC;,@7BDC,3@9C8<*,*45,2,2,2,,29:19?CDD5::2;
AS:i:-12 XN:i:0 XM:i:4 XO:i:0 XG:i:0 NM:i:4
MD:Z:42G51A53C15A24 YS:i:-19 YT:Z:CP
M03964:25:000000000-AKT18:1:1101:2092:11235 147 chr16 3254130
42 189M = 3254128 -191
ATATAAAGGAGGAAAGAACTCAATTTACCGGTGACCGAATGTTCTGGAGTTCCAGGGCCTTCCTTCAGGTCCGCAGAGGCCCCTCCATCCGGAGTGGGCCTTGCCCGGGGTTCTGTTGCCGAGGCCAGATTCGCAGCTGTCTTTTCCTCTAGAGTCAGGAGAATTTCTGGATTTGCGGGCGCCTGCTCC
))5A5655:8+8;1+1+4*+>:=+:81D8*C8@/?7GC9A70CFD;2**CFCC75;,@,EFE9C@53++++8D8>D8+ECB83,+@EGFGGGCF?@44+++F=+C:@+<GF8F=,+7CDFEAA,GEFC@+8:F8FC8FGFF8F@GCDC,9GFFF<<E<C,FFEDF6,GGGF6:::F7F7@@6CF7F@@,
AS:i:-19 XN:i:0 XM:i:6 XO:i:0 XG:i:0 NM:i:6
MD:Z:8T10A28T28T45T60T4 YS:i:-12 YT:Z:CP
On 2/20/2017 9:01 PM, Colin Hercus wrote:
Hi Jiri,
Could you paste a few alignments from the SAM file.
Colin
On 21 February 2017 at 00:31, Jiri Nehyba <ji...@utexas.edu
<mailto:ji...@utexas.edu>> wrote:
Hi Tom,
Thanks for your help.
I think Bowtie2 is not changing base qualities. To verify that I
extracted from Bowtie2 bam file back the fastqs (bedtools bamtofastq)
and looked base scores in FastQC - they look the same as before
alignment. Bowtie2 is assigning alignment quality scores (bam fifth
column) and those are mostly perfect (151364 from 158220 bam/sam lines
have score 42).
mpileup of samtools is changing the qualities. Below is one line from
mpileup file (samtools mpileup -Q 0 -d 1000000 -f hg38.fa PREFIX.bam >
PREFIX.pileup) . I folded long fifth and sixth column for better
readibility. This is the position where the reads have C instead of T
that is in reference. You might see about half of the bases have
qualities higher than 0 (!), most frequent quality is m that would
correspond to 109-33=76 on Illumina 1.8+ scale. Now original qualities
both in fastqs and in bam/sam file don't have zeroes and the most
common
quality is G that is 38 on Illumina 1.8+ scale (lowest I can see
just by
looking at random places in bam file is + at the end of reads that is
quality 10).
Jiri Nehyba
chr16 3243407 T 928
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC.CCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCACCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC.CCCCCCCCCCCCCCCcccccccccccccccc
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
cccccccccccccccccccccccccccccccccccccccccccccccccccc,ccccccccccccccccccccccccccc
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccacccccccccccccccccc
cccccccccccccccccccccccccccccccccccccccccccccccc
mhmmmmklgmlmmlmmmmmlmmmmmmmimlmmmljmml_mmmmm\llmmRkmmkjm_mmimmmlmmkmmlmmkmmfllmm
m`mmlmmmmmmm[m_mmmbkkmimmmmemmmRmmmkiml`glmlmmmmhi`_mm]gmmmbmkll]emLlkmmmmmkchgi
jmlmlkm_ilmmmmmmmmmikmXjmhjmm\jjjm`llmmimZmmh`mmmmmllRmjkmkmmkmllmmmmmmmmmmmmm[m
l^mmmmlmmlmhmmmmmmmmmlmmmmlmmYmmiYmmmjmmmmkmmjImmmkmmmmmgmmmklmbimmlmkmmmmmlmmim
emmmmmlmmmlmmmmmmmmmmmmmmkimmmkhmmjlGlhmllmmmmllmm[mmmgmeimimm_bmmkmQmamjmmlmmmf
mmhimmmmmlmmmmimmmml_mmjmmmmmlmmlmmmmimhemhR_^mm)mmjlmmmmkmmmm`m!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
On 2/20/2017 6:38 AM, Thomas W. Blackwell wrote:
>
> I think this is a bowtie2 question, not a samtools question. Why is
> bowtie2 setting those base call qualities to zero ?
>
> - tom blackwell -
>
> On Sun, 19 Feb 2017, Jiri Nehyba wrote:
>
>> Hi,
>>
>> I would like to ask for help with mutation calling using
>> samtools-bcftools. Specifically my problem is that I am loosing
>> (almost) all my reverse bases and mutations are called only on
>> forward bases (reads).
>>
>> I have installed last version of samtools and bcftools (1.3.1).
>>
>> I have paired fastqs from amplicon panel libraries sequenced on
>> Illumina MiSeq. I cut off the primer sequences using Cutadapt,
>> aligned reads to hg38 genome with Bowtie2 (paired alignment),
sorted
>> and indexed bam file with samtools and then I used following
command:
>>
>> samtools mpileup -u -v -d 1000000 -f hg38.fa PREFIX.bam | bcftools
>> call -c -v
>>> PREFIX.vcf
>>
>> Resulting vcf file looks like this (showing just three rows and
only
>> some of the fields)
>>
>> chr16 3243407 . T C 221.999 . DP=928; . . .
>> ;DP4=1,0,462,0;MQ=42;FQ=-281.989;PV4=1,1,0.454462,1 GT:PL
>> 1/1:255,255,0
>> chr16 3243888 . C T 221.999 . DP=2982; . . .
>> ;DP4=2,0,1486,2;MQ=42;FQ=-281.989;PV4=1,0.458362,0.4298,1 GT:PL
>> 1/1:255,255,0
>> chr16 3243922 . A T 221.999 . DP=2982; . . .
>> ;DP4=1,0,1486,4;MQ=42;FQ=-281.989;PV4=1,0.338485,0.450054,1 GT:PL
>> 1/1:255,255,0
>>
>> What I don't like is the DP4 field that suggests extreme strand
bias.
>>
>> I tried to find out why are those reverse bases discarded -
there is
>> no reason for generally lower quality score - amplicons are ligated
>> randomly in both orientations and I am getting both sides of each
>> amplicon in R1 and R2 reads.
>>
>> The only thing that prevents the "loss" of the bases is setting
the Q
>> to 0, like that:
>>
>> samtools mpileup -u -v -Q 0 -d 1000000 -f hg38.fa PREFIX.bam |
>> bcftools call -c -v > PREFIX.vcf
>>
>> Result:
>>
>> chr16 3243407 . T C 221.999 . DP=928; . . .
>> ;DP4=2,1,462,463;MQ=42;FQ=-281.989;PV4=1,1,0.419714,1 GT:PL
>> 1/1:255,255,0
>> chr16 3243888 . C T 221.999 . DP=2982; . . .
>> ;DP4=2,2,1489,1489;MQ=42;FQ=-281.989;PV4=1,0.493227,0.40081,1 GT:PL
>> 1/1:255,255,0
>> chr16 3243922 . A T 221.999 . DP=2982; . . .
>> ;DP4=4,3,1487,1488;MQ=42;FQ=-281.989;PV4=1,1,1,1 GT:PL
1/1:255,255,0
>>
>> If I set Q to more than 0, for example 1 I will again loose almost
>> all reverse bases.
>>
>> Finally, a look at mpileup file obtained by this command: samtools
>> mpileup -Q 0 -d 1000000 -f hg38.fa PREFIX.bam > PREFIX.pileup
>>
>> I see that half of the bases is getting score ! that is 0 :
>>
>>
>>
>> Thank you for your help,
>>
>> Jiri
>>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
<mailto:Samtools-help@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/samtools-help
<https://lists.sourceforge.net/lists/listinfo/samtools-help>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help