Re: [Samtools-help] samtools mpileup / bcftools mpileup / missing sites
Dear James, I have installed the latest github htslib, samtools and bcftools Program: bcftools (Tools for variant calling and manipulating VCFs and BCFs) Version: 1.5-13-gab7b4c1 (using htslib 1.5-5-gda5c0c7) and still I get not the full number of sites. However, it seems that these are all the positions that have ZERO coverage throughout all BAM files. However, it was stated that with the -a -a option one would also retain these. So it is likely not a BUG but than it would be a request how to also retain these sites as DP=0 in the mpileup file? Unfortunately the INDELs end up as seperate lines, which can be partially fixed with bcftools norm: bcftools mpileup -r chr1:5001-50025000 --fasta-ref REF.fasta -a AD,DP -Q 0 -A -B -O z -b BAM.list | bcftools norm -O v -m +any | awk '{if(substr($1,1,1)!="#") print $2}' | sort | uniq | wc -l [mpileup] 6 samples in 6 input files Lines total/split/realigned/skipped:25140/0/0/0 24923 I have also checked if there might be something wrong in that reference location, but this region consists of: ACGT 7014 4852 4668 8466 So no gaps or 'N' chars in it. Thank you in anticipation Kristian On 07/12/2017 06:08 PM, James Bonfield wrote: On Wed, Jul 12, 2017 at 05:54:06PM +0200, Kristian Ullrich wrote: HOWEVER, if I add the options '-u -v' so that the output should be uncompressed and in VCF format, than the output has duplicated POSITIONs and in total less than the expected 25000 sites, if I remove the duplicated. So there might be an issue in reporting the VCF output or in translating the compressed to uncompressed output. It's unlikely to be a compression issue. Is this the latest samtools and htslib release? I don't see how specifying a single region could ever give duplicated positions (and some missing) in VCF though, hence this is likely a bug! It is worth testing bcftools mpileup, just to be sure it's not been fixed. The samtools mpileup -v code has migrated to bcftools now and it's deprecated in samtools. There are some minor command line differences and maybe other tweaks, so it's worth checking it there. If it still fails with the latest bcftools code then can you please file a ticket on github? https://github.com/samtools/bcftools Apologies for the problems. James -- Kristian Ullrich, Ph.D. Department of Evolutionary Genetics Max Planck Institute for Evolutionary Biology August Thienemann Str. 2 24306 Ploen Germany +49 4522 763 274 ullr...@evolbio.mpg.de -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] samtools mpileup / bcftools mpileup / missing sites
On Wed, Jul 12, 2017 at 05:54:06PM +0200, Kristian Ullrich wrote: > HOWEVER, if I add the options '-u -v' so that the output should be > uncompressed and in VCF format, than the output has duplicated > POSITIONs and in total less than the expected 25000 sites, if I > remove the duplicated. So there might be an issue in reporting the > VCF output or in translating the compressed to uncompressed output. It's unlikely to be a compression issue. Is this the latest samtools and htslib release? I don't see how specifying a single region could ever give duplicated positions (and some missing) in VCF though, hence this is likely a bug! It is worth testing bcftools mpileup, just to be sure it's not been fixed. The samtools mpileup -v code has migrated to bcftools now and it's deprecated in samtools. There are some minor command line differences and maybe other tweaks, so it's worth checking it there. If it still fails with the latest bcftools code then can you please file a ticket on github? https://github.com/samtools/bcftools Apologies for the problems. James -- James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova | Plurima gyrabant gymbolitare vabo; A Staden Package developer: | Et Borogovorum mimzebant undique formae, https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] samtools mpileup / bcftools mpileup / missing sites
On Wed, Jul 12, 2017 at 04:41:17PM +0200, Kristian Ullrich wrote: > whatever I am doing with both 'samtools mpileup' and 'bcftools > mpileup', in both cases not all sites are reported, even if I > specify the '-a -a' options e.g.: My standard first try at explaining anything odd with mpileup is to test it with (default) and without (-B) the BAQ calculation. Also consider -A too. Does adding -B change things for you? (If it does, explaining why and deciding if it is an improvement is a totally different and more complex issue!) James -- James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova | Plurima gyrabant gymbolitare vabo; A Staden Package developer: | Et Borogovorum mimzebant undique formae, https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
[Samtools-help] samtools mpileup / bcftools mpileup / missing sites
Dear samtools-help list, whatever I am doing with both 'samtools mpileup' and 'bcftools mpileup', in both cases not all sites are reported, even if I specify the '-a -a' options e.g.: samtools mpileup -L 9 -d 9 -r chr1:5001-50025000 -b BAM.list --fasta-ref REF.fasta -a -a -t AD,DP -Q 0 Thsi should result in 25000 sites, however the number is lower. Has somebody stumbled about the same problem and might have a solution. Actually the number of sites even drops if I am using the '-I' option. Best regards and thank you in anticipation -- Kristian Ullrich, Ph.D. Department of Evolutionary Genetics Max Planck Institute for Evolutionary Biology August Thienemann Str. 2 24306 Ploen Germany +49 4522 763 274 ullr...@evolbio.mpg.de -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
[Samtools-help] samtools mpileup help
Hi Admin, I run samtools mpileup using the following parameters. samtools mpileup -B -q 1 -f $ref -l $bedFile $3/$4_Tumor.sorted.bam > $3/$4_Tumor.pileup I found that the pileup file also the eventual vcf has variants in positions not in the bed file. There are around 30 such variants out of the 7500 variants called. Can you please tell me if this is a bug or if it is normal to see variants in non-bed regions. Thanks and Regards, Pramod -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] Samtools mpileup reference is different from input reference fasta
John -- Thank you for your quick response. The commands -t and --skip-indels were used because I was originally outputting to vcd, but I subsequently verified that the problem was at the level of mpileup output. Yes, working with human -- hg19. I will check on the two things that you mention, but the mtDNA.fasta file used to align, in the faidx command, and in the mpileup command is in fact the exact same one in the same location. I will get back with more details and thanks again. Jacob On Wed, Dec 14, 2016 at 1:16 PM, John Marshallwrote: > On 14 Dec 2016, at 11:54, John Marshall wrote: > > > > On 13 Dec 2016, at 23:56, Jacob Ulirsch > wrote: > >> Here is output from faidx, showing the correct fasta sequence for these > positions: > >> > >> samtools faidx mtDNA.fasta chrM:1330-1340 > >>> chrM:1330-1340 > >> GTCAAGGTGTA > > > > You don't say what species you're working with, but assuming human... > In fact the bases at positions 1330-1340 in the human mitochondrial DNA > sequence (NC_012920) are indeed CAAGGTGTAGC. > > After a conversation with James Bonfield and looking at this further, I > recall that the bases at positions 1330-1340 in the outdated NC_001807.4 > mitochondrial DNA sequence are GTCAAGGTGTA. > > So it would appear that you have a GRCh37/hg19 mtDNA.fasta file that you > used to align input.bam and in the faidx command shown, and a separate > GRCh38/hg38 mtDNA.fasta file that you used in the mpileup command. > > John > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] Samtools mpileup reference is different from input reference fasta
On 14 Dec 2016, at 11:54, John Marshallwrote: > > On 13 Dec 2016, at 23:56, Jacob Ulirsch wrote: >> Here is output from faidx, showing the correct fasta sequence for these >> positions: >> >> samtools faidx mtDNA.fasta chrM:1330-1340 >>> chrM:1330-1340 >> GTCAAGGTGTA > > You don't say what species you're working with, but assuming human... In > fact the bases at positions 1330-1340 in the human mitochondrial DNA sequence > (NC_012920) are indeed CAAGGTGTAGC. After a conversation with James Bonfield and looking at this further, I recall that the bases at positions 1330-1340 in the outdated NC_001807.4 mitochondrial DNA sequence are GTCAAGGTGTA. So it would appear that you have a GRCh37/hg19 mtDNA.fasta file that you used to align input.bam and in the faidx command shown, and a separate GRCh38/hg38 mtDNA.fasta file that you used in the mpileup command. John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] Samtools mpileup reference is different from input reference fasta
On 13 Dec 2016, at 23:56, Jacob Ulirschwrote: > When I run samtools (v1.3.1) mpileup I am running into some very concerning > issues. The reference bases for the output pileup file are not in the same > position as the references in the input fasta. In fact, they appear to be > oddly shifted. For example, here is output for 10 variants from the following > samtools pileup command: > > samtools mpileup -f mtDNA.fasta -r chrM -Q 0 -t AD,ADF,ADR --skip-indels > input.bam > input.txt (The -t and --skip-indels options are effective only when producing VCF/BCF output, not when producing mpileup output.) > chrM1330C 457 G$G$GGGgggtg... > chrM1331A 461 T$T$T$tT... > chrM1332A 455 c$c$c$c$c$CC... > chrM1333G 449 A$A$A$A$A$A$... > chrM1334G 463 A$,$,$AA... > chrM1335T 487 GGgg... > chrM1336G 529 .$.$,$,$,$,,... > chrM1337T 524 ,$,$,$,$,$,$... > chrM1338A 503 g$g$g$g$g$g$... > chrM1339G 497 TTtttTTT... > chrM1340C 506 A$A$aaaA... > > Here is output from faidx, showing the correct fasta sequence for these > positions: > > samtools faidx mtDNA.fasta chrM:1330-1340 > >chrM:1330-1340 > GTCAAGGTGTA You don't say what species you're working with, but assuming human... In fact the bases at positions 1330-1340 in the human mitochondrial DNA sequence (NC_012920) are indeed CAAGGTGTAGC. So if there is a problem, it is with your samtools faidx command. Are you sure mtDNA.fasta.fai is up to date with your mtDNA.fasta file? Rename mtDNA.fasta.fai and regenerate it with `samtools faidx mtDNA.fasta`. If you compare the two .fai files I think you will find that some of the offset numbers have changed slightly, perhaps because the header line in mtDNA.fasta has been changed since the original index file was generated. John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
[Samtools-help] Samtools mpileup reference is different from input reference fasta
Hello, When I run samtools (v1.3.1) mpileup I am running into some very concerning issues. The reference bases for the output pileup file are not in the same position as the references in the input fasta. In fact, they appear to be oddly shifted. For example, here is output for 10 variants from the following samtools pileup command: samtools mpileup -f mtDNA.fasta -r chrM -Q 0 -t AD,ADF,ADR --skip-indels input.bam > input.txt chrM1330C 457 G$G$GGGgggtg... chrM1331A 461 T$T$T$tT... chrM1332A 455 c$c$c$c$c$CC... chrM1333G 449 A$A$A$A$A$A$... chrM1334G 463 A$,$,$AA... chrM1335T 487 GGgg... chrM1336G 529 .$.$,$,$,$,,... chrM1337T 524 ,$,$,$,$,$,$... chrM1338A 503 g$g$g$g$g$g$... chrM1339G 497 TTtttTTT... chrM1340C 506 A$A$aaaA... Here is output from faidx, showing the correct fasta sequence for these positions: samtools faidx mtDNA.fasta chrM:1330-1340 >chrM:1330-1340 GTCAAGGTGTA Notice that the pileup format is shifted by 2 bases. Looking at this more closely, I identified some positions where the reference sequence would be "TCC" (8 Cs, 1 T, 6 Cs) but samtools had converted this to "CCCTC" (7 Cs, 1 T, 5 Cs). Best, Jacob -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] samtools mpileup
Hi, On Mon, Sep 19, 2016 at 05:32:44PM +0800, peijia wrote: > I use samtools with the version 1.3.1 to execute the command "samtools > mpileup -uf REF.fasta merge.bam -o merge.bcf", the result shows "[mpileup] 1 > samples in 1 input files Set max per-file depth to 8000" and it > doesn't work. Could we get more information please than "it doesn't work"? Do you get any more error messages? Is the exit code zero? (to see, run samtools-cmd; echo $?). Is the output file blank, or does it contain something? If so, is it a valid BCF file? I suspect you're outputting the textual mpileup and not BCF format (-g option). Try looking at the recommended workflows for example usage: http://www.htslib.org/workflow/ James -- James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova | Plurima gyrabant gymbolitare vabo; A Staden Package developer: | Et Borogovorum mimzebant undique formae, https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
[Samtools-help] samtools mpileup
Does mpileup support multi-nucleotide substitutions, e.g. a dinucleotide substitution ? The version I have, treats them as two separate independent single substitutions even when all variant reads have both the substitutions. If it not available currently, is it something that samtools plans to add in the future ? thanks -- Find and fix application performance issues faster with Applications Manager Applications Manager provides deep performance insights into multiple tiers of your business applications. It resolves application problems quickly and reduces your MTTR. Get your free trial! https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] samtools mpileup genotype likelihood calulation
What is the coverage of your bams? samtools samples randomly 255 reads when calculating genotype likelihoods. petr On Mon, 2016-02-01 at 09:35 -0500, Thomas W. Blackwell wrote: > See option -d, --max-depth. Also -F, -L, -m for indel calling. > > These count all reads overlapping a candidate variant site, rather > than by starting position. > > - tom blackwell - > > On Mon, 1 Feb 2016, Gnter Jger wrote: > > > Dear Samtools-Team, > > I encountered that samtools mpileup followed by bcftools for variant > calling produces different results when the bam file is sorted > differently. Since the bam file has to be sorted by coordinate, the > order of reads sharing the same start position may vary in different > versions of the bam file, i.e. if reads in a coordinate sorted bam file > are shuffled and then sorted by coordinate again, the order of reads > mapping to identical start positions usually differs from the order of > the same reads in the initial bam file. > > This however, does change the quality of the variants that are called by > bcftools. Is suspect that the genotype likelihoods change. I would have > expected that changing the order of reads mapping to the same start > position does not influence the variant calling, when using the exactly > same samtools mpileup/bcftools commands. I use the current version 1.3., > but also older versions show that problem. > > Is there any explanation why this is the case? The problem occurs with > every bam file I tested so far. It seems like samtools mpileup does not > use all of the reads overlapping a variant position for likelihood > estimation?! Is there any option in bcftools or samtools mpileup to > prevent this from happening, i.e. to produce the same variant quality > scores/genotype likelihood values regardless of the order of reads? > > I would appreciate any comments on that. > > Best regards, > Günter > > -- > Site24x7 APM Insight: Get Deep Visibility into Application Performance > APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month > Monitor end-to-end web transactions and take corrective actions now > Troubleshoot faster and improve end-user experience. Signup Now! > http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140 > ___ > Samtools-help mailing list > Samtools-help@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/samtools-help > -- > Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + > Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor > end-to-end web transactions and take corrective actions now Troubleshoot > faster and improve end-user experience. Signup Now! > http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140 > ___ > Samtools-help mailing list > Samtools-help@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140 ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] samtools mpileup genotype likelihood calulation
Hi Gunter, this is for each a pileup position regardless of the start coordinate. There is an expensive step in the calculation which is solved by keeping a pre-calculating table for number of reads up to 255. This constraint cannot be easily removed, but 255 reads should be plenty for estimating the genotype. Do the qualities come out vastly different? Please see also the section 3.1.2 and 3.1.3 of the samtools math notes if you are interested in more details http://www.broadinstitute.org/gatk/media/docs/Samtools.pdf petr On Wed, 2016-02-03 at 17:05 +0100, Günter Jäger wrote: > Dear Petr, > > the average coverage of my files varies between 300 and 2000 (very deep > sequencing, paired-end). Now it is clear why I get different results. > However, it is not clear to me why there is this constraint. Or did you > mean max 255 reads mapping to the same start position, but in total more > than 255 covering a candidate site? > > - Is there a way to increase the number of sampled reads for genotype > likelihood estimation, or is this a fixed value? > > Many thanks. > > Günter > > Am 03.02.2016 um 13:09 schrieb Petr Danecek: > > What is the coverage of your bams? samtools samples randomly 255 reads > > when calculating genotype likelihoods. > > > > petr > > > > On Mon, 2016-02-01 at 09:35 -0500, Thomas W. Blackwell wrote: > >> See option -d, --max-depth. Also -F, -L, -m for indel calling. > >> > >> These count all reads overlapping a candidate variant site, rather > >> than by starting position. > >> > >>- tom blackwell - > >> > >> On Mon, 1 Feb 2016, Gnter Jger wrote: > >> > >>> Dear Samtools-Team, > >> I encountered that samtools mpileup followed by bcftools for variant > >> calling produces different results when the bam file is sorted > >> differently. Since the bam file has to be sorted by coordinate, the > >> order of reads sharing the same start position may vary in different > >> versions of the bam file, i.e. if reads in a coordinate sorted bam file > >> are shuffled and then sorted by coordinate again, the order of reads > >> mapping to identical start positions usually differs from the order of > >> the same reads in the initial bam file. > >> > >> This however, does change the quality of the variants that are called by > >> bcftools. Is suspect that the genotype likelihoods change. I would have > >> expected that changing the order of reads mapping to the same start > >> position does not influence the variant calling, when using the exactly > >> same samtools mpileup/bcftools commands. I use the current version 1.3., > >> but also older versions show that problem. > >> > >> Is there any explanation why this is the case? The problem occurs with > >> every bam file I tested so far. It seems like samtools mpileup does not > >> use all of the reads overlapping a variant position for likelihood > >> estimation?! Is there any option in bcftools or samtools mpileup to > >> prevent this from happening, i.e. to produce the same variant quality > >> scores/genotype likelihood values regardless of the order of reads? > >> > >> I would appreciate any comments on that. > >> > >> Best regards, > >> Günter > >> > >> -- > >> Site24x7 APM Insight: Get Deep Visibility into Application Performance > >> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month > >> Monitor end-to-end web transactions and take corrective actions now > >> Troubleshoot faster and improve end-user experience. Signup Now! > >> http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140 > >> ___ > >> Samtools-help mailing list > >> Samtools-help@lists.sourceforge.net > >> https://lists.sourceforge.net/lists/listinfo/samtools-help > >> -- > >> Site24x7 APM Insight: Get Deep Visibility into Application Performance > >> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor > >> end-to-end web transactions and take corrective actions now Troubleshoot > >> faster and improve end-user experience. Signup Now! > >> http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140 > >> ___ > >> Samtools-help mailing list > >> Samtools-help@lists.sourceforge.net > >> https://lists.sourceforge.net/lists/listinfo/samtools-help > > > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor
[Samtools-help] samtools mpileup genotype likelihood calulation
Dear Samtools-Team, I encountered that samtools mpileup followed by bcftools for variant calling produces different results when the bam file is sorted differently. Since the bam file has to be sorted by coordinate, the order of reads sharing the same start position may vary in different versions of the bam file, i.e. if reads in a coordinate sorted bam file are shuffled and then sorted by coordinate again, the order of reads mapping to identical start positions usually differs from the order of the same reads in the initial bam file. This however, does change the quality of the variants that are called by bcftools. Is suspect that the genotype likelihoods change. I would have expected that changing the order of reads mapping to the same start position does not influence the variant calling, when using the exactly same samtools mpileup/bcftools commands. I use the current version 1.3., but also older versions show that problem. Is there any explanation why this is the case? The problem occurs with every bam file I tested so far. It seems like samtools mpileup does not use all of the reads overlapping a variant position for likelihood estimation?! Is there any option in bcftools or samtools mpileup to prevent this from happening, i.e. to produce the same variant quality scores/genotype likelihood values regardless of the order of reads? I would appreciate any comments on that. Best regards, Günter -- Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140 ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] Samtools mpileup: some positions missing fields
On 16 Sep 2015, at 22:24, Brendan Kohrnwrote: > I found the following issue in the output from samtools mpileup (-f), with a > single bam file input: > > gi|255961284|ref|NC_011713.2| 140 G 1 ,$ C > gi|255961284|ref|NC_011713.2| 149 A 0 > gi|255961284|ref|NC_011713.2| 150 C 1 , D > gi|255961284|ref|NC_011713.2| 151 G 1 , D > > This has happened at locations with 0 depth in a few other files. This issue > doesn't appear to occur in samtools v. 0.1.17, but it does occur in 0.1.19 > and up. I'm not sure about 0.1.18. One of the changes introduced back in 0.1.19 is that mpileup applies the -Q minimum-base-quality filter to mpileup output. The default filter level is 13; if you use -Q0 you'll get the same output with 0.1.19 (and the current 1.2) as with 0.1.18 and before. When these low-quality bases are filtered out, the depth at position 149 is 0, and the following columns are empty, as one might expect -- the depth is 0 and there are no read bases to display. However in some other circumstances (with multiple input files) empty columns in mpileup output are shown as "*", and it possibly ought to be displaying these empty columns due to low-base-quality filtering similarly. So in summary: this (empty fields) is indeed supposed to be happening, but those fields likely ought to be being displayed in a less confusing way. Thanks for the report. John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] samtools mpileup v1.1 output significantly different from 0.1.19 (starting in middle of base qualities)
—ignore-overlaps option was what was needed to get the pre-v1 mpileup output. Thanks. Petr. On Feb 2, 2015, at 12:50 AM, Petr Danecek p...@sanger.ac.uk wrote: Hi Paul, I think this might be related to bug fixes where the default MQ and BQ filters were not applied with the raw mpileup output. Petr On Fri, 2015-01-30 at 15:07 -0800, Paul Lott wrote: Hi all, We have an amplicon pipeline that uses mpileup output. We recently updated samtools from 0.1.19 to 1.1. When performing some validation on the pipeline, we found that there were a significant number of variants missing from the caller(Varscan 2.3.7) output. The only difference was the output from mpileup. In the version 1.1, the coverage is greatly reduced and the base quality string is the same upto approx the middle (base quality string position ~1138); after that everything appears to be garbled. I’ve included outputs for one position, but it is pretty much the same for most positions. (see outputs below) I used the same command is for both versions: samtools mpileup -B -f /local/scratch/REFERENCE_DATA/GATK_Bundle/b37/human_g1k_v37_decoy.fasta -r 13:32911471-32911471 Sample.bam I’ve searched for an answer to the differences, but couldn’t find an explanation of why the outputs of v1.1 would be reduced and changed compared to 0.1.19. If there is an option to get similar output to v0.1.19, that would be great. Or, if it’s a bug it would need to be fixed. Any help or a point in the right direction would be appreciated. Thanks, Paul Paul Lott UC Davis Genome Center University of California, Davis Using samtools 0.1.19, we get the following output: [mpileup] 1 samples in 1 input files mpileup Set max per-file depth to 8000 13 32911471G 3656
Re: [Samtools-help] samtools mpileup v1.1 output significantly different from 0.1.19 (starting in middle of base qualities)
Hi Paul, I think this might be related to bug fixes where the default MQ and BQ filters were not applied with the raw mpileup output. Petr On Fri, 2015-01-30 at 15:07 -0800, Paul Lott wrote: Hi all, We have an amplicon pipeline that uses mpileup output. We recently updated samtools from 0.1.19 to 1.1. When performing some validation on the pipeline, we found that there were a significant number of variants missing from the caller(Varscan 2.3.7) output. The only difference was the output from mpileup. In the version 1.1, the coverage is greatly reduced and the base quality string is the same upto approx the middle (base quality string position ~1138); after that everything appears to be garbled. I’ve included outputs for one position, but it is pretty much the same for most positions. (see outputs below) I used the same command is for both versions: samtools mpileup -B -f /local/scratch/REFERENCE_DATA/GATK_Bundle/b37/human_g1k_v37_decoy.fasta -r 13:32911471-32911471 Sample.bam I’ve searched for an answer to the differences, but couldn’t find an explanation of why the outputs of v1.1 would be reduced and changed compared to 0.1.19. If there is an option to get similar output to v0.1.19, that would be great. Or, if it’s a bug it would need to be fixed. Any help or a point in the right direction would be appreciated. Thanks, Paul Paul Lott UC Davis Genome Center University of California, Davis Using samtools 0.1.19, we get the following output: [mpileup] 1 samples in 1 input files mpileup Set max per-file depth to 8000 13 32911471G 3656
[Samtools-help] samtools mpileup v1.1 output significantly different from 0.1.19 (starting in middle of base qualities)
Hi all, We have an amplicon pipeline that uses mpileup output. We recently updated samtools from 0.1.19 to 1.1. When performing some validation on the pipeline, we found that there were a significant number of variants missing from the caller(Varscan 2.3.7) output. The only difference was the output from mpileup. In the version 1.1, the coverage is greatly reduced and the base quality string is the same upto approx the middle (base quality string position ~1138); after that everything appears to be garbled. I’ve included outputs for one position, but it is pretty much the same for most positions. (see outputs below) I used the same command is for both versions: samtools mpileup -B -f /local/scratch/REFERENCE_DATA/GATK_Bundle/b37/human_g1k_v37_decoy.fasta -r 13:32911471-32911471 Sample.bam I’ve searched for an answer to the differences, but couldn’t find an explanation of why the outputs of v1.1 would be reduced and changed compared to 0.1.19. If there is an option to get similar output to v0.1.19, that would be great. Or, if it’s a bug it would need to be fixed. Any help or a point in the right direction would be appreciated. Thanks, Paul Paul Lott UC Davis Genome Center University of California, Davis Using samtools 0.1.19, we get the following output: [mpileup] 1 samples in 1 input files mpileup Set max per-file depth to 8000 13 32911471G 3656
[Samtools-help] samtools mpileup -- error status 141
Hello, I observe a strange problem and I am not able to find out what is the exact reason of it. When I use samtools mpileup | another_program, samtools return error code 141. Everything else is OK (pileup is well created, all data are well processed). I observe it when I use set -o pipefail. The overall return error status is 141 and it is really caused by samtools because echo ${PIPESTATUS[0]} ${PIPESTATUS[1]} 141 0 It is interesting that this problem appears only with pipes. When I save the output of samtools to a file, the error code is 0. Do you have any ideas about the reason? Thanks Karel -- Comprehensive Server Monitoring with Site24x7. Monitor 10 servers for $9/Month. Get alerted through email, SMS, voice calls or mobile push notifications. Take corrective actions from your mobile device. http://pubads.g.doubleclick.net/gampad/clk?id=154624111iu=/4140/ostg.clktrk___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] samtools mpileup -- error status 141
On 13 Nov 2014, at 12:23, Karel Břinda karel.bri...@gmail.com wrote: I observe a strange problem and I am not able to find out what is the exact reason of it. When I use samtools mpileup | another_program, samtools return error code 141. When a Unix shell reports that a command has an exit status greater than 128, this usually means the command was terminated by a signal. You don't say what platform this is on, but typically 141 means signal 141-128 = 13, which is typically SIGPIPE. Everything else is OK (pileup is well created, all data are well processed). I observe it when I use set -o pipefail. The overall return error status is 141 and it is really caused by samtools because echo ${PIPESTATUS[0]} ${PIPESTATUS[1]} 141 0 It is interesting that this problem appears only with pipes. When I save the output of samtools to a file, the error code is 0. SIGPIPE means that samtools tried to write on a pipe but there was no-one to read it. So this means that another_program closed its input and exited while samtools mpileup still had more data to write. Perhaps it was some trivial trailer or statistics and so doesn't much matter; but perhaps you should check that another_program really did process all the way to the end of the last chromosome! John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- Comprehensive Server Monitoring with Site24x7. Monitor 10 servers for $9/Month. Get alerted through email, SMS, voice calls or mobile push notifications. Take corrective actions from your mobile device. http://pubads.g.doubleclick.net/gampad/clk?id=154624111iu=/4140/ostg.clktrk ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
Re: [Samtools-help] samtools mpileup -- error status 141
Hello, thanks a lot for this excellent answer. Thanks to it I found where originally the problem started. So as I realized, it appeared only with FASTA files containing space in the first line (i.e., with a sequence description). The program mentioned in my previous e-mail as another_program considered whole strings after as a sequence name (did not remove the description from it). Since samtools mpileup (unfortunately) skips some of uncovered positions (as it has been discussed recently in this conference), another_program had to process the PILEUP file concurrently with the FASTA file (to complet missing information). So it considered the chromosomes in FASTA to be uncovered by the PILEUP since the chromosome names were differing. And after iterating over the whole FASTA file, it was still on the second line of the PILEUP file. An idea to the end -- would it be possible to add a new parameter to samtools mpileup, which would ensure that all FASTA positions will be mentioned in the PILEUP (even with zero coverage)? Thanks again! Karel 2014-11-13 14:07 GMT+01:00 John Marshall j...@sanger.ac.uk: On 13 Nov 2014, at 12:23, Karel Břinda karel.bri...@gmail.com wrote: I observe a strange problem and I am not able to find out what is the exact reason of it. When I use samtools mpileup | another_program, samtools return error code 141. When a Unix shell reports that a command has an exit status greater than 128, this usually means the command was terminated by a signal. You don't say what platform this is on, but typically 141 means signal 141-128 = 13, which is typically SIGPIPE. Everything else is OK (pileup is well created, all data are well processed). I observe it when I use set -o pipefail. The overall return error status is 141 and it is really caused by samtools because echo ${PIPESTATUS[0]} ${PIPESTATUS[1]} 141 0 It is interesting that this problem appears only with pipes. When I save the output of samtools to a file, the error code is 0. SIGPIPE means that samtools tried to write on a pipe but there was no-one to read it. So this means that another_program closed its input and exited while samtools mpileup still had more data to write. Perhaps it was some trivial trailer or statistics and so doesn't much matter; but perhaps you should check that another_program really did process all the way to the end of the last chromosome! John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -- Comprehensive Server Monitoring with Site24x7. Monitor 10 servers for $9/Month. Get alerted through email, SMS, voice calls or mobile push notifications. Take corrective actions from your mobile device. http://pubads.g.doubleclick.net/gampad/clk?id=154624111iu=/4140/ostg.clktrk___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help
[Samtools-help] samtools mpileup call snp
hi all ; I call snp by samtools mpileup and get results of vcf format; and i find the quality of snp which most are 222,and the max of them are also 222, could you tell me the reasons. I am looking foward to hearing from you soon. Sincerely, fang-- ___ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help