Re: [Samtools-help] samtools mpileup / bcftools mpileup / missing sites

2017-07-12 Thread Kristian Ullrich

Dear James,

I have installed the latest github htslib, samtools and bcftools

Program: bcftools (Tools for variant calling and manipulating VCFs and BCFs)
Version: 1.5-13-gab7b4c1 (using htslib 1.5-5-gda5c0c7)

and still I get not the full number of sites.

However, it seems that these are all the positions that have ZERO 
coverage throughout all BAM files. However, it was stated that with the 
-a -a option one would also retain these.


So it is likely not a BUG but than it would be a request how to also 
retain these sites as DP=0 in the mpileup file?



Unfortunately the INDELs end up as seperate lines, which can be 
partially fixed with bcftools norm:


bcftools mpileup -r chr1:5001-50025000 --fasta-ref REF.fasta -a 
AD,DP -Q 0 -A -B -O z -b BAM.list | bcftools norm -O v -m +any | awk 
'{if(substr($1,1,1)!="#") print $2}' | sort | uniq | wc -l

[mpileup] 6 samples in 6 input files
Lines   total/split/realigned/skipped:25140/0/0/0
24923

I have also checked if there might be something wrong in that reference 
location, but this region consists of:


   ACGT
7014 4852 4668 8466

So no gaps or 'N' chars in it.


Thank you in anticipation

Kristian


On 07/12/2017 06:08 PM, James Bonfield wrote:

On Wed, Jul 12, 2017 at 05:54:06PM +0200, Kristian Ullrich wrote:

HOWEVER, if I add the options '-u -v' so that the output should be
uncompressed and in VCF format, than the output has duplicated
POSITIONs and in total less than the expected 25000 sites, if I
remove the duplicated. So there might be an issue in reporting the
VCF output or in translating the compressed to uncompressed output.

It's unlikely to be a compression issue. Is this the latest samtools
and htslib release?

I don't see how specifying a single region could ever give duplicated
positions (and some missing) in VCF though, hence this is likely a
bug!

It is worth testing bcftools mpileup, just to be sure it's not been
fixed.  The samtools mpileup -v code has migrated to bcftools now and
it's deprecated in samtools.  There are some minor command line
differences and maybe other tweaks, so it's worth checking it there.

If it still fails with the latest bcftools code then can you please
file a ticket on github?

 https://github.com/samtools/bcftools

Apologies for the problems.

James



--
Kristian Ullrich, Ph.D.
Department of Evolutionary Genetics
Max Planck Institute for Evolutionary Biology
August Thienemann Str. 2
24306 Ploen
Germany
+49 4522 763 274
ullr...@evolbio.mpg.de


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] samtools mpileup / bcftools mpileup / missing sites

2017-07-12 Thread James Bonfield
On Wed, Jul 12, 2017 at 05:54:06PM +0200, Kristian Ullrich wrote:
> HOWEVER, if I add the options '-u -v' so that the output should be
> uncompressed and in VCF format, than the output has duplicated
> POSITIONs and in total less than the expected 25000 sites, if I
> remove the duplicated. So there might be an issue in reporting the
> VCF output or in translating the compressed to uncompressed output.

It's unlikely to be a compression issue. Is this the latest samtools
and htslib release?

I don't see how specifying a single region could ever give duplicated
positions (and some missing) in VCF though, hence this is likely a
bug!

It is worth testing bcftools mpileup, just to be sure it's not been
fixed.  The samtools mpileup -v code has migrated to bcftools now and
it's deprecated in samtools.  There are some minor command line
differences and maybe other tweaks, so it's worth checking it there.

If it still fails with the latest bcftools code then can you please
file a ticket on github?

https://github.com/samtools/bcftools

Apologies for the problems.

James

-- 
James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer: | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] samtools mpileup / bcftools mpileup / missing sites

2017-07-12 Thread James Bonfield
On Wed, Jul 12, 2017 at 04:41:17PM +0200, Kristian Ullrich wrote:
> whatever I am doing with both 'samtools mpileup' and 'bcftools
> mpileup', in both cases not all sites are reported, even if I
> specify the '-a -a' options e.g.:

My standard first try at explaining anything odd with mpileup is to
test it with (default) and without (-B) the BAQ calculation.  Also
consider -A too.

Does adding -B change things for you?  (If it does, explaining why and
deciding if it is an improvement is a totally different and more
complex issue!)

James

-- 
James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer: | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


[Samtools-help] samtools mpileup / bcftools mpileup / missing sites

2017-07-12 Thread Kristian Ullrich

Dear samtools-help list,

whatever I am doing with both 'samtools mpileup' and 'bcftools mpileup', 
in both cases not all sites are reported, even if I specify the '-a -a' 
options e.g.:


samtools mpileup -L 9 -d 9 -r chr1:5001-50025000 -b 
BAM.list --fasta-ref REF.fasta -a -a -t AD,DP -Q 0


Thsi should result in 25000 sites, however the number is lower.

Has somebody stumbled about the same problem and might have a solution.

Actually the number of sites even drops if I am using the '-I' option.

Best regards and thank you in anticipation

--
Kristian Ullrich, Ph.D.
Department of Evolutionary Genetics
Max Planck Institute for Evolutionary Biology
August Thienemann Str. 2
24306 Ploen
Germany
+49 4522 763 274
ullr...@evolbio.mpg.de


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


[Samtools-help] samtools mpileup help

2017-01-17 Thread Pramod b.m
Hi Admin,

I run samtools mpileup using the following parameters.

samtools mpileup -B -q 1 -f $ref -l $bedFile $3/$4_Tumor.sorted.bam >
$3/$4_Tumor.pileup

I found that the pileup file also the eventual vcf has variants in
positions not in the bed file. There are around 30 such variants out of the
7500 variants called. Can you please tell me if this is a bug or if it is
normal to see variants in non-bed regions.


Thanks and Regards,

Pramod
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] Samtools mpileup reference is different from input reference fasta

2016-12-14 Thread Jacob Ulirsch
John -- Thank you for your quick response.

The commands -t and --skip-indels were used because I was originally
outputting to vcd, but I subsequently verified that the problem was at the
level of mpileup output. Yes, working with human -- hg19. I will check on
the two things that you mention, but the mtDNA.fasta file used to align, in
the faidx command, and in the mpileup command is in fact the exact same one
in the same location. I will get back with more details and thanks again.

Jacob

On Wed, Dec 14, 2016 at 1:16 PM, John Marshall  wrote:

> On 14 Dec 2016, at 11:54, John Marshall  wrote:
> >
> > On 13 Dec 2016, at 23:56, Jacob Ulirsch 
> wrote:
> >> Here is output from faidx, showing the correct fasta sequence for these
> positions:
> >>
> >> samtools faidx mtDNA.fasta chrM:1330-1340
> >>> chrM:1330-1340
> >> GTCAAGGTGTA
> >
> > You don't say what species you're working with, but assuming human...
> In fact the bases at positions 1330-1340 in the human mitochondrial DNA
> sequence (NC_012920) are indeed CAAGGTGTAGC.
>
> After a conversation with James Bonfield and looking at this further, I
> recall that the bases at positions 1330-1340 in the outdated NC_001807.4
> mitochondrial DNA sequence are GTCAAGGTGTA.
>
> So it would appear that you have a GRCh37/hg19 mtDNA.fasta file that you
> used to align input.bam and in the faidx command shown, and a separate
> GRCh38/hg38 mtDNA.fasta file that you used in the mpileup command.
>
> John
>
> --
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
>
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] Samtools mpileup reference is different from input reference fasta

2016-12-14 Thread John Marshall
On 14 Dec 2016, at 11:54, John Marshall  wrote:
> 
> On 13 Dec 2016, at 23:56, Jacob Ulirsch  wrote:
>> Here is output from faidx, showing the correct fasta sequence for these 
>> positions:
>> 
>> samtools faidx mtDNA.fasta chrM:1330-1340
>>> chrM:1330-1340
>> GTCAAGGTGTA
> 
> You don't say what species you're working with, but assuming human...  In 
> fact the bases at positions 1330-1340 in the human mitochondrial DNA sequence 
> (NC_012920) are indeed CAAGGTGTAGC.

After a conversation with James Bonfield and looking at this further, I recall 
that the bases at positions 1330-1340 in the outdated NC_001807.4 mitochondrial 
DNA sequence are GTCAAGGTGTA.

So it would appear that you have a GRCh37/hg19 mtDNA.fasta file that you used 
to align input.bam and in the faidx command shown, and a separate GRCh38/hg38 
mtDNA.fasta file that you used in the mpileup command.

John

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] Samtools mpileup reference is different from input reference fasta

2016-12-14 Thread John Marshall
On 13 Dec 2016, at 23:56, Jacob Ulirsch  wrote:
> When I run samtools (v1.3.1) mpileup I am running into some very concerning 
> issues. The reference bases for the output pileup file are not in the same 
> position as the references in the input fasta. In fact, they appear to be 
> oddly shifted. For example, here is output for 10 variants from the following 
> samtools pileup command:
> 
> samtools mpileup -f mtDNA.fasta -r chrM -Q 0 -t AD,ADF,ADR --skip-indels 
> input.bam > input.txt

(The -t and --skip-indels options are effective only when producing VCF/BCF 
output, not when producing mpileup output.)

> chrM1330C   457 G$G$GGGgggtg...
> chrM1331A   461 T$T$T$tT...
> chrM1332A   455 c$c$c$c$c$CC...
> chrM1333G   449 A$A$A$A$A$A$...
> chrM1334G   463 A$,$,$AA...
> chrM1335T   487 GGgg...
> chrM1336G   529 .$.$,$,$,$,,...
> chrM1337T   524 ,$,$,$,$,$,$...
> chrM1338A   503 g$g$g$g$g$g$...
> chrM1339G   497 TTtttTTT...
> chrM1340C   506 A$A$aaaA...
> 
> Here is output from faidx, showing the correct fasta sequence for these 
> positions:
> 
> samtools faidx mtDNA.fasta chrM:1330-1340
> >chrM:1330-1340
> GTCAAGGTGTA

You don't say what species you're working with, but assuming human...  In fact 
the bases at positions 1330-1340 in the human mitochondrial DNA sequence 
(NC_012920) are indeed CAAGGTGTAGC.  So if there is a problem, it is with your 
samtools faidx command.

Are you sure mtDNA.fasta.fai is up to date with your mtDNA.fasta file?  Rename 
mtDNA.fasta.fai and regenerate it with `samtools faidx mtDNA.fasta`.  If you 
compare the two .fai files I think you will find that some of the offset 
numbers have changed slightly, perhaps because the header line in mtDNA.fasta 
has been changed since the original index file was generated.

John

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


[Samtools-help] Samtools mpileup reference is different from input reference fasta

2016-12-14 Thread Jacob Ulirsch
Hello,

When I run samtools (v1.3.1) mpileup I am running into some very concerning
issues. The reference bases for the output pileup file are not in the same
position as the references in the input fasta. In fact, they appear to be
oddly shifted. For example, here is output for 10 variants from the
following samtools pileup command:

samtools mpileup -f mtDNA.fasta -r chrM -Q 0 -t AD,ADF,ADR --skip-indels
input.bam > input.txt
chrM1330C   457 G$G$GGGgggtg...
chrM1331A   461 T$T$T$tT...
chrM1332A   455 c$c$c$c$c$CC...
chrM1333G   449 A$A$A$A$A$A$...
chrM1334G   463 A$,$,$AA...
chrM1335T   487 GGgg...
chrM1336G   529 .$.$,$,$,$,,...
chrM1337T   524 ,$,$,$,$,$,$...
chrM1338A   503 g$g$g$g$g$g$...
chrM1339G   497 TTtttTTT...
chrM1340C   506 A$A$aaaA...

Here is output from faidx, showing the correct fasta sequence for these
positions:

samtools faidx mtDNA.fasta chrM:1330-1340
>chrM:1330-1340
GTCAAGGTGTA

Notice that the pileup format is shifted by 2 bases. Looking at this more
closely, I identified some positions where the reference sequence would be
"TCC" (8 Cs, 1 T, 6 Cs) but samtools had converted this to
"CCCTC" (7 Cs, 1 T, 5 Cs).

Best,
Jacob
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] samtools mpileup

2016-09-19 Thread James Bonfield
Hi,

On Mon, Sep 19, 2016 at 05:32:44PM +0800, peijia wrote:
> I use samtools with the version 1.3.1 to execute the command "samtools 
> mpileup -uf REF.fasta merge.bam -o merge.bcf", the result shows "[mpileup] 1 
> samples in 1 input files  Set max per-file depth to 8000" and it 
> doesn't work.

Could we get more information please than "it doesn't work"?  Do you
get any more error messages?  Is the exit code zero? (to see, run
samtools-cmd; echo $?).  Is the output file blank, or does it contain
something?  If so, is it a valid BCF file?

I suspect you're outputting the textual mpileup and not BCF format (-g
option). Try looking at the recommended workflows for example usage:

http://www.htslib.org/workflow/

James

-- 
James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer: | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


[Samtools-help] samtools mpileup

2016-04-15 Thread kavita garg
Does mpileup support multi-nucleotide substitutions, e.g. a dinucleotide
substitution ?
The version I have, treats them as two separate independent single
substitutions even when all variant reads have both the substitutions.
If it not available currently, is it something that samtools plans to add
in the future ?

thanks
--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] samtools mpileup genotype likelihood calulation

2016-02-03 Thread Petr Danecek
What is the coverage of your bams? samtools samples randomly 255 reads
when calculating genotype likelihoods.

petr

On Mon, 2016-02-01 at 09:35 -0500, Thomas W. Blackwell wrote:
> See option  -d, --max-depth.  Also  -F, -L, -m  for indel calling.
> 
> These count all reads overlapping a candidate variant site, rather
> than by starting position.
> 
>   -  tom blackwell  -
> 
> On Mon, 1 Feb 2016, Gnter Jger wrote:
> 
> > Dear Samtools-Team,
> 
> I encountered that samtools mpileup followed by bcftools for variant 
> calling produces different results when the bam file is sorted 
> differently. Since the bam file has to be sorted by coordinate, the 
> order of reads sharing the same start position may vary in different 
> versions of the bam file, i.e. if reads in a coordinate sorted bam file 
> are shuffled and then sorted by coordinate again, the order of reads 
> mapping to identical start positions usually differs from the order of 
> the same reads in the initial bam file.
> 
> This however, does change the quality of the variants that are called by 
> bcftools. Is suspect that the genotype likelihoods change. I would have 
> expected that changing the order of reads mapping to the same start 
> position does not influence the variant calling, when using the exactly 
> same samtools mpileup/bcftools commands. I use the current version 1.3., 
> but also older versions show that problem.
> 
> Is there any explanation why this is the case? The problem occurs with 
> every bam file I tested so far. It seems like samtools mpileup does not 
> use all of the reads overlapping a variant position for likelihood 
> estimation?! Is there any option in bcftools or samtools mpileup to 
> prevent this from happening, i.e. to produce the same variant quality 
> scores/genotype likelihood values regardless of the order of reads?
> 
> I would appreciate any comments on that.
> 
> Best regards,
> Günter
> 
> --
> Site24x7 APM Insight: Get Deep Visibility into Application Performance
> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
> Monitor end-to-end web transactions and take corrective actions now
> Troubleshoot faster and improve end-user experience. Signup Now!
> http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140
> ___
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
> --
>  Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + 
> Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor 
> end-to-end web transactions and take corrective actions now Troubleshoot 
> faster and improve end-user experience. Signup Now! 
> http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140
> ___
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] samtools mpileup genotype likelihood calulation

2016-02-03 Thread Petr Danecek
Hi Gunter,

this is for each a pileup position regardless of the start coordinate.

There is an expensive step in the calculation which is solved by keeping
a pre-calculating table for number of reads up to 255. This constraint
cannot be easily removed, but 255 reads should be plenty for estimating
the genotype. Do the qualities come out vastly different?

Please see also the section 3.1.2 and 3.1.3 of the samtools math notes
if you are interested in more details
http://www.broadinstitute.org/gatk/media/docs/Samtools.pdf

petr



On Wed, 2016-02-03 at 17:05 +0100, Günter Jäger wrote:
> Dear Petr,
> 
> the average coverage of my files varies between 300 and 2000 (very deep 
> sequencing, paired-end). Now it is clear why I get different results. 
> However, it is not clear to me why there is this constraint. Or did you 
> mean max 255 reads mapping to the same start position, but in total more 
> than 255 covering a candidate site?
> 
> - Is there a way to increase the number of sampled reads for genotype 
> likelihood estimation, or is this a fixed value?
> 
> Many thanks.
> 
> Günter
> 
> Am 03.02.2016 um 13:09 schrieb Petr Danecek:
> > What is the coverage of your bams? samtools samples randomly 255 reads
> > when calculating genotype likelihoods.
> >
> > petr
> >
> > On Mon, 2016-02-01 at 09:35 -0500, Thomas W. Blackwell wrote:
> >> See option  -d, --max-depth.  Also  -F, -L, -m  for indel calling.
> >>
> >> These count all reads overlapping a candidate variant site, rather
> >> than by starting position.
> >>
> >>-  tom blackwell  -
> >>
> >> On Mon, 1 Feb 2016, Gnter Jger wrote:
> >>
> >>> Dear Samtools-Team,
> >> I encountered that samtools mpileup followed by bcftools for variant
> >> calling produces different results when the bam file is sorted
> >> differently. Since the bam file has to be sorted by coordinate, the
> >> order of reads sharing the same start position may vary in different
> >> versions of the bam file, i.e. if reads in a coordinate sorted bam file
> >> are shuffled and then sorted by coordinate again, the order of reads
> >> mapping to identical start positions usually differs from the order of
> >> the same reads in the initial bam file.
> >>
> >> This however, does change the quality of the variants that are called by
> >> bcftools. Is suspect that the genotype likelihoods change. I would have
> >> expected that changing the order of reads mapping to the same start
> >> position does not influence the variant calling, when using the exactly
> >> same samtools mpileup/bcftools commands. I use the current version 1.3.,
> >> but also older versions show that problem.
> >>
> >> Is there any explanation why this is the case? The problem occurs with
> >> every bam file I tested so far. It seems like samtools mpileup does not
> >> use all of the reads overlapping a variant position for likelihood
> >> estimation?! Is there any option in bcftools or samtools mpileup to
> >> prevent this from happening, i.e. to produce the same variant quality
> >> scores/genotype likelihood values regardless of the order of reads?
> >>
> >> I would appreciate any comments on that.
> >>
> >> Best regards,
> >> Günter
> >>
> >> --
> >> Site24x7 APM Insight: Get Deep Visibility into Application Performance
> >> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
> >> Monitor end-to-end web transactions and take corrective actions now
> >> Troubleshoot faster and improve end-user experience. Signup Now!
> >> http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140
> >> ___
> >> Samtools-help mailing list
> >> Samtools-help@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/samtools-help
> >> --
> >>  Site24x7 APM Insight: Get Deep Visibility into Application Performance 
> >> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor 
> >> end-to-end web transactions and take corrective actions now Troubleshoot 
> >> faster and improve end-user experience. Signup Now! 
> >> http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140
> >> ___
> >> Samtools-help mailing list
> >> Samtools-help@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/samtools-help
> >
> >
> >
> 




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor 

[Samtools-help] samtools mpileup genotype likelihood calulation

2016-02-01 Thread Günter Jäger
Dear Samtools-Team,

I encountered that samtools mpileup followed by bcftools for variant 
calling produces different results when the bam file is sorted 
differently. Since the bam file has to be sorted by coordinate, the 
order of reads sharing the same start position may vary in different 
versions of the bam file, i.e. if reads in a coordinate sorted bam file 
are shuffled and then sorted by coordinate again, the order of reads 
mapping to identical start positions usually differs from the order of 
the same reads in the initial bam file.

This however, does change the quality of the variants that are called by 
bcftools. Is suspect that the genotype likelihoods change. I would have 
expected that changing the order of reads mapping to the same start 
position does not influence the variant calling, when using the exactly 
same samtools mpileup/bcftools commands. I use the current version 1.3., 
but also older versions show that problem.

Is there any explanation why this is the case? The problem occurs with 
every bam file I tested so far. It seems like samtools mpileup does not 
use all of the reads overlapping a variant position for likelihood 
estimation?! Is there any option in bcftools or samtools mpileup to 
prevent this from happening, i.e. to produce the same variant quality 
scores/genotype likelihood values regardless of the order of reads?

I would appreciate any comments on that.

Best regards,
Günter

--
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] Samtools mpileup: some positions missing fields

2015-09-21 Thread John Marshall
On 16 Sep 2015, at 22:24, Brendan Kohrn  wrote:
> I found the following issue in the output from samtools mpileup (-f), with a 
> single bam file input:
> 
> gi|255961284|ref|NC_011713.2|   140 G   1   ,$  C
> gi|255961284|ref|NC_011713.2|   149 A   0
> gi|255961284|ref|NC_011713.2|   150 C   1   ,   D
> gi|255961284|ref|NC_011713.2|   151 G   1   ,   D
> 
> This has happened at locations with 0 depth in a few other files.  This issue 
> doesn't appear to occur in samtools v. 0.1.17, but it does occur in 0.1.19 
> and up.  I'm not sure about 0.1.18.

One of the changes introduced back in 0.1.19 is that mpileup applies the -Q 
minimum-base-quality filter to mpileup output.  The default filter level is 13; 
if you use -Q0 you'll get the same output with 0.1.19 (and the current 1.2) as 
with 0.1.18 and before.

When these low-quality bases are filtered out, the depth at position 149 is 0, 
and the following columns are empty, as one might expect -- the depth is 0 and 
there are no read bases to display.  However in some other circumstances (with 
multiple input files) empty columns in mpileup output are shown as "*", and it 
possibly ought to be displaying these empty columns due to low-base-quality 
filtering similarly.

So in summary: this (empty fields) is indeed supposed to be happening, but 
those fields likely ought to be being displayed in a less confusing way.  
Thanks for the report.

John

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] samtools mpileup v1.1 output significantly different from 0.1.19 (starting in middle of base qualities)

2015-02-02 Thread Paul Lott

—ignore-overlaps option was what was needed to get the pre-v1 mpileup output.

Thanks. Petr.





 On Feb 2, 2015, at 12:50 AM, Petr Danecek p...@sanger.ac.uk wrote:
 
 Hi Paul,
 
 I think this might be related to bug fixes where the default MQ and BQ
 filters were not applied with the raw mpileup output. 
 
 Petr
 
 On Fri, 2015-01-30 at 15:07 -0800, Paul Lott wrote:
 Hi all,
 
 We have an amplicon pipeline that uses mpileup output.  We recently updated 
 samtools from 0.1.19 to 1.1.  When performing some validation on the 
 pipeline, we found that there were a significant number of variants missing 
 from the caller(Varscan 2.3.7) output. The only difference was the output 
 from mpileup.  
 
 In the version 1.1, the coverage is greatly reduced and the base quality 
 string is the same upto approx the middle (base quality string position 
 ~1138); after that everything appears to be garbled. I’ve included outputs 
 for one position, but it is pretty much the same for most positions.  (see 
 outputs below)
 
 I used the same command is for both versions:   samtools mpileup -B -f 
 /local/scratch/REFERENCE_DATA/GATK_Bundle/b37/human_g1k_v37_decoy.fasta -r 
 13:32911471-32911471 Sample.bam
 
 I’ve searched for an answer to the differences, but couldn’t find an 
 explanation of why the outputs of v1.1 would be reduced and changed compared 
 to 0.1.19.   If there is an option to get similar output to v0.1.19, that 
 would be great.  Or, if it’s a bug it would need to be fixed. 
 
 Any help or a point in the right direction would be appreciated. 
 
 Thanks,
 
 Paul
 
 Paul Lott
 UC Davis Genome Center
 University of California, Davis
 
 
 Using samtools 0.1.19, we get the following output:
 
 [mpileup] 1 samples in 1 input files
 mpileup Set max per-file depth to 8000
 13  32911471G   3656
 

Re: [Samtools-help] samtools mpileup v1.1 output significantly different from 0.1.19 (starting in middle of base qualities)

2015-02-02 Thread Petr Danecek
Hi Paul,

I think this might be related to bug fixes where the default MQ and BQ
filters were not applied with the raw mpileup output. 

Petr

On Fri, 2015-01-30 at 15:07 -0800, Paul Lott wrote:
 Hi all,
 
 We have an amplicon pipeline that uses mpileup output.  We recently updated 
 samtools from 0.1.19 to 1.1.  When performing some validation on the 
 pipeline, we found that there were a significant number of variants missing 
 from the caller(Varscan 2.3.7) output. The only difference was the output 
 from mpileup.  
 
 In the version 1.1, the coverage is greatly reduced and the base quality 
 string is the same upto approx the middle (base quality string position 
 ~1138); after that everything appears to be garbled. I’ve included outputs 
 for one position, but it is pretty much the same for most positions.  (see 
 outputs below)
 
 I used the same command is for both versions:   samtools mpileup -B -f 
 /local/scratch/REFERENCE_DATA/GATK_Bundle/b37/human_g1k_v37_decoy.fasta -r 
 13:32911471-32911471 Sample.bam
 
 I’ve searched for an answer to the differences, but couldn’t find an 
 explanation of why the outputs of v1.1 would be reduced and changed compared 
 to 0.1.19.   If there is an option to get similar output to v0.1.19, that 
 would be great.  Or, if it’s a bug it would need to be fixed. 
 
 Any help or a point in the right direction would be appreciated. 
 
 Thanks,
 
 Paul
 
 Paul Lott
 UC Davis Genome Center
 University of California, Davis
 
 
 Using samtools 0.1.19, we get the following output:
 
 [mpileup] 1 samples in 1 input files
 mpileup Set max per-file depth to 8000
 13  32911471G   3656
 

[Samtools-help] samtools mpileup v1.1 output significantly different from 0.1.19 (starting in middle of base qualities)

2015-01-30 Thread Paul Lott
Hi all,

We have an amplicon pipeline that uses mpileup output.  We recently updated 
samtools from 0.1.19 to 1.1.  When performing some validation on the pipeline, 
we found that there were a significant number of variants missing from the 
caller(Varscan 2.3.7) output. The only difference was the output from mpileup.  

In the version 1.1, the coverage is greatly reduced and the base quality string 
is the same upto approx the middle (base quality string position ~1138); after 
that everything appears to be garbled. I’ve included outputs for one position, 
but it is pretty much the same for most positions.  (see outputs below)

I used the same command is for both versions:   samtools mpileup -B -f 
/local/scratch/REFERENCE_DATA/GATK_Bundle/b37/human_g1k_v37_decoy.fasta -r 
13:32911471-32911471 Sample.bam

I’ve searched for an answer to the differences, but couldn’t find an 
explanation of why the outputs of v1.1 would be reduced and changed compared to 
0.1.19.   If there is an option to get similar output to v0.1.19, that would be 
great.  Or, if it’s a bug it would need to be fixed. 

Any help or a point in the right direction would be appreciated. 

Thanks,

Paul

Paul Lott
UC Davis Genome Center
University of California, Davis


Using samtools 0.1.19, we get the following output:

[mpileup] 1 samples in 1 input files
mpileup Set max per-file depth to 8000
13  32911471G   3656

[Samtools-help] samtools mpileup -- error status 141

2014-11-13 Thread Karel Břinda
Hello,

I observe a strange problem and I am not able to find out what is the exact
reason of it. When I use samtools mpileup | another_program, samtools
return error code 141. Everything else is OK (pileup is well created, all
data are well processed). I observe it when I use set -o pipefail. The
overall return error status is 141 and it is really caused by samtools
because

echo ${PIPESTATUS[0]} ${PIPESTATUS[1]}
141 0

It is interesting that this problem appears only with pipes. When I save
the output of samtools to a file, the error code is 0.

Do you have any ideas about the reason?

Thanks

Karel
--
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://pubads.g.doubleclick.net/gampad/clk?id=154624111iu=/4140/ostg.clktrk___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] samtools mpileup -- error status 141

2014-11-13 Thread John Marshall
On 13 Nov 2014, at 12:23, Karel Břinda karel.bri...@gmail.com wrote:
 I observe a strange problem and I am not able to find out what is the exact 
 reason of it. When I use samtools mpileup | another_program, samtools 
 return error code 141.

When a Unix shell reports that a command has an exit status greater than 128, 
this usually means the command was terminated by a signal.  You don't say what 
platform this is on, but typically 141 means signal 141-128 = 13, which is 
typically SIGPIPE.

 Everything else is OK (pileup is well created, all data are well processed). 
 I observe it when I use set -o pipefail. The overall return error status is 
 141 and it is really caused by samtools because
 echo ${PIPESTATUS[0]} ${PIPESTATUS[1]}
 141 0
 
 It is interesting that this problem appears only with pipes. When I save the 
 output of samtools to a file, the error code is 0.

SIGPIPE means that samtools tried to write on a pipe but there was no-one to 
read it.  So this means that another_program closed its input and exited while 
samtools mpileup still had more data to write.  Perhaps it was some trivial 
trailer or statistics and so doesn't much matter; but perhaps you should check 
that another_program really did process all the way to the end of the last 
chromosome!

John

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

--
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://pubads.g.doubleclick.net/gampad/clk?id=154624111iu=/4140/ostg.clktrk
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


Re: [Samtools-help] samtools mpileup -- error status 141

2014-11-13 Thread Karel Břinda
Hello,

thanks a lot for this excellent answer. Thanks to it I found where
originally the problem started.

So as I realized, it appeared only with FASTA files containing space in the
first line (i.e., with a sequence description). The program mentioned in my
previous e-mail as another_program considered whole strings after  as a
sequence name (did not remove the description from it). Since samtools
mpileup (unfortunately) skips some of uncovered positions (as it has been
discussed recently in this conference), another_program had to process
the PILEUP file concurrently with the FASTA file (to complet missing
information). So it considered the chromosomes in FASTA to be uncovered by
the PILEUP since the chromosome names were differing. And after iterating
over the whole FASTA file, it was still on the second line of the PILEUP
file.

An idea to the end -- would it be possible to add a new parameter to samtools
mpileup, which would ensure that all FASTA positions will be mentioned in
the PILEUP (even with zero coverage)?

Thanks again!

Karel


2014-11-13 14:07 GMT+01:00 John Marshall j...@sanger.ac.uk:

 On 13 Nov 2014, at 12:23, Karel Břinda karel.bri...@gmail.com wrote:
  I observe a strange problem and I am not able to find out what is the
 exact reason of it. When I use samtools mpileup | another_program,
 samtools return error code 141.

 When a Unix shell reports that a command has an exit status greater than
 128, this usually means the command was terminated by a signal.  You don't
 say what platform this is on, but typically 141 means signal 141-128 = 13,
 which is typically SIGPIPE.

  Everything else is OK (pileup is well created, all data are well
 processed). I observe it when I use set -o pipefail. The overall return
 error status is 141 and it is really caused by samtools because
  echo ${PIPESTATUS[0]} ${PIPESTATUS[1]}
  141 0
 
  It is interesting that this problem appears only with pipes. When I save
 the output of samtools to a file, the error code is 0.

 SIGPIPE means that samtools tried to write on a pipe but there was no-one
 to read it.  So this means that another_program closed its input and exited
 while samtools mpileup still had more data to write.  Perhaps it was some
 trivial trailer or statistics and so doesn't much matter; but perhaps you
 should check that another_program really did process all the way to the end
 of the last chromosome!

 John

 --
  The Wellcome Trust Sanger Institute is operated by Genome Research
  Limited, a charity registered in England with number 1021457 and a
  company registered in England with number 2742969, whose registered
  office is 215 Euston Road, London, NW1 2BE.

--
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://pubads.g.doubleclick.net/gampad/clk?id=154624111iu=/4140/ostg.clktrk___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help


[Samtools-help] samtools mpileup call snp

2014-10-24 Thread ggu
hi all ;

I call snp by samtools mpileup and get results of vcf format; and  i 
find the quality of snp which most are 222,and the max of them are also 222, 
could you  tell me the reasons.

I am looking foward to hearing from you soon.

Sincerely,

fang--
___
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help