On Tue, 22 Oct 2024, Nate Brown wrote:
I'm running a pipeline that uses a very old version of samtools mpileup (version 1.6). When we switched to the more modern bcftools mpileup (version 1.16), we noticed that the data changed slightly: for example, some of the lower-frequency calls that used to have 2 supporting reads now have only 1, or vice versa. Although the shifts in the data are very subtle, we would like to set the bcftools mpileup options so that it produces output exactly like the old samtools mpileup. I don't expect anyone here to know what these exact parameters are, but I was wondering if anyone had any general guidance on how the mpileup command differs between samtools and bcftools. Searching the readme.txt files between releases, it seems that mpileup had a number of updates to how indels are detected, and I see that some of the defaults changed (e.g. the min-ireads default is 2 instead of 1, gap-frac is 0.05 instead of 0.002, etc.). However, even when I explicitly set the values for the arguments whose defaults changed, I can never exactly mimic the old output. Would it even be possible to do this? Any advice would be appreciated. Thanks!
I'm afraid it's unlikely that this is possible, due to the number of changes that have been made to bcftools mpileup since it was forked from the original in samtools. In fact, this was one of the reasons for removing the VCF/BCF output from samtools mpileup - it was not generating some of the tags that were expected by bcftools call.
Looking at the change-log, you could try options like --config 1.12, -U and --full-BAQ (if using BAQ). I wouldn't be surprised if you still see differences though.
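As a rough sketch of how those options might be combined, the snippet below just prints a candidate command line. The helper name legacy_mpileup_cmd and the file names ref.fa and in.bam are made up for illustration; the -m 1 and -F 0.002 values are the old defaults quoted in the question, and placing them after --config so that they take precedence is an assumption worth checking against the bcftools 1.16 documentation.

```shell
#!/bin/sh
# Sketch only: print a bcftools mpileup command line that moves the
# defaults back toward the old samtools mpileup behaviour.
# legacy_mpileup_cmd, ref.fa and in.bam are hypothetical names.
legacy_mpileup_cmd() {
    ref=$1   # reference FASTA
    bam=$2   # input alignment file
    # --config 1.12 : configuration profile matching bcftools 1.12
    # -U            : older form of the Mann-Whitney U scores
    # --full-BAQ    : apply BAQ everywhere (only matters if BAQ is in use)
    # -m 1 -F 0.002 : indel thresholds quoted in the question as the old defaults
    printf '%s\n' "bcftools mpileup --config 1.12 -U --full-BAQ -m 1 -F 0.002 -f $ref $bam"
}

legacy_mpileup_cmd ref.fa in.bam
```

Running the printed command on a small region and comparing the result with the old samtools 1.6 output is probably the quickest way to see how close these settings get; some differences may well remain.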
Rob Davies                      r...@sanger.ac.uk
The Sanger Institute            http://www.sanger.ac.uk/
Hinxton, Cambs.,                Tel. +44 (1223) 834244
CB10 1SA, U.K.                  Fax. +44 (1223) 494919