Hi Martin,

Adding 'calmdnmrecompindetonly=1' will increase performance further: bamsort will then recompute the MD/NM values only for reads whose span on the reference contains an ambiguity code or N.
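As a rough sketch, your command from the quoted message with that option added might look like the following. The values of `reference` and `sample` are hypothetical placeholders, the `bamsort_cmd` wrapper is only there for illustration, and the `echo` makes this a dry run that prints the command line rather than executing it. All other options are copied unchanged from your invocation.

```shell
# Hypothetical placeholders; substitute your own reference FASTA and sample prefix.
reference="reference.fa"
sample="sample"

# Same options as the quoted command, plus calmdnmrecompindetonly=1.
# bamsort_cmd is an illustrative wrapper, not part of biobambam2.
bamsort_cmd() {
    echo bamsort fixmates=1 calmdnm=1 calmdnmrecompindetonly=1 \
        calmdnmreference="$reference" blockmb=40960 \
        inputthreads=8 outputthreads=8 level=9 \
        I="$sample.realignedtogether.BQSR.namesorted.bam" \
        O="$sample.realignedtogether.BQSR.namesorted.fixmate.calmd.bam"
}

# Dry run: prints the command line. Remove 'echo' inside bamsort_cmd to execute it.
bamsort_cmd
```

The option only has an effect together with calmdnm=1, since it restricts which reads get their MD/NM values recomputed.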
Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

k...@sanger.ac.uk
Tel: +44 (0)1223 834244 Ext: 7703
Office: H104

> On 2 Aug 2016, at 14:47, Martin MOKREJŠ <mmokr...@gmail.com> wrote:
>
> Keiran Raine wrote:
>> For BAM in/out yes:
>>
>> inputthreads=<[1]> : input helper threads (for inputformat=bam
>> only, default: 1)
>> outputthreads=<[1]> : output helper threads (for outputformat=bam
>> only, default: 1)
>
> bamsort fixmates=1 calmdnm=1 calmdnmreference="$reference" blockmb=40960
> inputthreads=8 outputthreads=8 level=9
> I="$sample".realignedtogether.BQSR.namesorted.bam
> O="$sample".realignedtogether.BQSR.namesorted.fixmate.calmd.bam
>
> The above takes about 3 cores during input and since much later it starts
> writing output it takes 8 cores. Maybe just because of the extreme output
> compression only. But definitely, it outperformed "samtools calmd" step doing
> half of the work (just MD: tag calculations). Actually, processing the whole
> file took maybe 2 minutes in total? "samtools calmd" ran out of wallclock
> time limit at 12hrs on a cluster node (running on a single core).
>
> Thank you for pointing me to bamsort, I added biobambam2 with libmaus2 to my
> Gentoo Linux recently (is in science overlay now), so it was simple to call
> it.
>
> Martin

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
a charity registered in England with number 1021457 and a company registered
in England with number 2742969, whose registered office is 215 Euston Road,
London, NW1 2BE.
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help