Dear all,
Samtools (both 0.1.19 and 1.0.0) makes a sorting error when sorting by name.
(A minimal example dataset is here:
https://pwbc.garvan.org.au/~nenbar/samtools/test.bam)
Consider the two mates of one paired-end read, each of which maps to two
transcripts. In the original file, each mate is immediately followed by its
partner, and the pairs are grouped by the transcript they map to.
samtools view test.bam
>HISEQ:51:C315AACXX:4:1101:1189:97694 419 TCONS_00000145 14 3 99M = 117 136 CCATAAGCGGAGAAAGAGGGAATGACATTGTTCTTACACGGCACAAGCAGACAAAATCAACATGGTCATTTAGAAATCGGAGGTGTGGATGCTCTCTAT CCCFFFFFHHGHHIIJGIJJIIJJJJIJJJIJJJJIJJIJGIIJJJJJIIIJGHHGHFFFEFEEECCEDEEFDDDDDDDCBDD2?>@BCDDDDDDAAA@ NH:i:2 HI:i:1
>HISEQ:51:C315AACXX:4:1101:1189:97694 339 TCONS_00000145 117 3 33M = 14 -136 CGGAGAAATATGGTACACCTCTTTACGTATATG ?HCA4HHEC?HHFEC?<G@ABHFDA?D?B???@ NH:i:2 HI:i:1
>HISEQ:51:C315AACXX:4:1101:1189:97694 355 TCONS_00000146 890 3 33M = 927 136 CATATACGTAAAGAGGTGTACCATATTTCTCCG @???B?D?ADFHBA@G<?CEFHH?CEHH4ACH? NH:i:2 HI:i:2
>HISEQ:51:C315AACXX:4:1101:1189:97694 403 TCONS_00000146 927 3 99M = 890 -136 ATAGAGAGCATCCACACCTCCGATTTCTAAATGACCATGTTGATTTTGTCTGCTTGTGCCGTGTAAGAACAATGTCATTCCCTCTTTCTCCGCTTATGG @AAADDDDDDCB@>?2DDBCDDDDDDDFEEDECCEEEFEFFFHGHHGJIIIJJJJJIIGJIJJIJJJJIJJJIJJJJIIJJIGJIIHHGHHFFFFFCCC NH:i:2 HI:i:2
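To see which records are mates, it helps to decode the FLAG (second) column. Below is a minimal Python sketch using the bit meanings from the SAM specification. The short bit names are labels chosen here for illustration, not an official API.

```python
# Decode the SAM FLAG field into named bits (bit meanings from the SAM spec;
# the short names are illustrative labels only).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",          # first in pair
    0x80: "read2",          # second in pair
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # FLAG values of the four records in test.bam, in file order.
    for flag in (419, 339, 355, 403):
        print(flag, decode_flag(flag))
```

Applied to the four FLAG values, it shows that 419 and 403 carry the second-in-pair bit (0x80) and 339 and 355 carry the first-in-pair bit (0x40). All four also have the secondary bit (0x100) set. The HI tag then pairs 419 with 339 (HI:i:1) and 355 with 403 (HI:i:2).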
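However, when we sort by name, the records are scrambled and the mates no
longer follow each other: both first-in-pair records (FLAG 339 and 355) now
come before both second-in-pair records (FLAG 419 and 403).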
samtools sort -n test.bam test.sorted
samtools view test.sorted.bam
>HISEQ:51:C315AACXX:4:1101:1189:97694 339 TCONS_00000145 117 3 33M = 14 -136 CGGAGAAATATGGTACACCTCTTTACGTATATG ?HCA4HHEC?HHFEC?<G@ABHFDA?D?B???@ NH:i:2 HI:i:1
>HISEQ:51:C315AACXX:4:1101:1189:97694 355 TCONS_00000146 890 3 33M = 927 136 CATATACGTAAAGAGGTGTACCATATTTCTCCG @???B?D?ADFHBA@G<?CEFHH?CEHH4ACH? NH:i:2 HI:i:2
>HISEQ:51:C315AACXX:4:1101:1189:97694 419 TCONS_00000145 14 3 99M = 117 136 CCATAAGCGGAGAAAGAGGGAATGACATTGTTCTTACACGGCACAAGCAGACAAAATCAACATGGTCATTTAGAAATCGGAGGTGTGGATGCTCTCTAT CCCFFFFFHHGHHIIJGIJJIIJJJJIJJJIJJJJIJJIJGIIJJJJJIIIJGHHGHFFFEFEEECCEDEEFDDDDDDDCBDD2?>@BCDDDDDDAAA@ NH:i:2 HI:i:1
>HISEQ:51:C315AACXX:4:1101:1189:97694 403 TCONS_00000146 927 3 99M = 890 -136 ATAGAGAGCATCCACACCTCCGATTTCTAAATGACCATGTTGATTTTGTCTGCTTGTGCCGTGTAAGAACAATGTCATTCCCTCTTTCTCCGCTTATGG @AAADDDDDDCB@>?2DDBCDDDDDDDFEEDECCEEEFEFFFHGHHGJIIIJJJJJIIGJIJJIJJJJIJJJIJJJJIIJJIGJIIHHGHHFFFFFCCC NH:i:2 HI:i:2
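The sorted order is consistent with a comparison that uses the read name first and, on a tie, only the first/second-in-pair bits (FLAG & 0xC0), keeping input order otherwise. The Python sketch below models that assumption. It is inferred from this output, not taken from the samtools source. Plain string comparison stands in for samtools' own name comparison, which can order some names differently; here all four records share one name, so it makes no difference.

```python
# Model of the observed "samtools sort -n" ordering (an inference, not the
# samtools source): compare by read name, break ties only by the
# first/second-in-pair bits. Python's sorted() is stable, so records with
# equal keys keep their input order.
READ_PAIR_BITS = 0xC0  # 0x40 = first in pair, 0x80 = second in pair


def name_sort_key(record):
    qname, flag, _hi = record
    return (qname, flag & READ_PAIR_BITS)


# (QNAME, FLAG, HI) for the four records of test.bam, in file order.
QNAME = "HISEQ:51:C315AACXX:4:1101:1189:97694"
records = [(QNAME, 419, 1), (QNAME, 339, 1), (QNAME, 355, 2), (QNAME, 403, 2)]

if __name__ == "__main__":
    for qname, flag, hi in sorted(records, key=name_sort_key):
        print(flag, "HI:i:%d" % hi)
```

It prints the records in the order 339, 355, 419, 403, the same order samtools produced. This suggests that the HI tag plays no part in the tie-break.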
This causes an error in some tools (such as RSEM) that require paired read
mates to follow each other.
Is this a bug or a feature? If it is a feature, how can one work around it
without using the Unix sort command?
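One possible workaround, sketched under assumptions: sort the SAM text by read name, then HI tag, then the first/second-in-pair bit, so that the two mates of each alignment end up adjacent. This is a hypothetical Python script (the name mate_sort.py is made up), not a samtools option. It assumes every record carries an HI:i tag shared by both mates, as in test.bam, and it holds the whole file in memory.

```python
import sys

READ_PAIR_BITS = 0xC0  # 0x40 = first in pair, 0x80 = second in pair


def hi_tag(fields):
    """Return the integer HI:i tag of a split SAM record, or 0 if absent."""
    for field in fields[11:]:  # optional tags start at the 12th column
        if field.startswith("HI:i:"):
            return int(field[5:])
    return 0


def mate_adjacent_key(line):
    """Sort key: read name, then alignment index (HI), then read1 before read2."""
    fields = line.split("\t")
    return (fields[0], hi_tag(fields), int(fields[1]) & READ_PAIR_BITS)


def sort_sam_records(lines):
    """Keep header lines ('@' prefix) first, in order; sort the alignment lines."""
    header = [line for line in lines if line.startswith("@")]
    body = [line for line in lines if line and not line.startswith("@")]
    return header + sorted(body, key=mate_adjacent_key)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python mate_sort.py test.sam > test.mates.sam
    with open(sys.argv[1]) as handle:
        for line in sort_sam_records(handle.read().splitlines()):
            print(line)
```

The input SAM could come from samtools view -h test.bam > test.sam. Only grouping by name matters for adjacency, so plain string ordering of names is sufficient here. Note that the header's @HD SO field is not rewritten.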
Cheers,
Nenad
Nenad Bartonicek, PhD
Bioinformatic Officer
Centre for Clinical Genomics
Garvan Institute of Medical Research
384 Victoria Street
Sydney NSW 2010
Australia
E: n.bartoni...@garvan.org.au
T: +61(02)92955764
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help