Dear all,

Samtools (both 0.1.19 and 1.0.0) seems to make a sorting error when sorting by name.
(A minimal example dataset is here:
https://pwbc.garvan.org.au/~nenbar/samtools/test.bam)
If we look at a read pair that maps to two transcripts, the two mates of each
alignment follow each other, grouped by the transcript they map to.

samtools view test.bam

HISEQ:51:C315AACXX:4:1101:1189:97694 419 TCONS_00000145 14 3 99M = 117 136 CCATAAGCGGAGAAAGAGGGAATGACATTGTTCTTACACGGCACAAGCAGACAAAATCAACATGGTCATTTAGAAATCGGAGGTGTGGATGCTCTCTAT CCCFFFFFHHGHHIIJGIJJIIJJJJIJJJIJJJJIJJIJGIIJJJJJIIIJGHHGHFFFEFEEECCEDEEFDDDDDDDCBDD2?>@BCDDDDDDAAA@ NH:i:2 HI:i:1
HISEQ:51:C315AACXX:4:1101:1189:97694 339 TCONS_00000145 117 3 33M = 14 -136 CGGAGAAATATGGTACACCTCTTTACGTATATG ?HCA4HHEC?HHFEC?<G@ABHFDA?D?B???@ NH:i:2 HI:i:1
HISEQ:51:C315AACXX:4:1101:1189:97694 355 TCONS_00000146 890 3 33M = 927 136 CATATACGTAAAGAGGTGTACCATATTTCTCCG @???B?D?ADFHBA@G<?CEFHH?CEHH4ACH? NH:i:2 HI:i:2
HISEQ:51:C315AACXX:4:1101:1189:97694 403 TCONS_00000146 927 3 99M = 890 -136 ATAGAGAGCATCCACACCTCCGATTTCTAAATGACCATGTTGATTTTGTCTGCTTGTGCCGTGTAAGAACAATGTCATTCCCTCTTTCTCCGCTTATGG @AAADDDDDDCB@>?2DDBCDDDDDDDFEEDECCEEEFEFFFHGHHGJIIIJJJJJIIGJIJJIJJJJIJJJIJJJJIIJJIGJIIHHGHHFFFFFCCC NH:i:2 HI:i:2

However, when we sort by name, the mates are no longer adjacent: the records
for the two transcripts come out interleaved.

samtools sort -n test.bam test.sorted
samtools view test.sorted.bam

HISEQ:51:C315AACXX:4:1101:1189:97694 339 TCONS_00000145 117 3 33M = 14 -136 CGGAGAAATATGGTACACCTCTTTACGTATATG ?HCA4HHEC?HHFEC?<G@ABHFDA?D?B???@ NH:i:2 HI:i:1
HISEQ:51:C315AACXX:4:1101:1189:97694 355 TCONS_00000146 890 3 33M = 927 136 CATATACGTAAAGAGGTGTACCATATTTCTCCG @???B?D?ADFHBA@G<?CEFHH?CEHH4ACH? NH:i:2 HI:i:2
HISEQ:51:C315AACXX:4:1101:1189:97694 419 TCONS_00000145 14 3 99M = 117 136 CCATAAGCGGAGAAAGAGGGAATGACATTGTTCTTACACGGCACAAGCAGACAAAATCAACATGGTCATTTAGAAATCGGAGGTGTGGATGCTCTCTAT CCCFFFFFHHGHHIIJGIJJIIJJJJIJJJIJJJJIJJIJGIIJJJJJIIIJGHHGHFFFEFEEECCEDEEFDDDDDDDCBDD2?>@BCDDDDDDAAA@ NH:i:2 HI:i:1
HISEQ:51:C315AACXX:4:1101:1189:97694 403 TCONS_00000146 927 3 99M = 890 -136 ATAGAGAGCATCCACACCTCCGATTTCTAAATGACCATGTTGATTTTGTCTGCTTGTGCCGTGTAAGAACAATGTCATTCCCTCTTTCTCCGCTTATGG @AAADDDDDDCB@>?2DDBCDDDDDDDFEEDECCEEEFEFFFHGHHGJIIIJJJJJIIGJIJJIJJJJIJJJIJJJJIIJJIGJIIHHGHHFFFFFCCC NH:i:2 HI:i:2

This causes an error in some tools (such as RSEM) that require paired read 
mates to follow each other.
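
For what it is worth, the order in the sorted output looks consistent with ties on the read name being broken by the first/second-in-pair flag bits (0x40 and 0x80), so that both first mates come before both second mates. This is only a guess from the four records shown above, not something checked against the samtools source. A small Python sketch of that guess (the helper name mate_of is made up for illustration):

```python
# Decode the mate bits of the four flags from test.bam and test one possible
# explanation of the observed order.
# ASSUMPTION: ties on read name are broken by (flag & 0xC0). This is inferred
# from the output alone and is not confirmed against the samtools source.

FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def mate_of(flag):
    """Return 1 or 2 for first/second in pair, 0 if neither bit is set."""
    if flag & FIRST_IN_PAIR:
        return 1
    if flag & SECOND_IN_PAIR:
        return 2
    return 0


# (flag, transcript, HI tag) in the order of the unsorted test.bam
records = [
    (419, "TCONS_00000145", 1),
    (339, "TCONS_00000145", 1),
    (355, "TCONS_00000146", 2),
    (403, "TCONS_00000146", 2),
]

# All four records share one read name, so only the tie-break matters.
# Python's sort is stable, so records with equal keys keep their input order.
guessed_order = sorted(records, key=lambda r: r[0] & 0xC0)

for flag, transcript, hi in guessed_order:
    print(flag, "mate", mate_of(flag), transcript, "HI:i:%d" % hi)
```

With these four flags the guessed key gives the order 339, 355, 419, 403, which is the order seen in test.sorted.bam.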

Is this a bug or a feature? If it is a feature, how can one work around it
without falling back on the Unix sort command?
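
One workaround I can imagine, as a rough sketch only: take the name-sorted SAM text and, within each run of records that share a read name, reorder them by the HI tag and then by mate. This assumes the aligner writes an HI:i: tag for every hit (as in the records above) and that the input is already grouped by read name. The function names are hypothetical, and the demo records use "*" in place of SEQ and QUAL to keep the lines short. Whether RSEM accepts the resulting order has not been checked.

```python
# Sketch: regroup name-sorted SAM lines so that the two mates of each hit
# are adjacent. ASSUMPTIONS: input is grouped by read name, and every record
# carries an HI:i: tag identifying the hit it belongs to.


def hit_index(fields):
    """Return the integer value of the HI:i: tag, or 0 if it is absent."""
    for tag in fields[11:]:
        if tag.startswith("HI:i:"):
            return int(tag[5:])
    return 0


def mate_rank(fields):
    """Rank first-in-pair (0x40) before second-in-pair (0x80)."""
    return int(fields[1]) & 0xC0


def regroup(lines):
    """Yield SAM lines, reordering each read-name group by (HI, mate)."""

    def flush(group):
        group.sort(key=lambda f: (hit_index(f), mate_rank(f)))
        for fields in group:
            yield "\t".join(fields) + "\n"

    group = []
    for line in lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if group and fields[0] != group[0][0]:
            yield from flush(group)
            group = []
        group.append(fields)
    yield from flush(group)


# Demo on the four records from test.sorted.bam (SEQ and QUAL shortened to "*").
name = "HISEQ:51:C315AACXX:4:1101:1189:97694"


def rec(flag, ref, pos, cigar, pnext, tlen, hi):
    cols = [name, flag, ref, pos, "3", cigar, "=", pnext, tlen, "*", "*",
            "NH:i:2", "HI:i:" + hi]
    return "\t".join(cols) + "\n"


demo = [
    rec("339", "TCONS_00000145", "117", "33M", "14", "-136", "1"),
    rec("355", "TCONS_00000146", "890", "33M", "927", "136", "2"),
    rec("419", "TCONS_00000145", "14", "99M", "117", "136", "1"),
    rec("403", "TCONS_00000146", "927", "99M", "890", "-136", "2"),
]

regrouped = list(regroup(demo))
for line in regrouped:
    print(line, end="")
```

To use this as a filter, one could import sys, replace the demo with sys.stdout.writelines(regroup(sys.stdin)), and pipe the output of samtools view -h through the script.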

Cheers,

Nenad

Nenad Bartonicek, PhD
Bioinformatic Officer
Centre for Clinical Genomics
Garvan Institute of Medical Research
384 Victoria Street
Sydney NSW 2010
Australia

E: n.bartoni...@garvan.org.au
T: +61(02)92955764

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help