Hi Nenad,
Because this is something I might want in the future anyway:
https://github.com/dpryan79/samtools/blob/HI/bam_sort.c
I've only tested it on your test.bam file, but it's at least a starting
point. For alignments sharing the same read name, it puts those without an
HI tag before those with one, should that situation ever occur. It then
orders by HI and, within each HI, by read #1/#2. So the output is probably
similar to novosort's.
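A minimal sketch of that ordering, written as a standalone qsort comparator in C. The aln_t struct, the FLAG_READ1/FLAG_READ2 macros and the function names are hypothetical illustrations, not the code in the linked branch and not htslib's bam1_t. Names are compared with plain strcmp as a simplifying assumption; samtools' own -n mode uses a number-aware name comparison instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified alignment record: only the fields the
 * ordering needs (NOT htslib's bam1_t). */
typedef struct {
    const char *qname; /* read name (QNAME) */
    int flag;          /* SAM FLAG field */
    int has_hi;        /* 1 if the record carries an HI tag, else 0 */
    int hi;            /* HI:i value; ignored when has_hi == 0 */
} aln_t;

#define FLAG_READ1 0x40 /* first segment in the template */
#define FLAG_READ2 0x80 /* last segment in the template */

/* Rank inside one HI group: read #1, then read #2, then anything else
 * (e.g. unpaired records, which have neither bit set). */
static int mate_rank(int flag)
{
    if (flag & FLAG_READ1) return 0;
    if (flag & FLAG_READ2) return 1;
    return 2;
}

/* Order by name, then missing HI before present HI, then ascending HI,
 * then read #1 before read #2. */
static int cmp_name_hi(const void *pa, const void *pb)
{
    const aln_t *a = pa, *b = pb;
    int c = strcmp(a->qname, b->qname); /* assumption: byte-wise order */
    if (c != 0) return c;
    if (a->has_hi != b->has_hi) return a->has_hi - b->has_hi;
    if (a->has_hi && a->hi != b->hi) return a->hi < b->hi ? -1 : 1;
    return mate_rank(a->flag) - mate_rank(b->flag);
}

/* Sort an array of records in place with the comparator above. */
void sort_by_name_hi(aln_t *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    qsort(recs, n, sizeof *recs, cmp_name_hi);
}
```

Applied to the four records from test.bam (flags 339, 355, 419, 403), this yields 339, 419 (HI:1, read #1 then #2) followed by 355, 403 (HI:2), so mates end up adjacent. qsort is not stable, so records that tie on all four keys may come out in any order.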
Best,
Devon
--
Devon Ryan, Ph.D.
Email: dpr...@dpryan.com
Laboratory for Molecular and Cellular Cognition
German Centre for Neurodegenerative Diseases (DZNE)
Ludwig-Erhard-Allee 2
53175 Bonn
Germany
<devon.r...@dzne.de>
On Wed, Oct 22, 2014 at 9:20 AM, Nenad Bartonicek <n.bartoni...@garvan.org.au> wrote:
> Dear all,
>
> Samtools (both 0.1.19 and 1.0.0) makes a sorting error when sorting by
> name. (A minimal example dataset is here:
> https://pwbc.garvan.org.au/~nenbar/samtools/test.bam)
> Take a read pair that maps to two transcripts: in the unsorted file, the
> two mates aligned to each transcript follow each other.
>
> samtools view test.bam
>
> HISEQ:51:C315AACXX:4:1101:1189:97694 419 TCONS_00000145 14 3 99M = 117 136 CCATAAGCGGAGAAAGAGGGAATGACATTGTTCTTACACGGCACAAGCAGACAAAATCAACATGGTCATTTAGAAATCGGAGGTGTGGATGCTCTCTAT CCCFFFFFHHGHHIIJGIJJIIJJJJIJJJIJJJJIJJIJGIIJJJJJIIIJGHHGHFFFEFEEECCEDEEFDDDDDDDCBDD2?>@BCDDDDDDAAA@ NH:i:2 HI:i:1
> HISEQ:51:C315AACXX:4:1101:1189:97694 339 TCONS_00000145 117 3 33M = 14 -136 CGGAGAAATATGGTACACCTCTTTACGTATATG ?HCA4HHEC?HHFEC?<G@ABHFDA?D?B???@ NH:i:2 HI:i:1
> HISEQ:51:C315AACXX:4:1101:1189:97694 355 TCONS_00000146 890 3 33M = 927 136 CATATACGTAAAGAGGTGTACCATATTTCTCCG @???B?D?ADFHBA@G<?CEFHH?CEHH4ACH? NH:i:2 HI:i:2
> HISEQ:51:C315AACXX:4:1101:1189:97694 403 TCONS_00000146 927 3 99M = 890 -136 ATAGAGAGCATCCACACCTCCGATTTCTAAATGACCATGTTGATTTTGTCTGCTTGTGCCGTGTAAGAACAATGTCATTCCCTCTTTCTCCGCTTATGG @AAADDDDDDCB@>?2DDBCDDDDDDDFEEDECCEEEFEFFFHGHHGJIIIJJJJJIIGJIJJIJJJJIJJJIJJJJIIJJIGJIIHHGHHFFFFFCCC NH:i:2 HI:i:2
>
> However, when we sort by name, the alignments to the two transcripts
> become interleaved, so the mates no longer follow each other:
>
> samtools sort -n test.bam test.sorted
> samtools view test.sorted.bam
>
> HISEQ:51:C315AACXX:4:1101:1189:97694 339 TCONS_00000145 117 3 33M = 14 -136 CGGAGAAATATGGTACACCTCTTTACGTATATG ?HCA4HHEC?HHFEC?<G@ABHFDA?D?B???@ NH:i:2 HI:i:1
> HISEQ:51:C315AACXX:4:1101:1189:97694 355 TCONS_00000146 890 3 33M = 927 136 CATATACGTAAAGAGGTGTACCATATTTCTCCG @???B?D?ADFHBA@G<?CEFHH?CEHH4ACH? NH:i:2 HI:i:2
> HISEQ:51:C315AACXX:4:1101:1189:97694 419 TCONS_00000145 14 3 99M = 117 136 CCATAAGCGGAGAAAGAGGGAATGACATTGTTCTTACACGGCACAAGCAGACAAAATCAACATGGTCATTTAGAAATCGGAGGTGTGGATGCTCTCTAT CCCFFFFFHHGHHIIJGIJJIIJJJJIJJJIJJJJIJJIJGIIJJJJJIIIJGHHGHFFFEFEEECCEDEEFDDDDDDDCBDD2?>@BCDDDDDDAAA@ NH:i:2 HI:i:1
> HISEQ:51:C315AACXX:4:1101:1189:97694 403 TCONS_00000146 927 3 99M = 890 -136 ATAGAGAGCATCCACACCTCCGATTTCTAAATGACCATGTTGATTTTGTCTGCTTGTGCCGTGTAAGAACAATGTCATTCCCTCTTTCTCCGCTTATGG @AAADDDDDDCB@>?2DDBCDDDDDDDFEEDECCEEEFEFFFHGHHGJIIIJJJJJIIGJIJJIJJJJIJJJIJJJJIIJJIGJIIHHGHHFFFFFCCC NH:i:2 HI:i:2
>
> This causes an error in some tools (such as RSEM) that require paired
> read mates to follow each other.
>
> Is this a bug or a feature? And if it is a feature, how can one work
> around it without using bash sort?
>
> Cheers,
>
> Nenad
>
> Nenad Bartonicek, PhD
> Bioinformatic Officer
> Centre for Clinical Genomics
> Garvan Institute of Medical Research
> 384 Victoria Street
> Sydney NSW 2010
> Australia
>
> E: n.bartoni...@garvan.org.au
> T: +61(02)92955764
>
>
>
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
>
>