Dear John,
I ran a few tests in my spare time. All samtools versions from 0.1.19
onwards have the same sorting problem, with or without multiple threads
(-@). Version 0.1.18 sorts the file correctly, but it is slower than
sambamba, especially for really big bam files.
I have a reduced unsorted bam file of ~500 MB that can be used to reproduce
this issue. How would you like me to send it to you?
Regards,
Juan Montenegro
2016-10-07 23:07 GMT+10:00 John Marshall <j...@sanger.ac.uk>:
> On 7 Oct 2016, at 06:24, Juan Daniel Montenegro Cabrera <
> jdmonteneg...@gmail.com> wrote:
> >
> > samtools view -bh@ 15 in.sam | samtools sort -T tmp -@ 15 -o out.sorted.bam -
> >
> > When I try to index the sorted bam file it complains:
> >
> > samtools index out.sorted.bam
> > [E::hts_idx_push] NO_COOR reads not in a single block at the end 16 -1
> > samtools index: "SORT.0.bam" is corrupted or unsorted
> >
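The index error can be read as a check on one invariant: once the first record with no reference (RNAME *) appears, every later record must also have none. Below is a minimal Python sketch of that check, assuming the records arrive as tab-separated SAM text (for example piped from samtools view); the function name records_after_no_coor is hypothetical. On the file described here it would be expected to report the 194 trailing 6B_concat records.

```python
def records_after_no_coor(sam_lines):
    """Return (qname, rname) for every record that has a reference
    (RNAME != '*') but comes after the first record without one.

    An empty list means the unplaced records form a single block at the
    end, which is what the indexer's NO_COOR check expects.
    """
    seen_no_coor = False
    misplaced = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record (11 mandatory fields)
            continue
        qname, rname = fields[0], fields[2]
        if rname == "*":
            seen_no_coor = True  # the unplaced block has started
        elif seen_no_coor:
            misplaced.append((qname, rname))  # placed record after the block
    return misplaced
```

For example, call records_after_no_coor(sys.stdin) from a small script fed by samtools view out.sorted.bam. samtools view omits the header unless -h is given; the sketch skips @ lines either way.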
> > When I check the last lines of the file, it does indeed have mapped reads
> > at the end instead of the unmapped reads. The last 194 records in the
> > sorted bam file are mapped reads, and they come right after the unmapped
> > reads:
> >
> > tail -n 195 SynOpDH_0.sam | head -n 2
> > SRR1170581.75133375 141 * 0 0 * * 0 0 CAACATAAATTTGGCACACAAATAGTTCTCCATTAACCCTTTTAGTAAAAAGAGTAGAATCTATTTTCCAATTTCAAAGCCTTTTTCAATGAGGAACTTGGTTAAGCATTTATAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGAT CCCFFFFFHHHHHJJHJJJJJJHIIHIJJJJJJJJJJJJJJJJJJHIIJJHIIIBBFHIIJJJJJJJJJJIJJJJJJJHIHHHHHFFFFFFEEEEDDDDDDDDDDCDDCDDEFEDDEDDDDDBDDDB@BDBDACDDBCDDDDC>C>BCDA YT:Z:UP
> > SRR1170581.75025193 133 6B_concat 1073997887 0 * = 1073997887 0 AGGGTAGTAGCATTGCCCCTTCTCTCTTTTTCTCTCATTTTTTTGTTTTATCTTTTTTTGGGGGGGCCCTCTATTTTTTTGGCCTCTTTTTTTTCGTCCGGAGTCTCAACCCGACTTGTGGGGGAATCATAGTCTCCATCATCCTTTCCT BBCFDDFFHHHHHJJJJJJJJIIIJJJJJJJIIIJJGIIJJJJJJJJJJGFHIJJJJJHFFDDDDDDDDDDDCDEEEDDDD@CDDDDDDDDDDDCBBDBBBB@BCDDEDDDDD>BBDCCCDBD@9BBDDDCDDEEDDCDDDDDDDCCDCC YT:Z:UP
> >
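Decoding the FLAG field shows why the second record has a reference and a position at all. The Python sketch below lists the bits set in a SAM FLAG value, using short bit names like those samtools uses; the table and the function name decode_flag are illustrative, not part of any library.

```python
# Bit values from the SAM specification's FLAG table, with short names.
SAM_FLAG_BITS = [
    (0x1, "PAIRED"), (0x2, "PROPER_PAIR"), (0x4, "UNMAP"), (0x8, "MUNMAP"),
    (0x10, "REVERSE"), (0x20, "MREVERSE"), (0x40, "READ1"), (0x80, "READ2"),
    (0x100, "SECONDARY"), (0x200, "QCFAIL"), (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]

def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]

print(decode_flag(141))  # ['PAIRED', 'UNMAP', 'MUNMAP', 'READ2']
print(decode_flag(133))  # ['PAIRED', 'UNMAP', 'READ2']
```

Both records have UNMAP set; only the FLAG 141 read also has MUNMAP (mate unmapped). The SAM specification recommends giving an unmapped read whose mate is mapped the same RNAME and POS as its mate, which matches the FLAG 133 record (RNAME 6B_concat, RNEXT =, POS equal to PNEXT). Such a record is placed on 6B_concat for sorting and indexing, so it belongs with that reference rather than in the unplaced block at the end.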
> > In total there are 4561428 reads that map to the 6B_concat reference,
> > but for some reason these 194 reads keep appearing at the end of the sorted
> > file.
>
> These reads are the sequence number 16 (i.e. 6B_concat) reads following
> the really-unmapped reads that "NO_COOR reads not in a single block at the
> end 16 -1" is complaining about, and they really are not sorted.
>
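John's explanation amounts to a simple ordering rule: records are ordered by reference index and then by position, and only records with no reference at all (index -1, printed as * in SAM) go at the end. The Python sketch below is a simplified model of that rule, not samtools' actual comparator, which also handles ties in ways not modelled here. It uses reference index 16 for 6B_concat, taken from the "16 -1" in the index error; coord_sort_key is a hypothetical name.

```python
def coord_sort_key(tid, pos):
    """Simplified coordinate-sort key: reference index, then position,
    with unplaced records (tid == -1) after every placed one.
    Ties on (tid, pos) are not broken here."""
    return (tid < 0, tid, pos)  # False < True puts placed records first

# (qname, tid, pos) for the two records above; tid 16 is 6B_concat,
# and POS 0 is SAM's placeholder for "no position".
records = [
    ("SRR1170581.75133375", -1, 0),           # FLAG 141, no reference
    ("SRR1170581.75025193", 16, 1073997887),  # FLAG 133, placed on 6B_concat
]
ordered = sorted(records, key=lambda r: coord_sort_key(r[1], r[2]))
for qname, tid, pos in ordered:
    print(qname, tid, pos)
```

Under this model the FLAG 133 record sorts before the unplaced FLAG 141 record, the opposite of the order seen at the end of the sorted file.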
> > Any ideas why this might be happening?
>
> To figure out what's going on here, it would be very helpful if you were
> able to provide us with access to the sort input file, so we can try to
> reproduce this.
>
> In the meantime, please try removing '-@ 15' from the sort command and
> sorting with just one thread. I am grasping at straws here, but it would
> be interesting to see whether the problem persists in this case.
>
> John
>
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help