Dear John,

I ran a few tests in my spare time. Every samtools version from 0.1.19 onward
shows the same sorting problem, with or without multiple threads (-@).
Version 0.1.18 sorts the file correctly, but it is slower than
sambamba, especially for really big BAM files.
I have a reduced, unsorted BAM file of ~500 MB that can be used to reproduce
this issue. How would you like me to send it to you?
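
In case it helps to confirm the ordering independently of samtools index,
here is a minimal Python sketch that scans SAM text and reports the first
record that is out of coordinate order. It is only an illustration, not part
of samtools: the function name first_unsorted_record is made up, it assumes
the header carries @SQ lines, and it compares only reference order and
position, which is a simplification of full coordinate ordering.

```python
def first_unsorted_record(lines):
    """Return (line_number, qname) of the first SAM record that breaks
    coordinate order, or None if every record is in order.

    Assumption: reference order is the order of the @SQ header lines, and
    records with RNAME "*" (no coordinate) belong after all other records.
    """
    ref_index = {}   # reference name -> rank taken from @SQ order
    prev_key = None  # sort key of the previous record

    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")

        # Header lines start with "@"; only @SQ lines define reference order.
        if line.startswith("@"):
            if fields[0] == "@SQ":
                for tag in fields[1:]:
                    if tag.startswith("SN:"):
                        ref_index[tag[3:]] = len(ref_index)
            continue

        qname, rname, pos = fields[0], fields[2], int(fields[3])
        if rname == "*":
            # No reference at all: sorts after every placed record.
            key = (float("inf"), 0)
        else:
            if rname not in ref_index:
                raise ValueError(f"line {lineno}: {rname} has no @SQ header line")
            key = (ref_index[rname], pos)

        if prev_key is not None and key < prev_key:
            return lineno, qname
        prev_key = key

    return None
```

To try it, the sorted file can be dumped with "samtools view -h out.sorted.bam
> out.sorted.sam" and the open file handle passed to first_unsorted_record().
The returned line number counts header lines too.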

Regards,

Juan Montenegro

2016-10-07 23:07 GMT+10:00 John Marshall <j...@sanger.ac.uk>:

> On 7 Oct 2016, at 06:24, Juan Daniel Montenegro Cabrera <
> jdmonteneg...@gmail.com> wrote:
> >
> > samtools view -bh@ 15 in.sam | samtools sort -T tmp -@ 15 -o out.sorted.bam -
> >
> > When I try to index the sorted bam file it complains:
> >
> > samtools index out.sorted.bam
> > [E::hts_idx_push] NO_COOR reads not in a single block at the end 16 -1
> > samtools index: "SORT.0.bam" is corrupted or unsorted
> >
> > when I check the last lines of the file, it effectively has mapped reads
> > at the end, instead of the unmapped reads. The last 194 records in the
> > sorted bam file are mapped reads and they come right after the unmapped
> > reads:
> >
> > tail -n 195 SynOpDH_0.sam | head -n 2
> > SRR1170581.75133375   141     *       0       0       *       *       0
>      0       CAACATAAATTTGGCACACAAATAGTTCTCCATTAACCCTTTTAGTAAAAAGAGTAGAAT
> CTATTTTCCAATTTCAAAGCCTTTTTCAATGAGGAACTTGGTTAAGCATTTATAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGAT
> CCCFFFFFHHHHHJJHJJJJJJHIIHIJJJJJJJJJJJJJJJJJJHIIJJHIIIBBFHII
> JJJJJJJJJJIJJJJJJJHIHHHHHFFFFFFEEEEDDDDDDDDDDCDDCDDEFEDDEDDD
> DDBDDDB@BDBDACDDBCDDDDC>C>BCDA  YT:Z:UP
> > SRR1170581.75025193   133     6B_concat       1073997887      0       *
>      =       1073997887      0       AGGGTAGTAGCATTGCCCCTTCTCTCTTTT
> TCTCTCATTTTTTTGTTTTATCTTTTTTTGGGGGGGCCCTCTATTTTTTTGGCCTCTTTT
> TTTTCGTCCGGAGTCTCAACCCGACTTGTGGGGGAATCATAGTCTCCATCATCCTTTCCT
> BBCFDDFFHHHHHJJJJJJJJIIIJJJJJJJIIIJJGIIJJJJJJJJJJGFHIJJJJJHF
> FDDDDDDDDDDDCDEEEDDDD@CDDDDDDDDDDDCBBDBBBB@BCDDEDDDDD>BBDCCCDBD@9BBDDDCDDEEDDCDDDDDDDCCDCC
> YT:Z:UP
> >
> > In total there are 4561428 reads that map to the 6B_concat reference,
> > but for some reason these 194 reads keep appearing at the end of the sorted
> > file.
>
> These reads are the sequence number 16 (i.e. 6B_concat) reads following
> the really-unmapped reads that "NO_COOR reads not in a single block at the
> end 16 -1" is complaining about, and they really are not sorted.
>
> > Any ideas why this might be happening?
>
> To figure out what's going on here, it would be very helpful if you were
> able to provide us with access to the sort input file, so we can try to
> reproduce this.
>
> In the meantime, please try removing '-@ 15' from the sort command and
> sorting with just one thread.  I am grasping at straws here, but it would
> be interesting to see whether the problem persists in this case.
>
>     John
>
> --
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
>
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
