On 7 Oct 2016, at 06:24, Juan Daniel Montenegro Cabrera <jdmonteneg...@gmail.com> wrote:
>
> samtools view -bh@ 15 in.sam | samtools sort -T tmp -@ 15 -o out.sorted.bam -
>
> When I try to index the sorted bam file it complains:
>
> samtools index out.sorted.bam
> [E::hts_idx_push] NO_COOR reads not in a single block at the end 16 -1
> samtools index: "SORT.0.bam" is corrupted or unsorted
>
> when I check the last lines of the file, it effectively has mapped reads at
> the end, instead of the unmapped reads. The last 194 records in the sorted
> bam file are mapped reads and they come right after the unmapped reads:
>
> tail -n 195 SynOpDH_0.sam | head -n 2
> SRR1170581.75133375 141 * 0 0 * * 0
> 0
> CAACATAAATTTGGCACACAAATAGTTCTCCATTAACCCTTTTAGTAAAAAGAGTAGAATCTATTTTCCAATTTCAAAGCCTTTTTCAATGAGGAACTTGGTTAAGCATTTATAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGAT
>
> CCCFFFFFHHHHHJJHJJJJJJHIIHIJJJJJJJJJJJJJJJJJJHIIJJHIIIBBFHIIJJJJJJJJJJIJJJJJJJHIHHHHHFFFFFFEEEEDDDDDDDDDDCDDCDDEFEDDEDDDDDBDDDB@BDBDACDDBCDDDDC>C>BCDA
> YT:Z:UP
> SRR1170581.75025193 133 6B_concat 1073997887 0 *
> = 1073997887 0
> AGGGTAGTAGCATTGCCCCTTCTCTCTTTTTCTCTCATTTTTTTGTTTTATCTTTTTTTGGGGGGGCCCTCTATTTTTTTGGCCTCTTTTTTTTCGTCCGGAGTCTCAACCCGACTTGTGGGGGAATCATAGTCTCCATCATCCTTTCCT
>
> BBCFDDFFHHHHHJJJJJJJJIIIJJJJJJJIIIJJGIIJJJJJJJJJJGFHIJJJJJHFFDDDDDDDDDDDCDEEEDDDD@CDDDDDDDDDDDCBBDBBBB@BCDDEDDDDD>BBDCCCDBD@9BBDDDCDDEEDDCDDDDDDDCCDCC
> YT:Z:UP
>
> In total there are 4561428 reads that map to the 6B_concat reference, but for
> some reason these 194 reads keep appearing at the end of the sorted file.

These are the reads on reference sequence number 16 (i.e. 6B_concat) that follow the really-unmapped reads. They are what "NO_COOR reads not in a single block at the end 16 -1" is complaining about, and they really are not sorted.

> Any ideas why this might be happening?

To figure out what's going on here, it would be very helpful if you were able to provide us with access to the sort input file, so we can try to reproduce this.

In the meantime, please try removing '-@ 15' from the sort command and sorting with just one thread. I am grasping at straws here, but it would be interesting to see whether the problem persists in this case.

    John
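
To see quickly whether a given sort output still has the problem, something like the following rough sketch may help. It assumes that the "NO_COOR" reads in the index error are the records whose reference name (SAM column 3) is `*`, and it reports the first record with a reference name that appears after one of those. The function name `check_nocoor_block` is made up for this example and is not part of samtools.

```shell
# Hypothetical helper (not a samtools command): reads SAM text on stdin and
# reports the first record that has a reference name after a record whose
# RNAME is "*". Exit status is 0 if none is found, 1 otherwise.
check_nocoor_block() {
    awk -F '\t' '
        /^@/ { next }                        # skip header lines
        $3 == "*" { seen_nocoor = 1; next }  # record without coordinates
        seen_nocoor {                        # placed record after the "*" block
            printf "record %s on %s:%s follows a no-coordinate record\n", $1, $3, $4
            bad = 1
            exit
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Possible usage, with the file name from the original commands:
#   samtools view -h out.sorted.bam | check_nocoor_block
```

This only looks at where the `*` records sit. It does not check that the placed reads are in coordinate order among themselves, so a clean result here is not proof that the file is fully sorted.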