Hi,
You can download the unsorted BAM file here:

https://cloudstor.aarnet.edu.au/plus/index.php/s/p90bYldJoqE5Fbv

Regards,

Juan Montenegro

2016-10-10 14:35 GMT+10:00 Juan Daniel Montenegro Cabrera <
jdmonteneg...@gmail.com>:

> Dear John,
>
> I ran a few tests in my spare time. All samtools versions from 0.1.19 onward
> have the same sorting problem, with or without multiple threads (-@).
> Version 0.1.18 sorts the file correctly, but it is slower than
> sambamba, especially for really big BAM files.
> I have a reduced unsorted BAM file of ~500 MB that can be used to reproduce
> this issue. How would you like me to send it to you?
>
> Regards,
>
> Juan Montenegro
>
> 2016-10-07 23:07 GMT+10:00 John Marshall <j...@sanger.ac.uk>:
>
>> On 7 Oct 2016, at 06:24, Juan Daniel Montenegro Cabrera <
>> jdmonteneg...@gmail.com> wrote:
>> >
>> > samtools view -bh@ 15 in.sam | samtools sort -T tmp -@ 15 -o out.sorted.bam -
>> >
>> > When I try to index the sorted bam file it complains:
>> >
>> > samtools index out.sorted.bam
>> > [E::hts_idx_push] NO_COOR reads not in a single block at the end 16 -1
>> > samtools index: "SORT.0.bam" is corrupted or unsorted
>> >
>> > When I check the last lines of the file, it does indeed have mapped reads
>> > at the end instead of the unmapped reads. The last 194 records in the
>> > sorted BAM file are mapped reads, and they come right after the unmapped
>> > reads:
>> >
>> > tail -n 195 SynOpDH_0.sam | head -n 2
>> > SRR1170581.75133375   141     *       0       0       *       *       0       0       CAACATAAATTTGGCACACAAATAGTTCTCCATTAACCCTTTTAGTAAAAAGAGTAGAATCTATTTTCCAATTTCAAAGCCTTTTTCAATGAGGAACTTGGTTAAGCATTTATAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGAT       CCCFFFFFHHHHHJJHJJJJJJHIIHIJJJJJJJJJJJJJJJJJJHIIJJHIIIBBFHIIJJJJJJJJJJIJJJJJJJHIHHHHHFFFFFFEEEEDDDDDDDDDDCDDCDDEFEDDEDDDDDBDDDB@BDBDACDDBCDDDDC>C>BCDA  YT:Z:UP
>> > SRR1170581.75025193   133     6B_concat       1073997887      0       *       =       1073997887      0       AGGGTAGTAGCATTGCCCCTTCTCTCTTTTTCTCTCATTTTTTTGTTTTATCTTTTTTTGGGGGGGCCCTCTATTTTTTTGGCCTCTTTTTTTTCGTCCGGAGTCTCAACCCGACTTGTGGGGGAATCATAGTCTCCATCATCCTTTCCT       BBCFDDFFHHHHHJJJJJJJJIIIJJJJJJJIIIJJGIIJJJJJJJJJJGFHIJJJJJHFFDDDDDDDDDDDCDEEEDDDD@CDDDDDDDDDDDCBBDBBBB@BCDDEDDDDD>BBDCCCDBD@9BBDDDCDDEEDDCDDDDDDDCCDCC  YT:Z:UP
>> >
>> > In total there are 4561428 reads that map to the 6B_concat reference,
>> > but for some reason these 194 reads keep appearing at the end of the
>> > sorted file.
>>
>> These reads are the sequence number 16 (i.e. 6B_concat) reads following
>> the really-unmapped reads that "NO_COOR reads not in a single block at the
>> end 16 -1" is complaining about, and they really are not sorted.
>>
>> > Any ideas why this might be happening?
>>
>> To figure out what's going on here, it would be very helpful if you were
>> able to provide us with access to the sort input file, so we can try to
>> reproduce this.
>>
>> In the meantime, please try removing '-@ 15' from the sort command and
>> sorting with just one thread.  I am grasping at straws here, but it would
>> be interesting to see whether the problem persists in this case.
>>
>>     John
>>
>> --
>>  The Wellcome Trust Sanger Institute is operated by Genome Research
>>  Limited, a charity registered in England with number 1021457 and a
>>  company registered in England with number 2742969, whose registered
>>  office is 215 Euston Road, London, NW1 2BE.
>>
>
>
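
The ordering rule behind the hts_idx_push error quoted above (records in
coordinate order, with all reads that have no coordinates in a single block
at the end) can be checked directly on SAM text. The following is only a
minimal sketch, not how samtools itself performs the check. It assumes
tab-separated SAM lines such as those printed by "samtools view -h", treats
a record with RNAME "*" as having no coordinates, and takes the reference
order from the @SQ header lines (or from first appearance when no header is
present). The function name find_sort_violation is hypothetical.

```python
def find_sort_violation(sam_lines):
    """Return (line_number, reason) for the first out-of-order record, else None.

    Hypothetical helper: expects an iterable of SAM text lines (header optional).
    """
    ref_index = {}        # reference name -> rank, from @SQ lines or first appearance
    prev_key = None       # (reference rank, position) of the previous placed record
    seen_no_coor = False  # True once a record with RNAME "*" has been read

    for lineno, line in enumerate(sam_lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: remember the reference order given by @SQ SN: tags.
            if fields[0] == "@SQ":
                for tag in fields[1:]:
                    if tag.startswith("SN:"):
                        ref_index.setdefault(tag[3:], len(ref_index))
            continue
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            # Read without coordinates: it must belong to the final block.
            seen_no_coor = True
            continue
        if seen_no_coor:
            return lineno, "record with coordinates follows reads without coordinates"
        key = (ref_index.setdefault(rname, len(ref_index)), pos)
        if prev_key is not None and key < prev_key:
            return lineno, "record sorts before the previous record"
        prev_key = key
    return None
```

In a script fed by "samtools view -h out.sorted.bam", calling
find_sort_violation(sys.stdin) would report the first offending line, counted
from the top of the stream including header lines. For a file with the
symptom described in this thread, one would expect it to point at the first
6B_concat read that follows the unmapped block.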
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
