John

You are right!
Here is a short fragment of the file where the sorting starts to be incorrect.

D00689:262:CAUR4ANXX:1:2210:12001:8808/1    99   AMEXGS_00350000072  1073537649
D00689:262:CAUR4ANXX:1:2212:4262:62218/2    131  AMEXGS_00350000072  1073537662
>>> Incorrect sorting beyond this point
D00689:262:CAUR4ANXX:1:1311:7571:42633/1    115  AMEXGS_00350000052  1073757534
D00689:262:CAUR4ANXX:1:1209:14064:92038/1   115  AMEXGS_00350000052  1073762716
D00689:262:CAUR4ANXX:1:1207:19595:38302/1   67   AMEXGS_00350000066  1073768421
D00689:262:CAUR4ANXX:1:1202:12406:98751/1   83   AMEXGS_00350000071  1073787822
D00689:262:CAUR4ANXX:1:1205:7367:92746/2    131  AMEXGS_00350000050  1073798500

The reads after the marker all map beyond position 1,073,741,824 (2^30).
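
As a quick sanity check, a minimal Python sketch can compare the positions in the fragment with 2^30. The positions are copied from the fragment; the names `LIMIT` and `beyond_limit` are made up for this illustration.

```python
# 2^30, the threshold discussed in this thread.
LIMIT = 2 ** 30  # 1,073,741,824

# Mapping positions copied from the fragment, in file order.
positions = [
    1073537649, 1073537662,              # before the marker
    1073757534, 1073762716, 1073768421,  # after the marker
    1073787822, 1073798500,
]

def beyond_limit(pos, limit=LIMIT):
    """Return True if a mapping position lies beyond the limit."""
    return pos > limit

# One flag per read, in the same order as the fragment.
flags = [beyond_limit(p) for p in positions]
print(flags)
```

The two reads before the marker come out as False and the five after it as True.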

I guess I’ll just write a small sorting utility.
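
A minimal sketch of what such a utility could look like, in Python. It works on SAM text lines rather than on BAM directly, holds all records in memory, and orders them by reference rank and then by position. The function name `sort_records` and the reference order in `ref_order` are assumptions made for this example; the real order has to come from the @SQ lines of the file's header. Python integers do not overflow, so positions beyond 2^30 need no special handling.

```python
def sort_records(records, ref_order):
    """Sort SAM record lines by (reference rank, position).

    records   -- tab-separated SAM lines; column 3 is RNAME, column 4 is POS
    ref_order -- reference names in header order (assumed, see above)
    """
    rank = {name: i for i, name in enumerate(ref_order)}

    def key(line):
        fields = line.split("\t")
        # References missing from ref_order (e.g. "*") are placed last.
        return (rank.get(fields[2], len(rank)), int(fields[3]))

    return sorted(records, key=key)

# Hypothetical header order: the scaffolds in name order.
ref_order = [
    "AMEXGS_00350000050", "AMEXGS_00350000052", "AMEXGS_00350000066",
    "AMEXGS_00350000071", "AMEXGS_00350000072",
]

# The first four columns of the records in the fragment (the rest is omitted).
fragment = [
    "D00689:262:CAUR4ANXX:1:2210:12001:8808/1\t99\tAMEXGS_00350000072\t1073537649",
    "D00689:262:CAUR4ANXX:1:2212:4262:62218/2\t131\tAMEXGS_00350000072\t1073537662",
    "D00689:262:CAUR4ANXX:1:1311:7571:42633/1\t115\tAMEXGS_00350000052\t1073757534",
    "D00689:262:CAUR4ANXX:1:1209:14064:92038/1\t115\tAMEXGS_00350000052\t1073762716",
    "D00689:262:CAUR4ANXX:1:1207:19595:38302/1\t67\tAMEXGS_00350000066\t1073768421",
    "D00689:262:CAUR4ANXX:1:1202:12406:98751/1\t83\tAMEXGS_00350000071\t1073787822",
    "D00689:262:CAUR4ANXX:1:1205:7367:92746/2\t131\tAMEXGS_00350000050\t1073798500",
]

sorted_fragment = sort_records(fragment, ref_order)
for line in sorted_fragment:
    print(line)
```

For a file of this size an in-memory sort is unlikely to be practical, so a real utility would probably have to sort in chunks and merge them.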

Thank you!
Best regards
Sergej


Dr. Sergej Nowoshilow
Post-doc in Tanaka Lab

Elly Tanaka group
Animal models of regeneration
Campus-Vienna-Biocenter 1
1030 Vienna

email: sergej.nowoshi...@imp.ac.at
phone: +43 (0) 1 79730 3203


From: John Marshall <john.w.marsh...@glasgow.ac.uk>
Date: Saturday, 10 March 2018 at 09:55
To: "Nowoshilow,Sergej" <sergej.nowoshi...@imp.ac.at>
Cc: "samtools-help@lists.sourceforge.net" <samtools-help@lists.sourceforge.net>
Subject: Re: [Samtools-help] Extremely long reference sequences

On 9 Mar 2018, at 23:56, Nowoshilow,Sergej 
<sergej.nowoshi...@imp.ac.at> wrote:
Apparently, the BAM file is not quite correctly sorted after all. A simple test
samtools view test.sorted.bam | cut -f3 | uniq
proves that this is indeed the case: the scaffold IDs are sorted over ~90% 
of the file (e.g. from scaffold001 to scaffold100), while the last 10% are not 
sorted at all, e.g. scaffold052 may follow scaffold100 and so on. Therefore, 
the problem lies in "samtools sort" rather than in "samtools index".

Does this jumbled last 10% consist of reads mapped at locations beyond 2^30? I 
suspect you have rediscovered https://github.com/samtools/samtools/issues/615 
which it appears fell off the samtools maintainers' to-do list and was never 
fixed. Alas.

    John