John
You are right!
Here is a short fragment of the file where the sorting starts to be incorrect.
D00689:262:CAUR4ANXX:1:2210:12001:8808/1 99 AMEXGS_00350000072 1073537649
D00689:262:CAUR4ANXX:1:2212:4262:62218/2 131 AMEXGS_00350000072 1073537662
>>> Incorrect sorting beyond this point
D00689:262:CAUR4ANXX:1:1311:7571:42633/1 115 AMEXGS_00350000052 1073757534
D00689:262:CAUR4ANXX:1:1209:14064:92038/1 115 AMEXGS_00350000052 1073762716
D00689:262:CAUR4ANXX:1:1207:19595:38302/1 67 AMEXGS_00350000066 1073768421
D00689:262:CAUR4ANXX:1:1202:12406:98751/1 83 AMEXGS_00350000071 1073787822
D00689:262:CAUR4ANXX:1:1205:7367:92746/2 131 AMEXGS_00350000050 1073798500
The reads beyond that point all map to positions above 1,073,741,824 (2^30).
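As a quick check of the fragment above, the positions can be compared against 2^30 directly. This is only a minimal sketch: it hard-codes the (reference, position) pairs quoted above, and the variable names are mine.

```python
# (reference name, position) pairs copied from the fragment above, in file order.
records = [
    ("AMEXGS_00350000072", 1073537649),
    ("AMEXGS_00350000072", 1073537662),
    ("AMEXGS_00350000052", 1073757534),
    ("AMEXGS_00350000052", 1073762716),
    ("AMEXGS_00350000066", 1073768421),
    ("AMEXGS_00350000071", 1073787822),
    ("AMEXGS_00350000050", 1073798500),
]

LIMIT = 1 << 30  # 1,073,741,824

# Split the records at the 2^30 boundary.
below = [r for r in records if r[1] < LIMIT]
beyond = [r for r in records if r[1] >= LIMIT]

for name, pos in records:
    side = "beyond" if pos >= LIMIT else "below"
    print(f"{name}\t{pos}\t{side} 2^30")
```

In this fragment the records past the boundary are still in ascending order of position, while their reference names are jumbled.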
I guess I’ll just write a small sorting utility.
Thank you!
Best regards
Sergej
Dr. Sergej Nowoshilow
Post-doc in Tanaka Lab
Elly Tanaka group
Animal models of regeneration
Campus-Vienna-Biocenter 1
1030 Vienna
email: sergej.nowoshi...@imp.ac.at
phone: +43 (0) 1 79730 3203
From: John Marshall <john.w.marsh...@glasgow.ac.uk>
Date: Saturday, 10 March 2018 at 09:55
To: "Nowoshilow,Sergej" <sergej.nowoshi...@imp.ac.at>
Cc: "samtools-help@lists.sourceforge.net" <samtools-help@lists.sourceforge.net>
Subject: Re: [Samtools-help] Extremely long reference sequences
On 9 Mar 2018, at 23:56, Nowoshilow,Sergej <sergej.nowoshi...@imp.ac.at> wrote:
Apparently, the BAM file is not quite correctly sorted after all. A simple test
samtools view test.sorted.bam | cut -f3 | uniq
proves that this is indeed the case: the scaffold IDs are sorted over ~90%
of the file (e.g. scaffold001 to scaffold100), while the last 10% are not
sorted at all, e.g. scaffold052 may follow scaffold100 and so on. Therefore,
the problem lies with "samtools sort" rather than "samtools index".
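The test above leaves the reader to eyeball the list of scaffold IDs. A slightly more explicit variant, sketched here in Python, reports reference names that start a new run after their earlier run has already ended. The function name `find_reappearing_refs` and the demo lines are hypothetical; the input is assumed to be tab-separated SAM text such as `samtools view` prints.

```python
def find_reappearing_refs(lines):
    """Return reference names (column 3) that reappear after a different name."""
    seen = set()          # every reference name encountered so far
    previous = None       # reference name of the previous record
    reappearing = []      # names whose records are not contiguous
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue      # skip malformed or empty lines
        ref = fields[2]
        if ref != previous:
            if ref in seen and ref not in reappearing:
                reappearing.append(ref)
            seen.add(ref)
            previous = ref
    return reappearing


# Made-up records for illustration only (not taken from the real file).
demo = [
    "r1\t99\tscaffold001\t10",
    "r2\t99\tscaffold100\t5",
    "r3\t99\tscaffold052\t7",
    "r4\t83\tscaffold100\t9",
]
print(find_reappearing_refs(demo))
```

Note that this only catches references whose records are split into several runs. A reference that appears once but in the wrong place relative to the header order would need a comparison against the @SQ order as well.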
Does this jumbled last 10% consist of reads mapped at locations beyond 2^30? I
suspect you have rediscovered https://github.com/samtools/samtools/issues/615
which it appears fell off the samtools maintainers' to-do list and was never
fixed. Alas.
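To illustrate why 2^30 would be a natural threshold: suppose a sort key packs the reference index into the upper 32 bits of a 64-bit integer and `(pos + 1) << 1` plus a strand bit into the lower 32 bits, with the shift done in signed 32-bit arithmetic. The shift then wraps to a negative value once pos + 1 reaches 2^30, and sign extension overwrites the reference index. The sketch below emulates that in Python. The key layout is an assumption for illustration only; it is not confirmed in this thread and has not been checked against the samtools source, and the function names and reference indices are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_int32(x):
    """Wrap an integer to a signed 32-bit value (two's-complement wraparound)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def packed_key(tid, pos, reverse=False):
    """Hypothetical 64-bit key: tid in the high word, shifted position in the low word."""
    low = to_int32((pos + 1) << 1) | int(reverse)  # negative once pos + 1 >= 2**30
    return ((tid << 32) | low) & MASK64            # a negative low word swamps tid


def tuple_key(tid, pos, reverse=False):
    """Overflow-free alternative: compare the fields as a tuple."""
    return (tid, pos, reverse)


# Hypothetical reference indices paired with positions quoted in this thread.
a = (72, 1073537662)  # below 2**30
b = (52, 1073757534)  # beyond 2**30
c = (50, 1073798500)  # beyond 2**30

# With the packed key, b and c sort after a and by position only.
print(packed_key(*a) < packed_key(*b) < packed_key(*c))
# With the tuple key, the reference index takes precedence again.
print(sorted([a, b, c], key=lambda r: tuple_key(*r)))
```

Under this assumed layout, every record beyond 2^30 sorts after all records below it, ordered by position alone, which is at least consistent with a jumbled tail of the file.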
John