Dear Thomas, Thank you for the quick response and for the suggestion! I created a clean test case file and tried both - "samtools index" with the "-b" switch, which should generate a normal BAI file and predictably failed, and with the "-c -m 34" switches, which should have generated an index file with Bin 0 spanning 2^34 (~17 billion) bases. However, it also failed.
BAI: samtools index test.sorted.bam [E::hts_idx_push] Region 536870821..536870933 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6 samtools index: failed to create index for "test.sorted.bam": Numerical result out of range CSI: samtools index -c -m 34 test.sorted.bam [E::hts_idx_push] Region 2671..2751 cannot be stored in a csi index with min_shift = 34, n_lvls = 10. Try using min_shift = 14, n_lvls >= 0 samtools index: failed to create index for "test.sorted.bam": Numerical result out of range When I run the tool as suggested with min_shift = 14 I get another error message: samtools index -c -m 14 test.sorted.bam [E::hts_idx_push] chromosome blocks not continuous samtools index: failed to create index for "test.sorted.bam": No such file or directory Apparently, the BAM file is not quite correctly sorted after all. A simple test samtools view test.sorted.bam | cut -f3 | uniq proofs that it is indeed the case, since the scaffold IDs are sorted over ~90% of the file (e.g. from scaffold001-scaffold100), while the last 10% are not sorted at all, e.g. scaffold052 may follow the scaffold100 an so on. Therefore, the problem is rather "samtools sort" and not "samtools index". I scanned through the source code in hope to find a variable that would be a good candidate for run over. However, they seem to be at least of type INT, which would be fine for sequences up to 2^32-1, while the longest scaffold I have so far is 1.9Gbp. At the moment I'm stuck, but I will continue tomorrow. Please let me know if there is an obvious workaround for that issue.. I may try to sort the reads myself, though, but I'm wondering if I'm overlooking something. Thanks a lot! Best regards Sergej Dr. Sergej Nowoshilow Post-doc in Tanaka Lab Elly Tanaka group Animal models of regeneration Campus-Vienna-Biocenter 1 1030 Vienna email: sergej.nowoshi...@imp.ac.at phone: +43 (0) 1 79730 3203 This message is confidential and may contain privileges information. It is intended for the named recipients only. If you receive it in error please notify me and permanently delete the original message and any copies. Am 09.03.18, 22:56 schrieb "Thomas W. Blackwell" <tbla...@umich.edu>: Sergej - I can't give you the details, but you should look at the SAM/BAM Format document from www.htslib.org. My old copy is dated 28 Dec 2014. Go to page 15, the two-line paragraph just before 5.1.2. This suggests using CSI index format rather than the default BAI index format. I think samtools supports both. Let the list know if this helps. - tom blackwell - On Fri, 9 Mar 2018, Nowoshilow,Sergej wrote: > Dear SAMtools developers and community > > Our group is working with the axolotl genome, which is 10x larger than that of the human. It has 14 chromosomes and, thus, some (if not all) of the chromosomes are longer than 2Gbp.. Although we don?t have chromosome-size scaffolds yet, we are trying our best and managed to assemble some very long scaffolds (with quite some gaps ? N?s): ~1.5Gbp. > Now I am running into problems with those long scaffolds, since although it is perfectly possible to map the RNA/DNAseq reads to the scaffolds it is not possible to sort and index the resulting BAM files, which means that they cannot be viewed in the genome browser? > I tried ?samtools index -c -m?, but unsuccessfully irrespective of the value specified by the ?-m? option? > The problem seems to be the LENGTH of the scaffold. I also looked at the source code (however, not deep enough, therefore, excuse me if I?m wrong) and did some testing with different datasets and reference sequences and have a feeling that some internal variables might ?overrun? if the position of a read within the scaffold exceeds ~500,000,000.. is it right? > If I?m right, is there any fix to that problem or is it an inherent issue that cannot be fixed easily? > > I would highly appreciate any advice on how to deal with that issue. Theoretically, I could split our long scaffolds into shorter pieces, however, that would defeat the notion of assembling chromosome-size scaffolds. > > Thank you very much in advance! > Best regards > Sergej > > > Dr. Sergej Nowoshilow > Post-doc/Bioinformatician in Tanaka Lab > > Elly Tanaka group > Animal models of regeneration > Campus-Vienna-Biocenter 1 > 1030 Vienna > > email: sergej.nowoshi...@imp.ac.at > phone: +43 (0) 1 79730 3203 > > This message is confidential and may contain privileges information. It is intended for the named recipients only. If you receive it in error please notify me and permanently delete the original message and any copies. > > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help