I'm running into an issue trying to remove duplicate reads with Samtools and 
think it might be a bug in the latest version.

I have a pipeline to process my aligned data before putting it into Freebayes 
to call variants that works as such:
Code:

samtools view -S -b foo.sam > foo.bamnorg       # convert from sam to bam
bamaddrg -b foo.bamnorg > foo.bam           # ensure read groups are attached
 samtools sort foo.bam foo.sorted .bam          # sorts
 samtools index foo.sorted.bam                          # index the sorted bam
 samtools rmdup foo.sorted.bam foo.sorted.1.bam  # remove duplicate reads
 freebayes ...

The entire pipeline works fine up until rmdup where it immediately results in a 
segmentation fault.

What's particularly interesting is that this seg fault only occurs in some 
files. I originally thought this had something to do with size or memory 
allocation, since rmdup worked fine with RNA seq data but not with this genome 
seq data. However, the seg fault still occurred even when I allocated more 
memory.

Furthermore, I think this might be a bug. The lab group I work with recently 
switched to a new cluster system. I run into the seg fault on the new cluster, 
which is running samtools 1.3, but the seg fault does not occur on the old 
cluster, which is running version 0.1.18.

The error has been narrowed down to a line number in the bam.c file, with debug 
output below:
Code:

$ gdb samtools core
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from samtools...done.

[New LWP 11878]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `samtools rmdup NAT_B_9.sorted.bam NAT_B_9.sorted.1.bam'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  bam_get_library (h=h@entry=0x8bb1c0, b=b@entry=0x8db9c0) at bam.c:112
112             for (cp = LB; *cp && *cp != '\t' && *cp != '\n'; cp++)
(gdb) list
107             if (strncmp(rg, ID, strlen(rg)) != 0 || ID[strlen(rg)] != '\t')
108                 continue;
109
110             // Valid until next query
111             static char LB_text[1024];
112             for (cp = LB; *cp && *cp != '\t' && *cp != '\n'; cp++)
113                 ;
114             strncpy(LB_text, LB, MIN(cp-LB, 1023));
115             LB_text[MIN(cp-LB, 1023)] = 0;
116
(gdb)

Does anyone have any suggestions/potential fixes?




Ryan Friedman
Washington University in St. Louis
B.A. Candidate in Genomics & Computational Biology, Computer Science
Class of 2017
[email protected]<mailto:[email protected]> | (440) 708-4736

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to