I'm running into an issue trying to remove duplicate reads with Samtools and
think it might be a bug in the latest version.
I have a pipeline to process my aligned data before putting it into Freebayes
to call variants that works as such:
Code:
samtools view -S -b foo.sam > foo.bamnorg # convert from sam to bam
bamaddrg -b foo.bamnorg > foo.bam # ensure read groups are attached
samtools sort foo.bam foo.sorted .bam # sorts
samtools index foo.sorted.bam # index the sorted bam
samtools rmdup foo.sorted.bam foo.sorted.1.bam # remove duplicate reads
freebayes ...
The entire pipeline works fine up until rmdup where it immediately results in a
segmentation fault.
What's particularly interesting is that this seg fault only occurs in some
files. I originally thought this had something to do with size or memory
allocation, since rmdup worked fine with RNA seq data but not with this genome
seq data. However, the seg fault still occurred even when I allocated more
memory.
Furthermore, I think this might be a bug. The lab group I work with recently
switched to a new cluster system. I run into the seg fault on the new cluster,
which is running samtools 1.3, but the seg fault does not occur on the old
cluster, which is running version 0.1.18.
The error has been narrowed down to a line number in the bam.c file, with debug
output below:
Code:
$ gdb samtools core
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from samtools...done.
[New LWP 11878]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `samtools rmdup NAT_B_9.sorted.bam NAT_B_9.sorted.1.bam'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 bam_get_library (h=h@entry=0x8bb1c0, b=b@entry=0x8db9c0) at bam.c:112
112 for (cp = LB; *cp && *cp != '\t' && *cp != '\n'; cp++)
(gdb) list
107 if (strncmp(rg, ID, strlen(rg)) != 0 || ID[strlen(rg)] != '\t')
108 continue;
109
110 // Valid until next query
111 static char LB_text[1024];
112 for (cp = LB; *cp && *cp != '\t' && *cp != '\n'; cp++)
113 ;
114 strncpy(LB_text, LB, MIN(cp-LB, 1023));
115 LB_text[MIN(cp-LB, 1023)] = 0;
116
(gdb)
Does anyone have any suggestions/potential fixes?
Ryan Friedman
Washington University in St. Louis
B.A. Candidate in Genomics & Computational Biology, Computer Science
Class of 2017
[email protected]<mailto:[email protected]> | (440) 708-4736
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help