Howdy,

I am experiencing a strange problem with samtools view involving file
truncation, and I'm hoping someone here can verify my conclusion that
the problem is an error in my file system. I am using samtools-0.1.19
on some variety of x86_64 Linux.

What I'm trying to do is take a 36 GB BAM file, do some filtering, and
output a new BAM file. The filtering program operates on SAM, so I
have a short pipeline that looks like this:
    samtools view -h input.bam | filtering_program | samtools view -bS - > output.bam

After about nine hours, the pipeline halts with this message:
    [main_samview] truncated file

Unfortunately, since my command runs samtools view twice, this error
message doesn't tell me which instance failed. So I modified the source
to report more detail and ran it for another nine hours. When it failed
the second time, the extra output showed that the failing instance was
the one reading BAM input, and that it failed while reading a 4-byte
BAM record header. I believe that rules out both the filtering program
and a lack of disk space.
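
For reference, a way to attribute the message to one stage without
patching samtools is to give each samtools instance its own stderr file
and read bash's PIPESTATUS array immediately after the pipeline ends.
This is a minimal sketch assuming bash; the function name
run_filter_pipeline and the files reader.err and writer.err are
hypothetical, and the filter command is passed in as an argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run reader | filter | writer, keep each samtools
# instance's stderr in its own file, and print every stage's exit status.
run_filter_pipeline() {
  local in_bam=$1 filter=$2 out_bam=$3
  local -a st
  samtools view -h "$in_bam" 2> reader.err \
    | "$filter" \
    | samtools view -bS - 2> writer.err > "$out_bam"
  st=("${PIPESTATUS[@]}")   # copy at once: the next command resets PIPESTATUS
  echo "reader=${st[0]} filter=${st[1]} writer=${st[2]}"
}

# Usage (./filtering_program stands for the real filter):
#   run_filter_pipeline input.bam ./filtering_program output.bam
```

A nonzero "reader" status together with "truncated file" in reader.err
would point at the input side; the same pattern in writer.err would
point at the output side.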

Thinking that my BAM file might indeed be truncated or otherwise
corrupted, I ran this command:
    samtools view alignments/input.bam | wc -l
That runs to completion and gives a plausible count (about 266M
records), so the BAM file seems to be OK.

Some other tests with smaller files reveal that samtools detects
truncated BAM files when it opens them: bam_header_read() seeks to the
end of the file and checks for a proper end-of-file record. If the
record is missing, it prints a warning and proceeds. My job's output
has no such warning, which is further evidence that the BAM file is good.
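
The same end-of-file check can be repeated by hand. A well-formed BAM
file ends with a fixed 28-byte empty BGZF block, so comparing the last
28 bytes against that marker is a quick test. This is a minimal sketch
assuming bash with tail and od available; the function name
has_bgzf_eof is hypothetical. It only inspects the tail of the file and
says nothing about damage in the middle.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the file ends with the standard
# 28-byte BGZF end-of-file block, fail otherwise.
has_bgzf_eof() {
  local expected tail_hex
  expected="1f8b0804"             # gzip magic, CM=8 (deflate), FLG=4 (FEXTRA)
  expected+="0000000000ff"        # MTIME=0, XFL=0, OS=255
  expected+="0600424302001b00"    # XLEN=6; subfield "BC", SLEN=2, BSIZE=27
  expected+="0300"                # empty deflate block
  expected+="0000000000000000"    # CRC32=0, ISIZE=0
  # -v stops od from collapsing the repeated zero bytes into "*".
  tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
  [ "$tail_hex" = "$expected" ]
}

# Usage:
#   has_bgzf_eof input.bam && echo "EOF marker present" || echo "EOF marker missing"
```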

The filtering program periodically reports how many input records it
has seen. That count varied a lot between the two nine-hour failures:
144M and 194M.

So now I am trying to figure out what could be going wrong that fits
this evidence. The only explanation that comes to mind is some kind of
transient file system error, where the file system fails to deliver
the file's data to the program at some point.
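
One way to probe the transient-error hypothesis directly, without
samtools, is to read the whole file several times and compare
checksums: if the file system occasionally returns wrong data or a
short read, the checksums or the read's exit status should disagree
between passes. This is a minimal sketch assuming bash 4 or later and
coreutils md5sum; the function name and the default of three passes are
illustrative. On a 36 GB file each pass takes a while, and the page
cache may serve later passes from memory, which would make them a
weaker test of the underlying storage.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checksum a file N times and print how many distinct
# checksums were seen. More than one, or a failed read, suggests the file
# system is not returning the same bytes on every pass.
count_distinct_checksums() {
  local file=$1 passes=${2:-3}
  local i sum
  local -A seen=()
  for ((i = 1; i <= passes; i++)); do
    if ! sum=$(md5sum < "$file"); then
      echo "pass $i: read failed" >&2
      return 1
    fi
    seen["$sum"]=1
  done
  echo "${#seen[@]}"
}

# Usage:
#   count_distinct_checksums alignments/input.bam 3   # 1 is expected on healthy storage
```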

Alternatively, is there some system resource that both instances of
samtools view are fighting over, such as a temporary file they are both
trying to write? In other words, is it legitimate to run two instances
of samtools view in one command?

Is there a flaw in my logic above, or in my interpretation of the
evidence?

Thanks for any help,
Bob H


_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
