Hi Bob,
To me it sounds like you're running out of memory or temporary disk
space, as you're trying to pipe >>36GB of data, twice. How much memory
and/or temporary disk space does your machine have?

I would try each step on the whole data without pipes and see if it
completes successfully, e.g.

1. samtools view -h input.bam > tmp.sam
2. filtering_progam tmp.sam > filtered.sam
3. samtools view -bS filtered.sam > output.bam

If the above works, then you're hitting some limit with your pipes. If
it doesn't, at least you'll know for sure which step it fails on rather
than guessing.

For the very reason that it's almost impossible to debug problems, I've
stopped using pipes on SAM/BAM files.

Cheers,
Chris

On 30/06/2014 17:16, "Bob Harris" <rshar...@bx.psu.edu> wrote:

>Howdy,
>
>I am experiencing a strange problem with samtools view, regarding file
>truncation, and I'm hoping someone here can verify my conclusion that
>my problem is an error in my file system. I am using samtools-0.1.19
>on some variety of x86_64 linux.
>
>What I'm trying to do is take a 36G BAM file, do some filtering, and
>output a new BAM file. The filtering program operates on SAM, so I
>have a short pipeline that looks like this:
>  samtools view -h input.bam | filtering_progam | samtools view -bS - > output.bam
>
>What happens, after about nine hours, is it halts with this report:
>  [main_samview] truncated file
>
>Unfortunately, since I am running samtools view twice in my command,
>this error message does little to inform me about what's happening.
>So I modified the source to report more detail, ran for another 9
>hours, and after it failed the second time I knew that it was trying
>to read BAM input when it failed (and I also know it failed while
>reading a 4-byte BAM record header). Which (I believe) rules out the
>filtering program, and lack of disk space.
>
>Thinking that perhaps my bam file was indeed truncated or otherwise
>corrupted, I ran this command:
>  samtools view alignments/input.bam | wc -l
>That runs to completion, and gives a plausible count (about 266M
>records). So it seems like the bam file is OK.
>
>Some other tests with smaller files reveal that samtools detects
>truncated BAM files when it opens them. In bam_header_read() it seeks
>to the end of the file and validates there is a proper end of file
>record. If not, it writes a warning (and proceeds). My job's output
>doesn't have this warning, further evidence of a good bam file.
>
>The filtering program periodically reports how many input records it
>has seen. The number of records varied a lot between the two 9 hour
>failures-- 144M and 194M.
>
>So now I am trying to figure out what could be going wrong that would
>fit this evidence. The only thing that comes to mind is some kind of
>transient file system error, where it is unable to provide the file
>data to the program at some point in time.
>
>--OR-- is there some system resource that both instances of samtools
>view are fighting over? Like some temporary file they're both trying
>to write? In other words, is it legit to run two instances of samtools
>view in one command?
>
>Is there some flaw in my logic above, in my interpretation of the
>evidence?
>
>Thanks for any help,
>Bob H
>
>_______________________________________________
>Samtools-help mailing list
>Samtools-help@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/samtools-help

The University of Dundee is a registered Scottish Charity, No: SC015096
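
P.S. A minimal sketch of the step-by-step approach suggested in the reply, in case it helps to have each step report its own exit status instead of guessing which one failed. The helper names run_step and split_pipeline are hypothetical (they are not part of samtools); the commands and file names are the ones from the thread, and filtering_progam stands for Bob's own filter. It assumes there is enough disk space for the intermediate SAM files.

```shell
#!/bin/sh
# run_step NAME COMMAND [ARGS...]
# Runs COMMAND and reports on stderr whether it succeeded, so that
# redirecting stdout to a file captures only the command's own output.
run_step() {
    step_name=$1
    shift
    if "$@"; then
        echo "step '$step_name' finished OK" >&2
    else
        step_rc=$?   # exit status of the command that just failed
        echo "step '$step_name' FAILED with exit status $step_rc" >&2
        return "$step_rc"
    fi
}

# The three steps from the reply, run one at a time with no pipes.
# The && chain stops at the first step that fails.
split_pipeline() {
    run_step "BAM to SAM" samtools view -h input.bam > tmp.sam &&
    run_step "filter"     filtering_progam tmp.sam > filtered.sam &&
    run_step "SAM to BAM" samtools view -bS filtered.sam > output.bam
}
```

Calling split_pipeline from the directory that holds input.bam runs the steps in order. The first step that fails prints its name and exit status, and the intermediate files written so far stay on disk for inspection.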