On 2014-07-01 16:02, Christian Cole wrote:
> To me it sounds like you're running out of memory or temporary disk space as
> you're trying to pipe >>36GB of data, twice. How much memory and/or
> temporary disk space does your machine have?

Pipes don't work like that. No temporary files are created when you use
a pipe, and only very little RAM is used. The operating system allocates
only a small buffer of some kilobytes (use ulimit -p to show its size in
units of 512 bytes). When the buffer is full and the program on the
left-hand side of the pipe tries to write into it, that program blocks.
When the program on the right-hand side tries to read from an empty
buffer, it also blocks. For the two samtools view processes, memory
usage is negligible.

> I would try each step on the whole data without pipes and see if it
> completes successfully e.g.
> 1. samtools view -h input.bam > tmp.sam
> 2. filtering_progam tmp.sam > filtered.sam
> 3. samtools view -bS filtered.sam > output.bam
>
> If the above works, then you're hitting some limit with your pipes. If it
> doesn't at least you'll know for sure which step it fails on rather than
> guessing.

Now this does create temporary files, and since they are in SAM format,
they will be huge. By the time the next program reads them in, they will
have dropped out of the disk cache and there'll be lots of I/O. Bob's
problem may be due to a disk space issue, but the above steps will mask
it.

> For the very reason that it's almost impossible to debug problems, I've
> stopped using pipes on SAM/BAM files.

Things that help when debugging:
- Use 'set -o pipefail' when you are using bash
- Send stderr to a log file for each command individually:
    samtools view -h in.bam 2> err1.log | filter_prog 2> err2.log | ...

Piping is extremely powerful: It avoids creating huge temporary files
and even gives you some automatic parallelization. Use it! :)

Bob, I have no good idea regarding the original problem, but I have a
few comments.

If you want to create a truncated BAM file, you can do it like this:

    dd if=complete.bam of=truncated.bam bs=1M count=10

Then this is samtools view's error message:

    $ samtools view truncated.bam > /dev/null
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [main_samview] truncated file.

The first line is printed immediately, the second just before the
command exits.

Interestingly, at least in my samtools version (0.1.19-96b5f2294a,
Ubuntu 14.04), the EOF marker check is also done (and fails) when you
pipe in a BAM file from stdin:

    $ samtools view -b merged.bam | samtools view - > /dev/null
    [bam_header_read] EOF marker is absent. The input is probably truncated.

On 2014-06-30 18:16, Bob Harris wrote:
> Thinking that perhaps my bam file was indeed truncated or otherwise
> corrupted, I ran this command:
>   samtools view alignments/input.bam | wc -l
> That runs to completion, and gives a plausible count (about 266M
> records). So it seems like the bam file is OK.

You can use 'samtools view -c alignments/input.bam' instead. That uses
50% less CPU because the SAM output doesn't need to be formatted. "-c"
is for "report count only".

> is there some system resource that both instances of samtools
> view are fighting over? Like some temporary file they're both trying
> to write? In other words, is it legit to run two instances of samtools
> view in one command?

It is legit and I believe it is even the recommended way. samtools view
doesn't create temporary files. AFAIK, only "samtools sort" creates
temporary files if it cannot sort in RAM.

Regards,
Marcel

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
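
For anyone who wants to try the two debugging tips from the message above
('set -o pipefail' and one stderr log per command) without a BAM file at
hand, here is a minimal bash sketch. The functions failing_producer and
pass_through are hypothetical stand-ins for 'samtools view -h in.bam' and
the filter program; they are not part of samtools, and the error text is
made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch only. 'failing_producer' and 'pass_through' are hypothetical
# stand-ins for 'samtools view -h in.bam' and the filter program, so the
# example runs without samtools or a BAM file.

failing_producer() {
    echo "record1"
    echo "record2"
    echo "failing_producer: input looks truncated" >&2
    return 1    # simulate a failure after some output was written
}

pass_through() {
    cat         # copies stdin to stdout unchanged
}

workdir=$(mktemp -d)

# Without pipefail, the pipeline's exit status is that of the last
# command only, so the failure on the left-hand side goes unnoticed.
set +o pipefail
failing_producer 2> "$workdir/err1.log" | pass_through 2> "$workdir/err2.log" > /dev/null
status_without=$?

# With pipefail, the pipeline reports the rightmost non-zero exit status.
set -o pipefail
status_with=0
failing_producer 2> "$workdir/err1.log" | pass_through 2> "$workdir/err2.log" > /dev/null \
    || status_with=$?

# Each command wrote to its own log, so it is clear which one complained.
producer_log=$(cat "$workdir/err1.log")
filter_log=$(cat "$workdir/err2.log")
rm -rf "$workdir"

echo "exit status without pipefail: $status_without"
echo "exit status with pipefail:    $status_with"
echo "err1.log: $producer_log"
echo "err2.log: ${filter_log:-<empty>}"
```

In a real pipeline the same pattern applies with the actual commands in
place of the stand-ins; checking the exit status right after the pipeline
is what turns a silently truncated output into a visible failure.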