Hi Gordon - It looks like on the samtools mailing lists there is an active discussion on speeding up sorting:

http://sourceforge.net/mailarchive/message.php?msg_id=27247076
http://sourceforge.net/mailarchive/message.php?msg_id=26990598

Interestingly enough, there is a recommendation to use Picard instead of samtools. Are there Galaxy tool scripts for Picard? This might be useful.

That still doesn't negate the fact that SAM files are being created and need to be converted to BAM files. Right now, I think I would rather sacrifice a little time to a single-threaded sort than lose disk space to unnecessary SAM files.

Ryan

On 4/5/11 1:18 PM, Assaf Gordon wrote:
Hello Ryan,

I'm in the exact same situation with my bowtie/tophat tools,
going back and forth between outputting a SAM, sorted SAM, BAM or sorted BAM,
and I'm still not sure what the best method is.

Storage-wise - you're correct, just saving the sorted BAM is the best (even
more so given that processing SAM files as text is so horrendous that I
think almost no tool uses them directly, always requiring intervals or sorted
BAM).

But one annoyance (for me) is that samtools (the program) is very inefficient
- using only a single thread (and the sort part isn't doing a great job at
that).

So if I give the "mapping" tool as a whole 20 threads or more, and a part of
the running time (the samtools sort part) is only using a single thread - I'm wasting the
other threads, as they sit idle waiting for the sort to finish.

I also tried sorting the SAM file directly, using GNU sort (version 8.10 can use multiple 
threads, and the memory management actually works, as opposed to "samtools sort 
-m") - but I'm not sure it's worth the effort.
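
To make the "sort the SAM text directly" idea concrete, here is a minimal sketch in Python. It is not the GNU sort approach itself: it is a single-threaded, in-memory stand-in that only illustrates the two steps involved, keeping the `@` header lines on top and ordering the alignment lines by reference (in `@SQ` header order) and then by position. The function name `sort_sam_lines` is made up for illustration.

```python
def sort_sam_lines(lines):
    """Return SAM lines with the header first and records in coordinate order.

    Illustrative only: everything is held in memory, and the SO: tag of the
    @HD line is not updated.
    """
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if line.strip() and not line.startswith("@")]

    # Reference order follows the order of the @SQ lines in the header.
    ref_order = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_order[field[3:]] = len(ref_order)

    def coordinate_key(line):
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])  # SAM columns 3 (RNAME) and 4 (POS)
        # References missing from the header (including "*" for unmapped reads) sort last.
        return (ref_order.get(rname, len(ref_order)), pos)

    return header + sorted(records, key=coordinate_key)
```

With real data the records would not fit in memory, which is exactly where an external sort with working memory limits earns its keep.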

I didn't find an optimal solution that I like, and I'm interested to hear what 
others think.

-gordon

Ryan Golhar wrote, On 04/05/2011 01:08 PM:
Hi all - I find it redundant to hold on to SAM output from NGS
mapping tools when I end up converting the SAM files to BAM
files anyway. The cleanup scripts require the history items to be
deleted, but I don't want to delete them yet as I want the entire
workflow to be kept until we are done analyzing our data.

So, I was thinking of a way to remove the intermediate SAM files and
thought how I would do this on the command line...simply pipe the
output of BWA to samtools to create a BAM file and never have a SAM
file to deal with.

The BWA tool runner can be modified to pipe BWA output directly to
samtools so a SAM file is never physically stored on disk.  Has anyone
done this?  Does this seem like a good idea?
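
For what it's worth, a rough sketch of what such a pipe could look like from a Python tool wrapper is below. This is only an assumption about how one might do it, not how the Galaxy BWA wrapper actually works; the helper name `pipe_to_file` and the file names in the usage comment are hypothetical.

```python
import subprocess


def pipe_to_file(producer_cmd, consumer_cmd, out_path):
    """Run `producer_cmd | consumer_cmd > out_path` with no intermediate file."""
    with open(out_path, "wb") as out:
        producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout, stdout=out)
        # Close our copy of the pipe so the producer sees a broken pipe
        # if the consumer exits early.
        producer.stdout.close()
        consumer_rc = consumer.wait()
        producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            "pipeline failed (producer exit %d, consumer exit %d)"
            % (producer_rc, consumer_rc)
        )
    return out_path


# Hypothetical usage; the reference and read file names are placeholders:
# pipe_to_file(
#     ["bwa", "samse", "ref.fa", "reads.sai", "reads.fq"],
#     ["samtools", "view", "-bS", "-"],
#     "reads.bam",
# )
```

Checking both exit codes matters here, since a failed mapper would otherwise leave a truncated but valid-looking BAM file behind.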

Ryan


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

   http://lists.bx.psu.edu/



