Re: [galaxy-dev] bam split and gatk calling

2015-05-06 Thread Roberto Alonso CIPF
Indeed, I have rewritten the code following Peter's suggestions, and I was thinking of updating the PR with this code.

On 6 May 2015 at 17:45, Joshua Udall wrote:
> Use subBam from the BamBam package. Written in C.
>
> subBam -g targets.bed sorted.bam -o sorted.subset.bam -m 0
>
> http://sourceforge.ne

Re: [galaxy-dev] bam split and gatk calling

2015-05-06 Thread Joshua Udall
Use subBam from the BamBam package. Written in C.

subBam -g targets.bed sorted.bam -o sorted.subset.bam -m 0

http://sourceforge.net/projects/bambam/

On Wed, May 6, 2015 at 4:23 AM, Peter Cock wrote:
> Hi Roberto,
>
> Given the way BAM indexing works, I see no reason to actually
> split the BAM

Re: [galaxy-dev] bam split and gatk calling

2015-05-06 Thread Roberto Alonso CIPF
I agree, I prefer your solution and will focus on it, thanks!

That said, some software that is fairly widely used in the community, such as Delly https://github.com/tobiasrausch/delly and Breakdancer http://gmt.genome.wustl.edu/packages/breakdancer/documentation.html, doesn't use BED files

Re: [galaxy-dev] bam split and gatk calling

2015-05-06 Thread Peter Cock
On Wed, May 6, 2015 at 11:33 AM, Roberto Alonso CIPF wrote:
> Hello,
>
> I agree, what you say fits perfectly for GATK, but as I wanted to create a
> more generic code I did it this way (also because I am a newbie in the
> galaxy code and I didn't know so well how to implement this ). What about a

Re: [galaxy-dev] bam split and gatk calling

2015-05-06 Thread Roberto Alonso CIPF
Hello,

I agree, what you say fits GATK perfectly, but since I wanted to write more generic code I did it this way (also because I am new to the Galaxy code and didn't know well how to implement this). What about a tool that doesn't accept a region, just a BAM? Maybe we can put anoth

Re: [galaxy-dev] bam split and gatk calling

2015-05-06 Thread Peter Cock
Hi Roberto,

Given the way BAM indexing works, I see no reason to actually split the BAM file at all - it seems like wasted disk IO.

Instead, can you split a BED file into sub-regions? This way each child GATK job would look at the full BAM file but only for a small region described in the split B
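
As a rough illustration of the approach Peter describes (splitting the BED file rather than the BAM), here is a minimal Python sketch. The function name `split_bed` and the greedy size-balancing strategy are assumptions made for illustration; this is not the code from the pull request, nor Galaxy's or GATK's own implementation.

```python
def split_bed(lines, n_chunks):
    """Distribute BED intervals over n_chunks lists, balancing total bases.

    Hypothetical helper for illustration only: `lines` is an iterable of
    BED text lines, and only the first three columns are used.
    """
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))

    chunks = [[] for _ in range(n_chunks)]
    sizes = [0] * n_chunks
    # Greedy balancing: place the largest remaining interval into the
    # chunk that currently covers the fewest bases.
    for chrom, start, end in sorted(
        intervals, key=lambda iv: iv[2] - iv[1], reverse=True
    ):
        i = sizes.index(min(sizes))
        chunks[i].append((chrom, start, end))
        sizes[i] += end - start

    # Sort each chunk by chromosome name and start for readable output.
    return [sorted(chunk) for chunk in chunks]


if __name__ == "__main__":
    bed = ["chr1\t0\t100", "chr1\t200\t250", "chr2\t0\t50"]
    for n, chunk in enumerate(split_bed(bed, 2)):
        for chrom, start, end in chunk:
            print(f"chunk{n}\t{chrom}\t{start}\t{end}")
```

Each chunk could then be written to its own BED file and handed to one child job together with the full, indexed BAM, for example through GATK's `-L` intervals argument, so that no BAM data is copied.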

[galaxy-dev] bam split and gatk calling

2015-05-06 Thread Roberto Alonso CIPF
Hello,

I have been working on the Galaxy parallelization module and would like to ask some questions about how to approach one problem. I have opened a pull request about splitting BAMs: https://github.com/galaxyproject/galaxy/pull/184

Regarding this, I think it is useful but it cou