Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-23 Thread Peter Cock
On Wed, Feb 22, 2012 at 7:07 PM, wrote:
> Awesome, I'll take a look. And, if you're able to pull it together easily
> enough, clean branches are always nice.
>
> -Dannon

It is all on one new branch, but this covers FASTA splitting (ready), splitting in the BLAST+ wrapper (ready bar merging data

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-22 Thread dannonbaker
Awesome, I'll take a look. And, if you're able to pull it together easily enough, clean branches are always nice.

-Dannon

On Feb 22, 2012, at 10:59 AM, Peter Cock wrote:
> Basic BLAST XML merging implemented and apparently working:
> https://bitbucket.org/peterjc/galaxy-central/changeset/ebf65c0b1e26

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-22 Thread Peter Cock
On Thu, Feb 16, 2012 at 9:02 PM, Peter wrote:
> On Thu, Feb 16, 2012 at 6:42 PM, Chris wrote:
>> On Feb 16, 2012, at 12:24 PM, Peter wrote:
>>> I also need to look at merging multiple BLAST XML outputs,
>>> but this is looking promising.
>>
>> Yep, that's definitely one where a simple concatenation
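Concatenating BLAST XML files yields several XML declarations and several <BlastOutput> root elements, which is not one valid document. A minimal Python sketch of a string-level merge, assuming each chunk's output has the usual layout (a header ending in <BlastOutput_iterations>, a run of <Iteration> blocks, then a footer starting at </BlastOutput_iterations>). merge_blast_xml is a hypothetical helper, not the code from Peter's branch.

```python
def merge_blast_xml(documents):
    """Merge BLAST XML outputs produced from consecutive query chunks.

    Keeps the header (XML declaration, DOCTYPE, program and database details)
    and the footer from the first document, and concatenates the <Iteration>
    blocks of every document in order.
    """
    open_tag = "<BlastOutput_iterations>"
    close_tag = "</BlastOutput_iterations>"
    header = footer = None
    iterations = []
    for text in documents:
        # str.index raises ValueError if a chunk lacks the expected layout.
        start = text.index(open_tag) + len(open_tag)
        end = text.index(close_tag, start)
        if header is None:
            header, footer = text[:start], text[end:]
        iterations.append(text[start:end])
    if header is None:
        raise ValueError("no BLAST XML documents to merge")
    return header + "".join(iterations) + footer
```

This sketch does not renumber Iteration_iter-num, and header fields such as BlastOutput_query-def still describe only the first chunk's first query, so a fuller merge would need to patch those as well.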

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-20 Thread Dannon Baker
Peter has it right in that we need to do this internally to ensure functionality across a range of job runners. A side benefit is that it gives us direct access to the tasks so that we can eventually do interesting things with scheduling, resubmission, feedback, etc. If the overhead looks to b

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-20 Thread Peter Cock
On Mon, Feb 20, 2012 at 8:08 AM, Bram Slabbinck wrote:
> Hi Dannon,
>
> If I may further elaborate on this issue, I would like to mention that this
> kind of functionality is also supported by the Sun Grid Engine in the form
> of 'array jobs'. With this functionality you can execute a job multiple

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-20 Thread Bram Slabbinck
Hi Dannon,

If I may further elaborate on this issue, I would like to mention that this kind of functionality is also supported by the Sun Grid Engine in the form of 'array jobs'. With this functionality you can execute a job multiple times in an independent way, only differing for instance in
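With an SGE array job (submitted with qsub -t 1-N), each task sees its own 1-based index in the SGE_TASK_ID environment variable, so one script can serve every chunk. A minimal Python sketch, assuming the input has already been split into a list of chunk files; chunk_for_task and the partN.fasta names are hypothetical.

```python
import os


def chunk_for_task(chunk_paths, environ=os.environ):
    """Return the chunk file this SGE array task should process.

    SGE numbers array tasks from 1 (qsub -t 1-N) and exposes the index
    in SGE_TASK_ID; chunk_paths is a hypothetical, pre-split list of files.
    """
    raw = environ.get("SGE_TASK_ID", "")
    if not raw.isdigit():
        # Not an array task (variable missing or non-numeric).
        raise RuntimeError("not running as an SGE array task (SGE_TASK_ID=%r)" % raw)
    index = int(raw)
    if not 1 <= index <= len(chunk_paths):
        raise IndexError("task %d has no chunk (only %d chunks)" % (index, len(chunk_paths)))
    # Convert SGE's 1-based task index to a 0-based list index.
    return chunk_paths[index - 1]
```

Passing environ explicitly makes the function easy to exercise outside the cluster; inside a task it simply reads os.environ.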

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-17 Thread Peter Cock
On Thu, Feb 16, 2012 at 9:02 PM, Peter wrote:
> On Thu, Feb 16, 2012 at 6:42 PM, Chris wrote:
>> Cool! Seems like a perfectly fine start. I guess you could
>> grab the # of sequences from the dataset somehow (I'm
>> guessing that is set somehow upon import into Galaxy).
>
> Yes, I should be able
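Once the number of sequences is known, balanced part boundaries follow from integer division. A minimal sketch, assuming the sequence count is available up front (for example from dataset metadata) and the goal is a fixed number of parts; part_bounds is a hypothetical helper.

```python
def part_bounds(total, n_parts):
    """Return (start, stop) record indices splitting `total` records into
    at most `n_parts` contiguous parts whose sizes differ by at most one."""
    if total < 0 or n_parts < 1:
        raise ValueError("need total >= 0 and n_parts >= 1")
    # Never create empty parts when there are fewer records than requested parts;
    # an empty input still yields a single (empty) part.
    n_parts = max(1, min(n_parts, total))
    base, extra = divmod(total, n_parts)
    bounds, start = [], 0
    for i in range(n_parts):
        # The first `extra` parts take one additional record each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds
```

For example, part_bounds(10, 3) gives [(0, 4), (4, 7), (7, 10)].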

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Peter Cock
On Thu, Feb 16, 2012 at 6:42 PM, Fields, Christopher J wrote:
> On Feb 16, 2012, at 12:24 PM, Peter Cock wrote:
>> I've checked in my FASTA splitting, which now seems to be
>> working OK with my BLAST tests.

(If this was unclear, I mean checked into my branch - I don't have commit privileges to t

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Fields, Christopher J
On Feb 16, 2012, at 12:24 PM, Peter Cock wrote:
> On Thu, Feb 16, 2012 at 4:28 PM, Peter Cock wrote:
>> Hi Dan,
>>
>> I think I need a little more advice - what is the role of the script
>> scripts/extract_dataset_part.py and the JSON files created
>> when splitting FASTQ files in lib/galaxy/dat

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Dannon Baker
Very cool, I'll check it out! The addition of the JSON files is indeed very new and was likely unfinished with respect to the base splitter.

-Dannon

On Feb 16, 2012, at 1:24 PM, Peter Cock wrote:
> On Thu, Feb 16, 2012 at 4:28 PM, Peter Cock wrote:
>> Hi Dan,
>>
>> I think I need a little mo

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Peter Cock
On Thu, Feb 16, 2012 at 4:28 PM, Peter Cock wrote:
> Hi Dan,
>
> I think I need a little more advice - what is the role of the script
> scripts/extract_dataset_part.py and the JSON files created
> when splitting FASTQ files in lib/galaxy/datatypes/sequence.py,
> and then used by the class' process

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Peter Cock
Hi Dan,

I think I need a little more advice - what is the role of the script scripts/extract_dataset_part.py and the JSON files created when splitting FASTQ files in lib/galaxy/datatypes/sequence.py, and then used by the class' process_split_file method? Why is there no JSON file created by the b

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Peter Cock
On Thu, Feb 16, 2012 at 1:53 PM, Fields, Christopher J wrote:
>
> Makes sense from my perspective; splits have to be defined based on
> data type. It could be as low-level as defining a simple iterator per
> record, then a wrapper that allows a specific chunk-size. The split
> file creation coul
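The "iterator per record, then a wrapper that allows a specific chunk-size" idea can be sketched with itertools.islice. This assumes each datatype supplies an iterator over its records; chunked is a hypothetical helper, not part of Galaxy.

```python
from itertools import islice


def chunked(records, chunk_size):
    """Yield lists of up to chunk_size consecutive items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(records)
    while True:
        # islice consumes at most chunk_size items from the shared iterator.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

The wrapper is datatype-agnostic: only the record iterator needs to know the file format, and each yielded chunk could then be written out as one split file.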

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Fields, Christopher J
On Feb 16, 2012, at 4:47 AM, Peter Cock wrote:
> On Wed, Feb 15, 2012 at 6:07 PM, Dannon Baker wrote:
>> Good luck, let me know how it goes, and again - contributions are certainly
>> welcome :)
>
> I think I found the first bug, method split in
> lib/galaxy/datatypes/sequence.py
> for class Se

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Peter Cock
On Thu, Feb 16, 2012 at 10:47 AM, Peter Cock wrote:
> On Wed, Feb 15, 2012 at 6:07 PM, Dannon Baker wrote:
>> Good luck, let me know how it goes, and again - contributions are certainly
>> welcome :)
>
> I think I found the first bug, method split in
> lib/galaxy/datatypes/sequence.py
> for clas

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Peter Cock
On Wed, Feb 15, 2012 at 6:07 PM, Dannon Baker wrote:
> Good luck, let me know how it goes, and again - contributions are certainly
> welcome :)

I think I found the first bug, method split in lib/galaxy/datatypes/sequence.py for class Sequence assumes four lines per sequence. This would make sense
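Grouping every four lines only works for unwrapped FASTQ; a FASTA record is one '>' header line followed by any number of sequence lines, so a four-line split of ">a / ACGT / AC / >b" would cut record b away from its sequence. A minimal sketch that splits on header lines instead, assuming plain-text FASTA input; fasta_records and split_fasta_text are hypothetical helpers, not the Galaxy method.

```python
def fasta_records(lines):
    """Yield (header, sequence_lines) for each FASTA record.

    A record is a '>' header plus however many sequence lines follow it,
    so line-wrapped sequences are kept together.
    """
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif not line.strip():
            continue  # ignore blank lines
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            seq.append(line)
    if header is not None:
        yield header, seq


def split_fasta_text(text, records_per_part):
    """Split FASTA text into parts of at most records_per_part records each."""
    if records_per_part < 1:
        raise ValueError("records_per_part must be at least 1")
    parts, current = [], []
    for header, seq in fasta_records(text.splitlines(True)):
        current.append(header + "\n" + "".join(s + "\n" for s in seq))
        if len(current) == records_per_part:
            parts.append("".join(current))
            current = []
    if current:
        parts.append("".join(current))
    return parts
```

The sequence lines keep their original wrapping; only record boundaries decide where a part ends.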

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Dannon Baker
On Feb 16, 2012, at 5:15 AM, Peter Cock wrote:
> On Wed, Feb 15, 2012 at 6:07 PM, Dannon Baker wrote:
>>
>> Main still runs these jobs in the standard non-split fashion, and as a
>> resource that is occasionally saturated (and thus doesn't necessarily have
>> extra resources to parallelize to)

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-16 Thread Peter Cock
On Wed, Feb 15, 2012 at 6:07 PM, Dannon Baker wrote:
>
> Main still runs these jobs in the standard non-split fashion, and as a
> resource that is occasionally saturated (and thus doesn't necessarily have
> extra resources to parallelize to) will probably continue doing so as long
> as there's sig

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-15 Thread Dannon Baker
> Are those four tools being used on Galaxy Main already with this basic parallelism in place?

Main still runs these jobs in the standard non-split fashion, and as a resource that is occasionally saturated (and thus doesn't necessarily have extra resources to parallelize to) will probably continue do

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-15 Thread Peter Cock
On Wed, Feb 15, 2012 at 5:08 PM, Dannon Baker wrote:
> It's definitely an experimental feature at this point, and there's no wiki,
> but basic support for breaking jobs into tasks does exist. It needs a lot
> more work and can go in a few different directions to make it better,

Not what I was ho

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-15 Thread Fields, Christopher J
Ah, was just about to ask about this as well, nice to know something is already in place (as experimental as it might be). Thanks Dannon!

chris

On Feb 15, 2012, at 11:08 AM, Dannon Baker wrote:
> It's definitely an experimental feature at this point, and there's no wiki,
> but basic support fo

Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

2012-02-15 Thread Dannon Baker
It's definitely an experimental feature at this point, and there's no wiki, but basic support for breaking jobs into tasks does exist. It needs a lot more work and can go in a few different directions to make it better, but check out the wrappers with defined, and enable use_tasked_jobs in you