Hi Devs,

I'd like to open up some discussion about incorporating some code bits into the 
Galaxy distribution.

My code is here:
https://bitbucket.org/cganote/osg-blast-galaxy

First off, I'd like to say that these changes were made initially as hacks to 
get Galaxy working with a grid interface for our nefarious purposes. For us, 
the results have been spiffy: we can offload a bunch of BLAST work from our own 
clusters onto the grid, which processes the jobs quickly on a distributed set 
of computers.

In order to do this, I wanted to take as much control over the process as I 
could. The destination uses Condor, but it uses condor_dag to submit jobs, 
which means I would have had to modify the condor job runner.

The destination also needs the files shipped over to it first, so I had to be 
able to stage them. This made lwr attractive, but then I would need to 
guarantee that the server at the other end was running lwr, and since I don't 
have control of that server, this seemed unlikely to be a good option.

The easiest thing for me to understand was the cli runner. I could do ssh, I 
could do scp, so this seemed the best place to start. I began by figuring out 
which files needed to be sent to the server, and then implemented a way to send 
them. I start with the stdout, stderr and exit code files. I also stage any 
datasets that are in the param_dict, and anything that is in extra_files_path. 
Then the command line is altered so that all the paths make sense on the remote 
server, and so that the right things are run remotely vs. locally (e.g., 
metadata.sh is run locally after the job is done). Right now, this is done by 
splitting the command line on a specific string, which is not robust against 
future changes to the command_factory, but I'm open to suggestions.
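
To make the staging and command-rewriting steps a bit more concrete, here is a 
minimal sketch in Python. It is not the code from the repository: the function 
names (collect_files_to_stage, rewrite_paths, split_command), the "; " marker 
and the remote directory are all made up for illustration, and it assumes the 
dataset paths have already been pulled out of the param_dict as plain strings.

```python
import os
import posixpath


def collect_files_to_stage(job_files, dataset_paths, extra_files_path=None):
    """Return a sorted, de-duplicated list of local files to copy over.

    job_files: e.g. the stdout, stderr and exit code files for the job.
    dataset_paths: paths of datasets found in the tool parameters.
    extra_files_path: optional directory whose contents are staged too.
    """
    files = set(job_files) | set(dataset_paths)
    if extra_files_path and os.path.isdir(extra_files_path):
        for root, _dirs, names in os.walk(extra_files_path):
            for name in names:
                files.add(os.path.join(root, name))
    return sorted(files)


def rewrite_paths(command, local_paths, remote_dir):
    """Replace each local path in the command with a path under remote_dir.

    This is plain string replacement, so it is only as reliable as the
    paths are unique; files with the same basename would collide.
    """
    # Longest paths first, so a path that is a prefix of another one is
    # not rewritten inside the longer path.
    for path in sorted(local_paths, key=len, reverse=True):
        remote = posixpath.join(remote_dir, os.path.basename(path))
        command = command.replace(path, remote)
    return command


def split_command(command, marker):
    """Split a command into (remote_part, local_part) at the first marker.

    The marker is a hypothetical separator; the real string depends on
    what the command_factory emits. If it is absent, everything is
    treated as remote.
    """
    remote_part, sep, local_part = command.partition(marker)
    if not sep:
        return command, ""
    return remote_part.strip(), local_part.strip()
```

For example, split_command("blastn -query in.dat; sh metadata.sh", "; ") gives 
the blastn call as the remote part and "sh metadata.sh" as the local part. A 
less fragile approach would be for the command_factory to hand back the remote 
and local pieces separately, so that no string matching is needed at all.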

So, here's one hack. The hidden data tool parameter is something I hijacked - 
as far as I can tell, hidden data is only used for Cufflinks, so it seemed 
safe. I use it to send the shell script that will be run on the server (but NOT 
sent to the worker nodes). It needed to be a DATA type so that my stager would 
pick it up and send it over. I wanted it to be hidden because it was only used 
by the tool and it should not need to be an HDA. I made changes to allow the 
value of the hidden data to be set in the tool - this would become the 
false_path of the data, which would then become its actual path.
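
As a rough illustration of the false_path idea (again a sketch, not the actual 
change in my repository), the class and function below are hypothetical 
stand-ins: HiddenDataPath is not a Galaxy class, and stage_list only shows how 
a stager could treat an overridden path the same way as any other data path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HiddenDataPath:
    """Hypothetical stand-in for a data path with an optional override."""

    real_path: Optional[str] = None
    false_path: Optional[str] = None

    def effective_path(self) -> Optional[str]:
        # The override wins whenever it is set; otherwise fall back to
        # the real path (which may itself be unset).
        if self.false_path is not None:
            return self.false_path
        return self.real_path


def stage_list(params):
    """Return the effective path of every path-bearing parameter."""
    return [
        value.effective_path()
        for value in params.values()
        if isinstance(value, HiddenDataPath) and value.effective_path()
    ]
```

With a tool that sets the hidden value to a script path, for instance 
HiddenDataPath(false_path="/tools/osg/run_blast.sh"), the script shows up in 
stage_list alongside the ordinary datasets, while plain parameters such as an 
e-value string are ignored.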

Please have a look, and ask questions, and if there are improvements needed 
before anything is considered for pulling, let me know. I'd like to present 
this at the Galaxy conference without having vegetables thrown at me. Thanks!

-Carrie Ganote
National Center for Genome Analysis Support
Indiana University