Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?
I don't think implementation will be very difficult. The bigger question is whether this is a technology people are open to. The nearest competitor is YARN (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html). Mesos seems a bit more geared toward general-purpose usage (with several existing frameworks), while YARN seems more specific to Hadoop. But I'd be glad to hear some other thoughts.

Kyle

On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri madd...@mcs.anl.gov wrote:

Kyle, this is something I am very interested in. The three parts below make sense to me. I would be very happy to discuss further and provide any help to move this forward.

Regards

On Oct 26, 2013, at 2:43 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

I think one of the aspects where Galaxy is a bit soft is the ability to do distributed tasks. The current system of split/replicate/merge tasks based on file type is a bit limited and hard for tool developers to expand upon. Distributed computing is a non-trivial thing to implement, and I think it would be a better use of our time to use an already existing framework. It would also mean one less API for tool writers to have to develop for.

I was wondering if anybody has looked at Mesos (http://mesos.apache.org/). You can see an overview of the Mesos architecture at https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md. The important thing about Mesos is that it provides an API for C/C++, Java/Scala and Python to write distributed frameworks.
There are already implementations of frameworks for common parallel programming systems, such as:
- Hadoop (https://github.com/mesos/hadoop)
- MPI (https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md)
- Spark (http://spark-project.org)

And you can find an example Python framework at https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:

1) Add a system config variable to Galaxy called 'MESOS_URL' that is then passed to tool wrappers and allows them to contact the local Mesos infrastructure (assuming the system has been configured), or pass a null if the system isn't available.

2) Write a tool runner that works as a Mesos framework to execute single-CPU jobs on the distributed system.

3) For instances where Mesos is not available at a system-wide level (say the user only has access to an SGE-based cluster), but the user wants to run distributed jobs, write a wrapper that can create a Mesos cluster using the existing queueing system. For example, right now I run a Mesos system under the SGE queue system.

I'm curious to see what other people think.

Kyle

___
Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

--
Ravi K Madduri
MCS, Argonne National Laboratory
Computation Institute, University of Chicago
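Part (1) of the proposal amounts to a simple presence check in each tool wrapper. A minimal shell sketch of that idea (MESOS_URL is the variable name from the proposal; choose_runner is a hypothetical helper invented here for illustration, not anything in Galaxy):

```shell
# Illustrative sketch only: how a tool wrapper might act on the proposed
# MESOS_URL config variable. choose_runner is a hypothetical helper name.
choose_runner() {
    # $1: the MESOS_URL value Galaxy passed down (may be empty/null)
    if [ -n "$1" ]; then
        echo "mesos:$1"      # contact the local Mesos infrastructure
    else
        echo "local"         # Mesos not configured; fall back to local execution
    fi
}

choose_runner "mesos://master.example:5050"   # prints "mesos:mesos://master.example:5050"
choose_runner ""                              # prints "local"
```

The point is only that the wrapper degrades gracefully: if the admin never configured Mesos, the tool still runs in its usual single-node mode.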
Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?
You probably are a good person to get an opinion from. My plan isn't to write new frameworks, but rather to use existing libraries that can communicate with Mesos to set up their parallel environments. But for Swift, you would probably want to write a new framework. Just looking at Swift, I imagine one of the harder parts is getting the system set up on a cluster (i.e. distributing files out to remote nodes, making sure that you have a way to start processes on those nodes, and having them know where to find the master), so it seems like Swift could benefit from having a Mesos-based framework. Do you think it would enable you to have a 'zero-config' startup of a distributed Swift application?

Kyle

On Mon, Oct 28, 2013 at 1:51 PM, Ketan Maheshwari ketancmaheshw...@gmail.com wrote:

Hi Kyle,

We have a similar ongoing development wherein we are working on integrating our Swift framework (swift-lang.org) with Galaxy. The goal is to enable Galaxy-based applications to run on a variety of distributed resources via various integration schemes, as suitable to the application and underlying execution environment. Here is an abstract of a paper (co-authored with Ravi, who responded on this thread) that we will be presenting in a workshop at the upcoming SC13 conference:

The Galaxy platform is a web-based science portal for scientific computing supporting the life-sciences user community. While user-friendly and intuitive for doing small to medium scale computations, it currently has limited support for large-scale, parallel and distributed computing. The Swift parallel scripting framework is capable of composing ordinary applications into parallel scripts that can be run on multi-scale distributed and high-performance computing platforms. In complex distributed environments, the user end of the application lifecycle often slows down because of the technical complexities brought in by the scale, access methods and resource management nuances.
Galaxy offers a simple way of designing, composing, executing, reusing, and reproducing application runs. An integration between the Swift and Galaxy systems can accelerate science as well as bring the respective user communities together in an interactive, user-friendly, parallel and distributed data analysis environment enabled on a broad range of computational infrastructures.

Kindly let us know if you would like a hands-on introduction to the various tools we have already developed.

Best,
Ketan
Re: [galaxy-dev] Why are Custom Builds disabled when
Why are Custom Builds only enabled if you're *not* using remote_user? Is this because they're related to user preferences (as per the code in lib/galaxy/webapps/galaxy/controllers/user.py)?

This may be an artifact of a time when user preferences were immutable. Do Custom Builds and Preferences work on your setup using remote users?

And then a followup: if I have to add a build, using either a FASTA file or a len file and editing the Galaxy install on the server, where do I need to make changes?

Unfortunately, it'll go in different places depending on your use. What are you looking to do with the build?

Thanks,
J.
[galaxy-dev] What to do with toolshed tools that get stuck Installing
Hi All,

The short version of my problem is that I often find myself in a situation where my tool gets stuck in the Installing phase, and I don't know how to get feedback on it to see where it is failing. What I guess I'm after is just the stdout of the tool installation, but since the tool install never completes I can never see this. What I usually do is try to recreate the environment that Galaxy sees during tool install in a normal shell and then run it manually, which sometimes tells me the problem, but not always (see below).

The longer version of my problem is below. In my case I am struggling with the following tool:

http://testtoolshed.g2.bx.psu.edu/view/iracooke/package_protk_1_2_5

which uses the new setup_ruby_environment tag, so the tool should do the following:

(1) Install the dependency, Ruby 2.0, which it does successfully.
(2) Install my protk rubygem from rubygems.org, which it gets stuck doing.

The last few lines from my Galaxy server log during the failed install are:

10.0.2.2 - - [28/Oct/2013:21:54:26 +] POST /admin_toolshed/repository_installation_status_updates HTTP/1.1 200 - http://localhost:8088/admin_toolshed/prepare_for_install Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
[localhost] local: touch /home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/env.sh
[localhost] local: echo 'RUBYLIB=/home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/lib/:$RUBYLIB; export RUBYLIB' >> /home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/env.sh
10.0.2.2 - - [28/Oct/2013:21:54:30 +] POST /admin_toolshed/repository_installation_status_updates HTTP/1.1 200 - http://localhost:8088/admin_toolshed/prepare_for_install Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
[localhost] local: echo 'PATH=/home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/bin:$PATH; export PATH' >> /home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/env.sh
[localhost] local: echo 'GALAXY_RUBY_HOME=/home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/bin; export GALAXY_RUBY_HOME' >> /home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/env.sh
[localhost] local: rm -rf ./database/tmp/tmp-toolshed-mtdQaLI7O
ruby version 2.0 installed in /home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63
[localhost] local: touch /home/vagrant/tool_dependencies/ruby/2.0/iracooke/package_protk_1_2_5/876e44dd4609/env.sh
[localhost] local: echo 'RUBYLIB=/home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/lib/:$RUBYLIB; export RUBYLIB' >> /home/vagrant/tool_dependencies/ruby/2.0/iracooke/package_protk_1_2_5/876e44dd4609/env.sh
[localhost] local: echo 'PATH=/home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/bin:$PATH; export PATH' >> /home/vagrant/tool_dependencies/ruby/2.0/iracooke/package_protk_1_2_5/876e44dd4609/env.sh
[localhost] local: echo 'GALAXY_RUBY_HOME=/home/vagrant/tool_dependencies/ruby/2.0/bgruening/package_ruby_2_0/a0494c6e1c63/bin; export GALAXY_RUBY_HOME' >> /home/vagrant/tool_dependencies/ruby/2.0/iracooke/package_protk_1_2_5/876e44dd4609/env.sh
[localhost] local: rm -rf ./database/tmp/tmp-toolshed-mtdJYyhO0
10.0.2.2 - - [28/Oct/2013:21:54:34 +] POST /admin_toolshed/repository_installation_status_updates HTTP/1.1 200 - http://localhost:8088/admin_toolshed/prepare_for_install Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
10.0.2.2 - - [28/Oct/2013:21:47:22 +] POST /admin_toolshed/manage_repositories HTTP/1.1 302 - http://localhost:8088/admin_toolshed/prepare_for_install Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
10.0.2.2 - - [28/Oct/2013:21:54:37 +] GET /admin_toolshed/monitor_repository_installation?tool_shed_repository_ids=f597429621d6eb2b&tool_shed_repository_ids=f2db41e1fa331b3e HTTP/1.1 200 - http://localhost:8088/admin_toolshed/prepare_for_install Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36

With a normal tool install I would see the last line of this output repeat lots of times as installation is monitored, but for some reason in this case monitoring just shuts down and I don't see any further updates. If I recreate the situation on the command line (e.g. by manually setting the RUBYLIB and GEM_HOME environment variables appropriately) I can install my gem without any errors.

More details on my setup: I'm using a vagrant VM for testing which is based on Ubuntu
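For anyone trying to recreate that environment by hand, the env.sh files in the log above are just appended shell fragments that get sourced before the install step. A minimal sketch of how they compose (PREFIX here is a throwaway placeholder, not the real tool_dependencies layout from the log):

```shell
# Sketch of how the generated env.sh fragments above compose. PREFIX is a
# placeholder standing in for .../package_ruby_2_0/a0494c6e1c63 in the log.
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/bin" "$PREFIX/lib"
touch "$PREFIX/env.sh"
# The escaped \$ keeps the variable reference literal inside env.sh:
echo "RUBYLIB=$PREFIX/lib/:\$RUBYLIB; export RUBYLIB" >> "$PREFIX/env.sh"
echo "PATH=$PREFIX/bin:\$PATH; export PATH" >> "$PREFIX/env.sh"
echo "GALAXY_RUBY_HOME=$PREFIX/bin; export GALAXY_RUBY_HOME" >> "$PREFIX/env.sh"
# Sourcing the file reproduces the environment the gem install would see:
. "$PREFIX/env.sh"
echo "$GALAXY_RUBY_HOME"
```

Sourcing each dependency's env.sh in a normal shell, then running the failing install command manually, is the closest approximation of what Galaxy does during the Installing phase.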
Re: [galaxy-dev] What to do with toolshed tools that get stuck Installing
Hi Ira,

I can reproduce that error here; I will try to study it more deeply tomorrow. In the meantime, can you try removing the following lines?

<action type="set_environment_for_install">
    <repository name="package_ruby_2_0" owner="bgruening">
        <package name="ruby" version="2.0" />
    </repository>
</action>

It's not necessary. Will get back to you tomorrow.

Bjoern
Re: [galaxy-dev] What to do with toolshed tools that get stuck Installing
Hi Bjoern,

Thanks. I thought I had to do that in order to make the GALAXY_RUBY_HOME variable available; nice that it's not necessary. I've updated the tool on the test toolshed and have rerun my test, but unfortunately the problem is still there.

Cheers,
Ira
[galaxy-dev] tardis job splitter
Dear all,

There have been a few posts lately about doing distributed computing via Galaxy (i.e. job splitters etc.), so below is a contribution of some ideas we have developed and applied in our work, where we have arranged for some Galaxy tools to execute in parallel on our cluster.

We have developed a job-splitter script, tardis.py (available from https://bitbucket.org/agr-bifo/tardis), which takes marked-up standard Unix commands that run an application or tool. The mark-up is prefixed to the input and output command-line options. Tardis strips off the mark-up and rewrites the commands to refer to split inputs and outputs, which are then executed in parallel, e.g. on a distributed compute resource. Tardis knows the output files to expect and how to join them back together. (This was referred to in our GCC2013 talk: http://wiki.galaxyproject.org/Events/GCC2013/Abstracts#Events.2FGCC2013.2FAbstracts.2FTalks.A_layered_genotyping-by-sequencing_pipeline_using_Galaxy )

Any reasonable Unix-based data processing or analysis command may be marked up and run using tardis, though of course tardis needs to know how to split and join the data. Our approach also assumes a symmetrical HPC cluster configuration, in the sense that each node sees the same view of the file system (and has the required underlying application installed). We use tardis to support both Galaxy and command-line based compute.

Background / design pattern / motivating analogy: Galaxy provides a high-level, end-to-end view of a workflow; the HPC cluster resource that one uses then involves spraying chunks of data out into parallel processes, usually in the form of some kind of distributed compute cluster. But an end user looking at a Galaxy history should ideally not be able to tell whether the workflow was run as a single process on the server or via many parallel processes on the cluster (apart from the fact that when run in parallel on the cluster, it's a lot faster!).
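The split / parallel-run / join cycle that tardis automates can be sketched generically in plain shell. This is illustrative only; it is not tardis's actual mark-up syntax or implementation, and the doubling command and chunk size are arbitrary:

```shell
# Generic split / parallel-run / join pattern (illustrative only; tardis's
# real mark-up and rewriting are documented in its bitbucket repository).
work=$(mktemp -d)
seq 1 100 > "$work/input.txt"
split -l 25 "$work/input.txt" "$work/chunk_"      # 4 chunks of 25 lines each
for chunk in "$work"/chunk_*; do
    # each rewritten command would normally be dispatched to a cluster node;
    # here we just run them as concurrent local sub-processes
    awk '{ print $1 * 2 }' "$chunk" > "$chunk.out" &
done
wait                                              # all chunk jobs finished
cat "$work"/chunk_*.out > "$work/joined.txt"      # join split outputs in order
wc -l < "$work/joined.txt"                        # 100 lines, as in the input
```

Tardis's contribution is doing this transparently, so the tool (and the Galaxy history) sees only the original input and the joined output.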
We noticed that the TCP/IP layered networking protocol stack provides a useful metaphor and design pattern, with the end-to-end topology of a Galaxy workflow corresponding to the transport layer of TCP/IP, and the distribution of computation across a cluster corresponding to the next TCP/IP layer down, the packet-routing layer. This picture suggested a strongly layered approach to provisioning Galaxy with parallelised compute on split data, and hence an approach in which the footprint in the Galaxy code base of parallel/distributed compute support should ideally (from the layered-design point of view) be minimal and superficial.

Thus in our approach so far, the only footprint is in the tool config files, where we arrange the templating to (optionally) prefix the required tardis mark-up to the input and output command options, and the tardis script name to the command as a whole. Tardis then takes care of rewriting and launching all of the jobs, and finally joining the results back together and putting them where Galaxy expects them to be (along with housekeeping such as collating and passing up stderr and stdout, and appropriate process exit codes).

(For each Galaxy job, tardis creates a working folder in a designated scratch area, where input files are uncompressed and split, job files and their output are stored, logging is done, etc. Split data is cleaned up at the end unless there was an error in some part of the job, in which case everything is retained for debugging and, in some cases, restart.)

We modify Galaxy tool configs so that the user can optionally choose to run the tool on our HPC cluster; there are three HPC-related input fields appended to the input section of a tool. Here the user selects whether they want to use our cluster and, if so, specifies the chunk size; they can also at that point specify a sampling rate, since we often find it useful to run preliminary analyses on a random sample of (for example) single or paired-end NGS sequence data, to obtain a fairly quick snapshot of the data before the expense of a complete run. We found it convenient to include support for input sampling in tardis.

The pdf document at https://bitbucket.org/agr-bifo/tardis includes a number of examples of marking up a command, and also a simple example of a Galaxy tool config that has been modified to include support for optionally running the job on our HPC cluster via the tardis pre-processor.

Known limitations:

* we have not yet attempted to integrate our approach with the existing Galaxy job-splitting distributed compute support, partly because of our layered design goal (admittedly also partly because of ignorance about its details!)
* our current implementation is quite naive in the distributed compute API it uses - it supports launching condor job files (and also native sub-processes) - our plan is to replace that with using the drmaa API
* we would like to integrate it better with the galaxy type