Re: [galaxy-dev] Tool development - Selecting a single item from input dataset.

2015-01-26 Thread John Chilton
I have created a Trello card to track this request here
https://trello.com/c/qCtBBB8n.

Any chance you can share the tool with me? I understand that this
instinct is too simplistic - but it seems to me like the option to
repeatedly run a tool over many datasets should not be the concern of
the tool author - it is a user concern. If we add the option - perhaps
we could stipulate that it requires some other parameter to be dependent
on it (I assume you have a dependent parameter in the repeat?).

-John

On Thu, Jan 22, 2015 at 5:50 AM, Vimalkumar Velayudhan
vi...@biotechcoder.com wrote:
 Thanks Peter. I see how this feature would be useful, but the program I'm
 writing a wrapper for has an argument with values corresponding to the input
 files. I am using a repeat tag to maintain this order. With the multi-run
 option, files are selected in a random manner and added to the job queue. It
 is best not to display the multi-run option in this case.
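 To make the ordering issue concrete, the wrapper is shaped roughly like the
 sketch below (an illustrative, made-up fragment - the names are invented and
 not from the actual tool):

   <repeat name="alignments" title="Alignment">
     <param name="ribo_file" type="data" format="tabular"
            label="Ribo-Seq alignment file" />
     <param name="offset" type="integer" value="0"
            label="Offset passed to the program for this file" />
   </repeat>

 Each data input is paired with its own value inside one repeat block, so the
 order in which the user fills in the repeat is the order of arguments passed
 to the program - something the multi-run option has no way to preserve.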

 I see there is a TODO on this already:
 https://bitbucket.org/galaxy/galaxy-dist/src/a2308bdc93b897af974766b190abe019ade49e9a/lib/galaxy/tools/parameters/basic.py?at=default#cl-2084

 For now, I have set allow=False but I believe this is best set at the param
 tag level:

 <param type="data" multirun="false" />


 Vimal

 On Wed, Jan 21, 2015 at 2:26 AM, Peter Cock p.j.a.c...@googlemail.com
 wrote:

  I think this is the (relatively new) Galaxy ability to automatically
  run N copies of your tool given N input files, making N outputs,
  and it is related to the collections work.

 (This is possible if your tool takes a single input file)

 Peter

 On Tue, Jan 20, 2015 at 6:17 PM, Vimalkumar Velayudhan
 vi...@biotechcoder.com wrote:
  Hi all,
 
   I am trying to create a select box with the possibility of selecting
   only a single item from the input dataset (figure 1). This works fine but
   the option for selecting multiple files is still visible (figure 2). The
   multiple="false" attribute has no effect.
 
  Figure: http://i.imgur.com/oJVFCoF.png
 
  I have the following in my XML.
 
   <param format="tabular" name="ribo_files" type="data"
          label="Select Ribo-Seq alignment file" multiple="false">
   </param>
 
  Any suggestions?
 
  galaxy-dist revision 5f4c13d622b8
 
 
  Regards,
  Vimalkumar Velayudhan
 

Re: [galaxy-dev] data collections - workflow - bug?

2015-01-26 Thread John Chilton
Hey,

  Really intensive database operations - including dataset collections,
but other things too, like multi-running tools or workflows over many
individual datasets - can very easily overwhelm the default SQLite
database. This is frustrating and shouldn't happen, but it does,
unfortunately. I would recommend using a Postgres database when testing
out dataset collections. The good news is that it is easier than ever to
get a fully fledged production-quality server thanks to Bjoern's Docker
server (https://github.com/bgruening/docker-galaxy-stable) - it comes
bundled with Postgres and Slurm, so it should be able to handle the
collection operations. If you need to run Galaxy on a non-containerized
server (for instance because that is where the software is), more
information on setting up Galaxy can be found here:
https://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer.
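
If you want to make that switch on an existing instance, the key change is
pointing Galaxy at Postgres via database_connection in galaxy.ini (or
universe_wsgi.ini on older galaxy-dist checkouts). A minimal sketch - the
database name, user, and password below are placeholders:

  # after creating the database, e.g.: createdb -O galaxy galaxy
  database_connection = postgresql://galaxy:CHANGE_ME@localhost:5432/galaxy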

Here is a Trello card to track progress on the database optimization
efforts if you are interested: https://trello.com/c/UPLsMKQI.

Very sorry.

-John

On Mon, Jan 26, 2015 at 9:35 AM, Torsten Houwaart
houwa...@informatik.uni-freiburg.de wrote:
 Hello Galaxy Devs,

 I was using data collections (for the first time) for a new workflow of ours
 and I ran into this problem. There was no complaint from the workflow editor
 and I could start the workflow, but then what you see below happened.
 If you need more information about the workflow or anything else, let me know.

 Best,
 Torsten H.


 job traceback:
 Traceback (most recent call last):
   File "/usr/local/galaxy/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 565, in finish_job
     job_state.job_wrapper.finish( stdout, stderr, exit_code )
   File "/usr/local/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 1250, in finish
     self.sa_session.flush()
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/scoping.py", line 114, in do
     return getattr(self.registry(), name)(*args, **kwargs)
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/session.py", line 1718, in flush
     self._flush(objects)
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/session.py", line 1789, in _flush
     flush_context.execute()
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/unitofwork.py", line 331, in execute
     rec.execute(self)
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/unitofwork.py", line 475, in execute
     uow
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/persistence.py", line 59, in save_obj
     mapper, table, update)
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements
     execute(statement, params)
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py", line 1449, in execute
     params)
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
     compiled_sql, distilled_params
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py", line 1698, in _execute_context
     context)
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py", line 1691, in _execute_context
     context)
   File "/usr/local/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/default.py", line 331, in do_execute
     cursor.execute(statement, parameters)
 DBAPIError: (TransactionRollbackError) deadlock detected
 DETAIL:  Process 3144 waits for ShareLock on transaction 2517124; blocked by process 3143.
 Process 3143 waits for ShareLock on transaction 2517123; blocked by process 3144.
 HINT:  See server log for query details.
  'UPDATE workflow_invocation SET update_time=%(update_time)s WHERE workflow_invocation.id = %(workflow_invocation_id)s' {'update_time': datetime.datetime(2015, 1, 26, 14, 20, 4, 155440), 'workflow_invocation_id': 5454}



Re: [galaxy-dev] concatenate_datasets docker image can't create output

2015-01-26 Thread John Chilton
Can you send me a screenshot of the Tool form right before you hit
submit? It is odd that two commands are executed that barely differ at
all. There were some recent UI changes, and I want to make sure that
you are passing two inputs once rather than submitting one input twice
in the newly named batch mode.

container.sh is generated in command_factory.py using containers.py -
here are some relevant links.

https://bitbucket.org/galaxy/galaxy-central/src/tip/lib/galaxy/jobs/command_factory.py?at=default
https://bitbucket.org/galaxy/galaxy-central/src/tip/lib/galaxy/tools/deps/containers.py?at=default
https://bitbucket.org/galaxy/galaxy-central/src/tip/lib/galaxy/tools/deps/docker_util.py?at=default
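
For reference, a Docker-enabled destination in job_conf.xml looks roughly like
the sketch below. Treat it as illustrative only - the destination id and image
are made up here, and the authoritative list of docker_* parameters is in the
containers.py linked above:

  <destination id="docker_local" runner="local">
    <param id="docker_enabled">true</param>
    <!-- image used when the tool does not request a specific container -->
    <param id="docker_default_container_id">busybox:ubuntu-14.04</param>
    <!-- $defaults mounts the directories the job needs (working dir, inputs, outputs) -->
    <param id="docker_volumes">$defaults</param>
  </destination>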

-John


On Mon, Jan 26, 2015 at 2:17 PM, Jeltje van Baren
jeltje.van.ba...@gmail.com wrote:
 I'm still trying to debug this.

 How do I change the working directory? I can't seem to find out how all
 those directory mounts get passed to docker - the catDocker.xml appears to
 generate a 'cat' command that I assume ends up in
 ()/job_working_directory/000/12/container.sh
 but I can't find where that's happening - container.sh isn't listed in any
 file under galaxy-dist.
 job_conf.xml lists the directories that get passed as variables - again,
 when and where are those defined?

 Thanks,

 -Jeltje


 On Fri, Jan 23, 2015 at 1:35 PM, Jeltje van Baren
 jeltje.van.ba...@gmail.com wrote:

 I tried the second solution (setting everything to rw) and while I can
 confirm that the command is changed accordingly, the results are the same:
 Two empty output files and the same error message.

 Command from paster.log:
 docker run -e GALAXY_SLOTS=$GALAXY_SLOTS -v
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist:/inside/home/jeltje/exp/varscan2/programs/galaxy-dist:rw
 -v
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/tools/docker:/inside/home/jeltje/exp/varscan2/programs/galaxy-dist/tools/docker:rw
 -v
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/12:/inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/12:rw
 -v
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/files:/inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/files:rw
 -w
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/12
 --net none busybox:ubuntu-14.04
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/12/container.sh;
 return_code=$?; if [ -f
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/12/working_file
 ] ; then cp
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/12/working_file
 /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/files/000/dataset_19.dat
  ; fi; sh -c "exit $return_code"

 -Jeltje

 On Fri, Jan 23, 2015 at 12:24 PM, John Chilton jmchil...@gmail.com
 wrote:

 My first thought is the bug mentioned here -

 https://lists.galaxyproject.org/pipermail/galaxy-dev/2014-November/020892.html
  along with potential workarounds. Are you able to confirm whether this
  is or is not the problem?

 -John

 On Fri, Jan 23, 2015 at 2:37 PM, Jeltje van Baren
 jeltje.van.ba...@gmail.com wrote:
  Hi,
 
  I'm following instructions at
  https://github.com/apetkau/galaxy-hackathon-2014.
 
  When I try to run the concatenate-datasets on two input files, two odd
  things happen. First, TWO nearly identical commands are generated in
  the
  History panel, only differing in their output filename. In paster.log,
  only
  the second one shows up. Second, the program fails, with this info in
  the
  History:
 
  An error occurred with this dataset:
  Galaxy slots passed through contain as 1
 
  /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/10/container.sh:
  line 2: can't create
 
  /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/files/000/dataset_14.dat
 
  This happens for both commands in the History panel, only the output
  filename in the other error is dataset_15.dat
 
   Oddly, the dataset_14.dat and dataset_15.dat are both created during this
   command; they just end up empty.
 
  Paster.log:
 
   galaxy.jobs.runners DEBUG 2015-01-23 11:08:35,973 (10) command is:
   docker inspect busybox:ubuntu-14.04 > /dev/null 2>&1
   [ $? -ne 0 ] && docker pull busybox:ubuntu-14.04 > /dev/null 2>&1
 
  docker run -e GALAXY_SLOTS=$GALAXY_SLOTS -v
 
  /inside/home/jeltje/exp/varscan2/programs/galaxy-dist:/inside/home/jeltje/exp/varscan2/programs/galaxy-dist:ro
  -v
 
  /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/tools/docker:/inside/home/jeltje/exp/varscan2/programs/galaxy-dist/tools/docker:ro
  -v
 
  /inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/10:/inside/home/jeltje/exp/varscan2/programs/galaxy-dist/database/job_working_directory/000/10:rw
  -v
 
  

Re: [galaxy-dev] concatenate_datasets docker image can't create output

2015-01-26 Thread Jeltje van Baren
See attached for screenshot!

Thanks for the links.



On Mon, Jan 26, 2015 at 11:37 AM, John Chilton jmchil...@gmail.com wrote:

 Can you send me a screenshot of the Tool form right before you hit
 submit? It is odd that two commands are executed that barely differ at
 all. There were some recent UI changes, and I want to make sure that
 you are passing two inputs once rather than submitting one input twice
 in the newly named batch mode.

 container.sh is generated in command_factory.py using containers.py -
 here are some relevant links.


 https://bitbucket.org/galaxy/galaxy-central/src/tip/lib/galaxy/jobs/command_factory.py?at=default

 https://bitbucket.org/galaxy/galaxy-central/src/tip/lib/galaxy/tools/deps/containers.py?at=default

 https://bitbucket.org/galaxy/galaxy-central/src/tip/lib/galaxy/tools/deps/docker_util.py?at=default

 -John



[galaxy-dev] Galaxy XML from python's argparse

2015-01-26 Thread Eric Rasche
Howdy devs,

I put together a library recently, and since it seems to be functional I
thought I'd share with the rest of -dev in case it's of interest to anyone.
If anyone has feedback, bugs/issues, or PRs, I'd be happy to receive them!

https://github.com/erasche/gxargparse

So, what is it?
gxargparse is a drop-in replacement for argparse which can generate Galaxy
Tool XML on demand.

When I say drop-in replacement, I mean it. Through some Python magic, as
soon as you `pip install gxargparse`, your argparse will (maybe*) be
transparently wrapped by gxargparse, and you'll have the
--generate_galaxy_xml flag available, which will generate Galaxy Tool XML.
This means *no code changes required*, and free tool XML generation.
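
As a concrete (made-up) example, take an ordinary argparse script:

  # count_lines.py - plain argparse; the script never imports gxargparse
  import argparse

  parser = argparse.ArgumentParser(description='Count lines in a file')
  parser.add_argument('input', help='input file')
  parser.add_argument('--skip-blank', action='store_true',
                      help='ignore blank lines')
  args = parser.parse_args()
  with open(args.input) as handle:
      count = sum(1 for line in handle
                  if not (args.skip_blank and not line.strip()))
  print(count)

With gxargparse installed, generating the tool XML is just:

  pip install gxargparse
  python count_lines.py --generate_galaxy_xml > count_lines.xml

(count_lines.py is of course a hypothetical name; the point is that no
gxargparse-specific code appears in the script itself.)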

However, beware that it is free tool XML; you will likely need to make
some manual corrections to it before publishing tools (repeat labels, for
instance). That said, if you're converting an argparse tool with hundreds of
arguments for use in Galaxy, this could save you a lot of initial manual
work.

* I say *maybe* because it depends a bit on your Python module load order,
which is something completely outside of my control. The package comes with
a command line tool (https://github.com/erasche/gxargparse#it-doesnt-work)
which spits out a path you can stick in PYTHONPATH to fix this issue.

Where to get it?
Now available on PyPI (https://pypi.python.org/pypi/gxargparse) and GitHub
(https://github.com/erasche/gxargparse). I *strongly* recommend against
installing it system-wide, as any bugs in it could render all argparse-based
Python tools broken on your system. It's much more reasonable to use it in a
virtualenv.

Known Problems

   - argument_groups are not dealt with specially
   - prefix_chars and other lesser used features are not (yet) supported
   - anything with a repeat is a bit of a hack
   - no translation from argparse to conditionals/which yet figured out.

Bug reports/suggestions are welcome!
https://github.com/erasche/gxargparse/issues/


Cheers,
Eric
-- 
Eric Rasche
Programmer II

Center for Phage Technology
Rm 312A, BioBio
Texas A&M University
College Station, TX 77843
404-692-2048
e...@tamu.edu
rasche.e...@yandex.ru

Re: [galaxy-dev] CloudMan + Ansible + AWS

2015-01-26 Thread Dannon Baker
Thanks for the pointer, this is great stuff and I had not seen it before.

I’ve wanted to do something similar for a while now, and it’s becoming even
more important as we loop in GCE. I’ll definitely read through and play with
this and see what we might be able to use, contribute back to, and so on.

-Dannon

On Mon, Jan 26, 2015 at 12:28 PM, Brad Chapman chapm...@fastmail.com
wrote:

 Enis, John and all;
 I spotted ansible-cloudman on GitHub today which reminded me I've been
 meaning to write about the approach we set up late last year to run bcbio
 on AWS. It uses elasticluster (https://github.com/gc3-uzh-ch/elasticluster)
 which has the advantage of being all Ansible scripts and bootstrapping
 from standard images -- so no more making AMIs. It also uses SLURM
 instead of SGE, which is a nice change.
 We wrote an interface that automates all of the stuff you need to set up
 on AWS: IAM users, VPCs, and whatnot. It is a pretty streamlined process
 from the command line, including specifying the cluster size and
 stopping/starting it:
 https://bcbio-nextgen.readthedocs.org/en/latest/contents/cloud.html#aws-setup
 I also wrote up some benchmarking work to give an idea of using
 it in practice:
 http://bcb.io/2014/12/19/awsbench/
 All of the code is here:
 https://github.com/chapmanb/bcbio-nextgen-vm
 As always, happy to overlap/share with whatever y'all decide to
 do. We could make bcbio-specific stuff optional as needed, although it
 is pretty lightweight -- just the driver scripts and a Docker image of
 bcbio. It's basically a ready-to-use cluster with this little extra
 added, so hopefully it could be useful for future plans with CloudMan.
 Hope this is useful,
 Brad

Re: [galaxy-dev] CloudMan + Ansible + AWS

2015-01-26 Thread Brad Chapman

Enis, John and all;
I spotted ansible-cloudman on GitHub today which reminded me I've been
meaning to write about the approach we set up late last year to run bcbio
on AWS. It uses elasticluster (https://github.com/gc3-uzh-ch/elasticluster)
which has the advantage of being all Ansible scripts and bootstrapping
from standard images -- so no more making AMIs. It also uses SLURM
instead of SGE, which is a nice change.
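
For anyone who has not used elasticluster itself: it is driven from a config
file (by default ~/.elasticluster/config) plus a small CLI. Roughly - and this
is plain elasticluster rather than the bcbio wrapper described below, with
"mycluster" standing in for whatever cluster name you define in that config:

  elasticluster start mycluster       # boot instances and run the Ansible playbooks
  elasticluster list-nodes mycluster  # show the frontend and compute nodes
  elasticluster ssh mycluster         # log in to the frontend node
  elasticluster stop mycluster        # terminate the instances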

We wrote an interface that automates all of the stuff you need to set up
on AWS: IAM users, VPCs, and whatnot. It is a pretty streamlined process
from the command line, including specifying the cluster size and
stopping/starting it:

https://bcbio-nextgen.readthedocs.org/en/latest/contents/cloud.html#aws-setup

I also wrote up some benchmarking work to give an idea of using
it in practice:

http://bcb.io/2014/12/19/awsbench/

All of the code is here:

https://github.com/chapmanb/bcbio-nextgen-vm

As always, happy to overlap/share with whatever y'all decide to
do. We could make bcbio-specific stuff optional as needed, although it
is pretty lightweight -- just the driver scripts and a Docker image of
bcbio. It's basically a ready-to-use cluster with this little extra
added, so hopefully it could be useful for future plans with CloudMan.

Hope this is useful,
Brad
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/