[galaxy-dev] Galaxy in Amazon

2015-08-19 Thread Ryan G
Hi all - We are running a local instance of Galaxy on our internal
infrastructure.  It seems to be going well.

We've gotten to the point where we are ready to migrate our NGS data to
Amazon for storage in S3.  We are also looking at how Galaxy can be used in
Amazon.  Specifically, we are interested in understanding:

1)  Should we run an instance of Galaxy in Amazon, or continue to run it
locally (to minimize costs) but have it run analyses in Amazon?

2)  Regardless of how we run it, data will be stored in S3.  How will
Galaxy interact with S3 for its Data Libraries?

3)  Is it even possible to separate the Galaxy web interface from the HPC
cluster?

4)  We understand Galaxy in Amazon uses CloudMan.  Can we run this in our
VPC with our own AMI?

If anyone can provide insights into how they are using Galaxy in Amazon, I
am very interested to hear your thoughts.

Ryan
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Galaxy in Amazon

2015-08-19 Thread Enis Afgan
Hi Ryan,
What you're suggesting is still somewhat experimental, but we're
continuing to work on it to make it more robust and better integrated into
the Galaxy ecosystem. There are really three general approaches:
1. Run Galaxy via CloudMan 100% on AWS. This option is the most robust and
basically ready for use, but over time, and if you decide to make
modifications to various pieces of the puzzle, it will require an increased
understanding of how CloudMan works. It's also the most expensive option.
2. Run the Galaxy UI locally and create a CloudMan cluster on demand with the
Pulsar service (http://pulsar.readthedocs.org/en/latest/index.html) enabled
to accept jobs from the local Galaxy. This paper describes that approach:
http://onlinelibrary.wiley.com/doi/10.1002/cpe.3536/abstract
3. Run your Galaxy UI locally and create Ansible roles/tasks to dynamically
acquire cloud instances and assemble them into a cluster. You will
probably want to use Pulsar for job management again. This option gives you
the most control, but also means you'll need to build the system yourself.
Nate may also have more comments about this.

I've also put some comments about your specific questions inline.

Hope this helps clarify the situation at least. We're actively working on
this scenario so things should get easier in the future. Let us know if you
have more questions and what you decide.

Cheers,
Enis



On Wed, Aug 19, 2015 at 8:53 AM, Ryan G ngsbioinformat...@gmail.com wrote:

 Hi all - We are running a local instance of Galaxy on our internal
 infrastructure.  It seems to be going well.

 We've gotten to the point where we are ready to migrate our NGS data to
 Amazon for storage in S3.  We are also looking at how Galaxy can be used in
 Amazon.  Specifically, we are interested in understanding:

 1)  Should we run an instance of Galaxy in Amazon, or continue to run it
 locally (to minimize costs) but have it run analyses in Amazon?

The options above summarize this scenario.


 2)  Regardless of how we run it, data will be stored in S3.  How will
 Galaxy interact with S3 for its Data Libraries?

Galaxy implements an Object Store interface that can use S3 as a
back-end data store. It has been around for a number of years and has been
demonstrated to work, but it hasn't been used in production, so I'd
suggest testing it first. The configuration options for the object
store are in Galaxy's config file:
https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.ini.sample#L289
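A minimal sketch, assuming Galaxy is pointed at a separate XML object store file (via object_store_config_file) rather than the inline galaxy.ini options: the Python below writes an S3 backend definition. The element and attribute names (auth, bucket, cache, use_reduced_redundancy) follow the S3 example in Galaxy's object_store_conf.xml.sample as we recall it and should be checked against the sample shipped with your release; the keys, bucket name and cache values are placeholders.

```python
import xml.etree.ElementTree as ET


def s3_object_store_xml(access_key, secret_key, bucket,
                        cache_path="database/object_store_cache", cache_size="100"):
    """Build an S3 object store definition for Galaxy.

    Sketch only: verify element/attribute names and the cache size units
    against object_store_conf.xml.sample for your Galaxy version.
    """
    store = ET.Element("object_store", type="s3")
    # Credentials Galaxy would use to talk to S3 (placeholders here).
    ET.SubElement(store, "auth", access_key=access_key, secret_key=secret_key)
    # An existing bucket that will hold the dataset files.
    ET.SubElement(store, "bucket", name=bucket, use_reduced_redundancy="False")
    # Local cache where datasets are staged before tools read them.
    ET.SubElement(store, "cache", path=cache_path, size=cache_size)
    return '<?xml version="1.0"?>\n' + ET.tostring(store, encoding="unicode")


if __name__ == "__main__":
    print(s3_object_store_xml("MY_ACCESS_KEY", "MY_SECRET_KEY", "my-ngs-bucket"))
```

As suggested above, try this against a test bucket before moving real NGS data onto it.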


 3)  Is it even possible to separate the Galaxy web interface from the HPC
 cluster?

Yes; you either need a shared file system between the resources or need to
use Pulsar.
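For the Pulsar route, the local Galaxy needs a job_conf.xml destination that sends jobs to a Pulsar server running next to the cloud cluster. The sketch below generates such a file with the standard library. The runner module path galaxy.jobs.runners.pulsar:PulsarRESTJobRunner and the url parameter are our reading of Galaxy's advanced job_conf sample; the destination id and Pulsar URL are hypothetical, and authentication and file staging options are left out.

```python
import xml.etree.ElementTree as ET


def pulsar_job_conf(pulsar_url, destination_id="pulsar_cloud"):
    """Sketch of a job_conf.xml that routes all jobs from a local Galaxy
    to a remote Pulsar server over HTTP (ids and URL are placeholders)."""
    conf = ET.Element("job_conf")
    plugins = ET.SubElement(conf, "plugins")
    ET.SubElement(plugins, "plugin", id="pulsar_rest", type="runner",
                  load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner")
    handlers = ET.SubElement(conf, "handlers")
    ET.SubElement(handlers, "handler", id="main")
    # Make the Pulsar destination the default for every tool.
    destinations = ET.SubElement(conf, "destinations", default=destination_id)
    dest = ET.SubElement(destinations, "destination",
                         id=destination_id, runner="pulsar_rest")
    # Where the Pulsar service on the cloud side listens for jobs.
    url = ET.SubElement(dest, "param", id="url")
    url.text = pulsar_url
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(pulsar_job_conf("https://pulsar.example.org:8913/"))
```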


 4)  We understand Galaxy in Amazon uses CloudMan.  Can we run this in our
 VPC with our own AMI?

Yes; you can build your own version of the system with the tools and
whatever else you would like to configure. Docs on how to do this are
available here: https://wiki.galaxyproject.org/CloudMan/Building


 If anyone can provide insights into how they are using Galaxy in Amazon, I
 am very interested to hear your thoughts.

 Ryan


Re: [galaxy-dev] Unable to run simple Docker tool

2015-08-19 Thread Mikel Egaña Aranguren
2015-08-19 15:50 GMT+02:00 John Chilton jmchil...@gmail.com:

 I don't know to be honest - a couple of things to verify.

 I wrote a bunch of debugging suggestions at the end of this e-mail, which
 you can try if I am wrong, but after writing them I realized the problem.
 You are not using sudo to run docker in your examples, but Galaxy does by
 default - so the job is probably paused because sudo is waiting for a
 password and you probably don't have passwordless sudo set up.

 I would just add <param id="docker_sudo">false</param> to your job
 conf destination and this should work.


Yes, this worked.

Thanks!

Regards
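John's fix can also be applied to an existing job_conf.xml programmatically. The sketch below, using only the Python standard library, adds (or overwrites) the docker_sudo param on one destination, e.g. the docker_local destination in Mikel's job_conf.xml quoted further down. The helper name is hypothetical; note that ElementTree drops XML comments and the declaration when it re-serializes the file, so keep a copy of the original.

```python
import xml.etree.ElementTree as ET


def disable_docker_sudo(job_conf_xml, destination_id):
    """Return job_conf_xml with <param id="docker_sudo">false</param> set on
    the given destination, adding the param if it is missing."""
    root = ET.fromstring(job_conf_xml)
    dest = root.find(f"./destinations/destination[@id='{destination_id}']")
    if dest is None:
        raise ValueError(f"no destination with id {destination_id!r}")
    param = dest.find("param[@id='docker_sudo']")
    if param is None:
        param = ET.SubElement(dest, "param", id="docker_sudo")
    # Run docker directly instead of through sudo.
    param.text = "false"
    return ET.tostring(root, encoding="unicode")
```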




 The other stuff to try:

 Does the tool work without Docker, if you just place test-io.sh
 on Galaxy's PATH? If yes, I would set cleanup_job = never in
 galaxy.ini and try again. Once it dies, grab this part of the command
 line:

 sudo docker run -e GALAXY_SLOTS=$GALAXY_SLOTS \
   -v /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy:/home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy:ro \
   -v /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/tools/catDocker:/home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/tools/catDocker:ro \
   -v /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/job_working_directory/000/2:/home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/job_working_directory/000/2:rw \
   -v /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/files:/home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/files:rw \
   -w /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/job_working_directory/000/2 \
   --net none --rm -u 1001 mikeleganaaranguren/busybox-galaxy-test-io:v1 \
   /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/job_working_directory/000/2/tool_script.sh

 and try to debug the problem outside of Galaxy, for instance with
 docker logs. If the logs don't reveal anything, try dropping some
 of the command-line arguments to see whether one of them is the
 problem, e.g. --net none or -u 1001.
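The "drop some arguments" step can be scripted. A minimal sketch, assuming you paste in the command saved with cleanup_job = never: it builds variants of the docker command with --net none, -u, or a leading sudo removed, so each variant can be run by hand (for example with subprocess.run(args)) to see which argument causes the hang. The helper names are hypothetical.

```python
import os
import shlex


def drop_option(argv, flag, takes_value=True):
    """Return a copy of argv with every occurrence of flag (and its value) removed."""
    out, skip_next = [], False
    for token in argv:
        if skip_next:
            skip_next = False
            continue
        if token == flag:
            skip_next = takes_value
            continue
        out.append(token)
    return out


def debug_variants(command):
    """Split a docker command line and build variants to try outside Galaxy."""
    # Expand $GALAXY_SLOTS etc. first; unset variables are left untouched.
    argv = shlex.split(os.path.expandvars(command))
    variants = {
        "original": argv,
        "without --net none": drop_option(argv, "--net"),
        "without -u": drop_option(argv, "-u"),
    }
    if argv and argv[0] == "sudo":
        variants["without sudo"] = argv[1:]
    return variants


if __name__ == "__main__":
    cmd = "sudo docker run --net none --rm -u 1001 busybox:latest echo ok"
    for name, args in debug_variants(cmd).items():
        print(f"{name}: {shlex.join(args)}")
```

Expanding variables before splitting matters here: shlex.join would otherwise single-quote a token such as GALAXY_SLOTS=$GALAXY_SLOTS and the shell would no longer expand it.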

 -John

 On Tue, Aug 18, 2015 at 6:32 PM, Mikel Egaña Aranguren
 mikel.egana.arangu...@gmail.com wrote:
  Hi;
 
  I'm trying to develop a Docker based tool, as suggested by a peer-reviewer
  who might be reading this :P
 
  However, I'm having trouble with even the most basic setting, and I don't
  know what might be wrong, so any help will be much appreciated. I have
  developed a very simple docker image and corresponding Galaxy tool, so that
  I can get it working before starting with the actual tool, but when I
  execute it through Galaxy it simply stays executing forever, instead of
  failing or terminating.
 
  My image simply executes a shell script that reads the content of a file
  and concatenates a string to it. The image:
 
  FROM busybox:ubuntu-14.04
  MAINTAINER Mikel Egaña Aranguren mikel.egana.arangu...@gmail.com
 
  RUN mkdir /sadi
  COPY test-io.sh /sadi/
  RUN chmod a+x /sadi/test-io.sh
  ENV PATH $PATH:/sadi
 
  The test-io.sh script within the image:
 
  #!/bin/sh
 
  cat $1
  echo AAA
 
  Invoking the container and executing the script through a normal shell
  works fine:
 
  REPOSITORY                                   TAG   IMAGE ID       CREATED          VIRTUAL SIZE
  mikeleganaaranguren/busybox-galaxy-test-io   v1    9c2b8bdade1d   54 minutes ago   5.609 MB
 
  docker run -i -t mikeleganaaranguren/busybox-galaxy-test-io:v1
 
  BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1) built-in shell (ash)
  Enter 'help' for a list of built-in commands.
 
  ~ #
  ~ # echo BBB > test
  ~ # test-io.sh test
  BBB
  AAA
 
  This is the tool file I'm using in Galaxy:
 
  <tool id="SADIBUSYBOX" name="SADIBUSYBOX">
    <description>IO</description>
    <requirements>
      <container type="docker">mikeleganaaranguren/busybox-galaxy-test-io:v1</container>
    </requirements>
    <command>
      test-io.sh $input > $output
    </command>
    <inputs>
      <param name="input" type="data" label="Dataset"/>
    </inputs>
    <outputs>
      <data format="txt" name="output" />
    </outputs>
    <help>
    </help>
  </tool>
 
  And my job_conf.xml:
 
  <?xml version="1.0"?>
  <!-- A sample job config that explicitly configures job running the way it
       is configured by default (if there is no explicit config). -->
  <job_conf>
    <plugins>
      <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
    </plugins>
    <handlers>
      <handler id="main"/>
    </handlers>
    <destinations default="docker_local">
      <destination id="local" runner="local"/>
      <destination id="docker_local" runner="local">
        <param id="docker_enabled">true</param>
      </destination>
    </destinations>
  </job_conf>
 
  As I said, when I execute the tool in Galaxy, it simply executes forever;
  it stays in a yellow state until I kill the Galaxy server. The log says:
 
  127.0.0.1 - - [18/Aug/2015:19:08:00 +0200] "GET /tool_runner?tool_id=SADIBUSYBOX HTTP/1.1" 200 - "http://127.0.0.1:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0"
  galaxy.tools.actions INFO 2015-08-18 19:08:07,178 Handled output 

[galaxy-dev] Get dataset/API ids for a dataset

2015-08-19 Thread Asma Riyaz
Hello Galaxy-dev,

Thank you so much for all the help you have given me.

I have a question about dataset IDs in Galaxy. As background, I am
running my own Galaxy instance on a server. A pipeline implemented in the
lab produces the following files in the history:

1) 2 BAM files
2) A JSON file

My goal is to use this JSON file to pass the paths/URLs of the BAM files
into a custom JavaScript visualization we wrote.

Among many other details, this JSON file contains the paths/URLs of the
above BAM files. I am using JSON files to send data to the JS
visualization within Galaxy. To do this, I have my own JS which loads a BAM
file from the URL provided into an IGV.js track. IGV.js, which is responsible
for making the tracks, expects a valid URL, which is updated in the JSON
file in this manner:

1) Extract the API key and history id from a loaded BAM file
2) Edit the JSON file so that it reflects the BAM file's dataset id, like
this:

{
  "CLL-HL_pilot.r1.fastq": {
    "DNA": "/datasets/36ddb788a0f14eb3/display?to_ext=bam",
    ...

This works fine if I know the API key for the BAM files. When a pipeline
executes, dataset ids are generated for each output. I want to access these
ids, include them in the JSON file and load the updated JSON file into the
history with the BAMs. Is there a way to get the ids from the history in
this manner?
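One possible approach, sketched here and not tested against a live server: list the history through the Galaxy API with your API key, read each BAM dataset's encoded id, and write those ids into the JSON file. The field names used (id, name, deleted, history_content_type, extension) are what we expect in a history contents listing, so check them against your server's output; the helper names are hypothetical, and the /datasets/{id}/display?to_ext=bam form is copied from the JSON above.

```python
import json
import urllib.parse
import urllib.request


def fetch_history_contents(galaxy_url, history_id, api_key):
    """GET /api/histories/{history_id}/contents from a Galaxy server."""
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{galaxy_url.rstrip('/')}/api/histories/{history_id}/contents?{query}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def bam_display_urls(contents):
    """Map BAM dataset names to the display URLs used in the JSON file."""
    urls = {}
    for item in contents:
        # Skip deleted datasets and dataset collections.
        if item.get("deleted") or item.get("history_content_type", "dataset") != "dataset":
            continue
        # Assumes the listing carries 'extension'; falls back to the file name.
        if item.get("extension") == "bam" or item.get("name", "").endswith(".bam"):
            urls[item["name"]] = f"/datasets/{item['id']}/display?to_ext=bam"
    return urls
```

Usage would be bam_display_urls(fetch_history_contents("https://your.galaxy", history_id, api_key)), after which the returned URLs can be written into the JSON file with json.dump and the file uploaded to the same history.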

Thank you,

Asma

Re: [galaxy-dev] Dataset Collections status

2015-08-19 Thread John Chilton
On Fri, Aug 7, 2015 at 8:01 PM, Keith Suderman suder...@cs.vassar.edu wrote:
 Greetings,

 I started pulling Galaxy code from the dev branch a few months ago to take 
 advantage of the (then just emerging) dataset collections feature.  However, 
 it is not clear to me from the latest release notes if the data collections 
 are now fully merged into master, or if I should continue to use the code in 
 the dev branch to take advantage of bleeding edge code.  I would like to move 
 back to the master branch as soon as feasible.

It is an ongoing effort, but the master branch
(https://github.com/galaxyproject/galaxy/tree/master) now contains
essentially everything in the dev branch. I need to put
together some release notes for 15.07 before there can be an
announcement, but there are a few new collection-related things
in the release. In some senses collections have been fully
usable for over a year, and in some senses there is a lot of work
left to do. It depends on what you are doing.


 When running workflows over dataset collections I will frequently see errors 
 like:

 /bin/sh: 1: /home/galaxy/galaxy_old/database/job_working_directory/001/1216/galaxy_1216.sh: Text file busy

I don't think this is related to collections per se - I think it is
probably more a file system problem - are you using a local job runner
or a cluster manager? Is the file system mounted over a slow NFS
connection?


 Which, from what I can tell, occurs when one process is trying to 
 modify/delete a file open in another process.  While the error seems to be 
 repeatable, it also seems random as the errors do not occur in the same 
 places if I run the workflow multiple times.

 Given that I am working from the dev branch I don't want to open/raise issues 
 on features still in development.  But if this is unexpected then I can do 
 some more investigating and file a proper bug report.

I would say always report bugs in dev - maybe check the existing ones
on GitHub and Trello first. Ideally we would like to catch bugs
as early as possible, and we don't usually commit half-baked code to
dev - it should be bug free (though maybe missing features).

Hope this helps,
-John


 Cheers,
 Keith

 --
 Research Associate
 Department of Computer Science
 Vassar College
 Poughkeepsie, NY



Re: [galaxy-dev] 20 August GalaxyAdmins Meetup: Genomic Data Management; Tool Installation Automation

2015-08-19 Thread Dave Clements
Hi All,

Just a reminder that the next GalaxyAdmins online meetup is tomorrow,
Thursday, August 20
(https://wiki.galaxyproject.org/Community/GalaxyAdmins/Meetups/2015_08_20).
Check the start time in your time zone at
http://www.timeanddate.com/worldclock/fixedtime.html?msg=GalaxyAdmins+August+2015+Web+Meetupiso=20150820T17p1=1229ah=1am=15

It can take a few minutes to connect, so you are encouraged to start the
process a few minutes early.

Hope to see you tomorrow,

Dave C

On Thu, Aug 13, 2015 at 10:47 AM, Dave Clements cleme...@galaxyproject.org
wrote:

 Hello all,

 The next GalaxyAdmins online meetup is next week, on Thursday, August 20
 https://wiki.galaxyproject.org/Community/GalaxyAdmins/Meetups/2015_08_20.
 The featured topics are:


- *Genomic data management at Canada's National Microbiology
Laboratory (https://www.nml-lnm.gc.ca/index-eng.htm) with IRIDA
(http://www.irida.ca/) and Galaxy*, presented by Aaron Petkau
(https://github.com/apetkau)
- *Adding transparency and automation into the Galaxy tool
installation process* (https://github.com/afgane/ansible-tools), presented
by Enis Afgan (https://wiki.galaxyproject.org/EnisAfgan)

 The meetup will use the same Adobe Connect technology
 https://wiki.galaxyproject.org/Community/GalaxyAdmins/Meetups/2015_08_20#Call_Technology
 used on previous calls.  It takes a couple of minutes to connect, so you
 are encouraged to start connecting a few minutes early.

 See your local time
 http://www.timeanddate.com/worldclock/fixedtime.html?msg=GalaxyAdmins+August+2015+Web+Meetupiso=20150820T17p1=1229ah=1am=15
 to know when to connect.

 Looking forward to our first post-GCC gathering,

 Dave C and Hans-Rudolf

 --
 http://galaxyproject.org/
 http://getgalaxy.org/
 http://usegalaxy.org/
 https://wiki.galaxyproject.org/





Re: [galaxy-dev] Dataset Collections status

2015-08-19 Thread Suderman Keith
Hi John,

I will try what we have against master.  I just went through my old emails and 
it looks like the developer branch was recommended in response to a UI issue I 
experienced with large dataset collections and not the collections themselves.  
For reference, the UI issue occurred when I inadvertently created 4K history 
items and jQuery kept timing out trying to update all the checkboxes being 
created.

The "Text file busy" error occurred on my developer machine (OS X 10.9.5,
Python 2.7.9) with no job runner, cluster manager, or NFS. I will run more
tests and file proper bug reports for both issues if I can still recreate them.

Cheers
Keith

On Aug 19, 2015, at 9:48 AM, John Chilton jmchil...@gmail.com wrote:



Re: [galaxy-dev] new visualization tool for composite datatype?

2015-08-19 Thread Carl Eberhard
Yes, you can still use the framework to create interactive Python-only
visualizations. The simplest way to do this is form-based interaction.

1. The visualization mako page renders a form that contains the options the
user can choose from
2. The user chooses and submits the form
3. Galaxy will re-render the visualization mako and now, since those
options are passed as parameters, the chosen visualization can be rendered

So you can render the form with your options:
%if not query.get( 'param2', None ):
    ## initial render of form
    <form>
        <input name="param2" />
        <input name="param3" />
        <input name="param4" />
        <button type="submit">Draw</button>
    </form>
%else:
    ## second render drawing with param2
    ...
%endif

Note: this will default to a GET-based form whose 'action' URL will be the
visualization's URL. Any HTML5 form controls will work: radio
buttons, ranges, etc.

When the user submits the form, each of the params will be part of the
query string when the mako renders the second time:
/visualization/show/myvis?param2=something&param3=...

You can then use these params to call your python code with the user chosen
options:
arg2 = query.get( 'param2', default )
arg3 = query.get( 'param3', ...

from draw import main
returned = main( arg2, arg3, ... )

<svg>
${ returned }
</svg>
(or something similar)
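The two-pass round trip above can be tried outside Galaxy first. This is a plain-Python sketch, not Galaxy's visualization API: render() plays the role of the mako template by reading the query string of the visualization URL, and main() is a hypothetical stand-in for draw.main that returns SVG body markup. Parameter names follow the example above; the defaults dict is an assumption.

```python
from urllib.parse import parse_qs, urlsplit
from xml.sax.saxutils import escape


def main(arg2, arg3, arg4):
    """Hypothetical stand-in for draw.main(): returns SVG body markup."""
    lines = []
    for i, label in enumerate((arg2, arg3, arg4), start=1):
        lines.append(f'<text x="10" y="{20 * i}">{escape(label)}</text>')
    return "\n".join(lines)


def render(url, defaults):
    """Mimic the two renders: no param2 -> form (None here); otherwise draw."""
    query = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    if not query.get("param2"):
        return None  # first render: the mako page would emit the <form> here
    args = [query.get(name, defaults[name]) for name in ("param2", "param3", "param4")]
    return "<svg>\n" + main(*args) + "\n</svg>"
```

Because every choice travels in the query string, the second URL alone is enough to reproduce a plot, which is what makes the saved/auto-start URLs mentioned below possible.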

Other things that may help:

   - you can use method=post if you want to post data instead of get -
   handy for larger parameters
   - try not to bootstrap/load all your data into the mako file, as it
   will be re-fetched every time the form is re-submitted
   - you can add the arg2, arg3, etc. params to the config file and the
   registry will parse them for you and put them in the template's 'config'
   variable, e.g.: int_param2 = config.param2
   - when using these query/form parameters you can save/generate URLs
   that will auto-start your visualizations, since all the configuration
   options are sent in the URL


Let me know if I can give more info or that doesn't end up working.


On Wed, Aug 19, 2015 at 9:30 AM, Anne Claire Fouilloux 
a.c.fouill...@geo.uio.no wrote:

 Hi Carl,


 Yes, I have Python code to manipulate and draw the data (it creates SVG
 files), and I wish to add this visualization tool to my Galaxy instance:

 Let's say my composite datatype contains 3 files:

 -file1.dat

 -file2.dat

 -file3.dat

 My Python code takes 4 arguments (for instance: draw.py arg1 arg2 arg3
 arg4).

 arg1: The first argument is a path giving the location of these 3 files.

 arg2, arg3 and arg4 are lists of options. These lists are dynamic as they
 depend on the content of these 3 files.

 Once the user has selected some options, I have a draw button and would
 like the python script draw.py to be executed and generate a plot.

 Your answer was very helpful; at least it clarifies a number of things.
 Now I understand I can access these files using hda.extra_files_path in
 my mako file:

 <%
 composite_file_path=hda.extra_files_path + '/' + date + '/'
 %>

 This is helpful to generate my select buttons (for arg2, arg3 and arg4),
 as I first need to read the content of these files to generate
 these buttons.
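A sketch of that step, assuming each of file1.dat, file2.dat and file3.dat lists one option per line (the real file format is not described in this thread). The hypothetical helpers below could be imported from the mako page's <% %> block with hda.extra_files_path passed in; os.path.join is used instead of concatenating '/' by hand.

```python
import os
from html import escape


def read_options(extra_files_path, filename):
    """Read one option per non-empty line from a file in the composite
    dataset directory (the one-option-per-line format is an assumption)."""
    with open(os.path.join(extra_files_path, filename)) as handle:
        return [line.strip() for line in handle if line.strip()]


def select_html(name, options):
    """Render an HTML <select>; its value is submitted as query parameter `name`."""
    items = "".join(
        f'<option value="{escape(opt)}">{escape(opt)}</option>' for opt in options
    )
    return f'<select name="{escape(name)}">{items}</select>'
```

For example, select_html("param2", read_options(hda.extra_files_path, "file1.dat")) would produce the control whose value arrives as param2 on the second render.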


 I still need to call my Python script with the options chosen by the user.
 Is this something I can easily do with Galaxy?


 Thanks,


 Anne.



 --
 *From:* Carl Eberhard carlfeberh...@gmail.com
 *Sent:* 18 August 2015 18:59
 *To:* Anne Claire Fouilloux
 *Cc:* galaxy-dev@lists.galaxyproject.org
 *Subject:* Re: [galaxy-dev] new visualization tool for composite datatype?

 Hi, Anne

 Can you be more specific about how you'd like to visualize the files? It
 sounds like you've already got a tool that builds an HTML file, which is
 the classic way to make visualizations with Galaxy.

 Now you'd like to make it more interactive? Do you already have javascript
 or python code that will be used to manipulate or draw the data? The
 visualization registry will help you link a user's data to your
 visualization code but will not create the code that does the rendering or
 interaction.

 Aside from that, if you know the structure and filenames of your
 composite's extra files directory, currently you can access those files
 using:
 api/histories/{history_id}/contents/{content_id}/display?filename=some
 file
 The MIME type will generally be guessed based on the extension, and more
 complex paths can also be specified using the filename parameter.

 So, for example, if I have a FastQC Html composite file with an extra
 files path of:
 database/files/000/dataset_83_files

 And within that directory, fastqc makes an additional subdirectory:
 sample1_fastqc/Icons
 sample1_fastqc/Images
 sample1_fastqc/summary.txt

 I can fetch the summary.txt file (with the 'text/plain' MIME type) using:

 api/histories/{history_id}/contents/{content_id}/display?filename=sample1_fastqc/summary.txt
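The same endpoint can be called from a script. A minimal sketch: the path and the filename parameter are the ones given above, while the host, the ids and passing an API key as a key query parameter are placeholders/assumptions. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def composite_file_url(galaxy_url, history_id, content_id, filename, api_key=None):
    """Build the display URL for one file inside a composite dataset's extra files."""
    params = {"filename": filename}
    if api_key:
        params["key"] = api_key
    # Keep '/' readable in nested paths such as sample1_fastqc/summary.txt.
    query = urllib.parse.urlencode(params, safe="/")
    return (f"{galaxy_url.rstrip('/')}/api/histories/{history_id}"
            f"/contents/{content_id}/display?{query}")


def fetch_text(url):
    """Download the file and decode it as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```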

 Let me know if that wasn't the info you were after or I misunderstood the
 question.
 Carl


 On Fri, Aug