[galaxy-dev] Galaxy in Amazon
Hi all - We are running a local instance of Galaxy on our internal infrastructure. It seems to be going well. We've gotten to the point where we are ready to migrate our NGS data to Amazon for storage in S3. We are also looking at how Galaxy can be used in Amazon. Specifically, we are interested in understanding:

1) Should we run an instance of Galaxy in Amazon, or continue to run it locally (to minimize costs) but have it run analyses in Amazon?
2) Regardless of how we run it, data will be stored in S3. How will Galaxy interact with S3 for its Data Libraries?
3) Is it even possible to separate the Galaxy web interface from the HPC cluster?
4) We understand Galaxy in Amazon uses CloudMan. Can we run this in our VPC with our own AMI?

If anyone can provide insights into how they are using Galaxy in Amazon, I am very interested to hear your thoughts.

Ryan

___ Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-dev] Galaxy in Amazon
Hi Ryan,

What you're suggesting is still somewhat experimental, but we're continuing to work on making it more robust and more integrated into the Galaxy ecosystem. There are really three general approaches:

1. Run Galaxy via CloudMan 100% on AWS. This option is the most robust and basically ready for use but, over time and if you decide to modify various pieces of the puzzle, will require an increasing understanding of how CloudMan works. It's also the most expensive option.
2. Run the Galaxy UI locally and create a CloudMan cluster on demand with the Pulsar service (http://pulsar.readthedocs.org/en/latest/index.html) enabled to accept jobs from the local Galaxy. This paper describes that approach: http://onlinelibrary.wiley.com/doi/10.1002/cpe.3536/abstract
3. Run your Galaxy UI locally and create Ansible roles/tasks to dynamically acquire cloud instances and assemble them into a cluster. You will probably want to use Pulsar for job management here as well. This option gives you the most control but also means you'll need to build the system yourself. Nate may have more comments about this.

I've also put some comments about your specific questions inline. Hope this helps clarify the situation at least. We're actively working on this scenario, so things should get easier in the future. Let us know if you have more questions and what you decide.

Cheers, Enis

On Wed, Aug 19, 2015 at 8:53 AM, Ryan G ngsbioinformat...@gmail.com wrote: Hi all - We are running a local instance of Galaxy on our internal infrastructure. It seems to be going well. We've gotten to the point where we are ready to migrate our NGS data to Amazon for storage in S3. We are also looking at how Galaxy can be used in Amazon. Specifically, we are interested in understanding: 1) Should we run an instance of Galaxy in Amazon, or continue to run it locally (to minimize costs) but have it run analyses in Amazon?

The options above summarize this scenario.
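(Editor's sketch, not from the thread: for option 2, the local Galaxy routes jobs to the remote Pulsar server through a job_conf.xml destination. A minimal fragment assuming the Pulsar REST runner that ships with Galaxy; the endpoint URL and token are placeholders, and parameter names should be verified against the Pulsar docs linked above.)

```xml
<!-- job_conf.xml fragment: send jobs to a remote Pulsar server.
     The URL and token below are hypothetical placeholders. -->
<plugins>
    <plugin id="pulsar" type="runner"
            load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
</plugins>
<destinations default="pulsar_cloud">
    <destination id="pulsar_cloud" runner="pulsar">
        <param id="url">https://pulsar.example.org:8913/</param>
        <param id="private_token">change-me</param>
    </destination>
</destinations>
```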
2) Regardless of how we run it, data will be stored in S3. How will Galaxy interact with S3 for its Data Libraries?

Galaxy implements an Object Store interface that can use S3 as a back-end data store. It has been around for a number of years and has been demonstrated to work, but it hasn't seen much production use, so I'd suggest testing it first. Galaxy's configuration options for the object store are in Galaxy's config file: https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.ini.sample#L289

3) Is it even possible to separate the Galaxy web interface from the HPC cluster?

Yes; you either need a shared file system between the resources or you can use Pulsar.

4) We understand Galaxy in Amazon uses CloudMan. Can we run this in our VPC with our own AMI?

Yes; you can build your own version of the system with the tools and whatever else you would like to configure. Docs on how to do this are available here: https://wiki.galaxyproject.org/CloudMan/Building

If anyone can provide insights into how they are using Galaxy in Amazon, I am very interested to hear your thoughts. Ryan
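(Editor's sketch, not from the thread: the object store section of galaxy.ini looks roughly like the following. Credentials and the bucket name are placeholders, and the option names should be verified against the galaxy.ini.sample linked above.)

```ini
# Use S3 as Galaxy's object store (sketch; verify names in galaxy.ini.sample)
object_store = s3
os_access_key = YOUR_AWS_ACCESS_KEY
os_secret_key = YOUR_AWS_SECRET_KEY
os_bucket_name = my-galaxy-datasets
# Local disk cache for datasets pulled down from S3
object_store_cache_path = database/object_store_cache
```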
Re: [galaxy-dev] Unable to run simple Docker tool
2015-08-19 15:50 GMT+02:00 John Chilton jmchil...@gmail.com:

I don't know, to be honest - there are a couple of things to verify. I wrote a bunch of debugging suggestions at the end of this e-mail that you can try if I am wrong, but after writing them I realized the likely problem: you are not using sudo when running docker in your examples. The job is paused because sudo is waiting for a password, and you probably don't have passwordless sudo set up. I would just add

<param id="docker_sudo">false</param>

to your job_conf destination and this should work.

Yes, this worked. Thanks! Regards

The other things to try: Does the tool work without Docker, if you just place test-io.sh on Galaxy's PATH? If yes, I would set cleanup_job = never in galaxy.ini and try again. Once it dies, grab this part of the command line:

sudo docker run -e GALAXY_SLOTS=$GALAXY_SLOTS -v /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy:/home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy:ro -v /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/tools/catDocker:/home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/tools/catDocker:ro -v /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/job_working_directory/000/2:/home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/job_working_directory/000/2:rw -v /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/files:/home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/files:rw -w /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/job_working_directory/000/2 --net none --rm -u 1001 mikeleganaaranguren/busybox-galaxy-test-io:v1 /home/mikel/UPV-EHU/SADI-Docker-Galaxy/galaxy/database/job_working_directory/000/2/tool_script.sh

and try to debug the problem outside of Galaxy, for instance by using docker logs. If the logs don't reveal anything, try dropping some of the command-line arguments - e.g. --net none or -u 1001 - to see whether one of them is the problem.
-John

On Tue, Aug 18, 2015 at 6:32 PM, Mikel Egaña Aranguren mikel.egana.arangu...@gmail.com wrote:

Hi; I'm trying to develop a Docker-based tool, as suggested by a peer-reviewer who might be reading this :P However, I'm having trouble with even the most basic setup, and I don't know what might be wrong, so any help will be much appreciated. I have developed a very simple Docker image and a corresponding Galaxy tool, so that I can get it working before starting on the actual tool, but when I execute it through Galaxy it simply stays executing forever instead of failing or terminating. My image simply executes a shell script that reads the content of a file and concatenates a string to it.

The image:

FROM busybox:ubuntu-14.04
MAINTAINER Mikel Egaña Aranguren mikel.egana.arangu...@gmail.com
RUN mkdir /sadi
COPY test-io.sh /sadi/
RUN chmod a+x /sadi/test-io.sh
ENV PATH $PATH:/sadi

The test-io.sh script within the image:

#!/bin/sh
cat $1
echo AAA

Invoking the container and executing the script through a normal shell works fine:

REPOSITORY                                   TAG   IMAGE ID       CREATED          VIRTUAL SIZE
mikeleganaaranguren/busybox-galaxy-test-io   v1    9c2b8bdade1d   54 minutes ago   5.609 MB

docker run -i -t mikeleganaaranguren/busybox-galaxy-test-io:v1
BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ # echo BBB > test
~ # test-io.sh test
BBB
AAA

This is the tool file I'm using in Galaxy:

<tool id="SADIBUSYBOX" name="SADIBUSYBOX">
    <description>IO</description>
    <requirements>
        <container type="docker">mikeleganaaranguren/busybox-galaxy-test-io:v1</container>
    </requirements>
    <command>test-io.sh $input $output</command>
    <inputs>
        <param name="input" type="data" label="Dataset"/>
    </inputs>
    <outputs>
        <data format="txt" name="output"/>
    </outputs>
    <help></help>
</tool>

And my job_conf.xml:

<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is
     configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
    </plugins>
    <handlers>
        <handler id="main"/>
    </handlers>
    <destinations default="docker_local">
        <destination id="local" runner="local"/>
        <destination id="docker_local" runner="local">
            <param id="docker_enabled">true</param>
        </destination>
    </destinations>
</job_conf>

As I said, when I execute the tool in Galaxy, it simply executes forever - it stays in a yellow state until I kill the Galaxy server. The log says:

127.0.0.1 - - [18/Aug/2015:19:08:00 +0200] GET /tool_runner?tool_id=SADIBUSYBOX HTTP/1.1 200 - http://127.0.0.1:8080/ Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0
galaxy.tools.actions INFO 2015-08-18 19:08:07,178 Handled output
[galaxy-dev] Get dataset/API ids for a dataset
Hello Galaxy-dev, Thank you so much for all the help you have given me. I have a question about dataset ids in Galaxy. As background, I am running my own Galaxy instance on a server. A pipeline implemented in the lab produces the following files in the history:

1) 2 BAM files
2) A JSON file

My goal is to use this JSON file to pass the path/URL of the BAM files into a custom JS we wrote for visualization purposes. This JSON file contains, among many other details, the paths/URLs to the above BAM files. I am using JSON filetypes to send data to the JS visualization within Galaxy. To do this, I have my own JS which loads a BAM file from the provided URL into an IGV.js track. IGV.js, which is responsible for making the tracks, expects a valid URL, which is updated in the JSON file in this manner:

1) Extract the API key and history id from a loaded BAM file
2) Edit the JSON file to reflect the BAM file's dataset id, to be something like this:

{ CLL-HL_pilot.r1.fastq: { DNA: /datasets/36ddb788a0f14eb3/display?to_ext=bam, ...

This works fine if I already know the dataset ids for the BAM files. When a pipeline executes, dataset ids are generated for each output. I want to access these ids, include them in the JSON file, and load the updated JSON file into the history with the BAMs. Is there a way to get the ids from the history in this manner?

Thank you, Asma
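(Editor's sketch, not from the thread: one way to approach this is Galaxy's history contents REST endpoint, GET /api/histories/{history_id}/contents?key=<api_key>, which returns each dataset's encoded id, name, and extension; the display URLs can then be built from those records. The helper name and sample records below are invented for illustration; the JSON would come from the API via `requests` or BioBlend.)

```python
# Sketch: derive display URLs for BAM datasets in a history, assuming the
# history contents have already been fetched from Galaxy's REST API
# (GET /api/histories/{history_id}/contents?key=<api_key>).

def bam_display_urls(galaxy_url, contents):
    """Map dataset name -> display URL for every BAM dataset in `contents`.

    `contents` is the decoded JSON list from the history contents endpoint;
    each item carries an encoded `id`, a `name`, and an `extension`.
    """
    urls = {}
    for item in contents:
        if item.get("extension") == "bam" and not item.get("deleted", False):
            urls[item["name"]] = "%s/datasets/%s/display?to_ext=bam" % (
                galaxy_url.rstrip("/"),
                item["id"],
            )
    return urls

# Example with the kind of records the API returns (ids are made up):
contents = [
    {"id": "36ddb788a0f14eb3", "name": "sample1.bam", "extension": "bam", "deleted": False},
    {"id": "a799d38679e985db", "name": "report.json", "extension": "json", "deleted": False},
]
print(bam_display_urls("http://localhost:8080", contents))
```

The resulting dict can then be merged into the pipeline's JSON file before uploading it back to the history.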
Re: [galaxy-dev] Dataset Collections status
On Fri, Aug 7, 2015 at 8:01 PM, Keith Suderman suder...@cs.vassar.edu wrote: Greetings, I started pulling Galaxy code from the dev branch a few months ago to take advantage of the (then just emerging) dataset collections feature. However, it is not clear to me from the latest release notes if the dataset collections are now fully merged into master, or if I should continue to use the code in the dev branch to take advantage of bleeding-edge code. I would like to move back to the master branch as soon as feasible.

It is an ongoing effort - but the master branch as of now contains essentially everything in the dev branch https://github.com/galaxyproject/galaxy/tree/master. I need to put together some release notes for 15.07 before there can be an announcement of that, but there are a few new collection-related things in the release. In some senses collections have been fully usable for over a year - and in some senses there is a lot of work left to do. It kind of depends on what you are doing.

When running workflows over dataset collections I will frequently see errors like: /bin/sh: 1: /home/galaxy/galaxy_old/database/job_working_directory/001/1216/galaxy_1216.sh: Text file busy

I don't think this is related to collections per se - I think it is probably more of a file system problem - are you using a local job runner or a cluster manager? Is the file system mounted over a slow NFS connection?

Which, from what I can tell, occurs when one process is trying to modify/delete a file open in another process. While the error seems to be repeatable, it also seems random, as the errors do not occur in the same places if I run the workflow multiple times. Given that I am working from the dev branch I don't want to open/raise issues on features still in development. But if this is unexpected then I can do some more investigating and file a proper bug report.
I would say report bugs in dev always - maybe check the existing ones on GitHub and Trello first - but ideally we would like to catch bugs as early as possible, and we don't usually commit half-baked code to dev - it should be bug-free (though maybe missing features).

Hope this helps, -John

Cheers, Keith -- Research Associate Department of Computer Science Vassar College Poughkeepsie, NY
Re: [galaxy-dev] 20 August GalaxyAdmins Meetup: Genomic Data Management; Tool Installation Automation
Hi All, Just a reminder that the next GalaxyAdmins online meetup is tomorrow, Thursday, August 20 (https://wiki.galaxyproject.org/Community/GalaxyAdmins/Meetups/2015_08_20), at your local time (http://www.timeanddate.com/worldclock/fixedtime.html?msg=GalaxyAdmins+August+2015+Web+Meetup&iso=20150820T17&p1=1229&ah=1&am=15). It can take a few minutes to connect, so you are encouraged to start the process a few minutes early. Hope to see you tomorrow, Dave C

On Thu, Aug 13, 2015 at 10:47 AM, Dave Clements cleme...@galaxyproject.org wrote: Hello all, The next GalaxyAdmins online meetup is next week, on Thursday, August 20 (https://wiki.galaxyproject.org/Community/GalaxyAdmins/Meetups/2015_08_20). The featured topics are:

- *Genomic data management at Canada's National Microbiology Laboratory (https://www.nml-lnm.gc.ca/index-eng.htm) with IRIDA (http://www.irida.ca/) and Galaxy*, presented by Aaron Petkau (https://github.com/apetkau)
- *Adding transparency and automation into the Galaxy tool installation process (https://github.com/afgane/ansible-tools)*, presented by Enis Afgan (https://wiki.galaxyproject.org/EnisAfgan)

The meetup will use the same Adobe Connect technology (https://wiki.galaxyproject.org/Community/GalaxyAdmins/Meetups/2015_08_20#Call_Technology) used on previous calls. It takes a couple of minutes to connect, so you are encouraged to start connecting a few minutes early. See your local time (http://www.timeanddate.com/worldclock/fixedtime.html?msg=GalaxyAdmins+August+2015+Web+Meetup&iso=20150820T17&p1=1229&ah=1&am=15) to know when to connect. Looking forward to our first post-GCC gathering, Dave C and Hans-Rudolf

-- http://galaxyproject.org/ http://getgalaxy.org/ http://usegalaxy.org/ https://wiki.galaxyproject.org/
Re: [galaxy-dev] Dataset Collections status
Hi John, I will try what we have against master. I just went through my old emails, and it looks like the developer branch was recommended in response to a UI issue I experienced with large dataset collections, not the collections themselves. For reference, the UI issue occurred when I inadvertently created 4K history items and jQuery kept timing out trying to update all the checkboxes being created. The Text file busy error occurred on my development machine (OS X 10.9.5, Python 2.7.9) with no job runner, cluster manager, or NFS. I will run more tests and file proper bug reports for both issues if I can still recreate them.

Cheers, Keith

On Aug 19, 2015, at 9:48 AM, John Chilton jmchil...@gmail.com wrote: On Fri, Aug 7, 2015 at 8:01 PM, Keith Suderman suder...@cs.vassar.edu wrote: Greetings, I started pulling Galaxy code from the dev branch a few months ago to take advantage of the (then just emerging) dataset collections feature. However, it is not clear to me from the latest release notes if the dataset collections are now fully merged into master, or if I should continue to use the code in the dev branch to take advantage of bleeding-edge code. I would like to move back to the master branch as soon as feasible. It is an ongoing effort - but the master branch as of now contains essentially everything in the dev branch https://github.com/galaxyproject/galaxy/tree/master. I need to put together some release notes for 15.07 before there can be an announcement of that, but there are a few new collection-related things in the release. In some senses collections have been fully usable for over a year - and in some senses there is a lot of work left to do. It kind of depends on what you are doing.
When running workflows over dataset collections I will frequently see errors like: /bin/sh: 1: /home/galaxy/galaxy_old/database/job_working_directory/001/1216/galaxy_1216.sh: Text file busy I don't think this is related to collections per se - I think it is probably more of a file system problem - are you using a local job runner or a cluster manager? Is the file system mounted over a slow NFS connection? Which, from what I can tell, occurs when one process is trying to modify/delete a file open in another process. While the error seems to be repeatable, it also seems random, as the errors do not occur in the same places if I run the workflow multiple times. Given that I am working from the dev branch I don't want to open/raise issues on features still in development. But if this is unexpected then I can do some more investigating and file a proper bug report. I would say report bugs in dev always - maybe check the existing ones on GitHub and Trello first - but ideally we would like to catch bugs as early as possible, and we don't usually commit half-baked code to dev - it should be bug-free (though maybe missing features). Hope this helps, -John Cheers, Keith -- Research Associate Department of Computer Science Vassar College Poughkeepsie, NY
Re: [galaxy-dev] new visualization tool for composite datatype?
Yes, you can still use the framework to create interactive Python-only visualizations. The simplest way to do this is form-based interaction:

1. The visualization mako page renders a form that contains the options the user can choose from
2. The user chooses and submits the form
3. Galaxy re-renders the visualization mako and now, since those options are passed as parameters, the chosen visualization can be rendered

So you can render the form with your options:

%if not query.get( 'param2', None ):
    ## initial render of the form
    <form>
        <input name="param2" />
        <input name="param3" />
        <input name="param4" />
        <button type="submit">Draw</button>
    </form>
%else:
    ## second render: draw using param2 ...
%endif

Note: this will default to a GET-based form whose 'action' URL will be the visualization's URL. Any HTML5 form controls will work: radio buttons, ranges, etc. When the user submits the form, each of the params will be part of the query string when the mako renders the second time:

/visualization/show/myvis?param2=something&param3=...

You can then use these params to call your Python code with the user-chosen options:

<%
    arg2 = query.get( 'param2', default )
    arg3 = query.get( 'param3', ... )
    from draw import main
    returned = main( arg2, arg3, ... )
%>
<svg>${ returned }</svg>

(or something similar)

Other things that may help:
- you can use method="post" on the form if you want to POST the data instead - handy for larger parameters
- try not to bootstrap/load all your data into the mako file, as it will be re-fetched every time the form is re-submitted
- you can add the arg2, arg3, etc. params to the config file and the registry will parse those for you and put them in the template's 'config' variable, e.g.: int_param2 = config.param2
- when using these query/form parameters you can now save/generate URLs that will auto-start your visualization, since all the configuration options are sent in the URL

Let me know if I can give more info or if that doesn't end up working.
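(Editor's sketch, not from the thread: since the template receives every query/form parameter as a string, or not at all, the second render usually needs a small coercion step before calling the drawing code. The parameter names mirror the example above; the defaults and types are invented for illustration.)

```python
# Coerce the query/form parameters (all strings, possibly absent) into the
# typed arguments the drawing code expects. `query` stands in for the dict
# of request parameters the mako template sees.

DEFAULTS = {"param2": 10, "param3": "linear", "param4": 1.0}

def parse_params(query):
    """Return a dict of typed parameters, falling back to DEFAULTS."""
    params = {}
    for name, default in DEFAULTS.items():
        raw = query.get(name)
        if raw is None or raw == "":
            params[name] = default
        else:
            # coerce to the type of the default (int, str, float, ...)
            params[name] = type(default)(raw)
    return params

print(parse_params({"param2": "25", "param4": "0.5"}))
# → {'param2': 25, 'param3': 'linear', 'param4': 0.5}
```

The resulting dict can be unpacked straight into the drawing entry point (e.g. `main(**params)` in the mako snippet above).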
On Wed, Aug 19, 2015 at 9:30 AM, Anne Claire Fouilloux a.c.fouill...@geo.uio.no wrote:

Hi Carl, Yes, I have Python code to manipulate and draw the data (it creates SVG files), and I wish to add this visualization tool to my Galaxy instance. Let's say my composite datatype contains 3 files:

- file1.dat
- file2.dat
- file3.dat

My Python code takes 4 arguments (for instance draw.py arg1 arg2 arg3 arg4). arg1 is a path giving the location of these 3 files. arg2, arg3 and arg4 are lists of options. These lists are dynamic, as they depend on the content of the 3 files. Once the user has selected some options, I have a draw button and would like the Python script draw.py to be executed to generate a plot. Your answer was very helpful; at least it clarifies a number of things. Now I understand I can access these files using hda.extra_files_path in my mako file:

<% composite_file_path = hda.extra_files_path + '/' + date + '/' %>

which is helpful for generating my select buttons (for arg2, arg3 and arg4), as I first need to read the content of these files to generate the buttons. I still need to call my Python script with the options chosen by the user. Is this something I can easily do with Galaxy? Thanks, Anne.

-- *From:* Carl Eberhard carlfeberh...@gmail.com *Sent:* 18 August 2015 18:59 *To:* Anne Claire Fouilloux *Cc:* galaxy-dev@lists.galaxyproject.org *Subject:* Re: [galaxy-dev] new visualization tool for composite datatype?

Hi, Anne. Can you be more specific about how you'd like to visualize the files? It sounds like you've already got a tool that builds an HTML file, which is the classic way to make visualizations with Galaxy. Now you'd like to make it more interactive? Do you already have JavaScript or Python code that will be used to manipulate or draw the data? The visualization registry will help you link a user's data to your visualization code, but it will not create the code that does the rendering or interaction.
Aside from that, if you know the structure and filenames of your composite's extra files directory, you can currently access those files using:

api/histories/{history_id}/contents/{content_id}/display?filename=some file

The MIME type will generally be guessed based on the extension, and more complex paths can also be specified using the filename parameter. So, for example, if I have a FastQC HTML composite file with an extra files path of:

database/files/000/dataset_83_files

and within that directory fastqc makes additional files and subdirectories:

sample1_fastqc/Icons
sample1_fastqc/Images
sample1_fastqc/summary.txt

I can fetch the summary.txt file (with the 'text/plain' MIME type) using:

api/histories/{history_id}/contents/{content_id}/display?filename=sample1_fastqc/summary.txt

Let me know if that wasn't the info you were after or if I misunderstood the question. Carl

On Fri, Aug