[galaxy-dev] Troubleshooting file uploads (to Data Library)
Hi,

I am having trouble uploading files to a Data Library, and I'm not sure where to begin troubleshooting. I'm uploading from a URL (but I had a similar issue uploading from the desktop). The symptom is that the datasets in the library show the message "This job is queued" and never seem to progress.

I am one of very few users of this instance (quite likely the only user right this moment). I don't think the server is busy, so I'm not sure why the file uploads don't seem to be proceeding. How can I investigate further?

Thanks,
Clare

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-dev] Troubleshooting file uploads (to Data Library)
Sorry, I missed some information: there are a handful of files, 35MB (gzipped) each. The issue occurs even if I only try to upload one of them, though. The server is a 16-core machine.

On 21 August 2013 17:08, Clare Sloggett <s...@unimelb.edu.au> wrote:
> [quoted message trimmed]

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
Re: [galaxy-dev] Troubleshooting file uploads (to Data Library)
Hi Hans and all,

The issue turned out to be a more general one with our job runners (it was also stopping non-data-transfer jobs from running, if I'd noticed). Yep, I do specify the file format when uploading.

Thanks for your help!
Clare

On 21 August 2013 17:38, Hans-Rudolf Hotz <h...@fmi.ch> wrote:
> Hi Clare
>
> A few points to start:
> - do you define the 'File Format'? (don't use 'Auto-detect' for big files)
> - and, similar to a recent question on the list: check your proxy settings
>
> Regards, Hans-Rudolf
>
> [earlier messages in this thread trimmed]
[galaxy-dev] two versions of show datasets API call?
Hi all,

This is a using-the-API question; I'm not sure if it belongs in galaxy-dev or galaxy-user!

There seem to be two ways to retrieve metadata for a dataset, one in the Histories API and one in the Datasets API, and they return different information. For instance, if I call

  http://galaxy-vic.genome.edu.au/api/datasets/cb5d3b9eef2b9275?key=myapikey

I see:

{
  "data_type": "fastqsanger",
  "deleted": false,
  "file_size": 16439610,
  "genome_build": "?",
  "id": 397,
  "metadata_data_lines": null,
  "metadata_dbkey": "?",
  "metadata_sequences": null,
  "misc_blurb": "15.7 MB",
  "misc_info": "uploaded fastqsanger file",
  "model_class": "HistoryDatasetAssociation",
  "name": "https://bioblend.s3.amazonaws.com/C1_R1_1.chr4.fq",
  "purged": false,
  "state": "ok",
  "visible": true
}

But if I call

  http://galaxy-vic.genome.edu.au/api/histories/fb4122d2ca33443e/contents/cb5d3b9eef2b9275?key=myapikey

I see:

{
  "accessible": true,
  "api_type": "file",
  "data_type": "fastqsanger",
  "deleted": false,
  "display_apps": [],
  "download_url": "/datasets/cb5d3b9eef2b9275/display?to_ext=fastqsanger",
  "file_ext": "fastqsanger",
  "file_name": "/mnt/all/cloudman/galaxy/clare/files/000/dataset_375.dat",
  "file_size": 16439610,
  "genome_build": "?",
  "hid": 1,
  "history_id": "fb4122d2ca33443e",
  "id": "cb5d3b9eef2b9275",
  "metadata_data_lines": null,
  "metadata_dbkey": "?",
  "metadata_sequences": null,
  "misc_blurb": "15.7 MB",
  "misc_info": "uploaded fastqsanger file",
  "model_class": "HistoryDatasetAssociation",
  "name": "https://bioblend.s3.amazonaws.com/C1_R1_1.chr4.fq",
  "peek": "<table cellspacing=\"0\" cellpadding=\"3\"><tr><td>@9453842/1</td></tr><tr><td>CAGATTATGGAATCACTTGAAACTGATATTAATTGCCGAAAGATGCATCTTTCACGTTAGGAAATGTTGCT</td></tr><tr><td>+</td></tr><tr><td>III</td></tr><tr><td>@9454359/1</td></tr><tr><td>GGAAATGAGTACAGCTATGCAACAGCTATCAGTAAGGCCGAAGAGTTTGATACTATTTCTGCATTGA</td></tr></table>",
  "purged": false,
  "state": "ok",
  "visible": true,
  "visualizations": []
}

The second version gives me much more information, including the history ID (but, ironically, requires the history ID to make the call in the first place).
What I would ideally like is an API call which only requires knowledge of the dataset ID but provides all the information in the second call. I am also a bit confused by the existence of the two different methods in the first place. Is it necessary for it to be this way, or are they just there for historical reasons? If it would be desirable either to have only one show-dataset REST method or to make the behaviour of the two identical, should I add a Trello card for this?

Thanks,
Clare

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
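[Editor's aside, not part of the original thread: a quick way to see exactly what the histories/contents variant adds over the datasets call is to diff the field names of the two responses quoted above. The field names below are copied from those responses.]

```python
# Field names copied from the two API responses quoted above.
datasets_fields = {
    "data_type", "deleted", "file_size", "genome_build", "id",
    "metadata_data_lines", "metadata_dbkey", "metadata_sequences",
    "misc_blurb", "misc_info", "model_class", "name", "purged",
    "state", "visible",
}
contents_only = {
    "accessible", "api_type", "display_apps", "download_url", "file_ext",
    "file_name", "hid", "history_id", "peek", "visualizations",
}
contents_fields = datasets_fields | contents_only

# The histories/contents call returns a strict superset of the datasets call.
print(sorted(contents_fields - datasets_fields))
```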
Re: [galaxy-dev] BioBlend: Problem Running Example file
Oops, sorry, I meant to keep the list cc'd - resending.

On 18 March 2013 16:18, Clare Sloggett <s...@unimelb.edu.au> wrote:

Hi Rob,

Were you using a very old version of the library and examples, or was it quite recent? In any case, try grabbing the script from https://github.com/afgane/bioblend/tree/master/docs/examples - we have done a lot of bugfixing and documentation recently. If you just want to test your setup, there are also now some much simpler, less end-to-end example scripts that don't use any admin permissions, and we'll add a few more.

The create_library step is the step where you're going to run into problems if you don't have admin rights to the Galaxy instance. If you're running on localhost, though, you should be able to set up the account as an admin account.

Finally... we've put up some much better docs on using workflows (and libraries) in bioblend, which might be useful: http://bioblend.readthedocs.org/ and specifically http://bioblend.readthedocs.org/en/latest/api_docs/galaxy/docs.html#run-a-workflow

A lot of this is very recent, so if you run into any bugs or anything that is just not clear, let me know - the feedback is very helpful!

Cheers,
Clare

On 8 March 2013 16:01, Rob Leclerc <robert.lecl...@gmail.com> wrote:

I had trouble running blend4j, so I tried to jump into python (a language I have little experience with).
I tried running the example run_imported_workflow.py:

  % python run_imported_workflow.py http://localhost:80808c25bc83f6f9e4001dd21eb7b64f063f

but I get an error:

Initiating Galaxy connection
Importing workflow
Creating data library 'Imported data for API demo'
Traceback (most recent call last):
  File "run_imported_workflow.py", line 53, in <module>
    library_dict = gi.libraries.create_library(library_name)
  File "build/bdist.macosx-10.6-intel/egg/bioblend/galaxy/libraries/__init__.py", line 27, in create_library
  File "build/bdist.macosx-10.6-intel/egg/bioblend/galaxy/client.py", line 53, in _post
  File "build/bdist.macosx-10.6-intel/egg/bioblend/galaxy/__init__.py", line 132, in make_post_request
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-1.1.0-py2.7.egg/requests/models.py", line 604, in json
    return json.loads(self.text or self.content)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/simplejson-3.1.0-py2.7-macosx-10.6-intel.egg/simplejson/__init__.py", line 454, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/simplejson-3.1.0-py2.7-macosx-10.6-intel.egg/simplejson/decoder.py", line 374, in decode
    obj, end = self.raw_decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/simplejson-3.1.0-py2.7-macosx-10.6-intel.egg/simplejson/decoder.py", line 393, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Is there anything I am missing from the stock configuration which would cause this not to run out of the box?

Cheers,
Rob
--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
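[Editor's aside, not part of the original thread: in the command quoted above, the Galaxy URL and the API key appear to have been run together ("http://localhost:8080" followed immediately by a 32-character key), which would explain a JSONDecodeError - the script ends up POSTing to a nonsense URL and gets a non-JSON error page back. A minimal argument check like the sketch below would make that mistake obvious; the script and argument names are illustrative, not the actual example's code.]

```python
import sys

def parse_args(argv):
    """Expect exactly: run_imported_workflow.py <galaxy-url> <api-key>."""
    if len(argv) != 3:
        raise SystemExit("usage: run_imported_workflow.py <galaxy-url> <api-key>")
    url, key = argv[1], argv[2]
    if not url.startswith(("http://", "https://")):
        raise SystemExit("first argument must be the Galaxy URL")
    return url, key

# The URL and the key must be two separate arguments:
url, key = parse_args(["run_imported_workflow.py",
                       "http://localhost:8080",
                       "8c25bc83f6f9e4001dd21eb7b64f063f"])
print(url, key)
```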
[galaxy-dev] Displayed versions of tools in Galaxy
Hi guys,

I wasn't sure if I should send this one to galaxy-user. I have just confused myself about the versions of tools displayed within Galaxy.

If I select the TopHat tool, the tool UI says "Tophat for Illumina (version 1.5.0)". After I have run the job, the step panel in the History window says "Info: TopHat v1.4.1".

If I select the Cufflinks tool, the tool UI says "Cufflinks (version 0.0.5)". After running the job, the step panel says "Info: cufflinks v1.3.0".

Looking at the XML files, it does look like the version displayed before running is the wrapper's, and the version displayed on the resulting dataset is that of the command-line tool:

  <tool id="tophat" name="Tophat for Illumina" version="1.5.0">
    <!-- Wrapper compatible with Tophat versions 1.3.0 to 1.4.1 -->

  <tool id="cufflinks" name="Cufflinks" version="0.0.5">
    <!-- Wrapper supports Cufflinks versions v1.3.0 and newer -->

So looking at the XML it's very clear, but from the versions displayed in Galaxy I was completely confused, particularly since the two TopHat version numbers happened to be similar (1.4.1 and 1.5.0). Maybe we should change it so that when Galaxy says "version" it is always explicit about whether it's the wrapper version or the tool version? It also seems like there's no way for a user to discover the command-line tool version without actually running the tool (or is there?). Is this because Galaxy itself does not know this information?

All this came about because I'm trying to specify to users which versions of tools my exported workflow was built with, and I'm not sure how to do it without confusing them.

Thanks,
Clare

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
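[Editor's aside, not part of the original thread: the version Galaxy shows in the tool UI is simply the "version" attribute of the wrapper's <tool> element quoted above; the underlying binary's version only appears in the XML comment and in the tool's runtime output. A quick sketch of reading the wrapper version from that XML:]

```python
import xml.etree.ElementTree as ET

# The tophat wrapper element quoted in the message above.
xml_src = ('<tool id="tophat" name="Tophat for Illumina" version="1.5.0">'
           '<!-- Wrapper compatible with Tophat versions 1.3.0 to 1.4.1 -->'
           '</tool>')

tool = ET.fromstring(xml_src)
# This is the wrapper's version (1.5.0), not TopHat's own (1.4.1), which
# lives only in the comment / the tool's stdout.
print(tool.get("id"), tool.get("version"))
```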
Re: [galaxy-dev] How to remove a broken toolshed install
Hi Greg,

> > We had some occasions where we'd try to install a tool from the
> > toolshed, and it would hang - it appeared that the hg pull was timing
> > out.
>
> Was the timeout a regular occurrence? If so, do you know the cause, and
> were you able to get it resolved?

It was repeated, but after a few tries the install would succeed without me really 'resolving' the issue. We haven't run into the issue in a while, and I honestly have no idea if this is due to a Galaxy upgrade or if we had a temporary run of bad luck with those particular tools.

> The October 5, 2012 Galaxy distribution news brief includes the following
> link to information about the process for handling repository
> installation errors, specifically when the errors occur during cloning.
>
> http://wiki.g2.bx.psu.edu/InstallingRepositoriesToGalaxy#Handling_repository_installation_errors
>
> If you're running an older version of Galaxy, you'll need to update to
> the October 5 release in order to have these features. The news brief
> release information is:
>
>   upgrade: $ hg pull -u -r b5bda7a5c345
>
> Let me know if this is not what you're looking for.

Thanks again, this is great. I think we have upgraded past that point now.

Clare

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
[galaxy-dev] Illumina adaptor sequences in tools - copyright?
Hi all,

We are looking at wrapping trimmomatic (http://www.usadellab.org/cms/index.php?page=trimmomatic). However, to run, it requires the Illumina adaptor sequences, which are copyrighted. I was wondering if anyone has already dealt with this issue when wrapping other tools and putting them up on a public Galaxy instance or in the Toolshed. For instance, I think that FastQC requires the same sequences.

I would imagine that Illumina wouldn't want to stop people using these sequences for analysis purposes, but I'm still thinking we might need some sort of permission. Have others dealt with this?

Thanks,
Clare

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
Re: [galaxy-dev] How to remove a broken toolshed install
Hi Greg,

Thanks for this!

On 17 October 2012 01:17, Greg Von Kuster <g...@bx.psu.edu> wrote:

> > I managed to break a toolshed-installed tool by fiddling with the files
> > under shed_tools.
>
> As you've discovered, this is not a good thing to try. Always use the
> Galaxy interface features to perform tasks like this.

Actually, the reason I did this was because I didn't know how to solve a different problem, so maybe I should ask you about that one as well.

We had some occasions where we'd try to install a tool from the toolshed, and it would hang - it appeared that the hg pull was timing out. In these cases the config files wouldn't get set up, but a partial repository was pulled and directories were created, and the repository files would then get in the way of trying to install the tool (it seemed to think it was already there). The only way to fix it seemed to be to manually delete the partially-pulled repository under shed_tools. This worked fine for fixing failed installs. But, this time, I thought (wrongly) that this had happened again and I deleted a repository - then realised that it was actually installed and registered in the database, etc.

So, if the hg pull times out, is there a right way to clean up the resulting files? I got in the habit of doing it manually, which of course is dangerous, because I didn't know any way to do it via the admin interface.

> Depending on the changes you've made, you should be able to do the
> following:
>
> 1. Manually remove the installed repository subdirectory hierarchy from
>    disk.
> 2. If the repository included any tools, manually remove entries for each
>    of them from the shed_tool_conf.xml file (or the equivalent file you
>    have configured for handling installed repositories).
> 3.
> Manually update the database using the following command (assuming your
> installed repository is named 'bcftools_view' and it is the only
> repository you have installed with that name) - letter capitalization is
> required. The following assumes you're using postgres:
>
>   update tool_shed_repository
>     set deleted=True, uninstalled=True, status='Uninstalled', error_message=Null
>     where name = 'bcftools_view';

Thanks very much! Yes, it's postgres. I'll let you know if I succeed.

Clare

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
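[Editor's aside, not part of the original thread: step 2 above (removing a repository's <tool> entries from shed_tool_conf.xml) can be scripted rather than hand-edited. The sketch below assumes a typical shed_tool_conf.xml layout where each <tool> entry carries a <repository_name> child; the file contents here are illustrative, and you should back up the real file before trying anything like this.]

```python
import xml.etree.ElementTree as ET

# Illustrative shed_tool_conf.xml fragment (layout assumed, not copied from
# a real instance).
conf = """<toolbox tool_path="shed_tools">
  <section id="samtools" name="SAMtools">
    <tool file="toolshed.g2.bx.psu.edu/repos/owner/bcftools_view/bcftools_view.xml">
      <repository_name>bcftools_view</repository_name>
    </tool>
    <tool file="toolshed.g2.bx.psu.edu/repos/owner/other_repo/other.xml">
      <repository_name>other_repo</repository_name>
    </tool>
  </section>
</toolbox>"""

def remove_repository_tools(root, repo_name):
    # ElementTree has no parent pointers, so walk every element (materialized
    # first, so mutation is safe) and prune matching <tool> children.
    for parent in list(root.iter()):
        for tool in list(parent.findall("tool")):
            if tool.findtext("repository_name") == repo_name:
                parent.remove(tool)
    return root

root = remove_repository_tools(ET.fromstring(conf), "bcftools_view")
# Only the unrelated repository's entry should remain.
print([t.findtext("repository_name") for t in root.iter("tool")])
```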
[galaxy-dev] How to remove a broken toolshed install
Hi all,

I managed to break a toolshed-installed tool by fiddling with the files under shed_tools. This led to a situation in which the Galaxy admin interface claims the tool is still installed, but can't find any files for it. I manually put the repository files back where I think they should go, but this didn't fix the situation, so what I really want to do is just get rid of it altogether and reinstall cleanly. I'm not certain that the tool was working properly before I fiddled with it, either.

Galaxy won't let me uninstall, deactivate or update it (because it can't find it properly) and it won't let me install it (because it thinks it's installed). It also seems (judging by the last of the errors below) to be unable to find some config information that it expects, but I don't really understand what's going on there.

So my question is: given a messy, screwed-up install, how can I completely remove it and start from scratch? What are the different components and config files I need to remove it from, and are they all manually accessible? Thanks in advance for any help!

If it's relevant to my question, here are some of the behaviours I see currently:

- The tool appears as "Installed" under Admin -> Manage installed tool shed repositories, but doesn't show up in the tools panel.
- If I try Repository Actions -> Get repository updates, I get the error: "The directory containing the installed repository named 'bcftools_view' cannot be found."
- But if I try Repository Actions -> Reset repository metadata, it apparently works; I get "Metadata has been reset on repository bcftools_view."
And, if I try to 'Deactivate or uninstall' the apparently-installed repository, I get:

URL: http://galaxy-tut.genome.edu.au/admin_toolshed/deactivate_or_uninstall_repository?id=a25e134c184d6e4b
Module paste.exceptions.errormiddleware:144 in __call__
  app_iter = self.application(environ, sr_checker)
Module paste.debug.prints:106 in __call__
  environ, self.app)
Module paste.wsgilib:543 in intercept_output
  app_iter = application(environ, replacement_start_response)
Module paste.recursive:84 in __call__
  return self.application(environ, start_response)
Module paste.httpexceptions:633 in __call__
  return self.application(environ, start_response)
Module galaxy.web.framework.base:160 in __call__
  body = method( trans, **kwargs )
Module galaxy.web.framework:205 in decorator
  return func( self, trans, *args, **kwargs )
Module galaxy.webapps.galaxy.controllers.admin_toolshed:452 in deactivate_or_uninstall_repository
  remove_from_tool_panel( trans, tool_shed_repository, shed_tool_conf, uninstall=remove_from_disk_checked )
Module galaxy.util.shed_util:1781 in remove_from_tool_panel
  tool_panel_dict = generate_tool_panel_dict_from_shed_tool_conf_entries( trans, repository )
Module galaxy.util.shed_util:942 in generate_tool_panel_dict_from_shed_tool_conf_entries
  tree = util.parse_xml( shed_tool_conf )
Module galaxy.util:135 in parse_xml
  tree = ElementTree.parse(fname)
Module elementtree.ElementTree:859 in parse
Module elementtree.ElementTree:576 in parse
TypeError: coercing to Unicode: need string or buffer, NoneType found

Thanks,
Clare

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
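[Editor's aside, not part of the original thread: the TypeError at the bottom of that traceback is what ElementTree.parse() raises when it is handed None instead of a filename - i.e. Galaxy could not work out which shed tool conf file the repository belonged to, so shed_tool_conf was None. A two-line reproduction (the exact error text varies by Python version):]

```python
import xml.etree.ElementTree as ET

# Passing None where a filename is expected raises TypeError, just as in
# the traceback above where shed_tool_conf was None.
try:
    ET.parse(None)
except TypeError as e:
    print("TypeError:", e)
```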
[galaxy-dev] Pull request, and missing step connections
Hi guys,

I made a very small code change so that the workflows API will display all the steps in a workflow, not just the inputs (these are still displayed separately, as before). I made a pull request even though the change is small, just to learn what I'm doing:

https://bitbucket.org/galaxy/galaxy-central/pull-request/68/show-workflow-steps-and-connectors-in-api

But in doing so I noticed something in the workflows I don't understand: I can't see a connection between workflow input datasets and the steps they are inputting to. I doubt this is a bug; I just don't know how it's supposed to work, so I'm not sure if my API change is sufficient.

For instance, if I create a small workflow with steps

  Input Dataset -> TopHat (accepted_hits bam file) -> Cufflinks

and call the API on this workflow, I now see:

{
  "id": "f2db41e1fa331b3e",
  "inputs": {
    "1": { "label": "Input SE fastq", "value": "" }
  },
  "name": "Tophat + cufflinks",
  "steps": {
    "1": { "id": 1, "input_steps": {}, "tool_id": null, "type": "data_input" },
    "2": { "id": 2, "input_steps": {}, "tool_id": "tophat", "type": "tool" },
    "3": { "id": 3,
           "input_steps": { "input": { "source_step": 2, "step_output": "accepted_hits" } },
           "tool_id": "cufflinks", "type": "tool" }
  },
  "url": "/api/workflows/f2db41e1fa331b3e"
}

The "inputs" field was there before; the "steps" field is the new bit. As expected, step 3 lists step 2 as an input. However, step 2 does not list step 1 as an input, even though the GUI shows that they are connected and the workflow works. From other testing it seems that Input Dataset steps never appear wired up in my API response. I'm just iterating over all input_connections, so apparently steps of type data_input are not in the list of WorkflowStep.input_connections. Is this how it should be? How can I find out which steps an Input Dataset is connected to?
Thanks,
Clare

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
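[Editor's aside, not part of the original thread: the gap Clare describes is easy to see by walking the steps dict from the API response quoted in her message. Step 2 (tophat) reports no input_steps at all, even though the GUI shows it wired to the Input Dataset; only the tool-to-tool connection (step 2 -> step 3) is reported.]

```python
import json

# The "steps" portion of the workflow API response quoted above.
resp = json.loads("""{
  "steps": {
    "1": {"id": 1, "input_steps": {}, "tool_id": null, "type": "data_input"},
    "2": {"id": 2, "input_steps": {}, "tool_id": "tophat", "type": "tool"},
    "3": {"id": 3,
          "input_steps": {"input": {"source_step": 2, "step_output": "accepted_hits"}},
          "tool_id": "cufflinks", "type": "tool"}
  }
}""")

# For each step, list which steps feed into it.
for step_id, step in sorted(resp["steps"].items()):
    sources = [str(s["source_step"]) for s in step["input_steps"].values()]
    print(step_id, step["type"], "<-", ", ".join(sources) or "(none)")
```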
Re: [galaxy-dev] Using the tools API
Hi Jeremy,

OK, that makes sense. Thanks again!

Clare

On 24 August 2012 02:17, Jeremy Goecks <jeremy.goe...@emory.edu> wrote:

> > I think that handle_input() executes the tool?
>
> That's the intention, and it should work, but it hasn't been tested.
>
> > Also, separately there is a method called _run_tool() (although unlike
> > _rerun_tool() I can't see anything that calls it).
>
> Looks like _run_tool is almost a copy of what's in create(). This is
> probably legacy code from refactoring that hasn't been cleaned up yet.
>
> > So, I thought from looking at the surface, that the tool-running code
> > was there and that I just didn't know what data structure to pass into
> > payload['inputs']. Is it not doing what I think?
>
> I think your inference is correct, but, yes, there's the problem of
> specifying the tool input data structure. Tool inputs are specified as
> dictionaries (often with nested dictionaries for things like
> conditionals), so you could construct an appropriate input dictionary
> and could (likely) run a tool. However, there's nothing in the API right
> now to help you construct an appropriate dictionary for a tool; this is
> the big missing piece in the tools API.
>
> Best,
> J.

--
Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
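[Editor's aside, not part of the original thread: to make Jeremy's point concrete, a tool payload of the kind he describes would look roughly like the sketch below, with conditional parameters nested under their conditional's name. Every parameter name and value here is a guess modelled on the tophat wrapper discussed elsewhere in this archive - it is not a tested payload.]

```python
import json

# Hypothetical payload for POSTing to the tools API; names are illustrative.
payload = {
    "tool_id": "tophat",
    "history_id": "f2db41e1fa331b3e",
    "inputs": {
        # A simple dataset input.
        "input1": {"src": "hda", "id": "cb5d3b9eef2b9275"},
        # Conditionals become nested dictionaries keyed by the
        # conditional's name.
        "refGenomeSource": {
            "genomeSource": "indexed",
            "index": "hg19",
        },
        "singlePaired": {
            "sPaired": "single",
        },
    },
}
print(json.dumps(payload, indent=2))
```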
Re: [galaxy-dev] egg distribution error when running galaxy-central
Hi Nate and all,

I see - Enthought changes the default python version, and virtualenv was giving me a python version based on the version I used to run virtualenv. If I run

  /usr/bin/python virtualenv.py galaxy_central_syspython
  . galaxy_central_syspython/bin/activate

and then pull galaxy-central and run it, I don't get any egg errors - neither the build-by-hand ones nor the more serious error that stopped me originally. Thanks!

Clare

On 22 August 2012 01:26, Nate Coraor <n...@bx.psu.edu> wrote:

Hi Clare,

If you use the system python, or a build from python.org, you should be fine. It's Enthought python, which is only built for a single architecture, that's the reason for having to do so much manual egg building.

--nate

On Aug 19, 2012, at 7:36 AM, Tomithy Too wrote:

Hi Clare,

I ran into the same problem as well when I upgraded my galaxy-central version. I am running Mac OS X 10.6.8. What I did was to use the command

  $ pip install fabric

It manually fetches the latest version of fabric from pip (http://www.pip-installer.org/en/latest/index.html), which is a package manager for python, and also its dependencies: ssh and pycrypto, which are the components causing the problem. I think it might be due to an erroneous version of the egg hosted on galaxy. Works fine after that for me.

Cheers,
Tomithy

On Wed, Aug 15, 2012 at 10:55 AM, Clare Sloggett <s...@unimelb.edu.au> wrote:

Hi Scott,

Thanks very much for this! virtualenv is ok I think:

  clare$ echo $PATH
  /Users/clare/galaxy/galaxy_central_env/bin: ...

which is where I set up my environment. I'm not using anything in particular outside Enthought, that I can think of. Enthought packages up a whole lot of things including scipy. The strange thing is that galaxy-dist runs but galaxy-central doesn't. So, I was hoping it would actually be a temporary bug in the egg distribution, but it sounds like the problem really is my environment.
I don't understand how Enthought can be causing problems that virtualenv can't work around, but I've never really understood how python is structured on OSX! So I think it's probably worth me going through the effort of setting up a working environment in an Ubuntu VM rather than running it on my Mac - I don't want to be asking you to pull code changes from an environment that's unusual. I'm setting it up in VirtualBox Ubuntu now (which has python 2.7.1). So far I've pulled the code into the VM and run it, without virtualenv, and it gives none of the errors I see on the Mac. My plan is to both share the drive containing galaxy-central and share the network so that I can do both the editing and the browsing on my host machine, but if there are better ways, advice is welcome!

Thanks,
Clare

On 2 August 2012 07:26, Scott McManus <scottmcma...@gatech.edu> wrote:

I haven't been able to reproduce this yet with the instructions you gave, but I'm not using the same environment. Can you give me an idea of what tools you're using outside of SciPy/NumPy/Enthought stuff? There is the possibility that the virtualenv.py script isn't being sourced correctly. We can check if it's actually using the correct environment by calling "echo $PATH" and checking that the path is pointing to the virtual environment. For example, I installed virtualenv stuff under /home/smcmanus/clare/galaxy_env/bin, and I got:

  (galaxy_env)$ echo $PATH
  /home/smcmanus/clare/galaxy_env/bin:/usr/local/bin:(other stuff deleted)

-Scott

----- Original Message -----

Hi all,

I'm trying to run galaxy-central on my laptop in order to play around with some changes, and I'm having trouble getting it to run. I can run galaxy-dist without problems and have been working with that (so its eggs are all installed already), but now I want to create a pull request, so I want to run galaxy-central. I'm not trying to install any extra tools or data, just the code. I'm running on OSX 10.7.4 and using virtualenv.
I have Enthought installed, and I assume I will be using its version of python by default. The default python seems to be 2.7.3. I'm using the same virtualenv environment for galaxy-dist and galaxy-central (though it doesn't seem to matter if I give galaxy-central its own environment, I see the same error). So the steps were: - create a virtualenv environment and activate it - get galaxy-dist and call run.sh - it asked me to build quite a lot of dependencies myself, which was just a matter of running the requested commands, and then it worked with no problems. - shut down galaxy-dist, and in another directory, get galaxy-central and call run.sh. I think it asked me to build a couple of dependencies, but then it gives up with the following: (galaxy_env)Clares-MacBook-Pro:galaxy-central clare$ sh run.sh --reload Some eggs are out of date, attempting to fetch... Warning: MarkupSafe (a dependent egg of Mako) cannot be fetched Warning: pycrypto
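Since the fix above hinged on which interpreter actually created and backs the virtualenv, here is a quick sanity check to run from inside the environment. This is a generic sketch of my own, not anything from the Galaxy codebase:

```python
import sys

# Print which interpreter this environment actually resolves to --
# a virtualenv created with Enthought's python keeps using Enthought's python,
# which is the behaviour described above.
print(sys.executable)                # path of the running interpreter
print(tuple(sys.version_info[:3]))   # version tuple, e.g. (2, 7, 3)

# Newer pythons expose a venv via a prefix mismatch; older virtualenvs
# set sys.real_prefix instead. Check both.
in_env = (getattr(sys, "real_prefix", None) is not None
          or sys.prefix != getattr(sys, "base_prefix", sys.prefix))
print(in_env)
```

If `sys.executable` points into the Enthought distribution rather than `/usr/bin/python`, the environment was built against the wrong interpreter.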
[galaxy-dev] Using the tools API
Hi guys, The Tools API is currently working for me from galaxy-central, but I'm not sure how to correctly run a tool. Are there any example scripts, as there are for some other parts of the API? Specifically I want to find out what the expected payload fields are when I post to CREATE to run a tool. Some of the fields are clear to me just from the api/tools.py code (e.g. 'tool_id') but others are not (e.g. how the input datasets and parameters are specified). A separate question: How do we specify Advanced or conditional-dependent fields for a tool? Some of these fields are necessary to run the tool at all. For instance, on my system, calling http://localhost:8080/api/tools/tophat?key= returns:

{
  "description": "Find splice junctions using RNA-seq data",
  "id": "tophat",
  "inputs": [
    { "html": "%3Cselect%20name%3D%22input1%22%3E%0A%3C/select%3E",
      "label": "RNA-Seq FASTQ file", "name": "input1", "type": "data" },
    { "label": "Conditional (refGenomeSource)", "name": "refGenomeSource" },
    { "label": "Conditional (singlePaired)", "name": "singlePaired" }
  ],
  "name": "Tophat for Illumina",
  "version": "1.5.0"
}

This is obviously only some of the inputs you see in the UI. I think that all the Advanced fields are missing, and more importantly, any input which is dependent on a conditional is missing. So the refGenomeSource conditional is there, but the actual reference genome field is not. The type of the reference genome field also presumably depends on which value is supplied for the refGenomeSource conditional. Is there currently a way to specify (or see) these missing fields? Thanks, Clare -- Clare Sloggett Research Fellow / Bioinformatician Life Sciences Computation Centre Victorian Life Sciences Computation Initiative University of Melbourne, Parkville Campus 187 Grattan Street, Carlton, Melbourne Victoria 3010, Australia Ph: 03 903 53357 M: 0414 854 759 ___ Please keep all replies on the list by using reply all in your mail client.
To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
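A quick way to see exactly which inputs the API exposes is to parse the JSON response quoted above. The data below is the tophat response from this message (abridged to the fields relevant here); treating it as a plain dict is my own illustration:

```python
import json

# The (abridged) response quoted above for /api/tools/tophat,
# reconstructed as JSON.
response = json.loads("""
{
  "description": "Find splice junctions using RNA-seq data",
  "id": "tophat",
  "inputs": [
    {"label": "RNA-Seq FASTQ file", "name": "input1", "type": "data"},
    {"label": "Conditional (refGenomeSource)", "name": "refGenomeSource"},
    {"label": "Conditional (singlePaired)", "name": "singlePaired"}
  ],
  "name": "Tophat for Illumina",
  "version": "1.5.0"
}
""")

# Only top-level inputs appear; fields nested under a conditional
# (e.g. the actual reference genome) are absent, as noted above.
names = [i["name"] for i in response["inputs"]]
print(names)  # ['input1', 'refGenomeSource', 'singlePaired']
```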
Re: [galaxy-dev] Using the tools API
Hi Jeremy, Thanks for the info! I am confused, though, because the code in tools.py was what made me think I could run a tool with specified inputs, i.e. I was looking at:

def create( ... ):
    # Set up inputs.
    inputs = payload[ 'inputs' ]
    params = util.Params( inputs, sanitize = False )
    template, vars = tool.handle_input( trans, params.__dict__ )

I think that handle_input() executes the tool? Also, separately, there is a method called _run_tool() (although, unlike _rerun_tool(), I can't see anything that calls it). So I thought, from looking at the surface, that the tool-running code was there and that I just didn't know what data structure to pass into payload['inputs']. Is it not doing what I think? Thanks, Clare On 23 August 2012 23:00, Jeremy Goecks jeremy.goe...@emory.edu wrote: Unfortunately, the tools API isn't at all complete right now. The tools API was driven by Trackster/Sweepster needs, so rerunning tools works well but running tools from scratch doesn't. Practically, this means that the things you want to do, e.g. *view tool parameters; *set tool input datasets; are not yet supported. As always, community contributions are welcome and encouraged. Best, J. On Aug 23, 2012, at 4:10 AM, Clare Sloggett wrote: Hi guys, The Tools API is currently working for me from galaxy-central, but I'm not sure how to correctly run a tool. Are there any example scripts, as there are for some other parts of the API? Specifically I want to find out what the expected payload fields are when I post to CREATE to run a tool. Some of the fields are clear to me just from the api/tools.py code (e.g. 'tool_id') but others are not (e.g. how the input datasets and parameters are specified). A separate question: How do we specify Advanced or conditional-dependent fields for a tool? Some of these fields are necessary to run the tool at all.
For instance, on my system, calling http://localhost:8080/api/tools/tophat?key= returns:

{
  "description": "Find splice junctions using RNA-seq data",
  "id": "tophat",
  "inputs": [
    { "html": "%3Cselect%20name%3D%22input1%22%3E%0A%3C/select%3E",
      "label": "RNA-Seq FASTQ file", "name": "input1", "type": "data" },
    { "label": "Conditional (refGenomeSource)", "name": "refGenomeSource" },
    { "label": "Conditional (singlePaired)", "name": "singlePaired" }
  ],
  "name": "Tophat for Illumina",
  "version": "1.5.0"
}

This is obviously only some of the inputs you see in the UI. I think that all the Advanced fields are missing, and more importantly, any input which is dependent on a conditional is missing. So the refGenomeSource conditional is there, but the actual reference genome field is not. The type of the reference genome field also presumably depends on which value is supplied for the refGenomeSource conditional. Is there currently a way to specify (or see) these missing fields? Thanks, Clare -- Clare Sloggett Research Fellow / Bioinformatician Life Sciences Computation Centre Victorian Life Sciences Computation Initiative University of Melbourne, Parkville Campus 187 Grattan Street, Carlton, Melbourne Victoria 3010, Australia Ph: 03 903 53357 M: 0414 854 759
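For readers following this thread later, here is a sketch of what a run-a-tool payload might look like, based only on reading api/tools.py as discussed above. Only 'tool_id' is confirmed by this thread; the 'history_id' field, the dataset reference, and the pipe-flattened conditional key are my guesses for illustration, not a documented contract:

```python
import json

# Hypothetical payload for POST /api/tools. Only 'tool_id' is confirmed
# by the api/tools.py code quoted in this thread -- everything else is
# an assumption made for illustration.
payload = {
    "tool_id": "tophat",
    "history_id": "abc123",                # assumed: which history to run in
    "inputs": {
        "input1": "dataset-id-here",       # assumed: input dataset by id
        "refGenomeSource|genome": "hg19",  # assumed: conditional params flattened with '|'
    },
}

body = json.dumps(payload)
print(body)
```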
Re: [galaxy-dev] egg distribution error when running galaxy-central
Hi Tomithy, Thanks, this worked for me too! Just to be clear for interested devs: If I run galaxy-dist on my mac it asks me to build a whole series of eggs by hand using scripts/scramble.py, and if I follow these instructions, galaxy runs. A bit tedious but trivial to do. If I run galaxy-central on my mac the same thing happens for a few dependencies, but then it gets stuck at the error I posted originally. If I run 'pip install fabric' as Tomithy suggests then I get the same results as running galaxy-dist, i.e. galaxy works after using scramble.py a few times. If I run galaxy-central (or presumably galaxy-dist) on ubuntu it doesn't complain about any of the dependencies, doesn't get stuck, and doesn't ask me to build any eggs by hand. So I'm now wondering if, for code editing, I should use the ubuntu environment I've set up even though the code is working natively on the mac, just to avoid future complications. Cheers, Clare On 19 August 2012 21:36, Tomithy Too tomithy@gmail.com wrote: Hi Clare, I ran into the same problem as well when I upgraded my galaxy-central version. I am running Mac OS 10.6.8. What I did to get it working: use the command $ pip install fabric This manually fetches the latest version of fabric from pip (http://www.pip-installer.org/en/latest/index.html), which is a package manager for python, along with its dependencies ssh and pycrypto, which are the components causing the problem. I think it might be due to an erroneous version of the egg hosted on galaxy. Works fine after that for me. Cheers Tomithy On Wed, Aug 15, 2012 at 10:55 AM, Clare Sloggett s...@unimelb.edu.au wrote: Hi Scott, Thanks very much for this! virtualenv is ok I think: clare$ echo $PATH /Users/clare/galaxy/galaxy_central_env/bin: . which is where I set up my environment. I'm not using anything in particular outside Enthought, that I can think of. Enthought packages up a whole lot of things including scipy. The strange thing is that galaxy-dist runs but galaxy-central doesn't.
So, I was hoping it would actually be a temporary bug in the egg distribution, but it sounds like the problem really is my environment. I don't understand how Enthought can be causing problems that virtualenv can't work around, but I've never really understood how python is structured in OSX! So I think it's probably worth me going through the effort of setting up a working environment in an ubuntu VM rather than running it on my Mac - I don't want to be asking you to pull code changes from an environment that's unusual. I'm setting it up in VirtualBox ubuntu now (which has python 2.7.1). So far I've pulled the code into the vm and run it, without virtualenv, and it gives none of the errors I see on the Mac. My plan is to both share the drive containing galaxy-central and share the network so that I can do both the editing and the browsing on my host machine, but if there are better ways advice is welcome! Thanks, Clare On 2 August 2012 07:26, Scott McManus scottmcma...@gatech.edu wrote: I haven't been able to reproduce this yet with the instructions you gave, but I'm not using the same environment. Can you give me an idea of what tools you're using outside of SciPy/NumPy/Enthought stuff? There is the possibility that the virtualenv.py script isn't being sourced correctly. We can check if it's actually using the correct environment by calling echo $PATH and checking that the path is pointing to the virtual environment. For example, I installed virtualenv stuff under /home/smcmanus/clare/galaxy_env/bin, and I got: (galaxy_env)$ echo $PATH /home/smcmanus/clare/galaxy_env/bin:/usr/local/bin:other stuff deleted -Scott - Original Message - Hi all, I'm trying to run galaxy-central on my laptop in order to play around with some changes, and I'm having trouble getting it to run. I can run galaxy-dist without problems and have been working with that (so its eggs are all installed already), but now I want to create a pull request so want to run galaxy-dist. 
I'm not trying to install any extra tools or data, just the code. I'm running on OSX 10.7.4 and using virtualenv. I have Enthought installed, and I assume I will be using its version of python by default. The default python seems to be 2.7.3. I'm using the same virtualenv environment for galaxy-dist and galaxy-central (though it doesn't seem to matter if I give galaxy-central its own environment, I see the same error). So the steps were: - create a virtualenv environment and activate it - get galaxy-dist and call run.sh - it asked me to build quite a lot of dependencies myself, which was just a matter of running the requested commands, and then it worked with no problems. - shut down galaxy-dist, and in another directory, get galaxy-central and call run.sh. I think it asked me to build a couple of dependencies, but then it gives up with the following
Re: [galaxy-dev] egg distribution error when running galaxy-central
Hi Scott, Thanks very much for this! virtualenv is ok I think: clare$ echo $PATH /Users/clare/galaxy/galaxy_central_env/bin: . which is where I set up my environment. I'm not using anything in particular outside Enthought, that I can think of. Enthought packages up a whole lot of things including scipy. The strange thing is that galaxy-dist runs but galaxy-central doesn't. So, I was hoping it would actually be a temporary bug in the egg distribution, but it sounds like the problem really is my environment. I don't understand how Enthought can be causing problems that virtualenv can't work around, but I've never really understood how python is structured in OSX! So I think it's probably worth me going through the effort of setting up a working environment in an ubuntu VM rather than running it on my Mac - I don't want to be asking you to pull code changes from an environment that's unusual. I'm setting it up in VirtualBox ubuntu now (which has python 2.7.1). So far I've pulled the code into the vm and run it, without virtualenv, and it gives none of the errors I see on the Mac. My plan is to both share the drive containing galaxy-central and share the network so that I can do both the editing and the browsing on my host machine, but if there are better ways advice is welcome! Thanks, Clare On 2 August 2012 07:26, Scott McManus scottmcma...@gatech.edu wrote: I haven't been able to reproduce this yet with the instructions you gave, but I'm not using the same environment. Can you give me an idea of what tools you're using outside of SciPy/NumPy/Enthought stuff? There is the possibility that the virtualenv.py script isn't being sourced correctly. We can check if it's actually using the correct environment by calling echo $PATH and checking that the path is pointing to the virtual environment. 
For example, I installed virtualenv stuff under /home/smcmanus/clare/galaxy_env/bin, and I got: (galaxy_env)$ echo $PATH /home/smcmanus/clare/galaxy_env/bin:/usr/local/bin:other stuff deleted -Scott - Original Message - Hi all, I'm trying to run galaxy-central on my laptop in order to play around with some changes, and I'm having trouble getting it to run. I can run galaxy-dist without problems and have been working with that (so its eggs are all installed already), but now I want to create a pull request so want to run galaxy-dist. I'm not trying to install any extra tools or data, just the code. I'm running on OSX 10.7.4 and using virtualenv. I have Enthought installed, and I assume I will be using its version of python by default. The default python seems to be 2.7.3. I'm using the same virtualenv environment for galaxy-dist and galaxy-central (though it doesn't seem to matter if I give galaxy-central its own environment, I see the same error). So the steps were: - create a virtualenv environment and activate it - get galaxy-dist and call run.sh - it asked me to build quite a lot of dependencies myself, which was just a matter of running the requested commands, and then it worked with no problems. - shut down galaxy-dist, and in another directory, get galaxy-central and call run.sh. I think it asked me to build a couple of dependencies, but then it gives up with the following: (galaxy_env)Clares-MacBook-Pro:galaxy-central clare$ sh run.sh --reload Some eggs are out of date, attempting to fetch... Warning: MarkupSafe (a dependent egg of Mako) cannot be fetched Warning: pycrypto (a dependent egg of Fabric) cannot be fetched Warning: simplejson (a dependent egg of WebHelpers) cannot be fetched Fetched http://eggs.g2.bx.psu.edu/ssh/ssh-1.7.14-py2.7.egg One of Galaxy's managed eggs depends on something which is missing, this is almost certainly a bug in the egg distribution. 
Dependency ssh requires pycrypto>=2.1,!=2.4

Traceback (most recent call last):
  File "./scripts/fetch_eggs.py", line 30, in <module>
    c.resolve() # Only fetch eggs required by the config
  File "/Users/clare/galaxy/galaxy-central/lib/galaxy/eggs/__init__.py", line 345, in resolve
    egg.resolve()
  File "/Users/clare/galaxy/galaxy-central/lib/galaxy/eggs/__init__.py", line 168, in resolve
    dists = pkg_resources.working_set.resolve( ( self.distribution.as_requirement(), ), env, self.fetch )
  File "/Users/clare/galaxy/galaxy_env/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 569, in resolve
    raise VersionConflict(dist,req) # XXX put more info here
pkg_resources.VersionConflict: (ssh 1.7.14 (/Users/clare/galaxy/galaxy-central/eggs/ssh-1.7.14-py2.7.egg), Requirement.parse('pycrypto>=2.1,!=2.4'))
Fetch failed.

Any idea what is causing this? Thanks, Clare -- Clare Sloggett Research Fellow / Bioinformatician Life Sciences Computation Centre Victorian Life Sciences Computation Initiative University of Melbourne, Parkville Campus 187 Grattan Street, Carlton, Melbourne Victoria 3010, Australia Ph: 03 903 53357
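The conflict above is pkg_resources refusing the installed pycrypto because it falls outside the specifier 'pycrypto>=2.1,!=2.4'. A toy re-implementation of how such a comma-separated specifier is evaluated (my own illustration, not setuptools' or Galaxy's actual code):

```python
def parse(v):
    # Naive dotted-version parser; enough for specs like '2.1' or '2.4'.
    return tuple(int(p) for p in v.split("."))

def satisfies(version, spec):
    # spec is a comma-separated AND of clauses, e.g. '>=2.1,!=2.4'.
    for clause in spec.split(","):
        if clause.startswith(">="):
            if parse(version) < parse(clause[2:]):
                return False
        elif clause.startswith("!="):
            if parse(version) == parse(clause[2:]):
                return False
    return True

print(satisfies("2.3", ">=2.1,!=2.4"))  # True
print(satisfies("2.4", ">=2.1,!=2.4"))  # False: explicitly excluded version
print(satisfies("2.0", ">=2.1,!=2.4"))  # False: too old
```

The real setuptools resolver raises VersionConflict, as in the traceback, when every candidate version fails this kind of check.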
Re: [galaxy-dev] egg distribution error when running galaxy-central
On 1 August 2012 15:28, Clare Sloggett s...@unimelb.edu.au wrote: I can run galaxy-dist without problems and have been working with that (so its eggs are all installed already), but now I want to create a pull request so want to run galaxy-dist. oops, of course I mean 'so want to run galaxy-central.' -- Clare Sloggett Research Fellow / Bioinformatician Life Sciences Computation Centre Victorian Life Sciences Computation Initiative University of Melbourne, Parkville Campus 187 Grattan Street, Carlton, Melbourne Victoria 3010, Australia Ph: 03 903 53357 M: 0414 854 759
Re: [galaxy-dev] Citations for tools
Hi Peter, Thanks, I didn't realise it had been discussed! I don't know what would be a good markup system for citations. However, the current situation is that people are putting their citations into the help tag with no special markup, and it seems to work reasonably well. Maybe a simple field is all that's needed? Clare On 18 June 2012 19:50, Peter Cock p.j.a.c...@googlemail.com wrote: On Mon, Jun 18, 2012 at 10:29 AM, Clare Sloggett s...@unimelb.edu.au wrote: Hi all, I'd like to suggest, or request, a feature - I think that posting to galaxy-dev is the right place to start? After I've done an analysis, it would be useful to be given a list of references for all the tools I used in that history, which I could use to cite the appropriate papers. At the moment, it seems that most tool developers add a please cite the following paper note to the help tag in the wrapper so that it displays on the tool screen before you run it. I'd like to suggest: * adding a cite tag to the tool wrappers xml, * adding a feature to the history UI which will list all the references to cite for a history. I think this would encourage people to cite the tools they use properly and hence encourage developers to put their tools into the toolshed! With the standard tools moving into the toolshed it will be really important for tool wrappers to be maintained. Any thoughts? Clare Hi Clare, We talked about this at the end of last year, and yes, it would be a good idea: http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-December/007873.html Are you familiar enough with the area of semantic web/linked data to know what would be the best XML based markup to use for embedding the citations?
Peter -- Clare Sloggett Research Fellow / Bioinformatician Life Sciences Computation Centre Victorian Life Sciences Computation Initiative University of Melbourne, Parkville Campus 187 Grattan Street, Carlton, Melbourne Victoria 3010, Australia Ph: 03 903 53357 M: 0414 854 759
[galaxy-dev] Citations for tools
Hi all, I'd like to suggest, or request, a feature - I think that posting to galaxy-dev is the right place to start? After I've done an analysis, it would be useful to be given a list of references for all the tools I used in that history, which I could use to cite the appropriate papers. At the moment, it seems that most tool developers add a please cite the following paper note to the help tag in the wrapper so that it displays on the tool screen before you run it. I'd like to suggest: * adding a cite tag to the tool wrappers xml, * adding a feature to the history UI which will list all the references to cite for a history. I think this would encourage people to cite the tools they use properly and hence encourage developers to put their tools into the toolshed! With the standard tools moving into the toolshed it will be really important for tool wrappers to be maintained. Any thoughts? Clare -- Clare Sloggett Research Fellow / Bioinformatician Life Sciences Computation Centre Victorian Life Sciences Computation Initiative University of Melbourne, Parkville Campus 187 Grattan Street, Carlton, Melbourne Victoria 3010, Australia Ph: 03 903 53357 M: 0414 854 759
Re: [galaxy-dev] [galaxy-user] Using Galaxy Cloudman for a workshop
Right! I did think to look for a 'share this cluster' command, I just failed to find it. It all makes sense now, thanks. On Thu, Dec 1, 2011 at 7:34 PM, Enis Afgan eaf...@emory.edu wrote: Hi Clare, The share string is generated when you share a cluster. The string is accessible on the shared cluster, when you click the green 'Share a cluster' icon next to the cluster name and then the top link Shared instances. You will get a list of the point in time shares of the cluster you have created. The share string will look something like this cm-cd53Bfg6f1223f966914df347687f6uf32/shared/2011-10-19--03-14 You simply paste that string into new cluster box you mentioned. Enis On Thu, Dec 1, 2011 at 6:31 AM, Clare Sloggett s...@unimelb.edu.au wrote: Hi Enis, Jeremy, and all, Thanks so much for all your help. I have another question which I suspect is just me missing something obvious. I'm guessing that when you cloned the cluster for your workshop, you used CloudMan's 'share-an-instance' functionality? When I launch a new cluster which I want to be a copy of an existing cluster, and select the share-an-instance option, it asks for the cluster share-string. How can I find this string for my existing cluster? Or have I got completely the wrong idea - did you actually clone the instance using AWS functionality? Thanks, Clare On Mon, Nov 21, 2011 at 5:37 PM, Enis Afgan eaf...@emory.edu wrote: Hi Clare, I don't recall what instance type we used earlier, but I think an Extra Large Instance is going to be fine. Do note that the master node is also being used to run jobs. However, if it's loaded by just the web server, SGE will typically just not schedule jobs to it. As far as the core/thread/slot concerns goes, SGE sees each core as a slot. Each job in Galaxy simply requires 1 slot, even if it uses multiple threads (i.e., cores). 
What this means is that nodes will probably get overloaded if only the same type of job is being run (BWA), but if analyses are being run that use multiple tools, jobs will get spread over the cluster to balance the overall load a bit better than by simply looking at the number of slots. Enis On Mon, Nov 21, 2011 at 4:34 AM, Clare Sloggett s...@unimelb.edu.au wrote: Hi Jeremy, Also if you do remember what kind of Amazon node you used, particularly for the cluster's master node (e.g. an 'xlarge' 4-core 15GB or perhaps one of the 'high-memory' nodes?), that would be a reassuring sanity check for me! Cheers, Clare On Mon, Nov 21, 2011 at 10:37 AM, Clare Sloggett s...@unimelb.edu.au wrote: Hi Jeremy, Enis, That makes sense. I know I can configure how many threads BWA uses in its wrapper, with bwa -t. But, is there somewhere that I need to tell Galaxy the corresponding information, ie that this command-line task will make use of up to 4 cores? Or, does this imply that there is always exactly one job per node? So if I have (for instance) a cluster made of 4-core nodes, and a single-threaded task (e.g. samtools), are the other 3 cores just going to waste or will the scheduler allocate multiple single-threaded jobs to one node? I've cc'd galaxy-dev instead of galaxy-user as I think the conversation has gone that way! Thanks again, Clare On Fri, Nov 18, 2011 at 2:36 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Scalability issues are more likely to arise on the back end than the front end, so you'll want to ensure that you have enough compute nodes. BWA uses four nodes by default--Enis, does the cloud config change this parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to be able to run a BWA job simultaneously. Actually, one other question - this paragraph makes me realise that I don't really understand how Galaxy is distributing jobs.
I had thought that each job would only use one node, and in some cases take advantage of multiple cores within that node. I'm taking a node to be a set of cores with their own shared memory, so in this case a VM instance, is this right? If some types of jobs can be distributed over multiple nodes, can I configure, in Galaxy, how many nodes they should use? You're right -- my word choices were poor. Replace 'node' with 'core' in my paragraph to get an accurate suggestion for resources. Galaxy uses a job scheduler--SGE on the cloud--to distribute jobs to different cluster nodes. Jobs that require multiple cores typically run on a single node. Enis can chime in on whether CloudMan supports job submission over multiple nodes; this would require setup of an appropriate parallel environment and a tool that can make use of this environment. Good luck, J
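Enis's slots-versus-threads point from this thread can be made concrete with a toy calculation. The node size and job mix below are examples of my own, not the workshop's actual configuration:

```python
# Toy model of SGE slot accounting as described in this thread:
# one slot per core, and every Galaxy job consumes exactly 1 slot,
# regardless of how many threads it spawns.
cores_per_node = 4
slots_per_node = cores_per_node

# Four 4-thread BWA jobs land on one node: the slot count says "full",
# but the thread demand is well over the core count -> overload.
jobs = [("bwa", 4), ("bwa", 4), ("bwa", 4), ("bwa", 4)]  # (name, threads)

slots_used = len(jobs)                    # 4 slots: looks exactly full to SGE
threads_wanted = sum(t for _, t in jobs)  # 16 threads competing for 4 cores

print(slots_used, threads_wanted)
```

A mixed workload (e.g. single-threaded samtools jobs alongside BWA) would keep `threads_wanted` closer to `cores_per_node`, which is why mixed analyses balance better, as Enis says.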
[galaxy-dev] Removing nodes from a CloudMan instance
Hi galaxy-devs, Quick question: when using the cloud console on CloudMan, it's possible to add different types of nodes (large, micro, etc) to the virtual cluster using the 'Add Nodes' option at the top. I can also remove a given number of nodes using the 'Remove Nodes' option at the top. However, is there any way to control exactly which node (or more importantly just which type of node) gets removed? Thanks for any help! Clare -- E: s...@unimelb.edu.au P: 03 903 53357 M: 0414 854 759
Re: [galaxy-dev] [galaxy-user] Using Galaxy Cloudman for a workshop
Hi Enis, Jeremy, and all, Thanks so much for all your help. I have another question which I suspect is just me missing something obvious. I'm guessing that when you cloned the cluster for your workshop, you used CloudMan's 'share-an-instance' functionality? When I launch a new cluster which I want to be a copy of an existing cluster, and select the share-an-instance option, it asks for the cluster share-string. How can I find this string for my existing cluster? Or have I got completely the wrong idea - did you actually clone the instance using AWS functionality? Thanks, Clare On Mon, Nov 21, 2011 at 5:37 PM, Enis Afgan eaf...@emory.edu wrote: Hi Clare, I don't recall what instance type we used earlier, but I think an Extra Large Instance is going to be fine. Do note that the master node is also being used to run jobs. However, if it's loaded by just the web server, SGE will typically just not schedule jobs to it. As far as the core/thread/slot concerns goes, SGE sees each core as a slot. Each job in Galaxy simply requires 1 slot, even if it uses multiple threads (i.e., cores). What this means is that nodes will probably get overloaded if only the same type of job is being run (BWA), but if analyses are being run that use multiple tools, jobs will get spread over the cluster to balance the overall load a bit better than by simply looking at the number of slots. Enis On Mon, Nov 21, 2011 at 4:34 AM, Clare Sloggett s...@unimelb.edu.au wrote: Hi Jeremy, Also if you do remember what kind of Amazon node you used, particularly for the cluster's master node (e.g. an 'xlarge' 4-core 15GB or perhaps one of the 'high-memory' nodes?), that would be a reassuring sanity check for me! Cheers, Clare On Mon, Nov 21, 2011 at 10:37 AM, Clare Sloggett s...@unimelb.edu.au wrote: Hi Jeremy, Enis, That makes sense. I know I can configure how many threads BWA uses in its wrapper, with bwa -t.
But, is there somewhere that I need to tell Galaxy the corresponding information, ie that this command-line task will make use of up to 4 cores? Or, does this imply that there is always exactly one job per node? So if I have (for instance) a cluster made of 4-core nodes, and a single-threaded task (e.g. samtools), are the other 3 cores just going to waste or will the scheduler allocate multiple single-threaded jobs to one node? I've cc'd galaxy-dev instead of galaxy-user as I think the conversation has gone that way! Thanks again, Clare On Fri, Nov 18, 2011 at 2:36 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Scalability issues are more likely to arise on the back end than the front end, so you'll want to ensure that you have enough compute nodes. BWA uses four nodes by default--Enis, does the cloud config change this parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to be able to run a BWA job simultaneously. Actually, one other question - this paragraph makes me realise that I don't really understand how Galaxy is distributing jobs. I had thought that each job would only use one node, and in some cases take advantage of multiple cores within that node. I'm taking a node to be a set of cores with their own shared memory, so in this case a VM instance, is this right? If some types of jobs can be distributed over multiple nodes, can I configure, in Galaxy, how many nodes they should use? You're right -- my word choices were poor. Replace 'node' with 'core' in my paragraph to get an accurate suggestion for resources. Galaxy uses a job scheduler--SGE on the cloud--to distribute jobs to different cluster nodes. Jobs that require multiple cores typically run on a single node. 
Enis can chime in on whether CloudMan supports job submission over multiple nodes; this would require setup of an appropriate parallel environment and a tool that can make use of this environment. Good luck, J. -- E: s...@unimelb.edu.au P: 03 903 53357 M: 0414 854 759
Re: [galaxy-dev] [galaxy-user] Using Galaxy Cloudman for a workshop
Hi Jeremy, Enis, That makes sense. I know I can configure how many threads BWA uses in its wrapper, with bwa -t. But, is there somewhere that I need to tell Galaxy the corresponding information, ie that this command-line task will make use of up to 4 cores? Or, does this imply that there is always exactly one job per node? So if I have (for instance) a cluster made of 4-core nodes, and a single-threaded task (e.g. samtools), are the other 3 cores just going to waste or will the scheduler allocate multiple single-threaded jobs to one node? I've cc'd galaxy-dev instead of galaxy-user as I think the conversation has gone that way! Thanks again, Clare On Fri, Nov 18, 2011 at 2:36 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Scalability issues are more likely to arise on the back end than the front end, so you'll want to ensure that you have enough compute nodes. BWA uses four nodes by default--Enis, does the cloud config change this parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to be able to run a BWA job simultaneously. Actually, one other question - this paragraph makes me realise that I don't really understand how Galaxy is distributing jobs. I had thought that each job would only use one node, and in some cases take advantage of multiple cores within that node. I'm taking a node to be a set of cores with their own shared memory, so in this case a VM instance, is this right? If some types of jobs can be distributed over multiple nodes, can I configure, in Galaxy, how many nodes they should use? You're right -- my word choices were poor. Replace 'node' with 'core' in my paragraph to get an accurate suggestion for resources. Galaxy uses a job scheduler--SGE on the cloud--to distribute jobs to different cluster nodes. Jobs that require multiple cores typically run on a single node. 
Enis can chime in on whether CloudMan supports job submission over multiple nodes; this would require setup of an appropriate parallel environment and a tool that can make use of this environment.

Good luck,
J.

--
E: s...@unimelb.edu.au P: 03 903 53357 M: 0414 854 759

___
Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
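To make the core-reservation side of Jeremy's answer concrete: under SGE (the scheduler CloudMan uses), a multi-threaded job reserves several slots on one node through a parallel environment, and the thread count passed to the tool should match that reservation. The sketch below is illustrative only - the parallel environment name "smp", the slot count, and the input file names are assumptions that depend on the site's SGE configuration, not Galaxy defaults.

```sh
#!/bin/sh
# Hypothetical SGE job script for a 4-thread BWA run.
# Assumes a parallel environment named "smp" exists and that
# bwa and the reference index are available on the execution host.
#$ -pe smp 4     # reserve 4 slots on a single node
#$ -cwd          # run in the submission directory
#$ -V            # carry over the submitting shell's environment

# SGE sets NSLOTS to the number of slots actually granted, so the
# tool's thread count always matches what the scheduler reserved.
bwa aln -t "$NSLOTS" hg19.fa reads.fastq > reads.sai
```

A single-threaded tool submitted without -pe takes one slot, so on a 4-slot node the scheduler can pack four such jobs side by side - idle cores only become a worry if slots are over-reserved.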
[galaxy-dev] Missing requirements in xml wrappers in galaxy-dist?
Hi James, all,

I have been getting some errors to do with the PATH environment variable. For instance, when uploading a SAM file to our local Galaxy instance, we got:

Traceback (most recent call last):
  File /data/ugalaxy/galaxy-dist/tools/data_source/upload.py, line 394, in <module>
    __main__()
  ...
  line 63, in _get_samtools_version
    output = subprocess.Popen( [ 'samtools' ], stderr=subprocess.PIPE, stdout=subprocess.PIPE ).communicate()[1]
  File /usr/local/lib/python2.7/subprocess.py, line 679, in __init__
    errread, errwrite)
  File /usr/local/lib/python2.7/subprocess.py, line 1228, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

I can post the full error if you'd like, but basically the problem was that samtools wasn't in the PATH. This was because we have our tools installed in a non-standard place, so we are depending on the requirements being specified as James described below, and samtools isn't specified as a requirement in upload.xml; so when upload.py calls datatypes.py and tries to use samtools, it gives an error.

I've found a couple of other examples like this - for instance, samtools is also used by some Picard scripts, so it should be specified as a requirement in the Picard wrappers. These problems probably don't show up in most cases, when people just have the tools installed as root and on their PATH by default? I'm going to be fixing these where I find them. Would it be helpful for me to contribute these tweaks back, or would it be better just to raise an issue?

Thanks,
Clare

On Wed, Nov 16, 2011 at 4:33 PM, Clare Sloggett s...@unimelb.edu.au wrote:

Looks like it's working! The problem I had run into, in hindsight, was: a) I hadn't set tool_dependency_dir, as I didn't know about it; b) if I had set it, I was installing tools to $SW/tool-name/version-number/ but the default tool wrappers in galaxy-dist don't have version numbers set, so they will just look in $SW/tool-name/.
Can I suggest this be added to the wiki somewhere under the Admin pages? Apologies if it's there - I couldn't find it except under News Briefs at http://wiki.g2.bx.psu.edu/News%20Briefs/2010_11_24?highlight=%28tool_dependency_dir%29 . As well as being on the wiki, it would be useful to have it (commented out by default) in universe_wsgi.ini. I think the tool_dependency_dir variable isn't in there at all at the moment, at least in the galaxy-dist I have. It would also be useful to have a brief mention of it, or a link, on http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup to save time for people like me who have tools installed in a non-standard place.

Thanks again!
Clare

On Wed, Nov 16, 2011 at 3:01 PM, Clare Sloggett s...@unimelb.edu.au wrote:

Great! Thanks James, this is exactly what I need.

On Wed, Nov 16, 2011 at 2:20 PM, James Taylor ja...@jamestaylor.org wrote:

On Nov 15, 2011, at 9:59 PM, Clare Sloggett wrote: If this is the case, what is the best way to install and maintain two versions of the same tool? I can write code into the wrapper to find the correct version of the tool in a given case, but I was wondering if there is a more standard 'galaxy' way to configure this.

You should provide tool_dependency_dir in the config file and point it at some directory $SW where you will install tools under. With this enabled, when a tool has a dependency, Galaxy will look for it under that directory and attempt to run a script to set up the environment. For example, if you have a tool with a dependency on foo version 1.3, Galaxy will look for:

$SW/foo/1.3/env.sh

and if found will source it as part of the job submission script. This usually contains something simple like PATH=$PACKAGE_BASE/bin:$PATH to add the binaries installed with the dependency to the path. Ideally all dependencies used by Galaxy tools would be installed in this way.
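For concreteness, the fix Clare describes is roughly a one-line change per wrapper. This is only a sketch of the requirements syntax of that era, with an assumed version number - the tag contents and version attribute must match the directory names under tool_dependency_dir for Galaxy to find the corresponding env.sh.

```xml
<!-- Hypothetical fragment for upload.xml or a Picard wrapper:
     declaring samtools makes Galaxy source
     $SW/samtools/0.1.18/env.sh before the job runs. -->
<requirements>
    <requirement type="package" version="0.1.18">samtools</requirement>
</requirements>
```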
--
E: s...@unimelb.edu.au P: 03 903 53357 M: 0414 854 759
[galaxy-dev] installing multiple versions of a tool / path configuration
Hi all,

I am a little confused as to the right way to configure my installed NGS tools. From the documentation I've found, and from looking at the xml/python wrappers, it looks to me like the NGS tool wrappers simply call the tools and assume they will be in the galaxy account's PATH. For instance, bwa_wrapper.py simply calls bwa in the shell. So it looks to me like the default assumption is that bwa was installed as root, is available to all users, and will always be on your PATH. Is this right, or am I missing some configuration options that should tell Galaxy where to find the bwa binary?

If this is the case, what is the best way to install and maintain two versions of the same tool? I can write code into the wrapper to find the correct version of the tool in a given case, but I was wondering if there is a more standard 'galaxy' way to configure this.

Sorry for the newbie questions again - I have been looking at the NGS setup and Tools documentation, but I don't think I've found anything on multiple installed versions of a tool. Also, if there are any docs I have missed on configuring PATH environment variables, please let me know!

Thanks,
Clare

--
E: s...@unimelb.edu.au P: 03 903 53357 M: 0414 854 759
Re: [galaxy-dev] installing multiple versions of a tool / path configuration
Great! Thanks James, this is exactly what I need.

On Wed, Nov 16, 2011 at 2:20 PM, James Taylor ja...@jamestaylor.org wrote:

On Nov 15, 2011, at 9:59 PM, Clare Sloggett wrote: If this is the case, what is the best way to install and maintain two versions of the same tool? I can write code into the wrapper to find the correct version of the tool in a given case, but I was wondering if there is a more standard 'galaxy' way to configure this.

You should provide tool_dependency_dir in the config file and point it at some directory $SW where you will install tools under. With this enabled, when a tool has a dependency, Galaxy will look for it under that directory and attempt to run a script to set up the environment. For example, if you have a tool with a dependency on foo version 1.3, Galaxy will look for:

$SW/foo/1.3/env.sh

and if found will source it as part of the job submission script. This usually contains something simple like PATH=$PACKAGE_BASE/bin:$PATH to add the binaries installed with the dependency to the path. Ideally all dependencies used by Galaxy tools would be installed in this way.

--
E: s...@unimelb.edu.au P: 03 903 53357 M: 0414 854 759
Re: [galaxy-dev] installing multiple versions of a tool / path configuration
Looks like it's working! The problem I had run into, in hindsight, was: a) I hadn't set tool_dependency_dir, as I didn't know about it; b) if I had set it, I was installing tools to $SW/tool-name/version-number/ but the default tool wrappers in galaxy-dist don't have version numbers set, so they will just look in $SW/tool-name/.

Can I suggest this be added to the wiki somewhere under the Admin pages? Apologies if it's there - I couldn't find it except under News Briefs at http://wiki.g2.bx.psu.edu/News%20Briefs/2010_11_24?highlight=%28tool_dependency_dir%29 . As well as being on the wiki, it would be useful to have it (commented out by default) in universe_wsgi.ini. I think the tool_dependency_dir variable isn't in there at all at the moment, at least in the galaxy-dist I have. It would also be useful to have a brief mention of it, or a link, on http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup to save time for people like me who had tools installed in a non-standard place.

Thanks again!
Clare

On Wed, Nov 16, 2011 at 3:01 PM, Clare Sloggett s...@unimelb.edu.au wrote:

Great! Thanks James, this is exactly what I need.

On Wed, Nov 16, 2011 at 2:20 PM, James Taylor ja...@jamestaylor.org wrote:

On Nov 15, 2011, at 9:59 PM, Clare Sloggett wrote: If this is the case, what is the best way to install and maintain two versions of the same tool? I can write code into the wrapper to find the correct version of the tool in a given case, but I was wondering if there is a more standard 'galaxy' way to configure this.

You should provide tool_dependency_dir in the config file and point it at some directory $SW where you will install tools under. With this enabled, when a tool has a dependency, Galaxy will look for it under that directory and attempt to run a script to set up the environment. For example, if you have a tool with a dependency on foo version 1.3, Galaxy will look for:

$SW/foo/1.3/env.sh

and if found will source it as part of the job submission script. This usually contains something simple like PATH=$PACKAGE_BASE/bin:$PATH to add the binaries installed with the dependency to the path. Ideally all dependencies used by Galaxy tools would be installed in this way.

--
E: s...@unimelb.edu.au P: 03 903 53357 M: 0414 854 759
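James's layout is easy to try end to end in a scratch directory. The sketch below is illustrative - the tool name, version number, and stub binary are made up - but the structure ($SW/tool-name/version/env.sh prepending the package's bin directory to PATH, sourced before the tool is invoked by bare name) is exactly the mechanism being described.

```shell
#!/bin/sh
# Build a throwaway tool_dependency_dir layout and show that sourcing
# env.sh is enough to make the tool resolvable by bare name.
SW=$(mktemp -d)                     # stand-in for tool_dependency_dir
PACKAGE_BASE="$SW/samtools/0.1.18"  # $SW/<tool-name>/<version>/
mkdir -p "$PACKAGE_BASE/bin"

# A stub binary standing in for a real samtools install.
printf '#!/bin/sh\necho samtools-stub\n' > "$PACKAGE_BASE/bin/samtools"
chmod +x "$PACKAGE_BASE/bin/samtools"

# The env.sh that Galaxy would source as part of the job submission script.
cat > "$PACKAGE_BASE/env.sh" <<EOF
PATH=$PACKAGE_BASE/bin:\$PATH
export PATH
EOF

# Simulate what the job script does: source env.sh, then call the tool.
. "$PACKAGE_BASE/env.sh"
samtools                            # prints "samtools-stub"
```

Because env.sh only prepends to PATH, two versions can coexist side by side ($SW/samtools/0.1.18/ and $SW/samtools/0.1.19/, say), with each wrapper's declared version selecting which env.sh gets sourced.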
Re: [galaxy-dev] Configuration of a local install - mi-deploy ?
Hi Greg, Ross,

Thanks for this! Yes, I confused the issue by mentioning the Tool Shed, sorry - it's the binaries themselves I need to install. Essentially I think I need to follow the steps at http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup , but I was wondering about the mi-deployment scripts as a better way to do this, and whether that's standard practice. After looking through the scripts, they really seem like the best option to me - it looks like they are set up so you mostly only need to change configuration variables to get this to work. The scripts are mentioned in the wiki instructions but don't seem to be the default option (they are not bundled in galaxy-dist), so I wondered if people are usually doing it this way?

Thanks,
Clare

On Fri, Nov 11, 2011 at 2:06 AM, Ross ross.laza...@gmail.com wrote:

Clare, As Greg says, the tool wrappers exposed on Main all come with a mercurial checkout - but all the binary dependencies, genomic data and indexes do not. The tool shed is really the tool-wrapper shed - binary and data dependencies aren't there either. I haven't tried them, but I'd guess that the deploy scripts take care of a lot of messy details, and will probably need some work to fit your local setup. As to email for admin - my advice would be: don't worry about it. If one user forgets their password you can reset it from the admin interface - that's the main use, and if this is really a test instance, it's not a show stopper.

On Thu, Nov 10, 2011 at 9:14 AM, Greg Von Kuster g...@bx.psu.edu wrote:

Hello Clare,

On Nov 9, 2011, at 11:11 PM, Clare Sloggett wrote: Hi all, Most of our playing around with Galaxy has been in getting it working on our local cloud, but now for the first time I'm configuring a non-cloud local install of galaxy-dist (set up as per http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Production%20Server). So I have some naive questions! Would it be a sensible approach to grab the tools_fabfile script from mi-deployment and use it in this case? Or should I be using the Tool Shed for installing the base set of tools?

The Galaxy distribution includes all of the tools you see on both the Penn State test and main instances. You can also get tools from the Tool Shed if you want - see our wiki at http://wiki.g2.bx.psu.edu/Tool%20Shed for information about how to do this. I don't think you'll need to use the tools_fabfile script from the mi-deployment repo for your local instance.

Also, if I have problems with this server sending out emails (which may be the case), am I going to run into trouble with user/password management, or can I just admin everything manually? There will only be a very small number of users on this server.

I'm not clear on this question, but you certainly shouldn't run into any problems with user/password management within Galaxy. From the Galaxy Admin interface you have the ability to Manage users, where you can reset passwords if necessary.

If I have missed any good 'getting started' documentation on config/admin of a local install, please point me in the right direction. I've been looking at http://wiki.g2.bx.psu.edu/Admin .

You found the right wiki.

Thanks, Clare

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu

--
Ross Lazarus MBBS MPH; Associate Professor, Harvard Medical School; Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444

--
E: s...@unimelb.edu.au P: 03 903 53357 M: 0414 854 759
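For anyone taking the mi-deployment route Clare settled on, the scripts are driven with Fabric from a local checkout. The invocation below is only a guess at the shape of the command - the host, the key file, and the task name install_tools are placeholders to check against the repository's tools_fabfile.py, not documented defaults.

```sh
# Hypothetical Fabric invocation against a fresh Ubuntu 10.04 host;
# consult tools_fabfile.py in the mi-deployment checkout for the
# actual task names and configuration variables.
fab -f tools_fabfile.py -i ~/.ssh/galaxy_key -H ubuntu@my-server install_tools
```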
Re: [galaxy-dev] Using a Galaxy image on a local system
Hi Alex, Enis,

Thanks very much! I've only recently got VMware installed and am having a look at the NBIC VM now. This is just the sort of thing I was looking for for quick start-up. I've heard of the afgane project - it has been suggested to us that this would be a good path to take for our local cloud. Is it specifically designed for creating AMIs, or is it really for automated configuration and deployment anywhere?

Thanks,
Clare

On Fri, Apr 29, 2011 at 8:48 AM, Enis Afgan eaf...@emory.edu wrote:

Hi Clare, Once you have a VM set up and accessible via ssh, you can also use our scripts for automated configuration and deployment of dependencies and tools. These scripts are used to set up Galaxy Cloud; they're targeted at Ubuntu 10.04 but should be applicable to other distributions as well. The scripts are available here: https://bitbucket.org/afgane/mi-deployment/overview

Good luck,
Enis

On Thu, Apr 28, 2011 at 3:16 AM, Clare Sloggett s...@unimelb.edu.au wrote:

Hi all, I would like to set up Galaxy locally. At the moment I'm just trying to use it on my desktop (a Mac, OS X 10.6.7), but later we will want a local server to play with. Rather than install Galaxy and then install all the tools it can use (and deal with OS X issues for some of them), it seems simpler just to use a virtual machine, since there are images which get regularly updated and come with pretty much everything. Is there anything wrong with this approach? I know there are Amazon EC2 images for Galaxy; so far as I know, there are no other kinds of images. So for using it on my desktop, I think my options are either to run an EC2-compatible system locally, or to try to convert the AMI to a VMware or VirtualBox image. I was just wondering if anyone has already tried either of these approaches? Also, is it possible to get hold of the Galaxy AMI files themselves? Any advice welcome!

Thanks,
Clare

___
Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/