Re: [galaxy-dev] Trouble setting up a local instance of Galaxy
Hi Starr,

I will try to answer some of your questions. Next time I recommend asking just one question per e-mail (with a corresponding subject line). This will make it much easier for the community to help you, and for others with similar problems to find the correct e-mail thread. See below for my answers...

On 11/06/2013 11:43 PM, Hazard, E. Starr wrote:
> Hello, I am a new user of Galaxy. I have a Galaxy instance running (sort of) on a local research cluster. I issued the command "hg update stable" today and it retrieved no files, so I presume I am up to date on the stable release. I start the instance as a user named "galaxy". Right now I am still running in "local" mode; I hope to migrate to DRMAA/LSF eventually. I have tried to set up ProFTP to upload files but have not succeeded, so I use Galaxy web upload.
>
> The upload was working nicely, and I had added a couple of new tools that were working with the uploaded files. Getting LSF/DRMAA to work was giving me fits, and ultimately I deleted all my history files in an effort to start over. Presently, files being uploaded appear in the history as, say, job 1 (in a new history). The job status in the history panel of the web GUI changes from purple to yellow and then to red, indicating some sort of error. There is no viewable error text captured, but I can click on the "eye" icon and see the first megabyte of the data (for tiny files I can see the entire content, and it's intact). In the Galaxy file system, however, these files appear with a different number, say, dataset_399.dat. On my system the uploaded files appear in /PATH/galaxy-dist/database/files/000.
>
> My first question is why the data is going into the "000" subdirectory and not one "owned" by the user who is uploading.

All files are owned by galaxy (independent of who uploads them and/or generates new files). The first 1000 files are stored in '000', the next 1000 files in '001', and so on.
Access permission is handled by the SQLite database (unless you have already switched to a PostgreSQL database).

> My second question is why the dataset is being labeled dataset_399.dat and not dataset_001.dat.

The number is given by the SQLite database. When you want a clean start, you need to remove all files from ~/database/files/000 and start with an empty database.

> My third question is why the uploaded files do not appear as selectable options (say I have paired-end FASTQ files and a tool wants to offer choices of filenames). This problem is present for programs that seek one input file as well.

When you uploaded the files, was the 'format' properly recognized?

> I presume that Galaxy is confused because the numbering in the history is not the same as the numbering in the file upload archive (e.g. /PATH/galaxy-dist/database/files/000 in my case), so my last question is how do I "reset" my system to get the dataset and history numbers to be the same?

The numbering in the history has nothing to do with the file numbering. See my comment above about a 'clean start'.

Regards, Hans-Rudolf

> Here's how I launch the Galaxy instance:
>
> sh /shared/app/Galaxy/galaxy-dist/run.sh -v --daemon --pid-file=Nov6Localdaemon.pid.txt --log-file=Nov6Local1639daemon.log.txt
> Entering daemon mode
>
> Here are the last lines of the log:
>
> Starting server in PID 26236.
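Hans-Rudolf's description of the file layout can be sketched as follows. This is only an illustration of the numbering convention he describes (ids 0-999 land in '000', ids 1000-1999 in '001', and so on), not Galaxy's actual implementation, and the base path is an example:

```python
import os

def dataset_path(dataset_id, file_path="database/files"):
    # The first 1000 datasets go in '000', the next 1000 in '001', etc.
    subdir = "%03d" % (dataset_id // 1000)
    return os.path.join(file_path, subdir, "dataset_%d.dat" % dataset_id)

print(dataset_path(399))   # -> database/files/000/dataset_399.dat
print(dataset_path(1234))  # -> database/files/001/dataset_1234.dat
```

This is why an upload can appear as item 1 in a fresh history yet be stored as dataset_399.dat: the history item number restarts per history, while the dataset id (and hence the file name and subdirectory) keeps counting up in the database.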
> serving on 0.0.0.0:8089 view at http://127.0.0.1:8089
> galaxy.tools.actions.upload_common DEBUG 2013-11-06 16:48:49,624 Changing ownership of /shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4 with: /usr/bin/sudo -E scripts/external_chown_script.py /shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4 hazards 502
> galaxy.tools.actions.upload_common WARNING 2013-11-06 16:48:49,750 Changing ownership of uploaded file /shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4 failed: sudo: no tty present and no askpass program specified
> galaxy.tools.actions.upload_common DEBUG 2013-11-06 16:48:49,751 Changing ownership of /shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO with: /usr/bin/sudo -E scripts/external_chown_script.py /shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO hazards 502
> galaxy.tools.actions.upload_common WARNING 2013-11-06 16:48:49,775 Changing ownership of uploaded file /shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO failed: sudo: no tty present and no askpass program specified
> galaxy.tools.actions.upload_common INFO 2013-11-06 16:48:49,805 tool upload1 created job id 170
> galaxy.jobs DEBUG 2013-11-06 16:48:50,678 (170) Persisting job destination (destination id: local)
> galaxy.jobs.handler INFO 2013-11-06 16:48:50,698 (170) Job dispatched
> galaxy.jobs.runners.local DEBUG 2013-11-06 16:48:50,994 (170) executing: python /shared/app/Galaxy/galaxy-dist/tools/data_source/upload.py /depot/shared/app/Galaxy/galaxy-dist /shared/app/Galaxy/galaxy-dist/database/tmp/tmpTq22ot /shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO
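The repeated "sudo: no tty present and no askpass program specified" warnings in this log usually mean the galaxy user is not permitted to run the chown script non-interactively. Below is a hedged sketch of the kind of sudoers entries (edited via visudo) that typically resolve this; the script path is taken from the log above, and the exact rule should be verified against the Galaxy documentation for external_chown_script.py before use:

```
# Illustrative /etc/sudoers entries only - verify against the Galaxy docs.
# Allow the galaxy user to run the chown script as root without a password,
# preserving the environment (the log shows sudo invoked with -E, hence SETENV):
galaxy ALL = (root) NOPASSWD: SETENV: /shared/app/Galaxy/galaxy-dist/scripts/external_chown_script.py
# If sudo is configured with requiretty, it must also be relaxed for this user:
Defaults:galaxy !requiretty
```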
[galaxy-dev] Using input dataset names in output dataset names
Hi all,

I'd like to change the output dataset labelling in Galaxy file format conversion tools. e.g. If the input is history entry 1 (e.g. "My Genes") then the output from tabular_to_fasta.xml is currently named "FASTA-to-Tabular on data 1". I would prefer this was "FASTA-to-Tabular on data My Genes", or better, "My Genes (as tabular)".

I've just done this for my BLAST XML to tabular tool, using the .display_name trick: https://github.com/peterjc/galaxy_blast/commit/31e31c4b5deadd60828ce6e6a381a5f90357393d

Would a pull request doing this to the built-in conversion tools be favourably received? Alternatively, would it be preferable to simply reuse the input dataset's name unchanged for simple format conversion tools (without text about the conversion)?

Related to this, would people prefer it if the $on_string in the case of a single input file was the input file's name (e.g. "My Genes") rather than "data 1"? (When there are multiple input files, $on_string needs to be kept short.)

Regards, Peter

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-dev] Using input dataset names in output dataset names
Hi Peter,

Thanks for raising this important topic. I think the following Trello card has a similar idea and a patch attached: https://trello.com/c/JnhOEqow

It would be great if we could simplify the naming of datasets; especially if you run a workflow with several inputs, you would like to keep the input name through the whole workflow to the end.

Cheers,
Bjoern
Re: [galaxy-dev] Using input dataset names in output dataset names
On Thu, Nov 7, 2013 at 12:29 PM, Bjoern Gruening bjoern.gruen...@gmail.com wrote:
> It would be great if we can simplify the naming of datasets, especially if you run a workflow with several inputs, you would like to keep the input name through the whole workflow to the end.

Yes, I was aware of some more general ideas like that, and I agree this is important. However, with the conversion tool naming we can make a small improvement right now, without having to modify the Galaxy core.

Peter
Re: [galaxy-dev] Using input dataset names in output dataset names
On Thu, Nov 7, 2013 at 12:21 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
> Related to this, would people prefer if the $on_string in the case of a single input file was the input file's name (e.g. "My Genes") rather than "data 1"? (When there are multiple input files, $on_string needs to be kept short.)

That turned out to be quite an easy change (patch below), and personally I think this makes the $on_string much nicer.

Peter

--
$ hg diff lib/galaxy/tools/actions/__init__.py
diff -r 77d58fdd1c2e lib/galaxy/tools/actions/__init__.py
--- a/lib/galaxy/tools/actions/__init__.py	Tue Oct 29 14:21:48 2013 -0400
+++ b/lib/galaxy/tools/actions/__init__.py	Thu Nov 07 15:15:42 2013 +0000
@@ -181,6 +181,7 @@
         input_names = []
         input_ext = 'data'
         input_dbkey = incoming.get( "dbkey", "?" )
+        on_text = ''
         for name, data in inp_data.items():
             if not data:
                 data = NoneDataset( datatypes_registry = trans.app.datatypes_registry )
@@ -194,6 +195,7 @@
             else: # HDA
                 if data.hid:
                     input_names.append( 'data %s' % data.hid )
+                    on_text = data.name  # Will use below if only one input dataset
                 input_ext = data.ext
                 if data.dbkey not in [None, '?']:
@@ -230,7 +232,10 @@
             output_permissions = trans.app.security_agent.history_get_default_permissions( history )
         # Build name for output datasets based on tool name and input names
         if len( input_names ) == 1:
-            on_text = input_names[0]
+            # We recorded the dataset name as on_text earlier...
+            if not on_text:
+                # Fall back on the shorter 'data %i' style:
+                on_text = input_names[0]
         elif len( input_names ) == 2:
             on_text = '%s and %s' % tuple(input_names[0:2])
         elif len( input_names ) == 3:
[galaxy-dev] local_task_queue_workers
Hi list,

I need a memory refresher about tasked jobs. When testing some larger analyses on a local installation, I thought the local_task_queue_workers setting in universe_wsgi.ini would be the limiting factor for how many tasks can be executed at the same time. In our setup, it is currently set to 2. However, 5 tasks are run simultaneously, leading to memory problems. Am I overlooking something that anyone knows of?

cheers,
jorrit boekel
--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm
[galaxy-dev] Bug: Two copies of wiggle_to_simple.xml
Hi all,

There are two copies of the wiggle_to_simple tool in the main repository, and this duplication appears to have happened back in 2009.

$ grep wiggle_to_simple tool_conf.xml.sample
  <tool file="filters/wiggle_to_simple.xml" />
  <tool file="stats/wiggle_to_simple.xml" />

$ diff tools/filters/wiggle_to_simple.py tools/stats/wiggle_to_simple.py
(no changes)

$ diff -w tools/filters/wiggle_to_simple.xml tools/stats/wiggle_to_simple.xml
15,18d14
<   <test>
<     <param name="input" value="3.wig" />
<     <output name="out_file1" file="3_wig.bed"/>
<   </test>

The tools/filters/wiggle_to_simple.xml version has Windows newlines and 2 tests. The tools/stats/wiggle_to_simple.xml version has Unix newlines but only 1 test. I would therefore suggest merging the two (Unix newlines, both tests).

Peter
Re: [galaxy-dev] Using input dataset names in output dataset names
On Thu, Nov 7, 2013 at 3:18 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
> That turned out to be quite an easy change (patch below), and personally I think this makes the $on_string much nicer.

Getting back to my motivating example: since fasta_to_tabular.xml does not give the output a label and depends on the default, the small change to $on_string should result in the conversion of a file named "My Genes" as "FASTA-to-Tabular on My Genes", rather than "FASTA-to-Tabular on data 1" as now.

Here's another variant to keep the "data 1" text in $on_string, if people are attached to this functionality. That would result in "FASTA-to-Tabular on data 1 (My Genes)". Also, here's an outline patch to explicitly produce my preferred label of "My Genes (as tabular)" etc. (Bjoern is right though - a more long-term solution is needed to better address naming, like the tag idea on Trello.)

Peter

--
$ hg diff lib/galaxy/tools/actions/__init__.py
diff -r 77d58fdd1c2e lib/galaxy/tools/actions/__init__.py
--- a/lib/galaxy/tools/actions/__init__.py	Tue Oct 29 14:21:48 2013 -0400
+++ b/lib/galaxy/tools/actions/__init__.py	Thu Nov 07 15:49:15 2013 +0000
@@ -181,6 +181,7 @@
         input_names = []
         input_ext = 'data'
         input_dbkey = incoming.get( "dbkey", "?" )
+        on_text = ''
         for name, data in inp_data.items():
             if not data:
                 data = NoneDataset( datatypes_registry = trans.app.datatypes_registry )
@@ -194,6 +195,8 @@
             else: # HDA
                 if data.hid:
                     input_names.append( 'data %s' % data.hid )
+                    # Will use this on_text if only one input dataset:
+                    on_text = "data %s (%s)" % ( data.id, data.name )
                 input_ext = data.ext
                 if data.dbkey not in [None, '?']:
@@ -230,7 +233,10 @@
             output_permissions = trans.app.security_agent.history_get_default_permissions( history )
         # Build name for output datasets based on tool name and input names
         if len( input_names ) == 1:
-            on_text = input_names[0]
+            # We recorded the dataset name as on_text earlier...
+            if not on_text:
+                # Fall back on the shorter 'data %i' style:
+                on_text = input_names[0]
         elif len( input_names ) == 2:
             on_text = '%s and %s' % tuple(input_names[0:2])
         elif len( input_names ) == 3:

--
$ hg diff tools
diff -r 77d58fdd1c2e tools/fasta_tools/fasta_to_tabular.xml
--- a/tools/fasta_tools/fasta_to_tabular.xml	Tue Oct 29 14:21:48 2013 -0400
+++ b/tools/fasta_tools/fasta_to_tabular.xml	Thu Nov 07 15:42:13 2013 +0000
@@ -11,7 +11,7 @@
     </param>
   </inputs>
   <outputs>
-    <data name="output" format="tabular"/>
+    <data name="output" format="tabular" label="$input.display_name (as tabular)"/>
   </outputs>
   <tests>
     <test>
diff -r 77d58fdd1c2e tools/fasta_tools/tabular_to_fasta.xml
--- a/tools/fasta_tools/tabular_to_fasta.xml	Tue Oct 29 14:21:48 2013 -0400
+++ b/tools/fasta_tools/tabular_to_fasta.xml	Thu Nov 07 15:42:13 2013 +0000
@@ -7,7 +7,7 @@
     <param name="seq_col" type="data_column" data_ref="input" numerical="False" label="Sequence column" />
   </inputs>
   <outputs>
-    <data name="output" format="fasta"/>
+    <data name="output" format="fasta" label="$input.display_name (as FASTA)" />
   </outputs>
   <tests>
     <test>
@@ -40,4 +40,4 @@
 GTGATATGTATGTTGACGGCCATAAGGCTGCTTCTT
 </help>
-</tool>
\ No newline at end of file
+</tool>
diff -r 77d58fdd1c2e tools/fastq/fastq_to_fasta.xml
--- a/tools/fastq/fastq_to_fasta.xml	Tue Oct 29 14:21:48 2013 -0400
+++ b/tools/fastq/fastq_to_fasta.xml	Thu Nov 07 15:42:13 2013 +0000
@@ -5,7 +5,7 @@
     <param name="input_file" type="data" format="fastq" label="FASTQ file to convert" />
   </inputs>
   <outputs>
-    <data name="output_file" format="fasta" />
+    <data name="output_file" format="fasta" label="$input_file.name (as FASTA)" />
   </outputs>
   <tests>
     <!-- basic test -->
diff -r 77d58fdd1c2e tools/fastq/fastq_to_tabular.xml
--- a/tools/fastq/fastq_to_tabular.xml	Tue Oct 29 14:21:48 2013 -0400
+++ b/tools/fastq/fastq_to_tabular.xml	Thu Nov 07 15:42:13 2013 +0000
@@ -8,7 +8,7 @@
     </param>
   </inputs>
   <outputs>
-    <data name="output_file" format="tabular" />
+    <data name="output_file" format="tabular" label="$input_file.name (as tabular)" />
   </outputs>
   <tests>
     <!-- basic test -->
diff -r 77d58fdd1c2e tools/fastq/tabular_to_fastq.xml
--- a/tools/fastq/tabular_to_fastq.xml
[galaxy-dev] Job execution order mixed-up
Dear Galaxy mailing-list,

Once again I come seeking your help. I hope someone has already had this issue or will have an idea on where to look to solve it. :)

One of our users reported having workflows fail because some steps were executed before all their inputs were ready. You can find a screenshot attached, where we can see that step (42) "Sort on data 39" has been executed while step (39) is still waiting to run (gray box). This behaviour has been reproduced with at least two different Galaxy tools (one custom, and the sort tool which comes standard with Galaxy). It seems to be somewhat random: running a workflow where this issue occurs twice, the steps were executed in the wrong order only one of the two times.

I could be wrong, but I don't think this issue is grid-related as, from my understanding, Galaxy is not using SGE's job dependency functionality. I believe all jobs stay in some internal queue (within Galaxy) until all input files are ready, and only then is the job submitted to the cluster.

Any help or any hint on what to look at to solve this issue would be greatly appreciated. We updated our Galaxy instance to the August 12th distribution on October 1st, and I believe we never experienced this issue before the update.

Many thanks for your help,
Jean-François
Re: [galaxy-dev] Samtools and idxstats
Hi Michiel,

Did you finish wrapping samtools idxstats? I can't see it on the Tool Shed... If not, I may tackle this shortly.

Peter

On Wed, May 22, 2013 at 2:39 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote:
> Galaxy stores a BAI for each BAM internally; you can access it in a tool wrapper like this (assuming the name of your input dataset is 'input_bam'):
>
> ${input_bam.metadata.bam_index}
>
> Once you have the file path, you can set up a symbolic link to it and the tool should work fine.
>
> Good luck,
> J.
>
> On May 22, 2013, at 4:05 AM, Michiel Van Bel wrote:
>> Hi, I would like to inquire whether anyone has attempted to implement the idxstats tool from samtools in Galaxy? The xml file for idxstats is not present in the Galaxy source code, which led me to try to implement it myself. However, the main problem I face is that the idxstats tool silently relies on having an index file available (within the same directory) for the BAM file whose stats you wish to print. E.g. "samtools idxstats PATH/test.bam" looks for PATH/test.bam.bai and gives an error when this file is not present, and somehow I cannot model this behavior in Galaxy. A different solution would of course be to ask the author(s) of samtools to add an option where the user can directly indicate the path to the index file.
>>
>> regards,
>> Michiel
>>
>> PS: I've searched the mailing list archives for this problem but did not find any matches. Apologies if I somehow missed the answer.
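Jeremy's suggestion can be sketched as a tool wrapper command block. This is a hypothetical fragment, not a tested wrapper: the parameter names input_bam and output, and naming the links input.bam / input.bam.bai, are assumptions for illustration.

```xml
<!-- Hypothetical sketch: link the BAM and its internally-stored BAI side
     by side so "samtools idxstats" finds input.bam.bai automatically. -->
<command>
    ln -s '${input_bam}' input.bam &amp;&amp;
    ln -s '${input_bam.metadata.bam_index}' input.bam.bai &amp;&amp;
    samtools idxstats input.bam &gt; '${output}'
</command>
```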
Re: [galaxy-dev] Security vulnerability in Galaxy filtering tools
Hello Ido,

The project has had a lot of contributors over the years; it is probably not safe to assume they have all been experts, and frequently experts know of little tricks or shortcuts that can result in a lot of trouble (this case in point) - a less sophisticated developer probably would not have even known about the Python functionality that resulted in this trouble. I suspect the mere fact that you are concerned about writing secure tools means you would do a better job than many professional software developers whom you may term experts.

Furthermore, you are absolutely right that there should be some documentation somewhere to aid in writing secure tools - the rest of this e-mail contains a couple of quick notes; hopefully it can be translated to the wiki at some point and grown over time.

If your tool is not taking in text inputs - it's all numbers and select parameters, etc. - it is very likely secure. These sorts of vulnerabilities usually come into play when users are allowed to pass free text and the tool or wrapper uses this text in such a way that it can be broken out of the intended context (these are broadly characterized as code injection attacks). 95% of the time, these text parameters are going to be passed as command-line arguments to another program. For this reason Galaxy preprocesses the text and sanitizes it so it cannot contain characters that would easily let the text result in code injection. So you are still probably fine unless you are circumventing this text preprocessing.

For instance, Galaxy will translate quotation marks to '__dq__'; the filter tool explicitly retranslates those back to quotation marks (https://bitbucket.org/galaxy/galaxy-central/src/f2f1cce4678cf1eb188d9611b05f00706afc8897/tools/stats/filtering.py?at=default#cl-176). There is a reason to do this in this case, but you will not need to for most bioinformatics applications.
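The character-to-token mapping described above can be illustrated with a small sketch. This is not Galaxy's actual code; '__dq__' is the token cited in the e-mail, while the '__sq__' token name is an assumption for illustration:

```python
# Illustrative only - not Galaxy's implementation. Characters that could
# break out of a shell context are replaced with harmless tokens before the
# value reaches the command line; a tool that needs the originals back
# (like the filter tool) must deliberately reverse the mapping.
MAPPED_CHARS = {'"': '__dq__', "'": '__sq__'}  # '__sq__' is an assumed token name

def sanitize_param(value):
    for char, token in MAPPED_CHARS.items():
        value = value.replace(char, token)
    return value

print(sanitize_param('say "hello"'))  # -> say __dq__hello__dq__
```

The point of the design is that the sanitized value is safe by default, and only a tool that knowingly reverses the mapping takes on the injection risk itself.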
If your tools are doing this, it is time to start getting extra careful. If you are really interested in this topic, or when it is time to get extra careful, I would recommend picking up the book "The Web Application Hacker's Handbook" - it is pretty good. Most of it would not be relevant for tool developers, but chapters 1, 2, and 9 could be very relevant and would probably leave one with a solid grasp of what to look for in many different contexts - not just the ones the book discusses explicitly.

Hopefully over time the IUC will provide guidance about this sort of thing (informing you if there are potential security vulnerabilities in your tools). Also feel free to post example tool configurations you are concerned about to this list; I am sure someone would be happy to look them over and tell you if there are any red flags.

-John

On Tue, Nov 5, 2013 at 12:30 PM, Ido Tamir ta...@imp.ac.at wrote:
> On Nov 5, 2013, at 6:28 PM, Nate Coraor n...@bx.psu.edu wrote:
>> Hi Ido,
>> Thanks for the feedback. Replies below.
>> On Nov 5, 2013, at 9:54 AM, Ido Tamir wrote:
>>> This seems to happen often, e.g. http://wiki.galaxyproject.org/DevNewsBriefs/2012_10_23#Compute_Tool_Security_Fix
>> I'm not sure I'd agree that it's often - we've had 4 or 5 vulnerabilities over the life of the project. 2 allowed arbitrary code execution; the others were less severe.
> But these were written by experts, not by people like me who don't know what the Galaxy framework really does/does not do with the input, so I guess I make many more mistakes.
>>> a) are there general guidelines in the wiki on how to avoid these problems when creating tools?
>> The guidelines for writing a Galaxy tool are no different from best practices for writing secure code. Specific to this vulnerability, execution of user input should be handled with extreme care, and this tool had some gaps in its input validation and sanitization. For what it's worth, the filter tool (on which the other vulnerable tools were based) is one of the few tools surviving from the very early days of Galaxy, and would not be implemented the same way if written today.
> I think it would be nice to have a small outline on the wiki of what Galaxy does with the input data and how it could affect a tool: what sanitisation is there by default so I don't have to worry about it, and what could happen if I don't take care to check/remove/sanitise ' | or ..., maybe with examples.
>>> b) is there a way to check automatically if all input fields are correctly escaped in a tool?
>> I am not sure how Galaxy could do this. Galaxy sanitizes the command line so that input fields passed to a tool as command-line arguments cannot be crafted to exploit the shell's parsing rules.
> That's good.
> best,
> ido
>> What the tool itself does with its inputs is out of Galaxy's control.
>> --nate
> A search for "security" in the wiki brings up:
> • Admin/Data Libraries/Library Security 0.0k - rev: 1 (current) last
Re: [galaxy-dev] local_task_queue_workers
I think you want to set local_job_queue_workers instead, or set the number of workers on the local job runner plugin element in the newer job_conf.xml style configuration - the task runner delegates to the underlying job runners once it has split out the tasks. The parameter you were setting, I am guessing, just determines the number of threads used to split up tasks, not run them.

-John

On Thu, Nov 7, 2013 at 9:17 AM, Jorrit Boekel jorrit.boe...@scilifelab.se wrote:
> I thought the local_task_queue_workers setting in universe_wsgi.ini would be the limiting factor for how many tasks can be executed at the same time. In our setup, it is currently set to 2. However, 5 tasks are run simultaneously, leading to memory problems.
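John's second option can be written down concretely in the job_conf.xml style. A minimal sketch, assuming the stock local runner plugin; the ids and worker count are examples to adapt and should be checked against job_conf.xml.sample in your Galaxy distribution:

```xml
<?xml version="1.0"?>
<!-- Sketch of bounding concurrent local jobs: the "workers" attribute on
     the plugin limits how many jobs this runner executes at once. -->
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="2"/>
    </plugins>
    <destinations default="local">
        <destination id="local" runner="local"/>
    </destinations>
</job_conf>
```

In the older universe_wsgi.ini style, this corresponds to the local_job_queue_workers setting per John's first suggestion.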
Re: [galaxy-dev] Security vulnerability in Galaxy filtering tools
John, just made it to the Wiki: http://wiki.galaxyproject.org/Develop/SecurityToolTips

Feel free to add/edit/delete.

M.

On Thu, Nov 7, 2013 at 12:27 PM, John Chilton chil...@msi.umn.edu wrote:
> Hello Ido,
> ...
Re: [galaxy-dev] Bug: Two copies of wiggle_to_simple.xml
There are still two copies in central, but I have synchronized the test cases and fixed the newlines per your suggestion. I imagine the easiest way to fix the fact that there are two copies is just to wait until they get migrated to the tool shed - it seems like progress is being made on that front very quickly lately. Thanks, -John On Thu, Nov 7, 2013 at 9:38 AM, Peter Cock p.j.a.c...@googlemail.com wrote: Hi all, There are two copies of the wiggle_to_simple tool in the main repository, and this duplication appears to have happened back in 2009.

$ grep wiggle_to_simple tool_conf.xml.sample
<tool file="filters/wiggle_to_simple.xml" />
<tool file="stats/wiggle_to_simple.xml" />
$ diff tools/filters/wiggle_to_simple.py tools/stats/wiggle_to_simple.py
(no changes)
$ diff -w tools/filters/wiggle_to_simple.xml tools/stats/wiggle_to_simple.xml
15,18d14
< <test>
<   <param name="input" value="3.wig" />
<   <output name="out_file1" file="3_wig.bed" />
< </test>

The tools/filters/wiggle_to_simple.xml version has Windows newlines, and 2 tests. The tools/stats/wiggle_to_simple.xml version has Unix newlines, but only 1 test. I would therefore suggest merging the two (Unix newlines, both tests). Peter ___ Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-dev] Dynamic tool configuration
Galaxy is not optimized for this kind of use case - I like to think of it as sort of file centric - but it is clear more and more people are using it to process data in this fashion. Hopefully someone chimes in with better advice than I have :). The way I would probably implement this is to create a tool that lists the available options out into a dataset (or into the metadata of a dataset of a new datatype), and then have your tool read that dataset in and build the options using that file. Jim Johnson's mothur tools (available on the tool shed) do a lot of this sort of thing - it may be worth looking at the datatypes and an example such as remove.groups.xml:

...
<param name="group_in" type="data" format="groups" label="group - Groups"/>
<conditional name="groupnames">
    <param name="source" type="select" label="Select Group Names from">
        <option value="groups">A List of Group Names</option>
        <option value="accnos">A History Group Name Accnos Dataset</option>
    </param>
    <when value="groups">
        <param name="groups" type="select" label="groups - Pick groups to remove" multiple="true">
            <options>
                <filter type="data_meta" ref="group_in" key="groups" />
            </options>
        </param>
    </when>
    <when value="accnos">
        <param name="accnos" type="data" format="accnos" label="accnos - Group Names from your history"/>
    </when>
</conditional>
...

In particular, note the data_meta filter type in there. -John On Wed, Nov 6, 2013 at 8:03 AM, Biobix Galaxy biobix.gal...@gmail.com wrote: Hi all, We are working on a galaxy tool suite for data analysis. We use a sqlite db to keep result data centralised between the different tools. At one point the tool configuration options of a tool should be dependent on the rows within a table of the sqlite db that is the output of the previous step. In other words, we would like to be able to set selectable parameters based on an underlying sql statement. If sql is not possible, an alternative would be to output the table content into a txt file and subsequently parse the txt file instead of the sqlite db within the xml configuration file.
When looking through the galaxy wiki and mailing lists I came across the code tag, which would be ideal - we could run a python script in the background to fetch data from the sqlite table - however, that function is deprecated. Does anybody know of other ways to achieve this? Thanks! Jeroen Ir. Jeroen Crappé, PhD Student, Lab of Bioinformatics and Computational Genomics (Biobix), FBW - Ghent University ___ Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
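Since the code tag is deprecated, the txt-file alternative Jeroen mentions can be done with a small helper tool that runs before the configuration step. A hedged sketch - the table and column names are made up for illustration - that dumps a SQLite table into a value<TAB>label file which a downstream tool could then read to build its select options:

```python
import sqlite3

def export_options(db_path, table, out_path):
    """Write one 'value<TAB>label' line per row of the given table.

    NOTE: 'table' is interpolated into the SQL, so it must come from
    a trusted source (e.g. the tool wrapper), never from user input.
    """
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT id, name FROM %s ORDER BY id" % table).fetchall()
    finally:
        con.close()
    with open(out_path, "w") as out:
        for row_id, name in rows:
            out.write("%s\t%s\n" % (row_id, name))
```

The resulting text file becomes an ordinary history dataset, which the next tool in the suite can consume - e.g. via a data_meta filter on a custom datatype, as in the mothur example above.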
Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas
On Thu, Nov 7, 2013 at 1:46 AM, Björn Grüning bjoern.gruen...@pharmazie.uni-freiburg.de wrote: On Thursday, 07.11.2013 at 00:25 -0600, John Chilton wrote: My two cents below. On Wed, Nov 6, 2013 at 4:20 PM, Björn Grüning bjoern.gruen...@pharmazie.uni-freiburg.de wrote: Hi Dave, We're thinking that the following approach makes the most sense:

<action type="setup_perl_environment"> OR
<action type="setup_r_environment"> OR
<action type="setup_ruby_environment"> OR
<action type="setup_virtualenv">
    <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="perl" version="5.18.1" />
    </repository>
    <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="expat" version="2.1" />
    </repository>
</action>

For all repository tag sets contained within these setup_* tags, the repository's env.sh would be pulled in for the setup of the specified environment without requiring a set_environment_for_install action type. Would this work for your use cases? Yes, the first one. But it's a little bit too verbose, no? Including the perl repository in a setup_perl_environment should be implicit, no? We can assume that it needs to be present. Do you have an example of why sourcing every repository by default can be harmful? It would make such an installation so much easier and less complex. I am not sure I understand this paragraph - I have a vague sense I agree, but is there any chance you could rephrase this or elaborate? My first use case will be addressed by this suggestion. I had hoped that we could create a less verbose syntax. If I specify a package at the top of my xml file:

<package name="expat" version="2.1.0">
    <repository name="package_expat_2_1" owner="iuc" prior_installation_required="True" />
</package>

I need to repeat it either in an <action type="set_environment_for_install"> or in an <action type="setup_perl_environment">. My hope was to get rid of these.
Once a package definition is specified/built, every ENV var is available in any downstream package. But if there are any downsides or pitfalls, this more verbose and explicit syntax will work for my use case. I see, this makes perfect sense to me now, thanks! I certainly agree that it should not have to be spelled out twice unless there is a good reason. I guess my preference would be to just see it inside of the setup_perl_environment tag - why should it need to be at the top level as well? There could be many implementation details that make this difficult though, so obviously I defer to Greg/Dave on this. Also, that did not solve the second use case: if I have two packages, one that installs perl libraries and a second, a binary, that needs these perl libs. We have discussed this off-list in another thread. Just to summarize my thoughts there - I think we should delay this, or not make it a priority, if there are marginally acceptable workarounds that can be found for the time being. Getting these four actions to work well as sort of terminal endpoints, and allowing specification as tersely as possible, should be the primary goal for the time being. You will see Perl or Python packages depend on C libraries 10 times more frequently than you will find makefiles and C programs depending on complex perl or python environments (correct me if I am wrong). Given that there is already years' worth of tool shed development outlined in existing Trello cards - this is just how I would prioritize things (happy to be overruled). Ok, point taken. Let's focus on the real issue. That use case is just a simplification / more structured way to write tool dependencies; it's not strictly needed to get my packages done.
John, just to make that use case clearer: - You have a package (A) with dependency (B) - B is not worth putting in an extra repository (extra tool_dependencies.xml file) Currently, you are forced to define both in one package tag, because if you define them in two package tags, A will not see B. The perl and python case was a bad example; you have that problem with every dependency that is not worth putting in a separate repository. To summarize: I'm fine with that approach. It will address my current use case and it would be great to have it as proposed by Dave! Thanks a lot! Bjoern If so, can you confirm that this should be done for all four currently supported setup_* action types? I think it would be best to tackle setup_r_environment and setup_ruby_environment first. setup_virtualenv cannot have nested elements at this time - it is just assumed to be a bunch of text (either a file containing the dependencies or a list of the dependencies). So setup_r_environment and setup_ruby_environment have the same structure:

<setup_ruby_environment>
    <repository .. />
    <package .. />
    <package .. />
</setup_ruby_environment>
...
Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas
Please see my inline comments. Thanks! Greg Von Kuster On Nov 7, 2013, at 1:33 PM, John Chilton chil...@msi.umn.edu wrote: On Thu, Nov 7, 2013 at 1:46 AM, Björn Grüning bjoern.gruen...@pharmazie.uni-freiburg.de wrote: On Thursday, 07.11.2013 at 00:25 -0600, John Chilton wrote: My two cents below. On Wed, Nov 6, 2013 at 4:20 PM, Björn Grüning bjoern.gruen...@pharmazie.uni-freiburg.de wrote: Hi Dave, We're thinking that the following approach makes the most sense:

<action type="setup_perl_environment"> OR
<action type="setup_r_environment"> OR
<action type="setup_ruby_environment"> OR
<action type="setup_virtualenv">
    <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="perl" version="5.18.1" />
    </repository>
    <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="expat" version="2.1" />
    </repository>
</action>

For all repository tag sets contained within these setup_* tags, the repository's env.sh would be pulled in for the setup of the specified environment without requiring a set_environment_for_install action type. Would this work for your use cases? Yes, the first one. But it's a little bit too verbose, no? Including the perl repository in a setup_perl_environment should be implicit, no? We can assume that it needs to be present. Do you have an example of why sourcing every repository by default can be harmful? It would make such an installation so much easier and less complex. I am not sure I understand this paragraph - I have a vague sense I agree, but is there any chance you could rephrase this or elaborate? My first use case will be addressed by this suggestion. I had hoped that we could create a less verbose syntax.
If I specify a package at the top of my xml file:

<package name="expat" version="2.1.0">
    <repository name="package_expat_2_1" owner="iuc" prior_installation_required="True" />
</package>

I need to repeat it either in an <action type="set_environment_for_install"> or in an <action type="setup_perl_environment">. My hope was to get rid of these. Once a package definition is specified/built, every ENV var is available in any downstream package. But if there are any downsides or pitfalls, this more verbose and explicit syntax will work for my use case. The potential problem I see here is that environment variables are not namespaced in any way, so if all env.sh files are sourced no matter what, there is the potential for a certain environment variable to get set to a certain dependency version, and then later during the installation (assuming a hierarchy of repository dependencies), the same environment variable gets set to a different version of the same dependency. I'm not sure how often (if ever) this could occur, but if it did, the installation would not be as expected. I see, this makes perfect sense to me now, thanks! I certainly agree that it should not have to be spelled out twice unless there is a good reason. I guess my preference would be to just see it inside of the setup_perl_environment tag - why should it need to be at the top level as well? There could be many implementation details that make this difficult though, so obviously I defer to Greg/Dave on this. Just so I'm clear on this, is this what you want implemented as an enhancement to the setup_* tag sets?
<action type="setup_perl_environment"> OR
<action type="setup_r_environment"> OR
<action type="setup_ruby_environment"> OR
<action type="setup_virtualenv">
    <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="perl" version="5.18.1" />
    </repository>
    <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="expat" version="2.1" />
    </repository>
</action>

For all repository tag sets contained within these setup_* tags, the repository's env.sh would be pulled in for the setup of the specified environment without requiring a set_environment_for_install action type. Also, that did not solve the second use case: if I have two packages, one that installs perl libraries and a second, a binary, that needs these perl libs. We have discussed this off-list in another thread. Just to summarize my thoughts there - I think we should delay this, or not make it a priority, if there are marginally acceptable workarounds that can be found for the time being. Getting these four actions to work well as sort of terminal endpoints, and allowing specification as tersely as possible, should be the primary goal for the time being. You will see Perl or Python packages depend on C libraries 10 times more frequently than you will find makefiles and C programs depending on complex perl or python environments (correct me if I am
Re: [galaxy-dev] Installing Galaxy behind an Apache proxy using mod_auth_cas for user auth
Dear all, I have found a solution, but unfortunately I cannot explain why the solution on the Admin pages is not working. The following entries in httpd.conf solved the problem in our environment. Maybe this is useful for other CAS users. Best, Sandra

RewriteEngine on
<Location />
    # Define the authentication method
    AuthType CAS
    AuthName "Galaxy"
    Require valid-user
</Location>
# Proxy Configurations
ProxyVia On
ProxyPassInterpolateEnv On
<Proxy *>
    Order allow,deny
    Allow from all
</Proxy>
ProxyPass / http://galaxy.crc.nd.edu:8080/
ProxyPassReverse / http://galaxy.crc.nd.edu:8080/
RequestHeader set REMOTE_USER %{REMOTE_USER}s
SSLProxyEngine On
AllowCONNECT 8080
RewriteRule ^(.*) http://galaxy.crc.nd.edu:8080$1 [P]

From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Sandra Gesing [sandra.ges...@nd.edu] Sent: Tuesday, November 05, 2013 5:46 PM To: galaxy-dev@lists.bx.psu.edu Subject: [galaxy-dev] Installing Galaxy behind an Apache proxy using mod_auth_cas for user auth Dear all, I would like to set up a local Galaxy instance behind an Apache server with our local CAS for authentication. It would be great if you could give me a hint for the httpd.conf. I have the problem that, after authenticating against CAS in the browser, I get the following error message, and REMOTE_USER doesn't seem to be in the HTTP header for Galaxy (I can see REMOTE_USER in the access_log of Apache but not any more in paster.log of Galaxy): "Access to Galaxy is denied. Galaxy is configured to authenticate users via an external method (such as HTTP authentication in Apache), but a username was not provided by the upstream (proxy) server. This is generally due to a misconfiguration in the upstream server." I know that the same question was already asked in the following post, but I haven't seen an option to extend the post and I haven't found an answer.
http://dev.list.galaxyproject.org/Installing-Galaxy-behind-an-Apache-proxy-using-mod-auth-cas-for-user-auth-tt4660837.html#none Any help is much appreciated. Many thanks, Sandra ___ Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-dev] datacache bowtie2 for mm9 ?
Hi Curtis, This is still open, but I expect to correct this very soon, along with new data additions (but corrections are first on the list!). We definitely consider this important and apologize that this has impacted your paper's supplemental creation. There is a Trello ticket containing the known data problems since the migration of usegalaxy.org; you can follow it for updates here: https://trello.com/c/SbizUDQt Thanks! Jen, Galaxy team On 10/29/13 8:18 AM, Curtis Hendrickson (Campus) wrote: Jennifer, What's the status of the bowtie2/mm9 index on PSU main? When I select tophat2, it offers me mm9 as a choice for built-in indexes. However, when the job runs, I get the following error, indicating the bowtie2/mm9 indexes are missing (below). Any insight into whether this is expected, or what the ETA is until the index would be installed, would be great. I'm trying to reproduce work on PSU that I ran on my local galaxy, so that we can link to it for supplemental materials for a paper. Thanks, Curtis PS -- I clicked the submit bug button a few days ago, but haven't received a response yet.

Fatal error: Tool execution failed
[2013-10-29 10:13:27] Beginning TopHat run (v2.0.9)
[2013-10-29 10:13:27] Checking for Bowtie
Bowtie version: 2.1.0.0
[2013-10-29 10:13:27] Checking for Samtools
Samtools version: 0.1.18.0
[2013-10-29 10:13:27] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (/galaxy/data/mm9/mm9full/bowtie2_index/mm9full.*.bt2)

*From:* Jennifer Jackson [mailto:j...@bx.psu.edu] *Sent:* Friday, September 20, 2013 4:00 PM *To:* Curtis Hendrickson (Campus) *Subject:* Re: [galaxy-dev] datacache bowtie2 for mm9 ? Thanks Curtis, I am actually working to try to get mm9 out there right now. No promises, but it is just one (well, three, including variants)! If the technical side is a go, then I will do it. Ideally others soonish. We'll see. The last news brief has help for the Data Manager; it may be that you need to make some config changes to get it going.
I am certainly no expert - this is Dan's work and under active development - but that is where I would start. Jen On 9/20/13 1:25 PM, Curtis Hendrickson (Campus) wrote: Thanks for the rapid reply! I have some questions and comments, but need to read up on Data Managers first (that admin page seems non-functional in our local galaxy, despite being on the latest code). Regards, Curtis *From:* Jennifer Jackson [mailto:j...@bx.psu.edu] *Sent:* Friday, September 20, 2013 2:34 PM *To:* Curtis Hendrickson (Campus) *Cc:* galaxy-...@bx.psu.edu *Subject:* Re: [galaxy-dev] datacache bowtie2 for mm9 ? Hello Curtis, The datacache was originally pointed at the data staging area and is now pointed at the data published area. The difference is that the published area contains data and location (.loc) files that are in sync and have completed final testing. It is your choice whether to use the staged-only data - it depends how risk tolerant your project is and whether you plan on testing. But, that said, I think it is almost certainly fine or our team wouldn't have staged it yet. A vanishingly small number of datasets are pulled back once they make it to staging, and this is why we were comfortable pointing datacache there in the first place (we were unable to point to the published area at first, but wanted to make the data available ASAP). Going forward - I can let you know that these indexes are very easy to create: one command-line execution, then add one line to the associated .loc file. Instructions are here; see Bowtie and Tophat: http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup For one or a few genomes, this is not a problem. For hundreds of genomes with variants, it can become tedious even with helper tools, and in our case the processing interacted with disk that was undergoing changes (as we have been working on system configuration most of the summer). Also, with the Data Manager now available, creating batch indexes for use via rsync becomes a lower priority.
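Jen's "one command-line execution, then add one line to the .loc file" recipe: after something like `bowtie2-build mm9full.fa mm9full`, the registration step is just appending one tab-separated row. A hedged sketch of that step - the four-column layout (value, dbkey, display name, index base path) follows the common bowtie2_indices.loc convention, but check the .loc.sample file shipped with the tool for the exact columns it expects:

```python
def add_loc_entry(loc_path, value, dbkey, name, index_base):
    """Append one tab-separated index entry to a Galaxy .loc file.

    Column order here assumes the usual bowtie2_indices.loc layout;
    verify against your tool's .loc.sample before relying on it.
    """
    with open(loc_path, "a") as loc:
        loc.write("\t".join([value, dbkey, name, index_base]) + "\n")
```

Galaxy reads .loc files at startup, so a restart (or a reload of the tool data tables) is typically needed before the new index shows up in the tool form.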
Even so, I would expect more indexes to be fully published once the final configuration is in place, as many are already staged or close to being staged (watch the yellow banner on Main). Hopefully this helps to explain the data, guides you in making an informed decision, and aids with creating your own indexes as needed. Thanks! Jen, Galaxy team On 9/18/13 1:04 PM, Curtis Hendrickson (Campus) wrote: Folks, First, I wanted to thank you for making the datacache available (http://wiki.galaxyproject.org/Admin/Data%20Integration; rsync://datacache.g2.bx.psu.edu). It's a great resource. However, what is the best way to stay abreast of
Re: [galaxy-dev] Bowtie2 mm9 index
Hello Mary, The underlying data is not complete for all mm9 reference genomes for the Bowtie2/Tophat2 tools, causing the error. It is a known issue and we expect to have it corrected very soon now. We consider this important and high priority; our apologies for the confusion it caused. The known data issues since the migration of usegalaxy.org are listed in this ticket. You can follow the ticket to find out when this genome has been restored: https://trello.com/c/SbizUDQt Best, Jen, Galaxy team On 10/29/13 7:43 AM, Davis, Mary wrote: Greetings--- I tried to run an alignment using Bowtie2 and got this message (format: bam, database: mm9):

Could not locate a Bowtie index corresponding to basename /galaxy/data/mm9/mm9canon/bowtie2_index/mm9canon
Error: Encountered internal Bowtie 2 exception (#1)
Command: /galaxy/software/linux2.6-x86_64/pkg/bowtie2-2.1.0/bin/bowtie2-align --wrapper basic

I imported Illumina fastq data, groomed them, and then did the analysis using the built-in mouse index, both full and male, and had the same error message. I'm relatively new to this and don't see what I missed. Thanks, Mary E. Davis, Ph.D., Professor, Department of Physiology and Pharmacology, West Virginia University Health Sciences Center, PO Box 9229, Morgantown, WV 26506-9229 ___ Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/ -- Jennifer Hillman-Jackson http://galaxyproject.org
Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas
The potential problem I see here is that environment variables are not namespaced in any way, so if all env.sh files are sourced no matter what, there is the potential for a certain environment variable to get set to a certain dependency version, and then later during the installation (assuming a hierarchy of repository dependencies), the same environment variable gets set to a different version of the same dependency. I'm not sure how often (if ever) this could occur, but if it did, the installation would not be as expected. Mh, possible, but really a rare corner case, I think. We could offer a 'do-not-source-automatically' tag to prevent such corner cases:

<repository name="package_expat_2_1" owner="iuc" sourcing="manual|auto" />

I see, this makes perfect sense to me now, thanks! I certainly agree that it should not have to be spelled out twice unless there is a good reason. I guess my preference would be to just see it inside of the setup_perl_environment tag - why should it need to be at the top level as well? There could be many implementation details that make this difficult though, so obviously I defer to Greg/Dave on this. Just so I'm clear on this, is this what you want implemented as an enhancement to the setup_* tag sets?

<action type="setup_perl_environment"> OR
<action type="setup_r_environment"> OR
<action type="setup_ruby_environment"> OR
<action type="setup_virtualenv">
    <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="perl" version="5.18.1" />
    </repository>
    <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="expat" version="2.1" />
    </repository>
</action>

For all repository tag sets contained within these setup_* tags, the repository's env.sh would be pulled in for the setup of the specified environment without requiring a set_environment_for_install action type. Yes, that would solve John's and my use case. Thanks Greg!
Bjoern Also, that did not solve the second use case: if I have two packages, one that installs perl libraries and a second, a binary, that needs these perl libs. We have discussed this off-list in another thread. Just to summarize my thoughts there - I think we should delay this, or not make it a priority, if there are marginally acceptable workarounds that can be found for the time being. Getting these four actions to work well as sort of terminal endpoints, and allowing specification as tersely as possible, should be the primary goal for the time being. You will see Perl or Python packages depend on C libraries 10 times more frequently than you will find makefiles and C programs depending on complex perl or python environments (correct me if I am wrong). Given that there is already years' worth of tool shed development outlined in existing Trello cards - this is just how I would prioritize things (happy to be overruled). Ok, point taken. Let's focus on the real issue. That use case is just a simplification / more structured way to write tool dependencies; it's not strictly needed to get my packages done. John, just to make that use case clearer: - You have a package (A) with dependency (B) - B is not worth putting in an extra repository (extra tool_dependencies.xml file) Currently, you are forced to define both in one package tag, because if you define them in two package tags, A will not see B. The perl and python case was a bad example; you have that problem with every dependency that is not worth putting in a separate repository. To summarize: I'm fine with that approach. It will address my current use case and it would be great to have it as proposed by Dave! Thanks a lot! Bjoern If so, can you confirm that this should be done for all four currently supported setup_* action types? I think it would be best to tackle setup_r_environment and setup_ruby_environment first.
setup_virtualenv cannot have nested elements at this time - it is just assumed to be a bunch of text (either a file containing the dependencies or a list of the dependencies). So setup_r_environment and setup_ruby_environment have the same structure:

<setup_ruby_environment>
    <repository .. />
    <package .. />
    <package .. />
</setup_ruby_environment>
...

but setup_virtualenv is just:

<setup_virtualenv>requests=1.20 pycurl==1.3</setup_virtualenv>

I have created a Trello card for this: https://trello.com/c/NsLJv9la (and some other related stuff). Once that is tackled, though, it will make sense to allow setup_virtualenv to utilize the same functionality. Thanks all, -John I think it will solve my current issues.