Re: [galaxy-dev] Local Galaxy Instance MarkupSafe error
Hi Nate,

That must be the issue: I just learnt the head node is running Red Hat 5 while Galaxy and all its dependencies are running on a Red Hat 6 server, so the Python versions are indeed not the same. I'll see what I can do with my IS department then. Cheers for the help :)

Micka

On 2 November 2011 17:18, Nate Coraor n...@bx.psu.edu wrote:
> Jerico Nico De Leon Revote wrote:
> > It's the same case as what I'm getting. I can see the output via the eye icon on the history panel and am able to download the files as well.
>
> Do your cluster nodes have internet access? If so, log into a node and run the command again from there. Your nodes may have a different Python version or Unicode byte-order encoding scheme than your Galaxy application server.
>
> --nate

On 1 November 2011 03:37, Mickael ESCUDERO mickael.escud...@gmail.com wrote:
> Hi there, I'm getting exactly the same problem with any job running on a TORQUE/PBS cluster. The jobs actually run fine, as I can see the output and download it, but they are marked as failed in the Galaxy history, with the following message:
>
> WARNING:galaxy.eggs:Warning: MarkupSafe (a dependent egg of Mako) cannot be fetched
>
> The command `python -ES ./scripts/fetch_eggs.py` gives nothing as output. If I run the same tools locally there is no problem. Cheers, Micka

Message: 5
Date: Thu, 27 Oct 2011 15:08:17 +1100
From: Jerico Nico De Leon Revote jerico.rev...@monash.edu
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] Local Galaxy Instance MarkupSafe error

Hi, I'm just doing a simple get-data from UCSC on our local Galaxy instance and got the following error:

WARNING:galaxy.eggs:Warning: MarkupSafe (a dependent egg of Mako) cannot be fetched

The job box is then displayed as red on the history panel. The job runner states that the job finished normally on the cluster. Galaxy is checked out from galaxy-central (changeset: 6176:34fffbf01183).

Thanks, Jerico

Message: 6
Date: Thu, 27 Oct 2011 16:58:13 +1100
From: Jerico Nico De Leon Revote jerico.rev...@monash.edu
To: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Local Galaxy Instance MarkupSafe error

Just to follow up on this: the MarkupSafe egg is definitely present in the eggs directory, and the servers are run through virtualenv.

Message: 7
Date: Thu, 27 Oct 2011 02:40:51 -0400
From: Nate Coraor n...@bx.psu.edu
To: Jerico Nico De Leon Revote jerico.rev...@monash.edu
Subject: Re: [galaxy-dev] Local Galaxy Instance MarkupSafe error

Hi Jerico,

Are you using a cluster? If not, could you run:

% python -ES ./scripts/fetch_eggs.py

--nate
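A quick way to run the comparison Nate suggests: execute the following on the Galaxy application server and again on a cluster node (a minimal sketch, not part of Galaxy). Both values must match, since a narrow/wide Unicode build changes which platform-specific eggs resolve.

    import sys
    print sys.version      # interpreter version; must match the Galaxy server
    print sys.maxunicode   # 65535 = narrow (UCS-2) build, 1114111 = wide (UCS-4) build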
[galaxy-dev] GeneTrack-Installation - Version?
Hi,

I sent this message about two weeks ago, but so far there has been no response. I'm trying again; maybe this time someone who has some advice will notice it :-)

Regards, Steffi

-------- Original Message --------
Subject: GeneTrack-Installation - Version?
Date: Thu, 20 Oct 2011 23:21:25 +0200
From: Stefanie Ververs stefanie.verv...@fh-stralsund.de
To: galaxy-dev@lists.bx.psu.edu

Hi everybody,

We're hosting our own Galaxy instance, and the next step should be the integration of GeneTrack. (Not the tool, which is already included, but the browser, which has to be set up on our own, as I could read on the mailing list.) I've been reading all the information on http://genetrack.bx.psu.edu and http://atlas.bx.psu.edu/genetrack/docs/genetrack.html, but I cannot work out which is the current, newest version. The only one available for download is at googlecode (1.0.3), and according to the information on http://genetrack.bx.psu.edu there is a newer one (2.0). I tried to run the tests for 1.0.3 but some fail, and the logs aren't very clear about the problem, even in debug mode.

Could you tell me which version to use and which instructions to follow? Are there limits or other dependencies with respect to the Python packages used?

Hoping for help, thanks in advance,
Steffi
[galaxy-dev] Toolshed not showing up
Hello,

I am getting a strange error in attempting to use the main Galaxy toolshed. I am using the latest version:

    galaxy@monolith:~/galaxy-dist$ hg pull -u -r 338ead4737ba
    pulling from https://bitbucket.org/galaxy/galaxy-dist/
    searching for changes
    no changes found

I have uncommented this toolshed line:

    galaxy@monolith:~/galaxy-dist$ grep shed universe_wsgi.ini
    tool_config_file = tool_conf.xml,shed_tool_conf.xml

The shed_tool file is thus:

    galaxy@monolith:~/galaxy-dist$ cat shed_tool_conf.xml
    <?xml version="1.0"?>
    <toolbox tool_path="../shed_tools">
    </toolbox>

And:

    galaxy@monolith:~/galaxy-dist$ cat tool_sheds_conf.xml
    <?xml version="1.0"?>
    <tool_sheds>
        <tool_shed name="Galaxy main tool shed" url="http://toolshed.g2.bx.psu.edu/"/>
        <tool_shed name="Galaxy test tool shed" url="http://testtoolshed.g2.bx.psu.edu/"/>
    </tool_sheds>

When I log in as Admin and click the Galaxy main tool shed (under Tool Sheds) in the Admin interface, I get this:

    Not Found
    The resource could not be found.
    No action for /browse_downloadable_repositories

I'm not sure where to go from here, as I can find no info on this, but it looks like something tiny is messed up. Any help is appreciated.

Jim
Re: [galaxy-dev] ImportError: No module named galaxy
Dear Nate,

My PYTHONPATH is already set to this value.

Best, Oren
Re: [galaxy-dev] galaxy citation_url setting
Shantanu Pavgi wrote:
> On Nov 2, 2011, at 3:39 PM, Nate Coraor wrote:
> > Shantanu Pavgi wrote:
> > > Hi, It seems like modification of 'citation_url' in the universe_wsgi.ini config file has no effect in the UI (Help -> How to Cite Galaxy). Is it something hard-coded in the source?
> >
> > Hi Shantanu,
> >
> > Whoops. This has been fixed in 6205:9a9479f7e53f.
>
> Thanks Nate. I was wondering if the 'lib/galaxy/config.py' file needs to be modified as well.

{{{
$ hg diff ./lib/galaxy/config.py
diff -r 9e90faf2cb1c lib/galaxy/config.py
--- a/lib/galaxy/config.py    Wed Nov 02 14:15:08 2011 -0700
+++ b/lib/galaxy/config.py    Wed Nov 02 16:46:08 2011 -0500
@@ -113,6 +113,7 @@
         self.gbrowse_display_sites = kwargs.get( 'gbrowse_display_sites', "wormbase,tair,modencode_worm,modencode_fly,sgd_yeast" ).lower().split(",")
         self.genetrack_display_sites = kwargs.get( 'genetrack_display_sites', "main,test" ).lower().split(",")
         self.brand = kwargs.get( 'brand', None )
+        self.citation_url = kwargs.get('citation_url', 'http://wiki.g2.bx.psu.edu/Citing%20Galaxy')
         self.support_url = kwargs.get( 'support_url', 'http://wiki.g2.bx.psu.edu/Support' )
         self.wiki_url = kwargs.get( 'wiki_url', 'http://g2.trac.bx.psu.edu/' )
         self.blog_url = kwargs.get( 'blog_url', None )
}}}

> Also, it seems like the default values are being passed twice here - in the mako templates and in the initialization method of the Configuration class. I was wondering if default values could be passed only once, during initialization, and then the get methods would only query the necessary configuration option. I don't know all the code in detail, so I might be wrong here.

The template is using Config's get() method, which allows for a default if the option is not in the kwargs passed to Config.__init__(). It's a shorthand we've used for config options that are only used in one place in the code.

--nate
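To illustrate the shorthand Nate describes, here is a minimal sketch (simplified, not the actual Galaxy code) of a Config class whose get() falls back to a default when an option was never passed to __init__():

    class Configuration( object ):
        def __init__( self, **kwargs ):
            # options parsed from universe_wsgi.ini arrive as kwargs
            self.config_dict = kwargs
        def get( self, key, default ):
            # templates use this for options referenced in only one place
            return self.config_dict.get( key, default )

    config = Configuration( brand="My Galaxy" )
    # a template can fall back to a default without config.py defining the option:
    print config.get( "citation_url", "http://wiki.g2.bx.psu.edu/Citing%20Galaxy" )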
Re: [galaxy-dev] Clusters, Runners, and user credentials
Ilya, Nate,

To add a bit of background to the below: we have several clusters on campus that use very different accounting systems; some run as a regular cron job to process job run info, while others use a qsub wrapper to check service units prior to job submission (a byproduct of being part of TeraGrid/XSEDE).

It seems the most direct route around accounting-level differences is to submit the job as the user (so I'm interested in this solution), but the security questions I mention below were raised by a number of our local cluster sysadmins as well as (if I'm not mistaken) at the conference. Were these ever addressed, or is it considered a non-issue?

Apologies about re-sending; I didn't know if this had been answered elsewhere, but this was a serious concern that may block us from using some pretty nice HPC resources.

chris

On Nov 1, 2011, at 4:59 PM, Fields, Christopher J wrote:
> I recall at the Galaxy conf there were questions on how secure this is (having the 'galaxy' user submit jobs as someone else). This would involve switching users on the cluster or would require user login information, correct? The way we planned on working around this was to just specify a user account string (using '-A') instead of bothering with switching users; see the sketch after this thread. I believe our local cluster disallows switching users via PBS unless the submitter has admin privs, but the accounting string works fine (I suppose one could use the project option as well).
> chris

On Oct 31, 2011, at 6:30 PM, Chorny, Ilya wrote:
> I modified drmaa.py to pass the galaxy user's path variable to the actual user. As long as the galaxy user's environment is correct, then the actual user's environment should be correct.

-----Original Message-----
From: Glen Beane [mailto:glen.be...@jax.org]
Sent: Monday, October 31, 2011 4:20 PM
To: Chorny, Ilya
Cc: Lloyd Brown; Galaxy Dev List
Subject: Re: [galaxy-dev] Clusters, Runners, and user credentials

Many of us are using the PBS job runner (for TORQUE) and would definitely be interested in a port. How do you deal with making sure the user's environment is configured properly? We use a python virtualenv and load specific module files with tested tool versions in our galaxy user's startup scripts on our cluster.

Sent from my iPhone

On Oct 31, 2011, at 6:29 PM, Chorny, Ilya icho...@illumina.com wrote:
> BTW, I am not sure if PBS works with drmaa. If not, then the code will need to be ported to work with PBS.
> Ilya

-----Original Message-----
From: galaxy-dev-boun...@lists.bx.psu.edu [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Chorny, Ilya
Sent: Monday, October 31, 2011 3:27 PM
To: Lloyd Brown; Galaxy Dev List
Subject: Re: [galaxy-dev] Clusters, Runners, and user credentials

Lloyd,

See Nate's email below, titled "Actual user code". We have been working on implementing this feature in Galaxy. The code is still in development, but feel free to test it out and let us know how it works for you.

Best,
Ilya

-----Original Message-----
From: galaxy-dev-boun...@lists.bx.psu.edu [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Lloyd Brown
Sent: Monday, October 31, 2011 2:35 PM
To: Galaxy Dev List
Subject: [galaxy-dev] Clusters, Runners, and user credentials

I'm a systems administrator for an HPC cluster, and have been asked by a faculty member here to try to get Galaxy to work on our cluster. Unfortunately, there are one or two outstanding questions that I can't seem to find the answer to, and I'm hoping someone here can help me out.

In particular, is Galaxy, and the PBS runner specifically, capable of submitting jobs under specific user names? Essentially, if I set up Galaxy to push jobs to our cluster, will they all show up under one user credential (e.g. the galaxy user), or can we set it up so that the user logged into Galaxy is used to submit the job? This one is kind of a show-stopper, since our internal policies require that all jobs have a specific user credential, with one person per username.

Thanks,
Lloyd

--
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu
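As a rough sketch of the '-A' workaround Chris describes, expressed at the DRMAA level (the account name is hypothetical, and this assumes the site's DRMAA library passes native qsub options through), every submission could be tagged with a single accounting string without switching users:

    import drmaa

    s = drmaa.Session()
    s.initialize()
    jt = s.createJobTemplate()
    jt.remoteCommand = '/bin/echo'   # stand-in for the actual tool script
    jt.args = [ 'hello' ]
    # billed to the project account, but still submitted as the galaxy user
    jt.nativeSpecification = '-A my_project_account'
    print s.runJob( jt )
    s.deleteJobTemplate( jt )
    s.exit()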
[galaxy-dev] Installing Galaxy on local server
Hello,

I am a graduate student at GSU. I am looking into installing Galaxy on our local server (http://alla.cs.gsu.edu/~software). I would like my server's homepage to start with the Galaxy interface, like this one: http://rna1.engr.uconn.edu:7474/ . After that, I wish to integrate some software tools that we developed in our department. My primary question is which option I have to choose: local or cloud?

Looking forward to your help and directions. Thank you.

Tuqa
[galaxy-dev] fastx reverse complement failed - gzip: stdout: Broken pipe
I'm reposting over here from the user side, since this is a local instance and it was recommended.

Hi all,

We are running Galaxy on an Ubuntu 11.10 computer (5 TB, striped, etc.). We are assembling a small genome (110 Gb). Our dataset isn't directly uploaded, but is accessed from a directory (if that matters). Everything went fine through the FASTQ Groomer, but when we ran Reverse-Complement, we got the following error:

    fastx_reverse_complement: writing quality scores failed: File too large
    gzip: stdout: Broken pipe

Any help that you might have would be greatly appreciated! Thanks!

As a follow-up: the file that we're trying to reverse complement is ~26 Gb. Files seem to work fine up to 2.1 Gb. There is plenty of disk space (we have 3.4 Tb free on this system), and it doesn't seem to be an issue with permissions or partitions. I also made sure that gzip and the connectors to perl are up to date, and I have set everything in ulimit to be unlimited (so Ubuntu imposes no limits on file-size creation). This is a local instance, btw, though I'm sure that's obvious... We were able to groom the files to create the 26 Gb file that we want to work with, so it seems like the computer should be able to do all of this... Any thoughts?
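One hedged observation: the 2.1 Gb ceiling reported above is suspiciously close to the largest offset a 32-bit signed file offset can address, a classic symptom of a tool binary built without large-file support rather than of any OS or ulimit setting. The arithmetic:

    limit = 2 ** 31
    print limit         # 2147483648 bytes
    print limit / 1e9   # ~2.15 (GB) -- matches the "fine up to 2.1 Gb" symptom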
Re: [galaxy-dev] Error on local galaxy using SAM-to-BAM tool on a cluster
Hello Carlos,

If what you want is a sorted SAM file, then the tool Filter and Sort -> Sort may be a better choice. A SAM file is a tabular file. If there is header data at the beginning of the SAM file, it can be removed before running Sort with the tool Filter and Sort -> Select (with a "not matching" regex). Alternatively, you can choose not to include header output as a BWA option. Perhaps this will solve the immediate problem?

Best,
Jen
Galaxy team

On 11/3/11 12:43 PM, Carlos Borroto wrote:
> Hi, I'm running into this error:
>
> Error sorting alignments from (/tmp/5800600.1.all.q/tmpXOc5mD/tmpAZCzt_),
>
> when using the SAM-to-BAM tool on a locally installed Galaxy using an SGE cluster. I'm using the latest version of galaxy-dist. I'm guessing I have a problem with the configuration of the tmp folder. I have this in universe_wsgi.ini:
>
> # Temporary files are stored in this directory.
> new_file_path = /home/cborroto/galaxy_dist/database/tmp
>
> But I don't see this directory being used, and from the error it looks like /tmp on the node is used. I wonder if this is the problem, as I don't know if there is enough space in the local /tmp directory on the nodes. I ran the same tool on a subset of the same SAM file and it ran fine.
>
> Also, I see this in the description of the tool: "This tool uses the SAMTools toolkit to produce an indexed BAM file based on a sorted input SAM file." But what I actually need is to sort a SAM file output from bwa; I haven't found any other way than converting it to BAM. Looking at sam_to_bam.py, I see the BAM file will also be sorted. Would it be wrong to feed an unsorted SAM file into this tool?
>
> Finally, just to be sure there is nothing wrong with the initial SAM file, I ran samtools view ... and samtools sort ... on this file manually outside of Galaxy, and it ran fine.
>
> Thanks in advance,
> Carlos

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support
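As a rough illustration of the Select step Jen describes: SAM header lines all begin with '@', so "not matching" the regex ^@ keeps only alignment records. The equivalent outside Galaxy, as a minimal sketch with hypothetical file names:

    # drop @HD/@SQ/@RG/@PG/@CO header lines, keep the tabular alignment records
    src = open( 'input.sam' )
    dst = open( 'headerless.sam', 'w' )
    for line in src:
        if not line.startswith( '@' ):
            dst.write( line )
    src.close()
    dst.close()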
Re: [galaxy-dev] Looks like actual user breaks splitting
I'm not following you - it's been 6 months since I wrote that code ;-} It looks to me like a DatasetPath() object is always placed in that array, and with one exception near there, it looks like the change I made generates those objects the same way. Do you have a stack trace for the merge problem I can look at?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.

-----Original Message-----
From: Nate Coraor (n...@bx.psu.edu) [mailto:n...@bx.psu.edu]
Sent: Thursday, November 03, 2011 2:22 PM
To: Duddy, John
Cc: Chorny, Ilya; galaxy-dev@lists.bx.psu.edu
Subject: Re: Looks like actual user breaks splitting

Hi John,

It looks like the first issue is related to the change from get_output_fnames() -> compute_outputs(). When outputs_to_working_directory = False (the default), this method stores/returns a HistoryDatasetAssociation, but when True, it stores/returns a Dataset (the original method's behavior). Thus, accessing the object's .datatype attribute in the splitter's do_merge() fails.

Thanks,
--nate

Duddy, John wrote:
> I'll submit a pull request shortly...

-----Original Message-----
From: Nate Coraor (n...@bx.psu.edu) [mailto:n...@bx.psu.edu]
Sent: Wednesday, November 02, 2011 12:24 PM
To: Duddy, John
Cc: Chorny, Ilya; galaxy-dev@lists.bx.psu.edu
Subject: Re: Looks like actual user breaks splitting

John, Ilya,

I get further with sequence type inputs, but it looks like JobWrapper.get_output_datasets_and_fnames() is not returning the right thing when outputs_to_working_directory = True.

BTW, the base Data.split() method is broken after the updates to Sequence.split(), since it wasn't updated to expect HistoryDatasetAssociations rather than filenames. Could you take a look at that when you get a chance?

--nate

Duddy, John wrote:
> The datatype you are using does not define a split method. Are you working with our in-progress gz type or fastqillumina?

From: Chorny, Ilya
Sent: Wednesday, November 02, 2011 11:50 AM
To: Duddy, John
Cc: Nate Coraor (n...@bx.psu.edu); galaxy-dev@lists.bx.psu.edu
Subject: Looks like actual user breaks splitting

Hey John,

Any thoughts?

Ilya

    Traceback (most recent call last):
      File "/home/galaxy/ichorny/galaxy-central/lib/galaxy/jobs/runners/tasks.py", line 73, in run_job
        tasks = splitter.do_split(job_wrapper)
      File "/home/galaxy/ichorny/galaxy-central/lib/galaxy/jobs/splitters/multi.py", line 73, in do_split
        input_type.split(input_datasets, get_new_working_directory_name, parallel_settings)
      File "/home/galaxy/ichorny/galaxy-central/lib/galaxy/datatypes/data.py", line 473, in split
        raise Exception("Text file splitting does not support multiple files")
    Exception: Text file splitting does not support multiple files

Ilya Chorny Ph.D.
Bioinformatics Scientist I
Illumina, Inc.
Re: [galaxy-dev] Looks like actual user breaks splitting
Nate Coraor (n...@bx.psu.edu) wrote:

Duddy, John wrote:
> I'm not following you - it's been 6 months since I wrote that code ;-}

I know the feeling!

> It looks to me like a DatasetPath() object is always placed in that array, and with one exception near there, it looks like the change I made generates those objects the same way.

It's creating a dict in self.output_dataset_paths, and that dict looks like this when outputs_to_working_directory = False:

    { output_param_name : [ HDA, DatasetPath ], ... }

And this when True:

    { output_param_name : [ Dataset, DatasetPath ], ... }

> Do you have a stack trace for the merge problem I can look at?

If you put this in do_merge()'s except block:

    log.exception( stdout )

You'll get:

    Traceback (most recent call last):
      File "/space/nate/galaxy-central-ichorny/lib/galaxy/jobs/splitters/multi.py", line 128, in do_merge
        output_type = outputs[output][0].datatype
    AttributeError: 'Dataset' object has no attribute 'datatype'

I could just change both methods to put an HDA in the list inside the dict there, but I haven't looked much into what output_dataset_paths is used for, so I wasn't sure what that might break.

Sorta answered it myself: it looks like you created this precisely for do_merge(), so changing it to contain the HDA fixes the problem (and shouldn't break anything else).

--nate
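A minimal sketch of the mismatch Nate describes (class bodies illustrative, not Galaxy's real model code): do_merge() reads .datatype from the first element of the list, which exists on an HDA but not on a plain Dataset.

    class HistoryDatasetAssociation( object ):
        datatype = 'data'   # HDAs expose a datatype
    class Dataset( object ):
        pass                # plain Datasets do not

    for first in ( HistoryDatasetAssociation(), Dataset() ):
        outputs = { 'output1': [ first, '/path/to/dataset_1.dat' ] }
        try:
            # the equivalent of outputs[output][0].datatype in do_merge()
            print outputs[ 'output1' ][ 0 ].datatype
        except AttributeError, e:
            print 'fails: %s' % e   # the Dataset case from the traceback above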
Re: [galaxy-dev] Clusters, Runners, and user credentials
On Nov 3, 2011, at 9:50 AM, Nate Coraor wrote:

> Hi Chris,
>
> Ilya's solution uses sudo to submit the job via drmaa after switching to the actual user's uid and gid. This means giving your Galaxy user sudo rights to run 3 scripts as root:
>
> * A script to submit jobs
> * A script to kill jobs
> * A script to chown a directory
>
> This could be tightened up a bit, in the case of the first two by sudoing directly to the user, rather than to root and then setuid()ing. In the case of the latter script, a path is passed to the script rather than a Galaxy job id, so it could be used by the Galaxy user to chown anything that root can chown. In addition, if your Galaxy data lives in NFS with root squashing enabled, this script would fail.

Yes, we may run into this or worse; we're setting up GPFS locally for our NFS.

> Of course, the paths to these scripts are configurable, so they can be replaced with site-suitable versions. Another option to avoid sudo entirely would be for Galaxy to start as root and then drop privileges, but I am not incredibly fond of this solution, since it allows for the possibility of privilege-separation exploits. Perhaps a stripped-down Galaxy data daemon that runs with elevated privileges, whose sole job it is to manage permissions and move data?

That sounds like a feasible option.

> As with the existing Galaxy implementation, Galaxy's data is not copied around at job runtime for tool input; it simply exists in one place and is expected to be locatable on the cluster resource at the same path. My next development goal is to remove this limitation.

This is something we will run into at some point, particularly with some of the NCSA resources (where user paths are quite different from other clusters on campus).

> The assumption is also made that tool inputs are readable by the actual user, which was a problem in some environments. If administrators prefer to give the Galaxy account the permission to run jobs as other users directly in the DRM, this would certainly solve the problem. Galaxy would just need minor modification to take advantage of the feature. As you probably recall, there were many people at the GCC brainstorming this problem, and I don't recall that we ever came up with the perfectly secure solution. This solution may be good enough for some sites.

Right, I think this is more a problem when the cluster is not under our control and has already been configured. Not that it's impossible, but there is definitely an additional level of sysadmin concerns we have to deal with. And with multiple clusters (with multiple configurations, sysadmins, etc.) this becomes more complex. We're deploying step-wise (on one cluster initially, then others down the road) for this reason.

> If there's a desire for tightened security, I would be happy to review and accept any work done on that. =)
>
> --nate

That is a possibility; we have initially talked with a few of the MyProxy folks here re: security concerns and possible solutions for user job submissions (there wasn't much added yet beyond what you already covered, unfortunately).

chris
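For illustration only, a sudoers sketch of the arrangement Nate describes (script names and paths hypothetical, not Galaxy's actual ones). The first entry grants the galaxy user root for the three helper scripts; the second shows the tightened variant of sudoing directly to the submitting user for job submission:

    # /etc/sudoers (sketch; paths hypothetical)
    galaxy ALL = (root) NOPASSWD: /opt/galaxy/scripts/submit_job.py, \
                                  /opt/galaxy/scripts/kill_job.py, \
                                  /opt/galaxy/scripts/chown_working_dir.py
    # tightened variant: run the submit helper directly as any actual user
    galaxy ALL = (ALL) NOPASSWD: /opt/galaxy/scripts/submit_job.py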