Re: [galaxy-dev] Local Galaxy Instance MarkupSafe error

2011-11-03 Thread Mickael ESCUDERO
Hi Nate,

That must be the issue here. I just learnt that the head node is running
Red Hat 5, while Galaxy and all its dependencies are running on a Red Hat 6
server, so yes, the Python version is not the same.
I'll see what I can do with my IS department, then.

Cheers for the help :)

Micka


On 2 November 2011 17:18, Nate Coraor n...@bx.psu.edu wrote:

 Jerico Nico De Leon Revote wrote:
  It's the same as what I'm getting. I can see the output via the eye
  icon on the history panel and am able to download the files as well.

 Do your cluster nodes have internet access?  If so, log into a node and
 run the command again from there.  Your nodes may have a different
 Python version or Unicode byte order encoding scheme than your Galaxy
 application server.

 --nate
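
A quick way to compare the two machines, sketched here rather than taken from Galaxy itself: run the same short script on the application server and on a cluster node, then compare the output. The function name `interpreter_fingerprint` is made up for this illustration; `sys.version_info`, `sys.maxunicode`, and `platform.machine()` are standard library.

```python
import platform
import sys


def interpreter_fingerprint():
    """Collect the interpreter details that differ between mismatched hosts."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        # 0xFFFF indicates a narrow (UCS-2) build, 0x10FFFF a wide (UCS-4)
        # build. On Python 3.3 and later the value is always 0x10FFFF.
        "unicode": "ucs4" if sys.maxunicode > 0xFFFF else "ucs2",
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    print(interpreter_fingerprint())
```

If the two hosts print different values for `python` or `unicode`, that would be consistent with the mismatch suggested above.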

 
  On 1 November 2011 03:37, Mickael ESCUDERO mickael.escud...@gmail.com
 wrote:
 
   Hi there,
  
   I'm getting exactly the same problem with any job running on a
   TORQUE/PBS cluster. The jobs actually run fine, as I can see the output
   and download it, but they are marked as failed in the Galaxy history,
   with the following message:
  
   WARNING:galaxy.eggs:Warning: MarkupSafe (a dependent egg of Mako) cannot be fetched
  
   The command `python -ES ./scripts/fetch_eggs.py` gives no output.
   If I run the same tools locally there is no problem.
  
   Cheers
   Micka
  
  
   Message: 5
   Date: Thu, 27 Oct 2011 15:08:17 +1100
   From: Jerico Nico De Leon Revote jerico.rev...@monash.edu
   To: galaxy-dev@lists.bx.psu.edu
   Subject: [galaxy-dev] Local Galaxy Instance MarkupSafe error
   Message-ID:
  
   cap9ulhyipxyb2cwtqosm3mquuu55arxy5por1dbahej1uco...@mail.gmail.com
   Content-Type: text/plain; charset=iso-8859-1
  
   Hi,
  
   I'm just doing a simple get-data from UCSC on our local Galaxy
   instance and got the following error:
  
   WARNING:galaxy.eggs:Warning: MarkupSafe (a dependent egg of Mako)
   cannot be fetched
  
   The job box then is displayed as red on the history panel.
   The job runner states that the job finished normally on the cluster.
   Galaxy is checked out from galaxy-central (changeset: 6176:34fffbf01183).
  
   Thanks,
  
   Jerico
  
   --
  
   Message: 6
   Date: Thu, 27 Oct 2011 16:58:13 +1100
   From: Jerico Nico De Leon Revote jerico.rev...@monash.edu
   To: galaxy-dev@lists.bx.psu.edu
   Subject: Re: [galaxy-dev] Local Galaxy Instance MarkupSafe error
   Message-ID:
  
   cap9ulhbenenkqc50uzyxoca4w-zucdy8obs4i6b3e2qyfmy...@mail.gmail.com
   Content-Type: text/plain; charset=iso-8859-1
  
   Just to follow up on this: the MarkupSafe egg is definitely present in
   the eggs directory, and the servers are run through virtualenv.
  
   On 27 October 2011 15:08, Jerico Nico De Leon Revote 
   jerico.rev...@monash.edu wrote:
  
Hi,
   
I'm just doing a simple get-data from UCSC on our local Galaxy
   instance and got the following error:
   
WARNING:galaxy.eggs:Warning: MarkupSafe (a dependent egg of Mako)
   cannot be fetched
   
The job box then is displayed as red on the history panel.
The job runner states that the job finished normally on the cluster.
Galaxy is checked out from galaxy-central (changeset: 6176:34fffbf01183).
   
Thanks,
   
Jerico
   
   
  
   --
  
   Message: 7
   Date: Thu, 27 Oct 2011 02:40:51 -0400
   From: Nate Coraor n...@bx.psu.edu
   To: Jerico Nico De Leon Revote jerico.rev...@monash.edu
   Cc: galaxy-dev@lists.bx.psu.edu
   Subject: Re: [galaxy-dev] Local Galaxy Instance MarkupSafe error
   Message-ID: 20111027064051.gg2...@bx.psu.edu
   Content-Type: text/plain; charset=us-ascii
  
   Jerico Nico De Leon Revote wrote:
Hi,
   
I'm just doing a simple get-data from UCSC on our local Galaxy
instance and got the following error:
   
WARNING:galaxy.eggs:Warning: MarkupSafe (a dependent egg of Mako)
cannot be fetched
   
The job box then is displayed as red on the history panel.
The job runner states that the job finished normally on the cluster.
Galaxy is checked out from galaxy-central (changeset: 6176:34fffbf01183).
  
   Hi Jerico,
  
   Are you using a cluster?  If not, could you run:
  
  % python -ES ./scripts/fetch_eggs.py
  
   --nate
  
   
Thanks,
   
Jerico
  
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
   
  http://lists.bx.psu.edu/
  
   --
   

[galaxy-dev] GeneTrack-Installation - Version?

2011-11-03 Thread Stefanie Ververs

Hi,

I sent this message about two weeks ago, but so far there has been no response.
I'm trying again; maybe someone with some advice will notice it this time :-)

Regards,
Steffi


 Original Message 
Subject:GeneTrack-Installation - Version?
Date:   Thu, 20 Oct 2011 23:21:25 +0200
From:   Stefanie Ververs stefanie.verv...@fh-stralsund.de
To: galaxy-dev@lists.bx.psu.edu



Hi everybody,

we're hosting our own Galaxy instance, and the next step should be the
integration of GeneTrack. (Not the tool, which is already included, but
the browser, which has to be set up on our own, as I read on the
mailing list.)

I've been reading all the information on http://genetrack.bx.psu.edu and
http://atlas.bx.psu.edu/genetrack/docs/genetrack.html, but I can't
really tell which is the current, newest version. The only one available
for download is on Google Code (1.0.3), and, according to the
information on http://genetrack.bx.psu.edu, there is a newer one (2.0).
I tried to run the tests for 1.0.3, but some fail, and the logs aren't
clear about the problem, even in debug mode.

Could you tell me which version to use and which instructions to follow?
Are there limits or other dependencies regarding the Python packages used?

Hoping for help & thanks in advance,

Steffi


[galaxy-dev] Toolshed not showing up

2011-11-03 Thread jjv5
Hello,

I am getting a strange error in attempting to use the main Galaxy toolshed.

I am using the latest version:

galaxy@monolith:~/galaxy-dist$ hg pull -u -r 338ead4737ba
pulling from https://bitbucket.org/galaxy/galaxy-dist/
searching for changes
no changes found
galaxy@monolith:~/galaxy-dist$

I have uncommented this toolshed line:

galaxy@monolith:~/galaxy-dist$ grep shed universe_wsgi.ini
tool_config_file = tool_conf.xml,shed_tool_conf.xml

The shed_tool file is thus:

galaxy@monolith:~/galaxy-dist$ cat shed_tool_conf.xml
<?xml version="1.0"?>
<toolbox tool_path="../shed_tools">
</toolbox>

And

galaxy@monolith:~/galaxy-dist$ cat tool_sheds_conf.xml
<?xml version="1.0"?>
<tool_sheds>
    <tool_shed name="Galaxy main tool shed" url="http://toolshed.g2.bx.psu.edu/"/>
    <tool_shed name="Galaxy test tool shed" url="http://testtoolshed.g2.bx.psu.edu/"/>
</tool_sheds>

When I log in as an admin, click Admin, then the Galaxy main tool shed
(under Tool Sheds), I get this:

Not Found

The resource could not be found.
No action for /browse_downloadable_repositories


I'm not sure where to go from here as I can find no info on this, but
it looks like something tiny is messed up. Any help is appreciated.

Jim


Re: [galaxy-dev] ImportError: No module named galaxy

2011-11-03 Thread Oren Livne

Dear Nate,
My PYTHONPATH is already set to this value.
Best,
Oren



Re: [galaxy-dev] galaxy citation_url setting

2011-11-03 Thread Nate Coraor
Shantanu Pavgi wrote:
 
 On Nov 2, 2011, at 3:39 PM, Nate Coraor wrote:
 
  Shantanu Pavgi wrote:
  
  Hi,
  
  It seems like modification of 'citation_url' in the universe_wsgi.ini 
  config file has no effect in the UI (Help -> How to Cite Galaxy). Is it 
  something hard-coded in the source? 
  
  Hi Shantanu,
  
  Whoops.  This has been fixed in 6205:9a9479f7e53f.
  
 
 
 Thanks Nate. I was wondering if 'lib/galaxy/config.py' file needs to be 
 modified as well. 
 
 {{{
 $ hg diff ./lib/galaxy/config.py
 diff -r 9e90faf2cb1c lib/galaxy/config.py
 --- a/lib/galaxy/config.py    Wed Nov 02 14:15:08 2011 -0700
 +++ b/lib/galaxy/config.py    Wed Nov 02 16:46:08 2011 -0500
 @@ -113,6 +113,7 @@
          self.gbrowse_display_sites = kwargs.get( 'gbrowse_display_sites', "wormbase,tair,modencode_worm,modencode_fly,sgd_yeast" ).lower().split(",")
          self.genetrack_display_sites = kwargs.get( 'genetrack_display_sites', "main,test" ).lower().split(",")
          self.brand = kwargs.get( 'brand', None )
 +        self.citation_url = kwargs.get('citation_url', 'http://wiki.g2.bx.psu.edu/Citing%20Galaxy')
          self.support_url = kwargs.get( 'support_url', 'http://wiki.g2.bx.psu.edu/Support' )
          self.wiki_url = kwargs.get( 'wiki_url', 'http://g2.trac.bx.psu.edu/' )
          self.blog_url = kwargs.get( 'blog_url', None )
 
 }}}
 
 Also, it seems like the default values are being passed twice here - in the 
 mako templates and initialization method of Configuration class. I was 
 wondering if default values can be passed only once during initialization and 
 then other get methods would only query the necessary configuration option.   
 I don't know all the code in detail, so I might be wrong here. 

The template is using Config's get() method, which allows for a default
if the option is not in the kwargs passed to Config.__init__().  It's a
shorthand we've used for config options that are only used in one place
in the code.

--nate
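
The pattern described above can be sketched as follows. This is a simplified stand-in written for illustration, not Galaxy's actual Configuration class: the class body and the `config_dict` attribute are assumptions, and only the option names and default URLs come from the thread.

```python
class Configuration(object):
    """Simplified, hypothetical stand-in for a config object."""

    def __init__(self, **kwargs):
        # Keep the raw options so get() can look up anything that was not
        # promoted to an attribute below.
        self.config_dict = kwargs
        # Options used in many places get an attribute with a default.
        self.support_url = kwargs.get(
            "support_url", "http://wiki.g2.bx.psu.edu/Support"
        )

    def get(self, key, default=None):
        # Options used in only one place are looked up on demand, with the
        # default supplied by the caller (for example, a template).
        return self.config_dict.get(key, default)


if __name__ == "__main__":
    config = Configuration(brand="Local")
    print(config.get("citation_url", "http://wiki.g2.bx.psu.edu/Citing%20Galaxy"))
```

With this shape, a default lives either in `__init__` or at the single call site of `get()`, which is the trade-off being discussed.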

 
 --
 Shantanu. 
 
 
 


Re: [galaxy-dev] Clusters, Runners, and user credentials

2011-11-03 Thread Fields, Christopher J
Ilya, Nate,

To add a bit of background to the below: we have several clusters on campus 
that use very different accounting systems. Some run a regular cron job to 
process job run info, while others use a qsub wrapper to check service units 
prior to job submission (a byproduct of being part of TeraGrid/XSEDE).  It 
seems the most direct route to work around accounting-level differences is to 
submit the job as the user (so I'm interested in this solution), but the 
security questions I mentioned below were raised by a number of our local 
cluster sysadmins as well as (if I'm not mistaken) at the conference.  

Were these ever addressed, or is it considered a non-issue?  Apologies for 
re-sending; I didn't know if this had been answered elsewhere, but this is a 
serious concern that may block us from using some pretty nice HPC resources.

chris

On Nov 1, 2011, at 4:59 PM, Fields, Christopher J wrote:

 I recall at the Galaxy conf there were questions on how secure this is 
 (having the 'galaxy' user submit jobs as someone else).  This would involve 
 switching users on the cluster or would require user login information, 
 correct?
 
 The way we planned on working around this was to just specify a user account 
 string (using '-A') instead of bothering with switching users.  I believe our 
 local cluster disallows switching users via PBS unless the submitter has 
 admin privs, but the accounting string works fine (I suppose one could use 
 the project option as well).
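
The accounting-string approach could look roughly like the sketch below. The `ACCOUNTS` mapping, the user names, and the `qsub_command` helper are all invented for illustration; the only thing taken from the thread is passing the account with qsub's `-A` option instead of switching users.

```python
# Hypothetical mapping from Galaxy login to cluster accounting string.
ACCOUNTS = {
    "alice@example.org": "lab_a",
    "bob@example.org": "lab_b",
}


def qsub_command(galaxy_user, script_path, default_account="galaxy"):
    """Build a qsub argument list that charges the job to the user's account.

    The job still runs as the submitting (galaxy) user; only the accounting
    string changes.
    """
    account = ACCOUNTS.get(galaxy_user, default_account)
    return ["qsub", "-A", account, script_path]


if __name__ == "__main__":
    print(" ".join(qsub_command("alice@example.org", "galaxy_job_42.sh")))
```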
 
 chris
 
 On Oct 31, 2011, at 6:30 PM, Chorny, Ilya wrote:
 
 I modified drmaa.py to pass the galaxy users path variable to the actual 
 user. As long as the galaxy user's environment is correct then the actual 
 user's environment should be correct.  
 
 -Original Message-
 From: Glen Beane [mailto:glen.be...@jax.org] 
 Sent: Monday, October 31, 2011 4:20 PM
 To: Chorny, Ilya
 Cc: Lloyd Brown; Galaxy Dev List
 Subject: Re: [galaxy-dev] Clusters, Runners, and user credentials
 
 Many of us are using the PBS job runner (for TORQUE) and would definitely be 
 interested in a port. 
 
 How do you deal with making sure the user's environment is configured 
 properly? We use a python virtualenv and load specific module files with 
 tested tool versions in our galaxy users startup scripts on our cluster. 
 
 Sent from my iPhone
 
 On Oct 31, 2011, at 6:29 PM, Chorny, Ilya icho...@illumina.com wrote:
 
 BTW, I am not sure if PBS works with drmaa. If not then the code will need 
 to be ported to work with pbs.
 
 Ilya
 
 
 -Original Message-
 From: galaxy-dev-boun...@lists.bx.psu.edu 
 [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Chorny, Ilya
 Sent: Monday, October 31, 2011 3:27 PM
 To: Lloyd Brown; Galaxy Dev List
 Subject: Re: [galaxy-dev] Clusters, Runners, and user credentials
 
 Lloyd,
 
 See Nate's email below, titled "Actual user code". We have been working on 
 implementing this feature in Galaxy. The code is still in development, but 
 feel free to test it out and let us know how it works for you.
 
 Best,
 
 Ilya
 
 -Original Message-
 From: galaxy-dev-boun...@lists.bx.psu.edu 
 [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Lloyd Brown
 Sent: Monday, October 31, 2011 2:35 PM
 To: Galaxy Dev List
 Subject: [galaxy-dev] Clusters, Runners, and user credentials
 
 I'm a systems administrator for an HPC cluster, and have been asked by a 
 faculty member here to try to get galaxy to work on our cluster.
 Unfortunately, there are one or two outstanding questions that I can't seem 
 to find the answer to, and I'm hoping someone here can help me out.
 
 In particular, is Galaxy, and the PBS runner specifically, capable of 
 submitting jobs under specific user names?  Essentially, if I set up Galaxy 
 to push jobs to our cluster, will they all show up under one user 
 credential (e.g. the galaxy user), or can we set it up so that the user 
 logged into Galaxy is used to submit the job?
 
 This one is kind of a show-stopper, since our internal policies require that 
 all jobs have a specific user credential, with one person per username.
 
 Thanks,
 Lloyd
 
 
 --
 Lloyd Brown
 Systems Administrator
 Fulton Supercomputing Lab
 Brigham Young University
 http://marylou.byu.edu

[galaxy-dev] Installing Galaxy on local server

2011-11-03 Thread Toqa Manasrah




Hello,

I am a graduate student at GSU. I would like to install Galaxy on our local 
server, http://alla.cs.gsu.edu/~software, so that the server's homepage 
starts with the Galaxy interface, like this one: 
http://rna1.engr.uconn.edu:7474/. After that, I wish to integrate some software 
tools that we developed in our department. My primary question is which option 
I should choose: local or cloud?

Looking forward to your help and directions.

Thank you.

Tuqa



[galaxy-dev] fastx reverse compliment failed - gzip: stdout: Broken pipe

2011-11-03 Thread Lucinda Lawson
I'm reposting here from the user list, since this is a local
instance and it was recommended.

Hi all,
We are running Galaxy on an Ubuntu 11.10 computer (5 TB, striped,
etc.). We are assembling a small genome (110 Gb). Our dataset isn't
directly uploaded, but is accessed from a directory (if that matters).
Everything went fine through the FASTQ Groomer, but when we ran
Reverse-Complement, we got the following error:

fastx_reverse_complement: writing quality scores failed: File too large

gzip: stdout: Broken pipe

Any help that you might have would be greatly appreciated! Thanks!

As a follow-up, the file that we're trying to reverse-complement is
~26 Gb. Things seem to work fine until the file reaches 2.1 Gb. There is
plenty of space (we have 3.4 Tb free on this system), and it doesn't seem
to be an issue with permissions or partitions. I also made sure that
gzip and the connectors to Perl are up to date, and I have set everything
in ulimit to unlimited (so Ubuntu should not be limiting the size of the
files created). This is a local instance, btw, though I'm sure
that's obvious... We were able to groom the files to create the 26 Gb
file that we want to work with, so it seems like the computer should
be able to do all of this... Any thoughts?
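
One possible reading of the 2.1 Gb figure, offered as a guess rather than a diagnosis: 2^31 - 1 bytes is about 2.147 GB, the largest file offset a signed 32-bit integer can hold. A program built without large-file support typically fails with "File too large" when a write would pass that offset, regardless of free disk space or ulimit settings. Whether this applies to the fastx binary in use here is an assumption the thread does not confirm. The arithmetic:

```python
# Largest byte offset representable in a signed 32-bit integer.
MAX_32BIT_OFFSET = 2**31 - 1


def crosses_32bit_limit(size_bytes):
    """Return True if a file of this size needs offsets beyond 32 bits."""
    return size_bytes > MAX_32BIT_OFFSET


if __name__ == "__main__":
    print("limit: %.3f GB" % (MAX_32BIT_OFFSET / 1e9))
    # The ~26 Gb groomed file from the report is well past the limit.
    print(crosses_32bit_limit(26 * 10**9))
```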


Re: [galaxy-dev] Error on local galaxy using SAM-to-BAM tool on a cluster

2011-11-03 Thread Jennifer Jackson


Hello Carlos,

If what you want is a sorted SAM file, then the tool Filter and Sort -> 
Sort may be a better choice. A SAM file is a tabular file.


If there is header data at the beginning of the SAM file, it can be 
removed before running Sort with the tool Filter and Sort -> Select 
(with a "not matching" regex). Alternatively, you can choose not to include 
header output as a BWA option.
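
For reference, the header-stripping and sorting steps described above amount to something like the following sketch. It is not what the Galaxy tools run internally, and `split_and_sort_sam` is a made-up helper. It relies only on the SAM format itself: header lines start with `@`, column 3 is the reference name, and column 4 is the 1-based position.

```python
def split_and_sort_sam(lines):
    """Separate SAM header lines from records and sort records by coordinate."""
    headers = [line for line in lines if line.startswith("@")]
    records = [
        line for line in lines if line.strip() and not line.startswith("@")
    ]

    def sort_key(line):
        fields = line.split("\t")
        # Column 3 (index 2) is RNAME, column 4 (index 3) is POS.
        return (fields[2], int(fields[3]))

    return headers, sorted(records, key=sort_key)


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:chr1\tLN:1000",
        "read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "read1\t0\tchr1\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    headers, records = split_and_sort_sam(sam)
    for line in headers + records:
        print(line)
```

Note that this sorts reference names as plain strings, so the order of chromosomes may differ from what samtools would produce.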


Perhaps this will solve the immediate problem?

Best,

Jen
Galaxy team

On 11/3/11 12:43 PM, Carlos Borroto wrote:

Hi,

I'm running into this error:
Error sorting alignments from (/tmp/5800600.1.all.q/tmpXOc5mD/tmpAZCzt_),

when using the SAM-to-BAM tool on a locally installed Galaxy with an SGE
cluster. I'm using the latest version of galaxy-dist. I'm guessing I
have a problem with the configuration of the tmp folder. I have this
in universe_wsgi.ini:
# Temporary files are stored in this directory.
new_file_path = /home/cborroto/galaxy_dist/database/tmp

But I don't see this directory being used, and from the error it looks
like /tmp on the node is used. I wonder if this is the problem, as I
don't know if there is enough space in the local /tmp directory on the
nodes. I ran the same tool on a subset of the same SAM file and it ran
fine.
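
One way to check the guess about /tmp, assuming the tool picks its scratch directory through Python's `tempfile` module (an assumption; the thread does not confirm it): `tempfile.gettempdir()` honours the `TMPDIR` environment variable, which batch systems often point at a per-job directory under /tmp. Running a small script like this as a job on a node shows which directory is chosen and how much space is free there. `shutil.disk_usage` requires Python 3.3 or later, and the helper name is hypothetical.

```python
import shutil
import tempfile


def temp_dir_report():
    """Return the default temp directory and the free bytes on its filesystem."""
    path = tempfile.gettempdir()
    usage = shutil.disk_usage(path)
    return path, usage.free


if __name__ == "__main__":
    path, free_bytes = temp_dir_report()
    print("temp dir: %s" % path)
    print("free: %.1f GB" % (free_bytes / 1e9))
```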

Also, I see this in the description of the tool:
This tool uses the SAMTools toolkit to produce an indexed BAM file
based on a sorted input SAM file.

But what I actually need is to sort a SAM file output from bwa, and I
haven't found any way other than converting it to BAM. Looking at
sam_to_bam.py, I see the BAM file will also be sorted. Would it be
wrong to feed an unsorted SAM file into this tool?

Finally, just to be sure there is nothing wrong with the initial SAM
file, I ran samtools view ... and samtools sort ... on this file
manually outside of Galaxy and it ran fine.

Thanks in advance,
Carlos


--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support


Re: [galaxy-dev] Looks like actual user breaks splitting

2011-11-03 Thread Duddy, John
I'm not following you - it's been 6 months since I wrote that code ;-}

It looks to me like a DatasetPath() object is always placed in that array, and 
with one exception near then, it looks like the change I made generates those 
objects the same way.

Do you have a stack trace for the merge problem I can look at?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Nate Coraor (n...@bx.psu.edu) [mailto:n...@bx.psu.edu] 
Sent: Thursday, November 03, 2011 2:22 PM
To: Duddy, John
Cc: Chorny, Ilya; galaxy-dev@lists.bx.psu.edu
Subject: Re: Looks like actual user breaks splitting

Hi John,

It looks like the first issue is related to the change from
get_output_fnames() -> compute_outputs().  When
outputs_to_working_directory = False (default) this method
stores/returns a HistoryDatasetAssociation, but when True,
stores/returns a Dataset (the original method's behavior).  Thus,
accessing the object's .datatype attribute in the splitter's do_merge()
fails.

Thanks,
--nate

Duddy, John wrote:
 I'll submit a pull request shortly...
 
 John Duddy
 Sr. Staff Software Engineer
 Illumina, Inc.
 9885 Towne Centre Drive
 San Diego, CA 92121
 Tel: 858-736-3584
 E-mail: jdu...@illumina.com
 
 
 -Original Message-
 From: Nate Coraor (n...@bx.psu.edu) [mailto:n...@bx.psu.edu] 
 Sent: Wednesday, November 02, 2011 12:24 PM
 To: Duddy, John
 Cc: Chorny, Ilya; galaxy-dev@lists.bx.psu.edu
 Subject: Re: Looks like actual user breaks splitting
 
 John, Ilya,
 
 I get further with sequence type inputs but it looks like
 JobWrapper.get_output_datasets_and_fnames() is not returning the right
 thing when outputs_to_working_directory = True
 
 BTW, the base Data.split() method is broken after the updates to
 Sequence.split() since it wasn't updated to expect
 HistoryDatasetAssociations rather than filenames.  Could you take a look
 at that when you get a chance?
 
 --nate
 
 Duddy, John wrote:
  The datatype you are using does not define a split method. Are you working 
  with our in-progress gz type or fastqillumina?
  
  John Duddy
  Sr. Staff Software Engineer
  Illumina, Inc.
  9885 Towne Centre Drive
  San Diego, CA 92121
  Tel: 858-736-3584
  E-mail: jdu...@illumina.com
  
  From: Chorny, Ilya
  Sent: Wednesday, November 02, 2011 11:50 AM
  To: Duddy, John
  Cc: Nate Coraor (n...@bx.psu.edu); galaxy-dev@lists.bx.psu.edu
  Subject: Looks like actual user breaks splitting
  
  Hey John,
  
  Any thoughts?
  
  Ilya
  
  Traceback (most recent call last):
    File "/home/galaxy/ichorny/galaxy-central/lib/galaxy/jobs/runners/tasks.py", line 73, in run_job
      tasks = splitter.do_split(job_wrapper)
    File "/home/galaxy/ichorny/galaxy-central/lib/galaxy/jobs/splitters/multi.py", line 73, in do_split
      input_type.split(input_datasets, get_new_working_directory_name, parallel_settings)
    File "/home/galaxy/ichorny/galaxy-central/lib/galaxy/datatypes/data.py", line 473, in split
      raise Exception("Text file splitting does not support multiple files")
  Exception: Text file splitting does not support multiple files
  
  Ilya Chorny Ph.D.
  Bioinformatics Scientist I
  Illumina, Inc.
  9885 Towne Centre Drive
  San Diego, CA 92121
  Work: 858.202.4582
  Email: icho...@illumina.com
  Website: www.illumina.com
  
  
 



Re: [galaxy-dev] Looks like actual user breaks splitting

2011-11-03 Thread Nate Coraor (n...@bx.psu.edu)
Nate Coraor (n...@bx.psu.edu) wrote:
 Duddy, John wrote:
  I'm not following you - it's been 6 months since I wrote that code ;-}
 
 I know the feeling!
 
  It looks to me like a DatasetPath() object is always placed in that array, 
  and with one exception near then, it looks like the change I made generates 
  those objects the same way.
 
 It's creating a dict in self.output_dataset_paths, and that dict looks
 like this when outputs_to_working_directory = False:
 
 { output_param_name : [ HDA, DatasetPath ], ... }
 
 And this when True:
 
 { output_param_name : [ Dataset, DatasetPath ], ... }
 
  Do you have a stack trace for the merge problem I can look at?
 
 If you put this in do_merge()'s except block:
 
 log.exception( stdout )
 
 You'll get:
 
 Traceback (most recent call last):
   File "/space/nate/galaxy-central-ichorny/lib/galaxy/jobs/splitters/multi.py", line 128, in do_merge
     output_type = outputs[output][0].datatype
 AttributeError: 'Dataset' object has no attribute 'datatype'
 
 I could just change both methods to put an HDA in the list inside the
 dict there, but I haven't looked much into what output_dataset_paths is
 used for, so I wasn't sure what that might break.

Sorta answered it myself, it looks like you created this precisely for
do_merge(), so changing it to contain the HDA fixes the problem (and
shouldn't break anything else).

--nate
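
The failure and the fix can be reproduced in miniature. The classes below are bare stand-ins written for this illustration, not Galaxy's model classes, and `output_datatypes` is a hypothetical helper that only mirrors the `outputs[output][0].datatype` access from the traceback.

```python
class Dataset(object):
    """Stand-in for the raw dataset: it carries no datatype attribute."""


class HistoryDatasetAssociation(object):
    """Stand-in for an HDA: wraps a dataset and knows its datatype."""

    def __init__(self, dataset, datatype):
        self.dataset = dataset
        self.datatype = datatype


class DatasetPath(object):
    """Stand-in for the object describing where an output is written."""

    def __init__(self, real_path):
        self.real_path = real_path


def output_datatypes(outputs):
    # Mirrors the failing access: outputs[name][0].datatype
    return dict((name, pair[0].datatype) for name, pair in outputs.items())


if __name__ == "__main__":
    path = DatasetPath("/tmp/dataset_1.dat")
    hda = HistoryDatasetAssociation(Dataset(), "fastq")
    # Works when the first element is an HDA.
    print(output_datatypes({"out_file1": [hda, path]}))
    # Fails when the first element is a bare Dataset.
    try:
        output_datatypes({"out_file1": [Dataset(), path]})
    except AttributeError as exc:
        print("fails as in the traceback: %s" % exc)
```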

 
 Thanks,
 --nate
 
  John Duddy
  Sr. Staff Software Engineer
  Illumina, Inc.
  9885 Towne Centre Drive
  San Diego, CA 92121
  Tel: 858-736-3584
  E-mail: jdu...@illumina.com
  
  

Re: [galaxy-dev] Clusters, Runners, and user credentials

2011-11-03 Thread Fields, Christopher J
On Nov 3, 2011, at 9:50 AM, Nate Coraor wrote:

 Hi Chris,
 
 Ilya's solution uses sudo to submit the job via drmaa after switching to
 the actual user's uid and gid.  This means giving your Galaxy user sudo
 rights to run 3 scripts as root:
 
* A script to submit jobs
* A script to kill jobs
* A script to chown a directory
 
 This could be tightened up a bit, in the case of the first two by
 sudoing directly to the user, rather than to root and then setuid()ing.
 In the case of the latter script, a path is passed to the script rather
 than a Galaxy job id, so it could be used by the Galaxy user to chown
 anything that root can chown.  In addition, if your Galaxy data lives in
 NFS with root squashing enabled, this script would fail.
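
The concern about the chown script accepting an arbitrary path could be narrowed with a containment check before any ownership change. The sketch below is not part of Ilya's code or of Galaxy; `is_within` and the example directories are hypothetical. It resolves symlinks and `..` components first, then tests whether the target stays under an allowed root.

```python
import os


def is_within(allowed_root, candidate):
    """Return True if candidate resolves to allowed_root or a path below it."""
    root = os.path.realpath(allowed_root)
    target = os.path.realpath(candidate)
    # commonpath compares whole path components, so "/data/jobs2" is not
    # mistaken for a child of "/data/jobs".
    return os.path.commonpath([root, target]) == root


if __name__ == "__main__":
    print(is_within("/data/jobs", "/data/jobs/42"))
    print(is_within("/data/jobs", "/data/jobs/../etc"))
```

A site-specific script could refuse to chown anything for which this check fails, although a check like this does not remove the need to trust whoever can invoke the script.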

Yes, we may run into this or worse; we're setting up gpfs locally for our NFS.

 Of course, the paths to these scripts are configurable, so they can be
 replaced with site-suitable versions.
 
 Another option to avoid sudo entirely would be for Galaxy to start as
 root and then drop privileges, but I am not incredibly fond of this
 solution, since it allows for the possibility of privilege separation
 exploits.  Perhaps a stripped down Galaxy data daemon that runs with
 elevated privileges, whose sole job it is to manage permissions and move
 data?

That sounds like a feasible option.

 As with the existing Galaxy implementation, Galaxy's data is not copied
 around at job runtime for tool input, it simply exists in one place and
 is expected to be locatable on the cluster resource at the same path.
 My next development goal is to remove this limitation.

This is something we will run into at some point, particularly with some of the 
NCSA resources (where user paths are quite different from other clusters on 
campus).

 The assumption is also made that tool inputs are readable by the actual
 user, which was a problem in some environments.
 
 If administrators prefer to give the Galaxy account the permission to
 run jobs as other users directly in the DRM, this would certainly solve
 the problem.  Galaxy would just need minor modification to take
 advantage of the feature.
 
 As you probably recall, there were many people at the GCC brainstorming
 this problem, and I don't recall that we ever came up with the perfectly
 secure solution.  This solution may be good enough for some sites.

Right, I think this is more a problem when the cluster is not under our control 
and has already been configured.  Not that it's impossible, but there is 
definitely an additional level of sysadmin concerns we have to deal with.  And 
with multiple clusters (with multiple configurations, sysadmins, etc) this 
becomes more complex.  We're deploying step-wise (on one cluster initially, 
then others down the road) for this reason.

 If there's a desire for tightened security, I would be happy to review
 and accept any work done on that. =)
 
 --nate

That is a possibility. We have had initial talks with a few of the MyProxy folks 
here re: security concerns and possible solutions for user job submissions 
(there wasn't much to add yet beyond what you already covered, unfortunately).

chris
