Re: [galaxy-dev] Trouble setting up a local instance of Galaxy

2013-11-07 Thread Hans-Rudolf Hotz

Hi Starr


I will try to answer some of your questions. Next time I recommend asking 
just one question per e-mail (with a corresponding subject line). This will 
make it much easier for the community to help you, and for others with 
similar problems to find the correct e-mail thread.


see below for my answers...

On 11/06/2013 11:43 PM, Hazard, E. Starr wrote:

Hello,
I am a new user of Galaxy.
I have a Galaxy instance running (sort of)  on a local research cluster.
I issued the command “hg update stable” today and it retrieved no files,
so I presume I am up-to-date on the stable release. I start the instance
as a user named “galaxy”.
Right now I am still running in “local” mode. I hope to migrate to DRMAA
LSF eventually.
I have tried to set up ProFTP to upload files but have not succeeded so
I use Galaxy Web-upload.
The upload was working nicely and I had added a couple of new tools
and they were working with the uploaded files.
Getting LSF/DRMAA to work was giving me fits and ultimately I deleted
all my history files in an effort to start over.
Presently, files being uploaded appear in history as, say, job 1 (in a
new history). The job status in the history panel of the web GUI
changes from purple to yellow and then to red, indicating some sort of
error. There is no viewable error text captured, but I can click on the
“eye” icon and see the first megabyte of the data (for tiny files I can
see the entire content and it’s intact). In the Galaxy file system, however,
these files appear but have a different number, say, dataset_399.dat

On my system the uploaded files appear in
/PATH/galaxy-dist/database/files/000

My first question is why is the data going into the “000” subdirectory
and not one “owned” by the user who is uploading?


All files are owned by the 'galaxy' user (independent of who uploads them 
and/or generates new files). The first 1000 files are stored in '000', 
the next 1000 files in '001', and so on.
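
As a rough illustration of that layout (just a sketch of the naming 
convention, not Galaxy's actual code; the base path is taken from your 
example):

# Sketch of the layout described above: dataset number N lives in
# database/files/XXX/dataset_N.dat, where XXX is N // 1000, zero-padded.
import os

def dataset_path(dataset_id, files_dir="/PATH/galaxy-dist/database/files"):
    bucket = "%03d" % (dataset_id // 1000)
    return os.path.join(files_dir, bucket, "dataset_%d.dat" % dataset_id)

print(dataset_path(399))   # .../000/dataset_399.dat
print(dataset_path(1234))  # .../001/dataset_1234.dat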


Access permission is handled by the SQLite database (unless you have 
already switched to a PostgreSQL database).




My second question is why is the dataset being labeled as
dataset_399.dat and not dataset_001.dat?


The number is given by the SQLite database. When you want a clean start, 
you need to remove all files from ~/database/files/000 and start with an 
empty database.
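
Something along these lines would do it (a rough sketch only, assuming the 
default SQLite setup with database/universe.sqlite and the stock 
database/files layout; stop Galaxy first and adjust the path to your 
install):

# Rough reset sketch -- assumes the default SQLite database (universe.sqlite)
# and the stock database/files layout; stop Galaxy before running this.
import os
import shutil

GALAXY_ROOT = "/PATH/galaxy-dist"  # replace with your own galaxy-dist path

# Remove all stored datasets (000, 001, ...).
shutil.rmtree(os.path.join(GALAXY_ROOT, "database", "files"), ignore_errors=True)

# Remove the SQLite database so Galaxy recreates an empty one on next start.
db_file = os.path.join(GALAXY_ROOT, "database", "universe.sqlite")
if os.path.exists(db_file):
    os.remove(db_file)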



My third question is why do the uploaded files not appear as selectable
options ( say I have paired-end fastq files and tool wants to have
choices about filenames)? This problem is present for programs that seek
one input file as well.


When you uploaded the files, was the 'format' properly recognized?



  I presume that Galaxy is confused because the numbering in history is
not the same as the numbering in the file upload archive (e.g.
/PATH/galaxy-dist/database/files/000 in my case) so my last question is
how do I “reset” my system to get the dataset and history numbers to be
the same?


The numbering in the history has nothing to do with the file numbering. 
See my comment above about a 'clean start'.



Regards, Hans-Rudolf



Here’s how I launch the Galaxy instance

  sh /shared/app/Galaxy/galaxy-dist/run.sh -v --daemon
--pid-file=Nov6Localdaemon.pid.txt  --log-file=Nov6Local1639daemon.log.txt

Entering daemon mode


Here are the last lines of the log

Starting server in PID 26236.

serving on 0.0.0.0:8089 view at http://127.0.0.1:8089


galaxy.tools.actions.upload_common DEBUG 2013-11-06 16:48:49,624
Changing ownership of
/shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4
with: /usr/bin/sudo -E scripts/external_chown_script.py
/shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4
hazards 502

galaxy.tools.actions.upload_common WARNING 2013-11-06 16:48:49,750
Changing ownership of uploaded file
/shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4
failed: sudo: no tty present and no askpass program specified


galaxy.tools.actions.upload_common DEBUG 2013-11-06 16:48:49,751
Changing ownership of
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO with:
/usr/bin/sudo -E scripts/external_chown_script.py
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO hazards 502

galaxy.tools.actions.upload_common WARNING 2013-11-06 16:48:49,775
Changing ownership of uploaded file
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO failed: sudo: no
tty present and no askpass program specified


galaxy.tools.actions.upload_common INFO 2013-11-06 16:48:49,805 tool
upload1 created job id 170


galaxy.jobs DEBUG 2013-11-06 16:48:50,678 (170) Persisting job
destination (destination id: local)

galaxy.jobs.handler INFO 2013-11-06 16:48:50,698 (170) Job dispatched

galaxy.jobs.runners.local DEBUG 2013-11-06 16:48:50,994 (170) executing:
python /shared/app/Galaxy/galaxy-dist/tools/data_source/upload.py
/depot/shared/app/Galaxy/galaxy-dist
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpTq22ot
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO

[galaxy-dev] Using input dataset names in output dataset names

2013-11-07 Thread Peter Cock
Hi all,

I'd like to change the output dataset labelling in Galaxy file format
conversion tools.

e.g. If the input is history entry 1 (e.g. "My Genes") then the output
from tabular_to_fasta.xml is currently named "FASTA-to-Tabular on data 1".
I would prefer this was "FASTA-to-Tabular on data My Genes" or, better,
"My Genes (as tabular)".

I've just done this for my BLAST XML to tabular tool, using the
.display_name trick:
https://github.com/peterjc/galaxy_blast/commit/31e31c4b5deadd60828ce6e6a381a5f90357393d

Would a pull request doing this to the built-in conversion tools be
favourably received?

Alternatively, would it be preferable to simply reuse the input
dataset's name unchanged for simple format conversion tools (without
text about the conversion)?

Related to this, would people prefer it if the $on_string in the case of
a single input file was the input file's name (e.g. "My Genes") rather
than "data 1"? (When there are multiple input files, $on_string needs
to be kept short.)

Regards,

Peter
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Using input dataset names in output dataset names

2013-11-07 Thread Bjoern Gruening
Hi Peter,

thanks for raising this important topic.

I think the following trello card has a similar idea and a patch
attached.

https://trello.com/c/JnhOEqow

It would be great if we could simplify the naming of datasets. Especially
if you run a workflow with several inputs, you would like to keep the
input name through the whole workflow to the end.

Cheers,
Bjoern

 Hi all,
 
 I'd like to change the output dataset labelling in Galaxy file format
 conversion tools.
 
 e.g. If the input is history entry 1 (e.g My Genes) then the output
 from tabular_to_fasta.xml is currently named FASTA-to-Tabular on data
 1.
 I would prefer this was FASTA-to-Tabular on data My Genes or better
 My Genes (as tabular).
 
 I've just done this for my BLAST XML to tabular tool, using the
 .display_name trick:
 https://github.com/peterjc/galaxy_blast/commit/31e31c4b5deadd60828ce6e6a381a5f90357393d
 
 Would a pull request doing this to the built-in conversion tools be
 favourably received?
 
 Alternatively, would it be preferable to simply reused the input
 dataset's name unchanged for simple format conversion tools (without
 text about the conversion)?
 
 Related to this, would people prefer if the $on_string in the case of
 a single input file was the input file's name (e.g. My Genes) rather
 than data 1? (When there are multiple input files, $on_string needs
 to be kept short).
 
 Regards,
 
 Peter


Re: [galaxy-dev] Using input dataset names in output dataset names

2013-11-07 Thread Peter Cock
On Thu, Nov 7, 2013 at 12:29 PM, Bjoern Gruening
bjoern.gruen...@gmail.com wrote:
 Hi Peter,

 thanks for raising this important topic.

 I think the following trello card has a similar idea and a patch
 attached.

 https://trello.com/c/JnhOEqow

 It would be great if we can simplify the naming of datasets, especially
 if you run a workflow with several input, you would like to keep the
 input name through the whole workflow to the end.

Yes, I was aware of some more general ideas like that - and
I agree this is important.

However, with the conversion tool naming we can make a small
improvement right now, without having to modify the Galaxy core.

Peter


Re: [galaxy-dev] Using input dataset names in output dataset names

2013-11-07 Thread Peter Cock
On Thu, Nov 7, 2013 at 12:21 PM, Peter Cock p.j.a.c...@googlemail.com wrote:

 Related to this, would people prefer if the $on_string in the case of
 a single input file was the input file's name (e.g. My Genes) rather
 than data 1? (When there are multiple input files, $on_string needs
 to be kept short).

That turned out to be quite an easy change (patch below), and
personally I think this makes the $on_string much nicer.

Peter

--

$ hg diff lib/galaxy/tools/actions/__init__.py
diff -r 77d58fdd1c2e lib/galaxy/tools/actions/__init__.py
--- a/lib/galaxy/tools/actions/__init__.py    Tue Oct 29 14:21:48 2013 -0400
+++ b/lib/galaxy/tools/actions/__init__.py    Thu Nov 07 15:15:42 2013 +
@@ -181,6 +181,7 @@
         input_names = []
         input_ext = 'data'
         input_dbkey = incoming.get( "dbkey", "?" )
+        on_text = ''
         for name, data in inp_data.items():
             if not data:
                 data = NoneDataset( datatypes_registry = trans.app.datatypes_registry )
@@ -194,6 +195,7 @@
             else: # HDA
                 if data.hid:
                     input_names.append( 'data %s' % data.hid )
+                    on_text = data.name # Will use below if only one input dataset
                 input_ext = data.ext
 
             if data.dbkey not in [None, '?']:
@@ -230,7 +232,10 @@
         output_permissions = trans.app.security_agent.history_get_default_permissions( history )
         # Build name for output datasets based on tool name and input names
         if len( input_names ) == 1:
-            on_text = input_names[0]
+            #We recorded the dataset name as on_text earlier...
+            if not on_text:
+                #Fall back on the shorter 'data %i' style:
+                on_text = input_names[0]
         elif len( input_names ) == 2:
             on_text = '%s and %s' % tuple(input_names[0:2])
         elif len( input_names ) == 3:


[galaxy-dev] local_task_queue_workers

2013-11-07 Thread Jorrit Boekel

Hi list,

I would need a memory refresher about tasked jobs. When testing some 
larger analyses on a local installation, I thought the 
local_task_queue_workers setting in universe_wsgi.ini would be the 
limiting factor for how many tasks can be executed at the same time. In 
our setup, it is currently set to 2. However, 5 tasks are run 
simultaneously, leading to memory problems.


Am I overlooking something that anyone knows of?

cheers,
jorrit boekel

--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm



[galaxy-dev] Bug: Two copies of wiggle_to_simple.xml

2013-11-07 Thread Peter Cock
Hi all,

There are two copies of the wiggle_to_simple tool in the main repository,
and this duplication appears to have happened back in 2009.

$ grep wiggle_to_simple tool_conf.xml.sample
<tool file="filters/wiggle_to_simple.xml" />
<tool file="stats/wiggle_to_simple.xml" />

$ diff tools/filters/wiggle_to_simple.py tools/stats/wiggle_to_simple.py
(no changes)

$ diff -w tools/filters/wiggle_to_simple.xml tools/stats/wiggle_to_simple.xml
15,18d14
< <test>
<   <param name="input" value="3.wig" />
<   <output name="out_file1" file="3_wig.bed"/>
< </test>

The tools/filters/wiggle_to_simple.xml version has Windows newlines,
and 2 tests.

The tools/stats/wiggle_to_simple.xml version has Unix newlines, but only 1 test.

I would therefore suggest merging the two (Unix newlines, both tests).

Peter


Re: [galaxy-dev] Using input dataset names in output dataset names

2013-11-07 Thread Peter Cock
On Thu, Nov 7, 2013 at 3:18 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
 On Thu, Nov 7, 2013 at 12:21 PM, Peter Cock p.j.a.c...@googlemail.com wrote:

 Related to this, would people prefer if the $on_string in the case of
 a single input file was the input file's name (e.g. My Genes) rather
 than data 1? (When there are multiple input files, $on_string needs
 to be kept short).

 That turned out to be quite an easy change (patch below), and
 personally I think this makes the $on_string much nicer.

 Peter

Getting back to my motivating example, since fasta_to_tabular.xml
does not give the output a label and depends on the default, the
small change to $on_string should result in the conversion of a
file named "My Genes" being labelled "FASTA-to-Tabular on My Genes",
rather than "FASTA-to-Tabular on data 1" as now.

Here's another variant to keep the "data 1" text in $on_string,
if people are attached to this functionality. That would result in
"FASTA-to-Tabular on data 1 (My Genes)".

Also, here's an outline patch to explicitly produce my preferred
label of "My Genes (as tabular)" etc.

(Bjoern is right though - a more long term solution is needed to
better address naming, like the tag idea on Trello.)

Peter

--

$ hg diff lib/galaxy/tools/actions/__init__.py
diff -r 77d58fdd1c2e lib/galaxy/tools/actions/__init__.py
--- a/lib/galaxy/tools/actions/__init__.py    Tue Oct 29 14:21:48 2013 -0400
+++ b/lib/galaxy/tools/actions/__init__.py    Thu Nov 07 15:49:15 2013 +
@@ -181,6 +181,7 @@
         input_names = []
         input_ext = 'data'
         input_dbkey = incoming.get( "dbkey", "?" )
+        on_text = ''
         for name, data in inp_data.items():
             if not data:
                 data = NoneDataset( datatypes_registry = trans.app.datatypes_registry )
@@ -194,6 +195,8 @@
             else: # HDA
                 if data.hid:
                     input_names.append( 'data %s' % data.hid )
+                    #Will use this on_text if only one input dataset:
+                    on_text = "data %s (%s)" % (data.id, data.name)
                 input_ext = data.ext
 
             if data.dbkey not in [None, '?']:
@@ -230,7 +233,10 @@
         output_permissions = trans.app.security_agent.history_get_default_permissions( history )
         # Build name for output datasets based on tool name and input names
         if len( input_names ) == 1:
-            on_text = input_names[0]
+            #We recorded the dataset name as on_text earlier...
+            if not on_text:
+                #Fall back on the shorter 'data %i' style:
+                on_text = input_names[0]
         elif len( input_names ) == 2:
             on_text = '%s and %s' % tuple(input_names[0:2])
         elif len( input_names ) == 3:


--

$ hg diff tools
diff -r 77d58fdd1c2e tools/fasta_tools/fasta_to_tabular.xml
--- a/tools/fasta_tools/fasta_to_tabular.xml    Tue Oct 29 14:21:48 2013 -0400
+++ b/tools/fasta_tools/fasta_to_tabular.xml    Thu Nov 07 15:42:13 2013 +
@@ -11,7 +11,7 @@
         </param>
     </inputs>
     <outputs>
-        <data name="output" format="tabular"/>
+        <data name="output" format="tabular" label="$input.display_name (as tabular)"/>
     </outputs>
     <tests>
         <test>
diff -r 77d58fdd1c2e tools/fasta_tools/tabular_to_fasta.xml
--- a/tools/fasta_tools/tabular_to_fasta.xml    Tue Oct 29 14:21:48 2013 -0400
+++ b/tools/fasta_tools/tabular_to_fasta.xml    Thu Nov 07 15:42:13 2013 +
@@ -7,7 +7,7 @@
         <param name="seq_col" type="data_column" data_ref="input" numerical="False" label="Sequence column" />
     </inputs>
     <outputs>
-        <data name="output" format="fasta"/>
+        <data name="output" format="fasta" label="$input.display_name (as FASTA)" />
     </outputs>
     <tests>
         <test>
@@ -40,4 +40,4 @@
 GTGATATGTATGTTGACGGCCATAAGGCTGCTTCTT
 
 </help>
-</tool>
\ No newline at end of file
+</tool>
diff -r 77d58fdd1c2e tools/fastq/fastq_to_fasta.xml
--- a/tools/fastq/fastq_to_fasta.xml    Tue Oct 29 14:21:48 2013 -0400
+++ b/tools/fastq/fastq_to_fasta.xml    Thu Nov 07 15:42:13 2013 +
@@ -5,7 +5,7 @@
         <param name="input_file" type="data" format="fastq" label="FASTQ file to convert" />
     </inputs>
     <outputs>
-        <data name="output_file" format="fasta" />
+        <data name="output_file" format="fasta" label="$input_file.name (as FASTA)" />
     </outputs>
     <tests>
         <!-- basic test -->
diff -r 77d58fdd1c2e tools/fastq/fastq_to_tabular.xml
--- a/tools/fastq/fastq_to_tabular.xml    Tue Oct 29 14:21:48 2013 -0400
+++ b/tools/fastq/fastq_to_tabular.xml    Thu Nov 07 15:42:13 2013 +
@@ -8,7 +8,7 @@
         </param>
     </inputs>
     <outputs>
-        <data name="output_file" format="tabular" />
+        <data name="output_file" format="tabular" label="$input_file.name (as tabular)" />
     </outputs>
     <tests>
         <!-- basic test -->
diff -r 77d58fdd1c2e tools/fastq/tabular_to_fastq.xml
--- a/tools/fastq/tabular_to_fastq.xml 

[galaxy-dev] Job execution order mixed-up

2013-11-07 Thread Jean-Francois Payotte
Dear Galaxy mailing-list,

Once again I come seeking your help. I hope someone has already had this 
issue or will have an idea of where to look to solve it. :)

One of our users reported having workflows fail because some steps were 
executed before all their inputs were ready.
You can find a screenshot attached, where we can see that step (42) "Sort 
on data 39" has been executed while step (39) is still waiting to run 
(gray box).

This behaviour has been reproduced with at least two different Galaxy 
tools (one custom, and the sort tool which comes standard with Galaxy).
It seems to be somewhat random: running a workflow where this issue occurs 
twice, the steps were executed in the wrong order only one of the two 
times.

I could be wrong, but I don't think this issue is grid-related as, from my 
understanding, Galaxy is not using the SGE job dependency functionality.
I believe all jobs stay in some internal queue (within Galaxy) until all 
input files are ready, and only then is the job submitted to the cluster.

Any help or any hint on what to look at to solve this issue would be 
greatly appreciated.
We updated our Galaxy instance to the August 12th distribution on October 
1st, and I believe we never experienced this issue before the update.

Many thanks for your help,
Jean-François



Re: [galaxy-dev] Samtools and idxstats

2013-11-07 Thread Peter Cock
Hi Michiel,

Did you finish wrapping samtools idxstats? I can't see it on the Tool Shed...

If not, I may tackle this shortly.

Peter

On Wed, May 22, 2013 at 2:39 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote:
 Galaxy stores a BAI for each BAM internally; you can access it in a tool 
 wrapper like this (assuming the name of your input dataset is 'input_bam'):

 ${input_bam.metadata.bam_index}

 Once you have the file path, you can set up a symbolic link to it and the 
 tool should work fine.

 Good luck,
 J.
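
For example, a wrapper's command block could link the BAM and its index 
side by side before calling idxstats (a sketch only, not an existing 
wrapper; 'input_bam' and 'output' are hypothetical parameter names):

<command>
    ## Sketch: link the dataset and its index so samtools finds local.bam.bai
    ln -s '$input_bam' local.bam &amp;&amp;
    ln -s '${input_bam.metadata.bam_index}' local.bam.bai &amp;&amp;
    samtools idxstats local.bam > '$output'
</command>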



 On May 22, 2013, at 4:05 AM, Michiel Van Bel wrote:

 Hi,

 I would like to inquire whether anyone has attempted to implement the 
 idxstats tool from samtools into Galaxy?
 The xml-file for idxstats is not present in the Galaxy source code, which 
 led me to try and implement it myself.
 However, the main problem I face is that the idxstats tool silently relies 
 on having an index file available (within the same directory)  for the bam 
 file you which to print the stats for.
 E.g. samtools idxstats PATH/test.bam
 searches for PATH/test.bam.bai  and gives an error when this file is not 
 present. And somehow I cannot model this behavior in Galaxy.

 A different solution would of course be to ask the author(s) of samtools to 
 have an option available where the user can directly indicate the path to 
 the index file.

 regards,
 Michiel

 PS: I've searched the mailing list archives for this problem but did not 
 find any matches. Apologies if I somehow missed the answer.



Re: [galaxy-dev] Security vulnerability in Galaxy filtering tools

2013-11-07 Thread John Chilton
Hello Ido,

  The project has had a lot of contributors over the years; it is
probably not safe to assume they have all been experts, and
frequently experts know of little tricks or shortcuts that can
result in a lot of trouble (this case in point) - a less sophisticated
developer probably would not even have known about the Python
functionality that resulted in this trouble.

  I suspect the mere fact that you are concerned about writing secure
tools means you would do a better job than many professional software
developers whom you might term experts. Furthermore, you are
absolutely right that there should be some documentation or something
somewhere to aid in writing secure tools - the rest of this e-mail
contains a couple of quick notes; hopefully they can be translated to the
wiki at some point and grown over time.

  If your tool is not taking in text inputs - it's all numbers and
select parameters, etc. - it is very likely secure. These sorts of
vulnerabilities usually come into play when users are allowed to
pass free text and the tool or wrapper uses this text in such a way
that it can be broken out of the intended context (these are broadly
characterized as code injection attacks). 95% of how these text
parameters are going to be used is likely passing them as a
command-line argument to another program. For this reason Galaxy
preprocesses the text and sanitizes it so it cannot contain characters
that would allow the text to easily result in code injection.
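
As a general illustration of that point (not code from Galaxy itself; 
'filter_tool' and the expression are made up), passing the free text as a 
discrete argument rather than building a shell string keeps it out of the 
shell's parsing entirely:

# Illustrative only: a user-supplied text parameter passed as a discrete
# argument is delivered to the program verbatim, so quoting tricks in the
# text cannot inject extra shell commands.
import subprocess

user_text = "c1=='chr1' and c2>100"   # hypothetical free-text parameter

# Risky: the whole string is re-parsed by a shell.
# subprocess.call("filter_tool --expr %s input.tab" % user_text, shell=True)

# Safer: no shell involved; each argv element is passed through as-is.
subprocess.call(["filter_tool", "--expr", user_text, "input.tab"])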

  So for this reason - you are still probably fine unless you are
circumventing this text preprocessing. For instance, Galaxy will
translate quotation marks to '__dq__'; this tool explicitly
retranslates those back into quotation marks
(https://bitbucket.org/galaxy/galaxy-central/src/f2f1cce4678cf1eb188d9611b05f00706afc8897/tools/stats/filtering.py?at=default#cl-176).
There is a reason to do this in this case, but you will not need to for
most bioinformatics applications. If your tools are doing this, it is
time to start getting extra careful.

  If you are really interested in this topic, or when it is time to get
extra careful, I would recommend picking up the book "The Web
Application Hacker's Handbook" - it is pretty good. Most of it would
not be relevant for tool developers, but chapter 1, chapter 2, and all
of chapter 9 could be very relevant and would probably leave one with
a solid grasp of what to look for in many different contexts - not
just the ones the book discusses explicitly.

  Hopefully over time the IUC will provide guidance about this sort of
thing (informing you if there are potential security vulnerabilities
in your tools). Also feel free to post example tool configurations to
this list you might be concerned about and I am sure someone would be
happy to look it over and tell you if there are any red flags.

-John


On Tue, Nov 5, 2013 at 12:30 PM, Ido Tamir ta...@imp.ac.at wrote:

 On Nov 5, 2013, at 6:28 PM, Nate Coraor n...@bx.psu.edu wrote:

 Hi Ido,

 Thanks for the feedback.  Replies below.

 On Nov 5, 2013, at 9:54 AM, Ido Tamir wrote:

 This seems to happen often e.g. 
 http://wiki.galaxyproject.org/DevNewsBriefs/2012_10_23#Compute_Tool_Security_Fix

 I'm not sure I'd agree that it's often - we've had 4 or 5 vulnerabilities 
 over the life of the project.  2 allowed arbitrary code execution, the 
 others were less severe.

 But these were written by experts, not by people like me, that don't know 
 what the galaxy framework really does/does not do with the input, so I guess 
 I make many more mistakes.

 a) are there general guidelines in the wiki on how to avoid these problems 
 when creating tools?

 The guidelines for writing a Galaxy tool are no different from best 
 practices for writing secure code.  In specific for this vulnerability, 
 execution of user input should be handled with extreme care, and this tool 
 had some gaps in its input validation and sanitization.  For what it's 
 worth, the filter tool (on which the other vulnerable tools were based) is 
 one of the few tools surviving from the very early days of Galaxy, and would 
 not be implemented the same way if written today.

 I think it would be nice to have a small outline on the wiki of what galaxy 
 does with the input data and how it could affect a tool.
 What sanitisation is there by default so I don't have to worry about it, but 
 what could happen if I don't care to check/remove/sanitise ' | or  ..., 
 maybe with examples.

 b) is there a way to check automatically if all input fields are correctly 
 escaped in a tool?

 I am not sure how Galaxy could do this.  Galaxy sanitizes the command line 
 so that input fields passed to a tool as command line arguments cannot be 
 crafted to exploit the shell's parsing rules.
 That's good

 best,
 ido


 What the tool itself does with its inputs are out of Galaxy's control.

 --nate


 A search for security in the wiki brings up:
  • Admin/Data Libraries/Library Security
 0.0k - rev: 1 (current) last 

Re: [galaxy-dev] local_task_queue_workers

2013-11-07 Thread John Chilton
I think you want to set local_job_queue_workers instead, or set the
number of workers on the local job runner plugin element in the newer
job_conf.xml style configuration - the task runner delegates to the
underlying job runners once it has split out the tasks. The parameter
you were setting, I am guessing, just determines the number of threads
used to split up tasks, not run them.
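
For reference, a minimal job_conf.xml along those lines might look like the 
sketch below (check job_conf.xml.sample in your galaxy-dist for the exact 
syntax of your release) - the workers attribute on the local plugin is what 
caps how many local jobs run at once:

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- 'workers' limits how many jobs the local runner executes concurrently -->
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="2"/>
    </plugins>
    <handlers>
        <handler id="main"/>
    </handlers>
    <destinations default="local">
        <destination id="local" runner="local"/>
    </destinations>
</job_conf>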

-John

On Thu, Nov 7, 2013 at 9:17 AM, Jorrit Boekel
jorrit.boe...@scilifelab.se wrote:
 Hi list,

 I would need a memory refresher about tasked jobs. When testing some larger
 analyses on a local installation, I thought the local_task_queue_workers
 setting in universe_wsgi.ini would be the limiting factor for how many tasks
 can be executed at the same time. In our setup, it is currently set to 2.
 However, 5 tasks are run simultaneously, leading to memory problems.

 Am I overlooking something that anyone knows of?

 cheers,
 jorrit boekel

 --
 Scientific programmer
 Mass spec analysis support @ BILS
 Janne Lehtiö / Lukas Käll labs
 SciLifeLab Stockholm



Re: [galaxy-dev] Security vulnerability in Galaxy filtering tools

2013-11-07 Thread Martin Čech
John just made it to the Wiki:
http://wiki.galaxyproject.org/Develop/SecurityToolTips

Feel free to add/edit/delete.

M.


On Thu, Nov 7, 2013 at 12:27 PM, John Chilton chil...@msi.umn.edu wrote:

 Hello Ido,

   The project has had a lot of contributors over the years, it is
 probably not safe to assume they have all been experts and
 frequently experts know of little tricks or shortcuts that can
 result in a lot of trouble (this case in point) - a less sophisticated
 developer probably would not have even known about the Python
 functionality that resulted in this trouble.

   I suspect the mere fact that you are concerned about writing secure
 tools means you would do a better job than many professional software
 developers whom you may term as experts. Furthermore, you are
 absolutely right there should be some documentation or something
 somewhere to aid in writing secure tools - the rest of this e-mail
 contains a couple quick notes hopefully it can be translated to the
 wiki at some point and grown over time.

   If your tool is not taking in text inputs - its all numbers and
 select parameters, etc it is very likely secure. These sorts of
 vulnerabilities would usually come into play when users are allowed to
 pass free text and the tool or wrapper use this text in such a way
 that it can be broken out of the intended context (these are broadly
 characterized as code injection attacks). 95% of how these text
 parameters are going to be used is likely passing them as a
 command-line argument to another program. For this reason Galaxy
 preprocesses the text and sanitizes it so it cannot contain characters
 that would result in the text easily result in code injections.

   So for this reason - you are still probably fine unless you are
 circumventing this text preprocessing. For instance, Galaxy will
 translate quotations marks to '__dq__', this tool explicitly
 retranslates those back to quotations marks
 (
 https://bitbucket.org/galaxy/galaxy-central/src/f2f1cce4678cf1eb188d9611b05f00706afc8897/tools/stats/filtering.py?at=default#cl-176
 ).
 There is a reason to do this in this case but you will not need for
 most bioinformatics applications. If your tools are doing this it is
 time to start getting extra careful.

   If you are really interested in this topic or when it is time to get
 extra careful, I would recommend picking up the book The Web
 Application Hacker's Handbook - it is pretty good. Most of it would
 not be relevant for tool developers, but chapter 1, chapter 2, and all
 of chapter 9 could be very relevant and would probably leave one with
 a solid grasp of what to look for in many different contexts - not
 just the ones the book discusses explicitly.

   Hopefully over time the IUC will provide guidance about this sort of
 thing (informing you if there are potential security vulnerabilities
 in your tools). Also feel free to post example tool configurations to
 this list you might be concerned about and I am sure someone would be
 happy to look it over and tell you if there are any red flags.

 -John


 On Tue, Nov 5, 2013 at 12:30 PM, Ido Tamir ta...@imp.ac.at wrote:
 
  On Nov 5, 2013, at 6:28 PM, Nate Coraor n...@bx.psu.edu wrote:
 
  Hi Ido,
 
  Thanks for the feedback.  Replies below.
 
  On Nov 5, 2013, at 9:54 AM, Ido Tamir wrote:
 
  This seems to happen often e.g.
 http://wiki.galaxyproject.org/DevNewsBriefs/2012_10_23#Compute_Tool_Security_Fix
 
  I'm not sure I'd agree that it's often - we've had 4 or 5
 vulnerabilities over the life of the project.  2 allowed arbitrary code
 execution, the others were less severe.
 
  But these were written by experts, not by people like me, that don't
 know what the galaxy framework really does/does not do with the input, so I
 guess I make many more mistakes.
 
  a) are there general guidelines in the wiki on how to avoid these
 problems when creating tools?
 
  The guidelines for writing a Galaxy tool are no different from best
 practices for writing secure code.  In specific for this vulnerability,
 execution of user input should be handled with extreme care, and this tool
 had some gaps in its input validation and sanitization.  For what it's
 worth, the filter tool (on which the other vulnerable tools were based) is
 one of the few tools surviving from the very early days of Galaxy, and
 would not be implemented the same way if written today.
 
  I think it would be nice to have a small outline on the wiki of what
 galaxy does with the input data and how it could affect a tool.
  What sanitisation is there by default so I don't have to worry about it,
 but what could happen I if I don't care to check/remove sanitise ' | or 
 ..., maybe with examples.
 
  b) is there a way to check automatically if all input fields are
 correctly escaped in a tool?
 
  I am not sure how Galaxy could do this.  Galaxy sanitizes the command
 line so that input fields passed to a tool as command line arguments cannot
 be crafted to exploit the 

Re: [galaxy-dev] Bug: Two copies of wiggle_to_simple.xml

2013-11-07 Thread John Chilton
There are still two copies in central, but I have synchronized the test
cases and fixed the newlines per your suggestion. I imagine the
easiest way to fix the fact that there are two copies is just to wait until
they get migrated to the tool shed - it seems like progress is being
made on that front very quickly lately.

Thanks,
-John

On Thu, Nov 7, 2013 at 9:38 AM, Peter Cock p.j.a.c...@googlemail.com wrote:
 Hi all,

 There are two copies of the wiggle_to_simple tool in the main repository,
 and this duplication appears to have happened back in 2009.

  $ grep wiggle_to_simple tool_conf.xml.sample
  <tool file="filters/wiggle_to_simple.xml" />
  <tool file="stats/wiggle_to_simple.xml" />
 
  $ diff tools/filters/wiggle_to_simple.py tools/stats/wiggle_to_simple.py
  (no changes)
 
  $ diff -w tools/filters/wiggle_to_simple.xml tools/stats/wiggle_to_simple.xml
  15,18d14
  < <test>
  <   <param name="input" value="3.wig" />
  <   <output name="out_file1" file="3_wig.bed"/>
  < </test>

 The tools/filters/wiggle_to_simple.xml version has Windows newlines,
 and 2 tests.

 The tools/stats/wiggle_to_simple.xml version has Unix newlines, but only 1 
 test.

 I would therefore suggest merging the two (Unix newlines, both tests).

 Peter


Re: [galaxy-dev] Dynamic tool configuration

2013-11-07 Thread John Chilton
Galaxy is not optimized for this kind of use case - I like to think of
it as sort of file-centric - but it is clear more and more people
are using it to process data in this fashion. Hopefully someone chimes
in with better advice than I have :).

The way I would probably implement this is to create a tool that lists
the available options out into a dataset, or into the metadata of a
dataset of a new datatype - and then have your tool read that dataset in
and build the options using that file. Jim Johnson's mothur tools
(available on the tool shed) do a lot of this sort of thing - it may be
worth looking at the datatypes and an example such as remove.groups.xml:

  ...
  <param name="group_in" type="data" format="groups" label="group - Groups"/>
  <conditional name="groupnames">
   <param name="source" type="select" label="Select Group Names from">
    <option value="groups">A List of Group Names</option>
    <option value="accnos">A History Group Name Accnos Dataset</option>
   </param>
   <when value="groups">
    <param name="groups" type="select" label="groups - Pick groups to remove" multiple="true">
     <options>
      <filter type="data_meta" ref="group_in" key="groups" />
     </options>
    </param>
   </when>
   <when value="accnos">
    <param name="accnos" type="data" format="accnos" label="accnos - Group Names from your history"/>
   </when>
  </conditional>
  ...

In particular the data_meta filter type in there.

-John

On Wed, Nov 6, 2013 at 8:03 AM, Biobix Galaxy biobix.gal...@gmail.com wrote:
 Hi all,

 We are working on a galaxy tool suite for data analysis.
 We use a sqlite db to keep result data centralised between the different
 tools.

 At one point the tool configuration options of a tool should be dependent on
 the rows within a table of the sqlite db that is the output of the previous
 step. In other words, we would like to be able to set selectable parameters
 based on an underlying sql statement. If sql is not possible, an alternative
 would be to output the table content into a txt file and subsequently parse
 the txt file instead of the sqlite_db within the xml configuration file.

 When looking through the galaxy wiki and mailing lists I came across the
 <code> tag, which would be ideal - we could run a python script in the
 background to fetch data from the sqlite table - however that function is
 deprecated.

 Does anybody know of other ways to achieve this?

 Thanks!

 Jeroen

 Ir. Jeroen Crappé, PhD Student
 Lab of Bioinformatics and Computational Genomics (Biobix)
 FBW - Ghent University





Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas

2013-11-07 Thread John Chilton
On Thu, Nov 7, 2013 at 1:46 AM, Björn Grüning
bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
 Am Donnerstag, den 07.11.2013, 00:25 -0600 schrieb John Chilton:

 My two cents below.

 On Wed, Nov 6, 2013 at 4:20 PM, Björn Grüning
 bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
 Hi Dave,

 We're thinking that the following approach makes the most sense:

 <action type="setup_perl_environment"> OR <action
 type="setup_r_environment"> OR <action type="setup_ruby_environment"> OR
 <action type="setup_virtualenv">
  <repository changeset_revision="978287122b91"
 name="package_perl_5_18" owner="iuc"
 toolshed="http://testtoolshed.g2.bx.psu.edu">
   <package name="perl" version="5.18.1" />
  </repository>
  <repository changeset_revision="8fc96166cddd"
 name="package_expat_2_1" owner="iuc"
 toolshed="http://testtoolshed.g2.bx.psu.edu">
   <package name="expat" version="2.1" />
  </repository>
 </action>

 For all repository tag sets contained within these setup_* tags, the
 repository's env.sh would be pulled in for the setup of the specified
 environment without requiring a set_environment_for_install action type.

 Would this work for your use cases?

 Yes, the first one. But it's a little bit too verbose, no? Including the perl
 repository in a setup_perl_environment should be implicit, or? We can
 assume that this needs to be present.
 Do you have an example of why sourcing every repository by default can be
 harmful? It would make such an installation so much easier and less
 complex.

 I am not sure I understand this paragraph - I have a vague sense I
 agree but is there any chance you could rephrase this or elaborate?

 My first use case will be addressed by this suggestion. I had hoped that we
 could create a less verbose syntax.
 If I specify a package at the top of my xml file:


 <package name="expat" version="2.1.0">
     <repository name="package_expat_2_1" owner="iuc"
                 prior_installation_required="True" />
 </package>

 I need to repeat it either in an <action type="set_environment_for_install">
 or in an <action type="setup_perl_environment">.
 My hope was to get rid of these. Once a package definition is
 specified/built, every ENV var is available in any downstream package.
 But if there are any downsides or pitfalls, this more verbose and explicit
 syntax will work for my use case.


I see, this makes perfect sense to me now, thanks! I certainly agree
that it should not have to be spelled out twice unless there is a good
reason. I guess my preference would be to just see it inside of the
setup_perl_environment tag - why should it need to be at the top level
as well? There could be many implementation details that make this
difficult though, so obviously I defer to Greg/Dave on this.


 Also that did not solve the second use case: if I have two packages, one
 that is installing perl libraries and the second a binary that is
 checking for or that needs these perl libs.

 We have discussed off list in another thread. Just to summarize my
 thoughts there - I think we should delay this or not make it a
 priority if there are marginally acceptable workarounds that can be
 found for the time being. Getting these four actions to work well as
 sort of terminal endpoints and allow specification as tersely as
 possible should be the primary goal for the time being. You will see
 Perl or Python packages depend on C libraries 10 times more frequently
 than you will find makefiles and C programs depend on complex perl or
 python environments (correct me if I am wrong). Given that there is
 already years worth of tool shed development outlined in existing
 Trello cards - this is just how I would prioritize things (happy to be
 overruled).

 Ok point taken. Let's focus on the real issue. That use case is just a
 simplification / more structured way to write tool dependencies;
 it's not strictly needed to get my packages done.
 John, just to make that use case clearer:
 - You have a package (A) with dependency (B)
 - B is not worth putting in an extra repository (extra
 tool_dependencies.xml file)

 Currently, you are forced to define both in one package tag, because if
 you define it in two package tags A will not see B.
 The perl and python case was a bad example; you have that problem with every
 dependency that is not worth putting in a separate repository.


 To summarize:
 I'm fine with that approach. It will address my current use case and it
 would be great to have it as proposed by Dave!

 Thanks a lot!
 Bjoern



 If so, can you confirm that this should be done for all four currently
 supported setup_* action types?

 I think it would be best to tackle setup_r_environment and
 setup_ruby_environment first. setup_virtualenv cannot have nested
 elements at this time - it is just assumed to be a bunch of text
 (either a file containing the dependencies or a list of the
 dependencies).

 So setup_r_environment and setup_ruby_environment have the same structure:

  <setup_ruby_environment>
    <repository .. />
    <package .. />
    <package .. />
  </setup_ruby_environment>

 ... 

Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas

2013-11-07 Thread Greg Von Kuster

Please see my inline comments.  Thanks!

Greg Von Kuster


On Nov 7, 2013, at 1:33 PM, John Chilton chil...@msi.umn.edu wrote:

 On Thu, Nov 7, 2013 at 1:46 AM, Björn Grüning
 bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
 Am Donnerstag, den 07.11.2013, 00:25 -0600 schrieb John Chilton:
 
 My two cents below.
 
 On Wed, Nov 6, 2013 at 4:20 PM, Björn Grüning
 bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
 Hi Dave,
 
 We're thinking that the following approach makes the most sense:
 
 action type=setup_perl_environment OR action
 type=setup_r_environment OR action type=setup_ruby_environment OR
 action type=setup_virtualenv
 repository changeset_revision=978287122b91
 name=package_perl_5_18 owner=iuc
 toolshed=http://testtoolshed.g2.bx.psu.edu;
 package name=perl version=5.18.1 /
 /repository
 repository changeset_revision=8fc96166cddd
 name=package_expat_2_1 owner=iuc
 toolshed=http://testtoolshed.g2.bx.psu.edu;
 package name=expat version=2.1 /
 /repository
 /action
 
 For all repository tag sets contained within these setup_* tags, the
 repository's env.sh would be pulled in for the setup of the specified
 environment without requiring a set_environment_for_install action type.
 
 Would this work for your use cases?
 
 Yes, the first one. But its a little bit to verbose or? Include the perl
 repository in a setup_perl environment should be implicit or? We can
 assume that this need to be present.
 Do you have example why sourcing every repository by default can be
 harmful? It would make such an installation so much easier and less
 complex.
 
 I am not sure I understand this paragraph - I have a vague sense I
 agree but is there any chance you could rephrase this or elaborate?
 
 My first use case will be addressed by this suggestion. I had hoped that we
 can create a less verbose syntax.
 If we I specify a package at the top of my xml file:
 
 
package name=expat version=2.1.0
repository name=package_expat_2_1 owner=iuc
 prior_installation_required=True /
/package
 
 I need to repeat it either in a action type=set_environment_for_install
 or in a action type=setup_perl_environment.
 My hope was to get rid of these. Once a package definition is
 specified/build, every ENV var is available in any downstream package.
 But if there is any downsides or pitfalls this more verbose and explicit
 syntax will work for my usecase.


The potential problem I see here is that environment variables are not 
namespaced in any way, so if all env.sh files are sourced no matter what, there is 
the potential for a certain environment variable to get set to a certain 
dependency version, and then later during the installation (assuming a 
hierarchy of repository dependencies), the same environment variable gets set 
to a different version of the same dependency.  I'm not sure how often (if 
ever) this could occur, but if it did, the installation would not be as 
expected.


 
 
 I see, this makes perfect sense to me now, thanks! I certainly agree
 that it should have to be spelled out twice unless there is a good
 reason. I guess my preference would be to just see it inside of the
 setup_perl_environment tag - why should it need to be at the top-level
 as well? There could be many implementation details that make this
 difficult though, so obviously I delegate to Greg/Dave on this.


Just so I'm clear on this, is this what you want implemented as an enhancement 
to the setup_* tag sets?

<action type="setup_perl_environment"> OR <action type="setup_r_environment"> 
OR <action type="setup_ruby_environment"> OR <action type="setup_virtualenv">
   <repository changeset_revision="978287122b91" name="package_perl_5_18" 
owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
      <package name="perl" version="5.18.1" />
   </repository>
   <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" 
owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu">
      <package name="expat" version="2.1" />
   </repository>
</action>

For all repository tag sets contained within these setup_* tags, the 
repository's env.sh would be pulled in for the setup of the specified 
environment without requiring a set_environment_for_install action type.



 
 
 Also that did not solve the second use case. If have two packages one
 that is installing perl libraries and the second a binary that is
 checking or that needs these perl libs.
 
 We have discussed off list in another thread. Just to summarize my
 thoughts there - I think we should delay this or not make it a
 priority if there are marginally acceptable workarounds that can be
 found for the time being. Getting these four actions to work well as
 sort of terminal endpoints and allow specification as tersely as
 possible should be the primary goal for the time being. You will see
 Perl or Python packages depend on C libraries 10 times more frequently
 than you will find makefiles and C programs depend on complex perl or
 python environments (correct me if I am 

Re: [galaxy-dev] Installing Galaxy behind an Apache proxy using mod_auth_cas for user auth

2013-11-07 Thread Sandra Gesing
Dear all,

I have found a solution, but unfortunately I cannot explain why the solution 
on the Admin pages is not working. The following entries in httpd.conf solved 
the problem in our environment. Maybe this is useful for other CAS users.

Best,
Sandra

RewriteEngine on
<Location />
  # Define the authentication method
  AuthType CAS
  AuthName Galaxy
  Require valid-user
</Location>
# Proxy Configurations
ProxyVia On
ProxyPassInterpolateEnv On
<Proxy *>
   Order allow,deny
   Allow from all
</Proxy>
ProxyPass / http://galaxy.crc.nd.edu:8080/
ProxyPassReverse / http://galaxy.crc.nd.edu:8080/
RequestHeader set REMOTE_USER %{REMOTE_USER}s
SSLProxyEngine On
AllowCONNECT 8080

RewriteRule ^(.*) http://galaxy.crc.nd.edu:8080$1 [P]


From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] 
On Behalf Of Sandra Gesing [sandra.ges...@nd.edu]
Sent: Tuesday, November 05, 2013 5:46 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] Installing Galaxy behind an Apache proxy using 
mod_auth_cas for user auth

Dear all,

I would like to set up a local Galaxy instance behind an Apache server with our 
local CAS for authentication.

It would be great if you could give me a hint for the httpd.conf. I have the 
problem that after authenticating against CAS in the browser, I get the 
following error message, and REMOTE_USER doesn't seem to be in the HTTP header 
for Galaxy (I can see REMOTE_USER in the access_log of Apache but not any more 
in the paster.log of Galaxy):

"Access to Galaxy is denied. Galaxy is configured to authenticate users via an 
external method (such as HTTP authentication in Apache), but a username was not 
provided by the upstream (proxy) server. This is generally due to a 
misconfiguration in the upstream server."

I know that the same question was already asked in the following post but I 
haven't seen an option to extend the post and I haven't found an answer.
http://dev.list.galaxyproject.org/Installing-Galaxy-behind-an-Apache-proxy-using-mod-auth-cas-for-user-auth-tt4660837.html#none

Any help is much appreciated.

Many thanks,
Sandra



Re: [galaxy-dev] datacache bowtie2 for mm9 ?

2013-11-07 Thread Jennifer Jackson

Hi Curtis,
This is still open, but I expect to correct this very soon, along with 
new data additions (corrections come first on the list!). We definitely 
consider this important and apologize that this has impacted your 
paper's supplemental creation.


There is a Trello ticket containing the known data problems since the 
migration of usegalaxy.org; you can follow it for updates here: 
https://trello.com/c/SbizUDQt


Thanks!

Jen
Galaxy team

On 10/29/13 8:18 AM, Curtis Hendrickson (Campus) wrote:


Jennifer,

What's the status of bowtie2/mm9 index on PSU main?

When I select tophat2, it offers me mm9 as a choice for built-in indexes.

However, when the job runs, I get the following error, indicating the 
bowtie2/mm9 indexes are missing (below).


Any insight into whether this is expected, or what the ETA is until 
the index would be installed, would be great.


I'm trying to reproduce work on PSU I ran on my local galaxy, so that 
we can link to it for supplemental materials for a paper.


Thanks,

Curtis

PS -- I clicked the submit bug button a few days ago, but haven't 
received a response yet.


Fatal error: Tool execution failed

[2013-10-29 10:13:27] Beginning TopHat run (v2.0.9)

---

[2013-10-29 10:13:27] Checking for Bowtie

   Bowtie version: 2.1.0.0

[2013-10-29 10:13:27] Checking for Samtools

 Samtools version: 0.1.18.0

[2013-10-29 10:13:27] Checking for Bowtie index files (genome)..

Error: Could not find Bowtie 2 index files 
(/galaxy/data/mm9/mm9full/bowtie2_index/mm9full.*.bt2)


From: Jennifer Jackson [mailto:j...@bx.psu.edu]
Sent: Friday, September 20, 2013 4:00 PM
To: Curtis Hendrickson (Campus)
Subject: Re: [galaxy-dev] datacache bowtie2 for mm9 ?

Thanks Curtis,

I am actually working on trying to get mm9 out there right now. No 
promises, but it is just one (well, three, including variants)! If the 
technical side is a go, then I will do it. Ideally others soonish. We'll see.


The last news brief has help for the Data Manager; it may be that you 
need to do some config changes to get it going. I am certainly no 
expert - this is Dan's and under active development - but it is where I 
would start.


Jen

On 9/20/13 1:25 PM, Curtis Hendrickson (Campus) wrote:

Thanks for the rapid reply! I have some questions and comments,
but need to read up on Data Managers (that admin page seems
non-functional in our local galaxy, despite being on latest code)
first.

Regards,

Curtis

From: Jennifer Jackson [mailto:j...@bx.psu.edu]
Sent: Friday, September 20, 2013 2:34 PM
To: Curtis Hendrickson (Campus)
Cc: galaxy-...@bx.psu.edu
Subject: Re: [galaxy-dev] datacache bowtie2 for mm9 ?

Hello Curtis,

The datacache was originally pointed to the data staging area and
is now pointed to the data published area. The difference is that
the published area contains data and location (.loc) files that
are in synch and have completed final testing. It is your choice
about whether to use the staged-only data - it depends how risk
tolerant your project is and if you plan on testing. But, that
said, I think it is almost certainly fine or our team wouldn't
have staged it yet. A vanishingly small number of datasets are
pulled back once they make it to staging, and this is why we were
comfortable pointing datacache there in the first place (we were
unable to point to the published area at first, but wanted to make
the data available ASAP).

Going forward - I can let you know that these indexes are very
easy to create: one command-line execution, then add one line to
the associated .loc file. Instructions are here, see Bowtie and
Tophat:
http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup

For one or few genomes, not a problem. For hundreds of genomes
with variants, it can become tedious even with helper tools and, in
our case, the processing interacted with disk that was undergoing
changes (as we have been working on system configuration most of
the summer). Also, now that the Data Manager is available,
creating batch indexes for use via rsync becomes a lower priority.
Even so, I would expect more indexes to be fully published once
the final configuration is in place, as many are already staged or
close to being staged (watch the yellow banner on Main).

Hopefully this helps to explain the data, guides you to making an
informed decision, and aids with creating your own indexes as needed,

Thanks!
Jen
Galaxy team



On 9/18/13 1:04 PM, Curtis Hendrickson (Campus) wrote:

Folks,

First, I wanted to thank you for making the datacache
available
(http://wiki.galaxyproject.org/Admin/Data%20Integration;
rsync://datacache.g2.bx.psu.edu). It's a great resource.

However, what is the best way to stay abreast of 

Re: [galaxy-dev] Bowtie2 mm9 index

2013-11-07 Thread Jennifer Jackson

Hello Mary,

The underlying data is not complete for all mm9 reference genomes for 
the Bowtie2/Tophat2 tools, causing the error. It is a known issue and we 
expect to have it corrected very soon now. We consider this important 
and high priority; our apologies for the confusion it caused.


The known data issues since the migration of usegalaxy.org are 
listed in this ticket. You can follow the ticket to find out when this 
genome has been restored: https://trello.com/c/SbizUDQt


Best,

Jen
Galaxy team

On 10/29/13 7:43 AM, Davis, Mary wrote:


Greetings---

I tried to run an alignment using Bowtie2 and got this message-

format: bam, database: mm9

Could not locate a Bowtie index corresponding to basename 
/galaxy/data/mm9/mm9canon/bowtie2_index/mm9canon Error: Encountered 
internal Bowtie 2 exception (#1) Command: 
/galaxy/software/linux2.6-x86_64/pkg/bowtie2-2.1.0/bin/bowtie2-align 
--wrapper basic


I imported Illumina fastq data, groomed them, and then did the 
analysis using the built-in index mouse, both full and male, and had 
the same error message.


I'm relatively new to this, and don't see what I missed.

Thanks

Mary E. Davis, Ph.D.

Professor

Department of Physiology & Pharmacology

West Virginia University Health Sciences Center

PO Box 9229

Morgantown, WV  26506-9229





--
Jennifer Hillman-Jackson
http://galaxyproject.org


Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas

2013-11-07 Thread Björn Grüning


 The potential problem I see here is that environment variables are not
 name spaced in any way, so if all env.sh files are sourced no matter
 what, there is the potential for a certain environment variable to get
 set to a certain dependency version, and then later during the
 installation (assuming a hierarchy of repository dependencies), the
 same environment variable gets set to a different version of the same
 dependency.  I'm not sure how often (if ever) this couls occur, but if
 it did, it the installation would not be as expected.


Mh possible, but really a rare corner case, I think. We could offer a
'do-not-source-automatically' tag, to prevent such corner cases.

<repository name="package_expat_2_1" owner="iuc" sourcing="manual|auto" />

  I see, this makes perfect sense to me now, thanks! I certainly agree
  that it should have to be spelled out twice unless there is a good
  reason. I guess my preference would be to just see it inside of the
  setup_perl_environment tag - why should it need to be at the
  top-level
  as well? There could be many implementation details that make this
  difficult though, so obviously I delegate to Greg/Dave on this.

 
 Just so I'm clear on this, is this what you want implemented as an
 enhancement to the setup_* tag sets?
 
 
  <action type="setup_perl_environment"> OR <action
  type="setup_r_environment"> OR <action type="setup_ruby_environment">
  OR <action type="setup_virtualenv">
     <repository changeset_revision="978287122b91"
  name="package_perl_5_18" owner="iuc"
  toolshed="http://testtoolshed.g2.bx.psu.edu">
      <package name="perl" version="5.18.1" />
     </repository>
     <repository changeset_revision="8fc96166cddd"
  name="package_expat_2_1" owner="iuc"
  toolshed="http://testtoolshed.g2.bx.psu.edu">
      <package name="expat" version="2.1" />
     </repository>
  </action>
 
 For all repository tag sets contained within these setup_* tags, the
 repository's env.sh would be pulled in for the setup of the specified
 environment without requiring a set_environment_for_install action
 type.

Yes, that would solve John's and my use case.

Thanks Greg!
Bjoern


  

Also that did not solve the second use case. If have two
packages one
that is installing perl libraries and the second a binary that
is
checking or that needs these perl libs.
   
   We have discussed off list in another thread. Just to summarize my
   thoughts there - I think we should delay this or not make it a
   priority if there are marginally acceptable workarounds that can
   be
   found for the time being. Getting these four actions to work well
   as
   sort of terminal endpoints and allow specification as tersely as
   possible should be the primary goal for the time being. You will
   see
   Perl or Python packages depend on C libraries 10 times more
   frequently
   than you will find makefiles and C programs depend on complex perl
   or
   python environments (correct me if I am wrong). Given that there
   is
   already years worth of tool shed development outlined in existing
   Trello cards - this is just how I would prioritize things (happy
   to be
   overruled).
   
   Ok point taken. Lets focus on real issue. That use case is just a
   simplification / more structured way to write tool depdendecies,
   its not strictly needed to get my packages done.
   John just to make that use case clearer:
   - You have a package (A) with dependency (B)
   - B is not worth to put it in a extra repository (extra
   tool_dependencies.xml file)
   
   Currently, you are forced to define both in one package tag,
   because if
   you define it in two package tags A will not see B.
   The perl and python was a bad example you have that problem with
   every
   dependency that are not worth to put it in a separate repository.
   
   
   To summarize:
   I'm fine with that approach. It will address my current use case
   and it
   would be great to have it as proposed by Dave!
   
   Thanks a lot!
   Bjoern
   
   

 If so, can you confirm that this should be done for all four
 currently
 supported setup_* action types?
   
   I think it would be best to tackle setup_r_environment and
   setup_ruby_environment first. setup_virtualenv cannot have nested
   elements at this time - it is just assumed to be a bunch of text
   (either a file containing the dependencies or a list of the
   dependencies).
   
   So setup_r_environment and setup_ruby_environment have the same
   structure:
   
    <setup_ruby_environment>
      <repository .. />
      <package .. />
      <package .. />
    </setup_ruby_environment>
    
    ... but setup_virtualenv is just
    
    <setup_virtualenv>requests=1.20
    pycurl==1.3</setup_virtualenv>
   
   I have created a Trello card for this:
   https://trello.com/c/NsLJv9la
   (and some other related stuff).
   
   Once that is tackled though, it will make sense to allow
   setup_virtualenv to utilize the same functionality.
   
   Thanks all,
   -John
   

I think it will solve my current issues.