Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-11-08 Thread Bjoern Gruening
Hi,

to chime in on this discussion:

I found some inconsistencies during my rsync endeavor and I'm curious
whether there is any way to contribute to that service.

--
xenTro3 xenTro3 Frog (Xenopus tropicalis): xenTro3 /galaxy/data/xenTro3/seq/xenTro3.fa
but only /xenTro3.fa.gz exists.
---
ce6 /data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
---
ce6 has no .fa file under seq/, but allfasta.loc still references it:
ce6   Caenorhabditis elegans: ce6   /galaxy/data/ce6/seq/ce6.fa
---
TAIR9 and TAIR10 are not available via rsync
---
Bowtie2 indices are missing for ce6 and xenTro3


Thanks,
Bjoern
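
For anyone reproducing these checks, a minimal sketch of pulling one of the
builds above from the public rsync service. The endpoint shown
(rsync://datacache.g2.bx.psu.edu/indexes) is an assumption from memory of the
documentation of the time, not something stated in this thread:

    # List what the server has for a build, then mirror its seq/ directory locally.
    rsync --list-only rsync://datacache.g2.bx.psu.edu/indexes/xenTro3/
    rsync -avzP rsync://datacache.g2.bx.psu.edu/indexes/xenTro3/seq/ ./xenTro3/seq/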


 Hi Jennifer,
 
 Today I was trying to pull some bowtie2 indices from the Galaxy rsync server for 
 PhiX to run some tests and only got the ones for bowtie1… I'm wondering 
 what the status is with regard to this past thread and what we can do to help 
 here.
 
 Cheers!
 Roman
 
 On 7 Mar 2013, at 20:01, Jennifer Jackson j...@bx.psu.edu wrote:
 
  Hi Brad (and Roman),
  
  The team has talked about this in detail. There are a few wrinkles with 
  just pulling in indexes - Dan is doing some work that could change this 
  later on, but for now, the rsync will continue to point to the same 
  location as Main's genome data source. This means that there are some 
  limits on what we can do immediately. Setting up a submission pipe is one 
  of them - there just aren't the resources to do this right now, or a common 
  place distinct from Main to house the data. A few other ideas came up - we can 
  chat later, each had side issues.
  
  But I saw your tweet and think that it is great that you are pulling 
  CloudBioLinux data from the rsync now, so let's get as much data in common 
  as possible, so you have data to work with near term.
  
  I am in the process of adding bt2 indexes - some are published to 
  Main/rsync server already and some are not, but more will show up over the 
  next week or so (along with more genomes and other indexes). I'll take a 
  look at what you have and pull/match what I can. Genome sort order and 
  variants are my concerns, both require special handling in processing and 
  .locs. If it takes longer to check, I am just going to create them here if I 
  haven't already. The GATK-sorted hg19 canonical is already on my list - it 
  needed all indexes, not just bw2. When the next distribution goes out, I'll 
  list what is new on the rsync in the News Brief.
  
  For the Novoalign indexes, I'm not quite sure what to do about those yet. 
  Or for any indexes associated with tools or genomes not hosted on Main. Do 
  you want to open a card for those and any other cases that are similar? We 
  can discuss a strategy from there, maybe at IUC, if Greg/Dan thinks it is 
  appropriate. Please add me so I can follow.
  
  I'll be in touch as I go through the data. Thanks for your patience on this!
  
  Jen
  Galaxy team
  
  On 2/21/13 12:43 PM, Brad Chapman wrote:
  Hi all;
  Is there a way for community members to contribute indexes to the rsync
  server? This resource is awesome and I'm working on migrating the
  CloudBioLinux retrieval scripts to use this instead of the custom S3
  buckets we'd set up previously:
  
  https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py
  
  It's great to have this as a public shared resource and I'd like to be
  able to contribute back. From an initial pass, here are the things I'd
  like to do:
  
  - Include bowtie2 indexes for more genomes.
  
  - Include novoalign indexes for a number of commonly used genomes.
  
  - Clean up hg19 to include a full canonically sorted hg19, with indexes.
Broad has a nice version prepped so GATK will be happy with it, and
you need to stick with this ordering if you're ever going to use a
GATK tool on it. Right now there is a partial hg19canon (without the
random/haplotype chromosomes) and the structure is a bit complex.
  
  What's the best way to contribute these? Right now I have a lot of the
  indexes on S3. For instance, the hg19 indexes are here:
  
  https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
  https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
  https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
  https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
  https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
  https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
  
  I'm happy to format these differently or upload somewhere that would
  make it easy to include. Thanks again for setting this up, I'm looking
  forward to working off a shared repository of data,
  Brad
  ___
  Please keep all replies on the list by using reply all
  in your mail client.  To manage your subscriptions to this
  and other Galaxy lists, please use the interface at:
  
http://lists.bx.psu.edu/
  
  -- 
  Jennifer Hillman-Jackson
  Galaxy Support and Training
  http://galaxyproject.org
  
  

Re: [galaxy-dev] Fw: No peek issue and datasets wrongly reported as Empty

2013-11-08 Thread Jean-Francois Payotte
Thanks Nate,

I will try manually editing the lib/galaxy/jobs/runners/__init__.py file 
for now, as we are not ready to update to the latest distribution yet.
I should try the new distribution's fix in a couple weeks.

Many thanks for solving this issue!
I will let you know if it doesn't solve our issue when updating to the 
November distribution. :)

Thanks,
Jean-François




From:   Nate Coraor n...@bx.psu.edu
To: Jean-Francois Payotte jean-francois.payo...@dnalandmarks.ca
Cc: galaxy-dev@lists.bx.psu.edu
Date:   08/11/2013 09:40 AM
Subject:Re: [galaxy-dev] Fw: No peek issue and datasets wrongly 
reported as Empty



On Nov 7, 2013, at 2:45 PM, Jean-Francois Payotte wrote:

 Dear Galaxy developers, 
 
 I know I am not the only one with this issue, as over time I've stumbled 
on a few mailing-list threads with other users having the same problem. 
 And I know the recommended solution is to use the -noac mount option. 
(http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#Unified_Method)
 
 However, it is said that using this -noac mount option comes with a 
performance trade-off, so when we first ran into this issue (datasets 
showing Empty and No peek, even though the file on the hard drive is 
full of content), we implemented the hack found in this thread: 
http://dev.list.galaxyproject.org/What-s-causing-this-error-td4141958.html#a4141963
 

 
 In this thread, John suggested adding a sleep() in the finish_job 
method of the galaxy_dist/lib/galaxy/jobs/runners/drmaa.py file. 
 It worked very well for us. Adding a sleep(30) made all jobs wait 
30 seconds before finishing, but at least the No peek issue 
disappeared. 
 
 However, since the latest Galaxy updates, this file (drmaa.py) has been 
dramatically changed and the finish_job method doesn't exist anymore. 
 Hence, I had to remove this hack, hoping that this issue would have 
disappeared as well.  Unfortunately, this No peek issue is still there 
and causing many headaches for some of our workflow users. 
 
 My question is then: Can I put this sleep(30) in some other place 
(method and/or file) in order to achieve the same result? 
 I would really like to solve this No peek issue without resorting to 
the -noac mount option.  Actually, I am not even sure our system 
administrator would allow it. 

Hi Jean-François,

The job runners have been largely refactored into 
lib/galaxy/jobs/runners/__init__.py, which is where you'll find 
finish_job().  However, we also recently added some tricks to work around 
this issue that has solved the problem (for usegalaxy.org, at least) 
without needing -noac.  This is available in Monday's distribution 
release.  Here's the commit:

  
https://bitbucket.org/galaxy/galaxy-central/commits/384240b8cd29963f302a0349476cf83734cfb5df?at=default


To use, set retry_job_output_collection > 0 in the Galaxy config.

--nate
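
For context on what the fix above does (a standalone sketch, not Galaxy's
actual code): the job outputs live on NFS, and with attribute caching a freshly
written file can still look empty from the Galaxy server for a short while.
Instead of sleeping unconditionally, the change appears to retry output
collection; the idea, reduced to a few lines of Python:

    import os
    import time

    def wait_for_nonempty(path, retries=10, delay=3.0):
        """Poll until `path` exists and is non-empty, riding out stale NFS
        attribute caches. Returns False if the retries are exhausted."""
        for _ in range(retries):
            try:
                if os.path.getsize(path) > 0:
                    return True
            except OSError:
                pass  # file not visible to this client yet
            time.sleep(delay)
        return False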

 
 Thanks again for your help! 
 Jean-François 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/
 
 To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Fw: No peek issue and datasets wrongly reported as Empty

2013-11-08 Thread Nate Coraor
On Nov 7, 2013, at 2:45 PM, Jean-Francois Payotte wrote:

 Dear Galaxy developers, 
 
 I know I am not the only one with this issue, as over time I've stumbled on a 
 few mailing-list threads with other users having the same problem. 
 And I know the recommended solution is to use the -noac mount option. 
 (http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#Unified_Method)
  
 
 However, it is said that using this -noac mount option comes with a 
 performance trade-off, so when we first ran into this issue (datasets showing 
 Empty and No peek, even though the file on the hard drive is full of 
 content), we implemented the hack found in this thread: 
 http://dev.list.galaxyproject.org/What-s-causing-this-error-td4141958.html#a4141963
  
 
 In this thread, John suggested adding a sleep() in the finish_job method 
 of the galaxy_dist/lib/galaxy/jobs/runners/drmaa.py file. 
 It worked very well for us. Adding a sleep(30) made all jobs wait 30 
 seconds before finishing, but at least the No peek issue disappeared. 
 
 However, since the latest Galaxy updates, this file (drmaa.py) has been 
 dramatically changed and the finish_job method doesn't exist anymore. 
 Hence, I had to remove this hack, hoping that this issue would have 
 disappeared as well.  Unfortunately, this No peek issue is still there and 
 causing many headaches for some of our workflow users. 
 
 My question is then: Can I put this sleep(30) in some other place (method 
 and/or file) in order to achieve the same result? 
 I would really like to solve this No peek issue without resorting to the 
 -noac mount option.  Actually, I am not even sure our system administrator 
 would allow it. 

Hi Jean-François,

The job runners have been largely refactored into 
lib/galaxy/jobs/runners/__init__.py, which is where you'll find finish_job().  
However, we also recently added some tricks to work around this issue that have 
solved the problem (for usegalaxy.org, at least) without needing -noac.  This 
is available in Monday's distribution release.  Here's the commit:

  
https://bitbucket.org/galaxy/galaxy-central/commits/384240b8cd29963f302a0349476cf83734cfb5df?at=default

To use, set retry_job_output_collection > 0 in the Galaxy config.
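
A minimal sketch of what that might look like in the Galaxy config file
(assuming the universe_wsgi.ini layout of that era; the value is illustrative
and, as I read the commit, is the number of retries):

    [app:main]
    # Retry collecting job outputs a few times before giving up on them,
    # to ride out stale NFS attribute caches.
    retry_job_output_collection = 5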

--nate

 
 Thanks again for your help! 
 Jean-François 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/
 
 To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


[galaxy-dev] SLURM and hidden success

2013-11-08 Thread Andrew Warren
Hello all,

We are in the process of switching from SGE to SLURM for our Galaxy setup.
We are currently experiencing a problem where jobs that are completely
successful (no text in their stderr file and the proper exit code) are
being hidden after the job completes. Any job that fails or has some text
in the stderr file is not hidden (note: hidden not deleted; they can be
viewed by selecting 'Unhide Hidden Datasets').

Our drmaa.py is at changeset 10961:432999eabbaa
Our drmaa egg is at drmaa = 0.6
And our SLURM version is 2.3.5

And we are currently passing no parameters for default_cluster_job_runner =
drmaa:///

We have the same code base on both clusters but only observe this behavior
when using SLURM.
Any pointers or advice would be greatly appreciated.

Thanks,
Andrew
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Empty bowtie2 output

2013-11-08 Thread IIHG Galaxy Administrator
Sending to galaxy-dev instead.

From: Srinivas Maddhi iihg-galaxy-ad...@uiowa.edu
Date: Friday, November 1, 2013 11:56 AM
To: galaxy-u...@lists.bx.psu.edu
Subject: Empty bowtie2 output

In follow-up to 
http://user.list.galaxyproject.org/Empty-bowtie2-output-tp4656137.html, is 
there:
- an ETA on when the issue with Bowtie2 in the August 2013 distribution 
generating empty output will be fixed (if not already fixed)?
- a suggested workaround (reverting to an older version of that particular tool, 
etc.) in the meantime?

Thank you.

Unrelated: wasn't able to determine how to update that thread to request 
status, hence creating a new one.

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Managing Data Locality

2013-11-08 Thread Paniagua, Eric
Hi John,

I was just wondering, did you have an object store based suggestion as well?  
Logically, this seems to be where this operation should be done, but I don't 
see much infrastructure to support this, such as logic for moving a data object 
between object stores.  (Incidentally, the release of Galaxy I'm running is 
from last April or May.  Would an upgrade to the latest and greatest version 
pull in more support infrastructure for this?)

Regarding your LWR suggestion, admittedly I have not yet read the docs you 
referred me to, but I thought a second email was warranted anyway.  We would in 
fact be using DRMAA to talk to the HPCC (this is being configured as I write), 
and Galaxy's long-term storage lives on our own independent Galaxy server.  As 
I may have commented before, we can't simply mount our Galaxy file systems to 
the HPCC for security reasons.  To make the scenario even more concrete, we are 
currently using the DistributedObjectStore to balance Galaxy's storage 
requirements across three mounted volumes.  I don't expect this to complicate 
the task at hand, but please do let me know if you think it will.  We also 
currently have UGE set up on our Galaxy server, so we've already been using 
DRMAA to submit jobs.  The details for submission to another host are more 
complicated, though.

Does your LWR suggestion involve the use of scripts/drmaa_external_killer.py, 
scripts/drmaa_external_runner.py, and scripts/external_chown_script.py?  
(Particularly if so,) would you be so kind as to point me toward documentation 
for those scripts?  It's not clear to me from their source how they are 
intended to be used or at what stage of the job creation process they would be 
called by Galaxy.  The same applies also to the file_actions.json file you 
referred to previously.  Is that a Galaxy file or an LWR file?  Where may I 
find some documentation on the available configuration attributes, options, 
values, and semantics?  Does your LWR suggestion require that the same absolute 
path structure exist (not much information is conveyed by the action name 
"copy")?  Does it require a certain relative path structure to match on both 
file systems?  How does setting that option lead to Galaxy setting the correct 
paths (local to the HPCC) when building the command line?

Our goal is to submit all heavy jobs (e.g. mappers) to the HPCC as the user who 
launches the Galaxy job.  Both the HPCC and our Galaxy instance use LDAP 
logins, so fortunately that's one wrinkle we don't have to worry about.  This 
will help all involved maintain fair quota policies on a per-user basis.  I 
plan to handle the support files (genome indices) by transferring them to the 
HPCC and rewriting the appropriate *.loc files on our Galaxy host with HPCC 
paths.

I appreciate your generous response to my first email, and hope to continue the 
conversation with this email.  Now, I will go RTFM for LWR. :)

Many thanks,
Eric


From: jmchil...@gmail.com [jmchil...@gmail.com] on behalf of John Chilton 
[chil...@msi.umn.edu]
Sent: Tuesday, November 05, 2013 11:58 AM
To: Paniagua, Eric
Cc: Galaxy Dev [galaxy-...@bx.psu.edu]
Subject: Re: [galaxy-dev] Managing Data Locality

Hey Eric,

I think what you are proposing would be a major development effort and
mirrors major development efforts ongoing. There are sort-of ways to
do this already, with various trade-offs, and none particularly well
documented. So before undertaking this effort I would dig into some
alternatives.

If you are using PBS, the PBS runner contains some logic for
delegating to PBS for doing this kind of thing - I have never tried
it.

https://bitbucket.org/galaxy/galaxy-central/src/default/lib/galaxy/jobs/runners/pbs.py#cl-245

It may be possible to use a specially configured handler and the
Galaxy object store to stage files to a particular mount before
running jobs - not sure it makes sense in this case. It might be worth
looking into this (having the object store stage your files, instead
of solving it at the job runner level).

My recommendation however would be to investigate the LWR job runner.
There are a bunch of fairly recent developments to enable something
like what you are describing. For specificity, let's say you are using
DRMAA to talk to some HPC cluster and Galaxy's file data is stored in
/galaxy/data on the Galaxy web server but not on the HPC and there is
some scratch space (/scratch) that is mounted on both the Galaxy web
server and your HPC cluster.



I would stand up an LWR (http://lwr.readthedocs.org/en/latest/) server
right beside Galaxy on your web server. The LWR has a concept of
managers that sort of mirrors the concept of runners in Galaxy - see
the sample config for guidance on how to get it to talk with your
cluster. It could use DRMAA, torque command-line tools, or condor at
this time (I could add new methods e.g. PBS library if that would
help). 

Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-11-08 Thread Jennifer Jackson

Thanks,

There have been no public data updates since the migration started (late 
last spring we froze the data). But there are some known issues and data 
that is ready to be released, in the process of becoming ready, etc. We 
expect to be able to start working on this again in the very near term.


To start this off, the mm9 bowtie2 indexes were restored this morning:
https://trello.com/c/SbizUDQt

And these other finds are great, thanks Bjoern. I will add them to the 
public card. A few errors that were corrected later last spring popped 
out again, but will also be fixed. Small, but will be addressed ASAP.


Adding in any missing .2bit files in general is on our internal to-do 
list. Older genomes have other inconsistencies that will be addressed. 
The goal is to have the data filled in with complete indexes around the 
end of Nov, then filled in with newly released genomes/more variants for 
important model organisms by the end of the year. All of this depends on 
various factors, but this is what we are shooting for.


Thanks and if anything else off is noted, please feel free to send to 
the list or add to the card. All input is welcome - we want this to be a 
great resource for everyone - time to get back to making that happen now 
that the migration is wrapping up!


Jen
Galaxy team

On 11/8/13 2:00 AM, Bjoern Gruening wrote:

Hi,

to chime into this discussion.

I found some inconsistency during my rsync endeavor and I'm curious if 
there is any way to contribute to that service.


--
xenTro3 xenTro3 Frog (Xenopus tropicalis): xenTro3 
/galaxy/data/xenTro3/seq/xenTro3.fa

but only
/xenTro3.fa.gz exists.
---
ce6 /data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
---
ce6 has no .fa file under seq/ but in allfasta.loc there is a 
reference to it ce6 Caenorhabditis elegans: ce6 
/galaxy/data/ce6/seq/ce6.fa

---
TAIR9 and TAIR10 are not available via rsync
---
Bowtie2 indices are missing for ce6 and xenTro3


Thanks,
Bjoern


Hi Jennifer,

Today I was trying to pull some bowtie2 indices from Galaxy rsync server for 
PhiX to run some tests and just got the ones for bowtie1… I'm wondering what's 
the state in regards to this past thread and what we can do to help in here.

Cheers!
Roman

On 7 Mar 2013, at 20:01, Jennifer Jackson j...@bx.psu.edu wrote:

 Hi Brad (and Roman),

 The team has talked about this in detail. There are a few wrinkles with just 
pulling in indexes - Dan is doing some work that could change this later on, but 
for now, the rsync will continue to point to the same location as Main's genome 
data source. This means that there are some limits on what we can do immediately. 
Setting up a submission pipe is one of them - there just isn't resource to do this 
right now or a common place distinct from Main to house the data. A few other 
ideas came up - we can chat later, each had side issues.

 But I saw your tweet and think that it is great that you are pulling 
CloudBioLinux data from the rsync now, so let's get as much data in common as 
possible, so you have data to work with near term.

 I am in the process of adding bt2 indexes - some are published to Main/rsync 
server already and some are not, but more will show up over the next week or so 
(along with more genomes and other indexes). I'll take a look at what you have and 
pull/match what I can. Genome sort order and variants are my concerns, both 
require special handling in processing and .locs. If it takes longer to check, I 
am just going to create here if I haven't already. The GATK-sort hg19 canonical is 
already on my list - it needed all indexes, not just bw2. When the next 
distribution goes out, I'll list what is new on the rsync in the News Brief.

 For the Novoalign indexes, I'm not quite sure what to do about those yet. Or 
for any indexes associated with tools or genomes not hosted on Main. Do you want 
to open a card for those and any other cases that are similar? We can discuss a 
strategy from there, maybe at IUC, if Greg/Dan thinks it is appropriate. Please 
add me so I can follow.

 I'll be in touch as I go through the data. Thanks for your patience on this!

 Jen
 Galaxy team

 On 2/21/13 12:43 PM, Brad Chapman wrote:
 Hi all;
 Is there a way for community members to contribute indexes to the rsync
 server? This resource is awesome and I'm working on migrating the
 CloudBioLinux retrieval scripts to use this instead of the custom S3
 buckets we'd set up previously:

https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py

 It's great to have this as a public shared resource and I'd like to be
 able to contribute back. From an initial pass, here are the things I'd
 like to do:

 - Include bowtie2 indexes for more genomes.

 - Include novoalign indexes for a number of commonly used genomes.

 - Clean up hg19 to include a full canonically sorted hg19, with indexes.
   Broad has a nice version prepped so GATK will be happy with it, and
   you need to stick 

Re: [galaxy-dev] Empty bowtie2 output

2013-11-08 Thread Jennifer Jackson

Hello,

The mm9 bt2 indexes were restored this morning. You can track this and 
other current data fixes through this Trello card:

https://trello.com/c/SbizUDQt

Thanks for your patience during the migration. We are moving on to data 
now, both corrections and updates.


Jen
Galaxy team

On 11/7/13 3:58 PM, IIHG Galaxy Administrator wrote:

Sending to galaxy-dev instead.

From: Srinivas Maddhi iihg-galaxy-ad...@uiowa.edu

Date: Friday, November 1, 2013 11:56 AM
To: galaxy-u...@lists.bx.psu.edu

Subject: Empty bowtie2 output

In follow-up to 
http://user.list.galaxyproject.org/Empty-bowtie2-output-tp4656137.html, is 
there:
- an ETA on when the issue with Bowtie2, in August 2013 distribution, 
generating empty output will be fixed (if not already fixed) ?
- a suggested workaround (revert to an older version of that 
particular tool etc.) in the meantime ?


Thank you.

Unrelated: wasn't able to determine how to update that thread to 
request status, hence creating a new one.




___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
   http://galaxyproject.org/search/mailinglists/


--
Jennifer Hillman-Jackson
http://galaxyproject.org

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Contributing to genome indexes on rsync server

2013-11-08 Thread Björn Grüning
Hi Jen,

fantastic news!

Thanks a lot!
Bjoern

 Thanks,
 
 There have been no public data updates since the migration started
 (late last spring we froze the data). But there are some known issues
 and data that is ready to be released, in the process of becoming
 ready, etc. We expect to be able to start working on this again in the
 very near term.
 
 To start this off, the mm9 bowtie2 indexes were restored this morning:
 https://trello.com/c/SbizUDQt
 
 And these other finds are great, thanks Bjoern. I will add them to the
 public card. A few errors that were corrected later last spring popped
 out again, but will also be fixed. Small, but will be addressed ASAP.
 
 Adding in any missing .2bit files in general are on our internal to-do
 list. Older genomes have other inconsistencies that will be addressed.
 The goal is to have the data filled in with complete indexes around
 the end of Nov, then filled in with newly released genomes/more
 variants for important model organisms by the end of the year. All of
 this depends on various factors, but this is where we are shooting
 for.
 
 Thanks and if anything else off is noted, please feel free to send to
 the list or add to the card. All input is welcome - we want this to be
 a great resource for everyone - time to get back to making that happen
 now that the migration is wrapping up!
 
 Jen
 Galaxy team
 
 On 11/8/13 2:00 AM, Bjoern Gruening wrote:
 
  Hi,
  
  to chime into this discussion.
  
  I found some inconsistency during my rsync endeavor and I'm curious
  if there is any way to contribute to that service.
  
  --
  xenTro3 xenTro3 Frog (Xenopus tropicalis):
  xenTro3 /galaxy/data/xenTro3/seq/xenTro3.fa
  but only
  /xenTro3.fa.gz exists.
  ---
  ce6 /data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
  ---
  ce6 has no .fa file under seq/ but in allfasta.loc there is a
  reference to it ce6 Caenorhabditis elegans:
  ce6 /galaxy/data/ce6/seq/ce6.fa
  ---
  TAIR9 and TAIR10 are not available via rsync
  ---
  Bowtie2 indices are missing for ce6 and xenTro3
  
  
  Thanks,
  Bjoern
  
   Hi Jennifer,
   
   Today I was trying to pull some bowtie2 indices from Galaxy rsync server 
   for PhiX to run some tests and just got the ones for bowtie1… I'm 
   wondering what's the state in regards to this past thread and what we can 
   do to help in here.
   
   Cheers!
   Roman
   
   On 7 Mar 2013, at 20:01, Jennifer Jackson j...@bx.psu.edu wrote:
   
Hi Brad (and Roman),

The team has talked about this in detail. There are a few wrinkles with 
just pulling in indexes - Dan is doing some work that could change this 
later on, but for now, the rsync will continue to point to the same 
location as Main's genome data source. This means that there are some 
limits on what we can do immediately. Setting up a submission pipe is 
one of them - there just isn't resource to do this right now or a 
common place distinct from Main to house the data. A few other ideas 
came up - we can chat later, each had side issues.

But I saw your tweet and think that it is great that you are pulling 
CloudBioLinux data from the rsync now, so let's get as much data in 
common as possible, so you have data to work with near term.

I am in the process of adding bt2 indexes - some are published to 
Main/rsync server already and some are not, but more will show up over 
the next week or so (along with more genomes and other indexes). I'll 
take a look at what you have and pull/match what I can. Genome sort 
order and variants are my concerns, both require special handling in 
processing and .locs. If it takes longer to check, I am just going to 
create here if I haven't already. The GATK-sort hg19 canonical is 
already on my list - it needed all indexes, not just bw2. When the next 
distribution goes out, I'll list what is new on the rsync in the News 
Brief.

For the Novoalign indexes, I'm not quite sure what to do about those 
yet. Or for any indexes associated with tools or genomes not hosted on 
Main. Do you want to open a card for those and any other cases that are 
similar? We can discuss a strategy from there, maybe at IUC, if 
Greg/Dan thinks it is appropriate. Please add me so I can follow.

I'll be in touch as I go through the data. Thanks for your patience on 
this!

Jen
Galaxy team

On 2/21/13 12:43 PM, Brad Chapman wrote:
Hi all;
Is there a way for community members to contribute indexes to the rsync
server? This resource is awesome and I'm working on migrating the
CloudBioLinux retrieval scripts to use this instead of the custom S3
buckets we'd set up previously:

https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py

It's great to have this as a public shared resource and I'd like to be
able to contribute back. From an initial pass, here are the 

Re: [galaxy-dev] tool of installing tool shed repositories

2013-11-08 Thread Greg Von Kuster
Hello Ray and Björn,

The ability to import an exported repository archive is now available in the 
web UI in 11261:5c59f2c4f770.  I will enhance the Tool Shed API to accommodate 
this new feature next.  

This feature is available only in galaxy-central, so it will not be available 
on the main tool shed until the next Galaxy release.  In the meantime, it 
should be helpful in migrating repositories from local development tool sheds to 
the test tool shed.


If you happened to export a capsule before this changeset, it will not be valid, 
as there was an error in generating the capsule manifest that was corrected in 
this changeset.  You can just export the capsule again with this latest 
changeset.

Please let me know if you encounter any issues.

Thanks,

Greg Von Kuster


On Nov 4, 2013, at 10:54 AM, Greg Von Kuster g...@bx.psu.edu wrote:

 Hello Ray and Björn,
 
 I'm currently working on the feature for importing a repository capsule into 
 a Tool Shed.  It's been on my plate for a while, but other priorities have 
 side-tracked this work.  Based on your exchange, I'm now working to finish up 
 this feature, so it should be available in the next few days.
 
 Thanks,
 
 Greg Von Kuster
 
 
 On Nov 4, 2013, at 3:46 AM, Björn Grüning 
 bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
 
 Hi Ray,
 
 there is some work in that direction to easily import and export
 repositories. The export feature is already integrated and should help
 you. You will end up with a tarball with all the information about that
 repository and the import should be easier. If you want to work on the
 import part, I think your work is more than welcome!
 
 Cheers,
 Bjoern 
 
 Hi, there:
 
 I'm currently trying to migrate all repositories of the main tool shed at 
 http://toolshed.g2.bx.psu.edu/ to a local tool shed, but got some problems.
 
 I'm wondering whether there is an existing tool that can automatically do the 
 job?
 
 thanks
 
 rgds,
 Ray 
 
 
 __
 ngsf...@hygenomics.com
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/
 
 To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/
 
 
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/
 
 To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/
 


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Managing Data Locality

2013-11-08 Thread Paniagua, Eric
Hi John,

I have now read the top-level documentation for LWR, and gone through the 
sample configurations.  I would appreciate if you would answer a few technical 
questions for me.

1) How exactly is the staging_directory in server.ini.sample used?  Is that 
intended to be the (final) location at which to put files on the remote server? 
 How is the relative path structure under $GALAXY_ROOT/databases/files handled?

2) What exactly does persistence_directory in server.ini.sample mean?  
Where should it be located, how will it be used?

3) What exactly does file_cache_dir in server.ini.sample mean?

4) Does LWR preserve some relative path (e.g. to GALAXY_ROOT) under the above 
directories?

5) Are files renamed when cached?  If so, are they eventually restored to their 
original names?

6) Is it possible to customize the DRMAA and/or qsub requests made by LWR, for 
example to include additional settings such as Project or a memory limit?  Is 
it possible to customize this on a case by case basis, rather than globally?

7) Are there any options for the queued_drmaa manager in 
job_managers.ini.sample which are not listed in that file?

8) What exactly are the differences between the queued_drmaa manager and the 
queued_cli manager?  Are there any options for the latter which are not in 
the job_managers.ini.sample file?

9) When I attempt to run LWR (not having completed all the mentioned 
preparation steps, namely without setting DRMAA_LIBRARY_PATH), I get a Seg 
fault.  Is this because it can't find DRMAA or is it potentially unrelated?  In 
the latter case, here's the error being output to the console:

./run.sh: line 65: 26277 Segmentation fault  paster serve server.ini $@

Lastly, a simple comment, hopefully helpful.  It would be nice if the LWR 
install docs at least mentioned the dependency of PyOpenSSL 0.13 (or later) on 
OpenSSL 0.9.8f (or later), maybe even with a comment that pip will listen to 
the environment variables CFLAGS and LDFLAGS in the event one is creating a 
local installation of the OpenSSL library for LWR to use.
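
To make that last point concrete, the kind of invocation being described, with
placeholder paths standing in for wherever the local OpenSSL was installed:

    export CFLAGS="-I/opt/openssl/include"
    export LDFLAGS="-L/opt/openssl/lib"
    pip install "pyopenssl>=0.13"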

Thank you for your time and assistance.

Best,
Eric

From: jmchil...@gmail.com [jmchil...@gmail.com] on behalf of John Chilton 
[chil...@msi.umn.edu]
Sent: Tuesday, November 05, 2013 11:58 AM
To: Paniagua, Eric
Cc: Galaxy Dev [galaxy-...@bx.psu.edu]
Subject: Re: [galaxy-dev] Managing Data Locality

Hey Eric,

I think what you are proposing would be a major development effort and
mirrors major development efforts ongoing. There are sort-of ways to
do this already, with various trade-offs, and none particularly well
documented. So before undertaking this effort I would dig into some
alternatives.

If you are using PBS, the PBS runner contains some logic for
delegating to PBS for doing this kind of thing - I have never tried
it.

https://bitbucket.org/galaxy/galaxy-central/src/default/lib/galaxy/jobs/runners/pbs.py#cl-245

It may be possible to use a specially configured handler and the
Galaxy object store to stage files to a particular mount before
running jobs - not sure it makes sense in this case. It might be worth
looking into this (having the object store stage your files, instead
of solving it at the job runner level).

My recommendation however would be to investigate the LWR job runner.
There are a bunch of fairly recent developments to enable something
like what you are describing. For specificity, let's say you are using
DRMAA to talk to some HPC cluster and Galaxy's file data is stored in
/galaxy/data on the Galaxy web server but not on the HPC and there is
some scratch space (/scratch) that is mounted on both the Galaxy web
server and your HPC cluster.

I would stand up an LWR (http://lwr.readthedocs.org/en/latest/) server
right beside Galaxy on your web server. The LWR has a concept of
managers that sort of mirrors the concept of runners in Galaxy - see
the sample config for guidance on how to get it to talk with your
cluster. It could use DRMAA, torque command-line tools, or condor at
this time (I could add new methods e.g. PBS library if that would
help). 
https://bitbucket.org/jmchilton/lwr/src/default/job_managers.ini.sample?at=default

On the Galaxy side, I would then create a job_conf.xml file telling
certain HPC tools to be sent to the LWR. Be sure to enable the LWR
runner at the top (see advanced example config) and then add at least
one LWR destination.

<destinations>

    <destination id="lwr" runner="lwr">
      <param id="url">http://localhost:8913/</param>
      <!-- Leave Galaxy directory and data indices alone, assumes they
           are mounted in both places. -->
      <param id="default_file_action">none</param>
      <!-- Do stage everything in /galaxy/data though -->
      <param id="file_action_config">file_actions.json</param>
    </destination>

Then create a file_actions.json file in the Galaxy root directory
(structure of this file is subject to change, current json layout
doesn't feel very Galaxy-ish).

{"paths": [
    {"path": 

[galaxy-dev] Trouble getting proftpd to use Galaxy postgresql authentication

2013-11-08 Thread Hazard, E. Starr
Folks,

I am new to using proftpd with Galaxy on a local cluster and I am having lots of 
issues. Here is one: I cannot connect to the proftpd server to upload files.


./proftpd --version

ProFTPD Version 1.3.4d

( so proftpd is compiled and running)


I am attempting to connect from an iMac (OS X 10.9) to a Linux machine where the 
proftpd server is running.


ftp 123.45.678.123

Connected to 123.45.678.123.

220 ProFTPD 1.3.4d Server (Public Galaxy FTP by ProFTPD server installation) 
[128.23.191.200]

Name (123.45.678.123:hazards): hazards

331 Password required for hazards


I enter a password and I get "Abort trap 6" or "421 service not available, 
remote server closed connection" (the latter reply is from an older Mac).


I presume that this means that the authentication failed.


Here are the last lines of my proftpd.conf file

# Do not authenticate against real (system) users

AuthPAM off



# Set up mod_sql_password - Galaxy passwords are stored as hex-encoded SHA1

SQLPasswordEngine   on

SQLPasswordEncoding hex


# Set up mod_sql to authenticate against the Galaxy database

SQLEngine   on

SQLBackend  postgres

SQLConnectInfo  galax...@xxx.musc.edu dbuser dbpassword

SQLAuthTypesSHA1

SQLAuthenticate users



# An empty directory in case chroot fails

SQLDefaultHomedir   /shared/app/ProFFTPd-1.3.4d/default


# Define a custom query for lookup that returns a passwd-like entry.  UID and 
GID should match your Galaxy user.

SQLUserInfo custom:/LookupGalaxyUser

SQLNamedQuery   LookupGalaxyUser SELECT email,password,'581','582','/shared/app/Galaxy/galaxy_dist/database/files/%U','/bin/bash' FROM galaxy_user WHERE email='%U'


I have a user named galaxy on the system, and a Galaxy user named galaxy defined 
within Galaxy who is an owner of the PostgreSQL database.


With respect to the following line: if my Galaxy PostgreSQL database is called 
"galaxydb", are the dbuser and dbpassword those of the PostgreSQL owner of the 
database? Or of the system user named galaxy who actually started the instance 
of Galaxy with the run.sh command?


SQLConnectInfo  galax...@xxx.musc.edu galaxy Galaxy2013
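
For what it's worth, mod_sql's SQLConnectInfo takes a connection string of the
form database@host[:port] followed by the credentials of the database role that
ProFTPD should connect as (a PostgreSQL user able to read galaxy_user), not the
system account that runs run.sh. A sketch with placeholder values:

    SQLBackend      postgres
    SQLConnectInfo  galaxydb@dbhost.example.org:5432 galaxy_db_user galaxy_db_password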


With respect to this line:

SQLNamedQuery   LookupGalaxyUser SELECT email,password,'581','582',…

What are the appropriate UID and GID to apply here?  I have a system user 
"galaxy" who starts Galaxy with "run.sh". This user's UID and GID are 581 and 
582 respectively, but I am not able to log in for FTP transfer.


If I use Chrome and paste "ftp://123.45.678.123", I get an "unable to load 
webpage because server sent no data" error (code ERR_EMPTY_RESPONSE).


If I paste

ftp://123.45.678.123/ /path/to/file


And press Execute in my local Galaxy GUI, I get a "tool error" with no Std_Out 
and no Std_error.


So I cannot use FTP via the ProFTPD 1.3.4d server to upload files. How can I test 
PostgreSQL authentication via Galaxy/proftpd to get at the problem?
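
One low-tech way to test the database side in isolation (a sketch, with
placeholder email and password): compute the hex-encoded SHA1 of the FTP
password yourself and compare it with what is stored for your account, since
the proftpd.conf above declares Galaxy passwords to be hex-encoded SHA1.

    import hashlib

    # Compare this value with the result of:
    #   SELECT password FROM galaxy_user WHERE email = 'you@example.org';
    print(hashlib.sha1(b"my-ftp-password").hexdigest())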


Starr


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

[galaxy-dev] multiple output tool in workflow

2013-11-08 Thread Jennifer Jackson

Hi Jun,

There is probably a problem with the tool design itself, but that may be 
what you are asking how to solve. I wouldn't think this is a problem 
with workflows at first pass.


Is this your own tool? Or a tool from the tool shed (the repo developer 
is usually the one to make changes, unless you want to try)?
This is the primary tool development wiki; the <outputs> tag set is 
where I would double-check the tool first.

http://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax

I am moving this over to the galaxy-...@bx.psu.edu mailing list since it 
is a tool development question.


Jen
Galaxy team


On 11/8/13 10:19 AM, Jun Fan wrote:


Hi all,

  I am trying to create a workflow from a history. One of the tools 
used generates multiple outputs in the formats gff3, fasta and sam. 
The gff3 will be visualized in IGV and the fasta file is used for further 
BLAST analysis. Now the problem is that the automatically generated 
workflow does not connect the multiple-output tool and the 
BLAST tool. I failed even when I tried to connect these two tools in the 
workflow by hand. I am guessing this is because only the main output 
type (gff3) is recognized in the workflow. How could I solve this problem?


Best regards and have a nice weekend!

Jun



___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

   http://galaxyproject.org/search/mailinglists/


--
Jennifer Hillman-Jackson
http://galaxyproject.org

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] multiple output tool in workflow

2013-11-08 Thread Jun Fan
Hi Jen

 Thanks for your reply. Yes, it is my own tool.
The <outputs> element is:

<outputs>
    <data format="gff3" name="output"/>
</outputs>

In the <command> element, the three output files are defined as below:

$output /$__new_file_path__/primary_${output.id}_samWithPeptides_visible_sam /$__new_file_path__/primary_${output.id}_longestORFs_visible_fasta

Is there anything wrong here?

Best regards!
Jun

From: Jennifer Jackson [mailto:j...@bx.psu.edu]
Sent: 09 November 2013 01:05
To: Galaxy Dev
Cc: Jun Fan
Subject: multiple output tool in workflow

Hi Jun,

There is probably a problem with the tool design itself, but that may be what 
you are asking how to solve. I wouldn't think this is a problem with workflows 
at first pass.

Is this your own tool? Or a tool from the tool shed (the repo developer is 
usually the one to make changes, unless you want to try)?
This is the primary tool development wiki, the  output tag set is where I 
would double check the tool first.
http://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax

I am moving this over to the galaxy-...@bx.psu.edu mailing list since it is a 
tool development question.

Jen
Galaxy team

On 11/8/13 10:19 AM, Jun Fan wrote:
Hi all,

  I am trying to creating a workflow from history. One of the tool used 
generates multiple outputs in the format of gff3, fasta and sam. Gff3 will be 
visualized in IGV and the fasta file is doing further BLAST analysis. Now the 
problem is that the automatically generated workflow does not connect the 
having-multiple-output tool and the BLAST tool. I failed even I tried to 
connect these two tools in the workflow by hand. I am guessing this is due to 
only the main output type (gff3) is recognized in the workflow. How could I 
solve this problem?

Best regards and have a nice weekend!
Jun




___

The Galaxy User list should be used for the discussion of

Galaxy analysis and other features on the public server

at usegalaxy.org.  Please keep all replies on the list by

using reply all in your mail client.  For discussion of

local Galaxy instances and the Galaxy source code, please

use the Galaxy Development list:



  http://lists.bx.psu.edu/listinfo/galaxy-dev



To manage your subscriptions to this and other Galaxy lists,

please use the interface at:



  http://lists.bx.psu.edu/



To search Galaxy mailing lists use the unified search at:



  http://galaxyproject.org/search/mailinglists/



--

Jennifer Hillman-Jackson

http://galaxyproject.org
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] multiple output tool in workflow

2013-11-08 Thread Jennifer Jackson

Hi Fan,

The <outputs></outputs> block should contain one line for each of the 
output files. I believe that you need to name these differently: 
output1, output2, etc. Or you can add in text and use variables, if 
you add in the "label" option.
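
A sketch of what that could look like, with one <data> element declared per
output; the names, formats, and labels below are illustrative, not taken from
Jun's tool:

    <outputs>
        <data format="gff3"  name="output_gff3"  label="${tool.name}: annotation (gff3)"/>
        <data format="fasta" name="output_fasta" label="${tool.name}: longest ORFs (fasta)"/>
        <data format="sam"   name="output_sam"   label="${tool.name}: alignments (sam)"/>
    </outputs>

Each declared output then gets its own variable in the <command> line
($output_gff3, $output_fasta, $output_sam), which is what lets the workflow
editor see the fasta output and connect it to the BLAST step.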


Others can correct or add to my comments.

Good luck!

Jen
Galaxy team

On 11/8/13 5:41 PM, Jun Fan wrote:


Hi Jen

 Thanks for your reply. Yes, it is my own tool.

The <outputs> element is:

<outputs>
    <data format="gff3" name="output"/>
</outputs>

In the <command> element, the three output files are defined as below:

$output /$__new_file_path__/primary_${output.id}_samWithPeptides_visible_sam /$__new_file_path__/primary_${output.id}_longestORFs_visible_fasta


Is there anything wrong here?

Best regards!

Jun

From: Jennifer Jackson [mailto:j...@bx.psu.edu]
Sent: 09 November 2013 01:05
To: Galaxy Dev
Cc: Jun Fan
Subject: multiple output tool in workflow

Hi Jun,

There is probably a problem with the tool design itself, but that may 
be what you are asking how to solve. I wouldn't think this is a 
problem with workflows at first pass.


Is this your own tool? Or a tool from the tool shed (the repo 
developer is usually the one to make changes, unless you want to try)?
This is the primary tool development wiki, the  output tag set is 
where I would double check the tool first.

http://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax

I am moving this over to the galaxy-...@bx.psu.edu mailing list since it 
is a tool development question.


Jen
Galaxy team

On 11/8/13 10:19 AM, Jun Fan wrote:

Hi all,

  I am trying to creating a workflow from history. One of the
tool used generates multiple outputs in the format of gff3, fasta
and sam. Gff3 will be visualized in IGV and the fasta file is
doing further BLAST analysis. Now the problem is that the
automatically generated workflow does not connect the
having-multiple-output tool and the BLAST tool. I failed even I
tried to connect these two tools in the workflow by hand. I am
guessing this is due to only the main output type (gff3) is
recognized in the workflow. How could I solve this problem?

Best regards and have a nice weekend!

Jun




___

The Galaxy User list should be used for the discussion of

Galaxy analysis and other features on the public server

at usegalaxy.org.  Please keep all replies on the list by

using reply all in your mail client.  For discussion of

local Galaxy instances and the Galaxy source code, please

use the Galaxy Development list:

  


   http://lists.bx.psu.edu/listinfo/galaxy-dev

  


To manage your subscriptions to this and other Galaxy lists,

please use the interface at:

  


   http://lists.bx.psu.edu/

  


To search Galaxy mailing lists use the unified search at:

  


   http://galaxyproject.org/search/mailinglists/



--
Jennifer Hillman-Jackson
http://galaxyproject.org


--
Jennifer Hillman-Jackson
http://galaxyproject.org

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/