Re: [galaxy-dev] Contributing to genome indexes on rsync server
Hi, to chime in on this discussion: I found some inconsistencies during my rsync endeavor and I'm curious whether there is any way to contribute to that service.

-- xenTro3
xenTro3 Frog (Xenopus tropicalis): xenTro3 /galaxy/data/xenTro3/seq/xenTro3.fa
but only /xenTro3.fa.gz exists.
-- ce6
/data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
-- ce6
ce6 has no .fa file under seq/, but allfasta.loc contains a reference to it:
ce6 Caenorhabditis elegans: ce6 /galaxy/data/ce6/seq/ce6.fa
-- TAIR9 and TAIR10 are not available via rsync
-- Bowtie2 indices are missing for ce6 and xenTro3

Thanks,
Bjoern

Hi Jennifer,
Today I was trying to pull some bowtie2 indices from the Galaxy rsync server for PhiX to run some tests and only got the ones for bowtie1. I'm wondering what the state is with regard to this past thread, and what we can do to help here.
Cheers!
Roman

7 mar 2013 kl. 20:01 skrev Jennifer Jackson j...@bx.psu.edu:

Hi Brad (and Roman),
The team has talked about this in detail. There are a few wrinkles with just pulling in indexes - Dan is doing some work that could change this later on, but for now the rsync will continue to point to the same location as Main's genome data source. This means that there are some limits on what we can do immediately. Setting up a submission pipe is one of them - there just isn't the resource to do this right now, or a common place distinct from Main to house the data. A few other ideas came up - we can chat later; each had side issues. But I saw your tweet and think it is great that you are pulling CloudBioLinux data from the rsync now, so let's get as much data in common as possible, so you have data to work with near term. I am in the process of adding bt2 indexes - some are published to the Main/rsync server already and some are not, but more will show up over the next week or so (along with more genomes and other indexes). I'll take a look at what you have and pull/match what I can.
Genome sort order and variants are my concerns; both require special handling in processing and in the .locs. If it takes longer to check, I am just going to create them here if I haven't already. The GATK-sorted hg19 canonical is already on my list - it needed all indexes, not just bt2. When the next distribution goes out, I'll list what is new on the rsync in the News Brief. For the Novoalign indexes, I'm not quite sure what to do about those yet - or for any indexes associated with tools or genomes not hosted on Main. Do you want to open a card for those and any other similar cases? We can discuss a strategy from there, maybe at the IUC, if Greg/Dan thinks it is appropriate. Please add me so I can follow. I'll be in touch as I go through the data. Thanks for your patience on this!
Jen, Galaxy team

On 2/21/13 12:43 PM, Brad Chapman wrote:

Hi all;
Is there a way for community members to contribute indexes to the rsync server? This resource is awesome, and I'm working on migrating the CloudBioLinux retrieval scripts to use this instead of the custom S3 buckets we'd set up previously: https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py
It's great to have this as a public shared resource and I'd like to be able to contribute back. From an initial pass, here are the things I'd like to do:
- Include bowtie2 indexes for more genomes.
- Include novoalign indexes for a number of commonly used genomes.
- Clean up hg19 to include a full canonically sorted hg19, with indexes. Broad has a nice version prepped so GATK will be happy with it, and you need to stick with this ordering if you're ever going to use a GATK tool on it. Right now there is a partial hg19canon (without the random/haplotype chromosomes) and the structure is a bit complex.
What's the best way to contribute these? Right now I have a lot of the indexes on S3.
For instance, the hg19 indexes are here:
https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
I'm happy to format these differently or upload them somewhere that would make them easy to include. Thanks again for setting this up; I'm looking forward to working off a shared repository of data,
Brad

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
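As an aside for anyone scripting pulls the way the CloudBioLinux code above does, the transfer can be driven from Python. A minimal sketch, with the caveat that the datacache endpoint and the per-build directory layout shown are assumptions to verify against the Galaxy wiki, and `build_rsync_cmd` is a hypothetical helper, not part of any Galaxy tooling:

```python
import subprocess  # used by the commented-out check_call below

# Assumed rsync endpoint and "<build>/<index_dir>" layout -- verify before use.
RSYNC_SERVER = "rsync://datacache.g2.bx.psu.edu/indexes"

def build_rsync_cmd(build, index_dir, dest):
    """Build the argv list for pulling one genome build's index directory."""
    src = "%s/%s/%s/" % (RSYNC_SERVER, build, index_dir)
    target = "%s/%s/%s/" % (dest, build, index_dir)
    return ["rsync", "-avzP", src, target]

if __name__ == "__main__":
    cmd = build_rsync_cmd("hg19", "bowtie2_index", "/galaxy/data")
    print(" ".join(cmd))
    # subprocess.check_call(cmd)  # uncomment to actually transfer
```

Keeping the command construction separate from execution makes it easy to dry-run and to batch several builds in a loop.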
Re: [galaxy-dev] Fw: No peek issue and datasets wrongly reported as Empty
Thanks Nate, I will try manually editing the lib/galaxy/jobs/runners/__init__.py file for now, as we are not ready to update to the latest distribution yet. I should be able to try the new distribution's fix in a couple of weeks. Many thanks for solving this issue! I will let you know if it doesn't solve our problem when we update to the November distribution. :)
Thanks,
Jean-François

From: Nate Coraor n...@bx.psu.edu
To: Jean-Francois Payotte jean-francois.payo...@dnalandmarks.ca
Cc: galaxy-dev@lists.bx.psu.edu
Date: 08/11/2013 09:40 AM
Subject: Re: [galaxy-dev] Fw: No peek issue and datasets wrongly reported as Empty

On Nov 7, 2013, at 2:45 PM, Jean-Francois Payotte wrote:

Dear Galaxy developers,
I know I am not the only one with this issue, as over time I've stumbled on a few mailing-list threads with other users having the same problem, and I know the recommended solution is to use the -noac mount option (http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#Unified_Method). However, it is said that the -noac mount option comes with a performance trade-off, so when we first ran into this issue (datasets showing "Empty" and "No peek" even though the file on disk is full of content), we implemented the hack found in this thread: http://dev.list.galaxyproject.org/What-s-causing-this-error-td4141958.html#a4141963
There, John suggested adding a sleep() in the finish_job method of the galaxy_dist/lib/galaxy/jobs/runners/drmaa.py file. It worked very well for us: adding a sleep(30) made all jobs wait 30 seconds before finishing, but at least the "No peek" issue disappeared. However, in the latest Galaxy updates this file (drmaa.py) has been drastically changed and the finish_job method no longer exists, so I had to remove the hack, hoping that the issue would have disappeared as well. Unfortunately, the "No peek" issue is still there and causing many headaches for some of our workflow users.
My question is then: can I put this sleep(30) somewhere else (method and/or file) to achieve the same result? I would really like to solve this "No peek" issue without resorting to the -noac mount option. Actually, I am not even sure our system administrator would allow it.

Hi Jean-François,
The job runners have been largely refactored into lib/galaxy/jobs/runners/__init__.py, which is where you'll find finish_job(). However, we also recently added some tricks to work around this issue that have solved the problem (for usegalaxy.org, at least) without needing -noac. This is available in Monday's distribution release. Here's the commit: https://bitbucket.org/galaxy/galaxy-central/commits/384240b8cd29963f302a0349476cf83734cfb5df?at=default
To use it, set retry_job_output_collection > 0 in the Galaxy config.
--nate

Thanks again for your help!
Jean-François
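The change Nate points to replaces a fixed sleep with bounded retries while collecting job outputs. A rough standalone sketch of that idea — `wait_for_output` and its defaults are illustrative, not Galaxy's actual implementation:

```python
import os
import time

def wait_for_output(path, retries=4, delay=2.0):
    """Return True once `path` exists and is non-empty.

    Over NFS with attribute caching, a freshly written job output can
    briefly look empty to the Galaxy server; retrying with a short delay
    bounds the wait instead of always sleeping 30 seconds per job.
    """
    for _ in range(retries):
        if os.path.exists(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(delay)
    return False
```

The win over the sleep(30) hack is that fast filesystems pay almost nothing: the loop returns on the first pass whenever the file is already visible.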
[galaxy-dev] SLURM and hidden success
Hello all,
We are in the process of switching from SGE to SLURM for our Galaxy setup. We are currently experiencing a problem where jobs that are completely successful (no text in their stderr file and the proper exit code) are being hidden after the job completes. Any job that fails or has some text in the stderr file is not hidden (note: hidden, not deleted; they can be viewed by selecting 'Unhide Hidden Datasets').
Our drmaa.py is at changeset 10961:432999eabbaa, our drmaa egg is at drmaa = 0.6, and our SLURM version is 2.3.5. We are currently passing no parameters: default_cluster_job_runner = drmaa:///
We have the same code base on both clusters but only observe this behavior when using SLURM. Any pointers or advice would be greatly appreciated.
Thanks,
Andrew
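For reasoning about Andrew's symptom: Galaxy of this era decided success roughly the way his message describes, requiring a clean exit code and an empty stderr. A toy sketch of that heuristic — illustrative only, not Galaxy's actual code:

```python
def job_succeeded(exit_code, stderr_text):
    """Success only if the tool exited 0 AND wrote nothing to stderr.

    Many tools log progress to stderr, which this rule misclassifies as
    failure; later Galaxy versions let tools declare per-tool <stdio>
    exit-code and regex rules instead of relying on this global check.
    """
    return exit_code == 0 and stderr_text.strip() == ""
```

Under this rule, a runner that injects any stray text into stderr (or misreports the exit status, as a DRM integration bug might) flips the classification of an otherwise-identical job.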
Re: [galaxy-dev] Empty bowtie2 output
Sending to galaxy-dev instead.

From: Srinivas Maddhi iihg-galaxy-ad...@uiowa.edu
Date: Friday, November 1, 2013 11:56 AM
To: galaxy-u...@lists.bx.psu.edu
Subject: Empty bowtie2 output

In follow-up to http://user.list.galaxyproject.org/Empty-bowtie2-output-tp4656137.html, is there:
- an ETA on when the issue with Bowtie2 in the August 2013 distribution generating empty output will be fixed (if not already fixed)?
- a suggested workaround (revert to an older version of that particular tool, etc.) in the meantime?
Thank you.
Unrelated: I wasn't able to determine how to update that thread to request status, hence creating a new one.
Re: [galaxy-dev] Managing Data Locality
Hi John,
I was just wondering: did you have an object-store-based suggestion as well? Logically, this seems to be where this operation should be done, but I don't see much infrastructure to support it, such as logic for moving a data object between object stores. (Incidentally, the release of Galaxy I'm running is from last April or May. Would an upgrade to the latest and greatest version pull in more support infrastructure for this?)
Regarding your LWR suggestion, admittedly I have not yet read the docs you referred me to, but I thought a second email was warranted anyway. We would in fact be using DRMAA to talk to the HPCC (this is being configured as I write), and Galaxy's long-term storage lives on our independent Galaxy server. As I may have commented before, we can't simply mount our Galaxy file systems on the HPCC for security reasons. To make the scenario even more concrete, we are currently using the DistributedObjectStore to balance Galaxy's storage requirements across three mounted volumes. I don't expect this to complicate the task at hand, but please do let me know if you think it will. We also currently have UGE set up on our Galaxy server, so we've already been using DRMAA to submit jobs. The details for submission to another host are more complicated, though. Does your LWR suggestion involve the use of scripts/drmaa_external_killer.py, scripts/drmaa_external_runner.py, and scripts/external_chown_script.py? (Particularly if so,) would you be so kind as to point me toward documentation for those scripts? It's not clear to me from their source how they are intended to be used or at what stage of the job-creation process they would be called by Galaxy. The same applies to the file_actions.json file you referred to previously. Is that a Galaxy file or an LWR file? Where may I find some documentation on the available configuration attributes, options, values, and semantics?
Does your LWR suggestion require that the same absolute path structure exist on both file systems (not much information is conveyed by the action name "copy"), or does it require a certain relative path structure to match? How does setting that option lead to Galaxy setting the correct paths (local to the HPCC) when building the command line?
Our goal is to submit all heavy jobs (e.g. mappers) to the HPCC as the user who launches the Galaxy job. Both the HPCC and our Galaxy instance use LDAP logins, so fortunately that's one wrinkle we don't have to worry about. This will help all involved maintain fair quota policies on a per-user basis. I plan to handle the support files (genome indices) by transferring them to the HPCC and rewriting the appropriate *.loc files on our Galaxy host with HPCC paths.
I appreciate your generous response to my first email, and hope to continue the conversation with this one. Now, I will go RTFM for LWR. :)
Many thanks,
Eric

From: jmchil...@gmail.com on behalf of John Chilton [chil...@msi.umn.edu]
Sent: Tuesday, November 05, 2013 11:58 AM
To: Paniagua, Eric
Cc: Galaxy Dev [galaxy-...@bx.psu.edu]
Subject: Re: [galaxy-dev] Managing Data Locality

Hey Eric,
I think what you are proposing would be a major development effort and mirrors major development efforts that are ongoing. There are sort of ways to do this already, with various trade-offs, and none particularly well documented, so before undertaking this effort I would dig into some alternatives.
If you are using PBS, the PBS runner contains some logic for delegating to PBS for doing this kind of thing - I have never tried it. https://bitbucket.org/galaxy/galaxy-central/src/default/lib/galaxy/jobs/runners/pbs.py#cl-245
It may be possible to use a specially configured handler and the Galaxy object store to stage files to a particular mount before running jobs - not sure it makes sense in this case. It might be worth looking into this (having the object store stage your files, instead of solving it at the job runner level).
My recommendation, however, would be to investigate the LWR job runner. There are a bunch of fairly recent developments to enable something like what you are describing. For specificity, let's say you are using DRMAA to talk to some HPC cluster, Galaxy's file data is stored in /galaxy/data on the Galaxy web server but not on the HPC, and there is some scratch space (/scratch) that is mounted on both the Galaxy web server and your HPC cluster. I would stand up an LWR (http://lwr.readthedocs.org/en/latest/) server right beside Galaxy on your web server. The LWR has a concept of managers that sort of mirrors the concept of runners in Galaxy - see the sample config for guidance on how to get it to talk to your cluster. It could use DRMAA, Torque command-line tools, or Condor at this time (I could add new methods, e.g. a PBS library, if that would help).
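To make the manager concept John describes concrete, the LWR side is configured through its job_managers.ini. A sketch along these lines, with the caveat that the section and key names are recalled from the LWR sample configs and should be verified against job_managers.ini.sample in your checkout, and the native_specification value is purely illustrative:

```ini
; LWR job_managers.ini -- sketch only; verify against job_managers.ini.sample
[manager:_default_]
type = queued_drmaa
; Options passed through to the DRM (SGE/UGE here) -- hypothetical values
native_specification = -P galaxy -l h_vmem=8G
```

The manager named `_default_` is the one Galaxy's LWR runner targets unless a job destination names another manager explicitly.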
Re: [galaxy-dev] Contributing to genome indexes on rsync server
Thanks! There have been no public data updates since the migration started (late last spring we froze the data), but there are some known issues, and data that is ready to be released, in the process of becoming ready, etc. We expect to be able to start working on this again in the very near term. To start this off, the mm9 bowtie2 indexes were restored this morning: https://trello.com/c/SbizUDQt
And these other finds are great, thanks Bjoern; I will add them to the public card. A few errors that were corrected late last spring popped up again, but they will also be fixed - small, but will be addressed ASAP. Adding in any missing .2bit files in general is on our internal to-do list. Older genomes have other inconsistencies that will be addressed. The goal is to have the data filled in with complete indexes around the end of November, then filled in with newly released genomes and more variants for important model organisms by the end of the year. All of this depends on various factors, but this is what we are shooting for.
Thanks, and if anything else off is noted, please feel free to send it to the list or add it to the card. All input is welcome - we want this to be a great resource for everyone. Time to get back to making that happen now that the migration is wrapping up!
Jen, Galaxy team

On 11/8/13 2:00 AM, Bjoern Gruening wrote:

Hi, to chime in on this discussion: I found some inconsistencies during my rsync endeavor and I'm curious whether there is any way to contribute to that service.
-- xenTro3
xenTro3 Frog (Xenopus tropicalis): xenTro3 /galaxy/data/xenTro3/seq/xenTro3.fa
but only /xenTro3.fa.gz exists.
-- ce6
/data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
-- ce6
ce6 has no .fa file under seq/, but allfasta.loc contains a reference to it:
ce6 Caenorhabditis elegans: ce6 /galaxy/data/ce6/seq/ce6.fa
-- TAIR9 and TAIR10 are not available via rsync
-- Bowtie2 indices are missing for ce6 and xenTro3
Thanks,
Bjoern
Re: [galaxy-dev] Empty bowtie2 output
Hello,
The mm9 bt2 indexes were restored this morning. You can track this and other current data fixes through this Trello card: https://trello.com/c/SbizUDQt
Thanks for your patience during the migration. We are moving on to data now, both corrections and updates.
Jen, Galaxy team

On 11/7/13 3:58 PM, IIHG Galaxy Administrator wrote:

Sending to galaxy-dev instead.
From: Srinivas Maddhi iihg-galaxy-ad...@uiowa.edu
Date: Friday, November 1, 2013 11:56 AM
To: galaxy-u...@lists.bx.psu.edu
Subject: Empty bowtie2 output
In follow-up to http://user.list.galaxyproject.org/Empty-bowtie2-output-tp4656137.html, is there:
- an ETA on when the issue with Bowtie2 in the August 2013 distribution generating empty output will be fixed (if not already fixed)?
- a suggested workaround (revert to an older version of that particular tool, etc.) in the meantime?
Thank you.
Unrelated: I wasn't able to determine how to update that thread to request status, hence creating a new one.

--
Jennifer Hillman-Jackson
http://galaxyproject.org
Re: [galaxy-dev] Contributing to genome indexes on rsync server
Hi Jen,
fantastic news! Thanks a lot!
Bjoern
Re: [galaxy-dev] tool of installing tool shed repositories
Hello Ray and Björn,
The ability to import an exported repository archive is now available in the web UI as of 11261:5c59f2c4f770. I will enhance the Tool Shed API to accommodate this new feature next. This feature is available only in galaxy-central, so it will not be available on the main Tool Shed until the next Galaxy release. In the meantime, it should be helpful for migrating repositories from local development tool sheds to the test Tool Shed. If you happened to export a capsule before this changeset, it will not be valid, as there was an error in generating the capsule manifest that was corrected in this changeset; you can just export the capsule again with the latest changeset. Please let me know if you encounter any issues.
Thanks,
Greg Von Kuster

On Nov 4, 2013, at 10:54 AM, Greg Von Kuster g...@bx.psu.edu wrote:

Hello Ray and Björn,
I'm currently working on the feature for importing a repository capsule into a Tool Shed. It's been on my plate for a while, but other priorities have side-tracked this work. Based on your exchange, I'm now working to finish up this feature, so it should be available in the next few days.
Thanks,
Greg Von Kuster

On Nov 4, 2013, at 3:46 AM, Björn Grüning bjoern.gruen...@pharmazie.uni-freiburg.de wrote:

Hi Ray,
there is some work in that direction to easily import and export repositories. The export feature is already integrated and should help you: you will end up with a tarball with all the information about that repository, and the import should then be easier. If you want to work on the import part, I think your work is more than welcome!
Cheers,
Bjoern

Hi, there:
I'm currently trying to migrate all repositories of the main Tool Shed on http://toolshed.g2.bx.psu.edu/ to a local one, but have run into some problems. I'm wondering whether there is an existing tool that can automatically do the job?
thanks
rgds,
Ray
__
ngsf...@hygenomics.com
Re: [galaxy-dev] Managing Data Locality
Hi John,

I have now read the top-level documentation for the LWR and gone through the sample configurations. I would appreciate it if you would answer a few technical questions for me.

1) How exactly is the staging_directory in server.ini.sample used? Is it intended to be the (final) location at which to put files on the remote server? How is the relative path structure under $GALAXY_ROOT/databases/files handled?
2) What exactly does persistence_directory in server.ini.sample mean? Where should it be located, and how will it be used?
3) What exactly does file_cache_dir in server.ini.sample mean?
4) Does the LWR preserve some relative path (e.g. relative to GALAXY_ROOT) under the above directories?
5) Are files renamed when cached? If so, are they eventually restored to their original names?
6) Is it possible to customize the DRMAA and/or qsub requests made by the LWR, for example to include additional settings such as a project or a memory limit? Is it possible to customize this on a case-by-case basis, rather than globally?
7) Are there any options for the queued_drmaa manager in job_managers.ini.sample which are not listed in that file?
8) What exactly are the differences between the queued_drmaa manager and the queued_cli manager? Are there any options for the latter which are not in the job_managers.ini.sample file?
9) When I attempt to run the LWR (not having completed all the mentioned preparation steps, namely without setting DRMAA_LIBRARY_PATH), I get a segmentation fault. Is this because it can't find DRMAA, or is it potentially unrelated? In the latter case, here's the error being output to the console: ./run.sh: line 65: 26277 Segmentation fault paster serve server.ini $@

Lastly, a simple comment, hopefully helpful.
It would be nice if the LWR install docs at least mentioned the dependency of PyOpenSSL 0.13 (or later) on OpenSSL 0.9.8f (or later), maybe even with a comment that pip will honor the environment variables CFLAGS and LDFLAGS in the event one is creating a local installation of the OpenSSL library for the LWR to use. Thank you for your time and assistance. Best, Eric

From: jmchil...@gmail.com [jmchil...@gmail.com] on behalf of John Chilton [chil...@msi.umn.edu] Sent: Tuesday, November 05, 2013 11:58 AM To: Paniagua, Eric Cc: Galaxy Dev [galaxy-...@bx.psu.edu] Subject: Re: [galaxy-dev] Managing Data Locality

Hey Eric, I think what you are proposing would be a major development effort and mirrors major ongoing development efforts. There are sort of ways to do this already, with various trade-offs, and none particularly well documented. So before undertaking this effort I would dig into some alternatives. If you are using PBS, the PBS runner contains some logic for delegating to PBS for doing this kind of thing - I have never tried it. https://bitbucket.org/galaxy/galaxy-central/src/default/lib/galaxy/jobs/runners/pbs.py#cl-245 It may be possible to use a specially configured handler and the Galaxy object store to stage files to a particular mount before running jobs - not sure it makes sense in this case. It might be worth looking into this (having the object store stage your files, instead of solving it at the job runner level). My recommendation, however, would be to investigate the LWR job runner. There are a bunch of fairly recent developments to enable something like what you are describing. For specificity, let's say you are using DRMAA to talk to some HPC cluster, Galaxy's file data is stored in /galaxy/data on the Galaxy web server but not on the HPC cluster, and there is some scratch space (/scratch) that is mounted on both the Galaxy web server and your HPC cluster.
I would stand up an LWR (http://lwr.readthedocs.org/en/latest/) server right beside Galaxy on your web server. The LWR has a concept of managers that sort of mirrors the concept of runners in Galaxy - see the sample config for guidance on how to get it to talk with your cluster. It could use DRMAA, Torque command-line tools, or Condor at this time (I could add new methods, e.g. the PBS library, if that would help). https://bitbucket.org/jmchilton/lwr/src/default/job_managers.ini.sample?at=default

On the Galaxy side, I would then create a job_conf.xml file telling certain HPC tools to be sent to the LWR. Be sure to enable the LWR runner at the top (see the advanced example config) and then add at least one LWR destination:

<destinations>
  <destination id="lwr" runner="lwr">
    <param id="url">http://localhost:8913/</param>
    <!-- Leave Galaxy directory and data indices alone, assumes they are mounted in both places. -->
    <param id="default_file_action">none</param>
    <!-- Do stage everything in /galaxy/data though -->
    <param id="file_action_config">file_actions.json</param>
  </destination>
</destinations>

Then create a file_actions.json file in the Galaxy root directory (the structure of this file is subject to change; the current JSON layout doesn't feel very Galaxy-ish). {paths: [ {path:
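For what it's worth, the file_actions.json excerpt above (truncated in the archive) maps path prefixes to staging actions. The following is only a hypothetical sketch of that shape - the /galaxy/data path comes from John's running example, but the action name "copy" is an assumption, not a definitive reference for the LWR's format:

```json
{
  "paths": [
    {"path": "/galaxy/data", "action": "copy"}
  ]
}
```

With default_file_action set to none in the destination, only files under the listed path prefixes would be staged; everything else would be assumed to be mounted on both sides.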
[galaxy-dev] Trouble getting proftpd to use Galaxy postgresql authentication
Folks, I am new to using proftpd/Galaxy on a local cluster and I am having lots of issues. Here is one: I cannot connect to the proftpd server to upload files.

./proftpd --version
ProFTPD Version 1.3.4d

(so proftpd is compiled and running). I am attempting to connect from an iMac (OS X 10.9) to a Linux machine where the proftpd server is running:

ftp 123.45.678.123
Connected to 123.45.678.123.
220 ProFTPD 1.3.4d Server (Public Galaxy FTP by ProFTPD server installation) [128.23.191.200]
Name (123.45.678.123:hazards): hazards
331 Password required for hazards

I enter a password and I get "Abort trap 6" or "421 service not available, remote server closed connection" (this latter reply from an older Mac). I presume that this means that the authentication failed. Here are the last lines of my proftpd.conf file:

# Do not authenticate against real (system) users
AuthPAM off
# Set up mod_sql_password - Galaxy passwords are stored as hex-encoded SHA1
SQLPasswordEngine on
SQLPasswordEncoding hex
# Set up mod_sql to authenticate against the Galaxy database
SQLEngine on
SQLBackend postgres
SQLConnectInfo galax...@xxx.musc.edu dbuser dbpassword
SQLAuthTypes SHA1
SQLAuthenticate users
# An empty directory in case chroot fails
SQLDefaultHomedir /shared/app/ProFFTPd-1.3.4d/default
# Define a custom query for lookup that returns a passwd-like entry. UID and GID should match your Galaxy user.
SQLUserInfo custom:/LookupGalaxyUser
SQLNamedQuery LookupGalaxyUser SELECT email,password,'581','582','/shared/app/Galaxy/galaxy_dist/database/files/%U','/bin/bash' FROM galaxy_user WHERE email='%U'

I have a user named galaxy on the system, and a Galaxy user named galaxy defined within Galaxy who is the owner of the postgresql database. With respect to the following line: if my Galaxy postgresql database is called "galaxydb", are dbuser and dbpassword meant to be the postgresql owner of the database, or the system user named galaxy who actually started the instance of Galaxy with the run.sh command?
SQLConnectInfo galax...@xxx.musc.edu galaxy Galaxy2013

With respect to this line:

SQLNamedQuery LookupGalaxyUser SELECT email,password,'581','582',...

What are the appropriate UID and GID to apply here? I have a system user "galaxy" who starts Galaxy with "run.sh". This user's UID and GID are 581 and 582 respectively, but I am not able to log in for FTP transfer. If I use Chrome and paste ftp://123.45.678.123 I get an "unable to load webpage because server sent no data" error (code ERR_EMPTY_RESPONSE). If I paste ftp://123.45.678.123/ /path/to/file and press execute in my local Galaxy GUI, I get a "tool error" with no Std_Out and no Std_error. So I cannot use FTP via the ProFTPD 1.3.4d server to upload files. How can I test postgresql authentication via Galaxy/proftpd to get at the problem? Starr
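Since the config above declares SQLAuthTypes SHA1 with SQLPasswordEncoding hex, one way to sanity-check the authentication data independently of proftpd is to hash a known password the same way and compare it to the value stored in the galaxy_user table. A minimal sketch (the stored hash below is simply the SHA1 of "12345", for illustration - substitute the value returned by a psql query against your own database):

```python
import hashlib

def galaxy_password_hash(password):
    """Hex-encoded SHA1, matching SQLAuthTypes SHA1 + SQLPasswordEncoding hex."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

# Compare against the value returned by, e.g.:
#   SELECT password FROM galaxy_user WHERE email = 'you@example.org';
stored = "8cb2237d0679ca88db6464eac60da96345513964"  # SHA1 of "12345" (illustration only)
print(galaxy_password_hash("12345") == stored)
```

If the database value does not match what this produces for a known password, the problem is upstream of proftpd. Note also that mod_sql's SQLConnectInfo takes database credentials (a postgres role that can read galaxy_user), not a system account, so dbuser/dbpassword should be the database role, not the shell user who runs run.sh.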
[galaxy-dev] multiple output tool in workflow
Hi Jun,

There is probably a problem with the tool design itself, but that may be what you are asking how to solve. I wouldn't think this is a problem with workflows at first pass. Is this your own tool? Or a tool from the Tool Shed (the repo developer is usually the one to make changes, unless you want to try)? This is the primary tool development wiki; the output tag set is where I would double-check the tool first. http://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax I am moving this over to the galaxy-...@bx.psu.edu mailing list since it is a tool development question. Jen, Galaxy team

On 11/8/13 10:19 AM, Jun Fan wrote:

Hi all, I am trying to create a workflow from a history. One of the tools used generates multiple outputs, in the formats gff3, fasta, and sam. The gff3 will be visualized in IGV and the fasta file is for further BLAST analysis. Now the problem is that the automatically generated workflow does not connect the multiple-output tool and the BLAST tool. I failed even when I tried to connect these two tools in the workflow by hand. I am guessing this is because only the main output type (gff3) is recognized in the workflow. How could I solve this problem? Best regards and have a nice weekend! Jun

___ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

-- Jennifer Hillman-Jackson http://galaxyproject.org
Re: [galaxy-dev] multiple output tool in workflow
Hi Jen,

Thanks for your reply. Yes, it is my own tool. The outputs element is:

<outputs>
  <data format="gff3" name="output"/>
</outputs>

In the command element, the three output files are defined as below:

$output
/$__new_file_path__/primary_${output.id}_samWithPeptides_visible_sam
/$__new_file_path__/primary_${output.id}_longestORFs_visible_fasta

Is there anything wrong here? Best regards! Jun
Re: [galaxy-dev] multiple output tool in workflow
Hi Fan,

The <outputs> ... </outputs> block should contain one <data> line for each of the output files. I believe that you need to name these differently: output1, output2, etc. Or you can add in text and use variables, if you add in the label option. Others can correct or add to my comments. Good luck!

Jen, Galaxy team

-- Jennifer Hillman-Jackson http://galaxyproject.org
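To make the suggestion above concrete, here is a minimal sketch of a three-output <outputs> block for a tool of this shape. The names (output_gff3, etc.) and labels are hypothetical; with separately declared outputs, the command line can reference $output_gff3, $output_sam, and $output_fasta directly instead of writing extra files under $__new_file_path__, and the workflow editor can then connect each output independently:

```xml
<outputs>
    <data format="gff3"  name="output_gff3"  label="${tool.name}: annotations"/>
    <data format="sam"   name="output_sam"   label="${tool.name}: peptides (SAM)"/>
    <data format="fasta" name="output_fasta" label="${tool.name}: longest ORFs"/>
</outputs>
```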