[galaxy-dev] Running two Galaxy instances on the same Torque cluster
Hello,

I have a question about running two Galaxy instances on separate hosts on the same Torque cluster. For various reasons, including recent changes to or removal of certain features in the newer versions of Galaxy (I am told BLAST was affected), I would like to keep our current, older version of Galaxy running while creating a separate Torque submit host to run the latest version of Galaxy.

I do not think that will pose any issues for Torque, since it will just see the new host as another submit host for jobs, but I would like to know if this would cause any unforeseen issues for either of the Galaxy instances. They will both mount and store their data on the same network filesystem, but I will naturally have to create two separate directory trees for their /basedir/database/files, pbs, job_working_directory, etc. paths. I am planning on making the local user the same on both submit nodes ('galaxy'; we are not using LDAP on that cluster, although we may in the future).

Will that cause any strange issues, such as jobs being reported back to the wrong Galaxy instance? Will IP address or DNS name be a factor? Additionally, I hope there will not be an issue with the two instances both pointing to the same FTP upload directory.

The idea seems sound in my head, but I want to make sure I'm not overlooking any critical considerations. Any suggestions or insights would be appreciated.

Thanks,
Josh Nielsen
HudsonAlpha Institute for Biotechnology

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
[galaxy-dev] Required Galaxy umask settings for HTML downloads?
Hello all,

I am having issues downloading HTML files from Galaxy, the same as described in this email chain:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-August/010965.html

I am getting the error

    (13)Permission denied: xsendfile: cannot open file: /basedir/galaxy_data/database/tmp/tmp8iEccn/library_download.zip

which is indeed a basic filesystem permissions issue. The problem is that the permissions on that directory, and on every directory created in tmp/, look like this:

    drwx------+ 2 galaxy galaxy 3 Dec 4 09:23 tmp8iEccn

I have placed the Apache user in the galaxy group, but as you can see, Galaxy never sets any group permissions on the directories that it creates (they get mode 700).

As Nate Coraor suggested in the message linked above, I have tried altering the default umask, but nothing I tried had any effect. I use "sudo service galaxy start" as the galaxy user each time to start the server, and "ps -ef | grep galaxy" confirms that Galaxy is running as the galaxy user. Since I use sudo, I changed the sudoers file to include:

    root    ALL=(ALL) ALL
    galaxy  ALL=(ALL) ALL
    Defaults umask_override
    Defaults umask = 0002

This changed absolutely nothing. Then I started looking deeper into the PAM configuration and added a umask directive to /etc/pam.d/sudo (and also tried it in password-auth-ac and system-auth-ac), like this:

    session optional pam_umask.so umask=0002

Still nothing changed in the permissions in tmp/ when I tried to download an HTML file: no group permissions were set. Then I dug deeper still and read that when setting the mask in the /etc/pam.d/ config files is not enough, you can try setting a system-wide mask in /etc/login.defs (following the suggestion here: http://stackoverflow.com/questions/10220531/how-to-set-system-wide-umask). Still no dice.

I've pretty much exhausted my know-how in this department. Any other suggestions on how to fix this, or on where the correct place to set the umask is?

Thanks,
Josh Nielsen
Re: [galaxy-dev] Required Galaxy umask settings for HTML downloads?
Hi Nate,

Thanks for the reply. No, I hadn't thought to add anything to /etc/init.d/galaxy itself. It is a short enough script that I can paste it below. What would I need to do to edit it with umask settings?

Also, I should note that changing the umask in the PAM files actually did change the default permissions for the galaxy user when I did an "su - galaxy" in a bash shell and then created or touched files (as you would expect). But for some reason it made no difference for the directories created in that tmp/ directory, even though the galaxy user owns them. That made me wonder whether something internal to Galaxy, or something else, is overriding or ignoring the system umask settings (which work fine in a shell environment as the user itself). Maybe I'll look into that ACL stuff Paul mentioned.

Here is my /etc/init.d/galaxy script:

    . /etc/rc.d/init.d/functions

    GALAXY_USER=galaxy
    GALAXY_DIST_HOME=/home/galaxy/galaxy-dist
    GALAXY_RUN="${GALAXY_DIST_HOME}/run.sh"
    GALAXY_PID="${GALAXY_DIST_HOME}/paster.pid"

    case "$1" in
      start)
        echo -n "Starting galaxy services: "
        daemon --user $GALAXY_USER "${GALAXY_RUN} --daemon --pid-file=${GALAXY_PID}"
        touch /var/lock/subsys/galaxy
        ;;
      stop)
        echo -n "Shutting down galaxy services: "
        daemon --user $GALAXY_USER "${GALAXY_RUN} --stop-daemon"
        rm -f /var/lock/subsys/galaxy
        ;;
      status)
        daemon --user galaxy "${GALAXY_RUN} --status"
        ;;
      restart)
        $0 stop; $0 start
        ;;
      reload)
        $0 stop; $0 start
        ;;
      *)
        echo "Usage: galaxy {start|stop|status|reload|restart}"
        ;;
    esac

--
Thanks!
Josh

On Tue, Dec 4, 2012 at 9:56 AM, Nate Coraor n...@bx.psu.edu wrote:

> On Dec 4, 2012, at 10:52 AM, Josh Nielsen wrote:
>
>> [...] I've pretty much exhausted my know-how in this department. Any other suggestions of how to fix this or where the correct place to set the umask is?
>
> Hi Josh,
>
> Thanks for doing such extensive tests. Have you tried setting the umask in the init script itself?
>
> --nate
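Nate's suggestion of setting the umask in the init script itself could be sketched roughly as follows. This is only an illustration: run_with_umask is a hypothetical helper name, and whether the RHEL daemon function preserves an inherited umask is an assumption that would need checking on the actual system.

```shell
#!/bin/sh
# Sketch: run a command under a specific umask inside a subshell, so the
# caller's own umask is left untouched. run_with_umask is a hypothetical
# helper, not part of Galaxy or of /etc/rc.d/init.d/functions.
run_with_umask() {
    mask="$1"
    shift
    ( umask "$mask" && "$@" )
}

# Example: the child shell prints the umask it inherited.
run_with_umask 0027 sh -c 'umask'

# In an init script the same idea would be a plain "umask 0027" on the
# line above the command that launches run.sh (assumption: the launcher
# does not reset the umask on its own).
```

With a umask of 0027, newly created directories would be group-readable and group-searchable unless the creating code asks for a narrower mode explicitly.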
Re: [galaxy-dev] Required Galaxy umask settings for HTML downloads?
Hi Paul,

Thanks for replying. Interestingly, I've never dealt with filesystem ACLs before, and I didn't even know that ext3/4 filesystems had that feature. Here is my output from those commands:

    $ getfacl tmp8iEccn
    # file: tmp8iEccn
    # owner: galaxy
    # group: galaxy
    user::rwx
    group::---
    mask::rwx
    other::---

    $ getfacl tmp
    # file: tmp
    # owner: root
    # group: galaxy
    user::rwx
    group::rwx
    other::rwx

What enforces these ACLs, and where can they be tweaked?

P.S. The reason I run Galaxy with sudo is that if I try to start it as just the galaxy user, it cannot create the process lock file:

    touch: cannot touch `/var/lock/subsys/galaxy`

I suppose I could put the lock files somewhere else or manually give the galaxy group permission to /var/lock/subsys (not so sure that's a good idea, though), but sudo seemed to solve the problem. You can see my init script in my reply to Nate.

Thanks,
Josh

On Tue, Dec 4, 2012 at 10:06 AM, Paul Boddie paul.bod...@biotek.uio.no wrote:

> On 04/12/12 16:52, Josh Nielsen wrote:
>
>> [...] The problem is that the permissions created for that directory and every directory created in tmp/ look like this:
>>
>>     drwx------+ 2 galaxy galaxy 3 Dec 4 09:23 tmp8iEccn
>>
>> And I have placed the Apache user in the galaxy group, but as you can see no group permissions ever get set by Galaxy on the directories that it creates (it is getting a 700 permissions setting).
>
> Isn't the trailing + character an indication of ACLs being set on the directory? What do the following say...?
>
>     getfacl /tmp/tmp8iEccn
>     getfacl /tmp
>
> If you do have ACLs involved, it may be the case that various masks are being enforced via that mechanism.
>
> Paul
>
> P.S. I'm not sure that making the galaxy user a sudoer would have any effect unless the user was attempting to gain privileges, which would be a pretty scary way of running Galaxy, I would have thought.
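Paul's observation about the trailing "+" can be turned into a small check. The sketch below is an illustration only: has_acl_marker is a hypothetical helper, the paths in the comments are placeholders, and the setfacl lines are one possible way of granting group access, not something confirmed in this thread.

```shell
#!/bin/sh
# has_acl_marker (hypothetical helper): succeed when an "ls -l" mode
# string ends in "+", the marker GNU ls prints for entries with an ACL.
has_acl_marker() {
    case "$1" in
        *+) return 0 ;;
        *)  return 1 ;;
    esac
}

# A mode 700 directory that carries an ACL.
if has_acl_marker 'drwx------+'; then
    echo "ACL present: inspect it with getfacl"
fi

# Possible follow-up commands (assumptions, with placeholder paths):
#   getfacl /path/to/database/tmp/tmp8iEccn
#   setfacl -m g:galaxy:rx /path/to/database/tmp/tmp8iEccn   # grant the group access
#   setfacl -d -m g:galaxy:rx /path/to/database/tmp          # default ACL for new entries
```

A default ACL on the parent directory is inherited by entries created inside it, which is why it is sometimes used when the creating program insists on a restrictive mode.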
Re: [galaxy-dev] Required Galaxy umask settings for HTML downloads?
Great. I'll give those ideas a shot to see if it gets me anywhere.

P.S. In the email that I linked to, you referenced a fix in the next release of Galaxy. Is that out yet, or still in development?

-Josh

On Tue, Dec 4, 2012 at 10:44 AM, Nate Coraor n...@bx.psu.edu wrote:

> On Dec 4, 2012, at 11:23 AM, Josh Nielsen wrote:
>
>> [...] That made me wonder if something was going on internal to Galaxy, or something else, that was overwriting/ignoring the system umask settings (which actually work fine in a shell environment as the user itself). Maybe I'll look into that ACL stuff Paul mentioned.
>
> Paul's suggestions are worth checking into. I'd be interested in knowing what the POSIX permissions are on /tmp itself, and what the ACLs are, if any.
>
> Those temporary files are created by creating a temporary directory using Python's tempfile.mkdtemp(), which creates them with a mode of 700, which is then masked by the current umask. The change I added in the email you referenced in your original post changes the directory to mode 0777 masked by the umask (after it's created).
>
> Depending on how the pieces used in RHEL's startup() shell function handle the environment, you may be able to set it on the line above `daemon ...` inside the 'start)' branch of the case statement. If that doesn't work, you may need to get more creative and do something like:
>
>     daemon --user $GALAXY_USER "umask 027; ${GALAXY_RUN} --daemon --pid-file=${GALAXY_PID}"
>
> Alternatively, you can set it inside /home/galaxy/galaxy-dist/run.sh, since this startup script uses run.sh.
>
> --nate
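The behaviour Nate describes (a directory created with mode 700, then widened to 0777 masked by the umask) can be mimicked in shell. This is an analogy under stated assumptions, not Galaxy's actual code: it uses mktemp -d in place of Python's tempfile.mkdtemp() and GNU stat to read the mode.

```shell
#!/bin/sh
# Analogy only: mktemp -d, like Python's tempfile.mkdtemp(), asks for a
# directory with mode 700; the umask can only narrow that further.
umask 0027
workdir=$(mktemp -d)
created_mode=$(stat -c '%a' "$workdir")   # GNU stat syntax (assumption)

# Widen the directory to 0777 masked by the current umask, which is the
# effect attributed to the fix discussed in the thread.
mask=$(umask)
wanted_mode=$(printf '%o' $(( 0777 & ~$mask )))
chmod "$wanted_mode" "$workdir"
final_mode=$(stat -c '%a' "$workdir")

echo "created with mode $created_mode, widened to $final_mode"
rmdir "$workdir"
```

This also shows why changing the umask alone would not help before such a fix: a requested mode of 700 has no group bits for the umask to leave in place.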
Re: [galaxy-dev] Issues up/downloading datasets after file_path change
Yes I am. I recently thought to check my httpd.conf, and it still has an "XSendFilePath /panfs/galaxy_data" setting, where /panfs is the obsolete directory path. I figured that was the problem; I just haven't changed it yet. Thanks for the suggestion to look! I'll let you know if changing that doesn't fix it.

-Josh

On Mon, Oct 1, 2012 at 11:33 AM, Nate Coraor n...@bx.psu.edu wrote:

> Hi Josh,
>
> Are you using XSendFile or X-Accel-Redirect, by any chance?
>
> --nate
>
> On Sep 17, 2012, at 5:18 PM, Josh Nielsen wrote:
>
>> Okay, I solved half of the problem. The upload job was consistently being submitted to the one compute node in our cluster that I forgot to create the symlink on that points to the new location. I can now upload files. This means I am fully functional now, which is good, however I still prefer to do away with the symlink approach altogether and get Galaxy to work directly with the new directory path. Any suggestions for getting that to work?
>>
>> Thanks!
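A stale XSendFilePath of the kind found above can be spotted by comparing the directories whitelisted in the Apache configuration against the new file_path. The sketch below is a rough illustration: both function names are hypothetical, /newmount is a made-up placeholder for the new storage location (the thread does not name it), and paths containing spaces or quotes are not handled.

```shell
#!/bin/sh
# xsendfile_paths (hypothetical helper): print the path given to each
# XSendFilePath directive in an Apache config file, skipping comments.
xsendfile_paths() {
    awk 'tolower($1) == "xsendfilepath" { print $2 }' "$1"
}

# path_is_covered (hypothetical helper): succeed when directory $1 lies
# under one of the paths whitelisted in config file $2.
path_is_covered() {
    for allowed in $(xsendfile_paths "$2"); do
        case "$1/" in
            "${allowed%/}"/*) return 0 ;;
        esac
    done
    return 1
}

# Demonstration against a throwaway config file.
conf=$(mktemp)
printf '%s\n' '# old storage mount' 'XSendFilePath /panfs/galaxy_data' > "$conf"
if path_is_covered /newmount/galaxy_data/database/files "$conf"; then
    echo "new file_path is covered by XSendFilePath"
else
    echo "new file_path is not covered: update XSendFilePath in httpd.conf"
fi
rm -f "$conf"
```

On a real system the second argument would be the actual httpd.conf, and the first the file_path value from universe_wsgi.ini.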
[galaxy-dev] Issues up/downloading datasets after file_path change
Hello all,

I recently migrated the Galaxy file_path directory (~/database/files/), along with the temp, job_working_directory, and pbs directories (all under ~/database/), to a new storage system mount point. I updated universe_wsgi.ini accordingly, but have run into some problems with managing our datasets. I can run Galaxy jobs just fine, but I cannot view or download the output datasets after the jobs have run: when Galaxy tries to open a dataset from the new path, I get an actual browser error.

On a hunch, I created a symbolic link in the / directory, named after the old path and pointing to the new location, and pointed file_path in universe_wsgi.ini back at it (so it looks exactly like the old directory path). Sure enough, this works as a (temporary) fix for viewing and downloading the files. However, doing that in turn breaks file uploading, which fails every time with an "OSError: [Errno 2] No such file or directory" message. So I'm caught between two broken configurations, depending on the path I point to (the new path or the symlink alternative). I guess in the latter case it does not like traversing a symbolic link (the only explanation I can think of).

The root of the problem still needs to be fixed, though: Galaxy seems to be remembering the old path from somewhere (the Galaxy database?), so it will not let me simply migrate the data to a new location and change the path variables in universe_wsgi.ini. Any suggestions on the proper changes needed to fix this permanently?

Thanks,
Josh Nielsen
Re: [galaxy-dev] Issues up/downloading datasets after file_path change
Okay, I solved half of the problem. The upload job was consistently being submitted to the one compute node in our cluster on which I had forgotten to create the symlink that points to the new location. I can now upload files. This means I am fully functional now, which is good; however, I would still prefer to do away with the symlink approach altogether and get Galaxy to work directly with the new directory path. Any suggestions for getting that to work?

Thanks!

On Mon, Sep 17, 2012 at 3:50 PM, Josh Nielsen jniel...@hudsonalpha.com wrote:

> [...] The root of the problem needs to be fixed though in that Galaxy seems to be remembering the old path from somewhere (the Galaxy database?) such that it will not let me simply migrate the data to a new location and change the path variables in universe_wsgi.ini. Any suggestions on the proper changes that need to be made to fix this permanently?
Re: [galaxy-dev] FastQC Tool Errors
Sure enough, this fixed the problem. I just installed the latest Sun JRE and it is working now. Thanks for the suggestion! I would never have guessed.

-Josh

On Mon, Jun 25, 2012 at 11:34 AM, simon andrews simon.andr...@babraham.ac.uk wrote:

> Yes, that's the broken version of gcj. I don't have a CentOS machine here at the moment, but I think if you install OpenJDK and use the alternatives system to select that as the default JRE then that should fix things.
>
> Simon.
>
> On 25 Jun 2012, at 17:19, Josh Nielsen wrote:
>
>> [...] Here is what the results of java -version returned on the compute nodes:
>>
>>     bash# java -version
>>     java version "1.5.0"
>>     gij (GNU libgcj) version 4.4.4 20100726 (Red Hat 4.4.4-13)
>>
>> Is this version of Java too old? Perhaps I need to install the JRE manually?
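The check that led to the fix (seeing whether the default java is GNU gij) can be scripted for running across compute nodes. This is a minimal sketch: is_gcj_java is a hypothetical helper, and the advice it prints simply restates Simon's suggestion about the alternatives system.

```shell
#!/bin/sh
# is_gcj_java (hypothetical helper): succeed when the text printed by
# "java -version" identifies the GNU gij/libgcj runtime.
is_gcj_java() {
    case "$1" in
        *gij*|*libgcj*) return 0 ;;
        *)              return 1 ;;
    esac
}

# java -version writes to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
    version_text=$(java -version 2>&1)
    if is_gcj_java "$version_text"; then
        echo "default java is gij: install another JRE and select it, e.g. with 'alternatives --config java'"
    else
        echo "default java is not gij"
    fi
else
    echo "no java found on PATH"
fi
```

Because jobs run on the compute nodes rather than the head node, the check would need to be run on each node that can receive FastQC jobs.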
Re: [galaxy-dev] FastQC Tool Errors
We are not currently running anything like Xvfb, but if that is the only way to get it to run, I suppose I can try it.

How does PSU's Galaxy handle grabbing the results and returning them to Galaxy without X11 applications starting on their servers for every user job executed? (Such applications are useless for Galaxy; they are meant for manual thick-client GUI interaction, not browsers.) And how do they kill or manage the X11 sessions once started? The users aren't supposed to see the X11 / R graphics, correct? They are only supposed to see whatever file output results, so that they can view it strictly through Galaxy, as per the wrapper's text description: "The tool produces a single HTML output file that contains all of the results." It is only the results summary in HTML that Galaxy and the user are concerned with, as I understand it.

We run Galaxy on a Linux cluster head node (no monitor, boots only to init 3 [no GUI], with admin-only ssh access); it submits all jobs to the compute nodes and then returns the results. I installed Java on each compute node, but do I now need to install that X11 virtual frame buffer (Xvfb) on all of the compute nodes as well?

Thanks!

On Fri, Jun 22, 2012 at 5:35 PM, Ross ross.laza...@gmail.com wrote:

> Do you run an X11 virtual frame buffer, e.g. Xvfb? Otherwise AFAIK R graphics and Java will complain on headless nodes.
>
> On Sat, Jun 23, 2012 at 6:30 AM, Josh Nielsen jniel...@hudsonalpha.com wrote:
>
>> Hello,
>>
>> I am having an issue with getting the FastQC tool to work with Galaxy on our server. I downloaded the FastQC files (version 0.8.0) and changed the directory that the wrapper script looks for the 'fastqc' executable in, but when we run a job with it we have been getting the following output:
>>
>>     Started analysis of Clip
>>     Approx 5% complete for Clip
>>     Approx 10% complete for Clip
>>     ...
>>     ...
>>     Approx 95% complete for Clip
>>     Approx 100% complete for Clip
>>     Analysis complete for Clip
>>
>>     (.:9754): Gtk-WARNING **: cannot open display:
>>
>> And then the job shows as failed in Galaxy. The output .dat file just has that same output/error message in it (though it seems to indicate it got to 100%). Also when I try to execute the fastqc file directly (albeit with no arguments) I get this:
>>
>>     Exception in thread "main" java.awt.HeadlessException:
>>     No X11 DISPLAY variable was set, but this program performed an operation which requires it.
>>         at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:173)
>>         at java.awt.Window.<init>(Window.java:437)
>>         at java.awt.Frame.<init>(Frame.java:419)
>>         at java.awt.Frame.<init>(Frame.java:384)
>>         at javax.swing.JFrame.<init>(JFrame.java:174)
>>         at uk.ac.bbsrc.babraham.FastQC.FastQCApplication.<init>(FastQCApplication.java:271)
>>         at uk.ac.bbsrc.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:102)
>>
>> Both errors seem to have something to do with the GUI component of FastQC (which I have seen some screenshots for on the FastQC webpage). If this application is GUI-driven how did the online PSU Galaxy get it to work with their wrapper script when the tools are run in a command-line environment with no X11 or Gtk? Essentially I'm just wondering what steps I'm missing here to getting this to work with our Galaxy mirror, other than just dropping the executable in place? Any suggestions?
>>
>> Thanks,
>> Josh
>
> --
> Ross Lazarus MBBS MPH; Associate Professor, Harvard Medical School; Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444;
Re: [galaxy-dev] FastQC Tool Errors
Hi Simon,

I recently installed Java with the yum package manager on our compute nodes, and our cluster is a CentOS 6 environment. Here is what java -version returned on the compute nodes:

    bash# java -version
    java version "1.5.0"
    gij (GNU libgcj) version 4.4.4 20100726 (Red Hat 4.4.4-13)

    Copyright (C) 2007 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Is this version of Java too old? Perhaps I need to install the JRE manually?

Thanks!

On Sat Jun 23 04:20:53 EDT 2012, Simon Andrews simon.andr...@babraham.ac.uk wrote:

> Are you by any chance running an older version of gcj as your java version? There is a known bug in some of these where they don't correctly configure the headless environment, even if the correct parameters are passed. This causes exactly the kind of errors you're seeing.
>
> If this is the case you'll need to install a more recent JRE (or update your path to point to one which is already present).
>
> Simon.
Re: [galaxy-dev] FastQC Tool Errors
Thanks Simon! I'll definitely give that a shot. -Josh
On Mon, Jun 25, 2012 at 11:34 AM, Simon Andrews simon.andr...@babraham.ac.uk wrote: Yes, that's the broken version of gcj. I don't have a CentOS machine here at the moment, but I think if you install OpenJDK and use the alternatives system to select that as the default JRE then that should fix things. Simon.
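The diagnosis in this thread rests on reading the output of java -version. A minimal sketch of that check in Python follows. It assumes, based only on the output quoted earlier in the thread, that gcj-based runtimes identify themselves with the strings "gij" or "libgcj". The function names are hypothetical and are not part of Galaxy or FastQC.

```python
import subprocess


def looks_like_gcj(version_text):
    """Return True if `java -version` output appears to come from GNU gcj/gij."""
    lowered = version_text.lower()
    return "gij" in lowered or "libgcj" in lowered


def current_java_version_text(java="java"):
    """Run `java -version` and return its combined stdout/stderr text."""
    result = subprocess.run(
        [java, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # many JVMs print the version banner on stderr
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Sample text copied from the message above; no java binary is needed for this part.
    sample = (
        "java version 1.5.0\n"
        "gij (GNU libgcj) version 4.4.4 20100726 (Red Hat 4.4.4-13)\n"
    )
    print(looks_like_gcj(sample))
```

On a compute node, looks_like_gcj(current_java_version_text()) would report whether the default java is still the gcj one. If it is, Simon's suggestion applies: install OpenJDK and pick it as the default with the alternatives system (on CentOS this is typically done with alternatives --config java).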
[galaxy-dev] FastQC Tool Errors
Hello, I am having an issue getting the FastQC tool to work with Galaxy on our server. I downloaded the FastQC files (version 0.8.0) and changed the directory in which the wrapper script looks for the 'fastqc' executable, but when we run a job with it we get the following output:
  Started analysis of Clip
  Approx 5% complete for Clip
  Approx 10% complete for Clip
  ... ...
  Approx 95% complete for Clip
  Approx 100% complete for Clip
  Analysis complete for Clip
  (.:9754): Gtk-WARNING **: cannot open display:
The job then shows as failed in Galaxy. The output .dat file contains only that same output/error message (though it seems to indicate that the analysis got to 100%). Also, when I try to execute the fastqc file directly (albeit with no arguments) I get this:
  Exception in thread "main" java.awt.HeadlessException: No X11 DISPLAY variable was set, but this program performed an operation which requires it.
    at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:173)
    at java.awt.Window.<init>(Window.java:437)
    at java.awt.Frame.<init>(Frame.java:419)
    at java.awt.Frame.<init>(Frame.java:384)
    at javax.swing.JFrame.<init>(JFrame.java:174)
    at uk.ac.bbsrc.babraham.FastQC.FastQCApplication.<init>(FastQCApplication.java:271)
    at uk.ac.bbsrc.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:102)
Both errors seem to involve the GUI component of FastQC (which I have seen some screenshots of on the FastQC webpage). If this application is GUI-driven, how did the online PSU Galaxy get it to work with their wrapper script when tools are run in a command-line environment with no X11 or Gtk? Essentially I'm wondering what steps I'm missing to get this working with our Galaxy mirror, other than dropping the executable in place. Any suggestions? Thanks, Josh
Re: [galaxy-dev] user data upload directory structure
Ah, yes. This is what I was requesting yesterday in the email that I sent, although mine was much more long-winded. I didn't see this email chain from the day before. Having a user-representative directory structure would be beneficial in my mind. I followed your suggested directory structure up until the arrows. Are those supposed to be symlinks? If so, what do you have in mind? I was thinking that just having those subdirectories by user id under files/ would be enough (although I could see how you could symlink them to some other arbitrary location if you so desired). My intended application was to set up an FTP share of the files/ directory so that our users could copy their (processed) files off of the Galaxy server to other servers in our environment, as well as to one of our other clusters. Having the datasets segregated into the owners' subdirectories would make it easier to identify and copy them for that purpose. -Josh
Nate- I do know about the disk accounting/quota features of Galaxy. As I alluded to in my previous email, it actually goes beyond accounting. I wanted to be able to implement something like:
  ~/galaxy-dist/database/files/user_id_000 -> /one_data_pool_set/id_000
  ~/galaxy-dist/database/files/user_id_001 -> /another_data_pool_set/id_001
which would match the usual data placement from a scheduler perspective too. I'll look at galaxy-dist/lib/galaxy/objectstore/__init__.py. Thanks a lot, JC
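If the arrows in the layout above are read as symlinks, the filesystem side could be sketched roughly as follows. This is only an illustration of the directory layout under that assumption. The function name is hypothetical, the demo runs in a temporary directory, and nothing here makes Galaxy itself write datasets into per-user directories (that would need changes on the object store side, as mentioned above).

```python
import os
import tempfile


def link_user_dirs(files_dir, pool_for_user):
    """Create files_dir/<user_dir> -> <pool target> symlinks; return the links made."""
    made = []
    for user_dir, target in sorted(pool_for_user.items()):
        link_path = os.path.join(files_dir, user_dir)
        if os.path.lexists(link_path):
            continue  # never overwrite something that is already there
        os.makedirs(target, exist_ok=True)  # make sure the pool directory exists
        os.symlink(target, link_path)
        made.append(link_path)
    return made


if __name__ == "__main__":
    # Throwaway directory tree standing in for the real paths.
    root = tempfile.mkdtemp()
    files_dir = os.path.join(root, "database", "files")
    os.makedirs(files_dir)
    mapping = {
        "user_id_000": os.path.join(root, "one_data_pool_set", "id_000"),
        "user_id_001": os.path.join(root, "another_data_pool_set", "id_001"),
    }
    for link in link_user_dirs(files_dir, mapping):
        print(link, "->", os.readlink(link))
```

Skipping paths that already exist keeps the sketch safe to re-run, at the cost of never repairing a link that points to the wrong pool.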
Re: [galaxy-dev] ? IOError: [Errno 13] Permission denied: u'/home/koala2/galaxy-central/database/compiled_templates/base_panels.mako.py'
I have seen this error many times (for various mako.py files). It usually occurs either when the user running Galaxy has insufficient privileges, or when the permissions on a file or a folder (possibly several directories upstream) have changed in some way. If you do a 'ps -ef | grep python', do you see the paster.py process running as your expected user? Do you use sudo to elevate privileges when you run it? I use a 'galaxy' user, but I execute run.sh (for which I have created a service script in /etc/init.d) with sudo while logged in as galaxy. You might also want to do an 'ls -l' in the compiled_templates/ directory and look at the file permissions and ownership. -Josh
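Because the offending permissions may sit several directories upstream, it can help to list the mode and ownership of every ancestor of the file in one go. A small sketch of that follows, using only the Python standard library. The function name is hypothetical, and the path must exist and be reachable by the user running the check.

```python
import os
import stat


def describe_ancestors(path):
    """Return (path, mode string, uid, gid) for path and each parent up to the root."""
    rows = []
    current = os.path.abspath(path)
    while True:
        st = os.stat(current)
        # stat.filemode renders the mode the way `ls -l` does, e.g. 'drwx------'
        rows.append((current, stat.filemode(st.st_mode), st.st_uid, st.st_gid))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return rows
```

Calling describe_ancestors on the compiled_templates/ directory while logged in as the user that runs Galaxy, and printing each row, shows which level of the tree (if any) has an unexpected owner or is missing the write or execute bits.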
Re: [galaxy-dev] Idea for user-based dataset subdirectories
Hi David, Actually, using a daemon to move the files into associated user directories is an interesting idea. Is that something that Galaxy Dev is working on or can work on, or was that just a suggestion? I'm not opposed to doing any dev work of my own, but I don't really know Python that well and I know most of the Galaxy code is Python. I'm not sure that I follow what you mean by the joint user/galaxy directory though. I of course want the files not to be unified (not all in the same directory) but rather segregated by user into user subdirectories. I think you already caught that, so I guess I just didn't understand what you were getting at. Josh Nielsen
-- How about if there were a completely separate daemon that monitored the galaxy database periodically to determine what datasets belong to which user(s). Then it would move the actual dataset to an area owned by the user and group accessible to galaxy, replacing the dataset with a symlink. This would require no changes to the galaxy build, but it would require a constant monitoring system. There is already a mechanism for users to move their files into a joint user/galaxy directory, but it is (as far as I know) only allowed for libraries, not histories. It would be better if there were a way for users to browse through their own directories as a tool, and be able to load files directly into their history. David Hoover
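The core step of the daemon David describes (move the dataset into a user-owned area, then leave a symlink behind) might look roughly like this. This is a sketch of the idea only, not anything that exists in Galaxy. The function name is hypothetical, and working out which user owns a dataset (which would come from the Galaxy database) is deliberately left out.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(dataset_path, user_dir):
    """Move one dataset file into user_dir and leave a symlink at the old path."""
    if os.path.islink(dataset_path):
        return os.readlink(dataset_path)  # already relocated on an earlier pass
    os.makedirs(user_dir, exist_ok=True)
    new_path = os.path.join(user_dir, os.path.basename(dataset_path))
    shutil.move(dataset_path, new_path)
    os.symlink(new_path, dataset_path)  # the old location keeps working for readers
    return new_path


if __name__ == "__main__":
    # Demo in a throwaway directory with a made-up dataset file.
    root = tempfile.mkdtemp()
    files_dir = os.path.join(root, "files", "000")
    os.makedirs(files_dir)
    dataset = os.path.join(files_dir, "dataset_53.dat")
    with open(dataset, "w") as handle:
        handle.write("example\n")
    print(relocate_with_symlink(dataset, os.path.join(root, "users", "jsmith")))
```

The move and the symlink creation are two separate steps, so there is a short window in which the original path does not exist. A real daemon would need to avoid touching datasets that a running job is still writing, and would also need to set ownership and group permissions on the relocated file, which this sketch does not attempt.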
Re: [galaxy-dev] Idea for user-based dataset subdirectories
Thanks for breaking that down for me. We are trying to set up some dev machines in our environment in a few weeks, and I may create a clone of our production Galaxy mirror and play around with that version to see if I can get the functionality that I'm looking for. I'll take the idea of having a daemon into consideration. Regards, Josh
On Wed, May 16, 2012 at 1:08 PM, David Hoover hoove...@helix.nih.gov wrote: No, this was all an idea I've had for a while, but never did anything about it. I'm pretty sure the Galaxy developers are not interested in anything this locally-centric, and I don't blame them. It ought to be something outside the Galaxy build completely, because Galaxy is meant to be system-independent.
What I meant by 'joint user/galaxy directory' is a directory that is owned by a user, but that the galaxy user has read (and possibly write) access to. This is entirely possible given either a well-informed user population, or an iron-clad suexec executable.
The mechanism I alluded to is a feature by which a user can upload a directory of files all at once. There is a configuration directive in universe_wsgi.ini, user_library_import_dir, that allows non-administrative users to upload an entire directory of files into a library. The directive identifies the base directory, within which subdirectories named as the galaxy user login (email address) are searched. The user_library_import_dir directory is owned by the galaxy user, and the subdirectories are owned by the user, but group owned by the galaxy user. A user will copy files to the subdirectory, log in to Galaxy, switch to their library, and upload all the files in the directory into a single library folder.
There isn't much documentation about it in the main Galaxy wiki, so forget that. I haven't enabled it in our local production site, and I haven't played with it in a long time. I'm pretty sure that the files are not removed after uploading, and a user is free to re-upload the files again and again, so it's kind of quirky. Also, if the files are not readable by the galaxy user, a bizarre and unhelpful error is thrown. If this functionality could be extended and elaborated, it could do what you want. The user_library_import_dir requires that the user's login in Galaxy be identical to the user's login on the cluster, and that the permissions be kept correct. Typically users have no idea what is going on with their permissions, so what are you going to do? David
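David notes that files under user_library_import_dir which the galaxy user cannot read produce an unhelpful error. Since he describes the per-user subdirectories as group-owned by the galaxy user, one way to catch the problem before an upload is to look for files without the group-read bit. The sketch below does that under two assumptions: that the layout is exactly as David describes, and that the group on the files really is the galaxy group (the code does not verify this). The function name and the example email address are hypothetical.

```python
import os
import stat
import tempfile


def files_missing_group_read(import_dir, user_email):
    """List files under import_dir/<user_email> that lack the group-read bit."""
    user_dir = os.path.join(import_dir, user_email)
    missing = []
    for dirpath, _dirnames, filenames in os.walk(user_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if not (mode & stat.S_IRGRP):  # the group cannot read this file
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Demo in a throwaway directory with one private and one group-readable file.
    base = tempfile.mkdtemp()
    user_dir = os.path.join(base, "jsmith@example.org")
    os.makedirs(user_dir)
    for name, mode in (("private.fastq", 0o600), ("shared.fastq", 0o640)):
        path = os.path.join(user_dir, name)
        open(path, "w").close()
        os.chmod(path, mode)
    print(files_missing_group_read(base, "jsmith@example.org"))
```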
[galaxy-dev] Idea for user-based dataset subdirectories
Hello, Please forgive the length of this proposition as I try to explain my reasoning behind it. Let me say first of all that I understand that Galaxy is not meant to be everything to everyone, and that requests for features may not suit everyone who uses Galaxy. That being said, I have an idea or request that I think would be convenient for dealing with users' datasets from a file-system perspective.
Galaxy has the obvious advantage (compared to manual job submission for tools on a cluster) of providing an interface for using all the analysis tools, and the history of the operations done on your data, all in one place. However, I have found that putting all the output datasets in one directory (the files/000/ directory) on the file-system causes a problem for users who specifically want to interact with them *on the file-system*, and not just through the Web interface, for whatever complicated or diverse reasons. Since Galaxy runs on a cluster of its own in our environment, and we do not allow users to connect to it remotely to submit manual jobs (and individually write output to their separate home directories) as we do on our main cluster, it is essentially a black box beyond the Galaxy interface. That is essentially what we want, except for how users can interact with the output files.
The issue is that our users would like an easy means of copying their files off of the Galaxy cluster to other servers from a command line (possibly even automated by scripts). Even if we allow an FTP share of the output directory for users to do that, the common [galaxy-dist]/database/files/000/ directory clumps all of the files for all users together in one directory and uses a sequential file-naming scheme (dataset_N++) that makes it hard to tell who the owner of each file is. Is there a way that the dataset output directory locations could be designed (or set optionally?) like the FTP upload feature's expected directory structure, where the files are dropped into the corresponding subdirectory of the user who produced them? For example, having subdirectories under database/files/ named according to the user's Galaxy account id (like [galaxy-dist]/database/files/jsmith, [galaxy-dist]/database/files/sparker, etc.). If the files could be segregated by user, it would be much easier to keep track of which datasets belong to whom on the file-system. Then I could possibly set up a read-only FTP share of the files/ directory on the cluster, from which the users could directly copy the files in their personal subdirectory to other systems, and perhaps batch download them, rather than having to rely solely on the Web interface.
I understand that the way Galaxy is currently designed is that the files are just generically named (the behind-the-scenes handling of data is a black box), and that it is the database that keeps track of which files belong to whom and holds the metadata for more meaningful dataset/job names, etc. But a file-system hierarchy alternative would also be welcome in a heavily command-line oriented computational environment. Would setting up a more user-representative output directory hierarchy on the file-system like that be possible? Best Regards, Josh Nielsen
[galaxy-dev] Migrate history datasets to new database
Hello all, I recently tried moving from a tarball install of Galaxy to a Mercurial-managed one, and in the process something went wrong with the database upgrade. I had intended to install the Mercurial-based Galaxy separately (though on the same machine) and then move it to production once it was working, but it upgraded the current database in place while the old install was still completely active and running. That broke my existing Galaxy install and I had to move to the new install immediately. I recall having to run the Mercurial install's run.sh script multiple times, because the upgrade sequence (it looked like 87-88, 88-89, etc. as it progressed) did not complete all the way the first time. I also ran it as root when I probably should have done it as our galaxy user.
Long story short, I now cannot log in to Galaxy even though Galaxy recognizes correct credentials from the database. My debugging so far has not yielded any results. At this point, after a week of unsuccessful attempts to repair the existing install, I just want to create a fresh database and migrate our users' history and dataset (and possibly login credentials) information from the old database to the new one, if at all possible. Could someone give me any guidance as to how to do that, and which table files (MYI, MYD, etc.) I should copy over into the new MySQL database to make that happen?
P.S. I do have to thank Dannon Baker for helping me so far through private email correspondence to try to figure out what went wrong with the current install. However, I'm not having any breakthroughs, our local Galaxy mirror has been down for over a week now, and I just want to start fresh and migrate the critical data if possible. Thanks for your help, Josh Nielsen
[galaxy-dev] More meaningful dataset names/easier method of identifying?
Hello, For a while now with our Galaxy mirror I have found on many occasions a need to identify which dataset_*.dat files on the file system (in the [galaxy_dist]/database/files/000/ directory) belong to which user, and even for the same user to distinguish between their various datasets. A file uploaded directly by the user will have a Galaxy job name that matches its dataset file name: a job name of "data 18" (for example) is reflected in the file name 'dataset_18.dat' on the file system. However, any later analysis on that file that produces another dataset gives you no clue as to the corresponding file name. For example, a "Clip on data 18" run some time later may be called 'dataset_44.dat' on the filesystem, and a "Map with Bowtie on data 18" that runs on the clipped 'dataset_44.dat' may produce an output file of 'dataset_53.dat'. When debugging failed jobs, and after the user has rerun them for the umpteenth time, there may be dozens of identical or near-identical files to weed through, and the generic naming scheme is not helpful even though it is sequential (it is also not easy to match files to jobs unless you are watching the file writes in the directory live).
The current implementation makes sense for internal usage and the code that uses it, but it is difficult for a human to tell which files match which jobs in Galaxy. It would be useful to have more meaningful dataset file names, or an easier way to identify them (a record that matches the internal and external names), for administrative maintenance reasons, so that I can delete files or possibly even export those .dat files to a network share where our users can perform manual analysis on them. Could anyone point me to where in the code I could look to make the dataset names more meaningful?
Or perhaps I should request from the Galaxy developers (as a feature) a way for the users themselves to see, under the metadata name of their job (like "Map with Bowtie on data 18") in the right side pane, the *actual* corresponding file and its location on the file system (dataset_53.dat, for example). Or if not for users, at least something for administrators. Even a database table with four columns for the internal/filesystem dataset name, the job metadata name, the Galaxy job number (that the user sees), and the user that the dataset belongs to would be helpful. A lot of our users are heavily into informatics, though, and would probably prefer to be able to see that information themselves. Does anyone have any suggestions or thoughts about this? Thanks, Josh Nielsen
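The four-column record asked for above can, in principle, be derived from the Galaxy database with a join. The sketch below shows the shape of such a query against a toy in-memory SQLite database. The table and column names (dataset, history_dataset_association, history, galaxy_user, hid, name, email) are assumptions about the Galaxy schema and should be checked against the real database before use. The toy rows are made up for illustration, and the dataset_<id>.dat naming follows the examples in the message.

```python
import sqlite3

# Assumed schema: one row per dataset file, linked to a history item and its owner.
QUERY = """
SELECT d.id, hda.hid, hda.name, u.email
FROM dataset d
JOIN history_dataset_association hda ON hda.dataset_id = d.id
JOIN history h ON h.id = hda.history_id
JOIN galaxy_user u ON u.id = h.user_id
ORDER BY d.id
"""


def dataset_report(conn):
    """Return (file name, history item number, display name, owner email) rows."""
    rows = conn.execute(QUERY).fetchall()
    return [("dataset_%d.dat" % ds_id, hid, name, email)
            for ds_id, hid, name, email in rows]


def build_toy_db():
    """Create a tiny stand-in database; all rows are invented for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE galaxy_user (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE history (id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE dataset (id INTEGER PRIMARY KEY);
        CREATE TABLE history_dataset_association (
            id INTEGER PRIMARY KEY, history_id INTEGER,
            dataset_id INTEGER, hid INTEGER, name TEXT);
        INSERT INTO galaxy_user VALUES (1, 'jsmith@example.org');
        INSERT INTO history VALUES (1, 1);
        INSERT INTO dataset VALUES (18);
        INSERT INTO dataset VALUES (44);
        INSERT INTO dataset VALUES (53);
        INSERT INTO history_dataset_association VALUES (1, 1, 18, 18, 'data 18');
        INSERT INTO history_dataset_association VALUES (2, 1, 44, 19, 'Clip on data 18');
        INSERT INTO history_dataset_association
            VALUES (3, 1, 53, 20, 'Map with Bowtie on data 18');
    """)
    return conn


if __name__ == "__main__":
    for row in dataset_report(build_toy_db()):
        print(row)
```

Against a production MySQL or PostgreSQL database the same SELECT could be run read-only from the database's own client. The Python wrapper is only there to make the example self-contained.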
[galaxy-dev] Galaxy pbs scripts and job_working_directory files
Hello again, As I mentioned in a recent post, I often need to debug jobs running from our local Galaxy mirror, and to do so I often need to look at the script and data files that the job is trying to use in order to figure out what is causing a problem. For me, the directories containing those files are '[galaxy_dist]/database/pbs/' and '[galaxy_dist]/database/job_working_directory/'. Each job that is run gets a corresponding .sh file in the pbs/ directory (like 344.sh), which contains the entire sequence of bash commands used to execute the job, normally with a call to a wrapper script somewhere in the middle. That script information is very useful, but the problem is that when a job fails (often within the first 30 seconds of running) the script is deleted and no trace of it is left in the directory. The same happens with the output or job data files in job_working_directory/.
I have had to make do with coordinating with the user on when to (re)run their failed job and then quickly, within the 30 second window, running a cp -R script_I_care_about.sh copy_of_script.sh command, so that when the script is deleted I have a copy that I can examine. The same goes for copying the job_working_directory/ files. I know that those directories would get very cluttered if they were not automatically cleaned, but I find those files essential for debugging. Is there a way to force Galaxy to (optionally) retain those files for debugging purposes? Maybe a new option in the universe_wsgi.ini file that can be set by people who want it? Thanks, Josh Nielsen
Re: [galaxy-dev] Galaxy pbs scripts and job_working_directory files
Ah, so this would just betray my ignorance of current features. :-) I'll give that a try! Thanks, Josh
On Tue, Apr 24, 2012 at 4:09 PM, Dannon Baker dannonba...@me.com wrote: Josh, Check out the cleanup_job setting in universe_wsgi.ini (included below). It sounds like 'cleanup_job = onsuccess' is exactly what you're looking for. -Dannon
  # Clean up various bits of jobs left on the filesystem after completion. These
  # bits include the job working directory, external metadata temporary files,
  # and DRM stdout and stderr files (if using a DRM). Possible values are:
  # always, onsuccess, never
  #cleanup_job = always
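To confirm which cleanup_job value an instance will actually use, the setting can be read back from the ini file. The sketch below assumes that the option lives in the [app:main] section of universe_wsgi.ini and that a commented-out line means the default of "always" applies, as the excerpt from Dannon suggests. The function name is hypothetical.

```python
import configparser


def cleanup_job_setting(ini_text, default="always"):
    """Return the effective cleanup_job value from the text of universe_wsgi.ini."""
    # interpolation=None so that values such as %(here)s elsewhere in the
    # file are treated as plain text instead of raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("app:main", "cleanup_job", fallback=default)


if __name__ == "__main__":
    sample = (
        "[app:main]\n"
        "# always, onsuccess, never\n"
        "cleanup_job = onsuccess\n"
    )
    print(cleanup_job_setting(sample))
```

With cleanup_job = onsuccess, the job script and working directory of a failed job should stay on disk for inspection, so the 30 second copy race described in the original message is no longer needed.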
[galaxy-dev] Trouble linking to/displaying at local Genome Browser
Hello all, I am having some trouble getting a "display at" link to my local UCSC Genome Browser mirror to show up under my Galaxy jobs for viewing, although I can get the link to the online UCSC GB to display. I followed the suggestions in the following thread but it has yielded no results yet: http://gmod.827538.n3.nabble.com/using-local-Genome-Browser-mirror-td1829290.html . Here's what I have tried:
- I edited the [galaxy-dir]/tool-data/shared/ucsc/ucsc_build_sites.txt file and duplicated the 'main' UCSC entry, but changed the new entry's name to 'internalgb' and changed the URL to point to our local GB with cgi-bin/hgTracks? at the end.
- I set ucsc_display_sites = internalgb in universe_wsgi.ini.
- I made a modified copy of [galaxy-dir]/tools/data_source/ucsc_tablebrowser.xml (named HAIB_ucsc_tablebrowser.xml) in the same directory as the other XML files and created a tool link for it on the left side of the Galaxy page by adding it to the tool_conf.xml file (which works fine: when I click on it, it loads our local Genome Browser inside the central Galaxy window frame).
- I edited the HAIB_ucsc_tablebrowser.xml file according to the recommendation in the thread above: "I had to change the name and id of the tool in the new ucsc_tablebrowser.xml file, and keep the id the same for the param.tool_id.value as well". So I gave it a unique id (changed ucsc_table_direct1 to HAIB_table_direct1) but kept the tool_id value field the same as in the original/main UCSC tablebrowser (ucsc_table_direct1). Then I put the new HAIB id for our local tablebrowser into universe_wsgi.ini under the tool runners section as HAIB_table_direct1 = local:///
- And of course I have restarted Galaxy after each change.
Still, after all of this, I cannot see a "display at" link from Galaxy to my local UCSC GB for bam or other files. When I change universe_wsgi.ini to have ucsc_display_sites = internalgb,main set (for both browsers), the online/main GB has a "display at" link on Galaxy jobs but no link shows for my local browser mirror. I'm also a little fuzzy on the relation between the ucsc_build_sites.txt file and the *_ucsc_tablebrowser.xml files (if any). Any tips as to what I might be missing here? Thanks! Josh
Re: [galaxy-dev] Trouble linking to/displaying at local Genome Browser
Hi Dan, Good catch sir! Genius! It is working now. The spacing was exactly the same to the eye in vi for all the entries, but once I did a 'set list' in vi to display formatting characters I saw that 'main' had the tab ^I characters but my 'internalgb' entry didn't. This must have happened because I did a visual copy with my cursor. Thank you. -Josh

On Mon, Feb 27, 2012 at 11:09 AM, Daniel Blankenberg d...@bx.psu.edu wrote:
> Hi Josh, It sounds like you are really close to getting this to work. My first guess would be that there is an issue with the modified ucsc_build_sites.txt file. Can you check that the line you copied is tab delimited and not space delimited? Thanks for using Galaxy, Dan
>
> On Feb 27, 2012, at 11:46 AM, Josh Nielsen wrote: [...]
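The tab-versus-space check Dan suggests can also be scripted rather than eyeballed in vi. Below is a minimal sketch in Python. It assumes each non-comment line of ucsc_build_sites.txt holds three tab-separated fields (site name, URL, comma-separated builds); that field count is an assumption based on the stock 'main' entry, `find_bad_lines` is a hypothetical helper that is not part of Galaxy, and gb.example.org is a placeholder host.

```python
def find_bad_lines(text, expected_fields=3):
    """Return (line_number, line) pairs that do not split into the expected
    number of tab-separated fields."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and comment lines are not entries, so skip them.
        if not stripped or stripped.startswith("#"):
            continue
        if len(line.split("\t")) != expected_fields:
            bad.append((number, line))
    return bad


# Illustrative content only: the second entry was pasted with spaces, not tabs.
sample = (
    "#comment\n"
    "main\thttp://genome.ucsc.edu/cgi-bin/hgTracks?\thg18,hg19\n"
    "internalgb    http://gb.example.org/cgi-bin/hgTracks?    hg18,hg19\n"
)

for number, line in find_bad_lines(sample):
    print("line %d is not tab-delimited: %r" % (number, line))
```

To check a real file, read its contents and pass them to `find_bad_lines` in place of `sample`.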
Re: [galaxy-dev] Trouble linking to/displaying at local Genome Browser
One more thing while I'm at it. For clarification, would only the first two steps I listed be sufficient to get the "display at" link in Galaxy? If so, what is the benefit of the other steps (other than having an actual link to your local genome browser mirror in Galaxy)? There's nothing that ties my internalgb entry to the new tool_id I created, is there?

On Mon, Feb 27, 2012 at 11:58 AM, Josh Nielsen jniel...@hudsonalpha.com wrote: [...]
Re: [galaxy-dev] Issues displaying/downloading datasets
I seem to be getting somewhere by monitoring the /var/log/httpd/error_log file (I don't know why I didn't think of that before). For each dataset I click I am seeing a corresponding message like this:

[Wed Jan 18 09:31:06 2012] [debug] mod_proxy_http.c(56): proxy: HTTP: canonicalising URL //localhost:8080/datasets/b23533e4ff1bb7ec/display/
[Wed Jan 18 09:31:06 2012] [error] [client 172.26.14.93] (20023)The given path was above the root path: xsendfile: unable to find file: /panfs/galaxy_data/database/files/000/dataset_118.dat, referer: http://galaxy-dev.haib.org/galaxy/history
[Wed Jan 18 09:31:06 2012] [debug] mod_proxy_http.c(1836): proxy: end body send

So it seems that there is some issue with the xsendfile module, or with Apache serving up the dataset files, because '/panfs/galaxy_data/database/files/000/dataset_118.dat' is present. The only thing that I recall changing recently was to move the /panfs/galaxy_data/database/ directory from the local file system to an NFS share with the exact same path (mounted on /panfs). I'm not sure how that would make a difference though. I'm still looking into the xsendfile error lead though. -Josh

On Tue, Jan 17, 2012 at 11:08 AM, Josh Nielsen jniel...@hudsonalpha.com wrote: [...]
Re: [galaxy-dev] Issues displaying/downloading datasets
Ah, following the lead in the log paid off. I had to add the statement *XSendFilePath /panfs/galaxy_data* to /etc/httpd/conf/httpd.conf. Apparently, ever since I moved the dataset directory to /panfs/galaxy_data by setting the file_path variable in universe_wsgi.ini, I had not attempted to view a dataset, so I didn't notice that the functionality broke when I moved it. I just needed to point XSendFilePath to the new directory and it worked. I hope this helps someone else if they encounter the same problem. Cheers, Josh
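The "(20023)The given path was above the root path" error is consistent with the requested file lying outside every directory the module was allowed to serve. The sketch below illustrates that kind of containment check in Python. It is not mod_xsendfile's actual implementation, `is_under_allowed_root` is a hypothetical helper, and "/var/www" merely stands in for whatever roots were allowed before the move, which the thread does not state.

```python
import posixpath


def is_under_allowed_root(path, allowed_roots):
    """Return True if `path` lies inside one of `allowed_roots`."""
    target = posixpath.normpath(path)
    for root in allowed_roots:
        root = posixpath.normpath(root)
        # commonpath() compares whole path components, so
        # "/panfs/galaxy_data2" is not mistaken for a child of
        # "/panfs/galaxy_data".
        if posixpath.commonpath([target, root]) == root:
            return True
    return False


dataset = "/panfs/galaxy_data/database/files/000/dataset_118.dat"

# Hypothetical situation before the fix: the new location is not an allowed root.
print(is_under_allowed_root(dataset, ["/var/www"]))

# After adding XSendFilePath /panfs/galaxy_data the dataset falls under a root.
print(is_under_allowed_root(dataset, ["/panfs/galaxy_data"]))
```

The point of the sketch is only that the file existing on disk is not sufficient; it also has to sit beneath a path the web server has been told it may send from.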
[galaxy-dev] Issues displaying/downloading datasets
Hello all, I recently have been having problems viewing/displaying datasets (with the eye icon) as well as downloading datasets in Galaxy which I have uploaded, although I can actually point to those datasets as input to other tools and they show up on the drop down menus and it runs perfectly. Every time that I click on the eye icon for a dataset in my history pane I get an Apache error which displays in the window that says it cannot find /galaxy/datasets/X/display/?preview=True. I see a corresponding entry like this in paster.log (for example):

[17/Jan/2012:09:53:03 -0500] GET /galaxy/datasets/92b83968e0b52980/display/?preview=True HTTP/1.1 200 - http://galaxy-dev.haib.org/galaxy/history; Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1

For downloads I get the same error except that the requested URL is:

GET /galaxy/datasets/92b83968e0b52980/display?to_ext=txt

The alphanumeric code is of course different for each dataset, but I am puzzled at how to even debug this because I cannot find anywhere on the file system or under the galaxy-dist directory any path that is named datasets and has a display subfolder (so I assume it is an internal URL/path notation). I looked at the Python code some and all I got was a headache. I see that it uses a fetch URL method to grab a specific URL for each dataset, but I'm not sure what it is actually looking for on the file system, or if the URL is just an alias to something else. I thought to check in the MySQL database but didn't see any corresponding values that matched datasets or the alphanumeric code (which I still can't tell where it is getting that from).

Everything else in Galaxy works fine except for this. Could anyone please point me in the right direction about how to debug this? It would be much appreciated! Thanks, Josh
Re: [galaxy-dev] Issues displaying/downloading datasets
Also, I just uploaded a 1.3GB FASTQ file and the small preview box in the history pane shows the first few lines, and when I click on the eye it actually displays in the window with the message "This dataset is large and only the first megabyte is shown below. Show all | Save" and it shows the first megabyte with no problems, but if I click 'Show all' or 'Save' I get the message "The requested URL /galaxy/datasets/2faba7054d92b2df/display/ was not found on this server" from Apache. So I'm having a specific problem with displaying the whole dataset according to the URL it is trying to load. -Josh

-- Forwarded message --
From: Josh Nielsen jniel...@hudsonalpha.com
Date: Tue, Jan 17, 2012 at 10:22 AM
Subject: Issues displaying/downloading datasets
To: galaxy-dev@lists.bx.psu.edu

[...]
Re: [galaxy-dev] How and where to install tool dependencies
Thanks Nate! Updated documentation is always welcome and useful. I appreciate the clarifications. -J

On Mon, Dec 12, 2011 at 10:18 AM, Nate Coraor n...@bx.psu.edu wrote:

> On Dec 9, 2011, at 4:34 PM, Josh Nielsen wrote:
>
>> Hello, I have a question which I have not seen specifically addressed in the online Galaxy wiki documentation about how to integrate tools (dependencies) into Galaxy. I have implemented a locally managed instance of Galaxy that my business is using with our cluster and now have a freshly installed and configured instance of Galaxy running. It is bare-bones right now and I did not use mercurial to sync any existing files/directory structures. I have seen the page on external tool dependencies ( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Dependencies) needed for Galaxy, but I am somewhat unsure where to place the tools to utilize them as intended (other than through trial and error). It appears that there are shell directories for the tools under ~/galaxy-dist/tools/ with basic wrapper scripts but without the corresponding executables (very few that I've noticed have the tools already in them). Is the intent to download the dependency tools and (building from source if necessary) take the binaries in those directories and copy them to their corresponding directory under ~/galaxy-dist/tools/? This seems to have worked with an error I first got when clipping a FASTQ file which reported that fastx_clipper was not a recognized command. So I downloaded the FASTX Toolkit, compiled the binaries, and copied only the binaries into the corresponding fastx tools directory. Would I do the same thing for TopHat and Cufflinks by taking all their binaries (combined) and copying them into ~/galaxy-dist/tools/ngs_rna/?
>
> Hi Josh, There are two ways to do this. The simplest is to place the binaries into a directory on the Galaxy user's $PATH. The second is via the tool dependency system, which I need to write up documentation for to put in the wiki, which I'll do this week.
>
>> Even if that is the case though, I have occasionally gotten errors about tools missing in completely different directories. One was for the FASTQ Groomer. One user saw this error in their browser (which for now is the only way I know to figure out where tools are *expected* to be):
>>
>> File /home/galaxy/galaxy-dist/tools/rgenetics/rgFastQC.py, line 141, in
>> assert os.path.isfile(opts.executable),'##rgFastQC.py error - cannot find executable %s' % opts.executable
>> AssertionError: ##rgFastQC.py error - cannot find executable /home/galaxy/galaxy-dist/tool-data/shared/jars/FastQC/fastqc
>
> Java JARs are a special case, and FastQC has a unique way of locating its jars, which is why it is expected to be found in that directory. This needs to be documented.
>
>> To fix this I downloaded the FastQC tar file from its webpage, unzipped it, and copied the fastqc binary/script to the /home/galaxy/galaxy-dist/tool-data/shared/jars/FastQC/ directory. I also had to mkdir FastQC/ under jars/ to place it there since it didn't already exist. Had I not been told the specific directory by the error I'm not sure how I would have intuitively known to place the binary there (unless I'm overlooking some critical documentation). And how do I know that other similar things are not missing which should be there? Can anyone shed some light on this please? Adding a brief page on the Galaxy wiki site under the Admin section about this would really help, even if it only showed an example for one or two specific tools.
>
> The list of external dependencies by tool is maintained here: http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Dependencies I'll update this page with links to the new documentation when I write it. I should also add that work is under way to make it possible to automatically install these dependencies as needed. --nate
>
>> Thanks, Josh

-- Josh Nielsen Systems Administrator HudsonAlpha Institute for Biotechnology 256-319-1485
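For the first approach Nate describes (binaries on the Galaxy user's $PATH), it can help to confirm which executables actually resolve before submitting jobs. Below is a minimal sketch using the standard-library `shutil.which` (Python 3.3 or later). The tool names are only examples taken from this thread, and `missing_executables` is a hypothetical helper, not part of Galaxy.

```python
import shutil


def missing_executables(names):
    """Return the names that cannot be found on the current $PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Example tool names from this thread; adjust to the tools you enable.
    wanted = ["fastx_clipper", "tophat", "cufflinks"]
    for name in missing_executables(wanted):
        print("not on PATH: %s" % name)
```

The result depends on the environment of whoever runs it, so it is only meaningful when run as the same user, and on the same kind of node, that will execute the Galaxy jobs.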
Re: [galaxy-dev] How and where to install tool dependencies
Thanks Jeremy, That does clear quite a few things up. I will probably take the route of just adding the directories that contain the tool executables to my path. And I guess I'll just pay attention to which tools are Java based and copy the executables under the jars directory. Thanks for looking on the wiki too. If you find (or create) a suitable wiki page sometime soon would you be so kind as to post a link to it here, for myself and posterity? That would be great. Thanks for your help! P.S. Sorry for the 'double post'. I don't use mailing lists often. -J

On Sat, Dec 10, 2011 at 8:33 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote:

> Josh,
>
>> It appears that there are shell directories for the tools under ~/galaxy-dist/tools/ with basic wrapper scripts but without the corresponding executables (very few that I've noticed have the tools already in them). Is the intent to download the dependency tools and (building from source if necessary) take the binaries in those directories and copy them to their corresponding directory under ~/galaxy-dist/tools/? This seems to have worked with an error I first got when clipping a FASTQ file which reported that fastx_clipper was not a recognized command. So I downloaded the FASTX Toolkit, compiled the binaries, and copied only the binaries into the corresponding fastx tools directory. Would I do the same thing for TopHat and Cufflinks by taking all their binaries (combined) and copying them into ~/galaxy-dist/tools/ngs_rna/?
>
> You'll want to read about Galaxy Tool files a bit to understand the files in ~/galaxy-dist/tools: http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax#Admin.2BAC8-Tools.2BAC8-Tool_Config_Syntax.Galaxy_Tool_XML_File These are not shell directories; instead, they include tool config files + additional wrapper scripts to run a tool in Galaxy. To answer your question, executables for tools need to be in your path but do not need to be in the config/wrapper directories. For example, in an SGE cluster, we suggest setting the PATH environment var in ~/.sge_request
>
>> Even if that is the case though, I have occasionally gotten errors about tools missing in completely different directories. One was for the FASTQ Groomer. One user saw this error in their browser (which for now is the only way I know to figure out where tools are *expected* to be):
>>
>> File /home/galaxy/galaxy-dist/tools/rgenetics/rgFastQC.py, line 141, in
>> assert os.path.isfile(opts.executable),'##rgFastQC.py error - cannot find executable %s' % opts.executable
>> AssertionError: ##rgFastQC.py error - cannot find executable /home/galaxy/galaxy-dist/tool-data/shared/jars/FastQC/fastqc
>
> The exception to the above is Java-based tools. For these tools, you'll need to use the ~/galaxy-dist/shared/jars directory. This is a limitation of Galaxy that will likely be addressed in the future.
>
>> Adding a brief page on the Galaxy wiki site under the Admin section about this would really help, even if it only showed an example for one or two specific tools.
>
> I looked a bit but couldn't find it; I suspect it is out on the wiki somewhere, though clearly it needs to be easier to find. Good luck, J.

-- Josh Nielsen Systems Administrator HudsonAlpha Institute for Biotechnology 256-319-1485
[galaxy-dev] How and where to install tool dependencies
Hello, I have a question which I have not seen specifically addressed in the online Galaxy wiki documentation about how to integrate tools (dependencies) into Galaxy. I have implemented a locally managed instance of Galaxy that my business is using with our cluster and now have a freshly installed and configured instance of Galaxy running. It is bare-bones right now and I did not use mercurial to sync any existing files/directory structures. I have seen the page on external tool dependencies ( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Dependencies) needed for Galaxy, but I am somewhat unsure where to place the tools to utilize them as intended (other than through trial and error).

It appears that there are shell directories for the tools under ~/galaxy-dist/tools/ with basic wrapper scripts but without the corresponding executables (very few that I've noticed have the tools already in them). Is the intent to download the dependency tools and (building from source if necessary) take the binaries in those directories and copy them to their corresponding directory under ~/galaxy-dist/tools/? This seems to have worked with an error I first got when clipping a FASTQ file which reported that fastx_clipper was not a recognized command. So I downloaded the FASTX Toolkit, compiled the binaries, and copied only the binaries into the corresponding fastx tools directory. Would I do the same thing for TopHat and Cufflinks by taking all their binaries (combined) and copying them into ~/galaxy-dist/tools/ngs_rna/?

Even if that is the case though, I have occasionally gotten errors about tools missing in completely different directories. One was for the FASTQ Groomer. One user saw this error in their browser (which for now is the only way I know to figure out where tools are *expected* to be):

File /home/galaxy/galaxy-dist/tools/rgenetics/rgFastQC.py, line 141, in
assert os.path.isfile(opts.executable),'##rgFastQC.py error - cannot find executable %s' % opts.executable
AssertionError: ##rgFastQC.py error - cannot find executable /home/galaxy/galaxy-dist/tool-data/shared/jars/FastQC/fastqc

To fix this I downloaded the FastQC tar file from its webpage, unzipped it, and copied the fastqc binary/script to the /home/galaxy/galaxy-dist/tool-data/shared/jars/FastQC/ directory. I also had to mkdir FastQC/ under jars/ to place it there since it didn't already exist. Had I not been told the specific directory by the error I'm not sure how I would have intuitively known to place the binary there (unless I'm overlooking some critical documentation). And how do I know that other similar things are not missing which should be there? Can anyone shed some light on this please? Adding a brief page on the Galaxy wiki site under the Admin section about this would really help, even if it only showed an example for one or two specific tools. Thanks, Josh Nielsen
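The assertion in the traceback boils down to an `os.path.isfile` test on one fixed location, so the same test can be run ahead of time instead of waiting for a user to hit the error. Below is a small pre-flight sketch along those lines. It assumes the relative path shown in the error message, and `fastqc_installed` is a hypothetical helper that is not part of Galaxy or of rgFastQC.py.

```python
import os

# Relative location taken from the error message in the post; treat it as an
# assumption for any other Galaxy version.
FASTQC_RELATIVE = os.path.join("tool-data", "shared", "jars", "FastQC", "fastqc")


def fastqc_installed(galaxy_root):
    """Return True if the fastqc script exists where the wrapper looked for it."""
    return os.path.isfile(os.path.join(galaxy_root, FASTQC_RELATIVE))


if __name__ == "__main__":
    root = os.path.expanduser("~/galaxy-dist")
    if fastqc_installed(root):
        print("FastQC found")
    else:
        print("FastQC missing under %s" % root)
```

This only covers the one location reported in the post; it does not answer the broader question of which other tools expect files in fixed directories.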