Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Feb 8, 2012, at 9:32 PM, Fields, Christopher J wrote:

'samtools sort' seems to be running on our server end as well (not on the cluster). I may look into it a bit more myself. Snapshot of top off our server (you can see our local runner as well):

     PID USER   PR NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
    3950 galaxy 20  0 1303m 1.2g  676 R 99.7 15.2 234:48.07 samtools sort /home/a-m/galaxy/dist-database/file/000/dataset_587.dat /home/a-m/galaxy/dist-database/tmp/tmp9tv6zc/sorted
    5417 galaxy 20  0 1186m 104m 5384 S  0.3  1.3   0:15.08 python ./scripts/paster.py serve universe_wsgi.runner.ini --server-name=runner0 --pid-file=runner0.pid --log-file=runner0.log --daemon

Hi Chris,

'samtools sort' is run by groom_dataset_contents, which should only be called from within the upload tool; the upload tool should run on the cluster unless you still have the default local override for it in your job runner's config file.

Ryan's instance is running 'samtools index', which is called from set_meta. set_meta is supposed to run on the cluster if set_metadata_externally = True, but it can run locally under certain conditions.

--nate

chris

On Jan 20, 2012, at 10:43 AM, Shantanu Pavgi wrote:

Just wanted to add that we have consistently seen this issue of 'samtools index' running locally on our install. We are using the SGE scheduler. Thanks for pointing out the details in the code, Nate.

--
Shantanu

On Jan 20, 2012, at 9:35 AM, Nate Coraor wrote:

On Jan 18, 2012, at 11:54 AM, Ryan Golhar wrote:

Nate - Is there a specific place in the Galaxy code that forks the samtools index on BAM files, on the cluster or the head node? I really need to track this down.

Hey Ryan,

Sorry it's taken so long, I've been pretty busy. The relevant code is in galaxy-dist/lib/galaxy/datatypes/binary.py, in the Bam class. When Galaxy runs a tool, it creates a Job, which is placed inside a JobWrapper in lib/galaxy/jobs/__init__.py.
After the job execution is complete, the JobWrapper.finish() method is called, which contains:

    if not self.app.config.set_metadata_externally or \
            ( not self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) \
              and self.app.config.retry_metadata_internally ):
        dataset.set_meta( overwrite = False )

Somehow, this conditional is being entered. Since set_metadata_externally is set to True, presumably external_metadata_set_successfully() is returning False and retry_metadata_internally is set to True. If you leave behind the relevant job files (cleanup_job = never) and have a look at the PBS and metadata outputs, you may be able to see what's happening. Also, you'll want to set retry_metadata_internally = False.

--nate

On Fri, Jan 13, 2012 at 12:54 PM, Ryan Golhar ngsbioinformat...@gmail.com wrote:

I re-uploaded 3 BAM files using the Upload system file paths. runner0.log shows:

    galaxy.jobs DEBUG 2012-01-13 12:50:08,442 dispatching job 76 to pbs runner
    galaxy.jobs INFO 2012-01-13 12:50:08,555 job 76 dispatched
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,697 (76) submitting file /home/galaxy/galaxy-dist-9/database/pbs/76.sh
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,697 (76) command is: python /home/galaxy/galaxy-dist-9/tools/data_source/upload.py /home/galaxy/galaxy-dist-9 /home/galaxy/galaxy-dist-9/datatypes_conf.xml /home/galaxy/galaxy-dist-9/database/tmp/tmpqrVYY7 208:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_208_files:None 209:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_209_files:None 210:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_210_files:None; cd /home/galaxy/galaxy-dist-9; /home/galaxy/galaxy-dist-9/set_metadata.sh ./database/files ./database/tmp . datatypes_conf.xml ./database/job_working_directory/76/galaxy.json
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,699 (76) queued in default queue as 114.localhost.localdomain
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:09,037 (76/114.localhost.localdomain) PBS job state changed from N to R
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:09,205 (76/114.localhost.localdomain) PBS job state changed from R to E
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:10,206 (76/114.localhost.localdomain) PBS job state changed from E to C
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:10,206 (76/114.localhost.localdomain) PBS job has completed successfully

76.sh shows:

    [galaxy@bic pbs]$ more 76.sh
    #!/bin/sh
    GALAXY_LIB="/home/galaxy/galaxy-dist-9/lib"
    if [ "$GALAXY_LIB" != "None" ]; then
        if [ -n "$PYTHONPATH" ]; then
            export PYTHONPATH="$GALAXY_LIB:$PYTHONPATH"
        else
            export PYTHONPATH="$GALAXY_LIB"
        fi
    fi
    cd
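[Editor's note: the decision Nate describes in JobWrapper.finish() can be condensed into a small truth-table. This is a simplified sketch, not Galaxy's actual code; the function name and bare boolean parameters are illustrative only.]

```python
# Simplified sketch (not Galaxy's actual code) of the fallback conditional in
# JobWrapper.finish(): internal (head-node) metadata setting runs when external
# metadata is disabled, or when the external attempt failed and internal retry
# is allowed.
def should_set_meta_locally(set_metadata_externally: bool,
                            external_metadata_set_successfully: bool,
                            retry_metadata_internally: bool) -> bool:
    """Return True when dataset.set_meta() would run on the Galaxy server."""
    return (not set_metadata_externally
            or (not external_metadata_set_successfully
                and retry_metadata_internally))

# The situation in this thread: external metadata is enabled, the external
# attempt failed, and internal retry is on -- so samtools runs on the head node.
print(should_set_meta_locally(True, False, True))   # True: runs locally
# Nate's suggested mitigation: set retry_metadata_internally = False.
print(should_set_meta_locally(True, False, False))  # False: no local fallback
```

This makes clear why retry_metadata_internally = False only masks the symptom: the interesting question remains why the external attempt reports failure in the first place.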
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Feb 8, 2012, at 11:58 AM, Ryan Golhar wrote:

Hi Nate - I finally got a chance to look at this briefly, but I must admit, my Python skills are lacking. In the Bam class in binary.py, all I see are calls to

    proc = subprocess.Popen( args=command, shell=True, cwd=tmp_dir, stderr=open( stderr_name, 'wb' ) )

which, to me, look like calls to execute a command. So maybe Galaxy is running samtools on the webserver because of this?

This is indeed the place in the code where samtools is called, but that code can be called from within the external metadata setting tool or from the job runner. In your case, it's happening in the job runner despite having set_metadata_externally = True. Could you check the conditionals in the earlier email I sent:

The relevant code is in galaxy-dist/lib/galaxy/datatypes/binary.py, in the Bam class. When Galaxy runs a tool, it creates a Job, which is placed inside a JobWrapper in lib/galaxy/jobs/__init__.py. After the job execution is complete, the JobWrapper.finish() method is called, which contains:

    if not self.app.config.set_metadata_externally or \
            ( not self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) \
              and self.app.config.retry_metadata_internally ):
        dataset.set_meta( overwrite = False )

Somehow, this conditional is being entered. Since set_metadata_externally is set to True, presumably external_metadata_set_successfully() is returning False and retry_metadata_internally is set to True. If you leave behind the relevant job files (cleanup_job = never) and have a look at the PBS and metadata outputs, you may be able to see what's happening. Also, you'll want to set retry_metadata_internally = False.
Namely, try adding the following right above that conditional:

    log.debug(' %s: %s' % (type(self.app.config.set_metadata_externally), self.app.config.set_metadata_externally))
    log.debug(' %s: %s' % (type(self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session )), self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session )))
    log.debug(' %s: %s' % (type(self.app.config.retry_metadata_internally), self.app.config.retry_metadata_internally))

I am guessing self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) is returning False, and self.app.config.retry_metadata_internally is True, so then we'd need to determine why external metadata is failing for this job.

--nate

On Fri, Jan 20, 2012 at 11:43 AM, Shantanu Pavgi pa...@uab.edu wrote:

Just wanted to add that we have consistently seen this issue of 'samtools index' running locally on our install. We are using the SGE scheduler. Thanks for pointing out the details in the code, Nate.

--
Shantanu

On Jan 20, 2012, at 9:35 AM, Nate Coraor wrote:

On Jan 18, 2012, at 11:54 AM, Ryan Golhar wrote:

Nate - Is there a specific place in the Galaxy code that forks the samtools index on BAM files, on the cluster or the head node? I really need to track this down.

Hey Ryan,

Sorry it's taken so long, I've been pretty busy. The relevant code is in galaxy-dist/lib/galaxy/datatypes/binary.py, in the Bam class. When Galaxy runs a tool, it creates a Job, which is placed inside a JobWrapper in lib/galaxy/jobs/__init__.py. After the job execution is complete, the JobWrapper.finish() method is called, which contains:

    if not self.app.config.set_metadata_externally or \
            ( not self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) \
              and self.app.config.retry_metadata_internally ):
        dataset.set_meta( overwrite = False )

Somehow, this conditional is being entered.
Since set_metadata_externally is set to True, presumably external_metadata_set_successfully() is returning False and retry_metadata_internally is set to True. If you leave behind the relevant job files (cleanup_job = never) and have a look at the PBS and metadata outputs, you may be able to see what's happening. Also, you'll want to set retry_metadata_internally = False.

--nate

On Fri, Jan 13, 2012 at 12:54 PM, Ryan Golhar ngsbioinformat...@gmail.com wrote:

I re-uploaded 3 BAM files using the Upload system file paths. runner0.log shows:

    galaxy.jobs DEBUG 2012-01-13 12:50:08,442 dispatching job 76 to pbs runner
    galaxy.jobs INFO 2012-01-13 12:50:08,555 job 76 dispatched
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,697 (76) submitting file /home/galaxy/galaxy-dist-9/database/pbs/76.sh
    galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,697 (76) command is: python /home/galaxy/galaxy-dist-9/tools/data_source/upload.py
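[Editor's note: for readers unfamiliar with the subprocess.Popen pattern Ryan quotes from the Bam class, here is a self-contained sketch. The helper name and the example command are hypothetical, not Galaxy's code; it only illustrates the idiom of running a shell command with stderr captured to a file.]

```python
import os
import subprocess
import tempfile

# Sketch of the pattern quoted from the Bam class: build a command string,
# run it with subprocess.Popen(shell=True), capture stderr to a temp file.
# (Hypothetical helper; command and file names are illustrative.)
def run_command(command):
    tmp_dir = tempfile.mkdtemp()
    stderr_name = os.path.join(tmp_dir, 'stderr')
    with open(stderr_name, 'wb') as stderr_file:
        proc = subprocess.Popen(args=command, shell=True, cwd=tmp_dir,
                                stderr=stderr_file)
        exit_code = proc.wait()  # block until the child process finishes
    with open(stderr_name, 'rb') as f:
        stderr_output = f.read()
    return exit_code, stderr_output

# Popen has no notion of a cluster: it forks on whatever host executes this
# Python code. So the question is not what this call does, but which process
# (job runner on the head node vs. external metadata job) reaches it.
code, err = run_command('echo indexing 1>&2')
```

In other words, Ryan's instinct is right that this line launches samtools, but as Nate notes, the fix lies in which code path reaches it, not in the call itself.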
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Feb 13, 2012, at 9:45 AM, Nate Coraor wrote:

On Feb 8, 2012, at 9:32 PM, Fields, Christopher J wrote:

'samtools sort' seems to be running on our server end as well (not on the cluster). I may look into it a bit more myself. Snapshot of top off our server (you can see our local runner as well):

     PID USER   PR NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
    3950 galaxy 20  0 1303m 1.2g  676 R 99.7 15.2 234:48.07 samtools sort /home/a-m/galaxy/dist-database/file/000/dataset_587.dat /home/a-m/galaxy/dist-database/tmp/tmp9tv6zc/sorted
    5417 galaxy 20  0 1186m 104m 5384 S  0.3  1.3   0:15.08 python ./scripts/paster.py serve universe_wsgi.runner.ini --server-name=runner0 --pid-file=runner0.pid --log-file=runner0.log --daemon

Hi Chris,

'samtools sort' is run by groom_dataset_contents, which should only be called from within the upload tool, which should run on the cluster unless you still have the default local override for it in your job runner's config file.

Yes, that is likely the problem. Our cluster was running an old version of Python (2.4) that was also built with UCS2 (bx_python broke), so we were running locally. That was rectified this past week (the admins insisted on not installing a Python version locally, so we insisted back that they install something modern built with UCS4). I tested a single upload with success off the cluster, so I would guess this is fixed (I'll confirm that). Is there any information on data grooming on the wiki? I only found info relevant to FASTQ grooming, not SAM/BAM.

Ryan's instance is running 'samtools index', which is in set_meta, which is supposed to run on the cluster if set_metadata_externally = True, but can run locally under certain conditions.

--nate

Will have to check, but I believe we have not set that yet either. We are in the midst of moving all jobs to the cluster, just rectifying the various issues with disparate Python versions, etc., which now seem to be resolved, so that will shortly be taken care of as well.

chris

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Feb 13, 2012, at 11:52 AM, Fields, Christopher J wrote:

[...]

'samtools sort' is run by groom_dataset_contents, which should only be called from within the upload tool, which should run on the cluster unless you still have the default local override for it in your job runner's config file.

Yes, that is likely the problem. Our cluster was running an old version of Python (2.4) that was also built with UCS2 (bx_python broke), so we were running locally. That was rectified this past week. I tested a single upload with success off the cluster, so I would guess this is fixed (I'll confirm that). Is there any information on data grooming on the wiki? I only found info relevant to FASTQ grooming, not SAM/BAM.

FASTQ grooming runs voluntarily as a tool. The datatype grooming method is only called at the end of the upload tool, and is only defined for the Bam datatype (although other datatypes could define it). I believe it's implemented this way because it was deemed inefficient to force FASTQ grooming when the FASTQ may already be in an acceptable format. I am not sure why the same determination was not made for BAM, so perhaps one of my colleagues will clarify that.

Ryan's instance is running 'samtools index', which is in set_meta, which is supposed to run on the cluster if set_metadata_externally = True, but can run locally under certain conditions.

--nate

Will have to check, but I believe we have not set that yet either. We are in the midst of moving all jobs to the cluster, just rectifying the various issues with disparate Python versions, etc., which now seem to be resolved, so that will shortly be taken care of as well.

set_metadata_externally = True should just work, and will significantly decrease the performance penalty taken on the server and by the (effectively single-threaded) Galaxy process.

--nate

chris
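[Editor's note: pulling together the options mentioned across this thread, here is a sketch of the relevant settings in Galaxy's main config file of this era (universe_wsgi.ini). The option names all appear in the thread; exact placement and defaults may vary by Galaxy version, so treat this as illustrative rather than authoritative.]

```ini
; universe_wsgi.ini -- settings discussed in this thread
; (section placement and defaults may differ by Galaxy version)

; Run the metadata-setting step (e.g. 'samtools index' for BAM) as a
; cluster job instead of inside the Galaxy server process.
set_metadata_externally = True

; Do NOT fall back to setting metadata on the head node when the external
; attempt fails -- this fallback is what ran samtools locally for Ryan.
retry_metadata_internally = False

; While debugging, keep job files around so the PBS and metadata outputs
; can be inspected after the job completes.
cleanup_job = never
```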
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Thu, Feb 9, 2012 at 2:57 AM, Fields, Christopher J cjfie...@illinois.edu wrote:

Forgot to add, but this also seems tied to the same problem Ryan's describing. IIRC Galaxy also runs 'samtools sort' after certain jobs, correct?

chris

This sounds like part of the BAM grooming (assuming that's what the Galaxy team calls it, based on their term "FASTQ grooming"), which tries to ensure BAM files are coordinate-sorted and indexed on upload/import etc. Perhaps this is a general symptom of file format conversion also being done on the server rather than as a cluster job?

Peter
Re: [galaxy-dev] Status on importing BAM file into Library does not update
Hi Nate - I finally got a chance to look at this briefly, but I must admit, my Python skills are lacking. In the Bam class in binary.py, all I see are calls to

    proc = subprocess.Popen( args=command, shell=True, cwd=tmp_dir, stderr=open( stderr_name, 'wb' ) )

which, to me, look like calls to execute a command. So maybe Galaxy is running samtools on the webserver because of this?

On Fri, Jan 20, 2012 at 11:43 AM, Shantanu Pavgi pa...@uab.edu wrote:

Just wanted to add that we have consistently seen this issue of 'samtools index' running locally on our install. We are using the SGE scheduler. Thanks for pointing out the details in the code, Nate.

--
Shantanu

On Jan 20, 2012, at 9:35 AM, Nate Coraor wrote:

[...]

On Fri, Jan 13, 2012 at 12:54 PM, Ryan Golhar ngsbioinformat...@gmail.com wrote:

I re-uploaded 3 BAM files using the Upload system file paths.

[...]

Right as the job ended I checked the job output files:

    [galaxy@bic pbs]$ ll
    total 4
    -rw-rw-r-- 1 galaxy galaxy 950 Jan 13
Re: [galaxy-dev] Status on importing BAM file into Library does not update
'samtools sort' seems to be running on our server end as well (not on the cluster). I may look into it a bit more myself. Snapshot of top off our server (you can see our local runner as well):

     PID USER   PR NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
    3950 galaxy 20  0 1303m 1.2g  676 R 99.7 15.2 234:48.07 samtools sort /home/a-m/galaxy/dist-database/file/000/dataset_587.dat /home/a-m/galaxy/dist-database/tmp/tmp9tv6zc/sorted
    5417 galaxy 20  0 1186m 104m 5384 S  0.3  1.3   0:15.08 python ./scripts/paster.py serve universe_wsgi.runner.ini --server-name=runner0 --pid-file=runner0.pid --log-file=runner0.log --daemon

chris

On Jan 20, 2012, at 10:43 AM, Shantanu Pavgi wrote:

[...]
Re: [galaxy-dev] Status on importing BAM file into Library does not update
Forgot to add, but this also seems tied to the same problem Ryan's describing. IIRC Galaxy also runs 'samtools sort' after certain jobs, correct?

chris

On Feb 8, 2012, at 8:32 PM, Fields, Christopher J wrote:

[...]
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Jan 18, 2012, at 11:54 AM, Ryan Golhar wrote: Nate - Is there a specific place in the Galaxy code that forks the samtools index on bam files on the cluster or the head node? I really need to track this down. Hey Ryan, Sorry it's taken so long, I've been pretty busy. The relevant code is in galaxy-dist/lib/galaxy/datatypes/binary.py, in the Bam class. When Galaxy runs a tool, it creates a Job, which is placed inside a JobWrapper in lib/galaxy/jobs/__init__.py. After the job execution is complete, the JobWrapper.finish() method is called, which contains: if not self.app.config.set_metadata_externally or \ ( not self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) \ and self.app.config.retry_metadata_internally ): dataset.set_meta( overwrite = False ) Somehow, this conditional is being entered. Since set_metadata_externally is set to True, presumably the problem is external_metadata_set_successfully() is returning False and retry_metadata_internally is set to True. If you leave behind the relevant job files (cleanup_job = never) and have a look at the PBS and metadata outputs you may be able to see what's happening. Also, you'll want to set retry_metadata_internally = False. --nate On Fri, Jan 13, 2012 at 12:54 PM, Ryan Golhar ngsbioinformat...@gmail.com wrote: I re-uploaded 3 BAM files using the Upload system file paths. 
runner0.log shows:

galaxy.jobs DEBUG 2012-01-13 12:50:08,442 dispatching job 76 to pbs runner
galaxy.jobs INFO 2012-01-13 12:50:08,555 job 76 dispatched
galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,697 (76) submitting file /home/galaxy/galaxy-dist-9/database/pbs/76.sh
galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,697 (76) command is: python /home/galaxy/galaxy-dist-9/tools/data_source/upload.py /home/galaxy/galaxy-dist-9 /home/galaxy/galaxy-dist-9/datatypes_conf.xml /home/galaxy/galaxy-dist-9/database/tmp/tmpqrVYY7 208:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_208_files:None 209:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_209_files:None 210:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_210_files:None; cd /home/galaxy/galaxy-dist-9; /home/galaxy/galaxy-dist-9/set_metadata.sh ./database/files ./database/tmp . datatypes_conf.xml ./database/job_working_directory/76/galaxy.json
galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,699 (76) queued in default queue as 114.localhost.localdomain
galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:09,037 (76/114.localhost.localdomain) PBS job state changed from N to R
galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:09,205 (76/114.localhost.localdomain) PBS job state changed from R to E
galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:10,206 (76/114.localhost.localdomain) PBS job state changed from E to C
galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:10,206 (76/114.localhost.localdomain) PBS job has completed successfully

76.sh shows:

[galaxy@bic pbs]$ more 76.sh
#!/bin/sh
GALAXY_LIB="/home/galaxy/galaxy-dist-9/lib"
if [ "$GALAXY_LIB" != "None" ]; then
    if [ -n "$PYTHONPATH" ]; then
        export PYTHONPATH="$GALAXY_LIB:$PYTHONPATH"
    else
        export PYTHONPATH="$GALAXY_LIB"
    fi
fi
cd /home/galaxy/galaxy-dist-9/database/job_working_directory/76
python /home/galaxy/galaxy-dist-9/tools/data_source/upload.py /home/galaxy/galaxy-dist-9 /home/galaxy/galaxy-dist-9/datatypes_conf.xml /home/galaxy/galaxy-dist-9/database/tmp/tmpqrVYY7 208:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_208_files:None 209:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_209_files:None 210:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_210_files:None; cd /home/galaxy/galaxy-dist-9; /home/galaxy/galaxy-dist-9/set_metadata.sh ./database/files ./database/tmp . datatypes_conf.xml ./database/job_working_directory/76/galaxy.json

Right as the job ended I checked the job output files:

[galaxy@bic pbs]$ ll
total 4
-rw-rw-r-- 1 galaxy galaxy 950 Jan 13 12:50 76.sh
[galaxy@bic pbs]$ ll
total 4
-rw------- 1 galaxy galaxy 0 Jan 13 12:50 76.e
-rw------- 1 galaxy galaxy 0 Jan 13 12:50 76.o
-rw-rw-r-- 1 galaxy galaxy 950 Jan 13 12:50 76.sh

samtools is now running on the head node. Where does Galaxy decide how to run samtools? Maybe I can add a check of some sort to see what's going on?
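Nate's explanation of the JobWrapper.finish() fallback can be condensed into a small truth table. The sketch below is illustrative only: the function name and parameters are hypothetical stand-ins for the config flags and the external_metadata_set_successfully() check quoted above, not Galaxy's actual API.

```python
# Hypothetical condensation of the fallback quoted from JobWrapper.finish().
# set_meta() runs in the local runner process (spawning samtools on the
# head node) when external metadata is disabled outright, or when the
# external attempt failed and internal retry is enabled.

def set_meta_runs_locally(set_metadata_externally, external_set_ok,
                          retry_metadata_internally):
    """Mirror the quoted conditional; True means dataset.set_meta()
    executes in the runner process on the head node."""
    return (not set_metadata_externally
            or (not external_set_ok and retry_metadata_internally))

# The situation described in this thread: external metadata is on, but the
# external attempt is judged unsuccessful and internal retry is enabled.
print(set_meta_runs_locally(True, False, True))   # True  -> samtools on head node

# With retry_metadata_internally = False, as Nate suggests:
print(set_meta_runs_locally(True, False, False))  # False -> no local fallback
```

This is why setting retry_metadata_internally = False helps: it closes the only path by which a job configured for external metadata can fall back to indexing on the head node.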
Galaxy shouldn't be trying to do that, but it also shouldn't cause metadata to fail.

--nate

On Jan 20, 2012, at 10:52 AM, Ryan Golhar wrote:

Thanks Nate. I'll play with that. Could it be that Galaxy is trying to reset the permissions or ownership of the imported BAM files? I'm not copying them into Galaxy, rather I am linking to them. That is the only error I see in runner0.log that indicates any type of failure.
On Jan 12, 2012, at 11:41 PM, Ryan Golhar wrote:

Any ideas as to how to fix this? We are interested in using Galaxy to host all our NGS data. If indexing on the head node is going to happen, then this is going to be an extremely slow process.

Could you post the contents of /home/galaxy/galaxy-dist-9/database/pbs/62.sh ? Although I have to admit this is really baffling. The presence of this line without an error:

galaxy.datatypes.metadata DEBUG 2012-01-11 10:22:40,162 Cleaning up external metadata files

indicates that metadata was set externally and the relevant metadata files were present on disk.

--nate

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
On Jan 10, 2012, at 10:20 PM, Ryan Golhar wrote:

set_metadata_externally is definitely set to True. I added one line to check this:

    self.set_metadata_externally = string_as_bool( kwargs.get( "set_metadata_externally", False ) )
    self.retry_metadata_internally = string_as_bool( kwargs.get( "retry_metadata_internally", True ) )
+   log.warning( "Ryan Golhar - self.set_metadata_externally = %s" % self.set_metadata_externally )

and the log files show:

[galaxy@bic galaxy-dist]$ grep Ryan *.log
runner0.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
web0.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
web1.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
web2.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
web3.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
web4.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True

so I know it's being read in correctly.
I then proceeded to add the same check in /home/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py:

    # but somewhat trickier (need to recurse up the copied_from tree), for now we'll call set_meta()
+   log.warning( "Ryan Golhar - self.set_metadata_externally = %s" % self.app.config.set_metadata_externally )
    if not self.app.config.set_metadata_externally or \

and it is also set to True:

[galaxy@bic galaxy-dist]$ grep Ryan *.log
runner0.log:galaxy.jobs WARNING 2012-01-10 22:17:26,381 Ryan Golhar - self.set_metadata_externally = True

Clearly something else is going on here. On my last import of BAM files, even after samtools finished indexing the BAM files, Galaxy never registered those jobs as completed, so this problem is still there as well.

Thanks for checking these. So, what does your server log for runner0 look like?

--nate
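For readers unfamiliar with it: the string_as_bool() helper in the config snippet above converts the ini-file string into a Python bool. The sketch below approximates its behavior; it is not the verbatim Galaxy source, and the exact set of accepted spellings is an assumption.

```python
def string_as_bool(value):
    """Approximation of Galaxy's galaxy.util.string_as_bool: a few
    case-insensitive truthy spellings map to True, anything else to
    False. (Assumed behavior, not the verbatim implementation.)"""
    return str(value).lower() in ("true", "yes", "on", "1")

# So 'set_metadata_externally = True' in universe_wsgi.ini parses as:
print(string_as_bool("True"))   # True

# while a typo such as 'Ture' would silently parse as False:
print(string_as_bool("Ture"))   # False
```

The silent-False behavior is why Nate's earlier advice to double-check the config for typos matters: a misspelled value doesn't raise an error, it just disables the option.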
On Jan 11, 2012, at 10:56 AM, Ryan Golhar wrote:

Clearly something else is going on here. On my last import of BAM files, even after samtools finished indexing the BAM files, Galaxy never registered those jobs as completed, so this problem is still there as well.

Sorry, I should have been more specific. What are the contents of the log from job start to finish?

--nate
On Wed, Jan 11, 2012 at 11:16 AM, Nate Coraor n...@bx.psu.edu wrote:

What are the contents of the log from job start to finish?

I think this is it. The rest looks like stuff for jobs I started today. I wonder if Galaxy is running the jobs on the head node because it's unable to change the permissions of the BAM files. The BAM files are located on a read-only NFS mount.
serving on http://127.0.0.1:8079
galaxy.jobs DEBUG 2012-01-10 22:16:54,322 dispatching job 62 to pbs runner
galaxy.jobs INFO 2012-01-10 22:16:54,436 job 62 dispatched
galaxy.jobs.runners.pbs DEBUG 2012-01-10 22:16:54,595 (62) submitting file /home/galaxy/galaxy-dist-9/database/pbs/62.sh
galaxy.jobs.runners.pbs DEBUG 2012-01-10 22:16:54,596 (62) command is: python /home/galaxy/galaxy-dist-9/tools/data_source/upload.py /home/galaxy/galaxy-dist-9 /home/galaxy/galaxy-dist-9/datatypes_conf.xml /home/galaxy/galaxy-dist-9/database/tmp/tmpaEn2jR 181:/home/galaxy/galaxy-dist-9/database/job_working_directory/62/dataset_181_files:None 182:/home/galaxy/galaxy-dist-9/database/job_working_directory/62/dataset_182_files:None 183:/home/galaxy/galaxy-dist-9/database/job_working_directory/62/dataset_183_files:None 184:/home/galaxy/galaxy-dist-9/database/job_working_directory/62/dataset_184_files:None; cd /home/galaxy/galaxy-dist-9; /home/galaxy/galaxy-dist-9/set_metadata.sh ./database/files ./database/tmp . datatypes_conf.xml ./database/job_working_directory/62/galaxy.json
galaxy.jobs.runners.pbs DEBUG 2012-01-10 22:16:54,597 (62) queued in default queue as 73.localhost.localdomain
galaxy.jobs.runners.pbs DEBUG 2012-01-10 22:16:55,179 (62/73.localhost.localdomain) PBS job state changed from N to R
galaxy.jobs.runners.pbs DEBUG 2012-01-10 22:17:26,324 (62/73.localhost.localdomain) PBS job state changed from R to C
galaxy.jobs.runners.pbs DEBUG 2012-01-10 22:17:26,324 (62/73.localhost.localdomain) PBS job has completed successfully
galaxy.jobs WARNING 2012-01-10 22:17:26,381 Ryan Golhar - self.set_metadata_externally = True
galaxy.jobs WARNING 2012-01-11 01:29:53,110 Ryan Golhar - self.set_metadata_externally = True
galaxy.jobs WARNING 2012-01-11 04:30:06,043 Ryan Golhar - self.set_metadata_externally = True
galaxy.jobs WARNING 2012-01-11 07:41:02,917 Ryan Golhar - self.set_metadata_externally = True
galaxy.util WARNING 2012-01-11 10:22:40,046 Unable to honor umask (02) for /mnt/isilon/cag/ngs/hiseq/golharr/Bookman/5500xl_5_2011_BK_WTS_03_122211_1_06-2-1.bam, tried to set: 0664 but mode remains 0771, error was: [Errno 30] Read-only file system: '/mnt/isilon/cag/ngs/hiseq/golharr/Bk/5500xl_5_2011_BK_WTS_03_122211_1_06-2-1.bam'
galaxy.util WARNING 2012-01-11 10:22:40,066 Unable to honor primary group (grp.struct_group(gr_name='galaxy', gr_passwd='x', gr_gid=503, gr_mem=[])) for /mnt/isilon/cag/ngs/hiseq/golharr/Bookman/5500xl_5_2011_BK_WTS_03_122211_1_06-2-1.bam, group remains grp.struct_group(gr_name='cag_lab2', gr_passwd='x', gr_gid=10726, gr_mem=['ryang', 'lifeng', 'galaxy']), error was: [Errno 30] Read-only file system: '/mnt/isilon/cag/ngs/hiseq/golharr/Bk/5500xl_5_2011_BK_WTS_03_122211_1_06-2-1.bam'
galaxy.util WARNING 2012-01-11 10:22:40,069 Unable to honor umask (02) for /mnt/isilon/cag/ngs/hiseq/golharr/Bookman/5500_4_BK_WTS_03_20111222_1_04-1-1.bam, tried to set: 0664 but mode remains 0771, error was: [Errno 30] Read-only file system: '/mnt/isilon/cag/ngs/hiseq/golharr/Bk/5500_4_BK_WTS_03_20111222_1_04-1-1.bam'
galaxy.util WARNING 2012-01-11 10:22:40,069 Unable to honor primary group (grp.struct_group(gr_name='galaxy', gr_passwd='x', gr_gid=503, gr_mem=[])) for /mnt/isilon/cag/ngs/hiseq/golharr/Bk/5500_4_BK_WTS_03_20111222_1_04-1-1.bam, group remains grp.struct_group(gr_name='cag_lab2', gr_passwd='x', gr_gid=10726, gr_mem=['ryang', 'lifeng', 'galaxy']), error was: [Errno 30] Read-only file system: '/mnt/isilon/cag/ngs/hiseq/golharr/Bk/5500_4_BK_WTS_03_20111222_1_04-1-1.bam'
galaxy.util WARNING 2012-01-11 10:22:40,072 Unable to honor umask (02) for /mnt/isilon/cag/ngs/hiseq/golharr/Bk/5500xl_5_2011_BK_WTS_03_122211_1_04-1-1.bam, tried to set: 0664 but mode remains 0771, error was: [Errno 30] Read-only file system: '/mn
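The "Unable to honor umask" lines above are warnings, not errors: a chmod() on a file that lives on a read-only export raises EROFS (errno 30), and the caller logs the failure and continues. A minimal sketch of that warn-and-continue pattern is below; the function is illustrative, not Galaxy's actual galaxy.util code.

```python
import errno
import os
import tempfile
from typing import Optional

def try_set_mode(path: str, mode: int = 0o664) -> Optional[str]:
    """Apply a mode; on a read-only filesystem (errno 30, EROFS) return a
    warning string instead of raising, mirroring the galaxy.util log lines."""
    try:
        os.chmod(path, mode)
        return None  # mode applied, nothing to warn about
    except OSError as e:
        if e.errno == errno.EROFS:
            return "Unable to honor umask for %s, error was: %s" % (path, e)
        raise  # any other failure is unexpected

# On a writable file the call succeeds and returns no warning:
with tempfile.NamedTemporaryFile() as f:
    print(try_set_mode(f.name))  # None
```

Because the BAM files were linked rather than copied, the paths Galaxy touches resolve to the read-only NFS mount, which is exactly the condition that produces these warnings.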
On Jan 9, 2012, at 2:38 PM, Ryan Golhar wrote:

On Fri, Jan 6, 2012 at 12:55 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

This indicates that set_meta is running locally, in the runner process. Can you make sure there's not a typo in your config? The other possibility is that external metadata setting failed and it's being retried internally (if that were true, you'd see messages indicating such in the server log).

I'm pretty sure there isn't a typo. Here is everything metadata-related (with comment lines removed) in my universe_wsgi.*.ini files:

  [galaxy@bic galaxy-dist]$ grep set_meta *.ini
  universe_wsgi.ini:set_metadata_externally = True
  universe_wsgi.runner.ini:set_metadata_externally = True
  universe_wsgi.webapp.ini:set_metadata_externally = True
  [galaxy@bic galaxy-dist]$ grep meta *.ini
  universe_wsgi.ini:set_metadata_externally = True
  universe_wsgi.runner.ini:set_metadata_externally = True
  universe_wsgi.webapp.ini:set_metadata_externally = True

I just tried it again on some BAM files, and nothing comes up in /var/log/messages or /var/log/httpd/error_log. runner0.log also doesn't show anything except the upload job being completed. I'm still trying to track this one down. Can I add a debug output string to show what the value of set_metadata_externally is when it's read in? If so, where would I do this?

Hi Ryan,

You could check it in lib/galaxy/config.py, after it's read. By any chance, are you using galaxy-central vs. galaxy-dist? It's possible that due to a bug I recently fixed and a certain combination of options, metadata for BAMs would always fail externally and be retried internally, although you should still see log messages indicating that this has happened.

--nate

___
Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
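Nate's suggestion — log the value right after it is parsed in lib/galaxy/config.py — can be sketched as follows. This is a standalone illustration, not the actual Galaxy source; string_as_bool here is a simplified stand-in for the helper in galaxy.util:

```python
import logging

logging.basicConfig()
log = logging.getLogger(__name__)

def string_as_bool(value):
    # Simplified stand-in for galaxy.util.string_as_bool: ini values arrive
    # as strings, so 'True', 'true', 'yes', and '1' should all parse as True.
    return str(value).lower() in ('true', 'yes', 'on', '1')

# Roughly what happens when the option is read from universe_wsgi.ini:
kwargs = {'set_metadata_externally': 'True'}
set_metadata_externally = string_as_bool(kwargs.get('set_metadata_externally', False))
log.warning('set_metadata_externally = %s', set_metadata_externally)
```

A WARNING line like the one above, appearing in runner0.log at startup, is exactly the confirmation Ryan produces later in the thread.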
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Tue, Jan 10, 2012 at 11:43 AM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

Hi Ryan,

You could check it in lib/galaxy/config.py, after it's read. By any chance, are you using galaxy-central vs. galaxy-dist? It's possible that due to a bug I recently fixed and a certain combination of options, metadata for BAMs would always fail externally and be retried internally, although you should still see log messages indicating that this has happened.

--nate

set_metadata_externally is definitely set to True. I added one line to check this:

    self.set_metadata_externally = string_as_bool( kwargs.get( "set_metadata_externally", False ) )
    self.retry_metadata_internally = string_as_bool( kwargs.get( "retry_metadata_internally", True ) )
  + log.warning( "Ryan Golhar - self.set_metadata_externally = %s" % self.set_metadata_externally )

and the log files show:

  [galaxy@bic galaxy-dist]$ grep Ryan *.log
  runner0.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
  web0.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
  web1.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
  web2.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
  web3.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True
  web4.log:WARNING:galaxy.config:Ryan Golhar - self.set_metadata_externally = True

so I know it's being read in correctly. I then proceeded to add the same check in /home/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py:

    # but somewhat trickier (need to recurse up the copied_from tree), for now we'll call set_meta()
  + log.warning( "Ryan Golhar - self.set_metadata_externally = %s" % self.app.config.set_metadata_externally )
    if not self.app.config.set_metadata_externally or \

and it is also set to True:

  [galaxy@bic galaxy-dist]$ grep Ryan *.log
  runner0.log:galaxy.jobs WARNING 2012-01-10 22:17:26,381 Ryan Golhar - self.set_metadata_externally = True

Clearly something else is going on here.

On my last import of BAM files, even after samtools finished indexing them, Galaxy never registered those jobs as completed, so that problem is still there as well.
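The branch being hit is the conditional quoted earlier from JobWrapper.finish(). Reduced to a standalone predicate (the function and parameter names here are illustrative, not the Galaxy API), the logic makes the diagnosis concrete: with set_metadata_externally confirmed True, the only way into the local set_meta() call is a failed external metadata run combined with retry_metadata_internally left at its default of True.

```python
def runs_set_meta_locally(set_metadata_externally,
                          external_metadata_set_successfully,
                          retry_metadata_internally):
    # Mirrors the conditional in JobWrapper.finish(): fall back to an
    # in-process dataset.set_meta() if external metadata is disabled, OR
    # if the external attempt failed and internal retry is permitted.
    return (not set_metadata_externally
            or (not external_metadata_set_successfully
                and retry_metadata_internally))

# Ryan's logs show set_metadata_externally is True, so entering the branch
# implies the external run failed and retry_metadata_internally is True:
assert runs_set_meta_locally(True, False, True)
assert not runs_set_meta_locally(True, True, True)
assert not runs_set_meta_locally(True, False, False)
```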
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Fri, Jan 6, 2012 at 12:55 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

This indicates that set_meta is running locally, in the runner process. Can you make sure there's not a typo in your config? The other possibility is that external metadata setting failed and it's being retried internally (if that were true, you'd see messages indicating such in the server log).

I'm pretty sure there isn't a typo. Here is everything metadata-related (with comment lines removed) in my universe_wsgi.*.ini files:

  [galaxy@bic galaxy-dist]$ grep set_meta *.ini
  universe_wsgi.ini:set_metadata_externally = True
  universe_wsgi.runner.ini:set_metadata_externally = True
  universe_wsgi.webapp.ini:set_metadata_externally = True
  [galaxy@bic galaxy-dist]$ grep meta *.ini
  universe_wsgi.ini:set_metadata_externally = True
  universe_wsgi.runner.ini:set_metadata_externally = True
  universe_wsgi.webapp.ini:set_metadata_externally = True

I just tried it again on some BAM files, and nothing comes up in /var/log/messages or /var/log/httpd/error_log. runner0.log also doesn't show anything except the upload job being completed. I'm still trying to track this one down. Can I add a debug output string to show what the value of set_metadata_externally is when it's read in? If so, where would I do this?
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Jan 5, 2012, at 2:48 PM, Ryan Golhar wrote:

On Thu, Jan 5, 2012 at 11:59 AM, Nate Coraor <n...@bx.psu.edu> wrote:

On Jan 5, 2012, at 11:41 AM, Ryan Golhar wrote:

I set it to run on the cluster:

  [galaxy@bic galaxy-dist]$ grep upload1 universe_wsgi.runner.ini
  #upload1 = local:///

Could you set use_heartbeat = True in the runner's config file and then check the resulting heartbeat log files created in the root directory to get a stack trace to the call of samtools?

Thanks,
--nate

Hi Nate,

I just tried importing another BAM file. I see the upload working on a compute node, but the indexing happens on the head node. 'samtools index' is never submitted to the cluster. Attached is a copy of the heartbeat log. It's 990K, hopefully it will go through.

  Thread 1180711232, Thread(Thread-8, started 1180711232):
    File "/share/apps/Python-2.6.7/lib/python2.6/threading.py", line 504, in __bootstrap
      self.__bootstrap_inner()
    File "/share/apps/Python-2.6.7/lib/python2.6/threading.py", line 532, in __bootstrap_inner
      self.run()
    File "/share/apps/Python-2.6.7/lib/python2.6/threading.py", line 484, in run
      self.__target(*self.__args, **self.__kwargs)
    File "/home/galaxy/galaxy-dist-9/lib/galaxy/jobs/runners/pbs.py", line 190, in run_next
      self.finish_job( obj )
    File "/home/galaxy/galaxy-dist-9/lib/galaxy/jobs/runners/pbs.py", line 514, in finish_job
      pbs_job_state.job_wrapper.finish( stdout, stderr )
    File "/home/galaxy/galaxy-dist-9/lib/galaxy/jobs/__init__.py", line 611, in finish
      dataset.set_meta( overwrite = False )
    File "/home/galaxy/galaxy-dist-9/lib/galaxy/model/__init__.py", line 886, in set_meta
      return self.datatype.set_meta( self, **kwd )
    File "/home/galaxy/galaxy-dist-9/lib/galaxy/datatypes/binary.py", line 173, in set_meta
      exit_code = proc.wait()
    File "/share/apps/Python-2.6.7/lib/python2.6/subprocess.py", line 1182, in wait
      pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
    File "/share/apps/Python-2.6.7/lib/python2.6/subprocess.py", line 455, in _eintr_retry_call
      return func(*args)

This indicates that set_meta is running locally, in the runner process. Can you make sure there's not a typo in your config? The other possibility is that external metadata setting failed and it's being retried internally (if that were true, you'd see messages indicating such in the server log).

--nate

Ryan

heartbeat.log
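The bottom frames of that trace show why indexing lands on the head node: Bam.set_meta() forks 'samtools index' via subprocess and blocks on proc.wait() inside whichever process called it. A hedged sketch of that shape (the function names here are mine; the real code lives in lib/galaxy/datatypes/binary.py and manages a temporary .bai file):

```python
import subprocess

def samtools_index_command(bam_path, index_path=None):
    # Build the command line; an explicit output path keeps the sketch
    # testable, though samtools also defaults to <bam>.bai.
    if index_path is None:
        index_path = bam_path + '.bai'
    return ['samtools', 'index', bam_path, index_path]

def index_bam(bam_path):
    # Runs in the *current* process -- if called from the runner, the
    # index is built on the head node, matching the heartbeat trace.
    proc = subprocess.Popen(samtools_index_command(bam_path),
                            stderr=subprocess.PIPE)
    return proc.wait()  # the frame visible at binary.py line 173
```

The point is that there is no job submission here at all: whether this runs on a compute node depends entirely on which process reaches set_meta(), which is exactly what the set_metadata_externally machinery controls.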
Re: [galaxy-dev] Status on importing BAM file into Library does not update
This indicates that set_meta is running locally, in the runner process. Can you make sure there's not a typo in your config? The other possibility is that external metadata setting failed and it's being retried internally (if that were true, you'd see messages indicating such in the server log).

I'm pretty sure there isn't a typo. Here is everything metadata-related (with comment lines removed) in my universe_wsgi.*.ini files:

  [galaxy@bic galaxy-dist]$ grep set_meta *.ini
  universe_wsgi.ini:set_metadata_externally = True
  universe_wsgi.runner.ini:set_metadata_externally = True
  universe_wsgi.webapp.ini:set_metadata_externally = True
  [galaxy@bic galaxy-dist]$ grep meta *.ini
  universe_wsgi.ini:set_metadata_externally = True
  universe_wsgi.runner.ini:set_metadata_externally = True
  universe_wsgi.webapp.ini:set_metadata_externally = True

I just tried it again on some BAM files, and nothing comes up in /var/log/messages or /var/log/httpd/error_log. runner0.log also doesn't show anything except the upload job being completed.
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Jan 4, 2012, at 6:44 PM, Ryan Golhar wrote:

On Wed, Jan 4, 2012 at 5:17 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

I'm adding Data Libraries to my local Galaxy instance. I'm doing this by importing directories that contain bam and bai files. I see the bam/bai files get added on the admin page and the Message is "This job is running." qstat shows the job run and complete. I checked my runner0.log and it registers the PBS job as completed successfully. But the web page never updates. I tried to refresh the page by navigating away from it and then back to it, but it still reads "This job is running." How do I fix this?

Some more information... I checked my head node and I see samtools is running there. It's running 'samtools index'. So two problems:

1) samtools is not using the cluster. I assume this is a configuration setting somewhere.

See set_metadata_externally in universe_wsgi.ini. This should be set to True to run on the cluster. If you haven't seen the rest of the production server documentation, see http://usegalaxy.org/production

2) Why is Galaxy trying to index the bam files if the bai files exist in the same directory as the bam file? The BAM files are sorted and have 'SO:coordinate'. I also have samtools-0.1.18 installed.

Galaxy does not yet have a method to upload BAM files with a precreated .bai.

It also appears:

3) Galaxy is unable to import .bai files. It says there was an error importing these files: "The uploaded binary file contains inappropriate content".

See #2. Galaxy will always create its own .bai.

4) Galaxy is trying to change the permissions on the files I'm importing (as links). Thankfully the data tree is read-only. If I'm linking Galaxy to my data, why does Galaxy want to change the permissions? This seems like something it shouldn't be doing, i.e., Galaxy should leave external data alone.

Hrm, this is not good. I'll have a look at this.

--nate
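For reference, the settings discussed throughout this thread all live in the ini files. A minimal, hedged example of the relevant lines (the pbs:/// runner URL is illustrative only; use whatever URL matches your cluster, and note that this setup splits configuration across universe_wsgi.runner.ini and universe_wsgi.webapp.ini, each of which needs the same values):

```ini
; universe_wsgi.ini (mirror in universe_wsgi.runner.ini and
; universe_wsgi.webapp.ini when running split web/runner processes)
set_metadata_externally = True
; Disable the internal fallback while debugging, so a failed external
; metadata run surfaces as an error instead of running on the head node:
retry_metadata_internally = False
; Keep job files around so the PBS and metadata outputs can be inspected:
cleanup_job = never

[galaxy:tool_runners]
; Leaving upload1 commented out sends uploads to the default runner;
; an explicit cluster entry would look like this (illustrative):
;upload1 = pbs:///
```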
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Jan 4, 2012, at 6:44 PM, Ryan Golhar wrote:

On Wed, Jan 4, 2012 at 5:17 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

I'm adding Data Libraries to my local Galaxy instance. I'm doing this by importing directories that contain bam and bai files. I see the bam/bai files get added on the admin page and the Message is "This job is running." qstat shows the job run and complete. I checked my runner0.log and it registers the PBS job as completed successfully. But the web page never updates. I tried to refresh the page by navigating away from it and then back to it, but it still reads "This job is running." How do I fix this?

Some more information... I checked my head node and I see samtools is running there. It's running 'samtools index'. So two problems:

1) samtools is not using the cluster. I assume this is a configuration setting somewhere.

See set_metadata_externally in universe_wsgi.ini. This should be set to True to run on the cluster. If you haven't seen the rest of the production server documentation, see http://usegalaxy.org/production

This is already set. I set this in universe_wsgi.ini (and universe_wsgi.webapp.ini and universe_wsgi.runner.ini, since I'm using a proxy server and load balancer on Apache). This was one of the first things I set up.
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Jan 5, 2012, at 11:29 AM, Ryan Golhar wrote:

On Jan 4, 2012, at 6:44 PM, Ryan Golhar wrote:

On Wed, Jan 4, 2012 at 5:17 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

I'm adding Data Libraries to my local Galaxy instance. I'm doing this by importing directories that contain bam and bai files. I see the bam/bai files get added on the admin page and the Message is "This job is running." qstat shows the job run and complete. I checked my runner0.log and it registers the PBS job as completed successfully. But the web page never updates. I tried to refresh the page by navigating away from it and then back to it, but it still reads "This job is running." How do I fix this?

Some more information... I checked my head node and I see samtools is running there. It's running 'samtools index'. So two problems:

1) samtools is not using the cluster. I assume this is a configuration setting somewhere.

See set_metadata_externally in universe_wsgi.ini. This should be set to True to run on the cluster. If you haven't seen the rest of the production server documentation, see http://usegalaxy.org/production

This is already set. I set this in universe_wsgi.ini (and universe_wsgi.webapp.ini and universe_wsgi.runner.ini, since I'm using a proxy server and load balancer on Apache). This was one of the first things I set up.

Does the upload tool run on the cluster? See upload1 under [galaxy:tool_runners] in universe_wsgi.runner.ini.

--nate
Re: [galaxy-dev] Status on importing BAM file into Library does not update
I set it to run on the cluster:

  [galaxy@bic galaxy-dist]$ grep upload1 universe_wsgi.runner.ini
  #upload1 = local:///

On Thu, Jan 5, 2012 at 11:33 AM, Nate Coraor <n...@bx.psu.edu> wrote:

On Jan 5, 2012, at 11:29 AM, Ryan Golhar wrote:

On Jan 4, 2012, at 6:44 PM, Ryan Golhar wrote:

On Wed, Jan 4, 2012 at 5:17 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

I'm adding Data Libraries to my local Galaxy instance. I'm doing this by importing directories that contain bam and bai files. I see the bam/bai files get added on the admin page and the Message is "This job is running." qstat shows the job run and complete. I checked my runner0.log and it registers the PBS job as completed successfully. But the web page never updates. I tried to refresh the page by navigating away from it and then back to it, but it still reads "This job is running." How do I fix this?

Some more information... I checked my head node and I see samtools is running there. It's running 'samtools index'. So two problems:

1) samtools is not using the cluster. I assume this is a configuration setting somewhere.

See set_metadata_externally in universe_wsgi.ini. This should be set to True to run on the cluster. If you haven't seen the rest of the production server documentation, see http://usegalaxy.org/production

This is already set. I set this in universe_wsgi.ini (and universe_wsgi.webapp.ini and universe_wsgi.runner.ini, since I'm using a proxy server and load balancer on Apache). This was one of the first things I set up.

Does the upload tool run on the cluster? See upload1 under [galaxy:tool_runners] in universe_wsgi.runner.ini.

--nate
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Jan 5, 2012, at 11:41 AM, Ryan Golhar wrote:

I set it to run on the cluster:

  [galaxy@bic galaxy-dist]$ grep upload1 universe_wsgi.runner.ini
  #upload1 = local:///

Could you set use_heartbeat = True in the runner's config file and then check the resulting heartbeat log files created in the root directory to get a stack trace to the call of samtools?

Thanks,
--nate

On Thu, Jan 5, 2012 at 11:33 AM, Nate Coraor <n...@bx.psu.edu> wrote:

On Jan 5, 2012, at 11:29 AM, Ryan Golhar wrote:

On Jan 4, 2012, at 6:44 PM, Ryan Golhar wrote:

On Wed, Jan 4, 2012 at 5:17 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

I'm adding Data Libraries to my local Galaxy instance. I'm doing this by importing directories that contain bam and bai files. I see the bam/bai files get added on the admin page and the Message is "This job is running." qstat shows the job run and complete. I checked my runner0.log and it registers the PBS job as completed successfully. But the web page never updates. I tried to refresh the page by navigating away from it and then back to it, but it still reads "This job is running." How do I fix this?

Some more information... I checked my head node and I see samtools is running there. It's running 'samtools index'. So two problems:

1) samtools is not using the cluster. I assume this is a configuration setting somewhere.

See set_metadata_externally in universe_wsgi.ini. This should be set to True to run on the cluster. If you haven't seen the rest of the production server documentation, see http://usegalaxy.org/production

This is already set. I set this in universe_wsgi.ini (and universe_wsgi.webapp.ini and universe_wsgi.runner.ini, since I'm using a proxy server and load balancer on Apache). This was one of the first things I set up.

Does the upload tool run on the cluster? See upload1 under [galaxy:tool_runners] in universe_wsgi.runner.ini.

--nate
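The heartbeat switch Nate asks for is a single boolean in the runner's config; when enabled, the Galaxy process periodically dumps per-thread stack traces to heartbeat log files in the Galaxy root directory, which is how the stack trace in this thread was captured:

```ini
; universe_wsgi.runner.ini -- periodically dump per-thread stack traces
; to heartbeat log files in the Galaxy root directory
use_heartbeat = True
```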
[galaxy-dev] Status on importing BAM file into Library does not update
I'm adding Data Libraries to my local Galaxy instance. I'm doing this by importing directories that contain bam and bai files. I see the bam/bai files get added on the admin page and the Message is "This job is running." qstat shows the job run and complete. I checked my runner0.log and it registers the PBS job as completed successfully. But the web page never updates. I tried to refresh the page by navigating away from it and then back to it, but it still reads "This job is running." How do I fix this?
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Wed, Jan 4, 2012 at 5:17 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

I'm adding Data Libraries to my local Galaxy instance. I'm doing this by importing directories that contain bam and bai files. I see the bam/bai files get added on the admin page and the Message is "This job is running." qstat shows the job run and complete. I checked my runner0.log and it registers the PBS job as completed successfully. But the web page never updates. I tried to refresh the page by navigating away from it and then back to it, but it still reads "This job is running." How do I fix this?

Some more information... I checked my head node and I see samtools is running there. It's running 'samtools index'. So two problems:

1) samtools is not using the cluster. I assume this is a configuration setting somewhere.

2) Why is Galaxy trying to index the bam files if the bai files exist in the same directory as the bam file? The BAM files are sorted and have 'SO:coordinate'. I also have samtools-0.1.18 installed.
Re: [galaxy-dev] Status on importing BAM file into Library does not update
On Wed, Jan 4, 2012 at 5:17 PM, Ryan Golhar <ngsbioinformat...@gmail.com> wrote:

I'm adding Data Libraries to my local Galaxy instance. I'm doing this by importing directories that contain bam and bai files. I see the bam/bai files get added on the admin page and the Message is "This job is running." qstat shows the job run and complete. I checked my runner0.log and it registers the PBS job as completed successfully. But the web page never updates. I tried to refresh the page by navigating away from it and then back to it, but it still reads "This job is running." How do I fix this?

Some more information... I checked my head node and I see samtools is running there. It's running 'samtools index'. So two problems:

1) samtools is not using the cluster. I assume this is a configuration setting somewhere.

2) Why is Galaxy trying to index the bam files if the bai files exist in the same directory as the bam file? The BAM files are sorted and have 'SO:coordinate'. I also have samtools-0.1.18 installed.

It also appears:

3) Galaxy is unable to import .bai files. It says there was an error importing these files: "The uploaded binary file contains inappropriate content".

4) Galaxy is trying to change the permissions on the files I'm importing (as links). Thankfully the data tree is read-only. If I'm linking Galaxy to my data, why does Galaxy want to change the permissions? This seems like something it shouldn't be doing, i.e., Galaxy should leave external data alone.