Re: [galaxy-dev] Rocks cluster; jobs run but Galaxy can't find jobid when submitted via drmaa
Ok; it's quite weird. Perhaps a Galaxy guru could give you a better answer, but I remember having this kind of issue a while ago. In the meantime, you could take a look at these parameters:

$ grep retry galaxy.ini.sample
# these instances, you can choose to retry setting it internally or leave it in
# a failed state (since retrying internally may cause the Galaxy process to be
# option to retry externally, or set metadata manually (when possible).
#retry_metadata_internally = True
# the job's stdout and stderr files when it completes, you can retry reading
# these files. The job runner will retry the number of times specified below,
#retry_job_output_collection = 0

Best,
Remy

2016-01-20 21:12 GMT+01:00 Eric Shell:

> Thanks, Remy. I went through the cluster documentation and our Rocks
> environment seems to be configured properly, after all.
>
> It appears that my issue may be related to the UCSC Main table browser.
> The jobs that Galaxy reports have failed are leaving the
> job_working_directory behind, with galaxy_#.e error files that contain "The
> remote data source application has not sent back a URL parameter in the
> request."
>
> [root@campusrocks2 7045]# pwd
> /campusdata/galaxy/galaxy/database/job_working_directory/007/7045
> [root@campusrocks2 7045]# cat galaxy_7045.e
> The remote data source application has not sent back a URL parameter in
> the request.
>
> These errors correspond with empty dataset_#.dat files
> in /campusdata/galaxy/galaxy/database/files/011/:
>
> [root@campusrocks2 7045]# ll /campusdata/galaxy/galaxy/database/files/011/
> -rw-rw-r-- 1 galaxy galaxy 0 Jan 20 11:54 dataset_11387.dat
>
> The job failures are intermittent. Sometimes, a job requesting the exact
> same dataset will succeed moments before or after a failed job. Is there
> perhaps a way to tell the table browser to retry when it fails to get the
> dataset it is requesting? Is that even what's going on?
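To see at a glance which of the retry options mentioned above are active in a given config, something along these lines could help. It is only a sketch that mimics the `grep retry` call: it scans the text of an ini file for option lines whose name contains "retry" and reports whether each one is commented out. The function name `find_retry_options` and the value `5` in the sample are made up for illustration; nothing here calls Galaxy itself.

```python
import re

# Matches "name = value" lines, optionally commented out with a leading '#'.
# Plain comment sentences that merely mention "retry" do not match.
OPTION_RE = re.compile(r"^\s*(#?)\s*(\w*retry\w*)\s*=\s*(.*?)\s*$")


def find_retry_options(ini_text):
    """Return {option_name: (value, is_active)} for options containing 'retry'."""
    found = {}
    for line in ini_text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            commented, name, value = match.groups()
            found[name] = (value, commented == "")
    return found


# Illustrative input only: the second value is hypothetical, not a recommendation.
SAMPLE = """\
# option to retry externally, or set metadata manually (when possible).
#retry_metadata_internally = True
retry_job_output_collection = 5
"""

for name, (value, active) in find_retry_options(SAMPLE).items():
    state = "set" if active else "commented out, default applies"
    print(f"{name} = {value} ({state})")
```

To use it against a real file, read the file's contents and pass the string in, e.g. `find_retry_options(open("galaxy.ini").read())`.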
Re: [galaxy-dev] Rocks cluster; jobs run but Galaxy can't find jobid when submitted via drmaa
Thanks, Remy. I went through the cluster documentation and our Rocks environment seems to be configured properly, after all.

It appears that my issue may be related to the UCSC Main table browser. The jobs that Galaxy reports have failed are leaving the job_working_directory behind, with galaxy_#.e error files that contain "The remote data source application has not sent back a URL parameter in the request."

[root@campusrocks2 7045]# pwd
/campusdata/galaxy/galaxy/database/job_working_directory/007/7045
[root@campusrocks2 7045]# cat galaxy_7045.e
The remote data source application has not sent back a URL parameter in the request.

These errors correspond with empty dataset_#.dat files in /campusdata/galaxy/galaxy/database/files/011/:

[root@campusrocks2 7045]# ll /campusdata/galaxy/galaxy/database/files/011/
-rw-rw-r-- 1 galaxy galaxy 0 Jan 20 11:54 dataset_11387.dat

The job failures are intermittent. Sometimes, a job requesting the exact same dataset will succeed moments before or after a failed job. Is there perhaps a way to tell the table browser to retry when it fails to get the dataset it is requesting? Is that even what's going on?

On Wed, Jan 20, 2016 at 7:05 AM, Rémy Dernat wrote:

> I forgot to point out the needs of sharing folders and checking the
> UID/GID of the galaxy user between your systems (and his access to SGE).
>
> Remy
>
> 2016-01-20 16:00 GMT+01:00 Rémy Dernat:
>
>> Hi Eric,
>>
>> Here we use both solutions: Galaxy and RocksCluster. In Galaxy, you have
>> to define your jobs in "config/job_conf.xml" and you should probably source
>> a file (search for "environment" in your galaxy.ini) before the submit
>> process. In fact, you could have to set a DRMAA_LIBRARY_PATH to load your
>> drmaa library; see
>> https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#DRMAA
>>
>> Best,
>> Remy
>>
>> 2016-01-19 20:28 GMT+01:00 Eric Shell:
>>
>>> I am trying to get a Galaxy instance running on a Rocks cluster.
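Since the failures are intermittent, it may help to count how often they occur before changing any settings. The following is a rough sketch, not part of Galaxy: it walks a job working directory looking for galaxy_*.e files that contain the error text quoted above, and walks a files directory looking for zero-byte dataset_*.dat files. The function names are invented for this example; the directory layout is taken from the paths in the message.

```python
import os

ERROR_TEXT = "The remote data source application has not sent back a URL parameter"


def find_failed_data_source_jobs(job_root):
    """Return sorted paths of galaxy_*.e files under job_root containing ERROR_TEXT."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(job_root):
        for name in filenames:
            if name.startswith("galaxy_") and name.endswith(".e"):
                path = os.path.join(dirpath, name)
                with open(path, errors="replace") as handle:
                    # Collapse whitespace so a line-wrapped message still matches.
                    if ERROR_TEXT in " ".join(handle.read().split()):
                        hits.append(path)
    return sorted(hits)


def find_empty_datasets(files_root):
    """Return sorted paths of zero-byte dataset_*.dat files under files_root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(files_root):
        for name in filenames:
            if name.startswith("dataset_") and name.endswith(".dat"):
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return sorted(empty)
```

On the system described here, the two roots would presumably be /campusdata/galaxy/galaxy/database/job_working_directory and /campusdata/galaxy/galaxy/database/files. Comparing the timestamps of the hits with the paster.log entries could show whether the failures cluster around particular requests.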
Re: [galaxy-dev] Rocks cluster; jobs run but Galaxy can't find jobid when submitted via drmaa
I forgot to point out the need to share folders and to check that the UID/GID of the galaxy user match between your systems (and that the user has access to SGE).

Remy

2016-01-20 16:00 GMT+01:00 Rémy Dernat:

> Hi Eric,
>
> Here we use both solutions: Galaxy and RocksCluster. In Galaxy, you have
> to define your jobs in "config/job_conf.xml" and you should probably source
> a file (search for "environment" in your galaxy.ini) before the submit
> process. In fact, you could have to set a DRMAA_LIBRARY_PATH to load your
> drmaa library; see
> https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#DRMAA
>
> Best,
> Remy
>
> 2016-01-19 20:28 GMT+01:00 Eric Shell:
>
>> I am trying to get a Galaxy instance running on a Rocks cluster. I am
>> able to run jobs with the local runner at this point, but I am having an
>> issue with the drmaa runner that I haven't been able to fix. When I submit
>> a job in Galaxy it is successfully submitted to the cluster and runs to
>> completion according to qacct, but Galaxy just reports "failure running
>> job".
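The UID/GID check suggested above can be done by hand with `id galaxy` on each node, but a small script makes the comparison explicit. This is a sketch under assumptions: `account_ids` uses the standard `pwd` module (Unix only) and has to be run on each host separately, and `mismatched_hosts` simply compares results you have collected yourself. The host names in the usage note are hypothetical.

```python
import pwd


def account_ids(username):
    """Return (uid, gid) for username on this host, or None if the account is missing."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return None
    return entry.pw_uid, entry.pw_gid


def mismatched_hosts(ids_by_host, reference_host):
    """Return the hosts whose (uid, gid) differs from the reference host's value.

    ids_by_host maps a host name to the result of account_ids() gathered on
    that host (None means the account does not exist there).
    """
    expected = ids_by_host[reference_host]
    return sorted(host for host, ids in ids_by_host.items() if ids != expected)
```

For example, with made-up values, `mismatched_hosts({"frontend": (500, 500), "compute-0-0": (500, 500), "compute-0-1": (501, 500)}, "frontend")` returns `["compute-0-1"]`, which would be the node to fix before expecting jobs to write into the shared folders.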
Re: [galaxy-dev] Rocks cluster; jobs run but Galaxy can't find jobid when submitted via drmaa
Hi Eric,

Here we use both Galaxy and Rocks Cluster. In Galaxy, you have to define your jobs in "config/job_conf.xml", and you should probably source a file (search for "environment" in your galaxy.ini) before the submit process. In fact, you may have to set DRMAA_LIBRARY_PATH to load your drmaa library; see
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#DRMAA

Best,
Remy

2016-01-19 20:28 GMT+01:00 Eric Shell:

> I am trying to get a Galaxy instance running on a Rocks cluster. I am
> able to run jobs with the local runner at this point, but I am having an
> issue with the drmaa runner that I haven't been able to fix. When I submit
> a job in Galaxy it is successfully submitted to the cluster and runs to
> completion according to qacct, but Galaxy just reports "failure running
> job".
>
> Here's what is written to paster.log when I submit a job:
>
>> 69.181.235.240 - - [19/Jan/2016:11:24:31 -0700] "GET /api/histories/fb86c918c0d3d33b/contents?dataset_details=bae154fe2294752e%2C6fe732485990d2ac%2C604c4e6e60e997bc%2Cf015f1cb819ec50e%2C9f6f4b3cb6cf43eb%2C3d13d598882b6eb8%2C551006fddcb290ae%2C10b9bbc646c48387%2C7670dfdf35146bc5%2Ce0ec2cf59f1fc79e%2Cee30922e5e4854db%2C9e7a0ba216194210 HTTP/1.1" 200 - "https://galaxy.soe.ucsc.edu/"; "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
>> 69.181.235.240 - - [19/Jan/2016:11:24:38 -0700] "GET /tool_runner/data_source_redirect?tool_id=ucsc_table_direct1 HTTP/1.1" 302 - "https://galaxy.soe.ucsc.edu/"; "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
>> galaxy.tools.actions.__init__ INFO 2016-01-19 11:24:42,801 Handled output (327.778 ms)
>> galaxy.tools.actions.__init__ INFO 2016-01-19 11:24:43,236 Verified access to datasets (0.023 ms)
>> galaxy.tools.execute DEBUG 2016-01-19 11:24:43,343 Tool [ucsc_table_direct1] created job [7019] (919.481 ms)
>> 69.181.235.240 - - [19/Jan/2016:11:24:42 -0700] "POST /tool_runner HTTP/1.1" 200 - "https://genome.ucsc.edu/cgi-bin/hgTables"; "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
>> galaxy.jobs DEBUG 2016-01-19 11:24:44,056 (7019) Working directory for job is: /campusdata/galaxy/galaxy/database/job_working_directory/007/7019
>> galaxy.jobs.handler DEBUG 2016-01-19 11:24:44,070 (7019) Dispatching to sge runner
>> galaxy.jobs DEBUG 2016-01-19 11:24:44,378 (7019) Persisting job destination (destination id: sge_default)
>> galaxy.jobs.runners DEBUG 2016-01-19 11:24:44,403 Job [7019] queued (332.423 ms)
>> galaxy.jobs.handler INFO 2016-01-19 11:24:44,444 (7019) Job dispatched
>> 69.181.235.240 - - [19/Jan/2016:11:24:44 -0700] "GET /api/genomes HTTP/1.1" 200 - "https://galaxy.soe.ucsc.edu/"; "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
>> 69.181.235.240 - - [19/Jan/2016:11:24:44 -0700] "GET /api/datatypes?extension_only=False& HTTP/1.1" 200 - "https://galaxy.soe.ucsc.edu/"; "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
>> 69.181.235.240 - - [19/Jan/2016:11:24:44 -0700] "GET /history/current_history_json HTTP/1.1" 200 - "https://galaxy.soe.ucsc.edu/"; "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
>> galaxy.jobs.runners DEBUG 2016-01-19 11:24:46,399 (7019) command is: python /campusdata/galaxy/galaxy/tools/data_source/data_source.py /campusdata/galaxy/galaxy/database/files/011/dataset_11361.dat 0; return_code=$?; python "/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/set_metadata_IaPURP.py" "/campusdata/galaxy/galaxy/database/tmp/tmp9Qt0cv" "/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/galaxy.json" "/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_in_HistoryDatasetAssociation_13512_oucw5s,/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_kwds_HistoryDatasetAssociation_13512_ZrUbrF,/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_out_HistoryDatasetAssociation_13512_twCvq7,/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_results_HistoryDatasetAssociation_13512_FO1cy9,/campusdata/galaxy/galaxy/database/files/011/dataset_11361.dat,/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_override_HistoryDatasetAssociation_13512_Z_cUTF" 5242880; sh -c "exit $return_code"
>> galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:46,787 (7019) submitting file /campusdata/galaxy/galaxy/database/job_working_directory/007/7019/galaxy_7019.sh
>> galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:46,808 (7019) native specification is: -R
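For the DRMAA_LIBRARY_PATH advice above, a quick sanity check before starting Galaxy might look like the sketch below. It only verifies that the variable is set in the given environment and points at an existing file; it does not try to load the library or talk to SGE. The function name `check_drmaa_library` is made up for this example.

```python
import os


def check_drmaa_library(environ=None):
    """Return (ok, message) saying whether DRMAA_LIBRARY_PATH names an existing file."""
    environ = os.environ if environ is None else environ
    path = environ.get("DRMAA_LIBRARY_PATH")
    if not path:
        return False, "DRMAA_LIBRARY_PATH is not set"
    if not os.path.isfile(path):
        return False, f"DRMAA_LIBRARY_PATH points to a missing file: {path}"
    return True, f"DRMAA library found at {path}"


ok, message = check_drmaa_library()
print(message)
```

Run it as the galaxy user, in the same shell environment that starts the Galaxy process, since that is the environment the drmaa runner will see.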