[galaxy-dev] Re: Errors running jobs in Galaxy 16.01 with slurm-drmaa 1.1.0

2019-09-14 Thread Philip Blood
Thanks Nate. I just updated the github issue. It turns out this error was
being caused by a configuration issue that required that a job name be
specified in the native spec passed to drmaa-run. With jobs submitted via
sbatch, the name of the script is used when no job name is specified, but
the job name was empty if not given explicitly to drmaa-run. Once the
admin changed job_script.lua to handle nil values for the job name, the
tests with drmaa-run started working with Slurm 18.08.8.
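
As an illustration of the job-name requirement on the Galaxy side (a
sketch with placeholder values, not a confirmed fix for the Galaxy-side
errors mentioned below), a job name can be forced into the native
specification that Galaxy's Slurm/DRMAA runner hands to slurm-drmaa via
job_conf.xml, since --job-name is a standard sbatch option:

<destination id="slurm_cluster" runner="slurm">
    <!-- illustrative only: the destination id and partition are placeholders -->
    <param id="nativeSpecification">--job-name=galaxy --partition=batch</param>
</destination>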

Unfortunately, this did not fix my related issue with submitting jobs from
Galaxy using slurm-drmaa. I am still getting the same errors. Any
suggestions where to look next?

Phil

On Fri, Sep 13, 2019, 11:53 AM Nate Coraor  wrote:

> Hi Phil,
>
> I followed up over on the Github issue, let's track it there and we can
> reply here for the sake of history once we figure out what's going on.
>
> Thanks,
> --nate
>
> On Thu, Sep 12, 2019 at 12:32 AM Philip Blood  wrote:
>
>> Update: Nate Coraor pointed me to the drmaa-run utility in slurm-drmaa to
>> do more focused testing, and it looks like the issue with running Slurm
>> jobs from Galaxy comes down to *slurm-drmaa not working with the latest
>> version of Slurm 18 -- 18.08.8.* I created an issue on the slurm-drmaa
>> github page here <https://github.com/natefoo/slurm-drmaa/issues/32>.
>>
>> Since 18.08.8 fixes a security vulnerability
>> <https://www.schedmd.com/news.php> that is present in previous
>> versions of Slurm, it seems like this slurm-drmaa problem will be an
>> important one to resolve for all those running Galaxy jobs on Slurm
>> clusters.
>>
>> If anyone finds they *can* run jobs via slurm-drmaa with Slurm 18.08.8,
>> I'd
>> be interested to hear it.
>>
>> Phil
>>
>> On Tue, Sep 3, 2019 at 2:29 PM Philip Blood  wrote:
>>
>> > Hi Folks,
>> >
>> > I'm trying to get an old instance of Galaxy (16.01) working for a user
>> who
>> > needs to use it this week for a class he is teaching (so upgrading
>> Galaxy
>> > is not an option at the moment). Due to a recent slurm upgrade on our
>> > compute system to slurm 18.08.8, we had to replace the old slurm-drmaa
>> > 1.0.7 library <http://apps.man.poznan.pl/trac/slurm-drmaa>, which
>> doesn't
>> > work with 18.08.8, with Nate's forked slurm-drmaa library version
>> > 1.1.0 <https://github.com/natefoo/slurm-drmaa>. That built fine with
>> > slurm 18.08.8 and (I think) we updated all the relevant pointers in the
>> > galaxy config to point to the new slurm-drmaa 1.1.0 library.
>> >
>> > However, now when I try to run jobs on our system I get errors (it
>> worked
>> > fine before with slurm-drmaa 1.0.7 and the older version of slurm). So,
>> I
>> > wanted to get a quick sanity check on whether this might be an issue
>> with
>> > trying to use the new slurm-drmaa with an old version of Galaxy, 16.01,
>> or
>> > if anyone has any other quick thoughts on troubleshooting this. The
>> errors
>> > I get are below.
>> >
>> > Best,
>> > Phil
>> >
>> > *Short version (just the errors):*
>> > 198.91.54.159 - - [31/Aug/2019:16:31:28 +] "GET
>> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
>> > https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
>> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
>> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10)
>> > drmaa.Session.runJob() failed, will retry: code 1:
>> slurm_submit_batch_job
>> > error (2): No such file or directory*
>> > 198.91.54.159 - - [31/Aug/2019:16:31:32 +] "GET
>> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
>> > https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
>> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
>> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10)
>> > drmaa.Session.runJob() failed, will retry: code 1:
>> slurm_submit_batch_job
>> > error (0): No error*
>> > 198.91.54.159 - - [31/Aug/2019:16:31:37 +] "GET
>> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
>> > https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
>> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
>> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10)
>> > drmaa.Session.runJob() failed, will retry: code 1:
>> slurm_submit_batch_job
>> > error (

[galaxy-dev] Re: Errors running jobs in Galaxy 16.01 with slurm-drmaa 1.1.0

2019-09-11 Thread Philip Blood
Update: Nate Coraor pointed me to the drmaa-run utility in slurm-drmaa to
do more focused testing, and it looks like the issue with running Slurm
jobs from Galaxy comes down to *slurm-drmaa not working with the latest
version of Slurm 18 -- 18.08.8.* I created an issue on the slurm-drmaa
github page here <https://github.com/natefoo/slurm-drmaa/issues/32>.

Since 18.08.8 fixes a security vulnerability
<https://www.schedmd.com/news.php> that is present in previous versions of
Slurm, it seems like this slurm-drmaa problem will be an important one to
resolve for all those running Galaxy jobs on Slurm clusters.

If anyone finds they *can* run jobs via slurm-drmaa with Slurm 18.08.8, I'd
be interested to hear it.
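
For anyone who wants to reproduce the test outside of Galaxy, the same
code path can also be exercised with the drmaa Python bindings; a minimal
sketch, with the command, job name, partition, and DRMAA_LIBRARY_PATH as
placeholders to adapt:

# Minimal submission test through slurm-drmaa, mirroring the
# drmaa.Session.runJob() call that Galaxy's runner makes.
# Assumes DRMAA_LIBRARY_PATH points at the slurm-drmaa libdrmaa.so.
import drmaa

s = drmaa.Session()
s.initialize()
jt = s.createJobTemplate()
jt.remoteCommand = '/bin/hostname'   # placeholder command
jt.jobName = 'drmaa_test'            # set an explicit job name
jt.nativeSpecification = '-p batch'  # placeholder partition
job_id = s.runJob(jt)                # the call that fails in the Galaxy logs
print('submitted', job_id)
info = s.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
print('exit status:', info.exitStatus)
s.deleteJobTemplate(jt)
s.exit()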

Phil

On Tue, Sep 3, 2019 at 2:29 PM Philip Blood  wrote:

> Hi Folks,
>
> I'm trying to get an old instance of Galaxy (16.01) working for a user who
> needs to use it this week for a class he is teaching (so upgrading Galaxy
> is not an option at the moment). Due to a recent slurm upgrade on our
> compute system to slurm 18.08.8, we had to replace the old slurm-drmaa
> 1.0.7 library <http://apps.man.poznan.pl/trac/slurm-drmaa>, which doesn't
> work with 18.08.8, with Nate's forked slurm-drmaa library version
> 1.1.0 <https://github.com/natefoo/slurm-drmaa>. That built fine with
> slurm 18.08.8 and (I think) we updated all the relevant pointers in the
> galaxy config to point to the new slurm-drmaa 1.1.0 library.
>
> However, now when I try to run jobs on our system I get errors (it worked
> fine before with slurm-drmaa 1.0.7 and the older version of slurm). So, I
> wanted to get a quick sanity check on whether this might be an issue with
> trying to use the new slurm-drmaa with an old version of Galaxy, 16.01, or
> if anyone has any other quick thoughts on troubleshooting this. The errors
> I get are below.
>
> Best,
> Phil
>
> *Short version (just the errors):*
> 198.91.54.159 - - [31/Aug/2019:16:31:28 +] "GET
> /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
> x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10)
> drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
> error (2): No such file or directory*
> 198.91.54.159 - - [31/Aug/2019:16:31:32 +] "GET
> /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
> x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10)
> drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
> error (0): No error*
> 198.91.54.159 - - [31/Aug/2019:16:31:37 +] "GET
> /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
> x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10)
> drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
> error (0): No error*
> 198.91.54.159 - - [31/Aug/2019:16:31:41 +] "GET
> /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
> x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> 198.91.54.159 - - [31/Aug/2019:16:31:45 +] "GET
> /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
> x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10)
> drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
> error (0): No error*
> 198.91.54.159 - - [31/Aug/2019:16:31:49 +] "GET
> /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
> x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10)
> drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
> error (0): No error*
> 198.91.54.159 - - [31/Aug/2019:16:31:53 +] "GET
> /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64;
> x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> *galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts
> to submit job failed *
>
>
> *Full context:*
>  198.91.54.159 - - [31/Aug/2019:16:30:27 +] "GET
> /api

[galaxy-dev] Errors running jobs in Galaxy 16.01 with slurm-drmaa 1.1.0

2019-09-03 Thread Philip Blood
 Hi Folks,

I'm trying to get an old instance of Galaxy (16.01) working for a user who
needs to use it this week for a class he is teaching (so upgrading Galaxy
is not an option at the moment). Due to a recent slurm upgrade on our
compute system to slurm 18.08.8, we had to replace the old slurm-drmaa
1.0.7 library <http://apps.man.poznan.pl/trac/slurm-drmaa>, which doesn't
work with 18.08.8, with Nate's forked slurm-drmaa library version 1.1.0
<https://github.com/natefoo/slurm-drmaa>. That built fine with slurm
18.08.8 and (I think) we updated all the relevant pointers in the galaxy
config to point to the new slurm-drmaa 1.1.0 library.

However, now when I try to run jobs on our system I get errors (it worked
fine before with slurm-drmaa 1.0.7 and the older version of slurm). So, I
wanted to get a quick sanity check on whether this might be an issue with
trying to use the new slurm-drmaa with an old version of Galaxy, 16.01, or
if anyone has any other quick thoughts on troubleshooting this. The errors
I get are below.

Best,
Phil

*Short version (just the errors):*
198.91.54.159 - - [31/Aug/2019:16:31:28 +] "GET
/api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64; x64;
rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10)
drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
error (2): No such file or directory*
198.91.54.159 - - [31/Aug/2019:16:31:32 +] "GET
/api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64; x64;
rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10)
drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:37 +] "GET
/api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64; x64;
rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10)
drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:41 +] "GET
/api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64; x64;
rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:45 +] "GET
/api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64; x64;
rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10)
drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:49 +] "GET
/api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64; x64;
rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10)
drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job
error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:53 +] "GET
/api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64; x64;
rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts
to submit job failed *


*Full context:*
 198.91.54.159 - - [31/Aug/2019:16:30:27 +] "GET
/api/tools/squeue/build HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/;
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0"
galaxy.tools DEBUG 2019-08-31 16:30:30,142 Validated and populated state
for tool request (4.081 ms)
galaxy.tools.actions INFO 2019-08-31 16:30:30,285 Handled output (100.616
ms)
galaxy.tools.actions INFO 2019-08-31 16:30:30,319 Verified access to
datasets (0.005 ms)
galaxy.tools.execute DEBUG 2019-08-31 16:30:30,368 Tool [squeue] created
job [10] (206.086 ms)
galaxy.tools.execute DEBUG 2019-08-31 16:30:30,376 Executed all jobs for
tool request: (233.862 ms)
198.91.54.159 - - [31/Aug/2019:16:30:30 +] "POST /api/tools HTTP/1.1"
200 - "https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0;
Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:30 +] "GET
/api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
https://galaxy.bridges.psc.edu/; "Mozilla/5.0 (Windows NT 10.0; Win64; x64;
rv:68.0) Gecko/20100101 Firefox/68.0"
galaxy.jobs DEBUG 2019-08-31 16:30:30,747 (10) Working directory for job
is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10
galaxy.jobs.handler DEBUG 2019-08-31 16:30:30,751 (10) Dispatching to slurm
runner
galaxy.jobs DEBUG 2019-08-31 16:30:30,774 (10) Persisting job destination
(destination 

Re: [galaxy-dev] Decrease Galaxy polling of slurm?

2017-05-24 Thread Philip Blood
Thanks Nate! I have no idea if this is actually having any impact, but
keeping admins happy is of utmost importance. :-)

Phil
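
For context, the change Nate points to below boils down to lengthening the
sleep between polling passes in the runner's monitor loop. A paraphrased
sketch (not the exact Galaxy source; the names here are stand-ins):

import time

POLL_INTERVAL = 5  # stand-in value; the stock loop sleeps about 1 second between passes

def monitor_loop(check_watched_items, keep_running=lambda: True):
    """Paraphrased sketch of an asynchronous job runner's monitor loop."""
    while keep_running():
        check_watched_items()      # ask the DRM (Slurm here) for the state of each watched job
        time.sleep(POLL_INTERVAL)  # raising this interval reduces how often Galaxy polls Slurm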

On Wed, May 24, 2017 at 8:58 AM, Nate Coraor <n...@bx.psu.edu> wrote:

> Hi Phil,
>
> Unfortunately, there's no configurable for this. You can, however, modify
> the loop's sleep here:
>
> https://github.com/galaxyproject/galaxy/blob/d7a0fdfaa748ca427dda1cef74b81e253f1921d5/lib/galaxy/jobs/runners/__init__.py#L552
>
> Hope this helps,
> --nate
>
> On Wed, May 24, 2017 at 10:40 AM, Philip Blood <bl...@psc.edu> wrote:
>
>> An admin at our center noticed our Galaxy instance is polling slurm every
>> second and asked if the polling frequency can be decreased. Is there an
>> easy way to do this?
>>
>> Phil
>>
>> --
>> Philip D. Blood, Ph.D.
>> Senior Computational Scientist Voice: (412) 268-9329
>> Pittsburgh Supercomputing Center   Fax: (412) 268-5832
>> Carnegie Mellon University   Email: bl...@psc.edu
>>
>
>


-- 
Philip D. Blood, Ph.D.
Senior Computational Scientist Voice: (412) 268-9329
Pittsburgh Supercomputing Center   Fax: (412) 268-5832
Carnegie Mellon University   Email: bl...@psc.edu
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/

[galaxy-dev] Decrease Galaxy polling of slurm?

2017-05-24 Thread Philip Blood
An admin at our center noticed our Galaxy instance is polling slurm every
second and asked if the polling frequency can be decreased. Is there an
easy way to do this?

Phil

-- 
Philip D. Blood, Ph.D.
Senior Computational Scientist Voice: (412) 268-9329
Pittsburgh Supercomputing Center   Fax: (412) 268-5832
Carnegie Mellon University   Email: bl...@psc.edu

Re: [galaxy-dev] Pulsar: issue configuring to use shared filesystem rather than staging data

2016-12-02 Thread Philip Blood
Marius,

Thanks for this suggestion! John also suggested this on IRC. I had thought
about this, but I initially avoided it for some technical reasons. However,
I'm happy to say this is working for me. Thanks again for the pointers!

Phil

On Thu, Dec 1, 2016 at 11:33 AM, Marius van den Beek <m.vandenb...@gmail.com> wrote:

> I hope John can give some more input on this, but if you're really short
> on time you could try the ssh runner directly from galaxy without pulsar.
> I'm very happy with this driving a Torque 4 cluster.
>
> The relevant part of my job_conf.xml looks like this:
>
> ```
>
> 
> SecureShell
> Torque
> myusername
> submitnode.curie.fr
> walltime=2:00:00,nodes=1:ppn=8,mem=32gb
> true
>  />
>
> 
>
> ```
>
> You'll only have to be able to do a passwordless ssh login using your
> galaxy user.
>
> Best,
>
> Marius
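
The XML tags in the snippet above did not survive the list archive. Based
on Galaxy's cli runner parameters (shell_plugin, job_plugin,
shell_username, shell_hostname), a destination matching those values would
look roughly like this; the destination id and the exact param ids are
assumptions, not Marius's actual file:

<destination id="ssh_torque" runner="cli">
    <param id="shell_plugin">SecureShell</param>
    <param id="job_plugin">Torque</param>
    <param id="shell_username">myusername</param>
    <param id="shell_hostname">submitnode.curie.fr</param>
    <param id="job_Resource_List">walltime=2:00:00,nodes=1:ppn=8,mem=32gb</param>
    <param id="embed_metadata_in_job">True</param>
</destination>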
>
> On 01/12/2016 15:35, Philip Blood wrote:
>
> Hi Folks,
>
> I'm working on a time-sensitive project (just a day or two left to sort it
> out) that requires I be able to submit jobs from a remote resource (TACC)
> to my resources (Pittsburgh Supercomputing Center) using a shared
> filesystem for data rather than staging via Pulsar. I've tried to set it up
> so that Pulsar does not do any staging, but is just handling the remote job
> submission and tool dependencies. However, Pulsar continues to use a local
> staging directory for the data and then copies the data back to the Galaxy
> directory.
>
> What I'm hoping to do is have all the data stay in the shared filesystem
> for the entire course of the job. The quick test below does not have input
> data, but just issues commands on the remote compute node and generates
> output.
>
> I'm using latest stable releases of Galaxy (16.07, installed on a TACC VM)
> and Pulsar (installed via pip at PSC). If anyone can provide some quick
> pointers I'd appreciate it.
>
> Here are relevant parts of my configuration and some of the log output:
>
> *Shared filesystem:* /hopper
>
> *Local staging dir created by Pulsar:* /usr/local/packages/pulsar/etc/files/staging
>
> *galaxy.ini*
> file_path = /hopper/sy3l67p/blood/galaxy_home/database/files
> new_file_path = /hopper/sy3l67p/blood/galaxy_home/database/tmp
> job_working_directory = /hopper/sy3l67p/blood/galaxy_home/database/job_working_directory
> tool_dependency_dir = /hopper/sy3l67p/blood/galaxy_home/database/dependencies
> dependency_resolvers_config_file = /hopper/sy3l67p/blood/galaxy_home/config/dependency_resolvers_conf.xml
>
>
> *job_conf.xml *
>
> https://128.182.99.126:8913/
> none
> /home/tg455546/galaxy/config/file_actions.yaml
> remote
> -q batch
>
> 
>
> *file_actions.yml*
> paths:
>   # If Galaxy, the Pulsar, and the compute nodes all mount the same
> directory
>   # staging can be disabled altogether for given paths.
>   - path: /hopper/sy3l67p/blood/galaxy_home/database/files
> action: none
>   - path: /hopper/sy3l67p/blood/galaxy_home/database/job_working_directory
> action: none
>
> *app.yml*
> ---
> manager:
>   type: queued_drmaa
> dependency_resolvers_config_file: /usr/local/packages/pulsar/etc/dependency_resolvers_conf.xml
> tool_dependency_dir: /usr/local/packages/pulsar/dependencies
>
> *Galaxy paster.log*
> galaxy.jobs DEBUG 2016-12-01 08:07:35,770 (50) Working directory for job
> is: /hopper/sy3l67p/blood/galaxy_home/database/job_working_directory\
> /000/50
> pulsar.client.staging.down INFO 2016-12-01 08:08:08,305 collecting output
> output with action FileAction[action_type=copy]
> pulsar.client.client DEBUG 2016-12-01 08:08:08,814 Copying path
> [/usr/local/packages/pulsar/etc/files/staging/50/working/output] to
> [/hopper/\
> sy3l67p/blood/galaxy_home/database/files/000/dataset_50.dat]
>
> *Pulsar uwsgi.log*
> 2016-12-01 09:07:38,174 INFO  [pulsar.managers.base.base_drmaa][[manager=_default_]-[action=preprocess]-[job=50]] Submitting DRMAA job with nativeSpecification [-q batch]
> t #78bc [   539.97] -> drmaa_allocate_job_template
> t #78bc [   539.97] <- drmaa_allocate_job_template =0: jt=0x7f69cc002f80
> t #78bc [   539.97] -> drmaa_set_attribute(jt=0x7f69cc002f80,
> name='drmaa_remote_command', value='/usr/local/packages/pulsar/etc/\
> files/staging/50/command.sh')
> t #78bc [   539.97] -> fsd_template_set_attr(drmaa_remote_command=/usr/local/packages/pulsar/etc/files/staging/50/command.sh)
> t #78bc [   539.97] <- drmaa

[galaxy-dev] Pulsar: issue configuring to use shared filesystem rather than staging data

2016-12-01 Thread Philip Blood
Hi Folks,

I'm working on a time-sensitive project (just a day or two left to sort it
out) that requires I be able to submit jobs from a remote resource (TACC)
to my resources (Pittsburgh Supercomputing Center) using a shared
filesystem for data rather than staging via Pulsar. I've tried to set it up
so that Pulsar does not do any staging, but is just handling the remote job
submission and tool dependencies. However, Pulsar continues to use a local
staging directory for the data and then copies the data back to the Galaxy
directory.

What I'm hoping to do is have all the data stay in the shared filesystem
for the entire course of the job. The quick test below does not have input
data, but just issues commands on the remote compute node and generates
output.

I'm using latest stable releases of Galaxy (16.07, installed on a TACC VM)
and Pulsar (installed via pip at PSC). If anyone can provide some quick
pointers I'd appreciate it.

Here are relevant parts of my configuration and some of the log output:

*Shared filesystem:* /hopper

*Local staging dir created by Pulsar:*
 /usr/local/packages/pulsar/etc/files/staging

*galaxy.ini*
file_path = /hopper/sy3l67p/blood/galaxy_home/database/files
new_file_path = /hopper/sy3l67p/blood/galaxy_home/database/tmp
job_working_directory =
/hopper/sy3l67p/blood/galaxy_home/database/job_working_directory
tool_dependency_dir =
/hopper/sy3l67p/blood/galaxy_home/database/dependencies
dependency_resolvers_config_file =
/hopper/sy3l67p/blood/galaxy_home/config/dependency_resolvers_conf.xml


*job_conf.xml*
   
https://128.182.99.126:8913/
none
/home/tg455546/galaxy/config/file_actions.yaml
remote
-q batch
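
The XML tags in the job_conf.xml excerpt above were stripped by the
archive. Assuming the Pulsar REST runner, a destination carrying these
values would look roughly like the following; the destination id, runner
id, and param ids are assumptions based on Galaxy's Pulsar runner options,
not the actual file:

<destination id="pulsar_psc" runner="pulsar_rest">
    <param id="url">https://128.182.99.126:8913/</param>
    <param id="default_file_action">none</param>
    <param id="file_action_config">/home/tg455546/galaxy/config/file_actions.yaml</param>
    <param id="dependency_resolution">remote</param>
    <param id="submit_native_specification">-q batch</param>
</destination>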



*file_actions.yml*
paths:
  # If Galaxy, the Pulsar, and the compute nodes all mount the same
directory
  # staging can be disabled altogether for given paths.
  - path: /hopper/sy3l67p/blood/galaxy_home/database/files
action: none
  - path: /hopper/sy3l67p/blood/galaxy_home/database/job_working_directory
action: none

*app.yml*
---
manager:
  type: queued_drmaa
dependency_resolvers_config_file:
/usr/local/packages/pulsar/etc/dependency_resolvers_conf.xml
tool_dependency_dir: /usr/local/packages/pulsar/dependencies

*Galaxy paster.log*
galaxy.jobs DEBUG 2016-12-01 08:07:35,770 (50) Working directory for job
is: /hopper/sy3l67p/blood/galaxy_home/database/job_working_directory\
/000/50
pulsar.client.staging.down INFO 2016-12-01 08:08:08,305 collecting output
output with action FileAction[action_type=copy]
pulsar.client.client DEBUG 2016-12-01 08:08:08,814 Copying path
[/usr/local/packages/pulsar/etc/files/staging/50/working/output] to
[/hopper/\
sy3l67p/blood/galaxy_home/database/files/000/dataset_50.dat]

*Pulsar uwsgi.log*
2016-12-01 09:07:38,174 INFO  [pulsar.managers.base.base_drmaa][[manager=_default_]-[action=preprocess]-[job=50]] Submitting DRMAA job with nativeSpecification [-q batch]
t #78bc [   539.97] -> drmaa_allocate_job_template
t #78bc [   539.97] <- drmaa_allocate_job_template =0: jt=0x7f69cc002f80
t #78bc [   539.97] -> drmaa_set_attribute(jt=0x7f69cc002f80,
name='drmaa_remote_command', value='/usr/local/packages/pulsar/etc/\
files/staging/50/command.sh')
t #78bc [   539.97] ->
fsd_template_set_attr(drmaa_remote_command=/usr/local/packages/pulsar/etc/files/staging/50/command.sh)
t #78bc [   539.97] <- drmaa_set_attribute =0
t #78bc [   539.97] -> drmaa_set_attribute(jt=0x7f69cc002f80,
name='drmaa_output_path', value=':/usr/local/packages/pulsar/etc/fi\
les/staging/50/stdout')
t #78bc [   539.97] ->
fsd_template_set_attr(drmaa_output_path=:/usr/local/packages/pulsar/etc/files/staging/50/stdout)
t #78bc [   539.97] <- drmaa_set_attribute =0
t #78bc [   539.97] -> drmaa_set_attribute(jt=0x7f69cc002f80,
name='drmaa_job_name', value='pulsar_50')
t #78bc [   539.97] -> fsd_template_set_attr(drmaa_job_name=pulsar_50)
t #78bc [   539.97] <- drmaa_set_attribute =0
t #78bc [   539.97] -> drmaa_set_attribute(jt=0x7f69cc002f80,
name='drmaa_error_path', value=':/usr/local/packages/pulsar/etc/fil\
es/staging/50/stderr')
t #78bc [   539.97] ->
fsd_template_set_attr(drmaa_error_path=:/usr/local/packages/pulsar/etc/files/staging/50/stderr)
t #78bc [   539.97] -> drmaa_set_attribute(jt=0x7f69cc002f80,
name='drmaa_native_specification', value='-q batch')
t #78bc [   539.97] -> fsd_template_set_attr(drmaa_native_specification=-q
batch)
t #78bc [   539.97] <- drmaa_set_attribute =0
t #78bc [   539.97] -> drmaa_run_job(jt=0x7f69cc002f80)
t #78bc [   539.97] -> pbsdrmaa_session_run_impl(jt=0x7f69cc002f80,
bulk_idx=-1)
t #78bc [   539.97] -> fsd_template_set_attr(Checkpoint=u)
t #78bc [   539.97] -> fsd_template_set_attr(Keep_Files=n)
t #78bc [   539.97] -> fsd_template_set_attr(Priority=0)
t #78bc [   539.97] -> pbsdrmaa_write_tmpfile
t #78bc [   539.97] <- pbsdrmaa_write_tmpfile=/tmp/pbs_drmaa.n6VpuR
t #78bc [   539.97] -> fsd_template_set_attr(Job_Name=pulsar_50)
t #78bc [   539.97] ->