[slurm-dev] Re: let sbatch run a script

2017-01-31 Thread Lachlan Musicman
There's always the --dependency flag for sbatch. So yes, depending on what you wanted, you could line up another sbatch after the first if you liked. cheers L. -- The most dangerous phrase in the language is, "We've always done it this way." - Grace Hopper On 1 February 2017 at 08:38, TO_We
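
A minimal sketch of the chaining idea, assuming two hypothetical job scripts (first.sh and second.sh) and that the second should start only if the first finishes successfully. The function name submit_chain is made up for illustration; --parsable and --dependency=afterok:<jobid> are standard sbatch options.

```shell
#!/bin/sh
# Submit one job script, then a second one that waits for the first to succeed.
# submit_chain is a hypothetical helper name; both script arguments are placeholders.
submit_chain() {
    first_script=$1
    second_script=$2
    # --parsable makes sbatch print only the job ID (optionally "jobid;cluster").
    first_id=$(sbatch --parsable "$first_script") || return 1
    # Drop the ";cluster" suffix if one is present.
    first_id=${first_id%%;*}
    # afterok: the second job becomes eligible only after the first exits with code 0.
    sbatch --dependency="afterok:${first_id}" "$second_script"
}

# Usage (not run here): submit_chain first.sh second.sh
```

Other dependency types such as afterany or afternotok may fit better, depending on what should happen when the first job fails.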

[slurm-dev] Re: Node switching to DRAIN for unknown reason, trouble shooting ideas?

2017-01-31 Thread Lachlan Musicman
trivial questions: does the node have the correct time wrt the head node? and is the node correctly configured in slurm.conf? (# of cpus, amount of memory, etc) cheers L. -- The most dangerous phrase in the language is, "We've always done it this way." - Grace Hopper On 1 February 2017 at 08:03, E V wrote:
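
Both checks can be scripted. The following is a rough sketch, assuming passwordless ssh from the head node to the compute node and that slurmd is on the remote PATH; the function names clock_skew and detected_config are hypothetical. slurmd -C prints the hardware that slurmd detects, in slurm.conf syntax, so it can be compared with the NodeName line in slurm.conf.

```shell
#!/bin/sh
# Print the clock offset in seconds (node time minus local time).
# The ssh round trip adds a small error, so treat a second or two as noise.
clock_skew() {
    node=$1
    local_ts=$(date +%s)
    remote_ts=$(ssh "$node" date +%s) || return 1
    echo $((remote_ts - local_ts))
}

# Print the configuration slurmd detects on the node (CPUs, memory, and so on).
detected_config() {
    ssh "$1" slurmd -C
}

# Usage (not run here): clock_skew hpcc-1; detected_config hpcc-1
```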

[slurm-dev] Re: let sbatch run a script

2017-01-31 Thread TO_Webmaster
You might use a job submission plugin, but in this case, you have to be aware of the fact that the job can still be rejected by slurm after the plugin was invoked. Another idea might be to wrap sbatch. Maybe there is a much better way. I'm really not sure. Could you provide a few details about wh

[slurm-dev] Re: Backup controller not responding to requests

2017-01-31 Thread TO_Webmaster
Still, for maintenance of the primary controller, you should call "scontrol takeover" because Slurm commands, especially "srun", do not work during these 60 seconds otherwise. 2017-01-31 19:12 GMT+01:00 Andrus, Brian Contractor : > Aha! That was just far too large for me. Set it down to 60 second

[slurm-dev] Re: Displaying Messages to Users from job_submit.lua

2017-01-31 Thread TO_Webmaster
We also still have 16.05 in our production system. We thought about exchanging the data through files but decided that it is better to use the comment field of the job (in contrast to the AdminComment field, this field is available also in 16.05, but can also be used by the user [which does not happ

[slurm-dev] Re: Node switching to DRAIN for unknown reason, trouble shooting ideas?

2017-01-31 Thread E V
Enabling debug5 doesn't show anything more useful. I don't see anything relevant in slurmd.log, just job starts and stops. slurmctld.log has the takeover output, with the backup head node immediately draining itself the same as before, but with more of the context before the DRAIN: [2017-01-31T15:37:38.387]

[slurm-dev] Slurm versions 16.05.9 and 17.02.0-0rc1 are now available

2017-01-31 Thread Danny Auble
We are pleased to announce the availability of Slurm versions 16.05.9 and 17.02.0-0rc1 (release candidate 1). 16.05.9 contains around 25 rather minor bug fixes. Please upgrade at your leisure. The rc release contains all of the features intended for release 17.02. Development has ended for

[slurm-dev] Re: Backup controller not responding to requests

2017-01-31 Thread Andrus, Brian Contractor
Aha! That was just far too large for me. Set it down to 60 seconds and things seem happier (along with the users). Thanks! Brian -Original Message- From: TO_Webmaster [mailto:luftha...@gmail.com] Sent: Tuesday, January 31, 2017 12:26 AM To: slurm-dev Subject: [slurm-dev] Re: Backup co

[slurm-dev] Re: Displaying Messages to Users from job_submit.lua

2017-01-31 Thread Yang Liu
We still have Slurm 16.05. With the wrapper idea, I will try dumping the message to a file and then displaying the file contents in an sbatch wrapper. Thanks a lot. Yang From: TO_Webmaster Sent: Tuesday, January 31, 2017 11:32 AM To: slurm-dev Subject: [slurm-dev]

[slurm-dev] Re: Displaying Messages to Users from job_submit.lua

2017-01-31 Thread TO_Webmaster
I was told that this is impossible at the moment. We ended up putting the message in the comment field of the job (in Slurm 17.02, you can use the AdminComment) and writing a small wrapper to sbatch that prints the message to the screen. Anyway, it would be nice if such a feature would be adde
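
A wrapper along these lines might look like the sketch below. This is not the poster's actual wrapper: the function name sbatch_with_message, the REAL_SBATCH variable and its default path are assumptions, and it presumes job_submit.lua has stored the message in the job's comment field. The squeue format specifier %k prints the job comment.

```shell
#!/bin/sh
# Path to the real sbatch binary (assumed location; adjust for your site).
REAL_SBATCH=${REAL_SBATCH:-/usr/bin/sbatch}

# Hypothetical wrapper: submit the job, then print the job's comment field.
sbatch_with_message() {
    out=$("$REAL_SBATCH" "$@") || return $?
    printf '%s\n' "$out"
    # sbatch normally prints "Submitted batch job <id>"; extract the ID.
    job_id=$(printf '%s\n' "$out" |
        sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p')
    [ -n "$job_id" ] || return 0
    # -h suppresses the header, %k is the comment associated with the job.
    msg=$(squeue -h -j "$job_id" -o '%k' 2>/dev/null)
    # Assumption: an unset comment shows up as empty or "(null)".
    if [ -n "$msg" ] && [ "$msg" != "(null)" ]; then
        printf 'Message from job_submit: %s\n' "$msg"
    fi
}

# Usage (not run here): sbatch_with_message job.sh
```

Because users can set the comment field themselves, the printed text is only as trustworthy as the submit plugin's handling of that field.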

[slurm-dev] Displaying Messages to Users from job_submit.lua

2017-01-31 Thread Yang Liu
In job_submit.lua, slurm.user_msg displays a message only when slurm.ERROR is returned from job_submit.lua. Is there a way to display a message to users when slurm.SUCCESS is returned? Yang High Performance Research Computing Texas A&M University

[slurm-dev] Re: Node switching to DRAIN for unknown reason, trouble shooting ideas?

2017-01-31 Thread E V
No epilog scripts defined, and access to save state is fine, as an scontrol takeover works, but does have the side effect of the backup draining itself. I set SlurmctldDebug to debug3 and didn't get much more info: [2017-01-31T09:45:22.329] debug2: node_did_resp hpcc-1 [2017-01-31T09:45:22.329] de

[slurm-dev] let sbatch run a script

2017-01-31 Thread Malte Thoma
I guess this is a simple question, but I have not found an answer yet: We would like to run a script just after sbatch was launched. We are aware of PrologSlurmctld= and Prolog= in slurm.conf, but AFAIK these are executed when the job is allocated and there might be several hours between submis

[slurm-dev] Lua job_submit script enhancement

2017-01-31 Thread CB
Dear Slurm Developers, I've added the following changes so that the Lua job submit script can handle jobs submitted with the --immediate flag in the Slurm 16.05.x source branch. I'm wondering if this change can be checked in. $ git status # On branch slurm-16.05 # Changes not staged for commit:

[slurm-dev] Re: SPANK and job variables/options

2017-01-31 Thread Carlos Fenoy
Hi, You should take a look at the job_submit plugin. That is the best place to check if a job should be queued or it can be rejected otherwise. Regards, Carlos On Thu, Jan 26, 2017 at 1:03 PM, Dmitry Chirikov wrote: > Hi all, > > Playing with SPANK I faced up with an issue > Seems I can't get

[slurm-dev] RE: SPANK and job variables/options

2017-01-31 Thread Sam Gallop (NBI)
Hi Dmitry, I can see the confusion, my last mail was poorly worded. When I ran the plugin in the S_CTX_REMOTE context I was able to retrieve some information. I wasn't able to retrieve any data when running in the S_CTX_ALLOCATOR context. Again, apologies for the confusion. --- Sam Gallop F

[slurm-dev] Re: Backup controller not responding to requests

2017-01-31 Thread TO_Webmaster
What is the output of scontrol show config | grep SlurmctldTimeout ? 2017-01-31 6:57 GMT+01:00 Andrus, Brian Contractor : > Yes, if I do scontrol takeover, it successfully goes to the backup. > > > Brian Andrus > ITACS/Research Computing > Naval Postgraduate School > Monterey, California > voic
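
For scripted checks, the same value can be pulled out as a bare number. A small sketch, assuming scontrol show config prints the setting in the usual "SlurmctldTimeout = <n> sec" form; the function name slurmctld_timeout is hypothetical.

```shell
#!/bin/sh
# Print the configured SlurmctldTimeout in seconds, digits only.
slurmctld_timeout() {
    scontrol show config |
        awk -F'=' '/^SlurmctldTimeout/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Usage (not run here): slurmctld_timeout
```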