Re: [slurm-users] How to debug a prolog script?

2022-10-29 Thread Davide DelVento
Finally I found some time available when I could do the job without disrupting my users. It turned out to be both the permissions issue as discussed here, and the fact that the slurm.conf needs the fully qualified path of the prolog script. So that is solved, but sadly my problem is not solved as
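
The two causes named above (a missing execute bit and a path that is not fully qualified) can be checked by hand before slurmd touches the script. The following is only a sketch: `check_prolog` is a hypothetical helper, not a Slurm tool, and the temporary file stands in for whatever path the `Prolog` setting in slurm.conf points to.

```shell
#!/bin/bash
# check_prolog is a hypothetical helper (an assumption, not part of Slurm).
# It reports the first problem it finds with a prolog script path.
check_prolog() {
    local path=$1
    case $path in
        /*) ;;  # fully qualified, as slurm.conf needs
        *) echo "not an absolute path: $path"; return 1 ;;
    esac
    [ -f "$path" ] || { echo "missing: $path"; return 1; }
    [ -x "$path" ] || { echo "not executable: $path"; return 1; }
    echo "ok: $path"
}

# Demo on a throwaway file standing in for the real prolog script.
demo_dir=$(mktemp -d)
printf '#!/bin/bash\nexit 0\n' > "$demo_dir/prolog.sh"

chmod 600 "$demo_dir/prolog.sh"                       # no execute bit
before=$(check_prolog "$demo_dir/prolog.sh") || true

chmod 700 "$demo_dir/prolog.sh"                       # exec for the owner only
after=$(check_prolog "$demo_dir/prolog.sh") || true

relative=$(check_prolog "prolog.sh") || true          # not fully qualified

echo "$before"
echo "$after"
echo "$relative"
```

Running the check on each compute node, as root, is closest to what slurmd sees, since the prolog runs on the nodes and not on the head.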

Re: [slurm-users] How to debug a prolog script?

2022-09-18 Thread Bjørn-Helge Mevik
Davide DelVento writes:

>> I'm curious: What kind of disruption did it cause for your production
>> jobs?
>
> All jobs failed and went in pending/held with "launch failed requeued
> held" status, all nodes where the jobs were scheduled went draining.
>
> The logs only said "error: validate_node_s

Re: [slurm-users] How to debug a prolog script?

2022-09-16 Thread Davide DelVento
Thanks a lot.

> > Does it need the execution permission? For root alone sufficient?
>
> slurmd runs as root, so it only needs exec perms for root.

Perfect. That must have been it, then, since my script (like the example one) did not have the execution permission on.

> I'm curious: What kind of disrup

Re: [slurm-users] How to debug a prolog script?

2022-09-16 Thread Bjørn-Helge Mevik
Davide DelVento writes:

> Does it need the execution permission? For root alone sufficient?

slurmd runs as root, so it only needs exec perms for root.

>> > 2. How to debug the issue?
>> I'd try capturing all stdout and stderr from the script into a file on the
>> compute
>> node, for instance l

Re: [slurm-users] How to debug a prolog script?

2022-09-16 Thread Davide DelVento
Thanks to both of you.

> Permissions on the file itself (and the directories in the path to it)

Does it need the execution permission? For root alone sufficient?

> Existence of the script on the nodes (prologue is run on the nodes, not the
> head)

Yes, it's in a shared filesystem.

> Not sure

Re: [slurm-users] How to debug a prolog script?

2022-09-16 Thread Bjørn-Helge Mevik
Davide DelVento writes:

> 2. How to debug the issue?

I'd try capturing all stdout and stderr from the script into a file on the compute node, for instance like this:

    exec &> /root/prolog_slurmd.$$
    set -x # To print out all commands

before any other commands in the script. The "prolog_slurmd.
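
To try this capture technique without touching production, something like the following can be run by hand. It is a sketch under assumptions: `LOG_DIR` is a hypothetical variable added here so the demo can log outside /root, and the test prolog is the one from the original question with the two debugging lines placed first.

```shell
#!/bin/bash
# Build a throwaway copy of the prolog in a temp dir and run it manually.
# LOG_DIR is a hypothetical variable for this demo; the suggestion in the
# thread logs under /root, which is the fallback used below.
demo_dir=$(mktemp -d)
export LOG_DIR="$demo_dir"

cat > "$demo_dir/prolog.sh" <<'EOF'
#!/bin/bash
# Send all stdout and stderr of this script to a per-process file.
exec &> "${LOG_DIR:-/root}/prolog_slurmd.$$"
# Trace every command before it runs.
set -x
if [[ $VAR == 1 ]]; then
    echo "True"
fi
exit 0
EOF

# Run it the way a manual test would, with VAR set for the demo.
VAR=1 bash "$demo_dir/prolog.sh"
prolog_status=$?

# Exactly one log file exists, named after the PID of that run.
prolog_log=$(cat "$demo_dir"/prolog_slurmd.*)
echo "exit status: $prolog_status"
echo "$prolog_log"
```

The log holds both the `set -x` trace and the script's own output, so a failing command shows up next to the line that ran it. Note that a manual run does not reproduce the environment slurmd provides, so a clean result here only rules out errors in the script itself.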

Re: [slurm-users] How to debug a prolog script?

2022-09-15 Thread Brian Andrus
Davide,

Quick things to check:

* Permissions on the file itself (and the directories in the path to it)
* Existence of the script on the nodes (prologue is run on the nodes, not the head)

Not sure your error is the prologue script itself. Does everything run fine with no prologue configur

[slurm-users] How to debug a prolog script?

2022-09-15 Thread Davide DelVento
I have a super simple prolog script, as follows (very similar to the example one):

    #!/bin/bash
    if [[ $VAR == 1 ]]; then
        echo "True"
    fi
    exit 0

This fails (and obviously causes great disruption to my production jobs). So I have two questions:

1. Why does it fail? It does so regardless of