"Fixed it"... ish.

One of our prolog scripts does 'sleep 10' after dropping caches, before
checking free memory again.  That sleep line alone kills all srun jobs,
somehow.

I did add PrologEpilogTimeout=120, but it didn't help at all, so I just
removed the sleep.

Looks like it may be related to this bug:

https://bugs.schedmd.com/show_bug.cgi?id=2214
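For reference, the offending pattern looked roughly like this (a minimal sketch, not our actual script; the drop_caches write is commented out here since it needs root, and the exact commands around the sleep are assumptions). Since slurmd serializes task launch behind the prolog, an unconditional sleep here delays the launch RPC reply long enough for srun's socket send/recv to time out:

```shell
#!/bin/bash
# Hypothetical prolog sketch. slurmd holds the prolog lock while this runs,
# so anything slow here stalls the REQUEST_LAUNCH_TASKS reply to srun.
sync
# echo 3 > /proc/sys/vm/drop_caches   # requires root; shown for context only
# sleep 10                            # <- this line alone broke srun launches
awk '/MemFree/ {print "free_kb=" $2}' /proc/meminfo
```

In our case removing the sleep was enough; raising PrologEpilogTimeout did not change the behavior.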

On Mon, Jun 12, 2017 at 2:27 PM, Michael Senizaiz <[email protected]> wrote:

> SRUN fails on all hosts.  All other features work, MPI jobs run fine as
> long as we bypass SRUN.
>
> Error:
>
> srun: error: slurm_receive_msgs: Socket timed out on send/recv operation
> srun: error: Task launch for 5.0 failed on node node002: Socket timed out
> on send/recv operation
> srun: error: Application launch failed: Socket timed out on send/recv
> operation
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>
>
> This happens on single and multi-node jobs.  All hosts use DNS, so there
> shouldn't be any weird name mismatches.
> SELinux is DISABLED:
>
> Log (/var/spool/slurm/slurmd.log on node002):
> [2017-06-12T14:08:40.654] debug2: Processing RPC: REQUEST_LAUNCH_TASKS
> [2017-06-12T14:08:40.654] launch task 5.0 request from [email protected]
> (port 7638)
> [2017-06-12T14:08:40.654] debug3: state for jobid 2850: ctime:1497018336
> revoked:0 expires:0
> [2017-06-12T14:08:40.654] debug3: state for jobid 2882: ctime:1497023548
> revoked:1497023548 expires:1497023548
> [2017-06-12T14:08:40.654] debug3: destroying job 2882 state
> [2017-06-12T14:08:40.654] debug3: state for jobid 2886: ctime:1497023588
> revoked:1497023588 expires:1497023588
> [2017-06-12T14:08:40.654] debug3: destroying job 2886 state
> [2017-06-12T14:08:40.654] debug3: state for jobid 2892: ctime:1497023659
> revoked:1497023659 expires:1497023659
> [2017-06-12T14:08:40.654] debug3: destroying job 2892 state
> [2017-06-12T14:08:40.654] debug:  Checking credential with 276 bytes of
> sig data
> [2017-06-12T14:08:40.654] debug:  task_p_slurmd_launch_request: 5.0 0
> [2017-06-12T14:08:40.655] debug:  [job 5] attempting to run prolog
> [/var/spool/slurm/prolog_and_epilog/bin/prolog]
> [2017-06-12T14:09:06.467] _run_prolog: run job script took usec=25811904
> [2017-06-12T14:09:06.467] _run_prolog: prolog with lock for job 5 ran for
> 26 seconds
> [2017-06-12T14:09:06.467] prolog for job 5 ran for 26 seconds
> [2017-06-12T14:09:06.467] debug3: _rpc_launch_tasks: call to
> _forkexec_slurmstepd
> [2017-06-12T14:09:06.467] debug3: slurmstepd rank 0 (dh0002), parent rank
> -1 (NONE), children 0, depth 0, max_depth 0
> [2017-06-12T14:09:06.467] debug3: _send_slurmstepd_init: call to getpwuid_r
> [2017-06-12T14:09:06.467] debug3: _send_slurmstepd_init: return from
> getpwuid_r
> [2017-06-12T14:09:06.467] debug2: Cached group access list for root/0
> [2017-06-12T14:09:06.580] debug2: debug level is 8.
> [2017-06-12T14:09:06.580] debug3: Trying to load plugin
> /usr/lib64/slurm/switch_none.so
> [2017-06-12T14:09:06.580] debug:  switch NONE plugin loaded
> [2017-06-12T14:09:06.580] debug3: Success.
> [2017-06-12T14:09:06.580] debug3: slurmstepd rank 0, parent address =
> 0.0.0.0, port = 0
> [2017-06-12T14:09:06.591] debug2: Received CPU frequency information for
> 24 CPUs
> [2017-06-12T14:09:06.597] debug2: got gid 0
> [2017-06-12T14:09:06.597] debug2: got gid 1
> [2017-06-12T14:09:06.597] debug2: got gid 2
> [2017-06-12T14:09:06.597] debug2: got gid 3
> [2017-06-12T14:09:06.597] debug2: got gid 4
> [2017-06-12T14:09:06.597] debug2: got gid 6
> [2017-06-12T14:09:06.597] debug2: got gid 10
> [2017-06-12T14:09:06.597] debug2: setup for a launch_task
> [2017-06-12T14:09:06.597] debug3: entering stepd_step_rec_create
> [2017-06-12T14:09:06.602] debug3: Trying to load plugin
> /usr/lib64/slurm/acct_gather_profile_none.so
> [2017-06-12T14:09:06.602] debug:  AcctGatherProfile NONE plugin loaded
> [2017-06-12T14:09:06.602] debug3: Success.
> [2017-06-12T14:09:06.602] debug3: Trying to load plugin
> /usr/lib64/slurm/acct_gather_energy_none.so
> [2017-06-12T14:09:06.606] debug:  AcctGatherEnergy NONE plugin loaded
> [2017-06-12T14:09:06.606] debug3: Success.
> [2017-06-12T14:09:06.606] debug3: Trying to load plugin
> /usr/lib64/slurm/acct_gather_infiniband_none.so
> [2017-06-12T14:09:06.609] debug:  AcctGatherInfiniband NONE plugin loaded
> [2017-06-12T14:09:06.609] debug3: Success.
> [2017-06-12T14:09:06.609] debug3: Trying to load plugin
> /usr/lib64/slurm/acct_gather_filesystem_none.so
> [2017-06-12T14:09:06.609] debug:  AcctGatherFilesystem NONE plugin loaded
> [2017-06-12T14:09:06.609] debug3: Success.
> [2017-06-12T14:09:06.609] debug2: No acct_gather.conf file
> (/etc/slurm/acct_gather.conf)
> [2017-06-12T14:09:06.609] debug3: Trying to load plugin
> /usr/lib64/slurm/jobacct_gather_none.so
> [2017-06-12T14:09:06.609] debug:  Job accounting gather NOT_INVOKED plugin
> loaded
> [2017-06-12T14:09:06.609] debug3: Success.
> [2017-06-12T14:09:06.609] debug3: jobacct_gather dynamic logging enabled
> [2017-06-12T14:09:06.609] debug3: acct_gather_profile_startpoll dynamic
> logging enabled
> [2017-06-12T14:09:06.609] debug2: profile signalling type Task
> [2017-06-12T14:09:06.609] debug2: scaling CPU count by factor of 2
> (24/(12-0)
> [2017-06-12T14:09:06.609] debug3: _rpc_launch_tasks: return from
> _forkexec_slurmstepd
> [2017-06-12T14:09:06.609] [5.0] debug:  Message thread started pid = 9825
> [2017-06-12T14:09:06.609] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:06.609] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:06.609] [5.0] debug3: Entered job_manager for 5.0
> pid=9825
> [2017-06-12T14:09:06.609] [5.0] debug3: Trying to load plugin
> /usr/lib64/slurm/core_spec_none.so
> [2017-06-12T14:09:06.609] debug2: slurm_send_timeout: Socket no longer
> there
> [2017-06-12T14:09:06.609] debug3: slurm_msg_sendto: peer has disappeared
> for msg_type=8001
> [2017-06-12T14:09:06.609] error: _rpc_launch_tasks: unable to send return
> code to address:port=10.30.0.42:54813 msg_type=6001: Transport endpoint
> is not connected
> [2017-06-12T14:09:06.617] [5.0] debug3: Success.
> [2017-06-12T14:09:06.617] [5.0] debug3: Trying to load plugin
> /usr/lib64/slurm/task_none.so
> [2017-06-12T14:09:06.617] [5.0] debug:  task NONE plugin loaded
> [2017-06-12T14:09:06.617] [5.0] debug3: Success.
> [2017-06-12T14:09:06.617] [5.0] debug3: Trying to load plugin
> /usr/lib64/slurm/proctrack_pgid.so
> [2017-06-12T14:09:06.617] [5.0] debug3: Success.
> [2017-06-12T14:09:06.617] [5.0] debug3: Trying to load plugin
> /usr/lib64/slurm/checkpoint_none.so
> [2017-06-12T14:09:06.622] [5.0] debug3: Success.
> [2017-06-12T14:09:06.622] [5.0] debug:  Checkpoint plugin loaded:
> checkpoint/none
> [2017-06-12T14:09:06.622] [5.0] debug3: Trying to load plugin
> /usr/lib64/slurm/crypto_munge.so
> [2017-06-12T14:09:06.627] [5.0] Munge cryptographic signature plugin loaded
> [2017-06-12T14:09:06.627] [5.0] debug3: Success.
> [2017-06-12T14:09:06.627] [5.0] debug3: Trying to load plugin
> /usr/lib64/slurm/job_container_none.so
> [2017-06-12T14:09:06.627] [5.0] debug:  job_container none plugin loaded
> [2017-06-12T14:09:06.627] [5.0] debug3: Success.
> [2017-06-12T14:09:06.627] [5.0] debug:  mpi type = none
> [2017-06-12T14:09:06.627] [5.0] debug3: Trying to load plugin
> /usr/lib64/slurm/mpi_none.so
> [2017-06-12T14:09:06.627] [5.0] debug3: Success.
> [2017-06-12T14:09:06.627] [5.0] debug2: Before call to spank_init()
> [2017-06-12T14:09:06.627] [5.0] debug:  spank: opening plugin stack
> /etc/slurm/plugstack.conf
> [2017-06-12T14:09:06.627] [5.0] debug2: After call to spank_init()
> [2017-06-12T14:09:06.627] [5.0] debug:  mpi type = (null)
> [2017-06-12T14:09:06.627] [5.0] debug:  mpi/none: slurmstepd prefork
> [2017-06-12T14:09:06.627] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:06.628] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:06.628] [5.0] debug2: Using gid list sent by slurmd
> [2017-06-12T14:09:06.628] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:06.628] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:06.682] [5.0] debug2: Entering _setup_normal_io
> [2017-06-12T14:09:06.682] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:06.682] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:06.682] [5.0] debug2: Using gid list sent by slurmd
> [2017-06-12T14:09:06.682] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:06.682] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:06.682] [5.0] debug:    stdin uses a pty object
> [2017-06-12T14:09:06.682] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:06.682] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:06.683] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:06.683] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:06.683] [5.0] debug:  init pty size 53:161
> [2017-06-12T14:09:06.683] [5.0] in _window_manager
> [2017-06-12T14:09:06.683] [5.0] debug4: adding IO connection (logical node
> rank 0)
> [2017-06-12T14:09:06.683] [5.0] debug4: connecting IO back to
> 10.30.0.42:57166
> [2017-06-12T14:09:06.684] [5.0] debug2: Entering io_init_msg_write_to_fd
> [2017-06-12T14:09:06.684] [5.0] debug2:   msg->nodeid = 0
> [2017-06-12T14:09:06.684] [5.0] debug2: Leaving  io_init_msg_write_to_fd
> [2017-06-12T14:09:06.684] [5.0] debug2: Leaving  _setup_normal_io
> [2017-06-12T14:09:06.684] [5.0] debug level = 2
> [2017-06-12T14:09:06.684] [5.0] debug:  IO handler started pid=9825
> [2017-06-12T14:09:06.684] [5.0] debug4: eio: handling events for 3 objects
> [2017-06-12T14:09:06.684] [5.0] debug2: Using gid list sent by slurmd
> [2017-06-12T14:09:06.684] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:06.684] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:06.693] [5.0] starting 1 tasks
> [2017-06-12T14:09:06.693] [5.0] task 0 (9831) started 2017-06-12T14:09:06
> [2017-06-12T14:09:06.693] [5.0] debug3: Unblocking 5.0 task 0, writefd = 17
> [2017-06-12T14:09:06.693] [5.0] debug2: read_slurm_cgroup_conf: No
> cgroup.conf file (/etc/slurm/cgroup.conf)
> [2017-06-12T14:09:06.693] [5.0] debug2: xcgroup_load: unable to get cgroup
> '(null)/cpuset' entry '(null)/cpuset/system' properties: No such file or
> directory
> [2017-06-12T14:09:06.693] [5.0] debug:  task_p_pre_launch_priv: 5.0
> [2017-06-12T14:09:06.693] [5.0] debug2: xcgroup_load: unable to get cgroup
> '(null)/memory' entry '(null)/memory/system' properties: No such file or
> directory
> [2017-06-12T14:09:06.693] [5.0] debug:  Sending launch resp rc=0
> [2017-06-12T14:09:06.693] [5.0] debug2: Using gid list sent by slurmd
> [2017-06-12T14:09:06.693] [5.0] debug3: login_tty good
> [2017-06-12T14:09:06.694] [5.0] debug3: slurm_send_only_node_msg: sent 208
> [2017-06-12T14:09:06.694] [5.0] debug:  mpi type = (null)
> [2017-06-12T14:09:06.694] [5.0] debug:  Using mpi/none
> [2017-06-12T14:09:06.694] [5.0] debug:  task_p_pre_launch: 5.0, task 0
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_CPU no change in value: 18446744073709551615
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_FSIZE no change in value: 18446744073709551615
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_DATA no change in value: 18446744073709551615
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_STACK no change in value: 18446744073709551615
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: RLIMIT_CORE   :
> max:inf cur:inf req:0
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_CORE succeeded
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_RSS no change in value: 18446744073709551615
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_NPROC no change in value: 18446744073709551615
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_NOFILE no change in value: 65536
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_MEMLOCK no change in value: 18446744073709551615
> [2017-06-12T14:09:06.694] [5.0] debug2: _set_limit: conf setrlimit
> RLIMIT_AS no change in value: 18446744073709551615
> [2017-06-12T14:09:06.694] [5.0] debug2: Set task rss(128000 MB)
> [2017-06-12T14:09:06.945] [5.0] debug4: Entering _task_read for obj 2139cc0
> [2017-06-12T14:09:06.945] [5.0] debug4: _task_build_message: Entering...
> [2017-06-12T14:09:06.945] [5.0] debug4: Leaving  _task_build_message
> [2017-06-12T14:09:06.945] [5.0] debug4: eio: handling events for 3 objects
> [2017-06-12T14:09:06.945] [5.0] debug4: Entering _task_read for obj 2139cc0
> [2017-06-12T14:09:06.945] [5.0] debug4: _task_build_message: Entering...
> [2017-06-12T14:09:06.945] [5.0] debug4: Leaving  _task_build_message
> [2017-06-12T14:09:06.945] [5.0] debug4: eio: handling events for 3 objects
> [2017-06-12T14:09:06.945] [5.0] debug4: Entering _task_read for obj 2139cc0
> [2017-06-12T14:09:06.945] [5.0] debug4: _task_build_message: Entering...
> [2017-06-12T14:09:06.945] [5.0] debug4: Leaving  _task_build_message
> [2017-06-12T14:09:06.945] [5.0] debug4: eio: handling events for 3 objects
> [2017-06-12T14:09:32.008] [5.0] debug4: Entering _client_read
> [2017-06-12T14:09:32.008] [5.0] debug3: Entering io_hdr_read_fd
> [2017-06-12T14:09:32.008] [5.0] debug3:   _full_read (_client_read) got eof
> [2017-06-12T14:09:32.008] [5.0] debug3: Leaving  io_hdr_read_fd
> [2017-06-12T14:09:32.008] [5.0] debug4: eio: handling events for 3 objects
> [2017-06-12T14:09:32.010] debug3: in the service_connection
> [2017-06-12T14:09:32.011] [5.0] debug:  _slurm_recv_timeout at 0 of 4,
> recv zero bytes
> [2017-06-12T14:09:32.011] debug2: got this type of message 6011
> [2017-06-12T14:09:32.011] debug2: Processing RPC: REQUEST_TERMINATE_JOB
> [2017-06-12T14:09:32.011] debug:  _rpc_terminate_job, uid = 493
> [2017-06-12T14:09:32.011] debug:  task_p_slurmd_release_resources: 5
> [2017-06-12T14:09:32.011] debug3: state for jobid 2850: ctime:1497018336
> revoked:0 expires:0
> [2017-06-12T14:09:32.011] debug3: state for jobid 5: ctime:1497294520
> revoked:0 expires:0
> [2017-06-12T14:09:32.011] debug3: state for jobid 5: ctime:1497294520
> revoked:0 expires:0
> [2017-06-12T14:09:32.011] debug:  credential for job 5 revoked
> [2017-06-12T14:09:32.011] debug4: found jobid = 5, stepid = 0
> [2017-06-12T14:09:32.011] [5.0] debug3: Called _msg_socket_accept
> [2017-06-12T14:09:32.011] [5.0] debug3: Leaving _msg_socket_accept
> [2017-06-12T14:09:32.011] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:32.011] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:32.011] [5.0] debug3: Entering _handle_accept (new
> thread)
> [2017-06-12T14:09:32.011] [5.0] debug3:   Identity: uid=0, gid=0
> [2017-06-12T14:09:32.011] [5.0] debug3: Entering _handle_request
> [2017-06-12T14:09:32.011] debug2: container signal 18 to job 5.0
> [2017-06-12T14:09:32.011] [5.0] debug3: Got request 4
> [2017-06-12T14:09:32.011] [5.0] debug:  Handling REQUEST_SIGNAL_CONTAINER
> [2017-06-12T14:09:32.011] [5.0] debug:  _handle_signal_container for
> step=5.0 uid=0 signal=18
> [2017-06-12T14:09:32.012] [5.0] Sent signal 18 to 5.0
> [2017-06-12T14:09:32.012] [5.0] debug3: Leaving  _handle_request:
> SLURM_SUCCESS
> [2017-06-12T14:09:32.012] [5.0] debug3: Entering _handle_request
> [2017-06-12T14:09:32.012] [5.0] debug3: Leaving  _handle_accept
> [2017-06-12T14:09:32.012] debug4: found jobid = 5, stepid = 0
> [2017-06-12T14:09:32.012] [5.0] debug3: Called _msg_socket_accept
> [2017-06-12T14:09:32.012] [5.0] debug3: Leaving _msg_socket_accept
> [2017-06-12T14:09:32.012] [5.0] debug4: eio: handling events for 1 objects
> [2017-06-12T14:09:32.012] [5.0] debug3: Called _msg_socket_readable
> [2017-06-12T14:09:32.012] [5.0] debug3: Entering _handle_accept (new
> thread)
> [2017-06-12T14:09:32.012] [5.0] debug3:   Identity: uid=0, gid=0
> [2017-06-12T14:09:32.012] [5.0] debug3: Entering _handle_request
> [2017-06-12T14:09:32.012] debug2: container signal 15 to job 5.0
> [2017-06-12T14:09:32.012] [5.0] debug3: Got request 4
> [2017-06-12T14:09:32.012] [5.0] debug:  Handling REQUEST_SIGNAL_CONTAINER
> [2017-06-12T14:09:32.012] [5.0] debug:  _handle_signal_container for
> step=5.0 uid=0 signal=15
> [2017-06-12T14:09:32.012] [5.0] error: *** STEP 5.0 ON dh0002 CANCELLED AT
> 2017-06-12T14:09:32 ***
> [2017-06-12T14:09:32.012] [5.0] Sent signal 15 to 5.0
> [2017-06-12T14:09:32.012] [5.0] debug3: Leaving  _handle_request:
> SLURM_SUCCESS
> [2017-06-12T14:09:32.012] [5.0] debug3: Entering _handle_request
> [2017-06-12T14:09:32.012] debug4: sent SUCCESS
>
> We have also seen:
> Error connecting slurm stream socket at 10.30.0.42:45967: Connection
> refused (where the IP is that of the submit host)
>
>
> When doing an STRACE I do see srun trying to bind to privileged ports and
> failing:
>
>
> 10636 bind(4, {sa_family=AF_INET, sin_port=htons(636),
> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
> 10636 bind(8, {sa_family=AF_INET, sin_port=htons(637),
> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
> 10636 bind(8, {sa_family=AF_INET, sin_port=htons(638),
> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
> 10636 bind(9, {sa_family=AF_INET, sin_port=htons(639),
> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
> 10636 bind(9, {sa_family=AF_INET, sin_port=htons(640),
> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
> 10636 bind(15, {sa_family=AF_INET, sin_port=htons(641),
> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
>
> Here is the slurm.conf
> ClusterName=cluster
> ControlMachine=sched01
> ControlAddr=10.30.4.41
> SlurmUser=slurm
> SlurmctldPort=6817
> SlurmdPort=6818
> AuthType=auth/munge
> StateSaveLocation=/etc/slurm/state
> SlurmdSpoolDir=/var/spool/slurm/jobs
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd.pid
> ProctrackType=proctrack/pgid
> PluginDir=/usr/lib64/slurm
> CacheGroups=0
> ReturnToService=0
> Prolog=/var/spool/slurm/prolog_and_epilog/bin/prolog
> Epilog=/var/spool/slurm/prolog_and_epilog/bin/epilog
> UsePAM=1
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> SchedulerType=sched/backfill
> SchedulerParameters=bf_max_job_start=1,sched_max_job_start=1,preempt_youngest_first
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> FastSchedule=1
> PreemptType=preempt/qos
> PreemptMode=requeue
> SlurmctldDebug=8
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
> SlurmdDebug=8
> SlurmdLogFile=/var/spool/slurm/slurmd.log
> JobCompType=jobcomp/none
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageHost=sched01
> AccountingStorageLoc=slurm_acct_db
> AccountingStoragePass=/var/run/munge/munge.socket.2
> AccountingStoragePort=6819
> AccountingStorageUser=slurm
>
