Didier,
Thank you for your work on this. A bug was indeed introduced in Slurm
version 15.08.7; it is triggered when the AuthInfo value in
slurmdbd.conf differs from its value in slurm.conf. Your patch fixes
this bug and will be in version 15.08.8 when released. The commit is here:
https://github.com/SchedMD/slurm/commit/fa4222ecfbf978695cf850c9c3bae4e175e2c09b
Thanks!
On 2016-02-02 07:55, Didier GAZEN wrote:
Hi,
There is a possible bug in the slurm_get_auth_info function
(src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon
to look for the AuthInfo parameter in slurm.conf instead of
slurmdbd.conf when the auth/munge authentication method is used
(AuthType=auth/munge).
Here is the slurmdbd log revealing the problem (debug5() calls were
added to the sources):
slurmdbd: slurmdbd version 15.08.7 started
slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
slurmdbd: debug: Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
...
Then 60 seconds later, the auth_info value returned by
slurm_get_auth_info is NULL:
slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)
and slurmdbd continues without crashing, but I am not sure it is in a
safe state.
With this patch applied:
diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
index c5db879..be1dab6 100644
--- a/src/common/slurm_protocol_api.c
+++ b/src/common/slurm_protocol_api.c
@@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
 	char *auth_info;
 	slurm_ctl_conf_t *conf;
 
-	conf = slurm_conf_lock();
-	auth_info = xstrdup(conf->authinfo);
-	slurm_conf_unlock();
+	if (slurmdbd_conf) {
+		auth_info = xstrdup(slurmdbd_conf->auth_info);
+	} else {
+		conf = slurm_conf_lock();
+		auth_info = xstrdup(conf->authinfo);
+		slurm_conf_unlock();
+	}
 
 	return auth_info;
 }
the auth_info value is now valid and consistent with the
slurmdbd.conf setting:
slurmdbd: slurmdbd version 15.08.7 started
slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
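For illustration, the selection logic of the patch can be modeled outside
of Slurm roughly as follows. This is only a sketch under stated
assumptions: the dbd_conf_t and ctl_conf_t types, the dup_or_null() helper
and the slurm.conf socket path are hypothetical stand-ins, not Slurm's
actual definitions, and the locking done by slurm_conf_lock() is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the two parsed configuration structures. */
typedef struct { const char *auth_info; } dbd_conf_t;  /* slurmdbd.conf */
typedef struct { const char *authinfo; } ctl_conf_t;   /* slurm.conf */

/* Assumed to be non-NULL only inside the slurmdbd daemon, which is the
 * condition the patch tests. */
static dbd_conf_t *dbd_conf = NULL;

/* Hypothetical slurm.conf value, used by every other program. */
static ctl_conf_t ctl_conf = { "socket=/var/run/munge/munge.socket.2" };

/* Return a heap copy of s, or NULL if s is NULL or allocation fails. */
static char *dup_or_null(const char *s)
{
	char *copy;

	if (s == NULL)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (copy != NULL)
		strcpy(copy, s);
	return copy;
}

/* Pick AuthInfo from the slurmdbd configuration when it is loaded,
 * otherwise fall back to the slurm.conf value. The caller frees the
 * returned string. */
static char *get_auth_info(void)
{
	if (dbd_conf != NULL)
		return dup_or_null(dbd_conf->auth_info);
	return dup_or_null(ctl_conf.authinfo);
}
```

With dbd_conf left at NULL, get_auth_info() returns a copy of the
slurm.conf value; once dbd_conf points at a loaded structure, it returns
that structure's auth_info instead and never consults slurm.conf.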
What do you think?
Best regards,
Didier