Hi,
There is a possible bug in the slurm_get_auth_info function
(src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to
look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf
when the auth/munge authentication method is used (AuthType=auth/munge).
Here is the slurmdbd log showing the problem (debug5() calls were added
to the sources):
slurmdbd: slurmdbd version 15.08.7 started
slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c,
_send_slurmctld_register_req (line 690)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c,
slurm_send_node_msg (line 3601)
slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c,
slurm_auth_create (line 217)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c,
slurm_get_auth_ttl (line 1732)
slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c,
slurm_get_auth_info
slurmdbd: debug: Reading slurm.conf file:
/usr/local/slurm-15-08-7-1/etc/slurm.conf
slurmdbd: error: s_p_parse_file: unable to status file
/usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory,
retrying in 1sec up to 60sec
...
Then 60 seconds later, the auth_info value returned by
slurm_get_auth_info is NULL:
slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c,
slurm_get_auth_info, auth_info=(null)
and slurmdbd continues without crashing, but I am not sure it is in a
safe state.
With this patch applied:
diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
index c5db879..be1dab6 100644
--- a/src/common/slurm_protocol_api.c
+++ b/src/common/slurm_protocol_api.c
@@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
 	char *auth_info;
 	slurm_ctl_conf_t *conf;
-	conf = slurm_conf_lock();
-	auth_info = xstrdup(conf->authinfo);
-	slurm_conf_unlock();
+	if (slurmdbd_conf) {
+		auth_info = xstrdup(slurmdbd_conf->auth_info);
+	} else {
+		conf = slurm_conf_lock();
+		auth_info = xstrdup(conf->authinfo);
+		slurm_conf_unlock();
+	}
 	return auth_info;
 }
the auth_info value is now valid and consistent with the slurmdbd.conf
setting:
slurmdbd: slurmdbd version 15.08.7 started
slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c,
_send_slurmctld_register_req (line 690)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c,
slurm_send_node_msg (line 3600)
slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c,
slurm_auth_create (line 217)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c,
slurm_get_auth_ttl (line 1731)
slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c,
slurm_get_auth_info
slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c,
slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
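To make the intent of the patch easier to review, here is a minimal
stand-alone sketch of the selection logic. It is not Slurm code: the
mock_ctl_conf and mock_dbd_conf structures, dup_or_null() and
mock_get_auth_info() are hypothetical stand-ins introduced only for
illustration, and the two configurations are passed in as pointers
instead of being read from global state or from disk.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the two configuration structures.
 * Only the field relevant to the patch is modelled. */
struct mock_ctl_conf { const char *authinfo; };   /* from slurm.conf */
struct mock_dbd_conf { const char *auth_info; };  /* from slurmdbd.conf */

/* Return a heap copy of s, or NULL if s is NULL or allocation fails. */
static char *dup_or_null(const char *s)
{
	char *copy;

	if (s == NULL)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (copy != NULL)
		strcpy(copy, s);
	return copy;
}

/* Same decision as in the patch: when a slurmdbd configuration is
 * loaded (dbd != NULL), take AuthInfo from it; otherwise fall back
 * to the controller configuration. The caller frees the result. */
static char *mock_get_auth_info(const struct mock_dbd_conf *dbd,
				const struct mock_ctl_conf *ctl)
{
	if (dbd != NULL)
		return dup_or_null(dbd->auth_info);
	if (ctl != NULL)
		return dup_or_null(ctl->authinfo);
	return NULL;
}
```

Called with a non-NULL dbd pointer, the sketch returns the slurmdbd.conf
value and never touches the controller configuration, which is the
behaviour visible in the second log. Called with dbd set to NULL, it
falls back to the slurm.conf value as before.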
What do you think?
Best regards,
Didier