Hi,

There is a possible bug in the slurm_get_auth_info function (src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf when the auth/munge authentication method is used (AuthType=auth/munge).

Here is the slurmdbd log showing the problem (debug5() calls were added to the sources):

slurmdbd: slurmdbd version 15.08.7 started
slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
slurmdbd: debug: Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
...

Then 60 seconds later, the auth_info value returned by slurm_get_auth_info is NULL:

slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)

and slurmdbd continues without crashing, but I am not sure it is in a safe state.

With this patch applied:

diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
index c5db879..be1dab6 100644
--- a/src/common/slurm_protocol_api.c
+++ b/src/common/slurm_protocol_api.c
@@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
        char *auth_info;
        slurm_ctl_conf_t *conf;

-       conf = slurm_conf_lock();
-       auth_info = xstrdup(conf->authinfo);
-       slurm_conf_unlock();
+       if (slurmdbd_conf) {
+                auth_info = xstrdup(slurmdbd_conf->auth_info);
+        } else {
+               conf = slurm_conf_lock();
+               auth_info = xstrdup(conf->authinfo);
+               slurm_conf_unlock();
+       }

        return auth_info;
 }

the auth_info value is now valid and consistent with the slurmdbd.conf setting:

slurmdbd: slurmdbd version 15.08.7 started
slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
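
To make the intent of the change easier to review outside the Slurm tree, here is a minimal standalone sketch of the selection logic only. It is not the Slurm code: the two structs and both function names are hypothetical stand-ins, the configuration pointers are passed as arguments instead of being read from a global or through slurm_conf_lock(), and plain malloc() replaces xstrdup().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the slurmdbd.conf and slurm.conf structures. */
struct dbd_conf_sketch { const char *auth_info; };
struct ctl_conf_sketch { const char *authinfo; };

/* Return a heap copy of s, or NULL when s is NULL. */
static char *dup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (copy)
		strcpy(copy, s);
	return copy;
}

/*
 * Prefer the slurmdbd configuration when one is loaded (dbd != NULL);
 * otherwise fall back to the controller configuration.
 * The caller owns the returned string and must free() it.
 */
static char *get_auth_info_sketch(const struct dbd_conf_sketch *dbd,
				  const struct ctl_conf_sketch *ctl)
{
	if (dbd)
		return dup_or_null(dbd->auth_info);
	return ctl ? dup_or_null(ctl->authinfo) : NULL;
}
```

The point of the sketch is that, when a slurmdbd configuration is present, the slurm.conf side is never consulted, so the daemon has no reason to try to open a slurm.conf file that does not exist on that host.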

What do you think?

Best regards,

Didier
