Hi,

We are using a non-default AuthInfo configuration and, based on the
log messages we see, I believe it is not handled properly in certain
parts of the code.

Typical log message:
Aug 12 17:06:15 t02n20 slurmd[27001]: error: Munge encode failed: Failed to access "/var/run/munge/munge.socket.2": No such file or directory
Aug 12 17:06:15 t02n20 slurmd[27001]: error: Creating authentication credential: Socket communication error
Aug 12 17:06:15 t02n20 slurmd[27001]: error: stepd_connect to 3165.0 failed: Protocol authentication error
Aug 12 17:06:15 t02n20 slurmd[27001]: error: If munged is up, restart with --num-threads=10
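
My reading of these messages, as a toy model rather than actual Slurm
code: when the credential is created without the AuthInfo string, the
socket lookup presumably falls back to the default path, which does not
exist on our nodes. The function name, the buffer handling and the
comma-separated option format below are my assumptions for
illustration only; the default path is simply the one from the log.

```c
#include <stddef.h>
#include <string.h>

/* Assumed default: the path reported in the log messages above. */
#define MODEL_DEFAULT_SOCKET "/var/run/munge/munge.socket.2"

/*
 * Toy model, NOT Slurm code: pick the socket path from an
 * AuthInfo-style option string such as "socket=/some/path".
 * A NULL string, or one without a usable "socket=" option,
 * falls back to the default path.
 */
static const char *model_pick_socket(const char *auth_info,
                                     char *buf, size_t len)
{
        const char *key = "socket=";
        const char *p;
        size_t n;

        if (auth_info == NULL)          /* the case seen in the log */
                return MODEL_DEFAULT_SOCKET;

        p = strstr(auth_info, key);
        if (p == NULL)
                return MODEL_DEFAULT_SOCKET;

        p += strlen(key);
        n = strcspn(p, ",");            /* assumed: options separated by commas */
        if (n == 0 || n >= len)
                return MODEL_DEFAULT_SOCKET;

        memcpy(buf, p, n);
        buf[n] = '\0';
        return buf;
}
```

In this model, passing NULL always selects the default socket no matter
what is configured, while passing the configured string selects the
site-specific socket.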

Below are two untested fixes for this. It may be some time before we
can deploy them, so I am posting them now for comments and in case they
are useful to other sites.

diff -u slurm-14.11.8/src/common/stepd_api.c~ slurm-14.11.8/src/common/stepd_api.c
--- slurm-14.11.8/src/common/stepd_api.c~       2015-07-08 00:19:49.000000000 +0200
+++ slurm-14.11.8/src/common/stepd_api.c        2015-08-13 07:31:32.330700484 +0200
@@ -238,7 +238,7 @@

        buffer = init_buf(0);
        /* Create an auth credential */
-       auth_cred = g_slurm_auth_create(NULL, 2, NULL);
+       auth_cred = g_slurm_auth_create(NULL, 2, slurm_get_auth_info());
        if (auth_cred == NULL) {
                error("Creating authentication credential: %s",
                      g_slurm_auth_errstr(g_slurm_auth_errno(NULL)));

I believe the same error to be present in:

diff -u slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c~ slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c
--- slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c~ 2015-07-08 00:19:49.000000000 +0200
+++ slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c  2015-08-13 07:34:41.204029110 +0200
@@ -154,7 +154,7 @@
        spawn_subcmd_t *subcmd;
        void *auth_cred;

-       auth_cred = g_slurm_auth_create(NULL, 2, NULL);
+       auth_cred = g_slurm_auth_create(NULL, 2, slurm_get_auth_info());
        if (auth_cred == NULL) {
                error("authentication: %s",
                      g_slurm_auth_errstr(g_slurm_auth_errno(NULL)) );

Best regards,
Daniel Ahlin
PDC, KTH
