Hi,
We are using a non-default AuthInfo configuration and, based on the
log messages we see, I believe it is not handled properly in certain
parts of the code.
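For context, a non-default AuthInfo setting in slurm.conf might look like the fragment below. This is only an illustration: the socket path is a made-up example rather than our actual one, and depending on the Slurm version the value may need to be given as a bare path instead of a socket= option.

```
# slurm.conf (illustrative fragment; the path is hypothetical)
AuthType=auth/munge
AuthInfo=socket=/var/run/munge-site/munge.socket.2
```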
Typical log messages:
Aug 12 17:06:15 t02n20 slurmd[27001]: error: Munge encode failed: Failed to access "/var/run/munge/munge.socket.2": No such file or directory
Aug 12 17:06:15 t02n20 slurmd[27001]: error: Creating authentication credential: Socket communication error
Aug 12 17:06:15 t02n20 slurmd[27001]: error: stepd_connect to 3165.0 failed: Protocol authentication error
Aug 12 17:06:15 t02n20 slurmd[27001]: error: If munged is up, restart with --num-threads=10
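The path in the log is the default MUNGE socket, not the one we configured, which is what one would expect if the caller passes NULL as the auth info. The following is a minimal sketch of that fallback behaviour as I understand it, not the actual plugin code: pick_munge_socket() and DEFAULT_MUNGE_SOCKET are hypothetical names, and the handling of "socket=" versus a bare path is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the built-in default socket path;
 * the value is the one that appears in the log messages. */
#define DEFAULT_MUNGE_SOCKET "/var/run/munge/munge.socket.2"

/*
 * Hypothetical helper, NOT a Slurm function: choose the socket path
 * that would be used for a given auth info string.
 *   NULL or ""        -> fall back to the default socket
 *   "socket=<path>"   -> use <path>
 *   anything else     -> assume it is a bare path
 */
const char *pick_munge_socket(const char *auth_info)
{
    static const char prefix[] = "socket=";
    const size_t prefix_len = sizeof(prefix) - 1;

    if (auth_info == NULL || auth_info[0] == '\0')
        return DEFAULT_MUNGE_SOCKET;
    if (strncmp(auth_info, prefix, prefix_len) == 0)
        return auth_info + prefix_len;
    return auth_info;
}
```

With this model, a call site that passes NULL always ends up on the default socket regardless of what slurm.conf says, which matches the "Failed to access" message on a node where only the non-default socket exists.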
Below are two untested fixes for this. It may be some time before we
can deploy them, so I am posting them anyway for comments and possible
use by other sites.
diff -u slurm-14.11.8/src/common/stepd_api.c~ slurm-14.11.8/src/common/stepd_api.c
--- slurm-14.11.8/src/common/stepd_api.c~ 2015-07-08 00:19:49.000000000 +0200
+++ slurm-14.11.8/src/common/stepd_api.c 2015-08-13 07:31:32.330700484 +0200
@@ -238,7 +238,7 @@
buffer = init_buf(0);
/* Create an auth credential */
- auth_cred = g_slurm_auth_create(NULL, 2, NULL);
+ auth_cred = g_slurm_auth_create(NULL, 2, slurm_get_auth_info());
if (auth_cred == NULL) {
error("Creating authentication credential: %s",
g_slurm_auth_errstr(g_slurm_auth_errno(NULL)));
I believe the same error is present in:
diff -u slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c~ slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c
--- slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c~ 2015-07-08 00:19:49.000000000 +0200
+++ slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c 2015-08-13 07:34:41.204029110 +0200
@@ -154,7 +154,7 @@
spawn_subcmd_t *subcmd;
void *auth_cred;
- auth_cred = g_slurm_auth_create(NULL, 2, NULL);
+ auth_cred = g_slurm_auth_create(NULL, 2, slurm_get_auth_info());
if (auth_cred == NULL) {
error("authentication: %s",
g_slurm_auth_errstr(g_slurm_auth_errno(NULL)) );
Best regards,
Daniel Ahlin
PDC, KTH