Hi, I've attached an as of yet non-tested patch against 15.08.4 for the same issues.
Best regards, Daniel On Fri, Aug 21, 2015 at 8:50 PM, Moe Jette <[email protected]> wrote: > > Perfect (except for some long lines that I reformatted). The commit is here: > https://github.com/SchedMD/slurm/commit/f8490c4389e82d6b348582d726683b04d457330f > > > Thanks! > > Quoting Daniel Ahlin <[email protected]>: > >> Hi again, >> >> [dah@petrel src]$ diff -u slurm-14.11.8/src/slurmd/slurmstepd/req.c{~,} >> --- slurm-14.11.8/src/slurmd/slurmstepd/req.c~ 2015-07-08 >> 00:19:49.000000000 +0200 >> +++ slurm-14.11.8/src/slurmd/slurmstepd/req.c 2015-08-21 >> 09:58:53.853187174 +0200 >> @@ -409,7 +409,7 @@ >> free_buf(buffer); >> goto fail; >> } >> - rc = g_slurm_auth_verify(auth_cred, NULL, 2, NULL); >> + rc = g_slurm_auth_verify(auth_cred, NULL, 2, >> slurm_get_auth_info()); >> if (rc != SLURM_SUCCESS) { >> error("Verifying authentication credential: %s", >> g_slurm_auth_errstr(g_slurm_auth_errno(auth_cred))); >> >> is needed as well, this has now been somewhat tested. >> >> However inspecting function calls I would assume the following is a >> more complete patch, but I've not tested this: >> >> --- slurm-14.11.8/src/slurmd/slurmstepd/req.c~ 2015-07-08 >> 00:19:49.000000000 +0200 >> +++ slurm-14.11.8/src/slurmd/slurmstepd/req.c 2015-08-21 >> 10:07:25.766802443 +0200 >> @@ -409,7 +409,7 @@ >> free_buf(buffer); >> goto fail; >> } >> - rc = g_slurm_auth_verify(auth_cred, NULL, 2, NULL); >> + rc = g_slurm_auth_verify(auth_cred, NULL, 2, >> slurm_get_auth_info()); >> if (rc != SLURM_SUCCESS) { >> error("Verifying authentication credential: %s", >> g_slurm_auth_errstr(g_slurm_auth_errno(auth_cred))); >> @@ -419,8 +419,8 @@ >> } >> >> /* Get the uid & gid from the credential, then destroy it. */ >> - uid = g_slurm_auth_get_uid(auth_cred, NULL); >> - gid = g_slurm_auth_get_gid(auth_cred, NULL); >> + uid = g_slurm_auth_get_uid(auth_cred, slurm_get_auth_info()); >> + gid = g_slurm_auth_get_gid(auth_cred, slurm_get_auth_info()); >> debug3(" Identity: uid=%d, gid=%d", uid, gid); >> g_slurm_auth_destroy(auth_cred); >> free_buf(buffer); >> >> Then continuing on - src/common/slurm_auth.h shows that the following >> functions are intended to take the auth_info argument: >> extern void * g_slurm_auth_create( void *hosts, int timeout, char >> *auth_info ); >> extern int g_slurm_auth_verify( void *cred, void *hosts, int >> timeout, char *auth_info ); >> extern uid_t g_slurm_auth_get_uid( void *cred, char *auth_info ); >> extern gid_t g_slurm_auth_get_gid( void *cred, char *auth_info ); >> >> g_slurm_auth_create and g_slurm_auth_verify seems to be OK now - but >> g_slurm_auth_get_uid and g_slurm_auth_get_gid are not (most cases will >> work anyway since the munge auth plugin will only use auth_info if the >> cred has not yet been verified - and in many instances it has - still >> I would assume passing the information to be the safe thing to do). >> I've attached two blind patches that just replaces null with >> slurm_get_auth_info() for these two functions. >> >> However, I have not tested these two patches and probably each >> replacement should be checked by someone who knows more of the >> surrounding context. >> >> Best regards, >> Daniel >> >> >> On Fri, Aug 14, 2015 at 12:57 AM, Moe Jette <[email protected]> wrote: >>> >>> >>> Hi Daniel, >>> >>> You seem to have found two places where the AuthInfo configuration >>> parameter >>> was not used. I've committed your patch here: >>> >>> https://github.com/SchedMD/slurm/commit/02c96859e0d1df5f98b9422a64059820cc7035c7 >>> >>> Thanks! >>> >>> >>> Quoting Daniel Ahlin <[email protected]>: >>>> >>>> >>>> Hi, >>>> >>>> We are using a non-default AuthInfo configuration and based on >>>> log-messages we see I believe this is not properly handled in certain >>>> parts of the code. >>>> >>>> Typical log message: >>>> Aug 12 17:06:15 t02n20 slurmd[27001]: error: Munge encode failed: >>>> Failed to access "/var/run/munge/munge.socket.2": No such file or >>>> directory >>>> Aug 12 17:06:15 t02n20 slurmd[27001]: error: Creating authentication >>>> credential: Socket communication error >>>> Aug 12 17:06:15 t02n20 slurmd[27001]: error: stepd_connect to 3165.0 >>>> failed: Protocol authentication error >>>> Aug 12 17:06:15 t02n20 slurmd[27001]: error: If munged is up, restart >>>> with --num-threads=10 >>>> >>>> Below is two untested fixes for this. It may be some time before we >>>> can deploy this so I post them anyway for comments and possible use to >>>> other sites. >>>> >>>> diff -u slurm-14.11.8/src/common/stepd_api.c~ >>>> slurm-14.11.8/src/common/stepd_api.c >>>> --- slurm-14.11.8/src/common/stepd_api.c~ 2015-07-08 >>>> 00:19:49.000000000 +0200 >>>> +++ slurm-14.11.8/src/common/stepd_api.c 2015-08-13 >>>> 07:31:32.330700484 +0200 >>>> @@ -238,7 +238,7 @@ >>>> >>>> buffer = init_buf(0); >>>> /* Create an auth credential */ >>>> - auth_cred = g_slurm_auth_create(NULL, 2, NULL); >>>> + auth_cred = g_slurm_auth_create(NULL, 2, slurm_get_auth_info()); >>>> if (auth_cred == NULL) { >>>> error("Creating authentication credential: %s", >>>> g_slurm_auth_errstr(g_slurm_auth_errno(NULL))); >>>> >>>> I believe the same error to be present in: >>>> >>>> diff -u slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c~ >>>> slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c--- >>>> slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c~ 2015-07-08 >>>> 00:19:49.000000000 +0200 >>>> +++ slurm-14.11.8/src/plugins/mpi/pmi2/spawn.c 2015-08-13 >>>> 07:34:41.204029110 +0200 >>>> @@ -154,7 +154,7 @@ >>>> spawn_subcmd_t *subcmd; >>>> void *auth_cred; >>>> >>>> - auth_cred = g_slurm_auth_create(NULL, 2, NULL); >>>> + auth_cred = g_slurm_auth_create(NULL, 2, slurm_get_auth_info()); >>>> if (auth_cred == NULL) { >>>> error("authentication: %s", >>>> g_slurm_auth_errstr(g_slurm_auth_errno(NULL)) ); >>>> >>>> Best regards, >>>> Daniel Ahlin >>>> PDC, KTH >>> >>> >>> >>> >>> -- >>> Morris "Moe" Jette >>> CTO, SchedMD LLC >>> Commercial Slurm Development and Support >>> =============================================================== >>> Slurm User Group Meeting, 15-16 September 2015, Washington D.C. >>> http://slurm.schedmd.com/slurm_ug_agenda.html > > > > -- > Morris "Moe" Jette > CTO, SchedMD LLC > Commercial Slurm Development and Support > =============================================================== > Slurm User Group Meeting, 15-16 September 2015, Washington D.C. > http://slurm.schedmd.com/slurm_ug_agenda.html
g_slurm_auth_x-should-use-slurm_get_auth_info.patch.gz
Description: GNU Zip compressed data
