Hello, I’ve installed Slurm 16.05 on SL 7.3 using ohpc. I also have the latest version of AUKS. I was able to hack auks so that aklog runs successfully when either obtaining or renewing a krb5 ticket.
For example:

    p-slogin.p-stheno.tartan.njit.edu-77 guest24>: kinit
    Password for [email protected]:
    p-slogin.p-stheno.tartan.njit.edu-78 guest24>: tokens
    Tokens held by the Cache Manager:
       --End of list--
    p-slogin.p-stheno.tartan.njit.edu-79 guest24>: auks -g
    Auks API request succeed
    p-slogin.p-stheno.tartan.njit.edu-80 guest24>: tokens
    Tokens held by the Cache Manager:
    User's (AFS ID 22967) tokens for [email protected] [Expires Jul 10 22:34]
       --End of list--

This works just as well with the auks -R loop.

I also set up a function slurm_spank_task_init() to call aklog in the auks spank plugin. Unfortunately, this does not work. I get the following error:

    p-slogin.p-stheno.tartan.njit.edu-81 guest24>: srun hostname
    aklog: Couldn't determine realm of user:aklog: unknown RPC error (-1765328189) while getting realm

My guess is that in this case the user running aklog is not “guest24”.

Here are some relevant lines from the log:

    [2017-07-10T16:19:34.763] [78.0] debug3: Entering _handle_request
    [2017-07-10T16:19:34.763] [78.0] debug3: Leaving _handle_accept
    [2017-07-10T16:19:34.773] [78.0] debug: mpi type = (null)
    [2017-07-10T16:19:34.773] [78.0] debug: Using mpi/none
    [2017-07-10T16:19:34.773] [78.0] debug: task_p_pre_launch: 78.0, task 0
    [2017-07-10T16:19:34.773] [78.0] spank-auks: running aklog
    [2017-07-10T16:19:34.781] [78.0] debug2: spank: auks.so: task_init = 0
    [2017-07-10T16:19:34.781] [78.0] debug: [job 78] attempting to run slurm task_prolog [/opt/local/bin/TaskProlog]
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_CPU no change in value: 18446744073709551615
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_FSIZE no change in value: 18446744073709551615
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_DATA no change in value: 18446744073709551615
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: RLIMIT_STACK : max:inf cur:inf req:8388608
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_STACK succeeded
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_CORE no change in value: 0
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_RSS no change in value: 18446744073709551615
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_NPROC no change in value: 4096
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: RLIMIT_NOFILE : max:51200 cur:51200 req:1024
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_NOFILE succeeded
    [2017-07-10T16:19:34.813] [78.0] debug: Couldn't find SLURM_RLIMIT_MEMLOCK in environment
    [2017-07-10T16:19:34.813] [78.0] debug2: _set_limit: conf setrlimit RLIMIT_AS no change in value: 18446744073709551615
    [2017-07-10T16:19:34.815] [78.0] task 0 (5305) exited with exit code 0.

Note that the TaskProlog also calls aklog. This gets me a token when using srun but not when using sbatch.

I also have “UsePAM=1” in my slurm.conf, with the following slurm PAM file:

    auth required pam_localuser.so
    account required pam_unix.so
    session required pam_limits.so
    session required pam_afs_session.so

This doesn’t work either.

Any advice would be greatly appreciated.

_______________
Gedaliah Wolosh
IST Academic and Research Computing Systems (ARCS)
NJIT
GITC 2203
973 596 5437
[email protected]
