Hi Mike,

On 25-05-16 13:22, Mike Johnson wrote:
> I am in an environment that uses NFSv4, which obviously needs user credentials to grant access to filesystems. Has anyone else tackled the issue of unattended batch jobs successfully? I'm aware of AUKS.

We are using Kerberised NFS4 with Slurm 15.08 and AUKS successfully. Users generally don't have to do anything special, and some probably forget about the lifetime of Kerberos tickets.
Jobs can sometimes wait a day or more before they run, and we allow maximum walltimes longer than the Kerberos renewable lifetime, so it's possible for a ticket to expire before its job finishes. We advise users to do a fresh login to the head nodes before submitting jobs. For users who submit jobs regularly, the ticket stored in AUKS will have enough remaining lifetime to bridge a couple of days (or a long weekend). For longer jobs, they may have to run 'auks -a' to update the stored ticket until the job finishes.
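
To make the timing argument concrete, here is a minimal sketch in Python of the check a user could do by hand before submitting. It is not part of AUKS or Slurm. The function name, the 7-day renewable lifetime and the wait and walltime figures are all assumptions for illustration; the real "renew until" time would come from your own Kerberos setup.

```python
from datetime import datetime, timedelta


def ticket_covers_job(renew_until, now, expected_wait, walltime):
    """Return True if the ticket is still renewable when the job would end.

    Hypothetical helper: the job is expected to end at
    now + expected_wait + walltime, and the ticket is only useful
    up to renew_until.
    """
    job_end = now + expected_wait + walltime
    return renew_until >= job_end


# Assumed example values, not taken from any real site configuration.
now = datetime(2016, 5, 25, 14, 0)
renew_until = now + timedelta(days=7)   # assumed renewable lifetime

short_job_ok = ticket_covers_job(
    renew_until, now, timedelta(days=1), timedelta(days=2))
long_job_ok = ticket_covers_job(
    renew_until, now, timedelta(days=1), timedelta(days=10))

print("short job covered:", short_job_ok)
print("long job covered:", long_job_ok)
```

When the check comes out negative, that is the case where the stored ticket has to be updated with 'auks -a' while the job is queued or running.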
The only thing that could be improved is the feedback when a ticket has expired. Since a running Slurm job will then no longer be able to write to its output file, all further output and error messages will simply be lost. For a job that gets started after the ticket has expired, the only clue is that Slurm immediately reports the job as failed and no output file is created.
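
The symptom described above (failed straight away, no output file) can be turned into a rough heuristic. The following Python sketch is only an illustration: the function name and the 5-second threshold are assumptions, and the inputs (job state, elapsed time, whether the output file exists) would have to be gathered from your own accounting data and filesystem. A match does not prove the ticket had expired; it just flags jobs worth a second look.

```python
def looks_like_expired_ticket(state, elapsed_seconds, output_exists,
                              max_elapsed=5):
    """Heuristic: the job failed almost immediately and left no output file.

    Hypothetical helper; max_elapsed is an assumed cut-off in seconds
    for "immediately".
    """
    return (state == "FAILED"
            and elapsed_seconds <= max_elapsed
            and not output_exists)


# Assumed example records: (state, elapsed seconds, output file exists)
suspect = looks_like_expired_ticket("FAILED", 1, False)
normal_failure = looks_like_expired_ticket("FAILED", 3600, True)
completed = looks_like_expired_ticket("COMPLETED", 1, False)

print(suspect, normal_failure, completed)
```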
All in all it works well.

Regards,
Robbert

-- 
Robbert Eggermont    Intelligent Systems
[email protected]    Electr.Eng., Mathematics & Comp.Science
+31 15 27 83234      Delft University of Technology
