I'd recommend avoiding NFSv4, as previously mentioned. We tried v4 and it caused lots of issues. I think the primary problem was that our cgroup release agents lived on NFSv4, and when jobs finished, the NFSv4 delay caused kernel locks that made nodes go unresponsive. We switched to v3 and the issue went away.
- Trey

On Mar 24, 2015 3:18 PM, "Eduardo Almeida Costa" wrote:
> Humm... I always feel lost about NFS4 or NFS3. I shall test this.
>
> 2015-03-24 17:11 GMT-03:00 Paul Edmon:
>
>> Interesting. Yeah, we use v3 here. Hadn't tried out v4, and good thing
>> we didn't then.
>>
>> -Paul Edmon-
>>
>> On 03/24/2015 04:05 PM, Uwe Sauter wrote:
>>
>>> And if you are planning on using cgroups, don't use NFSv4. There are
>>> problems that cause the NFS client process to freeze (and the node
>>> along with it) when the cgroup removal script is called.
>>>
>>> Regards,
>>>
>>> Uwe Sauter
>>>
>>> On 24.03.2015 at 20:50, Paul Edmon wrote:
>>>
>>>> Yup, that's exactly what we do. We make sure to export it read-only
>>>> and make sure that it is synced and hard mounted. Not much
>>>> else to it.
>>>>
>>>> -Paul Edmon-
>>>>
>>>> On 03/24/2015 03:43 PM, Jeff Layton wrote:
>>>>
>>>>> Good afternoon,
>>>>>
>>>>> I apologize for the newb question, but I'm setting up slurm
>>>>> for the first time in a very long time. I've got a small cluster
>>>>> of a master node and 4 compute nodes. I'd like to install
>>>>> slurm on an NFS file system that is exported from the master
>>>>> node and mounted on the compute nodes. I've been reading
>>>>> a bit about this, but does anyone have recommendations on
>>>>> what to watch out for?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Jeff
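For anyone following along, here is a minimal sketch of what the advice in this thread might look like as config lines: export read-only and synced on the master, then hard mount over NFSv3 on the compute nodes. The export path (/opt/slurm), master hostname ("master"), client subnet (10.0.0.0/24), and the helper names exports_line and fstab_line are all hypothetical, chosen only for illustration; adjust them for your own cluster.

```python
# Hypothetical helpers that render NFS config lines for a shared Slurm install.
# Assumed values (not from the thread): path /opt/slurm, server "master",
# client subnet 10.0.0.0/24.


def exports_line(path: str, clients: str) -> str:
    # Server side (/etc/exports): read-only ("ro") export with synchronous
    # writes ("sync"), as suggested in the thread.
    return f"{path} {clients}(ro,sync)"


def fstab_line(server: str, path: str, mountpoint: str) -> str:
    # Client side (/etc/fstab): read-only hard mount pinned to NFSv3
    # ("nfsvers=3") to steer clear of the NFSv4/cgroup freezes reported above.
    opts = ",".join(["ro", "hard", "nfsvers=3"])
    return f"{server}:{path} {mountpoint} nfs {opts} 0 0"


if __name__ == "__main__":
    print(exports_line("/opt/slurm", "10.0.0.0/24"))
    print(fstab_line("master", "/opt/slurm", "/opt/slurm"))
```

Run as-is, it prints one /etc/exports entry and one /etc/fstab entry. The "hard" option makes clients retry indefinitely instead of returning I/O errors if the master goes away, which is usually what you want for binaries the nodes depend on.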
