No, have never seen anything similar.
A small bit of help - the 'nfswatch' utility is useful for tracking down
NFS problems.
Less relevant, but on a system that is running low on memory, 'watch cat
/proc/meminfo' is often good for shining a light.
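As a rough sketch of what that looks like in practice (the interface name and refresh interval are just examples, and nfswatch must be installed separately on most distros):

```shell
# Watch per-client/per-export NFS traffic on a given interface
# (run as root; '-dev eth0' is an example, pick your actual NIC):
#   nfswatch -dev eth0

# Rather than watching the whole of /proc/meminfo, it can help to
# refresh just the fields relevant to memory pressure and NFS writeback:
#   watch -n 1 'grep -E "MemFree|Dirty|Writeback|NFS_Unstable" /proc/meminfo'

# One-shot version of the same check:
grep -E "MemFree|Dirty|Writeback" /proc/meminfo
```

High or stuck 'Dirty'/'Writeback' numbers while a process sits in D state can hint that writeback to the NFS server is what the task is blocked on.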


On 2 September 2017 at 00:16, Brendan Moloney <moloney.bren...@gmail.com>
wrote:

> Hello,
>
> I am using cgroups to track processes and limit memory. Occasionally it
> seems like a job will use too much memory and instead of getting killed it
> ends up in an unkillable state waiting for NFS I/O. There are no other
> signs of NFS issues, and in fact other jobs (even on the same node) seem to
> be having no problem communicating with the same NFS server at that same
> time.  I just get hung task errors for that one specific process (that used
> too much memory).
>
> Has anyone else run into this? Searching this mailing list archive I found
> some similar stuff, but that seemed to be in regards to installing Slurm
> itself onto an NFS4 mount rather than just having jobs use an NFS4 mount.
>
> Any advice is greatly appreciated.
>
> Thanks,
> Brendan
>