I'm seeing quite a few of these errors:
May 2 11:33:29 holy-slurm01 slurmctld[47253]: error: slurm_receive_msg: Zero Bytes were transmitted or received
May 2 11:33:29 holy-slurm01 slurmctld[47253]: error: slurm_receive_msg: Zero Bytes were transmitted or received
I know that this can be caused by a node or client that is in a bad state, but I can't figure out how to trace it back to the one responsible. Does anyone have any tricks for tracing this sort of error back to its source? I turned on the Protocol Debug Flag, but none of the additional debug statements point to the culprit.
-Paul Edmon-
