I'm seeing a situation where nodes are fine, but SLURM marks them as not responding. I can ssh to the nodes without any trouble, but in SLURM they show up marked with a *. Sometimes they begin responding again, and other times they stop. I'm guessing this is due to latency, as a ping from the master takes about 10 ms to reach these nodes. However, I have the timeout for the daemons set to the default of 5 minutes. Is there an inherent level of latency that is unacceptable to SLURM? If so, what is the value? I don't see any documentation about that, nor any relevant settings in slurm.conf.
-Paul Edmon-
