I'm having a situation where nodes are fine, but SLURM sees them as not 
responding.  I can ssh to the nodes just fine, but in SLURM they show up 
as *.  Sometimes they begin responding again; other times they stop.  
I'm guessing this is due to latency as it takes about 10 ms for ping to 
hit these nodes from the master.  However, I have the timeout for the 
daemons set to the default of 5 minutes.  Is there an inherent level of 
latency that is unacceptable to SLURM?  If so, what is the value?  I 
don't see any documentation about that, nor any settings in slurm.conf.

-Paul Edmon-
