Hi!
We're trying to install Slurm 15.08.7 with MUNGE 0.5.11 and regularly see communication errors on the client side:
[initialization]
slurmd: debug3: in the service_connection
slurmd: debug: _slurm_recv_timeout at 0 of 4, recv zero bytes
slurmd: error: slurm_receive_msg_and_forward: Zero Bytes were transmitted or received
slurmd: error: service_connection: slurm_receive_msg: Zero Bytes were transmitted or received
slurmd: debug2: slurm_send_timeout: Socket no longer there
slurmd: debug3: slurm_msg_sendto: peer has disappeared for msg_type=8001
slurmd: debug3: in the service_connection
slurmd: debug2: got this type of message 1008
slurmd: debug3: in the service_connection
slurmd: debug: _slurm_recv_timeout at 0 of 4, recv zero bytes
slurmd: error: slurm_receive_msg_and_forward: Zero Bytes were transmitted or received
slurmd: error: service_connection: slurm_receive_msg: Zero Bytes were transmitted or received
slurmd: debug2: slurm_send_timeout: Socket no longer there
slurmd: debug3: slurm_msg_sendto: peer has disappeared for msg_type=8001
[repeating]
However, running simple commands like
srun -n 16 hostname
works without a problem. Additionally, a quick test in a VM (a smaller installation with fewer packages) works flawlessly.
What could be the cause of these problems?

Best regards,
Stefan
