On Saturday, December 25, 2010, DuDu <[email protected]> wrote:
> Hi,
>
> I know my issue sounds weird, and I'm not sure it is OpenNebula's fault. But
> the problem is really annoying, so can anyone shed some light?
> I have an OpenNebula cluster deployed and running, with local disk. When a
> new VM gets provisioned, the disk template is copied from NFS to the host's
> local disk. I have two VMs running on two hosts. These VMs have a heartbeat
> connection between them, for HA. However, when a third VM is provisioned on
> one host (during the disk image copy process), the heartbeat connection
> times out (the socket returns "Broken Pipe"). So the failover is
> triggered (which is obviously NOT correct).
>
> I checked CPU usage during the copying, and it was around 17%, which is not
> high. Pinging the host didn't show significant lag. I don't really
> understand why the host's disk I/O triggers the VM's network problem, do you?
>
It sounds plausible anyway: with NFS you involve the network too, and copying
big files can play hell with scheduling latencies.

What hypervisor do you use? If you ping the VMs themselves during
provisioning, do you see latency? What about SSH interactivity on the host
and the VMs?

In parallel, I'd start by raising the heartbeat timeout to a large value
(i.e. timeout > time to copy a VM image), just to confirm what's happening.

> BR
>
>

--
Stefan Praszalowicz
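
P.S. In case it helps with the "ping the VMs during provisioning" check, here
is a minimal sketch of a latency probe. It is only an illustration under my own
assumptions: it times TCP connects rather than ICMP pings (so it needs no root),
it assumes the VM accepts connections on port 22, and the names tcp_rtt_ms,
summarize, probe and the 100 ms spike threshold are made up for this example.

```python
import socket
import statistics
import time


def tcp_rtt_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on error/timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # covers refused connections, unreachable hosts, timeouts
        return None
    return (time.monotonic() - start) * 1000.0


def summarize(samples, spike_ms=100.0):
    """Summarize a list of samples (float ms, or None for a failed probe)."""
    ok = [s for s in samples if s is not None]
    return {
        "count": len(samples),
        "failed": len(samples) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
        # spike_ms is an arbitrary threshold chosen for illustration
        "spikes": sum(1 for s in ok if s > spike_ms),
    }


def probe(host, port=22, interval=1.0, count=60):
    """Probe host:port once per interval, count times, and summarize."""
    samples = []
    for _ in range(count):
        samples.append(tcp_rtt_ms(host, port))
        time.sleep(interval)
    return summarize(samples)

# Example (hypothetical host name): print(probe("vm1.example.org", count=120))
```

Run it once against each VM while the cluster is idle and once while a third
VM is being provisioned. If the failed count, the maximum, or the number of
spikes jumps only during the image copy, that points at the copy starving the
guests rather than at the heartbeat setup itself.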
