The network should *not* be flaky - all hosts are plugged into a Cisco
Catalyst 4500 switch. I can take a look at the port counters when I have a
chance, but I would not expect intermittent network disruptions.

Will post logs soon and provide URLs.


On 12/22/2015 02:38 PM, Simone Tiraboschi wrote:

OK, another problem :(

I was having the same problem with my second oVirt host that I had with my
first one: after “hosted-engine --deploy” completed successfully on it, I
experienced a ~50sec lag when SSH’ing into the node…

vpnp71:~ will$ time ssh root@ovirt-node-02 uptime
 19:36:06 up 4 days,  8:31,  0 users,  load average: 0.68, 0.70, 0.67

real  0m50.540s
user  0m0.025s
sys 0m0.008s
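
A minimal sketch for repeating the measurement above several times, so the
delay can be compared before and after a reboot. The helper name
`time_command` and the repeat count are illustrative assumptions, not part of
oVirt; the host name is the one from the output above.

```python
import subprocess
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(repeats):
        start = time.monotonic()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.monotonic() - start)
    return durations


# Example (hypothetical usage, same command as in the output above):
#   for seconds in time_command(["ssh", "root@ovirt-node-02", "uptime"]):
#       print(f"{seconds:.3f}s")
```

A consistent delay of about 50 seconds across runs, disappearing after a
reboot, would match the behaviour described here.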

So, in the oVirt web admin console, I put the “ovirt-node-02” node into
Maintenance mode, then SSH’d to the server and rebooted it. Sure enough, after
the server came back up, SSH was fine (no delay), which again was the same
experience I had had with the first oVirt host. So, I went back to the web
console and chose the “Confirm host has been rebooted” option, which I
thought would be the right action to take after a reboot. The system opened a
dialog box with a spinner, which never stopped spinning… So finally, I closed
the dialog box with the upper right (X) symbol, and then for this same host
chose “Activate” from the menu. It was then I noticed I had received a state
transition email notifying me of “EngineUp-EngineUpBadHealth”, and sure
enough, the web UI was then unresponsive. I checked on the first oVirt host:
the VM with the name “HostedEngine” is still running, but obviously isn’t

So, looks like I need to restart the HostedEngine VM or take whatever action is 
needed to return oVirt to operation… Hate to keep asking this question, but 
what’s the correct action at this point?

ovirt-ha-agent should always restart it for you after a few minutes, but the
point is that the network configuration does not seem to be that stable.

I know from another thread that you are trying to deploy hosted-engine over
GlusterFS in a hyperconverged way and this, as I said, is currently not
I think that it can also require some specific configuration on the network side.

For hyperconverged gluster+engine, it should work without any specific
configuration on the network side. However, if the network is flaky, it is
possible that there are errors with gluster volume access. Could you provide
the ovirt-ha-agent logs as well as the gluster mount logs?
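
In case it helps with gathering those logs, here is a rough sketch that
bundles them into one archive for posting. It assumes the usual default
locations (`/var/log/ovirt-hosted-engine-ha/agent.log` for ovirt-ha-agent and
`/var/log/glusterfs/` for the gluster mount logs); the function name
`collect_logs` and the output path are hypothetical, so adjust them to your
setup.

```python
import glob
import os
import tarfile

# Assumed default locations; adjust if your installation logs elsewhere.
DEFAULT_PATTERNS = [
    "/var/log/ovirt-hosted-engine-ha/agent.log*",
    "/var/log/glusterfs/*.log",
]


def collect_logs(patterns, archive_path):
    """Bundle every file matching the glob patterns into a gzip'd tar archive.

    Returns the sorted list of files that were added.
    """
    files = sorted(
        {p for pattern in patterns for p in glob.glob(pattern) if os.path.isfile(p)}
    )
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            archive.add(path)
    return files


# Example (hypothetical output path), run on each host:
#   collect_logs(DEFAULT_PATTERNS, "/tmp/hosted-engine-logs.tar.gz")
```

Patterns that match nothing are simply skipped, so the same call can be run
on a host where one of the two log sets is absent.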

Adding Sahina and Dan here.

Thanks, again,

