Re: [ovirt-users] Hosted Engine crash - state = EngineUp-EngineUpBadHealth

2015-12-22 Thread Will Dennis
Ah OK, my ignorance strikes again... I flushed all the iptables rules on 
hosts -01 and -02, and now Gluster seems up and happy...

After I flushed iptables, I saw messages that the engine state was changing, and 
it eventually landed on status: ReinitializeFSM-LocalMaintenance
...but I could not log into the admin website.

I then queried the engine VM state at the CLI and saw this:

[root@ovirt-node-01 ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date   : True
Hostname            : ovirt-node-01
Host ID             : 1
Engine status       : {"reason": "bad vm status", "health": "bad", "vm": "up", "detail": "paused"}
Score               : 3400
stopped             : False
Local maintenance   : False
crc32               : 538868a0
Host timestamp      : 214954


--== Host 2 status ==--

Status up-to-date   : True
Hostname            : ovirt-node-02
Host ID             : 2
Engine status       : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score               : 0
stopped             : False
Local maintenance   : True
crc32               : 419a0c6d
Host timestamp      : 53528
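To compare hosts at a glance, status text like the above can be turned into data. The sketch below is a hypothetical helper, not part of the hosted-engine tooling: it assumes the "Key : value" layout shown, parses the "Engine status" value as JSON, and tolerates the mail-client wrapping seen in pasted output by joining a continuation line onto an Engine status value that did not parse on its own. It then sorts hosts by score, which ovirt-ha-agent uses when deciding where the engine VM should run.

```python
from __future__ import annotations

import json
import re

# Matches block headers such as "--== Host 1 status ==--".
HOST_HEADER = re.compile(r"^--== Host (\d+) status ==--$")


def parse_vm_status(text: str) -> list[dict]:
    """Turn pasted `hosted-engine --vm-status` text into one dict per host."""
    hosts: list[dict] = []
    current: dict | None = None
    pending: str | None = None  # Engine status JSON that has not parsed yet
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.match(line)
        if header:
            pending = None
            current = {"block": int(header.group(1))}
            hosts.append(current)
            continue
        if current is None or not line:
            pending = None  # a blank line ends any wrapped value
            continue
        if pending is not None:
            # Continuation of a wrapped "Engine status" value: join and retry.
            pending += " " + line
            try:
                current["Engine status"] = json.loads(pending)
                pending = None
            except json.JSONDecodeError:
                current["Engine status"] = pending
            continue
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Engine status":
            try:
                current[key] = json.loads(value)
            except json.JSONDecodeError:
                current[key] = value  # keep the raw text
                pending = value       # probably wrapped onto the next line
        else:
            current[key] = value
    return hosts


def summarize(hosts: list[dict]) -> list[tuple[str, int, str]]:
    """(hostname, score, engine detail) per host, highest score first."""
    rows = []
    for h in hosts:
        status = h.get("Engine status")
        detail = status.get("detail", "?") if isinstance(status, dict) else "?"
        rows.append((h.get("Hostname", "?"), int(h.get("Score", "0")), detail))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

Fed the two host blocks above, summarize() would list ovirt-node-01 (score 3400, detail "paused") ahead of ovirt-node-02 (score 0, detail "unknown"): the engine VM was up but paused on the host with the best score.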


So I issued the command “hosted-engine --vm-shutdown” on host -01; the VM 
eventually came down (it had to be force-killed, per the status emails) and 
then HA restarted it :)

Looks like I’m back to being good now... Thanks everyone for the assist, and 
talk to you soon, I’m sure ;)


-Will

From: Simone Tiraboschi [mailto:stira...@redhat.com]
Sent: Tuesday, December 22, 2015 10:23 AM
To: Will Dennis; users
Cc: Sahina Bose; Dan Kenigsberg
Subject: Re: [ovirt-users] Hosted Engine crash - state = 
EngineUp-EngineUpBadHealth


hosted-engine-setup asks:
  iptables was detected on your computer, do you wish setup to configure it? (Yes, No)[Yes]:

You just have to answer No here.

If you answer No, it's completely up to you to configure it, either opening the 
required ports or disabling it entirely if you don't care.

The issue with the gluster ports is that hosted-engine-setup only configures 
iptables for what it knows you'll need, and on 3.6 it always assumes that the 
gluster volume is served by external hosts.
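Before choosing between opening specific ports and disabling iptables, it can help to test what is actually reachable from each host. Below is a minimal sketch using only Python's standard library; the port numbers in the usage note (24007 for glusterd management, 49217 as the first brick port after the base-port change discussed in this thread) are assumptions to adapt. It only reports whether a TCP connect succeeds, so it cannot tell a firewall rule apart from a service that is simply not listening.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port is accepted within `timeout` seconds.

    Refused, timed-out and unresolvable targets all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts, DNS errors
        return False


def check_ports(hosts, ports, timeout: float = 2.0) -> dict:
    """Map (host, port) -> reachable? for every host/port combination."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

For example, running check_ports(["ovirt-node-01", "ovirt-node-02", "ovirt-node-03"], [24007, 49217]) on each node and looking for False entries might point at the hosts whose firewall still blocks gluster traffic.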



From: Sahina Bose [mailto:sab...@redhat.com]
Sent: Tuesday, December 22, 2015 9:19 AM
To: Will Dennis; Simone Tiraboschi; Dan Kenigsberg

Subject: Re: [ovirt-users] Hosted Engine crash - state = 
EngineUp-EngineUpBadHealth


On 12/22/2015 07:47 PM, Sahina Bose wrote:

On 12/22/2015 07:28 PM, Will Dennis wrote:
See attached for requested log files

From gluster logs

[2015-12-22 00:40:53.501341] W [MSGID: 108001] [afr-common.c:3924:afr_notify] 0-engine-replicate-1: Client-quorum is not met
[2015-12-22 00:40:53.502288] W [socket.c:588:__socket_rwv] 0-engine-client-2: readv on 138.15.200.93:49217 failed (No data available)

[2015-12-22 00:41:17.667302] W [fuse-bridge.c:2292:fuse_writev_cbk] 0-glusterfs-fuse: 3875597: WRITE => -1 (Read-only file system)
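The "Client-quorum is not met" warning followed by "Read-only file system" fits GlusterFS's client-side (AFR) quorum: when a replica set loses quorum, the client stops allowing writes to it. The function below models the documented cluster.quorum-type rules (auto, fixed, none, with cluster.quorum-count used for fixed) for a single replica set. It is an illustration, not gluster code, and it does not know this volume's actual layout or settings.

```python
from __future__ import annotations


def client_quorum_met(bricks_up: list[bool], quorum_type: str = "auto",
                      quorum_count: int | None = None) -> bool:
    """Model of AFR client-quorum for ONE replica set (bricks in volume order).

    auto : more than half the bricks are up, or exactly half are up and
           the first brick is one of them.
    fixed: at least `quorum_count` bricks are up.
    none : quorum is never enforced.
    """
    n = len(bricks_up)
    if n == 0:
        raise ValueError("a replica set needs at least one brick")
    up = sum(bricks_up)
    if quorum_type == "none":
        return True
    if quorum_type == "fixed":
        if quorum_count is None:
            raise ValueError("quorum_type 'fixed' needs quorum_count")
        return up >= quorum_count
    if quorum_type == "auto":
        return 2 * up > n or (2 * up == n and bricks_up[0])
    raise ValueError(f"unknown quorum_type: {quorum_type!r}")
```

With the auto rule, a three-brick replica set keeps quorum with one brick down, while a two-brick set loses it as soon as its first brick is unreachable; that is one way a single blocked brick port could make the mount read-only.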

Could you check if the gluster ports are open on all nodes?

It's possible you ran into this:
https://bugzilla.redhat.com/show_bug.cgi?id=1288979




From: Sahina Bose [mailto:sab...@redhat.com]
Sent: Tuesday, December 22, 2015 4:59 AM
To: Simone Tiraboschi; Will Dennis; Dan Kenigsberg
Cc: users
Subject: Re: [ovirt-users] Hosted Engine crash - state = 
EngineUp-EngineUpBadHealth



Re: [ovirt-users] Hosted Engine crash - state = EngineUp-EngineUpBadHealth

2015-12-22 Thread Simone Tiraboschi
>                                               0         Y       3007
> Self-heal Daemon on localhost       N/A       N/A       Y       3012
> NFS Server on ovirt-node-03         2049      0         Y       1671
> Self-heal Daemon on ovirt-node-03   N/A       N/A       Y       1707
>
>
>
> I had changed the base port number per the instructions on this page:
> http://www.ovirt.org/Features/Self_Hosted_Engine_Hyper_Converged_Gluster_Support
>
> “By default gluster uses a port that vdsm also wants, so we need to change
> base-port setting avoiding the clash between the two daemons. We need to add
>
>     option base-port 49217
>
> to /etc/glusterfs/glusterd.vol
>
> and ensure glusterd service is enabled and started before proceeding.”
>
> So I did that on all the hosts:
>
> [root@ovirt-node-02 ~]# cat /etc/glusterfs/glusterd.vol
> volume management
>     type mgmt/glusterd
>     option working-directory /var/lib/glusterd
>     option transport-type socket,rdma
>     option transport.socket.keepalive-time 10
>     option transport.socket.keepalive-interval 2
>     option transport.socket.read-fail-log off
>     option ping-timeout 30
> #   option base-port 49152
>     option base-port 49217
>     option rpc-auth-allow-insecure on
> end-volume
>
> Question: does oVirt really need iptables to be enforcing rules, or can I
> just leave everything wide open? If I can, how do I specify that in setup?
>

hosted-engine-setup asks:
  iptables was detected on your computer, do you wish setup to configure it? (Yes, No)[Yes]:

You just have to answer No here.

If you answer No, it's completely up to you to configure it, either opening
the required ports or disabling it entirely if you don't care.

The issue with the gluster ports is that hosted-engine-setup only configures
iptables for what it knows you'll need, and on 3.6 it always assumes that
the gluster volume is served by external hosts.
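If you answer No and manage iptables yourself, the gluster rules depend on the base-port set in glusterd.vol. The sketch below reads the uncommented base-port from glusterd.vol text (falling back to gluster's default of 49152) and emits iptables-save style ACCEPT lines for glusterd's management port (24007) and a brick port range. How many brick ports to allow is an assumption you supply; RDMA (24008) and other services, such as the NFS server on 2049 shown in the volume status, are not covered.

```python
from __future__ import annotations

import re

GLUSTERD_PORT = 24007        # glusterd management port (TCP)
DEFAULT_BASE_PORT = 49152    # gluster's default first brick port
BASE_PORT_LINE = re.compile(r"^option\s+base-port\s+(\d+)$")


def effective_base_port(vol_text: str) -> int:
    """base-port from glusterd.vol text; commented lines are ignored.

    If several uncommented base-port lines exist, the last one wins
    (an assumption of this sketch).
    """
    port = DEFAULT_BASE_PORT
    for raw in vol_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue
        match = BASE_PORT_LINE.match(line)
        if match:
            port = int(match.group(1))
    return port


def gluster_iptables_rules(base_port: int, brick_ports: int) -> list[str]:
    """iptables-save style ACCEPT rules for glusterd plus a brick port range."""
    if brick_ports < 1:
        raise ValueError("need at least one brick port")
    last = base_port + brick_ports - 1
    return [
        f"-A INPUT -p tcp -m tcp --dport {GLUSTERD_PORT} -j ACCEPT",
        f"-A INPUT -p tcp -m tcp --dport {base_port}:{last} -j ACCEPT",
    ]
```

For the glusterd.vol quoted above and, say, three brick ports per host, gluster_iptables_rules(effective_base_port(text), 3) would yield a --dport 49217:49219 rule; in /etc/sysconfig/iptables such lines need to come before any catch-all REJECT rule.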

>
> W.
>

Re: [ovirt-users] Hosted Engine crash - state = EngineUp-EngineUpBadHealth

2015-12-22 Thread Will Dennis
The network should *not* be flaky: all hosts are plugged into a Cisco 
Catalyst 4500 switch. I can take a look at the port counters when I have a 
chance, but I would not expect intermittent network disruptions.

Will post logs soon and provide URLs.

W.





-----Original Message-----
From: Sahina Bose [sab...@redhat.com]
Sent: Tuesday, December 22, 2015 04:58 AM Eastern Standard Time
To: Simone Tiraboschi; Will Dennis; Dan Kenigsberg
Cc: users
Subject: Re: [ovirt-users] Hosted Engine crash - state = 
EngineUp-EngineUpBadHealth





Re: [ovirt-users] Hosted Engine crash - state = EngineUp-EngineUpBadHealth

2015-12-22 Thread Sahina Bose



On 12/22/2015 02:38 PM, Simone Tiraboschi wrote:



On Tue, Dec 22, 2015 at 2:31 AM, Will Dennis wrote:


OK, another problem :(

I was having the same problem with my second oVirt host that I had
with my first one, where when I ran “hosted-engine --deploy” on it,
after it completed successfully, then I was experiencing a ~50sec
lag when SSH’ing into the node…

vpnp71:~ will$ time ssh root@ovirt-node-02 uptime
 19:36:06 up 4 days,  8:31,  0 users,  load average: 0.68, 0.70, 0.67

real  0m50.540s
user  0m0.025s
sys 0m0.008s


So, in the oVirt web admin console, I put the “ovirt-node-02” node
into Maintenance mode, then SSH’d to the server and rebooted it.
Sure enough, after the server came back up, SSH was fine (no
delay), which again was the same experience I had had with the
first oVirt host. So, I went back to the web console, and chose
the “Confirm host has been rebooted” option, which I thought would
be the right action to take after a reboot. The system opened a
dialog box with a spinner, which never stopped spinning… So
finally, I closed the dialog box with the upper right (X) symbol,
and then for this same host chose “Activate” from the menu. It
was then I noticed I had received a state transition email
notifying me that “EngineUp-EngineUpBadHealth” and sure enough,
the web UI was then unresponsive. I checked on the first oVirt
host, the VM with the name “HostedEngine” is still running, but
obviously isn’t working…

So, looks like I need to restart the HostedEngine VM or take
whatever action is needed to return oVirt to operation… Hate to
keep asking this question, but what’s the correct action at this
point?


ovirt-ha-agent should always restart it for you after a few minutes, 
but the point is that the network configuration seems not to be that 
stable.


I know from another thread that you are trying to deploy hosted-engine 
over GlusterFS in a hyperconverged way and this, as I said, is 
currently not supported.
I think that it may also require some specific configuration on the 
network side.


For hyperconverged gluster+engine, it should work without any specific 
configuration on the network side. However, if the network is flaky, it is 
possible that there are errors with gluster volume access. Could you 
provide the ovirt-ha-agent logs as well as gluster mount logs?
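When collecting the gluster mount logs requested here, a quick filter can pull out the warning-level and worse lines, such as the ones Will's logs later showed. This is a hypothetical helper: the regular expression is inferred from those sample lines (timestamp, one-letter level, optional MSGID, source location, translator name, message), and the level letters W, E and C are assumed to mean warning, error and critical. Where the mount log lives depends on the mount point, so the sketch takes any iterable of lines.

```python
from __future__ import annotations

import re

# "[timestamp] L [MSGID: n] [file.c:line:func] xlator: message"; MSGID is optional.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<source>[^\]]+)\]\s+(?P<xlator>[^:\s]+):\s*(?P<msg>.*)$"
)


def scan_gluster_log(lines, levels=("W", "E", "C")) -> list[dict]:
    """Parsed entries whose level is in `levels` (assumed: warning, error, critical)."""
    hits = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") in levels:
            hits.append(match.groupdict())  # msgid is None when absent
    return hits


def quorum_or_readonly(entries: list[dict]) -> list[dict]:
    """Entries that point at lost client-quorum or a read-only mount."""
    needles = ("Client-quorum is not met", "Read-only file system")
    return [e for e in entries if any(n in e["msg"] for n in needles)]
```

For example, quorum_or_readonly(scan_gluster_log(open(path))), with path being your mount log, would return the client-quorum warning and the failed WRITE from a log containing the three gluster lines quoted in this thread, while skipping the socket readv warning.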




Adding Sahina and Dan here.

Thanks, again,
Will



Re: [ovirt-users] Hosted Engine crash - state = EngineUp-EngineUpBadHealth

2015-12-22 Thread Simone Tiraboschi
On Tue, Dec 22, 2015 at 2:31 AM, Will Dennis wrote:

> OK, another problem :(
>
> I was having the same problem with my second oVirt host that I had with my
> first one, where when I ran “hosted-engine --deploy” on it, after it
> completed successfully, then I was experiencing a ~50sec lag when SSH’ing
> into the node…
>
> vpnp71:~ will$ time ssh root@ovirt-node-02 uptime
>  19:36:06 up 4 days,  8:31,  0 users,  load average: 0.68, 0.70, 0.67
>
> real  0m50.540s
> user  0m0.025s
> sys 0m0.008s
>
>
> So, in the oVirt web admin console, I put the “ovirt-node-02” node into
> Maintenance mode, then SSH’d to the server and rebooted it. Sure enough,
> after the server came back up, SSH was fine (no delay), which again was the
> same experience I had had with the first oVirt host. So, I went back to the
> web console, and chose the “Confirm host has been rebooted” option, which
> I thought would be the right action to take after a reboot. The system
> opened a dialog box with a spinner, which never stopped spinning… So
> finally, I closed the dialog box with the upper right (X) symbol, and then
> for this same host chose “Activate” from the menu. It was then I noticed I
> had received a state transition email notifying me that
> “EngineUp-EngineUpBadHealth” and sure enough, the web UI was then
> unresponsive. I checked on the first oVirt host, the VM with the name
> “HostedEngine” is still running, but obviously isn’t working…
>
> So, looks like I need to restart the HostedEngine VM or take whatever
> action is needed to return oVirt to operation… Hate to keep asking this
> question, but what’s the correct action at this point?
>
>
ovirt-ha-agent should always restart it for you after a few minutes, but the
point is that the network configuration seems not to be that stable.

I know from another thread that you are trying to deploy hosted-engine over
GlusterFS in a hyperconverged way and this, as I said, is currently not
supported.
I think that it may also require some specific configuration on the network
side.
Adding Sahina and Dan here.
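Since ovirt-ha-agent is expected to restart the engine VM on its own after a few minutes, one option is to wait and watch rather than intervene immediately. The sketch below polls a status source until some host reports engine health "good" or a timeout passes. The status source is a callable you supply (hypothetical here, for instance something that parses hosted-engine --vm-status output); the assumption that a healthy engine reports "health": "good" and the 10-minute default timeout are guesses to adjust.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_engine_health(
    get_engine_statuses: Callable[[], list[dict]],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until any host reports engine health "good", or give up.

    `get_engine_statuses` is a hypothetical callable returning the per-host
    "Engine status" dicts. Returns True once healthy, False if `timeout_s`
    elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        if any(s.get("health") == "good" for s in get_engine_statuses()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop testable without real waiting. If it returns False, that might be the point where manual steps such as hosted-engine --vm-shutdown come into play.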


> Thanks, again,
> Will
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users