Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade
On 05/04/2016 01:21 PM, Michal Skrivanek wrote:
>> On 29 Apr 2016, at 18:49, Will Dennis wrote:
>>
>> Answers inline below...
>>
>>> From: Michal Skrivanek [mailto:michal.skriva...@redhat.com]
>>>
>>> what exactly did you do in the UI?
>>
>> Clicked on the node, and in the bottom pane, clicked on the "Upgrade" link
>> showing there (the nodes also had an icon indicating that updates were
>> available)
>>
>>> so..it was not in Maintenance when you run the update?
>>> You should avoid doing that as an update to any package may interfere with
>>> running guests.
>>> E.g. a qemu rpm update can (and likely will) simply kill all your VMs, I
>>> suppose similarly for Gluster before updating anything
>>> the volumes should be in some kind of maintenance mode as well
>>
>> No, the "Upgrade" link once clicked migrates any running VM off the target
>> node onto another node, then sets the target node into Maintenance mode, and
>> then performs the updates.
>
> ok, thanks for clarification, you got me scared:)
>
>> Once the updates are completed successfully, it re-activates the node and
>> makes it available again. On the second and third nodes this coming out of
>> Maintenance process experienced a problem with mounting the Gluster storage
>> so it seems, and had the problems I'd indicated.
>
> it might be a question for gluster guys, it might be that the maintenance
> process is a bit different there
> Sahina, can you check/comment that?

Activation of hosts - after maintenance, w.r.t gluster - checks that glusterd is connected and returns Peer status as connected. The error seems to be in activating the gluster storage domain - hosted_storage. Did you see anything suspicious w.r.t this in vdsm logs?

Regarding the Host being unresponsive with "Heartbeat exceeded" - we have a bug logged on this which is being investigated - https://bugzilla.redhat.com/show_bug.cgi?id=1331006. In this bug too, the host regains connectivity after ~ 2 mins time.

> Thanks,
> michal

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
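For anyone who wants to check the peer state by hand on a host that has just come out of Maintenance, here is a minimal Python sketch that reads the text printed by `gluster peer status` and lists peers that are not reported as connected. It is only an illustration of the kind of check described above, not the code the engine or VDSM actually runs; the function name, the hostnames, and the UUIDs in the sample are made up.

```python
def disconnected_peers(peer_status_text):
    """Return hostnames whose State line does not contain '(Connected)'."""
    peers = []
    hostname = None
    for line in peer_status_text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            # Remember the peer this block belongs to.
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            if "(Connected)" not in line:
                peers.append(hostname)
            hostname = None
    return peers


# Hypothetical sample in the shape printed by `gluster peer status`.
SAMPLE = """\
Number of Peers: 2

Hostname: node2.example.com
Uuid: 11111111-2222-3333-4444-555555555555
State: Peer in Cluster (Connected)

Hostname: node3.example.com
Uuid: 66666666-7777-8888-9999-000000000000
State: Peer in Cluster (Disconnected)
"""

if __name__ == "__main__":
    print(disconnected_peers(SAMPLE))
```

In practice the text would come from running `gluster peer status` on the host itself; an empty result only says that the peers are connected, not that the hosted_storage mount will succeed.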
Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade
> On 29 Apr 2016, at 18:49, Will Dennis wrote:
>
> Answers inline below...
>
>> From: Michal Skrivanek [mailto:michal.skriva...@redhat.com]
>>
>> what exactly did you do in the UI?
>
> Clicked on the node, and in the bottom pane, clicked on the "Upgrade" link
> showing there (the nodes also had an icon indicating that updates were
> available)
>
>> so..it was not in Maintenance when you run the update?
>> You should avoid doing that as an update to any package may interfere with
>> running guests.
>> E.g. a qemu rpm update can (and likely will) simply kill all your VMs, I
>> suppose similarly for Gluster before updating anything
>> the volumes should be in some kind of maintenance mode as well
>
> No, the "Upgrade" link once clicked migrates any running VM off the target
> node onto another node, then sets the target node into Maintenance mode, and
> then performs the updates.

ok, thanks for clarification, you got me scared:)

> Once the updates are completed successfully, it re-activates the node and
> makes it available again. On the second and third nodes this coming out of
> Maintenance process experienced a problem with mounting the Gluster storage
> so it seems, and had the problems I'd indicated.

it might be a question for gluster guys, it might be that the maintenance process is a bit different there
Sahina, can you check/comment that?

Thanks,
michal
Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade
Answers inline below...

> From: Michal Skrivanek [mailto:michal.skriva...@redhat.com]
>
> what exactly did you do in the UI?

Clicked on the node, and in the bottom pane, clicked on the "Upgrade" link showing there (the nodes also had an icon indicating that updates were available)

> so..it was not in Maintenance when you run the update?
> You should avoid doing that as an update to any package may interfere with
> running guests.
> E.g. a qemu rpm update can (and likely will) simply kill all your VMs, I
> suppose similarly for Gluster before updating anything
> the volumes should be in some kind of maintenance mode as well

No, the "Upgrade" link once clicked migrates any running VM off the target node onto another node, then sets the target node into Maintenance mode, and then performs the updates. Once the updates are completed successfully, it re-activates the node and makes it available again. On the second and third nodes this coming out of Maintenance process experienced a problem with mounting the Gluster storage so it seems, and had the problems I'd indicated.
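Since the affected hosts came back by themselves after a few minutes, one cautious way to do a rolling upgrade is to wait until each host reports Up again before starting on the next one. The Python sketch below shows only the waiting logic. It assumes a caller-supplied `get_status` function that returns the host state as a string; how that function talks to the engine is left out, and the state names used here ("maintenance", "non_operational", "up") are illustrative labels, not guaranteed to match what the engine returns.

```python
import time


def wait_until_up(get_status, timeout_s=600.0, interval_s=10.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns 'up' or timeout_s elapses.

    Returns the sequence of distinct consecutive states that were seen.
    Raises TimeoutError if the host never reports 'up' in time.
    """
    seen = []
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        # Record a state only when it differs from the previous one.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "up":
            return seen
        if clock() >= deadline:
            raise TimeoutError(
                "host still %r after %.0f s" % (status, timeout_s))
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated host that passes through a short Non Operational phase.
    states = iter(["maintenance", "non_operational", "non_operational", "up"])
    print(wait_until_up(lambda: next(states), sleep=lambda s: None))
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; with the ~3 minute outage described above, a timeout of several minutes would be a reasonable starting point.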
Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade
(so noted)

...or anyone else who knows the answer ;)

-----Original Message-----
From: Michal Skrivanek [mailto:michal.skriva...@redhat.com]
Sent: Friday, April 29, 2016 9:02 AM
To: Will Dennis
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade

> On 29 Apr 2016, at 14:46, Will Dennis <wden...@nec-labs.com> wrote:
>
> Bump - can any RHAT folks comment on this?

note oVirt is a community project;-)

> -----Original Message-----
> From: Will Dennis
> Sent: Wednesday, April 27, 2016 11:00 PM
> To: users@ovirt.org
> Subject: Hosts temporarily in "Non Operational" state after upgrade
>
> Hi all,
>
> Had run updates tonight on my three oVirt hosts (3.6 hyperconverged) and on
> two of them, they went into “non Operational” state for a few minutes each
> before springing back to life… The synopsis was this:
>
> - Ran updates through the web Admin UI
>
> ...then I got the following series of messages via the “Events” tab in the UI:

what exactly did you do in the UI?

> - Updates successfully ran
> - VDSM “command failed: Heartbeat exceeded” message
> - host is not responding message
> - "Failed to connect to hosted_storage" message
> - “The error message for connection localhost:/engine returned by VDSM was:
>   Problem while trying to mount target”
> - "Host reports about one of the Active Storage Domains as Problematic”
> - “Host cannot access the Storage Domain(s) hosted_storage attached to
>   the data center Default. Setting host state to Non-Operational.”
> - "Detected change in status of brick {…} of volume {…} from DOWN to UP.”
>   (once for every brick on the host for every Gluster volume.)
> - "Host was autorecovered.”
> - "Status of host was set to Up.”

so..it was not in Maintenance when you run the update?
You should avoid doing that as an update to any package may interfere with running guests.
E.g. a qemu rpm update can (and likely will) simply kill all your VMs, I suppose similarly for Gluster before updating anything
the volumes should be in some kind of maintenance mode as well

> (BTW, it would be awesome if the UI’s Events log could be copied and pasted…
> Doesn’t work for me at least…)
>
> Duration of outage was ~3 mins per each affected host. Didn’t happen on the
> first host I upgraded, but did on the last two.
>
> I know I’m a little over the bleeding edge running hyperconverged on 3.6 :)
> but, should this behavior be expected?
>
> Also, if I go onto the hosts directly and run a ‘yum update’ after this
> upgrade process (not that I went thru with it, just wanted to see what was
> available to be upgraded) I see a bunch of ovirt-* packages that can be
> upgraded, which didn’t get updated thru the web UI’s upgrade process:
>
> ovirt-engine-sdk-python    noarch  3.6.5.0-1.el7.centos  ovirt-3.6       480 k
> ovirt-hosted-engine-ha     noarch  1.3.5.3-1.1.el7       centos-ovirt36  295 k
> ovirt-hosted-engine-setup  noarch  1.3.5.0-1.1.el7       centos-ovirt36  270 k
> ovirt-release36            noarch  007-1                 ovirt-3.6       9.5 k
>
> Are these packages not related to the “Upgrade” process available thru the
> web UI?
>
> FYI, here’s what did get updated thru the web UI “Upgrade” process:
>
> Apr 27 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: vdsm-python-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: vdsm-xmlrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch
> Apr 27 21:36:29 Updated: libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Updated:
Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade
> On 29 Apr 2016, at 14:46, Will Dennis wrote:
>
> Bump - can any RHAT folks comment on this?

note oVirt is a community project;-)

> -----Original Message-----
> From: Will Dennis
> Sent: Wednesday, April 27, 2016 11:00 PM
> To: users@ovirt.org
> Subject: Hosts temporarily in "Non Operational" state after upgrade
>
> Hi all,
>
> Had run updates tonight on my three oVirt hosts (3.6 hyperconverged) and on
> two of them, they went into “non Operational” state for a few minutes each
> before springing back to life… The synopsis was this:
>
> - Ran updates through the web Admin UI
>
> ...then I got the following series of messages via the “Events” tab in the UI:

what exactly did you do in the UI?

> - Updates successfully ran
> - VDSM “command failed: Heartbeat exceeded” message
> - host is not responding message
> - "Failed to connect to hosted_storage" message
> - “The error message for connection localhost:/engine returned by VDSM was:
>   Problem while trying to mount target”
> - "Host reports about one of the Active Storage Domains as Problematic”
> - “Host cannot access the Storage Domain(s) hosted_storage attached to
>   the data center Default. Setting host state to Non-Operational.”
> - "Detected change in status of brick {…} of volume {…} from DOWN to UP.”
>   (once for every brick on the host for every Gluster volume.)
> - "Host was autorecovered.”
> - "Status of host was set to Up.”

so..it was not in Maintenance when you run the update?
You should avoid doing that as an update to any package may interfere with running guests.
E.g. a qemu rpm update can (and likely will) simply kill all your VMs, I suppose similarly for Gluster before updating anything
the volumes should be in some kind of maintenance mode as well

> (BTW, it would be awesome if the UI’s Events log could be copied and pasted…
> Doesn’t work for me at least…)
>
> Duration of outage was ~3 mins per each affected host. Didn’t happen on the
> first host I upgraded, but did on the last two.
>
> I know I’m a little over the bleeding edge running hyperconverged on 3.6 :)
> but, should this behavior be expected?
>
> Also, if I go onto the hosts directly and run a ‘yum update’ after this
> upgrade process (not that I went thru with it, just wanted to see what was
> available to be upgraded) I see a bunch of ovirt-* packages that can be
> upgraded, which didn’t get updated thru the web UI’s upgrade process:
>
> ovirt-engine-sdk-python    noarch  3.6.5.0-1.el7.centos  ovirt-3.6       480 k
> ovirt-hosted-engine-ha     noarch  1.3.5.3-1.1.el7       centos-ovirt36  295 k
> ovirt-hosted-engine-setup  noarch  1.3.5.0-1.1.el7       centos-ovirt36  270 k
> ovirt-release36            noarch  007-1                 ovirt-3.6       9.5 k
>
> Are these packages not related to the “Upgrade” process available thru the
> web UI?
>
> FYI, here’s what did get updated thru the web UI “Upgrade” process:
>
> Apr 27 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: vdsm-python-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: vdsm-xmlrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch
> Apr 27 21:36:29 Updated: libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-tools-c-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Installed: libguestfs-winsupport-7.2-1.el7.x86_64
> Apr 27 21:36:29 Updated: vdsm-yajsonrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:29 Updated: vdsm-jsonrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:29 Installed: unzip-6.0-15.el7.x86_64
> Apr 27 21:36:30 Installed: gtk2-2.24.28-8.el7.x86_64
> Apr 27 21:36:31 Installed: 1:virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:31 Updated: safelease-1.0-7.el7.x86_64
> Apr 27 21:36:31 Updated: vdsm-hook-vmfex-dev-4.17.26-1.el7.noarch
> Apr 27 21:36:32 Updated: vdsm-4.17.26-1.el7.noarch
Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade
Bump - can any RHAT folks comment on this?

-----Original Message-----
From: Will Dennis
Sent: Wednesday, April 27, 2016 11:00 PM
To: users@ovirt.org
Subject: Hosts temporarily in "Non Operational" state after upgrade

Hi all,

Had run updates tonight on my three oVirt hosts (3.6 hyperconverged) and on two of them, they went into “non Operational” state for a few minutes each before springing back to life… The synopsis was this:

- Ran updates through the web Admin UI

...then I got the following series of messages via the “Events” tab in the UI:

- Updates successfully ran
- VDSM “command failed: Heartbeat exceeded” message
- host is not responding message
- "Failed to connect to hosted_storage" message
- “The error message for connection localhost:/engine returned by VDSM was: Problem while trying to mount target”
- "Host reports about one of the Active Storage Domains as Problematic”
- “Host cannot access the Storage Domain(s) hosted_storage attached to the data center Default. Setting host state to Non-Operational.”
- "Detected change in status of brick {…} of volume {…} from DOWN to UP.” (once for every brick on the host for every Gluster volume.)
- "Host was autorecovered.”
- "Status of host was set to Up."

(BTW, it would be awesome if the UI’s Events log could be copied and pasted… Doesn’t work for me at least…)

Duration of outage was ~3 mins per each affected host. Didn’t happen on the first host I upgraded, but did on the last two.

I know I’m a little over the bleeding edge running hyperconverged on 3.6 :) but, should this behavior be expected?

Also, if I go onto the hosts directly and run a ‘yum update’ after this upgrade process (not that I went thru with it, just wanted to see what was available to be upgraded) I see a bunch of ovirt-* packages that can be upgraded, which didn’t get updated thru the web UI’s upgrade process:

ovirt-engine-sdk-python    noarch  3.6.5.0-1.el7.centos  ovirt-3.6       480 k
ovirt-hosted-engine-ha     noarch  1.3.5.3-1.1.el7       centos-ovirt36  295 k
ovirt-hosted-engine-setup  noarch  1.3.5.0-1.1.el7       centos-ovirt36  270 k
ovirt-release36            noarch  007-1                 ovirt-3.6       9.5 k

Are these packages not related to the “Upgrade” process available thru the web UI?

FYI, here’s what did get updated thru the web UI “Upgrade” process:

Apr 27 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: vdsm-python-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: vdsm-xmlrpc-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch
Apr 27 21:36:29 Updated: libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:29 Updated: 1:libguestfs-tools-c-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:29 Installed: libguestfs-winsupport-7.2-1.el7.x86_64
Apr 27 21:36:29 Updated: vdsm-yajsonrpc-4.17.26-1.el7.noarch
Apr 27 21:36:29 Updated: vdsm-jsonrpc-4.17.26-1.el7.noarch
Apr 27 21:36:29 Installed: unzip-6.0-15.el7.x86_64
Apr 27 21:36:30 Installed: gtk2-2.24.28-8.el7.x86_64
Apr 27 21:36:31 Installed: 1:virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:31 Updated: safelease-1.0-7.el7.x86_64
Apr 27 21:36:31 Updated: vdsm-hook-vmfex-dev-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-gluster-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-cli-4.17.26-1.el7.noarch

Thanks,
Will
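To compare what the web UI "Upgrade" touched with what ‘yum update’ still offers, the log lines above can be grouped by action with a few lines of Python. This is a rough sketch, assuming the entries are in the yum.log shape quoted in the message (`Mon DD HH:MM:SS Action: package`); the function name is made up, and the sample is just four of the lines from the post.

```python
import re
from collections import defaultdict

# One yum.log entry: timestamp, action, package (epoch prefix allowed).
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} (Installed|Updated|Erased): (\S+)$")


def summarize(log_text):
    """Group package names by action; lines in another shape are skipped."""
    actions = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            action, package = match.groups()
            actions[action].append(package)
    return dict(actions)


# Four entries copied from the log in the message above.
SAMPLE = """\
Apr 27 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Installed: unzip-6.0-15.el7.x86_64
Apr 27 21:36:32 Updated: vdsm-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-gluster-4.17.26-1.el7.noarch
"""

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    for action, packages in sorted(summary.items()):
        print(action, len(packages))
    # Were any ovirt-* packages touched in this sample?
    touched = [p for pkgs in summary.values() for p in pkgs
               if p.startswith("ovirt-")]
    print("ovirt-* packages touched:", touched)
```

Run against the full log, the same filter would show whether any ovirt-* package was part of the UI-driven update, which is the question raised above about the four packages still pending.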