[ovirt-users] Redeploying hosted engine from backup
Hello,

I just had an issue with my cluster with a self-hosted engine (the hosted engine is not coming up) and decided to redeploy it, as I have a backup. I tried:

hosted-engine --deploy --restore-from-file=engine.backup

but the script asks questions such as the DC name and cluster name, which I can't recall correctly. What would be the consequence of a wrong answer? Is there any way to get this information from the hosts or from the backup file?

--
Regards,
Artem
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/O7PK465YAK2ZNZM532WR5ACLKAEA2Q54/
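The data-center and cluster names can often be recovered from the backup file itself: an engine-backup archive contains a PostgreSQL dump of the engine database, and the names live in the storage_pool (data centers) and cluster tables. A minimal sketch of scanning such a dump, assuming plain-text COPY format (the table and column names here are assumptions — check the dump's own COPY headers; older releases used vds_groups instead of cluster, and dumps may prefix the schema, e.g. public.storage_pool):

```python
import re

def names_from_dump(dump_text):
    """Pull data-center and cluster names out of a plain-text PostgreSQL
    dump in COPY format. Column positions are read from the COPY header
    itself rather than assumed."""
    wanted = {"storage_pool": "data_centers", "cluster": "clusters"}
    found = {"data_centers": [], "clusters": []}
    current, name_idx = None, None
    for line in dump_text.splitlines():
        m = re.match(r"COPY (?:\w+\.)?(\w+) \(([^)]*)\) FROM stdin;", line)
        if m and m.group(1) in wanted:
            cols = [c.strip() for c in m.group(2).split(",")]
            current = wanted[m.group(1)]
            name_idx = cols.index("name") if "name" in cols else None
            continue
        if line == r"\.":           # end-of-table marker in COPY format
            current = None
            continue
        if current is not None and name_idx is not None:
            cols = line.split("\t")  # COPY rows are tab-separated
            if len(cols) > name_idx:
                found[current].append(cols[name_idx])
    return found
```

Extract the dump from the backup tarball first (engine-backup archives are tar files), then feed its text through a scanner like this to recover the names before answering the deploy questions.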
[ovirt-users] Does anyone have a positive experience with physical host to oVirt conversion?
Hello,

I would just like to check whether it is really possible to convert an old CentOS 6 based physical box into an oVirt 4.3 VM. I haven't been able to find any success stories about this, and the process seems a bit complicated. As I understand it, I need a virt-v2v conversion proxy plus an image with virt-p2v running on the physical host to be converted. I'm a bit lost as to how I then get the converted VM into the oVirt cluster.

--
Regards,
Artem
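The usual p2v path is: boot the physical box from virt-p2v media, point it at a conversion host running virt-v2v, and have virt-v2v write the guest into an oVirt export storage domain, from which it can be imported in the web UI. A hedged sketch of assembling such a virt-v2v command line — the option set is an assumption that varies by virt-v2v version (consult virt-v2v(1)); the function only builds the argv, it does not run anything:

```python
def build_v2v_command(guest_xml, export_domain):
    """Assemble a virt-v2v invocation that imports a p2v-converted guest
    into an oVirt export storage domain. Options are illustrative --
    verify them against your virt-v2v(1) man page."""
    return [
        "virt-v2v",
        "-i", "libvirtxml", guest_xml,   # metadata written by virt-p2v
        "-o", "rhv",                     # write to an oVirt/RHV export domain
        "-os", export_domain,            # e.g. "nfs.example.com:/export"
        "-of", "qcow2",
        "-n", "ovirtmgmt",               # map the guest NIC to this network
    ]
```

After virt-v2v finishes, the guest should appear under the export domain's "VM Import" tab in the oVirt UI.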
[ovirt-users] Re: Can't bring host upgraded to 4.3 back into the cluster
Shani, supervdsm is failing too.

[root@ovirt1 vdsm]# systemctl status supervdsmd
● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
   Active: failed (Result: start-limit) since Tue 2019-06-11 16:18:16 MSK; 5s ago
  Process: 176025 ExecStart=/usr/share/vdsm/daemonAdapter /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock (code=exited, status=1/FAILURE)
 Main PID: 176025 (code=exited, status=1/FAILURE)

Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Unit supervdsmd.service entered failed state.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service holdoff time over, scheduling restart.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: start request repeated too quickly for supervdsmd.service
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Failed to start Auxiliary vdsm service for running helper functions as root.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Unit supervdsmd.service entered failed state.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.
supervdsm.log is full of messages like:

logfile::DEBUG::2019-06-11 16:18:46,379::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:04,401::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:06,289::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:17,535::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:21,528::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:24,541::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:42,543::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:57,442::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:18,539::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:32,041::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:41,051::concurrent::193::root::(run) START thread (func=>, args=(), kwargs={})

Regards,
Artem

On Tue, Jun 11, 2019 at 3:59 PM Shani Leviim wrote:

> +Dan Kenigsberg
>
> Hi Artem,
> Thanks for the log.
>
> It seems that this error message appears quite a lot:
>
> 2019-06-11 12:10:35,283+0300 ERROR (MainThread) [root] Panic: Connect to supervdsm service failed: [Errno 2] No such file or directory (panic:29)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 86, in _connect
>     self._manager.connect, Exception, timeout=60, tries=3)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 58, in retry
>     return func()
>   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in connect
>     conn = Client(self._address, authkey=self._authkey)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in Client
>     c = SocketClient(address)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in SocketClient
>     s.connect(address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 2] No such file or directory
>
> Can you please verify that the 'supervdsmd.service' is running?
>
> *Regards,*
> *Shani Leviim*
>
> On Tue, Jun 11, 2019 at 3:04 PM Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>
>> Hi Shani,
>>
>> Yes, you are right - I can ssh from any host to any other host in the cluster.
>> vdsm.log attached.
>> I have tried to restart vdsm manually and even restarted the host several times, with no success.
>> Host activation fails every time ...
>>
>> Thank you in advance for your help!
>> Regards,
>> Artem
>>
>> On Tue, Jun 11, 2019 at 10:51 AM Shani Leviim wrote:
>>
>>> Hi Artem,
>>> According to the oVirt documentation [1], hosts in the same cluster should be reachable from one another.
>>>
>>> Can you please share your vdsm log?
>>> I suppose you do manage to ssh to that inactive host (correct me if I'm wrong).
>>> While getting the vdsm log,
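The traceback above shows vdsm's client retrying the connection to the supervdsm socket three times before panicking; since the socket file /var/run/vdsm/svdsm.sock only exists while supervdsmd runs, no amount of retrying helps until supervdsmd itself starts. A simplified stand-in for the retry helper visible in the traceback (the signature approximates vdsm's function.retry; this is not vdsm's code):

```python
import time

def retry(func, exceptions=Exception, tries=3, delay=1):
    """Call func until it succeeds or `tries` attempts are exhausted,
    re-raising the last exception on the final failure -- the same shape
    of loop the traceback's retry(..., tries=3) runs through."""
    for attempt in range(1, tries + 1):
        try:
            return func()
        except exceptions:
            if attempt == tries:
                raise
            time.sleep(delay)
```

If the socket never appears, the final attempt re-raises, which is exactly the "Panic" path in the log; the fix is getting supervdsmd running, not tuning the retries.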
[ovirt-users] Re: Can't bring host upgraded to 4.3 back into the cluster
Hi Shani,

Yes, you are right - I can ssh from any host to any other host in the cluster. vdsm.log attached.
I have tried to restart vdsm manually and even restarted the host several times, with no success. Host activation fails every time ...

Thank you in advance for your help!
Regards,
Artem

On Tue, Jun 11, 2019 at 10:51 AM Shani Leviim wrote:

> Hi Artem,
> According to the oVirt documentation [1], hosts in the same cluster should be reachable from one another.
>
> Can you please share your vdsm log?
> I suppose you do manage to ssh to that inactive host (correct me if I'm wrong).
> While getting the vdsm log, maybe try to restart the network and vdsmd services on the host.
>
> Another thing you can try in the UI is putting the host into maintenance and then activating it.
>
> [1] https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters
>
> *Regards,*
> *Shani Leviim*
>
> On Mon, Jun 10, 2019 at 4:42 PM Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>
>> Hello,
>>
>> May I ask you for advice?
>> I'm running a small oVirt cluster; a couple of months ago I decided to upgrade from oVirt 4.2.8 to 4.3 and have had issues since then. I can only guess what I did wrong - probably one of the problems is that I hadn't switched the cluster from iptables to firewalld. But this is just my guess.
>>
>> The problem is that I upgraded the engine and one host, but when I upgraded the second host I couldn't bring it back to an active state. It looks like VDSM can't detect the network and fails to start. I even tried to reinstall the host from the UI (I saw the packages being installed) but again, VDSM doesn't start up at the end and the reinstallation fails.
>>
>> Looking at the host's process list, I see the script *wait_for_ipv4s* hanging forever.
>>
>> vdsm      8603     1  6 16:26 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>> *root      8630     1  0 16:26 ?  00:00:00 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
>> root      8645  8630  6 16:26 ?  00:00:00 /usr/bin/python2 /usr/libexec/vdsm/wait_for_ipv4s*
>> root      8688     1 30 16:27 ?  00:00:00 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
>> vdsm      8715     1  0 16:27 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>>
>> All hosts in the cluster are reachable from each other ... Could that be the issue?
>>
>> Thank you in advance!
>> --
>> Regards,
>> Artem
>
--
Regards,
Artem
[ovirt-users] Can't bring host upgraded to 4.3 back into the cluster
Hello,

May I ask you for advice?
I'm running a small oVirt cluster; a couple of months ago I decided to upgrade from oVirt 4.2.8 to 4.3 and have had issues since then. I can only guess what I did wrong - probably one of the problems is that I hadn't switched the cluster from iptables to firewalld. But this is just my guess.

The problem is that I upgraded the engine and one host, but when I upgraded the second host I couldn't bring it back to an active state. It looks like VDSM can't detect the network and fails to start. I even tried to reinstall the host from the UI (I saw the packages being installed) but again, VDSM doesn't start up at the end and the reinstallation fails.

Looking at the host's process list, I see the script *wait_for_ipv4s* hanging forever.

vdsm      8603     1  6 16:26 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
*root      8630     1  0 16:26 ?  00:00:00 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
root      8645  8630  6 16:26 ?  00:00:00 /usr/bin/python2 /usr/libexec/vdsm/wait_for_ipv4s*
root      8688     1 30 16:27 ?  00:00:00 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
vdsm      8715     1  0 16:27 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

All hosts in the cluster are reachable from each other ... Could that be the issue?

Thank you in advance!
--
Regards,
Artem
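A hanging wait_for_ipv4s usually means some configured NIC never obtained an IPv4 address: the script polls for addresses and, as reported here, waits indefinitely. The polling pattern it uses can be sketched with an explicit timeout instead (a hypothetical helper for diagnosis, not vdsm code):

```python
import time

def wait_for(predicate, timeout=60, interval=2):
    """Poll `predicate` until it returns True; give up after `timeout`
    seconds and return False instead of hanging indefinitely, which is
    the behavior that makes the stuck script visible in `ps`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

In practice the predicate would check `ip -4 addr` output for each boot-configured interface; if it never becomes true, the network configuration (or DHCP) on that interface is the thing to fix.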
[ovirt-users] Re: Hosts not coming back into oVirt
Hi,

I have exactly the same issue after an upgrade from 4.2.8 to 4.3.2. I can reach the host from the SHE, but VDSM constantly fails to start on the host after the upgrade.

Thu, 21 Mar 2019, 19:48 Simone Tiraboschi:

> On Thu, Mar 21, 2019 at 3:47 PM Arif Ali wrote:
>
>> Hi all,
>>
>> Recently deployed oVirt version 4.3.1.
>>
>> It's in a self-hosted engine environment.
>>
>> Used the steps via cockpit to install the engine, and was able to add the rest of the oVirt nodes without any specific problems.
>>
>> We tested the HA of the hosted engine without a problem, and then at one point turned off the machine that was hosting the engine, to mimic a failure and see how it goes; the VM was able to move over successfully, but some of the oVirt hosts started to go into Unassigned. Out of a total of 6 oVirt hosts, I have 4 of them in this state.
>>
>> Clicking on the host, I see the following message in the events. I can get to the hosts via the engine, and ping the machines, so I am not sure what is wrong:
>>
>> VDSM command Get Host Capabilities failed: Message timeout which can be caused by communication issues
>>
>> Mind you, I have been trying to resolve this issue since Monday, and have tried various things, like rebooting and re-installing the oVirt hosts, without having much luck.
>>
>> So any assistance on this would be appreciated; maybe I've missed something really simple and am overlooking it.
>>
>
> Can you please check that VDSM is correctly running on those nodes?
> Are you able to correctly reach those nodes from the engine VM?
>
>> --
>> regards,
>>
>> Arif Ali
[ovirt-users] Host unresponsive after upgrade 4.2.8 -> 4.3.2 failed
Hello,

I just started upgrading my small cluster from 4.2.8 to 4.3.2 and ended up in a situation where one of the hosts is not working after the upgrade. For some reason vdsmd is not starting up, and I have tried to restart it manually with no luck. Any ideas on what could be the reason?

[root@ovirt2 log]# systemctl restart vdsmd
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
[root@ovirt2 log]# journalctl -xe
-- Unit ovirt-ha-agent.service has finished shutting down.
Mar 19 15:47:47 ovirt2.domain.org systemd[1]: Starting Virtual Desktop Server Manager...
-- Subject: Unit vdsmd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit vdsmd.service has begun starting up.
Mar 19 15:47:47 ovirt2.domain.org vdsmd_init_common.sh[56717]: vdsm: Running mkdirs
Mar 19 15:47:47 ovirt2.domain.org vdsmd_init_common.sh[56717]: vdsm: Running configure_coredump
Mar 19 15:47:47 ovirt2.domain.org vdsmd_init_common.sh[56717]: vdsm: Running configure_vdsm_logs
Mar 19 15:47:47 ovirt2.domain.org vdsmd_init_common.sh[56717]: vdsm: Running wait_for_network
Mar 19 15:47:47 ovirt2.domain.org supervdsmd[56716]: Supervdsm failed to start: 'module' object has no attribute 'Accounting'
Mar 19 15:47:47 ovirt2.domain.org python2[56716]: detected unhandled Python exception in '/usr/share/vdsm/supervdsmd'
Mar 19 15:47:48 ovirt2.domain.org abrt-server[56745]: Duplicate: core backtrace
Mar 19 15:47:48 ovirt2.domain.org abrt-server[56745]: DUP_OF_DIR: /var/tmp/abrt/Python-2019-03-19-14:23:04-17292
Mar 19 15:47:48 ovirt2.domain.org abrt-server[56745]: Deleting problem directory Python-2019-03-19-15:47:47-56716 (dup of Python-2019-03-19-14:23:04-17292
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: Traceback (most recent call last):
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: File "/usr/share/vdsm/supervdsmd", line 26, in
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]:
supervdsm_server.main(sys.argv[1:])
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 294, in main
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: module_name))
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: __import__(name)
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_api/systemd.py", line 34, in
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: cmdutils.Accounting.CPU,
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: AttributeError: 'module' object has no attribute 'Accounting'
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: supervdsmd.service: main process exited, code=exited, status=1/FAILURE
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: Unit supervdsmd.service entered failed state.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: supervdsmd.service failed.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: supervdsmd.service holdoff time over, scheduling restart.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit supervdsmd.service has finished shutting down.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit supervdsmd.service has finished starting up.
--
-- The start-up result is done.
Mar 19 15:47:50 ovirt2.domain.org supervdsmd[56757]: Supervdsm failed to start: 'module' object has no attribute 'Accounting'
Mar 19 15:47:50 ovirt2.domain.org python2[56757]: detected unhandled Python exception in '/usr/share/vdsm/supervdsmd'

--
Regards,
Artem
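The AttributeError ('module' object has no attribute 'Accounting') typically points to mixed module versions on disk after a partial upgrade: a newer supervdsm_api/systemd.py importing an older cmdutils module that predates the Accounting class. One quick sanity check is that every installed vdsm* package reports the same version; a sketch that applies that check to `rpm -qa` output (the parsing regex is an assumption about rpm's name-version-release line format):

```python
import re

def vdsm_versions(rpm_qa_lines):
    """Map each vdsm* package name to its version, parsed from `rpm -qa`
    output lines such as 'vdsm-python-4.30.13-1.el7.noarch'."""
    versions = {}
    for line in rpm_qa_lines:
        m = re.match(r"(vdsm(?:-[a-z-]+)?)-(\d[\d.]*)-(.+)$", line.strip())
        if m:
            versions[m.group(1)] = m.group(2)
    return versions

def mixed_versions(versions):
    """True when the installed vdsm packages disagree on their version --
    a hint that an upgrade only partially completed."""
    return len(set(versions.values())) > 1
```

If the versions disagree, reinstalling or updating the vdsm packages so they all match (and restarting supervdsmd/vdsmd) is the usual way out.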
[ovirt-users] VM clone: network interface names change
Hello,

I have a question indirectly related to oVirt. I have a VM with CentOS 6 running on my cluster, which has 6 virtual interfaces (eth0 - eth5). Now it's time to upgrade it to CentOS 7, and I cloned the VM to test the upgrade process; I was a bit surprised to see that the interface names have shifted to eth6 - eth11. I'm afraid I'll run out of digits soon :) Anyway, how do I change the interface names back to the originals, and perhaps prevent them from changing further? I understand that the MAC addresses have changed, but I don't get why that changes the interface names. Cleaning up everything from /etc/udev/rules.d/70-persistent-net.rules didn't really help ...

Regards,
Artem
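The renaming happens because the CentOS 6 udev persistent-net generator keys its rules on the MAC address: the clone's NICs carry new MACs, so udev treats them as brand-new devices and appends eth6-eth11 after the six names already reserved in 70-persistent-net.rules. The usual fix is to rewrite that file so the new MACs map to eth0-eth5, remove the stale entries, and reboot. A sketch that generates such rules from a MAC-to-name mapping (the rule template follows the common 70-persistent-net.rules format; verify against a rule udev generated on your own system):

```python
RULE_TEMPLATE = (
    'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
    'ATTR{{address}}=="{mac}", ATTR{{type}}=="1", KERNEL=="eth*", NAME="{name}"'
)

def persistent_net_rules(mac_to_name):
    """Render a 70-persistent-net.rules body pinning each MAC address to
    a stable interface name, sorted by target name for readability."""
    lines = ["# generated: pin cloned-VM MACs back to their original names"]
    for mac, name in sorted(mac_to_name.items(), key=lambda kv: kv[1]):
        lines.append(RULE_TEMPLATE.format(mac=mac.lower(), name=name))
    return "\n".join(lines) + "\n"
```

Note that simply deleting the file is not enough while the old rules' names are still cached for the session; the mapping has to point the *new* MACs at eth0-eth5 before the next boot.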
[ovirt-users] lost connection to hosted engine
Hi,

I just ran into an issue during a cluster upgrade from 4.2.4 to 4.2.6.1. I'm running a small cluster with 2 hosts and gluster storage. Once I upgraded one of the hosts to 4.2.6.1, something went wrong (it looks like it tried to start the HE instance) and I can't connect to the hosted engine any longer. As far as I can see, HostedEngine is still running on the second host (along with another 7 VMs), but I can't stop it. ovirt-ha-agent and ovirt-ha-broker are failing to start.

hosted-engine --vm-status gives nothing but the error message "The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable."

ps -ef shows plenty of vdsm processes in a defunct state; that's probably the reason why the agent and broker can't start.

What is a good way to start resolving this problem while minimizing downtime for the running VMs? Restart vdsm and then try restarting the agent and broker again, or just reboot the whole host?

Regards,
Artem
[ovirt-users] ovirt host upgrade 4.2.2 -> 4.2.3
Hello,

I'm upgrading my cluster from 4.2.2 to 4.2.3. The HE upgrade went well, but I'm having some issues with the host upgrades: for some reason yum complains about conflicts during the transaction check:

Transaction check error:
  file /usr/share/cockpit/networkmanager/manifest.json from install of cockpit-system-160-3.el7.centos.noarch conflicts with file from package cockpit-networkmanager-160-1.el7.centos.noarch

Any ideas about the reason for this?

Regards,
Artem
Re: [ovirt-users] Hosted engine VDSM issue with sanlock
Hi,

How many hosts do you have? Check hosted-engine.conf on all hosts, including the one you have a problem with, and check whether all host_id values are unique. It can happen that several hosts end up with host_id=1.

Regards,
Artem

Wed, 28 Mar 2018, 20:49 Jamie Lawrence:

> I still can't resolve this issue.
>
> I have a host that is stuck in a cycle; it will be marked non-responsive, then come back up, ending with a "finished activation" message in the GUI. Then it repeats.
>
> The root cause seems to be sanlock. I'm just unclear on why it started or how to resolve it. The only "approved" knob I'm aware of is --reinitialize-lockspace and the manual equivalent, neither of which fixes anything.
>
> Anyone have a guess?
>
> -j
>
> - - - vdsm.log - - - -
>
> 2018-03-28 10:38:22,207-0700 INFO (monitor/b41eb20) [storage.SANLock] Acquiring host id for domain b41eb20a-eafb-481b-9a50-a135cf42b15e (id=1, async=True) (clusterlock:284)
> 2018-03-28 10:38:22,208-0700 ERROR (monitor/b41eb20) [storage.Monitor] Error acquiring host id 1 for domain b41eb20a-eafb-481b-9a50-a135cf42b15e (monitor:568)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 565, in _acquireHostId
>     self.domain.acquireHostId(self.hostId, async=True)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 828, in acquireHostId
>     self._manifest.acquireHostId(hostId, async)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 453, in acquireHostId
>     self._domainLock.acquireHostId(hostId, async)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 315, in acquireHostId
>     raise se.AcquireHostIdFailure(self._sdUUID, e)
> AcquireHostIdFailure: Cannot acquire host id: (u'b41eb20a-eafb-481b-9a50-a135cf42b15e', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
> 2018-03-28 10:38:23,078-0700 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in
> 0.00 seconds (__init__:573)
> 2018-03-28 10:38:23,085-0700 INFO (jsonrpc/6) [vdsm.api] START repoStats(domains=[u'b41eb20a-eafb-481b-9a50-a135cf42b15e']) from=::1,54450, task_id=186d7e8b-7b4e-485d-a9e0-c0cb46eed621 (api:46)
> 2018-03-28 10:38:23,085-0700 INFO (jsonrpc/6) [vdsm.api] FINISH repoStats return={u'b41eb20a-eafb-481b-9a50-a135cf42b15e': {'code': 0, 'actual': True, 'version': 4, 'acquired': False, 'delay': '0.000812547', 'lastCheck': '0.4', 'valid': True}} from=::1,54450, task_id=186d7e8b-7b4e-485d-a9e0-c0cb46eed621 (api:52)
> 2018-03-28 10:38:23,086-0700 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:573)
> 2018-03-28 10:38:23,092-0700 WARN (vdsm.Scheduler) [Executor] Worker blocked: action= at 0x1d44150> timeout=15, duration=150 at 0x7f076c05fb90> task#=83985 at 0x7f082c08e510>, traceback:
> File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
>   self.__bootstrap_inner()
> File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
>   self.run()
> File: "/usr/lib64/python2.7/threading.py", line 765, in run
>   self.__target(*self.__args, **self.__kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
>   ret = func(*args, **kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
>   self._execute_task()
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
>   task()
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
>   self._callable()
> File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 213, in __call__
>   self._func()
> File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 578, in __call__
>   stats = hostapi.get_stats(self._cif, self._samples.stats())
> File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 77, in get_stats
>   ret['haStats'] = _getHaInfo()
> File:
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in > _getHaInfo > stats = instance.get_all_stats() > File: > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", > line 93, in get_all_stats > stats = broker.get_stats_from_storage() > File: > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", > line 135, in get_stats_from_storage > result = self._proxy.get_stats() > File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__ > return self.__send(self.__name, args) > File: "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request > verbose=self.__verbose > File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request > return self.single_request(host, handler, request_body, verbose) > File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request > response = h.getresponse(buffering=True) > File:
Re: [ovirt-users] Issue with deploy HE on another host 4.1
Hello Krzysztof,

As I can see, both hosts have the same host_id=1, which is causing the conflict. You need to fix this manually on the newly deployed host and restart ovirt-ha-agent. You can run the following command on the engine VM to find the correct host_id values for your hosts:

sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds'

Once you have fixed host_id and restarted the agents, I would advise checking `sanlock client status` to verify that there are no conflicts and that the hosts are using the correct host_id values.

Regards,
Artem

Fri, 2 Mar 2018, 17:10 Krzysztof Wajda:

> Hi,
>
> I have an issue with Hosted Engine when I try to deploy via the GUI on another host. There are no errors after the deploy, but in the GUI I see only "Not active" status for the HE, and hosted-engine --vm-status shows only 1 node (the same output on both nodes). In hosted-engine.conf I see that host_id is the same as on the primary host with the HE!? The issue looks quite similar to
>
> http://lists.ovirt.org/pipermail/users/2018-February/086932.html
>
> Here is the config file on the newly deployed node:
>
> ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
> gateway=192.168.8.1
> iqn=
> conf_image_UUID=f2813205-4b0c-45f3-a9cb-3748f61d2194
> ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
> sdUUID=7e7a275c-6939-4f79-85f6-d695209951ea
> connectionUUID=81a2f9a3-2efe-448f-b305-e22543068044
> conf_volume_UUID=d6b7e25c-9912-47ff-b104-9d424b9f34b8
> user=
> host_id=1
> bridge=ovirtmgmt
> metadata_image_UUID=fe95f22e-b468-4adf-a754-21d419ae3e67
> spUUID=----
> mnt_options=
> fqdn=dev-ovirtengine0.somedomain.it
> portal=
> vm_disk_id=febde231-92cc-4599-8f55-816f63132739
> metadata_volume_UUID=7ebaf268-15ec-4c76-ba89-b5e2dc143830
> vm_disk_vol_id=e3920b18-4467-44f8-b2d0-629b3b1d1a58
> domainType=fc
> port=
> console=vnc
> ca_subject="C=EN, L=Test, O=Test, CN=Test"
> password=
> vmid=3f7d9c1d-6c3e-4b96-b85d-d240f3bf9b76
> lockspace_image_UUID=49e318ad-63a3-4efd-977c-33b8c4c93728
> lockspace_volume_UUID=91bcb5cf-006c-42b4-b419-6ac9f841f50a
> vdsm_use_ssl=true
> storage=None
> conf=/var/run/ovirt-hosted-engine-ha/vm.conf
>
> This is the original one:
>
> fqdn=dev-ovirtengine0.somedomain.it
> vm_disk_id=febde231-92cc-4599-8f55-816f63132739
> vm_disk_vol_id=e3920b18-4467-44f8-b2d0-629b3b1d1a58
> vmid=3f7d9c1d-6c3e-4b96-b85d-d240f3bf9b76
> storage=None
> mnt_options=
> conf=/var/run/ovirt-hosted-engine-ha/vm.conf
> host_id=1
> console=vnc
> domainType=fc
> spUUID=----
> sdUUID=7e7a275c-6939-4f79-85f6-d695209951ea
> connectionUUID=81a2f9a3-2efe-448f-b305-e22543068044
> ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
> ca_subject="C=EN, L=Test, O=Test, CN=Test"
> vdsm_use_ssl=true
> gateway=192.168.8.1
> bridge=ovirtmgmt
> metadata_volume_UUID=7ebaf268-15ec-4c76-ba89-b5e2dc143830
> metadata_image_UUID=fe95f22e-b468-4adf-a754-21d419ae3e67
> lockspace_volume_UUID=91bcb5cf-006c-42b4-b419-6ac9f841f50a
> lockspace_image_UUID=49e318ad-63a3-4efd-977c-33b8c4c93728
> conf_volume_UUID=d6b7e25c-9912-47ff-b104-9d424b9f34b8
> conf_image_UUID=f2813205-4b0c-45f3-a9cb-3748f61d2194
>
> # The following are used only for iSCSI storage
> iqn=
> portal=
> user=
> password=
> port=
>
> Packages:
>
> ovirt-imageio-daemon-1.0.0-1.el7.noarch
> ovirt-host-deploy-1.6.7-1.el7.centos.noarch
> ovirt-release41-4.1.9-1.el7.centos.noarch
> ovirt-setup-lib-1.1.4-1.el7.centos.noarch
> ovirt-hosted-engine-ha-2.1.8-1.el7.centos.noarch
> ovirt-hosted-engine-setup-2.1.4-1.el7.centos.noarch
> ovirt-vmconsole-1.0.4-1.el7.centos.noarch
> ovirt-vmconsole-host-1.0.4-1.el7.centos.noarch
> ovirt-engine-sdk-python-3.6.9.1-1.el7.centos.noarch
> ovirt-imageio-common-1.0.0-1.el7.noarch
>
> Output from agent.log:
>
> MainThread::INFO::2018-03-02 15:01:47,279::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140493346760912
> MainThread::INFO::2018-03-02
> 15:01:51,011::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain) Success, id 140493346759824
> MainThread::INFO::2018-03-02 15:01:51,011::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> MainThread::INFO::2018-03-02 15:01:51,045::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/7e7a275c-6939-4f79-85f6-d695209951ea/49e318ad-63a3-4efd-977c-33b8c4c93728/91bcb5cf-006c-42b4-b419-6ac9f841f50a)
> MainThread::INFO::2018-03-02 15:04:12,058::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
>
> Regards
>
> Krzysztof
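The fix described above boils down to making each host's host_id in /etc/ovirt-hosted-engine/hosted-engine.conf match its vds_spm_id row from the engine database (the psql query quoted in the reply). A sketch of that cross-check, operating on already-collected conf texts and query results (a hypothetical helper; hostnames and inputs are illustrative):

```python
def parse_host_id(conf_text):
    """Extract the integer host_id from a hosted-engine.conf body,
    or None if the key is absent."""
    for line in conf_text.splitlines():
        if line.startswith("host_id="):
            return int(line.split("=", 1)[1])
    return None

def find_conflicts(host_confs, spm_ids):
    """host_confs: {hostname: conf file text};
    spm_ids: {hostname: vds_spm_id from the engine DB}.
    Returns (hostname, configured_id, expected_id) for every mismatch."""
    problems = []
    for host, text in host_confs.items():
        configured = parse_host_id(text)
        expected = spm_ids.get(host)
        if configured != expected:
            problems.append((host, configured, expected))
    return problems
```

Any tuple this returns is a host whose hosted-engine.conf needs editing before its ovirt-ha-agent is restarted.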
[ovirt-users] Question about sanlock lockspaces
Hello,

I'm still troubleshooting my cluster and trying to figure out which lockspaces should be present and which shouldn't. If the HE VM is not running, both ovirt-ha-agent and ovirt-ha-broker are down, and the storage has been disconnected by hosted-engine --disconnect-storage, should I see anything related to the HE storage domain in the `sanlock client status` output? For some reason, on one host I don't see anything, while the second one still reports a present lockspace for the HE storage domain. Is this normal?

[root@ovirt1 ~]# sanlock client status
daemon b1d7fea2-e8a9-4645-b449-97702fc3808e.ovirt1.tel
p -1 helper
p -1 listener
p -1 status
p 3763
p 62861 quaggaVM
p 63111 powerDNS
p 107818 pjsip_freepbx_14
p 109092 revizorro_dev
p 109589 routerVM
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:2:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0
r a40cc3a9-54d6-40fd-acee-525ef29c8ce3:SDM:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/leases:1048576:49 p 3763

As it looks to me, the lockspace 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0 shouldn't be present, and it doesn't match the host_id, but maybe I'm wrong here...

Regards,
Artem
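Each `s` line of `sanlock client status` describes a joined lockspace in the form name:host_id:path:offset (colons inside the path are escaped as `\:`), so the output can be checked mechanically for lockspaces that shouldn't be there, or host_ids that don't match the host's own. A small parser sketch for those lines:

```python
def lockspaces(status_text):
    """Return (lockspace_uuid, host_id, path) for every joined-lockspace
    ('s ') line of `sanlock client status` output. The trailing field is
    the on-disk offset, which we drop here."""
    out = []
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("s "):
            entry = line[2:]
            uuid, host_id, rest = entry.split(":", 2)
            path, _offset = rest.rsplit(":", 1)
            out.append((uuid, int(host_id), path))
    return out
```

Applied to the output above, this would list both the _data and the lingering _engine lockspace with their host ids (2 and 1 respectively), making the stale entry easy to spot.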
Re: [ovirt-users] Fwd: why host is not capable to run HE?
I took the HE VM down and stopped the ovirt-ha-agents on both hosts, then tried hosted-engine --reinitialize-lockspace. The command just executes silently and I'm not sure it's doing anything at all. I also tried to clean the metadata. On one host it went correctly; on the second host it always fails with the following messages:

INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to start monitoring domain (sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162, host_id=2): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 67, in action_clean
    return he.clean(options.force_cleanup)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 345, in clean
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 829, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162, host_id=2): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

I'm not an expert when it comes to reading sanlock output, but it looks a bit strange to me. From the first host (host_id=2):

[root@ovirt1 ~]# sanlock client status
daemon b1d7fea2-e8a9-4645-b449-97702fc3808e.ovirt1.tel
p -1 helper
p -1 listener
p -1 status
p 3763
p 62861 quaggaVM
p 63111 powerDNS
p 107818 pjsip_freepbx_14
p 109092 revizorro_dev
p 109589 routerVM
s hosted-engine:2:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:2:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0
r a40cc3a9-54d6-40fd-acee-525ef29c8ce3:SDM:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/leases:1048576:49 p 3763

From the second host (host_id=1):

[root@ovirt2 ~]# sanlock client status
daemon 9263e081-e5ea-416b-866a-0a73fe32fe16.ovirt2.tel
p -1 helper
p -1 listener
p 150440 CentOS-Desk
p 151061 centos-dev-box
p 151288 revizorro_nfq
p 151954 gitlabVM
p -1 status
s hosted-engine:1:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0 ADD

Not sure if there is a problem with lockspace 4a7f8717-9bb0-4d80-8016-498fa4b88162, but both hosts show 1 as the host_id here. Is this correct? Shouldn't they have different IDs here?
Once the ha-agents have been started, hosted-engine --vm-status shows 'unknown-stale-data' for the second host, and HE just doesn't start on the second host at all. Host redeployment hasn't helped either. Any advice on this?

Regards,
Artem

On Mon, Feb 19, 2018 at 9:32 PM, Artem Tambovskiy < artem.tambovs...@gmail.com> wrote:
> Thanks Martin.
>
> As you suggested I updated hosted-engine.conf with the correct host_id values
> and restarted the ovirt-ha-agent services on both hosts, and now I run into the
> problem with status "unknown-stale-data" :(
> And the second host still doesn't look capable of running HE.
>
> Should I stop the HE VM, bring down the ovirt-ha-agents, reinitialize the lockspace
> and start the ovirt-ha-agents again?
>
> Regards,
> Artem
>
> On Mon, Feb 19, 2018 at 6:45 PM, Martin Sivak <msi...@redhat.com> wrote:
>
>> Hi Artem,
>>
>> just a restart of the ovirt-ha-agent services should be enough.
>>
>> Best regards
>>
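The symptom discussed above (two hosts holding id 1 in the same lockspace while one of them is configured with a different id) can be expressed as a simple consistency check. An illustrative sketch only; the host names and the expected-ids mapping below are assumptions mirroring this thread, not output from a real tool:

```python
def find_conflicts(host_lockspaces, expected_ids):
    """Flag lockspaces where a host holds an id other than its configured one,
    and lockspaces where two hosts hold the same id (the symptom above)."""
    problems = []
    seen = {}  # (lockspace, id) -> first host seen holding it
    for host, spaces in sorted(host_lockspaces.items()):
        for name, held_id in spaces:
            if held_id != expected_ids[host]:
                problems.append("%s holds id %d in %s, expected %d"
                                % (host, held_id, name, expected_ids[host]))
            other = seen.get((name, held_id))
            if other is not None:
                problems.append("%s: id %d held by both %s and %s"
                                % (name, held_id, other, host))
            seen[(name, held_id)] = host
    return problems

# Hypothetical host names; the ids mirror the thread: both hosts hold id 1
# in the engine-domain lockspace, while ovirt1 is configured as id 2.
status = {
    "ovirt1": [("4a7f8717-9bb0-4d80-8016-498fa4b88162", 1)],
    "ovirt2": [("4a7f8717-9bb0-4d80-8016-498fa4b88162", 1)],
}
for p in find_conflicts(status, {"ovirt1": 2, "ovirt2": 1}):
    print(p)
```

With the values above this reports both a held-vs-configured mismatch and a duplicate-id conflict, which is exactly the split-brain condition sanlock is protecting against.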
[ovirt-users] Fwd: Fwd: why host is not capable to run HE?
Thanks Martin.

As you suggested I updated hosted-engine.conf with the correct host_id values and restarted the ovirt-ha-agent services on both hosts, and now I run into the problem with status "unknown-stale-data" :(
And the second host still doesn't look capable of running HE.

Should I stop the HE VM, bring down the ovirt-ha-agents, reinitialize the lockspace and start the ovirt-ha-agents again?

Regards,
Artem

On Mon, Feb 19, 2018 at 6:45 PM, Martin Sivak <msi...@redhat.com> wrote:
> Hi Artem,
>
> just a restart of the ovirt-ha-agent services should be enough.
>
> Best regards
>
> Martin Sivak
>
> On Mon, Feb 19, 2018 at 4:40 PM, Artem Tambovskiy
> <artem.tambovs...@gmail.com> wrote:
> > Ok, understood.
> > Once I set the correct host_id on both hosts, how do I force the changes to take effect with minimal downtime? Or do I need to reboot both hosts anyway?
> >
> > Regards,
> > Artem
> >
> > On 19 Feb 2018 at 18:18, "Simone Tiraboschi" <stira...@redhat.com> wrote:
> >
> >> On Mon, Feb 19, 2018 at 4:12 PM, Artem Tambovskiy
> >> <artem.tambovs...@gmail.com> wrote:
> >>>
> >>> Thanks a lot, Simone!
> >>>
> >>> This clearly shows the problem:
> >>>
> >>> [root@ov-eng ovirt-engine]# sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds'
> >>>    vds_name   | vds_spm_id
> >>> --------------+------------
> >>>  ovirt1.local |          2
> >>>  ovirt2.local |          1
> >>> (2 rows)
> >>>
> >>> While hosted-engine.conf on ovirt1.local has host_id=1, and ovirt2.local has host_id=2. So totally opposite values.
> >>> So how do I get this fixed in a simple way? Update the engine DB?
> >> > >> > >> I'd suggest to manually fix /etc/ovirt-hosted-engine/hosted-engine.conf > on > >> both the hosts > >> > >>> > >>> > >>> Regards, > >>> Artem > >>> > >>> On Mon, Feb 19, 2018 at 5:37 PM, Simone Tiraboschi < > stira...@redhat.com> > >>> wrote: > >>>> > >>>> > >>>> > >>>> On Mon, Feb 19, 2018 at 12:13 PM, Artem Tambovskiy > >>>> <artem.tambovs...@gmail.com> wrote: > >>>>> > >>>>> Hello, > >>>>> > >>>>> Last weekend my cluster suffered form a massive power outage due to > >>>>> human mistake. > >>>>> I'm using SHE setup with Gluster, I managed to bring the cluster up > >>>>> quickly, but once again I have a problem with duplicated host_id > >>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1543988) on second > host and due > >>>>> to this second host is not capable to run HE. > >>>>> > >>>>> I manually updated file hosted_engine.conf with correct host_id and > >>>>> restarted agent & broker - no effect. Than I rebooted the host > itself - > >>>>> still no changes. How to fix this issue? > >>>> > >>>> > >>>> I'd suggest to run this command on the engine VM: > >>>> sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c > >>>> 'select vds_name, vds_spm_id from vds' > >>>> (just sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id > >>>> from vds' if still on 4.1) and check > >>>> /etc/ovirt-hosted-engine/hosted-engine.conf on all the involved host. > >>>> Maybe you can also have a leftover configuration file on undeployed > >>>> host. > >>>> > >>>> When you find a conflict you should manually bring down sanlock > >>>> In doubt a reboot of both the hosts will solve for sure. 
> >>>> > >>>> > >>>>> > >>>>> > >>>>> Regards, > >>>>> Artem > >>>>> > >>>>> ___ > >>>>> Users mailing list > >>>>> Users@ovirt.org > >>>>> http://lists.ovirt.org/mailman/listinfo/users > >>>>> > >>>> > >>> > >>> > >>> > >>> ___ > >>> Users mailing list > >>> Users@ovirt.org > >>> http://lists.ovirt.org/mailman/listinfo/users > >>> > >> > > > > ___ > > Users mailing list > > Users@ovirt.org > > http://lists.ovirt.org/mailman/listinfo/users > > > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Fwd: why host is not capable to run HE?
Ok, understood. Once I set the correct host_id on both hosts, how do I force the changes to take effect with minimal downtime? Or do I need to reboot both hosts anyway?

Regards,
Artem

On 19 Feb 2018 at 18:18, "Simone Tiraboschi" <stira...@redhat.com> wrote:
>
> On Mon, Feb 19, 2018 at 4:12 PM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>
>> Thanks a lot, Simone!
>>
>> This clearly shows the problem:
>>
>> [root@ov-eng ovirt-engine]# sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds'
>>    vds_name   | vds_spm_id
>> --------------+------------
>>  ovirt1.local |          2
>>  ovirt2.local |          1
>> (2 rows)
>>
>> While hosted-engine.conf on ovirt1.local has host_id=1, and ovirt2.local has host_id=2. So totally opposite values.
>> So how do I get this fixed in a simple way? Update the engine DB?
>>
>
> I'd suggest manually fixing /etc/ovirt-hosted-engine/hosted-engine.conf on both hosts.
>
>> Regards,
>> Artem
>>
>> On Mon, Feb 19, 2018 at 5:37 PM, Simone Tiraboschi <stira...@redhat.com> wrote:
>>
>>> On Mon, Feb 19, 2018 at 12:13 PM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Last weekend my cluster suffered from a massive power outage due to human mistake.
>>>> I'm using a SHE setup with Gluster. I managed to bring the cluster up quickly, but once again I have a problem with a duplicated host_id (https://bugzilla.redhat.com/show_bug.cgi?id=1543988) on the second host, and due to this the second host is not capable of running HE.
>>>>
>>>> I manually updated the hosted_engine.conf file with the correct host_id and restarted agent & broker - no effect. Then I rebooted the host itself - still no changes. How to fix this issue?
>>>> >>> >>> I'd suggest to run this command on the engine VM: >>> sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c >>> 'select vds_name, vds_spm_id from vds' >>> (just sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id >>> from vds' if still on 4.1) and check >>> /etc/ovirt-hosted-engine/hosted-engine.conf >>> on all the involved host. >>> Maybe you can also have a leftover configuration file on undeployed host. >>> >>> When you find a conflict you should manually bring down sanlock >>> In doubt a reboot of both the hosts will solve for sure. >>> >>> >>> >>>> >>>> Regards, >>>> Artem >>>> >>>> ___ >>>> Users mailing list >>>> Users@ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/users >>>> >>>> >>> >> >> >> ___ >> Users mailing list >> Users@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/users >> >> > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] Fwd: why host is not capable to run HE?
Thanks a lot, Simone!

This clearly shows the problem:

[root@ov-eng ovirt-engine]# sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds'
   vds_name   | vds_spm_id
--------------+------------
 ovirt1.local |          2
 ovirt2.local |          1
(2 rows)

While hosted-engine.conf on ovirt1.local has host_id=1, and ovirt2.local has host_id=2. So totally opposite values.
So how do I get this fixed in a simple way? Update the engine DB?

Regards,
Artem

On Mon, Feb 19, 2018 at 5:37 PM, Simone Tiraboschi <stira...@redhat.com> wrote:
>
> On Mon, Feb 19, 2018 at 12:13 PM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>
>> Hello,
>>
>> Last weekend my cluster suffered from a massive power outage due to human mistake.
>> I'm using a SHE setup with Gluster. I managed to bring the cluster up quickly, but once again I have a problem with a duplicated host_id (https://bugzilla.redhat.com/show_bug.cgi?id=1543988) on the second host, and due to this the second host is not capable of running HE.
>>
>> I manually updated the hosted_engine.conf file with the correct host_id and restarted agent & broker - no effect. Then I rebooted the host itself - still no changes. How to fix this issue?
>>
>
> I'd suggest running this command on the engine VM:
> sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c 'select vds_name, vds_spm_id from vds'
> (just sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds' if still on 4.1) and check /etc/ovirt-hosted-engine/hosted-engine.conf on all the involved hosts.
> Maybe you also have a leftover configuration file on an undeployed host.
>
> When you find a conflict you should manually bring down sanlock.
> If in doubt, a reboot of both hosts will solve it for sure.
>
>> Regards,
>> Artem
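The mismatch shown by the psql query above can be checked host by host against hosted-engine.conf. A small illustrative sketch; the conf snippets and host names below mirror the values quoted in this thread, but the helper itself is hypothetical, not an oVirt utility:

```python
def conf_host_id(conf_text):
    """Pull host_id out of hosted-engine.conf-style text; None if absent."""
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("host_id="):
            return int(line.split("=", 1)[1])
    return None

# vds_spm_id values as returned by the psql query in this thread; the conf
# snippets stand in for each host's /etc/ovirt-hosted-engine/hosted-engine.conf.
db_spm_ids = {"ovirt1.local": 2, "ovirt2.local": 1}
conf_files = {
    "ovirt1.local": "host_id=1\nstorage=ovirt1.local:/engine\n",
    "ovirt2.local": "host_id=2\nstorage=ovirt1.local:/engine\n",
}

for host, spm_id in db_spm_ids.items():
    local = conf_host_id(conf_files[host])
    state = "OK" if local == spm_id else "MISMATCH (conf has %s)" % local
    print(host, "engine says", spm_id, "->", state)
```

With the swapped values from the thread, both hosts come back as MISMATCH, matching Simone's advice to fix hosted-engine.conf rather than the engine DB.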
[ovirt-users] why host is not capable to run HE?
Hello,

Last weekend my cluster suffered from a massive power outage due to human mistake. I'm using a SHE setup with Gluster. I managed to bring the cluster up quickly, but once again I have a problem with a duplicated host_id (https://bugzilla.redhat.com/show_bug.cgi?id=1543988) on the second host, and due to this the second host is not capable of running HE.

I manually updated the hosted_engine.conf file with the correct host_id and restarted agent & broker - no effect. Then I rebooted the host itself - still no changes. How to fix this issue?

Regards,
Artem
Re: [ovirt-users] hosted-engine unknown stale-data
Hello Kasturi,

Yes, I set global maintenance mode intentionally. I've run out of ideas troubleshooting my cluster and decided to undeploy the hosted engine from the second host, clean the installation and add it again to the cluster. I also cleaned the metadata with hosted-engine --clean-metadata --host-id=2 --force-clean. But once I added the second host to the cluster again, it doesn't show the capability to run the hosted engine, and doesn't even appear in the hosted-engine --vm-status output:

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : a23c7cbd
local_conf_timestamp               : 848931
Host timestamp                     : 848930
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=848930 (Mon Jan 22 09:53:29 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=848931 (Mon Jan 22 09:53:29 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False

On the redeployed second host I see unknown-stale-data again, and the second host doesn't show up as hosted-engine capable.

[root@ovirt2 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 18765f68
local_conf_timestamp               : 848951
Host timestamp                     : 848951
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=848951 (Mon Jan 22 09:53:49 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=848951 (Mon Jan 22 09:53:50 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=ReinitializeFSM
    stopped=False

A really strange situation...
Regards, Artem On Mon, Jan 22, 2018 at 9:46 AM, Kasturi Narra <kna...@redhat.com> wrote: > Hello Artem, > > Any reason why you chose hosted-engine undeploy action for the second > host ? I see that the cluster is in global maintenance mode, was this > intended ? > > command to clear the entries from hosted-engine --vm-status is "hosted-engine > --clean-metadata --host-id= --force-clean" > > Hope this helps !! > > Thanks > kasturi > > > On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy < > artem.tambovs...@gmail.com> wrote: > >> Hi, >> >> Ok, i decided to remove second host from the cluster. >> I reinstalled from webUI it with hosted-engine action UNDEPLOY, and >> removed it from the cluster aftewards. >> All VM's are fine hosted engine running ok, >> But hosted-engine --vm-status still showing 2 hosts. >> >> How I can clean the traces of second host in a correct way? >> >> >> --== Host 1 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : True >> Hostname : ovirt1.telia.ru >> Host ID: 1 >> Engine status : {"health": "good", "vm": "up", >> "detail": "up"} >> Score : 3400 >> stopped: False >> Local maintenance : False >> crc32 : 1b1b6f6d >> local_conf_timestamp : 545385 >> Host timestamp : 545385 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=545385 (Thu Jan 18 21:34:25 2018) >> host-id=1 >> score=3400 >> vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=GlobalMaintenance >> stopped=False >> >> >> --== Host 2 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : False >> Hostname : ovirt1.telia.ru >> Host ID: 2 >> Engine status : unknown stale-data >> Score : 0 >> stopped: True >> Local maintenance
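For what it's worth, spotting stale hosts in hosted-engine --vm-status output like the listings above can be scripted. An illustrative Python sketch that only assumes the field layout shown (Hostname, Status up-to-date, Engine status); it is not an official oVirt tool and would break if the display format changed:

```python
def stale_hosts(vm_status_text):
    """Return hostnames whose agent data is stale in `hosted-engine --vm-status` output."""
    hosts, current = [], {}
    for line in vm_status_text.splitlines():
        if line.startswith("--== Host"):
            # New per-host block starts; stash the previous one.
            if current:
                hosts.append(current)
            current = {}
        elif ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        hosts.append(current)
    return [h.get("Hostname", "?") for h in hosts
            if h.get("Status up-to-date") == "False"
            or "stale-data" in h.get("Engine status", "")]

# Sample condensed from the listings in this thread.
sample = """\
--== Host 1 status ==--

Hostname                 : ovirt1.telia.ru
Status up-to-date        : True
Engine status            : {"health": "good", "vm": "up"}

--== Host 2 status ==--

Hostname                 : ovirt2.telia.ru
Status up-to-date        : False
Engine status            : unknown stale-data
"""
print(stale_hosts(sample))
```

Here the second host is flagged; "Status up-to-date : False" and "unknown stale-data" together mean the other hosts have stopped reading fresh metadata from that host, which is the recurring symptom in this thread.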
Re: [ovirt-users] correct settings for gluster based storage domain
Ok, Alexey, you have picked the third option, leaving host selection to the DNS resolver. But in general solution 2 should also work, right?

Regards,
Artem

On Fri, Jan 19, 2018 at 4:50 PM, Николаев Алексей < alexeynikolaev.p...@yandex.ru> wrote:
> https://ovirt.org/documentation/self-hosted/chap-Deploying_Self-Hosted_Engine/
>
> For Gluster storage, specify the full address, using either the FQDN or IP address, and the path name of the shared storage domain.
>
> Important: Only replica 3 Gluster storage is supported. Ensure the following configuration has been made:
>
> - In the /etc/glusterfs/glusterd.vol file on all three Gluster servers, set rpc-auth-allow-insecure to on.
>
>   option rpc-auth-allow-insecure on
>
> - Configure the volume as follows:
>
>   gluster volume set volume cluster.quorum-type auto
>   gluster volume set volume network.ping-timeout 10
>   gluster volume set volume auth.allow \*
>   gluster volume set volume group virt
>   gluster volume set volume storage.owner-uid 36
>   gluster volume set volume storage.owner-gid 36
>   gluster volume set volume server.allow-insecure on
>
> I have problems with hosted engine storage on gluster replica 3 arbiter with oVirt 4.1.
> I recommend updating oVirt to 4.2. I have no problems with 4.2.
>
> 19.01.2018, 16:43, "Artem Tambovskiy" <artem.tambovs...@gmail.com>:
>
> I'm still troubleshooting my oVirt 4.1.8 cluster, and it occurred to me that I may have an issue with the storage settings for the hosted_engine storage domain.
>
> In general, if I have 2 oVirt nodes running Gluster + a 3rd host as arbiter, what should the settings look like?
>
> Let's say I have 3 nodes:
> ovirt1.domain.com (gluster + ovirt)
> ovirt2.domain.com (gluster + ovirt)
> ovirt3.domain.com (gluster)
>
> What should the correct storage domain config look like?
>
> Option 1:
> /etc/ovirt-hosted-engine/hosted-engine.conf
>
> storage=ovirt1.domain.com:/engine
> mnt_options=backup-volfile-servers=ovirt2.domain.com:ovirt3.domain.com
>
> Option 2:
> /etc/ovirt-hosted-engine/hosted-engine.conf
>
> storage=localhost:/engine
> mnt_options=backup-volfile-servers=ovirt1.domain.com:ovirt2.domain.com:ovirt3.domain.com
>
> Option 3:
> Set up a DNS record gluster.domain.com pointing to the IP addresses of the gluster nodes.
>
> /etc/ovirt-hosted-engine/hosted-engine.conf
>
> storage=gluster.domain.com:/engine
> mnt_options=
>
> Of course this relates not only to the hosted engine domain, but to all gluster-based storage domains.
>
> Thank you in advance!
> Regards,
> Artem
[ovirt-users] correct settings for gluster based storage domain
I'm still troubleshooting my oVirt 4.1.8 cluster, and it occurred to me that I may have an issue with the storage settings for the hosted_engine storage domain.

In general, if I have 2 oVirt nodes running Gluster + a 3rd host as arbiter, what should the settings look like?

Let's say I have 3 nodes:
ovirt1.domain.com (gluster + ovirt)
ovirt2.domain.com (gluster + ovirt)
ovirt3.domain.com (gluster)

What should the correct storage domain config look like?

Option 1:
/etc/ovirt-hosted-engine/hosted-engine.conf

storage=ovirt1.domain.com:/engine
mnt_options=backup-volfile-servers=ovirt2.domain.com:ovirt3.domain.com

Option 2:
/etc/ovirt-hosted-engine/hosted-engine.conf

storage=localhost:/engine
mnt_options=backup-volfile-servers=ovirt1.domain.com:ovirt2.domain.com:ovirt3.domain.com

Option 3:
Set up a DNS record gluster.domain.com pointing to the IP addresses of the gluster nodes.

/etc/ovirt-hosted-engine/hosted-engine.conf

storage=gluster.domain.com:/engine
mnt_options=

Of course this relates not only to the hosted engine domain, but to all gluster-based storage domains.

Thank you in advance!
Regards,
Artem
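As a side note, a quick sanity check over Option 1-style settings can catch two common mistakes: the primary server repeated in the backup list, or no backup servers at all (in which case the mount fails whenever the primary is down). A hypothetical helper, illustrative only; it assumes the exact storage/mnt_options key=value layout shown above:

```python
def check_he_storage(conf_text):
    """Sanity-check storage/mnt_options in a hosted-engine.conf-style snippet.

    Convention assumed (matching Option 1 above): the primary server goes in
    `storage` and the remaining replicas in backup-volfile-servers.
    """
    conf = dict(line.split("=", 1) for line in conf_text.splitlines() if "=" in line)
    primary = conf.get("storage", "").split(":", 1)[0]
    opts = conf.get("mnt_options", "")
    backups = []
    for part in opts.split(","):
        if part.startswith("backup-volfile-servers="):
            backups = part.split("=", 1)[1].split(":")
    warnings = []
    if primary in backups:
        warnings.append("%s listed as both primary and backup" % primary)
    if not backups:
        warnings.append("no backup-volfile-servers: mount fails if primary is down")
    return warnings

# Option 1 from the message above passes cleanly.
print(check_he_storage("storage=ovirt1.domain.com:/engine\n"
                       "mnt_options=backup-volfile-servers=ovirt2.domain.com:ovirt3.domain.com\n"))
```

Note that the Option 3 (DNS round-robin) layout would also trip the "no backup-volfile-servers" warning here, which reflects the trade-off being discussed: it delegates failover to the resolver instead of the gluster client.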
Re: [ovirt-users] hosted-engine unknown stale-data
Hi,

Ok, I decided to remove the second host from the cluster. I reinstalled it from the web UI with the hosted-engine action UNDEPLOY, and removed it from the cluster afterwards. All VMs are fine and the hosted engine is running ok, but hosted-engine --vm-status still shows 2 hosts.

How can I clean the traces of the second host in a correct way?

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1b1b6f6d
local_conf_timestamp               : 545385
Host timestamp                     : 545385
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=545385 (Thu Jan 18 21:34:25 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True

!! Cluster is in GLOBAL MAINTENANCE mode !!

Thank you in advance!
Regards,
Artem

On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
> Hello,
>
> Any further suggestions on how to fix the issue and make the HA setup work?
> Can the complete removal of the second host (including all ovirt configuration files and packages) from the cluster and adding it again solve the issue? Or might it completely ruin the cluster?
>
> Regards,
> Artem
>
> On 16 Jan 2018 at 17:00, "Artem Tambovskiy" <artem.tambovs...@gmail.com> wrote:
>
>> Hi Martin,
>>
>> Thanks for the feedback.
>>
>> All hosts and the hosted engine are running the 4.1.8 release.
>> The strange thing: I can see that the host ID is set to 1 on both hosts in the /etc/ovirt-hosted-engine/hosted-engine.conf file.
>> I have no idea how this happened; the only thing I have changed recently is the mnt_options, in order to add backup-volfile-servers using the hosted-engine --set-shared-config command.
>>
>> Both agent and broker are running on the second host:
>>
>> [root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
>> vdsm 42331 1 26 14:40 ? 00:31:35 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>> vdsm 42332 1 0 14:40 ? 00:00:16 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>
>> but I saw some tracebacks during the broker start:
>>
>> [root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>    Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
>>  Main PID: 42331 (ovirt-ha-broker)
>>    CGroup: /system.slice/ovirt-ha-broker.service
>>            └─42331 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>
>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>> Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=4a
Re: [ovirt-users] hosted-engine unknown stale-data
Hello,

Any further suggestions on how to fix the issue and make the HA setup work? Can the complete removal of the second host (including all ovirt configuration files and packages) from the cluster and adding it again solve the issue? Or might it completely ruin the cluster?

Regards,
Artem

On 16 Jan 2018 at 17:00, "Artem Tambovskiy" <artem.tambovs...@gmail.com> wrote:

> Hi Martin,
>
> Thanks for the feedback.
>
> All hosts and the hosted engine are running the 4.1.8 release.
> The strange thing: I can see that the host ID is set to 1 on both hosts in the /etc/ovirt-hosted-engine/hosted-engine.conf file.
> I have no idea how this happened; the only thing I have changed recently is the mnt_options, in order to add backup-volfile-servers using the hosted-engine --set-shared-config command.
>
> Both agent and broker are running on the second host:
>
> [root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
> vdsm 42331 1 26 14:40 ? 00:31:35 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
> vdsm 42332 1 0 14:40 ? 00:00:16 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>
> but I saw some tracebacks during the broker start:
>
> [root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
>  Main PID: 42331 (ovirt-ha-broker)
>    CGroup: /system.slice/ovirt-ha-broker.service
>            └─42331 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>
> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
> Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
>     data)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
>     .set_storage_domain(client, sd_type, **options)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
>     self._backends[client].connect()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
>     self._dom_type)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
>     " in {1}".format(sd_uuid, parent))
> BackendFailureException: path to storage domain 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in /rhev/data-center/mnt/glusterSD
>
> I have tried to issue hosted-engine --connect-storage on the second host, followed by an agent & broker restart, but there is no visible improvement.
>
> Regards,
> Artem
>
> On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msi...@redhat.com> wrote:
>
>> Hi everybody,
>>
>> there are a couple of things to check here:
>>
>> - what version of the hosted engine agent is this? The logs look like they are coming from 4.1
>> - what version of the engine is used?
>> - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on >> both hosts, the numbers must be different >> - it looks like the agent or broker on host 2 is not active (or there >> would be a report) >> - the second host does not see data from the first host (unknown >> stale-data), wait for a minute and check again, then check the storage >
Re: [ovirt-users] hosted-engine unknown stale-data
Hi Martin,

Thanks for the feedback.

All hosts and the hosted engine are running the 4.1.8 release.
The strange thing: I can see that the host ID is set to 1 on both hosts in the /etc/ovirt-hosted-engine/hosted-engine.conf file.
I have no idea how this happened; the only thing I have changed recently is the mnt_options, in order to add backup-volfile-servers using the hosted-engine --set-shared-config command.

Both agent and broker are running on the second host:

[root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
vdsm 42331 1 26 14:40 ? 00:31:35 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
vdsm 42332 1 0 14:40 ? 00:00:16 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

but I saw some tracebacks during the broker start:

[root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
 Main PID: 42331 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─42331 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in /rhev/data-center/mnt/glusterSD

I have tried to issue hosted-engine --connect-storage on the second host, followed by an agent & broker restart, but there is no visible improvement.

Regards,
Artem

On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msi...@redhat.com> wrote:
> Hi everybody,
>
> there are a couple of things to check here:
>
> - what version of the hosted engine agent is this? The logs look like they are coming from 4.1
> - what version of the engine is used?
> - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on
> both hosts, the numbers must be different
> - it looks like the agent or broker on host 2 is not active (or there
> would be a report)
> - the second host does not see data from the first host (unknown
> stale-data), wait for a minute and check again, then check the storage
> connection
>
> And then the general troubleshooting:
>
> - put the hosted engine in global maintenance mode (and check that it is
> visible from the other host using hosted-engine --vm-status)
> - mount the storage domain (hosted-engine --connect-storage)
> - check sanlock client status to see if the proper lockspaces are present
>
> Best regards
>
> Martin Sivak
>
> On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <de...@ihtfp.com> wrote:
> > Why are both hosts reporting as ovirt 1?
> > Look at the hostname fields to see what they mean.
> >
> > -derek
> > Sent using my mobile device. Please excuse any typos.
> >
> > On January 16, 2018 7:11:09 AM Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> Yes, I followed exactly
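The host-ID check above can be scripted. The sketch below is self-contained: it uses temporary sample files standing in for /etc/ovirt-hosted-engine/hosted-engine.conf (the duplicated value mirrors what happened in this thread); on a real cluster you would run the awk extraction against that path on both hosts.

```shell
# Stand-ins for /etc/ovirt-hosted-engine/hosted-engine.conf on each host;
# on a real cluster, run the awk line against that path on both hosts.
printf 'host_id=1\n' > /tmp/he-host1.conf
printf 'host_id=1\n' > /tmp/he-host2.conf   # duplicated ID, as in this thread

id1=$(awk -F= '$1 == "host_id" {print $2}' /tmp/he-host1.conf)
id2=$(awk -F= '$1 == "host_id" {print $2}' /tmp/he-host2.conf)

if [ "$id1" = "$id2" ]; then
    # Duplicate IDs make both agents compete for the same sanlock slot.
    echo "duplicate host_id=$id1: fix one host, then restart ovirt-ha-agent"
else
    echo "host IDs differ: $id1 vs $id2"
fi
```

After correcting the ID on one host, restarting ovirt-ha-agent there should let each agent claim its own lockspace slot.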
Re: [ovirt-users] hosted-engine unknown stale-data
ks ?
>
> 1) Move the host to maintenance
> 2) click on reinstall
> 3) provide the password
> 4) uncheck 'automatically configure host firewall'
> 5) click on the 'Deploy' tab
> 6) set Hosted Engine deployment to 'Deploy'
>
> And once the host installation is done, wait till the active score of the
> host shows 3400 in the general tab, then check hosted-engine --vm-status.
>
> Thanks
> kasturi
>
> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>
>> Hello,
>>
>> I have uploaded 2 archives with all relevant logs to shared hosting:
>> files from host 1 (which is currently running all VMs, including the
>> hosted engine) - https://yadi.sk/d/PttRoYV63RTvhK
>> files from the second host - https://yadi.sk/d/UBducEsV3RTvhc
>>
>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker, but it
>> had no effect. I have also tried to shut down the hosted engine VM, stop the
>> ovirt-ha-agent and ovirt-ha-broker services, disconnect the storage and connect
>> it again - no effect as well.
>> Also I tried to reinstall the second host from the WebGUI - this led to an
>> interesting situation - now hosted-engine --vm-status shows that both
>> hosts have the same address.
>>
>> [root@ovirt1 ~]# hosted-engine --vm-status
>>
>> --== Host 1 status ==--
>>
>> conf_on_shared_storage             : True
>> Status up-to-date                  : True
>> Hostname                           : ovirt1.telia.ru
>> Host ID                            : 1
>> Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
>> Score                              : 3400
>> stopped                            : False
>> Local maintenance                  : False
>> crc32                              : a7758085
>> local_conf_timestamp               : 259327
>> Host timestamp                     : 259327
>> Extra metadata (valid at timestamp):
>>     metadata_parse_version=1
>>     metadata_feature_version=1
>>     timestamp=259327 (Mon Jan 15 14:06:48 2018)
>>     host-id=1
>>     score=3400
>>     vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
>>     conf_on_shared_storage=True
>>     maintenance=False
>>     state=EngineUp
>>     stopped=False
>>
>>
>> --== Host 2 status ==--
>>
>> conf_on_shared_storage             : True
>> Status up-to-date                  : False
>> Hostname                           : ovirt1.telia.ru
>> Host ID                            : 2
>> Engine status                      : unknown stale-data
>> Score                              : 0
>> stopped                            : True
>> Local maintenance                  : False
>> crc32                              : c7037c03
>> local_conf_timestamp               : 7530
>> Host timestamp                     : 7530
>> Extra metadata (valid at timestamp):
>>     metadata_parse_version=1
>>     metadata_feature_version=1
>>     timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>     host-id=2
>>     score=0
>>     vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>     conf_on_shared_storage=True
>>     maintenance=False
>>     state=AgentStopped
>>     stopped=True
>>
>> Gluster seems to be working fine; all gluster nodes show the connected state.
>>
>> Any advice on how to resolve this situation is highly appreciated!
>>
>> Regards,
>> Artem
>>
>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <kna...@redhat.com> wrote:
>>
>>> Hello Artem,
>>>
>>> Can you check if the glusterd service is running on host1 and all
>>> the peers are in the connected state? If yes, can you restart the ovirt-ha-agent
>>> and broker services and check if things are working fine?
>>>
>>> Thanks
>>> kasturi
>>>
>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>>>
>>>> Explored the logs on both hosts.
>>>> broker.log shows no errors.
>>>>
>>>> agent.log is not looking good:
>>>>
>>>> on host1 (which is running the hosted engine):
>>>>
>>>> MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most
>>>> recent call last):
>>>> F
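A quick way to spot which host sections are reporting stale metadata is to filter the hosted-engine --vm-status text. The sketch below runs against a trimmed sample like the output above (on a live host you would pipe `hosted-engine --vm-status` in instead of the sample file):

```shell
# Sample trimmed from the --vm-status output in this thread; live use:
#   hosted-engine --vm-status | awk ...
cat > /tmp/vm-status.txt <<'EOF'
--== Host 1 status ==--
Hostname                           : ovirt1.telia.ru
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
--== Host 2 status ==--
Hostname                           : ovirt1.telia.ru
Engine status                      : unknown stale-data
EOF

# Remember the last Hostname seen; print it when its status is stale.
stale=$(awk -F': *' '/^Hostname/ {host=$2} /unknown stale-data/ {print host}' /tmp/vm-status.txt)
echo "stale hosts: $stale"
```

Note that both sections reporting the same hostname, as in this sample, is itself a symptom of the duplicated host ID discussed in the thread.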
Re: [ovirt-users] hosted-engine unknown stale-data
Hello,

I have uploaded 2 archives with all relevant logs to shared hosting:
files from host 1 (which is currently running all VMs, including the hosted engine) - https://yadi.sk/d/PttRoYV63RTvhK
files from the second host - https://yadi.sk/d/UBducEsV3RTvhc

I have tried to restart both ovirt-ha-agent and ovirt-ha-broker, but it had no effect. I have also tried to shut down the hosted engine VM, stop the ovirt-ha-agent and ovirt-ha-broker services, disconnect the storage and connect it again - no effect as well.
Also I tried to reinstall the second host from the WebGUI - this led to an interesting situation - now hosted-engine --vm-status shows that both hosts have the same address.

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : a7758085
local_conf_timestamp               : 259327
Host timestamp                     : 259327
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=259327 (Mon Jan 15 14:06:48 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True

Gluster seems to be working fine; all gluster nodes show the connected state.

Any advice on how to resolve this situation is highly appreciated!
Regards, Artem On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <kna...@redhat.com> wrote: > Hello Artem, > > Can you check if glusterd service is running on host1 and all the > peers are in connected state ? If yes, can you restart ovirt-ha-agent and > broker services and check if things are working fine ? > > Thanks > kasturi > > On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy < > artem.tambovs...@gmail.com> wrote: > >> Explored logs on both hosts. >> broker.log shows no errors. >> >> agent.log looking not good: >> >> on host1 (which running hosted engine) : >> >> MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovir >> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most >> recent call last): >> File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >> line 191, in _run_agent >> return action(he) >> File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >> line 64, in action_proper >> return he.start_monitoring() >> File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >> line 411, in start_monitoring >> self._initialize_sanlock() >> File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >> line 749, in _initialize_sanlock >> "Failed to initialize sanlock, the number of errors has" >> SanlockInitializationError: Failed to initialize sanlock, the number of >> errors has exceeded the limit >> >> MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovir >> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart >> agent >> MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovir >> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, >> attempt '1' >> MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::2 >> 42::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) >> Found cer
Re: [ovirt-users] hosted-engine unknown stale-data
) Connecting the storage
MainThread::INFO::2018-01-12 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server

Any suggestions on how to resolve this?

Regards,
Artem

On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
> Trying to fix one thing I broke another :(
>
> I fixed mnt_options for the hosted engine storage domain and installed the latest
> security patches on my hosts and hosted engine. All VMs are up and running,
> but hosted-engine --vm-status reports issues:
>
> [root@ovirt1 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt2
> Host ID                            : 1
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 193164b8
> local_conf_timestamp               : 8350
> Host timestamp                     : 8350
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=8350 (Fri Jan 12 19:03:54 2018)
>     host-id=1
>     score=0
>     vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=EngineUnexpectedlyDown
>     stopped=False
>     timeout=Thu Jan  1 05:24:43 1970
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt1.telia.ru
> Host ID                            : 2
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : True
> Local maintenance                  : False
> crc32                              : c7037c03
> local_conf_timestamp               : 7530
> Host timestamp                     : 7530
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=7530 (Fri Jan 12 16:10:12 2018)
>     host-id=2
>     score=0
>     vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=AgentStopped
>     stopped=True
> [root@ovirt1 ~]#
>
>
> From the second host the situation looks a bit different:
>
> [root@ovirt2 ~]# hosted-engine
> --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : ovirt2
> Host ID                            : 1
> Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 0
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 78eabdb6
> local_conf_timestamp               : 8403
> Host timestamp                     : 8402
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=8402 (Fri Jan 12 19:04:47 2018)
>     host-id=1
>     score=0
>     vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=EngineUnexpectedlyDown
>     stopped=False
>     timeout=Thu Jan  1 05:24:43 1970
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt1.telia.ru
> Host ID                            : 2
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : True
> Local maintenance                  : False
> crc32                              : c7037c03
> local_conf_timestamp               : 7530
> Host timestamp                     : 7530
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=7530 (Fri Jan 12 16:10:12 2018)
>     host-id=2
>     score=0
>     vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>     conf_on_shared_storage=True
>
[ovirt-users] (no subject)
Trying to fix one thing I broke another :(

I fixed mnt_options for the hosted engine storage domain and installed the latest security patches on my hosts and hosted engine. All VMs are up and running, but hosted-engine --vm-status reports issues:

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt2
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 193164b8
local_conf_timestamp               : 8350
Host timestamp                     : 8350
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=8350 (Fri Jan 12 19:03:54 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Thu Jan  1 05:24:43 1970


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True
[root@ovirt1 ~]#

From the second host the situation looks a bit different:

[root@ovirt2 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 78eabdb6
local_conf_timestamp               : 8403
Host timestamp                     : 8402
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=8402 (Fri Jan 12 19:04:47 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Thu Jan  1 05:24:43 1970


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True

The WebGUI shows the engine running on host ovirt1. Gluster looks fine:

[root@ovirt1 ~]# gluster volume status engine
Status of volume: engine
Gluster process                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt1.telia.ru:/oVirt/engine   49169     0          Y       3244
Brick ovirt2.telia.ru:/oVirt/engine   49179     0          Y       20372
Brick ovirt3.telia.ru:/oVirt/engine   49206     0          Y       16609
Self-heal Daemon on localhost         N/A       N/A        Y       117868
Self-heal Daemon on ovirt2.telia.ru   N/A       N/A        Y       20521
Self-heal Daemon on ovirt3            N/A       N/A        Y       25093

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

How to resolve this issue?
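Checking that every brick in that status table is online can be scripted by filtering the Online column. The sketch below runs against sample rows copied from the table above (on a live node you would pipe `gluster volume status engine` in instead):

```shell
# Sample rows in 'gluster volume status' layout (Online is column 5).
cat > /tmp/gluster-status.txt <<'EOF'
Brick ovirt1.telia.ru:/oVirt/engine   49169  0  Y  3244
Brick ovirt2.telia.ru:/oVirt/engine   49179  0  Y  20372
Brick ovirt3.telia.ru:/oVirt/engine   49206  0  Y  16609
EOF

# Count bricks whose Online flag is not "Y"; anything > 0 needs attention.
offline=$(awk '$1 == "Brick" && $5 != "Y"' /tmp/gluster-status.txt | wc -l)
echo "offline bricks: $offline"
```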
Re: [ovirt-users] mount_options for hosted_engine storage domain
Thanks a lot, Simone!

hosted-engine --set-shared-config mnt_options backup-volfile-servers=host1.domain.com:host2.domain.com --type=he_conf

solved my issue!

Regards,
Artem

On Fri, Jan 12, 2018 at 3:39 PM, Simone Tiraboschi <stira...@redhat.com> wrote:
>
>
> On Fri, Jan 12, 2018 at 1:22 PM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>
>> Hi,
>>
>> I have deployed a small cluster with 2 ovirt hosts and a GlusterFS cluster
>> some time ago. And recently during a software upgrade I noticed that I made
>> some mistakes during the installation:
>>
>> if the host which was deployed first is taken down for an upgrade
>> (powered off or rebooted), the engine becomes unavailable (even though all VMs and
>> the hosted engine were migrated to the second host in advance).
>>
>> I was thinking that this is due to the missing
>> mnt_options=backup-volfile-servers=host1.domain.com;host2.domain.com option for the hosted engine storage
>> domain.
>> Is there any good way to fix this? I have tried editing
>> /etc/ovirt-hosted-engine/hosted-engine.conf manually to add the missing
>> mnt_options, but after a while I noticed that those changes are gone.
>>
>
> The master copy used at host-deploy time is on the shared storage domain;
> you can change it with:
> hosted-engine --set-shared-config mnt_options backup-volfile-servers=host1.domain.com:host2.domain.com --type=he_conf
>
> And then edit /etc/ovirt-hosted-engine/hosted-engine.conf and restart
> ovirt-ha-agent on the existing HE hosts.
>
>
>>
>> Any suggestions?
>>
>> Thanks in advance!
>> Artem
>>
>>
>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
> ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
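One detail worth noting: in the command that worked here, the servers in backup-volfile-servers are separated by colons, which is the separator glusterfs expects. A tiny sketch to sanity-check such an option string (hostnames are placeholders, not from this thread):

```shell
# Placeholder hostnames; the servers are separated by ':' in the value.
mnt_options='backup-volfile-servers=host1.example.com:host2.example.com'

# Strip the 'backup-volfile-servers=' prefix and list one server per line.
servers=$(printf '%s\n' "${mnt_options#*=}" | tr ':' '\n')
echo "$servers"
```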
[ovirt-users] mount_options for hosted_engine storage domain
Hi,

I have deployed a small cluster with 2 ovirt hosts and a GlusterFS cluster some time ago. And recently during a software upgrade I noticed that I made some mistakes during the installation:

if the host which was deployed first is taken down for an upgrade (powered off or rebooted), the engine becomes unavailable (even though all VMs and the hosted engine were migrated to the second host in advance).

I was thinking that this is due to the missing mnt_options=backup-volfile-servers=host1.domain.com;host2.domain.com option for the hosted engine storage domain.
Is there any good way to fix this? I have tried editing /etc/ovirt-hosted-engine/hosted-engine.conf manually to add the missing mnt_options, but after a while I noticed that those changes are gone.

Any suggestions?

Thanks in advance!
Artem

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Q: Partitioning - oVirt 4.1 & GlusterFS 2-node System
Hi,

AFAIK, during hosted engine deployment the installer will check the GlusterFS replica type, and replica 3 is a mandatory requirement. Previously, I was advised on this mailing list to look at a DRBD solution if you don't have a third node to run a GlusterFS replica 3.

On Dec 14, 2017 at 1:51 AM, "Andrei V" wrote:
> Hi, Donny,
>
> Thanks for the link.
>
> Do I understand correctly that I need at least a 3-node system to run in
> failover mode? So far I plan to deploy only 2 nodes, either with a hosted
> or with a bare metal engine.
>
> *The key thing to keep in mind regarding host maintenance and downtime is
> that this converged three node system relies on having at least two of the
> nodes up at all times. If you bring down two machines at once, you'll run
> afoul of the Gluster quorum rules that guard us from split-brain states in
> our storage, the volumes served by your remaining host will go read-only,
> and the VMs stored on those volumes will pause and require a shutdown and
> restart in order to run again.*
>
> What happens if in a 2-node glusterfs system (with hosted engine) one node
> goes down?
> A bare metal engine can manage this situation, but I'm not sure about a hosted
> engine.
>
>
> On 12/13/2017 11:17 PM, Donny Davis wrote:
>
> I would start here
> https://ovirt.org/blog/2017/04/up-and-running-with-ovirt-4.1-and-gluster-storage/
>
> Pretty good basic guidance.
>
> Also with software-defined storage it is recommended that there are at least two
> "storage" nodes and one arbiter node to maintain quorum.
>
> On Wed, Dec 13, 2017 at 3:45 PM, Andrei V wrote:
>
>> Hi,
>>
>> I'm going to set up a relatively simple 2-node system with oVirt 4.1,
>> GlusterFS, and several VMs running.
>> Each node is going to be installed on a dual Xeon system with a single RAID 5.
>>
>> The oVirt node installer uses a relatively simple default partitioning scheme.
>> Should I leave it as is, or are there better options?
>> I never used GlusterFS before, so any expert opinion is very welcome. >> >> Thanks in advance. >> Andrei >> ___ >> Users mailing list >> Users@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/users >> > > > > ___ > Users mailing list > Users@ovirt.org > http://lists.ovirt.org/mailman/listinfo/users > > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
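The quorum rule quoted in this thread (at least two of the three converged nodes up) is plain majority arithmetic, which also shows why a 2-node volume cannot tolerate losing a node without going read-only; a lightweight arbiter as the third node is the usual fix. A small sketch of the arithmetic:

```shell
# Server-side quorum needs a strict majority of the pool nodes up.
for nodes in 2 3; do
    quorum=$(( nodes / 2 + 1 ))
    tolerated=$(( nodes - quorum ))
    echo "$nodes nodes: need $quorum up, tolerate $tolerated down"
done
```

With 2 nodes the majority is 2, so zero failures are tolerated; with 3 nodes (two data bricks plus an arbiter) one node can go down for maintenance while quorum holds.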
Re: [ovirt-users] Standalone Gluster Storage
Hi,

I just updated almost all storage domains with backup-volfile-servers mount options; the last one remaining is the hosted_storage domain, which serves the hosted engine VM. I wonder if this domain also needs to be configured with the backup-volfile-servers option? If so, how do I do this? I can't put this domain into maintenance via the web UI.

Regards,
Artem

On Wed, Dec 13, 2017 at 9:03 AM, Sahina Bose <sab...@redhat.com> wrote:
> The backup-volfile-servers additional mount option should handle the
> case where one of the servers goes down - the storage domain should continue to
> be available.
> The servers specified for this option can be the servers participating in
> your volume. For instance, the set of unique servers from the "gluster
> volume info" command.
>
> If even with this mount option you're facing an issue, please log a bug
> with gluster mount logs and vdsm logs.
>
> thanks
> sahina
>
> On Wed, Dec 13, 2017 at 12:37 AM, Beau Sapach <bsap...@ualberta.ca> wrote:
>
>> We did use the backup-volfile-servers option but still had trouble. We
>> were simply adding all servers in the cluster as backups; is there a best
>> practice that should be followed?
>>
>> On Tue, Dec 12, 2017 at 8:59 AM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>>
>>> I made exactly the same mistake with my standalone GlusterFS cluster and
>>> now need to take down all storage domains in order to fix it.
>>> It's probably worth adding a few words about this to the installation guide!
>>>
>>> On Tue, Dec 12, 2017 at 4:52 PM, Simone Tiraboschi <stira...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Dec 11, 2017 at 8:44 PM, Beau Sapach <bsap...@ualberta.ca> wrote:
>>>>
>>>>> We've been doing some experimenting with gluster, and have built a
>>>>> stand-alone gluster cluster (not managed by oVirt). We've been able to
>>>>> create a storage domain backed by that gluster cluster and run VMs with
>>>>> their disks on that storage.
>>>>> >>>>> The problem we have is that when we take a gluster node down for >>>>> updates, maintenance etc. the entire storage domain goes offline in oVirt. >>>>> Other gluster clients, that is servers connecting directly to the gluster >>>>> cluster don't seem to notice if one node goes offline. >>>>> >>>>> Is anyone else using gluster storage in oVirt that is not managed >>>>> within oVirt? >>>>> >>>> >>>> Did you set also the backup-volfile-servers mount option? >>>> >>>> >>>>> >>>>> >>>>> -- >>>>> Beau Sapach >>>>> *System Administrator | Information Technology Services | University >>>>> of Alberta Libraries* >>>>> *Phone: 780.492.4181 <(780)%20492-4181> | Email: >>>>> beau.sap...@ualberta.ca <beau.sap...@ualberta.ca>* >>>>> >>>>> >>>>> ___ >>>>> Users mailing list >>>>> Users@ovirt.org >>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>> >>>>> >>>> >>>> ___ >>>> Users mailing list >>>> Users@ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/users >>>> >>>> >>> >>> ___ >>> Users mailing list >>> Users@ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/users >>> >>> >> >> >> -- >> Beau Sapach >> *System Administrator | Information Technology Services | University of >> Alberta Libraries* >> *Phone: 780.492.4181 | Email: beau.sap...@ualberta.ca >> <beau.sap...@ualberta.ca>* >> >> >> ___ >> Users mailing list >> Users@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/users >> >> > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
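Sahina's suggestion above - build the backup-volfile-servers value from the set of unique servers in the volume - can be scripted. The sketch below parses a made-up brick list in `gluster volume info` style (hostnames are placeholders; live use would pipe `gluster volume info <volume>` in and substitute your actual primary mount server):

```shell
# Sample brick list in 'gluster volume info' style; hostnames are made up.
cat > /tmp/vol-info.txt <<'EOF'
Brick1: gserver1.example.com:/bricks/engine
Brick2: gserver2.example.com:/bricks/engine
Brick3: gserver3.example.com:/bricks/engine
EOF

# Unique brick hosts minus the primary mount server (gserver1 here),
# joined with ':' as glusterfs expects for backup-volfile-servers.
backup=$(awk -F'[ :]' '/^Brick[0-9]+:/ {print $3}' /tmp/vol-info.txt \
         | sort -u | grep -v '^gserver1\.example\.com$' | paste -sd: -)
echo "backup-volfile-servers=$backup"
```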
Re: [ovirt-users] Standalone Gluster Storage
I made exactly the same mistake with my standalone GlusterFS cluster and now need to take down all storage domains in order to fix it. It's probably worth adding a few words about this to the installation guide!

On Tue, Dec 12, 2017 at 4:52 PM, Simone Tiraboschi wrote:
>
>
> On Mon, Dec 11, 2017 at 8:44 PM, Beau Sapach wrote:
>
>> We've been doing some experimenting with gluster, and have built a
>> stand-alone gluster cluster (not managed by oVirt). We've been able to
>> create a storage domain backed by that gluster cluster and run VMs with
>> their disks on that storage.
>>
>> The problem we have is that when we take a gluster node down for updates,
>> maintenance etc. the entire storage domain goes offline in oVirt. Other
>> gluster clients, that is servers connecting directly to the gluster cluster,
>> don't seem to notice if one node goes offline.
>>
>> Is anyone else using gluster storage in oVirt that is not managed within
>> oVirt?
>>
>
> Did you set also the backup-volfile-servers mount option?
>
>
>>
>>
>> -- 
>> Beau Sapach
>> *System Administrator | Information Technology Services | University of
>> Alberta Libraries*
>> *Phone: 780.492.4181 <(780)%20492-4181> | Email: beau.sap...@ualberta.ca*
>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
> ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] extending cloud Images in oVirt
I have a question indirectly related to oVirt. I need to move one old setup into a VM running in an oVirt cluster. The VM was based on Debian 8.9, so I took a Debian cloud image from https://cdimage.debian.org/cdimage/openstack/8.9.8-20171105/, uploaded it into my cluster and attached it to a VM. All looks good, but ... the disk shows only 2G, and I need more disk space. I tried to edit the disk and add more space - and it didn't work. Any ideas how to extend those cloud images?

Regards,
Artem

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
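Growing such an image is usually a two-step job: first grow the image file itself, then grow the partition and filesystem inside the guest (cloud images ship cloud-init with growpart, which often handles the second step automatically on the next boot). The sketch below uses `truncate` on a raw stand-in file so it is self-contained; for a qcow2 cloud image the equivalent command is `qemu-img resize image.qcow2 10G`, and in oVirt the supported route is extending the disk from the UI or REST API rather than touching the file directly:

```shell
# Self-contained stand-in: a raw image file grown with truncate.
# For a qcow2 cloud image the real command is:  qemu-img resize image.qcow2 10G
img=/tmp/disk.raw
truncate -s 2G  "$img"   # the original 2G cloud image
truncate -s 10G "$img"   # grow the virtual disk to 10G
stat -c %s "$img"        # prints 10737418240

# Inside the guest afterwards (not run here), grow partition + filesystem:
#   growpart /dev/vda 1
#   resize2fs /dev/vda1      # or xfs_growfs for XFS
```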
Re: [ovirt-users] Non-responsive host, VM's are still running - how to resolve?
', 'memUsage': '49', 'guestFQDN': '', 'memoryStats': {u'swap_out': '0', u'majflt': '0', u'swap_usage': '0', u'mem_cached': '549844', u'mem_free': '1054040', u'mem_buffers': '2080', u'swap_in': '0', u'swap_total': '4064252', u'pageflt': '148', u'mem_total': '1815524', u'mem_unused': '502116'}, 'session': 'Unknown', 'netIfaces': [], 'guestCPUCount': -1, 'appsList': (), 'guestIPs': '', 'disksUsage': []}} Nov 14 21:01:34 ovirt2.telia.ru vdsm[54971]: vdsm vds WARN Not ready yet, ignoring event u'|virt|VM_status|ca2815c5-f815-469d-869d-a8fe1cb8c2e7' args={u'ca2815c5-f815-469d-869d-a8fe1cb8c2e7': {'status': 'Up', 'username': 'Unknown', 'memUsage': '14', 'guestFQDN': '', 'memoryStats': {u'swap_out': '0', u'majflt': '0', u'swap_usage': '0', u'mem_cached': '497136', u'mem_free': '1801440', u'mem_buffers': '102108', u'swap_in': '0', u'swap_total': '1046524', u'pageflt': '64', u'mem_total': '2046116', u'mem_unused': '1202196'}, 'session': 'Unknown', 'netIfaces': [], 'guestCPUCount': -1, 'appsList': (), 'guestIPs': '', 'disksUsage': []}} On Tue, Nov 14, 2017 at 8:49 PM, Darrell Budic <bu...@onholyground.com> wrote: > Try restarting vdsmd from the shell, “systemctl restart vdsmd”. > > > ------ > *From:* Artem Tambovskiy <artem.tambovs...@gmail.com> > *Subject:* [ovirt-users] Non-responsive host, VM's are still running - > how to resolve? > *Date:* November 14, 2017 at 11:23:32 AM CST > *To:* users > > Apparently, i lost the host which was running hosted-engine and another 4 > VM's exactly during migration of second host from bare-metal to second host > in the cluster. For some reason first host entered the "Non reponsive" > state. The interesting thing is that hosted-engine and all other VM's up > and running, so its like a communication problem between hosted-engine and > host. 
> > The engine.log at hosted-engine is full of following messages: > > 2017-11-14 17:06:43,158Z INFO > [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] > (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106 > 2017-11-14 17:06:43,159Z ERROR > [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] > (DefaultQuartzScheduler9) [50938c3] Command 'GetAllVmStatsVDSCommand(HostName > = ovirt2.telia.ru, VdsIdVDSCommandParametersBase:{runAsync='true', > hostId='3970247c-69eb-4bd8-b263-9100703a8243'})' execution failed: > java.net.NoRouteToHostException: No route to host > 2017-11-14 17:06:43,159Z INFO [org.ovirt.engine.core. > vdsbroker.monitoring.PollVmStatsRefresher] (DefaultQuartzScheduler9) > [50938c3] Failed to fetch vms info for host 'ovirt2.telia.ru' - skipping > VMs monitoring. > 2017-11-14 17:06:45,929Z INFO > [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] > (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106 > 2017-11-14 17:06:45,930Z ERROR > [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] > (DefaultQuartzScheduler2) [6080f1cc] Command > 'GetCapabilitiesVDSCommand(HostName > = ovirt2.telia.ru, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', > hostId='3970247c-69eb-4bd8-b263-9100703a8243', vds='Host[ovirt2.telia.ru, > 3970247c-69eb-4bd8-b263-9100703a8243]'})' execution failed: > java.net.NoRouteToHostException: > No route to host > 2017-11-14 17:06:45,930Z ERROR [org.ovirt.engine.core. 
> vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler2) [6080f1cc] > Failure to refresh host 'ovirt2.telia.ru' runtime info: > java.net.NoRouteToHostException: > No route to host > 2017-11-14 17:06:48,933Z INFO > [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] > (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106 > 2017-11-14 17:06:48,934Z ERROR > [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] > (DefaultQuartzScheduler6) [1a64dfea] Command > 'GetCapabilitiesVDSCommand(HostName > = ovirt2.telia.ru, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', > hostId='3970247c-69eb-4bd8-b263-9100703a8243', vds='Host[ovirt2.telia.ru, > 3970247c-69eb-4bd8-b263-9100703a8243]'})' execution failed: > java.net.NoRouteToHostException: > No route to host > 2017-11-14 17:06:48,934Z ERROR [org.ovirt.engine.core. > vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler6) [1a64dfea] > Failure to refresh host 'ovirt2.telia.ru' runtime info: > java.net.NoRouteToHostException: > No route to host > 2017-11-14 17:06:50,931Z INFO > [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] > (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106 > 2017-11-14 17:06:50,932Z ERROR > [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] > (DefaultQuartzScheduler4) [6b19d168] Command 'SpmStatusVDSCommand(HostName > = ovirt2.telia.ru, SpmStatusVDSCom
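NoRouteToHostException is a plain network-level failure between the engine and VDSM on the host, not an oVirt-level one, so the first checks are routing and firewall rules. A small sketch that tallies which VDS commands are failing, run here against a trimmed sample modeled on the engine.log excerpt above (live use would read /var/log/ovirt-engine/engine.log; the nc/ping lines in the comment are the manual follow-up, with 54321 being the usual vdsm port):

```shell
# Sample lines trimmed from the engine.log excerpt in this thread.
cat > /tmp/engine-sample.log <<'EOF'
2017-11-14 17:06:43,159Z ERROR [...GetAllVmStatsVDSCommand] execution failed: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:45,930Z ERROR [...GetCapabilitiesVDSCommand] execution failed: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:48,934Z ERROR [...GetCapabilitiesVDSCommand] execution failed: java.net.NoRouteToHostException: No route to host
EOF

# Tally which VDS commands are failing with NoRouteToHostException.
summary=$(grep 'NoRouteToHostException' /tmp/engine-sample.log \
          | grep -o '[A-Za-z]*VDSCommand' | sort | uniq -c | sort -rn)
echo "$summary"
# If every command fails this way, check plain reachability first, e.g.:
#   ping -c3 ovirt2.telia.ru
#   nc -zv ovirt2.telia.ru 54321
```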
[ovirt-users] Non-responsive host, VM's are still running - how to resolve?
Apparently, I lost the host which was running the hosted engine and another 4 VMs, exactly during the migration of the second host from bare metal into the cluster. For some reason the first host entered the "Non responsive" state. The interesting thing is that the hosted engine and all other VMs are up and running, so it looks like a communication problem between the hosted engine and the host.

The engine.log on the hosted engine is full of the following messages:

2017-11-14 17:06:43,158Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106
2017-11-14 17:06:43,159Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler9) [50938c3] Command 'GetAllVmStatsVDSCommand(HostName = ovirt2.telia.ru, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='3970247c-69eb-4bd8-b263-9100703a8243'})' execution failed: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:43,159Z INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (DefaultQuartzScheduler9) [50938c3] Failed to fetch vms info for host 'ovirt2.telia.ru' - skipping VMs monitoring.
2017-11-14 17:06:45,929Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106
2017-11-14 17:06:45,930Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler2) [6080f1cc] Command 'GetCapabilitiesVDSCommand(HostName = ovirt2.telia.ru, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='3970247c-69eb-4bd8-b263-9100703a8243', vds='Host[ovirt2.telia.ru,3970247c-69eb-4bd8-b263-9100703a8243]'})' execution failed: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:45,930Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler2) [6080f1cc] Failure to refresh host 'ovirt2.telia.ru' runtime info: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:48,933Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106
2017-11-14 17:06:48,934Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler6) [1a64dfea] Command 'GetCapabilitiesVDSCommand(HostName = ovirt2.telia.ru, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='3970247c-69eb-4bd8-b263-9100703a8243', vds='Host[ovirt2.telia.ru,3970247c-69eb-4bd8-b263-9100703a8243]'})' execution failed: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:48,934Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler6) [1a64dfea] Failure to refresh host 'ovirt2.telia.ru' runtime info: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:50,931Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106
2017-11-14 17:06:50,932Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler4) [6b19d168] Command 'SpmStatusVDSCommand(HostName = ovirt2.telia.ru, SpmStatusVDSCommandParameters:{runAsync='true', hostId='3970247c-69eb-4bd8-b263-9100703a8243', storagePoolId='5a044257-02ec-0382-0243-01f2'})' execution failed: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:50,939Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106
2017-11-14 17:06:50,940Z ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler4) [6b19d168] IrsBroker::Failed::GetStoragePoolInfoVDS
2017-11-14 17:06:50,940Z ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] (DefaultQuartzScheduler4) [6b19d168] Command 'GetStoragePoolInfoVDSCommand(GetStoragePoolInfoVDSCommandParameters:{runAsync='true', storagePoolId='5a044257-02ec-0382-0243-01f2', ignoreFailoverLimit='true'})' execution failed: IRSProtocolException:
2017-11-14 17:06:51,937Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106
2017-11-14 17:06:51,938Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler7) [7f23a3bd] Command 'GetCapabilitiesVDSCommand(HostName = ovirt2.telia.ru, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='3970247c-69eb-4bd8-b263-9100703a8243', vds='Host[ovirt2.telia.ru,3970247c-69eb-4bd8-b263-9100703a8243]'})' execution failed: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:51,938Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler7) [7f23a3bd] Failure to refresh host 'ovirt2.telia.ru' runtime info: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:54,941Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt2/80.239.162.106
2017-11-14
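For what it's worth, a quick way to see how often (and for which hosts) the engine is hitting this error is to grep the log on the engine VM. A minimal sketch; the sample file below is a hypothetical stand-in for the real log (typically /var/log/ovirt-engine/engine.log), so the commands can be tried anywhere:

```shell
# Hypothetical sample of the engine.log excerpt above; on a real engine VM
# point the grep at /var/log/ovirt-engine/engine.log instead.
cat > /tmp/engine-sample.log <<'EOF'
2017-11-14 17:06:45,930Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler2) [6080f1cc] Failure to refresh host 'ovirt2.telia.ru' runtime info: java.net.NoRouteToHostException: No route to host
2017-11-14 17:06:48,934Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler6) [1a64dfea] Failure to refresh host 'ovirt2.telia.ru' runtime info: java.net.NoRouteToHostException: No route to host
EOF

# Count "No route to host" failures per host name, to see which host(s)
# the engine cannot reach.
grep 'NoRouteToHostException' /tmp/engine-sample.log \
  | grep -o "host '[^']*'" | sort | uniq -c

# Since this is java.net.NoRouteToHostException (a network error, not a
# vdsm error), also check basic reachability from the engine VM, e.g.:
#   ping -c3 ovirt2.telia.ru
#   nc -z ovirt2.telia.ru 54321   # vdsm's listening port
```

For the sample above this prints a count of 2 for ovirt2.telia.ru; on a real log it shows every unreachable host at a glance.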
Re: [ovirt-users] Host Power Management Configuration questions
Hi, in the engine.log the following appears:

2017-11-14 12:04:33,081+03 ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (default task-184) [32fe1ce0-2e25-4e2e-a6bf-59f39a65b2f1] Can not run fence action on host 'ovirt.prod.env', no suitable proxy host was found.
2017-11-14 12:04:36,534+03 INFO [org.ovirt.engine.core.bll.hostdeploy.UpdateVdsCommand] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Running command: UpdateVdsCommand internal: false. Entities affected : ID: a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d Type: VDSAction group EDIT_HOST_CONFIGURATION with role type ADMIN
2017-11-14 12:04:36,704+03 ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Can not run fence action on host 'ovirt.prod.env', no suitable proxy host was found.
2017-11-14 12:04:36,705+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Failed to get vds 'a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d', error: null
2017-11-14 12:04:36,705+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Failed to get vds 'a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d', error: null
2017-11-14 12:04:36,705+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Failed to get vds 'a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d', error null
2017-11-14 12:04:36,705+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Failed to get vds 'a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d', error null
2017-11-14 12:04:36,720+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] EVENT_ID: VDS_ALERT_PM_HEALTH_CHECK_START_MIGHT_FAIL(9,010), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Health check on Host indicates that future attempts to Start this host using Power-Management are expected to fail.
2017-11-14 12:04:36,720+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Failed to get vds 'a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d', error: null
2017-11-14 12:04:36,720+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Failed to get vds 'a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d', error: null
2017-11-14 12:04:36,720+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Failed to get vds 'a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d', error null
2017-11-14 12:04:36,720+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] Failed to get vds 'a9bb1c6f-b9c9-4dc3-a24e-b83b2004552d', error null
2017-11-14 12:04:36,731+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] EVENT_ID: VDS_ALERT_PM_HEALTH_CHECK_STOP_MIGHT_FAIL(9,011), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Health check on Host indicates that future attempts to Stop this host using Power-Management are expected to fail.
2017-11-14 12:04:36,765+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] EVENT_ID: KDUMP_DETECTION_NOT_CONFIGURED_ON_VDS(617), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Kdump integration is enabled for host ovirt.prod.env, but kdump is not configured properly on host.
2017-11-14 12:04:36,781+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-186) [d83ce46d-ce89-4804-aba1-761103e93e8c] EVENT_ID: USER_UPDATE_VDS(43), Correlation ID: d83ce46d-ce89-4804-aba1-761103e93e8c, Call Stack: null, Custom Event ID: -1, Message: Host ovirt.prod.env configuration was updated by arta00@internal-authz.

Just let me know if more logs are needed.

Regards,
Artem

On Tue, Nov 14, 2017 at 11:52 AM, Martin Perina <mper...@redhat.com> wrote:
> Hi,
>
> could you please provide engine logs so we can investigate?
>
> Thanks
>
> Martin
>
> On Tue, Nov 14, 2017 at 9:33 AM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>
>> Trying to configure power management for a certain host and fence agent
>> always fail when I'm pressing Test button.
>>
>> At the same time from command line on the same host all looks good:
>>
>> [root@ovirt ~]# fence_ipmilan -a 172.16.22.1 -l user -p pwd -o status -v -P
>> Executing: /usr/bin/ipmitool -I lanplus -H
[ovirt-users] Host Power Management Configuration questions
Trying to configure power management for a certain host, and the fence agent always fails when I press the Test button. At the same time, from the command line on the same host everything looks good:

[root@ovirt ~]# fence_ipmilan -a 172.16.22.1 -l user -p pwd -o status -v -P
Executing: /usr/bin/ipmitool -I lanplus -H 172.16.22.1 -p 623 -U user -P pwd -L ADMINISTRATOR chassis power status
0 Chassis Power is on
Status: ON
[root@ovirt ~]#

What could be the reason?

Regards,
Artem
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
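One likely explanation (consistent with the "no suitable proxy host was found" errors in the engine.log posted in the reply above): the engine never runs the fence agent on the host being fenced, it picks another host in the data center as a fencing proxy. So the test can pass locally yet fail from the engine if no other host can reach the IPMI interface. A hedged sketch that simply prints the commands to try from each other host; the host names are hypothetical, the agent options are the ones from the example above:

```shell
# The engine fences via a proxy host, so the same fence_ipmilan call must
# succeed from the *other* hosts in the cluster, not only from the host
# being managed. Printed as a dry run -- run each emitted line by hand
# (or pipe to sh) to actually test from each candidate proxy.
for proxy in host2.example.com host3.example.com; do
  echo "ssh $proxy fence_ipmilan -a 172.16.22.1 -l user -p pwd -o status"
done
```

If the command fails from the other hosts, check routing/firewalling of UDP port 623 (IPMI LAN) from those hosts to the BMC.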
[ovirt-users] Hosted-Engine environment, strange messages in event log
Any suggestions on what could be the reason for these strange messages (repeating every hour) in the web GUI event log?

Nov 11, 2017 7:07:01 PM Status of host ovirt2.prod.env was set to Up.
Nov 11, 2017 7:06:54 PM Failed to update OVF disks 94b7554b-4c18-4296-b795-98ca6c0fb251, 002af29c-58df-493d-a45c-5009d4dfc1de, OVF data isn't updated on those OVF stores (Data Center oVirtDC, Storage Domain oVirtMigration).
Nov 11, 2017 7:06:53 PM Failed to update OVF disks c7bb37de-739b-4899-8d23-d9197f81b596, OVF data isn't updated on those OVF stores (Data Center oVirtDC, Storage Domain oVirtStorageData).
Nov 11, 2017 7:06:53 PM Host ovirt2.prod.env is not responding. Host cannot be fenced automatically because power management for the host is disabled.
Nov 11, 2017 6:06:57 PM Status of host ovirt2.prod.env was set to Up.
Nov 11, 2017 6:06:51 PM Failed to update OVF disks 94b7554b-4c18-4296-b795-98ca6c0fb251, 002af29c-58df-493d-a45c-5009d4dfc1de, OVF data isn't updated on those OVF stores (Data Center oVirtDC, Storage Domain oVirtMigration).
Nov 11, 2017 6:06:48 PM Failed to update OVF disks c7bb37de-739b-4899-8d23-d9197f81b596, OVF data isn't updated on those OVF stores (Data Center oVirtDC, Storage Domain oVirtStorageData).
Nov 11, 2017 6:06:48 PM Host ovirt2.prod.env is not responding. Host cannot be fenced automatically because power management for the host is disabled.

Power management is not configured yet; I'm planning to test it in the coming week. So far I have one host serving 5 VMs + the hosted engine, and I was planning to add one more host to the cluster this week. All storage domains run on a GlusterFS cluster.

Regards,
Artem
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Hosted Engine installation + GlusterFS cluster
One more thing: firewall rules. For 3 gluster bricks I have configured the following:

firewall-cmd --zone=public --add-port=24007-24009/tcp --add-port=49152-49664/tcp --permanent

but this seems not to be enough - I have to stop the firewall in order to make the cluster work. I have noticed ports in the 490xx range being used by gluster; any idea whether that range is documented anywhere?

lsof -i | grep gluster | grep "490"
glusterfs 32301 root 10u IPv4 148985 0t0 TCP ovirt1:49159->ovirt1:49099 (ESTABLISHED)
glusterfs 32301 root 17u IPv4 153084 0t0 TCP ovirt1:49159->ovirt2:49096 (ESTABLISHED)
glusterfs 46346 root 17u IPv4 156437 0t0 TCP ovirt1:49161->ovirt1:49093 (ESTABLISHED)
glusterfs 46346 root 18u IPv4 149985 0t0 TCP ovirt1:49161->ovirt2:49090 (ESTABLISHED)
glusterfs 46380 root 8u IPv4 151389 0t0 TCP ovirt1:49090->ovirt3:49161 (ESTABLISHED)
glusterfs 46380 root 11u IPv4 148986 0t0 TCP ovirt1:49091->ovirt2:49161 (ESTABLISHED)
glusterfs 46380 root 21u IPv4 153074 0t0 TCP ovirt1:49099->ovirt1:49159 (ESTABLISHED)
glusterfs 46380 root 25u IPv4 153075 0t0 TCP ovirt1:49097->ovirt2:49160 (ESTABLISHED)
glusterfs 46380 root 26u IPv4 153076 0t0 TCP ovirt1:49095->ovirt3:49159 (ESTABLISHED)
glusterfs 46380 root 27u IPv4 153077 0t0 TCP ovirt1:49093->ovirt1:49161 (ESTABLISHED)

Regards,
Artem

On Thu, Nov 9, 2017 at 3:56 PM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
> Hi,
>
> Just realized that I probably went in the wrong way. Reinstalled
> everything from the scratch added 4 volumes (hosted_engine, data, export,
> iso). All looks good so far.
> But if go to the Cluster properties and tick the checkbox "Enable Cluster
> Service" - the host will be marked as Non-Operational. Am I messing up the
> things?
> Or I'm just fine as long as I already have a Data (Master) Storage Domain
> over GlusterFS?
>
> Regards,
> Artem
>
> On Thu, Nov 9, 2017 at 2:46 PM, Fred Rolland <froll...@redhat.com> wrote:
>
>> Hi,
>>
>> The steps for this kind of setup are described in [1].
>> However it seems you have already succeeded in installing, so maybe you
>> need some additional steps [2]
>> Did you add a storage domain that will act as Master Domain? It is
>> needed, then the initial Storage Domain should be imported automatically.
>>
>> [1] https://www.ovirt.org/blog/2017/04/up-and-running-with-ovirt-4.1-and-gluster-storage/
>> [2] https://www.ovirt.org/documentation/gluster-hyperconverged/chap-Additional_Steps/
>>
>> On Thu, Nov 9, 2017 at 10:50 AM, Artem Tambovskiy <artem.tambovs...@gmail.com> wrote:
>>
>>> Another yet attempt to get a help on hosted-engine deployment with
>>> glusterfs cluster.
>>> I already spend a day trying to get bring such a setup to work with no
>>> luck.
>>>
>>> The hosted engine being successfully deployed but I can't activate the
>>> host, the storage domain for the host is missing and I can't even add it.
>>> So either something went wrong during deployment or my glusterfs cluster
>>> doesn't configured properly.
>>>
>>> That are the prerequisites for this?
>>>
>>> - glusterfs cluster of 3 nodes with replica 3 volume
>>> - Any specific volume configs?
>>> - how many volumes should I prepare for hosted engine deployment?
>>>
>>> Any other thoughts?
>>>
>>> Regards,
>>> Artem
>>>
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>
>
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
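On the port question above: the 490xx ports in the lsof output appear to be the client-side (source) ports of established brick connections, not listening ports, so with a stateful firewall only the listening side needs opening: 24007-24009/tcp for glusterd management and one port per brick from 49152/tcp upward on current Gluster releases. A hedged sketch of the rules (the 100-port brick range is an assumption for headroom), echoed as a dry run so nothing is applied by accident:

```shell
# Dry-run sketch of the firewall-cmd calls for a Gluster setup.
# Assumed port ranges (per Gluster defaults): 24007-24009/tcp for glusterd
# management, and one port per brick starting at 49152/tcp.
# Remove the leading echo to apply on a real host.
for port in 24007-24009/tcp 49152-49251/tcp; do
  echo firewall-cmd --zone=public --permanent --add-port="$port"
done
echo firewall-cmd --reload
```

Note that --permanent rules only take effect after a reload, which may explain rules that "don't work" right after being added.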
[ovirt-users] Hosted Engine installation + GlusterFS cluster
Yet another attempt to get help on hosted-engine deployment with a GlusterFS cluster. I have already spent a day trying to bring such a setup to work, with no luck.

The hosted engine gets deployed successfully, but I can't activate the host: the storage domain for the host is missing and I can't even add it. So either something went wrong during deployment or my GlusterFS cluster isn't configured properly.

What are the prerequisites for this?

- a GlusterFS cluster of 3 nodes with a replica 3 volume?
- Any specific volume configs?
- How many volumes should I prepare for the hosted-engine deployment?

Any other thoughts?

Regards,
Artem
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
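On the "any specific volume configs?" point: the options usually recommended for an oVirt hosted-engine volume are the 'virt' option group (the virtualization-tuned option set shipped with Gluster) plus vdsm-friendly ownership. A hedged sketch, assuming a replica 3 volume named 'engine' (the volume name is hypothetical); printed as a dry run:

```shell
# Dry-run sketch of commonly recommended volume options for a hosted-engine
# Gluster volume. 'group virt' applies Gluster's virt-tuned option group;
# uid/gid 36 is the vdsm:kvm user oVirt accesses storage as.
# Remove the echo to apply on a real gluster node.
vol=engine
for opt in "group virt" "storage.owner-uid 36" "storage.owner-gid 36"; do
  echo gluster volume set "$vol" $opt
done
```

One volume is enough for the hosted-engine storage domain itself; keep regular VM data on a separate volume so engine I/O is isolated.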
Re: [ovirt-users] Advise needed: building cheap HA oVirt cluster with just 2 physical servers
Hi,

Can anyone share their experience on deploying hosted-engine with a GlusterFS cluster? I managed to set up a GlusterFS cluster and started to deploy the hosted engine. At the first stage I was beaten by firewall rules - the deployment process was interrupted at the GlusterFS config stage. After fixing the rules I got the engine up and running, but the host is still in a non-operational state. Logged in to the Web UI and see 2 action items: "Gluster command failed on server" and "Gluster status is disconnected for this server". This is a bit strange, since the cluster was properly detected during deployment and the deployment script was supposed to configure the cluster.

--== STORAGE CONFIGURATION ==--

Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: glusterfs
[ INFO ] Please note that Replica 3 support is required for the shared storage.
Please specify the full shared storage connection path to use (example: host:/path): .x.xx:/oVirt
[ INFO ] GlusterFS replica 3 Volume detected
Do you want to configure this host and its cluster for gluster? (Yes, No) [No]: Yes
[ INFO ] GlusterFS replica 3 Volume detected

Any ideas how to fix this? Thanks in advance!

Regards,
Artem

On Fri, Nov 3, 2017 at 2:28 PM, Martin Sivak <msi...@redhat.com> wrote:
> Hi,
>
> cockpit is enabled by default when you use ovirt-node. You will
> probably have to install the necessary cockpit packages yourself on
> pure CentOS - you will need cockpit and ovirt + gdeploy cockpit
> plugins (sadly I do not recall the exact package names).
>
> With regards to arbiter and the wizard.. I really do not know, but I
> will alert my colleagues who might have more detailed knowledge of the
> gluster part.
>
> Denis, Sahina: can you please help me here?
>
> Best regards
>
> Martin Sivak
>
> On Fri, Nov 3, 2017 at 11:29 AM, Artem Tambovskiy
> <artem.tambovs...@gmail.com> wrote:
> > Thanks for an article, Martin!
> > Any chance to configure a third cost to act as GlusterFS Arbitr only > using > > this wizard? > > > > And stupid question - how to make this wizard up and running? I've > > everything installed and nothing is runnin on port 9090 :) > > > > Regards, > > Artem > > > > On Fri, Nov 3, 2017 at 12:49 PM, Martin Sivak <msi...@redhat.com> wrote: > >> > >> Hi, > >> > >> you should take a look at the hyper converged way of installing oVirt. > >> We have a cockpit wizard that does almost everything for you: > >> > >> > >> https://www.ovirt.org/documentation/gluster- > hyperconverged/chap-Deploying_Hyperconverged/ > >> > >> It uses three hosts and collocates the VMs together with Gluster > storage. > >> > >> Best regards > >> > >> -- > >> Martin Sivak > >> SLA /oVirt > >> > >> On Fri, Nov 3, 2017 at 8:39 AM, Artem Tambovskiy > >> <artem.tambovs...@gmail.com> wrote: > >> > Thanks Eduardo! > >> > > >> > I think I can find a third server to build a glusterFS storage. So the > >> > first > >> > step will be to install a self-hosted engine on the new server and > start > >> > building a glusterFS storage. IS there any easy way to migrate > existing > >> > 5 > >> > VM's running on the second bare-metal oVirt host, right? I found a > >> > little > >> > bit tricky moving oVirt backups between the hosts (at least I failed > to > >> > replicate the existing VM's on the second server). > >> > > >> > Regards, > >> > Artem > >> > > >> > > >> > > >> > On Fri, Nov 3, 2017 at 10:24 AM, Eduardo Mayoral <emayo...@arsys.es> > >> > wrote: > >> >> > >> >> For HA you will need some kind of storage available to all the > compute > >> >> nodes in the cluster. 
If you have no external storage and few nodes, > I > >> >> think > >> >> your best option for storage is gluster , and the minimum number of > >> >> nodes > >> >> you will need for HA is 3 (the third gluster node can be > metadata-only, > >> >> but > >> >> you still need that third node to give you quorum, avoid split-brains > >> >> and > >> >> have something that you can call "HA" with a straight face. > >> >> > >> >> Eduardo Mayoral Jimeno (emayo...@arsys.es) > >> >> Administrador de sistemas. De
Re: [ovirt-users] Advise needed: building cheap HA oVirt cluster with just 2 physical servers
Thanks for the article, Martin!

Any chance to configure a third host to act as a GlusterFS Arbiter only using this wizard?

And a stupid question - how do I get this wizard up and running? I have everything installed and nothing is running on port 9090 :)

Regards,
Artem

On Fri, Nov 3, 2017 at 12:49 PM, Martin Sivak <msi...@redhat.com> wrote:
> Hi,
>
> you should take a look at the hyper converged way of installing oVirt.
> We have a cockpit wizard that does almost everything for you:
>
> https://www.ovirt.org/documentation/gluster-hyperconverged/chap-Deploying_Hyperconverged/
>
> It uses three hosts and collocates the VMs together with Gluster storage.
>
> Best regards
>
> --
> Martin Sivak
> SLA /oVirt
>
> On Fri, Nov 3, 2017 at 8:39 AM, Artem Tambovskiy
> <artem.tambovs...@gmail.com> wrote:
> > Thanks Eduardo!
> >
> > I think I can find a third server to build a glusterFS storage. So the first
> > step will be to install a self-hosted engine on the new server and start
> > building a glusterFS storage. IS there any easy way to migrate existing 5
> > VM's running on the second bare-metal oVirt host, right? I found a little
> > bit tricky moving oVirt backups between the hosts (at least I failed to
> > replicate the existing VM's on the second server).
> >
> > Regards,
> > Artem
> >
> > On Fri, Nov 3, 2017 at 10:24 AM, Eduardo Mayoral <emayo...@arsys.es> wrote:
> >>
> >> For HA you will need some kind of storage available to all the compute
> >> nodes in the cluster. If you have no external storage and few nodes, I think
> >> your best option for storage is gluster , and the minimum number of nodes
> >> you will need for HA is 3 (the third gluster node can be metadata-only, but
> >> you still need that third node to give you quorum, avoid split-brains and
> >> have something that you can call "HA" with a straight face.
> >>
> >> Eduardo Mayoral Jimeno (emayo...@arsys.es)
> >> Administrador de sistemas. Departamento de Plataformas. Arsys internet.
> >> +34 941 620 145 ext. 5153 > >> > >> On 03/11/17 08:10, Artem Tambovskiy wrote: > >> > >> Looking for a design advise on oVirt provisioning. I'm running a PoC lab > >> on single bare-metal host (suddenly it was setup with just Local Storage > >> domain) and > >> no I'd like to rebuild the setup by making a cluster of 2 physical > >> servers, no external storage array available. That are the options > here? is > >> there any options to build cheap HA cluster with just 2 servers? > >> > >> Thanks in advance! > >> > >> Artem > >> > >> > >> ___ > >> Users mailing list > >> Users@ovirt.org > >> http://lists.ovirt.org/mailman/listinfo/users > >> > >> > > > > > > ___ > > Users mailing list > > Users@ovirt.org > > http://lists.ovirt.org/mailman/listinfo/users > > > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Advise needed: building cheap HA oVirt cluster with just 2 physical servers
Good point! Need to focus on this first Thanks, Artem On Fri, Nov 3, 2017 at 10:50 AM, Karli Sjöberg <ka...@inparadise.se> wrote: > On fre, 2017-11-03 at 10:39 +0300, Artem Tambovskiy wrote: > > Thanks Eduardo! > > > > I think I can find a third server to build a glusterFS storage. So > > the first step will be to install a self-hosted engine on the new > > server and start building a glusterFS storage. > > You´ll need to build the Gluster storage first, as you´ll want to > install the Hosted Engine _in_ the HA Gluster storage, right? > > /K > > > IS there any easy way to migrate existing 5 VM's running on the > > second bare-metal oVirt host, right? I found a little bit tricky > > moving oVirt backups between the hosts (at least I failed to > > replicate the existing VM's on the second server). > > > > Regards, > > Artem > > > > > > > > On Fri, Nov 3, 2017 at 10:24 AM, Eduardo Mayoral <emayo...@arsys.es> > > wrote: > > > For HA you will need some kind of storage available to all the > > > compute nodes in the cluster. If you have no external storage and > > > few nodes, I think your best option for storage is gluster , and > > > the minimum number of nodes you will need for HA is 3 (the third > > > gluster node can be metadata-only, but you still need that third > > > node to give you quorum, avoid split-brains and have something that > > > you can call "HA" with a straight face. > > > Eduardo Mayoral Jimeno (emayo...@arsys.es) > > > Administrador de sistemas. Departamento de Plataformas. Arsys > > > internet. > > > +34 941 620 145 ext. 5153 > > > On 03/11/17 08:10, Artem Tambovskiy wrote: > > > > Looking for a design advise on oVirt provisioning. I'm running a > > > > PoC lab on single bare-metal host (suddenly it was setup with > > > > just Local Storage domain) and > > > > no I'd like to rebuild the setup by making a cluster of 2 > > > > physical servers, no external storage array available. That are > > > > the options here? 
is there any options to build cheap HA cluster > > > > with just 2 servers? > > > > > > > > Thanks in advance! > > > > > > > > Artem > > > > > > > > > > > > ___ > > > > Users mailing list > > > > Users@ovirt.org > > > > http://lists.ovirt.org/mailman/listinfo/users > > > > > ___ > > Users mailing list > > Users@ovirt.org > > http://lists.ovirt.org/mailman/listinfo/users > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Advise needed: building cheap HA oVirt cluster with just 2 physical servers
Thanks Eduardo!

I think I can find a third server to build a GlusterFS storage. So the first step will be to install a self-hosted engine on the new server and start building the GlusterFS storage. Is there any easy way to migrate the existing 5 VMs running on the second bare-metal oVirt host? I found it a little bit tricky to move oVirt backups between the hosts (at least I failed to replicate the existing VMs on the second server).

Regards,
Artem

On Fri, Nov 3, 2017 at 10:24 AM, Eduardo Mayoral <emayo...@arsys.es> wrote:
> For HA you will need some kind of storage available to all the compute
> nodes in the cluster. If you have no external storage and few nodes, I
> think your best option for storage is gluster, and the minimum number of
> nodes you will need for HA is 3 (the third gluster node can be
> metadata-only, but you still need that third node to give you quorum, avoid
> split-brains and have something that you can call "HA" with a straight face.
>
> Eduardo Mayoral Jimeno (emayo...@arsys.es)
> Administrador de sistemas. Departamento de Plataformas. Arsys internet.
> +34 941 620 145 ext. 5153
>
> On 03/11/17 08:10, Artem Tambovskiy wrote:
>
> Looking for a design advise on oVirt provisioning. I'm running a PoC lab
> on single bare-metal host (suddenly it was setup with just Local Storage
> domain) and
> no I'd like to rebuild the setup by making a cluster of 2 physical
> servers, no external storage array available. That are the options here? is
> there any options to build cheap HA cluster with just 2 servers?
>
> Thanks in advance!
>
> Artem
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] Advise needed: building cheap HA oVirt cluster with just 2 physical servers
Looking for design advice on oVirt provisioning. I'm running a PoC lab on a single bare-metal host (unfortunately it was set up with just a local storage domain), and now I'd like to rebuild the setup as a cluster of 2 physical servers; no external storage array is available. What are the options here? Are there any options to build a cheap HA cluster with just 2 servers?

Thanks in advance!

Artem
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users