Re: [ovirt-users] ovirt and gateway behavior
Hi Alex, Please provide Engine logs from when this is occurring and mention the date/time we should focus on. Thanks, Edy. On Mon, Feb 5, 2018 at 2:19 PM, Alex K wrote: > Hi all, > > I have a 3-node oVirt 4.1 cluster, self-hosted on top of glusterfs. The > cluster is used to host several VMs. > I have observed that when the gateway is lost (say the gateway device is down) > the ovirt cluster goes down. > > It seems rather extreme behavior, especially when one does not care whether the > hosted VMs have connectivity to the Internet or not. > > Can this behavior be disabled? > > Thanx, > Alex > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
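For reference, a minimal sketch of how those logs could be pulled, assuming a standard installation layout; the timestamp is a placeholder to be replaced with the actual incident time:

    # On the engine machine, grab the window around the incident:
    grep '2018-02-05 14:1' /var/log/ovirt-engine/engine.log

    # On each host of a self-hosted setup, the HA agent logs its gateway
    # liveness checks, which is likely what reacts to the gateway going down:
    grep -i gateway /var/log/ovirt-hosted-engine-ha/agent.log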
Re: [ovirt-users] NetworkManager with oVirt version 4.2.0
On Sun, Feb 4, 2018 at 10:01 PM, Vincent Royer wrote: > I had these types of issues as well my first time around, and after a > failed engine install I haven't been able to get things cleaned up, so I > will have to start over. I created a bonded interface on the host before > the engine setup, but once I created my first VM and assigned bond0 to it, > the engine became inaccessible the moment the VM got an IP from the > router. > Please clarify what it means to "assign bond0 to it". A vnic can be defined on a network (using vnic profiles). If your Engine is inaccessible, try to understand what changed in the network; perhaps something collided (duplicate IPs, routes, MACs, etc.). > What is the preferred way to set up bonded interfaces? In Cockpit or nmcli > before hosted engine setup? Or proceed with only one interface then add > the other in engine? > All should work. > > Is it possible, for example, to set up bonded interfaces with a static > management IP on vlan 50 to access the engine, and let the other VMs grab > DHCP IPs on vlan 10? > Sure it is, one is the management (vlan 50) network and the other a VM network (vlan 10). > > On Feb 3, 2018 11:31 PM, "Edward Haas" wrote: > > On Sat, Feb 3, 2018 at 9:06 AM, maoz zadok wrote: > >> Hello All, >> I'm new to oVirt, I'm trying with no success to set up the networking on >> an oVirt 4.2.0 node, and I think I'm missing something. >> >> background: >> interfaces em1-4 are bonded to bond0 >> VLAN configured on bond0.1 >> and bridged to ovirtmgmt for the management interface. >> >> I'm not sure it's updated to version 4.2.0 but I followed this post: >> https://www.ovirt.org/documentation/how-to/networking/bonding-vlan-bridge/ >> > > It looks like an old howto; we will need to update or remove it. > >> with this setting, NetworkManager keeps starting up on reboot, >> and the interfaces are not managed by oVirt (and the nice traffic graphs >> are not shown). >> > > For the interfaces to be owned by oVirt, you will need to add the host to > Engine. > So I would just configure everything up to the VLAN (slaves, bond, VLAN) > with NetworkManager prior to adding it to Engine. The bridge should be > created when you add the host. > (assuming the VLAN you mentioned is your management interface and its IP > is the one used by Engine) > >> my question: >> Does NetworkManager need to be disabled as in the above post? >> > > No (for 4.1 and 4.2) > > Do I need to manage the networking using NetworkManager (nmtui)? >> > > You'd better use Cockpit or nmcli to configure the node before you add it to > Engine. > >> Thanks! >> Maoz >> ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
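As an illustration of Edward's suggestion, a minimal nmcli sketch for slaves + bond + management VLAN, done before the host is added to Engine; device names, VLAN ID, and addresses are placeholders for your environment:

    nmcli con add type bond ifname bond0 con-name bond0 bond.options "mode=802.3ad,miimon=100"
    nmcli con add type ethernet ifname em1 master bond0
    nmcli con add type ethernet ifname em2 master bond0

    # Management VLAN with a static IP; Engine creates the ovirtmgmt bridge
    # on top of this when the host is added:
    nmcli con add type vlan ifname bond0.50 dev bond0 id 50 \
        ipv4.method manual ipv4.addresses 192.0.2.10/24 ipv4.gateway 192.0.2.1
    nmcli con up bond0.50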
Re: [ovirt-users] Slow conversion from VMware in 4.1
On Mon, Feb 05, 2018 at 10:57:58PM +0100, Luca 'remix_tj' Lorenzetto wrote: > On Fri, Feb 2, 2018 at 12:52 PM, Richard W.M. Jones wrote: > > There is a section about this in the virt-v2v man page. I'm on > > a train at the moment but you should be able to find it. Try to > > run many conversions, at least 4 or 8 would be good places to start. > > Hello Richard, > > I read the man page but found nothing explicit about resource usage. Anyway, > digging into our setup I found out that vCenter is at 95% CPU usage even > when load is low. > I think our Windows admins should take care of this. http://libguestfs.org/virt-v2v.1.html#vmware-vcenter-resources You should be able to run multiple conversions in parallel to improve throughput. The only long-term solution is to use a different method such as VMX over SSH. vCenter is just fundamentally bad. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
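A rough sketch of what "many conversions" can look like in practice; guest names, the vCenter URI, and the export path are placeholders, and the man page section linked above remains the authoritative tuning guidance:

    # Run several conversions in parallel; each virt-v2v process converts one guest.
    for vm in guest1 guest2 guest3 guest4; do
        virt-v2v -ic 'vpx://administrator@vcenter.example.com/Datacenter/esxi?no_verify=1' \
            --password-file /tmp/vcenter-pass "$vm" \
            -o rhv -os nfs.example.com:/export_domain &
    done
    wait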
Re: [ovirt-users] Slow conversion from VMware in 4.1
On Fri, Feb 2, 2018 at 12:52 PM, Richard W.M. Jones wrote: > There is a section about this in the virt-v2v man page. I'm on > a train at the moment but you should be able to find it. Try to > run many conversions, at least 4 or 8 would be good places to start. Hello Richard, I read the man page but found nothing explicit about resource usage. Anyway, digging into our setup I found out that vCenter is at 95% CPU usage even when load is low. I think our Windows admins should take care of this. Luca -- "E' assurdo impiegare gli uomini di intelligenza eccellente per fare calcoli che potrebbero essere affidati a chiunque se si usassero delle macchine" Gottfried Wilhelm von Leibnitz, Filosofo e Matematico (1646-1716) "Internet è la più grande biblioteca del mondo. Ma il problema è che i libri sono tutti sparsi sul pavimento" John Allen Paulos, Matematico (1945-vivente) Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] guest ip address not shown on the engine panel - version 4.2.0
Do you have the guest agent installed? On Mon, Feb 5, 2018 at 4:18 PM, maoz zadok wrote: > Hi All, > Is it possible that the "IP addresses" of the guest virtual machine will > be shown? It is currently empty. > > [image: Inline image 1] > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
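For reference, a minimal sketch of installing the agent inside a CentOS/RHEL 7 guest; package names vary by distribution (on CentOS 7 the package comes from EPEL):

    yum install -y ovirt-guest-agent-common
    systemctl enable --now ovirt-guest-agent
    # Once the agent reports in, the engine fills in the IP addresses,
    # FQDN, and logged-in user columns for the VM.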
[ovirt-users] guest ip address not shown on the engine panel - version 4.2.0
Hi All, Is it possible that the "IP addresses" of the guest virtual machine will be shown? It is currently empty. [image: Inline image 1] ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Responding while up
Answering my own post... a restart of vdsmd on the affected blade has fixed everything. Thanks to everyone who helped. On 02/05/2018 10:02 AM, Christopher Cox wrote: Forgive the top post. I guess what I need to know now is whether there is a recovery path that doesn't lead to total loss of the VMs that are currently in the "Unknown" "Not responding" state. We are planning a total oVirt shutdown. I just would like to know if we've effectively lost those VMs or not. Again, the VMs are currently "up". And we use a file backup process, so in theory they can be restored, just somewhat painfully, from scratch. But if somebody knows: if we shut down all the bad VMs and the blade, is there some way oVirt can know the VMs are "ok" to start up?? Will changing their state directly to "down" in the db stick if the blade is down? That is, will we get to a known state where the VMs can actually be started and brought back into a known state? Right now, we're feeling there's a good chance we will not be able to recover these VMs, even though they are "up" right now. I really need some way to force oVirt into an integral state, even if it means we take the whole thing down. Possible? On 01/25/2018 06:57 PM, Christopher Cox wrote: On 01/25/2018 04:57 PM, Douglas Landgraf wrote: On Thu, Jan 25, 2018 at 5:12 PM, Christopher Cox wrote: On 01/25/2018 02:25 PM, Douglas Landgraf wrote: On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox wrote: Would restarting vdsm on the node in question help fix this? Again, all the VMs are up on the node. Prior attempts to fix this problem have left the node in a state where I can't issue the "has been rebooted" command to it; it's confused. So... node is up. All VMs are up. Can't issue "has been rebooted" to the node; all VMs show Unknown and not responding but they are up. Changing the status in the ovirt db to 0 works for a second and then it goes immediately back to 8 (which is why I'm wondering if I should restart vdsm on the node). It's not recommended to change the db manually. Oddly enough, we're running all of this in production. So, watching it all go down isn't the best option for us. Any advice is welcome. We would need to see the node/engine logs; have you found any error in the vdsm.log (from nodes) or engine.log? Could you please share the error? In short, the error is our ovirt manager lost network (our problem) and crashed hard (hardware issue on the server). On bring up, we had some network changes (that caused the lost network problem) so our LACP bond was down for a bit while we were trying to bring it up (noting the ovirt manager is up while we're reestablishing the network on the switch side). In other words, that's the "error" so to speak that got us to where we are. Full DEBUG enabled on the logs... The error messages seem obvious to me... starts like this (noting the ISO DOMAIN was coming off an NFS mount off the ovirt management server... yes... we know... we do have plans to move that).
So on the hypervisor node itself, from the vdsm.log (vdsm.log.33.xz): (hopefully no surprise here) Thread-2426633::WARNING::2018-01-23 13:50:56,672::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could not collect metadata file for domain path /rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844 Traceback (most recent call last): File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles sd.DOMAIN_META_DATA)) File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob return self._iop.glob(pattern) File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 536, in glob return self._sendCommand("glob", {"pattern": pattern}, self.timeout) File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 421, in _sendCommand raise Timeout(os.strerror(errno.ETIMEDOUT)) Timeout: Connection timed out Thread-27::ERROR::2018-01-23 13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain e5ecae2f-5a06-4743-9a43-e74d83992c35 not found Traceback (most recent call last): File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain dom = findMethod(sdUUID) File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID)) File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',) Thread-27::ERROR::2018-01-23 13:50:56,673::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain e5ecae2f-5a06-4743-9a43-e74d83992c35 Traceback (most recent call last): File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain self._performDomainSelftest() File
Re: [ovirt-users] oVirt DR: ansible with 4.1, only a subset of storage domain replicated
Hi Luca, Thank you for your interest in the Disaster Recovery ansible solution, it is great to see users get familiar with it. Please see my comments inline. Regards, Maor On Mon, Feb 5, 2018 at 7:54 PM, Yaniv Kaul wrote: > > > On Feb 5, 2018 5:00 PM, "Luca 'remix_tj' Lorenzetto" < > lorenzetto.l...@gmail.com> wrote: > > Hello, > > I'm starting the implementation of our disaster recovery site with RHV > 4.1.latest for our production environment. > > Our production setup is very simple, with self hosted engine on dc > KVMPDCA, and virtual machines both in KVMPDCA and KVMPD dcs. All our > setup has an FC storage backend, which is EMC VPLEX/VMAX in KVMPDCA > and EMC VNX8000. Both storage arrays support replication via their > own replication protocols (SRDF, MirrorView), so we'd like to delegate > to them the replication of data to the remote site, which is located > in another remote datacenter. > > In KVMPD DC we have some storage domains that contain non-critical > VMs, which we don't want to replicate to the remote site (in case of > failure they have a low priority and will be restored from a backup). > In our setup we won't replicate them, so they will not be available for > attachment on the remote site. Can this be an issue? Do we need to > replicate everything? > > No, it is not required to replicate everything. If there are no disks on those storage domains that are attached to your critical VMs/Templates, you don't have to use them as part of your mapping var file > What about the master domain? Do I require that the master storage domain > stays on a replicated volume, or can it be any of the available ones? > > You can choose which storage domains you want to recover. Basically, if a storage domain is indicated as "master" in the mapping var file then it should be attached first to the Data Center. If your secondary setup already contains a master storage domain which you don't care to replicate and recover, then you can configure your mapping var file to only attach regular storage domains; simply indicate "dr_master_domain: False" in the dr_import_storages for all the storage domains. (You can contact me on IRC if you need some guidance with it) > > I've seen that since 4.1 there's an API for updating OVF_STORE disks. > Do we need to invoke it at a frequency that is compatible > with the replication frequency on the storage side? > > No, you don't have to use the update OVF_STORE disk for replication. The OVF_STORE disk is being updated every 60 minutes (the default configuration value). > We set the RPO at the moment > to 1hr (even if the planned RPO requires 2hrs). Does OVF_STORE get > updated with the required frequency? > > The OVF_STORE disk is being updated every 60 minutes, but keep in mind that the OVF_STORE is updated internally in the engine so it might not be synced with the RPO which you configured. If I understood correctly, then you are right in indicating that the data of the storage domain will be synced at approximately 2 hours = RPO of 1hr + OVF_STORE update of 1hr > > I've seen a recent presentation by Maor Lipchuk that is showing the > "automagic" ansible role for disaster recovery: > > https://www.slideshare.net/maorlipchuk/ovirt-dr-site-tosite-using-ansible > > It's also related to some YouTube presentations demonstrating a real > DR plan execution. > > But what I've seen is that Maor is explicitly talking about the 4.2 > release. Does that role work only with >4.2 releases, or can it be used > also on earlier (4.1) versions?
> > > Releases before 4.2 do not store complete information on the OVF store to > perform such comprehensive failover. I warmly suggest 4.2! > Y. > Indeed, we also introduced several functionalities, like detach of the master storage domain and attach of a "dirty" master storage domain, which are dependent on the failover process, so unfortunately to support a full recovery process you will need an oVirt 4.2 env. > > I've tested a manual flow of replication + recovery through Import SD > followed by Import VM and it worked like a charm. Using a prebuilt > ansible role will reduce my effort on creating a new automation for > doing this. > > Anyone has experiences like mine? > > Thank you for the help you may provide, I'd like to contribute back to > you with all my findings and with a usable tool (also integrated with > storage arrays if possible). > > Please feel free to share your comments and questions, I would very much appreciate getting to know your user experience. > > Luca > > (Sorry for duplicate email, ctrl-enter happened before mail completion) > > -- > "E' assurdo impiegare gli uomini di intelligenza eccellente per fare > calcoli che potrebbero essere affidati a chiunque se si usassero delle > macchine" > Gottfried Wilhelm von Leibnitz, Filosofo e Matematico (1646-1716) > > "Internet è la più grande biblioteca del mondo. > Ma il problema è che i libri sono tutti sparsi sul pavimento" > John Allen Paulos, Matematico (1945-vivente) > > Luca 'remix_tj'
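For orientation, a minimal sketch of how this is driven once on 4.2; the file names and the dr_domain_type/dr_primary_name keys below are illustrative guesses at the role's conventions, while dr_import_storages and dr_master_domain come straight from the reply above:

    # Fragment of the mapping var file (illustrative):
    #   dr_import_storages:
    #     - dr_domain_type: fcp               # hypothetical key/value
    #       dr_primary_name: noncritical_sd   # hypothetical key/value
    #       dr_master_domain: False           # per Maor's note above
    # Non-replicated domains are simply left out of the mapping.
    ansible-playbook dr_failover.yml -e "@disaster_recovery_vars.yml" -e "@passwords.yml"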
Re: [ovirt-users] oVirt DR: ansible with 4.1, only a subset of storage domain replicated
On Feb 5, 2018 5:00 PM, "Luca 'remix_tj' Lorenzetto" < lorenzetto.l...@gmail.com> wrote: Hello, I'm starting the implementation of our disaster recovery site with RHV 4.1.latest for our production environment. Our production setup is very simple, with self hosted engine on dc KVMPDCA, and virtual machines both in KVMPDCA and KVMPD dcs. All our setup has an FC storage backend, which is EMC VPLEX/VMAX in KVMPDCA and EMC VNX8000. Both storage arrays support replication via their own replication protocols (SRDF, MirrorView), so we'd like to delegate to them the replication of data to the remote site, which is located in another remote datacenter. In KVMPD DC we have some storage domains that contain non-critical VMs, which we don't want to replicate to the remote site (in case of failure they have a low priority and will be restored from a backup). In our setup we won't replicate them, so they will not be available for attachment on the remote site. Can this be an issue? Do we need to replicate everything? What about the master domain? Do I require that the master storage domain stays on a replicated volume, or can it be any of the available ones? I've seen that since 4.1 there's an API for updating OVF_STORE disks. Do we need to invoke it at a frequency that is compatible with the replication frequency on the storage side? We set the RPO at the moment to 1hr (even if the planned RPO requires 2hrs). Does OVF_STORE get updated with the required frequency? I've seen a recent presentation by Maor Lipchuk that is showing the "automagic" ansible role for disaster recovery: https://www.slideshare.net/maorlipchuk/ovirt-dr-site-tosite-using-ansible It's also related to some YouTube presentations demonstrating a real DR plan execution. But what I've seen is that Maor is explicitly talking about the 4.2 release. Does that role work only with >4.2 releases, or can it be used also on earlier (4.1) versions? Releases before 4.2 do not store complete information on the OVF store to perform such comprehensive failover. I warmly suggest 4.2! Y. I've tested a manual flow of replication + recovery through Import SD followed by Import VM and it worked like a charm. Using a prebuilt ansible role will reduce my effort on creating a new automation for doing this. Anyone has experiences like mine? Thank you for the help you may provide, I'd like to contribute back to you with all my findings and with a usable tool (also integrated with storage arrays if possible). Luca (Sorry for duplicate email, ctrl-enter happened before mail completion) -- "E' assurdo impiegare gli uomini di intelligenza eccellente per fare calcoli che potrebbero essere affidati a chiunque se si usassero delle macchine" Gottfried Wilhelm von Leibnitz, Filosofo e Matematico (1646-1716) "Internet è la più grande biblioteca del mondo. Ma il problema è che i libri sono tutti sparsi sul pavimento" John Allen Paulos, Matematico (1945-vivente) Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , < lorenzetto.l...@gmail.com> ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] hosted-engine 4.2.1-pre setup on a clean node..
On Fri, Feb 2, 2018 at 9:10 PM, Thomas Davis wrote: > Is this supported? > > I have a node, that centos 7.4 minimal is installed on, with an interface > setup for an IP address. > > I've yum installed nothing else except the ovirt-4.2.1-pre rpm, run > screen, and then do the 'hosted-engine --deploy' command. > Fine, nothing else is required. > > It hangs on: > > [ INFO ] changed: [localhost] > [ INFO ] TASK [Get ovirtmgmt route table id] > [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, > "cmd": "ip rule list | grep ovirtmgmt | sed s/[.*]\\ //g | awk '{ > print $9 }'", "delta": "0:00:00.004845", "end": "2018-02-02 > 12:03:30.794860", "rc": 0, "start": "2018-02-02 12:03:30.790015", "stderr": > "", "stderr_lines": [], "stdout": "", "stdout_lines": []} > [ ERROR ] Failed to execute stage 'Closing up': Failed executing > ansible-playbook > [ INFO ] Stage: Clean up > [ INFO ] Cleaning temporary resources > [ INFO ] TASK [Gathering Facts] > [ INFO ] ok: [localhost] > [ INFO ] TASK [Remove local vm dir] > [ INFO ] ok: [localhost] > [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine- > setup/answers/answers-20180202120333.conf' > [ INFO ] Stage: Pre-termination > [ INFO ] Stage: Termination > [ ERROR ] Hosted Engine deployment failed: please check the logs for the > issue, fix accordingly or re-deploy from scratch. > Log file is located at /var/log/ovirt-hosted-engine- > setup/ovirt-hosted-engine-setup-20180202115038-r11nh1.log > > but the VM is up and running, just attached to the 192.168.122.0/24 subnet > > [root@d8-r13-c2-n1 ~]# ssh root@192.168.122.37 > root@192.168.122.37's password: > Last login: Fri Feb 2 11:54:47 2018 from 192.168.122.1 > [root@ovirt ~]# systemctl status ovirt-engine > ● ovirt-engine.service - oVirt Engine > Loaded: loaded (/usr/lib/systemd/system/ovirt-engine.service; enabled; > vendor preset: disabled) > Active: active (running) since Fri 2018-02-02 11:54:42 PST; 11min ago > Main PID: 24724 (ovirt-engine.py) > CGroup: /system.slice/ovirt-engine.service > ├─24724 /usr/bin/python /usr/share/ovirt-engine/ > services/ovirt-engine/ovirt-engine.py --redirect-output --systemd=notify > start > └─24856 ovirt-engine -server -XX:+TieredCompilation -Xms3971M > -Xmx3971M -Djava.awt.headless=true -Dsun.rmi.dgc.client.gcInterval=360 > -Dsun.rmi.dgc.server.gcInterval=360 -Djsse... > > Feb 02 11:54:41 ovirt.crt.nersc.gov systemd[1]: Starting oVirt Engine... > Feb 02 11:54:41 ovirt.crt.nersc.gov ovirt-engine.py[24724]: 2018-02-02 > 11:54:41,767-0800 ovirt-engine: INFO _detectJBossVersion:187 Detecting > JBoss version. Running: /usr/lib/jvm/jre/...60', '- > Feb 02 11:54:42 ovirt.crt.nersc.gov ovirt-engine.py[24724]: 2018-02-02 > 11:54:42,394-0800 ovirt-engine: INFO _detectJBossVersion:207 Return code: > 0, | stdout: '[u'WildFly Full 11.0.0tderr: '[]' > Feb 02 11:54:42 ovirt.crt.nersc.gov systemd[1]: Started oVirt Engine.
> Feb 02 11:55:25 ovirt.crt.nersc.gov python2[25640]: ansible-stat Invoked > with checksum_algorithm=sha1 get_checksum=True follow=False > path=/usr/share/ovirt-engine/playbooks/roles/ovir...ributes=True > Feb 02 11:55:29 ovirt.crt.nersc.gov python2[25698]: ansible-stat Invoked > with checksum_algorithm=sha1 get_checksum=True follow=False > path=/usr/share/ovirt-engine/playbooks/roles/ovir...ributes=True > Feb 02 11:55:30 ovirt.crt.nersc.gov python2[25741]: ansible-stat Invoked > with checksum_algorithm=sha1 get_checksum=True follow=False > path=/usr/share/ovirt-engine/playbooks/roles/ovir...ributes=True > Feb 02 11:55:30 ovirt.crt.nersc.gov python2[25767]: ansible-stat Invoked > with checksum_algorithm=sha1 get_checksum=True follow=False > path=/usr/share/ovirt-engine/playbooks/roles/ovir...ributes=True > Feb 02 11:55:31 ovirt.crt.nersc.gov python2[25795]: ansible-stat Invoked > with checksum_algorithm=sha1 get_checksum=True follow=False > path=/etc/ovirt-engine-metrics/config.yml get_md5...ributes=True > > The 'ip rule list' never has an ovirtmgmt rule/table in it.. which means > the ansible script loops then dies; vdsmd has never configured the network > on the node. > Right. Can you please attach engine.log and host-deploy from the engine VM? > > [root@d8-r13-c2-n1 ~]# systemctl status vdsmd -l > ● vdsmd.service - Virtual Desktop Server Manager >Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor > preset: enabled) >Active: active (running) since Fri 2018-02-02 11:55:11 PST; 14min ago > Main PID: 7654 (vdsmd) >CGroup: /system.slice/vdsmd.service >└─7654 /usr/bin/python2 /usr/share/vdsm/vdsmd > > Feb 02 11:55:11 d8-r13-c2-n1 vdsmd_init_common.sh[7551]: vdsm: Running > dummybr > Feb 02 11:55:11 d8-r13-c2-n1 vdsmd_init_common.sh[7551]: vdsm: Running > tune_system > Feb 02 11:55:11 d8-r13-c2-n1 vdsmd_init_common.sh[7551]: vdsm: Running > test_space > Feb 02 11:55:11 d8-r13-c2-n1 vdsmd_init_common.sh[7551]:
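To see what the failing task is waiting for, a quick diagnostic sketch on the host; the task greps the ip rule output for the routing table that vdsm creates when it sets up the management network:

    ip rule list                                 # should eventually gain an ovirtmgmt table lookup
    ip route show table all | grep -i ovirtmgmt

    # Check whether vdsm ever created the management bridge at all:
    vdsm-client Host getCapabilities | grep -i -A2 ovirtmgmt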
Re: [ovirt-users] ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Responding while up
Forgive the top post. I guess what I need to know now is whether there is a recovery path that doesn't lead to total loss of the VMs that are currently in the "Unknown" "Not responding" state. We are planning a total oVirt shutdown. I just would like to know if we've effectively lost those VMs or not. Again, the VMs are currently "up". And we use a file backup process, so in theory they can be restored, just somewhat painfully, from scratch. But if somebody knows: if we shut down all the bad VMs and the blade, is there some way oVirt can know the VMs are "ok" to start up?? Will changing their state directly to "down" in the db stick if the blade is down? That is, will we get to a known state where the VMs can actually be started and brought back into a known state? Right now, we're feeling there's a good chance we will not be able to recover these VMs, even though they are "up" right now. I really need some way to force oVirt into an integral state, even if it means we take the whole thing down. Possible? On 01/25/2018 06:57 PM, Christopher Cox wrote: On 01/25/2018 04:57 PM, Douglas Landgraf wrote: On Thu, Jan 25, 2018 at 5:12 PM, Christopher Cox wrote: On 01/25/2018 02:25 PM, Douglas Landgraf wrote: On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox wrote: Would restarting vdsm on the node in question help fix this? Again, all the VMs are up on the node. Prior attempts to fix this problem have left the node in a state where I can't issue the "has been rebooted" command to it; it's confused. So... node is up. All VMs are up. Can't issue "has been rebooted" to the node; all VMs show Unknown and not responding but they are up. Changing the status in the ovirt db to 0 works for a second and then it goes immediately back to 8 (which is why I'm wondering if I should restart vdsm on the node). It's not recommended to change the db manually. Oddly enough, we're running all of this in production. So, watching it all go down isn't the best option for us. Any advice is welcome. We would need to see the node/engine logs; have you found any error in the vdsm.log (from nodes) or engine.log? Could you please share the error? In short, the error is our ovirt manager lost network (our problem) and crashed hard (hardware issue on the server). On bring up, we had some network changes (that caused the lost network problem) so our LACP bond was down for a bit while we were trying to bring it up (noting the ovirt manager is up while we're reestablishing the network on the switch side). In other words, that's the "error" so to speak that got us to where we are. Full DEBUG enabled on the logs... The error messages seem obvious to me... starts like this (noting the ISO DOMAIN was coming off an NFS mount off the ovirt management server... yes... we know... we do have plans to move that).
So on the hypervisor node itself, from the vdsm.log (vdsm.log.33.xz): (hopefully no surprise here) Thread-2426633::WARNING::2018-01-23 13:50:56,672::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could not collect metadata file for domain path /rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844 Traceback (most recent call last): File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles sd.DOMAIN_META_DATA)) File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob return self._iop.glob(pattern) File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 536, in glob return self._sendCommand("glob", {"pattern": pattern}, self.timeout) File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 421, in _sendCommand raise Timeout(os.strerror(errno.ETIMEDOUT)) Timeout: Connection timed out Thread-27::ERROR::2018-01-23 13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain e5ecae2f-5a06-4743-9a43-e74d83992c35 not found Traceback (most recent call last): File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain dom = findMethod(sdUUID) File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID)) File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',) Thread-27::ERROR::2018-01-23 13:50:56,673::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain e5ecae2f-5a06-4743-9a43-e74d83992c35 Traceback (most recent call last): File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain self._performDomainSelftest() File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 769, in wrapper value = meth(self, *a, **kw) File "/usr/share/vdsm/storage/monitor.py", line 339, in _performDomainSelftest
Re: [ovirt-users] oVirt Upgrade 4.1 -> 4.2 fails with YUM dependency problems (CentOS)
Following the minor release upgrade instructions on https://www.ovirt.org/documentation/upgrade-guide/chap-Updates_between_Minor_Releases/ solved this issue. Now we are bumping into another issue, for which I'll probably open another thread. Frank On 02/02/2018 05:33 PM, Chas Hockenbarger wrote: I haven't tried this yet, but looking at the detailed error, the implication is that your current install is less than 4.1.7, which is where the conflict is. Have you tried updating to > 4.1.7 before upgrading? ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
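For anyone landing here, the minor-release update steps from that page boil down to roughly the following on the engine machine (a sketch; check the linked guide for the current procedure):

    yum update ovirt\*setup\*   # pull in the latest setup packages first
    engine-setup                # re-run setup to apply the update
    yum update                  # then update the remaining packages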
Re: [ovirt-users] oVirt DR: ansible with 4.1, only a subset of storage domain replicated
Hello, I'm starting the implementation of our disaster recovery site with RHV 4.1.latest for our production environment. Our production setup is very simple, with self hosted engine on dc KVMPDCA, and virtual machines both in KVMPDCA and KVMPD dcs. All our setup has an FC storage backend, which is EMC VPLEX/VMAX in KVMPDCA and EMC VNX8000. Both storage arrays support replication via their own replication protocols (SRDF, MirrorView), so we'd like to delegate to them the replication of data to the remote site, which is located in another remote datacenter. In KVMPD DC we have some storage domains that contain non-critical VMs, which we don't want to replicate to the remote site (in case of failure they have a low priority and will be restored from a backup). In our setup we won't replicate them, so they will not be available for attachment on the remote site. Can this be an issue? Do we need to replicate everything? What about the master domain? Do I require that the master storage domain stays on a replicated volume, or can it be any of the available ones? I've seen that since 4.1 there's an API for updating OVF_STORE disks. Do we need to invoke it at a frequency that is compatible with the replication frequency on the storage side? We set the RPO at the moment to 1hr (even if the planned RPO requires 2hrs). Does OVF_STORE get updated with the required frequency? I've seen a recent presentation by Maor Lipchuk that is showing the "automagic" ansible role for disaster recovery: https://www.slideshare.net/maorlipchuk/ovirt-dr-site-tosite-using-ansible It's also related to some YouTube presentations demonstrating a real DR plan execution. But what I've seen is that Maor is explicitly talking about the 4.2 release. Does that role work only with >4.2 releases, or can it be used also on earlier (4.1) versions? I've tested a manual flow of replication + recovery through Import SD followed by Import VM and it worked like a charm. Using a prebuilt ansible role will reduce my effort on creating a new automation for doing this. Anyone has experiences like mine? Thank you for the help you may provide, I'd like to contribute back to you with all my findings and with a usable tool (also integrated with storage arrays if possible). Luca (Sorry for duplicate email, ctrl-enter happened before mail completion) -- "E' assurdo impiegare gli uomini di intelligenza eccellente per fare calcoli che potrebbero essere affidati a chiunque se si usassero delle macchine" Gottfried Wilhelm von Leibnitz, Filosofo e Matematico (1646-1716) "Internet è la più grande biblioteca del mondo. Ma il problema è che i libri sono tutti sparsi sul pavimento" John Allen Paulos, Matematico (1945-vivente) Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Failed upgrade from 4.1.9 to 4.2.x
On Mon, Feb 5, 2018 at 3:08 PM, wrote: > El 2018-02-05 14:03, Simone Tiraboschi wrote: > >> On Mon, Feb 5, 2018 at 2:46 PM, wrote: >> >> Hi, >>> >>> We're trying to upgrade from 4.1.9 to 4.2.x and we're bumping into >>> an error we don't know how to solve. As per [1] we run the >>> 'engine-setup' command and it fails with: >>> >>> [ INFO ] Rolling back to the previous PostgreSQL instance >>> (postgresql). >>> [ ERROR ] Failed to execute stage 'Misc configuration': Command >>> '/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to >>> execute >>> [ INFO ] Yum Performing yum transaction rollback >>> [ INFO ] Stage: Clean up >>> Log file is located at >>> /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log >>> [ INFO ] Generating answer file >>> '/var/lib/ovirt-engine/setup/answers/20180205133354-setup.conf' >>> [ INFO ] Stage: Pre-termination >>> [ INFO ] Stage: Termination >>> [ ERROR ] Execution of setup failed >>> >>> As of the >>> /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log >>> file I could see this: >>> >>> * upgrading from 'postgresql.service' to >>> 'rh-postgresql95-postgresql.service' >>> * Upgrading database. >>> ERROR: pg_upgrade tool failed >>> ERROR: Upgrade failed. >>> * See /var/lib/pgsql/upgrade_rh-postgresql95-postgresql.log for >>> details. >>> >>> And this file contains this information: >>> >>> Performing Consistency Checks >>> - >>> Checking cluster versions >>> ok >>> Checking database user is the install user >>> ok >>> Checking database connection settings >>> ok >>> Checking for prepared transactions >>> ok >>> Checking for reg* system OID user data types >>> ok >>> Checking for contrib/isn with bigint-passing mismatch >>> ok >>> Checking for invalid "line" user columns >>> ok >>> Creating dump of global objects >>> ok >>> Creating dump of database schemas >>> django >>> engine >>> ovirt_engine_history >>> postgres >>> template1 >>> >>> ok >>> Checking for presence of required libraries >>> fatal >>> >>> Your installation references loadable libraries that are missing >>> from the >>> new installation. You can add these libraries to the new >>> installation, >>> or remove the functions using them from the old installation. >>> A list of >>> problem libraries is in the file: >>> loadable_libraries.txt >>> >>> Failure, exiting >>> >>> I'm attaching full logs FWIW. Also, I'd like to mention that we >>> created two custom triggers on the engine's 'users' table, but as I >>> understand from the error this is not the issue (we upgraded several >>> times within the same minor and we had no issues with that). >>> >>> Could someone shed some light on this error and how to debug it? >>> >> >> Hi, >> can you please attach also loadable_libraries.txt ? >> >> > > Could not load library "$libdir/plpython2" > ERROR: could not access file "$libdir/plpython2": No such file or > directory > Hmm, you probably need to install the rh-postgresql95-postgresql-plpython package. This is not installed by default with oVirt, as we don't use it. > > Well, definitely it has to do with the triggers... The trigger uses > plpython2u to replicate some entries in a different database. Is there a > way I can get rid of this error other than disabling plpython2 before > upgrading and re-enabling it after the upgrade? > > Thanks. >
>>> [1]: https://www.ovirt.org/release/4.2.0/ -- Martin Perina Associate Manager, Software Engineering Red Hat Czech s.r.o. ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
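A sketch of the likely way forward, assuming the missing plpython2 library is the only blocker (the package name is taken from the reply above; re-creating or re-enabling the custom triggers afterwards is up to you):

    yum install -y rh-postgresql95-postgresql-plpython
    engine-setup    # re-run the upgrade; pg_upgrade should now find plpython2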
Re: [ovirt-users] vdsmd fails after upgrade 4.1 -> 4.2
Hi Frank, can you please send the vdsm logs? The 4.2 release introduced a slightly different deployment from the engine; ansible is now called as well. Although I am not sure if this is your case. I would go for entirely removing vdsm and installing it from scratch, if that is possible for you. This could solve your issue. Looking forward to hearing from you. Petr On Mon, Feb 5, 2018 at 2:49 PM, Frank Rothenstein < f.rothenst...@bodden-kliniken.de> wrote: > Hi, > > I'm currently stuck - after upgrading 4.1 to 4.2 I cannot start the > host-processes. > systemctl start vdsmd fails with the following lines in journalctl: > > Feb 05 14:40:15 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: vdsm: Running wait_for_network > Feb 05 14:40:15 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: vdsm: Running run_init_hooks > Feb 05 14:40:15 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: vdsm: Running check_is_configured > Feb 05 14:40:15 glusternode1.bodden-kliniken.net > sasldblistusers2[10440]: DIGEST-MD5 common mech free > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: Error: > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: One of the modules is not configured to > work with VDSM. > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: To configure the module use the following: > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: 'vdsm-tool configure [--module module- > name]'. > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: If all modules are not configured try to > use: > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: 'vdsm-tool configure --force' > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: (The force flag will stop the module's > service and start it > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: afterwards automatically to load the new > configuration.) > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: abrt is already configured for vdsm > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: lvm is configured for vdsm > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: libvirt is not configured for vdsm yet > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: Current revision of multipath.conf > detected, preserving > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: Modules libvirt are not configured > Feb 05 14:40:16 glusternode1.bodden-kliniken.net > vdsmd_init_common.sh[10414]: vdsm: stopped during execute > check_is_configured task (task returned with error code 1). > Feb 05 14:40:16 glusternode1.bodden-kliniken.net systemd[1]: > vdsmd.service: control process exited, code=exited status=1 > Feb 05 14:40:16 glusternode1.bodden-kliniken.net systemd[1]: Failed to > start Virtual Desktop Server Manager. > -- Subject: Unit vdsmd.service has failed > -- Defined-By: systemd > -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel > -- > -- Unit vdsmd.service has failed. > -- > -- The result is failed. > Feb 05 14:40:16 glusternode1.bodden-kliniken.net systemd[1]: Dependency > failed for MOM instance configured for VDSM purposes.
> -- Subject: Unit mom-vdsm.service has failed > > The suggested "vdsm-tool configure --force" runs w/o errors; the > following restart of vdsmd shows the same error. > > Any hints on that topic? > > Frank > > Frank Rothenstein > Systemadministrator > Fon: +49 3821 700 125 > Fax: +49 3821 700 190 > Internet: www.bodden-kliniken.de > E-Mail: f.rothenst...@bodden-kliniken.de > > BODDEN-KLINIKEN Ribnitz-Damgarten GmbH > Sandhufe 2 > 18311 Ribnitz-Damgarten > > ___ > Users mailing list > Users@ovirt.org > http://lists.ovirt.org/mailman/listinfo/users
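A sketch of the usual sequence for the "Modules libvirt are not configured" symptom, in case targeting the module directly behaves differently from the blanket --force run (commands are the ones named in the log above):

    vdsm-tool configure --module libvirt --force
    vdsm-tool is-configured        # verify nothing is still reported unconfigured
    systemctl restart vdsmd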
Re: [ovirt-users] Failed upgrade from 4.1.9 to 4.2.x
El 2018-02-05 14:03, Simone Tiraboschi wrote: On Mon, Feb 5, 2018 at 2:46 PM, wrote: Hi, We're trying to upgrade from 4.1.9 to 4.2.x and we're bumping into an error we don't know how to solve. As per [1] we run the 'engine-setup' command and it fails with: [ INFO ] Rolling back to the previous PostgreSQL instance (postgresql). [ ERROR ] Failed to execute stage 'Misc configuration': Command '/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to execute [ INFO ] Yum Performing yum transaction rollback [ INFO ] Stage: Clean up Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log [ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20180205133354-setup.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination [ ERROR ] Execution of setup failed As of the /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log file I could see this: * upgrading from 'postgresql.service' to 'rh-postgresql95-postgresql.service' * Upgrading database. ERROR: pg_upgrade tool failed ERROR: Upgrade failed. * See /var/lib/pgsql/upgrade_rh-postgresql95-postgresql.log for details. And this file contains this information: Performing Consistency Checks - Checking cluster versions ok Checking database user is the install user ok Checking database connection settings ok Checking for prepared transactions ok Checking for reg* system OID user data types ok Checking for contrib/isn with bigint-passing mismatch ok Checking for invalid "line" user columns ok Creating dump of global objects ok Creating dump of database schemas django engine ovirt_engine_history postgres template1 ok Checking for presence of required libraries fatal Your installation references loadable libraries that are missing from the new installation. You can add these libraries to the new installation, or remove the functions using them from the old installation. A list of problem libraries is in the file: loadable_libraries.txt Failure, exiting I'm attaching full logs FWIW. Also, I'd like to mention that we created two custom triggers on the engine's 'users' table, but as I understand from the error this is not the issue (we upgraded several times within the same minor and we had no issues with that). Could someone shed some light on this error and how to debug it? Hi, can you please attach also loadable_libraries.txt ? Could not load library "$libdir/plpython2" ERROR: could not access file "$libdir/plpython2": No such file or directory Well, definitely it has to do with the triggers... The trigger uses plpython2u to replicate some entries in a different database. Is there a way I can get rid of this error other than disabling plpython2 before upgrading and re-enabling it after the upgrade? Thanks. Thanks. [1]: https://www.ovirt.org/release/4.2.0/ ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Failed upgrade from 4.1.9 to 4.2.x
On Mon, Feb 5, 2018 at 2:46 PM, wrote: > Hi, > > We're trying to upgrade from 4.1.9 to 4.2.x and we're bumping into an > error we don't know how to solve. As per [1] we run the 'engine-setup' > command and it fails with: > > [ INFO ] Rolling back to the previous PostgreSQL instance (postgresql). > [ ERROR ] Failed to execute stage 'Misc configuration': Command > '/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to execute > [ INFO ] Yum Performing yum transaction rollback > [ INFO ] Stage: Clean up > Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log > [ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20180205133354-setup.conf' > [ INFO ] Stage: Pre-termination > [ INFO ] Stage: Termination > [ ERROR ] Execution of setup failed > > As of the > /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log > file I could see this: > > * upgrading from 'postgresql.service' to 'rh-postgresql95-postgresql.service' > * Upgrading database. > ERROR: pg_upgrade tool failed > ERROR: Upgrade failed. > * See /var/lib/pgsql/upgrade_rh-postgresql95-postgresql.log for details. > > And this file contains this information: > > Performing Consistency Checks > - > Checking cluster versions ok > Checking database user is the install user ok > Checking database connection settings ok > Checking for prepared transactions ok > Checking for reg* system OID user data types ok > Checking for contrib/isn with bigint-passing mismatch ok > Checking for invalid "line" user columns ok > Creating dump of global objects ok > Creating dump of database schemas > django > engine > ovirt_engine_history > postgres > template1 > ok > Checking for presence of required libraries fatal > > Your installation references loadable libraries that are missing from the > new installation. You can add these libraries to the new installation, > or remove the functions using them from the old installation. A list of > problem libraries is in the file: > loadable_libraries.txt > > Failure, exiting > > I'm attaching full logs FWIW. Also, I'd like to mention that we created > two custom triggers on the engine's 'users' table, but as I understand from > the error this is not the issue (We upgraded several times within the same > minor and we had no issues with that). > > Could someone shed some light on this error and how to debug it? > Hi, can you please attach also loadable_libraries.txt ? > > Thanks. > > [1]: https://www.ovirt.org/release/4.2.0/ > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] vdsmd fails after upgrade 4.1 -> 4.2
Hi, I'm currently stuck - after upgrading 4.1 to 4.2 I cannot start the host-processes. systemctl start vdsmd fails with the following lines in journalctl: Feb 05 14:40:15 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: vdsm: Running wait_for_network Feb 05 14:40:15 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: vdsm: Running run_init_hooks Feb 05 14:40:15 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: vdsm: Running check_is_configured Feb 05 14:40:15 glusternode1.bodden-kliniken.net sasldblistusers2[10440]: DIGEST-MD5 common mech free Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: Error: Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: One of the modules is not configured to work with VDSM. Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: To configure the module use the following: Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: 'vdsm-tool configure [--module module- name]'. Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: If all modules are not configured try to use: Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: 'vdsm-tool configure --force' Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: (The force flag will stop the module's service and start it Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: afterwards automatically to load the new configuration.) Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: abrt is already configured for vdsm Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: lvm is configured for vdsm Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: libvirt is not configured for vdsm yet Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: Current revision of multipath.conf detected, preserving Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: Modules libvirt are not configured Feb 05 14:40:16 glusternode1.bodden-kliniken.net vdsmd_init_common.sh[10414]: vdsm: stopped during execute check_is_configured task (task returned with error code 1). Feb 05 14:40:16 glusternode1.bodden-kliniken.net systemd[1]: vdsmd.service: control process exited, code=exited status=1 Feb 05 14:40:16 glusternode1.bodden-kliniken.net systemd[1]: Failed to start Virtual Desktop Server Manager. -- Subject: Unit vdsmd.service has failed -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit vdsmd.service has failed. -- -- The result is failed. Feb 05 14:40:16 glusternode1.bodden-kliniken.net systemd[1]: Dependency failed for MOM instance configured for VDSM purposes. -- Subject: Unit mom-vdsm.service has failed The suggested "vdsm-tool configure --force" runs w/o errors; the following restart of vdsmd shows the same error. Any hints on that topic? Frank Frank Rothenstein Systemadministrator Fon: +49 3821 700 125 Fax: +49 3821 700 190 Internet: www.bodden-kliniken.de E-Mail: f.rothenst...@bodden-kliniken.de _ BODDEN-KLINIKEN Ribnitz-Damgarten GmbH Sandhufe 2 18311 Ribnitz-Damgarten Telefon: 03821-700-0 Telefax: 03821-700-240 E-Mail: i...@bodden-kliniken.de Internet: http://www.bodden-kliniken.de Sitz: Ribnitz-Damgarten, Amtsgericht: Stralsund, HRB 2919, Steuer-Nr.: 079/133/40188 Aufsichtsratsvorsitzende: Carmen Schröter, Geschäftsführer: Dr.
Falko Milski, MBA ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] Failed upgrade from 4.1.9 to 4.2.x
Hi, We're trying to upgrade from 4.1.9 to 4.2.x and we're bumping into an error we don't know how to solve. As per [1] we run the 'engine-setup' command and it fails with: [ INFO ] Rolling back to the previous PostgreSQL instance (postgresql). [ ERROR ] Failed to execute stage 'Misc configuration': Command '/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to execute [ INFO ] Yum Performing yum transaction rollback [ INFO ] Stage: Clean up Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log [ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20180205133354-setup.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination [ ERROR ] Execution of setup failed As of the /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log file I could see this: * upgrading from 'postgresql.service' to 'rh-postgresql95-postgresql.service' * Upgrading database. ERROR: pg_upgrade tool failed ERROR: Upgrade failed. * See /var/lib/pgsql/upgrade_rh-postgresql95-postgresql.log for details. And this file contains this information: Performing Consistency Checks - Checking cluster versions ok Checking database user is the install user ok Checking database connection settings ok Checking for prepared transactions ok Checking for reg* system OID user data typesok Checking for contrib/isn with bigint-passing mismatch ok Checking for invalid "line" user columnsok Creating dump of global objects ok Creating dump of database schemas django engine ovirt_engine_history postgres template1 ok Checking for presence of required libraries fatal Your installation references loadable libraries that are missing from the new installation. You can add these libraries to the new installation, or remove the functions using them from the old installation. A list of problem libraries is in the file: loadable_libraries.txt Failure, exiting I'm attaching full logs FWIW. Also, I'd like to mention that we created two custom triggers on the engine's 'users' table, but as I understand from the error this is not the issue (We upgraded several times within the same minor and we had no issues with that). Could someone shed some light on this error and how to debug it? Thanks. [1]: https://www.ovirt.org/release/4.2.0/ upgrade.tar.gz Description: GNU Zip compressed data ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Documentation about vGPU in oVirt 4.2
On Fri, Feb 2, 2018 at 12:13 PM, Jordan, Marcel wrote:
> Hi,
>
> I have some NVIDIA Tesla P100 and V100 GPUs in our oVirt 4.2 cluster and
> am searching for documentation on how to use the new vGPU feature. Is there
> any documentation out there on how to configure it correctly?
>
> --
> Marcel Jordan

Possibly check what will become the official documentation for RHEV 4.2, even if it may not map one-to-one onto oVirt.

Admin guide here:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2-beta/html/administration_guide/sect-host_tasks#Preparing_GPU_Passthrough

Planning and prerequisites guide here:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2-Beta/html/planning_and_prerequisites_guide/requirements#pci_device_requirements

In the oVirt 4.2 release notes I see these bugzilla entries that can help too:
https://bugzilla.redhat.com/show_bug.cgi?id=1481007
https://bugzilla.redhat.com/show_bug.cgi?id=1482033

HIH,
Gianluca
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
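Not from the documents above, but as a hedged starting point: with the NVIDIA vGPU driver installed on the host, the mediated device (mdev) types the cards offer can usually be listed through sysfs, and in oVirt 4.2 the selected type is passed to the VM via the "mdev_type" custom property (treat that property name as an assumption and verify it against the bugzillas above):

  # list the mdev types each GPU exposes, with their human-readable names
  ls /sys/class/mdev_bus/*/mdev_supported_types
  cat /sys/class/mdev_bus/*/mdev_supported_types/*/name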
Re: [ovirt-users] After HA migration, the source host still returns the migrated VM's information via the "vdsm-client Host getVMList" command
Hi, I have experimented with this and figured out the reason for the original issue. You are right that vm1 is not properly stopped. This is due to a known issue in the graceful shutdown introduced in oVirt 4.2: the VMs on the host being shut down are killed, but are not marked as stopped, which results in the behavior you have observed. Luckily, the patch is already done and present in the latest oVirt.

However, be aware that gracefully shutting down the host will result in a graceful shutdown of its VMs. This in turn means the engine will not migrate them, since they have been terminated gracefully.

Hope this helps.

Best,
Petr

On Fri, Feb 2, 2018 at 6:00 PM, Simone Tiraboschi wrote:
>
> On Thu, Feb 1, 2018 at 1:06 PM, Pym wrote:
>
>> The environment on my side may be different from the link. My vm1 can be
>> used normally after it is started on host2, but there is still information
>> left on host1 that is not cleaned up.
>>
>> Only the UI and the backend can still get the information of vm1 on
>> host1, while the VM has been successfully started on host2 by the HA
>> function.
>>
>> I would like to ask a question: is the UUID of the virtual machine
>> stored in the database, or where is it maintained? Was it not successfully
>> deleted after the HA function kicked in?
>
> I just encountered a similar behavior:
> after a reboot of the host, 'vdsm-client Host getVMFullList' is still
> reporting an old VM that is not visible with 'virsh -r list --all'.
>
> I filed a bug to track it:
> https://bugzilla.redhat.com/show_bug.cgi?id=1541479
>
>> At 2018-02-01 16:12:16, "Simone Tiraboschi" wrote:
>>
>> On Thu, Feb 1, 2018 at 2:21 AM, Pym wrote:
>>
>>> I checked vm1: it keeps the up state and can be used, but on host1
>>> after the shutdown there is a suspended vm1 that cannot be used; this is
>>> the problem now.
>>>
>>> On host1, you can get the information of vm1 using "vdsm-client Host
>>> getVMList", but you can't get the vm1 information using "virsh list".
>>
>> Maybe a side effect of https://bugzilla.redhat.com/show_bug.cgi?id=1505399
>>
>> Arik?
>>
>>> At 2018-02-01 07:16:37, "Simone Tiraboschi" wrote:
>>>
>>> On Wed, Jan 31, 2018 at 12:46 PM, Pym wrote:
>>>
>>>> Hi:
>>>>
>>>> The current environment is as follows:
>>>> ovirt-engine version 4.2.0, compiled and installed from source.
>>>> Two hosts were added, host1 and host2. On host1 a virtual machine vm1
>>>> was created, on host2 a vm2 was created, and HA was configured.
>>>>
>>>> Operation steps:
>>>> Run the shutdown -r command on host1. vm1 successfully migrated to
>>>> host2. When host1 is restarted, the following situation occurs:
>>>>
>>>> The state of vm2 keeps switching between up and pause.
>>>> When I run "vdsm-client Host getVMList" on host1, I get the information
>>>> of vm1. When I run "vdsm-client Host getVMList" on host2, I get the
>>>> information of vm1 and vm2.
>>>> When I run "virsh list" on host1, there is no virtual machine
>>>> information. When I run "virsh list" on host2, I get the information of
>>>> vm1 and vm2.
>>>>
>>>> How to solve this problem? Is it the case that vm1 did not remove its
>>>> information on host1 during the migration, or is there another reason?
>>>
>>> Did you also check if your VMs always remained up?
>>> In 4.2 we have the libvirt-guests service on the hosts, which tries to
>>> properly shut down the running VMs on host shutdown.
>>>
>>>> Thank you.
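For anyone else hitting this before the patched release: a quick way to spot the stale entry, built from the two commands quoted in this thread, is to compare VDSM's view with libvirt's on the source host:

  # what VDSM still thinks is running on this host
  vdsm-client Host getVMList
  # what libvirt is actually running (read-only connection)
  virsh -r list --all

A VM that appears in the first list but not in the second is the leftover described above.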
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] ovirt and gateway behavior
Hi all,

I have a 3-node oVirt 4.1 cluster, self-hosted on top of GlusterFS. The cluster is used to host several VMs. I have observed that when the gateway is lost (say the gateway device is down) the oVirt cluster goes down.

It seems a rather extreme behavior, especially when one does not care whether the hosted VMs have connectivity to the Internet or not.

Can this behavior be disabled?

Thanx,
Alex
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] Documentation about vGPU in oVirt 4.2
Hi,

I have some NVIDIA Tesla P100 and V100 GPUs in our oVirt 4.2 cluster and am searching for documentation on how to use the new vGPU feature. Is there any documentation out there on how to configure it correctly?

--
Marcel Jordan
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Power management - oVirt 4.2
Dear Martin,

Since I am going to use HPE ProLiant DL360 Gen10 servers to set up oVirt nodes (hypervisors), and the HP Gen10 uses iLO5 rather than iLO4, I would like to ask whether oVirt power management supports iLO5 or not. If not, do you have any idea how to set up power management with the HP Gen10?

Regards,
Terry

2018-02-01 16:21 GMT+08:00 Martin Perina:
>
> On Wed, Jan 31, 2018 at 11:19 PM, Luca 'remix_tj' Lorenzetto <
> lorenzetto.l...@gmail.com> wrote:
>
>> Hi,
>>
>> From ilo3 and up, the ilo fencing agents are an alias for fence_ipmi. Try
>> using the standard ipmi.
>
> It's not just an alias; ilo3/ilo4 also have different defaults than
> ipmilan. For example, if you use ilo4, then by default the following is
> used:
>
> lanplus=1
> power_wait=4
>
> So I recommend starting with ilo4 and adding any necessary custom options
> into the Options field. If you do need custom options, could you please
> share them with us? It would be very helpful for us; if needed, we could
> introduce ilo5 with different defaults than ilo4.
>
> Thanks
>
> Martin
>
>> Luca
>>
>> Il 31 gen 2018 11:14 PM, "Terry hey" ha scritto:
>>
>>> Dear all,
>>> Does oVirt 4.2 power management support iLO5? I could not see an iLO5
>>> option in Power Management.
>>>
>>> Regards
>>> Terry
>
> --
> Martin Perina
> Associate Manager, Software Engineering
> Red Hat Czech s.r.o.
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
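Building on Martin's note that ilo4 is ipmilan with lanplus=1 and power_wait=4, a hedged way to check whether an iLO5 BMC answers standard IPMI before wiring it into oVirt (fence_ipmilan ships with the fence-agents package; the address and credentials below are placeholders):

  # query power status over IPMI with the lanplus interface
  fence_ipmilan --ip=ILO5_ADDRESS --username=ADMIN_USER --password=SECRET --lanplus --action=status

If that returns the power status, selecting ipmilan in oVirt's Power Management and putting lanplus=1 in the Options field should behave the same way.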
Re: [ovirt-users] Host engine on virtual machine
On Mon, Feb 5, 2018 at 9:09 AM, maoz zadok wrote:
> Hi All,
> What do you think about installing the hosted engine on a virtual machine
> hosted on the same cluster it manages? Does it make sense?
> I don't like the alternative of installing it on physical hardware; on the
> other hand, if the host hosting the engine falls, there will be no access
> to the management.
> Is there a best practice for it? Please share your implementation with
> me/us.

Yes, it is supported and it is called Self Hosted Engine. See here:
https://www.ovirt.org/documentation/self-hosted/Self-Hosted_Engine_Guide/

Gianluca
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
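For reference, a minimal sketch of how such a deployment is started on the first host (treat the exact package name as an assumption and check it against the guide above; the interactive setup then asks for storage and network details):

  # on a clean host with the oVirt repositories enabled
  yum install -y ovirt-hosted-engine-setup
  hosted-engine --deploy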
[ovirt-users] Host engine on virtual machine
Hi All,

What do you think about installing the hosted engine on a virtual machine hosted on the same cluster it manages? Does it make sense? I don't like the alternative of installing it on physical hardware; on the other hand, if the host hosting the engine falls, there will be no access to the management. Is there a best practice for it? Please share your implementation with me/us.
___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users