/run/vdsm/<vmid>.recovery

On Fri, Apr 29, 2016 at 10:59 PM, Bill James <bill.ja...@j2.com> wrote:
> where do I find the recovery files?
>
> [root@ovirt1 test vdsm]# pwd
> /var/lib/vdsm
> [root@ovirt1 test vdsm]# ls -la
> total 16
> drwxr-xr-x 6 vdsm kvm 100 Mar 17 16:33 .
> drwxr-xr-x. 45 root root 4096 Apr 29 12:01 ..
> -rw-r--r-- 1 vdsm kvm 10170 Jan 19 05:04 bonding-defaults.json
> drwxr-xr-x 2 vdsm root 6 Apr 19 11:34 netconfback
> drwxr-xr-x 3 vdsm kvm 54 Apr 19 11:35 persistence
> drwxr-x---. 2 vdsm kvm 6 Mar 17 16:33 transient
> drwxr-xr-x 2 vdsm kvm 40 Mar 17 16:33 upgrade
> [root@ovirt1 test vdsm]# locate recovery
> /opt/hp/hpdiags/en/tcstorage.ldinterimrecovery.htm
> /opt/hp/hpdiags/en/tcstorage.ldrecoveryready.htm
> /usr/share/doc/postgresql-9.2.15/html/archive-recovery-settings.html
> /usr/share/doc/postgresql-9.2.15/html/recovery-config.html
> /usr/share/doc/postgresql-9.2.15/html/recovery-target-settings.html
> /usr/share/pgsql/recovery.conf.sample
> /var/lib/nfs/v4recovery
>
> [root@ovirt1 test vdsm]# locate 757a5 (disk id)
>
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118
>
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2
>
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2.lease
>
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2.meta
> [root@ovirt1 test vdsm]# locate 5bfb140 (vm id)
>
> /var/lib/libvirt/qemu/channels/5bfb140a-a971-4c9c-82c6-277929eb45d4.com.redhat.rhevm.vdsm
>
> /var/lib/libvirt/qemu/channels/5bfb140a-a971-4c9c-82c6-277929eb45d4.org.qemu.guest_agent.0
>
>
> On 4/29/16 10:02 AM, Michal Skrivanek wrote:
>
> On 29 Apr 2016, at 18:26, Bill James <bill.ja...@j2.com> wrote:
>
> yes they are still saying "paused" state.
> No, bouncing libvirt didn't help.
>
> Then my suspicion of vm recovery gets closer to a certainty:)
> Can you get one of the paused vm's .recovery file from /var/lib/vdsm and
> check it says Paused there? It's worth a shot to try to remove that file
> and restart vdsm, then check logs and that vm status...it should recover
> "good enough" from libvirt only.
> Try it with one first
>
> I noticed the errors about the ISO domain. Didn't think that was related.
> I have been migrating a lot of VMs to ovirt lately, and recently added
> another node.
> Also had some problems with /etc/exports for a while, but I think those
> issues are all resolved.
>
> Last "unresponsive" message in vdsm.log was:
>
> vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21
> 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout)
> vmId=`b6a13808-9552-401b-840b-4f7022e8293d`::monitor become unresponsive
> (command timeout, age=310323.97)
> vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21
> 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout)
> vmId=`5bfb140a-a971-4c9c-82c6-277929eb45d4`::monitor become unresponsive
> (command timeout, age=310323.97)
>
> Thanks.
>
>
> On 4/29/16 1:40 AM, Michal Skrivanek wrote:
>
> On 28 Apr 2016, at 19:40, Bill James <bill.ja...@j2.com> wrote:
>
> thank you for response.
> I bold-ed the ones that are listed as "paused".
>
> [root@ovirt1 test vdsm]# virsh -r list --all
>  Id    Name                           State
> ----------------------------------------------------
>
> Looks like problem started around 2016-04-17 20:19:34,822, based on
> engine.log attached.
>
> yes, that time looks correct. Any idea what might have been a trigger?
> Anything interesting happened at that time (power outage of some host, some
> maintenance action, anything)?
> logs indicate a problem when vdsm talks to libvirt (all those "monitor
> become unresponsive”)
>
> It does seem that at that time you started to have some storage
> connectivity issues - first one at 2016-04-17 20:06:53,929. And it
> doesn’t look temporary because such errors are still there couple hours
> later (in your most recent file you attached I can see at 23:00:54)
> When I/O gets blocked the VMs may experience issues (then VM gets Paused),
> or their qemu process gets stuck (resulting in libvirt either reporting
> error or getting stuck as well -> resulting in what vdsm sees as “monitor
> unresponsive”)
>
> Since you now bounced libvirtd - did it help? Do you still see wrong
> status for those VMs and still those "monitor unresponsive" errors in
> vdsm.log?
> If not…then I would suspect the “vm recovery” code not working
> correctly. Milan is looking at that.
>
> Thanks,
> michal
>
> There's a lot of vdsm logs!
>
> fyi, the storage domain for these Vms is a "local" nfs share,
> 7e566f55-e060-47b7-bfa4-ac3c48d70dda.
>
> attached more logs.
>
>
> On 04/28/2016 12:53 AM, Michal Skrivanek wrote:
>
> On 27 Apr 2016, at 19:16, Bill James <bill.ja...@j2.com> wrote:
>
> virsh # list --all
> error: failed to connect to the hypervisor
> error: no valid connection
> error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such
> file or directory
>
> you need to run virsh in read-only mode
> virsh -r list --all
>
> [root@ovirt1 test vdsm]# systemctl status libvirtd
> ● libvirtd.service - Virtualization daemon
> Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor
> preset: enabled)
> Drop-In: /etc/systemd/system/libvirtd.service.d
> └─unlimited-core.conf
> Active: active (running) since Thu 2016-04-21 16:00:03 PDT; 5 days ago
>
> tried systemctl restart libvirtd.
> No change.
>
> Attached vdsm.log and supervdsm.log.
>
> [root@ovirt1 test vdsm]# systemctl status vdsmd
> ● vdsmd.service - Virtual Desktop Server Manager
> Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor
> preset: enabled)
> Active: active (running) since Wed 2016-04-27 10:09:14 PDT; 3min 46s ago
>
> vdsm-4.17.18-0.el7.centos.noarch
>
> the vdsm.log attach is good, but it’s too short interval, it only shows
> recovery(vdsm restart) phase when the VMs are identified as paused… can you
> add earlier logs? Did you restart vdsm yourself or did it crash?
>
> libvirt-daemon-1.2.17-13.el7_2.4.x86_64
>
> Thanks.
>
>
> On 04/26/2016 11:35 PM, Michal Skrivanek wrote:
>
> On 27 Apr 2016, at 02:04, Nir Soffer <nsof...@redhat.com> wrote:
>
> On Wed, Apr 27, 2016 at 2:03 AM, Bill James <bill.ja...@j2.com> wrote:
>
> I have a hardware node that has 26 VMs.
> 9 are listed as "running", 17 are listed as "paused".
>
> In truth all VMs are up and running fine.
>
> I tried telling the db they are up:
>
> engine=> update vm_dynamic set status = 1 where vm_guid =(select
> vm_guid from vm_static where vm_name = 'api1.test.j2noc.com');
>
> GUI then shows it up for a short while,
>
> then puts it back in paused state.
>
> 2016-04-26 15:16:46,095 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer]
> (DefaultQuartzScheduler_Worker-16) [157cc21e] VM
> '242ca0af-4ab2-4dd6-b515-5d435e6452c4'(api1.test.j2noc.com) moved from
> 'Up' --> 'Paused'
> 2016-04-26 15:16:46,221 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler_Worker-16) [157cc21e] Correlation ID: null,
> Call Stack: null, Custom Event ID: -1, Message: VM
> api1.test.j2noc.com has been paused.
>
> Why does the engine think the VMs are paused?
> Attached engine.log.
>
> I can fix the problem by powering off the VM then starting it back up.
> But the VM is working fine! How do I get ovirt to realize that?
>
> If this is an issue in engine, restarting engine may fix this.
> but having this problem only with one node, I don't think this is the issue.
>
> If this is an issue in vdsm, restarting vdsm may fix this.
>
> If this does not help, maybe this is libvirt issue? did you try to check vm
> status using virsh?
>
> this looks more likely as it seems such status is being reported
> logs would help, vdsm.log at the very least.
>
> If virsh thinks that the vms are paused, you can try to restart libvirtd.
>
> Please file a bug about this in any case with engine and vdsm logs.
>
> Adding Michal in case he has better idea how to proceed.
>
> Nir
>
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
> <engine.log-20160421.gz><vdsm.logs.tar.gz>
>
> [image: www.j2.com]
> <http://www.j2.com/?utm_source=j2global&utm_medium=xsell-referral&utm_campaign=employeeemail>
>
> This email, its contents and attachments contain information from j2
> Global, Inc
> <http://www.j2.com/?utm_source=j2global&utm_medium=xsell-referral&utm_campaign=employemail>.
> and/or its affiliates which may be privileged, confidential or otherwise
> protected from disclosure. The information is intended to be for the
> addressee(s) only. If you are not an addressee, any disclosure, copy,
> distribution, or use of the contents of this message is prohibited. If you
> have received this email in error please notify the sender by reply e-mail
> and delete the original message and any copies. © 2015 j2 Global, Inc
> <http://www.j2.com/>. All rights reserved. eFax ® <http://www.efax.com/>,
> eVoice ® <http://www.evoice.com/>, Campaigner ® <http://www.campaigner.com/>,
> FuseMail ® <http://www.fusemail.com/>, KeepItSafe ® <http://www.keepitsafe.com/>
> and Onebox ® <http://www.onebox.com/> are registered trademarks of j2
> Global, Inc <http://www.j2.com/>. and its affiliates.
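The check Michal describes in the quoted thread (see whether a VM's .recovery file says Paused) could be scripted roughly like this. It is only a sketch under two assumptions: that the recovery files sit at /run/vdsm/<vmid>.recovery, as given at the top of this message, and that the saved status shows up as the readable string "Paused" inside the file, which has not been verified here. The function name paused_recovery_files is made up for illustration and is not part of vdsm.

```shell
# Hypothetical helper (not part of vdsm): print the recovery files in a
# directory that contain the string "Paused". The directory defaults to
# /run/vdsm, the location given in this thread.
paused_recovery_files() {
    dir="${1:-/run/vdsm}"
    for f in "$dir"/*.recovery; do
        # Skip the unexpanded glob when no recovery files exist.
        [ -e "$f" ] || continue
        # -a: treat the file as text even if it holds binary data (GNU grep)
        # -q: print nothing, report the match through the exit status only
        if grep -aq 'Paused' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Run it with no argument on the host to list candidate files. To follow the "try it with one first" advice, one option is to move a single listed file aside (rather than deleting it outright), restart vdsmd with systemctl, and then check vdsm.log and that VM's status.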
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users