/run/vdsm/<vmid>.recovery
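Further down the thread, the suggestion is to check that a paused VM's .recovery file says Paused, remove that file, restart vdsm, and try it with one VM first. Here is a minimal Python sketch of the file-handling step. It assumes the /run/vdsm/<vmid>.recovery location given above, and that a plain byte search for "Paused" is enough to spot the state (the file's format is not parsed). The function name set_aside_recovery_file and the backup directory are hypothetical choices. The sketch moves the file aside instead of deleting it, so it can be put back.

```python
import shutil
from pathlib import Path


def set_aside_recovery_file(vm_id, run_dir="/run/vdsm",
                            backup_dir="/root/vdsm-recovery-backup"):
    """Move <run_dir>/<vm_id>.recovery into backup_dir if it mentions "Paused".

    Returns True if the file was moved, False if it was left in place.
    backup_dir is a hypothetical location; moving instead of deleting
    keeps a copy that can be restored if vdsm recovery goes wrong.
    """
    recovery = Path(run_dir) / f"{vm_id}.recovery"
    # Assumption: a raw byte search is enough to see the "Paused" state,
    # as suggested in the thread; the file contents are not decoded here.
    if b"Paused" not in recovery.read_bytes():
        return False
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    shutil.move(str(recovery), str(backup / recovery.name))
    return True
```

For example, call set_aside_recovery_file() with one VM id, then run `systemctl restart vdsmd`. Check vdsm.log and that VM's status before repeating the procedure for the other paused VMs.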

On Fri, Apr 29, 2016 at 10:59 PM, Bill James <bill.ja...@j2.com> wrote:

> where do I find the recovery files?
>
> [root@ovirt1 test vdsm]# pwd
> /var/lib/vdsm
> [root@ovirt1 test vdsm]# ls -la
> total 16
> drwxr-xr-x   6 vdsm kvm    100 Mar 17 16:33 .
> drwxr-xr-x. 45 root root  4096 Apr 29 12:01 ..
> -rw-r--r--   1 vdsm kvm  10170 Jan 19 05:04 bonding-defaults.json
> drwxr-xr-x   2 vdsm root     6 Apr 19 11:34 netconfback
> drwxr-xr-x   3 vdsm kvm     54 Apr 19 11:35 persistence
> drwxr-x---.  2 vdsm kvm      6 Mar 17 16:33 transient
> drwxr-xr-x   2 vdsm kvm     40 Mar 17 16:33 upgrade
> [root@ovirt1 test vdsm]# locate recovery
> /opt/hp/hpdiags/en/tcstorage.ldinterimrecovery.htm
> /opt/hp/hpdiags/en/tcstorage.ldrecoveryready.htm
> /usr/share/doc/postgresql-9.2.15/html/archive-recovery-settings.html
> /usr/share/doc/postgresql-9.2.15/html/recovery-config.html
> /usr/share/doc/postgresql-9.2.15/html/recovery-target-settings.html
> /usr/share/pgsql/recovery.conf.sample
> /var/lib/nfs/v4recovery
>
>
> [root@ovirt1 test vdsm]# locate 757a5  (disk id)
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2.lease
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2.meta
> [root@ovirt1 test vdsm]# locate 5bfb140 (vm id)
> /var/lib/libvirt/qemu/channels/5bfb140a-a971-4c9c-82c6-277929eb45d4.com.redhat.rhevm.vdsm
> /var/lib/libvirt/qemu/channels/5bfb140a-a971-4c9c-82c6-277929eb45d4.org.qemu.guest_agent.0
>
>
>
>
> On 4/29/16 10:02 AM, Michal Skrivanek wrote:
>
>
>
> On 29 Apr 2016, at 18:26, Bill James <bill.ja...@j2.com> wrote:
>
> yes they are still saying "paused" state.
> No, bouncing libvirt didn't help.
>
>
> Then my suspicion of VM recovery gets closer to a certainty :)
> Can you get the .recovery file of one of the paused VMs from /var/lib/vdsm and
> check that it says Paused there? It's worth a shot to remove that file
> and restart vdsm, then check the logs and that VM's status... It should recover
> "good enough" from libvirt alone.
> Try it with one VM first.
>
> I noticed the errors about the ISO domain. Didn't think that was related.
> I have been migrating a lot of VMs to ovirt lately, and recently added
> another node.
> Also had some problems with /etc/exports for a while, but I think those
> issues are all resolved.
>
>
> Last "unresponsive" message in vdsm.log was:
>
> vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) vmId=`b6a13808-9552-401b-840b-4f7022e8293d`::monitor become unresponsive (command timeout, age=310323.97)
> vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) vmId=`5bfb140a-a971-4c9c-82c6-277929eb45d4`::monitor become unresponsive (command timeout, age=310323.97)
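The age field in these warnings can be used as a rough check on the timeline. This assumes that age is the number of seconds since vdsm last got an answer from the VM monitor, which the thread does not confirm. Under that assumption, subtracting it from the log timestamp estimates when the monitor went quiet. Below is a minimal Python sketch. The regex is written only against the two lines quoted here, and the helper name unresponsive_since is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Written against the vdsm warning format quoted above; other vdsm
# versions may format these lines differently.
UNRESPONSIVE = re.compile(
    r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r".*vmId=`(?P<vm>[0-9a-f-]+)`::monitor become unresponsive"
    r" \(command timeout, age=(?P<age>[0-9.]+)\)"
)


def unresponsive_since(line):
    """Return (vm_id, estimated start time), or None if the line doesn't match.

    Assumes age is the number of seconds since the monitor last responded.
    """
    m = UNRESPONSIVE.search(line)
    if m is None:
        return None
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    logged = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return m.group("vm"), logged - timedelta(seconds=float(m.group("age")))
```

For the second warning above, this returns VM id 5bfb140a-a971-4c9c-82c6-277929eb45d4 and 2016-04-17 20:48:50.733, about 3.6 days before the warning was logged. That falls on the same evening of 2016-04-17 that the engine.log and the storage errors point to.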
>
>
>
> Thanks.
>
>
>
> On 4/29/16 1:40 AM, Michal Skrivanek wrote:
>
>
> On 28 Apr 2016, at 19:40, Bill James <bill.ja...@j2.com> wrote:
>
> Thank you for the response.
> I bolded the ones that are listed as "paused".
>
>
> [root@ovirt1 test vdsm]# virsh -r list --all
>  Id    Name                           State
> ----------------------------------------------------
>
>
>
>
> Looks like problem started around 2016-04-17 20:19:34,822, based on
> engine.log attached.
>
>
> Yes, that time looks correct. Any idea what might have been the trigger?
> Did anything interesting happen at that time (a power outage of some host, some
> maintenance action, anything)?
> The logs indicate a problem when vdsm talks to libvirt (all those "monitor
> become unresponsive" messages).
>
> It does seem that at that time you started to have some storage
> connectivity issues, the first one at 2016-04-17 20:06:53,929. And it
> doesn't look temporary, because such errors are still there a couple of hours
> later (in the most recent file you attached I can see them at 23:00:54).
> When I/O gets blocked, the VMs may experience issues (then the VM gets Paused),
> or their qemu process gets stuck (resulting in libvirt either reporting an
> error or getting stuck as well, which vdsm then sees as "monitor
> unresponsive").
>
> Since you have now bounced libvirtd, did it help? Do you still see the wrong
> status for those VMs and still those "monitor unresponsive" errors in
> vdsm.log?
> If not... then I would suspect the "vm recovery" code is not working
> correctly. Milan is looking at that.
>
> Thanks,
> michal
>
>
> There are a lot of vdsm logs!
>
> FYI, the storage domain for these VMs is a "local" NFS share,
> 7e566f55-e060-47b7-bfa4-ac3c48d70dda.
>
> Attached more logs.
>
>
> On 04/28/2016 12:53 AM, Michal Skrivanek wrote:
>
> On 27 Apr 2016, at 19:16, Bill James <bill.ja...@j2.com> wrote:
>
> virsh # list --all
> error: failed to connect to the hypervisor
> error: no valid connection
> error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
>
>
> You need to run virsh in read-only mode:
> virsh -r list --all
>
>
> [root@ovirt1 test vdsm]# systemctl status libvirtd
> ● libvirtd.service - Virtualization daemon
>   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
>  Drop-In: /etc/systemd/system/libvirtd.service.d
>           └─unlimited-core.conf
>   Active: active (running) since Thu 2016-04-21 16:00:03 PDT; 5 days ago
>
>
> tried systemctl restart libvirtd.
> No change.
>
> Attached vdsm.log and supervdsm.log.
>
>
> [root@ovirt1 test vdsm]# systemctl status vdsmd
> ● vdsmd.service - Virtual Desktop Server Manager
>   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
>   Active: active (running) since Wed 2016-04-27 10:09:14 PDT; 3min 46s ago
>
>
> vdsm-4.17.18-0.el7.centos.noarch
>
> The attached vdsm.log is good, but it covers too short an interval: it only shows the
> recovery (vdsm restart) phase, when the VMs are identified as paused... Can you
> add earlier logs? Did you restart vdsm yourself or did it crash?
>
>
>
> libvirt-daemon-1.2.17-13.el7_2.4.x86_64
>
>
> Thanks.
>
>
> On 04/26/2016 11:35 PM, Michal Skrivanek wrote:
>
> On 27 Apr 2016, at 02:04, Nir Soffer <nsof...@redhat.com> wrote:
>
> On Wed, Apr 27, 2016 at 2:03 AM, Bill James <bill.ja...@j2.com> wrote:
>
> I have a hardware node that has 26 VMs.
> 9 are listed as "running", 17 are listed as "paused".
>
> In truth all VMs are up and running fine.
>
> I tried telling the db they are up:
>
> engine=> update vm_dynamic set status = 1 where vm_guid =(select
> vm_guid from vm_static where vm_name = 'api1.test.j2noc.com');
>
> GUI then shows it up for a short while,
>
> then puts it back in paused state.
>
> 2016-04-26 15:16:46,095 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-16) [157cc21e] VM '242ca0af-4ab2-4dd6-b515-5d435e6452c4'(api1.test.j2noc.com) moved from 'Up' --> 'Paused'
> 2016-04-26 15:16:46,221 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-16) [157cc21e] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM api1.test.j2noc.com has been paused.
>
>
> Why does the engine think the VMs are paused?
> Attached engine.log.
>
> I can fix the problem by powering off the VM then starting it back up.
> But the VM is working fine! How do I get ovirt to realize that?
>
> If this is an issue in engine, restarting engine may fix this.
> But since you have this problem with only one node, I don't think this is the issue.
>
> If this is an issue in vdsm, restarting vdsm may fix this.
>
> If this does not help, maybe this is a libvirt issue? Did you try to check the VM
> status using virsh?
>
> This looks more likely, as it seems that status is being reported.
> Logs would help, vdsm.log at the very least.
>
>
> If virsh thinks that the vms are paused, you can try to restart libvirtd.
>
> Please file a bug about this in any case with engine and vdsm logs.
>
> Adding Michal in case he has better idea how to proceed.
>
> Nir
>
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
> <engine.log-20160421.gz><vdsm.logs.tar.gz>
>
>
>
>
> This email, its contents and attachments contain information from j2
> Global, Inc
> <http://www.j2.com/?utm_source=j2global&utm_medium=xsell-referral&utm_campaign=employemail>.
> and/or its affiliates which may be privileged, confidential or otherwise
> protected from disclosure. The information is intended to be for the
> addressee(s) only. If you are not an addressee, any disclosure, copy,
> distribution, or use of the contents of this message is prohibited. If you
> have received this email in error please notify the sender by reply e-mail
> and delete the original message and any copies. © 2015 j2 Global, Inc
> <http://www.j2.com/>. All rights reserved. eFax ® <http://www.efax.com/>,
> eVoice ® <http://www.evoice.com/>, Campaigner ® <http://www.campaigner.com/>,
> FuseMail ® <http://www.fusemail.com/>, KeepItSafe ® <http://www.keepitsafe.com/>
> and Onebox ® <http://www.onebox.com/> are registered trademarks of j2
> Global, Inc <http://www.j2.com/>. and its affiliates.
>
>
>