.recovery setting before removing:
p298
sS'status'
p299
S'Paused'
p300



After removing .recovery file and shutdown and restart:
V0
sS'status'
p51
S'Up'
p52


So far it looks good; the GUI shows the VM as Up.


another host was:
p318
sS'status'
p319
S'Paused'
p320

after moving .recovery file and restarting:
V0
sS'status'
p51
S'Up'
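
For anyone who wants to check the saved status without opening the file by hand, here is a minimal Python sketch. It assumes the .recovery file is a text-mode (protocol 0) pickle laid out like the excerpts above, and it only pattern-matches the bytes rather than unpickling them. The function names are hypothetical and not part of vdsm.

```python
import re

# Matches the protocol-0 pickle fragment seen in the excerpts:
#   S'status' / pNNN / S'<value>'
_STATUS_RE = re.compile(rb"S'status'\r?\np\d+\r?\nS'([^']*)'")


def recovery_status(raw: bytes):
    """Return the saved VM status found in raw .recovery bytes, or None."""
    match = _STATUS_RE.search(raw)
    return match.group(1).decode("ascii") if match else None


def recovery_status_from_file(path: str):
    """Read a .recovery file and return its saved status, or None."""
    with open(path, "rb") as handle:
        return recovery_status(handle.read())
```

Pattern matching is used here on purpose: unpickling a file is only safe when its contents are trusted.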


Thanks.

On 04/29/2016 02:36 PM, Nir Soffer wrote:
/run/vdsm/<vmid>.recovery

On Fri, Apr 29, 2016 at 10:59 PM, Bill James <bill.ja...@j2.com> wrote:

    where do I find the recovery files?

    [root@ovirt1 test vdsm]# pwd
    /var/lib/vdsm
    [root@ovirt1 test vdsm]# ls -la
    total 16
    drwxr-xr-x   6 vdsm kvm    100 Mar 17 16:33 .
    drwxr-xr-x. 45 root root  4096 Apr 29 12:01 ..
    -rw-r--r--   1 vdsm kvm  10170 Jan 19 05:04 bonding-defaults.json
    drwxr-xr-x   2 vdsm root     6 Apr 19 11:34 netconfback
    drwxr-xr-x   3 vdsm kvm     54 Apr 19 11:35 persistence
    drwxr-x---.  2 vdsm kvm      6 Mar 17 16:33 transient
    drwxr-xr-x   2 vdsm kvm     40 Mar 17 16:33 upgrade



    On 4/29/16 10:02 AM, Michal Skrivanek wrote:


    On 29 Apr 2016, at 18:26, Bill James <bill.ja...@j2.com> wrote:

    yes they are still saying "paused" state.
    No, bouncing libvirt didn't help.

    Then my suspicion of vm recovery gets closer to a certainty:)
    Can you get one of the paused vm's .recovery file from
    /var/lib/vdsm and check it says Paused there? It's worth a shot
    to try to remove that file and restart vdsm, then check logs and
    that vm status...it should recover "good enough" from libvirt only.
    Try it with one first
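
The remove-and-restart step could be scripted roughly as follows. This is only a sketch: it renames the file instead of deleting it so it can be put back, the helper name and the .bak suffix are made up for illustration, and the default directory is the /run/vdsm location given elsewhere in this thread rather than /var/lib/vdsm. Restarting vdsm afterwards is left as a manual step.

```python
import shutil
from pathlib import Path


def set_aside_recovery(vm_id: str, run_dir: str = "/run/vdsm", suffix: str = ".bak"):
    """Rename <run_dir>/<vm_id>.recovery so it is no longer picked up.

    Returns the new path, or None if no recovery file exists for the VM.
    """
    source = Path(run_dir) / f"{vm_id}.recovery"
    if not source.is_file():
        return None
    target = source.with_name(source.name + suffix)
    shutil.move(str(source), str(target))
    return target
```

As suggested above, try it on a single VM first, restart vdsm, and check the reported status before touching the others.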

    I noticed the errors about the ISO domain. Didn't think that was
    related.
    I have been migrating a lot of VMs to ovirt lately, and recently
    added another node.
    Also had some problems with /etc/exports for a while, but I
    think those issues are all resolved.


    Last "unresponsive" message in vdsm.log was:

    vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) vmId=`b6a13808-9552-401b-840b-4f7022e8293d`::monitor become unresponsive (command timeout, age=310323.97)
    vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) vmId=`5bfb140a-a971-4c9c-82c6-277929eb45d4`::monitor become unresponsive (command timeout, age=310323.97)
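
To see which VMs were flagged and how large the reported age was, the warnings can be pulled out of vdsm.log with a short Python sketch like this one. It assumes each entry sits on a single line, as in the log file itself rather than as wrapped by a mail client, and the function name is hypothetical.

```python
import re

# Captures the VM id and the reported age from a
# "monitor become unresponsive" warning line.
_UNRESPONSIVE_RE = re.compile(
    r"vmId=`(?P<vm_id>[0-9a-f-]+)`::monitor become unresponsive "
    r"\(command timeout, age=(?P<age>[0-9.]+)\)"
)


def unresponsive_events(lines):
    """Yield (vm_id, age) for every unresponsive-monitor warning in lines."""
    for line in lines:
        match = _UNRESPONSIVE_RE.search(line)
        if match:
            yield match.group("vm_id"), float(match.group("age"))
```

Passing an open file object as `lines` works, since the function only iterates over it.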



    Thanks.



    On 4/29/16 1:40 AM, Michal Skrivanek wrote:

    On 28 Apr 2016, at 19:40, Bill James <bill.ja...@j2.com> wrote:

    thank you for response.
    I bold-ed the ones that are listed as "paused".


    [root@ovirt1 test vdsm]# virsh -r list --all
     Id    Name                           State
    ----------------------------------------------------



    Looks like problem started around 2016-04-17 20:19:34,822,
    based on engine.log attached.

    yes, that time looks correct. Any idea what might have been a
    trigger? Anything interesting happened at that time (power
    outage of some host, some maintenance action, anything)?
    logs indicate a problem when vdsm talks to libvirt (all those
    "monitor become unresponsive")

    It does seem that at that time you started to have some storage
    connectivity issues - first one at 2016-04-17 20:06:53,929.
    And it doesn't look temporary because such errors are still
    there a couple of hours later (in the most recent file you
    attached I can see one at 23:00:54).
    When I/O gets blocked the VMs may experience issues (then the VM
    gets Paused), or their qemu process gets stuck (resulting in
    libvirt either reporting an error or getting stuck as well ->
    resulting in what vdsm sees as "monitor unresponsive").

    Since you now bounced libvirtd - did it help? Do you still see
    wrong status for those VMs and still those "monitor
    unresponsive" errors in vdsm.log?
    If not... then I would suspect the "vm recovery" code not
    working correctly. Milan is looking at that.

    Thanks,
    michal


    There's a lot of vdsm logs!

    fyi, the storage domain for these Vms is a "local" nfs share,
    7e566f55-e060-47b7-bfa4-ac3c48d70dda.

    attached more logs.


    On 04/28/2016 12:53 AM, Michal Skrivanek wrote:
    On 27 Apr 2016, at 19:16, Bill James <bill.ja...@j2.com> wrote:

    virsh # list --all
    error: failed to connect to the hypervisor
    error: no valid connection
    error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

    you need to run virsh in read-only mode
    virsh -r list --all
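
To compare what libvirt reports against what the engine shows, the text output of that command can be turned into a name-to-state mapping. The Python sketch below assumes the usual three-column "Id Name State" layout and domain names without spaces; the function name is hypothetical.

```python
def parse_virsh_list(output: str):
    """Map domain name -> state from `virsh -r list --all` text output."""
    states = {}
    for line in output.splitlines():
        # Split into at most three fields so multi-word states
        # such as "shut off" stay in one piece.
        parts = line.split(None, 2)
        # Skip blank lines, the dashed rule, and the column header.
        if len(parts) < 3 or parts[0] == "Id":
            continue
        _dom_id, name, state = parts
        states[name] = state.strip()
    return states
```

Any VM that libvirt lists as "running" while the engine shows it as paused points at the status vdsm reports rather than at the guest itself.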

    [root@ovirt1 test vdsm]# systemctl status libvirtd
    ● libvirtd.service - Virtualization daemon
       Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
      Drop-In: /etc/systemd/system/libvirtd.service.d
               └─unlimited-core.conf
       Active: active (running) since Thu 2016-04-21 16:00:03 PDT; 5 days ago


    tried systemctl restart libvirtd.
    No change.

    Attached vdsm.log and supervdsm.log.


    [root@ovirt1 test vdsm]# systemctl status vdsmd
    ● vdsmd.service - Virtual Desktop Server Manager
       Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
       Active: active (running) since Wed 2016-04-27 10:09:14 PDT; 3min 46s ago


    vdsm-4.17.18-0.el7.centos.noarch
    the attached vdsm.log is good, but it covers too short an interval; it only
    shows the recovery (vdsm restart) phase, when the VMs are identified as
    paused... can you add earlier logs? Did you restart vdsm yourself or did it crash?


    libvirt-daemon-1.2.17-13.el7_2.4.x86_64


    Thanks.


    On 04/26/2016 11:35 PM, Michal Skrivanek wrote:
    On 27 Apr 2016, at 02:04, Nir Soffer <nsof...@redhat.com> wrote:

    On Wed, Apr 27, 2016 at 2:03 AM, Bill James <bill.ja...@j2.com> wrote:
    I have a hardware node that has 26 VMs.
    9 are listed as "running", 17 are listed as "paused".

    In truth all VMs are up and running fine.

    I tried telling the db they are up:

    engine=> update vm_dynamic set status = 1 where vm_guid =(select
    vm_guid from vm_static where vm_name = 'api1.test.j2noc.com');

    GUI then shows it up for a short while,

    then puts it back in paused state.

    2016-04-26 15:16:46,095 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-16) [157cc21e] VM '242ca0af-4ab2-4dd6-b515-5d435e6452c4'(api1.test.j2noc.com) moved from 'Up' --> 'Paused'
    2016-04-26 15:16:46,221 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-16) [157cc21e] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM api1.test.j2noc.com has been paused.


    Why does the engine think the VMs are paused?
    Attached engine.log.

    I can fix the problem by powering off the VM then starting it back up.
    But the VM is working fine! How do I get ovirt to realize that?
    If this is an issue in engine, restarting engine may fix this.
    but having this problem only with one node, I don't think this is the issue.

    If this is an issue in vdsm, restarting vdsm may fix this.

    If this does not help, maybe this is libvirt issue? did you try to check vm
    status using virsh?
    this looks more likely as it seems such status is being reported
    logs would help, vdsm.log at the very least.

    If virsh thinks that the vms are paused, you can try to restart libvirtd.

    Please file a bug about this in any case with engine and vdsm logs.

    Adding Michal in case he has better idea how to proceed.

    Nir
    Users@ovirt.org <mailto:Users@ovirt.org>
    http://lists.ovirt.org/mailman/listinfo/users

    <engine.log-20160421.gz><vdsm.logs.tar.gz>


    www.j2.com

    This email, its contents and attachments contain information
    from j2 Global, Inc. and/or its affiliates which may be
    privileged, confidential or otherwise protected from disclosure.
    The information is intended to be for the addressee(s) only. If
    you are not an addressee, any disclosure, copy, distribution, or
    use of the contents of this message is prohibited. If you have
    received this email in error please notify the sender by reply
    e-mail and delete the original message and any copies. © 2015 j2
    Global, Inc. All rights reserved. eFax ®, eVoice ®, Campaigner ®,
    FuseMail ®, KeepItSafe ® and Onebox ® are registered trademarks
    of j2 Global, Inc. and its affiliates.




    _______________________________________________
    Users mailing list
    Users@ovirt.org <mailto:Users@ovirt.org>
    http://lists.ovirt.org/mailman/listinfo/users



