Hi,

It worked (thanks Ekin, I probably would not have turned it off, and the host would indeed have been restarted). Restarting libvirtd and vdsmd made all machines set their status to up. It seems the culprit is libvirtd in this case, as I could see some errors in the log. I'm attaching the log FWIW.
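
For anyone finding this thread later, the restart can be sketched as below. This is only a minimal sketch: it assumes the host manages both daemons through systemd under the unit names libvirtd and vdsmd, and the helper names (restart_commands, restart_services) are made up for illustration. It defaults to a dry run that only prints the commands; per Ekin's reminder, power management should be disabled for the host before running it for real.

```python
import subprocess

# Order used in this thread: libvirtd first, then vdsmd.
# Unit names are an assumption about the host's systemd setup.
SERVICES = ["libvirtd", "vdsmd"]


def restart_commands(services=SERVICES):
    """Build the systemctl invocations without running anything."""
    return [["systemctl", "restart", name] for name in services]


def restart_services(dry_run=True):
    """Print the commands (dry_run=True) or execute them in order.

    Executing them requires root on the hypervisor host.
    """
    commands = restart_commands()
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on the first failure,
            # so vdsmd is not restarted if libvirtd failed to restart.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    restart_services(dry_run=True)
```

Calling restart_services(dry_run=False) as root would actually run the two restarts; the dry run is the default so that nothing is restarted by accident.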

Thanks.

On 07/08/16 at 19:37, Ekin Meroğlu wrote:
Hi,

Just a reminder: if you have power management configured, turn it off for the host first. When you restart vdsmd with power management configured, the engine finds the host not responding and tries to fence (e.g. reboot) it.

Other than that, restarting vdsmd has been safe in my experience...

Regards,

On Thu, Aug 4, 2016 at 6:10 PM, Nicolás <nico...@devels.es> wrote:



    On 04/08/16 at 15:25, Arik Hadas wrote:


        ----- Original Message -----

            On 2016-08-04 at 08:24, Arik Hadas wrote:

                ----- Original Message -----


                    On 04/08/16 at 07:18, Arik Hadas wrote:

                        ----- Original Message -----

                            Hi,

                            We're running oVirt 4.0.1 and today I found out that one of
                            our hosts has all its VMs in an unknown state. I don't know
                            how (or when) this happened, but I'd like to restore service,
                            if possible without turning off these machines. The host is
                            up, the VMs are up, the 'qemu' processes exist, and there are
                            no errors; it's just that the VMs running on it have a '?'
                            where the status is shown.

                            Is it safe in this case to simply modify the database and set
                            those VMs' status to 'up'? I remember having to do this some
                            time ago when we faced storage issues, and it didn't break
                            anything back then. If not, is there a "safe" way to migrate
                            those VMs to a different host and restart the host that
                            marked them as unknown?

                        Hi Nicolás,

                        I assume that the host these VMs are running on is empty in
                        the webadmin, right? If that is the case, then you've probably
                        hit [1]. Changing their status to up is not the way to go,
                        since these VMs will not be monitored.

                    Hi Arik,

                    By "empty", do you mean the webadmin reports the host as
                    running 0 VMs? If so, that's not the case. The VM count
                    seems to be correct in relation to the "qemu-*" processes
                    (about 32 VMs), and I can even see the machines in the
                    "Virtual machines" tab of the host; it's just that they are
                    all marked with the '?' mark.

                No, I meant the 'Host' column in the Virtual Machines tab, but
                if you see the VMs in the "Virtual machines" sub-tab of the
                host, then run_on_vds points to the right host.

                Is the host up in the webadmin as well?
                Can you share the engine log?

            Yes, the host is up in the webadmin and there are no issues
            with it; just the VMs running on it have the '?' mark. I've
            run 3 tests:

            1) Restart engine: did not help.
            2) Check firewall: seems to be OK.
            3) PostgreSQL: UPDATE vm_dynamic SET status = 1 WHERE status = 8;
               After a while, I see lots of entries like this:

            2016-08-04 09:23:10,910 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler4) [6ad135b8] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM xxx is not responding.

            I'm attaching the engine log, though I don't know when this
            first happened. If there's a manual way/command to migrate
            VMs to a different host, I'd appreciate a hint about it.

            Is it safe to restart vdsmd on this host?

        The engine log looks fine - the VMs are reported as
        not-responding for some reason. I would restart libvirtd and
        vdsmd then.


    Is restarting those two daemons safe? I mean, will that stop all
    qemu-* processes, so the VMs marked as unknown will stop?


            Thanks.

                    Thanks.

                        Yes, there is no way to resolve it other than changing the
                        DB, but the change should be to update the run_on_vds field
                        of these VMs to the host you know they are running on. Their
                        status will then be updated in 15 seconds.

                        [1]
                        https://bugzilla.redhat.com/show_bug.cgi?id=1354494

                        Arik.

                            Thanks.

                            Nicolás

                            _______________________________________________
                            Users mailing list
                            Users@ovirt.org
                            http://lists.ovirt.org/mailman/listinfo/users







--
        *Ekin Meroğlu*/ Red Hat Certified Architect/

linuxera Özgür Yazılım Çözüm ve Hizmetleri
*T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
www.linuxera.com | bi...@linuxera.com


Attachment: libvirtd.tar.gz
Description: application/gzip

