Hi Yaniv,

Just a reminder, can you give us a pointer? Red Hat Support just asked us to disable PM before restarting vdsm again.

Thanks & Best regards,

On Mon, Aug 22, 2016 at 10:57 PM, Ekin Meroğlu <ekin.mero...@linuxera.com> wrote:

> Hi Yaniv,
>
> On Sun, Aug 7, 2016 at 9:37 PM, Ekin Meroğlu <ekin.mero...@linuxera.com> wrote:
>>
>>> Hi,
>>>
>>> Just a reminder, if you have power management configured, first turn
>>> that off for the host - when you restart vdsmd with the power management
>>> configured, engine finds it not responding and tries to fence (e.g. reboot)
>>> the host.
>>
>> That's not true - if it's a graceful restart, it should not happen.
>
> Can you explain this a little more? Is there a mechanism to prevent
> fencing in this scenario?
>
> In two of our customers' production systems we've experienced this exact
> behavior (i.e. engine fencing the host while restarting the vdsm service
> manually) a number of times, and we were specifically advised by Red
> Hat Support to turn off PM before restarting the service. I'd like to know
> if we have a better / easier way to restart vdsm.
>
> btw, both of the environments were RHEV-H based RHEV 3.5 clusters, and both
> were busy systems, so restarting the vdsm service took quite a long time. I'm
> guessing this might be a factor.
>
> Regards,
>
>>> Other than that, restarting vdsmd has been safe in my experience...
>>>
>>> Regards,
>>>
>>> On Thu, Aug 4, 2016 at 6:10 PM, Nicolás <nico...@devels.es> wrote:
>>>
>>>> On 04/08/16 at 15:25, Arik Hadas wrote:
>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>>> On 2016-08-04 08:24, Arik Hadas wrote:
>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>>> On 04/08/16 at 07:18, Arik Hadas wrote:
>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We're running oVirt 4.0.1 and today I found out that one of our hosts
>>>>>>>>>> has all its VMs in an unknown state.
>>>>>>>>>> I actually don't know how (and when) this happened, but I'd like
>>>>>>>>>> to restore service, possibly without turning off these machines.
>>>>>>>>>> The host is up, the VMs are up, 'qemu' process exists, no errors,
>>>>>>>>>> it's just the VMs running on it that have a '?' where status is
>>>>>>>>>> defined.
>>>>>>>>>>
>>>>>>>>>> Is it safe in this case to simply modify the database and set those
>>>>>>>>>> VMs' status to 'up'? I remember having to do this some time ago when
>>>>>>>>>> we faced storage issues; it didn't break anything back then. If not,
>>>>>>>>>> is there a "safe" way to migrate those VMs to a different host and
>>>>>>>>>> restart the host that marked them as unknown?
>>>>>>>>>
>>>>>>>>> Hi Nicolás,
>>>>>>>>>
>>>>>>>>> I assume that the host these VMs are running on is empty in the
>>>>>>>>> webadmin, right? If that is the case then you've probably hit [1].
>>>>>>>>> Changing their status to up is not the way to go since these VMs
>>>>>>>>> will not be monitored.
>>>>>>>>
>>>>>>>> Hi Arik,
>>>>>>>>
>>>>>>>> By "empty" you mean the webadmin reports the host as running 0 VMs?
>>>>>>>> If so, that's not the case; actually the VM count seems to be correct
>>>>>>>> in relation to the "qemu-*" processes (about 32 VMs). I can even see
>>>>>>>> the machines in the "Virtual machines" tab of the host, it's just
>>>>>>>> that they are all marked with the '?' mark.
>>>>>>>
>>>>>>> No, I meant the 'Host' column in the Virtual Machines tab, but if you
>>>>>>> see the VMs in the "Virtual machines" sub-tab of the host then
>>>>>>> run_on_vds points to the right host.
>>>>>>>
>>>>>>> The host is up in the webadmin as well?
>>>>>>> Can you share the engine log?
>>>>>>
>>>>>> Yes, the host is up in the webadmin, there are no issues with it, just
>>>>>> the VMs running on it have the '?' mark.
>>>>>> I've made 3 tests:
>>>>>>
>>>>>> 1) Restart engine: did not help.
>>>>>> 2) Check firewall: seems to be ok.
>>>>>> 3) PostgreSQL: UPDATE vm_dynamic SET status = 1 WHERE status = 8;
>>>>>>    After a while, I see lots of entries like this:
>>>>>>
>>>>>> 2016-08-04 09:23:10,910 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler4) [6ad135b8] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM xxx is not responding.
>>>>>>
>>>>>> I'm attaching the engine log, but I don't know when this happened for
>>>>>> the first time. If there's a manual way/command to migrate VMs
>>>>>> to a different host I'd appreciate a hint about it.
>>>>>>
>>>>>> Is it safe to restart vdsmd on this host?
>>>>>
>>>>> The engine log looks fine - the VMs are reported as not-responding for
>>>>> some reason. I would restart libvirtd and vdsmd then.
>>>>
>>>> Is restarting those two daemons safe? I mean, will that stop all qemu-*
>>>> processes, so the VMs marked as unknown will stop?
>>>>
>>>> Thanks.
>>>>>>
>>>>>> Thanks.
>>>>>>>>
>>>>>>>>> Yes, there is no other way to resolve it other than changing the DB,
>>>>>>>>> but the change should be to update the run_on_vds field of these VMs
>>>>>>>>> to the host you know they are running on. Their status will then be
>>>>>>>>> updated in 15 sec.
>>>>>>>>>
>>>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1354494
>>>>>>>>>
>>>>>>>>> Arik.
>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Nicolás
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Users mailing list
>>>>>>>>>> Users@ovirt.org
>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users

--
*Ekin Meroğlu** Red Hat Certified Architect*

linuxera Özgür Yazılım Çözüm ve Hizmetleri
*T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
www.linuxera.com | bi...@linuxera.com