Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
Do you know when .34 will be released?

http://mirror.centos.org/centos/7/virt/x86_64/ovirt-3.6/
Latest version there is: vdsm-cli-4.17.32-1.el7.noarch.rpm (08-Aug-2016 17:36)

On Fri, Oct 14, 2016 at 1:11 AM, Francesco Romani wrote:
>
> ----- Original Message -----
> > From: "Simone Tiraboschi"
> > To: "Steve Dainard", "Francesco Romani" <from...@redhat.com>
> > Cc: "users"
> > Sent: Friday, October 14, 2016 9:59:49 AM
> > Subject: Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
> >
> > On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard wrote:
> >
> > > Hello,
> > >
> > > I had a hypervisor semi-crash this week: 4 of ~10 VMs continued to run,
> > > but the others were killed off somehow, and all VMs running on this host
> > > had '?' status in the oVirt UI.
> > >
> > > This appears to have been caused by vdsm logs filling up disk space on
> > > the logging partition.
> > >
> > > I've attached the log file vdsm.log.27.xz, which shows this error:
> > >
> > > vdsm.Scheduler::DEBUG::2016-10-11
> > > 16:42:09,318::executor::216::Executor::(_discard)
> > > Worker discarded: action='virt.periodic.DriveWatermarkMonitor'
> > > at 0x7f8e90021210 at 0x7f8e90021250 discarded at 0x7f8dd123e850
> > >
> > > which happens more and more frequently throughout the log.
> > >
> > > It was a bit difficult to understand what caused the failure, but the
> > > logs were getting really large, then being xz'd, which compressed 11G+
> > > into a few MB. Once this happened the disk space would be freed, and
> > > nagios wouldn't hit the 3rd check to throw a warning until pretty much
> > > right at the crash.
> > >
> > > I was able to restart vdsmd to resolve the issue, but I still need to
> > > know why these logs started to stack up so I can avoid this issue in
> > > the future.
> >
> > We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
> > but in your case the logs are rotating.
> > Francesco?
>
> Hi,
>
> yes, it is a different issue. Here the log messages are caused by the
> Worker threads of the periodic subsystem, which are leaking [1].
> This was a bug in Vdsm (insufficient protection against rogue domains),
> but the real problem is that some of your domains are unresponsive at
> the hypervisor level. The most likely cause is, in turn, unresponsive
> storage.
>
> Fixes have been committed and shipped with Vdsm 4.17.34.
>
> See: https://bugzilla.redhat.com/1364925
>
> HTH,
>
> +++
>
> [1] Actually, they are replaced too quickly, leading to unbounded growth.
> So those aren't actually "leaking"; Vdsm is just overzealous handling one
> error condition, making things worse than before.
> Still a serious issue, no doubt, but with quite a different cause.
>
> --
> Francesco Romani
> Red Hat Engineering Virtualization R&D
> Phone: 8261328
> IRC: fromani

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
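[Editorial note: the fix ships in Vdsm 4.17.34, while the mirror above still lists 4.17.32. Until the repo catches up, one way to check whether an installed vdsm already carries the fix is a `sort -V` version comparison. A minimal sketch; the `rpm` query in the comment assumes an rpm-based hypervisor and is illustrative, not taken from the thread:]

```shell
# Succeed (exit 0) when version $1 is at least version $2,
# using GNU sort's version-number ordering (-V).
ver_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical usage on a hypervisor (adjust the package name as needed):
#   ver_ge "$(rpm -q --qf '%{VERSION}' vdsm)" 4.17.34 && echo "fix present"
```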
Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
----- Original Message -----
> From: "Simone Tiraboschi"
> To: "Steve Dainard", "Francesco Romani"
> Cc: "users"
> Sent: Friday, October 14, 2016 9:59:49 AM
> Subject: Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
>
> On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard wrote:
>
> > Hello,
> >
> > I had a hypervisor semi-crash this week: 4 of ~10 VMs continued to run,
> > but the others were killed off somehow, and all VMs running on this host
> > had '?' status in the oVirt UI.
> >
> > This appears to have been caused by vdsm logs filling up disk space on
> > the logging partition.
> >
> > I've attached the log file vdsm.log.27.xz, which shows this error:
> >
> > vdsm.Scheduler::DEBUG::2016-10-11
> > 16:42:09,318::executor::216::Executor::(_discard)
> > Worker discarded: action='virt.periodic.DriveWatermarkMonitor'
> > at 0x7f8e90021210 at 0x7f8e90021250 discarded at 0x7f8dd123e850
> >
> > which happens more and more frequently throughout the log.
> >
> > It was a bit difficult to understand what caused the failure, but the
> > logs were getting really large, then being xz'd, which compressed 11G+
> > into a few MB. Once this happened the disk space would be freed, and
> > nagios wouldn't hit the 3rd check to throw a warning until pretty much
> > right at the crash.
> >
> > I was able to restart vdsmd to resolve the issue, but I still need to
> > know why these logs started to stack up so I can avoid this issue in
> > the future.
>
> We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
> but in your case the logs are rotating.
> Francesco?

Hi,

yes, it is a different issue. Here the log messages are caused by the
Worker threads of the periodic subsystem, which are leaking [1].
This was a bug in Vdsm (insufficient protection against rogue domains),
but the real problem is that some of your domains are unresponsive at
the hypervisor level. The most likely cause is, in turn, unresponsive
storage.

Fixes have been committed and shipped with Vdsm 4.17.34.

See: https://bugzilla.redhat.com/1364925

HTH,

+++

[1] Actually, they are replaced too quickly, leading to unbounded growth.
So those aren't actually "leaking"; Vdsm is just overzealous handling one
error condition, making things worse than before.
Still a serious issue, no doubt, but with quite a different cause.

--
Francesco Romani
Red Hat Engineering Virtualization R&D
Phone: 8261328
IRC: fromani
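[Editorial note: when logs fill a partition like this, it helps to quantify how fast the discarded-worker messages accumulate before and during the incident. A small sketch; the `::`-separated line format is taken from the log excerpt above, while the log path in the comment is the conventional vdsm location and may differ on your install:]

```shell
# Count "Worker discarded" events per hour in a vdsm-style log whose
# fields are separated by "::" with the timestamp in the third field,
# e.g. vdsm.Scheduler::DEBUG::2016-10-11 16:42:09,318::executor::...
count_discards_per_hour() {
    grep 'Worker discarded' "$1" \
        | awk -F'::' '{ print substr($3, 1, 13) }' \
        | sort | uniq -c
}

# Typical run on a hypervisor (path assumed):
#   count_discards_per_hour /var/log/vdsm/vdsm.log
```

A sharp hour-over-hour increase in the counts is the signature of the runaway replacement described above, as opposed to ordinary steady-state logging.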
Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard wrote:

> Hello,
>
> I had a hypervisor semi-crash this week: 4 of ~10 VMs continued to run,
> but the others were killed off somehow, and all VMs running on this host
> had '?' status in the oVirt UI.
>
> This appears to have been caused by vdsm logs filling up disk space on
> the logging partition.
>
> I've attached the log file vdsm.log.27.xz, which shows this error:
>
> vdsm.Scheduler::DEBUG::2016-10-11
> 16:42:09,318::executor::216::Executor::(_discard)
> Worker discarded: action='virt.periodic.DriveWatermarkMonitor'
> at 0x7f8e90021210 at 0x7f8e90021250 discarded at 0x7f8dd123e850
>
> which happens more and more frequently throughout the log.
>
> It was a bit difficult to understand what caused the failure, but the
> logs were getting really large, then being xz'd, which compressed 11G+
> into a few MB. Once this happened the disk space would be freed, and
> nagios wouldn't hit the 3rd check to throw a warning until pretty much
> right at the crash.
>
> I was able to restart vdsmd to resolve the issue, but I still need to
> know why these logs started to stack up so I can avoid this issue in
> the future.

We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
but in your case the logs are rotating.
Francesco?

> Hypervisor host info:
> CentOS 7
> # rpm -qa | grep vdsm
> vdsm-yajsonrpc-4.17.32-1.el7.noarch
> vdsm-xmlrpc-4.17.32-1.el7.noarch
> vdsm-infra-4.17.32-1.el7.noarch
> vdsm-hook-vmfex-dev-4.17.32-1.el7.noarch
> vdsm-python-4.17.32-1.el7.noarch
> vdsm-4.17.32-1.el7.noarch
> vdsm-cli-4.17.32-1.el7.noarch
> vdsm-jsonrpc-4.17.32-1.el7.noarch
>
> Engine host info:
> CentOS 7
> $ rpm -qa | grep ovirt
> ovirt-engine-lib-3.6.7.5-1.el7.centos.noarch
> ovirt-iso-uploader-3.6.0-1.el7.centos.noarch
> ovirt-engine-wildfly-overlay-8.0.5-1.el7.noarch
> ovirt-engine-webadmin-portal-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-jboss-as-7.1.1-1.el7.x86_64
> ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.7.5-1.el7.centos.noarch
> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
> ovirt-engine-vmconsole-proxy-helper-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-backend-3.6.7.5-1.el7.centos.noarch
> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
> ovirt-engine-setup-plugin-websocket-proxy-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-websocket-proxy-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-tools-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-setup-base-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-setup-3.6.7.5-1.el7.centos.noarch
> ovirt-vmconsole-1.0.2-1.el7.centos.noarch
> ovirt-engine-wildfly-8.2.1-1.el7.x86_64
> ovirt-engine-tools-backup-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-userportal-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-3.6.7.5-1.el7.centos.noarch
> ovirt-release35-006-1.noarch
> ovirt-engine-extension-aaa-ldap-1.1.0-0.0.master.20151021074904.git92c5c31.el7.noarch
> ovirt-release36-3.6.7-1.noarch
> ovirt-engine-setup-plugin-ovirt-engine-3.6.7.5-1.el7.centos.noarch
> ovirt-host-deploy-java-1.4.1-1.el7.centos.noarch
> ovirt-image-uploader-3.6.0-1.el7.centos.noarch
> ovirt-engine-dbscripts-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-sdk-python-3.6.3.0-1.el7.noarch
> ovirt-engine-extension-aaa-jdbc-1.0.7-1.el7.noarch
> ovirt-engine-extensions-api-impl-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-restapi-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-setup-plugin-ovirt-engine-common-3.6.7.5-1.el7.centos.noarch
> ovirt-vmconsole-proxy-1.0.2-1.el7.centos.noarch
> ovirt-engine-cli-3.6.2.0-1.el7.centos.noarch
>
> Thanks,
> Steve
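[Editorial note: part of what made this incident hard to catch was that xz compression freed the disk before monitoring could alert. One stop-gap until the fixed Vdsm lands is tightening rotation so a runaway log cannot fill the partition between checks. Vdsm installs its own logrotate policy, so treat this as an illustrative sketch, not the shipped file; the path, size, and count are assumptions to adapt:]

```
# Illustrative logrotate override (NOT the stock vdsm policy; verify
# against the config your vdsm package installed before using).
/var/log/vdsm/vdsm.log {
    size 100M
    rotate 10
    compress
    copytruncate
    missingok
    notifempty
}
```

`copytruncate` lets logrotate rotate without signalling vdsmd to reopen its log file, at the cost of possibly losing a few lines written during the copy.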