Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition

2016-10-23 Thread Steve Dainard
Do you know when .34 will be released?

http://mirror.centos.org/centos/7/virt/x86_64/ovirt-3.6/
Latest version is:
vdsm-cli-4.17.32-1.el7.noarch.rpm 08-Aug-2016 17:36
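
In case it helps anyone watching for the same fix: below is a minimal sketch
(Python 3) that polls the mirror's directory index for the fixed build. The
URL is the one above; the "vdsm-4.17.34" string is an assumption based on the
version Francesco mentioned, and the sketch assumes the mirror keeps serving
a plain directory listing.

import time
from urllib.request import urlopen

REPO = "http://mirror.centos.org/centos/7/virt/x86_64/ovirt-3.6/"

def fixed_build_published():
    # The mirror serves a plain HTML directory listing, so a simple
    # substring check is enough for a yes/no answer.
    index = urlopen(REPO).read().decode("utf-8", errors="replace")
    return "vdsm-4.17.34" in index

while not fixed_build_published():
    time.sleep(3600)  # re-check hourly
print("vdsm 4.17.34 is available in the repo")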

On Fri, Oct 14, 2016 at 1:11 AM, Francesco Romani wrote:

>
> ----- Original Message -----
> > From: "Simone Tiraboschi" 
> > To: "Steve Dainard" , "Francesco Romani" <
> from...@redhat.com>
> > Cc: "users" 
> > Sent: Friday, October 14, 2016 9:59:49 AM
> > Subject: Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
> >
> > On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard wrote:
> >
> > > Hello,
> > >
> > > I had a hypervisor semi-crash this week: 4 of ~10 VMs continued to run,
> > > but the others were somehow killed off, and all VMs running on this
> > > host had '?' status in the oVirt UI.
> > >
> > > This appears to have been caused by vdsm logs filling up disk space
> > > on the logging partition.
> > >
> > > I've attached the log file vdsm.log.27.xz, which shows this error:
> > >
> > > vdsm.Scheduler::DEBUG::2016-10-11
> > > 16:42:09,318::executor::216::Executor::(_discard)
> > > Worker discarded: <Worker name=periodic/... running <Operation
> > > action=<VmDispatcher operation=<class 'virt.periodic.DriveWatermarkMonitor'>
> > > at 0x7f8e90021210> at 0x7f8e90021250> discarded at 0x7f8dd123e850>
> > >
> > > which happens more and more frequently throughout the log.
> > >
> > > It was a bit difficult to understand what caused the failure, but the
> > > logs were getting really large, then being xz'd, which compressed 11G+
> > > into a few MB. Once this happened the disk space would be freed, so
> > > nagios wouldn't hit its 3rd consecutive check to throw a warning until
> > > pretty much right at the crash.
> > >
> > > I was able to restart vdsmd to resolve the issue, but I still need to
> > > know why these logs started to stack up so I can avoid this issue in
> > > the future.
> > >
> >
> > We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
> > but in your case the logs are rotating.
> > Francesco?
>
> Hi,
>
> yes, it is a different issue. Here the log messages are caused by the
> Worker threads of the periodic subsystem, which are leaking[1].
> This was a bug in Vdsm (insufficient protection against rogue domains),
> but the real problem is that some of your domains are unresponsive at the
> hypervisor level.
> The most likely cause is, in turn, unresponsive storage.
>
> Fixes have been committed and shipped with Vdsm 4.17.34.
>
> See: https://bugzilla.redhat.com/1364925
>
> HTH,
>
> +++
>
> [1] Actually, they are replaced too quickly, leading to unbounded growth.
> So those workers aren't really "leaking"; Vdsm is just overzealous in
> handling one error condition, making things worse than before.
> Still a serious issue, no doubt, but with quite a different cause.
>
> --
> Francesco Romani
> Red Hat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition

2016-10-14 Thread Francesco Romani

----- Original Message -----
> From: "Simone Tiraboschi" 
> To: "Steve Dainard" , "Francesco Romani" 
> 
> Cc: "users" 
> Sent: Friday, October 14, 2016 9:59:49 AM
> Subject: Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
> 
> On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard  wrote:
> 
> > Hello,
> >
> > I had a hypervisor semi-crash this week: 4 of ~10 VMs continued to run,
> > but the others were somehow killed off, and all VMs running on this host
> > had '?' status in the oVirt UI.
> >
> > This appears to have been caused by vdsm logs filling up disk space on the
> > logging partition.
> >
> > I've attached the log file vdsm.log.27.xz, which shows this error:
> >
> > vdsm.Scheduler::DEBUG::2016-10-11
> > 16:42:09,318::executor::216::Executor::(_discard)
> > Worker discarded: <Worker name=periodic/... running <Operation
> > action=<VmDispatcher operation=<class 'virt.periodic.DriveWatermarkMonitor'>
> > at 0x7f8e90021210> at 0x7f8e90021250> discarded at 0x7f8dd123e850>
> >
> > which happens more and more frequently throughout the log.
> >
> > It was a bit difficult to understand what caused the failure, but the logs
> > were getting really large, then being xz'd, which compressed 11G+ into a
> > few MB. Once this happened the disk space would be freed, so nagios
> > wouldn't hit its 3rd consecutive check to throw a warning until pretty
> > much right at the crash.
> >
> > I was able to restart vdsmd to resolve the issue, but I still need to know
> > why these logs started to stack up so I can avoid this issue in the future.
> >
> 
> We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
> but in your case the logs are rotating.
> Francesco?

Hi,

yes, it is a different issue. Here the log messages are caused by the Worker
threads of the periodic subsystem, which are leaking[1].
This was a bug in Vdsm (insufficient protection against rogue domains), but the
real problem is that some of your domains are unresponsive at the hypervisor
level.
The most likely cause is, in turn, unresponsive storage.

Fixes have been committed and shipped with Vdsm 4.17.34.

See: https://bugzilla.redhat.com/1364925

HTH,

+++

[1] Actually, they are replaced too quickly, leading to unbounded growth.
So those workers aren't really "leaking"; Vdsm is just overzealous in handling
one error condition, making things worse than before.
Still a serious issue, no doubt, but with quite a different cause.
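
To make the failure mode concrete, here is a small standalone sketch
(illustration only, not vdsm's actual code): a periodic loop that discards a
worker stuck past its deadline, logs the discard, and starts a replacement.
With an operation that never returns, e.g. blocked on unresponsive storage,
both the abandoned threads and the log lines grow without bound.

import logging
import threading
import time

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("Executor")

def blocked_operation():
    # Stands in for a periodic task stuck on an unresponsive storage domain.
    time.sleep(3600)

def periodic_loop(deadline=1.0, cycles=3):
    # In vdsm this loop runs forever; three cycles are enough to show the
    # pattern.
    for _ in range(cycles):
        worker = threading.Thread(target=blocked_operation)
        worker.daemon = True
        worker.start()
        worker.join(deadline)
        if worker.is_alive():
            # The stuck thread cannot be killed, only abandoned: that is the
            # "leak". Every discard logs a line, and a fresh worker is
            # created on the next cycle, hence the unbounded growth.
            log.debug("Worker discarded: %r", worker)

periodic_loop()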

-- 
Francesco Romani
Red Hat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition

2016-10-14 Thread Simone Tiraboschi
On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard  wrote:

> Hello,
>
> I had a hypervisor semi-crash this week: 4 of ~10 VMs continued to run,
> but the others were somehow killed off, and all VMs running on this host
> had '?' status in the oVirt UI.
>
> This appears to have been caused by vdsm logs filling up disk space on the
> logging partition.
>
> I've attached the log file vdsm.log.27.xz, which shows this error:
>
> vdsm.Scheduler::DEBUG::2016-10-11
> 16:42:09,318::executor::216::Executor::(_discard)
> Worker discarded: <Worker name=periodic/... running <Operation
> action=<VmDispatcher operation=<class 'virt.periodic.DriveWatermarkMonitor'>
> at 0x7f8e90021210> at 0x7f8e90021250> discarded at 0x7f8dd123e850>
>
> which happens more and more frequently throughout the log.
>
> It was a bit difficult to understand what caused the failure, but the logs
> were getting really large, then being xz'd, which compressed 11G+ into a
> few MB. Once this happened the disk space would be freed, so nagios
> wouldn't hit its 3rd consecutive check to throw a warning until pretty
> much right at the crash.
>
> I was able to restart vdsmd to resolve the issue, but I still need to know
> why these logs started to stack up so I can avoid this issue in the future.
>

We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
but in your case the logs are rotating.
Francesco?
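
One way to see how fast the discards ramp up before rotation hides the
evidence: a minimal sketch that counts "Worker discarded" lines per minute.
It assumes the default /var/log/vdsm/vdsm.log location and the "::"-separated
header shown in the excerpt above; rotated .xz copies can be read with
lzma.open() instead of open().

from collections import Counter

counts = Counter()
with open("/var/log/vdsm/vdsm.log") as logfile:
    for line in logfile:
        if "Worker discarded" not in line:
            continue
        # Field 3 of the "::"-separated header is the timestamp,
        # e.g. "2016-10-11 16:42:09,318"; keep minute granularity.
        minute = line.split("::")[2][:16]
        counts[minute] += 1

for minute, hits in sorted(counts.items()):
    print(minute, hits)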


>
> Hypervisor host info:
> CentOS 7
> # rpm -qa | grep vdsm
> vdsm-yajsonrpc-4.17.32-1.el7.noarch
> vdsm-xmlrpc-4.17.32-1.el7.noarch
> vdsm-infra-4.17.32-1.el7.noarch
> vdsm-hook-vmfex-dev-4.17.32-1.el7.noarch
> vdsm-python-4.17.32-1.el7.noarch
> vdsm-4.17.32-1.el7.noarch
> vdsm-cli-4.17.32-1.el7.noarch
> vdsm-jsonrpc-4.17.32-1.el7.noarch
>
> Engine host info:
> CentOS 7
> $ rpm -qa | grep ovirt
> ovirt-engine-lib-3.6.7.5-1.el7.centos.noarch
> ovirt-iso-uploader-3.6.0-1.el7.centos.noarch
> ovirt-engine-wildfly-overlay-8.0.5-1.el7.noarch
> ovirt-engine-webadmin-portal-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-jboss-as-7.1.1-1.el7.x86_64
> ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.7.5-1.el7.centos.noarch
> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
> ovirt-engine-vmconsole-proxy-helper-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-backend-3.6.7.5-1.el7.centos.noarch
> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
> ovirt-engine-setup-plugin-websocket-proxy-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-websocket-proxy-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-tools-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-setup-base-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-setup-3.6.7.5-1.el7.centos.noarch
> ovirt-vmconsole-1.0.2-1.el7.centos.noarch
> ovirt-engine-wildfly-8.2.1-1.el7.x86_64
> ovirt-engine-tools-backup-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-userportal-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-3.6.7.5-1.el7.centos.noarch
> ovirt-release35-006-1.noarch
> ovirt-engine-extension-aaa-ldap-1.1.0-0.0.master.20151021074904.git92c5c31.el7.noarch
> ovirt-release36-3.6.7-1.noarch
> ovirt-engine-setup-plugin-ovirt-engine-3.6.7.5-1.el7.centos.noarch
> ovirt-host-deploy-java-1.4.1-1.el7.centos.noarch
> ovirt-image-uploader-3.6.0-1.el7.centos.noarch
> ovirt-engine-dbscripts-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-sdk-python-3.6.3.0-1.el7.noarch
> ovirt-engine-extension-aaa-jdbc-1.0.7-1.el7.noarch
> ovirt-engine-extensions-api-impl-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-restapi-3.6.7.5-1.el7.centos.noarch
> ovirt-engine-setup-plugin-ovirt-engine-common-3.6.7.5-1.el7.centos.noarch
> ovirt-vmconsole-proxy-1.0.2-1.el7.centos.noarch
> ovirt-engine-cli-3.6.2.0-1.el7.centos.noarch
>
>
> Thanks,
> Steve
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users