Re: [ovirt-users] vdsm using 100% CPU, rapidly filling logs with _handle_event messages
On Wed, 18 Nov 2015 07:28:23 -0500 Robert wrote:
RS> On Wed, 18 Nov 2015 12:35:17 +0100 Vinzenz wrote:
RS> VF> On 11/12/2015 03:16 PM, Robert Story wrote:
RS> VF> > On Thu, 12 Nov 2015 16:02:59 +0200 Dan wrote:
RS> VF> > DK> On Thu, Nov 12, 2015 at 08:45:43AM -0500, Robert Story wrote:
RS> VF> > DK> > I'm running oVirt 3.5.x with a hosted engine. This morning I
RS> VF> > DK> > noticed that 2 of my 5 hosts were showing 99-100% cpu usage.
RS> VF> > DK> > Logging in to them, vdsmd seemed to be the culprit, and it
RS> VF> > DK> > was filling the log file with these messages:
RS> VF> > DK>
RS> VF> > DK> You're probably seeing
RS> VF> > DK> Bug 1226911 - vmchannel thread consumes 100% of CPU
RS> VF> > DK>
RS> VF> > DK> which was closed due to missing information. Do you have any
RS> VF> > DK> information on when this pops up? Is it reproducible? Would
RS> VF> > DK> you be able to test a suggested patch
RS> VF> > DK>
RS> VF> > DK> https://gerrit.ovirt.org/#/c/42570/
RS> VF> >
RS> VF> > Hi Dan,
RS> VF> >
RS> VF> > Thanks for the pointers. If it comes up again, I'll try this
RS> VF> > patch and report back on the bug...
RS> VF> >
RS> VF> Out of curiosity, did you happen to see that again?
RS>
RS> No, I have not.

So naturally it shows up again during a holiday. I came in today to find 1 of my 5 nodes (the SPM and the host where the hosted engine is running) with two vdsmd threads chewing up 90-100% of the CPU. I applied the patch from above and restarted vdsmd. This resulted in another node being selected as the SPM, and within about 15 minutes, that node had the same issue. I applied the patch to the new node and restarted vdsmd, and the SPM went back to the previous (now patched) node. Hopefully things will stay stable.

I've attached snippets of the logs from the SPM when the problem started, along with the server/engine log snippets from the hosted engine around the same time.
Robert

--
Senior Software Engineer @ Parsons

vdsm-runaway.tgz
Description: application/compressed-tar

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
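A quick way to check whether a host is showing the same symptom is to tally the _handle_event INFO messages per file descriptor. The following is only a rough sketch, not part of vdsm: the function name and regular expression are hypothetical, and it assumes unwrapped log lines in the format quoted in the original report in this thread.

```python
import re
from collections import Counter

# Hypothetical pattern for the INFO form quoted in this thread, e.g.
# "...::(_handle_event) Received 0011 on fileno 119"
EVENT_RE = re.compile(r"\(_handle_event\) Received ([0-9A-Fa-f]+) on fileno (\d+)")


def count_events_by_fileno(lines):
    """Count _handle_event INFO messages per file descriptor number."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# Sample lines copied (unwrapped) from the log excerpt in the original report.
sample = [
    "VM Channels Listener::INFO::2015-11-12 08:09:26,293::vmchannels::54::vds::(_handle_event) Received 0011 on fileno 119",
    "VM Channels Listener::DEBUG::2015-11-12 08:09:26,293::vmchannels::59::vds::(_handle_event) Received 0011. On fd removed by epoll.",
    "VM Channels Listener::INFO::2015-11-12 08:09:26,293::vmchannels::54::vds::(_handle_event) Received 0011 on fileno 75",
    "VM Channels Listener::INFO::2015-11-12 08:09:26,294::vmchannels::54::vds::(_handle_event) Received 0011 on fileno 119",
]

for fileno, count in count_events_by_fileno(sample).most_common():
    print(fileno, count)
```

To run it against a real log, pass an open file object instead of the sample list. A handful of descriptors accounting for thousands of messages within the same second would match what was reported here.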
Re: [ovirt-users] vdsm using 100% CPU, rapidly filling logs with _handle_event messages
On Wed, 18 Nov 2015 12:35:17 +0100 Vinzenz wrote:
VF> On 11/12/2015 03:16 PM, Robert Story wrote:
VF> > On Thu, 12 Nov 2015 16:02:59 +0200 Dan wrote:
VF> > DK> On Thu, Nov 12, 2015 at 08:45:43AM -0500, Robert Story wrote:
VF> > DK> > I'm running oVirt 3.5.x with a hosted engine. This morning I
VF> > DK> > noticed that 2 of my 5 hosts were showing 99-100% cpu usage.
VF> > DK> > Logging in to them, vdsmd seemed to be the culprit, and it was
VF> > DK> > filling the log file with these messages:
VF> > DK>
VF> > DK> You're probably seeing
VF> > DK> Bug 1226911 - vmchannel thread consumes 100% of CPU
VF> > DK>
VF> > DK> which was closed due to missing information. Do you have any
VF> > DK> information on when this pops up? Is it reproducible? Would you
VF> > DK> be able to test a suggested patch
VF> > DK>
VF> > DK> https://gerrit.ovirt.org/#/c/42570/
VF> >
VF> > Hi Dan,
VF> >
VF> > Thanks for the pointers. If it comes up again, I'll try this patch and
VF> > report back on the bug...
VF> >
VF> Out of curiosity, did you happen to see that again?

No, I have not.

Robert

--
Senior Software Engineer @ Parsons
Re: [ovirt-users] vdsm using 100% CPU, rapidly filling logs with _handle_event messages
On Thu, 12 Nov 2015 16:02:59 +0200 Dan wrote:
DK> On Thu, Nov 12, 2015 at 08:45:43AM -0500, Robert Story wrote:
DK> > I'm running oVirt 3.5.x with a hosted engine. This morning I noticed
DK> > that 2 of my 5 hosts were showing 99-100% cpu usage. Logging in to
DK> > them, vdsmd seemed to be the culprit, and it was filling the log file
DK> > with these messages:
DK>
DK> You're probably seeing
DK> Bug 1226911 - vmchannel thread consumes 100% of CPU
DK>
DK> which was closed due to missing information. Do you have any information
DK> on when this pops up? Is it reproducible? Would you be able to test a
DK> suggested patch
DK>
DK> https://gerrit.ovirt.org/#/c/42570/

Hi Dan,

Thanks for the pointers. If it comes up again, I'll try this patch and report back on the bug...

Robert

--
Senior Software Engineer @ Parsons
Re: [ovirt-users] vdsm using 100% CPU, rapidly filling logs with _handle_event messages
On Thu, Nov 12, 2015 at 08:45:43AM -0500, Robert Story wrote:
> I'm running oVirt 3.5.x with a hosted engine. This morning I noticed that 2
> of my 5 hosts were showing 99-100% cpu usage. Logging in to them, vdsmd
> seemed to be the culprit, and it was filling the log file with these
> messages:

You're probably seeing
Bug 1226911 - vmchannel thread consumes 100% of CPU

which was closed due to missing information. Do you have any information on when this pops up? Is it reproducible? Would you be able to test a suggested patch

https://gerrit.ovirt.org/#/c/42570/

Regards,
Dan.

>
> VM Channels Listener::DEBUG::2015-11-12 08:09:26,292::vmchannels::59::vds::(_handle_event) Received 0011. On fd removed by epoll.
> VM Channels Listener::INFO::2015-11-12 08:09:26,293::vmchannels::54::vds::(_handle_event) Received 0011 on fileno 119
> VM Channels Listener::DEBUG::2015-11-12 08:09:26,293::vmchannels::59::vds::(_handle_event) Received 0011. On fd removed by epoll.
> VM Channels Listener::INFO::2015-11-12 08:09:26,293::vmchannels::54::vds::(_handle_event) Received 0011 on fileno 75
> VM Channels Listener::DEBUG::2015-11-12 08:09:26,293::vmchannels::59::vds::(_handle_event) Received 0011. On fd removed by epoll.
> VM Channels Listener::INFO::2015-11-12 08:09:26,294::vmchannels::54::vds::(_handle_event) Received 0011 on fileno 119
> VM Channels Listener::DEBUG::2015-11-12 08:09:26,294::vmchannels::59::vds::(_handle_event) Received 0011. On fd removed by epoll.
>
> I googled to see how to change the debug level to turn off DEBUG messages
> for vdsm, which referred me to libvirtd.conf, but the debug level there was
> not set, which should have meant a log level of 3 (warnings and errors), so
> I'm not sure why the log was filling up with DEBUG/INFO messages.
>
> I restarted vdsmd, which resulted in those nodes being marked as
> 'disconnected', but they did eventually recover and loads went back to
> normal.
>
> This may or may not be related to the fact that the 3 hosts where this did
> not happen can't seem to keep their ha brokers up. I'll be starting a new
> thread on that shortly.
>
>
> Robert
>
> --
> Senior Software Engineer @ Parsons
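For anyone reading the log excerpt quoted above, it may help to decode the "0011" value that _handle_event reports. The sketch below assumes the value is an epoll event mask printed in hexadecimal (so 0x11); that is an inference from the message format, not something confirmed in this thread. The bit values are the Linux ones from <sys/epoll.h>, and the function name is hypothetical.

```python
# Linux epoll event bits (values from <sys/epoll.h>), hard-coded so the
# sketch also runs on platforms where the select module lacks epoll support.
EPOLL_FLAGS = {
    0x001: "EPOLLIN",
    0x002: "EPOLLPRI",
    0x004: "EPOLLOUT",
    0x008: "EPOLLERR",
    0x010: "EPOLLHUP",
}


def decode_epoll_mask(text):
    """Return the names of the flags set in a mask given as hexadecimal text.

    Assumption: the logged value is hexadecimal, so "0011" means 0x11.
    """
    mask = int(text, 16)
    return [name for bit, name in sorted(EPOLL_FLAGS.items()) if mask & bit]


print(decode_epoll_mask("0011"))  # ['EPOLLIN', 'EPOLLHUP']
```

If that reading is right, each event carries a hang-up flag. With level-triggered epoll, a hung-up descriptor keeps being reported until it is unregistered or closed, which would be consistent with the tight loop and the rapid log growth described here, though the thread itself does not establish the cause.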