Hi all,

I did some more debugging with pdb, and the problem seems to be somehow
connected to this eventlet issue:
https://github.com/eventlet/eventlet/issues/30

I don't have a clue whether it has any connection to the RabbitMQ
heartbeat thing, but if I change the self.wait(0) call to self.wait(0.1)
in eventlet/hubs/hub.py, the CPU usage drops significantly.
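For reference, the place I patched is the hub's main loop; simplified,
it looks something like this (paraphrased from the eventlet version I
have installed, so details may differ in other releases):

    # eventlet/hubs/hub.py, BaseHub.run() -- simplified excerpt
    while not self.stopping:
        self.prepare_timers()
        self.fire_timers(self.clock())
        self.prepare_timers()
        wakeup_when = self.sleep_until()
        if wakeup_when is None:
            sleep_time = self.default_sleep()
        else:
            sleep_time = wakeup_when - self.clock()
        if sleep_time > 0:
            self.wait(sleep_time)
        else:
            self.wait(0)  # <- my hack changes this to self.wait(0.1);
                          #    with 0 the hub never blocks in epoll_wait()

So whenever a timer is (or appears to be) always due, the hub calls
wait(0) over and over and busy-loops. Obviously the 0.1 is just a
band-aid, not a proper fix; the real question is why there is always an
expired timer.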
Br,
György

> -----Original Message-----
> From: Gyorgy Szombathelyi [mailto:gyorgy.szombathe...@doclerholding.com]
> Sent: 2016 February 17, Wednesday 14:47
> To: 'openstack-dev@lists.openstack.org' <openstack-d...@lists.openstack.org>
> Subject: Re: [openstack-dev] [ceilometer]ceilometer-collector high CPU
> usage
>
> Hi Gordon,
>
> > hi,
> >
> > this seems to be similar to a bug we were tracking earlier[1].
> > basically, any service with a listener never seemed to idle properly.
> >
> > based on earlier investigation, we found it relates to the heartbeat
> > functionality in oslo.messaging. i'm not entirely sure if it's because
> > of it or some combination of things including it. the short answer is
> > to disable heartbeat by setting heartbeat_timeout_threshold = 0 and
> > see if it fixes your cpu usage. you can track the comments in the bug.
>
> As I see in the bug report, you mention that the problem is only with
> the notification agent, and the collector is fine. I'm in the entirely
> opposite situation.
>
> strace-ing the two processes:
>
> Notification agent:
> -------------------
> epoll_wait(4, {}, 1023, 43) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_ctl(4, EPOLL_CTL_DEL, 8,
> {EPOLLWRNORM|EPOLLMSG|EPOLLERR|EPOLLHUP|EPOLLRDHUP|EPOLLONESHOT|EPOLLET|0x1ec88000,
> {u32=32738, u64=24336577484324834}}) = 0
> recvfrom(8, 0x7fe2da3a4084, 7, 0, 0, 0) = -1 EAGAIN (Resource
> temporarily unavailable)
> epoll_ctl(4, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP,
> {u32=8, u64=40046962262671368}}) = 0
> epoll_wait(4, {}, 1023, 1) = 0
> epoll_ctl(4, EPOLL_CTL_DEL, 24,
> {EPOLLWRNORM|EPOLLMSG|EPOLLERR|EPOLLHUP|EPOLLRDHUP|EPOLLONESHOT|EPOLLET|0x1ec88000,
> {u32=32738, u64=24336577484324834}}) = 0
> recvfrom(24, 0x7fe2da3a4084, 7, 0, 0, 0) = -1 EAGAIN (Resource
> temporarily unavailable)
> epoll_ctl(4, EPOLL_CTL_ADD, 24, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP,
> {u32=24, u64=40046962262671384}}) = 0
> epoll_wait(4, {}, 1023, 0) = 0
>
> ceilometer-collector:
> ---------------------
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
> epoll_wait(4, {}, 1023, 0) = 0
>
> So the notification agent at least does something between the crazy
> epoll()s, while the collector calls epoll_wait() with a zero timeout
> in a tight loop and never blocks.
>
> It is the same with or without heartbeat_timeout_threshold = 0 in
> [oslo_messaging_rabbit]. Then something must still be wrong with the
> listeners; the bug[1] should not be closed, I think.
>
> Br,
> György
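By the way, the collector's tight loop is easy to reproduce outside of
ceilometer: the difference between the two straces above comes down to
the timeout argument of epoll_wait(). Here is a small stand-alone
illustration I used to convince myself (my own toy script, not
ceilometer code; Linux-only, since it uses epoll directly):

    import select
    import time

    ep = select.epoll()  # nothing registered, poll() always returns []

    def count_polls(timeout, seconds=1.0):
        """Count how many times epoll_wait() returns within `seconds`."""
        deadline = time.time() + seconds
        n = 0
        while time.time() < deadline:
            ep.poll(timeout)
            n += 1
        return n

    # A zero timeout returns immediately, so this loop spins at ~100%
    # CPU, just like the collector strace above.
    print('timeout 0.0: %d polls/s' % count_polls(0))
    # A 0.1 s timeout blocks in the kernel, so the loop mostly sleeps.
    print('timeout 0.1: %d polls/s' % count_polls(0.1))

The first loop is what the collector is effectively doing; the second
is roughly what my wait(0.1) hack turns it into.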
> > [1] https://bugs.launchpad.net/oslo.messaging/+bug/1478135
> >
> > On 17/02/2016 4:14 AM, Gyorgy Szombathelyi wrote:
> > > Hi!
> > >
> > > Excuse me if the following question/problem is a basic one, an
> > > already known problem, or even a bad setup on my side.
> > >
> > > I just noticed that the most CPU-consuming process in an idle
> > > OpenStack cluster is ceilometer-collector. When there are only
> > > 10-15 samples/minute, it constantly eats about 15-20% CPU.
> > >
> > > I started to debug, and noticed that it epoll()s constantly with a
> > > zero timeout, so it seems it just polls for events in a tight loop.
> > > I found out that _maybe_ the Python side of the problem is
> > > oslo_messaging.get_notification_listener() with the eventlet
> > > executor. A quick search showed that this function is only used in
> > > aodh_listener and ceilometer_collector, and both are using
> > > relatively high CPU even if they're just 'listening'.
> > >
> > > My skills for further debugging are limited, but I'm just curious
> > > why this listener uses so much CPU, while other executors, which
> > > are also using eventlet, are not that bad.
> > >
> > > Br,
> > > György
> >
> > --
> > gord
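In case somebody wants to poke at this without a full ceilometer
installation: the listener setup from the last mail boils down to
roughly the following (a minimal sketch of the oslo.messaging API as I
understand it; the topic, the endpoint class, and the config handling
are made up here for illustration):

    # notification_listener_demo.py -- minimal sketch, see caveats above
    import oslo_messaging
    from oslo_config import cfg

    class DummyEndpoint(object):
        # oslo.messaging dispatches notifications by priority;
        # 'info' is the usual one.
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            print('%s %s' % (event_type, payload))

    # Broker connection details (e.g. the rabbit settings) come from
    # the oslo.config setup of whatever service embeds this.
    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='notifications')]

    listener = oslo_messaging.get_notification_listener(
        transport, targets, [DummyEndpoint()], executor='eventlet')
    listener.start()
    listener.wait()  # now watch the CPU usage of the idle process

Running something like this against an otherwise idle broker and
strace-ing it should show the same zero-timeout epoll_wait() pattern as
the collector above.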