Public bug reported:

[Impact]
CPU utilization stays high, and the flamegraph[1] shows that the CPU
is busy updating load averages in the for loop inside the
update_blocked_averages() function. OOM also occurs because the
decayed cfs_rqs are not released.

[Fix]
commit a9e7f6544b9cebdae54d29f87a7ba2a83c0471b5
Author: Tejun Heo <[email protected]>
Date:   Tue Apr 25 17:43:50 2017 -0700

sched/fair: Fix O(nr_cgroups) in load balance path
    
Currently, rq->leaf_cfs_rq_list is a traversal ordered list of all
live cfs_rqs which have ever been active on the CPU; unfortunately,
this makes update_blocked_averages() O(# total cgroups) which isn't
scalable at all.
    
This shows up as a small CPU consumption and scheduling latency
increase in the load balancing path in systems with CPU controller
enabled across most cgroups.  In an edge case where temporary cgroups
were leaking, this caused the kernel to consume good several tens of
percents of CPU cycles running update_blocked_averages(), each run
taking multiple millisecs.
    
This patch fixes the issue by taking empty and fully decayed cfs_rqs
off the rq->leaf_cfs_rq_list. 

[Test]
1). Run the script
#!/bin/bash

for i in $(seq 1 10); do
        ( for j in $(seq 1 3000); do ssh -S none u@localhost date; done; echo "done $i" ) &
done

2). Observe the cfs_rqs
$ watch -n1 "grep cfs_rq /proc/sched_debug| wc -l"

3). Observe the CPU utilization rate
$ sudo htop

With the patched kernel[2], the CPU utilization rate is normal, the
number of cfs_rqs decreases periodically, and memory usage stays bounded.
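
To compare kernels side by side, the count from step 2 can be logged with
timestamps instead of watched interactively. The following is only a minimal
sketch: the function names count_cfs_rqs and log_cfs_rqs, the sample count,
and the interval are hypothetical and not part of the original test
procedure. It assumes the same /proc/sched_debug file used in step 2.

```shell
#!/bin/bash
# Hypothetical helpers for step 2; the names and defaults are illustrative only.

# Count the lines mentioning cfs_rq in a sched_debug-style file.
# Equivalent to: grep cfs_rq FILE | wc -l
count_cfs_rqs() {
    local src="${1:-/proc/sched_debug}"
    grep -c 'cfs_rq' "$src"
}

# Print "<epoch seconds> <count>" once per interval, for a fixed number of samples.
# Arguments: FILE (default /proc/sched_debug), SAMPLES (default 60),
#            INTERVAL in seconds (default 1).
log_cfs_rqs() {
    local src="${1:-/proc/sched_debug}"
    local samples="${2:-60}"
    local interval="${3:-1}"
    local i
    for ((i = 0; i < samples; i++)); do
        printf '%s %s\n' "$(date +%s)" "$(count_cfs_rqs "$src")"
        sleep "$interval"
    done
}
```

After sourcing the functions, running `log_cfs_rqs /proc/sched_debug 600 1 > cfs_rqs.log`
while the script from step 1 is active would record ten minutes of samples.
On an unpatched kernel the count would be expected to keep growing, while on
the patched kernel it should drop periodically, as described above.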

[Reference]
[1]. http://kernel.ubuntu.com/~gavinguo/168887/2018-01-31_07-38-45.perf.data.svg
[2]. https://launchpad.net/~mimi0213kimo/+archive/ubuntu/cfs-rq-clean

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Gavin Guo (mimi0213kimo)
         Status: Incomplete


** Tags: sts

** Attachment added: "flamegraph of the high CPU utilization symptom"
   
https://bugs.launchpad.net/bugs/1747896/+attachment/5050575/+files/2018-01-31_07-38-45.perf.data.svg

https://bugs.launchpad.net/bugs/1747896

Title:
  OOM and High CPU utilization in update_blocked_averages because of too
  many cfs_rqs in rq->leaf_cfs_rq_list
