This patchset aims to reduce contention on the global pool_lock while improving performance at the same time. It resolves the following soft lockup problem seen with a debug kernel on some large SMP systems:

  NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
  ...
  RIP: 0010:[<ffffffff817c216b>]  [<ffffffff817c216b>] _raw_spin_unlock_irqrestore+0x3b/0x60
  ...
  Call Trace:
   [<ffffffff813f40d1>] free_object+0x81/0xb0
   [<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
   [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
   [<ffffffff81284996>] ? file_free_rcu+0x36/0x60
   [<ffffffff81251712>] kmem_cache_free+0xd2/0x380
   [<ffffffff81284960>] ? fput+0x90/0x90
   [<ffffffff81284996>] file_free_rcu+0x36/0x60
   [<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
   [<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
   [<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
   [<ffffffff810c59d1>] kthread+0x101/0x120
   [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
   [<ffffffff817c2d32>] ret_from_fork+0x22/0x50

On an 8-socket IvyBridge-EX system (120 cores, 240 threads), the elapsed time of a 4.9-rc7 kernel parallel build (make -j 240) was reduced from 7m57s to 7m19s with a patched 4.9-rc7 kernel. There was also about a 10X reduction in the number of debug objects being allocated from or freed to the kmemcache during the kernel build.

Waiman Long (3):
  debugobjects: Scale thresholds with # of CPUs
  debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
  debugobjects: Reduce contention on the global pool_lock

 lib/debugobjects.c | 57 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 12 deletions(-)

-- 
1.8.3.1