[PATCH v2 0/8] thread_info cleanups and stack caching
This is the last bit of the vmap stack pile.  Now that thread_info is
non-magical, we can free the thread stack as soon as the task is dead
(without waiting for RCU) and then, if vmapped stacks are in use,
cache the entire stack for reuse on the same cpu.

In a simple test, this gives an overall speedup of about 0.5-1 µs per
pthread_create/join pair compared to the old CONFIG_VMAP_STACK=n
baseline -- a percpu cache of vmalloced stacks appears to be a bit
faster than a high-order stack allocation, at least when the cache
hits.  (I expect that workloads with a low cache hit rate are likely
to be dominated by other effects anyway.)

Changes from v1:
 - Rebased.
 - Added a comment fixup (patch 1).
 - Added one more try_get_task_stack() that Josh noticed.

Changes from before:
 - A bunch of the series is already in 4.8-rc.
 - Added the get_wchan() and collect_syscall() patches.
 - Rebased.

Andy Lutomirski (7):
  x86/entry/64: Fix a minor comment rebase error
  sched: Add try_get_task_stack() and put_task_stack()
  x86/dumpstack: Pin the target stack when dumping it
  x86/process: Pin the target stack in get_wchan()
  lib/syscall: Pin the task stack in collect_syscall()
  sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
  fork: Cache two thread stacks per cpu if CONFIG_VMAP_STACK is set

Oleg Nesterov (1):
  kthread: to_live_kthread() needs try_get_task_stack()

 arch/x86/entry/entry_64.S      |  1 -
 arch/x86/kernel/dumpstack_32.c |  5 +++
 arch/x86/kernel/dumpstack_64.c |  5 +++
 arch/x86/kernel/process.c      | 22 +++---
 arch/x86/kernel/stacktrace.c   |  5 +++
 include/linux/init_task.h      |  4 +-
 include/linux/sched.h          | 30 +
 init/Kconfig                   |  3 ++
 kernel/fork.c                  | 97 +-
 kernel/kthread.c               |  8 +++-
 kernel/sched/core.c            |  4 ++
 lib/syscall.c                  | 15 ++-
 12 files changed, 176 insertions(+), 23 deletions(-)

-- 
2.7.4
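
For anyone skimming the series, here is a minimal sketch of the rule
that the pinning patches (2-5) establish: once the stack can be freed
the moment a task dies, any code that walks another task's stack has
to pin it first.  This shows the calling pattern only -- example_wchan()
and its body are illustrative, not text from the series:

	#include <linux/sched.h>

	static unsigned long example_wchan(struct task_struct *p)
	{
		unsigned long wchan = 0;

		/* Pin p's stack; fails if the task died and its stack
		 * was already freed. */
		if (!try_get_task_stack(p))
			return 0;

		/* ... safely walk p's stack here, as get_wchan() does ... */

		put_task_stack(p);	/* unpin; the stack may be freed now */
		return wchan;
	}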
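
And a hedged sketch of the percpu cache idea from patch 8.  The helper
names and the fallback allocation below are simplified stand-ins for
the real kernel/fork.c changes (which, for instance, stash the
vm_struct in task_struct and allocate with __vmalloc_node_range() at
THREAD_SIZE alignment rather than plain vmalloc()):

	#include <linux/percpu.h>
	#include <linux/vmalloc.h>

	#define NR_CACHED_STACKS 2
	static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);

	static void *alloc_thread_stack(void)
	{
		int i;

		for (i = 0; i < NR_CACHED_STACKS; i++) {
			/* Atomically claim a dead stack cached on this cpu, if any. */
			struct vm_struct *s = this_cpu_xchg(cached_stacks[i], NULL);

			if (s)
				return s->addr;
		}

		/* Cache miss: pay for a fresh vmapped allocation. */
		return vmalloc(THREAD_SIZE);
	}

	static void free_thread_stack(void *stack)
	{
		struct vm_struct *vm = find_vm_area(stack);
		int i;

		for (i = 0; i < NR_CACHED_STACKS; i++) {
			/* Park the stack in an empty slot on this cpu... */
			if (this_cpu_cmpxchg(cached_stacks[i], NULL, vm) == NULL)
				return;
		}

		/* ...or give it back to vmalloc if both slots are full. */
		vfree(stack);
	}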