Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state
I'd prefer the kernel to do such clustering... I think that is a next step. Also, while the kernel can do this at a best effort basis, it cannot take into account things the kernel doesn't know about, like high priority job peak load etc.., things a job scheduler would know. Then again, a job scheduler would likely already know about the AVX state anyway. the job scheduler can guess. unless it can also *measure* it won't know for sure... so even in that scenario having a decent way to report actuals is useful
Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state
On Mon, Nov 12, 2018 at 06:31:47AM +0100, Ingo Molnar wrote: > > * Peter Zijlstra wrote: > > > On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote: > > > > > > * Aubrey Li wrote: > > > > > > > Expose the per-task cpu specific thread state value, it's helpful > > > > for userland to classify and schedule the tasks by different policies > > > > > > That's pretty vague - what exactly would use this information? I'm sure > > > you have a usecase in mind - could you please describe it? > > > > Yeah, "thread_state" is a pretty terrible name for this. The use-case is > > detectoring which tasks use AVX3 such that a userspace component (think > > job scheduler) can cluster them together. > > I'd prefer the kernel to do such clustering... I think that is a next step. Also, while the kernel can do this at a best effort basis, it cannot take into account things the kernel doesn't know about, like high priority job peak load etc.., things a job scheduler would know. Then again, a job scheduler would likely already know about the AVX state anyway.
Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state
On 2018/11/12 13:31, Ingo Molnar wrote: > > * Peter Zijlstra wrote: > >> On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote: >>> >>> * Aubrey Li wrote: >>> Expose the per-task cpu specific thread state value, it's helpful for userland to classify and schedule the tasks by different policies >>> >>> That's pretty vague - what exactly would use this information? I'm sure >>> you have a usecase in mind - could you please describe it? >> >> Yeah, "thread_state" is a pretty terrible name for this. The use-case is >> detectoring which tasks use AVX3 such that a userspace component (think >> job scheduler) can cluster them together. > > I'd prefer the kernel to do such clustering... > Some userland application projects like Kubernetes request an interface of such information, we could do the clustering either in the kernel or from userland, does it make sense we expose the information via the proposed interface first? Thanks, -Aubrey
Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state
* Peter Zijlstra wrote: > On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote: > > > > * Aubrey Li wrote: > > > > > Expose the per-task cpu specific thread state value, it's helpful > > > for userland to classify and schedule the tasks by different policies > > > > That's pretty vague - what exactly would use this information? I'm sure > > you have a usecase in mind - could you please describe it? > > Yeah, "thread_state" is a pretty terrible name for this. The use-case is > detectoring which tasks use AVX3 such that a userspace component (think > job scheduler) can cluster them together. I'd prefer the kernel to do such clustering... Thanks, Ingo
Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state
On 2018/11/8 18:17, Peter Zijlstra wrote: > On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote: >> >> * Aubrey Li wrote: >> >>> Expose the per-task cpu specific thread state value, it's helpful >>> for userland to classify and schedule the tasks by different policies >> >> That's pretty vague - what exactly would use this information? I'm sure >> you have a usecase in mind - could you please describe it? > > Yeah, "thread_state" is a pretty terrible name for this. task_struct has a CPU specific element "thread", I quote it to here to create a generic interface to expose CPU specific state of the task. Like /proc//stat, I plan to use this "thread_state" to host any CPU specific state, including AVX state(now only), and any other states(may come soon). So this interface may be extended in future like the following: #cat /proc//thread_state 1 0 0 > The use-case is > detectoring which tasks use AVX3 such that a userspace component (think > job scheduler) can cluster them together.> > The 'problem' is that running AVX2+ code drops the max clock, because > you light up the massive wide (and thus large area) paths. Thanks to explain this. > > So maybe something like "simd_active" is a better name, dunno. As above, maybe the name can be hidden by thread_state? > > Or maybe "simd_class" and we can write out 0,1,2,3 depending on the AVX > class being used, dunno. It might make sense to look at what other arch > SIMD stuff looks like to form this interface. > A task may use 1,2,3 simultaneously, as a scheduler hint, it's just cluster or not, so 0/1 may be good enough. Thanks, -Aubrey
Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state
On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote: > > * Aubrey Li wrote: > > > Expose the per-task cpu specific thread state value, it's helpful > > for userland to classify and schedule the tasks by different policies > > That's pretty vague - what exactly would use this information? I'm sure > you have a usecase in mind - could you please describe it? Yeah, "thread_state" is a pretty terrible name for this. The use-case is detectoring which tasks use AVX3 such that a userspace component (think job scheduler) can cluster them together. The 'problem' is that running AVX2+ code drops the max clock, because you light up the massive wide (and thus large area) paths. So maybe something like "simd_active" is a better name, dunno. Or maybe "simd_class" and we can write out 0,1,2,3 depending on the AVX class being used, dunno. It might make sense to look at what other arch SIMD stuff looks like to form this interface.
Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state
* Aubrey Li wrote: > Expose the per-task cpu specific thread state value, it's helpful > for userland to classify and schedule the tasks by different policies That's pretty vague - what exactly would use this information? I'm sure you have a usecase in mind - could you please describe it? Thanks, Ingo
[RFC PATCH v1 2/2] proc: add /proc//thread_state
Expose the per-task cpu specific thread state value, it's helpful for userland to classify and schedule the tasks by different policies Signed-off-by: Aubrey Li Cc: Peter Zijlstra Cc: Andi Kleen Cc: Tim Chen Cc: Arjan van de Ven --- arch/x86/kernel/fpu/xstate.c | 13 + fs/proc/base.c | 13 + 2 files changed, 26 insertions(+) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 87a57b7..5a4a1d5 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -1245,3 +1246,15 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf) return 0; } + +/* + * report AVX state + */ +void arch_thread_state(struct seq_file *m, struct task_struct *task) +{ + if (task->thread.fpu.avx.state) + seq_putc(m, '1'); + else + seq_putc(m, '0'); + seq_putc(m, '\n'); +} diff --git a/fs/proc/base.c b/fs/proc/base.c index aaffc0c..be24327 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2893,6 +2893,17 @@ static int proc_pid_patch_state(struct seq_file *m, struct pid_namespace *ns, } #endif /* CONFIG_LIVEPATCH */ +void __weak arch_thread_state(struct seq_file *m, struct task_struct *task) +{ +} + +static int proc_pid_thread_state(struct seq_file *m, struct pid_namespace *ns, +struct pid *pid, struct task_struct *task) +{ + arch_thread_state(m, task); + return 0; +} + /* * Thread groups */ @@ -2994,6 +3005,7 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_LIVEPATCH ONE("patch_state", S_IRUSR, proc_pid_patch_state), #endif + ONE("thread_state", S_IRUSR, proc_pid_thread_state), }; static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx) @@ -3372,6 +3384,7 @@ static const struct pid_entry tid_base_stuff[] = { #ifdef CONFIG_LIVEPATCH ONE("patch_state", S_IRUSR, proc_pid_patch_state), #endif + ONE("thread_state", S_IRUSR, proc_pid_thread_state), }; static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx) -- 2.7.4