Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state

2018-11-12 Thread Arjan van de Ven




I'd prefer the kernel to do such clustering...


I think that is a next step.

Also, while the kernel can do this at a best effort basis, it cannot
take into account things the kernel doesn't know about, like high
priority job peak load etc.., things a job scheduler would know.

Then again, a job scheduler would likely already know about the AVX
state anyway.


the job scheduler can guess.
unless it can also *measure* it won't know for sure...

so even in that scenario having a decent way to report actuals is useful








Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state

2018-11-12 Thread Peter Zijlstra
On Mon, Nov 12, 2018 at 06:31:47AM +0100, Ingo Molnar wrote:
> 
> * Peter Zijlstra  wrote:
> 
> > On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote:
> > > 
> > > * Aubrey Li  wrote:
> > > 
> > > > Expose the per-task cpu specific thread state value, it's helpful
> > > > for userland to classify and schedule the tasks by different policies
> > > 
> > > That's pretty vague - what exactly would use this information? I'm sure 
> > > you have a usecase in mind - could you please describe it?
> > 
> > Yeah, "thread_state" is a pretty terrible name for this. The use-case is
> > detectoring which tasks use AVX3 such that a userspace component (think
> > job scheduler) can cluster them together.
> 
> I'd prefer the kernel to do such clustering...

I think that is a next step.

Also, while the kernel can do this at a best effort basis, it cannot
take into account things the kernel doesn't know about, like high
priority job peak load etc.., things a job scheduler would know.

Then again, a job scheduler would likely already know about the AVX
state anyway.


Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state

2018-11-11 Thread Li, Aubrey
On 2018/11/12 13:31, Ingo Molnar wrote:
> 
> * Peter Zijlstra  wrote:
> 
>> On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote:
>>>
>>> * Aubrey Li  wrote:
>>>
 Expose the per-task cpu specific thread state value, it's helpful
 for userland to classify and schedule the tasks by different policies
>>>
>>> That's pretty vague - what exactly would use this information? I'm sure 
>>> you have a usecase in mind - could you please describe it?
>>
>> Yeah, "thread_state" is a pretty terrible name for this. The use-case is
>> detectoring which tasks use AVX3 such that a userspace component (think
>> job scheduler) can cluster them together.
> 
> I'd prefer the kernel to do such clustering...
> 
Some userland application projects like Kubernetes request an interface of 
such information, we could do the clustering either in the kernel or from
userland, does it make sense we expose the information via the proposed 
interface first?

Thanks,
-Aubrey


Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state

2018-11-11 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote:
> > 
> > * Aubrey Li  wrote:
> > 
> > > Expose the per-task cpu specific thread state value, it's helpful
> > > for userland to classify and schedule the tasks by different policies
> > 
> > That's pretty vague - what exactly would use this information? I'm sure 
> > you have a usecase in mind - could you please describe it?
> 
> Yeah, "thread_state" is a pretty terrible name for this. The use-case is
> detectoring which tasks use AVX3 such that a userspace component (think
> job scheduler) can cluster them together.

I'd prefer the kernel to do such clustering...

Thanks,

Ingo


Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state

2018-11-08 Thread Li, Aubrey
On 2018/11/8 18:17, Peter Zijlstra wrote:
> On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote:
>>
>> * Aubrey Li  wrote:
>>
>>> Expose the per-task cpu specific thread state value, it's helpful
>>> for userland to classify and schedule the tasks by different policies
>>
>> That's pretty vague - what exactly would use this information? I'm sure 
>> you have a usecase in mind - could you please describe it?
> 
> Yeah, "thread_state" is a pretty terrible name for this.

task_struct has a CPU specific element "thread", I quote it to here to 
create a generic interface to expose CPU specific state of the task.
Like /proc//stat, I plan to use this "thread_state" to host any CPU
specific state, including AVX state(now only), and any other states(may come
soon). So this interface may be extended in future like the following:

#cat /proc//thread_state
1 0 0


>
 The use-case is
> detectoring which tasks use AVX3 such that a userspace component (think
> job scheduler) can cluster them together.> 
> The 'problem' is that running AVX2+ code drops the max clock, because
> you light up the massive wide (and thus large area) paths.

Thanks to explain this.

> 
> So maybe something like "simd_active" is a better name, dunno.

As above, maybe the name can be hidden by thread_state?

> 
> Or maybe "simd_class" and we can write out 0,1,2,3 depending on the AVX
> class being used, dunno. It might make sense to look at what other arch
> SIMD stuff looks like to form this interface.
> 
A task may use 1,2,3 simultaneously, as a scheduler hint, it's just cluster
or not, so 0/1 may be good enough.

Thanks,
-Aubrey


Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state

2018-11-08 Thread Peter Zijlstra
On Thu, Nov 08, 2018 at 07:32:46AM +0100, Ingo Molnar wrote:
> 
> * Aubrey Li  wrote:
> 
> > Expose the per-task cpu specific thread state value, it's helpful
> > for userland to classify and schedule the tasks by different policies
> 
> That's pretty vague - what exactly would use this information? I'm sure 
> you have a usecase in mind - could you please describe it?

Yeah, "thread_state" is a pretty terrible name for this. The use-case is
detectoring which tasks use AVX3 such that a userspace component (think
job scheduler) can cluster them together.

The 'problem' is that running AVX2+ code drops the max clock, because
you light up the massive wide (and thus large area) paths.

So maybe something like "simd_active" is a better name, dunno.

Or maybe "simd_class" and we can write out 0,1,2,3 depending on the AVX
class being used, dunno. It might make sense to look at what other arch
SIMD stuff looks like to form this interface.


Re: [RFC PATCH v1 2/2] proc: add /proc//thread_state

2018-11-07 Thread Ingo Molnar


* Aubrey Li  wrote:

> Expose the per-task cpu specific thread state value, it's helpful
> for userland to classify and schedule the tasks by different policies

That's pretty vague - what exactly would use this information? I'm sure 
you have a usecase in mind - could you please describe it?

Thanks,

Ingo


[RFC PATCH v1 2/2] proc: add /proc//thread_state

2018-11-06 Thread Aubrey Li
Expose the per-task cpu specific thread state value, it's helpful
for userland to classify and schedule the tasks by different policies

Signed-off-by: Aubrey Li 
Cc: Peter Zijlstra 
Cc: Andi Kleen 
Cc: Tim Chen 
Cc: Arjan van de Ven 
---
 arch/x86/kernel/fpu/xstate.c | 13 +
 fs/proc/base.c   | 13 +
 2 files changed, 26 insertions(+)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7..5a4a1d5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1245,3 +1246,15 @@ int copy_user_to_xstate(struct xregs_state *xsave, const 
void __user *ubuf)
 
return 0;
 }
+
+/*
+ * report AVX state
+ */
+void arch_thread_state(struct seq_file *m, struct task_struct *task)
+{
+   if (task->thread.fpu.avx.state)
+   seq_putc(m, '1');
+   else
+   seq_putc(m, '0');
+   seq_putc(m, '\n');
+}
diff --git a/fs/proc/base.c b/fs/proc/base.c
index aaffc0c..be24327 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2893,6 +2893,17 @@ static int proc_pid_patch_state(struct seq_file *m, 
struct pid_namespace *ns,
 }
 #endif /* CONFIG_LIVEPATCH */
 
+void __weak arch_thread_state(struct seq_file *m, struct task_struct *task)
+{
+}
+
+static int proc_pid_thread_state(struct seq_file *m, struct pid_namespace *ns,
+struct pid *pid, struct task_struct *task)
+{
+   arch_thread_state(m, task);
+   return 0;
+}
+
 /*
  * Thread groups
  */
@@ -2994,6 +3005,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_LIVEPATCH
ONE("patch_state",  S_IRUSR, proc_pid_patch_state),
 #endif
+   ONE("thread_state",  S_IRUSR, proc_pid_thread_state),
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
@@ -3372,6 +3384,7 @@ static const struct pid_entry tid_base_stuff[] = {
 #ifdef CONFIG_LIVEPATCH
ONE("patch_state",  S_IRUSR, proc_pid_patch_state),
 #endif
+   ONE("thread_state",  S_IRUSR, proc_pid_thread_state),
 };
 
 static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx)
-- 
2.7.4