Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached

2017-01-26 Thread Andy Lutomirski
On Thu, Jan 26, 2017 at 6:54 AM, Rik van Riel  wrote:
> On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
>
>> index c56fb57f2991..7eb2f3041fde 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1253,6 +1253,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>>   p->sched_class->migrate_task_rq(p);
>>   p->se.nr_migrations++;
>>   perf_event_task_migrate(p);
>> +
>> + arch_task_migrate(p);
>>   }
>>
>
> Does it really count as a "simplification" if you add a
> scheduler callback?
>
> This code does not seem any easier to understand than
> the old code...

I think I lean toward liking Ingo's version better.  The old code most
likely saved an instruction, but the new code gets the point across
quite nicely.


Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached

2017-01-26 Thread Ingo Molnar

* Rik van Riel  wrote:

> On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
> 
> > index c56fb57f2991..7eb2f3041fde 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1253,6 +1253,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> >     p->sched_class->migrate_task_rq(p);
> >     p->se.nr_migrations++;
> >     perf_event_task_migrate(p);
> > +
> > +   arch_task_migrate(p);
> >     }
> > 
> 
> Does it really count as a "simplification" if you add a
> scheduler callback?
> 
> This code does not seem any easier to understand than
> the old code...

See the extra commit I added on top:

  7deff4369276 x86/fpu: Unify the naming of the FPU register cache validity flags

which makes it clearer, we now have:

->fpregs_owner [bool]
  fpregs_owner_ctx [ptr]

These are set to 1 and to the context pointer, respectively, when a task with
no FPU state is scheduled in and the state of the previous task is preserved
(cached) in the FPU registers. That FPU register state cache can be invalidated
afterwards by clearing either of the two flags.

That should make the overall meaning clearer: they represent a single 'cache'
whose validity flag is split into two copies, either of which can be used to
invalidate the cache.

Thanks,

Ingo
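
A minimal userspace sketch of the two values described in this message, not
the kernel code itself. Only the names fpregs_owner and fpregs_owner_ctx come
from the message; the struct layout, the NR_CPUS array standing in for a
per-CPU variable, and the helpers fpregs_cache_keep() and fpregs_cache_valid()
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2	/* hypothetical CPU count for this model */

/* Simplified stand-in for the kernel's struct fpu. */
struct fpu {
	bool fpregs_owner;	/* per-task copy of the validity flag */
};

/* Stand-in for the per-CPU pointer: whose state each CPU's registers hold. */
static struct fpu *fpregs_owner_ctx[NR_CPUS];

/*
 * A task with no FPU state is scheduled in on @cpu: the previous task's
 * registers are left in place and marked as a valid cache.
 */
static void fpregs_cache_keep(struct fpu *prev, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	prev->fpregs_owner = true;
	fpregs_owner_ctx[cpu] = prev;
}

/* The cache is valid only while both copies still agree. */
static bool fpregs_cache_valid(const struct fpu *fpu, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return fpu->fpregs_owner && fpregs_owner_ctx[cpu] == fpu;
}
```

In this model, clearing prev->fpregs_owner or resetting fpregs_owner_ctx[cpu]
to NULL each makes fpregs_cache_valid() return false on its own.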


Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached

2017-01-26 Thread Rik van Riel
On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:

> index c56fb57f2991..7eb2f3041fde 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1253,6 +1253,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>   p->sched_class->migrate_task_rq(p);
>   p->se.nr_migrations++;
>   perf_event_task_migrate(p);
> +
> + arch_task_migrate(p);
>   }
> 

Does it really count as a "simplification" if you add a
scheduler callback?

This code does not seem any easier to understand than
the old code...


Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached

2017-01-26 Thread Ingo Molnar

* Rik van Riel  wrote:

> On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
> > 
> > @@ -322,6 +308,16 @@ struct fpu {
> >     unsigned char   fpregs_active;
> >  
> >     /*
> > +    * @fpregs_cached:
> > +    *
> > +    * This flag tells us whether this context is loaded into a CPU
> > +    * right now.
> 
> Not quite. You are still checking against fpu_fpregs_owner_ctx.

> How about something like
> 
>   * This flag tells us whether this context was loaded into
>   * its current CPU; fpu_fpregs_owner_ctx will tell us whether
>   * this context is actually in the registers.

That's still not quite accurate: if ->fpregs_cached is 0 and
fpu_fpregs_owner_ctx is still pointing to the FPU structure then the context is
not actually in the registers anymore - it's a stale copy of some past version.

These values simply tell us whether an in-memory FPU context's latest version
is in CPU registers or not: both have to be valid for the in-CPU registers to
be valid and current. The fpu_fpregs_owner_ctx pointer is a per-CPU data
structure that tells us this fact; the ->fpregs_cached flag tells us the same,
but it is placed into the task/fpu structure.

Clearing either of those values invalidates the cache. The point of keeping
them split is implementation efficiency: for some invalidations it's easier to
use the per-CPU structure, for others (such as ptrace access) it's easier to
access the per-task flag. The FPU switch-in code has easy access to both values,
so there's no extra cost from having the cache validity flag split into two
parts.
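
The split can be modelled in a few lines of userspace C. This is only a sketch
under stated assumptions: ->fpregs_cached and fpu_fpregs_owner_ctx are the
names from the patch, while the NR_CPUS array (standing in for per-CPU data)
and the helpers invalidate_by_task(), invalidate_by_cpu() and switch_fpu_in()
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2	/* hypothetical CPU count for this model */

struct fpu {
	bool fpregs_cached;	/* per-task half of the validity flag */
};

/* Per-CPU half: the context whose state each CPU's registers hold. */
static struct fpu *fpu_fpregs_owner_ctx[NR_CPUS];

/* Per-task invalidation, e.g. after ptrace rewrote the in-memory state. */
static void invalidate_by_task(struct fpu *fpu)
{
	fpu->fpregs_cached = false;
}

/* Per-CPU invalidation, e.g. before kernel code clobbers the registers. */
static void invalidate_by_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	fpu_fpregs_owner_ctx[cpu] = NULL;
}

/*
 * Switch-in: returns true if the registers had to be reloaded from memory,
 * false if the cached register contents could be reused.
 */
static bool switch_fpu_in(struct fpu *fpu, int cpu)
{
	bool reuse;

	assert(cpu >= 0 && cpu < NR_CPUS);
	reuse = fpu->fpregs_cached && fpu_fpregs_owner_ctx[cpu] == fpu;

	/* Either way, both halves are valid for this context afterwards. */
	fpu->fpregs_cached = true;
	fpu_fpregs_owner_ctx[cpu] = fpu;
	return !reuse;
}
```

Each invalidator touches only the half it can reach cheaply, and the switch-in
path, which sees both halves, is the only place that has to combine them.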

A consequence of this is that a correct implementation could in theory
eliminate either of the two flags:

 - We could use only fpu_fpregs_owner_ctx and remove ->fpregs_cached. In this
   case the ptrace codepaths would have to invalidate the fpu_fpregs_owner_ctx
   pointer, which requires some care as it's not just a local CPU modification,
   i.e. a single cmpxchg() would be required to invalidate the register state.

 - Or we could use only ->fpregs_cached and eliminate fpu_fpregs_owner_ctx:
   this would be awkward from the kernel_fpu_begin()/end() API codepaths, which
   have no easy access to the task that has its FPU context cached in the CPU
   registers. (Which might not be the current task.)
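
To illustrate the first alternative, here is a userspace sketch of a
compare-and-exchange based invalidation, using C11 atomics in place of the
kernel's cmpxchg(). The owner_ctx array and invalidate_remote() are
hypothetical names; this is not code from the patch series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2	/* hypothetical CPU count for this model */

struct fpu {
	int unused;	/* placeholder; the real structure holds register state */
};

/* Stand-in for the per-CPU owner pointer, now the only validity indicator. */
static _Atomic(struct fpu *) owner_ctx[NR_CPUS];

/*
 * Invalidate @fpu's cached registers on @cpu from any CPU: clear the owner
 * pointer only if it still points at @fpu, so that a context loaded there
 * in the meantime is left alone. Returns true if the pointer was cleared.
 */
static bool invalidate_remote(struct fpu *fpu, int cpu)
{
	struct fpu *expected = fpu;

	assert(cpu >= 0 && cpu < NR_CPUS);
	return atomic_compare_exchange_strong(&owner_ctx[cpu], &expected,
					      (struct fpu *)NULL);
}
```

The compare step is what makes the remote write safe in this model: a plain
store of NULL could wipe out a different context that the target CPU loaded
after the caller last looked.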

So I think the best implementation is to have both flags, and to use the one
that is the most efficient to access to drive the invalidations from.

What we could do is to unify the naming to explain all this a bit better -
right now there's very little indication that ->fpregs_cached is closely
related to fpu_fpregs_owner_ctx.

For example we could rename them to:

  ->fpregs_cached        =>  ->fpregs_owner     [bool]
    fpu_fpregs_owner_ctx =>    fpregs_owner_ctx [ptr]

?

Clearing ->fpregs_owner or setting fpregs_owner_ctx to NULL invalidates the
cache, and it's clear from the naming that the two values are closely related.

Would this work for you?

Thanks,

Ingo


Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached

2017-01-26 Thread Rik van Riel
On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
> 
> @@ -322,6 +308,16 @@ struct fpu {
>   unsigned char   fpregs_active;
>  
>   /*
> +  * @fpregs_cached:
> +  *
> +  * This flag tells us whether this context is loaded into a CPU
> +  * right now.

Not quite. You are still checking against fpu_fpregs_owner_ctx.

How about something like

  * This flag tells us whether this context was loaded into
  * its current CPU; fpu_fpregs_owner_ctx will tell us whether
  * this context is actually in the registers.

> +  *
> +  * This is set to 0 if a task is migrated to another CPU.
> +  */
> + unsigned char   fpregs_cached;
> +
> + /*
>    * @state:
>    *
>    * In-memory copy of all FPU registers that we save/restore



