Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached
On Thu, Jan 26, 2017 at 6:54 AM, Rik van Riel wrote:
> On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
>
>> index c56fb57f2991..7eb2f3041fde 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1253,6 +1253,8 @@ void set_task_cpu(struct task_struct *p,
>> unsigned int new_cpu)
>>  		p->sched_class->migrate_task_rq(p);
>>  		p->se.nr_migrations++;
>>  		perf_event_task_migrate(p);
>> +
>> +		arch_task_migrate(p);
>>  	}
>>
>
> Does it really count as a "simplification" if you add a
> scheduler callback?
>
> This code does not seem any easier to understand than
> the old code...

I think I lean toward liking Ingo's version better. The old code
most likely saved an instruction, but the new code gets the point
across quite nicely.
Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached
* Rik van Riel wrote:

> On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
>
> > index c56fb57f2991..7eb2f3041fde 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1253,6 +1253,8 @@ void set_task_cpu(struct task_struct *p,
> > unsigned int new_cpu)
> >  		p->sched_class->migrate_task_rq(p);
> >  		p->se.nr_migrations++;
> >  		perf_event_task_migrate(p);
> > +
> > +		arch_task_migrate(p);
> >  	}
> >
>
> Does it really count as a "simplification" if you add a
> scheduler callback?
>
> This code does not seem any easier to understand than
> the old code...

See the extra commit I added on top:

  7deff4369276 x86/fpu: Unify the naming of the FPU register cache validity flags

which makes it clearer. We now have:

	->fpregs_owner		[bool]
	fpregs_owner_ctx	[ptr]

These are set to 1 and to the context pointer, respectively, when a
task with no FPU state is scheduled in and the state of the previous
task is preserved (cached) in the FPU registers. That FPU register
state cache can be invalidated afterwards by clearing either of the
two flags.

That should make the overall meaning clearer: they represent a single
'cache' whose validity flag is split into two copies, either of which
can be used to invalidate the cache.

Thanks,

	Ingo
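[Editor's illustration] The behaviour described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the code from commit 7deff4369276: the names fpregs_owner and fpregs_owner_ctx come from the message, while fpu_switch_out(), fpu_switch_in() and the restores counter are hypothetical helpers, and a single CPU is modelled with a plain global instead of a per-CPU variable.

```c
#include <stddef.h>

/* Reduced stand-in for the kernel's struct fpu. */
struct fpu {
	unsigned char fpregs_owner;	/* [bool] per-task half of the validity flag */
	int restores;			/* hypothetical: counts simulated register reloads */
};

/* [ptr] per-CPU half of the validity flag; one CPU is modelled here. */
static struct fpu *fpregs_owner_ctx;

/* The task leaves the CPU but its state stays in the registers: mark it cached. */
static void fpu_switch_out(struct fpu *prev)
{
	prev->fpregs_owner = 1;
	fpregs_owner_ctx = prev;
}

/* The task comes back: reload from memory only if the cache is no longer valid. */
static void fpu_switch_in(struct fpu *next)
{
	if (next->fpregs_owner && fpregs_owner_ctx == next)
		return;			/* registers still hold next's latest state */

	next->restores++;		/* stands in for restoring registers from memory */
	next->fpregs_owner = 1;
	fpregs_owner_ctx = next;
}
```

In this model, a task that is switched out and back in with nothing else touching the registers skips the reload. Once a second task with FPU state has run, fpregs_owner_ctx points elsewhere and the first task reloads, even though its own fpregs_owner flag is still set.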
Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached
On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:

> index c56fb57f2991..7eb2f3041fde 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1253,6 +1253,8 @@ void set_task_cpu(struct task_struct *p,
> unsigned int new_cpu)
>  		p->sched_class->migrate_task_rq(p);
>  		p->se.nr_migrations++;
>  		perf_event_task_migrate(p);
> +
> +		arch_task_migrate(p);
>  	}
>

Does it really count as a "simplification" if you add a
scheduler callback?

This code does not seem any easier to understand than
the old code...
Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached
* Rik van Riel wrote:

> On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
> >
> > @@ -322,6 +308,16 @@ struct fpu {
> >  	unsigned char			fpregs_active;
> >
> >  	/*
> > +	 * @fpregs_cached:
> > +	 *
> > +	 * This flag tells us whether this context is loaded into a CPU
> > +	 * right now.
>
> Not quite. You are still checking against fpu_fpregs_owner_ctx.
> How about something like
>
>   * This flag tells us whether this context was loaded into
>   * its current CPU; fpu_fpregs_owner_ctx will tell us whether
>   * this context is actually in the registers.

That's still not quite accurate: if ->fpregs_cached is 0 and
fpu_fpregs_owner_ctx is still pointing to the FPU structure then the
context is not actually in the registers anymore - it's a stale copy
of some past version.

These values simply tell us whether an in-memory FPU context's latest
version is in CPU registers or not: both have to be valid for the
in-CPU registers to be valid and current.

The fpu_fpregs_owner_ctx pointer is a per-CPU data structure that
tells us this fact; the ->fpregs_cached flag tells us the same, but it
is placed into the task/fpu structure.

Clearing either of those values invalidates the cache, and the point
of keeping them split is implementation efficiency: for some
invalidations it's easier to use the per-CPU structure, for some
others (such as ptrace access) it's easier to access the per-task
flag.

The FPU switch-in code has easy access to both values, so there's no
extra cost from having the cache validity flag split into two parts.

A consequence of this is that a correct implementation could in theory
eliminate either of the two flags:

 - We could use only fpu_fpregs_owner_ctx and remove ->fpregs_cached.
   In this case the ptrace codepaths would have to invalidate the
   fpu_fpregs_owner_ctx pointer, which requires some care as it's not
   just a local CPU modification, i.e. a single cmpxchg() would be
   required to invalidate the register state.

 - Or we could use only ->fpregs_cached and eliminate
   fpu_fpregs_owner_ctx: this would be awkward from the
   kernel_fpu_begin()/end() API codepaths, which have no easy access
   to the task that has its FPU context cached in the CPU registers.
   (Which might not be the current task.)

So I think the best implementation is to have both flags, and to drive
the invalidations from whichever one is the most efficient to access.

What we could do is to unify the naming to explain all this a bit
better - right now there's very little indication that ->fpregs_cached
is closely related to fpu_fpregs_owner_ctx.

For example we could rename them to:

	->fpregs_cached		=> ->fpregs_owner	[bool]
	fpu_fpregs_owner_ctx	=> fpregs_owner_ctx	[ptr]

?

Clearing ->fpregs_owner or setting fpregs_owner_ctx to NULL
invalidates the cache, and it's clear from the naming that the two
values are closely related.

Would this work for you?

Thanks,

	Ingo
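[Editor's illustration] The "both halves must be valid" rule and the two invalidation paths from this message can be written down as a small user-space model. It is a sketch under stated assumptions rather than the kernel implementation: ->fpregs_cached and fpu_fpregs_owner_ctx are the names used in the thread, but the per-CPU pointer is modelled as a plain array indexed by CPU number, NR_CPUS is an arbitrary value, and the four helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* arbitrary size for this model */

/* Reduced stand-in for the kernel's struct fpu. */
struct fpu {
	unsigned char fpregs_cached;	/* per-task half of the validity flag */
};

/* Stand-in for the per-CPU fpu_fpregs_owner_ctx pointer. */
static struct fpu *fpu_fpregs_owner_ctx[NR_CPUS];

/* Record that the registers of @cpu hold @fpu's latest state (sets both halves). */
static void fpregs_cache_set(struct fpu *fpu, int cpu)
{
	fpu->fpregs_cached = 1;
	fpu_fpregs_owner_ctx[cpu] = fpu;
}

/* The registers of @cpu are valid and current for @fpu only if both halves agree. */
static bool fpregs_cache_valid(const struct fpu *fpu, int cpu)
{
	return fpu->fpregs_cached && fpu_fpregs_owner_ctx[cpu] == fpu;
}

/* Per-task invalidation: convenient for a ptrace-style writer that has the task at hand. */
static void fpregs_invalidate_task(struct fpu *fpu)
{
	fpu->fpregs_cached = 0;
}

/* Per-CPU invalidation: convenient for code that only knows which CPU it runs on. */
static void fpregs_invalidate_cpu(int cpu)
{
	fpu_fpregs_owner_ctx[cpu] = NULL;
}
```

After fpregs_invalidate_task() the per-CPU pointer may still point at the structure, which is exactly the stale-copy case described at the top of the message: fpregs_cache_valid() returns false because the per-task half is clear.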
Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached
On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
>
> @@ -322,6 +308,16 @@ struct fpu {
>  	unsigned char			fpregs_active;
>
>  	/*
> +	 * @fpregs_cached:
> +	 *
> +	 * This flag tells us whether this context is loaded into a CPU
> +	 * right now.

Not quite. You are still checking against fpu_fpregs_owner_ctx.
How about something like

  * This flag tells us whether this context was loaded into
  * its current CPU; fpu_fpregs_owner_ctx will tell us whether
  * this context is actually in the registers.

> +	 *
> +	 * This is set to 0 if a task is migrated to another CPU.
> +	 */
> +	unsigned char			fpregs_cached;
> +
> +	/*
>  	 * @state:
>  	 *
>  	 * In-memory copy of all FPU registers that we save/restore
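[Editor's illustration] The quoted comment says the flag "is set to 0 if a task is migrated to another CPU", and the scheduler hunk elsewhere in this thread adds an arch_task_migrate() call to set_task_cpu(). How the two might fit together is sketched below as a user-space model. The body of arch_task_migrate() is an assumption (the x86 side of the patch is not quoted in this thread), and struct task and this reduced set_task_cpu() are hypothetical stand-ins for the kernel's task_struct and scheduler code.

```c
/* Reduced stand-ins for the kernel structures. */
struct fpu {
	unsigned char fpregs_cached;
};

struct task {
	int cpu;
	struct fpu fpu;
};

/* Assumed hook body: registers on the old CPU no longer count as this task's cache. */
static void arch_task_migrate(struct task *p)
{
	p->fpu.fpregs_cached = 0;
}

/* Reduced model of set_task_cpu(): the hook only runs when the CPU actually changes. */
static void set_task_cpu(struct task *p, int new_cpu)
{
	if (p->cpu != new_cpu)
		arch_task_migrate(p);

	p->cpu = new_cpu;
}
```

Clearing the per-task flag here needs no access to the old CPU's per-CPU data, which matches the efficiency argument for keeping the validity flag split in two.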