Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Thu, 2015-03-19 at 10:59 -0700, Linus Torvalds wrote:
> On Thu, Mar 19, 2015 at 10:21 AM, Jason Low wrote:
> >
> > I tested this patch on a 32 bit ARM system with 4 cores. Using the
> > generic 64 bit atomics, I did not see any performance change with this
> > patch, and the relevant functions (account_group_*_time(), etc.) don't
> > show up in perf reports.
>
> Ok.
>
> > One factor might be because locking/cacheline contention isn't as
> > apparent on smaller systems to begin with, and lib/atomic64.c also
> > mentions that "this is expected to used on systems with small numbers of
> > CPUs (<= 4 or so)".
>
> Yeah, that's probably a valid argument anyway - 32-bit systems aren't
> really going to be multi-node big systems any more.
>
> So I'm ok with the patch,

Okay, I will be sending out a v3 patch which will also mention a bit
about its effect on 32 bit systems in the changelog, in addition to the
changes that were discussed in this thread (using WRITE_ONCE(), etc.).

Thanks,
Jason

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Thu, Mar 19, 2015 at 10:21 AM, Jason Low wrote:
>
> I tested this patch on a 32 bit ARM system with 4 cores. Using the
> generic 64 bit atomics, I did not see any performance change with this
> patch, and the relevant functions (account_group_*_time(), etc.) don't
> show up in perf reports.

Ok.

> One factor might be because locking/cacheline contention isn't as
> apparent on smaller systems to begin with, and lib/atomic64.c also
> mentions that "this is expected to used on systems with small numbers of
> CPUs (<= 4 or so)".

Yeah, that's probably a valid argument anyway - 32-bit systems aren't
really going to be multi-node big systems any more.

So I'm ok with the patch,

Linus
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, 2015-03-02 at 13:49 -0800, Jason Low wrote:
> On Mon, 2015-03-02 at 11:03 -0800, Linus Torvalds wrote:
> > On Mon, Mar 2, 2015 at 10:42 AM, Jason Low wrote:
> > >
> > > This patch converts the timers to 64 bit atomic variables and use
> > > atomic add to update them without a lock. With this patch, the percent
> > > of total time spent updating thread group cputimer timers was reduced
> > > from 30% down to less than 1%.
> >
> > NAK.
> >
> > Not because I think this is wrong, but because somebody needs to look
> > at the effects on 32-bit architectures too.
>
> Okay, I will run some tests to see how this change affects the
> performance of itimers on 32 bit systems.

Hi Linus,

I tested this patch on a 32 bit ARM system with 4 cores. Using the
generic 64 bit atomics, I did not see any performance change with this
patch, and the relevant functions (account_group_*_time(), etc.) don't
show up in perf reports.

One factor might be because locking/cacheline contention isn't as
apparent on smaller systems to begin with, and lib/atomic64.c also
mentions that "this is expected to used on systems with small numbers of
CPUs (<= 4 or so)".
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Thu, 2015-03-05 at 16:35 +0100, Frederic Weisbecker wrote:
> On Mon, Mar 02, 2015 at 10:42:11AM -0800, Jason Low wrote:
> > +/* Sample thread_group_cputimer values in "cputimer", copy results to "times" */
> > +static inline void sample_group_cputimer(struct task_cputime *times,
> > +					 struct thread_group_cputimer *cputimer)
> > +{
> > +	times->utime = atomic64_read(&cputimer->utime);
> > +	times->stime = atomic64_read(&cputimer->stime);
> > +	times->sum_exec_runtime = atomic64_read(&cputimer->sum_exec_runtime);
>
> So, in the case we are calling that right after setting
> cputimer->running, I guess we are fine because we just updated cputimer
> with the freshest values.
>
> But if we are reading this a while after, say several ticks further,
> there is a chance that we read stale values since we don't lock anymore.
>
> I don't know if it matters or not, I guess it depends how stale it can
> be and how much precision we expect from posix cpu timers.

It probably doesn't matter.

> But just in case, atomic64_read_return(&cputimer->utime, 0) would make
> sure we get the freshest value because it performs a full barrier, at
> the cost of more overhead of course.

(Assuming that is atomic64_add_return :))

Yeah, there aren't any guarantees that we read the freshest value, but
since the lock isn't used to serialize subsequent accesses of
times->utime, etc., the values can potentially become stale by the time
they get used anyway, even when we have the locking. So I'm not sure if
atomic64_add_return(&time, 0) for the reads would really provide much of
a benefit when we factor in the extra overhead.
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Thu, 2015-03-05 at 16:20 +0100, Frederic Weisbecker wrote:
> On Mon, Mar 02, 2015 at 01:44:04PM -0800, Linus Torvalds wrote:
> > On Mon, Mar 2, 2015 at 1:16 PM, Jason Low wrote:
> > >
> > > In original code, we set cputimer->running first so it is running while
> > > we call update_gt_cputime(). Now in this patch, we swapped the 2 calls
> > > such that we set running after calling update_gt_cputime(), so that
> > > wouldn't be an issue anymore.
> >
> > Hmm. If you actually care about ordering, and 'running' should be
> > written to after the other things, then it might be best if you use
> >
> >    smp_store_release(&cputimer->running, 1);
> >
> > which makes it clear that the store happens *after* what went before it.
> >
> > Or at least have a "smp_wmb()" between the atomic64 updates and the
> > "WRITE_ONCE()".
>
> FWIW, perhaps it can be reduced with an smp_mb__before_atomic() on the
> account_group_*_time() side,

Hi Frederic,

I think Linus might be referring to the updates in update_gt_cputime()?
Otherwise, if the atomic updates in account_group_*_time() are already
enough for correctness, then we might not want to be adding barriers in
the hot paths if they aren't necessary.

I was thinking about adding smp_store_release(&cputimer->running, 1) to
document that we want to write to the running field after the operations
in update_gt_cputime(). The overhead here won't be much since it doesn't
get called frequently, as you mentioned.

> paired with smp_wmb() from the thread_group_cputimer() side. Arming
> cputime->running shouldn't be too frequent while update cputime happens
> at least every tick...
>
> Assuming smp_mb__before_atomic() is more lightweight than
> smp_load_acquire() of course.

> > I guess that since you use cmpxchg in update_gt_cputime, the accesses
> > end up being ordered anyway, but it might be better to make that thing
> > very explicit.
> >
> >    Linus
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Thu, Mar 05, 2015 at 05:00:05PM +0100, Frederic Weisbecker wrote:
> On Thu, Mar 05, 2015 at 07:56:59AM -0800, Paul E. McKenney wrote:
> > On Thu, Mar 05, 2015 at 04:35:09PM +0100, Frederic Weisbecker wrote:
> > > So, in the case we are calling that right after setting
> > > cputimer->running, I guess we are fine because we just updated
> > > cputimer with the freshest values.
> > >
> > > But if we are reading this a while after, say several ticks further,
> > > there is a chance that we read stale values since we don't lock anymore.
> > >
> > > I don't know if it matters or not, I guess it depends how stale it can
> > > be and how much precision we expect from posix cpu timers. It probably
> > > doesn't matter.
> > >
> > > But just in case, atomic64_read_return(&cputimer->utime, 0) would make
> > > sure we get the freshest value because it performs a full barrier, at
> > > the cost of more overhead of course.
> >
> > Well, if we are running within a guest OS, we might be delayed at any
> > point for quite some time. Even with interrupts disabled.
>
> You mean delayed because of the overhead of atomic_add_return() or the
> stale value of cputimer-> fields?

Because of preemption of the guest OS's VCPUs by the host OS.

							Thanx, Paul
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Thu, Mar 05, 2015 at 07:56:59AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 05, 2015 at 04:35:09PM +0100, Frederic Weisbecker wrote:
> > So, in the case we are calling that right after setting
> > cputimer->running, I guess we are fine because we just updated cputimer
> > with the freshest values.
> >
> > But if we are reading this a while after, say several ticks further,
> > there is a chance that we read stale values since we don't lock anymore.
> >
> > I don't know if it matters or not, I guess it depends how stale it can
> > be and how much precision we expect from posix cpu timers. It probably
> > doesn't matter.
> >
> > But just in case, atomic64_read_return(&cputimer->utime, 0) would make
> > sure we get the freshest value because it performs a full barrier, at
> > the cost of more overhead of course.
>
> Well, if we are running within a guest OS, we might be delayed at any
> point for quite some time. Even with interrupts disabled.

You mean delayed because of the overhead of atomic_add_return() or the
stale value of cputimer-> fields?
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Thu, Mar 05, 2015 at 04:35:09PM +0100, Frederic Weisbecker wrote: > On Mon, Mar 02, 2015 at 10:42:11AM -0800, Jason Low wrote: > > v1->v2: > > - Peter suggested that cputimer->running does not need to be atomic, > > so we can leave it as an integer. > > - Address a race condition that could occur in update_gt_cputime(). > > - Add helper functions to avoid repeating code. > > > > While running a database workload, we found a scalability issue > > with itimers. > > > > Much of the problem was caused by the thread_group_cputimer spinlock. > > Each time we account for group system/user time, we need to obtain a > > thread_group_cputimer's spinlock to update the timers. On larger > > systems (such as a 16 socket machine), this caused more than 30% of > > total time spent trying to obtain this kernel lock to update these > > group timer stats. > > > > This patch converts the timers to 64 bit atomic variables and use > > atomic add to update them without a lock. With this patch, the percent > > of total time spent updating thread group cputimer timers was reduced > > from 30% down to less than 1%. 
> > > > Signed-off-by: Jason Low > > --- > > include/linux/init_task.h |7 +++-- > > include/linux/sched.h | 10 ++- > > kernel/fork.c |3 -- > > kernel/sched/stats.h | 12 ++-- > > kernel/time/posix-cpu-timers.c | 55 > > +++ > > 5 files changed, 42 insertions(+), 45 deletions(-) > > > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h > > index 3037fc0..c4cdec7 100644 > > --- a/include/linux/init_task.h > > +++ b/include/linux/init_task.h > > @@ -50,9 +50,10 @@ extern struct fs_struct init_fs; > > .cpu_timers = INIT_CPU_TIMERS(sig.cpu_timers), \ > > .rlim = INIT_RLIMITS, \ > > .cputimer = { \ > > - .cputime = INIT_CPUTIME,\ > > - .running = 0, \ > > - .lock = __RAW_SPIN_LOCK_UNLOCKED(sig.cputimer.lock),\ > > + .utime = ATOMIC64_INIT(0), \ > > + .stime = ATOMIC64_INIT(0), \ > > + .sum_exec_runtime = ATOMIC64_INIT(0), \ > > + .running = 0\ > > }, \ > > .cred_guard_mutex = \ > > __MUTEX_INITIALIZER(sig.cred_guard_mutex), \ > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index 8db31ef..d6b0f76 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -588,9 +588,10 @@ struct task_cputime { > > * used for thread group CPU timer calculations. > > */ > > struct thread_group_cputimer { > > - struct task_cputime cputime; > > + atomic64_t utime; > > + atomic64_t stime; > > + atomic64_t sum_exec_runtime; > > int running; > > - raw_spinlock_t lock; > > }; > > > > #include > > @@ -2942,11 +2943,6 @@ static __always_inline bool need_resched(void) > > void thread_group_cputime(struct task_struct *tsk, struct task_cputime > > *times); > > void thread_group_cputimer(struct task_struct *tsk, struct task_cputime > > *times); > > > > -static inline void thread_group_cputime_init(struct signal_struct *sig) > > -{ > > - raw_spin_lock_init(&sig->cputimer.lock); > > -} > > - > > /* > > * Reevaluate whether the task has signals pending delivery. > > * Wake the task if so. 
> > diff --git a/kernel/fork.c b/kernel/fork.c > > index 4dc2dda..df9dfe9 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -1037,9 +1037,6 @@ static void posix_cpu_timers_init_group(struct > > signal_struct *sig) > > { > > unsigned long cpu_limit; > > > > - /* Thread group counters. */ > > - thread_group_cputime_init(sig); > > - > > cpu_limit = ACCESS_ONCE(sig->rlim[RLIMIT_CPU].rlim_cur); > > if (cpu_limit != RLIM_INFINITY) { > > sig->cputime_expires.prof_exp = secs_to_cputime(cpu_limit); > > diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h > > index 4ab7043..adda94e 100644 > > --- a/kernel/sched/stats.h > > +++ b/kernel/sched/stats.h > > @@ -215,9 +215,7 @@ static inline void account_group_user_time(struct > > task_struct *tsk, > > if (!cputimer_running(tsk)) > > return; > > > > - raw_spin_lock(&cputimer->lock); > > - cputimer->cputime.utime += cputime; > > - raw_spin_unlock(&cputimer->lock); > > + atomic64_add(cputime, &cputimer->utime); > > } > > > > /** > > @@ -238,9 +236,7 @@ static inline void account_group_system_time(struct > > task_struct *tsk, > > if (!cputimer_running(tsk)) > > return; > > > > - raw_spin_lock(&cputimer->lock); > > - cputime
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, Mar 02, 2015 at 10:42:11AM -0800, Jason Low wrote:
> v1->v2:
> - Peter suggested that cputimer->running does not need to be atomic,
>   so we can leave it as an integer.
> - Address a race condition that could occur in update_gt_cputime().
> - Add helper functions to avoid repeating code.
>
> While running a database workload, we found a scalability issue
> with itimers.
>
> Much of the problem was caused by the thread_group_cputimer spinlock.
> Each time we account for group system/user time, we need to obtain a
> thread_group_cputimer's spinlock to update the timers. On larger
> systems (such as a 16 socket machine), this caused more than 30% of
> total time spent trying to obtain this kernel lock to update these
> group timer stats.
>
> This patch converts the timers to 64 bit atomic variables and use
> atomic add to update them without a lock. With this patch, the percent
> of total time spent updating thread group cputimer timers was reduced
> from 30% down to less than 1%.
>
> Signed-off-by: Jason Low
> ---
>  include/linux/init_task.h      |    7 +++--
>  include/linux/sched.h          |   10 ++-
>  kernel/fork.c                  |    3 --
>  kernel/sched/stats.h           |   12 ++--
>  kernel/time/posix-cpu-timers.c |   55 +++
>  5 files changed, 42 insertions(+), 45 deletions(-)
>
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 3037fc0..c4cdec7 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -50,9 +50,10 @@ extern struct fs_struct init_fs;
>  	.cpu_timers	= INIT_CPU_TIMERS(sig.cpu_timers),		\
>  	.rlim		= INIT_RLIMITS,					\
>  	.cputimer	= {						\
> -		.cputime = INIT_CPUTIME,				\
> -		.running = 0,						\
> -		.lock = __RAW_SPIN_LOCK_UNLOCKED(sig.cputimer.lock),	\
> +		.utime = ATOMIC64_INIT(0),				\
> +		.stime = ATOMIC64_INIT(0),				\
> +		.sum_exec_runtime = ATOMIC64_INIT(0),			\
> +		.running = 0						\
>  	},								\
>  	.cred_guard_mutex =						\
>  		 __MUTEX_INITIALIZER(sig.cred_guard_mutex),		\
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8db31ef..d6b0f76 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -588,9 +588,10 @@ struct task_cputime {
>   * used for thread group CPU timer calculations.
>   */
>  struct thread_group_cputimer {
> -	struct task_cputime cputime;
> +	atomic64_t utime;
> +	atomic64_t stime;
> +	atomic64_t sum_exec_runtime;
>  	int running;
> -	raw_spinlock_t lock;
>  };
>
>  #include
> @@ -2942,11 +2943,6 @@ static __always_inline bool need_resched(void)
>  void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
>  void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
>
> -static inline void thread_group_cputime_init(struct signal_struct *sig)
> -{
> -	raw_spin_lock_init(&sig->cputimer.lock);
> -}
> -
>  /*
>   * Reevaluate whether the task has signals pending delivery.
>   * Wake the task if so.
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 4dc2dda..df9dfe9 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1037,9 +1037,6 @@ static void posix_cpu_timers_init_group(struct signal_struct *sig)
>  {
>  	unsigned long cpu_limit;
>
> -	/* Thread group counters. */
> -	thread_group_cputime_init(sig);
> -
>  	cpu_limit = ACCESS_ONCE(sig->rlim[RLIMIT_CPU].rlim_cur);
>  	if (cpu_limit != RLIM_INFINITY) {
>  		sig->cputime_expires.prof_exp = secs_to_cputime(cpu_limit);
> diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
> index 4ab7043..adda94e 100644
> --- a/kernel/sched/stats.h
> +++ b/kernel/sched/stats.h
> @@ -215,9 +215,7 @@ static inline void account_group_user_time(struct task_struct *tsk,
>  	if (!cputimer_running(tsk))
>  		return;
>
> -	raw_spin_lock(&cputimer->lock);
> -	cputimer->cputime.utime += cputime;
> -	raw_spin_unlock(&cputimer->lock);
> +	atomic64_add(cputime, &cputimer->utime);
>  }
>
>  /**
> @@ -238,9 +236,7 @@ static inline void account_group_system_time(struct task_struct *tsk,
>  	if (!cputimer_running(tsk))
>  		return;
>
> -	raw_spin_lock(&cputimer->lock);
> -	cputimer->cputime.stime += cputime;
> -	raw_spin_unlock(&cputimer->lock);
> +	atomic64_add(cputime, &cputimer->stime);
>  }
>
>  /**
> @@ -261,7 +257,5 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
>
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, Mar 02, 2015 at 01:44:04PM -0800, Linus Torvalds wrote:
> On Mon, Mar 2, 2015 at 1:16 PM, Jason Low wrote:
> >
> > In original code, we set cputimer->running first so it is running while
> > we call update_gt_cputime(). Now in this patch, we swapped the 2 calls
> > such that we set running after calling update_gt_cputime(), so that
> > wouldn't be an issue anymore.
>
> Hmm. If you actually care about ordering, and 'running' should be
> written to after the other things, then it might be best if you use
>
>    smp_store_release(&cputimer->running, 1);
>
> which makes it clear that the store happens *after* what went before it.
>
> Or at least have a "smp_wmb()" between the atomic64 updates and the
> "WRITE_ONCE()".

FWIW, perhaps it can be reduced with an smp_mb__before_atomic() on the
account_group_*_time() side, paired with smp_wmb() from the
thread_group_cputimer() side. Arming cputime->running shouldn't be too
frequent while update cputime happens at least every tick...

Assuming smp_mb__before_atomic() is more lightweight than
smp_load_acquire() of course.

> I guess that since you use cmpxchg in update_gt_cputime, the accesses
> end up being ordered anyway, but it might be better to make that thing
> very explicit.
>
>    Linus
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, 2015-03-02 at 13:44 -0800, Linus Torvalds wrote:
> On Mon, Mar 2, 2015 at 1:16 PM, Jason Low wrote:
> >
> > In original code, we set cputimer->running first so it is running while
> > we call update_gt_cputime(). Now in this patch, we swapped the 2 calls
> > such that we set running after calling update_gt_cputime(), so that
> > wouldn't be an issue anymore.
>
> Hmm. If you actually care about ordering, and 'running' should be
> written to after the other things, then it might be best if you use
>
>    smp_store_release(&cputimer->running, 1);
>
> which makes it clear that the store happens *after* what went before it.
>
> Or at least have a "smp_wmb()" between the atomic64 updates and the
> "WRITE_ONCE()".
>
> I guess that since you use cmpxchg in update_gt_cputime, the accesses
> end up being ordered anyway, but it might be better to make that thing
> very explicit.

Yeah, I suppose the extra (smp_mb or smp_wmb) might add more overhead,
but since this is not a common code path anyway, it would be worth
adding it to make things more clear.
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, 2015-03-02 at 11:03 -0800, Linus Torvalds wrote:
> On Mon, Mar 2, 2015 at 10:42 AM, Jason Low wrote:
> >
> > This patch converts the timers to 64 bit atomic variables and use
> > atomic add to update them without a lock. With this patch, the percent
> > of total time spent updating thread group cputimer timers was reduced
> > from 30% down to less than 1%.
>
> NAK.
>
> Not because I think this is wrong, but because somebody needs to look
> at the effects on 32-bit architectures too.
>
> In particular, check out lib/atomic64.c - which uses a hashed array of
> 16-bit spinlocks to do 64-bit atomics. That may or may not work ok in
> practice, but it does mean that now sample_group_cputimer() and
> update_gt_cputime() will take that (it ends up generally being the
> same) spinlock three times for the three atomic64_read()'s.

Okay, I will run some tests to see how this change affects the
performance of itimers on 32 bit systems.

While update_gt_cputime() shouldn't be an issue for performance since it
doesn't get called often, sample_group_cputimer() taking the lock for
each of its three atomic64_read()s is something that could impact
performance, so we should take a look at that.

Thanks,
Jason
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, Mar 2, 2015 at 1:16 PM, Jason Low wrote:
>
> In original code, we set cputimer->running first so it is running while
> we call update_gt_cputime(). Now in this patch, we swapped the 2 calls
> such that we set running after calling update_gt_cputime(), so that
> wouldn't be an issue anymore.

Hmm. If you actually care about ordering, and 'running' should be
written to after the other things, then it might be best if you use

   smp_store_release(&cputimer->running, 1);

which makes it clear that the store happens *after* what went before it.

Or at least have a "smp_wmb()" between the atomic64 updates and the
"WRITE_ONCE()".

I guess that since you use cmpxchg in update_gt_cputime, the accesses
end up being ordered anyway, but it might be better to make that thing
very explicit.

   Linus
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, 2015-03-02 at 20:40 +0100, Oleg Nesterov wrote:
> Well, I forgot everything about this code, but let me ask anyway ;)
>
> On 03/02, Jason Low wrote:
> > @@ -222,13 +239,10 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
> >  	 * it.
> >  	 */
> >  	thread_group_cputime(tsk, &sum);
> > -	raw_spin_lock_irqsave(&cputimer->lock, flags);
> > -	cputimer->running = 1;
> > -	update_gt_cputime(&cputimer->cputime, &sum);
> > -	} else
> > -	raw_spin_lock_irqsave(&cputimer->lock, flags);
> > -	*times = cputimer->cputime;
> > -	raw_spin_unlock_irqrestore(&cputimer->lock, flags);
> > +	update_gt_cputime(cputimer, &sum);
> > +	ACCESS_ONCE(cputimer->running) = 1;
>
> WRITE_ONCE() looks better...

Okay, I can update that.

> but it is not clear to me why do we need it at all.

Peter suggested it here as we would now be updating the running field
without the lock:

https://lkml.org/lkml/2015/1/23/641
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, 2015-03-02 at 20:43 +0100, Oleg Nesterov wrote:
> On 03/02, Oleg Nesterov wrote:
> >
> > Well, I forgot everything about this code, but let me ask anyway ;)
> >
> > On 03/02, Jason Low wrote:
> > >
> > > -static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
> > > +static inline void __update_gt_cputime(atomic64_t *cputime, u64 sum_cputime)
> > >  {
> > > -	if (b->utime > a->utime)
> > > -		a->utime = b->utime;
> > > -
> > > -	if (b->stime > a->stime)
> > > -		a->stime = b->stime;
> > > +	u64 curr_cputime;
> > > +	/*
> > > +	 * Set cputime to sum_cputime if sum_cputime > cputime. Use cmpxchg
> > > +	 * to avoid race conditions with concurrent updates to cputime.
> > > +	 */
> > > +retry:
> > > +	curr_cputime = atomic64_read(cputime);
> > > +	if (sum_cputime > curr_cputime) {
> > > +		if (atomic64_cmpxchg(cputime, curr_cputime, sum_cputime) != curr_cputime)
> > > +			goto retry;
> > > +	}
> > > +}
> > >
> > > -	if (b->sum_exec_runtime > a->sum_exec_runtime)
> > > -		a->sum_exec_runtime = b->sum_exec_runtime;
> > > +static void update_gt_cputime(struct thread_group_cputimer *cputimer, struct task_cputime *sum)
> > > +{
> > > +	__update_gt_cputime(&cputimer->utime, sum->utime);
> > > +	__update_gt_cputime(&cputimer->stime, sum->stime);
> > > +	__update_gt_cputime(&cputimer->sum_exec_runtime, sum->sum_exec_runtime);
> > > +}
> >
> > And this is called if !cputimer_running().
> >
> > So who else can update these atomic64_t's ? The caller is called under
> > ->siglock. IOW, do we really need to cmpxchg/retry ?
> >
> > Just curious, I am sure I missed something.
>
> Ah, sorry, I seem to understand.
>
> We still can race with account_group_*time() even if ->running == 0.
> Because (say) account_group_exec_runtime() can race with 1 -> 0 -> 1
> transition.
>
> Or is there another reason?

Hi Oleg,

Yes, that 1 -> 0 -> 1 transition was the race that I had in mind. Thus,
I added the extra atomic logic in update_gt_cputime() just to be safe.

In original code, we set cputimer->running first so it is running while
we call update_gt_cputime(). Now in this patch, we swapped the 2 calls
such that we set running after calling update_gt_cputime(), so that
wouldn't be an issue anymore.
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On 03/02, Oleg Nesterov wrote:
>
> Well, I forgot everything about this code, but let me ask anyway ;)
>
> On 03/02, Jason Low wrote:
> >
> > -static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
> > +static inline void __update_gt_cputime(atomic64_t *cputime, u64 sum_cputime)
> >  {
> > -	if (b->utime > a->utime)
> > -		a->utime = b->utime;
> > -
> > -	if (b->stime > a->stime)
> > -		a->stime = b->stime;
> > +	u64 curr_cputime;
> > +	/*
> > +	 * Set cputime to sum_cputime if sum_cputime > cputime. Use cmpxchg
> > +	 * to avoid race conditions with concurrent updates to cputime.
> > +	 */
> > +retry:
> > +	curr_cputime = atomic64_read(cputime);
> > +	if (sum_cputime > curr_cputime) {
> > +		if (atomic64_cmpxchg(cputime, curr_cputime, sum_cputime) != curr_cputime)
> > +			goto retry;
> > +	}
> > +}
> >
> > -	if (b->sum_exec_runtime > a->sum_exec_runtime)
> > -		a->sum_exec_runtime = b->sum_exec_runtime;
> > +static void update_gt_cputime(struct thread_group_cputimer *cputimer, struct task_cputime *sum)
> > +{
> > +	__update_gt_cputime(&cputimer->utime, sum->utime);
> > +	__update_gt_cputime(&cputimer->stime, sum->stime);
> > +	__update_gt_cputime(&cputimer->sum_exec_runtime, sum->sum_exec_runtime);
> > +}
>
> And this is called if !cputimer_running().
>
> So who else can update these atomic64_t's ? The caller is called under
> ->siglock. IOW, do we really need to cmpxchg/retry ?
>
> Just curious, I am sure I missed something.

Ah, sorry, I seem to understand.

We still can race with account_group_*time() even if ->running == 0.
Because (say) account_group_exec_runtime() can race with 1 -> 0 -> 1
transition.

Or is there another reason?

Oleg.
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
Well, I forgot everything about this code, but let me ask anyway ;)

On 03/02, Jason Low wrote:
>
> -static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
> +static inline void __update_gt_cputime(atomic64_t *cputime, u64 sum_cputime)
>  {
> -	if (b->utime > a->utime)
> -		a->utime = b->utime;
> -
> -	if (b->stime > a->stime)
> -		a->stime = b->stime;
> +	u64 curr_cputime;
> +	/*
> +	 * Set cputime to sum_cputime if sum_cputime > cputime. Use cmpxchg
> +	 * to avoid race conditions with concurrent updates to cputime.
> +	 */
> +retry:
> +	curr_cputime = atomic64_read(cputime);
> +	if (sum_cputime > curr_cputime) {
> +		if (atomic64_cmpxchg(cputime, curr_cputime, sum_cputime) != curr_cputime)
> +			goto retry;
> +	}
> +}
>
> -	if (b->sum_exec_runtime > a->sum_exec_runtime)
> -		a->sum_exec_runtime = b->sum_exec_runtime;
> +static void update_gt_cputime(struct thread_group_cputimer *cputimer, struct task_cputime *sum)
> +{
> +	__update_gt_cputime(&cputimer->utime, sum->utime);
> +	__update_gt_cputime(&cputimer->stime, sum->stime);
> +	__update_gt_cputime(&cputimer->sum_exec_runtime, sum->sum_exec_runtime);
> +}

And this is called if !cputimer_running().

So who else can update these atomic64_t's ? The caller is called under
->siglock. IOW, do we really need to cmpxchg/retry ?

Just curious, I am sure I missed something.

> @@ -222,13 +239,10 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
>  	 * it.
>  	 */
>  	thread_group_cputime(tsk, &sum);
> -	raw_spin_lock_irqsave(&cputimer->lock, flags);
> -	cputimer->running = 1;
> -	update_gt_cputime(&cputimer->cputime, &sum);
> -	} else
> -	raw_spin_lock_irqsave(&cputimer->lock, flags);
> -	*times = cputimer->cputime;
> -	raw_spin_unlock_irqrestore(&cputimer->lock, flags);
> +	update_gt_cputime(cputimer, &sum);
> +	ACCESS_ONCE(cputimer->running) = 1;

WRITE_ONCE() looks better... but it is not clear to me why do we need it
at all.

Oleg.
Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
On Mon, Mar 2, 2015 at 10:42 AM, Jason Low wrote:
>
> This patch converts the timers to 64 bit atomic variables and use
> atomic add to update them without a lock. With this patch, the percent
> of total time spent updating thread group cputimer timers was reduced
> from 30% down to less than 1%.

NAK.

Not because I think this is wrong, but because somebody needs to look
at the effects on 32-bit architectures too.

In particular, check out lib/atomic64.c - which uses a hashed array of
16-bit spinlocks to do 64-bit atomics. That may or may not work ok in
practice, but it does mean that now sample_group_cputimer() and
update_gt_cputime() will take that (it ends up generally being the
same) spinlock three times for the three atomic64_read()'s.

Now, I think on x86, we end up using not lib/atomic64.c but our own
versions that use cmpxchg8b, which is probably fine from a performance
standpoint. But I see a lot of "select GENERIC_ATOMIC64" for other
architectures.

Anyway, it is *possible* that even on those 32-bit targets, the
atomic64's aren't any worse than the current spinlock in practice.

So the "NAK" is in no way absolute - but I'd just like to hear that
this is all reasonably fine on 32-bit ARM and powerpc, for example.

Hmm?

Linus