Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-24 Thread Johannes Weiner
On Thu, Jul 19, 2018 at 10:31:15PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 19, 2018 at 02:47:40PM -0400, Johannes Weiner wrote:
> > On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote:
> > > On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > > > +	/* Update tas

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-20 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> +static bool psi_update_stats(struct psi_group *group)
> +{
> +	for_each_online_cpu(cpu) {
> +		struct psi_group_cpu *groupc = per_cpu_ptr(group->cpus, cpu);
> +		unsigned long nonidle;
> +
> +

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-20 Thread Johannes Weiner
On Wed, Jul 18, 2018 at 06:06:23PM -0400, Johannes Weiner wrote:
> On Tue, Jul 17, 2018 at 05:01:42PM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > > +static bool psi_update_stats(struct psi_group *group)
> > > +{
> > > +	u64 some[NR_PSI_RESOU

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-19 Thread Peter Zijlstra
On Thu, Jul 19, 2018 at 02:47:40PM -0400, Johannes Weiner wrote:
> On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > > +	/* Update task counts according to the set/clear bitmasks */
> > > +	for (to = 0; (bo = ffs

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-19 Thread Johannes Weiner
On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > +	/* Update task counts according to the set/clear bitmasks */
> > +	for (to = 0; (bo = ffs(clear)); to += bo, clear >>= bo) {
> > +		int idx = to + (b

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-19 Thread Johannes Weiner
On Thu, Jul 19, 2018 at 08:08:20AM -0700, Linus Torvalds wrote:
> On Wed, Jul 18, 2018 at 5:03 AM Peter Zijlstra wrote:
> >
> > And as said before, we can compress the state from 12 bytes, to 6 bits
> > (or 1 byte), giving another 11 bytes for 59 bytes free.
> >
> > Leaving us just 5 bytes short o

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-19 Thread Linus Torvalds
On Wed, Jul 18, 2018 at 5:03 AM Peter Zijlstra wrote:
>
> And as said before, we can compress the state from 12 bytes, to 6 bits
> (or 1 byte), giving another 11 bytes for 59 bytes free.
>
> Leaving us just 5 bytes short of needing a single cacheline :/

Do you actually need 64 bits for the times?
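
For readers following the "12 bytes to 6 bits" remark: three resources with three possible states each need 2 bits per resource, so all three fit in a single byte. The following is only a minimal sketch of that idea, not code from the patch. The state names PSI_NONE, PSI_SOME and PSI_FULL appear in the quoted hunks, but their numeric values, the resource indices RES_CPU, RES_MEM and RES_IO, and the helpers packed_get() and packed_set() are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* State names follow the quoted patch; the numeric values are assumed. */
enum psi_state { PSI_NONE = 0, PSI_SOME = 1, PSI_FULL = 2 };

/* Hypothetical resource indices for CPU, memory, and IO. */
enum res_index { RES_CPU = 0, RES_MEM = 1, RES_IO = 2, NR_RES = 3 };

#define STATE_BITS	2u	/* three states fit in two bits */
#define STATE_MASK	0x3u

/* Return the 2-bit state stored for resource `res`. */
static inline enum psi_state packed_get(uint8_t packed, enum res_index res)
{
	assert(res < NR_RES);
	return (enum psi_state)((packed >> (res * STATE_BITS)) & STATE_MASK);
}

/* Return a copy of `packed` with the state of `res` replaced. */
static inline uint8_t packed_set(uint8_t packed, enum res_index res,
				 enum psi_state state)
{
	unsigned int shift = res * STATE_BITS;

	assert(res < NR_RES);
	packed &= (uint8_t)~(STATE_MASK << shift);
	packed |= (uint8_t)(((unsigned int)state & STATE_MASK) << shift);
	return packed;
}
```

With this layout only 6 of the 8 bits are used, which matches the "6 bits (or 1 byte)" figure in the quoted text. Whether the per-state times can also shrink below 64 bits is the separate question raised in the reply.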

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-19 Thread Peter Zijlstra
On Wed, Jul 18, 2018 at 06:36:44PM -0400, Johannes Weiner wrote:
> On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > > +	/* Time in which tasks wait for the CPU */
> > > +	state = PSI_NONE;
> > > +	if (tasks[NR_R

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-19 Thread Peter Zijlstra
On Thu, Jul 19, 2018 at 08:50:38AM -0400, Johannes Weiner wrote:
> On Thu, Jul 19, 2018 at 11:26:14AM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote:
> >
> > > Leaving us just 5 bytes short of needing a single cacheline :/
> > >
> > > struct ponies

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-19 Thread Johannes Weiner
On Thu, Jul 19, 2018 at 11:26:14AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote:
>
> > Leaving us just 5 bytes short of needing a single cacheline :/
> >
> > struct ponies {
> > 	unsigned int tasks[3];

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-19 Thread Peter Zijlstra
On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote: > Leaving us just 5 bytes short of needing a single cacheline :/ > > struct ponies { > unsigned int tasks[3]; >/* 012 */ > unsigned int

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Johannes Weiner
On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > +	/* Time in which tasks wait for the CPU */
> > +	state = PSI_NONE;
> > +	if (tasks[NR_RUNNING] > 1)
> > +		state = PSI_SOME;
> > +	time_state(&gr

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Johannes Weiner
On Tue, Jul 17, 2018 at 05:17:05PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > @@ -457,6 +457,22 @@ config TASK_IO_ACCOUNTING
> >
> > 	  Say N if unsure.
> >
> > +config PSI
> > +	bool "Pressure stall information tracking"
> > +	sel

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Johannes Weiner
On Tue, Jul 17, 2018 at 05:01:42PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > +static bool psi_update_stats(struct psi_group *group)
> > +{
> > +	u64 some[NR_PSI_RESOURCES] = { 0, };
> > +	u64 full[NR_PSI_RESOURCES] = { 0, };
> > +	unsi

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Johannes Weiner
On Tue, Jul 17, 2018 at 04:21:57PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > diff --git a/include/linux/sched/stat.h b/include/linux/sched/stat.h
> > index 04f1321d14c4..ac39435d1521 100644
> > --- a/include/linux/sched/stat.h
> > +++ b/incl

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Johannes Weiner
On Tue, Jul 17, 2018 at 04:16:14PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > +/* Tracked task states */
> > +enum psi_task_count {
> > +	NR_RUNNING,
> > +	NR_IOWAIT,
> > +	NR_MEMSTALL,
> > +	NR_PSI_TASK_COUNTS,
> > +};
>
> > +/* Res

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Johannes Weiner
On Tue, Jul 17, 2018 at 12:03:47PM +0200, Peter Zijlstra wrote:
> This is still a scary amount of accounting; not to mention you'll be
> adding O(cgroup-depth) to this in a later patch.
>
> Where are the performance numbers for all this?

I benchmarked it using our two most scheduling sensitive wo

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Johannes Weiner
On Wed, Jul 18, 2018 at 06:31:15PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 18, 2018 at 09:56:33AM -0400, Johannes Weiner wrote:
> > On Wed, Jul 18, 2018 at 02:46:27PM +0200, Peter Zijlstra wrote:
>
> > > I'm confused by this whole MEMSTALL thing... I thought the idea was to
> > > account the ti

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Peter Zijlstra
On Wed, Jul 18, 2018 at 09:56:33AM -0400, Johannes Weiner wrote:
> On Wed, Jul 18, 2018 at 02:46:27PM +0200, Peter Zijlstra wrote:
> > I'm confused by this whole MEMSTALL thing... I thought the idea was to
> > account the time we were _blocked_ because of memstall, but you seem to
> > count the ti

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Johannes Weiner
Hi Peter,

thanks for the feedback so far, I'll get to the other emails later. I'm currently running A/B tests against our production traffic to get up-to-date numbers, in particular on the optimizations you suggested for the cacheline packing, time_state(), ffs() etc.

On Wed, Jul 18, 2018 at 02:46:2

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> +static inline void psi_enqueue(struct task_struct *p, u64 now, bool wakeup)
> +{
> +	int clear = 0, set = TSK_RUNNING;
> +
> +	if (psi_disabled)
> +		return;
> +
> +	if (!wakeup || p->sched_psi_wake_requeue)

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Peter Zijlstra
On Wed, Jul 18, 2018 at 02:03:18PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > +	for (to = 0; (bo = ffs(set)); to += bo, set >>= bo)
> > +		tasks[to + (bo - 1)]++;
>
> You want to benchmark this, but since it's only 3 consecutive b
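
To make the quoted loop easier to follow, here is a small userspace sketch of the ffs() walk next to a plain per-bit loop that it can be benchmarked against. This is illustrative only. The enum and TSK_RUNNING follow the quoted patch; TSK_IOWAIT and TSK_MEMSTALL are assumed by analogy, the function names count_set_ffs() and count_set_loop() are hypothetical, and the plain loop is just one obvious alternative, since the suggestion in the reply is cut off in this preview.

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

/* Tracked task states, as in the quoted patch. */
enum psi_task_count {
	NR_RUNNING,
	NR_IOWAIT,
	NR_MEMSTALL,
	NR_PSI_TASK_COUNTS,
};

#define TSK_RUNNING	(1 << NR_RUNNING)
#define TSK_IOWAIT	(1 << NR_IOWAIT)	/* assumed by analogy */
#define TSK_MEMSTALL	(1 << NR_MEMSTALL)	/* assumed by analogy */

/*
 * ffs()-based walk: ffs() returns the 1-based position of the lowest
 * set bit, or 0 when no bit is set. `to` tracks how many bits have
 * already been shifted out, so to + (bo - 1) is the absolute bit index.
 */
static void count_set_ffs(unsigned int set, unsigned int *tasks)
{
	int to, bo;

	assert((set >> NR_PSI_TASK_COUNTS) == 0);
	for (to = 0; (bo = ffs((int)set)); to += bo, set >>= bo)
		tasks[to + (bo - 1)]++;
}

/* Plain loop over every tracked state, for comparison. */
static void count_set_loop(unsigned int set, unsigned int *tasks)
{
	int i;

	for (i = 0; i < NR_PSI_TASK_COUNTS; i++)
		if (set & (1u << i))
			tasks[i]++;
}
```

Both functions produce the same counts for every mask over the three tracked states, so any difference between them is purely a matter of cost, which is what the benchmarking remark is about.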

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-18 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> +/* Tracked task states */
> +enum psi_task_count {
> +	NR_RUNNING,
> +	NR_IOWAIT,
> +	NR_MEMSTALL,
> +	NR_PSI_TASK_COUNTS,
> +};
> +
> +/* Task state bitmasks */
> +#define TSK_RUNNING	(1 << NR_RUNNING)
> +#define

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-17 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> +struct psi_group {
> +	struct psi_group_cpu *cpus;

That one wants a __percpu annotation on I think. Also, maybe a rename.

> +
> +	struct mutex stat_lock;
> +
> +	u64 some[NR_PSI_RESOURCES];
> +	u64 full[NR_PSI_RES

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-17 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> diff --git a/include/linux/sched/stat.h b/include/linux/sched/stat.h
> index 04f1321d14c4..ac39435d1521 100644
> --- a/include/linux/sched/stat.h
> +++ b/include/linux/sched/stat.h
> @@ -28,10 +28,14 @@ static inline int sched_info_

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-17 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> +static bool psi_update_stats(struct psi_group *group)
> +{
> +	u64 some[NR_PSI_RESOURCES] = { 0, };
> +	u64 full[NR_PSI_RESOURCES] = { 0, };
> +	unsigned long nonidle_total = 0;
> +	unsigned long missed_periods;
> +

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-17 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> diff --git a/include/linux/sched/stat.h b/include/linux/sched/stat.h
> index 04f1321d14c4..ac39435d1521 100644
> --- a/include/linux/sched/stat.h
> +++ b/include/linux/sched/stat.h
> @@ -28,10 +28,14 @@ static inline int sched_info_

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-17 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> +/* Tracked task states */
> +enum psi_task_count {
> +	NR_RUNNING,
> +	NR_IOWAIT,
> +	NR_MEMSTALL,
> +	NR_PSI_TASK_COUNTS,
> +};

> +/* Resources that workloads could be stalled on */
> +enum psi_res {
> +	PSI_C

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-17 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> +static void time_state(struct psi_resource *res, int state, u64 now)
> +{
> +	if (res->state != PSI_NONE) {
> +		bool was_full = res->state == PSI_FULL;
> +
> +		res->times[was_full] += now - res->state_st
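
The quoted hunk is cut off, so the following is only a rough userspace sketch of the accounting pattern it appears to use: on each state change, the time spent in the previous non-idle state is credited to a per-state bucket indexed by whether that state was FULL. The struct res_sketch, its field `since`, and the function time_state_sketch() are hypothetical names, and the part after the `if` block (recording the new state and its start time) is an assumption, not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* State names follow the quoted patch; the numeric values are assumed. */
enum psi_state { PSI_NONE, PSI_SOME, PSI_FULL };

/* Hypothetical stand-in for the per-resource bookkeeping. */
struct res_sketch {
	int state;		/* current state of the resource */
	uint64_t since;		/* assumed: timestamp when `state` began */
	uint64_t times[2];	/* [0] = time in SOME, [1] = time in FULL */
};

/*
 * Credit the elapsed time to the bucket of the state being left,
 * then switch to the new state. Time spent in PSI_NONE is not tracked.
 */
static void time_state_sketch(struct res_sketch *res, int state, uint64_t now)
{
	if (res->state != PSI_NONE) {
		int was_full = (res->state == PSI_FULL);

		assert(now >= res->since);
		res->times[was_full] += now - res->since;
	}
	/* Assumption: the elided remainder records the new state and time. */
	res->state = state;
	res->since = now;
}
```

For example, moving from NONE to SOME at time 100, to FULL at 250, and back to NONE at 300 leaves 150 time units in the SOME bucket and 50 in the FULL bucket.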

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-14 Thread Peter Zijlstra
On Fri, Jul 13, 2018 at 12:17:56PM -0400, Johannes Weiner wrote:
> On Fri, Jul 13, 2018 at 11:21:53AM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > > +static inline void psi_ttwu_dequeue(struct task_struct *p)
> > > +{
> > > +	if (psi_disabled

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-14 Thread Peter Zijlstra
Hi Johannes,

A few quick comments on first reading; I'll do a second and more thorough reading on Monday.

On Fri, Jul 13, 2018 at 12:17:56PM -0400, Johannes Weiner wrote:
> First off, what I want to do can indeed be done without a strong link
> of a sleeping task to a CPU. We don't rely on it,

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-13 Thread Johannes Weiner
Hi Peter,

On Fri, Jul 13, 2018 at 11:21:53AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> > +static inline void psi_ttwu_dequeue(struct task_struct *p)
> > +{
> > +	if (psi_disabled)
> > +		return;
> > +	/*
> > +	 * Is the task be

Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-13 Thread Peter Zijlstra
On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
> +static inline void psi_ttwu_dequeue(struct task_struct *p)
> +{
> +	if (psi_disabled)
> +		return;
> +	/*
> +	 * Is the task being migrated during a wakeup? Make sure to
> +	 * deregister its sleep-persis

[PATCH 08/10] psi: pressure stall information for CPU, memory, and IO

2018-07-12 Thread Johannes Weiner
When systems are overcommitted and resources become contended, it's hard to tell exactly the impact this has on workload productivity, or how close the system is to lockups and OOM kills. In particular, when machines work multiple jobs concurrently, the impact of overcommit in terms of latency and