Re: [PATCH v4 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-12 Thread Michael Ellerman
Srikar Dronamraju  writes:
> With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted
> vCPUs"), the scheduler avoids placing tasks on preempted vCPUs at wakeup.
> This leads to a wrong choice of CPU, which in turn leads to larger wakeup
> latencies. Ultimately, it causes a performance regression in
> latency-sensitive benchmarks such as soltp and schbench.
>
> On PowerPC, vcpu_is_preempted() only looks at yield_count. If
> yield_count is odd, the vCPU is assumed to be preempted. However,
> yield_count is also incremented whenever the LPAR enters the CEDE state,
> so any CPU that has entered CEDE is assumed to be preempted.
>
> Even if a vCPU of a dedicated LPAR is preempted/donated, the LPAR should
> have right of first use, since it is supposed to own the vCPU.
>
> On a Power9 System with 32 cores
>  # lscpu
> Architecture:        ppc64le
> Byte Order:          Little Endian
> CPU(s):              128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  8
> Core(s) per socket:  1
> Socket(s):           16
> NUMA node(s):        2
> Model:               2.2 (pvr 004e 0202)
> Model name:          POWER9 (architected), altivec supported
> Hypervisor vendor:   pHyp
> Virtualization type: para
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            512K
> L3 cache:            10240K
> NUMA node0 CPU(s):   0-63
> NUMA node1 CPU(s):   64-127
>
>   # perf stat -a -r 5 ./schbench
> v5.4                            v5.4 + patch
> Latency percentiles (usec)      Latency percentiles (usec)
>   50.0000th: 45                   50.0000th: 39
>   75.0000th: 62                   75.0000th: 53
>   90.0000th: 71                   90.0000th: 67
>   95.0000th: 77                   95.0000th: 76
>   *99.0000th: 91                  *99.0000th: 89
>   99.5000th: 707                  99.5000th: 93
>   99.9000th: 6920                 99.9000th: 118
>   min=0, max=10048                min=0, max=211
> Latency percentiles (usec)      Latency percentiles (usec)
>   50.0000th: 45                   50.0000th: 34
>   75.0000th: 61                   75.0000th: 45
>   90.0000th: 72                   90.0000th: 53
>   95.0000th: 79                   95.0000th: 56
>   *99.0000th: 691                 *99.0000th: 61
>   99.5000th: 3972                 99.5000th: 63
>   99.9000th: 8368                 99.9000th: 78
>   min=0, max=16606                min=0, max=228
> Latency percentiles (usec)      Latency percentiles (usec)
>   50.0000th: 45                   50.0000th: 34
>   75.0000th: 61                   75.0000th: 45
>   90.0000th: 71                   90.0000th: 53
>   95.0000th: 77                   95.0000th: 57
>   *99.0000th: 106                 *99.0000th: 63
>   99.5000th: 2364                 99.5000th: 68
>   99.9000th: 7480                 99.9000th: 100
>   min=0, max=10001                min=0, max=134
> Latency percentiles (usec)      Latency percentiles (usec)
>   50.0000th: 45                   50.0000th: 34
>   75.0000th: 62                   75.0000th: 46
>   90.0000th: 72                   90.0000th: 53
>   95.0000th: 78                   95.0000th: 56
>   *99.0000th: 93                  *99.0000th: 61
>   99.5000th: 108                  99.5000th: 64
>   99.9000th: 6792                 99.9000th: 85
>   min=0, max=17681                min=0, max=121
> Latency percentiles (usec)      Latency percentiles (usec)
>   50.0000th: 46                   50.0000th: 33
>   75.0000th: 62                   75.0000th: 44
>   90.0000th: 73                   90.0000th: 51
>   95.0000th: 79                   95.0000th: 54
>   *99.0000th: 113                 *99.0000th: 61
>   99.5000th: 2724                 99.5000th: 64
>   99.9000th: 6184                 99.9000th: 82
>   min=0, max=9887                 min=0, max=121
>
>  Performance counter stats for 'system wide' (5 runs):
>
> context-switches   43,373  ( +-  0.40% )   44,597  ( +-  0.55% )
> cpu-migrations      1,211  ( +-  5.04% )      220  ( +-  6.23% )
> page-faults        15,983  ( +-  5.21% )   15,360  ( +-  3.38% )
>
> Waiman Long suggested using static_keys.
>
> Fixes: 41946c86876e ("locking/core, powerpc: Implement vcpu_is_preempted(cpu)")
>
> Cc: Parth Shah 
> Cc: Ihor Pasichnyk 
> Cc: Juri Lelli 
> Cc: Phil Auld 
> Cc: Waiman Long 
> Cc: Gautham R. Shenoy 
> Cc: Vaidyanathan Srinivasan 
> Reported-by: Parth Shah 
> Reported-by: Ihor Pasichnyk 
> Tested-by: Juri Lelli 
> Tested-by: Parth Shah 
> Acked-by: Waiman Long 
> 

[PATCH v4 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-12 Thread Srikar Dronamraju
With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted
vCPUs"), the scheduler avoids placing tasks on preempted vCPUs at wakeup.
This leads to a wrong choice of CPU, which in turn leads to larger wakeup
latencies. Ultimately, it causes a performance regression in
latency-sensitive benchmarks such as soltp and schbench.

On PowerPC, vcpu_is_preempted() only looks at yield_count. If
yield_count is odd, the vCPU is assumed to be preempted. However,
yield_count is also incremented whenever the LPAR enters the CEDE state,
so any CPU that has entered CEDE is assumed to be preempted.

Even if a vCPU of a dedicated LPAR is preempted/donated, the LPAR should
have right of first use, since it is supposed to own the vCPU.
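
The reasoning above can be sketched as a small userspace model. This is
only an illustration, not the kernel code from this patch: struct
vcpu_model, shared_processor_mode and both function names are hypothetical,
and a plain bool stands in for the static key that Waiman Long suggested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU state shared with the hypervisor. */
struct vcpu_model {
	uint32_t yield_count;	/* odd value: vCPU is assumed to be preempted */
};

/*
 * Hypothetical flag modelling the static key: true for a shared-processor
 * LPAR, false for a dedicated one.
 */
static bool shared_processor_mode;

/* Behaviour described above: the parity of yield_count alone decides. */
static bool vcpu_is_preempted_parity_only(const struct vcpu_model *v)
{
	return (v->yield_count & 1u) != 0;
}

/*
 * Behaviour the subject line describes: a dedicated processor is never
 * reported as preempted, whatever yield_count says.
 */
static bool vcpu_is_preempted_model(const struct vcpu_model *v)
{
	if (!shared_processor_mode)
		return false;
	return (v->yield_count & 1u) != 0;
}
```

With shared_processor_mode left false and an odd yield_count (for example
after the vCPU has entered CEDE), the parity-only check reports the vCPU as
preempted while the second check does not, so a wakeup would still be
allowed to pick that CPU.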

On a Power9 System with 32 cores
 # lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  8
Core(s) per socket:  1
Socket(s):           16
NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127

  # perf stat -a -r 5 ./schbench
v5.4                            v5.4 + patch
Latency percentiles (usec)      Latency percentiles (usec)
  50.0000th: 45                   50.0000th: 39
  75.0000th: 62                   75.0000th: 53
  90.0000th: 71                   90.0000th: 67
  95.0000th: 77                   95.0000th: 76
  *99.0000th: 91                  *99.0000th: 89
  99.5000th: 707                  99.5000th: 93
  99.9000th: 6920                 99.9000th: 118
  min=0, max=10048                min=0, max=211
Latency percentiles (usec)      Latency percentiles (usec)
  50.0000th: 45                   50.0000th: 34
  75.0000th: 61                   75.0000th: 45
  90.0000th: 72                   90.0000th: 53
  95.0000th: 79                   95.0000th: 56
  *99.0000th: 691                 *99.0000th: 61
  99.5000th: 3972                 99.5000th: 63
  99.9000th: 8368                 99.9000th: 78
  min=0, max=16606                min=0, max=228
Latency percentiles (usec)      Latency percentiles (usec)
  50.0000th: 45                   50.0000th: 34
  75.0000th: 61                   75.0000th: 45
  90.0000th: 71                   90.0000th: 53
  95.0000th: 77                   95.0000th: 57
  *99.0000th: 106                 *99.0000th: 63
  99.5000th: 2364                 99.5000th: 68
  99.9000th: 7480                 99.9000th: 100
  min=0, max=10001                min=0, max=134
Latency percentiles (usec)      Latency percentiles (usec)
  50.0000th: 45                   50.0000th: 34
  75.0000th: 62                   75.0000th: 46
  90.0000th: 72                   90.0000th: 53
  95.0000th: 78                   95.0000th: 56
  *99.0000th: 93                  *99.0000th: 61
  99.5000th: 108                  99.5000th: 64
  99.9000th: 6792                 99.9000th: 85
  min=0, max=17681                min=0, max=121
Latency percentiles (usec)      Latency percentiles (usec)
  50.0000th: 46                   50.0000th: 33
  75.0000th: 62                   75.0000th: 44
  90.0000th: 73                   90.0000th: 51
  95.0000th: 79                   95.0000th: 54
  *99.0000th: 113                 *99.0000th: 61
  99.5000th: 2724                 99.5000th: 64
  99.9000th: 6184                 99.9000th: 82
  min=0, max=9887                 min=0, max=121

 Performance counter stats for 'system wide' (5 runs):

context-switches   43,373  ( +-  0.40% )   44,597  ( +-  0.55% )
cpu-migrations      1,211  ( +-  5.04% )      220  ( +-  6.23% )
page-faults        15,983  ( +-  5.21% )   15,360  ( +-  3.38% )

Waiman Long suggested using static_keys.

Fixes: 41946c86876e ("locking/core, powerpc: Implement vcpu_is_preempted(cpu)")

Cc: Parth Shah 
Cc: Ihor Pasichnyk 
Cc: Juri Lelli 
Cc: Phil Auld 
Cc: Waiman Long 
Cc: Gautham R. Shenoy 
Cc: Vaidyanathan Srinivasan 
Reported-by: Parth Shah 
Reported-by: Ihor Pasichnyk 
Tested-by: Juri Lelli 
Tested-by: Parth Shah 
Acked-by: Waiman Long 
Acked-by: Phil Auld 
Reviewed-by: Gautham R. Shenoy 
Reviewed-by: Vaidyanathan Srinivasan 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1