Re: [PATCH v4 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt
Srikar Dronamraju writes:
> With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on
> pre-empted vCPUs"), the scheduler avoids preempted vCPUs when
> scheduling tasks on wakeup. This leads to the wrong choice of CPU,
> which in turn leads to larger wakeup latencies. Eventually, it leads
> to performance regression in latency sensitive benchmarks like soltp,
> schbench etc.
>
> On Powerpc, vcpu_is_preempted only looks at yield_count. If the
> yield_count is odd, the vCPU is assumed to be preempted. However
> yield_count is increased whenever the LPAR enters CEDE state. So any
> CPU that has entered CEDE state is assumed to be preempted.
>
> Even if the vCPU of a dedicated LPAR is preempted/donated, the LPAR
> should have right of first use since it is supposed to own the vCPU.
>
> On a Power9 System with 32 cores
> # lscpu
> Architecture:        ppc64le
> Byte Order:          Little Endian
> CPU(s):              128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  8
> Core(s) per socket:  1
> Socket(s):           16
> NUMA node(s):        2
> Model:               2.2 (pvr 004e 0202)
> Model name:          POWER9 (architected), altivec supported
> Hypervisor vendor:   pHyp
> Virtualization type: para
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            512K
> L3 cache:            10240K
> NUMA node0 CPU(s):   0-63
> NUMA node1 CPU(s):   64-127
>
> # perf stat -a -r 5 ./schbench
> v5.4                                        v5.4 + patch
> Latency percentiles (usec)                  Latency percentiles (usec)
>         50.th: 45                                   50.th: 39
>         75.th: 62                                   75.th: 53
>         90.th: 71                                   90.th: 67
>         95.th: 77                                   95.th: 76
>         *99.th: 91                                  *99.th: 89
>         99.5000th: 707                              99.5000th: 93
>         99.9000th: 6920                             99.9000th: 118
>         min=0, max=10048                            min=0, max=211
> Latency percentiles (usec)                  Latency percentiles (usec)
>         50.th: 45                                   50.th: 34
>         75.th: 61                                   75.th: 45
>         90.th: 72                                   90.th: 53
>         95.th: 79                                   95.th: 56
>         *99.th: 691                                 *99.th: 61
>         99.5000th: 3972                             99.5000th: 63
>         99.9000th: 8368                             99.9000th: 78
>         min=0, max=16606                            min=0, max=228
> Latency percentiles (usec)                  Latency percentiles (usec)
>         50.th: 45                                   50.th: 34
>         75.th: 61                                   75.th: 45
>         90.th: 71                                   90.th: 53
>         95.th: 77                                   95.th: 57
>         *99.th: 106                                 *99.th: 63
>         99.5000th: 2364                             99.5000th: 68
>         99.9000th: 7480                             99.9000th: 100
>         min=0, max=10001                            min=0, max=134
> Latency percentiles (usec)                  Latency percentiles (usec)
>         50.th: 45                                   50.th: 34
>         75.th: 62                                   75.th: 46
>         90.th: 72                                   90.th: 53
>         95.th: 78                                   95.th: 56
>         *99.th: 93                                  *99.th: 61
>         99.5000th: 108                              99.5000th: 64
>         99.9000th: 6792                             99.9000th: 85
>         min=0, max=17681                            min=0, max=121
> Latency percentiles (usec)                  Latency percentiles (usec)
>         50.th: 46                                   50.th: 33
>         75.th: 62                                   75.th: 44
>         90.th: 73                                   90.th: 51
>         95.th: 79                                   95.th: 54
>         *99.th: 113                                 *99.th: 61
>         99.5000th: 2724                             99.5000th: 64
>         99.9000th: 6184                             99.9000th: 82
>         min=0, max=9887                             min=0, max=121
>
> Performance counter stats for 'system wide' (5 runs):
>
>                     v5.4                    v5.4 + patch
> context-switches    43,373 ( +- 0.40% )     44,597 ( +- 0.55% )
> cpu-migrations       1,211 ( +- 5.04% )        220 ( +- 6.23% )
> page-faults         15,983 ( +- 5.21% )     15,360 ( +- 3.38% )
>
> Waiman Long suggested using static_keys.
>
> Fixes: 41946c86876e ("locking/core, powerpc: Implement vcpu_is_preempted(cpu)")
>
> Cc: Parth Shah
> Cc: Ihor Pasichnyk
> Cc: Juri Lelli
> Cc: Phil Auld
> Cc: Waiman Long
> Cc: Gautham R. Shenoy
> Cc: Vaidyanathan Srinivasan
> Reported-by: Parth Shah
> Reported-by: Ihor Pasichnyk
> Tested-by: Juri Lelli
> Tested-by: Parth Shah
> Acked-by: Waiman Long
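For readers following the thread, the yield_count check described in the commit message, and the "assume dedicated processors are not preempted" shortcut named in the patch title, can be modelled in a few lines of user-space C. This is only an illustrative sketch under stated assumptions, not the kernel code: struct vcpu_model, shared_processor_mode and vcpu_is_preempted_model() are hypothetical names, and a plain bool stands in for the static key that Waiman Long suggested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU state shared with the hypervisor. */
struct vcpu_model {
    uint32_t yield_count;   /* per the commit message: odd is read as "preempted" */
};

/*
 * Hypothetical flag: true when the LPAR runs on shared processors.
 * In the kernel this role would be played by a static key; a plain
 * bool is used here so the sketch builds in user space.
 */
bool shared_processor_mode;

/* The original heuristic: an odd yield_count is treated as preemption. */
bool yield_count_is_odd(const struct vcpu_model *v)
{
    assert(v != NULL);
    return (v->yield_count & 1u) != 0;
}

/*
 * Patched behaviour as described in the thread: on a dedicated LPAR the
 * vCPU is never reported as preempted, because an odd yield_count may
 * only mean that the vCPU has entered CEDE.
 */
bool vcpu_is_preempted_model(const struct vcpu_model *v)
{
    if (!shared_processor_mode)
        return false;
    return yield_count_is_odd(v);
}
```

With shared_processor_mode left false (the dedicated case), a vCPU whose yield_count is odd is still reported as not preempted, so a wakeup path that consults this check would not skip an idle, ceded CPU.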
[PATCH v4 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt
With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on
pre-empted vCPUs"), the scheduler avoids preempted vCPUs when
scheduling tasks on wakeup. This leads to the wrong choice of CPU,
which in turn leads to larger wakeup latencies. Eventually, it leads
to performance regression in latency sensitive benchmarks like soltp,
schbench etc.

On Powerpc, vcpu_is_preempted only looks at yield_count. If the
yield_count is odd, the vCPU is assumed to be preempted. However
yield_count is increased whenever the LPAR enters CEDE state. So any
CPU that has entered CEDE state is assumed to be preempted.

Even if the vCPU of a dedicated LPAR is preempted/donated, the LPAR
should have right of first use since it is supposed to own the vCPU.

On a Power9 System with 32 cores
# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  8
Core(s) per socket:  1
Socket(s):           16
NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127

# perf stat -a -r 5 ./schbench
v5.4                                        v5.4 + patch
Latency percentiles (usec)                  Latency percentiles (usec)
        50.th: 45                                   50.th: 39
        75.th: 62                                   75.th: 53
        90.th: 71                                   90.th: 67
        95.th: 77                                   95.th: 76
        *99.th: 91                                  *99.th: 89
        99.5000th: 707                              99.5000th: 93
        99.9000th: 6920                             99.9000th: 118
        min=0, max=10048                            min=0, max=211
Latency percentiles (usec)                  Latency percentiles (usec)
        50.th: 45                                   50.th: 34
        75.th: 61                                   75.th: 45
        90.th: 72                                   90.th: 53
        95.th: 79                                   95.th: 56
        *99.th: 691                                 *99.th: 61
        99.5000th: 3972                             99.5000th: 63
        99.9000th: 8368                             99.9000th: 78
        min=0, max=16606                            min=0, max=228
Latency percentiles (usec)                  Latency percentiles (usec)
        50.th: 45                                   50.th: 34
        75.th: 61                                   75.th: 45
        90.th: 71                                   90.th: 53
        95.th: 77                                   95.th: 57
        *99.th: 106                                 *99.th: 63
        99.5000th: 2364                             99.5000th: 68
        99.9000th: 7480                             99.9000th: 100
        min=0, max=10001                            min=0, max=134
Latency percentiles (usec)                  Latency percentiles (usec)
        50.th: 45                                   50.th: 34
        75.th: 62                                   75.th: 46
        90.th: 72                                   90.th: 53
        95.th: 78                                   95.th: 56
        *99.th: 93                                  *99.th: 61
        99.5000th: 108                              99.5000th: 64
        99.9000th: 6792                             99.9000th: 85
        min=0, max=17681                            min=0, max=121
Latency percentiles (usec)                  Latency percentiles (usec)
        50.th: 46                                   50.th: 33
        75.th: 62                                   75.th: 44
        90.th: 73                                   90.th: 51
        95.th: 79                                   95.th: 54
        *99.th: 113                                 *99.th: 61
        99.5000th: 2724                             99.5000th: 64
        99.9000th: 6184                             99.9000th: 82
        min=0, max=9887                             min=0, max=121

Performance counter stats for 'system wide' (5 runs):

                    v5.4                    v5.4 + patch
context-switches    43,373 ( +- 0.40% )     44,597 ( +- 0.55% )
cpu-migrations       1,211 ( +- 5.04% )        220 ( +- 6.23% )
page-faults         15,983 ( +- 5.21% )     15,360 ( +- 3.38% )

Waiman Long suggested using static_keys.

Fixes: 41946c86876e ("locking/core, powerpc: Implement vcpu_is_preempted(cpu)")

Cc: Parth Shah
Cc: Ihor Pasichnyk
Cc: Juri Lelli
Cc: Phil Auld
Cc: Waiman Long
Cc: Gautham R. Shenoy
Cc: Vaidyanathan Srinivasan
Reported-by: Parth Shah
Reported-by: Ihor Pasichnyk
Tested-by: Juri Lelli
Tested-by: Parth Shah
Acked-by: Waiman Long
Acked-by: Phil Auld
Reviewed-by: Gautham R. Shenoy
Reviewed-by: Vaidyanathan Srinivasan
Signed-off-by: Srikar Dronamraju
---
Changelog v1