Re: [PATCH 0/4] Powerpc: Better preemption for shared processor

2020-10-29 Thread Srikar Dronamraju
* Waiman Long  [2020-10-28 20:01:30]:

> > Srikar Dronamraju (4):
> >powerpc: Refactor is_kvm_guest declaration to new header
> >powerpc: Rename is_kvm_guest to check_kvm_guest
> >powerpc: Reintroduce is_kvm_guest
> >powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted
> > 
> >   arch/powerpc/include/asm/firmware.h  |  6 --
> >   arch/powerpc/include/asm/kvm_guest.h | 25 +
> >   arch/powerpc/include/asm/kvm_para.h  |  2 +-
> >   arch/powerpc/include/asm/paravirt.h  | 18 ++
> >   arch/powerpc/kernel/firmware.c       |  5 -
> >   arch/powerpc/platforms/pseries/smp.c |  3 ++-
> >   6 files changed, 50 insertions(+), 9 deletions(-)
> >   create mode 100644 arch/powerpc/include/asm/kvm_guest.h
> > 
> This patch series looks good to me and the performance is nice too.
> 
> Acked-by: Waiman Long 

Thank you.

> 
> Just curious, is the performance gain mainly from the use of static_branch
> (patches 1 - 3) or from reducing calls to yield_count_of()?

It is because of the reduced calls to yield_count_of().

> 
> Cheers,
> Longman
> 

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH 0/4] Powerpc: Better preemption for shared processor

2020-10-28 Thread Waiman Long

On 10/28/20 8:35 AM, Srikar Dronamraju wrote:

> Currently, for shared processors, vcpu_is_preempted() bases its answer
> solely on the yield_count of the CPU in question. On a PowerVM LPAR, Phyp
> schedules at SMT8 core boundary, i.e. all CPUs belonging to a core are
> either group scheduled in or group scheduled out. This can be used to
> better predict non-preempted CPUs on PowerVM shared LPARs.
>
> perf stat -r 5 -a perf bench sched pipe -l 1000 (lower time is better)
>
> powerpc/next
>      35,107,951.20 msec cpu-clock                 #  255.898 CPUs utilized            ( +-  0.31% )
>         23,655,348      context-switches          #    0.674 K/sec                    ( +-  3.72% )
>             14,465      cpu-migrations            #    0.000 K/sec                    ( +-  5.37% )
>             82,463      page-faults               #    0.002 K/sec                    ( +-  8.40% )
>  1,127,182,328,206      cycles                    #    0.032 GHz                      ( +-  1.60% )  (66.67%)
>     78,587,300,622      stalled-cycles-frontend   #    6.97% frontend cycles idle     ( +-  0.08% )  (50.01%)
>    654,124,218,432      stalled-cycles-backend    #   58.03% backend cycles idle      ( +-  1.74% )  (50.01%)
>    834,013,059,242      instructions              #    0.74  insn per cycle
>                                                   #    0.78  stalled cycles per insn  ( +-  0.73% )  (66.67%)
>    132,911,454,387      branches                  #    3.786 M/sec                    ( +-  0.59% )  (50.00%)
>      2,890,882,143      branch-misses             #    2.18% of all branches          ( +-  0.46% )  (50.00%)
>
>            137.195 +- 0.419 seconds time elapsed  ( +-  0.31% )
>
> powerpc/next + patchset
>      29,981,702.64 msec cpu-clock                 #  255.881 CPUs utilized            ( +-  1.30% )
>         40,162,456      context-switches          #    0.001 M/sec                    ( +-  0.01% )
>              1,110      cpu-migrations            #    0.000 K/sec                    ( +-  5.20% )
>             62,616      page-faults               #    0.002 K/sec                    ( +-  3.93% )
>  1,430,030,626,037      cycles                    #    0.048 GHz                      ( +-  1.41% )  (66.67%)
>     83,202,707,288      stalled-cycles-frontend   #    5.82% frontend cycles idle     ( +-  0.75% )  (50.01%)
>    744,556,088,520      stalled-cycles-backend    #   52.07% backend cycles idle      ( +-  1.39% )  (50.01%)
>    940,138,418,674      instructions              #    0.66  insn per cycle
>                                                   #    0.79  stalled cycles per insn  ( +-  0.51% )  (66.67%)
>    146,452,852,283      branches                  #    4.885 M/sec                    ( +-  0.80% )  (50.00%)
>      3,237,743,996      branch-misses             #    2.21% of all branches          ( +-  1.18% )  (50.01%)
>
>             117.17 +- 1.52 seconds time elapsed  ( +-  1.30% )
>
> This is around a 14.6% improvement in performance.
>
> Cc: linuxppc-dev 
> Cc: LKML 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Nathan Lynch 
> Cc: Gautham R Shenoy 
> Cc: Peter Zijlstra 
> Cc: Valentin Schneider 
> Cc: Juri Lelli 
> Cc: Waiman Long 
> Cc: Phil Auld 
>
> Srikar Dronamraju (4):
>    powerpc: Refactor is_kvm_guest declaration to new header
>    powerpc: Rename is_kvm_guest to check_kvm_guest
>    powerpc: Reintroduce is_kvm_guest
>    powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted
>
>   arch/powerpc/include/asm/firmware.h  |  6 --
>   arch/powerpc/include/asm/kvm_guest.h | 25 +
>   arch/powerpc/include/asm/kvm_para.h  |  2 +-
>   arch/powerpc/include/asm/paravirt.h  | 18 ++
>   arch/powerpc/kernel/firmware.c       |  5 -
>   arch/powerpc/platforms/pseries/smp.c |  3 ++-
>   6 files changed, 50 insertions(+), 9 deletions(-)
>   create mode 100644 arch/powerpc/include/asm/kvm_guest.h


This patch series looks good to me and the performance is nice too.

Acked-by: Waiman Long 

Just curious, is the performance gain mainly from the use of static_branch
(patches 1 - 3) or from reducing calls to yield_count_of()?


Cheers,
Longman



[PATCH 0/4] Powerpc: Better preemption for shared processor

2020-10-28 Thread Srikar Dronamraju
Currently, for shared processors, vcpu_is_preempted() bases its answer
solely on the yield_count of the CPU in question. On a PowerVM LPAR, Phyp
schedules at SMT8 core boundary, i.e. all CPUs belonging to a core are
either group scheduled in or group scheduled out. This can be used to
better predict non-preempted CPUs on PowerVM shared LPARs.
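
The idea above can be modelled in user space. The following is only a rough sketch, not the code from the patches: the names model_vcpu_is_preempted, first_thread_sibling, yield_count[] and kvm_guest are hypothetical stand-ins, NR_CPUS is an arbitrary illustration value, the machine is assumed to be in shared-processor mode, and an odd yield count is assumed to mean "preempted".

```c
#include <assert.h>
#include <stdbool.h>

#define THREADS_PER_CORE 8   /* SMT8, as described in the cover letter */
#define NR_CPUS          16  /* hypothetical: two cores, for illustration */

/* Hypothetical stand-ins for kernel state; not the real kernel symbols. */
static unsigned int yield_count[NR_CPUS]; /* assumed: odd value means preempted */
static bool kvm_guest;                    /* false models a PowerVM LPAR */

/* First hardware thread of the core that 'cpu' belongs to. */
static int first_thread_sibling(int cpu)
{
	return cpu - (cpu % THREADS_PER_CORE);
}

/*
 * this_cpu: the CPU asking the question (it is running by definition).
 * cpu:      the CPU being asked about.
 */
static bool model_vcpu_is_preempted(int this_cpu, int cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	assert(cpu >= 0 && cpu < NR_CPUS);

	/*
	 * On PowerVM a core is scheduled in or out as a whole, so a
	 * sibling of the running CPU cannot be preempted. This shortcut
	 * skips the yield count lookup entirely.
	 */
	if (!kvm_guest &&
	    first_thread_sibling(cpu) == first_thread_sibling(this_cpu))
		return false;

	/* Otherwise fall back to the per-CPU yield count. */
	return (yield_count[cpu] & 1) != 0;
}
```

In this model, a query about a CPU on the caller's own core returns false without reading yield_count[], which corresponds to the reduced yield count lookups discussed in the thread. Under a KVM guest the shortcut is not taken, since the core-granularity assumption is stated only for PowerVM.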

perf stat -r 5 -a perf bench sched pipe -l 1000 (lower time is better)

powerpc/next
     35,107,951.20 msec cpu-clock                 #  255.898 CPUs utilized            ( +-  0.31% )
        23,655,348      context-switches          #    0.674 K/sec                    ( +-  3.72% )
            14,465      cpu-migrations            #    0.000 K/sec                    ( +-  5.37% )
            82,463      page-faults               #    0.002 K/sec                    ( +-  8.40% )
 1,127,182,328,206      cycles                    #    0.032 GHz                      ( +-  1.60% )  (66.67%)
    78,587,300,622      stalled-cycles-frontend   #    6.97% frontend cycles idle     ( +-  0.08% )  (50.01%)
   654,124,218,432      stalled-cycles-backend    #   58.03% backend cycles idle      ( +-  1.74% )  (50.01%)
   834,013,059,242      instructions              #    0.74  insn per cycle
                                                  #    0.78  stalled cycles per insn  ( +-  0.73% )  (66.67%)
   132,911,454,387      branches                  #    3.786 M/sec                    ( +-  0.59% )  (50.00%)
     2,890,882,143      branch-misses             #    2.18% of all branches          ( +-  0.46% )  (50.00%)

           137.195 +- 0.419 seconds time elapsed  ( +-  0.31% )

powerpc/next + patchset
     29,981,702.64 msec cpu-clock                 #  255.881 CPUs utilized            ( +-  1.30% )
        40,162,456      context-switches          #    0.001 M/sec                    ( +-  0.01% )
             1,110      cpu-migrations            #    0.000 K/sec                    ( +-  5.20% )
            62,616      page-faults               #    0.002 K/sec                    ( +-  3.93% )
 1,430,030,626,037      cycles                    #    0.048 GHz                      ( +-  1.41% )  (66.67%)
    83,202,707,288      stalled-cycles-frontend   #    5.82% frontend cycles idle     ( +-  0.75% )  (50.01%)
   744,556,088,520      stalled-cycles-backend    #   52.07% backend cycles idle      ( +-  1.39% )  (50.01%)
   940,138,418,674      instructions              #    0.66  insn per cycle
                                                  #    0.79  stalled cycles per insn  ( +-  0.51% )  (66.67%)
   146,452,852,283      branches                  #    4.885 M/sec                    ( +-  0.80% )  (50.00%)
     3,237,743,996      branch-misses             #    2.21% of all branches          ( +-  1.18% )  (50.01%)

            117.17 +- 1.52 seconds time elapsed  ( +-  1.30% )

This is around a 14.6% improvement in performance.
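
For reference, the figure can be reproduced from the two elapsed times reported above (137.195 s before, 117.17 s after). The helper below is only an illustrative sketch; improvement_percent is a hypothetical name, not something from the patches or from perf.

```c
#include <assert.h>

/*
 * Relative reduction in elapsed time, in percent.
 * before: elapsed seconds on the baseline (must be positive)
 * after:  elapsed seconds with the change applied
 */
static double improvement_percent(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

With before = 137.195 and after = 117.17 the difference is 20.025 s, and 20.025 / 137.195 is roughly 0.146, which matches the figure quoted above.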

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: Gautham R Shenoy 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Juri Lelli 
Cc: Waiman Long 
Cc: Phil Auld 

Srikar Dronamraju (4):
  powerpc: Refactor is_kvm_guest declaration to new header
  powerpc: Rename is_kvm_guest to check_kvm_guest
  powerpc: Reintroduce is_kvm_guest
  powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted

 arch/powerpc/include/asm/firmware.h  |  6 --
 arch/powerpc/include/asm/kvm_guest.h | 25 +
 arch/powerpc/include/asm/kvm_para.h  |  2 +-
 arch/powerpc/include/asm/paravirt.h  | 18 ++
 arch/powerpc/kernel/firmware.c       |  5 -
 arch/powerpc/platforms/pseries/smp.c |  3 ++-
 6 files changed, 50 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_guest.h

-- 
2.18.4