[REPORT] Nightly Performance Tests - Saturday, November 27, 2021

2021-11-27 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2021-11-27 21:35:01
End Time (UTC)   : 2021-11-27 21:44:11
Execution Time   : 0:09:09.679515

Status   : FAILURE



  ERROR LOGS

2021-11-27T21:35:02.042294 - Verifying executables of 8 benchmarks for 17 targets
2021-11-27T21:35:02.044552 - Verifying results of reference version v5.1.0
2021-11-27T21:35:02.060658 - Checking out master
2021-11-27T21:35:02.445242 - Pulling the latest changes from QEMU master
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:35:03.576900 - Trial 1/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:36:05.421731 - Trial 2/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:37:06.150805 - Trial 3/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:38:06.984661 - Trial 4/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:39:07.714521 - Trial 5/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:40:08.341699 - Trial 6/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:41:09.066812 - Trial 7/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:42:09.799084 - Trial 8/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:43:10.933260 - Trial 9/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-11-27T21:44:11.661151 - Trial 10/10: Failed to pull QEMU
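
The log above shows the pull being attempted ten times, roughly a minute apart, before the run is reported as a failure. A minimal Python sketch of that retry pattern follows. The names `retry`, `action`, and `sleep` are hypothetical and chosen for illustration only; they are not taken from the nightly test scripts.

```python
import time
from typing import Callable


def retry(action: Callable[[], bool], attempts: int = 10,
          delay_seconds: float = 60.0,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `action` until it returns True or the attempts run out.

    `action` is a hypothetical callable that returns True on success.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for trial in range(1, attempts + 1):
        if action():
            return True
        if trial < attempts:
            # Mirror the log format: announce the failure, then wait.
            print(f"Trial {trial}/{attempts}: failed ... retrying in {delay_seconds:.0f}s")
            sleep(delay_seconds)
        else:
            # Last attempt: give up without sleeping again.
            print(f"Trial {trial}/{attempts}: failed")
    return False
```

In practice `action` might wrap a `git pull` invocation and report whether it exited successfully. A fixed delay only helps with transient errors; all ten attempts above fail with the same certificate verification message.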

Re: [RFC PATCH] contrib/gitdm: Add more individual contributors

2020-10-09 Thread Ahmed Karaman
On Sun, Oct 4, 2020, 8:25 PM Philippe Mathieu-Daudé wrote:

> These individual contributors have a number of contributions,
> add them to the 'individual' group map.
>
> Cc: Ahmed Karaman 
> Cc: Aleksandar Markovic 
> Cc: Alistair Francis 
> Cc: Artyom Tarasenko 
> Cc: David Carlier 
> Cc: Finn Thain 
> Cc: Guenter Roeck 
> Cc: Helge Deller 
> Cc: Hervé Poussineau 
> Cc: James Hogan 
> Cc: Jean-Christophe Dubois 
> Cc: Kővágó Zoltán 
> Cc: Laurent Vivier 
> Cc: Michael Rolnik 
> Cc: Niek Linnenbank 
> Cc: Paul Burton 
> Cc: Paul Zimmerman 
> Cc: Stefan Weil 
> Cc: Subbaraya Sundeep 
> Cc: Sven Schnelle 
> Cc: Thomas Huth 
> Cc: Volker Rümelin 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> To the developers Cc'ed: If you agree with your entry, please
> reply with a Reviewed-by/Acked-by tag. If you disagree or don't
> care, please either reply with Nack-by or ignore this patch.
> I'll repost in 2 weeks as formal patch (not RFC) with only the
> entries acked by their author.
> ---
>  contrib/gitdm/group-map-individuals | 22 ++
>  contrib/gitdm/group-map-redhat  |  1 -
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/gitdm/group-map-individuals
> b/contrib/gitdm/group-map-individuals
> index cf8a2ce367..b478fd4576 100644
> --- a/contrib/gitdm/group-map-individuals
> +++ b/contrib/gitdm/group-map-individuals
> @@ -16,3 +16,25 @@ aurel...@aurel32.net
>  bala...@eik.bme.hu
>  e.emanuelegiuse...@gmail.com
>  andrew.smir...@gmail.com
> +s...@weilnetz.de
> +h...@tuxfamily.org
> +laur...@vivier.eu
> +atar4q...@gmail.com
> +hpous...@reactos.org
> +del...@gmx.de
> +alist...@alistair23.me
> +fth...@telegraphics.com.au
> +sv...@stackframe.org
> +aleksandar.qemu.de...@gmail.com
> +jho...@kernel.org
> +paulbur...@kernel.org
> +vr_q...@t-online.de
> +nieklinnenb...@gmail.com
> +devne...@gmail.com
> +j...@tribudubois.net
> +dirty.ice...@gmail.com
> +mrol...@gmail.com
> +pauld...@gmail.com
> +li...@roeck-us.net
> +sundeep.l...@gmail.com
> +ahmedkhaledkara...@gmail.com
> diff --git a/contrib/gitdm/group-map-redhat
> b/contrib/gitdm/group-map-redhat
> index d15db2d35e..4a8ca84b36 100644
> --- a/contrib/gitdm/group-map-redhat
> +++ b/contrib/gitdm/group-map-redhat
> @@ -3,6 +3,5 @@
>  #
>
>  da...@gibson.dropbear.id.au
> -laur...@vivier.eu
>  p...@fedoraproject.org
>  arm...@pond.sub.org
> --
> 2.26.2
>

Acked-by: Ahmed Karaman


[Bug 1895703] Re: performance degradation in tcg since Meson switch

2020-09-17 Thread Ahmed Karaman
** Attachment added: "matmult_double-m68k"
   
https://bugs.launchpad.net/qemu/+bug/1895703/+attachment/5411765/+files/matmult_double-m68k

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1895703

Title:
  performance degradation in tcg since Meson switch

Status in QEMU:
  New

Bug description:
  The buildsys conversion to Meson (1d806cef0e3..7fd51e68c34)
  introduced a degradation in performance in some TCG targets:

  
  Test Program: matmult_double
  
  Target              Instructions    Previous      Latest
                                      1d806cef    7fd51e68
  ----------  --------------------  ----------  ----------
  alpha              3 233 957 639           -     +7.472%
  m68k               3 919 110 506           -    +18.433%
  

  Original report from Ahmed Karaman with further testing done
  by Aleksandar Markovic:
  https://www.mail-archive.com/qemu-devel@nongnu.org/msg740279.html
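
  The size of the regression in absolute terms can be estimated from the table. The sketch below is a rough illustration only: it assumes the Instructions column is the count measured at the later commit and that the percentage is the increase relative to the earlier one, which the bug description does not state explicitly. The function name `implied_baseline` is hypothetical.

```python
def implied_baseline(latest_count: int, change_percent: float) -> float:
    """Back out the earlier instruction count from the later count
    and the reported percentage increase (assumed relative change)."""
    return latest_count / (1 + change_percent / 100)


# Figures taken from the m68k row of the table above.
m68k_latest = 3_919_110_506
m68k_baseline = implied_baseline(m68k_latest, 18.433)
print(f"m68k: about {m68k_latest - m68k_baseline:,.0f} extra instructions")
```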

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1895703/+subscriptions



Re: [REPORT] Nightly Performance Tests - Wednesday, September 16, 2020

2020-09-17 Thread Ahmed Karaman
On Thu, Sep 17, 2020 at 3:28 PM Philippe Mathieu-Daudé wrote:
>
> On 9/17/20 10:02 AM, Ahmed Karaman wrote:
> > On Thu, Sep 17, 2020 at 12:07 AM Ahmed Karaman wrote:
> >>
> >> Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
> >> Host Memory  : 15.49 GB
> >>
> >> Start Time (UTC) : 2020-09-16 21:35:02
> >> End Time (UTC)   : 2020-09-16 22:07:32
> >> Execution Time   : 0:32:29.941492
> >>
> >> Status   : SUCCESS
> >>
> >> Note:
> >> Changes denoted by '-' are less than 0.01%.
> >>
> >> 
> >> SUMMARY REPORT - COMMIT 8ee61272
> >> 
> >> AVERAGE RESULTS
> >> 
> >> Target              Instructions      Latest      v5.1.0
> >> ----------  --------------------  ----------  ----------
> >> aarch64            2 158 513 150           -     +1.703%
> >> alpha              1 914 947 541           -     +3.522%
> >> arm                8 076 527 003           -     +2.308%
> >> hppa               4 261 673 329           -     +3.163%
> >> m68k               2 690 293 359           -     +7.134%
> >> mips               1 861 902 263           -     +2.484%
> >> mipsel             2 008 240 685           -     +2.676%
> >> mips64             1 918 624 648           -     +2.817%
> >> mips64el           2 051 554 799           -     +3.025%
> >> ppc                2 480 174 328           -     +3.109%
> >> ppc64              2 576 701 038           -     +3.142%
> >> ppc64le            2 558 820 807           -     +3.171%
> >> riscv64            1 406 685 833           -     +2.648%
> >> s390x              3 158 140 071           -     +3.119%
> >> sh4                2 364 606 066           -     +3.341%
> >> sparc64            3 318 698 928           -     +3.855%
> >> x86_64             1 775 941 661           -     +2.167%
> >> 
> >>
> >>DETAILED RESULTS
> >> 
> >> Test Program: dijkstra_double
> >> 
> >> Target              Instructions      Latest      v5.1.0
> >> ----------  --------------------  ----------  ----------
> >> aarch64            3 062 745 624           -     +1.429%
> >> alpha              3 191 842 908           -     +3.695%
> >> arm               16 357 299 506           -     +2.348%
> >> hppa               7 228 387 843           -     +3.086%
> >> m68k               4 294 056 834           -     +9.693%
> >> mips               3 051 314 790           -     +2.423%
> >> mipsel             3 231 546 887           -      +2.87%
> >> mips64             3 245 814 633           -     +2.596%
> >> mips64el           3 414 215 768           -     +3.021%
> >> ppc                4 914 556 467           -      +4.74%
> >> ppc64              5 098 137 458           -     +4.565%
> >> ppc64le            5 082 383 704           -     +4.579%
> >> riscv64            2 192 269 006           -     +1.954%
> >> s390x              4 584 587 692           -     +2.898%
> >> sh4                3 949 197 667           -     +3.468%
> >> sparc64            4 586 104 947           -     +4.235%
> >> x86_64             2 484 245 797           -     +1.757%
> >> 
> >> 
> >> Test Program: dijkstra_int32
> >> 
> >> Target              Instructions      Latest      v5.1.0
> >> ----------  --------------------  ----------  ----------
> >> aarch64            2 210 360 293           -     +1.501%
> >> alpha              1 494 111 691           -     +2.149%
> >> arm                8 263 044 506           -     +2.667%
> >> hppa               5 207 306 045           -     +3.047%
> >> m68k               1 725 880 564           -     +2.528%
> >> mips               1 495 110 368           -     +1.484%
> >> mipsel             1 497 169 328           -     +1.481%
> >> mips64             1 715 421 334           -     +1.894%
> >>

Re: [REPORT] Nightly Performance Tests - Wednesday, September 16, 2020

2020-09-17 Thread Ahmed Karaman
On Thu, Sep 17, 2020 at 12:07 AM Ahmed Karaman wrote:
>
> Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
> Host Memory  : 15.49 GB
>
> Start Time (UTC) : 2020-09-16 21:35:02
> End Time (UTC)   : 2020-09-16 22:07:32
> Execution Time   : 0:32:29.941492
>
> Status   : SUCCESS
>
> Note:
> Changes denoted by '-' are less than 0.01%.
>
> 
> SUMMARY REPORT - COMMIT 8ee61272
> 
> AVERAGE RESULTS
> 
> Target              Instructions      Latest      v5.1.0
> ----------  --------------------  ----------  ----------
> aarch64            2 158 513 150           -     +1.703%
> alpha              1 914 947 541           -     +3.522%
> arm                8 076 527 003           -     +2.308%
> hppa               4 261 673 329           -     +3.163%
> m68k               2 690 293 359           -     +7.134%
> mips               1 861 902 263           -     +2.484%
> mipsel             2 008 240 685           -     +2.676%
> mips64             1 918 624 648           -     +2.817%
> mips64el           2 051 554 799           -     +3.025%
> ppc                2 480 174 328           -     +3.109%
> ppc64              2 576 701 038           -     +3.142%
> ppc64le            2 558 820 807           -     +3.171%
> riscv64            1 406 685 833           -     +2.648%
> s390x              3 158 140 071           -     +3.119%
> sh4                2 364 606 066           -     +3.341%
> sparc64            3 318 698 928           -     +3.855%
> x86_64             1 775 941 661           -     +2.167%
> 
>
>DETAILED RESULTS
> 
> Test Program: dijkstra_double
> 
> Target              Instructions      Latest      v5.1.0
> ----------  --------------------  ----------  ----------
> aarch64            3 062 745 624           -     +1.429%
> alpha              3 191 842 908           -     +3.695%
> arm               16 357 299 506           -     +2.348%
> hppa               7 228 387 843           -     +3.086%
> m68k               4 294 056 834           -     +9.693%
> mips               3 051 314 790           -     +2.423%
> mipsel             3 231 546 887           -      +2.87%
> mips64             3 245 814 633           -     +2.596%
> mips64el           3 414 215 768           -     +3.021%
> ppc                4 914 556 467           -      +4.74%
> ppc64              5 098 137 458           -     +4.565%
> ppc64le            5 082 383 704           -     +4.579%
> riscv64            2 192 269 006           -     +1.954%
> s390x              4 584 587 692           -     +2.898%
> sh4                3 949 197 667           -     +3.468%
> sparc64            4 586 104 947           -     +4.235%
> x86_64             2 484 245 797           -     +1.757%
> 
> 
> Test Program: dijkstra_int32
> 
> Target              Instructions      Latest      v5.1.0
> ----------  --------------------  ----------  ----------
> aarch64            2 210 360 293           -     +1.501%
> alpha              1 494 111 691           -     +2.149%
> arm                8 263 044 506           -     +2.667%
> hppa               5 207 306 045           -     +3.047%
> m68k               1 725 880 564           -     +2.528%
> mips               1 495 110 368           -     +1.484%
> mipsel             1 497 169 328           -     +1.481%
> mips64             1 715 421 334           -     +1.894%
> mips64el           1 695 209 677           -     +1.909%
> ppc                2 014 602 126           -     +1.822%
> ppc64              2 206 256 217           -     +2.138%
> ppc64le            2 197 967 863           -     +2.145%
> riscv64            1 354 884 068           -     +2.394%
> s390x              2 916 098 604           -     +1.236%
> sh4                1 990 693 666           -     +2.678%
> sparc64            2 874 142 164           -     +3.827%
> x86_64             1 554 138 606           -      +2.13%
> 
> 
> Test Program: matmult_double
> 
> Target              Instructions      Latest      v5.1.0

[REPORT] Nightly Performance Tests - Wednesday, September 16, 2020

2020-09-16 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-16 21:35:02
End Time (UTC)   : 2020-09-16 22:07:32
Execution Time   : 0:32:29.941492

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.
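
The Latest and v5.1.0 columns below report changes in instruction count. As a rough illustration, the following Python sketch shows one way such a column could be produced. It assumes the change is the relative difference between the current and reference instruction counts, which the report does not state, and the function name `format_change` is hypothetical.

```python
def format_change(current: int, reference: int, threshold: float = 0.01) -> str:
    """Format the change from `reference` to `current` as a signed percentage.

    Changes smaller in magnitude than `threshold` percent are shown as '-',
    matching the note above. The relative-change formula is an assumption.
    """
    change = (current - reference) * 100 / reference
    if abs(change) < threshold:
        return "-"
    sign = "+" if change > 0 else ""
    # Rounding to three decimals and printing the float drops trailing
    # zeros, consistent with entries such as +2.87% in the tables.
    return f"{sign}{round(change, 3)}%"
```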


SUMMARY REPORT - COMMIT 8ee61272

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 158 513 150           -     +1.703%
alpha              1 914 947 541           -     +3.522%
arm                8 076 527 003           -     +2.308%
hppa               4 261 673 329           -     +3.163%
m68k               2 690 293 359           -     +7.134%
mips               1 861 902 263           -     +2.484%
mipsel             2 008 240 685           -     +2.676%
mips64             1 918 624 648           -     +2.817%
mips64el           2 051 554 799           -     +3.025%
ppc                2 480 174 328           -     +3.109%
ppc64              2 576 701 038           -     +3.142%
ppc64le            2 558 820 807           -     +3.171%
riscv64            1 406 685 833           -     +2.648%
s390x              3 158 140 071           -     +3.119%
sh4                2 364 606 066           -     +3.341%
sparc64            3 318 698 928           -     +3.855%
x86_64             1 775 941 661           -     +2.167%
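
For anyone post-processing these reports, a row of the table above can be turned into a record with a few lines of Python. This is only a sketch: it assumes a fixed-width layout in which columns are separated by at least two spaces while the digit groups of the instruction count are separated by single spaces. `Row` and `parse_row` are hypothetical names.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    target: str
    instructions: int
    latest: str
    reference: str


def parse_row(line: str) -> Row:
    """Parse one table row whose columns are separated by 2+ spaces."""
    target, instructions, latest, reference = re.split(r"\s{2,}", line.strip())
    # Digit groups such as '2 158 513 150' are joined into one integer.
    return Row(target, int(instructions.replace(" ", "")), latest, reference)
```

Rows in which two columns run together without a separating gap do not fit this assumption and would need to be split by hand first.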


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 062 745 624           -     +1.429%
alpha              3 191 842 908           -     +3.695%
arm               16 357 299 506           -     +2.348%
hppa               7 228 387 843           -     +3.086%
m68k               4 294 056 834           -     +9.693%
mips               3 051 314 790           -     +2.423%
mipsel             3 231 546 887           -      +2.87%
mips64             3 245 814 633           -     +2.596%
mips64el           3 414 215 768           -     +3.021%
ppc                4 914 556 467           -      +4.74%
ppc64              5 098 137 458           -     +4.565%
ppc64le            5 082 383 704           -     +4.579%
riscv64            2 192 269 006           -     +1.954%
s390x              4 584 587 692           -     +2.898%
sh4                3 949 197 667           -     +3.468%
sparc64            4 586 104 947           -     +4.235%
x86_64             2 484 245 797           -     +1.757%


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 210 360 293           -     +1.501%
alpha              1 494 111 691           -     +2.149%
arm                8 263 044 506           -     +2.667%
hppa               5 207 306 045           -     +3.047%
m68k               1 725 880 564           -     +2.528%
mips               1 495 110 368           -     +1.484%
mipsel             1 497 169 328           -     +1.481%
mips64             1 715 421 334           -     +1.894%
mips64el           1 695 209 677           -     +1.909%
ppc                2 014 602 126           -     +1.822%
ppc64              2 206 256 217           -     +2.138%
ppc64le            2 197 967 863           -     +2.145%
riscv64            1 354 884 068           -     +2.394%
s390x              2 916 098 604           -     +1.236%
sh4                1 990 693 666           -     +2.678%
sparc64            2 874 142 164           -     +3.827%
x86_64             1 554 138 606           -      +2.13%


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 412 417 224           -     +0.312%
alpha              3 233 972 467           -     +7.473%
arm                8 545 300 144           -      +1.09%
hppa               3 483 516 785           -     +4.466%
m68k               3 919 111 292           -    +18.433%
mips               2 344 644 680           -     +4.085%
mipsel             3 329 922 415           -     +5.178%
mips64             2 359 029 035           -     +4.075%

[REPORT] Nightly Performance Tests - Tuesday, September 15, 2020

2020-09-15 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-15 20:40:01
End Time (UTC)   : 2020-09-15 21:12:43
Execution Time   : 0:32:41.290408

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT de39a045

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 158 524 492           -     +1.704%
alpha              1 914 934 541           -     +3.521%
arm                8 076 523 798           -     +2.308%
hppa               4 261 683 298           -     +3.163%
m68k               2 690 291 552           -     +7.134%
mips               1 861 895 402           -     +2.484%
mipsel             2 008 239 436           -     +2.676%
mips64             1 918 601 151           -     +2.816%
mips64el           2 051 554 246           -     +3.025%
ppc                2 480 178 421           -      +3.11%
ppc64              2 576 714 241           -     +3.143%
ppc64le            2 558 832 167           -     +3.171%
riscv64            1 406 692 488           -     +2.649%
s390x              3 158 132 301           -     +3.118%
sh4                2 364 605 623           -     +3.341%
sparc64            3 318 701 656           -     +3.855%
x86_64             1 775 944 131           -     +2.167%


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 062 758 181           -      +1.43%
alpha              3 191 832 563           -     +3.695%
arm               16 357 287 709           -     +2.348%
hppa               7 228 398 607           -     +3.086%
m68k               4 294 056 637           -     +9.693%
mips               3 051 310 255           -     +2.423%
mipsel             3 231 548 335           -      +2.87%
mips64             3 245 791 692           -     +2.595%
mips64el           3 414 214 247           -     +3.021%
ppc                4 914 561 218           -      +4.74%
ppc64              5 098 152 776           -     +4.565%
ppc64le            5 082 399 487           -      +4.58%
riscv64            2 192 276 027           -     +1.955%
s390x              4 584 581 806           -     +2.897%
sh4                3 949 197 215           -     +3.468%
sparc64            4 586 107 304           -     +4.235%
x86_64             2 484 250 887           -     +1.757%


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 210 370 607           -     +1.502%
alpha              1 494 098 562           -     +2.148%
arm                8 263 040 235           -     +2.667%
hppa               5 207 316 726           -     +3.047%
m68k               1 725 877 759           -     +2.528%
mips               1 495 100 498           -     +1.483%
mipsel             1 497 165 574           -      +1.48%
mips64             1 715 397 024           -     +1.892%
mips64el           1 695 210 741           -     +1.909%
ppc                2 014 604 657           -     +1.822%
ppc64              2 206 271 827           -     +2.139%
ppc64le            2 197 979 437           -     +2.145%
riscv64            1 354 885 081           -     +2.394%
s390x              2 916 086 983           -     +1.236%
sh4                1 990 693 213           -     +2.678%
sparc64            2 874 141 483           -     +3.827%
x86_64             1 554 140 016           -      +2.13%


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 412 427 654           -     +0.313%
alpha              3 233 959 954           -     +7.472%
arm                8 545 300 488           -      +1.09%
hppa               3 483 527 384           -     +4.466%
m68k               3 919 108 398           -    +18.433%
mips               2 344 640 021           -     +4.085%
mipsel             3 329 921 331           -     +5.178%
mips64             2 359 007 385           -     +4.074%

[REPORT] Nightly Performance Tests - Monday, September 14, 2020

2020-09-14 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-14 23:50:01
End Time (UTC)   : 2020-09-15 00:23:36
Execution Time   : 0:33:35.073901

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 2d2c73d0

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 158 526 497           -     +1.704%
alpha              1 914 934 456           -     +3.521%
arm                8 076 525 475           -     +2.308%
hppa               4 261 674 155           -     +3.163%
m68k               2 690 286 745           -     +7.134%
mips               1 861 909 990           -     +2.485%
mipsel             2 008 232 972           -     +2.675%
mips64             1 918 611 432           -     +2.817%
mips64el           2 051 558 759           -     +3.026%
ppc                2 480 190 626           -     +3.111%
ppc64              2 576 704 411           -     +3.142%
ppc64le            2 558 826 758           -     +3.171%
riscv64            1 406 690 056           -     +2.648%
s390x              3 158 137 991           -     +3.119%
sh4                2 364 597 394           -      +3.34%
sparc64            3 318 698 826           -     +3.855%
x86_64             1 775 946 436           -     +2.167%


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 062 758 110           -      +1.43%
alpha              3 191 832 536           -     +3.695%
arm               16 357 290 166           -     +2.348%
hppa               7 228 388 968           -     +3.086%
m68k               4 294 052 082           -     +9.693%
mips               3 051 324 052           -     +2.423%
mipsel             3 231 541 123           -      +2.87%
mips64             3 245 801 241           -     +2.595%
mips64el           3 414 218 711           -     +3.021%
ppc                4 914 574 577           -     +4.741%
ppc64              5 098 142 590           -     +4.565%
ppc64le            5 082 394 928           -     +4.579%
riscv64            2 192 274 732           -     +1.954%
s390x              4 584 586 983           -     +2.898%
sh4                3 949 189 064           -     +3.468%
sparc64            4 586 103 855           -     +4.235%
x86_64             2 484 252 053           -     +1.757%


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 210 372 521           -     +1.502%
alpha              1 494 098 613           -     +2.148%
arm                8 263 042 498           -     +2.667%
hppa               5 207 307 253           -     +3.047%
m68k               1 725 873 225           -     +2.528%
mips               1 495 116 648           -     +1.484%
mipsel             1 497 160 995           -      +1.48%
mips64             1 715 408 180           -     +1.893%
mips64el           1 695 215 333           -     +1.909%
ppc                2 014 617 377           -     +1.823%
ppc64              2 206 261 667           -     +2.138%
ppc64le            2 197 974 982           -     +2.145%
riscv64            1 354 886 376           -     +2.394%
s390x              2 916 095 094           -     +1.236%
sh4                1 990 685 482           -     +2.677%
sparc64            2 874 141 228           -     +3.827%
x86_64             1 554 142 522           -      +2.13%


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 412 432 189           -     +0.313%
alpha              3 233 956 740           -     +7.472%
arm                8 545 298 231           -      +1.09%
hppa               3 483 517 882           -     +4.466%
m68k               3 919 103 883           -    +18.432%
mips               2 344 653 600           -     +4.085%
mipsel             3 329 914 037           -     +5.177%
mips64             2 359 018 408           -     +4.075%

Re: [REPORT] Nightly Performance Tests - Sunday, September 13, 2020

2020-09-14 Thread Ahmed Karaman
On Mon, Sep 14, 2020 at 8:46 AM Philippe Mathieu-Daudé wrote:
>
> Hi Ahmed,
>
> On 9/14/20 12:07 AM, Ahmed Karaman wrote:
> > Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
> > Host Memory  : 15.49 GB
> >
> > Start Time (UTC) : 2020-09-13 21:35:01
> > End Time (UTC)   : 2020-09-13 22:07:44
> > Execution Time   : 0:32:42.230467
> >
> > Status   : SUCCESS
> >
> > Note:
> > Changes denoted by '-' are less than 0.01%.
> >
> > 
> > SUMMARY REPORT - COMMIT f00f57f3
> > 
>
> (Maybe this was already commented earlier but I missed it).
>
> What change had such a significant impact on the m68k target?
> At a glance I only see mostly changes in softfloat:
>
> $ git log --oneline v5.1.0..f00f57f3 tcg target/m68k fpu
> fe4b0b5bfa9 tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
> 6a17646176e tcg: Eliminate one store for in-place 128-bit dup_mem
> e7e8f33fb60 tcg: Fix tcg gen for vectorized absolute value
> 5ebf5f4be66 softfloat: Define misc operations for bfloat16
> 34f0c0a98a5 softfloat: Define convert operations for bfloat16
> 8282310d853 softfloat: Define operations for bfloat16
> 0d93d8ec632 softfloat: Add fp16 and uint8/int8 conversion functions
> fbcc38e4cb1 softfloat: add xtensa specialization for pickNaNMulAdd
> 913602e3ffe softfloat: pass float_status pointer to pickNaN
> cc43c692511 softfloat: make NO_SIGNALING_NANS runtime property
> 73ebe95e8e5 target/ppc: add vmulld to INDEX_op_mul_vec case
>
> > 
> > 
> > Test Program: matmult_double
> > 
> > Target              Instructions      Latest      v5.1.0
> > ----------  --------------------  ----------  ----------
> > aarch64            1 412 412 599           -     +0.311%
> > alpha              3 233 957 639           -     +7.472%
> > arm                8 545 302 995           -      +1.09%
> > hppa               3 483 527 330           -     +4.466%
> > m68k               3 919 110 506           -    +18.433%
> > mips               2 344 641 840           -     +4.085%
> > mipsel             3 329 912 425           -     +5.177%
> > mips64             2 359 024 910           -     +4.075%
> > mips64el           3 343 650 686           -     +5.166%
> > ppc                3 209 505 701           -     +3.248%
> > ppc64              3 287 495 266           -     +3.173%
> > ppc64le            3 287 135 580           -     +3.171%
> > riscv64            1 221 617 903           -     +0.278%
> > s390x              2 874 160 417           -     +5.826%
> > sh4                3 544 094 841           -      +6.42%
> > sparc64            3 426 094 848           -     +7.138%
> > x86_64             1 249 076 697           -     +0.335%
> > 
> ...
> > 
> > Test Program: qsort_double
> > 
> > Target              Instructions      Latest      v5.1.0
> > ----------  --------------------  ----------  ----------
> > aarch64            2 709 839 947           -     +2.423%
> > alpha              1 969 432 086           -     +3.679%
> > arm                8 323 168 267           -     +2.589%
> > hppa               3 188 316 726           -       +2.9%
> > m68k               4 953 947 225           -    +15.153%
> > mips               2 123 789 120           -     +3.049%
> > mipsel             2 124 235 492           -     +3.049%
> > mips64             1 999 025 951           -     +3.404%
> > mips64el           1 996 433 190           -     +3.409%
> > ppc                2 819 299 843           -     +5.436%
> > ppc64              2 768 177 037           -     +5.512%
> > ppc64le            2 724 766 044           -     +5.602%
> > riscv64            1 638 324 190           -     +4.021%
> > s390x              2 519 117 806           -     +3.364%
> > sh4                2 595 696 102           -       +3.0%
> > sparc64            3 988 892 763           -     +2.744%
> > x86_64             2 033 624 062           -     +3.242%
> > 

Hi Mr. Philippe,
The performance degradation relative to v5.1.0 across all targets, and
especially m68k, was introduced between the two nightly tests below:

[REPORT] Night

[REPORT] Nightly Performance Tests - Sunday, September 13, 2020

2020-09-13 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-13 21:35:01
End Time (UTC)   : 2020-09-13 22:07:44
Execution Time   : 0:32:42.230467

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT f00f57f3

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 158 508 828           -     +1.702%
alpha              1 914 935 737           -     +3.522%
arm                8 076 530 212           -     +2.308%
hppa               4 261 683 277           -     +3.164%
m68k               2 690 292 498           -     +7.134%
mips               1 861 900 192           -     +2.484%
mipsel             2 008 230 252           -     +2.675%
mips64             1 918 619 554           -     +2.817%
mips64el           2 051 563 493           -     +3.026%
ppc                2 480 182 998           -      +3.11%
ppc64              2 576 703 950           -     +3.142%
ppc64le            2 558 815 580           -      +3.17%
riscv64            1 406 695 380           -     +2.649%
s390x              3 158 135 553           -     +3.119%
sh4                2 364 595 912           -      +3.34%
sparc64            3 318 698 159           -     +3.855%
x86_64             1 775 941 994           -     +2.167%


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 062 743 600           -     +1.429%
alpha              3 191 833 331           -     +3.695%
arm               16 357 294 690           -     +2.348%
hppa               7 228 398 333           -     +3.086%
m68k               4 294 058 807           -     +9.693%
mips               3 051 314 390           -     +2.423%
mipsel             3 231 539 489           -      +2.87%
mips64             3 245 810 333           -     +2.595%
mips64el           3 414 224 392           -     +3.021%
ppc                4 914 566 476           -     +4.741%
ppc64              5 098 142 490           -     +4.565%
ppc64le            5 082 379 941           -     +4.579%
riscv64            2 192 281 397           -     +1.955%
s390x              4 584 584 444           -     +2.898%
sh4                3 949 187 813           -     +3.468%
sparc64            4 586 103 196           -     +4.235%
x86_64             2 484 245 365           -     +1.757%


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 210 358 194           -     +1.501%
alpha              1 494 102 190           -     +2.148%
arm                8 263 047 212           -     +2.667%
hppa               5 207 316 610           -     +3.047%
m68k               1 725 879 565           -     +2.528%
mips               1 495 107 231           -     +1.484%
mipsel             1 497 159 278           -      +1.48%
mips64             1 715 414 609           -     +1.893%
mips64el           1 695 217 507           -     +1.909%
ppc                2 014 609 510           -     +1.822%
ppc64              2 206 261 487           -     +2.138%
ppc64le            2 197 963 518           -     +2.144%
riscv64            1 354 889 828           -     +2.394%
s390x              2 916 092 750           -     +1.236%
sh4                1 990 683 821           -     +2.677%
sparc64            2 874 140 449           -     +3.827%
x86_64             1 554 138 493           -      +2.13%


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 412 412 599           -     +0.311%
alpha              3 233 957 639           -     +7.472%
arm                8 545 302 995           -      +1.09%
hppa               3 483 527 330           -     +4.466%
m68k               3 919 110 506           -    +18.433%
mips               2 344 641 840           -     +4.085%
mipsel             3 329 912 425           -     +5.177%
mips64             2 359 024 910           -     +4.075%

[REPORT] Nightly Performance Tests - Saturday, September 12, 2020

2020-09-12 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-12 23:30:02
End Time (UTC)   : 2020-09-13 00:10:47
Execution Time   : 0:40:45.297747

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT c47edb8d

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 158 533 305           -     +1.704%
alpha              1 914 937 425           -     +3.522%
arm                8 076 539 818           -     +2.308%
hppa               4 261 680 617           -     +3.163%
m68k               2 690 300 478           -     +7.134%
mips               1 861 900 008           -     +2.484%
mipsel             2 008 241 691           -     +2.676%
mips64             1 918 606 346           -     +2.816%
mips64el           2 051 547 798           -     +3.025%
ppc                2 480 190 796           -      +3.11%
ppc64              2 576 721 225           -     +3.143%
ppc64le            2 558 836 762           -     +3.172%
riscv64            1 406 697 771           -     +2.649%
s390x              3 158 142 233           -     +3.119%
sh4                2 364 602 125           -     +3.341%
sparc64            3 318 709 670           -     +3.855%
x86_64             1 775 943 554           -     +2.167%


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 062 765 352           -      +1.43%
alpha              3 191 834 824           -     +3.695%
arm               16 357 303 898           -     +2.348%
hppa               7 228 395 763           -     +3.086%
m68k               4 294 066 146           -     +9.693%
mips               3 051 312 576           -     +2.423%
mipsel             3 231 549 566           -      +2.87%
mips64             3 245 796 511           -     +2.595%
mips64el           3 414 208 805           -     +3.021%
ppc                4 914 573 969           -     +4.741%
ppc64              5 098 162 490           -     +4.565%
ppc64le            5 082 404 015           -      +4.58%
riscv64            2 192 282 536           -     +1.955%
s390x              4 584 590 861           -     +2.898%
sh4                3 949 193 845           -     +3.468%
sparc64            4 586 114 994           -     +4.235%
x86_64             2 484 250 046           -     +1.757%


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 210 379 662           -     +1.502%
alpha              1 494 102 952           -     +2.148%
arm                8 263 056 483           -     +2.667%
hppa               5 207 314 014           -     +3.047%
m68k               1 725 886 877           -     +2.528%
mips               1 495 105 629           -     +1.484%
mipsel             1 497 169 390           -     +1.481%
mips64             1 715 399 279           -     +1.892%
mips64el           1 695 203 817           -     +1.909%
ppc                2 014 616 952           -     +1.822%
ppc64              2 206 274 942           -     +2.139%
ppc64le            2 197 983 658           -     +2.145%
riscv64            1 354 893 842           -     +2.394%
s390x              2 916 099 097           -     +1.236%
sh4                1 990 686 479           -     +2.677%
sparc64            2 874 151 831           -     +3.827%
x86_64             1 554 140 253           -      +2.13%


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 412 439 264           -     +0.313%
alpha              3 233 958 824           -     +7.472%
arm                8 545 314 889           -      +1.09%
hppa               3 483 524 691           -     +4.466%
m68k               3 919 118 621           -    +18.433%
mips               2 344 642 485           -     +4.085%
mipsel             3 329 922 487           -     +5.178%
mips64             2 359 012 166           -     +4.074%

[REPORT] Nightly Performance Tests - Friday, September 11, 2020

2020-09-11 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-11 22:30:01
End Time (UTC)   : 2020-09-11 23:03:04
Execution Time   : 0:33:03.117362

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT f4ef8c9c

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 158 513 843           -     +1.703%
alpha              1 914 941 891           -     +3.522%
arm                8 076 528 799           -     +2.308%
hppa               4 261 695 580           -     +3.164%
m68k               2 690 298 148           -     +7.134%
mips               1 861 897 036           -     +2.484%
mipsel             2 008 241 453           -     +2.676%
mips64             1 918 628 017           -     +2.818%
mips64el           2 051 556 337           -     +3.025%
ppc                2 480 183 459           -      +3.11%
ppc64              2 576 716 032           -     +3.143%
ppc64le            2 558 829 700           -     +3.171%
riscv64            1 406 687 602           -     +2.648%
s390x              3 158 138 787           -     +3.119%
sh4                2 364 612 143           -     +3.342%
sparc64            3 318 699 089           -     +3.855%
x86_64             1 775 943 173           -     +2.167%


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 062 748 276           -     +1.429%
alpha              3 191 839 657           -     +3.695%
arm               16 357 293 127           -     +2.348%
hppa               7 228 414 335           -     +3.086%
m68k               4 294 064 495           -     +9.693%
mips               3 051 311 472           -     +2.423%
mipsel             3 231 555 095           -      +2.87%
mips64             3 245 820 200           -     +2.596%
mips64el           3 414 216 987           -     +3.021%
ppc                4 914 567 276           -     +4.741%
ppc64              5 098 155 410           -     +4.565%
ppc64le            5 082 397 275           -     +4.579%
riscv64            2 192 270 574           -     +1.954%
s390x              4 584 588 182           -     +2.898%
sh4                3 949 203 367           -     +3.468%
sparc64            4 586 103 876           -     +4.235%
x86_64             2 484 248 999           -     +1.757%


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 210 362 911           -     +1.501%
alpha              1 494 108 252           -     +2.149%
arm                8 263 045 423           -     +2.667%
hppa               5 207 328 962           -     +3.047%
m68k               1 725 884 327           -     +2.528%
mips               1 495 104 505           -     +1.484%
mipsel             1 497 168 730           -      +1.48%
mips64             1 715 424 042           -     +1.894%
mips64el           1 695 213 527           -     +1.909%
ppc                2 014 610 319           -     +1.822%
ppc64              2 206 271 027           -     +2.139%
ppc64le            2 197 976 823           -     +2.145%
riscv64            1 354 882 544           -     +2.393%
s390x              2 916 096 113           -     +1.236%
sh4                1 990 699 387           -     +2.678%
sparc64            2 874 141 101           -     +3.827%
x86_64             1 554 141 680           -      +2.13%


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 412 417 236           -     +0.312%
alpha              3 233 966 468           -     +7.472%
arm                8 545 303 870           -      +1.09%
hppa               3 483 540 419           -     +4.467%
m68k               3 919 115 602           -    +18.433%
mips               2 344 638 746           -     +4.085%
mipsel             3 329 921 883           -     +5.178%
mips64             2 359 031 228           -     +4.075%

[REPORT] Nightly Performance Tests - Thursday, September 10, 2020

2020-09-10 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-10 22:30:02
End Time (UTC)   : 2020-09-10 23:02:44
Execution Time   : 0:32:42.042974

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 9435a8b3

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 158 530 784           -     +1.704%
alpha              1 914 942 924           -     +3.522%
arm                8 076 533 542           -     +2.308%
hppa               4 261 686 945           -     +3.163%
m68k               2 690 301 917           -     +7.134%
mips               1 861 908 658           -     +2.485%
mipsel             2 008 238 228           -     +2.676%
mips64             1 918 619 080           -     +2.817%
mips64el           2 051 558 749           -     +3.026%
ppc                2 480 186 117           -      +3.11%
ppc64              2 576 712 845           -     +3.143%
ppc64le            2 558 834 556           -     +3.172%
riscv64            1 406 686 610           -     +2.648%
s390x              3 158 138 926           -     +3.119%
sh4                2 364 607 396           -     +3.341%
sparc64            3 318 699 569           -     +3.855%
x86_64             1 775 939 491           -     +2.167%


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 062 763 459           -      +1.43%
alpha              3 191 841 001           -     +3.695%
arm               16 357 297 933           -     +2.348%
hppa               7 228 403 780           -     +3.086%
m68k               4 294 067 102           -     +9.693%
mips               3 051 322 440           -     +2.423%
mipsel             3 231 546 639           -      +2.87%
mips64             3 245 809 829           -     +2.595%
mips64el           3 414 221 164           -     +3.021%
ppc                4 914 569 790           -     +4.741%
ppc64              5 098 154 111           -     +4.565%
ppc64le            5 082 401 939           -      +4.58%
riscv64            2 192 269 262           -     +1.954%
s390x              4 584 587 581           -     +2.898%
sh4                3 949 198 637           -     +3.468%
sparc64            4 586 104 691           -     +4.235%
x86_64             2 484 245 145           -     +1.757%


Test Program: dijkstra_int32

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    2 210 378 588        -   +1.502%
alpha      1 494 107 106        -   +2.148%
arm        8 263 049 727        -   +2.667%
hppa       5 207 320 779        -   +3.047%
m68k       1 725 888 114        -   +2.528%
mips       1 495 115 474        -   +1.484%
mipsel     1 497 166 302        -    +1.48%
mips64     1 715 414 160        -   +1.893%
mips64el   1 695 215 089        -   +1.909%
ppc        2 014 612 573        -   +1.822%
ppc64      2 206 266 568        -   +2.138%
ppc64le    2 197 979 170        -   +2.145%
riscv64    1 354 881 275        -   +2.393%
s390x      2 916 095 691        -   +1.236%
sh4        1 990 694 661        -   +2.678%
sparc64    2 874 141 872        -   +3.827%
x86_64     1 554 138 280        -    +2.13%


Test Program: matmult_double

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    1 412 435 761        -   +0.313%
alpha      3 233 965 145        -   +7.472%
arm        8 545 308 485        -    +1.09%
hppa       3 483 531 263        -   +4.466%
m68k       3 919 118 884        -  +18.433%
mips       2 344 652 365        -   +4.085%
mipsel     3 329 919 269        -   +5.178%
mips64     2 359 024 447        -   +4.075%

[REPORT] Nightly Performance Tests - Wednesday, September 9, 2020

2020-09-09 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-09 22:30:01
End Time (UTC)   : 2020-09-09 23:03:07
Execution Time   : 0:33:05.538923

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 9435a8b3

AVERAGE RESULTS

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    2 158 522 544        -   +1.703%
alpha      1 914 935 308        -   +3.521%
arm        8 076 520 812        -   +2.308%
hppa       4 261 684 900        -   +3.164%
m68k       2 690 295 007        -   +7.134%
mips       1 861 890 014        -   +2.483%
mipsel     2 008 235 946        -   +2.676%
mips64     1 918 617 216        -   +2.817%
mips64el   2 051 541 704        -   +3.024%
ppc        2 480 189 411        -    +3.11%
ppc64      2 576 718 770        -   +3.143%
ppc64le    2 558 841 069        -   +3.172%
riscv64    1 406 690 655        -   +2.649%
s390x      3 158 135 000        -   +3.118%
sh4        2 364 613 426        -   +3.342%
sparc64    3 318 708 055        -   +3.855%
x86_64     1 775 950 449        -   +2.167%


   DETAILED RESULTS

Test Program: dijkstra_double

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    3 062 756 695        -    +1.43%
alpha      3 191 832 287        -   +3.695%
arm       16 357 285 139        -   +2.348%
hppa       7 228 399 652        -   +3.086%
m68k       4 294 060 261        -   +9.693%
mips       3 051 309 526        -   +2.423%
mipsel     3 231 548 527        -    +2.87%
mips64     3 245 807 529        -   +2.595%
mips64el   3 414 201 984        -   +3.021%
ppc        4 914 573 285        -   +4.741%
ppc64      5 098 156 715        -   +4.565%
ppc64le    5 082 408 601        -    +4.58%
riscv64    2 192 274 279        -   +1.954%
s390x      4 584 583 501        -   +2.897%
sh4        3 949 205 390        -   +3.468%
sparc64    4 586 112 971        -   +4.235%
x86_64     2 484 255 485        -   +1.757%


Test Program: dijkstra_int32

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    2 210 370 437        -   +1.502%
alpha      1 494 100 836        -   +2.148%
arm        8 263 037 447        -   +2.666%
hppa       5 207 317 925        -   +3.047%
m68k       1 725 881 154        -   +2.528%
mips       1 495 096 306        -   +1.483%
mipsel     1 497 162 134        -    +1.48%
mips64     1 715 411 865        -   +1.893%
mips64el   1 695 198 169        -   +1.908%
ppc        2 014 615 604        -   +1.822%
ppc64      2 206 275 558        -   +2.139%
ppc64le    2 197 987 787        -   +2.146%
riscv64    1 354 886 276        -   +2.394%
s390x      2 916 091 833        -   +1.236%
sh4        1 990 699 952        -   +2.678%
sparc64    2 874 149 950        -   +3.827%
x86_64     1 554 147 703        -   +2.131%


Test Program: matmult_double

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    1 412 427 497        -   +0.313%
alpha      3 233 958 809        -   +7.472%
arm        8 545 295 858        -    +1.09%
hppa       3 483 528 550        -   +4.466%
m68k       3 919 112 040        -  +18.433%
mips       2 344 633 252        -   +4.084%
mipsel     3 329 915 307        -   +5.178%
mips64     2 359 022 136        -   +4.075%

[REPORT] Nightly Performance Tests - Tuesday, September 8, 2020

2020-09-08 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-08 22:30:02
End Time (UTC)   : 2020-09-08 23:03:19
Execution Time   : 0:33:16.914858

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 67790385

AVERAGE RESULTS

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    2 158 526 758        -   +1.704%
alpha      1 914 949 603        -   +3.523%
arm        8 076 536 751        -   +2.308%
hppa       4 261 694 044        -   +3.164%
m68k       2 690 302 191        -   +7.135%
mips       1 861 899 373        -   +2.484%
mipsel     2 008 222 249        -   +2.675%
mips64     1 918 624 363        -   +2.817%
mips64el   2 051 551 547        -   +3.025%
ppc        2 480 197 737        -   +3.111%
ppc64      2 576 708 052        -   +3.142%
ppc64le    2 558 825 498        -   +3.171%
riscv64    1 406 678 911        -   +2.647%
s390x      3 158 128 812        -   +3.118%
sh4        2 364 598 659        -   +3.341%
sparc64    3 318 700 221        -   +3.855%
x86_64     1 775 940 947        -   +2.167%


   DETAILED RESULTS

Test Program: dijkstra_double

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    3 062 759 323        -    +1.43%
alpha      3 191 847 048        -   +3.695%
arm       16 357 300 631        -   +2.348%
hppa       7 228 408 847        -   +3.086%
m68k       4 294 067 760        -   +9.693%
mips       3 051 313 439        -   +2.423%
mipsel     3 231 530 724        -   +2.869%
mips64     3 245 816 156        -   +2.596%
mips64el   3 414 213 537        -   +3.021%
ppc        4 914 582 254        -   +4.741%
ppc64      5 098 145 682        -   +4.565%
ppc64le    5 082 392 870        -   +4.579%
riscv64    2 192 263 039        -   +1.954%
s390x      4 584 577 681        -   +2.897%
sh4        3 949 190 599        -   +3.468%
sparc64    4 586 105 365        -   +4.235%
x86_64     2 484 245 841        -   +1.757%


Test Program: dijkstra_int32

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    2 210 374 595        -   +1.502%
alpha      1 494 113 126        -   +2.149%
arm        8 263 054 865        -   +2.667%
hppa       5 207 327 123        -   +3.047%
m68k       1 725 888 200        -   +2.528%
mips       1 495 106 430  -0.011%   +1.484%
mipsel     1 497 151 197        -   +1.479%
mips64     1 715 420 469        -   +1.893%
mips64el   1 695 205 871        -   +1.909%
ppc        2 014 623 828        -   +1.823%
ppc64      2 206 264 991        -   +2.138%
ppc64le    2 197 972 278        -   +2.145%
riscv64    1 354 872 661        -   +2.393%
s390x      2 916 085 978        -   +1.236%
sh4        1 990 686 615        -   +2.677%
sparc64    2 874 142 528        -   +3.827%
x86_64     1 554 139 013        -    +2.13%


Test Program: matmult_double

Target      Instructions   Latest    v5.1.0
--------  --------------  -------  --------
aarch64    1 412 431 616  +0.011%   +0.313%
alpha      3 233 973 783        -   +7.473%
arm        8 545 311 751        -    +1.09%
hppa       3 483 537 736        -   +4.467%
m68k       3 919 119 423        -  +18.433%
mips       2 344 643 485        -   +4.085%
mipsel     3 329 903 673        -   +5.177%
mips64     2 359 030 673        -   +4.075%

[REPORT] Nightly Performance Tests - Monday, September 7, 2020

2020-09-07 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-07 22:30:02
End Time (UTC)   : 2020-09-07 23:02:34
Execution Time   : 0:32:31.848707

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT e11bd71f

AVERAGE RESULTS

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 158 376 909   - +1.694%
alpha  1 914 987 215   - +3.526%
arm8 076 559 412   - +2.309%
hppa   4 261 667 542   - +3.163%
m68k   2 690 305 241   - +7.135%
mips   1 862 066 070   - +2.496%
mipsel 2 008 237 631   - +2.676%
mips64 1 918 633 887   - +2.818%
mips64el   2 051 563 013   - +3.026%
ppc2 480 186 076   -  +3.11%
ppc64  2 576 712 418   - +3.143%
ppc64le2 558 867 616   - +3.174%
riscv641 406 730 247   - +2.652%
s390x  3 158 142 650   - +3.119%
sh42 364 490 038   - +3.333%
sparc643 318 815 234   - +3.861%
x86_64 1 775 821 957   - +2.157%


   DETAILED RESULTS

Test Program: dijkstra_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch643 062 610 271   - +1.425%
alpha  3 191 884 386   - +3.697%
arm   16 357 323 255   - +2.348%
hppa   7 228 383 429   - +3.086%
m68k   4 294 071 402   - +9.693%
mips   3 051 478 106   - +2.429%
mipsel 3 231 545 860   -  +2.87%
mips64 3 245 823 176   - +2.596%
mips64el   3 414 224 497   - +3.021%
ppc4 914 571 558   - +4.741%
ppc64  5 098 150 107   - +4.565%
ppc64le5 082 428 296   -  +4.58%
riscv642 192 315 793   - +1.956%
s390x  4 584 591 383   - +2.898%
sh43 949 081 784   - +3.465%
sparc644 586 221 241   - +4.238%
x86_64 2 484 126 935   - +1.752%


Test Program: dijkstra_int32

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 210 225 365   - +1.495%
alpha  1 494 152 973   - +2.152%
arm8 263 076 066   - +2.667%
hppa   5 207 301 102   - +3.046%
m68k   1 725 889 780   - +2.529%
mips   1 495 270 793   - +1.495%
mipsel 1 497 165 711   -  +1.48%
mips64 1 715 427 272   - +1.894%
mips64el   1 695 219 297   -  +1.91%
ppc2 014 612 375   - +1.822%
ppc64  2 206 269 047   - +2.139%
ppc64le2 198 015 652   - +2.147%
riscv641 354 926 118   - +2.397%
s390x  2 916 099 522   - +1.236%
sh41 990 577 118   - +2.672%
sparc642 874 256 944   - +3.831%
x86_64 1 554 019 603   - +2.122%


Test Program: matmult_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch641 412 282 569   - +0.302%
alpha  3 234 011 219   - +7.474%
arm8 545 334 711   -  +1.09%
hppa   3 483 512 094   - +4.466%
m68k   3 919 123 098   -+18.433%
mips   2 344 810 281   - +4.092%
mipsel 3 329 918 767   - +5.178%
mips64 2 359 040 228   - +4.076%

[REPORT] Nightly Performance Tests - Sunday, September 6, 2020

2020-09-06 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-06 22:30:01
End Time (UTC)   : 2020-09-06 23:03:15
Execution Time   : 0:33:13.596113

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 7c37270b

AVERAGE RESULTS

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 158 368 938   - +1.694%
alpha  1 914 974 770   - +3.525%
arm8 076 533 928   - +2.308%
hppa   4 261 662 205   - +3.163%
m68k   2 690 304 604   - +7.135%
mips   1 862 052 805   - +2.495%
mipsel 2 008 238 496   - +2.676%
mips64 1 918 641 080   - +2.818%
mips64el   2 051 560 694   - +3.026%
ppc2 480 167 504   - +3.109%
ppc64  2 576 714 539   - +3.143%
ppc64le2 558 857 542   - +3.173%
riscv641 406 726 161   - +2.652%
s390x  3 158 150 495   - +3.119%
sh42 364 464 886   - +3.331%
sparc643 318 827 014   - +3.861%
x86_64 1 775 819 330   - +2.157%


   DETAILED RESULTS

Test Program: dijkstra_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch643 062 603 221   - +1.424%
alpha  3 191 870 253   - +3.696%
arm   16 357 297 762   - +2.348%
hppa   7 228 377 811   - +3.086%
m68k   4 294 069 988   - +9.693%
mips   3 051 466 603   - +2.428%
mipsel 3 231 546 446   -  +2.87%
mips64 3 245 832 942   - +2.596%
mips64el   3 414 222 197   - +3.021%
ppc4 914 551 625   -  +4.74%
ppc64  5 098 153 168   - +4.565%
ppc64le5 082 421 772   -  +4.58%
riscv642 192 309 680   - +1.956%
s390x  4 584 599 723   - +2.898%
sh43 949 056 427   - +3.464%
sparc644 586 233 308   - +4.238%
x86_64 2 484 125 760   - +1.752%


Test Program: dijkstra_int32

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 210 214 520   - +1.494%
alpha  1 494 138 832   - +2.151%
arm8 263 050 249   - +2.667%
hppa   5 207 295 533   - +3.046%
m68k   1 725 891 138   - +2.529%
mips   1 495 259 637   - +1.494%
mipsel 1 497 166 282   -  +1.48%
mips64 1 715 436 791   - +1.894%
mips64el   1 695 217 041   - +1.909%
ppc2 014 591 618   - +1.821%
ppc64  2 206 271 717   - +2.139%
ppc64le2 198 003 699   - +2.146%
riscv641 354 920 580   - +2.396%
s390x  2 916 104 932   - +1.236%
sh41 990 549 868   -  +2.67%
sparc642 874 269 567   - +3.831%
x86_64 1 554 016 064   - +2.122%


Test Program: matmult_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch641 412 274 274   - +0.302%
alpha  3 233 999 758   - +7.473%
arm8 545 308 683   -  +1.09%
hppa   3 483 506 445   - +4.466%
m68k   3 919 121 861   -+18.433%
mips   2 344 796 448   - +4.092%
mipsel 3 329 919 320   - +5.178%
mips64 2 359 046 997   - +4.076%

[REPORT] Nightly Performance Tests - Saturday, September 5, 2020

2020-09-05 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-05 22:30:02
End Time (UTC)   : 2020-09-05 23:02:41
Execution Time   : 0:32:38.859046

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 8ca019b9

AVERAGE RESULTS

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 158 372 846   - +1.694%
alpha  1 914 985 445   - +3.526%
arm8 076 535 242   - +2.308%
hppa   4 261 669 730   - +3.163%
m68k   2 690 289 243   - +7.134%
mips   1 862 064 257   - +2.496%
mipsel 2 008 246 117   - +2.676%
mips64 1 918 642 593   - +2.819%
mips64el   2 051 561 403   - +3.026%
ppc2 480 179 159   -  +3.11%
ppc64  2 576 714 747   - +3.143%
ppc64le2 558 860 504   - +3.173%
riscv641 406 723 320   - +2.652%
s390x  3 158 151 520   - +3.119%
sh42 364 479 204   - +3.333%
sparc643 318 821 622   - +3.861%
x86_64 1 775 829 866   - +2.158%


   DETAILED RESULTS

Test Program: dijkstra_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch643 062 604 661   - +1.424%
alpha  3 191 883 440   - +3.697%
arm   16 357 299 371   - +2.348%
hppa   7 228 385 164   - +3.086%
m68k   4 294 054 921   - +9.693%
mips   3 051 478 321   - +2.429%
mipsel 3 231 555 078   -  +2.87%
mips64 3 245 831 513   - +2.596%
mips64el   3 414 222 653   - +3.021%
ppc4 914 563 597   - +4.741%
ppc64  5 098 153 685   - +4.565%
ppc64le5 082 423 743   -  +4.58%
riscv642 192 306 122   - +1.956%
s390x  4 584 600 496   - +2.898%
sh43 949 070 779   - +3.465%
sparc644 586 228 003   - +4.238%
x86_64 2 484 133 478   - +1.752%


Test Program: dijkstra_int32

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 210 218 962   - +1.495%
alpha  1 494 152 165   - +2.152%
arm8 263 051 865   - +2.667%
hppa   5 207 303 004   - +3.047%
m68k   1 725 875 771   - +2.528%
mips   1 495 271 435   - +1.495%
mipsel 1 497 172 181   - +1.481%
mips64 1 715 437 823   - +1.895%
mips64el   1 695 217 319   - +1.909%
ppc2 014 602 742   - +1.822%
ppc64  2 206 269 157   - +2.139%
ppc64le2 198 007 802   - +2.147%
riscv641 354 920 272   - +2.396%
s390x  2 916 108 759   - +1.236%
sh41 990 564 182   - +2.671%
sparc642 874 264 153   - +3.831%
x86_64 1 554 026 228   - +2.123%


Test Program: matmult_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch641 412 278 646   - +0.302%
alpha  3 234 009 994   - +7.474%
arm8 545 310 309   -  +1.09%
hppa   3 483 513 896   - +4.466%
m68k   3 919 106 424   -+18.432%
mips   2 344 808 256   - +4.092%
mipsel 3 329 928 020   - +5.178%
mips64 2 359 048 149   - +4.076%

[REPORT] Nightly Performance Tests - Friday, September 4, 2020

2020-09-04 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-04 22:30:01
End Time (UTC)   : 2020-09-04 23:02:43
Execution Time   : 0:32:41.137532

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 1133ce5e

AVERAGE RESULTS

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 158 367 875   - +1.694%
alpha  1 914 974 588   - +3.525%
arm8 076 537 577   - +2.308%
hppa   4 261 662 168   - +3.163%
m68k   2 690 308 642   - +7.135%
mips   1 862 063 208   - +2.496%
mipsel 2 008 248 016   - +2.676%
mips64 1 918 630 831   - +2.818%
mips64el   2 051 549 551   - +3.025%
ppc2 480 175 842   -  +3.11%
ppc64  2 576 710 679   - +3.142%
ppc64le2 558 859 780   - +3.173%
riscv641 406 726 829   - +2.652%
s390x  3 158 144 397   - +3.119%
sh42 364 478 176   - +3.333%
sparc643 318 820 841   - +3.861%
x86_64 1 775 814 279   - +2.157%


   DETAILED RESULTS

Test Program: dijkstra_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch643 062 601 310   - +1.424%
alpha  3 191 872 263   - +3.696%
arm   16 357 308 169   - +2.348%
hppa   7 228 377 660   - +3.086%
m68k   4 294 075 336   - +9.693%
mips   3 051 477 273   - +2.429%
mipsel 3 231 556 258   -  +2.87%
mips64 3 245 819 354   - +2.596%
mips64el   3 414 210 464   - +3.021%
ppc4 914 559 526   -  +4.74%
ppc64  5 098 149 082   - +4.565%
ppc64le5 082 423 403   -  +4.58%
riscv642 192 311 866   - +1.956%
s390x  4 584 593 271   - +2.898%
sh43 949 069 035   - +3.465%
sparc644 586 226 389   - +4.238%
x86_64 2 484 119 875   - +1.752%


Test Program: dijkstra_int32

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 210 216 374   - +1.494%
alpha  1 494 141 041   - +2.151%
arm8 263 052 939   - +2.667%
hppa   5 207 295 529   - +3.046%
m68k   1 725 893 071   - +2.529%
mips   1 495 267 687   - +1.495%
mipsel 1 497 173 485   - +1.481%
mips64 1 715 426 577   - +1.894%
mips64el   1 695 203 694   - +1.909%
ppc2 014 602 248   - +1.822%
ppc64  2 206 265 317   - +2.138%
ppc64le2 198 007 387   - +2.146%
riscv641 354 920 226   - +2.396%
s390x  2 916 101 653   - +1.236%
sh41 990 565 618   - +2.671%
sparc642 874 262 720   - +3.831%
x86_64 1 554 012 507   - +2.122%


Test Program: matmult_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch641 412 273 280   - +0.302%
alpha  3 233 996 524   - +7.473%
arm8 545 311 371   -  +1.09%
hppa   3 483 506 390   - +4.466%
m68k   3 919 126 809   -+18.433%
mips   2 344 804 626   - +4.092%
mipsel 3 329 929 173   - +5.178%
mips64 2 359 036 928   - +4.075%

[REPORT] Nightly Performance Tests - Thursday, September 3, 2020

2020-09-03 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-03 22:30:01
End Time (UTC)   : 2020-09-03 23:02:32
Execution Time   : 0:32:30.542730

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 67a7bfe5

AVERAGE RESULTS

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 158 367 755   - +1.694%
alpha  1 914 960 995   - +3.524%
arm8 076 520 721   - +2.308%
hppa   4 261 664 764   - +3.163%
m68k   2 690 290 514   - +7.134%
mips   1 862 051 405   - +2.495%
mipsel 2 008 225 202   - +2.675%
mips64 1 918 629 510   - +2.818%
mips64el   2 051 549 378   - +3.025%
ppc2 480 156 856   - +3.108%
ppc64  2 576 717 767   - +3.143%
ppc64le2 558 850 374   - +3.172%
riscv641 406 711 802   - +2.651%
s390x  3 158 134 609   - +3.118%
sh42 364 458 411   - +3.331%
sparc643 318 813 316   - +3.861%
x86_64 1 775 808 298   - +2.156%


   DETAILED RESULTS

Test Program: dijkstra_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch643 062 601 465   - +1.424%
alpha  3 191 859 003   - +3.696%
arm   16 357 291 602   - +2.348%
hppa   7 228 382 065   - +3.086%
m68k   4 294 056 156   - +9.693%
mips   3 051 465 554   - +2.428%
mipsel 3 231 534 555   -  +2.87%
mips64 3 245 821 920   - +2.596%
mips64el   3 414 211 292   - +3.021%
ppc4 914 541 834   -  +4.74%
ppc64  5 098 157 130   - +4.565%
ppc64le5 082 415 071   -  +4.58%
riscv642 192 296 198   - +1.955%
s390x  4 584 583 479   - +2.897%
sh43 949 049 364   - +3.464%
sparc644 586 218 963   - +4.238%
x86_64 2 484 115 032   - +1.751%


Test Program: dijkstra_int32

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 210 216 632   - +1.494%
alpha  1 494 125 156   -  +2.15%
arm8 263 036 399   - +2.666%
hppa   5 207 297 927   - +3.046%
m68k   1 725 877 340   - +2.528%
mips   1 495 255 900   - +1.494%
mipsel 1 497 151 612   - +1.479%
mips64 1 715 424 355   - +1.894%
mips64el   1 695 205 171   - +1.909%
ppc2 014 582 478   - +1.821%
ppc64  2 206 273 897   - +2.139%
ppc64le2 197 997 803   - +2.146%
riscv641 354 908 557   - +2.395%
s390x  2 916 091 851   - +1.236%
sh41 990 545 408   -  +2.67%
sparc642 874 255 135   - +3.831%
x86_64 1 554 005 310   - +2.121%


Test Program: matmult_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch641 412 273 024   - +0.302%
alpha  3 233 985 929   - +7.473%
arm8 545 294 928   -  +1.09%
hppa   3 483 506 780   - +4.466%
m68k   3 919 107 825   -+18.433%
mips   2 344 795 311   - +4.092%
mipsel 3 329 907 371   - +5.177%
mips64 2 359 034 683   - +4.075%

[REPORT] Nightly Performance Tests - Wednesday, September 2, 2020

2020-09-02 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-02 22:30:02
End Time (UTC)   : 2020-09-02 23:02:38
Execution Time   : 0:32:36.315663

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT ed215cec

AVERAGE RESULTS

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 158 360 930   - +1.694%
alpha  1 914 970 396   - +3.524%
arm8 076 521 559   - +2.308%
hppa   4 261 655 888   - +3.162%
m68k   2 690 291 454   - +7.134%
mips   1 862 055 271   - +2.495%
mipsel 2 008 237 057   - +2.676%
mips64 1 918 627 557   - +2.818%
mips64el   2 051 536 272   - +3.024%
ppc2 480 160 545   - +3.108%
ppc64  2 576 707 600   - +3.142%
ppc64le2 558 869 865   - +3.174%
riscv641 406 710 004   -  +2.65%
s390x  3 158 126 452   - +3.118%
sh42 364 458 094   - +3.331%
sparc643 318 800 942   -  +3.86%
x86_64 1 775 807 445   - +2.156%


   DETAILED RESULTS

Test Program: dijkstra_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch643 062 596 104   - +1.424%
alpha  3 191 868 433   - +3.696%
arm   16 357 285 680   - +2.348%
hppa   7 228 372 058   - +3.086%
m68k   4 294 057 266   - +9.693%
mips   3 051 467 316   - +2.428%
mipsel 3 231 546 158   -  +2.87%
mips64 3 245 820 632   - +2.596%
mips64el   3 414 198 738   - +3.021%
ppc4 914 545 594   -  +4.74%
ppc64  5 098 147 494   - +4.565%
ppc64le5 082 438 783   -  +4.58%
riscv642 192 294 741   - +1.955%
s390x  4 584 575 668   - +2.897%
sh43 949 047 124   - +3.464%
sparc644 586 207 238   - +4.237%
x86_64 2 484 112 825   - +1.751%


Test Program: dijkstra_int32

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 210 209 502   - +1.494%
alpha  1 494 136 649   - +2.151%
arm8 263 038 159   - +2.667%
hppa   5 207 287 269   - +3.046%
m68k   1 725 878 249   - +2.528%
mips   1 495 262 663   - +1.494%
mipsel 1 497 165 654   -  +1.48%
mips64 1 715 423 109   - +1.894%
mips64el   1 695 190 855   - +1.908%
ppc2 014 586 104   - +1.821%
ppc64  2 206 261 948   - +2.138%
ppc64le2 198 014 858   - +2.147%
riscv641 354 904 958   - +2.395%
s390x  2 916 083 991   - +1.236%
sh41 990 545 700   -  +2.67%
sparc642 874 242 969   -  +3.83%
x86_64 1 554 003 185   - +2.121%


Test Program: matmult_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch641 412 263 878   - +0.301%
alpha  3 233 995 126   - +7.473%
arm8 545 296 938   -  +1.09%
hppa   3 483 500 827   - +4.466%
m68k   3 919 108 993   -+18.433%
mips   2 344 799 492   - +4.092%
mipsel 3 329 919 049   - +5.178%
mips64 2 359 033 307   - +4.075%

Re: [PATCH 7/9] tests/performance: Add nightly tests

2020-09-02 Thread Ahmed Karaman
Thanks Mr. Alex,

On Wed, Sep 2, 2020 at 3:26 PM Alex Bennée wrote:
>
>
> Ahmed Karaman writes:
>
> > A nightly performance testing system to monitor any change in QEMU
> > performance across seventeen different targets.
> >
> > The system includes eight different benchmarks to provide a variety
> > of testing workloads.
> >
> > dijkstra_double:
> > Find the shortest path between the source node and all other nodes
> > using Dijkstra’s algorithm. The graph contains n nodes where all nxn
> > distances are double values. The value of n can be specified using
> > the -n flag. The default value is 2000.
> >
> > dijkstra_int32:
> > Find the shortest path between the source node and all other nodes
> > using Dijkstra’s algorithm. The graph contains n nodes where all nxn
> > distances are int32 values. The value of n can be specified using
> > the -n flag. The default value is 2000.
> >
> > matmult_double:
> > Standard matrix multiplication of an n*n matrix of randomly generated
> > double numbers from 0 to 100. The value of n is passed as an argument
> > with the -n flag. The default value is 200.
> >
> > matmult_int32:
> > Standard matrix multiplication of an n*n matrix of randomly generated
> > integer numbers from 0 to 100. The value of n is passed as an
> > argument with the -n flag. The default value is 200.
> >
> > qsort_double:
> > Quick sort of an array of n randomly generated double numbers from 0
> > to 1000. The value of n is passed as an argument with the -n flag.
> > The default value is 30.
> >
> > qsort_int32:
> > Quick sort of an array of n randomly generated integer numbers from 0
> > to 5000. The value of n is passed as an argument with the -n flag.
> > The default value is 30.
> >
> > qsort_string:
> > Quick sort of an array of 1 randomly generated strings of size 8
> > (including null terminating character). The sort process is repeated
> > n number of times. The value of n is passed as an argument with the
> > -n flag. The default value is 20.
> >
> > search_string:
> > Search for the occurrence of a small string in a much larger random
> > string (“needle in a hay”). The search process is repeated n number
> > of times and each time, a different large random string (“hay”) is
> > generated. The value of n can be specified using the -n flag. The
> > default value is 20.
> >
> > Syntax:
> > nightly_tests_core.py [-h] [-r REF]
> > Optional arguments:
> > -h, --help            Show this help message and exit
> > -r REF, --reference REF
> >                       Reference QEMU version - Default is v5.1.0
> > Example of usage:
> > nightly_tests_core.py -r v5.1.0 2>log.txt
> >
> > The following report includes detailed setup and execution details
> > of the system:
> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-Nightly-Performance-Tests/
> >
> > Signed-off-by: Ahmed Karaman 
> > ---
> >  tests/performance/nightly-tests/README.md     | 243 +
> >  .../source/dijkstra_double/dijkstra_double.c  | 194 
> >  .../source/dijkstra_int32/dijkstra_int32.c    | 192 
> >  .../source/matmult_double/matmult_double.c    | 123 +++
> >  .../source/matmult_int32/matmult_int32.c      | 121 +++
> >  .../source/qsort_double/qsort_double.c        | 104 ++
> >  .../source/qsort_int32/qsort_int32.c          | 103 ++
> >  .../source/qsort_string/qsort_string.c        | 122 +++
> >  .../source/search_string/search_string.c      | 110 +++
> >  .../scripts/nightly_tests_core.py             | 920 ++
> >  .../scripts/run_nightly_tests.py              | 135 +++
> >  .../nightly-tests/scripts/send_email.py       |  56 ++
> >  12 files changed, 2423 insertions(+)
> >  create mode 100644 tests/performance/nightly-tests/README.md
> >  create mode 100644 tests/performance/nightly-tests/benchmarks/source/dijkstra_double/dijkstra_double.c
> >  create mode 100644 tests/performance/nightly-tests/benchmarks/source/dijkstra_int32/dijkstra_int32.c
> >  create mode 100644 tests/performance/nightly-tests/benchmarks/source/matmult_double/matmult_double.c
> >  create mode 100644 tests/performance/nightly-tests/benchmarks/source/matmult_int32/matmult_int32.c
> >  create mode 100644 tests/performance/nightly-tests/benchmarks/source/qsort_double/qsort_double.c
> >  create mode 100644 tests/performance/nightly-tests/benchmarks/source/qsort_int32/qsort_int32.c
>
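
The command-line syntax quoted in the commit message above can be sketched with Python's argparse module. This is only an illustration of the documented interface ([-h] [-r REF], defaulting to v5.1.0). The function name build_parser and the attribute name ref are assumptions, and the real nightly_tests_core.py may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the quoted syntax: [-h] [-r REF]."""
    parser = argparse.ArgumentParser(prog="nightly_tests_core.py")
    parser.add_argument(
        "-r", "--reference",
        dest="ref",          # hypothetical attribute name
        metavar="REF",
        default="v5.1.0",
        help="Reference QEMU version - Default is v5.1.0",
    )
    return parser


# Parsing an explicit argument list keeps the example independent of sys.argv.
args = build_parser().parse_args(["-r", "v5.1.0"])
print(args.ref)
```

argparse adds the -h/--help option on its own, so only the reference option has to be declared.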

[REPORT] Nightly Performance Tests - Tuesday, September 1, 2020

2020-09-01 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-09-01 22:30:02
End Time (UTC)   : 2020-09-01 23:02:51
Execution Time   : 0:32:49.638129

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 8d90bfc5

AVERAGE RESULTS

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 158 350 999   - +1.693%
alpha  1 914 981 010   - +3.525%
arm8 076 531 001   - +2.308%
hppa   4 261 662 287   - +3.163%
m68k   2 690 302 840   - +7.135%
mips   1 862 054 380   - +2.495%
mipsel 2 008 241 001   - +2.676%
mips64 1 918 633 852   - +2.818%
mips64el   2 051 567 365   - +3.026%
ppc2 480 164 517   - +3.109%
ppc64  2 576 708 166   - +3.142%
ppc64le2 558 867 362   - +3.174%
riscv641 406 721 465   - +2.651%
s390x  3 158 148 058   - +3.119%
sh42 364 478 840   - +3.333%
sparc643 318 819 982   - +3.861%
x86_64 1 775 817 408   - +2.157%


   DETAILED RESULTS

Test Program: dijkstra_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch643 062 583 254   - +1.424%
alpha  3 191 875 753   - +3.696%
arm   16 357 301 960   - +2.348%
hppa   7 228 378 025   - +3.086%
m68k   4 294 068 499   - +9.693%
mips   3 051 468 311   - +2.428%
mipsel 3 231 549 756   -  +2.87%
mips64 3 245 827 156   - +2.596%
mips64el   3 414 230 354   - +3.022%
ppc4 914 550 074   -  +4.74%
ppc64  5 098 147 947   - +4.565%
ppc64le5 082 418 836   -  +4.58%
riscv642 192 306 931   - +1.956%
s390x  4 584 596 667   - +2.898%
sh43 949 069 729   - +3.465%
sparc644 586 225 467   - +4.238%
x86_64 2 484 124 345   - +1.752%


Test Program: dijkstra_int32

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 210 199 232   - +1.494%
alpha  1 494 147 129   - +2.151%
arm8 263 046 429   - +2.667%
hppa   5 207 295 544   - +3.046%
m68k   1 725 886 990   - +2.528%
mips   1 495 261 093   - +1.494%
mipsel 1 497 168 507   -  +1.48%
mips64 1 715 429 703   - +1.894%
mips64el   1 695 229 035   -  +1.91%
ppc2 014 590 358   - +1.821%
ppc64  2 206 264 813   - +2.138%
ppc64le2 198 017 266   - +2.147%
riscv641 354 917 032   - +2.396%
s390x  2 916 104 780   - +1.236%
sh41 990 565 824   - +2.671%
sparc642 874 261 717   - +3.831%
x86_64 1 554 014 845   - +2.122%


Test Program: matmult_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch641 412 256 280   -   +0.3%
alpha  3 234 002 720   - +7.474%
arm8 545 305 325   -  +1.09%
hppa   3 483 506 497   - +4.466%
m68k   3 919 120 341   -+18.433%
mips   2 344 798 117   - +4.092%
mipsel 3 329 921 914   - +5.178%
mips64 2 359 037 334   - +4.075%

[REPORT] Nightly Performance Tests - Monday, August 31, 2020

2020-08-31 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-31 21:30:01
End Time (UTC)   : 2020-08-31 22:03:18
Execution Time   : 0:33:17.321826

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 2f4c51c0

AVERAGE RESULTS

Target  Instructions  Latest  v5.1.0
--    --  --
aarch642 158 385 359   - +1.695%
alpha  1 914 979 612   - +3.525%
arm8 076 413 634   - +2.305%
hppa   4 261 661 111   - +3.163%
m68k   2 690 309 055   - +7.135%
mips   1 862 059 680   - +2.496%
mipsel 2 008 233 282   - +2.675%
mips64 1 918 655 917   - +2.819%
mips64el   2 051 564 413   - +3.026%
ppc2 480 164 718   - +3.109%
ppc64  2 576 704 659   - +3.142%
ppc64le2 558 877 496   - +3.174%
riscv641 406 728 405   - +2.652%
s390x  3 158 138 330   - +3.119%
sh42 364 470 690   - +3.332%
sparc643 318 827 000   - +3.861%
x86_64 1 775 830 837   - +2.158%


   DETAILED RESULTS

Test Program: dijkstra_double

Target  Instructions  Latest  v5.1.0
--    --  --
aarch643 062 621 750   - +1.425%
alpha  3 191 877 533   - +3.696%
arm   16 357 167 482   - +2.347%
hppa   7 228 376 256   - +3.086%
m68k   4 294 075 581   - +9.693%
mips   3 051 474 700   - +2.429%
mipsel 3 231 542 693   -  +2.87%
mips64 3 245 849 927   - +2.597%
mips64el   3 414 228 231   - +3.022%
ppc4 914 550 541   -  +4.74%
ppc64  5 098 144 074   - +4.565%
ppc64le5 082 430 657   -  +4.58%
riscv642 192 314 810   - +1.956%
s390x  4 584 587 603   - +2.898%
sh43 949 062 260   - +3.465%
sparc644 586 233 287   - +4.238%
x86_64 2 484 139 245   - +1.752%


Test Program: dijkstra_int32

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 210 231 707        -   +1.495%
alpha        1 494 143 338        -   +2.151%
arm          8 262 936 445        -   +2.665%
hppa         5 207 294 158        -   +3.046%
m68k         1 725 895 594        -   +2.529%
mips         1 495 263 941        -   +1.494%
mipsel       1 497 162 301        -    +1.48%
mips64       1 715 451 246        -   +1.895%
mips64el     1 695 220 602        -    +1.91%
ppc          2 014 590 742        -   +1.821%
ppc64        2 206 260 784        -   +2.138%
ppc64le      2 198 026 833        -   +2.147%
riscv64      1 354 923 960        -   +2.397%
s390x        2 916 095 829        -   +1.236%
sh4          1 990 558 292        -   +2.671%
sparc64      2 874 269 435        -   +3.831%
x86_64       1 554 029 269        -   +2.123%


Test Program: matmult_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      1 412 288 804        -   +0.303%
alpha        3 234 004 360        -   +7.474%
arm          8 545 194 947        -   +1.089%
hppa         3 483 504 965        -   +4.466%
m68k         3 919 126 846        -  +18.433%
mips         2 344 803 493        -   +4.092%
mipsel       3 329 915 487        -   +5.178%
mips64       2 359 059 099        -   +4.076%

[REPORT] Nightly Performance Tests - Sunday, August 30, 2020

2020-08-30 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-30 22:30:02
End Time (UTC)   : 2020-08-30 23:02:40
Execution Time   : 0:32:38.741642

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 39335fab

AVERAGE RESULTS

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 158 356 799        -   +1.693%
alpha        1 914 979 687        -   +3.525%
arm          8 076 407 703        -   +2.305%
hppa         4 261 660 530        -   +3.163%
m68k         2 690 309 633        -   +7.135%
mips         1 862 053 953        -   +2.495%
mipsel       2 008 238 376        -   +2.676%
mips64       1 918 641 638        -   +2.818%
mips64el     2 051 571 079        -   +3.026%
ppc          2 480 161 990        -   +3.108%
ppc64        2 576 693 226        -   +3.141%
ppc64le      2 558 854 547        -   +3.173%
riscv64      1 406 724 446        -   +2.652%
s390x        3 158 133 785        -   +3.118%
sh4          2 364 482 941        -   +3.333%
sparc64      3 318 814 369        -   +3.861%
x86_64       1 775 816 026        -   +2.157%


   DETAILED RESULTS

Test Program: dijkstra_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      3 062 590 918        -   +1.424%
alpha        3 191 876 357        -   +3.696%
arm         16 357 180 254        -   +2.347%
hppa         7 228 376 861        -   +3.086%
m68k         4 294 075 069        -   +9.693%
mips         3 051 468 173        -   +2.428%
mipsel       3 231 546 500        -    +2.87%
mips64       3 245 835 893        -   +2.596%
mips64el     3 414 234 674        -   +3.022%
ppc          4 914 547 363        -    +4.74%
ppc64        5 098 133 092        -   +4.565%
ppc64le      5 082 406 390        -    +4.58%
riscv64      2 192 311 045        -   +1.956%
s390x        4 584 582 569        -   +2.897%
sh4          3 949 075 402        -   +3.465%
sparc64      4 586 219 976        -   +4.238%
x86_64       2 484 120 493        -   +1.752%


Test Program: dijkstra_int32

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 210 203 163        -   +1.494%
alpha        1 494 145 105        -   +2.151%
arm          8 262 925 102        -   +2.665%
hppa         5 207 291 886        -   +3.046%
m68k         1 725 893 958        -   +2.529%
mips         1 495 258 401        -   +1.494%
mipsel       1 497 166 001        -    +1.48%
mips64       1 715 435 496        -   +1.894%
mips64el     1 695 224 300        -    +1.91%
ppc          2 014 587 884        -   +1.821%
ppc64        2 206 248 055        -   +2.138%
ppc64le      2 198 001 759        -   +2.146%
riscv64      1 354 920 887        -   +2.396%
s390x        2 916 088 319        -   +1.236%
sh4          1 990 570 655        -   +2.671%
sparc64      2 874 255 925        -   +3.831%
x86_64       1 554 013 871        -   +2.122%


Test Program: matmult_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      1 412 260 258        -   +0.301%
alpha        3 234 003 322        -   +7.474%
arm          8 545 183 605        -   +1.088%
hppa         3 483 505 358        -   +4.466%
m68k         3 919 128 393        -  +18.433%
mips         2 344 797 921        -   +4.092%
mipsel       3 329 919 088        -   +5.178%
mips64       2 359 048 388        -   +4.076%

[REPORT] Nightly Performance Tests - Saturday, August 29, 2020

2020-08-29 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-29 22:35:01
End Time (UTC)   : 2020-08-29 23:07:59
Execution Time   : 0:32:57.786998

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 39335fab

AVERAGE RESULTS

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 158 345 257        -   +1.693%
alpha        1 914 965 210        -   +3.524%
arm          8 076 411 724        -   +2.304%
hppa         4 261 660 282        -   +3.162%
m68k         2 690 286 019        -   +7.133%
mips         1 862 042 108        -   +2.494%
mipsel       2 008 215 481        -   +2.674%
mips64       1 918 638 076        -   +2.818%
mips64el     2 051 558 104        -   +3.026%
ppc          2 480 149 009        -   +3.107%
ppc64        2 576 685 090        -    +3.14%
ppc64le      2 558 841 352        -   +3.172%
riscv64      1 406 710 751        -    +2.65%
s390x        3 158 131 112        -   +3.118%
sh4          2 364 461 215        -   +3.331%
sparc64      3 318 816 026        -   +3.861%
x86_64       1 775 796 489        -   +2.155%


   DETAILED RESULTS

Test Program: dijkstra_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      3 062 564 802        -   +1.423%
alpha        3 191 863 094        -   +3.696%
arm         16 357 176 719        -   +2.347%
hppa         7 228 376 670        -   +3.086%
m68k         4 294 055 120        -   +9.693%
mips         3 051 470 521        -   +2.428%
mipsel       3 231 521 199        -   +2.869%
mips64       3 245 828 100        -   +2.596%
mips64el     3 414 220 121        -   +3.021%
ppc          4 914 534 259        -    +4.74%
ppc64        5 098 122 090        -   +4.565%
ppc64le      5 082 391 080        -   +4.579%
riscv64      2 192 297 538        -   +1.956%
s390x        4 584 592 888        -   +2.898%
sh4          3 949 055 211        -   +3.464%
sparc64      4 586 220 801        -   +4.238%
x86_64       2 484 100 236        -   +1.751%


Test Program: dijkstra_int32

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 210 194 213        -   +1.493%
alpha        1 494 131 792        -    +2.15%
arm          8 262 944 317        -   +2.665%
hppa         5 207 293 281        -   +3.046%
m68k         1 725 873 574        -   +2.528%
mips         1 495 245 241        -   +1.493%
mipsel       1 497 140 812        -   +1.479%
mips64       1 715 422 394        -   +1.894%
mips64el     1 695 211 434        -   +1.909%
ppc          2 014 572 096        -    +1.82%
ppc64        2 206 239 628        -   +2.137%
ppc64le      2 197 989 010        -   +2.146%
riscv64      1 354 919 707        -   +2.396%
s390x        2 916 086 767        -   +1.236%
sh4          1 990 548 527        -    +2.67%
sparc64      2 874 257 044   +0.07%   +3.831%
x86_64       1 553 996 279        -   +2.121%


Test Program: matmult_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      1 412 251 151        -     +0.3%
alpha        3 233 987 397        -   +7.473%
arm          8 545 187 751        -   +1.088%
hppa         3 483 519 295        -   +4.466%
m68k         3 919 104 247        -  +18.432%
mips         2 344 782 089        -   +4.091%
mipsel       3 329 894 022        -   +5.177%
mips64       2 359 038 211        -   +4.076%

[REPORT] Nightly Performance Tests - Friday, August 28, 2020

2020-08-28 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-28 21:30:02
End Time (UTC)   : 2020-08-28 21:35:21
Execution Time   : 0:05:18.871569

Status   : FAILURE


SUMMARY REPORT - COMMIT a4e236b7


  ERROR LOGS

2020-08-28T21:30:02.254226 - Verifying executables of 8 benchmarks for 17 targets
2020-08-28T21:30:02.258421 - Verifying results of reference version v5.1.0
2020-08-28T21:30:02.278698 - Checking out master
2020-08-28T21:30:02.703788 - Pulling the latest changes from QEMU master
From https://git.qemu.org/git/qemu
 * branch  master -> FETCH_HEAD
   ac8b279f13..a4e236b7d4  master -> origin/master
2020-08-28T21:30:08.254973 - Running 'configure' for master
2020-08-28T21:30:24.336578 - Running 'make' for master
2020-08-28T21:32:14.504346 - Measuring instructions for master - dijkstra_double
==15118== Callgrind, a call-graph generating cache profiler
==15118== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==15118== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==15118== Command: /home/ahmedkrmn/Desktop/GSoC2020/tools/qemu-nightly-tests/qemu-nightly/build-gcc/m68k-linux-user/qemu-m68k /home/ahmedkrmn/Desktop/GSoC2020/tools/qemu-nightly-tests/benchmarks/executables/dijkstra_double/dijkstra_double-m68k
==15118== 
==15118== For interactive control, run 'callgrind_control -h'.
==15118== 
==15118== Events: Ir
==15118== Collected : 4294020686
==15118== 
==15118== I   refs:  4,294,020,686
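
The instruction count the tests rely on appears in the `Collected` line of the Callgrind output above. A minimal sketch of extracting it from captured stderr follows; `parse_collected` is a hypothetical helper written for illustration and is not taken from the nightly tests scripts.

```python
import re
from typing import Optional


def parse_collected(callgrind_stderr: str) -> Optional[int]:
    """Return the instruction count from a 'Collected : N' line, if present."""
    for line in callgrind_stderr.splitlines():
        match = re.search(r"Collected\s*:\s*(\d+)", line)
        if match:
            return int(match.group(1))
    return None


# Lines copied from the log above.
SAMPLE = """==15118== Events: Ir
==15118== Collected : 4294020686
==15118==
==15118== I   refs:  4,294,020,686"""

print(parse_collected(SAMPLE))
```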






[PATCH 7/9] tests/performance: Add nightly tests

2020-08-28 Thread Ahmed Karaman
A nightly performance testing system to monitor any change in QEMU
performance across seventeen different targets.

The system includes eight different benchmarks to provide a variety
of testing workloads.

dijkstra_double:
Find the shortest path between the source node and all other nodes
using Dijkstra’s algorithm. The graph contains n nodes where all nxn
distances are double values. The value of n can be specified using
the -n flag. The default value is 2000.

dijkstra_int32:
Find the shortest path between the source node and all other nodes
using Dijkstra’s algorithm. The graph contains n nodes where all nxn
distances are int32 values. The value of n can be specified using
the -n flag. The default value is 2000.
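
As a rough illustration of what the two Dijkstra benchmarks compute, here is a minimal Python sketch of the dense-matrix variant of the algorithm. The benchmarks themselves are C programs; the function name, the sample graph, and the O(n^2) scan for the closest node are assumptions made for this sketch, not details taken from the benchmark sources.

```python
import math
from typing import List


def dijkstra(dist: List[List[float]], source: int) -> List[float]:
    """Shortest distances from source in a graph given as an n x n matrix."""
    n = len(dist)
    best = [math.inf] * n
    best[source] = 0.0
    visited = [False] * n
    for _ in range(n):
        # Pick the closest node that has not been finalized yet.
        u = -1
        for i in range(n):
            if not visited[i] and (u == -1 or best[i] < best[u]):
                u = i
        if best[u] == math.inf:
            break  # remaining nodes are unreachable
        visited[u] = True
        # Relax every edge leaving u.
        for v in range(n):
            if not visited[v] and best[u] + dist[u][v] < best[v]:
                best[v] = best[u] + dist[u][v]
    return best


# Hypothetical 4-node graph; entry [i][j] is the distance from i to j.
GRAPH = [
    [0, 4, 1, 9],
    [4, 0, 2, 5],
    [1, 2, 0, 8],
    [9, 5, 8, 0],
]
print(dijkstra(GRAPH, 0))
```

The same code works for the int32 variant, since only the element type of the matrix changes.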

matmult_double:
Standard matrix multiplication of an n*n matrix of randomly generated
double numbers from 0 to 100. The value of n is passed as an argument
with the -n flag. The default value is 200.

matmult_int32:
Standard matrix multiplication of an n*n matrix of randomly generated
integer numbers from 0 to 100. The value of n is passed as an
argument with the -n flag. The default value is 200.

qsort_double:
Quick sort of an array of n randomly generated double numbers from 0
to 1000. The value of n is passed as an argument with the -n flag.
The default value is 30.

qsort_int32:
Quick sort of an array of n randomly generated integer numbers from 0
to 5000. The value of n is passed as an argument with the -n
flag. The default value is 30.

qsort_string:
Quick sort of an array of 1 randomly generated strings of size 8
(including null terminating character). The sort process is repeated
n number of times. The value of n is passed as an argument with the
-n flag. The default value is 20.

search_string:
Search for the occurrence of a small string in a much larger random
string (“needle in a haystack”). The search process is repeated n number
of times and each time, a different large random string (“haystack”) is
generated. The value of n can be specified using the -n flag. The
default value is 20.

Syntax:
nightly_tests_core.py [-h] [-r REF]
Optional arguments:
-h, --help  Show this help message and exit
-r REF, --reference REF
Reference QEMU version - Default is v5.1.0
Example of usage:
nightly_tests_core.py -r v5.1.0 2>log.txt

The following report includes detailed setup and execution details
of the system:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-Nightly-Performance-Tests/

Signed-off-by: Ahmed Karaman 
---
 tests/performance/nightly-tests/README.md | 243 +
 .../source/dijkstra_double/dijkstra_double.c  | 194 
 .../source/dijkstra_int32/dijkstra_int32.c| 192 
 .../source/matmult_double/matmult_double.c| 123 +++
 .../source/matmult_int32/matmult_int32.c  | 121 +++
 .../source/qsort_double/qsort_double.c| 104 ++
 .../source/qsort_int32/qsort_int32.c  | 103 ++
 .../source/qsort_string/qsort_string.c| 122 +++
 .../source/search_string/search_string.c  | 110 +++
 .../scripts/nightly_tests_core.py | 920 ++
 .../scripts/run_nightly_tests.py  | 135 +++
 .../nightly-tests/scripts/send_email.py   |  56 ++
 12 files changed, 2423 insertions(+)
 create mode 100644 tests/performance/nightly-tests/README.md
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/dijkstra_double/dijkstra_double.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/dijkstra_int32/dijkstra_int32.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/matmult_double/matmult_double.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/matmult_int32/matmult_int32.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/qsort_double/qsort_double.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/qsort_int32/qsort_int32.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/qsort_string/qsort_string.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/search_string/search_string.c
 create mode 100755 tests/performance/nightly-tests/scripts/nightly_tests_core.py
 create mode 100755 tests/performance/nightly-tests/scripts/run_nightly_tests.py
 create mode 100644 tests/performance/nightly-tests/scripts/send_email.py

diff --git a/tests/performance/nightly-tests/README.md b/tests/performance/nightly-tests/README.md
new file mode 100644
index 00..6db3b351b3
--- /dev/null
+++ b/tests/performance/nightly-tests/README.md
@@ -0,0 +1,243 @@
+### QEMU Nightly Tests
+
+**Required settings:**
+
+Update the `GMAIL_USER` object in `send_email.py` with your credentials.
+
+For more details on how the system works, please check the [eighth report](https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-Nightly-Performance-Tests/) of the "TCG Continuous Benchmarking" series.

[PATCH 8/9] MAINTAINERS: Add 'tests/performance' to 'Performance Tools and Tests' subsection

2020-08-28 Thread Ahmed Karaman
Add a new 'tests/performance' directory under the 'Performance Tools
and Tests' subsection.

Signed-off-by: Ahmed Karaman 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5a22c8be42..8923307642 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3124,3 +3124,4 @@ Performance Tools and Tests
 M: Ahmed Karaman 
 S: Maintained
 F: scripts/performance/
+F: tests/performance/
-- 
2.17.1




[PATCH 6/9] scripts/performance: Add bisect.py script

2020-08-28 Thread Ahmed Karaman
Python script that locates the commit that caused a performance
degradation or improvement in QEMU using the git bisect command
(binary search).

Syntax:
bisect.py [-h] -s,--start START [-e,--end END] [-q,--qemu QEMU] \
--target TARGET --tool {perf,callgrind} -- \
 []

[-h] - Print the script arguments help message
-s,--start START - First commit hash in the search range
[-e,--end END] - Last commit hash in the search range
(default: Latest commit)
[-q,--qemu QEMU] - QEMU path.
(default: Path to a GitHub QEMU clone)
--target TARGET - QEMU target name
--tool {perf,callgrind} - Underlying tool used for measurements

Example of usage:
bisect.py --start=fdd76fecdd --qemu=/path/to/qemu --target=ppc \
--tool=perf -- coulomb_double-ppc -n 1000

Example output:
Start Commit Instructions: 12,710,790,060
End Commit Instructions:   13,031,083,512
Performance Change:        -2.458%

Estimated Number of Steps: 10

*BISECT STEP 1*
Instructions:        13,031,097,790
Status:              slow commit
*BISECT STEP 2*
Instructions:        12,710,805,265
Status:              fast commit
*BISECT STEP 3*
Instructions:        13,031,028,053
Status:              slow commit
*BISECT STEP 4*
Instructions:        12,711,763,211
Status:              fast commit
*BISECT STEP 5*
Instructions:        13,031,027,292
Status:              slow commit
*BISECT STEP 6*
Instructions:        12,711,748,738
Status:              fast commit
*BISECT STEP 7*
Instructions:        12,711,748,788
Status:              fast commit
*BISECT STEP 8*
Instructions:        13,031,100,493
Status:              slow commit
*BISECT STEP 9*
Instructions:        12,714,472,954
Status:              fast commit
*BISECT STEP 10*
Instructions:        12,715,409,153
Status:              fast commit
*BISECT STEP 11*
Instructions:        12,715,394,739
Status:              fast commit

*BISECT RESULT*
commit 0673ecdf6cb2b1445a85283db8cbacb251c46516
Author: Richard Henderson 
Date:   Tue May 5 10:40:23 2020 -0700

softfloat: Inline float64 compare specializations

Replace the float64 compare specializations with inline functions
that call the standard float64_compare{,_quiet} functions.
Use bool as the return type.
***
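
The "Performance Change" figure in the example output is consistent with (start - end) / end * 100. The sketch below reproduces that arithmetic and adds one plausible way to label a bisect step; the nearest-endpoint rule in `classify` is an assumption made for illustration, since the patch description does not spell out how the script decides between "fast" and "slow".

```python
def performance_change(start: int, end: int) -> float:
    """Percentage change between the start and end commit instruction counts."""
    return (start - end) / end * 100


def classify(instructions: int, start: int, end: int) -> str:
    """Label a commit by whichever endpoint its instruction count is closer to."""
    # Assumed rule: the endpoint with fewer instructions is the 'fast' one.
    fast, slow = min(start, end), max(start, end)
    if abs(instructions - fast) <= abs(instructions - slow):
        return "fast commit"
    return "slow commit"


START, END = 12_710_790_060, 13_031_083_512  # values from the example output
print(f"{performance_change(START, END):.3f}%")
print(classify(13_031_097_790, START, END))  # count reported in bisect step 1
print(classify(12_710_805_265, START, END))  # count reported in bisect step 2
```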

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/bisect.py | 425 ++
 1 file changed, 425 insertions(+)
 create mode 100755 scripts/performance/bisect.py

diff --git a/scripts/performance/bisect.py b/scripts/performance/bisect.py
new file mode 100755
index 00..0c60be22ce
--- /dev/null
+++ b/scripts/performance/bisect.py
@@ -0,0 +1,425 @@
+#!/usr/bin/env python3
+
+"""
+Locate the commit that caused a performance degradation or improvement in
+QEMU using the git bisect command (binary search).
+
+This file is a part of the project "TCG Continuous Benchmarking".
+
+Copyright (C) 2020  Ahmed Karaman 
+Copyright (C) 2020  Aleksandar Markovic 
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
+
+import argparse
+import multiprocessing
+import tempfile
+import os
+import shutil
+import subprocess
+import sys
+
+from typing import List
+
+
+# --- GIT WRAPPERS --
+def git_bisect(qemu_path: str, qemu_build_path: str, command: str,
+               args: List[str] = None) -> str:
+    """
+    Wrapper function for running git bisect.
+
+    Parameters:
+    qemu_path (str): QEMU path
+    qemu_build_path (str): Path to the build directory with configuration files
+    command (str): bisect command (start|fast|slow|reset)
+    args (list): Optional arguments
+
+    Returns:
+    (str): git bisect stdout.
+    """
+    process = ["git", "bisect", command]
+    if args:
+        process += args
+    bisect = subprocess.run(process,
+                            cwd=qemu_path,
+                            stdout=subprocess.

[PATCH 9/9] scripts/performance: Add topN_system.py script

2020-08-28 Thread Ahmed Karaman
Python script for listing the topN executed QEMU functions in
system mode.

Syntax:
topN_system.py [-h] [-n ] -- \
 

[-h] - Print the script arguments help message.
[-n] - Specify the number of top functions to print.
 - If this flag is not specified, the tool defaults to 25.

Example of usage:
topN_system.py -n 20 -- qemu-system-x86_64 -m 1024 -kernel  \
-initrd 

Example output:
Number of instructions: 150,991,381,071

 No.  Percentage  Name
 ---  --  --
   1  11.30%  helper_lookup_tb_ptr
   2   7.01%  liveness_pass_1
   3   4.48%  tcg_gen_code
   4   3.41%  tcg_optimize
   5   1.84%  tcg_out_opc.isra.13
   6   1.78%  helper_pcmpeqb_xmm
   7   1.20%  object_dynamic_cast_assert
   8   1.00%  cpu_exec
   9   0.99%  tcg_temp_new_internal
  10   0.88%  tb_htable_lookup
  11   0.84%  object_class_dynamic_cast_assert
  12   0.81%  init_ts_info
  13   0.80%  tlb_set_page_with_attrs
  14   0.77%  victim_tlb_hit
  15   0.75%  tcg_out_sib_offset
  16   0.62%  tcg_op_alloc
  17   0.61%  helper_pmovmskb_xmm
  18   0.58%  disas_insn.isra.50
  19   0.56%  helper_pcmpgtb_xmm
  20   0.56%  address_space_ldq
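
The ranking shown in the example output amounts to sorting functions by their share of the total instruction count and keeping the first N. A minimal sketch under that reading follows; `top_functions` and the sample counts are hypothetical, and how the real script gathers per-function counts from perf is not shown here.

```python
from typing import Dict, List, Tuple


def top_functions(counts: Dict[str, int],
                  top: int = 25) -> List[Tuple[int, float, str]]:
    """Rank functions by instruction count and attach their share of the total."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, 100 * count / total, name)
            for rank, (name, count) in enumerate(ranked[:top], start=1)]


# Hypothetical per-function instruction counts, not taken from a real run.
SAMPLE_COUNTS = {"func_a": 500, "func_b": 300, "func_c": 150, "func_d": 50}

print(" No.  Percentage  Name")
for rank, percentage, name in top_functions(SAMPLE_COUNTS, top=3):
    print(f"{rank:>4}  {percentage:>9.2f}%  {name}")
```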

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/topN_system.py | 158 +
 1 file changed, 158 insertions(+)
 create mode 100755 scripts/performance/topN_system.py

diff --git a/scripts/performance/topN_system.py b/scripts/performance/topN_system.py
new file mode 100755
index 00..9b1f1a66c7
--- /dev/null
+++ b/scripts/performance/topN_system.py
@@ -0,0 +1,158 @@
+#!/usr/bin/env python3
+
+"""
+Print the top N most executed functions in QEMU system mode emulation.
+
+Syntax:
+topN_system.py [-h] [-n ] -- \
+ 
+
+[-h] - Print the script arguments help message.
+[-n] - Specify the number of top functions to print.
+ - If this flag is not specified, the tool defaults to 25.
+
+Example of usage:
+topN_system.py -n 20 -- qemu-system-x86_64 -m 1024 -kernel  \
+-initrd 
+
+This file is a part of the project "TCG Continuous Benchmarking".
+
+Copyright (C) 2020  Ahmed Karaman 
+Copyright (C) 2020  Aleksandar Markovic 
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+
+
+# Parse the command line arguments
+PARSER = argparse.ArgumentParser(
+    usage="usage: topN_system.py [-h] [-n ]"
+          " --  ")
+
+PARSER.add_argument("-n", dest="top", type=int, default=25,
+                    help="Specify the number of top functions to print.")
+
+PARSER.add_argument("command", type=str, nargs="+", help=argparse.SUPPRESS)
+
+ARGS = PARSER.parse_args()
+
+# Extract the needed variables from the args
+COMMAND = ARGS.command
+TOP = ARGS.top
+
+# Ensure that perf is installed
+CHECK_PERF_PRESENCE = subprocess.run(["which", "perf"],
+                                     stdout=subprocess.DEVNULL,
+                                     check=False)
+if CHECK_PERF_PRESENCE.returncode:
+    sys.exit("Please install perf before running the script!")
+
+# Ensure user has privilege to run perf
+CHECK_PERF_EXECUTABILITY = subprocess.run(["perf", "stat", "ls", "/"],
+                                          stdout=subprocess.DEVNULL,
+                                          stderr=subprocess.DEVNULL,
+                                          check=False)
+if CHECK_PERF_EXECUTABILITY.returncode:
+    sys.exit("""
+Error:
+You may not have permission to collect stats.
+Consider tweaking /proc/sys/kernel/perf_event_paranoid,
+which controls use of the performance events system by
+unprivileged users (without CAP_SYS_ADMIN).
+  -1: Allow use of (almost) all events by all users
+  Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
+   0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
+  Disallow raw tracepoint access by users without CAP_SYS_ADMIN
+   1: Disallow CPU event access by users without CAP_SYS_ADMIN
+   2: Disallow kernel profiling by users without CAP_SYS_ADMIN
+To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
+   kernel.perf_event_paranoid = -1
+
+* A

[PATCH 2/9] scripts/performance: Refactor topN_callgrind.py

2020-08-28 Thread Ahmed Karaman
- Apply pylint and flake8 formatting rules to the script.
- Use 'tempfile' instead of '/tmp' for creating temporary files.
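
A minimal sketch of the second point, using the standard `tempfile` module so that intermediate files live in a private directory that is cleaned up automatically. The file name and the `round_trip` helper are illustrative only and are not taken from the patch.

```python
import os
import tempfile
from typing import Tuple


def round_trip(text: str) -> Tuple[str, bool]:
    """Write text to a file in a private temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        data_path = os.path.join(tmpdir, "callgrind.data")
        with open(data_path, "w") as handle:
            handle.write(text)
        with open(data_path, "r") as handle:
            content = handle.read()
    # The directory and its contents are removed when the with-block exits.
    return content, os.path.exists(data_path)


print(round_trip("events: Ir\n"))
```

Compared with fixed names under `/tmp`, this avoids collisions between concurrent runs and leaves nothing behind.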

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/topN_callgrind.py | 169 +-
 1 file changed, 87 insertions(+), 82 deletions(-)

diff --git a/scripts/performance/topN_callgrind.py b/scripts/performance/topN_callgrind.py
index 67c59197af..f8a554f393 100755
--- a/scripts/performance/topN_callgrind.py
+++ b/scripts/performance/topN_callgrind.py
@@ -1,113 +1,122 @@
 #!/usr/bin/env python3
 
-#  Print the top N most executed functions in QEMU using callgrind.
-#  Syntax:
-#  topN_callgrind.py [-h] [-n]   -- \
-#[] \
-#[]
-#
-#  [-h] - Print the script arguments help message.
-#  [-n] - Specify the number of top functions to print.
-#   - If this flag is not specified, the tool defaults to 25.
-#
-#  Example of usage:
-#  topN_callgrind.py -n 20 -- qemu-arm coulomb_double-arm
-#
-#  This file is a part of the project "TCG Continuous Benchmarking".
-#
-#  Copyright (C) 2020  Ahmed Karaman 
-#  Copyright (C) 2020  Aleksandar Markovic 
-#
-#  This program is free software: you can redistribute it and/or modify
-#  it under the terms of the GNU General Public License as published by
-#  the Free Software Foundation, either version 2 of the License, or
-#  (at your option) any later version.
-#
-#  This program is distributed in the hope that it will be useful,
-#  but WITHOUT ANY WARRANTY; without even the implied warranty of
-#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-#  GNU General Public License for more details.
-#
-#  You should have received a copy of the GNU General Public License
-#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
+Print the top N most executed functions in QEMU using callgrind.
+
+Syntax:
+topN_callgrind.py [-h] [-n ] -- \
+  [] \
+  []
+
+[-h] - Print the script arguments help message.
+[-n] - Specify the number of top functions to print.
+ - If this flag is not specified, the tool defaults to 25.
+
+Example of usage:
+topN_callgrind.py -n 20 -- qemu-arm coulomb_double-arm
+
+This file is a part of the project "TCG Continuous Benchmarking".
+
+Copyright (C) 2020  Ahmed Karaman 
+Copyright (C) 2020  Aleksandar Markovic 
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
 
 import argparse
 import os
 import subprocess
 import sys
+import tempfile
 
 
 # Parse the command line arguments
-parser = argparse.ArgumentParser(
-    usage='topN_callgrind.py [-h] [-n]   -- '
+PARSER = argparse.ArgumentParser(
+    usage='topN_callgrind.py [-h] [-n]  -- '
           ' [] '
           ' []')
 
-parser.add_argument('-n', dest='top', type=int, default=25,
+PARSER.add_argument('-n', dest='top', type=int, default=25,
                     help='Specify the number of top functions to print.')
 
-parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+PARSER.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
 
-args = parser.parse_args()
+ARGS = PARSER.parse_args()
 
 # Extract the needed variables from the args
-command = args.command
-top = args.top
+COMMAND = ARGS.command
+TOP = ARGS.top
 
 # Insure that valgrind is installed
-check_valgrind_presence = subprocess.run(["which", "valgrind"],
-                                         stdout=subprocess.DEVNULL)
-if check_valgrind_presence.returncode:
+CHECK_VALGRIND_PRESENCE = subprocess.run(["which", "valgrind"],
+                                         stdout=subprocess.DEVNULL,
+                                         check=False)
+if CHECK_VALGRIND_PRESENCE.returncode:
     sys.exit("Please install valgrind before running the script!")
 
-# Run callgrind
-callgrind = subprocess.run((
-    ["valgrind", "--tool=callgrind", "--callgrind-out-file=/tmp/callgrind.data"]
-    + command),
-    stdout=subprocess.DEVNULL,
-    stderr=subprocess.PIPE)
-if callgrind.returncode:
-    sys.exit(callgrind.stderr.decode("utf-8"))
-
-# Save callgrind_annotate output to /tmp/callgrind_annotate.out
-with open("/tmp/callgrind_annotate.out", "w") as output:
-    callgrind_annotate = subprocess.run(["callgrind_annotate",
-  

[PATCH 5/9] scripts/performance: Add list_helpers.py script

2020-08-28 Thread Ahmed Karaman
Python script that prints executed helpers of a QEMU invocation.

Syntax:
list_helpers.py [-h] -- \
[] \
[]

[-h] - Print the script arguments help message.

Example of usage:
list_helpers.py -- qemu-mips coulomb_double-mips -n10

Example output:
 Total number of instructions: 108,933,695

 Executed QEMU Helpers:

 No.      Ins  Percent  Calls  Ins/Call  Helper Name              Source File
 ---  -------  -------  -----  --------  -----------------------  -------------------------
   1  183,021   0.168%  1,305       140  helper_float_sub_d       /target/mips/fpu_helper.c
   2  177,111   0.163%    770       230  helper_float_madd_d      /target/mips/fpu_helper.c
   3  171,537   0.157%  1,014       169  helper_float_mul_d       /target/mips/fpu_helper.c
   4  157,298   0.144%  2,443        64  helper_lookup_tb_ptr     /accel/tcg/tcg-runtime.c
   5  138,123   0.127%    897       153  helper_float_add_d       /target/mips/fpu_helper.c
   6   47,083   0.043%    207       227  helper_float_msub_d      /target/mips/fpu_helper.c
   7   24,062   0.022%    487        49  helper_cmp_d_lt          /target/mips/fpu_helper.c
   8   22,910   0.021%    150       152  helper_float_div_d       /target/mips/fpu_helper.c
   9   15,497   0.014%    321        48  helper_cmp_d_eq          /target/mips/fpu_helper.c
  10    9,100   0.008%     52       175  helper_float_trunc_w_d   /target/mips/fpu_helper.c
  11    7,059   0.006%     10       705  helper_float_sqrt_d      /target/mips/fpu_helper.c
  12    3,000   0.003%     40        75  helper_cmp_d_ule         /target/mips/fpu_helper.c
  13    2,720   0.002%     20       136  helper_float_cvtd_w      /target/mips/fpu_helper.c
  14    2,477   0.002%     27        91  helper_swl               /target/mips/op_helper.c
  15    2,000   0.002%     40        50  helper_cmp_d_le          /target/mips/fpu_helper.c
  16    1,800   0.002%     40        45  helper_cmp_d_un          /target/mips/fpu_helper.c
  17    1,164   0.001%     12        97  helper_raise_exception_  /target/mips/op_helper.c
  18      720   0.001%     10        72  helper_cmp_d_ult         /target/mips/fpu_helper.c
  19      560   0.001%    140         4  helper_cfc1              /target/mips/fpu_helper.c
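
The Percent and Ins/Call columns in the example output are consistent with a share of the total instruction count and an integer division of instructions by calls. The sketch below checks that reading against the first row; `helper_row` is a hypothetical helper, and the real script may compute these values differently.

```python
from typing import Tuple


def helper_row(instructions: int, calls: int, total: int) -> Tuple[str, int]:
    """Return the percentage of total instructions and instructions per call."""
    percent = f"{instructions / total * 100:.3f}%"
    return percent, instructions // calls


TOTAL = 108_933_695  # "Total number of instructions" from the example output
print(helper_row(183_021, 1_305, TOTAL))  # figures of helper_float_sub_d
```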

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/list_helpers.py | 221 
 1 file changed, 221 insertions(+)
 create mode 100755 scripts/performance/list_helpers.py

diff --git a/scripts/performance/list_helpers.py b/scripts/performance/list_helpers.py
new file mode 100755
index 00..823b1cef66
--- /dev/null
+++ b/scripts/performance/list_helpers.py
@@ -0,0 +1,221 @@
+#!/usr/bin/env python3
+
+"""
+Print the executed helpers of a QEMU invocation.
+
+This file is a part of the project "TCG Continuous Benchmarking".
+
+Copyright (C) 2020  Ahmed Karaman 
+Copyright (C) 2020  Aleksandar Markovic 
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+
+from typing import List, Union
+
+
+def find_jit_line(callgrind_data: List[str]) -> int:
+    """
+    Search for the line with the JIT call in the callgrind_annotate
+    output when run using --tree=calling.
+    All the helpers should be listed after that line.
+
+    Parameters:
+    callgrind_data (List[str]): callgrind_annotate output
+
+    Returns:
+    (int): Line number of JIT call
+    """
+    line = -1
+    for (i, callgrind_datum) in enumerate(callgrind_data):
+        split_line = callgrind_datum.split()
+        if len(split_line) > 2 and \
+                split_line[1] == "*" and \
+                split_line[-1] == "[???]":
+            line = i
+            break
+    return line
+
+
+def get_helpers(jit_line: int,
+                callgrind_data: List[str]) -> List[List[Union[str, int]]]:
+    """
+    Get all helpers data given the line number of the JIT call.
+
+    Parameters:
+    jit_line (int): Line number of the JIT call
+    callgrind_data (List[str]): callgrind_annotate output
+
+    Returns:
+    (List[List[Union[str, int]]]):[[number_of_instructions(int),
+                                    helper_name(str),
+                                    number_of_calls(int),
+                                    source_file(str)],
+                                   ...]
+    """
+    helper

[PATCH 3/9] scripts/performance: Refactor dissect.py

2020-08-28 Thread Ahmed Karaman
- Apply pylint and flake8 formatting rules to the script.
- Move syntax and usage example to main() docstring.
- Update get_jit_line() to only detect the main jit call.
- Use mypy for specifying parameters and return types in functions.

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/dissect.py | 123 ++---
 1 file changed, 68 insertions(+), 55 deletions(-)

diff --git a/scripts/performance/dissect.py b/scripts/performance/dissect.py
index bf24f50922..d4df884b75 100755
--- a/scripts/performance/dissect.py
+++ b/scripts/performance/dissect.py
@@ -1,34 +1,27 @@
 #!/usr/bin/env python3
 
-#  Print the percentage of instructions spent in each phase of QEMU
-#  execution.
-#
-#  Syntax:
-#  dissect.py [-h] --  [] \
-#[]
-#
-#  [-h] - Print the script arguments help message.
-#
-#  Example of usage:
-#  dissect.py -- qemu-arm coulomb_double-arm
-#
-#  This file is a part of the project "TCG Continuous Benchmarking".
-#
-#  Copyright (C) 2020  Ahmed Karaman 
-#  Copyright (C) 2020  Aleksandar Markovic 
-#
-#  This program is free software: you can redistribute it and/or modify
-#  it under the terms of the GNU General Public License as published by
-#  the Free Software Foundation, either version 2 of the License, or
-#  (at your option) any later version.
-#
-#  This program is distributed in the hope that it will be useful,
-#  but WITHOUT ANY WARRANTY; without even the implied warranty of
-#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-#  GNU General Public License for more details.
-#
-#  You should have received a copy of the GNU General Public License
-#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
+Print the percentage of instructions spent in each phase of QEMU
+execution.
+
+This file is a part of the project "TCG Continuous Benchmarking".
+
+Copyright (C) 2020  Ahmed Karaman 
+Copyright (C) 2020  Aleksandar Markovic 
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
 
 import argparse
 import os
@@ -36,23 +29,26 @@ import subprocess
 import sys
 import tempfile
 
+from typing import List
+
 
-def get_JIT_line(callgrind_data):
+def get_jit_line(callgrind_data: List[str]) -> int:
 """
 Search for the first instance of the JIT call in
 the callgrind_annotate output when ran using --tree=caller
 This is equivalent to the self number of instructions of JIT.
 
 Parameters:
-callgrind_data (list): callgrind_annotate output
+callgrind_data (List[str]): callgrind_annotate output
 
 Returns:
 (int): Line number
 """
 line = -1
-for i in range(len(callgrind_data)):
-if callgrind_data[i].strip('\n') and \
-callgrind_data[i].split()[-1] == "[???]":
+for (i, callgrind_datum) in enumerate(callgrind_data):
+if callgrind_datum.strip('\n') and \
+callgrind_datum.split()[-1] == "[???]" and \
+callgrind_datum.split()[1] == "*":
 line = i
 break
 if line == -1:
@@ -61,6 +57,18 @@ def get_JIT_line(callgrind_data):
 
 
 def main():
+"""
+Parse the command line arguments then start the execution.
+Syntax:
+dissect.py [-h] --  [] \
+  []
+
+[-h] - Print the script arguments help message.
+
+Example of usage:
+dissect.py -- qemu-arm coulomb_double-arm
+"""
+
 # Parse the command line arguments
 parser = argparse.ArgumentParser(
 usage='dissect.py [-h] -- '
@@ -76,7 +84,7 @@ def main():
 
 # Insure that valgrind is installed
 check_valgrind = subprocess.run(
-["which", "valgrind"], stdout=subprocess.DEVNULL)
+["which", "valgrind"], stdout=subprocess.DEVNULL, check=False)
 if check_valgrind.returncode:
 sys.exit("Please install valgrind before running the script.")
 
@@ -93,7 +101,8 @@ def main():
  "--callgrind-out-file=" + data_path]
 + command),
stdout=subprocess.DEVNULL,
-   stderr=subprocess.PIPE)
+   stderr=subprocess.PIPE,
+ 

[PATCH 1/9] scripts/performance: Refactor topN_perf.py

2020-08-28 Thread Ahmed Karaman
- Apply pylint and flake8 formatting rules to the script.
- Use 'tempfile' instead of '/tmp' for creating temporary files.
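
As a rough sketch of the second point (not the script's actual code, which is truncated in this archive), a private temporary directory from the standard 'tempfile' module can replace a hard-coded '/tmp' path. The function name and the file name "perf.data" below are assumptions made for the example.

```python
import os
import tempfile
from typing import Tuple


def roundtrip_in_tempdir(contents: str) -> Tuple[str, bool]:
    """Write and read back a file inside a private temporary directory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical file name, chosen only for this example.
        path = os.path.join(tmpdir, "perf.data")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(contents)
        with open(path, encoding="utf-8") as handle:
            data = handle.read()
    # The directory and everything in it is removed when the with-block
    # exits, so the second value is expected to be False.
    return data, os.path.exists(path)


print(roundtrip_in_tempdir("sample"))
```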

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/topN_perf.py | 174 +++
 1 file changed, 87 insertions(+), 87 deletions(-)

diff --git a/scripts/performance/topN_perf.py b/scripts/performance/topN_perf.py
index 07be195fc8..56b100da87 100755
--- a/scripts/performance/topN_perf.py
+++ b/scripts/performance/topN_perf.py
@@ -1,72 +1,77 @@
 #!/usr/bin/env python3
 
-#  Print the top N most executed functions in QEMU using perf.
-#  Syntax:
-#  topN_perf.py [-h] [-n]   -- \
-#[] \
-#[]
-#
-#  [-h] - Print the script arguments help message.
-#  [-n] - Specify the number of top functions to print.
-#   - If this flag is not specified, the tool defaults to 25.
-#
-#  Example of usage:
-#  topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
-#
-#  This file is a part of the project "TCG Continuous Benchmarking".
-#
-#  Copyright (C) 2020  Ahmed Karaman 
-#  Copyright (C) 2020  Aleksandar Markovic 
-#
-#  This program is free software: you can redistribute it and/or modify
-#  it under the terms of the GNU General Public License as published by
-#  the Free Software Foundation, either version 2 of the License, or
-#  (at your option) any later version.
-#
-#  This program is distributed in the hope that it will be useful,
-#  but WITHOUT ANY WARRANTY; without even the implied warranty of
-#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-#  GNU General Public License for more details.
-#
-#  You should have received a copy of the GNU General Public License
-#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
+Print the top N most executed functions in QEMU using perf.
+
+Syntax:
+topN_perf.py [-h] [-n ] -- \
+  [] \
+  []
+
+[-h] - Print the script arguments help message.
+[-n] - Specify the number of top functions to print.
+ - If this flag is not specified, the tool defaults to 25.
+
+Example of usage:
+topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
+
+This file is a part of the project "TCG Continuous Benchmarking".
+
+Copyright (C) 2020  Ahmed Karaman 
+Copyright (C) 2020  Aleksandar Markovic 
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
 
 import argparse
 import os
 import subprocess
 import sys
+import tempfile
 
 
 # Parse the command line arguments
-parser = argparse.ArgumentParser(
-usage='topN_perf.py [-h] [-n]   -- '
+PARSER = argparse.ArgumentParser(
+usage='topN_perf.py [-h] [-n ] -- '
   ' [] '
   ' []')
 
-parser.add_argument('-n', dest='top', type=int, default=25,
+PARSER.add_argument('-n', dest='top', type=int, default=25,
 help='Specify the number of top functions to print.')
 
-parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+PARSER.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
 
-args = parser.parse_args()
+ARGS = PARSER.parse_args()
 
 # Extract the needed variables from the args
-command = args.command
-top = args.top
+COMMAND = ARGS.command
+TOP = ARGS.top
 
 # Insure that perf is installed
-check_perf_presence = subprocess.run(["which", "perf"],
- stdout=subprocess.DEVNULL)
-if check_perf_presence.returncode:
+CHECK_PERF_PRESENCE = subprocess.run(["which", "perf"],
+ stdout=subprocess.DEVNULL,
+ check=False)
+if CHECK_PERF_PRESENCE.returncode:
 sys.exit("Please install perf before running the script!")
 
 # Insure user has previllage to run perf
-check_perf_executability = subprocess.run(["perf", "stat", "ls", "/"],
+CHECK_PERF_EXECUTABILITY = subprocess.run(["perf", "stat", "ls", "/"],
   stdout=subprocess.DEVNULL,
-  stderr=subprocess.DEVNULL)
-if check_perf_executability.returncode:
-sys.exit(
-"""
+  stderr=subprocess.DEVNULL,
+  check=False)
+if CHECK_PERF_EXECUTABILITY.returncode:
+sys.exit("&

[PATCH 0/9] GSoC 2020 - TCG Continuous Benchmarking scripts and tools

2020-08-28 Thread Ahmed Karaman
Greetings,

This series includes all of the scripts, tools, and benchmarks
developed in the TCG Continuous Benchmarking GSoC project for 2020.

The series includes one patch for updating the MAINTAINERS file and
eight patches each with a separate script or tool.

All scripts and tools were thoroughly introduced, explained, and
utilized in the project weekly reports which were posted here on the
list each Monday at 12:30PM for the last three months.

Reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg714853.html
Introduced tools:
topN_perf.py and topN_callgrind.py

Report 2 - Dissecting QEMU Into Three Main Parts:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg717592.html
Introduced tools:
dissect.py

Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg720321.html

Report 4 - Listing QEMU Helpers and Function Callees:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg722559.html
Introduced tools:
list_fn_callees.py and list_helpers.py

Report 5 - Finding Commits Affecting QEMU Performance:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg724080.html
Introduced tools:
bisect.py

Report 6 - Performance Comparison of Two QEMU Builds:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg725700.html

Report 7 - Measuring QEMU Emulation Efficiency:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg727147.html

Report 8 - QEMU Nightly Performance Tests:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg730444.html
Introduced tools:
nightly-tests/

Report 9 - Measuring QEMU Performance in System Mode:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg732729.html
Introduced tools:
topN_system.py

Report 10 - Measuring QEMU Performance in System Mode - Part Two:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg734485.html


Best regards,
Ahmed Karaman

Ahmed Karaman (9):
  scripts/performance: Refactor topN_perf.py
  scripts/performance: Refactor topN_callgrind.py
  scripts/performance: Refactor dissect.py
  scripts/performance: Add list_fn_callees.py script
  scripts/performance: Add list_helpers.py script
  scripts/performance: Add bisect.py script
  tests/performance: Add nightly tests
  MAINTAINERS: Add 'tests/performance' to 'Performance Tools and Tests'
subsection
  scripts/performance: Add topN_system.py script

 MAINTAINERS   |  32 +-
 scripts/performance/bisect.py | 425 
 scripts/performance/dissect.py| 123 +--
 scripts/performance/list_fn_callees.py| 245 +
 scripts/performance/list_helpers.py   | 221 +
 scripts/performance/topN_callgrind.py | 169 ++--
 scripts/performance/topN_perf.py  | 174 ++--
 scripts/performance/topN_system.py| 158 +++
 tests/performance/nightly-tests/README.md | 243 +
 .../source/dijkstra_double/dijkstra_double.c  | 194 
 .../source/dijkstra_int32/dijkstra_int32.c| 192 
 .../source/matmult_double/matmult_double.c| 123 +++
 .../source/matmult_int32/matmult_int32.c  | 121 +++
 .../source/qsort_double/qsort_double.c| 104 ++
 .../source/qsort_int32/qsort_int32.c  | 103 ++
 .../source/qsort_string/qsort_string.c| 122 +++
 .../source/search_string/search_string.c  | 110 +++
 .../scripts/nightly_tests_core.py | 920 ++
 .../scripts/run_nightly_tests.py  | 135 +++
 .../nightly-tests/scripts/send_email.py   |  56 ++
 20 files changed, 3744 insertions(+), 226 deletions(-)
 create mode 100755 scripts/performance/bisect.py
 create mode 100755 scripts/performance/list_fn_callees.py
 create mode 100755 scripts/performance/list_helpers.py
 create mode 100755 scripts/performance/topN_system.py
 create mode 100644 tests/performance/nightly-tests/README.md
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/dijkstra_double/dijkstra_double.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/dijkstra_int32/dijkstra_int32.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/matmult_double/matmult_double.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/matmult_int32/matmult_int32.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/qsort_double/qsort_double.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/qsort_int32/qsort_int32.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/qsort_string/qsort_string.c
 create mode 100644 tests/performance/nightly-tests/benchmarks/source/search_string/search_string.c
 create mode 100755 tests/performance/nightly-tests/scripts/nightly_tests_core.py
 create mode 100755 tests/performance/nightly-tests/scripts/run_nightly_tests.py
 create mode 100644 tests/performance/nightly-tests/scripts/send_email.py

-- 
2.17.1




[PATCH 4/9] scripts/performance: Add list_fn_callees.py script

2020-08-28 Thread Ahmed Karaman
Python script that prints the callees of a given list of QEMU
functions.

Syntax:
list_fn_callees.py [-h] -f FUNCTION [FUNCTION ...] -- \
[] \
[]

[-h] - Print the script arguments help message.
-f FUNCTION [FUNCTION ...] - List of function names

Example of usage:
list_fn_callees.py -f helper_float_sub_d helper_float_mul_d -- \
  qemu-mips coulomb_double-mips -n10

Example output:
 Total number of instructions: 108,952,851

 Callees of helper_float_sub_d:

 No. Instructions Percentage  Calls Ins/Call Function Name Source File
 --- ------------ ---------- ------ -------- ------------- ----------------
   1      153,160     0.141%  1,305      117 float64_sub   /fpu/softfloat.c

 Callees of helper_float_mul_d:

 No. Instructions Percentage  Calls Ins/Call Function Name Source File
 --- ------------ ---------- ------ -------- ------------- ----------------
   1      131,137     0.120%  1,014      129 float64_mul   /fpu/softfloat.c
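
The derived columns in the example output can be cross-checked by hand. The sketch below assumes that Percentage is the callee's instructions divided by the total number of instructions, and that Ins/Call is the integer quotient of instructions by calls. Both readings are inferred from the numbers shown, and callee_stats is a name made up for the example, not a function of the script.

```python
from typing import Tuple


def callee_stats(instructions: int, calls: int, total: int) -> Tuple[str, int]:
    """Return the (Percentage, Ins/Call) pair for one callee row."""
    percentage = instructions / total * 100
    # Integer division is an assumption; rounding to the nearest integer
    # gives the same values for both rows of the example output.
    ins_per_call = instructions // calls
    return f"{percentage:.3f}%", ins_per_call


TOTAL = 108_952_851  # "Total number of instructions" in the example output

print(callee_stats(153_160, 1_305, TOTAL))  # float64_sub row
print(callee_stats(131_137, 1_014, TOTAL))  # float64_mul row
```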

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/list_fn_callees.py | 245 +
 1 file changed, 245 insertions(+)
 create mode 100755 scripts/performance/list_fn_callees.py

diff --git a/scripts/performance/list_fn_callees.py b/scripts/performance/list_fn_callees.py
new file mode 100755
index 00..6aa8f6b6ca
--- /dev/null
+++ b/scripts/performance/list_fn_callees.py
@@ -0,0 +1,245 @@
+#!/usr/bin/env python3
+
+"""
+Print the callees of a given list of QEMU functions.
+
+This file is a part of the project "TCG Continuous Benchmarking".
+
+Copyright (C) 2020  Ahmed Karaman 
+Copyright (C) 2020  Aleksandar Markovic 
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+"""
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+
+from typing import List, Union
+
+
+def find_function_lines(function_name: str,
+callgrind_data: List[str]) -> List[int]:
+"""
+Search for the line with the function name in the
+callgrind_annotate output when ran using --tree=calling.
+All the function callees should be listed after that line.
+
+Parameters:
+function_name (string): The desired function name to print its callees
+callgrind_data (List[str]): callgrind_annotate output
+
+Returns:
+(List[int]): List of function line numbers
+"""
+lines = []
+for (i, callgrind_datum) in enumerate(callgrind_data):
+split_line = callgrind_datum.split()
+if len(split_line) > 2 and \
+split_line[1] == "*" and \
+split_line[2].split(":")[-1] == function_name:
+# Function might be in the callgrind_annotate output more than
+# once, so don't break after finding an instance
+if callgrind_data[i + 1] != "\n":
+# Only append the line number if the found instance has
+# callees
+lines.append(i)
+return lines
+
+
+def get_function_calles(
+function_lines: List[int],
+callgrind_data: List[str]) -> List[List[Union[str, int]]]:
+"""
+Get all callees data for a function given its list of line numbers in
+callgrind_annotate output.
+
+Parameters:
+function_lines (List[int]): Line numbers of the function to get its callees
+callgrind_data (List[str]): callgrind_annotate output
+
+Returns:
+(List[List[Union[str, int]]]):[[number_of_instructions(int),
+callee_name(str),
+number_of_calls(int),
+source_file(str)],
+...]
+"""
+callees: List[List[Union[str, int]]] = []
+for function_line in function_lines:
+next_callee = function_line + 1
+while callgrind_data[next_callee] != "\n":
+split_line = callgrind_data[next_callee].split()
+number_of_instructions = int(split_line[0].replace(",", ""))
+source_file = split_line[2].split(":")[0]
+callee_name = split_line[2].split(":")[1]
+number_of_calls = int(split_line[3][1:-2])
+callees.append([number_of_instructions, callee_name,
+  

[REPORT] [GSoC - TCG Continuous Benchmarking] [#10] Measuring QEMU Performance in System Mode - Part Two

2020-08-28 Thread Ahmed Karaman
Greetings,

In part two of the final TCG Continuous Benchmarking report, the same
procedures introduced in part one are used for inspecting the
performance of QEMU system mode emulation. The only difference is that,
instead of emulating the same OS for all targets, different images
were selected from the Qemu-devel thread below and the official QEMU
documentation.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg604682.html

For each of the five targets used in this report (arm, hppa, m68k,
mipsel, and sh4), the source of the used emulation instructions is
mentioned. This is followed by a snippet for fetching the required
files and starting the system emulation.

The results of running the time command are then displayed for
measuring the boot-up time until the login screen is reached where the
emulation is stopped. The top 25 executed QEMU functions are also
measured using the topN_system.py script introduced in part one of the
report.

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Measuring-QEMU-Performance-in-System-Mode-Part-Two/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
Report 2 - Dissecting QEMU Into Three Main Parts:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html
Report 4 - Listing QEMU Helpers and Function Callees:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04227.html
Report 5 - Finding Commits Affecting QEMU Performance:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg05769.html
Report 6 - Performance Comparison of Two QEMU Builds:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg07389.html
Report 7 - Measuring QEMU Emulation Efficiency:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg00098.html
Report 8 - QEMU Nightly Performance Tests:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03409.html
Report 9 - Measuring QEMU Performance in System Mode:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg05705.html

Best regards,
Ahmed Karaman



[REPORT] Nightly Performance Tests - Thursday, August 27, 2020

2020-08-27 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-27 21:30:03
End Time (UTC)   : 2020-08-27 22:02:58
Execution Time   : 0:32:54.920498

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.
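
As a minimal sketch of how such a column entry could be produced (this is not the code of the nightly tests, and the sign convention and rounding to three decimals are assumptions read off the tables), the change is computed relative to a reference count and replaced by '-' when its magnitude is below 0.01%. The function name and the sample counts are hypothetical.

```python
def format_change(current: int, reference: int) -> str:
    """Format the relative change of an instruction count as a percentage."""
    change = (current - reference) / reference * 100
    if abs(change) < 0.01:
        # Negligible difference, shown as '-' as described in the note.
        return "-"
    # Three decimals, with trailing zeros dropped and an explicit sign.
    return f"{round(change, 3):+}%"


# Hypothetical instruction counts, chosen only to exercise both branches.
print(format_change(1_000_050, 1_000_000))
print(format_change(1_026_500, 1_000_000))
```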


SUMMARY REPORT - COMMIT ac8b279f

AVERAGE RESULTS

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 158 342 191        -   +1.692%
alpha        1 914 972 140        -   +3.524%
arm          8 076 404 894        -   +2.304%
hppa         4 261 677 507        -   +3.163%
m68k         2 690 269 887        -   +7.131%
mips         1 862 023 330        -   +2.493%
mipsel       2 008 211 753        -   +2.674%
mips64       1 918 645 078        -   +2.819%
mips64el     2 051 567 505        -   +3.026%
ppc          2 480 144 777        -   +3.107%
ppc64        2 576 714 380        -   +3.143%
ppc64le      2 558 865 873        -   +3.174%
riscv64      1 406 706 168        -    +2.65%
s390x        3 158 131 808        -   +3.118%
sh4          2 364 459 082        -   +3.331%
sparc64      3 318 544 351        -   +3.851%
x86_64       1 775 843 892        -   +2.156%


   DETAILED RESULTS

Test Program: dijkstra_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      3 062 573 443        -   +1.423%
alpha        3 191 869 281        -   +3.696%
arm         16 357 166 524        -   +2.347%
hppa         7 228 367 299        -   +3.086%
m68k         4 294 012 030        -   +9.692%
mips         3 051 408 507        -   +2.426%
mipsel       3 231 507 917        -   +2.869%
mips64       3 245 845 593        -   +2.597%
mips64el     3 414 200 419        -   +3.021%
ppc          4 914 524 698        -    +4.74%
ppc64        5 098 153 878        -   +4.565%
ppc64le      5 082 431 528        -    +4.58%
riscv64      2 192 299 393        -   +1.956%
s390x        4 584 495 872        -   +2.896%
sh4          3 949 047 442        -   +3.464%
sparc64      4 586 202 504        -   +4.237%
x86_64       2 484 092 820        -    +1.75%


Test Program: dijkstra_int32

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 210 183 548        -   +1.493%
alpha        1 494 137 977        -   +2.151%
arm          8 262 937 192        -   +2.665%
hppa         5 207 309 118        -   +3.047%
m68k         1 725 855 550        -   +2.526%
mips         1 495 219 108        -   +1.491%
mipsel       1 497 148 185        -   +1.479%
mips64       1 715 400 005        -   +1.892%
mips64el     1 695 276 461        -   +1.913%
ppc          2 014 561 870        -    +1.82%
ppc64        2 206 270 729        -   +2.139%
ppc64le      2 198 010 960        -   +2.147%
riscv64      1 354 914 552        -   +2.396%
s390x        2 916 239 244        -   +1.241%
sh4          1 990 540 028        -    +2.67%
sparc64      2 872 232 498        -   +3.758%
x86_64       1 553 979 956        -    +2.12%


Test Program: matmult_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      1 412 257 235        -     +0.3%
alpha        3 233 996 079        -   +7.473%
arm          8 545 175 267        -   +1.088%
hppa         3 483 588 720        -   +4.468%
m68k         3 919 060 938        -  +18.431%
mips         2 344 763 701        -    +4.09%
mipsel       3 329 889 270        -   +5.177%
mips64       2 359 056 141        -   +4.076%

[REPORT] Nightly Performance Tests - Wednesday, August 26, 2020

2020-08-26 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-26 21:30:02
End Time (UTC)   : 2020-08-26 22:03:25
Execution Time   : 0:33:22.684015

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 25f6dc28

AVERAGE RESULTS

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 158 343 131        -   +1.692%
alpha        1 914 977 713        -   +3.525%
arm          8 076 400 021        -   +2.304%
hppa         4 261 678 154        -   +3.163%
m68k         2 690 261 170        -   +7.131%
mips         1 862 023 351        -   +2.493%
mipsel       2 008 208 091        -   +2.674%
mips64       1 918 635 853        -   +2.818%
mips64el     2 051 571 520        -   +3.026%
ppc          2 480 152 604        -   +3.107%
ppc64        2 576 708 898        -   +3.142%
ppc64le      2 558 863 471        -   +3.173%
riscv64      1 406 706 018        -    +2.65%
s390x        3 158 137 523        -   +3.118%
sh4          2 364 463 007        -   +3.331%
sparc64      3 318 544 246        -   +3.851%
x86_64       1 775 842 110        -   +2.156%


   DETAILED RESULTS

Test Program: dijkstra_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      3 062 571 701        -   +1.423%
alpha        3 191 875 352        -   +3.696%
arm         16 357 154 152        -   +2.347%
hppa         7 228 368 174        -   +3.086%
m68k         4 294 003 802        -   +9.692%
mips         3 051 405 821        -   +2.426%
mipsel       3 231 506 827        -   +2.869%
mips64       3 245 837 618        -   +2.596%
mips64el     3 414 203 643        -   +3.021%
ppc          4 914 532 207        -    +4.74%
ppc64        5 098 149 119        -   +4.565%
ppc64le      5 082 428 275        -    +4.58%
riscv64      2 192 299 304        -   +1.956%
s390x        4 584 501 630        -   +2.896%
sh4          3 949 051 181        -   +3.464%
sparc64      4 586 202 420        -   +4.237%
x86_64       2 484 087 711        -    +1.75%


Test Program: dijkstra_int32

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 210 184 587        -   +1.493%
alpha        1 494 141 516        -   +2.151%
arm          8 262 932 780        -   +2.665%
hppa         5 207 310 153        -   +3.047%
m68k         1 725 847 586        -   +2.526%
mips         1 495 219 183        -   +1.491%
mipsel       1 497 145 499        -   +1.479%
mips64       1 715 391 181        -   +1.892%
mips64el     1 695 282 408        -   +1.913%
ppc          2 014 571 983        -    +1.82%
ppc64        2 206 263 388        -   +2.138%
ppc64le      2 198 011 065        -   +2.147%
riscv64      1 354 917 093        -   +2.396%
s390x        2 916 244 976        -   +1.241%
sh4          1 990 543 099        -    +2.67%
sparc64      2 872 232 296        -   +3.758%
x86_64       1 553 980 145        -    +2.12%


Test Program: matmult_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      1 412 257 972        -   +0.301%
alpha        3 234 002 286        -   +7.474%
arm          8 545 170 759        -   +1.088%
hppa         3 483 589 580        -   +4.468%
m68k         3 919 052 824        -  +18.431%
mips         2 344 763 681        -    +4.09%
mipsel       3 329 886 020        -   +5.177%
mips64       2 359 046 809        -   +4.076%

[REPORT] Nightly Performance Tests - Tuesday, August 25, 2020

2020-08-25 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-25 21:30:01
End Time (UTC)   : 2020-08-25 22:02:37
Execution Time   : 0:32:35.896990

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT d1a2b51f

AVERAGE RESULTS

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 158 355 274        -   +1.693%
alpha        1 914 967 171        -   +3.524%
arm          8 076 402 940        -   +2.304%
hppa         4 261 685 987  -0.182%   +3.164%
m68k         2 690 273 044        -   +7.131%
mips         1 862 033 667        -   +2.494%
mipsel       2 008 211 069        -   +2.674%
mips64       1 918 635 565        -   +2.818%
mips64el     2 051 565 677        -   +3.026%
ppc          2 480 141 217        -   +3.107%
ppc64        2 576 713 959        -   +3.143%
ppc64le      2 558 853 539        -   +3.173%
riscv64      1 406 704 050        -    +2.65%
s390x        3 158 140 046        -   +3.118%
sh4          2 364 449 748        -    +3.33%
sparc64      3 318 544 783        -   +3.851%
x86_64       1 775 844 158        -   +2.156%


   DETAILED RESULTS

Test Program: dijkstra_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      3 062 583 464        -   +1.424%
alpha        3 191 864 698        -   +3.696%
arm         16 357 157 526        -   +2.347%
hppa         7 228 376 315  -0.139%   +3.086%
m68k         4 294 016 587        -   +9.692%
mips         3 051 419 166        -   +2.427%
mipsel       3 231 509 618        -   +2.869%
mips64       3 245 837 754        -   +2.596%
mips64el     3 414 195 796        -   +3.021%
ppc          4 914 520 972  -0.041%    +4.74%
ppc64        5 098 154 311        -   +4.565%
ppc64le      5 082 419 054        -    +4.58%
riscv64      2 192 294 915        -   +1.955%
s390x        4 584 503 977        -   +2.896%
sh4          3 949 036 447        -   +3.464%
sparc64      4 586 203 546        -   +4.237%
x86_64       2 484 092 105        -    +1.75%


Test Program: dijkstra_int32

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 210 194 577        -   +1.493%
alpha        1 494 133 274        -    +2.15%
arm          8 262 935 967        -   +2.665%
hppa         5 207 318 306        -   +3.047%
m68k         1 725 856 962        -   +2.527%
mips         1 495 227 032        -   +1.492%
mipsel       1 497 147 869        -   +1.479%
mips64       1 715 388 570        -   +1.892%
mips64el     1 695 276 864        -   +1.913%
ppc          2 014 557 389        -   +1.819%
ppc64        2 206 267 901        -   +2.139%
ppc64le      2 197 998 781        -   +2.146%
riscv64      1 354 912 745        -   +2.396%
s390x        2 916 247 062        -   +1.241%
sh4          1 990 532 533        -   +2.669%
sparc64      2 872 231 051        -   +3.758%
x86_64       1 553 981 241        -    +2.12%


Test Program: matmult_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      1 412 273 223        -   +0.302%
alpha        3 233 991 649        -   +7.473%
arm          8 545 173 979        -   +1.088%
hppa         3 483 597 802  -1.267%   +4.468%
m68k         3 919 065 529        -  +18.431%
mips         2 344 774 894        -   +4.091%
mipsel       3 329 886 464        -   +5.177%
mips64       2 359 046 988        -   +4.076%

[REPORT] Nightly Performance Tests - Monday, August 24, 2020

2020-08-24 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-24 21:30:01
End Time (UTC)   : 2020-08-24 22:02:34
Execution Time   : 0:32:33.288312

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.


SUMMARY REPORT - COMMIT 30aa1944

AVERAGE RESULTS

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 158 336 369        -   +1.692%
alpha        1 914 963 828        -   +3.524%
arm          8 076 406 621        -   +2.304%
hppa         4 268 707 368        -   +3.355%
m68k         2 690 268 056        -   +7.131%
mips         1 862 013 209        -   +2.492%
mipsel       2 008 201 160        -   +2.673%
mips64       1 918 633 309        -   +2.818%
mips64el     2 051 562 603        -   +3.026%
ppc          2 480 494 245        -   +3.116%
ppc64        2 576 749 439        -   +3.143%
ppc64le      2 558 895 447        -   +3.174%
riscv64      1 406 714 486        -   +2.651%
s390x        3 158 130 515        -   +3.118%
sh4          2 364 446 984        -    +3.33%
sparc64      3 318 536 224        -    +3.85%
x86_64       1 775 853 879        -   +2.157%


   DETAILED RESULTS

Test Program: dijkstra_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      3 062 565 290        -   +1.423%
alpha        3 191 861 945        -   +3.696%
arm         16 357 168 614        -   +2.347%
hppa         7 238 406 927        -   +3.229%
m68k         4 294 010 671        -   +9.692%
mips         3 051 404 978        -   +2.426%
mipsel       3 231 499 722        -   +2.869%
mips64       3 245 835 252        -   +2.596%
mips64el     3 414 195 149        -   +3.021%
ppc          4 916 527 458        -   +4.782%
ppc64        5 098 142 111        -   +4.565%
ppc64le      5 082 413 487        -    +4.58%
riscv64      2 192 308 138        -   +1.956%
s390x        4 584 495 400        -   +2.896%
sh4          3 949 034 207        -   +3.464%
sparc64      4 586 194 372        -   +4.237%
x86_64       2 484 102 186        -   +1.751%


Test Program: dijkstra_int32

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      2 210 178 166        -   +1.493%
alpha        1 494 130 687        -    +2.15%
arm          8 262 938 105        -   +2.665%
hppa         5 207 303 578        -   +3.047%
m68k         1 725 851 873        -   +2.526%
mips         1 495 206 180        -   +1.491%
mipsel       1 497 138 038        -   +1.478%
mips64       1 715 388 602        -   +1.892%
mips64el     1 695 271 234        -   +1.913%
ppc          2 014 568 021        -    +1.82%
ppc64        2 206 259 282        -   +2.138%
ppc64le      2 197 993 199        -   +2.146%
riscv64      1 354 923 136        -   +2.396%
s390x        2 916 235 947        -   +1.241%
sh4          1 990 530 126        -   +2.669%
sparc64      2 872 225 164        -   +3.757%
x86_64       1 553 989 229        -    +2.12%


Test Program: matmult_double

Target        Instructions   Latest    v5.1.0
----------  --------------  -------  --------
aarch64      1 412 251 532        -     +0.3%
alpha        3 233 986 239        -   +7.473%
arm          8 545 174 231        -   +1.088%
hppa         3 528 318 810        -    +5.81%
m68k         3 919 059 909        -  +18.431%
mips         2 344 753 389        -    +4.09%
mipsel       3 329 876 697        -   +5.176%
mips64       2 359 044 360        -   +4.076%

[REPORT] [GSoC - TCG Continuous Benchmarking] [#9] Measuring QEMU Performance in System Mode

2020-08-24 Thread Ahmed Karaman
Greetings,

The final report of the TCG Continuous Benchmarking series introduces
basic performance measurements for QEMU system mode emulation.
The latest version of Debian 10.5 is used for testing and comparing
the emulation performance on five targets (aarch64, arm, mips, mipsel,
and x86_64).

The boot-up time and the number of executed instructions are compared
for the emulation of the targets. The report also introduces a new
topN_system script for finding the most executed QEMU functions in the
emulation.

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Measuring-QEMU-Performance-in-System-Mode/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
Report 2 - Dissecting QEMU Into Three Main Parts:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html
Report 4 - Listing QEMU Helpers and Function Callees:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04227.html
Report 5 - Finding Commits Affecting QEMU Performance:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg05769.html
Report 6 - Performance Comparison of Two QEMU Builds:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg07389.html
Report 7 - Measuring QEMU Emulation Efficiency:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg00098.html
Report 8 - QEMU Nightly Performance Tests:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03409.html

Best regards,
Ahmed Karaman



[REPORT] Nightly Performance Tests - Sunday, August 23, 2020

2020-08-23 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-23 21:30:02
End Time (UTC)   : 2020-08-23 22:02:16
Execution Time   : 0:32:14.028460

Status   : FAILURE



  ERROR LOGS

2020-08-23T21:30:03.149828 - Verifying executables of 8 benchmarks for 17 targets
2020-08-23T21:30:03.256787 - Verifying results of reference version v5.1.0
2020-08-23T21:30:03.522675 - Checking out master
2020-08-23T21:30:07.340980 - Pulling the latest changes from QEMU master
error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header
2020-08-23T21:43:32.047625 - Trial 1/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': Could not resolve host: git.qemu.org
2020-08-23T21:44:52.183570 - Trial 2/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': Could not resolve host: git.qemu.org
2020-08-23T21:46:12.285963 - Trial 3/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': Could not resolve host: git.qemu.org
2020-08-23T21:47:32.368841 - Trial 4/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': Could not resolve host: git.qemu.org
2020-08-23T21:48:52.473741 - Trial 5/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': Could not resolve host: git.qemu.org
2020-08-23T21:50:12.584987 - Trial 6/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': Could not resolve host: git.qemu.org
2020-08-23T21:51:32.688082 - Trial 7/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.
2020-08-23T21:53:03.538585 - Trial 8/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': Could not resolve host: git.qemu.org
2020-08-23T21:54:23.661134 - Trial 9/10: Failed to pull QEMU ... retrying again in a minute!
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': GnuTLS recv error (-54): Error in the pull function.
2020-08-23T22:02:16.671699 - Trial 10/10: Failed to pull QEMU
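
The retry behaviour visible in the log above (ten trials, one minute apart) can be sketched roughly as follows. This is only an illustration, not the actual nightly tests script: the function name pull_with_retries and its parameters are hypothetical, and the injectable run/sleep arguments exist only so the sketch can be exercised without network access.

```python
import subprocess
import time


def pull_with_retries(repo_dir, attempts=10, delay_seconds=60,
                      run=subprocess.run, sleep=time.sleep):
    """Run `git pull` in repo_dir, retrying when it fails.

    Returns True as soon as one pull succeeds and False if every trial
    fails. `run` and `sleep` are injectable (hypothetical parameters)
    so the loop can be tested without a real repository.
    """
    for trial in range(1, attempts + 1):
        result = run(["git", "pull"], cwd=repo_dir)
        if result.returncode == 0:
            return True
        if trial < attempts:
            print(f"Trial {trial}/{attempts}: Failed to pull QEMU ... "
                  "retrying again in a minute!")
            sleep(delay_seconds)
        else:
            # Last trial: report the failure without waiting again.
            print(f"Trial {trial}/{attempts}: Failed to pull QEMU")
    return False
```

A fixed delay keeps the log easy to read; a longer or growing delay might cope better with outages such as the DNS failures shown above.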






[REPORT] Nightly Performance Tests - Saturday, August 22, 2020

2020-08-22 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-22 21:30:02
End Time (UTC)   : 2020-08-22 22:35:14
Execution Time   : 1:05:12.080181

Status   : SUCCESS

Note:
Changes denoted by '-' are less than 0.01%.
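
As a minimal sketch of how an entry in the "Latest" and "v5.1.0" columns could be derived, the snippet below computes the signed percentage change of an instruction count relative to a baseline and prints '-' below the 0.01% threshold. The function name format_change, the sign convention (positive means more instructions than the baseline), and the fixed three-decimal formatting are assumptions, not taken from the nightly tests script.

```python
def format_change(latest: int, reference: int, threshold: float = 0.01) -> str:
    """Format the change of `latest` relative to `reference` as a percentage.

    Changes whose magnitude is below `threshold` percent are shown as '-'.
    """
    if reference <= 0:
        raise ValueError("reference must be a positive instruction count")
    change = (latest - reference) / reference * 100
    if abs(change) < threshold:
        return "-"
    # Always show the sign so increases and decreases are easy to tell apart.
    return f"{change:+.3f}%"
```

Under these assumptions, a run that executes 1% more instructions than the baseline is shown with a leading '+', and a run within 0.01% of the baseline is shown as '-'.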


SUMMARY REPORT - COMMIT 66e01f1c

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 158 307 521         N/A      +1.69%
alpha              1 914 952 274         N/A     +3.523%
arm                8 076 363 834         N/A     +2.304%
hppa               4 268 702 064         N/A     +3.355%
m68k               2 690 260 104         N/A     +7.131%
mips               1 862 001 379         N/A     +2.491%
mipsel             2 008 198 982         N/A     +2.673%
mips64             1 918 624 661         N/A     +2.817%
mips64el           2 051 547 914         N/A     +3.025%
ppc                2 480 477 927         N/A     +3.115%
ppc64              2 576 741 167         N/A     +3.143%
ppc64le            2 558 878 783         N/A     +3.173%
riscv64            1 406 694 841         N/A     +2.649%
s390x              3 158 128 415         N/A     +3.118%
sh4                2 364 441 076         N/A      +3.33%
sparc64            3 318 523 931         N/A     +3.849%
x86_64             1 775 833 886         N/A     +2.155%


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 062 524 484         N/A     +1.422%
alpha              3 191 860 820         N/A     +3.696%
arm               16 357 111 456         N/A     +2.347%
hppa               7 238 394 077         N/A     +3.229%
m68k               4 294 002 716         N/A     +9.692%
mips               3 051 383 586         N/A     +2.425%
mipsel             3 231 497 182         N/A     +2.868%
mips64             3 245 824 955         N/A     +2.596%
mips64el           3 414 179 260         N/A      +3.02%
ppc                4 916 505 731         N/A     +4.782%
ppc64              5 098 131 929         N/A     +4.565%
ppc64le            5 082 392 648         N/A     +4.579%
riscv64            2 192 278 694         N/A     +1.955%
s390x              4 584 482 755         N/A     +2.895%
sh4                3 949 025 009         N/A     +3.464%
sparc64            4 586 172 717         N/A     +4.237%
x86_64             2 484 077 965         N/A      +1.75%


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 210 139 976         N/A     +1.491%
alpha              1 494 107 932         N/A     +2.149%
arm                8 262 894 758         N/A     +2.665%
hppa               5 207 294 767         N/A     +3.046%
m68k               1 725 839 443         N/A     +2.526%
mips               1 495 193 055         N/A      +1.49%
mipsel             1 497 134 165         N/A     +1.478%
mips64             1 715 372 351         N/A     +1.891%
mips64el           1 695 257 618         N/A     +1.912%
ppc                2 014 547 431         N/A     +1.819%
ppc64              2 206 249 752         N/A     +2.138%
ppc64le            2 197 979 723         N/A     +2.145%
riscv64            1 354 902 721         N/A     +2.395%
s390x              2 916 235 101         N/A     +1.241%
sh4                1 990 519 487         N/A     +2.669%
sparc64            2 872 212 578         N/A     +3.757%
x86_64             1 553 963 533         N/A     +2.118%


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 412 256 031         N/A       +0.3%
alpha              3 233 975 236         N/A     +7.473%
arm                8 545 132 276         N/A     +1.088%
hppa               3 528 308 660         N/A     +5.809%
m68k               3 919 054 398         N/A    +18.431%
mips               2 344 755 096         N/A      +4.09%
mipsel             3 329 879 162         N/A     +5.176%
mips64             2 359 034 998         N/A     +4.075%

Re: [REPORT] Nightly Performance Tests - Wednesday, August 19, 2020

2020-08-22 Thread Ahmed Karaman
Thanks Mr. Aleksandar,

"Reference" and "latest" already each have their own results directory.

I will try re-running the tests tonight from scratch as you've suggested to
see how things will go.

I will also add the "---" line at the beginning as you've suggested.

Best regards,
Ahmed

On Sat, Aug 22, 2020, 2:21 PM Aleksandar Markovic <
aleksandar.qemu.de...@gmail.com> wrote:

>
>
> On Saturday, August 22, 2020, Aleksandar Markovic <
> aleksandar.qemu.de...@gmail.com> wrote:
>
>>
>>
>> On Wednesday, August 19, 2020, Ahmed Karaman <
>> ahmedkhaledkara...@gmail.com> wrote:
>>
>>> Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
>>> Host Memory  : 15.49 GB
>>>
>>> Start Time (UTC) : 2020-08-19 21:00:01
>>> End Time (UTC)   : 2020-08-19 21:32:15
>>> Execution Time   : 0:32:14.021998
>>>
>>> Status   : SUCCESS
>>>
>>> 
>>>
>>>
>> I see we did not receive a nightly report last night. The cause is most
>> likely the change of our build system that happened yesterday.
>>
>> I think the best approach for you is to start tonight "from scratch", that
>> is, with the source code tree and all past data deleted, as if you were
>> executing the nightlies for the first time ever. This will cause a fresh
>> checkout and a recreation of the 5.1 and 'latest' performance data.
>>
>>
> Sorry, I envision some problems with the approach I described. I think you
> had better create a separate source directory for the reference 5.1
> measurements, and another source directory for all measurements from
> tonight on. That way you will avoid switching between build systems
> within the same directory.
>
> Thanks,
> Aleksandar
>
>
>
>
>> Unrelated hint:
>>
>> Please include the following (or similar) text right before the tables
>> with results:
>>
>> "'-' denotes difference less than 0.01%."
>>
>> This way the readers will not be confused with '-' meaning.
>>
>> Yours,
>> Aleksandar
>>
>>
>>>
>>> SUMMARY REPORT - COMMIT 672b2f26
>>> 
>>> AVERAGE RESULTS
>>> 
>>> Target              Instructions      Latest      v5.1.0
>>> ----------  --------------------  ----------  ----------
>>> aarch64            2 118 484 879           -           -
>>> alpha              1 838 407 216           -           -
>>> arm                7 887 992 884           -           -
>>> hppa               4 124 996 474           -           -
>>> m68k               2 453 421 671           -           -
>>> mips               1 812 636 995           -           -
>>> mipsel             1 947 725 352           -           -
>>> mips64             1 862 495 613           -           -
>>> mips64el           1 984 211 702           -           -
>>> ppc                2 394 319 834           -           -
>>> ppc64              2 488 040 622           -           -
>>> ppc64le            2 470 198 016           -           -
>>> riscv64            1 367 774 718           -           -
>>> s390x              3 058 498 362           -           -
>>> sh4                2 278 490 061           -           -
>>> sparc64            3 186 999 246           -           -
>>> x86_64             1 734 475 394           -           -
>>> 
>>>
>>>DETAILED RESULTS
>>> 
>>> Test Program: dijkstra_double
>>> 
>>> Target              Instructions      Latest      v5.1.0
>>> ----------  --------------------  ----------  ----------
>>> aarch64            3 019 613 303           -           -
>>> alpha              3 078 110 233           -           -
>>> arm               15 982 079 823           -           -
>>> hppa               7 012 014 505           -           -
>>> m68k               3 914 631 319           -           -
>>> mips               2 979 137 836           -           -
>>> mipsel             3 141 391 810           -           -
>>> mips64             3 163 713 203           -           -
>>> mips64el  

Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#6] Performance Comparison of Two QEMU Builds

2020-08-22 Thread Ahmed Karaman
Thanks Mr. Aleksandar for your feedback!

On Sat, Aug 22, 2020, 1:09 PM Aleksandar Markovic <
aleksandar.qemu.de...@gmail.com> wrote:

> Hi, Ahmed.
>
> The report, and the topic in general, look quite interesting. However, I
> would suggest two improvements:
>
> - The title should reflect the content in a clearer way. Let's say,
> "Compilers and QEMU performance" would be IMHO better. The expression "two
> builds" is just missing the central motivation of the report, which is
> comparing gcc-built QEMU and clang-built QEMU, performance-wise.
>
> - At the end, a section "Useful links" would be handy, akin to the similar
> section in Report 1. There were many people that analysed (and posted their
> results on the internet) gcc vs clang in terms of performance of produced
> executables (in contexts other than QEMU). Having the most useful and
> informative ones (3-5 links with a short summary for each one would be more
> than sufficient) listed in this report would enhance it significantly.
>
> Yours,
> Aleksandar
>
>
> On Monday, July 27, 2020, Ahmed Karaman 
> wrote:
>
>> Hi everyone,
>>
>> The sixth report of the TCG Continuous Benchmarking project presents a
>> performance comparison between two different QEMU builds, GCC and Clang.
>>
>> The report also presents five new benchmarks to allow for a variety of
>> test workloads. Each of the five benchmarks is executed for seventeen
>> different QEMU targets on both the GCC and Clang builds.
>>
>> The resulting ten tables are summarized and then analyzed using the
>> list_helpers.py and list_fn_callees.py scripts. The entire workflow is
>> automated using Python scripts that are posted in the report.
>>
>> Report link:
>>
>> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Performance-Comparison-of-Two-QEMU-Builds/
>>
>> Previous reports:
>> Report 1 - Measuring Basic Performance Metrics of QEMU:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>> Report 2 - Dissecting QEMU Into Three Main Parts:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
>> Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html
>> Report 4 - Listing QEMU Helpers and Function Callees:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04227.html
>> Report 5 - Finding Commits Affecting QEMU Performance:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg05769.html
>>
>> Best regards,
>> Ahmed Karaman
>>
>


[Bug 1892441] Re: "No zIPL section in IPL2 record" error when emulating Debian 10.5.0 on s390x

2020-08-20 Thread Ahmed Karaman
** Description changed:

  Hi,
  
- I want to emulate Debian 10.5.0 for the s390x architecture. 
+ I want to emulate Debian 10.5.0 for the s390x architecture.
  The Debian image is downloaded from the following link:
- 
https://cdimage.debian.org/debian-cd/current/s390x/iso-cd/debian-10.5.0-s390x-netinst.iso
 
+ 
https://cdimage.debian.org/debian-cd/current/s390x/iso-cd/debian-10.5.0-s390x-netinst.iso
  
  Using the latest QEMU version 5.1.0, running the debian image using the given 
command:
  qemu-system-s390x -boot d -m 4096 -hda debian.qcow -cdrom 
debian-10.5.0-s390x-netinst.iso -nographic
  
  causes the error output below:
  
  LOADPARM=[]
  Using virtio-blk.
  Using guessed DASD geometry.
  Using ECKD scheme (block size  4096), CDL
  
  ! No zIPL section in IPL2 record. !
+ 
+ Using exactly the same qemu command above with the Alpine 3.12 image for
+ s390x ran successfully without any errors.

** Description changed:

  Hi,
  
- I want to emulate Debian 10.5.0 for the s390x architecture.
+ I want to emulate Debian 10.5.0 for the s390x architecture on an Ubuntu
+ x86_64 host.
+ 
  The Debian image is downloaded from the following link:
  
https://cdimage.debian.org/debian-cd/current/s390x/iso-cd/debian-10.5.0-s390x-netinst.iso
  
- Using the latest QEMU version 5.1.0, running the debian image using the given 
command:
+ Using the latest QEMU version 5.1.0, the Debian image is emulated using the 
given command:
  qemu-system-s390x -boot d -m 4096 -hda debian.qcow -cdrom 
debian-10.5.0-s390x-netinst.iso -nographic
  
- causes the error output below:
+ Running the command causes the output below:
  
- LOADPARM=[]
- Using virtio-blk.
- Using guessed DASD geometry.
- Using ECKD scheme (block size  4096), CDL
- 
- ! No zIPL section in IPL2 record. !
+ LOADPARM=[]
+ Using virtio-blk.
+ Using guessed DASD geometry.
+ Using ECKD scheme (block size  4096), CDL
+ 
+ ! No zIPL section in IPL2 record. !
  
  Using exactly the same qemu command above with the Alpine 3.12 image for
  s390x ran successfully without any errors.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1892441

Title:
  "No zIPL section in IPL2 record" error when emulating Debian 10.5.0 on
  s390x

Status in QEMU:
  New

Bug description:
  Hi,

  I want to emulate Debian 10.5.0 for the s390x architecture on an
  Ubuntu x86_64 host.

  The Debian image is downloaded from the following link:
  https://cdimage.debian.org/debian-cd/current/s390x/iso-cd/debian-10.5.0-s390x-netinst.iso

  Using the latest QEMU version 5.1.0, the Debian image is emulated using the
  given command:
  qemu-system-s390x -boot d -m 4096 -hda debian.qcow -cdrom debian-10.5.0-s390x-netinst.iso -nographic

  Running the command causes the output below:

  LOADPARM=[]
  Using virtio-blk.
  Using guessed DASD geometry.
  Using ECKD scheme (block size  4096), CDL
  
  ! No zIPL section in IPL2 record. !

  Using exactly the same qemu command above with the Alpine 3.12 image
  for s390x ran successfully without any errors.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1892441/+subscriptions



[Bug 1892441] [NEW] "No zIPL section in IPL2 record" error when emulating Debian 10.5.0 on s390x

2020-08-20 Thread Ahmed Karaman
Public bug reported:

Hi,

I want to emulate Debian 10.5.0 for the s390x architecture. 
The Debian image is downloaded from the following link:
https://cdimage.debian.org/debian-cd/current/s390x/iso-cd/debian-10.5.0-s390x-netinst.iso
 

Using the latest QEMU version 5.1.0, running the debian image using the given
command:
qemu-system-s390x -boot d -m 4096 -hda debian.qcow -cdrom debian-10.5.0-s390x-netinst.iso -nographic

causes the error output below:

LOADPARM=[]
Using virtio-blk.
Using guessed DASD geometry.
Using ECKD scheme (block size  4096), CDL

! No zIPL section in IPL2 record. !

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: s390x softmmu




[REPORT] Nightly Performance Tests - Thursday, August 20, 2020

2020-08-20 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-20 21:00:02
End Time (UTC)   : 2020-08-20 21:32:14
Execution Time   : 0:32:12.619200

Status   : SUCCESS


SUMMARY REPORT - COMMIT 1d806cef

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 118 485 222           -           -
alpha              1 838 408 248           -           -
arm                7 887 993 527           -           -
hppa               4 124 996 473           -           -
m68k               2 453 421 667           -           -
mips               1 812 637 399           -           -
mipsel             1 947 724 962           -           -
mips64             1 862 495 949           -           -
mips64el           1 984 211 712           -           -
ppc                2 394 319 184           -           -
ppc64              2 488 040 948           -           -
ppc64le            2 470 197 058           -           -
riscv64            1 367 774 062           -           -
s390x              3 058 498 052           -           -
sh4                2 278 490 066           -           -
sparc64            3 186 999 555           -           -
x86_64             1 734 476 045           -           -


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 019 613 403           -           -
alpha              3 078 112 827           -           -
arm               15 982 079 858           -           -
hppa               7 012 014 536           -           -
m68k               3 914 631 326           -           -
mips               2 979 140 437           -           -
mipsel             3 141 391 155           -           -
mips64             3 163 713 235           -           -
mips64el           3 314 105 619           -           -
ppc                4 692 148 198           -           -
ppc64              4 875 585 390           -           -
ppc64le            4 859 857 221           -           -
riscv64            2 150 267 316           -           -
s390x              4 455 507 331           -           -
sh4                3 816 841 768           -           -
sparc64            4 399 783 128           -           -
x86_64             2 441 371 739           -           -


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 177 687 621           -           -
alpha              1 462 695 786           -           -
arm                8 048 440 620           -           -
hppa               5 053 364 832           -           -
m68k               1 683 346 175           -           -
mips               1 473 265 060           -           -
mipsel             1 475 326 930           -           -
mips64             1 683 560 336           -           -
mips64el           1 663 469 689           -           -
ppc                1 978 578 627           -           -
ppc64              2 160 088 891           -           -
ppc64le            2 151 841 557           -           -
riscv64            1 323 226 632           -           -
s390x              2 880 507 254           -           -
sh4                1 938 789 917           -           -
sparc64            2 768 217 662           -           -
x86_64             1 521 729 332           -           -


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 408 042 276           -           -
alpha              3 009 131 612           -           -
arm                8 453 189 769           -           -
hppa               3 334 593 478           -           -
m68k               3 309 165 572           -           -
mips               2 252 645 028           -           -
mipsel             3 166 010 232           -           -
mips64             2 266 660 281           -           -
mips64el           3 179 406 382           -           -

[REPORT] Nightly Performance Tests - Wednesday, August 19, 2020

2020-08-19 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-19 21:00:01
End Time (UTC)   : 2020-08-19 21:32:15
Execution Time   : 0:32:14.021998

Status   : SUCCESS


SUMMARY REPORT - COMMIT 672b2f26

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 118 484 879           -           -
alpha              1 838 407 216           -           -
arm                7 887 992 884           -           -
hppa               4 124 996 474           -           -
m68k               2 453 421 671           -           -
mips               1 812 636 995           -           -
mipsel             1 947 725 352           -           -
mips64             1 862 495 613           -           -
mips64el           1 984 211 702           -           -
ppc                2 394 319 834           -           -
ppc64              2 488 040 622           -           -
ppc64le            2 470 198 016           -           -
riscv64            1 367 774 718           -           -
s390x              3 058 498 362           -           -
sh4                2 278 490 061           -           -
sparc64            3 186 999 246           -           -
x86_64             1 734 475 394           -           -


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 019 613 303           -           -
alpha              3 078 110 233           -           -
arm               15 982 079 823           -           -
hppa               7 012 014 505           -           -
m68k               3 914 631 319           -           -
mips               2 979 137 836           -           -
mipsel             3 141 391 810           -           -
mips64             3 163 713 203           -           -
mips64el           3 314 105 619           -           -
ppc                4 692 148 212           -           -
ppc64              4 875 585 404           -           -
ppc64le            4 859 857 200           -           -
riscv64            2 150 267 230           -           -
s390x              4 455 507 359           -           -
sh4                3 816 841 775           -           -
sparc64            4 399 783 149           -           -
x86_64             2 441 371 746           -           -


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 177 687 656           -           -
alpha              1 462 693 182           -           -
arm                8 048 440 634           -           -
hppa               5 053 362 217           -           -
m68k               1 683 346 196           -           -
mips               1 473 265 047           -           -
mipsel             1 475 326 892           -           -
mips64             1 683 560 350           -           -
mips64el           1 663 467 060           -           -
ppc                1 978 581 291           -           -
ppc64              2 160 088 877           -           -
ppc64le            2 151 841 575           -           -
riscv64            1 323 226 597           -           -
s390x              2 880 509 792           -           -
sh4                1 938 787 291           -           -
sparc64            2 768 217 627           -           -
x86_64             1 521 726 675           -           -


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 408 042 295           -           -
alpha              3 009 129 018           -           -
arm                8 453 187 175           -           -
hppa               3 334 593 464           -           -
m68k               3 309 165 600           -           -
mips               2 252 644 394           -           -
mipsel             3 166 010 232           -           -
mips64             2 266 660 274           -           -
mips64el           3 179 408 969           -           -

[REPORT] Nightly Performance Tests - Tuesday, August 18, 2020

2020-08-18 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-18 21:00:01
End Time (UTC)   : 2020-08-18 21:00:11
Execution Time   : 0:00:10.419271

Status   : FAILURE



  ERROR LOGS

2020-08-18T21:00:01.542176 - Verifying executables of 8 benchmarks for 17 targets
2020-08-18T21:00:01.545389 - Verifying results of reference version v5.1.0
2020-08-18T21:00:01.552203 - Checking out master
2020-08-18T21:00:01.876017 - Pulling the latest changes from QEMU master
fatal: unable to access 'https://git.qemu.org/git/qemu.git/': Could not resolve host: git.qemu.org
Failed to pull latest changes in QEMU master.





[Bug 1892081] Re: Performance improvement when using "QEMU_FLATTEN" with softfloat type conversions

2020-08-18 Thread Ahmed Karaman
** Attachment added: "before.png"
   
https://bugs.launchpad.net/qemu/+bug/1892081/+attachment/5402578/+files/before.png

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1892081

Title:
  Performance improvement when using "QEMU_FLATTEN" with softfloat type
  conversions

Status in QEMU:
  New

Bug description:
  Attached below is a matrix multiplication program for double data
  types. The program performs the casting operation "(double)rand()"
  when generating random numbers.

  This operation calls the integer to float softfloat conversion
  function "int32_to_float_64".

  Adding the "QEMU_FLATTEN" attribute to the function definition
  decreases the instructions per call of the function by about 63%.

  Attached are before and after performance screenshots from
  KCachegrind.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1892081/+subscriptions



[Bug 1892081] [NEW] Performance improvement when using "QEMU_FLATTEN" with softfloat type conversions

2020-08-18 Thread Ahmed Karaman
Public bug reported:

Attached below is a matrix multiplication program for double data
types. The program performs the casting operation "(double)rand()"
when generating random numbers.

This operation calls the integer to float softfloat conversion
function "int32_to_float_64".

Adding the "QEMU_FLATTEN" attribute to the function definition
decreases the instructions per call of the function by about 63%.

Attached are before and after performance screenshots from
KCachegrind.

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: "matmult_double.c"
   
https://bugs.launchpad.net/bugs/1892081/+attachment/5402577/+files/matmult_double.c




[Bug 1892081] Re: Performance improvement when using "QEMU_FLATTEN" with softfloat type conversions

2020-08-18 Thread Ahmed Karaman
** Attachment added: "after.png"
   
https://bugs.launchpad.net/qemu/+bug/1892081/+attachment/5402579/+files/after.png




[REPORT] Nightly Performance Tests - Monday, August 17, 2020

2020-08-17 Thread Ahmed Karaman

Host CPU : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Host Memory  : 15.49 GB

Start Time (UTC) : 2020-08-17 21:00:02
End Time (UTC)   : 2020-08-17 21:32:56
Execution Time   : 0:32:54.21

Status   : SUCCESS


SUMMARY REPORT - COMMIT d0ed6a69

AVERAGE RESULTS

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 118 484 532     -0.021%     -0.021%
alpha              1 838 408 254     -0.027%     -0.027%
arm                7 887 995 569           -           -
hppa               4 124 996 446           -           -
m68k               2 453 419 250      -0.02%     -0.021%
mips               1 812 637 396     -0.017%     -0.017%
mipsel             1 947 725 268           -           -
mips64             1 862 496 009     -0.019%     -0.018%
mips64el           1 984 212 701     -0.023%     -0.023%
ppc                2 394 318 517     -0.027%     -0.027%
ppc64              2 488 040 654     -0.031%     -0.031%
ppc64le            2 470 197 723     -0.025%     -0.024%
riscv64            1 367 775 048     -0.031%      -0.03%
s390x              3 058 500 465     -0.016%     -0.015%
sh4                2 278 492 108     -0.024%     -0.024%
sparc64            3 187 005 638     -0.029%     -0.028%
x86_64             1 734 476 702     -0.039%     -0.039%


   DETAILED RESULTS

Test Program: dijkstra_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            3 019 613 375           -           -
alpha              3 078 112 807     -0.011%     -0.011%
arm               15 982 081 632           -           -
hppa               7 012 014 478           -           -
m68k               3 914 629 819           -           -
mips               2 979 141 054     -0.011%     -0.011%
mipsel             3 141 391 180           -           -
mips64             3 163 713 165     -0.015%     -0.015%
mips64el           3 314 105 649           -           -
ppc                4 692 147 232           -           -
ppc64              4 875 585 401           -           -
ppc64le            4 859 856 713           -           -
riscv64            2 150 267 506     -0.012%     -0.012%
s390x              4 455 509 097           -           -
sh4                3 816 843 006           -           -
sparc64            4 399 787 202           -           -
x86_64             2 441 371 659      -0.03%      -0.03%


Test Program: dijkstra_int32

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            2 177 687 572           -           -
alpha              1 462 695 833     -0.023%     -0.023%
arm                8 048 442 356           -           -
hppa               5 053 364 798           -           -
m68k               1 683 344 745      -0.02%     -0.019%
mips               1 473 262 443     -0.018%     -0.018%
mipsel             1 475 326 897     -0.023%     -0.022%
mips64             1 683 560 967     -0.017%     -0.017%
mips64el           1 663 469 676     -0.029%     -0.029%
ppc                1 978 580 290     -0.019%     -0.019%
ppc64              2 160 088 912     -0.013%     -0.012%
ppc64le            2 151 841 073     -0.027%     -0.026%
riscv64            1 323 229 277     -0.029%     -0.029%
s390x              2 880 511 627     -0.012%     -0.012%
sh4                1 938 788 554     -0.017%     -0.017%
sparc64            2 768 224 289     -0.022%     -0.022%
x86_64             1 521 726 637     -0.031%     -0.031%


Test Program: matmult_double

Target              Instructions      Latest      v5.1.0
----------  --------------------  ----------  ----------
aarch64            1 408 042 239           -           -
alpha              3 009 131 606     -0.012%     -0.011%
arm                8 453 191 424           -           -
hppa               3 334 593 493           -           -
m68k               3 309 164 187           -     -0.011%
mips               2 252 644 413           -           -
mipsel             3 166 010 236     -0.011%     -0.011%
mips64             2 266 660 280     -0.013%     -0.013%
mips64el           3 179 409 047     -0.016%     -0.016%

[REPORT] [GSoC - TCG Continuous Benchmarking] [#8] QEMU Nightly Performance Tests

2020-08-17 Thread Ahmed Karaman
Hi everyone,

QEMU currently lacks a system for measuring the performance of targets
automatically. The previous reports introduced different tools and
methods for locating performance regressions, but all of them had to
be manually executed by the user when needed.

This report devises a new nightly tests system that runs automatically
each night. After the execution is completed, it sends a report to the
QEMU mailing list with the performance measurements of seventeen
different QEMU targets, and how these measurements compare to
previously obtained ones.

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-Nightly-Performance-Tests/

The system is now scheduled to execute daily, and starting from
tonight, the results will be sent to the mailing list.

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
Report 2 - Dissecting QEMU Into Three Main Parts:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html
Report 4 - Listing QEMU Helpers and Function Callees:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04227.html
Report 5 - Finding Commits Affecting QEMU Performance:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg05769.html
Report 6 - Performance Comparison of Two QEMU Builds:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg07389.html
Report 7 - Measuring QEMU Emulation Efficiency:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg00098.html

Best regards,
Ahmed Karaman


[REPORT] [GSoC - TCG Continuous Benchmarking] [#7] Measuring QEMU Emulation Efficiency

2020-08-03 Thread Ahmed Karaman
Hi everyone,

The seventh report of the TCG Continuous Benchmarking series presents
a method for measuring the TCG emulation efficiency of QEMU for
seventeen different targets.

This is achieved by comparing the number of guest instructions
(running the program natively on the target) and the number of QEMU
instructions (running the program through QEMU). For each target, the
ratio between these two numbers presents a rough estimation of the
emulation efficiency for that target.

The report clearly shows that the emulation efficiency for a given
target depends on the type of program being emulated. For that reason,
six different benchmark programs are used to provide a set of diverse
workloads.
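
As a rough illustration of the ratio described above, the sketch below divides the number of QEMU instructions by the number of guest instructions for each target. The direction of the ratio (host instructions per guest instruction), the function name emulation_ratio, and all instruction counts are assumptions made for the example; they are not figures from the report.

```python
def emulation_ratio(qemu_instructions: int, guest_instructions: int) -> float:
    """Instructions executed by QEMU per native guest instruction."""
    if guest_instructions <= 0:
        raise ValueError("guest_instructions must be positive")
    return qemu_instructions / guest_instructions


# Hypothetical instruction counts for one benchmark on two targets.
counts = {
    "target_a": {"qemu": 2_000_000, "guest": 500_000},
    "target_b": {"qemu": 1_500_000, "guest": 250_000},
}

# A lower ratio means fewer host instructions per guest instruction.
ratios = {
    name: emulation_ratio(c["qemu"], c["guest"])
    for name, c in counts.items()
}
```

Because the ratio depends on the workload, it would have to be computed separately for each benchmark program before targets are compared.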

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Measuring-QEMU-Emulation-Efficiency/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
Report 2 - Dissecting QEMU Into Three Main Parts:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html
Report 4 - Listing QEMU Helpers and Function Callees:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04227.html
Report 5 - Finding Commits Affecting QEMU Performance:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg05769.html
Report 6 - Performance Comparison of Two QEMU Builds:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg07389.html

Best regards,
Ahmed Karaman



Re: [PATCH v2 2/2] scripts/performance: Add list_helpers.py script

2020-07-28 Thread Ahmed Karaman
On Tue, Jul 28, 2020 at 12:30 PM Aleksandar Markovic
 wrote:
>
>
>
> On Thursday, July 16, 2020, Ahmed Karaman  
> wrote:
>>
>> Python script that prints executed helpers of a QEMU invocation.
>>
>
> Hi, Ahmed.
>
> You outlined the envisioned user workflow regarding this script in your 
> report. As I understand it, it generally goes like this:
>
> 1) The user first discovers helpers, and their performance data.
> 2) The user examines the callees of a particular helper of choice (usually, 
> the most instruction-consuming helper).
> 3) The user perhaps further examines a callee of a particular callee of the 
> particular helper.
> 4) The user continues this way until the conclusion can be drawn, or maximal 
> depth is reached.
>
> The procedure might be time consuming since each step requires running an 
> emulation of the test program.
>
> This makes me think that a faster and easier tool for the user (though, to
> some modest extent, harder for you to write) would be an improved
> list_helpers.py (and list_fn_callees.py) that provides a list of all
> callees for all helpers in tree form (so, callees of callees, callees of
> callees of callees, etc.), rather than just a list of immediate callees, as
> it currently does.
>
> I think you can provide such functionality relatively easily using recursion. 
> See, let's say:
>
> https://realpython.com/python-thinking-recursively/
>
> Perhaps you can have a switch (let's say, --tree ) that specifies 
> whether the script outputs just immediate callee list, or entire callee tree.

I have to say, this is a very nice suggestion. I will start working on it!
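
A minimal sketch of the recursive idea suggested above, not the eventual list_helpers.py implementation. It assumes the immediate callees have already been collected into a dictionary; the function name format_callee_tree, the max_depth parameter, and the sample call graph are hypothetical and for illustration only.

```python
from typing import Dict, List


def format_callee_tree(function: str,
                       callees: Dict[str, List[str]],
                       max_depth: int = 5) -> List[str]:
    """Return indented lines for `function` and its transitive callees."""
    lines: List[str] = []

    def walk(name: str, depth: int, path: frozenset) -> None:
        lines.append("    " * depth + name)
        if depth >= max_depth:
            # Maximal depth reached: do not descend any further.
            return
        for callee in callees.get(name, []):
            if callee in path:
                # Recursive call chain: note it and stop descending.
                lines.append("    " * (depth + 1) + callee + " (recursive)")
                continue
            walk(callee, depth + 1, path | {callee})

    walk(function, 0, frozenset({function}))
    return lines


# Hypothetical call graph; only helper_float_sub_d -> float64_sub
# appears in the example output of the scripts.
call_graph = {
    "helper_float_sub_d": ["float64_sub"],
    "float64_sub": ["example_callee_a", "example_callee_b"],
}
print("\n".join(format_callee_tree("helper_float_sub_d", call_graph)))
```

Tracking the current path guards against infinite recursion when a function appears among its own transitive callees, and max_depth mirrors the "maximal depth" stop condition in the workflow described above.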

>
> Thanks,
> Aleksandar
>
>>
>> Syntax:
>> list_helpers.py [-h] -- \
>> [] \
>> []
>>
>> [-h] - Print the script arguments help message.
>>
>> Example of usage:
>> list_helpers.py -- qemu-mips coulomb_double-mips -n10
>>
>> Example output:
>>  Total number of instructions: 108,933,695
>>
>>  Executed QEMU Helpers:
>>
>>  No.     Ins Percent  Calls Ins/Call Helper Name             Source File
>>  --- ------- ------- ------ -------- ----------------------- -------------------------
>>    1 183,021  0.168%  1,305      140 helper_float_sub_d      /target/mips/fpu_helper.c
>>    2 177,111  0.163%    770      230 helper_float_madd_d     /target/mips/fpu_helper.c
>>    3 171,537  0.157%  1,014      169 helper_float_mul_d      /target/mips/fpu_helper.c
>>    4 157,298  0.144%  2,443       64 helper_lookup_tb_ptr    /accel/tcg/tcg-runtime.c
>>    5 138,123  0.127%    897      153 helper_float_add_d      /target/mips/fpu_helper.c
>>    6  47,083  0.043%    207      227 helper_float_msub_d     /target/mips/fpu_helper.c
>>    7  24,062  0.022%    487       49 helper_cmp_d_lt         /target/mips/fpu_helper.c
>>    8  22,910  0.021%    150      152 helper_float_div_d      /target/mips/fpu_helper.c
>>    9  15,497  0.014%    321       48 helper_cmp_d_eq         /target/mips/fpu_helper.c
>>   10   9,100  0.008%     52      175 helper_float_trunc_w_d  /target/mips/fpu_helper.c
>>   11   7,059  0.006%     10      705 helper_float_sqrt_d     /target/mips/fpu_helper.c
>>   12   3,000  0.003%     40       75 helper_cmp_d_ule        /target/mips/fpu_helper.c
>>   13   2,720  0.002%     20      136 helper_float_cvtd_w     /target/mips/fpu_helper.c
>>   14   2,477  0.002%     27       91 helper_swl              /target/mips/op_helper.c
>>   15   2,000  0.002%     40       50 helper_cmp_d_le         /target/mips/fpu_helper.c
>>   16   1,800  0.002%     40       45 helper_cmp_d_un         /target/mips/fpu_helper.c
>>   17   1,164  0.001%     12       97 helper_raise_exception_ /target/mips/op_helper.c
>>   18     720  0.001%     10       72 helper_cmp_d_ult        /target/mips/fpu_helper.c
>>   19     560  0.001%    140        4 helper_cfc1             /target/mips/fpu_helper.c
>>
>> Signed-off-by: Ahmed Karaman 
>> ---
>>  scripts/performance/list_helpers.py | 207 
>>  1 file changed, 207 insertions(+)
>>  create mode 100755 scripts/performance/list_helpers.py
>>
>> diff --git a/scripts/performance/list_helpers.py 
>> b/scripts/performance/list_helpers.py
>> new file mode 100755
>> index 00..a97c7ed4fe
>> --- /dev/null
>> +++ b/scripts/performance/list_helpers.py
>> @@ -0,0 +1,207 @@
>> +#!/usr/bin/env python3
>> 

[REPORT] [GSoC - TCG Continuous Benchmarking] [#6] Performance Comparison of Two QEMU Builds

2020-07-27 Thread Ahmed Karaman
Hi everyone,

The sixth report of the TCG Continuous Benchmarking project presents a
performance comparison between two different QEMU builds, GCC and Clang.

The report also presents five new benchmarks to allow for a variety of
test workloads. Each of the five benchmarks is executed for seventeen
different QEMU targets on both the GCC and Clang builds.

The resulting ten tables are then summarized and analyzed using the
list_helpers.py and list_fn_callees.py scripts. The entire workflow is
automated using Python scripts that are posted in the report.

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Performance-Comparison-of-Two-QEMU-Builds/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
Report 2 - Dissecting QEMU Into Three Main Parts:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html
Report 4 - Listing QEMU Helpers and Function Callees:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04227.html
Report 5 - Finding Commits Affecting QEMU Performance:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg05769.html

Best regards,
Ahmed Karaman


Re: [PATCH 1/1] scripts/performance: Add bisect.py script

2020-07-25 Thread Ahmed Karaman
On Sat, Jul 25, 2020 at 9:48 PM Aleksandar Markovic
 wrote:
>
>
>
> On Saturday, July 25, 2020, Ahmed Karaman  
> wrote:
>>
>> On Sat, Jul 25, 2020 at 2:31 PM Aleksandar Markovic 
>>  wrote:
>>>
>>>
>>> Hi, Ahmed.
>>>
>>> Yes, somewhat related to John's hints on these comments, it is customary to 
>>> have just a brief description before "Copyright" lines. This means one 
>>> sentence, or a short paragraph (3-4 sentences max). The lengthy syntax 
>>> comment should be, in my opinion, moved after the license preamble, just 
>>> before the start of real Python code.
>>
>>
>> Thanks Mr. John and Aleksandar for your feedback. I will update the script 
>> accordingly.
>>
>>>
>>>
>>> One question:
>>>
>>> What is the behavior in case of a mismatch between the executable architecture 
>>> and the "target" command line option (for example, one specifies the m68k 
>>> target but passes an hppa executable)? Would that be detected before the bisect 
>>> search, or would the bisect procedure be applied even though such cases do not make sense?
>>
>>
>> The script will exit with an error along the lines of "Invalid 
>> ELF image for this architecture".
>> This is done before starting "bisect" and after the initial "configure" and 
>> "make".
>>
>
> This is good enough (the moment of detection). However, are all cleanups 
> done? Is temporary directory deleted?

This is a thing I missed, I will add a clean_up() function to be
called before any exit.
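
A minimal sketch of one way such a clean-up could be guaranteed, assuming the temporary directory is created with tempfile.mkdtemp(). The names run_with_clean_up and failing_step are hypothetical; only clean_up() is named in this thread, and its body here is an assumption.

```python
import os
import shutil
import sys
import tempfile


def clean_up(path):
    """Remove the temporary working directory, ignoring a missing path."""
    shutil.rmtree(path, ignore_errors=True)


def run_with_clean_up(work):
    """Run work(tmp_dir) and always remove tmp_dir, even on sys.exit()."""
    tmp_dir = tempfile.mkdtemp()
    try:
        return work(tmp_dir)
    finally:
        # sys.exit() raises SystemExit, so this block still runs.
        clean_up(tmp_dir)


def failing_step(tmp_dir):
    # Stand-in for a measurement step that aborts the script.
    failing_step.seen = tmp_dir
    sys.exit("Invalid ELF image for this architecture")


try:
    run_with_clean_up(failing_step)
except SystemExit as error:
    print(error, "| temp dir still exists:", os.path.exists(failing_step.seen))
```

Wrapping the work in try/finally avoids having to remember a clean_up() call at every individual exit point.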

>
> The same questions apply to the scenario where the user specifies a non-existent 
> commit ID as the start or the end commit.
>

The script will exit with a message from "git" saying that this ID
doesn't exist. This will be done during the initial measurements of
the two boundary commits which is also before the bisect process.

> Does the script work if user specifies a tag, instead of commit ID? I think 
> it should work. For example, can the user specify v3.1.0 as start commit, and 
> v4.2.0 as the end commit, in order to detect degradation/improvement between 
> QEMU 3.1 and QEMU 4.2? Please test if such scenario works. If it works, I 
> think you should insert "commit ID or tag ID" instead of "commit" only in the 
> commit message and applicable code comments (including also the user-visible 
> help output on "-h").

Yes, tags also work. Basically, anything that works with "git bisect"
as "start" and "end" values works with the script.

>
> Lastly, what happens if the specified start and end commits exist, but are in 
> the wrong order (end is "before" start)?

The script will also exit with an error before starting the bisect
process. The error would say:
"Some slow revs are not ancestors of the fast rev.
git bisect cannot work properly in this case.
Maybe you mistook slow and fast revs?"
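
The checks discussed in this thread (unknown commit or tag, boundaries in the wrong order) could also be made explicitly before any build is started. The sketch below is an assumption about how that might look and is not part of bisect.py; rev_exists and is_ancestor are hypothetical names, and the run parameter exists only so the functions can be exercised without a repository.

```python
import subprocess
from typing import Callable


def rev_exists(qemu_path: str, rev: str,
               run: Callable = subprocess.run) -> bool:
    """Return True if `rev` (commit hash or tag) resolves to a commit."""
    result = run(["git", "rev-parse", "--verify", "--quiet",
                  rev + "^{commit}"],
                 cwd=qemu_path,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL)
    return result.returncode == 0


def is_ancestor(qemu_path: str, start: str, end: str,
                run: Callable = subprocess.run) -> bool:
    """Return True if `start` is an ancestor of `end`."""
    result = run(["git", "merge-base", "--is-ancestor", start, end],
                 cwd=qemu_path,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL)
    # Exit status 0 means "is an ancestor"; 1 means "is not".
    return result.returncode == 0
```

With a QEMU checkout at hand, calling rev_exists(path, "v3.1.0") and is_ancestor(path, "v3.1.0", "v4.2.0") before the initial "configure" and "make" would let the script fail early with its own message instead of relying on the error printed by git.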


>
> Thanks,
> Aleksandar
>
>
>
>
>>>
>>>
>>> Yours, Aleksandar
>>>
>>>
>>>>
>>>> +#  This program is free software: you can redistribute it and/or modify
>>>> +#  it under the terms of the GNU General Public License as published by
>>>> +#  the Free Software Foundation, either version 2 of the License, or
>>>> +#  (at your option) any later version.
>>>> +#
>>>> +#  This program is distributed in the hope that it will be useful,
>>>> +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>> +#  GNU General Public License for more details.
>>>> +#
>>>> +#  You should have received a copy of the GNU General Public License
>>>> +#  along with this program. If not, see <https://www.gnu.org/licenses/>.
>>>> +
>>>> +import argparse
>>>> +import multiprocessing
>>>> +import tempfile
>>>> +import os
>>>> +import shutil
>>>> +import subprocess
>>>> +import sys
>>>> +
>>>> +
>>>> + GIT WRAPPERS 
>>>> +def git_bisect(qemu_path, command, args=None):
>>>> +"""
>>>> +Wrapper function for running git bisect.
>>>> +
>>>> +Parameters:
>>>> +qemu_path (str): QEMU path.
>>>> +command (str):   bisect comma

Re: [PATCH 1/1] scripts/performance: Add bisect.py script

2020-07-25 Thread Ahmed Karaman
On Sat, Jul 25, 2020 at 2:31 PM Aleksandar Markovic <
aleksandar.qemu.de...@gmail.com> wrote:

>
> Hi, Ahmed.
>
> Yes, somewhat related to John's hints on these comments, it is customary
> to have just a brief description before "Copyright" lines. This means one
> sentence, or a short paragraph (3-4 sentences max). The lengthy syntax
> comment should be, in my opinion, moved after the license preamble, just
> before the start of real Python code.
>

Thanks Mr. John and Aleksandar for your feedback. I will update the script
accordingly.


>
> One question:
>
> What is the behavior in case of a mismatch between the executable architecture
> and the "target" command line option (for example, one specifies the m68k
> target but passes an hppa executable)? Would that be detected before the bisect
> search, or would the bisect procedure be applied even though such cases do not make sense?
>

The script will exit with an error along the lines of "Invalid
ELF image for this architecture".
This is done before starting "bisect" and after the initial "configure" and
"make".


> Yours, Aleksandar
>
>
>
>> +#  This program is free software: you can redistribute it and/or modify
>> +#  it under the terms of the GNU General Public License as published by
>> +#  the Free Software Foundation, either version 2 of the License, or
>> +#  (at your option) any later version.
>> +#
>> +#  This program is distributed in the hope that it will be useful,
>> +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +#  GNU General Public License for more details.
>> +#
>> +#  You should have received a copy of the GNU General Public License
>> +#  along with this program. If not, see .
>> +
>> +import argparse
>> +import multiprocessing
>> +import tempfile
>> +import os
>> +import shutil
>> +import subprocess
>> +import sys
>> +
>> +
>> + GIT WRAPPERS 
>> +def git_bisect(qemu_path, command, args=None):
>> +    """
>> +    Wrapper function for running git bisect.
>> +
>> +    Parameters:
>> +    qemu_path (str): QEMU path.
>> +    command (str):   bisect command (start|fast|slow|reset).
>> +    args (list):     Optional arguments.
>> +
>> +    Returns:
>> +    (str):           git bisect stdout.
>> +    """
>> +    process = ["git", "bisect", command]
>> +    if args:
>> +        process += args
>> +    bisect = subprocess.run(process,
>> +                            cwd=qemu_path,
>> +                            stdout=subprocess.PIPE,
>> +                            stderr=subprocess.PIPE)
>> +    if bisect.returncode:
>> +        sys.exit(bisect.stderr.decode("utf-8"))
>> +    return bisect.stdout.decode("utf-8")
>> +
>> +
>> +def git_checkout(commit, qemu_path):
>> +    """
>> +    Wrapper function for checking out a given git commit.
>> +
>> +    Parameters:
>> +    commit (str):    Commit hash of a git commit.
>> +    qemu_path (str): QEMU path.
>> +    """
>> +    checkout_commit = subprocess.run(["git",
>> +                                      "checkout",
>> +                                      commit],
>> +                                     cwd=qemu_path,
>> +                                     stdout=subprocess.DEVNULL,
>> +                                     stderr=subprocess.PIPE)
>> +    if checkout_commit.returncode:
>> +        sys.exit(checkout_commit.stderr.decode("utf-8"))
>> +
>> +
>> +def git_clone(qemu_path):
>> +    """
>> +    Wrapper function for cloning QEMU git repo from GitHub.
>> +
>> +    Parameters:
>> +    qemu_path (str): Path to clone the QEMU repo to.
>> +    """
>> +    clone_qemu = subprocess.run(["git",
>> +                                 "clone",
>> +                                 "https://github.com/qemu/qemu.git",
>> +                                 qemu_path],
>> +                                stderr=subprocess.STDOUT)
>> +    if clone_qemu.returncode:
>> +        sys.exit("Failed to clone QEMU!")
>> +##
>> +
>> +
>> +def check_requirements(tool):
>> +    """
>> +    Verify that all script requirements are installed (perf|callgrind & git).
>> +
>> +    Parameters:
>> +    tool (str): Tool used for the measurement (perf or callgrind).
>> +    """
>> +    if tool == "perf":
>> +        check_perf_installation = subprocess.run(["which", "perf"],
>> +                                                 stdout=subprocess.DEVNULL)
>> +        if check_perf_installation.returncode:
>> +            sys.exit("Please install perf before running the script.")
>> +
>> +        # Ensure the user has the privilege to run perf
>> +        check_perf_executability = subprocess.run(["perf", "stat", "ls", "/"],
>> +                                                  stdout=subprocess.DEVNULL,
>> +                                                  stderr=subprocess.DEVNULL)
>> +        if check_perf_executability.returncode:
>> +            sys.exit("""
>> +Error:
>> +You may not have permission to collect 

[PATCH 1/1] scripts/performance: Add bisect.py script

2020-07-21 Thread Ahmed Karaman
Python script that locates the commit that caused a performance
degradation or improvement in QEMU using the git bisect command
(binary search).

Syntax:
bisect.py [-h] -s,--start START [-e,--end END] [-q,--qemu QEMU] \
--target TARGET --tool {perf,callgrind} -- \
 []

[-h] - Print the script arguments help message
-s,--start START - First commit hash in the search range
[-e,--end END] - Last commit hash in the search range
(default: Latest commit)
[-q,--qemu QEMU] - QEMU path.
(default: Path to a GitHub QEMU clone)
--target TARGET - QEMU target name
--tool {perf,callgrind} - Underlying tool used for measurements

Example of usage:
bisect.py --start=fdd76fecdd --qemu=/path/to/qemu --target=ppc \
--tool=perf -- coulomb_double-ppc -n 1000

Example output:
Start Commit Instructions: 12,710,790,060
End Commit Instructions:   13,031,083,512
Performance Change:        -2.458%

Estimated Number of Steps: 10

*BISECT STEP 1*
Instructions:        13,031,097,790
Status:              slow commit
*BISECT STEP 2*
Instructions:        12,710,805,265
Status:              fast commit
*BISECT STEP 3*
Instructions:        13,031,028,053
Status:              slow commit
*BISECT STEP 4*
Instructions:        12,711,763,211
Status:              fast commit
*BISECT STEP 5*
Instructions:        13,031,027,292
Status:              slow commit
*BISECT STEP 6*
Instructions:        12,711,748,738
Status:              fast commit
*BISECT STEP 7*
Instructions:        12,711,748,788
Status:              fast commit
*BISECT STEP 8*
Instructions:        13,031,100,493
Status:              slow commit
*BISECT STEP 9*
Instructions:        12,714,472,954
Status:              fast commit
*BISECT STEP 10*
Instructions:        12,715,409,153
Status:              fast commit
*BISECT STEP 11*
Instructions:        12,715,394,739
Status:              fast commit

*BISECT RESULT*
commit 0673ecdf6cb2b1445a85283db8cbacb251c46516
Author: Richard Henderson 
Date:   Tue May 5 10:40:23 2020 -0700

softfloat: Inline float64 compare specializations

Replace the float64 compare specializations with inline functions
that call the standard float64_compare{,_quiet} functions.
Use bool as the return type.
***
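
The figures in the example output can be related with a short sketch. The formula in performance_change() is inferred from the numbers above (dividing the difference by the end commit count reproduces -2.458%). The nearest-boundary rule in classify() is an assumption made for illustration and is not taken from bisect.py; both function names are hypothetical.

```python
def performance_change(start_instructions: int, end_instructions: int) -> float:
    """Percentage change between the two boundary commits.

    Negative values mean the end commit executes more instructions.
    """
    return (start_instructions - end_instructions) / end_instructions * 100


def classify(instructions: int,
             start_instructions: int,
             end_instructions: int) -> str:
    """Label a measurement by the boundary it is closer to (assumed rule)."""
    fast_ref = min(start_instructions, end_instructions)
    slow_ref = max(start_instructions, end_instructions)
    if abs(instructions - fast_ref) <= abs(instructions - slow_ref):
        return "fast commit"
    return "slow commit"


START = 12_710_790_060  # Start Commit Instructions from the example output
END = 13_031_083_512    # End Commit Instructions from the example output

print(f"Performance Change: {performance_change(START, END):.3f}%")
print("Step 1:", classify(13_031_097_790, START, END))
print("Step 9:", classify(12_714_472_954, START, END))
```

Counting executed instructions rather than wall-clock time keeps the measurements stable enough that each step falls clearly near one of the two boundaries, as the eleven steps above show.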

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/bisect.py | 374 ++
 1 file changed, 374 insertions(+)
 create mode 100755 scripts/performance/bisect.py

diff --git a/scripts/performance/bisect.py b/scripts/performance/bisect.py
new file mode 100755
index 00..869cc69ef4
--- /dev/null
+++ b/scripts/performance/bisect.py
@@ -0,0 +1,374 @@
+#!/usr/bin/env python3
+
+#  Locate the commit that caused a performance degradation or improvement in
+#  QEMU using the git bisect command (binary search).
+#
+#  Syntax:
+#  bisect.py [-h] -s,--start START [-e,--end END] [-q,--qemu QEMU] \
+#  --target TARGET --tool {perf,callgrind} -- \
+#   []
+#
+#  [-h] - Print the script arguments help message
+#  -s,--start START - First commit hash in the search range
+#  [-e,--end END] - Last commit hash in the search range
+# (default: Latest commit)
+#  [-q,--qemu QEMU] - QEMU path.
+#  (default: Path to a GitHub QEMU clone)
+#  --target TARGET - QEMU target name
+#  --tool {perf,callgrind} - Underlying tool used for measurements
+
+#  Example of usage:
+#  bisect.py --start=fdd76fecdd --qemu=/path/to/qemu --target=ppc --tool=perf \
+#  -- coulomb_double-ppc -n 1000
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import multiprocessing
+import tempfile
+import os
+import shutil
+import subprocess
+import sys
+
+
+ GIT WRAPPERS 
+def git_bisect(qemu_path, command, args=None):
+

[PATCH 0/1] Add bisect.py script

2020-07-21 Thread Ahmed Karaman
Hi,

This series adds the new bisect.py script introduced in report 5 of
the "TCG Continuous Benchmarking" GSoC project.

The script is used for locating the commit that caused a performance
degradation or improvement in QEMU using the git bisect command
(binary search).

To learn more about how the script works and how it can be used for
detecting two commits, one that introduced a performance degradation
in PowerPC targets, and the other introducing a performance
improvement in MIPS, please check the
"Finding Commits Affecting QEMU Performance" report.

Report link:
https://lists.nongnu.org/archive/html/qemu-devel/2020-07/msg05769.html

Best regards,
Ahmed Karaman

Ahmed Karaman (1):
  scripts/performance: Add bisect.py script

 scripts/performance/bisect.py | 374 ++
 1 file changed, 374 insertions(+)
 create mode 100755 scripts/performance/bisect.py

-- 
2.17.1




Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#5] Finding Commits Affecting QEMU Performance

2020-07-21 Thread Ahmed Karaman
On Tue, Jul 21, 2020 at 1:54 PM Alex Bennée  wrote:
>
>
> Ahmed Karaman  writes:
>
> > Hi,
> >
> > The fifth report of the TCG Continuous Benchmarking project concludes
> > a mini-series of three reports that dealt with the performance
> > comparison and analysis of QEMU 5.0 and 5.1-pre-soft-freeze.
> >
> > The report presents a new Python script that utilizes "git bisect" for
> > running a binary search within a specified range of commits to
> > automatically detect the commit causing a performance improvement or
> > degradation.
>
> Excellent stuff.

Thanks for your continued support!

>
> > The new script is then used to find the commit introducing the PowerPC
> > performance degradation as well as that introducing the performance
> > improvement in MIPS. The results obtained for both commits prove the
> > correctness of the conclusions and analyses presented in the two
> > previous reports.
>
> I can certainly envision a mechanism where 0673ec slows things down. I
> wonder if it would come back if instead of inline function calls we
> ended up making concrete flattened versions, e.g.:
>
> bool QEMU_FLATTEN float64_eq(float64 a, float64 b, float_status *s)
> {
> return float64_compare(a, b, s) == float_relation_equal;
> }
>
> PPC is of course more affected by these changes than others because
> HARDFLOAT never gets a chance to kick in. Looking at the objdump of
> f64_compare there should surely be an opportunity to lose some of the
> branches when looking for a certain test result?

Interesting, I will try to tinker a little bit with the float64
functions and will let you know if I find anything interesting.

>
> >
> > Report link:
> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Finding-Commits-Affecting-QEMU-Performance/
> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> > Report 2 - Dissecting QEMU Into Three Main Parts:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
> > Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html
> > Report 4 - Listing QEMU Helpers and Function Callees:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04227.html
> >
> > Best regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée

Best regards,
Ahmed Karaman



[REPORT] [GSoC - TCG Continuous Benchmarking] [#5] Finding Commits Affecting QEMU Performance

2020-07-20 Thread Ahmed Karaman
Hi,

The fifth report of the TCG Continuous Benchmarking project concludes
a mini-series of three reports that dealt with the performance
comparison and analysis of QEMU 5.0 and 5.1-pre-soft-freeze.

The report presents a new Python script that utilizes "git bisect" for
running a binary search within a specified range of commits to
automatically detect the commit causing a performance improvement or
degradation.

The new script is then used to find the commit introducing the PowerPC
performance degradation as well as that introducing the performance
improvement in MIPS. The results obtained for both commits prove the
correctness of the conclusions and analyses presented in the two
previous reports.

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Finding-Commits-Affecting-QEMU-Performance/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
Report 2 - Dissecting QEMU Into Three Main Parts:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html
Report 4 - Listing QEMU Helpers and Function Callees:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04227.html

Best regards,
Ahmed Karaman



[PATCH v2 2/2] scripts/performance: Add list_helpers.py script

2020-07-16 Thread Ahmed Karaman
Python script that prints executed helpers of a QEMU invocation.

Syntax:
list_helpers.py [-h] -- \
[] \
[]

[-h] - Print the script arguments help message.

Example of usage:
list_helpers.py -- qemu-mips coulomb_double-mips -n10

Example output:
 Total number of instructions: 108,933,695

 Executed QEMU Helpers:

 No.     Ins Percent  Calls Ins/Call Helper Name             Source File
 --- ------- ------- ------ -------- ----------------------- -------------------------
   1 183,021  0.168%  1,305      140 helper_float_sub_d      /target/mips/fpu_helper.c
   2 177,111  0.163%    770      230 helper_float_madd_d     /target/mips/fpu_helper.c
   3 171,537  0.157%  1,014      169 helper_float_mul_d      /target/mips/fpu_helper.c
   4 157,298  0.144%  2,443       64 helper_lookup_tb_ptr    /accel/tcg/tcg-runtime.c
   5 138,123  0.127%    897      153 helper_float_add_d      /target/mips/fpu_helper.c
   6  47,083  0.043%    207      227 helper_float_msub_d     /target/mips/fpu_helper.c
   7  24,062  0.022%    487       49 helper_cmp_d_lt         /target/mips/fpu_helper.c
   8  22,910  0.021%    150      152 helper_float_div_d      /target/mips/fpu_helper.c
   9  15,497  0.014%    321       48 helper_cmp_d_eq         /target/mips/fpu_helper.c
  10   9,100  0.008%     52      175 helper_float_trunc_w_d  /target/mips/fpu_helper.c
  11   7,059  0.006%     10      705 helper_float_sqrt_d     /target/mips/fpu_helper.c
  12   3,000  0.003%     40       75 helper_cmp_d_ule        /target/mips/fpu_helper.c
  13   2,720  0.002%     20      136 helper_float_cvtd_w     /target/mips/fpu_helper.c
  14   2,477  0.002%     27       91 helper_swl              /target/mips/op_helper.c
  15   2,000  0.002%     40       50 helper_cmp_d_le         /target/mips/fpu_helper.c
  16   1,800  0.002%     40       45 helper_cmp_d_un         /target/mips/fpu_helper.c
  17   1,164  0.001%     12       97 helper_raise_exception_ /target/mips/op_helper.c
  18     720  0.001%     10       72 helper_cmp_d_ult        /target/mips/fpu_helper.c
  19     560  0.001%    140        4 helper_cfc1             /target/mips/fpu_helper.c
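
The "Percent" and "Ins/Call" columns above follow from the other three numbers. The sketch below shows the arithmetic; the function name helper_row is hypothetical, and the use of floor division is inferred from row 11 (7,059 instructions over 10 calls is shown as 705), not read from the script.

```python
from typing import Tuple


def helper_row(instructions: int, calls: int,
               total_instructions: int) -> Tuple[str, int]:
    """Return the (Percent, Ins/Call) columns for one helper."""
    percent = instructions / total_instructions * 100
    # Floor division: 7,059 / 10 appears as 705 in the table, not 706.
    ins_per_call = instructions // calls
    return f"{percent:.3f}%", ins_per_call


TOTAL = 108_933_695  # "Total number of instructions" from the example output

print(helper_row(183_021, 1_305, TOTAL))  # helper_float_sub_d
print(helper_row(7_059, 10, TOTAL))       # helper_float_sqrt_d
```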

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/list_helpers.py | 207 
 1 file changed, 207 insertions(+)
 create mode 100755 scripts/performance/list_helpers.py

diff --git a/scripts/performance/list_helpers.py 
b/scripts/performance/list_helpers.py
new file mode 100755
index 00..a97c7ed4fe
--- /dev/null
+++ b/scripts/performance/list_helpers.py
@@ -0,0 +1,207 @@
+#!/usr/bin/env python3
+
+#  Print the executed helpers of a QEMU invocation.
+#
+#  Syntax:
+#  list_helpers.py [-h] -- \
+#  [] \
+#  []
+#
+#  [-h] - Print the script arguments help message.
+#
+#  Example of usage:
+#  list_helpers.py -- qemu-mips coulomb_double-mips
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+
+
+def find_JIT_line(callgrind_data):
+    """
+    Search for the line with the JIT call in the callgrind_annotate
+    output when run using --tree=calling.
+    All the helpers should be listed after that line.
+
+    Parameters:
+    callgrind_data (list): callgrind_annotate output
+
+    Returns:
+    (int): Line number of JIT call
+    """
+    line = -1
+    for i in range(len(callgrind_data)):
+        split_line = callgrind_data[i].split()
+        if len(split_line) > 2 and \
+                split_line[1] == "*" and \
+                split_line[-1] == "[???]":
+            line = i
+            break
+    return line
+
+
+def get_helpers(JIT_line, callgrind_data):
+    """
+    Get all helpers data given the line number of the JIT call.
+
+    Parameters:
+    JIT_line (int): Line number of the JIT call
+    callgrind_data (list): callgrind_annotate output
+
+    Returns:
+    (list):[[number_of_instructions(int), helper_name(str),
+             number_of_calls(int), source_file(str)]]
+    """
+    helpers = []
+    next_helper = JIT_line + 1
+    while (callgrind_data[next_helper] != &qu

[PATCH v2 0/2] Add list_fn_callees.py and list_helpers.py scripts

2020-07-16 Thread Ahmed Karaman
Hi,

This series adds the two new scripts introduced in report 4 of the
"TCG Continuous Benchmarking" GSoC project.

"list_fn_callees.py" is used for printing the callees of a given list
of QEMU functions.

"list_helpers.py" is used for printing the executed helpers of a QEMU
invocation.

To learn more about how the scripts work and how they can be used for
analyzing the performance of different targets, please check the
"Listing QEMU Helpers and Function Callees" report.

Report link:
https://lists.nongnu.org/archive/html/qemu-devel/2020-07/msg04227.html

Best regards,
Ahmed Karaman

v1->v2:
- Indent script example output in commit message to pass the "Test checkpatch"
  on patchew.

Ahmed Karaman (2):
  scripts/performance: Add list_fn_callees.py script
  scripts/performance: Add list_helpers.py script

 scripts/performance/list_fn_callees.py | 228 +
 scripts/performance/list_helpers.py| 207 ++
 2 files changed, 435 insertions(+)
 create mode 100755 scripts/performance/list_fn_callees.py
 create mode 100755 scripts/performance/list_helpers.py

-- 
2.17.1




[PATCH v2 1/2] scripts/performance: Add list_fn_callees.py script

2020-07-16 Thread Ahmed Karaman
Python script that prints the callees of a given list of QEMU
functions.

Syntax:
list_fn_callees.py [-h] -f FUNCTION [FUNCTION ...] -- \
[] \
[]

[-h] - Print the script arguments help message.
-f FUNCTION [FUNCTION ...] - List of function names

Example of usage:
list_fn_callees.py -f helper_float_sub_d helper_float_mul_d -- \
  qemu-mips coulomb_double-mips -n10

Example output:
 Total number of instructions: 108,952,851

 Callees of helper_float_sub_d:

 No. Instructions Percentage  Calls Ins/Call Function Name Source File
 --- ------------ ---------- ------ -------- ------------- ----------------
   1      153,160     0.141%  1,305      117 float64_sub   /fpu/softfloat.c

 Callees of helper_float_mul_d:

 No. Instructions Percentage  Calls Ins/Call Function Name Source File
 --- ------------ ---------- ------ -------- ------------- ----------------
   1      131,137     0.120%  1,014      129 float64_mul   /fpu/softfloat.c
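
Each callee row is built from one line of callgrind_annotate output. The sketch below mirrors the field handling used in the patch (instruction count with thousands separators, a "file:function" field, and a call count wrapped as "(Nx)"). The sample line is a hypothetical reconstruction for illustration, not captured tool output, and parse_callee_line is a hypothetical name.

```python
from typing import List, Union


def parse_callee_line(line: str) -> List[Union[int, str]]:
    """Return [instructions, callee_name, calls, source_file] for one line."""
    fields = line.split()
    instructions = int(fields[0].replace(",", ""))
    # Third field has the form "<source file>:<function name>".
    source_file, _, callee_name = fields[2].partition(":")
    # Fourth field has the form "(1305x)"; strip "(" and "x)".
    calls = int(fields[3][1:-2])
    return [instructions, callee_name, calls, source_file]


# Hypothetical line shaped like the fields the script expects.
sample = "153,160  >  /fpu/softfloat.c:float64_sub (1305x) [/path/to/qemu-mips]"
print(parse_callee_line(sample))
```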

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/list_fn_callees.py | 228 +
 1 file changed, 228 insertions(+)
 create mode 100755 scripts/performance/list_fn_callees.py

diff --git a/scripts/performance/list_fn_callees.py 
b/scripts/performance/list_fn_callees.py
new file mode 100755
index 00..f0ec5c8e81
--- /dev/null
+++ b/scripts/performance/list_fn_callees.py
@@ -0,0 +1,228 @@
+#!/usr/bin/env python3
+
+#  Print the callees of a given list of QEMU functions.
+#
+#  Syntax:
+#  list_fn_callees.py [-h] -f FUNCTION [FUNCTION ...] -- \
+#  [] \
+#  []
+#
+#  [-h] - Print the script arguments help message.
+#  -f FUNCTION [FUNCTION ...] - List of function names
+#
+#  Example of usage:
+#  list_fn_callees.py -f helper_float_sub_d helper_float_mul_d -- \
+#qemu-mips coulomb_double-mips
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+
+
+def find_function_lines(function_name, callgrind_data):
+    """
+    Search for the line with the function name in the
+    callgrind_annotate output when run using --tree=calling.
+    All the function callees should be listed after that line.
+
+    Parameters:
+    function_name (string): The desired function name to print its callees
+    callgrind_data (list): callgrind_annotate output
+
+    Returns:
+    (list): List of function line numbers
+    """
+    lines = []
+    for i in range(len(callgrind_data)):
+        split_line = callgrind_data[i].split()
+        if len(split_line) > 2 and \
+                split_line[1] == "*" and \
+                split_line[2].split(":")[-1] == function_name:
+            # Function might be in the callgrind_annotate output more than
+            # once, so don't break after finding an instance
+            if callgrind_data[i + 1] != "\n":
+                # Only append the line number if the found instance has
+                # callees
+                lines.append(i)
+    return lines
+
+
+def get_function_calles(function_lines, callgrind_data):
+    """
+    Get all callees data for a function given its list of line numbers in
+    callgrind_annotate output.
+
+    Parameters:
+    function_lines (list): Line numbers of the function to get its callees
+    callgrind_data (list): callgrind_annotate output
+
+    Returns:
+    (list):[[number_of_instructions(int), callee_name(str),
+             number_of_calls(int), source_file(str)]]
+    """
+    callees = []
+    for function_line in function_lines:
+        next_callee = function_line + 1
+        while (callgrind_data[next_callee] != "\n"):
+            split_line = callgrind_data[next_callee].split()
+            number_of_instructions = int(split_line[0].replace(",", ""))
+            source_file = split_line[2].split(":")[0]
+            callee_name = split_line[2].split(":")[1]
+            number_of_calls = int(split_line[3][1:-2])
+            callees.append([number_of_instructions, callee_name,
+ 

[PATCH 2/2] scripts/performance: Add list_helpers.py script

2020-07-14 Thread Ahmed Karaman
Python script that prints executed helpers of a QEMU invocation.

Syntax:
list_helpers.py [-h] -- \
[] \
[]

[-h] - Print the script arguments help message.

Example of usage:
list_helpers.py -- qemu-mips coulomb_double-mips -n10

Example output:
Total number of instructions: 108,933,695

Executed QEMU Helpers:

No.     Ins Percent  Calls Ins/Call Helper Name             Source File
--- ------- ------- ------ -------- ----------------------- -------------------------
  1 183,021  0.168%  1,305      140 helper_float_sub_d      /target/mips/fpu_helper.c
  2 177,111  0.163%    770      230 helper_float_madd_d     /target/mips/fpu_helper.c
  3 171,537  0.157%  1,014      169 helper_float_mul_d      /target/mips/fpu_helper.c
  4 157,298  0.144%  2,443       64 helper_lookup_tb_ptr    /accel/tcg/tcg-runtime.c
  5 138,123  0.127%    897      153 helper_float_add_d      /target/mips/fpu_helper.c
  6  47,083  0.043%    207      227 helper_float_msub_d     /target/mips/fpu_helper.c
  7  24,062  0.022%    487       49 helper_cmp_d_lt         /target/mips/fpu_helper.c
  8  22,910  0.021%    150      152 helper_float_div_d      /target/mips/fpu_helper.c
  9  15,497  0.014%    321       48 helper_cmp_d_eq         /target/mips/fpu_helper.c
 10   9,100  0.008%     52      175 helper_float_trunc_w_d  /target/mips/fpu_helper.c
 11   7,059  0.006%     10      705 helper_float_sqrt_d     /target/mips/fpu_helper.c
 12   3,000  0.003%     40       75 helper_cmp_d_ule        /target/mips/fpu_helper.c
 13   2,720  0.002%     20      136 helper_float_cvtd_w     /target/mips/fpu_helper.c
 14   2,477  0.002%     27       91 helper_swl              /target/mips/op_helper.c
 15   2,000  0.002%     40       50 helper_cmp_d_le         /target/mips/fpu_helper.c
 16   1,800  0.002%     40       45 helper_cmp_d_un         /target/mips/fpu_helper.c
 17   1,164  0.001%     12       97 helper_raise_exception_ /target/mips/op_helper.c
 18     720  0.001%     10       72 helper_cmp_d_ult        /target/mips/fpu_helper.c
 19     560  0.001%    140        4 helper_cfc1             /target/mips/fpu_helper.c

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/list_helpers.py | 207 
 1 file changed, 207 insertions(+)
 create mode 100755 scripts/performance/list_helpers.py

diff --git a/scripts/performance/list_helpers.py b/scripts/performance/list_helpers.py
new file mode 100755
index 00..a97c7ed4fe
--- /dev/null
+++ b/scripts/performance/list_helpers.py
@@ -0,0 +1,207 @@
+#!/usr/bin/env python3
+
+#  Print the executed helpers of a QEMU invocation.
+#
+#  Syntax:
+#  list_helpers.py [-h] -- \
+#  [] \
+#  []
+#
+#  [-h] - Print the script arguments help message.
+#
+#  Example of usage:
+#  list_helpers.py -- qemu-mips coulomb_double-mips
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+
+
+def find_JIT_line(callgrind_data):
+"""
+Search for the line with the JIT call in the callgrind_annotate
+output when run using --tree=calling.
+All the helpers should be listed after that line.
+
+Parameters:
+callgrind_data (list): callgrind_annotate output
+
+Returns:
+(int): Line number of JIT call
+"""
+line = -1
+for i in range(len(callgrind_data)):
+split_line = callgrind_data[i].split()
+if len(split_line) > 2 and \
+split_line[1] == "*" and \
+split_line[-1] == "[???]":
+line = i
+break
+return line
+
+
+def get_helpers(JIT_line, callgrind_data):
+"""
+Get all helpers data given the line number of the JIT call.
+
+Parameters:
+JIT_line (int): Line number of the JIT call
+callgrind_data (list): callgrind_annotate output
+
+Returns:
+(list):[[number_of_instructions(int), helper_name(str),
+ number_of_calls(int), source_file(str)]]
+"""
+helpers = []
+next_helper = JIT_line + 1
+while (callgrind_data[next_helper] != "\n"):
+   

[PATCH 1/2] scripts/performance: Add list_fn_callees.py script

2020-07-14 Thread Ahmed Karaman
Python script that prints the callees of a given list of QEMU
functions.

Syntax:
list_fn_callees.py [-h] -f FUNCTION [FUNCTION ...] -- \
[] \
[]

[-h] - Print the script arguments help message.
-f FUNCTION [FUNCTION ...] - List of function names

Example of usage:
list_fn_callees.py -f helper_float_sub_d helper_float_mul_d -- \
  qemu-mips coulomb_double-mips -n10

Example output:
Total number of instructions: 108,952,851

Callees of helper_float_sub_d:

No. Instructions Percentage  Calls Ins/Call Function Name Source File
--- ------------ ---------- ------ -------- ------------- ----------------
  1      153,160     0.141%  1,305      117 float64_sub   /fpu/softfloat.c

Callees of helper_float_mul_d:

No. Instructions Percentage  Calls Ins/Call Function Name Source File
--- ------------ ---------- ------ -------- ------------- ----------------
  1      131,137     0.120%  1,014      129 float64_mul   /fpu/softfloat.c

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/list_fn_callees.py | 228 +
 1 file changed, 228 insertions(+)
 create mode 100755 scripts/performance/list_fn_callees.py

diff --git a/scripts/performance/list_fn_callees.py b/scripts/performance/list_fn_callees.py
new file mode 100755
index 00..f0ec5c8e81
--- /dev/null
+++ b/scripts/performance/list_fn_callees.py
@@ -0,0 +1,228 @@
+#!/usr/bin/env python3
+
+#  Print the callees of a given list of QEMU functions.
+#
+#  Syntax:
+#  list_fn_callees.py [-h] -f FUNCTION [FUNCTION ...] -- \
+#  [] \
+#  []
+#
+#  [-h] - Print the script arguments help message.
+#  -f FUNCTION [FUNCTION ...] - List of function names
+#
+#  Example of usage:
+#  list_fn_callees.py -f helper_float_sub_d helper_float_mul_d -- \
+#qemu-mips coulomb_double-mips
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+
+
+def find_function_lines(function_name, callgrind_data):
+"""
+Search for the line with the function name in the
+callgrind_annotate output when run using --tree=calling.
+All the function callees should be listed after that line.
+
+Parameters:
+function_name (string): The desired function name to print its callees
+callgrind_data (list): callgrind_annotate output
+
+Returns:
+(list): List of function line numbers
+"""
+lines = []
+for i in range(len(callgrind_data)):
+split_line = callgrind_data[i].split()
+if len(split_line) > 2 and \
+split_line[1] == "*" and \
+split_line[2].split(":")[-1] == function_name:
+# Function might be in the callgrind_annotate output more than
+# once, so don't break after finding an instance
+if callgrind_data[i + 1] != "\n":
+# Only append the line number if the found instance has
+# callees
+lines.append(i)
+return lines
+
+
+def get_function_calles(function_lines, callgrind_data):
+"""
+Get all callees data for a function given its list of line numbers in
+callgrind_annotate output.
+
+Parameters:
+function_lines (list): Line numbers of the function to get its callees
+callgrind_data (list): callgrind_annotate output
+
+Returns:
+(list):[[number_of_instructions(int), callee_name(str),
+ number_of_calls(int), source_file(str)]]
+"""
+callees = []
+for function_line in function_lines:
+next_callee = function_line + 1
+while (callgrind_data[next_callee] != "\n"):
+split_line = callgrind_data[next_callee].split()
+number_of_instructions = int(split_line[0].replace(",", ""))
+source_file = split_line[2].split(":")[0]
+callee_name = split_line[2].split(":")[1]
+number_of_calls = int(split_line[3][1:-2])
+callees.append([number_of_instructions, callee_name,
+  

[PATCH 0/2] Add list_fn_callees.py and list_helpers.py scripts

2020-07-14 Thread Ahmed Karaman
Hi,

This series adds the two new scripts introduced in report 4 of the
"TCG Continuous Benchmarking" GSoC project.

"list_fn_callees.py" is used for printing the callees of a given list
of QEMU functions.

"list_helpers.py" is used for printing the executed helpers of a QEMU
invocation.

To learn more about how the scripts work and how they can be used for
analyzing the performance of different targets, please check the
"Listing QEMU Helpers and Function Callees" report.

Report link:
https://lists.nongnu.org/archive/html/qemu-devel/2020-07/msg04227.html
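
As a rough illustration of what both scripts extract, the sketch below
parses a single callee line of "callgrind_annotate --tree=calling" output,
following the field layout used in the patches of this series. The sample
line and the names parse_callee_line, SAMPLE_LINE and TOTAL_INSTRUCTIONS
are assumptions made for this example only; real output depends on the
Valgrind version and on the build paths.

```python
def parse_callee_line(line):
    """Split one callee line into [instructions, name, calls, source_file]."""
    fields = line.split()
    # Instruction counts are printed with thousands separators.
    instructions = int(fields[0].replace(",", ""))
    source_file, callee_name = fields[2].split(":")[:2]
    # The call count looks like "(1305x)"; drop "(" and the trailing "x)".
    calls = int(fields[3][1:-2])
    return [instructions, callee_name, calls, source_file]


# Hypothetical sample line, modelled on the example output of patch 1/2.
SAMPLE_LINE = "153,160  >   /fpu/softfloat.c:float64_sub (1305x) [qemu-mips]\n"

# Total taken from the example output of patch 1/2.
TOTAL_INSTRUCTIONS = 108952851

instructions, name, calls, source = parse_callee_line(SAMPLE_LINE)
percentage = instructions / TOTAL_INSTRUCTIONS * 100
print("{} {:,} {:.3f}% {:,} {} {}".format(
    name, instructions, percentage, calls, instructions // calls, source))
```

The percentage and instructions-per-call values derived this way agree
with the float64_sub row of the example output in patch 1/2.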

Best regards,
Ahmed Karaman

Ahmed Karaman (2):
  scripts/performance: Add list_fn_callees.py script
  scripts/performance: Add list_helpers.py script

 scripts/performance/list_fn_callees.py | 228 +
 scripts/performance/list_helpers.py| 207 ++
 2 files changed, 435 insertions(+)
 create mode 100755 scripts/performance/list_fn_callees.py
 create mode 100755 scripts/performance/list_helpers.py

-- 
2.17.1




[REPORT] [GSoC - TCG Continuous Benchmarking] [#4] Listing QEMU Helpers and Function Callees

2020-07-13 Thread Ahmed Karaman
Greetings,

The fourth report of the TCG Continuous Benchmarking series builds
upon the previous report by presenting two new Python scripts that
facilitate the process of displaying the executed QEMU helpers and
function callees without the need of setting up KCachegrind.

The ppc performance degradation is then re-analyzed using the new
scripts. The report also analyzes three other targets: hppa and sh4,
explaining why they were not affected the same way as ppc, and mips,
explaining why it showed an increase in performance.

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Listing-QEMU-Helpers-and-Function-Callees/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
Report 2 - Dissecting QEMU Into Three Main Parts:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
Report 3 - QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01978.html

Best regards,
Ahmed Karaman



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#3] QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison

2020-07-10 Thread Ahmed Karaman
On Thu, Jul 9, 2020 at 4:41 PM Alex Bennée  wrote:
>
>
> Ahmed Karaman  writes:
>
> > Hi,
> >
> > The third report of the TCG Continuous Benchmarking series utilizes
> > the tools presented in the previous report for comparing the
> > performance of 17 different targets across two versions of QEMU. The
> > two versions addressed are 5.0 and 5.1-pre-soft-freeze (current state
> > of QEMU).
> >
> > After summarizing the results, the report utilizes the KCachegrind
> > tool and dives into the analysis of why all three PowerPC targets
> > (ppc, ppc64, ppc64le) had a performance degradation between the two
> > QEMU versions.
>
> It's an interesting degradation especially as you would think that a
> change in the softfloat implementation should hit everyone in the same
> way.
>

That's what I thought as well, but while working on next week's
report, it appears that this specific change introduced a performance
improvement in other targets!

> We actually have a tool for benchmarking the softfloat implementation
> itself called fp-bench. You can find it in tests/fp. I would be curious
> to see if you saw a drop in performance in the following:
>
>   ./fp-bench -p double -o cmp
>

I ran the command before and after the commit introducing the
degradation. Both runs gave results varying between 600~605 MFlops.
Running with Callgrind and the Coulomb benchmark, the results were:
Number of instructions before: 12,715,390,413
Number of instructions after: 13,031,104,137
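
For reference, the relative change between the two Callgrind counts above
can be worked out with a few lines of Python. This is only a sketch of the
arithmetic; the variable names are chosen for the example.

```python
before = 12_715_390_413  # instructions before the commit (from above)
after = 13_031_104_137   # instructions after the commit (from above)

delta = after - before
change = delta / before * 100

print("Extra instructions: {:,}".format(delta))
print("Relative change:    {:+.3f}%".format(change))
```

This comes out at roughly 2.5% more instructions after the commit.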

> >
> > Report link:
> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-5.0-and-5.1-pre-soft-freeze-Dissect-Comparison/
>
> If you identify a drop in performance due to a commit linking to it from
> the report wouldn't be a bad idea so those that want to quickly
> replicate the test can do before/after runs.
>

Report number 5 will introduce a new tool for detecting commits
causing performance improvements and degradations. The report will
utilize this tool to find out the specific commit introducing these
changes.

> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> > Report 2 - Dissecting QEMU Into Three Main Parts:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
> >
> > Best regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée

Best regards,
Ahmed Karaman



[PATCH v3 1/1] scripts/performance: Add dissect.py script

2020-07-08 Thread Ahmed Karaman
Python script that dissects QEMU execution into three main phases:
code generation, JIT execution and helpers execution.

Syntax:
dissect.py [-h] --  [] \
  []

[-h] - Print the script arguments help message.

Example of usage:
dissect.py -- qemu-arm coulomb_double-arm

Example output:
Total Instructions:        4,702,865,362

Code Generation:             115,819,309   2.463%
JIT Execution:             1,081,980,528  23.007%
Helpers:                   3,505,065,525  74.530%

Signed-off-by: Ahmed Karaman 
Reviewed-by: Aleksandar Markovic 
---
 scripts/performance/dissect.py | 166 +
 1 file changed, 166 insertions(+)
 create mode 100755 scripts/performance/dissect.py

diff --git a/scripts/performance/dissect.py b/scripts/performance/dissect.py
new file mode 100755
index 00..bf24f50922
--- /dev/null
+++ b/scripts/performance/dissect.py
@@ -0,0 +1,166 @@
+#!/usr/bin/env python3
+
+#  Print the percentage of instructions spent in each phase of QEMU
+#  execution.
+#
+#  Syntax:
+#  dissect.py [-h] --  [] \
+#[]
+#
+#  [-h] - Print the script arguments help message.
+#
+#  Example of usage:
+#  dissect.py -- qemu-arm coulomb_double-arm
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+
+
+def get_JIT_line(callgrind_data):
+"""
+Search for the first instance of the JIT call in
+the callgrind_annotate output when ran using --tree=caller
+This is equivalent to the self number of instructions of JIT.
+
+Parameters:
+callgrind_data (list): callgrind_annotate output
+
+Returns:
+(int): Line number
+"""
+line = -1
+for i in range(len(callgrind_data)):
+if callgrind_data[i].strip('\n') and \
+callgrind_data[i].split()[-1] == "[???]":
+line = i
+break
+if line == -1:
+sys.exit("Couldn't locate the JIT call ... Exiting.")
+return line
+
+
+def main():
+# Parse the command line arguments
+parser = argparse.ArgumentParser(
+usage='dissect.py [-h] -- '
+' [] '
+' []')
+
+parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+
+args = parser.parse_args()
+
+# Extract the needed variables from the args
+command = args.command
+
+# Ensure that valgrind is installed
+check_valgrind = subprocess.run(
+["which", "valgrind"], stdout=subprocess.DEVNULL)
+if check_valgrind.returncode:
+sys.exit("Please install valgrind before running the script.")
+
+# Save all intermediate files in a temporary directory
+with tempfile.TemporaryDirectory() as tmpdirname:
+# callgrind output file path
+data_path = os.path.join(tmpdirname, "callgrind.data")
+# callgrind_annotate output file path
+annotate_out_path = os.path.join(tmpdirname, "callgrind_annotate.out")
+
+# Run callgrind
+callgrind = subprocess.run((["valgrind",
+ "--tool=callgrind",
+ "--callgrind-out-file=" + data_path]
++ command),
+   stdout=subprocess.DEVNULL,
+   stderr=subprocess.PIPE)
+if callgrind.returncode:
+sys.exit(callgrind.stderr.decode("utf-8"))
+
+# Save callgrind_annotate output
+with open(annotate_out_path, "w") as output:
+callgrind_annotate = subprocess.run(
+["callgrind_annotate", data_path, "--tree=caller"],
+stdout=output,
+stderr=subprocess.PIPE)
+if callgrind_annotate.returncode:
+sys.exit(callgrind_annotate.stderr.decode("utf-8"))
+
+# Read the callgrind_annotate output to callgrind_data[]
+callgrind_data = []
+with open(annotate_out_path, 'r') as data:
+ 

[PATCH v3 0/1] Add Script for Dissecting QEMU Execution

2020-07-08 Thread Ahmed Karaman
Hi,

This series adds the dissect.py script which breaks down the execution
of QEMU into three main phases:
code generation, JIT execution, and helpers execution.

It prints the number of instructions executed by QEMU in each of these
three phases, plus the total number of executed instructions.
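
The numbers in the example output of the accompanying patch are
consistent with a simple breakdown in which the three phases add up to
the total. The sketch below reproduces the percentages from those
numbers. Treating the helpers count as the remainder is an assumption
about how the figures relate, not a description of how dissect.py
computes them, and the variable names are made up for the example.

```python
total = 4_702_865_362            # total instructions (example output)
code_generation = 115_819_309    # code generation phase (example output)
jit_execution = 1_081_980_528    # JIT execution phase (example output)

# Assumption: whatever is neither code generation nor JIT is helpers.
helpers = total - code_generation - jit_execution

phases = [("Code Generation", code_generation),
          ("JIT Execution", jit_execution),
          ("Helpers", helpers)]

print("{:<19}{:>15,}".format("Total Instructions:", total))
for label, count in phases:
    print("{:<19}{:>15,}{:>9.3f}%".format(label + ":", count,
                                          count / total * 100))
```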

To learn more about how the script works and for further usage
instructions, please check the "Dissecting QEMU Into Three Main Parts"
report posted as part of the "TCG Continuous Benchmarking" GSoC project.

Report link:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg09441.html

Best regards,
Ahmed Karaman

v2->v3:
- Fix a misalignment in a comment line.
- Use tempfile.TemporaryDirectory() for handling intermediate files.
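
A minimal sketch of the tempfile.TemporaryDirectory() pattern mentioned
above, shown in isolation: files created inside the "with" block are
removed together with the directory when the block exits, so no manual
os.unlink() calls are needed. The file name and its placeholder contents
are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdirname:
    # Intermediate files live inside the temporary directory.
    data_path = os.path.join(tmpdirname, "callgrind.data")
    with open(data_path, "w") as data:
        data.write("placeholder\n")
    exists_inside = os.path.exists(data_path)

# The directory and everything in it are gone once the block exits.
exists_after = os.path.exists(tmpdirname)
print(exists_inside, exists_after)
```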

Ahmed Karaman (1):
  scripts/performance: Add dissect.py script

 scripts/performance/dissect.py | 166 +
 1 file changed, 166 insertions(+)
 create mode 100755 scripts/performance/dissect.py

-- 
2.17.1




Re: [PATCH v2 1/1] scripts/performance: Add dissect.py script

2020-07-08 Thread Ahmed Karaman
On Wed, Jul 8, 2020 at 5:41 PM Philippe Mathieu-Daudé  wrote:
>
> Hi Ahmed,
>
> On 7/2/20 4:29 PM, Ahmed Karaman wrote:
> > Python script that dissects QEMU execution into three main phases:
> > code generation, JIT execution and helpers execution.
> >
> > Syntax:
> > dissect.py [-h] --  [] \
> >   []
> >
> > [-h] - Print the script arguments help message.
> >
> > Example of usage:
> > dissect.py -- qemu-arm coulomb_double-arm
> >
> > Example output:
> > Total Instructions:        4,702,865,362
> >
> > Code Generation:             115,819,309   2.463%
> > JIT Execution:             1,081,980,528  23.007%
> > Helpers:                   3,505,065,525  74.530%
> >
> > Signed-off-by: Ahmed Karaman 
> > ---
> >  scripts/performance/dissect.py | 165 +
> >  1 file changed, 165 insertions(+)
> >  create mode 100755 scripts/performance/dissect.py
> >
> > diff --git a/scripts/performance/dissect.py b/scripts/performance/dissect.py
> > new file mode 100755
> > index 00..8c2967d082
> > --- /dev/null
> > +++ b/scripts/performance/dissect.py
> > @@ -0,0 +1,165 @@
> > +#!/usr/bin/env python3
> > +
> > +#  Print the percentage of instructions spent in each phase of QEMU
> > +#  execution.
> > +#
> > +#  Syntax:
> > +#  dissect.py [-h] --  [] \
> > +#[]
> > +#
> > +#  [-h] - Print the script arguments help message.
> > +#
> > +#  Example of usage:
> > +#  dissect.py -- qemu-arm coulomb_double-arm
> > +#
> > +#  This file is a part of the project "TCG Continuous Benchmarking".
> > +#
> > +#  Copyright (C) 2020  Ahmed Karaman 
> > +#  Copyright (C) 2020  Aleksandar Markovic 
> > +#
> > +#  This program is free software: you can redistribute it and/or modify
> > +#  it under the terms of the GNU General Public License as published by
> > +#  the Free Software Foundation, either version 2 of the License, or
> > +#  (at your option) any later version.
> > +#
> > +#  This program is distributed in the hope that it will be useful,
> > +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > +#  GNU General Public License for more details.
> > +#
> > +#  You should have received a copy of the GNU General Public License
> > +#  along with this program. If not, see <https://www.gnu.org/licenses/>.
> > +
> > +import argparse
> > +import os
> > +import subprocess
> > +import sys
> > +
> > +
> > +def get_JIT_line(callgrind_data):
> > +"""
> > +Search for the first instance of the JIT call in
> > +the callgrind_annotate output when ran using --tree=caller
> > +This is equivalent to the self number of instructions of JIT.
> > +
> > +Parameters:
> > +callgrind_data (list): callgrind_annotate output
> > +
> > +Returns:
> > +(int): Line number
> > +   """
>
> Alignment off by 1 ;)

Thanks, didn't notice that!

>
> > +line = -1
> > +for i in range(len(callgrind_data)):
> > +if callgrind_data[i].strip('\n') and \
> > +callgrind_data[i].split()[-1] == "[???]":
> > +line = i
> > +break
> > +if line == -1:
> > +sys.exit("Couldn't locate the JIT call ... Exiting.")
> > +return line
> > +
> > +
> > +def main():
> > +# Parse the command line arguments
> > +parser = argparse.ArgumentParser(
> > +usage='dissect.py [-h] -- '
> > +' [] '
> > +' []')
> > +
> > +parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
> > +
> > +args = parser.parse_args()
> > +
> > +# Extract the needed variables from the args
> > +command = args.command
> > +
> > +# Ensure that valgrind is installed
> > +check_valgrind = subprocess.run(
> > +["which", "valgrind"], stdout=subprocess.DEVNULL)
> > +if check_valgrind.returncode:
> > +sys.exit("Please install valgrind before running the script.")
> > +
> > +# Run callgrind
> > +callgrind = subprocess.run((["valgrind",
> > + "--tool=callgrind",
> > +   

[REPORT] [GSoC - TCG Continuous Benchmarking] [#3] QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison

2020-07-06 Thread Ahmed Karaman
Hi,

The third report of the TCG Continuous Benchmarking series utilizes
the tools presented in the previous report for comparing the
performance of 17 different targets across two versions of QEMU. The
two versions addressed are 5.0 and 5.1-pre-soft-freeze (current state
of QEMU).

After summarizing the results, the report utilizes the KCachegrind
tool and dives into the analysis of why all three PowerPC targets
(ppc, ppc64, ppc64le) had a performance degradation between the two
QEMU versions.

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-5.0-and-5.1-pre-soft-freeze-Dissect-Comparison/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
Report 2 - Dissecting QEMU Into Three Main Parts:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html

Best regards,
Ahmed Karaman



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-07-04 Thread Ahmed Karaman
On Sat, Jul 4, 2020 at 10:45 AM Alex Bennée  wrote:
>
>
> Aleksandar Markovic  writes:
>
> > On Wednesday, July 1, 2020, Alex Bennée  wrote:
> >
> >>
> >> Ahmed Karaman  writes:
> >>
> >> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée wrote:
> >> >>
> >> >> Assuming your test case is constant execution (i.e. runs the same each
> >> >> time) you could run it through a plugins build to extract the number of
> >> >> guest instructions, e.g.:
> >> >>
> >> >>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
> >> >>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
> >> >>   insns: 158603512
> >> >>
> >> >> --
> >> >> Alex Bennée
> >> >
> >> > Hi Mr. Alex,
> >> > I've created a plugins build as you've said using "--enable-plugins" option.
> >> > I've searched for "libinsn.so" plugin that you've mentioned in your
> >> > command but it isn't in that path.
> >>
> >> make plugins
> >>
> >> and you should find them in tests/plugins/
> >>
> >>
> > Hi, both Alex and Ahmed,
> >
> > Ahmed showed me tonight the first results with number of guest
> > instructions. It was almost eye-opening to me. The thing is, until now, I had
> > only a vague picture that, on average, "many" host instructions are generated
> > per one guest instruction. Now I can see the exact ratio for each target,
> > for a particular example.
> >
> > A question for Alex:
> >
> > - What would be the application of this new info? (Except that one has a nice
> > feeling, like I do, of knowing the exact host/guest instruction ratio for a
> > particular scenario.)
>
> Well I think the total number of guest instructions is important because
> some architectures are more efficient than others and this will have an
> impact on the total executed instructions.
>
> > I just have a feeling there is more significance to this new data than I
> > currently see. Could it be that it can be used in analysis of performance?
> > Or measuring quality of emulation (TCG operation)? But how exactly? What
> > conclusion could potentially be derived from knowing the number of guest
> > instructions?
>
> Knowing the ratio (especially as it changes between workloads) means you
> can better pin point where the inefficiencies lie. You don't want to
> spend your time chasing down an inefficiency that is down to the guest
> compiler ;-)
>
> >
> > Sorry for a "stupid" question.
> >
> > Aleksandar
> >
> >
> >
> >
> >> >
> >> > Are there any other options that I should configure my build with?
> >> > Thanks in advance.
> >> >
> >> > Regards,
> >> > Ahmed Karaman
> >>
> >>
> >> --
> >> Alex Bennée
> >>
>
>
> --
> Alex Bennée

Thanks Mr. Alex for your help!

Regards,
Ahmed Karaman
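
The host/guest instruction ratio discussed in this thread can be sketched
as follows. The "insns:" line format is taken from the libinsn example
quoted above; the host instruction count is a made-up placeholder, not a
measurement, and the variable names are chosen for the example.

```python
# Output line in the format printed by the libinsn plugin example above.
plugin_output = "insns: 158603512\n"

# Hypothetical Callgrind total for the same run (placeholder value only).
host_instructions = 1_000_000_000

guest_instructions = int(plugin_output.split(":")[1])
ratio = host_instructions / guest_instructions

print("Guest instructions: {:,}".format(guest_instructions))
print("Host instructions per guest instruction: {:.2f}".format(ratio))
```

Comparing this ratio across targets or workloads is one way to see where
the emulation overhead is concentrated.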



Re: [PATCH v2 1/1] scripts/performance: Add dissect.py script

2020-07-02 Thread Ahmed Karaman
On Thu, Jul 2, 2020 at 5:45 PM Aleksandar Markovic
 wrote:
>
>
> A very good script! Hopefully there will be some script in the near future that
> will, for example, list all helpers used in the test program.
>
> Reviewed-by: Aleksandar Markovic 
>
>
Thanks Mr. Aleksandar. I will start working on it.

Best regards,
Ahmed Karaman



[PATCH v2 0/1] Add Script for Dissecting QEMU Execution

2020-07-02 Thread Ahmed Karaman
Hi,

This series adds the dissect.py script which breaks down the execution
of QEMU into three main phases:
code generation, JIT execution, and helpers execution.

It prints the number of instructions executed by QEMU in each of these
three phases, plus the total number of executed instructions.

To learn more about how the script works and for further usage
instructions, please check the "Dissecting QEMU Into Three Main Parts"
report posted as part of the "TCG Continuous Benchmarking" GSoC project.

Report link:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg09441.html

Best regards,
Ahmed Karaman

v1->v2:
- Set the executable bit for the script.
- Remove exclamation marks from error output.
- Fix a misspelling in a comment line.

Ahmed Karaman (1):
  scripts/performance: Add dissect.py script

 scripts/performance/dissect.py | 165 +
 1 file changed, 165 insertions(+)
 create mode 100755 scripts/performance/dissect.py

-- 
2.17.1




[PATCH v2 1/1] scripts/performance: Add dissect.py script

2020-07-02 Thread Ahmed Karaman
Python script that dissects QEMU execution into three main phases:
code generation, JIT execution and helpers execution.

Syntax:
dissect.py [-h] --  [] \
  []

[-h] - Print the script arguments help message.

Example of usage:
dissect.py -- qemu-arm coulomb_double-arm

Example output:
Total Instructions:        4,702,865,362

Code Generation:             115,819,309   2.463%
JIT Execution:             1,081,980,528  23.007%
Helpers:                   3,505,065,525  74.530%

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/dissect.py | 165 +
 1 file changed, 165 insertions(+)
 create mode 100755 scripts/performance/dissect.py

diff --git a/scripts/performance/dissect.py b/scripts/performance/dissect.py
new file mode 100755
index 00..8c2967d082
--- /dev/null
+++ b/scripts/performance/dissect.py
@@ -0,0 +1,165 @@
+#!/usr/bin/env python3
+
+#  Print the percentage of instructions spent in each phase of QEMU
+#  execution.
+#
+#  Syntax:
+#  dissect.py [-h] --  [] \
+#[]
+#
+#  [-h] - Print the script arguments help message.
+#
+#  Example of usage:
+#  dissect.py -- qemu-arm coulomb_double-arm
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+
+
+def get_JIT_line(callgrind_data):
+"""
+Search for the first instance of the JIT call in
+the callgrind_annotate output when ran using --tree=caller
+This is equivalent to the self number of instructions of JIT.
+
+Parameters:
+callgrind_data (list): callgrind_annotate output
+
+Returns:
+(int): Line number
+   """
+line = -1
+for i in range(len(callgrind_data)):
+if callgrind_data[i].strip('\n') and \
+callgrind_data[i].split()[-1] == "[???]":
+line = i
+break
+if line == -1:
+sys.exit("Couldn't locate the JIT call ... Exiting.")
+return line
+
+
+def main():
+# Parse the command line arguments
+parser = argparse.ArgumentParser(
+usage='dissect.py [-h] -- '
+' [] '
+' []')
+
+parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+
+args = parser.parse_args()
+
+# Extract the needed variables from the args
+command = args.command
+
+# Ensure that valgrind is installed
+check_valgrind = subprocess.run(
+["which", "valgrind"], stdout=subprocess.DEVNULL)
+if check_valgrind.returncode:
+sys.exit("Please install valgrind before running the script.")
+
+# Run callgrind
+callgrind = subprocess.run((["valgrind",
+ "--tool=callgrind",
+ "--callgrind-out-file=/tmp/callgrind.data"]
++ command),
+   stdout=subprocess.DEVNULL,
+   stderr=subprocess.PIPE)
+if callgrind.returncode:
+sys.exit(callgrind.stderr.decode("utf-8"))
+
+# Save callgrind_annotate output to /tmp/callgrind_annotate.out
+with open("/tmp/callgrind_annotate.out", "w") as output:
+callgrind_annotate = subprocess.run(
+["callgrind_annotate", "/tmp/callgrind.data", "--tree=caller"],
+stdout=output,
+stderr=subprocess.PIPE)
+if callgrind_annotate.returncode:
+os.unlink('/tmp/callgrind.data')
+output.close()
+os.unlink('/tmp/callgrind_annotate.out')
+sys.exit(callgrind_annotate.stderr.decode("utf-8"))
+
+# Read the callgrind_annotate output to callgrind_data[]
+callgrind_data = []
+with open('/tmp/callgrind_annotate.out', 'r') as data:
+callgrind_data = data.readlines()
+
+# Line number with the total number of instructions
+total_instructions_line_number = 20
+# Get the total number of instructions
+total_instructions_line_data = \
+callgrind_data[total_instructions_line_number

Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-07-01 Thread Ahmed Karaman
On Wed, Jul 1, 2020 at 5:42 PM Alex Bennée  wrote:
>
>
> Ahmed Karaman  writes:
>
> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée  wrote:
> >>
> >> Assuming your test case is constant execution (i.e. runs the same each
> >> time) you could run it through a plugins build to extract the number of
> >> guest instructions, e.g.:
> >>
> >>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
> >>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
> >>   insns: 158603512
> >>
> >> --
> >> Alex Bennée
> >
> > Hi Mr. Alex,
> > I've created a plugins build as you've said using "--enable-plugins" option.
> > I've searched for "libinsn.so" plugin that you've mentioned in your
> > command but it isn't in that path.
>
> make plugins
>
> and you should find them in tests/plugins/
>
> >
> > Are there any other options that I should configure my build with?
> > Thanks in advance.
> >
> > Regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée

Thanks a lot.
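
The plugin run quoted above prints the guest instruction count on a line
of the form "insns: 158603512". A minimal sketch of pulling that number
out of captured output follows. The function name parse_insn_count is
made up for illustration, and the sketch assumes the output format is
exactly the one shown in the quoted example.

```python
import re


def parse_insn_count(plugin_output):
    """Return the integer that follows 'insns:' in the text, or None."""
    match = re.search(r"^\s*insns:\s*(\d+)\s*$", plugin_output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Sample text copied from the quoted example in this thread.
sample = "SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6\ninsns: 158603512\n"
print(parse_insn_count(sample))
```

Returning None instead of raising lets a caller decide how to handle a
build that was configured without plugin support.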



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-07-01 Thread Ahmed Karaman
On Mon, Jun 29, 2020 at 12:25 PM Ahmed Karaman
 wrote:
>
> Hi,
>
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It devises a Python script that automates this process.
>
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.
>
> Report link:
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>
> Best regards,
> Ahmed Karaman

Hi Mr. Lukáš and Yonggang,

I've created a separate "setup" page on the reports website.
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/setup/

It contains the hardware and OS information of the used system.
It also contains all dependencies and setup instructions required to
set up a machine identical to the one used in the reports.

If you have any further questions or you're using a different Linux
distribution, please let me know.

Best regards,
Ahmed Karaman



Re: [PATCH 1/1] scripts/performance: Add dissect.py script

2020-07-01 Thread Ahmed Karaman
On Wed, Jul 1, 2020 at 3:41 PM Eric Blake  wrote:
>
> On 7/1/20 8:04 AM, Ahmed Karaman wrote:
> > Python script that dissects QEMU execution into three main phases:
> > code generation, JIT execution and helpers execution.
> >
> > Syntax:
> > dissect.py [-h] -- <qemu executable> [<qemu executable options>] \
> >            <target executable> [<target executable options>]
> >
> > [-h] - Print the script arguments help message.
> >
> > Example of usage:
> > dissect.py -- qemu-arm coulomb_double-arm
>
> Given the example usage...
>
> >
> > Example output:
> > Total Instructions:      4,702,865,362
> >
> > Code Generation:           115,819,309    2.463%
> > JIT Execution:           1,081,980,528   23.007%
> > Helpers:                 3,505,065,525   74.530%
> >
> > Signed-off-by: Ahmed Karaman 
> > ---
> >   scripts/performance/dissect.py | 165 +
> >   1 file changed, 165 insertions(+)
> >   create mode 100644 scripts/performance/dissect.py
> >
> > diff --git a/scripts/performance/dissect.py b/scripts/performance/dissect.py
> > new file mode 100644
>
> ...this should have the executable bit set.
Thanks Mr. Eric, I don't know why I always forget to do this before
sending the patch. I will do it in v2.
>
>
> > +def get_JIT_line(callgrind_data):
> > +    """
> > +    Search for the first instance of the JIT call in
> > +    the callgrind_annoutate output when ran using --tree=caller
>
> annotate
Thanks.

>
> > +    This is equivalent to the self number of instructions of JIT.
> > +
> > +    Parameters:
> > +    callgrind_data (list): callgrind_annotate output
> > +
> > +    Returns:
> > +    (int): Line number
> > +    """
> > +    line = -1
> > +    for i in range(len(callgrind_data)):
> > +        if callgrind_data[i].strip('\n') and \
> > +                callgrind_data[i].split()[-1] == "[???]":
> > +            line = i
> > +            break
> > +    if line == -1:
> > +        sys.exit("Couldn't locate the JIT call ... Exiting!")
>
> We tend to avoid ! at the end of error messages (it can come across as
> shouting at the user).
Yes, right. I will remove the exclamation marks.
>
> > +    return line
> > +
> > +
> > +def main():
> > +    # Parse the command line arguments
> > +    parser = argparse.ArgumentParser(
> > +        usage='dissect.py [-h] -- '
> > +        '<qemu executable> [<qemu executable options>] '
> > +        '<target executable> [<target executable options>]')
> > +
> > +    parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
> > +
> > +    args = parser.parse_args()
> > +
> > +    # Extract the needed variables from the args
> > +    command = args.command
> > +
> > +    # Ensure that valgrind is installed
> > +    check_valgrind = subprocess.run(
> > +        ["which", "valgrind"], stdout=subprocess.DEVNULL)
> > +    if check_valgrind.returncode:
> > +        sys.exit("Please install valgrind before running the script!")
>
> and again
Noted.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
Thanks for your feedback Mr. Eric.

Best regards,
Ahmed Karaman
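
The search performed by get_JIT_line() in the reviewed patch can be
tried in isolation. The sketch below is a rewrite for illustration, not
the code from the patch: the name find_jit_line and the sample lines are
invented, and the sample is only loosely shaped like callgrind_annotate
output. It returns -1 instead of exiting so it can be tested.

```python
def find_jit_line(lines):
    """Return the index of the first line whose last whitespace-separated
    field is "[???]", or -1 if there is no such line."""
    for index, text in enumerate(lines):
        fields = text.split()
        # An empty or whitespace-only line yields no fields; skip it.
        if fields and fields[-1] == "[???]":
            return index
    return -1


# Invented sample lines; real callgrind_annotate output will differ.
sample = [
    "\n",
    "1,000  * some_file.c:some_function [/usr/bin/example]\n",
    "  500  < 0x0000000008000000 [???]\n",
    "  250  * 0x0000000008000000 [???]\n",
]
print(find_jit_line(sample))
```

Checking the split fields rather than strip('\n') also covers a line
that holds only spaces, which would otherwise make split()[-1] fail.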



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-07-01 Thread Ahmed Karaman
On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée  wrote:
>
> Assuming your test case is constant execution (i.e. runs the same each
> time) you could run it through a plugins build to extract the number of
> guest instructions, e.g.:
>
>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
>   insns: 158603512
>
> --
> Alex Bennée

Hi Mr. Alex,
I've created a plugins build as you've said using "--enable-plugins" option.
I've searched for "libinsn.so" plugin that you've mentioned in your
command but it isn't in that path.

Are there any other options that I should configure my build with?
Thanks in advance.

Regards,
Ahmed Karaman



[PATCH 1/1] scripts/performance: Add dissect.py script

2020-07-01 Thread Ahmed Karaman
Python script that dissects QEMU execution into three main phases:
code generation, JIT execution and helpers execution.

Syntax:
dissect.py [-h] -- <qemu executable> [<qemu executable options>] \
           <target executable> [<target executable options>]

[-h] - Print the script arguments help message.

Example of usage:
dissect.py -- qemu-arm coulomb_double-arm

Example output:
Total Instructions:      4,702,865,362

Code Generation:           115,819,309    2.463%
JIT Execution:           1,081,980,528   23.007%
Helpers:                 3,505,065,525   74.530%

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/dissect.py | 165 +
 1 file changed, 165 insertions(+)
 create mode 100644 scripts/performance/dissect.py

diff --git a/scripts/performance/dissect.py b/scripts/performance/dissect.py
new file mode 100644
index 00..26121e4a43
--- /dev/null
+++ b/scripts/performance/dissect.py
@@ -0,0 +1,165 @@
+#!/usr/bin/env python3
+
+#  Print the percentage of instructions spent in each phase of QEMU
+#  execution.
+#
+#  Syntax:
+#  dissect.py [-h] -- <qemu executable> [<qemu executable options>] \
+#                 <target executable> [<target executable options>]
+#
+#  [-h] - Print the script arguments help message.
+#
+#  Example of usage:
+#  dissect.py -- qemu-arm coulomb_double-arm
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+
+
+def get_JIT_line(callgrind_data):
+    """
+    Search for the first instance of the JIT call in
+    the callgrind_annoutate output when ran using --tree=caller
+    This is equivalent to the self number of instructions of JIT.
+
+    Parameters:
+    callgrind_data (list): callgrind_annotate output
+
+    Returns:
+    (int): Line number
+    """
+    line = -1
+    for i in range(len(callgrind_data)):
+        if callgrind_data[i].strip('\n') and \
+                callgrind_data[i].split()[-1] == "[???]":
+            line = i
+            break
+    if line == -1:
+        sys.exit("Couldn't locate the JIT call ... Exiting!")
+    return line
+
+
+def main():
+    # Parse the command line arguments
+    parser = argparse.ArgumentParser(
+        usage='dissect.py [-h] -- '
+        '<qemu executable> [<qemu executable options>] '
+        '<target executable> [<target executable options>]')
+
+    parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+
+    args = parser.parse_args()
+
+    # Extract the needed variables from the args
+    command = args.command
+
+    # Ensure that valgrind is installed
+    check_valgrind = subprocess.run(
+        ["which", "valgrind"], stdout=subprocess.DEVNULL)
+    if check_valgrind.returncode:
+        sys.exit("Please install valgrind before running the script!")
+
+    # Run callgrind
+    callgrind = subprocess.run((["valgrind",
+                                 "--tool=callgrind",
+                                 "--callgrind-out-file=/tmp/callgrind.data"]
+                                + command),
+                               stdout=subprocess.DEVNULL,
+                               stderr=subprocess.PIPE)
+    if callgrind.returncode:
+        sys.exit(callgrind.stderr.decode("utf-8"))
+
+    # Save callgrind_annotate output to /tmp/callgrind_annotate.out
+    with open("/tmp/callgrind_annotate.out", "w") as output:
+        callgrind_annotate = subprocess.run(
+            ["callgrind_annotate", "/tmp/callgrind.data", "--tree=caller"],
+            stdout=output,
+            stderr=subprocess.PIPE)
+        if callgrind_annotate.returncode:
+            os.unlink('/tmp/callgrind.data')
+            output.close()
+            os.unlink('/tmp/callgrind_annotate.out')
+            sys.exit(callgrind_annotate.stderr.decode("utf-8"))
+
+    # Read the callgrind_annotate output to callgrind_data[]
+    callgrind_data = []
+    with open('/tmp/callgrind_annotate.out', 'r') as data:
+        callgrind_data = data.readlines()
+
+    # Line number with the total number of instructions
+    total_instructions_line_number = 20
+    # Get the total number of instructions
+    total_instructions_line_data = \
+        callgrind_data[total_instructions_line_number

[PATCH 0/1] Add Script for Dissecting QEMU Execution

2020-07-01 Thread Ahmed Karaman
Hi,

This series adds the dissect.py script which breaks down the execution
of QEMU into three main phases:
code generation, JIT execution, and helpers execution.

It prints the number of instructions executed by QEMU in each of these
three phases, plus the total number of executed instructions.

To learn more about how the script works and for further usage
instructions, please check the "Dissecting QEMU Into Three Main Parts"
report posted as part of the "TCG Continuous Benchmarking" GSoC project.

Report link:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg09441.html

Best regards,
Ahmed Karaman

Ahmed Karaman (1):
  scripts/performance: Add dissect.py script

 scripts/performance/dissect.py | 165 +
 1 file changed, 165 insertions(+)
 create mode 100644 scripts/performance/dissect.py

-- 
2.17.1
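
As a rough illustration of the breakdown described in this cover letter,
the sketch below turns three instruction counts into the percentage
table. It is not the code from dissect.py: the function name
format_breakdown and the column widths are invented, and it assumes the
three phases add up to the total, which holds for the example output in
the patch description.

```python
def format_breakdown(code_generation, jit_execution, helpers):
    """Format three instruction counts as a total plus percentage rows."""
    total = code_generation + jit_execution + helpers
    rows = [("Code Generation:", code_generation),
            ("JIT Execution:", jit_execution),
            ("Helpers:", helpers)]
    lines = ["{:<20}{:>18,}".format("Total Instructions:", total), ""]
    for label, count in rows:
        share = 100 * count / total
        lines.append("{:<20}{:>18,}{:>9.3f}%".format(label, count, share))
    return "\n".join(lines)


# Counts taken from the example output in the patch description.
print(format_breakdown(115819309, 1081980528, 3505065525))
```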




Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-06-30 Thread Ahmed Karaman
On Tue, Jun 30, 2020 at 2:46 PM Lukáš Doktor  wrote:
>
> > However, we know that the results for hosts of different architectures
> > will be different - we expect that.
> >
> > 32-bit Intel hosts will also most likely produce significantly
> > different results than 64-bit Intel hosts. By the way, 64-bit targets
> > in QEMU linux-user mode are not supported on 32-bit hosts (nothing
> > stops the user from starting the corresponding instances of QEMU on a
> > 32-bit host, but the results are unpredictable).
> >
> > Let's focus now on Intel 64-bit hosts only. Richard, can you perhaps
> > enlighten us on whether QEMU (from the point of view of TCG target)
> > behaves differently on different Intel 64-bit hosts, and to what
> > degree?
> >
> > I currently work remotely, but once I am physically at my office I
> > will have a variety of hosts at the company, and would be happy to do
> > the comparison between them, wrt what you presented in Report 2.
> >
> > In conclusion, I think a basic description of your test bed is missing
> > in your reports. And, for final reports (which we call "nightly
> > reports") a detailed system description, as Mr Lukas outlined, is,
> > also in my opinion, necessary.
> >
> > Thanks, Mr. Lukas, for bringing this to our attention!
> >
>
> You're welcome. I'm more on the python side, but as far as I know different
> cpu models (provided their features are enabled) and especially architectures
> result in way different code-paths. Imagine an old processor without vector
> instructions compared to newer ones that can process multiple instructions at
> once.
>
> As for the reports, I don't think that at this point it would be necessary to 
> focus on anything besides a single cpu model (x86_64 Intel) as there are 
> already many variables. Later someone can follow-up with a cross-arch 
> comparison, if necessary.
>
> Regards,
> Lukáš
>
> > Yours,
> > Aleksandar
> >
> >
> >
> >
> >> Best regards,
> >> Ahmed Karaman
> >
>
>
Thanks Mr. Lukáš and Mr. Aleksandar,
OK, now I see how important it is to have this information somewhere
on the reports page.

In response to Mr. Yonggang, I said I will create a mini-report as a
guide for setting up the testbed.
I will add a section to this report with the detailed hardware
information of the used system.
Thanks for bringing this to my attention.

Best regards,
Ahmed Karaman



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-06-30 Thread Ahmed Karaman
On Tue, Jun 30, 2020 at 11:52 AM Aleksandar Markovic
 wrote:
>
> > As far as I know, this is how Ahmed's test bed is set up:
> >
> > 1) Fresh installation of Ubuntu 18.04 on an Intel 64-bit host.
> > 2) Install QEMU build prerequisite packages.
> > 3) Install perf (this step is not necessary for Report 2, but it is
> > for Report 1).
> > 4) Install valgrind.
> > 5) Install 16 gcc cross-compilers (which, together with the native
> > compiler, sum up to the 17 possible QEMU targets).
> >
>
> The following commands install the cross-compilers needed for creating
> the table in the second part of Ahmed's Report 2:
>
> sudo apt-get install g++
> sudo apt-get install g++-aarch64-linux-gnu
> sudo apt-get install g++-alpha-linux-gnu
> sudo apt-get install g++-arm-linux-gnueabi
> sudo apt-get install g++-hppa-linux-gnu
> sudo apt-get install g++-m68k-linux-gnu
> sudo apt-get install g++-mips-linux-gnu
> sudo apt-get install g++-mips64-linux-gnuabi64
> sudo apt-get install g++-mips64el-linux-gnuabi64
> sudo apt-get install g++-mipsel-linux-gnu
> sudo apt-get install g++-powerpc-linux-gnu
> sudo apt-get install g++-powerpc64-linux-gnu
> sudo apt-get install g++-powerpc64le-linux-gnu
> sudo apt-get install g++-riscv64-linux-gnu
> sudo apt-get install g++-s390x-linux-gnu
> sudo apt-get install g++-sh4-linux-gnu
> sudo apt-get install g++-sparc64-linux-gnu
>
> Ahmed, I think this should be in an Appendix section of Report 2.
>
> Sincerely,
> Aleksandar
>
> > That is all fine if Mr. Yonggang is able to do the above, or if he
> > already has a similar system.
> >
> > I am fairly convinced that the setup for any Debian-based Linux
> > distribution will be almost identical to the one described above.
> >
> > However, let's say Mr. Yonggang's system is a SUSE-based distribution (SUSE
> > Linux Enterprise, openSUSE Leap, openSUSE Tumbleweed, Gecko). He could
> > do steps 2), 3), 4) in a fairly similar manner. But step 5) will be
> > difficult. I know that support for cross-compilers is relatively poor
> > for SUSE-based distributions. I think Mr. Yonggang could run the experiment
> > from the second part of Report 2 only for 5 or 6 targets, rather than
> > 17 as you did.
> >
> > The bottom line for Report 2:
> >
> > I think there should be an "Appendix" note on installing
> > cross-compilers. And some general note on your test bed, as well as
> > some guideline for all people like Mr. Yongang who wish to repro the
> > results on their own systems.
> >
> > Sincerely,
> > Aleksandar
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2)
> >
> >
> > > Best Regards,
> > > Ahmed Karaman
Thanks Mr. Aleksandar for your input on this one.
This is indeed my setup for the testbed used for the two previous
reports and all the upcoming ones.
To help Mr. Yonggang with his setup, and anybody else trying to set
this up, I plan to post a mini-report (Report 0) to lay down the
instructions for setting up a system similar to the one used in the
reports.

Best regards,
Ahmed Karaman
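
The package list quoted above follows one pattern: the native g++ plus
one g++-<triplet> package per cross target. A small sketch that
regenerates the same commands from a list of triplets is given below.
The names CROSS_TRIPLETS and install_commands are invented for
illustration, and the triplets are copied from the quoted commands.

```python
# Cross-compiler triplets copied from the apt-get commands quoted above.
CROSS_TRIPLETS = [
    "aarch64-linux-gnu", "alpha-linux-gnu", "arm-linux-gnueabi",
    "hppa-linux-gnu", "m68k-linux-gnu", "mips-linux-gnu",
    "mips64-linux-gnuabi64", "mips64el-linux-gnuabi64", "mipsel-linux-gnu",
    "powerpc-linux-gnu", "powerpc64-linux-gnu", "powerpc64le-linux-gnu",
    "riscv64-linux-gnu", "s390x-linux-gnu", "sh4-linux-gnu",
    "sparc64-linux-gnu",
]


def install_commands(triplets):
    """Return the native g++ command followed by one command per triplet."""
    commands = ["sudo apt-get install g++"]
    commands += ["sudo apt-get install g++-" + t for t in triplets]
    return commands


for command in install_commands(CROSS_TRIPLETS):
    print(command)
```

The 16 cross packages plus the native compiler give the 17 targets
mentioned in the thread.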



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-06-30 Thread Ahmed Karaman
On Tue, Jun 30, 2020 at 7:59 AM 罗勇刚(Yonggang Luo)  wrote:
>
> Wonderful work, May I reproduce the work on my local machine?
>
> On Mon, Jun 29, 2020 at 6:26 PM Ahmed Karaman  
> wrote:
>>
>> Hi,
>>
>> The second report of the TCG Continuous Benchmarking series builds
>> upon the QEMU performance metrics calculated in the previous report.
>> This report presents a method to dissect the number of instructions
>> executed by a QEMU invocation into three main phases:
>> - Code Generation
>> - JIT Execution
>> - Helpers Execution
>> It devises a Python script that automates this process.
>>
>> After that, the report presents an experiment for comparing the
>> output of running the script on 17 different targets. Many conclusions
>> can be drawn from the results and two of them are discussed in the
>> analysis section.
>>
>> Report link:
>> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>>
>> Previous reports:
>> Report 1 - Measuring Basic Performance Metrics of QEMU:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>>
>> Best regards,
>> Ahmed Karaman
>
>
>
> --
>  此致
> 礼
> 罗勇刚
> Yours
> sincerely,
> Yonggang Luo

Thanks Mr. Yonggang. Yes of course, go ahead.
Please let me know if you have any further questions.

Best Regards,
Ahmed Karaman



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-06-30 Thread Ahmed Karaman
On Tue, Jun 30, 2020 at 6:34 AM Lukáš Doktor  wrote:
>
> On 29. 06. 20 at 12:25, Ahmed Karaman wrote:
> > Hi,
> >
> > The second report of the TCG Continuous Benchmarking series builds
> > upon the QEMU performance metrics calculated in the previous report.
> > This report presents a method to dissect the number of instructions
> > executed by a QEMU invocation into three main phases:
> > - Code Generation
> > - JIT Execution
> > - Helpers Execution
> > It devises a Python script that automates this process.
> >
> > After that, the report presents an experiment for comparing the
> > output of running the script on 17 different targets. Many conclusions
> > can be drawn from the results and two of them are discussed in the
> > analysis section.
> >
> > Report link:
> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> >
> > Best regards,
> > Ahmed Karaman
>
> Hello Ahmed,
>
> very nice reading, both reports so far. One thing that could be better
> displayed is the system you used to generate this. This would come in handy
> especially later when you move from examples to actual reports. I think it'd
> make sense to add a section with a clear definition of the machine as well as
> the operating system, qemu version and eventually other deps (like compiler,
> flags, ...). For this report something like:
>
> architecture: x86_64
> cpu_codename: Kaby Lake
> cpu: i7-8650U
> ram: 32GB DDR4
> os: Fedora 32
> qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7
> compiler: gcc-10.1.1-1.fc32.x86_64
> flags: 
> --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu"
>  --disable-werror --disable-sparse --enable-sdl --enable-kvm  
> --enable-vhost-net --enable-vhost-net --enable-attr  --enable-kvm  
> --enable-fdt   --enable-vnc --enable-seccomp 
> --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read"
>  --python=/usr/bin/python3 --enable-linux-io-uring
>
> would do. Maybe it'd be even a good idea to create a script to report this 
> basic set of information and add it after each of the perf scripts so people 
> don't forget to double-check the conditions, but others might disagree so 
> take this only as a suggestion.
>
> Regards,
> Lukáš
>
> PS: Automated cpu codenames, host OSes and such could be tricky, but one can
> use other libraries or just a best-effort approach with a fallback to "unknown"
> to let people fill it in manually or add their branch to your script.
>
> Regards,
> Lukáš
>
Thanks Mr. Lukáš, I'm really glad you found both reports interesting.

Both reports are based on QEMU version 5.0.0, this wasn't mentioned in
the reports so thanks for the reminder. I'll add a short note about
that.

The used QEMU build is a very basic GCC build (created by just running
../configure in the build directory without any flags).

Regarding the detailed machine information (CPU, RAM, etc.), the two
reports introduce some concepts and methodologies that will produce
consistent results on whichever machine they are executed on. So I
think it's unnecessary to mention the detailed system information used
in the reports for now.

Best regards,
Ahmed Karaman
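
The script Mr. Lukáš suggests, one that reports a basic set of system
information with a fallback to "unknown", could look roughly like the
sketch below. It is only an illustration built on Python's standard
platform module. The function name describe_system and the chosen keys
are assumptions, and it covers only a few of the fields he lists (no CPU
codename, RAM size, QEMU version or configure flags).

```python
import platform


def describe_system():
    """Collect a few host details, using "unknown" for empty values."""
    def or_unknown(value):
        return value if value else "unknown"

    os_name = " ".join([platform.system(), platform.release()]).strip()
    return {
        "architecture": or_unknown(platform.machine()),
        "cpu": or_unknown(platform.processor()),
        "os": or_unknown(os_name),
        "python": or_unknown(platform.python_version()),
    }


for key, value in describe_system().items():
    print("{}: {}".format(key, value))
```

platform.processor() returns an empty string on some systems, which is
the case the "unknown" fallback is meant to cover.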



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-06-29 Thread Ahmed Karaman
On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée  wrote:
>
>
> Ahmed Karaman  writes:
>
> > Hi,
> >
> > The second report of the TCG Continuous Benchmarking series builds
> > upon the QEMU performance metrics calculated in the previous report.
> > This report presents a method to dissect the number of instructions
> > executed by a QEMU invocation into three main phases:
> > - Code Generation
> > - JIT Execution
> > - Helpers Execution
> > It devises a Python script that automates this process.
> >
> > After that, the report presents an experiment for comparing the
> > output of running the script on 17 different targets. Many conclusions
> > can be drawn from the results and two of them are discussed in the
> > analysis section.
>
> A couple of comments. One thing I think is missing from your analysis is
> the total number of guest instructions being emulated. As you point out
> each guest will have different code efficiency in terms of its
> generated code.
>
> Assuming your test case is constant execution (i.e. runs the same each
> time)
Yes indeed, the report utilizes Callgrind in the measurements so the
results are very stable.
> you could run it through a plugins build to extract the number of
> guest instructions, e.g.:
>
>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
>   insns: 158603512
>
That's a very nice suggestion. Maybe this will be the idea of a whole
new report. I'll try to execute the provided command and will let you
know if I have any questions.
> I should have also pointed out in your last report that running FP heavy
> code will always be biased towards helper/softfloat code to the
> detriment of everything else. I think you need more of a mix of
> benchmarks to get a better view.
>
> When Emilio did the last set of analysis he used a suite he built out of
> nbench and a perl benchmark:
>
>   https://github.com/cota/dbt-bench
>
> As he quoted in his README:
>
>   NBench programs are small, with execution time dominated by small code
>   loops. Thus, when run under a DBT engine, the resulting performance
>   depends almost entirely on the quality of the output code.
>
>   The Perl benchmarks compile Perl code. As is common for compilation
>   workloads, they execute large amounts of code and show no particular
>   code execution hotspots. Thus, the resulting DBT performance depends
>   largely on code translation speed.
>
> by only having one benchmark you are going to miss out on the envelope
> of use cases.
>
Future reports will introduce a variety of benchmarks. This report
and the previous one are introductory reports. The benchmark was used
only to demonstrate the report ideas, not as a strict benchmarking
program.
> >
> > Report link:
> >https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> >
> > Best regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée



Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-06-29 Thread Ahmed Karaman
Thank you for your support!

On Mon, Jun 29, 2020, 12:40 PM Aleksandar Markovic <
aleksandar.qemu.de...@gmail.com> wrote:

>
>
> On Monday, June 29, 2020, Ahmed Karaman wrote:
>
>> Hi,
>>
>> The second report of the TCG Continuous Benchmarking series builds
>> upon the QEMU performance metrics calculated in the previous report.
>> This report presents a method to dissect the number of instructions
>> executed by a QEMU invocation into three main phases:
>> - Code Generation
>> - JIT Execution
>> - Helpers Execution
>> It devises a Python script that automates this process.
>>
>> After that, the report presents an experiment for comparing the
>> output of running the script on 17 different targets. Many conclusions
>> can be drawn from the results and two of them are discussed in the
>> analysis section.
>>
>> Report link:
>>
>> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>>
>> Previous reports:
>> Report 1 - Measuring Basic Performance Metrics of QEMU:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>>
>>
> My sincere congratulations on the Report 2!!
>
> And, on top of that, this is an excellent idea to list previous reports,
> as you did in the paragraph above.
>
> Keep reports coming!!
>
> Aleksandar
>
>
>
>> Best regards,
>> Ahmed Karaman
>>
>


[REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts

2020-06-29 Thread Ahmed Karaman
Hi,

The second report of the TCG Continuous Benchmarking series builds
upon the QEMU performance metrics calculated in the previous report.
This report presents a method to dissect the number of instructions
executed by a QEMU invocation into three main phases:
- Code Generation
- JIT Execution
- Helpers Execution
It devises a Python script that automates this process.

After that, the report presents an experiment for comparing the
output of running the script on 17 different targets. Many conclusions
can be drawn from the results and two of them are discussed in the
analysis section.

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html

Best regards,
Ahmed Karaman


Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#1] Measuring Basic Performance Metrics of QEMU

2020-06-28 Thread Ahmed Karaman
On Sun, Jun 28, 2020 at 7:20 PM Aleksandar Markovic
 wrote:
>
> Now, thinking longer about topN scripts, I think one really missing
> thing is number of invocations (or calls, whatever term you prefer)
> for any function in the list. This data must be possible to obtain
> using callgrind_annotate (most likely by using --tree option). With
> perf, i don't think this is possible, given that perf works based on
> sampling.
>
> You don't need to start working on it right now, or work on it at all
> - this is more like a brainstorming suggestion from me. You can make
> improvements and correction all the way towards the end of the
> project, on Aug 31st.
>
> At the end of the project, perhaps you can publish a "Master Project
> Report" - a pdf that is basically a sum of all your reports produced
> during the project. That would be a nice reading!
>
> Regards,
> Aleksandar
>

Thanks Mr. Aleksandar for always sharing your thoughts and suggestions.
I will consider this for an updated version of the report.

Regards,
Ahmed Karaman



[PATCH v4 3/3] MAINTAINERS: Add 'Performance Tools and Tests' subsection

2020-06-26 Thread Ahmed Karaman
This commit creates a new 'Miscellaneous' section which hosts a new
'Performance Tools and Tests' subsection.
The subsection will contain the performance scripts and benchmarks
written as a part of the 'TCG Continuous Benchmarking' project.

Signed-off-by: Ahmed Karaman 
Reviewed-by: Alex Bennée 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1b40446c73..c510c942ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3019,3 +3019,10 @@ M: Peter Maydell 
 S: Maintained
 F: docs/conf.py
 F: docs/*/conf.py
+
+Miscellaneous
+-------------
+Performance Tools and Tests
+M: Ahmed Karaman 
+S: Maintained
+F: scripts/performance/
-- 
2.17.1




[PATCH v4 1/3] scripts/performance: Add topN_perf.py script

2020-06-26 Thread Ahmed Karaman
Syntax:
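topN_perf.py [-h] [-n] <number of displayed top functions>  -- \
             <qemu executable> [<qemu executable options>] \
             <target executable> [<target executable options>]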

[-h] - Print the script arguments help message.
[-n] - Specify the number of top functions to print.
 - If this flag is not specified, the tool defaults to 25.

Example of usage:
topN_perf.py -n 20 -- qemu-arm coulomb_double-arm

Example Output:
 No.  Percentage  Name                    Invoked by
----  ----------  ----------------------  -------------
   1      16.25%  float64_mul             qemu-x86_64
   2      12.01%  float64_sub             qemu-x86_64
   3      11.99%  float64_add             qemu-x86_64
   4       5.69%  helper_mulsd            qemu-x86_64
   5       4.68%  helper_addsd            qemu-x86_64
   6       4.43%  helper_lookup_tb_ptr    qemu-x86_64
   7       4.28%  helper_subsd            qemu-x86_64
   8       2.71%  f64_compare             qemu-x86_64
   9       2.71%  helper_ucomisd          qemu-x86_64
  10       1.04%  helper_pand_xmm         qemu-x86_64
  11       0.71%  float64_div             qemu-x86_64
  12       0.63%  helper_pxor_xmm         qemu-x86_64
  13       0.50%  0x7f7b7004ef95          [JIT] tid 491
  14       0.50%  0x7f7b70044e83          [JIT] tid 491
  15       0.36%  helper_por_xmm          qemu-x86_64
  16       0.32%  helper_cc_compute_all   qemu-x86_64
  17       0.30%  0x7f7b700433f0          [JIT] tid 491
  18       0.30%  float64_compare_quiet   qemu-x86_64
  19       0.27%  soft_f64_addsub         qemu-x86_64
  20       0.26%  round_to_int            qemu-x86_64

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/topN_perf.py | 149 +++
 1 file changed, 149 insertions(+)
 create mode 100755 scripts/performance/topN_perf.py

diff --git a/scripts/performance/topN_perf.py b/scripts/performance/topN_perf.py
new file mode 100755
index 00..07be195fc8
--- /dev/null
+++ b/scripts/performance/topN_perf.py
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+
+#  Print the top N most executed functions in QEMU using perf.
+#  Syntax:
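+#  topN_perf.py [-h] [-n] <number of displayed top functions>  -- \
+#           <qemu executable> [<qemu executable options>] \
+#           <target executable> [<target executable options>]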
+#
+#  [-h] - Print the script arguments help message.
+#  [-n] - Specify the number of top functions to print.
+#   - If this flag is not specified, the tool defaults to 25.
+#
+#  Example of usage:
+#  topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+
+
+# Parse the command line arguments
+parser = argparse.ArgumentParser(
+    usage='topN_perf.py [-h] [-n]   -- '
+          ' [] '
+          ' []')
+
+parser.add_argument('-n', dest='top', type=int, default=25,
+                    help='Specify the number of top functions to print.')
+
+parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+
+args = parser.parse_args()
+
+# Extract the needed variables from the args
+command = args.command
+top = args.top
+
+# Ensure that perf is installed
+check_perf_presence = subprocess.run(["which", "perf"],
+                                     stdout=subprocess.DEVNULL)
+if check_perf_presence.returncode:
+    sys.exit("Please install perf before running the script!")
+
+# Ensure the user has the privilege to run perf
+check_perf_executability = subprocess.run(["perf", "stat", "ls", "/"],
+                                          stdout=subprocess.DEVNULL,
+                                          stderr=subprocess.DEVNULL)
+if check_perf_executability.returncode:
+    sys.exit(
+"""
+Error:
+You may not have permission to collect stats.
+
+Consider tweaking /proc/sys/kernel/perf_event_paranoid,
+which controls use of the performance events system by
+unprivileged users (without CAP_SYS_ADMIN).
+
+  -1: Allow use of (almost) all events by all users
+  Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
+   0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
+  Disallow raw tracepoint access by users without CAP_SYS_ADMIN
+   1: Disallow CPU event 

[PATCH v4 2/3] scripts/performance: Add topN_callgrind.py script

2020-06-26 Thread Ahmed Karaman
Python script that prints the top N most executed functions in QEMU
using callgrind.

Syntax:
topN_callgrind.py [-h] [-n]   -- \
   [] \
   []

[-h] - Print the script arguments help message.
[-n] - Specify the number of top functions to print.
 - If this flag is not specified, the tool defaults to 25.

Example of usage:
topN_callgrind.py -n 20 -- qemu-arm coulomb_double-arm
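
For illustration, the way the script separates its own -n option from the
QEMU command given after "--" can be sketched as below. The argparse calls
mirror the ones in the patch; the parse() wrapper is a hypothetical helper
added only so that the snippet can be run on its own.

```python
import argparse


def parse(argv):
    """Hypothetical wrapper around the parser set up in the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-n', dest='top', type=int, default=25,
                        help='Specify the number of top functions to print.')
    # Everything after "--" is collected as the command to profile.
    parser.add_argument('command', type=str, nargs='+',
                        help=argparse.SUPPRESS)
    args = parser.parse_args(argv)
    return args.top, args.command


if __name__ == "__main__":
    # Same arguments as in the usage example above.
    print(parse(['-n', '20', '--', 'qemu-arm', 'coulomb_double-arm']))
```

When -n is omitted, the first element of the returned pair falls back to
the default of 25.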

Example Output:
 No.  Percentage  Function Name          Source File
----  ----------  ---------------------  -------------------------
   1     24.577%  0x082db000             ???
   2     20.467%  float64_mul            /fpu/softfloat.c
   3     14.720%  float64_sub            /fpu/softfloat.c
   4     13.864%  float64_add            /fpu/softfloat.c
   5      4.876%  helper_mulsd           /target/i386/ops_sse.h
   6      3.767%  helper_subsd           /target/i386/ops_sse.h
   7      3.549%  helper_addsd           /target/i386/ops_sse.h
   8      2.185%  helper_ucomisd         /target/i386/ops_sse.h
   9      1.667%  helper_lookup_tb_ptr   /include/exec/tb-lookup.h
  10      1.662%  f64_compare            /fpu/softfloat.c
  11      1.509%  helper_lookup_tb_ptr   /accel/tcg/tcg-runtime.c
  12      0.635%  helper_lookup_tb_ptr   /include/exec/exec-all.h
  13      0.616%  float64_div            /fpu/softfloat.c
  14      0.502%  helper_pand_xmm        /target/i386/ops_sse.h
  15      0.502%  float64_mul            /include/fpu/softfloat.h
  16      0.476%  helper_lookup_tb_ptr   /target/i386/cpu.h
  17      0.437%  float64_compare_quiet  /fpu/softfloat.c
  18      0.414%  helper_pxor_xmm        /target/i386/ops_sse.h
  19      0.353%  round_to_int           /fpu/softfloat.c
  20      0.347%  helper_cc_compute_all  /target/i386/cc_helper.c

Signed-off-by: Ahmed Karaman 
---
 scripts/performance/topN_callgrind.py | 140 ++
 1 file changed, 140 insertions(+)
 create mode 100755 scripts/performance/topN_callgrind.py

diff --git a/scripts/performance/topN_callgrind.py b/scripts/performance/topN_callgrind.py
new file mode 100755
index 00..67c59197af
--- /dev/null
+++ b/scripts/performance/topN_callgrind.py
@@ -0,0 +1,140 @@
+#!/usr/bin/env python3
+
+#  Print the top N most executed functions in QEMU using callgrind.
+#  Syntax:
+#  topN_callgrind.py [-h] [-n]   -- \
+#[] \
+#[]
+#
+#  [-h] - Print the script arguments help message.
+#  [-n] - Specify the number of top functions to print.
+#   - If this flag is not specified, the tool defaults to 25.
+#
+#  Example of usage:
+#  topN_callgrind.py -n 20 -- qemu-arm coulomb_double-arm
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman 
+#  Copyright (C) 2020  Aleksandar Markovic 
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+
+
+# Parse the command line arguments
+parser = argparse.ArgumentParser(
+    usage='topN_callgrind.py [-h] [-n]   -- '
+          ' [] '
+          ' []')
+
+parser.add_argument('-n', dest='top', type=int, default=25,
+                    help='Specify the number of top functions to print.')
+
+parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+
+args = parser.parse_args()
+
+# Extract the needed variables from the args
+command = args.command
+top = args.top
+
+# Ensure that valgrind is installed
+check_valgrind_presence = subprocess.run(["which", "valgrind"],
+                                         stdout=subprocess.DEVNULL)
+if check_valgrind_presence.returncode:
+    sys.exit("Please install valgrind before running the script!")
+
+# Run callgrind
+callgrind = subprocess.run((
+    ["valgrind", "--tool=callgrind", "--callgrind-out-file=/tmp/callgrind.data"]
+    + command),
+    stdout=subprocess.DEVNULL,
+    stderr=subprocess.PIPE)
+if callgrind.returncode:
+    sys.exit(callgrind.stderr.decode("utf-8"))
+
+# Save callgrind_annotate output to /tmp/callgrind_annotate.out
+with open("/tmp/callgrind_annotate.out", "w") as output:
+    callgrind_annotate = subprocess.run(["callgrind_annotate",
+                                         "/tmp/callgrind.data"],
+                                        stdout=output,
+

[PATCH v4 0/3] Add Scripts for Finding Top 25 Executed Functions

2020-06-26 Thread Ahmed Karaman
Greetings,

As a part of the TCG Continuous Benchmarking project for GSoC this
year, detailed reports discussing different performance measurement
methodologies and analysis results will be sent here on the mailing
list.

The project's first report was published on the mailing list on the
22nd of June:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html

A section in this report deals with measuring the top 25 executed
functions when running QEMU. It includes two Python scripts that
automatically perform this task.

This series adds these two scripts to a new performance directory
created under the scripts directory. It also adds a new
"Miscellaneous" section to the end of the MAINTAINERS file with a
"Performance Tools and Tests" subsection.

Previous versions of the series:
v3:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg07856.html
v2:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg06147.html
v1:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg04868.html

Best regards,
Ahmed Karaman

v3->v4:
- Save all intermediate files generated by the scripts in the '/tmp'
  directory instead of the current working directory of the user.
- Use more descriptive variable names and table headers.
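
A minimal sketch of keeping intermediate files out of the user's working
directory, as the first item above describes. The patches themselves use
fixed paths under '/tmp'; this sketch swaps in Python's tempfile module
instead, and write_intermediate() is a hypothetical helper, not part of
the series.

```python
import os
import tempfile


def write_intermediate(text):
    """Write text to a uniquely named file in the system temp directory."""
    fd, path = tempfile.mkstemp(prefix="topN_", suffix=".out")
    with os.fdopen(fd, "w") as output:
        output.write(text)
    return path


if __name__ == "__main__":
    path = write_intermediate("example data\n")
    print("intermediate file:", path)
    os.unlink(path)  # delete the intermediate file when done
```

A unique name per run avoids two concurrent invocations overwriting each
other's data, which fixed file names cannot guarantee.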

v2->v3:
- Use a clearer "Syntax" and "Example of usage" in the script comment
  and commit message.
- Manually specify the instructions required to run Perf instead of
  relying on the stderr produced by Perf.
- Use more descriptive variable names.
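
The permission instructions that the perf script prints refer to
/proc/sys/kernel/perf_event_paranoid. As a rough sketch, that setting
could also be read directly. read_paranoid_level() is a hypothetical
helper, not part of the patch, and its path parameter exists only so the
function can be exercised on systems without that file.

```python
def read_paranoid_level(path="/proc/sys/kernel/perf_event_paranoid"):
    """Return the integer paranoid level, or None if it cannot be read."""
    try:
        with open(path) as setting:
            return int(setting.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print("perf_event_paranoid is not available on this system")
    else:
        print("perf_event_paranoid =", level)
```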

v1->v2:
- Add an empty line at the end of the MAINTAINERS file.
- Move MAINTAINERS patch to be the last in the series.
- Allow custom number of top functions to be specified.
- Check for valgrind and perf before executing the scripts.
- Ensure sufficient permissions when running the topN_perf script.
- Use subprocess instead of os.system.
- Use os.unlink() for deleting intermediate files.
- Spread out the data extraction steps.
- Enable execution permission for the scripts.
- Add script example output in the commit message.
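
Several of the v1->v2 items (checking for the tools, using subprocess
rather than os.system) follow one pattern, sketched below. Note that the
scripts call "which" through subprocess; this sketch uses shutil.which
from the standard library instead, and require_tool() and run_quietly()
are hypothetical names chosen for the example.

```python
import shutil
import subprocess
import sys


def require_tool(name):
    """Exit with a message if the named executable is not on PATH."""
    if shutil.which(name) is None:
        sys.exit("Please install {} before running the script!".format(name))


def run_quietly(command):
    """Run a command list, discarding stdout and returning (code, stderr)."""
    result = subprocess.run(command,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE)
    return result.returncode, result.stderr.decode("utf-8")


if __name__ == "__main__":
    # Use the running Python interpreter as a stand-in for the real tool.
    code, err = run_quietly([sys.executable, "-c", "print('hello')"])
    if code:
        sys.exit(err)
```

Passing the command as a list avoids shell quoting issues, and the return
code and stderr stay available for error reporting.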


Ahmed Karaman (3):
  scripts/performance: Add topN_perf.py script
  scripts/performance: Add topN_callgrind.py script
  MAINTAINERS: Add 'Performance Tools and Tests' subsection

 MAINTAINERS                           |   7 ++
 scripts/performance/topN_callgrind.py | 140
 scripts/performance/topN_perf.py      | 149 ++
 3 files changed, 296 insertions(+)
 create mode 100755 scripts/performance/topN_callgrind.py
 create mode 100755 scripts/performance/topN_perf.py

-- 
2.17.1



