Re: [PATCH] rs6000: Add execution tests for mma builtins [v4]

2020-07-10 Thread Segher Boessenkool
Hi!

On Fri, Jul 10, 2020 at 03:49:02PM -0500, Aaron Sawdey via Gcc-patches wrote:
> This patch adds execution tests that use the MMA builtins and
> check for the right answer, and new tests that check whether
> __builtin_cpu_supports and __builtin_cpu_is return sane
> answers for power10.

> 2020-06-30  Rajalakshmi Srinivasaraghavan  
>   Aaron Sawdey  
> 
> gcc/testsuite/
>   * gcc.target/powerpc/p10-identify.c: New file.
>   * gcc.target/powerpc/p10-arch31.c: New file.
>   * gcc.target/powerpc/mma-single-test.c: New file.
>   * gcc.target/powerpc/mma-double-test.c: New file.

Okay for trunk, and for GCC 10 backport as well.  Thanks!


Segher


[PATCH] rs6000: Add execution tests for mma builtins [v4]

2020-07-10 Thread Aaron Sawdey via Gcc-patches
This patch adds execution tests that use the MMA builtins and
check for the right answer, and new tests that check whether
__builtin_cpu_supports and __builtin_cpu_is return sane
answers for power10.

I've now cleaned up and separated things out so there are 4 test cases:
* MMA single precision execution test
* MMA double precision execution test
* test that if effective-target is power10_hw, __builtin_cpu_is("power10")
  is also true.
* test that if effective-target is power10_hw,
  __builtin_cpu_supports("arch_3_1") is also true.
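
The two identification checks reduce to one rule: a machine that the harness classifies as power10_hw must also identify itself through the CPU builtins. A minimal sketch of that rule, not the actual contents of p10-identify.c or p10-arch31.c: the function name p10_identity_consistent and its power10_hw parameter are hypothetical (in the real tests power10_hw is a DejaGnu effective-target, not a C value), and the __powerpc__ guard is an assumption added so the sketch still compiles on other hosts, where the PowerPC CPU names are not accepted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: power10_hw stands in for the DejaGnu effective-target
   result.  In the real tests it is not a variable; the test is simply
   not run when the effective-target check fails.  */
bool
p10_identity_consistent (bool power10_hw)
{
  if (!power10_hw)
    return true;  /* Nothing to check: the test would be skipped.  */
#if defined (__powerpc__)
  /* On power10 hardware both builtins must agree with the harness.  */
  return __builtin_cpu_is ("power10")
         && __builtin_cpu_supports ("arch_3_1");
#else
  /* Assumption for this sketch: a non-PowerPC host is never power10_hw.  */
  return false;
#endif
}
```

Keeping the two builtins in separate test files, as the patch does, presumably makes a failure point directly at whichever one disagrees with the harness.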

This establishes that the test environment correctly identifies itself,
and that it can execute MMA code and get the right answer.
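
Each __builtin_mma_xvf64gerpp call in the double-precision test performs a rank-1 update of a 4x2 block of doubles held in a __vector_quad: the __vector_pair operand supplies four doubles and the vector operand supplies two, so k such updates sum to a 4x2 block of a matrix product. The sketch below is a portable scalar model of that arithmetic only, useful for reasoning about what the "right answer" is. The names acc4x2, model_xvf64gerpp and model_matches_naive are hypothetical, and the model ignores the element and register ordering that __builtin_mma_disassemble_acc exposes, so it is not a drop-in replacement for the check the test files perform.

```c
#include <assert.h>

/* Hypothetical scalar model of an accumulator viewed as a 4x2 block
   of doubles.  */
typedef struct
{
  double e[4][2];
} acc4x2;

/* One rank-1 update, as modelled here for xvf64gerpp:
   acc[r][c] += x[r] * y[c], with x the 4 doubles of the vector pair
   and y the 2 doubles of the 16-byte vector.  */
void
model_xvf64gerpp (acc4x2 *acc, const double x[4], const double y[2])
{
  for (int r = 0; r < 4; r++)
    for (int c = 0; c < 2; c++)
      acc->e[r][c] += x[r] * y[c];
}

/* Zero the accumulator (the role xxsetaccz plays), apply k rank-1
   updates, and compare with a naive dot-product loop.
   Returns 1 when every element agrees.  */
int
model_matches_naive (int k, double X[][4], double Y[][2])
{
  acc4x2 acc = { { { 0.0 } } };

  for (int i = 0; i < k; i++)
    model_xvf64gerpp (&acc, X[i], Y[i]);

  for (int r = 0; r < 4; r++)
    for (int c = 0; c < 2; c++)
      {
        double ref = 0.0;
        for (int i = 0; i < k; i++)
          ref += X[i][r] * Y[i][c];
        double diff = acc.e[r][c] - ref;
        if (diff < 0)
          diff = -diff;
        /* Small tolerance in case the compiler contracts one loop into
           fused multiply-adds and not the other.  */
        if (diff > 1e-12 * (1.0 + (ref < 0 ? -ref : ref)))
          return 0;
      }
  return 1;
}
```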

A future patch will add an effective-target test for powerpc_mma_hw,
which these mma tests will also need to check for.

OK for trunk and backport to 10?

2020-06-30  Rajalakshmi Srinivasaraghavan  
  Aaron Sawdey  

gcc/testsuite/
  * gcc.target/powerpc/p10-identify.c: New file.
  * gcc.target/powerpc/p10-arch31.c: New file.
  * gcc.target/powerpc/mma-single-test.c: New file.
  * gcc.target/powerpc/mma-double-test.c: New file.
---
 .../gcc.target/powerpc/mma-double-test.c  | 185 +
 .../gcc.target/powerpc/mma-single-test.c  | 193 ++
 gcc/testsuite/gcc.target/powerpc/p10-arch31.c |  25 +++
 .../gcc.target/powerpc/p10-identify.c |  26 +++
 4 files changed, 429 insertions(+)
 create mode 100755 gcc/testsuite/gcc.target/powerpc/mma-double-test.c
 create mode 100755 gcc/testsuite/gcc.target/powerpc/mma-single-test.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-arch31.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-identify.c

diff --git a/gcc/testsuite/gcc.target/powerpc/mma-double-test.c b/gcc/testsuite/gcc.target/powerpc/mma-double-test.c
new file mode 100755
index 000..9ba0010978f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-double-test.c
@@ -0,0 +1,185 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <altivec.h>
+
+typedef unsigned char vec_t __attribute__ ((vector_size (16)));
+typedef double v4sf_t __attribute__ ((vector_size (16)));
+#define SAVE_ACC(ACC, ldc, J)  \
+  __builtin_mma_disassemble_acc (result, ACC); \
+  rowC = (v4sf_t *) &CO[0*ldc+J]; \
+  rowC[0] += result[3] ; \
+  rowC = (v4sf_t *) &CO[1*ldc+J]; \
+  rowC[0] += result[2] ; \
+  rowC = (v4sf_t *) &CO[2*ldc+J]; \
+  rowC[0] += result[1] ; \
+  rowC = (v4sf_t *) &CO[3*ldc+J]; \
+  rowC[0] += result[0] ;
+
+void
+MMA (int m, int n, int k, double *A, double *B, double *C)
+{
+  __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7;
+  v4sf_t result[4];
+  v4sf_t *rowC;
+  for (int l = 0; l < n; l += 4)
+    {
+      double *CO;
+      double *AO;
+      AO = A;
+      CO = C;
+      C += m * 4;
+      for (int j = 0; j < m; j += 16)
+        {
+          double *BO = B;
+          __builtin_mma_xxsetaccz (&acc0);
+          __builtin_mma_xxsetaccz (&acc1);
+          __builtin_mma_xxsetaccz (&acc2);
+          __builtin_mma_xxsetaccz (&acc3);
+          __builtin_mma_xxsetaccz (&acc4);
+          __builtin_mma_xxsetaccz (&acc5);
+          __builtin_mma_xxsetaccz (&acc6);
+          __builtin_mma_xxsetaccz (&acc7);
+          unsigned long i;
+
+          for (i = 0; i < k; i++)
+            {
+              vec_t *rowA = (vec_t *) & AO[i * 16];
+              __vector_pair rowB;
+              vec_t *rb = (vec_t *) & BO[i * 4];
+              __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]);
+              __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]);
+              __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]);
+              __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]);
+              __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]);
+              __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]);
+              __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]);
+              __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]);
+              __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]);
+            }
+          SAVE_ACC (&acc0, m, 0);
+          SAVE_ACC (&acc2, m, 4);
+          SAVE_ACC (&acc1, m, 2);
+          SAVE_ACC (&acc3, m, 6);
+          SAVE_ACC (&acc4, m, 8);
+          SAVE_ACC (&acc6, m, 12);
+          SAVE_ACC (&acc5, m, 10);
+          SAVE_ACC (&acc7, m, 14);
+          AO += k * 16;
+          BO += k * 4;
+          CO += 16;
+        }
+      B += k * 4;
+    }
+}
+
+void
+init (double *matrix, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+    {
+      for (int i = 0; i < row; i++)
+        {
+          matrix[j * row + i] = (i * 16 + 2 + j) / 0.123;
+        }
+    }
+}
+
+void
+init0 (double *matrix, double *matrix1, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+    for (int i = 0; i < row; i++)
+      matrix[j * row + i] = matrix1[j * row + i] = 0;
+}
+
+
+void
+print (const char *name, const double *matrix, int row, int column)
+{
+  printf ("Matrix %s has %d rows and %d columns:\n", name, row,