Hi I have some code that computes a MatMatMult, then modifies one of the matrices and calls the MatMatMult again with MAT_REUSE_MATRIX. This gives the correct results with mat_types aij, aijkokkos and either run on a CPU or a GPU.
In some cases one of the matrices is diagonal so I have been modifying my code to call MatDiagonalScale instead and have seen some peculiar behaviour. If the mat type is aij, everything works correctly on the CPU and GPU. If the mat type is aijkokkos, everything works correctly on the CPU but when run on a machine with a GPU the results differ. I have some example code below that shows this, it prints out four matrices; the first two should match and the last two should match. To see the failure run with "-mat_type aijkokkos" on a machine with a GPU (again this gives the correct results with aijkokkos run on a CPU). I get the output: Mat Object: 1 MPI process type: seqaijkokkos row 0: (0, 4.) (1, 6.) row 1: (0, 10.) (1, 14.) Mat Object: 1 MPI process type: seqaijkokkos row 0: (0, 4.) (1, 6.) row 1: (0, 10.) (1, 14.) Mat Object: 1 MPI process type: seqaijkokkos row 0: (0, 6.) (1, 9.) row 1: (0, 15.) (1, 21.) Mat Object: 1 MPI process type: seqaijkokkos row 0: (0, 12.) (1, 18.) row 1: (0, 30.) (1, 42.) I have narrowed down this failure to a MatCopy call, where the values of result_diag should be overwritten with A before calling MatDiagonalScale. The results with aijkokos on a GPU suggest the values of result_diag are not being changed. If instead of using the MatCopy I destroy result_diag and call MatDuplicate, the results are correct. You can trigger the correct behaviour with "-mat_type aijkokkos -correct". To me this looks like the values are out of sync between the device/host, but I would have thought a call to MatCopy would be safe. Any thoughts on what I might be doing wrong? Thanks for your help! Steven ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ static char help[] = "Test matmat products with matdiagonal on gpus \n\n"; #include <iostream> #include <petscmat.h> int main(int argc, char **args) { const PetscInt inds[] = {0, 1}; PetscScalar avals[] = {2, 3, 5, 7}; Mat A, B_diag, B_aij_diag, result, result_diag; Vec diag; PetscCall(PetscInitialize(&argc, &args, NULL, help)); // Create matrix to start PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &A)); PetscCall(MatSetUp(A)); PetscCall(MatSetValues(A, 2, inds, 2, inds, avals, INSERT_VALUES)); PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY)); PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY)); // Create a matdiagonal matrix // Will be the matching vec type as A PetscCall(MatCreateVecs(A, &diag, NULL)); PetscCall(VecSet(diag, 2.0)); PetscCall(MatCreateDiagonal(diag, &B_diag)); // Create the same matrix as the matdiagonal but in aij format PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &B_aij_diag)); PetscCall(MatSetUp(B_aij_diag)); PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES)); PetscCall(MatAssemblyBegin(B_aij_diag, MAT_FINAL_ASSEMBLY)); PetscCall(MatAssemblyEnd(B_aij_diag, MAT_FINAL_ASSEMBLY)); PetscCall(VecDestroy(&diag)); // ~~~~~~~~~~~~~ // Do an initial matmatmult // A * B_aij_diag // and then // A * B_diag but just using MatDiagonalScale // ~~~~~~~~~~~~~ // aij * aij PetscCall(MatMatMult(A, B_aij_diag, MAT_INITIAL_MATRIX, 1.5, &result)); PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD)); // aij * diagonal PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag)); PetscCall(MatDiagonalGetDiagonal(B_diag, &diag)); PetscCall(MatDiagonalScale(result_diag, NULL, diag)); PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag)); PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD)); // ~~~~~~~~~~~~~ // Now let's modify the diagonal and do it again with "reuse" // ~~~~~~~~~~~~~ PetscCall(MatDiagonalGetDiagonal(B_diag, &diag)); PetscCall(VecSet(diag, 3.0)); PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES)); PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag)); // aij * aij PetscCall(MatMatMult(A, B_aij_diag, MAT_REUSE_MATRIX, 1.5, &result)); PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD)); PetscBool correct = PETSC_FALSE; PetscCall(PetscOptionsGetBool(NULL, NULL, "-correct", &correct, NULL)); if (!correct) { // This gives the wrong results below when run on gpu // Results suggest this isn't copied PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN)); } else { // This gives the correct results below PetscCall(MatDestroy(&result_diag)); PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag)); } // aij * diagonal PetscCall(MatDiagonalGetDiagonal(B_diag, &diag)); PetscCall(MatDiagonalScale(result_diag, NULL, diag)); PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag)); PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD)); PetscCall(PetscFinalize()); return 0; }