Fantastic, that seems to have fixed it. Thanks for your help! On Tue, 8 Apr 2025 at 06:04, Junchao Zhang <junchao.zh...@gmail.com> wrote:
> Hi, Steven > Thank you for the bug report and test example. You were right. The > MatCopy(A, B,..) implementation was wrong when B was on the device. > I have a fix at > https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8288__;!!G_uCfscf7eWS!czsmOAhl7kNlur21sGgNTcYMe79EP113IoDsGXFSWeuR9b1LW7-vxQgIfp2YGSsKyiIR3s193a6KcdEqeQAv-XxqSVRCPUHa$ > , > and will add your test to the MR tomorrow. > > Thanks! > --Junchao Zhang > > > On Mon, Apr 7, 2025 at 4:57 PM Steven Dargaville < > dargaville.ste...@gmail.com> wrote: > >> Hi >> >> I have some code that computes a MatMatMult, then modifies one of the >> matrices and calls the MatMatMult again with MAT_REUSE_MATRIX. This gives >> the correct results with mat_types aij, aijkokkos and either run on a CPU >> or a GPU. >> >> In some cases one of the matrices is diagonal so I have been modifying my >> code to call MatDiagonalScale instead and have seen some peculiar >> behaviour. If the mat type is aij, everything works correctly on the CPU >> and GPU. If the mat type is aijkokkos, everything works correctly on the >> CPU but when run on a machine with a GPU the results differ. >> >> I have some example code below that shows this, it prints out four >> matrices; the first two should match and the last two should match. To see >> the failure run with "-mat_type aijkokkos" on a machine with a GPU (again >> this gives the correct results with aijkokkos run on a CPU). I get the >> output: >> >> Mat Object: 1 MPI process >> type: seqaijkokkos >> row 0: (0, 4.) (1, 6.) >> row 1: (0, 10.) (1, 14.) >> Mat Object: 1 MPI process >> type: seqaijkokkos >> row 0: (0, 4.) (1, 6.) >> row 1: (0, 10.) (1, 14.) >> Mat Object: 1 MPI process >> type: seqaijkokkos >> row 0: (0, 6.) (1, 9.) >> row 1: (0, 15.) (1, 21.) >> Mat Object: 1 MPI process >> type: seqaijkokkos >> row 0: (0, 12.) (1, 18.) >> row 1: (0, 30.) (1, 42.) >> >> I have narrowed down this failure to a MatCopy call, where the values of >> result_diag should be overwritten with A before calling MatDiagonalScale. >> The results with aijkokos on a GPU suggest the values of result_diag are >> not being changed. If instead of using the MatCopy I destroy result_diag >> and call MatDuplicate, the results are correct. You can trigger the correct >> behaviour with "-mat_type aijkokkos -correct". >> >> To me this looks like the values are out of sync between the device/host, >> but I would have thought a call to MatCopy would be safe. Any thoughts on >> what I might be doing wrong? >> >> Thanks for your help! >> Steven >> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> static char help[] = "Test matmat products with matdiagonal on gpus \n\n"; >> >> #include <iostream> >> #include <petscmat.h> >> >> >> int main(int argc, char **args) >> { >> const PetscInt inds[] = {0, 1}; >> PetscScalar avals[] = {2, 3, 5, 7}; >> Mat A, B_diag, B_aij_diag, result, result_diag; >> Vec diag; >> >> PetscCall(PetscInitialize(&argc, &args, NULL, help)); >> >> // Create matrix to start >> PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, >> &A)); >> PetscCall(MatSetUp(A)); >> PetscCall(MatSetValues(A, 2, inds, 2, inds, avals, INSERT_VALUES)); >> PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY)); >> PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY)); >> >> // Create a matdiagonal matrix >> // Will be the matching vec type as A >> PetscCall(MatCreateVecs(A, &diag, NULL)); >> PetscCall(VecSet(diag, 2.0)); >> PetscCall(MatCreateDiagonal(diag, &B_diag)); >> >> // Create the same matrix as the matdiagonal but in aij format >> PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, >> &B_aij_diag)); >> PetscCall(MatSetUp(B_aij_diag)); >> PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES)); >> PetscCall(MatAssemblyBegin(B_aij_diag, MAT_FINAL_ASSEMBLY)); >> PetscCall(MatAssemblyEnd(B_aij_diag, MAT_FINAL_ASSEMBLY)); >> PetscCall(VecDestroy(&diag)); >> >> // ~~~~~~~~~~~~~ >> // Do an initial matmatmult >> // A * B_aij_diag >> // and then >> // A * B_diag but just using MatDiagonalScale >> // ~~~~~~~~~~~~~ >> >> // aij * aij >> PetscCall(MatMatMult(A, B_aij_diag, MAT_INITIAL_MATRIX, 1.5, &result)); >> PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD)); >> >> // aij * diagonal >> PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag)); >> PetscCall(MatDiagonalGetDiagonal(B_diag, &diag)); >> PetscCall(MatDiagonalScale(result_diag, NULL, diag)); >> PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag)); >> PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD)); >> >> // ~~~~~~~~~~~~~ >> // Now let's modify the diagonal and do it again with "reuse" >> // ~~~~~~~~~~~~~ >> PetscCall(MatDiagonalGetDiagonal(B_diag, &diag)); >> PetscCall(VecSet(diag, 3.0)); >> PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES)); >> PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag)); >> >> // aij * aij >> PetscCall(MatMatMult(A, B_aij_diag, MAT_REUSE_MATRIX, 1.5, &result)); >> PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD)); >> >> PetscBool correct = PETSC_FALSE; >> PetscCall(PetscOptionsGetBool(NULL, NULL, "-correct", &correct, NULL)); >> >> if (!correct) >> { >> // This gives the wrong results below when run on gpu >> // Results suggest this isn't copied >> PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN)); >> } >> else >> { >> // This gives the correct results below >> PetscCall(MatDestroy(&result_diag)); >> PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag)); >> } >> >> // aij * diagonal >> PetscCall(MatDiagonalGetDiagonal(B_diag, &diag)); >> PetscCall(MatDiagonalScale(result_diag, NULL, diag)); >> PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag)); >> PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD)); >> >> PetscCall(PetscFinalize()); >> return 0; >> } >> >