Here you go: https://gitlab.com/petsc/petsc/-/merge_requests/3903. We can discuss the issue on GitLab.
Thanks
Stefano

On Wed, 21 Apr 2021 at 13:39, Stefano Zampini <[email protected]> wrote:

> Peder
>
> I have slightly modified your code and I confirm the bug.
> The bug is not in the MatMatTransposeMult() operation; it is in the HDF5
> reader. I will soon open an MR with the code and a discussion of the issues.
>
> Thanks for reporting the issue
> Stefano
>
> On Wed, 21 Apr 2021 at 12:22, Peder Jørgensgaard Olesen via
> petsc-users <[email protected]> wrote:
>
>> Dear Hong
>>
>> Thank you for your reply.
>>
>> I have a hunch, however, that the issue goes beyond the minor differences
>> that might arise from the order of floating-point operations.
>>
>> Writing the product matrix to a binary file using MatView() and
>> inspecting the output shows very different entries depending on the number
>> of processes. Here are the first three rows and columns of the product
>> matrix obtained in a sequential run:
>>
>> 2.58348 1.68202 1.66302
>> 1.68202 4.27506 1.91897
>> 1.66302 1.91897 2.70028
>>
>> and the corresponding part of the product matrix obtained on one node
>> (40 processes):
>>
>> 4.43536 2.17261 0.16430
>> 2.17261 4.53224 2.53210
>> 0.16430 2.53210 4.73234
>>
>> The parallel result is not even close to the sequential one. Trying
>> different numbers of processes produces yet other results.
>>
>> Also, except when the code is run on just a single process, the
>> eigenvectors that I subsequently determine using a SLEPc solver do not
>> form a proper basis for the column space of the data matrix, as they must;
>> this is hardly a surprise given the variability of the results shown above.
>> Forming such a basis is central to the intended application, and since it
>> will need to work on rather large data sets, running on a single process
>> is hardly a viable solution.
>>
>> Best regards
>>
>> Peder
>> ------------------------------
>> *From:* Zhang, Hong <[email protected]>
>> *Sent:* 19 April 2021 18:34:31
>> *To:* [email protected]; Peder Jørgensgaard Olesen
>> *Subject:* Re: Rather different matrix product results on multiple processes
>>
>> Peder,
>> I tested your code on a Linux machine. I got:
>>
>> $ ./acorr_mwe
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0473e+03
>>
>> $ mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv   (default)
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0363e+03
>>
>> $ mpiexec -n 20 ./acorr_mwe
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0897e+03
>>
>> $ mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0363e+03
>>
>> I use the petsc 'main' branch (same as the latest release). You can remove
>> the MatAssemblyBegin/End calls after MatMatTransposeMult():
>>
>> ierr = MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX,
>>                            PETSC_DEFAULT, &corr_mat); CHKERRQ(ierr);
>> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>>
>> The communication patterns of the parallel implementation lead to a
>> different order of floating-point operations, and thus to a slightly
>> different matrix norm of R.
>> Hong
>>
>> ------------------------------
>> *From:* petsc-users <[email protected]> on behalf of Peder
>> Jørgensgaard Olesen via petsc-users <[email protected]>
>> *Sent:* Monday, April 19, 2021 7:57 AM
>> *To:* [email protected] <[email protected]>
>> *Subject:* [petsc-users] Rather different matrix product results on
>> multiple processes
>>
>> Hello,
>>
>> When computing a matrix product of the type R = D D^T using
>> MatMatTransposeMult(), I get rather different results depending on
>> the number of processes.
>> In one example, using a data set that is
>> small compared to the application, I get Frobenius norms |R| = 1.047e3 on a
>> single process, 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on
>> two nodes.
>>
>> I have ascertained that the single-process result is indeed the correct
>> one (i.e., the eigenvectors of R form a proper basis for the columns of D),
>> so naturally I'd love to be able to reproduce this result across different
>> parallel setups. How might I achieve this?
>>
>> I'm attaching MWE code and the data set used for the example.
>>
>> Thanks in advance!
>>
>> Best Regards
>>
>> Peder Jørgensgaard Olesen
>> PhD Student, Turbulence Research Lab
>> Dept. of Mechanical Engineering
>> Technical University of Denmark
>> Niels Koppels Allé
>> Building 403, Room 105
>> DK-2800 Kgs. Lyngby
>
>
> --
> Stefano

--
Stefano
