Peder,

I have slightly modified your code and can confirm the bug. The bug is not in the MatMatTransposeMult() operation; it is in the HDF5 reader. I will soon open an MR with the code and a discussion of the issues.
Thanks for reporting the issue.
Stefano

On Wed, 21 Apr 2021 at 12:22, Peder Jørgensgaard Olesen via petsc-users <[email protected]> wrote:

> Dear Hong
>
> Thank you for your reply.
>
> I have a hunch that the issue goes beyond the minor differences that
> might arise from floating-point computation order, however.
>
> Writing the product matrix to a binary file using MatView() and inspecting
> the output shows very different entries depending on the number of
> processes. Here are the first three rows and columns of the product matrix
> obtained in a sequential run:
>
> 2.58348 1.68202 1.66302
> 1.68202 4.27506 1.91897
> 1.66302 1.91897 2.70028
>
> - and the corresponding part of the product matrix obtained on one node
> (40 processes):
>
> 4.43536 2.17261 0.16430
> 2.17261 4.53224 2.53210
> 0.16430 2.53210 4.73234
>
> The parallel result is not even close to the sequential one. Trying
> different numbers of processes produces yet different results.
>
> Also, the eigenvectors that I subsequently determine using a SLEPc solver
> do not form a proper basis for the column space of the data matrix as
> they must, which is hardly a surprise given the variability of
> results indicated above - except when the code is run on just a single
> process. Forming such a basis is central to the intended application, and
> given that it would need to work on rather large data sets, running on a
> single process is hardly a viable solution.
>
> Best regards
>
> Peder
> ------------------------------
> *From:* Zhang, Hong <[email protected]>
> *Sent:* 19 April 2021 18:34:31
> *To:* [email protected]; Peder Jørgensgaard Olesen
> *Subject:* Re: Rather different matrix product results on multiple processes
>
> Peder,
> I tested your code on a linux machine.
> I got
> $ ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0473e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv (default)
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> mpiexec -n 20 ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0897e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> I use the petsc 'main' branch (same as the latest release). You can remove
> the MatAssemblyBegin/End calls after MatMatTransposeMult():
> MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &corr_mat);
> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>
> The communication patterns of the parallel implementation lead to a
> different order of floating-point computation, and thus a slightly
> different matrix norm of R.
> Hong
>
> ------------------------------
> *From:* petsc-users <[email protected]> on behalf of Peder
> Jørgensgaard Olesen via petsc-users <[email protected]>
> *Sent:* Monday, April 19, 2021 7:57 AM
> *To:* [email protected] <[email protected]>
> *Subject:* [petsc-users] Rather different matrix product results on
> multiple processes
>
> Hello,
>
> When computing a matrix product of the type R = D.D^T using
> MatMatTransposeMult() I find I get rather different results depending on
> the number of processes. In one example using a data set that is
> small compared to the application I get Frobenius norms |R| = 1.047e3 on a
> single process, 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on
> two nodes.
>
> I have ascertained that the single process result is indeed the correct
> one (i.e., eigenvectors of R form a proper basis for the columns of D), so
> naturally I'd love to be able to reproduce this result across different
> parallel setups. How might I achieve this?
>
> I'm attaching MWE code and the data set used for the example.
>
> Thanks in advance!
>
> Best Regards
>
> Peder Jørgensgaard Olesen
> PhD Student, Turbulence Research Lab
> Dept. of Mechanical Engineering
> Technical University of Denmark
> Niels Koppels Allé
> Bygning 403, Rum 105
> DK-2800 Kgs. Lyngby

--
Stefano
