Dear Stefano and Jose
Thank you for your replies. Using SVD works like a charm. I'll try to do some trickery to work around the HDF5 reader bug.

Best regards
Peder

________________________________
From: Jose E. Roman <[email protected]>
Sent: 21 April 2021 14:24:38
To: Peder Jørgensgaard Olesen
Cc: [email protected]; Stefano Zampini
Subject: Re: [petsc-users] Rather different matrix product results on multiple processes

Independently of the bug mentioned by Stefano, you may want to consider using SLEPc's SVD instead of EPS. Left singular vectors of D are equal to eigenvectors of D*D', see chapter 4 of SLEPc's users manual. The default solver 'cross' gives you flexibility to compute the product D*D' explicitly or not, and build the transpose explicitly or not.

Jose

> On 21 Apr 2021, at 12:54, Stefano Zampini <[email protected]> wrote:
>
> Here you go: https://gitlab.com/petsc/petsc/-/merge_requests/3903. We can
> discuss the issue on gitlab.
>
> Thanks
> Stefano
>
> On Wed, 21 Apr 2021 at 13:39, Stefano Zampini <[email protected]> wrote:
> Peder
>
> I have slightly modified your code and I confirm the bug.
> The bug is not with the MatMatTransposeMult operation; it is within the HDF5
> reader. I will soon open an MR with the code and a discussion of the issues.
>
> Thanks for reporting the issue
> Stefano
>
> On Wed, 21 Apr 2021 at 12:22, Peder Jørgensgaard Olesen via petsc-users
> <[email protected]> wrote:
> Dear Hong
>
> Thank you for your reply.
>
> I have a hunch that the issue goes beyond the minor differences that might
> arise from floating-point computation order, however.
>
> Writing the product matrix to a binary file using MatView() and inspecting
> the output shows very different entries depending on the number of processes.
> Here are the first three rows and columns of the product matrix obtained in a
> sequential run:
>
> 2.58348 1.68202 1.66302
> 1.68202 4.27506 1.91897
> 1.66302 1.91897 2.70028
>
> - and the corresponding part of the product matrix obtained on one node (40
> processes):
>
> 4.43536 2.17261 0.16430
> 2.17261 4.53224 2.53210
> 0.16430 2.53210 4.73234
>
> The parallel result is not even close to the sequential one. Trying different
> numbers of processes produces yet different results.
>
> Also, the eigenvectors that I subsequently determine using a SLEPc solver do
> not form a proper basis for the column space of the data matrix as they must,
> which is hardly a surprise given the variability of results indicated above -
> except when the code is run on just a single process. Forming such a basis is
> central to the intended application, and given that it would need to work on
> rather large data sets, running on a single process is hardly a viable
> solution.
>
> Best regards
> Peder
>
> From: Zhang, Hong <[email protected]>
> Sent: 19 April 2021 18:34:31
> To: [email protected]; Peder Jørgensgaard Olesen
> Subject: Re: Rather different matrix product results on multiple processes
>
> Peder,
> I tested your code on a linux machine. I got
> $ ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0473e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv
> (default)
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> mpiexec -n 20 ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0897e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> I use petsc 'main' branch (same as the latest release).
> You can remove the MatAssemblyBegin/End calls after MatMatTransposeMult():
> MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX, PETSC_DEFAULT,
> &corr_mat);
> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>
> The communication patterns of the parallel implementation lead to a different
> order of floating-point computation, and thus a slightly different matrix
> norm of R.
> Hong
>
> From: petsc-users <[email protected]> on behalf of Peder
> Jørgensgaard Olesen via petsc-users <[email protected]>
> Sent: Monday, April 19, 2021 7:57 AM
> To: [email protected] <[email protected]>
> Subject: [petsc-users] Rather different matrix product results on multiple
> processes
>
> Hello,
>
> When computing a matrix product of the type R = D.D^T using
> MatMatTransposeMult() I find I get rather different results depending on the
> number of processes. In one example using a data set that is small compared
> to the application I get Frobenius norms |R| = 1.047e3 on a single process,
> 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on two nodes.
>
> I have ascertained that the single process result is indeed the correct one
> (i.e., eigenvectors of R form a proper basis for the columns of D), so
> naturally I'd love to be able to reproduce this result across different
> parallel setups. How might I achieve this?
>
> I'm attaching MWE code and the data set used for the example.
>
> Thanks in advance!
>
> Best Regards
>
> Peder Jørgensgaard Olesen
> PhD Student, Turbulence Research Lab
> Dept. of Mechanical Engineering
> Technical University of Denmark
> Niels Koppels Allé
> Bygning 403, Rum 105
> DK-2800 Kgs. Lyngby
>
>
> --
> Stefano
>
>
> --
> Stefano

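Jose's remark that the left singular vectors of D are eigenvectors of D*D' can be written out in a few lines. This is only a sketch, assuming D is real and has the thin SVD $D = U \Sigma V^T$, where the symbols $U$, $\Sigma$, $V$, $u_i$ and $\sigma_i$ are notation introduced here rather than taken from the thread:

\[
D D^T = (U \Sigma V^T)(U \Sigma V^T)^T = U \Sigma V^T V \Sigma^T U^T = U \Sigma^2 U^T,
\qquad\text{so}\qquad
D D^T u_i = \sigma_i^2 \, u_i .
\]

Each column $u_i$ of $U$ is thus an eigenvector of $D D^T$ with eigenvalue $\sigma_i^2$, which is why an SVD of D can stand in for an eigensolve on the explicitly formed product.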