Dear Stefano and Jose

Thank you for your replies. Using SVD works like a charm. I'll try to do some 
trickery to work around the HDF5 reader bug.


Best regards

Peder

________________________________
From: Jose E. Roman <[email protected]>
Sent: 21 April 2021 14:24:38
To: Peder Jørgensgaard Olesen
Cc: [email protected]; Stefano Zampini
Subject: Re: [petsc-users] Rather different matrix product results on multiple 
processes

Independently of the bug mentioned by Stefano, you may want to consider using 
SLEPc's SVD instead of EPS. Left singular vectors of D are equal to 
eigenvectors of D*D', see chapter 4 of SLEPc's users manual. The default solver 
'cross' gives you flexibility to compute the product D*D' explicitly or not, 
and build the transpose explicitly or not.

Jose
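
The relation Jose mentions follows in one line from the singular value decomposition. A short derivation, assuming D is a real matrix of rank r with thin SVD factors U, Sigma, V (these symbols are introduced here for illustration and do not appear in the thread):

```latex
D = U \Sigma V^{T}, \qquad U^{T} U = V^{T} V = I_r, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \; \sigma_i > 0
\\[4pt]
D D^{T} = U \Sigma V^{T} V \Sigma U^{T} = U \Sigma^{2} U^{T}
\quad\Longrightarrow\quad
(D D^{T})\, u_i = \sigma_i^{2}\, u_i , \qquad i = 1, \dots, r
```

The columns u_i of U are therefore eigenvectors of D*D' with eigenvalues sigma_i^2, and they span the column space of D, which is the basis property asked for in the original question.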


> On 21 Apr 2021, at 12:54, Stefano Zampini <[email protected]> 
> wrote:
>
> Here you have, https://gitlab.com/petsc/petsc/-/merge_requests/3903. We can 
> discuss the issue on gitlab.
>
> Thanks
> Stefano
>
> On Wed, 21 Apr 2021 at 13:39, Stefano Zampini 
> <[email protected]> wrote:
> Peder
>
> I have slightly modified your code and I confirm the bug.
> The bug is not in the MatMatTransposeMult operation; it is in the HDF5 
> reader. I will soon open an MR with the code and a discussion of the issues.
>
> Thanks for reporting the issue
> Stefano
>
> On Wed, 21 Apr 2021 at 12:22, Peder Jørgensgaard Olesen via 
> petsc-users <[email protected]> wrote:
> Dear Hong
>
>
>
> Thank you for your reply.
>
>
>
> I have a hunch that the issue goes beyond the minor differences that might 
> arise from floating-point computation order, however.
>
>
>
> Writing the product matrix to a binary file using MatView() and inspecting 
> the output shows very different entries depending on the number of processes. 
> Here are the first three rows and columns of the product matrix obtained in a 
> sequential run:
>
> 2.58348   1.68202   1.66302
> 1.68202   4.27506   1.91897
> 1.66302   1.91897   2.70028
>
>
>
> - and the corresponding part of the product matrix obtained on one node (40 
> processes):
>
> 4.43536   2.17261   0.16430
> 2.17261   4.53224   2.53210
> 0.16430   2.53210   4.73234
>
>
>
> The parallel result is not even close to the sequential one. Trying different 
> numbers of processes produces yet different results.
>
>
>
> Also, except when the code is run on a single process, the eigenvectors 
> that I subsequently determine using a SLEPc solver do not form a proper 
> basis for the column space of the data matrix as they must. This is hardly a 
> surprise given the variability of results indicated above. Forming such a 
> basis is central to the intended application, and given that it would need 
> to work on rather large data sets, running on a single process is hardly a 
> viable solution.
>
>
>
> Best regards
>
> Peder
>
> From: Zhang, Hong <[email protected]>
> Sent: 19 April 2021 18:34:31
> To: [email protected]; Peder Jørgensgaard Olesen
> Subject: Re: Rather different matrix product results on multiple processes
>
> Peder,
> I tested your code on a linux machine. I got
> $ ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0473e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv 
> (default)
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> mpiexec -n 20 ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0897e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> I use petsc 'main' branch (same as the latest release). You can remove 
> MatAssemblyBegin/End calls after MatMatTransposeMult():
> MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX, PETSC_DEFAULT, 
> &corr_mat);
> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>
> The communication patterns of the parallel implementation lead to a 
> different order of floating-point operations, and thus to a slightly 
> different matrix norm of R.
> Hong
>
> From: petsc-users <[email protected]> on behalf of Peder 
> Jørgensgaard Olesen via petsc-users <[email protected]>
> Sent: Monday, April 19, 2021 7:57 AM
> To: [email protected] <[email protected]>
> Subject: [petsc-users] Rather different matrix product results on multiple 
> processes
>
> Hello,
>
> When computing a matrix product of the type R = D*D^T using 
> MatMatTransposeMult() I find I get rather different results depending on the 
> number of processes. In one example using a data set that is small compared 
> to the application I get Frobenius norms |R| = 1.047e3 on a single process, 
> 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on two nodes.
>
> I have ascertained that the single process result is indeed the correct one 
> (i.e., eigenvectors of R form a proper basis for the columns of D), so 
> naturally I'd love to be able to reproduce this result across different 
> parallel setups. How might I achieve this?
>
> I'm attaching MWE code and the data set used for the example.
>
> Thanks in advance!
>
> Best Regards
>
> Peder Jørgensgaard Olesen
> PhD Student, Turbulence Research Lab
> Dept. of Mechanical Engineering
> Technical University of Denmark
> Niels Koppels Allé
> Bygning 403, Rum 105
> DK-2800 Kgs. Lyngby
>
>
> --
> Stefano
>
>
> --
> Stefano
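
Hong's point in the quoted thread about floating-point summation order, and Peder's hunch that the observed differences are too large for that explanation, can be checked with a small experiment. The following is a minimal sketch in plain C, not PETSc code: the function names and the way the test data is filled are assumptions made for illustration, and the chunked sum only mimics per-process partial sums followed by a reduction.

```c
#include <stddef.h>

/* Sum n values left to right, as a single process would. */
double sum_sequential(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

/* Sum the same values as nparts contiguous chunks and then add the
 * partial sums. This mimics per-process partial sums followed by a
 * reduction; it is an illustration, not how PETSc is implemented. */
double sum_chunked(const double *x, size_t n, size_t nparts)
{
    double total = 0.0;
    for (size_t p = 0; p < nparts; ++p) {
        size_t begin = p * n / nparts;
        size_t end = (p + 1) * n / nparts;
        double partial = 0.0;
        for (size_t i = begin; i < end; ++i) partial += x[i];
        total += partial;
    }
    return total;
}

/* Relative difference |a - b| / max(|a|, |b|), or 0 if both are 0. */
double relative_difference(double a, double b)
{
    double abs_a = a < 0.0 ? -a : a;
    double abs_b = b < 0.0 ? -b : b;
    double diff = a > b ? a - b : b - a;
    double scale = abs_a > abs_b ? abs_a : abs_b;
    return scale > 0.0 ? diff / scale : 0.0;
}

/* Hypothetical test data: x[i] = 1/(i+1). Not the thread's data set. */
void fill_test_data(double *x, size_t n)
{
    for (size_t i = 0; i < n; ++i) x[i] = 1.0 / (double)(i + 1);
}
```

Comparing sum_sequential(x, n) with sum_chunked(x, n, 40) on such data gives a relative difference at round-off level, many orders of magnitude below the roughly 1% gap between the norms 1.0473e+03 and 1.0363e+03 reported above. That is consistent with the later finding that the discrepancy came from the HDF5 reader rather than from summation order.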
