I am answering the SLEPc-related questions: - Having different number of iterations when changing the number of processes is normal. - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused.
Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. Jose On 3/11/2015, Denis Davydov <[email protected]> wrote: > > > Dear all, > > I experience strange convergence problems in SLEPc for GHEP with Krylov-Schur > and CG + GAMG. > The issue appears to be contingent on the number of MPI cores used. > Say for 8 cores there is no issue and for 4 cores there is an issue. > When I substitute GAMG with Jacobi for the problematic number of cores -- all > works. > > To be more specific, I solve Ax=\lambda Bx for a sequence of A’s where A is a > function of eigenvectors. > On each iteration step currently the eigensolver EPS is initialised from > scratch. And thus should the underlying > ST, KSP, PC objects: -st_ksp_type cg -st_pc_type gamg -st_ksp_rtol 1e-12. > For these particular matrices the issue appears on the 4th iteration, even > though the matrix to be inverted (mass/overlap matrix) > is the same, does not change!!! > From my debuging info the A matrix has the same norm for CG + GAMG vs CG + > Jacobi cases: > DEAL:: frobenius_norm = 365.7 > DEAL:: linfty_norm = 19.87 > DEAL:: l1_norm = 19.87 > Just to be sure that there are no bugs on my side which would result in > different mass matrices i check that > it has the same norm for CG + GAMG vs CG + Jacobi BEFORE i start iteration: > DEAL:: frobenius_norm = 166.4 > DEAL:: linfty_norm = 8.342 > DEAL:: l1_norm = 8.342 > All the dependent scalar quantities I calculate on each iteration are > identical for the two cases, which makes me believe that > the solution path is the same up to the certain tolerance. > The only output which is slightly different are the number iterations for > convergence in EPS (e.g. 113 vs 108) and the > resulting maxing EPSComputeResidualNorm : 4.1524e-07 vs 2.9639e-08. > > > Any ideas what could be an issue, especially given the fact that it does work > for some numbers of cores and does not for other? > Perhaps some default settings in GAMG preconditioner? Although that does not > explain why it works for the first 3 iterations > and does not on 4th as the mass matrix is unchanged... > > Lastly, i suppose ideally i should keep the eigensolver context between the > iterations and just update the matrices by EPSSetOperators. > Is it correct to assume that since B matrix does not change between > iterations and I use the default shift transformation with zero shift > (operator is B^{-1)A ), the GAMG preconditioner will not be re-initialised > and thus I should save some time? > > p.s. the relevant error message is below. I have the same issues on CentOS > cluster, so it is not related to OS-X. > > Kind regards, > Denis > > > === > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch > system) has told this process to end > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to > find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] KSPSolve line 510 > /private/tmp/petsc20151102-50378-1t7b3in/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] STMatSolve line 148 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsles.c > [0]PETSC ERROR: [0] STApply_Shift line 33 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/impls/shift/shift.c > [0]PETSC ERROR: [0] STApply line 50 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsolve.c > [0]PETSC ERROR: [0] EPSGetStartVector line 726 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c > [0]PETSC ERROR: [0] EPSSolve_KrylovSchur_Symm line 41 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/impls/krylov/krylovschur/ks-symm.c > [0]PETSC ERROR: [0] EPSSolve line 83 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02, 2015 > [0]PETSC ERROR: /Users/davydden/Desktop/work/C++/deal.ii-dft/build_debug~/dft > on a real named MBP-Denis.fritz.box by davydden Tue Nov 3 07:02:47 2015 > [0]PETSC ERROR: Configure options CC=/usr/local/bin/mpicc > CXX=/usr/local/bin/mpicxx F77=/usr/local/bin/mpif77 FC=/usr/local/bin/mpif90 > --with-shared-libraries=1 --with-pthread=0 --with-openmp=0 --with-debugging=1 > --with-ssl=0 > --with-superlu_dist-include=/usr/local/opt/superlu_dist/include/superlu_dist > --with-superlu_dist-lib="-L/usr/local/opt/superlu_dist/lib -lsuperlu_dist" > --with-superlu-include=/usr/local/Cellar/superlu43/4.3/include/superlu > --with-superlu-lib="-L/usr/local/Cellar/superlu43/4.3/lib -lsuperlu" > --with-fftw-dir=/usr/local/opt/fftw --with-netcdf-dir=/usr/local/opt/netcdf > --with-suitesparse-dir=/usr/local/opt/suite-sparse > --with-hdf5-dir=/usr/local/opt/hdf5 --with-metis-dir=/usr/local/opt/metis > --with-parmetis-dir=/usr/local/opt/parmetis > --with-scalapack-dir=/usr/local/opt/scalapack > --with-mumps-dir=/usr/local/opt/mumps --with-x=0 > --prefix=/usr/local/Cellar/petsc/3.6.2/real --with-scalar-type=real > --with-hypre-dir=/usr/local/opt/hypre > --with-sundials-dir=/usr/local/opt/sundials > --with-hwloc-dir=/usr/local/opt/hwloc > [0]PETSC ERROR: #8 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > mpirun noticed that process rank 3 with PID 96754 on node MBP-Denis exited on > signal 6 (Abort trap: 6). > -------------------------------------------------------------------------- >
