Re: [petsc-users] Question on MatMatmult
Ah ok, then I will have a look at MatConvert, and then maybe later switch to AIJ as well. Thanks for the help, Frank

> On 29 May 2024, at 16:57, Barry Smith wrote:
>
>    You can use MatConvert()
>
>> On May 29, 2024, at 10:53 AM, Frank Bramkamp wrote:
>>
>> Hello Hong,
>>
>> Thank you for the clarification.
>> If I already have a BAIJ matrix format, can I then convert it later into AIJ format as well?!
>> In that case I would have two matrices, but that would be ok for testing.
>> I think that you sometimes convert different matrix formats into each other?!
>>
>> Since I typically have BAIJ format, I also use a blocked ILU, which would turn into a point-wise ILU for an AIJ matrix. That is why I typically have the BAIJ format.
>>
>> Otherwise, I have to change it into an AIJ format from the beginning.
>>
>> Thanks for the quick help,
>>
>> Frank
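Barry's suggestion can be sketched in Fortran roughly as follows. This is a minimal illustration, not code from the thread: the variable names and the choice of MATSEQAIJ are assumptions, and A_baij/B_baij are assumed to be already assembled SeqBAIJ matrices.

```fortran
! Sketch: convert assembled SeqBAIJ matrices to SeqAIJ with MatConvert(),
! then form the product, since MatMatMult supports the AIJ format.
      Mat :: A_baij, B_baij, A_aij, B_aij, C
      PetscErrorCode :: ierr

      call MatConvert(A_baij, MATSEQAIJ, MAT_INITIAL_MATRIX, A_aij, ierr)
      call MatConvert(B_baij, MATSEQAIJ, MAT_INITIAL_MATRIX, B_aij, ierr)
      call MatMatMult(A_aij, B_aij, MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, C, ierr)
```

With MAT_INITIAL_MATRIX the converted matrices are new objects, so the BAIJ originals stay intact — matching the "two matrices" situation described in the thread.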
Re: [petsc-users] Question on MatMatmult
Hello Hong,

Thank you for the clarification. If I already have a BAIJ matrix format, can I then convert it later into AIJ format as well?! In that case I would have two matrices, but that would be ok for testing. I think that you sometimes convert different matrix formats into each other?!

Since I typically have BAIJ format, I also use a blocked ILU, which would turn into a point-wise ILU for an AIJ matrix. That is why I typically have the BAIJ format.

Otherwise, I have to change it into an AIJ format from the beginning.

Thanks for the quick help,

Frank
[petsc-users] Question on MatMatmult
Dear PETSc Team,

I would like to make a matrix-matrix product of two matrices. I try to use

  CALL MatMatMult(Mat_A, Mat_B, MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, Mat_AB, IERROR)  ! calling from Fortran

When I try to use this function I get the following error message:

  "Unspecified symbolic phase for product AB with A seqbaij, B seqbaij. The product is not supported"

I am using the seqbaij matrix format. Are MatMatMult and MatProductSymbolic only defined for the standard point-wise matrix format but not for a blocked format?! In the documentation, I could not see a hint on supported matrix formats or any limitations. The examples also just use a point-wise format (AIJ), as far as I can see.

Greetings, Frank Bramkamp
Re: [petsc-users] Problem with NVIDIA compiler and OpenACC
Dear Barry,

That looks very good now. The -lnvc is gone now. I also tested my small Fortran program. There I can see that libnvc is automatically added as well, but this time it comes after the libaccdevice.so library for OpenACC, and then my OpenACC commands also work again.

I also mentioned some issues with some cuda nvJitLink library. I just found out that some path in our cuda compiler module was not set correctly. I will try to compile it with cuda again as well.

We just start to get PETSc on GPUs with the cuda backend, and I start with OpenACC for our Fortran code to get first experience of how everything works with GPU porting.

Good that you could fix the issue. Thanks for the great help.

Have a nice weekend, Frank Bramkamp
Re: [petsc-users] Problem with NVIDIA compiler and OpenACC
Thanks for the effort, Barry. I will get it and give it another try. Thanks a lot, Frank

> On 5 Apr 2024, at 15:56, Barry Smith wrote:
>
>    There was a bug in my attempted fix so it actually did not skip the option.
>
>    Try git pull and then run configure again.
>
>> On Apr 5, 2024, at 6:30 AM, Frank Bramkamp wrote:
>>
>> Dear Barry,
>>
>> I tried your fix for -lnvc. Unfortunately it did not work so far.
>> Here I send you the configure.log file again.
>>
>> One can see that you try to skip something, but later it still always includes -lnvc for the linker.
>> In the file petscvariables it also appears as before.
>>
>> As I see it, it lists the linker options including -lnvc also before you try to skip it.
>> Maybe it is already in the linker options before the skipping.
>>
>> Greetings, Frank
Re: [petsc-users] Problem with NVIDIA compiler and OpenACC
Thanks for the response,

My code is in Fortran. I will try to explicitly set LIBS=.. as you suggested. At the moment I skip cuda, but later I want to use cuda as well.

Barry also tried to skip the "-lnvc", but that did not work yet.

Thanks a lot for the suggestions, Frank
Re: [petsc-users] Problem with NVIDIA compiler and OpenACC
Ok, I will have a look. It is already evening here in Sweden, so it might take until tomorrow. Thanks, Frank
Re: [petsc-users] Problem with NVIDIA compiler and OpenACC
Ok, I will look for the config.log file. Frank
Re: [petsc-users] Problem with NVIDIA compiler and OpenACC
Thanks for the reply,

Do you know if you actively include the libnvc library?! Or is this somehow automatically included?!

Greetings, Frank

> On 4 Apr 2024, at 15:56, Satish Balay wrote:
>
> On Thu, 4 Apr 2024, Frank Bramkamp wrote:
>
>> Dear PETSc Team,
>>
>> I found the following problem:
>> I compile petsc 3.20.5 with the Nvidia compiler 23.7.
>>
>> I use a pretty standard configuration, including
>>
>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g"
>> CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-log=1
>> --download-fblaslapack --with-cuda=0
>>
>> I exclude cuda, since I was not sure if the problem was cuda related.
>
> Can you try using (to exclude cuda): --with-cudac=0
>
>> The problem is now: if I have a simple Fortran program where I link the
>> petsc library, but actually do not use petsc in that program (just for
>> testing), I want to use OpenACC directives in my program, e.g.
>> !$acc parallel loop.
>> As soon as I link with the petsc library, the OpenACC commands do not
>> work anymore. It seems that OpenACC is not initialised and hence it
>> cannot find a GPU.
>>
>> The problem seems to be that you link with -lnvc.
>> In "petscvariables" => PETSC_WITH_EXTERNAL_LIB you include "-lnvc".
>> If I take this out, then OpenACC works. With "-lnvc" something gets messed up.
>>
>> The problem is also discussed here:
>> https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1
>>
>> My understanding is that libnvc is more a runtime library that does not need
>> to be included by the linker.
>> Not sure if there is a specific reason to include libnvc (I am not so
>> familiar with what this library does).
>>
>> If I take out -lnvc from "petscvariables", then my program with OpenACC
>> works as expected. I did not try any more realistic program that includes
>> petsc.
>>
>> 2)
>> When compiling petsc with cuda support, I also found that the petsc
>> library requires libnvJitLink.so.12, which is not found. On my system
>> this library is in $CUDA_ROOT/lib64.
>> I am not sure where this library is on your system?!
>
> Hm - good if you can send configure.log for this. configure attempts '$CC -v'
> to determine the link libraries to get c/c++/fortran compatibility libraries.
> But it can grab other libraries that the compilers are using internally here.
>
> To avoid this - you can explicitly list these libraries to configure.
> For ex: for gcc/g++/gfortran
>
> ./configure CC=gcc CXX=g++ FC=gfortran LIBS="-lgfortran -lstdc++"
>
> Satish
>
>> Thanks a lot, Frank Bramkamp
[petsc-users] Problem with NVIDIA compiler and OpenACC
Dear PETSc Team,

I found the following problem: I compile petsc 3.20.5 with the Nvidia compiler 23.7.

I use a pretty standard configuration, including

  --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-log=1 --download-fblaslapack --with-cuda=0

I exclude cuda, since I was not sure if the problem was cuda related.

The problem is now: if I have a simple Fortran program where I link the petsc library, but actually do not use petsc in that program (just for testing), I want to use OpenACC directives in my program, e.g. !$acc parallel loop. As soon as I link with the petsc library, the OpenACC commands do not work anymore. It seems that OpenACC is not initialised and hence it cannot find a GPU.

The problem seems to be that you link with -lnvc. In "petscvariables" => PETSC_WITH_EXTERNAL_LIB you include "-lnvc". If I take this out, then OpenACC works. With "-lnvc" something gets messed up.

The problem is also discussed here:
https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1

My understanding is that libnvc is more a runtime library that does not need to be included by the linker. Not sure if there is a specific reason to include libnvc (I am not so familiar with what this library does).

If I take out -lnvc from "petscvariables", then my program with OpenACC works as expected. I did not try any more realistic program that includes petsc.
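The kind of minimal test program described above can be sketched like this. This is a hypothetical example (not the actual program from the thread), assuming nvfortran with the -acc flag:

```fortran
! Minimal OpenACC check: if the runtime cannot detect a GPU (e.g. because
! an incorrectly linked -lnvc broke device detection), this loop silently
! falls back to the host; compiling with -Minfo=accel shows the offload.
program acc_check
  implicit none
  integer :: i
  real :: a(1000)
  !$acc parallel loop
  do i = 1, 1000
     a(i) = 2.0 * real(i)
  end do
  print *, 'a(1000) = ', a(1000)
end program acc_check
```

Linking this once with and once without the PETSc link line makes the -lnvc effect easy to isolate.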
2) When compiling petsc with cuda support, I also found that the petsc library requires libnvJitLink.so.12, which is not found. On my system this library is in $CUDA_ROOT/lib64. I am not sure where this library is on your system?!

Thanks a lot, Frank Bramkamp
[petsc-users] MATSETVALUES: Fortran problem
Dear PETSc Team,

I am using the latest petsc version 3.20.5. I would like to create a matrix using MatCreateSeqAIJ. To insert values, I use MatSetValues. It seems that the Fortran interface/stubs are missing for MatSetValues, as the linker does not find any subroutine with that name. MatSetValueLocal seems to be fine.

Typically I am using a blocked matrix format (BAIJ), which works fine in Fortran. Soon we want to try PETSc on GPUs, using the format MATAIJCUSPARSE, since there seems not to be a blocked format available in PETSc for GPUs so far. Therefore I first want to try the point-wise MatCreateSeqAIJ format on a CPU, before using the GPU format.

I think that CUDA also supports a block format now?! Maybe that would also be useful to have one day.

Greetings, Frank Bramkamp
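For reference, the intended usage can be sketched as follows — a minimal illustration with made-up sizes and values, assuming the Fortran stubs for MatSetValues are present:

```fortran
! Sketch: assemble a small diagonal SeqAIJ matrix with MatSetValues,
! preallocating at most 3 nonzeros per row.
      Mat :: A
      PetscInt :: i, n, one
      PetscScalar :: v
      PetscErrorCode :: ierr

      n = 4
      one = 1
      v = 2.0
      call MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, PETSC_NULL_INTEGER, A, ierr)
      do i = 0, n - 1
         call MatSetValues(A, one, [i], one, [i], [v], INSERT_VALUES, ierr)
      end do
      call MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
      call MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)
```

If the linker rejects exactly this pattern, that supports the missing-stub diagnosis in the message above.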
[petsc-users] Fortran problem MatGetValuesLocal
Dear PETSc team,

We are using the latest petsc version 3.20.1 and intel compiler 2023, and we found the following problem: We want to call the function MatGetValuesLocal to extract a block sub-matrix from an assembled matrix (e.g. a 5x5 blocked sub-matrix). We use the matrix format MatCreateBAIJ in parallel.

In particular, we try to call MatGetValuesLocal in Fortran. It seems that the linker does not find the subroutine MatGetValuesLocal. The subroutine MatGetValues seems to be fine. I guess that the fortran stubs/fortran interface is missing for this routine. On the documentation side, you also write a note for developers that the fortran stubs and interface are not automatically generated for MatGetValuesLocal. So maybe that has been forgotten.

Unfortunately I do not have any small test example, since we just incorporated the function call into our own software. Otherwise I would first have to set up a small test example for the parallel case. I think there is also an include file where one can check the fortran interfaces?! I forgot where to look this up.

Greetings, Frank Bramkamp
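Until the stub exists, one hedged workaround (my suggestion, not from the thread) is to translate the local indices to global ones yourself and call MatGetValues, whose Fortran interface reportedly works. All names below are illustrative, and MatGetValues can only read rows owned by the calling process:

```fortran
! Sketch: extract a 5x5 block via MatGetValues using global indices
! obtained from the matrix's local-to-global mappings.
      PetscInt :: rows_loc(5), cols_loc(5), rows_glob(5), cols_glob(5), five
      PetscScalar :: block(5,5)
      ISLocalToGlobalMapping :: rmap, cmap
      PetscErrorCode :: ierr

      five = 5
      call MatGetLocalToGlobalMapping(A, rmap, cmap, ierr)
      call ISLocalToGlobalMappingApply(rmap, five, rows_loc, rows_glob, ierr)
      call ISLocalToGlobalMappingApply(cmap, five, cols_loc, cols_glob, ierr)
      call MatGetValues(A, five, rows_glob, five, cols_glob, block, ierr)
```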
[petsc-users] KSPAGMRES Question
Dear PETSc team,

I have seen that there is the KSP method KSPAGMRES:
https://petsc.org/release/docs/manualpages/KSP/KSPAGMRES.html

I wanted to test this method, as it also seems to reduce the amount of MPI communication compared to the standard GMRES. I supposed that the class is called "KSPAGMRES". But in the include files petscksp.h and petsc/finclude/petscksp.h there is no definition for KSPAGMRES, just KSPDGMRES. I wonder if the definition KSPAGMRES is simply missing, or do I have to call DGMRES and set another option for AGMRES?!

The standard GMRES has the problem that MPI_Allreduce gets expensive at 2048 cores. Therefore I wanted to see if AGMRES has a bit less communication, as this is mentioned in the description of the method.

Greetings, Frank Bramkamp
[petsc-users] Fortran interface of MatNullSpaceCreate
Hello,

I have a question on the Fortran interface of the subroutine MatNullSpaceCreate. I tried to call the subroutine in the following forms:

  Vec :: dummyVec, dummyVecs(1)
  MatNullSpace :: nullspace
  INTEGER :: ierr

  (a) call MatNullSpaceCreate( PETSC_COMM_WORLD, PETSC_TRUE, PETSC_NULL_INTEGER, dummyVec, nullspace, ierr )
  (b) call MatNullSpaceCreate( PETSC_COMM_WORLD, PETSC_TRUE, PETSC_NULL_INTEGER, dummyVecs, nullspace, ierr )

(a) and (b) gave me the same error during compilation: no specific subroutine for the generic MatNullSpaceCreate. I am using the latest version of Petsc. I just did a "git pull" and re-built it. How can I call the subroutine?

In addition, I found two 'petscmat.h90' files: petsc/include/petsc/finclude/ftn-auto/petscmat.h90 and petsc/src/mat/f90-mod/petscmat.h90. The former defines a subroutine MatNullSpaceCreate in the above form (b). The latter provides a generic interface for both (a) and (b). I am not sure if this relates to the error I get.

Thank you.
Frank
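One hedged guess (my sketch, not a confirmed fix from the thread): generic Fortran interfaces are resolved by argument kinds, so the count argument may need to be an explicit PetscInt rather than PETSC_NULL_INTEGER, and ierr a PetscErrorCode:

```fortran
! Sketch: a nullspace containing only the constant vector, so the vector
! count is 0 and the Vec array argument is a dummy. Declarations use the
! PETSc kinds the generic interface expects.
      MatNullSpace :: nullspace
      Vec :: dummyVecs(1)
      PetscInt :: izero
      PetscErrorCode :: ierr

      izero = 0
      call MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, izero, dummyVecs, nullspace, ierr)
```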
Re: [petsc-users] Question about Set-up of Full MG and its Output
Hello,

Thank you. Now I am able to see the trace of MG.

I still have a question about the interpolation. I want to get the matrix of the default interpolation method and print it on the terminal. The code is as follows (the KSP is already set by petsc options):

  CALL KSPGetPC( ksp, pc, ierr )
  CALL MATCreate( PETSC_COMM_WORLD, interpMat, ierr )
  CALL MATSetType( interpMat, MATSEQAIJ, ierr )
  CALL MATSetSizes( interpMat, i5, i5, i5, i5, ierr )
  CALL MATSetUp( interpMat, ierr )
  CALL PCMGGetInterpolation( pc, i1, interpMat, ierr )
  CALL MatAssemblyBegin( interpMat, MAT_FINAL_ASSEMBLY, ierr )
  CALL MatAssemblyEnd( interpMat, MAT_FINAL_ASSEMBLY, ierr )
  CALL MatView( interpMat, PETSC_VIEWER_STDOUT_SELF, ierr )

The error message is:

  [0]PETSC ERROR: Object is in wrong state
  [0]PETSC ERROR: Must call PCMGSetInterpolation() or PCMGSetRestriction()

Do I have to set the interpolation first? How can I just print the default interpolation matrix? I attached the option file. Thank you.

Frank

On 12/06/2016 02:31 PM, Jed Brown wrote:
> frank <hengj...@uci.edu> writes:
>> Dear all,
>> I am trying to use full MG to solve a 2D Poisson equation. I want to set full MG as the solver and SOR as the smoother. Is the following setup the proper way to do it?
>> -ksp_type richardson -pc_type mg -pc_mg_type full -mg_levels_ksp_type richardson -mg_levels_pc_type sor
>> The ksp_view shows the levels from the coarsest mesh to finest mesh in a linear order.
>
> It is showing the solver configuration, not a trace of the cycle.
>
>> I was expecting sth like: coarsest -> level1 -> coarsest -> level1 -> level2 -> level1 -> coarsest -> ...
>> Is there a way to show exactly how the full MG proceeds?
>
> You could get a trace like this from
>   -mg_coarse_ksp_converged_reason -mg_levels_ksp_converged_reason
> If you want to delineate the iterations, you could add -ksp_monitor.
>
>> Also in the above example, I want to know what interpolation or prolongation method is used from level1 to level2.
>> Can I get that info by adding some options? (not using PCMGGetInterpolation)
>> I attached the ksp_view info and my petsc options file. Thank you.
>> Frank
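For the "wrong state" error above, here is a hedged sketch of one possible fix (my guess, not confirmed in this thread): set up the KSP first so the MG preconditioner builds its default interpolation, then query that matrix directly instead of creating and assembling an empty one.

```fortran
! Sketch: PCMGGetInterpolation returns the interpolation matrix created
! during setup, so no MatCreate/MatSetUp of interpMat is needed first.
! i1 names the level to query, as in the code fragment above.
      CALL KSPSetUp( ksp, ierr )
      CALL KSPGetPC( ksp, pc, ierr )
      CALL PCMGGetInterpolation( pc, i1, interpMat, ierr )
      CALL MatView( interpMat, PETSC_VIEWER_STDOUT_SELF, ierr )
```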
[petsc-users] Question about Set-up of Full MG and its Output
Dear all,

I am trying to use full MG to solve a 2D Poisson equation. I want to set full MG as the solver and SOR as the smoother. Is the following setup the proper way to do it?

  -ksp_type richardson -pc_type mg -pc_mg_type full -mg_levels_ksp_type richardson -mg_levels_pc_type sor

The ksp_view shows the levels from the coarsest mesh to the finest mesh in a linear order. I was expecting sth like: coarsest -> level1 -> coarsest -> level1 -> level2 -> level1 -> coarsest -> ...
Is there a way to show exactly how the full MG proceeds?

Also in the above example, I want to know what interpolation or prolongation method is used from level1 to level2. Can I get that info by adding some options? (not using PCMGGetInterpolation)

I attached the ksp_view info and my petsc options file. Thank you.

Frank

Linear solve converged due to CONVERGED_RTOL iterations 3
KSP Object: 1 MPI processes
  type: richardson
    Richardson: damping factor=1.
  maximum iterations=1
  tolerances:  relative=1e-07, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: mg
    MG: type is FULL, levels=6 cycles=v
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 1 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 1 MPI processes
      type: lu
        out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
        matrix ordering: nd
        factor fill ratio given 0., needed 0.
          Factored matrix follows:
            Mat Object: 1 MPI processes
              type: superlu_dist
              rows=64, cols=64
              package used to perform factorization: superlu_dist
              total: nonzeros=0, allocated nonzeros=0
              total number of mallocs used during MatSetValues calls =0
                SuperLU_DIST run parameters:
                  Process grid nprow 1 x npcol 1
                  Equilibrate matrix TRUE
                  Matrix input mode 0
                  Replace tiny pivots FALSE
                  Use iterative refinement FALSE
                  Processors in row 1 col partition 1
                  Row permutation LargeDiag
                  Column permutation METIS_AT_PLUS_A
                  Parallel symbolic factorization FALSE
                  Repeated factorization SamePattern
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: seqaij
        rows=64, cols=64
        total: nonzeros=576, allocated nonzeros=576
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 1 MPI processes
      type: richardson
        Richardson: damping factor=1.
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 1 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: seqaij
        rows=256, cols=256
        total: nonzeros=2304, allocated nonzeros=2304
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 1 MPI processes
      type: richardson
        Richardson: damping factor=1.
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object: (mg_levels_2_) 1 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: seqaij
        rows=1024, cols=1024
        total: nonzeros=9216, allocated nonzeros=9216
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object: (mg_levels_3_) 1 MPI
[petsc-users] Time cost by Vec Assembly
>> Hello,
>>
>> Another thing, the vector assembly and scatter take more time as I increased the number of cores:
>>
>>   cores#                    4096      8192      16384     32768     65536
>>   VecAssemblyBegin (298)    2.91E+00  2.87E+00  8.59E+00  2.75E+01  2.21E+03
>>   VecAssemblyEnd (298)      3.37E-03  1.78E-03  1.78E-03  5.13E-03  1.99E-03
>>   VecScatterBegin (76303)   3.82E+00  3.01E+00  2.54E+00  4.40E+00  1.32E+00
>>   VecScatterEnd (76303)     3.09E+01  1.47E+01  2.23E+01  2.96E+01  2.10E+01
>>
>> The above data is produced by solving a constant-coefficient Poisson equation with different rhs for 100 steps. As you can see, the time of VecAssemblyBegin increases dramatically from 32K cores to 65K.
>
>    Something is very very wrong here. It is likely not the VecAssemblyBegin() itself that is taking the huge amount of time. VecAssemblyBegin() is a barrier, that is, all processes have to reach it before any process can continue beyond it. Something in the code on some processes is taking a huge amount of time before reaching that point. Perhaps it is in starting up all the processes? Or are you generating the entire rhs on one process? You can't do that.
>
>    Barry

(I create a new subject since this is a separate problem from my previous question.)

Each process computes its part of the rhs. The above results are from 100 steps' computation. It is not a starting-up issue. I also have the results from a simple code to show this problem:

  cores#                    4096      8192      16384     32768     65536
  VecAssemblyBegin (1)      4.56E-02  3.27E-02  3.63E-02  6.26E-02  2.80E+02
  VecAssemblyEnd (1)        3.54E-04  3.43E-04  3.47E-04  3.44E-04  4.53E-04

Again, the time cost increases dramatically after 30K cores. The max/min ratio of VecAssemblyBegin is 1.2 for both the 30K and 65K cases. If there were a huge delay on some process, should this value not be large?

The part of the code that calls the assembly subroutines looks like:

  CALL DMCreateGlobalVector( ... )
  CALL DMDAVecGetArrayF90( ... )
  ... each process computes its part of rhs ...
  CALL DMDAVecRestoreArrayF90( ... )
  CALL VecAssemblyBegin( ... )
  CALL VecAssemblyEnd( ... )

Thank you.

Regards,
Frank
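Barry's barrier explanation suggests a simple per-rank check; here is a hedged diagnostic sketch (my suggestion, with illustrative names, not code from the thread) that times the work each rank does before reaching VecAssemblyBegin:

```fortran
! Sketch: if some ranks spend far longer than others computing their part
! of the rhs, that imbalance is charged to VecAssemblyBegin, since the
! call behaves like a barrier.
      PetscLogDouble :: t0, t1
      PetscErrorCode :: ierr

      call PetscTime(t0, ierr)
      ! ... each process computes its part of the rhs ...
      call PetscTime(t1, ierr)
      write(*,*) 'time before assembly on this rank:', t1 - t0
      call VecAssemblyBegin(b, ierr)
      call VecAssemblyEnd(b, ierr)
```

A large spread in the printed per-rank times would confirm the imbalance Barry describes, independent of the -log_view numbers.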
[petsc-users] create global vector in latest version of petsc
Hi,

I updated petsc to the latest version by pulling from the repo. Then I find that one of my old codes, which worked before, outputs errors now. After debugging, I find that the error is caused by "DMCreateGlobalVector". I attach a short program which can reproduce the error. This program works well with an older version of petsc. I also attach the script I used to configure petsc. The error message is below. Did I miss something in the installation? Thank you.

  [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------
  [0]PETSC ERROR: Null argument, when expecting valid pointer
  [0]PETSC ERROR: Null Object: Parameter # 2
  [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
  [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5  GIT Date: 2016-10-05 10:56:19 -0500
  [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 17:40:07 2016
  [0]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx
  [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c
  [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c
  [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c

Regards,
Frank

  PROGRAM test_ksp
  #include
  #include
    USE petscdmda
    USE petscsys
    IMPLICIT NONE
    DM :: decomp
    INTEGER :: N = 32, px = 2, py = 2, pz = 2, ierr
    Vec :: b

    CALL PetscInitialize( PETSC_NULL_CHARACTER, ierr )
    CALL DMDACreate3d( PETSC_COMM_WORLD, &
      &  DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC, &
      &  DMDA_STENCIL_STAR, N, N, N, px, py, pz, 1, 1, &
      &  PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, &
      &  decomp, ierr )
    CALL DMCreateGlobalVector( decomp, b, ierr )  ! cause error
    CALL VecDestroy( b, ierr )
    CALL DMDestroy( decomp, ierr )
    CALL PetscFinalize( ierr )
  END PROGRAM test_ksp

  #!/usr/bin/python
  # Do the following before running this configure script [hopp2.nersc.gov]
  #
  #   setenv XTPE_INFO_MESSAGE_OFF yes
  #   module add acml
  # Order of the download and installation of libraries is crucial!!!
  if __name__ == '__main__':
    import sys
    import os
    sys.path.insert(0, os.path.abspath('config'))
    import configure
    configure_options = [
      '--known-mpi-shared=0 ',
      '--known-memcmp-ok ',
      '--with-debugging=1 ',
      '--with-shared-libraries=0',
      '--with-mpi-compilers=1 ',
      #'--with-64-bit-indices',
      '--download-blacs=1 ',
      '--download-metis=1 ',
      '--download-parmetis=1 ',
      '--download-superlu_dist=1 ',
      '--download-hypre=1',
      #'--with-hdf5-include=/usr/local/petsc/gnu-dbg-32idx/include',
      #'--with-hdf5-lib=/usr/local/petsc/gnu-dbg-32idx/lib',
      #'--download-netcdf=1',
      #'--download-ml=1',
    ]
    configure.petsc_configure(configure_options)
Re: [petsc-users] Performance of the Telescope Multigrid Preconditioner
Hi Dave, Thank you for the reply. What do you mean by the "nested calls to KSPSolve"? I tried to call KSPSolve twice, but the the second solve converged in 0 iteration. KSPSolve seems to remember the solution. How can I force both solves start from the same initial guess? Thank you. Frank On 10/04/2016 12:56 PM, Dave May wrote: On Tuesday, 4 October 2016, frank <hengj...@uci.edu <mailto:hengj...@uci.edu>> wrote: Hi, This question is follow-up of the thread "Question about memory usage in Multigrid preconditioner". I used to have the "Out of Memory(OOM)" problem when using the CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; -matptap_scalable" option did solve that problem. Then I test the scalability by solving a 3d poisson eqn for 1 step. I used one sub-communicator in all the tests. The difference between the petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. It is kind of slow and doesn't scale at all. Test1: 512^3 grid points Core#telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) 512 8 4 / 3 6.2466 4096 64 5 / 3 0.9361 32768 64 4 / 3 4.8914 Test2: 1024^3 grid points Core#telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) 4096 64 5 / 43.4139 8192 128 5 / 4 2.4196 16384 32 5 / 35.4150 32768 64 5 / 3 5.6067 65536 128 5 / 3 6.5219 You have to be very careful how you interpret these numbers. Your solver contains nested calls to KSPSolve, and unfortunately as a result the numbers you report include setup time. This will remain true even if you call KSPSetUp on the outermost KSP. Your email concerns scalability of the silver application, so let's focus on that issue. The only way to clearly separate setup from solve time is to perform two identical solves. The second solve will not require any setup. You should monitor the second solve via a new PetscStage. 
This was what I did in the telescope paper. It was the only way to understand the setup cost (and scaling) cf. the solve time (and scaling). Thanks, Dave I guess I didn't set the MG levels properly. What would be the efficient way to arrange the MG levels? Also, which preconditioner at the coarse mesh of the 2nd communicator should I use to improve the performance? I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores. Thank you. Regards, Frank On 09/15/2016 03:35 AM, Dave May wrote: Hi all, the only unexpected memory usage I can see is associated with the call to MatPtAP(). Here is something you can try immediately. Run your code with the additional options -matrap 0 -matptap_scalable I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P. You don't want to do this. The option -matrap 0 resolves this issue. The implementation of P^T.A.P has two variants. The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable. Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions. I've attached a cleaned up version of the code you sent me. There were a number of memory leaks and other issues. The main points being * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed. Thanks, Dave On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote: Hi Dave, Sorry, I should have put more comments in to explain the code. The number of processes in each dimension is the same: Px = Py = Pz = P. So is the domain size.
So if you want to run the code for a 512^3 grid on 16^3 cores, you need to set "-N 512 -P 16" on the command line. I added more comments and also fixed an error in the attached code. (The error only affects the accuracy of the solution, not the memory usage.) Thank you. Frank On 9/14/2016 9:05 PM, Dave May wrote: On Thursday, 15 September 2016, Dave May <
[petsc-users] Performance of the Telescope Multigrid Preconditioner
Hi, This question is a follow-up to the thread "Question about memory usage in Multigrid preconditioner". I used to have the "Out of Memory (OOM)" problem when using the CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; -matptap_scalable" options did solve that problem. I then tested the scalability by solving a 3D Poisson equation for 1 step. I used one sub-communicator in all the tests. The differences between the petsc options in those tests are: 1) the pc_telescope_reduction_factor; 2) the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. It is kind of slow and doesn't scale at all.

Test1: 512^3 grid points
Cores   telescope_reduction_factor   MG levels (up/down)   Time for KSPSolve (s)
  512            8                        4 / 3                 6.2466
 4096           64                        5 / 3                 0.9361
32768           64                        4 / 3                 4.8914

Test2: 1024^3 grid points
Cores   telescope_reduction_factor   MG levels (up/down)   Time for KSPSolve (s)
 4096           64                        5 / 4                 3.4139
 8192          128                        5 / 4                 2.4196
16384           32                        5 / 3                 5.4150
32768           64                        5 / 3                 5.6067
65536          128                        5 / 3                 6.5219

I guess I didn't set the MG levels properly. What would be the efficient way to arrange the MG levels? Also, which preconditioner at the coarse mesh of the 2nd communicator should I use to improve the performance? I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores. Thank you. Regards, Frank On 09/15/2016 03:35 AM, Dave May wrote: Hi all, the only unexpected memory usage I can see is associated with the call to MatPtAP(). Here is something you can try immediately. Run your code with the additional options -matrap 0 -matptap_scalable I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P. You don't want to do this. The option -matrap 0 resolves this issue. The implementation of P^T.A.P has two variants.
The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable. Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions. I've attached a cleaned up version of the code you sent me. There were a number of memory leaks and other issues. The main points being * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed. Thanks, Dave On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu <mailto:hengj...@uci.edu>> wrote: Hi Dave, Sorry, I should have put more comments in to explain the code. The number of processes in each dimension is the same: Px = Py = Pz = P. So is the domain size. So if you want to run the code for a 512^3 grid on 16^3 cores, you need to set "-N 512 -P 16" on the command line. I added more comments and also fixed an error in the attached code. (The error only affects the accuracy of the solution, not the memory usage.) Thank you. Frank On 9/14/2016 9:05 PM, Dave May wrote: On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com <mailto:dave.mayhe...@gmail.com>> wrote: On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote: Hi, I wrote a simple code to reproduce the error. I hope this can help to diagnose the problem. The code just solves a 3d Poisson equation. Why is the stencil width a runtime parameter?? And why is the default value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. Was this choice made to mimic something in the real application code? Please ignore - I misunderstood your usage of the param set by -P I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I reproduce the OOM error. Each core has about 2G memory. I al
Re: [petsc-users] Question about memory usage in Multigrid preconditioner
Hi, I wrote a simple code to reproduce the error. I hope this can help to diagnose the problem. The code just solves a 3d Poisson equation. I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I reproduce the OOM error. Each core has about 2G memory. I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp solver works fine. I attached the code, ksp_view_pre's output and my petsc option file. Thank you. Frank On 09/09/2016 06:38 PM, Hengjie Wang wrote: Hi Barry, I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is not in the file I sent you. I am sorry for the confusion. Regards, Frank On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov <mailto:bsm...@mcs.anl.gov>> wrote: > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote: > > Hi Barry, > > I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be an OOM error or not. So I added both -ksp_view_pre and -ksp_view. But the options file you sent specifically does NOT list -ksp_view_pre, so how could it be from that? Sorry to be pedantic, but I've spent too much time in the past trying to debug from incorrect information and want to make sure that the information I have is correct before thinking. Please recheck exactly what happened. Rerun with the exact input file you emailed if that is needed. Barry > > Frank > > > On 09/09/2016 12:38 PM, Barry Smith wrote: >> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the 2nd case but not the 1st? >> >> Barry >> >> >> >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote: >>> >>> Hi, >>> >>> I want to continue digging into the memory problem here. >>> I did find a workaround in the past, which is to use fewer cores per node so that each core has 8G memory. However this is inefficient and expensive.
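For scale, here is a rough per-rank storage estimate (my own illustration with an assumed AIJ storage model, not a figure from the thread) for the fine-grid operator in the failing 1024^3 case. It shows the assembled matrix itself is tiny per rank, which points the suspicion at solver setup rather than the operator:

```python
# Rough per-rank storage of a 7-point FD operator on a 1024^3 grid
# partitioned over 32^3 = 32768 ranks (assumed AIJ layout: 8-byte
# values plus 4-byte column indices; row pointers ignored).
points = 1024**3 // 32**3          # grid points per rank
nnz = 7 * points                   # ~7 nonzeros per row
matrix_mb = nnz * (8 + 4) / 1e6    # values + column indices, in MB
print(points, f"{matrix_mb:.1f} MB")   # 32768 points, ~2.8 MB per rank
```

A few MB per rank against roughly 2 GB of available memory leaves the OOM unexplained by the fine-grid matrix alone.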
I hope to locate the place that uses the most memory. >>> >>> Here is a brief summary of the tests I did in the past: >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>> Maximum (over computational time) process memory: total 7.0727e+08 >>> Current process memory: total 7.0727e+08 >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 >>> Current space PetscMalloc()ed: total 1.8275e+09 >>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>> Maximum (over computational time) process memory: total 5.9431e+09 >>> Current process memory: total 5.9431e+09 >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 >>> Current space PetscMalloc()ed: total 5.4844e+09 >>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>> OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve". >>> >>> I attached the output of ksp_view (the third test's output is from ksp_view_pre), memory_view and also the petsc options. >>> >>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M, using double precision. Considering some extra memory used to store integer indices, 2G memory should still be way enough. >>> >>> Is there a way to find out which part of KSPSolve uses the most memory? >>> Thank you so much. >>> >>> BTW, there are 4 options that remain unused and I don't understand why they are omitted: >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>> >>> >>> Regards, >>> Frank >>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>> >>>> On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote: >>>> Hi Dave,
Re: [petsc-users] Question about memory usage in Multigrid preconditioner
Hi Barry, I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be an OOM error or not. So I added both -ksp_view_pre and -ksp_view. Frank On 09/09/2016 12:38 PM, Barry Smith wrote: Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the 2nd case but not the 1st? Barry On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote: Hi, I want to continue digging into the memory problem here. I did find a workaround in the past, which is to use fewer cores per node so that each core has 8G memory. However this is inefficient and expensive. I hope to locate the place that uses the most memory. Here is a brief summary of the tests I did in the past: Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 Maximum (over computational time) process memory: total 7.0727e+08 Current process memory: total 7.0727e+08 Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 Current space PetscMalloc()ed: total 1.8275e+09 Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 Maximum (over computational time) process memory: total 5.9431e+09 Current process memory: total 5.9431e+09 Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 Current space PetscMalloc()ed: total 5.4844e+09 Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve". I attached the output of ksp_view (the third test's output is from ksp_view_pre), memory_view and also the petsc options. In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M, using double precision. Considering some extra memory used to store integer indices, 2G memory should still be way enough. Is there a way to find out which part of KSPSolve uses the most memory? Thank you so much.
BTW, there are 4 options that remain unused and I don't understand why they are omitted: -mg_coarse_telescope_mg_coarse_ksp_type value: preonly -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 -mg_coarse_telescope_mg_levels_ksp_type value: richardson Regards, Frank On 07/13/2016 05:47 PM, Dave May wrote: On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote: Hi Dave, Sorry for the late reply. Thank you so much for your detailed reply. I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is: 4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M? Did I do something wrong here? Because this seems too small. No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iPhone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly.) From the PETSc objects associated with the solver, it looks like it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over-allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for. I am running this job on Blue Waters. I am using the 7-point FD stencil in 3D. I thought so on both counts. I apologize that I made a stupid mistake in computing the memory per core. My settings meant each core could access only 2G memory on average instead of the 8G which I mentioned in a previous email. I re-ran the job with 8G memory per core on average and there is no "Out Of Memory" error. I will do more tests to see if there is still some memory issue. Ok. I'd still like to know where the memory was being used since my estimates were off.
Thanks, Dave Regards, Frank On 07/11/2016 01:18 PM, Dave May wrote: Hi Frank, On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote: Hi Dave, I re-ran the test using bjacobi as the preconditioner on the coarse mesh of telescope. The grid is 3072*256*768 and the process mesh is 96*8*24. The petsc option file is attached. I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step, so I don't have the full info from ksp_view. The info from ksp_view_pre is attached. Okay - that is essentially useless (sorry) It seems to me that the error occurred when the decomposition was going to be changed. Based on what informat
Re: [petsc-users] Question about memory usage in Multigrid preconditioner
Hi, I want to continue digging into the memory problem here. I did find a workaround in the past, which is to use fewer cores per node so that each core has 8G memory. However this is inefficient and expensive. I hope to locate the place that uses the most memory. Here is a brief summary of the tests I did in the past: > Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 Maximum (over computational time) process memory: total 7.0727e+08 Current process memory: total 7.0727e+08 Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 Current space PetscMalloc()ed: total 1.8275e+09 > Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 Maximum (over computational time) process memory: total 5.9431e+09 Current process memory: total 5.9431e+09 Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 Current space PetscMalloc()ed: total 5.4844e+09 > Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve". I attached the output of ksp_view (the third test's output is from ksp_view_pre), memory_view and also the petsc options. In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M, using double precision. Considering some extra memory used to store integer indices, 2G memory should still be way enough. Is there a way to find out which part of KSPSolve uses the most memory? Thank you so much. BTW, there are 4 options that remain unused and I don't understand why they are omitted: -mg_coarse_telescope_mg_coarse_ksp_type value: preonly -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 -mg_coarse_telescope_mg_levels_ksp_type value: richardson Regards, Frank On 07/13/2016 05:47 PM, Dave May wrote: On 14 July 2016 at 01:07, frank <hengj...@uci.edu <mailto:hengj...@uci.edu>> wrote: Hi Dave, Sorry for the late reply.
Thank you so much for your detailed reply. I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is: 4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M? Did I do something wrong here? Because this seems too small. No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iPhone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly.) From the PETSc objects associated with the solver, it looks like it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over-allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for. I am running this job on Blue Waters <https://bluewaters.ncsa.illinois.edu/user-guide> I am using the 7-point FD stencil in 3D. I thought so on both counts. I apologize that I made a stupid mistake in computing the memory per core. My settings meant each core could access only 2G memory on average instead of the 8G which I mentioned in a previous email. I re-ran the job with 8G memory per core on average and there is no "Out Of Memory" error. I will do more tests to see if there is still some memory issue. Ok. I'd still like to know where the memory was being used since my estimates were off. Thanks, Dave Regards, Frank On 07/11/2016 01:18 PM, Dave May wrote: Hi Frank, On 11 July 2016 at 19:14, frank <hengj...@uci.edu <mailto:hengj...@uci.edu>> wrote: Hi Dave, I re-ran the test using bjacobi as the preconditioner on the coarse mesh of telescope. The grid is 3072*256*768 and the process mesh is 96*8*24. The petsc option file is attached. I still got the "Out Of Memory" error.
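Frank's per-process arithmetic checks out; the only wrinkle is Dave's unit note (divide by 1e6 for MB, by 1024^2 for MiB). A quick verification (my own, not from the thread):

```python
# Matrix values per rank for the operator discussed above:
# 4223139840 nonzeros spread over 18432 ranks, double precision.
nnz_total = 4223139840
ranks = 18432
bytes_per_rank = nnz_total * 8 // ranks
print(bytes_per_rank)               # 1832960 bytes
print(bytes_per_rank / 1e6)         # ~1.83 MB  (SI megabytes, Dave's convention)
print(bytes_per_rank / 1024**2)     # ~1.75 MiB (Frank's "1.74M", dividing by 1024^2)
```

Either way the values alone come to under 2 MB per rank, which is why the 2 GB-per-core OOM pointed at setup/over-allocation rather than the stored matrix.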
The error occurred before the linear solver finished one step. So I don't have the full info from ksp_view. The info from ksp_view_pre is attached. Okay - that is essentially useless (sorry) It seems to me that the error occurred when the decomposition was going to be changed. Based on what information? Running with -info would give us more clues, but will create a ton of output. Please try running the case which failed with -info I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison. Thank you. [3] Here is my crude estimate of y
Re: [petsc-users] Question about memory usage in Multigrid preconditioner
Hi Dave, Sorry for the late reply. Thank you so much for your detailed reply. I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is: 4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M? Did I do something wrong here? Because this seems too small. I am running this job on Blue Waters <https://bluewaters.ncsa.illinois.edu/user-guide> I am using the 7-point FD stencil in 3D. I apologize that I made a stupid mistake in computing the memory per core. My settings meant each core could access only 2G memory on average instead of the 8G which I mentioned in a previous email. I re-ran the job with 8G memory per core on average and there is no "Out Of Memory" error. I will do more tests to see if there is still some memory issue. Regards, Frank On 07/11/2016 01:18 PM, Dave May wrote: Hi Frank, On 11 July 2016 at 19:14, frank <hengj...@uci.edu <mailto:hengj...@uci.edu>> wrote: Hi Dave, I re-ran the test using bjacobi as the preconditioner on the coarse mesh of telescope. The grid is 3072*256*768 and the process mesh is 96*8*24. The petsc option file is attached. I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step, so I don't have the full info from ksp_view. The info from ksp_view_pre is attached. Okay - that is essentially useless (sorry) It seems to me that the error occurred when the decomposition was going to be changed. Based on what information? Running with -info would give us more clues, but will create a ton of output. Please try running the case which failed with -info I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison. Thank you. [3] Here is my crude estimate of your memory usage.
I'll target the biggest memory hogs only to get an order of magnitude estimate. * The fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI rank assuming double precision. The indices for the AIJ could amount to another 0.3 GB (assuming 32-bit integers). * You use 5 levels of coarsening, so the other operators should represent (collectively) 2.1/8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the communicator with 18432 ranks. The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 18432 ranks. * You use a reduction factor of 64, making the new communicator with 288 MPI ranks. PCTelescope will first gather a temporary matrix associated with your coarse level operator, assuming a comm size of 288 living on the comm with size 18432. This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 ranks. This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus requiring another 32 MB per rank. The temporary matrix is now destroyed. * Because a DMDA is detected, a permutation matrix is assembled. This requires 2 doubles per point in the DMDA. Your coarse DMDA contains 92 x 16 x 48 points. Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm. * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held in memory when the DMDA is provided. From my rough estimates, the worst-case memory footprint for any given core, given your options, is approximately 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB. This is way below 8 GB.
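Dave's tally can be reproduced mechanically. The sketch below (my own check of the arithmetic, under the same assumptions he states: a 2100 MB fine operator, 8x shrinkage per 3D coarsening, a factor-64 telescope reduction) confirms his internal sums:

```python
# Re-deriving the worst-case per-rank estimate quoted above.
fine_mb = 2100.0                       # fine operator + AIJ indices, per rank (as stated)

# Coarser Galerkin levels: each 3D coarsening shrinks the operator ~8x.
coarse_mb = sum(fine_mb / 8**k for k in range(1, 5))
print(f"{coarse_mb:.0f}")              # ~300 MB, matching the geometric-series figure

gathered_mb = 0.5 * 64                 # coarse op gathered by the factor-64 reduction
total_mb = fine_mb + coarse_mb + 2 * gathered_mb + 1   # +1 MB permutation matrix
print(f"{total_mb:.0f}")               # ~2465 MB, well below 8 GB per rank
```

The check only confirms the sums are internally consistent; as the thread later establishes, the 2100 MB fine-operator figure itself was the part that needed revisiting.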
Note this estimate completely ignores: (1) the memory required for the restriction operator, (2) the potential growth in the number of non-zeros per row due to Galerkin coarsening (I wish -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse level operators), (3) all temporary vectors required by the CG solver, and those required by the smoothers, (4) internal memory allocated by MatPtAP, (5) memory associated with IS's used within PCTelescope. So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others might examine/correct my rough estimates. Since I don't have your code I cannot assess the latter. Since I don't have access to the same machine you are running on, I think we need to take a step back. [1] What machine are you running on? Send me a URL if it's available [2] What discretization are you using? (I am guessing a scalar 7-point FD stencil) If it's a 7-point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, lightweight existing PETSc example, run on your machine at the same scale. This would hopefully enable us to correctly evaluate the
Re: [petsc-users] Question about memory usage in Multigrid preconditioner
Hi Barry and Dave, Thank both of you for the advice. @Barry I made a mistake in the file names in the last email. I attached the correct files this time. For all three tests, 'Telescope' is used as the coarse preconditioner.

== Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
Part of the memory usage: Vector 125 124 3971904 0.
Matrix 101 101 9462372 0.

== Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
Part of the memory usage: Vector 125 124 681672 0.
Matrix 101 101 1462180 0.

In theory, the memory usage in Test1 should be 8 times that of Test2. In my case, it is about 6 times.

== Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per process: 32*32*32
Here I get the out of memory error. I tried to use -mg_coarse jacobi. In this way, I don't need to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? The linear solver didn't work in this case. Petsc output some errors.

@Dave In test3, I use only one instance of 'Telescope'. On the coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. If I set the levels correctly, then on the last coarse mesh of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. On the last coarse mesh of 'Telescope', there is only one grid point per process. I still got the OOM error. The detailed petsc option file is attached. Thank you so much. Frank On 07/06/2016 02:51 PM, Barry Smith wrote: On Jul 6, 2016, at 4:19 PM, frank <hengj...@uci.edu> wrote: Hi Barry, Thank you for your advice. I tried three tests. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24. The linear solver is 'cg', the preconditioner is 'mg' and 'telescope' is used as the preconditioner at the coarse mesh. The system gives me the "Out of Memory" error before the linear system is completely solved. The info from '-ksp_view_pre' is attached. It seems to me that the error occurs when it reaches the coarse mesh. The 2nd test uses a grid of 1536*128*384 and the process mesh is 96*8*24.
The 3rd test uses the same grid but a different process mesh 48*4*12. Are you sure this is right? The total matrix and vector memory usage goes from 2nd test Vector 384 383 8,193,712 0. Matrix 103 103 11,508,688 0. to 3rd test Vector 384 383 1,590,520 0. Matrix 103 103 3,508,664 0. that is, the memory usage got smaller but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 also? If so that still doesn't explain it because the memory usage changed by a factor of 5 something for the vectors and 3 something for the matrices. The linear solver and petsc options in the 2nd and 3rd tests are the same as in the 1st test. The linear solver works fine in both tests. I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-memory_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add something to my code so I can use '-memory_info'? Sorry, my mistake - the option is -memory_view Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way. Barry In both tests the memory usage is not large. It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test. Is there a way to show how much memory it allocated? Frank On 07/05/2016 03:37 PM, Barry Smith wrote: Frank, You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far. Please run the problem that does fit with -memory_info; when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect.
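Barry's puzzle can be quantified (my own arithmetic on the figures he quotes, not part of the thread): with the same grid and 1/8 the processes the reported usage should grow roughly 8x, yet it moves by much smaller factors:

```python
# Ratios behind Barry's remark: the usage changed by "a factor of
# 5 something for the vectors and 3 something for the matrices"
# instead of the ~8x expected from using 1/8th the processes.
vec_2nd, vec_3rd = 8_193_712, 1_590_520
mat_2nd, mat_3rd = 11_508_688, 3_508_664
print(f"vectors:  {vec_2nd / vec_3rd:.2f}x")   # ~5.15x
print(f"matrices: {mat_2nd / mat_3rd:.2f}x")   # ~3.28x
```

Neither ratio matches the expected 8x (nor 1x, if the numbers were totals over all ranks), which is why Barry questions whether the two runs were truly comparable.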
You could also run with say half the grid spacing to see how the memory usage scales with the increase in grid points. Make the runs also with -log_view and send all the output from these options. Barry On Jul 5, 2016, at 5:23 PM, frank <hengj...@uci.edu> wrote: Hi, I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel. I chose to use 'Telescope' as the preconditioner on the coarse mesh for its good performance. The petsc options file is attached. The domain is a 3d box. It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I
Re: [petsc-users] Question about memory usage in Multigrid preconditioner
Hi Barry, Thank you for your advice. I tried three tests. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24. The linear solver is 'cg', the preconditioner is 'mg' and 'telescope' is used as the preconditioner at the coarse mesh. The system gives me the "Out of Memory" error before the linear system is completely solved. The info from '-ksp_view_pre' is attached. It seems to me that the error occurs when it reaches the coarse mesh. The 2nd test uses a grid of 1536*128*384 and the process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh 48*4*12. The linear solver and petsc options in the 2nd and 3rd tests are the same as in the 1st test. The linear solver works fine in both tests. I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-memory_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add something to my code so I can use '-memory_info'? In both tests the memory usage is not large. It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test. Is there a way to show how much memory it allocated? Frank On 07/05/2016 03:37 PM, Barry Smith wrote: Frank, You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far. Please run the problem that does fit with -memory_info; when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with say half the grid spacing to see how the memory usage scales with the increase in grid points. Make the runs also with -log_view and send all the output from these options.
Barry On Jul 5, 2016, at 5:23 PM, frank <hengj...@uci.edu> wrote: Hi, I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel. I chose to use 'Telescope' as the preconditioner on the coarse mesh for its good performance. The petsc options file is attached. The domain is a 3d box. It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of the grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using. Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code (except the linear solver) do not use much memory. So I doubt if there is something wrong with the linear solver. The error occurs before the linear system is completely solved so I don't have the info from ksp view. I am not able to reproduce the error with a smaller problem either. In addition, I tried to use block jacobi as the preconditioner with the same grid and same decomposition. The linear solver runs extremely slowly but there is no memory error. How can I diagnose what exactly causes the error? Thank you so much. Frank

KSP Object: 18432 MPI processes
  type: cg
  maximum iterations=1
  tolerances:  relative=1e-07, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
  type: mg
  PC has not been set up so information may be incomplete
    MG: type is MULTIPLICATIVE, levels=4 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level ---
    KSP Object: (mg_coarse_) 18432 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
      left preconditioning
      using DEFAULT norm type for convergence test
    PC Object: (mg_coarse_) 18432 MPI processes
      type: redundant
      PC has not been set up so information may be incomplete
        Redundant preconditioner: Not yet setup
  Down solver (pre-smoother) on level 1 ---
    KSP Object: (mg_levels_1_) 18432 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0., max = 0.
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 18432 MPI processes
      type: sor
      PC has not been set up so information may be incomplete
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 ---
    KSP Obj
[petsc-users] Question about memory usage in Multigrid preconditioner
Hi, I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel. I chose to use 'Telescope' as the preconditioner on the coarse mesh for its good performance. The petsc options file is attached. The domain is a 3d box. It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of the grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using. Each process has access to at least 8G of memory, which should be more than enough for my application. I am sure that all the other parts of my code (except the linear solver) do not use much memory, so I suspect there is something wrong with the linear solver. The error occurs before the linear system is completely solved, so I don't have the info from ksp_view. I am not able to reproduce the error with a smaller problem either. In addition, I tried to use block Jacobi as the preconditioner with the same grid and the same decomposition. The linear solver runs extremely slowly, but there is no memory error. How can I diagnose what exactly causes the error? Thank you so much.
Frank

-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 4
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left
-log_summary

# Setting dmdarepart on subcomm
-repart_da_processors_x 24
-repart_da_processors_y 2
-repart_da_processors_z 6
-mg_coarse_telescope_ksp_type preonly
#-mg_coarse_telescope_ksp_constant_null_space
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 4
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type svd
#-mg_coarse_telescope_mg_coarse_pc_type telescope
#-mg_coarse_telescope_mg_coarse_pc_telescope_reduction_factor 64

# Second subcomm
#-mg_coarse_telescope_mg_coarse_telescope_ksp_type preonly
#-mg_coarse_telescope_mg_coarse_telescope_pc_type mg
#-mg_coarse_telescope_mg_coarse_telescope_pc_mg_galerkin
#-mg_coarse_telescope_mg_coarse_telescope_pc_mg_levels 3
#-mg_coarse_telescope_mg_coarse_telescope_mg_levels_ksp_type richardson
#-mg_coarse_telescope_mg_coarse_telescope_mg_levels_ksp_max_it 1
#-mg_coarse_telescope_mg_coarse_telescope_mg_coarse_ksp_type richardson
#-mg_coarse_telescope_mg_coarse_telescope_mg_coarse_pc_type svd
Re: [petsc-users] Question about using Hypre with OpenMP under Petsc
Hi Barry,

Thank you for your prompt reply. Which executable or library should I run ldd on to check?

Thank you, Frank.

On 05/26/2015 02:41 PM, Barry Smith wrote:

On May 26, 2015, at 4:18 PM, frank hengj...@uci.edu wrote:

Hi, I am trying to use multigrid to solve a large sparse linear system. I use Hypre boomeramg as the preconditioner. The code calling KSPSolve is parallelized with MPI. I want to set Hypre to use OpenMP. Here is what I did:
* I downloaded and compiled Hypre through Petsc.
* I recompiled Hypre with --with-openmp.

Ok, you need to make sure that PETSc is linking against the OpenMP-compiled version of the hypre libraries. Use ldd on Linux or otool -L on a Mac.

* I set -pc_type hypre and -pc_hypre_type boomeramg for Petsc.

My questions:
? In this way, would Hypre use OpenMP to parallelize the execution when KSPSolve is called?
? If this does not work, is there another way I can set Hypre to use OpenMP under Petsc?
? Is there a way I can know explicitly whether Hypre is using OpenMP under Petsc or not?

Your question really has little to do with PETSc and more to do with hypre. You need to look through the hypre documentation and find out how you control the number of OpenMP threads that hypre uses (likely it is some environment variable). Then run varying the number of threads and see what happens: if you use more threads, does it go faster? It is best to make this test with just a single MPI process and 1, 2, 4, 8 OpenMP threads.

Barry

Thank you so much, Frank
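Barry's suggested experiment can be scripted as below. This is a sketch, not from the thread: `./app` and its arguments are placeholders for the actual PETSc executable, and OMP_NUM_THREADS is the standard OpenMP control variable, which hypre's OpenMP build honors:

```shell
# Check that the linked hypre library is the OpenMP-enabled build
# (on macOS use: otool -L ./app)
ldd ./app | grep -i hypre

# Single MPI process, increasing OpenMP thread counts: does it get faster?
for t in 1 2 4 8; do
    OMP_NUM_THREADS=$t mpiexec -n 1 ./app \
        -pc_type hypre -pc_hypre_type boomeramg -log_summary
done
```

Comparing the timings across thread counts (with one MPI rank, as Barry suggests) shows directly whether hypre is actually using the threads.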
[petsc-users] Question about using Hypre with OpenMP under Petsc
Hi, I am trying to use multigrid to solve a large sparse linear system. I use Hypre boomeramg as the preconditioner. The code calling KSPSolve is parallelized with MPI. I want to set Hypre to use OpenMP. Here is what I did:
* I downloaded and compiled Hypre through Petsc.
* I recompiled Hypre with --with-openmp.
* I set -pc_type hypre and -pc_hypre_type boomeramg for Petsc.

My questions:
? In this way, would Hypre use OpenMP to parallelize the execution when KSPSolve is called?
? If this does not work, is there another way I can set Hypre to use OpenMP under Petsc?
? Is there a way I can know explicitly whether Hypre is using OpenMP under Petsc or not?

Thank you so much, Frank
[petsc-users] Dense matrix solver
Hi, I am thinking of solving a linear system whose coefficient matrix has 27 nonzero diagonal bands (diagonally dominant). Does anybody have any idea how this will perform? Do you have any recommendation about which solver to choose? I have solved a matrix equation with 11 nonzero diagonal bands using boomeramg, which was 50% slower than the 7-band case. Thanks.
[petsc-users] Will matrix-free faster than solving linearized equation?
Hi, Currently I am solving a nonlinear equation with a linearization method. I am thinking of modifying it to use a nonlinear solver, and with the PETSc library I am confident I can do it. I just want to ask those who have experience with nonlinear solvers whether a matrix-free method would be faster. Thank you very much.
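For anyone wanting to try this cheaply: PETSc's SNES nonlinear solver can switch to matrix-free Jacobian-vector products purely from the command line, so the comparison can be made without code changes. A sketch of the relevant options:

```
-snes_mf           # fully matrix-free: Jacobian applied by finite differencing
                   # of the residual, no explicit matrix (and no preconditioner)
-snes_mf_operator  # matrix-free Jacobian action, but a user-provided (possibly
                   # approximate) matrix is still used to build the preconditioner
-snes_monitor      # watch the nonlinear convergence
-ksp_monitor       # watch the inner linear solves
```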
[petsc-users] How to reuse matrix A
Hi, I am solving Ax=b repeatedly, and A does not change between solves. I did things like this:

! 1. Set up entries for matrix A
CALL MATASSEMBLYBEGIN(A,MAT_FINAL_ASSEMBLY,IERR)
CALL MATASSEMBLYEND(A,MAT_FINAL_ASSEMBLY,IERR)
CALL MATSETOPTION(A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ierr)
CALL KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN,ierr)
DO i=1,100
   ! Set up entries for vector b
   CALL VECASSEMBLYBEGIN(b,IERR)
   CALL VECASSEMBLYEND(b,IERR)
   CALL KSPSolve(ksp,b,x,ierr)
ENDDO

However, it does not work and I don't know why. Could anybody help me with this? Thank you very much.

Sincerely, Xingjun Fang
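For comparison, here is a minimal sketch of the assemble-once, solve-many pattern in the PETSc Fortran interface of that era. It assumes `A`, `b`, `x`, and `ksp` were created and preallocated earlier (MatCreate, VecCreateMPI, KSPCreate), and that only the values of `b` change between solves; the KSPSetFromOptions call is an addition not present in the original post:

```fortran
! Assemble A once; the preconditioner is then reused across all solves.
call MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
call MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)
call KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN, ierr)  ! set once
call KSPSetFromOptions(ksp, ierr)  ! pick up -ksp_type, -pc_type, etc.

do i = 1, 100
   ! ... VecSetValues(b, ...) with the new right-hand side ...
   call VecAssemblyBegin(b, ierr)
   call VecAssemblyEnd(b, ierr)
   call KSPSolve(ksp, b, x, ierr)  ! reuses the set-up preconditioner
end do
```

Running with -ksp_converged_reason on each solve is a quick way to see whether the solver is converging at all or silently returning a bad solution.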
[petsc-users] Weird memory leakage
Hi, I have a very weird problem here. I am using FORTRAN to call PETSc to solve a Poisson equation. When I run my code with 8 cores, it works fine and the consumed memory does not increase. However, when it is run with 64 cores, first of all it gives lots of errors like this:

[n310:18951] [[62652,0],2] - [[62652,0],10] (node: n219) oob-tcp: Number of attempts to create TCP connection has been exceeded.  Can not communicate with peer
[n310:18951] [[62652,0],2] - [[62652,0],18] (node: n128) oob-tcp: Number of attempts to create TCP connection has been exceeded.  Can not communicate with peer
[n310:18951] [[62652,0],2] - [[62652,0],34] (node: n089) oob-tcp: Number of attempts to create TCP connection has been exceeded.  Can not communicate with peer
[n310:18951] [[62652,0],2] ORTED_CMD_PROCESSOR: STUCK IN INFINITE LOOP - ABORTING
[n310:18951] *** Process received signal ***
[n310:18951] Signal: Aborted (6)
[n310:18951] Signal code:  (-6)
[n310:18951] [ 0] /lib64/libpthread.so.0() [0x35b120f500]
[n310:18951] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x35b0e328a5]
[n310:18951] [ 2] /lib64/libc.so.6(abort+0x175) [0x35b0e34085]
[n310:18951] [ 3] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_daemon_cmd_processor+0x243) [0x2ae5e02f0813]
[n310:18951] [ 4] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_base_loop+0x31a) [0x2ae5e032f56a]
[n310:18951] [ 5] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_loop+0x12) [0x2ae5e032f242]
[n310:18951] [ 6] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_progress+0x5c) [0x2ae5e031845c]
[n310:18951] [ 7] /global/software/openmpi-1.6.1-intel1/lib/openmpi/mca_grpcomm_bad.so(+0x1bd7) [0x2ae5e28debd7]
[n310:18951] [ 8] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_ess_base_orted_finalize+0x1e) [0x2ae5e02f431e]
[n310:18951] [ 9] /global/software/openmpi-1.6.1-intel1/lib/openmpi/mca_ess_tm.so(+0x1294) [0x2ae5e1ab1294]
[n310:18951] [10]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_finalize+0x4e) [0x2ae5e02d0fbe]
[n310:18951] [11] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(+0x4840b) [0x2ae5e02f040b]
[n310:18951] [12] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_base_loop+0x31a) [0x2ae5e032f56a]
[n310:18951] [13] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_loop+0x12) [0x2ae5e032f242]
[n310:18951] [14] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_progress+0x5c) [0x2ae5e031845c]
[n310:18951] [15] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_trigger_event+0x50) [0x2ae5e02dc930]
[n310:18951] [16] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(+0x4916f) [0x2ae5e02f116f]
[n310:18951] [17] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_daemon_cmd_processor+0x149) [0x2ae5e02f0719]
[n310:18951] [18] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_base_loop+0x31a) [0x2ae5e032f56a]
[n310:18951] [19] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_loop+0x12) [0x2ae5e032f242]
[n310:18951] [20] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_dispatch+0x8) [0x2ae5e032f228]
[n310:18951] [21] /global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_daemon+0x9f0) [0x2ae5e02ef8a0]
[n310:18951] [22] orted(main+0x88) [0x4024d8]
[n310:18951] [23] /lib64/libc.so.6(__libc_start_main+0xfd) [0x35b0e1ecdd]
[n310:18951] [24] orted() [0x402389]
[n310:18951] *** End of error message ***

but the program still gives the right result for a short period. After that, it suddenly stops because memory exceeds some limit. I don't understand this. If there is a memory leak in my code, how come it can work with 8 cores? Please help me. Thank you so much!

Sincerely, Xingjun
[petsc-users] FORTRAN 90 with PETSc
Hi, I am using PETSc to iterate a problem, that is to say, I call KSPSolve repeatedly. At first I wrote all the PETSc components in one subroutine, including MatCreate, VecCreateMPI, etc., and everything worked fine. Then I wanted to initialize ksp only once, outside the loop, with the matrix and rhs changed within the loop repeatedly. Here are my problems:

1. I tried to use COMMON to transfer the following variables. I include petsc.var in the solver subroutine. It cannot be compiled.

petsc.var:
Vec x,b
Mat A
KSP ksp
PC pc
COMMON /MYPETSC/ x, b, A, ksp, pc

2. I defined the following in the main program:

PROGRAM MAIN
#include "finclude/petscsys.h"
#include "finclude/petscvec.h"
#include "finclude/petscmat.h"
#include "finclude/petscpc.h"
#include "finclude/petscksp.h"
Vec x,b
Mat A
KSP ksp
PC pc
..
CALL INIT_PETSC(ksp,pc,A,x,b)
..
CALL LOOP(ksp,pc,A,x,b)
END PROGRAM
!---
SUBROUTINE LOOP(ksp,pc,A,x,b)
Vec x,b
Mat A
KSP ksp
PC pc
..
CALL SOLVE(ksp,pc,A,x,b)
...
END SUBROUTINE
!---
SUBROUTINE SOLVE(ksp,pc,A,x,b)
Vec x,b
Mat A
KSP ksp
PC pc
..
CALL KSPSolve(ksp,b,x,ierr)
END SUBROUTINE

It can be compiled, but ksp does not iterate. Could you please explain to me the reason for this problem and how to solve it? Thank you very much.
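One common way to share PETSc objects between Fortran subroutines, instead of a COMMON block, is a module: every routine then sees the same handles with the correct PETSc types. A sketch (not from the thread, using the same `finclude/...` headers as the post; `solve` assumes `ksp`, `A`, `b`, and `x` were created and set up elsewhere):

```fortran
module mypetsc
#include "finclude/petscsys.h"
#include "finclude/petscvec.h"
#include "finclude/petscmat.h"
#include "finclude/petscksp.h"
  ! Shared PETSc objects, visible to every routine that does "use mypetsc"
  Vec x, b
  Mat A
  KSP ksp
end module mypetsc

subroutine solve(ierr)
  use mypetsc
  PetscErrorCode ierr
  ! A and b are assembled elsewhere; the same ksp is reused on every call
  call KSPSolve(ksp, b, x, ierr)
end subroutine solve
```

Note the includes must also appear in any subroutine that declares PETSc types locally (as the subroutines in item 2 above do), otherwise Vec, Mat, and KSP are undeclared there, which can make the compiler treat them as implicitly typed variables.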