The problem is likely that most elements are being computed on the "wrong" MPI rank, so almost all of the matrix entries have to be "stashed" when they are computed and then shipped to the owning MPI rank during assembly. Please send ALL the output of a parallel run with -info so we can see how much communication is done in the matrix assembly.
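For reference, one way to collect that output (the executable name ./app and the log file name are placeholders; -info is the PETSc option Barry refers to) would be

    mpiexec -n 8 ./app -info > assembly_info.log 2>&1

On recent PETSc versions the output can also be narrowed to matrix-related messages with a filter such as -info :mat.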
Barry

> On Dec 12, 2022, at 6:16 AM, 김성익 <[email protected]> wrote:
>
> Hello,
>
> I need some keywords or examples for parallelizing the matrix assembly
> process.
>
> My current state is as follows.
> - Finite element analysis code for structural mechanics.
> - Problem size: 3D solid hexahedral elements (125,000 elements),
>   397,953 degrees of freedom.
> - Matrix type: seqaij, preallocated with MatSeqAIJSetPreallocation.
> - Matrix assembly time using 1 core: 120 sec
>
>       for (int i = 0; i < 125000; i++) {
>         /* element matrix calculation */
>       }
>       MatAssemblyBegin(...);
>       MatAssemblyEnd(...);
>
> - Matrix assembly time using 8 cores: 70,234 sec
>
>       int start, end;
>       VecGetOwnershipRange(element_vec, &start, &end);
>       for (int i = start; i < end; i++) {
>         /* element matrix calculation */
>       }
>       MatAssemblyBegin(...);
>       MatAssemblyEnd(...);
>
> As you can see, the parallel case takes far longer than the sequential case.
> How can I speed this up?
> Could you point me to keywords or examples for parallelizing matrix assembly
> in finite element analysis?
>
> Thanks,
> Hyung Kim
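As a rough illustration of the kind of assembly Barry is describing, here is a minimal, self-contained sketch (not the poster's code; it assumes a recent PETSc with the PetscCall() macro). The matrix size, the two-entry stand-in for an element matrix, and the preallocation counts are all placeholders; in a real FE code they come from the mesh connectivity. The key point is that each rank inserts values only into rows returned by MatGetOwnershipRange(), so MatSetValues() rarely has to stash entries for other ranks.

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat      A;
      PetscInt N = 1000, rstart, rend;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

      /* Create a parallel AIJ matrix; MATAIJ becomes seqaij on 1 rank
         and mpiaij on several ranks. */
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N));
      PetscCall(MatSetType(A, MATAIJ));

      /* Preallocate both variants; the call that does not match the actual
         type is ignored. Real counts come from the mesh connectivity. */
      PetscCall(MatSeqAIJSetPreallocation(A, 2, NULL));
      PetscCall(MatMPIAIJSetPreallocation(A, 2, NULL, 1, NULL));

      /* Rows [rstart, rend) are owned by this rank: loop only over work
         whose rows fall in that range so insertions stay local. */
      PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
      for (PetscInt i = rstart; i < rend; i++) {
        PetscScalar v[2]    = {2.0, -1.0};     /* stand-in "element" values */
        PetscInt    cols[2] = {i, (i + 1) % N};
        PetscCall(MatSetValues(A, 1, &i, 2, cols, v, ADD_VALUES));
      }

      /* Entries destined for other ranks are communicated (stashed) here;
         with the loop above there should be essentially none. */
      PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }

In the quoted code the matrix is created as seqaij and the element loop is split by VecGetOwnershipRange() on an auxiliary element vector, so the element partition need not line up with the matrix row ownership at all; if most computed values belong to rows owned by another rank, that is exactly the stashing situation Barry describes.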
