On Mon, Feb 27, 2017 at 9:06 AM, Lukas van de Wiel <[email protected]> wrote:
> Spreading the elements over the processors by sheer number is not
> automatically a safe method, depending on the mesh. Especially with
> irregular meshes, such as those created by Triangle or Gmsh, such a
> distribution will not reduce the amount of communication, and may even
> increase it.
>
> There are mature and well-tested partitioning tools available that can
> divide your mesh into regional partitions. We use Metis/ParMetis. I
> believe PETSc uses PTScotch.

We have interfaces to Chaco, Metis, ParMetis, Party, and PTScotch.

  Matt

> This is an extra step, but it will reduce
> the communication volume considerably.
>
> Cheers
> Lukas
>
> On 2/27/17, Mark Adams <[email protected]> wrote:
> > Another approach that might be simple, if you have the metadata for the
> > entire mesh locally, is to set up a list of the elements that your local
> > matrix block-rows/vertices touch by going over all the elements and
> > testing whether any of its vertices i satisfies:
> > if (i >= start && i < end) list.append(i). Just
> > compute and assemble those elements and tell PETSc to
> > ignore-off-processor-entries. No communication, redundant local work,
> > some setup code and cost.
> >
> > On Sun, Feb 26, 2017 at 11:37 PM, Fangbo Wang <[email protected]> wrote:
> >
> >> I got my finite element mesh from the commercial finite element software
> >> ABAQUS. I simply draw the geometry of the model in the graphical
> >> interface and assign element types and material properties to different
> >> parts of the model; ABAQUS then automatically outputs the element and
> >> node information of the model.
> >>
> >> Suppose I have 1000 elements in my model and 10 MPI processes,
> >> #1 to #100 local element matrices will be computed in MPI process 0;
> >> #101 to #200 local element matrices will be computed in MPI process 1;
> >> #201 to #300 local element matrices will be computed in MPI process 2;
> >> ..........
> >> #901 to #1000 local element matrices will be computed in MPI process 9;
> >>
> >> However, I might get a lot of global matrix indices which I need to send
> >> to other processors due to the degree of freedom ordering in the finite
> >> element model.
> >>
> >> This is what I did according to my understanding of finite elements and
> >> what I have seen.
> >> Do you have some nice libraries or packages that can be easily used in
> >> a scientific computing environment?
> >>
> >> Thank you very much!
> >>
> >> Fangbo Wang
> >>
> >> On Sun, Feb 26, 2017 at 11:15 PM, Barry Smith <[email protected]> wrote:
> >>
> >>> > On Feb 26, 2017, at 10:04 PM, Fangbo Wang <[email protected]> wrote:
> >>> >
> >>> > My problem is a solid mechanics problem using the finite element
> >>> > method to discretize the model (a 30m x 30m x 30m soil domain with a
> >>> > building structure on top).
> >>> >
> >>> > I am not manually deciding which MPI process computes which matrix
> >>> > entries, because I know PETSc can automatically communicate between
> >>> > these processors.
> >>> > I am just asking each MPI process to generate a certain number of
> >>> > matrix entries regardless of which process will finally store them.
> >>>
> >>> The standard way to handle this for finite elements is to partition
> >>> the elements among the processes and then partition the nodes (rows of
> >>> the degrees of freedom) subservient to the partitioning of the elements.
> >>> Otherwise most of the matrix (or vector) entries must be communicated
> >>> and this is not scalable.
> >>>
> >>> So how are you partitioning the elements (for matrix stiffness
> >>> computations) and the nodes between processes?
> >>>
> >>> >
> >>> > Actually, I constructed another matrix with the same size but
> >>> > generating far fewer entries, and the code worked. However, it gets
> >>> > stuck when I generate more matrix entries.
> >>> >
> >>> > Thank you very much! Any suggestion is highly appreciated.
> >>> >
> >>> > BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the
> >>> > ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use
> >>> > CompressedRow routines."? I know compressed row format is commonly
> >>> > used for sparse matrices, so why not use compressed row routines here?
> >>>
> >>> This is not important.
> >>>
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Fangbo Wang
> >>> >
> >>> > On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <[email protected]> wrote:
> >>> >
> >>> > How are you generating the matrix entries in parallel? In general you
> >>> > can generate any matrix entries on any MPI process and they will be
> >>> > automatically transferred to the MPI process that owns the entries.
> >>> > BUT if a huge number of matrix entries are computed on one process
> >>> > and need to be communicated to another process this may cause
> >>> > gridlock with MPI. Based on the huge size of messages from process 12
> >>> > it looks like this is what is happening in your code.
> >>> >
> >>> > Ideally most matrix entries are generated on the process where they
> >>> > are stored and hence this gridlock does not happen.
> >>> >
> >>> > What type of discretization are you using? Finite differences, finite
> >>> > element, finite volume, spectral, something else? How are you deciding
> >>> > which MPI process should compute which matrix entries? Once we
> >>> > understand this we may be able to suggest a better way to compute the
> >>> > entries.
> >>> >
> >>> > Barry
> >>> >
> >>> > Under normal circumstances 1.3 million unknowns is not a large
> >>> > parallel matrix; there may be special features of your matrix that
> >>> > are making this difficult.
> >>> >
> >>> > > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <[email protected]> wrote:
> >>> > >
> >>> > > Hi,
> >>> > >
> >>> > > I construct a big matrix which is 1.3 million by 1.3 million and
> >>> > > uses approximately 100GB of memory. I have a computer with 500GB of
> >>> > > memory.
> >>> > >
> >>> > > I run the PETSc program and it gets stuck when finally assembling
> >>> > > the matrix. The program is using only around 200GB of memory.
> >>> > > However, the program just gets stuck there. Here is the output
> >>> > > message when it gets stuck.
> >>> > > .
> >>> > > .
> >>> > > previous outputs not shown here
> >>> > > .
> >>> > > [12] MatStashScatterBegin_Ref(): No of messages: 15
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes
> >>> > > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.
> >>> > > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.
> >>> > > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.
> >>> > > [13] MatStashScatterBegin_Ref(): No of messages: 15
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes
> >>> > > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.
> >>> > > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used
> >>> > > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >>> > > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >>> > > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> >>> > > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines
> >>> > > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used
> >>> > > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >>> > > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >>> > > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> >>> > > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines
> >>> > >
> >>> > > stuck here!!!!
> >>> > >
> >>> > > Does anyone have ideas on this? Thank you very much!
> >>> > >
> >>> > > Fangbo Wang
> >>> > >
> >>> > > --
> >>> > > Fangbo Wang, PhD student
> >>> > > Stochastic Geomechanics Research Group
> >>> > > Department of Civil, Structural and Environmental Engineering
> >>> > > University at Buffalo
> >>> > > Email: [email protected]
> >>> >
> >>> > --
> >>> > Fangbo Wang, PhD student
> >>> > Stochastic Geomechanics Research Group
> >>> > Department of Civil, Structural and Environmental Engineering
> >>> > University at Buffalo
> >>> > Email: [email protected]
> >>>
> >>
> >> --
> >> Fangbo Wang, PhD student
> >> Stochastic Geomechanics Research Group
> >> Department of Civil, Structural and Environmental Engineering
> >> University at Buffalo
> >> Email: [email protected]
> >>
> >
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
