Another approach that might be simple, if you have the metadata for the entire mesh locally, is to set up a list of the elements that touch your local matrix block-rows/vertices by going over all the elements and testing whether any of an element's vertices i falls in your range: if (i >= start && i < end) list.append(element). Just compute and assemble those elements and tell PETSc to ignore off-processor entries with MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE). This needs no communication; the price is some redundant local work, plus some setup code and cost.
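To make the element test concrete, here is a minimal sketch in plain C, not taken from anyone's code in this thread. The function name select_local_elements and the flat connectivity array conn are made up for illustration, and it assumes one matrix row per vertex; with several degrees of freedom per vertex you would compare block-row indices instead. The range [start, end) is what MatGetOwnershipRange() would give you.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Collect the elements that touch at least one locally owned row.
 *
 * conn            : element connectivity, nelem * nverts_per_elem global
 *                   vertex ids (hypothetical flat layout)
 * start, end      : locally owned range [start, end)
 * list            : output array with room for nelem element ids
 * returns         : number of element ids written to list
 *
 * Assumption: one matrix row per vertex.
 */
size_t select_local_elements(const int *conn, size_t nelem,
                             size_t nverts_per_elem,
                             int start, int end, int *list)
{
    size_t count = 0;

    assert(conn != NULL && list != NULL && nverts_per_elem > 0);

    for (size_t e = 0; e < nelem; ++e) {
        for (size_t v = 0; v < nverts_per_elem; ++v) {
            int i = conn[e * nverts_per_elem + v];
            if (i >= start && i < end) {
                /* Record the element once, then move on to the next one. */
                list[count++] = (int)e;
                break;
            }
        }
    }
    return count;
}
```

Each process would then compute only the elements in its list and call MatSetValues() with the full element matrix. With MAT_IGNORE_OFF_PROC_ENTRIES set, the entries for rows owned by other processes are dropped instead of being stashed and sent.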

On Sun, Feb 26, 2017 at 11:37 PM, Fangbo Wang <[email protected]> wrote:

> I got my finite element mesh from a commercial finite element software
> ABAQUS. I simply draw the geometry of the model in the graphical interface
> and assign element types and material properties to different parts of the
> model, ABAQUS will automatically output the element and node information of
> the model.
>
> Suppose I have 1000 elements in my model and 10 MPI processes,
> #1 to #100 local element matrices will be computed in MPI process 0;
> #101 to #200 local element matrices will be computed in MPI process 1;
> #201 to #300 local element matrices will be computed in MPI process 2;
> ..........
> #901 to #1000 local element matrices will be computed in MPI process 9;
>
> However, I might get a lot of global matrix indices which I need to send
> to other processors due to the degree of freedom ordering in the finite
> element model.
>
> This is what I did according to my understanding of finite element and
> what I have seen.
> Do you have some nice libraries or packages that can be easily used in
> scientific computing environment?
>
> Thank you very much!
>
> Fangbo Wang
>
> On Sun, Feb 26, 2017 at 11:15 PM, Barry Smith <[email protected]> wrote:
>
>> > On Feb 26, 2017, at 10:04 PM, Fangbo Wang <[email protected]> wrote:
>> >
>> > My problem is a solid mechanics problem using finite element method to
>> > discretize the model ( a 30mX30mX30m soil domain with a building
>> > structure on top).
>> >
>> > I am not manually deciding which MPI process compute which matrix
>> > enties. Because I know Petsc can automaticaly communicate between these
>> > processors.
>> > I am just asking each MPI process generate certain number of matrix
>> > entries regardless of which process will finally store them.
>>
>> The standard way to handle this for finite elements is to partition the
>> elements among the processes and then partition the nodes (rows of the
>> degrees of freedom) subservient to the partitioning of the elements.
>> Otherwise most of the matrix (or vector) entries must be communicated and
>> this is not scalable.
>>
>> So how are you partitioning the elements (for matrix stiffness
>> computations) and the nodes between processes?
>>
>> > Actually, I constructed another matrix with same size but generating
>> > much less entries, and the code worked. However, it gets stuck when I
>> > generate more matrix entries.
>> >
>> > thank you very much! Any suggestion is highly appreciated.
>> >
>> > BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the
>> > ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use
>> > CompressedRow routines."? I know compressed row format is commonly used
>> > for sparse matrix, why don't use compressed row routines here?
>>
>> This is not important.
>>
>> > Thanks,
>> >
>> > Fangbo Wang
>> >
>> > On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <[email protected]> wrote:
>> >
>> > How are you generating the matrix entries in parallel? In general you
>> > can generate any matrix entries on any MPI process and they will be
>> > automatically transferred to the MPI process that owns the entries
>> > automatically. BUT if a huge number of matrix entries are computed on
>> > one process and need to be communicated to another process this may
>> > cause gridlock with MPI. Based on the huge size of messages from
>> > process 12 it looks like this is what is happening in your code.
>> >
>> > Ideally most matrix entries are generated on the process they are
>> > stored and hence this gridlock does not happen.
>> >
>> > What type of discretization are you using? Finite differences, finite
>> > element, finite volume, spectral, something else? How are you deciding
>> > which MPI process should compute which matrix entries? Once we
>> > understand this we may be able to suggest a better way to compute the
>> > entries.
>> >
>> > Barry
>> >
>> > Under normally circumstances 1.3 million unknowns is not a large
>> > parallel matrix, there may be special features of your matrix that is
>> > making this difficult.
>> >
>> > > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <[email protected]> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I construct a big matrix which is 1.3million by 1.3million which is
>> > > using approximatly 100GB memory. I have a computer with 500GB memory.
>> > >
>> > > I run the Petsc program and it get stuck when finally assembling the
>> > > matrix. The program is using around 200GB memory only. However, the
>> > > program just get stuck there. Here is the output message when it gets
>> > > stuck.
>> > > .
>> > > .
>> > > previous outputs not shown here
>> > > .
>> > > [12] MatStashScatterBegin_Ref(): No of messages: 15
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes
>> > > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.
>> > > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.
>> > > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.
>> > > [13] MatStashScatterBegin_Ref(): No of messages: 15
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes
>> > > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.
>> > > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used
>> > > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>> > > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
>> > > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
>> > > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines
>> > > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used
>> > > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>> > > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
>> > > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
>> > > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines
>> > >
>> > > stuck here!!!!
>> > >
>> > > Any one have ideas on this? Thank you very much!
>> > >
>> > > Fangbo Wang
>> > >
>> > > --
>> > > Fangbo Wang, PhD student
>> > > Stochastic Geomechanics Research Group
>> > > Department of Civil, Structural and Environmental Engineering
>> > > University at Buffalo
>> > > Email: [email protected]
>> >
>> > --
>> > Fangbo Wang, PhD student
>> > Stochastic Geomechanics Research Group
>> > Department of Civil, Structural and Environmental Engineering
>> > University at Buffalo
>> > Email: [email protected]
>
> --
> Fangbo Wang, PhD student
> Stochastic Geomechanics Research Group
> Department of Civil, Structural and Environmental Engineering
> University at Buffalo
> Email: [email protected]
