Hello,
I have a FEM code that works with several meshes. We handle interactions/contacts between bodies/meshes by assembling coupling terms between the "contact" nodes of the meshes.

I have a very large bandwidth: the numbering of the whole problem is done mesh by mesh. My problem is of size 4*N, where N is the total number of nodes and N = N1 + N2 + .. + Nq, with Nq the number of nodes of mesh q. Nodes of mesh q are numbered from N1+N2+..+N(q-1)+1 to N1+N2+..+Nq. Typically N is about 100000 to 1000000.

The matrix is an MPIBAIJ one, and the d_nnz and o_nnz info are specified when it is created. It is filled using MatSetValuesBlockedLocal in ADD_VALUES mode. At each increment of my time-stepping scheme, the connections between mesh nodes may change and I have to rebuild the matrix.

It appears that the CPU time required for the first matrix assembly is very large (three to four times the CPU time for one system solve) and depends on the number of meshes: if I have only one mesh of an equivalent size, the assembly CPU time remains almost zero.

So I wonder what is causing the assembly to take so long? I expected the system solve to take longer because of my large bandwidth, but I don't understand why it is the matrix assembly that takes so long.

I have investigated using MatGetInfo, but everything seems to be correct: the number of mallocs during MatSetValues is always zero, and the ratio of nonzeros used to nonzeros allocated is between 1% and 10% (the same ratio as when I have only one mesh).

I have also tested a simple SOR preconditioner instead of ILU, wondering whether it was the preconditioner setup that took so long because of the bandwidth, but it does not change anything!

Thanks a lot for any remarks or tips.

Best regards,
Etienne Perchat
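
P.S. To make the mesh-by-mesh numbering described above concrete, here is a minimal sketch in C. It is only an illustration, not my actual code: the helper names (mesh_first_node, mesh_last_node, scalar_row) are hypothetical, and the 4 unknowns per node are an assumption inferred from the 4*N problem size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 unknowns per node, inferred from the 4*N problem size. */
#define DOFS_PER_NODE 4

/* 1-based global number of the first node of mesh q (q is 0-based here):
 * N1 + N2 + .. + N(q-1) + 1 in the notation of the post. */
long mesh_first_node(const long *nodes_per_mesh, size_t q)
{
    long first = 1;
    for (size_t k = 0; k < q; ++k) {
        assert(nodes_per_mesh[k] >= 0);
        first += nodes_per_mesh[k];
    }
    return first;
}

/* 1-based global number of the last node of mesh q:
 * N1 + N2 + .. + Nq in the notation of the post. */
long mesh_last_node(const long *nodes_per_mesh, size_t q)
{
    return mesh_first_node(nodes_per_mesh, q) + nodes_per_mesh[q] - 1;
}

/* 0-based scalar row of unknown `dof` at the 1-based global node `node`. */
long scalar_row(long node, int dof)
{
    assert(node >= 1 && dof >= 0 && dof < DOFS_PER_NODE);
    return (node - 1) * DOFS_PER_NODE + dof;
}
```

With this numbering, a coupling term between a contact node of the first mesh and a contact node of the last mesh sits far from the diagonal, which is where the large bandwidth comes from.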
