Hi everyone,

I get a strange error during a call to MatAssemblyBegin. The error message is raised by Intel MPI, as shown below. Stranger still, the error does not always occur.

  [333:node1179] unexpected disconnect completion event from [163:node1254]
  Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0

All ranks output the same error message, each with its own node number. I did a bit of research, and some say that MPICH2 solves this issue. Since our group is keen on using Intel MPI, I would like to solve this issue at the root.

A few important points:

· At the moment, we assemble the matrix with a single MatAssemblyBegin/End pair using MAT_FINAL_ASSEMBLY, after all the MatSetValuesBlocked calls. Could the error be due to memory overflow in the buffers?
· We are using -genv I_MPI_FABRICS shm:dapl in the submission script.
· I tried using -malloc_log and -log_summary, but the crash prevents the log output from being written.

Has anyone already faced this issue? Any suggestion is welcome.

Best regards,

Antoine DeBlois
Engineering Specialist, MDO lead
Aerospace
514-855-5001, x 50862
[email protected]
2351 Blvd Alfred-Nobel
Montreal, Qc H4S 1A9
