Can you run with valgrind to determine if there is memory corruption? 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind

  Also check with Intel for any MPI updates.

  You can also try calling MatAssemblyBegin/End(mat,MAT_FLUSH_ASSEMBLY) several 
times while generating the matrix entries (this makes the individual messages 
smaller).  Warning: all processes have to call 
MatAssemblyBegin/End(mat,MAT_FLUSH_ASSEMBLY) the same number of times. If this 
"solves" the problem, then we know it is an issue with the MPI buffers.
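
  For illustration, here is a minimal sketch in plain C of one way to keep the 
number of flush calls identical on every rank. The helper names flush_rounds 
and round_range and the batch parameter are hypothetical (they are not part of 
PETSc). n_max is assumed to be the largest per-rank entry count, agreed on by 
all ranks beforehand (for example with an MPI_Allreduce using MPI_MAX). The 
PETSc calls appear only in comments, so this is not a complete assembly routine.

```c
#include <assert.h>
#include <stddef.h>

/* Number of flush rounds that EVERY rank performs.
 * max_local_entries: largest per-rank entry count (assumed to be the same
 *                    value on all ranks, e.g. from MPI_Allreduce + MPI_MAX).
 * batch:             entries inserted between two flushes (hypothetical
 *                    tuning knob, must be > 0). */
size_t flush_rounds(size_t max_local_entries, size_t batch)
{
    assert(batch > 0);
    return (max_local_entries + batch - 1) / batch; /* ceiling division */
}

/* Half-open range [*begin, *end) of local entries to insert in one round.
 * A rank that has run out of entries gets an empty range, but it still
 * takes part in the flush so that the call counts match. */
void round_range(size_t n_local, size_t batch, size_t round,
                 size_t *begin, size_t *end)
{
    size_t b = round * batch;
    size_t e = b + batch;
    *begin = (b < n_local) ? b : n_local;
    *end   = (e < n_local) ? e : n_local;
}

/* Intended use on each rank (PETSc calls shown as comments only):
 *
 *   rounds = flush_rounds(n_max, batch);
 *   for (r = 0; r < rounds; r++) {
 *       round_range(n_local, batch, r, &b, &e);
 *       // MatSetValuesBlocked(...) for local entries b .. e-1
 *       // MatAssemblyBegin(mat, MAT_FLUSH_ASSEMBLY);
 *       // MatAssemblyEnd(mat, MAT_FLUSH_ASSEMBLY);
 *   }
 *   // MatAssemblyBegin(mat, MAT_FINAL_ASSEMBLY);
 *   // MatAssemblyEnd(mat, MAT_FINAL_ASSEMBLY);
 */
```

  Because the round count depends only on n_max and batch, a rank with few 
entries simply flushes with nothing new to send in the later rounds.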

 
  Barry


> On Jan 22, 2015, at 9:17 AM, Antoine De Blois 
> <[email protected]> wrote:
> 
> Hi Everyone,
>  
> I get a strange error during a call to MatAssemblyBegin. The error message is 
> triggered by Intel MPI, as shown below. The error does not always occur, 
> which is even stranger.
> [333:node1179] unexpected disconnect completion event from [163:node1254]
> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
>  
> All ranks output the same error message with their own node number. I did a 
> bit of research and some say that MPICH2 solves this issue. Since our group 
> is keen on using Intel MPI, I would like to solve this issue at the root.
>  
> A few important points:
> ·  At the moment, we are assembling the matrix with a single 
> MatAssemblyBegin/End with MAT_FINAL_ASSEMBLY after doing MatSetValuesBlocked. 
> Could it be due to memory overflow in the buffers?
> ·  We are using -genv I_MPI_FABRICS shm:dapl in the submission script.
> ·  I tried using -malloc_log and -log_summary, but the crash prevents 
> the log output from being written.
>  
> Has any of you already faced this issue?
> Any suggestions are welcome.
> Best regards,
> Antoine DeBlois
> Specialiste ingenierie, MDO lead / Engineering Specialist, MDO lead
> Aéronautique / Aerospace
> 514-855-5001, x 50862
> [email protected]
> 
> 2351 Blvd Alfred-Nobel
> Montreal, Qc
> H4S 1A9
> 
