It sounds like you have a problem with the physical layer of your InfiniBand
fabric. You should run layer 0 diagnostics and/or contact your IB vendor for
assistance.
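
Before running link diagnostics, it may help to tally which node pairs show up
in the openib error lines, so you know which ports and cables to look at first.
The following is only a minimal sketch: it assumes the error lines look like
the one quoted below, and the names tally_ib_errors and SAMPLE are hypothetical
helpers for illustration, not part of Open MPI. Other Open MPI versions may
format this line differently.

```python
import re
from collections import Counter

# Pattern for the btl_openib "error polling ... CQ" line as quoted in this
# thread. Assumption: the field order matches the quoted message exactly.
_ERR = re.compile(
    r"from (\S+) to: (\S+) error polling \S+ CQ with status (.+?) "
    r"status number (\d+) .*?vendor error (\d+)"
)


def tally_ib_errors(log_text):
    """Count openib completion errors per (source, destination, status)."""
    # Collapse all whitespace so that wrapped log lines are matched as one.
    flat = " ".join(log_text.split())
    counts = Counter()
    for src, dst, status, _number, _vendor in _ERR.findall(flat):
        counts[(src, dst, status)] += 1
    return counts


# The error line from the quoted message, wrapped as it appears in the email.
SAMPLE = """
[[64826,1],0][btl_openib_component.c:3497:handle_wc] from foner109 to:
foner111 error polling LP CQ with status LOCAL QP OPERATION ERROR status
number 2 for wr_id af58a8 opcode 128  vendor error 111 qp_idx 3
"""

if __name__ == "__main__":
    for (src, dst, status), n in sorted(tally_ib_errors(SAMPLE).items()):
        print(f"{src} -> {dst}: {status} x{n}")
```

If the same one or two hosts appear in most of the errors, those are reasonable
places to start the physical-layer checks.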


On Jun 24, 2014, at 4:48 AM, Diego Saúl Carrió Carrió <diego.car...@uib.es> 
wrote:

> Dear all,
> 
> I have had problems with mpirun for a long time. When I run mpirun
> (with my program), I get the following error after a while:
> 
> .
> .
> .
> .
> .
> 
>  mlx4: local QP operation err (QPN c00054, WQE index a0000, vendor syndrome 
> 6f, opcode = 5e)
> [[64826,1],0][btl_openib_component.c:3497:handle_wc] from foner109 to: 
> foner111 error polling LP CQ with status LOCAL QP OPERATION ERROR status 
> number 2 for wr_id af58a8 opcode 128  vendor error 111 qp_idx 3
> 
> mpirun has exited due to process rank 0 with PID 51754 on
> node foner109 exiting improperly. There are two reasons this could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> 
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> 
> 
> 
> I am using a cluster (42 nodes, each with 20 processors and 64 GB of RAM).
> I want to use, for example, only 20 nodes, so I run:
> 
> salloc -N20 --tasks-per-node=1 --cpus-per-task=20 -p thin    (thin is the name of the partition)
> 
> mpirun -pernode [my_program]
> 
> 
> Could you help me to solve this problem?
> 
> Best Regards,
> Diego
> 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/06/24692.php


-- 
Jeff Squyres
jsquy...@cisco.com
