It sounds like you have a problem with the physical layer of your InfiniBand fabric. You should run layer 0 diagnostics and/or contact your IB vendor for assistance.
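
As a rough starting point for those diagnostics, here is a minimal sketch that runs a few standard tools from the infiniband-diags and libibverbs utilities on one node (for example foner109 or foner111 from the error below). It assumes those packages are installed; any tool that is missing is skipped. `run_if_present` is just a helper name made up for this example, and some of these tools usually need root privileges to query the fabric.

```shell
#!/bin/sh
# Run a diagnostic tool only if it is installed; otherwise report and move on.
# run_if_present is a hypothetical helper for this sketch, not part of any package.
run_if_present() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $tool =="
        "$@" || echo "$tool exited with status $?"
    else
        echo "skipping $tool: not installed"
    fi
}

run_if_present ibstat          # local HCA port state, physical state, and rate
run_if_present ibv_devinfo     # verbs-level view of the same devices and ports
run_if_present perfquery       # error counters (e.g. symbol errors) on the local port
run_if_present ibqueryerrors   # ports in the fabric reporting non-zero error counters
```

Ports that are not in the Active state, links running below their expected rate, or error counters that keep increasing between runs are the kind of thing worth passing on to your IB vendor.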

On Jun 24, 2014, at 4:48 AM, Diego Saúl Carrió Carrió <diego.car...@uib.es> wrote:

> Dear all,
>
> I have problems for a long time related with mpirun. When I executed mpirun
> (with my program) I obtained the next error after a while:
>
> .
> .
> .
> .
> .
>
> mlx4: local QP operation err (QPN c00054, WQE index a0000, vendor syndrome
> 6f, opcode = 5e)
> [[64826,1],0][btl_openib_component.c:3497:handle_wc] from foner109 to:
> foner111 error polling LP CQ with status LOCAL QP OPERATION ERROR status
> number 2 for wr_id af58a8 opcode 128 vendor error 111 qp_idx 3
>
> mpirun has exited due to process rank 0 with PID 51754 on
> node foner109 exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
>
>
> I am using a cluster (42 nodes, with 20 processors and 64 Gb RAM for each
> one). I want to use for example only 20 nodes, so I put:
>
> salloc -N20 --tasks-per-node=1 --cpus-per-task=20 -p thin(name of the node)
>
> mpirun -pernode [my_program]
>
>
> Could you help me to solve this problem?
>
> Best Regards,
> Diego

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/