Pardon if this has been addressed already, but I could not find the answer after going through the OpenMPI FAQ and doing Google searches of the open-mpi.org site.
We are in the process of analyzing and troubleshooting MPI jobs of increasingly large scale (OpenMPI 1.6.5). At a sufficiently large scale (number of cores), a job will end up failing with errors similar to:

[yyyyy][[56933,1],1904][connect/btl_openib_connect_oob.c:867:rml_recv_cb] error in endpoint reply start connect
[xxxxx:29318] 853 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed

My educated guess is that we are running into a memory limitation when queue pairs are being created to support such a huge mesh. We are now investigating using the XRC transport to decrease memory consumption.

Anyway, my questions are:

1. How do we determine HOW MUCH memory is being pinned by an MPI job on a node? (If pmap, what exactly are we looking for?)
2. How do we determine WHERE these pinned memory regions are?

We are running RedHat 6.x.

Thank you!
--john