Hi, I’ve just upgraded one of my cluster nodes to Debian 9 and OpenMPI 2.0. Since then, running MPI jobs via SGE gives the following message in the output:
--------------------------------------------------------------------------
Failed to create a completion queue (CQ):

Hostname: node4
Requested CQE: 16384
Error: Cannot allocate memory

Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.

You job will continue, but Open MPI will ignore the "ud" oob component
in this run.

Hostname: node4
--------------------------------------------------------------------------

ulimit -l within the job script returns unlimited. When instead of using qsub, I use qrsh to log into the node and run the same script, I don’t get the error.

The nodes do have Infiniband and I do want to be able to use it, so I can’t ignore it.

Has anyone else encountered this? The only result I can find on Google is a bug in OpenMPI, which has been fixed (see https://github.com/open-mpi/ompi/issues/1301). The fact that it happens with qsub but not qrsh suggests to me that it’s a configuration problem with SGE on my end.

Many thanks,
Ed

=============================
Dr Ed Bennett
Department of Physics, Vivian Tower
Swansea University, Singleton Park
Swansea SA2 8PP
UK
=============================