Hi,

I’ve just upgraded one of my cluster nodes to Debian 9 and OpenMPI 2.0. Since 
then, running MPI jobs via SGE gives the following message in the output:

--------------------------------------------------------------------------
Failed to create a completion queue (CQ):

Hostname: node4
Requested CQE: 16384
Error:    Cannot allocate memory

Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly.  This may
indicate a problem on this system.

You job will continue, but Open MPI will ignore the "ud" oob component
in this run.

Hostname: node4
--------------------------------------------------------------------------

Running ulimit -l within the job script returns "unlimited".

If I instead use qrsh to log into the node and run the same script there, I 
don’t get the error. The nodes do have Infiniband and I do want to be able to 
use it, so I can’t just ignore the message.
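In case it helps anyone compare, here is a minimal sketch (assuming Python 3 is available on the nodes) that reports the locked-memory limit from inside a process, in the same units as ulimit -l. The helper name memlock_limit_kb is hypothetical. Running it once under qsub and once under qrsh, e.g. launched through mpirun so it reports what the MPI ranks actually inherit, may show whether they differ from the job-script shell.

```python
import resource
import socket


def memlock_limit_kb():
    """Return the (soft, hard) RLIMIT_MEMLOCK limits as strings.

    Values are in kbytes, to match `ulimit -l`, or "unlimited".
    (Hypothetical helper for illustration only.)
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value):
        # RLIM_INFINITY is how "no limit" is reported
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        # getrlimit reports bytes; ulimit -l reports kbytes
        return str(value // 1024)

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = memlock_limit_kb()
    print(f"{socket.gethostname()}: memlock soft={soft} hard={hard}")
```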

Has anyone else encountered this? The only result I can find on Google is a bug 
in OpenMPI, which has since been fixed (see 
https://github.com/open-mpi/ompi/issues/1301). The fact that it happens with 
qsub but not qrsh suggests to me that it’s a configuration problem with SGE on 
my end.

Many thanks,

Ed


=============================
Dr Ed Bennett
Department of Physics, Vivian Tower
Swansea University, Singleton Park
Swansea SA2 8PP UK
=============================

_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss