Hello,

I am an employee of the UNH InterOperability Lab, and we are in the process of testing OFED-4.17-RC1 for the OpenFabrics Alliance. We have purchased some new hardware that has one processor, and we noticed an issue when running MPI jobs on nodes that do not have the same processor count. If we launch the MPI job from a node that has 2 processors, it fails to start the run, stating that there are not enough resources:

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 14 slots
that were requested by the application:
  IMB-MPI1

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

If we launch the MPI job from the node with one processor, without changing the mpirun command at all, it runs as expected.

Here is the command being run:

mpirun --mca btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1

Here is the hostfile being used:

farbauti-ce.ofa.iol.unh.edu slots=1
hyperion-ce.ofa.iol.unh.edu slots=1
io-ce.ofa.iol.unh.edu slots=1
jarnsaxa-ce.ofa.iol.unh.edu slots=1
rhea-ce.ofa.iol.unh.edu slots=1
tarqeq-ce.ofa.iol.unh.edu slots=1
tarvos-ce.ofa.iol.unh.edu slots=1

This seems like a bug, and we would like some help explaining and fixing what is happening. The IBTA plugfest saw similar behavior, so this should be reproducible.

Thanks,
Adam LeBlanc
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users