Couple of things you can try:

* add --oversubscribe to your mpirun cmd line so it doesn’t care how many slots there are

* modify your MPI_Info to be “host”, “node0:22” so it thinks there are more slots available

It’s possible that the “host” info processing has a bug in it, but this will tell us a little more and hopefully get you running. If you want to bind your processes to cores, then add “--bind-to core” to the cmd line.
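Concretely, the second suggestion would change your spawn setup to something like this (just a sketch adapted from the call you quoted below; mpinfo, hostid, commc, commd, and sperr are the names from your own code):

    MPI_Info mpinfo;
    MPI_Info_create(&mpinfo);
    /* name the target host and claim 22 slots on it */
    MPI_Info_set(mpinfo, "host", "node0:22");
    MPI_Comm_spawn("andmsg", argv, 1, mpinfo,
                   hostid, commc, &commd, &sperr);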
> On Oct 6, 2017, at 1:35 PM, George Reeke <re...@mail.rockefeller.edu> wrote:
>
> Dear colleagues,
> I need some help controlling where a process spawned with
> MPI_Comm_spawn goes. I am in openmpi-1.10 under Centos 6.7.
> My application is written in C and am running on a RedBarn
> system with a master node (hardware box) that connects to the
> outside world and two other nodes connected to it via ethernet and
> Infiniband. There are two executable files, one (I'll call it
> "Rank0Pgm") that expects to be rank 0 and does all the I/O and
> the other ("RanknPgm") that only communicates via MPI messages.
> There are two MPI_Comm_spawns that run just after MPI_Init and
> an initial broadcast that shares some setup info, like this:
>    MPI_Comm_spawn("andmsg", argv, 1, MPI_INFO_NULL,
>       hostid, commc, &commd, &sperr);
> where "andmsg" is a program that needs to communicate with the
> internet and with all the other processes via a new communicator
> that will be called commd (and another name for the other one).
> When I run this program with no hostfile and an mpirun line
> something like this on a node with 32 cores:
>    /usr/lib64/openmpi-1.10/bin/mpirun -n 1 Rank0Pgm : -n 28 RanknPgm \
>       < InputFile
> everything works fine. I assume the spawns use 2 of the 3 available
> cores that I did not ask the program to use.
>
> Now I want to run on the full network, so I make a hostfile like this
> (call it "nodes120"):
>    node0 slots=22 max-slots=22
>    n0003 slots=40 max-slots=40
>    n0004 slots=56 max-slots=56
> where node0 has 24 cores and I am trying to leave room for my two
> spawned processes. The spawned processes have to be able to contact
> the internet, so I make an MPI_Info with MPI_Info_create and
>    MPI_Info_set(mpinfo, "host", "node0")
> and change the MPI_INFO_NULL in the spawn calls to point to this
> new MPI_Info. (If I leave the MPI_INFO_NULL I get a different
> error that is probably not of interest here.)
>
> Now I run the mpirun like above except now with
> "--hostfile nodes120" and "-n 116" after the colon. Now I get this
> error:
>
> "There are not enough slots available in the system to satisfy the 1
> slots that were requested by the application:
>    andmsg
> Either request fewer slots for your application, or make more slots
> available for use."
>
> I get the same error with "max-slots=24" on the first line of the
> hosts file.
>
> Sorry for the length of all that. Request for help: How do I set
> things up to run my rank 0 program and enough copies of RanknPgm to fill
> all but some number of cores on the master hardware node, and all the
> other rank n programs on the other hardware "nodes" (boxes of CPUs).
> [My application will do best with the default "by slot" scheduling.]
>
> Suggestions much appreciated. I am quite convinced my code is OK
> in that it runs OK as shown above on one hardware box. Also runs
> on my laptop with 4 cores and "-n 3 RanknPgm" so I guess I don't
> even really need to reserve cores for the two spawned processes.
> I thought of using old-fashioned 'fork' but I really want the
> extra communicators to keep asynchronous messages separated.
> The documentation says overloading is OK by default, so maybe
> something else is wrong here.
>
> George Reeke
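For reference, putting the --oversubscribe suggestion together with the hostfile and command line quoted above would give a launch roughly like this (untested; the option placement may need adjusting for your setup):

    /usr/lib64/openmpi-1.10/bin/mpirun --oversubscribe --hostfile nodes120 \
        -n 1 Rank0Pgm : -n 116 RanknPgm < InputFile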
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users