I confess I am sorely puzzled. I replaced the Info key with MPI_INFO_NULL, but I still had to pass a bogus argument to master since you still have the Info_set code in there - otherwise, MPI_Info_set segfaults on the NULL argv[1]. Doing that (and replacing "hostname" with an MPI example code) makes everything work just fine.
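Just to spell out what I mean, something along these lines - a minimal sketch only, where "./worker" and the argv[1] hostfile path are placeholders rather than your actual code. The point is that the Info object is optional, and MPI_Info_set must never be handed a NULL value:

/* sketch: spawn with an optional hostfile Info key */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm child;
    MPI_Info info = MPI_INFO_NULL;

    MPI_Init(&argc, &argv);

    if (argc > 1 && NULL != argv[1]) {
        /* Only build an Info object when a hostfile was actually supplied,
         * so MPI_Info_set never sees a NULL value. */
        MPI_Info_create(&info);
        MPI_Info_set(info, "hostfile", argv[1]);   /* the "host" key is used the same way */
    }

    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 3, info, 0,
                   MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);

    if (MPI_INFO_NULL != info) {
        MPI_Info_free(&info);
    }
    MPI_Comm_disconnect(&child);
    MPI_Finalize();
    return 0;
}

With the key removed (info left as MPI_INFO_NULL) and the default hostfile supplied via the environment variables discussed below, the spawn works; the hostfile/host Info key path is the one that currently segfaults.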
I've attached one of our example comm_spawn codes that we test against - it also works fine with the current head of the 1.8 code base. I confess that some changes have been made since 1.8.4 was released, and it is entirely possible that this was a problem in 1.8.4 and has since been fixed. So I'd suggest trying the nightly 1.8 tarball and seeing if it works for you. You can download it from here:

http://www.open-mpi.org/nightly/v1.8/

HTH
Ralph

On Tue, Feb 3, 2015 at 6:20 PM, Evan Samanas <evan.sama...@gmail.com> wrote:

> Yes, I did. I replaced the info argument of MPI_Comm_spawn with
> MPI_INFO_NULL.
>
> On Tue, Feb 3, 2015 at 5:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> When running your comm_spawn code, did you remove the Info key code? You
>> wouldn't need to provide a hostfile or hosts any more, which is why it
>> should resolve that problem.
>>
>> I agree that providing either hostfile or host as an Info key will cause
>> the program to segfault - I'm working on that issue.
>>
>> On Tue, Feb 3, 2015 at 3:46 PM, Evan Samanas <evan.sama...@gmail.com> wrote:
>>
>>> Setting these environment variables did indeed change the way mpirun
>>> maps things, and I didn't have to specify a hostfile. However, setting
>>> these for my MPI_Comm_spawn code still resulted in the same segmentation
>>> fault.
>>>
>>> Evan
>>>
>>> On Tue, Feb 3, 2015 at 10:09 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> If you add the following to your environment, you should run on
>>>> multiple nodes:
>>>>
>>>> OMPI_MCA_rmaps_base_mapping_policy=node
>>>> OMPI_MCA_orte_default_hostfile=<your hostfile>
>>>>
>>>> The first tells OMPI to map-by node. The second passes in your default
>>>> hostfile so you don't need to specify it as an Info key.
>>>>
>>>> HTH
>>>> Ralph
>>>>
>>>> On Tue, Feb 3, 2015 at 9:23 AM, Evan Samanas <evan.sama...@gmail.com> wrote:
>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> Good to know you've reproduced it. I was experiencing this using both
>>>>> the hostfile and host keys. A simple comm_spawn was working for me as well,
>>>>> but it was only launching locally, and I'm pretty sure each node only has 4
>>>>> slots given past behavior (the mpirun -np 8 example I gave in my first
>>>>> email launches on both hosts). Is there a way to specify the hosts I want
>>>>> to launch on without the hostfile or host key so I can test remote launch?
>>>>>
>>>>> And to the "hostname" response...no wonder it was hanging! I just
>>>>> constructed that as a basic example. In my real use I'm launching
>>>>> something that calls MPI_Init.
>>>>>
>>>>> Evan
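Regarding the "hostname" point in the quoted exchange: the parent's MPI_Comm_spawn cannot complete until the spawned processes call MPI_Init and connect back to the spawning job, so spawning a non-MPI executable like hostname just hangs. A minimal sketch of what a spawn target needs to do (illustrative only - the attached example below is the full version):

/* sketch: bare minimum for a program launched via MPI_Comm_spawn */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent;

    MPI_Init(&argc, &argv);         /* the parent's spawn blocks until the children get here */
    MPI_Comm_get_parent(&parent);   /* intercommunicator back to the spawning job */
    if (MPI_COMM_NULL != parent) {
        /* ...exchange whatever data the parent expects here... */
        MPI_Comm_disconnect(&parent);
    }
    MPI_Finalize();
    return 0;
}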
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    int msg, rc;
    MPI_Comm parent, child;
    int rank, size;
    char hostname[512];
    pid_t pid;

    pid = getpid();
    printf("[pid %ld] starting up!\n", (long)pid);

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("%d completed MPI_Init\n", rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_get_parent(&parent);

    /* If we get COMM_NULL back, then we're the parent */
    if (MPI_COMM_NULL == parent) {
        pid = getpid();
        printf("Parent [pid %ld] about to spawn!\n", (long)pid);
        if (MPI_SUCCESS != (rc = MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 3,
                                                MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                                                &child, MPI_ERRCODES_IGNORE))) {
            printf("Child failed to spawn\n");
            return rc;
        }
        printf("Parent done with spawn\n");
        if (0 == rank) {
            msg = 38;
            printf("Parent sending message to child\n");
            MPI_Send(&msg, 1, MPI_INT, 0, 1, child);
        }
        MPI_Comm_disconnect(&child);
        printf("Parent disconnected\n");
    }
    /* Otherwise, we're the child */
    else {
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        gethostname(hostname, 512);
        pid = getpid();
        printf("Hello from the child %d of %d on host %s pid %ld\n",
               rank, 3, hostname, (long)pid);
        if (0 == rank) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 1, parent, MPI_STATUS_IGNORE);
            printf("Child %d received msg: %d\n", rank, msg);
        }
        MPI_Comm_disconnect(&parent);
        printf("Child %d disconnected\n", rank);
    }

    MPI_Finalize();
    fprintf(stderr, "%ld: exiting\n", (long)pid);
    return 0;
}