Okay, I tracked this down - thanks for your patience! I have a fix pending review. You can track it here:
https://github.com/open-mpi/ompi-release/pull/179

> On Feb 4, 2015, at 5:14 PM, Evan Samanas <evan.sama...@gmail.com> wrote:
>
> Indeed, I simply commented out all the MPI_Info stuff, which you essentially did by passing a dummy argument. I'm still not able to get it to succeed.
>
> So here we go, my results defy logic. I'm sure this could be my fault... I've only been an occasional user of OpenMPI and MPI in general over the years, and I've never used MPI_Comm_spawn before this project. I tested simple_spawn like so:
>
>   mpicc simple_spawn.c -o simple_spawn
>   ./simple_spawn
>
> When my default hostfile points to a file that just lists localhost, this test completes successfully. If it points to my hostfile with localhost and 5 remote hosts, here's the output:
>
> evan@lasarti:~/devel/toy_progs/mpi_spawn$ mpicc simple_spawn.c -o simple_spawn
> evan@lasarti:~/devel/toy_progs/mpi_spawn$ ./simple_spawn
> [pid 5703] starting up!
> 0 completed MPI_Init
> Parent [pid 5703] about to spawn!
> [lasarti:05703] [[14661,1],0] FORKING HNP: orted --hnp --set-sid --report-uri 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca ess_base_jobid 960823296
> [lasarti:05705] *** Process received signal ***
> [lasarti:05705] Signal: Segmentation fault (11)
> [lasarti:05705] Signal code: Address not mapped (1)
> [lasarti:05705] Failing at address: (nil)
> [lasarti:05705] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fc185dcf340]
> [lasarti:05705] [ 1] /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_compute_bindings+0x650)[0x7fc186033bb0]
> [lasarti:05705] [ 2] /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x939)[0x7fc18602fb99]
> [lasarti:05705] [ 3] /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x6e4)[0x7fc18577dcc4]
> [lasarti:05705] [ 4] /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_daemon+0xdf8)[0x7fc186010438]
> [lasarti:05705] [ 5] orted(main+0x47)[0x400887]
> [lasarti:05705] [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc185a1aec5]
> [lasarti:05705] [ 7] orted[0x4008db]
> [lasarti:05705] *** End of error message ***
>
> You can see from the message that this particular run IS from the latest snapshot, though the failure happens on v1.8.4 as well. I didn't bother installing the snapshot on the remote nodes, though. Should I do that? It looked to me like this error happened well before we got to a remote node, so that's why I didn't.
>
> Your thoughts?
>
> Evan
>
> On Tue, Feb 3, 2015 at 7:40 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I confess I am sorely puzzled. I replaced the Info key with MPI_INFO_NULL, but still had to pass a bogus argument to master since you still have the Info_set code in there - otherwise, info_set segfaults due to a NULL argv[1]. Doing that (and replacing "hostname" with an MPI example code) makes everything work just fine.
>
> I've attached one of our example comm_spawn codes that we test against - it also works fine with the current head of the 1.8 code base. I confess that some changes have been made since 1.8.4 was released, and it is entirely possible that this was a problem in 1.8.4 and has since been fixed.
>
> So I'd suggest trying with the nightly 1.8 tarball and seeing if it works for you. You can download it from here:
>
> http://www.open-mpi.org/nightly/v1.8/
>
> HTH
> Ralph
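[Editor's note: for readers following along, here is a minimal parent-side sketch of the MPI_INFO_NULL variant described above. It is not the example code attached to the original message; the child executable name ./spawned_child and the process count are placeholders, and the child must itself be an MPI program.]

/* Minimal parent-side sketch (not the attached simple_spawn.c): spawn an MPI
 * child with MPI_INFO_NULL instead of a "host"/"hostfile" Info key, which is
 * the workaround discussed in this thread. The path "./spawned_child" is a
 * placeholder for an executable that itself calls MPI_Init. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    int errcodes[2];

    MPI_Init(&argc, &argv);

    /* No host/hostfile Info key: with MPI_INFO_NULL, placement is left to
     * the default hostfile and MCA settings discussed below. */
    MPI_Comm_spawn("./spawned_child", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, errcodes);

    printf("parent: MPI_Comm_spawn returned\n");

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}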
> On Tue, Feb 3, 2015 at 6:20 PM, Evan Samanas <evan.sama...@gmail.com> wrote:
> Yes, I did. I replaced the info argument of MPI_Comm_spawn with MPI_INFO_NULL.
>
> On Tue, Feb 3, 2015 at 5:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
> When running your comm_spawn code, did you remove the Info key code? You wouldn't need to provide a hostfile or hosts any more, which is why it should resolve that problem.
>
> I agree that providing either hostfile or host as an Info key will cause the program to segfault - I'm working on that issue.
>
> On Tue, Feb 3, 2015 at 3:46 PM, Evan Samanas <evan.sama...@gmail.com> wrote:
> Setting these environment variables did indeed change the way mpirun maps things, and I didn't have to specify a hostfile. However, setting these for my MPI_Comm_spawn code still resulted in the same segmentation fault.
>
> Evan
>
> On Tue, Feb 3, 2015 at 10:09 AM, Ralph Castain <r...@open-mpi.org> wrote:
> If you add the following to your environment, you should run on multiple nodes:
>
>   OMPI_MCA_rmaps_base_mapping_policy=node
>   OMPI_MCA_orte_default_hostfile=<your hostfile>
>
> The first tells OMPI to map by node. The second passes in your default hostfile so you don't need to specify it as an Info key.
>
> HTH
> Ralph
>
> On Tue, Feb 3, 2015 at 9:23 AM, Evan Samanas <evan.sama...@gmail.com> wrote:
> Hi Ralph,
>
> Good to know you've reproduced it. I was experiencing this using both the hostfile and host key. A simple comm_spawn was working for me as well, but it was only launching locally, and I'm pretty sure each node only has 4 slots given past behavior (the mpirun -np 8 example I gave in my first email launches on both hosts). Is there a way to specify the hosts I want to launch on without the hostfile or host key so I can test remote launch?
>
> And to the "hostname" response... no wonder it was hanging! I just constructed that as a basic example. In my real use I'm launching something that calls MPI_Init.
>
> Evan
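[Editor's note: on why spawning plain "hostname" hangs, as discussed above: in practice MPI_Comm_spawn does not complete until the spawned processes call MPI_Init, so the spawned command must be an MPI program. A minimal child-side sketch (a hypothetical spawned_child.c, not code from this thread) might look like the following.]

/* Minimal sketch of a spawned child (hypothetical spawned_child.c, not code
 * from this thread). Unlike a plain "hostname" command, it calls MPI_Init,
 * so the parent's MPI_Comm_spawn can complete instead of hanging. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);          /* intercommunicator to the parent */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (parent == MPI_COMM_NULL) {
        printf("rank %d: not spawned, running standalone\n", rank);
    } else {
        printf("rank %d: spawned child, parent intercomm obtained\n", rank);
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}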
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/02/26292.php