Okay, I tracked this down - thanks for your patience! I have a fix pending 
review. You can track it here:

https://github.com/open-mpi/ompi-release/pull/179


> On Feb 4, 2015, at 5:14 PM, Evan Samanas <evan.sama...@gmail.com> wrote:
> 
> Indeed, I simply commented out all the MPI_Info stuff, which you essentially 
> did by passing a dummy argument.  I'm still not able to get it to succeed.
> 
> So here we go: my results defy logic.  I'm sure this could be my fault - I've 
> only been an occasional user of Open MPI and MPI in general over the years, and 
> I've never used MPI_Comm_spawn before this project. I tested simple_spawn 
> like so:
> mpicc simple_spawn.c -o simple_spawn
> ./simple_spawn
> 
> When my default hostfile points to a file that just lists localhost, this 
> test completes successfully.  If it points to my hostfile with localhost and 
> 5 remote hosts, here's the output:
> evan@lasarti:~/devel/toy_progs/mpi_spawn$ mpicc simple_spawn.c -o simple_spawn
> evan@lasarti:~/devel/toy_progs/mpi_spawn$ ./simple_spawn
> [pid 5703] starting up!
> 0 completed MPI_Init
> Parent [pid 5703] about to spawn!
> [lasarti:05703] [[14661,1],0] FORKING HNP: orted --hnp --set-sid --report-uri 
> 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca ess_base_jobid 
> 960823296
> [lasarti:05705] *** Process received signal ***
> [lasarti:05705] Signal: Segmentation fault (11)
> [lasarti:05705] Signal code: Address not mapped (1)
> [lasarti:05705] Failing at address: (nil)
> [lasarti:05705] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fc185dcf340]
> [lasarti:05705] [ 1] 
> /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_compute_bindings+0x650)[0x7fc186033bb0]
> [lasarti:05705] [ 2] 
> /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x939)[0x7fc18602fb99]
> [lasarti:05705] [ 3] 
> /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x6e4)[0x7fc18577dcc4]
> [lasarti:05705] [ 4] 
> /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_daemon+0xdf8)[0x7fc186010438]
> [lasarti:05705] [ 5] orted(main+0x47)[0x400887]
> [lasarti:05705] [ 6] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc185a1aec5]
> [lasarti:05705] [ 7] orted[0x4008db]
> [lasarti:05705] *** End of error message ***
> 
> You can see from the message that this particular run IS from the latest 
> snapshot, though the failure happens on v1.8.4 as well.  I didn't bother 
> installing the snapshot on the remote nodes, though.  Should I do that?  It 
> looked to me like this error happened well before we got to a remote node, 
> which is why I didn't.
> 
> Your thoughts?
> 
> Evan
> 
> 
> 
> On Tue, Feb 3, 2015 at 7:40 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I confess I am sorely puzzled. I replaced the Info key with MPI_INFO_NULL, but 
> still had to pass a bogus argument to master since you still have the 
> Info_set code in there - otherwise, info_set segfaults due to a NULL argv[1]. 
> Doing that (and replacing "hostname" with an MPI example program) makes 
> everything work just fine.
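> 
> For reference, the parent side with the Info key stripped out looks roughly 
> like this - just a sketch, with "./worker" and the count of 4 as placeholders 
> rather than the actual test program:
> 
> #include <mpi.h>
> 
> int main(int argc, char **argv)
> {
>     MPI_Comm child;
>     int errs[4];
> 
>     MPI_Init(&argc, &argv);
> 
>     /* No hostfile/host Info key: pass MPI_INFO_NULL and let the default
>      * hostfile and mapping policy decide where the children land. */
>     MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
>                    0, MPI_COMM_SELF, &child, errs);
> 
>     MPI_Comm_disconnect(&child);
>     MPI_Finalize();
>     return 0;
> }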
> 
> I've attached one of our example comm_spawn codes that we test against - it 
> also works fine with the current head of the 1.8 code base. I confess that 
> some changes have been made since 1.8.4 was released, and it is entirely 
> possible that this was a problem in 1.8.4 and has since been fixed.
> 
> So I'd suggest trying with the nightly 1.8 tarball and seeing if it works for 
> you. You can download it from here:
> 
> http://www.open-mpi.org/nightly/v1.8/
> 
> HTH
> Ralph
> 
> 
> On Tue, Feb 3, 2015 at 6:20 PM, Evan Samanas <evan.sama...@gmail.com> wrote:
> Yes, I did.  I replaced the info argument of MPI_Comm_spawn with 
> MPI_INFO_NULL.
> 
> On Tue, Feb 3, 2015 at 5:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
> When running your comm_spawn code, did you remove the Info key code? You 
> wouldn't need to provide a hostfile or hosts any more, which is why it should 
> resolve that problem.
> 
> I agree that providing either hostfile or host as an Info key will cause the 
> program to segfault - I'm working on that issue.
> 
> 
> On Tue, Feb 3, 2015 at 3:46 PM, Evan Samanas <evan.sama...@gmail.com> wrote:
> Setting these environment variables did indeed change the way mpirun maps 
> things, and I didn't have to specify a hostfile.  However, setting these for 
> my MPI_Comm_spawn code still resulted in the same segmentation fault.
> 
> Evan
> 
> On Tue, Feb 3, 2015 at 10:09 AM, Ralph Castain <r...@open-mpi.org> wrote:
> If you add the following to your environment, you should run on multiple 
> nodes:
> 
> OMPI_MCA_rmaps_base_mapping_policy=node
> OMPI_MCA_orte_default_hostfile=<your hostfile>
> 
> The first tells OMPI to map-by node. The second passes in your default 
> hostfile so you don't need to specify it as an Info key.
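> 
> For example (assuming bash and a hostfile at ~/my_hostfile - substitute your 
> actual shell and path):
> 
> export OMPI_MCA_rmaps_base_mapping_policy=node
> export OMPI_MCA_orte_default_hostfile=$HOME/my_hostfile
> ./simple_spawn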
> 
> HTH
> Ralph
> 
> 
> On Tue, Feb 3, 2015 at 9:23 AM, Evan Samanas <evan.sama...@gmail.com> wrote:
> Hi Ralph,
> 
> Good to know you've reproduced it.  I was experiencing this using both the 
> hostfile and host key.  A simple comm_spawn was working for me as well, but 
> it was only launching locally, and I'm pretty sure each node only has 4 slots 
> given past behavior (the mpirun -np 8 example I gave in my first email 
> launches on both hosts).  Is there a way to specify the hosts I want to 
> launch on without the hostfile or host key so I can test remote launch?
> 
> And to the "hostname" response...no wonder it was hanging!  I just 
> constructed that as a basic example.  In my real use I'm launching something 
> that calls MPI_Init.
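> 
> (For reference, the spawned side needs at least to call MPI_Init - something 
> shaped roughly like the sketch below; this is illustrative, not my actual 
> code.)
> 
> #include <mpi.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv)
> {
>     MPI_Comm parent;
>     int rank;
> 
>     /* A non-MPI program like "hostname" never completes this handshake
>      * with the parent, which is presumably why the spawn appeared to hang. */
>     MPI_Init(&argc, &argv);
>     MPI_Comm_get_parent(&parent);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>     if (parent != MPI_COMM_NULL) {
>         printf("child %d: spawned by a parent\n", rank);
>         MPI_Comm_disconnect(&parent);
>     }
> 
>     MPI_Finalize();
>     return 0;
> }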
> 
> Evan
> 