I confess I am sorely puzzled. I replaced the Info key with MPI_INFO_NULL,
but I still had to pass a bogus argument to master since you still have the
Info_set code in there - otherwise, MPI_Info_set segfaults on the NULL
argv[1]. Doing that (and replacing "hostname" with an MPI example code)
makes everything work just fine.
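
In case a concrete illustration helps, here is roughly the pattern I mean -
a minimal sketch, not your actual code (the "./child" program name and the
"host" key are placeholders): only create and populate the Info object when
a host argument was actually supplied, and pass MPI_INFO_NULL otherwise:

    MPI_Comm child;
    MPI_Info info = MPI_INFO_NULL;

    if (argc > 1 && NULL != argv[1]) {
        /* build an Info object only when a host was given on the cmd line */
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", argv[1]);
    }
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 3, info,
                   0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);
    if (MPI_INFO_NULL != info) {
        MPI_Info_free(&info);
    }

That way MPI_Info_set never sees a NULL value, so the bogus argument isn't
needed.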

I've attached one of our example comm_spawn codes that we test against - it
also works fine with the current head of the 1.8 code base. I should note
that some changes have been made since 1.8.4 was released, so it is entirely
possible that this was a problem in 1.8.4 that has since been fixed.

So I'd suggest trying with the nightly 1.8 tarball and seeing if it works
for you. You can download it from here:

http://www.open-mpi.org/nightly/v1.8/

HTH
Ralph


On Tue, Feb 3, 2015 at 6:20 PM, Evan Samanas <evan.sama...@gmail.com> wrote:

> Yes, I did.  I replaced the info argument of MPI_Comm_spawn with
> MPI_INFO_NULL.
>
> On Tue, Feb 3, 2015 at 5:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> When running your comm_spawn code, did you remove the Info key code? You
>> wouldn't need to provide a hostfile or hosts any more, which is why it
>> should resolve that problem.
>>
>> I agree that providing either hostfile or host as an Info key will cause
>> the program to segfault - I'm working on that issue.
>>
>>
>> On Tue, Feb 3, 2015 at 3:46 PM, Evan Samanas <evan.sama...@gmail.com>
>> wrote:
>>
>>> Setting these environment variables did indeed change the way mpirun
>>> maps things, and I didn't have to specify a hostfile.  However, setting
>>> these for my MPI_Comm_spawn code still resulted in the same segmentation
>>> fault.
>>>
>>> Evan
>>>
>>> On Tue, Feb 3, 2015 at 10:09 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> If you add the following to your environment, you should run on
>>>> multiple nodes:
>>>>
>>>> OMPI_MCA_rmaps_base_mapping_policy=node
>>>> OMPI_MCA_orte_default_hostfile=<your hostfile>
>>>>
>>>> The first tells OMPI to map-by node. The second passes in your default
>>>> hostfile so you don't need to specify it as an Info key.
>>>>
>>>> HTH
>>>> Ralph
>>>>
>>>>
>>>> On Tue, Feb 3, 2015 at 9:23 AM, Evan Samanas <evan.sama...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> Good to know you've reproduced it.  I was experiencing this using both
>>>>> the hostfile and host key.  A simple comm_spawn was working for me as
>>>>> well, but it was only launching locally, and I'm pretty sure each node
>>>>> only has 4 slots given past behavior (the mpirun -np 8 example I gave
>>>>> in my first email launches on both hosts).  Is there a way to specify
>>>>> the hosts I want to launch on without the hostfile or host key so I
>>>>> can test remote launch?
>>>>>
>>>>> And to the "hostname" response...no wonder it was hanging!  I just
>>>>> constructed that as a basic example.  In my real use I'm launching
>>>>> something that calls MPI_Init.
>>>>>
>>>>> Evan
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
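
/*
 * simple_spawn example: the parent spawns 3 copies of itself with
 * MPI_Comm_spawn and MPI_INFO_NULL (i.e., no host/hostfile placement
 * hints), sends one int to child rank 0, then both sides disconnect.
 *
 * Typical build/run (names are illustrative, assuming mpicc/mpirun):
 *   mpicc simple_spawn.c -o simple_spawn
 *   mpirun -np 1 ./simple_spawn
 */
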
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#include <mpi.h>

int main(int argc, char* argv[])
{
    int msg, rc;
    MPI_Comm parent, child;
    int rank, size;
    char hostname[512];
    pid_t pid;

    pid = getpid();
    printf("[pid %ld] starting up!\n", (long)pid);
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("%d completed MPI_Init\n", rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_get_parent(&parent);
    /* If we get COMM_NULL back, then we're the parent */
    if (MPI_COMM_NULL == parent) {
        pid = getpid();
        printf("Parent [pid %ld] about to spawn!\n", (long)pid);
        if (MPI_SUCCESS != (rc = MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 3, MPI_INFO_NULL, 
                                                0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE))) {
            printf("Child failed to spawn\n");
            return rc;
        }
        printf("Parent done with spawn\n");
        if (0 == rank) {
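            /* send one int to child rank 0 through the intercommunicator
             * returned by MPI_Comm_spawn */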
            msg = 38;
            printf("Parent sending message to child\n");
            MPI_Send(&msg, 1, MPI_INT, 0, 1, child);
        }
        MPI_Comm_disconnect(&child);
        printf("Parent disconnected\n");
    } 
    /* Otherwise, we're the child */
    else {
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        gethostname(hostname, 512);
        pid = getpid();
        printf("Hello from the child %d of %d on host %s pid %ld\n", rank, 3, hostname, (long)pid);
        if (0 == rank) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 1, parent, MPI_STATUS_IGNORE);
            printf("Child %d received msg: %d\n", rank, msg);
        }
        MPI_Comm_disconnect(&parent);
        printf("Child %d disconnected\n", rank);
    }

    MPI_Finalize();
    fprintf(stderr, "%ld: exiting\n", (long)pid);
    return 0;
}
