Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain

On Sep 29, 2014, at 12:05 PM, Amos Anderson  wrote:

> Hi Dave --
> 
> It looks like my argv[argc] is not NULL (see below), so does that mean this 
> problem is boost::python's fault?

Yep - they are violating the C99 standard


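For illustration, here is a simplified sketch (not the actual opal_argv_join source) of why the missing terminator matters: joining an argv vector means walking it until a NULL entry is found, so a vector without one runs off the end of the array.

#include <cstdio>
#include <cstring>

// Simplified sketch (not the actual opal_argv_join source): joining an argv
// vector means walking it until the NULL sentinel.
static std::size_t joined_length(char **argv)
{
    std::size_t len = 0;
    for (char **p = argv; *p != NULL; ++p)   // relies on argv[argc] == NULL
        len += std::strlen(*p) + 1;
    return len;
}

int main()
{
    char a0[] = "test/regression/regression-test.py";
    char a1[] = "test/regression/regression-jobs";

    char *good[] = { a0, a1, NULL };   // terminated, as C99 requires of main()'s argv
    std::printf("%zu\n", joined_length(good));

    char *bad[] = { a0, a1 };          // no terminator, like the argv in the backtraces below
    (void)bad;  // joined_length(bad) would read past the array -- that is the segfault
    return 0;
}
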
> 
> Thanks!
> Amos.
> 
> 
> 
> Looking in the boost code, I see this is how MPI_Init is called:
> 
> 
> environment::environment(int& argc, char** &argv, bool abort_on_exception)
>   : i_initialized(false),
>     abort_on_exception(abort_on_exception)
> {
>   if (!initialized()) {
>     BOOST_MPI_CHECK_RESULT(MPI_Init, (&argc, &argv));
>     i_initialized = true;
>   }
> 
>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
> }
> 
> 
> 
> 
> Getting some more info from a gdb session (the trace is the same):
> (gdb) up
> #1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39440, requested=0, 
> provided=0x7fffb9e8) at runtime/ompi_mpi_init.c:450
> 450   tmp = opal_argv_join(&argv[1], ' ');
> (gdb) up
> #2  0x2ab63e39 in PMPI_Init (argc=0x7fffbadc, 
> argv=0x7fffbad0) at pinit.c:84
> 84      err = ompi_mpi_init(*argc, *argv, required, &provided);
> (gdb) print argc
> $1 = (int *) 0x7fffbadc
> (gdb) print *argc
> $2 = 2
> (gdb) print *argv
> $3 = (char **) 0xa39440
> (gdb) print argv
> $4 = (char ***) 0x7fffbad0
> (gdb) print **argv
> $5 = 0x9d3230 "test/regression/regression-test.py"
> (gdb) up
> #3  0x2aaab7b965d6 in boost::mpi::environment::environment 
> (this=0xa3a280, argc=@0x7fffbadc, argv=@0x7fffbad0, 
> abort_on_exception=true)
>at ../tools/boost/libs/mpi/src/environment.cpp:98
> 98      BOOST_MPI_CHECK_RESULT(MPI_Init, (&argc, &argv));
> (gdb) print argc
> $6 = (int &) @0x7fffbadc: 2
> (gdb) print *argc
> Attempt to take contents of a non-pointer value.
> (gdb) print &argc
> $7 = (int *) 0x7fffbadc
> (gdb) print argc
> $8 = (int &) @0x7fffbadc: 2
> (gdb) print argv
> $9 = (char **&) @0x7fffbad0: 0xa39440
> (gdb) print *argv
> $10 = 0x9d3230 "test/regression/regression-test.py"
> (gdb) print argv[0]
> $11 = 0x9d3230 "test/regression/regression-test.py"
> (gdb) print argv[1]
> $12 = 0x9caa40 "test/regression/regression-jobs"
> (gdb) print argv[2]
> $13 = 0x20 
> (gdb) 
> 
> 
> 
> 
> On Sep 29, 2014, at 11:48 AM, Dave Goodell (dgoodell)  
> wrote:
> 
>> Looks like boost::mpi and/or your python "mpi" module might be creating a 
>> bogus argv array and passing it to OMPI's MPI_Init routine.  Note that argv 
>> is required by C99 to be terminated with a NULL pointer (that is, 
>> (argv[argc]==NULL) must hold).  See 
>> http://stackoverflow.com/a/3772826/158513.
>> 
>> -Dave
>> 
>> On Sep 29, 2014, at 1:34 PM, Ralph Castain  wrote:
>> 
>>> Afraid I cannot replicate a problem with singleton behavior in the 1.8 
>>> series:
>>> 
>>> 11:31:52  /home/common/openmpi/v1.8/orte/test/mpi$ ./hello foo bar
>>> Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0-23
>>> OMPI_MCA_orte_default_hostfile=/home/common/hosts
>>> OMPI_COMMAND=./hello
>>> OMPI_ARGV=foo bar
>>> OMPI_NUM_APP_CTX=1
>>> OMPI_FIRST_RANKS=0
>>> OMPI_APP_CTX_NUM_PROCS=1
>>> OMPI_MCA_orte_ess_num_procs=1
>>> 
>>> You can see that the OMPI_ARGV envar (which is the spot you flagged) is 
>>> correctly being set and there is no segfault. Not sure what your program 
>>> may be doing, though, so I'm not sure I've really tested your scenario.
>>> 
>>> 
>>> On Sep 29, 2014, at 10:55 AM, Ralph Castain  wrote:
>>> 
 Okay, so regression-test.py is calling MPI_Init as a singleton, correct? 
 Just trying to fully understand the scenario
 
 Singletons are certainly allowed, if that's the scenario
 
 On Sep 29, 2014, at 10:51 AM, Amos Anderson  
 wrote:
 
> I'm not calling mpirun in this case because this particular calculation 
> doesn't use more than one processor. What I'm doing on my command line is 
> this:
> 
> /home/user/myapp/tools/python/bin/python 
> test/regression/regression-test.py test/regression/regression-jobs
> 
> and internally I check for rank/size. This command is executed in the 
> context of a souped up LD_LIBRARY_PATH. You can see the variable argv in 
> opal_argv_join is ending up with the last argument on my command line.
> 
> I suppose your question implies that mpirun is mandatory for executing 
> anything compiled with OpenMPI > 1.6 ?
> 
> 
> 
> On Sep 29, 2014, at 10:28 AM, Ralph Castain  wrote:
> 
>> Can you pass us the actual mpirun command line being executed? 
>> Especially need to see the argv being passed to your application.
>> 
>> 
>> On Sep 27, 2014, at 7:09 PM, Amos Anderson  
>> wrote:
>> 
>>> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. 
>>> Also, I have some gdb output (from 1.7.5) for your perusal, including a 
>>> 

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Amos Anderson
Hi Dave --

It looks like my argv[argc] is not NULL (see below), so does that mean this 
problem is boost::python's fault?

Thanks!
Amos.



Looking in the boost code, I see this is how MPI_Init is called:


environment::environment(int& argc, char** &argv, bool abort_on_exception)
  : i_initialized(false),
    abort_on_exception(abort_on_exception)
{
  if (!initialized()) {
    BOOST_MPI_CHECK_RESULT(MPI_Init, (&argc, &argv));
    i_initialized = true;
  }

  MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
}
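
The constructor takes argc and argv by reference and hands them straight to MPI_Init, so whatever builds that argv (here, the Python-side wrapper) has to supply the NULL terminator itself. A hedged sketch, not the actual Boost.MPI code, of how a binding layer could build a compliant vector from a list of strings:

#include <string>
#include <vector>

// Hypothetical helper (not from Boost.MPI): keep the strings alive and expose
// an argv whose last entry is NULL, which is what MPI_Init's caller must
// guarantee.
struct ArgvHolder {
    std::vector<std::string> storage;   // owns the characters
    std::vector<char *>      pointers;  // what gets passed around as char**

    explicit ArgvHolder(const std::vector<std::string> &args)
        : storage(args)
    {
        pointers.reserve(storage.size() + 1);
        for (std::string &s : storage)
            pointers.push_back(&s[0]);
        pointers.push_back(nullptr);    // the terminator this thread is about
    }

    int    argc() { return static_cast<int>(storage.size()); }
    char **argv() { return pointers.data(); }
};

An argc/argv pair built this way can be handed to the constructor above (or to MPI_Init directly) without tripping code that scans for the NULL sentinel.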




Getting some more info from a gdb session (the trace is the same):
(gdb) up
#1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39440, requested=0, 
provided=0x7fffb9e8) at runtime/ompi_mpi_init.c:450
450 tmp = opal_argv_join(&argv[1], ' ');
(gdb) up
#2  0x2ab63e39 in PMPI_Init (argc=0x7fffbadc, argv=0x7fffbad0) 
at pinit.c:84
84  err = ompi_mpi_init(*argc, *argv, required, &provided);
(gdb) print argc
$1 = (int *) 0x7fffbadc
(gdb) print *argc
$2 = 2
(gdb) print *argv
$3 = (char **) 0xa39440
(gdb) print argv
$4 = (char ***) 0x7fffbad0
(gdb) print **argv
$5 = 0x9d3230 "test/regression/regression-test.py"
(gdb) up
#3  0x2aaab7b965d6 in boost::mpi::environment::environment (this=0xa3a280, 
argc=@0x7fffbadc, argv=@0x7fffbad0, abort_on_exception=true)
at ../tools/boost/libs/mpi/src/environment.cpp:98
98  BOOST_MPI_CHECK_RESULT(MPI_Init, (&argc, &argv));
(gdb) print argc
$6 = (int &) @0x7fffbadc: 2
(gdb) print *argc
Attempt to take contents of a non-pointer value.
(gdb) print &argc
$7 = (int *) 0x7fffbadc
(gdb) print argc
$8 = (int &) @0x7fffbadc: 2
(gdb) print argv
$9 = (char **&) @0x7fffbad0: 0xa39440
(gdb) print *argv
$10 = 0x9d3230 "test/regression/regression-test.py"
(gdb) print argv[0]
$11 = 0x9d3230 "test/regression/regression-test.py"
(gdb) print argv[1]
$12 = 0x9caa40 "test/regression/regression-jobs"
(gdb) print argv[2]
$13 = 0x20 
(gdb) 
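
Given that argv[2] above is 0x20 rather than NULL, a cheap way to confirm the culprit before blaming Open MPI is a guard right before MPI_Init. This is only an illustrative sketch (checked_mpi_init is a hypothetical helper, not part of any of the projects involved):

#include <cassert>
#include <cstdio>
#include <mpi.h>

// Hypothetical guard: verify the C99 contract before handing argv to MPI_Init,
// so a bogus vector fails loudly here instead of deep inside opal_argv_join.
static void checked_mpi_init(int *argc, char ***argv)
{
    assert((*argv)[*argc] == NULL && "argv must be NULL-terminated (C99 5.1.2.2.1)");
    if (MPI_Init(argc, argv) != MPI_SUCCESS)
        std::fprintf(stderr, "MPI_Init failed\n");
}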




On Sep 29, 2014, at 11:48 AM, Dave Goodell (dgoodell)  
wrote:

> Looks like boost::mpi and/or your python "mpi" module might be creating a 
> bogus argv array and passing it to OMPI's MPI_Init routine.  Note that argv 
> is required by C99 to be terminated with a NULL pointer (that is, 
> (argv[argc]==NULL) must hold).  See http://stackoverflow.com/a/3772826/158513.
> 
> -Dave
> 
> On Sep 29, 2014, at 1:34 PM, Ralph Castain  wrote:
> 
>> Afraid I cannot replicate a problem with singleton behavior in the 1.8 
>> series:
>> 
>> 11:31:52  /home/common/openmpi/v1.8/orte/test/mpi$ ./hello foo bar
>> Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0-23
>> OMPI_MCA_orte_default_hostfile=/home/common/hosts
>> OMPI_COMMAND=./hello
>> OMPI_ARGV=foo bar
>> OMPI_NUM_APP_CTX=1
>> OMPI_FIRST_RANKS=0
>> OMPI_APP_CTX_NUM_PROCS=1
>> OMPI_MCA_orte_ess_num_procs=1
>> 
>> You can see that the OMPI_ARGV envar (which is the spot you flagged) is 
>> correctly being set and there is no segfault. Not sure what your program may 
>> be doing, though, so I'm not sure I've really tested your scenario.
>> 
>> 
>> On Sep 29, 2014, at 10:55 AM, Ralph Castain  wrote:
>> 
>>> Okay, so regression-test.py is calling MPI_Init as a singleton, correct? 
>>> Just trying to fully understand the scenario
>>> 
>>> Singletons are certainly allowed, if that's the scenario
>>> 
>>> On Sep 29, 2014, at 10:51 AM, Amos Anderson  
>>> wrote:
>>> 
 I'm not calling mpirun in this case because this particular calculation 
 doesn't use more than one processor. What I'm doing on my command line is 
 this:
 
 /home/user/myapp/tools/python/bin/python 
 test/regression/regression-test.py test/regression/regression-jobs
 
 and internally I check for rank/size. This command is executed in the 
 context of a souped up LD_LIBRARY_PATH. You can see the variable argv in 
 opal_argv_join is ending up with the last argument on my command line.
 
 I suppose your question implies that mpirun is mandatory for executing 
 anything compiled with OpenMPI > 1.6 ?
 
 
 
 On Sep 29, 2014, at 10:28 AM, Ralph Castain  wrote:
 
> Can you pass us the actual mpirun command line being executed? Especially 
> need to see the argv being passed to your application.
> 
> 
> On Sep 27, 2014, at 7:09 PM, Amos Anderson  
> wrote:
> 
>> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. 
>> Also, I have some gdb output (from 1.7.5) for your perusal, including a 
>> printout of some of the variables' values.
>> 
>> 
>> 
>> Starting program: /home/user/myapp/tools/python/bin/python 
>> test/regression/regression-test.py test/regression/regression-jobs
>> [Thread debugging using libthread_db enabled]
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> 

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Dave Goodell (dgoodell)
Looks like boost::mpi and/or your python "mpi" module might be creating a bogus 
argv array and passing it to OMPI's MPI_Init routine.  Note that argv is 
required by C99 to be terminated with a NULL pointer (that is, 
(argv[argc]==NULL) must hold).  See http://stackoverflow.com/a/3772826/158513.
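> 
> For reference, when argc and argv come straight from a hosted main(), this 
> guarantee already holds and can be passed through untouched; only a fabricated 
> argv array has to reproduce it. A minimal sketch:
> 
> #include <cassert>
> #include <mpi.h>
> 
> int main(int argc, char **argv)
> {
>     // For a hosted main(), C99 5.1.2.2.1 already guarantees argv[argc] == NULL,
>     // so these can be passed straight to MPI_Init. A wrapper that fabricates
>     // its own argv must reproduce that guarantee itself.
>     assert(argv[argc] == NULL);
> 
>     MPI_Init(&argc, &argv);
>     MPI_Finalize();
>     return 0;
> }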

-Dave

On Sep 29, 2014, at 1:34 PM, Ralph Castain  wrote:

> Afraid I cannot replicate a problem with singleton behavior in the 1.8 series:
> 
> 11:31:52  /home/common/openmpi/v1.8/orte/test/mpi$ ./hello foo bar
> Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0-23
> OMPI_MCA_orte_default_hostfile=/home/common/hosts
> OMPI_COMMAND=./hello
> OMPI_ARGV=foo bar
> OMPI_NUM_APP_CTX=1
> OMPI_FIRST_RANKS=0
> OMPI_APP_CTX_NUM_PROCS=1
> OMPI_MCA_orte_ess_num_procs=1
> 
> You can see that the OMPI_ARGV envar (which is the spot you flagged) is 
> correctly being set and there is no segfault. Not sure what your program may 
> be doing, though, so I'm not sure I've really tested your scenario.
> 
> 
> On Sep 29, 2014, at 10:55 AM, Ralph Castain  wrote:
> 
>> Okay, so regression-test.py is calling MPI_Init as a singleton, correct? 
>> Just trying to fully understand the scenario
>> 
>> Singletons are certainly allowed, if that's the scenario
>> 
>> On Sep 29, 2014, at 10:51 AM, Amos Anderson  
>> wrote:
>> 
>>> I'm not calling mpirun in this case because this particular calculation 
>>> doesn't use more than one processor. What I'm doing on my command line is 
>>> this:
>>> 
>>> /home/user/myapp/tools/python/bin/python test/regression/regression-test.py 
>>> test/regression/regression-jobs
>>> 
>>> and internally I check for rank/size. This command is executed in the 
>>> context of a souped up LD_LIBRARY_PATH. You can see the variable argv in 
>>> opal_argv_join is ending up with the last argument on my command line.
>>> 
>>> I suppose your question implies that mpirun is mandatory for executing 
>>> anything compiled with OpenMPI > 1.6 ?
>>> 
>>> 
>>> 
>>> On Sep 29, 2014, at 10:28 AM, Ralph Castain  wrote:
>>> 
 Can you pass us the actual mpirun command line being executed? Especially 
 need to see the argv being passed to your application.
 
 
 On Sep 27, 2014, at 7:09 PM, Amos Anderson  
 wrote:
 
> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. 
> Also, I have some gdb output (from 1.7.5) for your perusal, including a 
> printout of some of the variables' values.
> 
> 
> 
> Starting program: /home/user/myapp/tools/python/bin/python 
> test/regression/regression-test.py test/regression/regression-jobs
> [Thread debugging using libthread_db enabled]
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
> argv.c:299
> 299   str_len += strlen(*p) + 1;
> (gdb) where
> #0  0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
> argv.c:299
> #1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39390, 
> requested=0, provided=0x7fffba98) at runtime/ompi_mpi_init.c:450
> #2  0x2ab63e39 in PMPI_Init (argc=0x7fffbb8c, 
> argv=0x7fffbb80) at pinit.c:84
> #3  0x2aaab7b965d6 in boost::mpi::environment::environment 
> (this=0xa3a1d0, argc=@0x7fffbb8c, argv=@0x7fffbb80, 
> abort_on_exception=true)
>at ../tools/boost/libs/mpi/src/environment.cpp:98
> #4  0x2aaabc7b311d in boost::mpi::python::mpi_init (python_argv=..., 
> abort_on_exception=true) at 
> ../tools/boost/libs/mpi/src/python/py_environment.cpp:60
> #5  0x2aaabc7b33fb in boost::mpi::python::export_environment () at 
> ../tools/boost/libs/mpi/src/python/py_environment.cpp:94
> #6  0x2aaabc7d5ab5 in boost::mpi::python::init_module_mpi () at 
> ../tools/boost/libs/mpi/src/python/module.cpp:44
> #7  0x2aaab792a2f2 in 
> boost::detail::function::void_function_ref_invoker0 void>::invoke (function_obj_ptr=...)
>at ../tools/boost/boost/function/function_template.hpp:188
> #8  0x2aaab7929e6b in boost::function0::operator() 
> (this=0x7fffc110) at 
> ../tools/boost/boost/function/function_template.hpp:767
> #9  0x2aaab7928f11 in boost::python::handle_exception_impl (f=...) at 
> ../tools/boost/libs/python/src/errors.cpp:25
> #10 0x2aaab792a54f in boost::python::handle_exception 
> (f=0x2aaabc7d5746 ) at 
> ../tools/boost/boost/python/errors.hpp:29
> #11 0x2aaab792a1d9 in boost::python::detail::(anonymous 
> namespace)::init_module_in_scope (m=0x2aaabc617f68, 
>init_function=0x2aaabc7d5746 ) 
> at ../tools/boost/libs/python/src/module.cpp:24
> #12 

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
Afraid I cannot replicate a problem with singleton behavior in the 1.8 series:

11:31:52  /home/common/openmpi/v1.8/orte/test/mpi$ ./hello foo bar
Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0-23
OMPI_MCA_orte_default_hostfile=/home/common/hosts
OMPI_COMMAND=./hello
OMPI_ARGV=foo bar
OMPI_NUM_APP_CTX=1
OMPI_FIRST_RANKS=0
OMPI_APP_CTX_NUM_PROCS=1
OMPI_MCA_orte_ess_num_procs=1

You can see that the OMPI_ARGV envar (which is the spot you flagged) is 
correctly being set and there is no segfault. Not sure what your program may be 
doing, though, so I'm not sure I've really tested your scenario.
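
For context, a sketch of the kind of test used above (hypothetical, not the actual orte/test/mpi hello program): it prints rank/size and echoes the OMPI_ARGV variable that the singleton startup derives from the command-line arguments.

#include <cstdio>
#include <cstdlib>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    std::printf("Hello, World, I am %d of %d\n", rank, size);

    // Singleton startup publishes the remaining command-line arguments as
    // OMPI_ARGV (the envar shown in the output above).
    const char *ompi_argv = std::getenv("OMPI_ARGV");
    std::printf("OMPI_ARGV=%s\n", ompi_argv ? ompi_argv : "(unset)");

    MPI_Finalize();
    return 0;
}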


On Sep 29, 2014, at 10:55 AM, Ralph Castain  wrote:

> Okay, so regression-test.py is calling MPI_Init as a singleton, correct? Just 
> trying to fully understand the scenario
> 
> Singletons are certainly allowed, if that's the scenario
> 
> On Sep 29, 2014, at 10:51 AM, Amos Anderson  
> wrote:
> 
>> I'm not calling mpirun in this case because this particular calculation 
>> doesn't use more than one processor. What I'm doing on my command line is 
>> this:
>> 
>> /home/user/myapp/tools/python/bin/python test/regression/regression-test.py 
>> test/regression/regression-jobs
>> 
>> and internally I check for rank/size. This command is executed in the 
>> context of a souped up LD_LIBRARY_PATH. You can see the variable argv in 
>> opal_argv_join is ending up with the last argument on my command line.
>> 
>> I suppose your question implies that mpirun is mandatory for executing 
>> anything compiled with OpenMPI > 1.6 ?
>> 
>> 
>> 
>> On Sep 29, 2014, at 10:28 AM, Ralph Castain  wrote:
>> 
>>> Can you pass us the actual mpirun command line being executed? Especially 
>>> need to see the argv being passed to your application.
>>> 
>>> 
>>> On Sep 27, 2014, at 7:09 PM, Amos Anderson  
>>> wrote:
>>> 
 FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. 
 Also, I have some gdb output (from 1.7.5) for your perusal, including a 
 printout of some of the variables' values.
 
 
 
 Starting program: /home/user/myapp/tools/python/bin/python 
 test/regression/regression-test.py test/regression/regression-jobs
 [Thread debugging using libthread_db enabled]
 
 Program received signal SIGSEGV, Segmentation fault.
 0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
 argv.c:299
 299    str_len += strlen(*p) + 1;
 (gdb) where
 #0  0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
 argv.c:299
 #1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39390, 
 requested=0, provided=0x7fffba98) at runtime/ompi_mpi_init.c:450
 #2  0x2ab63e39 in PMPI_Init (argc=0x7fffbb8c, 
 argv=0x7fffbb80) at pinit.c:84
 #3  0x2aaab7b965d6 in boost::mpi::environment::environment 
 (this=0xa3a1d0, argc=@0x7fffbb8c, argv=@0x7fffbb80, 
 abort_on_exception=true)
at ../tools/boost/libs/mpi/src/environment.cpp:98
 #4  0x2aaabc7b311d in boost::mpi::python::mpi_init (python_argv=..., 
 abort_on_exception=true) at 
 ../tools/boost/libs/mpi/src/python/py_environment.cpp:60
 #5  0x2aaabc7b33fb in boost::mpi::python::export_environment () at 
 ../tools/boost/libs/mpi/src/python/py_environment.cpp:94
 #6  0x2aaabc7d5ab5 in boost::mpi::python::init_module_mpi () at 
 ../tools/boost/libs/mpi/src/python/module.cpp:44
 #7  0x2aaab792a2f2 in 
 boost::detail::function::void_function_ref_invoker0>>> void>::invoke (function_obj_ptr=...)
at ../tools/boost/boost/function/function_template.hpp:188
 #8  0x2aaab7929e6b in boost::function0::operator() 
 (this=0x7fffc110) at 
 ../tools/boost/boost/function/function_template.hpp:767
 #9  0x2aaab7928f11 in boost::python::handle_exception_impl (f=...) at 
 ../tools/boost/libs/python/src/errors.cpp:25
 #10 0x2aaab792a54f in boost::python::handle_exception 
 (f=0x2aaabc7d5746 ) at 
 ../tools/boost/boost/python/errors.hpp:29
 #11 0x2aaab792a1d9 in boost::python::detail::(anonymous 
 namespace)::init_module_in_scope (m=0x2aaabc617f68, 
init_function=0x2aaabc7d5746 ) 
 at ../tools/boost/libs/python/src/module.cpp:24
 #12 0x2aaab792a26c in boost::python::detail::init_module 
 (name=0x2aaabc7f7f4d "mpi", init_function=0x2aaabc7d5746 
 )
at ../tools/boost/libs/python/src/module.cpp:59
 #13 0x2aaabc7d5b2b in boost::mpi::python::initmpi () at 
 ../tools/boost/libs/mpi/src/python/module.cpp:34
 #14 0x2b27e095 in _PyImport_LoadDynamicModule (name=0xac9435 
 "mpi", pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", fp=0xaca450) at 
 

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
Okay, so regression-test.py is calling MPI_Init as a singleton, correct? Just 
trying to fully understand the scenario

Singletons are certainly allowed, if that's the scenario

On Sep 29, 2014, at 10:51 AM, Amos Anderson  wrote:

> I'm not calling mpirun in this case because this particular calculation 
> doesn't use more than one processor. What I'm doing on my command line is 
> this:
> 
> /home/user/myapp/tools/python/bin/python test/regression/regression-test.py 
> test/regression/regression-jobs
> 
> and internally I check for rank/size. This command is executed in the context 
> of a souped up LD_LIBRARY_PATH. You can see the variable argv in 
> opal_argv_join is ending up with the last argument on my command line.
> 
> I suppose your question implies that mpirun is mandatory for executing 
> anything compiled with OpenMPI > 1.6 ?
> 
> 
> 
> On Sep 29, 2014, at 10:28 AM, Ralph Castain  wrote:
> 
>> Can you pass us the actual mpirun command line being executed? Especially 
>> need to see the argv being passed to your application.
>> 
>> 
>> On Sep 27, 2014, at 7:09 PM, Amos Anderson  
>> wrote:
>> 
>>> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. 
>>> Also, I have some gdb output (from 1.7.5) for your perusal, including a 
>>> printout of some of the variables' values.
>>> 
>>> 
>>> 
>>> Starting program: /home/user/myapp/tools/python/bin/python 
>>> test/regression/regression-test.py test/regression/regression-jobs
>>> [Thread debugging using libthread_db enabled]
>>> 
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
>>> argv.c:299
>>> 299 str_len += strlen(*p) + 1;
>>> (gdb) where
>>> #0  0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
>>> argv.c:299
>>> #1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39390, 
>>> requested=0, provided=0x7fffba98) at runtime/ompi_mpi_init.c:450
>>> #2  0x2ab63e39 in PMPI_Init (argc=0x7fffbb8c, 
>>> argv=0x7fffbb80) at pinit.c:84
>>> #3  0x2aaab7b965d6 in boost::mpi::environment::environment 
>>> (this=0xa3a1d0, argc=@0x7fffbb8c, argv=@0x7fffbb80, 
>>> abort_on_exception=true)
>>>at ../tools/boost/libs/mpi/src/environment.cpp:98
>>> #4  0x2aaabc7b311d in boost::mpi::python::mpi_init (python_argv=..., 
>>> abort_on_exception=true) at 
>>> ../tools/boost/libs/mpi/src/python/py_environment.cpp:60
>>> #5  0x2aaabc7b33fb in boost::mpi::python::export_environment () at 
>>> ../tools/boost/libs/mpi/src/python/py_environment.cpp:94
>>> #6  0x2aaabc7d5ab5 in boost::mpi::python::init_module_mpi () at 
>>> ../tools/boost/libs/mpi/src/python/module.cpp:44
>>> #7  0x2aaab792a2f2 in 
>>> boost::detail::function::void_function_ref_invoker0>> void>::invoke (function_obj_ptr=...)
>>>at ../tools/boost/boost/function/function_template.hpp:188
>>> #8  0x2aaab7929e6b in boost::function0::operator() 
>>> (this=0x7fffc110) at 
>>> ../tools/boost/boost/function/function_template.hpp:767
>>> #9  0x2aaab7928f11 in boost::python::handle_exception_impl (f=...) at 
>>> ../tools/boost/libs/python/src/errors.cpp:25
>>> #10 0x2aaab792a54f in boost::python::handle_exception 
>>> (f=0x2aaabc7d5746 ) at 
>>> ../tools/boost/boost/python/errors.hpp:29
>>> #11 0x2aaab792a1d9 in boost::python::detail::(anonymous 
>>> namespace)::init_module_in_scope (m=0x2aaabc617f68, 
>>>init_function=0x2aaabc7d5746 ) at 
>>> ../tools/boost/libs/python/src/module.cpp:24
>>> #12 0x2aaab792a26c in boost::python::detail::init_module 
>>> (name=0x2aaabc7f7f4d "mpi", init_function=0x2aaabc7d5746 
>>> )
>>>at ../tools/boost/libs/python/src/module.cpp:59
>>> #13 0x2aaabc7d5b2b in boost::mpi::python::initmpi () at 
>>> ../tools/boost/libs/mpi/src/python/module.cpp:34
>>> #14 0x2b27e095 in _PyImport_LoadDynamicModule (name=0xac9435 "mpi", 
>>> pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", fp=0xaca450) at 
>>> ./Python/importdl.c:53
>>> #15 0x2b279fd4 in load_module (name=0xac9435 "mpi", fp=0xaca450, 
>>> pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", type=3, loader=0x0) at 
>>> Python/import.c:1915
>>> #16 0x2b27c2e8 in import_submodule (mod=0x2b533a20, 
>>> subname=0xac9435 "mpi", fullname=0xac9435 "mpi") at Python/import.c:2700
>>> #17 0x2b27b8fa in load_next (mod=0x2aaab0f075a8, 
>>> altmod=0x2b533a20, p_name=0x7fffc3f8, buf=0xac9430 "util.mpi", 
>>> p_buflen=0x7fffc408)
>>>at Python/import.c:2519
>>> #18 0x2b27a98d in import_module_level (name=0x0, globals=0xe95a70, 
>>> locals=0xe95a70, fromlist=0x2b533a20, level=-1) at Python/import.c:2224
>>> #19 0x2b27aeda in PyImport_ImportModuleLevel (name=0x2aaab0f00964 
>>> "mpi", 

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Amos Anderson
I'm not calling mpirun in this case because this particular calculation doesn't 
use more than one processor. What I'm doing on my command line is this:

/home/user/myapp/tools/python/bin/python test/regression/regression-test.py 
test/regression/regression-jobs

and internally I check for rank/size. This command is executed in the context 
of a souped up LD_LIBRARY_PATH. You can see the variable argv in opal_argv_join 
is ending up with the last argument on my command line.

I suppose your question implies that mpirun is mandatory for executing anything 
compiled with OpenMPI > 1.6 ?
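
For reference, this singleton pattern is expected to work without mpirun; a minimal sketch of the rank/size check described, written against Boost.MPI (assuming it is available) rather than the project's actual wrapper code:

#include <boost/mpi.hpp>
#include <iostream>

// Minimal sketch: run directly, without mpirun, it should report "rank 0 of 1"
// as a singleton.
int main(int argc, char **argv)
{
    boost::mpi::environment  env(argc, argv);   // ends up in MPI_Init(&argc, &argv)
    boost::mpi::communicator world;

    std::cout << "rank " << world.rank() << " of " << world.size() << std::endl;
    return 0;
}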



On Sep 29, 2014, at 10:28 AM, Ralph Castain  wrote:

> Can you pass us the actual mpirun command line being executed? Especially 
> need to see the argv being passed to your application.
> 
> 
> On Sep 27, 2014, at 7:09 PM, Amos Anderson  wrote:
> 
>> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. 
>> Also, I have some gdb output (from 1.7.5) for your perusal, including a 
>> printout of some of the variables' values.
>> 
>> 
>> 
>> Starting program: /home/user/myapp/tools/python/bin/python 
>> test/regression/regression-test.py test/regression/regression-jobs
>> [Thread debugging using libthread_db enabled]
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
>> argv.c:299
>> 299  str_len += strlen(*p) + 1;
>> (gdb) where
>> #0  0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
>> argv.c:299
>> #1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39390, requested=0, 
>> provided=0x7fffba98) at runtime/ompi_mpi_init.c:450
>> #2  0x2ab63e39 in PMPI_Init (argc=0x7fffbb8c, 
>> argv=0x7fffbb80) at pinit.c:84
>> #3  0x2aaab7b965d6 in boost::mpi::environment::environment 
>> (this=0xa3a1d0, argc=@0x7fffbb8c, argv=@0x7fffbb80, 
>> abort_on_exception=true)
>>at ../tools/boost/libs/mpi/src/environment.cpp:98
>> #4  0x2aaabc7b311d in boost::mpi::python::mpi_init (python_argv=..., 
>> abort_on_exception=true) at 
>> ../tools/boost/libs/mpi/src/python/py_environment.cpp:60
>> #5  0x2aaabc7b33fb in boost::mpi::python::export_environment () at 
>> ../tools/boost/libs/mpi/src/python/py_environment.cpp:94
>> #6  0x2aaabc7d5ab5 in boost::mpi::python::init_module_mpi () at 
>> ../tools/boost/libs/mpi/src/python/module.cpp:44
>> #7  0x2aaab792a2f2 in 
>> boost::detail::function::void_function_ref_invoker0> void>::invoke (function_obj_ptr=...)
>>at ../tools/boost/boost/function/function_template.hpp:188
>> #8  0x2aaab7929e6b in boost::function0::operator() 
>> (this=0x7fffc110) at 
>> ../tools/boost/boost/function/function_template.hpp:767
>> #9  0x2aaab7928f11 in boost::python::handle_exception_impl (f=...) at 
>> ../tools/boost/libs/python/src/errors.cpp:25
>> #10 0x2aaab792a54f in boost::python::handle_exception 
>> (f=0x2aaabc7d5746 ) at 
>> ../tools/boost/boost/python/errors.hpp:29
>> #11 0x2aaab792a1d9 in boost::python::detail::(anonymous 
>> namespace)::init_module_in_scope (m=0x2aaabc617f68, 
>>init_function=0x2aaabc7d5746 ) at 
>> ../tools/boost/libs/python/src/module.cpp:24
>> #12 0x2aaab792a26c in boost::python::detail::init_module 
>> (name=0x2aaabc7f7f4d "mpi", init_function=0x2aaabc7d5746 
>> )
>>at ../tools/boost/libs/python/src/module.cpp:59
>> #13 0x2aaabc7d5b2b in boost::mpi::python::initmpi () at 
>> ../tools/boost/libs/mpi/src/python/module.cpp:34
>> #14 0x2b27e095 in _PyImport_LoadDynamicModule (name=0xac9435 "mpi", 
>> pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", fp=0xaca450) at 
>> ./Python/importdl.c:53
>> #15 0x2b279fd4 in load_module (name=0xac9435 "mpi", fp=0xaca450, 
>> pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", type=3, loader=0x0) at 
>> Python/import.c:1915
>> #16 0x2b27c2e8 in import_submodule (mod=0x2b533a20, 
>> subname=0xac9435 "mpi", fullname=0xac9435 "mpi") at Python/import.c:2700
>> #17 0x2b27b8fa in load_next (mod=0x2aaab0f075a8, 
>> altmod=0x2b533a20, p_name=0x7fffc3f8, buf=0xac9430 "util.mpi", 
>> p_buflen=0x7fffc408)
>>at Python/import.c:2519
>> #18 0x2b27a98d in import_module_level (name=0x0, globals=0xe95a70, 
>> locals=0xe95a70, fromlist=0x2b533a20, level=-1) at Python/import.c:2224
>> #19 0x2b27aeda in PyImport_ImportModuleLevel (name=0x2aaab0f00964 
>> "mpi", globals=0xe95a70, locals=0xe95a70, fromlist=0x2b533a20, level=-1) 
>> at Python/import.c:2288
>> #20 0x2b2419c4 in builtin___import__ (self=0x0, args=0x2aaabc6211f8, 
>> kwds=0x0) at Python/bltinmodule.c:49
>> #21 0x2b1b19c7 in PyCFunction_Call (func=0x2bf85510, 
>> arg=0x2aaabc6211f8, kw=0x0) at Objects/methodobject.c:85
>> #22 0x2b14d673 in 

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Ralph Castain
Can you pass us the actual mpirun command line being executed? Especially need 
to see the argv being passed to your application.


On Sep 27, 2014, at 7:09 PM, Amos Anderson  wrote:

> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. Also, 
> I have some gdb output (from 1.7.5) for your perusal, including a printout of 
> some of the variables' values.
> 
> 
> 
> Starting program: /home/user/myapp/tools/python/bin/python 
> test/regression/regression-test.py test/regression/regression-jobs
> [Thread debugging using libthread_db enabled]
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
> argv.c:299
> 299   str_len += strlen(*p) + 1;
> (gdb) where
> #0  0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
> argv.c:299
> #1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39390, requested=0, 
> provided=0x7fffba98) at runtime/ompi_mpi_init.c:450
> #2  0x2ab63e39 in PMPI_Init (argc=0x7fffbb8c, 
> argv=0x7fffbb80) at pinit.c:84
> #3  0x2aaab7b965d6 in boost::mpi::environment::environment 
> (this=0xa3a1d0, argc=@0x7fffbb8c, argv=@0x7fffbb80, 
> abort_on_exception=true)
>at ../tools/boost/libs/mpi/src/environment.cpp:98
> #4  0x2aaabc7b311d in boost::mpi::python::mpi_init (python_argv=..., 
> abort_on_exception=true) at 
> ../tools/boost/libs/mpi/src/python/py_environment.cpp:60
> #5  0x2aaabc7b33fb in boost::mpi::python::export_environment () at 
> ../tools/boost/libs/mpi/src/python/py_environment.cpp:94
> #6  0x2aaabc7d5ab5 in boost::mpi::python::init_module_mpi () at 
> ../tools/boost/libs/mpi/src/python/module.cpp:44
> #7  0x2aaab792a2f2 in 
> boost::detail::function::void_function_ref_invoker0::invoke 
> (function_obj_ptr=...)
>at ../tools/boost/boost/function/function_template.hpp:188
> #8  0x2aaab7929e6b in boost::function0::operator() 
> (this=0x7fffc110) at 
> ../tools/boost/boost/function/function_template.hpp:767
> #9  0x2aaab7928f11 in boost::python::handle_exception_impl (f=...) at 
> ../tools/boost/libs/python/src/errors.cpp:25
> #10 0x2aaab792a54f in boost::python::handle_exception 
> (f=0x2aaabc7d5746 ) at 
> ../tools/boost/boost/python/errors.hpp:29
> #11 0x2aaab792a1d9 in boost::python::detail::(anonymous 
> namespace)::init_module_in_scope (m=0x2aaabc617f68, 
>init_function=0x2aaabc7d5746 ) at 
> ../tools/boost/libs/python/src/module.cpp:24
> #12 0x2aaab792a26c in boost::python::detail::init_module 
> (name=0x2aaabc7f7f4d "mpi", init_function=0x2aaabc7d5746 
> )
>at ../tools/boost/libs/python/src/module.cpp:59
> #13 0x2aaabc7d5b2b in boost::mpi::python::initmpi () at 
> ../tools/boost/libs/mpi/src/python/module.cpp:34
> #14 0x2b27e095 in _PyImport_LoadDynamicModule (name=0xac9435 "mpi", 
> pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", fp=0xaca450) at 
> ./Python/importdl.c:53
> #15 0x2b279fd4 in load_module (name=0xac9435 "mpi", fp=0xaca450, 
> pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", type=3, loader=0x0) at 
> Python/import.c:1915
> #16 0x2b27c2e8 in import_submodule (mod=0x2b533a20, 
> subname=0xac9435 "mpi", fullname=0xac9435 "mpi") at Python/import.c:2700
> #17 0x2b27b8fa in load_next (mod=0x2aaab0f075a8, 
> altmod=0x2b533a20, p_name=0x7fffc3f8, buf=0xac9430 "util.mpi", 
> p_buflen=0x7fffc408)
>at Python/import.c:2519
> #18 0x2b27a98d in import_module_level (name=0x0, globals=0xe95a70, 
> locals=0xe95a70, fromlist=0x2b533a20, level=-1) at Python/import.c:2224
> #19 0x2b27aeda in PyImport_ImportModuleLevel (name=0x2aaab0f00964 
> "mpi", globals=0xe95a70, locals=0xe95a70, fromlist=0x2b533a20, level=-1) 
> at Python/import.c:2288
> #20 0x2b2419c4 in builtin___import__ (self=0x0, args=0x2aaabc6211f8, 
> kwds=0x0) at Python/bltinmodule.c:49
> #21 0x2b1b19c7 in PyCFunction_Call (func=0x2bf85510, 
> arg=0x2aaabc6211f8, kw=0x0) at Objects/methodobject.c:85
> #22 0x2b14d673 in PyObject_Call (func=0x2bf85510, 
> arg=0x2aaabc6211f8, kw=0x0) at Objects/abstract.c:2529
> #23 0x2b25ad03 in PyEval_CallObjectWithKeywords (func=0x2bf85510, 
> arg=0x2aaabc6211f8, kw=0x0) at Python/ceval.c:3890
> #24 0x2b2543e5 in PyEval_EvalFrameEx (f=0xe8aef0, throwflag=0) at 
> Python/ceval.c:2333
> #25 0x2b258b7e in PyEval_EvalCodeEx (co=0x2aaabc61ce00, 
> globals=0xe95a70, locals=0xe95a70, args=0x0, argcount=0, kws=0x0, kwcount=0, 
> defs=0x0, defcount=0, 
>closure=0x0) at Python/ceval.c:3253
> #26 0x2b24b5ce in PyEval_EvalCode (co=0x2aaabc61ce00, 
> globals=0xe95a70, locals=0xe95a70) at Python/ceval.c:667
> #27 0x2b2779e2 in PyImport_ExecCodeModuleEx (name=0xaa9080 
> "util.myappMPI", 

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-27 Thread Amos Anderson
FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. Also, I 
have some gdb output (from 1.7.5) for your perusal, including a printout of 
some of the variables' values.



Starting program: /home/user/myapp/tools/python/bin/python 
test/regression/regression-test.py test/regression/regression-jobs
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at argv.c:299
299 str_len += strlen(*p) + 1;
(gdb) where
#0  0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
argv.c:299
#1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39390, requested=0, 
provided=0x7fffba98) at runtime/ompi_mpi_init.c:450
#2  0x2ab63e39 in PMPI_Init (argc=0x7fffbb8c, argv=0x7fffbb80) 
at pinit.c:84
#3  0x2aaab7b965d6 in boost::mpi::environment::environment (this=0xa3a1d0, 
argc=@0x7fffbb8c, argv=@0x7fffbb80, abort_on_exception=true)
at ../tools/boost/libs/mpi/src/environment.cpp:98
#4  0x2aaabc7b311d in boost::mpi::python::mpi_init (python_argv=..., 
abort_on_exception=true) at 
../tools/boost/libs/mpi/src/python/py_environment.cpp:60
#5  0x2aaabc7b33fb in boost::mpi::python::export_environment () at 
../tools/boost/libs/mpi/src/python/py_environment.cpp:94
#6  0x2aaabc7d5ab5 in boost::mpi::python::init_module_mpi () at 
../tools/boost/libs/mpi/src/python/module.cpp:44
#7  0x2aaab792a2f2 in 
boost::detail::function::void_function_ref_invoker0::invoke 
(function_obj_ptr=...)
at ../tools/boost/boost/function/function_template.hpp:188
#8  0x2aaab7929e6b in boost::function0::operator() 
(this=0x7fffc110) at ../tools/boost/boost/function/function_template.hpp:767
#9  0x2aaab7928f11 in boost::python::handle_exception_impl (f=...) at 
../tools/boost/libs/python/src/errors.cpp:25
#10 0x2aaab792a54f in boost::python::handle_exception 
(f=0x2aaabc7d5746 ) at 
../tools/boost/boost/python/errors.hpp:29
#11 0x2aaab792a1d9 in boost::python::detail::(anonymous 
namespace)::init_module_in_scope (m=0x2aaabc617f68, 
init_function=0x2aaabc7d5746 ) at 
../tools/boost/libs/python/src/module.cpp:24
#12 0x2aaab792a26c in boost::python::detail::init_module 
(name=0x2aaabc7f7f4d "mpi", init_function=0x2aaabc7d5746 
)
at ../tools/boost/libs/python/src/module.cpp:59
#13 0x2aaabc7d5b2b in boost::mpi::python::initmpi () at 
../tools/boost/libs/mpi/src/python/module.cpp:34
#14 0x2b27e095 in _PyImport_LoadDynamicModule (name=0xac9435 "mpi", 
pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", fp=0xaca450) at 
./Python/importdl.c:53
#15 0x2b279fd4 in load_module (name=0xac9435 "mpi", fp=0xaca450, 
pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", type=3, loader=0x0) at 
Python/import.c:1915
#16 0x2b27c2e8 in import_submodule (mod=0x2b533a20, 
subname=0xac9435 "mpi", fullname=0xac9435 "mpi") at Python/import.c:2700
#17 0x2b27b8fa in load_next (mod=0x2aaab0f075a8, altmod=0x2b533a20, 
p_name=0x7fffc3f8, buf=0xac9430 "util.mpi", p_buflen=0x7fffc408)
at Python/import.c:2519
#18 0x2b27a98d in import_module_level (name=0x0, globals=0xe95a70, 
locals=0xe95a70, fromlist=0x2b533a20, level=-1) at Python/import.c:2224
#19 0x2b27aeda in PyImport_ImportModuleLevel (name=0x2aaab0f00964 
"mpi", globals=0xe95a70, locals=0xe95a70, fromlist=0x2b533a20, level=-1) at 
Python/import.c:2288
#20 0x2b2419c4 in builtin___import__ (self=0x0, args=0x2aaabc6211f8, 
kwds=0x0) at Python/bltinmodule.c:49
#21 0x2b1b19c7 in PyCFunction_Call (func=0x2bf85510, 
arg=0x2aaabc6211f8, kw=0x0) at Objects/methodobject.c:85
#22 0x2b14d673 in PyObject_Call (func=0x2bf85510, 
arg=0x2aaabc6211f8, kw=0x0) at Objects/abstract.c:2529
#23 0x2b25ad03 in PyEval_CallObjectWithKeywords (func=0x2bf85510, 
arg=0x2aaabc6211f8, kw=0x0) at Python/ceval.c:3890
#24 0x2b2543e5 in PyEval_EvalFrameEx (f=0xe8aef0, throwflag=0) at 
Python/ceval.c:2333
#25 0x2b258b7e in PyEval_EvalCodeEx (co=0x2aaabc61ce00, 
globals=0xe95a70, locals=0xe95a70, args=0x0, argcount=0, kws=0x0, kwcount=0, 
defs=0x0, defcount=0, 
closure=0x0) at Python/ceval.c:3253
#26 0x2b24b5ce in PyEval_EvalCode (co=0x2aaabc61ce00, globals=0xe95a70, 
locals=0xe95a70) at Python/ceval.c:667
#27 0x2b2779e2 in PyImport_ExecCodeModuleEx (name=0xaa9080 
"util.myappMPI", co=0x2aaabc61ce00, pathname=0xe7d380 
"/home/user/myapp/src/util/myappMPI.pyc")
at Python/import.c:709
#28 0x2b278629 in load_source_module (name=0xaa9080 "util.myappMPI", 
pathname=0xe7d380 "/home/user/myapp/src/util/myappMPI.pyc", fp=0x76eb00)
at Python/import.c:1099
#29 0x2b279fa0 in load_module (name=0xaa9080 "util.myappMPI", 
fp=0x76eb00, pathname=0x80fe00