Hi,

I was trying to run some MPI processes as singletons. On some machines they crash in MPI_Init. I use exactly the same application binaries and the same Open MPI 1.4.2 installation on two machines; it works on one and fails on the other. This is the command and its output (test is a simple application that calls only MPI_Init and MPI_Finalize):

LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test
[host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/orted/orted_main.c at line 323
[host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 381
[host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 143
[host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node (-128) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[host01:21865] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!

Any ideas on this?

Thanks,
Grzegorz Maj