Hi there,

I recently compiled OpenMPI 1.3.3 for a NetBSD platform as part of an attempt to get some MPI-based codes running on the SGE cycle-stealing grid we have in the School here.

I should point out that this has not been done within the pkgsrc build system as yet, but I found I was able to get a working environment by starting out with:

  ./configure --prefix=/vol/grid/pkg/openmpi-1.3.3 \
      --with-sge --disable-dlopen --enable-contrib-no-build=vt

OK, following a recent rebuild of the underlying NetBSD OS on the machines which participate in our grid, I am now seeing the following error message when trying to run a simple mpirun on a single box:

  $ mpirun -n 4 hello_f77
  [somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=6
   Hello, world, I am 0 of 4
   Hello, world, I am 1 of 4
   Hello, world, I am 2 of 4
   Hello, world, I am 3 of 4

Whilst this runs, I was not seeing the error before the OS rebuild.

When running on a "server" machine within the grid, a machine I am told should not be any different to the workstation I was using above in respect of user environment, I get a different error and find that the job does not run at all.

This case seems to produce an error message that is oft reported within the OpenMPI community:

  $ mpirun -n 4 hello_f77
  [somebox2.ecs.vuw.ac.nz:25244] [[51186,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module.c at line 150
  --------------------------------------------------------------------------
  It looks like orte_init failed for some reason; your parallel process is
  ...
  orte_rml_base_select failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
  --------------------------------------------------------------------------
  [somebox2.ecs.vuw.ac.nz:25244] [[51186,0],0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 132
  --------------------------------------------------------------------------
  It looks like orte_init failed for some reason; your parallel process is
  ...
  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
  --------------------------------------------------------------------------
  [somebox2.ecs.vuw.ac.nz:25244] [[51186,0],0] ORTE_ERROR_LOG: Error in file orterun.c at line 473

Anyone like to suggest what I might do to better understand and so possibly correct these issues?

Kevin

-- 
Kevin M. Buckley                       Room:  CO327
School of Engineering and              Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand