Hi all:

I recently rebuilt my cluster from Rocks 5 to Rocks 6 (which is based on CentOS 6.2) and built Open MPI 1.6 using the official spec file and the same build options as before. It all built successfully and everything appeared fine, until someone tried to use it. Open MPI is built with Torque integration, and jobs are run through Torque. When a user's job runs, the following ends up in the error file and the program does not run successfully:

--------------------------------------------------------------------------
Open RTE was unable to open the hostfile:
    /opt/openmpi-gcc/1.6/etc/openmpi-default-hostfile
Check to make sure the path and filename are correct.
--------------------------------------------------------------------------
[compute-0-2.local:13834] [[12466,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_support_fns.c at line 88
[compute-0-2.local:13834] [[12466,0],0] ORTE_ERROR_LOG: Not found in file rmaps_rr.c at line 82
[compute-0-2.local:13834] [[12466,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_map_job.c at line 88
[compute-0-2.local:13834] [[12466,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 105
[compute-0-2.local:13834] [[12466,0],0] ORTE_ERROR_LOG: Not found in file plm_tm_module.c at line 194
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

This has been confirmed with several different node assignments. Any ideas on cause or fixes?
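
For anyone trying to reproduce this, the first message can be checked directly on a compute node. The following is only a minimal sketch: the function name check_default_hostfile is made up for illustration, and the default prefix is simply the path taken from the error output above.

```shell
#!/bin/sh
# Report whether Open MPI's default hostfile is readable under an install prefix.
# check_default_hostfile is a hypothetical helper name; the default prefix is
# copied from the error message above and may differ on other installs.
check_default_hostfile() {
    prefix="${1:-/opt/openmpi-gcc/1.6}"
    hostfile="$prefix/etc/openmpi-default-hostfile"
    if [ -r "$hostfile" ]; then
        echo "present: $hostfile"
    else
        echo "missing: $hostfile"
        return 1
    fi
}
```

Calling check_default_hostfile with no argument on a compute node (for example compute-0-2) tests the path from the error. Separately, running `ompi_info | grep tm` on the same node should list the tm components if the Torque support was actually built in.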

I built it with this command:

    rpmbuild -bb \
        --define 'install_in_opt 1' \
        --define 'install_modulefile 1' \
        --define 'modules_rpm_name environment-modules' \
        --define 'build_all_in_one_rpm 0' \
        --define 'configure_options --with-tm=/opt/torque' \
        --define '_name openmpi-gcc' \
        --define 'makeopts -J8' \
        openmpi.spec

The PGI version was built with:

    CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 rpmbuild -bb \
        --define 'install_in_opt 1' \
        --define 'install_modulefile 1' \
        --define 'modules_rpm_name environment-modules' \
        --define 'build_all_in_one_rpm 0' \
        --define 'configure_options --with-tm=/opt/torque' \
        --define '_name openmpi-pgi' \
        --define 'use_default_rpm_opt_flags 0' \
        openmpi.spec

--Jim