Re: [OMPI users] mpirun fails across cluster
Hi Syed Ahsan Ali

On 02/27/2015 12:46 PM, Syed Ahsan Ali wrote:
> Oh sorry. That is related to the application. I need to recompile the
> application too, I guess.

You surely do. Also, make sure the environment, in particular PATH and
LD_LIBRARY_PATH, is propagated to the compute nodes. Not doing that is a
common cause of trouble. Open MPI needs PATH and LD_LIBRARY_PATH at
runtime as well.

I hope this helps,
Gus Correa
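For example, mpirun can export environment variables to the remote daemons with -x, or be pointed at the remote installation with --prefix (a sketch using the standard Open MPI 1.8-series options, untested on this particular cluster):

[pmdtest@hpc bin]$ mpirun -x PATH -x LD_LIBRARY_PATH --host compute-0-0 hostname
[pmdtest@hpc bin]$ mpirun --prefix /share/apps/openmpi-1.8.4_gcc-4.9.2 --host compute-0-0 hostname

The second form behaves like invoking mpirun by its absolute path: the remote orted uses the matching installation regardless of what the login environment on the node provides.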
Re: [OMPI users] mpirun fails across cluster
Oh sorry. That is related to the application. I need to recompile the
application too, I guess.

On Fri, Feb 27, 2015 at 10:44 PM, Syed Ahsan Ali wrote:
> Dear Gus
>
> Thanks once again for the suggestion. Yes, I did that before installing
> to the new path. I am now getting an error about a library:
>
> tstint2lm: error while loading shared libraries: libmpi_usempif08.so.0:
> cannot open shared object file: No such file or directory
>
> The library is present:
>
> [pmdtest@hpc bin]$ locate libmpi_usempif08.so.0
> /state/partition1/apps/openmpi-1.8.4_gcc-4.9.2/lib/libmpi_usempif08.so.0
> /state/partition1/apps/openmpi-1.8.4_gcc-4.9.2/lib/libmpi_usempif08.so.0.6.0
>
> and it is in the path as well:
>
> [pmdtest@hpc bin]$ echo $LD_LIBRARY_PATH
> /share/apps/openmpi-1.8.4_gcc-4.9.2/lib:/share/apps/libpng-1.6.16/lib:/share/apps/netcdf-fortran-4.4.1_gcc-4.9.2_wo_hdf5/lib:/share/apps/netcdf-4.3.2_gcc_wo_hdf5/lib:/share/apps/grib_api-1.11.0/lib:/share/apps/jasper-1.900.1/lib:/share/apps/zlib-1.2.8_gcc-4.9.2/lib:/share/apps/gcc-4.9.2/lib64:/share/apps/gcc-4.9.2/lib:/usr/lib64:/usr/share/Modules/lib:/opt/python/lib
>
> Ahsan
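A quick way to see whether the dynamic linker on a compute node can actually resolve that library is ldd over a non-interactive ssh (a generic diagnostic sketch; the path to the tstint2lm binary is a placeholder):

[pmdtest@hpc bin]$ ssh compute-0-0 'ldd /path/to/tstint2lm | grep libmpi'

If that prints "libmpi_usempif08.so.0 => not found", then LD_LIBRARY_PATH is not reaching the shell in which the program actually runs, even though it looks correct in an interactive login.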
Re: [OMPI users] mpirun fails across cluster
Hi Syed Ahsan Ali

To avoid any leftovers and further confusion, I suggest that you delete
the old installation directory completely, then start fresh from the
configure step, with the prefix pointing to
--prefix=/share/apps/openmpi-1.8.4_gcc-4.9.2

I hope this helps,
Gus Correa
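Concretely, the rebuild would look something like this (a generic sketch of the usual Open MPI source build; everything beyond --prefix is an assumption, not taken from the thread):

# remove the old, wrongly-prefixed install first
$ rm -rf /export/apps/openmpi-1.8.4_gcc-4.9.2
# reconfigure and build with the cluster-wide prefix
$ cd openmpi-1.8.4
$ ./configure --prefix=/share/apps/openmpi-1.8.4_gcc-4.9.2 CC=gcc CXX=g++ FC=gfortran
$ make -j 4 all
$ make install

Any application built against the old install (tstint2lm here) then needs to be recompiled with the new mpicc/mpif90 wrappers.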
Re: [OMPI users] mpirun fails across cluster
Hi Gus

Thanks for the prompt response. Well judged: I compiled with the
/export/apps prefix, so that is most probably the reason. I'll check and
update you.

Best wishes
Ahsan

--
Syed Ahsan Ali Bokhari
Electronic Engineer (EE)
Research & Development Division
Pakistan Meteorological Department
H-8/4, Islamabad.
Phone # off +92518358714
Cell # +923155145014
Re: [OMPI users] mpirun fails across cluster
Hi Syed

This really sounds like a problem specific to Rocks Clusters, not an
issue with Open MPI: a confusion related to mount points and the soft
links used by Rocks.

I haven't used Rocks Clusters in a while, and I don't remember the
details anymore, so please take my suggestions with a grain of salt and
check them out before committing to them.

Which --prefix did you use when you configured Open MPI? My suggestion
is that you don't use "/export/apps" as a prefix (and this goes for any
application that you install), but instead use a /share/apps
subdirectory, something like:

--prefix=/share/apps/openmpi-1.8.4_gcc-4.9.2

This is because /export/apps is just a mount point on the frontend/head
node, whereas /share/apps is a mount point across all nodes in the
cluster (and, IIRR, a soft link on the head node). My recollection is
that the Rocks documentation was obscure about this, not making clear
the difference between /export/apps and /share/apps.

Issuing the Rocks commands:

tentakel 'ls -d /export/apps'
tentakel 'ls -d /share/apps'

may show something useful.

I hope this helps,
Gus Correa
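To verify the fix on every node, the same parallel-shell idea extends to the Open MPI directories themselves (a sketch; paths as used elsewhere in this thread):

$ tentakel 'ls -d /export/apps /share/apps'
$ tentakel 'ls -d /share/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi'
$ ssh compute-0-0 ls /share/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-opal-runtime.txt

The last command checks that the exact help file named in the error message is reachable under the /share/apps path on a compute node.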
[OMPI users] mpirun fails across cluster
I am trying to run an Open MPI application on my cluster, but mpirun
fails; even a simple hostname command gives this error:

[pmdtest@hpc bin]$ mpirun --host compute-0-0 hostname
--
Sorry! You were supposed to get help about:
    opal_init:startup:internal-failure
But I couldn't open the help file:
    /export/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-opal-runtime.txt: No such file or directory. Sorry!
--
--
Sorry! You were supposed to get help about:
    orte_init:startup:internal-failure
But I couldn't open the help file:
    /export/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-orte-runtime: No such file or directory. Sorry!
--
[compute-0-0.local:03410] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file orted/orted_main.c at line 369
--
ORTE was unable to reliably start one or more daemons.

I am using Environment Modules to load Open MPI 1.8.4, and PATH and
LD_LIBRARY_PATH point to the same Open MPI on the nodes:

[pmdtest@hpc bin]$ which mpirun
/share/apps/openmpi-1.8.4_gcc-4.9.2/bin/mpirun
[pmdtest@hpc bin]$ ssh compute-0-0
Last login: Sat Feb 28 02:15:50 2015 from hpc.local
Rocks Compute Node
Rocks 6.1.1 (Sand Boa)
Profile built 01:53 28-Feb-2015
Kickstarted 01:59 28-Feb-2015
[pmdtest@compute-0-0 ~]$ which mpirun
/share/apps/openmpi-1.8.4_gcc-4.9.2/bin/mpirun

The only thing I notice that seems important is that the error refers to

/export/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-opal-runtime.txt

while it should have shown

/share/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-opal-runtime.txt

which is the path the compute nodes see.

Please help!

Ahsan
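One extra check worth making here (not from the thread, but a common pitfall): mpirun starts its remote daemons over non-interactive ssh, so the environment shown by an interactive login like the one above is not necessarily what orted sees. Comparing the two can expose module loads that only happen for login shells:

[pmdtest@hpc bin]$ ssh compute-0-0 'which mpirun; echo $LD_LIBRARY_PATH'
[pmdtest@hpc bin]$ ssh compute-0-0 env | grep -E '^(PATH|LD_LIBRARY_PATH)='

If these disagree with the interactive session, the module load belongs in a file that non-interactive shells read (e.g. ~/.bashrc rather than ~/.bash_profile).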