Re: [OMPI users] mpirun fails across cluster

2015-02-27 Thread Gus Correa

Hi Syed Ahsan Ali

On 02/27/2015 12:46 PM, Syed Ahsan Ali wrote:

Oh sorry. That is related to the application. I need to recompile the
application too, I guess.


You surely do.
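For instance, something along these lines (the module name and the build
command are only placeholders for whatever your setup actually uses):

  module purge
  module load openmpi/1.8.4_gcc-4.9.2  # assumed name of the rebuilt Open MPI module
  which mpif90 mpicc                   # should now point into /share/apps/openmpi-1.8.4_gcc-4.9.2/bin
  make clean && make                   # rebuild the application against the new MPI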

Also, make sure the environment, in particular PATH and LD_LIBRARY_PATH,
is propagated to the compute nodes.
Not doing that is a common cause of trouble.
Open MPI needs PATH and LD_LIBRARY_PATH at runtime as well.
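As a quick test you can also export them explicitly on the mpirun
command line, and then check on a compute node that the loader really
resolves the Open MPI libraries. A rough sketch (the binary path below
is a placeholder):

  mpirun -x PATH -x LD_LIBRARY_PATH --host compute-0-0 hostname

  ssh compute-0-0
  ldd /path/to/tstint2lm | grep libmpi  # should list libraries under /share/apps/openmpi-1.8.4_gcc-4.9.2/lib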

I hope this helps,
Gus Correa



On Fri, Feb 27, 2015 at 10:44 PM, Syed Ahsan Ali  wrote:

Dear Gus

Thanks once again for the suggestion. Yes, I did that before installing
to the new path. I am now getting an error about a missing library:

tstint2lm: error while loading shared libraries:
libmpi_usempif08.so.0: cannot open shared object file: No such file or
directory

The library is present, though:

[pmdtest@hpc bin]$ locate libmpi_usempif08.so.0
/state/partition1/apps/openmpi-1.8.4_gcc-4.9.2/lib/libmpi_usempif08.so.0
/state/partition1/apps/openmpi-1.8.4_gcc-4.9.2/lib/libmpi_usempif08.so.0.6.0

and it is on the library path as well:

echo $LD_LIBRARY_PATH
/share/apps/openmpi-1.8.4_gcc-4.9.2/lib:/share/apps/libpng-1.6.16/lib:/share/apps/netcdf-fortran-4.4.1_gcc-4.9.2_wo_hdf5/lib:/share/apps/netcdf-4.3.2_gcc_wo_hdf5/lib:/share/apps/grib_api-1.11.0/lib:/share/apps/jasper-1.900.1/lib:/share/apps/zlib-1.2.8_gcc-4.9.2/lib:/share/apps/gcc-4.9.2/lib64:/share/apps/gcc-4.9.2/lib:/usr/lib64:/usr/share/Modules/lib:/opt/python/lib
[pmdtest@hpc bin]$

Ahsan

On Fri, Feb 27, 2015 at 10:17 PM, Gus Correa  wrote:

Hi Syed Ahsan Ali

To avoid any leftovers and further confusion,
I suggest that you delete completely the old installation directory.
Then start fresh from the configure step with the prefix pointing to
--prefix=/share/apps/openmpi-1.8.4_gcc-4.9.2
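That is, roughly (delete the old installation directory first; the
compiler variables below are only an example, assuming you build with
GCC 4.9.2):

  cd openmpi-1.8.4          # your Open MPI build tree
  make distclean            # drop any previous configuration
  ./configure --prefix=/share/apps/openmpi-1.8.4_gcc-4.9.2 CC=gcc CXX=g++ FC=gfortran
  make -j4 && make install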

I hope this helps,
Gus Correa

On 02/27/2015 12:11 PM, Syed Ahsan Ali wrote:


Hi Gus

Thanks for the prompt response. Well judged: I compiled with the
/export/apps prefix, so that is most probably the reason. I'll check
and update you.

Best wishes
Ahsan

On Fri, Feb 27, 2015 at 10:07 PM, Gus Correa 
wrote:


Hi Syed

This really sounds like a problem specific to Rocks Clusters,
not an issue with Open MPI: a confusion related to the mount points
and soft links used by Rocks.

I haven't used Rocks Clusters in a while,
and I don't remember the details anymore, so please take my
suggestions with a grain of salt, and check them out
before committing to them.

Which --prefix did you use when you configured Open MPI?
My suggestion is that you don't use "/export/apps" as a prefix
(and this goes for any application that you install),
but instead use a /share/apps subdirectory, something like:

--prefix=/share/apps/openmpi-1.8.4_gcc-4.9.2

This is because /export/apps is just a mount point on the
frontend/head node, whereas /share/apps is a mount point
across all nodes in the cluster (and, IIRR, a soft link on the
head node).

My recollection is that the Rocks documentation was obscure
about this, not making clear the difference between
/export/apps and /share/apps.

Issuing the Rocks commands:
"tentakel 'ls -d /export/apps'"
"tentakel 'ls -d /share/apps'"
may show something useful.
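If tentakel is not handy, a plain ssh check of one compute node gives
the same information, e.g.:

  ssh compute-0-0 'ls -ld /export/apps /share/apps'
  ssh compute-0-0 'ls /share/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/ | head'

The second command should show the help-*.txt files once the
installation lands under /share/apps.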

I hope this helps,
Gus Correa


On 02/27/2015 11:47 AM, Syed Ahsan Ali wrote:



I am trying to run an Open MPI application on my cluster, but mpirun
fails; even a simple hostname command gives this error:

[pmdtest@hpc bin]$ mpirun --host compute-0-0 hostname

--
Sorry!  You were supposed to get help about:
   opal_init:startup:internal-failure
But I couldn't open the help file:


/export/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-opal-runtime.txt:
No such file or directory.  Sorry!

--

--
Sorry!  You were supposed to get help about:
   orte_init:startup:internal-failure
But I couldn't open the help file:

/export/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-orte-runtime:
No such file or directory.  Sorry!

--
[compute-0-0.local:03410] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in
file orted/orted_main.c at line 369

--
ORTE was unable to reliably start one or more daemons.

I am using Environment Modules to load Open MPI 1.8.4, and PATH and
LD_LIBRARY_PATH point to the same Open MPI installation on the nodes:

[pmdtest@hpc bin]$ which mpirun
/share/apps/openmpi-1.8.4_gcc-4.9.2/bin/mpirun
[pmdtest@hpc bin]$ ssh compute-0-0
Last login: Sat Feb 28 02:15:50 2015 from hpc.local
Rocks Compute Node
Rocks 6.1.1 (Sand Boa)
Profile built 01:53 28-Feb-2015
Kickstarted 01:59 28-Feb-2015
[pmdtest@compute-0-0 ~]$ which mpirun
/share/apps/openmpi-1.8.4_gcc-4.9.2/bin/mpirun

The only thing I notice as important is that the error refers to

/export/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-opal-runtime.txt

while it should have shown

/share/apps/openmpi-1.8.4_gcc-4.9.2/share/openmpi/help-opal-runtime.txt

which is the path the compute nodes see.

Please help!
Ahsan
