Re: [OMPI users] Newbie question?

2012-09-15 Thread Ralph Castain
No - the MCA param has to be specified *before* your executable:

mpiexec -mca btl ^openib -n 4 ./a.out

Also, note the space between "btl" and "^openib".
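
The same parameter can also be set without touching the command line - a minimal sketch using Open MPI's standard mechanisms (the OMPI_MCA_ environment prefix and the per-user mca-params.conf file):

$ export OMPI_MCA_btl=^openib
$ mpiexec -n 4 ./a.out

or, to make it permanent, add this line to $HOME/.openmpi/mca-params.conf:

btl = ^openib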


On Sep 15, 2012, at 5:45 PM, John Chludzinski  
wrote:

> Is this what you intended?
> 
> $ mpiexec -n 4 ./a.out -mca btl^openib
> 
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>   Host: elzbieta
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
>  rank=1  Results:    5.000   6.000   7.000   8.000
>  rank=0  Results:    1.000   2.000   3.000   4.000
>  rank=2  Results:    9.000   10.00   11.00   12.00
>  rank=3  Results:    13.00   14.00   15.00   16.00
> [elzbieta:02374] 3 more processes have sent help message 
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> help / error messages
> 
> 
> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it up.
> 
> 
> On Sep 15, 2012, at 12:44 PM, John Chludzinski  
> wrote:
> 
>> There was a bug in the code.  So now I get this, which is correct, but how
>> do I get rid of all these ABI, CMA, etc. messages?
>> 
>> $ mpiexec -n 4 ./a.out 
>> librdmacm: couldn't read ABI version.
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>> 
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>> 
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>>  rank=1  Results:    5.000   6.000   7.000   8.000
>>  rank=2  Results:    9.000   10.00   11.00   12.00
>>  rank=0  Results:    1.000   2.000   3.000   4.000
>>  rank=3  Results:    13.00   14.00   15.00   16.00
>> [elzbieta:02559] 3 more processes have sent help message 
>> help-mpi-btl-base.txt / btl:no-nics
>> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
>> all help / error messages
>> 
>> 
>> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski 
>>  wrote:
>> BTW, here's the example code:
>> 
>> program scatter
>> include 'mpif.h'
>> 
>> integer, parameter :: SIZE=4
>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>> 
>> !  Fortran stores this array in column major order, so the 
>> !  scatter will actually scatter columns, not rows.
>> data sendbuf /1.0, 2.0, 3.0, 4.0, &
>> 5.0, 6.0, 7.0, 8.0, &
>> 9.0, 10.0, 11.0, 12.0, &
>> 13.0, 14.0, 15.0, 16.0 /
>> 
>> call MPI_INIT(ierr)
>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>> 
>> if (numtasks .eq. SIZE) then
>>   source = 1
>>   sendcount = SIZE
>>   recvcount = SIZE
>>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>>   print *, 'rank= ',rank,' Results: ',recvbuf 
>> else
>>print *, 'Must specify',SIZE,' processors.  Terminating.' 
>> endif
>> 
>> call MPI_FINALIZE(ierr)
>> 
>> end program
>> 
>> 
>> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski 
>>  wrote:
>> # export LD_LIBRARY_PATH
>> 
>> 
>> # mpiexec -n 1 printenv | grep PATH
>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>> 
>> 

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
Is this what you intended?

$ mpiexec -n 4 ./a.out -mca btl^openib

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[5991,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
 rank=1  Results:    5.000   6.000   7.000   8.000
 rank=0  Results:    1.000   2.000   3.000   4.000
 rank=2  Results:    9.000   10.00   11.00   12.00
 rank=3  Results:    13.00   14.00   15.00   16.00
[elzbieta:02374] 3 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
[elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
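
As that last hint says, the aggregated copies can be shown by setting orte_base_help_aggregate on the command line too - for example, combined with the btl setting suggested above:

$ mpiexec -mca orte_base_help_aggregate 0 -mca btl ^openib -n 4 ./a.out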


On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:

> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it
> up.
>
>
> On Sep 15, 2012, at 12:44 PM, John Chludzinski 
> wrote:
>
> There was a bug in the code.  So now I get this, which is correct, but how
> do I get rid of all these ABI, CMA, etc. messages?
>
> $ mpiexec -n 4 ./a.out
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: elzbieta
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
>  rank=1  Results:    5.000   6.000   7.000   8.000
>  rank=2  Results:    9.000   10.00   11.00   12.00
>  rank=0  Results:    1.000   2.000   3.000   4.000
>  rank=3  Results:    13.00   14.00   15.00   16.00
> [elzbieta:02559] 3 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
>
> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> BTW, here's the example code:
>>
>> program scatter
>> include 'mpif.h'
>>
>> integer, parameter :: SIZE=4
>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>>
>> !  Fortran stores this array in column major order, so the
>> !  scatter will actually scatter columns, not rows.
>> data sendbuf /1.0, 2.0, 3.0, 4.0, &
>> 5.0, 6.0, 7.0, 8.0, &
>> 9.0, 10.0, 11.0, 12.0, &
>> 13.0, 14.0, 15.0, 16.0 /
>>
>> call MPI_INIT(ierr)
>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>>
>> if (numtasks .eq. SIZE) then
>>   source = 1
>>   sendcount = SIZE
>>   recvcount = SIZE
>>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>>   print *, 'rank= ',rank,' Results: ',recvbuf
>> else
>>print *, 'Must specify',SIZE,' processors.  Terminating.'
>> endif
>>
>> call MPI_FINALIZE(ierr)
>>
>> end program
>>
>>
>> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>>> # export LD_LIBRARY_PATH
>>>
>>>
>>> # mpiexec -n 1 printenv | grep PATH
>>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>>
>>>
>>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>>> WINDOWPATH=1
>>>
>>> # mpiexec -n 4 ./a.out
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --
>>> [[3598,1],0]: 

Re: [OMPI users] Newbie question?

2012-09-15 Thread Ralph Castain
Try adding "-mca btl ^openib" to your cmd line and see if that cleans it up.


On Sep 15, 2012, at 12:44 PM, John Chludzinski  
wrote:

> There was a bug in the code.  So now I get this, which is correct, but how
> do I get rid of all these ABI, CMA, etc. messages?
> 
> $ mpiexec -n 4 ./a.out 
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>   Host: elzbieta
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --
>  rank=1  Results:    5.000   6.000   7.000   8.000
>  rank=2  Results:    9.000   10.00   11.00   12.00
>  rank=0  Results:    1.000   2.000   3.000   4.000
>  rank=3  Results:    13.00   14.00   15.00   16.00
> [elzbieta:02559] 3 more processes have sent help message 
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> help / error messages
> 
> 
> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski 
>  wrote:
> BTW, here's the example code:
> 
> program scatter
> include 'mpif.h'
> 
> integer, parameter :: SIZE=4
> integer :: numtasks, rank, sendcount, recvcount, source, ierr
> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
> 
> !  Fortran stores this array in column major order, so the 
> !  scatter will actually scatter columns, not rows.
> data sendbuf /1.0, 2.0, 3.0, 4.0, &
> 5.0, 6.0, 7.0, 8.0, &
> 9.0, 10.0, 11.0, 12.0, &
> 13.0, 14.0, 15.0, 16.0 /
> 
> call MPI_INIT(ierr)
> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
> 
> if (numtasks .eq. SIZE) then
>   source = 1
>   sendcount = SIZE
>   recvcount = SIZE
>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>   print *, 'rank= ',rank,' Results: ',recvbuf 
> else
>print *, 'Must specify',SIZE,' processors.  Terminating.' 
> endif
> 
> call MPI_FINALIZE(ierr)
> 
> end program
> 
> 
> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski 
>  wrote:
> # export LD_LIBRARY_PATH
> 
> 
> # mpiexec -n 1 printenv | grep PATH
> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
> 
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
> 
> # mpiexec -n 4 ./a.out 
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>   Host: elzbieta
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> [elzbieta:4145] *** An error occurred in MPI_Scatter
> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --
> mpiexec has exited due to process rank 1 with PID 4145 on
> node elzbieta exiting improperly. There are two reasons this could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
There was a bug in the code.  So now I get this, which is correct, but how
do I get rid of all these ABI, CMA, etc. messages?

$ mpiexec -n 4 ./a.out
librdmacm: couldn't read ABI version.
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: assuming: 4
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[6110,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
 rank=1  Results:    5.000   6.000   7.000   8.000
 rank=2  Results:    9.000   10.00   11.00   12.00
 rank=0  Results:    1.000   2.000   3.000   4.000
 rank=3  Results:    13.00   14.00   15.00   16.00
[elzbieta:02559] 3 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
[elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages


On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
john.chludzin...@gmail.com> wrote:

> BTW, here's the example code:
>
> program scatter
> include 'mpif.h'
>
> integer, parameter :: SIZE=4
> integer :: numtasks, rank, sendcount, recvcount, source, ierr
> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>
> !  Fortran stores this array in column major order, so the
> !  scatter will actually scatter columns, not rows.
> data sendbuf /1.0, 2.0, 3.0, 4.0, &
> 5.0, 6.0, 7.0, 8.0, &
> 9.0, 10.0, 11.0, 12.0, &
> 13.0, 14.0, 15.0, 16.0 /
>
> call MPI_INIT(ierr)
> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>
> if (numtasks .eq. SIZE) then
>   source = 1
>   sendcount = SIZE
>   recvcount = SIZE
>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>   print *, 'rank= ',rank,' Results: ',recvbuf
> else
>print *, 'Must specify',SIZE,' processors.  Terminating.'
> endif
>
> call MPI_FINALIZE(ierr)
>
> end program
>
>
> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> # export LD_LIBRARY_PATH
>>
>>
>> # mpiexec -n 1 printenv | grep PATH
>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>
>>
>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>> WINDOWPATH=1
>>
>> # mpiexec -n 4 ./a.out
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> CMA: unable to get RDMA device list
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> [elzbieta:4145] *** An error occurred in MPI_Scatter
>> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
>> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
>> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>> --
>> mpiexec has exited due to process rank 1 with PID 4145 on
>> node elzbieta exiting improperly. There are two reasons this could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpiexec (as reported here).
>> 

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
BTW, here's the example code:

program scatter
include 'mpif.h'

integer, parameter :: SIZE=4
integer :: numtasks, rank, sendcount, recvcount, source, ierr
real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)

!  Fortran stores this array in column-major order, so the
!  scatter will actually scatter columns, not rows.
data sendbuf / 1.0,  2.0,  3.0,  4.0, &
               5.0,  6.0,  7.0,  8.0, &
               9.0, 10.0, 11.0, 12.0, &
              13.0, 14.0, 15.0, 16.0 /

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

if (numtasks .eq. SIZE) then
  source = 1
  sendcount = SIZE
  recvcount = SIZE
  call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
                   recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
  print *, 'rank= ', rank, ' Results: ', recvbuf
else
  print *, 'Must specify', SIZE, ' processors.  Terminating.'
endif

call MPI_FINALIZE(ierr)

end program
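
Incidentally, the Fedora package also installs mpi.mod (it shows up in the /usr/lib/openmpi/lib listing elsewhere in this thread), so the include 'mpif.h' line can be replaced with "use mpi". That gives the compiler explicit interfaces for many MPI routines and can flag argument-list mistakes at compile time instead of at run time. A minimal sketch of the same program in that style, assuming it is saved as scatter.f90 and mpif90 finds mpi.mod:

program scatter_mod
  use mpi                ! compile-time interfaces instead of include 'mpif.h'
  implicit none

  integer, parameter :: SIZE = 4
  integer :: numtasks, rank, ierr, i
  real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)

  ! Same 4x4 payload as above: 1..16 filled in column-major order.
  sendbuf = reshape( (/ (real(i), i = 1, SIZE*SIZE) /), (/ SIZE, SIZE /) )

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

  if (numtasks == SIZE) then
     ! Rank 1 is the root, as in the original example.
     call MPI_SCATTER(sendbuf, SIZE, MPI_REAL, recvbuf, SIZE, MPI_REAL, &
                      1, MPI_COMM_WORLD, ierr)
     print *, 'rank= ', rank, ' Results: ', recvbuf
  else
     print *, 'Must specify', SIZE, ' processors.  Terminating.'
  end if

  call MPI_FINALIZE(ierr)
end program scatter_mod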


On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
john.chludzin...@gmail.com> wrote:

> # export LD_LIBRARY_PATH
>
>
> # mpiexec -n 1 printenv | grep PATH
> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>
>
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
>
> # mpiexec -n 4 ./a.out
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: elzbieta
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> [elzbieta:4145] *** An error occurred in MPI_Scatter
> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --
> mpiexec has exited due to process rank 1 with PID 4145 on
> node elzbieta exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --
>
>
>
> On Sat, Sep 15, 2012 at 2:24 PM, Ralph Castain  wrote:
>
>> Ah - note that there is no LD_LIBRARY_PATH in the environment. That's the
>> problem.
>>
>> On Sep 15, 2012, at 11:19 AM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>> $ which mpiexec
>> /usr/lib/openmpi/bin/mpiexec
>>
>> # mpiexec -n 1 printenv | grep PATH
>>
>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>> WINDOWPATH=1
>>
>>
>>
>> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:
>>
>>> Couple of things worth checking:
>>>
>>> 1. verify that you executed the "mpiexec" you think you did - a simple
>>> "which mpiexec" should suffice
>>>
>>> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
>>> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
>>> it should
>>>
>>>
>>>  On Sep 15, 2012, at 10:00 AM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora
>>> 16) via:
>>>
>>> # yum install openmpi
>>> # yum install openmpi-devel
>>> # mpirun --version
>>> mpirun (Open MPI) 1.5.4
>>>
>>> I added:
>>>
>>> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
>>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>>
>>> Then:
>>>
>>> $ mpif90 ex1.f95
>>> $ mpiexec -n 4 ./a.out
>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
# export LD_LIBRARY_PATH

# mpiexec -n 1 printenv | grep PATH
LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
WINDOWPATH=1

# mpiexec -n 4 ./a.out
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[3598,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
CMA: unable to get RDMA device list
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
[elzbieta:4145] *** An error occurred in MPI_Scatter
[elzbieta:4145] *** on communicator MPI_COMM_WORLD
[elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
[elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--
mpiexec has exited due to process rank 1 with PID 4145 on
node elzbieta exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--


On Sat, Sep 15, 2012 at 2:24 PM, Ralph Castain  wrote:

> Ah - note that there is no LD_LIBRARY_PATH in the environment. That's the
> problem.
>
> On Sep 15, 2012, at 11:19 AM, John Chludzinski 
> wrote:
>
> $ which mpiexec
> /usr/lib/openmpi/bin/mpiexec
>
> # mpiexec -n 1 printenv | grep PATH
>
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
>
>
>
> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:
>
>> Couple of things worth checking:
>>
>> 1. verify that you executed the "mpiexec" you think you did - a simple
>> "which mpiexec" should suffice
>>
>> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
>> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
>> it should
>>
>>
>> On Sep 15, 2012, at 10:00 AM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora
>> 16) via:
>>
>> # yum install openmpi
>> # yum install openmpi-devel
>> # mpirun --version
>> mpirun (Open MPI) 1.5.4
>>
>> I added:
>>
>> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>
>> Then:
>>
>> $ mpif90 ex1.f95
>> $ mpiexec -n 4 ./a.out
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> --
>> mpiexec noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>>
>> ls -l /usr/lib/openmpi/lib/
>> total 6788
>> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
>> libmca_common_sm.so.2.0.0
>> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 ->
>> libmca_common_sm.so.2.0.0
>> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
>> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so ->
>> libmpi_cxx.so.1.0.1
>> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 ->
>> libmpi_cxx.so.1.0.1
>> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1

Re: [OMPI users] Newbie question?

2012-09-15 Thread Ralph Castain
Ah - note that there is no LD_LIBRARY_PATH in the environment. That's the 
problem.

On Sep 15, 2012, at 11:19 AM, John Chludzinski  
wrote:

> $ which mpiexec
> /usr/lib/openmpi/bin/mpiexec
> 
> # mpiexec -n 1 printenv | grep PATH
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
> 
> 
> 
> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:
> Couple of things worth checking:
> 
> 1. verify that you executed the "mpiexec" you think you did - a simple "which 
> mpiexec" should suffice
> 
> 2. verify that your environment is correct by "mpiexec -n 1 printenv | grep 
> PATH". Sometimes the ld_library_path doesn't carry over like you think it 
> should
> 
> 
> On Sep 15, 2012, at 10:00 AM, John Chludzinski  
> wrote:
> 
>> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16) 
>> via:
>> 
>> # yum install openmpi
>> # yum install openmpi-devel
>> # mpirun --version
>> mpirun (Open MPI) 1.5.4
>> 
>> I added: 
>> 
>> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>> 
>> Then:
>> 
>> $ mpif90 ex1.f95
>> $ mpiexec -n 4 ./a.out 
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
>> shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
>> shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
>> shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
>> shared object file: No such file or directory
>> --
>> mpiexec noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>> 
>> ls -l /usr/lib/openmpi/lib/
>> total 6788
>> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so -> 
>> libmca_common_sm.so.2.0.0
>> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 -> 
>> libmca_common_sm.so.2.0.0
>> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
>> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so -> 
>> libmpi_cxx.so.1.0.1
>> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 -> 
>> libmpi_cxx.so.1.0.1
>> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
>> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so -> 
>> libmpi_f77.so.1.0.2
>> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 -> 
>> libmpi_f77.so.1.0.2
>> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
>> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so -> 
>> libmpi_f90.so.1.1.0
>> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 -> 
>> libmpi_f90.so.1.1.0
>> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
>> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
>> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
>> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
>> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so -> 
>> libompitrace.so.0.0.0
>> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 -> 
>> libompitrace.so.0.0.0
>> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
>> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so -> 
>> libopen-pal.so.3.0.0
>> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 -> 
>> libopen-pal.so.3.0.0
>> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
>> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so -> 
>> libopen-rte.so.3.0.0
>> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 -> 
>> libopen-rte.so.3.0.0
>> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
>> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
>> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
>> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
>> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
>> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
>> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
>> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so -> 
>> libvt-hyb.so.0.0.0
>> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 -> 
>> libvt-hyb.so.0.0.0
>> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
>> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
>> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so -> 
>> libvt-mpi.so.0.0.0
>> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 -> libvt-mpi.so.0.0.0

Re: [OMPI users] Newbie question?

2012-09-15 Thread Reuti

On 15.09.2012 at 19:00, John Chludzinski wrote:

> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16) 
> via:
> 
> # yum install openmpi
> # yum install openmpi-devel
> # mpirun --version
> mpirun (Open MPI) 1.5.4
> 
> I added: 
> 
> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH

Is this a typo - double PATH?


> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/

It needs to be exported, so that child processes can use it too.
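
For example, both settings together (this also fixes the doubled "PATH=" above):

$ export PATH=/usr/lib/openmpi/bin/:$PATH
$ export LD_LIBRARY_PATH=/usr/lib/openmpi/lib/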

-- Reuti


> Then:
> 
> $ mpif90 ex1.f95
> $ mpiexec -n 4 ./a.out 
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> --
> mpiexec noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> 
> ls -l /usr/lib/openmpi/lib/
> total 6788
> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so -> 
> libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 -> 
> libmca_common_sm.so.2.0.0
> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so -> 
> libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 -> 
> libmpi_cxx.so.1.0.1
> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so -> 
> libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 -> 
> libmpi_f77.so.1.0.2
> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so -> 
> libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 -> 
> libmpi_f90.so.1.1.0
> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so -> 
> libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 -> 
> libompitrace.so.0.0.0
> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so -> 
> libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 -> 
> libopen-pal.so.3.0.0
> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so -> 
> libopen-rte.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 -> 
> libopen-rte.so.3.0.0
> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so -> 
> libvt-hyb.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 -> 
> libvt-hyb.so.0.0.0
> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so -> 
> libvt-mpi.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 -> 
> libvt-mpi.so.0.0.0
> -rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
> -rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
> lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so -> libvt-mt.so.0.0.0
> lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 -> 
> libvt-mt.so.0.0.0
> -rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
> -rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
> lrwxrwxrwx. 1 root root  14 Sep 15 12:25 libvt.so -> libvt.so.0.0.0
> lrwxrwxrwx. 1 root root  14 Sep 14 16:14 libvt.so.0 -> libvt.so.0.0.0
> -rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
> -rwxr-xr-x. 1 root root  303591 Jan 20  2012 mpi.mod
> drwxr-xr-x. 2 root root4096 Sep 14 16:14 openmpi
> 
> 
> The file (actually, a link) it claims it can't find: libmpi_f90.so.1, is 
> clearly there. And LD_LIBRARY_PATH=/usr/lib/openmpi/lib/.

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
$ which mpiexec
/usr/lib/openmpi/bin/mpiexec

# mpiexec -n 1 printenv | grep PATH
PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
WINDOWPATH=1



On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:

> Couple of things worth checking:
>
> 1. verify that you executed the "mpiexec" you think you did - a simple
> "which mpiexec" should suffice
>
> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
> it should
>
>
> On Sep 15, 2012, at 10:00 AM, John Chludzinski 
> wrote:
>
> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora
> 16) via:
>
> # yum install openmpi
> # yum install openmpi-devel
> # mpirun --version
> mpirun (Open MPI) 1.5.4
>
> I added:
>
> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>
> Then:
>
> $ mpif90 ex1.f95
> $ mpiexec -n 4 ./a.out
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> --
> mpiexec noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
>
> ls -l /usr/lib/openmpi/lib/
> total 6788
> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
> libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 ->
> libmca_common_sm.so.2.0.0
> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so ->
> libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 ->
> libmpi_cxx.so.1.0.1
> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so ->
> libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 ->
> libmpi_f77.so.1.0.2
> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so ->
> libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 ->
> libmpi_f90.so.1.1.0
> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so ->
> libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 ->
> libompitrace.so.0.0.0
> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so ->
> libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 ->
> libopen-pal.so.3.0.0
> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so ->
> libopen-rte.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 ->
> libopen-rte.so.3.0.0
> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so ->
> libvt-hyb.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 ->
> libvt-hyb.so.0.0.0
> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so ->
> libvt-mpi.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 ->
> libvt-mpi.so.0.0.0
> -rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
> -rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
> lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so ->
> libvt-mt.so.0.0.0
> lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 -> libvt-mt.so.0.0.0

Re: [OMPI users] Newbie question?

2012-09-15 Thread Ralph Castain
Couple of things worth checking:

1. verify that you executed the "mpiexec" you think you did - a simple "which 
mpiexec" should suffice

2. verify that your environment is correct by "mpiexec -n 1 printenv | grep 
PATH". Sometimes the ld_library_path doesn't carry over like you think it should


On Sep 15, 2012, at 10:00 AM, John Chludzinski  
wrote:

> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16) 
> via:
> 
> # yum install openmpi
> # yum install openmpi-devel
> # mpirun --version
> mpirun (Open MPI) 1.5.4
> 
> I added: 
> 
> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
> 
> Then:
> 
> $ mpif90 ex1.f95
> $ mpiexec -n 4 ./a.out 
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> --
> mpiexec noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> 
> ls -l /usr/lib/openmpi/lib/
> total 6788
> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so -> 
> libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 -> 
> libmca_common_sm.so.2.0.0
> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so -> 
> libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 -> 
> libmpi_cxx.so.1.0.1
> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so -> 
> libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 -> 
> libmpi_f77.so.1.0.2
> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so -> 
> libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 -> 
> libmpi_f90.so.1.1.0
> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so -> 
> libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 -> 
> libompitrace.so.0.0.0
> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so -> 
> libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 -> 
> libopen-pal.so.3.0.0
> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so -> 
> libopen-rte.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 -> 
> libopen-rte.so.3.0.0
> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so -> 
> libvt-hyb.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 -> 
> libvt-hyb.so.0.0.0
> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so -> 
> libvt-mpi.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 -> 
> libvt-mpi.so.0.0.0
> -rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
> -rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
> lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so -> libvt-mt.so.0.0.0
> lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 -> 
> libvt-mt.so.0.0.0
> -rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
> -rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
> lrwxrwxrwx. 1 root root  14 Sep 15 12:25 libvt.so -> libvt.so.0.0.0
> lrwxrwxrwx. 1 root root  14 Sep 14 16:14 libvt.so.0 -> libvt.so.0.0.0
> -rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
> 

[OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16)
via:

# yum install openmpi
# yum install openmpi-devel
# mpirun --version
mpirun (Open MPI) 1.5.4

I added:

$ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
$ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/

Then:

$ mpif90 ex1.f95
$ mpiexec -n 4 ./a.out
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
--
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--

ls -l /usr/lib/openmpi/lib/
total 6788
lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
libmca_common_sm.so.2.0.0
lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 ->
libmca_common_sm.so.2.0.0
-rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so ->
libmpi_cxx.so.1.0.1
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 ->
libmpi_cxx.so.1.0.1
-rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so ->
libmpi_f77.so.1.0.2
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 ->
libmpi_f77.so.1.0.2
-rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so ->
libmpi_f90.so.1.1.0
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 ->
libmpi_f90.so.1.1.0
-rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
-rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so ->
libompitrace.so.0.0.0
lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 ->
libompitrace.so.0.0.0
-rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so ->
libopen-pal.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 ->
libopen-pal.so.3.0.0
-rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so ->
libopen-rte.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 ->
libopen-rte.so.3.0.0
-rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
-rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
-rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
-rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
-rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so ->
libvt-hyb.so.0.0.0
lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 ->
libvt-hyb.so.0.0.0
-rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
-rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so ->
libvt-mpi.so.0.0.0
lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 ->
libvt-mpi.so.0.0.0
-rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
-rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so ->
libvt-mt.so.0.0.0
lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 ->
libvt-mt.so.0.0.0
-rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
-rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
lrwxrwxrwx. 1 root root  14 Sep 15 12:25 libvt.so -> libvt.so.0.0.0
lrwxrwxrwx. 1 root root  14 Sep 14 16:14 libvt.so.0 -> libvt.so.0.0.0
-rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
-rwxr-xr-x. 1 root root  303591 Jan 20  2012 mpi.mod
drwxr-xr-x. 2 root root4096 Sep 14 16:14 openmpi


The file (actually, a link) it claims it can't find: libmpi_f90.so.1, is
clearly there. And LD_LIBRARY_PATH=/usr/lib/openmpi/lib/.

What's the problem?

---John
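
(On Fedora, the packaged Open MPI can also be put on PATH and LD_LIBRARY_PATH via environment modules rather than by hand - the MODULEPATH entries seen elsewhere in this thread come from that machinery. Assuming the module name the openmpi package ships, something like:

$ module load mpi/openmpi-x86_64

sets both variables in one step.)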