Re: [OMPI users] MPI_INIT failed 4.0.1

2019-04-19 Thread Mahmood Naderan
Thanks for the hint.

Regards,
Mahmood




On Thu, Apr 18, 2019 at 2:47 AM Reuti  wrote:

> Hi,
>
> Am 17.04.2019 um 11:07 schrieb Mahmood Naderan:
>
> > Hi,
> > After successful installation of v4 on a custom location, I see some
> errors while the default installation (v2) hasn't.
>
> Did you also recompile your application with this version of Open MPI?
>
> -- Reuti
>
>
> > $ /share/apps/softwares/openmpi-4.0.1/bin/mpirun --version
> > mpirun (Open MPI) 4.0.1
> >
> > Report bugs to http://www.open-mpi.org/community/help/
> > $ /share/apps/softwares/openmpi-4.0.1/bin/mpirun -np 4 pw.x -i
> mos2.rlx.in
> >
> --
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   ompi_mpi_init: ompi_rte_init failed
> >   --> Returned "(null)" (-43) instead of "Success" (0)
> >
> --
> >
> --
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   ompi_mpi_init: ompi_rte_init failed
> >   --> Returned "(null)" (-43) instead of "Success" (0)
> >
> --
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > ***and potentially your MPI job)
> > [rocks7.jupiterclusterscu.com:18531] Local abort before MPI_INIT
> completed completed successfully, but am not able to aggregate error
> messages, and not able to guarantee that all other processes were killed!
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > ***and potentially your MPI job)
> > [rocks7.jupiterclusterscu.com:18532] Local abort before MPI_INIT
> completed completed successfully, but am not able to aggregate error
> messages, and not able to guarantee that all other processes were killed!
> >
> --
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code. Per user-direction, the job has been aborted.
> >
> --
> >
> --
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   ompi_mpi_init: ompi_rte_init failed
> >   --> Returned "(null)" (-43) instead of "Success" (0)
> >
> --
> >
> --
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   ompi_mpi_init: ompi_rte_init failed
> >   --> Returned "(null)" (-43) instead of "Success" (0)
> >
> --
> > *** An error occurred in MPI_Init
> >

[OMPI users] MPI_INIT failed 4.0.1

2019-04-17 Thread Mahmood Naderan
Hi,
After a successful installation of v4.0.1 in a custom location, I see some errors
that the default installation (v2) doesn't produce.

$ /share/apps/softwares/openmpi-4.0.1/bin/mpirun --version
mpirun (Open MPI) 4.0.1

Report bugs to http://www.open-mpi.org/community/help/
$ /share/apps/softwares/openmpi-4.0.1/bin/mpirun -np 4 pw.x -i mos2.rlx.in
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[rocks7.jupiterclusterscu.com:18531] Local abort before MPI_INIT completed
completed successfully, but am not able to aggregate error messages, and
not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[rocks7.jupiterclusterscu.com:18532] Local abort before MPI_INIT completed
completed successfully, but am not able to aggregate error messages, and
not able to guarantee that all other processes were killed!
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[rocks7.jupiterclusterscu.com:18530] Local abort before MPI_INIT completed
completed successfully, but am not able to aggregate error messages, and
not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[rocks7.jupiterclusterscu.com:18533] Local abort before MPI_INIT completed
completed successfully, but am not able to aggregate error messages, and
not able to guarantee that all other processes were killed!
--
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[10,1],2]
  Exit code:1
--





I cannot see a meaningful error message here. Can someone give me a hint?
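
A quick check in the spirit of the reply above (a hedged suggestion; the bin path is taken from this post, the lib path is only assumed from the usual layout): verify which Open MPI pw.x is actually linked against, and rebuild QE against 4.0.1 if it still points at the old v2 libraries.

# adjust the path to pw.x if it is not in $PATH
$ ldd $(which pw.x) | grep -i mpi
# put the 4.0.1 wrappers first before reconfiguring/rebuilding QE
$ export PATH=/share/apps/softwares/openmpi-4.0.1/bin:$PATH
$ export LD_LIBRARY_PATH=/share/apps/softwares/openmpi-4.0.1/lib:$LD_LIBRARY_PATH
$ which mpif90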

Regards,

[OMPI users] job termination

2019-04-17 Thread Mahmood Naderan
Hi,
A Quantum ESPRESSO multi-node, multi-process MPI job has been terminated
with the following messages in the log file:


     total cpu time spent up to now is    63540.4 secs

 total energy  =  -14004.61932175 Ry
 Harris-Foulkes estimate   =  -14004.73511665 Ry
 estimated scf accuracy<   0.84597958 Ry

 iteration #  7 ecut=48.95 Ry beta= 0.70
 Davidson diagonalization with overlap
--
ORTE has lost communication with a remote daemon.

  HNP daemon   : [[7952,0],0] on node compute-0-0
  Remote daemon: [[7952,0],1] on node compute-0-1

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--




The slurm script for that is

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=mos2.rlx.out
#SBATCH --ntasks=14
#SBATCH --mem-per-cpu=17G
#SBATCH --nodes=6
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
mpirun pw.x -i mos2.rlx.in


The job is running on Slurm 18.08 and Rocks 7, whose default Open MPI is 2.1.1.

Other jobs with OMPI, Slurm, and QE are fine. So, I want to know how I can
narrow my search to find the root cause of this specific problem. For example,
I don't know whether the QE calculation had diverged or not. Is there any way
to find more information about that?
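
One way to start narrowing it down (an assumption on my part, not something established in this thread) is to check whether the orted daemon on compute-0-1 died because of a node-level problem, e.g. the OOM killer, and what Slurm recorded for the job:

$ ping -c 3 compute-0-1
$ ssh compute-0-1 'dmesg -T | grep -iE "out of memory|killed process" | tail'
# <jobid> is a placeholder for the failed job's ID
$ sacct -j <jobid> --format=JobID,State,ExitCode,MaxRSS,NodeList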

Any idea?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] mpi and gromacs

2018-07-11 Thread Mahmood Naderan
Hi
Although not directly related to OMPI, I would like to know if anybody uses
GROMACS with MPI support. The binary is gmx_mpi and it has some options for
threading. However, I am also able to run it by putting mpirun before gmx_mpi.


So, it is possible to run

gmx_mpi 

and

mpirun -np 4 gmx_mpi

Is the second command OK? It seems to be two layers of MPI calls, which may
degrade the performance.
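
For reference, a common hybrid pattern (a sketch only, assuming an MPI-enabled GROMACS build; -ntomp is gmx mdrun's OpenMP-threads-per-rank option, not something taken from this thread) is to let mpirun set the number of ranks and tell mdrun how many OpenMP threads each rank may use, so the two layers don't oversubscribe each other:

# 4 MPI ranks, 2 OpenMP threads per rank (adjust to the core count)
$ mpirun -np 4 gmx_mpi mdrun -ntomp 2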

Any thoughts?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] --oversubscribe option

2018-06-06 Thread Mahmood Naderan
Hi,
On a Ryzen 1800X, which has 8 cores and 16 threads, when I run "mpirun -np
16 lammps..." I get an error that there are not enough slots. It seems that
the --oversubscribe option will fix that.

The odd thing is that when I run "mpirun -np 8 lammps" it takes about 46
minutes to complete the job, while with "mpirun --oversubscribe -np 16
lammps" it takes about 39 minutes.

I want to be sure that with "-np 16" the logical cores are used. Is that
confirmed by --oversubscribe?
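
If the intent is to place the 16 ranks on the 16 hardware threads rather than truly oversubscribing 8 cores, mpirun can be told to treat hardware threads as slots, and --report-bindings shows where each rank actually lands (standard Open MPI options, sketched here as an assumption about what you want to verify):

$ mpirun --use-hwthread-cpus --bind-to hwthread --report-bindings -np 16 lammps ...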

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] libopen-pal not found

2018-03-02 Thread Mahmood Naderan
Hi,
After a successful installation of OMPI v3 with CUDA enabled, I see that
ldd cannot find the right library file although it exists. /usr/local/lib is one
of the default locations for library files, isn't it?



$ which mpic++
/usr/local/bin/mpic++
$ ldd /usr/local/bin/mpic++
linux-vdso.so.1 =>  (0x7fff06d0d000)
libopen-pal.so.40 => not found
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
(0x7f3901cbd000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f39018f2000)
/lib64/ld-linux-x86-64.so.2 (0x55df58104000)
$ sudo find /usr/local -name libopen-pal*
/usr/local/lib/libopen-pal.so.40
/usr/local/lib/libopen-pal.so.40.0.0
/usr/local/lib/libopen-pal.la
/usr/local/lib/libopen-pal.so
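
/usr/local/lib is normally searched by the compile-time linker, but the runtime loader only sees it once its cache knows about it. A typical fix is to refresh the loader cache (a sketch; the .conf file name below is arbitrary, and whether your distribution already lists /usr/local/lib there is an assumption):

$ echo '/usr/local/lib' | sudo tee /etc/ld.so.conf.d/usr-local.conf
$ sudo ldconfig
$ ldd /usr/local/bin/mpic++ | grep libopen-pal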



Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] NAS benchmark

2018-02-03 Thread Mahmood Naderan
Thanks for that. I have to use it as both a compiler and a linker option.
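
Concretely, that means the flag has to appear on both the compile lines and the final link line, e.g. (a sketch based on the build log below; in practice it would go into the FFLAGS/FLINKFLAGS-style variables of config/make.def, whose exact names are assumed here):

$ mpif90 -c -I/usr/local/include -O -mcmodel=medium x_solve.f
$ mpif90 -O -mcmodel=medium -o ../bin/bt.D.4 *.o ../common/print_results.o ../common/timers.o -L/usr/local/lib -lmpi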

Regards,
Mahmood



On Sat, Feb 3, 2018 at 4:37 PM, Matt Thompson  wrote:

> Well, whenever I see a "relocation truncated to fit" error, my first
> thought is to add "-mcmodel=medium" to the compile flags. I'm surprised NAS
> Benchmarks need it, though.
>
> On Sat, Feb 3, 2018 at 3:48 AM, Mahmood Naderan 
> wrote:
>
>> Hi,
>> Any body has tried NAS benchmark with ompi? I get the following linker
>> error while building one of the benchmarks.
>>
>> [mahmood@rocks7 NPB3.3-MPI]$ make BT NPROCS=4 CLASS=D
>>=
>>=  NAS Parallel Benchmarks 3.3  =
>>=  MPI/F77/C=
>>=
>>
>> cd BT; make NPROCS=4 CLASS=D SUBTYPE= VERSION=
>> make[1]: Entering directory `/home/mahmood/Downloads/NPB3.
>> 3.1/NPB3.3-MPI/BT'
>> make[2]: Entering directory `/home/mahmood/Downloads/NPB3.
>> 3.1/NPB3.3-MPI/sys'
>> cc -g  -o setparams setparams.c
>> make[2]: Leaving directory `/home/mahmood/Downloads/NPB3.
>> 3.1/NPB3.3-MPI/sys'
>> ../sys/setparams bt 4 D
>> make[2]: Entering directory `/home/mahmood/Downloads/NPB3.
>> 3.1/NPB3.3-MPI/BT'
>> make.def modified. Rebuilding npbparams.h just in case
>> rm -f npbparams.h
>> ../sys/setparams bt 4 D
>> mpif90 -c -I/usr/local/include -O bt.f
>> mpif90 -c -I/usr/local/include -O make_set.f
>> mpif90 -c -I/usr/local/include -O initialize.f
>> mpif90 -c -I/usr/local/include -O exact_solution.f
>> mpif90 -c -I/usr/local/include -O exact_rhs.f
>> mpif90 -c -I/usr/local/include -O set_constants.f
>> mpif90 -c -I/usr/local/include -O adi.f
>> mpif90 -c -I/usr/local/include -O define.f
>> mpif90 -c -I/usr/local/include -O copy_faces.f
>> mpif90 -c -I/usr/local/include -O rhs.f
>> mpif90 -c -I/usr/local/include -O solve_subs.f
>> mpif90 -c -I/usr/local/include -O x_solve.f
>> mpif90 -c -I/usr/local/include -O y_solve.f
>> mpif90 -c -I/usr/local/include -O z_solve.f
>> mpif90 -c -I/usr/local/include -O add.f
>> mpif90 -c -I/usr/local/include -O error.f
>> mpif90 -c -I/usr/local/include -O verify.f
>> mpif90 -c -I/usr/local/include -O setup_mpi.f
>> make[3]: Entering directory `/home/mahmood/Downloads/NPB3.
>> 3.1/NPB3.3-MPI/BT'
>> mpif90 -c -I/usr/local/include -O btio.f
>> mpif90 -O -o ../bin/bt.D.4 bt.o make_set.o initialize.o exact_solution.o
>> exact_rhs.o set_constants.o adi.o define.o copy_faces.o rhs.o solve_subs.o
>> x_solve.o y_solve.o z_solve.o add.o error.o verify.o setup_mpi.o
>> ../common/print_results.o ../common/timers.o btio.o -L/usr/local/lib -lmpi
>> x_solve.o: In function `x_solve_cell_':
>> x_solve.f:(.text+0x77a): relocation truncated to fit: R_X86_64_32 against
>> symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x77f): relocation truncated to fit: R_X86_64_32 against
>> symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x946): relocation truncated to fit: R_X86_64_32S
>> against symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x94e): relocation truncated to fit: R_X86_64_32S
>> against symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x958): relocation truncated to fit: R_X86_64_32S
>> against symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x962): relocation truncated to fit: R_X86_64_32S
>> against symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x96c): relocation truncated to fit: R_X86_64_32S
>> against symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x9ab): relocation truncated to fit: R_X86_64_32S
>> against symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x9c6): relocation truncated to fit: R_X86_64_32S
>> against symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0x9f3): relocation truncated to fit: R_X86_64_32S
>> against symbol `work_lhs_' defined in COMMON section in x_solve.o
>> x_solve.f:(.text+0xa21): additional relocation overflows omitted from the
>> output
>> collect2: error: ld returned 1 exit status
>> make[3]: *** [bt-bt] Error 1
>> make[3]: Leaving directory `/home/mahmood/Downloads/NPB3.
>> 3.1/NPB3.3-MPI/BT'
>> make[2]: *** [exec] Error 2
>> make[2]: Leaving directory `/home/mahmood/Downloads/NPB3.
>> 3.1/N

[OMPI users] NAS benchmark

2018-02-03 Thread Mahmood Naderan
Hi,
Has anybody tried the NAS benchmarks with OMPI? I get the following linker
error while building one of the benchmarks.

[mahmood@rocks7 NPB3.3-MPI]$ make BT NPROCS=4 CLASS=D
   =
   =  NAS Parallel Benchmarks 3.3  =
   =  MPI/F77/C=
   =

cd BT; make NPROCS=4 CLASS=D SUBTYPE= VERSION=
make[1]: Entering directory `/home/mahmood/Downloads/NPB3.3.1/NPB3.3-MPI/BT'
make[2]: Entering directory
`/home/mahmood/Downloads/NPB3.3.1/NPB3.3-MPI/sys'
cc -g  -o setparams setparams.c
make[2]: Leaving directory `/home/mahmood/Downloads/NPB3.3.1/NPB3.3-MPI/sys'
../sys/setparams bt 4 D
make[2]: Entering directory `/home/mahmood/Downloads/NPB3.3.1/NPB3.3-MPI/BT'
make.def modified. Rebuilding npbparams.h just in case
rm -f npbparams.h
../sys/setparams bt 4 D
mpif90 -c -I/usr/local/include -O bt.f
mpif90 -c -I/usr/local/include -O make_set.f
mpif90 -c -I/usr/local/include -O initialize.f
mpif90 -c -I/usr/local/include -O exact_solution.f
mpif90 -c -I/usr/local/include -O exact_rhs.f
mpif90 -c -I/usr/local/include -O set_constants.f
mpif90 -c -I/usr/local/include -O adi.f
mpif90 -c -I/usr/local/include -O define.f
mpif90 -c -I/usr/local/include -O copy_faces.f
mpif90 -c -I/usr/local/include -O rhs.f
mpif90 -c -I/usr/local/include -O solve_subs.f
mpif90 -c -I/usr/local/include -O x_solve.f
mpif90 -c -I/usr/local/include -O y_solve.f
mpif90 -c -I/usr/local/include -O z_solve.f
mpif90 -c -I/usr/local/include -O add.f
mpif90 -c -I/usr/local/include -O error.f
mpif90 -c -I/usr/local/include -O verify.f
mpif90 -c -I/usr/local/include -O setup_mpi.f
make[3]: Entering directory `/home/mahmood/Downloads/NPB3.3.1/NPB3.3-MPI/BT'
mpif90 -c -I/usr/local/include -O btio.f
mpif90 -O -o ../bin/bt.D.4 bt.o make_set.o initialize.o exact_solution.o
exact_rhs.o set_constants.o adi.o define.o copy_faces.o rhs.o solve_subs.o
x_solve.o y_solve.o z_solve.o add.o error.o verify.o setup_mpi.o
../common/print_results.o ../common/timers.o btio.o -L/usr/local/lib -lmpi
x_solve.o: In function `x_solve_cell_':
x_solve.f:(.text+0x77a): relocation truncated to fit: R_X86_64_32 against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x77f): relocation truncated to fit: R_X86_64_32 against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x946): relocation truncated to fit: R_X86_64_32S against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x94e): relocation truncated to fit: R_X86_64_32S against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x958): relocation truncated to fit: R_X86_64_32S against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x962): relocation truncated to fit: R_X86_64_32S against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x96c): relocation truncated to fit: R_X86_64_32S against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x9ab): relocation truncated to fit: R_X86_64_32S against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x9c6): relocation truncated to fit: R_X86_64_32S against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0x9f3): relocation truncated to fit: R_X86_64_32S against
symbol `work_lhs_' defined in COMMON section in x_solve.o
x_solve.f:(.text+0xa21): additional relocation overflows omitted from the
output
collect2: error: ld returned 1 exit status
make[3]: *** [bt-bt] Error 1
make[3]: Leaving directory `/home/mahmood/Downloads/NPB3.3.1/NPB3.3-MPI/BT'
make[2]: *** [exec] Error 2
make[2]: Leaving directory `/home/mahmood/Downloads/NPB3.3.1/NPB3.3-MPI/BT'
make[1]: *** [../bin/bt.D.4] Error 2
make[1]: Leaving directory `/home/mahmood/Downloads/NPB3.3.1/NPB3.3-MPI/BT'
make: *** [bt] Error 2


There is a good guide about that
(https://www.technovelty.org/c/relocation-truncated-to-fit-wtf.html), but I
don't know which compiler flag I should set to fix that.

Any idea?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] openmpi with htcondor

2018-01-25 Thread Mahmood Naderan
Hi,
Has anyone here used the HTCondor scheduler with MPI jobs? I followed the
example, openmpiscript, in the Condor folder like this:

[mahmood@rocks7 ~]$ cat mpi.ht

universe = parallel
executable = openmpiscript
arguments = mpihello
log = hellompi.log
output = hellompi.out
error = hellompi.err
machine_count = 2


However, it fails with this error:

[mahmood@rocks7 ~]$ cat hellompi.out
WARNING: MOUNT_UNDER_SCRATCH not set in condor_config
WARNING: MOUNT_UNDER_SCRATCH not set in condor_config
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[62274,1],0]
  Exit code:1
--
[mahmood@rocks7 ~]$ cat hellompi.err
Not defined: MOUNT_UNDER_SCRATCH
Not defined: MOUNT_UNDER_SCRATCH
[compute-0-1.local:17511] [[62274,1],0] usock_peer_recv_connect_ack:
received unexpected process identifier [[62274,0],2] from [[62274,0],1]
[compute-0-1.local:17512] [[62274,1],1] usock_peer_recv_connect_ack:
received unexpected process identifier [[62274,0],2] from [[62274,0],1]


Any idea?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] MPI test program

2018-01-15 Thread Mahmood Naderan
Hi,
Is there any small benchmark for performance measurements? I mean a test
which utilizes the number of CPUs given to MPI, for comparison. I want to
compare two kernel versions on one system only, not across different platforms.


I know Intel MPI benchmark, but I would like to know if there is another
option. Is MTT suitable for that?
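
For what it's worth, the Intel MPI Benchmarks mentioned above already cover this use case: run the same binary under each kernel version and compare the timing tables (a sketch; the IMB-MPI1 binary name and benchmark names follow the IMB conventions):

$ mpirun -np 16 ./IMB-MPI1 PingPong Allreduce Barrier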

Any comment?


Regards,

Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] About my GPU performance using Openmpi-2.0.4

2017-12-13 Thread Mahmood Naderan
>Currently I am using two Tesla K40m cards for my computational work on
>quantum espresso (QE) suit http://www.quantum-espresso.org/. My GPU
>enabled QE code running very slower than normal version

Hi,
When I hear such words, I would say, yeah it is quite natural!

My personal experience with a GPU (Quadro M2000) was actually a
failure and loss of money. With various models, configs and companies,
it is very hard to determine if a GPU product really boosts the
performance unless you sign a contract with them (have to pay them!)
and consult their experts to find a good product.

At the end of the day, I think companies put all the good features in
their high-end products (the multi-thousand-dollar ones). So, I think the
K40m, which uses passive cooling, misses many good features
although it has 12 GB of GDDR5.

I hope that in your case the slow run is a software issue. Those were
my thoughts only and may not be correct!

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread Mahmood Naderan
Fortunately, the Rocks 7 beta has been released, so there is hope that a newer
version will be born one day.

>https://github.com/sdsc/mpi-roll 
I wasn't aware of that. Thanks for sharing it.



>there is a typo, it should be
>-Wl,-rpath,/.../

Thanks a lot. It is now working.

$ chrpath /share/apps/chemistry/qe-6.1/bin/pw.x
/share/apps/chemistry/qe-6.1/bin/pw.x:
RPATH=/share/apps/computer/OpenBLAS-0.2.18
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread Mahmood Naderan
So it seems that -rpath is not available with 1.4, which is the OMPI that came
with Rocks 6.

Regards,
Mahmood



On Thu, Sep 14, 2017 at 2:44 PM, Mahmood Naderan 
wrote:

> Well that may be good if someone intend to rebuild ompi.
> Lets say, there is an ompi on the system...
>
> Regards,
> Mahmood
>
>
>
> On Thu, Sep 14, 2017 at 2:31 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Peter and all,
>>
>> an easier option is to configure Open MPI with --mpirun-prefix-by-default
>> this will automagically add rpath to the libs.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thu, Sep 14, 2017 at 6:43 PM, Peter Kjellström  wrote:
>> > On Wed, 13 Sep 2017 20:13:54 +0430
>> > Mahmood Naderan  wrote:
>> > ...
>> >> `/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/li
>> bc.a(strcmp.o)'
>> >> can not be used when making an executable; recompile with -fPIE and
>> >> relink with -pie collect2: ld returned 1 exit status
>> >>
>> >>
>> >> With such an error, I thought it is better to forget static linking!
>> >> (as it is related to libc) and work with the shared libs and
>> >> LD_LIBRARY_PATH
>> >
>> > First, I think giving up on static linking is the right choice.
>> >
>> > If the main thing you were after was the convenience of a binary that
>> > will run without the need to setup LD_LIBRARY_PATH correctly you should
>> > have a look at passing -rpath to the linker.
>> >
>> > In short, "mpicc -Wl,-rpath=/my/lib/path helloworld.c -o hello", will
>> > compile a dynamic binary "hello" with built in search path
>> > to "/my/lib/path".
>> >
>> > With OpenMPI this will be added as a "runpath" due to how the wrappers
>> > are designed. Both rpath and runpath works for finding "/my/lib/path"
>> > wihtout LD_LIBRARY_PATH but the difference is in priority. rpath is
>> > higher priority than LD_LIBRARY_PATH etc. and runpath is lower.
>> >
>> > You can check your rpath or runpath in a binary using the command
>> > chrpath (package on rhel/centos/... is chrpath):
>> >
>> > $ chrpath hello
>> > hello: RUNPATH=/my/lib/path
>> >
>> > If what you really wanted is the rpath behavior (winning over any
>> > LD_LIBRARY_PATH in the environment etc.) then you need to modify the
>> > openmpi wrappers (rebuild openmpi) such that it does NOT pass
>> > "--enable-new-dtags" to the linker.
>> >
>> > /Peter
>> > ___
>> > users mailing list
>> > users@lists.open-mpi.org
>> > https://lists.open-mpi.org/mailman/listinfo/users
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>>
>
>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread Mahmood Naderan
Well, that may be good if someone intends to rebuild OMPI.
Let's say there is already an OMPI on the system...

Regards,
Mahmood



On Thu, Sep 14, 2017 at 2:31 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Peter and all,
>
> an easier option is to configure Open MPI with --mpirun-prefix-by-default
> this will automagically add rpath to the libs.
>
> Cheers,
>
> Gilles
>
> On Thu, Sep 14, 2017 at 6:43 PM, Peter Kjellström  wrote:
> > On Wed, 13 Sep 2017 20:13:54 +0430
> > Mahmood Naderan  wrote:
> > ...
> >> `/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/
> libc.a(strcmp.o)'
> >> can not be used when making an executable; recompile with -fPIE and
> >> relink with -pie collect2: ld returned 1 exit status
> >>
> >>
> >> With such an error, I thought it is better to forget static linking!
> >> (as it is related to libc) and work with the shared libs and
> >> LD_LIBRARY_PATH
> >
> > First, I think giving up on static linking is the right choice.
> >
> > If the main thing you were after was the convenience of a binary that
> > will run without the need to setup LD_LIBRARY_PATH correctly you should
> > have a look at passing -rpath to the linker.
> >
> > In short, "mpicc -Wl,-rpath=/my/lib/path helloworld.c -o hello", will
> > compile a dynamic binary "hello" with built in search path
> > to "/my/lib/path".
> >
> > With OpenMPI this will be added as a "runpath" due to how the wrappers
> > are designed. Both rpath and runpath works for finding "/my/lib/path"
> > wihtout LD_LIBRARY_PATH but the difference is in priority. rpath is
> > higher priority than LD_LIBRARY_PATH etc. and runpath is lower.
> >
> > You can check your rpath or runpath in a binary using the command
> > chrpath (package on rhel/centos/... is chrpath):
> >
> > $ chrpath hello
> > hello: RUNPATH=/my/lib/path
> >
> > If what you really wanted is the rpath behavior (winning over any
> > LD_LIBRARY_PATH in the environment etc.) then you need to modify the
> > openmpi wrappers (rebuild openmpi) such that it does NOT pass
> > "--enable-new-dtags" to the linker.
> >
> > /Peter
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread Mahmood Naderan
>In short, "mpicc -Wl,-rpath=/my/lib/path helloworld.c -o hello", will
>compile a dynamic binary "hello" with built in search path
>to "/my/lib/path".

Excuse me... Is that a path or file? I get this:

mpif90 -g -pthread -Wl,rpath=/share/apps/computer/OpenBLAS-0.2.18 -o
iotk_print_kinds.x iotk_print_kinds.o libiotk.a
/usr/bin/ld: rpath=/share/apps/computer/OpenBLAS-0.2.18: No such file: No
such file or directory
collect2: ld returned 1 exit status


However, the lib files are there.

[root@cluster source]# ls -l
/share/apps/computer/OpenBLAS-0.2.18/libopenblas*
lrwxrwxrwx 1 nfsnobody nfsnobody   32 Sep  8 14:40
/share/apps/computer/OpenBLAS-0.2.18/libopenblas.a ->
libopenblas_bulldozerp-r0.2.18.a
-rw-r--r-- 1 nfsnobody nfsnobody 28075178 Sep  8 14:41
/share/apps/computer/OpenBLAS-0.2.18/libopenblas_bulldozerp-r0.2.18.a
-rwxr-xr-x 1 nfsnobody nfsnobody 14906048 Sep  8 14:41
/share/apps/computer/OpenBLAS-0.2.18/libopenblas_bulldozerp-r0.2.18.so
lrwxrwxrwx 1 nfsnobody nfsnobody   33 Sep  8 14:41
/share/apps/computer/OpenBLAS-0.2.18/libopenblas.so ->
libopenblas_bulldozerp-r0.2.18.so
lrwxrwxrwx 1 nfsnobody nfsnobody   33 Sep  8 14:41
/share/apps/computer/OpenBLAS-0.2.18/libopenblas.so.0 ->
libopenblas_bulldozerp-r0.2.18.so



Please note that I added that option to the linker section of make.inc
from ESPRESSO:

# compiler flags: C, F90, F77
# C flags must include DFLAGS and IFLAGS
# F90 flags must include MODFLAGS, IFLAGS, and FDFLAGS with appropriate
syntax
CFLAGS = -O3 $(DFLAGS) $(IFLAGS)
F90FLAGS   = $(FFLAGS) -x f95-cpp-input $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
FFLAGS = -O3 -g
# Linker, linker-specific flags (if any)
# Typically LD coincides with F90 or MPIF90, LD_LIBS is empty
LD = mpif90
LDFLAGS= -g -pthread -Wl,rpath=/share/apps/computer/OpenBLAS-0.2.18
LD_LIBS=
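
As pointed out elsewhere in this thread, the option is missing a dash inside the -Wl argument; it should be -Wl,-rpath,<dir> (or -Wl,-rpath=<dir>). A corrected link line, with the same paths as above, would be:

$ mpif90 -g -pthread -Wl,-rpath,/share/apps/computer/OpenBLAS-0.2.18 -o iotk_print_kinds.x iotk_print_kinds.o libiotk.a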


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-13 Thread Mahmood Naderan
>are you sure you are using Open MPI ?
I am using the openmpi shipped with Rocks 6 and trying to build Quantum
ESPRESSO 6.1


>Beware: static linking is not for the meek.
Agreed! I found that I had to install compat-dapl-static.x86_64. As can be
seen from the name, it is a compatibility library. After that, I faced an
error saying:


mpif90 -g -pthread -static -o iotk_print_kinds.x iotk_print_kinds.o
libiotk.a
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libpthread.a(libpthread.o):
In function `sem_open':
(.text+0x774d): warning: the use of `mktemp' is dangerous, better use
`mkstemp'
/usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in
`/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libc.a(strcmp.o)'
can not be used when making an executable; recompile with -fPIE and relink
with -pie
collect2: ld returned 1 exit status


With such an error, I thought it better to forget static linking (as the error
is related to libc) and work with the shared libs and LD_LIBRARY_PATH.

Anyway, thanks for your help.

Regards,
Mahmood



On Wed, Sep 13, 2017 at 6:12 PM, Jeff Squyres (jsquyres)  wrote:

> Beware: static linking is not for the meek.
>
> Is there a reason you need to link statically?
>
> Be sure to read this FAQ item: https://www.open-mpi.org/faq/?
> category=mpi-apps#static-ofa-mpi-apps (note that that FAQ item was
> written a long time ago; it cites the "mthca" Mellanox obverts driver; the
> current generation driver name is ?I think? mlx5).  You'll likely also have
> to adapt those instructions if you're using the UCX or MXM IB libraries.
>
>
> > On Sep 13, 2017, at 7:21 AM, gil...@rist.or.jp wrote:
> >
> > This is something related to DAPL.
> >
> > /* just google "libdat" */
> >
> >
> > iirc, Intel MPI uses that,  but i do not recall Open MPI using it (!)
> >
> > are you sure you are using Open MPI ?
> >
> > which interconnect do you have ?
> >
> >
> > Cheers,
> >
> >
> > Gilles
> >
> > - Original Message -
> >
> > Thanks Gilles... That has been solved. Another issue is
> >
> > mpif90 -g -pthread -static -o iotk_print_kinds.x iotk_print_kinds.o
> libiotk.a
> > /usr/bin/ld: cannot find -ldat
> >
> > The name is actually hard to google! I cannot find the library name for
> "dat". Have you heard of that? There is not "libdat" package as I searched.
> >
> > Regards,
> > Mahmood
> >
> >
> > On Wed, Sep 13, 2017 at 2:54 PM,  wrote:
> >  Mahmood,
> >
> >
> > since you are building a static binary, only static library (e.g.
> libibverbs.a) can be used.
> >
> > on your system, only dynamic libibverbs.so is available.
> >
> >
> > simply install libibverbs.a and you should be fine.
> >
> >
> > Best regards,
> >
> >
> > Gilles
> >
> > - Original Message -
> >
> > Hi,
> > I am trying to build an application with static linking that uses
> openmpi. in the middle of the build, I get this
> >
> > mpif90 -g -pthread -static -o iotk_print_kinds.x iotk_print_kinds.o
> libiotk.a
> > /usr/bin/ld: cannot find -libverbs
> > collect2: ld returned 1 exit status
> > However, such library exists on the system.
> >
> > [root@cluster source]# find /usr/ -name *ibverb*
> > /usr/lib64/libibverbs.so
> > /usr/lib64/libibverbs.so.1.0.0
> > /usr/lib64/libibverbs.so.1
> > /usr/share/doc/libibverbs-1.1.8
> > [root@cluster source]# mpif90 -v
> > Using built-in specs.
> > Target: x86_64-redhat-linux
> > Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
> --infodir=/usr/share/info --with-bugurl=http://bugzilla.
> redhat.com/bugzilla --enable-bootstrap --enable-shared
> --enable-threads=posix --enable-checking=release --with-system-zlib
> --enable-__cxa_atexit --disable-libunwind-exceptions
> --enable-gnu-unique-object 
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada
> --enable-java-awt=gtk --disable-dssi 
> --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
> --enable-libgcj-multifile --enable-java-maintainer-mode
> --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
> --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686
> --build=x86_64-redhat-linux
> > Thread model: posix
> > gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)
> >
> >
> >
> > Any idea for that?
> > Regards,
> > Mahmood
> >
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-13 Thread Mahmood Naderan
Thanks Gilles... That has been solved. Another issue is

mpif90 -g -pthread -static -o iotk_print_kinds.x iotk_print_kinds.o
libiotk.a
/usr/bin/ld: cannot find -ldat


The name is actually hard to google! I cannot find the library name for
"dat". Have you heard of that? There is no "libdat" package as far as I searched.


Regards,
Mahmood



On Wed, Sep 13, 2017 at 2:54 PM,  wrote:

>  Mahmood,
>
>
>
> since you are building a static binary, only static library (e.g.
> libibverbs.a) can be used.
>
> on your system, only dynamic libibverbs.so is available.
>
>
>
> simply install libibverbs.a and you should be fine.
>
>
>
> Best regards,
>
>
>
> Gilles
>
> - Original Message -
>
> Hi,
> I am trying to build an application with static linking that uses openmpi.
> in the middle of the build, I get this
>
> mpif90 -g -pthread -static -o iotk_print_kinds.x iotk_print_kinds.o
> libiotk.a
> /usr/bin/ld: cannot find -libverbs
> collect2: ld returned 1 exit status
>
> However, such library exists on the system.
>
> [root@cluster source]# find /usr/ -name *ibverb*
> /usr/lib64/libibverbs.so
> /usr/lib64/libibverbs.so.1.0.0
> /usr/lib64/libibverbs.so.1
> /usr/share/doc/libibverbs-1.1.8
> [root@cluster source]# mpif90 -v
> Using built-in specs.
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
> --infodir=/usr/share/info --with-bugurl=http://bugzilla.
> redhat.com/bugzilla --enable-bootstrap --enable-shared
> --enable-threads=posix --enable-checking=release --with-system-zlib
> --enable-__cxa_atexit --disable-libunwind-exceptions
> --enable-gnu-unique-object 
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada
> --enable-java-awt=gtk --disable-dssi 
> --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
> --enable-libgcj-multifile --enable-java-maintainer-mode
> --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
> --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686
> --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)
>
>
>
> Any idea for that?
> Regards,
> Mahmood
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] mpif90 unable to find ibverbs

2017-09-13 Thread Mahmood Naderan
Hi,
I am trying to build an application that uses OpenMPI with static linking.
In the middle of the build, I get this:

mpif90 -g -pthread -static -o iotk_print_kinds.x iotk_print_kinds.o
libiotk.a
/usr/bin/ld: cannot find -libverbs
collect2: ld returned 1 exit status


However, such a library exists on the system.

[root@cluster source]# find /usr/ -name *ibverb*
/usr/lib64/libibverbs.so
/usr/lib64/libibverbs.so.1.0.0
/usr/lib64/libibverbs.so.1
/usr/share/doc/libibverbs-1.1.8
[root@cluster source]# mpif90 -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-ppl --with-cloog --with-tune=generic --with-arch_32=i686
--build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)
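
Note that the find output above only shows the shared libibverbs.so*; a -static link needs the archive libibverbs.a. On RHEL/CentOS that usually comes from a separate static devel package (the package name below is an assumption and may vary by release):

$ yum install libibverbs-devel-static
$ find /usr -name 'libibverbs.a'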




Any idea for that?
Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] -host vs -hostfile

2017-08-03 Thread Mahmood Naderan
Well, it seems that the default Rocks OpenMPI dominates the systems. So, at
the moment, I will stick with that, which is 1.6.5 and uses -machinefile.
I will later debug to see why 2.0.1 doesn't work.

Thanks.

Regards,
Mahmood



On Tue, Aug 1, 2017 at 12:30 AM, Gus Correa  wrote:

> Maybe something is wrong with the Torque installation?
> Or perhaps with the Open MPI + Torque integration?
>
> 1) Make sure your Open MPI was configured and compiled with the
> Torque "tm" library of your Torque installation.
> In other words:
>
> configure --with-tm=/path/to/your/Torque/tm_library ...
>
> 2) Check if your $TORQUE/server_priv/nodes file has all the nodes
> in your cluster.  If not, edit the file and add the missing nodes.
> Then restart the Torque server (service pbs_server restart).
>
> 3) Run "pbsnodes" to see if all nodes are listed.
>
> 4) Run "hostname" with mpirun in a short Torque script:
>
> #PBS -l nodes=4:ppn=1
> ...
> mpirun hostname
>
> The output should show all four nodes.
>
> Good luck!
> Gus Correa
>
> On 07/31/2017 02:41 PM, Mahmood Naderan wrote:
>
>> Well it is confusing!! As you can see, I added four nodes to the host
>> file (the same nodes are used by PBS). The --map-by ppr:1:node works well.
>> However, the PBS directive doesn't work
>>
>> mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun
>> -hostfile hosts --map-by ppr:1:node a.out
>> 
>> 
>> * hwloc 1.11.2 has encountered what looks like an error from the
>> operating system.
>> *
>> * Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset
>> 0xff00) without inclusion!
>> * Error occurred in topology.c line 1048
>> *
>> * The following FAQ entry in the hwloc documentation may help:
>> *   What should I do when hwloc reports "operating system" warnings?
>> * Otherwise please report this error message to the hwloc user's mailing
>> list,
>> * along with the output+tarball generated by the hwloc-gather-topology
>> script.
>> 
>> 
>> Hello world from processor cluster.hpc.org <http://cluster.hpc.org>,
>> rank 0 out of 4 processors
>> Hello world from processor compute-0-0.local, rank 1 out of 4 processors
>> Hello world from processor compute-0-1.local, rank 2 out of 4 processors
>> Hello world from processor compute-0-2.local, rank 3 out of 4 processors
>> mahmood@cluster:mpitest$ cat mmt.sh
>> #!/bin/bash
>> #PBS -V
>> #PBS -q default
>> #PBS -j oe
>> #PBS -l  nodes=4:ppn=1
>> #PBS -N job1
>> #PBS -o .
>> cd $PBS_O_WORKDIR
>> /share/apps/computer/openmpi-2.0.1/bin/mpirun a.out
>> mahmood@cluster:mpitest$ qsub mmt.sh
>> 6428.cluster.hpc.org <http://6428.cluster.hpc.org>
>>
>> mahmood@cluster:mpitest$ cat job1.o6428
>> Hello world from processor compute-0-1.local, rank 0 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 2 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 3 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 4 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 5 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 6 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 8 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 9 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 12 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 15 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 16 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 18 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 19 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 20 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 21 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 22 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 24 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 26 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 27 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 28 out of 32 processors
>> Hello world from processor compute-0-1.local, rank 29 out of 32 processors
>> Hello world fr

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Mahmood Naderan
Well, it is confusing! As you can see, I added four nodes to the host file
(the same nodes are used by PBS). The --map-by ppr:1:node option works well.
However, the PBS directive doesn't work:

mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun
-hostfile hosts --map-by ppr:1:node a.out

* hwloc 1.11.2 has encountered what looks like an error from the operating
system.
*
* Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset
0xff00) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing
list,
* along with the output+tarball generated by the hwloc-gather-topology
script.

Hello world from processor cluster.hpc.org, rank 0 out of 4 processors
Hello world from processor compute-0-0.local, rank 1 out of 4 processors
Hello world from processor compute-0-1.local, rank 2 out of 4 processors
Hello world from processor compute-0-2.local, rank 3 out of 4 processors
mahmood@cluster:mpitest$ cat mmt.sh
#!/bin/bash
#PBS -V
#PBS -q default
#PBS -j oe
#PBS -l  nodes=4:ppn=1
#PBS -N job1
#PBS -o .
cd $PBS_O_WORKDIR
/share/apps/computer/openmpi-2.0.1/bin/mpirun a.out
mahmood@cluster:mpitest$ qsub mmt.sh
6428.cluster.hpc.org
mahmood@cluster:mpitest$ cat job1.o6428
Hello world from processor compute-0-1.local, rank 0 out of 32 processors
Hello world from processor compute-0-1.local, rank 2 out of 32 processors
Hello world from processor compute-0-1.local, rank 3 out of 32 processors
Hello world from processor compute-0-1.local, rank 4 out of 32 processors
Hello world from processor compute-0-1.local, rank 5 out of 32 processors
Hello world from processor compute-0-1.local, rank 6 out of 32 processors
Hello world from processor compute-0-1.local, rank 8 out of 32 processors
Hello world from processor compute-0-1.local, rank 9 out of 32 processors
Hello world from processor compute-0-1.local, rank 12 out of 32 processors
Hello world from processor compute-0-1.local, rank 15 out of 32 processors
Hello world from processor compute-0-1.local, rank 16 out of 32 processors
Hello world from processor compute-0-1.local, rank 18 out of 32 processors
Hello world from processor compute-0-1.local, rank 19 out of 32 processors
Hello world from processor compute-0-1.local, rank 20 out of 32 processors
Hello world from processor compute-0-1.local, rank 21 out of 32 processors
Hello world from processor compute-0-1.local, rank 22 out of 32 processors
Hello world from processor compute-0-1.local, rank 24 out of 32 processors
Hello world from processor compute-0-1.local, rank 26 out of 32 processors
Hello world from processor compute-0-1.local, rank 27 out of 32 processors
Hello world from processor compute-0-1.local, rank 28 out of 32 processors
Hello world from processor compute-0-1.local, rank 29 out of 32 processors
Hello world from processor compute-0-1.local, rank 30 out of 32 processors
Hello world from processor compute-0-1.local, rank 31 out of 32 processors
Hello world from processor compute-0-1.local, rank 7 out of 32 processors
Hello world from processor compute-0-1.local, rank 10 out of 32 processors
Hello world from processor compute-0-1.local, rank 14 out of 32 processors
Hello world from processor compute-0-1.local, rank 1 out of 32 processors
Hello world from processor compute-0-1.local, rank 11 out of 32 processors
Hello world from processor compute-0-1.local, rank 13 out of 32 processors
Hello world from processor compute-0-1.local, rank 17 out of 32 processors
Hello world from processor compute-0-1.local, rank 23 out of 32 processors
Hello world from processor compute-0-1.local, rank 25 out of 32 processors



Any idea?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Mahmood Naderan
Excuse me, my fault.. I meant

nodes=2:ppn=2

is 4 threads.


Regards,
Mahmood



On Mon, Jul 31, 2017 at 8:49 PM, r...@open-mpi.org  wrote:

> ?? Doesn't that tell pbs to allocate 1 node with 2 slots on it? I don't
> see where you get 4
>
> Sent from my iPad
>
> On Jul 31, 2017, at 10:00 AM, Mahmood Naderan 
> wrote:
>
> OK. The next question is how touse it with torque (PBS)? currently we
> write this directive
>
> Nodes=1:ppn=2
>
> which means 4 threads. Then we omit -np and -hostfile in the mpirun
> command.
>
> On 31 Jul 2017 20:24, "Elken, Tom"  wrote:
>
>> Hi Mahmood,
>>
>>
>>
>> With the -hostfile case, Open MPI is trying to helpfully run things
>> faster by keeping both processes on one host.  Ways to avoid this…
>>
>>
>>
>> On the mpirun command line add:
>>
>>
>>
>> -pernode  (runs 1 process per node), oe
>>
>> -npernode 1 ,   but these two has been deprecated in favor of the
>> wonderful syntax:
>>
>> --map-by ppr:1:node
>>
>>
>>
>> Or you could change your hostfile to:
>>
>> cluster slots=1
>>
>> compute-0-0 slots=1
>>
>>
>>
>>
>>
>> -Tom
>>
>>
>>
>> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of 
>> *Mahmood
>> Naderan
>> *Sent:* Monday, July 31, 2017 6:47 AM
>> *To:* Open MPI Users 
>> *Subject:* [OMPI users] -host vs -hostfile
>>
>>
>>
>> Hi,
>>
>> I have stuck at a problem which I don't remember that on previous
>> versions. when I run a test program with -host, it works. I mean, the
>> process spans to the hosts I specified. However, when I specify -hostfile,
>> it doesn't work!!
>>
>> mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -host 
>> compute-0-0,cluster -np 2 a.out
>>
>> 
>>
>> * hwloc 1.11.2 has encountered what looks like an error from the operating 
>> system.
>>
>> *
>>
>> * Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset 
>> 0xff00) without inclusion!
>>
>> * Error occurred in topology.c line 1048
>>
>> *
>>
>> * The following FAQ entry in the hwloc documentation may help:
>>
>> *   What should I do when hwloc reports "operating system" warnings?
>>
>> * Otherwise please report this error message to the hwloc user's mailing 
>> list,
>>
>> * along with the output+tarball generated by the hwloc-gather-topology 
>> script.
>>
>> 
>>
>> Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
>>
>> Hello world from processor compute-0-0.local, rank 0 out of 2 processors
>>
>> mahmood@cluster:mpitest$ cat hosts
>>
>> cluster
>>
>> compute-0-0
>>
>>
>>
>> mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun 
>> -hostfile hosts -np 2 a.out
>>
>> 
>>
>> * hwloc 1.11.2 has encountered what looks like an error from the operating 
>> system.
>>
>> *
>>
>> * Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset 
>> 0xff00) without inclusion!
>>
>> * Error occurred in topology.c line 1048
>>
>> *
>>
>> * The following FAQ entry in the hwloc documentation may help:
>>
>> *   What should I do when hwloc reports "operating system" warnings?
>>
>> * Otherwise please report this error message to the hwloc user's mailing 
>> list,
>>
>> * along with the output+tarball generated by the hwloc-gather-topology 
>> script.
>>
>> 
>>
>> Hello world from processor cluster.hpc.org, rank 0 out of 2 processors
>>
>> Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
>>
>>
>> how can I resolve that?
>>
>> Regards,
>> Mahmood
>>
>>
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Mahmood Naderan
OK. The next question is how to use it with Torque (PBS). Currently we write
this directive:

Nodes=1:ppn=2

which means 4 threads. Then we omit -np and -hostfile in the mpirun command.

On 31 Jul 2017 20:24, "Elken, Tom"  wrote:

> Hi Mahmood,
>
>
>
> With the -hostfile case, Open MPI is trying to helpfully run things faster
> by keeping both processes on one host.  Ways to avoid this…
>
>
>
> On the mpirun command line add:
>
>
>
> -pernode  (runs 1 process per node), oe
>
> -npernode 1 ,   but these two has been deprecated in favor of the
> wonderful syntax:
>
> --map-by ppr:1:node
>
>
>
> Or you could change your hostfile to:
>
> cluster slots=1
>
> compute-0-0 slots=1
>
>
>
>
>
> -Tom
>
>
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of *Mahmood
> Naderan
> *Sent:* Monday, July 31, 2017 6:47 AM
> *To:* Open MPI Users 
> *Subject:* [OMPI users] -host vs -hostfile
>
>
>
> Hi,
>
> I have stuck at a problem which I don't remember that on previous
> versions. when I run a test program with -host, it works. I mean, the
> process spans to the hosts I specified. However, when I specify -hostfile,
> it doesn't work!!
>
> mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -host 
> compute-0-0,cluster -np 2 a.out
>
> 
>
> * hwloc 1.11.2 has encountered what looks like an error from the operating 
> system.
>
> *
>
> * Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset 
> 0xff00) without inclusion!
>
> * Error occurred in topology.c line 1048
>
> *
>
> * The following FAQ entry in the hwloc documentation may help:
>
> *   What should I do when hwloc reports "operating system" warnings?
>
> * Otherwise please report this error message to the hwloc user's mailing list,
>
> * along with the output+tarball generated by the hwloc-gather-topology script.
>
> 
>
> Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
>
> Hello world from processor compute-0-0.local, rank 0 out of 2 processors
>
> mahmood@cluster:mpitest$ cat hosts
>
> cluster
>
> compute-0-0
>
>
>
> mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun 
> -hostfile hosts -np 2 a.out
>
> 
>
> * hwloc 1.11.2 has encountered what looks like an error from the operating 
> system.
>
> *
>
> * Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset 
> 0xff00) without inclusion!
>
> * Error occurred in topology.c line 1048
>
> *
>
> * The following FAQ entry in the hwloc documentation may help:
>
> *   What should I do when hwloc reports "operating system" warnings?
>
> * Otherwise please report this error message to the hwloc user's mailing list,
>
> * along with the output+tarball generated by the hwloc-gather-topology script.
>
> 
>
> Hello world from processor cluster.hpc.org, rank 0 out of 2 processors
>
> Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
>
>
> how can I resolve that?
>
> Regards,
> Mahmood
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] -host vs -hostfile

2017-07-31 Thread Mahmood Naderan
Hi,

I am stuck at a problem which I don't remember having on previous versions.
When I run a test program with -host, it works; I mean, the processes span
the hosts I specified. However, when I specify -hostfile, it doesn't work!

mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun
-host compute-0-0,cluster -np 2 a.out

* hwloc 1.11.2 has encountered what looks like an error from the
operating system.
*
* Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset
0xff00) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.

Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
Hello world from processor compute-0-0.local, rank 0 out of 2 processors
mahmood@cluster:mpitest$ cat hosts
cluster
compute-0-0

mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun
-hostfile hosts -np 2 a.out

* hwloc 1.11.2 has encountered what looks like an error from the
operating system.
*
* Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset
0xff00) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.

Hello world from processor cluster.hpc.org, rank 0 out of 2 processors
Hello world from processor cluster.hpc.org, rank 1 out of 2 processors


how can I resolve that?
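
For what it's worth, one hedged explanation: with a plain hostfile, Open MPI
fills the slots it detects on the first listed host before moving on, while
-host as used above gives each listed name a single slot. A sketch of two ways
to spread the ranks (host names as above; --map-by is available in this Open
MPI generation):

/share/apps/computer/openmpi-2.0.1/bin/mpirun -hostfile hosts --map-by node -np 2 a.out

# or give each host an explicit slot count in the hostfile
cluster slots=1
compute-0-0 slots=1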

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] test

2017-07-31 Thread Mahmood Naderan
Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Test, Am I subscribed?

2017-07-31 Thread Mahmood Naderan
Hello,

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Using custom version of gfortran in mpifort

2016-11-17 Thread Mahmood Naderan
Hi,
The mpifort wrapper uses the default gfortran compiler on the system. How
can I give it another version of gfortran which has been installed in
another folder?
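
For reference, the Open MPI compiler wrappers honour environment-variable
overrides, so a sketch along these lines should work (the gfortran path is
only an example, not taken from this thread):

export OMPI_FC=/opt/gcc-custom/bin/gfortran   # example path to the alternate gfortran
mpifort --showme                              # verify which compiler the wrapper now calls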

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] An old code compatibility

2016-11-14 Thread Mahmood Naderan
The output is not meaningful to me.
If I add the --showme option, the output is http://pastebin.com/FX1ks8iW

and if I drop --showme, the output is http://pastebin.com/R1QFYVBe

Please search for xml.F. Do you have any idea?


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] An old code compatibility

2016-11-14 Thread Mahmood Naderan
Hi,
The following mpifort command fails with a syntax error. It seems the code
was written for an older gfortran, but I am not sure about that. Any idea
about this?

mpifort -ffree-form -ffree-line-length-0 -ff2c -fno-second-underscore
-I/opt/fftw-3.3.5/include  -O3  -c xml.f90
xml.F:641.46:

   CALL XML_TAG("set", comment="spin "
  1
Error: Syntax error in argument list at (1)




In the source code, that line is

CALL XML_TAG("set", comment="spin "//TRIM(ADJUSTL(strcounter)))


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] error on dlopen

2016-11-04 Thread Mahmood Naderan
>What problems are you referring to?
I mean errors saying that loading X.so failed. Then the user has to add some
paths to LD_LIBRARY_PATH. Although such a problem can be fixed by adding an
export to .bashrc, I prefer to avoid that.


>We might need a bit more detail than that; I use "--enable-static
--disable-shared" and I do not get dlopen errors

I have also seen that on CentOS. But when I tested an application on
Ubuntu 15.04, I saw that error. Maybe on CentOS an external library has been
installed that is missing on Ubuntu... This is a guess though.


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] error on dlopen

2016-11-04 Thread Mahmood Naderan
>If there's a reason you did --enable-static --disable-shared
Basically, I want to prevent dynamic library problems (ldd) in a
distributed environment.


​$ mpifort --showme
gfortran -I/opt/openmpi-2.0.1/include -pthread -I/opt/openmpi-2.0.1/lib
-Wl,-rpath -Wl,/opt/openmpi-2.0.1/lib -Wl,--enable-new-dtags
-L/opt/openmpi-2.0.1/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr
-lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lm -lrt -lutil
​
As I said, --disable-dlopen fixed that error. But if anybody knows how to
have --enable-static --disable-shared with dlopen, please let me know.
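
To summarise, a sketch of the two options that came up in this thread (the
first one is what was reported to work here; paths as in the earlier
messages):

# build a static Open MPI without dlopen support
./configure --prefix=/opt/openmpi-2.0.1 --enable-static --disable-shared --disable-dlopen

# alternative that was discussed: append -ldl to the libs_static line in
#   <prefix>/share/openmpi/mpifort-wrapper-data.txt
# so that the wrapper itself places -ldl after -lopen-pal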



Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] error on dlopen

2016-11-04 Thread Mahmood Naderan
> You might have to remove -ldl from the scalapack makefile
I removed that before... I will try one more time

Actually, using --disable-dlopen fixed the error.

>mpirun --showme

$ mpirun --showme
mpirun: Error: unknown option "--showme"
Type 'mpirun --help' for usage.


Regards,
Mahmood



On Fri, Nov 4, 2016 at 2:12 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> You might have to remove -ldl from the scalapack makefile
>
> If it still does not work, can you please post
> mpirun --showme ...
> output ?
>
> Cheers,
>
> Gilles
>
>
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] error on dlopen

2016-11-04 Thread Mahmood Naderan
Hi Gilles,
I noticed that /opt/openmpi-2.0.1/share/openmpi/mpifort-wrapper-data.txt is
created after "make install". So, I edited it and appended -ldl to
libs_static.
Then I ran "make clean && make all" for scalapack.

However, I still get the same error!

So, let me try disabling dlopen.


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] error on dlopen

2016-11-04 Thread Mahmood Naderan
I will try that. Meanwhile, I want to know: what is the performance effect of
disabling/enabling dlopen?

Regards,
Mahmood



On Fri, Nov 4, 2016 at 11:02 AM, Gilles Gouaillardet 
wrote:

> Yes, that is a problem :-(
>
>
> you might want to reconfigure with
>
> --enable-static --disable-shared --disable-dlopen
>
> and see if it helps
>
>
> or you can simply manuall edit /opt/openmpi-2.0.1/share/
> openmpi/mpifort-wrapper-data.txt,
>
> and append -ldl to the libs_static definition
>
>
> Cheers,
>
>
> Gilles
>
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] error on dlopen

2016-11-04 Thread Mahmood Naderan
>did you build Open MPI as a static only library ?
Yes, I used --enable-static --disable-shared


Please see the output

# mpifort -O3 -o xCbtest --showme blacstest.o btprim.o tools.o Cbt.o
../../libscalapack.a -ldl
gfortran -O3 -o xCbtest blacstest.o btprim.o tools.o Cbt.o
../../libscalapack.a -ldl -I/opt/openmpi-2.0.1/include -pthread
-I/opt/openmpi-2.0.1/lib -Wl,-rpath -Wl,/opt/openmpi-2.0.1/lib
-Wl,--enable-new-dtags -L/opt/openmpi-2.0.1/lib -lmpi_usempif08
-lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lm -lrt
-lutil


I don't see what you said after "-lopen-pal". Is that OK?

Regards,
Mahmood



On Fri, Nov 4, 2016 at 10:23 AM, Gilles Gouaillardet 
wrote:

> Mahmood,
>
>
> did you build Open MPI as a static only library ?
>
>
> i guess the -ldl position is wrong. your link command line should be
>
> mpifort -O3 -o xCbtest blacstest.o btprim.o tools.o Cbt.o
> ../../libscalapack.a -ldl
>
>
> you can manually
>
> mpifort -O3 -o xCbtest --showme blacstest.o btprim.o tools.o Cbt.o
> ../../libscalapack.a -ldl
>
> it should show -ldl is added *after* -lopen-pal
>
>
> Cheers,
>
> Gilles
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] error on dlopen

2016-11-03 Thread Mahmood Naderan
I added that but still get the same error. Please see the config file for
building scalapack

# cat SLmake.inc
CDEFS = -DAdd_
FC= mpifort -ldl
CC= mpicc
NOOPT = -O0
FCFLAGS   = -O3
CCFLAGS   = -O3
FCLOADER  = $(FC)
CCLOADER  = $(CC)
FCLOADFLAGS   = $(FCFLAGS)
CCLOADFLAGS   = $(CCFLAGS)
ARCH  = ar
ARCHFLAGS = cr
RANLIB= ranlib
SCALAPACKLIB  = libscalapack.a
BLASLIB   = /opt/OpenBLAS-0.2.18/libopenblas.a
LAPACKLIB =
LIBS  = $(LAPACKLIB) $(BLASLIB)




Regards,
Mahmood



On Fri, Nov 4, 2016 at 12:25 AM, Sean Ahern  wrote:

> Sounds to me like you're missing a -ldl linker flag.
>
> -Sean
>
> --
> Sean Ahern
> Computational Engineering International
> 919-363-0883
>
> On Thu, Nov 3, 2016 at 3:57 PM, Mahmood Naderan 
> wrote:
>
>> Hi
>> I am building scalapack with mpicc and mpifort, however this is the error
>> I get:
>>
>> mpifort -O3 -o xCbtest blacstest.o btprim.o tools.o Cbt.o
>> ../../libscalapack.a
>> /opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
>> `dlopen_close':
>> dl_dlopen_module.c:(.text+0x29d): undefined reference to `dlclose'
>> /opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
>> `dlopen_lookup':
>> dl_dlopen_module.c:(.text+0x2d0): undefined reference to `dlsym'
>> dl_dlopen_module.c:(.text+0x2fb): undefined reference to `dlerror'
>> /opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
>> `dlopen_open':
>> dl_dlopen_module.c:(.text+0x3ca): undefined reference to `dlopen'
>> dl_dlopen_module.c:(.text+0x431): undefined reference to `dlerror'
>> dl_dlopen_module.c:(.text+0x456): undefined reference to `dlopen'
>> dl_dlopen_module.c:(.text+0x4a9): undefined reference to `dlerror'
>> dl_dlopen_module.c:(.text+0x501): undefined reference to `dlopen'
>> /opt/openmpi-2.0.1/lib/libopen-pal.a(patcher_overwrite_module.o): In
>> function `mca_patcher_overwrite_patch_symbol':
>> patcher_overwrite_module.c:(.text+0x12e): undefined reference to `dlsym'
>> patcher_overwrite_module.c:(.text+0x166): undefined reference to `dlsym'
>> patcher_overwrite_module.c:(.text+0x173): undefined reference to
>> `dlerror'
>> collect2: error: ld returned 1 exit status
>> Makefile:18: recipe for target 'xCbtest' failed
>> make[2]: *** [xCbtest] Error 1
>>
>>
>>
>> As I grep "dlopen", some OMPI binary files match. Any idea about that?
>>
>>
>> Regards,
>> Mahmood
>>
>>
>>
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] error on dlopen

2016-11-03 Thread Mahmood Naderan
Hi
I am building scalapack with mpicc and mpifort, however this is the error I
get:

mpifort -O3 -o xCbtest blacstest.o btprim.o tools.o Cbt.o
../../libscalapack.a
/opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
`dlopen_close':
dl_dlopen_module.c:(.text+0x29d): undefined reference to `dlclose'
/opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
`dlopen_lookup':
dl_dlopen_module.c:(.text+0x2d0): undefined reference to `dlsym'
dl_dlopen_module.c:(.text+0x2fb): undefined reference to `dlerror'
/opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
`dlopen_open':
dl_dlopen_module.c:(.text+0x3ca): undefined reference to `dlopen'
dl_dlopen_module.c:(.text+0x431): undefined reference to `dlerror'
dl_dlopen_module.c:(.text+0x456): undefined reference to `dlopen'
dl_dlopen_module.c:(.text+0x4a9): undefined reference to `dlerror'
dl_dlopen_module.c:(.text+0x501): undefined reference to `dlopen'
/opt/openmpi-2.0.1/lib/libopen-pal.a(patcher_overwrite_module.o): In
function `mca_patcher_overwrite_patch_symbol':
patcher_overwrite_module.c:(.text+0x12e): undefined reference to `dlsym'
patcher_overwrite_module.c:(.text+0x166): undefined reference to `dlsym'
patcher_overwrite_module.c:(.text+0x173): undefined reference to `dlerror'
collect2: error: ld returned 1 exit status
Makefile:18: recipe for target 'xCbtest' failed
make[2]: *** [xCbtest] Error 1



As I grep "dlopen", some OMPI binary files match. Any idea about that?


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Low CPU utilization

2016-10-16 Thread Mahmood Naderan
Hi,
I am running two software packages that use OMPI 2.0.1. The problem is that
the CPU utilization is low on the nodes.


For example, see the process information below

[root@compute-0-1 ~]# ps aux | grep siesta
mahmood  14635  0.0  0.0 108156  1300 ?S21:58   0:00 /bin/bash
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
mahmood  14636  0.0  0.0 108156  1300 ?S21:58   0:00 /bin/bash
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
mahmood  14637 61.6  0.2 372076 158220 ?   Rl   21:58   0:38
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta
mahmood  14639 59.6  0.2 365992 154228 ?   Rl   21:58   0:37
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta


Note that the CPU utilization is the third column. The "siesta.p1" script is:

#!/bin/bash
BENCH=$1
export OMP_NUM_THREADS=1
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta < $BENCH




I also saw similar behavior from Gromacs, which has been discussed at
https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2016-October/108939.html

It seems there is something tricky going on with OMPI. Any idea is welcome.
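
One possible cause of ~60% utilisation (only a guess here) is two ranks ending
up bound to the same core(s). A sketch of how to check with this Open MPI
generation:

mpirun --report-bindings -np 2 /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
# and, for comparison, run once with binding disabled
mpirun --bind-to none -np 2 /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf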


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] not enough slots available

2016-10-05 Thread Mahmood Naderan
I found that if I put "compute-0-3" in a file (say hosts.txt) and pass that
file name via --hostfile, then the error disappears.

It would be interesting to know what the difference between these two is. I
think it is very common to specify the nodes with the --host option.
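
For what it's worth, in this Open MPI generation --host is taken as one slot
per listed name, which would explain the difference. A sketch of two
workarounds (paths and host name as in the original command; the slot count is
just an example):

/share/apps/siesta/openmpi-2.0.1/bin/mpirun --host compute-0-3,compute-0-3 -np 2 \
    /share/apps/siesta/siesta-4.0-mpi201/tpar/transiesta < A.fdf

# or keep --hostfile and declare the slots explicitly in hosts.txt
compute-0-3 slots=16   # use the node's real core count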


Regards,
Mahmood



On Wed, Oct 5, 2016 at 7:52 PM, Mahmood Naderan 
wrote:

> Sorry about the incomplete message...
>
> Is there any idea about the following error? On that node, there are 15
> empty cores.
>
> $ /share/apps/siesta/openmpi-2.0.1/bin/mpirun --host compute-0-3 -np 2
> /share/apps/siesta/siesta-4.0-mpi201/tpar/transiesta < A.fdf
> --
> There are not enough slots available in the system to satisfy the 2 slots
> that were requested by the application:
>   /share/apps/siesta/siesta-4.0-mpi201/tpar/transiesta
>
> Either request fewer slots for your application, or make more slots
> available
> for use.
> --
>
>
>
> Regards,
> Mahmood
>
>
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] not enough slots available

2016-10-05 Thread Mahmood Naderan
Sorry about the incomplete message...

Is there any idea about the following error? On that node, there are 15
empty cores.

$ /share/apps/siesta/openmpi-2.0.1/bin/mpirun --host compute-0-3 -np 2
/share/apps/siesta/siesta-4.0-mpi201/tpar/transiesta < A.fdf
--
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  /share/apps/siesta/siesta-4.0-mpi201/tpar/transiesta

Either request fewer slots for your application, or make more slots
available
for use.
--



Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] (no subject)

2016-10-05 Thread Mahmood Naderan
Hi,
Is there any idea about the following error? On that node, there are 15
empty cores.

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Multiple versions of OpenMPI

2016-10-03 Thread Mahmood Naderan
Hello,
Consider that OMPI-2.0.1 has been installed with
--enable-mpirun-prefix-by-default. Now, is it possible to install
OMPI-1.6.5 in its own folder and use it without any problem?

I mean, if I run ompi-1.6.5/bin/mpirun, I want to be sure that
ompi-2.0.1/lib is not in use.

Please let me know if there are any caveats.
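
A quick sanity check after installing both versions side by side, as a sketch
(same shorthand paths as above):

ompi-1.6.5/bin/ompi_info | grep "Open MPI:"   # confirms which build that tree is
ldd ./a.out | grep libmpi                     # shows which libmpi a binary actually resolves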

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Viewing the output of the program

2016-10-03 Thread Mahmood Naderan
Thank you very much. It is fine with 2.0.1.

Regards,
Mahmood



On Sat, Oct 1, 2016 at 1:09 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Mahmood,
>
> iirc, a related bug was fixed in v2.0.0
> Can you please update to 2.0.1 and try again ?
>
> Cheers,
>
> Gilles
>
>
> On Saturday, October 1, 2016, Mahmood Naderan 
> wrote:
>
>> Hi,
>> Here is a bizarre behavior of the system; I hope someone can clarify
>> whether it is related to OMPI or not.
>>
>> When I issue the mpirun command with -np 2, I can see the output of the
>> program online as it is running (I mean stdout). However, if I issue the
>> command with -np 4, the progress is not shown!
>>
>>
>> Please see the output below. I ran the 'date' command first and then issued
>> the command with '-np 4'. After some seconds, I pressed ^C and ran 'date'
>> again. As you can see, there is no output information. Next, I ran with
>> '-np 2' and after a while I pressed ^C. You can see that the progress of
>> the program is shown.
>>
>>
>>
>>
>> mahmood@cluster:A4$ date
>> Sat Oct  1 11:26:13 2016
>> mahmood@cluster:A4$ /share/apps/computer/openmpi-2.0.0/bin/mpirun
>> --hostfile hosts.txt -np 4 /share/apps/chemistry/siesta-4.0/spar/siesta
>> < A.fdf
>> Siesta Version: siesta-4.0--500
>> Architecture  : x86_64-unknown-linux-gnu--unknown
>> Compiler flags: /share/apps/computer/openmpi-2.0.0/bin/mpifort
>> PP flags  : -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
>> PARALLEL version
>>
>> * Running on4 nodes in parallel
>> >> Start of run:   1-OCT-2016  11:26:23
>>
>>***
>>*  WELCOME TO SIESTA  *
>>***
>>
>> reinit: Reading from standard input
>> ** Dump of input data file
>> 
>> ^CKilled by signal 2.
>> mahmood@cluster:A4$ date
>> Sat Oct  1 11:26:30 2016
>> mahmood@cluster:A4$ /share/apps/computer/openmpi-2.0.0/bin/mpirun
>> --hostfile hosts.txt -np 2 /share/apps/chemistry/siesta-4.0/spar/siesta
>> < A.fdf
>> Siesta Version: siesta-4.0--500
>> Architecture  : x86_64-unknown-linux-gnu--unknown
>> Compiler flags: /share/apps/computer/openmpi-2.0.0/bin/mpifort
>> PP flags  : -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
>> PARALLEL version
>>
>> * Running on2 nodes in parallel
>> >> Start of run:   1-OCT-2016  11:26:36
>>
>>***
>>*  WELCOME TO SIESTA  *
>>***
>>
>> reinit: Reading from standard input
>> ** Dump of input data file
>> 
>> SystemLabel  A
>> NumberOfAtoms54
>> NumberOfSpecies  2
>> %block ChemicalSpeciesLabel
>> ...
>> ...
>> ...
>> ^CKilled by signal 2.
>> mahmood@cluster:A4$ date
>> Sat Oct  1 11:26:38 2016
>>
>>
>>
>>
>>
>> Any idea about that? The problem occurs when I change the MPI's switches.
>>
>> Regards,
>> Mahmood
>>
>>
>>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Viewing the output of the program

2016-10-01 Thread Mahmood Naderan
Hi,
Here is a bizarre behavior of the system; I hope someone can clarify whether
it is related to OMPI or not.

When I issue the mpirun command with -np 2, I can see the output of the
program online as it is running (I mean stdout). However, if I issue the
command with -np 4, the progress is not shown!


Please see the output below. I ran the 'date' command first and then issued
the command with '-np 4'. After some seconds, I pressed ^C and ran 'date'
again. As you can see, there is no output information. Next, I ran with
'-np 2' and after a while I pressed ^C. You can see that the progress of
the program is shown.




mahmood@cluster:A4$ date
Sat Oct  1 11:26:13 2016
mahmood@cluster:A4$ /share/apps/computer/openmpi-2.0.0/bin/mpirun
--hostfile hosts.txt -np 4 /share/apps/chemistry/siesta-4.0/spar/siesta <
A.fdf
Siesta Version: siesta-4.0--500
Architecture  : x86_64-unknown-linux-gnu--unknown
Compiler flags: /share/apps/computer/openmpi-2.0.0/bin/mpifort
PP flags  : -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
PARALLEL version

* Running on4 nodes in parallel
>> Start of run:   1-OCT-2016  11:26:23

   ***
   *  WELCOME TO SIESTA  *
   ***

reinit: Reading from standard input
** Dump of input data file

^CKilled by signal 2.
mahmood@cluster:A4$ date
Sat Oct  1 11:26:30 2016
mahmood@cluster:A4$ /share/apps/computer/openmpi-2.0.0/bin/mpirun
--hostfile hosts.txt -np 2 /share/apps/chemistry/siesta-4.0/spar/siesta <
A.fdf
Siesta Version: siesta-4.0--500
Architecture  : x86_64-unknown-linux-gnu--unknown
Compiler flags: /share/apps/computer/openmpi-2.0.0/bin/mpifort
PP flags  : -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
PARALLEL version

* Running on2 nodes in parallel
>> Start of run:   1-OCT-2016  11:26:36

   ***
   *  WELCOME TO SIESTA  *
   ***

reinit: Reading from standard input
** Dump of input data file

SystemLabel  A
NumberOfAtoms54
NumberOfSpecies  2
%block ChemicalSpeciesLabel
...
...
...
^CKilled by signal 2.
mahmood@cluster:A4$ date
Sat Oct  1 11:26:38 2016





Any idea about that? The problem occurs when I change the MPI's switches.

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem running an MPI program through the PBS manager

2016-09-26 Thread Mahmood Naderan
OK thank you very much. It is now running...

Regards,
Mahmood



On Mon, Sep 26, 2016 at 2:04 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Mahmood,
>
> The node is defined in the PBS config, however it is not part of the
> allocation (e.g. job) so it cannot be used, and hence the error message.
>
> In your PBS script, you do not need -np nor -host parameters to your
> mpirun command.
> Open MPI mpirun will automatically detect it is launched from a PBS job,
> and get the needed information directly from PBS.
>
> FWIW, the list of allocated nodes is in the file $PBS_NODEFILE, but you
> should not need that.
>
> Cheers,
>
> Gilles
>
>
> On Monday, September 26, 2016, Mahmood Naderan 
> wrote:
>
>> Hi,
>> When I run an MPI command through the terminal the programs runs fine on
>> the compute node specified in hosts.txt.
>>
>> However, when I put that command in a PBS script, if says that the
>> compute node is not defined in the job manager's list. However, that node
>> is actually defined in the job manager.
>>
>> Please see the output below
>>
>>
>> mahmood@cluster:tran-bt-o-40$ cat submit.tor
>> #!/bin/bash
>> #PBS -V
>> #PBS -q default
>> #PBS -j oe
>> #PBS -l nodes=1:ppn=15
>> #PBS -N job-1
>> #PBS -o /home/mahmood/tran-bt-o-40/cc-bt-cc-163-20.out
>> cd $PBS_O_WORKDIR
>> /share/apps/computer/openmpi-2.0.0/bin/mpirun -hostfile hosts.txt -np 15
>> /share/apps/chemistry/siesta-4.0/tpar/transiesta <
>> trans-cc-bt-cc-163-20.fdf
>> mahmood@cluster:tran-bt-o-40$ cat cc-bt-cc-163-20.out
>> 
>> --
>> A hostfile was provided that contains at least one node not
>> present in the allocation:
>>
>>   hostfile:  hosts.txt
>>   node:  compute-0-1
>>
>> If you are operating in a resource-managed environment, then only
>> nodes that are in the allocation can be used in the hostfile. You
>> may find relative node syntax to be a useful alternative to
>> specifying absolute node names see the orte_hosts man page for
>> further information.
>> 
>> --
>> mahmood@cluster:tran-bt-o-40$ cat hosts.txt
>> compute-0-1
>> compute-0-2
>> mahmood@cluster:tran-bt-o-40$ pbsnodes -l all
>> compute-0-0  down
>> compute-0-1  free
>> compute-0-2  free
>> compute-0-3  free
>>
>>
>>
>> As you can see, compute-0-1 has free cores and it is defined for the
>> manager.
>>
>> Any idea?
>> Regards,
>> Mahmood
>>
>>
>>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
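
For reference, a sketch of the same submit script with the redundant options
removed, following the explanation above (all paths as in the original
script):

#!/bin/bash
#PBS -V
#PBS -q default
#PBS -j oe
#PBS -l nodes=1:ppn=15
#PBS -N job-1
#PBS -o /home/mahmood/tran-bt-o-40/cc-bt-cc-163-20.out
cd $PBS_O_WORKDIR
# no -hostfile/-np needed: mpirun picks up the allocation from PBS
/share/apps/computer/openmpi-2.0.0/bin/mpirun /share/apps/chemistry/siesta-4.0/tpar/transiesta < trans-cc-bt-cc-163-20.fdf
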
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Problem running an MPI program through the PBS manager

2016-09-26 Thread Mahmood Naderan
Hi,
When I run an MPI command from the terminal, the program runs fine on
the compute node specified in hosts.txt.

However, when I put that command in a PBS script, it says that the compute
node is not defined in the job manager's list. However, that node is
actually defined in the job manager.

Please see the output below


mahmood@cluster:tran-bt-o-40$ cat submit.tor
#!/bin/bash
#PBS -V
#PBS -q default
#PBS -j oe
#PBS -l nodes=1:ppn=15
#PBS -N job-1
#PBS -o /home/mahmood/tran-bt-o-40/cc-bt-cc-163-20.out
cd $PBS_O_WORKDIR
/share/apps/computer/openmpi-2.0.0/bin/mpirun -hostfile hosts.txt -np 15
/share/apps/chemistry/siesta-4.0/tpar/transiesta < trans-cc-bt-cc-163-20.fdf
mahmood@cluster:tran-bt-o-40$ cat cc-bt-cc-163-20.out
--
A hostfile was provided that contains at least one node not
present in the allocation:

  hostfile:  hosts.txt
  node:  compute-0-1

If you are operating in a resource-managed environment, then only
nodes that are in the allocation can be used in the hostfile. You
may find relative node syntax to be a useful alternative to
specifying absolute node names see the orte_hosts man page for
further information.
--
mahmood@cluster:tran-bt-o-40$ cat hosts.txt
compute-0-1
compute-0-2
mahmood@cluster:tran-bt-o-40$ pbsnodes -l all
compute-0-0  down
compute-0-1  free
compute-0-2  free
compute-0-3  free



As you can see, compute-0-1 has free cores and it is defined for the
manager.

Any idea?
Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OMPI users] Still "illegal instruction"

2016-09-22 Thread Mahmood Naderan
​>Thx for sharing, quite interesting. But does this mean, that there is no
working command line flag for gcc to switch this off (like -march=bdver1
what Gilles mentioned) or to tell me what he thinks it should compile for?
​
Well, that didn't work. Maybe I messed something up, since I did recompile the
programs multiple times with different configs and options. I will try one
more time.



Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OMPI users] Still "illegal instruction"

2016-09-22 Thread Mahmood Naderan
Although this problem is not related to OMPI *at all*, I think it is good
to tell the others what was going on. Finally, I caught the illegal
instruction :)

Briefly, I built the serial version of Siesta on the frontend and ran it
directly on the compute node. Fortunately, "x/i $pc" from GDB showed that
the illegal instruction was a FMA3 instruction. More detail is available at
https://gcc.gnu.org/ml/gcc-help/2016-09/msg00084.html

According to Wikipedia:

   - *FMA4* is supported in AMD processors starting with the Bulldozer
     architecture. FMA4 was realized in hardware before FMA3.
   - *FMA3* is supported in AMD processors starting with the Piledriver
     architecture, and in Intel processors starting with Haswell and
     Broadwell, since 2014.

Therefore, the frontend (Piledriver) emits an FMA3 instruction which the
compute node (Bulldozer) doesn't recognize.

The solution was (as others suggested) building Siesta on the compute node. I
have to say that I tested all the related programs (OMPI, Scalapack, OpenBLAS)
sequentially on the compute node in order to find which one generated the
illegal instruction.
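
For anyone hitting the same thing, a sketch of the checks used here (GCC
option names as documented; the eventual fix in this thread was simply
building on the compute node):

# does the compute node advertise FMA3? (flag "fma"; "fma4" is the older FMA4)
grep -o ' fma ' /proc/cpuinfo | head -1

# when building on the Piledriver frontend, target the Bulldozer nodes explicitly
/export/apps/siesta/openmpi-2.0.0/bin/mpifort -c -g -Os -march=bdver1 \
    /export/apps/siesta/siesta-4.0/Src/pspltm1.F   # bdver1 = Bulldozer, bdver2 = Piledriver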

Anyway... thanks a lot for your comments. Hope this helps others in the
future.
​


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OMPI users] Still "illegal instruction"

2016-09-21 Thread Mahmood Naderan
Dear Gilles,
It seems that using GDB with MPI is a bit tricky. I read the FAQ about that.

Please see the post at https://gcc.gnu.org/ml/gcc-help/2016-09/msg00078.html



>i guess your gdb is also a bit too old to support all operations on a core
file
>(fwiw, i am able to do that on RHEL7)
This is Rocks 6 and the GDB is 7.2. It seems that it doesn't support the
"info proc mapping" command.


I will try your suggestion by modifying the code. Meanwhile do you have any
comment about that post (the link above)?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OMPI users] Still "illegal instruction"

2016-09-16 Thread Mahmood Naderan
OK Gilles, let me try that. I will troubleshoot with gcc mailing list and
will come back later.


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OMPI users] Still "illegal instruction"

2016-09-15 Thread Mahmood Naderan
Excuse me, which of these is the most suitable for finding the name of the
illegal instruction?

--verbose
--debug-level
--debug-daemons
--debug-daemons-file


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OMPI users] Still "illegal instruction"

2016-09-15 Thread Mahmood Naderan
The differences are very very minor

root@cluster:tpar# echo | gcc -v -E - 2>&1 | grep cc1
 /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1 -E -quiet -v -
-mtune=generic

[root@compute-0-1 ~]# echo | gcc -v -E - 2>&1 | grep cc1
 /usr/libexec/gcc/x86_64-redhat-linux/4.4.6/cc1 -E -quiet -v -
-mtune=generic


I even tried to compile the program with -march=amdfam10, with something like
this:

/export/apps/siesta/openmpi-2.0.0/bin/mpifort -c -g -Os -march=amdfam10
`FoX/FoX-config --fcflags`  -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
-DTRANSIESTA /export/apps/siesta/siesta-4.0/Src/pspltm1.F

But I got the same error.

/proc/cpuinfo on the frontend shows (family 21, model 2) and on the compute
node it shows (family 21, model 1).



>That being said, my best bet is you compile on a compute node ...
gcc is there on the computes, but the NFS permissions are another issue. It
seems that the nodes are not able to write to /share (the file system shared
between the frontend and the computes).



An important question is: how can I find out the name of the illegal
instruction? Then I hope to find the documentation that shows which
instruction set (AVX, SSE4, ...) contains that instruction.
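
One way to get the name of the faulting instruction (the approach that was
eventually used in this thread) is to load the core file together with the
executable and disassemble at the saved program counter; a sketch:

gdb /share/apps/siesta/siesta-4.0/tpar/transiesta core.5383
(gdb) x/i $pc      # prints the mnemonic of the illegal instruction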

Is there any option in mpirun to turn on verbosity and see more information?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OMPI users] Still "illegal instruction"

2016-09-15 Thread Mahmood Naderan
Although the CPUs are nearly the same, the CPU flags are different.
I noticed that the frontend has fma, f16c, tch, tce, tbm and bmi1 while the
compute nodes don't have them.

I guess that since the programs were compiled on the frontend (6380), some
special instructions were emitted during optimization which aren't available
on the compute nodes (6282).

Maybe this is not really related to OMPI, but does anybody know which compiler
flags are related to these special instructions?




>Ok, you can try this under gdb
>info proc mapping
>info registers
>x /100x $rip
>x /100x $eip

The process is dead, so some commands are invalid.

Program terminated with signal 4, Illegal instruction.
#0  0x008da76e in ?? ()
(gdb) info proc mapping
No /proc directory: '/proc/5383'
(gdb) info registers
rax0x0  0
rbx0x448f98071891328
rcx0x7fff52810b00   140734577576704
rdx0x448f98071891328
rsi0x448f98071891328
rdi0x8  8
rbp0x448f9800x448f980
rsp0x7fff52810ae8   0x7fff52810ae8
r8 0x1  1
r9 0x9c02496
r100x44af48072021120
r110x44b1b8072031104
r120x8  8
r130x8  8
r140x9  9
r150x13880  8
rip0x8da76e 0x8da76e
eflags 0x10246  [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0  0
es 0x0  0
fs 0x0  0
gs 0x0  0
(gdb) x /100x $rip
0x8da76e:   Cannot access memory at address 0x8da76e
(gdb) x /100x $eip
Value can't be converted to integer.
(gdb)



Regards,
Mahmood
​​
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OMPI users] Still "illegal instruction"

2016-09-15 Thread Mahmood Naderan
disas command fails.

Program terminated with signal 4, Illegal instruction.
#0  0x008da76e in ?? ()
(gdb) bt
#0  0x008da76e in ?? ()
#1  0x008da970 in ?? ()
#2  0x00bfe9f8 in ?? ()
#3  0x in ?? ()
(gdb) disas
No function contains program counter for selected frame.


>Btw, did you run some simple applications with openmpi 2.0.0 ?
>We do have bits of assembly code, and even if i do not believe they are
>specific to intel cpus, i might be wrong and that could be the root cause.

I didn't run the tests. But I am pretty sure that OpenMPI is working because
other applications (not siesta) have no problem.
Please note that the CPUs are AMD. The frontend is an Opteron 6380 and the
compute nodes are 6282 SE.

>Also, did you run
>make check
>After you built openmpi ?

All are OK. Please see below.


Testsuite summary for Open MPI 2.0.0

# TOTAL: 2
# PASS:  2
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0



Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Still "illegal instruction"

2016-09-15 Thread Mahmood Naderan
>gdb --pid=core.5383

Are you sure about the syntax? PID must be a running process. I see --core,
which seems to be relevant here.

Both OpenMPI and Siesta were compiled with -O flags. This is not appropriate
for gdb. Should I compile both of them with debug symbols?

>Btw, did you compile lapack and friends by yourself ?
I use Scalapack, which needs BLAS. I use OpenBLAS instead of netlib's BLAS.


$ gdb --core=core.5383

Try: yum --enablerepo='*-debug*' install
/usr/lib/debug/.build-id/e1/ddc85f7caa9f2571545a58479d64ba676217dd
[New Thread 5383]
[New Thread 5416]
[New Thread 5401]
[New Thread 5388]
[New Thread 5407]
[New Thread 5406]
[New Thread 5418]
[New Thread 5393]
[New Thread 5391]
[New Thread 5387]
[New Thread 5405]
[New Thread 5389]
[New Thread 5408]
[New Thread 5417]
[New Thread 5394]
[New Thread 5506]
[New Thread 5404]
[New Thread 5392]
[New Thread 5410]
[New Thread 5411]
[New Thread 5395]
[New Thread 5409]
[New Thread 5403]
[New Thread 5414]
[New Thread 5396]
[New Thread 5412]
[New Thread 5419]
[New Thread 5413]
[New Thread 5509]
[New Thread 5415]
[New Thread 5397]
[New Thread 5420]
[New Thread 5398]
[New Thread 5399]
Core was generated by `/share/apps/siesta/siesta-4.0/tpar/transiesta'.
Program terminated with signal 4, Illegal instruction.
#0  0x008da76e in ?? ()
(gdb) bt
#0  0x008da76e in ?? ()
#1  0x008da970 in ?? ()
#2  0x00bfe9f8 in ?? ()
#3  0x in ?? ()
(gdb)
​

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Still "illegal instruction"

2016-09-15 Thread Mahmood Naderan
Hi,
After upgrading OpenMPI (from 1.6.5 to 2.0.0) and my program (from 3.2 to
4.0), the parallel run still aborts with the "Illegal instruction" error in
the middle of the run.

I wonder why this happens and how I can debug it further. How can I find out
whether this error is related to the program itself, MPI, or system libraries?

Gilles gave a suggestion about using ulimit to create a core file (
https://mail-archive.com/users@lists.open-mpi.org/msg29919.html). Please
see the following:

mahmood@cluster:tran$ cat sc.sh
#!/bin/bash
ulimit -c unlimited
exec /share/apps/siesta/siesta-4.0/tpar/transiesta < trans-cc.fdf
mahmood@cluster:tran$ cat hosts.txt
compute-0-1
mahmood@cluster:tran$ hostname
cluster
mahmood@cluster:tran$ #/share/apps/siesta/openmpi-2.0.0/bin/mpirun
-hostfile hosts.txt -np 15 sc.sh

--
mpirun noticed that process rank 0 with PID 5383 on node compute-0-1 exited
on signal 4 (Illegal instruction).
--



Now I see a file core.5383.
It is a very large file (1290018816 bytes)!
How can I process it?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Mahmood Naderan
Good news :)

>If I drop -static, the error is gone... However, the ldd command shows that
>the binary cannot access those two MPI libraries.

In the previous installation, I kept both .so and .a files. Therefore, it
first searched for .so files and that was the reason why ldd failed.


Forget about dlopen.
The only options needed were --disable-shared --enable-static
--enable-mpirun-prefix-by-default

Sorry guys for spamming


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Mahmood Naderan
I installed libibverbs-devel-static.x86_64 via yum


root@cluster:tpar# yum list libibverb*
Installed Packages
libibverbs.x86_64
1.1.8-4.el6@base
libibverbs-devel.x86_64
1.1.8-4.el6@base
libibverbs-devel-static.x86_64
1.1.8-4.el6@base
Available Packages
libibverbs.i686
1.1.8-4.el6base
libibverbs-devel.i686
1.1.8-4.el6base
libibverbs-utils.x86_64
1.1.8-4.el6base
root@cluster:tpar# find /usr -name libibverb*
/usr/lib64/libibverbs.so.1.0.0
/usr/lib64/libibverbs.so
/usr/lib64/libibverbs.a
/usr/lib64/libibverbs.so.1
/usr/share/doc/libibverbs-1.1.8


and added /usr/lib64/libibverbs.a the same way as the scalapack library I
added... I just gave the full path.



However, this is what I get:

libmpi_f90.a  \
`FoX/FoX-config --libs --wcml` ../libscalapack.a
../libopenblas.a  /export/apps/siesta/openmpi-1.8.8/lib/libmpi_mpifh.a
/export/apps/siesta/openmpi-1.8.8/lib/libmpi_usempi.a
/usr/lib64/libibverbs.a
/export/apps/siesta/openmpi-1.8.8/lib/libopen-rte.a(session_dir.o): In
function `orte_session_dir_get_name':
session_dir.c:(.text+0x751): warning: Using 'getpwuid' in statically linked
applications requires at runtime the shared libraries from the glibc
version used for linking
sockets.o: In function `open_socket':
sockets.c:(.text+0xb5): warning: Using 'getaddrinfo' in statically linked
applications requires at runtime the shared libraries from the glibc
version used for linking
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libpthread.a(libpthread.o):
In function `sem_open':
(.text+0x764d): warning: the use of `mktemp' is dangerous, better use
`mkstemp'
/export/apps/siesta/openmpi-1.8.8/lib/libopen-rte.a(ras_slurm_module.o): In
function `init':
ras_slurm_module.c:(.text+0x6d5): warning: Using 'gethostbyname' in
statically linked applications requires at runtime the shared libraries
from the glibc version used for linking
/export/apps/siesta/openmpi-1.8.8/lib/libopen-pal.a(evutil.o): In function
`evutil_unparse_protoname':
/export/apps/siesta/openmpi-1.8.8/opal/mca/event/libevent2021/libevent/evutil.c:758:
warning: Using 'getprotobynumber' in statically linked applications
requires at runtime the shared libraries from the glibc version used for
linking
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libnl.a(utils.o):
In function `nl_str2ip_proto':
(.text+0x599): warning: Using 'getprotobyname' in statically linked
applications requires at runtime the shared libraries from the glibc
version used for linking
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libibverbs.a(src_libibverbs_la-init.o):
In function `load_driver':
(.text+0x2ec): undefined reference to `dlopen'
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libibverbs.a(src_libibverbs_la-init.o):
In function `load_driver':
(.text+0x331): undefined reference to `dlerror'
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libibverbs.a(src_libibverbs_la-init.o):
In function `ibverbs_init':
(.text+0xd25): undefined reference to `dlopen'
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libibverbs.a(src_libibverbs_la-init.o):
In function `ibverbs_init':
(.text+0xd36): undefined reference to `dlclose'
collect2: ld returned 1 exit status
make: *** [transiesta] Error 1


​

Regards,
Mahmood



On Wed, Sep 14, 2016 at 9:54 PM, Reuti  wrote:

>
> The "-l" includes already the "lib" prefix when it tries to find the
> library. Hence "-libverbs" might be misleading due to the "lib" in the
> word, as it looks for "libibverbs.{a|so}". Like "-lm" will look for
> "libm.a" resp. "libm.so".
>
> -- Reuti
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Mahmood Naderan
So, I used

./configure --prefix=/export/apps/siesta/openmpi-1.8.8
--enable-mpirun-prefix-by-default --enable-static --disable-shared
--disable-dlopen

and added -static to LDFLAGS, but I get:

/export/apps/siesta/openmpi-1.8.8/bin/mpifort -o transiesta -static
libfdf.a libSiestaXC.a \
   libmpi_f90.a  \
`FoX/FoX-config --libs --wcml` ../libscalapack.a
../libopenblas.a  /export/apps/siesta/openmpi-1.8.8/lib/libmpi_mpifh.a
/export/apps/siesta/openmpi-1.8.8/lib/libmpi_usempi.a
/usr/bin/ld: cannot find -libverbs
collect2: ld returned 1 exit status


Removing -static will eliminate the error, but that is not what I want.
Should I build libibverbs from source first? Am I on the right track?
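
If InfiniBand support is not actually needed here, another hedged option is to
configure Open MPI without its verbs component, so the static link no longer
pulls in -libverbs at all (option name as documented for the 1.8 series):

./configure --prefix=/export/apps/siesta/openmpi-1.8.8 \
    --enable-mpirun-prefix-by-default --enable-static --disable-shared \
    --without-verbs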



Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Mahmood Naderan
Do you mean --disable-dl-dlopen? The last lines of configure are

+++ Configuring MCA framework dl
checking for no configure components in framework dl...
checking for m4 configure components in framework dl... libltdl, dlopen

--- MCA component dl:dlopen (m4 configuration macro, priority 80)
checking for MCA component dl:dlopen compile mode... static
checking if MCA component dl:dlopen can compile... no

--- MCA component dl:libltdl (m4 configuration macro, priority 50)
checking for MCA component dl:libltdl compile mode... static
checking --with-libltdl value... simple ok (unspecified)
checking --with-libltdl-libdir value... simple ok (unspecified)
checking for libltdl dir... compiler default
checking for libltdl library dir... linker default
checking ltdl.h usability... no
checking ltdl.h presence... no
checking for ltdl.h... no
checking if MCA component dl:libltdl can compile... no
configure: WARNING: Did not find a suitable static opal dl component
configure: WARNING: You might need to install libltld (and its headers) or
configure: WARNING: specify --disable-dlopen to configure.
configure: error: Cannot continue


The command is:

./configure --prefix=/export/apps/siesta/openmpi-1.8.8
--enable-mpirun-prefix-by-default --enable-static --disable-dl-dlopen




Regards,
Mahmood



On Wed, Sep 14, 2016 at 5:07 PM, Bennet Fauber  wrote:

> Mahmood,
>
> It looks like it is dlopen that is complaining.  What happens if
> --disable-dlopen?
>
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Mahmood Naderan
Well I want to omit LD_LIBRARY_PATH. For that reason I am building the
binary statically.

> note this is not required when Open MPI is configure'd with
>--enable-mpirun-prefix-by-default
I really did that. Using Rocks-6, I installed the application and openmpi
on the shared file system (/export).
Yes it is possible to add the library paths to LD_LIBRARY_PATH, but I want
to statically put the required libraries in the binary.



Regards,
Mahmood



On Wed, Sep 14, 2016 at 4:44 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Mahmood,
>
> try to prepend /export/apps/siesta/openmpi-1.8.8/lib to your
> $LD_LIBRARY_PATH
>
>  note this is not required when Open MPI is configure'd with
> --enable-mpirun-prefix-by-default
>
>
> Cheers,
>
> Gilles
>
> On Wednesday, September 14, 2016, Mahmood Naderan 
> wrote:
>
>> Hi,
>> Here is the problem with statically linking an application with a program.
>>
>> by specifying the library names:
>>
>> FC=/export/apps/siesta/openmpi-1.8.8/bin/mpifort
>> FFLAGS=-g -Os
>> FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
>> LDFLAGS=-static
>> MPI1=/export/apps/siesta/openmpi-1.8.8/lib/libmpi_mpifh.a
>> MPI2=/export/apps/siesta/openmpi-1.8.8/lib/libmpi_usempi.a
>> BLAS_LIBS=../libopenblas.a
>> SCALAPACK_LIBS=../libscalapack.a
>> LIBS=$(SCALAPACK_LIBS) $(BLAS_LIBS) $(MPI1) $(MPI2)
>>
>>
>>
>>
>> The output of "make" is:
>>
>> /export/apps/siesta/openmpi-1.8.8/bin/mpifort -o transiesta \
>>-static automatic_cell.o 
>> libmpi_f90.a
>>   `FoX/FoX-config --libs --wcml` ../libscalapack.a
>> ../libopenblas.a  /export/apps/siesta/openmpi-1.8.8/lib/libmpi_mpifh.a
>> /export/apps/siesta/openmpi-1.8.8/lib/libmpi_usempi.a
>> /export/apps/siesta/openmpi-1.8.8/lib/libopen-pal.a(dl_dlopen_module.o):
>> In function `dlopen_open':
>> dl_dlopen_module.c:(.text+0x473): warning: Using 'dlopen' in statically
>> linked applications requires at runtime the shared libraries from the glibc
>> version used for linking
>> /usr/bin/ld: cannot find -libverbs
>> collect2: ld returned 1 exit status
>>
>>
>>
>>
>> If I drop -static, the error is gone... However, the ldd command shows that
>> the binary cannot access those two MPI libraries.
>>
>>
>> Regards,
>> Mahmood
>>
>>
>>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Mahmood Naderan
Hi,
Here is the problem I have with statically linking the MPI libraries into an
application.

by specifying the library names:

FC=/export/apps/siesta/openmpi-1.8.8/bin/mpifort
FFLAGS=-g -Os
FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
LDFLAGS=-static
MPI1=/export/apps/siesta/openmpi-1.8.8/lib/libmpi_mpifh.a
MPI2=/export/apps/siesta/openmpi-1.8.8/lib/libmpi_usempi.a
BLAS_LIBS=../libopenblas.a
SCALAPACK_LIBS=../libscalapack.a
LIBS=$(SCALAPACK_LIBS) $(BLAS_LIBS) $(MPI1) $(MPI2)




The output of "make" is:

/export/apps/siesta/openmpi-1.8.8/bin/mpifort -o transiesta \
   -static automatic_cell.o 
libmpi_f90.a
  `FoX/FoX-config --libs --wcml` ../libscalapack.a
../libopenblas.a  /export/apps/siesta/openmpi-1.8.8/lib/libmpi_mpifh.a
/export/apps/siesta/openmpi-1.8.8/lib/libmpi_usempi.a
/export/apps/siesta/openmpi-1.8.8/lib/libopen-pal.a(dl_dlopen_module.o): In
function `dlopen_open':
dl_dlopen_module.c:(.text+0x473): warning: Using 'dlopen' in statically
linked applications requires at runtime the shared libraries from the glibc
version used for linking
/usr/bin/ld: cannot find -libverbs
collect2: ld returned 1 exit status




If I drop -static, the error is gone... However, the ldd command shows that
the binary cannot access those two MPI libraries.


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI libraries

2016-09-14 Thread Mahmood Naderan
It seems that siesta builds its own MPI interface library named libmpi_f90.a,
which has the same name as Open MPI's library. I solved it. Thanks for all the
suggestions.

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI libraries

2016-09-12 Thread Mahmood Naderan
OK. I will try that. Thanks for the suggestion.

Regards,
Mahmood



On Mon, Sep 12, 2016 at 11:35 PM, Dave Love  wrote:

> Gilles Gouaillardet  writes:
>
> > Mahmood,
> >
> > mpi_siesta is a siesta library, not an Open MPI library.
> >
> > fwiw, you might want to try again from scratch with
> > MPI_INTERFACE=libmpi_f90.a
> > DEFS_MPI=-DMPI
> > in your arch.make
> >
> > i do not think libmpi_f90.a is related to an OpenMPI library.
>
> libmpi_f90 is the Fortran 90 library in OMPI 1.6, but presumably you
> want the shared, system version.
>
> > if you need some more support, please refer to the siesta doc and/or ask
> on
> > a siesta mailing list
>
> I used the system MPI (which is OMPI 1.6 for historical reasons) and it
> seems siesta 4.0 just built on RHEL6 with the rpm spec fragment below,
> but I'm sure it would also work with 1.8.  (However, it needs cleaning
> up significantly for the intended Fedora packaging.)
>
>   %global _configure ../Src/configure
>   cd Obj
>   ../Src/obj_setup.sh
>   %_openmpi_load
>   %configure --enable-mpi
>   make # not smp-safe
>
> (%_openmpi_load just does "module load openmpi_x86_64" in this case.)
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI libraries

2016-09-12 Thread Mahmood Naderan
>i do not think libmpi_f90.a is related to an OpenMPI library.


The thing is that libmpi_f90.a is part of 1.6.5 and siesta uses that. However,
1.8.8 has no such file. Instead it has other names, and you said before that
mpifort (the wrapper) will automatically pull in the necessary libraries.
Please see below:

# ls /export/apps/computer/openmpi-1.6.5/lib/libmpi_f90.*
/export/apps/computer/openmpi-1.6.5/lib/libmpi_f90.a
/export/apps/computer/openmpi-1.6.5/lib/libmpi_f90.la
/export/apps/computer/openmpi-1.6.5/lib/libmpi_f90.so
/export/apps/computer/openmpi-1.6.5/lib/libmpi_f90.so.1
/export/apps/computer/openmpi-1.6.5/lib/libmpi_f90.so.1.3.0


The Makefile of Siesta that use mpi interface is available at
http://pastebin.com/RQx3KpXp

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] MPI libraries

2016-09-12 Thread Mahmood Naderan
Hi,
Trying to build a source code with newer versions of OpenMPI, I still have
some problems that didn't exist with previous versions.

In 1.6.5, I wrote something like this in an arch.make file which is used by
the Makefile:

FC=/export/apps/siesta/openmpi-1.6.5/bin/mpif90
MPI_INTERFACE=libmpi_f90.a
MPI_INCLUDE=.

And then I had to copy that library to the local folder in order to drop
the full path.



However, with 1.8.8, it has been stated that mpifort (the wrapper) provides
all necessary libraries. So I now write:

FC=/export/apps/siesta/openmpi-1.8.8/bin/mpifort
MPI_INTERFACE=
MPI_INCLUDE=.



However, the compilation of the program fails with

make[1]: Entering directory `/export/apps/siesta/siesta-4.0/spar/fdf'
In fdf, INCFLAGS is: -I/export/apps/siesta/siesta-4.0/Src/fdf  -I../
/export/apps/siesta/openmpi-1.8.8/bin/mpifort -c -g -Os
-I/export/apps/siesta/siesta-4.0/Src/fdf  -I../ -DMPI -DFC_HAVE_FLUSH
-DFC_HAVE_ABORT  /export/apps/siesta/siesta-4.0/Src/fdf/fdf.F90
/export/apps/siesta/siesta-4.0/Src/fdf/fdf.F90:498.20:

  use mpi_siesta
1
Fatal Error: Can't open module file 'mpi_siesta.mod' for reading at (1): No
such file or directory



It seems that it is trying to find an MPI library. Any idea about that?

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
OK. Running "module unload rocks-openmpi" and putting that in ~/.bashrc
will remove /opt/openmpi/lib from LD_LIBRARY_PATH.
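
i.e. something like this, with the module name as provided by Rocks:

module unload rocks-openmpi
echo $LD_LIBRARY_PATH    # /opt/openmpi/lib should no longer appear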

Thanks Gilles for your help.

Regards,
Mahmood



On Mon, Sep 12, 2016 at 1:25 PM, Mahmood Naderan 
wrote:

> It seems that it is part of rocks-openmpi. I will find out how to remove
> it and will come back.
>
> Regards,
> Mahmood
>
>
>
> On Mon, Sep 12, 2016 at 1:06 PM, Gilles Gouaillardet 
> wrote:
>
>> Mahmood,
>>
>> you need to manually remove /opt/openmpi/lib from your LD_LIBRARY_PATH
>> (or have your sysadmin do it if this is somehow done automatically)
>>
>> the point of configuring with --enable-mpirun-prefix-by-default is you
>> do *not* need
>> to add /export/apps/siesta/openmpi-1.8.8/lib in your LD_LIBRARY_PATH
>>
>> Cheers,
>>
>> Gilles
>>
>>
>>
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
It seems that it is part of rocks-openmpi. I will find out how to remove it
and will come back.

Regards,
Mahmood



On Mon, Sep 12, 2016 at 1:06 PM, Gilles Gouaillardet 
wrote:

> Mahmood,
>
> you need to manually remove /opt/openmpi/lib from your LD_LIBRARY_PATH
> (or have your sysadmin do it if this is somehow done automatically)
>
> the point of configuring with --enable-mpirun-prefix-by-default is you do
> *not* need
> to add /export/apps/siesta/openmpi-1.8.8/lib in your LD_LIBRARY_PATH
>
> Cheers,
>
> Gilles
>
>
> On 9/12/2016 5:28 PM, Mahmood Naderan wrote:
>
> Is the following output OK?
>
>
> ...
> Making install in util
> make[2]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test/util'
> make[3]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test/util'
> make[3]: Nothing to be done for `install-exec-am'.
> make[3]: Nothing to be done for `install-data-am'.
> make[3]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test/util'
> make[2]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test/util'
> make[2]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test'
> make[3]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test'
> make[3]: Nothing to be done for `install-exec-am'.
> make[3]: Nothing to be done for `install-data-am'.
> make[3]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test'
> make[2]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test'
> make[1]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test'
> make[1]: Entering directory `/export/apps/siesta/openmpi-1.8.8'
> make[2]: Entering directory `/export/apps/siesta/openmpi-1.8.8'
> make[2]: Nothing to be done for `install-exec-am'.
> make[2]: Nothing to be done for `install-data-am'.
> make[2]: Leaving directory `/export/apps/siesta/openmpi-1.8.8'
> make[1]: Leaving directory `/export/apps/siesta/openmpi-1.8.8'
> root@cluster:openmpi-1.8.8# ls bin/
> mpic++ mpif90-vt orte-clean   otfcompress shmemrun
> vtjava
> mpicc  mpifort   ortedotfconfig   vtc++
> vtrun
> mpiCC  mpifort-vtorte-infootfdecompress   vtcc
> vtsetup
> mpicc-vt   mpirunorte-ps  otfinfo vtCC
> vtsetup.jar
> mpiCC-vt   ompi-cleanorterun  otfmergevtcxx
> vtunify
> mpic++-vt  ompi_info orte-server  otfmerge-mpivtf77
> vtunify-mpi
> mpicxx ompi-ps   orte-top otfprintvtf90
> vtwrapper
> mpicxx-vt  ompi-server   oshccotfprofile  vtfilter
> mpiexecompi-top  oshfort  otfprofile-mpi  vtfiltergen
> mpif77 opal_wrapper  oshmem_info  otfshrink   vtfiltergen-mpi
> mpif77-vt  opari oshrun   shmemcc vtfilter-mpi
> mpif90 orteccotfaux   shmemfort   vtfort
> root@cluster:openmpi-1.8.8# echo $LD_LIBRARY_PATH
> /opt/gridengine/lib/linux-x64:/opt/openmpi/lib
> root@cluster:openmpi-1.8.8# grep -r mpirun config.log
>   $ ./configure --prefix=/export/apps/siesta/openmpi-1.8.8
> --enable-mpirun-prefix-by-default
> configure:66027: result:  '--prefix=/export/apps/siesta/openmpi-1.8.8'
> '--enable-mpirun-prefix-by-default'
> configure:97538: running /bin/sh './configure' --disable-dns
> --disable-http --disable-rpc --disable-openssl --enable-thread-support
> --disable-evport  '--prefix=/export/apps/siesta/openmpi-1.8.8'
> '--enable-mpirun-prefix-by-default' --cache-file=/dev/null --srcdir=.
> --disable-option-checking
> configure:290550: running /bin/sh './configure' --disable-option-checking
> --with-openmpi-inside=1.7 '--prefix=/export/apps/siesta/openmpi-1.8.8'
> '--enable-mpirun-prefix-by-default' CPPFLAGS=-I/export/apps/
> siesta/openmpi-1.8.8/ompi/include 
> LDFLAGS=-L/export/apps/siesta/openmpi-1.8.8/ompi/.libs
> --cache-file=/dev/null --srcdir=. --disable-option-checking
> root@cluster:openmpi-1.8.8#
>
>
>
>
>
>
> So I ran
>
> # ./configure --prefix=/export/apps/siesta/openmpi-1.8.8
> --enable-mpirun-prefix-by-default
> # make
> # make install
>
> However, LD_LIBRARY_PATH still shows the previous install (/opt/openmpi)
>
>
> Regards,
> Mahmood
>
>
>
>
> ___
> users mailing 
> listus...@lists.open-mpi.orghttps://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
Is the following output OK?


...
Making install in util
make[2]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test/util'
make[3]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test/util'
make[3]: Nothing to be done for `install-exec-am'.
make[3]: Nothing to be done for `install-data-am'.
make[3]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test/util'
make[2]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test/util'
make[2]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test'
make[3]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test'
make[3]: Nothing to be done for `install-exec-am'.
make[3]: Nothing to be done for `install-data-am'.
make[3]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test'
make[2]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test'
make[1]: Leaving directory `/export/apps/siesta/openmpi-1.8.8/test'
make[1]: Entering directory `/export/apps/siesta/openmpi-1.8.8'
make[2]: Entering directory `/export/apps/siesta/openmpi-1.8.8'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/export/apps/siesta/openmpi-1.8.8'
make[1]: Leaving directory `/export/apps/siesta/openmpi-1.8.8'
root@cluster:openmpi-1.8.8# ls bin/
mpic++      mpif90-vt     orte-clean   otfcompress     shmemrun         vtjava
mpicc       mpifort       orted        otfconfig       vtc++            vtrun
mpiCC       mpifort-vt    orte-info    otfdecompress   vtcc             vtsetup
mpicc-vt    mpirun        orte-ps      otfinfo         vtCC             vtsetup.jar
mpiCC-vt    ompi-clean    orterun      otfmerge        vtcxx            vtunify
mpic++-vt   ompi_info     orte-server  otfmerge-mpi    vtf77            vtunify-mpi
mpicxx      ompi-ps       orte-top     otfprint        vtf90            vtwrapper
mpicxx-vt   ompi-server   oshcc        otfprofile      vtfilter
mpiexec     ompi-top      oshfort      otfprofile-mpi  vtfiltergen
mpif77      opal_wrapper  oshmem_info  otfshrink       vtfiltergen-mpi
mpif77-vt   opari         oshrun       shmemcc         vtfilter-mpi
mpif90      ortecc        otfaux       shmemfort       vtfort
root@cluster:openmpi-1.8.8# echo $LD_LIBRARY_PATH
/opt/gridengine/lib/linux-x64:/opt/openmpi/lib
root@cluster:openmpi-1.8.8# grep -r mpirun config.log
  $ ./configure --prefix=/export/apps/siesta/openmpi-1.8.8
--enable-mpirun-prefix-by-default
configure:66027: result:  '--prefix=/export/apps/siesta/openmpi-1.8.8'
'--enable-mpirun-prefix-by-default'
configure:97538: running /bin/sh './configure' --disable-dns --disable-http
--disable-rpc --disable-openssl --enable-thread-support --disable-evport
'--prefix=/export/apps/siesta/openmpi-1.8.8'
'--enable-mpirun-prefix-by-default' --cache-file=/dev/null --srcdir=.
--disable-option-checking
configure:290550: running /bin/sh './configure' --disable-option-checking
--with-openmpi-inside=1.7 '--prefix=/export/apps/siesta/openmpi-1.8.8'
'--enable-mpirun-prefix-by-default'
CPPFLAGS=-I/export/apps/siesta/openmpi-1.8.8/ompi/include
LDFLAGS=-L/export/apps/siesta/openmpi-1.8.8/ompi/.libs
--cache-file=/dev/null --srcdir=. --disable-option-checking
root@cluster:openmpi-1.8.8#






So I ran

# ./configure --prefix=/export/apps/siesta/openmpi-1.8.8
--enable-mpirun-prefix-by-default
# make
# make install

However, LD_LIBRARY_PATH still shows the previous install (/opt/openmpi)
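Since "make install" only copies files under the prefix and never touches the
shell environment, the new install has to be put on PATH and LD_LIBRARY_PATH by
hand. A minimal sketch, assuming a bash shell and the prefix used above (where
these exports should be persisted depends on the site setup):

export PATH=/export/apps/siesta/openmpi-1.8.8/bin:$PATH
export LD_LIBRARY_PATH=/export/apps/siesta/openmpi-1.8.8/lib:$LD_LIBRARY_PATH
# confirm the intended build is now the one being picked up
which mpirun
mpirun --version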


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
>​  --enable-mpirun-prefix-by-default​

What is that? Does that mean "configure 1.8.8 with the default one
installed on the system"? If so, I don't think that is good, because


# /opt/openmpi/bin/ompi_info
 Package: Open MPI root@centos-6-3.localdomain Distribution
Open MPI: 1.6.2




Regards,
Mahmood



On Mon, Sep 12, 2016 at 12:20 PM, Mahmood Naderan 
wrote:

> >​  --enable-mpirun-prefix-by-default​
>
> What is that? Does that mean "configure 1.8.8 with the default one
> installed on the system"? If so, I don't think that is good, because
>
>
> Regards,
> Mahmood
>
>
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
>​  --enable-mpirun-prefix-by-default​

What is that? Does that mean "configure 1.8.8 with the default one
installed on the system"? If so, I don't think that is good, because


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
​>(i'd like to make sure you are not using IntelMPI libmpi.so.1 with Open
MPI libmpi_mpifh.so.2, that can happen if Intel MPI appears first in your
LD_LIBRARY_PATH)

# echo $LD_LIBRARY_PATH
/opt/gridengine/lib/linux-x64:/opt/openmpi/lib
# ls /opt/openmpi/lib
libmpi.a             libompitrace.a         libotfaux.so.0.0.0   libvt-mpi.so.0.0.0
libmpi_cxx.a         libompitrace.la        libotf.la            libvt-mpi-unify.a
libmpi_cxx.la        libompitrace.so        libotf.so            libvt-mpi-unify.la
libmpi_cxx.so        libompitrace.so.0      libotf.so.1          libvt-mpi-unify.so
libmpi_cxx.so.1      libompitrace.so.0.0.0  libotf.so.1.5.2      libvt-mpi-unify.so.0
libmpi_cxx.so.1.0.1  libopen-pal.a          libvt.a              libvt-mpi-unify.so.0.0.0
libmpi_f77.a         libopen-pal.la         libvt-hyb.a          libvt-mt.a
libmpi_f77.la        libopen-pal.so         libvt-hyb.la         libvt-mt.la
libmpi_f77.so        libopen-pal.so.4       libvt-hyb.so         libvt-mt.so
libmpi_f77.so.1      libopen-pal.so.4.0.0   libvt-hyb.so.0       libvt-mt.so.0
libmpi_f77.so.1.0.3  libopen-rte.a          libvt-hyb.so.0.0.0   libvt-mt.so.0.0.0
libmpi_f90.a         libopen-rte.la         libvt-java.la        libvt-pomp.a
libmpi_f90.la        libopen-rte.so         libvt-java.so        libvt-pomp.la
libmpi_f90.so        libopen-rte.so.4       libvt-java.so.0      libvt.so
libmpi_f90.so.1      libopen-rte.so.4.0.0   libvt-java.so.0.0.0  libvt.so.0
libmpi_f90.so.1.1.0  libotf.a               libvt.la             libvt.so.0.0.0
libmpi.la            libotfaux.a            libvt-mpi.a          mpi.mod
libmpi.so            libotfaux.la           libvt-mpi.la         openmpi
libmpi.so.1          libotfaux.so           libvt-mpi.so         pkgconfig
libmpi.so.1.0.3      libotfaux.so.0         libvt-mpi.so.0




It seems that there is another openmpi installation which has been
installed by previous admins.
Before removing /opt/openmpi, I would like to unset the LD_LIBRARY_PATH
first to see if it has any effect. Do you agree with that?
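For reference, a hedged sketch of that experiment, affecting only the current
shell (assuming bash; note that unsetting also drops the /opt/gridengine entry
for this shell). The binary path is just an example taken from an ldd listing
elsewhere in this archive:

unset LD_LIBRARY_PATH
# check what the application binary resolves to now
ldd /share/apps/chemistry/siesta-3.2-pl-5/tpar/transiesta | grep -i 'mpi\|not found'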

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
Hi,
Following the suggestion by Gilles Gouaillardet (
https://mail-archive.com/users@lists.open-mpi.org/msg29688.html), I ran a
configure command for a program like this

​# ../Src/configure FC=/export/apps/siesta/openmpi-1.8.8/bin/mpifort
--with-blas=libopenblas.a --with-lapack=liblapack.a
--with-scalapack=libscalapack.a
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for Fortran compiler default output file name... a.out
checking whether the Fortran compiler works... configure: error: cannot run
Fortran compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details.



The content of config.log is available at ​http://pastebin.com/LTxxRMwH

It seems that mpifort (the wrapper compiler) behaves differently from the
previous mpif90.

Any idea about that?
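One way to separate a wrapper problem from an environment problem is to compile
and run a trivial program with the same wrapper. A hedged sketch, assuming bash
and that the 1.8.8 runtime libraries may need to be on LD_LIBRARY_PATH
(conftest.f90 is just a scratch file name):

/export/apps/siesta/openmpi-1.8.8/bin/mpifort --showme
export LD_LIBRARY_PATH=/export/apps/siesta/openmpi-1.8.8/lib:$LD_LIBRARY_PATH
printf 'program t\nend program t\n' > conftest.f90
/export/apps/siesta/openmpi-1.8.8/bin/mpifort conftest.f90 -o conftest && ./conftest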


Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Error in file runtime/orte_init.c

2016-09-02 Thread Mahmood Naderan
​OK thanks for the hint. In fact 'ldd' command shows that some libraries
were missing; adding their paths to LD_LIBRARY_PATH solved the problem.



Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread Mahmood Naderan
Thanks for your help. Please see below

mahmood@compute-0-1:~$ ldd /share/apps/chemistry/siesta-3.2-pl-5/tpar/transiesta
linux-vdso.so.1 =>  (0x7fffba9a8000)
libmpi_f90.so.1 => /opt/openmpi/lib/libmpi_f90.so.1 (0x2b472b64)
libmpi_f77.so.1 => /opt/openmpi/lib/libmpi_f77.so.1 (0x2b472b848000)
libmpi.so.1 => /opt/openmpi/lib/libmpi.so.1 (0x2b472ba8)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x003d17e0)
librt.so.1 => /lib64/librt.so.1 (0x003d1860)
libnsl.so.1 => /lib64/libnsl.so.1 (0x003d1ae0)
libutil.so.1 => /lib64/libutil.so.1 (0x003d18a0)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x2b472c028000)
libm.so.6 => /lib64/libm.so.6 (0x2b472c32)
libtorque.so.2 => /opt/torque/lib/libtorque.so.2 (0x2b472c5a8000)
libdl.so.2 => /lib64/libdl.so.2 (0x003d1760)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x003d1920)
libpthread.so.0 => /lib64/libpthread.so.0 (0x003d17a0)
libc.so.6 => /lib64/libc.so.6 (0x003d1720)
libdat.so.1 => /usr/lib64/libdat.so.1 (0x2b472c8b)
/lib64/ld-linux-x86-64.so.2 (0x003d16e0)


-- 
Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread Mahmood Naderan
The application is Siesta-3.2 and the command I use is


/share/apps/computer/openmpi-1.6.5/bin/mpirun -hostfile hosts.txt -np
15 /share/apps/chemistry/siesta-3.2-pl-5/tpar/transiesta <
trans-cc-bt-cc-163-20.fdf

There is one node in the hosts.txt file. I have built the transiesta
binary from source, which uses
/share/apps/computer/openmpi-1.6.5/bin/mpif90

-- 
Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


[OMPI users] Error in file runtime/orte_init.c

2016-09-02 Thread Mahmood Naderan
Hi,
Using OpenMPI-2.0.0, is there any idea about this error?

A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:  compute-0-1.local
Framework: ess
Component: pmi
--
[compute-0-1.local:22993] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in
file runtime/orte_init.c at line 116




-- 
Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread Mahmood Naderan
>Did you ran
>ulimit -c unlimited
>before invoking mpirun ?

Yes, on the node that reports the error. Is the core file created in the
current working directory, or is it somewhere in the system folders?
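For what it is worth, where Linux writes core files is governed by the core
pattern, so this can be checked directly; a hedged sketch (the paths are the
usual kernel defaults, not something specific to this cluster):

ulimit -c unlimited
cat /proc/sys/kernel/core_pattern
# a bare name such as "core" means the file lands in the working directory of
# the crashing process; an absolute path or a leading "|" sends it elsewhere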



As another question, I am trying to move to OpenMPI-2.0.0.
The problem is that the application uses libmpi_f90.a from older versions,
but I don't see that file in OpenMPI-2.0.0; there are some other libraries
there instead.




-- 
Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread Mahmood Naderan
>Are you running under a batch manager ?
>On which architecture ?
Currently I am not using the job manager (which is actually PBS). I am
running from the terminal.

The machines are 64-bit AMD Opterons.


>Hopefully you will get a core file that points you to the illegal instruction
Where is that core file? I cannot find it.

BTW, the openmpi version is 1.6.5


-- 
Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


[OMPI users] job aborts "readv failed: Connection reset by peer"

2016-08-30 Thread Mahmood Naderan
Hi,
An MPI job is running on two nodes and everything seems to be fine.
However, in the middle of the run, the program aborts with the following
error


[compute-0-1.local][[47664,1],14][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[compute-0-3.local][[47664,1],11][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[compute-0-3.local][[47664,1],13][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--
mpirun noticed that process rank 0 with PID 4989 on node compute-0-1 exited
on signal 4 (Illegal instruction).
--


There are 8 processes on that node and each consumes about 150MB of memory;
the total memory usage is about 1% of the node's memory.

There are some discussions on the web about memory errors, but there is no
clear answer. What does that illegal instruction mean?




Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Need libmpi_f90.a

2016-07-12 Thread Mahmood Naderan
Sorry but I didn't understand the relation between name changes and wrapper
compilers. I only used --enable-static in the configure process.

> -rw-r--r-- 1 root root 1029580 Jul 11 23:51 libmpi_mpifh.a
> -rw-r--r-- 1 root root   17292 Jul 11 23:51 libmpi_usempi.a
>These are the two for v1.10.x.

So, for an application that used libmpi_f90.a, I have to specify those two
files. Is that right?

MPI_INTERFACE=libmpi_f90.a
->
MPI_INTERFACE=libmpi_mpifh.a libmpi_usempi.a
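If the build system really needs an explicit library list, one way to
double-check which Fortran libraries (and in which order) the wrapper would
link is to ask the wrapper itself; a hedged sketch assuming mpifort from the
1.10.3 install is on PATH:

mpifort --showme:link   # linker flags the wrapper adds
mpifort --showme:libs   # just the library names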


Regards,
Mahmood


Re: [OMPI users] Need libmpi_f90.a

2016-07-11 Thread Mahmood Naderan
Excuse me... that command only creates libmpi_f90.a for V1.6.5.
What about V1.10.3? I don't see such a file even with --enable-static. Does
it have a different name?


# ls -l libmpi*
-rw-r--r-- 1 root root 5888466 Jul 11 23:51 libmpi.a
-rw-r--r-- 1 root root  962656 Jul 11 23:51 libmpi_cxx.a
-rwxr-xr-x 1 root root1210 Jul 11 23:51 libmpi_cxx.la
lrwxrwxrwx 1 root root  19 Jul 11 23:51 libmpi_cxx.so ->
libmpi_cxx.so.1.1.3
lrwxrwxrwx 1 root root  19 Jul 11 23:51 libmpi_cxx.so.1 ->
libmpi_cxx.so.1.1.3
-rwxr-xr-x 1 root root  139927 Jul 11 23:51 libmpi_cxx.so.1.1.3
-rwxr-xr-x 1 root root1139 Jul 11 23:51 libmpi.la
-rw-r--r-- 1 root root 1029580 Jul 11 23:51 libmpi_mpifh.a
-rwxr-xr-x 1 root root1232 Jul 11 23:51 libmpi_mpifh.la
lrwxrwxrwx 1 root root  22 Jul 11 23:51 libmpi_mpifh.so ->
libmpi_mpifh.so.12.0.1
lrwxrwxrwx 1 root root  22 Jul 11 23:51 libmpi_mpifh.so.12 ->
libmpi_mpifh.so.12.0.1
-rwxr-xr-x 1 root root  584518 Jul 11 23:51 libmpi_mpifh.so.12.0.1
lrwxrwxrwx 1 root root  16 Jul 11 23:51 libmpi.so -> libmpi.so.12.0.3
lrwxrwxrwx 1 root root  16 Jul 11 23:51 libmpi.so.12 -> libmpi.so.12.0.3
-rwxr-xr-x 1 root root 2903817 Jul 11 23:51 libmpi.so.12.0.3
-rw-r--r-- 1 root root   17292 Jul 11 23:51 libmpi_usempi.a
-rwxr-xr-x 1 root root1288 Jul 11 23:51 libmpi_usempi.la
lrwxrwxrwx 1 root root  22 Jul 11 23:51 libmpi_usempi.so ->
libmpi_usempi.so.5.1.0
lrwxrwxrwx 1 root root  22 Jul 11 23:51 libmpi_usempi.so.5 ->
libmpi_usempi.so.5.1.0
-rwxr-xr-x 1 root root   11900 Jul 11 23:51 libmpi_usempi.so.5.1.0




Regards,
Mahmood



On Sun, Jul 10, 2016 at 8:39 PM, Mahmood Naderan 
wrote:

> >./configure --disable-shared --enable-static
>
> Thank you very much
>
> Regards,
> Mahmood
>
>


Re: [OMPI users] Need libmpi_f90.a

2016-07-10 Thread Mahmood Naderan
>./configure --disable-shared --enable-static

Thank you very much

Regards,
Mahmood


[OMPI users] Need libmpi_f90.a

2016-07-10 Thread Mahmood Naderan
Hi,
I need libmpi_f90.a for building an application. I have manually compiled
1.6.5 and 1.10.3 but that file is absent. Instead I see these

openmpi-1.6.5/lib/libmpi_f90.la
openmpi-1.10.3/lib/libmpi_mpifh.la

What should I do?

Regards,
Mahmood


Re: [OMPI users] openmpi shared memory feature

2012-11-01 Thread Mahmood Naderan
I have understood the advantages of the shared memory BTL. I wanted to
share some of my observations and gain an understanding of the internal
mechanisms of openmpi. I am wondering why openmpi uses a temporary file for
transferring data between two processes which are on the same node
(regardless of having a tmpfs or a tcp stack).


Assume there is no tmpfs. Then why should P1 and P2 on another node (B in my
example) communicate through tcp? Why should they use a file for shared
communication? Our observation is that there is a lot of IO activity
(more writing than reading). Basically they should communicate
through the RAM of the node. An analogy for this is the boot process of node B,
which has no disks. At boot it reads the images from the disk on A
through the network. Once it has loaded all the necessary things into *its RAM*,
it does whatever it wants through its own memory.


It seems that reading and writing files for this purpose is inefficient.
Wouldn't it be more logical to use an interprocess communication (IPC) API to
transfer a pointer to the data between processes? As an observation, we found
that mpich2 does not use a temporary file for shared memory management
(though I have not figured out the mechanism yet) and achieves better
performance (minor but noticeable) compared to openmpi.


Any thoughts?
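One thing that can be checked directly on such a node is where Open MPI places
its session directory (and with it the sm backing file), and whether that
location is RAM-backed. A hedged sketch, assuming the orte_tmpdir_base MCA
parameter of the 1.x series and a placeholder program name (./a.out):

# is there a RAM-backed filesystem the backing file could live on?
df -h /tmp /dev/shm
# steer the session directory (and therefore the sm backing file) onto it
mpirun --mca orte_tmpdir_base /dev/shm -np 4 ./a.out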


 
Regards,
Mahmood




 From: Jeff Squyres 
To: Open MPI Users  
Sent: Monday, October 29, 2012 4:31 PM
Subject: Re: [OMPI users] openmpi shared memory feature
 
On Oct 29, 2012, at 11:01 AM, Ralph Castain wrote:

> Wow, that would make no sense at all. If P1 and P2 are on the same node, then 
> we will use shared memory to do the transfer, as Jeff described. However, if 
> you disable shared memory, as you indicated you were doing on a previous 
> message (by adding -mca btl ^sm), then we would use a loopback device if 
> available - i.e., the packet would be handed to the network stack, which 
> would then return it to P2 without it ever leaving the node.
> 
> If there is no loopback device, and you disable shared memory, then we would 
> abort the job with an error as there is no way for P1 to communicate with P2.
> 
> We would never do what you describe.

To be clear: it would probably be a good idea to have *some* tmpfs on your 
diskless node.  Some things should simply not be on a network filesystem (e.g., 
/tmp).  Google around; there are good reasons for having a small tmpfs, even on 
a diskless server.

Indeed, Open MPI will warn you if it ends up putting a shared memory "file" 
(which, as I described, isn't really a file) on a network filesystem -- e.g., 
if /tmp is a network filesystem.  OMPI warns because corner cases can arise 
that cause performance degradation (e.g., the OS may periodically writing out 
the contents of shared memory to the network filesystem).

But as Ralph says: Open MPI primarily uses shared memory when communicating 
with processes on the same server (unless you disable shared memory).  This 
means Open MPI copies message A from P1's address space to shared memory, and 
then P2 copies message A from shared memory to its address space.  Or, if 
you're using the Linux knem kernel module, MPI copies message A from P1's 
address space directly to P2's address space.  No network transfer occurs, 
unless you possibly have /tmp on a network filesystem, and/or no /dev/shm 
filesystem, or other corner cases like that.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] openmpi shared memory feature

2012-10-29 Thread Mahmood Naderan
Thanks again for your answer. The reason why I had a negative view of the shared
memory feature is that we had been debugging the system (our program, openmpi,
cluster settings, ...) for nearly a week. To avoid any confusion, I will use
"node". Here we have:
1- Node 'A', which has some physical disks and 32GB of memory.

2- Node 'B', which has 64GB of memory but no disks. It boots from an image
which resides on 'A'.
3- There is no tmpfs.
4- We installed openmpi with *default* options.

5- We run the command "openmpi -np 4 " on 'B'.



So 4 processes are running on 'B'. Assume P1 is trying to send something
to P2. This is my understanding (please correct me if I am wrong):
1- P1 creates a packet.
2- P1 sends the packet to the network interface.
3- The packet is transferred from 'B' to 'A'.
4- While on 'A', the packet goes to the disk and does something.
5- The packet is again on the way from 'A' to 'B'.
6- P2 on 'B' will get the packet.

That is clearly inefficient communication.

What I understand from your replies is that if there is a tmpfs, then P1 and
P2 can communicate through the memory on 'B', which is fine. But I think there
should be more documentation on that.
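Checking whether node 'B' actually has such a RAM-backed mount is quick; a
hedged sketch to be run on the compute node (the 512m size is only an example,
in line with the advice about a small tmpfs in the reply quoted below):

mount | grep tmpfs || echo "no tmpfs mounted"
# if none exists, a small RAM-backed /tmp can be created (as root)
mount -t tmpfs -o size=512m tmpfs /tmp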

 
Regards,
Mahmood




 From: Jeff Squyres 
To: Mahmood Naderan ; Open MPI Users  
Sent: Monday, October 29, 2012 1:28 PM
Subject: Re: [OMPI users] openmpi shared memory feature
 
Your original question stuck in my brain over the weekend, and I *think* you 
may have been asking a different question than I originally answered.  Even 
though you say we answered your question, I'm going to post my ruminations here 
anyway.  :-)

You might have been asking about how a shared memory *file* on a diskless 
machine -- where the majority of the filesystem is presumably on a network 
mount -- could be efficient.  

If you look at the shared memory as a "file" on a filesystem (particularly if 
it's a network filesystem), then you're right: all file reads and writes turn 
into network communications.  Therefore, communication through "files" would 
actually be quite inefficient: reads and writes to such files would be pumped 
through the network.

The reality is that shared memory "files" are special kinds of files.  They're 
just rendezvous points for multiple processes to find the shared memory.  Once 
a process mmaps a shared memory "file", then reads and writes to that file 
effectively don't actually go through the underlying filesystem anymore.  
Instead, they go directly to the shared memory (which is kinda the point).  

There are some corner cases where the contents of the shared memory can be 
written out to the filesystem (which, in the case of the network filesystem, 
would result in network communications to the file server), but Open MPI avoids 
those cases.

Hope that helps.




On Oct 27, 2012, at 2:17 PM, Mahmood Naderan wrote:

> Thanks all. It is now cleared.
> 
> Regards,
> Mahmood
> 
> From: Damien 
> To: Open MPI Users  
> Sent: Saturday, October 27, 2012 7:25 PM
> Subject: Re: [OMPI users] openmpi shared memory feature
> 
> Mahmood,
> 
> To build on what Jeff said, here's a short summary of how diskless clusters 
> work:
> 
> A diskless node gets its operating system through a physical network (say 
> gig-E), including the HPC applications and the MPI runtimes, from a master 
> server.  That master server isn't the MPI head node, it's a separate 
> OS/Network boot server.  That's completely separate from how the MPI 
> applications run.  The MPI-based HPC applications on the nodes communicate 
> through a dedicated, faster physical network (say Infiniband).  There's two 
> separate networks, one for starting and running nodes and one for doing HPC 
> work.  On the same node, MPI processes use shared-memory to communicate, 
> regardless of whether it's diskless or not, it's just part of MPI.  Between 
> nodes, MPI processes use that faster, dedicated network, and that's 
> regardless of whether it's diskless or not, it's just part of MPI. The 
> networks are separate because it's more efficient.
> 
> Damien
> 
> On 27/10/2012 11:00 AM, Jeff Squyres wrote:
> > On Oct 27, 2012, at 12:47 PM, Mahmood Naderan wrote:
> > 
> >>> Because communicating through shared memory when sending messages between 
> >>> processes on the same server is far faster than going through a network 
> >>> stack.
> >>  I see... But that is not good for diskless clusters. Am I right? assume 
> >>processes are on a node (which has no disk). In this case, their 
> >>communication go though network (from compu

Re: [OMPI users] openmpi shared memory feature

2012-10-27 Thread Mahmood Naderan
Thanks all. It is now clear.


Regards,
Mahmood




 From: Damien 
To: Open MPI Users  
Sent: Saturday, October 27, 2012 7:25 PM
Subject: Re: [OMPI users] openmpi shared memory feature
 
Mahmood,

To build on what Jeff said, here's a short summary of how diskless clusters 
work:

A diskless node gets its operating system through a physical network (say 
gig-E), including the HPC applications and the MPI runtimes, from a master 
server.  That master server isn't the MPI head node, it's a separate OS/Network 
boot server.  That's completely separate from how the MPI applications run.  
The MPI-based HPC applications on the nodes communicate through a dedicated, 
faster physical network (say Infiniband).  There's two separate networks, one 
for starting and running nodes and one for doing HPC work.  On the same node, 
MPI processes use shared-memory to communicate, regardless of whether it's 
diskless or not, it's just part of MPI.  Between nodes, MPI processes use that 
faster, dedicated network, and that's regardless of whether it's diskless or 
not, it's just part of MPI. The networks are separate because it's more 
efficient.

Damien

On 27/10/2012 11:00 AM, Jeff Squyres wrote:
> On Oct 27, 2012, at 12:47 PM, Mahmood Naderan wrote:
> 
>>> Because communicating through shared memory when sending messages between 
>>> processes on the same server is far faster than going through a network 
>>> stack.
>>   I see... But that is not good for diskless clusters. Am I right? assume 
>>processes are on a node (which has no disk). In this case, their 
>>communication go though network (from computing node to server) then IO and 
>>then network again (from server to computing node).
> I don't quite understand what you're saying -- what exactly is your 
> distinction between "server" and "computing node"?
> 
> For the purposes of my reply, I use the word "server" to mean "one 
> computational server, possibly containing multiple processors, a bunch of 
> RAM, and possibly one or more disks."  For example, a 1U "pizza box" style 
> rack enclosure containing the guts of a typical x86-based system.
> 
> You seem to be relating two orthogonal things: whether a server has a disk 
> and how MPI messages flow from one process to another.
> 
> When using shared memory, the message starts in one process, gets copied to 
> shared memory, then then gets copied to the other process.  If you use the 
> knem Linux kernel module, we can avoid shared memory in some cases and copy 
> the message directly from one process' memory to the other.
> 
> It's irrelevant as to whether there is a disk or not.
> 

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] openmpi shared memory feature

2012-10-27 Thread Mahmood Naderan

>Because communicating through shared memory when sending messages 
between processes on the same server is far faster than going through a 
network stack.
 
I see... But that is not good for diskless clusters. Am I right? Assume the
processes are on a node (which has no disk). In this case, their communication
goes through the network (from computing node to server), then IO, and then the
network again (from server to computing node).


Regards,
Mahmood




 From: Jeff Squyres 
To: Mahmood Naderan ; Open MPI Users  
Sent: Saturday, October 27, 2012 6:19 PM
Subject: Re: [OMPI users] openmpi shared memory feature
 
On Oct 27, 2012, at 10:49 AM, Mahmood Naderan wrote:

> Why openmpi uses shared memory model?

Because communicating through shared memory when sending messages between 
processes on the same server is far faster than going through a network stack.

> this can be disabled though by setting "--mca ^sm". 
> It seems that by default openmpi uses such feature (shared memory backing 
> files) which is strange.
>  
> Regards,
> Mahmood
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

Re: [OMPI users] open mpi 1.6 with intel compilers

2012-10-27 Thread Mahmood Naderan
oops...
Sorry about that.

 
Regards,
Mahmood




 From: Jeff Squyres 
To: Mahmood Naderan ; Open MPI Users  
Sent: Saturday, October 27, 2012 6:34 PM
Subject: Re: [OMPI users] open mpi 1.6 with intel compilers
 
I believe you're referring to a different thread on this mailing list:

    http://www.open-mpi.org/community/lists/users/2012/10/20552.php

I answered the question about shared memory in that thread, not this one (which 
is about a run-time error).


On Oct 27, 2012, at 12:24 PM, Mahmood Naderan wrote:

> 
> >This looks like you're trying to execute an MPICH2-build MPI executable, not 
> >Open MPI.
> No that was a general question. I mean message passing is a model of 
> communication
> versus shared memory programming. So what is the point when openmpi uses 
> shared
> memory model?
>  
> Regards,
> Mahmood
> 
> From: Jeff Squyres 
> To: Open MPI Users  
> Sent: Saturday, October 27, 2012 6:18 PM
> Subject: Re: [OMPI users] open mpi 1.6 with intel compilers
> 
> This looks like you're trying to execute an MPICH2-build MPI executable, not 
> Open MPI.
> 
> On Oct 27, 2012, at 11:46 AM, Giuseppe P. wrote:
> 
> > Hello!
> > 
> > I have built open mpi 1.6 with Intel compilers (2013 versions). Compilation 
> > was smooth, however even when I try to execute
> > the simple program hello.c:
> > 
> > mpirun -np 4 ./hello_c.x
> > [mpie...@claudio.ukzn] HYDU_create_process (./utils/launch/launch.c:102): 
> > execvp error on file 
> > /opt/intel/composer_xe_2013.0.079/mpirt/bin/intel64/pmi_proxy (No such  
> > file or directory)
> > [mpie...@claudio.ukzn] HYD_pmcd_pmiserv_proxy_init_cb 
> > (./pm/pmiserv/pmiserv_cb.c:1177): assert (!closed) failed
> > [mpie...@claudio.ukzn] HYDT_dmxu_poll_wait_for_event 
> > (./tools/demux/demux_poll.c:77): callback returned error status
> > [mpie...@claudio.ukzn] HYD_pmci_wait_for_completion 
> > (./pm/pmiserv/pmiserv_pmci.c:358): error waiting for event
> > [mpie...@claudio.ukzn] main (./ui/mpich/mpiexec.c:689): process manager 
> > error waiting for completion
> > 
> > Before that, there was an additional error, since also the file mpivars.sh 
> > was not present in /opt/intel/composer_xe_2013.0.079/mpirt/bin/intel64/.
> > Even though I managed to create one and it worked:
> > 
> > #!/bin/bash
> > 
> > if [ -z "`echo $PATH | grep /usr/local/bin`" ]; then
> > export PATH=/usr/local/bin:$PATH
> > fi
> > 
> > if [ -z "`echo $LD_LIBRARY_PATH | grep /usr/local/lib`" ]; then
> > if [ -n "$LD_LIBRARY_PATH" ]; then
> > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
> > else
> > export LD_LIBRARY_PATH=/usr/local/lib
> > fi
> > fi
> > 
> > I do not have any clue about how to generate the file pmi_proxy.
> > 
> > Thank you in advance for your help!
> > 
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

Re: [OMPI users] open mpi 1.6 with intel compilers

2012-10-27 Thread Mahmood Naderan

>This looks like you're trying to execute an MPICH2-build MPI executable, not 
>Open MPI.
No, that was a general question. I mean, message passing is a model of
communication, as opposed to shared memory programming. So what is the point of
openmpi using a shared memory model?

 
Regards,
Mahmood




 From: Jeff Squyres 
To: Open MPI Users  
Sent: Saturday, October 27, 2012 6:18 PM
Subject: Re: [OMPI users] open mpi 1.6 with intel compilers
 
This looks like you're trying to execute an MPICH2-build MPI executable, not 
Open MPI.

On Oct 27, 2012, at 11:46 AM, Giuseppe P. wrote:

> Hello!
> 
> I have built open mpi 1.6 with Intel compilers (2013 versions). Compilation 
> was smooth, however even when I try to execute
> the simple program hello.c:
> 
> mpirun -np 4 ./hello_c.x
> [mpie...@claudio.ukzn] HYDU_create_process (./utils/launch/launch.c:102): 
> execvp error on file 
> /opt/intel/composer_xe_2013.0.079/mpirt/bin/intel64/pmi_proxy (No such  file 
> or directory)
> [mpie...@claudio.ukzn] HYD_pmcd_pmiserv_proxy_init_cb 
> (./pm/pmiserv/pmiserv_cb.c:1177): assert (!closed) failed
> [mpie...@claudio.ukzn] HYDT_dmxu_poll_wait_for_event 
> (./tools/demux/demux_poll.c:77): callback returned error status
> [mpie...@claudio.ukzn] HYD_pmci_wait_for_completion 
> (./pm/pmiserv/pmiserv_pmci.c:358): error waiting for event
> [mpie...@claudio.ukzn] main (./ui/mpich/mpiexec.c:689): process manager error 
> waiting for completion
> 
> Before that, there was an additional error, since also the file mpivars.sh 
> was not present in /opt/intel/composer_xe_2013.0.079/mpirt/bin/intel64/.
> Even though I managed to create one and it worked:
> 
> #!/bin/bash
> 
> if [ -z "`echo $PATH | grep /usr/local/bin`" ]; then
> export PATH=/usr/local/bin:$PATH
> fi
> 
> if [ -z "`echo $LD_LIBRARY_PATH | grep /usr/local/lib`" ]; then
> if [ -n "$LD_LIBRARY_PATH" ]; then
> export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
> else
> export LD_LIBRARY_PATH=/usr/local/lib
> fi
> fi
> 
> I do not have any clue about how to generate the file pmi_proxy.
> 
> Thank you in advance for your help!
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

[OMPI users] openmpi shared memory feature

2012-10-27 Thread Mahmood Naderan


Dear all,
Why does openmpi use a shared memory model? This can be disabled by setting
"--mca btl ^sm".
It seems that by default openmpi uses this feature (shared memory backing
files), which is strange.

 
Regards,
Mahmood


Re: [OMPI users] running openmpi in debug/verbose mode

2012-10-26 Thread Mahmood Naderan

>You can usually resolve that by configuring with --disable-dlopen
OK, I will try.
So what is the purpose of enabling dlopen? Why is dlopen not disabled by
default?
I mean, why is the high-traffic configuration enabled by default?
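For reference, disabling it is just a configure-time switch; a hedged sketch of
a rebuild (the prefix here is only an example, not the one actually used on
this cluster):

./configure --prefix=/export/apps/openmpi --disable-dlopen
make
make install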


 
Regards,
Mahmood




 From: Ralph Castain 
To: Mahmood Naderan  
Sent: Thursday, October 25, 2012 8:55 PM
Subject: Re: [OMPI users] running openmpi in debug/verbose mode
 

Sorry - we're all a tad busy with deadlines for the Supercomputing conference 
:-(

You are probably running into trouble due to dlopen pulling files across the 
network. You can usually resolve that by configuring with --disable-dlopen.



On Oct 25, 2012, at 11:51 AM, Mahmood Naderan  wrote:

> I sent a problem to the list but didn't receive any reply. In short, we found
> that when we run openmpi+openfoam program on a node (in a diskless cluster),
> there is a huge IO operations caused by openmpi. When we run openmpi+openfoam
> on the server, there is no problem. When we run openfoam directly on the node,
> there is also no problem.
>
> Now I am looking for some verbose/debug outputs from openmpi which
> shows the activity of it (in particular IO messages for example opening file1
> or closing file2...).
>
> Can I extract such messages?
>
> Regards,
> Mahmood
>
>
>
>
>
> From: Ralph Castain 
>To: Mahmood Naderan ; Open MPI Users 
> 
>Sent: Thursday, October 25, 2012 8:44 PM
>Subject: Re: [OMPI users] running openmpi in debug/verbose mode
> 
>
>There is a *ton* of debug output available - would help to know what you are 
>attempting to debug
>
>
>
>
>On Oct 25, 2012, at 11:38 AM, Mahmood Naderan  wrote:
>
>
>>
>>Dear all,
>>Is there any way to run openmpi in debug or verbose mode? Is there any log 
>>for openmpi run?
>> 
>>Regards,
>>Mahmood
>>___
>>users mailing list
>>us...@open-mpi.org
>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>

Re: [OMPI users] running openmpi in debug/verbose mode

2012-10-25 Thread Mahmood Naderan
I sent a problem to the list but didn't receive any reply. In short, we found
that when we run an openmpi+openfoam program on a node (in a diskless cluster),
there are huge IO operations caused by openmpi. When we run openmpi+openfoam
on the server, there is no problem. When we run openfoam directly on the node,
there is also no problem.

Now I am looking for some verbose/debug output from openmpi which
shows its activity (in particular IO messages, for example opening file1
or closing file2...).

Can I extract such messages?
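Open MPI's verbosity is mostly driven by MCA parameters and mpirun switches; a
hedged sketch of the kind of knobs that can be turned up (./solver stands in
for the openfoam binary, and which parameters are most useful depends on the
release):

mpirun --mca btl_base_verbose 100 --mca mpi_show_mca_params all -np 2 ./solver
# keep a log of what the daemons themselves are doing
mpirun --debug-daemons -np 2 ./solver 2>&1 | tee mpirun.log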

 
Regards,
Mahmood




 From: Ralph Castain 
To: Mahmood Naderan ; Open MPI Users  
Sent: Thursday, October 25, 2012 8:44 PM
Subject: Re: [OMPI users] running openmpi in debug/verbose mode
 

There is a *ton* of debug output available - would help to know what you are 
attempting to debug



On Oct 25, 2012, at 11:38 AM, Mahmood Naderan  wrote:


>
>Dear all,
>Is there any way to run openmpi in debug or verbose mode? Is there any log for 
>openmpi run?
> 
>Regards,
>Mahmood
>___
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users

[OMPI users] running openmpi in debug/verbose mode

2012-10-25 Thread Mahmood Naderan


Dear all,
Is there any way to run openmpi in debug or verbose mode? Is there any log for
an openmpi run?

 
Regards,
Mahmood


[OMPI users] Low cpu utilization due to high IO operations of openmpi

2012-10-21 Thread Mahmood Naderan
Dear all,
We have a diskless cluster with these specs:
1) A server which has some disks. Root directories (/usr, /lib, ...) are
on /dev/sda while /home is on /dev/sdb; these are two physical hard drives.
2) Some compute nodes. These don't have any disk drive; instead they
are connected through a 10/100/1000 switch to the server.
3) Nodes use an NFS directory for booting which resides on /dev/sda.

In our cluster we use openmpi with openfoam. Both were compiled using
default options. The problem is that when the openfoam solver with openmpi is
sent to a compute node, openmpi issues a lot of *WRITE* operations,
which causes low cpu utilization, and hence the processes are mainly
in the 'D' state. A brief description of our tests:

We ssh to the compute node and run the application there.

Test 1) One process of openfoam is launched without openmpi. Everything
is fine and the cpu is utilized 100%.
Test 2) Two processes of openfoam are launched (mpirun -np 2 ). The two
openfoam processes have about 30% cpu utilization and they are in the 'D'
state most of the time.

Is there any suggestion on that? That is really poor performance.
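To pin down where the writes go, it may help to watch the MPI processes on the
compute node and look at Open MPI's session directory; a hedged sketch (<pid>
is a placeholder for one of the openfoam ranks, and the openmpi-sessions-* name
is the usual 1.x session directory under /tmp):

# trace file-related syscalls of one running rank
strace -f -e trace=open,write -p <pid>
# the shared memory backing files normally live under the session directory
ls -lR /tmp/openmpi-sessions-*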


Regards,
Mahmood