[OMPI users] Preloading the libraries "--preload-files" Effect

2020-04-21 Thread Kihang Youn via users

Hi, everyone.

I'd like to know about the Open MPI mpirun runtime option "--preload-files".
Does this option put the libraries in cache (or RAM?) at startup, so that
multiple calls benefit?

Our team is looking for a way to load the libraries used by our programs
into the cache with "vmtouch" before running the model, and we would like to
know whether the same thing can be achieved with the "--preload-files"
option.
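
To make the question concrete, here is roughly what we do today with
vmtouch, next to how I understand the mpirun option would be used (the
library path and program name below are placeholders, not our real setup):

# current approach: warm the page cache for a shared library
# (vmtouch -t touches every page of the file so it becomes resident in RAM)
vmtouch -t /opt/app/lib/libsolver.so

# candidate approach: hand the same file to mpirun's preload option
# (whether this also keeps the file in the page cache is exactly what
# I am asking about)
mpirun --preload-files /opt/app/lib/libsolver.so -np 64 ./model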

The second question may be a lazy one.
Are there any tuning options that can improve parallel performance with
Open MPI?
I'm really only familiar with basic MPI usage because I'm a newbie, and I
don't know much about performance tuning, but I'm going to put together a
list of recommended performance-related Open MPI options and test them, so
I'd like to know whether such a list already exists (a PDF, a URL, or
something similar).
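
For example, my current understanding (please correct me if this is wrong)
is that most tuning goes through MCA parameters, along these lines (the
interface name below is just a placeholder):

# list all available MCA parameters and their current values
ompi_info --all

# set a parameter on the command line, e.g. restrict the TCP BTL
# to a specific network interface
mpirun --mca btl_tcp_if_include eth0 -np 16 ./my_app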

Thank you.



Kihang Youn(윤기항) - Application Analyst | Lenovo DCG Professional Services
Mobile: +82-10-9374-9396
E-mail: ky...@lenovo.com


Re: [OMPI users] OMPI v2.1.5 with Slurm

2020-04-21 Thread Gilles Gouaillardet via users
Levi,

As a workaround, have you tried using mpirun instead of direct launch (e.g.
srun)?
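
For example (untested, from inside your Slurm allocation):

# let Slurm allocate the nodes, then let mpirun do the launch itself;
# mpirun should detect the Slurm allocation automatically
salloc -N 2 -n 2
mpirun -np 2 mpi_program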


Note you are using PMIx 1.2.5, so you likely want to use srun --mpi=pmix_v1

Also, as reported by the logs:

   [nodeA:12838] OPAL ERROR: Error in file pmix3x_client.c at line 112


there is something fishy here, since there is no pmix3x_client.c in Open
MPI 2.x (there is pmix1_client.c, though)
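
You can also double check which PMIx glue your install actually ships by
listing the component plugins (using the prefix from your configure line;
I would expect a pmix1-era component there, not pmix3x):

# list the PMIx components installed with this Open MPI build
ls /data_storage/cluster_software/mpis/openmpi-2.1.3/lib/openmpi | grep -i pmix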

Cheers,

Gilles

On Wed, Apr 22, 2020 at 7:53 AM Levi D Davis via users <
users@lists.open-mpi.org> wrote:

> Hello,
>
> I have been having a heck of a time getting an old version of OMPI (v2.1)
> to work with Slurm (v19). I need it for an application that I cannot
> recompile against newer OMPI versions.
>
> I think it is an OMPI or PMIx build issue, because the newer versions
> (OMPI v3 and v4, PMIx v2 and v3) work with Slurm as expected from the
> docs. I am not sure whether there is a parameter I need to set for an
> older OMPI to make it work, since it is hard to find details about old
> versions and their issues.
>
> Here are some commands:
>
> srun --mpi=list
> srun: MPI types are...
> srun: pmix_v3
> srun: none
> srun: openmpi
> srun: pmi2
> srun: pmix
> srun: pmix_v1
> srun: pmix_v2
>
> srun --mpi=pmix_v2 -N 2 -n 2 mpi_program
> Hello World from Node headnode, job 0
> Hello World from Node nodeA, job 1
>
>
> srun --mpi=pmix_v1 -N 2 -n 2 mpi_program
> [headnode:17246] PMIX ERROR: UNPACK-PA
> (full output: https://pastebin.com/5P8eJ7hb)
>
> I have OMPI and PMIx installed as environment modules, which I load at
> runtime for the different Slurm runs (each with a different PMIx).
>
> Configure command for OMPI v2
>
> ./configure --prefix=/data_storage/cluster_software/mpis/openmpi-2.1.3
> --with-libevent=external --with-hwloc=external --with-ucx=/usr
> --with-pmix=/data_storage/cluster_software/mpis/pmix-1.2.5 --with-slurm
> --disable-pmix-dstore
>
> Configure command for PMIX v1
> ./configure --prefix=/data_storage/cluster_software/mpis/pmix-1.2.5
>
>
>
> Anyone have an idea?
>
>


[OMPI users] OMPI v2.1.5 with Slurm

2020-04-21 Thread Levi D Davis via users
Hello,

I have been having a heck of a time getting an old version of OMPI (v2.1) to
work with Slurm (v19). I need it for an application that I cannot recompile
against newer OMPI versions.

I think it is an OMPI or PMIx build issue, because the newer versions (OMPI
v3 and v4, PMIx v2 and v3) work with Slurm as expected from the docs. I am
not sure whether there is a parameter I need to set for an older OMPI to
make it work, since it is hard to find details about old versions and their
issues.

Here are some commands:

srun --mpi=list
srun: MPI types are...
srun: pmix_v3
srun: none
srun: openmpi
srun: pmi2
srun: pmix
srun: pmix_v1
srun: pmix_v2

srun --mpi=pmix_v2 -N 2 -n 2 mpi_program
Hello World from Node headnode, job 0
Hello World from Node nodeA, job 1


srun --mpi=pmix_v1 -N 2 -n 2 mpi_program
[headnode:17246] PMIX ERROR: UNPACK-PA
(full output: https://pastebin.com/5P8eJ7hb)

I have OMPI and PMIx installed as environment modules, which I load at
runtime for the different Slurm runs (each with a different PMIx).
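
For reference, the module loads before an OMPI v2 run look something like
this (the module names are placeholders for our local tree):

# load the matching PMIx first, then the Open MPI built against it
module load pmix/1.2.5
module load openmpi/2.1.3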

Configure command for OMPI v2

./configure --prefix=/data_storage/cluster_software/mpis/openmpi-2.1.3 
--with-libevent=external --with-hwloc=external --with-ucx=/usr 
--with-pmix=/data_storage/cluster_software/mpis/pmix-1.2.5 --with-slurm 
--disable-pmix-dstore

Configure command for PMIX v1
./configure --prefix=/data_storage/cluster_software/mpis/pmix-1.2.5
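
One check I can share, in case it is useful: asking the installation itself
which PMIx support it was built with.

# show the PMIx components compiled into this Open MPI install
ompi_info | grep -i pmix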



Anyone have an idea?



[OMPI users] opal_path_nfs freeze

2020-04-21 Thread Patrick Bégou via users
Hi OpenMPI maintainers,


I have temporary access to servers with AMD EPYC processors running RHEL7.

I'm trying to deploy Open MPI with several setups, but each time "make
check" fails on opal_path_nfs. The test freezes forever while consuming no
CPU resources.

After nearly one hour I killed the process.

In test-suite.log I have:


   Open MPI v3.1.x-201810100324-c8e9819: test/util/test-suite.log


# TOTAL: 3
# PASS:  2
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: opal_path_nfs
===

FAIL opal_path_nfs (exit status: 137)


In opal_path_nfs.out I have a list of paths:

/proc proc
/sys sysfs
/dev devtmpfs
/run tmpfs
/ xfs
/sys/kernel/security securityfs
/dev/shm tmpfs
/dev/pts devpts
/sys/fs/cgroup tmpfs
/sys/fs/cgroup/systemd cgroup
/sys/fs/pstore pstore
/sys/firmware/efi/efivars efivarfs
/sys/fs/cgroup/hugetlb cgroup
/sys/fs/cgroup/pids cgroup
/sys/fs/cgroup/net_cls,net_prio cgroup
/sys/fs/cgroup/devices cgroup
/sys/fs/cgroup/cpu,cpuacct cgroup
/sys/fs/cgroup/freezer cgroup
/sys/fs/cgroup/perf_event cgroup
/sys/fs/cgroup/cpuset cgroup
/sys/fs/cgroup/memory cgroup
/sys/fs/cgroup/blkio cgroup
/proc/sys/fs/binfmt_misc autofs
/sys/kernel/debug debugfs
/dev/hugepages hugetlbfs
/dev/mqueue mqueue
/sys/kernel/config configfs
/proc/sys/fs/binfmt_misc binfmt_misc
/boot/efi vfat
/local xfs
/var xfs
/tmp xfs
/var/lib/nfs/rpc_pipefs rpc_pipefs
/home nfs
/cm/shared nfs
/scratch nfs
/run/user/1013 tmpfs
/run/user/1010 tmpfs
/run/user/1046 tmpfs
/run/user/1015 tmpfs
/run/user/1121 tmpfs
/run/user/1113 tmpfs
/run/user/1126 tmpfs
/run/user/1002 tmpfs
/run/user/1130 tmpfs
/run/user/1004 tmpfs

In opal_path_nfs.log:

FAIL opal_path_nfs (exit status: 137)


The compiler is GCC 9.2.

I've also tested openmpi-4.0.3 built with GCC 8.2: same problem.
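
What I plan to try next, assuming the test walks the mount table and stats
every mounted filesystem (I have not verified this in the source), is
probing each mount point with a timeout to find one that stalls:

# probe every mount point; a stalled (e.g. NFS or autofs) mount should
# show up as a timeout here (the 5 s limit is arbitrary)
while read -r dev mnt rest; do
  timeout 5 stat -f "$mnt" > /dev/null 2>&1 || echo "suspect: $mnt"
done < /proc/mounts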

Thanks for your help.

Patrick