HCOLL 3.8.1649 is an ancient version. Please upgrade to the latest version,
which you can do by installing HPC-X:
https://www.mellanox.com/products/hpc-x-toolkit
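
To repeat the comparison after upgrading, both settings can be run back to
back in the same allocation so the two timings are directly comparable. The
following is only a sketch: the rank count and binary name are taken from the
job script quoted below, while the NP, BIN and DRY_RUN variables and the
hcoll_cmd helper are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Sketch: time the same binary with hcoll disabled (0) and enabled (1).
# NP and BIN default to the values used in the quoted job script.
# DRY_RUN is a hypothetical switch: 1 only prints the commands, 0 runs them.
NP="${NP:-51}"
BIN="${BIN:-./test_t}"
DRY_RUN="${DRY_RUN:-1}"

# Print the mpirun command line for a given coll_hcoll_enable value.
hcoll_cmd() {
    echo "mpirun --mca coll_hcoll_enable $1 -np $NP $BIN"
}

for enable in 0 1; do
    cmd="$(hcoll_cmd "$enable")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $cmd"
    else
        echo "== coll_hcoll_enable=$enable =="
        # Unquoted on purpose so the string splits into command + arguments.
        time $cmd
    fi
done
```

With DRY_RUN left at its default the script only prints the two command
lines; setting DRY_RUN=0 inside the Slurm job runs and times them.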

Josh

On Wed, Feb 5, 2020 at 4:35 AM Angel de Vicente <angel.de.vice...@iac.es>
wrote:

> Hi,
>
> Joshua Ladd <jladd.m...@gmail.com> writes:
>
> > We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it
> > takes exactly the same 19 secs (80 ranks).
> >
> > What version of HCOLL are you using? Command line?
>
> Thanks for having a look at this.
>
> According to ompi_info, our OpenMPI (version 3.0.1, with gcc version 7.2.0)
> was configured with:
>
> ,----
> |   Configure command line: 'CFLAGS=-I/apps/OPENMPI/SRC/PMI/include'
> |                           '--prefix=/storage/apps/OPENMPI/3.0.1/gnu'
> |                           '--with-mxm=/opt/mellanox/mxm'
> |                           '--with-hcoll=/opt/mellanox/hcoll'
> |                           '--with-knem=/opt/knem-1.1.2.90mlnx2'
> |                           '--with-slurm' '--with-pmi=/usr'
> |                           '--with-pmi-libdir=/usr/lib64'
> |                           '--with-platform=../contrib/platform/mellanox/optimized'
> `----
>
> Not sure if there is a better way to find out the HCOLL version, but the
> file hcoll_version.h in /opt/mellanox/hcoll/include/hcoll/api/ says we
> have version 3.8.1649
>
> Code compiled as:
>
> ,----
> | $ mpicc -o test_t thread_io.c test.c
> `----
>
> To run the tests, I just submit the job to Slurm with the following
> script (changing the coll_hcoll_enable param accordingly):
>
> ,----
> | #!/bin/bash
> | #
> | #SBATCH -J test
> | #SBATCH -N 5
> | #SBATCH -n 51
> | #SBATCH -t 00:07:00
> | #SBATCH -o test-%j.out
> | #SBATCH -e test-%j.err
> | #SBATCH -D .
> |
> | module purge
> | module load openmpi/gnu/3.0.1
> |
> | time mpirun --mca coll_hcoll_enable 1 -np 51 ./test_t
> `----
>
> In the latest test I managed to squeeze into our queuing system, the
> hcoll-disabled run took ~3.5s and the hcoll-enabled one ~43.5s (for this
> test I commented out all the fprintf statements, just in case, so the
> code was pure communication).
>
> Thanks,
> --
> Ángel de Vicente
>
> Tel.: +34 922 605 747
> Web.: http://research.iac.es/proyecto/polmag/
>
