Hi Brian,

As a sanity check, can you see if the ob1 pml works okay, i.e.

 mpirun -n 2 --mca pml ob1 --mca btl self,vader,openib ./osu_latency

Howard


2018-02-07 11:03 GMT-07:00 brian larkins <brianlark...@gmail.com>:

> Hello,
>
> I’m doing some work with Portals4 and am trying to run some MPI programs
> using Portals 4 as the transport layer. I’m running into problems and
> am hoping that someone can help me figure out how to get things working.
> I’m using OpenMPI 3.0.0 with the following configuration:
>
> ./configure CFLAGS=-pipe --prefix=path/to/install --enable-picky
> --enable-debug --enable-mpi-fortran --with-portals4=path/to/portals4
> --disable-oshmem --disable-vt --disable-java --disable-mpi-io
> --disable-io-romio --disable-libompitrace --disable-btl-portals4-flow-control
> --disable-mtl-portals4-flow-control
>
> I have also tried the head of the git repo and 2.1.2, with the same
> results. A simpler configure line (with only --prefix and
> --with-portals4=) also gets the same results.
>
> Portals4 configuration is from github master and configured thus:
>
> ./configure --prefix=path/to/portals4 --with-ev=path/to/libev
> --enable-transport-ib --enable-fast --enable-zero-mrs --enable-me-triggered
>
> If I specify the cm pml on the command-line, I can get examples/hello_c to
> run correctly. Trying to get some latency numbers using the OSU benchmarks
> is where my trouble begins:
>
> $ mpirun -n 2 --mca mtl portals4 --mca pml cm env PTL_DISABLE_MEM_REG_CACHE=1 ./osu_latency
> NOTE: Ummunotify and IB registered mem cache disabled, set
> PTL_DISABLE_MEM_REG_CACHE=0 to re-enable.
> NOTE: Ummunotify and IB registered mem cache disabled, set
> PTL_DISABLE_MEM_REG_CACHE=0 to re-enable.
> # OSU MPI Latency Test
> # Size            Latency (us)
> 0                        25.96
> [node41:19740] *** An error occurred in MPI_Barrier
> [node41:19740] *** reported by process [139815819542529,4294967297]
> [node41:19740] *** on communicator MPI_COMM_WORLD
> [node41:19740] *** MPI_ERR_OTHER: known error not in list
> [node41:19740] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [node41:19740] ***    and potentially your MPI job)
>
> Not specifying cm (so the pml defaults to ob1) produces an earlier
> segfault, which looks like a progress thread initialization problem.
> Setting PTL_IGNORE_UMMUNOTIFY=1 gets this far:
>
> $ mpirun --mca pml cm -n 2 env PTL_IGNORE_UMMUNOTIFY=1 ./osu_latency
> # OSU MPI Latency Test
> # Size            Latency (us)
> 0                        24.14
> 1                        26.24
> [node41:19993] *** Process received signal ***
> [node41:19993] Signal: Segmentation fault (11)
> [node41:19993] Signal code: Address not mapped (1)
> [node41:19993] Failing at address: 0x141
> [node41:19993] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7fa6ac73b710]
> [node41:19993] [ 1] /ascldap/users/dblarki/opt/portals4.master/lib/libportals.so.4(+0xcd65)[0x7fa69b770d65]
> [node41:19993] [ 2] /ascldap/users/dblarki/opt/portals4.master/lib/libportals.so.4(PtlPut+0x143)[0x7fa69b773fb3]
> [node41:19993] [ 3] /ascldap/users/dblarki/opt/ompi/lib/openmpi/mca_mtl_portals4.so(+0xa961)[0x7fa698cf5961]
> [node41:19993] [ 4] /ascldap/users/dblarki/opt/ompi/lib/openmpi/mca_mtl_portals4.so(+0xb0e5)[0x7fa698cf60e5]
> [node41:19993] [ 5] /ascldap/users/dblarki/opt/ompi/lib/openmpi/mca_mtl_portals4.so(ompi_mtl_portals4_send+0x90)[0x7fa698cf61d1]
> [node41:19993] [ 6] /ascldap/users/dblarki/opt/ompi/lib/openmpi/mca_pml_cm.so(+0x5430)[0x7fa69a794430]
> [node41:19993] [ 7] /ascldap/users/dblarki/opt/ompi/lib/libmpi.so.40(PMPI_Send+0x2b4)[0x7fa6ac9ff018]
> [node41:19993] [ 8] ./osu_latency[0x40106f]
> [node41:19993] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fa6ac3b6d5d]
> [node41:19993] [10] ./osu_latency[0x400c59]
>
> This cluster is running RHEL 6.5 without the ummunotify module, but I get
> the same results on a small local cluster running Ubuntu 16.04 with
> ummunotify loaded.
>
> Any help would be much appreciated.
> thanks,
>
> brian.
>
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
>
