Re: [OMPI users] A strange warning on Cray XC with Open MPI 4.0.0

2019-01-07 Thread Salim Jamal-Eddine
Hi Udayanga,

I had the same issue. The default behavior of Open MPI 4.0.0 is to use UCX;
add "--mca btl_openib_allow_ib 1" and everything should be fine.
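A minimal sketch of two ways to apply the suggested parameter, assuming a POSIX shell and the Open MPI 4.0.0 mpirun on the PATH; `-n 4 ./a.out` is only a placeholder job. Besides `--mca` on the command line, Open MPI reads MCA parameters from environment variables named `OMPI_MCA_<parameter name>`, which is handy in batch job scripts.

```shell
#!/bin/sh
# Option 1 (one run): pass the parameter on the mpirun command line.
# Use two ASCII hyphens; an en dash pasted from a mail client is not
# recognized as an option.
#   mpirun --mca btl_openib_allow_ib 1 -n 4 ./a.out
#
# Option 2 (every mpirun started from this shell or job script):
# Open MPI picks up MCA parameters from OMPI_MCA_<parameter name>.
export OMPI_MCA_btl_openib_allow_ib=1

# List the MCA parameters currently set through the environment.
env | grep '^OMPI_MCA_'
```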

Regards,


Salim Jamal-Eddine
Lead Engineering Labs Supervisor
Industrial & Mechanical Engineering Department, Byblos Campus
School of Engineering
Office: +961 1 786456 ext. 2899
Lebanese American University
Beirut | Byblos | New York

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] A strange warning on Cray XC with Open MPI 4.0.0

2019-01-07 Thread Udayanga Wickramasinghe
Hi Salim,
Thank you. Yes, I noticed the warnings vanish when I turn on
btl_openib_allow_ib, but since this is quite annoying, I am wondering whether
there is any other way to suppress this through configuration.

Best,
Udayanga


Re: [OMPI users] A strange warning on Cray XC with Open MPI 4.0.0

2019-01-07 Thread Bennet Fauber
It used to be that you could put default MCA settings in
OMPI_ROOT/etc/openmpi-mca-params.conf:

btl_openib_allow_ib=1

You could try that.
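A minimal sketch of making the setting persistent, assuming a POSIX shell. `set_mca_default` is a hypothetical helper, not part of Open MPI: it appends the parameter to an MCA parameter file only if that parameter is not already set there. Besides the installation-wide `<prefix>/etc/openmpi-mca-params.conf`, Open MPI also reads a per-user `$HOME/.openmpi/mca-params.conf`, which avoids needing write access to the install tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): append "NAME = VALUE" to an
# MCA parameter file unless NAME is already set there, creating the file
# and its directory if needed.
# Usage: set_mca_default FILE NAME VALUE
set_mca_default() {
    file=$1 name=$2 value=$3
    mkdir -p "$(dirname "$file")" || return 1
    if ! grep -qs "^[[:space:]]*$name[[:space:]]*=" "$file"; then
        printf '%s = %s\n' "$name" "$value" >> "$file"
    fi
}

# Per-user default, no root access needed:
#   set_mca_default "$HOME/.openmpi/mca-params.conf" btl_openib_allow_ib 1
#
# Installation-wide default (needs write access to the install prefix;
# /path/to/openmpi is a placeholder for the configure --prefix):
#   set_mca_default /path/to/openmpi/etc/openmpi-mca-params.conf btl_openib_allow_ib 1
```

Calling the helper twice leaves a single entry, so it is safe to run from a login or setup script.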

-- bennet

[OMPI users] A strange warning on Cray XC with Open MPI 4.0.0

2019-01-07 Thread Udayanga Wickramasinghe
Hi,

I upgraded my Open MPI version to 4.0.0 on a Cray Aries cluster (GNI/uGNI).
Every time I run an MPI job, I get the following warning. Is there any way to
suppress this message? I do not see it with Open MPI 3.1.3.
Why is an OpenFabrics device being initialized when the Cray GNI transport is
actively used? It looks related to UCX, but I suspect this could be a
configuration issue, even though the GNI transport/BTL seems to be detected
correctly and works by default without any explicit --mca parameters.

By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host:  nid00301
  Local adapter:   ibgni
  Local port:  1

--
--
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   nid00300
  Local device: ibgni

... [program output]
... [program output]
...

[login1:02032] 3 more processes have sent help message help-mpi-btl-openib.txt / ib port not selected
[login1:02032] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[login1:02032] 3 more processes have sent help message help-mpi-btl-openib.txt / error in device init


Thanks,
Udayanga

Re: [OMPI users] A strange warning on Cray XC with Open MPI 4.0.0

2019-01-07 Thread Udayanga Wickramasinghe
Thanks, that actually worked!

Best,
Udayanga

[OMPI users] Got warning and error messages when running Open MPI v4.0.0

2019-01-07 Thread Jing Gong
Hi,


When we ran Open MPI v4.0.0 on a cluster with InfiniBand, we got the following
warning and error messages. Older versions (< 3.x) work fine on this cluster.

$ mpirun -n 4 ./a.out

--
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host:  t02n34
  Local adapter:   mlx5_0
  Local port:  1

--
--
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   t02n34
  Local device: mlx5_0
--
libibcm: couldn't read ABI version
[1546869563.579350] [t02n34:28160:0]   cm_iface.c:309  UCX  ERROR ib_cm_open_device() failed: No such file or directory. Check if ib_ucm.ko module is loaded.
libibcm: couldn't read ABI version
[1546869563.580315] [t02n34:28159:0]   cm_iface.c:309  UCX  ERROR ib_cm_open_device() failed: No such file or directory. Check if ib_ucm.ko module is loaded.
libibcm: couldn't read ABI version
[1546869563.580620] [t02n34:28161:0]   cm_iface.c:309  UCX  ERROR ib_cm_open_device() failed: No such file or directory. Check if ib_ucm.ko module is loaded.
libibcm: couldn't read ABI version
[1546869563.581113] [t02n34:28158:0]   cm_iface.c:309  UCX  ERROR ib_cm_open_device() failed: No such file or directory. Check if ib_ucm.ko module is loaded.
[t02n34:28159] ../../../../../openmpi-4.0.0/ompi/mca/pml/ucx/pml_ucx.c:212 Error: Failed to create UCP worker
[t02n34:28160] ../../../../../openmpi-4.0.0/ompi/mca/pml/ucx/pml_ucx.c:212 Error: Failed to create UCP worker
[t02n34:28158] ../../../../../openmpi-4.0.0/ompi/mca/pml/ucx/pml_ucx.c:212 Error: Failed to create UCP worker
[t02n34:28161] ../../../../../openmpi-4.0.0/ompi/mca/pml/ucx/pml_ucx.c:212 Error: Failed to create UCP worker
Hello world from processor t02n34, rank 3 out of 4 processors
Hello world from processor t02n34, rank 0 out of 4 processors
Hello world from processor t02n34, rank 2 out of 4 processors
Hello world from processor t02n34, rank 1 out of 4 processors
[t02n34:28151] 3 more processes have sent help message help-mpi-btl-openib.txt / ib port not selected
[t02n34:28151] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[t02n34:28151] 3 more processes have sent help message help-mpi-btl-openib.txt / error in device init

If we set the MCA parameter btl_openib_allow_ib=1, we get different errors:

t02n34$ mpirun -n 4 --mca btl_openib_allow_ib 1 ./a.out
[t02n34:28232:0:28232] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7fef6749e7e0)
[t02n34:28234:0:28234] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7fc2e8f4d7e0)
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
[t02n34:28233:0:28233] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f981ee0e7e0)
[t02n34:28235:0:28235] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7fdc778c07e0)
--
mpirun noticed that process rank 2 with PID 0 on node t02n34 exited on signal 11 (Segmentation fault).
--

The configure flags used to build this version were:

$ ../openmpi-4.0.0/configure --prefix=/vol/openmpi/4.0.0/ \
    --with-ucx=/vol/openmpi/4.0.0/ucx/1.4.0

(We also tried --without-verbs but got the same errors.)



Thanks a lot.


Regards, Jing
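The UCX errors in the first run ask to check whether the ib_ucm.ko module is loaded. A minimal sketch of that check, assuming Linux, where loaded modules are listed in `/proc/modules` (the same data `lsmod` formats). `module_loaded` is a hypothetical helper; its optional second argument exists only so it can be exercised against a sample file.

```shell
#!/bin/sh
# Succeed if the named kernel module is listed in /proc/modules, or in the
# file given as the optional second argument. Each line there starts with
# the module name followed by a space, so "^NAME " avoids prefix matches.
module_loaded() {
    grep -qs "^$1 " "${2:-/proc/modules}"
}

if module_loaded ib_ucm; then
    echo "ib_ucm is loaded"
else
    # Loading it usually needs root (e.g. "modprobe ib_ucm") and only works
    # if the installed kernel or driver stack actually ships this module.
    echo "ib_ucm is not loaded"
fi
```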
