Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Gilles Gouaillardet
Ralph,

IIRC there is load balancing across all the BTLs, for example
between vader and scif.
So load balancing between ib0 and eoib0 is just a particular case that might
not necessarily be handled by the tcp BTL itself.

Cheers,

Gilles

Ralph Castain  wrote:
>OMPI discovers all active interfaces and automatically considers them 
>available for its use unless instructed otherwise via the params. I’d have to 
>look at the TCP BTL code to see the loadbalancing algo - I thought we didn’t 
>have that “on” by default across BTLs, but I don’t know if the TCP one 
>automatically uses all available Ethernet interfaces by default. Sounds like 
>it must.
>
>
>> On Nov 7, 2014, at 11:53 AM, Brock Palen  wrote:
>> 
>> I was doing a test on our IB-based cluster, where I was disabling IB
>> 
>> --mca btl ^openib --mca mtl ^mxm
>> 
>> I was sending very large messages (>1 GB) and I was surprised by the speed.
>> 
>> I then looked at our ethernet interfaces:
>> 
>> eth0  (1gig-e)
>> ib0  (ip over ib, for lustre configuration at vendor request)
>> eoib0  (Ethernet over IB interface for an IB -> Ethernet gateway, for some
>> external storage support at >1Gig speed)
>> 
>> and saw that all three were getting traffic.
>> 
>> We use Torque as our resource manager, with TM support; the hostnames
>> given by Torque match the eth0 interfaces.
>> 
>> How does OMPI figure out that it can also talk over the others?  How does it
>> choose to load balance?
>> 
>> BTW that is fine, but we will use if_exclude on one of the IB ones, as ib0
>> and eoib0 are the same physical device and may screw with load balancing if
>> anyone ever falls back to TCP.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25709.php
>
>___
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post: 
>http://www.open-mpi.org/community/lists/users/2014/11/25710.php

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Gilles Gouaillardet
Brock,

Is your post related to ib0/eoib0 being used at all, or to them being used with
load balancing?

Let me clarify:
--mca btl ^openib
disables the openib BTL, i.e. *native* InfiniBand.
It does not disable ib0 and eoib0, which are handled by the tcp BTL.
As you already figured out, btl_tcp_if_include (or btl_tcp_if_exclude) can be 
used for that purpose.
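
For example (a sketch only; the executable name and process count are
placeholders, and the interface names are the ones from this thread),
restricting the tcp BTL to the gigabit interface would look like:

mpirun --mca btl ^openib --mca mtl ^mxm --mca btl_tcp_if_include eth0 -np 4 ./a.out

or, going the other way, excluding the IB-backed interfaces (keep lo in the
exclude list so loopback is not picked up):

mpirun --mca btl ^openib --mca mtl ^mxm --mca btl_tcp_if_exclude lo,ib0,eoib0 -np 4 ./a.out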

Cheers,

Gilles




Ralph Castain  wrote:
>OMPI discovers all active interfaces and automatically considers them 
>available for its use unless instructed otherwise via the params. I’d have to 
>look at the TCP BTL code to see the loadbalancing algo - I thought we didn’t 
>have that “on” by default across BTLs, but I don’t know if the TCP one 
>automatically uses all available Ethernet interfaces by default. Sounds like 
>it must.
>
>
>> On Nov 7, 2014, at 11:53 AM, Brock Palen  wrote:
>> 
>> I was doing a test on our IB-based cluster, where I was disabling IB
>> 
>> --mca btl ^openib --mca mtl ^mxm
>> 
>> I was sending very large messages (>1 GB) and I was surprised by the speed.
>> 
>> I then looked at our ethernet interfaces:
>> 
>> eth0  (1gig-e)
>> ib0  (ip over ib, for lustre configuration at vendor request)
>> eoib0  (Ethernet over IB interface for an IB -> Ethernet gateway, for some
>> external storage support at >1Gig speed)
>> 
>> and saw that all three were getting traffic.
>> 
>> We use Torque as our resource manager, with TM support; the hostnames
>> given by Torque match the eth0 interfaces.
>> 
>> How does OMPI figure out that it can also talk over the others?  How does it
>> choose to load balance?
>> 
>> BTW that is fine, but we will use if_exclude on one of the IB ones, as ib0
>> and eoib0 are the same physical device and may screw with load balancing if
>> anyone ever falls back to TCP.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25709.php
>
>___
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post: 
>http://www.open-mpi.org/community/lists/users/2014/11/25710.php

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Ralph Castain
OMPI discovers all active interfaces and automatically considers them available 
for its use unless instructed otherwise via the params. I’d have to look at the 
TCP BTL code to see the loadbalancing algo - I thought we didn’t have that “on” 
by default across BTLs, but I don’t know if the TCP one automatically uses all 
available Ethernet interfaces by default. Sounds like it must.
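
(One way to confirm which interfaces the tcp BTL actually ends up using - a
hedged suggestion, since the exact messages and the verbosity level needed can
differ between releases - is to raise the BTL verbosity and watch the tcp
component report the interfaces and addresses it pairs up, e.g.:

mpirun --mca btl ^openib --mca btl_base_verbose 30 -np 2 --host node01,node02 ./a.out

where the hostnames and ./a.out are placeholders.)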


> On Nov 7, 2014, at 11:53 AM, Brock Palen  wrote:
> 
> I was doing a test on our IB-based cluster, where I was disabling IB
> 
> --mca btl ^openib --mca mtl ^mxm
> 
> I was sending very large messages (>1 GB) and I was surprised by the speed.
> 
> I then looked at our ethernet interfaces:
> 
> eth0  (1gig-e)
> ib0  (ip over ib, for lustre configuration at vendor request)
> eoib0  (Ethernet over IB interface for an IB -> Ethernet gateway, for some
> external storage support at >1Gig speed)
> 
> and saw that all three were getting traffic.
> 
> We use Torque as our resource manager, with TM support; the hostnames
> given by Torque match the eth0 interfaces.
> 
> How does OMPI figure out that it can also talk over the others?  How does it
> choose to load balance?
> 
> BTW that is fine, but we will use if_exclude on one of the IB ones, as ib0 and
> eoib0 are the same physical device and may screw with load balancing if
> anyone ever falls back to TCP.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25709.php



[OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Brock Palen
I was doing a test on our IB-based cluster, where I was disabling IB

--mca btl ^openib --mca mtl ^mxm

I was sending very large messages (>1 GB) and I was surprised by the speed.

I then looked at our ethernet interfaces:

eth0  (1gig-e)
ib0  (ip over ib, for lustre configuration at vendor request)
eoib0  (Ethernet over IB interface for an IB -> Ethernet gateway, for some
external storage support at >1Gig speed)

and saw that all three were getting traffic.

We use Torque as our resource manager, with TM support; the hostnames given
by Torque match the eth0 interfaces.

How does OMPI figure out that it can also talk over the others?  How does it
choose to load balance?

BTW that is fine, but we will use if_exclude on one of the IB ones, as ib0 and
eoib0 are the same physical device and may screw with load balancing if anyone
ever falls back to TCP.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





Re: [OMPI users] Question on mapping processes to hosts file

2014-11-07 Thread Ralph Castain
Ah, yes - so here is what is happening. When no slot info is provided, we use
the number of detected cores on each node as the #slots. So if you want to
load-balance across the nodes, you need to set --map-by node

Or add slots=1 to each line of your host file to override the default behavior
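
As a concrete sketch of both options (using the hostfile from the original
post; the executable name is a placeholder):

mpirun --machinefile hosts.dat --map-by node -np 4 ./a.out

or keep the original command line and change hosts.dat to:

node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1

Either way, one rank should land on each of the four nodes.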

> On Nov 7, 2014, at 8:52 AM, Blosch, Edwin L  wrote:
> 
> Here’s my command:
>  
> /bin/mpirun  --machinefile 
> hosts.dat -np 4 
>  
> Here’s my hosts.dat file:
>  
> % cat hosts.dat
> node01
> node02
> node03
> node04
>  
> All 4 ranks are launched on node01.  I don’t believe I’ve ever seen this 
> before.  I had to do a sanity check, so I tried MVAPICH2-2.1a and got what I 
> expected: 1 process runs on each of the 4 nodes.  The mpirun man page says 
> ‘round-robin’, which I take to mean that one process would be launched per 
> line in the hosts file, so this really seems like incorrect behavior.
>  
> What could be the possibilities here?
>  
> Thanks for the help!
>  
>  
>  
> ___
> users mailing list
> us...@open-mpi.org 
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25707.php 
> 


[OMPI users] Question on mapping processes to hosts file

2014-11-07 Thread Blosch, Edwin L
Here's my command:

/bin/mpirun  --machinefile 
hosts.dat -np 4 

Here's my hosts.dat file:

% cat hosts.dat
node01
node02
node03
node04

All 4 ranks are launched on node01.  I don't believe I've ever seen this 
before.  I had to do a sanity check, so I tried MVAPICH2-2.1a and got what I 
expected: 1 process runs on each of the 4 nodes.  The mpirun man page says 
'round-robin', which I take to mean that one process would be launched per line 
in the hosts file, so this really seems like incorrect behavior.

What could be the possibilities here?

Thanks for the help!





Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of MPI_Ibcast

2014-11-07 Thread Steven Eliuk
Let me clarify, as that wasn’t very clear: whether we enable or disable GDR, it
doesn’t make a difference. The issue seems to be in the base code.

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


From: Rolf vandeVaart
Reply-To: Open MPI Users
List-Post: users@lists.open-mpi.org
Date: Thursday, November 6, 2014 at 10:18 AM
To: Open MPI Users
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
MPI_Ibcast

The CUDA person is now responding.  I will try to reproduce.  I looked through
the zip file but did not see the mpirun command.  Can this be reproduced with
-np 4 running across four nodes?
Also, in your original message you wrote “Likewise, it doesn't matter if I
enable CUDA support or not.”  Can you provide more detail about what that
means?
Thanks

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 06, 2014 1:05 PM
To: Open MPI Users
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
MPI_Ibcast

I was hoping our CUDA person would respond, but in the interim - I would 
suggest trying the nightly 1.8.4 tarball as we are getting ready to release it, 
and I know there were some CUDA-related patches since 1.8.1

http://www.open-mpi.org/nightly/v1.8/
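
(Roughly, and with the tarball name as a placeholder for whatever snapshot is
current, building the nightly with CUDA support looks like:

tar xjf openmpi-v1.8-latest-snapshot.tar.bz2
cd openmpi-*
./configure --prefix=$HOME/openmpi-1.8.4rc --with-cuda=/usr/local/cuda
make -j8 install

then rerun the test with the new installation's mpirun first in your PATH.)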


On Nov 5, 2014, at 4:45 PM, Steven Eliuk wrote:

Open MPI 1.8.1 with CUDA RDMA…

Thanks sir and sorry for the late response,

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


From: Ralph Castain
Reply-To: Open MPI Users
List-Post: users@lists.open-mpi.org
Date: Monday, November 3, 2014 at 10:02 AM
To: Open MPI Users
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
MPI_Ibcast

Which version of OMPI were you testing?

On Nov 3, 2014, at 9:14 AM, Steven Eliuk wrote:

Hello,

We were using Open MPI for some testing; everything works fine, but randomly
MPI_Ibcast() takes a long time to finish. We have a standalone program just to
test it.  The following are the profiling results of the simple test program on
our cluster:

Ibcast 604 mb takes 103 ms
Ibcast 608 mb takes 106 ms
Ibcast 612 mb takes 105 ms
Ibcast 616 mb takes 105 ms
Ibcast 620 mb takes 107 ms
Ibcast 624 mb takes 107 ms
Ibcast 628 mb takes 108 ms
Ibcast 632 mb takes 110 ms
Ibcast 636 mb takes 110 ms
Ibcast 640 mb takes 7437 ms
Ibcast 644 mb takes 115 ms
Ibcast 648 mb takes 111 ms
Ibcast 652 mb takes 112 ms
Ibcast 656 mb takes 112 ms
Ibcast 660 mb takes 114 ms
Ibcast 664 mb takes 114 ms
Ibcast 668 mb takes 115 ms
Ibcast 672 mb takes 116 ms
Ibcast 676 mb takes 116 ms
Ibcast 680 mb takes 116 ms
Ibcast 684 mb takes 122 ms
Ibcast 688 mb takes 7385 ms
Ibcast 692 mb takes 8729 ms
Ibcast 696 mb takes 120 ms
Ibcast 700 mb takes 124 ms
Ibcast 704 mb takes 121 ms
Ibcast 708 mb takes 8240 ms
Ibcast 712 mb takes 122 ms
Ibcast 716 mb takes 123 ms
Ibcast 720 mb takes 123 ms
Ibcast 724 mb takes 124 ms
Ibcast 728 mb takes 125 ms
Ibcast 732 mb takes 125 ms
Ibcast 736 mb takes 126 ms

As you can see, Ibcast sometimes takes a long time to finish, and it's totally random.
The same program was compiled and tested with MVAPICH2-GDR and it ran smoothly.
Both tests were run exclusively on our four-node cluster without
contention. Likewise, it doesn't matter
whether I enable CUDA support or not.  The following is the configuration of our
servers:

We have four nodes in this test, each with one K40 GPU, connected with
Mellanox IB.

Please find attached config details and some sample code…

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25662.php

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25695.php