Re: [lustre-discuss] Lnet Self Test

2019-12-07 Thread Pinkesh Valdria
    0  0 S   3.6  0.0  81:30.26  socknal_sd01_03
   551 root      20   0   0  0  0 S   2.6  0.0  39:24.00  kswapd0
 60860 root      20   0   0  0  0 S   2.3  0.0  30:54.35  socknal_sd00_01
 60864 root      20   0   0  0  0 S   2.3  0.0  30:58.20  socknal_sd00_05
 64426 root      20   0   0  0  0 S   2.3  0.0   7:28.65  ll_ost_io01_102
 60859 root      20   0   0  0  0 S   2.0  0.0  30:56.70  socknal_sd00_00
 60861 root      20   0   0  0  0 S   2.0  0.0  30:54.97  socknal_sd00_02
 60862 root      20   0   0  0  0 S   2.0  0.0  30:56.06  socknal_sd00_03
 60863 root      20   0   0  0  0 S   2.0  0.0  30:56.32  socknal_sd00_04
 64334 root      20   0   0  0  0 D   1.3  0.0   7:19.46  ll_ost_io01_010
 64329 root      20   0   0  0  0 S   1.0  0.0   7:46.48  ll_ost_io01_005

 

 

 

From: "Moreno Diego (ID SIS)" 
Date: Wednesday, December 4, 2019 at 11:12 PM
To: Pinkesh Valdria , Jongwoo Han 

Cc: "lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Lnet Self Test

 

I recently did some work on 40Gb and 100Gb ethernet interfaces and these are a 
few of the things that helped me during lnet_selftest:

 
  *   On lnet: credits set higher than the default (e.g. 1024 or more), and 
peer_credits to at least 128 for network testing (it is just 8 by default, which 
may be fine for a big cluster but not for lnet_selftest with 2 clients; a 
modprobe sketch follows this list),
  *   On ksocklnd module options: more schedulers (10; the default of 6 was not 
enough for my server), and I also changed some of the buffers (tx_buffer_size 
and rx_buffer_size set to 1073741824), but you need to be very careful with these,
  *   sysctl.conf: increase the buffers (tcp_rmem, tcp_wmem, the net.core.*mem_max 
and *mem_default values), check window scaling, and check whether you can afford 
to disable timestamps,
  *   Other: cpupower governor (set to performance, at least for testing) and BIOS 
settings (e.g. on my AMD routers it was better to disable HT, disable a few 
virtualization-oriented features and set the PCI config to performance). 
Basically, be aware that Lustre ethernet performance will take CPU resources, 
so it is better to optimize for it.
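
As a concrete illustration of the first two points, these LND tunables can be set 
as module options before lnet loads. This is only a sketch using the values quoted 
above, not a recommendation: the tcp1 network and the ens3 interface name are 
borrowed from the test commands later in this thread, and credits, peer_credits, 
nscheds, tx_buffer_size and rx_buffer_size are ksocklnd module parameters.

# /etc/modprobe.d/lustre-lnet.conf  (hypothetical example, adjust to your setup)
options lnet networks="tcp1(ens3)"
# total concurrent sends, concurrent sends per peer, scheduler threads, socket buffers
options ksocklnd credits=1024 peer_credits=128 nscheds=10 tx_buffer_size=1073741824 rx_buffer_size=1073741824

The lnet/ksocklnd modules have to be reloaded (or the node rebooted) for these 
options to take effect.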
 

Last but not least, be aware that Lustre's ethernet driver (ksocklnd) does not 
load balance as well as Infiniband's (ko2iblnd). I have already seen several 
Lustre peers using the same socklnd thread on the destination while the other 
socklnd threads stayed mostly idle, which means that your entire load depends 
on just one core. The best way to check is to test with more clients and look 
at the per-thread CPU load on your node with top; 2 clients do not seem enough 
to me. With the proper configuration you should be perfectly able to saturate 
a 25Gb link in lnet_selftest.
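
For the per-thread check, something as simple as the following one-liner 
(illustrative only; the thread names match the top output quoted above) shows 
whether one socknal scheduler is pegged while the others sit idle:

top -b -n 1 | grep -E 'socknal|ll_ost_io' | sort -k9 -rn | head

The 9th column of top's batch output is %CPU, so this lists the busiest Lustre 
socket and OST I/O threads first.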

 

Regards,

 

Diego

 

 

From: lustre-discuss  on behalf of 
Pinkesh Valdria 
Date: Thursday, 5 December 2019 at 06:14
To: Jongwoo Han 
Cc: "lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Lnet Self Test

 

Thanks Jongwoo. 

 

I have the MTU set to 9000 and the ring buffer sizes set to their maximums. 

 

ip link set dev $primaryNICInterface mtu 9000

ethtool -G $primaryNICInterface rx 2047 tx 2047 rx-jumbo 8191

 

I read about changing interrupt coalescing, but I was unable to find which 
values should be changed, or whether it really helps. 

# Several packets in a rapid sequence can be coalesced into one interrupt 
passed up to the CPU, providing more CPU time for application processing.
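
In case it is useful, the current coalescing settings can at least be inspected, 
and adaptive coalescing enabled where the driver supports it. The values below 
are purely illustrative, and $primaryNICInterface is the same variable used in 
the commands above:

ethtool -c $primaryNICInterface       # show current interrupt coalescing settings
ethtool -C $primaryNICInterface adaptive-rx on adaptive-tx on    # if the driver supports it
# or fixed values, e.g.: ethtool -C $primaryNICInterface rx-usecs 50 rx-frames 64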

 

Thanks,

Pinkesh valdria

Oracle Cloud

 

 

 

From: Jongwoo Han 
Date: Wednesday, December 4, 2019 at 8:07 PM
To: Pinkesh Valdria 
Cc: Andreas Dilger , "lustre-discuss@lists.lustre.org" 

Subject: Re: [lustre-discuss] Lnet Self Test

 

Have you tried MTU >= 9000 bytes (AKA jumbo frame) on the 25G ethernet and the 
switch? 

If it is set to 1500 bytes, the ethernet + IP + TCP headers take up a 
significant share of each packet, reducing the bandwidth available for data.
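
A rough back-of-the-envelope comparison (assuming no TCP options and standard 
Ethernet framing of 38 bytes per frame on the wire):

  1500-byte MTU: payload 1500 - 40 = 1460 bytes, wire size 1538 bytes  -> ~95% efficiency
  9000-byte MTU: payload 9000 - 40 = 8960 bytes, wire size 9038 bytes  -> ~99% efficiency

so jumbo frames recover a few percent of raw bandwidth and, just as importantly, 
cut the number of packets (and interrupts) per MB moved by roughly a factor of 6.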

 

Jongwoo Han

 

On Thu, Nov 28, 2019 at 3:44 AM, Pinkesh Valdria wrote:

Thanks Andreas for your response.  

 

I ran another LNet self-test with 48 concurrent processes, since the nodes have 
52 physical cores, and I was able to achieve the same throughput (2052.71 MiB/s 
= 2152 MB/s).

 

Is it expected to lose almost 600 MB/s (2750 - 2150 = 600) due to overheads on 
ethernet with Lnet?

 

 

Thanks,

Pinkesh Valdria

Oracle Cloud Infrastructure 

 

 

 

 

From: Andreas Dilger 
Date: Wednesday, November 27, 2019 at 1:25 AM
To: Pinkesh Valdria 
Cc: "lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Lnet Self Test

 

The first thing to note is that lst reports results in binary units 

(MiB/s) while iperf reports results in decimal units (Gbps).  If you do the

conversion you get 2055.31 MiB/s = 2155 MB/s.
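
Spelled out, with 1 MiB = 1.048576 MB:

  2055.31 MiB/s * 1.048576 ≈ 2155 MB/s ≈ 17.2 Gbit/s

compared with the 22 Gbit/s (2750 MB/s) reported by iperf3.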

 

The other thing to check is the CPU usage. For TCP the CPU usage can

be high. You should try RoCE+o2iblnd instead. 
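
If the 25GbE NICs support RoCE, switching LNet from ksocklnd to o2iblnd is mostly 
a module-option change once the RDMA stack is installed; a sketch only, with the 
ens3 interface name assumed from the commands below:

options lnet networks="o2ib0(ens3)"

after which the NIDs used by the wrapper become e.g. 10.0.3.7@o2ib0 instead of 
10.0.3.7@tcp1.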

 

Cheers, Andreas


On Nov 26, 20

Re: [lustre-discuss] Lnet Self Test

2019-12-04 Thread Moreno Diego (ID SIS)
I recently did some work on 40Gb and 100Gb ethernet interfaces and these are a 
few of the things that helped me during lnet_selftest:


  *   On lnet: credits set higher than the default (e.g. 1024 or more), and 
peer_credits to at least 128 for network testing (it is just 8 by default, which 
may be fine for a big cluster but not for lnet_selftest with 2 clients),
  *   On ksocklnd module options: more schedulers (10; the default of 6 was not 
enough for my server), and I also changed some of the buffers (tx_buffer_size 
and rx_buffer_size set to 1073741824), but you need to be very careful with these,
  *   sysctl.conf: increase the buffers (tcp_rmem, tcp_wmem, the net.core.*mem_max 
and *mem_default values), check window scaling, and check whether you can afford 
to disable timestamps,
  *   Other: cpupower governor (set to performance, at least for testing) and BIOS 
settings (e.g. on my AMD routers it was better to disable HT, disable a few 
virtualization-oriented features and set the PCI config to performance). 
Basically, be aware that Lustre ethernet performance will take CPU resources, 
so it is better to optimize for it.

Last but not least, be aware that Lustre's ethernet driver (ksocklnd) does not 
load balance as well as Infiniband's (ko2iblnd). I have already seen several 
Lustre peers using the same socklnd thread on the destination while the other 
socklnd threads stayed mostly idle, which means that your entire load depends 
on just one core. The best way to check is to test with more clients and look 
at the per-thread CPU load on your node with top; 2 clients do not seem enough 
to me. With the proper configuration you should be perfectly able to saturate 
a 25Gb link in lnet_selftest.

Regards,

Diego


From: lustre-discuss  on behalf of 
Pinkesh Valdria 
Date: Thursday, 5 December 2019 at 06:14
To: Jongwoo Han 
Cc: "lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Lnet Self Test

Thanks Jongwoo.

I have the MTU set to 9000 and the ring buffer sizes set to their maximums.


ip link set dev $primaryNICInterface mtu 9000

ethtool -G $primaryNICInterface rx 2047 tx 2047 rx-jumbo 8191

I read about changing interrupt coalescing, but I was unable to find which 
values should be changed, or whether it really helps.
# Several packets in a rapid sequence can be coalesced into one interrupt 
passed up to the CPU, providing more CPU time for application processing.

Thanks,
Pinkesh valdria
Oracle Cloud



From: Jongwoo Han 
Date: Wednesday, December 4, 2019 at 8:07 PM
To: Pinkesh Valdria 
Cc: Andreas Dilger , "lustre-discuss@lists.lustre.org" 

Subject: Re: [lustre-discuss] Lnet Self Test

Have you tried MTU >= 9000 bytes (AKA jumbo frame) on the 25G ethernet and the 
switch?
If it is set to 1500 bytes, the ethernet + IP + TCP headers take up a 
significant share of each packet, reducing the bandwidth available for data.

Jongwoo Han

On Thu, Nov 28, 2019 at 3:44 AM, Pinkesh Valdria <pinkesh.vald...@oracle.com> wrote:
Thanks Andreas for your response.

I ran another LNet self-test with 48 concurrent processes, since the nodes have 
52 physical cores, and I was able to achieve the same throughput (2052.71 MiB/s 
= 2152 MB/s).

Is it expected to lose almost 600 MB/s (2750 - 2150 = 600) due to overheads on 
ethernet with Lnet?


Thanks,
Pinkesh Valdria
Oracle Cloud Infrastructure




From: Andreas Dilger <adil...@whamcloud.com>
Date: Wednesday, November 27, 2019 at 1:25 AM
To: Pinkesh Valdria <pinkesh.vald...@oracle.com>
Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Lnet Self Test

The first thing to note is that lst reports results in binary units
(MiB/s) while iperf reports results in decimal units (Gbps).  If you do the
conversion you get 2055.31 MiB/s = 2155 MB/s.

The other thing to check is the CPU usage. For TCP the CPU usage can
be high. You should try RoCE+o2iblnd instead.

Cheers, Andreas

On Nov 26, 2019, at 21:26, Pinkesh Valdria <pinkesh.vald...@oracle.com> wrote:
Hello All,

I created a new Lustre cluster on CentOS 7.6 and I am running 
lnet_selftest_wrapper.sh to measure throughput on the network.  The nodes are 
connected to each other using 25Gbps ethernet, so the theoretical max is 25 Gbps 
* 125 = 3125 MB/s.  Using iperf3, I get 22Gbps (2750 MB/s) between the nodes.


[root@lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ;  do echo $c ; 
ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  CN=$c  SZ=1M  TM=30 BRW=write 
CKSUM=simple LFROM="10.0.3.7@tcp1" LTO="10.0.3.6@tcp1" 
/root/lnet_selftest_wrapper.sh; done ;

When I run lnet_selftest_wrapper.sh (from the Lustre wiki 
<http://wiki.lustre.org/LNET_Selftest>) be

Re: [lustre-discuss] Lnet Self Test

2019-12-04 Thread Pinkesh Valdria
Thanks Jongwoo. 

 

I have the MTU set to 9000 and the ring buffer sizes set to their maximums. 

 

ip link set dev $primaryNICInterface mtu 9000

ethtool -G $primaryNICInterface rx 2047 tx 2047 rx-jumbo 8191

 

I read about changing interrupt coalescing, but I was unable to find which 
values should be changed, or whether it really helps. 

# Several packets in a rapid sequence can be coalesced into one interrupt 
passed up to the CPU, providing more CPU time for application processing.

 

Thanks,

Pinkesh valdria

Oracle Cloud

 

 

 

From: Jongwoo Han 
Date: Wednesday, December 4, 2019 at 8:07 PM
To: Pinkesh Valdria 
Cc: Andreas Dilger , "lustre-discuss@lists.lustre.org" 

Subject: Re: [lustre-discuss] Lnet Self Test

 

Have you tried MTU >= 9000 bytes (AKA jumbo frame) on the 25G ethernet and the 
switch? 

If it is set to 1500 bytes, the ethernet + IP + TCP headers take up a 
significant share of each packet, reducing the bandwidth available for data.

 

Jongwoo Han

 

On Thu, Nov 28, 2019 at 3:44 AM, Pinkesh Valdria wrote:

Thanks Andreas for your response.  

 

I ran another LNet self-test with 48 concurrent processes, since the nodes have 
52 physical cores, and I was able to achieve the same throughput (2052.71 MiB/s 
= 2152 MB/s).

 

Is it expected to lose almost 600 MB/s (2750 - 2150 = 600) due to overheads on 
ethernet with Lnet?

 

 

Thanks,

Pinkesh Valdria

Oracle Cloud Infrastructure 

 

 

 

 

From: Andreas Dilger 
Date: Wednesday, November 27, 2019 at 1:25 AM
To: Pinkesh Valdria 
Cc: "lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Lnet Self Test

 

The first thing to note is that lst reports results in binary units 

(MiB/s) while iperf reports results in decimal units (Gbps).  If you do the

conversion you get 2055.31 MiB/s = 2155 MB/s.

 

The other thing to check is the CPU usage. For TCP the CPU usage can

be high. You should try RoCE+o2iblnd instead. 

 

Cheers, Andreas


On Nov 26, 2019, at 21:26, Pinkesh Valdria  wrote:

Hello All, 

 

I created a new Lustre cluster on CentOS 7.6 and I am running 
lnet_selftest_wrapper.sh to measure throughput on the network.  The nodes are 
connected to each other using 25Gbps ethernet, so the theoretical max is 25 Gbps 
* 125 = 3125 MB/s.  Using iperf3, I get 22Gbps (2750 MB/s) between the nodes.

 

 

[root@lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ;  do echo $c ; 
ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  CN=$c  SZ=1M  TM=30 BRW=write 
CKSUM=simple LFROM="10.0.3.7@tcp1" LTO="10.0.3.6@tcp1" 
/root/lnet_selftest_wrapper.sh; done ;

 

When I run lnet_selftest_wrapper.sh (from the Lustre wiki) between 2 nodes, I get 
a max of 2055.31 MiB/s.  Is that expected at the LNet level?  Or can I further 
tune the network and OS kernel (the tuning I applied is below) to get better 
throughput?

 

 

 

Result Snippet from lnet_selftest_wrapper.sh

 

[LNet Rates of lfrom]

[R] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s

[W] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s

[LNet Bandwidth of lfrom]

[R] Avg: 0.31 MiB/s Min: 0.31 MiB/s Max: 0.31 MiB/s

[W] Avg: 2055.30  MiB/s Min: 2055.30  MiB/s Max: 2055.30  MiB/s

[LNet Rates of lto]

[R] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s

[W] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s

[LNet Bandwidth of lto]

[R] Avg: 2055.31  MiB/s Min: 2055.31  MiB/s Max: 2055.31  MiB/s

[W] Avg: 0.32 MiB/s Min: 0.32 MiB/s Max: 0.32 MiB/s

 

 

Tuning applied: 

Ethernet NICs: 

ip link set dev ens3 mtu 9000 

ethtool -G ens3 rx 2047 tx 2047 rx-jumbo 8191

 

 

less /etc/sysctl.conf

net.core.wmem_max=16777216

net.core.rmem_max=16777216

net.core.wmem_default=16777216

net.core.rmem_default=16777216

net.core.optmem_max=16777216

net.core.netdev_max_backlog=27000

kernel.sysrq=1

kernel.shmmax=18446744073692774399

net.core.somaxconn=8192

net.ipv4.tcp_adv_win_scale=2

net.ipv4.tcp_low_latency=1

net.ipv4.tcp_rmem = 212992 87380 16777216

net.ipv4.tcp_sack = 1

net.ipv4.tcp_timestamps = 1

net.ipv4.tcp_window_scaling = 1

net.ipv4.tcp_wmem = 212992 65536 16777216

vm.min_free_kbytes = 65536

net.ipv4.tcp_congestion_control = cubic

net.ipv4.tcp_timestamps = 0

net.ipv4.tcp_congestion_control = htcp

net.ipv4.tcp_no_metrics_save = 0

 

 

 

echo "#

# tuned configuration

#

[main]

summary=Broadly applicable tuning that provides excellent performance across a 
variety of common server workloads

 

[disk]

devices=!dm-*, !sda1, !sda2, !sda3

readahead=>4096

 

[cpu]

force_latency=1

governor=performance

energy_perf_bias=performance

min_perf_pct=100

[vm]

transparent_huge_pages=never

[sysctl]

kernel.sched_min_granularity_ns = 1000

kernel.sched_wakeup_granularity_ns = 1500

vm.dirty_ratio = 30

vm.dirty_background_ratio = 10

vm.swappiness=30

" > lustre-performance/tuned.conf

 

tuned-adm profile lustre-performance
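
Assuming the profile directory ends up under /etc/tuned/lustre-performance/ 
(where tuned-adm looks for it), the result can be sanity-checked with:

tuned-adm active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # should report "performance" on systems exposing cpufreq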

 

 

Thanks,

Pinkesh Valdria

 

___

Re: [lustre-discuss] Lnet Self Test

2019-12-04 Thread Jongwoo Han
Have you tried MTU >= 9000 bytes (AKA jumbo frame) on the 25G ethernet and
the switch?
If it is set to 1500 bytes, the ethernet + IP + TCP headers take up a
significant share of each packet, reducing the bandwidth available for data.

Jongwoo Han

On Thu, Nov 28, 2019 at 3:44 AM, Pinkesh Valdria wrote:

> Thanks Andreas for your response.
>
>
>
> I ran another LNet self-test with 48 concurrent processes, since the nodes
> have 52 physical cores, and I was able to achieve the same throughput (2052.71
> MiB/s = 2152 MB/s).
>
>
>
> Is it expected to lose almost 600 MB/s (2750 - 2150 = 600) due to overheads on
> ethernet with Lnet?
>
>
>
>
>
> Thanks,
>
> Pinkesh Valdria
>
> Oracle Cloud Infrastructure
>
>
>
>
>
>
>
>
>
> *From: *Andreas Dilger 
> *Date: *Wednesday, November 27, 2019 at 1:25 AM
> *To: *Pinkesh Valdria 
> *Cc: *"lustre-discuss@lists.lustre.org" 
> *Subject: *Re: [lustre-discuss] Lnet Self Test
>
>
>
> The first thing to note is that lst reports results in binary units
>
> (MiB/s) while iperf reports results in decimal units (Gbps).  If you do the
>
> conversion you get 2055.31 MiB/s = 2155 MB/s.
>
>
>
> The other thing to check is the CPU usage. For TCP the CPU usage can
>
> be high. You should try RoCE+o2iblnd instead.
>
>
>
> Cheers, Andreas
>
>
> On Nov 26, 2019, at 21:26, Pinkesh Valdria 
> wrote:
>
> Hello All,
>
>
>
> I created a new Lustre cluster on CentOS 7.6 and I am running
> lnet_selftest_wrapper.sh to measure throughput on the network.  The nodes
> are connected to each other using 25Gbps ethernet, so the theoretical max is 25
> Gbps * 125 = 3125 MB/s.  Using iperf3, I get 22Gbps (2750 MB/s) between
> the nodes.
>
>
>
>
>
> [root@lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ;  do echo $c ;
> ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  CN=$c  SZ=1M  TM=30 BRW=write
> CKSUM=simple LFROM="10.0.3.7@tcp1" LTO="10.0.3.6@tcp1"
> /root/lnet_selftest_wrapper.sh; done ;
>
>
>
> When I run lnet_selftest_wrapper.sh (from the Lustre wiki
> <http://wiki.lustre.org/LNET_Selftest>)
> between 2 nodes, I get a max of 2055.31 MiB/s.  Is that expected at the
> LNet level?  Or can I further tune the network and OS kernel (the tuning I
> applied is below) to get better throughput?
>
>
>
>
>
>
>
> *Result Snippet from lnet_selftest_wrapper.sh*
>
>
>
> [LNet Rates of lfrom]
>
> [R] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s
>
> [W] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s
>
> [LNet Bandwidth of lfrom]
>
> [R] Avg: 0.31 MiB/s Min: 0.31 MiB/s Max: 0.31 MiB/s
>
> [W] Avg: 2055.30  MiB/s Min: 2055.30  MiB/s Max: 2055.30  MiB/s
>
> [LNet Rates of lto]
>
> [R] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s
>
> [W] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s
>
> [LNet Bandwidth of lto]
>
> [R] Avg: 2055.31  MiB/s Min: 2055.31  MiB/s Max: 2055.31  MiB/s
>
> [W] Avg: 0.32 MiB/s Min: 0.32 MiB/s Max: 0.32 MiB/s
>
>
>
>
>
> *Tuning applied: *
>
> *Ethernet NICs: *
>
> ip link set dev ens3 mtu 9000
>
> ethtool -G ens3 rx 2047 tx 2047 rx-jumbo 8191
>
>
>
>
>
> *less /etc/sysctl.conf*
>
> net.core.wmem_max=16777216
>
> net.core.rmem_max=16777216
>
> net.core.wmem_default=16777216
>
> net.core.rmem_default=16777216
>
> net.core.optmem_max=16777216
>
> net.core.netdev_max_backlog=27000
>
> kernel.sysrq=1
>
> kernel.shmmax=18446744073692774399
>
> net.core.somaxconn=8192
>
> net.ipv4.tcp_adv_win_scale=2
>
> net.ipv4.tcp_low_latency=1
>
> net.ipv4.tcp_rmem = 212992 87380 16777216
>
> net.ipv4.tcp_sack = 1
>
> net.ipv4.tcp_timestamps = 1
>
> net.ipv4.tcp_window_scaling = 1
>
> net.ipv4.tcp_wmem = 212992 65536 16777216
>
> vm.min_free_kbytes = 65536
>
> net.ipv4.tcp_congestion_control = cubic
>
> net.ipv4.tcp_timestamps = 0
>
> net.ipv4.tcp_congestion_control = htcp
>
> net.ipv4.tcp_no_metrics_save = 0
>
>
>
>
>
>
>
> echo "#
>
> *# tuned configuration*
>
> *#*
>
> [main]
>
> summary=Broadly applicable tuning that provides excellent performance
> across a variety of common server workloads
>
>
>
> [disk]
>
> devices=!dm-*, !sda1, !sda2, !sda3
>
> readahead=>4096
>
>
>
> [cpu]
>
> for

Re: [lustre-discuss] Lnet Self Test

2019-11-27 Thread Pinkesh Valdria
Thanks Andreas for your response.  

 

I ran another LNet self-test with 48 concurrent processes, since the nodes have 
52 physical cores, and I was able to achieve the same throughput (2052.71 MiB/s 
= 2152 MB/s).

 

Is it expected to lose almost 600 MB/s (2750 - 2150 = 600) due to overheads on 
ethernet with Lnet?

 

 

Thanks,

Pinkesh Valdria

Oracle Cloud Infrastructure 

 

 

 

 

From: Andreas Dilger 
Date: Wednesday, November 27, 2019 at 1:25 AM
To: Pinkesh Valdria 
Cc: "lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Lnet Self Test

 

The first thing to note is that lst reports results in binary units 

(MiB/s) while iperf reports results in decimal units (Gbps).  If you do the

conversion you get 2055.31 MiB/s = 2155 MB/s.

 

The other thing to check is the CPU usage. For TCP the CPU usage can

be high. You should try RoCE+o2iblnd instead. 

 

Cheers, Andreas


On Nov 26, 2019, at 21:26, Pinkesh Valdria  wrote:

Hello All, 

 

I created a new Lustre cluster on CentOS 7.6 and I am running 
lnet_selftest_wrapper.sh to measure throughput on the network.  The nodes are 
connected to each other using 25Gbps ethernet, so the theoretical max is 25 Gbps 
* 125 = 3125 MB/s.  Using iperf3, I get 22Gbps (2750 MB/s) between the nodes.

 

 

[root@lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ;  do echo $c ; 
ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  CN=$c  SZ=1M  TM=30 BRW=write 
CKSUM=simple LFROM="10.0.3.7@tcp1" LTO="10.0.3.6@tcp1" 
/root/lnet_selftest_wrapper.sh; done ;

 

When I run lnet_selftest_wrapper.sh (from the Lustre wiki) between 2 nodes, I get 
a max of 2055.31 MiB/s.  Is that expected at the LNet level?  Or can I further 
tune the network and OS kernel (the tuning I applied is below) to get better 
throughput?

 

 

 

Result Snippet from lnet_selftest_wrapper.sh

 

[LNet Rates of lfrom]

[R] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s

[W] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s

[LNet Bandwidth of lfrom]

[R] Avg: 0.31 MiB/s Min: 0.31 MiB/s Max: 0.31 MiB/s

[W] Avg: 2055.30  MiB/s Min: 2055.30  MiB/s Max: 2055.30  MiB/s

[LNet Rates of lto]

[R] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s

[W] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s

[LNet Bandwidth of lto]

[R] Avg: 2055.31  MiB/s Min: 2055.31  MiB/s Max: 2055.31  MiB/s

[W] Avg: 0.32 MiB/s Min: 0.32 MiB/s Max: 0.32 MiB/s

 

 

Tuning applied: 

Ethernet NICs: 

ip link set dev ens3 mtu 9000 

ethtool -G ens3 rx 2047 tx 2047 rx-jumbo 8191

 

 

less /etc/sysctl.conf

net.core.wmem_max=16777216

net.core.rmem_max=16777216

net.core.wmem_default=16777216

net.core.rmem_default=16777216

net.core.optmem_max=16777216

net.core.netdev_max_backlog=27000

kernel.sysrq=1

kernel.shmmax=18446744073692774399

net.core.somaxconn=8192

net.ipv4.tcp_adv_win_scale=2

net.ipv4.tcp_low_latency=1

net.ipv4.tcp_rmem = 212992 87380 16777216

net.ipv4.tcp_sack = 1

net.ipv4.tcp_timestamps = 1

net.ipv4.tcp_window_scaling = 1

net.ipv4.tcp_wmem = 212992 65536 16777216

vm.min_free_kbytes = 65536

net.ipv4.tcp_congestion_control = cubic

net.ipv4.tcp_timestamps = 0

net.ipv4.tcp_congestion_control = htcp

net.ipv4.tcp_no_metrics_save = 0

 

 

 

echo "#

# tuned configuration

#

[main]

summary=Broadly applicable tuning that provides excellent performance across a 
variety of common server workloads

 

[disk]

devices=!dm-*, !sda1, !sda2, !sda3

readahead=>4096

 

[cpu]

force_latency=1

governor=performance

energy_perf_bias=performance

min_perf_pct=100

[vm]

transparent_huge_pages=never

[sysctl]

kernel.sched_min_granularity_ns = 1000

kernel.sched_wakeup_granularity_ns = 1500

vm.dirty_ratio = 30

vm.dirty_background_ratio = 10

vm.swappiness=30

" > lustre-performance/tuned.conf

 

tuned-adm profile lustre-performance

 

 

Thanks,

Pinkesh Valdria

 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lnet Self Test

2019-11-27 Thread Andreas Dilger
The first thing to note is that lst reports results in binary units
(MiB/s) while iperf reports results in decimal units (Gbps).  If you do the
conversion you get 2055.31 MiB/s = 2155 MB/s.

The other thing to check is the CPU usage. For TCP the CPU usage can
be high. You should try RoCE+o2iblnd instead.

Cheers, Andreas

On Nov 26, 2019, at 21:26, Pinkesh Valdria <pinkesh.vald...@oracle.com> wrote:

Hello All,

I created a new Lustre cluster on CentOS 7.6 and I am running 
lnet_selftest_wrapper.sh to measure throughput on the network.  The nodes are 
connected to each other using 25Gbps ethernet, so the theoretical max is 25 Gbps 
* 125 = 3125 MB/s.  Using iperf3, I get 22Gbps (2750 MB/s) between the nodes.


[root@lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ;  do echo $c ; 
ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  CN=$c  SZ=1M  TM=30 BRW=write 
CKSUM=simple LFROM="10.0.3.7@tcp1" LTO="10.0.3.6@tcp1" 
/root/lnet_selftest_wrapper.sh; done ;

When I run lnet_selftest_wrapper.sh (from the Lustre 
wiki) between 2 nodes, I get a max of 
2055.31 MiB/s.  Is that expected at the LNet level?  Or can I further tune the 
network and OS kernel (the tuning I applied is below) to get better throughput?



Result Snippet from lnet_selftest_wrapper.sh

[LNet Rates of lfrom]
[R] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s
[W] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 0.31 MiB/s Min: 0.31 MiB/s Max: 0.31 MiB/s
[W] Avg: 2055.30  MiB/s Min: 2055.30  MiB/s Max: 2055.30  MiB/s
[LNet Rates of lto]
[R] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s
[W] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 2055.31  MiB/s Min: 2055.31  MiB/s Max: 2055.31  MiB/s
[W] Avg: 0.32 MiB/s Min: 0.32 MiB/s Max: 0.32 MiB/s


Tuning applied:
Ethernet NICs:

ip link set dev ens3 mtu 9000

ethtool -G ens3 rx 2047 tx 2047 rx-jumbo 8191


less /etc/sysctl.conf
net.core.wmem_max=16777216
net.core.rmem_max=16777216
net.core.wmem_default=16777216
net.core.rmem_default=16777216
net.core.optmem_max=16777216
net.core.netdev_max_backlog=27000
kernel.sysrq=1
kernel.shmmax=18446744073692774399
net.core.somaxconn=8192
net.ipv4.tcp_adv_win_scale=2
net.ipv4.tcp_low_latency=1
net.ipv4.tcp_rmem = 212992 87380 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 212992 65536 16777216
vm.min_free_kbytes = 65536
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_congestion_control = htcp
net.ipv4.tcp_no_metrics_save = 0



echo "#
# tuned configuration
#
[main]
summary=Broadly applicable tuning that provides excellent performance across a 
variety of common server workloads

[disk]
devices=!dm-*, !sda1, !sda2, !sda3
readahead=>4096

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[vm]
transparent_huge_pages=never
[sysctl]
kernel.sched_min_granularity_ns = 1000
kernel.sched_wakeup_granularity_ns = 1500
vm.dirty_ratio = 30
vm.dirty_background_ratio = 10
vm.swappiness=30
" > lustre-performance/tuned.conf

tuned-adm profile lustre-performance


Thanks,
Pinkesh Valdria

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lnet Self Test

2019-11-26 Thread Pinkesh Valdria
Hello All, 

 

I created a new Lustre cluster on CentOS7.6 and I am running 
lnet_selftest_wrapper.sh to measure throughput on the network.  The nodes are 
connected to each other using 25Gbps ethernet, so theoretical max is 25 Gbps * 
125 = 3125 MB/s.    Using iperf3,  I get 22Gbps (2750 MB/s) between the nodes.

 

 

[root@lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ;  do echo $c ; 
ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  CN=$c  SZ=1M  TM=30 BRW=write 
CKSUM=simple LFROM="10.0.3.7@tcp1" LTO="10.0.3.6@tcp1" 
/root/lnet_selftest_wrapper.sh; done ;

 

When I run lnet_selftest_wrapper.sh (from the Lustre wiki) between 2 nodes, I get 
a max of 2055.31 MiB/s.  Is that expected at the LNet level?  Or can I further 
tune the network and OS kernel (the tuning I applied is below) to get better 
throughput?

 

 

 

Result Snippet from lnet_selftest_wrapper.sh

 

 [LNet Rates of lfrom]

[R] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s

[W] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s

[LNet Bandwidth of lfrom]

[R] Avg: 0.31 MiB/s Min: 0.31 MiB/s Max: 0.31 MiB/s

[W] Avg: 2055.30  MiB/s Min: 2055.30  MiB/s Max: 2055.30  MiB/s

[LNet Rates of lto]

[R] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s

[W] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s

[LNet Bandwidth of lto]

[R] Avg: 2055.31  MiB/s Min: 2055.31  MiB/s Max: 2055.31  MiB/s

[W] Avg: 0.32 MiB/s Min: 0.32 MiB/s Max: 0.32 MiB/s

 

 

Tuning applied: 

Ethernet NICs: 

ip link set dev ens3 mtu 9000 

ethtool -G ens3 rx 2047 tx 2047 rx-jumbo 8191

 

 

less /etc/sysctl.conf

net.core.wmem_max=16777216

net.core.rmem_max=16777216

net.core.wmem_default=16777216

net.core.rmem_default=16777216

net.core.optmem_max=16777216

net.core.netdev_max_backlog=27000

kernel.sysrq=1

kernel.shmmax=18446744073692774399

net.core.somaxconn=8192

net.ipv4.tcp_adv_win_scale=2

net.ipv4.tcp_low_latency=1

net.ipv4.tcp_rmem = 212992 87380 16777216

net.ipv4.tcp_sack = 1

net.ipv4.tcp_timestamps = 1

net.ipv4.tcp_window_scaling = 1

net.ipv4.tcp_wmem = 212992 65536 16777216

vm.min_free_kbytes = 65536

net.ipv4.tcp_congestion_control = cubic

net.ipv4.tcp_timestamps = 0

net.ipv4.tcp_congestion_control = htcp

net.ipv4.tcp_no_metrics_save = 0

 

 

 

echo "#

# tuned configuration

#

[main]

summary=Broadly applicable tuning that provides excellent performance across a 
variety of common server workloads

 

[disk]

devices=!dm-*, !sda1, !sda2, !sda3

readahead=>4096

 

[cpu]

force_latency=1

governor=performance

energy_perf_bias=performance

min_perf_pct=100

[vm]

transparent_huge_pages=never

[sysctl]

kernel.sched_min_granularity_ns = 1000

kernel.sched_wakeup_granularity_ns = 1500

vm.dirty_ratio = 30

vm.dirty_background_ratio = 10

vm.swappiness=30

" > lustre-performance/tuned.conf

 

tuned-adm profile lustre-performance

 

 

Thanks,

Pinkesh Valdria

 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-08 Thread Jon Tegner

Thanks a lot!

A related question: is it possible to use the result from the "ping" 
test to verify the latency obtained from openmpi? Or, how do I know whether 
the result from the "ping" test is "acceptable"?


/jon

On 02/07/2017 06:38 PM, Oucharek, Doug S wrote:

Because the stat command is “lst stat servers”, the statistics you are seeing 
are from the perspective of the server.  The “from” and “to” parameters can get 
quite confusing for the read case.  When reading, you are transferring the bulk 
data from the “to” group to the “from” group (yes, seems the opposite of what 
you would expect).  I think the “from” and “to” labels were designed to make 
sense in the write case and the logic was just flipped for the read case.

So, the stats you show indicate that you are writing an average of 3.6 GiB/s 
(note: the lnet-selftest stats are mislabeled and should be MiB/s rather than 
MB/s…I have fixed this in the latest release.  You are then getting 3.8 GB/s).  
The reason you see traffic in the read direction is due to responses/acks.  
That is why there are a lot of small messages going back to the server (high 
RPC rate, small bandwidth).

So, your test looks like it is working to me.
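
For what it is worth, the same traffic can also be sampled from the client's 
side in the same session, in which case the [R]/[W] labels are relative to the 
readers group:

lst stat readers & sleep 10; kill $!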

Doug


On Feb 7, 2017, at 2:13 AM, Jon Tegner  wrote:

Probably doing something wrong here, but I tried to test only READING with the 
following:

#!/bin/bash
export LST_SESSION=$$
lst new_session read
lst add_group servers 10.0.12.12@o2ib
lst add_group readers 10.0.12.11@o2ib
lst add_batch bulk_read
lst add_test --batch bulk_read --concurrency 12 --from readers --to servers \
brw read check=simple size=1M
lst run bulk_read
lst stat servers & sleep 10; kill $!
lst end_session

which in my case gives:

[LNet Rates of servers]
[R] Avg: 3633 RPC/s Min: 3633 RPC/s Max: 3633 RPC/s
[W] Avg: 7241 RPC/s Min: 7241 RPC/s Max: 7241 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 2.29 MB/s  Min: 2.29 MB/s  Max: 2.29 MB/s
[W] Avg: 3608.44  MB/s  Min: 3608.44  MB/s  Max: 3608.44  MB/s

it seems strange that it reports non-zero numbers in the [W] positions, especially 
since the bandwidth is low in the [R] position (given that I explicitly requested 
"read"). Also note that if I change "brw read" to "brw write" in the script above, 
the results are "reversed", in the sense that the higher bandwidth number appears 
in the [R] position. That is, "brw read" reports (almost) the expected bandwidth 
in the [W] position, whereas "brw write" reports it in the [R] position.

This is on CentOS-6.5/Lustre-2.5.3. Will try 7.3/2.9.0 later.

Thanks,
/jon


On 02/06/2017 05:45 PM, Oucharek, Doug S wrote:

Try running just a read test and then just a write test rather than having both 
at the same time and see if the performance goes up.


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-07 Thread Oucharek, Doug S
Because the stat command is “lst stat servers”, the statistics you are seeing 
are from the perspective of the server.  The “from” and “to” parameters can get 
quite confusing for the read case.  When reading, you are transferring the bulk 
data from the “to” group to the “from” group (yes, seems the opposite of what 
you would expect).  I think the “from” and “to” labels were designed to make 
sense in the write case and the logic was just flipped for the read case.

So, the stats you show indicate that you are writing an average of 3.6 GiB/s 
(note: the lnet-selftest stats are mislabeled and should be MiB/s rather than 
MB/s…I have fixed this in the latest release.  You are then getting 3.8 GB/s).  
The reason you see traffic in the read direction is due to responses/acks.  
That is why there are a lot of small messages going back to the server (high 
RPC rate, small bandwidth).

So, your test looks like it is working to me.

Doug

> On Feb 7, 2017, at 2:13 AM, Jon Tegner  wrote:
> 
> Probably doing something wrong here, but I tried to test only READING with 
> the following:
> 
> #!/bin/bash
> export LST_SESSION=$$
> lst new_session read
> lst add_group servers 10.0.12.12@o2ib
> lst add_group readers 10.0.12.11@o2ib
> lst add_batch bulk_read
> lst add_test --batch bulk_read --concurrency 12 --from readers --to servers \
> brw read check=simple size=1M
> lst run bulk_read
> lst stat servers & sleep 10; kill $!
> lst end_session
> 
> which in my case gives:
> 
> [LNet Rates of servers]
> [R] Avg: 3633 RPC/s Min: 3633 RPC/s Max: 3633 RPC/s
> [W] Avg: 7241 RPC/s Min: 7241 RPC/s Max: 7241 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 2.29 MB/s  Min: 2.29 MB/s  Max: 2.29 MB/s
> [W] Avg: 3608.44  MB/s  Min: 3608.44  MB/s  Max: 3608.44  MB/s
> 
> it seems strange that it reports non-zero numbers in the [W] positions, 
> especially since the bandwidth is low in the [R] position (given that I explicitly 
> requested "read"). Also note that if I change "brw read" to "brw write" in the 
> script above, the results are "reversed", in the sense that the higher bandwidth 
> number appears in the [R] position. That is, "brw read" reports (almost) the 
> expected bandwidth in the [W] position, whereas "brw write" reports it in the 
> [R] position.
> 
> This is on CentOS-6.5/Lustre-2.5.3. Will try 7.3/2.9.0 later.
> 
> Thanks,
> /jon
> 
> 
> On 02/06/2017 05:45 PM, Oucharek, Doug S wrote:
>> Try running just a read test and then just a write test rather than having 
>> both at the same time and see if the performance goes up.
> 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-07 Thread Jon Tegner
Probably doing something wrong here, but I tried to test only READING 
with the following:


#!/bin/bash
export LST_SESSION=$$
lst new_session read
lst add_group servers 10.0.12.12@o2ib
lst add_group readers 10.0.12.11@o2ib
lst add_batch bulk_read
lst add_test --batch bulk_read --concurrency 12 --from readers --to 
servers \

brw read check=simple size=1M
lst run bulk_read
lst stat servers & sleep 10; kill $!
lst end_session

which in my case gives:

[LNet Rates of servers]
[R] Avg: 3633 RPC/s Min: 3633 RPC/s Max: 3633 RPC/s
[W] Avg: 7241 RPC/s Min: 7241 RPC/s Max: 7241 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 2.29 MB/s  Min: 2.29 MB/s  Max: 2.29 MB/s
[W] Avg: 3608.44  MB/s  Min: 3608.44  MB/s  Max: 3608.44  MB/s

it seems strange that it reports non-zero numbers in the [W] 
positions, especially since the bandwidth is low in the [R] position (given 
that I explicitly requested "read"). Also note that if I change "brw read" to 
"brw write" in the script above, the results are "reversed", in the sense 
that the higher bandwidth number appears in the [R] 
position. That is, "brw read" reports (almost) the expected bandwidth in 
the [W] position, whereas "brw write" reports it in the [R] position.


This is on CentOS-6.5/Lustre-2.5.3. Will try 7.3/2.9.0 later.

Thanks,
/jon


On 02/06/2017 05:45 PM, Oucharek, Doug S wrote:

Try running just a read test and then just a write test rather than having both 
at the same time and see if the performance goes up.


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-06 Thread Oucharek, Doug S
Try running just a read test and then just a write test rather than having both 
at the same time and see if the performance goes up.

Doug

> On Feb 6, 2017, at 4:40 AM, Jon Tegner  wrote:
> 
> Hi,
> 
> I used the following script:
> 
> #!/bin/bash
> export LST_SESSION=$$
> lst new_session read/write
> lst add_group servers 10.0.12.12@o2ib
> lst add_group readers 10.0.12.11@o2ib
> lst add_group writers 10.0.12.11@o2ib
> lst add_batch bulk_rw
> lst add_test --batch bulk_rw --concurrency 12 --from readers --to servers \
> brw read check=simple size=1M
> lst add_test --batch bulk_rw --concurrency 12 --from writers --to servers \
> brw write check=simple size=1M
> # start running
> lst run bulk_rw
> # display server stats for 30 seconds
> lst stat servers & sleep 30; kill $!
> # tear down
> lst end_session
> 
> and tried with concurrency from 0,2,4,8,12,16, results in
> 
> http://renget.se/lnetBandwidth.png
> and
> http://renget.se/lnetRates.png
> 
> From the bandwidth plot, a max of just below 2800 MB/s can be noted. Since in this case 
> "readers" and "writers" are the same, I did a few tests with the line
> 
> lst add_test --batch bulk_rw --concurrency 12 --from writers --to servers \
> brw write check=simple size=1M
> 
> removed from the script - which resulted in a bandwidth of around 3600 MB/s.
> 
> I also did tests using mpitests-osu_bw from openmpi, and in that case I 
> monitored a bandwidth of about 3900 MB/s.
> 
> Considering the "openmpi-bandwidth" should I be happy with the numbers 
> obtained by LNet selftest? Is there a way to modify the test so that the 
> result gets closer to what openmpi is giving? And what can be said of the 
> "Rates of servers (RPC/s)" - are they "good" or "bad"? What to compare them 
> with?
> 
> Thanks!
> 
> /jon
> 
> On 02/05/2017 08:55 PM, Jeff Johnson wrote:
>> Without seeing your entire command it is hard to say for sure but I would 
>> make sure your concurrency option is set to 8 for starters.
>> 
>> --Jeff
>> 
>> Sent from my iPhone
>> 
>>> On Feb 5, 2017, at 11:30, Jon Tegner  wrote:
>>> 
>>> Hi,
>>> 
>>> I'm trying to use lnet selftest to evaluate network performance on a test 
>>> setup (only two machines). Using e.g., iperf or Netpipe I've managed to 
>>> demonstrate the bandwidth of the underlying 10 Gbits/s network (and 
>>> typically you reach the expected bandwidth as the packet size increases).
>>> 
>>> How can I do the same using lnet selftest (i.e., verifying the bandwidth of 
>>> the underlying hardware)? My initial thought was to increase the I/O size, 
>>> but it seems the maximum size one can use is "--size=1M".
>>> 
>>> Thanks,
>>> 
>>> /jon
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-06 Thread Oucharek, Doug S
You can have larger RPCs, but those get split up into 1M LNet operations.  
Lnet-selftest works with LNet messages and not RPCs.

Doug

On Feb 5, 2017, at 3:07 PM, Patrick Farrell <p...@cray.com> wrote:

Doug,

It seems to me that's not true any more, with larger RPC sizes available.  Is 
there some reason that's not true?

- Patrick

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Oucharek, Doug S <doug.s.oucha...@intel.com>
Sent: Sunday, February 5, 2017 3:18:10 PM
To: Jeff Johnson
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] LNET Self-test

Yes, you can bump your concurrency.  Size caps out at 1M because that is how 
LNet is setup to work.  Going over 1M size would result in an unrealistic 
Lustre test.

Doug

> On Feb 5, 2017, at 11:55 AM, Jeff Johnson 
> <jeff.john...@aeoncomputing.com> wrote:
>
> Without seeing your entire command it is hard to say for sure but I would 
> make sure your concurrency option is set to 8 for starters.
>
> --Jeff
>
> Sent from my iPhone
>
>> On Feb 5, 2017, at 11:30, Jon Tegner <teg...@foi.se> wrote:
>>
>> Hi,
>>
>> I'm trying to use lnet selftest to evaluate network performance on a test 
>> setup (only two machines). Using e.g., iperf or Netpipe I've managed to 
>> demonstrate the bandwidth of the underlying 10 Gbits/s network (and 
>> typically you reach the expected bandwidth as the packet size increases).
>>
>> How can I do the same using lnet selftest (i.e., verifying the bandwidth of 
>> the underlying hardware)? My initial thought was to increase the I/O size, 
>> but it seems the maximum size one can use is "--size=1M".
>>
>> Thanks,
>>
>> /jon
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-06 Thread Jon Tegner

Hi,

I used the following script:

#!/bin/bash
export LST_SESSION=$$
lst new_session read/write
lst add_group servers 10.0.12.12@o2ib
lst add_group readers 10.0.12.11@o2ib
lst add_group writers 10.0.12.11@o2ib
lst add_batch bulk_rw
lst add_test --batch bulk_rw --concurrency 12 --from readers --to servers \
brw read check=simple size=1M
lst add_test --batch bulk_rw --concurrency 12 --from writers --to servers \
brw write check=simple size=1M
# start running
lst run bulk_rw
# display server stats for 30 seconds
lst stat servers & sleep 30; kill $!
# tear down
lst end_session

and tried with concurrency from 0,2,4,8,12,16, results in

http://renget.se/lnetBandwidth.png
and
http://renget.se/lnetRates.png

From the bandwidth plot, a max of just below 2800 MB/s can be noted. Since in 
this case "readers" and "writers" are the same, I did a few tests with 
the line


lst add_test --batch bulk_rw --concurrency 12 --from writers --to servers \
brw write check=simple size=1M

removed from the script - which resulted in a bandwidth of around 3600 MB/s.

I also did tests using mpitests-osu_bw from openmpi, and in that case I 
monitored a bandwidth of about 3900 MB/s.


Considering the "openmpi-bandwidth" should I be happy with the numbers 
obtained by LNet selftest? Is there a way to modify the test so that the 
result gets closer to what openmpi is giving? And what can be said of 
the "Rates of servers (RPC/s)" - are they "good" or "bad"? What to 
compare them with?
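
One rough cross-check, using the earlier point in this thread that lnet_selftest 
moves bulk data in 1 MiB LNet messages: ~3600 MB/s of bulk bandwidth implies on 
the order of 3600 bulk operations per second, which lines up with the ~3633 [R] 
RPC/s seen at the servers in the read test; the higher [W] RPC rate would then be 
the bulk sends plus the small replies/acks, which carry almost no bandwidth. In 
other words, the RPC rates are consistent with the measured bandwidth rather than 
an independent figure of merit.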


Thanks!

/jon

On 02/05/2017 08:55 PM, Jeff Johnson wrote:

Without seeing your entire command it is hard to say for sure but I would make 
sure your concurrency option is set to 8 for starters.

--Jeff

Sent from my iPhone


On Feb 5, 2017, at 11:30, Jon Tegner  wrote:

Hi,

I'm trying to use lnet selftest to evaluate network performance on a test setup 
(only two machines). Using e.g., iperf or Netpipe I've managed to demonstrate 
the bandwidth of the underlying 10 Gbits/s network (and typically you reach the 
expected bandwidth as the packet size increases).

How can I do the same using lnet selftest (i.e., verifying the bandwidth of the 
underlying hardware)? My initial thought was to increase the I/O size, but it seems the 
maximum size one can use is "--size=1M".

Thanks,

/jon
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-05 Thread Patrick Farrell
Doug,


It seems to me that's not true any more, with larger RPC sizes available.  Is 
there some reason that's not true?


- Patrick


From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Oucharek, Doug S <doug.s.oucha...@intel.com>
Sent: Sunday, February 5, 2017 3:18:10 PM
To: Jeff Johnson
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] LNET Self-test

Yes, you can bump your concurrency.  Size caps out at 1M because that is how 
LNet is setup to work.  Going over 1M size would result in an unrealistic 
Lustre test.

Doug

> On Feb 5, 2017, at 11:55 AM, Jeff Johnson <jeff.john...@aeoncomputing.com> 
> wrote:
>
> Without seeing your entire command it is hard to say for sure but I would 
> make sure your concurrency option is set to 8 for starters.
>
> --Jeff
>
> Sent from my iPhone
>
>> On Feb 5, 2017, at 11:30, Jon Tegner <teg...@foi.se> wrote:
>>
>> Hi,
>>
>> I'm trying to use lnet selftest to evaluate network performance on a test 
>> setup (only two machines). Using e.g., iperf or Netpipe I've managed to 
>> demonstrate the bandwidth of the underlying 10 Gbits/s network (and 
>> typically you reach the expected bandwidth as the packet size increases).
>>
>> How can I do the same using lnet selftest (i.e., verifying the bandwidth of 
>> the underlying hardware)? My initial thought was to increase the I/O size, 
>> but it seems the maximum size one can use is "--size=1M".
>>
>> Thanks,
>>
>> /jon
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-05 Thread Oucharek, Doug S
Yes, you can bump your concurrency.  Size caps out at 1M because that is how 
LNet is setup to work.  Going over 1M size would result in an unrealistic 
Lustre test.

Doug

> On Feb 5, 2017, at 11:55 AM, Jeff Johnson  
> wrote:
> 
> Without seeing your entire command it is hard to say for sure but I would 
> make sure your concurrency option is set to 8 for starters. 
> 
> --Jeff
> 
> Sent from my iPhone
> 
>> On Feb 5, 2017, at 11:30, Jon Tegner  wrote:
>> 
>> Hi,
>> 
>> I'm trying to use lnet selftest to evaluate network performance on a test 
>> setup (only two machines). Using e.g., iperf or Netpipe I've managed to 
>> demonstrate the bandwidth of the underlying 10 Gbits/s network (and 
>> typically you reach the expected bandwidth as the packet size increases).
>> 
>> How can I do the same using lnet selftest (i.e., verifying the bandwidth of 
>> the underlying hardware)? My initial thought was to increase the I/O size, 
>> but it seems the maximum size one can use is "--size=1M".
>> 
>> Thanks,
>> 
>> /jon
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-05 Thread Jeff Johnson
Without seeing your entire command it is hard to say for sure but I would make 
sure your concurrency option is set to 8 for starters. 

--Jeff

Sent from my iPhone

> On Feb 5, 2017, at 11:30, Jon Tegner  wrote:
> 
> Hi,
> 
> I'm trying to use lnet selftest to evaluate network performance on a test 
> setup (only two machines). Using e.g., iperf or Netpipe I've managed to 
> demonstrate the bandwidth of the underlying 10 Gbits/s network (and typically 
> you reach the expected bandwidth as the packet size increases).
> 
> How can I do the same using lnet selftest (i.e., verifying the bandwidth of 
> the underlying hardware)? My initial thought was to increase the I/O size, 
> but it seems the maximum size one can use is "--size=1M".
> 
> Thanks,
> 
> /jon
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-05 Thread Raj
You should be able to do concurrent streams using --concurrency option. I
would try with 2/4/8.
-RG

On Sun, Feb 5, 2017 at 1:30 PM Jon Tegner  wrote:

> Hi,
>
> I'm trying to use lnet selftest to evaluate network performance on a
> test setup (only two machines). Using e.g., iperf or Netpipe I've
> managed to demonstrate the bandwidth of the underlying 10 Gbits/s
> network (and typically you reach the expected bandwidth as the packet
> size increases).
>
> How can I do the same using lnet selftest (i.e., verifying the bandwidth
> of the underlying hardware)? My initial thought was to increase the I/O
> size, but it seems the maximum size one can use is "--size=1M".
>
> Thanks,
>
> /jon
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] LNET Self-test

2017-02-05 Thread Jon Tegner

Hi,

I'm trying to use lnet selftest to evaluate network performance on a 
test setup (only two machines). Using e.g., iperf or Netpipe I've 
managed to demonstrate the bandwidth of the underlying 10 Gbits/s 
network (and typically you reach the expected bandwidth as the packet 
size increases).


How can I do the same using lnet selftest (i.e., verifying the bandwidth 
of the underlying hardware)? My initial thought was to increase the I/O 
size, but it seems the maximum size one can use is "--size=1M".


Thanks,

/jon
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org