Re: [lustre-discuss] Lustre poor performance

2017-08-28 Thread Riccardo Veraldi
for Qlogic the script works, but then there is another parameter to
change in the peer credits value, otherwise Lustre will complain and it
will not work.
At least this is the case for my old Qlogic QDR cards.
I do not know if this applies to newer Qlogic cards too.

I'll write a patch to the script that will work for Mellanox cards
(ConnectX-3 family).
I can't speak for ConnectX-4 because I have no experience with those right
now.
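A rough sketch of what such a patch might look like, purely illustrative (the
"mlx" profile name is hypothetical, the detection logic assumes the probe
script keys off the device-name prefix under /sys/class/infiniband, and the
option values are simply the ConnectX-3 settings posted later in this thread):

# hypothetical extra case in /usr/sbin/ko2iblnd-probe
case "$dev" in
    hfi*|qib*) profile=opa ;;
    mlx*)      profile=mlx ;;   # Mellanox ConnectX HCAs
esac

# matching (hypothetical) entry in /etc/modprobe.d/ko2iblnd.conf
alias ko2iblnd-mlx ko2iblnd
options ko2iblnd-mlx peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe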
 


On 8/23/17 4:36 PM, Dilger, Andreas wrote:
> On Aug 23, 2017, at 08:39, Mohr Jr, Richard Frank (Rick Mohr)  
> wrote:
>>
>>> On Aug 22, 2017, at 7:14 PM, Riccardo Veraldi 
>>>  wrote:
>>>
>>> On 8/22/17 9:22 AM, Mannthey, Keith wrote:
 You may want to file a jira ticket if ko2iblnd-opa settings were being automatically used on your Mellanox setup.  That is not expected.

>>> yes they are automatically used on my Mellanox and the script 
>>> ko2iblnd-probe seems like not working properly.
>> The ko2iblnd-probe script looks in /sys/class/infiniband for device names 
>> starting with “hfi” or “qib”.  If it detects those, it decides that the 
>> “profile” it should use is “opa” so then it basically invokes the 
>> ko2iblnd-opa modprobe line.  But the script has no logic to detect other 
>> types of card (i.e. - mellanox), so in those cases, no ko2iblnd options are 
>> used and you end up with the default module parameters being used.
>>
>> If you want to use the script, you will need to modify ko2iblnd-probe to add 
>> a new case for your brand of HCA and then add an appropriate 
>> ko2iblnd- line to ko2iblnd.conf.
>>
>> Or just do what I did and comment out all the lines in ko2iblnd.conf and add 
>> your own lines.
> If there are significantly different options needed for newer Mellanox HCAs 
> (e.g. as between Qlogic/OPA and MLX) it would be great to get a patch to 
> ko2iblnd-probe and ko2iblnd.conf that adds those options as the default for 
> the new type of card, so that Lustre works better out of the box.  That helps 
> transfer the experience of veteran IB users to users that may not have the 
> background to get the best LNet IB performance.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
>
>
>
>
>
>
>
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-28 Thread Riccardo Veraldi
On 8/23/17 7:39 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> On Aug 22, 2017, at 7:14 PM, Riccardo Veraldi 
>>  wrote:
>>
>> On 8/22/17 9:22 AM, Mannthey, Keith wrote:
 You may want to file a jira ticket if ko2iblnd-opa settings were being automatically used on your Mellanox setup.  That is not expected.
>>>
>> yes they are automatically used on my Mellanox and the script ko2iblnd-probe 
>> seems like not working properly.
> The ko2iblnd-probe script looks in /sys/class/infiniband for device names 
> starting with “hfi” or “qib”.  If it detects those, it decides that the 
> “profile” it should use is “opa” so then it basically invokes the 
> ko2iblnd-opa modprobe line.  But the script has no logic to detect other 
> types of card (i.e. - mellanox), so in those cases, no ko2iblnd options are 
> used and you end up with the default module parameters being used.
>
> If you want to use the script, you will need to modify ko2iblnd-probe to add 
> a new case for your brand of HCA and then add an appropriate 
> ko2iblnd- line to ko2iblnd.conf.
>
> Or just do what I did and comment out all the lines in ko2iblnd.conf and add 
> your own lines.
Yes, what I did was to disable the module alias and just use

options ko2iblnd ...
install ko2iblnd ...

and it worked.
I may modify the script as you mentioned as well.

thank you.
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
>
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-23 Thread Dilger, Andreas
On Aug 23, 2017, at 08:39, Mohr Jr, Richard Frank (Rick Mohr)  
wrote:
> 
> 
>> On Aug 22, 2017, at 7:14 PM, Riccardo Veraldi 
>>  wrote:
>> 
>> On 8/22/17 9:22 AM, Mannthey, Keith wrote:
>>> You may want to file a jira ticket if ko2iblnd-opa settings were being automatically used on your Mellanox setup.  That is not expected.
>>> 
>> yes they are automatically used on my Mellanox and the script ko2iblnd-probe 
>> seems like not working properly.
> 
> The ko2iblnd-probe script looks in /sys/class/infiniband for device names 
> starting with “hfi” or “qib”.  If it detects those, it decides that the 
> “profile” it should use is “opa” so then it basically invokes the 
> ko2iblnd-opa modprobe line.  But the script has no logic to detect other 
> types of card (i.e. - mellanox), so in those cases, no ko2iblnd options are 
> used and you end up with the default module parameters being used.
> 
> If you want to use the script, you will need to modify ko2iblnd-probe to add 
> a new case for your brand of HCA and then add an appropriate 
> ko2iblnd- line to ko2iblnd.conf.
> 
> Or just do what I did and comment out all the lines in ko2iblnd.conf and add 
> your own lines.

If there are significantly different options needed for newer Mellanox HCAs 
(e.g. as between Qlogic/OPA and MLX) it would be great to get a patch to 
ko2iblnd-probe and ko2iblnd.conf that adds those options as the default for the 
new type of card, so that Lustre works better out of the box.  That helps 
transfer the experience of veteran IB users to users that may not have the 
background to get the best LNet IB performance.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-23 Thread Mohr Jr, Richard Frank (Rick Mohr)

> On Aug 22, 2017, at 7:14 PM, Riccardo Veraldi  
> wrote:
> 
> On 8/22/17 9:22 AM, Mannthey, Keith wrote:
>> You may want to file a jira ticket if ko2iblnd-opa settings were being automatically used on your Mellanox setup.  That is not expected.
>> 
> yes they are automatically used on my Mellanox and the script ko2iblnd-probe 
> seems like not working properly.

The ko2iblnd-probe script looks in /sys/class/infiniband for device names 
starting with “hfi” or “qib”.  If it detects those, it decides that the 
“profile” it should use is “opa” so then it basically invokes the ko2iblnd-opa 
modprobe line.  But the script has no logic to detect other types of card (i.e. 
- mellanox), so in those cases, no ko2iblnd options are used and you end up 
with the default module parameters being used.

If you want to use the script, you will need to modify ko2iblnd-probe to add a 
new case for your brand of HCA and then add an appropriate ko2iblnd-<suffix> options 
line to ko2iblnd.conf.

Or just do what I did and comment out all the lines in ko2iblnd.conf and add 
your own lines.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-22 Thread Riccardo Veraldi
On 8/22/17 9:22 AM, Mannthey, Keith wrote:
>
> You may want to file a jira ticket if ko2iblnd-opa settings were being
> automatically used on your Mellanox setup.  That is not expected.
>
Yes, they are automatically used on my Mellanox setup, and the
ko2iblnd-probe script does not seem to be working properly.
>
>  
>
> On another note:  As you note, your NVMe backend is much faster than QDR
> link speed.  You may want to look at using the new Multi-Rail LNet
> feature to boost network bandwidth.  You can add a 2nd QDR HCA/port
> and get more LNet bandwidth from your OSS server.   It is a new feature
> that is a bit of work to use, but if you are chasing bandwidth it might
> be worth the effort.
>
I have a dual-port InfiniBand card, so I was thinking of bonding the two
ports to get more bandwidth. Is this what you mean when you talk about the
Multi-Rail feature boost?

thanks

Rick
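For reference, Multi-Rail (introduced in Lustre 2.10) is configured at the LNet
layer with lnetctl rather than by bonding the IPoIB interfaces; a minimal
sketch, assuming the two ports come up as ib0 and ib1 on the same o2ib5
network (the interface names here are illustrative):

# configure LNet and add both ports to the same network (repeat on each peer)
lnetctl lnet configure
lnetctl net add --net o2ib5 --if ib0,ib1
# check the result and persist it for the next boot
lnetctl net show
lnetctl export > /etc/lnet.conf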


>  
>
> Thanks,
>
> Keith
>
>  
>
> *From:*lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org]
> *On Behalf Of *Chris Horn
> *Sent:* Monday, August 21, 2017 12:40 PM
> *To:* Riccardo Veraldi <riccardo.vera...@cnaf.infn.it>; Arman
> Khalatyan <arm2...@gmail.com>
> *Cc:* lustre-discuss@lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre poor performance
>
>  
>
> The ko2iblnd-opa settings are tuned specifically for Intel OmniPath.
> Take a look at the /usr/sbin/ko2iblnd-probe script to see how OPA
> hardware is detected and the “ko2iblnd-opa” settings get used.
>
>  
>
> Chris Horn
>
>  
>
> *From: *lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
> Riccardo Veraldi <riccardo.vera...@cnaf.infn.it>
> *Date: *Saturday, August 19, 2017 at 5:00 PM
> *To: *Arman Khalatyan <arm2...@gmail.com>
> *Cc: *"lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> *Subject: *Re: [lustre-discuss] Lustre poor performance
>
>  
>
> I ran again my Lnet self test and  this time adding --concurrency=16 
> I can use all of the IB bandwith (3.5GB/sec).
>
> the only thing I do not understand is why ko2iblnd.conf is not loaded
> properly and I had to remove the alias in the config file to allow
> the proper peer_credit settings to be loaded.
>
> thanks to everyone for helping
>
> Riccardo
>
> On 8/19/17 8:54 AM, Riccardo Veraldi wrote:
>
>
> I found out that ko2iblnd is not getting settings from
> /etc/modprobe/ko2iblnd.conf
> alias ko2iblnd-opa ko2iblnd
> options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64
> credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32
> fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>
> install ko2iblnd /usr/sbin/ko2iblnd-probe
>
> but if I modify ko2iblnd.conf like this, then settings are loaded:
>
> options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
> fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>
> install ko2iblnd /usr/sbin/ko2iblnd-probe
>
> Lnet tests show better behaviour but still I Would expect more
> than this.
> Is it possible to tune parameters in /etc/modprobe/ko2iblnd.conf
> so that Mellanox ConnectX-3 will work more efficiently ?
>
> [LNet Rates of servers]
> [R] Avg: 2286 RPC/s Min: 0RPC/s Max: 4572 RPC/s
> [W] Avg: 3322 RPC/s Min: 0RPC/s Max: 6643 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 625.23   MiB/s Min: 0.00 MiB/s Max: 1250.46  MiB/s
> [W] Avg: 1035.85  MiB/s Min: 0.00 MiB/s Max: 2071.69  MiB/s
> [LNet Rates of servers]
> [R] Avg: 2286 RPC/s Min: 1RPC/s Max: 4571 RPC/s
> [W] Avg: 3321 RPC/s Min: 1RPC/s Max: 6641 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 625.55   MiB/s Min: 0.00 MiB/s Max: 1251.11  MiB/s
> [W] Avg: 1035.05  MiB/s Min: 0.00 MiB/s Max: 2070.11  MiB/s
> [LNet Rates of servers]
> [R] Avg: 2291 RPC/s Min: 0RPC/s Max: 4581 RPC/s
> [W] Avg: 3329 RPC/s Min: 0RPC/s Max: 6657 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 626.55   MiB/s Min: 0.00 MiB/s Max: 1253.11  MiB/s
> [W] Avg: 1038.05  MiB/s Min: 0.00 MiB/s Max: 2076.11  MiB/s
> session is ended
> ./lnet_test.sh: line 17: 23394 Terminated  lst stat
> servers
>
>
>
>
> On 8/19/17 4:

Re: [lustre-discuss] Lustre poor performance

2017-08-22 Thread Mannthey, Keith
You may want to file a jira ticket if ko2iblnd-opa settings were being 
automatically used on your Mellanox setup.  That is not expected.

On another note:  As you note, your NVMe backend is much faster than QDR link 
speed.  You may want to look at using the new Multi-Rail LNet feature to boost 
network bandwidth.  You can add a 2nd QDR HCA/port and get more LNet bandwidth 
from your OSS server.   It is a new feature that is a bit of work to use, but if 
you are chasing bandwidth it might be worth the effort.

Thanks,
Keith

From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Chris Horn
Sent: Monday, August 21, 2017 12:40 PM
To: Riccardo Veraldi <riccardo.vera...@cnaf.infn.it>; Arman Khalatyan 
<arm2...@gmail.com>
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre poor performance

The ko2iblnd-opa settings are tuned specifically for Intel OmniPath. Take a 
look at the /usr/sbin/ko2iblnd-probe script to see how OPA hardware is detected 
and the “ko2iblnd-opa” settings get used.

Chris Horn

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Riccardo Veraldi <riccardo.vera...@cnaf.infn.it>
Date: Saturday, August 19, 2017 at 5:00 PM
To: Arman Khalatyan <arm2...@gmail.com>
Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Lustre poor performance

I ran my LNet self-test again, and this time, adding --concurrency=16, I can use 
all of the IB bandwidth (3.5GB/sec).

The only thing I do not understand is why ko2iblnd.conf is not loaded properly 
and I had to remove the alias in the config file to allow
the proper peer_credits settings to be loaded.

thanks to everyone for helping

Riccardo

On 8/19/17 8:54 AM, Riccardo Veraldi wrote:

I found out that ko2iblnd is not getting its settings from 
/etc/modprobe.d/ko2iblnd.conf:
alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe

but if I modify ko2iblnd.conf like this, then settings are loaded:

options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe

LNet tests show better behaviour, but I would still expect more than this.
Is it possible to tune parameters in /etc/modprobe.d/ko2iblnd.conf so that 
Mellanox ConnectX-3 will work more efficiently?

[LNet Rates of servers]
[R] Avg: 2286 RPC/s Min: 0RPC/s Max: 4572 RPC/s
[W] Avg: 3322 RPC/s Min: 0RPC/s Max: 6643 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 625.23   MiB/s Min: 0.00 MiB/s Max: 1250.46  MiB/s
[W] Avg: 1035.85  MiB/s Min: 0.00 MiB/s Max: 2071.69  MiB/s
[LNet Rates of servers]
[R] Avg: 2286 RPC/s Min: 1RPC/s Max: 4571 RPC/s
[W] Avg: 3321 RPC/s Min: 1RPC/s Max: 6641 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 625.55   MiB/s Min: 0.00 MiB/s Max: 1251.11  MiB/s
[W] Avg: 1035.05  MiB/s Min: 0.00 MiB/s Max: 2070.11  MiB/s
[LNet Rates of servers]
[R] Avg: 2291 RPC/s Min: 0RPC/s Max: 4581 RPC/s
[W] Avg: 3329 RPC/s Min: 0RPC/s Max: 6657 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 626.55   MiB/s Min: 0.00 MiB/s Max: 1253.11  MiB/s
[W] Avg: 1038.05  MiB/s Min: 0.00 MiB/s Max: 2076.11  MiB/s
session is ended
./lnet_test.sh: line 17: 23394 Terminated  lst stat servers




On 8/19/17 4:20 AM, Arman Khalatyan wrote:
Just a minor comment:
you should push up the performance of your nodes; they are not running at the max 
CPU frequency, so all tests might be inconsistent. In order to get the most out of IB, run the 
following:
tuned-adm profile latency-performance
for more options use:
tuned-adm list

It will be interesting to see the difference.

On 19.08.2017 at 3:57 AM, "Riccardo Veraldi" 
<riccardo.vera...@cnaf.infn.it> wrote:
Hello Keith and Dennis, these are the tests I ran.

  *   obdfilter-survey shows that I can saturate disk performance; the 
NVMe/ZFS backend is performing very well and it is faster than my InfiniBand 
network

pool  alloc   free   read  write   read  write
  -  -  -  -  -  -
drpffb-ost01  3.31T  3.19T  3  35.7K  16.0K  7.03G
  raidz1  3.31T  3.19T  3  35.7K  16.0K  7.03G
nvme0n1   -  -  1  5.95K  7.99K  1.17G
nvme1n1   -  -  0  6.01K  0  1.18G
nvme2n1   -  -  0  5.93K 

Re: [lustre-discuss] Lustre poor performance

2017-08-21 Thread Chris Horn
The ko2iblnd-opa settings are tuned specifically for Intel OmniPath. Take a 
look at the /usr/sbin/ko2iblnd-probe script to see how OPA hardware is detected 
and the “ko2iblnd-opa” settings get used.

Chris Horn

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Riccardo Veraldi <riccardo.vera...@cnaf.infn.it>
Date: Saturday, August 19, 2017 at 5:00 PM
To: Arman Khalatyan <arm2...@gmail.com>
Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Lustre poor performance

I ran my LNet self-test again, and this time, adding --concurrency=16, I can use 
all of the IB bandwidth (3.5GB/sec).

The only thing I do not understand is why ko2iblnd.conf is not loaded properly 
and I had to remove the alias in the config file to allow
the proper peer_credits settings to be loaded.

thanks to everyone for helping

Riccardo

On 8/19/17 8:54 AM, Riccardo Veraldi wrote:

I found out that ko2iblnd is not getting its settings from 
/etc/modprobe.d/ko2iblnd.conf:
alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe

but if I modify ko2iblnd.conf like this, then settings are loaded:

options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe

LNet tests show better behaviour, but I would still expect more than this.
Is it possible to tune parameters in /etc/modprobe.d/ko2iblnd.conf so that 
Mellanox ConnectX-3 will work more efficiently?

[LNet Rates of servers]
[R] Avg: 2286 RPC/s Min: 0RPC/s Max: 4572 RPC/s
[W] Avg: 3322 RPC/s Min: 0RPC/s Max: 6643 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 625.23   MiB/s Min: 0.00 MiB/s Max: 1250.46  MiB/s
[W] Avg: 1035.85  MiB/s Min: 0.00 MiB/s Max: 2071.69  MiB/s
[LNet Rates of servers]
[R] Avg: 2286 RPC/s Min: 1RPC/s Max: 4571 RPC/s
[W] Avg: 3321 RPC/s Min: 1RPC/s Max: 6641 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 625.55   MiB/s Min: 0.00 MiB/s Max: 1251.11  MiB/s
[W] Avg: 1035.05  MiB/s Min: 0.00 MiB/s Max: 2070.11  MiB/s
[LNet Rates of servers]
[R] Avg: 2291 RPC/s Min: 0RPC/s Max: 4581 RPC/s
[W] Avg: 3329 RPC/s Min: 0RPC/s Max: 6657 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 626.55   MiB/s Min: 0.00 MiB/s Max: 1253.11  MiB/s
[W] Avg: 1038.05  MiB/s Min: 0.00 MiB/s Max: 2076.11  MiB/s
session is ended
./lnet_test.sh: line 17: 23394 Terminated  lst stat servers




On 8/19/17 4:20 AM, Arman Khalatyan wrote:
Just a minor comment:
you should push up the performance of your nodes; they are not running at the max 
CPU frequency, so all tests might be inconsistent. In order to get the most out of IB, run the 
following:
tuned-adm profile latency-performance
for more options use:
tuned-adm list

It will be interesting to see the difference.

On 19.08.2017 at 3:57 AM, "Riccardo Veraldi" 
<riccardo.vera...@cnaf.infn.it> wrote:
Hello Keith and Dennis, these are the tests I ran.

  *   obdfilter-survey shows that I can saturate disk performance; the 
NVMe/ZFS backend is performing very well and it is faster than my InfiniBand 
network

pool  alloc   free   read  write   read  write
  -  -  -  -  -  -
drpffb-ost01  3.31T  3.19T  3  35.7K  16.0K  7.03G
  raidz1  3.31T  3.19T  3  35.7K  16.0K  7.03G
nvme0n1   -  -  1  5.95K  7.99K  1.17G
nvme1n1   -  -  0  6.01K  0  1.18G
nvme2n1   -  -  0  5.93K  0  1.17G
nvme3n1   -  -  0  5.88K  0  1.16G
nvme4n1   -  -  1  5.95K  7.99K  1.17G
nvme5n1   -  -  0  5.96K  0  1.17G
  -  -  -  -  -  -
these are the test results

Fri Aug 18 16:54:48 PDT 2017 Obdfilter-survey for case=disk from drp-tst-ffb01
ost  1 sz 10485760K rsz 1024K obj1 thr1 write 7633.08 SHORT 
rewrite 7558.78 SHORT read 3205.24 [3213.70, 3226.78]
ost  1 sz 10485760K rsz 1024K obj1 thr2 write 7996.89 SHORT 
rewrite 7903.42 SHORT read 5264.70 SHORT
ost  1 sz 10485760K rsz 1024K obj2 thr2 write 7718.94 SHORT 
rewrite 7977.84 SHORT read 5802.17 SHORT

  *   LNet self-test, and here I see the problems. For reference, 
172.21.52.[83,84] are the two OSSes and 172.21.52.86 is the reader/writer. Here is 
the script that I ran

#!/bin/bash
export LST_SESSION=$$
lst new_session read_write
lst add_group servers 172.21.52.[83,84]@o2ib5
lst add_group readers 172
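For completeness, a full script of this kind follows the pattern of the LNet
self-test example in the Lustre manual; the version below is an illustrative
reconstruction (group membership, test sizes and the --concurrency placement
are assumptions, not the exact script that was run):

#!/bin/bash
export LST_SESSION=$$
lst new_session read_write
lst add_group servers 172.21.52.[83,84]@o2ib5
lst add_group readers 172.21.52.86@o2ib5
lst add_group writers 172.21.52.86@o2ib5
lst add_batch bulk_rw
lst add_test --batch bulk_rw --concurrency 16 --from readers --to servers brw read size=1M
lst add_test --batch bulk_rw --concurrency 16 --from writers --to servers brw write size=1M
lst run bulk_rw
lst stat servers &    # sample server-side LNet rates while the batch runs
sleep 30; kill $!
lst end_session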

Re: [lustre-discuss] Lustre poor performance

2017-08-19 Thread Riccardo Veraldi
>> 
>> ---
>>
>> RDMA modules are loaded
>>
>> rpcrdma90366  0
>> rdma_ucm   26837  0
>> ib_uverbs  51854  2 ib_ucm,rdma_ucm
>> rdma_cm53755  5
>> rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert
>> ib_cm  47149  5
>> rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
>> iw_cm  46022  1 rdma_cm
>> ib_core   210381  15
>> 
>> rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
>> sunrpc334343  17
>> nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl
>>
>> I do not know where to look to have Lnet performing faster. I am
>> running my ib0 interface in connected mode with 65520 MTU size.
>>
>> Any hint will be much appreciated
>>
>> thank you
>>
>> Rick
>>
>>
>>
>>
>> On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>>> I would suggest you a few other tests to help isolate where the issue 
>>> might be.  
>>>
>>> 1. What is the single thread "DD" write speed?
>>>  
>>> 2. Lnet_selfttest:  Please see " Chapter 28. Testing Lustre Network 
>>> Performance (LNet Self-Test)" in the Lustre manual if this is a new test 
>>> for you. 
>>> This will help show how much Lnet bandwith you have from your single 
>>> client.  There are tunable in the lnet later that can affect things.  Which 
>>> QRD HCA are you using?
>>>
>>> 3. OBDFilter_survey :  Please see " 29.3. Testing OST Performance 
>>> (obdfilter-survey)" in the Lustre manual.  This test will help demonstrate 
>>> what the backed NVMe/ZFS setup can do at the OBD layer in Lustre.  
>>>
>>> Thanks,
>>>  Keith 
>>> -Original Message-
>>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org
>>> <mailto:lustre-discuss-boun...@lists.lustre.org>] On Behalf Of Riccardo 
>>> Veraldi
>>> Sent: Thursday, August 17, 2017 10:48 PM
>>> To: Dennis Nelson <dnel...@ddn.com> <mailto:dnel...@ddn.com>; 
>>> lustre-discuss@lists.lustre.org
>>> <mailto:lustre-discuss@lists.lustre.org>
>>> Subject: Re: [lustre-discuss] Lustre poor performance
>>>
>>> this is my lustre.conf
>>>
>>> [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet 
>>> networks=o2ib5(ib0),tcp5(enp1s0f0)
>>>
>>> data transfer is over infiniband
>>>
>>> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>>> inet 172.21.52.83  netmask 255.255.252.0  broadcast 
>>> 172.21.55.255
>>>
>>>
>>> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>>>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>>>> It appears that you are running iozone on a single client?  What kind 
>>>>> of network is tcp5?  Have you looked at the network to make sure it is 
>>>>> not the bottleneck?
>>>>>
>>>> yes the data transfer is on ib0 interface and I did a memory to memory 
>>>> test through InfiniBand QDR  resulting in 3.7GB/sec.
>>>> tcp is used to connect to the MDS. It is tcp5 to differentiate it from 
>>>> my other many Lustre clusters. I could have called it tcp but it does 
>>>> not make any difference performance wise.
>>>> I ran the test from one single node yes, I ran the same test also 
>>>> locally on a zpool identical to the one on the Lustre OSS.
>>>>  Ihave 4 identical servers each of them with the aame nvme disks:
>>>>
>>>> server1: OSS - OST1 Lustre/ZFS  raidz1
>>>>
>>>> server2: OSS - OST2 Lustre/ZFS  raidz1
>>>>
>>>> server3: local ZFS raidz1
>>>>
>>>> server4: Lustre client
>>>>
>>>>
>>>>
>>>> ___
>>>> lustre-discuss mailing list
>>>> lustre-discuss@lists.lustre.org
>>>> <mailto:lustre-discuss@lists.lustre.org>
>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>> <http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org>
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> <mailto:lustre-discuss@lists.lustre.org>
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>> <http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org>
>>>
>> ___ lustre-discuss
>> mailing list lustre-discuss@lists.lustre.org
>> <mailto:lustre-discuss@lists.lustre.org>
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>> <http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org> 
>>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-19 Thread Riccardo Veraldi
 Frequency is not max.
>  1281000 0.00   845.45   
> 6.925918
> Conflicting CPU frequency values detected: 1469.703000 !=
> 1362.257000. CPU Frequency is not max.
>  2561000 0.00   1746.93  
> 7.155406
> Conflicting CPU frequency values detected: 1469.703000 !=
> 1362.257000. CPU Frequency is not max.
>  5121000 0.00   2766.93  
> 5.82
> Conflicting CPU frequency values detected: 1296.714000 !=
> 1204.675000. CPU Frequency is not max.
>  1024   1000 0.00   3516.26  
> 3.600646
> Conflicting CPU frequency values detected: 1296.714000 !=
> 1325.535000. CPU Frequency is not max.
>  2048   1000 0.00   3630.93  
> 1.859035
> Conflicting CPU frequency values detected: 1296.714000 !=
> 1331.312000. CPU Frequency is not max.
>  4096   1000 0.00   3702.39  
> 0.947813
> Conflicting CPU frequency values detected: 1296.714000 !=
> 1200.027000. CPU Frequency is not max.
>  8192   1000 0.00   3724.82  
> 0.476777
> Conflicting CPU frequency values detected: 1384.902000 !=
> 1314.113000. CPU Frequency is not max.
>  16384  1000 0.00   3731.21  
> 0.238798
> Conflicting CPU frequency values detected: 1578.078000 !=
> 1200.027000. CPU Frequency is not max.
>  32768  1000 0.00   3735.32  
> 0.119530
> Conflicting CPU frequency values detected: 1578.078000 !=
> 1200.027000. CPU Frequency is not max.
>  65536  1000 0.00   3736.98  
> 0.059792
> Conflicting CPU frequency values detected: 1578.078000 !=
> 1200.027000. CPU Frequency is not max.
>  131072 1000 0.00   3737.80  
> 0.029902
> Conflicting CPU frequency values detected: 1578.078000 !=
> 1200.027000. CPU Frequency is not max.
>  262144 1000 0.00   3738.43  
> 0.014954
> Conflicting CPU frequency values detected: 1570.507000 !=
> 1200.027000. CPU Frequency is not max.
>  524288 1000 0.00   3738.50  
> 0.007477
> Conflicting CPU frequency values detected: 1457.019000 !=
> 1236.152000. CPU Frequency is not max.
>  10485761000 0.00   3738.65  
> 0.003739
> Conflicting CPU frequency values detected: 1411.597000 !=
> 1234.957000. CPU Frequency is not max.
>  20971521000 0.00   3738.65  
> 0.001869
> Conflicting CPU frequency values detected: 1369.828000 !=
> 1516.851000. CPU Frequency is not max.
>  41943041000 0.00   3738.80  
> 0.000935
> Conflicting CPU frequency values detected: 1564.664000 !=
> 1247.574000. CPU Frequency is not max.
>  83886081000 0.00   3738.76  
> 0.000467
> 
> ---
>
> RDMA modules are loaded
>
> rpcrdma90366  0
> rdma_ucm   26837  0
> ib_uverbs  51854  2 ib_ucm,rdma_ucm
> rdma_cm53755  5
> rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert
> ib_cm  47149  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
> iw_cm  46022  1 rdma_cm
> ib_core   210381  15
> 
> rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
> sunrpc334343  17
> nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl
>
> I do not know where to look to have Lnet performing faster. I am
> running my ib0 interface in connected mode with 65520 MTU size.
>
> Any hint will be much appreciated
>
> thank you
>
> Rick
>
>
>
>
> On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>> I would suggest you a few other tests to help isolate where the issue 
>> might be.  
>>
>> 1. What is the single thread "DD" write speed?
>>  
>> 2. Lnet_selfttest:  Please see " Chapter 28. Testing Lustre Network 
>> Performance (LNet Self-Test)" in the Lustre manual if this is a new test for 
>> you. 
>> This will help show how much Lnet

Re: [lustre-discuss] Lustre poor performance

2017-08-19 Thread Arman Khalatyan
,
ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
sunrpc334343  17 nfs,nfsd,rpcsec_gss_krb5,auth_
rpcgss,lockd,nfsv4,rpcrdma,nfs_acl

I do not know where to look to have Lnet performing faster. I am running my
ib0 interface in connected mode with 65520 MTU size.

Any hint will be much appreciated

thank you

Rick




On 8/18/17 9:05 AM, Mannthey, Keith wrote:

I would suggest a few other tests to help isolate where the issue
might be.

1. What is the single thread "DD" write speed?

2. LNet self-test:  Please see "Chapter 28. Testing Lustre Network
Performance (LNet Self-Test)" in the Lustre manual if this is a new
test for you.
This will help show how much LNet bandwidth you have from your single
client.  There are tunables in the LNet layer that can affect things.
Which QDR HCA are you using?

3. obdfilter-survey:  Please see "29.3. Testing OST Performance
(obdfilter-survey)" in the Lustre manual.  This test will help
demonstrate what the backend NVMe/ZFS setup can do at the OBD layer in
Lustre.

Thanks,
 Keith
-Original Message-
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Riccardo Veraldi
Sent: Thursday, August 17, 2017 10:48 PM
To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre poor performance

this is my lustre.conf

[drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet
networks=o2ib5(ib0),tcp5(enp1s0f0)

data transfer is over infiniband

ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255


On 8/17/17 10:45 PM, Riccardo Veraldi wrote:

On 8/17/17 9:22 PM, Dennis Nelson wrote:

It appears that you are running iozone on a single client?  What kind
of network is tcp5?  Have you looked at the network to make sure it is
not the bottleneck?


yes the data transfer is on ib0 interface and I did a memory to memory
test through InfiniBand QDR  resulting in 3.7GB/sec.
tcp is used to connect to the MDS. It is tcp5 to differentiate it from
my other many Lustre clusters. I could have called it tcp but it does
not make any difference performance wise.
I ran the test from one single node, yes; I also ran the same test
locally on a zpool identical to the one on the Lustre OSS.
I have 4 identical servers, each of them with the same NVMe disks:

server1: OSS - OST1 Lustre/ZFS  raidz1

server2: OSS - OST2 Lustre/ZFS  raidz1

server3: local ZFS raidz1

server4: Lustre client






___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-18 Thread Riccardo Veraldi
.21  
>> 0.238798
>> Conflicting CPU frequency values detected: 1578.078000 !=
>> 1200.027000. CPU Frequency is not max.
>>  32768  1000 0.00   3735.32  
>> 0.119530
>> Conflicting CPU frequency values detected: 1578.078000 !=
>> 1200.027000. CPU Frequency is not max.
>>  65536  1000 0.00   3736.98  
>> 0.059792
>> Conflicting CPU frequency values detected: 1578.078000 !=
>> 1200.027000. CPU Frequency is not max.
>>  131072 1000 0.00   3737.80  
>> 0.029902
>> Conflicting CPU frequency values detected: 1578.078000 !=
>> 1200.027000. CPU Frequency is not max.
>>  262144 1000 0.00   3738.43  
>> 0.014954
>> Conflicting CPU frequency values detected: 1570.507000 !=
>> 1200.027000. CPU Frequency is not max.
>>  524288 1000 0.00   3738.50  
>> 0.007477
>> Conflicting CPU frequency values detected: 1457.019000 !=
>> 1236.152000. CPU Frequency is not max.
>>  10485761000 0.00   3738.65  
>> 0.003739
>> Conflicting CPU frequency values detected: 1411.597000 !=
>> 1234.957000. CPU Frequency is not max.
>>  20971521000 0.00   3738.65  
>> 0.001869
>> Conflicting CPU frequency values detected: 1369.828000 !=
>> 1516.851000. CPU Frequency is not max.
>>  41943041000 0.00   3738.80  
>> 0.000935
>> Conflicting CPU frequency values detected: 1564.664000 !=
>> 1247.574000. CPU Frequency is not max.
>>  83886081000 0.00   3738.76  
>> 0.000467
>> ---
>>
>> RDMA modules are loaded
>>
>> rpcrdma90366  0
>> rdma_ucm   26837  0
>> ib_uverbs  51854  2 ib_ucm,rdma_ucm
>> rdma_cm53755  5
>> rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert
>> ib_cm  47149  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
>> iw_cm  46022  1 rdma_cm
>> ib_core   210381  15
>> rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
>> sunrpc334343  17
>> nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl
>>
>> I do not know where to look to have Lnet performing faster. I am
>> running my ib0 interface in connected mode with 65520 MTU size.
>>
>> Any hint will be much appreciated
>>
>> thank you
>>
>> Rick
>>
>>
>>
>>
>> On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>>> I would suggest you a few other tests to help isolate where the issue might 
>>> be.  
>>>
>>> 1. What is the single thread "DD" write speed?
>>>  
>>> 2. Lnet_selfttest:  Please see " Chapter 28. Testing Lustre Network 
>>> Performance (LNet Self-Test)" in the Lustre manual if this is a new test 
>>> for you. 
>>> This will help show how much Lnet bandwith you have from your single 
>>> client.  There are tunable in the lnet later that can affect things.  Which 
>>> QRD HCA are you using?
>>>
>>> 3. OBDFilter_survey :  Please see " 29.3. Testing OST Performance 
>>> (obdfilter-survey)" in the Lustre manual.  This test will help demonstrate 
>>> what the backed NVMe/ZFS setup can do at the OBD layer in Lustre.  
>>>
>>> Thanks,
>>>  Keith 
>>> -Original Message-
>>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On 
>>> Behalf Of Riccardo Veraldi
>>> Sent: Thursday, August 17, 2017 10:48 PM
>>> To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
>>> Subject: Re: [lustre-discuss] Lustre poor performance
>>>
>>> this is my lustre.conf
>>>
>>> [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet 
>>> networks=o2ib5(ib0),tcp5(enp1s0f0)
>>>
>>> data transfer is over infiniband
>>>
>>> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>>> inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255
>>>
>>>
>>> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>>>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>>>> It appears that you are running iozone on a single client?  What kind of 
>>>>> network is tcp5?  Have you looked at the network to make sure it is not 
>>>>> the bottleneck?
>>>>>
>>>> yes the data transfer is on ib0 interface and I did a memory to memory 
>>>> test through InfiniBand QDR  resulting in 3.7GB/sec.
>>>> tcp is used to connect to the MDS. It is tcp5 to differentiate it from 
>>>> my other many Lustre clusters. I could have called it tcp but it does 
>>>> not make any difference performance wise.
>>>> I ran the test from one single node yes, I ran the same test also 
>>>> locally on a zpool identical to the one on the Lustre OSS.
>>>>  Ihave 4 identical servers each of them with the aame nvme disks:
>>>>
>>>> server1: OSS - OST1 Lustre/ZFS  raidz1
>>>>
>>>> server2: OSS - OST2 Lustre/ZFS  raidz1
>>>>
>>>> server3: local ZFS raidz1
>>>>
>>>> server4: Lustre client
>>>>
>>>>
>>>>
>>>> ___
>>>> lustre-discuss mailing list
>>>> lustre-discuss@lists.lustre.org
>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-18 Thread Dennis Nelson
crdma,nfs_acl

I do not know where to look to have Lnet performing faster. I am running my ib0 
interface in connected mode with 65520 MTU size.

Any hint will be much appreciated

thank you

Rick




On 8/18/17 9:05 AM, Mannthey, Keith wrote:

I would suggest a few other tests to help isolate where the issue might be.

1. What is the single thread "DD" write speed?

2. LNet self-test:  Please see "Chapter 28. Testing Lustre Network Performance 
(LNet Self-Test)" in the Lustre manual if this is a new test for you.
This will help show how much LNet bandwidth you have from your single client.  
There are tunables in the LNet layer that can affect things.  Which QDR HCA are 
you using?

3. obdfilter-survey:  Please see "29.3. Testing OST Performance 
(obdfilter-survey)" in the Lustre manual.  This test will help demonstrate what 
the backend NVMe/ZFS setup can do at the OBD layer in Lustre.

Thanks,
 Keith
-Original Message-
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Riccardo Veraldi
Sent: Thursday, August 17, 2017 10:48 PM
To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre poor performance

this is my lustre.conf

[drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet 
networks=o2ib5(ib0),tcp5(enp1s0f0)

data transfer is over infiniband

ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255


On 8/17/17 10:45 PM, Riccardo Veraldi wrote:


On 8/17/17 9:22 PM, Dennis Nelson wrote:


It appears that you are running iozone on a single client?  What kind of 
network is tcp5?  Have you looked at the network to make sure it is not the 
bottleneck?



yes the data transfer is on ib0 interface and I did a memory to memory
test through InfiniBand QDR  resulting in 3.7GB/sec.
tcp is used to connect to the MDS. It is tcp5 to differentiate it from
my other many Lustre clusters. I could have called it tcp but it does
not make any difference performance wise.
I ran the test from one single node, yes; I also ran the same test
locally on a zpool identical to the one on the Lustre OSS.
I have 4 identical servers, each of them with the same NVMe disks:

server1: OSS - OST1 Lustre/ZFS  raidz1

server2: OSS - OST2 Lustre/ZFS  raidz1

server3: local ZFS raidz1

server4: Lustre client



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-18 Thread Riccardo Veraldi
> I would suggest you a few other tests to help isolate where the issue might 
> be.  
>
> 1. What is the single thread "DD" write speed?
>  
> 2. Lnet_selfttest:  Please see " Chapter 28. Testing Lustre Network 
> Performance (LNet Self-Test)" in the Lustre manual if this is a new test for 
> you. 
> This will help show how much Lnet bandwith you have from your single client.  
> There are tunable in the lnet later that can affect things.  Which QRD HCA 
> are you using?
>
> 3. OBDFilter_survey :  Please see " 29.3. Testing OST Performance 
> (obdfilter-survey)" in the Lustre manual.  This test will help demonstrate 
> what the backed NVMe/ZFS setup can do at the OBD layer in Lustre.  
>
> Thanks,
>  Keith 
> -Original Message-
> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On 
> Behalf Of Riccardo Veraldi
> Sent: Thursday, August 17, 2017 10:48 PM
> To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre poor performance
>
> this is my lustre.conf
>
> [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet 
> networks=o2ib5(ib0),tcp5(enp1s0f0)
>
> data transfer is over infiniband
>
> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
> inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255
>
>
> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>> It appears that you are running iozone on a single client?  What kind of 
>>> network is tcp5?  Have you looked at the network to make sure it is not the 
>>> bottleneck?
>>>
>> yes the data transfer is on ib0 interface and I did a memory to memory 
>> test through InfiniBand QDR  resulting in 3.7GB/sec.
>> tcp is used to connect to the MDS. It is tcp5 to differentiate it from 
>> my other many Lustre clusters. I could have called it tcp but it does 
>> not make any difference performance wise.
>> I ran the test from one single node yes, I ran the same test also 
>> locally on a zpool identical to the one on the Lustre OSS.
>>  Ihave 4 identical servers each of them with the aame nvme disks:
>>
>> server1: OSS - OST1 Lustre/ZFS  raidz1
>>
>> server2: OSS - OST2 Lustre/ZFS  raidz1
>>
>> server3: local ZFS raidz1
>>
>> server4: Lustre client
>>
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-18 Thread Riccardo Veraldi
On 8/18/17 1:13 PM, Mannthey, Keith wrote:
> Is Selinux enabled on the client or server? 
The first thing I always do is to disable SELinux.
It's not running.

>
> Thanks,
>  Keith 
> -Original Message-
> From: Riccardo Veraldi [mailto:riccardo.vera...@cnaf.infn.it] 
> Sent: Friday, August 18, 2017 11:31 AM
> To: Mannthey, Keith <keith.mannt...@intel.com>; Dennis Nelson 
> <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre poor performance
>
>
> thank you Keith,
> I will do all this. the single thread dd tests shows 1GB/sec. I will do the 
> other tests
>
>
> On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>> I would suggest you a few other tests to help isolate where the issue might 
>> be.  
>>
>> 1. What is the single thread "DD" write speed?
>>  
>> 2. Lnet_selfttest:  Please see " Chapter 28. Testing Lustre Network 
>> Performance (LNet Self-Test)" in the Lustre manual if this is a new test for 
>> you. 
>> This will help show how much Lnet bandwith you have from your single client. 
>>  There are tunable in the lnet later that can affect things.  Which QRD HCA 
>> are you using?
>>
>> 3. OBDFilter_survey :  Please see " 29.3. Testing OST Performance 
>> (obdfilter-survey)" in the Lustre manual.  This test will help demonstrate 
>> what the backed NVMe/ZFS setup can do at the OBD layer in Lustre.  
>>
>> Thanks,
>>  Keith
>> -Original Message-
>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] 
>> On Behalf Of Riccardo Veraldi
>> Sent: Thursday, August 17, 2017 10:48 PM
>> To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre poor performance
>>
>> this is my lustre.conf
>>
>> [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet 
>> networks=o2ib5(ib0),tcp5(enp1s0f0)
>>
>> data transfer is over infiniband
>>
>> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>> inet 172.21.52.83  netmask 255.255.252.0  broadcast 
>> 172.21.55.255
>>
>>
>> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>>> It appears that you are running iozone on a single client?  What kind of 
>>>> network is tcp5?  Have you looked at the network to make sure it is not 
>>>> the bottleneck?
>>>>
>>> yes the data transfer is on ib0 interface and I did a memory to 
>>> memory test through InfiniBand QDR  resulting in 3.7GB/sec.
>>> tcp is used to connect to the MDS. It is tcp5 to differentiate it 
>>> from my other many Lustre clusters. I could have called it tcp but it 
>>> does not make any difference performance wise.
>>> I ran the test from one single node yes, I ran the same test also 
>>> locally on a zpool identical to the one on the Lustre OSS.
>>>  Ihave 4 identical servers each of them with the aame nvme disks:
>>>
>>> server1: OSS - OST1 Lustre/ZFS  raidz1
>>>
>>> server2: OSS - OST2 Lustre/ZFS  raidz1
>>>
>>> server3: local ZFS raidz1
>>>
>>> server4: Lustre client
>>>
>>>
>>>
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-18 Thread Mannthey, Keith
Is Selinux enabled on the client or server? 

Thanks,
 Keith 
-Original Message-
From: Riccardo Veraldi [mailto:riccardo.vera...@cnaf.infn.it] 
Sent: Friday, August 18, 2017 11:31 AM
To: Mannthey, Keith <keith.mannt...@intel.com>; Dennis Nelson 
<dnel...@ddn.com>; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre poor performance


thank you Keith,
I will do all this. The single-thread dd test shows 1GB/sec. I will do the 
other tests


On 8/18/17 9:05 AM, Mannthey, Keith wrote:
> I would suggest you a few other tests to help isolate where the issue might 
> be.  
>
> 1. What is the single thread "DD" write speed?
>  
> 2. Lnet_selfttest:  Please see " Chapter 28. Testing Lustre Network 
> Performance (LNet Self-Test)" in the Lustre manual if this is a new test for 
> you. 
> This will help show how much Lnet bandwith you have from your single client.  
> There are tunable in the lnet later that can affect things.  Which QRD HCA 
> are you using?
>
> 3. OBDFilter_survey :  Please see " 29.3. Testing OST Performance 
> (obdfilter-survey)" in the Lustre manual.  This test will help demonstrate 
> what the backed NVMe/ZFS setup can do at the OBD layer in Lustre.  
>
> Thanks,
>  Keith
> -Original Message-
> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] 
> On Behalf Of Riccardo Veraldi
> Sent: Thursday, August 17, 2017 10:48 PM
> To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre poor performance
>
> this is my lustre.conf
>
> [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet 
> networks=o2ib5(ib0),tcp5(enp1s0f0)
>
> data transfer is over infiniband
>
> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
> inet 172.21.52.83  netmask 255.255.252.0  broadcast 
> 172.21.55.255
>
>
> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>> It appears that you are running iozone on a single client?  What kind of 
>>> network is tcp5?  Have you looked at the network to make sure it is not the 
>>> bottleneck?
>>>
>> yes the data transfer is on ib0 interface and I did a memory to 
>> memory test through InfiniBand QDR  resulting in 3.7GB/sec.
>> tcp is used to connect to the MDS. It is tcp5 to differentiate it 
>> from my other many Lustre clusters. I could have called it tcp but it 
>> does not make any difference performance wise.
>> I ran the test from one single node yes, I ran the same test also 
>> locally on a zpool identical to the one on the Lustre OSS.
>>  Ihave 4 identical servers each of them with the aame nvme disks:
>>
>> server1: OSS - OST1 Lustre/ZFS  raidz1
>>
>> server2: OSS - OST2 Lustre/ZFS  raidz1
>>
>> server3: local ZFS raidz1
>>
>> server4: Lustre client
>>
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-18 Thread Riccardo Veraldi

thank you Keith,
I will do all this. The single-thread dd test shows 1GB/sec. I will do
the other tests
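As an illustration, a single-stream dd of the kind referred to above might look
like this (the target path matches the client mount point from earlier in the
thread; the file name, size and flags are examples):

# write 20 GiB in 1 MiB blocks; oflag=direct bypasses the client page cache
dd if=/dev/zero of=/drpffb/dd_testfile bs=1M count=20480 oflag=direct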


On 8/18/17 9:05 AM, Mannthey, Keith wrote:
> I would suggest you a few other tests to help isolate where the issue might 
> be.  
>
> 1. What is the single thread "DD" write speed?
>  
> 2. Lnet_selfttest:  Please see " Chapter 28. Testing Lustre Network 
> Performance (LNet Self-Test)" in the Lustre manual if this is a new test for 
> you. 
> This will help show how much Lnet bandwith you have from your single client.  
> There are tunable in the lnet later that can affect things.  Which QRD HCA 
> are you using?
>
> 3. OBDFilter_survey :  Please see " 29.3. Testing OST Performance 
> (obdfilter-survey)" in the Lustre manual.  This test will help demonstrate 
> what the backed NVMe/ZFS setup can do at the OBD layer in Lustre.  
>
> Thanks,
>  Keith 
> -Original Message-
> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On 
> Behalf Of Riccardo Veraldi
> Sent: Thursday, August 17, 2017 10:48 PM
> To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre poor performance
>
> this is my lustre.conf
>
> [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet 
> networks=o2ib5(ib0),tcp5(enp1s0f0)
>
> data transfer is over infiniband
>
> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
> inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255
>
>
> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>> It appears that you are running iozone on a single client?  What kind of 
>>> network is tcp5?  Have you looked at the network to make sure it is not the 
>>> bottleneck?
>>>
>> yes the data transfer is on ib0 interface and I did a memory to memory 
>> test through InfiniBand QDR  resulting in 3.7GB/sec.
>> tcp is used to connect to the MDS. It is tcp5 to differentiate it from 
>> my other many Lustre clusters. I could have called it tcp but it does 
>> not make any difference performance wise.
>> I ran the test from one single node yes, I ran the same test also 
>> locally on a zpool identical to the one on the Lustre OSS.
>>  Ihave 4 identical servers each of them with the aame nvme disks:
>>
>> server1: OSS - OST1 Lustre/ZFS  raidz1
>>
>> server2: OSS - OST2 Lustre/ZFS  raidz1
>>
>> server3: local ZFS raidz1
>>
>> server4: Lustre client
>>
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-18 Thread Mannthey, Keith
I would suggest a few other tests to help isolate where the issue might be.

1. What is the single thread "DD" write speed?

2. LNet self-test:  Please see "Chapter 28. Testing Lustre Network Performance 
(LNet Self-Test)" in the Lustre manual if this is a new test for you. 
This will help show how much LNet bandwidth you have from your single client.  
There are tunables in the LNet layer that can affect things.  Which QDR HCA are 
you using?

3. obdfilter-survey:  Please see "29.3. Testing OST Performance 
(obdfilter-survey)" in the Lustre manual.  This test will help demonstrate what 
the backend NVMe/ZFS setup can do at the OBD layer in Lustre (see the example 
invocation below).
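An example invocation of the kind item 3 refers to, following the sketch in the
Lustre manual (run directly on the OSS; the size and object/thread limits are
illustrative):

# survey the local OSTs ("disk" case); size is in MB written per OST
nobjhi=2 thrhi=2 size=10240 case=disk sh obdfilter-survey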

Thanks,
 Keith 
-Original Message-
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Riccardo Veraldi
Sent: Thursday, August 17, 2017 10:48 PM
To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre poor performance

this is my lustre.conf

[drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf options lnet 
networks=o2ib5(ib0),tcp5(enp1s0f0)

data transfer is over infiniband

ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255


On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>> It appears that you are running iozone on a single client?  What kind of 
>> network is tcp5?  Have you looked at the network to make sure it is not the 
>> bottleneck?
>>
> yes the data transfer is on ib0 interface and I did a memory to memory 
> test through InfiniBand QDR  resulting in 3.7GB/sec.
> tcp is used to connect to the MDS. It is tcp5 to differentiate it from 
> my other many Lustre clusters. I could have called it tcp but it does 
> not make any difference performance wise.
> I ran the test from one single node yes, I ran the same test also 
> locally on a zpool identical to the one on the Lustre OSS.
>  Ihave 4 identical servers each of them with the aame nvme disks:
>
> server1: OSS - OST1 Lustre/ZFS  raidz1
>
> server2: OSS - OST2 Lustre/ZFS  raidz1
>
> server3: local ZFS raidz1
>
> server4: Lustre client
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-17 Thread Riccardo Veraldi
On 8/17/17 8:56 PM, Jones, Peter A wrote:
> Riccardo
>
> I expect that it will be useful to know which version of ZFS you are using
Apologies for not mentioning this; I am running 0.7.1.
>
> Peter
>
>
>
>
> On 8/17/17, 8:21 PM, "lustre-discuss on behalf of Riccardo Veraldi" 
>  riccardo.vera...@cnaf.infn.it> wrote:
>
>> Hello,
>>
>> I am running Lustre 2.10.0 on Centos 7.3
>> I have one MDS and two OSSes, each with one OST
>> each OST is a ZFS raidz1 with 6 nvme disks each.
>> The configuration of ZFS is done in a way to allow maximum write
>> performances:
>>
>> zfs set sync=disabled drpffb-ost02
>> zfs set atime=off drpffb-ost02
>> zfs set redundant_metadata=most drpffb-ost02
>> zfs set xattr=sa drpffb-ost02
>> zfs set recordsize=1M drpffb-ost02
>>
>> every NVMe disk has 4K byte sector, zfs -o  ashift=12
>>
>> In a LOCAL raidz1 configuration I get 3.6GB/sec writes and 5GB/sec
>> reads.
>>
>> The same configuration through Lustre has very poor performance, 1.3GB/sec
>> writes and 2GB/sec reads.
>>
>> There must be something else to look at to get better performance, because
>> a local ZFS raidz1 is working pretty well.
>>
>> this is the Lustre partition client side:
>>
>> 172.21.42.159@tcp5:/drpffb  10T  279G  9.8T   3% /drpffb
>>
>> UUID   bytesUsed   Available Use% Mounted on
>> drpffb-MDT_UUID19.1G2.1M   19.1G   0% /drpffb[MDT:0]
>> drpffb-OST0001_UUID 5.0T  142.2G4.9T   3% /drpffb[OST:1]
>> drpffb-OST0002_UUID 5.0T  136.4G4.9T   3% /drpffb[OST:2]
>>
>> filesystem_summary:10.0T  278.6G9.7T   3% /drpffb
>>
>> Tests both on Lustre/ZFS and local ZFS are based on 50 threads writing
>> 4GB of data each and 50 threads reading using iozone:
>>
>> iozone  -i 0 -t 50 -i 1 -t 50 -s4g
>>
>> I do not know what else I can do to improve performances
>>
>> here some details on the OSSes
>>
>> OSS01:
>>
>> NAME USED  AVAIL  REFER  MOUNTPOINT
>> drpffb-ost0139.4G  4.99T   153K  none
>> drpffb-ost01/ost01  39.4G  4.99T  39.4G  none
>>
>>  pool: drpffb-ost01
>> state: ONLINE
>>  scan: none requested
>> config:
>>
>>NAME STATE READ WRITE CKSUM
>>drpffb-ost01  ONLINE   0 0 0
>>  raidz1-0   ONLINE   0 0 0
>>nvme0n1  ONLINE   0 0 0
>>nvme1n1  ONLINE   0 0 0
>>nvme2n1  ONLINE   0 0 0
>>nvme3n1  ONLINE   0 0 0
>>nvme4n1  ONLINE   0 0 0
>>nvme5n1  ONLINE   0 0 0
>>
>> OSS02:
>>
>> NAME USED  AVAIL  REFER  MOUNTPOINT
>> drpffb-ost0262.2G  4.97T   153K  none
>> drpffb-ost02/ost02  62.2G  4.97T  62.2G  none
>>
>>  pool: drpffb-ost02
>> state: ONLINE
>>  scan: none requested
>> config:
>>
>>NAME STATE READ WRITE CKSUM
>>drpffb-ost02  ONLINE   0 0 0
>>  raidz1-0   ONLINE   0 0 0
>>nvme0n1  ONLINE   0 0 0
>>nvme1n1  ONLINE   0 0 0
>>nvme2n1  ONLINE   0 0 0
>>nvme3n1  ONLINE   0 0 0
>>nvme4n1  ONLINE   0 0 0
>>nvme5n1  ONLINE   0 0 0
>>
>> Thanks to anyone who can offer hints.
>>
>> Rick
>>
>>
>>


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-17 Thread Dennis Nelson
It appears that you are running iozone on a single client?  What kind of 
network is tcp5?  Have you looked at the network to make sure it is not the 
bottleneck?

-- 
Dennis Nelson
Mobile: 817-233-6116
 
Applications Support Engineer
DataDirect Networks, Inc.
dnel...@ddn.com

On 8/17/17, 10:22 PM, "lustre-discuss on behalf of Riccardo Veraldi" 
 wrote:

Hello,

I am running Lustre 2.10.0 on CentOS 7.3.
I have one MDS and two OSSes, each with one OST;
each OST is a ZFS raidz1 built from 6 NVMe disks.
ZFS is configured for maximum write performance:

zfs set sync=disabled drpffb-ost02
zfs set atime=off drpffb-ost02
zfs set redundant_metadata=most drpffb-ost02
zfs set xattr=sa drpffb-ost02
zfs set recordsize=1M drpffb-ost02

Every NVMe disk has a 4K sector, so the pools were created with
ashift=12 (zpool create -o ashift=12).

In a LOCAL raidz1 configuration I get 3.6GB/sec writes and 5GB/sec reads.

The same configuration through Lustre performs much worse: 1.3GB/sec
writes and 2GB/sec reads.

There must be something else to tune to get better performance, since
the local ZFS raidz1 works quite well.

this is the Lustre partition client side:

172.21.42.159@tcp5:/drpffb  10T  279G  9.8T   3% /drpffb

UUID                   bytes    Used     Available  Use%  Mounted on
drpffb-MDT0000_UUID    19.1G    2.1M     19.1G      0%    /drpffb[MDT:0]
drpffb-OST0001_UUID    5.0T     142.2G   4.9T       3%    /drpffb[OST:1]
drpffb-OST0002_UUID    5.0T     136.4G   4.9T       3%    /drpffb[OST:2]

filesystem_summary:    10.0T    278.6G   9.7T       3%    /drpffb

The tests, on both Lustre/ZFS and local ZFS, use iozone with 50 threads
writing 4GB each and 50 threads reading:

iozone  -i 0 -t 50 -i 1 -t 50 -s4g

I do not know what else I can do to improve performance.

Here are some details on the OSSes:

OSS01:

NAME USED  AVAIL  REFER  MOUNTPOINT
drpffb-ost0139.4G  4.99T   153K  none
drpffb-ost01/ost01  39.4G  4.99T  39.4G  none

  pool: drpffb-ost01
 state: ONLINE
  scan: none requested
config:

NAME STATE READ WRITE CKSUM
drpffb-ost01  ONLINE   0 0 0
  raidz1-0   ONLINE   0 0 0
nvme0n1  ONLINE   0 0 0
nvme1n1  ONLINE   0 0 0
nvme2n1  ONLINE   0 0 0
nvme3n1  ONLINE   0 0 0
nvme4n1  ONLINE   0 0 0
nvme5n1  ONLINE   0 0 0

OSS02:

NAME USED  AVAIL  REFER  MOUNTPOINT
drpffb-ost0262.2G  4.97T   153K  none
drpffb-ost02/ost02  62.2G  4.97T  62.2G  none

  pool: drpffb-ost02
 state: ONLINE
  scan: none requested
config:

NAME STATE READ WRITE CKSUM
drpffb-ost02  ONLINE   0 0 0
  raidz1-0   ONLINE   0 0 0
nvme0n1  ONLINE   0 0 0
nvme1n1  ONLINE   0 0 0
nvme2n1  ONLINE   0 0 0
nvme3n1  ONLINE   0 0 0
nvme4n1  ONLINE   0 0 0
nvme5n1  ONLINE   0 0 0

Thanks to anyone who can offer hints.

Rick





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre poor performance

2017-08-17 Thread Jones, Peter A
Riccardo

I expect that it will be useful to know which version of ZFS you are using

Peter




On 8/17/17, 8:21 PM, "lustre-discuss on behalf of Riccardo Veraldi" 
 wrote:

>Hello,
>
>I am running Lustre 2.10.0 on CentOS 7.3.
>I have one MDS and two OSSes, each with one OST;
>each OST is a ZFS raidz1 built from 6 NVMe disks.
>ZFS is configured for maximum write performance:
>
>zfs set sync=disabled drpffb-ost02
>zfs set atime=off drpffb-ost02
>zfs set redundant_metadata=most drpffb-ost02
>zfs set xattr=sa drpffb-ost02
>zfs set recordsize=1M drpffb-ost02
>
>Every NVMe disk has a 4K sector, so the pools were created with
>ashift=12 (zpool create -o ashift=12).
>
>In a LOCAL raidz1 configuration I get 3.6GB/sec writes and 5GB/sec reads.
>
>The same configuration through Lustre performs much worse: 1.3GB/sec
>writes and 2GB/sec reads.
>
>There must be something else to tune to get better performance, since
>the local ZFS raidz1 works quite well.
>
>this is the Lustre partition client side:
>
>172.21.42.159@tcp5:/drpffb  10T  279G  9.8T   3% /drpffb
>
>UUID                   bytes    Used     Available  Use%  Mounted on
>drpffb-MDT0000_UUID    19.1G    2.1M     19.1G      0%    /drpffb[MDT:0]
>drpffb-OST0001_UUID    5.0T     142.2G   4.9T       3%    /drpffb[OST:1]
>drpffb-OST0002_UUID    5.0T     136.4G   4.9T       3%    /drpffb[OST:2]
>
>filesystem_summary:    10.0T    278.6G   9.7T       3%    /drpffb
>
>The tests, on both Lustre/ZFS and local ZFS, use iozone with 50 threads
>writing 4GB each and 50 threads reading:
>
>iozone  -i 0 -t 50 -i 1 -t 50 -s4g
>
>I do not know what else I can do to improve performance.
>
>Here are some details on the OSSes:
>
>OSS01:
>
>NAME USED  AVAIL  REFER  MOUNTPOINT
>drpffb-ost0139.4G  4.99T   153K  none
>drpffb-ost01/ost01  39.4G  4.99T  39.4G  none
>
>  pool: drpffb-ost01
> state: ONLINE
>  scan: none requested
>config:
>
>NAME STATE READ WRITE CKSUM
>drpffb-ost01  ONLINE   0 0 0
>  raidz1-0   ONLINE   0 0 0
>nvme0n1  ONLINE   0 0 0
>nvme1n1  ONLINE   0 0 0
>nvme2n1  ONLINE   0 0 0
>nvme3n1  ONLINE   0 0 0
>nvme4n1  ONLINE   0 0 0
>nvme5n1  ONLINE   0 0 0
>
>OSS02:
>
>NAME USED  AVAIL  REFER  MOUNTPOINT
>drpffb-ost0262.2G  4.97T   153K  none
>drpffb-ost02/ost02  62.2G  4.97T  62.2G  none
>
>  pool: drpffb-ost02
> state: ONLINE
>  scan: none requested
>config:
>
>NAME STATE READ WRITE CKSUM
>drpffb-ost02  ONLINE   0 0 0
>  raidz1-0   ONLINE   0 0 0
>nvme0n1  ONLINE   0 0 0
>nvme1n1  ONLINE   0 0 0
>nvme2n1  ONLINE   0 0 0
>nvme3n1  ONLINE   0 0 0
>nvme4n1  ONLINE   0 0 0
>nvme5n1  ONLINE   0 0 0
>
>Thanks to anyone who can offer hints.
>
>Rick
>
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre poor performance

2017-08-17 Thread Riccardo Veraldi
Hello,

I am running Lustre 2.10.0 on CentOS 7.3.
I have one MDS and two OSSes, each with one OST;
each OST is a ZFS raidz1 built from 6 NVMe disks.
ZFS is configured for maximum write performance:

zfs set sync=disabled drpffb-ost02
zfs set atime=off drpffb-ost02
zfs set redundant_metadata=most drpffb-ost02
zfs set xattr=sa drpffb-ost02
zfs set recordsize=1M drpffb-ost02

Every NVMe disk has a 4K sector, so the pools were created with
ashift=12 (zpool create -o ashift=12).
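For reference, a rough sketch of how one of these pools and its OST dataset
could be put together with the same settings (device names are taken from
the zpool status output further down; the mkfs.lustre values such as --index
and --mgsnode are assumptions to be adapted to the actual setup):

  zpool create -o ashift=12 drpffb-ost02 raidz1 \
      nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1
  # ...followed by the zfs set commands above, and then:
  mkfs.lustre --ost --backfstype=zfs --fsname=drpffb --index=2 \
      --mgsnode=<mgs-nid> drpffb-ost02/ost02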

In a LOCAL raidz1 configuration I get 3.6GB/sec writes and 5GB/sec reads.

The same configuration through Lustre performs much worse: 1.3GB/sec
writes and 2GB/sec reads.

There must be something else to tune to get better performance, since
the local ZFS raidz1 works quite well.
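One way to separate the OSS backend from the network is obdfilter-survey
from lustre-iokit, which exercises the OSTs directly on the OSS. Roughly
along these lines (the parameter values are only illustrative, and it is
worth confirming it behaves as expected against osd-zfs targets):

  nobjhi=2 thrhi=32 size=16384 targets="drpffb-OST0002" obdfilter-survey

If the backend alone reaches the local 3.6GB/sec, the gap is being lost in
the client/network path rather than in ZFS.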

this is the Lustre partition client side:

172.21.42.159@tcp5:/drpffb  10T  279G  9.8T   3% /drpffb

UUID                   bytes    Used     Available  Use%  Mounted on
drpffb-MDT0000_UUID    19.1G    2.1M     19.1G      0%    /drpffb[MDT:0]
drpffb-OST0001_UUID    5.0T     142.2G   4.9T       3%    /drpffb[OST:1]
drpffb-OST0002_UUID    5.0T     136.4G   4.9T       3%    /drpffb[OST:2]

filesystem_summary:    10.0T    278.6G   9.7T       3%    /drpffb
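With two OSTs it is also worth checking how the test files are striped;
unless it has been changed, the default stripe count is 1, so each iozone
file lands on a single OST. A quick check plus one possible adjustment on a
test directory (the directory name is just an example):

  lfs getstripe -d /drpffb                   # default layout at the filesystem root
  lfs setstripe -c -1 -S 1M /drpffb/iozone   # stripe files in this directory over all OSTs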

The tests, on both Lustre/ZFS and local ZFS, use iozone with 50 threads
writing 4GB each and 50 threads reading:

iozone  -i 0 -t 50 -i 1 -t 50 -s4g
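A simple variation is to ask iozone for 1MiB records so the application I/O
size matches the ZFS recordsize; whether it makes a difference here is just
a guess:

  iozone -i 0 -i 1 -t 50 -s 4g -r 1m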

I do not know what else I can do to improve performance.
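Some client-side parameters that are commonly looked at in this situation;
the values are examples rather than recommendations, and lctl set_param
changes do not survive a remount:

  lctl get_param osc.drpffb-*.max_rpcs_in_flight osc.drpffb-*.max_dirty_mb
  lctl set_param osc.drpffb-*.max_rpcs_in_flight=16
  lctl set_param osc.drpffb-*.max_dirty_mb=256
  lctl set_param osc.drpffb-*.checksums=0        # only while testing, re-enable afterwards
  lctl set_param llite.*.max_read_ahead_mb=256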

Here are some details on the OSSes:

OSS01:

NAME USED  AVAIL  REFER  MOUNTPOINT
drpffb-ost0139.4G  4.99T   153K  none
drpffb-ost01/ost01  39.4G  4.99T  39.4G  none

  pool: drpffb-ost01
 state: ONLINE
  scan: none requested
config:

NAME STATE READ WRITE CKSUM
drpffb-ost01  ONLINE   0 0 0
  raidz1-0   ONLINE   0 0 0
nvme0n1  ONLINE   0 0 0
nvme1n1  ONLINE   0 0 0
nvme2n1  ONLINE   0 0 0
nvme3n1  ONLINE   0 0 0
nvme4n1  ONLINE   0 0 0
nvme5n1  ONLINE   0 0 0

OSS02:

NAME USED  AVAIL  REFER  MOUNTPOINT
drpffb-ost0262.2G  4.97T   153K  none
drpffb-ost02/ost02  62.2G  4.97T  62.2G  none

  pool: drpffb-ost02
 state: ONLINE
  scan: none requested
config:

NAME STATE READ WRITE CKSUM
drpffb-ost02  ONLINE   0 0 0
  raidz1-0   ONLINE   0 0 0
nvme0n1  ONLINE   0 0 0
nvme1n1  ONLINE   0 0 0
nvme2n1  ONLINE   0 0 0
nvme3n1  ONLINE   0 0 0
nvme4n1  ONLINE   0 0 0
nvme5n1  ONLINE   0 0 0

Thanks to anyone who can offer hints.

Rick



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org