Re: [lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD
Yes, I tested every single disk, and also the disks in a raidz pool without
Lustre. The disks perform to spec, 1.2GB/s each and up to 6GB/s in the zpool.
With Lustre on top, the zpool performs really badly, no more than 1.5GB/s.

I then configured one OST per disk, without any raidz (6 OSTs total). I can
scale performance up by distributing processes across the OSTs this way, but
if I use striping across all OSTs instead of manually binding processes to a
specific OST, performance decreases. Also, with a single process on a single
OST I can never get more than 700MB/s, while I can reach 1.2GB/s using at
least 4 processes on the same OST.

I tested with obdfilter-survey; this is what I got:

  ost 1 sz 524288000K rsz 1024K obj 4 thr 4 write 4872.92 [1525.83, 6120.75]

I ran LNet selftest and got 6GB/s using FDR. But when I write from the client
side, performance drops dramatically, especially with Lustre on raidz. So I
was wondering: is there any RPC parameter that I need to set to get better
performance out of Lustre?

thank you

On 4/9/18 4:15 PM, Dilger, Andreas wrote:
> On Apr 6, 2018, at 23:04, Riccardo Veraldi wrote:
>> [...]
>
> Riccardo,
> to take a step back for a minute, have you tested all of the devices
> individually, and also concurrently with some low-level tool like
> sgpdd or vdbench? After that is known to be working, have you tested
> with obdfilter-survey locally on the OSS, then remotely on the client(s)
> so that we can isolate where the bottleneck is being hit?
>
> Cheers, Andreas
> [...]
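For reference, the client-side RPC tuning being asked about here is normally
inspected and changed with lctl on the Lustre client. A minimal sketch,
assuming the drpffb filesystem name from this thread; the values are
illustrative starting points, not recommendations:

  # inspect the current per-OST RPC settings on a client
  lctl get_param osc.drpffb-*.max_pages_per_rpc
  lctl get_param osc.drpffb-*.max_rpcs_in_flight
  lctl get_param osc.drpffb-*.max_dirty_mb

  # example: 16MB RPCs (4096 x 4KB pages), more RPCs in flight per OST,
  # and a larger per-OSC dirty cache
  lctl set_param osc.drpffb-*.max_pages_per_rpc=4096
  lctl set_param osc.drpffb-*.max_rpcs_in_flight=16
  lctl set_param osc.drpffb-*.max_dirty_mb=512

Note that raising max_pages_per_rpc above 1024 (4MB) also requires the OSS to
advertise a matching brw_size (obdfilter.*.brw_size) in Lustre 2.10.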
Re: [lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD
Riccardo,

It can be helpful to see the output of these commands on the ZFS pool host
while you read files through a Lustre client, and again directly through ZFS:

  # zpool iostat -lq -y zpool_name 1
  # zpool iostat -w -y zpool_name 5
  # zpool iostat -r -y zpool_name 5

  -q  queue statistics
  -l  latency statistics
  -r  request size histogram
  -w  (undocumented) latency statistics

I have seen different behavior in ZFS reads on the pool for the same dd/fio
command when reading a file from a Lustre mount on a different host versus
directly from ZFS on the OSS. I created a separate ZFS dataset with similar
ZFS settings on the Lustre zpool. Lustre I/O shows up on the pool as 128KB
requests, while dd/fio directly on ZFS issues 1MB requests; the dd/fio
commands themselves used 1MB I/O.

  zptevlfs6    sync_read    sync_write   async_read   async_write     scrub
  req_size     ind    agg   ind    agg   ind    agg   ind    agg   ind    agg
  ----------  -----  -----  -----  -----  -----  ----  ----  ----  ----  ----
  512             0      0      0      0      0     0     0     0     0     0
  1K              0      0      0      0      0     0     0     0     0     0
  2K              0      0      0      0      0     0     0     0     0     0
  4K              0      0      0      0      0     0     0     0     0     0
  8K              0      0      0      0      0     0     0     0     0     0
  16K             0      0      0      0      0     0     0     0     0     0
  32K             0      0      0      0      0     0     0     0     0     0
  64K             0      0      0      0      0     0     0     0     0     0
  128K            0      0      0      0  2.00K     0     0     0     0     0  <
  256K            0      0      0      0      0     0     0     0     0     0
  512K            0      0      0      0      0     0     0     0     0     0
  1M              0      0      0      0    125     0     0     0     0     0  <
  2M              0      0      0      0      0     0     0     0     0     0
  4M              0      0      0      0      0     0     0     0     0     0
  8M              0      0      0      0      0     0     0     0     0     0
  16M             0      0      0      0      0     0     0     0     0     0

Alex.

On 4/9/18, 6:15 PM, "lustre-discuss on behalf of Dilger, Andreas" wrote:

    On Apr 6, 2018, at 23:04, Riccardo Veraldi wrote:
    > [...]

    Riccardo,
    to take a step back for a minute, have you tested all of the devices
    individually, and also concurrently with some low-level tool like
    sgpdd or vdbench? After that is known to be working, have you tested
    with obdfilter-survey locally on the OSS, then remotely on the client(s)
    so that we can isolate where the bottleneck is being hit?

    Cheers, Andreas
    [...]
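To see the same contrast, identical reads can be issued against the Lustre
mount and the local ZFS dataset while watching the request-size histogram. A
minimal sketch, assuming hypothetical paths /mnt/lustre/testfile and
/zptevlfs6/test/testfile:

  # on a Lustre client: 1MB sequential read through Lustre
  fio --name=lustre-read --rw=read --bs=1M --size=4G \
      --filename=/mnt/lustre/testfile

  # on the OSS: the same read directly against the ZFS dataset
  fio --name=zfs-read --rw=read --bs=1M --size=4G \
      --filename=/zptevlfs6/test/testfile

  # on the pool host, in another terminal: request-size histogram
  zpool iostat -r -y zptevlfs6 5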
Re: [lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD
On Apr 6, 2018, at 23:04, Riccardo Veraldi wrote:
>
> So I've been struggling for months with these low performances on Lustre/ZFS.
>
> Looking for hints.
>
> 3 OSSes, RHEL 7.4, Lustre 2.10.3 and zfs 0.7.6
>
> each OSS has one OST raidz
>
>   pool: drpffb-ost01
>  state: ONLINE
>   scan: none requested
>   trim: completed on Fri Apr 6 21:53:04 2018 (after 0h3m)
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         drpffb-ost01  ONLINE       0     0     0
>           raidz1-0    ONLINE       0     0     0
>             nvme0n1   ONLINE       0     0     0
>             nvme1n1   ONLINE       0     0     0
>             nvme2n1   ONLINE       0     0     0
>             nvme3n1   ONLINE       0     0     0
>             nvme4n1   ONLINE       0     0     0
>             nvme5n1   ONLINE       0     0     0
>
> While the raidz without Lustre performs well at 6GB/s (1GB/s per disk),
> with Lustre on top of it performance is really poor. Most of all, it is
> not stable at all and goes up and down between 1.5GB/s and 6GB/s. I
> tested with obdfilter-survey.
> LNET is ok and working at 6GB/s (using InfiniBand FDR).
>
> What could be the cause of OST performance going up and down like a
> roller coaster?

Riccardo,
to take a step back for a minute, have you tested all of the devices
individually, and also concurrently with some low-level tool like
sgpdd or vdbench? After that is known to be working, have you tested
with obdfilter-survey locally on the OSS, then remotely on the client(s)
so that we can isolate where the bottleneck is being hit?

Cheers, Andreas

> for reference here are a few considerations:
>
> filesystem parameters:
>
>   zfs set mountpoint=none drpffb-ost01
>   zfs set sync=disabled drpffb-ost01
>   zfs set atime=off drpffb-ost01
>   zfs set redundant_metadata=most drpffb-ost01
>   zfs set xattr=sa drpffb-ost01
>   zfs set recordsize=1M drpffb-ost01
>
> NVMe SSDs are 4KB/sector
>
>   ashift=12
>
> ZFS module parameters:
>
>   options zfs zfs_prefetch_disable=1
>   options zfs zfs_txg_history=120
>   options zfs metaslab_debug_unload=1
>   #
>   options zfs zfs_vdev_scheduler=deadline
>   options zfs zfs_vdev_async_write_active_min_dirty_percent=20
>   #
>   options zfs zfs_vdev_scrub_min_active=48
>   options zfs zfs_vdev_scrub_max_active=128
>   #options zfs zfs_vdev_sync_write_min_active=64
>   #options zfs zfs_vdev_sync_write_max_active=128
>   #
>   options zfs zfs_vdev_sync_write_min_active=8
>   options zfs zfs_vdev_sync_write_max_active=32
>   options zfs zfs_vdev_sync_read_min_active=8
>   options zfs zfs_vdev_sync_read_max_active=32
>   options zfs zfs_vdev_async_read_min_active=8
>   options zfs zfs_vdev_async_read_max_active=32
>   options zfs zfs_top_maxinflight=320
>   options zfs zfs_txg_timeout=30
>   options zfs zfs_dirty_data_max_percent=40
>   options zfs zfs_vdev_scheduler=deadline
>   options zfs zfs_vdev_async_write_min_active=8
>   options zfs zfs_vdev_async_write_max_active=32

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation
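For reference, an obdfilter-survey run along the lines suggested above (the
script ships with lustre-iokit) might look like the sketch below. The OST
name drpffb-OST0000 and the OSS hostname oss01 are assumptions, and size is
the per-OST test file size in MB:

  # on the OSS: exercise the OST through the obdfilter layer, no network
  nobjhi=4 thrhi=4 size=8192 case=disk targets="drpffb-OST0000" obdfilter-survey

  # from a client: survey the network path alone, to isolate the bottleneck
  nobjhi=4 thrhi=4 size=8192 case=network targets="oss01" obdfilter-survey

Comparing the disk-only and network-only numbers against the end-to-end
client throughput is what separates an OST-side bottleneck from an LNet or
client-side one.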