Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-17 Thread Ben Evans
I'm guessing that you have more disk bandwidth than network bandwidth.
Adding more OSSes and distributing the OSTs among them would probably help
the general case, not necessarily the single dd case.



Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-15 Thread Riccardo Veraldi

On 14/10/16 14:31, Mark Hahn wrote:
>> anyway if I force direct I/O, for example using oflag=direct in dd,
>> the write performance drops as low as 8MB/sec with 1MB block size,
>> and each write has about 120ms latency.
>
> But that's quite a small block size.  Do you approach buffered
> performance if you write significantly bigger blocks (8-32M)?
> Presumably you're already striping across OSTs?

Yes, I agree with you, but my Lustre filesystem can have many small
files, so I was trying to do a test similar to the real-life situation.


Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-15 Thread Riccardo Veraldi

On 14/10/16 14:38, Dilger, Andreas wrote:


> Is each OST a separate VDEV and separate zpool, or are they a single
> zpool?  Separate zpools have less overhead for maximum performance,
> but only one VDEV per zpool means that metadata ditto blocks are
> written twice per RAID-Z2 VDEV, which isn't very efficient.  Having at
> least 3 VDEVs per zpool is better in this regard.

Hello Andreas,
each OST has a separate VDEV and a separate zpool.
Thank you

Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread Patrick Farrell

John,

Just to clarify a little further.  Your comment about the inode lock is correct
(keeping in mind the comment from Andreas about more recent changes), but
Lustre, in buffered rather than O_DIRECT mode, returns to user space before the
data is down on the server.  It keeps the pages in the page cache on the client
until it knows for sure it will not have to resend them.  This means that, in
the inode lock case, while only one process can be actively writing data into
the page cache at a given moment, Lustre still returns to userspace before the
data is written out.

So while (in the inode lock case) the 'writing pages into the page cache and
setting them up to go out' part - which is nearly all of most write syscalls -
can't overlap, the network and disk portions of Lustre I/Os can overlap.
That's how multiple writes can be in flight at once even though only one
process at a time can enter the write path.

This also applies to a single process: it can get many I/Os in flight at once
and achieve much higher bandwidth by overlapping the time they spend in flight.
So buffered writing is generally >> direct writing.  Extremely large write
sizes can mitigate this, as Andreas suggested.

- Patrick
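
To make this concrete, here is a minimal sketch of the comparison being
described, run from a client node.  The mount point, file name, and transfer
sizes are assumptions for illustration, not taken from the thread:

  # Buffered 1MB writes: pages are cached on the client, many RPCs stay
  # in flight at once (conv=fsync only makes dd wait for the final flush).
  dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=4096 conv=fsync

  # O_DIRECT 1MB writes: each write is synchronous and only one I/O per
  # thread is outstanding, so throughput collapses to ~1MB per round trip.
  dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=4096 oflag=direct

  # O_DIRECT with a much larger block size amortizes the per-write
  # sync overhead, as suggested above.
  dd if=/dev/zero of=/mnt/lustre/ddtest bs=40M count=100 oflag=direct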


Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread Dilger, Andreas
John, with newer Lustre clients it is possible for multiple threads to submit 
non-overlapping writes concurrently (also not conflicting within a single 
page), see LU-1669 for details.

Even so, O_DIRECT writes need to be synchronous to disk on the OSS, as Patrick 
reports, because if the OSS fails before the write is on disk there is no 
cached copy of the data on the client that can be used to resend the RPC.

The problem is that the ZFS OSD has very long transaction commit times for 
synchronous writes because it does not yet have support for the ZIL.  Using 
buffered writes, or having very large O_DIRECT writes (e.g. 40MB or larger) and 
large RPCs (4MB, or up to 16MB in 2.9.0) to amortize the sync overhead may be 
beneficial if you really want to use O_DIRECT.
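
As a rough sketch of that suggestion (the parameter value and test command are
illustrative assumptions, and which RPC sizes are accepted depends on the
Lustre version in use):

  # On the client: allow 4MB bulk RPCs (1024 pages x 4KB pages = 4MB).
  # (RPCs larger than this may also require raising the server-side
  # brw_size, depending on the Lustre version.)
  lctl set_param osc.*.max_pages_per_rpc=1024

  # Then use very large O_DIRECT writes so each sync covers more data.
  dd if=/dev/zero of=/mnt/lustre/ddtest bs=64M count=64 oflag=direct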

Riccardo,
The other potential issue is that you have 20 OSTs on a single OSS, which isn't 
going to have very good performance.  Spreading the OSTs across multiple OSS 
nodes is going to improve your performance significantly when there are 
multiple clients writing, as there will be N times the OSS network bandwidth, N 
times the CPU, N times the RAM.  It only makes sense to have 20 OSTs/OSS if 
your workload is only a single client and you want the maximum possible 
capacity for a given cost.

Is each OST a separate VDEV and separate zpool, or are they a single zpool?  
Separate zpools have less overhead for maximum performance, but only one VDEV 
per zpool means that metadata ditto blocks are written twice per RAID-Z2 VDEV, 
which isn't very efficient.  Having at least 3 VDEVs per zpool is better in 
this regard.
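
For illustration, the two pool geometries being contrasted might look like
this (pool names and device names are assumptions):

  # Current layout described in the thread: one zpool per OST, each with
  # a single 8-disk RAIDZ2 vdev, so both metadata ditto copies land on
  # the same vdev.
  zpool create ost00 raidz2 sda sdb sdc sdd sde sdf sdg sdh

  # Alternative: one zpool with at least three RAIDZ2 vdevs, so ditto
  # copies can be spread across different vdevs.
  zpool create ostpool \
    raidz2 sda sdb sdc sdd sde sdf sdg sdh \
    raidz2 sdi sdj sdk sdl sdm sdn sdo sdp \
    raidz2 sdq sdr sds sdt sdu sdv sdw sdx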

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel High Performance Data Division


Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread Mark Hahn
> anyway if I force direct I/O, for example using oflag=direct in dd, the
> write performance drops as low as 8MB/sec with 1MB block size, and each
> write has about 120ms latency.

But that's quite a small block size.  Do you approach buffered performance
if you write significantly bigger blocks (8-32M)?  Presumably you're already
striping across OSTs?
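
Both points are quick to check from a client; the file names and stripe count
below are assumptions:

  # Is the test file striped across OSTs, or does it live on a single OST?
  lfs getstripe /mnt/lustre/ddtest

  # Stripe a new file across all OSTs and retry with a bigger block size.
  lfs setstripe -c -1 /mnt/lustre/ddtest2
  dd if=/dev/zero of=/mnt/lustre/ddtest2 bs=32M count=128 oflag=direct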


Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread John Bauer
Patrick,
I thought at one time there was an inode lock held for the duration of a
direct I/O read or write, so that even if one had multiple application
threads writing direct, only one was "in flight" at a time. Has that changed?
John

Sent from my iPhone



Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread Riccardo Veraldi

Sorry, I thought I had already given the ZFS and Lustre versions, but I did not.
The Lustre version is 2.8.0 and ZFS is 0.6.5.8-1.

I understand that I cannot expect great performance from direct I/O, but on the
previous Lustre cluster with ldiskfs and hardware RAID I was getting 300MB/sec.
Probably the hardware RAID caching mechanism was helping there.


thanks

Riccardo




Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread Patrick Farrell
Sorry, I phrased one thing wrong:
I said "transferring to the network", but it's actually until the client has
received confirmation that the data arrived successfully, I believe.


In any case, only one I/O (per thread) can be outstanding at a time with direct 
I/O.




Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread Patrick Farrell
Riccardo,


While the difference is extreme, direct I/O write performance will always be 
poor.  Direct I/O writes cannot be asynchronous, since they don't use the page 
cache.  This means Lustre cannot return from one write (and start the next) 
until it has finished transferring the data to the network.


This means you can only have one I/O in flight at a time.  Good write 
performance from Lustre (or any network filesystem) depends on keeping a lot of 
data in flight at once.


What sort of direct write performance were you hoping for?  It will never match 
that 800 MB/s from one thread you see with buffered I/O.


- Patrick




Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread Jones, Peter A
Riccardo,

I would imagine that knowing the Lustre and ZFS versions you are using
would be useful to anyone trying to advise you.

Peter



[lustre-discuss] Lustre on ZFS poor direct I/O performance

2016-10-14 Thread Riccardo Veraldi

Hello,

I would like to know how I can improve the situation of my Lustre cluster.

I have 1 MDS and 1 OSS with 20 OSTs defined.

Each OST is an 8-disk RAIDZ2.

Single-process write performance is around 800MB/sec.

However, if I force direct I/O, for example using oflag=direct in dd with
a 1MB block size, the write performance drops as low as 8MB/sec, and each
write has about 120ms of latency.

I used these ZFS settings

options zfs zfs_prefetch_disable=1
options zfs zfs_txg_history=120
options zfs metaslab_debug_unload=1
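
(These are ZFS kernel module parameters; a sketch of how they are typically
applied and verified follows, with the config file name being an assumption.)

  # e.g. in /etc/modprobe.d/zfs.conf, read when the zfs module is loaded:
  #   options zfs zfs_prefetch_disable=1
  #   options zfs zfs_txg_history=120
  #   options zfs metaslab_debug_unload=1

  # Current values can be checked at runtime:
  cat /sys/module/zfs/parameters/zfs_prefetch_disable
  cat /sys/module/zfs/parameters/zfs_txg_history
  cat /sys/module/zfs/parameters/metaslab_debug_unload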

I am quite worried about the low performance.

Any hints or suggestions that may help me improve the situation?


thank you


Rick


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org