Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-19 Thread Patrick Farrell via lustre-discuss
Well, you could use two file descriptors: one opened with O_DIRECT and one without.
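
Roughly, something like the following untested sketch (the path, the 4096-byte 
alignment, and the sizes are just placeholders, not anything Lustre-specific; 
error checking omitted for brevity):

    /* Same file opened twice: fd_direct bypasses the client page cache,
     * fd_buffered goes through it as usual. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/mnt/lustre/testfile";          /* placeholder path */
        int fd_direct   = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
        int fd_buffered = open(path, O_WRONLY);

        size_t len = 1 << 20;
        void *buf;
        posix_memalign(&buf, 4096, len);    /* O_DIRECT needs aligned buffer/offset/length */
        memset(buf, 0, len);

        pwrite(fd_direct, buf, len, 0);     /* not cached on the client */
        pwrite(fd_buffered, buf, len, len); /* cached on the client as usual */

        close(fd_buffered);
        close(fd_direct);
        free(buf);
        return 0;
    }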

SSD is a fast medium, but my instinct is that the desirability of having data in RAM 
depends much more on the I/O pattern and is hard to optimize for in advance. Do you 
read the data you wrote?  (Or read data repeatedly?)

In any case, there's no mechanism today.  It's also of relatively marginal benefit if 
we're just doing buffered I/O and then forcing the data out: it will reduce memory 
usage, but it won't improve performance.
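
If you do want to experiment with the buffered-write-then-drop approach, standard 
POSIX calls get you most of the way.  A sketch (placeholder path and sizes; whether 
the Lustre client actually honors the DONTNEED hint is something you'd need to 
verify for your release):

    /* Buffered write, flush, then ask the kernel to drop the cached range.
     * As noted above, this reduces memory use but doesn't make the I/O faster. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/mnt/lustre/testfile";   /* placeholder path */
        int fd = open(path, O_WRONLY | O_CREAT, 0644);

        size_t len = 1 << 20;
        char *buf = calloc(1, len);
        pwrite(fd, buf, len, 0);                     /* normal buffered write */

        fsync(fd);                                   /* pages must be clean before they can be dropped */
        posix_fadvise(fd, 0, len, POSIX_FADV_DONTNEED);  /* hint: evict this range from the cache */

        close(fd);
        free(buf);
        return 0;
    }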

-Patrick


From: John Bauer 
Sent: Thursday, May 19, 2022 1:16 PM
To: Patrick Farrell; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent


Pat,

No, not in general.  It just seems that if one is storing data on an SSD, it 
should be optional to not also store it in memory (why store it in two fast 
media?).

O_DIRECT is not of value, as that would apply to all extents, whether on SSD or 
HDD.   O_DIRECT on Lustre has been problematic for me in the past, performance-wise.

John

On 5/19/22 13:05, Patrick Farrell wrote:
No, and I'm not sure I agree with you at first glance.

Is this just generally an idea that data stored on SSD should not be in RAM?  
If so, there's no mechanism for that other than using direct I/O.

-Patrick

From: lustre-discuss on behalf of John Bauer
Sent: Thursday, May 19, 2022 12:48 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Avoiding system cache when using ssd pfl extent

When using PFL, and using an SSD as the first extent, it seems it would
be advantageous to not have that extent's file data consume memory in
the client's system buffers.  It would be similar to using O_DIRECT, but
on a per-extent basis.  Is there a mechanism for that already?

Thanks,

John

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-19 Thread John Bauer

Pat,

No, not in general.  It just seems that if one is storing data on an 
SSD, it should be optional to not also store it in memory (why store it 
in two fast media?).


O_DIRECT is not of value, as that would apply to all extents, whether on 
SSD or HDD.   O_DIRECT on Lustre has been problematic for me in the 
past, performance-wise.


John

On 5/19/22 13:05, Patrick Farrell wrote:

No, and I'm not sure I agree with you at first glance.

Is this just generally an idea that data stored on SSD should not be 
in RAM?  If so, there's no mechanism for that other than using direct I/O.


-Patrick

*From:* lustre-discuss on behalf of John Bauer
*Sent:* Thursday, May 19, 2022 12:48 PM
*To:* lustre-discuss@lists.lustre.org 
*Subject:* [lustre-discuss] Avoiding system cache when using ssd pfl 
extent

When using PFL, and using an SSD as the first extent, it seems it would
be advantageous to not have that extent's file data consume memory in
the client's system buffers.  It would be similar to using O_DIRECT, but
on a per-extent basis.  Is there a mechanism for that already?

Thanks,

John

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-19 Thread Patrick Farrell via lustre-discuss
No, and I'm not sure I agree with you at first glance.

Is this just generally an idea that data stored on SSD should not be in RAM?  
If so, there's no mechanism for that other than using direct I/O.

-Patrick

From: lustre-discuss on behalf of John Bauer
Sent: Thursday, May 19, 2022 12:48 PM
To: lustre-discuss@lists.lustre.org 
Subject: [lustre-discuss] Avoiding system cache when using ssd pfl extent

When using PFL, and using an SSD as the first extent, it seems it would
be advantageous to not have that extent's file data consume memory in
the client's system buffers.  It would be similar to using O_DIRECT, but
on a per-extent basis.  Is there a mechanism for that already?

Thanks,

John

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-19 Thread John Bauer
When using PFL, and using an SSD as the first extent, it seems it would 
be advantageous to not have that extent's file data consume memory in 
the client's system buffers.  It would be similar to using O_DIRECT, but 
on a per-extent basis.  Is there a mechanism for that already?
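
(For reference, a two-component layout like the one described can be created 
programmatically through the llapi layout interface as well as with lfs setstripe.  
A rough sketch follows; the "flash" and "hdd" pool names, the 4 MiB boundary, and 
the path are site-specific assumptions, and the llapi calls should be checked 
against your Lustre release:)

    /* Rough sketch: PFL file with a first component on a hypothetical "flash"
     * OST pool up to 4 MiB, and the remainder on an "hdd" pool. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <lustre/lustreapi.h>

    int main(void)
    {
        struct llapi_layout *layout = llapi_layout_alloc();

        /* First component: [0, 4 MiB), single stripe, on the flash pool. */
        llapi_layout_comp_extent_set(layout, 0, 4ULL << 20);
        llapi_layout_stripe_count_set(layout, 1);
        llapi_layout_pool_name_set(layout, "flash");

        /* Second component: [4 MiB, EOF), striped over the HDD pool. */
        llapi_layout_comp_add(layout);
        llapi_layout_comp_extent_set(layout, 4ULL << 20, (uint64_t)-1 /* EOF */);
        llapi_layout_stripe_count_set(layout, 4);
        llapi_layout_pool_name_set(layout, "hdd");

        int fd = llapi_layout_file_create("/mnt/lustre/testfile", O_RDWR, 0644, layout);
        llapi_layout_free(layout);
        if (fd >= 0)
            close(fd);
        return 0;
    }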


Thanks,

John

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] FLR Mirroring for read performance

2022-05-19 Thread Andreas Dilger via lustre-discuss
On May 11, 2022, at 08:25, Nathan Dauchy wrote:
> 
> Greetings!

Hello Nathan,

> During the helpful LUG tutorial from Rick Mohr on advanced Lustre file 
> layouts, it was mentioned that “lfs mirror” could be used to improve read 
> performance.  And the manual supports this, stating that for “files that are 
> concurrently read by many clients (e.g. input decks, shared libraries, or 
> executables) the aggregate parallel read performance of a single file can be 
> improved by creating multiple mirrors of the file data”.
>  
> What method does Lustre use to ensure that multiple clients balance their 
> read workloads from the multiple mirrors?

Currently (2.15.0), if there are no mirror copies marked "prefer", the client tries 
the mirror with the most stripes on flash devices (vs. mirrors on HDDs), and if 
there are still multiple candidate mirrors it uses the hash of a client memory 
pointer address modulo the mirror count.  This should be relatively random for each 
client, distributing the read workload across mirrors.

I'm not totally sure why the "hash of the pointer address" mechanism was 
implemented, as clients typically use the client NID as the basis for 
"autonomous" load distribution (modulo mirror count in this case) so that the 
workload is "ideally" distributed across copies without any added 
communication.  The latter is what is described in LU-10158 "FLR: Define a 
replica choosing policy function", but this is not fully implemented.
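
Just to illustrate the idea (this is not the actual Lustre code): each client would 
hash its own NID and take it modulo the mirror count, so the choice is deterministic 
per client but spreads across clients with no extra communication.  The NID, mirror 
count, and hash function below are placeholders:

    /* Illustration only: "autonomous" mirror selection from the client NID. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t hash_str(const char *s)   /* FNV-1a, just an example hash */
    {
        uint64_t h = 14695981039346656037ULL;
        for (; *s; s++) {
            h ^= (unsigned char)*s;
            h *= 1099511628211ULL;
        }
        return h;
    }

    int main(void)
    {
        const char *client_nid = "192.168.1.12@o2ib";   /* placeholder NID */
        unsigned int mirror_count = 3;                  /* placeholder mirror count */

        unsigned int chosen = hash_str(client_nid) % mirror_count;
        printf("client %s reads from mirror %u of %u\n", client_nid, chosen, mirror_count);
        return 0;
    }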

> Are there any tuning parameters that should be considered, other than making 
> sure the “preferred” flag is NOT set on a single mirror, to help even out the 
> read workload among the OSTs?
>  
> Has anyone tested this and quantified the performance improvement?

I don't recall seeing any benchmarks to verify this behavior for reads, but I'd 
be interested to learn of any results you find.

In the typical FLR uses that I'm aware of, this is mainly between HDD and NVMe 
mirror copies, not multiple copies on the same class of storage, so they either use 
the "prefer" flag set on the flash mirror, or, with LU-14996, the client also checks 
the OS_STATFS_NONROT flag from the OSTs (if this is reported, check "lfs df -v" for 
the 'f' (flash) flag).

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org