Hi Thomas,
Thanks for the quick response. llapi_hsm_state_get itself, which is to say the
IOCTL, generates no changelog (CL) record in modern versions of Lustre, but the
open and close it performs around the IOCTL do, even though the file is opened
read-only. This is due to the following code (present in both 2.12.6 and 2.14.0
when I checked):
lustre/mdd/mdd_object.c, lines 3329-3339 (git blame: afef52b9f2b and
b45f8364a30, Sebastien Buisson, 2017; e8bafb17ed1, John L. Hammond, 2018;
9c2ffe39bd3, Andreas Dilger, 2018; 20d724103f4, Fan Yong, 2016):

    /* Record CL_CLOSE in changelog only if file was opened in write mode,
     * or if CL_OPEN was recorded and it's last close by user.
     * Changelogs mask may change between open and close operations, but
     * this is not a big deal if we have a CL_CLOSE entry with no matching
     * CL_OPEN. Plus Changelogs mask may not change often.
     */
    if (((!(mdd->mdd_cl.mc_mask & (1 << CL_OPEN)) &&
          (open_flags & (MDS_FMODE_WRITE | MDS_OPEN_APPEND |
                         MDS_OPEN_TRUNC))) ||
         ((mdd->mdd_cl.mc_mask & (1 << CL_OPEN)) && last_close_by_uid)) &&
        !(ma->ma_valid & MA_FLAGS && ma->ma_attr_flags & MDS_RECOV_OPEN)) {

In short, since RBH (via liblustreapi) is the sole opener/closer of the file when
it performs the IOCTL to get the HSM state, its close is the last close by that
user, so with OPEN in the mask the CLOSE event gets recorded. I have confirmed
that if I remove OPEN from my mask, the changelog no longer records the llapi
call. This is concerning, because everywhere I've looked the advice is to use
"all-ATIME". Is there a more appropriate subset of the mask that sites using
Robinhood predominantly for HSM should make sure they use? Notably, the v3 HSM
tutorial references all-ATIME:
https://github.com/cea-hpc/robinhood/wiki/v3_lhsm_tuto
There is a tool in RBH, rbh-config, that is supposed to configure the changelog
appropriately. Can you comment on how up to date it is? Comparing the mask it
enables against the events RBH appears to reference in its code makes me think
it's out of date, which is why I didn't use it from the outset.
Thanks again for your time.
Best,
ellis
From: thomasleibovici <[email protected]>
Sent: Thursday, June 2, 2022 5:31 PM
To: Ellis Wilson <[email protected]>;
[email protected]
Subject: [EXTERNAL] RE: [robinhood-support] Infinite llapi_hsm_state_get Calls
Dear Ellis,
Thank you for your precise analysis and report, which make the issue easy to
understand.
I'm quite surprised that llapi_hsm_state_get triggers a changelog event, given
that it is supposed to be a read-only action. Are you able to raise this with
your Lustre support?
If your filesystem doesn't have too many entries (fewer than about 100M), a
possible workaround is to disable changelogs (or at least the CLOSE event) for
the duration of the import, and then scan the filesystem after the import.
Thank you for keeping us updated.
Thomas
-------- Original message --------
From: Ellis Wilson via robinhood-support
<[email protected]>
Date: 02/06/2022 23:01 (GMT+01:00)
To: [email protected]
Subject: [robinhood-support] Infinite llapi_hsm_state_get Calls
Hi all,
On my Lustre 2.14.0 cluster running Robinhood 3.1.5, I noticed that my
changelogs never drained to zero.
It appears that files that are imported from a backing archive, but are not
modified or removed from the backing archive (i.e., never reach the new or
modified state in lhsm parlance), go into the following loop:
1. RBH sees the CLOSE associated with the import, or with a subsequent
open/read/close sequence.
2. Since the file is not already in the new or modified state, RBH decides it
must call llapi_hsm_state_get to see whether the state changed. This appears to
happen on CLOSE events.
3. llapi_hsm_state_get does an open/ioctl/close, appending another CLOSE to the
tail of the changelog.
4. RBH clears the CLOSE it just processed from the changelog, but the new one
persists.
5. Rinse and repeat starting at step 2.
If 1M files are imported, the changelog remains at roughly 2M entries (one open
and one close per file), though it's constantly being rototilled by RBH.
Thoughts?
ellis
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support