Hi Thomas,

Thanks for the quick response. llapi_hsm_state_get itself, that is, the ioctl, generates no changelog (CL) record in modern versions of Lustre. The open and close around it do, however, even though the file is only opened with read-only mode bits. This is due to the following code (present in both 2.12.6 and 2.14.0 when I checked):
lustre/mdd/mdd_object.c, lines 3329-3339 (git blame: commits afef52b9f2b, b45f8364a30, e8bafb17ed1, 9c2ffe39bd3, 20d724103f4):

        /* Record CL_CLOSE in changelog only if file was opened in write mode,
         * or if CL_OPEN was recorded and it's last close by user.
         * Changelogs mask may change between open and close operations, but
         * this is not a big deal if we have a CL_CLOSE entry with no matching
         * CL_OPEN. Plus Changelogs mask may not change often.
         */
        if (((!(mdd->mdd_cl.mc_mask & (1 << CL_OPEN)) &&
              (open_flags & (MDS_FMODE_WRITE | MDS_OPEN_APPEND |
                             MDS_OPEN_TRUNC))) ||
             ((mdd->mdd_cl.mc_mask & (1 << CL_OPEN)) && last_close_by_uid)) &&
            !(ma->ma_valid & MA_FLAGS && ma->ma_attr_flags & MDS_RECOV_OPEN)) {

In short, since RBH (via liblustreapi) is the sole opener/closer of the file when it performs the ioctl to get the HSM state, its close is the last close by that user, so its CLOSE event gets recorded. I have confirmed that if I remove OPEN from my mask, the changelog no longer records the llapi call.

This is concerning, because everywhere I've looked the advice is to use "all-ATIME". Is there a more precise subset of the mask that sites using Robinhood primarily for HSM should make sure they use? Notably, the v3 HSM tutorial references all-ATIME: https://github.com/cea-hpc/robinhood/wiki/v3_lhsm_tuto

There is a tool in RBH, rbh-config, that supposedly configures the changelog appropriately. Can you comment on how up-to-date it is?
Comparing the mask rbh-config enables against the event types RBH appears to reference in its code makes me think it's out of date, which is why I didn't use it from the outset.

Thanks again for your time.

Best,
ellis

From: thomasleibovici <thomasleibov...@free.fr>
Sent: Thursday, June 2, 2022 5:31 PM
To: Ellis Wilson <elliswil...@microsoft.com>; robinhood-support@lists.sourceforge.net
Subject: [EXTERNAL] RE: [robinhood-support] Infinite llapi_hsm_state_get Calls

Dear Ellis,

Thank you for your precise analysis and report, which make the issue easy to understand.

I'm quite surprised that llapi_hsm_state_get triggers a changelog event, given that it is supposed to be a read-only action. Is there a way for you to raise this with your Lustre support?

If your filesystem doesn't have too many entries (<100M), a possible workaround could be to disable changelogs (or at least the CLOSE event) for the duration of the import, and then scan the filesystem after the import.

Thank you for keeping us updated.

Thomas

-------- Original message --------
From: Ellis Wilson via robinhood-support <robinhood-support@lists.sourceforge.net>
Date: 02/06/2022 23:01 (GMT+01:00)
To: robinhood-support@lists.sourceforge.net
Subject: [robinhood-support] Infinite llapi_hsm_state_get Calls

Hi all,

I noticed on my Lustre 2.14.0 cluster running Robinhood 3.1.5 that my changelogs never reach zero. It appears that files imported from a backing archive, but never modified or removed from it (i.e., they never reach the new or modified state in lhsm parlance), go through the following loop:

1. RBH sees the CLOSE associated with the import event, or with a subsequent open/read/close sequence.
2. RBH determines that, since the file is not already in the new or modified state, it must call llapi_hsm_state_get to see whether that has changed. This appears to happen on the CLOSE event.
3. llapi_hsm_state_get does an open/ioctl/close, putting another CLOSE on the tail of the changelog queue.
4. RBH clears the just-processed CLOSE from the changelog, but the new one persists.
5. Rinse and repeat, starting at step 2.

If 1M files are imported, the changelog stays at roughly 2M entries (one OPEN and one CLOSE per file), though it's constantly being rototilled by RBH.

Thoughts?

ellis
_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support