Hi Thomas,

Thanks for the quick response.  llapi_hsm_state_get itself, which is to say the 
IOCTL, generates no CL record in modern versions of Lustre, but the open and 
close around it do, even though the file is opened read-only.  This is due to 
the following code (present in both 2.12.6 and 2.14.0 when I checked):

lustre/mdd/mdd_object.c
afef52b9f2b (Sebastien Buisson 2017-07-05 00:21:44 +0900 3329) /* Record CL_CLOSE in changelog only if file was opened in write mode,
b45f8364a30 (Sebastien Buisson 2017-07-31 20:50:22 +0900 3330)  * or if CL_OPEN was recorded and it's last close by user.
afef52b9f2b (Sebastien Buisson 2017-07-05 00:21:44 +0900 3331)  * Changelogs mask may change between open and close operations, but
afef52b9f2b (Sebastien Buisson 2017-07-05 00:21:44 +0900 3332)  * this is not a big deal if we have a CL_CLOSE entry with no matching
afef52b9f2b (Sebastien Buisson 2017-07-05 00:21:44 +0900 3333)  * CL_OPEN. Plus Changelogs mask may not change often.
afef52b9f2b (Sebastien Buisson 2017-07-05 00:21:44 +0900 3334)  */
e8bafb17ed1 (John L. Hammond   2018-03-01 10:02:09 -0600 3335) if (((!(mdd->mdd_cl.mc_mask & (1 << CL_OPEN)) &&
9c2ffe39bd3 (Andreas Dilger    2018-10-18 23:43:11 -0400 3336)       (open_flags & (MDS_FMODE_WRITE | MDS_OPEN_APPEND |
9c2ffe39bd3 (Andreas Dilger    2018-10-18 23:43:11 -0400 3337)                      MDS_OPEN_TRUNC))) ||
b45f8364a30 (Sebastien Buisson 2017-07-31 20:50:22 +0900 3338)      ((mdd->mdd_cl.mc_mask & (1 << CL_OPEN)) && last_close_by_uid)) &&
20d724103f4 (Fan Yong          2016-11-04 18:19:29 +0800 3339)     !(ma->ma_valid & MA_FLAGS && ma->ma_attr_flags & MDS_RECOV_OPEN)) {

In short, since RBH (via liblustreapi) is the sole opener/closer of the file 
when it performs the IOCTL to get HSM state, its CLOSE event gets recorded.  I 
have confirmed that if I remove OPEN from my mask, the changelog doesn't record 
the llapi call.  This is concerning, because everywhere I've seen the advice is to 
use "all-ATIME".  Is there a more accurate subset of the masks folks using 
Robinhood predominantly for HSM tasks should make sure they use?  Notably, the 
following link to the v3 HSM Tutorial references all-ATIME:
https://github.com/cea-hpc/robinhood/wiki/v3_lhsm_tuto

There is a tool in RBH that supposedly configures the changelog appropriately: 
rbh-config.  Can you comment on how up-to-date it is?  Comparing the masks it 
enables against events RBH seemingly references in-code makes me think it's 
out-of-date, which is why I'd not used it from the outset.

Thanks again for your time.

Best,

ellis


From: thomasleibovici <thomasleibov...@free.fr>
Sent: Thursday, June 2, 2022 5:31 PM
To: Ellis Wilson <elliswil...@microsoft.com>; 
robinhood-support@lists.sourceforge.net
Subject: [EXTERNAL] RE: [robinhood-support] Infinite llapi_hsm_state_get Calls

Dear Ellis,

Thank you for your precise analysis and report that perfectly helps to 
understand the issue.

I'm quite surprised that llapi_hsm_state_get triggers a changelog event, given 
it is supposed to be a read-only action. Are you able to raise this with your 
Lustre support provider?

If your filesystem does not have too many entries (<100M), a possible 
workaround could be to disable changelogs (or at least the close event) for the 
duration of the import, and then scan the filesystem after the import.

Thank you for keeping us updated.

Thomas

-------- Original message --------
From: Ellis Wilson via robinhood-support 
<robinhood-support@lists.sourceforge.net>
Date: 02/06/2022 23:01 (GMT+01:00)
To: robinhood-support@lists.sourceforge.net
Subject: [robinhood-support] Infinite llapi_hsm_state_get Calls

Hi all,

I noticed on my Lustre 2.14.0 cluster running Robinhood 3.1.5 that my 
changelogs were never reaching zero.

It appears that files that are imported from a backing archive, but are not 
modified or removed from the backing archive (i.e., never reach new or modified 
in lhsm parlance), go in a loop of the following:

1. RBH sees close associated with the import event or a subsequent 
open/read/close sequence.
2. RBH determines that since the file is not already in new or modified states, 
it must make a fresh llapi_hsm_state_get call to see if it changed.  This 
appears to occur on the CLOSE event.
3. llapi_hsm_state_get does an open/ioctl/close, appending another close to the 
tail of the CL queue.
4. RBH clears the just processed close from the CL, but the new one persists.
5. Rinse and repeat starting at 2.

If 1M files are imported, the changelog remains at roughly 2M entries (one open 
and one close per file), though it's constantly being rototilled by RBH.

Thoughts?

ellis


_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support