Hi Thomas.  Thanks for the reply and the helpful information.

For what it’s worth, we’ve had ATIME changelogs on for months, and the Robinhood 
changelog stats indicate that it processes fewer ATIME updates than CLOSE, 
TRUNC, and SATTR updates.  Of course, that may just be our systems.
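(For comparison, a rough way to get that picture straight from the changelog 
itself -- the MDT name below is just a placeholder, and a changelog consumer 
must already be registered:)

# count changelog records by type; field 2 of each record is the type (e.g. 11CLOSE)
lfs changelog lustre-MDT0000 | awk '{print $2}' | sort | uniq -c | sort -rn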

Are you saying the solution you proposed would be possible today if we were 
running Robinhood v3?  I’ll keep that handy for when we’re able to do a 
Robinhood upgrade.

It’s understandable that Robinhood won’t keep track of block counts in real 
time (and Lustre shouldn’t produce changelogs for that).  But from what we’ve 
seen, there are also instances where, even after a file is closed and no longer 
in use by a job, its size information remains out of date.  It’s as if some 
files “slip through the cracks.”  I would have expected every file to get a 
final update when it is closed, via a CLOSE or SATTR changelog record, so that 
closed files wouldn’t be left with an incorrect size.  Is this not how the 
changelogs should work?
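In case it helps to reproduce, here is a rough way one might check whether a 
CLOSE or SATTR record is actually emitted for one of these files (the path and 
MDT name are placeholders):

# resolve the file's FID, then look for changelog records referencing it
FID=$(lfs path2fid /lustre/fs/suspect_file)
lfs changelog lustre-MDT0000 | grep -F "$FID"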

Thanks,
Shawn

From: LEIBOVICI Thomas <thomas.leibov...@cea.fr>
Organization: CEA-DAM
Date: Tuesday, January 31, 2017 at 8:23 AM
To: Shawn Hall <shawn.h...@bp.com>, Jessica Otey <jo...@nrao.edu>, 
"robinhood-support@lists.sourceforge.net" 
<robinhood-support@lists.sourceforge.net>
Subject: Re: [robinhood-support] difference in fullness between rbh-du and lfs 
df -h (on lustre filesystem)

Hi Jessica, Hi Shawn,

For performance reasons, the Lustre changelog does not report every I/O,
so robinhood does not update an entry's block count each time a block is
written to a file. This can explain why the space usage progressively
diverges.
Moreover, Lustre may manage the block count attribute lazily, since it is
not critical for applications or for POSIX compliance. So even if robinhood
stat()s the file, it may not get the right value...
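For example (the path below is made up), the lag can sometimes be seen
directly:

dd if=/dev/zero of=/lustre/fs/blocktest bs=1M count=16
stat -c 'size=%s blocks=%b' /lustre/fs/blocktest   # size is correct, blocks may still be low
sleep 60
stat -c 'size=%s blocks=%b' /lustre/fs/blocktest   # blocks usually settle after a while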

The solutions proposed by Shawn give robinhood more chances to get
up-to-date block count information, but there is no guarantee that this
results in a 100% accurate block count.
Solution 1 is right; I recommend it for your issue.
Solution 2 probably has a significant performance impact, so it is not
recommended if your system is loaded.

The solution I'm thinking about (possible with robinhood v3) is to define an
"update" policy that does nothing except update entry attributes.
For example, this policy could apply only to entries modified more than 1h ago
and less than 1d ago, so robinhood has a chance to get an up-to-date block
count (by querying it 1h after the last change), and it only has to do this
for recent entries (the < 1d condition).

I'd define a policy like this:

define_policy update {
    default_action = none; # simply update entries, run no action
    scope { type == file }
    status_manager = none;
    default_lru_sort_attr = none;
}

update_rules {
    ignore { last_mod < 1h }   # too recent: block count may not be settled yet
    ignore { last_mod > 1d }   # old entry: already refreshed by a previous run

    rule default {
        condition = true; # apply to all entries (except above exclusions)
    }
}

# run every 12h
update_trigger {
    trigger_on = scheduled;
    check_interval = 12h;
}

# to run it, the command line should include:
--run=update
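
For instance, a one-shot run could look like this (the config path is just an
example):

robinhood -f /etc/robinhood.d/lustre.conf --run=update --once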

Regards,
Thomas

On 01/24/17 16:50, Hall, Shawn (NAG) wrote:

Jessica,



We see the same issue.  I posted my question back in October but haven’t heard 
anything back: https://sourceforge.net/p/robinhood/mailman/message/35435513/



We are in a better place than before, but I still see some amount of 
discrepancy over time.  We haven’t done a scan since late October and are 
staying reasonably in sync (not perfect, though).  I’ve done a couple of things 
that have helped:



1) In the Robinhood configuration file, I set md_update = always.  From looking 
through the source code, it appears this setting forces a metadata update at 
every opportunity, which is more frequent than the default.  I don’t believe 
this has made a big impact, but it helps 
(https://github.com/cea-hpc/robinhood/blob/5274d237105e04f20fc81258a927d9c4311ebc77/src/common/update_params.c#L428).
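For reference, in the config it ends up looking something like this (the 
enclosing block name below is from memory and may differ between robinhood 
releases -- check update_params.c above for the exact section in your version):

# assumption: robinhood 2.5-style block name; newer releases group the
# update parameters differently
db_update_policy {
    md_update = always;   # refresh POSIX metadata at every opportunity
}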

2) I also enabled ATIME changelogs.  This is not recommended for production by 
the Lustre or Robinhood developers, but our metadata load is low enough that it 
doesn’t cause problems.  There seems to be a hierarchy to when/if changelogs are 
reported 
(http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2016-March/013376.html),
 and I believe this change is the one that has helped us stay in sync the most.
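
For anyone wanting to try it, enabling the extra records looked roughly like 
the following (the MDT name is a placeholder, and the "+" syntax for adding a 
single flag may vary by Lustre version -- setting the full mask also works):

# add ATIME records to the changelog mask on the MDT
lctl set_param mdd.lustre-MDT0000.changelog_mask="+ATIME"
# verify the current mask
lctl get_param mdd.lustre-MDT0000.changelog_mask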



Both of these changes have performance ramifications and should be made at your 
own risk, but in our situation they seem to have helped.  I’d still be very 
happy to hear expert advice on keeping Robinhood in sync, however.



Thanks,

Shawn



On 1/24/17, 9:24 AM, "Jessica Otey" <jo...@nrao.edu> wrote:



    All,



    I have been observing for a while now a difference between how full
    robinhood believes our filesystem is and how full 'lfs df -h' reports it is.
    I am wondering if anyone has any insight into this.

    A bit of history... I run frequent reports against the robinhood
    database, which also include the output of the 'lfs df -h' command.
    Historically, the totals based on rbh-du and lfs df were essentially
    identical. Lately, they seem to keep growing apart--what seems to bring
    them back together is doing a full scan of the file system. As time
    passes after a scan (changelogs are on), 'lfs df' reports the filesystem
    as more and more full than rbh-du says it is.
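
    (The comparison in question is roughly the one below -- the mount point is
    just a placeholder for ours, and rbh-du is assumed to find its usual
    config:)

        # filesystem usage as seen by Lustre vs. by the robinhood database
        lfs df -h /lustre/naasc
        rbh-du /lustre/naasc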



    Indeed, I recently reinstalled (upgraded) my robinhood and noticed that
    when the report ran immediately after the file scan and before the service
    was activated (so no changelogs were being consumed), the difference was
    precisely zero. I believe that is a clue... but I don't know why this is
    happening (when there was a lengthy period when it wasn't), or what to do
    to fix it.

    Also, it might help to know that a non-negligible amount of file movement
    from one OST to another is taking place.



    Thanks,
    Jessica

    --
    Jessica Otey
    System Administrator II
    North American ALMA Science Center (NAASC)
    National Radio Astronomy Observatory (NRAO)
    Charlottesville, Virginia (USA)