On 01/16/14 22:39, Brock Palen wrote:
Our Scratch filesystem is organized in the format:
/scratch/$PROJECT/$USER
$PROJECT is the actual funder of the use of the system and we want to track
their usage over time.
In the past I used:
rbh-report -i --top-users -P/scratch/$PROJECT
to get the information we want. We would like to run this on a regular basis
(once a week) to get a time series of per-user, per-project use of scratch,
both by count and size. RBH provides all of this data.
I am running a git clone of the current RBH tree. The scan went fast, but the
invocation of:
rbh-report -i -P/scratch/$PROJECT
is actually taking more time than the entire filesystem scan :-(
The scan took 17 hours, while rbh-report -i -P /scratch/aero_flux started on
Jan 14th and is not yet done.
As of rbh 2.5, the full path is no longer stored in the DB, so that
directory renames, hardlinks etc. are handled correctly.
As a result, matching on the full path (what -P does) is more expensive,
because the path has to be rebuilt for each entry before it can be matched.
For your needs, you can use "rbh-du", which provides details (entry
count and volume per type) and can be filtered by user.
This command gives you the stats you want:
rbh-du -d -u $USER /scratch/$PROJECT
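For the weekly time series, a small wrapper around that command might look
like the sketch below. It is only an illustration: the SCRATCH_ROOT, OUT_DIR
and RBH_DU variables, the collect_usage function and the output file layout
are assumptions of mine, not part of robinhood. It also assumes that each
directory under /scratch/$PROJECT is named after the user's login, and it
stores the rbh-du output verbatim instead of parsing it.

```shell
#!/bin/sh
# Sketch: snapshot per-user, per-project scratch usage with rbh-du.
# Hypothetical settings (not robinhood options); override via environment.
SCRATCH_ROOT=${SCRATCH_ROOT:-/scratch}          # layout: $SCRATCH_ROOT/<project>/<user>
OUT_DIR=${OUT_DIR:-/var/tmp/scratch-usage}      # where snapshots are written
RBH_DU=${RBH_DU:-rbh-du}                        # command to invoke

collect_usage() {
    stamp=$(date +%Y-%m-%d)
    mkdir -p "$OUT_DIR/$stamp" || return 1
    # The trailing slash makes the glob match directories only.
    for userdir in "$SCRATCH_ROOT"/*/*/; do
        [ -d "$userdir" ] || continue           # glob matched nothing
        userdir=${userdir%/}
        user=${userdir##*/}                     # assumed to be the login name
        projdir=${userdir%/*}
        project=${projdir##*/}
        # Same invocation as above: rbh-du -d -u $USER /scratch/$PROJECT
        "$RBH_DU" -d -u "$user" "$projdir" \
            > "$OUT_DIR/$stamp/$project.$user.txt"
    done
}
```

Calling collect_usage from a weekly cron job would leave one dated directory
per run, with one file per project and user, which can be post-processed
later.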
Is there a way I can 'train' RBH to make these totals as it scans? Much like
the summary for the entire filesystem?
Robinhood does maintain pre-generated stats per user, per type etc., so
they can be queried instantly.
However, it only does this for the whole filesystem, not per subdirectory
(/scratch/$PROJECT).
This is an interesting feature that we can consider for a future version.
One possible solution for now is to split your robinhood setup into
multiple instances, one per project,
so that it pre-generates the stats you want (you will have 1 DB and 1
scanning process per project).
This can be done easily with rbh 2.5, with a single config file, as you
can now use environment variables in the config.
You can write in the config file: "fs_path = $FSPATH" and "db = $RHDB",
and then run robinhood commands like this:
FSPATH=/scratch/$PROJECT RHDB=robinhood_$PROJECT robinhood --scan ...
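To do that for every project in turn, a loop along these lines could work.
Treat it as a sketch: SCRATCH_ROOT, ROBINHOOD and the scan_all_projects
function are illustrative names of mine, the database name simply follows
the robinhood_$PROJECT pattern above, and whatever other robinhood options
you need (the "..." above) are passed through unchanged. I am also assuming
that each per-project database has already been created.

```shell
#!/bin/sh
# Sketch: run one robinhood scan per project directory, using the shared
# config file that reads $FSPATH and $RHDB from the environment.
# Hypothetical settings (not robinhood options); override via environment.
SCRATCH_ROOT=${SCRATCH_ROOT:-/scratch}
ROBINHOOD=${ROBINHOOD:-robinhood}

scan_all_projects() {
    # Any arguments are forwarded to robinhood after --scan.
    for projdir in "$SCRATCH_ROOT"/*/; do
        [ -d "$projdir" ] || continue           # glob matched nothing
        projdir=${projdir%/}
        project=${projdir##*/}
        FSPATH="$projdir" RHDB="robinhood_$project" "$ROBINHOOD" --scan "$@" \
            || echo "scan failed for $project" >&2
    done
}
```

The scans run one after another here. Whether to run several in parallel
depends on how much load the filesystem and the DB server can take.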
However, splitting robinhood won't be compatible with the changelog
mechanism, if you use it,
and you must configure purge policies with caution, as the 'df' that
triggers them is global.
HTH
Thomas
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
[email protected]
(734)936-1985
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support