Apologies if this reply does not get attached to the original thread,
I have just joined the mailing list.

We have only been running robinhood v3.0 here for a few months, but
I'd like to mention that time-based fileclasses actually do get defined,
despite the dire warning messages generated when you put them into your
config file.  They show up in the "rbh-report --class-info" output, and
you do get reasonable numbers in that report after re-scanning the
filesystem.

We have created time-based fileclasses here for reporting only, at this
point.  They are not as useful as we had hoped, because we have so far not
found a fast way to generate an age profile of a particular subdirectory.

For example, in our Lustre filesystem of about 270 million entries,
it takes 30+ hours to get the result of:

  rbh-report --class-info -P /lustre/path/mysubdir

It only takes about 50-55 minutes for the above command without
the "-P" option.  This is roughly the same amount of time required
to run our version of the "filesystem temperature" SQL query that
Thomas described in his earlier reply.

Suggestions for making the "-P /path/" case faster would be welcome.
I think we would find a --filter-class (-C) option for rbh-du and
rbh-find to be useful as well.

Oh, I should probably mention that our changelog mask here includes
modify time, but not access time items.  Presumably this means that
access times will only be updated in the robinhood database when
changelog records for those entries come through for other reasons.
Does anyone know if that is indeed what happens?

Regards,

Marion



===============================================================

Re: [robinhood-support] Fileclass with time criteria
From: HERTRICH Olivier <o.hertrich@da...> - 2017-11-10 16:16:57

Thomas,

Thank you for this quick answer.
I'm out of office next week; I'll try as soon as I'm back.

Olivier HERTRICH
EXPERT ITS
DIRECTION OPÉRATIONS TECHNOLOGIES / INTEGRATION CENTER
DIRECTION DES SYSTÈMES D'INFORMATION

T +33 562417338
M +33 608020775

o.hertrich@...<mailto:o.hertrich@...>
TARBES - FRANCE
http://www.daher.com - http://www.tbm.aero


From: LEIBOVICI Thomas [mailto:thomas.leibovici@...]
Sent: Friday, November 10, 2017 14:17
To: HERTRICH Olivier <o.hertrich@...>; robinhood-support@...
Subject: Re: [robinhood-support] Fileclass with time criteria

Hi,

Fileclasses are expected to be quite static, while time conditions may change
every second...

I know the kind of report you mention has already been done with robinhood.
For example, the "filesystem temperature" graph shown on slide 14 of this
presentation is based on such a report:
https://www.eofs.eu/_media/events/lad13/04_kilian_cavalotti_lustre.usage.monitoring.pdf
It shows the "modification age" and the "access age" of the data.

This is the query to build such a graph for access times. Just replace
'last_access' with 'last_mod' to get the same for last modification time:

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT age, SUM(c) AS cnt, SUM(v) AS vol FROM (
  SELECT c, v, CASE
    WHEN log_age < ROUND(LOG(10,900),5) THEN 'r_0'
    WHEN log_age < ROUND(LOG(10,3600),5) THEN 'r_1'
    WHEN log_age < ROUND(LOG(10,21600),5) THEN 'r_2'
    WHEN log_age < ROUND(LOG(10,86400),5) THEN 'r_3'
    WHEN log_age < ROUND(LOG(10,604800),5) THEN 'r_4'
    WHEN log_age < ROUND(LOG(10,2592000),5) THEN 'r_5'
    WHEN log_age < ROUND(LOG(10,5184000),5) THEN 'r_6'
    WHEN log_age < ROUND(LOG(10,7776000),5) THEN 'r_7'
    ELSE 'r_8' END AS age
  FROM (
    SELECT IF(UNIX_TIMESTAMP(NOW())>=last_access,
              ROUND(LOG(10,UNIX_TIMESTAMP(NOW())-last_access),5),
              NULL) AS log_age,
           COUNT(*) AS c, IFNULL(SUM(size),0) AS v
    FROM ENTRIES GROUP BY log_age
  ) AS ps
) AS stats GROUP BY age;
COMMIT;
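
For anyone who wants to check or reproduce the bucketing outside the
database, the CASE expression above can be sketched in Python. This is only
an illustrative sketch: the names age_bucket and age_profile and the
(last_access, size) input tuples are hypothetical and not part of robinhood.
It compares raw ages in seconds rather than rounded logarithms, so results
may differ slightly right at a range boundary. As I read the query, ages of
zero or less yield a NULL log_age and therefore fall through to 'r_8'; the
sketch mirrors that.

```python
import bisect
from collections import defaultdict

# Upper bounds (in seconds) of the age ranges r_0..r_7 from the query:
# 15 min, 1 h, 6 h, 1 day, 7 days, 30 days, 60 days, 90 days.
BOUNDS = [900, 3600, 21600, 86400, 604800, 2592000, 5184000, 7776000]


def age_bucket(age_seconds):
    """Map an age in seconds to a label 'r_0'..'r_8'."""
    if age_seconds <= 0:
        # Assumption: mirrors the NULL log_age case, which hits ELSE 'r_8'.
        return "r_8"
    # bisect_right returns the index of the first bound strictly greater
    # than the age, i.e. the first range whose "age < bound" test passes.
    return "r_%d" % bisect.bisect_right(BOUNDS, age_seconds)


def age_profile(entries, now):
    """Aggregate (last_access, size) pairs into {bucket: (count, volume)}."""
    profile = defaultdict(lambda: [0, 0])
    for last_access, size in entries:
        bucket = age_bucket(now - last_access)
        profile[bucket][0] += 1
        profile[bucket][1] += size or 0  # treat a missing size as 0
    return {b: tuple(cv) for b, cv in sorted(profile.items())}
```

Calling age_profile(rows, now) with rows fetched as (last_access, size) and
now as a Unix timestamp gives one (count, volume) pair per non-empty range.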

For duplicate name-size pairs, a crafty SQL query should do it...
You can first create a temporary table with id, size and name:

CREATE TABLE FIND_DUP AS (
  SELECT E.id, E.size, N.parent_id, N.name,
         this_path(N.parent_id, N.name) AS path
  FROM ENTRIES E, NAMES N
  WHERE E.id=N.id AND E.type='file');

Then you can list name-size duplicates using:

SELECT A.path, B.path, A.size FROM FIND_DUP A, FIND_DUP B
WHERE A.size=B.size AND A.name=B.name AND A.id <> B.id;
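
If the self-join turns out to be heavy (it returns every duplicate pair twice,
once as A,B and once as B,A), the same name-size matching can be done by
grouping instead. The following is a minimal Python sketch under stated
assumptions: the function name find_name_size_duplicates and the
(path, name, size) input tuples are hypothetical, standing in for rows
exported from a table such as FIND_DUP.

```python
from collections import defaultdict


def find_name_size_duplicates(files):
    """Group (path, name, size) tuples by (name, size).

    Returns {(name, size): [paths...]} for groups with more than one path.
    """
    groups = defaultdict(list)
    for path, name, size in files:
        groups[(name, size)].append(path)
    # Keep only keys shared by at least two entries.
    return {
        key: sorted(paths)
        for key, paths in groups.items()
        if len(paths) > 1
    }
```

Each group lists all paths sharing a name and size once, so a file with many
copies produces one entry rather than a row per pair.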

HTH
Thomas


On 11/09/17 19:28, HERTRICH Olivier wrote:
I am new to Robinhood and am looking for a report based on last_access time
to identify the size of data older than given dates.
It seems that time criteria are not allowed in fileclass definitions.

How can I get such reports?
I am also looking to find duplicate files (same name and same size, for now).
Is this possible?

Thank you for your help.

Olivier HERTRICH




_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support