Apologies if this reply does not get attached to the original thread; I have just joined the mailing list.

We have been running robinhood v3.0 here, but only for a few months. However, I'd like to mention that time-based fileclasses actually do get defined, despite the dire warning messages generated when you put them into your config file. They show up in the "rbh-report --class-info" output, and you do get reasonable numbers in that report after re-scanning the filesystem.

We have created time-based fileclasses here for reporting only, at this point. They are not as useful as we had hoped, because we have so far not found a fast way to generate an age profile of a particular subdirectory. For example, in our Lustre filesystem of about 270 million entries, it takes 30+ hours to get the result of:

    rbh-report --class-info -P /lustre/path/mysubdir

It only takes about 50-55 minutes for the above command without the "-P" option. This is roughly the same amount of time required to run our version of the "filesystem temperature" SQL query that Thomas described in his earlier reply.

Suggestions for making the "-P /path/" case faster would be welcome. I think we would find a --filter-class (-C) option for rbh-du and rbh-find to be useful as well.

Oh, I should probably mention that our changelog mask here includes modify time, but not access time items. Presumably this means that access times will only be updated in the robinhood database when changelog records for those entries come through for other reasons. Does anyone know if that is indeed what happens?

Regards,
Marion

===============================================================

Re: [robinhood-support] Fileclass with time criteria
From: HERTRICH Olivier <o.hertrich@da...> - 2017-11-10 16:16:57

Thomas,

Thank you for this quick answer. I'm out of office next week; I'll try as soon as I'm back.

Olivier HERTRICH
EXPERT ITS
DIRECTION OPÉRATIONS TECHNOLOGIES / INTEGRATION CENTER
DIRECTION DES SYSTÈMES D'INFORMATION
T +33 562417338
M +33 608020775
o.hertrich@...
TARBES - FRANCE
http://www.daher.com - http://www.tbm.aero

From: LEIBOVICI Thomas [mailto:thomas.leibovici@...]
Sent: Friday, 10 November 2017 14:17
To: HERTRICH Olivier <o.hertrich@...>; robinhood-support@...
Subject: Re: [robinhood-support] Fileclass with time criteria

Hi,

Fileclasses are expected to be quite static, while time conditions may change every second...

I know the kind of report you mention has already been done with robinhood. For example, the "filesystem temperature" graph shown on slide 14 of this presentation is based on such a report:
https://www.eofs.eu/_media/events/lad13/04_kilian_cavalotti_lustre.usage.monitoring.pdf
It shows the "modification age" and the "access age" of the data.

This is the request to build such a graph for access times. Just replace 'last_access' with 'last_mod' to get the same for last modification time:

    SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    START TRANSACTION;
    SELECT age, SUM(c) AS cnt, SUM(v) AS vol
    FROM (
        SELECT c, v,
            CASE
                WHEN log_age < ROUND(LOG(10,900),5)     THEN 'r_0'  -- under 15 minutes
                WHEN log_age < ROUND(LOG(10,3600),5)    THEN 'r_1'  -- under 1 hour
                WHEN log_age < ROUND(LOG(10,21600),5)   THEN 'r_2'  -- under 6 hours
                WHEN log_age < ROUND(LOG(10,86400),5)   THEN 'r_3'  -- under 1 day
                WHEN log_age < ROUND(LOG(10,604800),5)  THEN 'r_4'  -- under 7 days
                WHEN log_age < ROUND(LOG(10,2592000),5) THEN 'r_5'  -- under 30 days
                WHEN log_age < ROUND(LOG(10,5184000),5) THEN 'r_6'  -- under 60 days
                WHEN log_age < ROUND(LOG(10,7776000),5) THEN 'r_7'  -- under 90 days
                ELSE 'r_8'                                          -- everything else
            END AS age
        FROM (
            SELECT IF(UNIX_TIMESTAMP(NOW()) >= last_access,
                      ROUND(LOG(10, UNIX_TIMESTAMP(NOW()) - last_access), 5),
                      NULL) AS log_age,
                   COUNT(*) AS c,
                   IFNULL(SUM(size),0) AS v
            FROM ENTRIES
            GROUP BY log_age
        ) AS ps
    ) AS stats
    GROUP BY age;
    COMMIT;

For duplicate name-size couples, a crafty SQL request should do it...
You can first create a temporary table with id, size and name:

    CREATE TABLE FIND_DUP AS (
        SELECT E.id, E.size, N.parent_id, N.name,
               this_path(N.parent_id, N.name) AS path
        FROM ENTRIES E, NAMES N
        WHERE E.id = N.id AND E.type = 'file'
    );

Then you can list name-size duplicates using:

    SELECT A.path, B.path, A.size
    FROM FIND_DUP A, FIND_DUP B
    WHERE A.size = B.size AND A.name = B.name AND A.id <> B.id;

HTH
Thomas

On 11/09/17 19:28, HERTRICH Olivier wrote:

Discovering Robinhood, I am looking for a report based on last_access time to identify the size of data older than certain dates. It seems that time criteria are not allowed in Fileclass definitions. How can I get such reports?

I would also like to find duplicate files (same name and same size, for now). Is this possible?

Thank you for your help.
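
As a reading aid for the "filesystem temperature" query quoted in this thread: its CASE thresholds are in seconds, so buckets r_0 through r_7 cover ages under 15 minutes, 1 hour, 6 hours, 1 day, 7 days, 30 days, 60 days and 90 days, and r_8 collects everything else. The Python sketch below applies the same bucketing outside the database, which may help when sanity-checking the query's output on a small sample. It is only an illustration: the function names and the (age, size) input format are assumptions and not part of robinhood, and it compares raw ages rather than rounded logarithms, so results can differ slightly from the SQL very close to a threshold.

```python
from bisect import bisect_right
from collections import defaultdict

# Upper bounds in seconds, copied from the CASE expression in the SQL query:
# 15 min, 1 h, 6 h, 1 day, 7 days, 30 days, 60 days, 90 days.
THRESHOLDS = [900, 3600, 21600, 86400, 604800, 2592000, 5184000, 7776000]


def bucket(age_seconds):
    """Return the bucket name ('r_0' .. 'r_8') for an age in seconds."""
    if age_seconds <= 0:
        # In the SQL, a zero or negative age gives a NULL log_age,
        # which matches no WHEN clause and falls through to ELSE 'r_8'.
        return "r_8"
    # Number of thresholds <= age; an age equal to a threshold
    # goes to the next bucket, as with the strict '<' in the query.
    return "r_%d" % bisect_right(THRESHOLDS, age_seconds)


def profile(entries):
    """Sum entry count and volume per bucket.

    'entries' is a hypothetical iterable of (age_seconds, size_bytes) pairs.
    Returns {bucket_name: (count, volume)}.
    """
    stats = defaultdict(lambda: [0, 0])
    for age, size in entries:
        counters = stats[bucket(age)]
        counters[0] += 1
        counters[1] += size
    return {name: tuple(values) for name, values in sorted(stats.items())}


if __name__ == "__main__":
    sample = [(60, 4096), (7200, 1024), (30, 512), (100 * 86400, 10**9)]
    for name, (count, volume) in profile(sample).items():
        print(name, count, volume)
```

The ages would typically be computed as the current time minus last_access (or last_mod) for each entry, mirroring the subtraction in the innermost SELECT.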