On 11/14/17 02:58, Marion Hakanson wrote:
I'd like to mention that time-based fileclasses actually do get defined,
despite the dire warning messages generated when you put them into your
config file.  They show up in the "rbh-report --class-info" output, and
you do get reasonable numbers in that report after re-scanning the
filesystem.
Be aware that fileclass matching is done when an entry is discovered (at scan time, at policy run time, or when there is activity on the entry, i.e. changelog records).
That means that if you categorize entries younger than 1 day as the "recent" fileclass
and you request the report after 15 days, the entry will still appear as "recent", which is wrong because a long time has elapsed since the entry was matched... That's why time-based conditions are not recommended in fileclasses. However, as long as you are aware of that and you know your fileclasses may be wrong until you rescan everything, no problem.
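
The staleness described above can be illustrated with a small Python sketch. This is only a toy model, not robinhood code: the `Entry` class, the `match_fileclass` rule and the `on_match` helper are hypothetical names invented for the illustration, and the "recent" rule is assumed to be "modified less than 1 day ago".

```python
from dataclasses import dataclass

DAY = 86400  # seconds

def match_fileclass(age_seconds):
    """Hypothetical time-based rule: younger than 1 day means 'recent'."""
    return "recent" if age_seconds < DAY else "old"

@dataclass
class Entry:
    mtime: int           # last modification time (epoch seconds)
    fileclass: str = ""  # class stored in the database at match time

def on_match(entry, now):
    """Simulates a scan, policy run or changelog event: the class is
    evaluated once, at that moment, and the result is stored."""
    entry.fileclass = match_fileclass(now - entry.mtime)

entry = Entry(mtime=0)
on_match(entry, now=3600)            # matched 1 hour after modification

report_time = 15 * DAY               # report requested 15 days later
stored = entry.fileclass             # what a class report would show
actual = match_fileclass(report_time - entry.mtime)  # what the rule says now
print(stored, actual)
```

The stored class and the class the rule would give at report time only agree again once the entry is re-matched, for example by a full rescan.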


We have created time-based fileclasses here for reporting only, at this
point.  They are not as useful as we had hoped, because we have so far not
found a fast way to generate an age profile of a particular subdirectory.

For example, in our Lustre filesystem of about 270 million entries,
it takes 30+ hours to get the result of:

   rbh-report --class-info -P /lustre/path/mysubdir
In-DB path matching using -P is known to be inefficient.
I suggest you define fileclasses for the particular directories you want to track,
so that you get everything in the class-info report.

I think we would find a --filter-class (-C) option for rbh-du and
rbh-find to be useful as well.
Yes, that would be fine.

Regards,
Thomas
Oh, I should probably mention that our changelog mask here includes
modify time, but not access time items.  Presumably this means that
access times will only be updated in the robinhood database when
changelog records for those entries come through for other reasons.
Does anyone know if that is indeed what happens?

Regards,

Marion



===============================================================

Re: [robinhood-support] Fileclass with time criteria
From: HERTRICH Olivier <o.hertrich@da...> - 2017-11-10 16:16:57

Thomas,

Thank you for this quick answer.
I'm out of office next week; I'll try as soon as I'm back.

Olivier HERTRICH
EXPERT ITS
DIRECTION OPÉRATIONS TECHNOLOGIES / INTEGRATION CENTER
DIRECTION DES SYSTÈMES D'INFORMATION

T +33 562417338
M +33 608020775

o.hertrich@...
TARBES - FRANCE
  http://www.daher.com - http://www.tbm.aero


From: LEIBOVICI Thomas [mailto:thomas.leibovici@...]
Sent: Friday, November 10, 2017 14:17
To: HERTRICH Olivier <o.hertrich@...>; robinhood-support@...
Subject: Re: [robinhood-support] Fileclass with time criteria

Hi,

Fileclasses are expected to be quite static, whereas time conditions may change
every second...

I know the kind of report you mention has already been done with robinhood.
For example, the "filesystem temperature" graph shown on slide 14 of this
presentation is based on such a report:
https://www.eofs.eu/_media/events/lad13/04_kilian_cavalotti_lustre.usage.monitoring.pdf
It shows the "modification age" and the "access age" of the data.

This is the query to build such a graph for access times. Just replace
'last_access' with 'last_mod' to get the same for last modification time:

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; START TRANSACTION;
SELECT age, SUM(c) AS cnt, SUM(v) AS vol FROM ( SELECT c, v, CASE
  WHEN log_age < ROUND(LOG(10,900),5) THEN 'r_0'
  WHEN log_age < ROUND(LOG(10,3600),5) THEN 'r_1'
  WHEN log_age < ROUND(LOG(10,21600),5) THEN 'r_2'
  WHEN log_age < ROUND(LOG(10,86400),5) THEN 'r_3'
  WHEN log_age < ROUND(LOG(10,604800),5) THEN 'r_4'
  WHEN log_age < ROUND(LOG(10,2592000),5) THEN 'r_5'
  WHEN log_age < ROUND(LOG(10,5184000),5) THEN 'r_6'
  WHEN log_age < ROUND(LOG(10,7776000),5) THEN 'r_7'
  ELSE 'r_8' END AS age FROM ( SELECT
  IF(UNIX_TIMESTAMP(NOW())>=last_access,
     ROUND(LOG(10,UNIX_TIMESTAMP(NOW())-last_access),5),NULL) AS log_age,
  COUNT(*) AS c, IFNULL(SUM(size),0) AS v FROM ENTRIES GROUP BY log_age
) AS ps ) AS stats GROUP BY age; COMMIT;
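
To help read the result, the thresholds in the CASE expression correspond to 15 minutes, 1 hour, 6 hours, 1 day, 7 days, 30 days, 60 days and 90 days. The Python sketch below mirrors the bucket assignment for a single age value. It is a simplified illustration, not part of robinhood: the `age_bucket` name is made up here, and the sketch compares raw seconds, so it ignores the 5-decimal rounding of the logarithm that the query applies (ages within a few seconds of the largest thresholds may land in a different bucket in SQL).

```python
import bisect

# Upper bounds in seconds, copied from the CASE expression:
# 15 min, 1 h, 6 h, 1 day, 7 days, 30 days, 60 days, 90 days
BOUNDS = [900, 3600, 21600, 86400, 604800, 2592000, 5184000, 7776000]

def age_bucket(age_seconds):
    """Return the r_N label for an entry of the given age in seconds.

    Ages <= 0 give a NULL log_age in the query (access time in the
    future, or LOG of 0), so no WHEN matches and they fall to 'r_8'.
    """
    if age_seconds <= 0:
        return "r_8"
    # Number of bounds <= age is the index of the first bound > age
    return "r_%d" % bisect.bisect_right(BOUNDS, age_seconds)

for age in (60, 900, 86400, 40 * 86400, 365 * 86400):
    print(age, age_bucket(age))
```

Note that 'r_8' therefore mixes entries older than 90 days with entries whose access time is not in the past.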

For duplicate name-size pairs, a crafted SQL query should do it...
You can first create a temporary table with id, size and name:

CREATE TABLE FIND_DUP AS (SELECT E.id, E.size, N.parent_id, N.name,
this_path(N.parent_id, N.name) as path from ENTRIES E,NAMES N WHERE E.id=N.id
and E.type='file');

Then you can list name-size duplicates using:
SELECT A.path, B.path, A.size FROM FIND_DUP A, FIND_DUP B WHERE A.size=B.size
and A.name=B.name and A.id <> B.id;
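
Since the self-join matches on A.id <> B.id, each duplicate pair is listed twice, once as (A, B) and once as (B, A). If the rows of FIND_DUP are exported, the same name-size matching can be done by grouping, which lists each set of duplicates once. The Python sketch below assumes the rows are available as (path, name, size) tuples; the function name and the sample data are hypothetical.

```python
from collections import defaultdict

def find_name_size_duplicates(entries):
    """Group (path, name, size) rows by (name, size) and keep only the
    groups that contain more than one path."""
    groups = defaultdict(list)
    for path, name, size in entries:
        groups[(name, size)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()
            if len(paths) > 1}

# Made-up sample rows standing in for an export of FIND_DUP
sample = [
    ("/a/x.dat", "x.dat", 10),
    ("/b/x.dat", "x.dat", 10),
    ("/c/x.dat", "x.dat", 11),  # same name, different size
    ("/a/y.dat", "y.dat", 10),  # same size, different name
]
dups = find_name_size_duplicates(sample)
for (name, size), paths in dups.items():
    print(name, size, paths)
```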

HTH
Thomas


On 11/09/17 19:28, HERTRICH Olivier wrote:
I am discovering Robinhood and am looking for a report based on last_access
time, to identify the volume of data older than given dates.
It seems that time criteria are not allowed in fileclass definitions.

How can I get such a report?
I am also trying to find duplicate files (same name and same size, for now). Is
this possible?

Thank you for your help.





------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support



