Hello,

Sorry for the late answer.

Yes, you can run multiple rbh-find commands in parallel on the database; it 
scales really well in RBH 3.

However, this requires a database server able to handle such a load, so you may 
need to move the database to another server.


Yoann

________________________________
From: OGER Niels <niels.o...@meteo.fr>
Sent: Tuesday 23 April 2024 13:31:36
To: VALERI Yoann 610657
Cc: LEIBOVICI Thomas 601315; robinhood-supp...@lists.sf.net
Subject: Re: Extracting atime and mtime metadata from RBH?

Hello Yoann,

We are using RBH 3, with MySQL databases instead of MariaDB.
We might have an issue with the synchronization between Lustre and the master 
RBH database, or between the master and mirror databases, but that is not what 
we are looking into for now.
I do not know how the mirror database is synchronized with the master, but the 
mirror database does contain data.

We have roughly 400 million files on our Lustre filesystem, and we only managed 
to get the metadata for 8 million files with the rbh-find command that we left 
running for a whole night. We stopped it in the morning to assess what we got.
What we want to do is run several rbh-find commands in parallel on different 
directories to speed things up.
I'm assuming rbh-find is not affected by the synchronization process. If it is, 
we might reconsider our strategy of using the mirror. We do not need 100% 
up-to-date data for our statistics.
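A minimal sketch of that approach, in Python: one rbh-find process per directory, each streaming its output to its own file, with `max_workers` capping how many queries hit the database at once (the database load being the main concern raised in this thread). The directory list, the output file names and any extra rbh-find options (for example a print format) are hypothetical placeholders, and placing the directory before the extra options is an assumption modeled on find-style syntax.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_jobs(directories, out_dir, cmd=("rbh-find",), extra_args=()):
    """Return one (argv, output_path) pair per directory.

    cmd and extra_args are placeholders: the exact rbh-find options
    to use are an assumption, not taken from this thread.
    """
    jobs = []
    for i, directory in enumerate(directories):
        argv = [*cmd, directory, *extra_args]
        jobs.append((argv, Path(out_dir) / f"part_{i:04d}.txt"))
    return jobs


def run_one(argv, output_path):
    """Run one query, writing its stdout to a file; return the exit code."""
    with open(output_path, "w") as out:
        return subprocess.run(argv, stdout=out, check=False).returncode


def run_parallel(jobs, max_workers=4):
    """Run the jobs concurrently, at most max_workers at a time.

    Threads are enough here: the real work happens in the child
    processes, the Python side only waits for them.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, argv, path) for argv, path in jobs]
        return [f.result() for f in futures]
```

For example, `run_parallel(build_jobs(["/scratch/a", "/scratch/b"], "out"), max_workers=2)` would run two queries side by side and return their exit codes in order. Starting with a small `max_workers` and raising it while watching the database server is a cautious way to find where it stops scaling.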

Maybe we can set up a call; it might be easier to explain what we are trying to 
do.

Best regards,
Niels

________________________________
From: "Yoann VALERI" <yoann.val...@cea.fr>
To: "Niels OGER" <niels.o...@meteo.fr>, "Thomas LEIBOVICI" 
<thomas.leibov...@cea.fr>
Cc: robinhood-supp...@lists.sf.net
Sent: Tuesday 23 April 2024 09:06:43
Subject: RE: Extracting atime and mtime metadata from RBH?


Hello,


To help you better, could you please tell us whether you are trying to use 
Robinhood 3 or Robinhood 4?


For Robinhood 3, you must use the `robinhood` command to synchronize data, 
`rbh-find` to query said data and `rbh-report` to get general information about 
the mirrored filesystem.

RBH 3 relies on a MariaDB mirror to work properly.


Robinhood 4 uses `rbh-sync` to synchronize data and `rbh-find` to query the 
data, but doesn't have an `rbh-report` yet.

It relies mainly on MongoDB, as it is currently the only backend that can be 
written to and hold a mirror of the filesystem.

This means that for RBH 4, you cannot use `rbh-find` on anything but a MongoDB 
backend.


If you are indeed trying to use RBH 4, in the last two weeks we have added a 
new backend, `lustre-mpi`, that uses MPIFileUtils to synchronize data from a 
Lustre filesystem.

We are currently working on a similar backend for POSIX, which will also use 
MPIFileUtils.

With this, you will be able to reduce the synchronization time, depending on 
the resources you allocate.


Also, since you mentioned retention, we have an RBH 4 branch that adds such a 
feature; you might want to look into it.


Don't hesitate to come back to us, we'll be happy to help.


Kind regards,

Yoann Valeri.

________________________________
From: OGER Niels <niels.o...@meteo.fr>
Sent: Monday 22 April 2024 16:18:53
To: LEIBOVICI Thomas 601315
Cc: robinhood-supp...@lists.sf.net
Subject: Re: [robinhood-support] Extracting atime and mtime metadata from RBH?

Hello Thomas,

Thank you for your quick answer.
We managed to run an rbh-find command that outputs all the metadata we need, 
with the POSIX backend (we do not have MongoDB on the instance).

We are using a slave instance of the RBH database to run the commands, to avoid 
interfering with the real-time updates.
We are looking into how to run several rbh-find commands in parallel, because 
with a single command we expected it to run for 25 days and produce a 1.5 TB 
file.
Would you advise running several rbh-find commands in parallel to speed things 
up, or do you think the database would be the bottleneck and running several 
commands would only make it worse?
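Running several queries in parallel only pays off if the work is spread evenly across them. One hedged way to split directories between workers, assuming a rough entry count per directory is already known from some earlier query (the counts in the example are made up), is the greedy longest-processing-time heuristic: sort directories from largest to smallest and always hand the next one to the least-loaded worker. This is a generic scheduling technique, not a Robinhood feature.

```python
import heapq


def partition_dirs(counts, n_workers):
    """Greedy LPT split of {directory: entry_count} into n_workers groups.

    Returns a list of (total_count, [directories]), one per worker.
    """
    # Min-heap of (current load, worker index); ties go to the lower index.
    heap = [(0, i) for i in range(n_workers)]
    groups = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    # Largest directories first; sort by name too, for a stable result.
    for directory, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        groups[i].append(directory)
        loads[i] = load + count
        heapq.heappush(heap, (loads[i], i))
    return list(zip(loads, groups))
```

With hypothetical counts `{"/scratch/a": 100, "/scratch/b": 60, "/scratch/c": 50, "/scratch/d": 40}` and two workers, this yields `[(140, ["/scratch/a", "/scratch/d"]), (110, ["/scratch/b", "/scratch/c"])]`. LPT is not always optimal, but its worst case is known to stay within 4/3 of the best possible split, which is usually good enough for batch queries.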

best regards,
Niels

________________________________
From: "Thomas LEIBOVICI" <thomas.leibov...@cea.fr>
To: "Niels OGER" <niels.o...@meteo.fr>, robinhood-supp...@lists.sf.net
Sent: Thursday 18 April 2024 11:32:40
Subject: RE: Extracting atime and mtime metadata from RBH?

Dear Niels,

Please prioritize using English on this mailing list so that the community of 
other users can respond to you or benefit from the provided answers.

Did you take a look at the “rbh-find --printf” option, which should allow 
displaying any attribute present in the robinhood database? It can certainly 
display all the attributes you mentioned (size, path, user, group …).
See rbh-find --help or man rbh-find for more details.

AFAIK, there is no existing GUI like the one you mention. The idea was raised a 
long time ago, but nobody has implemented it yet.
There is, however, the Robinhood web UI, which lets you visualize some useful 
stats about usage, size, age, users, groups…

I hope that helps.

Best Regards,
Thomas

From: OGER Niels <niels.o...@meteo.fr>
Sent: Thursday 18 April 2024 09:32
To: robinhood-supp...@lists.sf.net
Subject: [robinhood-support] Extracting atime and mtime metadata from RBH?

Hello,

We are starting to use the RBH instances deployed on our 2 clusters at 
Météo-France.
As a first step, we would like to compute statistics on the last access date as 
a function of file age, in order to estimate retention periods more 
objectively.
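As an illustration of that kind of statistic, here is a hedged Python sketch that sums file sizes in a two-way table: age bucket (time since mtime, used here as a proxy for file age) by idle bucket (time since atime). It assumes the metadata has already been exported one file per line as `size|atime|mtime|path` with epoch timestamps; that layout, the bucket edges and the helper names are hypothetical, not an rbh-find default.

```python
import time
from collections import defaultdict

DAY = 86400
# Bucket edges in days; the last bucket is open-ended. Illustrative values.
EDGES = (30, 90, 180, 365)


def bucket(days):
    """Return the label of the bucket that contains `days`."""
    for edge in EDGES:
        if days < edge:
            return f"<{edge}d"
    return f">={EDGES[-1]}d"


def parse_line(line):
    """Parse a hypothetical 'size|atime|mtime|path' record.

    maxsplit=3 keeps any '|' inside the path itself.
    """
    size, atime, mtime, path = line.rstrip("\n").split("|", 3)
    return int(size), int(atime), int(mtime), path


def age_vs_access(lines, now=None):
    """Sum file sizes per (age bucket, idle bucket) cell.

    age  = now - mtime  (time since last modification)
    idle = now - atime  (time since last access)
    """
    now = time.time() if now is None else now
    table = defaultdict(int)
    for line in lines:
        size, atime, mtime, _path = parse_line(line)
        age = bucket((now - mtime) / DAY)
        idle = bucket((now - atime) / DAY)
        table[(age, idle)] += size
    return dict(table)
```

Passing an open file object as `lines` processes the export one line at a time, so even a very large dump does not need to fit in memory; only the small summary table is kept.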

The rbh-report command seemed the most promising, but we did not find an option 
to retrieve atime and mtime (command tested: rbh-report --dump-group xxx -c -f 
scratch). The other metadata provided by rbh-report are also of interest to us.
We could run several rbh-find commands specifying atime and mtime, but we would 
then be missing the file sizes (or we would have to combine rbh-report and 
rbh-find output).

Is there a command, or a set of options, to get the size, path, user/group and 
atime+mtime of files from RBH?

We are considering tinkering with the rbh-report code to add what we need, or 
running SQL queries directly against the tables, but that is unlikely to be 
trivial.

A somewhat unrelated question: do you know of a tool that provides a view like 
Ubuntu's "filesystem usage" display (i.e., concentric rings sized by directory 
size) for Lustre (based on RBH or not)?

Thanks in advance,
Niels
--
----- Météo-France -----
OGER NIELS
DSI/D - High-Performance Computing Project Manager
niels.o...@meteo.fr
Phone: +33 561078198


_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support
