Hello,
Sorry for the late answer.

Yes, you can run multiple rbh-find commands in parallel on the database; this scales really well in RBH 3. However, the database server must be able to handle such a load, so you may need to move the database to another server.

Yoann

________________________________
From: OGER Niels <niels.o...@meteo.fr>
Sent: Tuesday 23 April 2024 13:31:36
To: VALERI Yoann 610657
Cc: LEIBOVICI Thomas 601315; robinhood-supp...@lists.sf.net
Subject: Re: Extraction des métadonnées atime et mtime de RBH ?

Hello Yoann,

we are using RBH 3, with MySQL databases instead of MariaDB. We might have an issue with the synchronization between Lustre and the master RBH database, or between the master and mirror databases, but that is not what we are looking into for now. I do not know how the mirror database is synchronized with the master, but the mirror database does contain data.

We have roughly 400 million files on our Lustre filesystem, and we only managed to get the metadata for 8 million files with the rbh-find command we left running for a whole night. We stopped it in the morning to assess what we got. What we want to do is run several rbh-find commands in parallel on different directories to go faster.

I'm assuming rbh-find is not impacted by the synchronization process. If that is the case, we might reconsider our strategy of using the mirror. We do not need 100% up-to-date data for our statistics.

Maybe we can set up a call; it might be easier to explain what we are trying to do.

best regards,
Niels

________________________________
From: "Yoann VALERI" <yoann.val...@cea.fr>
To: "Niels OGER" <niels.o...@meteo.fr>, "Thomas LEIBOVICI" <thomas.leibov...@cea.fr>
Cc: robinhood-supp...@lists.sf.net
Sent: Tuesday 23 April 2024 09:06:43
Subject: RE: Extraction des métadonnées atime et mtime de RBH ?

Hello,

To help you better, could you please tell us whether you are trying to use Robinhood 3 or Robinhood 4?
For Robinhood 3, you must use the `robinhood` command to synchronize data, `rbh-find` to query that data, and `rbh-report` to get general information about the mirrored filesystem. RBH 3 relies on a MariaDB mirror to work properly.

Robinhood 4 uses `rbh-sync` to synchronize data and `rbh-find` to query it, but does not have an `rbh-report` yet. It relies mainly on MongoDB, which is currently the only backend that can be written to and hold a mirror of the filesystem. That means that for RBH 4, you cannot use `rbh-find` on anything but a Mongo backend.

If you are indeed trying to use RBH 4, we added in the last two weeks a new `lustre-mpi` backend that uses mpiFileUtils to synchronize data from a Lustre filesystem. We are currently working on a similar backend for POSIX, which will also use mpiFileUtils. With this, you will be able to lower the synchronization time, depending on the resources allocated.

Also, since you mentioned retention, we have a branch for RBH 4 that adds such a feature; you might want to look into it.

Don't hesitate to come back to us, we'll be happy to help.

Kind regards,
Yoann Valeri.

________________________________
From: OGER Niels <niels.o...@meteo.fr>
Sent: Monday 22 April 2024 16:18:53
To: LEIBOVICI Thomas 601315
Cc: robinhood-supp...@lists.sf.net
Subject: Re: [robinhood-support] Extraction des métadonnées atime et mtime de RBH ?

Hello Thomas,

thank you for your quick answer. We managed to run an rbh-find command with all the metadata we need with the posix backend (we do not have mongo on the instance). We are using a slave instance of the RBH database to run the commands, to avoid interfering with the real-time updates.

We are looking into how to run several rbh-find commands in parallel, because with only one command we expect it to run for 25 days and produce a 1.5 TB file.
Would you advise running several rbh-find commands in parallel to go faster, or do you think the database would be the bottleneck and running several commands would only make things worse?

best regards,
Niels

________________________________
From: "Thomas LEIBOVICI" <thomas.leibov...@cea.fr>
To: "Niels OGER" <niels.o...@meteo.fr>, robinhood-supp...@lists.sf.net
Sent: Thursday 18 April 2024 11:32:40
Subject: RE: Extraction des métadonnées atime et mtime de RBH ?

Dear Niels,

Please prioritize using English on this mailing list so that the community of other users can respond to you or benefit from the answers provided.

Did you take a look at the “rbh-find -printf” option, which can potentially display any attribute present in Robinhood's database? It can certainly display all the attributes you mentioned (size, path, user, group …). See rbh-find --help or man rbh-find for more details.

AFAIK, there is no existing GUI like the one you mention. The idea was raised a long time ago, but nobody has coded it yet. There is still the Robinhood web UI, which lets you visualize some useful stats about usage, size, age, users, groups…

I hope that helps.

Best Regards,
Thomas

From: OGER Niels <niels.o...@meteo.fr>
Sent: Thursday 18 April 2024 09:32
To: robinhood-supp...@lists.sf.net
Subject: [robinhood-support] Extraction des métadonnées atime et mtime de RBH ?

Hello,

we are starting to make use of the RBH instances deployed on our 2 clusters at Météo-France. As a first step, we would like to compute statistics on the last access date as a function of file age, in order to estimate retention periods more objectively.

The rbh-report command looked the most promising, but we did not find an option to retrieve the atime and mtime (command tested: rbh-report --dump-group xxx -c -f scratch). We are also interested in the other metadata reported by rbh-report. We could run several rbh-find commands specifying atime and mtime, but we would then be missing the file sizes (or we would have to combine rbh-report and rbh-find output).

Is there a command, or a set of options, to get the size, path, user/group and atime+mtime of files from RBH? We are considering digging into the rbh-report code to add what we want, or running SQL queries directly on the tables, but that may not be trivial.

A somewhat related side question: are you aware of a tool that gives a view like the Ubuntu "filesystem usage" display (= concentric rings sized by directory) for Lustre (whether based on RBH or not)?

thanks in advance,
Niels

--
----- Météo-France -----
OGER NIELS
DSI/D - High-Performance Computing Project Lead
niels.o...@meteo.fr
Landline: +33 561078198
_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support
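
The approach discussed in this thread, running several rbh-find commands in parallel on different directories, could be sketched as follows. This is only a minimal sketch under stated assumptions, not a Robinhood tool: the function names `run_one` and `run_parallel`, the `part-NNNN.out` output naming and the default worker count are hypothetical, and it assumes that the command accepts the directory to scan as its last argument. The options to put in the base command (configuration, output format) are left to the caller; see man rbh-find.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_one(base_cmd, directory, out_path):
    """Run base_cmd on one directory and write its stdout to out_path."""
    with open(out_path, "w") as out:
        # Assumption: the directory to scan is passed as the last argument.
        proc = subprocess.run([*base_cmd, str(directory)], stdout=out, check=False)
    return str(directory), proc.returncode, Path(out_path)


def run_parallel(base_cmd, directories, out_dir, max_workers=4):
    """Run one command per directory, with at most max_workers at a time.

    Returns a list of (directory, return code, output file) tuples,
    in the same order as the input directories.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # One output file per directory, numbered to avoid name collisions.
    jobs = [(d, out_dir / f"part-{i:04d}.out") for i, d in enumerate(directories)]
    # Threads are sufficient here: each one only waits for its child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, base_cmd, d, p) for d, p in jobs]
        return [f.result() for f in futures]
```

For example, `run_parallel(["rbh-find"], ["/scratch/projA", "/scratch/projB"], "dump")` would start one run per directory (the paths and the output directory are hypothetical). Since the database server has to handle the combined load, it seems prudent to start with a small `max_workers` and increase it while watching the server.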