I will add my data point to the discussion: I have been able to leave the
changelog running on the storage metadata server while a full Robinhood
scan runs.
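
In case it is useful, the rough shape of this on our side is the following
(standard Robinhood v3 options rather than our literal command lines, so
adjust config paths and daemon options for your site):

    # changelog reader, left running as a daemon
    robinhood --readlog

    # one-shot full scan, started in parallel
    robinhood --scan --once

Both update the same database, and we have not seen problems from running
them at the same time.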

Cheers,
megan

On Mon, Aug 23, 2021 at 10:08 AM Nathan Gregg - NOAA Affiliate via
robinhood-support <[email protected]> wrote:

> Thanks, Thomas, for the excellent feedback.  I am going to give this a try.
>
> This is probably a silly question, but is it OK to leave the changelog
> scans running while I do another full scan in parallel?
>
> Thanks for the help.
>
> Nate
>
>
> On Mon, Aug 16, 2021 at 3:57 AM [email protected]
> <[email protected]> wrote:
> >
> > Hello Nathan,
> >
> > Requests on subtrees of the filesystem are what make the query very slow,
> because such a request builds and matches the path of every entry in the DB.
> > One way to optimize your query is to define fileclasses for the parts of
> the filesystem you want to query.
> > e.g.
> > fileclass projectA {
> >    definition { tree == "/fs/subdirA" }
> > }
> > fileclass projectB {
> >    definition { tree == "/fs/subdirB" }
> > }
> > ...
> > Note you will need to rescan the FS to update the fileclass of all the
> entries.
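> > As a sketch (assuming the standard robinhood v3 options --scan and
> > --once), the rescan would be something like:
> >    robinhood --scan --once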
> >
> > Then
> > rbh-report --top-users=1000 --filter-class=projectA
> > should be faster than using -P.
> >
> > Of course, this assumes you know in advance the set of directories on
> which you want to get stats.
> >
> > I hope this helps,
> > Regards,
> > Thomas
> >
> > > -----Original Message-----
> > > From: Nathan Gregg - NOAA Affiliate via robinhood-support
> > > [mailto:robinhood-[email protected]]
> > > Sent: Monday, August 9, 2021 7:46 PM
> > > To: [email protected]
> > > Subject: [robinhood-support] Robinhood Report Performance
> > >
> > > Hello All,
> > >
> > > We have Robinhood up and running successfully, ingesting changelog
> > > data from two Lustre file systems.  Everything seems to perform well
> > > except when we run reports that are not covered by the accounting
> > > table.  For example, a report such as
> > > `rbh-report --top-users=1000 -P /fs/subdir` takes 1.5 days to
> > > complete.
> > >
> > > Our system has SSD drives and 384 GB of RAM.  The I/O load on the box
> > > looks very low.  I am sure more memory would help somewhat, but I am
> > > not sure how much.  Is there anything else we can do to dramatically
> > > reduce the run time of such reports?
> > >
> > > We are running `mysqltuner` and keeping up with its suggestions but so
> > > far reports such as the one above are painfully slow.
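> > > For context, one of the settings mysqltuner typically reports on is the
> > > InnoDB buffer pool size; as a rough illustration rather than our exact
> > > configuration, the current value can be checked with something like:
> > >    mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"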
> > >
> > > Thanks in advance for your support.
> > >
> > > Nate
> > >
> > >
>
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support
