I haven't used Gluster personally, but have you tried turning performance.parallel-readdir on? https://docs.gluster.org/en/latest/release-notes/3.10.0/#implemented-parallel-readdirp-with-distribute-xlator
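If you want to try it, here's a minimal sketch (the volume name "myvol" is a placeholder for your own; note that parallel-readdir depends on readdir-ahead being enabled first):

```shell
# "myvol" is a hypothetical volume name -- substitute your own.
# parallel-readdir requires readdir-ahead, so enable that first.
gluster volume set myvol performance.readdir-ahead on
gluster volume set myvol performance.parallel-readdir on

# Verify the setting took effect:
gluster volume get myvol performance.parallel-readdir
```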
It seems there's a reason why it's off by default (https://www.spinics.net/lists/gluster-devel/msg25518.html), but maybe it'd still be worth it for you?

On Mon, May 4, 2020 at 9:55 AM Alvin Starr via talk <talk@gtalug.org> wrote:
>
> I am hoping someone has seen this kind of problem before and knows of a
> solution.
> I have a client who has file systems filled with lots of small files, on
> the order of hundreds of millions of files.
> Running something like a find on the filesystem takes the better part of a
> week, so any kind of directory-walking backup tool will take even longer
> to run.
> The actual data size for 100M files is on the order of 15TB, so there is
> a lot of data to back up, but the data only grows by tens to hundreds
> of MB a day.
>
> Even things like xfsdump take a long time.
> For example, I tried xfsdump on a 50M-file set and it took over 2 days to
> complete.
>
> The only thing that seems workable is Veeam.
> It will run an incremental volume snapshot in a few hours a night, but I
> dislike adding proprietary kernel modules to the systems.
>
> --
> Alvin Starr || land: (647)478-6285
> Netvel Inc. || Cell: (416)806-0133
> al...@netvel.net ||
>
> ---
> Post to this mailing list talk@gtalug.org
> Unsubscribe from this mailing list
> https://gtalug.org/mailman/listinfo/talk
>