Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Greg Martyn via talk
I haven't used Gluster personally, but have you tried turning performance.parallel-readdir on? https://docs.gluster.org/en/latest/release-notes/3.10.0/#implemented-parallel-readdirp-with-distribute-xlator It seems there's a reason why it's on by default (

Re: [GTALUG] On the subject of backups.

2020-05-06 Thread John Sellens via talk
On Wed, 2020/05/06 10:38:29AM -0400, Howard Gibson via talk wrote: | > ZFS is another option. And it handles delta-backups very easily. | |How do you recover stuff from delta backups? You have to figure which backup the file or directory is in, right? Remember that snapshots, like RAID,

Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Alvin Starr via talk
On 5/6/20 10:18 AM, Lennart Sorensen via talk wrote: On Wed, May 06, 2020 at 07:25:29AM -0400, David Mason wrote: ZFS is another option. And it handles delta-backups very easily. It is however not the XFS that glusterfs says to use. If glusterfs is involved, XFS really seems to be the only

Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Howard Gibson via talk
On Wed, 6 May 2020 07:25:29 -0400 David Mason via talk wrote: > ZFS is another option. And it handles delta-backups very easily. David, How do you recover stuff from delta backups? You have to figure which backup the file or directory is in, right? My backup recoveries, admittedly

Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Lennart Sorensen via talk
On Wed, May 06, 2020 at 07:25:29AM -0400, David Mason wrote: > ZFS is another option. And it handles delta-backups very easily. It is however not the XFS that glusterfs says to use. If glusterfs is involved, XFS really seems to be the only option. -- Len Sorensen --- Post to this mailing list

Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Alvin Starr via talk
On 5/6/20 7:25 AM, David Mason via talk wrote: ZFS is another option. And it handles delta-backups very easily. ../Dave I have been following ZFS on and off, and it looks interesting, but I am kind of stuck because it is not included in CentOS/RH, which is a requirement in this case. It would

Re: [GTALUG] On the subject of backups.

2020-05-06 Thread David Mason via talk
ZFS is another option. And it handles delta-backups very easily. ../Dave On May 5, 2020, 11:27 PM -0400, Lennart Sorensen via talk wrote: > On Mon, May 04, 2020 at 10:42:25PM -0400, Alvin Starr via talk wrote: > > The files are generally a few hundred KB each. They may run into a few MB > >

Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Alvin Starr via talk
On 5/6/20 12:37 AM, Nicholas Krause wrote: [snip] Well, does the system have enough RAM? That is something that often isn't hard to increase. XFS has certainly in the past been known to require a fair bit of RAM to manage well. I mentioned checking /proc/meminfo as well to look at cache in

Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Alvin Starr via talk
On 5/5/20 11:27 PM, Lennart Sorensen wrote: On Mon, May 04, 2020 at 10:42:25PM -0400, Alvin Starr via talk wrote: The files are generally a few hundred KB each. They may run into a few MB but that's about it. I used to use ReiserFS back in the days of ext2/3 but it kind of fell out of favor

Re: [GTALUG] On the subject of backups.

2020-05-05 Thread Nicholas Krause via talk
On 5/5/20 11:27 PM, Lennart Sorensen wrote: On Mon, May 04, 2020 at 10:42:25PM -0400, Alvin Starr via talk wrote: The files are generally a few hundred KB each. They may run into a few MB but that's about it. I used to use ReiserFS back in the days of ext2/3 but it kind of fell out of favor

Re: [GTALUG] On the subject of backups.

2020-05-05 Thread Lennart Sorensen via talk
On Mon, May 04, 2020 at 10:42:25PM -0400, Alvin Starr via talk wrote: > The files are generally a few hundred KB each. They may run into a few MB > but that's about it. > I used to use ReiserFS back in the days of ext2/3 but it kind of fell out of > favor after the lead developer got sent away for

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread Nicholas Krause via talk
On 5/4/20 10:42 PM, Alvin Starr wrote: On 5/4/20 7:28 PM, nick wrote: On 2020-05-04 2:12 p.m., Alvin Starr via talk wrote: On 5/4/20 1:26 PM, Lennart Sorensen via talk wrote: On Mon, May 04, 2020 at 04:38:28PM +0200, ac via talk wrote: Hi Alvin, On a 2TB dataset, with +-600k files, I have

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread Alvin Starr via talk
On 5/4/20 7:28 PM, nick wrote: On 2020-05-04 2:12 p.m., Alvin Starr via talk wrote: On 5/4/20 1:26 PM, Lennart Sorensen via talk wrote: On Mon, May 04, 2020 at 04:38:28PM +0200, ac via talk wrote: Hi Alvin, On a 2TB dataset, with +-600k files, I have piped tree to less with limited joy, it

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread nick via talk
On 2020-05-04 2:12 p.m., Alvin Starr via talk wrote: > On 5/4/20 1:26 PM, Lennart Sorensen via talk wrote: >> On Mon, May 04, 2020 at 04:38:28PM +0200, ac via talk wrote: >>> Hi Alvin, >>> >>> On a 2TB dataset, with +-600k files, I have piped tree to less with >>> limited joy, it took a few

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread Alvin Starr via talk
On 5/4/20 1:26 PM, Lennart Sorensen via talk wrote: On Mon, May 04, 2020 at 04:38:28PM +0200, ac via talk wrote: Hi Alvin, On a 2TB dataset, with +-600k files, I have piped tree to less with limited joy, it took a few hours and at least I could search for what I was looking for... - 15TB and

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread Alvin Starr via talk
On 5/4/20 12:52 PM, John Sellens wrote: On Mon, 2020/05/04 12:03:19PM -0400, Alvin Starr wrote: | The client really only wants to use CentOS/RHEL and ZFS is not part of that | mix at the moment. Well, one could argue that ZFS on CentOS is fairly well supported ... If it were purely my choice I

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread Lennart Sorensen via talk
On Mon, May 04, 2020 at 04:38:28PM +0200, ac via talk wrote: > Hi Alvin, > > On a 2TB dataset, with +-600k files, I have piped tree to less with > limited joy, it took a few hours and at least I could search for > what I was looking for... - 15TB and 100M is another animal though > and as disk

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread John Sellens via talk
On Mon, 2020/05/04 12:03:19PM -0400, Alvin Starr wrote: | The client really only wants to use CentOS/RHEL and ZFS is not part of that | mix at the moment. Well, one could argue that ZFS on CentOS is fairly well supported ... | The data is actually sitting on a replicated Gluster cluster so

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread Jamon Camisso via talk
On 2020-05-04 09:55, Alvin Starr via talk wrote: > > I am hoping someone has seen this kind of problem before and knows of a > solution. > I have a client who has file systems filled with lots of small files on > the order of hundreds of millions of files. > Running something like a find on

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread Alvin Starr via talk
Sadly, this one is a bit of a non-starter. The client really only wants to use CentOS/RHEL, and ZFS is not part of that mix at the moment. The data is actually sitting on a replicated Gluster cluster, so trying to replace that with an HA NAS would start to get expensive if it were a commercial

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread John Sellens via talk
I bet no one would want this advice, but it seems to me that the implementation needs to change, i.e. that one big (possibly shallow) filesystem on XFS is unworkable. The best answer of course depends on the value of the data. One obvious approach is to use a filesystem/NAS with off-site

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread Alvin Starr via talk
I am not quite sure where the breaking point is, but I think part of the problem is that the directories start to get big. The directory hierarchy is only 5 to 10 nodes deep. It's running on XFS. On 5/4/20 10:38 AM, ac wrote: Hi Alvin, On a 2TB dataset, with +-600k files, I have piped tree to
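
For anyone wanting to check whether oversized directories are part of the problem described above, here is a rough Python sketch that counts entries per directory and reports the largest ones. It is only an illustration: the function name largest_directories and its top parameter are made up for this example, not something from the thread, and on a tree with hundreds of millions of files the walk itself will still take a long time since every directory has to be read once.

```python
import os


def largest_directories(root, top=10):
    """Return up to `top` (entry_count, path) pairs, largest directories first.

    Hypothetical helper for illustration; walks `root` iteratively with
    os.scandir so that no per-file stat() call is needed on most filesystems.
    """
    counts = []
    stack = [root]
    while stack:
        path = stack.pop()
        entry_count = 0
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    entry_count += 1
                    # Queue subdirectories; do not follow symlinks to avoid loops.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable or vanished directory: skip it and carry on.
            continue
        counts.append((entry_count, path))
    # Sort by entry count, biggest first.
    counts.sort(reverse=True)
    return counts[:top]
```

Calling largest_directories("/some/mount/point", top=20) would give the twenty directories with the most entries, which might help narrow down where the breaking point is. The path here is a placeholder.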

Re: [GTALUG] On the subject of backups.

2020-05-04 Thread ac via talk
Hi Alvin, On a 2TB dataset, with +-600k files, I have piped tree to less with limited joy, it took a few hours and at least I could search for what I was looking for... - 15TB and 100M is another animal though and as disk i/o will be your bottleneck, anything will take long, no? now, for my own

[GTALUG] On the subject of backups.

2020-05-04 Thread Alvin Starr via talk
I am hoping someone has seen this kind of problem before and knows of a solution. I have a client who has file systems filled with lots of small files, on the order of hundreds of millions of files. Running something like a find on the filesystem takes the better part of a week so any kind of