On 5/4/20 10:42 PM, Alvin Starr wrote:
On 5/4/20 7:28 PM, nick wrote:
On 2020-05-04 2:12 p.m., Alvin Starr via talk wrote:
On 5/4/20 1:26 PM, Lennart Sorensen via talk wrote:
On Mon, May 04, 2020 at 04:38:28PM +0200, ac via talk wrote:
Hi Alvin,

On a 2TB dataset with ~600k files, I have piped tree to less with
limited joy; it took a few hours, but at least I could search for
what I was looking for... 15TB and 100M files is another animal though,
and as disk I/O will be your bottleneck, anything will take long, no?

Now, for my own info/interest, can you tell me which fs is used for this? ext3?
Hmm, sounds awfully slow.

Just for fun I ran find on one of my drives:

# time find /data | wc -l
1825463
real    3m57.208s

That is with 5.3TB used out of 6.0TB.

Running it a second time when it is cached takes 7.7s.  Tree takes 14.7s.
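If you want to force the cold-cache case again without rebooting, dropping the kernel caches first should do it (as root; this is the standard Linux knob, nothing filesystem specific):

sync
echo 3 > /proc/sys/vm/drop_caches    # drop page cache + dentries/inodes
time find /data | wc -l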

Another volume:
# time find /mythdata | wc -l
54972

real    0m1.924s

That is with 15 TB out of 15 TB in use (yes that one always fills up
for some reason).

Both of those are LVM volumes with ext4 on top of software RAID6 using
5400rpm WD Red drives.

Seems either XFS is unbelievably bad, or there isn't enough RAM to cache
the filesystem metadata if you are having a problem with 100M files.
I only have a measly 32GB in my home machine.
I believe the directory hierarchy has a lot to do with the performance.
It seems that the listing time is non-linear, although I do not believe it's an N^2 kind of problem. I would have said the same as you before I started having to deal with tens of millions of files.
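A rough, unscientific way to test the hierarchy theory (the paths and counts here are made up, and the cache drop needs root):

mkdir -p /tmp/flat /tmp/fanned
# 100k files in one flat directory
( cd /tmp/flat && for i in $(seq 1 100000); do : > "f$i"; done )
# the same 100k files fanned out 1000 per subdirectory
for d in $(seq 1 100); do
    mkdir -p "/tmp/fanned/d$d"
    for i in $(seq 1 1000); do : > "/tmp/fanned/d$d/f$i"; done
done
sync; echo 3 > /proc/sys/vm/drop_caches
time find /tmp/flat | wc -l
time find /tmp/fanned | wc -l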



The first question I would have is how big the actual files are versus the space used. Most file systems try to pack files that don't fill a single block together, to save blocks and get better space usage. Assuming a) the files are rather small and b) there are a lot of them, I would be curious whether ReiserFS or btrfs helps, as to my knowledge both do much better tail merging into metadata blocks. Packing the small files better can help, since disk seeks seem to be the problem here. My disks will slow to something like 10 MB/s on ext4 with lots of small files like this, due to seeks.
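If you want to answer the size-versus-allocation question directly, GNU find can report both apparent size and allocated blocks (the path is just an example):

find /data -type f -printf '%s %b\n' | \
    awk '{sz += $1; alloc += $2 * 512}
         END {printf "apparent %d bytes, allocated %d bytes\n", sz, alloc}'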
The files are generally a few hundred KB each. They may run into a few MB, but that's about it. I used to use ReiserFS back in the days of ext2/3, but it kind of fell out of favor after the lead developer got sent away for murder.
Reiser was much faster and more reliable than ext at the time.
It would actually be interesting to see whether running a reiserfs or btrfs filesystem would make a significant difference, but in the long run I am kind of stuck with CentOS/RHEL-supported file systems, and reiser and btrfs are not part of that mix anymore.
Interesting; I recall CentOS 7 and RHEL 7 supporting it, so you must be on an older system, unless my memory is wrong :). The real problem is seek times. I am not sure off the top of my head whether there is a way to tune the kernel for this, but that is your problem, unfortunately.
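The one knob I do know of is vm.vfs_cache_pressure, which biases the kernel toward keeping dentry/inode metadata cached at the expense of page cache; whether it helps at 100M files I honestly can't say, and the value here is just a guess:

# default is 100; lower favours metadata caches (run as root)
sysctl -w vm.vfs_cache_pressure=50
# to make it persistent (file name is arbitrary):
echo 'vm.vfs_cache_pressure = 50' > /etc/sysctl.d/90-metadata-cache.conf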

I am not sure how much I can get by tweaking the filesystem.
I would need a 50x-100x improvement to make backups complete in a few hours, and most of what I have read comparing filesystem performance talks about differences of much less than 100%.

I have a feeling that the only answer will be something like Veeam where only changed blocks are backed up.
A directory tree walk just takes too long.
See my above comments, but the other two ideas would be to use some sort of filesystem or LVM snapshot, or to try dd as a benchmark to see whether a raw copy tool does better. Snapshots, or changed-block-based tools as you logically concluded, are probably the best way through; see the sketch below.
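As a sketch of the snapshot-plus-raw-copy idea (the VG, LV and target names are made up, and the snapshot has to be big enough to absorb writes during the copy):

# freeze a consistent view of the volume, then copy blocks
# instead of walking 100M inodes
lvcreate --size 50G --snapshot --name data_snap /dev/vg0/data
dd if=/dev/vg0/data_snap of=/backup/data.img bs=64M status=progress
lvremove -f /dev/vg0/data_snap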

Good luck though, as lots of small files are always a pain to transfer, especially on rotating disks,
Nick


The other question, as pointed out above, is how much memory the page cache and other kernel caches are using. I would start by checking /proc/meminfo; caching more of the metadata may be the other logical solution already mentioned.
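Something along these lines would show what the kernel is already holding (slabtop needs root):

grep -E 'MemFree|Buffers|Cached|Slab|SReclaimable' /proc/meminfo
# dentry and inode slabs specifically
slabtop -o | grep -E 'dentry|inode'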

Maybe that helps,

Nick


--
Fundamentally an organism has conscious mental states if and only if there is 
something that it is like to be that organism--something it is like for the 
organism. - Thomas Nagel
