Hi,
On 2023-11-21 13:41:15 -0400, David Steele wrote:
> On 11/20/23 16:41, Andres Freund wrote:
> >
> > On 2023-11-20 15:56:19 -0400, David Steele wrote:
> > > I understand this is an option -- but does it need to be? What is the
> > > benefit of excluding the manifest?
> >
> > It's not free to create the manifest, particularly if checksums are enabled.
>
> It's virtually free, even with the basic CRCs.
Huh?
perf stat src/bin/pg_basebackup/pg_basebackup -h /tmp/ -p 5440 -D - -cfast -Xnone --format=tar > /dev/null

          4,423.81 msec task-clock                #    0.626 CPUs utilized
           433,475      context-switches          #   97.987 K/sec
                 5      cpu-migrations            #    1.130 /sec
               599      page-faults               #  135.404 /sec
    12,208,261,153      cycles                    #    2.760 GHz
     6,805,401,520      instructions              #    0.56  insn per cycle
     1,273,896,027      branches                  #  287.964 M/sec
        14,233,126      branch-misses             #    1.12% of all branches

       7.068946385 seconds time elapsed

       1.106072000 seconds user
       3.403793000 seconds sys
perf stat src/bin/pg_basebackup/pg_basebackup -h /tmp/ -p 5440 -D - -cfast -Xnone --format=tar --manifest-checksums=CRC32C > /dev/null

          4,324.64 msec task-clock                #    0.640 CPUs utilized
           433,306      context-switches          #  100.195 K/sec
                 3      cpu-migrations            #    0.694 /sec
               598      page-faults               #  138.277 /sec
    11,952,475,908      cycles                    #    2.764 GHz
     6,816,888,845      instructions              #    0.57  insn per cycle
     1,275,949,455      branches                  #  295.042 M/sec
        13,721,376      branch-misses             #    1.08% of all branches

       6.760321433 seconds time elapsed

       1.113256000 seconds user
       3.302907000 seconds sys
perf stat src/bin/pg_basebackup/pg_basebackup -h /tmp/ -p 5440 -D - -cfast -Xnone --format=tar --no-manifest > /dev/null

          3,925.38 msec task-clock                #    0.823 CPUs utilized
           257,467      context-switches          #   65.590 K/sec
                 4      cpu-migrations            #    1.019 /sec
               552      page-faults               #  140.624 /sec
    11,577,054,842      cycles                    #    2.949 GHz
     5,933,731,797      instructions              #    0.51  insn per cycle
     1,108,784,719      branches                  #  282.466 M/sec
        11,867,511      branch-misses             #    1.07% of all branches

       4.770347012 seconds time elapsed

       1.002521000 seconds user
       2.991769000 seconds sys
I'd not call 7.06->4.77 or 6.76->4.77 "virtually free".
And this is actually *under*selling the cost - we waste a lot of cycles due to
bad buffering decisions. Once we fix that, the cost differential will increase
further.
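To put a rough percentage on the difference, here is a minimal sketch that takes the three "seconds time elapsed" values from the perf stat runs in this mail and computes the relative overhead against the --no-manifest run. The helper name and the run labels are made up for illustration, and this is a single run per configuration, so treat the result as indicative only.

```python
# Elapsed wall-clock seconds, copied from the three perf stat runs in this mail.
# The dictionary keys are illustrative labels, not pg_basebackup terminology.
ELAPSED_SECONDS = {
    "default manifest": 7.068946385,
    "--manifest-checksums=CRC32C": 6.760321433,
    "--no-manifest": 4.770347012,
}


def overhead_pct(with_manifest: float, without_manifest: float) -> float:
    """Extra elapsed time relative to the no-manifest baseline, in percent."""
    return (with_manifest / without_manifest - 1.0) * 100.0


baseline = ELAPSED_SECONDS["--no-manifest"]
for label, seconds in ELAPSED_SECONDS.items():
    if label == "--no-manifest":
        continue
    # One sample per configuration: no error bars, just a ballpark figure.
    print(f"{label}: {overhead_pct(seconds, baseline):.1f}% slower than --no-manifest")
```

With these single-run numbers the manifest-producing runs come out more than 40% slower in elapsed time than the --no-manifest run.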
> Anyway, would you really want a backup without a manifest? How would you
> know something is missing? In particular, for page incremental how do you
> know something is new (but not WAL logged) if there is no manifest? Is the
> plan to just recopy anything not WAL logged with each incremental?
Shrug. If you just want to create a new standby by copying the primary, I
don't think creating and then validating the manifest buys you much. Long term
backups are a different story, particularly if data files are stored
individually, rather than in a single checksummed file.
> > Also, for external backups, there's no manifest...
>
> There certainly is a manifest for many external backup solutions. Not having
> a manifest is just running with scissors, backup-wise.
You mean that you have an external solution gin up a backup manifest? I fail
to see how that's relevant here?
Greetings,
Andres Freund