On 13.05.2016 9:39, Michael Paquier wrote:
Hi all,

Beginning a new thread because the ext4 issues are closed, and because
pg_basebackup data durability merits a thread of its own. In short,
the problem is that pg_basebackup makes no effort to ensure that
the data it backs up is on disk, which is bad... One possible
recommendation is to use initdb -S after running pg_basebackup, but
making sure that data is on disk should be done before pg_basebackup

On Thu, May 12, 2016 at 8:09 PM, I wrote:
And actually this won't fly high if there is no equivalent of
walkdir() or if the fsync()'s are not applied recursively. On master
at least the refactoring had better be done cleanly first... For the
back branches, we could just have some recursive call like
fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do
you think that this should be part of fe_utils or src/common/? I'd
tend to think the latter is better suited as there is an equivalent in
the backend. On back-branches, we could just have something like
fsync_recursively that walks through the paths. An even simpler
approach would be to fsync() individually the things that have been
written, but that would hurt performance.
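
To make the recursive idea concrete, here is a minimal sketch of what
such a walk could look like using plain POSIX calls. The names
fsync_path and fsync_recursively are only illustrative here, not
existing routines in the tree, and the sketch assumes that opening a
directory read-only and calling fsync() on it is supported, as it is
on Linux. It counts failures instead of aborting.

```c
#define _POSIX_C_SOURCE 200809L	/* for lstat() and PATH_MAX under strict -std */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a file or directory read-only and flush it to disk; 0 on success. */
static int
fsync_path(const char *path)
{
	int			fd = open(path, O_RDONLY);
	int			rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}

/*
 * Hypothetical helper: fsync every regular file and directory below
 * "path", children first, then "path" itself. Returns the number of
 * entries that could not be synced. Symlinks and special files are skipped.
 */
static int
fsync_recursively(const char *path)
{
	struct stat st;
	int			failures = 0;

	assert(path != NULL);

	if (lstat(path, &st) != 0)
		return 1;

	if (S_ISDIR(st.st_mode))
	{
		DIR		   *dir = opendir(path);
		struct dirent *de;

		if (dir == NULL)
			return 1;
		while ((de = readdir(dir)) != NULL)
		{
			char		child[PATH_MAX];

			if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
				continue;
			if (snprintf(child, sizeof(child), "%s/%s",
						 path, de->d_name) >= (int) sizeof(child))
			{
				failures++;		/* path too long, count and move on */
				continue;
			}
			failures += fsync_recursively(child);
		}
		closedir(dir);
	}
	else if (!S_ISREG(st.st_mode))
		return 0;				/* neither file nor directory: nothing to do */

	if (fsync_path(path) != 0)
		failures++;
	return failures;
}
```

Syncing the directory after its children is what makes the directory
entries themselves durable, not only the file contents.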

So, attached are two patches that apply on HEAD to address the problem
of pg_basebackup not syncing the data it writes. As pg_basebackup
cannot directly use initdb -S because, as a client-side
utility, it may be installed while initdb is not (see Fedora and
RHEL), I have refactored the code so that the routines in initdb.c doing
the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and
this is 0001.

Patch 0002 is a set of fixes for pg_basebackup:
- In plain mode, fsync_pgdata is used so that all the tablespaces are
fsync'd at once. This also takes care of the case where pg_xlog is
a symlink.
- In tar mode (when not writing to stdout), each tar file is synced individually, and
the base directory is synced once at the end.
In both cases, failures are not considered fatal.
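
As a rough illustration of the tar-mode behaviour described above
(this is not the code from the patch), the sketch below syncs each tar
file, then the base directory once, and only warns on failure. The
names fsync_fname_sketch and sync_tar_backup, as well as the
array-of-file-names interface, are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L	/* for PATH_MAX under strict -std */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* fsync one file or directory; on failure print a warning and return -1. */
static int
fsync_fname_sketch(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
	{
		fprintf(stderr, "warning: could not open \"%s\": %s\n",
				path, strerror(errno));
		return -1;
	}
	if (fsync(fd) != 0)
	{
		fprintf(stderr, "warning: could not fsync \"%s\": %s\n",
				path, strerror(errno));
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Hypothetical helper: sync each tar file in "basedir", then "basedir"
 * itself. Failures are not fatal; the number of warnings is returned.
 */
static int
sync_tar_backup(const char *basedir, const char *const *tarfiles, int ntar)
{
	int			warnings = 0;
	int			i;

	assert(basedir != NULL);

	for (i = 0; i < ntar; i++)
	{
		char		path[PATH_MAX];

		if (snprintf(path, sizeof(path), "%s/%s",
					 basedir, tarfiles[i]) >= (int) sizeof(path))
		{
			warnings++;			/* path too long */
			continue;
		}
		if (fsync_fname_sketch(path) != 0)
			warnings++;
	}

	/* One sync of the directory makes the new directory entries durable. */
	if (fsync_fname_sketch(basedir) != 0)
		warnings++;

	return warnings;
}
```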

With pg_basebackup -X and pg_receivexlog, the manipulation of WAL
files is made durable by using fsync and durable_rename where needed
(credit goes mainly to Andres for this part).
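
For readers not familiar with the durable-rename pattern, here is a
simplified sketch of the idea: flush the source file, rename it, flush
the renamed file, then flush the containing directory so the new name
survives a crash. The function durable_rename_sketch is an
illustration only and leaves out details that the real durable_rename
handles; it also assumes fsync() works on a directory opened
read-only.

```c
#define _POSIX_C_SOURCE 200809L	/* for PATH_MAX under strict -std */
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open a file or directory read-only and flush it to disk; 0 on success. */
static int
fsync_path(const char *path)
{
	int			fd = open(path, O_RDONLY);
	int			rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}

/*
 * Simplified illustration: rename oldfile to newfile so that both the
 * file contents and the new directory entry are on disk afterwards.
 * Returns 0 on success, -1 on the first failure.
 */
static int
durable_rename_sketch(const char *oldfile, const char *newfile)
{
	char		parent[PATH_MAX];
	char	   *slash;

	assert(oldfile != NULL && newfile != NULL);

	/* Make sure the contents are durable before the new name appears. */
	if (fsync_path(oldfile) != 0)
		return -1;
	if (rename(oldfile, newfile) != 0)
		return -1;
	if (fsync_path(newfile) != 0)
		return -1;

	/* Work out the directory holding newfile, then sync that directory. */
	if (strlen(newfile) >= sizeof(parent))
		return -1;
	strcpy(parent, newfile);
	slash = strrchr(parent, '/');
	if (slash == NULL)
		strcpy(parent, ".");	/* no slash: current directory */
	else if (slash == parent)
		parent[1] = '\0';		/* file directly under "/" */
	else
		*slash = '\0';

	return fsync_path(parent);
}
```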

This set of patches is aimed only at HEAD. Back-patchable versions of
this patch would need to copy fsync_pgdata and friends into
streamutil.c for example.

I am adding this to the next CF for review as a bug fix.

Do we have any confidence that the data files being backed up are not corrupted, i.e. that they do not contain corrupted pages? Could pg_basebackup verify page checksums (for a cluster initialized with initdb -k) while backing up files?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

