Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-29 Thread Tom Lane
Michael Paquier  writes:
> Oops. Are you using gcc to see that? I compiled with clang and on
> Windows without noticing it.

Peter already noted that you'd only see it on platforms that define
PG_FLUSH_DATA_WORKS.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-29 Thread Michael Paquier
On Fri, Sep 30, 2016 at 1:31 AM, Jeff Janes  wrote:
> On Thu, Sep 29, 2016 at 8:33 AM, Peter Eisentraut
>  wrote:
>>
>> On 9/26/16 10:34 PM, Michael Paquier wrote:
>> > I thought that as long as the error string is shown to the user, it
>> > does not matter much if errno is still saved or not. All the callers
>> > of durable_rename() on frontends don't check for strerrno(errno)
>> > afterwards. Do you think it matters? Changing that back is trivial.
>> > Sorry for not mentioning directly in the thread that I modified that
>> > when dropping the last patch set.
>>
>> Actually, I think the equivalent backend code only does this to save the
>> errno across the close call because the elog call comes after the close.
>>  So it's OK to not do that in the frontend.
>>
>> With that in mind, I have committed the v3 series now.

Thanks!

> I'm getting compiler warnings:
>
> file_utils.c: In function 'fsync_pgdata':
> file_utils.c:86: warning: passing argument 2 of 'walkdir' from incompatible
> pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char *)'
> but argument is of type 'void (*)(const char *, bool,  const char *)'
> file_utils.c:88: warning: passing argument 2 of 'walkdir' from incompatible
> pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char *)'
> but argument is of type 'void (*)(const char *, bool,  const char *)'
> file_utils.c:89: warning: passing argument 2 of 'walkdir' from incompatible
> pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char *)'
> but argument is of type 'void (*)(const char *, bool,  const char *)'

Oops. Are you using gcc to see that? I compiled with clang and on
Windows without noticing it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-29 Thread Jeff Janes
On Thu, Sep 29, 2016 at 8:33 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/26/16 10:34 PM, Michael Paquier wrote:
> > I thought that as long as the error string is shown to the user, it
> > does not matter much if errno is still saved or not. All the callers
> > of durable_rename() on frontends don't check for strerrno(errno)
> > afterwards. Do you think it matters? Changing that back is trivial.
> > Sorry for not mentioning directly in the thread that I modified that
> > when dropping the last patch set.
>
> Actually, I think the equivalent backend code only does this to save the
> errno across the close call because the elog call comes after the close.
>  So it's OK to not do that in the frontend.
>
> With that in mind, I have committed the v3 series now.
>

I'm getting compiler warnings:

file_utils.c: In function 'fsync_pgdata':
file_utils.c:86: warning: passing argument 2 of 'walkdir' from incompatible
pointer type
file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
*)' but argument is of type 'void (*)(const char *, bool,  const char *)'
file_utils.c:88: warning: passing argument 2 of 'walkdir' from incompatible
pointer type
file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
*)' but argument is of type 'void (*)(const char *, bool,  const char *)'
file_utils.c:89: warning: passing argument 2 of 'walkdir' from incompatible
pointer type
file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
*)' but argument is of type 'void (*)(const char *, bool,  const char *)'

Cheers,

Jeff


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-29 Thread Peter Eisentraut
On 9/26/16 10:34 PM, Michael Paquier wrote:
> I thought that as long as the error string is shown to the user, it
> does not matter much if errno is still saved or not. All the callers
> of durable_rename() on frontends don't check for strerrno(errno)
> afterwards. Do you think it matters? Changing that back is trivial.
> Sorry for not mentioning directly in the thread that I modified that
> when dropping the last patch set.

Actually, I think the equivalent backend code only does this to save the
errno across the close call because the elog call comes after the close.
 So it's OK to not do that in the frontend.

With that in mind, I have committed the v3 series now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-26 Thread Michael Paquier
On Tue, Sep 27, 2016 at 11:16 AM, Peter Eisentraut
 wrote:
> On 9/23/16 11:15 AM, Michael Paquier wrote:
>>> 0002:
>>> >
>>> > durable_rename needs close(fd) in non-error path (compare backend code).
>> Oops, leak.
>
> Why did you remove the errno save and restore?

That's this bit in durable_rename, patch 0002:
+if (fsync(fd) != 0)
+{
+fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+progname, newfile, strerror(errno));
+close(fd);
+return -1;
+}
+close(fd);
I thought that as long as the error string is shown to the user, it
does not matter much if errno is still saved or not. All the callers
of durable_rename() on frontends don't check for strerrno(errno)
afterwards. Do you think it matters? Changing that back is trivial.
Sorry for not mentioning directly in the thread that I modified that
when dropping the last patch set.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-26 Thread Peter Eisentraut
On 9/23/16 11:15 AM, Michael Paquier wrote:
>> 0002:
>> >
>> > durable_rename needs close(fd) in non-error path (compare backend code).
> Oops, leak.

Why did you remove the errno save and restore?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-23 Thread Michael Paquier
On Fri, Sep 23, 2016 at 10:54 AM, Peter Eisentraut
 wrote:
> This is looking pretty good.  More comments on this patch set:

Thanks for the review.

> 0001:
>
> Keep the file order alphabetical in Mkvcbuild.pm.
>
> Needs nls.mk updates for file move (in initdb and pg_basebackup
> directories).

Fixed.

> 0002:
>
> durable_rename needs close(fd) in non-error path (compare backend code).

Oops, leak.

> Changing from fsync() to fsync_name() in some cases means that
> inaccessible files are now ignored.  I guess this would only happen in
> very obscure circumstances, but it's worth considering if that is OK.

In writeTimeLineHistoryFile() that's fine, the user should have
ownership of it as it has been created by receivelog.c. Similar remark
for .
In mark_file_as_archived, the previous open() call would have just failed.

> You added this comment:
>  * XXX: This means that we might not restart if a crash occurs
> before the
>  * fsync below. We probably should create the file in a temporary path
>  * like the backend does...
> pg_receivexlog uses the .partial suffix for this reason.  Why doesn't
> pg_basebackup do that?

Because it matters for pg_receivexlog as it has a retry logic and it
is able to rework on a partial segment. Thinking more about that this
comment does not make much sense, because for pg_basebackup a crash
would just cause the backup to be incomplete, so I have removed it.

> In open_walfile, could we move the fsync calls before the fstat or
> somewhere around there so we don't have to make the same calls in two
> different branches?

We could refactor a bit the code so as follows:
if (size == 16MB)
{
walfile = f;
}
else if (size == 0)
{
//do padding and lseek
}
else
error();
fsync();
I am not sure if this gains in clarity though, perhaps I looked too
much the current code.

> 0003:
>
> There was a discussion about renaming the --nosync option in initdb to
> --no-sync, but until that is done, the option in pg_basebackup should
> stay what initdb has right now.

Changed.

> There is a whitespace alignment error in usage().

Not sure I follow here.

> The -N option should be listed after -n.

In the switch()? Fixed.

> Some fsync calls are not covered by a do_sync conditional.  I see some
> in close_walfile(), HandleCopyStream(), ProcessKeepaliveMsg().

Right. I looked at the rest and did not notice any extra mistakes.
-- 
Michael
From 79a9c302bc4723d6709bf5f1dc6ca673598e8b4a Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 23 Sep 2016 23:23:05 +0900
Subject: [PATCH 1/4] Relocation fsync routines of initdb into src/common

Those are aimed at being used by other utilities, pg_basebackup mainly,
and at some other degree by pg_dump and pg_receivexlog.
---
 src/bin/initdb/initdb.c | 270 ++-
 src/bin/initdb/nls.mk   |   2 +-
 src/bin/pg_basebackup/nls.mk|   2 +-
 src/common/Makefile |   2 +-
 src/common/file_utils.c | 276 
 src/include/common/file_utils.h |  22 
 src/tools/msvc/Mkvcbuild.pm |   2 +-
 7 files changed, 313 insertions(+), 263 deletions(-)
 create mode 100644 src/common/file_utils.c
 create mode 100644 src/include/common/file_utils.h

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3350e13..e52e67d 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -61,6 +61,7 @@
 #endif
 
 #include "catalog/catalog.h"
+#include "common/file_utils.h"
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "mb/pg_wchar.h"
@@ -70,13 +71,6 @@
 #include "fe_utils/string_utils.h"
 
 
-/* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
-#if defined(HAVE_SYNC_FILE_RANGE)
-#define PG_FLUSH_DATA_WORKS 1
-#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
-#define PG_FLUSH_DATA_WORKS 1
-#endif
-
 /* Ideally this would be in a .h file, but it hardly seems worth the trouble */
 extern const char *select_default_timezone(const char *share_path);
 
@@ -237,13 +231,6 @@ static char **filter_lines_with_token(char **lines, const char *token);
 #endif
 static char **readfile(const char *path);
 static void writefile(char *path, char **lines);
-static void walkdir(const char *path,
-		void (*action) (const char *fname, bool isdir),
-		bool process_symlinks);
-#ifdef PG_FLUSH_DATA_WORKS
-static void pre_sync_fname(const char *fname, bool isdir);
-#endif
-static void fsync_fname_ext(const char *fname, bool isdir);
 static FILE *popen_check(const char *command, const char *mode);
 static void exit_nicely(void);
 static char *get_id(void);
@@ -270,7 +257,6 @@ static void load_plpgsql(FILE *cmdfd);
 static void vacuum_db(FILE *cmdfd);
 static void make_template0(FILE *cmdfd);
 static void make_postgres(FILE *cmdfd);
-static void fsync_pgdata(void);
 static void trapsig(int signum);
 static void check_ok(void);
 static char *escape_quotes(const char *src);
@@ -529,177 

Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-22 Thread Peter Eisentraut
This is looking pretty good.  More comments on this patch set:

0001:

Keep the file order alphabetical in Mkvcbuild.pm.

Needs nls.mk updates for file move (in initdb and pg_basebackup
directories).

0002:

durable_rename needs close(fd) in non-error path (compare backend code).

Changing from fsync() to fsync_name() in some cases means that
inaccessible files are now ignored.  I guess this would only happen in
very obscure circumstances, but it's worth considering if that is OK.

You added this comment:

 * XXX: This means that we might not restart if a crash occurs
before the
 * fsync below. We probably should create the file in a temporary path
 * like the backend does...

pg_receivexlog uses the .partial suffix for this reason.  Why doesn't
pg_basebackup do that?

In open_walfile, could we move the fsync calls before the fstat or
somewhere around there so we don't have to make the same calls in two
different branches?

0003:

There was a discussion about renaming the --nosync option in initdb to
--no-sync, but until that is done, the option in pg_basebackup should
stay what initdb has right now.

There is a whitespace alignment error in usage().

The -N option should be listed after -n.

Some fsync calls are not covered by a do_sync conditional.  I see some
in close_walfile(), HandleCopyStream(), ProcessKeepaliveMsg().

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-14 Thread Michael Paquier
On Thu, Sep 15, 2016 at 9:44 AM, Peter Eisentraut
 wrote:
> On 9/12/16 11:16 PM, Michael Paquier wrote:
>>> I don't think tar file output in pg_basebackup needs to be fsynced.
>>> > It's just a backup file like what pg_dump produces, and we don't fsync
>>> > that either.  The point of this change is to leave a data directory in
>>> > a synced state equivalent to what initdb leaves behind.
>> Here I don't agree. The point of the patch is to make sure that what
>> gets generated by pg_basebackup is consistent on disk, for all the
>> formats. It seems weird to me that we could say that the plain format
>> makes things consistent while the tar format may cause some files to
>> be lost in case of power outage. The docs make it clear I think:
>> +By default, pg_basebackup will wait for all files
>> +to be written safely to disk.
>> But perhaps this should directly mention that this is done for all the 
>> formats?
>
> That doesn't really explain why we fsync.

Data durability, particularly on ext4 as it has been discussed a
couple of months back [1]. In case if a crash it could be perfectly
possible to lose files, hence we need to be sure that the files
themselves are flushed, as well as their parent directory to prevent
any problems. I think that we should protect users' backups as much as
we can, in the range we can.

> If we think that all
> "important" files should be fsynced, why aren't we doing it in pg_dump?

pg_dump should do it where it can, see thread [2]. I am tackling
problems once at a time, and that's as well a reason why I would like
us to have a common set of routines in src/common or src/fe_utils to
help improve this handling.

> Or psql, or server-side copy?  Similarly, there is no straightforward
> mechanism by which you can unpack the tar file generated by
> pg_basebackup and get the unpacked directory fsynced properly.  (I
> suppose initdb -S should be recommended.)

Yes, those are cases that you cannot cover. Imagine for example
pg_basebackup's tar or pg_dump data sent to stdout. There is nothing
we can actually do for all those cases. However what we can do it
giving a set of options making possible to get consistent backups for
users.

[1] Silent data loss on ext4:
https://www.postgresql.org/message-id/56583bdd.9060...@2ndquadrant.com

[2] Data durability:
https://www.postgresql.org/message-id/20160327233033.gd20...@awork2.anarazel.de
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-14 Thread Peter Eisentraut
On 9/12/16 11:16 PM, Michael Paquier wrote:
>> I don't think tar file output in pg_basebackup needs to be fsynced.
>> > It's just a backup file like what pg_dump produces, and we don't fsync
>> > that either.  The point of this change is to leave a data directory in
>> > a synced state equivalent to what initdb leaves behind.
> Here I don't agree. The point of the patch is to make sure that what
> gets generated by pg_basebackup is consistent on disk, for all the
> formats. It seems weird to me that we could say that the plain format
> makes things consistent while the tar format may cause some files to
> be lost in case of power outage. The docs make it clear I think:
> +By default, pg_basebackup will wait for all files
> +to be written safely to disk.
> But perhaps this should directly mention that this is done for all the 
> formats?

That doesn't really explain why we fsync.  If we think that all
"important" files should be fsynced, why aren't we doing it in pg_dump?
Or psql, or server-side copy?  Similarly, there is no straightforward
mechanism by which you can unpack the tar file generated by
pg_basebackup and get the unpacked directory fsynced properly.  (I
suppose initdb -S should be recommended.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-12 Thread Michael Paquier
On Sat, Sep 10, 2016 at 6:27 PM, Peter Eisentraut
 wrote:
> In fsync_pgdata(), you have removed the progname from one error
> message, even though it is passed into the function.

Right. Good catch.

> Also, I think
> fsync_pgdata() should not be printing initdb's progress messages.
> That should stay in initdb.

That's a reference to that.
+   fputs(_("syncing data to disk ... "), stdout);
+   fflush(stdout);
Oops, fixed. I missed that now that I look at it. Perhaps I was
thinking something else when hacking that lastly, like getting this
message out for pg_basebackup as well.

> Worse, in 0002 you are then changing the
> output, presumably to suit pg_basebackup or something else, which
> messes up the initdb output.

Yes, fixed.

> durable_rename() does not fsync an existing newfile before renaming,
> in contrast to the backend code in fd.c.

That's in 0002. In the case of initdb it did not really matter, but
that matters for pg_receivexlog, so let's do it unconditionally. I
reworked the code to check if the existing new file is here, and
fsync() if it exists.

> I'm slightly concerned about lots of code duplication in
> durable_rename, fsync_parent_path between backend and standalone code.

Based on the ground of code readability, I am not sure that it would
be worth sharing the code of durable_rename or even fsync_parent_path
between the backend and the frontend, particularly because they are
actually not really duplicated... One good step in favor of that would
be to get a elog()/ereport() layer in src/common as well, but that's
really something that this set of patches should tackle, no?

> Also, I'm equally slightly concerned that having duplicate function
> names will mess up tag search for someone.

There are similar cases, take palloc().. Now perhaps some of those
functions could be renamed pg_fsync_... Thoughts?

> This is a preexisting mistake, but fsync_fname_ext() should just be
> fsync_fname() to match the backend naming.  In the backend,
> fsync_fname_ext() "extends" fsync_fname() by accepting an ignore_perm
> argument, but the initdb code doesn't do that.

OK.

> I don't think tar file output in pg_basebackup needs to be fsynced.
> It's just a backup file like what pg_dump produces, and we don't fsync
> that either.  The point of this change is to leave a data directory in
> a synced state equivalent to what initdb leaves behind.

Here I don't agree. The point of the patch is to make sure that what
gets generated by pg_basebackup is consistent on disk, for all the
formats. It seems weird to me that we could say that the plain format
makes things consistent while the tar format may cause some files to
be lost in case of power outage. The docs make it clear I think:
+By default, pg_basebackup will wait for all files
+to be written safely to disk.
But perhaps this should directly mention that this is done for all the formats?

> Some of the changes in receivelog.c seem excessive.  Replacing a plain
> fsync() with fsync_fname_ext() plus fsync_parent_path() isn't
> justified by the references you have provided.  This would need more
> explanation.

In mark_file_as_archived(), we had better do it or in case of a crash
the .done file would just be gone. open_walfile() as well need that,
to ensure that the segment is created and zeroed, and in the case
where it was padded (is this one overthinking it?).

Patch 0003 had conflicts caused by 9083353.
-- 
Michael
From 6158dc8f70dd100f59d7d2c4b7a60435b7c1cfac Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 13 Sep 2016 12:14:12 +0900
Subject: [PATCH 1/4] Relocation fsync routines of initdb into src/common

Those are aimed at being used by other utilities, pg_basebackup mainly,
and at some other degree by pg_dump and pg_receivexlog.
---
 src/bin/initdb/initdb.c | 270 ++-
 src/common/Makefile |   2 +-
 src/common/file_utils.c | 276 
 src/include/common/file_utils.h |  22 
 src/tools/msvc/Mkvcbuild.pm |   2 +-
 5 files changed, 311 insertions(+), 261 deletions(-)
 create mode 100644 src/common/file_utils.c
 create mode 100644 src/include/common/file_utils.h

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3350e13..e52e67d 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -61,6 +61,7 @@
 #endif
 
 #include "catalog/catalog.h"
+#include "common/file_utils.h"
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "mb/pg_wchar.h"
@@ -70,13 +71,6 @@
 #include "fe_utils/string_utils.h"
 
 
-/* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
-#if defined(HAVE_SYNC_FILE_RANGE)
-#define PG_FLUSH_DATA_WORKS 1
-#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
-#define PG_FLUSH_DATA_WORKS 1
-#endif
-
 /* Ideally this would be in a .h file, but it hardly seems worth the trouble */
 extern const char *select_default_timezone(c

Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-12 Thread Michael Paquier
On Tue, Sep 13, 2016 at 10:37 AM, Andres Freund  wrote:
> On 2016-09-13 10:35:38 +0900, Michael Paquier wrote:
>> On Sat, Sep 10, 2016 at 7:36 PM, Craig Ringer
>>  wrote:
>> > We need it for tap tests. More and more will use pg_basebackup and it'll
>> > start hurting test speeds badly.
>>
>> Ah yes, that's a good argument. hamster would suffer pretty badly
>> after that if nothing is done. I'll get an extra patch out for that,
>> with --no-sync and not --nosync by the way.
>
> FWIW, it might be better to instead use eatmydata in the cron
> invocations on such slow machines, that way we also test the fsync paths
> in them.

Er, well.. libeatmydata is only available in AUR for Archlinux x86_64,
and not in sight for Arch ARM...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-12 Thread Andres Freund
On 2016-09-13 10:35:38 +0900, Michael Paquier wrote:
> On Sat, Sep 10, 2016 at 7:36 PM, Craig Ringer
>  wrote:
> > We need it for tap tests. More and more will use pg_basebackup and it'll
> > start hurting test speeds badly.
> 
> Ah yes, that's a good argument. hamster would suffer pretty badly
> after that if nothing is done. I'll get an extra patch out for that,
> with --no-sync and not --nosync by the way.

FWIW, it might be better to instead use eatmydata in the cron
invocations on such slow machines, that way we also test the fsync paths
in them.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-12 Thread Michael Paquier
On Sat, Sep 10, 2016 at 7:36 PM, Craig Ringer
 wrote:
> We need it for tap tests. More and more will use pg_basebackup and it'll
> start hurting test speeds badly.

Ah yes, that's a good argument. hamster would suffer pretty badly
after that if nothing is done. I'll get an extra patch out for that,
with --no-sync and not --nosync by the way.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-10 Thread Craig Ringer
On 3 Sep. 2016 9:22 pm, "Michael Paquier"  wrote:
>
> On Sat, Sep 3, 2016 at 12:42 AM, Magnus Hagander 
wrote:
> > On Fri, Sep 2, 2016 at 8:50 AM, Michael Paquier <
michael.paqu...@gmail.com>
> > wrote:
> >> On Fri, Sep 2, 2016 at 2:20 AM, Peter Eisentraut
> >>  wrote:
> >> > On 5/13/16 2:39 AM, Michael Paquier wrote:
> >> What do others think about that? I could implement that on top of 0002
> >> with some extra options. But to be honest that looks to be just some
> >> extra sugar for what is basically a bug fix... And I am feeling that
> >> providing such a switch to users would be a way for one to shoot
> >> himself badly, particularly for pg_receivexlog where a crash can cause
> >> segments to go missing.
> >>
> >
> > Well, why do we provide a --nosync option for initdb? Wouldn't the
argument
> > basically be the same?
>
> Yes, the good-for-testing-but-not-production argument.

We need it for tap tests. More and more will use pg_basebackup and it'll
start hurting test speeds badly.


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-10 Thread Peter Eisentraut
On 9/3/16 9:23 AM, Michael Paquier wrote:
> On Sat, Sep 3, 2016 at 10:22 PM, Michael Paquier
>  wrote:
>> On Sat, Sep 3, 2016 at 10:21 PM, Michael Paquier
>>  wrote:
>>> Oh, well. I have just implemented it on top of the two other patches
>>> for pg_basebackup. For pg_receivexlog, I am wondering if it makes
>>> sense to have it. That would be trivial to implement it, and I think
>>> that we had better make the combination of --synchronous and --nosync
>>> just leave with an error. Thoughts about having that for
>>> pg_receivexlog?
>>
>> With patches that's actually better..
> 
> Meh, meh et meh.

In fsync_pgdata(), you have removed the progname from one error
message, even though it is passed into the function.  Also, I think
fsync_pgdata() should not be printing initdb's progress messages.
That should stay in initdb.  Worse, in 0002 you are then changing the
output, presumably to suit pg_basebackup or something else, which
messes up the initdb output.

durable_rename() does not fsync an existing newfile before renaming,
in contrast to the backend code in fd.c.

I'm slightly concerned about lots of code duplication in
durable_rename, fsync_parent_path between backend and standalone code.
Also, I'm equally slightly concerned that having duplicate function
names will mess up tag search for someone.

This is a preexisting mistake, but fsync_fname_ext() should just be
fsync_fname() to match the backend naming.  In the backend,
fsync_fname_ext() "extends" fsync_fname() by accepting an ignore_perm
argument, but the initdb code doesn't do that.

I don't think tar file output in pg_basebackup needs to be fsynced.
It's just a backup file like what pg_dump produces, and we don't fsync
that either.  The point of this change is to leave a data directory in
a synced state equivalent to what initdb leaves behind.

Some of the changes in receivelog.c seem excessive.  Replacing a plain
fsync() with fsync_fname_ext() plus fsync_parent_path() isn't
justified by the references you have provided.  This would need more
explanation.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-03 Thread Michael Paquier
On Sat, Sep 3, 2016 at 10:26 PM, Magnus Hagander  wrote:
> Yes, we should definitely not allow that combination. In fact, maybe that
> argument in itself is enough not to have it for pg_receivexlog -- the cause
> of confusion.
>
> I can see how you might want to avoid it for pg_basebackup during testing
> for example,. but the overhead on pg_receivexlog shouldn't be as bad in
> testing, should it?

No, I haven't tested directly but it should not be.. pg_basebackup
does always a bulk write, while pg_receivexlog depends on the server
activity.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-03 Thread Magnus Hagander
On Sat, Sep 3, 2016 at 3:21 PM, Michael Paquier 
wrote:

> On Sat, Sep 3, 2016 at 12:42 AM, Magnus Hagander 
> wrote:
> > On Fri, Sep 2, 2016 at 8:50 AM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >> On Fri, Sep 2, 2016 at 2:20 AM, Peter Eisentraut
> >>  wrote:
> >> > On 5/13/16 2:39 AM, Michael Paquier wrote:
> >> What do others think about that? I could implement that on top of 0002
> >> with some extra options. But to be honest that looks to be just some
> >> extra sugar for what is basically a bug fix... And I am feeling that
> >> providing such a switch to users would be a way for one to shoot
> >> himself badly, particularly for pg_receivexlog where a crash can cause
> >> segments to go missing.
> >>
> >
> > Well, why do we provide a --nosync option for initdb? Wouldn't the
> argument
> > basically be the same?
>
> Yes, the good-for-testing-but-not-production argument.
>
> > I agree it kind of feels like overkill, but it would be consistent
> overkill?
> > :)
>
> Oh, well. I have just implemented it on top of the two other patches
> for pg_basebackup. For pg_receivexlog, I am wondering if it makes
> sense to have it. That would be trivial to implement it, and I think
> that we had better make the combination of --synchronous and --nosync
> just leave with an error. Thoughts about having that for
> pg_receivexlog?
>

Yes, we should definitely not allow that combination. In fact, maybe that
argument in itself is enough not to have it for pg_receivexlog -- the cause
of confusion.

I can see how you might want to avoid it for pg_basebackup during testing
for example,. but the overhead on pg_receivexlog shouldn't be as bad in
testing, should it?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-03 Thread Michael Paquier
On Sat, Sep 3, 2016 at 10:22 PM, Michael Paquier
 wrote:
> On Sat, Sep 3, 2016 at 10:21 PM, Michael Paquier
>  wrote:
>> Oh, well. I have just implemented it on top of the two other patches
>> for pg_basebackup. For pg_receivexlog, I am wondering if it makes
>> sense to have it. That would be trivial to implement it, and I think
>> that we had better make the combination of --synchronous and --nosync
>> just leave with an error. Thoughts about having that for
>> pg_receivexlog?
>
> With patches that's actually better..

Meh, meh et meh.
-- 
Michael
From be6888046cc7dcfde33c22294e8d94a9369ff6b5 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 2 Sep 2016 15:19:11 +0900
Subject: [PATCH 1/3] Relocation fsync routines of initdb into src/common

Those are aimed at being used by other utilities, pg_basebackup mainly,
and at some other degree by pg_dump and pg_receivexlog.
---
 src/bin/initdb/initdb.c | 266 +-
 src/common/Makefile |   2 +-
 src/common/file_utils.c | 279 
 src/include/common/file_utils.h |  22 
 src/tools/msvc/Mkvcbuild.pm |   2 +-
 5 files changed, 310 insertions(+), 261 deletions(-)
 create mode 100644 src/common/file_utils.c
 create mode 100644 src/include/common/file_utils.h

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9407492..0b74ff9 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -61,6 +61,7 @@
 #endif
 
 #include "catalog/catalog.h"
+#include "common/file_utils.h"
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "mb/pg_wchar.h"
@@ -70,13 +71,6 @@
 #include "fe_utils/string_utils.h"
 
 
-/* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
-#if defined(HAVE_SYNC_FILE_RANGE)
-#define PG_FLUSH_DATA_WORKS 1
-#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
-#define PG_FLUSH_DATA_WORKS 1
-#endif
-
 /* Ideally this would be in a .h file, but it hardly seems worth the trouble */
 extern const char *select_default_timezone(const char *share_path);
 
@@ -237,13 +231,6 @@ static char **filter_lines_with_token(char **lines, const char *token);
 #endif
 static char **readfile(const char *path);
 static void writefile(char *path, char **lines);
-static void walkdir(const char *path,
-		void (*action) (const char *fname, bool isdir),
-		bool process_symlinks);
-#ifdef PG_FLUSH_DATA_WORKS
-static void pre_sync_fname(const char *fname, bool isdir);
-#endif
-static void fsync_fname_ext(const char *fname, bool isdir);
 static FILE *popen_check(const char *command, const char *mode);
 static void exit_nicely(void);
 static char *get_id(void);
@@ -270,7 +257,6 @@ static void load_plpgsql(FILE *cmdfd);
 static void vacuum_db(FILE *cmdfd);
 static void make_template0(FILE *cmdfd);
 static void make_postgres(FILE *cmdfd);
-static void fsync_pgdata(void);
 static void trapsig(int signum);
 static void check_ok(void);
 static char *escape_quotes(const char *src);
@@ -529,177 +515,6 @@ writefile(char *path, char **lines)
 }
 
 /*
- * walkdir: recursively walk a directory, applying the action to each
- * regular file and directory (including the named directory itself).
- *
- * If process_symlinks is true, the action and recursion are also applied
- * to regular files and directories that are pointed to by symlinks in the
- * given directory; otherwise symlinks are ignored.  Symlinks are always
- * ignored in subdirectories, ie we intentionally don't pass down the
- * process_symlinks flag to recursive calls.
- *
- * Errors are reported but not considered fatal.
- *
- * See also walkdir in fd.c, which is a backend version of this logic.
- */
-static void
-walkdir(const char *path,
-		void (*action) (const char *fname, bool isdir),
-		bool process_symlinks)
-{
-	DIR		   *dir;
-	struct dirent *de;
-
-	dir = opendir(path);
-	if (dir == NULL)
-	{
-		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
-progname, path, strerror(errno));
-		return;
-	}
-
-	while (errno = 0, (de = readdir(dir)) != NULL)
-	{
-		char		subpath[MAXPGPATH];
-		struct stat fst;
-		int			sret;
-
-		if (strcmp(de->d_name, ".") == 0 ||
-			strcmp(de->d_name, "..") == 0)
-			continue;
-
-		snprintf(subpath, MAXPGPATH, "%s/%s", path, de->d_name);
-
-		if (process_symlinks)
-			sret = stat(subpath, &fst);
-		else
-			sret = lstat(subpath, &fst);
-
-		if (sret < 0)
-		{
-			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
-	progname, subpath, strerror(errno));
-			continue;
-		}
-
-		if (S_ISREG(fst.st_mode))
-			(*action) (subpath, false);
-		else if (S_ISDIR(fst.st_mode))
-			walkdir(subpath, action, false);
-	}
-
-	if (errno)
-		fprintf(stderr, _("%s: could not read directory \"%s\": %s\n"),
-progname, path, strerror(errno));
-
-	(void) closedir(dir);
-
-	/*
-	 * It's important to fsync the destination directory itself as individual
-	 * file fsyncs don't guarantee that the directo

Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-03 Thread Michael Paquier
On Sat, Sep 3, 2016 at 10:21 PM, Michael Paquier
 wrote:
> On Sat, Sep 3, 2016 at 12:42 AM, Magnus Hagander  wrote:
>> On Fri, Sep 2, 2016 at 8:50 AM, Michael Paquier 
>> wrote:
>>> On Fri, Sep 2, 2016 at 2:20 AM, Peter Eisentraut
>>>  wrote:
>>> > On 5/13/16 2:39 AM, Michael Paquier wrote:
>>> What do others think about that? I could implement that on top of 0002
>>> with some extra options. But to be honest that looks to be just some
>>> extra sugar for what is basically a bug fix... And I am feeling that
>>> providing such a switch to users would be a way for one to shoot
>>> himself badly, particularly for pg_receivexlog where a crash can cause
>>> segments to go missing.
>>>
>>
>> Well, why do we provide a --nosync option for initdb? Wouldn't the argument
>> basically be the same?
>
> Yes, the good-for-testing-but-not-production argument.
>
>> I agree it kind of feels like overkill, but it would be consistent overkill?
>> :)
>
> Oh, well. I have just implemented it on top of the two other patches
> for pg_basebackup. For pg_receivexlog, I am wondering if it makes
> sense to have it. That would be trivial to implement it, and I think
> that we had better make the combination of --synchronous and --nosync
> just leave with an error. Thoughts about having that for
> pg_receivexlog?

With patches that's actually better..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-03 Thread Michael Paquier
On Sat, Sep 3, 2016 at 12:42 AM, Magnus Hagander  wrote:
> On Fri, Sep 2, 2016 at 8:50 AM, Michael Paquier 
> wrote:
>> On Fri, Sep 2, 2016 at 2:20 AM, Peter Eisentraut
>>  wrote:
>> > On 5/13/16 2:39 AM, Michael Paquier wrote:
>> What do others think about that? I could implement that on top of 0002
>> with some extra options. But to be honest that looks to be just some
>> extra sugar for what is basically a bug fix... And I am feeling that
>> providing such a switch to users would be a way for one to shoot
>> himself badly, particularly for pg_receivexlog where a crash can cause
>> segments to go missing.
>>
>
> Well, why do we provide a --nosync option for initdb? Wouldn't the argument
> basically be the same?

Yes, the good-for-testing-but-not-production argument.

> I agree it kind of feels like overkill, but it would be consistent overkill?
> :)

Oh, well. I have just implemented it on top of the two other patches
for pg_basebackup. For pg_receivexlog, I am wondering if it makes
sense to have it. That would be trivial to implement it, and I think
that we had better make the combination of --synchronous and --nosync
just leave with an error. Thoughts about having that for
pg_receivexlog?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-02 Thread Magnus Hagander
On Fri, Sep 2, 2016 at 8:50 AM, Michael Paquier 
wrote:

> On Fri, Sep 2, 2016 at 2:20 AM, Peter Eisentraut
>  wrote:
> > On 5/13/16 2:39 AM, Michael Paquier wrote:
> >> So, attached are two patches that apply on HEAD to address the problem
> >> of pg_basebackup that does not sync the data it writes. As
> >> pg_basebackup cannot use directly initdb -S because, as a client-side
> >> utility, it may be installed while initdb is not (see Fedora and
> >> RHEL), I have refactored the code so as the routines in initdb.c doing
> >> the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and
> >> this is 0001.
> >
> > Why fe_utils?  initdb is not a front-end program.
>
> Thinking about that, you are right. Let's move it to src/common,
> frontend-only though.
>
> >> Patch 0002 is a set of fixes for pg_basebackup:
> >> - In plain mode, fsync_pgdata is used so as all the tablespaces are
> >> fsync'd at once. This takes care as well of the case where pg_xlog is
> >> a symlink.
> >> - In tar mode (no stdout), each tar file is synced individually, and
> >> the base directory is synced once at the end.
> >> In both cases, failures are not considered fatal.
> >
> > Maybe there should be --nosync options like initdb has?
>
> What do others think about that? I could implement that on top of 0002
> with some extra options. But to be honest that looks to be just some
> extra sugar for what is basically a bug fix... And I am feeling that
> providing such a switch to users would be a way for one to shoot
> himself badly, particularly for pg_receivexlog where a crash can cause
> segments to go missing.
>
>
Well, why do we provide a --nosync option for initdb? Wouldn't the argument
basically be the same?

I agree it kind of feels like overkill, but it would be consistent
overkill? :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-01 Thread Michael Paquier
On Fri, Sep 2, 2016 at 2:20 AM, Peter Eisentraut
 wrote:
> On 5/13/16 2:39 AM, Michael Paquier wrote:
>> So, attached are two patches that apply on HEAD to address the problem
>> of pg_basebackup that does not sync the data it writes. As
>> pg_basebackup cannot use directly initdb -S because, as a client-side
>> utility, it may be installed while initdb is not (see Fedora and
>> RHEL), I have refactored the code so as the routines in initdb.c doing
>> the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and
>> this is 0001.
>
> Why fe_utils?  initdb is not a front-end program.

Thinking about that, you are right. Let's move it to src/common,
frontend-only though.

>> Patch 0002 is a set of fixes for pg_basebackup:
>> - In plain mode, fsync_pgdata is used so as all the tablespaces are
>> fsync'd at once. This takes care as well of the case where pg_xlog is
>> a symlink.
>> - In tar mode (no stdout), each tar file is synced individually, and
>> the base directory is synced once at the end.
>> In both cases, failures are not considered fatal.
>
> Maybe there should be --nosync options like initdb has?

What do others think about that? I could implement that on top of 0002
with some extra options. But to be honest that looks to be just some
extra sugar for what is basically a bug fix... And I am feeling that
providing such a switch to users would be a way for one to shoot
himself badly, particularly for pg_receivexlog where a crash can cause
segments to go missing.
-- 
Michael


0001-Relocation-fsync-routines-of-initdb-into-src-common.patch
Description: application/download


0002-Issue-fsync-more-carefully-in-pg_receivexlog-and-pg_.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-01 Thread Peter Eisentraut
On 5/13/16 2:39 AM, Michael Paquier wrote:
> So, attached are two patches that apply on HEAD to address the problem
> of pg_basebackup that does not sync the data it writes. As
> pg_basebackup cannot use directly initdb -S because, as a client-side
> utility, it may be installed while initdb is not (see Fedora and
> RHEL), I have refactored the code so as the routines in initdb.c doing
> the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and
> this is 0001.

Why fe_utils?  initdb is not a front-end program.

> Patch 0002 is a set of fixes for pg_basebackup:
> - In plain mode, fsync_pgdata is used so as all the tablespaces are
> fsync'd at once. This takes care as well of the case where pg_xlog is
> a symlink.
> - In tar mode (no stdout), each tar file is synced individually, and
> the base directory is synced once at the end.
> In both cases, failures are not considered fatal.

Maybe there should be --nosync options like initdb has?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-05-13 Thread Michael Paquier
On Fri, May 13, 2016 at 11:49 PM, Alex Ignatov  wrote:
>
> On 13.05.2016 9:39, Michael Paquier wrote:
> Do we have any confidence that data file is not being corrupted? I.e
> contains some corrupted page? Can pg_basebackup check page checksum (db init
> with initdb -k) while backing up files?

That may be an idea to consider, for one class of users, though this
patch is not aimed at doing something regarding that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-05-13 Thread Alex Ignatov


On 13.05.2016 9:39, Michael Paquier wrote:

Hi all,

Beginning a new thread because the ext4 issues are closed, and because
pg_basebackup data durability meritates a new thread. And in short
about the problem: pg_basebackup makes no effort in being sure that
the data it backs up is on disk, which is bad... One possible
recommendation is to use initdb -S after running pg_basebackup, but
making sure that data is on disk should be done before pg_basebackup
ends.

On Thu, May 12, 2016 at 8:09 PM, I wrote:

And actually this won't fly high if there is no equivalent of
walkdir() or if the fsync()'s are not applied recursively. On master
at least the refactoring had better be done cleanly first... For the
back branches, we could just have some recursive call like
fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do
you think that this should be part of fe_utils or src/common/? I'd
tend to think the latter is more adapted as there is an equivalent in
the backend. On back-branches, we could just have something like
fsync_recursively that walks though the paths. An even more simple
approach would be to fsync() individually things that have been
written, but that would suck in performance.


So, attached are two patches that apply on HEAD to address the problem
of pg_basebackup that does not sync the data it writes. As
pg_basebackup cannot use directly initdb -S because, as a client-side
utility, it may be installed while initdb is not (see Fedora and
RHEL), I have refactored the code so as the routines in initdb.c doing
the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and
this is 0001.

Patch 0002 is a set of fixes for pg_basebackup:
- In plain mode, fsync_pgdata is used so as all the tablespaces are
fsync'd at once. This takes care as well of the case where pg_xlog is
a symlink.
- In tar mode (no stdout), each tar file is synced individually, and
the base directory is synced once at the end.
In both cases, failures are not considered fatal.

With pg_basebackup -X and pg_receivexlog, the manipulation of WAL
files is made durable by using fsync and durable_rename where needed
(credits to Andres mainly for this part).

This set of patches is aimed only at HEAD. Back-patchable versions of
this patch would need to copy fsync_pgdata and friends into
streamutil.c for example.

I am adding that to the next CF for review as a bug fix.
Regards,





Hi!
Do we have any confidence that data file is not being corrupted? I.e 
contains some corrupted page? Can pg_basebackup check page checksum (db 
init with initdb -k) while backing up files?


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-05-12 Thread Michael Paquier
Hi all,

Beginning a new thread because the ext4 issues are closed, and because
pg_basebackup data durability meritates a new thread. And in short
about the problem: pg_basebackup makes no effort in being sure that
the data it backs up is on disk, which is bad... One possible
recommendation is to use initdb -S after running pg_basebackup, but
making sure that data is on disk should be done before pg_basebackup
ends.

On Thu, May 12, 2016 at 8:09 PM, I wrote:
> And actually this won't fly high if there is no equivalent of
> walkdir() or if the fsync()'s are not applied recursively. On master
> at least the refactoring had better be done cleanly first... For the
> back branches, we could just have some recursive call like
> fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do
> you think that this should be part of fe_utils or src/common/? I'd
> tend to think the latter is more adapted as there is an equivalent in
> the backend. On back-branches, we could just have something like
> fsync_recursively that walks though the paths. An even more simple
> approach would be to fsync() individually things that have been
> written, but that would suck in performance.

So, attached are two patches that apply on HEAD to address the problem
of pg_basebackup that does not sync the data it writes. As
pg_basebackup cannot use directly initdb -S because, as a client-side
utility, it may be installed while initdb is not (see Fedora and
RHEL), I have refactored the code so as the routines in initdb.c doing
the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and
this is 0001.

Patch 0002 is a set of fixes for pg_basebackup:
- In plain mode, fsync_pgdata is used so as all the tablespaces are
fsync'd at once. This takes care as well of the case where pg_xlog is
a symlink.
- In tar mode (no stdout), each tar file is synced individually, and
the base directory is synced once at the end.
In both cases, failures are not considered fatal.

With pg_basebackup -X and pg_receivexlog, the manipulation of WAL
files is made durable by using fsync and durable_rename where needed
(credits to Andres mainly for this part).

This set of patches is aimed only at HEAD. Back-patchable versions of
this patch would need to copy fsync_pgdata and friends into
streamutil.c for example.

I am adding that to the next CF for review as a bug fix.
Regards,
-- 
Michael


0001-Relocation-fsync-routines-of-initdb-into-fe_utils.patch
Description: application/download


0002-Issue-fsync-more-carefully-in-pg_receivexlog-and-pg_.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers