Re: InstallXLogFileSegment() vs concurrent WAL flush

2024-02-05 Thread Kyotaro Horiguchi
At Fri, 2 Feb 2024 14:42:46 +0100, Thomas Munro  wrote 
in 
> On Fri, Feb 2, 2024 at 12:56 PM Yugo NAGATA  wrote:
> > On Fri, 2 Feb 2024 11:18:18 +0100
> > Thomas Munro  wrote:
> > > One simple way to address that would be to make XLogFileInitInternal()
> > > wait for InstallXLogFileSegment() to finish.  It's a little
> >
> > Or, can we make sure the rename is durable by calling fsync before
> > returning the fd, as a patch attached here?
> 
> Right, yeah, that works too.  I'm not sure which way is better.

I'm not sure I like issuing spurious syncs unconditionally. Therefore,
I prefer Thomas' approach in that regard.  0002 would be beneficial,
considering the case of a very large max_wal_size, and the code seems
to be the minimal required. I don't think it matters that the lock
attempts occur uselessly until the first segment installation. That
being said, we could avoid it by initializing
last_known_installed_segno properly.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: InstallXLogFileSegment() vs concurrent WAL flush

2024-02-02 Thread Thomas Munro
On Fri, Feb 2, 2024 at 12:56 PM Yugo NAGATA  wrote:
> On Fri, 2 Feb 2024 11:18:18 +0100
> Thomas Munro  wrote:
> > One simple way to address that would be to make XLogFileInitInternal()
> > wait for InstallXLogFileSegment() to finish.  It's a little
>
> Or, can we make sure the rename is durable by calling fsync before
> returning the fd, as a patch attached here?

Right, yeah, that works too.  I'm not sure which way is better.




Re: InstallXLogFileSegment() vs concurrent WAL flush

2024-02-02 Thread Yugo NAGATA
On Fri, 2 Feb 2024 11:18:18 +0100
Thomas Munro  wrote:

> Hi,
> 
> New WAL space is created by renaming a file into place.  Either a
> newly created file with a temporary name or, ideally, a recyclable old
> file with a name derived from an old LSN.  I think there is a data
> loss window between rename() and fsync(parent_directory).  A
> concurrent backend might open(new_name), write(), fdatasync(), and
> then we might lose power before the rename hits the disk.  The data
> itself would survive the crash, but recovery wouldn't be able to find
> and replay it.  That might break the log-before-data rule or forget a
> transaction that has been reported as committed to a client.
> 
> Actual breakage would presumably require really bad luck, and I
> haven't seen this happen or anything, it just occurred to me while
> reading code, and I can't see any existing defences.
> 
> One simple way to address that would be to make XLogFileInitInternal()
> wait for InstallXLogFileSegment() to finish.  It's a little

Or, can we make sure the rename is durable by calling fsync before
returning the fd, as a patch attached here?

Regards,
Yugo Nagata

> pessimistic to do that unconditionally, though, as then you have to
> wait even for rename operations for segment files later than the one
> you're opening, so I thought about how to skip waiting in that case --
> see 0002.  I'm not sure if it's worth worrying about or not.


-- 
Yugo NAGATA 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 478377c4a2..862109ca3b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3062,7 +3062,11 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	 errmsg("could not open file \"%s\": %m", path)));
 	}
 	else
+	{
+		fsync_fname_ext(path, false, false, ERROR);
+		fsync_parent_path(path, ERROR);
 		return fd;
+	}
 
 	/*
 	 * Initialize an empty (all zeroes) segment.  NOTE: it is possible that


InstallXLogFileSegment() vs concurrent WAL flush

2024-02-02 Thread Thomas Munro
Hi,

New WAL space is created by renaming a file into place.  Either a
newly created file with a temporary name or, ideally, a recyclable old
file with a name derived from an old LSN.  I think there is a data
loss window between rename() and fsync(parent_directory).  A
concurrent backend might open(new_name), write(), fdatasync(), and
then we might lose power before the rename hits the disk.  The data
itself would survive the crash, but recovery wouldn't be able to find
and replay it.  That might break the log-before-data rule or forget a
transaction that has been reported as committed to a client.

Actual breakage would presumably require really bad luck, and I
haven't seen this happen or anything, it just occurred to me while
reading code, and I can't see any existing defences.

One simple way to address that would be to make XLogFileInitInternal()
wait for InstallXLogFileSegment() to finish.  It's a little
pessimistic to do that unconditionally, though, as then you have to
wait even for rename operations for segment files later than the one
you're opening, so I thought about how to skip waiting in that case --
see 0002.  I'm not sure if it's worth worrying about or not.


0001-Fix-InstallXLogFileSegment-concurrency-bug.patch
Description: Binary data


0002-Track-end-of-installed-WAL-space-in-shared-memory.patch
Description: Binary data