Re: Speed up the removal of WAL files

2018-09-30 Thread Michael Paquier
On Fri, Mar 09, 2018 at 12:50:03AM +, Tsunakawa, Takayuki wrote:
> From: Michael Paquier [mailto:mich...@paquier.xyz]
>> Hm.  durable_xx should remain a sane operation as an isolated call as you
>> still get the same problem if a crash happens before flushing the parent...
>> Fujii-san idea also has value to speed up the end of recovery but this costs
>> as well in extra recycling operations post promotion.  If the checkpoint
>> was to happen a the end of recovery then that would be more logic, but we
>> don't for performance reasons.  Let's continue to discuss on this thread.
>> If you have any patch to offer, let's also look at them.
>> 
>> Anyway, as things are pretty much idle on this thread for a couple of days
>> and that we are still discussing potential ideas, I think that this entry
>> should be marked as returned with feedback.  Thoughts?
> 
> OK, I moved this to the next CF.  Thank you for your cooperation.

The patch is still roughly with this status, so I am marking it as
returned with feedback.
--
Michael


signature.asc
Description: PGP signature


Re: Speed up the removal of WAL files

2018-07-05 Thread Fujii Masao
On Wed, Feb 21, 2018 at 4:52 PM, Michael Paquier  wrote:
> On Wed, Feb 21, 2018 at 07:20:00AM +, Tsunakawa, Takayuki wrote:
>> Right.  Then I thought of doing the following to avoid making a new
>> function only for RemoveNonParentXlogFiles() which is similar to
>> RemoveXlogFile().
>>
>> * Add an argument "bool durable" to RemoveXlogFile().  Based on its
>> value, RemoveXlogFile() calls either durable_xx() or xx().
>>
>> * Likewise, add an argument "bool durable" to InstallXLogFileSegment()
>> and do the same.
>>
>> * RemoveNonParentXlogFiles() calls RemoveXlogFile() with
>> durable=false.  At the end of the function, sync the pg_wal directory
>> with fsync_fname().
>>
>> * RemoveOldXlogFiles() does the same thing.  One difference is that it
>> passes false to RemoveXlogFile() during recovery (InRecovery == true)
>> and true otherwise.
>
> It seems to me that you would reintroduce partially the problems that
> 1d4a0ab1 has fixed.  In short, if a crash happens in the code paths
> calling RemoveXlogFile with durable = false before fsync'ing pg_wal,
> then a rename has no guarantee to be durable, so you could finish again
> with a file that as an old name, but new contents.

ISTM that this trouble will never happen because basically no one writes
new contents in the renamed WAL files while RemoveNonParentXlogFiles()
is renaming the WAL files at the end of recovery. No?

Regards,

-- 
Fujii Masao



RE: Speed up the removal of WAL files

2018-03-08 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> Hm.  durable_xx should remain a sane operation as an isolated call as you
> still get the same problem if a crash happens before flushing the parent...
> Fujii-san idea also has value to speed up the end of recovery but this costs
> as well in extra recycling operations post promotion.  If the checkpoint
> was to happen a the end of recovery then that would be more logic, but we
> don't for performance reasons.  Let's continue to discuss on this thread.
> If you have any patch to offer, let's also look at them.
> 
> Anyway, as things are pretty much idle on this thread for a couple of days
> and that we are still discussing potential ideas, I think that this entry
> should be marked as returned with feedback.  Thoughts?

OK, I moved this to the next CF.  Thank you for your cooperation.



Regards
Takayuki Tsunakawa





RE: Speed up the removal of WAL files

2018-03-06 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> On Wed, Mar 07, 2018 at 12:55:43AM +0900, Fujii Masao wrote:
> > So, what about, as another approach, making the checkpointer instead
> > of the startup process call RemoveNonParentXlogFiles() when
> > end-of-recovery checkpoint is executed? ISTM that a recovery doesn't
> > need to wait for
> > RemoveNonParentXlogFiles() to end. Instead, RemoveNonParentXlogFiles()
> > seems to have to complete before the checkpointer calls
> > RemoveOldXlogFiles() and creates .ready files for the "garbage" WAL files
> on the old timeline.
> > So it seems natual to leave that WAL recycle task to the checkpointer.
> 
> Couldn't that impact the I/O performance at the end of recovery until the
> first post-recovery checkpoint is completed?  Let's not forget that since
> 9.3 the end-of-recovery checkpoint is not triggered immediately, so there
> could be a delay.  If WAL segments of the past timeline are recycled without
> waiting for this first checkpoint to happen then there is no need to create
> new, zero-emptied, segments post-recovery, which can count as well.

Good point.  I understood you referred to PreallocXlogFiles(), which may create 
one new WAL file if RemoveNonParentXlogFiles() is not called or does not 
recycle WAL files in the old timeline.

The best hack (or a compromise/kludge?) seems to be:

1. Modify durable_xx() functions so that they don't fsync directory hanges when 
enableFsync is false.

2. RemoveNonParentXlogFiles() sets enableFsync to false before the while loop, 
restores the original value of it after the while loop, and fsync pg_wal/ just 
once.
What do you think?


Regards
Takayuki Tsunakawa






Re: Speed up the removal of WAL files

2018-03-06 Thread Michael Paquier
On Wed, Mar 07, 2018 at 12:55:43AM +0900, Fujii Masao wrote:
> So, what about, as another approach, making the checkpointer instead of
> the startup process call RemoveNonParentXlogFiles() when end-of-recovery
> checkpoint is executed? ISTM that a recovery doesn't need to wait for
> RemoveNonParentXlogFiles() to end. Instead, RemoveNonParentXlogFiles()
> seems to have to complete before the checkpointer calls RemoveOldXlogFiles()
> and creates .ready files for the "garbage" WAL files on the old timeline.
> So it seems natual to leave that WAL recycle task to the checkpointer.

Couldn't that impact the I/O performance at the end of recovery until
the first post-recovery checkpoint is completed?  Let's not forget that
since 9.3 the end-of-recovery checkpoint is not triggered immediately,
so there could be a delay.  If WAL segments of the past timeline are
recycled without waiting for this first checkpoint to happen then there
is no need to create new, zero-emptied, segments post-recovery, which
can count as well.
--
Michael


signature.asc
Description: PGP signature


Re: Speed up the removal of WAL files

2018-03-06 Thread Fujii Masao
On Wed, Feb 21, 2018 at 5:27 PM, Tsunakawa, Takayuki
 wrote:
> From: Michael Paquier [mailto:mich...@paquier.xyz]
> It seems to me that you would reintroduce partially the problems that
>> 1d4a0ab1 has fixed.  In short, if a crash happens in the code paths calling
>> RemoveXlogFile with durable = false before fsync'ing pg_wal, then a rename
>> has no guarantee to be durable, so you could finish again with a file that
>> as an old name, but new contents.  A crucial thing which matters for a rename
>
> Hmm, you're right.  Even during recovery, RemoveOldXlogFiles() can't skip 
> fsyncing pg_wal/ because new WAL records streamed from the master are written 
> to recycled WAL files.
>
> After all, it seems to me that we have to stand with the current patch which 
> only handles RemoveNonParentXlogFiles().

But the approach that the patch uses would cause the performance problem
as Horiguchi-san pointed out upthread.

So, what about, as another approach, making the checkpointer instead of
the startup process call RemoveNonParentXlogFiles() when end-of-recovery
checkpoint is executed? ISTM that a recovery doesn't need to wait for
RemoveNonParentXlogFiles() to end. Instead, RemoveNonParentXlogFiles()
seems to have to complete before the checkpointer calls RemoveOldXlogFiles()
and creates .ready files for the "garbage" WAL files on the old timeline.
So it seems natual to leave that WAL recycle task to the checkpointer.

Regards,

-- 
Fujii Masao



RE: Speed up the removal of WAL files

2018-02-21 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
It seems to me that you would reintroduce partially the problems that
> 1d4a0ab1 has fixed.  In short, if a crash happens in the code paths calling
> RemoveXlogFile with durable = false before fsync'ing pg_wal, then a rename
> has no guarantee to be durable, so you could finish again with a file that
> as an old name, but new contents.  A crucial thing which matters for a rename

Hmm, you're right.  Even during recovery, RemoveOldXlogFiles() can't skip 
fsyncing pg_wal/ because new WAL records streamed from the master are written 
to recycled WAL files.

After all, it seems to me that we have to stand with the current patch which 
only handles RemoveNonParentXlogFiles().


Regards
Takayuki Tsunakawa






Re: Speed up the removal of WAL files

2018-02-20 Thread Michael Paquier
On Wed, Feb 21, 2018 at 07:20:00AM +, Tsunakawa, Takayuki wrote:
> Right.  Then I thought of doing the following to avoid making a new
> function only for RemoveNonParentXlogFiles() which is similar to
> RemoveXlogFile(). 
> 
> * Add an argument "bool durable" to RemoveXlogFile().  Based on its
> value, RemoveXlogFile() calls either durable_xx() or xx(). 
> 
> * Likewise, add an argument "bool durable" to InstallXLogFileSegment()
> and do the same.
>
> * RemoveNonParentXlogFiles() calls RemoveXlogFile() with
> durable=false.  At the end of the function, sync the pg_wal directory
> with fsync_fname(). 
> 
> * RemoveOldXlogFiles() does the same thing.  One difference is that it
> passes false to RemoveXlogFile() during recovery (InRecovery == true)
> and true otherwise. 

It seems to me that you would reintroduce partially the problems that
1d4a0ab1 has fixed.  In short, if a crash happens in the code paths
calling RemoveXlogFile with durable = false before fsync'ing pg_wal,
then a rename has no guarantee to be durable, so you could finish again
with a file that as an old name, but new contents.  A crucial thing
which matters for a rename to be durable is that the old file is sync'ed
as well.
--
Michael


signature.asc
Description: PGP signature


RE: Speed up the removal of WAL files

2018-02-20 Thread Tsunakawa, Takayuki
From: Fujii Masao [mailto:masao.fu...@gmail.com]
> On Fri, Nov 17, 2017 at 5:20 PM, Tsunakawa, Takayuki
>  wrote:
> > Yes, I noticed it after submitting the patch and was wondering what to
> do.  Thinking simply, I think it would be just enough to replace
> durable_unlink/durable_rename in RemoveXLogFile() with unlink/rename, and
> sync the pg_wal directory once in RemoveNonParentXlogFiles() and
> RemoveOldXlogFiles().  This will benefit the failover time when fast
> promotion is not performed.  What do you think?
> 
> It seems not good idea to just replace durable_rename() with rename() in
> RemoveOldXlogFiles()->RemoveXlogFiles()->InstallXLogFileSegment().
> Because that change seems to be able to cause the following problem.
> 
> 1. Checkpoint calls RemoveOldXlogFiles().
> 2. It recycles the WAL file AAA to BBB. pg_wal directory has not fsync'd
> yet.
> 3. Another transaction (TX1) writes its WAL data into the (recycled) file
> BBB.
> 4. CRASH and RESTART
> 5. The WAL file BBB disappears and you can see AAA,
> but AAA is not used in recovery. This causes data loss of transaction
> by Tx1.

Right.  Then I thought of doing the following to avoid making a new function 
only for RemoveNonParentXlogFiles() which is similar to RemoveXlogFile().

* Add an argument "bool durable" to RemoveXlogFile().  Based on its value, 
RemoveXlogFile() calls either durable_xx() or xx().

* Likewise, add an argument "bool durable" to InstallXLogFileSegment() and do 
the same.

* RemoveNonParentXlogFiles() calls RemoveXlogFile() with durable=false.  At the 
end of the function, sync the pg_wal directory with fsync_fname().

* RemoveOldXlogFiles() does the same thing.  One difference is that it passes 
false to RemoveXlogFile() during recovery (InRecovery == true) and true 
otherwise.

But this requires InstallXLogFileSegment() to also have an argument "bool 
durable."  That means InstallXLogFileSegment() and RemoveXlogFile() have to 
include something like below in several places, which is dirty:

if (durable)
rc = durable_xx(path, elevel);
else
rc = xx(path);
if (rc != 0)
{
/* durable_xx() output the message */
if (!durable)
ereport(elevel, ...);
}

Do you think of a cleaner way?


From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> The orinal code recycles some of the to-be-removed files, but the
> patch removes all the victims.  This may impact on performance.

At most only 10 WAL files are recycled.  If there's no good solution to the 
above matter, can we compromise with the current patch?

if (PriorRedoPtr == InvalidXLogRecPtr)
recycleSegNo = endlogSegNo + 10;

> Likewise the original code is using durable_unlink to actually
> remove a file so separating unlink and fsync might resurrect the
> problem that should have been fixed by
> 1b02be21f271db6bd3cd43abb23fa596fcb6bac3 (I'm not sure what it
> was but you are one of the reviwers of it). I suppose that you
> need to explain the reason why this change doesn't risk anything.

If the host crashes while running RemoveNonParentXlogFiles() and some effects 
of the function are lost, the next recovery will do them again.

Regards
Takayuki Tsunakawa





RE: Speed up the removal of WAL files

2017-11-19 Thread Tsunakawa, Takayuki
Thanks, it seems to require a bit more consideration about 
RemoveOldXLogFiles().  Let me continue this next month.


Regards
Takayuki Tsunakawa


> -Original Message-
> From: Michael Paquier [mailto:michael.paqu...@gmail.com]
> Sent: Saturday, November 18, 2017 10:37 PM
> To: Fujii Masao
> Cc: Tsunakawa, Takayuki/綱川 貴之; Kyotaro HORIGUCHI;
> pgsql-hack...@postgresql.org
> Subject: Re: Speed up the removal of WAL files
> 
> On Sat, Nov 18, 2017 at 2:57 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> > On Fri, Nov 17, 2017 at 5:20 PM, Tsunakawa, Takayuki
> > <tsunakawa.ta...@jp.fujitsu.com> wrote:
> >> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> >>> The orinal code recycles some of the to-be-removed files, but the
> >>> patch removes all the victims.  This may impact on performance.
> >>
> >> Yes, I noticed it after submitting the patch and was wondering what to
> do.  Thinking simply, I think it would be just enough to replace
> durable_unlink/durable_rename in RemoveXLogFile() with unlink/rename, and
> sync the pg_wal directory once in RemoveNonParentXlogFiles() and
> RemoveOldXlogFiles().  This will benefit the failover time when fast
> promotion is not performed.  What do you think?
> 
> Note that durable_rename() also flushes the origin file to make sure that
> it does not show up again after a crash.
> 
> > It seems not good idea to just replace durable_rename() with rename()
> > in RemoveOldXlogFiles()->RemoveXlogFiles()->InstallXLogFileSegment().
> > Because that change seems to be able to cause the following problem.
> 
> If archiving is enabled, RemoveOldXlogFiles() would create a .ready flag
> for all segments that have reappeared. Oops, it archived again.



Re: Speed up the removal of WAL files

2017-11-17 Thread Fujii Masao
On Fri, Nov 17, 2017 at 5:20 PM, Tsunakawa, Takayuki
 wrote:
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>> The orinal code recycles some of the to-be-removed files, but the patch
>> removes all the victims.  This may impact on performance.
>
> Yes, I noticed it after submitting the patch and was wondering what to do.  
> Thinking simply, I think it would be just enough to replace 
> durable_unlink/durable_rename in RemoveXLogFile() with unlink/rename, and 
> sync the pg_wal directory once in RemoveNonParentXlogFiles() and 
> RemoveOldXlogFiles().  This will benefit the failover time when fast 
> promotion is not performed.  What do you think?

It seems not good idea to just replace durable_rename() with rename()
in RemoveOldXlogFiles()->RemoveXlogFiles()->InstallXLogFileSegment().
Because that change seems to be able to cause the following problem.

1. Checkpoint calls RemoveOldXlogFiles().
2. It recycles the WAL file AAA to BBB. pg_wal directory has not fsync'd yet.
3. Another transaction (TX1) writes its WAL data into the (recycled) file BBB.
4. CRASH and RESTART
5. The WAL file BBB disappears and you can see AAA,
but AAA is not used in recovery. This causes data loss of
transaction by Tx1.

Regards,

-- 
Fujii Masao



RE: Speed up the removal of WAL files

2017-11-17 Thread Tsunakawa, Takayuki
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> The orinal code recycles some of the to-be-removed files, but the patch
> removes all the victims.  This may impact on performance.

Yes, I noticed it after submitting the patch and was wondering what to do.  
Thinking simply, I think it would be just enough to replace 
durable_unlink/durable_rename in RemoveXLogFile() with unlink/rename, and sync 
the pg_wal directory once in RemoveNonParentXlogFiles() and 
RemoveOldXlogFiles().  This will benefit the failover time when fast promotion 
is not performed.  What do you think?

BTW, RemoveNonParentXlogFiles() recycles only 10 WAL files and delete all other 
files.  So the impact of modification is limited.


> Likewise the original code is using durable_unlink to actually remove a
> file so separating unlink and fsync might resurrect the problem that should
> have been fixed by
> 1b02be21f271db6bd3cd43abb23fa596fcb6bac3 (I'm not sure what it was but you
> are one of the reviwers of it). I suppose that you need to explain the reason
> why this change doesn't risk anything.

It's safe because the directory is finally synced.  If the database server 
crashes before it deletes all WAL files, it performs the deletion again during 
the next recovery.

Regards
Takayuki Tsunakawa