Re: Problem while setting the fpw with SIGHUP

2018-10-06 Thread Amit Kapila
On Fri, Sep 28, 2018 at 5:16 PM Amit Kapila  wrote:
>
> On Fri, Sep 28, 2018 at 4:23 AM Michael Paquier  wrote:
> >
> > On Thu, Sep 27, 2018 at 07:38:31PM +0530, Amit Kapila wrote:
> > > Okay, I am planning to commit the attached patch tomorrow unless you
> > > or anybody else has any objections to it.
> >
> > None from here.  Thanks for taking care of it.
> >
>
> Thanks, pushed!  I have backpatched till 9.5 as this has been
> introduced by the commit 2c03216d83 which added the XLOG machinery
> initialization in RecoveryInProgress code path.
>

I have marked the originally reported issue as fixed in the open-items list.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-28 Thread Amit Kapila
On Fri, Sep 28, 2018 at 4:23 AM Michael Paquier  wrote:
>
> On Thu, Sep 27, 2018 at 07:38:31PM +0530, Amit Kapila wrote:
> > Okay, I am planning to commit the attached patch tomorrow unless you
> > or anybody else has any objections to it.
>
> None from here.  Thanks for taking care of it.
>

Thanks, pushed!  I have backpatched till 9.5 as this has been
introduced by the commit 2c03216d83 which added the XLOG machinery
initialization in RecoveryInProgress code path.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-27 Thread Michael Paquier
On Thu, Sep 27, 2018 at 07:38:31PM +0530, Amit Kapila wrote:
> Okay, I am planning to commit the attached patch tomorrow unless you
> or anybody else has any objections to it.

None from here.  Thanks for taking care of it.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-27 Thread Amit Kapila
On Thu, Sep 27, 2018 at 6:22 PM Michael Paquier  wrote:
>
> On Thu, Sep 27, 2018 at 04:19:02PM +0530, Amit Kapila wrote:
> > I think this is mostly fine, but it seems "if the instance just got
> > out of recovery" doesn't fit well because it can happen anytime after
> > recovery, this code gets called from checkpointer.  I think we can
> > slightly tweak it as below:
> > "Perform this outside critical section so that the WAL insert
> > initialization done by RecoveryInProgress() doesn't trigger an
> > assertion failure."
> >
> > What do you say?
>
> Fine for me.
>

Okay, I am planning to commit the attached patch tomorrow unless you
or anybody else has any objections to it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_fpwupdate.2.patch
Description: Binary data


Re: Problem while setting the fpw with SIGHUP

2018-09-27 Thread Michael Paquier
On Thu, Sep 27, 2018 at 04:19:02PM +0530, Amit Kapila wrote:
> I think this is mostly fine, but it seems "if the instance just got
> out of recovery" doesn't fit well because it can happen anytime after
> recovery, this code gets called from checkpointer.  I think we can
> slightly tweak it as below:
> "Perform this outside critical section so that the WAL insert
> initialization done by RecoveryInProgress() doesn't trigger an
> assertion failure."
> 
> What do you say?

Fine for me.

>> Sure, feel free to if you have some room.  I am fine to take care of it
>> as well, so that's up to you to decide.
> 
> Okay, I will take care of it.

Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-27 Thread Amit Kapila
On Thu, Sep 27, 2018 at 1:32 PM Michael Paquier  wrote:
>
> On Thu, Sep 27, 2018 at 11:18:02AM +0530, Amit Kapila wrote:
> > Your proposed solution makes sense to me.  IIUC, this is quite similar
> > to what Dilip has also proposed [1].
>
> Indeed.  I would just add with the patch a comment like that:
> "Perform this call outside the critical section so as if the instance
> just got out of recovery, the upcoming WAL insert initialization does
> not trigger an assertion failure."
>

I think this is mostly fine, but it seems "if the instance just got
out of recovery" doesn't fit well because it can happen anytime after
recovery, this code gets called from checkpointer.  I think we can
slightly tweak it as below:
"Perform this outside critical section so that the WAL insert
initialization done by RecoveryInProgress() doesn't trigger an
assertion failure."

What do you say?

> Sure, feel free to if you have some room.  I am fine to take care of it
> as well, so that's up to you to decide.

Okay, I will take care of it.

>  Adding a comment like what I
> proposed upthread is necessary in my opinion.

Agreed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-27 Thread Michael Paquier
On Thu, Sep 27, 2018 at 01:30:13PM +0530, Amit Kapila wrote:
> I can take care of committing something along the lines of Dilip's
> patch if you are okay.

Sure, feel free to if you have some room.  I am fine to take care of it
as well, so that's up to you to decide.  Adding a comment like what I
proposed upthread is necessary in my opinion.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-27 Thread Michael Paquier
On Thu, Sep 27, 2018 at 11:18:02AM +0530, Amit Kapila wrote:
> Your proposed solution makes sense to me.  IIUC, this is quite similar
> to what Dilip has also proposed [1].

Indeed.  I would just add with the patch a comment like that:
"Perform this call outside the critical section so as if the instance
just got out of recovery, the upcoming WAL insert initialization does
not trigger an assertion failure."

If that sounds fine to you, I propose that we commit Dilip's patch with
the comment addition.  That will take care of (a).
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-26 Thread Amit Kapila
On Thu, Sep 27, 2018 at 10:34 AM Michael Paquier  wrote:
>
> On Thu, Sep 27, 2018 at 10:03:59AM +0530, Amit Kapila wrote:
> > I think, in this case, it might be advisable to just fix the problem
> > (a) which is what has been reported originally in the thread and
> > AFAICS, the fix for that is clear as compared to the problem (b).  If
> > you agree, then we can discuss what is the best fix for the first
> > problem (a).
>
> Okay, thanks for the input.  The fix for (a) would be in my opinion to
> just move the call to RecoveryInProgress() out of the critical section,
> then save the result into a variable, and use the variable within the
> critical section to avoid the potential palloc() problems.  What do you
> think?
>

Your proposed solution makes sense to me.  IIUC, this is quite similar
to what Dilip has also proposed [1].

[1] - 
https://www.postgresql.org/message-id/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-26 Thread Amit Kapila
On Wed, Sep 19, 2018 at 10:50 AM Michael Paquier  wrote:
> > Agreed. "If we need to do that in the start process," we need to
> > change the shared flag and issue FPW_CHANGE always when the
> > database state is different from configuration file, regardless
> > of what StartXLOG() did until the point.
>
> Definitely my mistake here.  Attached is a patch to show what I have in
> mind by making sure that the startup process generates a record even
> after switching full_page_writes when started normally.  This removes
> the condition based on InRecovery, and uses a new argument for
> UpdateFullPageWrites() instead.  Your test case,as well as what I do
> manually for testing, pass without triggering the assertion.
>

This will fix the previous problem reported by me but will lead to
some other behavior change.  If somebody toggles the full_page_writes
flag before crash recovery, then it will log the XLOG_FPW_CHANGE
record, but that was not the case without your patch.

> When I see your patch, I actually see the same kind of logic as what I
> propose which is summarized in two points, and that's a good thing:
> a) Avoid the allocation in the critical section.
> b) Avoid two processes to call UpdateFullPageWrites at the same time.

I think, in this case, it might be advisable to just fix the problem
(a) which is what has been reported originally in the thread and
AFAICS, the fix for that is clear as compared to the problem (b).  If
you agree, then we can discuss what is the best fix for the first
problem (a).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-21 Thread Amit Kapila
On Tue, Sep 18, 2018 at 12:46 PM Kyotaro HORIGUCHI
 wrote:
>
> At Fri, 14 Sep 2018 16:30:37 +0530, Amit Kapila  
> wrote in 
> > On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier  
> > wrote:
> > >
> > > On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote:
> > > > /*
> > > >  * Properly accept or ignore signals the postmaster might send us.
> > > >  */
> > > > -   pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config 
> > > > file */
> > > > +   pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */
> > > >
> > > > I am finally coming back to this patch set, and that's one of the first
> > > > things I am going to help moving on for this CF.  And this bit from the
> > > > last patch series is not acceptable as there are some parameters which
> > > > are used by the startup process which can be reloaded.  One of them is
> > > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery.
> > >
> > > So, I have been working on this problem again and I have reviewed the
> > > thread, and there have been many things discussed in the last couple of
> > > months:
> > > 1) We do not want to initialize XLogInsert stuff unconditionally for all
> > > processes at the moment recovery begins, but we just want to initialize
> > > it once WAL write is open for business.
> > > 2) Both the checkpointer and the startup process can call
> > > UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
> > > incorrect values.
> >
> > Can you share the steps to reproduce this problem?
>
> The window for the issue is small.
>
> It happens if checkpointer first looks SharedRecoveryInProgress
> is turned off at the beginning of the CheckPointerMain loop.
> The window is needed be widen to make sure the issue happens.
>
> Applying the first patch attched, the following steps will cause
> the issue with high reliability.
>
> 1. initdb  (full_page_writes = on by default)
>
> 2. put recovery.conf so that server can accept promote signal
>
> 3. touch /tmp/hoge
>change full_page_write to off in postgresql.conf
>
> 4. pg_ctl_promote
>
> The attached second file is a script do the above steps.
> Server writes the following log message and die.
>
> | 2018-09-18 13:55:49.928 JST [16405] LOG:  database system is ready to 
> accept connections
> | TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731)
> | 2018-09-18 13:55:50.546 JST [16405] LOG:  checkpointer process (PID 16407) 
> was terminated by signal 6: Aborted
>
> We can preallocating the XLogInsert buffer just to prevent the
> crash. This is done by calling RecoveryInProgress() before
> UpdateFullPageWrites() finds it turned to false. This is an
> isolated problem.
>

Yes, till this point the problem is quite clear and can be reproduced,
however, my question was for the next part for which there doesn't
seem to be a concrete test case.

> But it has another issue that FPW_CHANGE record
> can be missing or wrong FPW state after recovery end.
>
> It comes from the fact that responsibility to update the flag is
> not atomically passed from startup to checkpointer. (The window
> is after UpdateFullPageWrites() call and until setting
> SharedRecoveryInProgress to false.)
>

I understand your concern about missing FPW_CHANGE WAL record during
the above window but is that really a problem because till the
promotion is complete, it is not expected that we write any WAL
record.  Now, the other part of the problem you mentioned is "wrong
FPW state" which can happen because we don't read
Insert->fullPageWrites under lock.  If you are concerned about that
then I think we can solve it with a much less invasive change.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-20 Thread Michael Paquier
On Wed, Sep 19, 2018 at 02:20:34PM +0900, Michael Paquier wrote:
> On Tue, Sep 18, 2018 at 04:15:42PM +0900, Kyotaro HORIGUCHI wrote:
>> My latest patch tries to remove the window by imposing all
>> responsibility to apply config file changes to the shared FPW
>> flag on the checkpointer. RecoveryInProgress() is changed to be
>> called prior to UpdateFullPageWrites on the way doing that.
> 
> I still need to look at that in details.  That may be better than what I
> am proposing.  At quick glance what I propose is more simple, and does
> not need enforcing a checkpoint using SIGINT.

I have finally looked at the patch set from Horiguchi-san here:
https://www.postgresql.org/message-id/20180828.193436.253621888.horiguchi.kyot...@lab.ntt.co.jp

And actually this is very close to what my proposal does, except for a
couple of caveats.  Here are my notes:
1) The issue with palloc() happening in critical sections is able to go
away, by making the logic behind UpdateFullPageWrites() more complicated
than necessary.  With the proposed patch, UpdateFullPageWrites() never
gets called by the checkpointer until the startup process has done its
business.  I would have believed that keeping the check to
RecoveryInProgress() directly in UpdateFullPageWrites() makes the logic
more simple.  That's actually what my proposal does.  With your patch,
it would be even possible to add an assertion so as this never gets
called while in recovery.
2) At the end of recovery, if there is a crash before the checkpointer
is able to update the shared parameters it needs to work on
away, then no XLOG_CHANGE_PARAMETER record is generated.  This is not a
problem for normal cases, but there is one scenario where this is a
problem: at the end of recovery if the bgwriter is not started, then the
startup process creates a checkpoint by itself, and it would miss the
record generation.
3) SIGINT is abused of, in such a way that the checkpointer may generate
two checkpoints where only one is needed post-recovery.
4) SyncRepUpdateSyncStandbysDefined() would get called even without
SIGHUP being reached.  This feels also like a future trap waiting to
bite as well.

When I see your patch, I actually see the same kind of logic as what I
propose which is summarized in two points, and that's a good thing:
a) Avoid the allocation in the critical section.
b) Avoid two processes to call UpdateFullPageWrites at the same time.

Now the points mentioned above make what you are proposing weaker in my
opinion, and 2) is an actual bug, side effect of the proposed patch.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-18 Thread Michael Paquier
On Tue, Sep 18, 2018 at 04:15:42PM +0900, Kyotaro HORIGUCHI wrote:
> My latest patch tries to remove the window by imposing all
> responsibility to apply config file changes to the shared FPW
> flag on the checkpointer. RecoveryInProgress() is changed to be
> called prior to UpdateFullPageWrites on the way doing that.

I still need to look at that in details.  That may be better than what I
am proposing.  At quick glance what I propose is more simple, and does
not need enforcing a checkpoint using SIGINT.

> InRecoery is turned off after the last UpdateFullPageWrite() and
> before SharedRecoveryInProgress is turned off. So it is still
> leaving the window. Thus, checkpointer stops flipping the value
> before SharedRecoveryInProgress's turning off. (I don't
> understand InRecovery condition..) However, this lets
> checkpointer omit reloading after UpdateFullPagesWrite() in
> startup until SharedRecoveryInProgress is tunred off.

That won't matter in this case, as RecoveryInProgress() gets called out
of the critical section in the previous patch I sent, so there is no
failure.  Let's not forget that the first issue is the allocation in the
critical section.  The second issue is that UpdateFullPageWrites may be
called concurrently across multiple processes, which is not something it
is designed for.  The second issue gets addressed in my proposal my
making sure that the checkpointer never tries to update full_page_writes
by himself until recovery has finished, and that the startup process is
the only updater once recovery ends.

> Agreed. "If we need to do that in the start process," we need to
> change the shared flag and issue FPW_CHANGE always when the
> database state is different from configuration file, regardless
> of what StartXLOG() did until the point.

Definitely my mistake here.  Attached is a patch to show what I have in
mind by making sure that the startup process generates a record even
after switching full_page_writes when started normally.  This removes
the condition based on InRecovery, and uses a new argument for
UpdateFullPageWrites() instead.  Your test case,as well as what I do
manually for testing, pass without triggering the assertion.

+   /* DEBUG: cause a reload */
+   {
+   struct stat b;
+   if (stat("/tmp/hoge", ) == 0)
+   {
+   elog(LOG, "STARTUP SLEEP FOR 1s");
+   sleep(1);
+   elog(LOG, "DONE.");
+   DirectFunctionCall1(pg_reload_conf, 0);
+   }
+   }
The way you patch the backend this way is always nice to see so as it is
easy to reproduce bugs, and it actually helps in reproducing the
assertion failure correctly ;)
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3025d0badb..079e1814c1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7719,10 +7719,15 @@ StartupXLOG(void)
 	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
 	 * record before resource manager writes cleanup WAL records or checkpoint
 	 * record is written.
+	 *
+	 * It is safe to check the shared full_page_writes without the lock,
+	 * because there is no concurrently running process able to update it.
+	 * The only other process able to update full_page_writes is the
+	 * checkpointer, still it is unable to do so until recovery finishes.
 	 */
 	Insert->fullPageWrites = lastFullPageWrites;
 	LocalSetXLogInsertAllowed();
-	UpdateFullPageWrites();
+	UpdateFullPageWrites(true);
 	LocalXLogInsertAllowed = -1;
 
 	if (InRecovery)
@@ -9693,14 +9698,28 @@ XLogReportParameters(void)
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
- * Note: this function assumes there is no other process running
- * concurrently that could update it.
+ * Note: this can be called from the checkpointer, or the startup process
+ * at the end of recovery.  One could think that this routine should be
+ * careful with its lock handling, however this is a no-op for the
+ * checkpointer until the startup process marks the end of recovery,
+ * so only one of them can do the work of this routine at once.
  */
 void
-UpdateFullPageWrites(void)
+UpdateFullPageWrites(bool force)
 {
 	XLogCtlInsert *Insert = >Insert;
 
+	/*
+	 * Check if recovery is still in progress before entering this critical
+	 * section, as some memory allocation could happen at the end of
+	 * recovery.  There is nothing to do for a system still in recovery.
+	 * Note that the caller may still want to enforce an update to happen.
+	 * One such example is the startup process, which updates full_page_writes
+	 * at the end of recovery.
+	 */
+	if (RecoveryInProgress() && !force)
+		return;
+
 	/*
 	 * Do nothing if full_page_writes has not been changed.
 	 *
@@ -9731,7 +9750,7 @@ UpdateFullPageWrites(void)
 	 * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
 	 * full_page_writes during archive recovery, if required.
 	 */
-	if 

Re: Problem while setting the fpw with SIGHUP

2018-09-18 Thread Kyotaro HORIGUCHI
At Fri, 14 Sep 2018 16:30:37 +0530, Amit Kapila  wrote 
in 
> On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier  wrote:
> >
> > On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote:
> > > /*
> > >  * Properly accept or ignore signals the postmaster might send us.
> > >  */
> > > -   pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file 
> > > */
> > > +   pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */
> > >
> > > I am finally coming back to this patch set, and that's one of the first
> > > things I am going to help moving on for this CF.  And this bit from the
> > > last patch series is not acceptable as there are some parameters which
> > > are used by the startup process which can be reloaded.  One of them is
> > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery.
> >
> > So, I have been working on this problem again and I have reviewed the
> > thread, and there have been many things discussed in the last couple of
> > months:
> > 1) We do not want to initialize XLogInsert stuff unconditionally for all
> > processes at the moment recovery begins, but we just want to initialize
> > it once WAL write is open for business.
> > 2) Both the checkpointer and the startup process can call
> > UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
> > incorrect values.
> 
> Can you share the steps to reproduce this problem?

The window for the issue is small.

It happens if checkpointer first looks SharedRecoveryInProgress
is turned off at the beginning of the CheckPointerMain loop.
The window is needed be widen to make sure the issue happens.

Applying the first patch attched, the following steps will cause
the issue with high reliability.

1. initdb  (full_page_writes = on by default)

2. put recovery.conf so that server can accept promote signal

3. touch /tmp/hoge
   change full_page_write to off in postgresql.conf

4. pg_ctl_promote

The attached second file is a script do the above steps.
Server writes the following log message and die.

| 2018-09-18 13:55:49.928 JST [16405] LOG:  database system is ready to accept 
connections
| TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731)
| 2018-09-18 13:55:50.546 JST [16405] LOG:  checkpointer process (PID 16407) 
was terminated by signal 6: Aborted

We can preallocating the XLogInsert buffer just to prevent the
crash. This is done by calling RecoveryInProgress() before
UpdateFullPageWrites() finds it turned to false. This is an
isolated problem. But it has another issue that FPW_CHANGE record
can be missing or wrong FPW state after recovery end.

It comes from the fact that responsibility to update the flag is
not atomically passed from startup to checkpointer. (The window
is after UpdateFullPageWrites() call and until setting
SharedRecoveryInProgress to false.)

My latest patch tries to remove the window by imposing all
responsibility to apply config file changes to the shared FPW
flag on the checkpointer. RecoveryInProgress() is changed to be
called prior to UpdateFullPageWrites on the way doing that.


> > 3) We do not want a palloc() in a critical section because of
> > RecoveryinProgress being called.
> >
> > And the root issue here is 2), because the checkpointer tries to update
> > Insert->fullPageWrites but it does not need to do so until recovery has
> > been finished.  So in order to fix the original issue I am proposing a
> > simple fix: let's make sure that the checkpointer does not update
> > Insert->fullPageWrites until recovery finishes, and let's have the
> > startup process do the first update once it finishes recovery and
> > inserts by itself the XLOG_PARAMETER_CHANGE.  This way the order of
> > events is purely sequential and we don't need to worry about having the
> > checkpointer and the startup process eat on each other's plate because
> > the checkpointer would only try to work on updating the shared memory
> > value of full_page_writes once SharedRecoveryInProgress is switched to
> > true, and that happens after the startup process does its initial call
> > to UpdateFullPageWrites().  I have improved as well all the comments
> > around to make clear the behavior wanted.
> >
> > Thoughts?

InRecoery is turned off after the last UpdateFullPageWrite() and
before SharedRecoveryInProgress is turned off. So it is still
leaving the window. Thus, checkpointer stops flipping the value
before SharedRecoveryInProgress's turning off. (I don't
understand InRecovery condition..) However, this lets
checkpointer omit reloading after UpdateFullPagesWrite() in
startup until SharedRecoveryInProgress is tunred off.

>  UpdateFullPageWrites(void)
>  {
>   XLogCtlInsert *Insert = >Insert;
> + /*
> + * Check if recovery is still in progress before entering this critical
> + * section, as some memory allocation could happen at the end of
> + * recovery.  There is nothing to do for a system still in recovery.
> + * Note that we need to process things here at the end 

Re: Problem while setting the fpw with SIGHUP

2018-09-18 Thread Michael Paquier
On Tue, Sep 18, 2018 at 01:06:09PM +0900, Kyotaro HORIGUCHI wrote:
> I was wrong here. It was handled in HandleStartupProcInterrupts
> called from StartupXLOG. So, it should be just removed from the
> set. Sorry for the bogus patch.

Thanks for confirming.

Still, it looks like a waste to abuse on SIGINT just to forcibly wake up
the checkpointer and request from it a checkpoint...  And you could just
have used a new parameter for the checkpointer appended with
CHECKPOINT_FORCE.  I think that my approach of just making the set of
events purely ordered will save from any kind of race conditions, while
I suspect that what you propose here does not close all the holes.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-18 Thread Michael Paquier
On Fri, Sep 14, 2018 at 04:30:37PM +0530, Amit Kapila wrote:
> On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier  wrote:
>> So, I have been working on this problem again and I have reviewed the
>> thread, and there have been many things discussed in the last couple of
>> months:
>> 1) We do not want to initialize XLogInsert stuff unconditionally for all
>> processes at the moment recovery begins, but we just want to initialize
>> it once WAL write is open for business.
>> 2) Both the checkpointer and the startup process can call
>> UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
>> incorrect values.
> 
> Can you share the steps to reproduce this problem?

This refers to the first problem reported on this thread:
https://www.postgresql.org/message-id/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com

In order to reproduce the problem, you can for example stop the server
in immediate mode.  Then attach a debugger to it and add a breakpoint to
UpdateFullPageWrites.  You can check that XLOG insert has not been
initialized yet by looking at xloginsert_cxt ot ThisTimeLineID.  On a
second session, switch full_page_writes to on or off, reload the
parameters and then trigger a checkpoint.  The important point is to
trigger an inconsistency between XLogCtl->Insert->fullPageWrites and
the value of fullPageWrites within the checkpointer context.  With the
checkpoint triggered, the debugger will stop at UpdateFullPageWrites
immediately.  At this point, you can simply check if fullPageWrites
Insert->fullPageWrites have the same value or a different one.  If the
values match, simply switch full_page_writes and reload again, with the
checkpointer still waiting at the beginning of UpdateFullPageWrites.
SIGHUP will make the checkpointer process hang a bit, and then it will
move on.  At this point you will be able to see the failure:
TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731)
2018-09-18 15:06:39 JST [7396]: [11-1] db=,user=,app=,client= LOG:
checkpointer process (PID 7399) was terminated by signal 6: Aborted

> On a regular startup when there is no recovery, it won't allow us to
> log the WAL record (XLOG_FPW_CHANGE) which can happen without above
> change.  You can check that by setting full_page_writes=off and start
> the system.

Oh, good point, InRecovery is set to false in this case so that would be
skipped.  We can simply fix that by adding a flag, say "force" to
UpdateFullPageWrites to allow a process to enforce the update of FPW
even if RecoveryInProgress returns true, which would be the case for the
startup process.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-17 Thread Kyotaro HORIGUCHI
Hello.

At Tue, 18 Sep 2018 11:38:50 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180918.113850.164570138.horiguchi.kyot...@lab.ntt.co.jp>
> At Thu, 6 Sep 2018 16:37:28 -0700, Michael Paquier  
> wrote in <20180906233728.gr2...@paquier.xyz>
> > I am finally coming back to this patch set, and that's one of the first
> > things I am going to help moving on for this CF.  And this bit from the
> > last patch series is not acceptable as there are some parameters which
> > are used by the startup process which can be reloaded.  One of them is
> > wal_retrieve_retry_interval for tuning when fetching WAL at recovery.
> 
> The third patch actually is not mandatory in the patch set. The
> only motive of that is it doesn't nothing. The handler for SIGHUP
> just sets got_SIGHUP then wakes up the process, and the variable
> is not looked up by no one. If you mind the change, it can be
> removed with no side effect.

I was wrong here. It was handled in HandleStartupProcInterrupts
called from StartupXLOG. So, it should be just removed from the
set. Sorry for the bogus patch.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problem while setting the fpw with SIGHUP

2018-09-17 Thread Kyotaro HORIGUCHI
At Thu, 6 Sep 2018 16:37:28 -0700, Michael Paquier  wrote 
in <20180906233728.gr2...@paquier.xyz>
> On Tue, Aug 28, 2018 at 07:34:36PM +0900, Kyotaro HORIGUCHI wrote:
> > Thanks for prompting. The difference is in a comment and I'm fine
> > with the change.
> 
> /*
>  * Properly accept or ignore signals the postmaster might send us.
>  */
> -   pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
> +   pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */
> 
> I am finally coming back to this patch set, and that's one of the first
> things I am going to help moving on for this CF.  And this bit from the
> last patch series is not acceptable as there are some parameters which
> are used by the startup process which can be reloaded.  One of them is
> wal_retrieve_retry_interval for tuning when fetching WAL at recovery.

The third patch actually is not mandatory in the patch set. The
only motive of that is it doesn't nothing. The handler for SIGHUP
just sets got_SIGHUP then wakes up the process, and the variable
is not looked up by no one. If you mind the change, it can be
removed with no side effect.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problem while setting the fpw with SIGHUP

2018-09-14 Thread Amit Kapila
On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier  wrote:
>
> On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote:
> > /*
> >  * Properly accept or ignore signals the postmaster might send us.
> >  */
> > -   pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
> > +   pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */
> >
> > I am finally coming back to this patch set, and that's one of the first
> > things I am going to help moving on for this CF.  And this bit from the
> > last patch series is not acceptable as there are some parameters which
> > are used by the startup process which can be reloaded.  One of them is
> > wal_retrieve_retry_interval for tuning when fetching WAL at recovery.
>
> So, I have been working on this problem again and I have reviewed the
> thread, and there have been many things discussed in the last couple of
> months:
> 1) We do not want to initialize XLogInsert stuff unconditionally for all
> processes at the moment recovery begins, but we just want to initialize
> it once WAL write is open for business.
> 2) Both the checkpointer and the startup process can call
> UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
> incorrect values.

Can you share the steps to reproduce this problem?

> 3) We do not want a palloc() in a critical section because of
> RecoveryinProgress being called.
>
> And the root issue here is 2), because the checkpointer tries to update
> Insert->fullPageWrites but it does not need to do so until recovery has
> been finished.  So in order to fix the original issue I am proposing a
> simple fix: let's make sure that the checkpointer does not update
> Insert->fullPageWrites until recovery finishes, and let's have the
> startup process do the first update once it finishes recovery and
> inserts by itself the XLOG_PARAMETER_CHANGE.  This way the order of
> events is purely sequential and we don't need to worry about having the
> checkpointer and the startup process eat on each other's plate because
> the checkpointer would only try to work on updating the shared memory
> value of full_page_writes once SharedRecoveryInProgress is switched to
> true, and that happens after the startup process does its initial call
> to UpdateFullPageWrites().  I have improved as well all the comments
> around to make clear the behavior wanted.
>
> Thoughts?
>

 UpdateFullPageWrites(void)
 {
  XLogCtlInsert *Insert = >Insert;
+ /*
+ * Check if recovery is still in progress before entering this critical
+ * section, as some memory allocation could happen at the end of
+ * recovery.  There is nothing to do for a system still in recovery.
+ * Note that we need to process things here at the end of recovery for
+ * the startup process, which is why this checks after InRecovery.
+ */
+ if (RecoveryInProgress() && !InRecovery)
+ return;
+

On a regular startup when there is no recovery, it won't allow us to
log the WAL record (XLOG_FPW_CHANGE) which can happen without above
change.  You can check that by setting full_page_writes=off and start
the system.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-14 Thread Michael Paquier
On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote:
> /*
>  * Properly accept or ignore signals the postmaster might send us.
>  */
> -   pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
> +   pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */
> 
> I am finally coming back to this patch set, and that's one of the first
> things I am going to help moving on for this CF.  And this bit from the
> last patch series is not acceptable as there are some parameters which
> are used by the startup process which can be reloaded.  One of them is
> wal_retrieve_retry_interval for tuning when fetching WAL at recovery.

So, I have been working on this problem again and I have reviewed the
thread, and there have been many things discussed in the last couple of
months:
1) We do not want to initialize XLogInsert stuff unconditionally for all
processes at the moment recovery begins, but we just want to initialize
it once WAL write is open for business.
2) Both the checkpointer and the startup process can call
UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
incorrect values.
3) We do not want a palloc() in a critical section because of
RecoveryinProgress being called.

And the root issue here is 2), because the checkpointer tries to update
Insert->fullPageWrites but it does not need to do so until recovery has
been finished.  So in order to fix the original issue I am proposing a
simple fix: let's make sure that the checkpointer does not update
Insert->fullPageWrites until recovery finishes, and let's have the
startup process do the first update once it finishes recovery and
inserts by itself the XLOG_PARAMETER_CHANGE.  This way the order of
events is purely sequential and we don't need to worry about having the
checkpointer and the startup process eat on each other's plate because
the checkpointer would only try to work on updating the shared memory
value of full_page_writes once SharedRecoveryInProgress is switched to
true, and that happens after the startup process does its initial call
to UpdateFullPageWrites().  I have improved as well all the comments
around to make clear the behavior wanted.

Thoughts?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3025d0badb..69912e6a22 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7719,6 +7719,11 @@ StartupXLOG(void)
 	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
 	 * record before resource manager writes cleanup WAL records or checkpoint
 	 * record is written.
+	 *
+	 * It is safe to check the shared full_page_writes without the lock,
+	 * because there is no concurrently running process able to update it.
+	 * The only other process able to update full_page_writes is the
+	 * checkpointer, still it is unable to do so until recovery finishes.
 	 */
 	Insert->fullPageWrites = lastFullPageWrites;
 	LocalSetXLogInsertAllowed();
@@ -9693,14 +9698,27 @@ XLogReportParameters(void)
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
- * Note: this function assumes there is no other process running
- * concurrently that could update it.
+ * Note: this can be called from the checkpointer, or the startup process
+ * at the end of recovery.  One could think that this routine should be
+ * careful with its lock handling, however this is a no-op for the
+ * checkpointer until the startup process marks the end of recovery,
+ * so only one of them can do the work of this routine at once.
  */
 void
 UpdateFullPageWrites(void)
 {
 	XLogCtlInsert *Insert = >Insert;
 
+	/*
+	 * Check if recovery is still in progress before entering this critical
+	 * section, as some memory allocation could happen at the end of
+	 * recovery.  There is nothing to do for a system still in recovery.
+	 * Note that we need to process things here at the end of recovery for
+	 * the startup process, which is why this checks after InRecovery.
+	 */
+	if (RecoveryInProgress() && !InRecovery)
+		return;
+
 	/*
 	 * Do nothing if full_page_writes has not been changed.
 	 *
@@ -9731,7 +9749,7 @@ UpdateFullPageWrites(void)
 	 * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
 	 * full_page_writes during archive recovery, if required.
 	 */
-	if (XLogStandbyInfoActive() && !RecoveryInProgress())
+	if (XLogStandbyInfoActive())
 	{
 		XLogBeginInsert();
 		XLogRegisterData((char *) (), sizeof(bool));


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-14 Thread Michael Paquier
On Thu, Sep 13, 2018 at 04:38:30PM +0530, Amit Kapila wrote:
> So, the problem started appearing after some rearrangement of code in
> both the above-mentioned commits.  I verified that this problem
> doesn't exist in versions <=9.4, so backpatch-through 9.5.

Thanks Amit for taking care of this first problem.  I am going to send
another patch which is able to take care of concurrent updates of
Insert->fullPageWrites for the checkpointer and the startup process
to fix the original issue reported by Dilip Kumar, so as we are able to
close definitely the loop on this thread. 
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-09-13 Thread Amit Kapila
On Mon, Sep 10, 2018 at 11:54 AM Amit Kapila  wrote:
>
> Thanks, but what I wanted you to verify is the commit that broke it in
> 9.5.  On again looking at it, I think it is below code in commit
> 2076db2aea that caused this problem.  If possible, can you once test
> it before and at this commit or at least do the manual review of same
> to cross-verify?
>

I have myself investigated this further and found that this problem
started occuring due to code rearrangement in commits 2c03216d83 and
2076db2aea.  In commit 2076db2aea, below check for comparing the old
and new value for fullPageWrites got changed:

> -   if ((Insert->fullPageWrites || Insert->forcePageWrites) &&
> !doPageWrites)
> +   if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr &&
> doPageWrites)
> {

However, it alone didn't lead to this problem because
XLogRecordAssemble() gives the valid value of fpw_lsn irrespective of
whether full-page-writes was enabled or not. Later in commit
2c03216d83, we changed XLogRecordAssemble() such that it will give the
valid value of fpw_lsn only if doPageWrites is enabled, basically this
part of the commit:

+ /* Determine if this block needs to be backed up */
+ if (regbuf->flags & REGBUF_FORCE_IMAGE)
+ needs_backup = true;
+ else if (regbuf->flags & REGBUF_NO_IMAGE)
+ needs_backup = false;
+ else if (!doPageWrites)
+ needs_backup = false;
+ else
  {
- /* Simple data, just include it */
- len += rdt->len;
+ /*
+ * We assume page LSN is first data on *every* page that can be
+ * passed to XLogInsert, whether it has the standard page layout
+ * or not.
+ */
+ XLogRecPtr page_lsn = PageGetLSN(regbuf->page);
+
+ needs_backup = (page_lsn <= RedoRecPtr);
+ if (!needs_backup)
+ {
+ if (*fpw_lsn == InvalidXLogRecPtr || page_lsn < *fpw_lsn)
+ *fpw_lsn = page_lsn;
+ }
  }

So, the problem started appearing after some rearrangement of code in
both the above-mentioned commits.  I verified that this problem
doesn't exist in versions <=9.4, so backpatch-through 9.5.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-10 Thread Amit Kapila
On Tue, Aug 28, 2018 at 4:05 PM Kyotaro HORIGUCHI
 wrote:
>
> Hello.
>
> At Sat, 25 Aug 2018 14:50:53 +0530, Amit Kapila  
> wrote in 
> > On Wed, Aug 1, 2018 at 12:56 PM Kyotaro HORIGUCHI
> >  wrote:
> > >
> > > Thank you, Amit, Michael.
> > >
> >
> > Can you verify the first patch that I have posted above [1]?  We can
> > commit it separately.
>
> Thanks for prompting. The difference is in a comment and I'm fine
> with the change.
>

Thanks, but what I wanted you to verify is the commit that broke it in
9.5.  On again looking at it, I think it is below code in commit
2076db2aea that caused this problem.  If possible, can you once test
it before and at this commit or at least do the manual review of same
to cross-verify?

+   doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
-   /*
-* Also check to see if fullPageWrites or forcePageWrites was
just turned
-* on; if we weren't already doing full-page writes then go back and
-* recompute. (If it was just turned off, we could recompute the record
-* without full pages, but we choose not to bother.)
-*/
-   if ((Insert->fullPageWrites || Insert->forcePageWrites) &&
!doPageWrites)
+   if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr &&
doPageWrites)
{
-   /* Oops, must redo it with full-page data. */
+   /*
+* Oops, some buffer now needs to be backed up that the caller
+* didn't back up.  Start over.
+*/
WALInsertLockRelease();
END_CRIT_SECTION();
-   rdt_lastnormal->next = NULL;
-   info = info_orig;
-   goto begin;
+   return InvalidXLogRecPtr;
}


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-06 Thread Michael Paquier
On Tue, Aug 28, 2018 at 07:34:36PM +0900, Kyotaro HORIGUCHI wrote:
> Thanks for prompting. The difference is in a comment and I'm fine
> with the change.

/*
 * Properly accept or ignore signals the postmaster might send us.
 */
-   pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
+   pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */

I am finally coming back to this patch set, and that's one of the first
things I am going to help moving on for this CF.  And this bit from the
last patch series is not acceptable as there are some parameters which
are used by the startup process which can be reloaded.  One of them is
wal_retrieve_retry_interval for tuning when fetching WAL at recovery.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-08-28 Thread Kyotaro HORIGUCHI
Hello.

At Sat, 25 Aug 2018 14:50:53 +0530, Amit Kapila  wrote 
in 
> On Wed, Aug 1, 2018 at 12:56 PM Kyotaro HORIGUCHI
>  wrote:
> >
> > Thank you, Amit, Michael.
> >
> 
> Can you verify the first patch that I have posted above [1]?  We can
> commit it separately.

Thanks for prompting. The difference is in a comment and I'm fine
with the change.


> > It's a long time ago.. Let me have a bit of time to blow dust off..
> >
> > https://www.postgresql.org/message-id/20180420.173354.43303926.horiguchi.kyot...@lab.ntt.co.jp
> >
> > ...done..i working..
> >
> > The original problem here was UpdateFullPageWrites() can call
> > RecoveryInProgress(), which does palloc in a critical
> > section. This happens when standy is commanded to reload with
> > change of full_pages_writes during recovery.
> >
> 
> AFAIU from the original problem reported by Dilip, it can happen
> during checkpoint without standby as well.

Yes, standby is not needed but archive (streaming) recovery is
required to start checkpointer.

> > While exploring it, I found that update of fullPageWrite flag is
> > updated concurrently against its design. A race condition happens
> > in the following steps in my diagnosis.
> >
> > 1. While the startup process is running recovery, we didn't
> >  consider that checkpointer may be running concurrently, but it
> >  happens for standby.
> >
> > 2. Checkpointer considers itself (or designed) as the *only*
> >  updator of shared config including fillPageWrites. In reality
> >  the startup is another concurent updator on standby. Addition to
> >  that, checkpointer assumes that it is started under WAL-emitting
> >  state, that is, InitXLogInsert() has been called elsewhere, but
> >  it is not the case on standby.
> >
> >  Note that checkpointer mustn't update FPW flag before recovery
> >  ends because startup will overrides the flag.
> >
> > 3. As the result, when standby gets full_page_writes changed and
> >  SIGHUP during recovery, checkpointer tries to update FPW flag
> >  and calls RecoveryInProgress() on the way. bang!
> >
> >
> > With the 0002-step1.patch, checkpointer really becomes the only
> > updator of the FPW flag after recovery ends. StartupXLOG()
> > updates it just once before checkpointer starts to update it.
> >
> 
> - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
> - * record before resource manager writes cleanup WAL records or checkpoint
> - * record is written.
> + * Update full_page_writes in shared memory with the lastest value before
> + * resource manager writes cleanup WAL records or checkpoint record is
> + * written. We don't need to write XLOG_FPW_CHANGE since this just
> + * reflects the status at the last redo'ed record. No lock is required
> + * since startup is the only updator of the flag at this
> + * point. Checkpointer will take over after SharedRecoveryInProgress is
> + * turned to false.
>   */
>   Insert->fullPageWrites = lastFullPageWrites;
> - LocalSetXLogInsertAllowed();
> - UpdateFullPageWrites();
> - LocalXLogInsertAllowed = -1;
> 
> lastFullPageWrites will contain the latest value among the replayed
> WAL records.  It can still be different fullPageWrites which is
> updated by UpdateFullPageWrites().  So, I don't think it is advisable
> to remove it and rely on checkpointer to update it.  I think when it
> is called from this code path, it will anyway not write
> XLOG_FPW_CHANGE because of the !RecoveryInProgress() check.

Unfortunately startup doesn't process reloads by SIGHUP. So just
letting startup process set sharedFPW doesn't work correctly. I
don't think reload during redo loop will be apparently
safe. Forcibly reloading config without SIGHUP just before
UpdateFullPageWrites() in StartupXLOG breaks the reload
semantics.

# The comment for StartupPorcessSigHagndler is wrong in the sense..

> > Under the normal(slow?) promotion mode, checkpointer is woken up
> > explicitly to make the config setting effective as soon as
> > possible. (This might be unnecessary.)
> >
> 
> I am not sure this is the right approach.  If we are worried about
> concurrency issues, then we can use lock to protect it.

Since only checkpointer knows the right current value of the
flag, it is responsible for the final (just after recovery end)
setup of the flag. Actually we don't need to wake up checkpoiner
as soon as the end of recovery, but it must
UpdateFullPageWrites() before the first checkpoint starts without
receiving SIGHUP.


> > In checkpointer, RecoveryInProgress() is called preior to
> > UpdateFPW() to disable update FPW during recovery, so the crash
> > that is the issue here is fixed.
> >
> 
> It seems to me that you are trying to resolve two different problems
> in the same patch.  I think we can have a patch to deal with
> concurrency issue if any and a separate patch to call
> RecoveryInProgress outside critical section.

Hmm. Perhaps right. The change of UpdateShareMemoryConfig alone
resolves the initial issue and others are needed to have 

Re: Problem while setting the fpw with SIGHUP

2018-08-25 Thread Amit Kapila
On Wed, Aug 1, 2018 at 12:56 PM Kyotaro HORIGUCHI
 wrote:
>
> Thank you, Amit, Michael.
>

Can you verify the first patch that I have posted above [1]?  We can
commit it separately.

>
> It's a long time ago.. Let me have a bit of time to blow dust off..
>
> https://www.postgresql.org/message-id/20180420.173354.43303926.horiguchi.kyot...@lab.ntt.co.jp
>
> ...done..i working..
>
> The original problem here was UpdateFullPageWrites() can call
> RecoveryInProgress(), which does palloc in a critical
> section. This happens when standy is commanded to reload with
> change of full_pages_writes during recovery.
>

AFAIU from the original problem reported by Dilip, it can happen
during checkpoint without standby as well.

> While exploring it, I found that update of fullPageWrite flag is
> updated concurrently against its design. A race condition happens
> in the following steps in my diagnosis.
>
> 1. While the startup process is running recovery, we didn't
>  consider that checkpointer may be running concurrently, but it
>  happens for standby.
>
> 2. Checkpointer considers itself (or designed) as the *only*
>  updator of shared config including fillPageWrites. In reality
>  the startup is another concurent updator on standby. Addition to
>  that, checkpointer assumes that it is started under WAL-emitting
>  state, that is, InitXLogInsert() has been called elsewhere, but
>  it is not the case on standby.
>
>  Note that checkpointer mustn't update FPW flag before recovery
>  ends because startup will overrides the flag.
>
> 3. As the result, when standby gets full_page_writes changed and
>  SIGHUP during recovery, checkpointer tries to update FPW flag
>  and calls RecoveryInProgress() on the way. bang!
>
>
> With the 0002-step1.patch, checkpointer really becomes the only
> updator of the FPW flag after recovery ends. StartupXLOG()
> updates it just once before checkpointer starts to update it.
>

- * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
- * record before resource manager writes cleanup WAL records or checkpoint
- * record is written.
+ * Update full_page_writes in shared memory with the lastest value before
+ * resource manager writes cleanup WAL records or checkpoint record is
+ * written. We don't need to write XLOG_FPW_CHANGE since this just
+ * reflects the status at the last redo'ed record. No lock is required
+ * since startup is the only updator of the flag at this
+ * point. Checkpointer will take over after SharedRecoveryInProgress is
+ * turned to false.
  */
  Insert->fullPageWrites = lastFullPageWrites;
- LocalSetXLogInsertAllowed();
- UpdateFullPageWrites();
- LocalXLogInsertAllowed = -1;

lastFullPageWrites will contain the latest value among the replayed
WAL records.  It can still be different fullPageWrites which is
updated by UpdateFullPageWrites().  So, I don't think it is advisable
to remove it and rely on checkpointer to update it.  I think when it
is called from this code path, it will anyway not write
XLOG_FPW_CHANGE because of the !RecoveryInProgress() check.

> Under the normal(slow?) promotion mode, checkpointer is woken up
> explicitly to make the config setting effective as soon as
> possible. (This might be unnecessary.)
>

I am not sure this is the right approach.  If we are worried about
concurrency issues, then we can use lock to protect it.

> In checkpointer, RecoveryInProgress() is called preior to
> UpdateFPW() to disable update FPW during recovery, so the crash
> that is the issue here is fixed.
>

It seems to me that you are trying to resolve two different problems
in the same patch.  I think we can have a patch to deal with
concurrency issue if any and a separate patch to call
RecoveryInProgress outside critical section.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JvKxsHfM6GCHoKNas-7vDSniLgaAm%3DcG_OaQaOYRNc3w%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-08-01 Thread Kyotaro HORIGUCHI
Thank you, Amit, Michael.

At Sun, 29 Jul 2018 08:19:11 +0900, Michael Paquier  wrote 
in <20180728231911.gb1...@paquier.xyz>
> On Sat, Jul 28, 2018 at 07:10:24PM +0530, Amit Kapila wrote:
> > I have just responded to your first patch (0001).  Can you once again
> > summarize what the 0002 exactly accomplishes?  I think one of the
> > goals is to fix the original problem reported in this thread and other
> > is you have found the concurrency issue.  Is it possible to have
> > separate patches for those or you think they are interrelated and
> > needs to be fixed together?
> 
> That would be nice.  The last time I read this thread I have been rather
> confused about what was being discussed, what were the set of problems,
> and what was being fixed.  Speaking of which, this is one of the bugfix
> patches I wanted to look at once I have untanggled the autovacuum one
> for temporary relations and the DOS issues with lock lookups.

It's a long time ago.. Let me have a bit of time to blow dust off..

https://www.postgresql.org/message-id/20180420.173354.43303926.horiguchi.kyot...@lab.ntt.co.jp

...done..i working..

The original problem here was UpdateFullPageWrites() can call
RecoveryInProgress(), which does palloc in a critical
section. This happens when standy is commanded to reload with
change of full_pages_writes during recovery.

While exploring it, I found that update of fullPageWrite flag is
updated concurrently against its design. A race condition happens
in the following steps in my diagnosis.

1. While the startup process is running recovery, we didn't
 consider that checkpointer may be running concurrently, but it
 happens for standby.

2. Checkpointer considers itself (or designed) as the *only*
 updator of shared config including fillPageWrites. In reality
 the startup is another concurent updator on standby. Addition to
 that, checkpointer assumes that it is started under WAL-emitting
 state, that is, InitXLogInsert() has been called elsewhere, but
 it is not the case on standby.

 Note that checkpointer mustn't update FPW flag before recovery
 ends because startup will overrides the flag.

3. As the result, when standby gets full_page_writes changed and
 SIGHUP during recovery, checkpointer tries to update FPW flag
 and calls RecoveryInProgress() on the way. bang!


With the 0002-step1.patch, checkpointer really becomes the only
updator of the FPW flag after recovery ends. StartupXLOG()
updates it just once before checkpointer starts to update it.

Under the normal(slow?) promotion mode, checkpointer is woken up
explicitly to make the config setting effective as soon as
possible. (This might be unnecessary.)

In checkpointer, RecoveryInProgress() is called preior to
UpdateFPW() to disable update FPW during recovery, so the crash
that is the issue here is fixed.

FYI I think that 0002-tesp2.patch is rejected.

Please find the attacehd revised patch. It is rebased and
provided with a commit message.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From c839e039d4ebb06c0dd345f7992a460a27827805 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 1 Aug 2018 15:29:01 +0900
Subject: [PATCH] Fix FPW flags updates during recovery.

Each of checkpointer and startup processes thinks it is the only
updator of fullPageWrites flag ignorantly of each other. This causes
race condition and leads checkpionter to a crash in a certain case.

This patch makes chackpionter the only updator and organize fix the
sequence around recovery end in order to fullPageWrite flag is
correctly updated.
---
 src/backend/access/transam/xlog.c | 19 -
 src/backend/postmaster/checkpointer.c | 50 ---
 src/include/postmaster/bgwriter.h |  1 +
 3 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 493f1db7b9..f315d11745 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7692,14 +7692,15 @@ StartupXLOG(void)
 	XLogCtl->LogwrtRqst.Flush = EndOfLog;
 
 	/*
-	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
-	 * record before resource manager writes cleanup WAL records or checkpoint
-	 * record is written.
+	 * Update full_page_writes in shared memory with the lastest value before
+	 * resource manager writes cleanup WAL records or checkpoint record is
+	 * written. We don't need to write XLOG_FPW_CHANGE since this just
+	 * reflects the status at the last redo'ed record. No lock is required
+	 * since startup is the only updator of the flag at this
+	 * point. Checkpointer will take over after SharedRecoveryInProgress is
+	 * turned to false.
 	 */
 	Insert->fullPageWrites = lastFullPageWrites;
-	LocalSetXLogInsertAllowed();
-	UpdateFullPageWrites();
-	LocalXLogInsertAllowed = -1;
 
 	if (InRecovery)
 	{
@@ -7941,10 +7942,14 @@ StartupXLOG(void)
 	 * If this was a fast promotion, request an (online) checkpoint 

Re: Problem while setting the fpw with SIGHUP

2018-07-28 Thread Michael Paquier
On Sat, Jul 28, 2018 at 07:10:24PM +0530, Amit Kapila wrote:
> I have just responded to your first patch (0001).  Can you once again
> summarize what the 0002 exactly accomplishes?  I think one of the
> goals is to fix the original problem reported in this thread and other
> is you have found the concurrency issue.  Is it possible to have
> separate patches for those or you think they are interrelated and
> needs to be fixed together?

That would be nice.  The last time I read this thread I have been rather
confused about what was being discussed, what were the set of problems,
and what was being fixed.  Speaking of which, this is one of the bugfix
patches I wanted to look at once I have untanggled the autovacuum one
for temporary relations and the DOS issues with lock lookups.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-07-28 Thread Amit Kapila
On Tue, Apr 24, 2018 at 7:10 AM, Kyotaro HORIGUCHI
 wrote:
> At Tue, 24 Apr 2018 08:52:17 +0900, Michael Paquier  
> wrote in <20180423235217.gb1...@paquier.xyz>
>> On Mon, Apr 23, 2018 at 12:21:04PM -0400, Robert Haas wrote:
>> > Fine, but that doesn't answer the question of whether we actually need
>> > to or should change the behavior in the first place.
>>
>> Per the last arguments that would be "No, we don't want to change it as
>> it would surprise some users":
>> https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz
>
> The answer is that the change of behavior is not required to fix
> the bug. So I'm fine with applying only (0001 and) 0002 here.
>

I have just responded to your first patch (0001).  Can you once again
summarize what the 0002 exactly accomplishes?  I think one of the
goals is to fix the original problem reported in this thread and other
is you have found the concurrency issue.  Is it possible to have
separate patches for those or you think they are interrelated and
needs to be fixed together?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-07-28 Thread Amit Kapila
On Fri, Apr 20, 2018 at 6:06 PM, Amit Kapila  wrote:
> On Fri, Apr 20, 2018 at 11:40 AM, Kyotaro HORIGUCHI
>  wrote:
>> By the way, I think I found a bug of FPW.
>>
>> The following steps yields INSERT record that doesn't have a FPI
>> after a checkpoint.
>>
>> (Start server with full_page_writes = off)
>> CREATE TABLE t (a int);
>> CHECKPOINT;
>> INSERT INTO t VALUES (1);
>> ALTER SYSTEM SET full_page_writes TO on;
>> SELECT pg_reload_conf();
>> CHECKPOINT;
>> INSERT INTO t VALUES (1);
>>
>> The last insert is expected to write a record with FPI but it
>> doesn't actually. No FPI will be written for the page after that.
>>
>> It seems that the reason is that XLogInsertRecord is forgetting
>> to check doPageWrites' update.
>>
>> In the failure case, fpw_lsn has been set by XLogRecordAssemble
>> but doPageWrite is false at the time and it considers that no FPI
>> is required then it sets fpw_lsn to InvalidXLogRecPtr.
>>
>> After that, XLogInsertRecord receives the record but it thinks
>> that doPageWrites is true since it looks the shared value.
>>
>>> if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
>>
>> So this line thinks that "no FPI is omitted in this record" but
>> actually the record is just forgotten to attach them.
>>
>> The attached patch lets XLogInsertRecord check if doPageWrites
>> has been turned on after the last call and cause recomputation in
>> the case.
>>
>
> I think you have correctly spotted the problem and your fix looks good
> to me.  As this is a separate problem and fix is different from what
> we are discussing here, I think this can be committed it separately.
>

AFAICS, this problem has been introduced by commit
2c03216d831160bedd72d45f712601b6f7d03f1c, so we should backpatch till
9.5.  Please find attached the slightly modified patch for this bug.
I have modified one of the comments in the patch and the proposed
commit message.  Can you please once cross-verify if the problem exits
till 9.5 and that this patch fixes it?  Also, I don't see an easy way
to write a test for it, do you have anything in mind?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Correctly-attach-FPI-to-the-first-record.2.patch
Description: Binary data


Re: Problem while setting the fpw with SIGHUP

2018-04-23 Thread Kyotaro HORIGUCHI
At Tue, 24 Apr 2018 08:52:17 +0900, Michael Paquier  wrote 
in <20180423235217.gb1...@paquier.xyz>
> On Mon, Apr 23, 2018 at 12:21:04PM -0400, Robert Haas wrote:
> > Fine, but that doesn't answer the question of whether we actually need
> > to or should change the behavior in the first place.
> 
> Per the last arguments that would be "No, we don't want to change it as
> it would surprise some users":
> https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz

The answer is that the change of behavior is not required to fix
the bug. So I'm fine with applying only (0001 and) 0002 here.

https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz

One reason that I introduced the "restriction" was that it was
useful to avoid concurrent udpate of the flag. It is now avoided
in another way.

Just my opinion on the behavior follows.

I don't think anyone confirms that FPI come back after switching
full_page_writes (one of the reason is it needs pg_waldump).
pg_start/stop_backup() on standby says that "You need to turn on
FPW, then do checkpoint". It suggests that FPI's don't work
before the next checkpoint. If we keep the current behavior, the
documentation might need an additional phrase something like "FPW
ensures that data is protected from partial writes after the next
chackpoint starts". On the other hand honestly I don't have
confidence that the WAL reduction worth the additional complexity
by 0003.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problem while setting the fpw with SIGHUP

2018-04-23 Thread Robert Haas
On Wed, Apr 18, 2018 at 9:49 PM, Michael Paquier  wrote:
> On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote:
>> I would just document the risks.  If the documentation says that you
>> can't rely on the value until after the next checkpoint, or whatever
>> the rule is, then I think we're fine.  I don't think that we really
>> have the infrastructure to do any better; if we try, we'll just end up
>> with odd warts.  Documenting the current set of warts is less churn
>> and less work.
>
> The last version of the patch proposed has eaten this diff which was
> part of one of the past versions (v2-0001-Change-FPW-handling.patch from
> https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp):
> +The default is on. The change of the parameter 
> takes
> +effect at the next checkpoint time.
> So there were some documentation about the beHavior change for what it's
> worth.

Fine, but that doesn't answer the question of whether we actually need
to or should change the behavior in the first place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Problem while setting the fpw with SIGHUP

2018-04-20 Thread Amit Kapila
On Fri, Apr 20, 2018 at 11:40 AM, Kyotaro HORIGUCHI
 wrote:
> By the way, I think I found a bug of FPW.
>
> The following steps yields INSERT record that doesn't have a FPI
> after a checkpoint.
>
> (Start server with full_page_writes = off)
> CREATE TABLE t (a int);
> CHECKPOINT;
> INSERT INTO t VALUES (1);
> ALTER SYSTEM SET full_page_writes TO on;
> SELECT pg_reload_conf();
> CHECKPOINT;
> INSERT INTO t VALUES (1);
>
> The last insert is expected to write a record with FPI but it
> doesn't actually. No FPI will be written for the page after that.
>
> It seems that the reason is that XLogInsertRecord is forgetting
> to check doPageWrites' update.
>
> In the failure case, fpw_lsn has been set by XLogRecordAssemble
> but doPageWrite is false at the time and it considers that no FPI
> is required then it sets fpw_lsn to InvalidXLogRecPtr.
>
> After that, XLogInsertRecord receives the record but it thinks
> that doPageWrites is true since it looks the shared value.
>
>> if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
>
> So this line thinks that "no FPI is omitted in this record" but
> actually the record is just forgotten to attach them.
>
> The attached patch lets XLogInsertRecord check if doPageWrites
> has been turned on after the last call and cause recomputation in
> the case.
>

I think you have correctly spotted the problem and your fix looks good
to me.  As this is a separate problem and fix is different from what
we are discussing here, I think this can be committed it separately.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-04-20 Thread Kyotaro HORIGUCHI
I noticed that the previous patch is a mixture with another
patch.. sorry.

At Thu, 19 Apr 2018 19:11:43 +0530, Amit Kapila  wrote 
in 
> On Thu, Apr 19, 2018 at 7:19 AM, Michael Paquier  wrote:
> > On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote:
> >> I would just document the risks.  If the documentation says that you
> >> can't rely on the value until after the next checkpoint, or whatever
> >> the rule is, then I think we're fine.  I don't think that we really
> >> have the infrastructure to do any better; if we try, we'll just end up
> >> with odd warts.  Documenting the current set of warts is less churn
> >> and less work.
> >
> > The last version of the patch proposed has eaten this diff which was
> > part of one of the past versions (v2-0001-Change-FPW-handling.patch from
> > https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp):
> > +The default is on. The change of the parameter 
> > takes
> > +effect at the next checkpoint time.
> > So there were some documentation about the beHavior change for what it's
> > worth.
> >
> > And, er, actually, I was thinking again about the case where a user
> > wants to disable full_page_writes temporarily to do some bulk load and
> > then re-enable it.  With the patch proposed to actually update the FPW
> > effect at checkpoint time, then a user would need to issue a manual
> > checkpoint after updating the configuration and reloading, which may
> > create more I/O than he'd want to pay for, then a second checkpoint
> > would need to be issued after the configuration comes back again.
> >
> 
> Why a second checkpoint?  One checkpoint either manual or automatic
> should be enough to make the setting effective.

One significant point in my first patch is anyway the FPW is
useless untill the second checkpoint starts. In the sense of the
timing when *useful* FPW comes back, the second checkpoint is
required regardless of the patch. As Amit said, there is no
difference whether it is manual or automatic.

On the other hand the timing when FPW is actually turned off is
different. Focusing on user's view, one can run bulkload
immediately under the current behavior but should wait for the
next checkpoint starts with the first patch, which can be caused
manually but may be delayed after the currently running
checkpoint if any.

I think that starting the *first* checkpoint before bulkload is
not such a bother but on the other hand the behavior can lead to
FPW flood for those who are accostomed to the current behavior.

The attached first patch is the bugfix proposed in this thread.
The second fixes the cocurrent update problem only.

The third changes the behavior so that turning-on happenes only
on checkpoints and turning-off happens any time.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 9b3496ceb6f39b017698225fef289974bd01fb07 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 20 Apr 2018 15:01:26 +0900
Subject: [PATCH 1/3] Correctly attach FPI to the first record after a
 checkpoint

XLogInsert fails to attach a required FPIs to the first record after a
checkpoint if no other record has been written after full_page_writes
was turned on. This makes incosistency between fpw flag of the
checkpoint record and the following record's FPW status. This patch
makes XLogInsertRecord to cause a recomputation when the given record
is generated during FPW is off but found that the flag has been turned
on.
---
 src/backend/access/transam/xlog.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 08dc9ba031..27753f7321 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -937,7 +937,7 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
  *
  * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this
  * WAL record applies to, that were not included in the record as full page
- * images.  If fpw_lsn >= RedoRecPtr, the function does not perform the
+ * images.  If fpw_lsn <= RedoRecPtr, the function does not perform the
  * insertion and returns InvalidXLogRecPtr.  The caller can then recalculate
  * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
  * record is always inserted.
@@ -970,6 +970,7 @@ XLogInsertRecord(XLogRecData *rdata,
 			   info == XLOG_SWITCH);
 	XLogRecPtr	StartPos;
 	XLogRecPtr	EndPos;
+	bool		prevDoPageWrites = doPageWrites;
 
 	/* we assume that all of the record header is in the first chunk */
 	Assert(rdata->len >= SizeOfXLogRecord);
@@ -1022,7 +1023,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * This can only happen just after a checkpoint, so it's better to be slow
 	 * in this case and fast otherwise.
 	 *
-	

Re: Problem while setting the fpw with SIGHUP

2018-04-20 Thread Kyotaro HORIGUCHI
By the way, I think I found a bug of FPW.

The following steps yields INSERT record that doesn't have a FPI
after a checkpoint.

(Start server with full_page_writes = off)
CREATE TABLE t (a int);
CHECKPOINT;
INSERT INTO t VALUES (1);
ALTER SYSTEM SET full_page_writes TO on;
SELECT pg_reload_conf();
CHECKPOINT;
INSERT INTO t VALUES (1);

The last insert is expected to write a record with FPI but it
doesn't actually. No FPI will be written for the page after that.

It seems that the reason is that XLogInsertRecord is forgetting
to check doPageWrites' update.

In the failure case, fpw_lsn has been set by XLogRecordAssemble
but doPageWrite is false at the time and it considers that no FPI
is required then it sets fpw_lsn to InvalidXLogRecPtr.

After that, XLogInsertRecord receives the record but it thinks
that doPageWrites is true since it looks the shared value.

> if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)

So this line thinks that "no FPI is omitted in this record" but
actually the record is just forgotten to attach them.

The attached patch lets XLogInsertRecord check if doPageWrites
has been turned on after the last call and cause recomputation in
the case.

> * If there are any registered buffers, and a full-page image was not taken
> * of all of them, *fpw_lsn is set to the lowest LSN among such pages. This
> * signals that the assembled record is only good for insertion on the
> * assumption that the RedoRecPtr and doPageWrites values were up-to-date.

And the patch fixes one comment typo of XLogInsertRecord.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 40ce98bba0496b1eb0a982134eae9cec01d532a8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 20 Apr 2018 15:01:26 +0900
Subject: [PATCH] Correctly attach FPI to the first record after a checkpoint

XLogInsert fails to attach a required FPIs to the first record after a
checkpoint if no other record has been written after full_page_writes
was turned on. This makes incosistency between fpw flag of the
checkpoint record and the following record's FPW status. This patch
makes XLogInsertRecord to cause a recomputation when the given record
is generated during FPW is off but found that the flag has been turned
on.
---
 src/backend/access/transam/xlog.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 08dc9ba031..27753f7321 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -937,7 +937,7 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
  *
  * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this
  * WAL record applies to, that were not included in the record as full page
- * images.  If fpw_lsn >= RedoRecPtr, the function does not perform the
+ * images.  If fpw_lsn <= RedoRecPtr, the function does not perform the
  * insertion and returns InvalidXLogRecPtr.  The caller can then recalculate
  * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
  * record is always inserted.
@@ -970,6 +970,7 @@ XLogInsertRecord(XLogRecData *rdata,
 			   info == XLOG_SWITCH);
 	XLogRecPtr	StartPos;
 	XLogRecPtr	EndPos;
+	bool		prevDoPageWrites = doPageWrites;
 
 	/* we assume that all of the record header is in the first chunk */
 	Assert(rdata->len >= SizeOfXLogRecord);
@@ -1022,7 +1023,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * This can only happen just after a checkpoint, so it's better to be slow
 	 * in this case and fast otherwise.
 	 *
-	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
+	 * If doPageWrites was just turned on, we force a recomputation. Otherwise
+	 * if we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
 	 * but not force a recomputation.  (If doPageWrites was just turned off,
 	 * we could recompute the record without full pages, but we choose not to
@@ -1035,7 +1037,9 @@ XLogInsertRecord(XLogRecData *rdata,
 	}
 	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
 
-	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
+	if (doPageWrites &&
+		(!prevDoPageWrites ||
+		 (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr)))
 	{
 		/*
 		 * Oops, some buffer now needs to be backed up that the caller didn't
-- 
2.16.3



Re: Problem while setting the fpw with SIGHUP

2018-04-19 Thread Michael Paquier
On Thu, Apr 19, 2018 at 07:11:43PM +0530, Amit Kapila wrote:
> On Thu, Apr 19, 2018 at 7:19 AM, Michael Paquier  wrote:
>> And, er, actually, I was thinking again about the case where a user
>> wants to disable full_page_writes temporarily to do some bulk load and
>> then re-enable it.  With the patch proposed to actually update the FPW
>> effect at checkpoint time, then a user would need to issue a manual
>> checkpoint after updating the configuration and reloading, which may
>> create more I/O than he'd want to pay for, then a second checkpoint
>> would need to be issued after the configuration comes back again.
> 
> Why a second checkpoint?  One checkpoint either manual or automatic
> should be enough to make the setting effective.

I was thinking about cases where users have say hourly cron jobs in
charge of doing some maintenance of update cleanups, where they would
need to be sure that full_page_writes are back online after doing the
bulk-load.  In this case an extra checkpoint would be necessary to make
the parameter update effective.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-04-19 Thread Amit Kapila
On Thu, Apr 19, 2018 at 7:19 AM, Michael Paquier  wrote:
> On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote:
>> I would just document the risks.  If the documentation says that you
>> can't rely on the value until after the next checkpoint, or whatever
>> the rule is, then I think we're fine.  I don't think that we really
>> have the infrastructure to do any better; if we try, we'll just end up
>> with odd warts.  Documenting the current set of warts is less churn
>> and less work.
>
> The last version of the patch proposed has eaten this diff which was
> part of one of the past versions (v2-0001-Change-FPW-handling.patch from
> https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp):
> +The default is on. The change of the parameter 
> takes
> +effect at the next checkpoint time.
> So there were some documentation about the beHavior change for what it's
> worth.
>
> And, er, actually, I was thinking again about the case where a user
> wants to disable full_page_writes temporarily to do some bulk load and
> then re-enable it.  With the patch proposed to actually update the FPW
> effect at checkpoint time, then a user would need to issue a manual
> checkpoint after updating the configuration and reloading, which may
> create more I/O than he'd want to pay for, then a second checkpoint
> would need to be issued after the configuration comes back again.
>

Why a second checkpoint?  One checkpoint either manual or automatic
should be enough to make the setting effective.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-04-18 Thread Michael Paquier
On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote:
> I would just document the risks.  If the documentation says that you
> can't rely on the value until after the next checkpoint, or whatever
> the rule is, then I think we're fine.  I don't think that we really
> have the infrastructure to do any better; if we try, we'll just end up
> with odd warts.  Documenting the current set of warts is less churn
> and less work.

The last version of the patch proposed has eaten this diff which was
part of one of the past versions (v2-0001-Change-FPW-handling.patch from
https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp):
+The default is on. The change of the parameter takes
+effect at the next checkpoint time.
So there were some documentation about the beHavior change for what it's
worth. 

And, er, actually, I was thinking again about the case where a user
wants to disable full_page_writes temporarily to do some bulk load and
then re-enable it.  With the patch proposed to actually update the FPW
effect at checkpoint time, then a user would need to issue a manual
checkpoint after updating the configuration and reloading, which may
create more I/O than he'd want to pay for, then a second checkpoint
would need to be issued after the configuration comes back again.  That
would cause a regression which could surprise a class of users.  WAL and
FPW overhead is a problem which shows up a lot when doing bulk-loading
of data.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-04-18 Thread Robert Haas
On Wed, Apr 18, 2018 at 10:37 AM, Amit Kapila  wrote:
> On Fri, Apr 13, 2018 at 10:36 PM, Robert Haas  wrote:
>> On Thu, Apr 12, 2018 at 9:29 PM, Michael Paquier  wrote:
>>> Still does it matter when the change is effective?
>>
>> I don't really care deeply about when the change takes effect, but I
>> do care about whether the time when the system *says* the change took
>> effect is the same as when it *actually* took effect.  If those aren't
>> the same, it's confusing.
>>
>
> So, what in your opinion is the way to deal with this?  If we make it
> a PGC_POSTMASTER parameter, it will have a very clear behavior and
> users don't need to bother whether they have a risk of torn page
> problem or not and as a side-impact the code will be simplified as
> well.  However, as Michael said the people who get the benefit of this
> option by disabling/enabling this parameter might complain.  Keeping
> it as a SIGHUP option has the drawback that even after the user has
> enabled it, there is a risk of torn pages.

I would just document the risks.  If the documentation says that you
can't rely on the value until after the next checkpoint, or whatever
the rule is, then I think we're fine.  I don't think that we really
have the infrastructure to do any better; if we try, we'll just end up
with odd warts.  Documenting the current set of warts is less churn
and less work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Problem while setting the fpw with SIGHUP

2018-04-18 Thread Amit Kapila
On Fri, Apr 13, 2018 at 10:36 PM, Robert Haas  wrote:
> On Thu, Apr 12, 2018 at 9:29 PM, Michael Paquier  wrote:
>> Still does it matter when the change is effective?
>
> I don't really care deeply about when the change takes effect, but I
> do care about whether the time when the system *says* the change took
> effect is the same as when it *actually* took effect.  If those aren't
> the same, it's confusing.
>

So, what in your opinion is the way to deal with this?  If we make it
a PGC_POSTMASTER parameter, it will have a very clear behavior and
users don't need to bother whether they have a risk of torn page
problem or not and as a side-impact the code will be simplified as
well.  However, as Michael said the people who get the benefit of this
option by disabling/enabling this parameter might complain.  Keeping
it as a SIGHUP option has the drawback that even after the user has
enabled it, there is a risk of torn pages.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-04-13 Thread Robert Haas
On Thu, Apr 12, 2018 at 9:29 PM, Michael Paquier  wrote:
> Still does it matter when the change is effective?

I don't really care deeply about when the change takes effect, but I
do care about whether the time when the system *says* the change took
effect is the same as when it *actually* took effect.  If those aren't
the same, it's confusing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Problem while setting the fpw with SIGHUP

2018-04-13 Thread Kyotaro HORIGUCHI
Sorry, the patch attached to the previous main is slightly
old. The attached is the correct one.

# They differ only in some phrase in a comment.


At Fri, 13 Apr 2018 17:28:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180413.172840.228724367.horiguchi.kyot...@lab.ntt.co.jp>
At Fri, 13 Apr 2018 13:47:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180413.134751.76149471.horiguchi.kyot...@lab.ntt.co.jp>
> At Fri, 13 Apr 2018 08:31:02 +0530, Amit Kapila  
> wrote in 

Re: Problem while setting the fpw with SIGHUP

2018-04-13 Thread Kyotaro HORIGUCHI
At Fri, 13 Apr 2018 13:47:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180413.134751.76149471.horiguchi.kyot...@lab.ntt.co.jp>
> At Fri, 13 Apr 2018 08:31:02 +0530, Amit Kapila  
> wrote in 

Re: Problem while setting the fpw with SIGHUP

2018-04-12 Thread Amit Kapila
On Fri, Apr 13, 2018 at 6:59 AM, Michael Paquier  wrote:
> On Thu, Apr 12, 2018 at 02:55:53PM -0400, Robert Haas wrote:
>> I think it may actually be confusing.  If you run pg_ctl reload and it
>> reports that the value has changed, you'll expect it to have taken
>> effect.  But really, it will take effect at some later time.
>

+1. I also think it is confusing and it could be difficult for end
users to know when the setting is effective.

> It is true that sometimes some people like to temporarily disable
> full_page_writes particularly when doing some bulk load of data to
> minimize the effort on WAL, and then re-enable it just after doing
> the inserting this data.
>
> Still does it matter when the change is effective?  By disabling
> full_page_writes even temporarily, you accept the fact that this
> instance would not be safe until the next checkpoint completes.  The
> instance even finishes by writing less unnecessary WAL data if the
> change is only effective at the next checkpoint.  Well, it is true that
> this increases potential torn pages problems but the user is already
> accepting that risk if a crash happens until the next checkpoint then it
> exposes itself to torn pages anyway as it chose to disable
> full_page_writes.
>

I think this means that is will be difficult for end users to predict
unless they track the next checkpoint which isn't too bad, but won't
be convenient either.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-04-12 Thread Michael Paquier
On Thu, Apr 12, 2018 at 02:55:53PM -0400, Robert Haas wrote:
> I think it may actually be confusing.  If you run pg_ctl reload and it
> reports that the value has changed, you'll expect it to have taken
> effect.  But really, it will take effect at some later time.

It is true that sometimes some people like to temporarily disable
full_page_writes particularly when doing some bulk load of data to
minimize the effort on WAL, and then re-enable it just after doing 
the inserting this data.

Still does it matter when the change is effective?  By disabling
full_page_writes even temporarily, you accept the fact that this
instance would not be safe until the next checkpoint completes.  The
instance even finishes by writing less unnecessary WAL data if the
change is only effective at the next checkpoint.  Well, it is true that
this increases potential torn pages problems but the user is already
accepting that risk if a crash happens until the next checkpoint then it
exposes itself to torn pages anyway as it chose to disable
full_page_writes.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-04-12 Thread Robert Haas
On Wed, Apr 11, 2018 at 7:09 AM, Heikki Linnakangas  wrote:
> I think the new behavior where the GUC only takes effect at next checkpoint
> is OK. It seems quite intuitive.

I think it may actually be confusing.  If you run pg_ctl reload and it
reports that the value has changed, you'll expect it to have taken
effect.  But really, it will take effect at some later time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Problem while setting the fpw with SIGHUP

2018-04-12 Thread Michael Paquier
On Thu, Apr 12, 2018 at 04:59:10PM +0900, Kyotaro HORIGUCHI wrote:
> At Thu, 12 Apr 2018 14:07:53 +0900, Michael Paquier  
> wrote in <20180412050753.ga19...@paquier.xyz>
>> I have been able to spend a couple of hours on your patch, wrapping my
>> mind on your stuff.  So what I had in mind was something like this type
>> of scenario:
> 
> Thank for the precise explanation.

Just to be clear and to avoid incorrect conclusion.  This is the type of
scenarios I imagined about when I read your previous email, concluding
such scenarios those cannot apply per the strong assumption on
SharedRecoveryInProgress your patch heavily relies on.  In short I have
no objections.

>> (The latest patch is a mix of two patches.)
> 
> Sorry I counld get this.

The patch called v2-0001-Change-FPW-handling.patch posted on
https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp,
which is the latest version available, is a mix of the patch you are
creating for this thread and of a patch aimed a fixing an issue with
partition range handling.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-04-12 Thread Kyotaro HORIGUCHI
Hello.

At Thu, 12 Apr 2018 14:07:53 +0900, Michael Paquier  wrote 
in <20180412050753.ga19...@paquier.xyz>
> On Thu, Apr 12, 2018 at 10:34:30AM +0900, Kyotaro HORIGUCHI wrote:
> > Checkpointer never calls CreateCheckPoint while
> > RecoveryInProgress() == true. In other words, checkpointer is not
> > an updator of shared FPW at the time StartupXLOG calls
> > CreateCheckPoint for fallback_promote.
> 
> I have been able to spend a couple of hours on your patch, wrapping my
> mind on your stuff.  So what I had in mind was something like this type
> of scenario:

Thank for the precise explanation.

The scenario that CreateCheckPoint is called simultaneously in
different prcoesses seems broken by itself in the first place but
I put that aside for now.

> 1) The startup process requires a restart point.
> 2) The checkpointer receives the request, and blocks before reading
> RecoveryInProgress().

RecoveryInProgress doesn't take lock. But I assume here that
checkpointer is taking a long time after entering
RecoveryInProgress and haven't actually read
SharedRecoveryInProgress.

> 3) A fallback_promote is triggered, making the startup process call
> CreateCheckpoint().

I'm confused here. It seems to me that StartupXLOG calls
CreateCheckPoint only in bootstrap or standalone cases. No
concurrent processe is running in the cases.

Even if CreateCheckPoint is called there in the IsUnderPostmater
case, checkpointer never calls CreateCheckPoint during the
checkpoint. This is safe in regard to shared FPW. (but checkpoint
is being blocked in this scenario.)

Assuming that RequestCheckpoint() is called here, the request is
merged with the previous request above and the function returns
after the checkpoint ends. (That is, checkpointer must continue
to run in this case.)

> 4) Startup process finishes checkpoint, updates Insert->fullPageWrites.

According to this scenario, checkpionter is still stalling
now. So it is not a concurrent update.

> 5) Checkpoint reads RecoveryInProgress to false, moves on with its
> checkpoint.

If checkpointer sees SharedRecoveryInProgress being false,
Create(or Request)CheckPoint called in (3) must have finished and
StartupXLOG() no longer updates shared FPW. There's no concurrent
update.

> > The comment may be somewhat confusing that it is written
> > there. The point is that checkpointer and StartupXLOG are
> > mutually excluded on updating shared FPW by
> > SharedRecoveryInProgress flag.
> 
> Indeed.  I can see that it is the main key point of the patch.
> 
> > | * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
> > | * the flag actually takes effect. Checkpointer never calls this function
> > | * before StartupXLOG() turns off SharedRecoveryInProgress so there's no
> > | * window where checkpointer and startup processes - the only updators of
> > | * the flag - can update shared FPW simultaneously. Thus no lock is
> > | * required here. Both shared and local fullPageWrites do not change
> > | * before the next reading below.
> 
> Yeah, this reduces the confusion.

Thanks^^;

> (The latest patch is a mix of two patches.)

Sorry I counld get this.

> +The default is on. The change of the parmeter 
> takes
> +effect at the next checkpoint time.
> s/parmeter/parameter/
> 
> By the way, I would vote for keeping track in WAL of full_page_writes
> switched from off to on.  This is not used in the backend, but that's
> still useful for debugging end-user issues.

Agreed and I tried that. The problem on that is that some records
can be written after REDO point before XLOG_FPW_CHANGE(true) is
written. However this is no problem for the FPW-related stuff to
work properly (since no one looks it), the FPW record suggests
that the current checkpoint loses FPI in the first several
records. This has a far larger impact with this patch because
shared FPW is always turned on just at REDO point.

So I choosed not to write XLOG_FPW_CHANGE(false) rather than
writing bogus records.

> Actually, I was wondering why a spin lock is not taken in
> RecoveryInProgress when reading SharedRecoveryInProgress, but that's
> from 1a3d1044 which added a memory barrier instead as guarantee...

Maybe it doesn't need barrier, since the flag is initialized as
true and becomes false just once and delay in reading by other
processes doesn't no harm. I think that bool doesn't suffer
atomicity. Even all these are true, some description is needed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problem while setting the fpw with SIGHUP

2018-04-11 Thread Michael Paquier
On Thu, Apr 12, 2018 at 10:34:30AM +0900, Kyotaro HORIGUCHI wrote:
> Checkpointer never calls CreateCheckPoint while
> RecoveryInProgress() == true. In other words, checkpointer is not
> an updator of shared FPW at the time StartupXLOG calls
> CreateCheckPoint for fallback_promote.

I have been able to spend a couple of hours on your patch, wrapping my
mind on your stuff.  So what I had in mind was something like this type
of scenario:
1) The startup process requires a restart point.
2) The checkpointer receives the request, and blocks before reading
RecoveryInProgress().
3) A fallback_promote is triggered, making the startup process call
CreateCheckpoint().
4) Startup process finishes checkpoint, updates Insert->fullPageWrites.
5) Checkpoint reads RecoveryInProgress to false, moves on with its
checkpoint.

> The comment may be somewhat confusing that it is written
> there. The point is that checkpointer and StartupXLOG are
> mutually excluded on updating shared FPW by
> SharedRecoveryInProgress flag.

Indeed.  I can see that it is the main key point of the patch.

> | * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
> | * the flag actually takes effect. Checkpointer never calls this function
> | * before StartupXLOG() turns off SharedRecoveryInProgress so there's no
> | * window where checkpointer and startup processes - the only updators of
> | * the flag - can update shared FPW simultaneously. Thus no lock is
> | * required here. Both shared and local fullPageWrites do not change
> | * before the next reading below.

Yeah, this reduces the confusion.

(The latest patch is a mix of two patches.)

+The default is on. The change of the parmeter takes
+effect at the next checkpoint time.
s/parmeter/parameter/

By the way, I would vote for keeping track in WAL of full_page_writes
switched from off to on.  This is not used in the backend, but that's
still useful for debugging end-user issues.

Actually, I was wondering why a spin lock is not taken in
RecoveryInProgress when reading SharedRecoveryInProgress, but that's
from 1a3d1044 which added a memory barrier instead as guarantee...
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-04-11 Thread Kyotaro HORIGUCHI
Hello. Thanks to Heikkit for picking this up and thanks for the
commnet to Michael.

# The attached is changed only in a comment, and rebased.

At Thu, 12 Apr 2018 05:24:14 +0900, Michael Paquier  wrote 
in <20180411202414.ga32...@paquier.xyz>
> On Wed, Apr 11, 2018 at 02:09:48PM +0300, Heikki Linnakangas wrote:
> > I think the new behavior where the GUC only takes effect at next checkpoint
> > is OK. It seems quite intuitive.
> > 
> > > [rebased patch version]
> > 
> > Looks good at a quick glance. Assuming no objections from others, I'll take
> > a closer look and commit tomorrow. Thanks!
> 
> Sorry for not following up closely this thread lately.
> 
> +   /*
> +* If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
> +* the flag actually takes effect. No lock is required since checkpointer
> +* is the only updator of shared fullPageWrites after recovery is
> +* finished. Both shared and local fullPageWrites do not change before the
> +* next reading below.
> +*/
> +   if (Insert->fullPageWrites && !fullPageWrites)
> +   {
> +   XLogBeginInsert();
> +   XLogRegisterData((char *) (), sizeof(bool));
> +   XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
> +   }
> 
> This is not actually true.  If a fallback_promote is used, then
> CreateCheckPoint() is called by the startup process which is in charge
> of issuing the end-of-recovery checkpoint, and not the checkpointer.  So
> I still fail to see how a no-lock approach is fine except if we remove
> fallback_promote?

Checkpointer never calls CreateCheckPoint while
RecoveryInProgress() == true. In other words, checkpointer is not
an updator of shared FPW at the time StartupXLOG calls
CreateCheckPoint for fallback_promote.

The comment may be somewhat confusing that it is written
there. The point is that checkpointer and StartupXLOG are
mutually excluded on updating shared FPW by
SharedRecoveryInProgress flag.

| * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
| * the flag actually takes effect. Checkpointer never calls this function
| * before StartupXLOG() turns off SharedRecoveryInProgress so there's no
| * window where checkpointer and startup processes - the only updators of
| * the flag - can update shared FPW simultaneously. Thus no lock is
| * required here. Both shared and local fullPageWrites do not change
| * before the next reading below.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From f6d4857356508fa16dc5d54b92d0177dbeaae3e2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Apr 2018 13:57:48 +0900
Subject: [PATCH] Change FPW handling

The GUC full_pages_writes currently has an effect immediately. That
makes a race condition between config reload on checkpointer and
StartupXLOG. But since full page images are meaningful only when they
are attached to all WAL records covers a checkpoint, there is no
problem if we update the shared FPW only at REDO point.  On the other
hand, online backup mechanism on standby requires to know if FPW is
turned off before the next checkpoint record comes.

As the result, with this patch, changing of full_page_writes takes
effect at REDO point and additional XLOG_FPW_CHANGE is written only
for turning-off. These are sufficient for standby-backup to work
properly, reduces complexity and prevent the race condition.
---
 doc/src/sgml/config.sgml   |   4 +-
 src/backend/access/transam/xlog.c  | 116 -
 src/backend/optimizer/util/clauses.c   |   4 +-
 src/backend/parser/gram.y  |  78 ++-
 src/backend/parser/parse_agg.c |  10 +++
 src/backend/parser/parse_expr.c|   5 ++
 src/backend/parser/parse_func.c|   3 +
 src/backend/parser/parse_utilcmd.c |  72 +++---
 src/backend/postmaster/checkpointer.c  |   6 --
 src/include/access/xlog.h  |   1 -
 src/include/catalog/pg_control.h   |   2 +-
 src/include/optimizer/clauses.h|   2 +
 src/include/parser/parse_node.h|   1 +
 src/include/utils/rel.h|   6 ++
 src/test/regress/expected/create_table.out |  36 +
 src/test/regress/sql/create_table.sql  |  21 +-
 16 files changed, 192 insertions(+), 175 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d5f2d23c4..7ea42c25e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2598,7 +2598,9 @@ include_dir 'conf.d'

 This parameter can only be set in the postgresql.conf
 file or on the server command line.
-The default is on.
+
+The default is on. The change of the parmeter takes
+effect at the next checkpoint time.

   
  
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4a47395174..e251cc108b 100644

Re: Problem while setting the fpw with SIGHUP

2018-04-11 Thread Michael Paquier
On Wed, Apr 11, 2018 at 02:09:48PM +0300, Heikki Linnakangas wrote:
> I think the new behavior where the GUC only takes effect at next checkpoint
> is OK. It seems quite intuitive.
> 
> > [rebased patch version]
> 
> Looks good at a quick glance. Assuming no objections from others, I'll take
> a closer look and commit tomorrow. Thanks!

Sorry for not following up closely this thread lately.

+   /*
+* If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
+* the flag actually takes effect. No lock is required since checkpointer
+* is the only updator of shared fullPageWrites after recovery is
+* finished. Both shared and local fullPageWrites do not change before the
+* next reading below.
+*/
+   if (Insert->fullPageWrites && !fullPageWrites)
+   {
+   XLogBeginInsert();
+   XLogRegisterData((char *) (), sizeof(bool));
+   XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
+   }

This is not actually true.  If a fallback_promote is used, then
CreateCheckPoint() is called by the startup process which is in charge
of issuing the end-of-recovery checkpoint, and not the checkpointer.  So
I still fail to see how a no-lock approach is fine except if we remove
fallback_promote?
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-04-09 Thread Kyotaro HORIGUCHI

At Fri, 6 Apr 2018 17:59:58 +0530, Amit Kapila  wrote 
in 
> On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello.
> >
> > At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
> >  wrote in 
> > <20180404.172646.238325981.horiguchi.kyot...@lab.ntt.co.jp>
> >> > In general, I was wondering why in the first place this variable
> >> > (full_page_writes) is a SIGHUP variable?  I think if the user tries to
> >> > switch it to 'on' from 'off', it won't guarantee the recovery from
> >> > torn pages.  Yeah, one can turn it to 'off' from 'on' without any
> >> > problem, but as the reverse doesn't guarantee anything, it can confuse
> >> > users. What do you think?
> >>
> >> I tend to agree with you. It works as expected after the next
> >> checkpoint. So define the variable as "it can be changed any time
> >> but has an effect at the next checkpoint time", then remove
> >> XLOG_FPW_CHANGE record. And that eliminates the problem of
> >> concurrent calls since the checkpointer becomes the only modifier
> >> of the variable. And the problematic function
> >> UpdateFullPageWrites also no longer needs to write a WAL
> >> record. The information is conveyed only by checkpoint records.
> >
> > I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
> > pg_start/stop_backup to know FPW's turning-off without waiting
> > for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
> > required since no one uses the information. It seems even harmful
> > when it is written at the incorrect place.
> >
> > In the attached patch, shared fullPageWrites is updated only at
> > REDO point
> >
> 
> I am not completely sure if that is the right option because this
> would mean that we are defining the new scope for a GUC variable.  I
> guess we should take the input of others as well.  I am not sure what
> is the right way to do that, but maybe we can start a new thread with
> a proper subject and description rather than discussing this under
> some related bug fix patch discussion.  I guess we can try that after
> CF unless some other people pitch in and share their feedback.

I'd like to refrain from making a new thread since this topic is
registered as an open item (in Old Bugs section). Or making a new
thread then relinking it from the page is preferable?

I'm surprised a bit that this got confilcted so soon. On the way
rebasing, for anyone's information, I considered comment and
documentation fix but all comments and documentation can be read
as both old and new behavior. That being said, the patch contains
a small addtion about the new behavior.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From fe0401335e0ac1a9ae36dbdb5c2db82f9081a1a3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Apr 2018 13:57:48 +0900
Subject: [PATCH] Change FPW handling

The GUC full_pages_writes currently has an effect immediately. That
makes a race condition between config reload on checkpointer and
StartupXLOG. But since full page images are meaningful only when they
are attached to all WAL records covers a checkpoint, there is no
problem if we update the shared FPW only at REDO point.  On the other
hand, online backup mechanism on standby requires to know if FPW is
turned off before the next checkpoint record comes.

As the result, with this patch, changing of full_page_writes takes
effect at REDO point and additional XLOG_FPW_CHANGE is written only
for turning-off. These are sufficient for standby-backup to work
properly, reduces complexity and prevent the race condition.
---
 doc/src/sgml/config.sgml  |   4 +-
 src/backend/access/transam/xlog.c | 114 +-
 src/backend/postmaster/checkpointer.c |   6 --
 src/include/access/xlog.h |   1 -
 src/include/catalog/pg_control.h  |   2 +-
 5 files changed, 34 insertions(+), 93 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d5f2d23c4..7ea42c25e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2598,7 +2598,9 @@ include_dir 'conf.d'

 This parameter can only be set in the postgresql.conf
 file or on the server command line.
-The default is on.
+
+The default is on. The change of the parmeter takes
+effect at the next checkpoint time.

   
  
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4a47395174..ba9cf07d38 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -202,14 +202,6 @@ static XLogRecPtr LastRec;
 static XLogRecPtr receivedUpto = 0;
 static TimeLineID receiveTLI = 0;
 
-/*
- * During recovery, lastFullPageWrites keeps track of full_page_writes that
- * the 

Re: Problem while setting the fpw with SIGHUP

2018-04-06 Thread Amit Kapila
On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI
 wrote:
> Hello.
>
> At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20180404.172646.238325981.horiguchi.kyot...@lab.ntt.co.jp>
>> > In general, I was wondering why in the first place this variable
>> > (full_page_writes) is a SIGHUP variable?  I think if the user tries to
>> > switch it to 'on' from 'off', it won't guarantee the recovery from
>> > torn pages.  Yeah, one can turn it to 'off' from 'on' without any
>> > problem, but as the reverse doesn't guarantee anything, it can confuse
>> > users. What do you think?
>>
>> I tend to agree with you. It works as expected after the next
>> checkpoint. So define the variable as "it can be changed any time
>> but has an effect at the next checkpoint time", then remove
>> XLOG_FPW_CHANGE record. And that eliminates the problem of
>> concurrent calls since the checkpointer becomes the only modifier
>> of the variable. And the problematic function
>> UpdateFullPageWrites also no longer needs to write a WAL
>> record. The information is conveyed only by checkpoint records.
>
> I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
> pg_start/stop_backup to know FPW's turning-off without waiting
> for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
> required since no one uses the information. It seems even harmful
> when it is written at the incorrect place.
>
> In the attached patch, shared fullPageWrites is updated only at
> REDO point
>

I am not completely sure if that is the right option because this
would mean that we are defining the new scope for a GUC variable.  I
guess we should take the input of others as well.  I am not sure what
is the right way to do that, but maybe we can start a new thread with
a proper subject and description rather than discussing this under
some related bug fix patch discussion.  I guess we can try that after
CF unless some other people pitch in and share their feedback.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-04-06 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180404.172646.238325981.horiguchi.kyot...@lab.ntt.co.jp>
> > In general, I was wondering why in the first place this variable
> > (full_page_writes) is a SIGHUP variable?  I think if the user tries to
> > switch it to 'on' from 'off', it won't guarantee the recovery from
> > torn pages.  Yeah, one can turn it to 'off' from 'on' without any
> > problem, but as the reverse doesn't guarantee anything, it can confuse
> > users. What do you think?
> 
> I tend to agree with you. It works as expected after the next
> checkpoint. So define the variable as "it can be changed any time
> but has an effect at the next checkpoint time", then remove
> XLOG_FPW_CHANGE record. And that eliminates the problem of
> concurrent calls since the checkpointer becomes the only modifier
> of the variable. And the problematic function
> UpdateFullPageWrites also no longer needs to write a WAL
> record. The information is conveyed only by checkpoint records.

I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
pg_start/stop_backup to know FPW's turning-off without waiting
for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
required since no one uses the information. It seems even harmful
when it is written at the incorrect place.

In the attached patch, shared fullPageWrites is updated only at
REDO point and XLOG_FPW_CHANGE is written only for FPW's turning
off before REDO point.

Disabling FPW at any time makes sense when we need to slow down
the speed of WAL urgently, but...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 10f6865cf1e96e8d883ed7e03e1fafb6e73ec935 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Apr 2018 13:57:48 +0900
Subject: [PATCH] Change FPW handling

The GUC full_pages_writes currently has an effect immediately. That
makes a race condition between config reload on checkpointer and
StartupXLOG. But since full page images are meaningful only when they
are attached to all WAL records covers a checkpoint, there is no
problem if we update the shared FPW only at REDO point.  On the other
hand, online backup mechanism on standby requires to know if FPW is
turned off before the next checkpoint record comes.

As the result, with this patch, changing of full_page_writes takes
effect at REDO point and additional XLOG_FPW_CHANGE is written only
for turning-off. These are sufficient for standby-backup to work
properly, reduces complexity and prevent the race condition.
---
 src/backend/access/transam/xlog.c | 114 +-
 src/backend/postmaster/checkpointer.c |   6 --
 src/include/access/xlog.h |   1 -
 src/include/catalog/pg_control.h  |   2 +-
 4 files changed, 31 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..0a81650c26 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -202,14 +202,6 @@ static XLogRecPtr LastRec;
 static XLogRecPtr receivedUpto = 0;
 static TimeLineID receiveTLI = 0;
 
-/*
- * During recovery, lastFullPageWrites keeps track of full_page_writes that
- * the replayed WAL records indicate. It's initialized with full_page_writes
- * that the recovery starting checkpoint record indicates, and then updated
- * each time XLOG_FPW_CHANGE record is replayed.
- */
-static bool lastFullPageWrites;
-
 /*
  * Local copy of SharedRecoveryInProgress variable. True actually means "not
  * known, need to check the shared state".
@@ -6776,11 +6768,7 @@ StartupXLOG(void)
 	 */
 	restoreTwoPhaseData();
 
-	lastFullPageWrites = checkPoint.fullPageWrites;
-
 	RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
-	doPageWrites = lastFullPageWrites;
-
 	if (RecPtr < checkPoint.redo)
 		ereport(PANIC,
 (errmsg("invalid redo in checkpoint record")));
@@ -7575,16 +7563,6 @@ StartupXLOG(void)
 	/* Pre-scan prepared transactions to find out the range of XIDs present */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
 
-	/*
-	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
-	 * record before resource manager writes cleanup WAL records or checkpoint
-	 * record is written.
-	 */
-	Insert->fullPageWrites = lastFullPageWrites;
-	LocalSetXLogInsertAllowed();
-	UpdateFullPageWrites();
-	LocalXLogInsertAllowed = -1;
-
 	if (InRecovery)
 	{
 		/*
@@ -7808,6 +7786,13 @@ StartupXLOG(void)
 	ControlFile->state = DB_IN_PRODUCTION;
 	ControlFile->time = (pg_time_t) time(NULL);
 
+	/*
+	 * Set the initial value of shared fullPageWrites. Once
+	 * SharedRecoveryInProgress is turned false, checkpointer will update this
+	 * value.
+	 */
+	XLogCtl->Insert.fullPageWrites = checkPoint.fullPageWrites;
+
 	SpinLockAcquire(>info_lck);
 	XLogCtl->SharedRecoveryInProgress = false;
 	

Re: Problem while setting the fpw with SIGHUP

2018-04-04 Thread Kyotaro HORIGUCHI
At Sat, 31 Mar 2018 17:43:58 +0530, Amit Kapila  wrote 
in 
> On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI
>  wrote:
> > At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier  
> > wrote in <20180327130226.ga1...@paquier.xyz>
> >> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
> >> > The current UpdateFullPageWrites is safe on standby and promotion
> >> > so what we should consider is only the non-standby case. I think
> >> > what we should do is just calling RecoveryInProgress() at the
> >> > beginning of CheckPointerMain, which is just the same thing with
> >> > InitPostgres, but before setting up signal handler to avoid
> >> > processing SIGHUP before being ready to insert xlog.
> >>
> >> Your proposal does not fix the issue for a checkpointer process started
> >> on a standby.  After a promotion, if SIGHUP is issued with a change in
> >> full_page_writes, then the initialization of InitXLogInsert() would
> >> happen again in the critical section of UpdateFullPageWrites().  The
> >> window is rather small for normal promotions as the startup process
> >> requests a checkpoint which would do the initialization, and much larger
> >> for fallback_promote where the startup process is in charge of doing the
> >> end-of-recovery checkpoint.
> >
> > Yeah. I realized that after sending the mail.
> >
> > I looked closer and found several problems there.
> >
> > - On standby, StartupXLOG calls UpdateFullPageWrites and
> >   checkpointer can call the same function simultaneously, but it
> >   doesn't assume concurrent call.
> >
> > - StartupXLOG can make a concurrent write to
> >   Insert->fullPageWrite so it needs to be locked.
> >
> > - At the time of the very end of recovery, the startup process
> >   ignores possible change of full_page_writes GUC. It sticks with
> >   the startup value. It leads to loss of
> >   XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
> >   reload)
> >
> 
> Won't this be covered by checkpointer process?  Basically, the next
> time checkpointer sees that it has received the sighup, it will call
> UpdateFullPageWrites which will log the record if required.

Right. Checkpointer is doing the right thing, but without
writing XLOG_FPW_CHANGE records during recovery.

The problem is in StartupXLOG. It doesn't read shared FPW flag
during recovery and updates local flag according to WAL
records. Then it tries to issue XLOG_FPW_CHANGE if its local
status and shared flag are different. This is correct.

But after that, checkpointer still cannot write XLOG (before
SharedRecovoeryInProgress becomes false) but checkpointer can
change shared fullPagesWrites without writing WAL and the WAL
record is eventually lost.

> In general, I was wondering why in the first place this variable
> (full_page_writes) is a SIGHUP variable?  I think if the user tries to
> switch it to 'on' from 'off', it won't guarantee the recovery from
> torn pages.  Yeah, one can turn it to 'off' from 'on' without any
> problem, but as the reverse doesn't guarantee anything, it can confuse
> users. What do you think?

I tend to agree with you. It works as expected after the next
checkpoint. So define the variable as "it can be changed any time
but has an effect at the next checkpoint time", then remove
XLOG_FPW_CHANGE record. And that eliminates the problem of
concurrent calls since the checkpointer becomes the only modifier
of the variable. And the problematic function
UpdateFullPageWrites also no longer needs to write a WAL
record. The information is conveyed only by checkpoint records.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problem while setting the fpw with SIGHUP

2018-03-31 Thread Amit Kapila
On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI
 wrote:
> At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier  
> wrote in <20180327130226.ga1...@paquier.xyz>
>> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
>> > The current UpdateFullPageWrites is safe on standby and promotion
>> > so what we should consider is only the non-standby case. I think
>> > what we should do is just calling RecoveryInProgress() at the
>> > beginning of CheckPointerMain, which is just the same thing with
>> > InitPostgres, but before setting up signal handler to avoid
>> > processing SIGHUP before being ready to insert xlog.
>>
>> Your proposal does not fix the issue for a checkpointer process started
>> on a standby.  After a promotion, if SIGHUP is issued with a change in
>> full_page_writes, then the initialization of InitXLogInsert() would
>> happen again in the critical section of UpdateFullPageWrites().  The
>> window is rather small for normal promotions as the startup process
>> requests a checkpoint which would do the initialization, and much larger
>> for fallback_promote where the startup process is in charge of doing the
>> end-of-recovery checkpoint.
>
> Yeah. I realized that after sending the mail.
>
> I looked closer and found several problems there.
>
> - On standby, StartupXLOG calls UpdateFullPageWrites and
>   checkpointer can call the same function simultaneously, but it
>   doesn't assume concurrent call.
>
> - StartupXLOG can make a concurrent write to
>   Insert->fullPageWrite so it needs to be locked.
>
> - At the time of the very end of recovery, the startup process
>   ignores possible change of full_page_writes GUC. It sticks with
>   the startup value. It leads to loss of
>   XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
>   reload)
>

Won't this be covered by checkpointer process?  Basically, the next
time checkpointer sees that it has received the sighup, it will call
UpdateFullPageWrites which will log the record if required.

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable?  I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages.  Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-03-28 Thread Kyotaro HORIGUCHI
At Wed, 28 Mar 2018 15:59:48 +0900, Michael Paquier  wrote 
in <20180328065948.gm1...@paquier.xyz>
> On Wed, Mar 28, 2018 at 03:40:59PM +0900, Kyotaro HORIGUCHI wrote:
> > The attached does that. I don't like that it uses ControlFileLock
> > to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
> > WALInsertLock cannot be used since UpdateFullPageWrites may take
> > the same lock.
> 
> You visibly forgot your patch.

Mmm, someone must have eaten that. I'm sure it is attached this
time.

I don't like UpdateFullPageWrites is using ControlFileLock to
exclusion..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cb9c2a29cb..d2bb5e16ac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7558,9 +7558,11 @@ StartupXLOG(void)
 	/*
 	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
 	 * record before resource manager writes cleanup WAL records or checkpoint
-	 * record is written.
+	 * record is written, ignoreing the change of full_page_write GUC so far.
 	 */
+	WALInsertLockAcquireExclusive();
 	Insert->fullPageWrites = lastFullPageWrites;
+	WALInsertLockRelease();
 	LocalSetXLogInsertAllowed();
 	UpdateFullPageWrites();
 	LocalXLogInsertAllowed = -1;
@@ -7783,6 +7785,9 @@ StartupXLOG(void)
 	 * itself, we use the info_lck here to ensure that there are no race
 	 * conditions concerning visibility of other recent updates to shared
 	 * memory.
+	 *
+	 * ControlFileLock excludes concurrent update of shared fullPageWrites in
+	 * addition to its ordinary use.
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 	ControlFile->state = DB_IN_PRODUCTION;
@@ -7790,11 +7795,25 @@ StartupXLOG(void)
 
 	SpinLockAcquire(>info_lck);
 	XLogCtl->SharedRecoveryInProgress = false;
+	lastFullPageWrites = Insert->fullPageWrites;
 	SpinLockRelease(>info_lck);
 
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
 
+	/*
+	 * Shared fullPageWrites at the end of recovery differs to our last known
+	 * value. The change has been made by checkpointer but we haven't made
+	 * corresponding XLOG_FPW_CHANGE record. We reread GUC change if any and
+	 * try update shared fullPageWrites by myself. It ends with doing nothing
+	 * if checkpointer already did the same thing.
+	 */
+	if (lastFullPageWrites != fullPageWrites)
+	{
+		HandleStartupProcInterrupts();
+		UpdateFullPageWrites();
+	}
+
 	/*
 	 * If there were cascading standby servers connected to us, nudge any wal
 	 * sender processes to notice that we've been promoted.
@@ -9525,8 +9544,10 @@ XLogReportParameters(void)
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
- * Note: this function assumes there is no other process running
- * concurrently that could update it.
+ * This function assumes called concurrently from multiple processes and
+ * called concurrently with changing of shared fullPageWrites. See
+ * StartupXLOG().
+ * 
  */
 void
 UpdateFullPageWrites(void)
@@ -9536,13 +9557,25 @@ UpdateFullPageWrites(void)
 	/*
 	 * Do nothing if full_page_writes has not been changed.
 	 *
-	 * It's safe to check the shared full_page_writes without the lock,
-	 * because we assume that there is no concurrently running process which
-	 * can update it.
+	 * We acquire ControlFileLock to exclude other concurrent call and change
+	 * of shared fullPageWrites.
 	 */
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	WALInsertLockAcquireExclusive();
 	if (fullPageWrites == Insert->fullPageWrites)
+	{
+		WALInsertLockRelease();
+		LWLockRelease(ControlFileLock);
 		return;
+	}
+	WALInsertLockRelease();
 
+	/*
+	 * Need to set up XLogInsert working area before entering critical section
+	 * if we are not in recovery mode.
+	 */
+	(void) RecoveryInProgress();
+		
 	START_CRIT_SECTION();
 
 	/*
@@ -9578,6 +9611,8 @@ UpdateFullPageWrites(void)
 		WALInsertLockRelease();
 	}
 	END_CRIT_SECTION();
+
+	LWLockRelease(ControlFileLock);
 }
 
 /*


Re: Problem while setting the fpw with SIGHUP

2018-03-28 Thread Michael Paquier
On Wed, Mar 28, 2018 at 03:40:59PM +0900, Kyotaro HORIGUCHI wrote:
> The attached does that. I don't like that it uses ControlFileLock
> to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
> WALInsertLock cannot be used since UpdateFullPageWrites may take
> the same lock.

You visibly forgot your patch.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-03-28 Thread Kyotaro HORIGUCHI
At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier  wrote 
in <20180327130226.ga1...@paquier.xyz>
> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
> > The current UpdateFullPageWrites is safe on standby and promotion
> > so what we should consider is only the non-standby case. I think
> > what we should do is just calling RecoveryInProgress() at the
> > beginning of CheckPointerMain, which is just the same thing with
> > InitPostgres, but before setting up signal handler to avoid
> > processing SIGHUP before being ready to insert xlog.
> 
> Your proposal does not fix the issue for a checkpointer process started
> on a standby.  After a promotion, if SIGHUP is issued with a change in
> full_page_writes, then the initialization of InitXLogInsert() would
> happen again in the critical section of UpdateFullPageWrites().  The
> window is rather small for normal promotions as the startup process
> requests a checkpoint which would do the initialization, and much larger
> for fallback_promote where the startup process is in charge of doing the
> end-of-recovery checkpoint.

Yeah. I realized that after sending the mail.

I looked closer and found several problems there.

- On standby, StartupXLOG calls UpdateFullPageWrites and
  checkpointer can call the same function simultaneously, but it
  doesn't assume concurrent call.

- StartupXLOG can make a concurrent write to
  Insert->fullPageWrite so it needs to be locked.

- At the time of the very end of recovery, the startup process
  ignores possible change of full_page_writes GUC. It sticks with
  the startup value. It leads to loss of
  XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
  reload)

- If checkpointer calls UpdateFullPageWrites concurrently with
  change of SharedRecoveryInProgress to false in StartupXLOG the
  change may lose corresponding XLOG_CHANGE_FPW.

So, if we don't accept the current behavior, what I think we
should do are all of the follows.

A. In StartupXLOG, protect write to Insert->fullPageWrites with
  wal insert exlusive lock. Do the same thing for read in
  UpdateFullPageWrites.

B. Surround the whole UpdateFullPageWrites with any kind of lock
  to exclude concurrent calls. The attached uses ControlFileLock.
  This also exludes the function with chaging of
  SharedRecoveryInProgress to avoid loss of XLOG_CHANGE_FPW.

C. After exiting recovery mode, call UpdateFullPageWrites from
  StartupXLOG if shared fullPageWrites is found changed from the
  last known value. If checkponiter did the same thing at the
  same time, one of them completes the work.

D. Call RecoveryInProgress to set up xlog working area.

The attached does that. I don't like that it uses ControlFileLock
to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
WALInsertLock cannot be used since UpdateFullPageWrites may take
the same lock.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problem while setting the fpw with SIGHUP

2018-03-27 Thread Michael Paquier
On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
> The current UpdateFullPageWrites is safe on standby and promotion
> so what we should consider is only the non-standby case. I think
> what we should do is just calling RecoveryInProgress() at the
> beginning of CheckPointerMain, which is just the same thing with
> InitPostgres, but before setting up signal handler to avoid
> processing SIGHUP before being ready to insert xlog.

Your proposal does not fix the issue for a checkpointer process started
on a standby.  After a promotion, if SIGHUP is issued with a change in
full_page_writes, then the initialization of InitXLogInsert() would
happen again in the critical section of UpdateFullPageWrites().  The
window is rather small for normal promotions as the startup process
requests a checkpoint which would do the initialization, and much larger
for fallback_promote where the startup process is in charge of doing the
end-of-recovery checkpoint.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-03-27 Thread Kyotaro HORIGUCHI
At Tue, 27 Mar 2018 16:46:30 +0900, Michael Paquier  wrote 
in <20180327074630.gd9...@paquier.xyz>
> I have finally been able to spend more time on this issue, and checked
> for a couple of hours all the calls to RecoveryInProgress() that could
> be triggered within a critical section to see if the move I suggested
> previously is worth it ot not as this would cost some memory for
> standbys all the time, which would suck for many read-only sessions.
> 
> There are a couple of calls potentially happening in critical sections,
> however except for the one in UpdateFullPageWrites() those are used for
> sanity  checks in code paths that should never trigger it, take
> XLogInsertBegin() for example.  So with assertions this would trigger
> a failure before the real elog(ERROR) message shows up.
> 
> Hence, I am changing opinion still I think that instead of the patch you
> proposed first we could just do a call to InitXLogInsert() before
> entering the critical section.  This is also more consistent with what
> CreateCheckpoint() does.  That's also less risky for a backpatch as your
> patch increases the window between the beginning of the critical section
> and the real moment where the check for RecoveryInProgress is needed.  A
> comment explaining why the initialization needs to happen is also
> essential.
> 
> All in all, this would give the simple patch attached.
> 
> Thoughts?

At the first look I was uneasy that the patch initializes xlog
working area earlier than required.

The current UpdateFullPageWrites is safe on standby and promotion
so what we should consider is only the non-standby case. I think
what we should do is just calling RecoveryInProgress() at the
beginning of CheckPointerMain, which is just the same thing with
InitPostgres, but before setting up signal handler to avoid
processing SIGHUP before being ready to insert xlog.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4b452e7cee..c446e81299 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -197,6 +197,13 @@ CheckpointerMain(void)
 
 	CheckpointerShmem->checkpointer_pid = MyProcPid;
 
+	/*
+	 * We need to call InitXLOGAccess(), if the system isn't in hot-standby
+	 * mode.  This is handled by calling RecoveryInProgress and ignoring the
+	 * result. This needs to be done before SIGHUP handler is set up.
+	 */
+	(void) RecoveryInProgress();
+
 	/*
 	 * Properly accept or ignore signals the postmaster might send us
 	 *


Re: Problem while setting the fpw with SIGHUP

2018-03-27 Thread Michael Paquier
On Mon, Mar 26, 2018 at 02:32:22PM +0530, Dilip Kumar wrote:
> On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier 
> wrote:
> In StartupXLOG, just before the CreateCheckPoint() call,  we are calling
> LocalSetXLogInsertAllowed().  So, I am thinking can we just remove
> InitXLogInsert from CreateCheckpoint. And, we don't need to move it to
> bootstrap.c.  Or am I missing something?

I have finally been able to spend more time on this issue, and checked
for a couple of hours all the calls to RecoveryInProgress() that could
be triggered within a critical section to see if the move I suggested
previously is worth it ot not as this would cost some memory for
standbys all the time, which would suck for many read-only sessions.

There are a couple of calls potentially happening in critical sections,
however except for the one in UpdateFullPageWrites() those are used for
sanity  checks in code paths that should never trigger it, take
XLogInsertBegin() for example.  So with assertions this would trigger
a failure before the real elog(ERROR) message shows up.

Hence, I am changing opinion still I think that instead of the patch you
proposed first we could just do a call to InitXLogInsert() before
entering the critical section.  This is also more consistent with what
CreateCheckpoint() does.  That's also less risky for a backpatch as your
patch increases the window between the beginning of the critical section
and the real moment where the check for RecoveryInProgress is needed.  A
comment explaining why the initialization needs to happen is also
essential.

All in all, this would give the simple patch attached.

Thoughts?
--
Michael
From 0fc3878375890aef5c0a6dc57d73c30aa97c88df Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 27 Mar 2018 16:38:37 +0900
Subject: [PATCH] Fix WAL insert when updating full_page_writes for
 checkpointer

At startup or when receiving a SIGHUP signal to update its configuration,
the checkpointer may need to update the shared memory for
full_page_writes and needs to write a WAL record about it.  However,
just after a promotion, there is a small window where it is possible
that the memory context in charge of building WAL records has not been
correctly initialized, leading to a crash of the checkpointer.  With
assertions enabled, this causes a failure when allocating memory in a
critical section.  And this fails when trying to build the FPW record
without assertions.

Report and patch by Dilip Kumar, reviewed by Michael Paquier.

Discussion:
https://postgr.es/m/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com
---
 src/backend/access/transam/xlog.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cb9c2a29cb..aae1131977 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9543,6 +9543,16 @@ UpdateFullPageWrites(void)
 	if (fullPageWrites == Insert->fullPageWrites)
 		return;
 
+	/*
+	 * Initialize InitXLogInsert working areas before entering the critical
+	 * section.  Normally, this is done by the first call to
+	 * RecoveryInProgress() or LocalSetXLogInsertAllowed(), but it can
+	 * be possible that this has not been initialized yet for a checkpointer
+	 * which updates the value of full_page_writes for a received SIGHUP or
+	 * at process startup.
+	 */
+	InitXLogInsert();
+
 	START_CRIT_SECTION();
 
 	/*
-- 
2.16.3



signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-03-26 Thread Dilip Kumar
On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier 
wrote:

> On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote:
> > Yeah, you are right.  Fixed.
>
> So I have been spending a couple of hours playing with your patch, and
> tested various configurations manually, like switch the fpw switch to on
> and off while running in parallel pgbench.  I have also tested
> promotions, etc.
>
> While the patch does its job, it is possible to simplify a bit more the
> calls to InitXLogInsert().  Particularly, the one in CreateCheckpoint()
> is basically useless for the checkpointer, still it is useful for the
> startup process when trigerring an end-in-recovery checkpoint.  So one
> additional cleanup would be to move the call in CreateCheckpoint() to
> bootstrap.c for the startup process.


In StarupXLOG, just before the CreateCheckPoint() call,  we are calling
LocalSetXLogInsertAllowed().  So, I am thinking can we just remove
InitXLogInsert
from CreateCheckpoint. And, we don't need to move it to bootstrap.c.  Or am
I missing something?


> In order to test that, please make
> sure to create fallback_promote at the root of the data folder before
> sending SIGUSR2 to the startup process which would trigger the pre-9.3
> promotion where the end-of-recovery checkpoint is triggered by the
> startup process itself.
>
> +   /* Initialize the working areas for constructing WAL records. */
> +   InitXLogInsert();
> Instead of having the same comment for each process calling
> InitXLogInsert() multiple times, I think that it would be better to
> complete the comment in bootstrap.c where is written "XLOG operations".
>
> Here is a suggestion:
> /*
>  * Initialize WAL-related operations and enter the main loop of each
>  * process.  InitXLogInsert is called for each process which can
>  * generate WAL.  While this is wasteful for processes started on
>  * a standby, this gives the guarantee that initialization of WAL
>  * insertion areas is able to happen in a consistent way and out of
>  * any critical sections so as the facility is usable when a promotion
>  * is triggered.
>  */
>
> What do you think?
>

Looks good to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Problem while setting the fpw with SIGHUP

2018-03-23 Thread Michael Paquier
On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote:
> Yeah, you are right.  Fixed.

So I have been spending a couple of hours playing with your patch, and
tested various configurations manually, like switch the fpw switch to on
and off while running in parallel pgbench.  I have also tested
promotions, etc.

While the patch does its job, it is possible to simplify a bit more the
calls to InitXLogInsert().  Particularly, the one in CreateCheckpoint()
is basically useless for the checkpointer, still it is useful for the
startup process when trigerring an end-in-recovery checkpoint.  So one
additional cleanup would be to move the call in CreateCheckpoint() to
bootstrap.c for the startup process.  In order to test that, please make
sure to create fallback_promote at the root of the data folder before
sending SIGUSR2 to the startup process which would trigger the pre-9.3
promotion where the end-of-recovery checkpoint is triggered by the
startup process itself.

+   /* Initialize the working areas for constructing WAL records. */
+   InitXLogInsert();
Instead of having the same comment for each process calling
InitXLogInsert() multiple times, I think that it would be better to
complete the comment in bootstrap.c where is written "XLOG operations".

Here is a suggestion:
/*
 * Initialize WAL-related operations and enter the main loop of each
 * process.  InitXLogInsert is called for each process which can
 * generate WAL.  While this is wasteful for processes started on
 * a standby, this gives the guarantee that initialization of WAL
 * insertion areas is able to happen in a consistent way and out of
 * any critical sections so as the facility is usable when a promotion
 * is triggered.
 */

What do you think?
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-03-20 Thread Dilip Kumar
On Tue, Mar 20, 2018 at 11:26 AM, Michael Paquier 
wrote:

> On Tue, Mar 20, 2018 at 10:43:55AM +0530, Dilip Kumar wrote:
> > I think like WALWriterProcess, we need to call InitXLogInsert for the
> > CheckpointerProcess as well as for the BgWriterProcess
> > because earlier they were calling InitXLogInsert while check
> > RecoveryInProgress before inserting the WAL.
>
> /* don't set signals, bgwriter has its own agenda */
> +   InitXLOGAccess();
> +   InitXLogInsert()
>
> This is wrong, as the checkpointer is started as well on standbys, and
> that InitXLOGAccess initializes things for WAL generation like
> ThisTimeLineID.  So you should just call InitXLogInsert(), and a comment
> would be welcome for both the bgwriter and the checkpointer.
>

Yeah, you are right.  Fixed.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


xlog-insert-critical-v3.patch
Description: Binary data


Re: Problem while setting the fpw with SIGHUP

2018-03-19 Thread Michael Paquier
On Tue, Mar 20, 2018 at 10:43:55AM +0530, Dilip Kumar wrote:
> I think like WALWriterProcess, we need to call InitXLogInsert for the
> CheckpointerProcess as well as for the BgWriterProcess
> because earlier they were calling InitXLogInsert while check
> RecoveryInProgress before inserting the WAL.

/* don't set signals, bgwriter has its own agenda */
+   InitXLOGAccess();
+   InitXLogInsert()

This is wrong, as the checkpointer is started as well on standbys, and
that InitXLOGAccess initializes things for WAL generation like
ThisTimeLineID.  So you should just call InitXLogInsert(), and a comment
would be welcome for both the bgwriter and the checkpointer.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-03-19 Thread Dilip Kumar
On Fri, Mar 16, 2018 at 10:53 AM, Michael Paquier 
wrote:

> On Tue, Mar 13, 2018 at 05:04:04PM +0900, Michael Paquier wrote:
> > Instead of doing what you are suggesting, why not moving
> > InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
> > the allocations for WAL inserts is done unconditionally?  This has
> > the cost of also making this allocation even for backends which are
> > started during recovery, still we are talking about allocating a couple
> > of bytes in exchange of addressing completely all race conditions in
> > this area.  InitXLogInsert() does not depend on any post-recovery data
> > like ThisTimeLineId, so a split is possible.
>
> I have been hacking things this way, and it seems to me that it takes
> care of all this class of problems.  CreateCheckPoint() actually
> mentions that InitXLogInsert() cannot be called in a critical section,
> so the comments don't match the code.  I also think that we still want
> to be able to use RecoveryInProgress() in critical sections to do
> decision-making for the generation of WAL records
>

Thanks for the patch, the idea looks good to me.  Please find some comments
and updated patch.

I think like WALWriterProcess, we need to call InitXLogInsert for the
CheckpointerProcess as well as for the BgWriterProcess
because earlier they were calling InitXLogInsert while check
RecoveryInProgress before inserting the WAL.

see below crash:
#0  0x7f89273a65d7 in raise () from /lib64/libc.so.6
#1  0x7f89273a7cc8 in abort () from /lib64/libc.so.6
#2  0x009fd24e in errfinish (dummy=0) at elog.c:556
#3  0x009ff70b in elog_finish (elevel=20, fmt=0xac0d1d "too much
WAL data") at elog.c:1378
#4  0x00558766 in XLogRegisterData (data=0xf3efac 
"\001", len=1) at xloginsert.c:330
#5  0x0055080e in UpdateFullPageWrites () at xlog.c:9569
#6  0x007ea831 in UpdateSharedMemoryConfig () at checkpointer.c:1360
#7  0x007e95b1 in CheckpointerMain () at checkpointer.c:370
#8  0x00561680 in AuxiliaryProcessMain (argc=2,
argv=0x7fffcfd4bec0) at bootstrap.c:447

I have modified you patch and called InitXLogInsert for CheckpointerProcess
and BgWriterProcess also. After that the
issue is solved and fpw is getting set properly.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


xlog-insert-critical-v2.patch
Description: Binary data


Re: Problem while setting the fpw with SIGHUP

2018-03-15 Thread Michael Paquier
On Tue, Mar 13, 2018 at 05:04:04PM +0900, Michael Paquier wrote:
> Instead of doing what you are suggesting, why not moving
> InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
> the allocations for WAL inserts is done unconditionally?  This has
> the cost of also making this allocation even for backends which are
> started during recovery, still we are talking about allocating a couple
> of bytes in exchange of addressing completely all race conditions in
> this area.  InitXLogInsert() does not depend on any post-recovery data
> like ThisTimeLineId, so a split is possible.

I have been hacking things this way, and it seems to me that it takes
care of all this class of problems.  CreateCheckPoint() actually
mentions that InitXLogInsert() cannot be called in a critical section,
so the comments don't match the code.  I also think that we still want
to be able to use RecoveryInProgress() in critical sections to do
decision-making for the generation of WAL records.
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..cff238ee2a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8049,6 +8049,9 @@ LocalSetXLogInsertAllowed(void)
 
 	/* Initialize as RecoveryInProgress() would do when switching state */
 	InitXLOGAccess();
+
+	/* Also initialize the working areas for constructing WAL records */
+	InitXLogInsert();
 }
 
 /*
@@ -8178,9 +8181,6 @@ InitXLOGAccess(void)
 	(void) GetRedoRecPtr();
 	/* Also update our copy of doPageWrites. */
 	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
-
-	/* Also initialize the working areas for constructing WAL records */
-	InitXLogInsert();
 }
 
 /*
@@ -8582,11 +8582,11 @@ CreateCheckPoint(int flags)
 
 	/*
 	 * Initialize InitXLogInsert working areas before entering the critical
-	 * section.  Normally, this is done by the first call to
-	 * RecoveryInProgress() or LocalSetXLogInsertAllowed(), but when creating
-	 * an end-of-recovery checkpoint, the LocalSetXLogInsertAllowed call is
-	 * done below in a critical section, and InitXLogInsert cannot be called
-	 * in a critical section.
+	 * section.  Normally, this is done at backend startup or when calling
+	 * LocalSetXLogInsertAllowed(), but when creating an end-of-recovery
+	 * checkpoint, the LocalSetXLogInsertAllowed call is done below in a
+	 * critical section, and InitXLogInsert cannot be called in a critical
+	 * section.
 	 */
 	InitXLogInsert();
 
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 28ff2f0979..4d20b9712f 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -452,6 +452,7 @@ AuxiliaryProcessMain(int argc, char *argv[])
 		case WalWriterProcess:
 			/* don't set signals, walwriter has its own agenda */
 			InitXLOGAccess();
+			InitXLogInsert();
 			WalWriterMain();
 			proc_exit(1);		/* should never return */
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d8f45b3c43..8db377c1cf 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -618,6 +618,15 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	 */
 	if (IsUnderPostmaster)
 	{
+		/*
+		 * Initialize the working areas for constructing WAL records.
+		 * This is done even for a standby instance to avoid initialization
+		 * of this machinery after a promotion, which could happen in a
+		 * critical section and InitXLogInsert() cannot be called in such
+		 * a code path.
+		 */
+		InitXLogInsert();
+
 		/*
 		 * The postmaster already started the XLOG machinery, but we need to
 		 * call InitXLOGAccess(), if the system isn't in hot-standby mode.


signature.asc
Description: PGP signature


Re: Problem while setting the fpw with SIGHUP

2018-03-13 Thread Michael Paquier
On Fri, Mar 09, 2018 at 01:42:04PM +0530, Dilip Kumar wrote:
> While setting the full_page_write with SIGHUP I hit an assert in checkpoint
> process. And, that is because inside a CRITICAL section we are calling
> RecoveryInProgress which intern allocates memory.  So I have moved
> RecoveryInProgress call out of the CRITICAL section and the problem got
> solved.

Indeed, I can see how this is possible.

If you apply the following you can also have way more fun, but that's
overdoing it:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7918,6 +7918,8 @@ CheckRecoveryConsistency(void)
bool
RecoveryInProgress(void)
{
+   Assert(CritSectionCount == 0);

Anyway, it seems to me that you are not taking care of all possible race
conditions here.  RecoveryInProgress() could as well be called in
XLogFlush(), and that's a code path taken during redo.

Instead of doing what you are suggesting, why not moving
InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
the allocations for WAL inserts is done unconditionally?  This has
the cost of also making this allocation even for backends which are
started during recovery, still we are talking about allocating a couple
of bytes in exchange of addressing completely all race conditions in
this area.  InitXLogInsert() does not depend on any post-recovery data
like ThisTimeLineId, so a split is possible.

Thoughts?
--
Michael


signature.asc
Description: PGP signature