Re: Should we remove a fallback promotion? take 2

2020-07-29 Thread Fujii Masao




On 2020/07/28 20:35, Hamid Akhtar wrote:

There have been no real objections on this patch and it seems to work.


Thanks! So I pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Should we remove a fallback promotion? take 2

2020-07-28 Thread Hamid Akhtar
There have been no real objections on this patch and it seems to work. So,
IMHO, the only thing that needs to be done is perhaps update the patch so
that it applies clearly on the master branch.

On Mon, Jul 27, 2020 at 9:31 PM Fujii Masao 
wrote:

>
>
> On 2020/07/27 17:53, Hamid Akhtar wrote:
> > Applying the patch to the current master branch throws 9 hunks. AFAICT,
> the patch is good otherwise.
>
> So you think that the patch can be marked as Ready for Committer?
>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Should we remove a fallback promotion? take 2

2020-07-27 Thread Fujii Masao




On 2020/07/27 17:53, Hamid Akhtar wrote:

Applying the patch to the current master branch throws 9 hunks. AFAICT, the 
patch is good otherwise.


So you think that the patch can be marked as Ready for Committer?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Should we remove a fallback promotion? take 2

2020-07-27 Thread Hamid Akhtar
Applying the patch to the current master branch throws 9 hunks. AFAICT, the
patch is good otherwise.

On Wed, Jun 3, 2020 at 3:20 PM Fujii Masao 
wrote:

>
>
> On 2020/06/03 12:06, Kyotaro Horiguchi wrote:
> > At Wed, 3 Jun 2020 09:43:17 +0900, Fujii Masao <
> masao.fu...@oss.nttdata.com> wrote in
> >> I will change the status back to Needs Review.
>
> Thanks for the review!
>
> >   record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1,
> false);
> >   if (record != NULL)
> >   {
> > -  fast_promoted = true;
> > +  promoted = true;
> >
> > Even if we missed the last checkpoint record, we don't give up
> > promotion and continue fall-back promotion but the variable "promoted"
> > stays false. That is confusiong.
> >
> > How about changing it to fallback_promotion, or some names with more
> > behavior-specific name like immediate_checkpoint_needed?
>
>
> I like doEndOfRecoveryCkpt or something, but I have no strong opinion
> about that flag naming. So I'm ok with immediate_checkpoint_needed
> if others also like that, too.
>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Should we remove a fallback promotion? take 2

2020-06-03 Thread Fujii Masao




On 2020/06/03 12:06, Kyotaro Horiguchi wrote:

At Wed, 3 Jun 2020 09:43:17 +0900, Fujii Masao  
wrote in

I will change the status back to Needs Review.


Thanks for the review!


  record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
  if (record != NULL)
  {
-  fast_promoted = true;
+  promoted = true;

Even if we missed the last checkpoint record, we don't give up
promotion and continue fall-back promotion but the variable "promoted"
stays false. That is confusiong.

How about changing it to fallback_promotion, or some names with more
behavior-specific name like immediate_checkpoint_needed?



I like doEndOfRecoveryCkpt or something, but I have no strong opinion
about that flag naming. So I'm ok with immediate_checkpoint_needed
if others also like that, too.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Should we remove a fallback promotion? take 2

2020-06-02 Thread Kyotaro Horiguchi
At Wed, 3 Jun 2020 09:43:17 +0900, Fujii Masao  
wrote in 
> I will change the status back to Needs Review.

 record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
 if (record != NULL)
 {
-  fast_promoted = true;
+  promoted = true;

Even if we missed the last checkpoint record, we don't give up
promotion and continue fall-back promotion but the variable "promoted"
stays false. That is confusiong.

How about changing it to fallback_promotion, or some names with more
behavior-specific name like immediate_checkpoint_needed?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Should we remove a fallback promotion? take 2

2020-06-02 Thread Fujii Masao



On 2020/06/03 3:38, Hamid Akhtar wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

I've applied the v2 patch on the master branch. There some hunks, but the patch 
got applied. So, I ran make installcheck-world and everything looks fine to me 
with this patch. Though, I do have a few suggestions in general:


Thanks for the test and review!


(1) I see two functions being used (a) CheckPromoteSignal and (b) IsPromoteSignaled in the code. Should these 
be combined into a single function and perhaps check for "promote_signaled" and the 
"PROMOTE_SIGNAL_FILE". Not sure if doing this will break "sigusr1_handler" in 
postmaster.c though.


I don't think we can do that simply. CheckPromoteSignal() can be called by
both postmaster and the startup process. OTOH, IsPromoteSignaled()
accesses the flag that can be set only in the startup process' signal handler,
i.e., it's intended to be called only by the startup process.


(2) CheckPromoteSignal is checking for "PROMOTE_SIGNAL_FILE" file. So, perhaps, rather than calling 
stat on "PROMOTE_SIGNAL_FILE" in if statements, I would suggest to use CheckPromoteSignal function 
instead as it does nothing but stat on "PROMOTE_SIGNAL_FILE" (after applying your patch).


Yes, that's good idea. Attached is the updated version of the patch.
I replaced that stat() with CheckPromoteSignal(). Also I replaced
unlink(PROMOTE_SIGNAL_FILE) with RemovePromoteSignalFiles().


The new status of this patch is: Waiting on Author


I will change the status back to Needs Review.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index ca09d81b08..1df1aa9b30 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -299,9 +299,6 @@ boolwal_receiver_create_temp_slot = false;
 /* are we currently in standby mode? */
 bool   StandbyMode = false;
 
-/* whether request for fast promotion has been made yet */
-static bool fast_promote = false;
-
 /*
  * if recoveryStopsBefore/After returns true, it saves information of the stop
  * point here
@@ -6319,7 +6316,7 @@ StartupXLOG(void)
DBState dbstate_at_startup;
XLogReaderState *xlogreader;
XLogPageReadPrivate private;
-   boolfast_promoted = false;
+   boolpromoted = false;
struct stat st;
 
/*
@@ -7724,14 +7721,14 @@ StartupXLOG(void)
 * the rule that TLI only changes in shutdown checkpoints, which
 * allows some extra error checking in xlog_redo.
 *
-* In fast promotion, only create a lightweight end-of-recovery 
record
+* In promotion, only create a lightweight end-of-recovery 
record
 * instead of a full checkpoint. A checkpoint is requested 
later,
 * after we're fully out of recovery mode and already accepting
 * queries.
 */
if (bgwriterLaunched)
{
-   if (fast_promote)
+   if (LocalPromoteIsTriggered)
{
checkPointLoc = ControlFile->checkPoint;
 
@@ -7742,7 +7739,7 @@ StartupXLOG(void)
record = ReadCheckpointRecord(xlogreader, 
checkPointLoc, 1, false);
if (record != NULL)
{
-   fast_promoted = true;
+   promoted = true;
 
/*
 * Insert a special WAL record to mark 
the end of
@@ -7759,7 +7756,7 @@ StartupXLOG(void)
}
}
 
-   if (!fast_promoted)
+   if (!promoted)
RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
  
CHECKPOINT_IMMEDIATE |
  
CHECKPOINT_WAIT);
@@ -7950,12 +7947,12 @@ StartupXLOG(void)
WalSndWakeup();
 
/*
-* If this was a fast promotion, request an (online) checkpoint now. 
This
+* If this was a promotion, request an (online) checkpoint now. This
 * isn't required for consistency, but the last restartpoint might be 
far
 * back, and in case of a crash, recovering from it might take a longer
 * than is appropriate now that we're not in standby mode anymore.
 */
-   if (fast_promoted)
+   if (promoted)

Re: Should we remove a fallback promotion? take 2

2020-06-02 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

I've applied the v2 patch on the master branch. There some hunks, but the patch 
got applied. So, I ran make installcheck-world and everything looks fine to me 
with this patch. Though, I do have a few suggestions in general:

(1) I see two functions being used (a) CheckPromoteSignal and (b) 
IsPromoteSignaled in the code. Should these be combined into a single function 
and perhaps check for "promote_signaled" and the "PROMOTE_SIGNAL_FILE". Not 
sure if doing this will break "sigusr1_handler" in postmaster.c though.

(2) CheckPromoteSignal is checking for "PROMOTE_SIGNAL_FILE" file. So, perhaps, 
rather than calling stat on "PROMOTE_SIGNAL_FILE" in if statements, I would 
suggest to use CheckPromoteSignal function instead as it does nothing but stat 
on "PROMOTE_SIGNAL_FILE" (after applying your patch).

The new status of this patch is: Waiting on Author


Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-22 Thread Fujii Masao




On 2020/04/23 3:56, Jehan-Guillaume de Rorthais wrote:

On Wed, 22 Apr 2020 11:51:15 +0900
Fujii Masao  wrote:


On 2020/04/22 10:53, Kyotaro Horiguchi wrote:
  [...]
  [...]
  [...]
  [...]
  [...]
  [...]
  [...]
  [...]
  [...]
  [...]
  [...]

Thanks all for checking whether the change affects each HA solution!


Unless I'm wrong, we don't have feedback from Patroni team.

I did some quick grep and it seems to rely on "pg_ctl promote" as well.
Moreover, the latest commit 80fbe9005 force a checkpoint right after the
promote. So I suppose they don't use non-fast promote.


Thanks for checking that!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-22 Thread Jehan-Guillaume de Rorthais
On Wed, 22 Apr 2020 11:51:15 +0900
Fujii Masao  wrote:

> On 2020/04/22 10:53, Kyotaro Horiguchi wrote:
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
> 
> Thanks all for checking whether the change affects each HA solution!

Unless I'm wrong, we don't have feedback from Patroni team.

I did some quick grep and it seems to rely on "pg_ctl promote" as well.
Moreover, the latest commit 80fbe9005 force a checkpoint right after the
promote. So I suppose they don't use non-fast promote.

I CC'ed Alexander Kukushkin to this discussion, so at least he is aware of
this topic.

Regards,




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Kyotaro Horiguchi
At Wed, 22 Apr 2020 11:51:42 +0900, Fujii Masao  
wrote in 
> > I meant that we always have EOR at the end of recovery.  So in the
> > missing latest checkpoint (and crash recovery) case, we insert EOR
> > after the immediate checkpoint. That also means we no longer set
> > CHECKPOINT_END_OF_RECOVERY to the checkpoint, too.
> 
> Could you tell me what the benefit by this change is? Even with this
> change,
> the server still needs to wait for the checkpoint to complete before
> becoming the master and starting the service, unlike fast
> promotion. No?

There's no benefit of performance.  It's just for simplicity by
signalling end-of-recovery in a unified way.

Long ago, we had only non-fast promotion, which is marked by
CHECKPOINT_END_OF_RECOVERY.  When we introduced fast-promotion, it is
marked by the END_OF_RECOVERY record since checkpoint record is not
inserted at the promotion time. However, we internally fall back to
non-fast promotion when we need to make a checkpoint immediately.
If we remove non-fast checkpoint, we don't need two means to signal
end-of-recovery.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Fujii Masao




On 2020/04/22 9:13, Kyotaro Horiguchi wrote:

At Tue, 21 Apr 2020 22:08:56 +0900, Fujii Masao  
wrote in



On 2020/04/21 17:15, Kyotaro Horiguchi wrote:

At Mon, 20 Apr 2020 15:26:16 +0900, Fujii Masao
 wrote in

Patch attached. I will add this into the first CF for v14.

-   if (!fast_promoted)
+   if (!promoted)
RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
  
CHECKPOINT_IMMEDIATE |
  
CHECKPOINT_WAIT);
If we don't find the checkpoint record just before, we don't insert
End-Of-Recovery record then run an immediate chekpoint.  I think if we
nuke the non-fast promotion, shouldn't we insert the EOR record even
in that case?


I'm not sure if that's safe. What if the server crashes before the
checkpoint
completes in that case? Since the last checkpoint record is not
available,
the subsequent crash recovery will fail. This would lead to that the
server
will never start up. Right? Currently ISTM that


Yes, that's right.


end-of-recovery-checkpoint
is executed to avoid such trouble in that case.


I meant that we always have EOR at the end of recovery.  So in the
missing latest checkpoint (and crash recovery) case, we insert EOR
after the immediate checkpoint. That also means we no longer set
CHECKPOINT_END_OF_RECOVERY to the checkpoint, too.


Could you tell me what the benefit by this change is? Even with this change,
the server still needs to wait for the checkpoint to complete before
becoming the master and starting the service, unlike fast promotion. No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Fujii Masao




On 2020/04/22 10:53, Kyotaro Horiguchi wrote:

At Wed, 22 Apr 2020 10:28:07 +0900, Ian Barwick  
wrote in

On 2020/04/22 6:53, Alvaro Herrera wrote:

On 2020-Apr-21, Jehan-Guillaume de Rorthais wrote:


On Tue, 21 Apr 2020 15:36:22 +0900
Michael Paquier  wrote:



Also in that case, non-fast promotion is triggered. Since my patch
tries to remove non-fast promotion, it's intentional to prevent them
from doing that. But you think that we should not drop that because
there are still some users for that?


It would be good to ask around to folks maintaining HA solutions about
that change at least, as there could be a point in still letting
promotion to happen in this case, but switch silently to the fast
path.


FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.

AFAICT repmgr uses 'pg_ctl promote', and has since version 3.0
(released
in mid 2015).  It was only 3.3.2 (mid 2017) that supported Postgres
10,
so it seems fairly safe to assume that the removal won't be a problem.


Correct, repmgr uses "pg_ctl promote" or pg_promote() (if available),
and
won't be affected by this change.


For the record, the pgsql resource agent uses "pg_ctl promote" and
working with fast-promote.  Auxiliary tools for it is assuming
fast-promote.


Thanks all for checking whether the change affects each HA solution!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Fujii Masao



On 2020/04/22 6:57, Alvaro Herrera wrote:

On 2020-Apr-20, Fujii Masao wrote:


+   /*
+* In 9.1 and 9.2 the postmaster unlinked the promote file inside the
+* signal handler. It now leaves the file in place and lets the
+* Startup process do the unlink.
+*/
+   if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, _buf) == 0)
{
-   /*
-* In 9.1 and 9.2 the postmaster unlinked the promote file 
inside the
-* signal handler. It now leaves the file in place and lets the
-* Startup process do the unlink. This allows Startup to know 
whether
-* it should create a full checkpoint before starting up 
(fallback
-* mode). Fast promotion takes precedence.
-*/


It seems pointless to leave a very old comment that documents what the
code no longer does.  I thikn it would be better to reword it indicating
what the code does do, ie. something like "Leave the signal file in
place; it will be removed by the startup process when ..."


Agreed. And, while reading the related code, I thought that it's more proper
to place this comment in CheckPromoteSignal() rather than
CheckForStandbyTrigger(). Because CheckPromoteSignal() actually does
what the comment says, i.e., leaves the promote signal file in place and
lets the startup process do the unlink.

Attached is the updated version of the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 11e32733c4..43b6d8da69 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -298,9 +298,6 @@ boolwal_receiver_create_temp_slot = false;
 /* are we currently in standby mode? */
 bool   StandbyMode = false;
 
-/* whether request for fast promotion has been made yet */
-static bool fast_promote = false;
-
 /*
  * if recoveryStopsBefore/After returns true, it saves information of the stop
  * point here
@@ -6311,7 +6308,7 @@ StartupXLOG(void)
DBState dbstate_at_startup;
XLogReaderState *xlogreader;
XLogPageReadPrivate private;
-   boolfast_promoted = false;
+   boolpromoted = false;
struct stat st;
 
/*
@@ -7698,14 +7695,14 @@ StartupXLOG(void)
 * the rule that TLI only changes in shutdown checkpoints, which
 * allows some extra error checking in xlog_redo.
 *
-* In fast promotion, only create a lightweight end-of-recovery 
record
+* In promotion, only create a lightweight end-of-recovery 
record
 * instead of a full checkpoint. A checkpoint is requested 
later,
 * after we're fully out of recovery mode and already accepting
 * queries.
 */
if (bgwriterLaunched)
{
-   if (fast_promote)
+   if (LocalPromoteIsTriggered)
{
checkPointLoc = ControlFile->checkPoint;
 
@@ -7716,7 +7713,7 @@ StartupXLOG(void)
record = ReadCheckpointRecord(xlogreader, 
checkPointLoc, 1, false);
if (record != NULL)
{
-   fast_promoted = true;
+   promoted = true;
 
/*
 * Insert a special WAL record to mark 
the end of
@@ -7733,7 +7730,7 @@ StartupXLOG(void)
}
}
 
-   if (!fast_promoted)
+   if (!promoted)
RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
  
CHECKPOINT_IMMEDIATE |
  
CHECKPOINT_WAIT);
@@ -7924,12 +7921,12 @@ StartupXLOG(void)
WalSndWakeup();
 
/*
-* If this was a fast promotion, request an (online) checkpoint now. 
This
+* If this was a promotion, request an (online) checkpoint now. This
 * isn't required for consistency, but the last restartpoint might be 
far
 * back, and in case of a crash, recovering from it might take a longer
 * than is appropriate now that we're not in standby mode anymore.
 */
-   if (fast_promoted)
+   if (promoted)
RequestCheckpoint(CHECKPOINT_FORCE);
 }
 
@@ -12532,29 +12529,10 @@ CheckForStandbyTrigger(void)
if (LocalPromoteIsTriggered)
return true;
 
-   if (IsPromoteSignaled())
+   if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, _buf) 

Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Kyotaro Horiguchi
At Wed, 22 Apr 2020 10:28:07 +0900, Ian Barwick  
wrote in 
> On 2020/04/22 6:53, Alvaro Herrera wrote:
> > On 2020-Apr-21, Jehan-Guillaume de Rorthais wrote:
> > 
> >> On Tue, 21 Apr 2020 15:36:22 +0900
> >> Michael Paquier  wrote:
> > 
>  Also in that case, non-fast promotion is triggered. Since my patch
>  tries to remove non-fast promotion, it's intentional to prevent them
>  from doing that. But you think that we should not drop that because
>  there are still some users for that?
> >>>
> >>> It would be good to ask around to folks maintaining HA solutions about
> >>> that change at least, as there could be a point in still letting
> >>> promotion to happen in this case, but switch silently to the fast
> >>> path.
> >>
> >> FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.
> > AFAICT repmgr uses 'pg_ctl promote', and has since version 3.0
> > (released
> > in mid 2015).  It was only 3.3.2 (mid 2017) that supported Postgres
> > 10,
> > so it seems fairly safe to assume that the removal won't be a problem.
> 
> Correct, repmgr uses "pg_ctl promote" or pg_promote() (if available),
> and
> won't be affected by this change.

For the record, the pgsql resource agent uses "pg_ctl promote" and
working with fast-promote.  Auxiliary tools for it is assuming
fast-promote.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Ian Barwick

On 2020/04/22 6:53, Alvaro Herrera wrote:

On 2020-Apr-21, Jehan-Guillaume de Rorthais wrote:


On Tue, 21 Apr 2020 15:36:22 +0900
Michael Paquier  wrote:



Also in that case, non-fast promotion is triggered. Since my patch
tries to remove non-fast promotion, it's intentional to prevent them
from doing that. But you think that we should not drop that because
there are still some users for that?


It would be good to ask around to folks maintaining HA solutions about
that change at least, as there could be a point in still letting
promotion to happen in this case, but switch silently to the fast
path.


FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.


AFAICT repmgr uses 'pg_ctl promote', and has since version 3.0 (released
in mid 2015).  It was only 3.3.2 (mid 2017) that supported Postgres 10,
so it seems fairly safe to assume that the removal won't be a problem.


Correct, repmgr uses "pg_ctl promote" or pg_promote() (if available), and
won't be affected by this change.


Regards

Ian Barwick


--
Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Kyotaro Horiguchi
At Tue, 21 Apr 2020 22:08:56 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2020/04/21 17:15, Kyotaro Horiguchi wrote:
> > At Mon, 20 Apr 2020 15:26:16 +0900, Fujii Masao
> >  wrote in
> >> Patch attached. I will add this into the first CF for v14.
> > -   if (!fast_promoted)
> > +   if (!promoted)
> > RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
> >   
> > CHECKPOINT_IMMEDIATE |
> >   
> > CHECKPOINT_WAIT);
> > If we don't find the checkpoint record just before, we don't insert
> > End-Of-Recovery record then run an immediate chekpoint.  I think if we
> > nuke the non-fast promotion, shouldn't we insert the EOR record even
> > in that case?
> 
> I'm not sure if that's safe. What if the server crashes before the
> checkpoint
> completes in that case? Since the last checkpoint record is not
> available,
> the subsequent crash recovery will fail. This would lead to that the
> server
> will never start up. Right? Currently ISTM that

Yes, that's right.

> end-of-recovery-checkpoint
> is executed to avoid such trouble in that case.

I meant that we always have EOR at the end of recovery.  So in the
missing latest checkpoint (and crash recovery) case, we insert EOR
after the immediate checkpoint. That also means we no longer set
CHECKPOINT_END_OF_RECOVERY to the checkpoint, too.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Alvaro Herrera
On 2020-Apr-20, Fujii Masao wrote:

> + /*
> +  * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
> +  * signal handler. It now leaves the file in place and lets the
> +  * Startup process do the unlink.
> +  */
> + if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, _buf) == 0)
>   {
> - /*
> -  * In 9.1 and 9.2 the postmaster unlinked the promote file 
> inside the
> -  * signal handler. It now leaves the file in place and lets the
> -  * Startup process do the unlink. This allows Startup to know 
> whether
> -  * it should create a full checkpoint before starting up 
> (fallback
> -  * mode). Fast promotion takes precedence.
> -  */

It seems pointless to leave a very old comment that documents what the
code no longer does.  I thikn it would be better to reword it indicating
what the code does do, ie. something like "Leave the signal file in
place; it will be removed by the startup process when ..."

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Alvaro Herrera
On 2020-Apr-21, Jehan-Guillaume de Rorthais wrote:

> On Tue, 21 Apr 2020 15:36:22 +0900
> Michael Paquier  wrote:

> > > Also in that case, non-fast promotion is triggered. Since my patch
> > > tries to remove non-fast promotion, it's intentional to prevent them
> > > from doing that. But you think that we should not drop that because
> > > there are still some users for that?  
> > 
> > It would be good to ask around to folks maintaining HA solutions about
> > that change at least, as there could be a point in still letting
> > promotion to happen in this case, but switch silently to the fast
> > path.
> 
> FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.

AFAICT repmgr uses 'pg_ctl promote', and has since version 3.0 (released
in mid 2015).  It was only 3.3.2 (mid 2017) that supported Postgres 10,
so it seems fairly safe to assume that the removal won't be a problem.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Jehan-Guillaume de Rorthais
Hello,

On Tue, 21 Apr 2020 15:36:22 +0900
Michael Paquier  wrote:

> On Tue, Apr 21, 2020 at 03:29:54PM +0900, Fujii Masao wrote:
> > Yeah, but that's not documented. So I don't think that we need to keep
> > the backward-compatibility for that.
> > 
> > Also in that case, non-fast promotion is triggered. Since my patch
> > tries to remove non-fast promotion, it's intentional to prevent them
> > from doing that. But you think that we should not drop that because
> > there are still some users for that?  
> 
> It would be good to ask around to folks maintaining HA solutions about
> that change at least, as there could be a point in still letting
> promotion to happen in this case, but switch silently to the fast
> path.

FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.

Regards,




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Fujii Masao




On 2020/04/21 17:15, Kyotaro Horiguchi wrote:

At Mon, 20 Apr 2020 15:26:16 +0900, Fujii Masao  
wrote in

Patch attached. I will add this into the first CF for v14.


-   if (!fast_promoted)
+   if (!promoted)
RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
  
CHECKPOINT_IMMEDIATE |
  
CHECKPOINT_WAIT);

If we don't find the checkpoint record just before, we don't insert
End-Of-Recovery record then run an immediate chekpoint.  I think if we
nuke the non-fast promotion, shouldn't we insert the EOR record even
in that case?


I'm not sure if that's safe. What if the server crashes before the checkpoint
completes in that case? Since the last checkpoint record is not available,
the subsequent crash recovery will fail. This would lead to that the server
will never start up. Right? Currently ISTM that end-of-recovery-checkpoint
is executed to avoid such trouble in that case.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Kyotaro Horiguchi
At Mon, 20 Apr 2020 15:26:16 +0900, Fujii Masao  
wrote in 
> Patch attached. I will add this into the first CF for v14.

-   if (!fast_promoted)
+   if (!promoted)
RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
  
CHECKPOINT_IMMEDIATE |
  
CHECKPOINT_WAIT);

If we don't find the checkpoint record just before, we don't insert
End-Of-Recovery record then run an immediate chekpoint.  I think if we
nuke the non-fast promotion, shouldn't we insert the EOR record even
in that case?

Or, as Andres suggested upthread, do we always insert it?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Kyotaro Horiguchi
At Tue, 21 Apr 2020 15:48:02 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2020/04/21 15:36, Michael Paquier wrote:
> > On Tue, Apr 21, 2020 at 03:29:54PM +0900, Fujii Masao wrote:
> >> Yeah, but that's not documented. So I don't think that we need to keep
> >> the backward-compatibility for that.
> >>
> >> Also in that case, non-fast promotion is triggered. Since my patch
> >> tries to remove non-fast promotion, it's intentional to prevent them
> >> from doing that. But you think that we should not drop that because
> >> there are still some users for that?
> > It would be good to ask around to folks maintaining HA solutions about
> > that change at least, as there could be a point in still letting
> > promotion to happen in this case, but switch silently to the fast
> > path.
> 
> *If* there are some HA solutions doing that, IMO that they should be
> *changed
> so that the documented official way to trigger promotion (i.e., pg_ctl
> promote,
> pg_promote or trigger_file) is used instead.

The difference between fast and non-fast promotions is far trivial
than the difference between promotion happens or not.  I think
everyone cares about the new version actually promotes by the steps
they have been doing, but few of them even notices the difference
between the fast and non-fast.  If those who are using non-fast
promotion for a certain reason should notice the change of promotion
behavior in release notes.

This is similar to the change of the default waiting behvaior of
pg_ctl at PG10.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Fujii Masao




On 2020/04/21 15:36, Michael Paquier wrote:

On Tue, Apr 21, 2020 at 03:29:54PM +0900, Fujii Masao wrote:

Yeah, but that's not documented. So I don't think that we need to keep
the backward-compatibility for that.

Also in that case, non-fast promotion is triggered. Since my patch
tries to remove non-fast promotion, it's intentional to prevent them
from doing that. But you think that we should not drop that because
there are still some users for that?


It would be good to ask around to folks maintaining HA solutions about
that change at least, as there could be a point in still letting
promotion to happen in this case, but switch silently to the fast
path.


*If* there are some HA solutions doing that, IMO that they should be changed
so that the documented official way to trigger promotion (i.e., pg_ctl promote,
pg_promote or trigger_file) is used instead.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-21 Thread Fujii Masao




On 2020/04/21 14:54, Michael Paquier wrote:

On Tue, Apr 21, 2020 at 02:27:20PM +0900, Fujii Masao wrote:

On 2020/04/21 10:59, Michael Paquier wrote:

With your patch, this code
now means that in order to finish recovery you need to send SIGUSR2 to
the startup process *and* to create the promote signal file.


Yes, but isn't this the same as the way to trigger fast promotion in HEAD?


Yep, but my point is that some users who have been relying only on
SIGUSR2 sent to the startup process for a promotion may be surprised
to see that doing the same operation does not trigger a promotion
anymore.


Yeah, but that's not documented. So I don't think that we need to keep
the backward-compatibility for that.

Also in that case, non-fast promotion is triggered. Since my patch
tries to remove non-fast promotion, it's intentional to prevent them
from doing that. But you think that we should not drop that because
there are still some users for that?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-20 Thread Michael Paquier
On Tue, Apr 21, 2020 at 02:27:20PM +0900, Fujii Masao wrote:
> On 2020/04/21 10:59, Michael Paquier wrote:
>> With your patch, this code
>> now means that in order to finish recovery you need to send SIGUSR2 to
>> the startup process *and* to create the promote signal file.
> 
> Yes, but isn't this the same as the way to trigger fast promotion in HEAD?

Yep, but my point is that some users who have been relying only on
SIGUSR2 sent to the startup process for a promotion may be surprised
to see that doing the same operation does not trigger a promotion
anymore.
--
Michael


signature.asc
Description: PGP signature


Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-20 Thread Fujii Masao




On 2020/04/21 10:59, Michael Paquier wrote:

On Mon, Apr 20, 2020 at 03:26:16PM +0900, Fujii Masao wrote:

Patch attached. I will add this into the first CF for v14.


Thanks!


-if (IsPromoteSignaled())
+/*
+ * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
+ * signal handler. It now leaves the file in place and lets the
+ * Startup process do the unlink.
+ */
+if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, _buf) == 0)
  {
-/*
- * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
- * signal handler. It now leaves the file in place and lets the
- * Startup process do the unlink. This allows Startup to know whether
- * it should create a full checkpoint before starting up (fallback
- * mode). Fast promotion takes precedence.
- */
-if (stat(PROMOTE_SIGNAL_FILE, _buf) == 0)
-{
-unlink(PROMOTE_SIGNAL_FILE);
-unlink(FALLBACK_PROMOTE_SIGNAL_FILE);
-fast_promote = true;
-}
-else if (stat(FALLBACK_PROMOTE_SIGNAL_FILE, _buf) == 0)
-{
-unlink(FALLBACK_PROMOTE_SIGNAL_FILE);
-fast_promote = false;
-}
-
  ereport(LOG, (errmsg("received promote request")));
-
+unlink(PROMOTE_SIGNAL_FILE);


Thanks for reviewing the patch!


On HEAD, this code means that it is possible to end recovery just by
sending SIGUSR2 to the startup process.


Yes, in this case, non-fast promotion is triggered.


With your patch, this code
now means that in order to finish recovery you need to send SIGUSR2 to
the startup process *and* to create the promote signal file.


Yes, but isn't this the same as the way to trigger fast promotion in HEAD?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-20 Thread Michael Paquier
On Mon, Apr 20, 2020 at 03:26:16PM +0900, Fujii Masao wrote:
> Patch attached. I will add this into the first CF for v14.

Thanks!

> -if (IsPromoteSignaled())
> +/*
> + * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
> + * signal handler. It now leaves the file in place and lets the
> + * Startup process do the unlink.
> + */
> +if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, _buf) == 0)
>  {
> -/*
> - * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
> - * signal handler. It now leaves the file in place and lets the
> - * Startup process do the unlink. This allows Startup to know whether
> - * it should create a full checkpoint before starting up (fallback
> - * mode). Fast promotion takes precedence.
> - */
> -if (stat(PROMOTE_SIGNAL_FILE, _buf) == 0)
> -{
> -unlink(PROMOTE_SIGNAL_FILE);
> -unlink(FALLBACK_PROMOTE_SIGNAL_FILE);
> -fast_promote = true;
> -}
> -else if (stat(FALLBACK_PROMOTE_SIGNAL_FILE, _buf) == 0)
> -{
> -unlink(FALLBACK_PROMOTE_SIGNAL_FILE);
> -fast_promote = false;
> -}
> -
>  ereport(LOG, (errmsg("received promote request")));
> -
> +unlink(PROMOTE_SIGNAL_FILE);

On HEAD, this code means that it is possible to end recovery just by
sending SIGUSR2 to the startup process.  With your patch, this code
now means that in order to finish recovery you need to send SIGUSR2 to
the startup process *and* to create the promote signal file.  Is that
really what you want?
--
Michael


signature.asc
Description: PGP signature


Remove non-fast promotion Re: Should we remove a fallback promotion? take 2

2020-04-20 Thread Fujii Masao



On 2020/03/10 6:56, Andres Freund wrote:

Hi,

On 2020-03-06 16:33:18 -0300, Alvaro Herrera wrote:

On 2020-Mar-06, Michael Paquier wrote:

On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:

Seems reasonable, but it would be better if people proposed these
kinds of changes closer to the beginning of the release cycle rather
than in the crush at the end.


+1, to both points.


Why?  Are you saying that there's some actual risk of breaking
something?  We're not even near beta or feature freeze yet.

I'm not seeing the reason for the "please propose this sooner in the
cycle" argument.  It has already been proposed sooner -- seven years
sooner.  We're not waiting for users to complain anymore; clearly nobody
cared.


Yea. There are changes that are so invasive that it's useful to go very
early, but in this case I'm not seeing it?

+1 for removing non-fast promotions.


Patch attached. I will add this into the first CF for v14.


FWIW, I find "fallback promotion" a confusing description.


Yeah, so I changed the subject.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 11e32733c4..35fc700b7b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -298,9 +298,6 @@ boolwal_receiver_create_temp_slot = false;
 /* are we currently in standby mode? */
 bool   StandbyMode = false;
 
-/* whether request for fast promotion has been made yet */
-static bool fast_promote = false;
-
 /*
  * if recoveryStopsBefore/After returns true, it saves information of the stop
  * point here
@@ -6311,7 +6308,7 @@ StartupXLOG(void)
DBState dbstate_at_startup;
XLogReaderState *xlogreader;
XLogPageReadPrivate private;
-   boolfast_promoted = false;
+   boolpromoted = false;
struct stat st;
 
/*
@@ -7698,14 +7695,14 @@ StartupXLOG(void)
 * the rule that TLI only changes in shutdown checkpoints, which
 * allows some extra error checking in xlog_redo.
 *
-* In fast promotion, only create a lightweight end-of-recovery 
record
+* In promotion, only create a lightweight end-of-recovery 
record
 * instead of a full checkpoint. A checkpoint is requested 
later,
 * after we're fully out of recovery mode and already accepting
 * queries.
 */
if (bgwriterLaunched)
{
-   if (fast_promote)
+   if (LocalPromoteIsTriggered)
{
checkPointLoc = ControlFile->checkPoint;
 
@@ -7716,7 +7713,7 @@ StartupXLOG(void)
record = ReadCheckpointRecord(xlogreader, 
checkPointLoc, 1, false);
if (record != NULL)
{
-   fast_promoted = true;
+   promoted = true;
 
/*
 * Insert a special WAL record to mark 
the end of
@@ -7733,7 +7730,7 @@ StartupXLOG(void)
}
}
 
-   if (!fast_promoted)
+   if (!promoted)
RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
  
CHECKPOINT_IMMEDIATE |
  
CHECKPOINT_WAIT);
@@ -7924,12 +7921,12 @@ StartupXLOG(void)
WalSndWakeup();
 
/*
-* If this was a fast promotion, request an (online) checkpoint now. 
This
+* If this was a promotion, request an (online) checkpoint now. This
 * isn't required for consistency, but the last restartpoint might be 
far
 * back, and in case of a crash, recovering from it might take a longer
 * than is appropriate now that we're not in standby mode anymore.
 */
-   if (fast_promoted)
+   if (promoted)
RequestCheckpoint(CHECKPOINT_FORCE);
 }
 
@@ -12532,29 +12529,15 @@ CheckForStandbyTrigger(void)
if (LocalPromoteIsTriggered)
return true;
 
-   if (IsPromoteSignaled())
+   /*
+* In 9.1 and 9.2 the postmaster unlinked the promote file inside the
+* signal handler. It now leaves the file in place and lets the
+* Startup process do the unlink.
+*/
+   if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, _buf) == 0)
{
-   /*
-* In 9.1 and 9.2 the postmaster unlinked the promote file 
inside the
-* signal handler. It now leaves the file in place and 

Re: Should we remove a fallback promotion? take 2

2020-03-09 Thread Andres Freund
Hi,

On 2020-03-06 16:33:18 -0300, Alvaro Herrera wrote:
> On 2020-Mar-06, Michael Paquier wrote:
> > On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
> > > Seems reasonable, but it would be better if people proposed these
> > > kinds of changes closer to the beginning of the release cycle rather
> > > than in the crush at the end.
> > 
> > +1, to both points.
> 
> Why?  Are you saying that there's some actual risk of breaking
> something?  We're not even near beta or feature freeze yet.
> 
> I'm not seeing the reason for the "please propose this sooner in the
> cycle" argument.  It has already been proposed sooner -- seven years
> sooner.  We're not waiting for users to complain anymore; clearly nobody
> cared.

Yea. There are changes that are so invasive that it's useful to go very
early, but in this case I'm not seeing it?

+1 for removing non-fast promotions.

FWIW, I find "fallback promotion" a confusing description.


Btw, I'd really like to make the crash recovery environment more like
the replication environment. I.e. have checkpointer, bgwriter running,
and have an 'end-of-recovery' record instead of a checkpoint at the end.


Greetings,

Andres Freund




Re: Should we remove a fallback promotion? take 2

2020-03-06 Thread Alvaro Herrera
On 2020-Mar-06, Michael Paquier wrote:

> On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
> > Seems reasonable, but it would be better if people proposed these
> > kinds of changes closer to the beginning of the release cycle rather
> > than in the crush at the end.
> 
> +1, to both points.

Why?  Are you saying that there's some actual risk of breaking
something?  We're not even near beta or feature freeze yet.

I'm not seeing the reason for the "please propose this sooner in the
cycle" argument.  It has already been proposed sooner -- seven years
sooner.  We're not waiting for users to complain anymore; clearly nobody
cared.

I think dragging things forever serves no purpose.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Should we remove a fallback promotion? take 2

2020-03-06 Thread Fujii Masao




On 2020/03/06 10:40, Michael Paquier wrote:

On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:

Seems reasonable, but it would be better if people proposed these
kinds of changes closer to the beginning of the release cycle rather
than in the crush at the end.


+1, to both points.


Ok, I'm fine to do that in v14.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Should we remove a fallback promotion? take 2

2020-03-05 Thread Michael Paquier
On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
> Seems reasonable, but it would be better if people proposed these
> kinds of changes closer to the beginning of the release cycle rather
> than in the crush at the end.

+1, to both points.
--
Michael


signature.asc
Description: PGP signature


Re: Should we remove a fallback promotion? take 2

2020-03-05 Thread Robert Haas
On Thu, Mar 5, 2020 at 8:48 AM Fujii Masao  wrote:
> We discussed the $SUBJECT six years ago at the following thread.
> https://postgr.es/m/CAHGQGwGYkF+CvpOMdxaO=+anazc1oo9o4lqwo50mxpvfj+0...@mail.gmail.com
>
> Seems our consensus at that discussion was to leave a fallback
> promotion for a release or two for debugging purpose or as
> an emergency method because fast promotion might have
> some issues, and then to remove it later. Now, more than six years
> have already passed since that discussion. Is there still
> any reason to keep a fallback promotion? If nothing, I'd like to
> drop it from v13.

Seems reasonable, but it would be better if people proposed these
kinds of changes closer to the beginning of the release cycle rather
than in the crush at the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Should we remove a fallback promotion? take 2

2020-03-05 Thread Fujii Masao

Hi,

We discussed the $SUBJECT six years ago at the following thread.
https://postgr.es/m/CAHGQGwGYkF+CvpOMdxaO=+anazc1oo9o4lqwo50mxpvfj+0...@mail.gmail.com

Seems our consensus at that discussion was to leave a fallback
promotion for a release or two for debugging purpose or as
an emergency method because fast promotion might have
some issues, and then to remove it later. Now, more than six years
have already passed since that discussion. Is there still
any reason to keep a fallback promotion? If nothing, I'd like to
drop it from v13.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters