On 2020/03/08 13:52, Masahiko Sawada wrote:
On Thu, 5 Mar 2020 at 20:16, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
On 2020/03/05 16:58, Masahiko Sawada wrote:
On Wed, 4 Mar 2020 at 15:21, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
On 2020/03/04 14:31, Masahiko Sawada wrote:
On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
On 2020/03/04 13:27, Michael Paquier wrote:
On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
Yeah, so 0001 patch sets existing wait events to recovery conflict
resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
to the recovery conflict on a snapshot. 0003 patch improves these wait
events by adding the new type of wait event such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
is the fix for existing versions and 0003 patch is an improvement for
only PG13. Did you mean even 0001 patch doesn't fit for back-patching?
Yes, it looks like a improvement rather than bug fix.
Okay, understand.
I got my eyes on this patch set. The full patch set is in my opinion
just a set of improvements, and not bug fixes, so I would refrain from
back-backpatching.
I think that the issue (i.e., "waiting" is reported twice needlessly
in PS display) that 0002 patch tries to fix is a bug. So it should be
fixed even in the back branches.
So we need only two patches: one fixes process title issue and another
improve wait event. I've attached updated patches.
Thanks for updating the patches! I started reading 0001 patch.
Thank you for reviewing this patch.
- /*
- * Report via ps if we have been waiting for more than
500 msec
- * (should that be configurable?)
- */
- if (update_process_title && new_status == NULL &&
- TimestampDifferenceExceeds(waitStart,
GetCurrentTimestamp(),
-
500))
The patch changes ResolveRecoveryConflictWithSnapshot() and
ResolveRecoveryConflictWithTablespace() so that they always add
"waiting" into the PS display, whether wait is really necessary or not.
But isn't it better to display "waiting" in PS basically when wait is
necessary, like originally ResolveRecoveryConflictWithVirtualXIDs()
does as the above?
You're right. Will fix it.
ResolveRecoveryConflictWithDatabase(Oid dbid)
{
+ char *new_status = NULL;
+
+ /* Report via ps we are waiting */
+ new_status = set_process_title_waiting();
In ResolveRecoveryConflictWithDatabase(), there seems no need to
display "waiting" in PS because no wait occurs when recovery conflict
with database happens.
Isn't the startup process waiting for other backend to terminate?
Yeah, you're right. I agree that "waiting" should be reported in this case.
Currently ResolveRecoveryConflictWithLock() and
ResolveRecoveryConflictWithBufferPin() don't call
ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting"
in PS display. You changed them so that they report "waiting". I agree
to have this change. But this change is an improvement rather than
a bug fix, i.e., we should apply this change only in v13?
Did you mean ResolveRecoveryConflictWithDatabase and
ResolveRecoveryConflictWithBufferPin?
Yes! Sorry for my typo.
In the current code as far as I
researched there are two cases where we don't add "waiting" and one
case where we doubly add "waiting".
ResolveRecoveryConflictWithDatabase and
ResolveRecoveryConflictWithBufferPin don't update the ps title.
Although the path where GetCurrentTimestamp() >= ltime is false in
ResolveRecoveryConflictWithLock also doesn't update the ps title, it's
already updated in WaitOnLock. On the other hand, the path where
GetCurrentTimestamp() >= ltime is true in
ResolveRecoveryConflictWithLock updates the ps title but it's wrong as
I reported.
I've split the patch into two patches: 0001 patch fixes the issue
about doubly updating ps title, 0002 patch makes the recovery conflict
resolution on database and buffer pin update the ps title.
Thanks for splitting the patches. I think that 0001 patch can be back-patched.
- /*
- * Report via ps if we have been waiting for more than
500 msec
- * (should that be configurable?)
- */
- if (update_process_title && new_status == NULL &&
- TimestampDifferenceExceeds(waitStart,
GetCurrentTimestamp(),
-
500))
Originally, "waiting" is reported in PS if we've been waiting for more than
500 msec, as the above does. But you got rid of those codes in the patch.
Did you confirm that it's safe to do that? If not, isn't it better to apply
the attached patch? The attached patch makes
ResolveRecoveryConflictWithVirtualXIDs() report "waiting" as it does now,
and allows its caller to choose whether to report that.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/src/backend/storage/ipc/standby.c
b/src/backend/storage/ipc/standby.c
index 3090e57fa4..94d4a8cbff 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -43,7 +43,7 @@ int max_standby_streaming_delay = 30 * 1000;
static HTAB *RecoveryLockLists;
static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId
*waitlist,
-
ProcSignalReason reason);
+
ProcSignalReason reason, bool report_waiting);
static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
@@ -216,10 +216,14 @@ WaitExceedsMaxStandbyDelay(void)
* recovery processing. Judgement has already been passed on it within
* a specific rmgr. Here we just issue the orders to the procs. The procs
* then throw the required error as instructed.
+ *
+ * If report_waiting is true, "waiting" is reported in PS display if necessary.
+ * If the caller has already reported that, report_waiting should be false.
+ * Otherwise, "waiting" is reported twice unexpectedly.
*/
static void
ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
-
ProcSignalReason reason)
+
ProcSignalReason reason, bool report_waiting)
{
TimestampTz waitStart;
char *new_status;
@@ -228,7 +232,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId
*waitlist,
if (!VirtualTransactionIdIsValid(*waitlist))
return;
- waitStart = GetCurrentTimestamp();
+ if (report_waiting)
+ waitStart = GetCurrentTimestamp();
new_status = NULL; /* we haven't changed the ps
display */
while (VirtualTransactionIdIsValid(*waitlist))
@@ -243,7 +248,7 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId
*waitlist,
* Report via ps if we have been waiting for more than
500 msec
* (should that be configurable?)
*/
- if (update_process_title && new_status == NULL &&
+ if (update_process_title && new_status == NULL &&
report_waiting &&
TimestampDifferenceExceeds(waitStart,
GetCurrentTimestamp(),
500))
{
@@ -311,7 +316,8 @@ ResolveRecoveryConflictWithSnapshot(TransactionId
latestRemovedXid, RelFileNode
node.dbNode);
ResolveRecoveryConflictWithVirtualXIDs(backends,
-
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
+
true);
}
void
@@ -339,7 +345,8 @@ ResolveRecoveryConflictWithTablespace(Oid tsid)
temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
InvalidOid);
ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
-
PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+
PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
+
true);
}
void
@@ -402,8 +409,15 @@ ResolveRecoveryConflictWithLock(LOCKTAG locktag)
VirtualTransactionId *backends;
backends = GetLockConflicts(&locktag, AccessExclusiveLock,
NULL);
+
+ /*
+ * Prevent ResolveRecoveryConflictWithVirtualXIDs() from
reporting
+ * "waiting" in PS display by disabling its argument
report_waiting
+ * because the caller, WaitOnLock(), has already reported that.
+ */
ResolveRecoveryConflictWithVirtualXIDs(backends,
-
PROCSIG_RECOVERY_CONFLICT_LOCK);
+
PROCSIG_RECOVERY_CONFLICT_LOCK,
+
false);
}
else
{