Re: [HACKERS] Strange debug message of walreciver?

2015-03-13 Thread Tatsuo Ishii
>> On Sun, Mar 8, 2015 at 8:13 PM, Tatsuo Ishii  wrote:
>>>
>>> > On Sat, Mar 7, 2015 at 10:18 PM, Tatsuo Ishii 
>> wrote:
>>> >>
>>> >> When I set log_min_messages to debug5 and looked into walreciver log,
>>> >> I saw this:
>>> >>
>>> >> 3600 2015-03-08 09:47:38 JST DEBUG:  sendtime 2015-03-08
>>> > 09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication
>>> > apply delay -1945478837 ms tran sfer latency 0 ms
>>> >>
>>> >> The "replication apply delay -1945478837 ms" part looks strange
>>> >> because the delay is below 0. The number is formatted as %d in elog
>>> >> call, and I suspect this is kind of integer overflow.
>>> >>
>>> >
>>> > Looking at GetReplicationApplyDelay() in walreceiverfuncs.c I noticed
>> that
>>> > the integer overflow occurs because sometimes the return of the
>>> > GetCurrentChunkReplayStartTime() is 0 (zero).
>>> >
>>> > I added an elog into GetReplicationApplyDelay to check this info:
>>> >
>>> > DEBUG:  GetReplicationApplyDelay 479099832 seconds, 352 milliseconds,
>>> > (0.00, 479099832352083.00)
>>> > DEBUG:  sendtime 2015-03-08 00:17:12.351987-03 receipttime 2015-03-08
>>> > 00:17:12.352043-03 replication apply delay -1936504800 ms transfer
>> latency
>>> > 0 ms
>>> > DEBUG:  GetReplicationApplyDelay 479099841 seconds, 450 milliseconds,
>>> > (0.00, 479099841450320.00)
>>> > DEBUG:  sendtime 2015-03-08 00:17:21.45018-03 receipttime 2015-03-08
>>> > 00:17:21.450294-03 replication apply delay -1936495702 ms transfer
>> latency
>>> > 0 ms
>>>
>>> Right. Until walreceiver gets the first WAL record to replay,
>>> xlogctl->currentChunkStartTime remains 0.
>>>
>>> > Maybe we should check before and return zero from
>> GetReplicationApplyDelay.
>>> > The attached patch implement it.
>>>
>>> Your patch regards 0 replication apply delay in the case above which
>>> is confusing if the delay is actually 0.
>>>
>>> What about something like this to explicitly stats the delay data is
>>> not available?
>>>
>>> elog(DEBUG2, "sendtime %s receipttime %s replication apply delay (N/A)
>> transfer latency %d ms",
>>>
>> 
>> Makes sense. Attached patch implement what you suggested.
> 
> Looks good. If there's no objection, I will commit this.

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange debug message of walreciver?

2015-03-08 Thread Tatsuo Ishii
> On Sun, Mar 8, 2015 at 8:13 PM, Tatsuo Ishii  wrote:
>>
>> > On Sat, Mar 7, 2015 at 10:18 PM, Tatsuo Ishii 
> wrote:
>> >>
>> >> When I set log_min_messages to debug5 and looked into walreciver log,
>> >> I saw this:
>> >>
>> >> 3600 2015-03-08 09:47:38 JST DEBUG:  sendtime 2015-03-08
>> > 09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication
>> > apply delay -1945478837 ms tran sfer latency 0 ms
>> >>
>> >> The "replication apply delay -1945478837 ms" part looks strange
>> >> because the delay is below 0. The number is formatted as %d in elog
>> >> call, and I suspect this is kind of integer overflow.
>> >>
>> >
>> > Looking at GetReplicationApplyDelay() in walreceiverfuncs.c I noticed
> that
>> > the integer overflow occurs because sometimes the return of the
>> > GetCurrentChunkReplayStartTime() is 0 (zero).
>> >
>> > I added an elog into GetReplicationApplyDelay to check this info:
>> >
>> > DEBUG:  GetReplicationApplyDelay 479099832 seconds, 352 milliseconds,
>> > (0.00, 479099832352083.00)
>> > DEBUG:  sendtime 2015-03-08 00:17:12.351987-03 receipttime 2015-03-08
>> > 00:17:12.352043-03 replication apply delay -1936504800 ms transfer
> latency
>> > 0 ms
>> > DEBUG:  GetReplicationApplyDelay 479099841 seconds, 450 milliseconds,
>> > (0.00, 479099841450320.00)
>> > DEBUG:  sendtime 2015-03-08 00:17:21.45018-03 receipttime 2015-03-08
>> > 00:17:21.450294-03 replication apply delay -1936495702 ms transfer
> latency
>> > 0 ms
>>
>> Right. Until walreceiver gets the first WAL record to replay,
>> xlogctl->currentChunkStartTime remains 0.
>>
>> > Maybe we should check before and return zero from
> GetReplicationApplyDelay.
>> > The attached patch implement it.
>>
>> Your patch regards 0 replication apply delay in the case above which
>> is confusing if the delay is actually 0.
>>
>> What about something like this to explicitly stats the delay data is
>> not available?
>>
>> elog(DEBUG2, "sendtime %s receipttime %s replication apply delay (N/A)
> transfer latency %d ms",
>>
> 
> Makes sense. Attached patch implement what you suggested.

Looks good. If there's no objection, I will commit this.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange debug message of walreciver?

2015-03-08 Thread Fabrízio de Royes Mello
On Sun, Mar 8, 2015 at 8:13 PM, Tatsuo Ishii  wrote:
>
> > On Sat, Mar 7, 2015 at 10:18 PM, Tatsuo Ishii 
wrote:
> >>
> >> When I set log_min_messages to debug5 and looked into walreciver log,
> >> I saw this:
> >>
> >> 3600 2015-03-08 09:47:38 JST DEBUG:  sendtime 2015-03-08
> > 09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication
> > apply delay -1945478837 ms tran sfer latency 0 ms
> >>
> >> The "replication apply delay -1945478837 ms" part looks strange
> >> because the delay is below 0. The number is formatted as %d in elog
> >> call, and I suspect this is kind of integer overflow.
> >>
> >
> > Looking at GetReplicationApplyDelay() in walreceiverfuncs.c I noticed
that
> > the integer overflow occurs because sometimes the return of the
> > GetCurrentChunkReplayStartTime() is 0 (zero).
> >
> > I added an elog into GetReplicationApplyDelay to check this info:
> >
> > DEBUG:  GetReplicationApplyDelay 479099832 seconds, 352 milliseconds,
> > (0.00, 479099832352083.00)
> > DEBUG:  sendtime 2015-03-08 00:17:12.351987-03 receipttime 2015-03-08
> > 00:17:12.352043-03 replication apply delay -1936504800 ms transfer
latency
> > 0 ms
> > DEBUG:  GetReplicationApplyDelay 479099841 seconds, 450 milliseconds,
> > (0.00, 479099841450320.00)
> > DEBUG:  sendtime 2015-03-08 00:17:21.45018-03 receipttime 2015-03-08
> > 00:17:21.450294-03 replication apply delay -1936495702 ms transfer
latency
> > 0 ms
>
> Right. Until walreceiver gets the first WAL record to replay,
> xlogctl->currentChunkStartTime remains 0.
>
> > Maybe we should check before and return zero from
GetReplicationApplyDelay.
> > The attached patch implement it.
>
> Your patch regards 0 replication apply delay in the case above which
> is confusing if the delay is actually 0.
>
> What about something like this to explicitly stats the delay data is
> not available?
>
> elog(DEBUG2, "sendtime %s receipttime %s replication apply delay (N/A)
transfer latency %d ms",
>

Makes sense. Attached patch implement what you suggested.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index bfbc02f..9c7710f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1191,15 +1191,26 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
 	{
 		char	   *sendtime;
 		char	   *receipttime;
+		int			applyDelay;
 
 		/* Copy because timestamptz_to_str returns a static buffer */
 		sendtime = pstrdup(timestamptz_to_str(sendTime));
 		receipttime = pstrdup(timestamptz_to_str(lastMsgReceiptTime));
-		elog(DEBUG2, "sendtime %s receipttime %s replication apply delay %d ms transfer latency %d ms",
-			 sendtime,
-			 receipttime,
-			 GetReplicationApplyDelay(),
-			 GetReplicationTransferLatency());
+		applyDelay = GetReplicationApplyDelay();
+
+		/* apply delay is not available */
+		if (applyDelay == -1)
+			elog(DEBUG2, "sendtime %s receipttime %s replication apply delay (N/A) transfer latency %d ms",
+ sendtime,
+ receipttime,
+ GetReplicationTransferLatency());
+		else
+			elog(DEBUG2, "sendtime %s receipttime %s replication apply delay %d ms transfer latency %d ms",
+ sendtime,
+ receipttime,
+ applyDelay,
+ GetReplicationTransferLatency());
+
 		pfree(sendtime);
 		pfree(receipttime);
 	}
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 496605f..b26f5fc 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -314,7 +314,8 @@ GetWalRcvWriteRecPtr(XLogRecPtr *latestChunkStart, TimeLineID *receiveTLI)
 }
 
 /*
- * Returns the replication apply delay in ms
+ * Returns the replication apply delay in ms or -1
+ * if the apply delay info is not available
  */
 int
 GetReplicationApplyDelay(void)
@@ -328,6 +329,8 @@ GetReplicationApplyDelay(void)
 	long		secs;
 	int			usecs;
 
+	TimestampTz	chunckReplayStartTime;
+
 	SpinLockAcquire(&walrcv->mutex);
 	receivePtr = walrcv->receivedUpto;
 	SpinLockRelease(&walrcv->mutex);
@@ -337,7 +340,12 @@ GetReplicationApplyDelay(void)
 	if (receivePtr == replayPtr)
 		return 0;
 
-	TimestampDifference(GetCurrentChunkReplayStartTime(),
+	chunckReplayStartTime = GetCurrentChunkReplayStartTime();
+
+	if (chunckReplayStartTime == 0)
+		return -1;
+
+	TimestampDifference(chunckReplayStartTime,
 		GetCurrentTimestamp(),
 		&secs, &usecs);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange debug message of walreciver?

2015-03-08 Thread Tatsuo Ishii
> On Sat, Mar 7, 2015 at 10:18 PM, Tatsuo Ishii  wrote:
>>
>> When I set log_min_messages to debug5 and looked into walreciver log,
>> I saw this:
>>
>> 3600 2015-03-08 09:47:38 JST DEBUG:  sendtime 2015-03-08
> 09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication
> apply delay -1945478837 ms tran sfer latency 0 ms
>>
>> The "replication apply delay -1945478837 ms" part looks strange
>> because the delay is below 0. The number is formatted as %d in elog
>> call, and I suspect this is kind of integer overflow.
>>
> 
> Looking at GetReplicationApplyDelay() in walreceiverfuncs.c I noticed that
> the integer overflow occurs because sometimes the return of the
> GetCurrentChunkReplayStartTime() is 0 (zero).
> 
> I added an elog into GetReplicationApplyDelay to check this info:
> 
> DEBUG:  GetReplicationApplyDelay 479099832 seconds, 352 milliseconds,
> (0.00, 479099832352083.00)
> DEBUG:  sendtime 2015-03-08 00:17:12.351987-03 receipttime 2015-03-08
> 00:17:12.352043-03 replication apply delay -1936504800 ms transfer latency
> 0 ms
> DEBUG:  GetReplicationApplyDelay 479099841 seconds, 450 milliseconds,
> (0.00, 479099841450320.00)
> DEBUG:  sendtime 2015-03-08 00:17:21.45018-03 receipttime 2015-03-08
> 00:17:21.450294-03 replication apply delay -1936495702 ms transfer latency
> 0 ms

Right. Until walreceiver gets the first WAL record to replay,
xlogctl->currentChunkStartTime remains 0.

> Maybe we should check before and return zero from GetReplicationApplyDelay.
> The attached patch implement it.

Your patch regards 0 replication apply delay in the case above which
is confusing if the delay is actually 0.

What about something like this to explicitly stats the delay data is
not available?

elog(DEBUG2, "sendtime %s receipttime %s replication apply delay (N/A) transfer 
latency %d ms",

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange debug message of walreciver?

2015-03-07 Thread Fabrízio de Royes Mello
On Sat, Mar 7, 2015 at 10:18 PM, Tatsuo Ishii  wrote:
>
> When I set log_min_messages to debug5 and looked into walreciver log,
> I saw this:
>
> 3600 2015-03-08 09:47:38 JST DEBUG:  sendtime 2015-03-08
09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication
apply delay -1945478837 ms tran sfer latency 0 ms
>
> The "replication apply delay -1945478837 ms" part looks strange
> because the delay is below 0. The number is formatted as %d in elog
> call, and I suspect this is kind of integer overflow.
>

Looking at GetReplicationApplyDelay() in walreceiverfuncs.c I noticed that
the integer overflow occurs because sometimes the return of the
GetCurrentChunkReplayStartTime() is 0 (zero).

I added an elog into GetReplicationApplyDelay to check this info:

DEBUG:  GetReplicationApplyDelay 479099832 seconds, 352 milliseconds,
(0.00, 479099832352083.00)
DEBUG:  sendtime 2015-03-08 00:17:12.351987-03 receipttime 2015-03-08
00:17:12.352043-03 replication apply delay -1936504800 ms transfer latency
0 ms
DEBUG:  GetReplicationApplyDelay 479099841 seconds, 450 milliseconds,
(0.00, 479099841450320.00)
DEBUG:  sendtime 2015-03-08 00:17:21.45018-03 receipttime 2015-03-08
00:17:21.450294-03 replication apply delay -1936495702 ms transfer latency
0 ms

Maybe we should check before and return zero from GetReplicationApplyDelay.
The attached patch implement it.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 496605f..0c45b66 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -328,6 +328,8 @@ GetReplicationApplyDelay(void)
 	long		secs;
 	int			usecs;
 
+	TimestampTz	chunckReplayStartTime;
+
 	SpinLockAcquire(&walrcv->mutex);
 	receivePtr = walrcv->receivedUpto;
 	SpinLockRelease(&walrcv->mutex);
@@ -337,7 +339,12 @@ GetReplicationApplyDelay(void)
 	if (receivePtr == replayPtr)
 		return 0;
 
-	TimestampDifference(GetCurrentChunkReplayStartTime(),
+	chunckReplayStartTime = GetCurrentChunkReplayStartTime();
+
+	if (chunckReplayStartTime == 0)
+		return 0;
+
+	TimestampDifference(chunckReplayStartTime,
 		GetCurrentTimestamp(),
 		&secs, &usecs);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Strange debug message of walreciver?

2015-03-07 Thread Tatsuo Ishii
When I set log_min_messages to debug5 and looked into walreciver log,
I saw this:

3600 2015-03-08 09:47:38 JST DEBUG:  sendtime 2015-03-08 09:47:38.31493+09 
receipttime 2015-03-08 09:47:38.315027+09 replication apply delay -1945478837 
ms tran sfer latency 0 ms

The "replication apply delay -1945478837 ms" part looks strange
because the delay is below 0. The number is formatted as %d in elog
call, and I suspect this is kind of integer overflow.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers