Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-12 Thread Alexey Kondratov

On 2020-11-11 06:59, Tom Lane wrote:

Alexey Kondratov  writes:
After looking on the autoprewarm code more closely I have realised 
that

this 'double dump' issues was not an issues at all. I have just
misplaced a debug elog(), so its second output in the log was only
indicating that we calculated delay_in_ms one more time.


Ah --- that explains why I couldn't see a problem.

I've pushed 0001+0002 plus some followup work to fix other places
that could usefully use TimestampDifferenceMilliseconds().  I have
not done anything with 0003 (the TAP test for pg_prewarm), and will
leave that to the judgment of somebody who's worked with pg_prewarm
before.  To me it looks like it's not really testing things very
carefully at all; on the other hand, we have exactly zero test
coverage of that module today, so maybe something is better than
nothing.



Great, thank you for generalisation of the issue and working on it.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-10 Thread Tom Lane
Alexey Kondratov  writes:
> After looking on the autoprewarm code more closely I have realised that 
> this 'double dump' issues was not an issues at all. I have just 
> misplaced a debug elog(), so its second output in the log was only 
> indicating that we calculated delay_in_ms one more time.

Ah --- that explains why I couldn't see a problem.

I've pushed 0001+0002 plus some followup work to fix other places
that could usefully use TimestampDifferenceMilliseconds().  I have
not done anything with 0003 (the TAP test for pg_prewarm), and will
leave that to the judgment of somebody who's worked with pg_prewarm
before.  To me it looks like it's not really testing things very
carefully at all; on the other hand, we have exactly zero test
coverage of that module today, so maybe something is better than
nothing.

regards, tom lane




Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-10 Thread Alexey Kondratov

On 2020-11-09 23:25, Tom Lane wrote:

Alexey Kondratov  writes:

On 2020-11-09 21:53, Tom Lane wrote:
0002 seems like a pretty clear bug fix, though I wonder if this is 
exactly
what we want to do going forward.  It seems like a very large 
fraction of
the callers of TimestampDifference would like to have the value in 
msec,

which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again.  Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).



Yeah, I get into this problem after a bug in another extension —
pg_wait_sampling. I have attached 0002, which implements
TimestampDifferenceMilliseconds(), so 0003 just uses this new function
to solve the initial issues. If it looks good to you, then we can 
switch

all similar callers to it.


Yeah, let's move forward with that --- in fact, I'm inclined to
back-patch it.  (Not till the current release cycle is done, though.
I don't find this important enough to justify a last-moment patch.)

BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds
round any fractional millisecond up rather than down.  Rounding down
seems to create a hazard of uselessly waking just before the delay is
completed.  Better to wake just after.



Yes, it make sense. I have changed TimestampDifferenceMilliseconds() to 
round result up if there is a reminder.


After looking on the autoprewarm code more closely I have realised that 
this 'double dump' issues was not an issues at all. I have just 
misplaced a debug elog(), so its second output in the log was only 
indicating that we calculated delay_in_ms one more time. Actually, even 
with wrong calculation of delay_in_ms the only problem was that we were 
busy looping with ~1 second interval instead of waiting on latch.


It is still a buggy behaviour, but much less harmful than I have 
originally thought.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres CompanyFrom ce09103d9d58b611728b66366cd24e8a4069f7ac Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:04:10 +0300
Subject: [PATCH v3 3/3] pg_prewarm: add tap test for autoprewarm feature

---
 contrib/pg_prewarm/Makefile |  2 +
 contrib/pg_prewarm/t/001_autoprewarm.pl | 59 +
 2 files changed, 61 insertions(+)
 create mode 100644 contrib/pg_prewarm/t/001_autoprewarm.pl

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index b13ac3c813..9cfde8c4e4 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -10,6 +10,8 @@ EXTENSION = pg_prewarm
 DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/pg_prewarm/t/001_autoprewarm.pl b/contrib/pg_prewarm/t/001_autoprewarm.pl
new file mode 100644
index 00..f55b2a5352
--- /dev/null
+++ b/contrib/pg_prewarm/t/001_autoprewarm.pl
@@ -0,0 +1,59 @@
+#
+# Check that pg_prewarm can dump blocks from shared buffers
+# to PGDATA/autoprewarm.blocks.
+#
+
+use strict;
+use Test::More;
+use TestLib;
+use Time::HiRes qw(usleep);
+use warnings;
+
+use PostgresNode;
+
+plan tests => 3;
+
+# Wait up to 180s for pg_prewarm to dump blocks.
+sub wait_for_dump
+{
+	my $path = shift;
+
+	foreach my $i (0 .. 1800)
+	{
+		last if -e $path;
+		usleep(100_000);
+	}
+}
+
+my $node = get_new_node("node");
+$node->init;
+$node->append_conf(
+	'postgresql.conf', qq(
+shared_preload_libraries = 'pg_prewarm'
+pg_prewarm.autoprewarm = 'on'
+pg_prewarm.autoprewarm_interval = 1
+));
+$node->start;
+
+my $blocks_path = $node->data_dir . '/autoprewarm.blocks';
+
+# Check that we can dump blocks on timeout.
+wait_for_dump($blocks_path);
+ok(-e $blocks_path, 'file autoprewarm.blocks should be present in the PGDATA');
+
+# Check that we can dump blocks on shutdown.
+$node->stop;
+$node->append_conf(
+	'postgresql.conf', qq(
+pg_prewarm.autoprewarm_interval = 0
+));
+
+# Remove autoprewarm.blocks
+unlink($blocks_path) || die "$blocks_path: $!";
+ok(!-e $blocks_path, 'sanity check, dump on timeout is turned off');
+
+$node->start;
+$node->stop;
+
+wait_for_dump($blocks_path);
+ok(-e $blocks_path, 'file autoprewarm.blocks should be present in the PGDATA after clean shutdown');
-- 
2.19.1

From fba212ed765c8c411db1ca19c2ac991662109d99 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:12:00 +0300
Subject: [PATCH v3 2/3] pg_prewarm: fix autoprewarm_interval behaviour

Previously it misused seconds from TimestampDifference() as
milliseconds, so it was busy looping with ~1 second interval
instead of waiting on latch with default autoprewarm_interval = 300s.
---
 contrib/pg_prewarm/autoprewarm.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git 

Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-09 Thread Tom Lane
Alexey Kondratov  writes:
> On 2020-11-09 21:53, Tom Lane wrote:
>> 0002 seems like a pretty clear bug fix, though I wonder if this is exactly
>> what we want to do going forward.  It seems like a very large fraction of
>> the callers of TimestampDifference would like to have the value in msec,
>> which means we're doing a whole lot of expensive and error-prone
>> arithmetic to break down the difference to sec/usec and then put it
>> back together again.  Let's get rid of that by inventing, say
>> TimestampDifferenceMilliseconds(...).

> Yeah, I get into this problem after a bug in another extension — 
> pg_wait_sampling. I have attached 0002, which implements 
> TimestampDifferenceMilliseconds(), so 0003 just uses this new function 
> to solve the initial issues. If it looks good to you, then we can switch 
> all similar callers to it.

Yeah, let's move forward with that --- in fact, I'm inclined to
back-patch it.  (Not till the current release cycle is done, though.
I don't find this important enough to justify a last-moment patch.)

BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds
round any fractional millisecond up rather than down.  Rounding down
seems to create a hazard of uselessly waking just before the delay is
completed.  Better to wake just after.

I still think your 0001 is fishy, but don't have time today to stare at
it more closely.

regards, tom lane




Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-09 Thread Alexey Kondratov

On 2020-11-09 21:53, Tom Lane wrote:

Alexey Kondratov  writes:
After fixing this issue I have noticed that it still dumps blocks 
twice

at each timeout (here I set autoprewarm_interval to 15s):
...
This happens because at timeout time we were using continue, but
actually we still have to wait the entire autoprewarm_interval after
successful dumping.


I don't think your 0001 is correct.  It would be okay if apw_dump_now()
could be counted on to take negligible time, but we shouldn't assume
that should we?



Yes, it seems so, if I understand you correctly. I had a doubt about 
possibility of pg_ctl to exit earlier than a dumping process. Now I 
added an explicit wait for dump file into test.



I agree that the "continue" seems a bit bogus, because it's skipping
the ResetLatch call at the bottom of the loop; it's not quite clear
to me whether that's a good thing or not.  But the general idea of
the existing code seems to be to loop around and make a fresh 
calculation

of how-long-to-wait, and that doesn't seem wrong.


I have left the last patch intact, since it resolves the 'double dump' 
issue, but I agree with нщгк point about existing logic of the code, 
although it is a bit broken. So I have to think more about how to fix it 
in a better way.


0002 seems like a pretty clear bug fix, though I wonder if this is 
exactly
what we want to do going forward.  It seems like a very large fraction 
of
the callers of TimestampDifference would like to have the value in 
msec,

which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again.  Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).


Yeah, I get into this problem after a bug in another extension — 
pg_wait_sampling. I have attached 0002, which implements 
TimestampDifferenceMilliseconds(), so 0003 just uses this new function 
to solve the initial issues. If it looks good to you, then we can switch 
all similar callers to it.



BTW, I see another bug of a related ilk.  Look what
postgres_fdw/connection.c is doing:

TimestampDifference(now, endtime, , );

/* To protect against clock skew, limit sleep to one 
minute. */
cur_timeout = Min(6, secs * USECS_PER_SEC + 
microsecs);


/* Sleep until there's something to do */
wc = WaitLatchOrSocket(MyLatch,
   WL_LATCH_SET | 
WL_SOCKET_READABLE |
   WL_TIMEOUT | 
WL_EXIT_ON_PM_DEATH,

   PQsocket(conn),
   cur_timeout, PG_WAIT_EXTENSION);

WaitLatchOrSocket's timeout is measured in msec not usec.  I think the
comment about "clock skew" is complete BS, and the Min() calculation 
was

put in as a workaround by somebody observing that the sleep waited too
long, but not understanding why.


I wonder how many troubles one can get with all these unit conversions.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres CompanyFrom c79de17014753b311858b4570ca475f713328c62 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:24:55 +0300
Subject: [PATCH v2 4/4] pg_prewarm: refactor autoprewarm waits

Previously it was dumping twice at every timeout time.
---
 contrib/pg_prewarm/autoprewarm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index e5bd130bc8..872c7d51b1 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -236,7 +236,9 @@ autoprewarm_main(Datum main_arg)
 			{
 last_dump_time = GetCurrentTimestamp();
 apw_dump_now(true, false);
-continue;
+
+/* We have to sleep even after a successful dump */
+delay_in_ms = autoprewarm_interval * 1000;
 			}
 
 			/* Sleep until the next dump time. */
-- 
2.19.1

From c38c07708d57d6dec5a8a1697ca9c9810ad4d7ce Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:12:00 +0300
Subject: [PATCH v2 3/4] pg_prewarm: fix autoprewarm_interval behaviour.

Previously it misused seconds from TimestampDifference() as
milliseconds, so it was dumping autoprewarm.blocks ~every second
event with default autoprewarm_interval = 300s.
---
 contrib/pg_prewarm/autoprewarm.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index d3dec6e3ec..e5bd130bc8 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -222,16 +222,14 @@ autoprewarm_main(Datum main_arg)
 		{
 			long		delay_in_ms = 0;
 			TimestampTz next_dump_time = 0;
-			long		secs = 0;
-			int			usecs = 0;
 
 			/* Compute the next dump time. */
 			next_dump_time =
 TimestampTzPlusMilliseconds(last_dump_time,
 			autoprewarm_interval * 

Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-09 Thread Tom Lane
Alexey Kondratov  writes:
> After fixing this issue I have noticed that it still dumps blocks twice 
> at each timeout (here I set autoprewarm_interval to 15s):
> ...
> This happens because at timeout time we were using continue, but 
> actually we still have to wait the entire autoprewarm_interval after 
> successful dumping.

I don't think your 0001 is correct.  It would be okay if apw_dump_now()
could be counted on to take negligible time, but we shouldn't assume
that should we?

I agree that the "continue" seems a bit bogus, because it's skipping
the ResetLatch call at the bottom of the loop; it's not quite clear
to me whether that's a good thing or not.  But the general idea of
the existing code seems to be to loop around and make a fresh calculation
of how-long-to-wait, and that doesn't seem wrong.

0002 seems like a pretty clear bug fix, though I wonder if this is exactly
what we want to do going forward.  It seems like a very large fraction of
the callers of TimestampDifference would like to have the value in msec,
which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again.  Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).

BTW, I see another bug of a related ilk.  Look what
postgres_fdw/connection.c is doing:

TimestampDifference(now, endtime, , );

/* To protect against clock skew, limit sleep to one minute. */
cur_timeout = Min(6, secs * USECS_PER_SEC + microsecs);

/* Sleep until there's something to do */
wc = WaitLatchOrSocket(MyLatch,
   WL_LATCH_SET | WL_SOCKET_READABLE |
   WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
   PQsocket(conn),
   cur_timeout, PG_WAIT_EXTENSION);

WaitLatchOrSocket's timeout is measured in msec not usec.  I think the
comment about "clock skew" is complete BS, and the Min() calculation was
put in as a workaround by somebody observing that the sleep waited too
long, but not understanding why.

regards, tom lane




Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-09 Thread Jeevan Ladhe
Hi,

On Mon, Nov 9, 2020 at 10:15 PM Alexey Kondratov 
wrote:

> Hi Hackers,
>
> Today I have accidentally noticed that autoprewarm feature of pg_prewarm
> used TimestampDifference()'s results in a wrong way.
>
> First, it used *seconds* result from it as a *milliseconds*. It was
> causing it to make dump file autoprewarm.blocks ~every second with
> default setting of autoprewarm_interval = 300s.
>
>
I had a look at this, and I agree that this is an issue. I also had a look
at
the patch 0002, and the patch looks good to me.

In patch 0003 there is a typo:

+ /* We have to sleep even after a successfull dump */

s/successfull/successful


Regards,
Jeevan Ladhe


Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-09 Thread Alexey Kondratov

Hi Hackers,

Today I have accidentally noticed that autoprewarm feature of pg_prewarm 
used TimestampDifference()'s results in a wrong way.


First, it used *seconds* result from it as a *milliseconds*. It was 
causing it to make dump file autoprewarm.blocks ~every second with 
default setting of autoprewarm_interval = 300s.


Here is a log part with debug output in this case:

```
2020-11-09 19:09:00.162 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:01.161 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:02.160 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:03.159 MSK [85328] LOG:  dumping autoprewarm.blocks
```

After fixing this issue I have noticed that it still dumps blocks twice 
at each timeout (here I set autoprewarm_interval to 15s):


```
2020-11-09 19:18:59.692 MSK [85662] LOG:  dumping autoprewarm.blocks
2020-11-09 19:18:59.700 MSK [85662] LOG:  dumping autoprewarm.blocks

2020-11-09 19:19:14.694 MSK [85662] LOG:  dumping autoprewarm.blocks
2020-11-09 19:19:14.704 MSK [85662] LOG:  dumping autoprewarm.blocks
```

This happens because at timeout time we were using continue, but 
actually we still have to wait the entire autoprewarm_interval after 
successful dumping.


I have fixed both issues in the attached patches and also added a 
minimalistic tap test as a first one to verify that this automatic 
damping still works after refactoring. I put Robert into CC, since he is 
an author of this feature.


What do you think?


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres CompanyFrom 6d4bab7f21c3661dd4dd5a0de7e097b1de3f642c Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:24:55 +0300
Subject: [PATCH v1 3/3] pg_prewarm: refactor autoprewarm waits

Previously it was dumping twice at every timeout time.
---
 contrib/pg_prewarm/autoprewarm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index b18a065ed5..f52c83de1e 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -238,7 +238,9 @@ autoprewarm_main(Datum main_arg)
 			{
 last_dump_time = GetCurrentTimestamp();
 apw_dump_now(true, false);
-continue;
+
+/* We have to sleep even after a successfull dump */
+delay_in_ms = autoprewarm_interval * 1000;
 			}
 
 			/* Sleep until the next dump time. */
-- 
2.19.1

From 8793b8beb6a5c1ae730f1fffb09dff64c83bc631 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:12:00 +0300
Subject: [PATCH v1 2/3] pg_prewarm: fix autoprewarm_interval behaviour.

Previously it misused seconds from TimestampDifference() as
milliseconds, so it was dumping autoprewarm.blocks ~every second
event with default autoprewarm_interval = 300s.
---
 contrib/pg_prewarm/autoprewarm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index d3dec6e3ec..b18a065ed5 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -231,7 +231,7 @@ autoprewarm_main(Datum main_arg)
 			autoprewarm_interval * 1000);
 			TimestampDifference(GetCurrentTimestamp(), next_dump_time,
 , );
-			delay_in_ms = secs + (usecs / 1000);
+			delay_in_ms = secs * 1000 + (usecs / 1000);
 
 			/* Perform a dump if it's time. */
 			if (delay_in_ms <= 0)
-- 
2.19.1

From 31dc30c97861afae9c34852afc5a5b1c91bbeadc Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:04:10 +0300
Subject: [PATCH v1 1/3] pg_prewarm: add tap test for autoprewarm feature

---
 contrib/pg_prewarm/Makefile |  2 +
 contrib/pg_prewarm/t/001_autoprewarm.pl | 51 +
 2 files changed, 53 insertions(+)
 create mode 100644 contrib/pg_prewarm/t/001_autoprewarm.pl

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index b13ac3c813..9cfde8c4e4 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -10,6 +10,8 @@ EXTENSION = pg_prewarm
 DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/pg_prewarm/t/001_autoprewarm.pl b/contrib/pg_prewarm/t/001_autoprewarm.pl
new file mode 100644
index 00..b564c29931
--- /dev/null
+++ b/contrib/pg_prewarm/t/001_autoprewarm.pl
@@ -0,0 +1,51 @@
+#
+# Check that pg_prewarm can dump blocks from shared buffers
+# to PGDATA/autoprewarm.blocks.
+#
+
+use strict;
+use Test::More;
+use TestLib;
+use Time::HiRes qw(usleep);
+use warnings;
+
+use PostgresNode;
+
+plan tests => 3;
+
+my $node = get_new_node("node");
+$node->init;
+$node->append_conf(
+'postgresql.conf', qq(
+shared_preload_libraries = 'pg_prewarm'
+pg_prewarm.autoprewarm = 'on'