Re: cleanup temporary files after crash

2021-06-24 Thread Michael Paquier
On Wed, Jun 23, 2021 at 08:46:12AM +0900, Michael Paquier wrote:
> Looks appropriate to me.

Applied as it was still on the stack.  Please note that
GUC_NOT_IN_SAMPLE was missing.
--
Michael




Re: cleanup temporary files after crash

2021-06-22 Thread Michael Paquier
On Tue, Jun 22, 2021 at 04:12:51PM -0300, Euler Taveira wrote:
> Was it? I seem to have missed this suggestion.
> 
> I'm attaching a patch to fix it.

Looks appropriate to me.
--
Michael




Re: cleanup temporary files after crash

2021-06-22 Thread Euler Taveira
On Fri, Jun 11, 2021, at 9:43 PM, Justin Pryzby wrote:
> On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> > > > The current behavior is only useful for debugging purposes.
> 
> On Wed, 28 Oct 2020 at 15:42, Tomas Vondra  
> wrote:
> > > One thing I'm not sure about is whether we should have the GUC as
> > > proposed, or have a negative "keep_temp_files_after_restart" defaulting
> > > to false. But I don't have a very good justification for the alternative
> > > other than vague personal preference.
> 
> On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> > I thought about not providing a GUC at all or providing it in the developer
> > section. I've never heard someone say that they use those temporary
> > files to investigate an issue. Regarding a crash, all information is already
> > available and temporary files don't provide extra details. This new GUC is
> > just to keep the previous behavior. I'm fine without the GUC, though.
> 
> Should this GUC be classified as a developer option, and removed from
> postgresql.conf.sample?
It probably should.
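
For what it's worth, since the GUC is PGC_SIGHUP, moving it out of the sample
file doesn't change how it is set. A hypothetical sketch of disabling it at
runtime (assuming the setting name from the patch):

```sql
-- Keep temporary files around after a backend crash, for debugging.
ALTER SYSTEM SET remove_temp_files_after_crash = off;
-- PGC_SIGHUP: a reload is enough, no restart required.
SELECT pg_reload_conf();
```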

> That was discussed initially in October but not since.
Was it? I seem to have missed this suggestion.

I'm attaching a patch to fix it.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From fbfd63fcb0d75045b56576cb220e7caea6af9e92 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Tue, 22 Jun 2021 14:39:56 -0300
Subject: [PATCH v1] Move new GUC remove_temp_files_after_crash to another
 section

The appropriate section for GUC remove_temp_files_after_crash is
probably Developer Options since the main goal of the feature is to
allow inspecting temporary files for debugging purposes. It was an
oversight in commit cd91de0d. Per report from Justin Pryzby.
---
 doc/src/sgml/config.sgml  | 41 +--
 src/backend/utils/misc/guc.c  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 -
 3 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index eeba2caa43..f5a753e589 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9863,28 +9863,6 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
- 
-  remove_temp_files_after_crash (boolean)
-  
-   remove_temp_files_after_crash configuration parameter
-  
-  
-  
-   
-When set to on, which is the default,
-PostgreSQL will automatically remove
-temporary files after a backend crash. If disabled, the files will be
-retained and may be used for debugging, for example. Repeated crashes
-may however result in accumulation of useless files.
-   
-
-   
-This parameter can only be set in the postgresql.conf
-file or on the server command line.
-   
-  
- 
-
  
   data_sync_retry (boolean)
   
@@ -10912,6 +10890,25 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
   
  
 
+ 
+  remove_temp_files_after_crash (boolean)
+  
+   remove_temp_files_after_crash configuration parameter
+  
+  
+  
+   
+When set to on, which is the default,
+PostgreSQL will automatically remove
+temporary files after a backend crash. If disabled, the files will be
+retained and may be used for debugging, for example. Repeated crashes
+may however result in accumulation of useless files. This parameter 
+can only be set in the postgresql.conf file or on
+the server command line.
+   
+  
+ 
+
 
   
   
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 68b62d523d..d8b8f4a572 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1404,7 +1404,7 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
-		{"remove_temp_files_after_crash", PGC_SIGHUP, ERROR_HANDLING_OPTIONS,
+		{"remove_temp_files_after_crash", PGC_SIGHUP, DEVELOPER_OPTIONS,
 			gettext_noop("Remove temporary files after backend crash."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ddbb6dc2be..a5a7174b0e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -768,8 +768,6 @@
 
 #exit_on_error = off			# terminate session on any error?
 #restart_after_crash = on		# reinitialize after backend crash?
-#remove_temp_files_after_crash = on	# remove temporary files after
-	# backend crash?
 #data_sync_retry = off			# retry or panic on failure to fsync
 	# data?
 	# (change requires restart)
-- 
2.20.1



Re: cleanup temporary files after crash

2021-06-11 Thread Justin Pryzby
On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> > > The current behavior is only useful for debugging purposes.

On Wed, 28 Oct 2020 at 15:42, Tomas Vondra  wrote:
> > One thing I'm not sure about is whether we should have the GUC as
> > proposed, or have a negative "keep_temp_files_after_restart" defaulting
> > to false. But I don't have a very good justification for the alternative
> > other than vague personal preference.

On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> I thought about not providing a GUC at all or providing it in the developer
> section. I've never heard someone say that they use those temporary
> files to investigate an issue. Regarding a crash, all information is already
> available and temporary files don't provide extra details. This new GUC is
> just to keep the previous behavior. I'm fine without the GUC, though.

Should this GUC be classified as a developer option, and removed from
postgresql.conf.sample?

That was discussed initially in October but not since.

On Wed, Oct 28, 2020 at 11:16:26AM -0300, Euler Taveira wrote:
> It also has an undesirable behavior: you have to restart the service to
> reclaim the space.

BTW, that's not really true - you can remove the tempfiles while the server is
running.  The complainant in bug#16427 was being careful - which is good.
I watch for and remove tempfiles older than 2 days.  The worst consequence of
removing a tempfile would be a failed query.  
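
A sketch of one way to spot such stale files from SQL (pg_ls_tmpdir() exists
in PostgreSQL 12 and later; the two-day cutoff mirrors the policy above, and
actual removal still has to happen at the filesystem level):

```sql
-- List temporary files in the default tablespace older than two days.
SELECT name, size, modification
FROM pg_ls_tmpdir()
WHERE modification < now() - interval '2 days';
```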

-- 
Justin




Re: cleanup temporary files after crash

2021-03-19 Thread Euler Taveira
On Fri, Mar 19, 2021, at 2:27 PM, Tomas Vondra wrote:
> I've pushed this version, with the plpgsql block. I've tested it on all
> the machines I have here, hopefully it'll make buildfarm happy too.
Thanks for taking care of this issue.

> AFAICS I was mistaken about what the pump() functions do - it clearly
> does not run the command repeatedly, it just waits for the right output
> to appear. So a busy loop in plpgsql seems like a reasonable solution.
> Perhaps there's a better way to do this in TAP, not sure.
Writing that fragile test took me three times longer than the code itself. :-/

> My brain hurts from reading too much Perl today ...
I feel the same when I have to deal with Perl code.

It seems the animals are happy with this fix.

--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: cleanup temporary files after crash

2021-03-19 Thread Tomas Vondra
On 3/19/21 5:26 PM, Tomas Vondra wrote:
> On 3/19/21 5:23 PM, Tomas Vondra wrote:
>> ...
>>
>> If I replace this with a wait loop in a plpgsql block, that works
>> perfectly fine (no infinite loops). Tested both on x86_64 and rpi.
>>
> 
> For the record, here's the version with plpgsql block, which seems to be
> working just fine.
> 

I've pushed this version, with the plpgsql block. I've tested it on all
the machines I have here, hopefully it'll make buildfarm happy too.

AFAICS I was mistaken about what the pump() functions do - it clearly
does not run the command repeatedly, it just waits for the right output
to appear. So a busy loop in plpgsql seems like a reasonable solution.
Perhaps there's a better way to do this in TAP, not sure.

My brain hurts from reading too much Perl today ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-19 Thread Tomas Vondra
On 3/19/21 5:23 PM, Tomas Vondra wrote:
> ...
>
> If I replace this with a wait loop in a plpgsql block, that works
> perfectly fine (no infinite loops). Tested both on x86_64 and rpi.
>

For the record, here's the version with plpgsql block, which seems to be
working just fine.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
index 8044849b73..c5624fe864 100644
--- a/src/test/recovery/t/022_crash_temp_files.pl
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -79,8 +79,8 @@ my $killme2 = IPC::Run::start(
 # Insert one tuple and leave the transaction open
 $killme_stdin2 .= q[
 BEGIN;
-SELECT $$insert-tuple-to-lock-next-insert$$;
 INSERT INTO tab_crash (a) VALUES(1);
+SELECT $$insert-tuple-to-lock-next-insert$$;
 ];
 pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m);
 $killme_stdout2 = '';
@@ -100,6 +100,26 @@ ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Wait until the batch insert gets stuck on the lock.
+$killme_stdin2 .= q[
+DO $c$
+DECLARE
+  c INT;
+BEGIN
+  LOOP
+SELECT COUNT(*) INTO c FROM pg_locks WHERE pid = ] . $pid . q[ AND NOT granted;
+IF c > 0 THEN
+  EXIT;
+END IF;
+  END LOOP;
+END; $c$;
+SELECT $$insert-tuple-lock-waiting$$;
+];
+
+pump_until($killme2, \$killme_stdout2, qr/insert-tuple-lock-waiting/m);
+$killme_stdout2 = '';
+$killme_stderr2 = '';
+
 # Kill with SIGKILL
 my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');
@@ -147,8 +167,8 @@ $killme2->run();
 # Insert one tuple and leave the transaction open
 $killme_stdin2 .= q[
 BEGIN;
-SELECT $$insert-tuple-to-lock-next-insert$$;
 INSERT INTO tab_crash (a) VALUES(1);
+SELECT $$insert-tuple-to-lock-next-insert$$;
 ];
 pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m);
 $killme_stdout2 = '';
@@ -168,6 +188,26 @@ ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Wait until the batch insert gets stuck on the lock.
+$killme_stdin2 .= q[
+DO $c$
+DECLARE
+  c INT;
+BEGIN
+  LOOP
+SELECT COUNT(*) INTO c FROM pg_locks WHERE pid = ] . $pid . q[ AND NOT granted;
+IF c > 0 THEN
+  EXIT;
+END IF;
+  END LOOP;
+END; $c$;
+SELECT $$insert-tuple-lock-waiting$$;
+];
+
+pump_until($killme2, \$killme_stdout2, qr/insert-tuple-lock-waiting/m);
+$killme_stdout2 = '';
+$killme_stderr2 = '';
+
 # Kill with SIGKILL
 $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');



Re: cleanup temporary files after crash

2021-03-19 Thread Tomas Vondra


On 3/19/21 3:17 PM, Euler Taveira wrote:
> On Fri, Mar 19, 2021, at 12:23 AM, Tom Lane wrote:
>> [ reads code... ]
>> ... no, I think the problem is the test is still full of race conditions.
>>
>> In the first place, waiting till you see the output of a SELECT that's
>> before the useful query is not enough to guarantee that the useful query
>> is done, or even started.  That's broken on both sessions.
> That's an ugly and fragile mechanism to work around the fact that pump_until
> only reacts after the query returns.
> 
>> In the second place, even if the second INSERT has started, you don't know
>> that it's reached the point of blocking on the tuple conflict yet.
>> Which in turn means that it might not've filled its tuplestore yet.
>>
>> In short, this script is designed to ensure that the test query can't
>> finish too soon, but that proves nothing about whether the test query
>> has even started.  And since you also haven't really guaranteed that the
>> intended-to-be-blocking query is done, I don't think that the first
>> condition really holds either.
> In order to avoid the race condition between filling the tuplestore and
> killing the backend, we could use poll_query_until() before SIGKILL to wait
> for the temporary file to be created. Do you think this modification will
> make this test more stable?
> 

Wow, I thought I understood the Perl code, but clearly I was wrong.

I think the solution is not particularly difficult.

For the initial insert, it's fine to switch the order of the SQL
commands, so that the insert happens first, and only then emit the string.

For the second insert, we can't do that, because it's expected to block
on the lock. So we have to verify that by looking at the lock directly.
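
Looking at the lock directly amounts to a query along these lines (12345
stands in for the backend PID the test has already captured):

```sql
-- An ungranted entry here means the backend is stuck waiting on the lock.
SELECT locktype, mode, granted
FROM pg_locks
WHERE pid = 12345 AND NOT granted;
```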

I however don't understand what the pump_until function does. AFAICS it
should run the query in a loop, and bail out once it matches the
expected output. But that does not seem to happen, so when the query
gets executed before the lock is there, it results in an infinite loop.

In the attached patch I've simulated this by random() < 0.5.

If I replace this with a wait loop in a plpgsql block, that works
perfectly fine (no infinite loops). Tested both on x86_64 and rpi.



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
index 8044849b73..88478d44c1 100644
--- a/src/test/recovery/t/022_crash_temp_files.pl
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -79,8 +79,8 @@ my $killme2 = IPC::Run::start(
 # Insert one tuple and leave the transaction open
 $killme_stdin2 .= q[
 BEGIN;
-SELECT $$insert-tuple-to-lock-next-insert$$;
 INSERT INTO tab_crash (a) VALUES(1);
+SELECT $$insert-tuple-to-lock-next-insert$$;
 ];
 pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m);
 $killme_stdout2 = '';
@@ -100,6 +100,15 @@ ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Wait until the batch insert gets stuck on the lock.
+$killme_stdin2 .= q[
+SELECT $$insert-tuple-lock-waiting$$ FROM pg_locks WHERE pid = ] . $pid . q[ AND NOT granted AND random() < 0.5;
+];
+
+pump_until($killme2, \$killme_stdout2, qr/insert-tuple-lock-waiting/m);
+$killme_stdout2 = '';
+$killme_stderr2 = '';
+
 # Kill with SIGKILL
 my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');
@@ -147,8 +156,8 @@ $killme2->run();
 # Insert one tuple and leave the transaction open
 $killme_stdin2 .= q[
 BEGIN;
-SELECT $$insert-tuple-to-lock-next-insert$$;
 INSERT INTO tab_crash (a) VALUES(1);
+SELECT $$insert-tuple-to-lock-next-insert$$;
 ];
 pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m);
 $killme_stdout2 = '';
@@ -168,6 +177,15 @@ ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Wait until the batch insert gets stuck on the lock.
+$killme_stdin2 .= q[
+SELECT $$insert-tuple-lock-waiting$$ FROM pg_locks WHERE pid = ] . $pid . q[ AND NOT granted AND random() < 0.5;
+];
+
+pump_until($killme2, \$killme_stdout2, qr/insert-tuple-lock-waiting/m);
+$killme_stdout2 = '';
+$killme_stderr2 = '';
+
 # Kill with SIGKILL
 $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');


Re: cleanup temporary files after crash

2021-03-19 Thread Euler Taveira
On Fri, Mar 19, 2021, at 12:23 AM, Tom Lane wrote:
> [ reads code... ]
> ... no, I think the problem is the test is still full of race conditions.
> 
> In the first place, waiting till you see the output of a SELECT that's
> before the useful query is not enough to guarantee that the useful query
> is done, or even started.  That's broken on both sessions.
That's an ugly and fragile mechanism to work around the fact that pump_until
only reacts after the query returns.

> In the second place, even if the second INSERT has started, you don't know
> that it's reached the point of blocking on the tuple conflict yet.
> Which in turn means that it might not've filled its tuplestore yet.
> 
> In short, this script is designed to ensure that the test query can't
> finish too soon, but that proves nothing about whether the test query
> has even started.  And since you also haven't really guaranteed that the
> intended-to-be-blocking query is done, I don't think that the first
> condition really holds either.
In order to avoid the race condition between filling the tuplestore and
killing the backend, we could use poll_query_until() before SIGKILL to wait
for the temporary file to be created. Do you think this modification will
make this test more stable?


--
Euler Taveira
EDB   https://www.enterprisedb.com/
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
index 8044849b73..41a91ebd06 100644
--- a/src/test/recovery/t/022_crash_temp_files.pl
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -100,6 +100,11 @@ ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Wait till a temporary file is created
+$node->poll_query_until(
+	'postgres',
+	'SELECT COUNT(1) FROM pg_ls_dir($$base/pgsql_tmp$$)', '1');
+
 # Kill with SIGKILL
 my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');
@@ -168,6 +173,11 @@ ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Wait till a temporary file is created
+$node->poll_query_until(
+	'postgres',
+	'SELECT COUNT(1) FROM pg_ls_dir($$base/pgsql_tmp$$)', '1');
+
 # Kill with SIGKILL
 $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');


Re: cleanup temporary files after crash

2021-03-18 Thread Tom Lane
[ reads code... ]
... no, I think the problem is the test is still full of race conditions.

In the first place, waiting till you see the output of a SELECT that's
before the useful query is not enough to guarantee that the useful query
is done, or even started.  That's broken on both sessions.

In the second place, even if the second INSERT has started, you don't know
that it's reached the point of blocking on the tuple conflict yet.
Which in turn means that it might not've filled its tuplestore yet.

In short, this script is designed to ensure that the test query can't
finish too soon, but that proves nothing about whether the test query
has even started.  And since you also haven't really guaranteed that the
intended-to-be-blocking query is done, I don't think that the first
condition really holds either.

regards, tom lane




Re: cleanup temporary files after crash

2021-03-18 Thread Tomas Vondra
On 3/19/21 3:57 AM, Tom Lane wrote:
> Tomas Vondra  writes:
>> So crake and florican seem to be happy now. Not sure about lapwing yet.
>> But interestingly enough, prion and curculio got unhappy. They worked
>> fine with the older test, but now it fails with the "no such file or
>> directory" message. I wonder what makes them different from the other
>> x86_64 machines ...
> 
> I'm confused about why the new query would use a temp file at all.
> Maybe they aren't using the same plan?
> 

Because generate_series() stashes the results into a tuplestore, and the
test sets work_mem to just 64kB. Maybe I'm missing some detail, though.

The plan is extremely simple - just Function Scan feeding data into an
Insert, not sure what other plan could be used.
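
For the record, the spill is easy to reproduce by hand; a sketch under the
test's assumptions (the tab_crash table from the script, and EXPLAIN output
details that may vary across versions):

```sql
SET work_mem = '64kB';
-- The Function Scan materializes generate_series() in a tuplestore;
-- with work_mem this small, the tuplestore spills to a temporary file.
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO tab_crash (a) SELECT i FROM generate_series(1, 5000) s(i);
```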


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-18 Thread Tom Lane
Tomas Vondra  writes:
> So crake and florican seem to be happy now. Not sure about lapwing yet.
> But interestingly enough, prion and curculio got unhappy. They worked
> fine with the older test, but now it fails with the "no such file or
> directory" message. I wonder what makes them different from the other
> x86_64 machines ...

I'm confused about why the new query would use a temp file at all.
Maybe they aren't using the same plan?

regards, tom lane




Re: cleanup temporary files after crash

2021-03-18 Thread Tomas Vondra
On 3/19/21 2:21 AM, Tomas Vondra wrote:
> On 3/19/21 1:54 AM, Euler Taveira wrote:
>> On Thu, Mar 18, 2021, at 8:34 PM, Tomas Vondra wrote:
>>> Well, that's better, but it still does not do the trick on the 32-bit
>>> machine - in that case 1000 rows with int4 still fit into work_mem, so
>>> the temp file is not created. Per my experiments about 1040 rows are
>>> needed - so close ;-) So let's make it 2000.
>> My 32-bit laptop needs some repairs so I blindly chose 1k rows.
>>
> 
> Yeah, it's not immediately obvious how many rows are needed. 2000 would
> be enough, but I just bumped it up to 5000 for good measure.
> 
>>> We might as well check that the temp file actually exists, before
>>> killing the backend. Just to be sure.
>> Do you mean with remove_temp_files_after_crash = on? New version attached.
>>
> 
> On second thought, we don't need that check. I realized the second part
> of the test (checking remove_temp_files_after_crash=off) actually does
> verify that we've created the file, so that should be OK.
> 
> I've pushed the previous version with the 5000 rows. Let's see what
> crake and florican think about this ...
> 

So crake and florican seem to be happy now. Not sure about lapwing yet.

But interestingly enough, prion and curculio got unhappy. They worked
fine with the older test, but now it fails with the "no such file or
directory" message. I wonder what makes them different from the other
x86_64 machines ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-18 Thread Tomas Vondra
On 3/19/21 1:54 AM, Euler Taveira wrote:
> On Thu, Mar 18, 2021, at 8:34 PM, Tomas Vondra wrote:
>> Well, that's better, but it still does not do the trick on the 32-bit
>> machine - in that case 1000 rows with int4 still fit into work_mem, so
>> the temp file is not created. Per my experiments about 1040 rows are
>> needed - so close ;-) So let's make it 2000.
> My 32-bit laptop needs some repairs so I blindly chose 1k rows.
> 

Yeah, it's not immediately obvious how many rows are needed. 2000 would
be enough, but I just bumped it up to 5000 for good measure.

>> We might as well check that the temp file actually exists, before
>> killing the backend. Just to be sure.
> Do you mean with remove_temp_files_after_crash = on? New version attached.
> 

On second thought, we don't need that check. I realized the second part
of the test (checking remove_temp_files_after_crash=off) actually does
verify that we've created the file, so that should be OK.

I've pushed the previous version with the 5000 rows. Let's see what
crake and florican think about this ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-18 Thread Euler Taveira
On Thu, Mar 18, 2021, at 8:34 PM, Tomas Vondra wrote:
> Well, that's better, but it still does not do the trick on the 32-bit
> machine - in that case 1000 rows with int4 still fit into work_mem, so
> the temp file is not created. Per my experiments about 1040 rows are
> needed - so close ;-) So let's make it 2000.
My 32-bit laptop needs some repairs so I blindly chose 1k rows.

> We might as well check that the temp file actually exists, before
> killing the backend. Just to be sure.
Do you mean with remove_temp_files_after_crash = on? New version attached.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
index c37b227770..38e935d641 100644
--- a/src/test/recovery/t/022_crash_temp_files.pl
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -5,9 +5,8 @@ use PostgresNode;
 use TestLib;
 use Test::More;
 use Config;
-use Time::HiRes qw(usleep);
 
-plan tests => 9;
+plan tests => 10;
 
 
 # To avoid hanging while expecting some specific input from a psql
@@ -33,8 +32,7 @@ $node->safe_psql(
 # create table, insert rows
 $node->safe_psql(
 	'postgres',
-	q[CREATE TABLE tab_crash (a text);
-		INSERT INTO tab_crash (a) SELECT gen_random_uuid() FROM generate_series(1, 500);]);
+	q[CREATE TABLE tab_crash (a integer UNIQUE);]);
 
 # Run psql, keeping session alive, so we have an alive backend to kill.
 my ($killme_stdin, $killme_stdout, $killme_stderr) = ('', '', '');
@@ -62,6 +60,32 @@ chomp($pid);
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Open a 2nd session that will block the 1st session. The UNIQUE constraint
+# will prevent the temporary file from the 1st session to be removed.
+my ($killme_stdin2, $killme_stdout2, $killme_stderr2) = ('', '', '');
+my $killme2 = IPC::Run::start(
+	[
+		'psql', '-X', '-qAt', '-v', 'ON_ERROR_STOP=1', '-f', '-', '-d',
+		$node->connstr('postgres')
+	],
+	'<',
+	\$killme_stdin2,
+	'>',
+	\$killme_stdout2,
+	'2>',
+	\$killme_stderr2,
+	$psql_timeout);
+
+# Insert one tuple and leave the transaction open
+$killme_stdin2 .= q[
+BEGIN;
+SELECT $$insert-tuple-to-lock-next-insert$$;
+INSERT INTO tab_crash (a) VALUES(1);
+];
+pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m);
+$killme_stdout2 = '';
+$killme_stderr2 = '';
+
 # Run the query that generates a temporary file and that will be killed before
 # it finishes. Since the query that generates the temporary file does not
 # return before the connection is killed, use a SELECT before to trigger
@@ -69,15 +93,18 @@ $killme_stderr = '';
 $killme_stdin .= q[
 BEGIN;
 SELECT $$in-progress-before-sigkill$$;
-WITH foo AS (SELECT a FROM tab_crash ORDER BY a) SELECT a, pg_sleep(1) FROM foo;
+INSERT INTO tab_crash (a) SELECT i FROM generate_series(1, 2000) s(i);
 ];
 ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
-	'select in-progress-before-sigkill');
+	'insert in-progress-before-sigkill');
 $killme_stdout = '';
 $killme_stderr = '';
 
-# Wait some time so the temporary file is generated by SELECT
-usleep(10_000);
+# Check for the existence of a temporary file
+is($node->safe_psql(
+	'postgres',
+	'SELECT COUNT(1) FROM pg_ls_dir($$base/pgsql_tmp$$)'),
+	qq(1), 'one temporary file');
 
 # Kill with SIGKILL
 my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
@@ -85,6 +112,7 @@ is($ret, 0, 'killed process with KILL');
 
 # Close psql session
 $killme->finish;
+$killme2->finish;
 
 # Wait till server restarts
 $node->poll_query_until('postgres', 'SELECT 1', '1');
@@ -118,6 +146,20 @@ chomp($pid);
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Restart the 2nd psql session
+($killme_stdin2, $killme_stdout2, $killme_stderr2) = ('', '', '');
+$killme2->run();
+
+# Insert one tuple and leave the transaction open
+$killme_stdin2 .= q[
+BEGIN;
+SELECT $$insert-tuple-to-lock-next-insert$$;
+INSERT INTO tab_crash (a) VALUES(1);
+];
+pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m);
+$killme_stdout2 = '';
+$killme_stderr2 = '';
+
 # Run the query that generates a temporary file and that will be killed before
 # it finishes. Since the query that generates the temporary file does not
 # return before the connection is killed, use a SELECT before to trigger
@@ -125,22 +167,20 @@ $killme_stderr = '';
 $killme_stdin .= q[
 BEGIN;
 SELECT $$in-progress-before-sigkill$$;
-WITH foo AS (SELECT a FROM tab_crash ORDER BY a) SELECT a, pg_sleep(1) FROM foo;
+INSERT INTO tab_crash (a) SELECT i FROM generate_series(1, 2000) s(i);
 ];
 ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
-	'select in-progress-before-sigkill');
+	'insert in-progress-before-sigkill');
 $killme_stdout = '';
 $killme_stderr = '';
 
-# Wait some time so the temporary file is generated by SELECT
-usleep(10_000);
-
 # Kill with SIGKILL
 $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');
 
 # Close psql 

Re: cleanup temporary files after crash

2021-03-18 Thread Tomas Vondra
On 3/18/21 10:44 PM, Euler Taveira wrote:
> On Thu, Mar 18, 2021, at 6:00 PM, Euler Taveira wrote:
>> On Thu, Mar 18, 2021, at 5:51 PM, Tomas Vondra wrote:
>>> OK. Can you prepare a patch with the proposed test approach?
>> I'm on it.
> What do you think about this patch?
> 

Well, that's better, but it still does not do the trick on the 32-bit
machine - in that case 1000 rows with int4 still fit into work_mem, so
the temp file is not created. Per my experiments about 1040 rows are
needed - so close ;-) So let's make it 2000.

We might as well check that the temp file actually exists, before
killing the backend. Just to be sure.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-18 Thread Euler Taveira
On Thu, Mar 18, 2021, at 6:00 PM, Euler Taveira wrote:
> On Thu, Mar 18, 2021, at 5:51 PM, Tomas Vondra wrote:
>> OK. Can you prepare a patch with the proposed test approach?
> I'm on it.
What do you think about this patch?


--
Euler Taveira
EDB   https://www.enterprisedb.com/
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
index c37b227770..68e6fc20ad 100644
--- a/src/test/recovery/t/022_crash_temp_files.pl
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -5,7 +5,6 @@ use PostgresNode;
 use TestLib;
 use Test::More;
 use Config;
-use Time::HiRes qw(usleep);
 
 plan tests => 9;
 
@@ -33,8 +32,7 @@ $node->safe_psql(
 # create table, insert rows
 $node->safe_psql(
 	'postgres',
-	q[CREATE TABLE tab_crash (a text);
-		INSERT INTO tab_crash (a) SELECT gen_random_uuid() FROM generate_series(1, 500);]);
+	q[CREATE TABLE tab_crash (a integer UNIQUE);]);
 
 # Run psql, keeping session alive, so we have an alive backend to kill.
 my ($killme_stdin, $killme_stdout, $killme_stderr) = ('', '', '');
@@ -62,6 +60,32 @@ chomp($pid);
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Open a 2nd session that will block the 1st session. The UNIQUE constraint
+# will prevent the temporary file from the 1st session to be removed.
+my ($killme_stdin2, $killme_stdout2, $killme_stderr2) = ('', '', '');
+my $killme2 = IPC::Run::start(
+	[
+		'psql', '-X', '-qAt', '-v', 'ON_ERROR_STOP=1', '-f', '-', '-d',
+		$node->connstr('postgres')
+	],
+	'<',
+	\$killme_stdin2,
+	'>',
+	\$killme_stdout2,
+	'2>',
+	\$killme_stderr2,
+	$psql_timeout);
+
+# Insert one tuple and leave the transaction open
+$killme_stdin2 .= q[
+BEGIN;
+SELECT $$insert-tuple-to-lock-next-insert$$;
+INSERT INTO tab_crash (a) VALUES(1);
+];
+pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m);
+$killme_stdout2 = '';
+$killme_stderr2 = '';
+
 # Run the query that generates a temporary file and that will be killed before
 # it finishes. Since the query that generates the temporary file does not
 # return before the connection is killed, use a SELECT before to trigger
@@ -69,22 +93,20 @@ $killme_stderr = '';
 $killme_stdin .= q[
 BEGIN;
 SELECT $$in-progress-before-sigkill$$;
-WITH foo AS (SELECT a FROM tab_crash ORDER BY a) SELECT a, pg_sleep(1) FROM foo;
+INSERT INTO tab_crash (a) SELECT i FROM generate_series(1, 1000) s(i);
 ];
 ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
-	'select in-progress-before-sigkill');
+	'insert in-progress-before-sigkill');
 $killme_stdout = '';
 $killme_stderr = '';
 
-# Wait some time so the temporary file is generated by SELECT
-usleep(10_000);
-
 # Kill with SIGKILL
 my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');
 
 # Close psql session
 $killme->finish;
+$killme2->finish;
 
 # Wait till server restarts
 $node->poll_query_until('postgres', 'SELECT 1', '1');
@@ -118,6 +140,20 @@ chomp($pid);
 $killme_stdout = '';
 $killme_stderr = '';
 
+# Restart the 2nd psql session
+($killme_stdin2, $killme_stdout2, $killme_stderr2) = ('', '', '');
+$killme2->run();
+
+# Insert one tuple and leave the transaction open
+$killme_stdin2 .= q[
+BEGIN;
+SELECT $$insert-tuple-to-lock-next-insert$$;
+INSERT INTO tab_crash (a) VALUES(1);
+];
+pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m);
+$killme_stdout2 = '';
+$killme_stderr2 = '';
+
 # Run the query that generates a temporary file and that will be killed before
 # it finishes. Since the query that generates the temporary file does not
 # return before the connection is killed, use a SELECT before to trigger
@@ -125,22 +161,20 @@ $killme_stderr = '';
 $killme_stdin .= q[
 BEGIN;
 SELECT $$in-progress-before-sigkill$$;
-WITH foo AS (SELECT a FROM tab_crash ORDER BY a) SELECT a, pg_sleep(1) FROM foo;
+INSERT INTO tab_crash (a) SELECT i FROM generate_series(1, 1000) s(i);
 ];
 ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
-	'select in-progress-before-sigkill');
+	'insert in-progress-before-sigkill');
 $killme_stdout = '';
 $killme_stderr = '';
 
-# Wait some time so the temporary file is generated by SELECT
-usleep(10_000);
-
 # Kill with SIGKILL
 $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
 is($ret, 0, 'killed process with KILL');
 
 # Close psql session
 $killme->finish;
+$killme2->finish;
 
 # Wait till server restarts
 $node->poll_query_until('postgres', 'SELECT 1', '1');


Re: cleanup temporary files after crash

2021-03-18 Thread Euler Taveira
On Thu, Mar 18, 2021, at 5:51 PM, Tomas Vondra wrote:
> OK. Can you prepare a patch with the proposed test approach?
I'm on it.

> FWIW I can reproduce this on a 32-bit ARM system (rpi4), where 500 rows
> simply does not use a temp file, and with 1000 rows it works fine. On
> the x86_64 the temp file is created even with 500 rows. So there clearly
> is some platform dependency, not sure if it's due to 32/64 bits,
> alignment or something else. In any case, the 500 rows seems to be just
> on the threshold.
> 
> We need to do both - stop using the timing and increase the number of
> rows, to consistently get temp files.
Yeah.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: cleanup temporary files after crash

2021-03-18 Thread Tomas Vondra



On 3/18/21 9:06 PM, Euler Taveira wrote:
> On Thu, Mar 18, 2021, at 4:20 PM, Tomas Vondra wrote:
>> I think a better way to test this would be to use a tuple lock:
> I predicted such issues with this test. Your suggestion works for me. Maybe
> you should use fewer rows in the session 2 query.
> 
>> setup:
>>
>>   create table t (a int unique);
>>
>> session 1:
>>
>>   begin;
>>   insert into t values (1);
>>   ... keep open ...
>>
>> session 2:
>>
>>    begin;
>>    set work_mem = '64kB';
>>    insert into t select i from generate_series(1,10000) s(i);
>>    ... should block ...
>>
>> Then, once the second session gets waiting on the tuple, kill the
>> backend. We might as well test that there actually is a temp file first,
>> and then test that it disappeared.
> Your suggestion works for me. Maybe you could use fewer rows in the session 2
> query. I experimented with 1k rows and it generates a temporary file.
> 

OK. Can you prepare a patch with the proposed test approach?

FWIW I can reproduce this on a 32-bit ARM system (rpi4), where 500 rows
simply does not use a temp file, and with 1000 rows it works fine. On
the x86_64 the temp file is created even with 500 rows. So there clearly
is some platform dependency, not sure if it's due to 32/64 bits,
alignment or something else. In any case, the 500 rows seems to be just
on the threshold.

We need to do both - stop using the timing and increase the number of
rows, to consistently get temp files.
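A quick way to check, on a given platform, whether the workload actually spills to disk is an ad-hoc psql probe under the same work_mem the test sets (a sketch, not part of the patch):

```sql
-- Probe whether the TAP test's query spills to a temp file here.
SET work_mem = '64kB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT a FROM tab_crash ORDER BY a;
-- "Sort Method: external merge  Disk: ...kB" => a temp file was used;
-- "Sort Method: quicksort  Memory: ...kB"  => the row count is below the
-- in-memory threshold on this platform.

-- The cumulative temp-file counters give a second opinion:
SELECT temp_files, temp_bytes
  FROM pg_stat_database
 WHERE datname = current_database();
```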


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-18 Thread Euler Taveira
On Thu, Mar 18, 2021, at 4:20 PM, Tomas Vondra wrote:
> I think a better way to test this would be to use a tuple lock:
I predicted such issues with this test. Your suggestion works for me. Maybe
you should use fewer rows in the session 2 query.

> setup:
> 
>   create table t (a int unique);
> 
> session 1:
> 
>   begin;
>   insert into t values (1);
>   ... keep open ...
> 
> session 2:
> 
>    begin;
>    set work_mem = '64kB';
>    insert into t select i from generate_series(1,10000) s(i);
>    ... should block ...
> 
> Then, once the second session gets waiting on the tuple, kill the
> backend. We might as well test that there actually is a temp file first,
> and then test that it disappeared.
Your suggestion works for me. Maybe you could use fewer rows in the session 2
query. I experimented with 1k rows and it generates a temporary file.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: cleanup temporary files after crash

2021-03-18 Thread Tomas Vondra
On 3/18/21 6:51 PM, Tomas Vondra wrote:
> Hmm,
> 
> crake and florican seem to have failed because of this commit, with this
> error in the new TAP test:
> 
> error running SQL: 'psql:<stdin>:1: ERROR:  could not open directory
> "base/pgsql_tmp": No such file or directory'
> while running 'psql -XAtq -d port=64336 host=/tmp/sv1WjSvj3P
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT COUNT(1)
> FROM pg_ls_dir($$base/pgsql_tmp$$)' at
> /home/andrew/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
> line 1572.
> 
> So it seems the pgsql_tmp directory does not exist, for some reason.
> Considering the directory should be created for the first temp file,
> that either means the query in the TAP test does not actually create a
> temp file on those machines, or it gets killed too early.
> 
> The 500 rows used by the test seems fairly low, so maybe those machines
> can do the sort entirely in memory?
> 
> The other option is that the sleep in the TAP test is a bit too short,
> but those machines don't seem to be that slow.
> 
> Anyway, TAP test relying on timing like this may not be the best idea,
> so I wonder how else to test this ...
> 

I think a better way to test this would be to use a tuple lock:

setup:

  create table t (a int unique);

session 1:

  begin;
  insert into t values (1);
  ... keep open ...

session 2:

   begin;
   set work_mem = '64kB';
   insert into t select i from generate_series(1,10000) s(i);
   ... should block ...

Then, once the second session gets waiting on the tuple, kill the
backend. We might as well test that there actually is a temp file first,
and then test that it disappeared.
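The two assertions — a temp file exists while session 2 is blocked, and none survive the post-crash restart — could be sketched from a third session like this (assuming temp files land in the default base/pgsql_tmp directory):

```sql
-- While session 2 is blocked on the tuple lock:
SELECT count(*) > 0 AS has_temp_file
  FROM pg_ls_dir('base/pgsql_tmp') AS t(name);

-- After SIGKILL and crash recovery, with remove_temp_files_after_crash on:
SELECT count(*) AS leftover_temp_files      -- expect 0
  FROM pg_ls_dir('base/pgsql_tmp') AS t(name);
```

Beware that pg_ls_dir() raises an error when base/pgsql_tmp has never been created — the very symptom the buildfarm hit when the query did not spill.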


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-18 Thread Tomas Vondra
Hmm,

crake and florican seem to have failed because of this commit, with this
error in the new TAP test:

error running SQL: 'psql:<stdin>:1: ERROR:  could not open directory
"base/pgsql_tmp": No such file or directory'
while running 'psql -XAtq -d port=64336 host=/tmp/sv1WjSvj3P
dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT COUNT(1)
FROM pg_ls_dir($$base/pgsql_tmp$$)' at
/home/andrew/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
line 1572.

So it seems the pgsql_tmp directory does not exist, for some reason.
Considering the directory should be created for the first temp file,
that either means the query in the TAP test does not actually create a
temp file on those machines, or it gets killed too early.

The 500 rows used by the test seems fairly low, so maybe those machines
can do the sort entirely in memory?

The other option is that the sleep in the TAP test is a bit too short,
but those machines don't seem to be that slow.

Anyway, TAP test relying on timing like this may not be the best idea,
so I wonder how else to test this ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-18 Thread Tomas Vondra



On 3/17/21 2:34 AM, Euler Taveira wrote:
> On Sun, Mar 14, 2021, at 11:01 PM, Thomas Munro wrote:
>> On Wed, Mar 10, 2021 at 1:31 AM Michael Paquier wrote:
>> > On Tue, Mar 09, 2021 at 02:28:43AM +0100, Tomas Vondra wrote:
>> > > Let's move this patch forward. Based on the responses, I agree the
>> > > default behavior should be to remove the temp files, and I think we
>> > > should have the GUC (on the off chance that someone wants to preserve
>> > > the temporary files for debugging or whatever other reason).
>> >
>> > Thanks for taking care of this.  I am having some second-thoughts
>> > about changing this behavior by default, still that's much more useful
>> > this way.
>>
>> +1 for having it on by default.
>>
>> I was also just looking at this patch and came here to say LGTM except
>> for two cosmetic things, below.
> Thanks for taking a look at this patch. I'm not sure Tomas is preparing
> a new
> patch that includes the suggested modifications but I decided to do it. This
> new version has the new GUC name (using "remove"). I also replaced "cleanup"
> with "remove" in the all the remain places. As pointed by Thomas, I reworded
> the paragraph that describes the GUC moving the default information to the
> beginning of the sentence. I also added the "literal" as suggested by
> Michael.
> The postgresql.conf.sample was fixed. The tests was slightly modified. I
> reworded some comments and added a hack to avoid breaking the temporary file
> test in slow machines. A usleep() after sending the query provides some time
> for the query to create the temporary file. I used an arbitrarily sleep
> (10ms)
> that seems to be sufficient.
> 

Thanks. Pushed with some minor changes to docs wording.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: cleanup temporary files after crash

2021-03-16 Thread Euler Taveira
On Sun, Mar 14, 2021, at 11:01 PM, Thomas Munro wrote:
> On Wed, Mar 10, 2021 at 1:31 AM Michael Paquier  wrote:
> > On Tue, Mar 09, 2021 at 02:28:43AM +0100, Tomas Vondra wrote:
> > > Let's move this patch forward. Based on the responses, I agree the
> > > default behavior should be to remove the temp files, and I think we
> > > should have the GUC (on the off chance that someone wants to preserve
> > > the temporary files for debugging or whatever other reason).
> >
> > Thanks for taking care of this.  I am having some second-thoughts
> > about changing this behavior by default, still that's much more useful
> > this way.
> 
> +1 for having it on by default.
> 
> I was also just looking at this patch and came here to say LGTM except
> for two cosmetic things, below.
Thanks for taking a look at this patch. I'm not sure Tomas is preparing a new
patch that includes the suggested modifications but I decided to do it. This
new version has the new GUC name (using "remove"). I also replaced "cleanup"
with "remove" in the all the remain places. As pointed by Thomas, I reworded
the paragraph that describes the GUC moving the default information to the
beginning of the sentence. I also added the "literal" as suggested by Michael.
The postgresql.conf.sample was fixed. The tests was slightly modified. I
reworded some comments and added a hack to avoid breaking the temporary file
test in slow machines. A usleep() after sending the query provides some time
for the query to create the temporary file. I used an arbitrarily sleep (10ms)
that seems to be sufficient.

> > +#remove_temp_files_after_crash = on# remove temporary files after
> > +#  # backend crash?
> > The indentation of the second line is incorrect here (Incorrect number
> > of spaces in tabs perhaps?), and there is no need for the '#' at the
> > beginning of the line.
> 
> Yeah, that's wrong.  For some reason that one file uses a tab size of
> 8, unlike the rest of the tree (I guess because people will read that
> file in software with the more common setting of 8).  If you do :set
> tabstop=8 in vim, suddenly it all makes sense, but it is revealed that
> this patch has it wrong, as you said.  (Perhaps this file should have
> some of those special Vim/Emacs control messages so we don't keep
> getting this wrong?)
I hadn't noticed that this file uses ts=8. (This explains the misalignment that
I observed in some parameter comments.) I'm not sure if people care about the
indentation in this file. From a development perspective, we can use the same
tab width as the source code, so there is no need to change your dev
environment. However, regular users don't follow the dev guidelines, so they
will probably observe the misalignment while using Vim (which defaults to
8-space tabs), for example. For now, I will add

autocmd BufRead,BufNewFile postgresql.conf* setlocal ts=8

to my .vimrc. We should probably fix some settings that are misaligned, such as
parallel_setup_cost and shared_preload_libraries. The parameters
timezone_abbreviations and max_pred_locks_per_page are using spaces instead of
tabs and should probably be fixed too.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 8dfee5694210c14054170563ff01fc15a66a76b0 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 25 May 2020 00:08:20 -0300
Subject: [PATCH v2] Control temporary files removal after crash

A new GUC remove_temp_files_after_crash controls whether temporary
files are removed after a crash. Successive crashes could result in
useless storage usage until the service is restarted, which can be a
problem on hosts with inadequate resources, and such manual
intervention is not desirable in some environments. The GUC is marked
SIGHUP, so you don't have to interrupt the service to change it.

The new behavior introduces a minor backward incompatibility: Postgres
now reclaims temporary file space after a crash. The main reason is
that we favor service continuity over debugging.
---
 doc/src/sgml/config.sgml  |  18 ++
 src/backend/postmaster/postmaster.c   |   5 +
 src/backend/storage/file/fd.c |  12 +-
 src/backend/utils/misc/guc.c  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/postmaster/postmaster.h   |   1 +
 src/test/recovery/t/022_crash_temp_files.pl   | 192 ++
 7 files changed, 234 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/022_crash_temp_files.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5e551b9f6d..c0b1b48136 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9648,6 +9648,24 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  remove_temp_files_after_crash (boolean)
+  
+   remove_temp_files_after_crash configuration parameter
+  
+  
+  
+   
+  

Re: cleanup temporary files after crash

2021-03-14 Thread Thomas Munro
On Wed, Mar 10, 2021 at 1:31 AM Michael Paquier  wrote:
> On Tue, Mar 09, 2021 at 02:28:43AM +0100, Tomas Vondra wrote:
> > Let's move this patch forward. Based on the responses, I agree the
> > default behavior should be to remove the temp files, and I think we
> > should have the GUC (on the off chance that someone wants to preserve
> > the temporary files for debugging or whatever other reason).
>
> Thanks for taking care of this.  I am having some second-thoughts
> about changing this behavior by default, still that's much more useful
> this way.

+1 for having it on by default.

I was also just looking at this patch and came here to say LGTM except
for two cosmetic things, below.

> > I propose to rename the GUC to remove_temp_files_after_crash, I think
> > "remove" is a bit clearer than "cleanup". I've also reworded the sgml
> > docs a little bit.
>
> "remove" sounds fine to me.

+1

> > Attached is a patch with those changes. Barring objections, I'll get
> > this committed in the next couple days.
>
> +When set to on, PostgreSQL will 
> automatically
> Nit: using a <literal> markup for the "on" value.

Maybe should say "which is the default", like other similar things?

> +#remove_temp_files_after_crash = on# remove temporary files after
> +#  # backend crash?
> The indentation of the second line is incorrect here (Incorrect number
> of spaces in tabs perhaps?), and there is no need for the '#' at the
> beginning of the line.

Yeah, that's wrong.  For some reason that one file uses a tab size of
8, unlike the rest of the tree (I guess because people will read that
file in software with the more common setting of 8).  If you do :set
tabstop=8 in vim, suddenly it all makes sense, but it is revealed that
this patch has it wrong, as you said.  (Perhaps this file should have
some of those special Vim/Emacs control messages so we don't keep
getting this wrong?)
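For the record, such editor control lines are hypothetical — postgresql.conf.sample does not carry them today — but they might look like:

```
# -*- tab-width: 8; indent-tabs-mode: t -*-    (Emacs file-local variables)
# vim: set tabstop=8 noexpandtab:              (Vim modeline)
```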




Re: cleanup temporary files after crash

2021-03-09 Thread Euler Taveira
On Tue, Mar 9, 2021, at 9:31 AM, Michael Paquier wrote:
> On Tue, Mar 09, 2021 at 02:28:43AM +0100, Tomas Vondra wrote:
> > Let's move this patch forward. Based on the responses, I agree the
> > default behavior should be to remove the temp files, and I think we
> > should have the GUC (on the off chance that someone wants to preserve
> > the temporary files for debugging or whatever other reason).
> 
> Thanks for taking care of this.  I am having some second-thoughts
> about changing this behavior by default, still that's much more useful
> this way.
> 
> > I propose to rename the GUC to remove_temp_files_after_crash, I think
> > "remove" is a bit clearer than "cleanup". I've also reworded the sgml
> > docs a little bit.
> 
> "remove" sounds fine to me.
+1.

> > Attached is a patch with those changes. Barring objections, I'll get
> > this committed in the next couple days.
> 
> +When set to on, PostgreSQL will 
> automatically
> Nit: using a <literal> markup for the "on" value.
> 
> +#remove_temp_files_after_crash = on# remove temporary files after
> +#  # backend crash?
> The indentation of the second line is incorrect here (Incorrect number
> of spaces in tabs perhaps?), and there is no need for the '#' at the
> beginning of the line.
> --
> Michael
That was my fault. Editor automatically added #.

I'm not sure Tomas will include the tests. If so, the terminology should be 
adjusted too.

+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -0,0 +1,194 @@
+#
+# Test cleanup of temporary files after a crash.

s/cleanup/remove/


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: cleanup temporary files after crash

2021-03-09 Thread Michael Paquier
On Tue, Mar 09, 2021 at 02:28:43AM +0100, Tomas Vondra wrote:
> Let's move this patch forward. Based on the responses, I agree the
> default behavior should be to remove the temp files, and I think we
> should have the GUC (on the off chance that someone wants to preserve
> the temporary files for debugging or whatever other reason).

Thanks for taking care of this.  I am having some second-thoughts
about changing this behavior by default, still that's much more useful
this way.

> I propose to rename the GUC to remove_temp_files_after_crash, I think
> "remove" is a bit clearer than "cleanup". I've also reworded the sgml
> docs a little bit.

"remove" sounds fine to me.

> Attached is a patch with those changes. Barring objections, I'll get
> this committed in the next couple days.

+When set to on, PostgreSQL will 
automatically
Nit: using a <literal> markup for the "on" value.

+#remove_temp_files_after_crash = on# remove temporary files after
+#  # backend crash?
The indentation of the second line is incorrect here (Incorrect number
of spaces in tabs perhaps?), and there is no need for the '#' at the
beginning of the line.
--
Michael




Re: cleanup temporary files after crash

2021-03-08 Thread Tomas Vondra
routines generally report syscall failures
  * with ereport(LOG) and keep going.  Removing temp files is not so critical
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3fd1a5fbe2..64ac48b2e5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1361,6 +1361,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"remove_temp_files_after_crash", PGC_SIGHUP, ERROR_HANDLING_OPTIONS,
+			gettext_noop("Remove temporary files after backend crash."),
+			NULL
+		},
+		&remove_temp_files_after_crash,
+		true,
+		NULL, NULL, NULL
+	},
 
 	{
 		{"log_duration", PGC_SUSET, LOGGING_WHAT,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ee06528bb0..25c1b89ffc 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -759,6 +759,8 @@
 
 #exit_on_error = off			# terminate session on any error?
 #restart_after_crash = on		# reinitialize after backend crash?
+#remove_temp_files_after_crash = on	# remove temporary files after
+#	# backend crash?
 #data_sync_retry = off			# retry or panic on failure to fsync
 	# data?
 	# (change requires restart)
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index cfa59c4dc0..0efdd7c232 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -29,6 +29,7 @@ extern bool log_hostname;
 extern bool enable_bonjour;
 extern char *bonjour_name;
 extern bool restart_after_crash;
+extern bool remove_temp_files_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
new file mode 100644
index 00..45a70a368e
--- /dev/null
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -0,0 +1,194 @@
+#
+# Test cleanup of temporary files after a crash.
+#
+# This is a description that I will add later.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use Config;
+use Time::HiRes qw(usleep);
+
+plan tests => 9;
+
+
+# To avoid hanging while expecting some specific input from a psql
+# instance being driven by us, add a timeout high enough that it
+# should never trigger even on very slow machines, unless something
+# is really wrong.
+my $psql_timeout = IPC::Run::timer(60);
+
+my $node = get_new_node('node_crash');
+$node->init();
+$node->start();
+
+# By default, PostgresNode doesn't restart after crash
+	# Reduce work_mem to generate a temporary file with a small number of rows
+$node->safe_psql(
+	'postgres',
+	q[ALTER SYSTEM SET remove_temp_files_after_crash = on;
+   ALTER SYSTEM SET log_connections = 1;
+   ALTER SYSTEM SET work_mem = '64kB';
+   ALTER SYSTEM SET restart_after_crash = on;
+   SELECT pg_reload_conf();]);
+
+# create table, insert rows
+$node->safe_psql(
+	'postgres',
+	q[CREATE TABLE tab_crash (a text);
+		INSERT INTO tab_crash (a) SELECT gen_random_uuid() FROM generate_series(1, 500);]);
+
+#
+# Test cleanup temporary files after crash
+#
+
+# Run psql, keeping session alive, so we have an alive backend to kill.
+my ($killme_stdin, $killme_stdout, $killme_stderr) = ('', '', '');
+my $killme = IPC::Run::start(
+	[
+		'psql', '-X', '-qAt', '-v', 'ON_ERROR_STOP=1', '-f', '-', '-d',
+		$node->connstr('postgres')
+	],
+	'<',
+	\$killme_stdin,
+	'>',
+	\$killme_stdout,
+	'2>',
+	\$killme_stderr,
+	$psql_timeout);
+
+# Get backend pid
+$killme_stdin .= q[
+SELECT pg_backend_pid();
+];
+ok(pump_until($killme, \$killme_stdout, qr/[[:digit:]]+[\r\n]$/m),
+	'acquired pid for SIGKILL');
+my $pid = $killme_stdout;
+chomp($pid);
+$killme_stdout = '';
+$killme_stderr = '';
+
+# Run the query that generates a temporary file and that will be killed before
+# it finishes. Since the query that generates the temporary file does not
+# return before the connection is killed, use a SELECT before to trigger
+# pump_until.
+$killme_stdin .= q[
+BEGIN;
+SELECT $$in-progress-before-sigkill$$;
+WITH foo AS (SELECT a FROM tab_crash ORDER BY a) SELECT a, pg_sleep(1) FROM foo;
+];
+ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
+	'select in-progress-before-sigkill');
+$killme_stdout = '';
+$killme_stderr = '';
+
+# Kill with SIGKILL
+my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
+is($ret, 0, 'killed process with KILL');
+
+# Close psql session
+$killme->finish;
+
+# Wait till server restarts
+$node->poll_query_until('postgres', 'SELECT 1', '1');
+
+# Check for temporary files
+is($node->safe_psql(
+	'postgres',
+	'SELECT COUNT(1) FROM pg_ls_dir($$base/pgsql_tmp$$)'),
+	qq(0), 'no temporary files');
+
+#
+# Test old behavior (doesn't cleanup temporary file after crash)
+#
+$node->safe_psql(
+	'postgres',
+	q[ALTER SYSTEM SET remove_temp_files_after_cras

Re: cleanup temporary files after crash

2020-11-26 Thread Euler Taveira
On Thu, 26 Nov 2020 at 05:48, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

>
> I also think that the GUC is not needed here. This 'feature' was
> internal from the very beginning, so users shouldn't care about
> preserving old behavior. Without the GUC the patch is very simple,
> please see attached version. I also omit the test, because I am not sure
> it will be stable given that the RemovePgTempFiles() allows the
> possibility of failure.
>
Anastasia, thanks for reviewing it. As I said, I'm fine without the GUC.
However, if we decide to go with the GUC, the default behavior should be to
remove the temporary files after a crash.

Regards,

-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: cleanup temporary files after crash

2020-11-26 Thread Anastasia Lubennikova

On 01.11.2020 04:25, Michael Paquier wrote:

On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:

I thought about not providing a GUC at all or provide it in the developer
section. I've never heard someone saying that they use those temporary
files to investigate an issue. Regarding a crash, all information is already
available and temporary files don't provide extra details. This new
GUC is just to keep the previous behavior. I'm fine without the GUC, though.

The original behavior is as old as 4a5f38c4, and last time we talked
about that there were arguments about still keeping the existing
behavior to not clean up files during a restart-after-crash scenario
for the sake of being useful just "in case".  I have never used that
property myself, TBH, and I have seen much more cases of users caring
about the data folder not facing an ENOSPC particularly if they don't
use different partitions for pg_wal/ and the main data folder.
--
Michael


Thank you, Euler for submitting this.
+1 for the feature. One of the platforms we support uses temp files a 
lot and we faced this problem, while we never actually used these orphan 
files for debugging purposes.


I also think that the GUC is not needed here. This 'feature' was 
internal from the very beginning, so users shouldn't care about 
preserving old behavior. Without the GUC the patch is very simple, 
please see attached version. I also omit the test, because I am not sure 
it will be stable given that the RemovePgTempFiles() allows the 
possibility of failure.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 417581b287f4060178595cd96465adc639639290
Author: anastasia 
Date:   Thu Nov 26 11:45:05 2020 +0300

Remove temporary files after a backend crash

Successive crashes could result in useless storage usage until the
service is restarted, which can be a problem on hosts with inadequate
resources. Such manual intervention is not desirable in some
environments.

The only reason we kept temp files was that someone might want to examine
them for debugging purposes. With this patch we favor service continuity
over debugging.

Author: Euler Taveira
Reviewer: Anastasia Lubennikova

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b7799ed1d24..a151250734f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3999,6 +3999,9 @@ PostmasterStateMachine(void)
 		ereport(LOG,
 (errmsg("all server processes terminated; reinitializing")));
 
+		/* remove leftover temporary files after a crash */
+		RemovePgTempFiles();
+
 		/* allow background workers to immediately restart */
 		ResetBackgroundWorkerCrashTimes();
 
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 05abcf72d68..d30bfc28541 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2992,15 +2992,15 @@ CleanupTempFiles(bool isCommit, bool isProcExit)
  * remove any leftover files created by OpenTemporaryFile and any leftover
  * temporary relation files created by mdcreate.
  *
- * NOTE: we could, but don't, call this during a post-backend-crash restart
- * cycle.  The argument for not doing it is that someone might want to examine
- * the temp files for debugging purposes.  This does however mean that
- * OpenTemporaryFile had better allow for collision with an existing temp
- * file name.
+ * This is also called during a post-backend-crash restart cycle. The argument
+ * for doing it is that crashes of queries using temp files can result in
+ * useless storage usage that can only be reclaimed by a service restart.
  *
  * NOTE: this function and its subroutines generally report syscall failures
  * with ereport(LOG) and keep going.  Removing temp files is not so critical
  * that we should fail to start the database when we can't do it.
+ * Because this routine is not strict, OpenTemporaryFile allows for collision
+ * with an existing temp file name.
  */
 void
 RemovePgTempFiles(void)


Re: cleanup temporary files after crash

2020-10-31 Thread Michael Paquier
On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> I thought about not providing a GUC at all or provide it in the developer
> section. I've never heard someone saying that they use those temporary
> files to investigate an issue. Regarding a crash, all information is already
> available and temporary files don't provide extra details. This new
> GUC is just to keep the previous behavior. I'm fine without the GUC, though.

The original behavior is as old as 4a5f38c4, and last time we talked
about that there were arguments about still keeping the existing
behavior to not clean up files during a restart-after-crash scenario
for the sake of being useful just "in case".  I have never used that
property myself, TBH, and I have seen much more cases of users caring
about the data folder not facing an ENOSPC particularly if they don't
use different partitions for pg_wal/ and the main data folder.
--
Michael
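To get a feel for the ENOSPC exposure on a running cluster, leftover temp-file usage can be summed with a small shell sketch (the PGDATA path is an assumption — point it at your real data directory):

```shell
# Sum the space held by leftover temporary files in a cluster.
# PGDATA is an assumed path; adjust it for your installation.
PGDATA="${PGDATA:-/var/lib/postgresql/data}"

# Temp files live under base/pgsql_tmp, and each tablespace gets its own
# pgsql_tmp directory; report every one of them.
find "$PGDATA" -type d -name pgsql_tmp 2>/dev/null |
while read -r dir; do
    du -sk "$dir"
done
```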




Re: cleanup temporary files after crash

2020-10-31 Thread Euler Taveira
On Wed, 28 Oct 2020 at 15:42, Tomas Vondra wrote:

>
> I did a quick review and the patch seems fine to me. Let's wait for a
> bit and see if there are any objections - if not, I'll get it committed
> in the next CF.
>
>
Tomas, thanks for your review.


> One thing I'm not sure about is whether we should have the GUC as
> proposed, or have a negative "keep_temp_files_after_restart" defaulting
> to false. But I don't have a very good justification for the alternative
> other than vague personal preference.
>
>
I thought about not providing a GUC at all or provide it in the developer
section. I've never heard someone saying that they use those temporary
files to investigate an issue. Regarding a crash, all information is
already available and temporary files don't provide extra details. This
new GUC is just to keep the previous behavior. I'm fine without the GUC,
though.


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: cleanup temporary files after crash

2020-10-28 Thread Tomas Vondra

Hi Euler,

On Wed, Oct 28, 2020 at 11:16:26AM -0300, Euler Taveira wrote:

Hi,

Bug #16427 mentioned that temporary files are not removed after a crash. I
heard similar complaints from some customers. In the bug report [1], I
proposed a new GUC to control whether temporary files are removed after a
crash recovery. The current behavior is only useful for debugging purposes.
It also has an undesirable behavior: you have to restart the service to
reclaim the space. Interrupting the service continuity isn't always an
option and due to limited resources, you have no choice but to restart the
service.

The new GUC cleanup_temp_files_after_crash is marked as SIGHUP. Hence, you
can enable it to debug without service interruption. The default value is
on which means it changes the current behavior. Documentation and tests are
included.



I did a quick review and the patch seems fine to me. Let's wait for a
bit and see if there are any objections - if not, I'll get it committed
in the next CF.

One thing I'm not sure about is whether we should have the GUC as
proposed, or have a negative "keep_temp_files_after_restart" defaulting
to false. But I don't have a very good justification for the alternative
other than vague personal preference.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





cleanup temporary files after crash

2020-10-28 Thread Euler Taveira
usage that can
+ * only be reclaimed by a service restart. The argument against enabling it is
+ * that someone might want to examine the temporary files for debugging
+ * purposes. This does however mean that OpenTemporaryFile had better allow for
+ * collision with an existing temp file name.
  *
  * NOTE: this function and its subroutines generally report syscall failures
  * with ereport(LOG) and keep going.  Removing temp files is not so critical
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..44f070cb66 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1358,6 +1358,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"cleanup_temp_files_after_crash", PGC_SIGHUP, ERROR_HANDLING_OPTIONS,
+			gettext_noop("Remove temporary files after backend crash."),
+			NULL
+		},
+		&cleanup_temp_files_after_crash,
+		true,
+		NULL, NULL, NULL
+	},
 
 	{
 		{"log_duration", PGC_SUSET, LOGGING_WHAT,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..a4c3439211 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -757,6 +757,8 @@
 
 #exit_on_error = off			# terminate session on any error?
 #restart_after_crash = on		# reinitialize after backend crash?
+#cleanup_temp_files_after_crash = on	# remove temporary files after
+#		# backend crash?
 #data_sync_retry = off			# retry or panic on failure to fsync
 	# data?
 	# (change requires restart)
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index babc87dfc9..90cc7ccb62 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -29,6 +29,7 @@ extern bool log_hostname;
 extern bool enable_bonjour;
 extern char *bonjour_name;
 extern bool restart_after_crash;
+extern bool cleanup_temp_files_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
new file mode 100644
index 00..dee3783a0c
--- /dev/null
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -0,0 +1,194 @@
+#
+# Test cleanup of temporary files after a crash.
+#
+# Test that temporary files created by a crashed backend are removed.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use Config;
+use Time::HiRes qw(usleep);
+
+plan tests => 9;
+
+
+# To avoid hanging while expecting some specific input from a psql
+# instance being driven by us, add a timeout high enough that it
+# should never trigger even on very slow machines, unless something
+# is really wrong.
+my $psql_timeout = IPC::Run::timer(60);
+
+my $node = get_new_node('node_crash');
+$node->init();
+$node->start();
+
+# By default, PostgresNode doesn't restart after crash
+# Reduce work_mem so that sorting even a few rows spills to a temporary file
+$node->safe_psql(
+	'postgres',
+	q[ALTER SYSTEM SET cleanup_temp_files_after_crash = on;
+   ALTER SYSTEM SET log_connections = 1;
+   ALTER SYSTEM SET work_mem = '64kB';
+   ALTER SYSTEM SET restart_after_crash = on;
+   SELECT pg_reload_conf();]);
+
+# create table, insert rows
+$node->safe_psql(
+	'postgres',
+	q[CREATE TABLE tab_crash (a text);
+		INSERT INTO tab_crash (a) SELECT gen_random_uuid() FROM generate_series(1, 500);]);
+
+#
+# Test cleanup temporary files after crash
+#
+
+# Run psql, keeping session alive, so we have an alive backend to kill.
+my ($killme_stdin, $killme_stdout, $killme_stderr) = ('', '', '');
+my $killme = IPC::Run::start(
+	[
+		'psql', '-X', '-qAt', '-v', 'ON_ERROR_STOP=1', '-f', '-', '-d',
+		$node->connstr('postgres')
+	],
+	'<',
+	\$killme_stdin,
+	'>',
+	\$killme_stdout,
+	'2>',
+	\$killme_stderr,
+	$psql_timeout);
+
+# Get backend pid
+$killme_stdin .= q[
+SELECT pg_backend_pid();
+];
+ok(pump_until($killme, \$killme_stdout, qr/[[:digit:]]+[\r\n]$/m),
+	'acquired pid for SIGKILL');
+my $pid = $killme_stdout;
+chomp($pid);
+$killme_stdout = '';
+$killme_stderr = '';
+
+# Run a query that generates a temporary file and will be killed before
+# it finishes. Since that query does not return before the connection is
+# killed, issue a plain SELECT first so that pump_until has output to
+# detect.
+$killme_stdin .= q[
+BEGIN;
+SELECT $$in-progress-before-sigkill$$;
+WITH foo AS (SELECT a FROM tab_crash ORDER BY a) SELECT a, pg_sleep(1) FROM foo;
+];
+ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m),
+	'select in-progress-before-sigkill');
+$killme_stdout = '';
+$killme_stderr = '';
+
+# Kill with SIGKILL
+my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
+is($ret, 0, 'killed process with KILL');
+
+# Close psql session
+$killme->finish;
+
+# Wait till server restarts
+$node->poll_query_until('pos