Re: Transaction timeout
On Wed, Mar 13, 2024 at 7:56 AM Andrey M. Borodin wrote: > > On 13 Mar 2024, at 05:23, Alexander Korotkov wrote: > > > > On Tue, Mar 12, 2024 at 10:28 AM Andrey M. Borodin > > wrote: > >>> On 11 Mar 2024, at 16:18, Alexander Korotkov wrote: > >>> > >>> I think if checking psql stderr is problematic, checking just logs is > >>> fine. Could we wait for the relevant log messages one by one with > >>> $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? > >> > >> PFA version with $node->wait_for_log() > > > > I've slightly revised the patch. I'm going to push it if no objections. > > One small note: log_offset was updated, but was not used. Thank you. This is the updated version. -- Regards, Alexander Korotkov v7-0001-Add-TAP-tests-for-timeouts.patch Description: Binary data
Re: Transaction timeout
> On 13 Mar 2024, at 05:23, Alexander Korotkov wrote: > > On Tue, Mar 12, 2024 at 10:28 AM Andrey M. Borodin > wrote: >>> On 11 Mar 2024, at 16:18, Alexander Korotkov wrote: >>> >>> I think if checking psql stderr is problematic, checking just logs is >>> fine. Could we wait for the relevant log messages one by one with >>> $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? >> >> PFA version with $node->wait_for_log() > > I've slightly revised the patch. I'm going to push it if no objections. One small note: log_offset was updated, but was not used. Thanks! Best regards, Andrey Borodin. v6-0001-Add-TAP-tests-for-timeouts.patch Description: Binary data
Re: Transaction timeout
On Tue, Mar 12, 2024 at 10:28 AM Andrey M. Borodin wrote: > > On 11 Mar 2024, at 16:18, Alexander Korotkov wrote: > > > > I think if checking psql stderr is problematic, checking just logs is > > fine. Could we wait for the relevant log messages one by one with > > $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? > > PFA version with $node->wait_for_log() I've slightly revised the patch. I'm going to push it if no objections. -- Regards, Alexander Korotkov v5-0001-Add-TAP-tests-for-timeouts.patch Description: Binary data
Re: Transaction timeout
> On 11 Mar 2024, at 16:18, Alexander Korotkov wrote: > > I think if checking psql stderr is problematic, checking just logs is > fine. Could we wait for the relevant log messages one by one with > $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? PFA version with $node->wait_for_log() Best regards, Andrey Borodin. v4-0001-Add-timeouts-TAP-tests.patch Description: Binary data
Re: Transaction timeout
On Mon, Mar 11, 2024 at 12:53 PM Andrey M. Borodin wrote: > > On 7 Mar 2024, at 00:55, Alexander Korotkov wrote: > > > > On Wed, Mar 6, 2024 at 10:22 AM Andrey M. Borodin > > wrote: > >>> On 25 Feb 2024, at 21:50, Alexander Korotkov wrote: > >>> > >>> Thank you for the patches. I've pushed the 0001 patch to avoid > >>> further failures on buildfarm. Let 0004 wait till injections points > >>> by Mechael are committed. > >> > >> Thanks! > >> > >> All prerequisites are committed. I propose something in a line with this > >> patch. > > > > Thank you. I took a look at the patch. Should we also check the > > relevant message after the timeout is fired? We could check it in > > psql stderr or log for that. > > PFA version which checks log output. > But I could not come up with a proper use of BackgroundPsql->query_until() to > check outputs. And there are multiple possible errors. > > We can copy test from src/bin/psql/t/001_basic.pl: > > # test behavior and output on server crash > my ($ret, $out, $err) = $node->psql('postgres', > "SELECT 'before' AS running;\n" > . "SELECT pg_terminate_backend(pg_backend_pid());\n" > . "SELECT 'AFTER' AS not_running;\n"); > > is($ret, 2, 'server crash: psql exit code'); > like($out, qr/before/, 'server crash: output before crash'); > ok($out !~ qr/AFTER/, 'server crash: no output after crash'); > is( $err, > 'psql::2: FATAL: terminating connection due to administrator command > psql::2: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > psql::2: error: connection to server was lost', > 'server crash: error message’); > > But I do not see much value in this. > What do you think? I think if checking psql stderr is problematic, checking just logs is fine. Could we wait for the relevant log messages one by one with $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? -- Regards, Alexander Korotkov
Re: Transaction timeout
> On 7 Mar 2024, at 00:55, Alexander Korotkov wrote: > > On Wed, Mar 6, 2024 at 10:22 AM Andrey M. Borodin > wrote: >>> On 25 Feb 2024, at 21:50, Alexander Korotkov wrote: >>> >>> Thank you for the patches. I've pushed the 0001 patch to avoid >>> further failures on buildfarm. Let 0004 wait till injections points >>> by Mechael are committed. >> >> Thanks! >> >> All prerequisites are committed. I propose something in a line with this >> patch. > > Thank you. I took a look at the patch. Should we also check the > relevant message after the timeout is fired? We could check it in > psql stderr or log for that. PFA version which checks log output. But I could not come up with a proper use of BackgroundPsql->query_until() to check outputs. And there are multiple possible errors. We can copy test from src/bin/psql/t/001_basic.pl: # test behavior and output on server crash my ($ret, $out, $err) = $node->psql('postgres', "SELECT 'before' AS running;\n" . "SELECT pg_terminate_backend(pg_backend_pid());\n" . "SELECT 'AFTER' AS not_running;\n"); is($ret, 2, 'server crash: psql exit code'); like($out, qr/before/, 'server crash: output before crash'); ok($out !~ qr/AFTER/, 'server crash: no output after crash'); is( $err, 'psql::2: FATAL: terminating connection due to administrator command psql::2: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. psql::2: error: connection to server was lost', 'server crash: error message’); But I do not see much value in this. What do you think? Best regards, Andrey Borodin. v3-0001-Add-timeouts-TAP-tests.patch Description: Binary data
Re: Transaction timeout
On Wed, Mar 6, 2024 at 10:22 AM Andrey M. Borodin wrote: > > On 25 Feb 2024, at 21:50, Alexander Korotkov wrote: > > > > Thank you for the patches. I've pushed the 0001 patch to avoid > > further failures on buildfarm. Let 0004 wait till injections points > > by Mechael are committed. > > Thanks! > > All prerequisites are committed. I propose something in a line with this > patch. Thank you. I took a look at the patch. Should we also check the relevant message after the timeout is fired? We could check it in psql stderr or log for that. -- Regards, Alexander Korotkov
Re: Transaction timeout
> On 25 Feb 2024, at 21:50, Alexander Korotkov wrote: > > Thank you for the patches. I've pushed the 0001 patch to avoid > further failures on buildfarm. Let 0004 wait till injections points > by Mechael are committed. Thanks! All prerequisites are committed. I propose something in a line with this patch. Best regards, Andrey Borodin. v2-0001-Add-timeouts-TAP-tests.patch Description: Binary data
Re: Transaction timeout
Hi, Andrey! On Thu, Feb 22, 2024 at 7:23 PM Andrey M. Borodin wrote: > > On 19 Feb 2024, at 15:17, Japin Li wrote: > > > > > > +1 > > PFA patch set of 4 patches: > 1. remove all potential flaky tests. BTW recently we had a bingo when 3 of > them failed together [0] > 2-3. waiting injection points patchset by Michael Paquier, intact v2 from > nearby thread. > 4. prototype of simple TAP tests for timeouts. > > I did not add a test for statement_timeout, because it still have good > coverage in isolation tests. But added test for idle_sessoin_timeout. > Maybe these tests could be implemented with NOTICE injection points (not > requiring steps 2-3), but I'm afraid that they might be flaky too: FATALed > connection might not send information necesary for test success (we will see > something like "PQconsumeInput failed: server closed the connection > unexpectedly" as in [1]). Thank you for the patches. I've pushed the 0001 patch to avoid further failures on buildfarm. Let 0004 wait till injections points by Mechael are committed. -- Regards, Alexander Korotkov
Re: Transaction timeout
> On 19 Feb 2024, at 15:17, Japin Li wrote: > > > +1 PFA patch set of 4 patches: 1. remove all potential flaky tests. BTW recently we had a bingo when 3 of them failed together [0] 2-3. waiting injection points patchset by Michael Paquier, intact v2 from nearby thread. 4. prototype of simple TAP tests for timeouts. I did not add a test for statement_timeout, because it still have good coverage in isolation tests. But added test for idle_sessoin_timeout. Maybe these tests could be implemented with NOTICE injection points (not requiring steps 2-3), but I'm afraid that they might be flaky too: FATALed connection might not send information necesary for test success (we will see something like "PQconsumeInput failed: server closed the connection unexpectedly" as in [1]). Best regards, Andrey Borodin. [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-02-20%2010%3A20%3A13 [1] https://www.postgresql.org/message-id/flat/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com 0001-Remove-flacky-isolation-tests-for-timeouts.patch Description: Binary data 0004-Add-timeouts-TAP-tests.patch Description: Binary data 0003-Add-regression-test-for-restart-points-during-promot.patch Description: Binary data 0002-injection_points-Add-routines-to-wait-and-wake-proce.patch Description: Binary data
Re: Transaction timeout
On Mon, 19 Feb 2024 at 17:14, Andrey M. Borodin wrote: >> On 18 Feb 2024, at 22:16, Andrey M. Borodin wrote: >> >> But it seems a little strange that session 3 did not fail at all > It was only coincidence. Any test that verifies FATALing out in 100ms can > fail, see new failure here [0]. > > In a nearby thread Michael is proposing injections points that can wait and > be awaken. So I propose following course of action: > 1. Remove all tests that involve pg_stat_activity test of FATALed session > (any permutation with checker_sleep step) > 2. Add idle_in_transaction_session_timeout, statement_timeout and > transaction_timeout tests when injection points features get committed. > +1 > Alexander, what do you think? >
Re: Transaction timeout
> On 18 Feb 2024, at 22:16, Andrey M. Borodin wrote: > > But it seems a little strange that session 3 did not fail at all It was only coincidence. Any test that verifies FATALing out in 100ms can fail, see new failure here [0]. In a nearby thread Michael is proposing injections points that can wait and be awaken. So I propose following course of action: 1. Remove all tests that involve pg_stat_activity test of FATALed session (any permutation with checker_sleep step) 2. Add idle_in_transaction_session_timeout, statement_timeout and transaction_timeout tests when injection points features get committed. Alexander, what do you think? Best regards, Andrey Borodin. [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-02-18%2022%3A23%3A45
Re: Transaction timeout
Alexander, thanks for pushing this! This is small but very awaited feature. > On 16 Feb 2024, at 02:08, Andres Freund wrote: > > Isn't this test going to be very fragile on busy / slow machines? What if the > pg_sleep() takes one second, because there were other tasks to schedule? I'd > be surprised if this didn't fail under valgrind, for example. Even more robust tests that were bullet-proof in CI previously exhibited some failures on buildfarm. Currently there are 5 failures through this weekend. Failing tests are testing interaction of idle_in_transaction_session_timeout vs transaction_timeout(5), and rescheduling transaction_timeout(6). Symptoms: [0] transaction timeout occurs when it is being scheduled. Seems like SET was running to long. step s6_begin: BEGIN ISOLATION LEVEL READ COMMITTED; step s6_tt: SET statement_timeout = '1s'; SET transaction_timeout = '10ms'; +s6: FATAL: terminating connection due to transaction timeout step checker_sleep: SELECT pg_sleep(0.1); [1] transaction timeout 10ms is not detected after 1s step s6_check: SELECT count(*) FROM pg_stat_activity WHERE application_name = 'isolation/timeouts/s6'; count - -0 +1 [2] transaction timeout is not detected in both session 5 and session 6. So far not signle animal reported failures twice, so it's hard to say anything about frequency. But it seems to be significant source of failures. So far I have these ideas: 1. Remove test sessions 5 and 6. But it seems a little strange that session 3 did not fail at all (it is testing interaction of statement_timeout and transaction_timeout). This test is very similar to test sessiont 5... 2. Increase wait times. step checker_sleep { SELECT pg_sleep(0.1); } Seems not enough to observe backend timed out from pg_stat_activity. But this won't help from [0]. 3. Reuse waiting INJECTION_POINT from [3] to make timeout tests deterministic and safe from race conditions. With waiting injection points we can wait as much as needed in current environment. Any advices are welcome. Best regards, Andrey Borodin. [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-02-16%2020%3A06%3A51 [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-02-16%2001%3A45%3A10 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2024-02-17%2001%3A55%3A45 [3] https://www.postgresql.org/message-id/0925f9a9-4d53-4b27-a87e-3d83a757b...@yandex-team.ru
Re: Transaction timeout
Hi, On 2024-02-13 23:42:35 +0200, Alexander Korotkov wrote: > diff --git a/src/backend/access/transam/xact.c > b/src/backend/access/transam/xact.c > index 464858117e0..a124ba59330 100644 > --- a/src/backend/access/transam/xact.c > +++ b/src/backend/access/transam/xact.c > @@ -2139,6 +2139,10 @@ StartTransaction(void) >*/ > s->state = TRANS_INPROGRESS; > > + /* Schedule transaction timeout */ > + if (TransactionTimeout > 0) > + enable_timeout_after(TRANSACTION_TIMEOUT, TransactionTimeout); > + > ShowTransactionState("StartTransaction"); > } Isn't it a problem that all uses of StartTransaction() trigger a timeout, but transaction commit/abort don't? What if e.g. logical replication apply starts a transaction, commits it, and then goes idle? The timer would still be active, afaict? I don't think it works well to enable timeouts in xact.c and to disable them in PostgresMain(). > @@ -4491,12 +4511,18 @@ PostgresMain(const char *dbname, const char *username) > > pgstat_report_activity(STATE_IDLEINTRANSACTION_ABORTED, NULL); > > /* Start the idle-in-transaction timer */ > - if (IdleInTransactionSessionTimeout > 0) > + if (IdleInTransactionSessionTimeout > 0 > + && (IdleInTransactionSessionTimeout < > TransactionTimeout || TransactionTimeout == 0)) > { > idle_in_transaction_timeout_enabled = > true; > > enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT, > > IdleInTransactionSessionTimeout); > } > + > + /* Schedule or reschedule transaction timeout */ > + if (TransactionTimeout > 0 && > !get_timeout_active(TRANSACTION_TIMEOUT)) > + > enable_timeout_after(TRANSACTION_TIMEOUT, > + > TransactionTimeout); > } > else if (IsTransactionOrTransactionBlock()) > { > @@ -4504,12 +4530,18 @@ PostgresMain(const char *dbname, const char *username) > pgstat_report_activity(STATE_IDLEINTRANSACTION, > NULL); > > /* Start the idle-in-transaction timer */ > - if (IdleInTransactionSessionTimeout > 0) > + if (IdleInTransactionSessionTimeout > 0 > + && (IdleInTransactionSessionTimeout < > TransactionTimeout || TransactionTimeout == 0)) > { > idle_in_transaction_timeout_enabled = > true; > > enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT, > > IdleInTransactionSessionTimeout); > } > + > + /* Schedule or reschedule transaction timeout */ > + if (TransactionTimeout > 0 && > !get_timeout_active(TRANSACTION_TIMEOUT)) > + > enable_timeout_after(TRANSACTION_TIMEOUT, > + > TransactionTimeout); > } > else > { Why do we need to do anything in these cases if the timer is started in StartTransaction()? > new file mode 100644 > index 000..ce2c9a43011 > --- /dev/null > +++ b/src/test/isolation/specs/timeouts-long.spec > @@ -0,0 +1,35 @@ > +# Tests for transaction timeout that require long wait times > + > +session s7 > +step s7_begin > +{ > +BEGIN ISOLATION LEVEL READ COMMITTED; > +SET transaction_timeout = '1s'; > +} > +step s7_commit_and_chain { COMMIT AND CHAIN; } > +step s7_sleep{ SELECT pg_sleep(0.6); } > +step s7_abort{ ABORT; } > + > +session s8 > +step s8_begin > +{ > +BEGIN ISOLATION LEVEL READ COMMITTED; > +SET transaction_timeout = '900ms'; > +} > +# to test that quick query does not restart transaction_timeout > +step s8_select_1 { SELECT 1; } > +step s8_sleep{ SELECT pg_sleep(0.6); } > + > +session checker > +step checker_sleep { SELECT pg_sleep(0.3); } Isn't this test going to be very fragile on busy / slow machines? What if the pg_sleep() takes one second, because there were other tasks to schedule? I'd be surprised if this didn't fail under valgrind, for example. Greetings, Andres Freund
Re: Transaction timeout
Hi! On Wed, Jan 31, 2024 at 11:57 AM Andrey Borodin wrote: > > On 31 Jan 2024, at 14:27, Japin Li wrote: > > > > LGTM. > > > > If there is no other objections, I'll change it to ready for committer > > next Monday. > > I think we have a quorum, so I decided to go ahead and flipped status to RfC. > Thanks! I checked this patch. Generally I look good. I've slightly revised that. I think there is one unaddressed concern by Andres Freund [1] about the overhead of this patch by adding extra branches and function calls in the case transaction_timeout is disabled. I tried to measure the overhead of this patch using a pgbench script containing 20 semicolons (20 empty statements in 20 empty transactions). I didn't manage to find measurable overhead or change of performance profile (I used XCode Instruments on my x86 MacBook). One thing, which I still found possible to do is to avoid unconditional calls to get_timeout_active(TRANSACTION_TIMEOUT). Instead I put responsibility for disabling timeout after GUC disables the transaction_timeout assign hook. I removed the TODO comment from _doSetFixedOutputState(). I think backup restore is the operation where slow commands and slow transactions are expected, and it's natural to disable transaction_timeout among other timeouts there. And the existing comment clarifies that. Also I made some grammar fixes to docs and comments. I'm going to push this if there are no objections. Links. 1. https://www.postgresql.org/message-id/20221206011050.s6hapukjqha35hud%40alap3.anarazel.de -- Regards, Alexander Korotkov 0001-Introduce-transaction_timeout-v26.patch Description: Binary data
Re: Transaction timeout
> On 31 Jan 2024, at 14:27, Japin Li wrote: > > LGTM. > > If there is no other objections, I'll change it to ready for committer > next Monday. I think we have a quorum, so I decided to go ahead and flipped status to RfC. Thanks! Best regards, Andrey Borodin.
Re: Transaction timeout
On Tue, 30 Jan 2024 at 14:22, Andrey M. Borodin wrote: >> On 26 Jan 2024, at 19:58, Japin Li wrote: >> >> Thanks for updating the patch. Here are some comments for v24. >> >> + >> +Terminate any session that spans longer than the specified amount of >> +time in transaction. The limit applies both to explicit transactions >> +(started with BEGIN) and to implicitly started >> +transaction corresponding to single statement. But this limit is not >> +applied to prepared transactions. >> +If this value is specified without units, it is taken as >> milliseconds. >> +A value of zero (the default) disables the timeout. >> + >> The sentence "But this limit is not applied to prepared transactions" is >> redundant, >> since we have a paragraph to describe this later. > Fixed. >> >> + >> + >> +If transaction_timeout is shorter than >> +idle_in_transaction_session_timeout or >> statement_timeout >> +transaction_timeout will invalidate longer >> timeout. >> + >> + >> >> Since we are already try to disable the timeouts, should we try to disable >> them even if they are equal. > > Well, we disable timeouts on equality. Fixed docs. > >> >> + >> + >> +Prepared transactions are not subject for this timeout. >> + >> >> Maybe wrap this with is a good idea. > Done. > Thanks for updating the patch. LGTM. If there is no other objections, I'll change it to ready for committer next Monday.
Re: Transaction timeout
> On 26 Jan 2024, at 19:58, Japin Li wrote: > > Thanks for updating the patch. Here are some comments for v24. > > + > +Terminate any session that spans longer than the specified amount of > +time in transaction. The limit applies both to explicit transactions > +(started with BEGIN) and to implicitly started > +transaction corresponding to single statement. But this limit is not > +applied to prepared transactions. > +If this value is specified without units, it is taken as > milliseconds. > +A value of zero (the default) disables the timeout. > + > The sentence "But this limit is not applied to prepared transactions" is > redundant, > since we have a paragraph to describe this later. Fixed. > > + > + > +If transaction_timeout is shorter than > +idle_in_transaction_session_timeout or > statement_timeout > +transaction_timeout will invalidate longer > timeout. > + > + > > Since we are already try to disable the timeouts, should we try to disable > them even if they are equal. Well, we disable timeouts on equality. Fixed docs. > > + > + > +Prepared transactions are not subject for this timeout. > + > > Maybe wrap this with is a good idea. Done. > >> I’ve inspected CI fails and they were caused by two different problems: >> 1. It’s unsafe for isaoltion tester to await transaction_timeout within a >> query. Usually it gets >> FATAL: terminating connection due to transaction timeout >> But if VM is a bit slow it can get occasional >> PQconsumeInput failed: server closed the connection unexpectedly >> So, currently all tests use “passive waiting”, in a session that will not >> timeout. >> >> 2. In some cases pg_sleep(0.1) were sleeping up to 200 ms. That was making >> s7 and s8 fail, because they rely on this margin. > > I'm curious why this happened. I think pg_sleep() cannot provide guarantees on when next query will be executed. In our case we need that isolation tester see that sleep is over and continue in other session... >> I’ve separated these tests into different test timeouts-long and increased >> margin to 300ms. Now tests run horrible 2431 ms. Moreover I’m afraid that on >> buildfarm we can have much randomly-slower machines so this test might be >> excluded. >> This test checks COMMIT AND CHAIN and flow of small queries (Nik’s case). >> >> Also I’ve verified that every "enable_timeout_after(TRANSACTION_TIMEOUT)” >> and “disable_timeout(TRANSACTION_TIMEOUT)” is necessary and found that case >> of aborting "idle in transaction (aborted)” is not covered by tests. I’m not >> sure we need a test for this. > > I see there is a test about idle_in_transaction_timeout and > transaction_timeout. > > Both of them only check the session, but don't check the reason, so we cannot > distinguish the reason they are terminated. Right? Yes. > >> Japin, Junwang, what do you think? > > However, checking the reason on the timeout session may cause regression test > failed (as you point in 1), I don't strongly insist on it. Indeed, if we check a reason of FATAL timeouts - we get flaky tests. Best regards, Andrey Borodin. v25-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
On Fri, 26 Jan 2024 at 14:44, Andrey M. Borodin wrote: >> On 22 Jan 2024, at 11:23, Peter Smith wrote: >> >> Hi, This patch has a CF status of "Needs Review" [1], but it seems >> there was a CFbot test failure last time it was run [2]. Please have a >> look and post an updated version if necessary. > Thanks Peter! > Thanks for updating the patch. Here are some comments for v24. + +Terminate any session that spans longer than the specified amount of +time in transaction. The limit applies both to explicit transactions +(started with BEGIN) and to implicitly started +transaction corresponding to single statement. But this limit is not +applied to prepared transactions. +If this value is specified without units, it is taken as milliseconds. +A value of zero (the default) disables the timeout. + The sentence "But this limit is not applied to prepared transactions" is redundant, since we have a paragraph to describe this later. + + +If transaction_timeout is shorter than +idle_in_transaction_session_timeout or statement_timeout +transaction_timeout will invalidate longer timeout. + + Since we are already try to disable the timeouts, should we try to disable them even if they are equal. + + +Prepared transactions are not subject for this timeout. + Maybe wrap this with is a good idea. > I’ve inspected CI fails and they were caused by two different problems: > 1. It’s unsafe for isaoltion tester to await transaction_timeout within a > query. Usually it gets > FATAL: terminating connection due to transaction timeout > But if VM is a bit slow it can get occasional > PQconsumeInput failed: server closed the connection unexpectedly > So, currently all tests use “passive waiting”, in a session that will not > timeout. > > 2. In some cases pg_sleep(0.1) were sleeping up to 200 ms. That was making s7 > and s8 fail, because they rely on this margin. I'm curious why this happened. > I’ve separated these tests into different test timeouts-long and increased > margin to 300ms. Now tests run horrible 2431 ms. Moreover I’m afraid that on > buildfarm we can have much randomly-slower machines so this test might be > excluded. > This test checks COMMIT AND CHAIN and flow of small queries (Nik’s case). > > Also I’ve verified that every "enable_timeout_after(TRANSACTION_TIMEOUT)” and > “disable_timeout(TRANSACTION_TIMEOUT)” is necessary and found that case of > aborting "idle in transaction (aborted)” is not covered by tests. I’m not > sure we need a test for this. I see there is a test about idle_in_transaction_timeout and transaction_timeout. Both of them only check the session, but don't check the reason, so we cannot distinguish the reason they are terminated. Right? > Japin, Junwang, what do you think? However, checking the reason on the timeout session may cause regression test failed (as you point in 1), I don't strongly insist on it. -- Best regards, Japin Li.
Re: Transaction timeout
> On 26 Jan 2024, at 11:44, Andrey M. Borodin wrote: > > > 1. It’s unsafe for isaoltion tester to await transaction_timeout within a > query. Usually it gets > FATAL: terminating connection due to transaction timeout > But if VM is a bit slow it can get occasional > PQconsumeInput failed: server closed the connection unexpectedly > So, currently all tests use “passive waiting”, in a session that will not > timeout. Oops, sorry, I’ve accidentally sent version without this fix. Here it is. Best regards, Andrey Borodin. v24-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
> On 22 Jan 2024, at 11:23, Peter Smith wrote: > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there was a CFbot test failure last time it was run [2]. Please have a > look and post an updated version if necessary. Thanks Peter! I’ve inspected CI fails and they were caused by two different problems: 1. It’s unsafe for isaoltion tester to await transaction_timeout within a query. Usually it gets FATAL: terminating connection due to transaction timeout But if VM is a bit slow it can get occasional PQconsumeInput failed: server closed the connection unexpectedly So, currently all tests use “passive waiting”, in a session that will not timeout. 2. In some cases pg_sleep(0.1) were sleeping up to 200 ms. That was making s7 and s8 fail, because they rely on this margin. I’ve separated these tests into different test timeouts-long and increased margin to 300ms. Now tests run horrible 2431 ms. Moreover I’m afraid that on buildfarm we can have much randomly-slower machines so this test might be excluded. This test checks COMMIT AND CHAIN and flow of small queries (Nik’s case). Also I’ve verified that every "enable_timeout_after(TRANSACTION_TIMEOUT)” and “disable_timeout(TRANSACTION_TIMEOUT)” is necessary and found that case of aborting "idle in transaction (aborted)” is not covered by tests. I’m not sure we need a test for this. Japin, Junwang, what do you think? Thanks! Best regards, Andrey Borodin. v23-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
2024-01 Commitfest. Hi, This patch has a CF status of "Needs Review" [1], but it seems there was a CFbot test failure last time it was run [2]. Please have a look and post an updated version if necessary. == [1] https://commitfest.postgresql.org/46/4040/ [2] https://cirrus-ci.com/task/4721191139672064 Kind Regards, Peter Smith.
Re: Transaction timeout
> On 4 Jan 2024, at 07:14, Japin Li wrote: > > Does the timeout is too short for testing? I see the timeouts for lock_timeout > and statement_timeout is more bigger than transaction_timeout. Makes sense. Done. I've also put some effort into fine-tuning timeouts Nik's case tests. To have 100ms gap between check, false positive and actual bug we had I had to use transaction_timeout = 300ms. Currently all tests take more than 1000ms! But I do not see a way to make these tests both stable and short. Best regards, Andrey Borodin. v22-0001-Introduce-transaction_timeout.patch Description: Binary data v22-0002-Add-tests-for-transaction_timeout.patch Description: Binary data v22-0003-Try-to-enable-transaction_timeout-before-next-co.patch Description: Binary data v22-0004-fix-reschedule-timeout-for-each-commmand.patch Description: Binary data
Re: Transaction timeout
On Wed, 03 Jan 2024 at 20:04, Andrey M. Borodin wrote: >> On 3 Jan 2024, at 16:46, Andrey M. Borodin wrote: >> >> I do not understand why, but mailing list did not pick patches that I sent. >> I'll retry. > > > Sorry for the noise. Seems like Apple updated something in Mail.App couple of > days ago and it started to use strange "Apple-Mail" stuff by default. > I see patches were attached, but were not recognized by mailing list archives > and CFbot. > Now I've flipped everything to "plain text by default" everywhere. Hope that > helps. > Thanks for updating the patch, I find the test on Debian with mason failed [1]. Does the timeout is too short for testing? I see the timeouts for lock_timeout and statement_timeout is more bigger than transaction_timeout. [1] https://api.cirrus-ci.com/v1/artifact/task/5490718928535552/testrun/build-32/testrun/isolation/isolation/regression.diffs
Re: Transaction timeout
> On 3 Jan 2024, at 16:46, Andrey M. Borodin wrote: > > I do not understand why, but mailing list did not pick patches that I sent. > I'll retry. Sorry for the noise. Seems like Apple updated something in Mail.App couple of days ago and it started to use strange "Apple-Mail" stuff by default. I see patches were attached, but were not recognized by mailing list archives and CFbot. Now I've flipped everything to "plain text by default" everywhere. Hope that helps. Best regards, Andrey Borodin. v21-0001-Introduce-transaction_timeout.patch Description: Binary data v21-0004-fix-reschedule-timeout-for-each-commmand.patch Description: Binary data v21-0003-Try-to-enable-transaction_timeout-before-next-co.patch Description: Binary data v21-0002-Add-better-tests-for-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
On 3 Jan 2024, at 11:39, Andrey M. Borodinwrote:On 1 Jan 2024, at 19:28, Andrey M. Borodin wrote: 3. Check that timeout is not rescheduled by new queries (Nik's case)The test of Nik's case was not stable enough together with COMMIT AND CHAIN. So I've separated these cases into different permutations.Looking through CI logs it seems variation in sleeps and actual timeouts easily reach 30+ms. I'm not entirely sure we can reach 100% stable tests without too big timeouts.Best regards, Andrey Borodin. I do not understand why, but mailing list did not pick patches that I sent. I'll retry. v21-0001-Introduce-transaction_timeout.patch Description: Binary data v21-0004-fix-reschedule-timeout-for-each-commmand.patch Description: Binary data v21-0003-Try-to-enable-transaction_timeout-before-next-co.patch Description: Binary data v21-0002-Add-better-tests-for-transaction_timeout.patch Description: Binary data Best regards, Andrey Borodin.
Re: Transaction timeout
On 1 Jan 2024, at 19:28, Andrey M. Borodinwrote: 3. Check that timeout is not rescheduled by new queries (Nik's case)The test of Nik's case was not stable enough together with COMMIT AND CHAIN. So I've separated these cases into different permutations.Looking through CI logs it seems variation in sleeps and actual timeouts easily reach 30+ms. I'm not entirely sure we can reach 100% stable tests without too big timeouts.Best regards, Andrey Borodin. v21-0001-Introduce-transaction_timeout.patch Description: Binary data v21-0002-Add-better-tests-for-transaction_timeout.patch Description: Binary data v21-0003-Try-to-enable-transaction_timeout-before-next-co.patch Description: Binary data v21-0004-fix-reschedule-timeout-for-each-commmand.patch Description: Binary data
Re: Transaction timeout
> On 29 Dec 2023, at 16:15, Andrey M. Borodin wrote: PFA v20. Code steps are intact. Further refactored tests: 1. Check termination of active and idle queries (previously tests from Li were testing only termination of idle query) 2. Check timeout reschedule (even when last active query was 'SET transaction_timeout') 3. Check that timeout is not rescheduled by new queries (Nik's case) Do we have any other open items? I've left 'make check-timeouts' in isolation directory, it's for development purposes. I think we should remove this before committing. Obviously, all patch steps are expected to be squashed before commit. Best regards, Andrey Borodin. v20-0001-Introduce-transaction_timeout.patch Description: Binary data v20-0004-fix-reschedule-timeout-for-each-commmand.patch Description: Binary data v20-0003-Try-to-enable-transaction_timeout-before-next-co.patch Description: Binary data v20-0002-Add-better-tests-for-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
> On 29 Dec 2023, at 16:00, Junwang Zhao wrote: > > After exploring the code, I found scheduling the timeout in > `StartTransaction` might be a reasonable idea, all the chain > commands will call this function. > > What concerns me is that it is also called by StartParallelWorkerTransaction, > I'm not sure if we should enable this timeout for parallel execution. I think for parallel workers we should mimic statement_timeout. Because these workers have per-statemenent lifetime. Best regards, Andrey Borodin.
Re: Transaction timeout
On Fri, Dec 29, 2023 at 6:00 PM Andrey M. Borodin wrote: > > > > > On 28 Dec 2023, at 21:02, Junwang Zhao wrote: > > > > Seems V5~V17 doesn't work as expected for Nikolay's case: > > > > Yeah, that's a problem. > > So I propose the following change, what do you think? > This breaks COMMIT AND CHAIN. > > PFA v18: I've added a test for Nik's case and for COMMIT AND CHAIN. Now we > need to fix stuff to pass this tests (I've crafted output). > We also need test for patchset step "Try to enable transaction_timeout before > next command". > > Thanks! After exploring the code, I found scheduling the timeout in `StartTransaction` might be a reasonable idea, all the chain commands will call this function. What concerns me is that it is also called by StartParallelWorkerTransaction, I'm not sure if we should enable this timeout for parallel execution. Thought? > > > Best regards, Andrey Borodin. -- Regards Junwang Zhao v19-0002-Use-test-from-Li-Japin-Also-add-tests-for-multip.patch Description: Binary data v19-0003-Try-to-enable-transaction_timeout-before-next-co.patch Description: Binary data v19-0001-Introduce-transaction_timeout.patch Description: Binary data v19-0004-fix-reschedule-timeout-for-each-commmand.patch Description: Binary data
Re: Transaction timeout
> On 28 Dec 2023, at 21:02, Junwang Zhao wrote: > > Seems V5~V17 doesn't work as expected for Nikolay's case: > Yeah, that's a problem. > So I propose the following change, what do you think? This breaks COMMIT AND CHAIN. PFA v18: I've added a test for Nik's case and for COMMIT AND CHAIN. Now we need to fix stuff to pass this tests (I've crafted output). We also need test for patchset step "Try to enable transaction_timeout before next command". Thanks! Best regards, Andrey Borodin. v18-0001-Introduce-transaction_timeout.patch Description: Binary data v18-0002-Use-test-from-Li-Japin.patch Description: Binary data v18-0003-Try-to-enable-transaction_timeout-before-next-co.patch Description: Binary data
Re: Transaction timeout
Hey Andrey, On Sun, Dec 24, 2023 at 1:14 AM Andrey M. Borodin wrote: > > > > > On 22 Dec 2023, at 10:39, Japin Li wrote: > > > > > > I try to split the test for transaction timeout, and all passed on my CI > > [1]. > > > I like the refactoring you did in timeout.spec. I thought it is impossible, > because permutations would try to reinitialize FATALed sessions. But, > obviously, tests work the way you refactored it. > However I don't think ignoring test failures on Windows without understanding > root cause is a good idea. > Let's get back to v13 version of tests, understand why it failed, apply your > test refactorings afterwards. BTW are you sure that v14 refactorings are > functional equivalent of v13 tests? > > To go with this plan I attach slightly modified version of v13 tests in v16 > patchset. The only change is timing in "sleep_there" step. I suspect that > failure was induced by more coarse timer granularity on Windows. Tests were > giving only 9 milliseconds for a timeout to entirely wipe away backend from > pg_stat_activity. This saves testing time, but might induce false positive > test flaps. So I've raised wait times to 100ms. This seems too much, but I do > not have other ideas how to ensure tests stability. Maybe 50ms would be > enough, I do not know. Isolation runs ~50 seconds now. I'm tempted to say > that 200ms for timeouts worth it. > > > As to 2nd step "Try to enable transaction_timeout during transaction", I > think this makes sense. But if we are doing so, shouldn't we also allow to > enable idle_in_transaction timeout in a same manner? Currently we only allow > to disable other timeouts... Also, if we are already in transaction, > shouldn't we also subtract current transaction span from timeout? > I think making this functionality as another step of the patchset was a good > idea. > > Thanks! Seems V5~V17 doesn't work as expected for Nikolay's case: postgres=# set transaction_timeout to '2s'; SET postgres=# begin; select pg_sleep(1); select pg_sleep(1); select pg_sleep(1); select pg_sleep(1); select pg_sleep(1); commit; BEGIN The reason for this seems the timer has been refreshed for each command, xact_started along can not indicate it's a new transaction or not, there is a TransactionState contains some infos. So I propose the following change, what do you think? diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index a2611cf8e6..cffd2c44d0 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2746,7 +2746,7 @@ start_xact_command(void) StartTransactionCommand(); /* Schedule or reschedule transaction timeout */ - if (TransactionTimeout > 0) + if (TransactionTimeout > 0 && !get_timeout_active(TRANSACTION_TIMEOUT)) enable_timeout_after(TRANSACTION_TIMEOUT, TransactionTimeout); xact_started = true; > > > Best regards, Andrey Borodin. -- Regards Junwang Zhao
Re: Transaction timeout
On Sun, 24 Dec 2023 at 01:14, Andrey M. Borodin wrote: >> On 22 Dec 2023, at 10:39, Japin Li wrote: >> >> >> I try to split the test for transaction timeout, and all passed on my CI [1]. > > > I like the refactoring you did in timeout.spec. I thought it is impossible, > because permutations would try to reinitialize FATALed sessions. But, > obviously, tests work the way you refactored it. > However I don't think ignoring test failures on Windows without understanding > root cause is a good idea. Yeah. > Let's get back to v13 version of tests, understand why it failed, apply your > test refactorings afterwards. BTW are you sure that v14 refactorings are > functional equivalent of v13 tests? > I think it is equivalent. Maybe I missing something. Please let me known if they are not equivalent. > To go with this plan I attach slightly modified version of v13 tests in v16 > patchset. The only change is timing in "sleep_there" step. I suspect that > failure was induced by more coarse timer granularity on Windows. Tests were > giving only 9 milliseconds for a timeout to entirely wipe away backend from > pg_stat_activity. This saves testing time, but might induce false positive > test flaps. So I've raised wait times to 100ms. This seems too much, but I do > not have other ideas how to ensure tests stability. Maybe 50ms would be > enough, I do not know. Isolation runs ~50 seconds now. I'm tempted to say > that 200ms for timeouts worth it. > So this is caused by Windows timer granularity? > As to 2nd step "Try to enable transaction_timeout during transaction", I > think this makes sense. But if we are doing so, shouldn't we also allow to > enable idle_in_transaction timeout in a same manner? I think the current idle_in_transaction_session_timeout work correctly. > Currently we only allow to disable other timeouts... Also, if we are already > in transaction, shouldn't we also subtract current transaction span from > timeout? Agreed. > I think making this functionality as another step of the patchset was a good > idea. > -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
Hi Andrey, On Sun, Dec 24, 2023 at 1:14 AM Andrey M. Borodin wrote: > > > > > On 22 Dec 2023, at 10:39, Japin Li wrote: > > > > > > I try to split the test for transaction timeout, and all passed on my CI > > [1]. > > > I like the refactoring you did in timeout.spec. I thought it is impossible, > because permutations would try to reinitialize FATALed sessions. But, > obviously, tests work the way you refactored it. > However I don't think ignoring test failures on Windows without understanding > root cause is a good idea. > Let's get back to v13 version of tests, understand why it failed, apply your > test refactorings afterwards. BTW are you sure that v14 refactorings are > functional equivalent of v13 tests? > > To go with this plan I attach slightly modified version of v13 tests in v16 > patchset. The only change is timing in "sleep_there" step. I suspect that > failure was induced by more coarse timer granularity on Windows. Tests were > giving only 9 milliseconds for a timeout to entirely wipe away backend from > pg_stat_activity. This saves testing time, but might induce false positive > test flaps. So I've raised wait times to 100ms. This seems too much, but I do > not have other ideas how to ensure tests stability. Maybe 50ms would be > enough, I do not know. Isolation runs ~50 seconds now. I'm tempted to say > that 200ms for timeouts worth it. > > > As to 2nd step "Try to enable transaction_timeout during transaction", I > think this makes sense. But if we are doing so, shouldn't we also allow to > enable idle_in_transaction timeout in a same manner? Currently we only allow > to disable other timeouts... Also, if we are already in transaction, > shouldn't we also subtract current transaction span from timeout? idle_in_transaction_session_timeout is already the behavior Japin suggested, it is enabled before backend sends *read for query* to client. > I think making this functionality as another step of the patchset was a good > idea. > > Thanks! > > > Best regards, Andrey Borodin. -- Regards Junwang Zhao
Re: Transaction timeout
> On 22 Dec 2023, at 10:39, Japin Li wrote: > > > I try to split the test for transaction timeout, and all passed on my CI [1]. I like the refactoring you did in timeout.spec. I thought it is impossible, because permutations would try to reinitialize FATALed sessions. But, obviously, tests work the way you refactored it. However I don't think ignoring test failures on Windows without understanding root cause is a good idea. Let's get back to v13 version of tests, understand why it failed, apply your test refactorings afterwards. BTW are you sure that v14 refactorings are functional equivalent of v13 tests? To go with this plan I attach slightly modified version of v13 tests in v16 patchset. The only change is timing in "sleep_there" step. I suspect that failure was induced by more coarse timer granularity on Windows. Tests were giving only 9 milliseconds for a timeout to entirely wipe away backend from pg_stat_activity. This saves testing time, but might induce false positive test flaps. So I've raised wait times to 100ms. This seems too much, but I do not have other ideas how to ensure tests stability. Maybe 50ms would be enough, I do not know. Isolation runs ~50 seconds now. I'm tempted to say that 200ms for timeouts worth it. As to 2nd step "Try to enable transaction_timeout during transaction", I think this makes sense. But if we are doing so, shouldn't we also allow to enable idle_in_transaction timeout in a same manner? Currently we only allow to disable other timeouts... Also, if we are already in transaction, shouldn't we also subtract current transaction span from timeout? I think making this functionality as another step of the patchset was a good idea. Thanks! Best regards, Andrey Borodin. v17-0001-Introduce-transaction_timeout.patch Description: Binary data v17-0002-Try-to-enable-transaction_timeout-before-next-co.patch Description: Binary data
Re: Transaction timeout
> 在 2023年12月23日,11:35,Junwang Zhao 写道: > > On Sat, Dec 23, 2023 at 11:17 AM Japin Li wrote: >> >> a >>> On Sat, 23 Dec 2023 at 10:40, Japin Li wrote: >>> On Sat, 23 Dec 2023 at 08:32, Japin Li wrote: On Fri, 22 Dec 2023 at 23:30, Junwang Zhao wrote: > On Fri, Dec 22, 2023 at 10:44 PM Japin Li wrote: >> >> >> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao wrote: >>> On Fri, Dec 22, 2023 at 10:25 PM Japin Li wrote: I try to set idle_in_transaction_session_timeout after begin transaction, it changes immediately, so I think transaction_timeout should also be take immediately. >>> >>> Ah, right, idle_in_transaction_session_timeout is set after the set >>> command finishes and before the backend send *ready for query* >>> to the client, so the value of the GUC is already set before >>> next command. >>> >> >> I mean, is it possible to set transaction_timeout before next comand? >> > Yeah, it's possible, set transaction_timeout in the when it first > goes into *idle in transaction* mode, see the attached files. > Thanks for updating the patch, LGTM. >>> >>> Sorry for the noise! >>> >>> Read the previous threads, I find why the author enable transaction_timeout >>> in start_xact_command(). >>> >>> The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example: >>> >>> SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND >>> CHAIN; SELECT 2, pg_sleep(1); COMMIT; >>> >>> The transaction_timeout do not reset when executing COMMIT AND CHAIN. >>> >>> [1] >>> https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com >> >> Attach v16 to solve this. Any suggestions? > > I've checked this with *COMMIT AND CHAIN* and *ABORT AND CHAIN*, > both work as expected. Thanks for the update. > Thanks for your testing and reviewing!
Re: Transaction timeout
On Sat, Dec 23, 2023 at 11:17 AM Japin Li wrote: > > a > On Sat, 23 Dec 2023 at 10:40, Japin Li wrote: > > On Sat, 23 Dec 2023 at 08:32, Japin Li wrote: > >> On Fri, 22 Dec 2023 at 23:30, Junwang Zhao wrote: > >>> On Fri, Dec 22, 2023 at 10:44 PM Japin Li wrote: > > > On Fri, 22 Dec 2023 at 22:37, Junwang Zhao wrote: > > On Fri, Dec 22, 2023 at 10:25 PM Japin Li wrote: > >> I try to set idle_in_transaction_session_timeout after begin > >> transaction, > >> it changes immediately, so I think transaction_timeout should also be > >> take > >> immediately. > > > > Ah, right, idle_in_transaction_session_timeout is set after the set > > command finishes and before the backend send *ready for query* > > to the client, so the value of the GUC is already set before > > next command. > > > > I mean, is it possible to set transaction_timeout before next comand? > > >>> Yeah, it's possible, set transaction_timeout in the when it first > >>> goes into *idle in transaction* mode, see the attached files. > >>> > >> > >> Thanks for updating the patch, LGTM. > > > > Sorry for the noise! > > > > Read the previous threads, I find why the author enable transaction_timeout > > in start_xact_command(). > > > > The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example: > > > > SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND > > CHAIN; SELECT 2, pg_sleep(1); COMMIT; > > > > The transaction_timeout do not reset when executing COMMIT AND CHAIN. > > > > [1] > > https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com > > Attach v16 to solve this. Any suggestions? I've checked this with *COMMIT AND CHAIN* and *ABORT AND CHAIN*, both work as expected. Thanks for the update. > > -- > Regrads, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. > -- Regards Junwang Zhao
Re: Transaction timeout
imeout. + + + +Setting transaction_timeout in +postgresql.conf is not recommended because it would +affect all sessions. + + + +Prepared transactions are not subject for this timeout. + + + + lock_timeout (integer) diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index b04fcfc8c8..e6fa1cfdc2 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -586,6 +586,7 @@ AutoVacLauncherMain(int argc, char *argv[]) * regular maintenance from being executed. */ SetConfigOption("statement_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE); + SetConfigOption("transaction_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE); SetConfigOption("lock_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE); SetConfigOption("idle_in_transaction_session_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE); @@ -1591,6 +1592,7 @@ AutoVacWorkerMain(int argc, char *argv[]) * regular maintenance from being executed. */ SetConfigOption("statement_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE); + SetConfigOption("transaction_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE); SetConfigOption("lock_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE); SetConfigOption("idle_in_transaction_session_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE); diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index b6451d9d08..4be06c1e5d 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -59,6 +59,7 @@ int DeadlockTimeout = 1000; int StatementTimeout = 0; int LockTimeout = 0; int IdleInTransactionSessionTimeout = 0; +int TransactionTimeout = 0; int IdleSessionTimeout = 0; bool log_lock_waits = false; diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 7298a187d1..a2611cf8e6 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2745,6 +2745,10 @@ start_xact_command(void) { StartTransactionCommand(); + /* Schedule or reschedule transaction timeout */ + if (TransactionTimeout > 0) + enable_timeout_after(TRANSACTION_TIMEOUT, TransactionTimeout); + xact_started = true; } @@ -3426,6 +3430,17 @@ ProcessInterrupts(void) IdleInTransactionSessionTimeoutPending = false; } + if (TransactionTimeoutPending) + { + /* As above, ignore the signal if the GUC has been reset to zero. */ + if (TransactionTimeout > 0) + ereport(FATAL, + (errcode(ERRCODE_TRANSACTION_TIMEOUT), + errmsg("terminating connection due to transaction timeout"))); + else + TransactionTimeoutPending = false; + } + if (IdleSessionTimeoutPending) { /* As above, ignore the signal if the GUC has been reset to zero. */ @@ -4491,7 +4506,8 @@ PostgresMain(const char *dbname, const char *username) pgstat_report_activity(STATE_IDLEINTRANSACTION_ABORTED, NULL); /* Start the idle-in-transaction timer */ -if (IdleInTransactionSessionTimeout > 0) +if (IdleInTransactionSessionTimeout > 0 + && (IdleInTransactionSessionTimeout < TransactionTimeout || TransactionTimeout == 0)) { idle_in_transaction_timeout_enabled = true; enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT, @@ -4504,7 +4520,8 @@ PostgresMain(const char *dbname, const char *username) pgstat_report_activity(STATE_IDLEINTRANSACTION, NULL); /* Start the idle-in-transaction timer */ -if (IdleInTransactionSessionTimeout > 0) +if (IdleInTransactionSessionTimeout > 0 + && (IdleInTransactionSessionTimeout < TransactionTimeout || TransactionTimeout == 0)) { idle_in_transaction_timeout_enabled = true; enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT, @@ -4562,6 +4579,9 @@ PostgresMain(const char *dbname, const char *username) enable_timeout_after(IDLE_SESSION_TIMEOUT, IdleSessionTimeout); } + +if (get_timeout_active(TRANSACTION_TIMEOUT)) + disable_timeout(TRANSACTION_TIMEOUT, false); } /* Report any recently-changed GUC options */ @@ -5120,7 +5140,8 @@ enable_statement_timeout(void) /* must be within an xact */ Assert(xact_started); - if (StatementTimeout > 0) + if (StatementTimeout > 0 + && (StatementTimeout < TransactionTimeout || TransactionTimeout == 0)) { if (!get_timeout_active(STATEMENT_TIMEOUT)) enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout); diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt index 8e97a0150f..8f1157afee 100644 --- a/src/backend/utils/errcodes.txt +++ b/src/backend/utils/errcodes.txt @@ -252,6 +252,7 @@ Section: Class 25 - Invalid Transaction State 25P01EERRCODE_NO_ACTIVE_SQL_TRANSACTION
Re: Transaction timeout
On Sat, Dec 23, 2023 at 10:40 AM Japin Li wrote: > > > On Sat, 23 Dec 2023 at 08:32, Japin Li wrote: > > On Fri, 22 Dec 2023 at 23:30, Junwang Zhao wrote: > >> On Fri, Dec 22, 2023 at 10:44 PM Japin Li wrote: > >>> > >>> > >>> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao wrote: > >>> > On Fri, Dec 22, 2023 at 10:25 PM Japin Li wrote: > >>> >> I try to set idle_in_transaction_session_timeout after begin > >>> >> transaction, > >>> >> it changes immediately, so I think transaction_timeout should also be > >>> >> take > >>> >> immediately. > >>> > > >>> > Ah, right, idle_in_transaction_session_timeout is set after the set > >>> > command finishes and before the backend send *ready for query* > >>> > to the client, so the value of the GUC is already set before > >>> > next command. > >>> > > >>> > >>> I mean, is it possible to set transaction_timeout before next comand? > >>> > >> Yeah, it's possible, set transaction_timeout in the when it first > >> goes into *idle in transaction* mode, see the attached files. > >> > > > > Thanks for updating the patch, LGTM. > > Sorry for the noise! > > Read the previous threads, I find why the author enable transaction_timeout > in start_xact_command(). > > The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example: I didn't read the previous threads, sorry for that, let's stick to v14. > > SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND > CHAIN; SELECT 2, pg_sleep(1); COMMIT; > > The transaction_timeout do not reset when executing COMMIT AND CHAIN. > > [1] > https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com > > -- > Regrads, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. -- Regards Junwang Zhao
Re: Transaction timeout
On Sat, 23 Dec 2023 at 08:32, Japin Li wrote: > On Fri, 22 Dec 2023 at 23:30, Junwang Zhao wrote: >> On Fri, Dec 22, 2023 at 10:44 PM Japin Li wrote: >>> >>> >>> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao wrote: >>> > On Fri, Dec 22, 2023 at 10:25 PM Japin Li wrote: >>> >> I try to set idle_in_transaction_session_timeout after begin transaction, >>> >> it changes immediately, so I think transaction_timeout should also be >>> >> take >>> >> immediately. >>> > >>> > Ah, right, idle_in_transaction_session_timeout is set after the set >>> > command finishes and before the backend send *ready for query* >>> > to the client, so the value of the GUC is already set before >>> > next command. >>> > >>> >>> I mean, is it possible to set transaction_timeout before next comand? >>> >> Yeah, it's possible, set transaction_timeout in the when it first >> goes into *idle in transaction* mode, see the attached files. >> > > Thanks for updating the patch, LGTM. Sorry for the noise! Read the previous threads, I find why the author enable transaction_timeout in start_xact_command(). The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example: SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND CHAIN; SELECT 2, pg_sleep(1); COMMIT; The transaction_timeout do not reset when executing COMMIT AND CHAIN. [1] https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
On Fri, 22 Dec 2023 at 23:30, Junwang Zhao wrote: > On Fri, Dec 22, 2023 at 10:44 PM Japin Li wrote: >> >> >> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao wrote: >> > On Fri, Dec 22, 2023 at 10:25 PM Japin Li wrote: >> >> I try to set idle_in_transaction_session_timeout after begin transaction, >> >> it changes immediately, so I think transaction_timeout should also be take >> >> immediately. >> > >> > Ah, right, idle_in_transaction_session_timeout is set after the set >> > command finishes and before the backend send *ready for query* >> > to the client, so the value of the GUC is already set before >> > next command. >> > >> >> I mean, is it possible to set transaction_timeout before next comand? >> > Yeah, it's possible, set transaction_timeout in the when it first > goes into *idle in transaction* mode, see the attached files. > Thanks for updating the patch, LGTM. -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
On Fri, Dec 22, 2023 at 10:44 PM Japin Li wrote: > > > On Fri, 22 Dec 2023 at 22:37, Junwang Zhao wrote: > > On Fri, Dec 22, 2023 at 10:25 PM Japin Li wrote: > >> I try to set idle_in_transaction_session_timeout after begin transaction, > >> it changes immediately, so I think transaction_timeout should also be take > >> immediately. > > > > Ah, right, idle_in_transaction_session_timeout is set after the set > > command finishes and before the backend send *ready for query* > > to the client, so the value of the GUC is already set before > > next command. > > > > I mean, is it possible to set transaction_timeout before next comand? > Yeah, it's possible, set transaction_timeout in the when it first goes into *idle in transaction* mode, see the attached files. > > -- > Regrads, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. -- Regards Junwang Zhao v15-0001-Introduce-transaction_timeout.patch Description: Binary data v15-0002-set-transaction_timeout-before-next-command.patch Description: Binary data
Re: Transaction timeout
On Fri, 22 Dec 2023 at 22:37, Junwang Zhao wrote: > On Fri, Dec 22, 2023 at 10:25 PM Japin Li wrote: >> I try to set idle_in_transaction_session_timeout after begin transaction, >> it changes immediately, so I think transaction_timeout should also be take >> immediately. > > Ah, right, idle_in_transaction_session_timeout is set after the set > command finishes and before the backend send *ready for query* > to the client, so the value of the GUC is already set before > next command. > I mean, is it possible to set transaction_timeout before next comand? -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
On Fri, Dec 22, 2023 at 10:25 PM Japin Li wrote: > > > On Fri, 22 Dec 2023 at 20:29, Junwang Zhao wrote: > > On Fri, Dec 22, 2023 at 1:39 PM Japin Li wrote: > >> > >> > >> On Tue, 19 Dec 2023 at 22:06, Japin Li wrote: > >> > On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin > >> > wrote: > >> >>> On 19 Dec 2023, at 13:26, Andrey M. Borodin > >> >>> wrote: > >> >>> > >> >>> I don’t have Windows machine, so I hope CF bot will pick this. > >> >> > >> >> I used Github CI to produce version of tests that seems to be is stable > >> >> on Windows. > >> > > >> > It still failed on Windows Server 2019 [1]. > >> > > >> > diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out > >> > C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out > >> > --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 > >> > 10:34:30.354721100 + > >> > +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out > >> > 2023-12-19 10:38:25.877981600 + > >> > @@ -100,7 +100,7 @@ > >> > step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE > >> > application_name = 'isolation/timeouts/stt2' > >> > count > >> > - > >> > -0 > >> > +1 > >> > (1 row) > >> > > >> > step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET > >> > statement_timeout = '10s'; SET lock_timeout = '10s'; SET > >> > transaction_timeout = '10s'; > >> > > >> > [1] > >> > https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs > >> > >> Hi, > >> > >> I try to split the test for transaction timeout, and all passed on my CI > >> [1]. > >> > >> OTOH, I find if I set transaction_timeout in a transaction, it will not > >> take > >> effect immediately. For example: > >> > >> [local]:2049802 postgres=# BEGIN; > >> BEGIN > >> [local]:2049802 postgres=*# SET transaction_timeout TO '1s'; > > when this execute, TransactionTimeout is still 0, this command will > > not set timeout > >> SET > >> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1; -- wait > >> 10s > > when this command get execute, start_xact_command will enable the timer > > Thanks for your exaplantion, got it. > > >>relname > >> -- > >> pg_statistic > >> (1 row) > >> > >> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1; > >> FATAL: terminating connection due to transaction timeout > >> server closed the connection unexpectedly > >> This probably means the server terminated abnormally > >> before or while processing the request. > >> The connection to the server was lost. Attempting reset: Succeeded. > >> > >> It looks odd. Does this is expected? I'm not read all the threads, > >> am I missing something? > > > > I think this is by design, if you debug statement_timeout, it's the same > > behaviour, the timeout will be set for each command after the second > > command was called, you just aren't aware of this. > > > > I try to set idle_in_transaction_session_timeout after begin transaction, > it changes immediately, so I think transaction_timeout should also be take > immediately. Ah, right, idle_in_transaction_session_timeout is set after the set command finishes and before the backend send *ready for query* to the client, so the value of the GUC is already set before next command. I bet you must have checked this ;) > > > I doubt people will set this in a transaction. > > Maybe not, > > > -- > Regrads, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. -- Regards Junwang Zhao
Re: Transaction timeout
On Fri, 22 Dec 2023 at 20:29, Junwang Zhao wrote: > On Fri, Dec 22, 2023 at 1:39 PM Japin Li wrote: >> >> >> On Tue, 19 Dec 2023 at 22:06, Japin Li wrote: >> > On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin >> > wrote: >> >>> On 19 Dec 2023, at 13:26, Andrey M. Borodin wrote: >> >>> >> >>> I don’t have Windows machine, so I hope CF bot will pick this. >> >> >> >> I used Github CI to produce version of tests that seems to be is stable >> >> on Windows. >> > >> > It still failed on Windows Server 2019 [1]. >> > >> > diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out >> > C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out >> > --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 >> > 10:34:30.354721100 + >> > +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out >> > 2023-12-19 10:38:25.877981600 + >> > @@ -100,7 +100,7 @@ >> > step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE >> > application_name = 'isolation/timeouts/stt2' >> > count >> > - >> > -0 >> > +1 >> > (1 row) >> > >> > step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET >> > statement_timeout = '10s'; SET lock_timeout = '10s'; SET >> > transaction_timeout = '10s'; >> > >> > [1] >> > https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs >> >> Hi, >> >> I try to split the test for transaction timeout, and all passed on my CI [1]. >> >> OTOH, I find if I set transaction_timeout in a transaction, it will not take >> effect immediately. For example: >> >> [local]:2049802 postgres=# BEGIN; >> BEGIN >> [local]:2049802 postgres=*# SET transaction_timeout TO '1s'; > when this execute, TransactionTimeout is still 0, this command will > not set timeout >> SET >> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1; -- wait >> 10s > when this command get execute, start_xact_command will enable the timer Thanks for your exaplantion, got it. >>relname >> -- >> pg_statistic >> (1 row) >> >> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1; >> FATAL: terminating connection due to transaction timeout >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> The connection to the server was lost. Attempting reset: Succeeded. >> >> It looks odd. Does this is expected? I'm not read all the threads, >> am I missing something? > > I think this is by design, if you debug statement_timeout, it's the same > behaviour, the timeout will be set for each command after the second > command was called, you just aren't aware of this. > I try to set idle_in_transaction_session_timeout after begin transaction, it changes immediately, so I think transaction_timeout should also be take immediately. > I doubt people will set this in a transaction. Maybe not, -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
On Fri, Dec 22, 2023 at 1:39 PM Japin Li wrote: > > > On Tue, 19 Dec 2023 at 22:06, Japin Li wrote: > > On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin > > wrote: > >>> On 19 Dec 2023, at 13:26, Andrey M. Borodin wrote: > >>> > >>> I don’t have Windows machine, so I hope CF bot will pick this. > >> > >> I used Github CI to produce version of tests that seems to be is stable on > >> Windows. > > > > It still failed on Windows Server 2019 [1]. > > > > diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out > > C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out > > --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 > > 10:34:30.354721100 + > > +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out > > 2023-12-19 10:38:25.877981600 + > > @@ -100,7 +100,7 @@ > > step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE > > application_name = 'isolation/timeouts/stt2' > > count > > - > > -0 > > +1 > > (1 row) > > > > step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET > > statement_timeout = '10s'; SET lock_timeout = '10s'; SET > > transaction_timeout = '10s'; > > > > [1] > > https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs > > Hi, > > I try to split the test for transaction timeout, and all passed on my CI [1]. > > OTOH, I find if I set transaction_timeout in a transaction, it will not take > effect immediately. For example: > > [local]:2049802 postgres=# BEGIN; > BEGIN > [local]:2049802 postgres=*# SET transaction_timeout TO '1s'; when this execute, TransactionTimeout is still 0, this command will not set timeout > SET > [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1; -- wait 10s when this command get execute, start_xact_command will enable the timer >relname > -- > pg_statistic > (1 row) > > [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1; > FATAL: terminating connection due to transaction timeout > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Succeeded. > > It looks odd. Does this is expected? I'm not read all the threads, > am I missing something? I think this is by design, if you debug statement_timeout, it's the same behaviour, the timeout will be set for each command after the second command was called, you just aren't aware of this. I doubt people will set this in a transaction. > > [1] https://cirrus-ci.com/build/6574686130143232 > > -- > Regrads, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. > -- Regards Junwang Zhao
Re: Transaction timeout
On Tue, 19 Dec 2023 at 22:06, Japin Li wrote: > On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin wrote: >>> On 19 Dec 2023, at 13:26, Andrey M. Borodin wrote: >>> >>> I don’t have Windows machine, so I hope CF bot will pick this. >> >> I used Github CI to produce version of tests that seems to be is stable on >> Windows. > > It still failed on Windows Server 2019 [1]. > > diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out > C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out > --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 > 10:34:30.354721100 + > +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out > 2023-12-19 10:38:25.877981600 + > @@ -100,7 +100,7 @@ > step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE > application_name = 'isolation/timeouts/stt2' > count > - > -0 > +1 > (1 row) > > step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET > statement_timeout = '10s'; SET lock_timeout = '10s'; SET transaction_timeout > = '10s'; > > [1] > https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs Hi, I try to split the test for transaction timeout, and all passed on my CI [1]. OTOH, I find if I set transaction_timeout in a transaction, it will not take effect immediately. For example: [local]:2049802 postgres=# BEGIN; BEGIN [local]:2049802 postgres=*# SET transaction_timeout TO '1s'; SET [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1; -- wait 10s relname -- pg_statistic (1 row) [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1; FATAL: terminating connection due to transaction timeout server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. It looks odd. Does this is expected? I'm not read all the threads, am I missing something? [1] https://cirrus-ci.com/build/6574686130143232 -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd. >From fb87e5fe2ea5ced51a7e443243cdd40115423449 Mon Sep 17 00:00:00 2001 From: "Andrey M. Borodin" Date: Sun, 3 Dec 2023 23:18:00 +0500 Subject: [PATCH v13 1/1] Introduce transaction_timeout This commit adds timeout that is expected to be used as a prevention of long-running queries. Any session within transaction will be terminated after spanning longer than this timeout. However, this timeout is not applied to prepared transactions. Only transactions with user connections are affected. Author: Andrey Borodin Reviewed-by: Nikolay Samokhvalov Reviewed-by: Andres Freund Reviewed-by: Fujii Masao Reviewed-by: bt23nguyent Reviewed-by: Yuhang Qiu Reviewed-by: Japin Li Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com --- doc/src/sgml/config.sgml | 35 +++ src/backend/postmaster/autovacuum.c | 2 + src/backend/storage/lmgr/proc.c | 1 + src/backend/tcop/postgres.c | 27 +++- src/backend/utils/errcodes.txt| 1 + src/backend/utils/init/globals.c | 1 + src/backend/utils/init/postinit.c | 10 +++ src/backend/utils/misc/guc_tables.c | 11 src/backend/utils/misc/postgresql.conf.sample | 1 + src/bin/pg_dump/pg_backup_archiver.c | 2 + src/bin/pg_dump/pg_dump.c | 2 + src/bin/pg_rewind/libpq_source.c | 1 + src/include/miscadmin.h | 1 + src/include/storage/proc.h| 1 + src/include/utils/timeout.h | 1 + src/test/isolation/Makefile | 5 +- src/test/isolation/expected/timeouts.out | 63 ++- src/test/isolation/specs/timeouts.spec| 30 + 18 files changed, 190 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b5624ca884..d62edcf83b 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -9134,6 +9134,41 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + transaction_timeout (integer) + + transaction_timeout configuration parameter + + + + +Terminate any session that spans longer than the specified amount of +time in transaction. The limit applies both to explicit transactions +(started with BEGIN) and to implicitly started +transaction corresponding to single statement. But this limit is not +applied to prepared transactions.
Re: Transaction timeout
Hi Junwang Zhao Agree +1 Best whish Junwang Zhao 于2023年12月20日周三 10:35写道: > On Wed, Dec 20, 2023 at 9:58 AM Thomas wen > wrote: > > > > Hi Junwang Zhao > > #should we invalidate lock_timeout? Or maybe just document this. > > I think you mean when lock_time is greater than trasaction-time > invalidate lock_timeout or needs to be logged ? > > > I mean the interleaving of the gucs, which is lock_timeout and the new > introduced transaction_timeout, > if lock_timeout >= transaction_timeout, seems no need to enable > lock_timeout. > > > > > > > > Best whish > > > > 发件人: Junwang Zhao > > 发送时间: 2023年12月20日 9:48 > > 收件人: Andrey M. Borodin > > 抄送: Japin Li ; 邱宇航 ; Fujii Masao > ; Andrey Borodin ; > Andres Freund ; Michael Paquier < > michael.paqu...@gmail.com>; Nikolay Samokhvalov ; > pgsql-hackers ; > pgsql-hackers@lists.postgresql.org > > 主题: Re: Transaction timeout > > > > On Tue, Dec 19, 2023 at 10:51 PM Junwang Zhao wrote: > > > > > > On Tue, Dec 19, 2023 at 6:27 PM Andrey M. Borodin < > x4...@yandex-team.ru> wrote: > > > > > > > > > > > > > > > > > On 19 Dec 2023, at 13:26, Andrey M. Borodin > wrote: > > > > > > > > > > I don’t have Windows machine, so I hope CF bot will pick this. > > > > > > > > I used Github CI to produce version of tests that seems to be is > stable on Windows. > > > > Sorry for the noise. > > > > > > > > > > > > Best regards, Andrey Borodin. > > > > > > + > > > +If transaction_timeout is shorter than > > > +idle_in_transaction_session_timeout or > > > statement_timeout > > > +transaction_timeout will invalidate longer > timeout. > > > + > > > > > > When transaction_timeout is *equal* to > idle_in_transaction_session_timeout > > > or statement_timeout, idle_in_transaction_session_timeout and > statement_timeout > > > will also be invalidated, the logic in the code seems right, though > > > this document > > > is a little bit inaccurate. > > > > > > > Unlike statement_timeout, this timeout can > only occur > > while waiting for locks. Note that if > > statement_timeout > > is nonzero, it is rather pointless to set > > lock_timeout to > > the same or larger value, since the statement timeout would > always > > trigger first. If log_min_error_statement is > set to > > ERROR or lower, the statement that timed out > will be > > logged. > > > > > > There is a note about statement_timeout and lock_timeout, set both > > and lock_timeout >= statement_timeout is pointless, but this logic seems > not > > implemented in the code. I am wondering if lock_timeout >= > transaction_timeout, > > should we invalidate lock_timeout? Or maybe just document this. > > > > > -- > > > Regards > > > Junwang Zhao > > > > > > > > -- > > Regards > > Junwang Zhao > > > > > > > -- > Regards > Junwang Zhao > > >
Re: Transaction timeout
On Wed, Dec 20, 2023 at 9:58 AM Thomas wen wrote: > > Hi Junwang Zhao > #should we invalidate lock_timeout? Or maybe just document this. > I think you mean when lock_time is greater than trasaction-time invalidate > lock_timeout or needs to be logged ? > I mean the interleaving of the gucs, which is lock_timeout and the new introduced transaction_timeout, if lock_timeout >= transaction_timeout, seems no need to enable lock_timeout. > > > > Best whish > > 发件人: Junwang Zhao > 发送时间: 2023年12月20日 9:48 > 收件人: Andrey M. Borodin > 抄送: Japin Li ; 邱宇航 ; Fujii Masao > ; Andrey Borodin ; Andres > Freund ; Michael Paquier ; > Nikolay Samokhvalov ; pgsql-hackers > ; pgsql-hackers@lists.postgresql.org > > 主题: Re: Transaction timeout > > On Tue, Dec 19, 2023 at 10:51 PM Junwang Zhao wrote: > > > > On Tue, Dec 19, 2023 at 6:27 PM Andrey M. Borodin > > wrote: > > > > > > > > > > > > > On 19 Dec 2023, at 13:26, Andrey M. Borodin > > > > wrote: > > > > > > > > I don’t have Windows machine, so I hope CF bot will pick this. > > > > > > I used Github CI to produce version of tests that seems to be is stable > > > on Windows. > > > Sorry for the noise. > > > > > > > > > Best regards, Andrey Borodin. > > > > + > > +If transaction_timeout is shorter than > > +idle_in_transaction_session_timeout or > > statement_timeout > > +transaction_timeout will invalidate longer > > timeout. > > + > > > > When transaction_timeout is *equal* to idle_in_transaction_session_timeout > > or statement_timeout, idle_in_transaction_session_timeout and > > statement_timeout > > will also be invalidated, the logic in the code seems right, though > > this document > > is a little bit inaccurate. > > > > Unlike statement_timeout, this timeout can only > occur > while waiting for locks. Note that if > statement_timeout > is nonzero, it is rather pointless to set > lock_timeout to > the same or larger value, since the statement timeout would always > trigger first. If log_min_error_statement is set > to > ERROR or lower, the statement that timed out will > be > logged. > > > There is a note about statement_timeout and lock_timeout, set both > and lock_timeout >= statement_timeout is pointless, but this logic seems not > implemented in the code. I am wondering if lock_timeout >= > transaction_timeout, > should we invalidate lock_timeout? Or maybe just document this. > > > -- > > Regards > > Junwang Zhao > > > > -- > Regards > Junwang Zhao > > -- Regards Junwang Zhao
回复: Transaction timeout
Hi Junwang Zhao #should we invalidate lock_timeout? Or maybe just document this. I think you mean when lock_time is greater than trasaction-time invalidate lock_timeout or needs to be logged ? Best whish 发件人: Junwang Zhao 发送时间: 2023年12月20日 9:48 收件人: Andrey M. Borodin 抄送: Japin Li ; 邱宇航 ; Fujii Masao ; Andrey Borodin ; Andres Freund ; Michael Paquier ; Nikolay Samokhvalov ; pgsql-hackers ; pgsql-hackers@lists.postgresql.org 主题: Re: Transaction timeout On Tue, Dec 19, 2023 at 10:51 PM Junwang Zhao wrote: > > On Tue, Dec 19, 2023 at 6:27 PM Andrey M. Borodin > wrote: > > > > > > > > > On 19 Dec 2023, at 13:26, Andrey M. Borodin wrote: > > > > > > I don’t have Windows machine, so I hope CF bot will pick this. > > > > I used Github CI to produce version of tests that seems to be is stable on > > Windows. > > Sorry for the noise. > > > > > > Best regards, Andrey Borodin. > > + > +If transaction_timeout is shorter than > +idle_in_transaction_session_timeout or > statement_timeout > +transaction_timeout will invalidate longer > timeout. > + > > When transaction_timeout is *equal* to idle_in_transaction_session_timeout > or statement_timeout, idle_in_transaction_session_timeout and > statement_timeout > will also be invalidated, the logic in the code seems right, though > this document > is a little bit inaccurate. > Unlike statement_timeout, this timeout can only occur while waiting for locks. Note that if statement_timeout is nonzero, it is rather pointless to set lock_timeout to the same or larger value, since the statement timeout would always trigger first. If log_min_error_statement is set to ERROR or lower, the statement that timed out will be logged. There is a note about statement_timeout and lock_timeout, set both and lock_timeout >= statement_timeout is pointless, but this logic seems not implemented in the code. I am wondering if lock_timeout >= transaction_timeout, should we invalidate lock_timeout? Or maybe just document this. > -- > Regards > Junwang Zhao -- Regards Junwang Zhao
Re: Transaction timeout
On Tue, Dec 19, 2023 at 10:51 PM Junwang Zhao wrote: > > On Tue, Dec 19, 2023 at 6:27 PM Andrey M. Borodin > wrote: > > > > > > > > > On 19 Dec 2023, at 13:26, Andrey M. Borodin wrote: > > > > > > I don’t have Windows machine, so I hope CF bot will pick this. > > > > I used Github CI to produce version of tests that seems to be is stable on > > Windows. > > Sorry for the noise. > > > > > > Best regards, Andrey Borodin. > > + > +If transaction_timeout is shorter than > +idle_in_transaction_session_timeout or > statement_timeout > +transaction_timeout will invalidate longer > timeout. > + > > When transaction_timeout is *equal* to idle_in_transaction_session_timeout > or statement_timeout, idle_in_transaction_session_timeout and > statement_timeout > will also be invalidated, the logic in the code seems right, though > this document > is a little bit inaccurate. > Unlike statement_timeout, this timeout can only occur while waiting for locks. Note that if statement_timeout is nonzero, it is rather pointless to set lock_timeout to the same or larger value, since the statement timeout would always trigger first. If log_min_error_statement is set to ERROR or lower, the statement that timed out will be logged. There is a note about statement_timeout and lock_timeout, set both and lock_timeout >= statement_timeout is pointless, but this logic seems not implemented in the code. I am wondering if lock_timeout >= transaction_timeout, should we invalidate lock_timeout? Or maybe just document this. > -- > Regards > Junwang Zhao -- Regards Junwang Zhao
Re: Transaction timeout
On Tue, Dec 19, 2023 at 6:27 PM Andrey M. Borodin wrote: > > > > > On 19 Dec 2023, at 13:26, Andrey M. Borodin wrote: > > > > I don’t have Windows machine, so I hope CF bot will pick this. > > I used Github CI to produce version of tests that seems to be is stable on > Windows. > Sorry for the noise. > > > Best regards, Andrey Borodin. + +If transaction_timeout is shorter than +idle_in_transaction_session_timeout or statement_timeout +transaction_timeout will invalidate longer timeout. + When transaction_timeout is *equal* to idle_in_transaction_session_timeout or statement_timeout, idle_in_transaction_session_timeout and statement_timeout will also be invalidated, the logic in the code seems right, though this document is a little bit inaccurate. -- Regards Junwang Zhao
Re: Transaction timeout
On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin wrote: >> On 19 Dec 2023, at 13:26, Andrey M. Borodin wrote: >> >> I don’t have Windows machine, so I hope CF bot will pick this. > > I used Github CI to produce version of tests that seems to be is stable on > Windows. It still failed on Windows Server 2019 [1]. diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out --- C:/cirrus/src/test/isolation/expected/timeouts.out 2023-12-19 10:34:30.354721100 + +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out 2023-12-19 10:38:25.877981600 + @@ -100,7 +100,7 @@ step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE application_name = 'isolation/timeouts/stt2' count - -0 +1 (1 row) step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET statement_timeout = '10s'; SET lock_timeout = '10s'; SET transaction_timeout = '10s'; [1] https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
> On 19 Dec 2023, at 13:26, Andrey M. Borodin wrote: > > I don’t have Windows machine, so I hope CF bot will pick this. I used Github CI to produce version of tests that seems to be is stable on Windows. Sorry for the noise. Best regards, Andrey Borodin. v13-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
> On 19 Dec 2023, at 06:25, Japin Li wrote: > > On Windows, there still have an error: Uhhmm, yes. Connection termination looks different on windows machine. I’ve checked how this looks in relication slot tests and removed select that was observing connection failure. I don’t have Windows machine, so I hope CF bot will pick this. Best regards, Andrey Borodin. v12-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
On Mon, 18 Dec 2023 at 17:40, Andrey M. Borodin wrote: >> On 18 Dec 2023, at 14:32, Japin Li wrote: >> >> >> Thanks for updating the patch > > Sorry for the noise, but commitfest bot found one more bug in handling > statement timeout. PFA v11. > On Windows, there still have an error: diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out --- C:/cirrus/src/test/isolation/expected/timeouts.out 2023-12-18 10:22:21.772537200 + +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out 2023-12-18 10:26:08.039831800 + @@ -103,24 +103,7 @@ 0 (1 row) -step stt2_check: SELECT 1; -FATAL: terminating connection due to transaction timeout -server closed the connection unexpectedly +PQconsumeInput failed: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. -step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET statement_timeout = '10s'; SET lock_timeout = '10s'; SET transaction_timeout = '10s'; -step itt4_begin: BEGIN ISOLATION LEVEL READ COMMITTED; -step sleep_there: SELECT pg_sleep(0.01); -pg_sleep - - -(1 row) - -step stt3_check_itt4: SELECT count(*) FROM pg_stat_activity WHERE application_name = 'isolation/timeouts/itt4' -step stt3_check_itt4: <... completed> -count -- -0 -(1 row) - -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
> On 18 Dec 2023, at 14:32, Japin Li wrote: > > > Thanks for updating the patch Sorry for the noise, but commitfest bot found one more bug in handling statement timeout. PFA v11. Best regards, Andrey Borodin. v11-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
On Mon, 18 Dec 2023 at 13:49, Andrey M. Borodin wrote: >> On 16 Dec 2023, at 05:58, Japin Li wrote: >> >> >> On Fri, 15 Dec 2023 at 17:51, Andrey M. Borodin wrote: On 8 Dec 2023, at 15:29, Japin Li wrote: Thanks for updating the patch. LGTM. >>> >>> PFA v9. Changes: >>> 1. Added tests for idle_in_transaction_timeout >>> 2. Suppress statement_timeout if it’s shorter than transaction_timeout >>> >> + if (StatementTimeout > 0 >> + && IdleInTransactionSessionTimeout < TransactionTimeout) >> ^ >> >> Should be StatementTimeout? > Yes, that’s an oversight. I’ve adjusted tests so they catch this problem. > >> Maybe we should add documentation to describe this behavior. > > I've added a paragraph about it to config.sgml, but I'm not sure about the > comprehensiveness of the wording. > Thanks for updating the patch, no objections. -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
> On 16 Dec 2023, at 05:58, Japin Li wrote: > > > On Fri, 15 Dec 2023 at 17:51, Andrey M. Borodin wrote: >>> On 8 Dec 2023, at 15:29, Japin Li wrote: >>> >>> Thanks for updating the patch. LGTM. >> >> PFA v9. Changes: >> 1. Added tests for idle_in_transaction_timeout >> 2. Suppress statement_timeout if it’s shorter than transaction_timeout >> > + if (StatementTimeout > 0 > + && IdleInTransactionSessionTimeout < TransactionTimeout) > ^ > > Should be StatementTimeout? Yes, that’s an oversight. I’ve adjusted tests so they catch this problem. > Maybe we should add documentation to describe this behavior. I've added a paragraph about it to config.sgml, but I'm not sure about the comprehensiveness of the wording. Best regards, Andrey Borodin. v10-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
On Fri, 15 Dec 2023 at 17:51, Andrey M. Borodin wrote: >> On 8 Dec 2023, at 15:29, Japin Li wrote: >> >> Thanks for updating the patch. LGTM. > > PFA v9. Changes: > 1. Added tests for idle_in_transaction_timeout > 2. Suppress statement_timeout if it’s shorter than transaction_timeout > + if (StatementTimeout > 0 + && IdleInTransactionSessionTimeout < TransactionTimeout) ^ Should be StatementTimeout? Maybe we should add documentation to describe this behavior. > Consider changing status of the commitfest entry if you think it’s ready for > committer. > -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
> On 8 Dec 2023, at 15:29, Japin Li wrote: > > Thanks for updating the patch. LGTM. PFA v9. Changes: 1. Added tests for idle_in_transaction_timeout 2. Suppress statement_timeout if it’s shorter than transaction_timeout Consider changing status of the commitfest entry if you think it’s ready for committer. Thanks! Best regards, Andrey Borodin. v9-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
On Fri, 08 Dec 2023 at 18:08, Andrey M. Borodin wrote: >> On 8 Dec 2023, at 12:59, Japin Li wrote: >> >> >> On Thu, 07 Dec 2023 at 20:40, Andrey M. Borodin wrote: >>>> On 7 Dec 2023, at 06:25, Japin Li wrote: >>>> >>>> If idle_in_transaction_timeout is bigger than transaction_timeout, >>>> the idle-in-transaction timeout don't needed, right? >>> Yes, I think so. >>> >> >> Should we disable the idle_in_transaction_timeout in this case? Of cursor, >> I'm >> not strongly insist on this. > Good idea! > >> I think you forget disable transaction_timeout in AutoVacWorkerMain(). >> If not, can you elaborate on why you don't disable it? > > Seems like code in autovacuum.c was copied, but patch was not updated. I’ve > fixed this oversight. > Thanks for updating the patch. LGTM. -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
> On 8 Dec 2023, at 12:59, Japin Li wrote: > > > On Thu, 07 Dec 2023 at 20:40, Andrey M. Borodin wrote: >>> On 7 Dec 2023, at 06:25, Japin Li wrote: >>> >>> If idle_in_transaction_timeout is bigger than transaction_timeout, >>> the idle-in-transaction timeout don't needed, right? >> Yes, I think so. >> > > Should we disable the idle_in_transaction_timeout in this case? Of cursor, I'm > not strongly insist on this. Good idea! > I think you forget disable transaction_timeout in AutoVacWorkerMain(). > If not, can you elaborate on why you don't disable it? Seems like code in autovacuum.c was copied, but patch was not updated. I’ve fixed this oversight. Thanks! Best regards, Andrey Borodin. v8-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
On Thu, 07 Dec 2023 at 20:40, Andrey M. Borodin wrote: >> On 7 Dec 2023, at 06:25, Japin Li wrote: >> >> If idle_in_transaction_timeout is bigger than transaction_timeout, >> the idle-in-transaction timeout don't needed, right? > Yes, I think so. > Should we disable the idle_in_transaction_timeout in this case? Of cursor, I'm not strongly insist on this. I think you forget disable transaction_timeout in AutoVacWorkerMain(). If not, can you elaborate on why you don't disable it? -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
> On 7 Dec 2023, at 06:25, Japin Li wrote: > > If idle_in_transaction_timeout is bigger than transaction_timeout, > the idle-in-transaction timeout don't needed, right? Yes, I think so. > >> TODO: as Yuhang pointed out prepared transactions must not be killed, thus >> name "transaction_timeout" is not correct. I think the name must be like >> "session_transaction_timeout", but I'd like to have an opinion of someone >> more experienced in giving names to GUCs than me. Or, perhaps, a native >> speaker? >> > How about transaction_session_timeout? Similar to idle_session_timeout. Well, Yuhang also suggested this name... Honestly, I still have a gut feeling that transaction_timeout is a good name, despite being not exactly precise. Thanks! Best regards, Andrey Borodin. PS Sorry for posting twice to the same thread, i noticed your message only after answering to Yuhang's review.
Re: Transaction timeout
Thanks Yuhang!On 7 Dec 2023, at 13:39, 邱宇航wrote:I read the V6 patch and found something needs to be improved.Fixed. PFA v7.Best regards, Andrey Borodin. v7-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
Hi, I read the V6 patch and found something needs to be improved. Prepared transactions should also be documented. A value of zero (the default) disables the timeout. +This timeout is not applied to prepared transactions. Only transactions +with user connections are affected. Missing 'time'. - gettext_noop("Sets the maximum allowed in a transaction."), + gettext_noop("Sets the maximum allowed time in a transaction."), 16 is already released. It's 17 now. - if (AH->remoteVersion >= 16) + if (AH->remoteVersion >= 17) ExecuteSqlStatement(AH, "SET transaction_timeout = 0"); And I test the V6 patch and it works as expected. -- Yuhang Qiu
Re: Transaction timeout
On Wed, 06 Dec 2023 at 21:05, Andrey M. Borodin wrote: >> On 30 Nov 2023, at 20:06, Andrey M. Borodin wrote: >> >> >> Tomorrow I plan to fix raising of the timeout when the transaction is idle. >> Renaming transaction_timeout to something else (to avoid confusion with >> prepared xacts) also seems correct to me. > > > Here's a v6 version of the feature. Changes: > 1. Now transaction_timeout will break connection with FATAL instead of > hanging in "idle in transaction (aborted)" > 2. It will kill equally idle and active transactions > 3. New isolation tests are slightly more complex: isolation tester does not > like when the connection is forcibly killed, thus there must be only 1 > permutation with killed connection. > Greate. If idle_in_transaction_timeout is bigger than transaction_timeout, the idle-in-transaction timeout don't needed, right? > TODO: as Yuhang pointed out prepared transactions must not be killed, thus > name "transaction_timeout" is not correct. I think the name must be like > "session_transaction_timeout", but I'd like to have an opinion of someone > more experienced in giving names to GUCs than me. Or, perhaps, a native > speaker? > How about transaction_session_timeout? Similar to idle_session_timeout. -- Regrads, Japin Li ChengDu WenWu Information Technology Co., Ltd.
Re: Transaction timeout
> On 30 Nov 2023, at 20:06, Andrey M. Borodin wrote: > > > Tomorrow I plan to fix raising of the timeout when the transaction is idle. > Renaming transaction_timeout to something else (to avoid confusion with > prepared xacts) also seems correct to me. Here's a v6 version of the feature. Changes: 1. Now transaction_timeout will break connection with FATAL instead of hanging in "idle in transaction (aborted)" 2. It will kill equally idle and active transactions 3. New isolation tests are slightly more complex: isolation tester does not like when the connection is forcibly killed, thus there must be only 1 permutation with killed connection. TODO: as Yuhang pointed out prepared transactions must not be killed, thus name "transaction_timeout" is not correct. I think the name must be like "session_transaction_timeout", but I'd like to have an opinion of someone more experienced in giving names to GUCs than me. Or, perhaps, a native speaker? Best regards, Andrey Borodin. v6-0001-Introduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
> On 20 Nov 2023, at 06:33, 邱宇航 wrote: Nikolay, Peter, Fujii, Tung, Yuhang, thank you for reviewing this. I'll address feedback soon, this patch has been for a long time on my TODO list. I've started with fixing problem of COMMIT AND CHAIN by restarting timeout counter. Tomorrow I plan to fix raising of the timeout when the transaction is idle. Renaming transaction_timeout to something else (to avoid confusion with prepared xacts) also seems correct to me. Thanks! Best regards, Andrey Borodin. v5-0001-Intorduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
I test the V4 patch and found the backend does't process SIGINT while it's in secure_read. And it seems not a good choice to report ERROR during secure_read, which will turns into FATAL "terminating connection because protocol synchronization was lost". It might be much easier to terminate the backend rather than cancel the backend just like idle_in_transaction_session_timeout and idle_session_timeout did. But the name of the GUC might be transaction_session_timeout. And what about 2PC transaction? The hanging 2PC transaction also hurts server a lot. It’s active transaction but not active backend. Can we cancel the 2PC transaction and how we cancel it. -- Yuhang Qiu
Re: Transaction timeout
On Wed, Sep 6, 2023 at 1:16 AM Fujii Masao wrote: > With the v4 patch, I found that timeout errors no longer occur during the > idle in > transaction phase. Instead, they occur when the next statement is executed. > Is this > the intended behavior? I thought some users might want to use the transaction > timeout > feature to prevent prolonged transactions and promptly release resources > (e.g., locks) > in case of a timeout, similar to idle_in_transaction_session_timeout. I agree – it seems reasonable to interrupt transaction immediately when the timeout occurs. This was the idea – to determine the maximum possible time for all transactions that is allowed on a server, to avoid too long-lasting locking and not progressing xmin horizon. That being said, I also think this wording in the docs: +Setting transaction_timeout in +postgresql.conf is not recommended because it would +affect all sessions. It was inherited from statement_timeout, where I also find this wording too one-sided. There are certain situations where we do want global setting to be set – actually, any large OLTP case (to be on lower risk side; those users who need longer timeout, can set it when needed, but by default we do need very restrictive timeouts, usually < 1 minute, like we do in HTTP or application servers). I propose this: > Setting transaction_timeout in postgresql.conf should be done with caution > because it affects all sessions. Looking at the v4 of the patch, a couple of more comments that might be helpful for v5 (which is planned, as I understand): 1) it might be beneficial to add tests for more complex scenarios, e.g., subtransactions 2) In the error message: + errmsg("canceling statement due to %s%s%s%s%s", lock_reason, comma1, + stmt_reason, comma2, tx_reason))); – it seems we can have excessive commas here 3) Perhaps, we should say that we cancel the transaction, not statement (especially in the case when it is happening in the idle-in-transaction state). Thanks for working on this feature!
Re: Transaction timeout
On 2023-09-06 20:32, Andrey M. Borodin wrote: Thanks for looking into this! On 6 Sep 2023, at 13:16, Fujii Masao wrote: While testing v4 patch, I noticed it doesn't handle the COMMIT AND CHAIN case correctly. When COMMIT AND CHAIN is executed, I believe the transaction timeout counter should reset and start from zero with the next transaction. However, it appears that the current v4 patch doesn't reset the counter in this scenario. Can you confirm this? Yes, I was not aware of this feature. I'll test and fix this. With the v4 patch, I found that timeout errors no longer occur during the idle in transaction phase. Instead, they occur when the next statement is executed. Is this the intended behavior? AFAIR I had been testing that behaviour of "idle in transaction" was intact. I'll check that again. I thought some users might want to use the transaction timeout feature to prevent prolonged transactions and promptly release resources (e.g., locks) in case of a timeout, similar to idle_in_transaction_session_timeout. Yes, this is exactly how I was expecting the feature to behave: empty up max_connections slots for long-hanging transactions. Thanks for your findings, I'll check and post new version! Best regards, Andrey Borodin. Hi, Thank you for implementing this nice feature! I tested the v4 patch in the interactive transaction mode with 3 following cases: 1. Start a transaction with transaction_timeout=0 (i.e., timeout disabled), and then change the timeout value to more than 0 during the transaction. =# SET transaction_timeout TO 0; =# BEGIN;//timeout is not enabled =# SELECT pg_sleep(5); =# SET transaction_timeout TO '1s'; =# SELECT pg_sleep(10);//timeout is enabled with 1s In this case, the transaction timeout happens during pg_sleep(10). 2. Start a transaction with transaction_timeout>0 (i.e., timeout enabled), and then change the timeout value to more than 0 during the transaction. =# SET transaction_timeout TO '1000s'; =# BEGIN;//timeout is enabled with 1000s =# SELECT pg_sleep(5); =# SET transaction_timeout TO '1s'; =# SELECT pg_sleep(10);//timeout is not restarted and still running with 1000s In this case, the transaction timeout does NOT happen during pg_sleep(10). 3. Start a transaction with transaction_timeout>0 (i.e., timeout enabled), and then change the timeout value to 0 during the transaction. =# SET transaction_timeout TO '10s'; =# BEGIN;//timeout is enabled with 10s =# SELECT pg_sleep(5); =# SET transaction_timeout TO 0; =# SELECT pg_sleep(10);//timeout is NOT disabled and still running with 10s In this case, the transaction timeout happens during pg_sleep(10). The first case where transaction_timeout is disabled before the transaction begins is totally fine. However, in the second and third cases, where transaction_timeout is enabled before the transaction begins, since the timeout has already enabled with a certain value, it will not be enabled again with a new setting value. Furthermore, let's say I want to set a transaction_timeout value for all transactions in postgresql.conf file so it would affect all sessions. The same behavior happened but for all 3 cases, here is one example with the second case: =# BEGIN; SHOW transaction_timeout; select pg_sleep(10); SHOW transaction_timeout; COMMIT; BEGIN transaction_timeout - 15s (1 row) 2023-09-07 11:52:50.510 JST [23889] LOG: received SIGHUP, reloading configuration files 2023-09-07 11:52:50.510 JST [23889] LOG: parameter "transaction_timeout" changed to "5000" pg_sleep -- (1 row) transaction_timeout - 5s (1 row) COMMIT I am of the opinion that these behaviors might lead to confusion among users. Could you confirm if these are the intended behaviors? Additionally, I think the short description should be "Sets the maximum allowed time to commit a transaction." or "Sets the maximum allowed time to wait before aborting a transaction." so that it could be more clear and consistent with other %_timeout descriptions. Also, there is a small whitespace error here: src/backend/tcop/postgres.c:3373: space before tab in indent. + stmt_reason, comma2, tx_reason))); On a side note, while testing the patch with pgbench, it came to my attention that in scenarios involving the execution of multiple concurrent transactions within a high contention environment and with relatively short timeout durations, there is a potential for cascading blocking. This phenomenon can lead to multiple transactions exceeding their designated timeouts, consequently resulting in a degradation of transaction processing performance. No? Do you think this feature should be co-implemented with the existing concurrency control proto
Re: Transaction timeout
Thanks for looking into this! > On 6 Sep 2023, at 13:16, Fujii Masao wrote: > > While testing v4 patch, I noticed it doesn't handle the COMMIT AND CHAIN case > correctly. > When COMMIT AND CHAIN is executed, I believe the transaction timeout counter > should reset > and start from zero with the next transaction. However, it appears that the > current > v4 patch doesn't reset the counter in this scenario. Can you confirm this? Yes, I was not aware of this feature. I'll test and fix this. > With the v4 patch, I found that timeout errors no longer occur during the > idle in > transaction phase. Instead, they occur when the next statement is executed. > Is this > the intended behavior? AFAIR I had been testing that behaviour of "idle in transaction" was intact. I'll check that again. > I thought some users might want to use the transaction timeout > feature to prevent prolonged transactions and promptly release resources > (e.g., locks) > in case of a timeout, similar to idle_in_transaction_session_timeout. Yes, this is exactly how I was expecting the feature to behave: empty up max_connections slots for long-hanging transactions. Thanks for your findings, I'll check and post new version! Best regards, Andrey Borodin.
Re: Transaction timeout
On 2022/12/19 5:53, Andrey Borodin wrote: On Wed, Dec 7, 2022 at 1:30 PM Andrey Borodin wrote: I hope to address other feedback on the weekend. Thanks for implementing this feature! While testing v4 patch, I noticed it doesn't handle the COMMIT AND CHAIN case correctly. When COMMIT AND CHAIN is executed, I believe the transaction timeout counter should reset and start from zero with the next transaction. However, it appears that the current v4 patch doesn't reset the counter in this scenario. Can you confirm this? With the v4 patch, I found that timeout errors no longer occur during the idle in transaction phase. Instead, they occur when the next statement is executed. Is this the intended behavior? I thought some users might want to use the transaction timeout feature to prevent prolonged transactions and promptly release resources (e.g., locks) in case of a timeout, similar to idle_in_transaction_session_timeout. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Transaction timeout
On 12.01.23 20:46, Andrey Borodin wrote: On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote: I've rewritten this part to correctly report all timeouts that did happen. However there's now a tricky comma-formatting code which was tested only manually. I suspect this will make translation difficult. I use special functions for this like _() char* lock_reason = lock_timeout_occurred ? _("lock timeout") : ""; and then ereport(ERROR, (errcode(err_code), errmsg("canceling statement due to %s%s%s%s%s", lock_reason, comma1, stmt_reason, comma2, tx_reason))); I hope it will be translatable... No, you can't do that. You have to write out all the strings separately.
Re: Transaction timeout
On Fri, Jan 13, 2023 at 10:16 AM Andrey Borodin wrote: > > – it seems we could (should) have one more successful "1s wait, 3s > sleep" iteration here, ~727ms somehow wasted in a loop, quite a lot. > > I think big chunk from these 727ms were spent between "BEGIN" and > "select now(), clock_timestamp(), pg_sleep(3) \watch 1". > Not really – there was indeed ~2s delay between BEGIN and the first pg_sleep query, but those ~727ms is something else. here we measure the remainder between the beginning of the transaction measured by "now()' and the the beginning of the last successful pg_sleep() query: gitpod=# select timestamptz '2023-01-13 15:51:18.179579+00' - '2023-01-13 15:49:22.906924+00'; ?column? - 00:01:55.272655 (1 row) It already includes all delays that we had from the beginning of our transaction. The problem with my question was that I didn't take into attention that '2023-01-13 15:51:18.179579+00' is when the last successful query *started*. So the remainder of our 2-min quota – 00:00:04.727345 – includes the last successful loop (3s of successful query + 1s of waiting), and then we have failed after ~700ms. In other words, there are no issues here, all good. > Many thanks for looking into this! many thanks for implementing it
Re: Transaction timeout
Thanks for the review Nikolay! On Fri, Jan 13, 2023 at 8:03 AM Nikolay Samokhvalov wrote: > > 1) The current test set has only 2 simple cases – I'd suggest adding one more > (that one that didn't work in v1): > > gitpod=# set transaction_timeout to '20ms'; > SET > gitpod=# begin; select pg_sleep(.01); select pg_sleep(.01); select > pg_sleep(.01); commit; I tried exactly these tests - tests were unstable on Windows. Maybe that OS has a more coarse-grained timer resolution. It's a tradeoff between time spent on tests, strength of the test and probability of false failure. I chose small time without false alarms. > – it seems we could (should) have one more successful "1s wait, 3s sleep" > iteration here, ~727ms somehow wasted in a loop, quite a lot. I think big chunk from these 727ms were spent between "BEGIN" and "select now(), clock_timestamp(), pg_sleep(3) \watch 1". I doubt patch really contains arithmetic errors. Many thanks for looking into this! Best regards, Andrey Borodin.
Re: Transaction timeout
On Thu, Jan 12, 2023 at 11:47 AM Andrey Borodin wrote: > On Thu, Jan 12, 2023 at 11:24 AM Nathan Bossart > wrote: > > > > On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote: > > > I've rewritten this part to correctly report all timeouts that did > > > happen. However there's now a tricky comma-formatting code which was > > > tested only manually. > Testing it again, a couple of questions 1) The current test set has only 2 simple cases – I'd suggest adding one more (that one that didn't work in v1): gitpod=# set transaction_timeout to '20ms'; SET gitpod=# begin; select pg_sleep(.01); select pg_sleep(.01); select pg_sleep(.01); commit; BEGIN pg_sleep -- (1 row) ERROR: canceling statement due to transaction timeout gitpod=# set statement_timeout to '20ms'; set transaction_timeout to 0; -- to test value for statement_timeout and see that it doesn't fail SET SET gitpod=# begin; select pg_sleep(.01); select pg_sleep(.01); select pg_sleep(.01); commit; BEGIN pg_sleep -- (1 row) pg_sleep -- (1 row) pg_sleep -- (1 row) COMMIT 2) Testing for a longer transaction (2 min), in a gitpod VM (everything is local, no network involved) // not sure what's happening here, maybe some overheads that are not related to the implementation, // but the goal was to see how precise the limiting is for longer transactions gitpod=# set transaction_timeout to '2min'; SET gitpod=# begin; BEGIN gitpod=*# select now(), clock_timestamp(), pg_sleep(3) \watch 1 Fri 13 Jan 2023 03:49:24 PM UTC (every 1s) now |clock_timestamp| pg_sleep ---+---+-- 2023-01-13 15:49:22.906924+00 | 2023-01-13 15:49:24.088728+00 | (1 row) [...] Fri 13 Jan 2023 03:51:18 PM UTC (every 1s) now |clock_timestamp| pg_sleep ---+---+-- 2023-01-13 15:49:22.906924+00 | 2023-01-13 15:51:18.179579+00 | (1 row) ERROR: canceling statement due to transaction timeout gitpod=!# gitpod=!# rollback; ROLLBACK gitpod=# select timestamptz '2023-01-13 15:51:18.179579+00' - '2023-01-13 15:49:22.906924+00'; ?column? - 00:01:55.272655 (1 row) gitpod=# select interval '2min' - '00:01:55.272655'; ?column? - 00:00:04.727345 (1 row) gitpod=# select interval '2min' - '00:01:55.272655' - '4s'; ?column? - 00:00:00.727345 (1 row) – it seems we could (should) have one more successful "1s wait, 3s sleep" iteration here, ~727ms somehow wasted in a loop, quite a lot.
Re: Transaction timeout
On Thu, Jan 12, 2023 at 11:24 AM Nathan Bossart wrote: > > On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote: > > I've rewritten this part to correctly report all timeouts that did > > happen. However there's now a tricky comma-formatting code which was > > tested only manually. > > I suspect this will make translation difficult. I use special functions for this like _() char* lock_reason = lock_timeout_occurred ? _("lock timeout") : ""; and then ereport(ERROR, (errcode(err_code), errmsg("canceling statement due to %s%s%s%s%s", lock_reason, comma1, stmt_reason, comma2, tx_reason))); I hope it will be translatable... > >> > > > + ahprintf(AH, "SET transaction_timeout = 0;\n"); > >> > > > >> > > Hm - why is that the right thing to do? > >> > Because transaction_timeout has effects of statement_timeout. > >> > >> I guess it's just following precedent - but it seems a bit presumptuous > >> to just disable safety settings a DBA might have set up. That makes some > >> sense for e.g. idle_in_transaction_session_timeout, because I think > >> e.g. parallel backup can lead to a connection being idle for a bit. > > > > I do not know. My reasoning - everywhere we turn off > > statement_timeout, we should turn off transaction_timeout too. > > But I have no strong opinion here. I left this code as is in the patch > > so far. For the same reason I did not change anything in > > pg_backup_archiver.c. > > From 8383486's commit message: > > We disable statement_timeout and lock_timeout during dump and restore, > to prevent any global settings that might exist from breaking routine > backups. > > I imagine changing this could disrupt existing servers that depend on these > overrides during backups, although I think Andres has a good point about > disabling safety settings. This might be a good topic for another thread. > +1. Thanks for the review! Best regards, Andrey Borodin.
Re: Transaction timeout
On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote: > I've rewritten this part to correctly report all timeouts that did > happen. However there's now a tricky comma-formatting code which was > tested only manually. I suspect this will make translation difficult. >> > > > + ahprintf(AH, "SET transaction_timeout = 0;\n"); >> > > >> > > Hm - why is that the right thing to do? >> > Because transaction_timeout has effects of statement_timeout. >> >> I guess it's just following precedent - but it seems a bit presumptuous >> to just disable safety settings a DBA might have set up. That makes some >> sense for e.g. idle_in_transaction_session_timeout, because I think >> e.g. parallel backup can lead to a connection being idle for a bit. > > I do not know. My reasoning - everywhere we turn off > statement_timeout, we should turn off transaction_timeout too. > But I have no strong opinion here. I left this code as is in the patch > so far. For the same reason I did not change anything in > pg_backup_archiver.c. >From 8383486's commit message: We disable statement_timeout and lock_timeout during dump and restore, to prevent any global settings that might exist from breaking routine backups. I imagine changing this could disrupt existing servers that depend on these overrides during backups, although I think Andres has a good point about disabling safety settings. This might be a good topic for another thread. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Transaction timeout
On Wed, Dec 7, 2022 at 1:30 PM Andrey Borodin wrote: > I hope to address other feedback on the weekend. Andres, here's my progress on working with your review notes. > > @@ -3277,6 +3282,7 @@ ProcessInterrupts(void) > >*/ > > lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, > > true); > > stmt_timeout_occurred = > > get_timeout_indicator(STATEMENT_TIMEOUT, true); > > + tx_timeout_occurred = > > get_timeout_indicator(TRANSACTION_TIMEOUT, true); > > > > /* > >* If both were set, we want to report whichever timeout > > completed > > This doesn't update the preceding comment, btw, which now reads oddly: I've rewritten this part to correctly report all timeouts that did happen. However there's now a tricky comma-formatting code which was tested only manually. > > > > @@ -1360,6 +1363,16 @@ IdleInTransactionSessionTimeoutHandler(void) > > > > SetLatch(MyLatch); > > > > } > > > > > > > > +static void > > > > +TransactionTimeoutHandler(void) > > > > +{ > > > > +#ifdef HAVE_SETSID > > > > + /* try to signal whole process group */ > > > > + kill(-MyProcPid, SIGINT); > > > > +#endif > > > > + kill(MyProcPid, SIGINT); > > > > +} > > > > + > > > > > > Why does this use signals instead of just setting the latch like > > > IdleInTransactionSessionTimeoutHandler() etc? > > > > I just copied statement_timeout behaviour. As I understand this > > implementation is prefered if the timeout can catch the backend > > running at full steam. > > Hm. I'm not particularly convinced by that code. Be that as it may, I > don't think it's a good idea to have one more copy of this code. At > least the patch should wrap the signalling code in a helper. Done, now there is a single CancelOnTimeoutHandler() handler. > > > > diff --git a/src/bin/pg_dump/pg_backup_archiver.c > > > > b/src/bin/pg_dump/pg_backup_archiver.c > > > > index 0081873a72..5229fe3555 100644 > > > > --- a/src/bin/pg_dump/pg_backup_archiver.c > > > > +++ b/src/bin/pg_dump/pg_backup_archiver.c > > > > @@ -3089,6 +3089,7 @@ _doSetFixedOutputState(ArchiveHandle *AH) > > > > ahprintf(AH, "SET statement_timeout = 0;\n"); > > > > ahprintf(AH, "SET lock_timeout = 0;\n"); > > > > ahprintf(AH, "SET idle_in_transaction_session_timeout = 0;\n"); > > > > + ahprintf(AH, "SET transaction_timeout = 0;\n"); > > > > > > Hm - why is that the right thing to do? > > Because transaction_timeout has effects of statement_timeout. > > I guess it's just following precedent - but it seems a bit presumptuous > to just disable safety settings a DBA might have set up. That makes some > sense for e.g. idle_in_transaction_session_timeout, because I think > e.g. parallel backup can lead to a connection being idle for a bit. I do not know. My reasoning - everywhere we turn off statement_timeout, we should turn off transaction_timeout too. But I have no strong opinion here. I left this code as is in the patch so far. For the same reason I did not change anything in pg_backup_archiver.c. > > Either way we can do batch function enable_timeouts() instead > > enable_timeout_after(). > > > Does anything of it make sense? > > I'm at least as worried about the various calls *after* the execution of > a statement. I think this code is just a one bit check if (get_timeout_active(TRANSACTION_TIMEOUT)) inside of get_timeout_active(). With all 14 timeouts we have, I don't see a good way to optimize stuff so far. > > + if (tx_timeout_occurred) > > + { > > + LockErrorCleanup(); > > + ereport(ERROR, > > + (errcode(ERRCODE_TRANSACTION_TIMEOUT), > > + errmsg("canceling transaction due to > > transaction timeout"))); > > + } > > The number of calls to LockErrorCleanup() here feels wrong - there's > already 8 calls in ProcessInterrupts(). Besides the code duplication I > also think it's not a sane idea to rely on having LockErrorCleanup() > before all the relevant ereport(ERROR)s. I've refactored that code down to 7 calls of LockErrorCleanup() :) Logic behind various branches is not clear for me, e.g. why we do not LockErrorCleanup() when reading commands f
Re: Transaction timeout
On Wed, Dec 7, 2022 at 10:23 AM Andres Freund wrote: > > On 2022-12-03 09:41:04 -0800, Andrey Borodin wrote: > > Fixed. Added test for this. > > The tests don't pass: https://cirrus-ci.com/build/4811553145356288 > oops, sorry. Here's the fix. I hope to address other feedback on the weekend. Thank you! Best regards, Andrey Borodin. v3-0001-Intorduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
Hi, On 2022-12-03 09:41:04 -0800, Andrey Borodin wrote: > Fixed. Added test for this. The tests don't pass: https://cirrus-ci.com/build/4811553145356288 [00:54:35.337](1.251s) not ok 1 - no parameters missing from postgresql.conf.sample [00:54:35.338](0.000s) # Failed test 'no parameters missing from postgresql.conf.sample' # at /tmp/cirrus-ci-build/src/test/modules/test_misc/t/003_check_guc.pl line 81. [00:54:35.338](0.000s) # got: '1' # expected: '0' I am just looking through a bunch of failing CF entries, so I'm perhaps over-sensitized right now. But I'm a bit confused why there's so many occasions of the tests clearly not having been run... Michael, any reason 003_check_guc doesn't show the missing GUCs? It's not particularly helpful to see "0 is different from 1". Seems that even just something like is_deeply(\@missing_from_file, [], "no parameters missing from postgresql.conf.sample"); would be a decent improvement? Greetings, Andres Freund
Re: Transaction timeout
At Mon, 5 Dec 2022 17:10:50 -0800, Andres Freund wrote in > I'm most concerned about the overhead when the timeouts are *not* > enabled. And this adds a branch to start_xact_command() and a function > call for get_timeout_active(TRANSACTION_TIMEOUT) in that case. On its > own, that's not a whole lot, but it does add up. There's 10+ function > calls for timeout and ps_display purposes for every single statement. That path seems like existing just for robustness. I inserted "Assert(0)" just before the disable_timeout(), but make check-world didn't fail [1]. Couldn't we get rid of that path, adding an assertion instead? I'm not sure about other timeouts yet, though. About disabling side, we cannot rely on StatementTimeout. [1] # 032_apply_delay.pl fails for me so I don't know any of the later # tests fails. > But it's definitely also worth optimizing the timeout enabled paths. And > you're right, it looks like there's a fair bit of optimization > potential. Thanks. I'll work on that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Transaction timeout
Hi, On 2022-12-06 09:44:01 +0900, Kyotaro Horiguchi wrote: > At Mon, 5 Dec 2022 15:07:47 -0800, Andres Freund wrote in > > I'm a bit worried about adding evermore branches and function calls for > > the processing of single statements. We already spend a noticable > > percentage of the cycles for a single statement in PostgresMain(), this > > adds additional overhead. > > > > I'm somewhat inclined to think that we need some redesign here before we > > add more overhead. > > insert_timeout() and remove_timeout_index() move 40*(# of several > timeouts) bytes at every enabling/disabling a timeout. This is far > frequent than actually any timeout fires. schedule_alarm() is > interested only in the nearest timeout. > So, we can get rid of the insertion sort in > insert_timeout/remove_timeout_index then let them just search for the > nearest one and remember it. Finding the nearest should be faster than > the insertion sort. Instead we need to scan over the all timeouts > instead of the a few first ones, but that's overhead is not a matter > when a timeout fires. I'm most concerned about the overhead when the timeouts are *not* enabled. And this adds a branch to start_xact_command() and a function call for get_timeout_active(TRANSACTION_TIMEOUT) in that case. On its own, that's not a whole lot, but it does add up. There's 10+ function calls for timeout and ps_display purposes for every single statement. But it's definitely also worth optimizing the timeout enabled paths. And you're right, it looks like there's a fair bit of optimization potential. Greetings, Andres Freund
Re: Transaction timeout
At Mon, 5 Dec 2022 15:07:47 -0800, Andres Freund wrote in > I'm a bit worried about adding evermore branches and function calls for > the processing of single statements. We already spend a noticable > percentage of the cycles for a single statement in PostgresMain(), this > adds additional overhead. > > I'm somewhat inclined to think that we need some redesign here before we > add more overhead. insert_timeout() and remove_timeout_index() move 40*(# of several timeouts) bytes at every enabling/disabling a timeout. This is far frequent than actually any timeout fires. schedule_alarm() is interested only in the nearest timeout. So, we can get rid of the insertion sort in insert_timeout/remove_timeout_index then let them just search for the nearest one and remember it. Finding the nearest should be faster than the insertion sort. Instead we need to scan over the all timeouts instead of the a few first ones, but that's overhead is not a matter when a timeout fires. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Transaction timeout
Hi, On 2022-12-05 15:41:29 -0800, Andrey Borodin wrote: > Thanks for looking into this Andres! > > On Mon, Dec 5, 2022 at 3:07 PM Andres Freund wrote: > > > > I'm a bit worried about adding evermore branches and function calls for > > the processing of single statements. We already spend a noticable > > percentage of the cycles for a single statement in PostgresMain(), this > > adds additional overhead. > > > > I'm somewhat inclined to think that we need some redesign here before we > > add more overhead. > > > We can cap statement_timeout\idle_session_timeout by the budget of > transaction_timeout left. I don't know what you mean by that. > @@ -3277,6 +3282,7 @@ ProcessInterrupts(void) >*/ > lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, > true); > stmt_timeout_occurred = > get_timeout_indicator(STATEMENT_TIMEOUT, true); > + tx_timeout_occurred = > get_timeout_indicator(TRANSACTION_TIMEOUT, true); > > /* >* If both were set, we want to report whichever timeout > completed This doesn't update the preceding comment, btw, which now reads oddly: /* * If LOCK_TIMEOUT and STATEMENT_TIMEOUT indicators are both set, we * need to clear both, so always fetch both. */ > > > @@ -1360,6 +1363,16 @@ IdleInTransactionSessionTimeoutHandler(void) > > > SetLatch(MyLatch); > > > } > > > > > > +static void > > > +TransactionTimeoutHandler(void) > > > +{ > > > +#ifdef HAVE_SETSID > > > + /* try to signal whole process group */ > > > + kill(-MyProcPid, SIGINT); > > > +#endif > > > + kill(MyProcPid, SIGINT); > > > +} > > > + > > > > Why does this use signals instead of just setting the latch like > > IdleInTransactionSessionTimeoutHandler() etc? > > I just copied statement_timeout behaviour. As I understand this > implementation is prefered if the timeout can catch the backend > running at full steam. Hm. I'm not particularly convinced by that code. Be that as it may, I don't think it's a good idea to have one more copy of this code. At least the patch should wrap the signalling code in a helper. FWIW, the HAVE_SETSID code originates in: commit 3ad0728c817bf8abd2c76bd11d856967509b307c Author: Tom Lane Date: 2006-11-21 20:59:53 + On systems that have setsid(2) (which should be just about everything except Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion. > > > diff --git a/src/bin/pg_dump/pg_backup_archiver.c > > > b/src/bin/pg_dump/pg_backup_archiver.c > > > index 0081873a72..5229fe3555 100644 > > > --- a/src/bin/pg_dump/pg_backup_archiver.c > > > +++ b/src/bin/pg_dump/pg_backup_archiver.c > > > @@ -3089,6 +3089,7 @@ _doSetFixedOutputState(ArchiveHandle *AH) > > > ahprintf(AH, "SET statement_timeout = 0;\n"); > > > ahprintf(AH, "SET lock_timeout = 0;\n"); > > > ahprintf(AH, "SET idle_in_transaction_session_timeout = 0;\n"); > > > + ahprintf(AH, "SET transaction_timeout = 0;\n"); > > > > Hm - why is that the right thing to do? > Because transaction_timeout has effects of statement_timeout. I guess it's just following precedent - but it seems a bit presumptuous to just disable safety settings a DBA might have set up. That makes some sense for e.g. idle_in_transaction_session_timeout, because I think e.g. parallel backup can lead to a connection being idle for a bit. A few more review comments: > Either way we can do batch function enable_timeouts() instead > enable_timeout_after(). > Does anything of it make sense? I'm at least as worried about the various calls *after* the execution of a statement. > + if (tx_timeout_occurred) > + { > + LockEr
Re: Transaction timeout
Thanks for looking into this Andres! On Mon, Dec 5, 2022 at 3:07 PM Andres Freund wrote: > > I'm a bit worried about adding evermore branches and function calls for > the processing of single statements. We already spend a noticable > percentage of the cycles for a single statement in PostgresMain(), this > adds additional overhead. > > I'm somewhat inclined to think that we need some redesign here before we > add more overhead. > We can cap statement_timeout\idle_session_timeout by the budget of transaction_timeout left. Either way we can do batch function enable_timeouts() instead enable_timeout_after(). Does anything of it make sense? > > > @@ -1360,6 +1363,16 @@ IdleInTransactionSessionTimeoutHandler(void) > > SetLatch(MyLatch); > > } > > > > +static void > > +TransactionTimeoutHandler(void) > > +{ > > +#ifdef HAVE_SETSID > > + /* try to signal whole process group */ > > + kill(-MyProcPid, SIGINT); > > +#endif > > + kill(MyProcPid, SIGINT); > > +} > > + > > Why does this use signals instead of just setting the latch like > IdleInTransactionSessionTimeoutHandler() etc? I just copied statement_timeout behaviour. As I understand this implementation is prefered if the timeout can catch the backend running at full steam. > > diff --git a/src/bin/pg_dump/pg_backup_archiver.c > > b/src/bin/pg_dump/pg_backup_archiver.c > > index 0081873a72..5229fe3555 100644 > > --- a/src/bin/pg_dump/pg_backup_archiver.c > > +++ b/src/bin/pg_dump/pg_backup_archiver.c > > @@ -3089,6 +3089,7 @@ _doSetFixedOutputState(ArchiveHandle *AH) > > ahprintf(AH, "SET statement_timeout = 0;\n"); > > ahprintf(AH, "SET lock_timeout = 0;\n"); > > ahprintf(AH, "SET idle_in_transaction_session_timeout = 0;\n"); > > + ahprintf(AH, "SET transaction_timeout = 0;\n"); > > Hm - why is that the right thing to do? Because transaction_timeout has effects of statement_timeout. Thank you! Best regards, Andrey Borodin.
Re: Transaction timeout
Hi, On 2022-12-03 09:41:04 -0800, Andrey Borodin wrote: > @@ -2720,6 +2723,7 @@ finish_xact_command(void) > > if (xact_started) > { > + > CommitTransactionCommand(); > > #ifdef MEMORY_CONTEXT_CHECKING Spurious newline added. > @@ -4460,6 +4473,10 @@ PostgresMain(const char *dbname, const char *username) > > enable_timeout_after(IDLE_SESSION_TIMEOUT, > > IdleSessionTimeout); > } > + > + > + if (get_timeout_active(TRANSACTION_TIMEOUT)) > + disable_timeout(TRANSACTION_TIMEOUT, > false); > } > > /* Report any recently-changed GUC options */ Too many newlines added. I'm a bit worried about adding evermore branches and function calls for the processing of single statements. We already spend a noticable percentage of the cycles for a single statement in PostgresMain(), this adds additional overhead. I'm somewhat inclined to think that we need some redesign here before we add more overhead. > @@ -1360,6 +1363,16 @@ IdleInTransactionSessionTimeoutHandler(void) > SetLatch(MyLatch); > } > > +static void > +TransactionTimeoutHandler(void) > +{ > +#ifdef HAVE_SETSID > + /* try to signal whole process group */ > + kill(-MyProcPid, SIGINT); > +#endif > + kill(MyProcPid, SIGINT); > +} > + Why does this use signals instead of just setting the latch like IdleInTransactionSessionTimeoutHandler() etc? > diff --git a/src/bin/pg_dump/pg_backup_archiver.c > b/src/bin/pg_dump/pg_backup_archiver.c > index 0081873a72..5229fe3555 100644 > --- a/src/bin/pg_dump/pg_backup_archiver.c > +++ b/src/bin/pg_dump/pg_backup_archiver.c > @@ -3089,6 +3089,7 @@ _doSetFixedOutputState(ArchiveHandle *AH) > ahprintf(AH, "SET statement_timeout = 0;\n"); > ahprintf(AH, "SET lock_timeout = 0;\n"); > ahprintf(AH, "SET idle_in_transaction_session_timeout = 0;\n"); > + ahprintf(AH, "SET transaction_timeout = 0;\n"); Hm - why is that the right thing to do? > diff --git a/src/test/isolation/specs/timeouts.spec > b/src/test/isolation/specs/timeouts.spec > index c747b4ae28..a7f27811c7 100644 > --- a/src/test/isolation/specs/timeouts.spec > +++ b/src/test/isolation/specs/timeouts.spec > @@ -23,6 +23,9 @@ step sto{ SET statement_timeout = '10ms'; } > step lto { SET lock_timeout = '10ms'; } > step lsto{ SET lock_timeout = '10ms'; SET statement_timeout = '10s'; } > step slto{ SET lock_timeout = '10s'; SET statement_timeout = '10ms'; } > +step tto { SET transaction_timeout = '10ms'; } > +step sleep0 { SELECT pg_sleep(0.0001) } > +step sleep10 { SELECT pg_sleep(0.01) } > step locktbl { LOCK TABLE accounts; } > step update { DELETE FROM accounts WHERE accountid = 'checking'; } > teardown { ABORT; } > @@ -47,3 +50,5 @@ permutation wrtbl lto update(*) > permutation wrtbl lsto update(*) > # statement timeout expires first, row-level lock > permutation wrtbl slto update(*) > +# transaction timeout > +permutation tto sleep0 sleep0 sleep10(*) > \ No newline at end of file I don't think this is quite sufficient. I think the test should verify that transaction timeout interacts correctly with statement timeout / idle in tx timeout. Greetings, Andres Freund
Re: Transaction timeout
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested Tested, works as expected; documentation is not yet added
Re: Transaction timeout
On Sat, Dec 3, 2022 at 9:41 AM Andrey Borodin wrote: > Fixed. Added test for this. > Thanks! Tested (gitpod: https://gitpod.io/#https://gitlab.com/NikolayS/postgres/tree/transaction_timeout-v2 ), works as expected.
Re: Transaction timeout
On Fri, Dec 2, 2022 at 10:59 PM Nikolay Samokhvalov wrote: > > But it fails in the "worst" case I've described above – a series of small > statements: Fixed. Added test for this. Open questions: 1. Docs 2. Order of reporting if happened lock_timeout, statement_timeout, and transaction_timeout simultaneously. Currently there's a lot of code around this... Thanks! Best regards, Andrey Borodin. v2-0001-Intorduce-transaction_timeout.patch Description: Binary data
Re: Transaction timeout
On Fri, Dec 2, 2022 at 9:18 PM Andrey Borodin wrote: > Hello, > > We have statement_timeout, idle_in_transaction_timeout, > idle_session_timeout and many more! But we have no > transaction_timeout. I've skimmed thread [0,1] about existing timeouts > and found no contraindications to implement transaction_timeout. > > Nikolay asked me if I can prototype the feature for testing by him, > and it seems straightforward. Please find attached. If it's not known > to be a bad idea - we'll work on it. > Thanks!! It was a super quick reaction to my proposal Honestly, I was thinking about it for several years, wondering why it's still not implemented. The reasons to have it should be straightforward – here are a couple of them I can see. First one. In the OLTP context, we usually have: - a hard timeout set in application server - a hard timeout set in HTTP server - users not willing to wait more than several seconds – and almost never being ready to wait for more than 30-60s. If Postgres allows longer transactions (it does since we cannot reliably limit their duration now, it's always virtually unlimited), it might be doing the work that nobody is waiting / is needing anymore, speeding resources, affecting health, etc. Why we cannot limit transaction duration reliably? The existing timeouts (namely, statement_timeout + idle_session_timeout) don't protect from having transactions consisting of a series of small statements and short pauses between them. If such behavior happens (e.g., a long series of fast UPDATEs in a loop). It can be dangerous, affecting general DB health (bloat issues). This is reason number two – DBAs might want to decide to minimize the cases of long transactions, setting transaction limits globally (and allowing to set it locally for particular sessions or for some users in rare cases). Speaking of the patch – I just tested it (gitpod: https://gitpod.io/#https://gitlab.com/NikolayS/postgres/tree/transaction_timeout); it applies, works as expected for single-statement transactions: postgres=# set transaction_timeout to '2s'; SET postgres=# select pg_sleep(3); ERROR: canceling transaction due to transaction timeout But it fails in the "worst" case I've described above – a series of small statements: postgres=# set transaction_timeout to '2s'; SET postgres=# begin; select pg_sleep(1); select pg_sleep(1); select pg_sleep(1); select pg_sleep(1); select pg_sleep(1); commit; BEGIN pg_sleep -- (1 row) pg_sleep -- (1 row) pg_sleep -- (1 row) pg_sleep -- (1 row) pg_sleep -- (1 row) COMMIT postgres=#
Transaction timeout
Hello, We have statement_timeout, idle_in_transaction_timeout, idle_session_timeout and many more! But we have no transaction_timeout. I've skimmed thread [0,1] about existing timeouts and found no contraindications to implement transaction_timeout. Nikolay asked me if I can prototype the feature for testing by him, and it seems straightforward. Please find attached. If it's not known to be a bad idea - we'll work on it. Thanks! Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/flat/763A0689-F189-459E-946F-F0EC4458980B%40hotmail.com v1-0001-Intorduce-transaction_timeout.patch Description: Binary data
Re: idle-in-transaction timeout error does not give a hint
>>From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] >> >>> Personally, I don't find this hint particularly necessary. The >>> session was terminated because nothing was happening, so the real fix >>> on the application side is probably more involved than just retrying. >>> This is different from some of the other cases that were cited, such >>> as serialization conflicts, where you just got unlucky due to >>> concurrent events. In the case of idle-in-transaction-timeout, the >>> fault is entirely your own. >> >>Hum. Sounds like a fair argument. Ideriha-san, what do you think? >> > > Hi. > As Peter mentioned, application code generally needs to handle more things > than retrying. > So I'm ok with not adding this hint. I have changed the patch status to "Withdrawn". Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
RE: idle-in-transaction timeout error does not give a hint
>From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] > >> Personally, I don't find this hint particularly necessary. The >> session was terminated because nothing was happening, so the real fix >> on the application side is probably more involved than just retrying. >> This is different from some of the other cases that were cited, such >> as serialization conflicts, where you just got unlucky due to >> concurrent events. In the case of idle-in-transaction-timeout, the >> fault is entirely your own. > >Hum. Sounds like a fair argument. Ideriha-san, what do you think? > Hi. As Peter mentioned, application code generally needs to handle more things than retrying. So I'm ok with not adding this hint. Regards, Takeshi Ideriha
Re: idle-in-transaction timeout error does not give a hint
> Personally, I don't find this hint particularly necessary. The session > was terminated because nothing was happening, so the real fix on the > application side is probably more involved than just retrying. This is > different from some of the other cases that were cited, such as > serialization conflicts, where you just got unlucky due to concurrent > events. In the case of idle-in-transaction-timeout, the fault is > entirely your own. Hum. Sounds like a fair argument. Ideriha-san, what do you think? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: idle-in-transaction timeout error does not give a hint
On 2018-11-28 04:17, Tatsuo Ishii wrote: > + errmsg("terminating connection due to > idle-in-transaction timeout"), > + errhint("In a moment you should be > able to reconnect to the" > + " database and repeat > your command."))); Personally, I don't find this hint particularly necessary. The session was terminated because nothing was happening, so the real fix on the application side is probably more involved than just retrying. This is different from some of the other cases that were cited, such as serialization conflicts, where you just got unlucky due to concurrent events. In the case of idle-in-transaction-timeout, the fault is entirely your own. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: idle-in-transaction timeout error does not give a hint
Hi, On Monday, February 4, 2019 2:15 AM +, Michael Paquier wrote: > On Tue, Dec 04, 2018 at 04:07:34AM +, Ideriha, Takeshi wrote: > > Sure. I didn't have a strong opinion about it, so it's ok. > From what I can see this is waiting input from a native English speaker, so > for now I have moved this entry to next CF. Although I also use English in daily life, I am not from a native English-speaking country. But I'm giving it a try, since the current patch is not that complex to check. > HINT: In a moment you should be able to reconnect to the > database and restart your transaction. I think this error hint message makes sense for idle-in-transaction timeout. It's grammatically correct and gives a clearer hint for that. I agree that "reconnect to database" implies "restart session", so it may not be the best word for errhint message. Now the next point raised by Ideriha-san is whether the other places in the code should also be changed. I also checked the source code where the other errhints appear, besides idle-in-transaction. (ERRCODE_CRASH_SHUTDOWN and ERRCODE_T_R_SERIALIZATION_FAILURE) Those errcodes use "repeat your command" for the hint. > errhint("In a moment you should be able to reconnect to the" > " database and repeat your command."))); (1) (errcode(ERRCODE_CRASH_SHUTDOWN) As per its errdetail, the transaction is rollbacked, so "restart transaction" also makes sense. (2) ERRCODE_T_R_SERIALIZATION_FAILURE The second one is under the clause below... > if (RecoveryConflictPending && DoingCommandRead) ...so using "repeat your command" makes sense. I'm fine with retaining the other two hint messages as they are. Regardless if we want to change the other similarly written errhint messages or add errhint messages for other errcodes that don't have it, I think the latest patch that provides errhint message for idle-in-transaction timeout may be changed to "Ready for Committer", as it still applies and the tests successfully passed. Regards, Kirk Jamison