Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
In patch 0004, I noticed a couple of typos in the documentation; please find
attached a fixup patch correcting these.

Still in the documentation, same patch, the last paragraph documenting
PQcancelPoll() ends as:

+ indicate the current stage of the connection procedure and might be useful
+ to provide feedback to the user for example. These statuses are:
+

while not actually listing the "statuses". Should we list them? Adjust the
wording? Or refer to PQconnectPoll() documentation (since the paragraph is
copied from there, it seems)?

Otherwise, the feature still works fine as far as I can tell.

From 3e04442e3f283829ed38e4a2b435fd182addf87a Mon Sep 17 00:00:00 2001
From: Denis Laxalde
Date: Wed, 6 Mar 2024 14:55:40 +0100
Subject: [PATCH] fixup! libpq: Add encrypted and non-blocking versions of
 PQcancel

---
 doc/src/sgml/libpq.sgml          | 2 +-
 src/interfaces/libpq/fe-cancel.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1613fcc7bb..1281cac284 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -6122,7 +6122,7 @@ PostgresPollingStatusType PQcancelPoll(PGcancelConn *cancelConn);
        The request is made over the given PGcancelConn, which needs to be
        created with .
-       The return value of
+       The return value of
        is 1 if the cancellation request could be started and 0 if not. If
        it was unsuccessful, the error message can be retrieved using .
diff --git a/src/interfaces/libpq/fe-cancel.c b/src/interfaces/libpq/fe-cancel.c
index e66b8819ee..9c9a23bb4d 100644
--- a/src/interfaces/libpq/fe-cancel.c
+++ b/src/interfaces/libpq/fe-cancel.c
@@ -146,7 +146,7 @@ PQcancelBlocking(PGcancelConn *cancelConn)
 /*
  * PQcancelStart
  *
- * Starts sending a cancellation request in a blocking fashion. Returns
+ * Starts sending a cancellation request in a non-blocking fashion. Returns
  * 1 if successful 0 if not.
  */
 int
--
2.39.2
Re: list of acknowledgments for PG16
Peter Eisentraut a écrit :
> The list of acknowledgments for the PG16 release notes has been committed.
> It should show up here sometime:
> <https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-ACKNOWLEDGEMENTS>.
> As usual, please check for problems such as wrong sorting, duplicate names
> in different variants, or names in the wrong order etc. (Our convention is
> given name followed by surname.)

"Gabriele Varrazzo" is mentioned in commit
0032a5456708811ca95bd80a538f4fb72ad0dd20 but it should be "Daniele Varrazzo"
(per the Discussion link in the commit message); the latter is already in
the list.

From c2e685b51f89d80e6e937afaaa0f8d1231fc4d2c Mon Sep 17 00:00:00 2001
From: Denis Laxalde
Date: Wed, 23 Aug 2023 09:10:28 +0200
Subject: [PATCH] Remove a wrong name in acknowledgments

---
 doc/src/sgml/release-16.sgml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml
index db889127fe..3e8106d22b 100644
--- a/doc/src/sgml/release-16.sgml
+++ b/doc/src/sgml/release-16.sgml
@@ -3968,7 +3968,6 @@ Author: Andres Freund
 Farias de Oliveira
 Florin Irion
 Franz-Josef Färber
-Gabriele Varrazzo
 Garen Torikian
 Georgios Kokolatos
 Gilles Darold
--
2.39.2
Re: [PATCH] Allow Postgres to pick an unused port to listen
The documentation fails to build for me:

$ ninja docs
[1/2] Generating doc/src/sgml/postgres-full.xml with a custom command
FAILED: doc/src/sgml/postgres-full.xml
/usr/bin/python3 ../postgresql/doc/src/sgml/xmltools_dep_wrapper --targetname doc/src/sgml/postgres-full.xml --depfile doc/src/sgml/postgres-full.xml.d --tool /usr/bin/xmllint -- --nonet --noent --valid --path doc/src/sgml -o doc/src/sgml/postgres-full.xml ../postgresql/doc/src/sgml/postgres.sgml
../postgresql/doc/src/sgml/postgres.sgml:685: element para: validity error : Element entry is not declared in para list of possible children
ninja: build stopped: subcommand failed.

Removing the tag resolves the issue:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cd07bad3b5..f71859f710 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -684,7 +684,7 @@ include_dir 'conf.d'
        The port can be set to 0 to make Postgres pick an unused port number.
-       The assigned port number can be then retrieved from postmaster.pid.
+       The assigned port number can be then retrieved from postmaster.pid.
Re: Add PQsendSyncMessage() to libpq
Michael Paquier a écrit :
> On Thu, Apr 27, 2023 at 01:06:27PM +0200, Denis Laxalde wrote:
>> Thank you; this V2 looks good to me. Marking as ready for committer.
>
> Please note that we are in a stabilization period for v16 and that the
> first commit fest of v17 should start in July, so it will perhaps take
> some time before this is looked at by a committer.

Yes, I am aware; totally fine by me.

> Speaking of which, what was the performance impact of your application
> once PQflush() was moved out of the pipeline sync? Just asking out of
> curiosity..

I have no metrics for that; but maybe Anton has some?

(In Psycopg, we generally do not expect users to handle the sync operation
themselves, it's done under the hood; and I only found one situation where
the flush could be avoided, but that's largely because of our design; there
can be more in general, I think.)
Re: Add PQsendSyncMessage() to libpq
Hello,

Anton Kirilov a écrit :
> On 25/04/2023 15:23, Denis Laxalde wrote:
>> This sounds like a useful addition to me. I've played a bit with it in
>> Psycopg and it works fine.
>
> Thank you very much for reviewing my patch! I have attached a new version
> of it that addresses your comments and that has been rebased on top of
> the current tip of the master branch (by fixing a merge conflict), i.e.
> commit 7b7fa85130330128b404eddebd4f33c6739454b0.
>
> For the sake of others who might read this e-mail thread, I would like to
> mention that my patch is complete (including documentation and tests, but
> modulo review comments, of course), and that it passes the tests, i.e.:
>
>     make check
>     make -C src/test/modules/libpq_pipeline check

Thank you; this V2 looks good to me. Marking as ready for committer.
Re: Add PQsendSyncMessage() to libpq
Anton Kirilov wrote:
> I would appreciate your thoughts on my proposal.

This sounds like a useful addition to me. I've played a bit with it in
Psycopg and it works fine.

diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index a16bbf32ef..e2b32c1379 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -82,6 +82,7 @@ static int	PQsendDescribe(PGconn *conn, char desc_type,
 static int	check_field_number(const PGresult *res, int field_num);
 static void pqPipelineProcessQueue(PGconn *conn);
 static int	pqPipelineFlush(PGconn *conn);
+static int	send_sync_message(PGconn *conn, int flush);

Could (should?) be:

    static int	send_sync_message(PGconn *conn, bool flush);

diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c
index f48da7d963..829907957a 100644
--- a/src/test/modules/libpq_pipeline/libpq_pipeline.c
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -244,6 +244,104 @@ test_multi_pipelines(PGconn *conn)
 	fprintf(stderr, "ok\n");
 }

+static void
+test_multi_pipelines_noflush(PGconn *conn)
+{

Maybe test_multi_pipelines() could be extended with an additional
PQsendQueryParams() + PQsendSyncMessage() step instead of adding this extra
test case?
Re: [PATCH] Allow Postgres to pick an unused port to listen
Hi,

Yurii Rashkovskii a écrit :
> On Wed, Apr 19, 2023 at 11:44 PM Aleksander Alekseev
> <aleksan...@timescale.com> wrote:
>>> I would like to suggest a patch against master (although it may be
>>> worth backporting it) that makes it possible to listen on any unused
>>> port.
>> [...]
>> A bullet-proof approach would be (approximately) for the test framework
>> to lease the ports on the given machine, for instance by using a KV
>> value with CAS support like Consul or etcd (or another PostgreSQL
>> instance), as this is done for leader election in distributed systems
>> (so called leader lease). After leasing the port the framework knows no
>> other testing process on the given machine will use it (and also it
>> keeps telling the KV storage that the port is still leased) and
>> specifies it in postgresql.conf as usual.
>
> The approach you suggest introduces a significant amount of complexity
> but seemingly fails to address one of the core issues: using a KV store
> to lease a port does not guarantee the port's availability. I don't
> believe this is a sound way to address this issue, let alone a
> bulletproof one. Also, I don't think there's a case for distributed
> systems here because we're only managing a single computer's resource:
> the allocation of local ports.

For this (local computer) use case, a tool such as
https://github.com/kmike/port-for/ would do the job, if I understand
correctly (the lease thing, locally). And it would work for "anything", not
just Postgres.

I am curious, Yurii: is Postgres the only service that needs an unused port
for listening in your testing/application environment? Otherwise, how is
this handled in other software?

Cheers,
Denis
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
The patch set does not apply any more. I tried to rebase locally; even
leaving out 1 ("libpq: Run pgindent after a9e9a9f32b3"), patch 4 ("Start
using new libpq cancel APIs") is harder to resolve following 983ec23007b (I
suppose).

Apart from that, the implementation in v19 sounds good to me, and seems
worthwhile. FWIW, as said before, I also implemented it in Psycopg as a
sort of end-to-end validation.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema a écrit :
> On Wed, 29 Mar 2023 at 10:43, Denis Laxalde wrote:
>> More importantly, not having PQcancelSend() creating the PGcancelConn
>> makes reuse of that value, passing through PQcancelReset(), more
>> intuitive. E.g., in the tests:
>
> You convinced me. Attached is an updated patch where PQcancelSend
> takes the PGcancelConn and returns 1 or 0.

Patch 5 is missing respective changes; please find attached a fixup patch
for these.

From c9e59fb3e30db1bfab75be9fdd4afbc227a5270e Mon Sep 17 00:00:00 2001
From: Denis Laxalde
Date: Thu, 30 Mar 2023 09:19:18 +0200
Subject: [PATCH] fixup! Start using new libpq cancel APIs

---
 contrib/dblink/dblink.c              | 4 ++--
 src/fe_utils/connect_utils.c         | 4 +++-
 src/test/isolation/isolationtester.c | 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index e139f66e11..073795f088 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -1332,11 +1332,11 @@ dblink_cancel_query(PG_FUNCTION_ARGS)
 	dblink_init();
 	conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0)));
-	cancelConn = PQcancelSend(conn);
+	cancelConn = PQcancelConn(conn);

 	PG_TRY();
 	{
-		if (PQcancelStatus(cancelConn) == CONNECTION_BAD)
+		if (!PQcancelSend(cancelConn))
 		{
 			msg = pchomp(PQcancelErrorMessage(cancelConn));
 		}
diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c
index b32448c010..1cfd717217 100644
--- a/src/fe_utils/connect_utils.c
+++ b/src/fe_utils/connect_utils.c
@@ -161,7 +161,9 @@ disconnectDatabase(PGconn *conn)
 	if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
 	{
-		PQcancelFinish(PQcancelSend(conn));
+		PGcancelConn *cancelConn = PQcancelConn(conn);
+		PQcancelSend(cancelConn);
+		PQcancelFinish(cancelConn);
 	}

 	PQfinish(conn);
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 3781f7982b..de31a87571 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -946,9 +946,9 @@ try_complete_step(TestSpec *testspec, PermutationStep *pstep, int flags)
 	 */
 	if (td > max_step_wait && !canceled)
 	{
-		PGcancelConn *cancel_conn = PQcancelSend(conn);
+		PGcancelConn *cancel_conn = PQcancelConn(conn);

-		if (PQcancelStatus(cancel_conn) == CONNECTION_OK)
+		if (PQcancelSend(cancel_conn))
 		{
 			/*
 			 * print to stdout not stderr, as this should appear in
--
2.30.2
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema a écrit :
>> Namely, I wonder why it returns a PGcancelConn and what's the
>> point of requiring the user to call PQcancelStatus() to see if something
>> got wrong. Maybe it could be defined as:
>>
>>     int PQcancelSend(PGcancelConn *cancelConn);
>>
>> where the return value would be the status? And the user would only need
>> to call PQcancelErrorMessage() in case of error. This would leave only
>> one single way to create a PGcancelConn value (i.e. PQcancelConn()),
>> which seems less confusing to me.
>
> To clarify what you mean, the API would then be like this:
>
>     PGcancelConn cancelConn = PQcancelConn(conn);
>     if (PQcancelSend(cancelConn) == CONNECTION_BAD) {
>         printf("ERROR %s\n", PQcancelErrorMessage(cancelConn));
>         exit(1);
>     }

I'm not sure it's worth returning the connection status, maybe just an int
value (the return value of connectDBComplete() for instance). More
importantly, not having PQcancelSend() creating the PGcancelConn makes
reuse of that value, passing through PQcancelReset(), more intuitive.
E.g., in the tests:

diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c
index 6764ab513b..91363451af 100644
--- a/src/test/modules/libpq_pipeline/libpq_pipeline.c
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -217,17 +217,18 @@ test_cancel(PGconn *conn, const char *conninfo)
 		pg_fatal("failed to run PQrequestCancel: %s", PQerrorMessage(conn));
 	confirm_query_cancelled(conn);

+	cancelConn = PQcancelConn(conn);
+
 	/* test PQcancelSend */
 	send_cancellable_query(conn, monitorConn);
-	cancelConn = PQcancelSend(conn);
-	if (PQcancelStatus(cancelConn) == CONNECTION_BAD)
+	if (PQcancelSend(cancelConn) == CONNECTION_BAD)
 		pg_fatal("failed to run PQcancelSend: %s", PQcancelErrorMessage(cancelConn));
 	confirm_query_cancelled(conn);
-	PQcancelFinish(cancelConn);
+
+	PQcancelReset(cancelConn);

 	/* test PQcancelConn and then polling with PQcancelPoll */
 	send_cancellable_query(conn, monitorConn);
-	cancelConn = PQcancelConn(conn);
 	if (PQcancelStatus(cancelConn) == CONNECTION_BAD)
 		pg_fatal("bad cancel connection: %s", PQcancelErrorMessage(cancelConn));
 	while (true)

Otherwise, it's not clear if the PGcancelConn created by PQcancelSend()
should be reused or not. But maybe that's a matter of documentation?

>> As part of my testing, I've implemented non-blocking cancellation in
>> Psycopg, based on v16 of this patchset. Overall this worked fine and
>> seems useful; if you want to try it:
>>
>>     https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel
>
> That's great to hear! I'll try to take a closer look at that change
> tomorrow.

See also https://github.com/psycopg/psycopg/issues/534 if you want to
discuss this.

>> (The only thing I found slightly inconvenient is the need to convey the
>> connection encoding (from PGconn) when handling error messages from the
>> PGcancelConn.)
>
> Could you expand a bit more on this? And if you have any idea on how
> to improve the API with regards to this?
The thing is that we need the connection encoding (client_encoding) when
eventually forwarding the result of PQcancelErrorMessage(), decoded, to the
user. More specifically, it seems to me that we'd need the encoding of the
*cancel connection*, but since PQparameterStatus() cannot be used with a
PGcancelConn, I use that of the PGconn. Roughly, in Python:

    encoding = conn.parameter_status(b"client_encoding")
    # i.e., in C: char *encoding = PQparameterStatus(conn, "client_encoding");

    cancel_conn = conn.cancel_conn()
    # i.e., in C: PGcancelConn *cancelConn = PQcancelConn(conn);

    # [... then work with cancel_conn ...]

    if cancel_conn.status == ConnStatus.BAD:
        raise OperationalError(cancel_conn.error_message().decode(encoding))

This feels a bit non-atomic to me; isn't there a risk that client_encoding
be changed between the PQparameterStatus(conn) and PQcancelConn(conn)
calls? So maybe PQcancelParameterStatus(PGcancelConn *cancelConn, char
*name) is needed?
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hi Jelte,

I had a look into your patchset (v16), did a quick review and played a bit
with the feature.

Patch 2 is missing the documentation about PQcancelSocket() and contains a
few typos; please find attached a (fixup) patch to correct these.

--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -321,16 +328,28 @@ extern PostgresPollingStatusType PQresetPoll(PGconn *conn);
 /* Synchronous (blocking) */
 extern void PQreset(PGconn *conn);

+/* issue a cancel request */
+extern PGcancelConn * PQcancelSend(PGconn *conn);
[...]

Maybe I'm missing something, but this function above seems a bit strange.
Namely, I wonder why it returns a PGcancelConn and what's the point of
requiring the user to call PQcancelStatus() to see if something got wrong.
Maybe it could be defined as:

    int PQcancelSend(PGcancelConn *cancelConn);

where the return value would be the status? And the user would only need to
call PQcancelErrorMessage() in case of error. This would leave only one
single way to create a PGcancelConn value (i.e. PQcancelConn()), which
seems less confusing to me.

Jelte Fennema wrote:
> Especially since I ran into another use case that I would want to use
> this patch for recently: Adding an async cancel function to Python
> its psycopg3 library. This library exposes both a Connection class
> and an AsyncConnection class (using python its asyncio feature). But
> one downside of the AsyncConnection type is that it doesn't have a
> cancel method.

As part of my testing, I've implemented non-blocking cancellation in
Psycopg, based on v16 of this patchset. Overall this worked fine and seems
useful; if you want to try it:

    https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel

(The only thing I found slightly inconvenient is the need to convey the
connection encoding (from PGconn) when handling error messages from the
PGcancelConn.)
Cheers,
Denis

From a5f9cc680ffa520b05fe34b7cac5df2e60a6d4ad Mon Sep 17 00:00:00 2001
From: Denis Laxalde
Date: Tue, 28 Mar 2023 16:06:42 +0200
Subject: [PATCH] fixup! Add non-blocking version of PQcancel

---
 doc/src/sgml/libpq.sgml                          | 16 +++-
 src/interfaces/libpq/fe-connect.c                |  2 +-
 src/test/modules/libpq_pipeline/libpq_pipeline.c |  2 +-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 29f08a4317..aa404c4d15 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -5795,7 +5795,7 @@ PGcancelConn *PQcancelSend(PGconn *conn);
    options as the the original PGconn. So when the original
    connection is encrypted (using TLS or GSS), the connection for the
    cancel request is encrypted in the same way. Any connection options
-   that only are only used during authentication or after authentication of
+   that are only used during authentication or after authentication of
    the client are ignored though, because cancellation requests do not
    require authentication and the connection is closed right after the
    cancellation request is submitted.
@@ -5912,6 +5912,20 @@ ConnStatusType PQcancelStatus(const PGcancelConn *conn);

+
+  PQcancelSocketPQcancelSocket
+
+
+   A version of  that can be used for
+   cancellation connections.
+
+int PQcancelSocket(PGcancelConn *conn);
+
+
+
+
+
  PQcancelPollPQcancelPoll

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 74e337fddf..16af7303d4 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -808,7 +808,7 @@ PQcancelConn(PGconn *conn)
 	/*
 	 * Cancel requests should not iterate over all possible hosts. The request
 	 * needs to be sent to the exact host and address that the original
-	 * connection used. So we we manually create the host and address arrays
+	 * connection used. So we manually create the host and address arrays
 	 * with a single element after freeing the host array that we generated
 	 * from the connection options.
 	 */
diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c
index e8e904892c..6764ab513b 100644
--- a/src/test/modules/libpq_pipeline/libpq_pipeline.c
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -89,7 +89,7 @@ pg_fatal_impl(int line, const char *fmt,...)
 /*
  * Check that the query on the given connection got cancelled.
  *
- * This is a function wrapped in a macrco to make the reported line number
+ * This is a function wrapped in a macro to make the reported line number
  * in an error match the line number of the invocation.
  */
 #define confirm_query_cancelled(conn) confirm_query_cancelled_impl(__LINE__, conn)
--
2.30.2
[PATCH] Reset single-row processing mode at end of pipeline commands queue
Hello,

I'm trying to make single-row mode and pipeline mode work together in
Psycopg using libpq. I think there is something wrong with respect to the
single-row mode flag, not being correctly reset, in some situations.

The minimal case I'm considering is (in a pipeline):

* send query 1,
* get its results in single-row mode,
* send query 2,
* get its results *not* in single-row mode.

It seems that, as the command queue in the pipeline is empty after getting
the results of query 1, the single-row mode flag is not reset and is still
active for query 2, thus leading to an unexpected PGRES_SINGLE_TUPLE
status.

The attached patch demonstrates this in the test suite. It also suggests
moving the statement that resets single-row mode up in
pqPipelineProcessQueue(), in particular before exiting the function when
the command queue is empty.

Thanks for considering,
Denis

From de28d9ac1e541c3b3a0279dc58fb5a7f23f775f8 Mon Sep 17 00:00:00 2001
From: Denis Laxalde
Date: Fri, 7 Oct 2022 14:00:24 +0200
Subject: [PATCH] Reset single-row processing mode at end of pipeline commands
 queue

Previously, we'd reset the single-row mode only when processing results of
the next command in a pipeline and not if there's no more commands in the
queue. But if the client emits a query, retrieves its results in single-row
mode (thus setting single-row mode flag), then emits another query and
wants to retrieve its result in "normal" mode, the single-row mode was not
reset in pqPipelineProcessQueue(); this led to an erroneous
PGRES_SINGLE_TUPLE instead of, e.g., PGRES_TUPLES_OK. We fix this by
resetting the single-row mode *even* if the command queue is empty in
pqPipelineProcessQueue().
---
 src/interfaces/libpq/fe-exec.c                  | 12 +++---
 .../modules/libpq_pipeline/libpq_pipeline.c     | 43 ++-
 .../libpq_pipeline/traces/singlerow.trace       | 20 +
 3 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 0274c1b156..64701b562b 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -3096,6 +3096,12 @@ pqPipelineProcessQueue(PGconn *conn)
 			break;
 	}

+	/*
+	 * Reset single-row processing mode. (Client has to set it up for each
+	 * query, if desired.)
+	 */
+	conn->singleRowMode = false;
+
 	/*
 	 * If there are no further commands to process in the queue, get us in
 	 * "real idle" mode now.
 	 */
@@ -3115,12 +3121,6 @@ pqPipelineProcessQueue(PGconn *conn)
 	/* Initialize async result-accumulation state */
 	pqClearAsyncResult(conn);

-	/*
-	 * Reset single-row processing mode. (Client has to set it up for each
-	 * query, if desired.)
-	 */
-	conn->singleRowMode = false;
-
 	if (conn->pipelineStatus == PQ_PIPELINE_ABORTED &&
 		conn->cmd_queue_head->queryclass != PGQUERY_SYNC)
 	{
diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c
index c609f42258..4b9ccbfb4c 100644
--- a/src/test/modules/libpq_pipeline/libpq_pipeline.c
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -1153,11 +1153,11 @@ test_singlerowmode(PGconn *conn)
 	int			i;
 	bool		pipeline_ended = false;

-	/* 1 pipeline, 3 queries in it */
 	if (PQenterPipelineMode(conn) != 1)
 		pg_fatal("failed to enter pipeline mode: %s",
 				 PQerrorMessage(conn));

+	/* One series of three commands, using single-row mode for the first two. */
 	for (i = 0; i < 3; i++)
 	{
 		char	   *param[1];
@@ -1249,6 +1249,47 @@ test_singlerowmode(PGconn *conn)
 			pg_fatal("didn't get expected terminating TUPLES_OK");
 	}

+	/*
+	 * Now issue one command, get its results in with single-row mode, then
+	 * issue another command, and get its results in normal mode; make sure
+	 * the single-row mode flag is reset as expected.
+	 */
+	if (PQsendQueryParams(conn, "SELECT generate_series(0, 0)",
+						  0, NULL, NULL, NULL, NULL, 0) != 1)
+		pg_fatal("failed to send query: %s",
+				 PQerrorMessage(conn));
+	if (PQsendFlushRequest(conn) != 1)
+		pg_fatal("failed to send flush request");
+	if (PQsetSingleRowMode(conn) != 1)
+		pg_fatal("PQsetSingleRowMode() failed");
+	res = PQgetResult(conn);
+	if (res == NULL)
+		pg_fatal("unexpected NULL");
+	if (PQresultStatus(res) != PGRES_SINGLE_TUPLE)
+		pg_fatal("Expected PGRES_SINGLE_TUPLE, got %s",
+				 PQresStatus(PQresultStatus(res)));
+	res = PQgetResult(conn);
+	if (res == NULL)
+		pg_fatal("unexpected NULL");
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		pg_fatal("Expected PGRES_TUPLES_OK, got %s",
+				 PQresStatus(PQresultStatus(res)));
+	if (PQgetResult(conn) != NULL)
+		pg_fatal("expected NULL result");
+
+	if (PQsendQueryParams(conn, "SELECT 1",
+						  0, NULL, NULL, NULL, NULL, 0) != 1)
+		pg_fatal("failed to send query: %s",
+				 PQe
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Julien Rouhaud a écrit :
> I think having a new option for vacuumdb is the right move. It seems
> unlikely that any cron or similar on the host will try to run some
> concurrent vacuumdb, but we still have to enforce that only the one
> executed by pg_upgrade can succeed. I guess it could be an undocumented
> option, similar to postgres' -b, which would only be allowed iff --all
> and --freeze is also passed to be extra safe.

The help text in pg_dump's man page states:

    --binary-upgrade
        This option is for use by in-place upgrade utilities. Its use for
        other purposes is not recommended or supported. The behavior of the
        option may change in future releases without notice.

Is it enough? Or do we actually want to hide it for vacuumdb?

> While at it:
>
>     vacuumdb: error: processing of database "postgres" failed: PANIC:
>     cannot make new WAL entries during recovery
>
> Should we tweak that message when IsBinaryUpgrade is true?

Yes, indeed, I had in mind to simply make the message more generic, as:
"cannot insert new WAL entries".
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Hi,

Julien Rouhaud a écrit :
> On Wed, 27 Jan 2021 11:25:11 +0100 Denis Laxalde wrote:
>> Andres Freund a écrit :
>>> b) when in binary upgrade mode / -b, error out on all wal writes in
>>>    sessions that don't explicitly set a session-level GUC to allow
>>>    writes.
>
> It should be enough to add an additional test in XLogInsertAllowed() with
> some new variable that is set when starting in binary upgrade mode, and a
> new function to disable it that will be emitted by pg_dump / pg_dumpall
> in binary upgrade mode.

I tried that simple change first:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..8feab0cb96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8498,6 +8498,9 @@ HotStandbyActiveInReplay(void)
 bool
 XLogInsertAllowed(void)
 {
+	if (IsBinaryUpgrade)
+		return false;
+
 	/*
 	 * If value is "unconditionally true" or "unconditionally false", just
 	 * return it. This provides the normal fast path once recovery is known

But then, pg_upgrade's tests (make -C src/bin/pg_upgrade/ check) fail at
vacuumdb but not during pg_dumpall:

$ cat src/bin/pg_upgrade/pg_upgrade_utility.log

pg_upgrade run on Fri Jan 28 10:37:36 2022

command: "/home/denis/src/pgsql/build/tmp_install/home/denis/.local/pgsql/bin/pg_dumpall" --host /home/denis/src/pgsql/build/src/bin/pg_upgrade --port 51696 --username denis --globals-only --quote-all-identifiers --binary-upgrade -f pg_upgrade_dump_globals.sql >> "pg_upgrade_utility.log" 2>&1

command: "/home/denis/src/pgsql/build/tmp_install/home/denis/.local/pgsql/bin/vacuumdb" --host /home/denis/src/pgsql/build/src/bin/pg_upgrade --port 51696 --username denis --all --analyze >> "pg_upgrade_utility.log" 2>&1
vacuumdb: vacuuming database "postgres"
vacuumdb: error: processing of database "postgres" failed: PANIC: cannot make new WAL entries during recovery

In contrast with pg_dump/pg_dumpall, vacuumdb has no --binary-upgrade flag,
so it does not seem possible to use a special GUC setting to allow WAL
writes in that vacuumdb session at the moment. Should we add
--binary-upgrade to vacuumdb as well? Or am I going in the wrong direction?

Thanks,
Denis
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Bruce Momjian a écrit :
> On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote:
>> On 26 Aug 2021, at 15:09, Bruce Momjian wrote:
>>> On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:
>>>> .. I still think that changing bgworker_should_start_now() is a
>>>> better option.
>>>
>>> I am not sure. We have a lot of pg_upgrade code that turns off things
>>> like autovacuum and that has worked fine:
>>
>> True, but there are also conditionals on IsBinaryUpgrade for starting
>> the autovacuum launcher in the postmaster, so there is some precedent.
>
> Oh, I was not aware of that.

If I understand correctly, autovacuum is turned off by pg_upgrade code only
if the old cluster does not support the -b flag (prior to 9.1, apparently).
Otherwise, this is indeed handled by IsBinaryUpgrade.
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Michael Paquier a écrit :
>> @@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
>>  static bool
>>  bgworker_should_start_now(BgWorkerStartTime start_time)
>>  {
>> +	if (IsBinaryUpgrade)
>> +		return false;
>> +
>
> Using -c max_worker_processes=0 would just have the same effect, no? So
> we could just patch pg_upgrade's server.c to get the same level of
> protection?

Yes, same effect indeed. This would log "too many background workers"
messages in pg_upgrade logs, though.

See attached patch implementing this suggestion.

From 04867b2184c335e6efaf5a96d348a53efcc9e3d2 Mon Sep 17 00:00:00 2001
From: Denis Laxalde
Date: Mon, 23 Aug 2021 15:19:41 +0200
Subject: [PATCH] Disable bgworkers at servers start in pg_upgrade

Background workers may produce undesired activities (writes) on the old
cluster during upgrade which may corrupt the new cluster.
---
 src/bin/pg_upgrade/server.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 7fed0ae108..a2b67a5e9a 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -236,6 +236,9 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * values are less than a gap of 20 from the current xid counter,
 	 * so autovacuum will not touch them.
 	 *
+	 * Disable background workers by setting max_worker_processes=0 to prevent
+	 * undesired writes which may cause corruptions on the new cluster.
+	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there. If there is a
 	 * crash, the new cluster has to be recreated anyway. fsync=off is a big
@@ -245,7 +248,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * vacuumdb --freeze actually freezes the tuples.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s -c max_worker_processes=0\" start",
 			 cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 			 (cluster->controldata.cat_ver >=
 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
--
2.30.2
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Julien Rouhaud a écrit : On Wed, Jan 27, 2021 at 02:41:32PM +0100, Jehan-Guillaume de Rorthais wrote: On Wed, 27 Jan 2021 11:25:11 +0100 Denis Laxalde wrote: Andres Freund a écrit : I wonder if we could a) set default_transaction_read_only to true, and explicitly change it in the sessions that need that. According to Denis' tests discussed off-list, it works fine in regard with the powa bgworker, albeit some complaints in logs. However, I wonder how fragile it could be as bgworker could easily manipulate either the GUC or "BEGIN READ WRITE". I realize this is really uncommon practices, but bgworker code from third parties might be surprising. Given that having any writes happening at the wrong moment on the old cluster can end up corrupting the new cluster, and that the corruption might not be immediately visible we should try to put as many safeguards as possible. so +1 for the default_transaction_read_only as done in Denis' patch at minimum, but not only. AFAICT it should be easy to prevent all background worker from being launched by adding a check on IsBinaryUpgrade at the beginning of bgworker_should_start_now(). It won't prevent modules from being loaded, so this approach should be problematic. Please find attached another patch implementing this suggestion (as a complement to the previous patch setting default_transaction_read_only). b) when in binary upgrade mode / -b, error out on all wal writes in sessions that don't explicitly set a session-level GUC to allow writes. It feels safer because more specific to the subject. But I wonder if the benefice worth adding some (limited?) complexity and a small/quick conditional block in a very hot code path for a very limited use case. I don't think that it would add that much complexity or overhead as there's already all the infrastructure to prevent WAL writes in certain condition. 
> It should be enough to add an additional test in XLogInsertAllowed()
> with some new variable that is set when starting in binary upgrade
> mode, and a new function to disable it that will be emitted by
> pg_dump / pg_dumpall in binary upgrade mode.

This part is less clear to me, so I'm not sure I'd be able to work on it.

From b265b6f5b49cfcbc3c6271e980455696a5a3622b Mon Sep 17 00:00:00 2001
From: Denis Laxalde
Date: Mon, 23 Aug 2021 15:19:41 +0200
Subject: [PATCH] Do not start bgworker processes during binary upgrade

Background workers may produce undesired activities (writes) on the old
cluster during upgrade, which may corrupt the new cluster. For a similar
reason that we restrict socket access and set default transactions to
read-only when starting the old cluster in pg_upgrade, we should prevent
bgworkers from starting when using the -b flag.
---
 src/backend/postmaster/postmaster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..e66c962509 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
 static bool
 bgworker_should_start_now(BgWorkerStartTime start_time)
 {
+	if (IsBinaryUpgrade)
+		return false;
+
 	switch (pmState)
 	{
 		case PM_NO_CHILDREN:
-- 
2.30.2
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Hi,

Andres Freund a écrit :
> On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
> > We found an issue in pg_upgrade on a cluster with a third-party
> > background worker. The upgrade goes fine, but the new cluster is then
> > in an inconsistent state. The background worker comes from the PoWA
> > extension but the issue does not appear to be related to this
> > particular code.
>
> Well, it does imply that that bgworker did something, as the pure
> existence of a bgworker shouldn't affect anything. Presumably the issue
> is that the bgworker actually does transactional writes, which causes
> problems because the xids / multixacts from early during pg_upgrade
> won't actually be valid after we do pg_resetxlog etc.
>
> > As a solution, it seems that, for similar reasons that we restrict
> > socket access to prevent accidental connections (from commit
> > f763b77193), we should also prevent background workers from starting
> > at this step.
>
> I think that'd have quite the potential for negative impact - imagine
> extensions that refuse to be loaded outside of shared_preload_libraries
> (e.g. because they need to allocate shared memory) but that are
> required during the course of pg_upgrade (e.g. because it's a tableam,
> a PL or such). Those libraries will then be loaded on demand during the
> upgrade (due to the _PG_init() hook being called when functions from
> the extension are needed, e.g. the tableam or PL handler).
>
> Nor is it clear to me that the only way this would be problematic is
> with shared_preload_libraries. A library in local_preload_libraries, or
> a demand-loaded library, can trigger bgworkers (or database writes in
> some other form) as well.

Thank you for those insights!

> I wonder if we could
>
> a) set default_transaction_read_only to true, and explicitly change it
>    in the sessions that need that.
> b) when in binary upgrade mode / -b, error out on all wal writes in
>    sessions that don't explicitly set a session-level GUC to allow
>    writes.
Solution "a" appears to be enough to solve the problem described in my initial email. See attached patch implementing this. Cheers, Denis >From d8b74f9220775736917b7fc08bbe397d7e2eedcd Mon Sep 17 00:00:00 2001 From: Denis Laxalde Date: Wed, 20 Jan 2021 17:25:58 +0100 Subject: [PATCH] Set default transactions to read-only at servers start in pg_upgrade In essence, this is for a similar reason that we use a restricted socket access from f763b77193b04eba03a1f4ce46df34dc0348419e because background workers may produce undesired activities during the upgrade. Author: Denis Laxalde Co-authored-by: Jehan-Guillaume de Rorthais --- src/bin/pg_upgrade/server.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c index 31b1425202..b17f348a5b 100644 --- a/src/bin/pg_upgrade/server.c +++ b/src/bin/pg_upgrade/server.c @@ -244,7 +244,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error) * vacuumdb --freeze actually freezes the tuples. */ snprintf(cmd, sizeof(cmd), - "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start", + "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s -c default_transaction_read_only=on\" start", cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port, (cluster->controldata.cat_ver >= BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" : -- 2.20.1
[PATCH] Disable bgworkers during servers start in pg_upgrade
Hello,

We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension, but the issue does not appear to be related to this
particular code.

Here is a shell script to reproduce the issue (error at the end):

OLDBINDIR=/usr/lib/postgresql/11/bin
NEWBINDIR=/usr/lib/postgresql/13/bin

OLDDATADIR=$(mktemp -d)
NEWDATADIR=$(mktemp -d)

$OLDBINDIR/initdb -D $OLDDATADIR
echo "unix_socket_directories = '/tmp'" >> $OLDDATADIR/postgresql.auto.conf
echo "shared_preload_libraries = 'pg_stat_statements, powa'" >> $OLDDATADIR/postgresql.auto.conf
$OLDBINDIR/pg_ctl -D $OLDDATADIR -l $OLDDATADIR/pgsql.log start
$OLDBINDIR/createdb -h /tmp powa
$OLDBINDIR/psql -h /tmp -d powa -c "CREATE EXTENSION powa CASCADE"
$OLDBINDIR/pg_ctl -D $OLDDATADIR -m fast stop

$NEWBINDIR/initdb -D $NEWDATADIR
cp $OLDDATADIR/postgresql.auto.conf $NEWDATADIR/postgresql.auto.conf

$NEWBINDIR/pg_upgrade --old-datadir $OLDDATADIR --new-datadir $NEWDATADIR --old-bindir $OLDBINDIR

$NEWBINDIR/pg_ctl -D $NEWDATADIR -l $NEWDATADIR/pgsql.log start
$NEWBINDIR/psql -h /tmp -d powa -c "SELECT 1 FROM powa_snapshot_metas"
# ERROR:  MultiXactId 1 has not been created yet -- apparent wraparound

(This needs PoWA to be installed; packages are available on pgdg
repositories as postgresql--powa on Debian or powa_ on RedHat, or
directly from source code at
https://github.com/powa-team/powa-archivist).

As far as I currently understand, this is due to the data to be migrated
being somewhat inconsistent (from the perspective of pg_upgrade) when
the old cluster and its background workers get started in pg_upgrade
during the "checks" step. (The old cluster itself remains sane, still.)

As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers from starting at
this step.
Please find attached a patch implementing this.

Thanks for considering,
Denis

From 31b1f31cd3a822d23ccd5883120a013891ade0f3 Mon Sep 17 00:00:00 2001
From: Denis Laxalde
Date: Wed, 20 Jan 2021 17:25:58 +0100
Subject: [PATCH] Disable background workers during servers start in
 pg_upgrade

We disable shared_preload_libraries to prevent background workers from
initializing and starting during server start in pg_upgrade. In essence,
this is for a similar reason that we use restricted socket access from
f763b77193b04eba03a1f4ce46df34dc0348419e: background workers may produce
undesired activities during the upgrade.

Author: Denis Laxalde
Co-authored-by: Jehan-Guillaume de Rorthais
---
 src/bin/pg_upgrade/server.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 31b1425202..fab95a2d24 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -240,11 +240,13 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 *
+	 * Turn off background workers by emptying shared_preload_libraries.
+	 *
 	 * Force vacuum_defer_cleanup_age to 0 on the new cluster, so that
 	 * vacuumdb --freeze actually freezes the tuples.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s -c shared_preload_libraries=''\" start",
 			 cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 			 (cluster->controldata.cat_ver >=
 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
-- 
2.20.1