Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
[ from a week ago ] Alvaro Herrera writes: > Hm, indri failed: > ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith > -Wdeclaration-after-statement -Werror=vla -Werror=unguarded-availability-new > -Wendif-labels -Wmissing-format-attribute -Wcast-function-type > -Wformat-security -fno-strict-aliasing -fwrapv > -Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -g -O2 > -fno-common -Werror -fvisibility=hidden -bundle -o dblink.dylib dblink.o > -L../../src/port -L../../src/common -L../../src/interfaces/libpq -lpq > -isysroot > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.4.sdk > -L/opt/local/libexec/llvm-15/lib -L/opt/local/lib -L/opt/local/lib > -L/opt/local/lib -L/opt/local/lib -Wl,-dead_strip_dylibs -Werror > -fvisibility=hidden -bundle_loader ../../src/backend/postgres > Undefined symbols for architecture arm64: > "_libintl_gettext", referenced from: > _libpqsrv_cancel in dblink.o > _libpqsrv_cancel in dblink.o > ld: symbol(s) not found for architecture arm64 > clang: error: linker command failed with exit code 1 (use -v to see > invocation) > make[1]: *** [dblink.dylib] Error 1 > make: *** [all-dblink-recurse] Error 2 Having just fixed the same issue for test_json_parser, I now realize what's going on there: dblink's link command doesn't actually mention any of the external libraries that we might need, such as libintl. You can get away with that on some platforms, but not macOS. It would probably be possible to fix that if anyone cared to. I'm not sufficiently excited about it to do so right now --- as you say, we don't support translation in contrib anyway. regards, tom lane
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 4 Apr 2024 at 10:45, Alvaro Herrera wrote: > Yeah, this seems to work and I no longer get that complaint from > headerscheck. patch LGTM
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Apr-03, Tom Lane wrote: > On my machine, headerscheck does not like this: > > $ src/tools/pginclude/headerscheck --cplusplus > In file included from /tmp/headerscheck.4gTaW5/test.cpp:3: > ./src/include/libpq/libpq-be-fe-helpers.h: In function 'char* > libpqsrv_cancel(PGconn*, TimestampTz)': > ./src/include/libpq/libpq-be-fe-helpers.h:393:10: warning: ISO C++ forbids > converting a string constant to 'char*' [-Wwrite-strings] >return "out of memory"; > ^~~ > ./src/include/libpq/libpq-be-fe-helpers.h:421:13: warning: ISO C++ forbids > converting a string constant to 'char*' [-Wwrite-strings] > error = "cancel request timed out"; > ^~ > > The second part of that could easily be fixed by declaring "error" as > "const char *". As for the first part, can we redefine the whole > function as returning "const char *"? (If not, this coding is very > questionable anyway.) Yeah, this seems to work and I no longer get that complaint from headerscheck. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ >From 0af8c7039b2c8ed80bc0bddacfe4a9abb8f527b3 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 4 Apr 2024 10:13:07 +0200 Subject: [PATCH] Make libpqsrv_cancel's return type const char * --- contrib/dblink/dblink.c | 2 +- contrib/postgres_fdw/connection.c | 2 +- src/include/libpq/libpq-be-fe-helpers.h | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index de858e165a..755293456f 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1347,7 +1347,7 @@ Datum dblink_cancel_query(PG_FUNCTION_ARGS) { PGconn *conn; - char *msg; + const char *msg; TimestampTz endtime; dblink_init(); diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 2532e453c4..cac9d96d33 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -1332,7 +1332,7 @@ pgfdw_cancel_query(PGconn *conn) static bool pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime) { - char *errormsg = libpqsrv_cancel(conn, endtime); + const char *errormsg = libpqsrv_cancel(conn, endtime); if (errormsg != NULL) ereport(WARNING, diff --git a/src/include/libpq/libpq-be-fe-helpers.h b/src/include/libpq/libpq-be-fe-helpers.h index 8be9aa1f2f..fe50829274 100644 --- a/src/include/libpq/libpq-be-fe-helpers.h +++ b/src/include/libpq/libpq-be-fe-helpers.h @@ -382,11 +382,11 @@ libpqsrv_get_result(PGconn *conn, uint32 wait_event_info) * Note: this function leaks a string's worth of memory when reporting * libpq errors. Make sure to call it in a transient memory context. */ -static inline char * +static inline const char * libpqsrv_cancel(PGconn *conn, TimestampTz endtime) { PGcancelConn *cancel_conn; - char *error = NULL; + const char *error = NULL; cancel_conn = PQcancelCreate(conn); if (cancel_conn == NULL) -- 2.39.2
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Alvaro Herrera writes: > Great, thanks for looking. Pushed now, I'll be closing the commitfest > entry shortly. On my machine, headerscheck does not like this: $ src/tools/pginclude/headerscheck --cplusplus In file included from /tmp/headerscheck.4gTaW5/test.cpp:3: ./src/include/libpq/libpq-be-fe-helpers.h: In function 'char* libpqsrv_cancel(PGconn*, TimestampTz)': ./src/include/libpq/libpq-be-fe-helpers.h:393:10: warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings] return "out of memory"; ^~~ ./src/include/libpq/libpq-be-fe-helpers.h:421:13: warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings] error = "cancel request timed out"; ^~ The second part of that could easily be fixed by declaring "error" as "const char *". As for the first part, can we redefine the whole function as returning "const char *"? (If not, this coding is very questionable anyway.) regards, tom lane
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, Mar 29, 2024 at 09:17:55AM +0100, Jelte Fennema-Nio wrote: > On Thu, 28 Mar 2024 at 19:03, Tom Lane wrote: > > Could we make this test bulletproof by using an injection point? > > If not, I remain of the opinion that we're better off without it. > > Possibly, and if so, I agree that would be better than the currently > added test. But I honestly don't feel like spending the time on > creating such a test. The SQL test is more representative of real applications, and it's way simpler to understand. In general, I prefer 6-line SQL tests that catch a problem 10% of the time over injection point tests that catch it 100% of the time. For low detection rate to be exciting, it needs to be low enough to have a serious chance of all buildfarm members reporting green for the bad commit. With ~115 buildfarm members running in the last day, 0.1% detection rate would have been low enough to bother improving, but 4% would be high enough to call it good.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 28 Mar 2024 at 19:03, Tom Lane wrote: > > Alvaro Herrera writes: > > It doesn't fail when it's too fast -- it's just that it doesn't cover > > the case we want to cover. > > That's hardly better, because then you think you have test > coverage but maybe you don't. Honestly, that seems quite a lot better. Instead of having randomly failing builds, you have a test that creates coverage 80+% of the time. And that also seems a lot better than having no coverage at all (which is what we had for the last 7 years since introduction of cancellations to postgres_fdw). It would be good to expand the comment in the test though saying that the test might not always cover the intended code path, due to timing problems. > Could we make this test bulletproof by using an injection point? > If not, I remain of the opinion that we're better off without it. Possibly, and if so, I agree that would be better than the currently added test. But I honestly don't feel like spending the time on creating such a test. And given 7 years have passed without someone adding any test for this codepath at all, I don't expect anyone else will either. If you both feel we're better off without the test, feel free to remove it. This was just some small missing test coverage that I noticed while working on this patch, that I thought I'd quickly address. I don't particularly care a lot about the specific test.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Alvaro Herrera writes: > On 2024-Mar-28, Tom Lane wrote: >> If the test fails both when the machine is too slow and when it's >> too fast, then there's zero hope of making it stable and we should >> just remove it. > It doesn't fail when it's too fast -- it's just that it doesn't cover > the case we want to cover. That's hardly better, because then you think you have test coverage but maybe you don't. Could we make this test bulletproof by using an injection point? If not, I remain of the opinion that we're better off without it. regards, tom lane
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-28, Tom Lane wrote: > Jelte Fennema-Nio writes: > > > I think we don't really want to make the timeout too short. Otherwise > > the query might get cancelled before we push any query down to the > > FDW. I guess that means that for some slow machines even 10ms is not > > enough to make the test do the intended purpose. I'd keep it at 10ms, > > which seems long enough for normal systems, while still being pretty > > short. > > If the test fails both when the machine is too slow and when it's > too fast, then there's zero hope of making it stable and we should > just remove it. It doesn't fail when it's too fast -- it's just that it doesn't cover the case we want to cover. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema-Nio writes: > On Thu, 28 Mar 2024 at 17:43, Alvaro Herrera wrote: >> Hah, you're right, I can reproduce with a smaller timeout, and using SET >> LOCAL works as a fix. If we're doing that, why not reduce the timeout >> to 1ms? We don't need to wait extra 9ms ... > I think we don't really want to make the timeout too short. Otherwise > the query might get cancelled before we push any query down to the > FDW. I guess that means that for some slow machines even 10ms is not > enough to make the test do the intended purpose. I'd keep it at 10ms, > which seems long enough for normal systems, while still being pretty > short. If the test fails both when the machine is too slow and when it's too fast, then there's zero hope of making it stable and we should just remove it. regards, tom lane
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 28 Mar 2024 at 17:43, Alvaro Herrera wrote: > Hah, you're right, I can reproduce with a smaller timeout, and using SET > LOCAL works as a fix. If we're doing that, why not reduce the timeout > to 1ms? We don't need to wait extra 9ms ... I think we don't really want to make the timeout too short. Otherwise the query might get cancelled before we push any query down to the FDW. I guess that means that for some slow machines even 10ms is not enough to make the test do the intended purpose. I'd keep it at 10ms, which seems long enough for normal systems, while still being pretty short.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-28, Jelte Fennema-Nio wrote: > Ugh that's annoying, the RESET is timing out too I guess. Hah, you're right, I can reproduce with a smaller timeout, and using SET LOCAL works as a fix. If we're doing that, why not reduce the timeout to 1ms? We don't need to wait extra 9ms ... -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ “Cuando no hay humildad las personas se degradan” (A. Christie)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 28 Mar 2024 at 17:34, Alvaro Herrera wrote: > > Eh, kestrel has also failed[1], apparently every query after the large > JOIN that this commit added as test fails with a statement timeout error. > > [1] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel=2024-03-28%2016%3A01%3A14 Ugh that's annoying, the RESET is timing out too I guess. That can hopefully be easily fixed by changing the new test to: BEGIN; SET LOCAL statement_timeout = '10ms'; select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long ROLLBACK;
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Eh, kestrel has also failed[1], apparently every query after the large JOIN that this commit added as test fails with a statement timeout error. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel=2024-03-28%2016%3A01%3A14 -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "No deja de ser humillante para una persona de ingenio saber que no hay tonto que no le pueda enseñar algo." (Jean B. Say)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-28, Alvaro Herrera wrote: > Undefined symbols for architecture arm64: > "_libintl_gettext", referenced from: > _libpqsrv_cancel in dblink.o > _libpqsrv_cancel in dblink.o > ld: symbol(s) not found for architecture arm64 > clang: error: linker command failed with exit code 1 (use -v to see > invocation) > make[1]: *** [dblink.dylib] Error 1 > make: *** [all-dblink-recurse] Error 2 I just removed the _() from the new function. There's not much point in wasting more time on this, given that contrib doesn't have translation support anyway, and we're not using this in libpqwalreceiver. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Crear es tan difícil como ser libre" (Elsa Triolet)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hm, indri failed: ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Werror=unguarded-availability-new -Wendif-labels -Wmissing-format-attribute -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -g -O2 -fno-common -Werror -fvisibility=hidden -bundle -o dblink.dylib dblink.o -L../../src/port -L../../src/common -L../../src/interfaces/libpq -lpq -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.4.sdk -L/opt/local/libexec/llvm-15/lib -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib -Wl,-dead_strip_dylibs -Werror -fvisibility=hidden -bundle_loader ../../src/backend/postgres Undefined symbols for architecture arm64: "_libintl_gettext", referenced from: _libpqsrv_cancel in dblink.o _libpqsrv_cancel in dblink.o ld: symbol(s) not found for architecture arm64 clang: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [dblink.dylib] Error 1 make: *** [all-dblink-recurse] Error 2 -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-28, Jelte Fennema-Nio wrote: > Your changes look good, apart from the prverror stuff indeed. If you > remove the prverror stuff again I think this is ready to commit. Great, thanks for looking. Pushed now, I'll be closing the commitfest entry shortly. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Always assume the user will do much worse than the stupidest thing you can imagine."(Julien PUYDT)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 27 Mar 2024 at 19:27, Alvaro Herrera wrote: > I ended up reducing the two PG_TRY blocks to a single one. I see no > reason to split them up, and this way it looks more legible. I definitely agree this looks better. Not sure why I hadn't done that, maybe it wasn't possible in one of the earlier iterations of the API.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 27 Mar 2024 at 19:46, Alvaro Herrera wrote: > > On 2024-Mar-27, Alvaro Herrera wrote: > > > I changed it so that the error messages are returned as translated > > phrases, and was bothered by the fact that if errors happen repeatedly, > > the memory for them might be leaked. Maybe this is fine depending on > > the caller's memory context, but since it's only at most one string each > > time, it's quite easy to just keep track of it so that we can release it > > on the next. > > (Actually this sounds clever but fails pretty obviously if the caller > does free the string, such as in a memory context reset. So I guess we > have to just accept the potential leakage.) Your changes look good, apart from the prverror stuff indeed. If you remove the prverror stuff again I think this is ready to commit.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-27, Alvaro Herrera wrote: > I changed it so that the error messages are returned as translated > phrases, and was bothered by the fact that if errors happen repeatedly, > the memory for them might be leaked. Maybe this is fine depending on > the caller's memory context, but since it's only at most one string each > time, it's quite easy to just keep track of it so that we can release it > on the next. (Actually this sounds clever but fails pretty obviously if the caller does free the string, such as in a memory context reset. So I guess we have to just accept the potential leakage.) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La conclusión que podemos sacar de esos estudios es que no podemos sacar ninguna conclusión de ellos" (Tanenbaum)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-22, Jelte Fennema-Nio wrote: > On Thu, 21 Mar 2024 at 03:54, Noah Misch wrote: > > > > On Mon, Mar 18, 2024 at 07:40:10PM +0100, Alvaro Herrera wrote: > > > I enabled the test again and also pushed the changes to dblink, > > > isolationtester and fe_utils (which AFAICS is used by pg_dump, > > > > I recommend adding a libpqsrv_cancel() function to libpq-be-fe-helpers.h, to > > use from dblink and postgres_fdw. pgxn modules calling PQcancel() from the > > backend (citus pg_bulkload plproxy pmpp) then have a better chance to adopt > > the new way. > > Done Nice, thanks. I played with it a bit, mostly trying to figure out if the chosen API is usable. I toyed with making it return boolean success and the error message as an output argument, because I was nervous about what'd happen in OOM. But since this is backend environment, what actually happens is that we elog(ERROR) anyway, so we never return a NULL error message. So after the detour I think Jelte's API is okay. I changed it so that the error messages are returned as translated phrases, and was bothered by the fact that if errors happen repeatedly, the memory for them might be leaked. Maybe this is fine depending on the caller's memory context, but since it's only at most one string each time, it's quite easy to just keep track of it so that we can release it on the next. I ended up reducing the two PG_TRY blocks to a single one. I see no reason to split them up, and this way it looks more legible. What do you think? -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Tiene valor aquel que admite que es un cobarde" (Fernandel) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index edbc9ab02a..de858e165a 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1347,25 +1347,16 @@ Datum dblink_cancel_query(PG_FUNCTION_ARGS) { PGconn *conn; - PGcancelConn *cancelConn; char *msg; + TimestampTz endtime; dblink_init(); conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0))); - cancelConn = PQcancelCreate(conn); - - PG_TRY(); - { - if (!PQcancelBlocking(cancelConn)) - msg = pchomp(PQcancelErrorMessage(cancelConn)); - else - msg = "OK"; - } - PG_FINALLY(); - { - PQcancelFinish(cancelConn); - } - PG_END_TRY(); + endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), + 3); + msg = libpqsrv_cancel(conn, endtime); + if (msg == NULL) + msg = "OK"; PG_RETURN_TEXT_P(cstring_to_text(msg)); } diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 4931ebf591..2532e453c4 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -133,7 +133,7 @@ static void pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue); static void pgfdw_reject_incomplete_xact_state_change(ConnCacheEntry *entry); static void pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel); static bool pgfdw_cancel_query(PGconn *conn); -static bool pgfdw_cancel_query_begin(PGconn *conn); +static bool pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime); static bool pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime, bool consume_input); static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query, @@ -1315,36 +1315,31 @@ pgfdw_cancel_query(PGconn *conn) endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), CONNECTION_CLEANUP_TIMEOUT); - if (!pgfdw_cancel_query_begin(conn)) + if (!pgfdw_cancel_query_begin(conn, endtime)) return false; return pgfdw_cancel_query_end(conn, endtime, false); } +/* + * Submit a cancel request to the given connection, waiting only until + * the given time. + * + * We sleep interruptibly until we receive confirmation that the cancel + * request has been accepted, and if it is, return true; if the timeout + * lapses without that, or the request fails for whatever reason, return + * false. + */ static bool -pgfdw_cancel_query_begin(PGconn *conn) +pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime) { - PGcancel *cancel; - char errbuf[256]; + char *errormsg = libpqsrv_cancel(conn, endtime); - /* - * Issue cancel request. Unfortunately, there's no good way to limit the - * amount of time that we might block inside PQgetCancel(). - */ - if ((cancel = PQgetCancel(conn))) - { - if (!PQcancel(cancel, errbuf, sizeof(errbuf))) - { - ereport(WARNING, - (errcode(ERRCODE_CONNECTION_FAILURE), - errmsg("could not send cancel request: %s", - errbuf))); - PQfreeCancel(cancel); - return false; - } - PQfreeCancel(cancel); - } + if (errormsg != NULL) + ereport(WARNING, +errcode(ERRCODE_CONNECTION_FAILURE), +errmsg("could not send cancel request: %s", errormsg)); - return true; + return errormsg == NULL; } static bool @@ -1685,7 +1680,11 @@ pgfdw_abort_cleanup_begin(ConnCacheEntry *entry, bool toplevel, */ if (PQtransactionStatus(entry->conn) ==
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 21 Mar 2024 at 03:54, Noah Misch wrote: > > On Mon, Mar 18, 2024 at 07:40:10PM +0100, Alvaro Herrera wrote: > > I enabled the test again and also pushed the changes to dblink, > > isolationtester and fe_utils (which AFAICS is used by pg_dump, > > I recommend adding a libpqsrv_cancel() function to libpq-be-fe-helpers.h, to > use from dblink and postgres_fdw. pgxn modules calling PQcancel() from the > backend (citus pg_bulkload plproxy pmpp) then have a better chance to adopt > the new way. Done v37-0001-postgres_fdw-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Mon, Mar 18, 2024 at 07:40:10PM +0100, Alvaro Herrera wrote: > I enabled the test again and also pushed the changes to dblink, > isolationtester and fe_utils (which AFAICS is used by pg_dump, I recommend adding a libpqsrv_cancel() function to libpq-be-fe-helpers.h, to use from dblink and postgres_fdw. pgxn modules calling PQcancel() from the backend (citus pg_bulkload plproxy pmpp) then have a better chance to adopt the new way.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
I enabled the test again and also pushed the changes to dblink, isolationtester and fe_utils (which AFAICS is used by pg_dump, pg_amcheck, reindexdb and vacuumdb). I chickened out of committing the postgres_fdw changes though, so here they are again. Not sure I'll find courage to get these done by tomorrow, or whether I should just let them for Fujita-san or Noah, who have been the last committers to touch this. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "No renuncies a nada. No te aferres a nada." >From 737578fcdc6ed0de64838d2ad905054b18eb9ec1 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Mon, 18 Mar 2024 19:37:40 +0100 Subject: [PATCH v39] postgres_fdw: Start using new libpq cancel APIs Commit 61461a300c1c introduced new functions to libpq for cancelling queries. This replaces the usage of the old ones in postgres_fdw. Author: Jelte Fennema-Nio Discussion: https://postgr.es/m/cageczqt_vgowwenuqvuv9xqmbacyxjtrrayo8w07oqashk_...@mail.gmail.com --- contrib/postgres_fdw/connection.c | 108 +++--- .../postgres_fdw/expected/postgres_fdw.out| 15 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 7 ++ 3 files changed, 113 insertions(+), 17 deletions(-) diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 4931ebf591..0c66eaa001 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -133,7 +133,7 @@ static void pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue); static void pgfdw_reject_incomplete_xact_state_change(ConnCacheEntry *entry); static void pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel); static bool pgfdw_cancel_query(PGconn *conn); -static bool pgfdw_cancel_query_begin(PGconn *conn); +static bool pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime); static bool pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime, bool consume_input); static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query, @@ -1315,36 +1315,106 @@ pgfdw_cancel_query(PGconn *conn) endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), CONNECTION_CLEANUP_TIMEOUT); - if (!pgfdw_cancel_query_begin(conn)) + if (!pgfdw_cancel_query_begin(conn, endtime)) return false; return pgfdw_cancel_query_end(conn, endtime, false); } +/* + * Submit a cancel request to the given connection, waiting only until + * the given time. + * + * We sleep interruptibly until we receive confirmation that the cancel + * request has been accepted, and if it is, return true; if the timeout + * lapses without that, or the request fails for whatever reason, return + * false. + */ static bool -pgfdw_cancel_query_begin(PGconn *conn) +pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime) { - PGcancel *cancel; - char errbuf[256]; + bool timed_out = false; + bool failed = false; + PGcancelConn *cancel_conn = PQcancelCreate(conn); - /* - * Issue cancel request. Unfortunately, there's no good way to limit the - * amount of time that we might block inside PQgetCancel(). - */ - if ((cancel = PQgetCancel(conn))) + if (!PQcancelStart(cancel_conn)) { - if (!PQcancel(cancel, errbuf, sizeof(errbuf))) + PG_TRY(); { ereport(WARNING, (errcode(ERRCODE_CONNECTION_FAILURE), errmsg("could not send cancel request: %s", - errbuf))); - PQfreeCancel(cancel); - return false; + pchomp(PQcancelErrorMessage(cancel_conn); } - PQfreeCancel(cancel); + PG_FINALLY(); + { + PQcancelFinish(cancel_conn); + } + PG_END_TRY(); + return false; } - return true; + /* In what follows, do not leak any PGcancelConn on an error. */ + PG_TRY(); + { + while (true) + { + PostgresPollingStatusType pollres = PQcancelPoll(cancel_conn); + TimestampTz now = GetCurrentTimestamp(); + long cur_timeout; + int waitEvents = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH; + + if (pollres == PGRES_POLLING_OK) +break; + + /* If timeout has expired, give up, else get sleep time. */ + cur_timeout = TimestampDifferenceMilliseconds(now, endtime); + if (cur_timeout <= 0) + { +timed_out = true; +failed = true; +goto exit; + } + + switch (pollres) + { +case PGRES_POLLING_READING: + waitEvents |= WL_SOCKET_READABLE; + break; +case PGRES_POLLING_WRITING: + waitEvents |= WL_SOCKET_WRITEABLE; + break; +default: + failed = true; + goto exit; + } + + /* Sleep until there's something to do */ + WaitLatchOrSocket(MyLatch, waitEvents, PQcancelSocket(cancel_conn), + cur_timeout, PG_WAIT_EXTENSION); + ResetLatch(MyLatch); + + CHECK_FOR_INTERRUPTS(); + } +exit: + if (failed) + { + if (timed_out) +ereport(WARNING, + (errmsg("could not cancel request due to timeout"))); + else +ereport(WARNING, + (errcode(ERRCODE_CONNECTION_FAILURE), + errmsg("could not send cancel request: %s", +
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 14 Mar 2024 at 11:33, Alvaro Herrera wrote: > Hmm, isn't this basically saying that we're giving up on reliably > canceling queries altogether? I mean, maybe we'd like to instead fix > the bug about canceling queries in extended query protocol ... > Isn't that something you're worried about? In any case I think it's worth having (non-flaky) test coverage of our libpq cancellation sending code. So I think it makes sense to commit the patch I proposed, even if the backend code to handle that code is arguably buggy. Regarding the question if the backend code is actually buggy or not: the way cancel requests are defined to work is a bit awkward. They cancel whatever operation is running on the session when they arrive. So if the session is just in the middle of a Bind and Execute message there is nothing to cancel. While surprising and probably not what someone would want, I don't think this behaviour is too horrible in practice in this case. Most of the time people cancel queries while the Execute message is being processed. The new test really only runs into this problem because it sends a cancel request, immediately after sending the query. I definitely think it's worth rethinking the way we do query cancellations though. I think what we would probably want is a way to cancel a specific query/message on a session. Instead of cancelling whatever is running at the moment when the cancel request is processed by Postgres. Because this "cancel whatever is running" behaviour is fraught with issues, this Bind/Execute issue being only one of them. One really annoying race condition of a cancel request cancelling another query than intended can happen with this flow (that I spend lots of time on addressing in PgBouncer): 1. You send query A on session 1 2. You send a cancel request for session 1 (intending to cancel query A) 3. Query A completes by itself 4. You now send query B 5. The cancel request is now processed 6. Query B is now cancelled But solving that race condition would involve changing the postgres protocol. Which I'm trying to make possible with the first few commits in[1]. And while those first few commits might still land in PG17, I don't think a large protocol change like adding query identifiers to cancel requests is feasible for PG17 anymore.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-14, Jelte Fennema-Nio wrote: > On Wed, 13 Mar 2024 at 20:08, Jacob Champion > wrote: > > I hit this on my machine. With the attached diff I can reproduce > > constantly (including with the most recent test patch); I think the > > cancel must be arriving between the bind/execute steps? > > Nice find! Your explanation makes total sense. Attached a patchset > that fixes/works around this issue by using the simple query protocol > in the cancel test. Hmm, isn't this basically saying that we're giving up on reliably canceling queries altogether? I mean, maybe we'd like to instead fix the bug about canceling queries in extended query protocol ... Isn't that something you're worried about? -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "World domination is proceeding according to plan"(Andrew Morton)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 13 Mar 2024 at 20:08, Jacob Champion wrote: > I hit this on my machine. With the attached diff I can reproduce > constantly (including with the most recent test patch); I think the > cancel must be arriving between the bind/execute steps? Nice find! Your explanation makes total sense. Attached a patchset that fixes/works around this issue by using the simple query protocol in the cancel test. v38-0001-Use-simple-query-protocol-for-cancel-test.patch Description: Binary data v38-0003-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v38-0002-Revert-Comment-out-noisy-libpq_pipeline-test.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, Mar 13, 2024 at 12:01 PM Alvaro Herrera wrote: > On 2024-Mar-13, Jelte Fennema-Nio wrote: > > Sadly I'm having a hard time reliably reproducing this race condition > > locally. So it's hard to be sure what is happening here. Attached is a > > patch with a wild guess as to what the issue might be (i.e. seeing an > > outdated "active" state and thus passing the check even though the > > query is not running yet) > > I tried leaving the original running in my laptop to see if I could > reproduce it, but got no hits ... and we didn't get any other failures > apart from the three ones already reported ... so it's not terribly high > probability. Anyway I pushed your patch now since the theory seems > plausible; let's see if we still get the issue to reproduce. If it > does, we could make the script more verbose to hunt for further clues. I hit this on my machine. With the attached diff I can reproduce constantly (including with the most recent test patch); I think the cancel must be arriving between the bind/execute steps? Thanks, --Jacob diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 6b7903314a..22ce7c07d9 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2073,6 +2073,9 @@ exec_bind_message(StringInfo input_message) valgrind_report_error_query(debug_query_string); debug_query_string = NULL; + + if (strstr(psrc->query_string, "pg_sleep")) + sleep(1); } /*
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-13, Jelte Fennema-Nio wrote: > I agree it's probably a timing issue. The cancel being received after > the query is done seems very unlikely, since the query takes 180 > seconds (assuming PG_TEST_TIMEOUT_DEFAULT is not lowered for these > animals). I think it's more likely that the cancel request arrives too > early, and thus being ignored because no query is running yet. The > test already had logic to wait until the query backend was in the > "active" state, before sending a cancel to solve that issue. But my > guess is that that somehow isn't enough. > > Sadly I'm having a hard time reliably reproducing this race condition > locally. So it's hard to be sure what is happening here. Attached is a > patch with a wild guess as to what the issue might be (i.e. seeing an > outdated "active" state and thus passing the check even though the > query is not running yet) I tried leaving the original running in my laptop to see if I could reproduce it, but got no hits ... and we didn't get any other failures apart from the three ones already reported ... so it's not terribly high probability. Anyway I pushed your patch now since the theory seems plausible; let's see if we still get the issue to reproduce. If it does, we could make the script more verbose to hunt for further clues. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Here's a general engineering tip: if the non-fun part is too complex for you to figure out, that might indicate the fun part is too ambitious." (John Naylor) https://postgr.es/m/CAFBsxsG4OWHBbSDM%3DsSeXrQGOtkPiOEOuME4yD7Ce41NtaAD9g%40mail.gmail.com
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 13 Mar 2024 at 04:53, Tom Lane wrote: > I suspect it's basically just a > timing dependency. Have you thought about the fact that a cancel > request is a no-op if it arrives after the query's done? I agree it's probably a timing issue. The cancel being received after the query is done seems very unlikely, since the query takes 180 seconds (assuming PG_TEST_TIMEOUT_DEFAULT is not lowered for these animals). I think it's more likely that the cancel request arrives too early, and thus being ignored because no query is running yet. The test already had logic to wait until the query backend was in the "active" state, before sending a cancel to solve that issue. But my guess is that that somehow isn't enough. Sadly I'm having a hard time reliably reproducing this race condition locally. So it's hard to be sure what is happening here. Attached is a patch with a wild guess as to what the issue might be (i.e. seeing an outdated "active" state and thus passing the check even though the query is not running yet) v37-0001-Hopefully-make-cancel-test-more-reliable.patch Description: Binary data v37-0002-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema-Nio writes: > On Tue, 12 Mar 2024 at 19:28, Alvaro Herrera wrote: >>> Hmm, buildfarm member kestrel (which uses >>> -fsanitize=undefined,alignment) failed: >> Hm, I tried using the same compile flags, couldn't reproduce. > Okay, it passed now it seems so I guess this test is flaky somehow. Two more intermittent failures: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bushmaster=2024-03-13%2003%3A15%3A09 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=taipan=2024-03-13%2003%3A15%3A31 These animals all belong to Andres' flotilla, but other than that I'm not seeing a connection. I suspect it's basically just a timing dependency. Have you thought about the fact that a cancel request is a no-op if it arrives after the query's done? regards, tom lane
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 19:28, Alvaro Herrera wrote: > > On 2024-Mar-12, Alvaro Herrera wrote: > > > Hmm, buildfarm member kestrel (which uses > > -fsanitize=undefined,alignment) failed: > > > > # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc > > dbname='postgres' > > test cancellations... > > libpq_pipeline:260: query did not fail when it was expected > > Hm, I tried using the same compile flags, couldn't reproduce. Okay, it passed now it seems so I guess this test is flaky somehow. The error message and the timing difference between failed and succeeded buildfarm run clearly indicates that the pg_sleep ran its 180 seconds to completion (so cancel was never processed for some reason). **failed case** 282/285 postgresql:libpq_pipeline / libpq_pipeline/001_libpq_pipeline ERROR 191.56s exit status 1 **succeeded case** 252/285 postgresql:libpq_pipeline / libpq_pipeline/001_libpq_pipeline OK 10.01s 21 subtests passed I don't see any obvious reason for how this test can be flaky, but I'll think a bit more about it tomorrow.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-12, Alvaro Herrera wrote: > Hmm, buildfarm member kestrel (which uses > -fsanitize=undefined,alignment) failed: > > # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc > dbname='postgres' > test cancellations... > libpq_pipeline:260: query did not fail when it was expected Hm, I tried using the same compile flags, couldn't reproduce. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hmm, buildfarm member kestrel (which uses -fsanitize=undefined,alignment) failed: # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc dbname='postgres' test cancellations... libpq_pipeline:260: query did not fail when it was expected https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel=2024-03-12%2016%3A41%3A27 -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom." (Isaac Asimov)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-12, Jelte Fennema-Nio wrote: > On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > > Here's a last one for the cfbot. > > Thanks for committing the first 3 patches btw. Attached a tiny change > to 0001, which adds "(backing struct for PGcancelConn)" to the comment > on pg_cancel_conn. Thanks, I included it. I hope there were no other changes, because I didn't verify :-) but if there were, please let me know to incorporate them. I made a number of other small changes, mostly to the documentation, nothing fundamental. (Someday we should stop using to document the libpq functions and use refentry's instead ... it'd be useful to have manpages for these functions.) One thing I don't like very much is release_conn_addrinfo(), which is called conditionally in two places but unconditionally in other places. Maybe it'd make more sense to put this conditionality inside the function itself, possibly with a "bool force" flag to suppress that in the cases where it is not desired. In pqConnectDBComplete, we cast the PGconn * to PGcancelConn * in order to call PQcancelPoll, which is a bit abusive, but I don't know how to do better. Maybe we just accept this ... but if PQcancelStart is the only way to have pqConnectDBStart called from a cancel connection, maybe it'd be saner to duplicate pqConnectDBStart for cancel conns. Thanks! -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 15:04, Jelte Fennema-Nio wrote: > Attached a tiny change to 0001 One more tiny comment change, stating that pg_cancel is used by the deprecated PQcancel function. v36-0001-libpq-Add-encrypted-and-non-blocking-query-cance.patch Description: Binary data v36-0002-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > Here's a last one for the cfbot. Thanks for committing the first 3 patches btw. Attached a tiny change to 0001, which adds "(backing struct for PGcancelConn)" to the comment on pg_cancel_conn. From d340fde6883a249fd7c1a90033675a3b5edb603e Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Thu, 14 Dec 2023 13:39:09 +0100 Subject: [PATCH v35 2/2] Start using new libpq cancel APIs A previous commit introduced new APIs to libpq for cancelling queries. This replaces the usage of the old APIs in most of the codebase with these newer ones. This specifically leaves out changes to psql and pgbench as those would need a much larger refactor to be able to call them, due to the new functions not being signal-safe. --- contrib/dblink/dblink.c | 30 +++-- contrib/postgres_fdw/connection.c | 105 +++--- .../postgres_fdw/expected/postgres_fdw.out| 15 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 7 ++ src/fe_utils/connect_utils.c | 11 +- src/test/isolation/isolationtester.c | 29 ++--- 6 files changed, 145 insertions(+), 52 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 19a362526d2..98dcca3e6fd 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1346,22 +1346,32 @@ PG_FUNCTION_INFO_V1(dblink_cancel_query); Datum dblink_cancel_query(PG_FUNCTION_ARGS) { - int res; PGconn *conn; - PGcancel *cancel; - char errbuf[256]; + PGcancelConn *cancelConn; + char *msg; dblink_init(); conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0))); - cancel = PQgetCancel(conn); + cancelConn = PQcancelCreate(conn); - res = PQcancel(cancel, errbuf, 256); - PQfreeCancel(cancel); + PG_TRY(); + { + if (!PQcancelBlocking(cancelConn)) + { + msg = pchomp(PQcancelErrorMessage(cancelConn)); + } + else + { + msg = "OK"; + } + } + PG_FINALLY(); + { + PQcancelFinish(cancelConn); + } + PG_END_TRY(); - if (res == 1) - PG_RETURN_TEXT_P(cstring_to_text("OK")); - else - PG_RETURN_TEXT_P(cstring_to_text(errbuf)); + PG_RETURN_TEXT_P(cstring_to_text(msg)); } diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 4931ebf5915..dcc13dc3b24 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -133,7 +133,7 @@ static void pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue); static void pgfdw_reject_incomplete_xact_state_change(ConnCacheEntry *entry); static void pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel); static bool pgfdw_cancel_query(PGconn *conn); -static bool pgfdw_cancel_query_begin(PGconn *conn); +static bool pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime); static bool pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime, bool consume_input); static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query, @@ -1315,36 +1315,104 @@ pgfdw_cancel_query(PGconn *conn) endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), CONNECTION_CLEANUP_TIMEOUT); - if (!pgfdw_cancel_query_begin(conn)) + if (!pgfdw_cancel_query_begin(conn, endtime)) return false; return pgfdw_cancel_query_end(conn, endtime, false); } static bool -pgfdw_cancel_query_begin(PGconn *conn) +pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime) { - PGcancel *cancel; - char errbuf[256]; + bool timed_out = false; + bool failed = false; + PGcancelConn *cancel_conn = PQcancelCreate(conn); - /* - * Issue cancel request. Unfortunately, there's no good way to limit the - * amount of time that we might block inside PQgetCancel(). - */ - if ((cancel = PQgetCancel(conn))) + + if (!PQcancelStart(cancel_conn)) { - if (!PQcancel(cancel, errbuf, sizeof(errbuf))) + PG_TRY(); { ereport(WARNING, (errcode(ERRCODE_CONNECTION_FAILURE), errmsg("could not send cancel request: %s", - errbuf))); - PQfreeCancel(cancel); - return false; + pchomp(PQcancelErrorMessage(cancel_conn); } - PQfreeCancel(cancel); + PG_FINALLY(); + { + PQcancelFinish(cancel_conn); + } + PG_END_TRY(); + return false; } - return true; + /* In what follows, do not leak any PGcancelConn on an error. */ + PG_TRY(); + { + while (true) + { + TimestampTz now = GetCurrentTimestamp(); + long cur_timeout; + PostgresPollingStatusType pollres = PQcancelPoll(cancel_conn); + int waitEvents = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH; + + if (pollres == PGRES_POLLING_OK) + { +break; + } + + /* If timeout has expired, give up, else get sleep time. */ + cur_timeout = TimestampDifferenceMilliseconds(now, endtime); + if (cur_timeout <= 0) + { +timed_out = true; +failed = true; +goto exit; + } + + switch (pollres) + { +case PGRES_POLLING_READING: + waitEvents |= WL_SOCKET_READABLE; + break; +case
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:53, Jelte Fennema-Nio wrote: > I'd rather fail as hard as possible when someone is using the API > wrongly. To be clear, this is my way of looking at it. If you feel strongly about that we should not change conn.status, I'm fine with making that change to the patchset.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > If we do this and we see conn.status is not ALLOCATED, meaning a cancel > is already ongoing, shouldn't we leave conn.status alone instead of > changing to CONNECTION_BAD? I mean, we shouldn't be juggling the elbow > of whoever's doing that, should we? Maybe just add the error message > and return 0? I'd rather fail as hard as possible when someone is using the API wrongly. Not doing so is bound to cause confusion imho. e.g. if the state is still CONNECTION_OK because the user forgot to call PQcancelReset then keeping the connection status "as is" might seem as if the cancel request succeeded even though nothing happened. So if the user uses the API incorrectly then I'd rather use all the avenues possible to indicate that there was an error. Especially since in all other cases if PQcancelStart returns false CONNECTION_BAD is the status, and this in turn means that PQconnectPoll will return PGRES_POLLING_FAILED. So I doubt people will always check the actual return value of the function to check if an error happened. They might check PQcancelStatus or PQconnectPoll instead, because that integrates easier with the rest of their code.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Here's a last one for the cfbot. I have a question about this one int PQcancelStart(PGcancelConn *cancelConn) { [...] if (cancelConn->conn.status != CONNECTION_ALLOCATED) { libpq_append_conn_error(>conn, "cancel request is already being sent on this connection"); cancelConn->conn.status = CONNECTION_BAD; return 0; } If we do this and we see conn.status is not ALLOCATED, meaning a cancel is already ongoing, shouldn't we leave conn.status alone instead of changing to CONNECTION_BAD? I mean, we shouldn't be juggling the elbow of whoever's doing that, should we? Maybe just add the error message and return 0? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations) >From fc0cbf0a6184d374e12e051f88f9f8eef7cc30d9 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 12 Mar 2024 10:09:25 +0100 Subject: [PATCH v34 1/2] libpq: Add encrypted and non-blocking query cancellation routines The existing PQcancel API uses blocking IO, which makes PQcancel impossible to use in an event loop based codebase without blocking the event loop until the call returns. It also doesn't encrypt the connection over which the cancel request is sent, even when the original connection required encryption. This commit adds a PQcancelConn struct and assorted functions, which provide a better mechanism of sending cancel requests; in particular all the encryption used in the original connection are also used in the cancel connection. The main entry points are: - PQcancelCreate creates the PQcancelConn based on the original connection (but does not establish an actual connection). - PQcancelStart can be used to initiate non-blocking cancel requests, using encryption if the original connection did so, which must be pumped using - PQcancelPoll. - PQcancelReset puts a PQcancelConn back in state so that it can be reused to send a new cancel request to the same connection. - PQcancelBlocking is a simpler-to-use blocking API that still uses encryption. Additional functions are - PQcancelStatus, mimicks PQstatus; - PQcancelSocket, mimicks PQcancelSocket; - PQcancelErrorMessage, mimicks PQerrorMessage; - PQcancelFinish, mimicks PQfinish. Author: Jelte Fennema-Nio Reviewed-by: Denis Laxalde Discussion: https://postgr.es/m/am5pr83mb0178d3b31ca1b6ec4a8ecc42f7...@am5pr83mb0178.eurprd83.prod.outlook.com --- doc/src/sgml/libpq.sgml | 465 -- src/interfaces/libpq/exports.txt | 9 + src/interfaces/libpq/fe-cancel.c | 297 ++- src/interfaces/libpq/fe-connect.c | 129 - src/interfaces/libpq/libpq-fe.h | 31 +- src/interfaces/libpq/libpq-int.h | 5 + .../modules/libpq_pipeline/libpq_pipeline.c | 125 + src/tools/pgindent/typedefs.list | 1 + 8 files changed, 1015 insertions(+), 47 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index a2bbf33d02..373d0dc322 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -265,7 +265,7 @@ PGconn *PQsetdb(char *pghost, PQconnectStartParamsPQconnectStartParams PQconnectStartPQconnectStart - PQconnectPollPQconnectPoll + PQconnectPollPQconnectPoll nonblocking connection @@ -5287,7 +5287,7 @@ int PQisBusy(PGconn *conn); / can also attempt to cancel a command that is still being processed by the server; see . But regardless of - the return value of , the application + the return value of , the application must continue with the normal result-reading sequence using . A successful cancellation will simply cause the command to terminate sooner than it would have @@ -6034,10 +6034,402 @@ int PQsetSingleRowMode(PGconn *conn); SQL command - - A client application can request cancellation of a command that is - still being processed by the server, using the functions described in - this section. + + Functions for Sending Cancel Requests + + + PQcancelCreatePQcancelCreate + + + + Prepares a connection over which a cancel request can be sent. + +PGcancelConn *PQcancelCreate(PGconn *conn); + + + + +creates a + PGcancelConnPGcancelConn + object, but it won't instantly start sending a cancel request over this + connection. A cancel request can be sent over this connection in a + blocking manner using and in a + non-blocking manner using . + The return value can be passed to + to check if the PGcancelConn object was + created successfully. The PGcancelConn object + is an opaque structure that is not meant to be accessed directly by the + application. This PGcancelConn object can be + used to cancel the query that's
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Attached is a new patchset with various changes. I created a dedicated 0002 patch to add tests for the already existing cancellation functions, because that seemed useful for another thread where changes to the cancellation protocol are being proposed[1]. [1]: https://www.postgresql.org/message-id/flat/508d0505-8b7a-4864-a681-e7e5edfe32aa%40iki.fi On Wed, 6 Mar 2024 at 19:22, Alvaro Herrera wrote: > > Docs: one bogus "that that". This was already fixed by my previous doc changes in v32, I guess that email got crossed with this one > Did we consider having PQcancelConn() instead be called > PQcancelCreate()? Done > "Asynchronously cancel a query on the given connection. This requires > polling the returned PGcancelConn to actually complete the cancellation > of the query." but this is no longer a good description of what this > function does. Fixed > Anyway, maybe there are reasons for this; but in any case we > should set ->cancelRequest in all cases, not only after the first tests > for errors. Done > I think the extra PGconn inside pg_cancel_conn is useless; it would be > simpler to typedef PGcancelConn to PGconn in fe-cancel.c, and remove the > indirection through the extra struct. You're actually dereferencing the > object in two ways in the new code, both by casting the outer object > straight to PGconn (taking advantage that the struct member is first in > the struct), and by using PGcancelConn->conn. This seems pointless. I > mean, if we're going to cast to "PGconn *" in some places anyway, then > we may as well access all members directly. Perhaps, if you want, you > could add asserts that ->cancelRequest is set true in all the > fe-cancel.c functions. Anyway, we'd still have compiler support to tell > you that you're passing the wrong struct to the function. (I didn't > actually try to change the code this way, so I might be wrong.) Turns out you were wrong about the compiler support to tell us we're passing the wrong struct: When both the PGconn and PGcancelConn typedefs refer to the same struct, the compiler allows passing PGconn to PGcancelConn functions and vice versa without complaining. This seems enough reason for me to keep indirection through the extra struct. So instead of adding the proposed typed this typedef I chose to add a comment to pg_cancel_conn explaining its purpose, as well as not casting PGcancelConn to PGconn but always accessing the conn field for consistency. > We could move the definition of struct pg_cancel to fe-cancel.c. Nobody > outside that needs to know that definition anyway. Done in 0003 v33-0003-libpq-Move-pg_cancel-to-fe-cancel.c.patch Description: Binary data v33-0002-Add-tests-for-libpq-query-cancellation-APIs.patch Description: Binary data v33-0001-Add-missing-connection-statuses-to-docs.patch Description: Binary data v33-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v33-0004-libpq-Add-encrypted-and-non-blocking-versions-of.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 6 Mar 2024 at 19:22, Alvaro Herrera wrote: > Docs: one bogus "that that". will fix > Did we consider having PQcancelConn() instead be called > PQcancelCreate()? Fine by me > Also, the comment still says > "Asynchronously cancel a query on the given connection. This requires > polling the returned PGcancelConn to actually complete the cancellation > of the query." but this is no longer a good description of what this > function does. will fix > Why do we return a non-NULL pointer from PQcancelConn in the first three > cases where we return errors? (original conn was NULL, original conn is > PGINVALID_SOCKET, pqCopyPGconn returns failure) Wouldn't it make more > sense to free the allocated object and return NULL? Actually, I wonder > if there's any reason at all to return a valid pointer in any failure > cases; I mean, do we really expect that application authors are going to > read/report the error message from a PGcancelConn that failed to be fully > created? I think having a useful error message when possible is quite nice. And I do think people will read/report this error message. Especially since many people will simply pass it to PQcancelBlocking, whether it's NULL or not. And then check the status, and then report the error if the status was CONNECTION_BAD. > but in any case we > should set ->cancelRequest in all cases, not only after the first tests > for errors. makes sense > I think the extra PGconn inside pg_cancel_conn is useless; it would be > simpler to typedef PGcancelConn to PGconn in fe-cancel.c, and remove the > indirection through the extra struct. That sounds nice indeed. I'll try it out. > We could move the definition of struct pg_cancel to fe-cancel.c. Nobody > outside that needs to know that definition anyway. will do
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Docs: one bogus "that that". Did we consider having PQcancelConn() instead be called PQcancelCreate()? I think this better conveys that what we're doing is create an object that can be used to do something, and that nothing else is done with it by default. Also, the comment still says "Asynchronously cancel a query on the given connection. This requires polling the returned PGcancelConn to actually complete the cancellation of the query." but this is no longer a good description of what this function does. Why do we return a non-NULL pointer from PQcancelConn in the first three cases where we return errors? (original conn was NULL, original conn is PGINVALID_SOCKET, pqCopyPGconn returns failure) Wouldn't it make more sense to free the allocated object and return NULL? Actually, I wonder if there's any reason at all to return a valid pointer in any failure cases; I mean, do we really expect that application authors are going to read/report the error message from a PGcancelConn that failed to be fully created? Anyway, maybe there are reasons for this; but in any case we should set ->cancelRequest in all cases, not only after the first tests for errors. I think the extra PGconn inside pg_cancel_conn is useless; it would be simpler to typedef PGcancelConn to PGconn in fe-cancel.c, and remove the indirection through the extra struct. You're actually dereferencing the object in two ways in the new code, both by casting the outer object straight to PGconn (taking advantage that the struct member is first in the struct), and by using PGcancelConn->conn. This seems pointless. I mean, if we're going to cast to "PGconn *" in some places anyway, then we may as well access all members directly. Perhaps, if you want, you could add asserts that ->cancelRequest is set true in all the fe-cancel.c functions. Anyway, we'd still have compiler support to tell you that you're passing the wrong struct to the function. (I didn't actually try to change the code this way, so I might be wrong.) We could move the definition of struct pg_cancel to fe-cancel.c. Nobody outside that needs to know that definition anyway. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "XML!" Exclaimed C++. "What are you doing here? You're not a programming language." "Tell that to the people who use me," said XML. https://burningbird.net/the-parable-of-the-languages/
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 6 Mar 2024 at 15:03, Denis Laxalde wrote: > > In patch 0004, I noticed a couple of typos in the documentation; please > find attached a fixup patch correcting these. Thanks, applied. > while not actually listing the "statuses". Should we list them? I listed the relevant statuses over now and updated the PQcancelStatus docs to look more like the PQstatus one. I didn't list any statuses that a cancel connection could never have (but a normal connection can). While going over the list of statuses possible for a cancel connection I realized that the docs for PQconnectStart were not listing all relevant statuses, so I fixed that in patch 0001. From 40d3d9b0f4058bcf3041e63f71ce4c56e43e73f2 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Wed, 6 Mar 2024 18:33:49 +0100 Subject: [PATCH v32 1/3] Add missing connection statuses to docs The list of connection statuses that PQstatus might return during an asynchronous connection attempt was incorrect: 1. CONNECTION_SETENV is never returned anymore and is only part of the enum for backwards compatibility. So it's removed from the list. 2. CONNECTION_CHECK_STANDBY and CONNECTION_GSS_STARTUP were not listed. This addresses those problems. CONNECTION_NEEDED and CONNECTION_CHECK_TARGET are not listed in the docs on purpose, since these states are internal states that can never be observed by a caller of PQstatus. --- doc/src/sgml/libpq.sgml | 15 --- src/interfaces/libpq/libpq-fe.h | 3 ++- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 1d8998efb2a..a2bbf33d029 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -428,11 +428,11 @@ PostgresPollingStatusType PQconnectPoll(PGconn *conn); - - CONNECTION_SETENV + + CONNECTION_GSS_STARTUP - Negotiating environment-driven parameter settings. + Negotiating GSS encryption. @@ -446,6 +446,15 @@ PostgresPollingStatusType PQconnectPoll(PGconn *conn); + + CONNECTION_CHECK_STANDBY + + + Checking if connection is to a server in standby mode. + + + + CONNECTION_CONSUME diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h index defc415fa3f..1e5e7481a7c 100644 --- a/src/interfaces/libpq/libpq-fe.h +++ b/src/interfaces/libpq/libpq-fe.h @@ -77,7 +77,8 @@ typedef enum CONNECTION_CHECK_WRITABLE, /* Checking if session is read-write. */ CONNECTION_CONSUME, /* Consuming any extra messages. */ CONNECTION_GSS_STARTUP, /* Negotiating GSSAPI. */ - CONNECTION_CHECK_TARGET, /* Checking target server properties. */ + CONNECTION_CHECK_TARGET, /* Internal state: Checking target server + * properties. */ CONNECTION_CHECK_STANDBY /* Checking if server is in standby mode. */ } ConnStatusType; base-commit: de7c6fe8347ab726c80ebbfcdb57f4b714d5243d -- 2.34.1 From 15d91bb8ca87764eee02d785126495df1f6ffd5f Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Thu, 14 Dec 2023 13:39:09 +0100 Subject: [PATCH v32 3/3] Start using new libpq cancel APIs A previous commit introduced new APIs to libpq for cancelling queries. This replaces the usage of the old APIs in most of the codebase with these newer ones. This specifically leaves out changes to psql and pgbench as those would need a much larger refactor to be able to call them, due to the new functions not being signal-safe. --- contrib/dblink/dblink.c | 30 +++-- contrib/postgres_fdw/connection.c | 105 +++--- .../postgres_fdw/expected/postgres_fdw.out| 15 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 7 ++ src/fe_utils/connect_utils.c | 11 +- src/test/isolation/isolationtester.c | 29 ++--- 6 files changed, 145 insertions(+), 52 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 19a362526d2..8b4013c480a 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1346,22 +1346,32 @@ PG_FUNCTION_INFO_V1(dblink_cancel_query); Datum dblink_cancel_query(PG_FUNCTION_ARGS) { - int res; PGconn *conn; - PGcancel *cancel; - char errbuf[256]; + PGcancelConn *cancelConn; + char *msg; dblink_init(); conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0))); - cancel = PQgetCancel(conn); + cancelConn = PQcancelConn(conn); - res = PQcancel(cancel, errbuf, 256); - PQfreeCancel(cancel); + PG_TRY(); + { + if (!PQcancelBlocking(cancelConn)) + { + msg = pchomp(PQcancelErrorMessage(cancelConn)); + } + else + { + msg = "OK"; + } + } + PG_FINALLY(); + { + PQcancelFinish(cancelConn); + } + PG_END_TRY(); - if (res == 1) - PG_RETURN_TEXT_P(cstring_to_text("OK")); - else - PG_RETURN_TEXT_P(cstring_to_text(errbuf)); +
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
In patch 0004, I noticed a couple of typos in the documentation; please find attached a fixup patch correcting these. Still in the documentation, same patch, the last paragraph documenting PQcancelPoll() ends as: + indicate the current stage of the connection procedure and might be useful + to provide feedback to the user for example. These statuses are: + while not actually listing the "statuses". Should we list them? Adjust the wording? Or refer to PQconnectPoll() documentation (since the paragraph is copied from there it seems)? Otherwise, the feature still works fine as far as I can tell.From 3e04442e3f283829ed38e4a2b435fd182addf87a Mon Sep 17 00:00:00 2001 From: Denis Laxalde Date: Wed, 6 Mar 2024 14:55:40 +0100 Subject: [PATCH] fixup! libpq: Add encrypted and non-blocking versions of PQcancel --- doc/src/sgml/libpq.sgml | 2 +- src/interfaces/libpq/fe-cancel.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 1613fcc7bb..1281cac284 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -6122,7 +6122,7 @@ PostgresPollingStatusType PQcancelPoll(PGcancelConn *cancelConn); The request is made over the given PGcancelConn, which needs to be created with . - The return value of + The return value of is 1 if the cancellation request could be started and 0 if not. If it was unsuccessful, the error message can be retrieved using . diff --git a/src/interfaces/libpq/fe-cancel.c b/src/interfaces/libpq/fe-cancel.c index e66b8819ee..9c9a23bb4d 100644 --- a/src/interfaces/libpq/fe-cancel.c +++ b/src/interfaces/libpq/fe-cancel.c @@ -146,7 +146,7 @@ PQcancelBlocking(PGcancelConn *cancelConn) /* * PQcancelStart * - * Starts sending a cancellation request in a blocking fashion. Returns + * Starts sending a cancellation request in a non-blocking fashion. Returns * 1 if successful 0 if not. */ int -- 2.39.2
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 14 Feb 2024 at 18:41, Alvaro Herrera wrote: > Hmm, I think the changes to libpq_pipeline in 0005 should be in 0004. Yeah, you're correct. Fixed that now. v32-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v32-0004-libpq-Add-encrypted-and-non-blocking-versions-of.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Feb-14, Jelte Fennema-Nio wrote: > Attached is a new version of the final patches, with much improved > docs (imho) and the new function names: PQcancelStart and > PQcancelBlocking. Hmm, I think the changes to libpq_pipeline in 0005 should be in 0004. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Sun, 4 Feb 2024 at 16:39, Alvaro Herrera wrote: > Maybe this is okay? I'll have a look at the whole final situation more > carefully later; or if somebody else wants to share an opinion, please > do so. Attached is a new version of the final patches, with much improved docs (imho) and the new function names: PQcancelStart and PQcancelBlocking. v31-0004-libpq-Add-encrypted-and-non-blocking-versions-of.patch Description: Binary data v31-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Feb-02, Jelte Fennema-Nio wrote: > The only reasonable thing I can think of to make that situation better > is to move that part of the function outside of PQcancelPoll and > create a dedicated PQcancelStart function for it. It introduces an > extra function, but it does seem more in line with how we do the > regular connection establishment. Basically you would have code like > this then, which looks quite nice honestly: > > cancelConn = PQcancelConn(conn); > if (!PQcancelStart(cancelConn)) > pg_fatal("bad cancel connection: %s", > PQcancelErrorMessage(cancelConn)); > while (true) > { > // polling using PQcancelPoll here > } Maybe this is okay? I'll have a look at the whole final situation more carefully later; or if somebody else wants to share an opinion, please do so. In the meantime I pushed your 0002 and 0003 patches, so you can take this as an opportunity to rebase the remaining ones. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom." (Isaac Asimov)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, 2 Feb 2024 at 16:06, Alvaro Herrera wrote: > Now, looking at this list, I think it's surprising that the nonblocking > request for a cancellation is called PQcancelPoll. PQcancelSend() is at > odds with the asynchronous query API, which uses the verb "send" for the > asynchronous variants. This would suggest that PQcancelPoll should > actually be called PQcancelSend or maybe PQcancelStart (mimicking > PQconnectStart). I'm not sure what's a good alternative name for the > blocking one, which you have called PQcancelSend. I agree that Send is an unfortunate suffix. I'd love to use PQcancel for this, but obviously that one is already taken. Some other options that I can think of are (from favorite to less favorite): - PQcancelBlocking - PQcancelAndWait - PQcancelGo - PQcancelNow Finally, another option would be to renome PQcancelConn to PQgetCancelConn and then rename PQcancelSend to PQcancelConn. Regarding PQcancelPoll, I think it's a good name for the polling function, but I agree it's a bit confusing to use it to also start sending the connection. Even the code of PQcancelPoll basically admits that this is confusing behaviour: /* * Before we can call PQconnectPoll we first need to start the connection * using pqConnectDBStart. Non-cancel connections already do this whenever * the connection is initialized. But cancel connections wait until the * caller starts polling, because there might be a large delay between * creating a cancel connection and actually wanting to use it. */ if (conn->status == CONNECTION_STARTING) { if (!pqConnectDBStart(>conn)) { cancelConn->conn.status = CONNECTION_STARTED; return PGRES_POLLING_WRITING; } } The only reasonable thing I can think of to make that situation better is to move that part of the function outside of PQcancelPoll and create a dedicated PQcancelStart function for it. It introduces an extra function, but it does seem more in line with how we do the regular connection establishment. Basically you would have code like this then, which looks quite nice honestly: cancelConn = PQcancelConn(conn); if (!PQcancelStart(cancelConn)) pg_fatal("bad cancel connection: %s", PQcancelErrorMessage(cancelConn)); while (true) { // polling using PQcancelPoll here }
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hello, The patched docs claim that PQrequestCancel is insecure, but neither the code nor docs explain why. The docs for PQcancel on the other hand do mention that encryption is not used; does that apply to PQrequestCancel as well and is that the reason? If so, I think we should copy the warning and perhaps include a code comment about that. Also, maybe that final phrase in PQcancel should be a box: remove from "So, for example" and add Because gssencmode and sslencmode are not preserved from the original connection, the cancel request is not encrypted. or something like that. I wonder if Section 33.7 Canceling Queries in Progress should be split in three subsections, and I propose the following order: 33.7.1 PGcancelConn-based Cancellation API PQcancelConn -- we first document the basics PQcancelSend PQcancelFinish PQcancelPoll -- the nonblocking interface is documented next PQcancelReset -- reuse a cancelconn, later in docs because it's more advanced PQcancelStatus-- accessors go last PQcancelSocket PQcancelErrorMessage 33.7.2 Obsolete interface PQgetCancel PQfreeCancel PQcancel 33.7.3 Deprecated and Insecure Methods PQrequestCancel I have a hard time coming up with good subsection titles though. Now, looking at this list, I think it's surprising that the nonblocking request for a cancellation is called PQcancelPoll. PQcancelSend() is at odds with the asynchronous query API, which uses the verb "send" for the asynchronous variants. This would suggest that PQcancelPoll should actually be called PQcancelSend or maybe PQcancelStart (mimicking PQconnectStart). I'm not sure what's a good alternative name for the blocking one, which you have called PQcancelSend. I see upthread that the names of these functions were already quite heavily debated. Sorry to beat that dead horse some more ... I'm just not sure it's decided matter. Lastly -- the doc blurbs that say simply "a version of XYZ that can be used for cancellation connections" are a bit underwhelming. Shouldn't we document these more fully instead of making users go read the docs for the other functions and wonder what the differences might be, if any? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Before you were born your parents weren't as boring as they are now. They got that way paying your bills, cleaning up your room and listening to you tell them how idealistic you are." -- Charles J. Sykes' advice to teenagers
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, 2 Feb 2024 at 13:19, Alvaro Herrera wrote: > Thank you, looks good. > > I propose the following minor/trivial fixes over your initial 3 patches. All of those seem good like fixes. Attached is an updated patchset where they are all applied. As well as adding a missing word ("been") in a comment that I noticed while reading your fixes. From 7736e940567878c32355c2143cddba3b13bfa71e Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Fri, 26 Jan 2024 16:47:51 +0100 Subject: [PATCH v30 3/5] libpq: Change some static functions to extern This is in preparation of a follow up commit that starts using these functions from fe-cancel.c. --- src/interfaces/libpq/fe-connect.c | 87 +++ src/interfaces/libpq/libpq-int.h | 6 +++ 2 files changed, 47 insertions(+), 46 deletions(-) diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 079abfca9e..7d8616eb6d 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -387,15 +387,10 @@ static const char uri_designator[] = "postgresql://"; static const char short_uri_designator[] = "postgres://"; static bool connectOptions1(PGconn *conn, const char *conninfo); -static bool connectOptions2(PGconn *conn); -static int connectDBStart(PGconn *conn); -static int connectDBComplete(PGconn *conn); static PGPing internal_ping(PGconn *conn); -static PGconn *makeEmptyPGconn(void); static void pqFreeCommandQueue(PGcmdQueueEntry *queue); static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions); static void freePGconn(PGconn *conn); -static void closePGconn(PGconn *conn); static void release_conn_addrinfo(PGconn *conn); static int store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist); static void sendTerminateConn(PGconn *conn); @@ -644,8 +639,8 @@ pqDropServerData(PGconn *conn) * PQconnectStart or PQconnectStartParams (which differ in the same way as * PQconnectdb and PQconnectdbParams) and PQconnectPoll. * - * Internally, the static functions connectDBStart, connectDBComplete - * are part of the connection procedure. + * The non-exported functions pqConnectDBStart, pqConnectDBComplete are + * part of the connection procedure implementation. */ /* @@ -678,7 +673,7 @@ PQconnectdbParams(const char *const *keywords, PGconn *conn = PQconnectStartParams(keywords, values, expand_dbname); if (conn && conn->status != CONNECTION_BAD) - (void) connectDBComplete(conn); + (void) pqConnectDBComplete(conn); return conn; } @@ -731,7 +726,7 @@ PQconnectdb(const char *conninfo) PGconn *conn = PQconnectStart(conninfo); if (conn && conn->status != CONNECTION_BAD) - (void) connectDBComplete(conn); + (void) pqConnectDBComplete(conn); return conn; } @@ -785,7 +780,7 @@ PQconnectStartParams(const char *const *keywords, * to initialize conn->errorMessage to empty. All subsequent steps during * connection initialization will only append to that buffer. */ - conn = makeEmptyPGconn(); + conn = pqMakeEmptyPGconn(); if (conn == NULL) return NULL; @@ -819,15 +814,15 @@ PQconnectStartParams(const char *const *keywords, /* * Compute derived options */ - if (!connectOptions2(conn)) + if (!pqConnectOptions2(conn)) return conn; /* * Connect to the database */ - if (!connectDBStart(conn)) + if (!pqConnectDBStart(conn)) { - /* Just in case we failed to set it in connectDBStart */ + /* Just in case we failed to set it in pqConnectDBStart */ conn->status = CONNECTION_BAD; } @@ -863,7 +858,7 @@ PQconnectStart(const char *conninfo) * to initialize conn->errorMessage to empty. All subsequent steps during * connection initialization will only append to that buffer. */ - conn = makeEmptyPGconn(); + conn = pqMakeEmptyPGconn(); if (conn == NULL) return NULL; @@ -876,15 +871,15 @@ PQconnectStart(const char *conninfo) /* * Compute derived options */ - if (!connectOptions2(conn)) + if (!pqConnectOptions2(conn)) return conn; /* * Connect to the database */ - if (!connectDBStart(conn)) + if (!pqConnectDBStart(conn)) { - /* Just in case we failed to set it in connectDBStart */ + /* Just in case we failed to set it in pqConnectDBStart */ conn->status = CONNECTION_BAD; } @@ -895,7 +890,7 @@ PQconnectStart(const char *conninfo) * Move option values into conn structure * * Don't put anything cute here --- intelligence should be in - * connectOptions2 ... + * pqConnectOptions2 ... * * Returns true on success. On failure, returns false and sets error message. */ @@ -933,7 +928,7 @@ fillPGconn(PGconn *conn, PQconninfoOption *connOptions) * * Internal subroutine to set up connection parameters given an already- * created PGconn and a conninfo string. Derived settings should be - * processed by calling connectOptions2 next. (We split them because + * processed by calling pqConnectOptions2 next. (We split them because * PQsetdbLogin overrides
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Jan-29, Jelte Fennema-Nio wrote: > On Mon, 29 Jan 2024 at 12:44, Alvaro Herrera wrote: > > Thanks! I committed 0001 now. I also renamed the new > > pq_parse_int_param to pqParseIntParam, for consistency with other > > routines there. Please rebase the other patches. > > Awesome! Rebased, and renamed pq_release_conn_hosts to > pqReleaseConnHosts for the same consistency reasons. Thank you, looks good. I propose the following minor/trivial fixes over your initial 3 patches. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "I can't go to a restaurant and order food because I keep looking at the fonts on the menu. Five minutes later I realize that it's also talking about food" (Donald Knuth) >From 92ca8dc2739a777ff5a0df990d6e9818c5729ac5 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 2 Feb 2024 10:57:02 +0100 Subject: [PATCH 1/4] pgindent --- src/interfaces/libpq/fe-cancel.c | 14 +++--- src/interfaces/libpq/libpq-fe.h | 17 + src/tools/pgindent/typedefs.list | 1 + 3 files changed, 17 insertions(+), 15 deletions(-) diff --git a/src/interfaces/libpq/fe-cancel.c b/src/interfaces/libpq/fe-cancel.c index 7416791d9f..d75c9628e7 100644 --- a/src/interfaces/libpq/fe-cancel.c +++ b/src/interfaces/libpq/fe-cancel.c @@ -137,7 +137,7 @@ oom_error: * Returns 1 if successful 0 if not. */ int -PQcancelSend(PGcancelConn * cancelConn) +PQcancelSend(PGcancelConn *cancelConn) { if (!cancelConn || cancelConn->conn.status == CONNECTION_BAD) return 1; @@ -157,7 +157,7 @@ PQcancelSend(PGcancelConn * cancelConn) * Poll a cancel connection. For usage details see PQconnectPoll. */ PostgresPollingStatusType -PQcancelPoll(PGcancelConn * cancelConn) +PQcancelPoll(PGcancelConn *cancelConn) { PGconn *conn = (PGconn *) cancelConn; int n; @@ -249,7 +249,7 @@ PQcancelPoll(PGcancelConn * cancelConn) * Get the status of a cancel connection. */ ConnStatusType -PQcancelStatus(const PGcancelConn * cancelConn) +PQcancelStatus(const PGcancelConn *cancelConn) { return PQstatus((const PGconn *) cancelConn); } @@ -260,7 +260,7 @@ PQcancelStatus(const PGcancelConn * cancelConn) * Get the socket of the cancel connection. */ int -PQcancelSocket(const PGcancelConn * cancelConn) +PQcancelSocket(const PGcancelConn *cancelConn) { return PQsocket((const PGconn *) cancelConn); } @@ -271,7 +271,7 @@ PQcancelSocket(const PGcancelConn * cancelConn) * Get the socket of the cancel connection. */ char * -PQcancelErrorMessage(const PGcancelConn * cancelConn) +PQcancelErrorMessage(const PGcancelConn *cancelConn) { return PQerrorMessage((const PGconn *) cancelConn); } @@ -283,7 +283,7 @@ PQcancelErrorMessage(const PGcancelConn * cancelConn) * request. */ void -PQcancelReset(PGcancelConn * cancelConn) +PQcancelReset(PGcancelConn *cancelConn) { pqClosePGconn((PGconn *) cancelConn); cancelConn->conn.status = CONNECTION_STARTING; @@ -299,7 +299,7 @@ PQcancelReset(PGcancelConn * cancelConn) * Closes and frees the cancel connection. */ void -PQcancelFinish(PGcancelConn * cancelConn) +PQcancelFinish(PGcancelConn *cancelConn) { PQfinish((PGconn *) cancelConn); } diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h index 857ba54d94..851e549355 100644 --- a/src/interfaces/libpq/libpq-fe.h +++ b/src/interfaces/libpq/libpq-fe.h @@ -329,17 +329,18 @@ extern PostgresPollingStatusType PQresetPoll(PGconn *conn); extern void PQreset(PGconn *conn); /* Create a PGcancelConn that's used to cancel a query on the given PGconn */ -extern PGcancelConn * PQcancelConn(PGconn *conn); +extern PGcancelConn *PQcancelConn(PGconn *conn); + /* issue a blocking cancel request */ -extern int PQcancelSend(PGcancelConn * conn); +extern int PQcancelSend(PGcancelConn *conn); /* issue or poll a non-blocking cancel request */ -extern PostgresPollingStatusType PQcancelPoll(PGcancelConn * cancelConn); -extern ConnStatusType PQcancelStatus(const PGcancelConn * cancelConn); -extern int PQcancelSocket(const PGcancelConn * cancelConn); -extern char *PQcancelErrorMessage(const PGcancelConn * cancelConn); -extern void PQcancelReset(PGcancelConn * cancelConn); -extern void PQcancelFinish(PGcancelConn * cancelConn); +extern PostgresPollingStatusType PQcancelPoll(PGcancelConn *cancelConn); +extern ConnStatusType PQcancelStatus(const PGcancelConn *cancelConn); +extern int PQcancelSocket(const PGcancelConn *cancelConn); +extern char *PQcancelErrorMessage(const PGcancelConn *cancelConn); +extern void PQcancelReset(PGcancelConn *cancelConn); +extern void PQcancelFinish(PGcancelConn *cancelConn); /* request a cancel structure */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 91433d439b..9ffb169e9d 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Mon, 29 Jan 2024 at 12:44, Alvaro Herrera wrote: > Thanks! I committed 0001 now. I also renamed the new > pq_parse_int_param to pqParseIntParam, for consistency with other > routines there. Please rebase the other patches. Awesome! Rebased, and renamed pq_release_conn_hosts to pqReleaseConnHosts for the same consistency reasons. v29-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v29-0002-libpq-Add-pqReleaseConnHosts-function.patch Description: Binary data v29-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v29-0003-libpq-Change-some-static-functions-to-extern.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Jan-28, Jelte Fennema-Nio wrote: > On Sun, 28 Jan 2024 at 10:51, Jelte Fennema-Nio wrote: > > Both of those are fixed now. > > Okay, there turned out to also be an issue on Windows with > setKeepalivesWin32 not being available in fe-cancel.c. That's fixed > now too (as well as some minor formatting issues). Thanks! I committed 0001 now. I also renamed the new pq_parse_int_param to pqParseIntParam, for consistency with other routines there. Please rebase the other patches. Thanks, -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest "foo" someone someday shall type "supercalifragilisticexpialidocious" (5th Commandment for C programmers)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Sun, 28 Jan 2024 at 10:51, Jelte Fennema-Nio wrote: > Both of those are fixed now. Okay, there turned out to also be an issue on Windows with setKeepalivesWin32 not being available in fe-cancel.c. That's fixed now too (as well as some minor formatting issues). From 4efbb0c75341f4612f0c5b8d5d3fe3f8f9c3b43c Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Fri, 26 Jan 2024 17:01:28 +0100 Subject: [PATCH v28 2/5] libpq: Add pq_release_conn_hosts function In a follow up PR we'll need to free this connhost field in a function defined in fe-cancel.c So this extracts the logic to a dedicated extern function. --- src/interfaces/libpq/fe-connect.c | 38 --- src/interfaces/libpq/libpq-int.h | 1 + 2 files changed, 26 insertions(+), 13 deletions(-) diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 5d08b4904d3..bc1f6521650 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -4395,19 +4395,7 @@ freePGconn(PGconn *conn) free(conn->events[i].name); } - /* clean up pg_conn_host structures */ - for (int i = 0; i < conn->nconnhost; ++i) - { - free(conn->connhost[i].host); - free(conn->connhost[i].hostaddr); - free(conn->connhost[i].port); - if (conn->connhost[i].password != NULL) - { - explicit_bzero(conn->connhost[i].password, strlen(conn->connhost[i].password)); - free(conn->connhost[i].password); - } - } - free(conn->connhost); + pq_release_conn_hosts(conn); free(conn->client_encoding_initial); free(conn->events); @@ -4526,6 +4514,30 @@ release_conn_addrinfo(PGconn *conn) } } +/* + * pq_release_conn_hosts + * - Free the host list in the PGconn. + */ +void +pq_release_conn_hosts(PGconn *conn) +{ + if (conn->connhost) + { + for (int i = 0; i < conn->nconnhost; ++i) + { + free(conn->connhost[i].host); + free(conn->connhost[i].hostaddr); + free(conn->connhost[i].port); + if (conn->connhost[i].password != NULL) + { +explicit_bzero(conn->connhost[i].password, strlen(conn->connhost[i].password)); +free(conn->connhost[i].password); + } + } + free(conn->connhost); + } +} + /* * sendTerminateConn * - Send a terminate message to backend. diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h index 48c10b474f5..4cbad2c2c83 100644 --- a/src/interfaces/libpq/libpq-int.h +++ b/src/interfaces/libpq/libpq-int.h @@ -680,6 +680,7 @@ extern int pqPacketSend(PGconn *conn, char pack_type, extern bool pqGetHomeDirectory(char *buf, int bufsize); extern bool pq_parse_int_param(const char *value, int *result, PGconn *conn, const char *context); +extern void pq_release_conn_hosts(PGconn *conn); extern pgthreadlock_t pg_g_threadlock; -- 2.34.1 From f1168ac4c3dd758a77be3ceb8c40bacb9aebef8c Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Fri, 26 Jan 2024 16:47:51 +0100 Subject: [PATCH v28 3/5] libpq: Change some static functions to extern This is in preparation of a follow up commit that starts using these functions from fe-cancel.c. --- src/interfaces/libpq/fe-connect.c | 85 +++ src/interfaces/libpq/libpq-int.h | 6 +++ 2 files changed, 46 insertions(+), 45 deletions(-) diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index bc1f6521650..aeb3adc0e31 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -387,15 +387,10 @@ static const char uri_designator[] = "postgresql://"; static const char short_uri_designator[] = "postgres://"; static bool connectOptions1(PGconn *conn, const char *conninfo); -static bool connectOptions2(PGconn *conn); -static int connectDBStart(PGconn *conn); -static int connectDBComplete(PGconn *conn); static PGPing internal_ping(PGconn *conn); -static PGconn *makeEmptyPGconn(void); static void pqFreeCommandQueue(PGcmdQueueEntry *queue); static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions); static void freePGconn(PGconn *conn); -static void closePGconn(PGconn *conn); static void release_conn_addrinfo(PGconn *conn); static int store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist); static void sendTerminateConn(PGconn *conn); @@ -644,7 +639,7 @@ pqDropServerData(PGconn *conn) * PQconnectStart or PQconnectStartParams (which differ in the same way as * PQconnectdb and PQconnectdbParams) and PQconnectPoll. * - * Internally, the static functions connectDBStart, connectDBComplete + * Internally, the static functions pqConnectDBStart, pqConnectDBComplete * are part of the connection procedure. */ @@ -678,7 +673,7 @@ PQconnectdbParams(const char *const *keywords, PGconn *conn = PQconnectStartParams(keywords, values, expand_dbname); if (conn && conn->status != CONNECTION_BAD) - (void) connectDBComplete(conn); + (void) pqConnectDBComplete(conn); return conn; } @@ -731,7 +726,7 @@ PQconnectdb(const char *conninfo) PGconn *conn =
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Sun, 28 Jan 2024 at 04:15, vignesh C wrote: > CFBot shows that the patch has few compilation errors as in [1]: > [17:07:07.621] /usr/bin/ld: > ../../../src/fe_utils/libpgfeutils.a(cancel.o): in function > `handle_sigint': > [17:07:07.621] cancel.c:(.text+0x50): undefined reference to `PQcancel' I forgot to update ./configure based builds with the new file, only meson was working. Also it seems I trimmed the header list fe-cancel.c a bit too much for OSX, so I added unistd.h back. Both of those are fixed now. v27-0002-libpq-Add-pq_release_conn_hosts-function.patch Description: Binary data v27-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v27-0003-libpq-Change-some-static-functions-to-extern.patch Description: Binary data v27-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v27-0001-libpq-Move-cancellation-related-functions-to-fe-.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, 26 Jan 2024 at 22:22, Jelte Fennema-Nio wrote: > > On Fri, 26 Jan 2024 at 13:11, Alvaro Herrera wrote: > > I wonder, would it make sense to put all these new functions in a > > separate file fe-cancel.c? > > > Okay I tried doing that. I think the end result is indeed quite nice, > having all the cancellation related functions together in a file. But > it did require making a bunch of static functions in fe-connect > extern, and adding them to libpq-int.h. On one hand that seems fine to > me, on the other maybe that indicates that this cancellation logic > makes sense to be in the same file as the other connection functions > (in a sense, connecting is all that a cancel request does). CFBot shows that the patch has few compilation errors as in [1]: [17:07:07.621] /usr/bin/ld: ../../../src/fe_utils/libpgfeutils.a(cancel.o): in function `handle_sigint': [17:07:07.621] cancel.c:(.text+0x50): undefined reference to `PQcancel' [17:07:07.621] /usr/bin/ld: ../../../src/fe_utils/libpgfeutils.a(cancel.o): in function `SetCancelConn': [17:07:07.621] cancel.c:(.text+0x10c): undefined reference to `PQfreeCancel' [17:07:07.621] /usr/bin/ld: cancel.c:(.text+0x114): undefined reference to `PQgetCancel' [17:07:07.621] /usr/bin/ld: ../../../src/fe_utils/libpgfeutils.a(cancel.o): in function `ResetCancelConn': [17:07:07.621] cancel.c:(.text+0x148): undefined reference to `PQfreeCancel' [17:07:07.621] /usr/bin/ld: ../../../src/fe_utils/libpgfeutils.a(connect_utils.o): in function `disconnectDatabase': [17:07:07.621] connect_utils.c:(.text+0x2fc): undefined reference to `PQcancelConn' [17:07:07.621] /usr/bin/ld: connect_utils.c:(.text+0x307): undefined reference to `PQcancelSend' [17:07:07.621] /usr/bin/ld: connect_utils.c:(.text+0x30f): undefined reference to `PQcancelFinish' [17:07:07.623] /usr/bin/ld: ../../../src/interfaces/libpq/libpq.so: undefined reference to `PQcancelPoll' [17:07:07.626] collect2: error: ld returned 1 exit status [17:07:07.626] make[3]: *** [Makefile:31: pg_amcheck] Error 1 [17:07:07.626] make[2]: *** [Makefile:45: all-pg_amcheck-recurse] Error 2 [17:07:07.626] make[2]: *** Waiting for unfinished jobs [17:07:08.126] /usr/bin/ld: ../../../src/interfaces/libpq/libpq.so: undefined reference to `PQcancelPoll' [17:07:08.130] collect2: error: ld returned 1 exit status [17:07:08.131] make[3]: *** [Makefile:42: initdb] Error 1 [17:07:08.131] make[2]: *** [Makefile:45: all-initdb-recurse] Error 2 [17:07:08.492] /usr/bin/ld: ../../../src/interfaces/libpq/libpq.so: undefined reference to `PQcancelPoll' [17:07:08.495] collect2: error: ld returned 1 exit status [17:07:08.496] make[3]: *** [Makefile:50: pg_basebackup] Error 1 [17:07:08.496] make[2]: *** [Makefile:45: all-pg_basebackup-recurse] Error 2 [17:07:09.060] /usr/bin/ld: parallel.o: in function `sigTermHandler': [17:07:09.060] parallel.c:(.text+0x1aa): undefined reference to `PQcancel' Please post an updated version for the same. [1] - https://cirrus-ci.com/task/6210637211107328 Regards, Vignesh
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, 26 Jan 2024 at 18:19, Alvaro Herrera wrote: > Yeah, I see that point of view as well. I like the end result; the > additional protos in libpq-int.h don't bother me. Does anybody else > wants to share their opinion on it? If none, then I'd consider going > ahead with this version. To be clear, I'm +1 on the new file structure (although if people feel strongly against it, I don't care enough to make a big deal out of it). @Alvaro did you have any other comments on the contents of the patch btw?
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Jan-26, Jelte Fennema-Nio wrote: > Okay I tried doing that. I think the end result is indeed quite nice, > having all the cancellation related functions together in a file. But > it did require making a bunch of static functions in fe-connect > extern, and adding them to libpq-int.h. On one hand that seems fine to > me, on the other maybe that indicates that this cancellation logic > makes sense to be in the same file as the other connection functions > (in a sense, connecting is all that a cancel request does). Yeah, I see that point of view as well. I like the end result; the additional protos in libpq-int.h don't bother me. Does anybody else wants to share their opinion on it? If none, then I'd consider going ahead with this version. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "We’ve narrowed the problem down to the customer’s pants being in a situation of vigorous combustion" (Robert Haas, Postgres expert extraordinaire)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, 26 Jan 2024 at 13:11, Alvaro Herrera wrote: > I wonder, would it make sense to put all these new functions in a > separate file fe-cancel.c? Okay I tried doing that. I think the end result is indeed quite nice, having all the cancellation related functions together in a file. But it did require making a bunch of static functions in fe-connect extern, and adding them to libpq-int.h. On one hand that seems fine to me, on the other maybe that indicates that this cancellation logic makes sense to be in the same file as the other connection functions (in a sense, connecting is all that a cancel request does). v26-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v26-0002-libpq-Add-pq_release_conn_hosts-function.patch Description: Binary data v26-0003-libpq-Change-some-static-functions-to-extern.patch Description: Binary data v26-0001-libpq-Move-cancellation-related-functions-to-fe-.patch Description: Binary data v26-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Pushed 0001. I wonder, would it make sense to put all these new functions in a separate file fe-cancel.c? -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "World domination is proceeding according to plan"(Andrew Morton)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, 26 Jan 2024 at 02:59, vignesh C wrote: > Please post an updated version for the same. Done. From 5a94d610a4fe138365e2e88c5cec72eba53ed036 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Thu, 14 Dec 2023 13:39:04 +0100 Subject: [PATCH v25 2/3] Add non-blocking version of PQcancel This patch makes the following changes in libpq: 1. Add a new PQcancelSend function, which sends cancellation requests using the regular connection establishment code. This makes sure that cancel requests support and use all connection options including encryption. 2. Add a new PQcancelConn function which allows sending cancellation in a non-blocking way by using it together with the newly added PQcancelPoll and PQcancelSocket. The existing PQcancel API is using blocking IO. This makes PQcancel impossible to use in an event loop based codebase, without blocking the event loop until the call returns. PQcancelConn can now be used instead, to have a non-blocking way of sending cancel requests. This patch also includes a test for all of libpq cancellation APIs. The test can be easily run like this: cd src/test/modules/libpq_pipeline make && ./libpq_pipeline cancel --- doc/src/sgml/libpq.sgml | 280 ++- src/interfaces/libpq/exports.txt | 8 + src/interfaces/libpq/fe-connect.c | 451 +- src/interfaces/libpq/libpq-fe.h | 27 +- src/interfaces/libpq/libpq-int.h | 9 + .../modules/libpq_pipeline/libpq_pipeline.c | 263 +- 6 files changed, 987 insertions(+), 51 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index d0d5aefadc0..9808e678650 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -265,7 +265,7 @@ PGconn *PQsetdb(char *pghost, PQconnectStartParamsPQconnectStartParams PQconnectStartPQconnectStart - PQconnectPollPQconnectPoll + PQconnectPollPQconnectPoll nonblocking connection @@ -5281,7 +5281,7 @@ int PQisBusy(PGconn *conn); / can also attempt to cancel a command that is still being processed by the server; see . But regardless of - the return value of , the application + the return value of , the application must continue with the normal result-reading sequence using . A successful cancellation will simply cause the command to terminate sooner than it would have @@ -6034,13 +6034,223 @@ int PQsetSingleRowMode(PGconn *conn); this section. + + PQcancelConnPQcancelConn + + + + Prepares a connection over which a cancel request can be sent. + +PGcancelConn *PQcancelConn(PGconn *conn); + + + + +creates a + PGcancelConnPGcancelConn + object, but it won't instantly start sending a cancel request over this + connection. A cancel request can be sent over this connection in a + blocking manner using and in a + non-blocking manner using . + The return value should can be passed to , + to check if the PGcancelConn object was + created successfully. The PGcancelConn object + is an opaque structure that is not meant to be accessed directly by the + application. This PGcancelConn object can be + used to cancel the query that's running on the original connection in a + thread-safe way. + + + + If the original connection is encrypted (using TLS or GSS), then the + connection for the cancel request is encrypted in the same way. Any + connection options that are only used during authentication or after + authentication of the client are ignored though, because cancellation + requests do not require authentication and the connection is closed right + after the cancellation request is submitted. + + + + Note that when PQcancelConn returns a non-null + pointer, you must call when you + are finished with it, in order to dispose of the structure and any + associated memory blocks. This must be done even if the cancel request + failed or was abandoned. + + + + + + PQcancelSendPQcancelSend + + + + Requests that the server abandons processing of the current command in a blocking manner. + +int PQcancelSend(PGcancelConn *conn); + + + + + The request is made over the given PGcancelConn, + which needs to be created with + The return value of + is 1 if the cancel request was successfully + dispatched and 0 if not. If it was unsuccessful, the error message can be + retrieved using . + + + + Successful dispatch of the cancellation is no guarantee that the request + will have any effect, however. If the cancellation is effective, the + command being canceled will terminate early and return an error result. + If the cancellation fails (say, because
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 20 Dec 2023 at 19:17, Jelte Fennema-Nio wrote: > > On Thu, 14 Dec 2023 at 13:57, Jelte Fennema-Nio wrote: > > I changed all the places that were not adhering to those spellings. > > It seems I forgot a /g on my sed command to do this so it turned out I > missed one that caused the test to fail to compile... Attached is a > fixed version. > > I also updated the patchset to use the EOF detection provided by > 0a5c46a7a488f2f4260a90843bb9de6c584c7f4e instead of introducing a new > way of EOF detection using a -2 return value. CFBot shows that the patch does not apply anymore as in [1]: patching file doc/src/sgml/libpq.sgml ... patching file src/interfaces/libpq/exports.txt Hunk #1 FAILED at 191. 1 out of 1 hunk FAILED -- saving rejects to file src/interfaces/libpq/exports.txt.rej patching file src/interfaces/libpq/fe-connect.c Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_3511.log Regards, Vignesh
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 14 Dec 2023 at 13:57, Jelte Fennema-Nio wrote: > I changed all the places that were not adhering to those spellings. It seems I forgot a /g on my sed command to do this so it turned out I missed one that caused the test to fail to compile... Attached is a fixed version. I also updated the patchset to use the EOF detection provided by 0a5c46a7a488f2f4260a90843bb9de6c584c7f4e instead of introducing a new way of EOF detection using a -2 return value. v24-0003-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v24-0001-Fix-spelling-of-canceled-cancellation.patch Description: Binary data v24-0002-Add-non-blocking-version-of-PQcancel.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Mon, 13 Nov 2023 at 03:39, Thomas Munro wrote: > We follow the common en-US usage: "canceled", "canceling" but > "cancellation". Blame Webstah et al. I changed all the places that were not adhering to those spellings. There were also a few of such places in parts of the codebase that these changes didn't touch. I included a new 0001 patch to fix those. I do feel like this patchset is pretty much in a committable state. So it would be very much appreciated if any comitter could help push it over the finish line. v23-0002-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v23-0001-Fix-spelling-of-canceled-cancellation.patch Description: Binary data v23-0004-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v23-0003-Add-non-blocking-version-of-PQcancel.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Trivial observation: these patches obviously introduce many instances of words derived from "cancel", but they don't all conform to established project decisions (cf 21f1e15a) about how to spell them. We follow the common en-US usage: "canceled", "canceling" but "cancellation". Blame Webstah et al. https://english.stackexchange.com/questions/176957/cancellation-canceled-canceling-us-usage
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Rebased again to resolve some conflicts v22-0003-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v22-0002-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v22-0004-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
I noticed that cfbot was unable to run tests due to some rebase conflict. It seems the pgindent changes from patch 1 have now been made. So adding the rebased patches without patch 1 now to unblock cfbot. v21-0002-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v21-0003-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v21-0004-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Okay, I rebased again. Indeed 983ec23007b gave the most problems. On Fri, 7 Apr 2023 at 10:02, Denis Laxalde wrote: > > The patch set does not apply any more. > > I tried to rebase locally; even leaving out 1 ("libpq: Run pgindent > after a9e9a9f32b3"), patch 4 ("Start using new libpq cancel APIs") is > harder to resolve following 983ec23007b (I suppose). > > Appart from that, the implementation in v19 sounds good to me, and seems > worthwhile. FWIW, as said before, I also implemented it in Psycopg in a > sort of an end-to-end validation. v20-0003-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v20-0004-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v20-0002-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v20-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
The patch set does not apply any more. I tried to rebase locally; even leaving out 1 ("libpq: Run pgindent after a9e9a9f32b3"), patch 4 ("Start using new libpq cancel APIs") is harder to resolve following 983ec23007b (I suppose). Appart from that, the implementation in v19 sounds good to me, and seems worthwhile. FWIW, as said before, I also implemented it in Psycopg in a sort of an end-to-end validation.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 30 Mar 2023 at 10:07, Denis Laxalde wrote: > Patch 5 is missing respective changes; please find attached a fixup > patch for these. Thanks, attached are newly rebased patches that include this change. I also cast the result of PQcancelSend to to void in the one case where it's ignored on purpose. Note that the patchset shrunk by one, since the original patch 0002 has been committed now. v19-0002-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v19-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data v19-0003-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v19-0004-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema a écrit : > On Wed, 29 Mar 2023 at 10:43, Denis Laxalde wrote: > > More importantly, not having PQcancelSend() creating the PGcancelConn > > makes reuse of that value, passing through PQcancelReset(), more > > intuitive. E.g., in the tests: > > You convinced me. Attached is an updated patch where PQcancelSend > takes the PGcancelConn and returns 1 or 0. Patch 5 is missing respective changes; please find attached a fixup patch for these. >From c9e59fb3e30db1bfab75be9fdd4afbc227a5270e Mon Sep 17 00:00:00 2001 From: Denis Laxalde Date: Thu, 30 Mar 2023 09:19:18 +0200 Subject: [PATCH] fixup! Start using new libpq cancel APIs --- contrib/dblink/dblink.c | 4 ++-- src/fe_utils/connect_utils.c | 4 +++- src/test/isolation/isolationtester.c | 4 ++-- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index e139f66e11..073795f088 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1332,11 +1332,11 @@ dblink_cancel_query(PG_FUNCTION_ARGS) dblink_init(); conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0))); - cancelConn = PQcancelSend(conn); + cancelConn = PQcancelConn(conn); PG_TRY(); { - if (PQcancelStatus(cancelConn) == CONNECTION_BAD) + if (!PQcancelSend(cancelConn)) { msg = pchomp(PQcancelErrorMessage(cancelConn)); } diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c index b32448c010..1cfd717217 100644 --- a/src/fe_utils/connect_utils.c +++ b/src/fe_utils/connect_utils.c @@ -161,7 +161,9 @@ disconnectDatabase(PGconn *conn) if (PQtransactionStatus(conn) == PQTRANS_ACTIVE) { - PQcancelFinish(PQcancelSend(conn)); + PGcancelConn *cancelConn = PQcancelConn(conn); + PQcancelSend(cancelConn); + PQcancelFinish(cancelConn); } PQfinish(conn); diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c index 3781f7982b..de31a87571 100644 --- a/src/test/isolation/isolationtester.c +++ b/src/test/isolation/isolationtester.c @@ -946,9 +946,9 @@ try_complete_step(TestSpec *testspec, PermutationStep *pstep, int flags) */ if (td > max_step_wait && !canceled) { -PGcancelConn *cancel_conn = PQcancelSend(conn); +PGcancelConn *cancel_conn = PQcancelConn(conn); -if (PQcancelStatus(cancel_conn) == CONNECTION_OK) +if (PQcancelSend(cancel_conn)) { /* * print to stdout not stderr, as this should appear in -- 2.30.2
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 29 Mar 2023 at 10:43, Denis Laxalde wrote: > More importantly, not having PQcancelSend() creating the PGcancelConn > makes reuse of that value, passing through PQcancelReset(), more > intuitive. E.g., in the tests: You convinced me. Attached is an updated patch where PQcancelSend takes the PGcancelConn and returns 1 or 0. > The thing is that we need the connection encoding (client_encoding) when > eventually forwarding the result of PQcancelErrorMessage(), decoded, to > the user. Cancel connections don't have an encoding specified. They never receive an error from the server. All errors come from the machine that libpq is on. So I think you're making the decoding more complicated than it needs to be. v18-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v18-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v18-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data v18-0002-Copy-and-store-addrinfo-in-libpq-owned-private-m.patch Description: Binary data v18-0003-Return-2-from-pqReadData-on-EOF.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema a écrit : > > Namely, I wonder why it returns a PGcancelConn and what's the > > point of requiring the user to call PQcancelStatus() to see if something > > got wrong. Maybe it could be defined as: > > > > int PQcancelSend(PGcancelConn *cancelConn); > > > > where the return value would be status? And the user would only need to > > call PQcancelErrorMessage() in case of error. This would leave only one > > single way to create a PGcancelConn value (i.e. PQcancelConn()), which > > seems less confusing to me. > > To clarify what you mean, the API would then be like this: > PGcancelConn cancelConn = PQcancelConn(conn); > if (PQcancelSend(cancelConn) == CONNECTION_BAD) { >printf("ERROR %s\n", PQcancelErrorMessage(cancelConn)) >exit(1) > } I'm not sure it's worth returning the connection status, maybe just an int value (the return value of connectDBComplete() for instance). More importantly, not having PQcancelSend() creating the PGcancelConn makes reuse of that value, passing through PQcancelReset(), more intuitive. E.g., in the tests: diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c index 6764ab513b..91363451af 100644 --- a/src/test/modules/libpq_pipeline/libpq_pipeline.c +++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c @@ -217,17 +217,18 @@ test_cancel(PGconn *conn, const char *conninfo) pg_fatal("failed to run PQrequestCancel: %s", PQerrorMessage(conn)); confirm_query_cancelled(conn); + cancelConn = PQcancelConn(conn); + /* test PQcancelSend */ send_cancellable_query(conn, monitorConn); - cancelConn = PQcancelSend(conn); - if (PQcancelStatus(cancelConn) == CONNECTION_BAD) + if (PQcancelSend(cancelConn) == CONNECTION_BAD) pg_fatal("failed to run PQcancelSend: %s", PQcancelErrorMessage(cancelConn)); confirm_query_cancelled(conn); - PQcancelFinish(cancelConn); + + PQcancelReset(cancelConn); /* test PQcancelConn and then polling with PQcancelPoll */ send_cancellable_query(conn, monitorConn); - cancelConn = PQcancelConn(conn); if (PQcancelStatus(cancelConn) == CONNECTION_BAD) pg_fatal("bad cancel connection: %s", PQcancelErrorMessage(cancelConn)); while (true) Otherwise, it's not clear if the PGcancelConn created by PQcancelSend() should be reused or not. But maybe that's a matter of documentation? > > As part of my testing, I've implemented non-blocking cancellation in > > Psycopg, based on v16 on this patchset. Overall this worked fine and > > seems useful; if you want to try it: > > > > https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel > > That's great to hear! I'll try to take a closer look at that change tomorrow. See also https://github.com/psycopg/psycopg/issues/534 if you want to discuss about this. > > (The only thing I found slightly inconvenient is the need to convey the > > connection encoding (from PGconn) when handling error message from the > > PGcancelConn.) > > Could you expand a bit more on this? And if you have any idea on how > to improve the API with regards to this? The thing is that we need the connection encoding (client_encoding) when eventually forwarding the result of PQcancelErrorMessage(), decoded, to the user. More specifically, it seems to me that we'd the encoding of the *cancel connection*, but since PQparameterStatus() cannot be used with a PGcancelConn, I use that of the PGconn. Roughly, in Python: encoding = conn.parameter_status(b"client_encoding") # i.e, in C: char *encoding PQparameterStatus(conn, "client_encoding"); cancel_conn = conn.cancel_conn() # i.e., in C: PGcancelConn *cancelConn = PQcancelConn(conn); # [... then work with with cancel_conn ...] if cancel_conn.status == ConnStatus.BAD: raise OperationalError(cancel_conn.error_message().decode(encoding)) This feels a bit non-atomic to me; isn't there a risk that client_encoding be changed between PQparameterStatus(conn) and PQcancelConn(conn) calls? So maybe PQcancelParameterStatus(PGcancelConn *cancelConn, char *name) is needed?
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 28 Mar 2023 at 16:54, Denis Laxalde wrote: > I had a look into your patchset (v16), did a quick review and played a > bit with the feature. > > Patch 2 is missing the documentation about PQcancelSocket() and contains > a few typos; please find attached a (fixup) patch to correct these. Thanks applied that patch and attached a new patchset > Namely, I wonder why it returns a PGcancelConn and what's the > point of requiring the user to call PQcancelStatus() to see if something > got wrong. Maybe it could be defined as: > > int PQcancelSend(PGcancelConn *cancelConn); > > where the return value would be status? And the user would only need to > call PQcancelErrorMessage() in case of error. This would leave only one > single way to create a PGcancelConn value (i.e. PQcancelConn()), which > seems less confusing to me. To clarify what you mean, the API would then be like this: PGcancelConn cancelConn = PQcancelConn(conn); if (PQcancelSend(cancelConn) == CONNECTION_BAD) { printf("ERROR %s\n", PQcancelErrorMessage(cancelConn)) exit(1) } Instead of: PGcancelConn cancelConn = PQcancelSend(conn); if (PQcancelStatus(cancelConn) == CONNECTION_BAD) { printf("ERROR %s\n", PQcancelErrorMessage(cancelConn)) exit(1) } Those are so similar, that I have no preference either way. If more people prefer one over the other I'm happy to change it, but for now I'll keep it as is. > As part of my testing, I've implemented non-blocking cancellation in > Psycopg, based on v16 on this patchset. Overall this worked fine and > seems useful; if you want to try it: > > https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel That's great to hear! I'll try to take a closer look at that change tomorrow. > (The only thing I found slightly inconvenient is the need to convey the > connection encoding (from PGconn) when handling error message from the > PGcancelConn.) Could you expand a bit more on this? And if you have any idea on how to improve the API with regards to this? v17-0002-Copy-and-store-addrinfo-in-libpq-owned-private-m.patch Description: Binary data v17-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v17-0003-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v17-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v17-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hi Jelte, I had a look into your patchset (v16), did a quick review and played a bit with the feature. Patch 2 is missing the documentation about PQcancelSocket() and contains a few typos; please find attached a (fixup) patch to correct these. --- a/src/interfaces/libpq/libpq-fe.h +++ b/src/interfaces/libpq/libpq-fe.h @@ -321,16 +328,28 @@ extern PostgresPollingStatusType PQresetPoll(PGconn *conn); /* Synchronous (blocking) */ extern void PQreset(PGconn *conn); +/* issue a cancel request */ +extern PGcancelConn * PQcancelSend(PGconn *conn); [...] Maybe I'm missing something, but this function above seems a bit strange. Namely, I wonder why it returns a PGcancelConn and what's the point of requiring the user to call PQcancelStatus() to see if something got wrong. Maybe it could be defined as: int PQcancelSend(PGcancelConn *cancelConn); where the return value would be status? And the user would only need to call PQcancelErrorMessage() in case of error. This would leave only one single way to create a PGcancelConn value (i.e. PQcancelConn()), which seems less confusing to me. Jelte Fennema wrote: > Especially since I ran into another use case that I would want to use > this patch for recently: Adding an async cancel function to Python > it's psycopg3 library. This library exposes both a Connection class > and an AsyncConnection class (using python its asyncio feature). But > one downside of the AsyncConnection type is that it doesn't have a > cancel method. As part of my testing, I've implemented non-blocking cancellation in Psycopg, based on v16 on this patchset. Overall this worked fine and seems useful; if you want to try it: https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel (The only thing I found slightly inconvenient is the need to convey the connection encoding (from PGconn) when handling error message from the PGcancelConn.) Cheers, Denis >From a5f9cc680ffa520b05fe34b7cac5df2e60a6d4ad Mon Sep 17 00:00:00 2001 From: Denis Laxalde Date: Tue, 28 Mar 2023 16:06:42 +0200 Subject: [PATCH] fixup! Add non-blocking version of PQcancel --- doc/src/sgml/libpq.sgml | 16 +++- src/interfaces/libpq/fe-connect.c| 2 +- src/test/modules/libpq_pipeline/libpq_pipeline.c | 2 +- 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 29f08a4317..aa404c4d15 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -5795,7 +5795,7 @@ PGcancelConn *PQcancelSend(PGconn *conn); options as the the original PGconn. So when the original connection is encrypted (using TLS or GSS), the connection for the cancel request is encrypted in the same way. Any connection options - that only are only used during authentication or after authentication of + that are only used during authentication or after authentication of the client are ignored though, because cancellation requests do not require authentication and the connection is closed right after the cancellation request is submitted. @@ -5912,6 +5912,20 @@ ConnStatusType PQcancelStatus(const PGcancelConn *conn); + + PQcancelSocketPQcancelSocket + + + + A version of that can be used for + cancellation connections. + +int PQcancelSocket(PGcancelConn *conn); + + + + + PQcancelPollPQcancelPoll diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 74e337fddf..16af7303d4 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -808,7 +808,7 @@ PQcancelConn(PGconn *conn) /* * Cancel requests should not iterate over all possible hosts. The request * needs to be sent to the exact host and address that the original - * connection used. So we we manually create the host and address arrays + * connection used. So we manually create the host and address arrays * with a single element after freeing the host array that we generated * from the connection options. */ diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c index e8e904892c..6764ab513b 100644 --- a/src/test/modules/libpq_pipeline/libpq_pipeline.c +++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c @@ -89,7 +89,7 @@ pg_fatal_impl(int line, const char *fmt,...) /* * Check that the query on the given connection got cancelled. * - * This is a function wrapped in a macrco to make the reported line number + * This is a function wrapped in a macro to make the reported line number * in an error match the line number of the invocation. */ #define confirm_query_cancelled(conn) confirm_query_cancelled_impl(__LINE__, conn) -- 2.30.2
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Rebased after conflicts with bfc9497ece01c7c45437bc36387cb1ebe346f4d2 Also included the fix for feedback from Daniel on patch 2, which he had shared in the load balancing thread. On Wed, 15 Mar 2023 at 09:49, Jelte Fennema wrote: > > The rebase was indeed trivial (git handled everything automatically), > because my first patch was doing a superset of the changes that were > committed in b6dfee28f. Attached are the new patches. > > On Tue, 14 Mar 2023 at 19:04, Greg Stark wrote: > > > > On Tue, 14 Mar 2023 at 13:59, Tom Lane wrote: > > > > > > "Gregory Stark (as CFM)" writes: > > > > It looks like this needs a big rebase in fea-uth.c fe-auth-scram.c and > > > > fe-connect.c. Every hunk is failing which perhaps means the code > > > > you're patching has been moved or refactored? > > > > > > The cfbot is giving up after > > > v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails, > > > but that's been superseded (at least in part) by b6dfee28f. > > > > Ah, same with Jelte Fennema's patch for load balancing in libpq. > > > > -- > > greg v16-0003-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v16-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v16-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch Description: Binary data v16-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data v16-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
The rebase was indeed trivial (git handled everything automatically), because my first patch was doing a superset of the changes that were committed in b6dfee28f. Attached are the new patches. On Tue, 14 Mar 2023 at 19:04, Greg Stark wrote: > > On Tue, 14 Mar 2023 at 13:59, Tom Lane wrote: > > > > "Gregory Stark (as CFM)" writes: > > > It looks like this needs a big rebase in fea-uth.c fe-auth-scram.c and > > > fe-connect.c. Every hunk is failing which perhaps means the code > > > you're patching has been moved or refactored? > > > > The cfbot is giving up after > > v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails, > > but that's been superseded (at least in part) by b6dfee28f. > > Ah, same with Jelte Fennema's patch for load balancing in libpq. > > -- > greg v15-0003-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v15-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch Description: Binary data v15-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data v15-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v15-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 14 Mar 2023 at 13:59, Tom Lane wrote: > > "Gregory Stark (as CFM)" writes: > > It looks like this needs a big rebase in fea-uth.c fe-auth-scram.c and > > fe-connect.c. Every hunk is failing which perhaps means the code > > you're patching has been moved or refactored? > > The cfbot is giving up after > v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails, > but that's been superseded (at least in part) by b6dfee28f. Ah, same with Jelte Fennema's patch for load balancing in libpq. -- greg
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
"Gregory Stark (as CFM)" writes: > It looks like this needs a big rebase in fea-uth.c fe-auth-scram.c and > fe-connect.c. Every hunk is failing which perhaps means the code > you're patching has been moved or refactored? The cfbot is giving up after v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails, but that's been superseded (at least in part) by b6dfee28f. regards, tom lane
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
It looks like this needs a big rebase in fea-uth.c fe-auth-scram.c and fe-connect.c. Every hunk is failing which perhaps means the code you're patching has been moved or refactored?
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 1 Mar 2023 at 14:48, Jelte Fennema wrote: > > > This looks like it needs a rebase. > > done Great. Please update the CF entry to Needs Review or Ready for Committer as appropriate :) -- Gregory Stark As Commitfest Manager
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 28 Feb 2023 at 15:59, Gregory Stark wrote: > > This looks like it needs a rebase. So I'm updating the patch to Waiting on Author
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
This looks like it needs a rebase. === Applying patches on top of PostgreSQL commit ID 71a75626d5271f2bcdbdc43b8c13065c4634fd9f === === applying patch ./v11-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch patching file src/interfaces/libpq/fe-auth-scram.c patching file src/interfaces/libpq/fe-auth.c patching file src/interfaces/libpq/fe-connect.c Hunk #35 FAILED at 3216. Hunk #36 succeeded at 3732 (offset 27 lines). Hunk #37 succeeded at 3782 (offset 27 lines). Hunk #38 succeeded at 3795 (offset 27 lines). Hunk #39 succeeded at 7175 (offset 27 lines). 1 out of 39 hunks FAILED -- saving rejects to file src/interfaces/libpq/fe-connect.c.rej patching file src/interfaces/libpq/fe-exec.c patching file src/interfaces/libpq/fe-lobj.c patching file src/interfaces/libpq/fe-misc.c patching file src/interfaces/libpq/fe-protocol3.c patching file src/interfaces/libpq/fe-secure-common.c patching file src/interfaces/libpq/fe-secure-gssapi.c Hunk #3 succeeded at 590 (offset 2 lines). patching file src/interfaces/libpq/fe-secure-openssl.c Hunk #3 succeeded at 415 (offset 5 lines). Hunk #4 succeeded at 967 (offset 5 lines). Hunk #5 succeeded at 993 (offset 5 lines). Hunk #6 succeeded at 1037 (offset 5 lines). Hunk #7 succeeded at 1089 (offset 5 lines). Hunk #8 succeeded at 1122 (offset 5 lines). Hunk #9 succeeded at 1140 (offset 5 lines). Hunk #10 succeeded at 1239 (offset 5 lines). Hunk #11 succeeded at 1250 (offset 5 lines). Hunk #12 succeeded at 1265 (offset 5 lines). Hunk #13 succeeded at 1278 (offset 5 lines). Hunk #14 succeeded at 1315 (offset 5 lines). Hunk #15 succeeded at 1326 (offset 5 lines). Hunk #16 succeeded at 1383 (offset 5 lines). Hunk #17 succeeded at 1399 (offset 5 lines). Hunk #18 succeeded at 1452 (offset 5 lines). Hunk #19 succeeded at 1494 (offset 5 lines). patching file src/interfaces/libpq/fe-secure.c patching file src/interfaces/libpq/libpq-int.h
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Is there anything that is currently blocking this patch? I'd quite like it to get into PG16. Especially since I ran into another use case that I would want to use this patch for recently: Adding an async cancel function to Python it's psycopg3 library. This library exposes both a Connection class and an AsyncConnection class (using python its asyncio feature). But one downside of the AsyncConnection type is that it doesn't have a cancel method. I ran into this while changing the PgBouncer tests to use python. And the cancellation tests were the only tests that required me to use a ThreadPoolExecutor instead of simply being able to use async-await style programming: https://github.com/pgbouncer/pgbouncer/blob/master/test/test_cancel.py#LL9C17-L9C17
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
> This version of the patch no longer applies, a rebased version is needed. Attached is a patch that applies cleanly again and is also changed to use the recently introduced libpq_append_conn_error. I also attached a patch that runs pgindent after the introduction of libpq_append_conn_error. I noticed that this hadn't happened when trying to run pgindent on my own changes. v9-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: v9-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch v9-0002-Add-non-blocking-version-of-PQcancel.patch Description: v9-0002-Add-non-blocking-version-of-PQcancel.patch
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema writes: > [ non-blocking PQcancel ] I pushed the 0001 patch (libpq_pipeline documentation) with a bit of further wordsmithing. As for 0002, I'm not sure that's anywhere near ready. I doubt it's a great idea to un-deprecate PQrequestCancel with a major change in its behavior. If there is anybody out there still using it, they're not likely to appreciate that. Let's leave that alone and pick some other name. I'm also finding the entire design of PQrequestCancelStart etc to be horribly confusing --- it's not *bad* necessarily, but the chosen function names are seriously misleading. PQrequestCancelStart doesn't actually "start" anything, so the apparent parallel with PQconnectStart is just wrong. It's also fairly unclear what the state of a cancel PQconn is after the request cycle is completed, and whether you can re-use it (especially after a failed request), and whether you have to dispose of it separately. On the whole it feels like a mistake to have two separate kinds of PGconn with fundamentally different behaviors and yet no distinction in the API. I think I'd recommend having a separate struct type (which might internally contain little more than a pointer to a cloned PGconn), and provide only a limited set of operations on it. Seems like create, start/continue cancel request, destroy, and fetch error message ought to be enough. I don't see a reason why we need to support all of libpq's inquiry operations on such objects --- for instance, if you want to know which host is involved, you could perfectly well query the parent PGconn. Nor do I want to run around and add code to every single libpq entry point to make it reject cancel PGconns if it can't support them, but we'd have to do so if there's just one struct type. I'm not seeing the use-case for PQconnectComplete. If you want a non-blocking cancel request, why would you then use a blocking operation to complete the request? Seems like it'd be better to have just a monolithic cancel function for those who don't need non-blocking. This change: --- a/src/interfaces/libpq/libpq-fe.h +++ b/src/interfaces/libpq/libpq-fe.h @@ -59,12 +59,15 @@ typedef enum { CONNECTION_OK, CONNECTION_BAD, + CONNECTION_CANCEL_FINISHED, /* Non-blocking mode only below here */ is an absolute non-starter: it breaks ABI for every libpq client, even ones that aren't using this facility. Why do we need a new ConnStatusType value anyway? Seems like PostgresPollingStatusType covers what we need: once you reach PGRES_POLLING_OK, the cancel request is done. The test case is still not very bulletproof on slow machines, as it seems to be assuming that 30 seconds == forever. It would be all right to use $PostgreSQL::Test::Utils::timeout_default, but I'm not sure that that's easily retrievable by C code. Maybe make the TAP test pass it in with another optional switch to libpq_pipeline? Alternatively, we could teach libpq_pipeline to do getenv("PG_TEST_TIMEOUT_DEFAULT") with a fallback to 180, but that feels like it might be overly familiar with the innards of Utils.pm. regards, tom lane
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
(resent because it was blocked from the mailing-list due to inclusion of a blocked email address in the To line) From: Andres Freund > On 2022-04-04 15:21:54 +, Jelte Fennema wrote: > > 2. Added some extra sleeps to the cancellation test, to remove random > > failures on FreeBSD. > > That's extremely extremely rarely the solution to address test reliability > issues. It'll fail when running test under valgrind etc. > > Why do you need sleeps / can you find another way to make the test reliable? The problem they are solving is racy behaviour between sending the query and sending the cancellation. If the cancellation is handled before the query is started, then the query doesn't get cancelled. To solve this problem I used the sleeps to wait a bit before sending the cancelation request. When I wrote this, I couldn't think of a better way to do it then with sleeps. But I didn't like it either (and I still don't). These emails made me start to think again, about other ways of solving the problem. I think I've found another solution (see attached patch). The way I solve it now is by using another connection to check the state of the first one. Jelte 0001-Add-documentation-for-libpq_pipeline-tests.patch Description: 0001-Add-documentation-for-libpq_pipeline-tests.patch 0002-Add-non-blocking-version-of-PQcancel.patch Description: 0002-Add-non-blocking-version-of-PQcancel.patch
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Thanks for all the feedback everyone. I'll try to send a new patch later this week that includes user facing docs and a simplified API. For now a few responses: > Yeah. We need to choose a name for these new function(s) that is > sufficiently different from "PQcancel" that people won't expect them > to behave exactly the same as that does. I lack any good ideas about > that, how about you? So I guess the names I proposed were not great, since everyone seems to be falling over them. But I'd like to make my intention clear with the current naming. After this patch there would be four different APIs for starting a cancelation: 1. PQrequestCancel: deprecated+old, not signal-safe function for requesting query cancellation, only uses a specific set of connection options 2. PQcancel: Cancel queries in a signal safe way, to be signal-safe it only uses a limited set of connection options 3. PQcancelConnect: Cancel queries in a non-signal safe way that uses all connection options 4. PQcancelConnectStart: Cancel queries in a non-signal safe and non-blocking way that uses all connection options So the idea was that you should not look at PQcancelConnectStart as the non-blocking version of PQcancel, but as the non-blocking version of PQcancelConnect. I'll try to think of some different names too, but IMHO these names could be acceptable when their differences are addressed sufficiently in the documentation. One other approach to naming that comes to mind now is repurposing PQrequestCancel: 1. PQrequestCancel: Cancel queries in a non-signal safe way that uses all connection options 2. PQrequestCancelStart: Cancel queries in a non-signal safe and non-blocking way that uses all connection options 3. PQcancel: Cancel queries in a signal safe way, to be signal-safe it only uses a limited set of connection options > I think it's probably a > bad sign that this function is tinkering with logic in this sort of > low-level function anyway. pqReadData() is a really general function > that manages to work with non-blocking I/O already, so why does > non-blocking query cancellation need to change its return values, or > whether or not it drops data in certain cases? The reason for this low level change is that the cancellation part of the Postgres protocol is following a different, much more simplistic design than all the other parts. The client does not expect a response message back from the server after sending the cancellation request. The expectation is that the server signals completion by closing the connection, i.e. sending EOF. For all other parts of the protocol, connection termination should be initiated client side by sending a Terminate message. So the server closing (sending EOF) is always unexpected and is thus currently considered an error by pqReadData. But since this is not the case for the cancellation protocol, the result is changed to -2 in case of EOF to make it possible to distinguish between an EOF and an actual error. > And it updates that function to return -2 when the is > ECONNRESET, which seems to fly in the face of the comment's idea that > this is the "clean connection closure" case. The diff sadly does not include the very relevant comment right above these lines. Pasting the whole case statement here to clear up this confusion: case SSL_ERROR_ZERO_RETURN: /* * Per OpenSSL documentation, this error code is only returned for * a clean connection closure, so we should not report it as a * server crash. */ appendPQExpBufferStr(>errorMessage, libpq_gettext("SSL connection has been closed unexpectedly\n")); result_errno = ECONNRESET; n = -2; break; > For example, it updates the > comment for pqsecure_read() to say "Returns -1 in case of failures, > except in the case of clean connection closure then it returns -2." > But that function calls any of three different implementation > functions depending on the situation and the patch only updates one of > them. That comment is indeed not describing what is happening correctly and I'll try to make it clearer. The main reason for it being incorrect is coming from the fact that receiving EOFs is handled in different places based on the encryption method: 1. Unencrypted TCP: EOF is not returned as an error by pqsecure_read, but detected by pqReadData (see comments related to definitelyEOF) 2. OpenSSL: EOF is returned as an error by pqsecure_read (see copied case statement above) 3. GSS: When writing the patch I was not sure how EOF handling worked here, but given that the tests passed for Jacob on GSS, I'm guessing it works the same as unencrypted TCP.