Different compression methods for FPI
Hi! There are a lot of different compression threads nearby, and that's great! Every few bytes going to IO still deserves to be compressed. Currently, we use pglz compression for WAL full page images. As shown in [0], this leads to high CPU usage in pglz when wal_compression is on. Swapping pglz for lz4 increases pgbench tps by 21% on my laptop (with wal_compression enabled). So I think it is worth proposing a patch to add wal_compression_method = {"pglz", "lz4", "zlib"}. Probably "zstd" can be added to the list as well. An even better option would be to teach WAL compression to compress everything, not only FPIs, but that is a big orthogonal chunk of work. Attached is a draft that takes CompressionId from the "custom compression methods" patch and adds zlib to it. I'm not sure where to add tests that check recovery with different methods; it seems to me that only TAP tests are suitable for this. Thanks! Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/323B1B01-DA42-419F-A99C-23E2C162D53B%40yandex-team.ru 0001-Use-different-compression-methods-for-FPIs.patch Description: Binary data
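For a rough feel of the size/CPU trade-off being discussed, here is a small stand-in experiment. pglz and lz4 are not in the Python standard library, so this uses zlib (one of the proposed methods) on a synthetic 8kB page image; the page contents are made up purely for illustration and say nothing about real heap pages.

```python
import zlib

# A synthetic, fairly repetitive 8kB "full page image" (8192 bytes),
# standing in for a heap page; real FPIs are far less uniform.
page = (b"tuple-header" + b"\x00" * 52) * 128
assert len(page) == 8192

for level in (1, 6, 9):
    compressed = zlib.compress(page, level)
    print(f"zlib level {level}: {len(page)} -> {len(compressed)} bytes")
```

Higher levels trade CPU for smaller output; a wal_compression_method GUC would expose the same trade-off across whole algorithms rather than levels of one algorithm.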
Re: SEARCH and CYCLE clauses
On 22.02.21 14:45, Vik Fearing wrote: On 2/22/21 1:28 PM, Peter Eisentraut wrote: On 22.02.21 11:05, Vik Fearing wrote: This looks good to me, except that you forgot to add the feature stamp. Attached is a small diff to apply on top of your patch to fix that. The feature code is from SQL:202x, whereas the table is relative to SQL:2016. We could add it, but probably with a comment. OK. done
Re: repeated decoding of prepared transactions
On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian wrote: > > On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian wrote: > > > I've updated snapshot_was_exported_at_ member to pg_replication_slots as > > well. > > Do have a look and let me know if there are any comments. > > Update with both patches. > Thanks, I have made some minor changes to the first patch and now it looks good to me. The changes are as below: 1. Removed the changes related to exposing this new parameter via the view, as mentioned in my previous email. 2. Changed the variable name to initial_consistent_point. 3. Ran pgindent, made minor changes in comments, and modified the commit message. Let me know what you think about these changes. Next, I'll review your second patch, which allows the 2PC option to be enabled only at slot creation time. -- With Regards, Amit Kapila. From b4d4504b64452ba6cc8602f66acac8209317da0a Mon Sep 17 00:00:00 2001 From: Ajin Cherian Date: Fri, 26 Feb 2021 02:58:49 -0500 Subject: [PATCH v5 1/2] Avoid repeated decoding of prepared transactions after the restart. In commit a271a1b50e, we allowed decoding at prepare time, and the prepare was decoded again if there was a restart after decoding it. It was done that way because we can't distinguish between the case where we have not decoded the prepare because it was prior to a consistent snapshot and the case where we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level, which is the LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepared transactions prior to this point are sent along with the commit prepared. 
Author: Ajin Cherian, based on an idea by Andres Freund Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com --- contrib/test_decoding/expected/twophase.out | 38 +++--- contrib/test_decoding/expected/twophase_stream.out | 28 ++-- doc/src/sgml/logicaldecoding.sgml | 9 ++--- src/backend/replication/logical/decode.c | 2 ++ src/backend/replication/logical/logical.c | 3 +- src/backend/replication/logical/reorderbuffer.c | 10 +++--- src/backend/replication/logical/snapbuild.c | 24 +- src/include/replication/reorderbuffer.h | 1 + src/include/replication/slot.h | 7 src/include/replication/snapbuild.h | 4 ++- 10 files changed, 60 insertions(+), 66 deletions(-) diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out index f9f6bed..c51870f 100644 --- a/contrib/test_decoding/expected/twophase.out +++ b/contrib/test_decoding/expected/twophase.out @@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two COMMIT PREPARED 'test_prepared#1'; SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1'); -data - - BEGIN - table public.test_prepared1: INSERT: id[integer]:1 - table public.test_prepared1: INSERT: id[integer]:2 - PREPARE TRANSACTION 'test_prepared#1' + data +--- COMMIT PREPARED 'test_prepared#1' -(5 rows) +(1 row) -- Test that rollback of a prepared xact is decoded. 
BEGIN; @@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two COMMIT PREPARED 'test_prepared#3'; SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1'); - data -- - BEGIN - table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar' - PREPARE TRANSACTION 'test_prepared#3' + data +--- COMMIT PREPARED 'test_prepared#3' -(4 rows) +(1 row) -- make sure stuff still works INSERT INTO test_prepared1 VALUES (6); @@ -159,14 +152,10 @@ RESET statement_timeout; COMMIT PREPARED 'test_prepared_lock'; -- consume the commit SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1'); - data - BEGIN - table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol' - table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2' -
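The rule in the commit message above can be sketched as follows. This is a hypothetical illustration; the function name and the exact comparison direction are my assumptions, not the patch's code.

```python
def action_for_prepare(prepare_lsn, initial_consistent_point):
    """Sketch of the commit-message rule: a transaction prepared before the
    slot's initial consistent point was never decoded at PREPARE time, so
    its changes are sent together with COMMIT PREPARED; anything prepared
    after that point is decoded once, at PREPARE time, and not re-decoded
    after a restart."""
    if prepare_lsn <= initial_consistent_point:
        return "send with COMMIT PREPARED"
    return "decode at PREPARE time"

# A prepare from before the slot reached consistency is deferred:
print(action_for_prepare(prepare_lsn=100, initial_consistent_point=200))
```

This matches the regression-test diffs above: prepares consumed before the restart are no longer re-emitted, only the COMMIT PREPARED record is.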
Re: repeated decoding of prepared transactions
On Sat, 27 Feb, 2021, 1:59 pm Amit Kapila, wrote: > > I have recommended above to change this name to initial_consistency_at > because there are times when we don't export snapshot and we still set > this like when creating slots with CRS_NOEXPORT_SNAPSHOT or when > creating via SQL APIs. I am not sure why Ajin neither changed the > name nor responded to that comment. What is your opinion? > I am fine with the name initial_consistency_at. I am also fine with not showing this in the pg_replication_slot view and keeping this internal. Regards, Ajin Cherian Fujitsu Australia > >
Re: repeated decoding of prepared transactions
On Thu, Feb 25, 2021 at 5:04 PM vignesh C wrote: > > On Wed, Feb 24, 2021 at 5:06 PM Ajin Cherian wrote: > > > > On Wed, Feb 24, 2021 at 4:48 PM Ajin Cherian wrote: > > > > > I plan to split this into two patches next. But do review and let me > > > know if you have any comments. > > > > Attaching an updated patch-set with the changes for > > snapshot_was_exported_at_lsn separated out from the changes for the > > APIs pg_create_logical_replication_slot() and > > pg_logical_slot_get_changes(). Along with a rebase that takes in a few > > more commits since my last patch. > > One observation while verifying the patch I noticed that most of > ReplicationSlotPersistentData structure members are displayed in > pg_replication_slots, but I did not see snapshot_was_exported_at_lsn > being displayed. Is this intentional? If not intentional we can > include snapshot_was_exported_at_lsn in pg_replication_slots. > On thinking about this point, I feel we don't need this new parameter in the view because I am not able to see how it is of any use to the user. Over time, corresponding to that LSN there won't be any WAL record or maybe WAL would be overwritten. I think this is primarily for our internal use so let's not expose it. I intend to remove it from the patch unless you have some reason for exposing this to the user. -- With Regards, Amit Kapila.
Re: Remove latch.c workaround for Linux < 2.6.27
Thomas Munro writes: > Commit 82ebbeb0 added a workaround for (already in 2017) ancient Linux > kernels with no EPOLL_CLOEXEC. I don't see any such systems in the > build farm today (and if there is one hiding in there somewhere, it's > well past time to upgrade). I'd like to rip that code out, because > I'm about to commit some new code that uses another 2.6.17+ > XXX_CLOEXEC flag, and it'd be silly to have to write new workaround > code for that too, and a contradiction to have fallback code in one > place but not another. Any objections? I believe we've dropped support for RHEL5, so no objection here. regards, tom lane
Re: [HACKERS] logical decoding of two-phase transactions
On Sat, Feb 27, 2021 at 7:31 AM Peter Smith wrote: > > On Fri, Feb 26, 2021 at 9:58 PM Amit Kapila wrote: > > 6. > > + * XXX - Is there a potential timing problem here - e.g. if signal arrives > > + * while executing this then maybe we will set table_states_valid without > > + * refetching them? > > + */ > > +static void > > +FetchTableStates(bool *started_tx) > > .. > > > > Can you explain which race condition you are worried about here which > > is not possible earlier but can happen after this patch? > > > > Yes, my question (in that XXX comment) was not about anything new for > the current patch, because this FetchTableStates function has exactly > the same logic as the HEAD code. > > I was only wondering if there is any possibility that one of the > function calls (inside the if block) can end up calling > CHECK_FOR_INTERRUPTS. If that could happen, then perhaps the > table_states_valid flag could be assigned false (by the > invalidate_syncing_table_states signal handler) only to be > immediately/wrongly overwritten as table_states_valid = true in this > FetchTableStates code. > This is not related to CHECK_FOR_INTERRUPTS. The invalidate_syncing_table_states() callback can be called only when we process invalidation messages, which we do while locking the relation in GetSubscriptionRelations (table_open->relation_open->LockRelationOid). After that, it won't be done in that part of the code. So, I think we don't need this comment. -- With Regards, Amit Kapila.
Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Here's a new version. The condition variable patch 0001 fixes a bug: CleanupProcSignalState() also needs to broadcast. The hunk that allows the interrupt handlers to use CVs while you're already waiting on a CV is now in a separate patch 0002. I'm thinking of going ahead and committing those two. The 0003 patch to achieve $SUBJECT needs more discussion. From 51bb33c8755efa11d24595ed0a27fa0b387e8153 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 27 Feb 2021 15:40:11 +1300 Subject: [PATCH v4 1/3] Use condition variables for ProcSignalBarriers. Instead of a poll/sleep loop, use a condition variable for precise wake-up whenever a backend's pss_barrierGeneration advances. Discussion: https://postgr.es/m/ca+hukgldemy2gbm80kz20gte6hnvwoere8kwcjk6-u56ost...@mail.gmail.com --- src/backend/storage/ipc/procsignal.c | 34 1 file changed, 14 insertions(+), 20 deletions(-) diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index c43cdd685b..aa0362216b 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -23,6 +23,7 @@ #include "miscadmin.h" #include "pgstat.h" #include "replication/walsender.h" +#include "storage/condition_variable.h" #include "storage/ipc.h" #include "storage/latch.h" #include "storage/proc.h" @@ -59,10 +60,11 @@ */ typedef struct { - pid_t pss_pid; - sig_atomic_t pss_signalFlags[NUM_PROCSIGNALS]; + volatile pid_t pss_pid; + volatile sig_atomic_t pss_signalFlags[NUM_PROCSIGNALS]; pg_atomic_uint64 pss_barrierGeneration; pg_atomic_uint32 pss_barrierCheckMask; + ConditionVariable pss_barrierCV; } ProcSignalSlot; /* @@ -93,7 +95,7 @@ typedef struct ((flags) &= ~(((uint32) 1) << (uint32) (type))) static ProcSignalHeader *ProcSignal = NULL; -static volatile ProcSignalSlot *MyProcSignalSlot = NULL; +static ProcSignalSlot *MyProcSignalSlot = NULL; static bool CheckProcSignal(ProcSignalReason reason); static void CleanupProcSignalState(int status, Datum arg); @@ -142,6 +144,7 @@ 
ProcSignalShmemInit(void) MemSet(slot->pss_signalFlags, 0, sizeof(slot->pss_signalFlags)); pg_atomic_init_u64(&slot->pss_barrierGeneration, PG_UINT64_MAX); pg_atomic_init_u32(&slot->pss_barrierCheckMask, 0); + ConditionVariableInit(&slot->pss_barrierCV); } } } @@ -156,7 +159,7 @@ ProcSignalShmemInit(void) void ProcSignalInit(int pss_idx) { - volatile ProcSignalSlot *slot; + ProcSignalSlot *slot; uint64 barrier_generation; Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots); @@ -208,7 +211,7 @@ static void CleanupProcSignalState(int status, Datum arg) { int pss_idx = DatumGetInt32(arg); - volatile ProcSignalSlot *slot; + ProcSignalSlot *slot; slot = &ProcSignal->psh_slot[pss_idx - 1]; Assert(slot == MyProcSignalSlot); @@ -237,6 +240,7 @@ CleanupProcSignalState(int status, Datum arg) * no barrier waits block on it. */ pg_atomic_write_u64(&slot->pss_barrierGeneration, PG_UINT64_MAX); + ConditionVariableBroadcast(&slot->pss_barrierCV); slot->pss_pid = 0; } @@ -391,13 +395,11 @@ EmitProcSignalBarrier(ProcSignalBarrierType type) void WaitForProcSignalBarrier(uint64 generation) { - long timeout = 125L; - Assert(generation <= pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration)); for (int i = NumProcSignalSlots - 1; i >= 0; i--) { - volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; + ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; uint64 oldval; /* @@ -409,20 +411,11 @@ WaitForProcSignalBarrier(uint64 generation) oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration); while (oldval < generation) { - int events; - - CHECK_FOR_INTERRUPTS(); - - events = - WaitLatch(MyLatch, - WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, - timeout, WAIT_EVENT_PROC_SIGNAL_BARRIER); - ResetLatch(MyLatch); - + ConditionVariableSleep(&slot->pss_barrierCV, + WAIT_EVENT_PROC_SIGNAL_BARRIER); oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration); - if (events & WL_TIMEOUT) - timeout = Min(timeout * 2, 1000L); } + ConditionVariableCancelSleep(); } /* @@ -589,6 +582,7 @@ ProcessProcSignalBarrier(void) * next called. 
*/ pg_atomic_write_u64(&slot->pss_barrierGeneration, shared_gen); + ConditionVariableBroadcast(&slot->pss_barrierCV); } /* -- 2.30.0 From 9fbb46cd8909d8058d32421e0e41c223d710d7d4 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 27 Feb 2021 15:36:22 +1300 Subject: [PATCH v4 2/3] Allow condition variables to be used in interrupt code. Adjust the condition variable sleep loop to work correctly when code reached by its internal CHECK_FOR_INTERRUPTS() call interacts with another condition variable. There are no such cases currently, but a proposed patch would do this. Discussion: https://postgr.es/m/ca+hukgldemy2gbm80kz20gte6hnvwoere8kwcjk6-u56ost...@mail.gmail.com --- src/backend/storage/lmgr/condition_variable.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git
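The behavioural change in patch 0001 -- replacing a poll/sleep loop with a precise wake-up -- can be sketched with Python's threading primitives. This is a toy analogue, not the ProcSignal code.

```python
import threading

class BarrierSlot:
    """Toy analogue of a ProcSignalSlot's barrier generation.  Waiters
    block on a condition variable until the generation reaches their
    target, instead of re-polling on a 125ms..1s backoff timeout as the
    removed code did."""
    def __init__(self):
        self.generation = 0
        self.cv = threading.Condition()

    def advance(self, gen):
        with self.cv:
            self.generation = gen
            self.cv.notify_all()  # like ConditionVariableBroadcast()

    def wait_for(self, target):
        with self.cv:
            # Like the ConditionVariableSleep() loop: re-check the
            # condition after every wake-up, since wake-ups can be spurious.
            while self.generation < target:
                self.cv.wait()

slot = BarrierSlot()
worker = threading.Thread(target=slot.advance, args=(1,))
worker.start()
slot.wait_for(1)   # returns as soon as the generation advances
worker.join()
```

The same re-check-in-a-loop discipline is why CleanupProcSignalState() must broadcast too: any writer that changes the predicate has to wake the sleepers.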
Re: repeated decoding of prepared transactions
On Fri, Feb 26, 2021 at 7:26 PM vignesh C wrote: > > On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian wrote: > > > > On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian wrote: > > > > > I've updated snapshot_was_exported_at_ member to pg_replication_slots as > > > well. > > > Do have a look and let me know if there are any comments. > > > > Update with both patches. > > Thanks for fixing and providing an updated patch. Patch applies, make > check and make check-world passes. I could see the issue working fine. > > Few minor comments: > + snapshot_was_exported_at > pg_lsn > + > + > + The address (LSN) at which the logical > + slot found a consistent point at the time of slot creation. > + NULL for physical slots. > + > + > > > I had seen earlier also we had some discussion on naming > snapshot_was_exported_at. Can we change snapshot_was_exported_at to > snapshot_exported_lsn, I felt if we can include the lsn in the name, > the user will be able to interpret easily and also it will be similar > to other columns in pg_replication_slots view. > I have recommended above to change this name to initial_consistency_at because there are times when we don't export snapshot and we still set this like when creating slots with CRS_NOEXPORT_SNAPSHOT or when creating via SQL APIs. I am not sure why Ajin neither changed the name nor responded to that comment. What is your opinion? -- With Regards, Amit Kapila.
Re: [HACKERS] Custom compression methods
On Fri, Feb 26, 2021 at 8:10 PM Dilip Kumar wrote: > > On Sun, Feb 21, 2021 at 5:33 PM Dilip Kumar wrote: > > > > Based on an offlist discussion with Robert, I have done further analysis > of the composite type data. So the idea is that I have analyzed all > the callers of > HeapTupleGetDatum and HeapTupleHeaderGetDatum and divided them into two > categories: 1) Callers which are forming the tuple from values that can > not have compressed/external data. > 2) Callers which can have external/compressed data I just realized that there is one more function, "heap_copy_tuple_as_datum", which is flattening the tuple based on the HeapTupleHasExternal check, so I think I will have to analyze the callers of this function as well and do a similar analysis, although there are just a few callers for this. And, I think the fix in ExecEvalConvertRowtype is wrong; we will have to do something for the compressed type here as well. I am not sure what the best way to fix it is, because we are directly getting the input tuple, so we can not put an optimization of detoasting before forming the tuple. We might detoast in execute_attr_map_tuple, when the source and target row types are different, because we are anyway deforming and processing each field in that function. But the problem is that execute_attr_map_tuple is used at multiple places; for that, we can make another version of this function which actually detoasts along with the conversion and use that in ExecEvalConvertRowtype. But if there is no tuple conversion needed then we directly use heap_copy_tuple_as_datum, and in that case there is no deforming at all, so maybe in this case we can not do anything; but I think ExecEvalConvertRowtype should not be a very common path. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] logical decoding of two-phase transactions
On Fri, Feb 26, 2021 at 9:58 PM Amit Kapila wrote: > 6. > + * XXX - Is there a potential timing problem here - e.g. if signal arrives > + * while executing this then maybe we will set table_states_valid without > + * refetching them? > + */ > +static void > +FetchTableStates(bool *started_tx) > .. > > Can you explain which race condition you are worried about here which > is not possible earlier but can happen after this patch? > Yes, my question (in that XXX comment) was not about anything new for the current patch, because this FetchTableStates function has exactly the same logic as the HEAD code. I was only wondering if there is any possibility that one of the function calls (inside the if block) can end up calling CHECK_FOR_INTERRUPTS. If that could happen, then perhaps the table_states_valid flag could be assigned false (by the invalidate_syncing_table_states signal handler) only to be immediately/wrongly overwritten as table_states_valid = true in this FetchTableStates code. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Reducing WaitEventSet syscall churn
On Tue, Jan 5, 2021 at 6:10 PM Thomas Munro wrote: > For point 2, the question I am raising is: why should users get a > special FATAL message in some places and not others, for PM death? > However, if people are attached to that behaviour, we could still > achieve goal 1 without goal 2 by handling PM death explicitly in > walsender.c and I'd be happy to post an alternative patch set like > that. Here's the alternative patch set, with no change to existing error message behaviour. I'm going to commit this version and close this CF item soon if there are no objections. From 8f563edefdc2f276b3d8abeb019af1ff172bf7d9 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 27 Feb 2021 13:34:37 +1300 Subject: [PATCH v8 1/2] Introduce symbolic names for FeBeWaitSet positions. Previously we used 0 and 1 to refer to the socket and latch in far flung parts of the tree, without any explanation. Also use PGINVALID_SOCKET rather than -1 in the place where we weren't already. Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com --- src/backend/libpq/be-secure.c | 4 ++-- src/backend/libpq/pqcomm.c | 18 +++--- src/backend/utils/init/miscinit.c | 6 -- src/include/libpq/libpq.h | 3 +++ 4 files changed, 24 insertions(+), 7 deletions(-) diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c index d1545a2ad6..bb603ad209 100644 --- a/src/backend/libpq/be-secure.c +++ b/src/backend/libpq/be-secure.c @@ -180,7 +180,7 @@ retry: Assert(waitfor); - ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL); WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1, WAIT_EVENT_CLIENT_READ); @@ -292,7 +292,7 @@ retry: Assert(waitfor); - ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL); WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1, WAIT_EVENT_CLIENT_WRITE); diff 
--git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index 1e6b6db540..27a298f110 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -191,6 +191,9 @@ WaitEventSet *FeBeWaitSet; void pq_init(void) { + int socket_pos PG_USED_FOR_ASSERTS_ONLY; + int latch_pos PG_USED_FOR_ASSERTS_ONLY; + /* initialize state variables */ PqSendBufferSize = PQ_SEND_BUFFER_SIZE; PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize); @@ -219,10 +222,19 @@ pq_init(void) #endif FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3); - AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock, + socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, + MyProcPort->sock, NULL, NULL); + latch_pos = AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, PGINVALID_SOCKET, + MyLatch, NULL); + AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, PGINVALID_SOCKET, NULL, NULL); - AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL); - AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL); + + /* + * The event positions match the order we added them, but let's sanity + * check them to be sure. + */ + Assert(socket_pos == FeBeWaitSetSocketPos); + Assert(latch_pos == FeBeWaitSetLatchPos); } /* diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c index 734c66d4e8..8de70eb46d 100644 --- a/src/backend/utils/init/miscinit.c +++ b/src/backend/utils/init/miscinit.c @@ -202,7 +202,8 @@ SwitchToSharedLatch(void) MyLatch = &MyProc->procLatch; if (FeBeWaitSet) - ModifyWaitEvent(FeBeWaitSet, 1, WL_LATCH_SET, MyLatch); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET, + MyLatch); /* * Set the shared latch as the local one might have been set. 
This @@ -221,7 +222,8 @@ SwitchBackToLocalLatch(void) MyLatch = &LocalLatchData; if (FeBeWaitSet) - ModifyWaitEvent(FeBeWaitSet, 1, WL_LATCH_SET, MyLatch); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET, + MyLatch); SetLatch(MyLatch); } diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h index b41b10620a..e4e5c21565 100644 --- a/src/include/libpq/libpq.h +++ b/src/include/libpq/libpq.h @@ -55,6 +55,9 @@ extern const PGDLLIMPORT PQcommMethods *PqCommMethods; */ extern WaitEventSet *FeBeWaitSet; +#define FeBeWaitSetSocketPos 0 +#define FeBeWaitSetLatchPos 1 + extern int StreamServerPort(int family, const char *hostName, unsigned short portNumber, const char *unixSocketDir, pgsocket ListenSocket[], int MaxListen); -- 2.30.0 From 22b2c474b476ca91b579408f73ebdc014bb76083 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 24 Feb 2020 23:48:29 +1300 Subject: [PATCH v8 2/2] Use FeBeWaitSet for walsender.c. This avoids the need to set up and tear down a new WaitEventSet every time we
Re: row filtering for logical replication
On Mon, Feb 22, 2021, at 9:28 AM, Euler Taveira wrote: > On Mon, Feb 22, 2021, at 7:45 AM, Önder Kalacı wrote: >> Thanks for working on this. I did some review and testing, please see my >> comments below. > I appreciate your review. I'm working on a new patch set and expect to post > it soon. I'm attaching a new patch set. This new version improves documentation and commit messages and incorporates a few debug messages. I did a couple of tests and didn't find issues. Here are some numbers from my i7 using a simple expression (aid > 0) on table pgbench_accounts. $ awk '/row filter time/ {print $9}' postgresql.log | /tmp/stat.pl 99 mean: 33.00 us stddev: 17.65 us median: 28.83 us min-max:[3.48 .. 6404.84] us percentile(99): 49.66 us mode: 41.71 us I don't expect 0005 and 0006 to be included. I attached them to help with some tests. -- Euler Taveira EDB https://www.enterprisedb.com/ From 596b278d4be77f67467aa1d61ce26e765b078197 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Mon, 18 Jan 2021 11:53:34 -0300 Subject: [PATCH 1/6] Rename a WHERE node A WHERE clause will be used for row filtering in logical replication. We already have a similar node: 'WHERE (condition here)'. Let's rename the node to a generic name and use it for row filtering too. 
--- src/backend/parser/gram.y | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index dd72a9fc3c..ecfd98ba5b 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -485,7 +485,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type def_arg columnElem where_clause where_or_current_clause a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound columnref in_expr having_clause func_table xmltable array_expr -ExclusionWhereClause operator_def_arg +OptWhereClause operator_def_arg %type rowsfrom_item rowsfrom_list opt_col_def_list %type opt_ordinality %type ExclusionConstraintList ExclusionConstraintElem @@ -3806,7 +3806,7 @@ ConstraintElem: $$ = (Node *)n; } | EXCLUDE access_method_clause '(' ExclusionConstraintList ')' -opt_c_include opt_definition OptConsTableSpace ExclusionWhereClause +opt_c_include opt_definition OptConsTableSpace OptWhereClause ConstraintAttributeSpec { Constraint *n = makeNode(Constraint); @@ -3908,7 +3908,7 @@ ExclusionConstraintElem: index_elem WITH any_operator } ; -ExclusionWhereClause: +OptWhereClause: WHERE '(' a_expr ')' { $$ = $3; } | /*EMPTY*/{ $$ = NULL; } ; -- 2.20.1 From 9b1d46d5c146d591022c12c7acec8d61d4e0bfcc Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Mon, 18 Jan 2021 12:07:51 -0300 Subject: [PATCH 2/6] Row filter for logical replication This feature adds a row filter for publication tables. When you define or modify a publication, you can optionally filter out rows that do not satisfy a WHERE condition. It allows you to partially replicate a database or set of tables. The row filter is per table, which means that you can define different row filters for different tables. A new row filter can be added simply by specifying the WHERE clause after the table name. The WHERE expression must be enclosed by parentheses. 
The WHERE clause should probably contain only columns that are part of the primary key or that are covered by REPLICA IDENTITY. Otherwise, UPDATEs and DELETEs won't be replicated. DELETE uses the old row version (which is limited to the primary key or REPLICA IDENTITY) to evaluate the row filter. INSERT and UPDATE use the new row version to evaluate the row filter, hence, you can use any column. If the row filter evaluates to NULL, it returns false. For simplicity, functions are not allowed; this could possibly be addressed in a future patch. If you choose to do the initial table synchronization, only data that satisfies the row filters is sent. If the subscription has several publications in which a table has been published with different WHERE clauses, rows must satisfy all expressions to be copied. If your publication contains a partitioned table, the parameter publish_via_partition_root determines whether it uses the partition's row filter (if the parameter is false -- the default) or the partitioned table's row filter. --- doc/src/sgml/catalogs.sgml | 8 + doc/src/sgml/ref/alter_publication.sgml | 11 +- doc/src/sgml/ref/create_publication.sgml| 38 +++- doc/src/sgml/ref/create_subscription.sgml | 8 +- src/backend/catalog/pg_publication.c| 52 - src/backend/commands/publicationcmds.c | 96 + src/backend/parser/gram.y | 28 ++- src/backend/parser/parse_agg.c | 10 + src/backend/parser/parse_expr.c | 13 ++ src/backend/parser/parse_func.c | 3 + src/backend/replication/logical/tablesync.c | 93 -
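The evaluation semantics described in the commit message (NULL counts as false; with several publications, every filter must pass) can be sketched like this. It illustrates the documented behaviour only, not the patch's executor code.

```python
def row_passes_filters(row, filters):
    """Return True only if every publication's WHERE expression is
    satisfied.  A filter returning None models a WHERE clause that
    evaluates to NULL, which is treated as false."""
    for row_filter in filters:
        result = row_filter(row)
        if result is None or not result:
            return False
    return True

# Two publications: one checks aid > 0, the other always evaluates to NULL.
filters = [lambda r: r["aid"] > 0, lambda r: None]
print(row_passes_filters({"aid": 1}, filters))  # False: NULL -> false
```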
Re: Interest in GSoC 2021 Projects
Hi, On Fri, Feb 26, 2021 at 03:56:17PM +0800, Yixin Shi wrote: > Hi, > > > > I am Yixin Shi, a junior student majoring in Computer Science at University > of Michigan. I notice that the project ideas for GSoC 2021 have been posted > on your webpage and I am quite interested in two of them. I really wish to > take part in the project *Make plsample a more complete procedural language > handler example (2021)* considering both my interest and ability. The > potential mentor for this Project is *Mark Wong*. I am also interested in > the project *Improve PostgreSQL Regression Test Coverage (2021)* and the > potential mentor is *Andreas Scherbaum* and *Stephen Frost*. > > > > I would like to learn more about these two projects but failed to contact > the mentors. How can I contact them? Also, I really hope to join the > project. Are there any suggestions on application? The idea for plsample is to continue providing working code and documentation examples so that plsample can be used as a template for creating additional language handlers. Depending on the individual's comfort level, some effort may need to be put into studying the current handlers for ideas on how to implement the lacking functionality in plsample. Does that help? Regards, Mark
Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.
On Fri, Feb 26, 2021 at 5:22 AM Amit Kapila wrote: > > https://www.postgresql.org/docs/devel/logical-replication-config.html > Ah, yep. I added a clause to the end of the sentence to clarify why we're using max_replication_slots here: - The subscriber also requires the max_replication_slots to be set. + The subscriber also requires that max_replication_slots be set to + configure how many replication origins can be tracked. > > Okay, that makes sense. However, I have sent a patch today (see [1]) > where I have slightly updated the subscriber-side configuration > paragraph. From PG-14 onwards, table synchronization workers also use > origins on subscribers, so you might want to adjust. > > ... > > I guess we can leave adding GUC to some other day as that might > require a bit broader acceptance and we are already near to the start > of last CF. I think we can still consider it if we few more people > share the same opinion as yours. > Great. I'll wait to update the GUC patch until your patch and/or my doc-only patch get merged. Should I add it to the March CF? Separate question: are documentation updates like these ever backported to older versions that are still supported? And if so, would the changes be reflected immediately, or would they require a minor point release? When I was on an older release I found that I'd jump back and forth between the version I was using and the latest version to see if anything had changed. - Paul max_replication_slots_subscriber_doc_v01.diff Description: Binary data
Remove latch.c workaround for Linux < 2.6.27
Hello, Commit 82ebbeb0 added a workaround for (already in 2017) ancient Linux kernels with no EPOLL_CLOEXEC. I don't see any such systems in the build farm today (and if there is one hiding in there somewhere, it's well past time to upgrade). I'd like to rip that code out, because I'm about to commit some new code that uses another 2.6.17+ XXX_CLOEXEC flag, and it'd be silly to have to write new workaround code for that too, and a contradiction to have fallback code in one place but not another. Any objections? From 9c6ac992f60a904f8073e62b1b8085ff9df7ae0b Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 27 Feb 2021 11:22:16 +1300 Subject: [PATCH] Remove latch.c workaround for Linux < 2.6.27. Ancient Linux had no EPOLL_CLOEXEC flag, so commit 82ebbeb0 added a separate code path with an fcntl() call. Kernels of that vintage are long gone. Now seems like a good time to drop this extra code, because otherwise we'd have to add similar workaround code to new patches using XXX_CLOEXEC flags to avoid contradicting ourselves. 
---
 src/backend/storage/ipc/latch.c | 21 +--------------------
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f2d005eea0..79b9627831 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -666,31 +666,12 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 		/* treat this as though epoll_create1 itself returned EMFILE */
 		elog(ERROR, "epoll_create1 failed: %m");
 	}
-#ifdef EPOLL_CLOEXEC
 	set->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
 	if (set->epoll_fd < 0)
 	{
 		ReleaseExternalFD();
 		elog(ERROR, "epoll_create1 failed: %m");
 	}
-#else
-	/* cope with ancient glibc lacking epoll_create1 (e.g., RHEL5) */
-	set->epoll_fd = epoll_create(nevents);
-	if (set->epoll_fd < 0)
-	{
-		ReleaseExternalFD();
-		elog(ERROR, "epoll_create failed: %m");
-	}
-	if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1)
-	{
-		int			save_errno = errno;
-
-		close(set->epoll_fd);
-		ReleaseExternalFD();
-		errno = save_errno;
-		elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m");
-	}
-#endif							/* EPOLL_CLOEXEC */
 #elif defined(WAIT_USE_KQUEUE)
 	if (!AcquireExternalFD())
 	{
@@ -736,7 +717,7 @@ CreateWaitEventSet(MemoryContext context, int nevents)
  *
  * Note: preferably, this shouldn't have to free any resources that could be
  * inherited across an exec().  If it did, we'd likely leak those resources in
- * many scenarios.  For the epoll case, we ensure that by setting FD_CLOEXEC
+ * many scenarios.  For the epoll case, we ensure that by setting EPOLL_CLOEXEC
  * when the FD is created.  For the Windows case, we assume that the handles
  * involved are non-inheritable.
  */
-- 
2.30.0
Re: SSL SNI
> Do you mean the IPv6 detection code is not correct? What is the problem?

This bit will recognize IPv4 addresses but not IPv6 addresses:

+	/*
+	 * Set Server Name Indication (SNI), but not if it's a literal IP address.
+	 * (RFC 6066)
+	 */
+	if (!(strspn(conn->pghost, "0123456789.") == strlen(conn->pghost) ||
+		  strchr(conn->pghost, ':')))
+	{
Re: Allow matching whole DN from a client certificate
On 2/26/21 2:55 PM, Jacob Champion wrote: > On Sat, 2021-01-30 at 16:18 -0500, Andrew Dunstan wrote: >> Making incremental additions to the certificate set easier wouldn't be a >> bad thing. >> >> I wonder if we should really be setting 1 as the serial number, though. >> Might it not be better to use, say, `date +%Y%m%d01` rather like we do >> with catalog version numbers? > I have been experimenting a bit with both of these suggestions; hope to > have something in time for commitfest on Monday. Writing new tests for > NSS has run into the same problems you've mentioned. > > FYI, I've pulled the port->peer_dn functionality you've presented here > into my authenticated identity patchset at [1]. > > --Jacob > > [1] > https://www.postgresql.org/message-id/flat/c55788dd1773c521c862e8e0dddb367df51222be.camel%40vmware.com Cool. I think the thing that's principally outstanding w.r.t. this patch is what format we should use to extract the DN. Should we use RFC2253, which reverses the field order, as has been suggested upthread and is in the latest patch? I'm slightly worried that it might be a POLA violation. But I don't have terribly strong feelings about it. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On 2021-Jan-10, Justin Pryzby wrote: > On Fri, Jan 08, 2021 at 04:14:33PM -0300, Alvaro Herrera wrote: > > > > I ended up with apparently broken constraint when running multiple > > > > loops around > > > > a concurrent detach / attach: > > > > > > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 > > > > CONCURRENTLY"; do :; done& > > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 > > > > CONCURRENTLY"; do :; done& > > > > > > > > "p1_check" CHECK (true) > > > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2) > > > > > > Not good. > > > > Haven't had time to investigate this problem yet. > > I guess it's because you commited the txn and released lock in the middle of > the command. Hmm, but if we take this approach, then we're still vulnerable to the problem that somebody can do DETACH CONCURRENTLY and cancel the wait (or crash the server), then mess up the state before doing DETACH FINALIZE: when they cancel the wait, the lock will be released. I think the right fix is to disallow any action on a partition which is pending detach other than DETACH FINALIZE. (Didn't do that here.) Here's a rebase to current sources; there are no changes from v5. -- Álvaro Herrera Valdivia, Chile "No hay ausente sin culpa ni presente sin disculpa" (Prov. francés) >From d042b99ad3368ba0b658ac9260450ed8e39accea Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 13 Jul 2020 20:15:30 -0400 Subject: [PATCH v6 1/2] Let ALTER TABLE exec routines deal with the relation This means that ATExecCmd relies on AlteredRelationInfo->rel instead of keeping the relation as a local variable; this is useful if the subcommand needs to modify the relation internally. For example, a subcommand that internally commits its transaction needs this. 
---
 src/backend/commands/tablecmds.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b2457a6924..b990063d38 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -156,6 +156,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -353,7 +355,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 					  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 							  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 					  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 					  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4513,7 +4515,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 	{
 		AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 		List	   *subcmds = tab->subcmds[pass];
-		Relation	rel;
 		ListCell   *lcmd;
 
 		if (subcmds == NIL)
@@ -4522,10 +4523,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		/*
 		 * Appropriate lock was obtained by phase 1, needn't get it again
 		 */
-		rel = relation_open(tab->relid, NoLock);
+		tab->rel = relation_open(tab->relid, NoLock);
 
 		foreach(lcmd, subcmds)
-			ATExecCmd(wqueue, tab, rel,
+			ATExecCmd(wqueue, tab,
 					  castNode(AlterTableCmd, lfirst(lcmd)),
 					  lockmode, pass, context);
 
@@ -4537,7 +4538,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		if (pass == AT_PASS_ALTER_TYPE)
 			ATPostAlterTypeCleanup(wqueue, tab, lockmode);
 
-		relation_close(rel, NoLock);
+		if (tab->rel)
+		{
+			relation_close(tab->rel, NoLock);
+			tab->rel = NULL;
+		}
 	}
 }
 
@@ -4563,11 +4568,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
  * ATExecCmd: dispatch a subcommand to appropriate execution routine
  */
 static void
-ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 		  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 		  AlterTableUtilityContext *context)
 {
 	ObjectAddress address = InvalidObjectAddress;
+	Relation	rel = tab->rel;
 
 	switch (cmd->subtype)
 	{
@@ -5670,6 +5676,7 @@ ATGetQueueEntry(List **wqueue, Relation rel)
 	 */
 	tab = (AlteredTableInfo *) palloc0(sizeof(AlteredTableInfo));
 	tab->relid = relid;
+	tab->rel = NULL;	/* set later */
 	tab->relkind = rel->rd_rel->relkind;
 	tab->oldDesc
authtype parameter in libpq
When looking at disallowing SSL compression I found the parameter "authtype" which was deprecated in commit d5bbe2aca55bc8 on January 26 1998. While I do think there is a case to be made for the backwards compatibility having run its course on this one, shouldn't we at least remove the environment variable and default compiled fallback for it to save us a getenv call when filling in the option defaults? -- Daniel Gustafsson https://vmware.com/ 0001-Remove-defaults-from-authtype-parameter.patch Description: Binary data
Re: Disallow SSL compression?
> On 26 Feb 2021, at 03:34, Michael Paquier wrote: > There is just pain waiting ahead when breaking connection strings > that used to work previously. A "while" could take a long time > though, see the case of "tty" that's still around (cb7fb3c). I see your tty removal from 2003, and raise you one "authtype" which was axed on January 26 1998 in commit d5bbe2aca55bc833e38c768d7f82, but which is still around. More on that in a separate thread though. -- Daniel Gustafsson https://vmware.com/
Re: Allow matching whole DN from a client certificate
On Sat, 2021-01-30 at 16:18 -0500, Andrew Dunstan wrote: > Making incremental additions to the certificate set easier wouldn't be a > bad thing. > > I wonder if we should really be setting 1 as the serial number, though. > Might it not be better to use, say, `date +%Y%m%d01` rather like we do > with catalog version numbers? I have been experimenting a bit with both of these suggestions; hope to have something in time for commitfest on Monday. Writing new tests for NSS has run into the same problems you've mentioned. FYI, I've pulled the port->peer_dn functionality you've presented here into my authenticated identity patchset at [1]. --Jacob [1] https://www.postgresql.org/message-id/flat/c55788dd1773c521c862e8e0dddb367df51222be.camel%40vmware.com
Re: More test/kerberos tweaks
On Fri, 2021-02-05 at 21:54 +, Jacob Champion wrote: > That just leaves the first patch, then. I've moved the first patch into the commitfest entry for [1], which depends on it. --Jacob [1] https://www.postgresql.org/message-id/flat/c55788dd1773c521c862e8e0dddb367df51222be.camel%40vmware.com
Re: Proposal: Save user's original authenticated identity for logging
On Thu, 2021-02-11 at 20:32 +, Jacob Champion wrote: > v2 just updates the patchset to remove the Windows TODO and fill in the > patch notes; no functional changes. The question about escaping log > contents remains. v3 rebases onto latest master, for SSL test conflicts. Note: - Since the 0001 patch from [1] is necessary for the new Kerberos tests in 0003, I won't make a separate commitfest entry for it. - 0002 would be subsumed by [2] if it's committed. --Jacob [1] https://www.postgresql.org/message-id/flat/fe7a46f8d46ebb074ba1572d4b5e4af72dc95420.camel%40vmware.com [2] https://www.postgresql.org/message-id/flat/fd96ae76-a8e3-ef8e-a642-a592f5b76771%40dunslane.net#642757cec955d8e923025898402f9452 From ed225e1d1671dcd4da94a2244171a206b88563cc Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Wed, 3 Feb 2021 11:04:42 -0800 Subject: [PATCH v3 1/3] test/kerberos: only search forward in logs The log content tests searched through the entire log file, from the beginning, for every match. If two tests shared the same expected log content, it was possible for the second test to get a false positive by matching against the first test's output. (This could be seen by modifying one of the expected-failure tests to expect the same output as a previous happy-path test.) Store the offset of the previous match, and search forward from there during the next match, resetting the offset every time the log file changes. This could still result in false positives if one test spits out two or more matching log lines, but it should be an improvement over what's there now. 
---
 src/test/kerberos/t/001_auth.pl | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 079321bbfc..c721d24dbd 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -182,6 +182,9 @@ $node->safe_psql('postgres', 'CREATE USER test1;');
 
 note "running tests";
 
+my $current_log_name = '';
+my $current_log_position;
+
 # Test connection success or failure, and if success, that query returns true.
 sub test_access
 {
@@ -221,18 +224,37 @@ sub test_access
 	$lfname =~ s/^stderr //;
 	chomp $lfname;
 
+	if ($lfname ne $current_log_name)
+	{
+		$current_log_name = $lfname;
+
+		# Search from the beginning of this new file.
+		$current_log_position = 0;
+		note "current_log_position = $current_log_position";
+	}
+
 	# might need to retry if logging collector process is slow...
 	my $max_attempts = 180 * 10;
 	my $first_logfile;
 	for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
 	{
 		$first_logfile = slurp_file($node->data_dir . '/' . $lfname);
-		last if $first_logfile =~ m/\Q$expect_log_msg\E/;
+
+		# Don't include previously matched text in the search.
+		$first_logfile = substr $first_logfile, $current_log_position;
+		if ($first_logfile =~ m/\Q$expect_log_msg\E/g)
+		{
+			$current_log_position += pos($first_logfile);
+			last;
+		}
+
 		usleep(100_000);
 	}
 
 	like($first_logfile, qr/\Q$expect_log_msg\E/,
 		 'found expected log file content');
+
+	note "current_log_position = $current_log_position";
 }
 return;
-- 
2.25.1

From def87ea906d37764d7d6a6e0c6d473ffacf2c801 Mon Sep 17 00:00:00 2001
From: Jacob Champion
Date: Mon, 8 Feb 2021 10:53:20 -0800
Subject: [PATCH v3 2/3] ssl: store client's DN in port->peer_dn

Original patch by Andrew Dunstan:

    https://www.postgresql.org/message-id/fd96ae76-a8e3-ef8e-a642-a592f5b76771%40dunslane.net

but I've taken out the clientname=DN functionality; all that will be
needed for the next patch is the DN string.
---
 src/backend/libpq/be-secure-openssl.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 src/include/libpq/libpq-be.h          |  1 +
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 4c4f025eb1..d0184a2ce2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -543,22 +543,25 @@ aloop:
 	/* Get client certificate, if available. */
 	port->peer = SSL_get_peer_certificate(port->ssl);
 
-	/* and extract the Common Name from it. */
+	/* and extract the Common Name / Distinguished Name from it. */
 	port->peer_cn = NULL;
+	port->peer_dn = NULL;
 	port->peer_cert_valid = false;
 	if (port->peer != NULL)
 	{
 		int			len;
+		X509_NAME  *x509name = X509_get_subject_name(port->peer);
+		char	   *peer_cn;
+		char	   *peer_dn;
+		BIO		   *bio = NULL;
+		BUF_MEM    *bio_buf = NULL;
 
-		len = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-										NID_commonName, NULL, 0);
+		len = X509_NAME_get_text_by_NID(x509name, NID_commonName, NULL, 0);
 		if (len != -1)
 		{
-			char	   *peer_cn;
-
 			peer_cn = MemoryContextAlloc(TopMemoryContext, len + 1);
-			r = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-										  NID_commonName, peer_cn, len + 1);
+			r =
Re: [PATCH] pgbench: Remove ecnt, a member variable of CState
On 2021-Feb-26, Michael Paquier wrote: > On Fri, Feb 26, 2021 at 05:39:31PM +0900, miyake_kouta wrote: > > Also, the current pgbench's client abandons processing after hitting error, > > so this variable is no need, I think. > > Agreed. Its last use was in 12788ae, as far as I can see. So let's > just cleanup that. +1 -- Álvaro Herrera Valdivia, Chile
Re: Disallow SSL compression?
> On 26 Feb 2021, at 11:02, Magnus Hagander wrote: > On Mon, Feb 22, 2021 at 12:28 PM Daniel Gustafsson wrote: >> >>> On 22 Feb 2021, at 11:52, Magnus Hagander wrote: >>> Agreed. It will also help with not having to implement it in new SSL >>> implementations I'm sure :) >> >> Not really, no other TLS library I would consider using actually has >> compression (except maybe wolfSSL?). GnuTLS and NSS both removed it, and >> Secure Transport and Schannel never had it AFAIK. > > Ah, well, you'd still have to implement some empty placeholder :) Correct. The attached removes sslcompression to see what it would look like. The server actively disallows it and the parameter is removed, but the sslcompression column in the stat view is retained. An alternative could be to retain the parameter but not act on it in order to not break scripts etc, but that just postpones the pain until we inevitably do remove it. Thoughts? Any reason to keep supporting SSL compression or is it time for v14 to remove it? Are there still users leveraging this for protocol compression without security making it worthwhile to keep? >>> >>> When the last round of discussion happened, I had multiple customers >>> who did exactly that. None of them do that anymore, due to the pain of >>> making it work... >> >> Unsurprisingly. >> >>> I think for libpq we want to keep the option for a while but making it >>> a no-op, to not unnecessarily break systems where people just upgrade >>> libpq, though. And document it as such having no effect and "will >>> eventually be removed". >> >> Agreed, that's better. > > In fact, pg_basebackup with -R will generate a connection string that > includes sslcompression=0 when used today (unless you jump through the > hoops to make it work), so not accepting that now would definitely > break a lot of things needlessly. Yup, and as mentioned elsewhere in the thread the standard way of doing it is to leave the param behind and just document it as not in use. 
Attached is a v2 which retains the sslcompression parameter for backwards compatibility. -- Daniel Gustafsson https://vmware.com/ v2-0001-Disallow-SSL-compression.patch Description: Binary data
Systems Integration and Raising of the Abstraction Level
We have highly evolved systems such as SQL, HTTP, HTML, file formats or high-level programming languages such as Java or PHP that allow us to program many things with little code. Even so, a lot of effort is invested in the integration of these systems. To try to reduce this problem, libraries and frameworks are created that help in some ways, but the integration is never complete. It is well known that most of the time, when you try to create something to integrate several incompatible systems, what you get in the end is yet another incompatible system :). Still, I think the integration between the systems mentioned above is very important, can mean a great step in the evolution of computing, and is worth a try.

To explore what this integration could look like, I have created a framework that I have called NextTypes, whose main objective is the integration of data types. It seems very illogical to me that something as basic as a 16-bit integer receives a different name in each of the systems ("smallint", "short", "number") and can also be signed or unsigned. At any moment, due to a mistake by the programmer, the number in the programming language may not fit in the database column. Besides, these names say little about their characteristics; it would be clearer, for example, to use "int16". Whatever names are chosen, the most important thing is to use the same names in all systems for types with the same characteristics. Also, there is no standard system for defining composite data types out of primitive types and other composite types. From an HTML form to a SQL table, or from one application to another, multiple transformations of the data are required.

Lack of integration also lowers the level of abstraction, making it necessary to do lots of low-level work to make systems fit together. NextTypes at this moment is nothing more than another system incompatible with the others. It simply integrates them quite a bit and raises the level of abstraction. 
But what I would like is that the things that compose NextTypes were included in the systems it integrates.

Finally, I would like to list some examples of improvements in database managers, SQL, HTTP, HTML, browsers and programming languages that would help the integration and the elevation of the level of abstraction. Some of these enhancements are already included in NextTypes and other frameworks.

SQL
---

- Custom metadata in tables and columns.
- Date of creation and modification of the rows.
- Date of creation and modification of the definition of the tables.
- Use of table and column names in prepared statements. Example:

    select # from # where # = ?;

- Use of arrays in prepared statements. Example:

    select # from article where id in (?);

    # = author,title
    ? = 10,24,45

- Standardization of ranges of valid values and resolution for date, time and datetime types in database managers and the HTML time element.

PostgreSQL
----------

- Facilitate access to the definition of full text search indexes with a function to parse the "pg_index.indexprs" column.

Other database managers
-----------------------

- Allow transactional DDL, deferrable constraints and composite types.

JDBC
----

- High level methods that allow queries with the execution of a single method. Example:

    Tuple[] tuples = query("select author,title from article where id in (?)", ids);

- Integration with java.time data types.

HTTP - Servers
--------------

- Processing of arrays of elements composed of several parameters:

    fields:0:type = string
    fields:0:name = title
    fields:0:parameters = 250
    fields:0:not_null = true
    fields:1:type = string
    fields:1:name = author
    fields:1:parameters = 250
    fields:1:not_null = true

  Another possibility is to generate in the browser arrays of JSON objects from the forms. 
"fields": [ { "type": "string", "name": "title", "parameters": 250, "not_null": true }, { "type": "string", "name": "author", "parameters": 250, "not_null": true } ] XML/HTML - BROWSER -- - Input elements for different types of numbers with min and max values: 16 bits integer, 32 bits integer, 32 bits float and 64 bits float. - Input elements for images, audios and videos with preview. - Timezone input element. - Boolean input element with "true" and "false" values. - Null value in file inputs. - Clear button in file inputs like in date and time inputs. - Show size in file inputs. - Extension of the DOM API with high level and chainable methods. Example: paragraph.appendElement("a").setAttribute("href", "/article"); - Change of the "action" parameter of the forms to "target" to indicate the URL where to execute the action. The "action" parameter is moved to the different buttons on the form and allows executing a different action with each of the the buttons.
Re: Some regular-expression performance hacking
"Joel Jacobson" writes: > On Fri, Feb 26, 2021, at 01:16, Tom Lane wrote: >> 0007-smarter-regex-allocation-2.patch > I've successfully tested this patch. Cool, thanks for testing! > That's a 29% speed-up compared to HEAD! Truly amazing. Hmm, I'm still only seeing about 10% or a little better. I wonder why the difference in your numbers. Either way, though, I'll take it, since the main point here is to cut memory consumption and not so much cycles. regards, tom lane
Re: CREATE INDEX CONCURRENTLY on partitioned index
On Mon, Feb 15, 2021 at 10:07:05PM +0300, Anastasia Lubennikova wrote:
> 5) Speaking of documentation, I think we need to add a paragraph about CIC
> on partitioned indexes which will explain that invalid indexes may appear
> and what user should do to fix them.

I'm not sure about that - it's already documented in general, for
nonpartitioned indexes.

-- 
Justin

From fb60da3c0fac8f1699a6caeea57476770c66576d Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 6 Jun 2020 17:42:23 -0500
Subject: [PATCH 1/5] Allow CREATE INDEX CONCURRENTLY on partitioned table

Note, this effectively reverts 050098b14, so take care to not reintroduce
the bug it fixed.

---
 doc/src/sgml/ref/create_index.sgml     |   9 --
 src/backend/commands/indexcmds.c       | 143 ++---
 src/test/regress/expected/indexing.out |  60 ++-
 src/test/regress/sql/indexing.sql      |  18 +++-
 4 files changed, 176 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 965dcf472c..7c75119d78 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -686,15 +686,6 @@ Indexes:
     cannot.
-
-    Concurrent builds for indexes on partitioned tables are currently not
-    supported.  However, you may concurrently build the index on each
-    partition individually and then finally create the partitioned index
-    non-concurrently in order to reduce the time where writes to the
-    partitioned table will be locked out.  In this case, building the
-    partitioned index is a metadata only operation.
-
-
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8bc652ecd3..9ab1a66971 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,6 +68,7 @@
 
 /* non-export function prototypes */
+static void reindex_invalid_child_indexes(Oid indexRelationId);
 static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
 static void ComputeIndexAttrs(IndexInfo *indexInfo,
@@ -680,17 +681,6 @@ DefineIndex(Oid relationId,
 	partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
 	if (partitioned)
 	{
-		/*
-		 * Note: we check 'stmt->concurrent' rather than 'concurrent', so that
-		 * the error is thrown also for temporary tables.  Seems better to be
-		 * consistent, even though we could do it on temporary table because
-		 * we're not actually doing it concurrently.
-		 */
-		if (stmt->concurrent)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot create index on partitioned table \"%s\" concurrently",
-							RelationGetRelationName(rel))));
 		if (stmt->excludeOpNames)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1128,6 +1118,11 @@ DefineIndex(Oid relationId,
 		if (pd->nparts != 0)
 			flags |= INDEX_CREATE_INVALID;
 	}
+	else if (concurrent && OidIsValid(parentIndexId))
+	{
+		/* If concurrent, initially build index partitions as "invalid" */
+		flags |= INDEX_CREATE_INVALID;
+	}
 
 	if (stmt->deferrable)
 		constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE;
@@ -1183,6 +1178,14 @@ DefineIndex(Oid relationId,
 		partdesc = RelationGetPartitionDesc(rel);
 		if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0)
 		{
+			/*
+			 * Need to close the relation before recursing into children, so
+			 * copy needed data into a longlived context.
+			 */
+
+			MemoryContext ind_context = AllocSetContextCreate(PortalContext, "CREATE INDEX",
+															  ALLOCSET_DEFAULT_SIZES);
+			MemoryContext oldcontext = MemoryContextSwitchTo(ind_context);
 			int			nparts = partdesc->nparts;
 			Oid		   *part_oids = palloc(sizeof(Oid) * nparts);
 			bool		invalidate_parent = false;
@@ -1193,8 +1196,10 @@ DefineIndex(Oid relationId,
 					nparts);
 
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
+			parentDesc = CreateTupleDescCopy(RelationGetDescr(rel));
+			table_close(rel, NoLock);
+			MemoryContextSwitchTo(oldcontext);
 
-			parentDesc = RelationGetDescr(rel);
 			opfamOids = palloc(sizeof(Oid) * numberOfKeyAttributes);
 			for (i = 0; i < numberOfKeyAttributes; i++)
 				opfamOids[i] = get_opclass_family(classObjectId[i]);
@@ -1237,10 +1242,12 @@ DefineIndex(Oid relationId,
 					continue;
 				}
 
+				oldcontext = MemoryContextSwitchTo(ind_context);
 				childidxs = RelationGetIndexList(childrel);
 				attmap =
 					build_attrmap_by_name(RelationGetDescr(childrel),
 										  parentDesc);
+				MemoryContextSwitchTo(oldcontext);
 
 				foreach(cell, childidxs)
 				{
@@ -1311,10 +1318,14 @@ DefineIndex(Oid relationId,
 				 */
 				if (!found)
 				{
-					IndexStmt  *childStmt = copyObject(stmt);
+					IndexStmt  *childStmt;
 					bool		found_whole_row;
 					ListCell   *lc;
 
+					oldcontext = MemoryContextSwitchTo(ind_context);
+					childStmt = copyObject(stmt);
+
Re: Interest in GSoC 2021 Projects
Greetings! * Yixin Shi (es...@umich.edu) wrote: > I am Yixin Shi, a junior student majoring in Computer Science at University > of Michigan. I notice that the project ideas for GSoC 2021 have been posted > on your webpage and I am quite interested in two of them. I really wish to > take part in the project *Make plsample a more complete procedural language > handler example (2021)* considering both my interest and ability. The > potential mentor for this Project is *Mark Wong*. I am also interested in > the project *Improve PostgreSQL Regression Test Coverage (2021)* and the > potential mentor is *Andreas Scherbaum* and *Stephen Frost*. > > I would like to learn more about these two projects but failed to contact > the mentors. How can I contact them? Also, I really hope to join the > project. Are there any suggestions on application? Glad to hear you're interested- I'm one of the mentors you mention. You've found me, but I do want to point out that I'm also listed with my email address at: https://wiki.postgresql.org/wiki/GSoC ; the top-level PostgreSQL GSoC page (I've also added a link to that from the GSoC 2021 project ideas page). Note that PostgreSQL hasn't yet been formally accepted into the GSoC 2021 program. Google has said they intend to announce the chosen organizations on March 9th. Of course, you're welcome to start working on your application even ahead of that date if you wish to do so. Google provides a lot of useful information for prospective students here: https://google.github.io/gsocguides/student/ including a lot of information about how to write a solid proposal. For the regression test coverage, the first step would be to review, as suggested in the project idea, the current state of coverage which you can see here: https://coverage.postgresql.org/ You'd also want to pull PG down and make sure that you have an environment where you can build PG and run the regression tests and build the coverage report yourself. 
With that done, you can consider what areas you'd like to tackle, perhaps starting out with some of the simpler cases such as extensions that maybe don't have any test coverage today, or the various tools which have zero or very little. I've also CC'd Mark so he can provide some comments regarding plsample. Thanks, Stephen signature.asc Description: PGP signature
Re: SSL SNI
Greetings, * Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote: > A customer asked about including Server Name Indication (SNI) into the SSL > connection from the client, so they can use an SSL-aware proxy to route > connections. There was a thread a few years ago where this was briefly > discussed but no patch appeared.[0] I whipped up a quick patch and it did > seem to do the job, so I figured I'd share it here. This doesn't actually result in the ability to do that SSL connection proxying though, does it? At the least, whatever the proxy is would have to be taught how to send back an 'S' to the client, and send an 'S' to the server chosen after the client sends along the TLS ClientHello w/ SNI set, and then pass the traffic between afterwards. Perhaps it's worth doing this to allow proxy developers to do that, but this isn't enough to make it actually work without the proxy actually knowing PG and being able to be configured to do the right thing for the PG protocol. I would think that, ideally, we'd have some proxy author who would be willing to actually implement this and test that it all works with this patch applied, and then make sure to explain that proxies will need to be adapted to be able to work. Simply including this and then putting in the release notes that we now provide SNI as part of the SSL connection would likely lead people to believe that it'll 'just work'. Perhaps to manage expectations we'd want to say something like: - libpq will now include Server Name Indication as part of the PostgreSQL SSL startup process; proxies will need to understand the PostgreSQL protocol in order to be able to leverage this to perform routing. Or something along those lines, I would think. Of course, such a proxy would need to also understand how to tell a client that, for example, GSSAPI encryption isn't available if a 'G' came first from the client, and what to do if a plaintext connection was requested. 
> The question I had was whether this should be an optional behavior, or > conversely a behavior that can be turned off, or whether it should just be > turned on all the time. Certainly seems like something that we should support turning off, at least. As mentioned elsewhere, knowing the system that's being connected to is certainly interesting to attackers. Thanks, Stephen signature.asc Description: PGP signature
Re: Some regular-expression performance hacking
Hi,

On Fri, Feb 26, 2021, at 01:16, Tom Lane wrote:
> 0007-smarter-regex-allocation-2.patch

I've successfully tested this patch. I had to re-create the performance_test table, since some cases that previously didn't give an error now give the error "invalid regular expression: invalid character range". This is expected and of course an improvement, but I just wanted to explain why the number of rows doesn't match the previous test runs.

CREATE TABLE performance_test AS
SELECT subjects.subject, patterns.pattern, patterns.flags, tests.is_match, tests.captured
FROM tests
JOIN subjects ON subjects.subject_id = tests.subject_id
JOIN patterns ON patterns.pattern_id = subjects.pattern_id
WHERE tests.error IS NULL
--
-- the below part is added to ignore cases
-- that now result in an error:
--
AND NOT EXISTS (
    SELECT 1 FROM deviations
    WHERE deviations.test_id = tests.test_id
    AND deviations.error IS NOT NULL
);
SELECT 3253889

Comparing 13.2 with HEAD, not a single test resulted in a different is_match value, i.e. for the test just using the ~ regex operator to only check whether it matches or not. Good.

SELECT COUNT(*)
FROM deviations
JOIN tests ON tests.test_id = deviations.test_id
WHERE tests.is_match <> deviations.is_match;

 count
-------
     0
(1 row)

The below query shows a frequency count per error message:

SELECT error, COUNT(*) FROM deviations GROUP BY 1 ORDER BY 2 DESC;

                        error                        | count
-----------------------------------------------------+--------
                                                     | 106173
 regexp_match() does not support the "global" option |   5799
 invalid regular expression: invalid character range |   1060
 invalid regular expression option: "y"              |    277
(4 rows)

As we can see, 106173 cases that previously gave an error now go through without one. This is thanks to now allowing escape sequences within bracket expressions. The other errors are expected and all good.

End of correctness analysis. Now let's look at performance! I reran the same query three times to get a feeling for the stddev.

\timing
SELECT is_match <> (subject ~ pattern),
       captured IS DISTINCT FROM regexp_match(subject, pattern, flags),
       COUNT(*)
FROM performance_test
GROUP BY 1,2 ORDER BY 1,2;

 ?column? | ?column? |  count
----------+----------+---------
 f        | f        | 3253889
(1 row)

HEAD (b3a9e9897ec702d56602b26a8cdc0950f23b29dc)

Time: 125938.747 ms (02:05.939)
Time: 125414.792 ms (02:05.415)
Time: 126185.496 ms (02:06.185)

HEAD (b3a9e9897ec702d56602b26a8cdc0950f23b29dc) + 0007-smarter-regex-allocation-2.patch

 ?column? | ?column? |  count
----------+----------+---------
 f        | f        | 3253889
(1 row)

Time: 89145.030 ms (01:29.145)
Time: 89083.210 ms (01:29.083)
Time: 89166.442 ms (01:29.166)

That's a 29% speed-up compared to HEAD! Truly amazing.

Let's have a look at the total speed-up compared to PostgreSQL 13. In my previous benchmarks testing against old versions, I used precompiled binaries, but this time I compiled REL_13_STABLE:

Time: 483390.132 ms (08:03.390)

That's an 82% speed-up in total! Amazing!

/Joel
Re: Postgresql network transmission overhead
"Michael J. Baars" writes: > In the logfile you can see that the effective user data being written is only > 913kb, while the actual being transmitted over the network is 7946kb when > writing > one row at a time. That is an overhead of 770%! So ... don't write one row at a time. You haven't shown any details, but I imagine that most of the overhead comes from per-query stuff like the RowDescription metadata. The intended usage pattern for bulk operations is that there's only one RowDescription message for a whole lot of data rows. There might be reasons you want to work a row at a time, but if your concern is to minimize network traffic, don't do that. regards, tom lane
RE: [HACKERS] logical decoding of two-phase transactions
Hi On Thursday, February 25, 2021 4:02 PM Peter Smith wrote: > Please find attached the latest patch set v43* > > - Added new patch 0007 "Fix apply worker prepare" as discussed here [2] > > [2] > https://www.postgresql.org/message-id/CAA4eK1L%3DdhuCRvyDvrXX5wZ > gc7s1hLRD29CKCK6oaHtVCPgiFA%40mail.gmail.com I tested the scenario described in [2], where we ended up skipping prepared transaction data and the replica became out of sync. And, as you said, the problem is addressed in v43. I used twophase_snapshot.spec as a reference for the flow (e.g. how to make a consistent snapshot between prepare and commit prepared), and this time, as an alternative to the SQL API (pg_create_logical_replication_slot), I issued CREATE SUBSCRIPTION; other than that, I mainly followed the flow in the spec file. I checked that the replica has the same data at the end of this test, which means the spoolfile mechanism works. Best Regards, Takamichi Osumi
Re: repeated decoding of prepared transactions
On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian wrote: > > On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian wrote: > > > I've updated snapshot_was_exported_at_ member to pg_replication_slots as > > well. > > Do have a look and let me know if there are any comments. > > Update with both patches. Thanks for fixing and providing an updated patch. Patch applies, make check and make check-world passes. I could see the issue working fine. Few minor comments: + snapshot_was_exported_at pg_lsn + + + The address (LSN) at which the logical + slot found a consistent point at the time of slot creation. + NULL for physical slots. + + I had seen earlier also we had some discussion on naming snapshot_was_exported_at. Can we change snapshot_was_exported_at to snapshot_exported_lsn, I felt if we can include the lsn in the name, the user will be able to interpret easily and also it will be similar to other columns in pg_replication_slots view. L.restart_lsn, L.confirmed_flush_lsn, + L.snapshot_was_exported_at, L.wal_status, L.safe_wal_size Looks like there is some indentation issue here. Regards, Vignesh
Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.
On Fri, Feb 26, 2021 at 1:53 AM Paul Martinez wrote: > > On Thu, Feb 25, 2021 at 5:31 AM Amit Kapila wrote: > > > > For docs only patch, I have few suggestions: > > 1. On page [1], it is not very clear that we are suggesting to set > > max_replication_slots for origins whereas your new doc patch has > > clarified it, can we update the other page as well. > > Sorry, what other page are you referring to? > https://www.postgresql.org/docs/devel/logical-replication-config.html > > > 2. > > Setting it a lower value than the current > > + number of tracked replication origins (reflected in > > + > linkend="view-pg-replication-origin-status">pg_replication_origin_status, > > + not > linkend="catalog-pg-replication-origin">pg_replication_origin) > > + will prevent the server from starting. > > + > > > > Why can't we just mention pg_replication_origin above? > > > > So this is slightly confusing: > > pg_replication_origin just contains mappings from origin names to oids. > It is regular catalog table and has no limit on its size. Users can also > manually insert rows into this table. > > https://www.postgresql.org/docs/13/catalog-pg-replication-origin.html > > The view showing the in-memory information is actually > pg_replication_origin_status. The number of entries here is what is > actually constrained by the GUC parameter. > Okay, that makes sense. However, I have sent a patch today (see [1]) where I have slightly updated the subscriber-side configuration paragraph. From PG-14 onwards, table synchronization workers also use origins on subscribers, so you might want to adjust. > > > This also brings up a point regarding the naming of the added GUC. > max_replication_origins is cleanest, but has this confusion regarding > pg_replication_origin vs. pg_replication_origin_status. > max_replication_origin_statuses is weird (and long). > max_tracked_replication_origins is a possibility? > or maybe max_replication_origin_states. 
I guess we can leave adding the GUC for another day, as that might require broader acceptance and we are already near the start of the last CF. I think we can still consider it if a few more people share your opinion. [1] - https://www.postgresql.org/message-id/CAA4eK1KkbppndxxRKbaT2sXrLkdPwy44F4pjEZ0EDrVjD9MPjQ%40mail.gmail.com -- With Regards, Amit Kapila.
Re: [PATCH] pgbench: Remove ecnt, a member variable of CState
On Fri, Feb 26, 2021 at 05:39:31PM +0900, miyake_kouta wrote: > Also, the current pgbench's client abandons processing after hitting error, > so this variable is no need, I think. Agreed. Its last use was in 12788ae, as far as I can see. So let's just clean that up. -- Michael signature.asc Description: PGP signature
Re: [PATCH] pgbench: Bug fix for the -d option
On Fri, Feb 26, 2021 at 05:16:17PM +0900, Michael Paquier wrote: > Yes, the existing code could mess up with the selection logic if > PGPORT and PGUSER are both specified in an environment, masking the > value of PGUSER, so let's fix that. This is as old as 412893b. By the way, I can't help but wonder why pgbench handles the user name so differently, fetching PGUSER first and then looking at the options, while most of the other binaries use get_user_name(). It seems to me that we could simplify the handling around "login" without really impacting the usefulness of the tool, no? -- Michael signature.asc Description: PGP signature
Re: Add --tablespace option to reindexdb
On Fri, Feb 26, 2021 at 11:00:13AM +0100, Daniel Gustafsson wrote: > Some other small comments: > > + Assert(PQserverVersion(conn) >= 14); > Are these assertions really buying us much when we already check the server > version in reindex_one_database()? I found these helpful when working on vacuumdb and refactoring the code, though I'd agree this code may not justify going down to that. > + printf(_(" --tablespace=TABLESPACE reindex on specified > tablespace\n")); > While not introduced by this patch, I realized that we have a mix of > conventions for how to indent long options which don't have a short option. > Under "Connection options" all options are left-justified but under "Options" > all long-options are aligned with space indentation for missing shorts. Some > tools do it like this, where others like createuser left justifies under > Options as well. Maybe not the most pressing issue, but consistency is always > good in user interfaces so maybe it's worth addressing (in a separate patch)? Yeah, consistency is good, though I am not sure which one would be consistent here actually as there is no defined rule. For this one, I think that I would keep what I have to be consistent with the existing inconsistency (?). In short, I would just add --tablespace the same way there is --concurrently, without touching the connection option part. -- Michael signature.asc Description: PGP signature
Re: non-HOT update not looking at FSM for large tuple update
On Wed, Feb 24, 2021 at 6:29 PM Floris Van Nee wrote: > > > > That makes sense, although the exact number seems precisely tailored to your use case. 2% gives 164 bytes of slack and doesn't seem too large. Updated patch attached. > > Yeah, I tried picking it as conservative as I could, but 2% is obviously great too. :-) I can't think of any large drawbacks either of having a slightly larger value. > Thanks for posting the patch! I've added this to the commitfest as a bug fix and added you as an author. -- John Naylor EDB: http://www.enterprisedb.com
Re: Optimising latch signals
Here's a new version with two small changes from Andres: 1. Reorder InitPostmasterChild() slightly to avoid hanging on EXEC_BACKEND builds. 2. Revert v2's use of raise(x) instead of kill(MyProcPid, x); glibc manages to generate 5 syscalls for raise(). I'm planning to commit this soon if there are no objections. From acacbf06aa54b4b139553b08d7f8511b0aaa0331 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 8 Aug 2020 15:08:09 +1200 Subject: [PATCH v5 1/5] Optimize latches to send fewer signals. Don't send signals to processes that aren't currently sleeping. Author: Andres Freund Discussion: https://postgr.es/m/ca+hukgjjxpdpzbe0a3hyuywbvazuc89yx3jk9rfzgfv_khu...@mail.gmail.com --- src/backend/storage/ipc/latch.c | 24 src/include/storage/latch.h | 1 + 2 files changed, 25 insertions(+) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index f2d005eea0..04aed96207 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -274,6 +274,7 @@ void InitLatch(Latch *latch) { latch->is_set = false; + latch->maybe_sleeping = false; latch->owner_pid = MyProcPid; latch->is_shared = false; @@ -321,6 +322,7 @@ InitSharedLatch(Latch *latch) #endif latch->is_set = false; + latch->maybe_sleeping = false; latch->owner_pid = 0; latch->is_shared = true; } @@ -523,6 +525,10 @@ SetLatch(Latch *latch) latch->is_set = true; + pg_memory_barrier(); + if (!latch->maybe_sleeping) + return; + #ifndef WIN32 /* @@ -589,6 +595,7 @@ ResetLatch(Latch *latch) { /* Only the owner should reset the latch */ Assert(latch->owner_pid == MyProcPid); + Assert(latch->maybe_sleeping == false); latch->is_set = false; @@ -1289,6 +1296,14 @@ WaitEventSetWait(WaitEventSet *set, long timeout, * ordering, so that we cannot miss seeing is_set if a notification * has already been queued. 
*/ + if (set->latch && !set->latch->is_set) + { + /* about to sleep on a latch */ + set->latch->maybe_sleeping = true; + pg_memory_barrier(); + /* and recheck */ + } + if (set->latch && set->latch->is_set) { occurred_events->fd = PGINVALID_SOCKET; @@ -1299,6 +1314,9 @@ WaitEventSetWait(WaitEventSet *set, long timeout, occurred_events++; returned_events++; + /* could have been set above */ + set->latch->maybe_sleeping = false; + break; } @@ -1310,6 +1328,12 @@ WaitEventSetWait(WaitEventSet *set, long timeout, rc = WaitEventSetWaitBlock(set, cur_timeout, occurred_events, nevents); + if (set->latch) + { + Assert(set->latch->maybe_sleeping); + set->latch->maybe_sleeping = false; + } + if (rc == -1) break;/* timeout occurred */ else diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index 1468f30a8e..393591be03 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -110,6 +110,7 @@ typedef struct Latch { sig_atomic_t is_set; + sig_atomic_t maybe_sleeping; bool is_shared; int owner_pid; #ifdef WIN32 -- 2.30.0 From af31f00b4df0b65f177696891180345da8dabaee Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 26 Nov 2020 10:18:29 +1300 Subject: [PATCH v5 2/5] Use SIGURG rather than SIGUSR1 for latches. Traditionally, SIGUSR1 has been overloaded for ad-hoc signals, procsignal.c signals and latch.c wakeups. Move that last use over to a new dedicated signal. SIGURG is normally used to report out-of-band socket data, but PostgreSQL doesn't use that. The signal handler is now installed in all postmaster children by InitializeLatchSupport(). Those wishing to disconnect from it should call ShutdownLatchSupport() which will set it back to SIG_IGN. Future patches will use this separation to avoid the need for a signal handler on some operating systems. 
Discussion: https://postgr.es/m/ca+hukgjjxpdpzbe0a3hyuywbvazuc89yx3jk9rfzgfv_khu...@mail.gmail.com --- src/backend/postmaster/bgworker.c| 19 ++- src/backend/postmaster/postmaster.c | 4 +++ src/backend/storage/ipc/latch.c | 50 ++-- src/backend/storage/ipc/procsignal.c | 2 -- src/include/storage/latch.h | 11 +- 5 files changed, 40 insertions(+), 46 deletions(-) diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 6fdea3fc2d..bbbc09b0b5 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -713,22 +713,6 @@ bgworker_die(SIGNAL_ARGS) MyBgworkerEntry->bgw_type))); } -/* - * Standard SIGUSR1 handler for unconnected workers - * - * Here, we want to make sure an unconnected worker will at least heed - * latch activity. - */ -static void -bgworker_sigusr1_handler(SIGNAL_ARGS) -{ - int save_errno = errno; - - latch_sigusr1_handler(); - - errno = save_errno; -} - /* * Start a new background worker * @@ -759,6 +743,7 @@ StartBackgroundWorker(void) */ if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0)
Re: [HACKERS] logical decoding of two-phase transactions
On Fri, Feb 26, 2021 at 9:56 AM Amit Kapila wrote: > > On Thu, Feb 25, 2021 at 12:32 PM Peter Smith wrote: > > > > 5. You need to write/sync the spool file at prepare time because after > restart between prepare and commit prepared the changes can be lost > and won't be resent by the publisher assuming there are commits of > other transactions between prepare and commit prepared. For the same > reason, I am not sure if we can just rely on the in-memory hash table > for it (prepare_spoolfile_exists). Sure, if it exists and there is no > restart then it would be cheap to check in the hash table but I don't > think it is guaranteed. > As we can't rely on the hash table, I think we can get rid of it and always check if the corresponding file exists. Few more comments on v43-0007-Fix-apply-worker-empty-prepare 1. + * So the "table_states_not_ready" list might end up having a READY + * state it it even though The above sentence doesn't sound correct to me. 2. @@ -759,6 +798,79 @@ apply_handle_begin_prepare(StringInfo s) { .. + */ + if (!am_tablesync_worker()) + { I think here we should have an Assert for tablesync worker because it should never receive prepare. 3. + while (BusyTablesyncs()) + { + elog(DEBUG1, "apply_handle_begin_prepare - waiting for all sync workers to be DONE/READY"); + + process_syncing_tables(begin_data.end_lsn); .. + if (begin_data.end_lsn < BiggestTablesyncLSN() In both the above places, you need to use begin_data.final_lsn because the prepare is yet not replayed so we can't use its end_lsn for syncup. 4. +/* + * Are there any tablesyncs which have still not yet reached SYNCDONE/READY state? + */ +bool +BusyTablesyncs() The function name is not clear enough. Can we change it to something like AnyTableSyncInProgress? 5. +/* + * Are there any tablesyncs which have still not yet reached SYNCDONE/READY state? + */ +bool +BusyTablesyncs() { .. 
+ /* + * XXX - When the process_syncing_tables_for_sync changes the state + * from SYNCDONE to READY, that change is actually written directly In the above comment, do you mean to process_syncing_tables_for_apply because that is where we change state to READY? And, I don't think we need to mark this comment as XXX. 6. + * XXX - Is there a potential timing problem here - e.g. if signal arrives + * while executing this then maybe we will set table_states_valid without + * refetching them? + */ +static void +FetchTableStates(bool *started_tx) .. Can you explain which race condition you are worried about here which is not possible earlier but can happen after this patch? 7. @@ -941,6 +1162,26 @@ apply_handle_stream_prepare(StringInfo s) elog(DEBUG1, "received prepare for streamed transaction %u", xid); /* + * Wait for all the sync workers to reach the SYNCDONE/READY state. + * + * This is same waiting logic as in appy_handle_begin_prepare function + * (see that function for more details about this). + */ + if (!am_tablesync_worker()) + { + while (BusyTablesyncs()) + { + process_syncing_tables(prepare_data.end_lsn); + + /* This latch is to prevent 100% CPU looping. */ + (void) WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + 1000L, WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE); + ResetLatch(MyLatch); + } + } I think we need similar handling in stream_prepare as in begin_prepare for writing to spool file because this has the same danger. But here we need to write it xid spool file in StreamXidHash. Another thing we need to ensure is to sync that file in stream prepare so that it can survive restarts. Then in apply_handle_commit_prepared, after checking for prepared spool file, we need to check the existence of xid spool file, and if the same exists then apply messages from that file. Again, like begin_prepare, in apply_handle_stream_prepare also we should have an Assert for table sync worker. 
I feel the combination of 2PC and streaming is a bit complicated to deal with. How about, for now, we don't allow users to enable streaming if the 2PC option is enabled for a subscription? This requires some change (error out if both the streaming and 2PC options are enabled) in both CREATE SUBSCRIPTION and ALTER SUBSCRIPTION, but that change should be fairly small. If we follow this, then in apply_dispatch (for case LOGICAL_REP_MSG_STREAM_PREPARE), we should report an ERROR "invalid logical replication message type". -- With Regards, Amit Kapila.
Re: repeated decoding of prepared transactions
On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian wrote: > I've updated snapshot_was_exported_at_ member to pg_replication_slots as > well. > Do have a look and let me know if there are any comments. Update with both patches. regards, Ajin Cherian Fujitsu Australia v4-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch Description: Binary data v4-0001-Avoid-repeated-decoding-of-prepared-transactions.patch Description: Binary data
Re: REINDEX backend filtering
On Fri, Feb 26, 2021 at 11:07 AM Julien Rouhaud wrote: > > On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander wrote: > > > > I don't agree with the conclusion though. > > > > The most common case of that will be in the case of an upgrade. In > > that case I want to reindex all of those indexes as quickly as > > possible. So I'll want to parallelize it across multiple sessions > > (like reindexdb -j 4 or whatever). But doesn't putting the filter in > > the grammar prevent me from doing exactly that? Each of those 4 (or > > whatever) sessions would have to guess which would go where and then > > speculatively run the command on that, instead of being able to > > directly distribute the worload? > > It means that you'll have to distribute the work on a per-table basis > rather than a per-index basis. The time spent to find out that a > table doesn't have any impacted index should be negligible compared to > the cost of running a reindex. This obviously won't help that much if > you have a lot of table but only one being gigantic. Yeah -- or at least a couple of large and many small, which I find to be a very common scenario. Or the case of some tables having many affected indexes and some having few. You'd basically want to order the operation by table on something like "total size of the affected indexes on table x" -- which may very well put a smaller table with many indexes earlier in the queue. But you can't do that without having access to the filter > But even if we put the logic in the client, this still won't help as > reindexdb doesn't support multiple job with an index list: > > * Index-level REINDEX is not supported with multiple jobs as we > * cannot control the concurrent processing of multiple indexes > * depending on the same relation. > */ > if (concurrentCons > 1 && indexes.head != NULL) > { > pg_log_error("cannot use multiple jobs to reindex indexes"); > exit(1); > } That sounds like it would be a fixable problem though, in principle. 
It could/should probably still limit all indexes on the same table to be processed in the same connection for the locking reasons of course, but doing an order by the total size of the indexes like above, and ensuring that they are grouped that way, doesn't sound *that* hard. I doubt it's that important in the current usecase of manually listing the indexes, but it would be useful for something like this. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: REINDEX backend filtering
On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander wrote: > > I don't agree with the conclusion though. > > The most common case of that will be in the case of an upgrade. In > that case I want to reindex all of those indexes as quickly as > possible. So I'll want to parallelize it across multiple sessions > (like reindexdb -j 4 or whatever). But doesn't putting the filter in > the grammar prevent me from doing exactly that? Each of those 4 (or > whatever) sessions would have to guess which would go where and then > speculatively run the command on that, instead of being able to > directly distribute the worload? It means that you'll have to distribute the work on a per-table basis rather than a per-index basis. The time spent to find out that a table doesn't have any impacted index should be negligible compared to the cost of running a reindex. This obviously won't help that much if you have a lot of tables but only one of them gigantic. But even if we put the logic in the client, this still won't help, as reindexdb doesn't support multiple jobs with an index list: * Index-level REINDEX is not supported with multiple jobs as we * cannot control the concurrent processing of multiple indexes * depending on the same relation. */ if (concurrentCons > 1 && indexes.head != NULL) { pg_log_error("cannot use multiple jobs to reindex indexes"); exit(1); }
Re: Disallow SSL compression?
On Mon, Feb 22, 2021 at 12:28 PM Daniel Gustafsson wrote: > > > On 22 Feb 2021, at 11:52, Magnus Hagander wrote: > > > > On Thu, Feb 18, 2021 at 1:51 PM Daniel Gustafsson wrote: > >> > >> A few years ago we discussed whether to disable SSL compression [0] which > >> ended > >> up with it being off by default combined with a recommendation against it > >> in > >> the docs. > >> > >> OpenSSL themselves disabled SSL compression by default in 2016 in 1.1.0 > >> with > >> distros often having had it disabled for a long while before then. > >> Further, > >> TLSv1.3 removes compression entirely on the protocol level mandating that > >> only > >> NULL compression is allowed in the ClientHello. NSS, which is discussed in > >> another thread, removed SSL compression entirely in version 3.33 in 2017. > >> > >> It seems about time to revisit this since it's unlikely to work anywhere > >> but in > >> a very small subset of system setups (being disabled by default > >> everywhere) and > >> is thus likely to be very untested at best. There is also the security > >> aspect > >> which is less clear-cut for us compared to HTTP client/servers, but not > >> refuted > >> (the linked thread has a good discussion on this). > > > > Agreed. It will also help with not having to implement it in new SSL > > implementations I'm sure :) > > Not really, no other TLS library I would consider using actually has > compression (except maybe wolfSSL?). GnuTLS and NSS both removed it, and > Secure Transport and Schannel never had it AFAIK. Ah, well, you'd still have to implement some empty placeholder :) > >> The attached removes sslcompression to see what it would look like. The > >> server > >> actively disallows it and the parameter is removed, but the sslcompression > >> column in the stat view is retained. An alternative could be to retain the > >> parameter but not act on it in order to not break scripts etc, but that > >> just > >> postpones the pain until when we inevitably do remove it. 
> >> Thoughts? Any reason to keep supporting SSL compression or is it time for > >> v14 > >> to remove it? Are there still users leveraging this for protocol > >> compression > >> without security making it worthwhile to keep? > > > > When the last round of discussion happened, I had multiple customers > > who did exactly that. None of them do that anymore, due to the pain of > > making it work... > > Unsurprisingly. > > > I think for libpq we want to keep the option for a while but making it > > a no-op, to not unnecessarily break systems where people just upgrade > > libpq, though. And document it as such having no effect and "will > > eventually be removed". > > Agreed, that's better. In fact, pg_basebackup with -R will generate a connection string that includes sslcompression=0 when used today (unless you jump through the hoops to make it work), so not accepting that one would definitely break a lot of things needlessly. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Add --tablespace option to reindexdb
> On 26 Feb 2021, at 07:49, Michael Paquier wrote: > Since c5b2860, it is possible to specify a tablespace for a REINDEX, > but the equivalent option has not been added to reindexdb. Attached > is a patch to take care of that. > > This includes documentation and tests. Makes sense. > While on it, I have added tests for toast tables and indexes with a > tablespace move during a REINDEX. Those operations fail, but it is > not possible to get that into the main regression test suite because > the error messages include the relation names so that's unstable. > Well, it would be possible to do that for the non-concurrent case > using a TRY/CATCH block in a custom function but that would not work > with CONCURRENTLY. Anyway, I would rather group the whole set of > tests together, and using the --tablespace option introduced here > within a TAP test does the job. Agreed, doing it with a TAP test removes the complication. Some other small comments: + Assert(PQserverVersion(conn) >= 14); Are these assertions really buying us much when we already check the server version in reindex_one_database()? + printf(_(" --tablespace=TABLESPACE reindex on specified tablespace\n")); While not introduced by this patch, I realized that we have a mix of conventions for how to indent long options which don't have a short option. Under "Connection options" all options are left-justified but under "Options" all long-options are aligned with space indentation for missing shorts. Some tools do it like this, where others like createuser left justifies under Options as well. Maybe not the most pressing issue, but consistency is always good in user interfaces so maybe it's worth addressing (in a separate patch)? -- Daniel Gustafsson https://vmware.com/
Re: REINDEX backend filtering
On Thu, Jan 21, 2021 at 4:13 AM Julien Rouhaud wrote: > > On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier wrote: > > > > On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote: > > > Is this really a common enough operation that we need it in the main > > > grammar? > > > > > > Having the functionality, definitely, but what if it was "just" a > > > function instead? So you'd do something like: > > > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) > > > \gexec > > > > > > Or even a function that returns the REINDEX commands directly (taking > > > a parameter to turn on/off concurrency for example). > > > > > > That also seems like it would be easier to make flexible, and just as > > > easy to plug into reindexdb? > > > > Having control in the grammar to choose which index to reindex for a > > table is very useful when it comes to parallel reindexing, because > > it is no-brainer in terms of knowing which index to distribute to one > > job or another. In short, with this grammar you can just issue a set > > of REINDEX TABLE commands that we know will not conflict with each > > other. You cannot get that level of control with REINDEX INDEX as it > > may be possible that more or more commands conflict if they work on an > > index of the same relation because it is required to take lock also on > > the parent table. Of course, we could decide to implement a > > redistribution logic in all frontend tools that need such things, like > > reindexdb, but that's not something I think we should let the client > > decide of. A backend-side filtering is IMO much simpler, less code, > > and more elegant. > > Maybe additional filtering capabilities is not something that will be > required frequently, but I'm pretty sure that reindexing only indexes > that might be corrupt is something that will be required often.. 
> So I agree, having all that logic in the backend makes everything easier > for users, having the choice of the tools they want to issue the query > while still having all features available. I agree with that scenario -- in that the most common case will be exactly that of reindexing only indexes that might be corrupt. I don't agree with the conclusion though. The most common case of that will be in the case of an upgrade. In that case I want to reindex all of those indexes as quickly as possible. So I'll want to parallelize it across multiple sessions (like reindexdb -j 4 or whatever). But doesn't putting the filter in the grammar prevent me from doing exactly that? Each of those 4 (or whatever) sessions would have to guess which would go where and then speculatively run the command on that, instead of being able to directly distribute the workload? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: REINDEX backend filtering
On Wed, Dec 16, 2020 at 1:27 AM Michael Paquier wrote: > > On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote: > > Is this really a common enough operation that we need it in the main > > grammar? > > > > Having the functionality, definitely, but what if it was "just" a > > function instead? So you'd do something like: > > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) > > \gexec > > > > Or even a function that returns the REINDEX commands directly (taking > > a parameter to turn on/off concurrency for example). > > > > That also seems like it would be easier to make flexible, and just as > > easy to plug into reindexdb? > > Having control in the grammar to choose which index to reindex for a > table is very useful when it comes to parallel reindexing, because > it is a no-brainer in terms of knowing which index to distribute to one > job or another. In short, with this grammar you can just issue a set > of REINDEX TABLE commands that we know will not conflict with each > other. You cannot get that level of control with REINDEX INDEX as it (oops, seems I forgot to reply to this one, sorry!) Uh, isn't it almost exactly the opposite? If you do it in the backend grammar you *cannot* parallelize it between indexes, because you can only run one at a time. Whereas if you do it in the frontend, you can. Either with something like reindexdb that could do it automatically, or with psql as a copy/paste job? > may be possible that one or more commands conflict if they work on an > index of the same relation because it is required to take lock also on > the parent table. Of course, we could decide to implement a > redistribution logic in all frontend tools that need such things, like > reindexdb, but that's not something I think we should let the client > decide of. A backend-side filtering is IMO much simpler, less code, > and more elegant. It's simpler in the simple case, I agree with that.
But ISTM it's also a lot less flexible for anything except the simplest case... -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
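[Editorial sketch] The frontend-side approach argued for above can be illustrated with a catalog query fed through psql's \gexec. This is only a sketch of the pattern Magnus describes, not a committed feature; the table name is a placeholder, and a driver (or reindexdb-style tooling) would distribute the generated commands across sessions rather than run them in one:

```sql
-- Client-side filtering: build one REINDEX command per index of a table
-- from the catalogs, then let psql execute each generated command.
SELECT format('REINDEX INDEX CONCURRENTLY %I.%I', n.nspname, c.relname)
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE i.indrelid = 'some_table'::regclass   -- placeholder filter
\gexec
```

The WHERE clause is where any filtering (e.g., "possibly corrupt" indexes) would go; each resulting command can be dispatched to a separate session for parallelism.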
Re: NOT VALID for Unique Indexes
On Mon, Jan 18, 2021 at 11:19 PM japin wrote: > > > On Fri, 15 Jan 2021 at 00:22, Simon Riggs > wrote: > > As you may be aware the NOT VALID qualifier currently only applies to > > CHECK and FK constraints, but not yet to unique indexes. I have had > > customer requests to change that. > > > > It's a reasonably common requirement to be able to change an index > > to/from a unique index, i.e. Unique -> NonUnique or NonUnique to > > Unique. Previously, it was easy enough to do that using a catalog > > update, but with security concerns and the fact that the optimizer > > uses the uniqueness to optimize queries means that there is a gap in > > our support. We obviously need to scan the index to see if it actually > > can be marked as unique. > > > > In terms of locking we need to exclude writes while we add uniqueness, > > so scanning the index to check it is unique would cause problems. So > > we need to do the same thing as we do with other constraint types: add > > the constraint NOT VALID in one transaction and then later validate it > > in a separate transaction (if ever). > > > > I present a WIP patch to show it's a small patch to change Uniqueness > > for an index, with docs and tests. > > > > ALTER INDEX SET [NOT] UNIQUE [NOT VALID] > > ALTER INDEX VALIDATE UNIQUE > > > > It doesn't do the index validation scan (yet), but I wanted to check > > acceptability, syntax and requirements before I do that. > > > > I can also add similar syntax for UNIQUE and PK constraints. > > > > Thoughts please? > > Great! I have some questions. > > 1. In the patch, you add a new attribute named "induniquevalid" in pg_index, >however, there is a "indisvalid" in pg_index, can we use "indisvalid"? indisvalid already has defined meaning related to creating indexes concurrently, so I was forced to create another column with a similar name. Thanks for reviewing the code in detail. > 2. The foreign key and CHECK constraints are valid by using >ALTER TABLE .. 
ADD table_constraint [ NOT VALID ] >ALTER TABLE .. VALIDATE CONSTRAINT constraint_name > >Should we implement unique index valid/not valid same as foreign key and >CHECK constraints? Yes, that is possible. (I wrote the NOT VALID patch for FKs, so was aware of that). The syntax I presented was for ALTER INDEX. Not all UNIQUE indexes are constraints, so it is important to add the option on ALTER INDEX. Adding the ALTER TABLE syntax can be done later. > 3. If we use the syntax to valid/not valid the unique, should we support >other constraints, such as foreign key and CHECK constraints? I'm sorry, I don't understand that question. FKs and CHECK constraints are already supported, as you point out above. I won't be able to finish this patch in time for this next CF, but thanks for your interest, I will complete it for PG15 later this year. -- Simon Riggs http://www.EnterpriseDB.com/
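[Editorial sketch] For reference, the two-transaction workflow under the syntax proposed in this thread would look as follows. The index name is hypothetical, and per the WIP patch the validation scan is not yet implemented:

```sql
-- Transaction 1: mark the index unique without scanning it.
ALTER INDEX my_index SET UNIQUE NOT VALID;

-- Transaction 2 (later, if ever): scan the index to verify uniqueness,
-- after which the optimizer may rely on it.
ALTER INDEX my_index VALIDATE UNIQUE;
```

This mirrors the existing ADD CONSTRAINT ... NOT VALID / VALIDATE CONSTRAINT pattern for FK and CHECK constraints.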
Re: NOT VALID for Unique Indexes
On Mon, Jan 18, 2021 at 12:34 AM David Fetter wrote: > > On Thu, Jan 14, 2021 at 04:22:17PM +, Simon Riggs wrote: > > As you may be aware the NOT VALID qualifier currently only applies to > > CHECK and FK constraints, but not yet to unique indexes. I have had > > customer requests to change that. > > This is a great feature! > > Not exactly on point with this, but in a pretty closely related > context, is there some way we could give people the ability to declare > at their peril that a constraint is valid without incurring the full > scan that VALIDATE currently does? This is currently doable by > fiddling directly with the catalog, which operation is broadly more > dangerous and ill-advised. That is what NOT VALID allows, but it can't be relied on for optimization. -- Simon Riggs http://www.EnterpriseDB.com/
RE: Parallel INSERT (INTO ... SELECT ...)
From: Tsunakawa, Takayuki/綱川 貴之
> the current patch showed nice performance improvement in some (many?) patterns.
>
> So, I think it can be committed in PG 14, when it has addressed the plan cache
> issue that Amit Langote-san posed.

Agreed. I summarized my test results for the current patch (V18) in the attached file (please use VSCode or Notepad++ to open it, the content is a bit long). As you can see, the performance is good in many patterns (please refer to my test script; the test NO in the text corresponds to the number in the sql file). If you have any question on my test, please feel free to ask.

Regards, Tang

max_parallel_workers_per_gather=2
---------------------------------
Test NO | test case                                                                    | query plan             | patched(ms) | master(ms) | %reg
-------------------------------------------------------------------------------------------------------------------------------------------------
1       | Test INSERT with underlying query.                                           | parallel INSERT+SELECT | 20894.069   | 27030.623  | -23%
2       | Test INSERT into a table with a foreign key.                                 | parallel SELECT        | 80921.960   | 84416.775  | -4%
3       | Test INSERT with ON CONFLICT ... DO UPDATE                                   | non-parallel           | 28111.186   | 29531.922  | -5%
4       | Test INSERT with parallel_leader_participation = off;                        | parallel INSERT+SELECT | 2799.354    | 5131.874   | -45%
5       | Test INSERT with max_parallel_workers=0;                                     | non-parallel           | 27006.385   | 26962.279  | 0%
6       | Test INSERT with parallelized aggregate                                      | parallel SELECT        | 95482.307   | 105090.301 | -9%
7       | Test INSERT with parallel bitmap heap scan                                   | parallel INSERT+SELECT | 606.664     | 844.471    | -28%
8       | Test INSERT with parallel append                                             | parallel INSERT+SELECT | 109.896     | 239.198    | -54%
9       | Test INSERT with parallel index scan                                         | parallel INSERT+SELECT | 552.481     | 1238.079   | -55%
10      | Test INSERT with parallel-safe index expression                              | parallel INSERT+SELECT | 1453.686    | 2908.489   | -50%
11      | Test INSERT with parallel-unsafe index expression                            | non-parallel           | 2828.559    | 2837.570   | 0%
12      | Test INSERT with underlying query - and RETURNING (no projection)            | parallel INSERT+SELECT | 417.493     | 685.430    | -39%
13      | Test INSERT with underlying ordered query - and RETURNING (no projection)    | parallel SELECT        | 971.095     | 1717.502   | -43%
14      | Test INSERT with underlying ordered query - and RETURNING (with projection)  | parallel SELECT        | 1002.287    |
Re: archive_command / pg_stat_archiver & documentation
Done here: https://commitfest.postgresql.org/32/3012/ On Thu, Feb 25, 2021 at 3:34 PM, Julien Rouhaud wrote: > On Thu, Feb 25, 2021 at 7:25 PM Benoit Lobréau > wrote: > > > > On Wed, Feb 24, 2021 at 2:52 PM, Julien Rouhaud wrote: > >> > >> I thought that this behavior was documented, especially for the lack > >> of update of pg_stat_archiver. If it's not the case then we should > >> definitely fix that! > > > > I tried to do it in the attached patch. > > Building the doc worked fine on my computer. > > Great, thanks! Can you register it in the next commitfest to make > sure it won't be forgotten? >
Re: repeated decoding of prepared transactions
On Thu, Feb 25, 2021 at 10:36 PM Amit Kapila wrote: > Few comments on the first patch: > 1. We can't remove ReorderBufferSkipPrepare because we rely on that in > SnapBuildDistributeNewCatalogSnapshot. > 2. I have changed the name of the variable from > snapshot_was_exported_at_lsn to snapshot_was_exported_at but I am > still not very sure about this naming because there are times when we > don't export snapshot and we still set this like when creating slots > with CRS_NOEXPORT_SNAPSHOT or when creating via SQL APIs. The other > name that comes to mind is initial_consistency_at, what do you think? > 3. Changed comments at various places. > > Please find the above changes as a separate patch, if you like you can > include these in the main patch. > > Apart from the above, I think the comments related to docs in my > previous email [1] are still valid, can you please take care of those. > > [1] - > https://www.postgresql.org/message-id/CAA4eK1Kr34_TiREr57Wpd%3D3%3D03x%3D1n55DAjwJPGpHAEc4dWfUQ%40mail.gmail.com I've added Amit's changes-patch as well as addressed comments related to docs in the attached patch. >On Thu, Feb 25, 2021 at 10:34 PM vignesh C wrote: >One observation while verifying the patch I noticed that most of >ReplicationSlotPersistentData structure members are displayed in >pg_replication_slots, but I did not see snapshot_was_exported_at_lsn >being displayed. Is this intentional? If not intentional we can >include snapshot_was_exported_at_lsn in pg_replication_slots. I've updated snapshot_was_exported_at_ member to pg_replication_slots as well. Do have a look and let me know if there are any comments. regards, Ajin Cherian Fujitsu Australia v4-0001-Avoid-repeated-decoding-of-prepared-transactions.patch Description: Binary data
[PATCH] pgbench: Remove ecnt, a member variable of CState
Hi. I created a patch to remove ecnt, which is a member variable of CState. This variable is incremented in some places, but it's not used for any purpose. Also, the current pgbench client abandons processing after hitting an error, so this variable is not needed, I think. Regards -- Kota Miyake

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 627a244fb7..31a4df45f5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -427,7 +427,6 @@ typedef struct
 	/* per client collected stats */
 	int64		cnt;			/* client transaction count, for -t */
-	int			ecnt;			/* error count */
 } CState;

 /*
@@ -2716,7 +2715,6 @@ sendCommand(CState *st, Command *command)
 	if (r == 0)
 	{
 		pg_log_debug("client %d could not send %s", st->id, command->argv[0]);
-		st->ecnt++;
 		return false;
 	}
 	else
@@ -2828,14 +2826,12 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 	if (qrynum == 0)
 	{
 		pg_log_error("client %d command %d: no results", st->id, st->command);
-		st->ecnt++;
 		return false;
 	}

 	return true;

 error:
-	st->ecnt++;
 	PQclear(res);
 	PQclear(next_res);
 do
RE: libpq debug log
From: Kyotaro Horiguchi > Using (inCursor - inStart) as logCursor doesn't work correctly if tracing > state desyncs. Once desync happens inStart can be moved at the timing that the > tracing code doesn't expect. This requires (as I mentioned upthread) > pqReadData to actively reset logCursor, though. > > logCursor should move when bytes are fed to the tracing functions even when > they are the last bytes of a message. Hmm, sharp and nice insight. I'd like Iwata-san and Kirk to consider this, including Horiguchi-san's previous mails. Regards Takayuki Tsunakawa
Re: libpq debug log
At Fri, 26 Feb 2021 17:30:32 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 26 Feb 2021 16:12:39 +0900 (JST), Kyotaro Horiguchi > wrote in > > This inhibits logCursor being updated. What is worse, I find that > > logCursor movement is quite dubious. > > Using (inCursor - inStart) as logCursor doesn't work correctly if > tracing state desyncs. Once desync happens inStart can be moved at > the timing that the tracing code doesn't expect. - This requires (as I - mentioned upthread) pqReadData to actively reset logCursor, though. + So pqReadData needs to actively reset logCursor. If logCursor is + actively reset, we no longer need to use (inCursor - inStart) as + logCursor and it is enough that logCursor follows inCursor. > > logCursor should move when bytes are fed to the tracing functions even > when they are the last bytes of a message. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: libpq debug log
At Fri, 26 Feb 2021 16:12:39 +0900 (JST), Kyotaro Horiguchi wrote in > This inhibits logCursor being updated. What is worse, I find that > logCursor movement is quite dubious. Using (inCursor - inStart) as logCursor doesn't work correctly if tracing state desyncs. Once desync happens inStart can be moved at the timing that the tracing code doesn't expect. This requires (as I mentioned upthread) pqReadData to actively reset logCursor, though. logCursor should move when bytes are fed to the tracing functions even when they are the last bytes of a message. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [PATCH] pgbench: Bug fix for the -d option
On Fri, Feb 26, 2021 at 01:18:20PM +0900, miyake_kouta wrote: > I'm suggesting this bug fix because I think it's a bug, but if there's any > other intent to this else if statement, could you let me know? Yes, the existing code could mess up the selection logic if PGPORT and PGUSER are both specified in an environment, masking the value of PGUSER, so let's fix that. This is as old as 412893b. -- Michael signature.asc Description: PGP signature
Interest in GSoC 2021 Projects
Hi, I am Yixin Shi, a junior student majoring in Computer Science at University of Michigan. I notice that the project ideas for GSoC 2021 have been posted on your webpage and I am quite interested in two of them. I really wish to take part in the project *Make plsample a more complete procedural language handler example (2021)* considering both my interest and ability. The potential mentor for this Project is *Mark Wong*. I am also interested in the project *Improve PostgreSQL Regression Test Coverage (2021)* and the potential mentor is *Andreas Scherbaum* and *Stephen Frost*. I would like to learn more about these two projects but failed to contact the mentors. How can I contact them? Also, I really hope to join the project. Are there any suggestions on application? Best wishes, Yixin Shi
Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.
On Thu, Feb 25, 2021 at 5:31 AM Amit Kapila wrote: > > For docs only patch, I have few suggestions: > 1. On page [1], it is not very clear that we are suggesting to set > max_replication_slots for origins whereas your new doc patch has > clarified it, can we update the other page as well. Sorry, what other page are you referring to? > 2. > Setting it a lower value than the current > + number of tracked replication origins (reflected in > + linkend="view-pg-replication-origin-status">pg_replication_origin_status, > + not linkend="catalog-pg-replication-origin">pg_replication_origin) > + will prevent the server from starting. > + > > Why can't we just mention pg_replication_origin above? > So this is slightly confusing: pg_replication_origin just contains mappings from origin names to oids. It is a regular catalog table and has no limit on its size. Users can also manually insert rows into this table. https://www.postgresql.org/docs/13/catalog-pg-replication-origin.html The view showing the in-memory information is actually pg_replication_origin_status. The number of entries here is what is actually constrained by the GUC parameter. https://www.postgresql.org/docs/13/view-pg-replication-origin-status.html I clarified by pointing to pg_replication_origin_status because it could in theory be out of sync with pg_replication_origin. I'm actually not sure how entries there are managed. Perhaps if you were replicating from one database and then stopped and started replicating from another database you'd have two replication origins, but only one replication origin status? This also brings up a point regarding the naming of the added GUC. max_replication_origins is cleanest, but has this confusion regarding pg_replication_origin vs. pg_replication_origin_status. max_replication_origin_statuses is weird (and long). max_tracked_replication_origins is a possibility?
(One last bit of naming confusion; the internal code refers to them as ReplicationStates, rather than ReplicationOrigins or ReplicationOriginStatuses, or something like that.) - Paul
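[Editorial sketch] The catalog-vs-view distinction discussed above can be seen directly; both relations exist in current releases, and the row count of the view is what the proposed GUC would bound:

```sql
-- Catalog table: plain origin-name -> OID mappings, no fixed size limit,
-- and rows can even be inserted manually.
SELECT roident, roname FROM pg_replication_origin;

-- In-memory view: one row per tracked origin; the number of entries here
-- is what max_replication_slots (or the proposed GUC) constrains.
SELECT local_id, external_id, remote_lsn, local_lsn
FROM pg_replication_origin_status;
```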
Re: 64-bit XIDs in deleted nbtree pages
On Fri, Feb 26, 2021 at 9:58 AM Peter Geoghegan wrote: > > On Thu, Feb 25, 2021 at 5:42 AM Masahiko Sawada wrote: > > btvacuumcleanup() has been playing two roles: recycling deleted pages > > and collecting index statistics. > > Right. > > I pushed the VACUUM VERBOSE "index pages newly deleted" > instrumentation patch earlier - it really isn't complicated or > controversial, so I saw no reason to delay with that. Thanks! I think we can improve bloom indexes in a separate patch so that they use pages_newly_deleted. > > Attached is v7, which now only has the final patch -- the optimization > that makes it possible for VACUUM to recycle pages that were newly > deleted during the same VACUUM operation. Still no real changes. > Again, I just wanted to keep CFBot happy. I haven't thought about or > improved this final patch recently, and it clearly needs more work to > be ready to commit. I've looked at the patch. The patch is straightforward and I agree with the direction. Here are some comments on the v7 patch. --- + /* Allocate _bt_newly_deleted_pages_recycle related information */ + vstate.ndeletedspace = 512; Maybe add a #define for the value 512? + for (int i = 0; i < vstate->ndeleted; i++) + { + BlockNumber blkno = vstate->deleted[i].blkno; + FullTransactionId safexid = vstate->deleted[i].safexid; + + if (!GlobalVisCheckRemovableFullXid(heapRel, safexid)) + break; + + RecordFreeIndexPage(rel, blkno); + stats->pages_free++; + } Should we use 'continue' instead of 'break'? Or can we sort the vstate->deleted array by full XID and leave 'break'? --- Currently, the patch checks only newly deleted pages for recyclability at the end of btvacuumscan. What do you think about the idea of also checking pages that were deleted by previous vacuums (i.e., pages already marked P_ISDELETED() but not BTPageIsRecyclable())? There is still a little hope that such pages become recyclable by the time we reach the end of btvacuumscan.
We will end up checking such pages twice (during btvacuumscan() and at the end of btvacuumscan()), but if the cost of collecting and checking pages is not high, it could probably expand the chance of recycling pages. I'm going to reply to the discussion of vacuum_cleanup_index_scale_factor in a separate mail. Or maybe it's better to start a new thread for that so as to get opinions from other hackers. It's no longer related to the subject. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Is Recovery actually paused?
At Thu, 25 Feb 2021 13:22:53 +0530, Dilip Kumar wrote in > On Thu, Feb 25, 2021 at 12:42 PM Kyotaro Horiguchi > wrote: > > The latest version applies (almost) cleanly to the current master and > > works fine. > > I don't have further comment on this. > > > > I'll wait for a day before marking this RfC in case anyone have > > further comments. > > Okay. Hearing nothing, done that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: Parallel INSERT (INTO ... SELECT ...)
From: Hou, Zhijie/侯 志杰 > After doing some more tests on it (performance degradation will not happen > when source table is out of order). > I think we can say the performance degradation is related to the order of the > data in source table. ... > So, the order of the data's influence seems a normal phenomenon, I think it seems > we do not need to do anything about it (currently). > It seems better to mark it as todo which we can improve this in the future. > > (Since the performance degradation in parallel bitmap is because of the lock > in > _bt_search, It will not always happen when the target table already have data, > less race condition will happened when parallel insert into a evenly > distributed > btree). I think so, too. The slowness of parallel insert operations due to index page contention, and index bloat, would occur depending on the order of the index key values of source records. I guess other DBMSs exhibit a similar phenomenon, but I couldn't find such a description in the manual, whitepapers, or several books on Oracle. One relevant excerpt is the following. This is about parallel index build. Oracle tries to minimize page contention and index bloat. This is completely my guess, but they may do similar things in parallel INSERT SELECT, because Oracle holds an exclusive lock on the target table. SQL Server also acquires an exclusive lock. Maybe we can provide an option to do so in the future. https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-tips.html#GUID-08A08783-C243-4872-AFFA-56B603F1F0F5 -- Optimizing Performance by Creating Indexes in Parallel ... Multiple processes can work simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, Oracle Database can create the index more quickly than if a single server process created the index serially. Parallel index creation works in much the same way as a table scan with an ORDER BY clause.
The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the DOP. A first set of query processes scans the table, extracts key-rowid pairs, and sends each pair to a process in a second set of query processes based on a key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the parallel execution coordinator simply concatenates the pieces (which are ordered) to form the final index. ... When creating an index in parallel, the STORAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL value of 5 MB and a DOP of 12 consumes at least 60 MB of storage during index creation because each process starts with an extent of 5 MB. When the query coordinator process combines the sorted subindexes, some extents might be trimmed, and the resulting index might be smaller than the requested 60 MB. -- IIRC, the current patch showed nice performance improvement in some (many?) patterns. So, I think it can be committed in PG 14, when it has addressed the plan cache issue that Amit Langote-san posed. I remember the following issues/comments are pending, but they are not blockers: 1. Insert operation is run serially when the target table has a foreign key, sequence or identity column. This can be added later based on the current design without requiring rework. That is, the current patch leaves no debt. (Personally, foreign key and sequence support will also be wanted in PG 14. We may try them in the last CF once the current patch is likely to be committable.) 2. There's a plausible reason for the performance variation and index bloat with the bitmap scan case. Ideally, we want to come up with a solution that can be incorporated in PG 15. Or, it may be one conclusion that we can't prevent performance degradation in all cases.
That may be one hidden reason why Oracle and SQL Server don't enable parallel DML by default. We can advise the user in the manual that parallel DML is not always faster than serial operation, so they should test performance by enabling and disabling parallel DML. Also, maybe we should honestly state that indexes can get a bit bigger after parallel insert than after serial insert, and advise the user to do REINDEX CONCURRENTLY if necessary. 3. The total time of parallel execution can get longer because of unbalanced work distribution among parallel workers. This seems to be an existing problem, so we can pursue the improvement later, hopefully before the release of PG 14. Does anyone see any problem with committing the current patch (after polishing it)? Regards Takayuki Tsunakawa
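[Editorial sketch] The advice above — test with parallel DML enabled and disabled — can be sketched as a before/after comparison. Table names are hypothetical, and the INSERT itself only parallelizes with the patch under discussion applied:

```sql
-- Serial baseline: forbid parallel workers for this session.
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE INSERT INTO target SELECT * FROM source;

-- Allow parallelism; with the patch, the plan may show parallel
-- INSERT+SELECT (or parallel SELECT only), as in Tang's test matrix.
SET max_parallel_workers_per_gather = 2;
EXPLAIN ANALYZE INSERT INTO target SELECT * FROM source;
```

Comparing the two EXPLAIN ANALYZE timings (and index sizes afterwards) is the kind of check a user would run before relying on parallel DML for a given workload.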