Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
On Tue, Aug 17, 2021 at 10:54 AM Amit Kapila wrote:
>
> On Mon, Aug 16, 2021 at 8:18 PM Dilip Kumar wrote:
> >
> > On Fri, Aug 13, 2021 at 9:29 PM Andres Freund wrote:
> > >
> > > > I think we can extend this API but I guess it is better to then do it
> > > > for dsm-based as well so that these get tracked via resowner.
> > >
> > > DSM segments are resowner managed already, so it's not obvious that
> > > that'd buy us much? Although I guess there could be a few edge cases
> > > that'd look cleaner, because we could reliably trigger cleanup in the
> > > leader instead of relying on dsm detach hooks + refcounts to manage
> > > when a set is physically deleted?
> >
> > In an off-list discussion with Thomas and Amit, we tried to discuss
> > how to clean up the shared fileset in the current use case. Thomas
> > suggested that instead of using an individual shared fileset for
> > storing the data for each XID, why don't we just create a single
> > shared fileset for the complete worker lifetime, and when the worker
> > is exiting we can just remove that shared fileset. And for storing
> > XID data, we can just create the files under the same shared fileset
> > and delete those files when we no longer need them. I like this idea
> > and it looks much cleaner; after this, we can get rid of the special
> > cleanup mechanism using 'filesetlist'. I have attached a patch for
> > the same.
>
> It seems to me that this idea would obviate any need for resource
> owners as we will have only one fileset now. I have a few initial
> comments on the patch:

One more comment:

@@ -2976,39 +2952,17 @@ subxact_info_write(Oid subid, TransactionId xid)
..
+ /* Try to open the subxact file, if it doesn't exist then create it */
+ fd = BufFileOpenShared(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+     fd = BufFileCreateShared(xidfileset, path);
..

Instead of trying to create the file here based on whether it exists or not, can't we create it in subxact_info_add, where we first allocate memory for subxacts? If that works, then in the above code the file should always exist.

--
With Regards,
Amit Kapila.
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 16, 2021 at 12:06:16PM +0200, Michael Meskes wrote:
> Your patch removes the warning but by doing that also removes the
> feature that is being tested.

Oops. If kept this way, this test scenario is going to need a comment to explain exactly that.

> I'm not sure what's the best way to go about it. Shall we accept not
> testing this particular feature and remove the warning? After all, this
> is not the way the statement should be used, hence the warning. Or
> should we keep it in and redirect the warning? In that case, we would
> also lose other warnings that are not planned, though.

FWIW, I would tend to drop the warning here. I am not sure that this is an interesting enough use case. My 2c.
--
Michael
Re: Diagnostic comment in LogicalIncreaseXminForSlot
Hi Amit and Andres,

Here's the updated patch.

On Mon, Aug 9, 2021 at 11:14 AM Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
>
> On Sat, Aug 7, 2021 at 11:40 AM Andres Freund wrote:
> >
> > Hi,
> >
> > On 2021-07-12 17:28:15 +0530, Ashutosh Bapat wrote:
> > > On Mon, Jul 12, 2021 at 8:39 AM Amit Kapila wrote:
> > >
> > > > On Mon, Jul 5, 2021 at 12:54 PM Masahiko Sawada <sawada.m...@gmail.com>
> > > > Today, again looking at this patch, I don't think doing elog inside
> > > > spinlock is a good idea. We can either release spinlock before it or
> > > > use some variable to remember that we need to write such an elog and
> > > > do it after releasing the lock. What do you think?
> > >
> > > The elog will be effective only under DEBUG1, otherwise it will be
> > > almost a NOOP. I am wondering whether it's worth adding a bool
> > > assignment and moving the elog out only for DEBUG1. Anyway, will
> > > defer it to you.
> >
> > It's *definitely* not ok to do an elog() while holding a spinlock.
> > Consider what happens if the elog tries to format the message and runs
> > out of memory. Or if elog detects the stack depth is too deep. An ERROR
> > would be thrown, and we'd lose track of the held spinlock.
>
> Thanks for the clarification.
>
> Amit,
> I will provide the patch changed accordingly.
>
> --
> Best Wishes,
> Ashutosh

--
Best Wishes,
Ashutosh

From 0a2f2c1f530d72219d9db7bd1ff77c1ae8c4ffe4 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat
Date: Thu, 13 May 2021 13:40:33 +0530
Subject: [PATCH] Report new catalog_xmin candidate in LogicalIncreaseXminForSlot()

Similar to LogicalIncreaseRestartDecodingForSlot(), add a debug message to
LogicalIncreaseXminForSlot() reporting a new catalog_xmin candidate.
---
 src/backend/replication/logical/logical.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 64b8280c13..6c8c387e73 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1565,6 +1565,7 @@ LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
 {
 	bool		updated_xmin = false;
 	ReplicationSlot *slot;
+	bool		got_new_xmin = false;

 	slot = MyReplicationSlot;

@@ -1602,12 +1603,22 @@ LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
 		{
 			slot->candidate_catalog_xmin = xmin;
 			slot->candidate_xmin_lsn = current_lsn;
+
+			/*
+			 * Add a line about new xmin in server logs at an appropriate log
+			 * level after releasing the spinlock.
+			 */
+			got_new_xmin = true;
 		}
 		SpinLockRelease(&slot->mutex);

 	/* candidate already valid with the current flush position, apply */
 	if (updated_xmin)
 		LogicalConfirmReceivedLocation(slot->data.confirmed_flush);
+
+	if (got_new_xmin)
+		elog(DEBUG1, "got new catalog_xmin %u at %X/%X", xmin,
+			 LSN_FORMAT_ARGS(current_lsn));
 }

 /*
--
2.25.1
Re: A problem in ExecModifyTable
On Tue, 17 Aug 2021 at 15:56, 李杰(慎追) wrote:
> According to my understanding, the parent table of a partitioned table
> does not store any tuples.
> Then why is "relkind == RELKIND_PARTITIONED_TABLE" suddenly added here?

We'll need some sort of ResultRelInfo in the case that all partitions are pruned. Using the one for the partitioned table is convenient. I believe that's required if there are any statement-level triggers on the partitioned table or if there's a RETURNING clause.

> There is no comment on this point in the code.
> Can you answer my confusion? Be deeply grateful.

Yeah, maybe. It's not exactly close by, but one in grouping_planner mentions this:

    /*
     * We managed to exclude every child rel, so generate a
     * dummy one-relation plan using info for the top target
     * rel (even though that may not be a leaf target).
     * Although it's clear that no data will be updated or
     * deleted, we still need to have a ModifyTable node so
     * that any statement triggers will be executed. (This
     * could be cleaner if we fixed nodeModifyTable.c to allow
     * zero target relations, but that probably wouldn't be a
     * net win.)
     */

David
Re: Skipping logical replication transactions on subscriber side
On Tue, Aug 17, 2021 at 10:46 AM Masahiko Sawada wrote:
>
> On Mon, Aug 16, 2021 at 5:30 PM Greg Nancarrow wrote:
> >
> > On Mon, Aug 16, 2021 at 5:54 PM houzj.f...@fujitsu.com wrote:
> > >
> > > Here is another comment:
> > >
> > > +char *
> > > +logicalrep_message_type(LogicalRepMsgType action)
> > > +{
> > > ...
> > > +       case LOGICAL_REP_MSG_STREAM_END:
> > > +           return "STREAM END";
> > > ...
> > >
> > > I think most of the existing code uses "STREAM STOP" to describe the
> > > LOGICAL_REP_MSG_STREAM_END message; is it better to return "STREAM
> > > STOP" in function logicalrep_message_type() too?
> >
> > +1
> > I think you're right, it should be "STREAM STOP" in that case.
>
> It's right that we use "STREAM STOP" rather than "STREAM END" in many
> places such as elog messages, a callback name, and source code
> comments. As far as I have found, there are two places where we're
> using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
> doc/src/sgml/protocol.sgml. Isn't it better to fix these
> inconsistencies in the first place? I think "STREAM STOP" would be
> more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END' seems a bit better because of the value 'E' we use for it.
--
With Regards,
Amit Kapila.
Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
On Mon, Aug 16, 2021 at 8:18 PM Dilip Kumar wrote:
>
> On Fri, Aug 13, 2021 at 9:29 PM Andres Freund wrote:
> >
> > > I think we can extend this API but I guess it is better to then do it
> > > for dsm-based as well so that these get tracked via resowner.
> >
> > DSM segments are resowner managed already, so it's not obvious that
> > that'd buy us much? Although I guess there could be a few edge cases
> > that'd look cleaner, because we could reliably trigger cleanup in the
> > leader instead of relying on dsm detach hooks + refcounts to manage
> > when a set is physically deleted?
>
> In an off-list discussion with Thomas and Amit, we tried to discuss
> how to clean up the shared fileset in the current use case. Thomas
> suggested that instead of using an individual shared fileset for
> storing the data for each XID, why don't we just create a single
> shared fileset for the complete worker lifetime, and when the worker
> is exiting we can just remove that shared fileset. And for storing
> XID data, we can just create the files under the same shared fileset
> and delete those files when we no longer need them. I like this idea
> and it looks much cleaner; after this, we can get rid of the special
> cleanup mechanism using 'filesetlist'. I have attached a patch for
> the same.

It seems to me that this idea would obviate any need for resource owners as we will have only one fileset now. I have a few initial comments on the patch:

1.
+ /* do cleanup on worker exit (e.g. after DROP SUBSCRIPTION) */
+ on_shmem_exit(worker_cleanup, (Datum) 0);

It should be registered with the before_shmem_exit() hook to allow sending stats for file removal.

2. After these changes, the comments atop stream_open_file and SharedFileSetInit need to be changed.

3. In function subxact_info_write(), we are computing the subxact file path twice, which doesn't seem to be required.

4.
+ if (missing_ok)
+     return NULL;

  ereport(ERROR,
          (errcode_for_file_access(),
-          errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
+          errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
                  segment_name, name)));

There seems to be a formatting issue with errmsg. Also, it is better to keep an empty line before ereport.

5. How can we provide a strict mechanism to not allow using dsm APIs for a non-dsm FileSet? One idea could be that we have a variable (probably bool) in the SharedFileSet structure which will be initialized in SharedFileSetInit based on whether the caller has provided a dsm segment. Then in other DSM-based APIs, we can check if it is used for the wrong type.
--
With Regards,
Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Mon, Aug 16, 2021 at 5:30 PM Greg Nancarrow wrote:
>
> On Mon, Aug 16, 2021 at 5:54 PM houzj.f...@fujitsu.com wrote:
> >
> > Here is another comment:
> >
> > +char *
> > +logicalrep_message_type(LogicalRepMsgType action)
> > +{
> > ...
> > +       case LOGICAL_REP_MSG_STREAM_END:
> > +           return "STREAM END";
> > ...
> >
> > I think most of the existing code uses "STREAM STOP" to describe the
> > LOGICAL_REP_MSG_STREAM_END message; is it better to return "STREAM
> > STOP" in function logicalrep_message_type() too?
>
> +1
> I think you're right, it should be "STREAM STOP" in that case.

It's right that we use "STREAM STOP" rather than "STREAM END" in many places such as elog messages, a callback name, and source code comments. As far as I have found, there are two places where we're using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in doc/src/sgml/protocol.sgml. Isn't it better to fix these inconsistencies in the first place? I think "STREAM STOP" would be more appropriate.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: Allow parallel DISTINCT
On Wed, 11 Aug 2021 at 16:51, David Rowley wrote:
> The patch is just some plumbing work to connect all the correct paths
> up to make it work. It's all fairly trivial.

I looked at this patch again and realised that it could be done a bit better. For example, the previous version set the distinct_rel's FDW fields twice: once when making the serial paths and once when finalizing the partial paths.

I've now added two new functions: create_final_distinct_paths and create_partial_distinct_paths. The responsibility of create_distinct_paths has changed. Instead of creating the non-parallel DISTINCT paths itself, it calls the two new functions and also takes charge of calling the create_upper_paths_hook for UPPERREL_DISTINCT, plus the FDW GetForeignUpperPaths() call. I think this is nicer, as I'd previously added a new parameter to create_distinct_paths() so I could tell it not to call the hook, since I didn't want to call that twice on the same relation; it would no doubt result in some plugin just creating the same paths again.

I've also changed my mind about my previous choice not to call GetForeignUpperPaths for UPPERREL_PARTIAL_DISTINCT. I now think that's ok.

I think this is a fairly trivial patch that just does a bit of wiring up of paths. Unless anyone has anything to say about it in the next few days, I'll be looking at it again with the intention of pushing it.

David
Re: Skipping logical replication transactions on subscriber side
On Mon, Aug 16, 2021 at 3:59 PM houzj.f...@fujitsu.com wrote:
>
> On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada wrote:
> > I've attached the updated patches. FYI I've included the patch
> > (v8-0005) that fixes the assertion failure during shared fileset
> > cleanup to make cfbot tests happy.
>
> Hi,
>
> Thanks for the new patches.
> I have a few comments on the v8-0001 patch.

Thank you for the comments!

> 2)
> +/*
> + * Get string representing LogicalRepMsgType.
> + */
> +char *
> +logicalrep_message_type(LogicalRepMsgType action)
> +{
> ...
> +
> +    elog(ERROR, "invalid logical replication message type \"%c\"", action);
> +}
>
> Some old compilers might complain that the function doesn't have a
> return value at the end of the function; maybe we can write it like
> the following:
>
> +char *
> +logicalrep_message_type(LogicalRepMsgType action)
> +{
> +    switch (action)
> +    {
> +        case LOGICAL_REP_MSG_BEGIN:
> +            return "BEGIN";
> ...
> +        default:
> +            elog(ERROR, "invalid logical replication message type \"%c\"", action);
> +    }
> +    return NULL;    /* keep compiler quiet */
> +}

Fixed.

> 3)
> Do we need to invoke set_apply_error_context_xact() in the function
> apply_handle_stream_prepare() to save the xid and timestamp?

Yes. I think that the v8-0001 patch already sets the xid and timestamp just after parsing the stream_prepare message. Did you mean it's not necessary?

I'll submit the updated patches soon.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: return correct error code from pgtls_init
On Tue, Jun 01, 2021 at 06:56:42PM -0700, Zhihong Yu wrote:
> Looking at the -1 return, e.g.
>
>     pq_lockarray = malloc(sizeof(pthread_mutex_t) * CRYPTO_num_locks());
>
> when pq_lockarray is NULL, we can return errno.
>
> I didn't change the call to pthread_mutex_lock() because PGTHREAD_ERROR()
> is used, which aborts.

I am not sure what you mean here, and there is nothing wrong with this code as far as I know, as we would let the caller of pgtls_init() know that something is wrong.
--
Michael
A problem in ExecModifyTable
Hi hackers,

Recently, I noticed a great patch in PG 14: "Rework planning and execution of UPDATE and DELETE" (86dc90056dfdbd9d1b891718d2e5614e3e432f35). This changes the DML execution of partitioned tables and makes it more friendly. But I am very confused about the following changes:

```
+		relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+		if (relkind == RELKIND_RELATION ||
+			relkind == RELKIND_MATVIEW ||
+			relkind == RELKIND_PARTITIONED_TABLE)
 		{
-			char		relkind;
-			Datum		datum;
-			bool		isNull;
-
-			relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-			if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
-			{
-				datum = ExecGetJunkAttribute(slot,
-											 junkfilter->jf_junkAttNo,
-											 &isNull);
-				/* shouldn't ever get a null result... */
```

According to my understanding, the parent table of a partitioned table does not store any tuples. Then why is "relkind == RELKIND_PARTITIONED_TABLE" suddenly added here? There is no comment on this point in the code. Can you answer my confusion? Be deeply grateful.

Regards & Thanks,
Adger
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
On Mon, Aug 16, 2021 at 02:06:31PM -0300, Ranier Vilela wrote:
> uint64 and size_t in 64 bits are equivalent.
> uint64 and size_t in 32 bits can be weird, but anyway size_t at 32 bits
> can handle 1GB.

There is usually a reason why things are done as they are, so before suggesting changing something I would advise reading the threads that created those changes more carefully, because the reasons could be mentioned there. In this case, we are talking about aef8948, and this part of its related thread:
https://www.postgresql.org/message-id/20210105034739.gg7...@momjian.us

> base64.c can be made in another patch.

This file is a set of code paths used by authentication and SCRAM; it will never get as hot as the code paths that Hans has reported, as these are one-time operations. Please note as well cfc40d3 for the reasons why things are handled this way. We absolutely have to use safe routines here.
--
Michael
Re: Skipping logical replication transactions on subscriber side
On Thu, Aug 12, 2021 at 3:54 PM Masahiko Sawada wrote:
>
> I've attached the updated patches. FYI I've included the patch
> (v8-0005) that fixes the assertion failure during shared fileset
> cleanup to make cfbot tests happy.

Another comment on the 0001 patch: as there is now a mix of setting "apply_error_callback_arg" members directly and also through inline functions, it might look better to have it done consistently, with functions having prototypes something like the following:

static inline void set_apply_error_context_rel(LogicalRepRelMapEntry *rel);
static inline void reset_apply_error_context_rel(void);
static inline void set_apply_error_context_attnum(int remote_attnum);

Regards,
Greg Nancarrow
Fujitsu Australia
Re: Some RELKIND macro refactoring
On Mon, Aug 16, 2021 at 10:22:50AM -0400, Alvaro Herrera wrote:
> Partitioned tables are not listed here, but IIRC there's a patch to let
> partitioned tables have a table AM so that their partitions inherit it.

This was raised in the thread for ALTER TABLE SET ACCESS METHOD (see patch 0002):
https://www.postgresql.org/message-id/20210308010707.ga29...@telsasoft.com

I am not sure whether we'd want to do that for table AMs, as property inheritance combined with partitioned tables has always been a sensitive topic for any property, but if we discuss this it should be on a new thread.
--
Michael
Re: Is it worth pushing conditions to sublink/subplan?
> On Aug 16, 2021, at 17:15, Wenjing wrote:
>
> Hi Hackers,
>
> Recently, an issue has been bothering me. This is about conditional
> push-down in SQL. I use cases from regression testing as an example.
> I found that the condition (b = 1) can be pushed down into the
> subquery; however, it cannot be pushed down to a sublink/subplan.
> If a sublink/subplan clause contains a partitioned table, it can be
> useful to get the conditions for pruning.
> So, is it worth pushing conditions down to sublink/subplan?
> Anybody have any ideas?
>
> regards,
> Wenjing
>
> example:
> create table p (a int, b int, c int) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> create table q (a int, b int, c int) partition by list (a);
> create table q1 partition of q for values in (1) partition by list (b);
> create table q11 partition of q1 for values in (1) partition by list (c);
> create table q111 partition of q11 for values in (1);
> create table q2 partition of q for values in (2) partition by list (b);
> create table q21 partition of q2 for values in (1);
> create table q22 partition of q2 for values in (2);
> insert into q22 values (2, 2, 3);

Sorry, I messed up the structure of the table. It should be:

create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in (2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in (1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in (3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);

> postgres=# explain (costs off)
> postgres-# select temp.b from
> postgres-# (
> postgres(#   select a,b from ab x where x.a = 1
> postgres(#   union all
> postgres(#   (values(1,1))
> postgres(# ) temp,
> postgres-# ab y
> postgres-# where y.b = temp.b and y.a = 1 and y.b=1;
>                  QUERY PLAN
> ---------------------------------------------
>  Nested Loop
>    ->  Seq Scan on ab_a1_b1 y
>          Filter: ((b = 1) AND (a = 1))
>    ->  Append
>          ->  Subquery Scan on "*SELECT* 1"
>                ->  Seq Scan on ab_a1_b1 x
>                      Filter: ((a = 1) AND (b = 1))
>          ->  Result
> (8 rows)
>
> The condition (b = 1) can be pushed down into the subquery.
>
> postgres=# explain (costs off)
> postgres-# select
> postgres-#   y.a,
> postgres-#   (select x.b from ab x where y.a =x.a and y.b=x.b) as b
> postgres-# from ab y where a = 1 and b = 1;
>                  QUERY PLAN
> ---------------------------------------------
>  Seq Scan on ab_a1_b1 y
>    Filter: ((a = 1) AND (b = 1))
>    SubPlan 1
>      ->  Append
>            ->  Seq Scan on ab_a1_b1 x_1
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a1_b2 x_2
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a1_b3 x_3
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a2_b1 x_4
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a2_b2 x_5
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a2_b3 x_6
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a3_b1 x_7
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a3_b2 x_8
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a3_b3 x_9
>                  Filter: ((y.a = a) AND (y.b = b))
> (22 rows)
>
> The conditions (b = 1 and a = 1) cannot be pushed down to the
> sublink/subplan in the targetlist.
>
> postgres=# explain (costs off)
> postgres-# select y.a
> postgres-# from ab y
> postgres-# where
> postgres-#   (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
> postgres-#   y.a = 1 and y.b = 1;
>                  QUERY PLAN
> ---------------------------------------------
>  Seq Scan on ab_a1_b1 y
>    Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
>    SubPlan 1
>      ->  Append
>            ->  Seq Scan on ab_a1_b1 x_1
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a1_b2 x_2
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a1_b3 x_3
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a2_b1 x_4
>                  Filter:
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
On Sun, Aug 15, 2021 at 02:58:59PM +0000, Hans Buschmann wrote:
> If it seems useful somebody could enter it as an open item /
> resolved item for pg14 after beta 3.

Hmm. Using SQL like the following to stress pg_hex_encode(), I can see a measurable difference easily, so there is no need for pg_dump or an instance with many LOs:

set work_mem = '1GB';
with hex_values as materialized
  (select repeat(i::text, N)::bytea as i from generate_series(1, M) t(i))
SELECT count(encode(i, 'hex')) from hex_values;

With N = 1000, M = 10, perf shows a 34.88% use of pg_hex_encode(). On REL_13_STABLE, I get down to ~27% for hex_encode(), the same backend routine. The runtime numbers are more interesting: HEAD gets up to 1600ms. With the latest version of the patch, we get down to 1550ms. Now, a simple revert of ccf4e277 and aef8948 brings me down to the same runtimes as ~13, closer to 1450ms. There seems to be some noise in the tests, but the difference is clear.

Considering that commit aef8948 also involved a cleanup of src/common/hex_decode.c, I think that it would be saner to also revert c3826f83, so that we have a clean state of the code to work on should the work on the data encryption patch set move on in the future. It is not decided that this would actually use any of the hex decode/encode code, either.

Honestly, I don't like much messing with this stuff after beta3, and one of the motives for moving the hex handling code to src/common/ was the data encryption code, whose state is not known as of late. This does not prevent working on such refactorings in the future, of course, keeping the performance impact in mind.

In short, I am planning to address this regression with the attached, a combined revert of 0d70d30, 5c33ba5 and 92436a7.
--
Michael

From 55ca0af0d5e40a57b468dac21c0b1ad294a96df5 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 17 Aug 2021 10:57:02 +0900
Subject: [PATCH] Revert refactoring of hex code to src/common/

This is a combined revert of aef8948, ccf4e27, c3826f8.
---
 src/include/common/hex.h                  |  25 ---
 src/include/common/sha2.h                 |   4 +
 src/include/utils/builtins.h              |   4 +
 src/backend/replication/backup_manifest.c |  30 ++--
 src/backend/utils/adt/encode.c            | 158 +++---
 src/backend/utils/adt/varlena.c           |  15 +-
 src/common/Makefile                       |   1 -
 src/common/hex.c                          | 192 --
 src/tools/msvc/Mkvcbuild.pm               |   2 +-
 9 files changed, 127 insertions(+), 304 deletions(-)
 delete mode 100644 src/include/common/hex.h
 delete mode 100644 src/common/hex.c

diff --git a/src/include/common/hex.h b/src/include/common/hex.h
deleted file mode 100644
index 150771a14d..00
--- a/src/include/common/hex.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- *
- * hex.h
- *	  Encoding and decoding routines for hex strings.
- *
- * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * IDENTIFICATION
- *	  src/include/common/hex.h
- *
- *
- */
-
-#ifndef COMMON_HEX_H
-#define COMMON_HEX_H
-
-extern uint64 pg_hex_decode(const char *src, size_t srclen,
-							char *dst, size_t dstlen);
-extern uint64 pg_hex_encode(const char *src, size_t srclen,
-							char *dst, size_t dstlen);
-extern uint64 pg_hex_enc_len(size_t srclen);
-extern uint64 pg_hex_dec_len(size_t srclen);
-
-#endif							/* COMMON_HEX_H */

diff --git a/src/include/common/sha2.h b/src/include/common/sha2.h
index dfeee6bceb..f4bae35af1 100644
--- a/src/include/common/sha2.h
+++ b/src/include/common/sha2.h
@@ -18,11 +18,15 @@
 /*** SHA224/256/384/512 Various Length Definitions ***/
 #define PG_SHA224_BLOCK_LENGTH			64
 #define PG_SHA224_DIGEST_LENGTH			28
+#define PG_SHA224_DIGEST_STRING_LENGTH	(PG_SHA224_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA256_BLOCK_LENGTH			64
 #define PG_SHA256_DIGEST_LENGTH			32
+#define PG_SHA256_DIGEST_STRING_LENGTH	(PG_SHA256_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA384_BLOCK_LENGTH			128
 #define PG_SHA384_DIGEST_LENGTH			48
+#define PG_SHA384_DIGEST_STRING_LENGTH	(PG_SHA384_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA512_BLOCK_LENGTH			128
 #define PG_SHA512_DIGEST_LENGTH			64
+#define PG_SHA512_DIGEST_STRING_LENGTH	(PG_SHA512_DIGEST_LENGTH * 2 + 1)

 #endif							/* _PG_SHA2_H_ */

diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index f3ce4fb173..40fcb0ab6d 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -31,6 +31,10 @@ extern void domain_check(Datum value, bool isnull, Oid domainType,
 extern int	errdatatype(Oid datatypeOid);
 extern int	errdomainconstraint(Oid datatypeOid, const char *conname);

+/* encode.c */
+extern uint64 hex_encode(const char *src, size
Re: Added schema level support for publication.
On Mon, Aug 16, 2021 at 11:31 PM Tom Lane wrote:
>
> Peter Smith writes:
> > Then the question from Peter E. [2] "Why can't I have a publication
> > that publishes tables t1, t2, t3, *and* schemas s1, s2, s3." would
> > have an intuitive solution like:
>
> > CREATE PUBLICATION pub1
> >     FOR TABLE t1,t2,t3 AND
> >     FOR ALL TABLES IN SCHEMA s1,s2,s3;
>
> That seems a bit awkward, since the existing precedent is
> to use commas. We shouldn't need more than one FOR noise-word,
> either. So I was imagining syntax more like, say,

When I wrote that "AND" suggestion I had in mind that commas may get weird if there were objects with keyword names, e.g. if there were a schema called SEQUENCE and a SEQUENCE called SEQUENCE, then this would be allowed:

CREATE PUBLICATION pub1
    FOR ALL TABLES IN SCHEMA SEQUENCE, SEQUENCE SEQUENCE;

But probably I was just overthinking it.

> CREATE PUBLICATION pub1 FOR
>     TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2,
>     SEQUENCE seq1,seq2, ALL SEQUENCES IN SCHEMA s3,s4;
>
> Abstractly it'd be
>
> createpub := CREATE PUBLICATION pubname FOR cpitem [, ... ] [ WITH ... ]
>
> cpitem := ALL TABLES |
>           TABLE name |
>           ALL TABLES IN SCHEMA name |
>           ALL SEQUENCES |
>           SEQUENCE name |
>           ALL SEQUENCES IN SCHEMA name |
>           name
>
> The grammar output would need some post-analysis to attribute the
> right type to bare "name" items, but that doesn't seem difficult.

That last bare "name" cpitem looks like it would permit the following syntax:

CREATE PUBLICATION pub FOR a,b,c;

Was that intentional?
--
Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: The Free Space Map: Problems and Opportunities
On Mon, Aug 16, 2021 at 5:15 PM Peter Geoghegan wrote:
> Another concern is knock-on effects *after* any initial problematic
> updates -- that's certainly not where it ends. Every action has
> consequences, and these consequences themselves have consequences. By
> migrating a row from an earlier block into a later block (both
> physically and temporally early/late), we're not just changing things
> for that original order row or rows (the order non-HOT-updated by the
> delivery transaction).

To be clear, TPC-C looks like this: each order row and each order line row will be inserted just once, and then later updated just once. But that's it, forever -- no more modifications. Both tables grow and grow for as long as the benchmark runs.

Users with workloads like this will naturally expect that performance will be steady over time, even over days or weeks. But that's not what we see. What we actually see is that the FSM can never quite resist the temptation to dirty older pages that should be aging out. And so performance degrades over days and weeks -- that's how long Jan has said that it can take.

The FSM does have a bias in favor of using earlier pages over later pages, in order to make relation truncation by VACUUM more likely in general. That bias certainly isn't helping us here, and might be another thing that hurts us. I think that the fundamental problem is that the FSM just doesn't have any way of recognizing that its behavior is penny wise, pound foolish.

I don't believe that there is any fundamental reason why Postgres can't have *stable* long-term performance for this workload, in the way that it's really expected to. That seems possible within the confines of the existing design for HOT and VACUUM, which already work very well for the first few hours.
--
Peter Geoghegan
Re: The Free Space Map: Problems and Opportunities
On Mon, Aug 16, 2021 at 3:49 PM Bruce Momjian wrote:
> OK, here is my feedback. First, I understand the space
> reservation/soccer idea, but I am also concerned that adding space
> reservation could improve some things and make others worse.

That is definitely a legitimate concern. There is a trade-off here: if we do too much preallocation, there may be cases where the preallocated space that we expected to be used quickly is never used by anybody. But that doesn't seem like a problem, provided we don't lose track of the fact that it happened. Then the worst that can happen is that we have empty pages that nobody will ever use, because nobody inserts into the table ever again (not the backend with the leaked space lease, not anybody else). That seems manageable. We can add some kind of ramp-up behavior that gets more and more aggressive as demand for new space from backends is steady or increasing.

> Second, let's look at a concrete example, and see how it can be improved
> more simply. As I understand it, there are three operations that need
> free space:
>
> 1. INSERT/COPY
> 2. UPDATE where there is insufficient space on the page of the old row
> 3. UPDATE where there is sufficient page space

Right.

> The third option already can use the existing page for a HOT update, so
> the FSM doesn't matter.

I agree. All good questions. Thank you for diving in on this.

> For 1 & 2, suppose you have a table with 10 8k pages, all 80% full. As
> I understand it, the first page will go from 80% to 81%, then the next
> row addition will take the second page from 80% to 81%, until all pages
> are 81%, and then it starts over again. Is that accurate?

No. Generally backends have their own target block (a simple BlockNumber) that they store in the relcache, one per recently modified table. Backends/heapam use RelationGetTargetBlock() to access this local cache. So there is "backend affinity" for particular individual heap pages, even today.

However, that still has many problems. It doesn't make sense to have a local cache for a shared resource -- that's the problem. You actually need some kind of basic locking or lease system, so that 10 backends don't all decide at the same time that one particular heap block is fully empty, and therefore a good target block for that backend alone. It's as if the backends believe that they're special little snowflakes, and that no other backend could possibly be thinking the same thing at the same time about the same heap page. And so when TPC-C does its initial bulk insert, distinct orders are already shuffled together in a hodge-podge, just because concurrent bulk inserters all insert on the same heap pages.

If we could safely give out 10 or 50 pages directly to each backend, then clearly the related data would be stored together in this scenario -- each backend would be able to work through its lease of contiguous heap pages for quite a while before revisiting the FSM/relation extension. The trick is to teach the implementation to do that without allowing the backend to permanently leak whole entire leases with maybe dozens of free pages.

Systems like DB2 and Oracle probably can't have this problem. Even the bulk lease case is just an extension of something they had to do anyway. You see, some kind of FSM lease system that knows about transaction lifetime for any given piece of free space is strictly necessary with UNDO-based rollback. Without that, transaction rollback breaks in certain edge cases involving concurrent inserts into the same page, right before a rollback that needs to put an updated version back in place. If the updated version from undo happens to be physically larger than the now-aborted successor version, and if there is little or no space left to cover that, what can rollback do about it? Nothing. It's a nasty bug. They use heavyweight locks to avoid this.

> Is that the non-temporal memory issue you are concerned about?

When I talk about memory or time, what I'm usually referring to is the ability to manage larger allocations of multiple blocks for a while after they're originally requested, so that a requesting transaction/backend is given the opportunity to use the space that it asked for. Backends shouldn't be completely trusted, though. We should trust but verify.

> Doesn't spreading the new rows out increase the ability to do
> HOT updates?

It's true that the problem scenario with TPC-C does involve updates, and it's true that that's when things really go badly. But I deliberately haven't gone into the role that the updates play there too much, not just yet. It's important to note that the problems really do start when we insert, even if that's not obvious -- that's when the locality is lost, right when the original order transaction comes in. We lose locality just because we have concurrent inserters into the same table, where each transaction inserts multiple related rows. We fail to group related rows
Re: archive status ".ready" files may be created too early
So why do we call this structure "record boundary map", when the boundaries it refers to are segment boundaries? I think we should call it "segment boundary map" instead ... and also I think we should use that name in the lock that protects it, so instead of ArchNotifyLock, it could be SegmentBoundaryLock or perhaps WalSegmentBoundaryLock. The reason for the latter is that I suspect the segment boundary map will also have a use in fixing the streaming replication issue I mentioned earlier in the thread. This also makes me think that we'll want the WAL record *start* address to be in the hash table too, not just its *end* address. So we'll use start-1 as the position to send, rather than the end-of-segment when GetFlushRecPtr() returns that. As for 0x, I think it would be cleaner to do a #define MaxXLogSegNo with that value in the line immediately after typedef XLogSegNo, rather than use the numerical value directly in the assignment. Typo in comment atop RemoveRecordBoundariesUpTo: it reads "up to an", should read "up to and". I think the API of GetLatestRecordBoundarySegment would be better by returning the boolean and having the segment as an out argument. Then you could write the caller more cleanly:

if (GetLatestRecordBoundarySegment(last_notified, flushed, &latest_boundary_segment))
{
    SetLastNotified( ... );
    RemoveRecordBoundaries( ... );
    LWLockRelease( ... );
    for (..)
        XLogArchiveNotifySeg(...);
    PgArchWakeup();
}
else
    LWLockRelease(...);

-- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "La virtud es el justo medio entre dos defectos" (Aristóteles)
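For illustration, the API shape suggested above could look like the following toy sketch (the struct and the ascending-array representation are invented here; the real map is a hash table protected by a lock):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogSegNo;

/* Toy segment boundary map: an ascending array of pending boundaries. */
typedef struct
{
    XLogSegNo boundaries[16];
    int       count;
} SegBoundaryMap;

/*
 * Suggested API shape: return a boolean indicating whether any boundary
 * has been reached by the flush point, with the latest such boundary
 * segment returned through an out argument.
 */
static bool
GetLatestRecordBoundarySegment(const SegBoundaryMap *map,
                               XLogSegNo flushed_seg,
                               XLogSegNo *latest)
{
    bool found = false;

    /* Array is sorted ascending, so scan until we pass the flush point. */
    for (int i = 0; i < map->count && map->boundaries[i] <= flushed_seg; i++)
    {
        *latest = map->boundaries[i];
        found = true;
    }
    return found;
}
```

With this shape the caller's lock-release and notification logic falls naturally inside a single if/else, as in the sketch in the message above.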
Re: Re[5]: On login trigger: take three
On Tue, Aug 17, 2021 at 1:11 AM Ivan Panchenko wrote: > > Does it mean that "RAISE NOTICE" shouldn’t be used, or behaves unexpectedly in > logon triggers? Should we mention this in the docs? > No, I don't believe so; it was just that that part of the test framework (sub poll_query_until) had been changed to regard anything output to stderr as an error (so now, for the test to succeed, whatever is printed to stdout must match the expected test output, and stderr must be empty). Regards, Greg Nancarrow Fujitsu Australia
Re: Allow composite foreign keys to reference a superset of unique constraint columns?
On Mon, Aug 16, 2021 at 4:37 PM Paul Martinez wrote: > > It seems like a somewhat useful feature. If people think it would be > useful to > implement, I might take a stab at it when I have time. > > This doesn't seem useful enough for us to be the only implementation to go above and beyond the SQL Standard's specification for the references feature (I assume that is what this proposal suggests). The example does a good job of explaining the idea, but its assumptions aren't especially compelling, so it doesn't make a strong case for how desirable the feature would be. David J.
Allow composite foreign keys to reference a superset of unique constraint columns?
Hey, hackers! While working with some foreign keys, I noticed some mildly unexpected behavior. The columns referenced by a foreign key must naturally have a unique constraint on them:

CREATE TABLE foo (a integer);
CREATE TABLE bar (x integer REFERENCES foo(a));
> ERROR: there is no unique constraint matching given keys for referenced table "foo"

But Postgres doesn't allow a foreign key to reference a set of columns without a unique constraint, even if there is a unique constraint on a subset of those columns (i.e., it doesn't allow referencing a superset of a unique constraint).

CREATE TABLE foo (a integer PRIMARY KEY, b integer);
CREATE TABLE bar (x integer, y integer, FOREIGN KEY (x, y) REFERENCES foo(a, b));
> ERROR: there is no unique constraint matching given keys for referenced table "foo"

It seems to me like there would be nothing wrong with allowing this foreign key constraint to exist. Because there is a unique constraint on foo(a), foo(a, b) will also be unique. And it doesn't seem like it would be too complex to implement. Neither MATCH SIMPLE nor MATCH FULL constraints would have any issues with this. MATCH PARTIAL might, but, alas, it's not implemented. (I've had a few ideas about foreign keys, and MATCH PARTIAL seems to always come up, and I still don't understand what its use case is.) A real-world use case that uses denormalization could run into this.
Imagine a naive music database that has a list of artists, albums, and songs, where each album is by one artist and each song is on one album, but we still store a reference to the artist on each song:

CREATE TABLE artists (id serial PRIMARY KEY, name text);
CREATE TABLE albums (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id),
  name text
);
CREATE TABLE songs (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id) ON DELETE CASCADE,
  album_id integer REFERENCES albums(id) ON DELETE CASCADE,
  name text
);

To ensure that artist deletions are fast, we need to create an index on songs(artist_id) and songs(album_id). But suppose we wanted to save on index space, and we never needed to query JUST by album_id. We could then do:

CREATE TABLE songs (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id) ON DELETE CASCADE,
  album_id integer,
  name text,
  FOREIGN KEY (artist_id, album_id) REFERENCES albums(artist_id, id) ON DELETE CASCADE
);

And then we could have a single index on songs(artist_id, album_id) that would serve both ON DELETE CASCADE triggers:

-- Delete artist
DELETE FROM songs WHERE artist_id = ;
-- Delete album
DELETE FROM songs WHERE artist_id = AND album_id = ;

But Postgres wouldn't let us create the composite foreign key described. It seems like a somewhat useful feature. If people think it would be useful to implement, I might take a stab at it when I have time. - Paul
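For what it's worth, the validation change this proposal asks for boils down to a set-containment test: the referenced column list is acceptable if it contains every column of some unique constraint on the referenced table. A hypothetical sketch, with plain attribute numbers standing in for the catalog lookups (names invented here, not actual backend code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Does the set of referenced FK columns (by attribute number) contain
 * every column of a given unique constraint? If so, the referenced
 * column set is necessarily unique as well.
 */
static bool
fk_covers_unique_key(const int *fk_cols, int n_fk,
                     const int *key_cols, int n_key)
{
    for (int i = 0; i < n_key; i++)
    {
        bool found = false;

        for (int j = 0; j < n_fk; j++)
        {
            if (fk_cols[j] == key_cols[i])
            {
                found = true;
                break;
            }
        }
        if (!found)
            return false;
    }
    return true;
}
```

In this sketch, a foreign key on songs(artist_id, album_id) referencing albums(artist_id, id) would be accepted because the referenced columns cover the unique constraint on albums(id).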
Re: The Free Space Map: Problems and Opportunities
On Mon, Aug 16, 2021 at 10:35:45AM -0700, Peter Geoghegan wrote: > I have suspected that there are serious design issues with the FSM > (and related heapam code like hio.c) for several years now [1]. This > has a lot to do with the experience of using Jan Wieck's BenchmarkSQL > implementation of TPC-C [2][3][4]. It clearly makes Postgres exhibit > pathological performance issues, especially heap fragmentation. OK, here is my feedback. First, I understand the space reservation/soccer idea, but I am also concerned that adding space reservation could improve some things and make others worse. Second, let's look at a concrete example, and see how it can be improved more simply. As I understand it, there are three operations that need free space: 1. INSERT/COPY 2. UPDATE where there is insufficient space on the page of the old row 3. UPDATE where there is sufficient page space The third option already can use the existing page for a HOT update, so the FSM doesn't matter. For 1 & 2, suppose you have a table with 10 8k pages, all 80% full. As I understand it, the first page will go from 80% to 81%, then the next row addition will take the second page from 80% to 81%, until all pages are 81%, and then it starts over again. Is that accurate? Is that the non-temporal memory issue you are concerned about? Doesn't spreading the new rows out increase the ability to do HOT updates? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Autovacuum on partitioned table (autoanalyze)
On Mon, Aug 16, 2021 at 05:42:48PM -0400, Álvaro Herrera wrote: > On 2021-Aug-16, Álvaro Herrera wrote: > > > Here's the reversal patch for the 14 branch. (It applies cleanly to > > master, but the unused member of PgStat_StatTabEntry needs to be > > removed and catversion bumped). > > I have pushed this to both branches. (I did not remove the item from > the release notes in the 14 branch.) |I retained the addition of relkind 'p' to tables included by |pg_stat_user_tables, because reverting that would require a catversion |bump. Right now, on v15dev, it shows 0, which is misleading. Shouldn't it be null? analyze_count | 0 Note that having analyze_count and last_analyze would be an independently useful change. Since parent tables aren't analyzed automatically, I have a script to periodically process them if they weren't processed recently. Right now, for partitioned tables, the best I could find is to check their partitions: | MIN(last_analyzed) FROM pg_stat_all_tables psat JOIN pg_inherits i ON psat.relid=i.inhrelid In 20200418050815.ge26...@telsasoft.com I wrote: |This patch includes partitioned tables in pg_stat_*_tables, which is great; I |complained awhile ago that they were missing [0]. It might be useful if that |part was split out into a separate 0001 patch (?). | [0] https://www.postgresql.org/message-id/20180601221428.GU5164%40telsasoft.com -- Justin
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-16, Álvaro Herrera wrote: > Here's the reversal patch for the 14 branch. (It applies cleanly to > master, but the unused member of PgStat_StatTabEntry needs to be > removed and catversion bumped). I have pushed this to both branches. (I did not remove the item from the release notes in the 14 branch.) It upsets me to have reverted it, but after spending so much time trying to correct the problems, I believe it just wasn't salvageable within the beta-period code freeze constraints. I described the issues I ran into in earlier messages; I think a good starting point to re-develop this is to revert the reversal commit, then apply my patch at https://postgr.es/m/0794d7ca-5183-486b-9c5e-6d434867c...@www.fastmail.com then do something about the remaining problems that were complained about. (Maybe: add an "ancestor OID" member to PgStat_StatTabEntry so that the collector knows to propagate counts from children to ancestors when the upd/ins/del counts are received. However, consider developing it as follow-up to Horiguchi-san's shmem pgstat rather than current pgstat implementation.) Thanks -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: Reducing memory consumption for pending inval messages
"Bossart, Nathan" writes: > On 5/30/21, 10:22 AM, "Tom Lane" wrote: >> We can do a lot better, by exploiting what we know about the usage >> patterns of invalidation requests. > I spent some time looking through this patch, and it seems reasonable > to me. Thanks for reviewing! >> There is one notable new assumption I had to make for this. At end >> of a subtransaction, we have to merge its inval events into the >> "PriorCmd" list of the parent subtransaction. (It has to be the >> PriorCmd list, not the CurrentCmd list, because these events have >> already been processed locally; we don't want to do that again.) >> This means the parent's CurrentCmd list has to be empty at that >> instant, else we'd be trying to merge sublists that aren't adjacent >> in the array. As far as I can tell, this is always true: the patch's >> check for it doesn't trigger in a check-world run. And there's an >> argument that it must be true for semantic consistency (see comments >> in patch). So if that check ever fails, it probably means there is a >> missing CommandCounterIncrement somewhere. Still, this could use more >> review and testing. > I didn't discover any problems with this assumption in my testing, > either. Perhaps it'd be good to commit something like this sooner in > the v15 development cycle to maximize the amount of coverage it gets. Yeah, that's a good point. I'll go push this. regards, tom lane
Re: Reducing memory consumption for pending inval messages
On 5/30/21, 10:22 AM, "Tom Lane" wrote: > We can do a lot better, by exploiting what we know about the usage > patterns of invalidation requests. New requests are always added to > the latest sublist, and the only management actions are (1) merge > latest sublist into next-to-latest sublist, or (2) drop latest > sublist, if a subtransaction aborts. This means we could perfectly > well keep all the requests in a single, densely packed array in > TopTransactionContext, and replace the "list" control structures > with indexes into that array. The attached patch does that. I spent some time looking through this patch, and it seems reasonable to me. > There is one notable new assumption I had to make for this. At end > of a subtransaction, we have to merge its inval events into the > "PriorCmd" list of the parent subtransaction. (It has to be the > PriorCmd list, not the CurrentCmd list, because these events have > already been processed locally; we don't want to do that again.) > This means the parent's CurrentCmd list has to be empty at that > instant, else we'd be trying to merge sublists that aren't adjacent > in the array. As far as I can tell, this is always true: the patch's > check for it doesn't trigger in a check-world run. And there's an > argument that it must be true for semantic consistency (see comments > in patch). So if that check ever fails, it probably means there is a > missing CommandCounterIncrement somewhere. Still, this could use more > review and testing. I didn't discover any problems with this assumption in my testing, either. Perhaps it'd be good to commit something like this sooner in the v15 development cycle to maximize the amount of coverage it gets. Nathan
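To make the index bookkeeping concrete, here is a toy model of the packed-array scheme described above (the struct and function names are invented for illustration, not the actual patch's data structures):

```c
#include <assert.h>

/*
 * Toy model: all pending invalidation events live in one densely packed
 * array, and each transaction nesting level is represented only by two
 * indexes into that array -- where its "prior commands" events begin,
 * and where its "current command" events begin.
 */
typedef struct
{
    int n_events;           /* total events in the packed array */
    int prior_start[8];     /* per level: where PriorCmd events begin */
    int current_start[8];   /* per level: where CurrentCmd events begin */
    int depth;              /* current nesting level, 0 = top */
} InvalState;

/*
 * Subtransaction commit: merge its events into the parent's PriorCmd
 * list. With a packed array this is pure index bookkeeping -- no data
 * is copied. It relies on the assumption discussed above: the parent's
 * CurrentCmd list is empty, so the sublists being merged are adjacent.
 */
static void
merge_into_parent(InvalState *s)
{
    int parent = s->depth - 1;

    assert(s->current_start[parent] == s->prior_start[s->depth]);
    s->current_start[parent] = s->n_events;     /* parent's CurrentCmd stays empty */
    s->depth = parent;
}

/* Subtransaction abort: just truncate the array at the sublist's start. */
static void
drop_latest(InvalState *s)
{
    s->n_events = s->prior_start[s->depth];
    s->depth--;
}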
Re: badly calculated width of emoji in psql
On Mon, Aug 16, 2021 at 1:04 PM Jacob Champion wrote: > > On Mon, 2021-08-16 at 11:24 -0400, John Naylor wrote: > > > > On Sun, Aug 15, 2021 at 10:45 PM Michael Paquier wrote: > > > > > How large do libpgcommon deliverables get with this patch? Skimming > > > through the patch, that just looks like a couple of bytes, still. > > > > More like a couple thousand bytes. That's because the width > > of mbinterval doubled. If this is not desirable, we could teach the > > scripts to adjust the width of the interval type depending on the > > largest character they saw. > > True. Note that the combining character table currently excludes > codepoints outside of the BMP, so if someone wants combinations in > higher planes to be handled correctly in the future, the mbinterval for > that table may have to be widened anyway. Hmm, somehow it escaped my attention that the combining character table script explicitly excludes those. There's no comment about it. Maybe best to ask Peter E. (CC'd) Peter, does the combining char table exclude values > 0x for size reasons, correctness, or some other consideration? -- John Naylor EDB: http://www.enterprisedb.com
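For context on why the mbinterval width matters: the lookup is a binary search over sorted codepoint intervals, so widening the bounds from 16 to 32 bits is what allows entries above U+FFFF to exist at all. A sketch in the style of mbbisearch() in wchar.c, with a few illustrative intervals (the table contents here are examples, not the generated tables):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Widened interval: 32-bit bounds so codepoints beyond the BMP fit. */
typedef struct
{
    uint32_t first;
    uint32_t last;
} mbinterval;

/* Binary search over sorted, non-overlapping intervals. */
static bool
in_interval_table(uint32_t ucs, const mbinterval *table, int n)
{
    int lo = 0, hi = n - 1;

    while (lo <= hi)
    {
        int mid = (lo + hi) / 2;

        if (ucs < table[mid].first)
            hi = mid - 1;
        else if (ucs > table[mid].last)
            lo = mid + 1;
        else
            return true;
    }
    return false;
}
```

With 16-bit bounds the third interval below (emoji skin-tone modifiers, U+1F3FB..U+1F3FF) simply could not be represented, which is the size/coverage trade-off being discussed.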
Re: when the startup process doesn't (logging startup delays)
On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby wrote: > Should this feature distinguish between crash recovery and archive recovery on > a hot standby ? Otherwise the standby will display this all the time. > > 2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed > time: 124.42 s, current LSN: 0/EEE2100 > > If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes > cleanup of xlog.c variables, I can't say that with much confidence). Hmm. My inclination is to think that on an actual standby, you wouldn't want to get messages like this, but if you were doing a point-in-time-recovery, then you would. So I think maybe InArchiveRecovery is not the right thing. Perhaps StandbyMode? -- Robert Haas EDB: http://www.enterprisedb.com
ALTER TYPE vs extension membership (was Re: BUG #17144: Upgrade from v13 to v14 with the cube extension failed)
I wrote: > Hm, thanks. This does not seem to be a problem with pg_upgrade per se; > you can reproduce it with > regression=# CREATE EXTENSION cube VERSION '1.4'; > CREATE EXTENSION > regression=# CREATE EXTENSION earthdistance; > CREATE EXTENSION > regression=# ALTER EXTENSION "cube" UPDATE; > ERROR: type earth is already a member of extension "earthdistance" > [ experiments a bit more ] It might just be directly-dependent types. > If I create a table column of type cube, then nothing strange happens > during the upgrade. But if I create a domain over cube, then do the > update, the domain gets absorbed into the extension. That'd be kind > of annoying :-( So the problem is that ALTER TYPE SET recurses to dependent domains, as it must, but it is not careful about what that implies for extension membership. It looks like we need to extend GenerateTypeDependencies so that it knows not to mess with extension dependencies when doing an ALTER. There's a policy question here, which is when does an operation on a pre-existing object within an extension script cause the object to be absorbed into the extension? You might naively say "never", but that's not our historical behavior, and I think it'd clearly be the wrong thing in some cases. For example, consider CREATE TYPE cube; -- make a shell type -- do something that requires type cube to exist CREATE EXTENSION cube; We don't want this to fail, because it might be necessary to do things that way to get out of circular dependencies. On the other hand, the formerly-shell type had certainly better wind up owned by the extension. The general policy as the code stands seems to be that CREATE OR REPLACE-style operations will absorb any replaced object into the extension. IMO extension scripts generally shouldn't use CREATE OR REPLACE unless they're sure they already have such an object; but if one does use such a command, I think this behavior is reasonable. The situation is a lot less clear-cut for ALTER commands, though. 
Again, it's dubious that an extension should ever apply ALTER to an object that it doesn't know it already owns; but if it does, should that result in absorbing the object? I'm inclined to think not, so the attached patch just causes AlterTypeRecurse and AlterDomainDefault to never change the object's extension membership. That's more behavioral change than is strictly needed to fix the bug at hand, but I think it's a consistent definition. I looked around for other places that might have similar issues, and the only one I could find (accepting that CREATE OR REPLACE should work this way) is that ALTER OPERATOR ... SET applies makeOperatorDependencies, which has the same sort of behavior as GenerateTypeDependencies does. I'm inclined to think that for consistency, we should make that case likewise not change extension membership; but I've not done anything about it in the attached. Another point that perhaps deserves discussion is whether it's okay to change the signature of GenerateTypeDependencies in stable branches (we need to fix this in v13 not only v14/HEAD). I can't see a good reason for extensions to be calling that, and codesearch.debian.net knows of no outside callers, so I'm inclined to just change it. If anyone thinks that's too risky, we could do something with a wrapper function in v13. Comments? regards, tom lane diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c index 10f3119670..07bcdc463a 100644 --- a/src/backend/catalog/pg_depend.c +++ b/src/backend/catalog/pg_depend.c @@ -179,6 +179,13 @@ recordMultipleDependencies(const ObjectAddress *depender, * existed), so we must check for a pre-existing extension membership entry. * Passing false is a guarantee that the object is newly created, and so * could not already be a member of any extension. + * + * Note: isReplace = true is typically used when updating a object in + * CREATE OR REPLACE and similar commands. 
The net effect is that if an + * extension script uses such a command on a pre-existing free-standing + * object, the object will be absorbed into the extension. If the object + * is already a member of some other extension, the command will fail. + * This behavior is desirable for cases such as replacing a shell type. */ void recordDependencyOnCurrentExtension(const ObjectAddress *object, diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index 6f9b5471da..cdce22f394 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -167,6 +167,7 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid ownerId) 0, false, false, + true, /* make extension dependency */ false); /* Post creation hook for new shell type */ @@ -504,6 +505,7 @@ TypeCreate(Oid newTypeOid, relationKind, isImplicitArray, isDependentType, + true, /* make extension dependency */
Re: [PATCH] Native spinlock support on RISC-V
Daniel Gustafsson writes: > There didn’t seem to be anything left here (at least until there is hardware > in > the buildfarm) so I took the liberty to close it as committed in the CF app. Ah, sorry, I did not realize there was a CF entry. regards, tom lane
Re: [PATCH] Native spinlock support on RISC-V
> On 13 Aug 2021, at 20:09, Tom Lane wrote: > ..I split it off to its own code block and pushed it. There didn’t seem to be anything left here (at least until there is hardware in the buildfarm) so I took the liberty to close it as committed in the CF app. -- Daniel Gustafsson https://vmware.com/
Re: Emit namespace in post-copy output
> On 28 Jul 2021, at 16:15, Daniel Gustafsson wrote: > I took a look at this I agree with the reviewer that it's a good change. Pushed to master, thanks! -- Daniel Gustafsson https://vmware.com/
The Free Space Map: Problems and Opportunities
I have suspected that there are serious design issues with the FSM (and related heapam code like hio.c) for several years now [1]. This has a lot to do with the experience of using Jan Wieck's BenchmarkSQL implementation of TPC-C [2][3][4]. It clearly makes Postgres exhibit pathological performance issues, especially heap fragmentation. There is a clear way in which free space in the two largest tables (orders and order lines) ought to be managed, given the fixed pattern of the workload, but that isn't what we see. I had the opportunity to work with Jan Wieck and Greg Smith directly on this, shortly before I left Crunchy Data several weeks ago. Jan was able to fix some issues on the BenchmarkSQL side, which helped a lot. But the real problems remain. It is generally understood that the FSM leads to Postgres doing badly with TPC-C. I personally believe that this is representative of real practical problems, and not just a problem with this one benchmark. This email is an attempt to impose some order on a disorderly problem space. I'd like to compare notes with other people that are interested in the problem. I suspect that some experienced hackers have yet to be convinced of the importance of the FSM when it comes to bloat. Hopefully I'll manage to make that case convincingly on this thread. If any of my more concrete claims about problems in the FSM seem like they need to be justified, please let me know. I am omitting details in this initial overview email for the sake of brevity. It's already too long. Problems The FSM gives out space without considering the passage of time, or the likelihood that a particular transaction or client will consume its requested free space in a more or less predictable and steady way over time. It has no memory. The FSM therefore fails to capture naturally occurring locality, or at least doesn't specifically care about it. This is the central problem, that all other problems with the FSM seem to stem from in one way or another. 
The FSM should group related rows (e.g. rows from the same transaction or backend) together, so that they reliably go on the same heap page -- careless mixing of unrelated rows should be avoided. When I say "related", I mean related in whatever sense the application or end user thinks of them as related. As a general rule we should expect groups of rows that were inserted by the same transaction to also be read, updated, deleted, or removed by VACUUM together, as a group. While this isn't guaranteed, it's a good working assumption for the FSM. It avoids unnecessary fragmentation. The current FSM design causes significant fragmentation with workloads where users *expect* no fragmentation at all. I see this with TPC-C. The way that the FSM *ought* to behave for the workload in question is intuitively obvious, once you lay it all out. But that's not what we see in practice -- far from it. (Happy to go into more detail on this.) You don't even need temporal locality to see problems. Even random inserts show up certain FSM implementation problems. The FSM lacks even a basic understanding of the *aggregate* result of backends applying various FSM heuristics over time, and with concurrent access. Inserting backends currently behave like an amateur soccer team where every individual soccer player chases the ball, without any coordination among team members. The players in this analogy somehow fail to consider where the ball *is about to be* -- there is no temporal awareness. They also miss the importance of cooperating with each other as a team -- there is also no spatial awareness, and no thought given to second order effects. This leads to increased buffer lock contention and fragmentation. (Other known FSM problems have been omitted for the sake of brevity.) Background -- To recap, the FSM tracks how much free space is in every heap page. 
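(As a concrete illustration of that per-page bookkeeping: the on-disk FSM stores one byte per heap page. The sketch below shows the rounding idea only — it is a simplification, not the exact freespace.c code or its search heuristics.)

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192
#define FSM_CATEGORIES 256
#define FSM_CAT_STEP (BLCKSZ / FSM_CATEGORIES)  /* 32 bytes per category */

/* Round available bytes down to a one-byte category. */
static uint8_t
fsm_cat(int avail)
{
    int cat = avail / FSM_CAT_STEP;

    return (cat > FSM_CATEGORIES - 1) ? (FSM_CATEGORIES - 1) : (uint8_t) cat;
}

/* Conservative lower bound on the bytes a stored category represents. */
static int
fsm_avail(uint8_t cat)
{
    return cat * FSM_CAT_STEP;
}
```

The round-trip inequality (fsm_avail(fsm_cat(n)) <= n) is the practical consequence: in this scheme the map can understate, but never overstate, a page's free space.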
Most FSM maintenance takes place during VACUUM, though ordinary connection backends occasionally help to keep the FSM current, via calls to RecordAndGetPageWithFreeSpace() made from hio.c. There is also a bulk extension mechanism added by commit 719c84c1be, which is used when we detect multiple concurrent inserters. This bulk extension mechanism does change things a little (it gives some thought to systematic effects), though it still has the same basic issues: it doesn't do nearly enough to group likely-related rows together. I suspect that improving the bulk extension mechanism will be an important part of improving the overall picture. That mechanism needs to be more explicit, and more careful about who gets what free space when. I'll go into the significance of this mechanism to the FSM below, under "Opportunities". But first, more background information. Papers -- I've taken time to review the database research literature, which doesn't have all that much to say here. But there are a couple of relevant classic papers that I know of [6][7]. A lot of the considerations for free space
Re: Autovacuum on partitioned table (autoanalyze)
Another possible problem is that before the revert, we accepted

ALTER TABLE some_partitioned_table SET (autovacuum_enabled=on/off);

(also autovacuum_analyze_scale_factor and autovacuum_analyze_threshold) but after the revert this will throw a syntax error. What do people think we should do about that?

1. Do nothing. If somebody finds themselves in that situation, they can use ALTER TABLE .. RESET ... to remove the settings.
2. Silently accept the option and do nothing.
3. Accept the option and throw a warning that it's a no-op.
4. Something else.

Opinions? -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
On Mon, Aug 16, 2021 at 13:19, Hans Buschmann wrote: > -- > *From:* Ranier Vilela > *Sent:* Monday, August 16, 2021 17:04 > *To:* Hans Buschmann > *Cc:* pgsql-hack...@postgresql.org > *Subject:* Re: PG14: Avoid checking output-buffer-length for every > encoded byte during pg_hex_encode > > Hello Ranier, > > Thank you for your quick response. > > >It's always good to remove immutables from loops, if safe and secure. > > I think it's worse than that, because a loop-changed variable is involved in the > test, so it is not actually immutable. > > >Decode has a regression too. > > good catch, I overlooked it. > > >I think that we can take one more step here. > >pg_hex_decode can be improved too. > > +1 > > By looking a bit more precisely I detected the same checks in > common/base64.c also involving loop-changed variables. Could you please > make the same changes to base64.c? > I will take a look. > > >We can remove limitations from all functions that use hex functions. > >byteaout (from varlena.c) has an artificial limitation to handle only > MaxAllocSize length, but > >nothing prevents it from being prepared for that limitation to be > removed one day. > > +1, but I don't know all the implications of the type change to size_t > uint64 and size_t on 64-bit are equivalent. uint64 and size_t on 32-bit can be weird, but anyway size_t at 32 bits can handle 1GB. > > >Please, take a look at the new version attached. > > Two remarks for decoding (by eyeball inspection of the patch file only, > because I'm still struggling with git etc.): > > 1. The odd-number-of-digits check for decoding is still part of the loop. > It should be before the loop and done only once (by testing that srclen is an even > number). > So we can eliminate the block if (s == srcend) {} inside the loop! > I'm afraid that is not possible. I tested some variations of this check and regress fails in the strings tests.
Anyway, for test purposes, I changed it temporarily in this version, but I'm afraid we'll have to go back. > 2. I noticed that the error messages for hex_decode should be changed as > > s/encoding/decoding/ > > (as in pg_log_fatal("overflow of destination buffer in hex encoding");) > Ok. Changed. > >If possible, can you review the tests for whether there is an overflow at > >pg_hex_encode and pg_hex_decode functions? > > I would prefer to wait for another patch version from you (my development > troubles), perhaps also dealing with base64.c > base64.c can be done in a separate patch. > I don't know how and where any call to the encoding/decoding functions > creates an overflow situation in normal operation. > > I will nonetheless try the tests, but I don't have any experience with it. > Thanks. > BTW the root cause of my work is an attempt to vastly accelerate these > functions (hex and base64 encode/decode), but this is left for another day > (not finished yet) and certainly only on master/new development. > I think this can speed things up a little. > Later on, support on this topic would be highly appreciated... > If I can help, count on me. regards, Ranier Vilela 0001-improve-hex-functions-handling-v1.patch Description: Binary data
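The optimization under discussion — hoisting the destination-length check out of the per-byte loop — can be sketched in a few lines. (Hypothetical function name and return convention: the real pg_hex_encode in src/common has a different signature and reports overflow via error rather than a return code.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char hextbl[] = "0123456789abcdef";

/*
 * Hex-encode src into dst, validating the destination length once,
 * before the loop, instead of re-checking it for every encoded byte.
 * Returns the number of bytes written, or 0 if dst is too small.
 */
static size_t
hex_encode_hoisted(const uint8_t *src, size_t srclen,
                   char *dst, size_t dstlen)
{
    if (dstlen < srclen * 2)        /* single up-front overflow check */
        return 0;

    for (size_t i = 0; i < srclen; i++)
    {
        dst[2 * i]     = hextbl[src[i] >> 4];
        dst[2 * i + 1] = hextbl[src[i] & 0xF];
    }
    return srclen * 2;
}
```

Because the output length is exactly 2 * srclen, the loop body carries no loop-varying bounds test at all, which is the point of the patch; the same shape applies to decoding once the odd-length check is moved before the loop.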
Re: badly calculated width of emoji in psql
On Mon, 2021-08-16 at 11:24 -0400, John Naylor wrote: > > On Sun, Aug 15, 2021 at 10:45 PM Michael Paquier wrote: > > > How large do libpgcommon deliverables get with this patch? Skimming > > through the patch, that just looks like a couple of bytes, still. > > More like a couple thousand bytes. That's because the width > of mbinterval doubled. If this is not desirable, we could teach the > scripts to adjust the width of the interval type depending on the > largest character they saw. True. Note that the combining character table currently excludes codepoints outside of the BMP, so if someone wants combinations in higher planes to be handled correctly in the future, the mbinterval for that table may have to be widened anyway. --Jacob
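The mbinterval widening discussed above can be sketched roughly as follows. This is a simplified illustration, not the actual PostgreSQL code: the table contents here are made up (the real tables are generated from Unicode data files), but it shows why widening the interval bounds from 16 to 32 bits — so that codepoints beyond the BMP, above U+FFFF, can be represented — doubles the per-entry size, and how the bisection lookup works either way.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative interval type: widened to 32-bit bounds so that
 * codepoints outside the BMP (above U+FFFF) fit.  With uint16_t
 * fields, each entry was half this size -- hence the size growth
 * discussed in the thread.
 */
typedef struct
{
    uint32_t first;
    uint32_t last;
} mbinterval;

/*
 * A tiny stand-in table of zero-width (combining) ranges; PostgreSQL
 * generates the real table from Unicode data.  Note the third entry
 * lies in plane 14, which a 16-bit interval could not hold.
 */
static const mbinterval combining[] = {
    {0x0300, 0x036F},           /* combining diacritical marks */
    {0x20D0, 0x20FF},           /* combining marks for symbols */
    {0xE0100, 0xE01EF}          /* variation selectors supplement */
};

/*
 * Binary search over the sorted interval table, in the style of
 * psql's width-lookup code.  Returns 1 if ucs falls in some interval.
 */
static int
mbbisearch(uint32_t ucs, const mbinterval *table, int max)
{
    int min = 0;

    if (ucs < table[0].first || ucs > table[max].last)
        return 0;
    while (max >= min)
    {
        int mid = (min + max) / 2;

        if (ucs > table[mid].last)
            min = mid + 1;
        else if (ucs < table[mid].first)
            max = mid - 1;
        else
            return 1;
    }
    return 0;
}
```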
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-16, Tom Lane wrote: > =?utf-8?Q?=C3=81lvaro?= Herrera writes: > > Here's the reversal patch for the 14 branch. (It applies cleanly to > > master, but the unused member of PgStat_StatTabEntry needs to be > > removed and catversion bumped). > > I don't follow the connection to catversion? Sorry, I misspoke -- I mean PGSTAT_FORMAT_FILE_ID. I shouldn't just change it, since if I do then the file is reported as corrupted and all counters are lost. So in the posted patch I did as you suggest: > I agree that we probably don't want to change PgStat_StatTabEntry in > v14 at this point. But it'd be a good idea to attach a comment to > the entry saying it's unused but left there for ABI reasons. It's only in branch master that I'd change the pgstat format version and remove the field. This is what I meant with the patch being for v14 and a tweak needed for this in master. A catversion bump would be required to change the definition of pg_stat_user_tables, which the patch being reverted originally changed to include relkind 'p'. A straight revert would remove that, but in my reversal patch I chose to keep it in place. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto, sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
From: Ranier Vilela Sent: Monday, August 16, 2021 17:04 To: Hans Buschmann Cc: pgsql-hack...@postgresql.org Subject: Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode Hello Ranier, Thank you for your quick response. >It is always good to remove invariants from loops, if safe and secure. I think it's worse, because a loop-changed variable is involved in the test, so it is not actually invariant. >Decode has a regression too. Good catch, I overlooked it. >I think that we can take one more step here. >pg_hex_decode can be improved too. +1 By looking a bit more precisely I detected the same checks in common/base64.c, also involving loop-changed variables. Could you please make the same changes to base64.c? >We can remove limitations from all functions that use the hex functions. >byteaout (from varlena.c) has an artificial limitation to handle only >MaxAllocSize length, but >nothing prevents it from being prepared for that limitation to be removed one >day. +1, but I don't know all the implications of the type change to size_t >Please, take a look at the new version attached. Two remarks for decoding (by eyeball inspection of the patch file only, because I am still struggling with git etc.): 1. The odd-number-of-digits check for decoding is still part of the loop. It should be before the loop and done only once (by testing for an even srclen). So we can eliminate the block if (s == srcend) {} inside the loop! 2. I noticed that the error messages for hex_decode should be changed as s/encoding/decoding/ (as in pg_log_fatal("overflow of destination buffer in hex encoding");) >If possible, can you review the tests to see if there is an overflow in >the pg_hex_encode and pg_hex_decode functions? I would prefer to wait for another patch version from you (my development troubles), perhaps also dealing with base64.c I don't know how and where any call to the encoding/decoding functions creates an overflow situation in normal operation.
I will nonetheless try the tests, but I don't have any experience with them. BTW, the root cause for my work is an attempt to vastly accelerate these functions (hex and base64 encode/decode), but this is left for another day (not finished yet) and certainly only on master/new development. Later on, support on this topic would be highly appreciated... Best regards, Hans Buschmann
Re: Autovacuum on partitioned table (autoanalyze)
=?utf-8?Q?=C3=81lvaro?= Herrera writes: > Here's the reversal patch for the 14 branch. (It applies cleanly to > master, but the unused member of PgStat_StatTabEntry needs to be > removed and catversion bumped). I don't follow the connection to catversion? I agree that we probably don't want to change PgStat_StatTabEntry in v14 at this point. But it'd be a good idea to attach a comment to the entry saying it's unused but left there for ABI reasons. regards, tom lane
Re: Slightly improve initdb --sync-only option's help message
On Mon, Aug 16, 2021 at 4:42 AM Daniel Gustafsson wrote: > > > On 30 Jul 2021, at 18:27, Bossart, Nathan wrote: > > > > On 7/30/21, 2:22 AM, "Daniel Gustafsson" wrote: > >> LGTM. I took the liberty to rephrase the "and exit" part of the initdb > >> help > >> output match the other exiting options in there. Barring objections, I > >> think > >> this is ready. > > > > LGTM. Thanks! > > Pushed to master, thanks! Thank you Daniel and Nathan! Much appreciated. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/
Re: call popcount32/64 directly on non-x86 platforms
I wrote: > On Thu, Aug 12, 2021 at 1:26 AM David Rowley wrote: > > Closer, but I don't see why there's any need to make the fast and slow > > functions external. It should be perfectly fine to keep them static. > > > > I didn't test the performance, but the attached works for me. > > Thanks for that! I still get a big improvement on Power8 / gcc 4.8, but it's not quite as fast as earlier versions, which were around 200ms: > > master: 646ms > v3: 312ms > > This machine does seem to be pickier about code layout than any other I've tried running benchmarks on, but that's still a bit much. In any case, your version is clearer and has the intended effect, so I plan to commit that, barring other comments. Pushed, thanks for looking! -- John Naylor EDB: http://www.enterprisedb.com
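The change being discussed — calling the popcount implementation directly rather than through a function pointer on platforms that don't need a runtime CPUID check for POPCNT — can be sketched as below. This is a hedged illustration, not the committed code: the function names mirror PostgreSQL's, but the dispatch logic is simplified here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable software fallback: the classic shift-and-add bit count.
 * On x86 this is the "slow" path selected when the POPCNT instruction
 * is unavailable; on other platforms no runtime check is needed.
 */
static int
pg_popcount64_slow(uint64_t word)
{
    int result = 0;

    while (word != 0)
    {
        result += word & 1;
        word >>= 1;
    }
    return result;
}

/*
 * Direct call with no function-pointer indirection: the compiler picks
 * the implementation at build time.  (The real code additionally keeps
 * runtime dispatch on x86, where POPCNT support must be probed.)
 */
static int
pg_popcount64(uint64_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(word);
#else
    return pg_popcount64_slow(word);
#endif
}
```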
Re: Window Function "Run Conditions"
On Mon, Aug 16, 2021 at 3:28 AM David Rowley wrote: > On Thu, 1 Jul 2021 at 21:11, David Rowley wrote: > > 1) Unsure of the API to the prosupport function. I wonder if the > > prosupport function should just be able to say if the function is > > either monotonically increasing or decreasing or neither then have > > core code build a qual. That would make the job of building new > > functions easier, but massively reduce the flexibility of the feature. > > I'm just not sure it needs to do more in the future. > > I looked at this patch again today and ended up changing the API that > I'd done for the prosupport functions. These just now set a new > "monotonic" field in the (also newly renamed) > SupportRequestWFuncMonotonic struct. This can be set to one of the > values from the newly added MonotonicFunction enum, namely: > MONOTONICFUNC_NONE, MONOTONICFUNC_INCREASING, MONOTONICFUNC_DECREASING > or MONOTONICFUNC_BOTH. > > I also added handling for a few more cases that are perhaps rare but > could be done with just a few lines of code. For example; COUNT(*) > OVER() is MONOTONICFUNC_BOTH as it can neither increase nor decrease > for a given window partition. I think technically all of the standard > set of aggregate functions could have a prosupport function to handle > that case. Min() and Max() could go a little further, but I'm not sure > if adding handling for that would be worth it, and if someone does > think that it is worth it, then I'd rather do that as a separate > patch. > > I put the MonotonicFunction enum in plannodes.h. There's nothing > specific about window functions or support functions. It could, for > example, be reused again for something else such as monotonic > set-returning functions. > > One thing which I'm still not sure about is where > find_window_run_conditions() should be located. Currently, it's in > allpaths.c but that does not really feel like the right place to me. 
> We do have planagg.c in src/backend/optimizer/plan, maybe we need > planwindow.c? > > David > Hi, + if ((res->monotonic & MONOTONICFUNC_INCREASING) == MONOTONICFUNC_INCREASING) The above can be simplified as 'if (res->monotonic & MONOTONICFUNC_INCREASING) ' Cheers
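The reviewer's simplification works because the monotonicity values are laid out as bit flags, with MONOTONICFUNC_BOTH being the union of the two directions. A minimal sketch — the exact enum values are not shown in the thread, so the flag layout below is an assumption:

```c
#include <assert.h>

/*
 * Assumed bit-flag layout for the MonotonicFunction enum described in
 * the mail above; MONOTONICFUNC_BOTH combines both direction bits.
 */
typedef enum MonotonicFunction
{
    MONOTONICFUNC_NONE = 0,
    MONOTONICFUNC_INCREASING = (1 << 0),
    MONOTONICFUNC_DECREASING = (1 << 1),
    MONOTONICFUNC_BOTH = MONOTONICFUNC_INCREASING | MONOTONICFUNC_DECREASING
} MonotonicFunction;

/*
 * Because MONOTONICFUNC_INCREASING is a single bit, the explicit
 * "(x & FLAG) == FLAG" comparison and the bare "x & FLAG" test are
 * equivalent -- which is the simplification suggested in the review.
 */
static int
is_monotonically_increasing(MonotonicFunction m)
{
    return (m & MONOTONICFUNC_INCREASING) != 0;
}
```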
Re: Autovacuum on partitioned table (autoanalyze)
Here's the reversal patch for the 14 branch. (It applies cleanly to master, but the unused member of PgStat_StatTabEntry needs to be removed and catversion bumped). -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ Maybe there's lots of data loss but the records of data loss are also lost. (Lincoln Yeoh) >From cad5b710a531ec6eefc8856177c68d594c60ac8c Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 16 Aug 2021 10:56:07 -0400 Subject: [PATCH] Revert analyze support for partitioned tables This reverts the following commits: 1b5617eb844cd2470a334c1d2eec66cf9b39c41a Describe (auto-)analyze behavior for partitioned tables 0e69f705cc1a3df273b38c9883fb5765991e04fe Set pg_class.reltuples for partitioned tables 41badeaba8beee7648ebe7923a41c04f1f3cb302 Document ANALYZE storage parameters for partitioned tables 0827e8af70f4653ba17ed773f123a60eadd9f9c9 autovacuum: handle analyze for partitioned tables There are efficiency issues in this code when handling databases with large numbers of partitions, and it doesn't look like there is any trivial way to handle those. There are some other issues as well. It's now too late in the cycle for nontrivial fixes, so we'll have to let Postgres 14 users continue to manually ANALYZE their partitioned tables, and hopefully we can fix the issues for Postgres 15. I chose to keep [most of] be280cdad298 ("Don't reset relhasindex for partitioned tables on ANALYZE") because while it was added due to 0827e8af70f4, it is a reasonable change in its own right (since it affects manual analyze as well as autovacuum-induced analyze) and there's no reason to revert it. I retained relkind 'p' in the definition of view pg_stat_user_tables, because that change would require a catversion bump. Also, in pg14 only, I keep a struct member that was added to PgStat_StatTabEntry to avoid breaking compatibility with existing stat files, because changing that would require a catversion bump. Backpatch to 14.
Discussion: https://postgr.es/m/20210722205458.f2bug3z6qzxzp...@alap3.anarazel.de --- doc/src/sgml/maintenance.sgml | 6 -- doc/src/sgml/perform.sgml | 3 +- doc/src/sgml/ref/analyze.sgml | 40 +++-- doc/src/sgml/ref/create_table.sgml | 8 +- doc/src/sgml/ref/pg_restore.sgml | 6 +- src/backend/access/common/reloptions.c | 15 ++-- src/backend/commands/analyze.c | 52 +++- src/backend/commands/tablecmds.c | 47 +-- src/backend/postmaster/autovacuum.c| 66 +++ src/backend/postmaster/pgstat.c| 108 +++-- src/include/pgstat.h | 26 +- 11 files changed, 57 insertions(+), 320 deletions(-) diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 998a48fc25..36f975b1e5 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -817,12 +817,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu is compared to the total number of tuples inserted, updated, or deleted since the last ANALYZE. -For partitioned tables, inserts, updates and deletes on partitions -are counted towards this threshold; however, DDL -operations such as ATTACH, DETACH -and DROP are not, so running a manual -ANALYZE is recommended if the partition added or -removed contains a statistically significant volume of data. diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index ddd6c3ff3e..89ff58338e 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1767,8 +1767,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; Whenever you have significantly altered the distribution of data within a table, running ANALYZE is strongly recommended. This -includes bulk loading large amounts of data into the table as well as -attaching, detaching or dropping partitions. Running +includes bulk loading large amounts of data into the table. Running ANALYZE (or VACUUM ANALYZE) ensures that the planner has up-to-date statistics about the table. 
With no statistics or obsolete statistics, the planner might diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml index 176c7cb225..c8fcebc161 100644 --- a/doc/src/sgml/ref/analyze.sgml +++ b/doc/src/sgml/ref/analyze.sgml @@ -250,38 +250,20 @@ ANALYZE [ VERBOSE ] [ table_and_columns - If the table being analyzed is partitioned, ANALYZE - will gather statistics by sampling blocks randomly from its partitions; - in addition, it will recurse into each partition and update its statistics. - (However, in multi-level partitioning scenarios, each leaf partition - will only be analyzed once.) - By contrast, if the table being analyzed has inheritance children, - ANALYZE will gather statistics for it twice: - once on the rows of the parent table only, and a s
Re: PoC/WIP: Extended statistics on expressions
On 8/16/21 3:32 AM, Justin Pryzby wrote: On Mon, Dec 07, 2020 at 03:15:17PM +0100, Tomas Vondra wrote: Looking at the current behaviour, there are a couple of things that seem a little odd, even though they are understandable. For example, the fact that CREATE STATISTICS s (expressions) ON (expr), col FROM tbl; fails, but CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl; succeeds and creates both "expressions" and "mcv" statistics. Also, the syntax CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl; tends to suggest that it's going to create statistics on the pair of expressions, describing their correlation, when actually it builds 2 independent statistics. Also, this error text isn't entirely accurate: CREATE STATISTICS s ON col FROM tbl; ERROR: extended statistics require at least 2 columns because extended statistics don't always require 2 columns, they can also just have an expression, or multiple expressions and 0 or 1 columns. I think a lot of this stems from treating "expressions" in the same way as the other (multi-column) stats kinds, and it might actually be neater to have separate documented syntaxes for single- and multi-column statistics: CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ON (expression) FROM table_name CREATE STATISTICS [ IF NOT EXISTS ] statistics_name [ ( statistics_kind [, ... ] ) ] ON { column_name | (expression) } , { column_name | (expression) } [, ...] FROM table_name The first syntax would create single-column stats, and wouldn't accept a statistics_kind argument, because there is only one kind of single-column statistic. Maybe that might change in the future, but if so, it's likely that the kinds of single-column stats will be different from the kinds of multi-column stats. In the second syntax, the only accepted kinds would be the current multi-column stats kinds (ndistinct, dependencies, and mcv), and it would always build stats describing the correlations between the columns listed. 
It would continue to build standard/expression stats on any expressions in the list, but that's more of an implementation detail. It would no longer be possible to do "CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to issue 2 separate "CREATE STATISTICS" commands, but that seems more logical, because they're independent stats. The parsing code might not change much, but some of the errors would be different. For example, the errors "building only extended expression statistics on simple columns not allowed" and "extended expression statistics require at least one expression" would go away, and the error "extended statistics require at least 2 columns" might become more specific, depending on the stats kind. This still seems odd: postgres=# CREATE STATISTICS asf ON i FROM t; ERROR: extended statistics require at least 2 columns postgres=# CREATE STATISTICS asf ON (i) FROM t; CREATE STATISTICS It seems wrong that the command works with added parens, but builds expression stats on a simple column (which is redundant with what analyze does without extended stats). Well, yeah. But I think this is a behavior that was discussed somewhere in this thread, and the agreement was that it's not worth the complexity, as this comment explains * XXX We do only the bare minimum to separate simple attribute and * complex expressions - for example "(a)" will be treated as a complex * expression. No matter how elaborate the check is, there'll always be * a way around it, if the user is determined (consider e.g. "(a+0)"), * so it's not worth protecting against it. Of course, maybe that wasn't the right decision - it's a bit weird that CREATE INDEX on t ((a), (b)) actually "extracts" the column references and stores that in indkeys, instead of treating that as expressions. Patch 0001 fixes the "double parens" issue discussed elsewhere in this thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple column reference. 
But I'm not sure 0002 is something we can do without catversion bump. What if someone created such "bogus" statistics? It's mostly harmless, because the statistics is useless anyway (AFAICS we'll just use the regular one we have for the column), but if they do pg_dump, that'll fail because of this new restriction. OTOH we're still "only" in beta, and IIRC the rule is not to bump catversion after rc1. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 4428ee5b46fe1ce45331c355e1646520ca769991 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 16 Aug 2021 16:40:43 +0200 Subject: [PATCH 1/2] fix: don't print double parens --- src/backend/utils/adt/ruleutils.c | 2 +- .../regress/expected/create_table_like.out| 6 +- src/test/regress/expected/stats_ext.out | 110 +- 3 files changed, 59 insertions(+), 59 del
Re: Non-decimal integer literals
On Mon, Aug 16, 2021 at 5:52 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > Here is a patch to add support for hexadecimal, octal, and binary > integer literals: > > 0x42E > 0o112 > 0b100101 > > per SQL:202x draft. > > This adds support in the lexer as well as in the integer type input > functions. The one thing that jumped out at me on a cursory reading is the {integer} rule, which seems to be used nowhere except to call process_integer_literal, which must then inspect the token text to figure out what type of integer it is. Maybe consider 4 separate process_*_literal functions? -- John Naylor EDB: http://www.enterprisedb.com
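The literal forms the patch adds (`0x42E`, `0o112`, `0b100101`) can be parsed by detecting the prefix once and converting with the matching base — which is the spirit of the per-kind `process_*_literal` functions suggested above, instead of one shared rule that re-inspects the token text. The function below is purely hypothetical, a sketch using plain `strtol` rather than PostgreSQL's lexer machinery:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: dispatch on the 0x/0o/0b prefix once, then
 * convert the remainder with the matching base.  (The real lexer
 * recognizes the prefix in separate flex rules; no such function
 * exists in the patch under discussion.)
 */
static long
parse_nondecimal_literal(const char *tok)
{
    if (tok[0] == '0')
    {
        switch (tok[1])
        {
            case 'x':
            case 'X':
                return strtol(tok + 2, NULL, 16);   /* hexadecimal */
            case 'o':
            case 'O':
                return strtol(tok + 2, NULL, 8);    /* octal */
            case 'b':
            case 'B':
                return strtol(tok + 2, NULL, 2);    /* binary */
        }
    }
    return strtol(tok, NULL, 10);                   /* plain decimal */
}
```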
Re: badly calculated width of emoji in psql
On Sun, Aug 15, 2021 at 10:45 PM Michael Paquier wrote: > > On Thu, Aug 12, 2021 at 05:13:31PM -0400, John Naylor wrote: > > I'm a bit concerned about the build dependencies not working right, but > > it's not clear it's even due to the patch. I'll spend some time > > investigating next week. > > How large do libpgcommon deliverables get with this patch? Skimming > through the patch, that just looks like a couple of bytes, still. More like a couple thousand bytes. That's because the width of mbinterval doubled. If this is not desirable, we could teach the scripts to adjust the width of the interval type depending on the largest character they saw. -- John Naylor EDB: http://www.enterprisedb.com
Re[5]: On login trigger: take three
Hi Greg, >Wednesday, July 7, 2021, 3:55 +03:00 from Greg Nancarrow : >On Sun, Jul 4, 2021 at 1:21 PM vignesh C < vignes...@gmail.com > wrote: >> >> CFBot shows the following failure: >> # poll_query_until timed out executing this query: >> # SELECT '0/3046250' <= replay_lsn AND state = 'streaming' FROM >> pg_catalog.pg_stat_replication WHERE application_name = 'standby_1'; >> # expecting this output: >> # t >> # last actual query output: >> # t >> # with stderr: >> # NOTICE: You are welcome! >> # Looks like your test exited with 29 before it could output anything. >> t/001_stream_rep.pl .. >> Dubious, test returned 29 (wstat 7424, 0x1d00) >> >Thanks. >I found that the patch was broken by commit f452aaf7d (the part >"adjust poll_query_until to insist on empty stderr as well as a stdout >match"). >So I had to remove a "RAISE NOTICE" (which was just an informational >message) from the login trigger function, to satisfy the new >poll_query_until expectations. Does it mean that "RAISE NOTICE" shouldn't be used, or behaves unexpectedly, in login triggers? Should we mention this in the docs? Regards, Ivan Panchenko >Also, I updated a PG14 version check (now must check PG15 version). > >Regards, >Greg Nancarrow >Fujitsu Australia
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
Welcome. On Mon, Aug 16, 2021 at 5:46 AM, Hans Buschmann wrote: > During some development on encoding-related parts of postgres I stumbled > over the new length-checking code in common/hex.c/pg_hex_encode. > > Unlike in all versions before, the > output-buffer length is checked during encoding of every individual source > byte. > It is always good to remove invariants from loops, if safe and secure. > This may impose quite a regression when using pg_dump on databases with > many/big bytea or lo columns. > Decode has a regression too. > Because all criteria to check the buffer length are known in advance of > encoding (srclen and destlen) I propose doing the check only once before > starting the while-loop. > Good. > > Please find the attached patch for pg14 and master; older versions did not > have this behavior. > I think that we can take one more step here. pg_hex_decode can be improved too. We can remove limitations from all functions that use the hex functions. byteaout (from varlena.c) has an artificial limitation to handle only MaxAllocSize length, but nothing prevents it from being prepared for that limitation to be removed one day. > Tested on pg14-beta3, but applies also on master. > > PS: This is my very first patch, I am in no way an experienced C developer > and don't have a smoothly running development environment or experience > yet. (originally mostly dealing with postgres on Windows). > It seems good to me. Please, take a look at the new version attached. If possible, can you review the tests to see if there is an overflow in the pg_hex_encode and pg_hex_decode functions? regards, Ranier Vilela 0001-improve-hex-functions-handling.patch Description: Binary data
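The core of the proposed change can be sketched as follows. This is an illustrative reimplementation, not the actual patch: the real pg_hex_encode has a different signature and error handling, but the point — hoisting the destination-length check out of the per-byte loop, since both srclen and dstlen are known up front — is the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char hextbl[] = "0123456789abcdef";

/*
 * Sketch of the proposed change: one up-front output-buffer check
 * replaces a test per encoded byte.  Returns the number of bytes
 * written, or 0 on overflow (the real function reports the error
 * differently; this return convention is just for the sketch).
 */
static size_t
hex_encode_checked(const char *src, size_t srclen, char *dst, size_t dstlen)
{
    const char *end = src + srclen;
    char       *p = dst;

    /* single check before the loop: every source byte emits 2 chars */
    if (dstlen < srclen * 2)
        return 0;

    while (src < end)
    {
        unsigned char c = (unsigned char) *src++;

        *p++ = hextbl[c >> 4];
        *p++ = hextbl[c & 0xF];
    }
    return (size_t) (p - dst);
}
```

The same reasoning applies to decoding: an even-srclen test before the loop would subsume the per-iteration odd-number-of-digits check discussed above.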
Re: POC: Cleaning up orphaned files using undo logs
> On Wed, Jun 30, 2021 at 07:41:16PM +0200, Antonin Houska wrote: > Antonin Houska wrote: > > > tsunakawa.ta...@fujitsu.com wrote: > > > > > I'm crawling like a snail to read the patch set. Below are my first set > > > of review comments, which are all minor. > > > > Thanks. > > I've added the patch to the upcoming CF [1], so it possibly gets more review > and makes some progress. I've marked myself as the author so it's clear who > will try to respond to the reviews. It's clear that other people did much more > work on the feature than I did so far - they are welcome to add themselves to > the author list. > > [1] https://commitfest.postgresql.org/33/3228/ Hi, I'm crawling through the patch set like an even slower creature than a snail; sorry for the long absence. I'm reading the latest version posted here and, although it's hard to give any high-level design comments on it yet, I thought it could be useful to post a few findings and questions in the meantime. * One question about the functionality: > On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote: > Attached is the next version. Changes done: > > * Removed the progress tracking and implemented undo discarding in a simpler > way. Now, instead of maintaining the pointer to the last record applied, > only a boolean field in the chunk header is set when ROLLBACK is > done. This helps to determine whether the undo of a non-committed > transaction can be discarded. Just to clarify, the whole feature was removed for the sake of simplicity, right? * By throwing `make installcheck` at the patch set I'm getting, from time to time, an error on restart: TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)", File: "undorecordset.c", Line: 1098, PID: 6055) From what I see, XLogReadBufferForRedoExtended finds an invalid buffer and returns BLK_NOTFOUND. The commentary says: If the block was not found, then it must be discarded later in the WAL.
and continues with skip = false, but fails to get a page from an invalid buffer a few lines later. It seems that the skip flag is supposed to be used in this situation; should it also guard the BufferGetPage part? * Another interesting issue I've found happened inside DropUndoLogsInTablespace, when the process got SIGTERM. It seems processing got stuck on: slist_foreach_modify(iter, &UndoLogShared->shared_free_lists[i]) iterating on the same element over and over. My guess is that clear_private_free_lists was called and caused this unexpected outcome; should the access to shared_free_lists be somehow protected? * I also wonder about the segments in base/undo; the commentary in pg_undodump says: Since the UNDO log is a continuous stream of changes, any hole terminates processing. It looks like it's relatively easy to end up with such holes, and pg_undodump ends up with a message ("found" is added by me and contains a found offset which does not match the expected value): pg_undodump: error: segment 00 missing in log 2, found 10 This does not seem to be causing any real issues, but it's not clear to me whether such a situation with gaps is fine or a problem. Other than that, once more, thank you for this tremendous work; I find the topic extremely important.
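The symptom described for the shared free lists — iterating on the same element over and over — is the classic failure mode of walking a linked list that another process mutates concurrently. A generic, self-contained illustration (not PostgreSQL's ilist code) of how a self-referencing next pointer turns a foreach-style loop into an infinite one, with a defensive step bound to make the hang observable:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked list node, standing in for the shared free
 * list entries; nothing here is the actual undo-log structure. */
typedef struct node
{
    struct node *next;
} node;

/*
 * Walk the list, but give up after max_steps iterations instead of
 * spinning forever.  Returns the number of nodes visited, or -1 if
 * the bound was hit -- i.e. the walk got stuck (for example on a
 * node whose next pointer points back at itself after a concurrent
 * list reset).
 */
static int
walk_bounded(node *head, int max_steps)
{
    int count = 0;

    for (node *cur = head; cur != NULL; cur = cur->next)
    {
        if (++count > max_steps)
            return -1;          /* stuck on the same element */
    }
    return count;
}
```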
Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
On Fri, Aug 13, 2021 at 9:29 PM Andres Freund wrote: > > I think we can extend this API but I guess it is better to then do it > > for dsm-based as well so that these get tracked via resowner. > > DSM segments are resowner managed already, so it's not obvious that that'd buy > us much? Although I guess there could be a few edge cases that'd look cleaner, > because we could reliably trigger cleanup in the leader instead of relying on > dsm detach hooks + refcounts to manage when a set is physically deleted? > In an off-list discussion with Thomas and Amit, we tried to discuss how to clean up the shared fileset in the current use case. Thomas suggested that instead of using an individual shared fileset for storing the data for each XID, why don't we just create a single shared fileset for the complete worker lifetime, and when the worker is exiting we can just remove that shared fileset. And for storing XID data, we can just create the files under the same shared fileset and delete those files when we no longer need them. I like this idea and it looks much cleaner; after this, we can get rid of the special cleanup mechanism using 'filesetlist'. I have attached a patch for the same. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com 0001-Better-usage-of-sharedfileset-in-apply-worker.patch Description: Binary data
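The lifecycle Dilip describes — one fileset per worker lifetime, per-XID files created and deleted inside it, everything dropped at worker exit — can be modeled with a toy in-memory sketch. The names and structure below are invented for illustration and deliberately avoid PostgreSQL's real SharedFileSet/BufFile APIs:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 8

/* Toy stand-in for a worker-lifetime fileset: a fixed registry of
 * per-XID file names.  Purely illustrative -- not the patch's code. */
typedef struct
{
    char files[MAX_FILES][64];
    int  nfiles;
} FileSet;

/* Register a per-transaction changes file inside the fileset. */
static void
fileset_create_file(FileSet *fs, unsigned xid)
{
    assert(fs->nfiles < MAX_FILES);
    snprintf(fs->files[fs->nfiles], sizeof(fs->files[0]),
             "%u-changes.tmp", xid);
    fs->nfiles++;
}

/* Delete one XID's file when its data is no longer needed. */
static void
fileset_delete_file(FileSet *fs, unsigned xid)
{
    char name[64];

    snprintf(name, sizeof(name), "%u-changes.tmp", xid);
    for (int i = 0; i < fs->nfiles; i++)
    {
        if (strcmp(fs->files[i], name) == 0)
        {
            /* swap-remove: move the last entry into the hole */
            if (i != fs->nfiles - 1)
                strcpy(fs->files[i], fs->files[fs->nfiles - 1]);
            fs->nfiles--;
            return;
        }
    }
}

/* At worker exit, dropping the fileset removes everything at once --
 * which is what replaces the per-XID 'filesetlist' cleanup. */
static void
fileset_drop(FileSet *fs)
{
    fs->nfiles = 0;
}
```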
Re: Some RELKIND macro refactoring
On 2021-Aug-16, Peter Eisentraut wrote: > + if (rel->rd_rel->relkind == RELKIND_INDEX || > + rel->rd_rel->relkind == RELKIND_SEQUENCE) > + RelationCreateStorage(rel->rd_node, relpersistence); > + else if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind)) > + table_relation_set_new_filenode(rel, &rel->rd_node, > + > relpersistence, > + > relfrozenxid, relminmxid); > + else > + Assert(false); I think you could turn this one (and the one in RelationSetNewRelfilenode) around: if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind)) table_relation_set_new_filenode(...); else if (RELKIND_HAS_STORAGE(rel->rd_rel->relkind)) RelationCreateStorage(...); > +/* > + * Relation kinds with a table access method (rd_tableam). Although > sequences > + * use the heap table AM, they are enough of a special case in most uses that > + * they are not included here. > + */ > +#define RELKIND_HAS_TABLE_AM(relkind) \ > + ((relkind) == RELKIND_RELATION || \ > + (relkind) == RELKIND_TOASTVALUE || \ > + (relkind) == RELKIND_MATVIEW) Partitioned tables are not listed here, but IIRC there's a patch to let partitioned tables have a table AM so that their partitions inherit it. I'm wondering if that patch is going to have to change this spot; and if it does, how will that affect the callers of this macro that this patch creates. I think a few places assume that HAS_TABLE_AM means that the table has storage, but perhaps it would be better to spell that out explicitly: @@ -303,9 +303,7 @@ verify_heapam(PG_FUNCTION_ARGS) /* * Check that a relation's relkind and access method are both supported. */ - if (ctx.rel->rd_rel->relkind != RELKIND_RELATION && - ctx.rel->rd_rel->relkind != RELKIND_MATVIEW && - ctx.rel->rd_rel->relkind != RELKIND_TOASTVALUE) + if (!(RELKIND_HAS_TABLE_AM(ctx.rel->rd_rel->relkind) && RELKIND_HAS_STORAGE(ctx.rel->rd_rel->relkind))) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot check relation \"%s\"", -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-13, Álvaro Herrera wrote: > Some doc changes are pending, and some more commentary in parts of the > code, but I think this is much more sensible. I do lament the lack of > a syscache for pg_inherits. Thinking about this again, this one here is the killer problem, I think; this behaves pretty horribly if you have more than one partition level, because it'll have to do one indexscan *per level per partition*. (For example, five partitions two levels down mean ten index scans). There's no cache for this, and no way to disable it. So for situations with a lot of partitions, it could be troublesome. Granted, it only needs to be done for partitions with DML changes since the previous autovacuum worker run in the affected database, but still it could be significant. Now we could perhaps have a hash table in partition_analyze_report_ancestors() to avoid the need for repeated indexscans for partitions of the same hierarchy (an open-coded cache to take the place of the missing pg_inherits syscache); and perhaps even use a single seqscan of pg_inherits to capture the whole story first and then filter down to the partitions that we were asked to process ... (so are we building a mini-optimizer to determine which strategy to use in each case?). That all sounds too much to be doing in the beta. So I'm leaning towards the idea that we need to revert the patch and start over for pg15. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
Re: CI/windows docker vs "am a service" autodetection on windows
On Fri, Aug 13, 2021 at 3:03 PM Andres Freund wrote: > > Hi, > > Magnus, Michael, Anyone - I'd appreciate a look. > > On 2021-03-05 12:55:37 -0800, Andres Freund wrote: > > > After fighting with a windows VM for a bit (ugh), it turns out that yes, > > > there is stderr, but that fileno(stderr) returns -2, and > > > GetStdHandle(STD_ERROR_HANDLE) returns NULL (not INVALID_HANDLE_VALUE). > > > > > > The complexity however is that while that's true for pg_ctl within > > > pgwin32_ServiceMain: > > > checking what stderr=7FF8687DFCB0 is (handle: 0, fileno=-2) > > > but not for postmaster or backends > > > WARNING: 01000: checking what stderr=7FF880F5FCB0 is (handle: 92, > > > fileno=2) > > > > > > which makes sense in a way, because we don't tell CreateProcessAsUser() > > > that it should pass stdin/out/err down (which then seems to magically > > > get access to the "topmost" console applications output - damn, this > > > stuff is weird). > > > > That part is not too hard to address - it seems we only need to do that > > in pg_ctl pgwin32_doRunAsService(). It seems that the > > stdin/stderr/stdout being set to invalid will then be passed down to > > postmaster children. > > > > https://docs.microsoft.com/en-us/windows/console/getstdhandle > > "If an application does not have associated standard handles, such as a > > service running on an interactive desktop, and has not redirected them, > > the return value is NULL." > > > > There does seem to be some difference between what services get as std* > > - GetStdHandle() returns NULL, and what explicitly passing down invalid > > handles to postmaster does - GetStdHandle() returns > > INVALID_HANDLE_VALUE. But passing down NULL rather than > > INVALID_HANDLE_VALUE to postmaster seems to lead to postmaster > > re-opening console buffers. > > > > Patch attached. > > > I'd like to commit something to address this issue to master soon - it > > allows us to run a lot more tests in cirrus-ci... 
But probably not > > backpatch it [yet] - there've not really been field complaints, and who > > knows if there end up being some unintentional side-effects... > > Because it'd allow us to run more tests as part of cfbot and other CI > efforts, I'd like to push this forward. So I'm planning to commit this > to master soon-ish, unless somebody wants to take this over? I'm really > not a windows person... It certainly sounds reasonable. It does make me wonder why we didn't use that GetStdHandle in the first place -- mostly in that "did we try that and it didn't work", but that was long enough ago that I really can't remember, and I am unable to find any references in my mail history either. So it may very well just be that we missed it. But given the number of times we've had issues around this, it makes me wonder. It could of course also be something that didn't use to be reliable but is now -- the world of Windows has changed a lot since that was written. It wouldn't surprise me if it does break some *other* weird cornercase, but based on the docs page you linked to it doesn't look like it would break any of the normal/standard usecases. But I'm also very much not a Windows person these days, and most of my knowledge on the API side is quite outdated by now -- so I can only base that on reading the same manual page you did... Gaining better testability definitely seems worth it, so I think an approach of "push to master and see what explodes" is reasonable :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Added schema level support for publication.
Peter Smith writes: > Then the question from Peter E. [2] "Why can't I have a publication > that publishes tables t1, t2, t3, *and* schemas s1, s2, s3." would > have an intuitive solution like: > CREATE PUBLICATION pub1 > FOR TABLE t1,t2,t3 AND > FOR ALL TABLES IN SCHEMA s1,s2,s3; That seems a bit awkward, since the existing precedent is to use commas. We shouldn't need more than one FOR noise-word, either. So I was imagining syntax more like, say, CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2, SEQUENCE seq1,seq2, ALL SEQUENCES IN SCHEMA s3,s4; Abstractly it'd be createpub := CREATE PUBLICATION pubname FOR cpitem [, ... ] [ WITH ... ] cpitem := ALL TABLES | TABLE name | ALL TABLES IN SCHEMA name | ALL SEQUENCES | SEQUENCE name | ALL SEQUENCES IN SCHEMA name | name The grammar output would need some post-analysis to attribute the right type to bare "name" items, but that doesn't seem difficult. regards, tom lane
Some RELKIND macro refactoring
Attached patches introduce more macros to group some RELKIND_* macros: - RELKIND_HAS_PARTITIONS() - RELKIND_HAS_TABLESPACE() - RELKIND_HAS_TABLE_AM() - RELKIND_HAS_INDEX_AM() I collected those mainly while working on the relkind error messages patch [0]. I think they improve the self-documentation of the code in many places that are currently just random collections or endless streams of RELKIND macros. Some may recall the previous thread [1] that made a similar proposal. The result here was that those macros were too thinly sliced and not generally useful enough. My proposal is completely separate from that. [0]: https://www.postgresql.org/message-id/flat/dc35a398-37d0-75ce-07ea-1dd71d98f...@2ndquadrant.com [1]: https://www.postgresql.org/message-id/flat/CAFjFpRcfzs%2Byst6YBCseD_orEcDNuAr9GUTraZ5GC%3DAvCYh55Q%40mail.gmail.com From 356e2e5c56b9856693a5df24038abd0361d97d92 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 16 Aug 2021 14:25:59 +0200 Subject: [PATCH v1 1/2] pg_dump: Remove redundant relkind checks It is checked in flagInhTables() which relkinds can have parents. After that, those entries will have numParents==0, so we don't need to check the relkind again. 
--- src/bin/pg_dump/common.c | 8 +--- src/bin/pg_dump/pg_dump.c | 7 +-- 2 files changed, 2 insertions(+), 13 deletions(-) diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c index 1f24e79665..7b85718075 100644 --- a/src/bin/pg_dump/common.c +++ b/src/bin/pg_dump/common.c @@ -501,13 +501,7 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables) int numParents; TableInfo **parents; - /* Some kinds never have parents */ - if (tbinfo->relkind == RELKIND_SEQUENCE || - tbinfo->relkind == RELKIND_VIEW || - tbinfo->relkind == RELKIND_MATVIEW) - continue; - - /* Don't bother computing anything for non-target tables, either */ + /* Don't bother computing anything for non-target tables */ if (!tbinfo->dobj.dump) continue; diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 90ac445bcd..d114377bde 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -2725,12 +2725,7 @@ guessConstraintInheritance(TableInfo *tblinfo, int numTables) TableInfo **parents; TableInfo *parent; - /* Sequences and views never have parents */ - if (tbinfo->relkind == RELKIND_SEQUENCE || - tbinfo->relkind == RELKIND_VIEW) - continue; - - /* Don't bother computing anything for non-target tables, either */ + /* Don't bother computing anything for non-target tables */ if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)) continue; -- 2.32.0 From 0656f3959518bfa1bd03e8bea3028ccf69b1edad Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 16 Aug 2021 14:30:26 +0200 Subject: [PATCH v1 2/2] Some RELKIND macro refactoring Add more macros to group some RELKIND_* macros: - RELKIND_HAS_PARTITIONS() - RELKIND_HAS_TABLESPACE() - RELKIND_HAS_TABLE_AM() - RELKIND_HAS_INDEX_AM() --- contrib/amcheck/verify_heapam.c| 4 +- contrib/pg_surgery/heap_surgery.c | 4 +- contrib/pg_visibility/pg_visibility.c | 4 +- src/backend/access/index/indexam.c | 3 +- src/backend/catalog/heap.c | 146 + src/backend/catalog/index.c| 2 +- src/backend/commands/indexcmds.c | 
9 +- src/backend/commands/tablecmds.c | 8 +- src/backend/storage/buffer/bufmgr.c| 42 +++ src/backend/utils/adt/amutils.c| 3 +- src/backend/utils/adt/partitionfuncs.c | 7 +- src/backend/utils/cache/relcache.c | 89 +-- src/bin/pg_dump/pg_dump.c | 17 ++- src/include/catalog/pg_class.h | 33 ++ 14 files changed, 153 insertions(+), 218 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index 226271923a..1e9624b84d 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -303,9 +303,7 @@ verify_heapam(PG_FUNCTION_ARGS) /* * Check that a relation's relkind and access method are both supported. */ - if (ctx.rel->rd_rel->relkind != RELKIND_RELATION && - ctx.rel->rd_rel->relkind != RELKIND_MATVIEW && - ctx.rel->rd_rel->relkind != RELKIND_TOASTVALUE) + if (!RELKIND_HAS_TABLE_AM(ctx.rel->rd_rel->relkind)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot check relation \"%s\"", diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surger
Re: PoC/WIP: Extended statistics on expressions
On 8/16/21 3:31 AM, Justin Pryzby wrote: On 1/22/21 5:01 AM, Justin Pryzby wrote: In any case, why are there so many parentheses ? On Fri, Jan 22, 2021 at 02:09:04PM +0100, Tomas Vondra wrote: That's a bug in pg_get_statisticsobj_worker, probably. It shouldn't be adding extra parentheses, on top of what deparse_expression_pretty does. Will fix. The extra parens are still here - is it intended ? Ah, thanks for reminding me! I was looking at this, and the problem is that pg_get_statisticsobj_worker only does this: prettyFlags = PRETTYFLAG_INDENT; Changing that to prettyFlags = PRETTYFLAG_INDENT | PRETTYFLAG_PAREN; fixes this (not sure we need the INDENT flag - probably not). I'm a bit confused, though. My assumption was "PRETTYFLAG_PAREN = true" would force the deparsing itself to add the parens, if needed, but in reality it works the other way around. I guess it's more complicated due to deparsing multi-level expressions, but unfortunately, there's no comment explaining what it does. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Slightly improve initdb --sync-only option's help message
> On 30 Jul 2021, at 18:27, Bossart, Nathan wrote: > > On 7/30/21, 2:22 AM, "Daniel Gustafsson" wrote: >> LGTM. I took the liberty to rephrase the "and exit" part of the initdb help >> output to match the other exiting options in there. Barring objections, I think >> this is ready. > > LGTM. Thanks! Pushed to master, thanks! -- Daniel Gustafsson https://vmware.com/
RE: Why timestamptz_pl_interval and timestamptz_mi_interval are not immutable?
> > What do I miss?
>
> --
> Best regards,
> Alexander Pyhalov,
> Postgres Professional

See for example around DST changes:

postgres=# begin;
BEGIN
postgres=# show timezone;
     TimeZone
------------------
 Europe/Amsterdam
(1 row)

postgres=# select '2021-03-27 15:00 +0100'::timestamptz + interval '1d';
        ?column?
------------------------
 2021-03-28 15:00:00+02
(1 row)

postgres=# set timezone to UTC;
SET
postgres=# select '2021-03-27 15:00 +0100'::timestamptz + interval '1d';
        ?column?
------------------------
 2021-03-28 14:00:00+00
(1 row)

postgres=# select '2021-03-28 15:00:00+02' = '2021-03-28 14:00:00+00';
 ?column?
----------
 f
(1 row)
Why timestamptz_pl_interval and timestamptz_mi_interval are not immutable?
Hi.

I'm currently looking at pushing down SQLValue expressions to foreign servers and was surprised that two timestamptz-related functions are not immutable. I see that this was changed in

commit 1ab415596d1de61561d0de8fe9da4aea207adca4
Author: Tom Lane
Date:   Mon Oct 4 22:13:14 2004 +

    Correct the volatility labeling of ten timestamp-related functions,
    per discussion from Friday.  initdb not forced in this commit but
    I intend to do that later.

I'm not sure why timestamptz_pl_interval and timestamptz_mi_interval are not immutable. Even if we change the timezone during a transaction, addition of the same interval to the same timestamp with time zone gives the same result.

postgres=# begin;
BEGIN
postgres=*# select current_timestamp;
       current_timestamp
-------------------------------
 2021-08-16 13:26:59.366452+03
(1 row)

postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03';
          timestamptz
-------------------------------
 2021-08-16 13:26:59.366452+03
(1 row)

postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03' + '2 days'::interval;
           ?column?
-------------------------------
 2021-08-18 13:26:59.366452+03
(1 row)

postgres=*# set timezone to UTC;
SET
postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03' + '2 days'::interval;
           ?column?
-------------------------------
 2021-08-18 10:26:59.366452+00
(1 row)

postgres=*# select timestamptz '2021-08-18 13:26:59.366452+03' = timestamptz '2021-08-18 10:26:59.366452+00';
 ?column?
----------
 t
(1 row)

What do I miss?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Window Function "Run Conditions"
On Thu, 1 Jul 2021 at 21:11, David Rowley wrote: > 1) Unsure of the API to the prosupport function. I wonder if the > prosupport function should just be able to say if the function is > either monotonically increasing or decreasing or neither then have > core code build a qual. That would make the job of building new > functions easier, but massively reduce the flexibility of the feature. > I'm just not sure it needs to do more in the future. I looked at this patch again today and ended up changing the API that I'd done for the prosupport functions. These just now set a new "monotonic" field in the (also newly renamed) SupportRequestWFuncMonotonic struct. This can be set to one of the values from the newly added MonotonicFunction enum, namely: MONOTONICFUNC_NONE, MONOTONICFUNC_INCREASING, MONOTONICFUNC_DECREASING or MONOTONICFUNC_BOTH. I also added handling for a few more cases that are perhaps rare but could be done with just a few lines of code. For example; COUNT(*) OVER() is MONOTONICFUNC_BOTH as it can neither increase nor decrease for a given window partition. I think technically all of the standard set of aggregate functions could have a prosupport function to handle that case. Min() and Max() could go a little further, but I'm not sure if adding handling for that would be worth it, and if someone does think that it is worth it, then I'd rather do that as a separate patch. I put the MonotonicFunction enum in plannodes.h. There's nothing specific about window functions or support functions. It could, for example, be reused again for something else such as monotonic set-returning functions. One thing which I'm still not sure about is where find_window_run_conditions() should be located. Currently, it's in allpaths.c but that does not really feel like the right place to me. We do have planagg.c in src/backend/optimizer/plan, maybe we need planwindow.c? David v2_teach_planner_and_executor_about_monotonic_wfuncs.patch Description: Binary data
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
> > (1) There should be no output to stderr in the tests.  Why isn't this
> > message being caught and redirected to the normal test output file?
>
> These are generated during the compilation of the tests with the
> pre-processor, so that's outside the test runs.

This is actually a deeper issue: we have no test for the compiler itself, other than the source code it generates. We do not test warnings or errors thrown by it. The topic came up ages ago and we simply removed the test that generated the (planned) warning message.

> > (2) This message is both unintelligible and grammatically incorrect.
>
> Yeah, debugging such tests would be more helpful if the name of the
> DECLARE statement is included, at least.  Those messages being
> generated is not normal anyway, which is something coming from the
> tests as a typo with the connection name of stmt_3.
>
> Michael, what do you think about the attached?

I think what Tom was saying is that it should be either "is overwritten with" or "is rewritten to", but you raise a very good point. Adding the statement name makes the message better. I fully agree. However, it should be the other way round: the DECLARE STATEMENT changes the connection that is used. Your patch removes the warning, but by doing that it also removes the feature that is being tested. I'm not sure what's the best way to go about it. Shall we accept not testing this particular feature and remove the warning? After all, this is not the way the statement should be used, hence the warning. Or should we keep it in and redirect the warning? In that case, we would also lose other warnings that are not planned, though.

Any comments?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Non-decimal integer literals
Here is a patch to add support for hexadecimal, octal, and binary integer literals: 0x42E 0o112 0b100101 per SQL:202x draft. This adds support in the lexer as well as in the integer type input functions. Those core parts are straightforward enough, but there are a bunch of other places where integers are parsed, and one could consider in each case whether they should get the same treatment, for example the replication syntax lexer, or input function for oid, numeric, and int2vector. There are also some opportunities to move some code around, for example scanint8() could be in numutils.c. I have also looked with some suspicion at some details of the number lexing in ecpg, but haven't found anything I could break yet. Suggestions are welcome. From f2a9b37968a55bf91feb2b4753745c9f5a64be2e Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 16 Aug 2021 09:32:14 +0200 Subject: [PATCH v1] Non-decimal integer literals Add support for hexadecimal, octal, and binary integer literals: 0x42E 0o112 0b100101 per SQL:202x draft. This adds support in the lexer as well as in the integer type input functions. 
--- doc/src/sgml/syntax.sgml | 26 src/backend/catalog/sql_features.txt | 1 + src/backend/parser/scan.l| 70 ++-- src/backend/utils/adt/int8.c | 54 src/backend/utils/adt/numutils.c | 97 src/fe_utils/psqlscan.l | 55 +++- src/interfaces/ecpg/preproc/pgc.l| 64 +++--- src/test/regress/expected/int2.out | 19 ++ src/test/regress/expected/int4.out | 37 +++ src/test/regress/expected/int8.out | 19 ++ src/test/regress/sql/int2.sql| 7 ++ src/test/regress/sql/int4.sql| 11 src/test/regress/sql/int8.sql| 7 ++ 13 files changed, 412 insertions(+), 55 deletions(-) diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index d66560b587..8fb4b1228d 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -694,6 +694,32 @@ Numeric Constants + + Additionally, non-decimal integer constants can be used in these forms: + +0xhexdigits +0ooctdigits +0bbindigits + + hexdigits is one or more hexadecimal digits + (0-9, A-F), octdigits is one or more octal + digits (0-7), bindigits is one or more binary + digits (0 or 1). Hexadecimal digits and the radix prefixes can be in + upper or lower case. Note that only integers can have non-decimal forms, + not numbers with fractional parts. 
+ + + + These are some examples of this: +0b100101 +0B10011001 +0o112 +0O755 +0x42e +0X + + + integer bigint diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index 9f424216e2..d6359503f3 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -526,6 +526,7 @@ T652SQL-dynamic statements in SQL routines NO T653 SQL-schema statements in external routines YES T654 SQL-dynamic statements in external routines NO T655 Cyclically dependent routines YES +T661 Non-decimal integer literalsYES SQL:202x draft T811 Basic SQL/JSON constructor functionsNO T812 SQL/JSON: JSON_OBJECTAGGNO T813 SQL/JSON: JSON_ARRAYAGG with ORDER BY NO diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l index 6e6824faeb..83458ffb30 100644 --- a/src/backend/parser/scan.l +++ b/src/backend/parser/scan.l @@ -262,7 +262,7 @@ quotecontinuefail {whitespace}*"-"? xbstart[bB]{quote} xbinside [^']* -/* Hexadecimal number */ +/* Hexadecimal byte string */ xhstart[xX]{quote} xhinside [^']* @@ -341,7 +341,7 @@ xcstart \/\*{op_chars}* xcstop \*+\/ xcinside [^*/]+ -digit [0-9] + ident_start[A-Za-z\200-\377_] ident_cont [A-Za-z\200-\377_0-9\$] @@ -380,24 +380,41 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=] op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=] operator {op_chars}+ -/* we no longer allow unary minus in numbers. - * instead we pass it separately to parser. there it gets - * coerced via doNegate() -- Leon aug 20 1999 +/* + * Numbers + * + * Unary minus is not part of a number here. Instead we pass it separately to + * parser, and there it gets coerced via doNegate(). * - * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10. + * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10. * * {realfail1} and {realfail2} are added to prevent the need for
Is it worth pushing conditions to sublink/subplan?
Hi Hackers,

Recently, an issue has been bothering me. It's about conditional push-down in SQL; I use cases from the regression tests as an example. I found that the condition (b = 1) can be pushed down into a subquery, but it cannot be pushed down to a sublink/subplan. If a sublink/subplan clause contains a partitioned table, it can be useful to get the conditions for pruning. So, is it worth pushing conditions to sublink/subplan? Anybody have any ideas?

regards,
Wenjing

example:

create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);

postgres-# explain (costs off)
postgres-# select temp.b from
postgres-# (
postgres(#   select a,b from ab x where x.a = 1
postgres(#   union all
postgres(#   (values(1,1))
postgres(# ) temp,
postgres-# ab y
postgres-# where y.b = temp.b and y.a = 1 and y.b=1;
                  QUERY PLAN
-----------------------------------------------
 Nested Loop
   ->  Seq Scan on ab_a1_b1 y
         Filter: ((b = 1) AND (a = 1))
   ->  Append
         ->  Subquery Scan on "*SELECT* 1"
               ->  Seq Scan on ab_a1_b1 x
                     Filter: ((a = 1) AND (b = 1))
         ->  Result
(8 rows)

The condition (b = 1) can be pushed down into the subquery.
postgres=# explain (costs off)
postgres-# select
postgres-#   y.a,
postgres-#   (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
                    QUERY PLAN
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (b = 1 and a = 1) cannot be pushed down to the sublink/subplan in the target list.

postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y
postgres-# where
postgres-#   (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-#   y.a = 1 and y.b = 1;
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (b = 1 and a = 1) cannot be pushed down to the sublink/subplan in the where clause.
PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
During some development on encoding-related parts of Postgres I stumbled over the new length-checking code in common/hex.c's pg_hex_encode. Unlike in all previous versions, the output buffer length is now checked during encoding of every individual source byte. This may cause quite a regression when using pg_dump on databases with many/big bytea or lo columns. Because all criteria for the buffer-length check are known before encoding starts (srclen and destlen), I propose doing the check only once, before the while loop. Please find the attached patch for pg14 and master; older versions did not have this behavior. Tested on pg14-beta3, but it also applies to master. PS: This is my very first patch. I am in no way an experienced C developer and don't have a smoothly running development environment or much experience yet (originally mostly dealing with Postgres on Windows). If it seems useful, somebody could enter it as an open item / resolved item for pg14 after beta 3. Thanks for looking! Hans Buschmann

Attachment: hex_encode_length_check_outside_loop.patch
Re: Failure of subscription tests with topminnow
On Mon, Aug 16, 2021 at 9:24 AM Michael Paquier wrote: > > Hi all, > > topminnow has just failed in a weird fashion: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-08-15%2013%3A24%3A58 > # SELECT pid != FROM pg_stat_replication WHERE application_name = 'tap_sub'; > # expecting this output: > # t > # last actual query output: > # > # with stderr: > # ERROR: syntax error at or near "FROM" > # LINE 1: SELECT pid != FROM pg_stat_replication WHERE application_na... > > Looking at the logs, it seems like the problem boils down to an active > slot when using ALTER SUBSCRIPTION tap_sub CONNECTION: > 2021-08-15 18:44:38.027 CEST [16473:2] ERROR: could not start WAL > streaming: ERROR: replication slot "tap_sub" is active for PID 16336 > The "ALTER SUBSCRIPTION tap_sub CONNECTION" would lead to restart the walsender process. Now, here the problem seems to be that the previous walsender process (16336) didn't exit and the new process started to use the slot which leads to the error shown in the log. This is evident from the below part of log where we can see that 16336 has exited after new process started to use the slot. 
2021-08-15 18:44:38.027 CEST [16475:6] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.027 CEST [16475:7] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.027 CEST [16475:8] tap_sub ERROR: replication slot "tap_sub" is active for PID 16336 2021-08-15 18:44:38.027 CEST [16475:9] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.041 CEST [16475:10] tap_sub LOG: disconnection: session time: 0:00:00.036 user=nm database=postgres host=[local] 2021-08-15 18:44:38.043 CEST [16336:14] tap_sub LOG: disconnection: session time: 0:00:06.367 user=nm database=postgres host=[local] One idea to solve this is to first disable the subscription, wait for the walsender process to exit, and then change the connection string and re-enable the subscription. -- With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Mon, Aug 16, 2021 at 5:54 PM houzj.f...@fujitsu.com wrote: > > Here is another comment: > > +char * > +logicalrep_message_type(LogicalRepMsgType action) > +{ > ... > + case LOGICAL_REP_MSG_STREAM_END: > + return "STREAM END"; > ... > > I think most the existing code use "STREAM STOP" to describe the > LOGICAL_REP_MSG_STREAM_END message, is it better to return "STREAM STOP" in > function logicalrep_message_type() too ? > +1 I think you're right, it should be "STREAM STOP" in that case. Regards, Greg Nancarrow Fujitsu Australia
RE: Skipping logical replication transactions on subscriber side
Monday, August 16, 2021 3:00 PM Hou, Zhijie wrote: > On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada wrote: > > I've attached the updated patches. FYI I've included the patch > > (v8-0005) that fixes the assertion failure during shared fileset > > cleanup to make cfbot tests happy. > > Hi, > > Thanks for the new patches. > I have a few comments on the v8-0001 patch. > 3) > Do we need to invoke set_apply_error_context_xact() in the function > apply_handle_stream_prepare() to save the xid and timestamp ? Sorry, this comment wasn't correct, please ignore it. Here is another comment: +char * +logicalrep_message_type(LogicalRepMsgType action) +{ ... + case LOGICAL_REP_MSG_STREAM_END: + return "STREAM END"; ... I think most the existing code use "STREAM STOP" to describe the LOGICAL_REP_MSG_STREAM_END message, is it better to return "STREAM STOP" in function logicalrep_message_type() too ? Best regards, Hou zj
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Wed, Aug 11, 2021 at 10:41:59PM +0200, Michael Meskes wrote: > I'm not sure I understand. Any usage of DECLARE STATEMENT makes the > file need the current version of ecpg anyway. On the other hand > DEALLOCATE did not change its behavior if no DECLARE STATEMENT was > issued, or what did I miss? Yes, you are right here. I went through the code again and noticed my mistake. Sorry for the noise. -- Michael
RE: Skipping logical replication transactions on subscriber side
On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada wrote:
> I've attached the updated patches. FYI I've included the patch
> (v8-0005) that fixes the assertion failure during shared fileset
> cleanup to make cfbot tests happy.

Hi,

Thanks for the new patches. I have a few comments on the v8-0001 patch.

1)
+
+	if (TransactionIdIsNormal(errarg->remote_xid))
+		appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),
+						 errarg->remote_xid,
+						 errarg->commit_ts == 0
+						 ? "(unset)"
+						 : timestamptz_to_str(errarg->commit_ts));
+
+	errcontext("%s", buf.data);

I think we can output the timestamp in a separate check, which would be more consistent with the rest of the code style in apply_error_callback(), i.e.:

+	if (errarg->commit_ts != 0)
+		appendStringInfo(&buf, _(" with commit timestamp %s"),
+						 timestamptz_to_str(errarg->commit_ts));

2)
+/*
+ * Get string representing LogicalRepMsgType.
+ */
+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+
+	elog(ERROR, "invalid logical replication message type \"%c\"", action);
+}

Some old compilers might complain that the function doesn't have a return value at the end of the function, maybe we can write it like the following:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
+	switch (action)
+	{
+		case LOGICAL_REP_MSG_BEGIN:
+			return "BEGIN";
...
+		default:
+			elog(ERROR, "invalid logical replication message type \"%c\"", action);
+	}
+	return NULL;	/* keep compiler quiet */
+}

3)
Do we need to invoke set_apply_error_context_xact() in the function apply_handle_stream_prepare() to save the xid and timestamp?

Best regards,
Hou zj