Re: Support logical replication of DDLs
On Thu, Apr 20, 2023 at 2:28 PM shveta malik wrote:
>
> Few comments for ddl_json.c in the patch dated April 17:
> ...
> 3) expand_jsonb_array()
> arrayelem is allocated as below, but not freed.
> initStringInfo()
>
> 4) expand_jsonb_array()
> We initialize the iterator as below, which internally does palloc:
> it = JsonbIteratorInit(container);
> Shall this be freed at the end? I see usage of this function in other files.
> At a few places, it is freed while not freed at other places.

Normally, it is a good idea to free whenever the allocation is done in a
long-lived context. However, in some places, we free just for the sake of
cleanliness. I think we don't need to bother doing a retail free in this case
unless it is allocated in some long-lived context.

> 5) deparse_ddl_json_to_string(char *json_str, char **owner)
> str = palloc(value->val.string.len + 1);
> We do palloc here and return the allocated memory to the caller as 'owner'.
> The caller sets this 'owner' using SetConfigOption(), which internally
> allocates new memory and copies 'owner' to that. So the memory allocated in
> deparse_ddl_json_to_string is never freed. A better way would be for the
> caller to pass this allocated memory to deparse_ddl_json_to_string() and
> free it when done. Thoughts?

I think that will complicate the code. I don't see the need to allocate this
in any longer-lived context, so we shouldn't be bothered to retail pfree it.

> 6) expand_fmt_recursive():
> value = findJsonbValueFromContainer(container, JB_FOBJECT, );
> Should this 'value' be freed at the end like we do at all other places in
> this file?

Yeah, we can do this for the sake of consistency.

Few comments on the 0001 patch:
===============================
1.
+ case 'O':
+ specifier = SpecOperatorName;
+ break;
...
+ case 'R':
+ specifier = SpecRole;
+ break;
+ default:

Neither of these specifiers seems to be used in the patch, so we can remove
them. I see some references to 'O' in the comments; those also need to be
modified.

2.
+ /* For identity column, find the sequence owned by column in order
+ * to deparse the column definition */

In multi-line comments, the first and last lines should be empty. Refer to
multi-line comments at other places.

3.
+ return object_name.data;
+
+}

The extra empty line before } is not required.

--
With Regards,
Amit Kapila.
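The memory-context reasoning above — that a retail pfree is unnecessary when the allocation lives in a short-lived context that is released wholesale — can be sketched with a toy bump allocator in plain C. This is an illustrative analogue, not PostgreSQL's actual MemoryContext/palloc API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * A toy "memory context": every allocation comes from one buffer, and
 * destroying the context releases everything at once, so individual
 * allocations never need a retail free.
 */
typedef struct Arena
{
	char	   *buf;
	size_t		used;
	size_t		cap;
} Arena;

static Arena *
arena_create(size_t cap)
{
	Arena	   *a = malloc(sizeof(Arena));

	a->buf = malloc(cap);
	a->used = 0;
	a->cap = cap;
	return a;
}

static void *
arena_alloc(Arena *a, size_t size)
{
	void	   *p;

	assert(a->used + size <= a->cap);
	p = a->buf + a->used;
	a->used += size;			/* no per-allocation bookkeeping */
	return p;
}

static void
arena_destroy(Arena *a)
{
	free(a->buf);				/* one free releases every allocation */
	free(a);
}
```

Freeing each arena_alloc() result individually would be pure overhead here, which mirrors the argument for skipping retail pfree in a context that is reset soon anyway.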
RE: Test failures of 100_bugs.pl
On Tue, Jan 24, 2023 7:41 PM Amit Kapila wrote:
>
> On Tue, Jan 24, 2023 at 8:53 AM Andres Freund wrote:
> >
> > cfbot, the buildfarm and locally I have seen 100_bugs.pl fail
> > occasionally. Just rarely enough that I never got around to looking
> > into it for real.
> ...
> > We see t2 added to the publication:
> > 2023-01-24 00:57:30.099 UTC [73654][client backend] [100_bugs.pl][7/5:0]
> > LOG: statement: ALTER PUBLICATION testpub ADD TABLE t2
> >
> > And that *then* "t" was synchronized:
> > 2023-01-24 00:57:30.102 UTC [73640][logical replication worker] LOG:
> > logical replication table synchronization worker for subscription
> > "testsub", table "t" has finished
> >
> > and then that the refresh was issued:
> > 2023-01-24 00:57:30.128 UTC [73657][client backend] [100_bugs.pl][5/10:0]
> > LOG: statement: ALTER SUBSCRIPTION testsub REFRESH PUBLICATION
> >
> > And we see a walsender starting and the query to get the new tables
> > being executed:
> > 2023-01-24 00:57:30.139 UTC [73660][walsender] [testsub][6/8:0] LOG:
> > statement: SELECT DISTINCT t.schemaname, t.tablename, t.attnames
> > FROM pg_catalog.pg_publication_tables t
> > WHERE t.pubname IN ('testpub')
> >
> > And that's it, the rest of the time is just polling.
> >
> > Perhaps wait_for_subscription_sync() should dump the set of rel states
> > to make something like this more debuggable?
> >
> > The fact that the synchronization for t finished just before the refresh
> > makes me wonder if a wakeup or a cache invalidation got lost?
>
> From the LOGs, the only thing one could draw is a lost invalidation,
> because the nap time of the apply worker is 1s, so it should process the
> invalidation during the time we are polling. Also, the rel should be
> added to pg_subscription_rel because the test is still polling for
> rels to be in 'ready' or 'done' state.
>
> I think we can do three things to debug: (a) as you suggest, dump the
> rel state in wait_for_subscription_sync; (b) add some DEBUG logs in
> invalidate_syncing_table_states() to ensure that the invalidation has
> been processed; (c) print rel states and relids from
> table_states_not_ready in process_syncing_tables_for_apply() to see if
> t2 has ever appeared in that list.

There was a similar buildfarm failure on francolin recently [1]. I think the
problem is that after REFRESH PUBLICATION, the table sync worker for the new
table test_mismatching_types was not started. So, the test timed out while
waiting for an ERROR message that should have been reported by the table
sync worker.

--
regress_log_014_binary:
timed out waiting for match: (?^:ERROR: ( [A-Z0-9]+:)? incorrect binary data
format) at
/home/bf/bf-build/francolin/HEAD/pgsql/src/test/subscription/t/014_binary.pl
line 269.

014_binary_subscriber.log:
2023-04-16 18:18:38.455 UTC [3079482] 014_binary.pl LOG: statement: ALTER
SUBSCRIPTION tsub REFRESH PUBLICATION;
2023-04-16 18:21:39.219 UTC [3079474] ERROR: could not receive data from WAL
stream: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while
processing the request.
--

I wrote a patch to dump the rel state in wait_for_subscription_sync() as
suggested. Please see the attached patch. I will try to add some debug logs
in the code later.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin=2023-04-16%2018%3A17%3A09

Regards,
Shi Yu

v1-0001-dump-rel-state-in-wait_for_subscription_sync.patch
Description: v1-0001-dump-rel-state-in-wait_for_subscription_sync.patch
Minor code de-duplication in fe-connect.c
Commit [1] implements the Fisher-Yates shuffling algorithm to shuffle
connection addresses, in two places. The attached patch moves the duplicated
code into a function and calls it in those two places.

[1]: Support connection load balancing in libpq
7f5b19817eaf38e70ad1153db4e644ee9456853e

Best regards,
Gurjeet
https://Gurje.et
Postgres Contributors Team, http://aws.amazon.com

v0-0001-Reduce-code-duplication-in-fe-connect.c.patch
Description: Binary data
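For reference, the Fisher-Yates shuffle being de-duplicated walks the array backwards and swaps each element with a uniformly chosen earlier (or same) slot. A minimal standalone sketch — the function name is illustrative, and rand() stands in for libpq's own PRNG:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Shuffle addrs[0..n-1] in place with the Fisher-Yates algorithm.
 * Illustrative sketch: libpq draws from its private PRNG state, not rand().
 */
static void
shuffle_addrs(int *addrs, int n)
{
	for (int i = n - 1; i > 0; i--)
	{
		int			j = rand() % (i + 1);	/* pick a slot in [0, i] */
		int			tmp = addrs[i];

		addrs[i] = addrs[j];
		addrs[j] = tmp;
	}
}
```

Because each position is swapped exactly once with a uniformly chosen candidate, every permutation is equally likely (given a uniform picker), which is why the algorithm suits load balancing across connection addresses.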
Re: Make message strings in fe-connect.c consistent
On Thu, Apr 20, 2023 at 9:31 PM Tom Lane wrote: > > Gurjeet Singh writes: > > When reviewing a recently committed patch [1] I noticed the odd usage > > of a format specifier: > > > + libpq_append_conn_error(conn, "invalid %s value: \"%s\"", > > + "load_balance_hosts", > > + conn->load_balance_hosts); > > > The oddity is that the first %s is unnecessary, since the value we > > want there is a constant. Typically a format specifier used to get the > > value stored in a variable. > > This is actually intentional, on the grounds that it reduces the > number of format strings that require translation. That's the only reason I too could come up with. > > There's just one exception to this pattern, though. > > >> libpq_append_conn_error(conn, "invalid require_auth method: \"%s\"", > >> method); > > Yup, this one did not get the memo. That explains why I could not find any translation for this error message. Best regards, Gurjeet http://Gurje.et Postgres Contributors Team, http://aws.amazon.com
Re: Make message strings in fe-connect.c consistent
Gurjeet Singh writes: > When reviewing a recently committed patch [1] I noticed the odd usage > of a format specifier: > + libpq_append_conn_error(conn, "invalid %s value: \"%s\"", > + "load_balance_hosts", > + conn->load_balance_hosts); > The oddity is that the first %s is unnecessary, since the value we > want there is a constant. Typically a format specifier used to get the > value stored in a variable. This is actually intentional, on the grounds that it reduces the number of format strings that require translation. > There's just one exception to this pattern, though. >> libpq_append_conn_error(conn, "invalid require_auth method: \"%s\"", >> method); Yup, this one did not get the memo. regards, tom lane
Make message strings in fe-connect.c consistent
When reviewing a recently committed patch [1] I noticed the odd usage of a
format specifier:

+ libpq_append_conn_error(conn, "invalid %s value: \"%s\"",
+ "load_balance_hosts",
+ conn->load_balance_hosts);

The oddity is that the first %s is unnecessary, since the value we want there
is a constant. Typically a format specifier is used to get the value stored
in a variable.

Upon closer look, this seems to be a common pattern in fe-connect.c; there
are many places where a %s format specifier is used in the format string
where the name of the parameter would have sufficed. Upon some research, the
only explanation I could come up with was that this pattern of specifying
error messages helps with message translation: this way there's just one
message to be translated. For example:

.../libpq/po/es.po-#: fe-connect.c:1268 fe-connect.c:1294 fe-connect.c:1336 fe-connect.c:1345
.../libpq/po/es.po-#: fe-connect.c:1378 fe-connect.c:1422
.../libpq/po/es.po-#, c-format
.../libpq/po/es.po:msgid "invalid %s value: \"%s\"\n"

There's just one exception to this pattern, though:

> libpq_append_conn_error(conn, "invalid require_auth method: \"%s\"",
> method);

So, to make it consistent throughout the file, we should either replace all
such %s format specifiers with the respective strings, or use the same
pattern for the require_auth message string as well. Attached patch [2] does
the former, and patch [3] does the latter. Pick your favorite one.

[1]: Support connection load balancing in libpq
7f5b19817eaf38e70ad1153db4e644ee9456853e
[2]: Replace placeholders with known strings
v1-0001-Replace-placeholders-with-known-strings.patch
[3]: Make require_auth error message similar to surrounding messages
v1-0001-Make-require_auth-error-message-similar-to-surrou.patch

Best regards,
Gurjeet
http://Gurje.et
Postgres Contributors Team, http://aws.amazon.com

v1-0001-Replace-placeholders-with-known-strings.patch
Description: Binary data

v1-0001-Make-require_auth-error-message-similar-to-surrou.patch
Description: Binary data
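The translation argument can be made concrete: when the parameter name is passed as an ordinary argument, every connection option shares one translatable format string (one msgid in the .po file). A standalone sketch, with plain snprintf standing in for libpq_append_conn_error:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * One translatable format string serves every connection option; the
 * option name is supplied as an argument.  In real libpq code the format
 * string is what gets extracted for translation, so this pattern yields
 * a single msgid instead of one per option.
 */
static void
report_invalid(char *out, size_t outlen,
			   const char *option, const char *value)
{
	/* in libpq the format string would be wrapped for translation */
	snprintf(out, outlen, "invalid %s value: \"%s\"", option, value);
}
```

With hard-coded names instead ("invalid load_balance_hosts value: ...", "invalid sslmode value: ...", and so on), translators would face one string per option, which is the duplication the existing pattern avoids.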
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
Great news! Hearty congratulations to Nathan, Amit and Masahiko.

On Thu, Apr 20, 2023 at 11:10 PM Tom Lane wrote:
>
> The Core Team would like to extend our congratulations to
> Nathan Bossart, Amit Langote, and Masahiko Sawada, who have
> accepted invitations to become our newest Postgres committers.
>
> Please join me in wishing them much success and few reverts.
>
> regards, tom lane

--
Best Wishes,
Ashutosh Bapat
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On Fri, Apr 21, 2023 at 5:40 AM Tom Lane wrote: > The Core Team would like to extend our congratulations to > Nathan Bossart, Amit Langote, and Masahiko Sawada, who have > accepted invitations to become our newest Postgres committers. > > Please join me in wishing them much success and few reverts. +100. Great news.
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On Thu, Apr 20, 2023 at 11:10 PM Tom Lane wrote: > > The Core Team would like to extend our congratulations to > Nathan Bossart, Amit Langote, and Masahiko Sawada, who have > accepted invitations to become our newest Postgres committers. > > Please join me in wishing them much success and few reverts. > Congratulations to the new committers! -- With Regards, Amit Kapila.
Re: Testing autovacuum wraparound (including failsafe)
I agree having the new functions in the tree is useful. I also tried running
the TAP tests in v2, but 001 and 002 fail to run:

Odd number of elements in hash assignment at [...]/Cluster.pm line 2010.
Can't locate object method "pump_nb" via package
"PostgreSQL::Test::BackgroundPsql" at [...]

It seems to be complaining about

+my $in = '';
+my $out = '';
+my $timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
+my $background_psql = $node->background_psql('postgres', \$in, \$out, $timeout);

...that call to background_psql doesn't look like other ones that have
"key => value". Is there something I'm missing?

--
John Naylor
EDB: http://www.enterprisedb.com
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
At Thu, 20 Apr 2023 13:40:49 -0400, Tom Lane wrote in > The Core Team would like to extend our congratulations to > Nathan Bossart, Amit Langote, and Masahiko Sawada, who have > accepted invitations to become our newest Postgres committers. > > Please join me in wishing them much success and few reverts. Congratulations to all, from me, too, wishing much success! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On Thu, 20 Apr 2023 at 23:10, Tom Lane wrote: > > The Core Team would like to extend our congratulations to > Nathan Bossart, Amit Langote, and Masahiko Sawada, who have > accepted invitations to become our newest Postgres committers. > > Please join me in wishing them much success and few reverts. Very deserved. Congratulations Nathan, Amit and Masahiko. Regards, Vignesh
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On Fri, Apr 21, 2023 at 12:40 AM Tom Lane wrote: > > The Core Team would like to extend our congratulations to > Nathan Bossart, Amit Langote, and Masahiko Sawada, who have > accepted invitations to become our newest Postgres committers. > > Please join me in wishing them much success and few reverts. Congratulations to all! -- John Naylor EDB: http://www.enterprisedb.com
Re: Support logical replication of DDLs
Here are some more review comments for the patch 0002-2023_04_07-2.

This was a WIP review in parts because the patch was quite large:
WIP part 1 [1] was posted 17/4.
WIP part 2 [2] was posted 17/4.
WIP part 3 [3] was posted 19/4.
WIP part 4 is this post. (This is my final WIP part for this 0002 patch.)

==
contrib/test_decoding/sql/ddl.sql

1.
+SELECT 'ddl msg2' FROM pg_logical_emit_ddl_message('ddl msg2', 16394, 1,
'{"fmt": "CREATE SCHEMA %{if_not_exists}s %{name}I %{authorization}s",
"name": "foo", "authorization": {"fmt": "AUTHORIZATION
%{authorization_role}I", "present": false, "authorization_role": null},
"if_not_exists": ""}');

I wasn't entirely sure what these tests are showing. They seem to do nothing
but hardwire a bunch of random stuff and then print it out again. Am I
missing something? And are those just bogus content payloads? Maybe they are
valid JSON, but AFAICT nobody is using them. What is the advantage of using
this bogus payload data instead of just a simple string like "DDL message
content goes here"?

==
contrib/test_decoding/test_decoding.c

2. _PG_output_plugin_init

cb->message_cb = pg_decode_message;
+ cb->ddl_cb = pg_decode_ddl_message;
cb->filter_prepare_cb = pg_decode_filter_prepare;

Where is the 'stream_ddl_cb' to match this?

~~~

3. pg_decode_ddl_message

+static void
+pg_decode_ddl_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+   XLogRecPtr message_lsn, const char *prefix, Oid relid,
+   DeparsedCommandType cmdtype, Size sz, const char *message)
+{
+ OutputPluginPrepareWrite(ctx, true);
+ appendStringInfo(ctx->out, "message: prefix: %s, relid: %u, ",
+   prefix, relid);

Should the appendStringInfo say "DDL message:" instead of "message:"? I
can't tell if this was deliberate or a cut/paste error from the existing
code.

~~~

4. pg_decode_ddl_message

+ appendStringInfo(ctx->out, "sz: %zu content:", sz);
+ appendBinaryStringInfo(ctx->out, message, sz);
+ OutputPluginWrite(ctx, true);

4a.
Should there be whitespace after that last 'content:' before the binary
content?

~

4b.
Is it necessary to say this is 'Binary'? I thought this payload was only
JSON text data.

==
src/backend/replication/logical/ddltrigger.c

5.
+/*-
+ *
+ * ddltrigger.c
+ *   Logical DDL messages.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *   src/backend/replication/logical/ddltrigger.c
+ *
+ * NOTES
+ *
+ * Deparse the ddl command and log it.
+ *
+ * ---
+ */

5a.
Just saying "Logical DDL messages" is the same header comment as in the
other new file ddlmessages.c, so it looks like a cut/paste issue.

~

5b.
The copyright should say 2023.

~~~

6. publication_deparse_ddl_command_start

+/*
+ * Deparse the ddl command and log it prior to
+ * execution. Currently only used for DROP TABLE command
+ * so that catalog can be accessed before being deleted.
+ * This is to check if the table is part of the publication
+ * or not.
+ */
+Datum
+publication_deparse_ddl_command_start(PG_FUNCTION_ARGS)
+{
+ EventTriggerData *trigdata;
+ char *command = psprintf("Drop table command start");

Since the information about this only being for DROP is hardcoded and in the
function comment, shouldn't this whole function be renamed to something
DROP-specific? e.g. publication_deparse_ddl_drop_command_start()

~~~

7. publication_deparse_ddl_command_start

+ if (relation)
+  table_close(relation, NoLock);

I thought this check was not needed because the relation was already checked
earlier in this function, so it cannot be NULL here.

~~~

8. publication_deparse_table_rewrite

+ char relpersist;
+ CollectedCommand *cmd;
+ char *json_string;

The json_string can be declared later, within the scope that uses it,
instead of here at the top.

~~~

9. publication_deparse_ddl_command_end

+ ListCell *lc;
+ slist_iter iter;
+ DeparsedCommandType type;
+ Oid relid;
+ char relkind;

9a.
Some of these variable declarations seem misplaced. I think the
'json_string' and the 'type' can be at a lower scope, can't they?

~

9b.
Also, IMO it is better to call 'type' 'cmdtype', so it has the same name as
the variable in the other slist_foreach loop.

~~~

10. publication_deparse_ddl_command_end

+ foreach(lc, currentEventTriggerState->commandList)
+ {
+  char relpersist = RELPERSISTENCE_PERMANENT;
+  CollectedCommand *cmd = lfirst(lc);
+  char *json_string;

This json_string can be declared later, only in the scope that uses it.

~~~

11. publication_deparse_ddl_command_end

+ if (cmd->type == SCT_Simple &&
+  !OidIsValid(cmd->d.simple.address.objectId))
+  continue;
+
+ if (cmd->type == SCT_AlterTable)
+ {
+  relid = cmd->d.alterTable.objectId;
+  type = DCT_TableAlter;
+ }
+ else
+ {
+  /* Only SCT_Simple for now */
+  relid = cmd->d.simple.address.objectId;
+  type = DCT_SimpleCmd;
+ }

This code seemed structured a bit strangely to me;
Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name
On Wed, Mar 15, 2023 at 10:38:34AM +0100, Daniel Gustafsson wrote: > While here, I wonder if we should document what BGW_MAXLEN is defined as in > bgworker.sgml? I am -0.5 for this. If you are writing a new background worker, it's probably reasonable to expect that you can locate the definition of BGW_MAXLEN. Also, I think there's a good chance that we'd forget to update such documentation the next time we adjust it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Should we remove vacuum_defer_cleanup_age?
On Fri, Apr 14, 2023 at 03:07:37PM -0400, Jonathan S. Katz wrote: > +1. +1. I agree with the upthread discussion and support removing vacuum_defer_cleanup_age. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Memory leak from ExecutorState context?
On Thu, Apr 20, 2023 at 12:42 PM Konstantin Knizhnik wrote:
> On 11.04.2023 8:14 PM, Jehan-Guillaume de Rorthais wrote:
> > On Sat, 8 Apr 2023 02:01:19 +0200
> > Jehan-Guillaume de Rorthais wrote:
> >
> >> On Fri, 31 Mar 2023 14:06:11 +0200
> >> Jehan-Guillaume de Rorthais wrote:
> >>
> >> [...]
> >>
> >> After rebasing Tomas' memory balancing patch, I did some memory measures
> >> to answer some of my questions. Please find in attachment the resulting
> >> charts "HJ-HEAD.png" and "balancing-v3.png" to compare memory consumption
> >> between HEAD and Tomas' patch. They show an alternance of numbers
> >> before/after calling ExecHashIncreaseNumBatches (see the debug patch). I
> >> didn't try to find the exact last total peak of memory consumption during
> >> the join phase and before all the BufFiles are destroyed. So the last
> >> number might be underestimated.
> >
> > I did some more analysis about the total memory consumption in filecxt of
> > HEAD, v3 and v4 patches. My previous debug numbers only print memory
> > metrics during batch increments or hash table destruction. That means:
> >
> > * for HEAD: we miss the batches consumed during the outer scan
> > * for v3: adds twice nbatch in spaceUsed, which is a rough estimation
> > * for v4: batches are tracked in spaceUsed, so they are reflected in
> >   spacePeak
> >
> > Using a breakpoint in ExecHashJoinSaveTuple to print
> > "filecxt->mem_allocated" from there, here are the maximum allocated
> > memory figures for the bufFile context for each branch:
> >
> >          batches  max bufFiles total  spaceAllowed rise
> >   HEAD    16384   199966960           ~194MB
> >   v3       4096    65419456           ~78MB
> >   v4(*3)   2048    34273280           48MB    nbatch*sizeof(PGAlignedBlock)*3
> >   v4(*4)   1024    17170160           60.6MB  nbatch*sizeof(PGAlignedBlock)*4
> >   v4(*5)   2048    34273280           42.5MB  nbatch*sizeof(PGAlignedBlock)*5
> >
> > It seems accounting for bufFile in spaceUsed allows better memory
> > balancing and management. The precise factor to raise spaceAllowed is yet
> > to be defined. *3 or *4 looks good, but this is based on a single
> > artificial test case.
> >
> > Also, note that HEAD is currently reporting ~4MB of memory usage. This is
> > by far wrong with the reality. So even if we don't commit the balancing
> > memory patch in v16, maybe we could account for filecxt in spaceUsed as a
> > bugfix?
> >
> > Regards,
>
> Thank you for the patch.
> I faced the same problem (OOM caused by hash join).
> I tried to create the simplest test reproducing the problem:
>
> create table t(pk int, val int);
> insert into t values (generate_series(1,1),0);
> set work_mem='64kB';
> explain (analyze,buffers) select count(*) from t t1 join t t2 on
> (t1.pk=t2.pk);
>
> There are three workers and the size of each exceeds 1.3Gb.
>
> The plan is the following:
>
> Finalize Aggregate (cost=355905977972.87..355905977972.88 rows=1 width=8)
>   (actual time=212961.033..226097.513 rows=1 loops=1)
>   Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374,
>     temp read=944407 written=1130380
>   -> Gather (cost=355905977972.65..355905977972.86 rows=2 width=8)
>      (actual time=212943.505..226097.497 rows=3 loops=1)
>      Workers Planned: 2
>      Workers Launched: 2
>      Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374,
>        temp read=944407 written=1130380
>      -> Partial Aggregate (cost=355905976972.65..355905976972.66 rows=1
>         width=8) (actual time=212938.410..212940.035 rows=1 loops=3)
>         Buffers: shared hit=32644 read=852474 dirtied=437947
>           written=426374, temp read=944407 written=1130380
>         -> Parallel Hash Join (cost=1542739.26..303822614472.65
>            rows=2083334502 width=0) (actual time=163268.274..207829.524
>            rows= loops=3)
>            Hash Cond: (t1.pk = t2.pk)
>            Buffers: shared hit=32644 read=852474 dirtied=437947
>              written=426374, temp read=944407 written=1130380
>            -> Parallel Seq Scan on t t1 (cost=0.00..859144.78 rows=4178
>               width=4) (actual time=0.045..30828.051 rows= loops=3)
>               Buffers: shared hit=16389 read=426089 written=87
>            -> Parallel Hash (cost=859144.78..859144.78 rows=4178 width=4)
>               (actual time=82202.445..82202.447 rows= loops=3)
>               Buckets: 4096 (originally 4096) Batches: 32768 (originally
>                 8192) Memory Usage: 192kB
>               Buffers: shared hit=16095 read=426383 dirtied=437947
>                 written=426287, temp read=267898 written=737164
>               -> Parallel Seq Scan on t t2 (cost=0.00..859144.78
>                  rows=4178 width=4) (actual time=0.054..12647.534 rows=
>                  loops=3)
>                  Buffers: shared hit=16095 read=426383 dirtied=437947
>                    written=426287
> Planning:
>
Re: Fix typos and inconsistencies for v16
On Wed, 19 Apr 2023 at 07:00, Alexander Lakhin wrote:
> please look at the similar list for v15+ (596b5af1d..HEAD).

I've now pushed most of these but didn't include the following ones:

> 3. BufFileOpenShared -> BufFileOpenFileSet // see dcac5e7ac

Maybe I need to spend longer, but I just didn't believe the comment that
claimed that "BufFiles opened using BufFileOpenFileSet() are read-only by
definition". Looking at the code for that, it seems to depend on whether
O_RDONLY is included in the mode flags.

> 19. multidimensional-aware -> multidimension-aware // sync with gistbuild.c

I didn't change this as I didn't think it was an improvement. I'd probably
have written "multidimensionally aware", but I didn't feel strongly enough
to want to change it.

David
Re: Remove io prefix from pg_stat_io columns
On Thu, Apr 20, 2023 at 11:38:42AM +0900, Michael Paquier wrote: > No worries, committers should take care of that. Done as of 0ecb87e, as I can keep an eye on the buildfarm today, with a catversion bump. -- Michael signature.asc Description: PGP signature
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On Fri, 21 Apr 2023 at 05:40, Tom Lane wrote: > > The Core Team would like to extend our congratulations to > Nathan Bossart, Amit Langote, and Masahiko Sawada, who have > accepted invitations to become our newest Postgres committers. > > Please join me in wishing them much success and few reverts. Great news! Congratulations to all three! David
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On Thu, Apr 20, 2023 at 01:40:49PM -0400, Tom Lane wrote: > The Core Team would like to extend our congratulations to > Nathan Bossart, Amit Langote, and Masahiko Sawada, who have > accepted invitations to become our newest Postgres committers. > > Please join me in wishing them much success and few reverts. Congratulations to all of you! May the buildfarm show kindness to your commits. -- Michael signature.asc Description: PGP signature
Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
On Thu, 20 Apr 2023 at 18:46, Richard Guo wrote: > > > On Thu, Apr 20, 2023 at 6:38 AM David Rowley wrote: >> >> That function is pretty new and was exactly added so we didn't have to >> write list_truncate(list_copy(...), n) anymore. That gets pretty >> wasteful when the input List is long and we only need a small portion >> of it. > > I searched the codes and found some other places where the manipulation > of lists can be improved in a similar way. I'd be happy to discuss our thought about List inefficiencies, but I think to be fair to Miroslav, we should do that somewhere else. The list_copy_head() discussion was directly related to his patch due to the list of list_truncate(list_copy(..), ..). The other things you've mentioned are not. Feel free to start a thread and copy me in. David
Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
> I searched the codes and found some other places where the manipulation
> of lists can be improved in a similar way.
> * lappend(list_copy(list), datum) as in get_required_extension().
> This is not very efficient, as after list_copy it would need to enlarge
> the list immediately. It can be improved by inventing a new function,
> maybe called list_append_copy, that does the copy and append all together.
> * lcons(datum, list_copy(list)) as in get_query_def().
> This is also not efficient. Immediately after list_copy, we'd need to
> enlarge the list and move all the entries. It can also be improved by
> doing all these things together in one function.
> * lcons(datum, list_delete_nth_cell(list_copy(list), n)) as in
> sort_inner_and_outer.
> It'd need to copy all the elements, then delete the n'th entry, which
> would cause all following entries to be moved, and then move all the
> remaining entries for lcons. Maybe we can invent a new function for it?
> So is it worthwhile to improve these places?

I think yes. It's very inefficient to copy and move unnecessarily. Perhaps
something like the attached patch? lcons_copy_delete needs a careful review.

> I wonder if we can invent a function list_nth_xid to do it, to keep
> consistent with list_nth/list_nth_int/list_nth_oid.

Perhaps list_nth_xid(const List *list, int n)?

regards,
Ranier Vilela

0001-add-new-list-functions.patch
Description: Binary data
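The copy-then-enlarge waste that list_append_copy would avoid can be illustrated with a plain C dynamic array rather than PostgreSQL's List API: sizing the copy for n+1 elements up front means one allocation and no subsequent enlargement, unlike the lappend(list_copy(list), datum) pattern. (Toy int array; the function name is illustrative, not the proposed API.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy src[0..n-1] into a fresh array sized for one extra element, then
 * append datum -- one allocation, no copy-then-enlarge.
 */
static int *
array_append_copy(const int *src, int n, int datum)
{
	int		   *dst = malloc((n + 1) * sizeof(int));

	memcpy(dst, src, n * sizeof(int));
	dst[n] = datum;
	return dst;
}
```

The same reasoning applies to the lcons variants: building the result at its final size in one pass avoids first copying every element and then shifting them all again to make room at the front.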
Re: LLVM strip -x fails
Andres Freund writes: > On 2023-04-20 12:43:48 -0400, Tom Lane wrote: >> The previous complaint about this [1] suggested we use --strip-unneeded >> for all cases with GNU strip, same as we've long done for shared libs. >> It's an easy enough change, but I wonder if anyone will complain. > I doubt anybody wants to strip symbols and keep debug information, so I doubt > there's much ground for complaints? Agreed. It doesn't look like --strip-unneeded is a worse choice than -x. > Oddly the output of llvm-strip confuses binutils objdump enough that it claims > that "file format not recognized". Not sure which side is broken there. Not our problem, I'd say ... > llvm-strip's output is a lot larger than gnu strip's: ... nor that. These things do suggest that llvm-strip isn't all that close to being ready for prime time, but if FreeBSD wants to push the envelope on toolchain usage, who are we to stand in the way? I'll go make it so. regards, tom lane
Re: Improve logging when using Huge Pages
On Thu, Mar 23, 2023 at 07:23:28AM +0900, Michael Paquier wrote: > On Wed, Mar 22, 2023 at 05:18:28PM -0500, Justin Pryzby wrote: >> Wow, good point. I think to make it work we'd need put >> huge_pages_active into BackendParameters and handle it in >> save_backend_variables(). If so, that'd be strong argument for using a >> GUC, which already has all the necessary infrastructure for exposing the >> server's state. > > I am afraid so, duplicating an existing infrastructure for a need like > the one of this thread is not really appealing. AFAICT this would involve adding a bool to BackendParameters and using it in save_backend_variables() and restore_backend_variables(), which is an additional 3 lines of code. That doesn't sound too bad to me, but perhaps I am missing something. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
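The shape of the change described above — carrying one extra flag through the parameter block that the postmaster saves and a child process restores — can be sketched abstractly. Struct and function names mirror the discussion but are toy stand-ins, not the actual postmaster code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the BackendParameters save/restore round-trip. */
typedef struct BackendParams
{
	bool		huge_pages_active;	/* hypothetical new field */
} BackendParams;

static bool huge_pages_active = false;	/* process-global state */

/* Parent serializes its state into the parameter block... */
static void
save_params(BackendParams *p)
{
	p->huge_pages_active = huge_pages_active;
}

/* ...and the child copies it back out after fork/exec. */
static void
restore_params(const BackendParams *p)
{
	huge_pages_active = p->huge_pages_active;
}
```

This matches the "additional 3 lines of code" estimate: one struct field plus one assignment on each side of the round-trip.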
Re: Non-superuser subscription owners
On Thu, Apr 20, 2023 at 1:08 AM Amit Kapila wrote: > Pushed. I noticed that we didn't display this new subscription option > 'password_required' in \dRs+: > > postgres=# \dRs+ > > List of subscriptions > Name | Owner | Enabled | Publication | Binary | Streaming | > Two-phase commit | Disable on error | Origin | Run as Owner? | > Synchronous commit |Conninfo | Skip LSN > > Is that intentional? Sorry, if it was discussed previously because I > haven't followed this discussion in detail. No, I don't think that's intentional. I just didn't think about it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Tab completion for GRANT MAINTAIN
On Wed, Apr 19, 2023 at 10:54:05AM -0400, Tom Lane wrote: > (One could wish that it didn't take touching three or so places in > tab-complete.c to add a privilege, especially when a naive hacker might > think he was done after touching Privilege_options_of_grant_and_revoke. > I didn't see any easy way to improve that situation though.) Sorry, I think this was my fault. Thanks for fixing. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Request for comment on setting binary format output per session
On Tue, 18 Apr 2023 at 12:31, Dave Cramer wrote:
> On Tue, 18 Apr 2023 at 12:24, Robert Haas wrote:
> > On Tue, Apr 18, 2023 at 11:51 AM Tom Lane wrote:
> > > Robert Haas writes:
> > > > One thing I think we should do in this area is introduce #defines for
> > > > all the message type codes and use those instead of having hard-coded
> > > > constants everywhere.
> > >
> > > +1, but I wonder where we should put those exactly. My first thought
> > > was postgres_ext.h, but the charter for that is
> > >
> > > * This file contains declarations of things that are visible everywhere
> > > * in PostgreSQL *and* are visible to clients of frontend interface
> > > * libraries. For example, the Oid type is part of the API of libpq and
> > > * other libraries.
> > >
> > > so picayune details of the wire protocol probably don't belong there.
> > > Maybe we need a new header concerned with the wire protocol?
> >
> > Yeah. I sort of thought maybe one of the files in src/include/libpq
> > would be the right place, but it doesn't look like it.
>
> If we at least created the defines and replaced occurrences with the same,
> then we can litigate where to put them later.
>
> I think I'd prefer this in a different patch, but I'd be willing to take a
> run at it.

As promised, here is a patch with defines for all of the protocol messages.
I created a protocol.h file and put it in src/include. I'm fairly sure that
some of the names I used may need to be changed, but the grunt work of
finding and replacing everything is done.

Dave Cramer

0001-Created-protocol.h.patch
Description: Binary data
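A hedged sketch of what such a protocol.h could contain — the macro names here are my guesses rather than the ones in the attached patch, but the one-byte codes themselves are fixed by the documented frontend/backend protocol:

```c
#include <assert.h>

/*
 * Hypothetical protocol.h excerpt: symbolic names for a few message type
 * bytes.  The byte values come from the wire protocol documentation; the
 * PqMsg_* names are illustrative only.
 */
#define PqMsg_Query           'Q'	/* frontend: simple query */
#define PqMsg_Parse           'P'	/* frontend: extended-query parse */
#define PqMsg_Bind            'B'	/* frontend: bind portal */
#define PqMsg_Terminate       'X'	/* frontend: close connection */
#define PqMsg_RowDescription  'T'	/* backend: result column metadata */
#define PqMsg_ReadyForQuery   'Z'	/* backend: ready for next command */
```

With such names, code like `if (msgtype == 'Q')` becomes `if (msgtype == PqMsg_Query)`, which is the readability win the thread is after without changing any byte on the wire.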
Re: LLVM strip -x fails
Hi, On 2023-04-20 12:43:48 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Afaict the only safe thing to use when stripping static libs is
> > -g/--strip-debug.
>
> The previous complaint about this [1] suggested we use --strip-unneeded
> for all cases with GNU strip, same as we've long done for shared libs.
> It's an easy enough change, but I wonder if anyone will complain.

--strip-unneeded output is smaller than -x output:

 19M src/interfaces/libpq/libpq.a
364K src/interfaces/libpq/libpq.a.strip.gnu.g
352K src/interfaces/libpq/libpq.a.strip.gnu.unneeded
356K src/interfaces/libpq/libpq.a.strip.gnu.x
352K src/interfaces/libpq/libpq.a.strip.gnu.x.g

strip --version
GNU strip (GNU Binutils for Debian) 2.40

Partially that's because --strip-unneeded implies -g. Interestingly -x -g output isn't quite the same as --strip-unneeded. The latter also removes the _GLOBAL_OFFSET_TABLE_ symbol. I doubt anybody wants to strip symbols and keep debug information, so I doubt there's much ground for complaints? Oddly the output of llvm-strip confuses binutils objdump enough that it claims that "file format not recognized". Not sure which side is broken there. llvm-strip's output is a lot larger than gnu strip's:

 19M src/interfaces/libpq/libpq.a
 19M src/interfaces/libpq/libpq.a.strip.llvm.X
908K src/interfaces/libpq/libpq.a.strip.llvm.g
892K src/interfaces/libpq/libpq.a.strip.llvm.unneeded
892K src/interfaces/libpq/libpq.a.strip.llvm.unneeded.g
364K src/interfaces/libpq/libpq.a.strip.gnu.g
356K src/interfaces/libpq/libpq.a.strip.gnu.x
352K src/interfaces/libpq/libpq.a.strip.gnu.x.g
352K src/interfaces/libpq/libpq.a.strip.gnu.unneeded

Greetings, Andres Freund
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On 2023-04-20 Th 14:12, Peter Geoghegan wrote: On Thu, Apr 20, 2023 at 10:56 AM Pavel Borisov wrote: It's much deserved! Congratulations, Nathan, Amit and Masahiko! Congratulations to all three! +3 cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
> I've just pushed a fix to master for this. See [1]. If you base your > patch atop of that you should be able to use list_copy_head() without > any issues. Thanks for this fix. Now the version am_orderbyop_incremental_sort_v3.1.patch [1] works without issues using the master branch. [1] https://www.postgresql.org/message-id/attachment/146450/am_orderbyop_incremental_sort_v3.1.patch
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On Thu, Apr 20, 2023 at 10:56 AM Pavel Borisov wrote: > It's much deserved! Congratulations, Nathan, Amit and Masahiko! Congratulations to all three! -- Peter Geoghegan
Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
On Thu, 20 Apr 2023 at 21:40, Tom Lane wrote: > > The Core Team would like to extend our congratulations to > Nathan Bossart, Amit Langote, and Masahiko Sawada, who have > accepted invitations to become our newest Postgres committers. > > Please join me in wishing them much success and few reverts. > > regards, tom lane Great news! It's much deserved! Congratulations, Nathan, Amit and Masahiko! Pavel Borisov
New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
The Core Team would like to extend our congratulations to Nathan Bossart, Amit Langote, and Masahiko Sawada, who have accepted invitations to become our newest Postgres committers. Please join me in wishing them much success and few reverts. regards, tom lane
Re: LLVM strip -x fails
Andres Freund writes: > Afaict the only safe thing to use when stripping static libs is > -g/--strip-debug. The previous complaint about this [1] suggested we use --strip-unneeded for all cases with GNU strip, same as we've long done for shared libs. It's an easy enough change, but I wonder if anyone will complain. regards, tom lane [1] https://www.postgresql.org/message-id/17898-5308d09543463266%40postgresql.org
Re: Memory leak from ExecutorState context?
On 11.04.2023 8:14 PM, Jehan-Guillaume de Rorthais wrote:
On Sat, 8 Apr 2023 02:01:19 +0200 Jehan-Guillaume de Rorthais wrote:
On Fri, 31 Mar 2023 14:06:11 +0200 Jehan-Guillaume de Rorthais wrote:
[...]

After rebasing Tomas' memory balancing patch, I did some memory measures to answer some of my questions. Please, find in attachment the resulting charts "HJ-HEAD.png" and "balancing-v3.png" to compare memory consumption between HEAD and Tomas' patch. They show an alternation of numbers before/after calling ExecHashIncreaseNumBatches (see the debug patch). I didn't try to find the exact last total peak of memory consumption during the join phase and before all the BufFiles are destroyed. So the last number might be underestimated.

I did some more analysis about the total memory consumption in filecxt of HEAD, v3 and v4 patches. My previous debug numbers only print memory metrics during batch increments or hash table destruction. That means:

* for HEAD: we miss the batches consumed during the outer scan
* for v3: adds twice nbatch in spaceUsed, which is a rough estimation
* for v4: batches are tracked in spaceUsed, so they are reflected in spacePeak

Using a breakpoint in ExecHashJoinSaveTuple to print "filecxt->mem_allocated" from there, here is the maximum allocated memory for the bufFile context for each branch:

        batches  max bufFiles total  spaceAllowed rise
HEAD    16384    199966960 ~194MB
v3      4096     65419456  ~78MB
v4(*3)  2048     34273280  48MB      nbatch*sizeof(PGAlignedBlock)*3
v4(*4)  1024     17170160  60.6MB    nbatch*sizeof(PGAlignedBlock)*4
v4(*5)  2048     34273280  42.5MB    nbatch*sizeof(PGAlignedBlock)*5

It seems accounting for bufFile in spaceUsed allows better memory balancing and management. The precise factor to raise spaceAllowed is yet to be defined. *3 or *4 looks good, but this is based on a single artificial test case. Also, note that HEAD is currently reporting ~4MB of memory usage. This is far from reality.
So even if we don't commit the balancing memory patch in v16, maybe we could account for filecxt in spaceUsed as a bugfix? Regards,

Thank you for the patch. I faced the same problem (OOM caused by hash join). I tried to create the simplest test reproducing the problem:

create table t(pk int, val int);
insert into t values (generate_series(1,1),0);
set work_mem='64kB';
explain (analyze,buffers) select count(*) from t t1 join t t2 on (t1.pk=t2.pk);

There are three workers and the size of each exceeds 1.3Gb. The plan is the following:

Finalize Aggregate (cost=355905977972.87..355905977972.88 rows=1 width=8) (actual time=212961.033..226097.513 rows=1 loops=1)
  Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
  -> Gather (cost=355905977972.65..355905977972.86 rows=2 width=8) (actual time=212943.505..226097.497 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
     -> Partial Aggregate (cost=355905976972.65..355905976972.66 rows=1 width=8) (actual time=212938.410..212940.035 rows=1 loops=3)
        Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
        -> Parallel Hash Join (cost=1542739.26..303822614472.65 rows=2083334502 width=0) (actual time=163268.274..207829.524 rows= loops=3)
           Hash Cond: (t1.pk = t2.pk)
           Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
           -> Parallel Seq Scan on t t1 (cost=0.00..859144.78 rows=4178 width=4) (actual time=0.045..30828.051 rows= loops=3)
              Buffers: shared hit=16389 read=426089 written=87
           -> Parallel Hash (cost=859144.78..859144.78 rows=4178 width=4) (actual time=82202.445..82202.447 rows= loops=3)
              Buckets: 4096 (originally 4096)  Batches: 32768 (originally 8192)  Memory Usage: 192kB
              Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287, temp read=267898 written=737164
              -> Parallel Seq Scan on t t2 (cost=0.00..859144.78 rows=4178 width=4) (actual time=0.054..12647.534 rows= loops=3)
                 Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287
Planning:
  Buffers: shared hit=69 read=38
Planning Time: 2.819 ms
Execution Time: 226113.292 ms
(22 rows)

So we have increased the number of batches to 32k. I applied your patches 0001-0004 but unfortunately they have not reduced memory consumption - the size of each backend is still more than 1.3Gb. I wonder what can be the preferred solution of the problem? We
Re: Wrong results from Parallel Hash Full Join
On Wed, Apr 19, 2023 at 08:47:07PM -0400, Melanie Plageman wrote: > On Wed, Apr 19, 2023 at 8:41 PM Justin Pryzby wrote: > > > > On Wed, Apr 19, 2023 at 12:20:51PM -0700, Andres Freund wrote: > > > On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote: > > > > On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote: > > > > > Ultimately this is probably fine. If we wanted to modify one of the > > > > > existing tests to cover the multi-batch case, changing the select > > > > > count(*) to a select * would do the trick. I imagine we wouldn't want > > > > > to > > > > > do this because of the excessive output this would produce. I wondered > > > > > if there was a pattern in the tests for getting around this. > > > > > > > > You could use explain (ANALYZE). But the output is machine-dependant in > > > > various ways (which is why the tests use "explain analyze so rarely). > > > > > > I think with sufficient options it's not machine specific. > > > > It *can* be machine specific depending on the node type.. > > > > In particular, for parallel workers, it shows "Workers Launched: ..", > > which can vary even across executions on the same machine. And don't > > forget about "loops=". > > > > Plus: > > src/backend/commands/explain.c: "Buckets: %d Batches: %d Memory Usage: > > %ldkB\n", > > > > > We have a bunch of > > > EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) .. > > > in our tests. > > > > There's 81 uses of "timing off", out of a total of ~1600 explains. Most > > of them are in partition_prune.sql. explain analyze is barely used. > > > > I sent a patch to elide the machine-specific parts, which would make it > > easier to use. But there was no interest. > > While I don't know about other use cases, I would have used that here. > Do you still have that patch laying around? I'd be interested to at > least review it. https://commitfest.postgresql.org/41/3409/
Re: Wrong results from Parallel Hash Full Join
On Wed, Apr 19, 2023 at 8:43 PM Melanie Plageman wrote: > On Wed, Apr 19, 2023 at 3:20 PM Andres Freund wrote: >> On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote: >> > On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote: >> > > Ultimately this is probably fine. If we wanted to modify one of the >> > > existing tests to cover the multi-batch case, changing the select >> > > count(*) to a select * would do the trick. I imagine we wouldn't want to >> > > do this because of the excessive output this would produce. I wondered >> > > if there was a pattern in the tests for getting around this. >> > >> > You could use explain (ANALYZE). But the output is machine-dependant in >> > various ways (which is why the tests use "explain analyze so rarely). >> >> I think with sufficient options it's not machine specific. We have a bunch of >> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) .. >> in our tests. > > > Cool. Yea, so ultimately these options are almost enough but memory > usage changes from execution to execution. There are some tests which do > regexp_replace() on the memory usage part of the EXPLAIN ANALYZE output > to allow us to still compare the plans. However, I figured if I was > already going to go to the trouble of using regexp_replace(), I might as > well write a function that returns the "Actual Rows" field from the > EXPLAIN ANALYZE output. > > The attached patch does that. I admittedly mostly copy-pasted the > plpgsql function from similar examples in other tests, and I suspect it > may be overkill and also poorly written. I renamed the function to join_hash_actual_rows to avoid potentially affecting other tests. Nothing about the function is specific to a hash join plan, so I think it is more clear to prefix the function with the test file name. v2 attached. 
- Melanie

From d8f6946cf127092913d8f1b7a9741b97f3215bbd Mon Sep 17 00:00:00 2001
From: Melanie Plageman
Date: Wed, 19 Apr 2023 20:09:37 -0400
Subject: [PATCH v2] Test multi-batch PHJ match bit initialization

558c9d75 fixed a bug with initialization of the hash join match status bit and added test coverage for serial hash join and single batch parallel hash join. Add a test for multi-batch parallel hash join.

To test this with the existing relations in the join_hash test file, this commit modifies some of the test queries to use SELECT * instead of SELECT COUNT(*), which was needed to ensure that the tuples being inserted into the hashtable had not already been made into virtual tuples and lost their HOT status bit. These modifications made the original tests introduced in 558c9d75 redundant.

Discussion: https://postgr.es/m/ZEAh6CewmARbDVuN%40telsasoft.com
---
 src/test/regress/expected/join_hash.out | 134 +++-
 src/test/regress/sql/join_hash.sql      |  72 +++--
 2 files changed, 100 insertions(+), 106 deletions(-)

diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index e892e7cccb..d4a70f6754 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -49,10 +49,35 @@ begin
   end loop;
 end;
 $$;
+-- Return the number of actual rows returned by a query. This is useful when a
+-- count(*) would not execute the desired codepath but a SELECT * would produce
+-- an excessive number of rows.
+create or replace function join_hash_actual_rows(query text)
+returns jsonb language plpgsql
+as
+$$
+declare
+  elements jsonb;
+begin
+  execute 'explain (analyze, costs off, summary off, timing off, format ''json'') ' || query into strict elements;
+  if jsonb_array_length(elements) > 1 then
+    raise exception 'Cannot return actual rows from more than one plan';
+  end if;
+  return elements->0->'Plan'->'Actual Rows';
+end;
+$$;
 -- Make a simple relation with well distributed keys and correctly
 -- estimated size.
-create table simple as
-  select generate_series(1, 2) AS id, 'aa';
+create table simple (id int, value text);
+insert into simple values (1, 'aa');
+-- Hash join reuses the HOT status bit to indicate match status. This can only
+-- be guaranteed to produce correct results if all the hash join tuple match
+-- bits are reset before reuse. This is done upon loading them into the
+-- hashtable. To test this, update simple, creating a HOT tuple. If this status
+-- bit isn't cleared, we won't correctly emit the NULL-extended unmatching
+-- tuple in full hash join.
+update simple set id = 1;
+insert into simple select generate_series(2, 2), 'aa';
 alter table simple set (parallel_workers = 2);
 analyze simple;
 -- Make a relation whose size we will under-estimate. We want stats
@@ -273,6 +298,7 @@ set local max_parallel_workers_per_gather = 2;
 set local work_mem = '192kB';
 set local hash_mem_multiplier = 1.0;
 set local enable_parallel_hash = on;
+set local parallel_tuple_cost = 0;
 explain
Re: LLVM strip -x fails
Hi, Moving this to -hackers. On 2023-04-20 11:49:23 +0200, Palle Girgensohn wrote: > I was recently made aware of a problem building postgresql using LLVM > binutils. > > A summary: > > -- > > pgsql's build has requested to strip all non-global symbols (strip -x), but > there is at least one non-global symbol that in fact cannot be stripped > because it is referenced by a relocation. An important detail here is that this happens when stripping static libraries. It's not too surprising that one needs the symbols referenced by relocations until after linking with the static lib, even if they're not globally visible symbols. > Both GNU strip and ELF Tool Chain strip silently handle this case (and just > retain the local symbol), but LLVM strip is stricter and emits an error upon > request to strip a non-removable local symbol. > > There is an LLVM ticket open for this at > https://github.com/llvm/llvm-project/issues/47468, and it may make sense for > LLVM strip to behave the same as GNU and ELF Tool Chain strip. That said, > pgsql should just not use strip -x when there are symbols that cannot be > stripped. Personally I'd say stripping symbols is something that should just not be done anymore, but ... > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270769 > > https://reviews.freebsd.org/D39590 > > > Any toughts about this? Should send the suggested patch upstreams? Or do we > just consider LLVM to behave badly? Peter, it's unlikely given the timeframe, but do you happen to remember why you specified -x when stripping static libs? This seems to be all the way back from commit 563673e15db995b6f531b44be7bb162330ac157a Author: Peter Eisentraut Date: 2002-04-10 16:45:25 + Add make install-strip target. Afaict the only safe thing to use when stripping static libs is -g/--strip-debug. Greetings, Andres Freund
Re: check_strxfrm_bug()
On Thu, 2023-04-20 at 13:34 +1200, Thomas Munro wrote: > I could write a patch to remove the libc strxfrm support, but since > Jeff recently wrote new code in 16 to abstract that stuff, he might > prefer to look at it? +1 to removing it. As far as how it's removed, we could directly check: if (!collate_c && !(locale && locale->provider == COLLPROVIDER_ICU)) abbreviate = false; as it was before, or we could still try to hide it as a detail behind a function. I don't have a strong opinion there, though I thought it might be good for varlena.c to not know those internal details. Regards, Jeff Davis
memory context mismatch in LockMethodLocalHash entries.
Hi All, The LockMethodLocalHash hash table is allocated in memory context "LOCALLOCK hash". LockAcquireExtended() fetches an entry from this hash table and allocates memory for locallock->lockOwners in TopMemoryContext. Thus the entry locallock and the array pointed to from this entry are allocated in two different memory contexts. Theoretically, if LockMethodLocalHash were destroyed with some entries in it, it would leave some dangling pointers in TopMemoryContext. It looks to me that the array lockOwners should be allocated in the same context as "LOCALLOCK hash" or a child of it. Something like below:

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 42595b38b2..32804b1c2c 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -843,7 +843,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		locallock->maxLockOwners = 8;
 		locallock->lockOwners = NULL;	/* in case next line fails */
 		locallock->lockOwners = (LOCALLOCKOWNER *)
-			MemoryContextAlloc(TopMemoryContext,
+			MemoryContextAlloc(GetMemoryChunkContext(locallock),
 							   locallock->maxLockOwners * sizeof(LOCALLOCKOWNER));
 	}
 	else

LockMethodLocalHash is hash_destroy()ed in InitLocks(). The comment there mentions that possibly there are no entries in that hash table at that time. So the problem described above is rare or non-existent as of now. But it's possibly worth fixing while it's not too serious. There were some untraceable bugs related to locks reported earlier [1]. Those may be linked to this, but we couldn't establish the connection.

[1] https://www.postgresql.org/message-id/flat/5227.1315428924%40sss.pgh.pa.us#00116525613b7ddb82669d2ba358b31e

-- Best Wishes, Ashutosh Bapat
Re: Should we put command options in alphabetical order in the doc?
> On 20 Apr 2023, at 14:40, David Rowley wrote: > I see "man grep" categorises the command line options and then sorts > alphabetically within the category. On FreeBSD and macOS "man grep" lists all options alphabetically. > FWIW, vacuumdb --help has its options in alphabetical order using the > abbreviated form of the option. It does (as most of our binaries do) group "Connection options" separately though, and in initdb --help and pg_dump --help we have other groupings as well. -- Daniel Gustafsson
[BUG] FK broken after DETACHing referencing part
Hi, Considering two partitioned tables with a FK between them:

DROP TABLE IF EXISTS p, c, c_1 CASCADE;

-- Parent table + partition + data
CREATE TABLE p (
  id bigint PRIMARY KEY
) PARTITION BY list (id);
CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);
INSERT INTO p VALUES (1);

-- Child table + partition + data
CREATE TABLE c (
  id   bigint PRIMARY KEY,
  p_id bigint NOT NULL,
  FOREIGN KEY (p_id) REFERENCES p (id)
) PARTITION BY list (id);
CREATE TABLE c_1 PARTITION OF c FOR VALUES IN (1);
INSERT INTO c VALUES (1,1);

After DETACHing the "c_1" partition, the current implementation makes sure it keeps the FK inherited from its previous top table "c":

ALTER TABLE c DETACH PARTITION c_1;
\d c_1
-- outputs:
-- [...]
-- Foreign-key constraints:
--     "c_p_id_fkey" FOREIGN KEY (p_id) REFERENCES p(id)

However, because the referenced side is partitioned, this FK is half-baked, with only the referencing (insert/update on c_1) side enforced, but not the referenced side (update/delete on p):

INSERT INTO c_1 VALUES (2,2); -- fails as EXPECTED
-- ERROR:  insert or update on table "child_1" violates foreign key [...]

DELETE FROM p; -- should actually fail
-- DELETE 1

SELECT * FROM c_1;
--  id | parent_id
-- ----+-----------
--   1 |         1
-- (1 row)

SELECT * FROM p;
--  id
-- ----
-- (0 rows)

When detaching "c_1", the current implementation adds two triggers to enforce that UPDATE/DELETE on "p" are restricted if "c_1" keeps referencing the related rows... But it forgets to add them on the partitions of "p" ("p_1" here), where the triggers should actually fire.
To make it clear, the FK c_1 -> p constraint and triggers after DETACHing c_1 are:

SELECT c.oid AS conid, c.conname, c.conparentid AS conparent,
       r2.relname AS pkrel, t.tgrelid::regclass AS tgrel, p.proname
FROM pg_constraint c
JOIN pg_class r   ON c.conrelid = r.oid
JOIN pg_class r2  ON c.confrelid = r2.oid
JOIN pg_trigger t ON t.tgconstraint = c.oid
JOIN pg_proc p    ON p.oid = t.tgfoid
WHERE r.relname = 'c_1' AND r2.relname LIKE 'p%'
ORDER BY r.relname, c.conname, t.tgrelid::regclass::text, p.proname;

--  conid |   conname   | conparent | pkrel | tgrel |       proname
-- -------+-------------+-----------+-------+-------+----------------------
--  18454 | c_p_id_fkey |         0 | p     | c_1   | RI_FKey_check_ins
--  18454 | c_p_id_fkey |         0 | p     | c_1   | RI_FKey_check_upd
--  18454 | c_p_id_fkey |         0 | p     | p     | RI_FKey_noaction_del
--  18454 | c_p_id_fkey |         0 | p     | p     | RI_FKey_noaction_upd

Where they should be:

--  conid |   conname    | conparent | pkrel | tgrel |       proname
-- -------+--------------+-----------+-------+-------+----------------------
--  18454 | c_p_id_fkey  |         0 | p     | c_1   | RI_FKey_check_ins
--  18454 | c_p_id_fkey  |         0 | p     | c_1   | RI_FKey_check_upd
--  18454 | c_p_id_fkey  |         0 | p     | p     | RI_FKey_noaction_del
--  18454 | c_p_id_fkey  |         0 | p     | p     | RI_FKey_noaction_upd
--  NEW!! | c_p_id_fkey1 |     18454 | p_1   | p_1   | RI_FKey_noaction_del
--  NEW!! | c_p_id_fkey1 |     18454 | p_1   | p_1   | RI_FKey_noaction_upd

I poked around DetachPartitionFinalize() to try to find a way to fix this, but it looks like it would duplicate a bunch of code from other code paths (eg. from CloneFkReferenced). Instead of tweaking the existing FK, keeping the old constraint name (wouldn't "c_1_p_id_fkey" be better after detach?) and duplicating some code around, what about cleaning up the FK constraints from the detached table and recreating a cleaner one using the known code path ATAddForeignKeyConstraint()?

Thanks for reading me down to here! ++
Re: Should we put command options in alphabetical order in the doc?
On Wed, 19 Apr 2023 at 22:04, Alvaro Herrera wrote:
>
> On 2023-Apr-18, Peter Geoghegan wrote:
>
> > While I'm certain that nobody will agree with me on every little
> > detail, I have to imagine that most would find my preferred ordering
> > quite understandable and unsurprising, at a high level -- this is not
> > a hopelessly idiosyncratic ranking, that could just as easily have
> > been generated by a PRNG. People may not easily agree that "apples are
> > more important than oranges, or vice-versa", but what does it matter?
> > I've really only put each option into buckets of items with *roughly*
> > the same importance. All of the details beyond that don't matter to
> > me, at all.
>
> I agree with you that roughly bucketing items is a good approach.
> Within each bucket we can then sort alphabetically.

If these "buckets" were subcategories, then it might be ok. I see "man grep" categorises the command line options and then sorts alphabetically within the category. If we could come up with a way of categorising the options then this would satisfy what Melanie mentioned about having the argument types listed separately. However, I'm really not sure which categories we could have. I really don't have any concrete ideas here, but I'll attempt to at least start something:

Behavioral:
  ANALYZE
  DISABLE_PAGE_SKIPPING
  FREEZE
  FULL
  INDEX_CLEANUP
  ONLY_DATABASE_STATS
  PROCESS_MAIN
  PROCESS_TOAST
  SKIP_DATABASE_STATS
  SKIP_LOCKED
  TRUNCATE

Resource Usage:
  BUFFER_USAGE_LIMIT
  PARALLEL

Informational:
  VERBOSE

Option Parameters:
  boolean
  column_name
  integer
  size
  table_name

I'm just not sure if we have enough options to have a need to categorise them. Also, going by the categories I attempted to come up with, it just feels like "Behavioral" contains too many and "Informational" is likely only ever going to contain VERBOSE. So I'm not very happy with them. I'm not really feeling excited enough about this to even come up with a draft patch.
I thought I'd send this out anyway to see if anyone can think of anything better. FWIW, vacuumdb --help has its options in alphabetical order using the abbreviated form of the option. David
Re: Initial Schema Sync for Logical Replication
I am working on a prototype based on the idea discussed above; I think I will send it for initial review by Monday. Regards Sachin
Re: Support logical replication of DDLs
On Thu, Apr 20, 2023 at 2:28 PM shveta malik wrote:
>
> On Thu, Apr 20, 2023 at 9:11 AM shveta malik wrote:
>>
>> On Mon, Apr 17, 2023 at 5:32 PM Zhijie Hou (Fujitsu) wrote:
>> >
>> > Attach the new version patch set which include the following changes:
>> Few comments for ddl_deparse.c in patch dated April17:
>
> Few comments for ddl_json.c in the patch dated April17:

Few more comments, mainly for event_trigger.c in the patch dated April17:

1) EventTriggerCommonSetup()

+ pub_funcoid = LookupFuncName(pubfuncname, 0, NULL, true);

This is the code where we have special handling for ddl-triggers. Here we are passing 'missing_ok' as true, so shouldn't we check pub_funcoid against 'InvalidOid'?

2) EventTriggerTableInitWriteEnd()

+ if (stmt->objtype == OBJECT_TABLE)
+ {
+     parent = currentEventTriggerState->currentCommand->parent;
+     pfree(currentEventTriggerState->currentCommand);
+     currentEventTriggerState->currentCommand = parent;
+ }
+ else
+ {
+     MemoryContext oldcxt;
+     oldcxt = MemoryContextSwitchTo(currentEventTriggerState->cxt);
+     currentEventTriggerState->currentCommand->d.simple.address = address;
+     currentEventTriggerState->commandList =
+         lappend(currentEventTriggerState->commandList,
+                 currentEventTriggerState->currentCommand);
+
+     MemoryContextSwitchTo(oldcxt);
+ }
+ }

It would be good to add some comments in the 'else' part. It is not very clear what exactly we are doing here and for which scenario.

3) EventTriggerTableInitWrite()

+ if (!currentEventTriggerState)
+     return;
+
+ /* Do nothing if no command was collected. */
+ if (!currentEventTriggerState->currentCommand)
+     return;

Is it possible that when we reach here no command is collected yet? I think this can happen only for the case when commandCollectionInhibited is true.
If so, the above can be modified to:

    if (!currentEventTriggerState ||
        currentEventTriggerState->commandCollectionInhibited)
        return;

    Assert(currentEventTriggerState->currentCommand != NULL);

This will make the behaviour and checks consistent across similar functions in this file.

4) EventTriggerTableInitWriteEnd()
Here as well, we can have the same assert as below:
    Assert(currentEventTriggerState->currentCommand != NULL);
'currentEventTriggerState' and 'commandCollectionInhibited' checks are already present.

5) logical_replication.sgml:
+ 'This is especially useful if you have lots schema changes over time that need replication.'
lots schema --> lots of schema

thanks Shveta
Re: Should we put command options in alphabetical order in the doc?
On Wed, Apr 19, 2023 at 05:33:47PM -0400, Melanie Plageman wrote: > On Wed, Apr 19, 2023 at 2:39 PM Peter Geoghegan wrote: > > > > On Wed, Apr 19, 2023 at 3:04 AM Alvaro Herrera > > wrote: > > > > While I'm certain that nobody will agree with me on every little > > > > detail, I have to imagine that most would find my preferred ordering > > > > quite understandable and unsurprising, at a high level -- this is not > > > > a hopelessly idiosyncratic ranking, that could just as easily have > > > > been generated by a PRNG. People may not easily agree that "apples are > > > > more important than oranges, or vice-versa", but what does it matter? > > > > I've really only put each option into buckets of items with *roughly* > > > > the same importance. All of the details beyond that don't matter to > > > > me, at all. > > > > > > I agree with you that roughly bucketing items is a good approach. > > > Within each bucket we can then sort alphabetically. > > > > I think of these buckets as working at a logarithmic scale. The FULL, > > FREEZE, VERBOSE, and ANALYZE options are multiple orders of magnitude > > more important than most of the other options, and maybe one order of > > magnitude more important than the PARALLEL, TRUNCATE, and > > INDEX_CLEANUP options. With differences that big, you have a structure > > that generalizes across all users quite well. This doesn't seem > > particularly subjective. > > I actually favor query/command order followed by alphabetical order for > most of the commands David included in his patch. > > Of course the parameter argument types, like boolean and integer, should > be grouped together separate from the main parameters. David fit this > into the alphabetical paradigm by doing uppercase alphabetical followed > by lowercase alphabetical. There are some specific cases where I think > this isn't working quite as intended in his patch. I've called those out > in my command-by-command code review below. 
> > I actually think we should consider having a single location which > defines all argument types for all SQL command parameters. Then we > wouldn't need to define them for each command. We could simply link to > the definition from the synopsis. That would clean up these lists quite > a bit. Perhaps there is some variation from command to command in the > actual definitions, though (I haven't checked). I would be happy to try > and write this patch if folks are interested in the idea. I looked into this and it isn't a good idea. Out of the 183 SQL commands, really only ANALYZE, VACUUM, COPY, CLUSTER, EXPLAIN, and REINDEX have parameter argument types that are context-independent. And out of those, boolean is the only type shared by all. VACUUM is the only one with more than one parameter argument "type". So, it is basically just a bad idea. Oh well... - Melanie
Re: [PATCH] Allow Postgres to pick an unused port to listen
Aleksander, On Thu, Apr 20, 2023 at 1:22 PM Aleksander Alekseev < aleksan...@timescale.com> wrote: > Hi, > > > Also, I don't think there's a case for distributed systems here because > we're only managing a single computer's resource: the allocation of local > ports. > > I don't suggest building a distributed system but rather using > well-known solutions from this area. For the described case the > "resource manager" will be as simple a single Consul instance (a > single binary file, since Consul is written in Go) running locally. > The "complexity" would be for the test framework to use a few extra > REST queries. Arguably not that complicated. > Bringing in a process that works over REST API (requiring itself to have a port, by the way) and increasing the rollout of such an environment is antithetical to simplicity and, thus, will make it only worse. If this is the alternative, I'd rather have a few retries or some other small hacks. Bringing in a new dependency with Python is also a heavy solution I'd rather avoid. I find that this is rather a problem with a relatively small surface. If the patch won't go through, I'll just find a workaround to live with, but I'd rather stay away from blowing the development environment out of proportion for something so minuscule. > > > using a KV store to lease a port does not guarantee the port's > availability > > I assume you don't have random processes doing random things (like > listening random ports) on a CI machine. You know that certain ports > are reserved for the tests and are going to be used only for this > purpose using the same leasing protocol. > This is intended to be used by CI and development workstations, where all bets are kind of off. > > > For example, I'd suggest adding an option to Postgres to receive sockets > it should listen [...] 
> > Not sure if I fully understood the idea, but it looks like you are > suggesting to build in certain rather complicated functionality for an > arguably rare use case so that a QA engineer didn't have one extra > small dependency to worry about in this rare case. I'm quite skeptical > that this is going to happen. > My suggestion was to simply allow listening for a wildcard port and be able to read it out in some way. Nothing particularly complicated. The fact that the process may die before it is connected to is rather a strange argument as the same can happen outside of this use case. -- Y.
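For readers following along, the mechanism Yurii describes is the standard wildcard-port bind: ask the kernel for port 0 and read back the port it actually chose. A minimal, purely illustrative Python sketch (not part of the patch, which does this for PostgreSQL's own listen socket):

```python
import socket

# Bind to port 0 so the OS picks an unused ephemeral port,
# then read the chosen port back from the socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
s.listen(1)
port = s.getsockname()[1]
print(port)
s.close()
```

The same race Yurii mentions applies here too: once the socket is closed, nothing prevents another process from grabbing the port, which is why the patch proposes having Postgres itself do the bind and report the result.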
Re: Note new NULLS NOT DISTINCT on unique index tutorial page
On Thu, 20 Apr 2023 at 12:04, David Gilman wrote: > The revised patch is good. Please go ahead and commit whatever > phrasing you or the other committers find acceptable. I don't really > have any preferences in how this is exactly phrased, I just think it > should be mentioned in the docs. Thanks. With that, I admit to further adjusting the wording before I pushed the result. David
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Thu, 20 Apr 2023 at 11:01, Hayato Kuroda (Fujitsu) wrote: > > Dear Vignesh, > > Thank you for reviewing! PSA new patchset. > > > > Additionally, I added a checking function in 0003 > > > According to pg_resetwal and other functions, the length of the > > CHECKPOINT_SHUTDOWN > > > record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + > > sizeof(CheckPoint)). > > > Therefore, the function ensures that the difference between the current insert > > position > > > and confirmed_flush_lsn is less than (the above + page header). > > > > Logical replication slots can be created only if wal_level >= logical; > > currently we do not have any check to see if wal_level >= logical when the > > "--include-logical-replication-slots" option is specified. If > > include-logical-replication-slots is specified with pg_upgrade, we > > will be creating replication slots only after a lot of steps like > > performing prechecks, analyzing, freezing, deleting, restoring, > > copying, and setting related objects, so > > we will be erroring out after a lot of time (in many cases > > pg_upgrade takes hours to perform these operations). I feel > > it would be better to add a check in the beginning itself, somewhere in > > check_new_cluster, to see if wal_level is set appropriately when the > > include-logical-replication-slots option is given, to detect and throw an error > > early. > > I see your point. Moreover, I think max_replication_slots != 0 must also be > checked. > I added a checking function and related test in 0001. Thanks for the updated patch. A few comments: 1) If the verbose option is enabled, we should print the new slot information. We could add a function print_slot_infos, similar to print_rel_infos, which could print the slot name and whether two_phase is enabled.
+ end_progress_output(); + check_ok(); + + /* update new_cluster info now that we have objects in the databases */ + get_db_and_rel_infos(&new_cluster); +} 2) Since we will be using this option with pg_upgrade, should we use this along with the --binary-upgrade option only? + if (dopt.logical_slots_only && dopt.dataOnly) + pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together"); + if (dopt.logical_slots_only && dopt.schemaOnly) + pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together"); 3) Since two_phase is boolean, can we use the bool data type instead of a string: + slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT; + + slotinfo[i].dobj.catId.tableoid = InvalidOid; + slotinfo[i].dobj.catId.oid = InvalidOid; + AssignDumpId(&slotinfo[i].dobj); + + slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname)); + + slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin)); + slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase)); We can change it to something like: if (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0) slotinfo[i].twophase = true; else slotinfo[i].twophase = false; 4) The comments are inconsistent, some have termination characters and some don't. We can keep it consistent: +# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy'; + +# Initialize old publisher node +my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher'); +$old_publisher->init(allows_streaming => 'logical'); +$old_publisher->start; + +# Initialize subscriber node +my $subscriber = PostgreSQL::Test::Cluster->new('subscriber'); +$subscriber->init(allows_streaming => 'logical'); +$subscriber->start; + +# Schema setup +$old_publisher->safe_psql('postgres', + "CREATE TABLE tbl AS SELECT generate_series(1,10) AS a"); +$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)"); + +# Initialize new publisher node 5) Should we use free instead of pfree, as used in other functions like dumpForeignServer: + appendPQExpBuffer(query, ");"); + + ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId, +ARCHIVE_OPTS(.tag = slotname, + .description = "REPLICATION SLOT", + .section = SECTION_POST_DATA, + .createStmt = query->data)); + + pfree(slotname); + destroyPQExpBuffer(query); + } +} Regards, Vignesh
Re: [PATCH] Allow Postgres to pick an unused port to listen
Hi, > Also, I don't think there's a case for distributed systems here because we're > only managing a single computer's resource: the allocation of local ports. I don't suggest building a distributed system but rather using well-known solutions from this area. For the described case the "resource manager" will be as simple as a single Consul instance (a single binary file, since Consul is written in Go) running locally. The "complexity" would be for the test framework to use a few extra REST queries. Arguably not that complicated. > using a KV store to lease a port does not guarantee the port's availability I assume you don't have random processes doing random things (like listening on random ports) on a CI machine. You know that certain ports are reserved for the tests and are going to be used only for this purpose using the same leasing protocol. If there are random things happening on CI that you have no control of, you have a problem with the CI infrastructure, not with Postgres. > For example, I'd suggest adding an option to Postgres to receive sockets it > should listen [...] Not sure if I fully understood the idea, but it looks like you are suggesting to build in certain rather complicated functionality for an arguably rare use case so that a QA engineer doesn't have one extra small dependency to worry about in this rare case. I'm quite skeptical that this is going to happen. > I am curious, Yurii, is Postgres the only service that needs an unused > port for listening in your testing/application environment? Otherwise, > how is this handled in other software? That's a very good point. To clarify, there is nothing wrong with the patch per se. It's merely an unreliable solution to the problem it is supposed to address. I don't think we should encourage users to build unreliable systems. -- Best regards, Aleksander Alekseev
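To make the trade-off in this thread concrete, here is a purely illustrative, single-machine sketch of the lease idea, using exclusive lock files in place of Consul keys (all names here are hypothetical, not from any patch). Note the final bind check: as Yurii points out, the lease alone does not guarantee the port is actually free.

```python
import os
import socket
import tempfile

def lease_port(lo=49152, hi=49200, lease_dir=None):
    """Lease a port by creating a lock file named after it.
    The lock file plays the role of the KV-store key; O_EXCL
    gives the compare-and-set semantics."""
    lease_dir = lease_dir or os.path.join(tempfile.gettempdir(), "port_leases")
    os.makedirs(lease_dir, exist_ok=True)
    for port in range(lo, hi):
        lock = os.path.join(lease_dir, str(port))
        try:
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another test process holds this lease
        os.close(fd)
        # The lease does not guarantee availability: something outside
        # the leasing protocol may still hold the port, so verify.
        try:
            s = socket.socket()
            s.bind(("127.0.0.1", port))
            s.close()
            return port, lock
        except OSError:
            os.unlink(lock)  # port taken anyway; release the lease
    raise RuntimeError("no leasable port found")

port, lock = lease_port()
print(port)
os.unlink(lock)  # release the lease when the test is done
```

This only coordinates processes that all follow the same protocol, which is exactly the assumption Aleksander makes about a controlled CI machine.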
Re: Initial Schema Sync for Logical Replication
On Mon, Apr 17, 2023 at 9:12 AM Masahiko Sawada wrote: > > On Fri, Apr 7, 2023 at 6:37 PM Amit Kapila wrote: > > > > On Thu, Apr 6, 2023 at 6:57 PM Masahiko Sawada > > wrote: > > > > > > > > > While writing a PoC patch, I found some difficulties in this idea. > > > First, I tried to add schemaname+relname to pg_subscription_rel but I > > > could not define the primary key of pg_subscription_rel. The primary > > > key on (srsubid, srrelid) doesn't work since srrelid could be NULL. > > > Similarly, the primary key on (srsubid, srrelid, schemaname, relname) > > > also doesn't work. > > > > > > > Can we think of having a separate catalog table, say > > pg_subscription_remote_rel, for this? You can have srsubid, > > remote_schema_name, remote_rel_name, etc. We may need some other state > > to be maintained during the initial schema sync where this table can > > be used. Basically, this can be used to maintain the state till the > > initial schema sync is complete because we can create a relation entry > > in pg_subscription_rel only after the initial schema sync is > > complete. > > It might not be ideal but I guess it works. But I think we need to > modify the name of the replication slot for initial sync as it currently > includes the OID of the table: > > void > ReplicationSlotNameForTablesync(Oid suboid, Oid relid, > char *syncslotname, Size szslot) > { > snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, > relid, GetSystemIdentifier()); > } > > If we use both schema name and table name, it's possible that slot > names are duplicated if schema and/or table names are long. Another > idea is to use the hash value of schema+table names, but it cannot > completely eliminate that possibility, and probably would make > investigation and debugging hard in case of any failure. Probably we > can use the OID of each entry in pg_subscription_remote_rel instead, > but I'm not sure it's a good idea, mainly because we will end up using > twice as many OIDs as before.
> The other possibility is to use worker_pid. To make debugging easier, we may want to LOG the schema_name+rel_name to slot_name mapping at DEBUG1 log level. -- With Regards, Amit Kapila.
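To illustrate the hash-based naming idea under discussion, here is a hypothetical Python sketch of a sync slot name built from a hash of the remote schema and table names instead of the table OID (the function and parameter names are illustrative, not from any patch). It keeps the name within NAMEDATALEN regardless of identifier length, but, as noted above, a short hash only reduces the collision probability rather than eliminating it:

```python
import hashlib

def sync_slot_name(suboid, schema, relname, sysid):
    # Mirror the "pg_%u_sync_%u_<sysid>" shape of
    # ReplicationSlotNameForTablesync, but replace the table OID
    # with a short hash of the remote schema.relname.
    h = hashlib.sha1(f"{schema}.{relname}".encode()).hexdigest()[:8]
    name = f"pg_{suboid}_sync_{h}_{sysid}"
    # Slot names must fit in NAMEDATALEN - 1 (63) bytes.
    assert len(name) <= 63
    return name

print(sync_slot_name(16394, "public", "tbl", 7222988958546941513))
```

With 8 hex digits, two distinct schema.relname pairs within one subscription collide with probability on the order of 2^-32, which is why logging the mapping (as suggested above) would still matter for debugging.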
Re: Support logical replication of DDLs
On Thu, Apr 20, 2023 at 9:11 AM shveta malik wrote: > > > Few comments for ddl_deparse.c in patch dated April17: > > 1) append_format_string() > I think we need to have 'Assert(sub_fmt)' here like we have it in all > other similar functions (append_bool_object, append_object_object, > ...) > +1. > 2) append_object_to_format_string() > here we have this piece of code: > if (sub_fmt == NULL || tree->fmtinfo == NULL) > return sub_fmt; > but sub_fmt will never be null when we reach this function, as all its > callers assert that sub_fmt is non-null. So that means when tree->fmtinfo is > null, we end up returning sub_fmt as it is, instead of extracting the > object name from it. Is this intended? > > 3) We can remove extra spaces after full-stop in the comment below > /* > * Deparse a ColumnDef node within a typed table creation. This is simpler > * than the regular case, because we don't have to emit the type declaration, > * collation, or default. Here we only return something if the column is > being > * declared NOT NULL. > ... > deparse_ColumnDef_typed() > Which full-stop are you referring to here, the first or the second? I see there is a tab after the first one which should be changed to a space. > > 4) These functions are not being used, do we need to retain these in this > patch? > deparse_Type_Storage() > deparse_Type_Receive() > deparse_Type_Send() > deparse_Type_Typmod_In() > deparse_Type_Typmod_Out() > deparse_Type_Analyze() > deparse_Type_Subscript() > I don't think we need to retain these. And, it seems they are already removed. > 5) deparse_AlterRelation() > We have the below variable initialized to false in the beginning > 'bool istype = false;' > And then we have many conditional code paths using the above, eg: istype ? > "ATTRIBUTE" : "COLUMN". We are not changing 'istype' anywhere and it > is hard-coded in the beginning.
It means there are parts of code in > this function which will never be hit (written for the 'istype=true' case), > so why do we need this variable and conditional code around it? > I don't see any usage of istype. We should simply remove the use of istype and use "COLUMN" directly. > > 6) There are plenty of places where we use 'append_not_present' > without using 'append_null_object'. > Do we need to have 'append_null_object' along with > 'append_not_present' at these places? > What is the difference if we add a null object or not before not_present? -- With Regards, Amit Kapila.
RE: Initial Schema Sync for Logical Replication
I am working on a prototype with the above idea, and will send it for review by Sunday/Monday. Regards Sachin
Re: [PATCH] Allow Postgres to pick an unused port to listen
Hi, Yurii Rashkovskii wrote: On Wed, Apr 19, 2023 at 11:44 PM Aleksander Alekseev < aleksan...@timescale.com> wrote: I would like to suggest a patch against master (although it may be worth backporting it) that makes it possible to listen on any unused port. [...] A bullet-proof approach would be (approximately) for the test framework to lease the ports on the given machine, for instance by using a KV value with CAS support like Consul or etcd (or another PostgreSQL instance), as this is done for leader election in distributed systems (so-called leader lease). After leasing the port the framework knows no other testing process on the given machine will use it (and also it keeps telling the KV storage that the port is still leased) and specifies it in postgresql.conf as usual. The approach you suggest introduces a significant amount of complexity but seemingly fails to address one of the core issues: using a KV store to lease a port does not guarantee the port's availability. I don't believe this is a sound way to address this issue, let alone a bulletproof one. Also, I don't think there's a case for distributed systems here because we're only managing a single computer's resource: the allocation of local ports. For this (local computer) use case, a tool such as https://github.com/kmike/port-for/ would do the job if I understand correctly (the lease thing, locally). And it would work for "anything", not just Postgres. I am curious, Yurii, is Postgres the only service that needs an unused port for listening in your testing/application environment? Otherwise, how is this handled in other software? Cheers, Denis
Re: Support logical replication of DDLs
On Thu, Apr 20, 2023 at 9:11 AM shveta malik wrote: > On Mon, Apr 17, 2023 at 5:32 PM Zhijie Hou (Fujitsu) > wrote: > > > > Attach the new version patch set which includes the following changes: > > > > Few comments for ddl_deparse.c in patch dated April17: > > Few comments for ddl_json.c in the patch dated April17: 1) expand_jsonval_string() I think we need to assert if jsonval is neither jbvString nor jbvBinary. 2) expand_jsonval_strlit() same here, assert if not jbvString (like we do in expand_jsonval_number and expand_jsonval_identifier etc.) 3) expand_jsonb_array() arrayelem is allocated as below, but not freed. initStringInfo() 4) expand_jsonb_array(), we initialize the iterator as below, which internally does palloc: it = JsonbIteratorInit(container); Shall this be freed at the end? I see usage of this function in other files. At a few places it is freed, while not freed at other places. 5) deparse_ddl_json_to_string(char *json_str, char** owner) str = palloc(value->val.string.len + 1); we do palloc here and return the allocated memory to the caller as 'owner'. The caller sets this 'owner' using SetConfigOption(), which internally allocates new memory and copies 'owner' to that. So the memory allocated in deparse_ddl_json_to_string is never freed. A better way would be the caller passing this allocated memory to deparse_ddl_json_to_string() and freeing it when done. Thoughts? 6) expand_fmt_recursive(): value = findJsonbValueFromContainer(container, JB_FOBJECT, ); Should this 'value' be freed at the end like we do at all other places in this file? thanks Shveta
Re: [PoC] pg_upgrade: allow to upgrade publisher node
Hi, On Thu, Apr 20, 2023 at 05:31:16AM +, Hayato Kuroda (Fujitsu) wrote: > Dear Vignesh, > > Thank you for reviewing! PSA new patchset. > > > > Additionally, I added a checking function in 0003 > > > According to pg_resetwal and other functions, the length of the > > CHECKPOINT_SHUTDOWN > > > record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + > > sizeof(CheckPoint)). > > > Therefore, the function ensures that the difference between the current insert > > position > > > and confirmed_flush_lsn is less than (the above + page header). I think that this test should be different when just checking the prerequisites (live_check / --check) compared to actually doing the upgrade, as it's almost guaranteed that the slots won't have sent everything while the source server is up and running. Maybe simply check that all logical slots are currently active when running the live check, so at least there's a good chance that they will still be active at shutdown, and will therefore send all the data to the subscribers? Having a regression test for that scenario would also be a good idea. Having an uncommitted write transaction should be enough to cover it.
Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
On Thu, Apr 20, 2023 at 6:38 AM David Rowley wrote: > That function is pretty new and was exactly added so we didn't have to > write list_truncate(list_copy(...), n) anymore. That gets pretty > wasteful when the input List is long and we only need a small portion > of it. I searched the code and found some other places where the manipulation of lists can be improved in a similar way. * lappend(list_copy(list), datum) as in get_required_extension(). This is not very efficient, as after list_copy it would need to enlarge the list immediately. It can be improved by inventing a new function, maybe called list_append_copy, that does the copy and append all together. * lcons(datum, list_copy(list)) as in get_query_def(). This is also not efficient. Immediately after list_copy, we'd need to enlarge the list and move all the entries. It can also be improved by doing all these things together in one function. * lcons(datum, list_delete_nth_cell(list_copy(list), n)) as in sort_inner_and_outer. It'd need to copy all the elements, then delete the n'th entry, which would cause all following entries to be moved, and then move all the remaining entries for lcons. Maybe we can invent a new function for it? So is it worthwhile to improve these places? Besides, I found one place that can be improved the same way as what we did in 9d299a49. --- a/src/backend/rewrite/rewriteSearchCycle.c +++ b/src/backend/rewrite/rewriteSearchCycle.c @@ -523,7 +523,7 @@ rewriteSearchAndCycle(CommonTableExpr *cte) fexpr = makeFuncExpr(F_INT8INC, INT8OID, list_make1(fs), InvalidOid, InvalidOid, COERCE_EXPLICIT_CALL); - lfirst(list_head(search_col_rowexpr->args)) = fexpr; + linitial(search_col_rowexpr->args) = fexpr; Also, in applyparallelworker.c we have the usage TransactionId xid_tmp = lfirst_xid(list_nth_cell(subxactlist, i)); I wonder if we can invent a function list_nth_xid to do it, to keep consistent with list_nth/list_nth_int/list_nth_oid. Thanks Richard
Re: Logical replication failed with SSL SYSCALL error
On Wed, 19 Apr 2023 at 17:26, shaurya jain <12345shau...@gmail.com> wrote: > > Hi Team, > > Could you please help me with this, It's urgent for the production > environment. > > On Wed, Apr 19, 2023 at 3:44 PM shaurya jain <12345shau...@gmail.com> wrote: >> >> Hi Team, >> >> Could you please help, It's urgent for the production env? >> >> On Sun, Apr 16, 2023 at 2:40 AM shaurya jain <12345shau...@gmail.com> wrote: >>> >>> Hi Team, >>> >>> Postgres Version:- 13.8 >>> Issue:- Logical replication failing with SSL SYSCALL error >>> Priority:-High >>> >>> We are migrating our database through logical replications, and all of >>> sudden below error pops up in the source and target logs which leads us to >>> nowhere. >>> >>> Logs from Source:- >>> LOG: could not send data to client: Connection reset by peer >>> STATEMENT: COPY public.test TO STDOUT >>> FATAL: connection to client lost >>> STATEMENT: COPY public.test TO STDOUT >>> >>> Logs from Target:- >>> 2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL >>> stream: SSL SYSCALL error: Connection timed out >>> 2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932 >>> 2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical >>> replication worker" (PID 1250) exited with exit code 1 >>> 2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table >>> synchronization worker for subscription " sub_tables_2_180", table "test" >>> has started >>> 2023-04-15 19:12:05 >>> UTC:10.144.19.34(33276):postgres@webadmit_staging:[7112]:WARNING: there is >>> no transaction in progress >>> 2023-04-15 19:14:08 >>> UTC:10.144.19.34(33324):postgres@webadmit_staging:[6052]:LOG: could not >>> receive data from client: Connection reset by peer >>> 2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL >>> stream: SSL SYSCALL error: Connection timed out >>> 2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL >>> stream: SSL SYSCALL error: Connection timed out 
>>> 2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL >>> stream: SSL SYSCALL error: Connection timed out >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical >>> replication worker" (PID 2556) exited with exit code 1 >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical >>> replication worker" (PID 2112) exited with exit code 1 >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical >>> replication worker" (PID 1089) exited with exit code 1 >>> 2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for >>> subscription "sub_tables_2_180" has started >>> 2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for >>> subscription "sub_tables_3_192" has started >>> 2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for >>> subscription "sub_tables_1_180" has started >>> >>> Just after this error, all other replication slots get disabled for some >>> time and come back online along with COPY command with the new PID in >>> pg_stat_activity. >>> >>> I have a few queries regarding this:- >>> >>> The exact reason for disconnection (Few articles claim memory and few >>> network) This might be because of a network failure. Did you notice any network instability? You could check the following configurations: tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count. With these, a keepalive probe is sent after the connection has been idle for tcp_keepalives_idle seconds; if the server does not respond within tcp_keepalives_interval seconds it is probed again, and the connection is considered gone after tcp_keepalives_count failed probes. >>> Will it lead to data inconsistency? It will not lead to inconsistency. In case of failure the failed transaction will be rolled back. >>> Does this new PID COPY command again migrate the whole data of the test >>> table once again? Yes, it will migrate the whole table data again in case of failures.
Regards, Vignesh
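For reference, the three keepalive GUCs mentioned above can be set in postgresql.conf on the connecting side; the values below are purely illustrative, not recommendations:

```
# Send a keepalive probe after 60 seconds of idle time,
# re-probe every 10 seconds, and declare the connection
# dead after 5 unanswered probes.
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 5
```

A value of 0 (the default) for any of these selects the operating system's default for that setting.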