Re: HOT chain validation in verify_heapam()
On Thu, Mar 23, 2023 at 2:15 AM Andres Freund wrote:
>
> Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set
> and expects to find an item it can dereference - but I don't think that's
> something we can rely on: Afaics HOT pruning can break chains, but doesn't
> reset xmax.
>

We have the code below, which I think takes care of xmin and xmax matching; only if they match do we add them to the predecessor array.

/*
 * If the next line pointer is a redirect, or if it's a tuple
 * but the XMAX of this tuple doesn't match the XMIN of the next
 * tuple, then the two aren't part of the same update chain and
 * there is nothing more to do.
 */
if (ItemIdIsRedirected(next_lp))
    continue;

curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp);
curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup);
next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
next_xmin = HeapTupleHeaderGetXmin(next_htup);

if (!TransactionIdIsValid(curr_xmax) ||
    !TransactionIdEquals(curr_xmax, next_xmin))
    continue;

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com
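The check quoted above boils down to a simple predicate: two line pointers belong to the same update chain only when the first tuple's update XID (xmax) is valid and equals the next tuple's xmin. Here is a minimal Python sketch of that predicate; the dict fields are illustrative stand-ins, not PostgreSQL structures:

```python
# Hypothetical, simplified model of the update-chain check quoted above.
# Field names ("xmax", "xmin", "redirect") are illustrative stand-ins,
# not PostgreSQL structs.

INVALID_XID = 0

def same_update_chain(curr, nxt):
    """Return True if curr -> nxt can be treated as one update chain."""
    if nxt.get("redirect"):          # next line pointer is a redirect
        return False
    curr_xmax = curr["xmax"]
    if curr_xmax == INVALID_XID:     # !TransactionIdIsValid(curr_xmax)
        return False
    return curr_xmax == nxt["xmin"]  # TransactionIdEquals(curr_xmax, next_xmin)

# A matching pair versus a pruned/broken chain whose xmax no longer
# matches the successor's xmin:
a = {"xmax": 105, "xmin": 100}
b = {"xmin": 105}
c = {"xmin": 207}

print(same_update_chain(a, b))  # True: part of the same chain
print(same_update_chain(a, c))  # False: not a predecessor/successor pair
```

Only pairs passing this predicate would be recorded in the predecessor array; a redirect or an invalid/mismatched xmax ends the chain walk.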
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 22, 2023 at 4:06 PM Hayato Kuroda (Fujitsu) wrote:
> > The patch looks mostly good to me. However, I have one
> > question/comment as follows:
> >
> > -
> > +
> > binary (boolean)
> >
> > To allow references to the binary option, we add the varlistentry id
> > here. It looks slightly odd to me to add an id for just one entry; see
> > commit 78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5 where we have
> > purposefully added ids to allow future references. Shall we add ids to
> > the other options as well on this page?
>
> I have analyzed the same points and made a patch that can be applied atop
> v19-0001. Please check the 0002 patch.

Pushed the 0001. It may be better to start a separate thread for 0002.

--
With Regards,
Amit Kapila.
Re: Add n_tup_newpage_upd to pg_stat table views
> * No more dedicated struct to carry around the type of an update.
>
> We just use two boolean arguments to the pgstats function instead. The
> struct didn't seem to be adding much, and it was distracting to track
> the information this way within heap_update().

That's probably a good move, especially if we start tallying updates
that use TOAST.
Re: Error "initial slot snapshot too large" in create replication slot
At Wed, 22 Mar 2023 14:27:40 +0900 (JST), Kyotaro Horiguchi wrote in
> At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" wrote in
> > Kyotaro Horiguchi, any chance you'll be able to work on this for this
> > commitfest? If so shout (or anyone else is planning to push it over
> > the line Andres?) otherwise I'll move it on to the next release.
>
> Ugg. sorry for being lazy. I have lost track of the conversation. I'm
> currently working on this and will come back soon with a new version.

I realized that attempting to make SnapshotData.xip expansible was
making procarray.c and snapmgr.c too complicated. The main reason is
that SnapshotData is allocated in various ways: on the stack, with
palloc including the xip/subxip arrays, with palloc and the xip/subxip
arrays then allocated separately, or statically with the xip/subxip
arrays malloc'ed later. This variety was making the expansion logic a
mess.

So I went back to square one and decided to use subxip as an extension
of the xip array instead. As the comment added in
SnapBuildInitialSnapshot mentions, I don't think we can reliably
identify top-level XIDs. So, this patch just increases the allowed
number of XIDs by using the subxip array. (The title of the patch was
changed accordingly.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 4a41002eea7eaa18bd51521798c19d191d9c6d8c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 19 Jul 2022 11:50:29 +0900
Subject: [PATCH v7] Lift initial snapshot limit for logical replication

The replication command CREATE_REPLICATION_SLOT tends to fail with
"snapshot too large" when many subtransactions are active. This problem
stems from SnapBuildInitialSnapshot attempting to generate a snapshot
in which all active XIDs, including subxids, are stored within the xip
array. This patch mitigates the constraint on the acceptable number of
transaction IDs by utilizing the subxid array.
Author: Dilip Kumar and Kyotaro Horiguchi
---
 src/backend/replication/logical/snapbuild.c | 50 +
 src/backend/storage/ipc/procarray.c         | 18 ++--
 src/include/access/transam.h                | 33 ++
 3 files changed, 77 insertions(+), 24 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 62542827e4..516e9ddfc8 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -565,10 +565,14 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
+	int			ntxn;
+	int			xiplen;
 	TransactionId xid;
 	TransactionId safeXid;
 	TransactionId *newxip;
+	TransactionId *newsubxip;
 	int			newxcnt = 0;
+	int			newsubxcnt = 0;

 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -610,9 +614,35 @@ SnapBuildInitialSnapshot(SnapBuild *builder)

 	MyProc->xmin = snap->xmin;

-	/* allocate in transaction context */
-	newxip = (TransactionId *)
-		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+	/*
+	 * Allocate in transaction context
+	 *
+	 * Since we know all the active XIDs, this snapshot won't be
+	 * suboverflowed. However, if the number of XIDs surpasses the XID arrays'
+	 * capacity, we cannot establish this snapshot because we don't know which
+	 * XIDs are top-level. Since the original snapshot is historical, we can't
+	 * reliably identify the top-level XIDs. Thus, we have no choice but to
+	 * fail when there are too many active XIDs.
+	 */
+	ntxn = TransactionIdDistance(snap->xmin, snap->xmax) - snap->xcnt;
+	xiplen = GetMaxSnapshotXidCount();
+	newxip = (TransactionId *) palloc(sizeof(TransactionId) * xiplen);
+
+	if (ntxn > GetMaxSnapshotXidCount())
+	{
+		/*
+		 * Since we can't identify the top-level XIDs, we can't create a
+		 * suboverflowed snapshot.
+		 */
+		if (ntxn > xiplen + GetMaxSnapshotSubxidCount())
+			ereport(ERROR,
+					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+					 errmsg("initial slot snapshot too large")));
+
+		newsubxip = (TransactionId *)
+			palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
+	}
+
 	/*
 	 * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -633,21 +663,23 @@ SnapBuildInitialSnapshot(SnapBuild *builder)

 		if (test == NULL)
 		{
-			if (newxcnt >= GetMaxSnapshotXidCount())
-				ereport(ERROR,
-						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-						 errmsg("initial slot snapshot too large")));
-
-			newxip[newxcnt++] = xid;
+			if (newxcnt < xiplen)
+				newxip[newxcnt++] = xid;
+			else
+				newsubxip[newsubxcnt++] = xid;
 		}

 		TransactionIdAdvance(xid);
 	}

+	Assert(newxcnt + newsubxcnt == ntxn);
+
 	/* adjust remaining snapshot fields as needed */
 	snap->snapshot_type = SNAPSHOT_MVCC;
 	snap->xcnt = newxcnt;
 	snap->xip = newxip;
+	snap->subxcnt = newsubxcnt;
+	snap->subxip = newsubxip;

 	return snap;
 }
diff --git
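Stepping back from the diff, the placement strategy is: fill the fixed-size xip array first, overflow into the subxip array, and fail only when both arrays together cannot hold all active XIDs. A small Python sketch of that strategy (the capacities are made-up stand-ins for GetMaxSnapshotXidCount() and GetMaxSnapshotSubxidCount(); this is an illustration, not PostgreSQL code):

```python
# Illustrative sketch of the patch's XID placement strategy: fill the
# fixed-size xip array first, overflow into subxip, and fail only when
# both arrays together cannot hold all active XIDs. Capacities below are
# made-up stand-ins for GetMaxSnapshotXidCount()/GetMaxSnapshotSubxidCount().

XIP_CAPACITY = 4      # stand-in for GetMaxSnapshotXidCount()
SUBXIP_CAPACITY = 8   # stand-in for GetMaxSnapshotSubxidCount()

def build_initial_snapshot(active_xids):
    xip, subxip = [], []
    if len(active_xids) > XIP_CAPACITY + SUBXIP_CAPACITY:
        # Mirrors ereport(ERROR, ... "initial slot snapshot too large")
        raise RuntimeError("initial slot snapshot too large")
    for xid in active_xids:
        if len(xip) < XIP_CAPACITY:
            xip.append(xid)
        else:
            subxip.append(xid)   # overflow area borrowed from subxids
    return xip, subxip

xip, subxip = build_initial_snapshot(list(range(100, 110)))
print(xip)     # [100, 101, 102, 103]
print(subxip)  # [104, 105, 106, 107, 108, 109]
```

Note that, as in the patch, nothing here distinguishes top-level XIDs from subxids; the subxip array is used purely as extra capacity, which is why the snapshot must never be marked suboverflowed.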
Re: Data is copied twice when specifying both child and parent table in publication
Here are some review comments for patch v20-0001.

==
General.

1. That function 'pg_get_publication_tables' does not seem to be
described in the PG documentation. Why isn't it in the "System Catalog
Information Functions" table [1]?

I asked this same question a long time ago, but then the reply [2] was
like "it doesn't seem to be a function provided to users". Well, perhaps
that just means that the documentation has been accidentally missing for
a long time. Does anybody know for sure if the omission of this function
from the documentation is deliberate? If nobody here knows, then maybe
this can be asked/addressed in a separate thread.

==
src/backend/catalog/pg_publication.c

2. filter_partitions

(review comment from my v19 review)
> 2a.
> My previous review [1] (see #1) suggested putting most code within the
> condition. AFAICT my comment still is applicable but was not yet
> addressed.

On 22/3 Wang-san replied: "Personally, I prefer the current style
because the approach you mentioned adds some indentations."

Sure, but there is more than just indentation/style differences here.
Currently, there is some unnecessary code executed if the table is not a
partition. And the reader cannot tell at-a-glance if (skip) will be
true/false without looking more closely at the loop logic. So, I think
changing it would be better, but anyway I won't debate it any more
because it's not a functional problem.

==
src/backend/commands/subscriptioncmds.c

3. fetch_table_list

+	/* Get the list of tables from the publisher. */
+	if (server_version >= 16)
+	{
+		StringInfoData pub_names;

-	appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n"
-						   " WHERE t.pubname IN (");
-	get_publications_str(publications, &cmd, true);
-	appendStringInfoChar(&cmd, ')');
+		initStringInfo(&pub_names);
+		get_publications_str(publications, &pub_names, true);
+
+		/*
+		 * From version 16, we allowed passing multiple publications to the
+		 * function pg_get_publication_tables. This helped to filter out the
+		 * partition table whose ancestor is also published in this
+		 * publication array.
+		 *
+		 * Join pg_get_publication_tables with pg_publication to exclude
+		 * non-existing publications.
+		 */
+		appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n"
+						 "  ( SELECT array_agg(a.attname ORDER BY a.attnum)\n"
+						 "    FROM pg_attribute a\n"
+						 "    WHERE a.attrelid = GPT.relid AND\n"
+						 "          a.attnum = ANY(GPT.attrs)\n"
+						 "  ) AS attnames\n"
+						 " FROM pg_class C\n"
+						 "   JOIN pg_namespace N ON N.oid = C.relnamespace\n"
+						 "   JOIN ( SELECT (pg_get_publication_tables(VARIADIC array_agg(pubname::text))).*\n"
+						 "          FROM pg_publication\n"
+						 "          WHERE pubname IN ( %s )) as GPT\n"
+						 "       ON GPT.relid = C.oid\n",
+						 pub_names.data);
+
+		pfree(pub_names.data);
+	}
+	else
+	{
+		appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");
+
+		/* Get column lists for each relation if the publisher supports it */
+		if (check_columnlist)
+			appendStringInfoString(&cmd, ", t.attnames\n");
+
+		appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n"
+							   " WHERE t.pubname IN (");
+		get_publications_str(publications, &cmd, true);
+		appendStringInfoChar(&cmd, ')');
+	}

I noticed the SQL in the "if" part is using uppercase aliases, but the
SQL in the "else" part is using lowercase aliases. I think it would be
better to be consistent (pick one).

==
src/test/subscription/t/013_partition.pl

4.

-# for tab4, we publish changes through the "middle" partitioned table
+# If we subscribe only to pub_lower_level, changes for tab4 will be published
+# through the "middle" partition table. However, since we will be subscribing
+# to both pub_lower_level and pub_all (see subscription sub2 below), we will
+# publish changes via the root table (tab4).
 $node_publisher->safe_psql('postgres',
 	"CREATE PUBLICATION pub_lower_level FOR TABLE tab4_1 WITH (publish_via_partition_root = true)"
 );

~

This comment seemed a bit overkill IMO. I don't think you need to say
much here except maybe:

# Note that subscription "sub2" will later subscribe simultaneously to
# both pub_lower_level (i.e. just table tab4_1) and pub_all.

~~~

5. I think maybe you could have another test scenario where you INSERT
something into tab4_1_1 while only the publication for tab4_1 has
publish_via_partition_root=true

~~~

6. AFAICT the tab4 tests are only testing the initial sync, but are not
testing normal replication. Maybe some more normal (post sync) INSERTs
are needed for tab4, tab4_1, tab4_1_1?

==
src/test/subscription/t/028_row_filter.pl

7.

+# insert data into partitioned table.
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab_rowfilter_viaroot_part(a) VALUES(13), (17)");
+
 $node_subscriber->safe_psql('postgres',
 	"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2, tap_pub_3, tap_pub_4a, tap_pub_4b,
Re: doc: add missing "id" attributes to extension packaging page
On Tue, 21 Mar 2023 23:16:25 +0100 Brar Piening wrote:
> On 17.01.2023 at 23:43, Karl O. Pinc wrote:
> > It's good you asked. After poking about the XSLT 1.0 spec to
> > refresh my memory I abandoned the "mode" approach entirely. The
> > real "right way" is to use the import mechanism.
>
> After quite some time trying to figure out why things don't work as
> intended, I ended up reading the XSLT 1.0 spec.
>
> As the name already suggests, <xsl:apply-imports> is closely related
> to <xsl:apply-templates>, with the difference that it *applies* a
> *template rule* from an imported style sheet instead of applying a
> template rule from the current style sheet
> (https://www.w3.org/TR/1999/REC-xslt-19991116#apply-imports). What it
> does not do is *calling* a *named template*
> (https://www.w3.org/TR/1999/REC-xslt-19991116#named-templates).
>
> What this basically means is that in XSLT 1.0 you can use
> <xsl:apply-imports> to override template rules (<xsl:template
> match="this-pattern-inside-match-makes-it-a-template-rule">) but you
> cannot use it to override named templates (<xsl:template
> name="this-id-inside-name-makes-it-a-named-template">). If you want to
> override named templates you basically have to duplicate and change
> them.
>
> While there are mechanisms to call overridden named templates in XSLT
> 3, this is out of scope here, since we're bound to XSLT 1.0.

(It was reassuring to see you take the steps above; I once did exactly
the same with <xsl:apply-imports> and had the same excitements and
disappointments. I feel validated. ;-)

(One of my disappointments is that xsltproc supports only XSLT 1.0, and
nothing later. IIRC, apparently one big reason is not the amount of
work needed to develop the program but the work required to develop a
test suite to validate conformance.)

> As a consequence, there was little I could change in the initial patch
> to avoid the code duplication, and all attempts to do so ultimately
> led to even longer and more complex code without really reducing the
> amount of duplication.

You're quite right. I clearly didn't have my XSLT turned on. Importing
only works when templates are matched, not called by name. Sorry for
the extra work I've put you through.

> The <xsl:apply-imports> approach actually does work in the
> varlistentry case, although this doesn't really change a lot
> regarding the length of the patch (it's a bit nicer though since in
> this case it really avoids duplication).

You've put in a lot of good work. I'm attaching 2 patches with only
minor changes.

001-add-needed-ids_v1.patch
This separates out the addition of ids from the XSLT changes, just to
keep things tidy. Content is from your patch.

002-make_html_ids_discoverable_v4.patch
I changed the linked text, the #, so that the leading space is not
linked. This is arguable, as the extra space makes it easier to put the
mouse on the region. But it seems tidy.

I've tidied up so the lines are no longer than 80 chars.

> I've also taken the advice to terminate the build and print the xpath
> if a required id is missing.

This looks awesome. I love the xpath! I've changed the format of the
error message. What do you think? (Try it out by _not_ applying
001-add-needed-ids_v1.patch.) Also, the error message now has leading
and trailing newlines to make it stand out. I'm normally against this
sort of thing but thought I'd add it anyway for others to review.

I'm ready to send these on to a committer, but if you don't like what I
did please send more patches for me to review.

Outstanding questions (for committer?):

The 002-make_html_ids_discoverable_v4.patch generates xhtml attributes
using an XSLT element with a "namespace" attribute. I'm unclear on the
relationship PG has with xhtml and namespaces. Looks right to me, since
the generated html has the same namespace name appearing in the xmlns
attribute of the html tag, but somebody who knows more than me might
want to review this. Using the namespace attribute does not seem to
have affected the generated html, as far as my random sampling of
output can tell.

What character should be used to represent a link anchor? I've left
your choice of "#" in the patch. If we go to unicode, my preference is
the text paperclip. Here's a table of all the choices I came up with;
there may be others that are suitable. (The hex representations are
Python 3 string literals. So print("\U0001f4ce\ufe0e") prints the text
paperclip.)

Hash mark                 #  (ASCII, used in the patch, \u0023)
Anchor                    ⚓  \u2693
Place of interest         ⌘  \u2318
Bullseye                  ◎  \u25ce
Link (emoji variant)         \U0001f517
Link (text variant)          \U0001f517\ufe0e
Paperclip (emoji variant)    \U0001f4ce
Paperclip (text variant)     \U0001f4ce\ufe0e

Regards,
Karl

Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index 9a0241a8d6..62d9f9eb22 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++
Re: Combine pg_walinspect till_end_of_wal functions with others
On Thu, Mar 16, 2023 at 01:17:59PM +0530, Bharath Rupireddy wrote:
> FWIW, I rebased the tests tweaking patch and attached it here as v9.
> This should keep the pg_walinspect tests consistent across comments,
> spaces, new lines and using count(*) >= 1 for all successful function
> executions. Thoughts?

Mostly OK by me, so applied after tweaking a few tiny things. The
rewrites of the queries where we should have more than one record and
the removal of count() for the failure cases have been kept as
proposed, as are most of the comments.

--
Michael

signature.asc
Description: PGP signature
Re: WAL Insertion Lock Improvements
Hi Andres Freund,

This patch improves performance significantly. Commitfest 2023-03 is
coming to an end. Has it not been committed yet because the patch still
needs to be improved?

Best wishes

From: Nathan Bossart
Sent: February 21, 2023 13:49
To: Bharath Rupireddy
Cc: Andres Freund; PostgreSQL Hackers
Subject: Re: WAL Insertion Lock Improvements

On Thu, Feb 09, 2023 at 11:51:28AM +0530, Bharath Rupireddy wrote:
> On Thu, Feb 9, 2023 at 3:36 AM Nathan Bossart wrote:
>> Overall, I think this patch is in reasonable shape.
>
> Thanks for reviewing. Please see the attached v5 patch.

I'm marking this as ready-for-committer. I think a couple of the
comments could use some small adjustments, but that probably doesn't
need to hold up this patch.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Orphaned wait event
Hi,

Commit dee663f7 made WAIT_EVENT_SLRU_FLUSH_SYNC redundant, so here's a
patch to remove it.

In case it's useful again, here's how I noticed:

for X in `grep WAIT_EVENT_ src/include/utils/wait_event.h |
          sed '/^#/d;s/,//;s/ = .*//'`
do
  if ! ( git grep $X |
         grep -v src/include/utils/wait_event.h |
         grep -v src/backend/utils/activity/wait_event.c |
         grep $X > /dev/null )
  then
    echo "$X is not used"
  fi
done

From 5bd5d5448b1fbd01833069af72bf12d19eef42dd Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Thu, 23 Mar 2023 14:36:27 +1300
Subject: [PATCH] Remove orphaned wait event.

Commit dee663f7 made WAIT_EVENT_SLRU_FLUSH_SYNC redundant.

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7ab4424bf1..45259efd0f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1468,11 +1468,6 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
       ReplicationSlotWrite
       Waiting for a write to a replication slot control file.
-
-      SLRUFlushSync
-      Waiting for SLRU data to reach durable storage during a checkpoint
-      or database shutdown.
-
       SLRURead
       Waiting for a read of an SLRU page.
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 7940d64639..23a85345b7 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -672,9 +672,6 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_REPLICATION_SLOT_WRITE:
 			event_name = "ReplicationSlotWrite";
 			break;
-		case WAIT_EVENT_SLRU_FLUSH_SYNC:
-			event_name = "SLRUFlushSync";
-			break;
 		case WAIT_EVENT_SLRU_READ:
 			event_name = "SLRURead";
 			break;
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 518d3b0a1f..14b7ac5cf6 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -207,7 +207,6 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC,
 	WAIT_EVENT_REPLICATION_SLOT_SYNC,
 	WAIT_EVENT_REPLICATION_SLOT_WRITE,
-	WAIT_EVENT_SLRU_FLUSH_SYNC,
 	WAIT_EVENT_SLRU_READ,
 	WAIT_EVENT_SLRU_SYNC,
 	WAIT_EVENT_SLRU_WRITE,
--
2.39.2
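For what it's worth, the same "defined but never used" scan can be sketched portably in Python. This is an illustration only: it runs against a tiny in-memory stand-in for the source tree, and you would point `files` at real file contents to use it on a checkout.

```python
# Portable sketch of the shell loop above: collect wait-event enum
# members from the header and report any that appear nowhere outside the
# enum definition and its name table. The `files` dict is an in-memory
# stand-in for the source tree so this runs anywhere.

import re

files = {
    "src/include/utils/wait_event.h": """
typedef enum
{
    WAIT_EVENT_SLRU_FLUSH_SYNC,
    WAIT_EVENT_SLRU_READ,
} WaitEventIO;
""",
    "src/backend/utils/activity/wait_event.c": """
case WAIT_EVENT_SLRU_FLUSH_SYNC: ...
case WAIT_EVENT_SLRU_READ: ...
""",
    "src/backend/access/transam/slru.c": """
pgstat_report_wait_start(WAIT_EVENT_SLRU_READ);
""",
}

# Files that merely define/name the events, excluded from "usage".
DEFINING = {"src/include/utils/wait_event.h",
            "src/backend/utils/activity/wait_event.c"}

def orphaned_wait_events(files):
    header = files["src/include/utils/wait_event.h"]
    events = re.findall(r"\bWAIT_EVENT_[A-Z_]+\b", header)
    orphans = []
    for ev in events:
        used = any(ev in text for path, text in files.items()
                   if path not in DEFINING)
        if not used:
            orphans.append(ev)
    return orphans

print(orphaned_wait_events(files))  # ['WAIT_EVENT_SLRU_FLUSH_SYNC']
```

With real file contents this reports exactly what the shell loop does: events defined in the header and named in wait_event.c but referenced nowhere else.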
Re: refactoring basebackup.c
On Thu, Mar 23, 2023 at 2:50 PM Thomas Munro wrote:
> In rem: commit 3500ccc3,
>
> for X in `grep -E '^[^*]+event_name = "' src/backend/utils/activity/wait_event.c |
>           sed 's/^.* = "//;s/";$//;/unknown/d'`
> do
>   if ! git grep "$X" doc/src/sgml/monitoring.sgml > /dev/null
>   then
>     echo "$X is not documented"
>   fi
> done
>
> BaseBackupSync is not documented
> BaseBackupWrite is not documented

[Resending with trimmed CC: list, because the mailing list told me to
due to a blocked account, sorry if you already got the above.]
[PATCH] Remove unnecessary unbind in LDAP search+bind mode
Hi!

Comments in src/backend/libpq/auth.c [1] say (after successfully
finding the final DN to check the user-supplied password against):

/* Unbind and disconnect from the LDAP server */

and later

/*
 * Need to re-initialize the LDAP connection, so that we can bind to
 * it with a different username.
 */

But the protocol actually permits multiple subsequent authentications
("binds" in LDAP parlance) over a single connection [2]. Moreover,
inspection of the code revision history of mod_authnz_ldap, pam_ldap,
Bugzilla, and MediaWiki LDAP authentication plugin, shows that they've
been doing this bind-after-search over the same LDAP connection for ~20
years without any evidence of interoperability troubles.
(mod_authnz_ldap and pam_ldap are listed in the PostgreSQL
documentation as examples of other software implementing this scheme.
Bugzilla and MediaWiki are the original patch author's motivating
examples [3])

Also it might be interesting to consider this note from the current
revision of the protocol RFC [4]: "The Unbind operation is not the
antithesis of the Bind operation as the name implies. The naming of
these operations are historical. The Unbind operation should be
thought of as the "quit" operation."

So, it seems like the whole connection re-initialization thing was just
a confusion caused by this very unfortunate "historical" naming, and
can be safely removed, thus saving quite a few network round-trips,
especially for the case of ldaps/starttls.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/auth.c;h=bc0cf26b122a1b28c20fe037ec851c0e99b1ffb6;hb=HEAD#l2603
[2] https://www.rfc-editor.org/rfc/rfc4511#section-4.2.1
[3] https://www.postgresql.org/message-id/4c0112730909141334n201cadf3x2e288528a97883ca%40mail.gmail.com
[4] https://www.rfc-editor.org/rfc/rfc4511#section-4.3

--
Best regards,
Anatoly Zaretsky

v1-0001-Remove-unnecessary-unbind-in-LDAP-search-bind-mod.patch
Description: Binary data
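To illustrate what dropping the unbind-and-reconnect step saves, here is a toy model of the two search+bind flows. This is not the libldap API that auth.c uses; it is a hypothetical fake connection class that only counts protocol operations, with connection setup standing in for the TCP and TLS handshakes:

```python
# Toy model (NOT the real libldap API) of the two search+bind strategies.
# It only counts network operations to show what removing the unbind and
# reconnect saves; "connect" stands in for TCP/TLS setup round-trips.

class FakeLdapConnection:
    def __init__(self, counter):
        self.ops = counter
        self.ops["connect"] += 1          # TCP (and possibly TLS) setup

    def bind(self, dn, password):
        self.ops["bind"] += 1             # one BindRequest/BindResponse

    def search(self, base, filt):
        self.ops["search"] += 1
        return "uid=alice,dc=example,dc=com"   # hypothetical final DN

    def unbind(self):
        self.ops["unbind"] += 1           # really "quit": connection is gone

def search_bind_with_reconnect():
    """Current behavior: unbind after the search, then reconnect."""
    ops = {"connect": 0, "bind": 0, "search": 0, "unbind": 0}
    conn = FakeLdapConnection(ops)
    conn.bind("cn=search,dc=example,dc=com", "secret")
    dn = conn.search("dc=example,dc=com", "(uid=alice)")
    conn.unbind()                          # what the quoted comments describe
    conn = FakeLdapConnection(ops)         # re-initialize the connection
    conn.bind(dn, "user-password")
    return ops

def search_bind_reusing_connection():
    """Proposed behavior: a second bind over the same connection."""
    ops = {"connect": 0, "bind": 0, "search": 0, "unbind": 0}
    conn = FakeLdapConnection(ops)
    conn.bind("cn=search,dc=example,dc=com", "secret")
    dn = conn.search("dc=example,dc=com", "(uid=alice)")
    conn.bind(dn, "user-password")         # rebind on the same link
    return ops

print(search_bind_with_reconnect())
print(search_bind_reusing_connection())
```

The reused-connection flow performs one connection setup instead of two and no unbind, which is exactly the saving claimed above; with ldaps/starttls each avoided "connect" also avoids a full TLS handshake.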
RE: Simplify some codes in pgoutput
On Monday, March 20, 2023 5:20 PM houzj.f...@fujitsu.com wrote:
> On Thursday, March 16, 2023 12:30 PM Amit Kapila wrote:
> > On Wed, Mar 15, 2023 at 2:00 PM houzj.f...@fujitsu.com wrote:
> > >
> > > I noticed that there are some duplicated codes in pgoutput_change()
> > > function which can be simplified, and here is an attempt to do that.
> >
> > For REORDER_BUFFER_CHANGE_DELETE, when the old tuple is missing, after
> > this patch, we will still send BEGIN and do OutputPluginWrite, etc.
> > Also, it will try to perform row_filter when none of old_slot or
> > new_slot is set. I don't know for which particular case we have
> > handling for missing old tuples for deletes, but that might require
> > changes in your proposed patch.
>
> I researched this a bit. I think the old tuple will be null only if the
> modified table doesn't have a PK or RI when the DELETE happens (referred
> to heap_delete()), but in that case the DELETE won't be allowed to be
> replicated (e.g. the DELETE will either error out or be filtered by the
> table level filter in pgoutput_change).
>
> I also checked this for system tables and in that case it is null, but
> reorderbuffer doesn't forward it. For user_catalog_table, similarly,
> the DELETE should be filtered by the table filter in pgoutput_change as
> well.
>
> So, I think we can remove this check and log.
> And here is the new version patch which removes that for now.

After rethinking about this, it seems better to leave this check for
now. Although it may be unnecessary, we can remove it later as a
separate patch when we are sure about this. So, here is a patch that
adds this check back.

Best Regards,
Hou zj

v3-0001-simplify-the-code-in-pgoutput_change.patch
Description: v3-0001-simplify-the-code-in-pgoutput_change.patch
Re: HOT chain validation in verify_heapam()
Hi,

On 2023-03-22 13:45:52 -0700, Andres Freund wrote:
> On 2023-03-22 09:19:18 -0400, Robert Haas wrote:
> > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev wrote:
> > > The patch needed a rebase due to a4f23f9b. PFA v12.
> >
> > I have committed this after tidying up a bunch of things in the test
> > case file that I found too difficult to understand -- or in some cases
> > just incorrect, like:
>
> As noticed by Andrew
> https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and
> then reproduced on HEAD by me, there's something not right here.

skink / valgrind reported in a while back and found another issue:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-03-22%2021%3A53%3A41

==2490364== VALGRINDERROR-BEGIN
==2490364== Conditional jump or move depends on uninitialised value(s)
==2490364==    at 0x11D459F2: check_tuple_visibility (verify_heapam.c:1379)
==2490364==    by 0x11D46262: check_tuple (verify_heapam.c:1812)
==2490364==    by 0x11D46FDF: verify_heapam (verify_heapam.c:535)
==2490364==    by 0x3D5B2C: ExecMakeTableFunctionResult (execSRF.c:235)
==2490364==    by 0x3E8225: FunctionNext (nodeFunctionscan.c:95)
==2490364==    by 0x3D6685: ExecScanFetch (execScan.c:133)
==2490364==    by 0x3D6709: ExecScan (execScan.c:182)
==2490364==    by 0x3E813A: ExecFunctionScan (nodeFunctionscan.c:270)
==2490364==    by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464)
==2490364==    by 0x3FF7E7: ExecProcNode (executor.h:262)
==2490364==    by 0x3FFB15: ExecNestLoop (nodeNestloop.c:160)
==2490364==    by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464)
==2490364== Uninitialised value was created by a stack allocation
==2490364==    at 0x11D45325: check_tuple_visibility (verify_heapam.c:994)
==2490364==
==2490364== VALGRINDERROR-END

==2490364== VALGRINDERROR-BEGIN
==2490364== Conditional jump or move depends on uninitialised value(s)
==2490364==    at 0x11D45AC6: check_tuple_visibility (verify_heapam.c:1379)
==2490364==    by 0x11D46262: check_tuple (verify_heapam.c:1812)
==2490364==    by 0x11D46FDF: verify_heapam (verify_heapam.c:535)
==2490364==    by 0x3D5B2C: ExecMakeTableFunctionResult (execSRF.c:235)
==2490364==    by 0x3E8225: FunctionNext (nodeFunctionscan.c:95)
==2490364==    by 0x3D6685: ExecScanFetch (execScan.c:133)
==2490364==    by 0x3D6709: ExecScan (execScan.c:182)
==2490364==    by 0x3E813A: ExecFunctionScan (nodeFunctionscan.c:270)
==2490364==    by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464)
==2490364==    by 0x3FF7E7: ExecProcNode (executor.h:262)
==2490364==    by 0x3FFB15: ExecNestLoop (nodeNestloop.c:160)
==2490364==    by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464)
==2490364== Uninitialised value was created by a stack allocation
==2490364==    at 0x11D45325: check_tuple_visibility (verify_heapam.c:994)
==2490364==

Greetings,

Andres Freund
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Hi,

On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> I'm going to push patchset v15 if no objections.

Just saw that this went in - didn't catch up with the thread before,
unfortunately.

At the very least I'd like to see some more work on cleaning up the
lazy tuple slot stuff. It's replete with unnecessary
multiple-evaluation hazards - I realize that there's some of those
already, but I don't think we should go further down that route. As far
as I can tell there's no need for any of this to be macros.

> From a8b4e8a7b27815e013ea07b8cc9ac68541a9ac07 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov
> Date: Tue, 21 Mar 2023 00:34:15 +0300
> Subject: [PATCH 1/2] Evade extra table_tuple_fetch_row_version() in
>  ExecUpdate()/ExecDelete()
>
> When we lock tuple using table_tuple_lock() then we at the same time
> fetch the locked tuple to the slot. In this case we can skip extra
> table_tuple_fetch_row_version(), since we've already fetched the 'old'
> tuple and nobody can change it concurrently since it's locked.
>
> Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
> Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
> Reviewed-by: Andres Freund, Chris Travers
> ---
>  src/backend/executor/nodeModifyTable.c | 48 +++---
>  1 file changed, 35 insertions(+), 13 deletions(-)
>
> diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
> index 3a673895082..93ebfdbb0d8 100644
> --- a/src/backend/executor/nodeModifyTable.c
> +++ b/src/backend/executor/nodeModifyTable.c
> @@ -1559,6 +1559,22 @@ ldelete:
>  				{
>  					case TM_Ok:
>
>  						Assert(context->tmfd.traversed);
> +
> +						/*
> +						 * Save locked tuple for further processing of
> +						 * RETURNING clause.
> +						 */
> +						if (processReturning &&
> +							resultRelInfo->ri_projectReturning &&
> +							!resultRelInfo->ri_FdwRoutine)
> +						{
> +							TupleTableSlot *returningSlot;
> +
> +							returningSlot = ExecGetReturningSlot(estate, resultRelInfo);
> +							ExecCopySlot(returningSlot, inputslot);
> +							ExecMaterializeSlot(returningSlot);
> +						}
> +
>  						epqslot = EvalPlanQual(context->epqstate,
>  											   resultRelationDesc,
>  											   resultRelInfo->ri_RangeTableIndex,

This seems a bit byzantine. We use inputslot = EvalPlanQualSlot(...) to
make EvalPlanQual() a bit cheaper, because that avoids a slot copy
inside EvalPlanQual(). But now we copy and materialize that slot anyway
- and we do so even if EPQ fails. And we afaics also do it when
epqreturnslot is set, in which case we'll afaics never use the copied
slot.

Read the next paragraph below before replying to the above - I don't
think this is right for other reasons:

> @@ -1673,12 +1689,17 @@ ldelete:
>  			}
>  			else
>  			{
> +				/*
> +				 * Tuple can be already fetched to the returning slot in case
> +				 * we've previously locked it. Fetch the tuple only if the
> +				 * slot is empty.
> +				 */
>  				slot = ExecGetReturningSlot(estate, resultRelInfo);
>  				if (oldtuple != NULL)
>  				{
>  					ExecForceStoreHeapTuple(oldtuple, slot, false);
>  				}
> -				else
> +				else if (TupIsNull(slot))
>  				{
>  					if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
>  													   SnapshotAny, slot))

I don't think this is correct as-is - what if ExecDelete() is called
with some older tuple in the returning slot? If we don't enter the
TM_Updated path, it won't get updated, and we'll return the wrong
tuple. It
Re: Add n_tup_newpage_upd to pg_stat table views
On Wed, Mar 22, 2023 at 05:14:08PM -0700, Peter Geoghegan wrote:
> * Small adjustments to the documentation.
>
> Nearby related items were tweaked slightly to make everything fit
> together a bit better. For example, the description of n_tup_hot_upd
> is revised to make it obvious that n_tup_hot_upd counts row updates
> that can never get counted under the new n_tup_newpage_upd counter.

@@ -168,6 +168,7 @@ typedef struct PgStat_TableCounts
 	PgStat_Counter t_tuples_updated;
 	PgStat_Counter t_tuples_deleted;
 	PgStat_Counter t_tuples_hot_updated;
+	PgStat_Counter t_tuples_newpage_updated;
 	bool		t_truncdropped;

I have in the works something that's going to rename these fields to
not have the "t_" prefix anymore, to ease some global refactoring in
pgstatfuncs.c so that we have less repetitive code in the functions
that grab these counters. I don't think that's something you need to
name without the prefix here, just a FYI that this is going to be
immediately renamed ;)

--
Michael

signature.asc
Description: PGP signature
Re: Add pg_walinspect function with block info columns
On Wed, Mar 22, 2023 at 5:14 PM Michael Paquier wrote: > On Wed, Mar 22, 2023 at 05:05:10PM -0700, Peter Geoghegan wrote: > > I'd go further than that myself: I haven't had any use for FPIs at > > all. If I was going to do something with FPIs then I'd just use > > pg_waldump, since I'd likely want to get them onto the filesystem for > > analysis anyway. (Just my experience.) > > FWIW, being able to get access to raw FPIs with a SQL interface is > useful if you cannot log into the host, which is something that a lot > of cloud providers don't allow, and not everybody is able to have > access to archived WAL segments in a different host. I'm not saying that it's not ever useful. Just that finding FPIs interesting isn't necessarily all that common when using pg_get_wal_block_info (or won't be, once it has those additional columns from pg_get_wal_records_info). -- Peter Geoghegan
Re: Add pg_walinspect function with block info columns
On Wed, Mar 22, 2023 at 05:05:10PM -0700, Peter Geoghegan wrote: > I'd go further than that myself: I haven't had any use for FPIs at > all. If I was going to do something with FPIs then I'd just use > pg_waldump, since I'd likely want to get them onto the filesystem for > analysis anyway. (Just my experience.) FWIW, being able to get access to raw FPIs with a SQL interface is useful if you cannot log into the host, which is something that a lot of cloud providers don't allow, and not everybody is able to have access to archived WAL segments in a different host. -- Michael
Re: Add n_tup_newpage_upd to pg_stat table views
On Fri, Mar 17, 2023 at 3:22 PM Peter Geoghegan wrote: > I think that this is pretty close to being committable already. Attached revision has some small tweaks by me. Going to commit this revised version tomorrow morning. Changes: * No more dedicated struct to carry around the type of an update. We just use two boolean arguments to the pgstats function instead. The struct didn't seem to be adding much, and it was distracting to track the information this way within heap_update(). * Small adjustments to the documentation. Nearby related items were tweaked slightly to make everything fit together a bit better. For example, the description of n_tup_hot_upd is revised to make it obvious that n_tup_hot_upd counts row updates that can never get counted under the new n_tup_newpage_upd counter. -- Peter Geoghegan From 23d768e87e95e421e583e2f0dc06bd36534081a6 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Wed, 22 Mar 2023 11:58:18 -0700 Subject: [PATCH v2] Count the number of new page updates in pgstats. Bump catalog and stats format versions. Corey Huinker, with some tweaks by me. 
Author: Corey Huinker Reviewed-By: Peter Geoghegan Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CADkLM=ded21M9iZ36hHm-vj2rE2d=zcKpUQMds__Xm2pxLfHKA@mail.gmail.com --- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_proc.dat | 10 src/include/pgstat.h | 10 src/backend/access/heap/heapam.c | 2 +- src/backend/catalog/system_views.sql | 4 +++- src/backend/utils/activity/pgstat_relation.c | 12 -- src/backend/utils/adt/pgstatfuncs.c | 18 ++ doc/src/sgml/monitoring.sgml | 25 src/test/regress/expected/rules.out | 12 +++--- 9 files changed, 78 insertions(+), 17 deletions(-) diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h index e94528a7c..0c0915885 100644 --- a/src/include/catalog/catversion.h +++ b/src/include/catalog/catversion.h @@ -57,6 +57,6 @@ */ /* mmddN */ -#define CATALOG_VERSION_NO 202303181 +#define CATALOG_VERSION_NO 202303221 #endif diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 5cf87aeb2..7c358cff1 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5360,6 +5360,11 @@ proname => 'pg_stat_get_tuples_hot_updated', provolatile => 's', proparallel => 'r', prorettype => 'int8', proargtypes => 'oid', prosrc => 'pg_stat_get_tuples_hot_updated' }, +{ oid => '8614', + descr => 'statistics: number of tuples updated onto a new page', + proname => 'pg_stat_get_tuples_newpage_updated', provolatile => 's', + proparallel => 'r', prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_tuples_newpage_updated' }, { oid => '2878', descr => 'statistics: number of live tuples', proname => 'pg_stat_get_live_tuples', provolatile => 's', proparallel => 'r', prorettype => 'int8', proargtypes => 'oid', @@ -5823,6 +5828,11 @@ proname => 'pg_stat_get_xact_tuples_hot_updated', provolatile => 'v', proparallel => 'r', prorettype => 'int8', proargtypes => 'oid', prosrc => 'pg_stat_get_xact_tuples_hot_updated' }, +{ oid => '8615', + descr => 
'statistics: number of tuples updated onto a new page in current transaction', + proname => 'pg_stat_get_xact_tuples_newpage_updated', provolatile => 'v', + proparallel => 'r', prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_xact_tuples_newpage_updated' }, { oid => '3044', descr => 'statistics: number of blocks fetched in current transaction', proname => 'pg_stat_get_xact_blocks_fetched', provolatile => 'v', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 1e418b682..46d053422 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -151,8 +151,8 @@ typedef struct PgStat_BackendSubEntry * the index AM, while tuples_fetched is the number of tuples successfully * fetched by heap_fetch under the control of simple indexscans for this index. * - * tuples_inserted/updated/deleted/hot_updated count attempted actions, - * regardless of whether the transaction committed. delta_live_tuples, + * tuples_inserted/updated/deleted/hot_updated/newpage_updated count attempted + * actions, regardless of whether the transaction committed. delta_live_tuples, * delta_dead_tuples, and changed_tuples are set depending on commit or abort. * Note that delta_live_tuples and delta_dead_tuples can be negative! * -- @@ -168,6 +168,7 @@ typedef struct PgStat_TableCounts PgStat_Counter t_tuples_updated; PgStat_Counter t_tuples_deleted; PgStat_Counter t_tuples_hot_updated; + PgStat_Counter t_tuples_newpage_updated; bool t_truncdropped; PgStat_Counter t_delta_live_tuples; @@ -234,7 +235,7 @@ typedef struct PgStat_TableXactStatus *
Re: Add pg_walinspect function with block info columns
On Tue, Mar 21, 2023 at 11:33 PM Michael Paquier wrote:
> > The new pg_get_wal_block_info outputs columns in an order that doesn't
> > seem like the most useful possible order to me. This gives us another
> > reason to have separate GetWALRecordInfo and GetWALBlockInfo utility
> > functions rather than sharing logic for building output tuples.
> >
> > Specifically, I think that pg_get_wal_block_info should output the
> > "primary key" columns first:
> >
> > reltablespace, reldatabase, relfilenode, blockid, start_lsn, end_lsn
>
> It seems to me that this is up to the one using this SQL?

If you see it that way, then why does it matter what I may want to do with the declared order?

> I am not sure I follow why this is important. For the cases you have
> poked at, I guess it is, but it strikes me that it is just natural to
> shape that to match the C structures we use for the WAL records
> themselves, so the other way around.

I don't feel very strongly about it, but it seems better to highlight the difference that exists between this and pg_get_wal_records_info.

> Hence, it seems to me that 0002 has the order pretty much right.
> What's the point in adding the description, by the way? Only
> consistency with the other function? Is that really useful if you
> want to apply more quals when retrieving some block data?

I don't understand. It's useful to include the description for the same reason as it's useful to include it in pg_get_wal_records_info. Why wouldn't it be useful?

Most individual records that have any block_ref blocks have exactly one. Most individual WAL records are very simple record types. So pg_get_wal_block_info just isn't going to look that different from pg_get_wal_records_info, once they share most of the same columns. The way that pg_get_wal_block_info disaggregates on block won't make the output look all that different. So each distinct "description" will usually only appear once in pg_get_wal_block_info anyway.

--
Peter Geoghegan
Re: Remove nonmeaningful prefixes in PgStat_* fields
On Wed, Mar 22, 2023 at 02:52:23PM -0400, Melanie Plageman wrote:
> This comment still has the t_ prefix as does the one for tuples_updated
> and deleted.
>
> otherwise, LGTM.

Good catch. pgstat_count_heap_update() has a t_tuples_hot_updated, and pgstat_update_heap_dead_tuples() a t_delta_dead_tuples, on top of the three you have just reported. I have grepped for all the renamed fields, and nothing else stands out. However, my eyes aren't 100% accurate, either.

--
Michael
Re: Add pg_walinspect function with block info columns
On Wed, Mar 22, 2023 at 8:35 AM Melanie Plageman wrote:
> After reading what you said, I was interested to see how substantial the
> I/O cost with non-compressed FPI would be.
>
> Using a patch with a parameter to pg_get_wal_block_info() to skip
> outputting FPI, I found that on a fast local nvme ssd, the timing
> difference between doing so and not still isn't huge -- 9 seconds when
> outputting the FPI vs 8.5 seconds when skipping outputting FPI (with
> ~50,000 records, all with non-compressed FPIs).
>
> However, perhaps obviously, the I/O cost is worse. Doing nothing but
>
> SELECT * FROM pg_get_wal_block_info(:start_lsn, :end_lsn, true)
> where fpi is not null;
>
> per iostat, the write latency was double for the query which output fpi
> compared to the one that didn't, and the wkB/s was much higher.

I think that we should also have something like the patch that you wrote to skip FPIs. It's not something that I feel as strongly about as the main point about including all the fields from pg_get_wal_records_info, but it does seem worth doing.

> I have had use for block info without seeing the FPIs, personally.

I'd go further than that myself: I haven't had any use for FPIs at all. If I was going to do something with FPIs then I'd just use pg_waldump, since I'd likely want to get them onto the filesystem for analysis anyway. (Just my experience.)

--
Peter Geoghegan
Re: Options to rowwise persist result of stable/immutable function with RECORD result
On Wed, Mar 22, 2023 at 4:46 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Wed, Mar 22, 2023 at 4:32 PM Eske Rahn wrote: > >> Hi, >> >> Thanks for the quick answer *:-D* >> >> That was a nice sideeffect of lateral. >> >> In the example, the calling code also gets simplified: >> >> WITH x AS ( >> SELECT clock_timestamp() rowstart, *, clock_timestamp() rowend FROM ( >> SELECT '1' inp UNION >> SELECT '2' >> ) y, LATERAL septima.foo(inp) g >> ) >> SELECT * FROM x; >> >> >> That solved the issue at hand, in a much better way. Thanks >> >> Though I still fail to see *why* the other way should generally call the >> function for every column in the *result* record - if the function is >> STABLE or IMMUTABLE. >> > > It gets rewritten to be effectively: > > select func_call(...).col1, func_call(...).col2, func_call(...).col3 > > under the assumption that repeating the function call will be cheap and > side-effect free. It was never ideal but fixing that form of optimization > was harder than implementing LATERAL where the multi-column result has a > natural output in the form of a multi-column table. A normal function call > in the target list really means "return a single value" which is at odds > with writing .* after it. > > Actually, it is less "optimization" and more "SQL is strongly typed and all columns must be defined during query compilation". David J.
Re: Options to rowwise persist result of stable/immutable function with RECORD result
On Wed, Mar 22, 2023 at 4:32 PM Eske Rahn wrote: > Hi, > > Thanks for the quick answer *:-D* > > That was a nice sideeffect of lateral. > > In the example, the calling code also gets simplified: > > WITH x AS ( > SELECT clock_timestamp() rowstart, *, clock_timestamp() rowend FROM ( > SELECT '1' inp UNION > SELECT '2' > ) y, LATERAL septima.foo(inp) g > ) > SELECT * FROM x; > > > That solved the issue at hand, in a much better way. Thanks > > Though I still fail to see *why* the other way should generally call the > function for every column in the *result* record - if the function is > STABLE or IMMUTABLE. > It gets rewritten to be effectively: select func_call(...).col1, func_call(...).col2, func_call(...).col3 under the assumption that repeating the function call will be cheap and side-effect free. It was never ideal but fixing that form of optimization was harder than implementing LATERAL where the multi-column result has a natural output in the form of a multi-column table. A normal function call in the target list really means "return a single value" which is at odds with writing .* after it. David J.
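To make the rewrite concrete, here is a hedged sketch (generic names f and t, not from this thread) of the two spellings:

```sql
-- Composite expansion in the target list behaves roughly as if each
-- output column repeated the call, i.e. as if you had written
--   SELECT (f(x)).a, (f(x)).b, (f(x)).c FROM t;
-- so a three-column result means three invocations per row of t:
SELECT (f(x)).* FROM t;

-- LATERAL gives the composite result a range-table entry of its own,
-- so f runs once per row of t and its columns are simply projected:
SELECT g.* FROM t, LATERAL f(x) AS g;
```

The LATERAL form is also easier to extend later, since the function's output columns can be filtered or joined like any other table's.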
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
On Wed, Mar 22, 2023 at 02:21:12PM -0400, Melanie Plageman wrote:
> Apologies as I know this docs update has already been committed, but
> buffers fetched and blocks fetched both feel weird to me. If you have a
> cache hit, you don't end up really "fetching" anything at all (since
> pgstat_count_buffer_read() is called before ReadBuffer_common() and we
> don't know if it is a hit or miss yet). And, I would normally associate
> fetching with fetching a block into a buffer. It seems like this counter
> is really reflecting the number of buffers acquired or used.

Well, it is the number of times we've requested a block read, though it may not actually be a read if the block was in the cache already.

> This isn't really the fault of this patch since that member was already
> called blocks_fetched.

The original documentation of these functions, added by 46aa77c, refers to "block fetch requests" and "block requests found in cache", so that would not be right either based on your opinion here. If you find "fetch" to be incorrect in this context, here is another idea:
- "block read requests" for blocks_fetched().
- "block read requests actually found in cache" for blocks_hit().

All the system views care only about the difference between the two counters, to count the number of physical reads really done.

--
Michael
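For illustration, a hedged sketch of that last point, modeled on how the pg_statio_* views are defined in src/backend/catalog/system_views.sql (the exact expressions there may differ slightly):

```sql
-- "fetched" itself never surfaces as a view column; physical reads are
-- derived by subtracting cache hits from block fetch requests:
SELECT c.oid::regclass AS relname,
       pg_stat_get_blocks_fetched(c.oid)
         - pg_stat_get_blocks_hit(c.oid) AS heap_blks_read,
       pg_stat_get_blocks_hit(c.oid)     AS heap_blks_hit
FROM pg_class c
WHERE c.relkind = 'r';
```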
Re: Improve logging when using Huge Pages
On Wed, Mar 22, 2023 at 05:18:28PM -0500, Justin Pryzby wrote:
> Wow, good point. I think to make it work we'd need to put
> huge_pages_active into BackendParameters and handle it in
> save_backend_variables(). If so, that'd be a strong argument for using a
> GUC, which already has all the necessary infrastructure for exposing the
> server's state.

I am afraid so; duplicating existing infrastructure for a need like the one in this thread is not really appealing.

--
Michael
Re: Improve logging when using Huge Pages
On Wed, Mar 22, 2023 at 04:37:01PM +0900, Michael Paquier wrote:
> I have noted something... For the WIN32 case, we have that:
>
> +++ b/src/backend/port/win32_shmem.c
> @@ -327,6 +327,8 @@ retry:
> 			Sleep(1000);
> 			continue;
> 		}
> +
> +		huge_pages_active = ((flProtect & SEC_LARGE_PAGES) != 0);
> 		break;
>
> Are you sure that this is correct? This is set in
> PGSharedMemoryCreate(), part of CreateSharedMemoryAndSemaphores() in
> the startup sequence that creates the shmem segment. However, for a
> normal backend created by EXEC_BACKEND, SubPostmasterMain() reattaches
> to an existing shared memory segment, so we don't go through the
> creation path that would set huge_pages_active for the process just
> started (note that InitPostmasterChild() switches IsUnderPostmaster,
> bypassing the shmem segment creation).

Wow, good point. I think to make it work we'd need to put huge_pages_active into BackendParameters and handle it in save_backend_variables(). If so, that'd be a strong argument for using a GUC, which already has all the necessary infrastructure for exposing the server's state.

--
Justin
Re: HOT chain validation in verify_heapam()
Hi, On 2023-03-22 14:56:22 -0700, Andres Freund wrote: > A patch addressing some, but not all, of those is attached. With that I don't > see any crashes or false-positives anymore. That patch missed that, as committed, the first if (ItemIdIsRedirected()) check sets lp_valid[n] = true even if the target of the redirect is unused. With that fixed, 004_verify_heapam doesn't cause crash anymore - it doesn't pass though, because there's a bunch of unadjusted error messages. Andres diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c index 663fb65dee6..d0c462e28ea 100644 --- i/contrib/amcheck/verify_heapam.c +++ w/contrib/amcheck/verify_heapam.c @@ -486,14 +486,24 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); - -/* - * Record the fact that this line pointer has passed basic - * sanity checking, and also the offset number to which it - * points. - */ -lp_valid[ctx.offnum] = true; -successor[ctx.offnum] = rdoffnum; +else if (ItemIdIsDead(rditem)) + report_corruption(, + psprintf("line pointer redirection to dead item at offset %u", + (unsigned) rdoffnum)); +else if (ItemIdIsRedirected(rditem)) + report_corruption(, + psprintf("line pointer redirection to another redirect at offset %u", + (unsigned) rdoffnum)); +else +{ + /* + * Record the fact that this line pointer has passed basic + * sanity checking, and also the offset number to which it + * points. + */ + lp_valid[ctx.offnum] = true; + successor[ctx.offnum] = rdoffnum; +} continue; } @@ -564,16 +574,19 @@ verify_heapam(PG_FUNCTION_ARGS) TransactionId next_xmin; OffsetNumber nextoffnum = successor[ctx.offnum]; + /* the current line pointer may not have a successor */ + if (nextoffnum == InvalidOffsetNumber) +continue; + /* - * The current line pointer may not have a successor, either - * because it's not valid or because it didn't point to anything. - * In either case, we have to give up. 
- * - * If the current line pointer does point to something, it's - * possible that the target line pointer isn't valid. We have to - * give up in that case, too. + * The successor is located beyond the end of the line item array, + * which can happen when the array is truncated. */ - if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum]) + if (nextoffnum > maxoff) +continue; + + /* the successor is not valid, have to give up */ + if (!lp_valid[nextoffnum]) continue; /* We have two valid line pointers that we can examine. */ @@ -583,14 +596,8 @@ verify_heapam(PG_FUNCTION_ARGS) /* Handle the cases where the current line pointer is a redirect. */ if (ItemIdIsRedirected(curr_lp)) { -/* Can't redirect to another redirect. */ -if (ItemIdIsRedirected(next_lp)) -{ - report_corruption(, - psprintf("redirected line pointer points to another redirected line pointer at offset %u", - (unsigned) nextoffnum)); - continue; -} +/* should have filtered out all the other cases */ +Assert(ItemIdIsNormal(next_lp)); /* Can only redirect to a HOT tuple. */ next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp); @@ -602,8 +609,12 @@ verify_heapam(PG_FUNCTION_ARGS) } /* - * Redirects are created by updates, so successor should be - * the result of an update. + * Redirects are created by HOT updates, so successor should + * be the result of an HOT update. + * + * XXX: HeapTupleHeaderIsHeapOnly() should always imply + * HEAP_UPDATED. This should be checked even when the tuple + * isn't a target of a redirect. */ if ((next_htup->t_infomask & HEAP_UPDATED) == 0) { @@ -665,15 +676,18 @@ verify_heapam(PG_FUNCTION_ARGS) * tuple should be marked as a heap-only tuple. Conversely, if the * current tuple isn't marked as HOT-updated, then the next tuple * shouldn't be marked as a heap-only tuple. + * + * NB: Can't use HeapTupleHeaderIsHotUpdated() as it checks if + * hint bits indicate xmin/xmax aborted. 
*/ - if (!HeapTupleHeaderIsHotUpdated(curr_htup) && + if (!(curr_htup->t_infomask2 & HEAP_HOT_UPDATED) && HeapTupleHeaderIsHeapOnly(next_htup)) { report_corruption(, psprintf("non-heap-only update produced a heap-only tuple at offset %u", (unsigned) nextoffnum)); } - if (HeapTupleHeaderIsHotUpdated(curr_htup) && + if ((curr_htup->t_infomask2 & HEAP_HOT_UPDATED) && !HeapTupleHeaderIsHeapOnly(next_htup)) { report_corruption(,
Re: HOT chain validation in verify_heapam()
Hi, On 2023-03-22 13:45:52 -0700, Andres Freund wrote: > On 2023-03-22 09:19:18 -0400, Robert Haas wrote: > > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev > > wrote: > > > The patch needed a rebase due to a4f23f9b. PFA v12. > > > > I have committed this after tidying up a bunch of things in the test > > case file that I found too difficult to understand -- or in some cases > > just incorrect, like: > > As noticed by Andrew > https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and > then reproduced on HEAD by me, there's something not right here. > > At the very least there's missing verification that tids actually exists in > the > "Update chain validation" loop, leading to: > TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: > "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: > 355, PID: 645093 > > Which makes sense - the earlier loop adds t_ctid to the successor array, which > we then query without checking if there still is such a tid on the page. It's not quite so simple - I see now that the lp_valid check tries to prevent that. However, it's not sufficient, because there is no guarantee that lp_valid[nextoffnum] is initialized. Consider what happens if t_ctid of a heap tuple points to beyond the end of the item array - lp_valid[nextoffnum] won't be initialized. Why are redirections now checked in two places? There already was a ItemIdIsUsed() check in the "/* Perform tuple checks */" loop, but now there's the ItemIdIsRedirected() check in the "Update chain validation." loop as well - and the output of that is confusing, because it'll just mention the target of the redirect. I also think it's not quite right that some of checks inside if (ItemIdIsRedirected()) continue in case of corruption, others don't. While there's a later continue, that means the corrupt tuples get added to the predecessor array. 
Similarly, in the non-redirect portion, the successor array gets filled with corrupt tuples, which doesn't seem quite right to me. A patch addressing some, but not all, of those is attached. With that I don't see any crashes or false-positives anymore. Greetings, Andres Freund diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c index 663fb65dee6..43f93f7784a 100644 --- i/contrib/amcheck/verify_heapam.c +++ w/contrib/amcheck/verify_heapam.c @@ -486,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); +else if (ItemIdIsDead(rditem)) + report_corruption(, + psprintf("line pointer redirection to dead item at offset %u", + (unsigned) rdoffnum)); +else if (ItemIdIsRedirected(rditem)) + report_corruption(, + psprintf("line pointer redirection to another redirect at offset %u", + (unsigned) rdoffnum)); /* * Record the fact that this line pointer has passed basic @@ -564,16 +572,19 @@ verify_heapam(PG_FUNCTION_ARGS) TransactionId next_xmin; OffsetNumber nextoffnum = successor[ctx.offnum]; + /* the current line pointer may not have a successor */ + if (nextoffnum == InvalidOffsetNumber) +continue; + /* - * The current line pointer may not have a successor, either - * because it's not valid or because it didn't point to anything. - * In either case, we have to give up. - * - * If the current line pointer does point to something, it's - * possible that the target line pointer isn't valid. We have to - * give up in that case, too. + * The successor is located beyond the end of the line item array, + * which can happen when the array is truncated. */ - if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum]) + if (nextoffnum > maxoff) +continue; + + /* the successor is not valid, have to give up */ + if (!lp_valid[nextoffnum]) continue; /* We have two valid line pointers that we can examine. 
*/ @@ -583,14 +594,8 @@ verify_heapam(PG_FUNCTION_ARGS) /* Handle the cases where the current line pointer is a redirect. */ if (ItemIdIsRedirected(curr_lp)) { -/* Can't redirect to another redirect. */ -if (ItemIdIsRedirected(next_lp)) -{ - report_corruption(, - psprintf("redirected line pointer points to another redirected line pointer at offset %u", - (unsigned) nextoffnum)); - continue; -} +/* should have filtered out all the other cases */ +Assert(ItemIdIsNormal(next_lp)); /* Can only redirect to a HOT tuple. */ next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp); @@ -602,8 +607,12 @@ verify_heapam(PG_FUNCTION_ARGS) } /* - * Redirects are created by updates, so successor should be - * the result of an update. + * Redirects are created by HOT updates, so successor should + * be the result of an HOT update. + * + * XXX:
Re: Options to rowwise persist result of stable/immutable function with RECORD result
On Tuesday, March 21, 2023, Eske Rahn wrote: > Hi, > > I have noticed a rather odd behaviour that is not strictly a bug, but is > unexpected. > > It is when a immutable (or stable) PG function is returning results in a > record structure a select on these calls the function repeatedly for each > element in the output record. > The LATERAL join modifier exists to handle this kind of situation. David J.
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 2:14 PM Peter Geoghegan wrote: > > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is > > set > > and expects to find an item it can dereference - but I don't think that's > > something we can rely on: Afaics HOT pruning can break chains, but doesn't > > reset xmax. > > I think that we need two passes to be completely thorough. An initial > pass, that works pretty much as-is, plus a second pass that locates > any orphaned heap-only tuples -- heap-only tuples that were not deemed > part of a valid HOT chain during the first pass. These remaining > orphaned heap-only tuples should be verified as having come from > now-aborted transactions (they should definitely be fully DEAD) -- > otherwise we have corruption. I see that there is a second pass over the heap page in verify_heapam(), in fact. Kind of. I'm referring to this loop: /* * An update chain can start either with a non-heap-only tuple or with * a redirect line pointer, but not with a heap-only tuple. * * (This check is in a separate loop because we need the predecessor * array to be fully populated before we can perform it.) 
*/ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { if (xmin_commit_status_ok[ctx.offnum] && (xmin_commit_status[ctx.offnum] == XID_COMMITTED || xmin_commit_status[ctx.offnum] == XID_IN_PROGRESS) && predecessor[ctx.offnum] == InvalidOffsetNumber) { ItemId curr_lp; curr_lp = PageGetItemId(ctx.page, ctx.offnum); if (!ItemIdIsRedirected(curr_lp)) { HeapTupleHeader curr_htup; curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp); if (HeapTupleHeaderIsHeapOnly(curr_htup)) report_corruption(, psprintf("tuple is root of chain but is marked as heap-only tuple")); } } } However, this "second pass over page" loop has roughly the same problem as the nearby HeapTupleHeaderIsHotUpdated() coding pattern: it doesn't account for the fact that a tuple whose xmin was XID_IN_PROGRESS a little earlier on may not be in that state once we reach the second pass loop. Concurrent transaction abort needs to be accounted for. The loop needs to recheck xmin status, at least in the initially-XID_IN_PROGRESS-xmin case. -- Peter Geoghegan
Re: [BUG] pg_stat_statements and extended query protocol
> What about using an uint64 for calls? That seems more appropriate to me
> (even if queryDesc->totaltime->calls will be passed (which is int64), but
> that's already also the case for the "rows" argument and
> queryDesc->totaltime->rows_processed)

That's fair.

> I'm not sure it's worth mentioning that the new counters are "currently"
> used with the ExecutorRun.

Sure, I suppose these fields could be used outside of ExecutorRun. Good point.

> Also, I wonder if "rows" (and not rows_processed) would not be a better
> naming.

Agree. I went with rows_processed initially, since it was accumulating es_processed, but as with the previous point, this instrumentation could be used outside of ExecutorRun.

v3 addresses the comments.

Regards,

--
Sami Imseih
Amazon Web Services (AWS)

v3-0001-Correct-accumulation-of-counters-for-extended-query-.patch
Options to rowwise persist result of stable/immutable function with RECORD result
Hi,

I have noticed a rather odd behaviour that is not strictly a bug, but is unexpected.

It is that when an immutable (or stable) PG function returns results in a record structure, a select on it calls the function repeatedly, once for each element of the output record. See below for an example.

Sure, I can work around this by returning in an array, or by materialising the result as a whole with e.g. a materialised CTE, but what I'm looking for is *materialising of just the individual row* during processing, if the function is to be called on many rows. Obviously, in theory the returned record could be very complex, so we might not want it materialised in general, but an option to do so would be nice. I would suggest that a WITH could be marked with a new "MATERIALIZED *ROW*" option (reusing already reserved keywords).

Note how I have set the cost extreme below; in this test, the value does not affect the behaviour.

The result set here has five elements; if I change the type to VOLATILE, the execution time is reduced by a factor of five (see the difference between the stamp of line one and two). It is directly proportional to the number of elements requested from the record (here I requested all).

(The real-life scenario is a function that, by a list of reg_ex expressions, splits up the input into numerous fields, and I noticed the behaviour because a raise function added for debug put out the same output repeatedly.)

-

DROP TYPE IF EXISTS septima.foo_type CASCADE;
CREATE TYPE septima.foo_type AS (a text, b text, c text, d text, e text);
DROP FUNCTION IF EXISTS septima.foo(text);
CREATE OR REPLACE FUNCTION septima.foo(inp text) RETURNS septima.foo_type AS
$BODY$
DECLARE
  result_record septima.foo_type;
  i BIGINT := 12345678;
BEGIN
  WHILE 0https://septima.dk
Re: Initial Schema Sync for Logical Replication
On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
> Now, how do we avoid these problems even if we have our own version of
> functionality similar to pg_dump for selected objects? I guess we will
> face similar problems. If so, we may need to deny schema sync in any
> such case.

There are two approaches for initial DDL synchronization:

1) generate the DDL command on the publisher, stream it, and apply it as-is on the subscriber;
2) generate a DDL representation (JSON, for example) on the publisher, stream it, transform it into a DDL command on the subscriber, and apply it.

Option (1) is simpler and faster than option (2) because it does not require an additional step (transformation). However, option (2) is more flexible than option (1) because it allows you to create a DDL command even if a feature was removed from the subscriber (and the publisher version is less than the subscriber version), or a feature was added to the publisher (and the publisher version is greater than the subscriber version). Of course there are exceptions, in which case the transformation should be forbidden (this can be controlled by the protocol version -- LOGICALREP_PROTO_FOOBAR_VERSION_NUM). A decision must be made: simple/restrictive vs complex/flexible.

One of the main use cases for logical replication is migration (X -> Y where X < Y). Postgres generally does not remove features, but it might happen (such as the WITH OIDS syntax), and that would break DDL replication with option (1). In the downgrade case (X -> Y where X > Y), it might break DDL replication if a new syntax is introduced in X. Having said that, IMO option (1) is fragile if we want to support DDL replication between different Postgres versions. It might eventually work, but there is no guarantee.

Per discussion [1], I think if we agree that Alvaro's DDL deparse patch is the way to go for DDL replication, it seems wise that it should be used for initial DDL synchronization as well.
[1] https://www.postgresql.org/message-id/CAA4eK1%2Bw_dFytBiv3RxbOL76_noMzmX0QGTc8uS%3Dbc2WaPVoow%40mail.gmail.com -- Euler Taveira EDB https://www.enterprisedb.com/
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 1:45 PM Andres Freund wrote: > At the very least there's missing verification that tids actually exist in > the > "Update chain validation" loop, leading to: > TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: > "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: > 355, PID: 645093 > > Which makes sense - the earlier loop adds t_ctid to the successor array, which > we then query without checking if there still is such a tid on the page. > > I suspect we don't just need a !ItemIdIsUsed(), but also a check against the > max offset on the page. We definitely need to do it that way, since a heap-only tuple's t_ctid is allowed to point to almost anything. I guess it can't point to some completely different heap block, but that's about the only restriction. In particular, it can point to an item that's past the end of the page following line pointer array truncation (truncation can happen during pruning or when the second heap pass takes place in VACUUM). OTOH the rules for LP_REDIRECT items *are* very strict. They need to be, since it's the root item of the HOT chain, referenced by TIDs in indexes, and has no heap tuple header metadata to use in cross-checks that take place during HOT chain traversal (traversal by code in places such as heap_hot_search_buffer). > WRT these failures: non-heap-only update produced a heap-only tuple at offset 20 > > I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s > definition: That has to be a problem for verify_heapam. > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set > and expects to find an item it can dereference - but I don't think that's > something we can rely on: Afaics HOT pruning can break chains, but doesn't > reset xmax. I think that we need two passes to be completely thorough.
An initial pass, that works pretty much as-is, plus a second pass that locates any orphaned heap-only tuples -- heap-only tuples that were not deemed part of a valid HOT chain during the first pass. These remaining orphaned heap-only tuples should be verified as having come from now-aborted transactions (they should definitely be fully DEAD) -- otherwise we have corruption. That's what my abandoned patch to make heap pruning more robust did, you'll recall. -- Peter Geoghegan
Re: HOT chain validation in verify_heapam()
Hi, On 2023-03-22 09:19:18 -0400, Robert Haas wrote: > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev > wrote: > > The patch needed a rebase due to a4f23f9b. PFA v12. > > I have committed this after tidying up a bunch of things in the test > case file that I found too difficult to understand -- or in some cases > just incorrect, like: As noticed by Andrew https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and then reproduced on HEAD by me, there's something not right here. At the very least there's missing verification that tids actually exist in the "Update chain validation" loop, leading to: TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: 355, PID: 645093 Which makes sense - the earlier loop adds t_ctid to the successor array, which we then query without checking if there still is such a tid on the page. I suspect we don't just need a !ItemIdIsUsed(), but also a check against the max offset on the page. WRT these failures: non-heap-only update produced a heap-only tuple at offset 20 I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s definition:

/*
 * Note that we stop considering a tuple HOT-updated as soon as it is known
 * aborted or the would-be updating transaction is known aborted. For best
 * efficiency, check tuple visibility before using this macro, so that the
 * INVALID bits will be as up to date as possible.
 */
#define HeapTupleHeaderIsHotUpdated(tup) \
( \
	((tup)->t_infomask2 & HEAP_HOT_UPDATED) != 0 && \
	((tup)->t_infomask & HEAP_XMAX_INVALID) == 0 && \
	!HeapTupleHeaderXminInvalid(tup) \
)

Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set and expects to find an item it can dereference - but I don't think that's something we can rely on: Afaics HOT pruning can break chains, but doesn't reset xmax. Greetings, Andres Freund
Re: Set arbitrary GUC options during initdb
Andres Freund writes: > This commit unfortunately broke --wal-segsize. If I use a slightly larger than > the default setting, I get: > initdb --wal-segsize 64 somepath > running bootstrap script ... 2023-03-22 13:06:41.282 PDT [639848] FATAL: > "min_wal_size" must be at least twice "wal_segment_size" [ confused... ] Oh, I see the problem. This: /* set default max_wal_size and min_wal_size */ snprintf(repltok, sizeof(repltok), "min_wal_size = %s", pretty_wal_size(DEFAULT_MIN_WAL_SEGS)); conflines = replace_token(conflines, "#min_wal_size = 80MB", repltok); looks like it's setting a compile-time-constant value of min_wal_size; at least that's what I thought it was doing when I revised the code. But it isn't, because somebody had the brilliant idea of making pretty_wal_size() depend on the wal_segment_size_mb variable. Will fix, thanks for report. regards, tom lane
Re: Set arbitrary GUC options during initdb
Hi, This commit unfortunately broke --wal-segsize. If I use a slightly larger than the default setting, I get: initdb --wal-segsize 64 somepath running bootstrap script ... 2023-03-22 13:06:41.282 PDT [639848] FATAL: "min_wal_size" must be at least twice "wal_segment_size" Greetings, Andres Freund
Re: Non-superuser subscription owners
On Wed, 2023-03-22 at 12:16 -0400, Robert Haas wrote: > If nobody's too unhappy with the idea, I plan to commit this soon, > both because I think that the feature is useful, and also because I > think it's an important security improvement. Is there any chance I can convince you to separate the privileges of using a connection string and creating a subscription, as I suggested[1] earlier? It would be useful for dblink, and I also plan to propose CREATE SUBSCRIPTION ... SERVER for v17 (it was too late for 16), for which it would also be useful to make the distinction. You seemed to generally think it was a reasonable idea, but wanted to wait for the other patch. I think it's the right breakdown of privileges even now, and I don't see a reason to give ourselves a headache by trying to split up the privileges later. Regards, Jeff Davis [1] https://www.postgresql.org/message-id/fa1190c117c2455f2dd968a1a09f796ccef27b29.ca...@j-davis.com
Re: Can we avoid chdir'ing in resolve_symlinks() ?
Peter Eisentraut writes: > On 05.10.22 15:59, Tom Lane wrote: >> What did you think of the compromise proposal to change only >> the paths that pg_config outputs? I've not tried to code that, >> but I think it should be feasible. > I don't think I understand what this proposal actually means. What > would be the behavior of pg_config and how would it be different from > before? What I had in mind was: * server and most frontend programs keep the same behavior, that is fully resolve their executable's path to an absolute path (and then navigate to the rest of the installation from there); but now they'll use realpath() to avoid chdir'ing while they do that. * pg_config applies realpath() if its initial PATH search produces a relative path to the executable, or if the last component of that path is a symlink. Otherwise leave it alone, which would have the effect of not expanding directory-level symlinks. I think that changing pg_config's behavior would be enough to resolve the complaints you listed, but perhaps I'm missing some fine points. regards, tom lane
Re: On login trigger: take three
> On 22 Mar 2023, at 18:54, Robert Haas wrote: > Basically, I think 0001 is a good idea -- I'm much more nervous about > 0002. I think we should get 0001 polished up and committed. Correct me if I'm wrong, but I believe you commented on v27-0001 of the login event trigger patch series? Sorry about the confusion if so, this is a very old and lengthy thread with many twists and turns. This series was closed downthread when the original authors of login EVT left, and the 0001 GUC patch extracted into its own thread. That patch now lives at: https://commitfest.postgresql.org/42/4013/ This thread was then later revived by Mikhail Gribkov but without 0001 instead referring to the above patch for that part. The new patch for 0001 is quite different, and I welcome your eyes on that since I agree with you that it would be a good idea. -- Daniel Gustafsson
Re: HOT chain validation in verify_heapam()
Robert Haas writes: > I have committed this after tidying up a bunch of things in the test > case file that I found too difficult to understand -- or in some cases > just incorrect, like: My animal mamba doesn't like this one bit. I suspect the reason is that it's big-endian (PPC) and the endianness hacking in the test is simply wrong: syswrite($file, pack("L", $ENDIANNESS eq 'little' ? 0x00010019 : 0x19000100)) or BAIL_OUT("syswrite failed: $!"); pack's L code should already be performing an endianness swap, so why are we doing another one in the argument? regards, tom lane
Re: Request for comment on setting binary format output per session
On Wed, 2023-03-22 at 14:42 -0400, Tom Lane wrote: > This isn't going to help much unless we change the wire protocol > so that RowDescription messages carry these UUIDs instead of > (or in addition to?) the OIDs of the column datatypes. While > that's not completely out of the question, it's a heavy lift > that will affect multiple layers of client code along with the > server. I'm not sure that's a hard requirement. I pointed out a similar solution for type names here: https://www.postgresql.org/message-id/4297b9e310172b9a1e6d737e21ad8796d0ab7b03.ca...@j-davis.com In other words: if the Bind message depends on knowing the OID mappings, that forces an extra round-trip; but if the client doesn't need the mapping until it receives its first result, then it can use pipelining to avoid the extra round-trip. (I haven't actually tried it and I don't know if it's very reasonable to expect the client to do this.) > Also, what about container types? I doubt it's sane for > array-of-foo to have a UUID that's unrelated to the one for foo. > Composites and ranges would need some intelligence too if we > don't want them to be unduly complicated to process. That's a good point. I don't know if that is a major design issue or not; but it certainly adds complexity to the proposal and/or clients implementing it. Regards, Jeff Davis
Re: meson documentation build open issues
Hi, On 2023-03-20 10:32:49 -0700, Andres Freund wrote: > On 2023-03-20 11:58:08 +0100, Peter Eisentraut wrote: > > Oh, this patch set grew quite quickly. ;-) > > Yep :) Unless somebody sees a reason to wait, I am planning to commit:
- meson: add install-{quiet, world} targets
- meson: add install-{docs,doc-html,doc-man} targets
- meson: make install_test_files more generic, rename to install_files
While I don't think we necessarily have the path forward around .css and .svg, the above are independent of that. For the .svg: I wonder if we should just inline the images in the chunked html, just like we do in the single page one. It's not like we reuse one image across a lot of pages, so there's no bandwidth saved from having the images separate... For the .css: docbook-xsl actually has support for writing the .css: [1] - but it requires the .css file be valid xml. I wonder if the cleanest approach would be to have a build step to create .css.xml - then the non-chunked build's generate.css.header would do the right thing. I'll start new threads for:
- docs: speed up docs build by special-casing the gentext.template
- VERY WIP: parallel doc generation
after the feature freeze. After looking into it a tiny bit more, it seems we should use neither pandoc nor dbtoepub for epub generation. All the dbtoepub does is to invoke the docbook-xsl support for epubs and zip the result - except it doesn't use our stylesheets, so it looks randomly different and doesn't use our speedups. At the very least we should use our customizations, if we want epub support. Or we should just remove it. Pandoc unfortunately doesn't do docbook well enough to be usable for now to directly parse our docbook. Regards, Andres [1] https://docbook.sourceforge.net/release/xsl/current/doc/html/custom.css.source.html
Re: Remove nonmeaningful prefixes in PgStat_* fields
On Mon, Mar 20, 2023 at 10:05:21AM +0100, Drouvot, Bertrand wrote: > Hi, > > On 3/20/23 8:32 AM, Michael Paquier wrote: > > > > /* Total time previously charged to function, as of function start > > */ > > - instr_time save_f_total_time; > > + instr_time save_total_time; > > I have something to say about this one, though.. I find this change a > > bit confusing. It may be better kept as it is, or I think that we'd > > better rename also "save_total" and "start" to reflect in a better way > > what they do, because removing "f_" reduces the meaning of the field > > with the two others in the same structure. > > Thanks for looking at it! > > Good point and keeping it as it is currently would not > affect the work that is/will be done in [1]. > > So, please find attached V2 taking this comment into account. > diff --git a/src/backend/utils/adt/pgstatfuncs.c > b/src/backend/utils/adt/pgstatfuncs.c > index 35c6d46555..4f21fb2dc2 100644 > --- a/src/backend/utils/adt/pgstatfuncs.c > +++ b/src/backend/utils/adt/pgstatfuncs.c > @@ -1552,7 +1552,7 @@ pg_stat_get_xact_tuples_inserted(PG_FUNCTION_ARGS) > result = 0; > else > { > - result = tabentry->t_counts.t_tuples_inserted; > + result = tabentry->counts.tuples_inserted; This comment still has the t_ prefix, as does the one for tuples_updated and deleted. Otherwise, LGTM. 
> /* live subtransactions' counts aren't in t_tuples_inserted yet > */ > for (trans = tabentry->trans; trans != NULL; trans = > trans->upper) > result += trans->tuples_inserted; > @@ -1573,7 +1573,7 @@ pg_stat_get_xact_tuples_updated(PG_FUNCTION_ARGS) > result = 0; > else > { > - result = tabentry->t_counts.t_tuples_updated; > + result = tabentry->counts.tuples_updated; > /* live subtransactions' counts aren't in t_tuples_updated yet > */ > for (trans = tabentry->trans; trans != NULL; trans = > trans->upper) > result += trans->tuples_updated; > @@ -1594,7 +1594,7 @@ pg_stat_get_xact_tuples_deleted(PG_FUNCTION_ARGS) > result = 0; > else > { > - result = tabentry->t_counts.t_tuples_deleted; > + result = tabentry->counts.tuples_deleted; > /* live subtransactions' counts aren't in t_tuples_deleted yet > */ > for (trans = tabentry->trans; trans != NULL; trans = > trans->upper) > result += trans->tuples_deleted; > @@ -1613,7 +1613,7 @@ pg_stat_get_xact_tuples_hot_updated(PG_FUNCTION_ARGS) > if ((tabentry = find_tabstat_entry(relid)) == NULL) > result = 0; > else > - result = (int64) (tabentry->t_counts.t_tuples_hot_updated); > + result = (int64) (tabentry->counts.tuples_hot_updated); > > PG_RETURN_INT64(result); > } - Melanie
Re: Request for comment on setting binary format output per session
Jeff Davis writes: > On Wed, 2023-03-22 at 10:21 +0100, Peter Eisentraut wrote: >> I've been thinking that we need some new kind of identifier to allow >> clients to process types in more sophisticated ways. >> For example, each type could be (self-)assigned a UUID, which is fixed >> for that type no matter in which schema or under what extension name or >> with what OID it is installed. Client libraries could then hardcode >> that UUID for processing the types. Conversely, the UUID could be >> changed if the wire format of the type is changed, without having to >> change the type name. > That sounds reasonable to me. It could also be useful for other > extension objects (or the extension itself) to avoid other kinds of > weirdness from name collisions or major version updates or extensions > that depend on other extensions. This isn't going to help much unless we change the wire protocol so that RowDescription messages carry these UUIDs instead of (or in addition to?) the OIDs of the column datatypes. While that's not completely out of the question, it's a heavy lift that will affect multiple layers of client code along with the server. Also, what about container types? I doubt it's sane for array-of-foo to have a UUID that's unrelated to the one for foo. Composites and ranges would need some intelligence too if we don't want them to be unduly complicated to process. regards, tom lane
Re: Set arbitrary GUC options during initdb
I wrote: > Peter Eisentraut writes: >> I would remove the >> #if DEF_PGPORT != 5432 >> This was in the previous code too, but now if we remove it, then we >> don't have any more hardcoded 5432 left, which seems like a nice >> improvement in cleanliness. > Hm. That'll waste a few cycles during initdb; not sure if the extra > cleanliness is worth it. It's not like that number is going to change. After further thought I did it as you suggest. I think the only case where we really care about shaving milliseconds from initdb is in debug builds (e.g. buildfarm), which very likely get built with nondefault DEF_PGPORT anyway. I did get a bee in my bonnet about how replace_token (and now replace_guc_value) leak memory like there's no tomorrow. The leakage amounts to about a megabyte per run according to valgrind, and it's not going anywhere but up as we add more calls of those functions. So I made a quick change to redefine them in a less leak-prone way. regards, tom lane
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
On Mon, Mar 20, 2023 at 6:58 AM Drouvot, Bertrand wrote: > > Hi, > > On 3/20/23 12:43 AM, Michael Paquier wrote: > > At the > > end, documenting both still sounds like the best move to me. > > Agree. > > Please find attached > v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patch doing so. > > I did not put the exact same wording as the one being removed in ddfc2d9, as: > > - For pg_stat_get_xact_blocks_hit(), I think it's better to be closer to say > the > pg_statio_all_tables.heap_blks_hit definition. > > - For pg_stat_get_xact_blocks_fetched(), I think that using "buffer" is > better (than block) as at the > end it's related to pgstat_count_buffer_read(). > > At the end there is a choice to be made for both for the wording between > block and buffer. Indeed their > counters get incremented in "buffer" macros while retrieved in those "blocks" > functions. > > "Buffer" sounds more appropriate to me, so the attached has been done that > way. Apologies as I know this docs update has already been committed, but buffers fetched and blocks fetched both feel weird to me. If you have a cache hit, you don't end up really "fetching" anything at all (since pgstat_count_buffer_read() is called before ReadBuffer_common() and we don't know if it is a hit or miss yet). And, I would normally associate fetching with fetching a block into a buffer. It seems like this counter is really reflecting the number of buffers acquired or used. tuples_fetched makes more sense because a tuple is "fetched" into a slot. This isn't really the fault of this patch since that member was already called blocks_fetched. - Melanie
Re: psql: Add role's membership options to the \du+ command
In the previous version, I didn't notice (unlike cfbot) the compiler warning. Fixed in version 6. - Pavel Luzanov From 1b8b5743df23637b70e8d4ad0df0e1f892c595f3 Mon Sep 17 00:00:00 2001 From: Pavel Luzanov Date: Wed, 22 Mar 2023 20:54:41 +0300 Subject: [PATCH v6] psql: show membership options in the \du command Format for the "Member of" column completely redesigned. Shown within each row, in newline-separated format, are the memberships granted to the role. The presentation includes both the name of the grantor as well as the membership permissions (in an abbreviated format: a for admin option, i for inherit option, s for set option.) The word 'empty' is printed in the case that none of those permissions are granted. --- doc/src/sgml/ref/psql-ref.sgml | 30 +++--- src/bin/psql/describe.c| 63 -- src/test/regress/expected/psql.out | 49 +++ src/test/regress/sql/psql.sql | 30 ++ 4 files changed, 146 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7b8ae9fac3..03e7da93de 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1727,9 +1727,18 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g S modifier to include system roles. If pattern is specified, only those roles whose names match the pattern are listed. -If the form \dg+ is used, additional information -is shown about each role; currently this adds the comment for each -role. + + +Shown within each row, in newline-separated format, are the memberships granted to +the role. The presentation includes both the name of the grantor +as well as the membership permissions (in an abbreviated format: +a for admin option, i for inherit option, +s for set option.) The word empty is printed in +the case that none of those permissions are granted. +See the GRANT command for their meaning. + + +If the form \dg+ is used the comment attached to the role is shown. 
@@ -1969,9 +1978,18 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g S modifier to include system roles. If pattern is specified, only those roles whose names match the pattern are listed. -If the form \du+ is used, additional information -is shown about each role; currently this adds the comment for each -role. + + +Shown within each row, in newline-separated format, are the memberships granted to +the role. The presentation includes both the name of the grantor +as well as the membership permissions (in an abbreviated format: +a for admin option, i for inherit option, +s for set option.) The word empty is printed in +the case that none of those permissions are granted. +See the GRANT command for their meaning. + + +If the form \du+ is used the comment attached to the role is shown. diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 99e28f607e..7f2b7c9363 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -3631,24 +3631,42 @@ describeRoles(const char *pattern, bool verbose, bool showSystem) printfPQExpBuffer(, "SELECT r.rolname, r.rolsuper, r.rolinherit,\n" " r.rolcreaterole, r.rolcreatedb, r.rolcanlogin,\n" - " r.rolconnlimit, r.rolvaliduntil,\n" - " ARRAY(SELECT b.rolname\n" - "FROM pg_catalog.pg_auth_members m\n" - "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n" - "WHERE m.member = r.oid) as memberof"); + " r.rolconnlimit, r.rolvaliduntil, r.rolreplication,\n"); - if (verbose) - { - appendPQExpBufferStr(, "\n, pg_catalog.shobj_description(r.oid, 'pg_authid') AS description"); - ncols++; - } - appendPQExpBufferStr(, "\n, r.rolreplication"); + if (pset.sversion >= 16) + appendPQExpBufferStr(, + " (SELECT pg_catalog.string_agg(\n" + "pg_catalog.format('%I from %I (%s)',\n" + " b.rolname, m.grantor::pg_catalog.regrole::pg_catalog.text,\n" + " pg_catalog.regexp_replace(\n" + "pg_catalog.concat_ws(', ',\n" + " CASE WHEN m.admin_option THEN 'a' END,\n" + " CASE WHEN m.inherit_option THEN 'i' END,\n" + 
" CASE WHEN m.set_option THEN 's' END),\n" + " '^$', 'empty')\n" + "), E'\\n'\n" + "ORDER BY b.rolname, m.grantor::pg_catalog.regrole::pg_catalog.text)\n" + " FROM pg_catalog.pg_auth_members m\n" + " JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n" + " WHERE m.member = r.oid) as memberof"); + else + appendPQExpBufferStr(, +
Re: ICU locale validation / canonicalization
On Tue, 2023-03-21 at 10:35 +0100, Peter Eisentraut wrote: > [PATCH v6 1/6] Support language tags in older ICU versions (53 and > earlier). > > In pg_import_system_collations(), this is now redundant and can be > simplified: > > - if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr)) > + if (!pg_is_ascii(langtag) || !pg_is_ascii(langtag)) > > icu_set_collation_attributes() needs more commenting about what is > going > on. My guess is that uloc_canonicalize() converts from language tag > to > ICU locale ID, and then the existing logic to parse that apart would > apply. Is that how it works? Fixed the redundancy, added some comments, and committed 0001. > [PATCH v6 2/6] Wrap ICU ucol_open(). > > It makes sense to try to unify some of this. But I find the naming > confusing. If I see pg_ucol_open(), then I would expect that all > calls > to ucol_open() would be replaced by this. But here it's only a few, > without explanation. (pg_ucol_open() has no explanation at all > AFAICT.) The remaining callsite which doesn't use the wrapper is in initdb.c, which can't call into pg_locale.c, and has different intentions. initdb uses ucol_open() to get the default locale if icu_locale is not specified; and it also uses ucol_open() to verify that the locale can be opened (whether specified or the default). (Aside: I created a tiny 0004 patch which makes this difference more clear and adds a nice comment.) There's no reason to use a wrapper when getting the default locale, because it's just passing NULL anyway. When verifying that the locale can be opened, ucol_open() doesn't catch many problems anyway, so I'm not sure it's worth a lot of effort to copy these extra checks that the wrapper does into initdb.c. For instance, what's the value in replacing "und" with "root" if opening either will succeed? Parsing the attributes can potentially catch problems, but the later patch 0006 will check the attributes when converting to a language tag at initdb time. 
So I'm inclined to just leave initdb alone in patches 0002 and 0003. > I have in my notes that check_icu_locale() and make_icu_collator() > should be combined into a single function. I think that would be a > better way to slice it. That would leave out get_collation_actual_version(), which should handle the same fixups for attributes and the "und" locale. > Btw., I had intentionally not written code like this > > +#if U_ICU_VERSION_MAJOR_NUM < 54 > + icu_set_collation_attributes(collator, loc_str); > +#endif > > The disadvantage of doing it that way is that you then need to dig > out > an old version of ICU in order to check whether the code compiles at > all. With the current code, you can be sure that that code compiles > if > you make changes elsewhere. I was wondering about that -- thank you, I changed it back to use "if" rather than "#ifdef". New series attached (starting at 0002 to better correspond to the previous series). -- Jeff Davis PostgreSQL Contributor Team - AWS From fbe03dc596b5e12f4dda60269e044caa58f8be32 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Tue, 14 Mar 2023 21:21:17 -0700 Subject: [PATCH v7 2/7] Wrap ICU ucol_open(). Hide details of supporting older ICU versions in a wrapper function. The current code only needs to handle icu_set_collation_attributes(), but a subsequent commit will add additional version-specific code. 
--- src/backend/utils/adt/pg_locale.c | 70 +++ 1 file changed, 43 insertions(+), 27 deletions(-) diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index c3ede994be..dd0786dff5 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -140,6 +140,7 @@ static char *IsoLocaleName(const char *); */ static UConverter *icu_converter = NULL; +static UCollator *pg_ucol_open(const char *loc_str); static void init_icu_converter(void); static size_t uchar_length(UConverter *converter, const char *str, int32_t len); @@ -1430,17 +1431,8 @@ make_icu_collator(const char *iculocstr, { #ifdef USE_ICU UCollator *collator; - UErrorCode status; - status = U_ZERO_ERROR; - collator = ucol_open(iculocstr, ); - if (U_FAILURE(status)) - ereport(ERROR, -(errmsg("could not open collator for locale \"%s\": %s", - iculocstr, u_errorName(status; - - if (U_ICU_VERSION_MAJOR_NUM < 54) - icu_set_collation_attributes(collator, iculocstr); + collator = pg_ucol_open(iculocstr); /* * If rules are specified, we extract the rules of the standard collation, @@ -1451,6 +1443,7 @@ make_icu_collator(const char *iculocstr, const UChar *default_rules; UChar *agg_rules; UChar *my_rules; + UErrorCode status; int32_t length; default_rules = ucol_getRules(collator, ); @@ -1722,16 +1715,11 @@ get_collation_actual_version(char collprovider, const char *collcollate) if (collprovider == COLLPROVIDER_ICU) { UCollator *collator; -
Re: Request for comment on setting binary format output per session
On Wed, 2023-03-22 at 10:12 +0100, Peter Eisentraut wrote: > Sending type names is kind of useless if what comes back with the > result > (RowDescription) are OIDs anyway. > > The client would presumably have some code like > > if (typeoid == 555) > parseThatType(); > > So it already needs to know about the OIDs of all the types it is > interested in. Technically it's still an improvement because you can avoid an extra round-trip. The client library can pipeline a query like: SELECT typname, oid FROM pg_type WHERE typname IN (...list of supported type names...); when the client first connects, and then go ahead and send whatever queries you want without waiting for the response. When you get back the result of the pg_type query, you cache the mapping, and use it to process any other results you get. That avoids introducing an extra round-trip. I'm not sure if that's a reasonable thing to expect the client to do, so I agree that we should offer a better way. Regards, Jeff Davis
Re: Initial Schema Sync for Logical Replication
> > Yes. Do we have any concrete use case where the subscriber is an older > > version, in the first place? > > > > As per my understanding, it is mostly due to the reason that it can > work today. Today, during an off-list discussion with Jonathan on this > point, he pointed me to a similar incompatibility in MySQL > replication. See the "SQL incompatibilities" section in doc[1]. Also, > please note that this applies not only to initial sync but also to > schema sync during replication. I don't think it would be feasible to > keep such cross-version compatibility for DDL replication. I think it's possible to make DDL replication cross-version compatible, by making the DDL deparser version-aware: the deparsed JSON blob can have a PG version in it, and the destination server can process the versioned JSON blob by transforming anything incompatible according to the original version and its own version. Regards, Zane
Re: Request for comment on setting binary format output per session
On Wed, 2023-03-22 at 10:21 +0100, Peter Eisentraut wrote: > I've been thinking that we need some new kind of identifier to allow > clients to process types in more sophisticated ways. > > For example, each type could be (self-)assigned a UUID, which is > fixed > for that type no matter in which schema or under what extension name > or > with what OID it is installed. Client libraries could then hardcode > that UUID for processing the types. Conversely, the UUID could be > changed if the wire format of the type is changed, without having to > change the type name. That sounds reasonable to me. It could also be useful for other extension objects (or the extension itself) to avoid other kinds of weirdness from name collisions or major version updates or extensions that depend on other extensions. Regards, Jeff Davis
Re: On login trigger: take three
On Tue, Mar 15, 2022 at 4:52 PM Daniel Gustafsson wrote: > Yeah, that was the previously posted v25 from the author (who adopted it from > the original author). I took the liberty to quickly poke at the review > comments you had left as well as the ones that I had found to try and progress > the patch. 0001 should really go in its own thread though to not hide it > from > anyone interested who isn't looking at this thread. Some comments on 0001:
- In general, I think we should prefer to phrase options in terms of what is done, rather than what is not done. For instance, the corresponding GUC for row-level security is row_security={on|off}, not ignore_row_security.
- I think it's odd that the GUC in question doesn't accept true and false and our usual synonyms for those values. I suggest that it should, even if we want to add more possible values later.
- "ignoreing" is misspelled. So is gux-ignore-event-trigger. "Even triggers" -> "Event triggers".
- Perhaps the documentation for the GUC should mention that the GUC is not relevant in single-user mode because event triggers don't fire then anyway.
- "Disable event triggers during the session." isn't a very good description because there is in theory nothing to prevent this from being set in postgresql.conf.
Basically, I think 0001 is a good idea -- I'm much more nervous about 0002. I think we should get 0001 polished up and committed. -- Robert Haas EDB: http://www.enterprisedb.com
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi,

Tom, see below - I wonder if we should provide one more piece of
infrastructure around the saved error stuff...

Have you measured whether this has negative performance effects when *NOT*
using the new option?

As-is this does not work with FORMAT BINARY - and converting the binary input
functions to support soft errors won't happen for 16. So I think you need to
raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.

On 2023-03-22 22:34:20 +0900, torikoshia wrote:
> @@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
>
>  		ExecClearTuple(myslot);
>
> +		if (cstate->opts.ignore_datatype_errors)
> +		{
> +			escontext.details_wanted = true;
> +			cstate->escontext = escontext;
> +		}

I think it might be worth pulling this out of the loop. That does mean you'd
have to reset escontext.error_occurred after an error, but that doesn't seem
too bad, you need to do other cleanup anyway.

> @@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
>  			values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
>  		}
>  		else
> -			values[m] = InputFunctionCall(&in_functions[m],
> -										  string,
> -										  typioparams[m],
> -										  att->atttypmod);
> +			/* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
> +			if (!InputFunctionCallSafe(&in_functions[m],
> +									   string,
> +									   typioparams[m],
> +									   att->atttypmod,
> +									   (Node *) &cstate->escontext,
> +									   &values[m]))
> +			{
> +				cstate->ignored_errors++;
> +
> +				ereport(WARNING,
> +						errmsg("%s", cstate->escontext.error_data->message));

That isn't right - you lose all the details of the message. As is you'd also
leak the error context. I think the best bet for now is to do something like

/* adjust elevel so we don't jump out */
cstate->escontext.error_data->elevel = WARNING;
/* despite the name, this won't raise an error if elevel < ERROR */
ThrowErrorData(cstate->escontext.error_data);

I wonder if we ought to provide a wrapper for this? It could e.g. know to
mention the original elevel and such?
I don't think NextCopyFrom() is the right place to emit this warning - it
e.g. is also called from file_fdw.c, which might want to do something else
with the error. From a layering POV it seems cleaner to do this in
CopyFrom(). You already have a check for escontext.error_occurred there
anyway.

> @@ -3378,6 +3378,10 @@ copy_opt_item:
>  			{
>  				$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
>  			}
> +			| IGNORE_DATATYPE_ERRORS
> +			{
> +				$$ = makeDefElem("ignore_datatype_errors", (Node *) makeBoolean(true), @1);
> +			}
>  			| DELIMITER opt_as Sconst
>  			{
>  				$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);

I think we shouldn't add a new keyword for this, but only support this via

/* new COPY option syntax */
copy_generic_opt_list:
	copy_generic_opt_elem

Further increasing the size of the grammar with random keywords when we have
more generic ways to represent them seems unnecessary.

> +-- tests for IGNORE_DATATYPE_ERRORS option
> +CREATE TABLE check_ign_err (n int, m int[], k int);
> +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
> +1	{1}	1
> +a	{2}	2
> +3	{3}	33
> +4	{a, 4}	4
> +
> +5	{5}	5
> +\.
> +SELECT * FROM check_ign_err;
> +

I suggest adding a few more tests:
- COPY with a datatype error that can't be handled as a soft error
- test documenting that COPY FORMAT BINARY is incompatible with IGNORE_DATATYPE_ERRORS
- a soft error showing the error context - although that will require some
  care to avoid the function name + line in the output

Greetings,

Andres Freund
Re: Add pg_walinspect function with block info columns
On Mon, Mar 20, 2023 at 7:34 PM Peter Geoghegan wrote: > On Sun, Mar 19, 2023 at 8:21 PM Kyotaro Horiguchi > wrote: > > It is not an issue with this patch, but as I look at this version, I'm > > starting to feel uneasy about the subtle differences between what > > GetWALRecordsInfo and GetWALBlockInfo do. One solution might be to > > have GetWALBlockInfo return a values array for each block, but that > > could make things more complex than needed. Alternatively, could we > > get GetWALRecordsInfo to call tuplestore_putvalues() internally? This > > way, both functions can manage the temporary memory context within > > themselves. > > Agreed. I'm also not sure what to do about it, though. > > > This means GetWALBlockInfo overwrites the last two columns generated > > by GetWalRecordInfo, but I don't think this approach is clean and > > stable. I agree we don't want the final columns in a block info tuple > > but we don't want to duplicate the common code path. > > > I initially thought we could devide the function into > > GetWALCommonInfo(), GetWALRecordInfo() and GetWALBlockInfo(), but it > > doesn't seem that simple.. In the end, I think we should have separate > > GetWALRecordInfo() and GetWALBlockInfo() that have duplicate > > "values[i++] = .." lines. > > I agree. A little redundancy is better when the alternative is fragile > code, and I'm pretty sure that that applies here -- there won't be > very many duplicated lines, and the final code will be significantly > clearer. There can be a comment about keeping GetWALRecordInfo and > GetWALBlockInfo in sync. So, I also agree that it is better to have the two separate functions instead of overwriting the last two columns. 
As for keeping them in sync, we could define the number of common columns as a macro like: #define WALINSPECT_INFO_NUM_COMMON_COLS 10 and use that to calculate the size of the values/nulls array in GetWalRecordInfo() and GetWALBlockInfo() (assuming a new version where those two functions duplicate the setting of values[x] = y). That way, if a new column of information is added and one of the two functions forgets to set it in the values array, it would still cause an empty column and it will be easier for the programmer to see it needs to be added. We could even define an enum like: typedef enum walinspect_common_col { WALINSPECT_START_LSN, WALINSPECT_END_LSN, WALINSPECT_PREV_LSN, WALINSPECT_XID, WALINSPECT_RMGR, WALINSPECT_REC_TYPE, WALINSPECT_REC_LENGTH, WALINSPECT_MAIN_DATA_LENGTH, WALINSPECT_FPILEN, WALINSPECT_DESC, WALINSPECT_NUM_COMMON_COL, } walinspect_common_col; and set values in both functions like values[WALINSPECT_FPILEN] = y if we kept the order of common columns the same and as the first N columns for both functions. This would keep us from having to manually update a macro like WALINSPECT_INFO_NUM_COMMON_COLS. Though, I'm not sure how much value that actually adds. - Melanie
Re: allow_in_place_tablespaces vs. pg_basebackup
On Tue, Mar 21, 2023 at 7:59 PM Michael Paquier wrote: > On Mon, Mar 20, 2023 at 07:56:42AM -0400, Robert Haas wrote: > > Anyone want to comment on this? > > I have not checked the patch in details, but perhaps this needs at > least one test? Sure. I was sort of hoping to get feedback on the overall plan first, but it's easy enough to add a test so I've done that in the attached version. I've put it in a separate file because 010_pg_basebackup.pl is pushing a thousand lines and it's got a ton of unrelated tests in there already. The test included here fails on master, but passes with the patch. -- Robert Haas EDB: http://www.enterprisedb.com v2-0001-Fix-pg_basebackup-with-in-place-tablespaces-some-.patch Description: Binary data
Re: Set arbitrary GUC options during initdb
Peter Eisentraut writes:
> This patch looks good to me. It's a very nice simplification of the
> initdb.c code, even without the new feature.

Thanks for looking!

> I found that the addition of
> #include <ctype.h>
> didn't appear to be necessary. Maybe it was required before
> guc_value_requires_quotes() was changed?

There's still an isspace() added by the patch ... oh, the #include is not
needed because port.h includes ctype.h. That's spectacularly awful from an
include-footprint standpoint, but I can't say that I want to go fix it right
this minute.

> I would remove the
> #if DEF_PGPORT != 5432
> This was in the previous code too, but now if we remove it, then we
> don't have any more hardcoded 5432 left, which seems like a nice
> improvement in cleanliness.

Hm. That'll waste a few cycles during initdb; not sure if the extra
cleanliness is worth it. It's not like that number is going to change.

regards, tom lane
Re: CREATE DATABASE ... STRATEGY WAL_LOG issues
On 2023-03-22 09:58:58 -0400, Robert Haas wrote: > On Wed, Mar 22, 2023 at 1:12 AM Andres Freund wrote: > > Patch with the two minimal fixes attached. As we don't know whether it's > > worth > > changing the strategy, the more minimal fixes seem more appropriate. > > LGTM. Thanks for checking. Pushed.
Re: Should we remove vacuum_defer_cleanup_age?
Hi,

On 2023-03-22 11:44:20 -0500, Justin Pryzby wrote:
> On Sat, Mar 18, 2023 at 10:33:57AM +0100, Alvaro Herrera wrote:
> > On 2023-Mar-17, Andres Freund wrote:
> > > I started writing a test for vacuum_defer_cleanup_age while working on the fix
> > > referenced above, but now I am wondering if said energy would be better spent
> > > removing vacuum_defer_cleanup_age alltogether.
> >
> > +1 I agree it's not useful anymore.
> >
> > > I don't think I have the cycles to push this through in the next weeks, but if
> > > we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a
> > > good idea to mark it as deprecated in 16?
> >
> > Hmm, for the time being, can we just "disable" it by disallowing to set
> > the GUC to a value different from 0? Then we can remove the code later
> > in the cycle at leisure.
>
> It can be useful to do a "rolling transition", and it's something I do
> often.
>
> But I can't see why that would be useful here? It seems like something
> that could be done after the feature freeze. It's removing a feature,
> not adding one.

It wasn't actually that much work to write a patch to remove
vacuum_defer_cleanup_age, see the attached. I don't know whether others think
we should apply it this release, given the "late submission", but I tend to
think it's not worth carrying the complication of vacuum_defer_cleanup_age
forward.
Greetings, Andres Freund >From 24b519015bb1922a9746eda2ebea8702ccdd486f Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 22 Mar 2023 09:56:39 -0700 Subject: [PATCH v1] Remove vacuum_defer_cleanup_age Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4a...@awork3.anarazel.de --- src/include/storage/standby.h | 1 - src/backend/storage/ipc/procarray.c | 105 ++ src/backend/storage/ipc/standby.c | 1 - src/backend/utils/misc/guc_tables.c | 9 -- src/backend/utils/misc/postgresql.conf.sample | 1 - src/bin/pg_upgrade/server.c | 5 +- doc/src/sgml/config.sgml | 35 -- doc/src/sgml/high-availability.sgml | 19 +--- 8 files changed, 11 insertions(+), 165 deletions(-) diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h index 2effdea126f..4bc23adf516 100644 --- a/src/include/storage/standby.h +++ b/src/include/storage/standby.h @@ -21,7 +21,6 @@ #include "storage/standbydefs.h" /* User-settable GUC parameters */ -extern PGDLLIMPORT int vacuum_defer_cleanup_age; extern PGDLLIMPORT int max_standby_archive_delay; extern PGDLLIMPORT int max_standby_streaming_delay; extern PGDLLIMPORT bool log_recovery_conflict_waits; diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index ea91ce355f2..106b184a3e6 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -367,9 +367,6 @@ static inline void ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId l static void ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid); static void MaintainLatestCompletedXid(TransactionId latestXid); static void MaintainLatestCompletedXidRecovery(TransactionId latestXid); -static void TransactionIdRetreatSafely(TransactionId *xid, - int retreat_by, - FullTransactionId rel); static inline FullTransactionId FullXidRelativeTo(FullTransactionId rel, TransactionId xid); @@ -1709,10 +1706,7 @@ TransactionIdIsActive(TransactionId xid) * do about that --- data is only protected if the walsender 
runs continuously * while queries are executed on the standby. (The Hot Standby code deals * with such cases by failing standby queries that needed to access - * already-removed data, so there's no integrity bug.) The computed values - * are also adjusted with vacuum_defer_cleanup_age, so increasing that setting - * on the fly is another easy way to make horizons move backwards, with no - * consequences for data integrity. + * already-removed data, so there's no integrity bug.) * * Note: the approximate horizons (see definition of GlobalVisState) are * updated by the computations done here. That's currently required for @@ -1877,50 +1871,11 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) TransactionIdOlder(h->data_oldest_nonremovable, kaxmin); /* temp relations cannot be accessed in recovery */ } - else - { - /* - * Compute the cutoff XID by subtracting vacuum_defer_cleanup_age. - * - * vacuum_defer_cleanup_age provides some additional "slop" for the - * benefit of hot standby queries on standby servers. This is quick - * and dirty, and perhaps not all that useful unless the primary has a - * predictable transaction rate, but it offers some protection when - * there's no walsender connection. Note that we are assuming - * vacuum_defer_cleanup_age isn't large enough to cause wraparound --- - * so guc.c should limit it to no
Re: Should we remove vacuum_defer_cleanup_age?
On Sat, Mar 18, 2023 at 10:33:57AM +0100, Alvaro Herrera wrote: > On 2023-Mar-17, Andres Freund wrote: > > > I started writing a test for vacuum_defer_cleanup_age while working on the > > fix > > referenced above, but now I am wondering if said energy would be better > > spent > > removing vacuum_defer_cleanup_age alltogether. > > +1 I agree it's not useful anymore. > > > I don't think I have the cycles to push this through in the next weeks, but > > if > > we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a > > good idea to mark it as deprecated in 16? > > Hmm, for the time being, can we just "disable" it by disallowing to set > the GUC to a value different from 0? Then we can remove the code later > in the cycle at leisure. It can be useful to do a "rolling transition", and it's something I do often. But I can't see why that would be useful here? It seems like something that could be done after the feature freeze. It's removing a feature, not adding one. -- Justin
Re: Add pg_walinspect function with block info columns
On Wed, Mar 22, 2023 at 11:35 AM Melanie Plageman wrote: > > On Fri, Mar 17, 2023 at 8:51 PM Michael Paquier wrote: > > > > On Fri, Mar 17, 2023 at 04:36:58PM -0700, Peter Geoghegan wrote: > > > I'm sure that they will do that much more than they would have > > > otherwise. Since we'll have made pg_get_wal_block_info() so much more > > > useful than pg_get_wal_records_info() for many important use cases. > > > Why is that a bad thing? Are you concerned about the overhead of > > > pulling in FPIs when pg_get_wal_block_info() is run, if Bharath's > > > patch is committed? That could be a problem, I suppose -- but it would > > > be good to get more data on that. Do you think that this will be much > > > of an issue, Bharath? > > > > Yes. The CPU cost is one thing, but I am also worrying about the > > I/O cost with a tuplestore spilling to disk a large number of FPIs, > > and some workloads can generate WAL so as FPIs is what makes for most > > of the contents stored in the WAL. (wal_compression is very effective > > in such cases, for example.) > > I had done some analysis about CPU costs for decompressing FPI upthread > in [1], finding that adding a parameter to allow skipping outputting FPI > would not have much impact when FPI are compressed, as decompressing the > images comprised very little of the overall time. > > After reading what you said, I was interested to see how substantial the > I/O cost with non-compressed FPI would be. > > Using a patch with a parameter to pg_get_wal_block_info() to skip > outputting FPI, I found that on a fast local nvme ssd, the timing > difference between doing so and not still isn't huge -- 9 seconds when > outputting the FPI vs 8.5 seconds when skipping outputting FPI. (with > ~50,000 records all with non-compressed FPIs). > > However, perhaps obviously, the I/O cost is worse. 
> Doing nothing but > > SELECT * FROM pg_get_wal_block_info(:start_lsn, :end_lsn, true) > where fpi is not null; Sorry, I should have been more clear: similar results with a select list simply excluding fpi and no where clause. - Melanie
Re: Bytea PL/Perl transform
>Wednesday, 22 March 2023, 12:45 +03:00 from Daniel Gustafsson :
>
>> On 18 Mar 2023, at 23:25, Иван Панченко < w...@mail.ru > wrote:
>>
>> Hi,
>> PostgreSQL passes bytea arguments to PL/Perl functions as hexadecimal
>> strings, which is not only inconvenient, but also memory and time consuming.
>> So I decided to propose a simple transform extension to pass bytea as native
>> Perl octet strings.
>> Please find the patch attached.
>
>Thanks for the patch, I recommend registering this in the currently open
>Commitfest to make sure it's kept track of:
>
>https://commitfest.postgresql.org/43/

Thanks, done: https://commitfest.postgresql.org/43/4252/

>
>--
>Daniel Gustafsson
>

--
Ivan
Re: Non-superuser subscription owners
On Wed, Mar 8, 2023 at 2:47 PM Andres Freund wrote: > Hm - it still feels wrong that we error out in case of failure, despite the > comment to the function saying: > * Returns NULL on error and fills the err with palloc'ed error message. I've amended the comment so that it explains why it's done that way. > Other than this, the change looks ready to me. Well, it still needed documentation changes and pg_dump changes. I've added those in the attached version. If nobody's too unhappy with the idea, I plan to commit this soon, both because I think that the feature is useful, and also because I think it's an important security improvement. Since replication is currently run as the subscription owner, any table owner can compromise the subscription owner's account, which is really bad, but if the subscription owner can be a non-superuser, it's a little bit less bad. From a security point of view, I think the right thing to do and what would improve security a lot more is to run replication as the table owner rather than the subscription owner. I've posted a patch for that at http://postgr.es/m/CA+TgmoaSCkg9ww9oppPqqs+9RVqCexYCE6Aq=usypfnoode...@mail.gmail.com and AFAICT everyone agrees with the idea, even if the patch itself hasn't yet attracted any code reviews. But although the two patches are fairly closely related, this seems to be a good idea whether that moves forward or not, and that seems to be a good idea whether this moves forward or not. As this one has had more review and discussion, my current thought is to try to get this one committed first. -- Robert Haas EDB: http://www.enterprisedb.com v5-0001-Add-new-predefined-role-pg_create_subscriptions.patch Description: Binary data
Re: [PATCH] Report the query string that caused a memory error under Valgrind
On 31.01.23 15:00, Onur Tirtir wrote:

We use Valgrind --together with the suppression file provided in the Postgres repo-- to test the Citus extension against memory errors. We replace the /bin/postgres executable with a simple bash script that executes the original postgres executable under Valgrind and then we run our usual regression tests. However, it is quite hard to understand which query caused a memory error in the stack traces that have been dumped into the valgrind logfile. For this reason, we propose the attached patch to allow Valgrind to report the query string that caused a memory error right after the relevant stack trace. I believe this would not only be useful for Citus but also for Postgres and other extensions in their valgrind-testing process.

I can see how this could be useful. But this only handles queries using the simple protocol. At least the extended protocol should be handled as well. Maybe it would be better to move this up to PostgresMain() and handle all protocol messages?
Re: Add pg_walinspect function with block info columns
On Fri, Mar 17, 2023 at 8:51 PM Michael Paquier wrote: > > On Fri, Mar 17, 2023 at 04:36:58PM -0700, Peter Geoghegan wrote: > > I'm sure that they will do that much more than they would have > > otherwise. Since we'll have made pg_get_wal_block_info() so much more > > useful than pg_get_wal_records_info() for many important use cases. > > Why is that a bad thing? Are you concerned about the overhead of > > pulling in FPIs when pg_get_wal_block_info() is run, if Bharath's > > patch is committed? That could be a problem, I suppose -- but it would > > be good to get more data on that. Do you think that this will be much > > of an issue, Bharath? > > Yes. The CPU cost is one thing, but I am also worrying about the > I/O cost with a tuplestore spilling to disk a large number of FPIs, > and some workloads can generate WAL so as FPIs is what makes for most > of the contents stored in the WAL. (wal_compression is very effective > in such cases, for example.) I had done some analysis about CPU costs for decompressing FPI upthread in [1], finding that adding a parameter to allow skipping outputting FPI would not have much impact when FPI are compressed, as decompressing the images comprised very little of the overall time. After reading what you said, I was interested to see how substantial the I/O cost with non-compressed FPI would be. Using a patch with a parameter to pg_get_wal_block_info() to skip outputting FPI, I found that on a fast local nvme ssd, the timing difference between doing so and not still isn't huge -- 9 seconds when outputting the FPI vs 8.5 seconds when skipping outputting FPI. (with ~50,000 records all with non-compressed FPIs). However, perhaps obviously, the I/O cost is worse. Doing nothing but SELECT * FROM pg_get_wal_block_info(:start_lsn, :end_lsn, true) where fpi is not null; per iostat, the write latency was double for the query which output fpi from the one that didn't and the wkB/s was much higher. 
This is probably obvious, but I'm just wondering if it makes sense to have such a parameter to avoid impacting a system which is doing concurrent I/O with walinspect. I have had use for block info without seeing the FPIs, personally. - Melanie [1] https://www.postgresql.org/message-id/CAAKRu_bJvbcYBRj2cN6G2xV7B7-Ja%2BpjTO1nEnEhRR8OXYiABA%40mail.gmail.com
Re: meson: Non-feature feature options
On 22.03.23 11:16, Nazir Bilal Yavuz wrote: Hi, On Wed, 15 Mar 2023 at 11:12, Peter Eisentraut wrote: On 14.03.23 15:07, Nazir Bilal Yavuz wrote: I think the uuid side of this is making this way too complicated. I'm content leaving this as a manual option for now. There is much more value in making the ssl option work automatically. So I would welcome a patch that just makes -Dssl=auto work smoothly, perhaps using the "trick" that Andres described. I tried to implement what we did for ssl to uuid as well, do you have any comments? For the uuid option, we have three different choices. What should be the search order and why? Docs [1] say that: OSSP uuid library is not well maintained, and is becoming increasingly difficult to port to newer platforms; so we can put 'uuid-ossp' to the end. Between 'uuid-e2fs' and 'uuid-bsd', I believe 'uuid-e2fs' is used more often than 'uuid-bsd'. Hence, they can be ordered as 'uuid-e2fs', 'uuid-bsd', 'uuid-ossp'. Does that make sense? I think that's fair.
Re: ResourceOwner refactoring
Hi, I noticed that the patchset didn't make much progress since February and decided to give it another round of code review. > [...]. But in general, end-of-transaction activities should be kept > simple, especially between the release phases, so I feel that having to > remember extra resources there is a bad code smell and we shouldn't > encourage that. +1 > I added a test module in src/test/modules/test_resowner to test those cases. This is much appreciated, as well as extensive changes made to READMEs and the comments. > [...] New patchset attached. > I added it as a separate patch on top of the previous patches, as > v13-0003-Release-resources-in- > priority-order.patch, to make it easier to > see what changed after the previous patchset version. But it should be > squashed with patch 0002 before committing. My examination, which besides reading the code included running it under sanitizer and checking the code coverage, didn't reveal any major problems with the patchset. Certain "should never happen in practice" scenarios seem not to be test-covered in resowner.c, particularly: ``` elog(ERROR, "ResourceOwnerEnlarge called after release started"); elog(ERROR, "ResourceOwnerRemember called but array was full"); elog(ERROR, "ResourceOwnerForget called for %s after release started", kind->name); elog(ERROR, "%s %p is not owned by resource owner %s", elog(ERROR, "ResourceOwnerForget called for %s after release started", kind->name); elog(ERROR, "lock reference %p is not owned by resource owner %s" ``` I didn't check whether these or similar code paths were tested before the patch and I don't have a strong opinion on whether we should test these scenarios. Personally I'm OK with the fact that these few lines are not covered with tests. The following procedures are never executed: * RegisterResourceReleaseCallback() * UnregisterResourceReleaseCallback() And are never actually called anymore due to changes in 0005. Previously they were used by contrib/pgcrypto. 
I suggest dropping this part of the API since it seems to be redundant now. This will simplify the implementation even further. This patch, although moderately complicated, was moved between several commitfests. I think it would be great if it made it to PG16. I'm inclined to change the status of the patchset to RfC in a bit, unless anyone has a second opinion. -- Best regards, Aleksander Alekseev
Re: User functions for building SCRAM secrets
On 3/22/23 2:48 AM, Michael Paquier wrote: On Tue, Feb 14, 2023 at 06:16:18PM -0500, Jonathan S. Katz wrote: I opted for the approach in [2]. v5 contains the branching logic for the UTF8 only tests, and the corresponding output files. I tested locally on macOS against both UTF8 + C locales. I was reading this thread again, and pondered on this particular point: https://www.postgresql.org/message-id/caawbhmhjcfc4oaga_7yluhtj6j+rxey+bodrygzndaflgfz...@mail.gmail.com We've had our share of complains over the years that Postgres logs password data in the logs with various DDLs, so I'd tend to agree that this is not a practice we should try to encourage more. The parameterization of the SCRAM verifiers through GUCs (like Daniel's https://commitfest.postgresql.org/42/4201/ for the iteration number) is more promising because it is possible to not have to send the password over the wire with once we let libpq take care of the computation, and the server would not know about that I generally agree with not allowing password data to be in logs, but in practice, there are a variety of tools and extensions that obfuscate or remove passwords from PostgreSQL logs. Additionally, this function is not targeted for SQL statements directly, but stored procedures. For example, an extension like "pg_tle" exposes the ability for someone to write a "check_password_hook" directly from PL/pgSQL[1] (and other languages). As we've made it a best practice to pre-hash the password on the client-side, a user who wants to write a check password hook against a SCRAM verifier needs to be able to compare the verifier against some existing set of plaintext criteria, and has to write their own function to do it. I have heard several users who have asked to do this, and the only feedback I can give them is "implement your own SCRAM build secret function." And, if my PostgreSQL server _is_ the client, e.g. 
it's making a dblink call to another PostgreSQL server, the only way it can modify a password is by sending the plaintext credential over the wire. I don't see how the parameterization work applies here -- would we allow salts to be parameterized? -- and it still would not allow the server to build out a SCRAM secret for these cases. Maybe I'm not conveying the problem this is solving -- I'm happy to go one more round trying to make it clearer -- but if this is not clear, it'd be good to at least develop an alternative approach to this before withdrawing the patch. Thanks, Jonathan [1] https://github.com/aws/pg_tle/blob/main/docs/06_plpgsql_examples.md#example-password-check-hook-against-bad-password-dictionary
Re: CREATE DATABASE ... STRATEGY WAL_LOG issues
On Wed, Mar 22, 2023 at 1:12 AM Andres Freund wrote: > Patch with the two minimal fixes attached. As we don't know whether it's worth > changing the strategy, the more minimal fixes seem more appropriate. LGTM. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Initial Schema Sync for Logical Replication
On Wed, Mar 22, 2023 at 2:16 PM Amit Kapila wrote: > > On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada wrote: > > > > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila wrote: > > > > > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira wrote: > > > > > > > You should > > > > exclude them removing these objects from the TOC before running > > > > pg_restore or > > > > adding a few pg_dump options to exclude these objects. Another issue is > > > > related > > > > to different version. Let's say the publisher has a version ahead of the > > > > subscriber version, a new table syntax can easily break your logical > > > > replication setup. IMO pg_dump doesn't seem like a good solution for > > > > initial > > > > synchronization. > > > > > > > > Instead, the backend should provide infrastructure to obtain the > > > > required DDL > > > > commands for the specific (set of) tables. This can work around the > > > > issues from > > > > the previous paragraph: > > > > > > > ... > > > > * don't need to worry about different versions. > > > > > > > > > > AFAICU some of the reasons why pg_dump is not allowed to dump from the > > > newer version are as follows: (a) there could be more columns in the > > > newer version of the system catalog and then Select * type of stuff > > > won't work because the client won't have knowledge of additional > > > columns. (b) the newer version could have new features (represented by > > > say new columns in existing catalogs or new catalogs) that the older > > > version of pg_dump has no knowledge of and will fail to get that data > > > and hence an inconsistent dump. The subscriber will easily be not in > > > sync due to that. > > > > > > Now, how do we avoid these problems even if we have our own version of > > > functionality similar to pg_dump for selected objects? I guess we will > > > face similar problems. > > > > Right. I think that such functionality needs to return DDL commands > > that can be executed on the requested version. 
> > > > > If so, we may need to deny schema sync in any such case. > > > > Yes. Do we have any concrete use case where the subscriber is an older > > version, in the first place? > > > > As per my understanding, it is mostly due to the reason that it can > work today. Today, during an off-list discussion with Jonathan on this > point, he pointed me to a similar incompatibility in MySQL > replication. See the "SQL incompatibilities" section in doc[1]. Also, > please note that this applies not only to initial sync but also to > schema sync during replication. I don't think it would be feasible to > keep such cross-version compatibility for DDL replication. Makes sense to me. > Having said above, I don't intend that we must use pg_dump from the > subscriber for the purpose of initial sync. I think the idea at this > stage is to primarily write a POC patch to see what difficulties we > may face. The other options that we could try out are (a) try to > duplicate parts of pg_dump code in some way (by extracting required > code) for the subscription's initial sync, or (b) have a common code > (probably as a library or some other way) for the required > functionality. There could be more possibilities that we may not have > thought of yet. But the main point is that for approaches other than > using pg_dump, we should consider ways to avoid duplicity of various > parts of its code. Due to this, I think before ruling out using > pg_dump, we should be clear about its risks and limitations. > > Thoughts? > Agreed. My biggest concern about approaches other than using pg_dump is the same; the code duplication that could increase the maintenance costs. We should clarify what points of using pg_dump is not a good idea, and also analyze alternative ideas in depth. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2023-03-17 21:23, torikoshia wrote: On 2023-03-07 18:09, Daniel Gustafsson wrote: On 7 Mar 2023, at 09:35, Damir Belyalov wrote: I felt just logging "Error: %ld" would make people wonder the meaning of the %ld. Logging something like "Error: %ld data type errors were found" might be clearer. Thanks. For more clarity, change the message to: "Errors were found: %". I'm not convinced that this adds enough clarity to assist the user. We also shouldn't use "error" in a WARNING log since the user has explicitly asked to skip rows on error, so it's not an error per se. +1 How about something like: ereport(WARNING, (errmsg("%ld rows were skipped due to data type incompatibility", cstate->ignored_errors), errhint("Skipped rows can be inspected in the database log for reprocessing."))); Since skipped rows cannot be inspected in the log when log_error_verbosity is set to terse, it might be better without this errhint. Removed errhint. Modified some code since v3 could no longer be applied to HEAD. Also modified the v3 patch as below: 65 + if (cstate->opts.ignore_datatype_errors) 66 + cstate->ignored_errors = 0; 67 + It seems unnecessary since cstate is initialized by palloc0() in BeginCopyFrom(). 134 + ereport(LOG, 135 + errmsg("%s", cstate->escontext.error_data->message)); 136 + 137 + return true; Since LOG means 'Reports information of interest to administrators' according to the manual[1], a datatype error should not be logged as LOG. I put it back to WARNING. [1] https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
From 6764d7e0f21ca266d7426cb922fd00e5138ec857 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Wed, 22 Mar 2023 22:00:15 +0900 Subject: [PATCH v4] Add new COPY option IGNORE_DATATYPE_ERRORS Add new COPY option IGNORE_DATATYPE_ERRORS. Currently the entire COPY fails even when there is a single value with an unexpected data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors, skips the rows containing them, and copies the data that does not contain problems. This patch uses the soft error handling infrastructure, which was introduced by d9f7f5d32f20. Author: Damir Belyalov, Atsushi Torikoshi --- doc/src/sgml/ref/copy.sgml | 12 src/backend/commands/copy.c | 8 src/backend/commands/copyfrom.c | 20 src/backend/commands/copyfromparse.c | 19 +++ src/backend/parser/gram.y| 8 +++- src/include/commands/copy.h | 1 + src/include/commands/copyfrom_internal.h | 3 +++ src/include/parser/kwlist.h | 1 + src/test/regress/expected/copy2.out | 15 +++ src/test/regress/sql/copy2.sql | 12 10 files changed, 94 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 5e591ed2e6..168b1c05d9 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -34,6 +34,7 @@ COPY { table_name [ ( format_name FREEZE [ boolean ] +IGNORE_DATATYPE_ERRORS [ boolean ] DELIMITER 'delimiter_character' NULL 'null_string' HEADER [ boolean | MATCH ] @@ -234,6 +235,17 @@ COPY { table_name [ ( + +IGNORE_DATATYPE_ERRORS + + + Drops rows that contain malformed data while copying. These are rows + with columns where the data type's input-function raises an error. + Outputs warnings about rows with incorrect data to system logfile. 
+ + + + DELIMITER diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index f14fae3308..02d911abbe 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate, bool format_specified = false; bool freeze_specified = false; bool header_specified = false; + bool ignore_datatype_errors_specified = false; ListCell *option; /* Support external use for option sanity checking */ @@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate, freeze_specified = true; opts_out->freeze = defGetBoolean(defel); } + else if (strcmp(defel->defname, "ignore_datatype_errors") == 0) + { + if (ignore_datatype_errors_specified) +errorConflictingDefElem(defel, pstate); + ignore_datatype_errors_specified = true; + opts_out->ignore_datatype_errors = defGetBoolean(defel); + } else if (strcmp(defel->defname, "delimiter") == 0) { if (opts_out->delim) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 80bca79cd0..85c47f54b2 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -953,6 +953,7 @@ CopyFrom(CopyFromState cstate) { TupleTableSlot *myslot; bool skip_tuple; + ErrorSaveContext
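The option's intended behaviour — run each value through its column type's input function, skip and count the rows that fail instead of aborting, and emit a single summary WARNING at the end — can be modelled outside the server. The following is a standalone Python sketch of that behaviour for illustration only; it is not the patch's C implementation, and the names are made up:

```python
import logging

def copy_rows(rows, column_types, ignore_datatype_errors=False):
    """Toy model of COPY ... WITH (IGNORE_DATATYPE_ERRORS): each value is
    run through its column's input function; with the option set, rows
    that fail conversion are skipped and counted instead of aborting."""
    inserted = []
    ignored_errors = 0
    for raw in rows:
        try:
            inserted.append(tuple(conv(v) for conv, v in zip(column_types, raw)))
        except ValueError:
            if not ignore_datatype_errors:
                raise  # default behaviour: the entire COPY fails
            ignored_errors += 1  # soft error: skip this row and keep going
    if ignored_errors:
        # Mirrors the WARNING wording discussed upthread.
        logging.warning("%d rows were skipped due to data type incompatibility",
                        ignored_errors)
    return inserted, ignored_errors

rows = [("1", "a"), ("oops", "b"), ("3", "c")]
good, skipped = copy_rows(rows, (int, str), ignore_datatype_errors=True)
```

Without `ignore_datatype_errors=True`, the second row would raise and the whole "copy" would fail, matching the default COPY behaviour.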
Re: HOT chain validation in verify_heapam()
On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev wrote: > The patch needed a rebase due to a4f23f9b. PFA v12. I have committed this after tidying up a bunch of things in the test case file that I found too difficult to understand -- or in some cases just incorrect, like: elsif ($offnum == 35) { -# set xmax to invalid transaction id. $tup->{t_xmin} = $in_progress_xid; $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED; push @expected, -- Robert Haas EDB: http://www.enterprisedb.com
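The xmax/xmin matching rule quoted at the top of the thread — two tuples are linked into the same update chain only when the older tuple's update xid equals the newer tuple's xmin — can be sketched as a deliberately simplified standalone model (Python, illustrative only; redirects, line pointer flags, and multixacts are ignored, and `InvalidTransactionId` is 0 as in PostgreSQL):

```python
INVALID_XID = 0  # InvalidTransactionId

def find_predecessors(tuples):
    """tuples: offset -> dict with 'xmin', 'xmax', and 'next' (ctid target
    offset, or None).  Returns successor_offset -> predecessor_offset,
    linking two tuples only when the predecessor's xmax equals the
    successor's xmin, mirroring the check quoted from verify_heapam()."""
    predecessor = {}
    for offnum, tup in tuples.items():
        nxt = tup["next"]
        if nxt is None or nxt not in tuples:
            continue
        curr_xmax = tup["xmax"]
        next_xmin = tuples[nxt]["xmin"]
        # An invalid or mismatching xmax means the two tuples are not part
        # of the same update chain (e.g. the chain was broken by pruning).
        if curr_xmax == INVALID_XID or curr_xmax != next_xmin:
            continue
        predecessor[nxt] = offnum
    return predecessor

chain = {
    1: {"xmin": 100, "xmax": 200, "next": 2},            # updated by xid 200
    2: {"xmin": 200, "xmax": INVALID_XID, "next": None},  # current version
    3: {"xmin": 150, "xmax": INVALID_XID, "next": 2},     # stale ctid, no link
}
```

Only offset 1 is recorded as the predecessor of offset 2; offset 3 points at 2 but has an invalid xmax, so no chain link is made.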
Re: Make EXPLAIN generate a generic plan for a parameterized query
Thanks for the review! On Tue, 2023-03-21 at 16:32 +0100, Christoph Berg wrote: > I have some comments: > > > This allows EXPLAIN to generate generic plans for parameterized statements > > (that have parameter placeholders like $1 in the statement text). > > > + > > + GENERIC_PLAN > > + > > + > > + Generate a generic plan for the statement (see > linkend="sql-prepare"/> > > + for details about generic plans). The statement can contain > > parameter > > + placeholders like $1 (but then it has to be a > > statement > > + that supports parameters). This option cannot be used together with > > + ANALYZE, since a statement with unknown parameters > > + cannot be executed. > > Like in the commit message quoted above, I would put more emphasis on > "parameterized query" here: > > Allow the statement to contain parameter placeholders like > $1 and generate a generic plan for it. > This option cannot be used together with ANALYZE. I went with Allow the statement to contain parameter placeholders like $1 and generate a generic plan for it. See for details about generic plans and the statements that support parameters. This option cannot be used together with ANALYZE. > > + /* check that GENERIC_PLAN is not used with EXPLAIN ANALYZE */ > > + if (es->generic && es->analyze) > > + ereport(ERROR, > > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > + errmsg("EXPLAIN ANALYZE cannot be used > > with GENERIC_PLAN"))); > > To put that in line with the other error messages in that context, I'd > inject an extra "option": > > errmsg("EXPLAIN option ANALYZE cannot be used with GENERIC_PLAN"))); Done. > > --- a/src/test/regress/sql/explain.sql > > +++ b/src/test/regress/sql/explain.sql > > [...] > > +create extension if not exists postgres_fdw; > > "create extension postgres_fdw" cannot be used from src/test/regress/ > since contrib/ might not have been built. Ouch. Good catch. 
> I suggest leaving this test in place here, but with local tables (to > show that plan time pruning using the one provided parameter works), > and add a comment here explaining that is being tested: > > -- create a partition hierarchy to show that plan time pruning removes > -- the key1=2 table but generates a generic plan for key2=$1 I did that, with a different comment. > The test involving postgres_fdw is still necessary to exercise the new > EXEC_FLAG_EXPLAIN_GENERIC code path, but needs to be moved elsewhere, > probably src/test/modules/. Tests for postgres_fdw are in contrib/postgres_fdw/sql/postgres_fdw.sql, so I added the test there. Version 9 of the patch is attached. Yours, Laurenz Albe From 85aa88280069ca2befe7f4308d7e6f724cdb143a Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Wed, 22 Mar 2023 14:08:49 +0100 Subject: [PATCH] Add EXPLAIN option GENERIC_PLAN This allows EXPLAIN to generate generic plans for parameterized statements (that have parameter placeholders like $1 in the statement text). Invent a new executor flag EXEC_FLAG_EXPLAIN_GENERIC that disables runtime partition pruning for such plans: that would fail if the non-existing parameters are involved, and we want to show all subplans anyway. 
Author: Laurenz Albe Reviewed-by: Julien Rouhaud, Christoph Berg, Michel Pelletier, Jim Jones Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel%40cybertec.at --- .../postgres_fdw/expected/postgres_fdw.out| 30 + contrib/postgres_fdw/sql/postgres_fdw.sql | 25 +++ doc/src/sgml/ref/explain.sgml | 14 +++ src/backend/commands/explain.c| 11 + src/backend/executor/execMain.c | 3 ++ src/backend/executor/execPartition.c | 9 ++-- src/backend/parser/analyze.c | 29 + src/include/commands/explain.h| 1 + src/include/executor/executor.h | 18 +--- src/test/regress/expected/explain.out | 42 +++ src/test/regress/sql/explain.sql | 24 +++ 11 files changed, 197 insertions(+), 9 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 04a3ef450c..25b91ab2e1 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -11783,3 +11783,33 @@ ANALYZE analyze_table; -- cleanup DROP FOREIGN TABLE analyze_ftable; DROP TABLE analyze_table; +-- === +-- test EXPLAIN (GENERIC_PLAN) with foreign partitions +-- === +-- this is needed to exercise the EXEC_FLAG_EXPLAIN_GENERIC flag +CREATE TABLE gen_part ( +key1 integer NOT NULL, +key2 integer NOT NULL +) PARTITION BY LIST (key1); +CREATE TABLE gen_part_1 +PARTITION OF
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Rebased after conflicts with bfc9497ece01c7c45437bc36387cb1ebe346f4d2 Also included the fix for feedback from Daniel on patch 2, which he had shared in the load balancing thread. On Wed, 15 Mar 2023 at 09:49, Jelte Fennema wrote: > > The rebase was indeed trivial (git handled everything automatically), > because my first patch was doing a superset of the changes that were > committed in b6dfee28f. Attached are the new patches. > > On Tue, 14 Mar 2023 at 19:04, Greg Stark wrote: > > > > On Tue, 14 Mar 2023 at 13:59, Tom Lane wrote: > > > > > > "Gregory Stark (as CFM)" writes: > > > > It looks like this needs a big rebase in fe-auth.c, fe-auth-scram.c and > > > > fe-connect.c. Every hunk is failing which perhaps means the code > > > > you're patching has been moved or refactored? > > > > > > The cfbot is giving up after > > > v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails, > > > but that's been superseded (at least in part) by b6dfee28f. > > > > Ah, same with Jelte Fennema's patch for load balancing in libpq. > > > > -- > > greg v16-0003-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v16-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v16-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch Description: Binary data v16-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data v16-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data
Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch
> While not the fault of this patch I find it confusing that we mix > and for marking up "postgres_fdw", the latter seemingly more correct > (and less commonly used) than . I think we traditionally use for an extension module (file) name. It seems the is used when we want to refer to objects other than files. Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese: http://www.sraoss.co.jp
Re: [EXTERNAL] Support load balancing in libpq
Rebased patch after conflicts with bfc9497ece01c7c45437bc36387cb1ebe346f4d2 v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data v14-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch Description: Binary data v14-0003-Support-load-balancing-in-libpq.patch Description: Binary data v14-0005-Remove-unnecessary-check-from-Fisher-Yates-imple.patch Description: Binary data v14-0004-Add-DNS-based-libpq-load-balancing-test.patch Description: Binary data
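The load-balancing patch series shuffles the candidate host list with a Fisher-Yates shuffle (one of the later patches in the series adjusts that implementation). A minimal Python sketch of the algorithm as applied to a host list — purely illustrative, not the libpq C code:

```python
import random

def shuffle_hosts(hosts):
    """In-place Fisher-Yates shuffle: every one of the n! orderings is
    equally likely, so connection attempts spread evenly across hosts."""
    for i in range(len(hosts) - 1, 0, -1):
        j = random.randint(0, i)  # 0 <= j <= i; j == i keeps hosts[i] put
        hosts[i], hosts[j] = hosts[j], hosts[i]
    return hosts

hosts = ["db1:5432", "db2:5432", "db3:5432"]
shuffle_hosts(hosts)  # random order, same set of hosts
```

Note that the loop must allow `j == i` (the element staying in place); restricting `j` to strictly smaller indices would bias the shuffle.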
Re: Avoid use deprecated Windows Memory API
On Wed, Mar 22, 2023 at 7:01 AM Daniel Gustafsson wrote: > > On 19 Mar 2023, at 23:41, Michael Paquier wrote: > > > > On Fri, Mar 17, 2023 at 12:19:56PM -0300, Ranier Vilela wrote: > >> Rebased to latest Head. > > > > I was looking at this thread, and echoing Daniel's and Alvaro's > > arguments, I don't quite see why this patch is something we need. I > > would recommend to mark it as rejected and move on. > > Unless the claimed performance improvement is measured and provides a > speedup, > and the loss of zeroing memory is guaranteed safe, there doesn't seem to be > much value provided. > At no time was it suggested that there would be performance gains. The patch proposes to adopt the API that the vendor asks you to use. However, I see no point in discussing a defunct patch. regards, Ranier Vilela
Re: Request for comment on setting binary format output per session
If I recall the protocol-extension design correctly, such a setting could only be set at session start, which could be annoying --- at the very least we'd have to tolerate entries for unrecognized data types, since clients couldn't be expected to have checked the list against the current server in advance. The protocol extension design has the drawback that it can only be set at startup. What if we were to allow changes to the setting after startup if the client passed the cancel key as a unique identifier that only the driver would know? Dave Cramer
Re: Request for comment on setting binary format output per session
If there's some extension that offers type "mytype", and perhaps allows it to be installed in any schema, then it seems that the client library would know how to parse all instances of "mytype" regardless of the schema or search_path. I may be overthinking this. Dave Cramer On Tue, 21 Mar 2023 at 17:47, Jeff Davis wrote: > On Tue, 2023-03-21 at 09:22 -0400, Dave Cramer wrote: > > As Jeff mentioned there is a visibility problem if the search path is > > changed. The simplest solution IMO is to look up the OID at the time > > the format is requested and use the OID going forward to format the > > output as binary. If the search path changes and a type with the same > > name is now first in the search path then the data would be returned > > in text. > > The binary format parameter would ordinarily be set by the maintainer > of the client library, who knows nothing about the schema the client > might be accessing, and nothing about the search_path that might be > set. They would only know which binary parsers they've already written > and included with their client library. > > With that in mind, using search_path at all seems weird. Why would a > change in search_path affect which types the client library knows how > to parse? If the client library knows how to parse "foo.mytype"'s > binary representation, and you change the search path such that it > finds "bar.mytype" instead, did the client library all of a sudden > forget how to parse "foo.mytype" and learn to parse "bar.mytype"? > > If there's some extension that offers type "mytype", and perhaps allows > it to be installed in any schema, then it seems that the client library > would know how to parse all instances of "mytype" regardless of the > schema or search_path. > > Of course, a potential problem is that ordinary users can create types > (e.g. 
enum types) and so you'd have to be careful about some tricks > where someone shadows a well-known extension in order to confuse the > client with unexpected binary data (not sure if that's a security > concern or not, just thinking out loud). > > One solution might be that unqualified type names would work on all > types of that name (in any schema) that are owned by a superuser, > regardless of search_path. Most extension scripts will be run as > superuser anyway. It would feel a little magical, which I don't like, > but would work in any practical case I can think of. > > Another solution would be to have some extra catalog field in pg_type > that would be a "binary format identifier" and use that rather than the > type name to match up binary parsers with the proper type. > > Am I over-thinking this? > > Regards, > Jeff Davis > >
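The shadowing hazard discussed above can be made concrete with a toy model: a client-side registry of binary parsers keyed by a bare type name cannot distinguish two same-named types in different schemas, whereas resolving the name once, at registration time, pins the parser to one specific type — the behaviour Dave suggests. This Python sketch is purely illustrative; the catalog contents and OIDs are invented:

```python
# Toy catalog: OID -> (schema, type name); two types share the name "mytype".
catalog = {
    70001: ("foo", "mytype"),
    70002: ("bar", "mytype"),
}

def resolve_once(search_path, typename):
    """Resolve a bare type name to an OID at the time the binary format is
    requested; later search_path changes no longer affect which type the
    client's registered binary parser is bound to."""
    for schema in search_path:
        for oid, (sch, name) in catalog.items():
            if sch == schema and name == typename:
                return oid
    return None

oid = resolve_once(["foo", "public"], "mytype")  # binds to foo.mytype
# Even if the session later switches to search_path = bar, public,
# the registered parser still targets foo.mytype via its OID.
assert oid == 70001
```

A bare-name lookup repeated on every row, by contrast, would silently switch to `bar.mytype` after the search_path change — exactly the confusion described in the thread.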
Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch
> On 22 Mar 2023, at 12:58, Etsuro Fujita wrote: > > On Fri, Feb 17, 2023 at 5:45 PM Etsuro Fujita wrote: >> Here is a small patch to improve the note, which was added by commit >> 97da48246 ("Allow batch insertion during COPY into a foreign table."), >> by adding an explanation about how the actual number of rows >> postgres_fdw inserts at once is determined in the COPY case, including >> a limitation that does not apply to the INSERT case. > > Does anyone want to comment on this? Patch looks good to me, but I agree with Tatsuo downthread that "similar way to the insert case" reads better. Theoretically the number could be different from 1000 if MAX_BUFFERED_TUPLES was changed in the build, but that's a non-default not worth spending time explaining. + the actual number of rows postgres_fdw copies at While not the fault of this patch I find it confusing that we mix and for marking up "postgres_fdw", the latter seemingly more correct (and less commonly used) than . -- Daniel Gustafsson
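The rule being documented — during COPY, postgres_fdw inserts at most the configured batch size per round trip, clamped by the number of tuples COPY buffers at a time — amounts to a single `min()`. A hedged Python sketch (1000 is MAX_BUFFERED_TUPLES in a default build, as noted above; the function name is invented):

```python
MAX_BUFFERED_TUPLES = 1000  # compile-time constant in a default build

def effective_copy_batch_size(batch_size):
    """Number of rows postgres_fdw actually inserts at once during COPY:
    the batch_size option, limited by COPY's internal tuple buffer."""
    return min(batch_size, MAX_BUFFERED_TUPLES)

assert effective_copy_batch_size(100) == 100    # option governs
assert effective_copy_batch_size(5000) == 1000  # buffer limit governs
```

In the plain INSERT case no such clamp applies, which is the limitation the doc patch calls out.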
Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch
> On Fri, Feb 17, 2023 at 5:45 PM Etsuro Fujita wrote: >> Here is a small patch to improve the note, which was added by commit >> 97da48246 ("Allow batch insertion during COPY into a foreign table."), >> by adding an explanation about how the actual number of rows >> postgres_fdw inserts at once is determined in the COPY case, including >> a limitation that does not apply to the INSERT case. > > Does anyone want to comment on this? > > - This option also applies when copying into foreign tables. > + This option also applies when copying into foreign tables. In that > case > + the actual number of rows postgres_fdw copies at > + once is determined in a similar way to in the insert case, but it is "similar way to in" should be "similar way to", maybe? > + limited to at most 1000 due to implementation restrictions of the > + COPY command. > > > Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese: http://www.sraoss.co.jp
Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch
On Fri, Feb 17, 2023 at 5:45 PM Etsuro Fujita wrote: > Here is a small patch to improve the note, which was added by commit > 97da48246 ("Allow batch insertion during COPY into a foreign table."), > by adding an explanation about how the actual number of rows > postgres_fdw inserts at once is determined in the COPY case, including > a limitation that does not apply to the INSERT case. Does anyone want to comment on this? Best regards, Etsuro Fujita
Re: Comment in preptlist.c
On Wed, Mar 22, 2023 at 4:50 PM David Rowley wrote: > And now it just clicked with me why Tom left this. Sorry for stepping > on your toes here. No problem at all. Best regards, Etsuro Fujita
Re: Question: Do we have a rule to use "PostgreSQL" and "PostgreSQL" separately?
> On 22 Mar 2023, at 04:19, Hayato Kuroda (Fujitsu) > wrote: > I have also grepped to detect another wrong markups, and I think at least > "PostgreSQL" should be changed. PSA the patch. I agree with that analysis, this instance should be marked up with but not the other ones. I'll go ahead with your patch after some testing. -- Daniel Gustafsson
Re: Save a few bytes in pg_attribute
On Wed, 22 Mar 2023 at 10:42, Peter Eisentraut wrote: > > On 21.03.23 18:46, Andres Freund wrote: > > FWIW, I think we should consider getting rid of attcacheoff. I doubt it's > > worth its weight these days, because deforming via slots starts at the > > beginning anyway. The overhead of maintaining it is not insubstantial, and > > it's just architecturally ugly to to update tupledescs continually. > > Btw., could attcacheoff be int16? I had the same thought in '21, and in the patch linked upthread[0] I added an extra comment on the field: > + Note: Although the maximum offset encountered in stored tuples is > + limited to the max BLCKSZ (2**15), FormData_pg_attribute is used for > + all internal tuples as well, so attcacheoff may be larger for those > + tuples, and it is therefore not safe to use int16. So, we can't reduce its size while we use attcacheoff for (de)serialization of tuples with up to MaxAttributeNumber (=INT16_MAX) of attributes which each can be larger than one byte (such as in tuplestore, tuplesort, spilling hash aggregates, ...) Kind regards, Matthias van de Meent [0] https://www.postgresql.org/message-id/caeze2wh8-metsryzx_ubj-uv6kb+2ynzhaejmedubjhmgus...@mail.gmail.com
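The arithmetic behind the quoted comment is simple enough to check directly. This Python sketch is illustrative only (the 8-byte per-attribute width is an assumption, chosen just to show the overflow):

```python
INT16_MAX = 2**15 - 1   # 32767; the mail also cites 2**15 as max BLCKSZ

# A stored heap tuple never needs a cached offset beyond the block size,
# so int16 would suffice for on-disk tuples:
DEFAULT_BLCKSZ = 8192
assert DEFAULT_BLCKSZ - 1 <= INT16_MAX

# But internal tuples (tuplestore, tuplesort, hash-agg spills) can be far
# wider: e.g. 5000 attributes of 8 bytes each already needs an offset
# that does not fit in int16.
offset_of_last_attr = 5000 * 8
assert offset_of_last_attr > INT16_MAX
```

This is why attcacheoff cannot safely be narrowed to int16 while the same struct serves both stored and internal tuples.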
Re: [BUG] pg_stat_statements and extended query protocol
Hi, On 3/21/23 2:16 PM, Imseih (AWS), Sami wrote: This indeed feels a bit more natural seen from here, after looking at the code paths using an Instrumentation in the executor and explain, for example. At least, this stresses me much less than adding 16 bytes to EState for something restricted to the extended protocol when it comes to monitoring capabilities. Attached is the patch that uses Instrumentation. Thanks, I think this new approach makes sense. - const BufferUsage *bufusage, + int64 calls, const BufferUsage *bufusage, What about using an uint64 for calls? That seems more appropriate to me (even if queryDesc->totaltime->calls will be passed (which is int64), but that's already also the case for the "rows" argument and queryDesc->totaltime->rows_processed) @@ -88,6 +88,8 @@ typedef struct Instrumentation double nfiltered2; /* # of tuples removed by "other" quals */ BufferUsage bufusage; /* total buffer usage */ WalUsagewalusage; /* total WAL usage */ + int64 calls; /* # of total calls to ExecutorRun */ + int64 rows_processed; /* # of total rows processed in ExecutorRun */ I'm not sure it's worth mentioning that the new counters are "currently" used with the ExecutorRun. What about just "total calls" and "total rows processed" (or "total rows", see below)? Also, I wonder if "rows" (and not rows_processed) would not be a better naming. Those last comments regarding the Instrumentation are done because ISTM that at the end their usage could vary depending of the use case of the Instrumentation. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Initial Schema Sync for Logical Replication
> From: Amit Kapila > Sent: Wednesday, March 22, 2023 5:16 AM > To: Masahiko Sawada > Cc: Euler Taveira ; Kumar, Sachin > ; Alvaro Herrera ; pgsql- > hack...@lists.postgresql.org; Jonathan S. Katz > Subject: RE: [EXTERNAL] Initial Schema Sync for Logical Replication > > On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada > wrote: > > > > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila > wrote: > > > > > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira > wrote: > > > > > > > You should > > > > exclude them removing these objects from the TOC before running > > > > pg_restore or adding a few pg_dump options to exclude these > > > > objects. Another issue is related to different version. Let's say > > > > the publisher has a version ahead of the subscriber version, a new > > > > table syntax can easily break your logical replication setup. IMO > > > > pg_dump doesn't seem like a good solution for initial synchronization. > > > > > > > > Instead, the backend should provide infrastructure to obtain the > > > > required DDL commands for the specific (set of) tables. This can > > > > work around the issues from the previous paragraph: > > > > > > > ... > > > > * don't need to worry about different versions. > > > > > > > > > > AFAICU some of the reasons why pg_dump is not allowed to dump from > > > the newer version are as follows: (a) there could be more columns in > > > the newer version of the system catalog and then Select * type of > > > stuff won't work because the client won't have knowledge of > > > additional columns. (b) the newer version could have new features > > > (represented by say new columns in existing catalogs or new > > > catalogs) that the older version of pg_dump has no knowledge of and > > > will fail to get that data and hence an inconsistent dump. 
The > > > subscriber will easily be not in sync due to that. > > > > > > Now, how do we avoid these problems even if we have our own version > > > of functionality similar to pg_dump for selected objects? I guess we > > > will face similar problems. > > > > Right. I think that such functionality needs to return DDL commands > > that can be executed on the requested version. > > > > > If so, we may need to deny schema sync in any such case. > > > > Yes. Do we have any concrete use case where the subscriber is an older > > version, in the first place? > > > > As per my understanding, it is mostly due to the reason that it can work > today. Today, during an off-list discussion with Jonathan on this point, he > pointed me to a similar incompatibility in MySQL replication. See the "SQL > incompatibilities" section in doc[1]. Also, please note that this applies not > only to initial sync but also to schema sync during replication. I don't > think it > would be feasible to keep such cross-version compatibility for DDL > replication. > > Having said above, I don't intend that we must use pg_dump from the > subscriber for the purpose of initial sync. I think the idea at this stage is > to > primarily write a POC patch to see what difficulties we may face. The other > options that we could try out are (a) try to duplicate parts of pg_dump code > in some way (by extracting required > code) for the subscription's initial sync, or (b) have a common code (probably > as a library or some other way) for the required functionality. There could be > more possibilities that we may not have thought of yet. But the main point is > that for approaches other than using pg_dump, we should consider ways to > avoid duplicity of various parts of its code. Due to this, I think before > ruling > out using pg_dump, we should be clear about its risks and limitations. > > Thoughts? 
There is one more thing that needs to be considered even if we use pg_dump/pg_restore: we still need a way to get the CREATE TABLE statement for tables, if we want to support concurrent DDLs on the publisher. >8. TableSync process should check the state of table , if it is >SUBREL_STATE_CREATE it should >get the latest definition from the publisher and recreate the table. (We have >to recreate >the table even if there are no changes). Then it should go into copy table >mode as usual. Unless there is a different way to support concurrent DDLs, or we go for blocking the publisher till the initial sync is completed. Regards Sachin > > [1] - https://dev.mysql.com/doc/refman/8.0/en/replication- > compatibility.html > [2] - https://www.postgresql.org/message- > id/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ- > uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com > > -- > With Regards, > Amit Kapila.
RE: Allow logical replication to copy tables in binary format
Dear Amit, hackers, > The patch looks mostly good to me. However, I have one > question/comment as follows: > > - > + > binary (boolean) > > > To allow references to the binary option, we add the varlistentry id > here. It looks slightly odd to me to add id for just one entry, see > commit 78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5 where we have > purposefully added ids to allow future references. Shall we add id to > other options as well on this page? I have analyzed same points and made patch that could be applied atop v19-0001. Please check 0002 patch. Best Regards, Hayato Kuroda FUJITSU LIMITED v19-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: v19-0001-Allow-logical-replication-to-copy-table-in-binary.patch v19-0002-Add-XML-ID-attributes-to-create_subscription.sgm.patch Description: v19-0002-Add-XML-ID-attributes-to-create_subscription.sgm.patch
Re: Commitfest 2023-03 starting tomorrow!
> On 22 Mar 2023, at 10:39, Peter Eisentraut > wrote: > Personally, if a patch isn't rebased up to the minute, that doesn't bother me at all. It's easy to check out as of when the email was sent (or extra bonus points for using git format-patch --base). Now, rebasing every month or so is nice, but daily rebases during a commit fest are almost more distracting than just leaving it. +1. As long as the patch is rebased and builds/tests green when the CF starts I'm not too worried about not having it always rebased during the CF. If resolving the conflicts is non-trivial then of course, but if only to stay recent and avoid fuzz in applying then it's more distracting. -- Daniel Gustafsson
Re: About a recently-added permission-related error message
On Mon, 20 Mar 2023 17:05:41 +0900 (JST) Kyotaro Horiguchi wrote: > I found an error message added by de4d456b406bf502341ef526710d3f764b41e2c8. > > When I incorrectly configured the primary_conninfo with the wrong > user, I received the following message on the server logs of both > servers involved in a physical replcation set. > > [27022:walsender] FATAL: permission denied to start WAL sender > [27022:walsender] DETAIL: Only roles with the REPLICATION attribute may > start a WAL sender process. > > I'm not sure if adding the user name in the log prefix is a common > practice, but without it, the log line might not have enough > information. Unlike other permission-related messages, this message is > not the something human operators receive in response to their > actions. It seems similar to connection authorization logs where the > user name is important. So, I'd like to propose the following > alternative. I am not sure whether this change is necessary because the error message will appear in the log of the standby server and users can easily know the connection user just by checking primary_conninfo. > [27022:walsender] DETAIL: The connection user "r1" requires the REPLICATION > attribute. However, if we need this change, how about using "DETAIL: The connection user "r1" must have the REPLICATION attribute." This pattern is used in other part like check_object_ownership() and AlterRole(). The user name is not included there, though. Regards, Yugo Nagata > What do you think about this change? > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center -- Yugo NAGATA
Re: meson: Non-feature feature options
Hi, On Wed, 15 Mar 2023 at 11:12, Peter Eisentraut wrote: > > On 14.03.23 15:07, Nazir Bilal Yavuz wrote: > >> I think the uuid side of this is making this way too complicated. I'm > >> content leaving this as a manual option for now. > >> > >> There is much more value in making the ssl option work automatically. > >> So I would welcome a patch that just makes -Dssl=auto work smoothly, > >> perhaps using the "trick" that Andres described. > > I tried to implement what we did for ssl to uuid as well, do you have > > any comments? > > For the uuid option, we have three different choices. What should be > the search order and why? Docs [1] say that: OSSP uuid library is not well maintained, and is becoming increasingly difficult to port to newer platforms; so we can put 'uuid-ossp' to the end. Between 'uuid-e2fs' and 'uuid-bsd', I believe 'uuid-e2fs' is used more often than 'uuid-bsd'. Hence, they can be ordered as 'uuid-e2fs', 'uuid-bsd', 'uuid-ossp'. Does that make sense? Regards, Nazir Bilal Yavuz Microsoft [1] https://www.postgresql.org/docs/current/uuid-ossp.html
Re: Issue attaching a table to a partitioned table with an auto-referenced foreign key
One last ping, hoping someone will have more time now than in January. Perhaps my test is wrong, but I'd like to know why. Thanks.

Le mar. 17 janv. 2023 à 16:53, Guillaume Lelarge a écrit :
> Quick ping, just to make sure someone can get a look at this issue :)
> Thanks.
>
> Le ven. 6 janv. 2023 à 11:07, Guillaume Lelarge a écrit :
>> Hello,
>>
>> One of our customers has an issue with partitions and foreign keys. He
>> works with v13, but the issue is also present on v15.
>>
>> I attach a SQL script showing the issue, and the results on 13.7, 13.9,
>> and 15.1. But I'll explain the script here, and its behaviour on 13.9.
>>
>> There is one partitioned table, two partitions, and a foreign key. The
>> foreign key references the same table:
>>
>> create table t1 (
>>   c1 bigint not null,
>>   c1_old bigint null,
>>   c2 bigint not null,
>>   c2_old bigint null,
>>   primary key (c1, c2)
>> )
>> partition by list (c1);
>> create table t1_a partition of t1 for values in (1);
>> create table t1_def partition of t1 default;
>> alter table t1 add foreign key (c1_old, c2_old) references t1 (c1, c2)
>>   on delete restrict on update restrict;
>>
>> I have a SQL function that shows me some information from pg_constraint
>> (the code of the function is in the SQL script attached).
Here is the result of
>> this function after creating the table, its partitions, and its foreign key:
>>
>> select * from show_constraints();
>>         conname         |   t    |  tref  |       coparent
>> ------------------------+--------+--------+-----------------------
>>  t1_c1_old_c2_old_fkey  | t1     | t1     |
>>  t1_c1_old_c2_old_fkey  | t1_a   | t1     | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey  | t1_def | t1     | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey1 | t1     | t1_a   | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey2 | t1     | t1_def | t1_c1_old_c2_old_fkey
>> (5 rows)
>>
>> The constraint works great:
>>
>> insert into t1 values (1, NULL, 2, NULL);
>> insert into t1 values (2, 1, 2, 2);
>> delete from t1 where c1 = 1;
>> psql:ticket15010_v3.sql:34: ERROR: update or delete on table "t1_a"
>> violates foreign key constraint "t1_c1_old_c2_old_fkey1" on table "t1"
>> DETAIL: Key (c1, c2)=(1, 2) is still referenced from table "t1".
>>
>> This error is normal, since the row I want to delete is referenced by the
>> other row.
>>
>> If I try to detach the partition, it also gives me an error:
>>
>> alter table t1 detach partition t1_a;
>> psql:ticket15010_v3.sql:36: ERROR: removing partition "t1_a" violates
>> foreign key constraint "t1_c1_old_c2_old_fkey1"
>> DETAIL: Key (c1_old, c2_old)=(1, 2) is still referenced from table "t1".
>>
>> Sounds good to me too (well, I'd like it to be smarter and see that the
>> constraint would still hold after the detach, but I can understand why it
>> won't allow it).
>>
>> The pg_constraint content didn't change, of course:
>>
>> select * from show_constraints();
>>         conname         |   t    |  tref  |       coparent
>> ------------------------+--------+--------+-----------------------
>>  t1_c1_old_c2_old_fkey  | t1     | t1     |
>>  t1_c1_old_c2_old_fkey  | t1_a   | t1     | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey  | t1_def | t1     | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey1 | t1     | t1_a   | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey2 | t1     | t1_def | t1_c1_old_c2_old_fkey
>> (5 rows)
>>
>> Now, I'll delete the whole table contents, and I'll detach the partition:
>>
>> delete from t1;
>> alter table t1 detach partition t1_a;
>>
>> It seems to be working, but the content of pg_constraint is weird:
>>
>> select * from show_constraints();
>>         conname         |   t    |  tref  |       coparent
>> ------------------------+--------+--------+-----------------------
>>  t1_c1_old_c2_old_fkey  | t1     | t1     |
>>  t1_c1_old_c2_old_fkey  | t1_a   | t1     |
>>  t1_c1_old_c2_old_fkey  | t1_def | t1     | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey2 | t1     | t1_def | t1_c1_old_c2_old_fkey
>> (4 rows)
>>
>> I understand why the ('t1_c1_old_c2_old_fkey1', 't1', 't1_a',
>> 't1_c1_old_c2_old_fkey') tuple has gone, but I don't understand why the
>> ('t1_c1_old_c2_old_fkey', 't1_a', 't1', NULL) tuple is still there.
>>
>> Anyway, I attach the partition:
>>
>> alter table t1 attach partition t1_a for values in (1);
>>
>> But pg_constraint has not changed:
>>
>> select * from show_constraints();
>>         conname         |   t    |  tref  |       coparent
>> ------------------------+--------+--------+-----------------------
>>  t1_c1_old_c2_old_fkey  | t1     | t1     |
>>  t1_c1_old_c2_old_fkey  | t1_a   | t1     | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey  | t1_def | t1     | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey2 | t1     | t1_def | t1_c1_old_c2_old_fkey
>> (4 rows)
>>
>> I was expecting to see the fifth tuple coming back, but alas, no.
>>
>> And as a result, the foreign key doesn't work anymore:
>>
>> insert into t1
RE: Data is copied twice when specifying both child and parent table in publication
On Wed, Mar 22, 2023 at 14:32 PM Kuroda, Hayato/黒田 隼人 wrote:
> Dear Wang,
>
> Thank you for updating the patch! The following are comments on v19-0001.

Thanks for your comments.

> 01. logical-replication.sgml
>
> I found the following statement in logical-replication.sgml. I think this
> may cause misreading, because it's OK when publishers list partitions and
> publish_via_root is true.
>
> ```
>    A subscriber node may have multiple subscriptions if desired. It is
>    possible to define multiple subscriptions between a single
>    publisher-subscriber pair, in which case care must be taken to ensure
>    that the subscribed publication objects don't overlap.
> ```
>
> How about adding "If publications set publish_via_partition_root to true
> and they publish partitions that have the same partitioned table, only a
> change to the partitioned table is published from the publisher." or
> something like that?

I think these are two different scenarios: the scenario mentioned here is
multiple subscriptions on the subscriber node, while the scenario we fixed
this time is a single subscription on the subscriber node. So, it seems
that these two notes are not strongly related.

> 02. filter_partitions
>
> IIUC this function can be refactored like the following to avoid the
> "skip" flag. What do you think?
>
> ```
> @@ -209,7 +209,6 @@ filter_partitions(List *table_infos)
>
>      foreach(lc, table_infos)
>      {
> -        bool        skip = false;
>          List       *ancestors = NIL;
>          ListCell   *lc2;
>          published_rel *table_info = (published_rel *) lfirst(lc);
> @@ -224,13 +223,10 @@ filter_partitions(List *table_infos)
>              /* Is ancestor exists in the published table list? */
>              if (is_ancestor_member_tableinfos(ancestor, table_infos))
>              {
> -                skip = true;
> +                table_infos = foreach_delete_current(table_infos, lc);
>                  break;
>              }
>          }
> -
> -        if (skip)
> -            table_infos = foreach_delete_current(table_infos, lc);
>      }
> }
> ```

I think this approach deletes a cell of the outer loop's list from inside
the inner loop.
IIUC, foreach_delete_current() can only be used inside a loop to delete that
same loop's current cell.

> 03. fetch_table_list
>
> ```
> +    /* Get the list of tables from the publisher. */
> +    if (server_version >= 16)
> ```
>
> I think a boolean variable can be used to check this, like check_columnlist.
> How about "use_extended_function" or something?

Since we only need it once, I think it's fine not to add a new variable.

Regards,
Wang Wei
RE: Data is copied twice when specifying both child and parent table in publication
On Wed, Mar 22, 2023 at 12:50 PM Peter Smith wrote:
> Here are some review comments for patch code of HEAD_v19-0001

Thanks for your comments.

> ==
> doc/src/sgml/ref/create_publication.sgml
>
> 1.
> +
> +   There can be a case where a subscription combines multiple
> +   publications. If a root partitioned table is published by any
> +   subscribed publications which set
> +   publish_via_partition_root = true, changes on this
> +   root partitioned table (or on its partitions) will be published using
> +   the identity and schema of this root partitioned table rather than
> +   that of the individual partitions.
> +
>
> 1a.
> The paragraph prior to this one just refers to "partitioned tables"
> instead of "root partitioned table", so IMO we should continue with
> the same terminology.

Changed.

> I also modified the remaining text slightly. AFAIK my suggestion
> conveys exactly the same information but is shorter.
>
> SUGGESTION
> There can be a case where one subscription combines multiple
> publications. If any of those publications has set
> publish_via_partition_root = true, then changes in
> a partitioned table (or on its partitions) will be published using the
> identity and schema of the partitioned table.

Sorry, I'm not sure about this. I'm not a native speaker of English, but it
seems that the following use case is not covered by this wording:

```
create table t1 (a int primary key) partition by range (a);
create table t2 (a int primary key) partition by range (a);
create table t3 (a int primary key);
alter table t1 attach partition t2 default;
alter table t2 attach partition t3 default;
create publication p1 for table t1;
create publication p2_via for table t2 with(publish_via_partition_root);
create publication p3 for table t3;
```

If we subscribe to p1, p2_via and p3 at the same time, then t2's identity
and schema will be used instead of t1's (and of course not t3's).

> ~
>
> 1b.
> Shouldn't that paragraph (or possibly somewhere in the CREATE
> SUBSCRIPTION notes?)
> also explain that in this scenario the logical
> replication will only publish one set of changes? After all, this is
> the whole point of the patch, but I am not sure if the user will know
> of this behaviour from the current documentation.

It seems to me that what you're explaining is what users already expect, so
we don't need to spell it out. BTW, IIUC, when a user wants to use the
"via_root" option, they should first read the documentation to confirm the
meaning of this option and the related notes. So, I'm not sure that adding
this to other documentation is needed; it may be redundant.

> ==
> src/backend/catalog/pg_publication.c
>
> 2. filter_partitions
>
> BEFORE:
> static void
> filter_partitions(List *table_infos)
> {
>     ListCell   *lc;
>
>     foreach(lc, table_infos)
>     {
>         bool        skip = false;
>         List       *ancestors = NIL;
>         ListCell   *lc2;
>         published_rel *table_info = (published_rel *) lfirst(lc);
>
>         if (get_rel_relispartition(table_info->relid))
>             ancestors = get_partition_ancestors(table_info->relid);
>
>         foreach(lc2, ancestors)
>         {
>             Oid         ancestor = lfirst_oid(lc2);
>
>             /* Is ancestor exists in the published table list? */
>             if (is_ancestor_member_tableinfos(ancestor, table_infos))
>             {
>                 skip = true;
>                 break;
>             }
>         }
>
>         if (skip)
>             table_infos = foreach_delete_current(table_infos, lc);
>     }
> }
>
> ~
>
> 2a.
> My previous review [1] (see #1) suggested putting most code within the
> condition. AFAICT my comment is still applicable but was not yet
> addressed.

Personally, I prefer the current style, because the approach you mentioned
adds a level of indentation.

> 2b.
> IMO the comment "/* Is ancestor exists in the published table list? */"
> is unnecessary, because the purpose is already clear from the name of the
> function "is_ancestor_member_tableinfos".

Removed.

> ==
> src/backend/commands/subscriptioncmds.c
>
> 3.
> fetch_table_list(WalReceiverConn *wrconn, List *publications)
> {
>     WalRcvExecResult *res;
> -   StringInfoData cmd;
> +   StringInfoData cmd,
> +                  pub_names;
>     TupleTableSlot *slot;
>     Oid         tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
>     List       *tablelist = NIL;
> -   bool        check_columnlist = (walrcv_server_version(wrconn) >= 15);
> +   int         server_version = walrcv_server_version(wrconn);
> +   bool        check_columnlist = (server_version >= 15);
> +
> +   initStringInfo(&pub_names);
> +   get_publications_str(publications, &pub_names, true);
>
>     initStringInfo(&cmd);
> -   appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");
>
> -   /* Get column lists for each relation if the publisher supports it */
> -   if (check_columnlist)
> -       appendStringInfoString(&cmd, ", t.attnames\n");
> +   /* Get the list of tables from the publisher. */
> +   if (server_version >= 16)
> +   {
>
> ~
>
> I think the 'pub_names' is only needed within that ">= 16" condition.
> So all the below code can be moved into that scope, can't it?
>
> +
Re: Avoid use deprecated Windows Memory API
> On 19 Mar 2023, at 23:41, Michael Paquier wrote:
>
> On Fri, Mar 17, 2023 at 12:19:56PM -0300, Ranier Vilela wrote:
>> Rebased to latest Head.
>
> I was looking at this thread, and echoing Daniel's and Alvaro's
> arguments, I don't quite see why this patch is something we need. I
> would recommend marking it as rejected and moving on.

Unless the claimed performance improvement is measured and provides a
speedup, and the loss of memory zeroing is guaranteed safe, there doesn't
seem to be much value provided.

--
Daniel Gustafsson
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 22, 2023 at 9:00 AM shiy.f...@fujitsu.com wrote:
>
> On Wed Mar 22, 2023 7:29 AM Peter Smith wrote:
> >
> > Thanks for all the patch updates. Patch v19 LGTM.
> >
> +1

The patch looks mostly good to me. However, I have one question/comment as
follows:

-
+
      binary (boolean)

To allow references to the binary option, we add the varlistentry id here.
It looks slightly odd to me to add an id for just one entry; see commit
78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5, where we have purposefully added
ids to allow future references. Shall we add ids to the other options on
this page as well?

--
With Regards,
Amit Kapila.
Re: gcc 13 warnings
On 18.03.23 00:54, Andres Freund wrote:
>> I think a good compromise would be buildtype=debugoptimized, which is
>> -O2 with debug symbols, which also sort of matches the default in the
>> autoconf world.
>
> Looks like that'd result in a slightly worse build with msvc, as afaict
> we wouldn't end up with /OPT:REF, which automatically gets disabled if
> /DEBUG is specified. I guess we can live with that.

I looked up what /OPT:REF does
(https://learn.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=msvc-170),
and it seems pretty obscure to me, at least for development builds.
Re: Bytea PL/Perl transform
> On 18 Mar 2023, at 23:25, Иван Панченко wrote: > > Hi, > PostgreSQL passes bytea arguments to PL/Perl functions as hexadecimal > strings, which is not only inconvenient, but also memory and time consuming. > So I decided to propose a simple transform extension to pass bytea as native > Perl octet strings. > Please find the patch attached. Thanks for the patch, I recommend registering this in the currently open Commitfest to make sure it's kept track of: https://commitfest.postgresql.org/43/ -- Daniel Gustafsson
Re: Save a few bytes in pg_attribute
On 21.03.23 18:46, Andres Freund wrote:
> FWIW, I think we should consider getting rid of attcacheoff. I doubt it's
> worth its weight these days, because deforming via slots starts at the
> beginning anyway. The overhead of maintaining it is not insubstantial, and
> it's just architecturally ugly to update tupledescs continually.

Btw., could attcacheoff be int16?
Re: Commitfest 2023-03 starting tomorrow!
On 21.03.23 10:59, Alvaro Herrera wrote:
> This led me to suggesting that perhaps we need to be more lenient when it
> comes to new contributors. As I said, for seasoned contributors, it's not
> a problem to keep up with our requirements, however silly they are. But
> people who spend their evenings for a whole week or month trying to
> understand how to write a patch for the one thing they want, only to be
> met with six months of silence followed by a constant influx of "please
> rebase please rebase please rebase", no useful feedback, and termination
> with "eh, you haven't rebased for the 1001th time, your patch has been
> WoA for X days, we're setting it RwF, feel free to return next year" ...
> they are most certainly put off and will *not* try again next year.

Personally, a patch not being rebased up to the minute doesn't bother me at
all. It's easy to check out the tree as of when the email was sent (with
extra bonus points for using git format-patch --base). Now, rebasing every
month or so is nice, but daily rebases during a commitfest are almost more
distracting than just leaving it.