Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Mar 20, 2023 at 9:34 PM Masahiko Sawada wrote:
> On Mon, Mar 20, 2023 at 9:34 PM John Naylor wrote:
> > That's an interesting idea, and the analogous behavior to aset could be a
> > good thing for readability and maintainability. Worth seeing if it's
> > workable.
>
> I've attached a quick hack patch. It can be applied on top of v32
> patches. The changes to dsa.c are straightforward since it makes the
> initial and max block sizes configurable.

Good to hear -- this should probably be proposed in a separate thread for wider visibility.

-- 
John Naylor
EDB: http://www.enterprisedb.com
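For readers unfamiliar with the aset behavior being borrowed here: allocation sets grow their block sizes geometrically between a configurable initial size and a maximum size. The following standalone sketch illustrates that policy; the names and struct are hypothetical stand-ins, not the actual dsa.c or aset.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizing-policy sketch -- not the real dsa.c structures. */
typedef struct
{
	size_t		init_block_size;	/* configurable initial block size */
	size_t		max_block_size;		/* configurable cap on block size */
	size_t		next_block_size;	/* size the next allocation will use */
} block_sizing;

static void
block_sizing_init(block_sizing *s, size_t init_size, size_t max_size)
{
	s->init_block_size = init_size;
	s->max_block_size = max_size;
	s->next_block_size = init_size;
}

/*
 * Return the size to use for the next block, then double the size for the
 * following one, clamped to the configured maximum -- the aset-style
 * geometric growth referred to above.
 */
static size_t
next_block_size(block_sizing *s)
{
	size_t		result = s->next_block_size;

	if (s->next_block_size < s->max_block_size)
		s->next_block_size *= 2;
	if (s->next_block_size > s->max_block_size)
		s->next_block_size = s->max_block_size;
	return result;
}
```

With a 1 kB initial size and an 8 kB cap, successive blocks come out as 1 kB, 2 kB, 4 kB, 8 kB, 8 kB, ...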
Re: Eliminating SPI from RI triggers - take 2
Hi Greg,

On Tue, Mar 21, 2023 at 3:54 AM Gregory Stark (as CFM) wrote:
> On Mon, 17 Oct 2022 at 14:59, Robert Haas wrote:
> > But I think the bigger problem for this patch set is that the
> > design-level feedback from
> > https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com
> > hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
> > is still trivial in v7, and that still seems wrong to me. And I still
> > don't know how we're going to avoid changing the semantics in ways
> > that are undesirable, or even knowing precisely what we did change. If
> > we don't have answers to those questions, then I suspect that this
> > patch set isn't going anywhere.
>
> Amit, do you plan to work on this patch for this commitfest (and
> therefore this release?). And do you think it has a realistic chance
> of being ready for commit this month?

Unfortunately, I don't think so.

> It looks to me like you have some good feedback and can progress and
> are unlikely to finish this patch for this release. In which case
> maybe we can move it forward to the next release?

Yes, that's what I am thinking too at this point.

I agree with Robert's point that changing the implementation from an SQL query plan to a hand-rolled C function is going to change the semantics in some known and perhaps many unknown ways. Until I have enumerated all those semantic changes, it's hard to judge whether the hand-rolled implementation is correct to begin with. I had started doing that a few months back but couldn't keep up due to some other work.

An example I had found of a thing that would be broken by taking the executor out of the equation, as the patch does, is the behavior of an update under READ COMMITTED isolation, whereby a PK tuple being checked for existence is concurrently updated and thus needs to be rechecked to see whether it still satisfies the RI query's conditions.
The executor has the EvalPlanQual() mechanism to do that, but while the hand-rolled implementation did refactor ExecLockRows() to allow doing the tuple-locking without a PlanState, it gave no consideration to handling rechecking under READ COMMITTED isolation. There may be other such things and I think I'd better look for them carefully in the next cycle than in the next couple of weeks for this release. My apologies that I didn't withdraw the patch sooner. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: psql: Add role's membership options to the \du+ command
I would suggest tweaking the test output to include regress_du_admin ...

I found (with the help of cfbot) a difficulty with this. The problem is the bootstrap superuser name (oid=10). This name depends on the OS username. In my case it's pal, but in most cases it's postgres or something else. And the output of \du regress_du_admin can't be predicted:

\du regress_du_admin
                              List of roles
    Role name     | Attributes  |              Member of
------------------+-------------+--------------------------------------
 regress_du_admin | Create role | regress_du_role0 from pal (a, i, s)+
                  |             | regress_du_role1 from pal (a, i, s)+
                  |             | regress_du_role2 from pal (a, i, s)

So, I decided not to include regress_du_admin in the test output. Please, see version 5 attached. Only tests changed.

- 
Pavel Luzanov

From b8f35733126a843edd47a1f89da0d9f8babeec93 Mon Sep 17 00:00:00 2001
From: Pavel Luzanov
Date: Tue, 21 Mar 2023 05:58:31 +0300
Subject: [PATCH v5] psql: show membership options in the \du command

Format for the "Member of" column completely redesigned. Shown within each
row, in newline-separated format, are the memberships granted to the role.
The presentation includes both the name of the grantor as well as the
membership permissions (in an abbreviated format: a for admin option,
i for inherit option, s for set option.) The word 'empty' is printed in
the case that none of those permissions are granted.
---
 doc/src/sgml/ref/psql-ref.sgml     | 30 ---
 src/bin/psql/describe.c            | 61 --
 src/test/regress/expected/psql.out | 49 
 src/test/regress/sql/psql.sql      | 30 +++
 4 files changed, 144 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7b8ae9fac3..03e7da93de 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1727,9 +1727,18 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
-If the form \dg+ is used, additional information
-is shown about each role; currently this adds the comment for each
-role.
+
+
+Shown within each row, in newline-separated format, are the memberships granted to
+the role. The presentation includes both the name of the grantor
+as well as the membership permissions (in an abbreviated format:
+a for admin option, i for inherit option,
+s for set option.) The word empty is printed in
+the case that none of those permissions are granted.
+See the GRANT command for their meaning.
+
+
+If the form \dg+ is used the comment attached to the role is shown.

@@ -1969,9 +1978,18 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
-If the form \du+ is used, additional information
-is shown about each role; currently this adds the comment for each
-role.
+
+
+Shown within each row, in newline-separated format, are the memberships granted to
+the role. The presentation includes both the name of the grantor
+as well as the membership permissions (in an abbreviated format:
+a for admin option, i for inherit option,
+s for set option.) The word empty is printed in
+the case that none of those permissions are granted.
+See the GRANT command for their meaning.
+
+
+If the form \du+ is used the comment attached to the role is shown.
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 99e28f607e..9f7b7326e9 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3631,24 +3631,42 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
 	printfPQExpBuffer(, "SELECT r.rolname, r.rolsuper, r.rolinherit,\n"
 					  " r.rolcreaterole, r.rolcreatedb, r.rolcanlogin,\n"
-					  " r.rolconnlimit, r.rolvaliduntil,\n"
-					  " ARRAY(SELECT b.rolname\n"
-					  "FROM pg_catalog.pg_auth_members m\n"
-					  "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
-					  "WHERE m.member = r.oid) as memberof");
+					  " r.rolconnlimit, r.rolvaliduntil, r.rolreplication,\n");

-	if (verbose)
-	{
-		appendPQExpBufferStr(, "\n, pg_catalog.shobj_description(r.oid, 'pg_authid') AS description");
-		ncols++;
-	}
-	appendPQExpBufferStr(, "\n, r.rolreplication");

+	if (pset.sversion >= 16)
+		appendPQExpBufferStr(,
+							 " (SELECT
Re: Track IO times in pg_stat_io
Hi,

On 2023-03-16 17:19:16 -0400, Melanie Plageman wrote:
> > > > I wonder if we should get rid of pgStatBlockReadTime,
> > > > pgStatBlockWriteTime,
> > >
> > > And then have pg_stat_reset_shared('io') reset pg_stat_database IO
> > > stats?
> >
> > Yes.
>
> I think this makes sense but I am hesitant to do it in this patchset,
> because it feels a bit hidden...maybe?

I'd not do it in the same commit, but I don't see a problem with doing it in the same patchset. Now that I think about it again, this wouldn't make pg_stat_reset_shared('io') affect pg_stat_database - I was thinking we should use pgstat_io.c stats to provide the information for pgstat_database.c, using its own pending counter.

> > No, I don't think I am suggesting that. What I am trying to suggest is that
> > PendingIOStats should contain instr_time, but that PgStat_IO should contain
> > PgStat_Counter as microseconds, as before.
>
> So, I've modified the code to make a union of instr_time and
> PgStat_Counter in PgStat_BktypeIO. I am not quite sure if this is okay.
> I store in microsec and then in pg_stat_io, I multiply to get
> milliseconds for display.

Not a fan - what do we gain by having this union? It seems considerably cleaner to have a struct local to pgstat_io.c that uses instr_time and have a clean type in PgStat_BktypeIO. In fact, the code worked after just changing that.

I don't think it makes sense to have pgstat_io_start()/end() as well as pgstat_count_io*. For one, the name seems in a "too general namespace" - why not a pgstat_count*?

> I considered refactoring pgstat_io_end() to use INSTR_TIME_ACCUM_DIFF()
> like [1], but, in the end I actually think I would end up with more
> operations because of the various different counters needing to be
> updated. As it is now, I do a single subtract and a few adds (one for
> each of the different statistics objects tracking IO times
> (pgBufferUsage, pgStatBlockWrite/ReadTime). Whereas, I would need to do
> an accum diff for every one of those.
Right - INSTR_TIME_ACCUM_DIFF() only makes sense if there's just a single counter to update.

WRT:

	/* TODO: AFAICT, pgstat_count_buffer_write_time is only called */
	/* for shared buffers whereas pgstat_count_buffer_read_time is */
	/* called for temp relations and shared buffers. */

	/*
	 * is this intentional and should I match current behavior or
	 * not?
	 */

It's hard to see how that behaviour could be intentional. Probably worth fixing in a separate patch. I don't think we're going to backpatch, but it would make this clearer nonetheless.

Incremental patch with some of the above changed attached.

Btw, it's quite nice how one now can attribute time more easily:

20 connections copying an 8MB file 50 times each:

SELECT reuses, evictions, writes, write_time, extends, extend_time
FROM pg_stat_io
WHERE backend_type = 'client backend'
  AND io_object = 'relation' AND io_context = 'bulkwrite';

┌────────┬───────────┬────────┬────────────┬─────────┬─────────────┐
│ reuses │ evictions │ writes │ write_time │ extends │ extend_time │
├────────┼───────────┼────────┼────────────┼─────────┼─────────────┤
│  36112 │         0 │  36112 │        141 │ 1523176 │        8676 │
└────────┴───────────┴────────┴────────────┴─────────┴─────────────┘

20 connections copying an 80MB file 5 times each:

┌─────────┬───────────┬─────────┬────────────┬─────────┬─────────────┐
│  reuses │ evictions │  writes │ write_time │ extends │ extend_time │
├─────────┼───────────┼─────────┼────────────┼─────────┼─────────────┤
│ 1318539 │         0 │ 1318539 │       5013 │ 1523339 │        7873 │
└─────────┴───────────┴─────────┴────────────┴─────────┴─────────────┘
(1 row)

Greetings,

Andres

>From 5d4aa3f6c651006f1ec960f59e24ebc8b5a8ca25 Mon Sep 17 00:00:00 2001
From: Melanie Plageman
Date: Mon, 6 Mar 2023 10:41:51 -0500
Subject: [PATCH v6 1/2] Track IO times in pg_stat_io

Add IO timing for reads, writes, extends, and fsyncs to pg_stat_io.
Reviewed-by: Bertrand Drouvot
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com
---
 src/include/catalog/pg_proc.dat        |   6 +-
 src/include/pgstat.h                   |  14 +++-
 src/backend/catalog/system_views.sql   |   4 +
 src/backend/storage/buffer/bufmgr.c    |  56 ++
 src/backend/storage/buffer/localbuf.c  |   6 +-
 src/backend/storage/smgr/md.c          |  27 ---
 src/backend/utils/activity/pgstat.c    |  77 ++-
 src/backend/utils/activity/pgstat_io.c | 101 -
 src/backend/utils/adt/pgstatfuncs.c    |  48 ++--
 doc/src/sgml/monitoring.sgml           |  59 +++
 src/test/regress/expected/rules.out    |   6
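The trade-off discussed in this thread (a single subtraction of the start time followed by plain adds to each counter, versus one accumulate-diff operation per counter) can be sketched roughly as follows. The types and names here are simplified stand-ins -- not PostgreSQL's actual instr_time implementation or its stats structs.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for instr_time: microseconds as a plain counter. */
typedef int64_t instr_time_us;

/* Several independent statistics objects all want the same elapsed time. */
typedef struct
{
	instr_time_us pgstat_io_time;		/* pg_stat_io-style counter */
	instr_time_us buffer_usage_time;	/* pgBufferUsage-style counter */
	instr_time_us block_rw_time;		/* pgStatBlockRead/WriteTime-style */
} io_counters;

/*
 * One subtraction computes the elapsed interval, then each counter gets a
 * plain add -- the approach described above, which beats doing a full
 * "accumulate difference" once per counter when there are several counters.
 */
static void
accumulate_io_time(io_counters *c, instr_time_us start, instr_time_us end)
{
	instr_time_us elapsed = end - start;	/* the single subtract */

	c->pgstat_io_time += elapsed;			/* plain adds from here on */
	c->buffer_usage_time += elapsed;
	c->block_rw_time += elapsed;
}
```

This also illustrates why the accumulate-diff macro only pays off with a single counter: with several, the subtraction would be repeated once per counter.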
Re: Assertion failure with barriers in parallel hash join
Pushed and back-patched, with minor comment tweaks. Apologies for taking so long.
Re: Initial Schema Sync for Logical Replication
On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
> > From: Alvaro Herrera
> > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > On 2023-Mar-15, Kumar, Sachin wrote:
> > > 1. In CreateSubscription() when we create replication
> > > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> > > can use this snapshot later in the pg_dump.
> > >
> > > 2. Now we can call pg_dump with above snapshot from CreateSubscription.
> >
> > Overall I'm not on board with the idea that logical replication would
> > depend on pg_dump; that seems like it could run into all sorts of trouble
> > (what if calling external binaries requires additional security setup?
> > what about pg_hba connection requirements? what about max_connections in
> > tight circumstances?).
>
> > what if calling external binaries requires additional security setup
> I am not sure what kind of security restriction would apply in this case,
> maybe pg_dump binary can be changed ?

Using pg_dump as part of this implementation is not acceptable because we expect the backend to be decoupled from the client. Besides that, pg_dump provides all table dependencies (such as tablespaces, privileges, security labels, comments); not all of those dependencies should be replicated. You would have to exclude them by removing these objects from the TOC before running pg_restore, or by adding a few pg_dump options to exclude these objects.

Another issue is related to version differences. Let's say the publisher has a version ahead of the subscriber version; a new table syntax can easily break your logical replication setup. IMO pg_dump doesn't seem like a good solution for initial synchronization.

Instead, the backend should provide infrastructure to obtain the required DDL commands for the specific (set of) tables.
This can work around the issues from the previous paragraph:

* you can selectively choose dependencies;
* don't require additional client packages;
* don't need to worry about different versions.

This infrastructure can also be useful for other use cases such as:

* client tools that provide create commands (such as psql, pgAdmin);
* other logical replication solutions;
* other logical backup solutions.

-- 
Euler Taveira
EDB https://www.enterprisedb.com/
Re: Allow logical replication to copy tables in binary format
Here are my review comments for v18-0001

==
doc/src/sgml/logical-replication.sgml

1.
+ target table. However, logical replication in binary format is more
+ restrictive. See the binary option of
+ CREATE SUBSCRIPTION
+ for details.

Because you've changed the linkend to be the binary option, IMO now the part also needs to be modified. Otherwise, this page has multiple "CREATE SUBSCRIPTION" links which jump to different places, which just seems wrong to me.

SUGGESTION (for the "See the" sentence)
See the binary option of CREATE SUBSCRIPTION for details.

==
doc/src/sgml/ref/alter_subscription.sgml

2.
+
+ See the binary option of
+ CREATE SUBSCRIPTION
+ for details about copying pre-existing data in binary format.
+

(Same as review comment #1 above)

SUGGESTION
See the binary option of CREATE SUBSCRIPTION for details about copying pre-existing data in binary format.

==
src/backend/replication/logical/tablesync.c

3.
+	/*
+	 * Prior to v16, initial table synchronization will use text format even
+	 * if the binary option is enabled for a subscription.
+	 */
+	if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16 &&
+		MySubscription->binary)
+	{
+		appendStringInfoString(, " WITH (FORMAT binary)");
+		options = lappend(options,
+						  makeDefElem("format",
+									  (Node *) makeString("binary"), -1));
+	}

I think there can only be 0 or 1 list element in 'options'. So, why does the code here use lappend(options,...) instead of just using list_make1(...)?

==
src/test/subscription/t/014_binary.pl

4.
# -
# Test mismatched column types with/without binary mode
# -

# Test syncing tables with mismatching column types
$node_publisher->safe_psql(
	'postgres', qq(
	CREATE TABLE public.test_mismatching_types (
		a bigint PRIMARY KEY
	);
	INSERT INTO public.test_mismatching_types (a) VALUES (1), (2);
));

# Check the subscriber log from now on.
$offset = -s $node_subscriber->logfile;

# Ensure the subscription is enabled.
disable_on_error is still true,
# so the subscription can be disabled due to a missing relation until
# the test_mismatching_types table is created.
$node_subscriber->safe_psql(
	'postgres', qq(
	CREATE TABLE public.test_mismatching_types (
		a int PRIMARY KEY
	);
	ALTER SUBSCRIPTION tsub ENABLE;
	ALTER SUBSCRIPTION tsub REFRESH PUBLICATION;
));

~~

I found the "Ensure the subscription is enabled..." comment and the necessity for enabling the subscription to be confusing. Can't some complications all be eliminated just by creating the table on the subscriber side first?

For example, I rearranged that test (above fragment) like below and it still works OK for me:

# -
# Test mismatched column types with/without binary mode
# -

# Create the table on the subscriber side
$node_subscriber->safe_psql(
	'postgres', qq(
	CREATE TABLE public.test_mismatching_types (
		a int PRIMARY KEY
	)));

# Check the subscriber log from now on.
$offset = -s $node_subscriber->logfile;

# Test syncing tables with mismatching column types
$node_publisher->safe_psql(
	'postgres', qq(
	CREATE TABLE public.test_mismatching_types (
		a bigint PRIMARY KEY
	);
	INSERT INTO public.test_mismatching_types (a) VALUES (1), (2);
));

# Refresh the publication to trigger the tablesync
$node_subscriber->safe_psql(
	'postgres', qq(
	ALTER SUBSCRIPTION tsub REFRESH PUBLICATION;
));

-- 
Kind Regards,
Peter Smith.
Fujitsu Australia
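On review comment #3 above (lappend vs. list_make1): the point is purely one of idiom -- when a list is known to receive at most one element, building it directly states that intent. A simplified stand-in list (not PostgreSQL's actual pg_list.h implementation) shows the equivalence:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified singly-linked list, standing in for PostgreSQL's List. */
typedef struct ListCell
{
	void	   *ptr;
	struct ListCell *next;
} ListCell;

typedef struct List
{
	int			length;
	ListCell   *head;
} List;

#define NIL ((List *) NULL)

/* Append datum to list, creating the list if it is NIL. */
static List *
lappend(List *list, void *datum)
{
	ListCell   *cell = malloc(sizeof(ListCell));

	cell->ptr = datum;
	cell->next = NULL;
	if (list == NIL)
	{
		list = malloc(sizeof(List));
		list->length = 0;
		list->head = cell;
	}
	else
	{
		ListCell   *tail = list->head;

		while (tail->next != NULL)
			tail = tail->next;
		tail->next = cell;
	}
	list->length++;
	return list;
}

/* Build a one-element list directly -- the clearer idiom in the review. */
static List *
list_make1(void *datum)
{
	return lappend(NIL, datum);
}
```

Both spellings produce the same one-element list; list_make1 simply makes the "0 or 1 element" expectation explicit at the call site.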
Re: Fix typo plgsql to plpgsql.
On Tue, Mar 21, 2023 at 09:04:32AM +0800, Richard Guo wrote:
> +1. I believe these are typos. And a grep search shows that all typos of
> this kind are addressed by the patch.

Yes, you are right. The comments fixed here are related to plpgsql. Will fix.

-- 
Michael

signature.asc
Description: PGP signature
RE: Initial Schema Sync for Logical Replication
Hi Alvaro,

> From: Alvaro Herrera
> Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> On 2023-Mar-15, Kumar, Sachin wrote:
> > 1. In CreateSubscription() when we create replication
> > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> > can use this snapshot later in the pg_dump.
> >
> > 2. Now we can call pg_dump with above snapshot from CreateSubscription.
>
> Overall I'm not on board with the idea that logical replication would depend
> on pg_dump; that seems like it could run into all sorts of trouble (what if
> calling external binaries requires additional security setup? what about
> pg_hba connection requirements? what about max_connections in tight
> circumstances?).

> what if calling external binaries requires additional security setup
I am not sure what kind of security restriction would apply in this case;
maybe the pg_dump binary can be changed?

> what about pg_hba connection requirements?
We will use the same connection string which the subscriber process uses to
connect to the publisher.

> what about max_connections in tight circumstances?
Right, that might be an issue, but I don't think it will be a big one. We
will create the dump of the database in the CreateSubscription() function
itself, so if we have reached max_connections while calling pg_dump, we
will fail before the tableSync process even starts.

> It would be much better, I think, to handle this internally in the publisher
> instead: similar to how DDL sync would work, except it'd somehow generate
> the CREATE statements from the existing tables instead of waiting for DDL
> events to occur. I grant that this does require writing a bunch of new code
> for each object type, a lot of which would duplicate the pg_dump logic, but
> it would probably be a lot more robust.

Agree, but we might have a lot of code duplication; essentially almost all of
the pg_dump code would need to be duplicated, which might cause issues when
modifying/adding new DDLs.
I am not sure, but it might be possible to move the pg_dump code this depends on to the common/ folder, to avoid duplication.

Regards
Sachin
Re: [BUG] pg_stat_statements and extended query protocol
On Mon, Mar 20, 2023 at 09:41:12PM +, Imseih (AWS), Sami wrote: > I was thinking about this and it seems to me we can avoid > adding new fields to Estate. I think a better place to track > rows and calls is in the Instrumentation struct. > > If this is more palatable, I can prepare the patch. This indeed feels a bit more natural seen from here, after looking at the code paths using an Instrumentation in the executor and explain, for example. At least, this stresses me much less than adding 16 bytes to EState for something restricted to the extended protocol when it comes to monitoring capabilities. -- Michael signature.asc Description: PGP signature
Re: Fix typo plgsql to plpgsql.
On Tue, Mar 21, 2023 at 12:26 AM Zhang Mingli wrote: > Found several typos like plgsql, I think it should be plpgsql. > +1. I believe these are typos. And a grep search shows that all typos of this kind are addressed by the patch. Thanks Richard
Re: Question about pull_up_sublinks_qual_recurse
On Sat, Oct 15, 2022 at 3:27 AM Tom Lane wrote:
> Andy Fan writes:
> > After some more self review, I find my proposal has the following side
> > effects.
>
> Yeah, I do not think this works at all. The discussion of outer join
> reordering in optimizer/README says that that doesn't work, and while
> I'm too lazy to construct an example right now, I believe it's true.

I came to this topic again recently and have finally figured out the reason. It looks to me that semi join is slightly different from outer join in this case. The following test case shows why we have to pull_up_sublinks_qual_recurse to either the LHS or the RHS rather than the JoinExpr.

create table t1(a int, b int, c int);
create table t2(a int, b int, c int);
create table t3(a int, b int, c int);
insert into t1 select 1, 1, 2;
insert into t2 select 1, 2, 1;
insert into t2 select 1, 1, 2;
insert into t3 select 1, 1, 2;

select * from t1
where exists (select 1 from t2
              -- below references t1 and t2 at the same time
              where exists (select 1 from t3
                            where t1.c = t2.c and t2.b = t3.b)
                and t1.a = t2.a);

which can be transformed to

SELECT * FROM t1
SEMI JOIN t2 ON t1.a = t2.a
  AND exists (select 1 from t3 where t1.c = t2.c and t2.b = t3.b);

Here the semantics of the query is: return the rows in t1 iff there is a row in t2 that matches the whole clause (t1.a = t2.a AND exists ...). But if we transform it to

SELECT * FROM (t1 SEMI JOIN t2 ON t1.a = t2.a)
SEMI JOIN t3 ON t1.c = t2.c and t2.b = t3.b;

the scan on t2 would stop as soon as (t1.a = t2.a) alone matches, and the following rows would be ignored. However, the matched row may not satisfy the EXISTS clause! So in the above example, the correct result set is 1 row; if we pull up the sublink above the JoinExpr, no row would be found.

The attached is just a comment and a test case to help understand why we have to do things like this.

-- 
Best Regards
Andy Fan

v1-0001-test-case-and-comments-to-help-understand-why-we-.patch
Description: Binary data
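The failure mode described in that message can be simulated outside the planner. This standalone C sketch (hypothetical helper names, not planner code) evaluates the example data both ways -- once with the full semi-join condition, and once with the broken transformation where the t2 scan stops at the first row satisfying t1.a = t2.a alone -- and shows the divergence:

```c
#include <assert.h>

typedef struct
{
	int			a;
	int			b;
	int			c;
} Row;

/* EXISTS (SELECT 1 FROM t3 WHERE t1.c = t2.c AND t2.b = t3.b) */
static int
exists_t3(const Row *t3, int n3, int t1c, const Row *t2row)
{
	for (int i = 0; i < n3; i++)
		if (t1c == t2row->c && t2row->b == t3[i].b)
			return 1;
	return 0;
}

/* Correct: a t1 row qualifies iff some t2 row satisfies the WHOLE clause. */
static int
count_correct(const Row *t1, int n1, const Row *t2, int n2,
			  const Row *t3, int n3)
{
	int			count = 0;

	for (int i = 0; i < n1; i++)
		for (int j = 0; j < n2; j++)
			if (t1[i].a == t2[j].a && exists_t3(t3, n3, t1[i].c, &t2[j]))
			{
				count++;
				break;			/* semi join: first full match is enough */
			}
	return count;
}

/*
 * Broken transformation: the inner semi join stops at the FIRST t2 row
 * matching t1.a = t2.a alone, then applies the t3 condition afterwards.
 */
static int
count_broken(const Row *t1, int n1, const Row *t2, int n2,
			 const Row *t3, int n3)
{
	int			count = 0;

	for (int i = 0; i < n1; i++)
		for (int j = 0; j < n2; j++)
			if (t1[i].a == t2[j].a)
			{
				if (exists_t3(t3, n3, t1[i].c, &t2[j]))
					count++;
				break;			/* wrongly stops after the first a-match */
			}
	return count;
}
```

With the inserted rows from the example (t1 = {(1,1,2)}, t2 = {(1,2,1), (1,1,2)}, t3 = {(1,1,2)}), the correct evaluation returns 1 row while the broken pull-up returns 0, matching the behavior described above.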
Re: Save a few bytes in pg_attribute
Hi, On 2023-03-20 11:00:12 +0100, Peter Eisentraut wrote: > After the discussion in [0] ff., I was looking around in pg_attribute and > noticed that we could possibly save 4 bytes. We could change both > attstattarget and attinhcount from int4 to int2, which together with some > reordering would save 4 bytes from the fixed portion. attndims seems like another good candidate to shrink. Greetings, Andres Freund
Re: Request for comment on setting binary format output per session
On Mon, 20 Mar 2023 at 15:09, Jeff Davis wrote: > On Mon, 2023-03-20 at 14:36 -0400, Dave Cramer wrote: > > Thanks for the review. I'm curious what system you are running on as > > I don't see any of these errors. > > Are asserts enabled? > > > Well I'm guessing psql doesn't know how to read date or timestamptz > > in binary. This is not a failing of the code. > > It seems strange, and potentially dangerous, to send binary data to a > client that's not expecting it. It feels too easy to cause confusion by > changing the GUC mid-session. > > Also, it seems like DISCARD ALL is not resetting it, which I think is a > bug. > Thanks yes, this is a bug > > > > > This is an interesting question. If the type isn't visible then it's > > not visible to the query so > > I don't think that's true -- the type could be in a different schema > from the table. Good point. This seems to be the very difficult part. > > > > > > > 5. There's a theoretical invalidation problem. It might also be a > > > practical problem in some testing setups with long-lived > > > connections > > > that are recreating user-defined types. > > > > > > > UDT's seem to be a problem here which candidly have very little use > > case for binary output. > > I mostly agree with that, but it also might not be hard to support > UDTs. Is there a design problem here or is it "just a matter of code"? > > > > > I didn't try to solve it as Tom was OK with using a GUC. Using a > > startup GUC is interesting, > > but how would that work with pools where we want to reset the > > connection when we return it and then > > set the binary format on borrow ? By using a GUC when a client > > borrows a connection from a pool the client > > can reconfigure the oids it wants formatted in binary. > > That's a good point. How common is it to share a connection pool > between different clients (some of which might support a binary format, > and others which don't)? 
And would the connection pool put connections > with and without the property in different pools? > For JAVA pools it's probably OK, but for pools like pgbouncer we have no control of who is going to get the connection next. > > > > > I really hadn't considered supporting type names. I have asked Paul > > Ramsey about PostGIS and he doesn't see PostGIS using this. > > One of the things I like about Postgres is that the features all work > together, and that user-defined objects are generally as good as built- > in ones. Sometimes there's a reason to make a special case (e.g. syntax > support or something), but in this case it seems like we could support > user-defined types just fine, right? It's also just more friendly and > readable to use type names, especially if it's a GUC. > > Regards, > Jeff Davis > >
Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean
On Mon, Mar 20, 2023 at 10:21:28AM -0400, Tom Lane wrote: > Is similar knowledge missing in the meson build files? src/backend/nodes/meson.build and src/include/nodes/meson.build are the two meson files that have the knowledge about the files generated by gen_node_support.pl, and the query jumbling files are consistent with that since 0e681cf. Perhaps I've missed an extra spot? -- Michael signature.asc Description: PGP signature
Re: Request for comment on setting binary format output per session
Dave Cramer

On Mon, 20 Mar 2023 at 15:09, Tom Lane wrote:
> Dave Cramer writes:
> > On Mon, 20 Mar 2023 at 13:05, Jeff Davis wrote:
> >> 2. Easy to confuse psql:
> >>
> >> CREATE TABLE a(d date, t timestamptz);
> >> SET format_binary='25,1082,1184';
> >> SELECT * FROM a;
> >>  d | t
> >> ---+---
> >>  ! |
> >> (1 row)
> >
> > Well I'm guessing psql doesn't know how to read date or timestamptz in
> > binary. This is not a failing of the code.
>
> What it is is a strong suggestion that controlling this via a GUC is
> not a great choice. There are many inappropriate (wrong abstraction
> level) ways to change a GUC and thereby break a client that's not
> expecting binary output. I think Jeff's suggestion that we should
> treat this as a protocol extension might be a good idea.
>
> If I recall the protocol-extension design correctly, such a setting
> could only be set at session start, which could be annoying --- at the
> very least we'd have to tolerate entries for unrecognized data types,
> since clients couldn't be expected to have checked the list against
> the current server in advance.

As mentioned, for connection pools we need to be able to set these after the session starts. I'm not sure how useful the protocol extension mechanism is given that it can only be used on startup.
Re: Request for comment on setting binary format output per session
On Mon, 20 Mar 2023 at 19:10, Merlin Moncure wrote: > > > On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer wrote: > >> >> Dave Cramer >> >> >> On Sat, 4 Mar 2023 at 19:39, Dave Cramer wrote: >> >>> >>> >>> On Sat, 4 Mar 2023 at 19:06, Tom Lane wrote: >>> Jeff Davis writes: > On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote: >> Most of the clients know how to decode the builtin types. I'm not >> sure there is a use case for binary encode types that the clients >> don't have a priori knowledge of. > The client could, in theory, have a priori knowledge of a non-builtin > type. I don't see what's "in theory" about that. There seems plenty of use for binary I/O of, say, PostGIS types. Even for built-in types, do we really want to encourage people to hard-wire their OIDs into applications? >>> >>> How does a client read these? I'm pretty narrowly focussed. The JDBC API >>> doesn't really have a way to read a non built-in type. There is a facility >>> to read a UDT, but the user would have to provide that transcoder. I guess >>> I'm curious how other clients read binary UDT's ? >>> I don't see a big problem with driving this off a GUC, but I think it should be a list of type names not OIDs. We already have plenty of precedent for dealing with that sort of thing; see search_path for the canonical example. IIRC, there's similar caching logic for temp_tablespaces. >>> >>> I have no issue with allowing names, OID's were compact, but we could >>> easily support both >>> >> >> Attached is a preliminary patch that takes a list of OID's. I'd like to >> know if this is going in the right direction. >> >> Next step would be to deal with type names as opposed to OID's. >> This will be a bit more challenging as type names are schema specific. >> > > OIDs are a pain to deal with IMO. They will not survive a dump style > restore, and are hard to keep synchronized between databases...type names > don't have this problem. 
OIDs are an implementation artifact that ought > not need any extra dependency. > AFAIK, OID's for built-in types don't change. Clearly we need more thought on how to deal with UDT's > > > This seems like a protocol or even a driver issue rather than a GUC issue. > Why does the server need to care what format the client might want to > prefer on a query by query basis? > Actually this isn't a query by query basis. The point of this is that the client wants all the results for given OID's in binary. > I just don't see it. The resultformat switch in libpq works pretty well, > except that it's "all in" on getting data from the server, with the dead > simple workaround of casting to text which might even be able to be managed > from within the driver itself. > > merlin > > >
Re: Add pg_walinspect function with block info columns
On Mon, Mar 20, 2023 at 4:51 PM Peter Geoghegan wrote: > The new pg_get_wal_block_info outputs columns in an order that doesn't > seem like the most useful possible order to me. This gives us another > reason to have separate GetWALRecordInfo and GetWALBlockInfo utility > functions rather than sharing logic for building output tuples. One more piece of feedback for Bharath: I think that we should also make the description output column display NULLs for those records that don't output any description string. This at least includes the "FPI" record type from the "XLOG" rmgr. Alternatively, we could find a way of making it show a description. -- Peter Geoghegan
Re: Add pg_walinspect function with block info columns
On Mon, Mar 20, 2023 at 4:34 PM Peter Geoghegan wrote:
> I agree. A little redundancy is better when the alternative is fragile
> code, and I'm pretty sure that that applies here -- there won't be
> very many duplicated lines, and the final code will be significantly
> clearer. There can be a comment about keeping GetWALRecordInfo and
> GetWALBlockInfo in sync.

The new pg_get_wal_block_info outputs columns in an order that doesn't seem like the most useful possible order to me. This gives us another reason to have separate GetWALRecordInfo and GetWALBlockInfo utility functions rather than sharing logic for building output tuples.

Specifically, I think that pg_get_wal_block_info should output the "primary key" columns first:

reltablespace, reldatabase, relfilenode, blockid, start_lsn, end_lsn

Next comes the columns that duplicate the columns output by pg_get_wal_records_info, in the same order as they appear in pg_get_wal_records_info. (Obviously this won't include block_ref).

-- 
Peter Geoghegan
Re: Save a few bytes in pg_attribute
Andres Freund writes: > On 2023-03-20 10:37:36 -0400, Tom Lane wrote: >> I agree that attinhcount could be narrowed, but I have some concern >> about attstattarget. IIRC, the limit on attstattarget was once 1000 >> and then we raised it to 10000. Is it inconceivable that we might >> want to raise it to 100000 someday? > Hard to believe that'd happen in a minor version - and I don't think there'd > be an issue with widening it again in a major version? True. However, I think Tomas' idea of making these columns nullable is even better than narrowing them. regards, tom lane
Re: Save a few bytes in pg_attribute
Hi, On 2023-03-20 10:37:36 -0400, Tom Lane wrote: > I agree that attinhcount could be narrowed, but I have some concern > about attstattarget. IIRC, the limit on attstattarget was once 1000 > and then we raised it to 10000. Is it inconceivable that we might > want to raise it to 100000 someday? Hard to believe that'd happen in a minor version - and I don't think there'd be an issue with widening it again in a major version? I doubt we'll ever go to 100k without a major redesign of stats storage/access - the size of the stats datums would make that pretty impractical right now. Greetings, Andres Freund
Re: Add pg_walinspect function with block info columns
On Sun, Mar 19, 2023 at 8:21 PM Kyotaro Horiguchi wrote: > The documentation has just one section titled "General Functions" > which directly contains detailed explanation of four functions, making > it hard to get a clear understanding of the available functions. I > considered breaking it down into a few subsections, but that wouldn't > look great since most of them would only contain one function. > However, I feel it would be helpful to add a list of all functions at > the beginning of the section. I like the idea of sections, even if there is only one function per section in some cases. I also think that we should add a "Tip" that advises users that they may use an "end LSN" that is the largest possible LSN, 'FFFFFFFF/FFFFFFFF', to get information about records up until the current LSN of the cluster (per commit 5c1b6628). Is there a straightforward way to get a usable LSN constant for this purpose? The simplest way I could come up with quickly is "SELECT pg_lsn(2^64 - 1)" -- which still isn't very simple. Actually, it might be even worse than 'FFFFFFFF/FFFFFFFF', so perhaps we should just use that in the docs' new "Tip". > I agree that adding a note about the characteristics would be helpful to > avoid the misuse of pg_get_wal_block_info(). How about something like, > "Note that pg_get_wal_block_info() omits records that contain no > block references."? This should be a strict invariant. In other words, it should be part of the documented contract of pg_get_wal_block_info and pg_get_wal_records_info. The two functions should be defined in terms of each other. Their relationship is important. Users should be able to safely assume that the records that have a NULL block_ref according to pg_get_wal_records_info are *precisely* those records that won't have any entries within pg_get_wal_block_info (assuming that the same LSN range is used with both functions). pg_walinspect should explicitly promise this, and promise the corollary condition around non-NULL block_ref records. 
It is a useful promise from the point of view of users. It also makes it easier to understand what's really going on here without any ambiguity. I don't completely disagree with Michael about the redundancy. I just think that it's worth it on performance grounds. We might want to say that directly in the docs, too. > > Attaching v3 patch set - 0001 optimizations around block references, > > 0002 enables pg_get_wal_block_info() to emit per-record info. Any > > thoughts? > > + /* Get block references, if any, otherwise continue. */ > + if (!XLogRecHasAnyBlockRefs(xlogreader)) > + continue; > > I'm not sure, but the "continue" might be confusing since the code > "continue"s if the condition is true and continues the process > otherwise... And it seems like a kind of "explanation of what the > code does". I feel we don't need a comment there. +1. Also, if GetWALBlockInfo() is now supposed to only be called when XLogRecHasAnyBlockRefs(), then it should probably have an assertion to verify the precondition. > It is not an issue with this patch, but as I look at this version, I'm > starting to feel uneasy about the subtle differences between what > GetWALRecordsInfo and GetWALBlockInfo do. One solution might be to > have GetWALBlockInfo return a values array for each block, but that > could make things more complex than needed. Alternatively, could we > get GetWALRecordsInfo to call tuplestore_putvalues() internally? This > way, both functions can manage the temporary memory context within > themselves. Agreed. I'm also not sure what to do about it, though. > This means GetWALBlockInfo overwrites the last two columns generated > by GetWalRecordInfo, but I don't think this approach is clean and > stable. I agree we don't want the final columns in a block info tuple > but we don't want to duplicate the common code path. 
> I initially thought we could divide the function into > GetWALCommonInfo(), GetWALRecordInfo() and GetWALBlockInfo(), but it > doesn't seem that simple... In the end, I think we should have separate > GetWALRecordInfo() and GetWALBlockInfo() that have duplicate > "values[i++] = .." lines. I agree. A little redundancy is better when the alternative is fragile code, and I'm pretty sure that that applies here -- there won't be very many duplicated lines, and the final code will be significantly clearer. There can be a comment about keeping GetWALRecordInfo and GetWALBlockInfo in sync. -- Peter Geoghegan
Re: Request for comment on setting binary format output per session
On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer wrote: > > Dave Cramer > > > On Sat, 4 Mar 2023 at 19:39, Dave Cramer wrote: > >> >> >> On Sat, 4 Mar 2023 at 19:06, Tom Lane wrote: >> >>> Jeff Davis writes: >>> > On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote: >>> >> Most of the clients know how to decode the builtin types. I'm not >>> >> sure there is a use case for binary encode types that the clients >>> >> don't have a priori knowledge of. >>> >>> > The client could, in theory, have a priori knowledge of a non-builtin >>> > type. >>> >>> I don't see what's "in theory" about that. There seems plenty of >>> use for binary I/O of, say, PostGIS types. Even for built-in types, >>> do we really want to encourage people to hard-wire their OIDs into >>> applications? >>> >> >> How does a client read these? I'm pretty narrowly focussed. The JDBC API >> doesn't really have a way to read a non built-in type. There is a facility >> to read a UDT, but the user would have to provide that transcoder. I guess >> I'm curious how other clients read binary UDT's ? >> >>> >>> I don't see a big problem with driving this off a GUC, but I think >>> it should be a list of type names not OIDs. We already have plenty >>> of precedent for dealing with that sort of thing; see search_path >>> for the canonical example. IIRC, there's similar caching logic >>> for temp_tablespaces. >>> >> >> I have no issue with allowing names, OID's were compact, but we could >> easily support both >> > > Attached is a preliminary patch that takes a list of OID's. I'd like to > know if this is going in the right direction. > > Next step would be to deal with type names as opposed to OID's. > This will be a bit more challenging as type names are schema specific. > OIDs are a pain to deal with IMO. They will not survive a dump style restore, and are hard to keep synchronized between databases...type names don't have this problem. OIDs are an implementation artifact that ought not need any extra dependency. 
This seems like a protocol or even a driver issue rather than a GUC issue. Why does the server need to care what format the client might want to prefer on a query by query basis? I just don't see it. The resultformat switch in libpq works pretty well, except that it's "all in" on getting data from the server, with the dead simple workaround of casting to text which might even be able to be managed from within the driver itself. merlin
Re: Ability to reference other extensions by schema in extension scripts
Sandro Santilli writes: > On Mon, Mar 13, 2023 at 05:57:57PM -0400, Regina Obe wrote: >> Attached is a slightly revised patch to fix the extra whitespace in the >> extend.gml document that Sandro noted to me. > Thanks Regina. > I've tested attached patch (md5 0b652a8271fc7e71ed5f712ac162a0ef) > against current master (hash 4ef1be5a0b676a9f030cc2e4837f4b5650ecb069). > The patch applies cleanly, builds cleanly, regresses cleanly. Pushed with some mostly-cosmetic adjustments (in particular I tried to make the docs and tests neater). I did not commit the changes in get_available_versions_for_extension to add no_relocate as an output column. Those were dead code because you hadn't done anything to connect them up to an actual output parameter of pg_available_extension_versions(). While I'm not necessarily averse to making the no_relocate values visible somehow, I'm not convinced that pg_available_extension_versions should be the place to do it. ISTM what's relevant is the no_relocate values of *installed* extensions, not those of potentially-installable extensions. If we had a view over pg_extension then that might be a place to add this, but we don't. On the whole it didn't seem important enough to pursue, so I just left it out. regards, tom lane
Re: Add LZ4 compression in pg_dump
On Fri, Mar 17, 2023 at 03:43:58PM +, gkokola...@pm.me wrote: > From a174cdff4ec8aad59f5bcc7e8d52218a14fe56fc Mon Sep 17 00:00:00 2001 > From: Georgios Kokolatos > Date: Fri, 17 Mar 2023 14:45:58 + > Subject: [PATCH v3 1/3] Improve type handling in pg_dump's compress file API > -int > +bool > EndCompressFileHandle(CompressFileHandle *CFH) > { > - int ret = 0; > + bool ret = 0; Should say "= false" ? > /* >* Write 'size' bytes of data into the file from 'ptr'. > + * > + * Returns true on success and false on error. > + */ > + bool(*write_func) (const void *ptr, size_t size, > -* Get a pointer to a string that describes an error that occurred > during a > -* compress file handle operation. > +* Get a pointer to a string that describes an error that occurred > during > +* a compress file handle operation. > */ > const char *(*get_error_func) (CompressFileHandle *CFH); This should mention that the error accessible via get_error_func() applies (only) to write_func() ? As long as this touches pg_backup_directory.c you could update the header comment to refer to "compressed extensions", not just .gz. I noticed that EndCompressorLZ4() tests "if (LZ4cs)", but that should always be true. I was able to convert the zstd patch to this new API with no issue. -- Justin
Re: [PATCH] Fix alter subscription concurrency errors
> "Gregory Stark (as CFM)" writes: > > Hm. This patch is still waiting on updates. But it's a bug fix and it > > would be good to get this in. Is someone else interested in finishing > > this if Jeite isn't available? > > I think the patch as-submitted is pretty uninteresting, mainly because the > design of adding bespoke lock code for subscription objects doesn't scale. > I'm not excited about doing this just for subscriptions without at least a > clear plan for working on other object types. > > Alvaro suggested switching it to use get_object_address() instead, which'd > be better; but before going in that direction we might have to do more > work on get_object_address's error reporting (note the para in its header > comments saying it's pretty weak on that). Sorry for not responding earlier in this thread. I'll be honest in saying this was a small annoyance to me, so I ignored the responses more than I should have. It caused some test flakiness in the Citus test suite, and it seemed that fixing the underlying issue in Postgres was most appropriate. I addressed this in Citus's test suite by disabling the relevant test (which was an edge case test anyway). So my immediate problem was fixed, and I stopped caring about this patch very much. Definitely not enough to address this for all other DDLs with the same issue. All in all I'm having a hard time feeling motivated to work on a patch that I don't care much about. Especially since I have two other patches open for a few commit fests that I actually care about, but those patches have received (imho) very little input. Which makes it hard to justify to myself to spend time on this patch, given the knowledge that if I would spend time on it, it might take away the precious reviewer time from the patches I do care about.
Re: [BUG] pg_stat_statements and extended query protocol
Sorry about the delay in response about this. I was thinking about this and it seems to me we can avoid adding new fields to EState. I think a better place to track rows and calls is in the Instrumentation struct.

--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -88,6 +88,8 @@ typedef struct Instrumentation
 	double		nfiltered2;		/* # of tuples removed by "other" quals */
 	BufferUsage bufusage;		/* total buffer usage */
 	WalUsage	walusage;		/* total WAL usage */
+	int64		calls;
+	int64		rows_processed;
 } Instrumentation;

If this is more palatable, I can prepare the patch. Thanks for your feedback. Regards. Sami Imseih Amazon Web Services (AWS)
Re: Add LZ4 compression in pg_dump
Hi, I was preparing to get the 3 cleanup patches pushed, so I updated/reworded the commit messages a bit (attached, please check). But I noticed the commit message for 0001 said: In passing save the appropriate errno in LZ4File_open_write in case that the caller is not using the API's get_error_func. I think that's far too low-level for a commit message, it'd be much more appropriate for a comment at the function. However, do we even need this behavior? I was looking for code calling this function without using get_error_func(), but no luck. And if there is such caller, shouldn't we fix it to use get_error_func()? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 6a3d5d743f022ffcd0fcaf3d6e9ba711e2e785e7 Mon Sep 17 00:00:00 2001 From: Georgios Kokolatos Date: Fri, 17 Mar 2023 14:45:58 + Subject: [PATCH v4 1/3] Improve type handling in pg_dump's compress file API MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit After 0da243fed0 got committed, we've received a report about a compiler warning, related to the new LZ4File_gets() function: compress_lz4.c: In function ‘LZ4File_gets’: compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits] 492 | if (dsize < 0) The reason is very simple - dsize is declared as size_t, which is an unsigned integer, and thus the check is pointless and we might fail to notice an error in some cases (or fail in a strange way a bit later). The warning could have been silenced by simply changing the type, but we realized the API mostly assumes all the libraries use the same types and report errors the same way (e.g. by returning 0 and/or negative value). But we can't make this assumption - the gzip/lz4 libraries already disagree on some of this, and even if they did a library added in the future might not. 
The right solution is to define what the API does, and translate the library-specific behavior in consistent way (so that the internal errors are not exposed to users of our compression API). For that reason, this commit adjusts the data types in a couple places, so that we don't miss library errors, and unifies the error reporting to simply return true/false (instead of e.g. size_t). Author: Georgios Kokolatos Reported-by: Alexander Lakhin Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/33496f7c-3449-1426-d568-63f6bca2a...@gmail.com --- src/bin/pg_dump/compress_gzip.c | 37 ++-- src/bin/pg_dump/compress_io.c | 8 +-- src/bin/pg_dump/compress_io.h | 38 +--- src/bin/pg_dump/compress_lz4.c| 85 +++ src/bin/pg_dump/compress_none.c | 41 - src/bin/pg_dump/pg_backup_archiver.c | 18 ++ src/bin/pg_dump/pg_backup_directory.c | 36 ++-- 7 files changed, 148 insertions(+), 115 deletions(-) diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c index 0af65afeb4..d9c3969332 100644 --- a/src/bin/pg_dump/compress_gzip.c +++ b/src/bin/pg_dump/compress_gzip.c @@ -233,14 +233,14 @@ InitCompressorGzip(CompressorState *cs, *-- */ -static size_t -Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH) +static bool +Gzip_read(void *ptr, size_t size, size_t *rsize, CompressFileHandle *CFH) { gzFile gzfp = (gzFile) CFH->private_data; - size_t ret; + int gzret; - ret = gzread(gzfp, ptr, size); - if (ret != size && !gzeof(gzfp)) + gzret = gzread(gzfp, ptr, size); + if (gzret <= 0 && !gzeof(gzfp)) { int errnum; const char *errmsg = gzerror(gzfp, ); @@ -249,15 +249,18 @@ Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH) errnum == Z_ERRNO ? 
strerror(errno) : errmsg); } - return ret; + if (rsize) + *rsize = (size_t) gzret; + + return true; } -static size_t +static bool Gzip_write(const void *ptr, size_t size, CompressFileHandle *CFH) { gzFile gzfp = (gzFile) CFH->private_data; - return gzwrite(gzfp, ptr, size); + return gzwrite(gzfp, ptr, size) > 0; } static int @@ -287,22 +290,22 @@ Gzip_gets(char *ptr, int size, CompressFileHandle *CFH) return gzgets(gzfp, ptr, size); } -static int +static bool Gzip_close(CompressFileHandle *CFH) { gzFile gzfp = (gzFile) CFH->private_data; CFH->private_data = NULL; - return gzclose(gzfp); + return gzclose(gzfp) == Z_OK; } -static int +static bool Gzip_eof(CompressFileHandle *CFH) { gzFile gzfp = (gzFile) CFH->private_data; - return gzeof(gzfp); + return gzeof(gzfp) == 1; } static const char * @@ -319,7 +322,7 @@ Gzip_get_error(CompressFileHandle *CFH) return errmsg; } -static int +static bool Gzip_open(const char *path, int fd, const char *mode, CompressFileHandle *CFH) { gzFile gzfp; @@ -342,18 +345,18 @@ Gzip_open(const char *path, int fd, const char *mode, CompressFileHandle *CFH) gzfp =
Adjust Memoize hit_ratio calculation
Yesterday, in 785f70957, I adjusted the Memoize costing code to account for the size of the cache key when estimating how many cache entries can exist at once in the cache. That effectively makes Memoize a less likely choice as fewer entries will be expected to fit in work_mem now. Because that's being changed in v16, I think it might also be a good idea to fix the hit_ratio calculation problem reported by David Johnston in [1]. In the attached, I've adjusted David's calculation slightly so that we divide by Max(ndistinct, est_cache_entries) instead of ndistinct. This saves from overestimating when ndistinct is smaller than est_cache_entries. I'd rather fix this now for v16 than wait until v17 and further adjust the Memoize costing. I've attached a spreadsheet showing the new and old hit_ratio calculations. Cells C1 - C3 can be adjusted to show what the hit ratio is for both the old and new method. Any objections? David [1] https://postgr.es/m/cakfquwzemcnk3yqo2xj4eduody6qakad31rod1vc4q1_s68...@mail.gmail.com adjust_memoize_hit_ratio_calculation.patch Description: Binary data memoize_cache_hits.ods Description: application/vnd.oasis.opendocument.spreadsheet
Re: Experiments with Postgres and SSL
Sorry, checking the cfbot apparently I had a typo in the #ifndef USE_SSL case. From 4b6e01c7f569a919d660cd80ce64cb913bc9f220 Mon Sep 17 00:00:00 2001 From: Greg Stark Date: Mon, 20 Mar 2023 14:09:30 -0400 Subject: [PATCH v4 4/4] alpn support --- src/backend/libpq/be-secure-openssl.c| 65 src/backend/libpq/be-secure.c| 3 ++ src/backend/postmaster/postmaster.c | 23 + src/bin/psql/command.c | 7 ++- src/include/libpq/libpq-be.h | 1 + src/include/libpq/libpq.h| 1 + src/interfaces/libpq/fe-connect.c| 5 ++ src/interfaces/libpq/fe-secure-openssl.c | 25 + src/interfaces/libpq/libpq-int.h | 1 + 9 files changed, 129 insertions(+), 2 deletions(-) diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 685aa2ed69..034e1cf2ec 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -67,6 +67,12 @@ static int ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat static int dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata); static int verify_cb(int ok, X509_STORE_CTX *ctx); static void info_cb(const SSL *ssl, int type, int args); +static int alpn_cb(SSL *ssl, + const unsigned char **out, + unsigned char *outlen, + const unsigned char *in, + unsigned int inlen, + void *userdata); static bool initialize_dh(SSL_CTX *context, bool isServerStart); static bool initialize_ecdh(SSL_CTX *context, bool isServerStart); static const char *SSLerrmessage(unsigned long ecode); @@ -432,6 +438,11 @@ be_tls_open_server(Port *port) /* set up debugging/info callback */ SSL_CTX_set_info_callback(SSL_context, info_cb); + if (ssl_enable_alpn) { + elog(WARNING, "Enabling ALPN Callback"); + SSL_CTX_set_alpn_select_cb(SSL_context, alpn_cb, port); + } + if (!(port->ssl = SSL_new(SSL_context))) { ereport(COMMERROR, @@ -1250,6 +1261,60 @@ info_cb(const SSL *ssl, int type, int args) } } +static unsigned char alpn_protos[] = { + 12, 'P','o','s','t','g','r','e','s','/','3','.','0' +}; +static 
unsigned int alpn_protos_len = sizeof(alpn_protos); + +/* + * Server callback for ALPN negotiation. We use use the standard "helper" + * function even though currently we only accept one value. We store the + * negotiated protocol in Port->ssl_alpn_protocol and rely on higher level + * logic (in postmaster.c) to decide what to do with that info. + * + * XXX Not sure if it's kosher to use palloc and elog() inside OpenSSL + * callbacks. If we throw an error from here is there a risk of messing up + * OpenSSL data structures? It's possible it's ok becuase this is only called + * during connection negotiation which we don't try to recover from. + */ +static int alpn_cb(SSL *ssl, + const unsigned char **out, + unsigned char *outlen, + const unsigned char *in, + unsigned int inlen, + void *userdata) +{ + /* why does OpenSSL provide a helper function that requires a nonconst + * vector when the callback is declared to take a const vector? What are + * we to do with that?*/ + int retval; + Assert(userdata != NULL); + Assert(out != NULL); + Assert(outlen != NULL); + Assert(in != NULL); + + retval = SSL_select_next_proto((unsigned char **)out, outlen, + alpn_protos, alpn_protos_len, + in, inlen); + if (*out == NULL || *outlen > alpn_protos_len || outlen <= 0) + elog(ERROR, "SSL_select_next_proto returned nonsensical results"); + + if (retval == OPENSSL_NPN_NEGOTIATED) + { + struct Port *port = (struct Port *)userdata; + + port->ssl_alpn_protocol = palloc0(*outlen+1); + /* SSL uses unsigned chars but Postgres uses host signedness of chars */ + strncpy(port->ssl_alpn_protocol, (char*)*out, *outlen); + return SSL_TLSEXT_ERR_OK; + } else if (retval == OPENSSL_NPN_NO_OVERLAP) { + return SSL_TLSEXT_ERR_NOACK; + } else { + return SSL_TLSEXT_ERR_NOACK; + } +} + + /* * Set DH parameters for generating ephemeral DH keys. 
The * DH parameters can take a long time to compute, so they must be diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c index 1020b3adb0..79a61900ba 100644 --- a/src/backend/libpq/be-secure.c +++ b/src/backend/libpq/be-secure.c @@ -61,6 +61,9 @@ bool SSLPreferServerCiphers; int ssl_min_protocol_version = PG_TLS1_2_VERSION; int ssl_max_protocol_version = PG_TLS_ANY; +/* GUC variable: if false ignore ALPN negotiation */ +bool ssl_enable_alpn = true; + /* */ /* Procedures common to all secure sessions */ /* */ diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index ec1d895a23..2640b69fed 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -1934,6 +1934,8 @@ ServerLoop(void)
Re: Commitfest 2023-03 starting tomorrow!
On Tue, Mar 21, 2023 at 9:22 AM Greg Stark wrote: > Yeah, even aside from flappers there's the problem that it's about as > common for real commits to break some test as it is for patches to > start failing tests. So often it's a real failure but it's nothing to > do with the patch, it has to do with the commit that went into git. That is true. The best solution to that problem, I think, is to teach cfbot that it should test on top of the most recent master commit that succeeded in CI here https://github.com/postgres/postgres/commits/master . It's on my list...
Re: Experiments with Postgres and SSL
Here's a first cut at ALPN support. Currently it's using a hard coded "Postgres/3.0" protocol (hard coded both in the client and the server...). And it's hard coded to be required for direct connections and supported but not required for regular connections. IIRC I put a variable labeled a "GUC" but forgot to actually make it a GUC. But I'm thinking of maybe removing that variable since I don't see much of a use case for controlling this manually. I *think* ALPN is supported by all the versions of OpenSSL we support. The other patches are unchanged (modulo a free() that I missed in the client before). They still have the semi-open issues I mentioned in the previous email. -- greg From 32185020927824c4b57af900100a37f92ae6a040 Mon Sep 17 00:00:00 2001 From: Greg Stark Date: Mon, 20 Mar 2023 14:09:30 -0400 Subject: [PATCH v3 4/4] alpn support --- src/backend/libpq/be-secure-openssl.c| 65 src/backend/libpq/be-secure.c| 3 ++ src/backend/postmaster/postmaster.c | 23 + src/bin/psql/command.c | 7 ++- src/include/libpq/libpq-be.h | 1 + src/include/libpq/libpq.h| 1 + src/interfaces/libpq/fe-connect.c| 5 ++ src/interfaces/libpq/fe-secure-openssl.c | 25 + src/interfaces/libpq/libpq-int.h | 1 + 9 files changed, 129 insertions(+), 2 deletions(-) diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 685aa2ed69..034e1cf2ec 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -67,6 +67,12 @@ static int ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat static int dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata); static int verify_cb(int ok, X509_STORE_CTX *ctx); static void info_cb(const SSL *ssl, int type, int args); +static int alpn_cb(SSL *ssl, + const unsigned char **out, + unsigned char *outlen, + const unsigned char *in, + unsigned int inlen, + void *userdata); static bool initialize_dh(SSL_CTX *context, bool isServerStart); static bool 
initialize_ecdh(SSL_CTX *context, bool isServerStart); static const char *SSLerrmessage(unsigned long ecode); @@ -432,6 +438,11 @@ be_tls_open_server(Port *port) /* set up debugging/info callback */ SSL_CTX_set_info_callback(SSL_context, info_cb); + if (ssl_enable_alpn) { + elog(WARNING, "Enabling ALPN Callback"); + SSL_CTX_set_alpn_select_cb(SSL_context, alpn_cb, port); + } + if (!(port->ssl = SSL_new(SSL_context))) { ereport(COMMERROR, @@ -1250,6 +1261,60 @@ info_cb(const SSL *ssl, int type, int args) } } +static unsigned char alpn_protos[] = { + 12, 'P','o','s','t','g','r','e','s','/','3','.','0' +}; +static unsigned int alpn_protos_len = sizeof(alpn_protos); + +/* + * Server callback for ALPN negotiation. We use use the standard "helper" + * function even though currently we only accept one value. We store the + * negotiated protocol in Port->ssl_alpn_protocol and rely on higher level + * logic (in postmaster.c) to decide what to do with that info. + * + * XXX Not sure if it's kosher to use palloc and elog() inside OpenSSL + * callbacks. If we throw an error from here is there a risk of messing up + * OpenSSL data structures? It's possible it's ok becuase this is only called + * during connection negotiation which we don't try to recover from. + */ +static int alpn_cb(SSL *ssl, + const unsigned char **out, + unsigned char *outlen, + const unsigned char *in, + unsigned int inlen, + void *userdata) +{ + /* why does OpenSSL provide a helper function that requires a nonconst + * vector when the callback is declared to take a const vector? 
What are + * we to do with that?*/ + int retval; + Assert(userdata != NULL); + Assert(out != NULL); + Assert(outlen != NULL); + Assert(in != NULL); + + retval = SSL_select_next_proto((unsigned char **)out, outlen, + alpn_protos, alpn_protos_len, + in, inlen); + if (*out == NULL || *outlen > alpn_protos_len || outlen <= 0) + elog(ERROR, "SSL_select_next_proto returned nonsensical results"); + + if (retval == OPENSSL_NPN_NEGOTIATED) + { + struct Port *port = (struct Port *)userdata; + + port->ssl_alpn_protocol = palloc0(*outlen+1); + /* SSL uses unsigned chars but Postgres uses host signedness of chars */ + strncpy(port->ssl_alpn_protocol, (char*)*out, *outlen); + return SSL_TLSEXT_ERR_OK; + } else if (retval == OPENSSL_NPN_NO_OVERLAP) { + return SSL_TLSEXT_ERR_NOACK; + } else { + return SSL_TLSEXT_ERR_NOACK; + } +} + + /* * Set DH parameters for generating ephemeral DH keys. The * DH parameters can take a long time to compute, so they must be diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c index 1020b3adb0..79a61900ba 100644 --- a/src/backend/libpq/be-secure.c +++ b/src/backend/libpq/be-secure.c @@ -61,6 +61,9 @@ bool SSLPreferServerCiphers; int
Re: Commitfest 2023-03 starting tomorrow!
On Mon, 20 Mar 2023 at 16:08, Thomas Munro wrote: > > On Tue, Mar 21, 2023 at 3:15 AM Greg Stark wrote: > > The next level of this would be something like notifying the committer > > with a list of patches in the CF that a commit broke. I don't > > immediately see how to integrate that with our workflow but I have > > seen something like this work well in a previous job. When committing > > code you often went and updated other unrelated projects to adapt to > > the new API (or could adjust the code you were committing to cause > > less breakage). > > I've been hesitant to make it send email. The most obvious message to > send would be "hello, you posted a patch, but it fails on CI" to the > submitter. Yeah, even aside from flappers there's the problem that it's about as common for real commits to break some test as it is for patches to start failing tests. So often it's a real failure but it's nothing to do with the patch, it has to do with the commit that went into git. What I'm interested in is something like "hey, your commit caused 17 patches to start failing tests". And it doesn't necessarily have to be an email. Having a historical page so when I look at a patch I can go check, "hey did this start failing at the same time as 17 other patches on the same commit?" would be the same question. -- greg
Re: Add SHELL_EXIT_CODE to psql
On Mon, Mar 20, 2023 at 1:01 PM Tom Lane wrote: > Corey Huinker writes: > > 128+N is implemented. > > I think this mostly looks OK, but: > > * I still say there is no basis whatever for believing that the result > of ferror() is an exit code, errno code, or anything else with > significance beyond zero-or-not. Feeding it to wait_result_to_exit_code > as you've done here is not going to do anything but mislead people in > a platform-dependent way. Probably should set exit_code to -1 if > ferror reports trouble. > Agreed. Sorry, I read your comment wrong the last time or I would have already made it so. > > * Why do you have wait_result_to_exit_code defaulting to return 0 > if it doesn't recognize the code as either WIFEXITED or WIFSIGNALED? > That seems pretty misleading; again -1 would be a better idea. > That makes sense as well. Given that WIFSIGNALED is currently defined as the negation of WIFEXITED, whatever default result we have here is basically a this-should-never-happen. That might be something we want to catch with an assert, but I'm fine with a -1 for now. From 1ff2b5ba4b82efca0edf60a9fb1cdf8307471eee Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 06:43:24 -0400 Subject: [PATCH v11 1/3] Add PG_OS_TARGET environment variable to enable OS-specific changes to regression tests. --- src/test/regress/pg_regress.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 7b23cc80dc..333aaab139 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -594,6 +594,17 @@ initialize_environment(void) #endif } + /* + * Regression tests that invoke OS commands must occasionally use + * OS-specific syntax (example: NUL vs /dev/null). This env var allows + * those tests to adjust commands accordingly. 
+ */ +#if defined(WIN32) + setenv("PG_OS_TARGET", "WIN32", 1); +#else + setenv("PG_OS_TARGET", "OTHER", 1); +#endif + /* * Set translation-related settings to English; otherwise psql will * produce translated messages and produce diffs. (XXX If we ever support -- 2.39.2 From 3d9c43cfec7f60785c1f1fa1aaa3e1b48224ef98 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Mon, 20 Mar 2023 16:05:10 -0400 Subject: [PATCH v11 2/3] Add wait_result_to_exit_code(). --- src/common/wait_error.c | 15 +++ src/include/port.h | 1 + 2 files changed, 16 insertions(+) diff --git a/src/common/wait_error.c b/src/common/wait_error.c index 4a3c3c61af..80b94cae51 100644 --- a/src/common/wait_error.c +++ b/src/common/wait_error.c @@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found) return true; return false; } + +/* + * Return the POSIX exit code (0 to 255) that corresponds to the argument. + * The argument is a return code returned by wait(2) or waitpid(2), which + * also applies to pclose(3) and system(3). + */ +int +wait_result_to_exit_code(int exit_status) +{ + if (WIFEXITED(exit_status)) + return WEXITSTATUS(exit_status); + if (WIFSIGNALED(exit_status)) + return 128 + WTERMSIG(exit_status); + return -1; +} diff --git a/src/include/port.h b/src/include/port.h index e66193bed9..1b119a3c56 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src); extern char *wait_result_to_str(int exitstatus); extern bool wait_result_is_signal(int exit_status, int signum); extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found); +extern int wait_result_to_exit_code(int exit_status); /* * Interfaces that we assume all Unix system have. 
We retain individual macros -- 2.39.2 From bf65f36fd57977acaac57972425d1717a205cb72 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Mon, 20 Mar 2023 16:05:23 -0400 Subject: [PATCH v11 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE which report the success or failure of the most recent psql OS-command executed via the \! command or `backticks`. These variables are roughly analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL commands. --- doc/src/sgml/ref/psql-ref.sgml | 21 ++ src/bin/psql/command.c | 15 + src/bin/psql/help.c| 4 src/bin/psql/psqlscanslash.l | 33 + src/test/regress/expected/psql.out | 34 ++ src/test/regress/sql/psql.sql | 30 ++ 6 files changed, 133 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7b8ae9fac3..e6e307fd18 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4267,6 +4267,27 @@ bar + + SHELL_ERROR + + + true if the last shell command failed, false if + it succeeded. This applies to shell
Re: [PATCH] Add <> support to sepgsql_restorecon
Not this month, unfortunately; other work has taken precedence. I'll need to look at what it's going to take to create a test. Hopefully I can piggyback on an existing test. Ted On Mon, Mar 20, 2023 at 3:05 PM Gregory Stark (as CFM) wrote: > > Ok, sounds reasonable. Maybe just add a comment to that effect. > > > This needs regression test support for the feature and some minimal > documentation that shows how to make use of it. > > Hm. It sounds like this patch is uncontroversial but is missing > documentation and tests? Has this been addressed? Do you think you'll > get a chance to resolve those issues this month in time for this > release? > > -- > Gregory Stark > As Commitfest Manager >
Re: Commitfest 2023-03 starting tomorrow!
On Tue, Mar 21, 2023 at 3:15 AM Greg Stark wrote: > The next level of this would be something like notifying the committer > with a list of patches in the CF that a commit broke. I don't > immediately see how to integrate that with our workflow but I have > seen something like this work well in a previous job. When committing > code you often went and updated other unrelated projects to adapt to > the new API (or could adjust the code you were committing to cause > less breakage). I've been hesitant to make it send email. The most obvious message to send would be "hello, you posted a patch, but it fails on CI" to the submitter. Cfbot has been running for about 5 years now, and I'd say the rate of spurious/bogus failures has come down a lot over that time as we've chased down the flappers in master, but it's still enough that you would quickly become desensitised/annoyed by the emails, I guess (one of the reasons I recently started keeping history is to be able to do some analysis of that so we can direct attention to chasing down the rest, or have some smart detection of known but not yet resolved flappers, I dunno, something like that). Speaking as someone who occasionally sends patches to other projects, it's confusing and unsettling when you get automated emails from github PRs telling you that this broke, that broke and the other broke, but only the project insiders know which things are newsworthy and which things are "oh yeah that test is a bit noisy, ignore that one".
Re: [PATCH] Add <> support to sepgsql_restorecon
> Ok, sounds reasonable. Maybe just add a comment to that effect. > This needs regression test support for the feature and some minimal > documentation that shows how to make use of it. Hm. It sounds like this patch is uncontroversial but is missing documentation and tests? Has this been addressed? Do you think you'll get a chance to resolve those issues this month in time for this release? -- Gregory Stark As Commitfest Manager
Re: Inconsistency in reporting checkpointer stats
On Sun, 19 Feb 2023 at 04:58, Nitin Jadhav wrote: > > > This doesn't pass the tests, because the regression tests weren't adjusted: > > https://api.cirrus-ci.com/v1/artifact/task/5937624817336320/testrun/build/testrun/regress/regress/regression.diffs > > Thanks for sharing this. I have fixed this in the patch attached. AFAICS this patch was marked Waiting on Author due to this feedback on Feb 14 but the patch was updated Feb 19 and is still passing today. I've marked it Needs Review. I'm not sure if all of the feedback from Bharath Rupireddy and Kyotaro Horiguchi was addressed. If there's any specific questions remaining you need feedback on it would be good to ask explicitly. Otherwise if you think it's ready you could mark it Ready for Commit. -- Gregory Stark As Commitfest Manager
Re: Optimizing Node Files Support
I think it's clear we aren't going to be taking this patch in its current form. Perhaps there are better ways to do what these files do (I sure think there are!) but I don't think micro-optimizing the copying is something people are super excited about. It sounds like rethinking how to make these functions more convenient for programmers to maintain reliably would be more valuable. I guess I'll mark it Returned with Feedback -- if there are significant performance gains to show without making the code harder to maintain and/or a nicer way to structure this code in general then we can revisit this. -- Gregory Stark As Commitfest Manager
Re: pg_stats and range statistics
On 20.03.2023 22:27, Gregory Stark (as CFM) wrote: On Sun, 22 Jan 2023 at 18:22, Tomas Vondra wrote: I wonder if we have other functions doing something similar, i.e. accepting a polymorphic type and then imposing additional restrictions on it. Meh, there's things like array comparison functions that require both arguments to be the same kind of arrays. And array_agg that requires the elements to be the same type as the state array (ie, same type as the first element). Not sure there are any taking just one specific type though. Shouldn't this add some sql tests ? Yeah, I guess we should have a couple tests calling these functions on different range arrays. This reminds me lower()/upper() have some extra rules about handling empty ranges / infinite boundaries etc. These functions should behave consistently (as if we called lower() in a loop) and I'm pretty sure that's not the current state. Are we still waiting on these two items? Egor, do you think you'll have a chance to work it for this month? I can try to tidy things up, but I'm not sure if we reached a consensus. Do we stick with the ranges_upper(anyarray) and ranges_lower(anyarray) functions? This approach is okay with me. Tomas, have you made up your mind? Do we want to document these functions? They are very pg_statistic-specific and won't be useful for end users imo.
Re: Save a few bytes in pg_attribute
On Mon, 20 Mar 2023, 11:00 pm Peter Eisentraut, wrote: > After the discussion in [0] ff., I was looking around in pg_attribute > and noticed that we could possibly save 4 bytes. We could change both > attstattarget and attinhcount from int4 to int2, which together with > some reordering would save 4 bytes from the fixed portion. I just want to highlight 1ef61ddce9, which fixed a very long-standing bug that meant that pg_inherits.inhseqno was effectively just 16-bit. Perhaps because nobody seemed to report that as a limitation, 16 bits is enough space. I only noticed that as a bug from code reading. David
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Thu, 29 Sept 2022 at 15:34, Tom Lane wrote: > > Martin Kalcher writes: > > New patch: array_shuffle() and array_sample() use pg_global_prng_state now. > > I took a closer look at the patch today. I find this behavior a bit > surprising: > It looks like this patch received useful feedback and it wouldn't take much to push it over the line. But it's been Waiting On Author since last September. Martin, any chance of getting these last bits of feedback resolved so it can be Ready for Commit? -- Gregory Stark As Commitfest Manager
Re: [PATCH] Add initial xid/mxid/mxoff to initdb
On Fri, 25 Nov 2022 at 07:22, Peter Eisentraut wrote: > > Just for completeness, over in the other thread the feedback was that > this functionality is better put into pg_resetwal. So is that other thread tracked in a different commitfest entry and this one completely redundant? I'll mark it Rejected then? -- Gregory Stark As Commitfest Manager
Re: pg_stats and range statistics
On Sun, 22 Jan 2023 at 18:22, Tomas Vondra wrote: > > I wonder if we have other functions doing something similar, i.e. > accepting a polymorphic type and then imposing additional restrictions > on it. Meh, there's things like array comparison functions that require both arguments to be the same kind of arrays. And array_agg that requires the elements to be the same type as the state array (ie, same type as the first element). Not sure there are any taking just one specific type though. > > Shouldn't this add some sql tests ? > > Yeah, I guess we should have a couple tests calling these functions on > different range arrays. > > This reminds me lower()/upper() have some extra rules about handling > empty ranges / infinite boundaries etc. These functions should behave > consistently (as if we called lower() in a loop) and I'm pretty sure > that's not the current state. Are we still waiting on these two items? Egor, do you think you'll have a chance to work on it this month? -- Gregory Stark As Commitfest Manager
Re: meson documentation build open issues
Hi, On 2023-03-19 19:33:38 -0700, Andres Freund wrote: > I think we can make the docs build in parallel and incrementally, by building > the different parts of the docs in parallel, using --stringparam rootid, > e.g. building each 'part' separately. > > A very very rough draft attached: > > parallel with parts: > real 0m10.831s > user 0m58.295s > sys 0m1.402s > > normal: > real 0m32.215s > user 0m31.876s > sys 0m0.328s > > 1/3 of the build time at 2x the cost is nothing to sneeze at. I could not make myself stop trying to figure out where the big constant time factor comes from. Every invocation costs about 2s, even if not much is rendered. Turns out, that's solely spent building all the keys. The first time *any* key() is invoked for a document, all the keys are computed in a single pass over the document. A single reasonable key doesn't take that much time, even for the size of our docs. But there are several redundant keys being built. Some of them somewhat expensive. E.g. each such key takes about 300ms. There's one in chunk-common and one in docbook-no-doctype.xsl. I'm going to cry now. Greetings, Andres Freund
Re: Request for comment on setting binary format output per session
On Mon, 2023-03-20 at 14:36 -0400, Dave Cramer wrote: > Thanks for the review. I'm curious what system you are running on as > I don't see any of these errors. Are asserts enabled? > Well I'm guessing psql doesn't know how to read date or timestamptz > in binary. This is not a failing of the code. It seems strange, and potentially dangerous, to send binary data to a client that's not expecting it. It feels too easy to cause confusion by changing the GUC mid-session. Also, it seems like DISCARD ALL is not resetting it, which I think is a bug. > > This is an interesting question. If the type isn't visible then it's > not visible to the query so I don't think that's true -- the type could be in a different schema from the table. > > > > 5. There's a theoretical invalidation problem. It might also be a > > practical problem in some testing setups with long-lived > > connections > > that are recreating user-defined types. > > > > UDT's seem to be a problem here which candidly have very little use > case for binary output. I mostly agree with that, but it also might not be hard to support UDTs. Is there a design problem here or is it "just a matter of code"? > > I didn't try to solve it as Tom was OK with using a GUC. Using a > startup GUC is interesting, > but how would that work with pools where we want to reset the > connection when we return it and then > set the binary format on borrow ? By using a GUC when a client > borrows a connection from a pool the client > can reconfigure the oids it wants formatted in binary. That's a good point. How common is it to share a connection pool between different clients (some of which might support a binary format, and others which don't)? And would the connection pool put connections with and without the property in different pools? > > I really hadn't considered supporting type names. I have asked Paul > Ramsey about PostGIS and he doesn't see PostGIS using this. 
One of the things I like about Postgres is that the features all work together, and that user-defined objects are generally as good as built-in ones. Sometimes there's a reason to make a special case (e.g. syntax support or something), but in this case it seems like we could support user-defined types just fine, right? It's also just more friendly and readable to use type names, especially if it's a GUC. Regards, Jeff Davis
Re: Request for comment on setting binary format output per session
Dave Cramer writes: > On Mon, 20 Mar 2023 at 13:05, Jeff Davis wrote: >> 2. Easy to confuse psql: >> >> CREATE TABLE a(d date, t timestamptz); >> SET format_binary='25,1082,1184'; >> SELECT * FROM a; >> d | t >> ---+--- >> ! | >> (1 row) >> >> Well I'm guessing psql doesn't know how to read date or timestamptz in >> binary. This is not a failing of the code. What it is is a strong suggestion that controlling this via a GUC is not a great choice. There are many inappropriate (wrong abstraction level) ways to change a GUC and thereby break a client that's not expecting binary output. I think Jeff's suggestion that we should treat this as a protocol extension might be a good idea. If I recall the protocol-extension design correctly, such a setting could only be set at session start, which could be annoying --- at the very least we'd have to tolerate entries for unrecognized data types, since clients couldn't be expected to have checked the list against the current server in advance. regards, tom lane
Re: Eliminating SPI from RI triggers - take 2
On Mon, 17 Oct 2022 at 14:59, Robert Haas wrote: > > On Sat, Oct 15, 2022 at 1:47 AM Amit Langote wrote: > > But I think the bigger problem for this patch set is that the > design-level feedback from > https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com > hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid > is still trivial in v7, and that still seems wrong to me. And I still > don't know how we're going to avoid changing the semantics in ways > that are undesirable, or even knowing precisely what we did change. If > we don't have answers to those questions, then I suspect that this > patch set isn't going anywhere. Amit, do you plan to work on this patch for this commitfest (and therefore this release?). And do you think it has a realistic chance of being ready for commit this month? It looks to me like you have some good feedback and can progress and are unlikely to finish this patch for this release. In which case maybe we can move it forward to the next release? -- Gregory Stark As Commitfest Manager
Re: doc: add missing "id" attributes to extension packaging page
On Thu, 26 Jan 2023 at 15:55, Brar Piening wrote: > > On 18.01.2023 at 06:50, Brar Piening wrote: > > > I'll give it a proper look this weekend. > > It turns out I didn't get a round tuit. > > ... and I'm afraid I probably will not be able to work on this until > mid/end February so we'll have to move this to the next commitfest until > somebody wants to take it over and push it through. Looks like a lot of good work was happening on this patch right up until mid-February. Is there a lot of work left? Do you think you'll have a chance to wrap it up this commitfest for this release? -- Gregory Stark As Commitfest Manager
Re: [PATCH] psql: Add tab-complete for optional view parameters
On Sun, 29 Jan 2023 at 05:20, Mikhail Gribkov wrote: > > The problem is obviously in the newly added second line of the following > clause: > COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME", > "SET SCHEMA", "SET (", "RESET ("); > > "set schema" and "set (" alternatives are competing, while completion of the > common part "set" leads to a string composition which does not have > the check branch (Matches("ALTER", "VIEW", MatchAny, "SET")). > > I think it may worth looking at "alter materialized view" completion tree > and making "alter view" the same way. > > The new status of this patch is: Waiting on Author I think this patch received real feedback and it looks like it's clear where there's more work needed. I'll move this to the next commitfest. If you plan to work on it this month we can always move it back. -- Gregory Stark As Commitfest Manager
Re: Request for comment on setting binary format output per session
On Mon, 2023-03-20 at 10:04 -0700, Jeff Davis wrote: > CREATE TABLE a(d date, t timestamptz); > SET format_binary='25,1082,1184'; > SELECT * FROM a; > d | t > ---+--- > ! | > (1 row) Oops, missing the following statement after the CREATE TABLE: INSERT INTO a VALUES('1234-01-01', '2023-03-20 09:00:00'); Regards, Jeff Davis
Re: Make ON_ERROR_STOP stop on shell script failure
On Mon, 20 Mar 2023 at 14:34, Gregory Stark (as CFM) wrote: > > Pruning bouncing email address -- please respond from this point in > thread, not previous messages. Oh for heaven's sake. Trying again to prune bouncing email address. Please respond from *here* on the thread Sorry -- Gregory Stark As Commitfest Manager
Re: Request for comment on setting binary format output per session
+Paul Ramsey On Mon, 20 Mar 2023 at 13:05, Jeff Davis wrote: > On Mon, 2023-03-13 at 16:33 -0400, Dave Cramer wrote: > > Attached is a preliminary patch that takes a list of OID's. I'd like > > to know if this is going in the right direction. > > Thanks for the review. I'm curious what system you are running on as I don't see any of these errors. > I found a few issues: > > 1. Some kind of memory error: > > SET format_binary='25,1082,1184'; > WARNING: problem in alloc set PortalContext: detected write past > chunk end in block 0x55ba7b5f7610, chunk 0x55ba7b5f7a48 > ... > SET > 2. Easy to confuse psql: > > CREATE TABLE a(d date, t timestamptz); > SET format_binary='25,1082,1184'; > SELECT * FROM a; >d | t > ---+--- >! | > (1 row) > > Well I'm guessing psql doesn't know how to read date or timestamptz in binary. This is not a failing of the code. > 3. Some style issues > - use of "//" comments > - findOid should return bool, not int > > Sure will fix see attached patch > When you add support for user-defined types, that introduces a couple > other issues: > > 4. The format_binary GUC would depend on the search_path GUC, which > isn't great. > This is an interesting question. If the type isn't visible then it's not visible to the query so > > 5. There's a theoretical invalidation problem. It might also be a > practical problem in some testing setups with long-lived connections > that are recreating user-defined types. > UDT's seem to be a problem here which candidly have very little use case for binary output. > > > We've had this problem with binary for a long time, and it seems > desirable to solve it. But I'm not sure GUCs are the right way. > > How hard did you try to solve it in the protocol rather than with a > GUC? I see that the startup message allows protocol extensions by > prefixing a parameter name with "_pq_". Are protocol extensions > documented somewhere and would that be a reasonable thing to do here? 
> I didn't try to solve it as Tom was OK with using a GUC. Using a startup GUC is interesting, but how would that work with pools where we want to reset the connection when we return it and then set the binary format on borrow? By using a GUC when a client borrows a connection from a pool the client can reconfigure the oids it wants formatted in binary. > > Also, if we're going to make the binary format more practical to use, can we document the expectations better? Yes we can do that. > It seems the expectation is > that the binary format just never changes, and that if it does, that's > a new type name. > > I really hadn't considered supporting type names. I have asked Paul Ramsey about PostGIS and he doesn't see PostGIS using this. > Regards, > Jeff Davis > > 0002-style-issues-with.patch
Re: Make ON_ERROR_STOP stop on shell script failure
Pruning bouncing email address -- please respond from this point in thread, not previous messages. -- Gregory Stark As Commitfest Manager
Re: Make ON_ERROR_STOP stop on shell script failure
So I took a look at this patch. The conflict is with 2fe3bdbd691 committed by Peter Eisentraut which added error checks for pipes. Afaics the behaviour is now for pipe commands returning non-zero to cause an error *always* regardless of the setting of ON_ERROR_STOP. I'm not entirely sure that's sensible actually. If you write to a pipe that ends in grep and it happens to produce no matching rows you may actually be quite surprised when that causes your script to fail... But if you remove that failing hunk the resulting patch does apply. I don't see any tests so ... I don't know if the behaviour is still sensible. A quick read gives me the impression that now it's actually inconsistent in the other direction where it stops sometimes more often than the user might expect. I also don't understand the difference between ON_ERROR_STOP_ON and ON_ERROR_STOP_ALL. Nor why we would want ON_ERROR_STOP_SHELL which stops only on shell errors, rather than, say, ON_ERROR_STOP_SQL to do the converse which would at least match the historical behaviour? From 1f0feb9daf106721cb7fcba39ef7d663178f8ed1 Mon Sep 17 00:00:00 2001 From: Greg Stark Date: Mon, 20 Mar 2023 14:25:22 -0400 Subject: [PATCH v5] on error stop for shell --- doc/src/sgml/ref/psql-ref.sgml | 7 +++- src/bin/psql/command.c | 20 ++-- src/bin/psql/common.c | 58 +++--- src/bin/psql/psqlscanslash.l | 29 +++-- src/bin/psql/settings.h| 10 +- src/bin/psql/startup.c | 29 + 6 files changed, 134 insertions(+), 19 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7b8ae9fac3..856bb5716f 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4194,7 +4194,12 @@ bar By default, command processing continues after an error. When this variable is set to on, processing will instead stop -immediately. In interactive mode, +immediately upon an error in SQL query. 
When this variable is set to +shell, a nonzero exit status of a shell command, +which indicates failure, is interpreted as an error that stops the processing. +When this variable is set to all, errors from both +SQL queries and shell commands can stop the processing. +In interactive mode, psql will return to the command prompt; otherwise, psql will exit, returning error code 3 to distinguish this case from fatal error diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 61ec049f05..9fbf2522ea 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -34,6 +34,7 @@ #include "describe.h" #include "fe_utils/cancel.h" #include "fe_utils/print.h" +#include "fe_utils/psqlscan_int.h" #include "fe_utils/string_utils.h" #include "help.h" #include "input.h" @@ -2394,9 +2395,13 @@ exec_command_set(PsqlScanState scan_state, bool active_branch) */ char *newval; char *opt; + PQExpBuffer output_buf = scan_state->output_buf; opt = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, false); + if (output_buf->len >= output_buf->maxlen +&& (pset.on_error_stop == PSQL_ERROR_STOP_SHELL || pset.on_error_stop == PSQL_ERROR_STOP_ALL)) +success = false; newval = pg_strdup(opt ? opt : ""); free(opt); @@ -5041,10 +5046,19 @@ do_shell(const char *command) else result = system(command); - if (result == 127 || result == -1) + if (result != 0) { - pg_log_error("\\!: failed"); - return false; + if (result < 0) + pg_log_error("could not execute command: %m"); + else + { + char *reason = wait_result_to_str(result); + + pg_log_error("%s: %s", command, reason ? 
reason : ""); + free(reason); + } + if (pset.on_error_stop == PSQL_ERROR_STOP_SHELL || pset.on_error_stop == PSQL_ERROR_STOP_ALL) + return false; } return true; } diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index f907f5d4e8..c87090a75d 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -94,6 +94,7 @@ setQFout(const char *fname) { FILE *fout; bool is_pipe; + bool status = true; /* First make sure we can open the new output file/pipe */ if (!openQueryOutputFile(fname, , _pipe)) @@ -103,7 +104,24 @@ setQFout(const char *fname) if (pset.queryFout && pset.queryFout != stdout && pset.queryFout != stderr) { if (pset.queryFoutPipe) - pclose(pset.queryFout); + { + int pclose_rc = pclose(pset.queryFout); + if (pclose_rc != 0) + { +if (pclose_rc < 0) + pg_log_error("could not close pipe to external command: %m"); +else +{ + char *reason = wait_result_to_str(pclose_rc); + + pg_log_error("command failure: %s", + reason ? reason : ""); + free(reason); +} +if ((pset.on_error_stop == PSQL_ERROR_STOP_SHELL || pset.on_error_stop == PSQL_ERROR_STOP_ALL)) + status = false; + } + } else
Re: Save a few bytes in pg_attribute
Tomas Vondra writes: > IMHO it'd be much better to just not store the statistics target for > attributes that have it default (which we now identify by -1), or for > system attributes (where we store 0). I'd bet vast majority of systems > will just use the default / GUC value. So if we're interested in saving > these bytes, we could just store NULL in these cases, no? Hmm, we'd have to move it to the nullable part of the row and expend more code to fetch it; but I don't think it's touched in many places, so that might be a good tradeoff. Couple of notes: * As things stand I think we have a null bitmap in every row of pg_attribute already (surely attfdwoptions and attmissingval are never both filled), so there's no extra cost there. * Putting it in the variable part of the row means it wouldn't appear in tuple descriptors, but that seems fine. I wonder if the same is true of attinhcount. Since it's nonzero for partitions' attributes, it might be non-null in a fairly large fraction of pg_attribute rows in some use-cases, but it still seems like we'd not be losing anything. It wouldn't need to be touched in any high-performance code paths AFAICS. regards, tom lane
Re: [PATCH] Fix alter subscription concurrency errors
"Gregory Stark (as CFM)" writes: > Hm. This patch is still waiting on updates. But it's a bug fix and it > would be good to get this in. Is someone else interested in finishing > this if Jeite isn't available? I think the patch as-submitted is pretty uninteresting, mainly because the design of adding bespoke lock code for subscription objects doesn't scale. I'm not excited about doing this just for subscriptions without at least a clear plan for working on other object types. Alvaro suggested switching it to use get_object_address() instead, which'd be better; but before going in that direction we might have to do more work on get_object_address's error reporting (note the para in its header comments saying it's pretty weak on that). regards, tom lane
Re: Fix order of checking ICU options in initdb and create database
Is this feedback enough to focus the work on the right things? I feel like Marina Polyakova pointed out some real confusing behaviour and perhaps there's a way to solve them by focusing on one at a time without making large changes in the code. Perhaps an idea would be to have each module provide two functions, one which is called early and signals an error if that module's parameters are provided when it's not compiled in, and a second which verifies that the parameters are consistent at the point in time where that's appropriate. (Not entirely unlike what we do for GUCs, though simpler) If every module did that consistently then it would avoid making the changes "unprincipled" or "spaghetti" though frankly I find words like that not very helpful to someone receiving that feedback. The patch is obviously not ready for commit now but it also seems like the feedback has not been really sufficient for Marina Polyakova to make progress either. -- Gregory Stark As Commitfest Manager
Re: Data is copied twice when specifying both child and parent table in publication
On Fri, Mar 17, 2023 at 9:45 PM Amit Kapila wrote: > > There are a bunch of moving parts and hidden subtleties here, and I fell > > into a few traps when I was working on my patch, so it'd be nice to have > > additional coverage. I'm happy to contribute effort in that area if it's > > helpful. > > I think it depends on what tests you have in mind. Just the ones I mentioned, to start with. > I suggest you can > propose a patch to cover tests for this are in a separate thread. We > can then evaluate those separately. To confirm -- you want me to start a new thread for tests for this patchset? (Tests written against HEAD would likely be obsoleted by this change.) --Jacob
Re: Error "initial slot snapshot too large" in create replication slot
On Wed, 12 Oct 2022 at 01:10, Michael Paquier wrote: > > On Tue, Sep 13, 2022 at 05:31:05PM +0900, Kyotaro Horiguchi wrote: > > And ExportSnapshot replaces oversized subxip with overflowed. > > > > So even when GetSnapshotData() returns a snapshot with oversized > > subxip, it is saved as just "overflowed" on exporting. I don't think > > this is the expected behavior since such (no xip and overflowed) > > snapshot no longer works. > > > > On the other hand, it seems to me that snapbuild doesn't like > > takenDuringRecovery snapshots. > > > > So the snapshot needs an additional flag signaling that xip is oversized and > > all xids are held there. And we also need to let takenDuringRecovery > > indicate that subxip is oversized. > > The discussion seems to have stopped here. As this is classified as a > bug fix, I have moved this patch to next CF, waiting on author for the > moment. Kyotaro Horiguchi, any chance you'll be able to work on this for this commitfest? If so, shout (or perhaps someone else is planning to push it over the line, Andres?); otherwise I'll move it on to the next release. -- Gregory Stark As Commitfest Manager
Re: Save a few bytes in pg_attribute
On 3/20/23 15:37, Tom Lane wrote: > Peter Eisentraut writes: >> After the discussion in [0] ff., I was looking around in pg_attribute >> and noticed that we could possibly save 4 bytes. We could change both >> attstattarget and attinhcount from int4 to int2, which together with >> some reordering would save 4 bytes from the fixed portion. > >> attstattarget is already limited to 10000, so this wouldn't lose >> anything. For attinhcount, I don't see any documented limits. But it >> seems unlikely to me that someone would need more than 32k immediate >> inheritance parents on a column. (Maybe an overflow check would be >> useful, though, to prevent shenanigans.) > >> The attached patch seems to work. Thoughts? > > I agree that attinhcount could be narrowed, but I have some concern > about attstattarget. IIRC, the limit on attstattarget was once 1000 > and then we raised it to 10000. Is it inconceivable that we might > want to raise it to 100000 someday? > Yeah, I don't think it'd be wise to make it harder to increase the statistics target limit. IMHO it'd be much better to just not store the statistics target for attributes that have the default value (which we now identify by -1), or for system attributes (where we store 0). I'd bet the vast majority of systems will just use the default / GUC value. So if we're interested in saving these bytes, we could just store NULL in these cases, no? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Fix alter subscription concurrency errors
On Mon, 16 Jan 2023 at 09:47, vignesh C wrote: > > Jeite, please post an updated version with the fixes. As CommitFest > 2023-01 is currently underway, this would be an excellent time to > update the patch. Hm. This patch is still waiting on updates. But it's a bug fix and it would be good to get this in. Is someone else interested in finishing this if Jeite isn't available? -- Gregory Stark As Commitfest Manager
Re: real/float example for testlibpq3
On Mon, 23 Jan 2023 at 11:54, Mark Wong wrote: > ...fficient way to communicate useful information. > > Yeah, I will try to cover all the data types we ship. :) I'll keep at > it and drop the code examples. I assume this isn't going to happen for this commitfest? If you think it is then shout, otherwise I'll mark it Returned with Feedback and move it on to the next release. -- Gregory Stark As Commitfest Manager
Re: meson documentation build open issues
Hi, On 2023-03-20 11:58:08 +0100, Peter Eisentraut wrote: > Oh, this patch set grew quite quickly. ;-) Yep :) > [PATCH v2 5/8] docs: html: copy images to output as part of xslt build > > Making the XSLT stylesheets do the copying has some appeal. I think it > would only work for SVG (or other XML) files, which I guess is okay, but > maybe the templates should have a filter on format="SVG" or something. Also, > this copying actually modifies the files in some XML-equivalent way. Also > okay, I think, but worth noting. I think it can be made to work for non-xml files with xinclude too. But the restriction around only working in top-level stylesheets (vs everywhere for documents) is quite annoying. > [PATCH v2 6/8] wip: docs: copy or inline css > > This seems pretty complicated compared to just copying a file? Mainly that it works correctly for the standalone file. Greetings, Andres Freund
Re: Request for comment on setting binary format output per session
On Mon, 2023-03-13 at 16:33 -0400, Dave Cramer wrote: > Attached is a preliminary patch that takes a list of OID's. I'd like > to know if this is going in the right direction. I found a few issues: 1. Some kind of memory error: SET format_binary='25,1082,1184'; WARNING: problem in alloc set PortalContext: detected write past chunk end in block 0x55ba7b5f7610, chunk 0x55ba7b5f7a48 ... SET 2. Easy to confuse psql: CREATE TABLE a(d date, t timestamptz); SET format_binary='25,1082,1184'; SELECT * FROM a; d | t ---+--- ! | (1 row) 3. Some style issues - use of "//" comments - findOid should return bool, not int When you add support for user-defined types, that introduces a couple other issues: 4. The format_binary GUC would depend on the search_path GUC, which isn't great. 5. There's a theoretical invalidation problem. It might also be a practical problem in some testing setups with long-lived connections that are recreating user-defined types. We've had this problem with binary for a long time, and it seems desirable to solve it. But I'm not sure GUCs are the right way. How hard did you try to solve it in the protocol rather than with a GUC? I see that the startup message allows protocol extensions by prefixing a parameter name with "_pq_". Are protocol extensions documented somewhere and would that be a reasonable thing to do here? Also, if we're going to make the binary format more practical to use, can we document the expectations better? It seems the expectation is that the binary format just never changes, and that if it does, that's a new type name. Regards, Jeff Davis
Re: logical decoding and replication of sequences, take 2
On 3/20/23 13:26, Amit Kapila wrote: > On Mon, Mar 20, 2023 at 5:13 PM Tomas Vondra > wrote: >> >> On 3/20/23 12:00, Amit Kapila wrote: >>> On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra >>> wrote: I don't understand why we'd need WAL from before the slot is created, which happens before copy_sequence so the sync will see a more recent state (reflecting all changes up to the slot LSN). >>> >>> Imagine the following sequence of events: >>> 1. Operation on a sequence seq-1 which requires WAL. Say, this is done >>> at LSN 1000. >>> 2. Some other random operations on unrelated objects. This would >>> increase LSN to 2000. >>> 3. Create a slot that uses current LSN 2000. >>> 4. Copy sequence seq-1 where you will get the LSN value as 1000. Then >>> you will use LSN 1000 as a starting point to start replication in >>> sequence sync worker. >>> >>> It is quite possible that WAL from LSN 1000 may not be present. Now, >>> it may be possible that we use the slot's LSN in this case but >>> currently, it may not be possible without some changes in the slot >>> machinery. Even, if we somehow solve this, we have the below problem >>> where we can miss some concurrent activity. >>> >> >> I think the question is what would be the WAL-requiring operation at LSN >> 1000. If it's just regular nextval(), then we *will* see it during >> copy_sequence - sequences are not transactional in the MVCC sense. >> >> If it's an ALTER SEQUENCE, I guess it might create a new relfilenode, >> and then we might fail to apply this - that'd be bad. >> >> I wonder if we'd allow actually discarding the WAL while building the >> consistent snapshot, though. >> > > No, as soon as we reserve the WAL location, we update the slot's > minLSN (replicationSlotMinLSN) which would prevent the required WAL > from being removed. > >> You're however right we can't just decide >> this based on LSN, we'd probably need to compare the relfilenodes too or >> something like that ... 
>> I think the only "issue" is the WAL records after the slot LSN, or more precisely deciding which of the decoded changes to apply. > Now, for the second idea which is to directly use > pg_current_wal_insert_lsn(), I think we won't be able to ensure that > the changes covered by in-progress transactions like the one with > Alter Sequence in the example I have given would be streamed later after the > initial copy. Because the LSN returned by pg_current_wal_insert_lsn() > could be an LSN after the LSN associated with Alter Sequence but > before the corresponding xact's commit. Yeah, I think you're right - the locking itself is not sufficient to prevent this ordering of operations. copy_sequence would have to lock the sequence exclusively, which seems a bit disruptive. >>> >>> Right, that doesn't sound like a good idea. >>> >> >> Although, maybe we could use a less strict lock level? I mean, one that >> allows nextval() to continue, but would conflict with ALTER SEQUENCE. >> > > I don't know if that is a good idea but are you imagining a special > interface/mechanism just for logical replication because as far as I > can see you have used SELECT to fetch the sequence values? > Not sure what the special mechanism would be? I don't think it could read the sequence from somewhere else, and due to the lack of MVCC we'd just read the same sequence data from the current relfilenode. Or what else would it do? The one thing we can't quite do at the moment is locking the sequence, because LOCK is only supported for tables. So we could either provide a function to lock a sequence, or one that locks it and then returns the current state (as if we did a SELECT). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Add SHELL_EXIT_CODE to psql
Corey Huinker writes: > 128+N is implemented. I think this mostly looks OK, but: * I still say there is no basis whatever for believing that the result of ferror() is an exit code, errno code, or anything else with significance beyond zero-or-not. Feeding it to wait_result_to_exit_code as you've done here is not going to do anything but mislead people in a platform-dependent way. Probably should set exit_code to -1 if ferror reports trouble. * Why do you have wait_result_to_exit_code defaulting to return 0 if it doesn't recognize the code as either WIFEXITED or WIFSIGNALED? That seems pretty misleading; again -1 would be a better idea. regards, tom lane
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On Fri, Mar 10, 2023 at 7:00 PM Jacob Champion wrote: > On Thu, Mar 9, 2023 at 6:17 AM Robert Haas wrote: > > That seems like a circular argument. If you call the problem the > > confused deputy problem then the issue must indeed be that the deputy > > is confused, and needs to talk to someone else to get un-confused. But > > why is the deputy necessarily confused in the first place? Our deputy > > is confused because our code to decide whether to proxy a connection > > or not is super-dumb, > > No, I think our proxy is confused because it doesn't know what power > it has, and it can't tell the server what power it wants to use. That > problem is independent of the decision to proxy. You're suggesting > strengthening the code that makes that decision -- adding an oracle > (in the form of a DBA) that knows about the confusion and actively > mitigates it. That's guaranteed to work if the oracle is perfect, > because "perfect" is somewhat tautologically defined as "whatever > ensures secure operation". But the oracle doesn't reduce the > confusion, and DBAs aren't perfect. I think this is the root of our disagreement. My understanding of the previous discussion is that people think that the major problem here is the wraparound-to-superuser attack. That is, in general, we expect that when we connect to a database over the network, we expect it to do some kind of active authentication, like asking us for a password, or asking us for an SSL certificate that isn't just lying around for anyone to use. However, in the specific case of a local connection, we have a reliable way of knowing who the remote user is without any kind of active authentication, namely 'peer' authentication or perhaps even 'trust' if we trust all the local users, and so we don't judge it unreasonable to allow local connections without any form of active authentication. There can be some scenarios where even over a network we can know the identity of the person connecting with complete certainty, e.g. 
if endpoints are locked down such that the source IP address is a reliable indicator of who is initiating the connection, but in general when there's a network involved you don't know who the person making the connection is and need to do something extra to figure it out. If you accept this characterization of the problem, then I don't think the oracle is that hard to design. We simply set it up not to allow wraparound connections, or maybe even more narrowly to not allow wraparound connections to superuser. If the DBA has some weird network topology where that's not the correct rule, either because they want to allow wraparound connections or they want to disallow other things, then yeah they have to tell us what to allow, but I don't really see why that's an unreasonable expectation. I'd expect the correct configuration of the proxy facility to fall naturally out of what's allowed in pg_hba.conf. If machine A is configured to accept connections from machines B and C based on environmental factors, then machines B and C should be configured not to proxy connections to A. If machines B and C aren't under our control such that we can configure them that way, then the configuration is fundamentally insecure in a way that we can't really fix. I think that what you're proposing is that B and C can just be allowed to proxy to A and A can say "hey, by the way, I'm just gonna let you in without asking for anything else" and B and C can, when proxying, react to that by disconnecting before the connection actually goes through. That's simpler, in a sense. It doesn't require us to set up the proxy configuration on B and C in a way that matches what pg_hba.conf allows on A. Instead, B and C can automatically deduce what connections they should refuse to proxy. I guess that's nice, but it feels pretty magical to me. 
It encourages the DBA not to think about what B and C should actually be allowed to proxy, and instead just trust that the automatics are going to prevent any security disasters. I'm not sure that they always will, and I fear cultivating too much reliance on them. I think that if you're setting up a network topology where the correct rule is something more complex than "don't allow wraparound connections to superuser," maybe you ought to be forced to spell that rule out instead of letting the system deduce one that you hope will be right. -- Robert Haas EDB: http://www.enterprisedb.com
Fix typo plgsql to plpgsql.
Hi, all Found several typos like plgsql, I think it should be plpgsql. Regards, Zhang Mingli 0001-Fix-typo-plgsql-to-plpgsql.patch Description: Binary data
Re: Add SHELL_EXIT_CODE to psql
I took a look at the patch, and it seems to be in good condition. It implements the declared functionality with no visible defects. As for tests, I think it is possible to implement the "signals" case in TAP tests, but let the actual committer decide whether it is worth it or not. -- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi! As suggested before by Heikki Linnakangas, I've added a patch for making the 2PC transaction state 64-bit. At first, my intention was to rebuild the whole twophase interface to use FullTransactionId. But doing this in a proper manner would lead to switching from TransactionId to FullTransactionId in PGPROC, and the patch became too big to handle here. So I decided to keep it simple for now and use the wraparound logic trick, calculating the FullTransactionId from the current epoch, since the span of active xids cannot exceed one epoch at any given time. Patches 1 and 2 are the same as above. -- Best regards, Maxim Orlov. v57-0002-Use-larger-segment-file-names-for-pg_notify.patch Description: Binary data v57-0003-Make-use-FullTransactionId-in-2PC-filenames.patch Description: Binary data v57-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Description: Binary data
Re: Allow parallel plan for referential integrity checks?
On 3/20/23 15:58, Gregory Stark (as CFM) wrote: On Mon, 12 Dec 2022 at 11:37, Frédéric Yhuel wrote: I've planned to work on it full time on week 10 (6-10 March), if you agree to bear with me. The idea would be to bootstrap my brain on it, and then continue to work on it from time to time. Any updates on this patch? I had the opportunity to discuss it with Melanie Plageman when she came to Paris last month and taught us (Dalibo folks) about the patch review process. She advised me to provide a good test case first, and to explain better how it would be useful, which I'm going to do. Should we move it to next release at this point? Even if you get time to work on it this commitfest do you think it's likely to be committable in the next few weeks? It is very unlikely. Maybe it's better to remove it from the CF and put it back later, if the test case I will provide does a better job at convincing the Postgres folks that RI checks should be parallelized.
Re: Implement missing join selectivity estimation for range types
Hi Tomas, As a quick update, the paper related to this work has finally been published in Mathematics (https://www.mdpi.com/2227-7390/11/6/1383). During revision we also added a figure showing a comparison of our algorithm vs the existing algorithms in Oracle, SQL Server, MySQL and PostgreSQL, which can be found in the experiments section of the paper. As can be seen, our algorithm outperforms even Oracle and SQL Server. During this revision we also found a small bug, so we are working on a revision of the patch, which fixes this. Also, calc_hist_selectivity_contains in multirangetypes_selfuncs.c needs a proper comment, not just "this is a copy from rangetypes". Right, the comment should elaborate more that the collected statistics are currently the same as for rangetypes but may potentially deviate. However, it seems the two functions are exactly the same. Would the functions diverge in the future? If not, maybe there should be just a single shared function? Indeed, it is possible that the two functions will deviate if the statistics of multirange types are refined. Right, but are there any such plans? Also, what's the likelihood we'll add new statistics to only one of the places (e.g. for multiranges but not plain ranges)? I'd keep a single function until we actually need two. That's also easier for maintenance - with two it's easy to fix a bug in one place and forget about the other, etc. Regarding our previous discussion about the duplication of calc_hist_join_selectivity in rangetypes_selfuncs.c and multirangetypes_selfuncs.c, we can also remove this duplication in the revision if needed. Note that currently, there are no external functions shared between rangetypes_selfuncs.c and multirangetypes_selfuncs.c. Any function that was used in both files was duplicated as a static function.
The functions calc_hist_selectivity_scalar, calc_length_hist_frac, calc_hist_selectivity_contained and calc_hist_selectivity_contains are examples of this, where the function is identical but has been declared static in both files. That said, we can remove the duplication of calc_hist_join_selectivity if it is still needed. We would, however, require some guidance as to where to put the external definition of this function, as there does not appear to be a rangetypes_selfuncs.h header. Should it simply go into utils/selfuncs.h or should we create a new header file? Best regards, Maxime Schoemans
Re: Allow parallel plan for referential integrity checks?
On Mon, 12 Dec 2022 at 11:37, Frédéric Yhuel wrote: > > I've planned to work on it full time on week 10 (6-10 March), if you > agree to bear with me. The idea would be to bootstrap my brain on it, > and then continue to work on it from time to time. Any updates on this patch? Should we move it to next release at this point? Even if you get time to work on it this commitfest do you think it's likely to be committable in the next few weeks? -- Gregory Stark As Commitfest Manager
Re: explain analyze rows=%.0f
On Wed, 4 Jan 2023 at 10:05, Ibrar Ahmed wrote: > > Thanks, I have modified everything as suggested, except one point > > > Don't use variable format strings. They're hard to read and they > > probably defeat compile-time checks that the arguments match the > > format string. You could use a "*" field width instead, ie ... > > * Another thought is that the non-text formats tend to prize output > > consistency over readability, so maybe we should just always use 2 > > fractional digits there, rather than trying to minimize visible changes. > > In that, we need to adjust a lot of test case outputs. > > * We need some doc adjustments, surely, to explain what the heck this > > means. That sounds like three points :) But they seem like pretty good arguments to me and straightforward if a little tedious to adjust. This patch was marked Returned with Feedback and then later Waiting on Author. And it hasn't had any updates since January. What is the state on this feedback? If it's already done we can set the patch to Ready for Commit and if not do you disagree with the proposed changes? It's actually the kind of code cleanup changes I'm reluctant to bump a patch for. It's not like a committer can't make these kinds of changes when committing. But there are so many patches they're likely to just focus on a different patch when there are adjustments like this pending. -- Gregory Stark As Commitfest Manager
Re: Question: Do we have a rule to use "<productname>PostgreSQL</productname>" and "PostgreSQL" separately?
> On 20 Mar 2023, at 15:31, Tom Lane wrote: > > "Hayato Kuroda (Fujitsu)" writes: >> While checking documentations, I found that one line notes our product as >> "<productname>PostgreSQL</productname>", whereas another line notes as just >> "PostgreSQL". > > IMO the convention is to use the <productname> tag everywhere that we > spell out "PostgreSQL". I don't think it's actually rendered differently > with our current stylesheets, but maybe someday it will be. IIRC the main use in DocBook is for automatically decorating productnames with trademark signs etc, and to generate lists of trademarks, but also that they can be rendered differently. This reminded me that I was planning to apply the below to make the markup of PostgreSQL consistent: diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 4df8bd1b64..9ff6b08f5a 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -2667,7 +2667,7 @@ TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02' To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and - UTC+4 in others. PostgreSQL interprets such + UTC+4 in others. <productname>PostgreSQL</productname> interprets such abbreviations according to whatever they meant (or had most recently meant) on the specified date; but, as with the EST example above, this is not necessarily the same as local civil time on that date. -- Daniel Gustafsson
Re: Save a few bytes in pg_attribute
Peter Eisentraut writes: > After the discussion in [0] ff., I was looking around in pg_attribute > and noticed that we could possibly save 4 bytes. We could change both > attstattarget and attinhcount from int4 to int2, which together with > some reordering would save 4 bytes from the fixed portion. > attstattarget is already limited to 10000, so this wouldn't lose > anything. For attinhcount, I don't see any documented limits. But it > seems unlikely to me that someone would need more than 32k immediate > inheritance parents on a column. (Maybe an overflow check would be > useful, though, to prevent shenanigans.) > The attached patch seems to work. Thoughts? I agree that attinhcount could be narrowed, but I have some concern about attstattarget. IIRC, the limit on attstattarget was once 1000 and then we raised it to 10000. Is it inconceivable that we might want to raise it to 100000 someday? regards, tom lane
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Mar 20, 2023 at 9:34 PM John Naylor wrote: > > > On Mon, Mar 20, 2023 at 12:25 PM Masahiko Sawada > wrote: > > > > On Fri, Mar 17, 2023 at 4:49 PM Masahiko Sawada > > wrote: > > > > > > On Fri, Mar 17, 2023 at 4:03 PM John Naylor > > > wrote: > > > > > > > > On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada > > > > wrote: > > > > > > > > > > On Tue, Mar 14, 2023 at 8:27 PM John Naylor > > > > > wrote: > > > > > > > > > > > > I wrote: > > > > > > > > > > > > > > > Since the block-level measurement is likely overestimating > > > > > > > > > quite a bit, I propose to simply reverse the order of the > > > > > > > > > actions here, effectively reporting progress for the *last > > > > > > > > > page* and not the current one: First update progress with the > > > > > > > > > current memory usage, then add tids for this page. If this > > > > > > > > > allocated a new block, only a small bit of that will be > > > > > > > > > written to. If this block pushes it over the limit, we will > > > > > > > > > detect that up at the top of the loop. It's kind of like our > > > > > > > > > earlier attempts at a "fudge factor", but simpler and less > > > > > > > > > brittle. And, as far as OS pages we have actually written to, > > > > > > > > > I think it'll effectively respect the memory limit, at least > > > > > > > > > in the local mem case. And the numbers will make sense. > > > > > > > > > > I still like my idea at the top of the page -- at least for vacuum > > > > > > and m_w_m. It's still not completely clear if it's right but I've > > > > > > got nothing better. It also ignores the work_mem issue, but I've > > > > > > given up anticipating all future cases at the moment. > > > > > > > > > IIUC you suggested measuring memory usage by tracking how much memory > > > > > chunks are allocated within a block. If your idea at the top of the > > > > > page follows this method, it still doesn't deal with the point Andres > > > > > mentioned. 
> > > > > > > > Right, but that idea was orthogonal to how we measure memory use, and > > > > in fact mentions blocks specifically. The re-ordering was just to make > > > > sure that progress reporting didn't show current-use > max-use. > > > > > > Right. I still like your re-ordering idea. It's true that most of the > > > area of the last allocated block before heap scanning stops is not > > > actually used yet. I'm guessing we can just check if the context > > > memory has gone over the limit. But I'm concerned it might not work > > > well in systems where overcommit memory is disabled. > > > > > > > > > > > However, the big question remains DSA, since a new segment can be as > > > > large as the entire previous set of allocations. It seems it just > > > > wasn't designed for things where memory growth is unpredictable. > > > > aset.c also has a similar characteristic; allocates an 8K block upon > > the first allocation in a context, and doubles that size for each > > successive block request. But we can specify the initial block size > > and max blocksize. This made me think of another idea: specify both > > to DSA, with both values calculated based on m_w_m. For example, we > > That's an interesting idea, and the analogous behavior to aset could be a > good thing for readability and maintainability. Worth seeing if it's workable. I've attached a quick hack patch. It can be applied on top of v32 patches. The changes to dsa.c are straightforward since it makes the initial and max block sizes configurable. The patch includes a test function, test_memory_usage(), to simulate how DSA segments grow behind the shared radix tree.
If we set the first argument to true, it calculates both initial and maximum block size based on work_mem (I used work_mem here just because its value range is larger than m_w_m): postgres(1:833654)=# select test_memory_usage(true); NOTICE: memory limit 134217728 NOTICE: init 1048576 max 16777216 NOTICE: initial: 1048576 NOTICE: rt_create: 1048576 NOTICE: allocate new DSM [1] 1048576 NOTICE: allocate new DSM [2] 2097152 NOTICE: allocate new DSM [3] 2097152 NOTICE: allocate new DSM [4] 4194304 NOTICE: allocate new DSM [5] 4194304 NOTICE: allocate new DSM [6] 8388608 NOTICE: allocate new DSM [7] 8388608 NOTICE: allocate new DSM [8] 16777216 NOTICE: allocate new DSM [9] 16777216 NOTICE: allocate new DSM [10] 16777216 NOTICE: allocate new DSM [11] 16777216 NOTICE: allocate new DSM [12] 16777216 NOTICE: allocate new DSM [13] 16777216 NOTICE: allocate new DSM [14] 16777216 NOTICE: reached: 148897792 (+14680064) NOTICE: 12718205 keys inserted: 148897792 test_memory_usage --- (1 row) Time: 7195.664 ms (00:07.196) Setting the first argument to false, we can specify both manually in second and third arguments: postgres(1:833654)=# select test_memory_usage(false, 1024 * 1024, 1024 * 1024 * 1024 * 10::bigint); NOTICE: memory limit 134217728 NOTICE: init 1048576 max 10737418240 NOTICE: initial: 1048576
Re: Question: Do we have a rule to use "<productname>PostgreSQL</productname>" and "PostgreSQL" separately?
"Hayato Kuroda (Fujitsu)" writes: > While checking documentations, I found that one line notes our product as > "<productname>PostgreSQL</productname>", whereas another line notes as just > "PostgreSQL". IMO the convention is to use the <productname> tag everywhere that we spell out "PostgreSQL". I don't think it's actually rendered differently with our current stylesheets, but maybe someday it will be. regards, tom lane
Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean
Michael Paquier writes: > Nathan has reported to me offlist that maintainer-clean was not doing > its job for the files generated by gen_node_support.pl in > src/backend/nodes/ for the query jumbling. Attached is a patch to > take care of this issue. > While on it, I have found a comment in the related README that was > missing a refresh. > Any objections or comments? Is similar knowledge missing in the meson build files? regards, tom lane
Re: Commitfest 2023-03 starting tomorrow!
The next level of this would be something like notifying the committer with a list of patches in the CF that a commit broke. I don't immediately see how to integrate that with our workflow but I have seen something like this work well in a previous job. When committing code you often went and updated other unrelated projects to adapt to the new API (or could adjust the code you were committing to cause less breakage).
Re: Memory leak from ExecutorState context?
On Mon, 20 Mar 2023 09:32:17 +0100 Tomas Vondra wrote: > >> * Patch 1 could be rebased/applied/backpatched > > > > Would it help if I rebase Patch 1 ("move BufFile stuff into separate > > context")? > > Yeah, I think this is something we'd want to do. It doesn't change the > behavior, but it makes it easier to track the memory consumption etc. Will do this week. > I think focusing on the backpatchability is not a particularly good > approach. It's probably better to first check if there's any hope of > restarting the work on BNLJ, which seems like a "proper" solution for > the future. Backpatching is worth some consideration. Balancing is almost there and could help in 16. As time is flying, it might well miss release 16, but maybe it could be backpatched in 16.1 or a later minor release? But what if it is not eligible for backpatching? We would stall another year without having at least an imperfect stop-gap solution (sorry for picking your words ;)). BNLJ and/or other considerations are for 17 or even after. In the meantime, Melanie, who authored BNLJ, +1'd the balancing patch as it can coexist with other discussed solutions. No one has down-voted it since then. Melanie, what is your opinion today on this patch? Did you change your mind as you worked for many months on BNLJ since then? > On 3/19/23 20:31, Justin Pryzby wrote: > > On Fri, Mar 17, 2023 at 05:41:11PM +0100, Tomas Vondra wrote: > * Patch 2 is worth considering to backpatch > >> > >> I'm not quite sure what exactly are the numbered patches, as some of the > >> threads had a number of different patch ideas, and I'm not sure which > >> one was/is the most promising one. > > > > patch 2 is referring to the list of patches that was compiled > > https://www.postgresql.org/message-id/20230310195114.6d0c5406%40karst > > Ah, I see - it's just the "rebalancing" patch which minimizes the total > amount of memory used (i.e. grow work_mem a bit, so that we don't > allocate too many files).
> > Yeah, I think that's the best we can do without reworking how we spill > data (slicing or whatever). Indeed, that's why I was speaking about backpatching for this one: * it surely helps 16 (and maybe previous releases) in such skewed situations * it's constrained * it's not too invasive, it doesn't shake the whole algorithm all over the place * Melanie was +1 for it, no one down-voted it. What needs to be discussed/worked on: * any regressions for existing queries running fine or without OOM? * should the bufFile memory consumption be added to the planner considerations? > I think the spilling is "nicer" in that it actually enforces work_mem > more strictly than (a), Sure it enforces work_mem more strictly, but this is a discussion for 17 or later in my humble opinion. > but we should probably spend a bit more time on > the exact spilling strategy. I only really tried two trivial ideas, but > maybe we can be smarter about allocating / sizing the files? Maybe > instead of slices of constant size we could/should make them larger, > similarly to what log-structured storage does. > > For example we might double the number of batches a file represents, so > the first file would be for batch 1, the next one for batches 2+3, then > 4+5+6+7, then 8-15, then 16-31, ... > > We should have some model calculating (i) amount of disk space we would > need for the spilled files, and (ii) the amount of I/O we do when > writing the data between temp files. And: * what about Robert's discussion on uneven batch distribution? Why is it ignored? Maybe there were some IRL or off-list discussions? Or did I miss some mails? * what about dealing with all normal batches first, then revamping the skewed ones in the freshly emptied batches, splitting them if needed, then rinse & repeat? At some point, we would probably still need something like slicing and/or BNLJ though... > let's revive the rebalance and/or spilling patches. Imperfect but better than > nothing. +1 for rebalance.
I'm not even sure it could make it to 16 as we are running out of time, but it's worth trying as it's the closest one that could be reviewed and marked ready-for-committer. I might lack ambition, but the spilling patches seem too much to make it for 16. They seem to belong with the other, larger patches/ideas (patches 4 and 5 in my sum-up). But this is just my humble feeling. > And then in the end we can talk about if/what can be backpatched. > > FWIW I don't think there's a lot of rush, considering this is clearly a > matter for PG17. So the summer CF at the earliest, people are going to > be busy until then. 100% agree, there's no rush for patches 3, 4 ... and 5. Thanks!
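As a side note for readers: the doubling scheme sketched in the quoted text (one spill file for batch 1, the next for batches 2+3, then 4-7, then 8-15, ...) amounts to mapping batch b to file floor(log2(b)). A minimal sketch in Python for brevity; the function names are illustrative and not taken from any posted patch:

```python
def spill_file_for_batch(batch_no):
    """Map a batch number (>= 1) to its spill file index under the
    doubling scheme: file k represents batches [2**k, 2**(k+1))."""
    if batch_no < 1:
        raise ValueError("batch numbers start at 1")
    return batch_no.bit_length() - 1

def batches_in_file(file_no):
    """Inverse view: the range of batch numbers a given spill file holds."""
    return range(2 ** file_no, 2 ** (file_no + 1))
```

With nbatch batches this needs only floor(log2(nbatch)) + 1 files instead of one per batch, at the cost of having to re-read and split a shared file when its batches are eventually processed; that read/write amplification is presumably what the proposed disk-space/I-O model would quantify.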
Re: Kerberos delegation support in libpq and postgres_fdw
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * Michael Paquier (mich...@paquier.xyz) wrote: > > On Mon, Sep 19, 2022 at 02:05:39PM -0700, Jacob Champion wrote: > > > It's not prevented, because a password is being used. In my tests I'm > > > connecting as an unprivileged user. > > > > > > You're claiming that the middlebox shouldn't be doing this. If this new > > > default behavior were the historical behavior, then I would have agreed. > > > But the cat's already out of the bag on that, right? It's safe today. > > > And if it's not safe today for some other reason, please share why, and > > > maybe I can work on a patch to try to prevent people from doing it. > > > > Please note that this has been marked as returned with feedback in the > > current CF, as this has remained unanswered for a bit more than three > > weeks. > > There's some ongoing discussion about how to handle outbound connections > from the server ending up picking up credentials from the server's > environment (that really shouldn't be allowed unless specifically asked > for..), that's ultimately an independent change from what this patch is > doing. That got committed, which is great, though it didn't go quite as far as I had been hoping regarding dealing with outbound connections from the server- perhaps we should make it clear at least for postgres_fdw that it might be good for administrators to explicitly say which options are allowed for a given user-map when it comes to how authentication is done to the remote server? Seems like mostly a documentation improvement, I think? Or should we have some special handling around that option for postgres_fdw/dblink? > Here's an updated version which does address Robert's concerns around > having this disabled by default and having options on both the server > and client side saying if it is to be enabled or not. 
Also added to > pg_stat_gssapi a field that indicates if credentials were proxied or not > and made some other improvements and added additional regression tests > to test out various combinations. I've done some self-review and also reworked how the security checks are done to be sure that we're not ending up pulling credentials from the environment (with added regression tests to check for it too). If there's remaining concerns around that, please let me know. Of course, other review would be great also. Presently though: - Rebased up to today - Requires explicitly being enabled on client and server - Authentication to a remote server via dblink or postgres_fdw with GSSAPI requires that credentials were proxied by the client to the server, except if the superuser set 'password_required' to false on the postgres_fdw (which has lots of caveats around it in the documentation because it's inherently un-safe to do). - Includes updated documentation - Quite a few additional regression tests to check for unrelated credentials coming from the environment in either cases where credentials have been proxied and in cases where they haven't. - Only changes to existing regression tests for dblink/postgres_fdw are in the error message wording updates. Thanks! Stephen From 03a799699b42d3f9d9ed017bd448f606b7a2761d Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Mon, 28 Feb 2022 20:17:55 -0500 Subject: [PATCH] Add support for Kerberos credential delegation Support GSSAPI/Kerberos credentials being delegated to the server by a client. With this, a user authenticating to PostgreSQL using Kerberos (GSSAPI) credentials can choose to delegate their credentials to the PostgreSQL server (which can choose to accept them, or not), allowing the server to then use those delegated credentials to connect to another service, such as with postgres_fdw or dblink or theoretically any other service which is able to be authenticated using Kerberos. 
Both postgres_fdw and dblink are changed to allow non-superuser password-less connections but only when GSSAPI credentials have been proxied to the server by the client and GSSAPI is used to authenticate to the remote system. Authors: Stephen Frost, Peifeng Qiu Discussion: https://postgr.es/m/co1pr05mb8023cc2cb575e0faad7df4f8a8...@co1pr05mb8023.namprd05.prod.outlook.com --- contrib/dblink/dblink.c | 118 +++--- contrib/dblink/expected/dblink.out| 4 +- contrib/postgres_fdw/connection.c | 68 +++- .../postgres_fdw/expected/postgres_fdw.out| 19 +- contrib/postgres_fdw/option.c | 3 + contrib/postgres_fdw/sql/postgres_fdw.sql | 3 +- doc/src/sgml/config.sgml | 17 + doc/src/sgml/dblink.sgml | 5 +- doc/src/sgml/libpq.sgml | 41 +++ doc/src/sgml/monitoring.sgml | 9 + doc/src/sgml/postgres-fdw.sgml| 5 +- src/backend/catalog/system_views.sql | 3 +- src/backend/foreign/foreign.c | 1 + src/backend/libpq/auth.c
RE: Data is copied twice when specifying both child and parent table in publication
Dear Wang, I have tested multilevel partitions, and they worked well. The following are my comments on v18-0001. 01. pg_get_publication_tables ``` + ListCell *lc; ``` This definition can be moved inside the "for (i = 0; i < nelems; i++)" loop. 02. pg_get_publication_tables ``` -* If the publication publishes partition changes via their -* respective root partitioned tables, we must exclude partitions -* in favor of including the root partitioned tables. Otherwise, -* the function could return both the child and parent tables -* which could cause data of the child table to be -* double-published on the subscriber side. +* Publications support partitioned tables. If +* publish_via_partition_root is false, all changes are replicated +* using leaf partition identity and schema, so we only need those. +* Otherwise, get the partitioned table itself. ``` The comments can be moved inside the "else" branch. 03. pg_get_publication_tables ``` + pfree(elems); ``` Only elems is pfree()'d here, but how about other variables like pub_elem and pub_elem_tables? Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL
On Mon, Mar 20, 2023 at 2:58 PM Önder Kalacı wrote: > > > I applied all patches for all the versions, and re-run the subscription tests, > all looks good to me. > LGTM. I'll push this tomorrow unless there are more comments. -- With Regards, Amit Kapila.
RE: Initial Schema Sync for Logical Replication
Hi Amit, > From: Amit Kapila > > > Hi, > > > > > > I have a couple of questions. > > > > > > Q1. > > > > > > What happens if the subscriber already has some tables present? For > > > example, I did not see the post saying anything like "Only if the > > > table does not already exist then it will be created". > > > > > My assumption was the if subscriber is doing initial schema sync , It > > does not have any conflicting database objects. > > > > Can't we simply error out in such a case with "obj already exists"? > This would be similar to how we deal with conflicting rows with unique/primary > keys. Right, this is the default behaviour. We will run pg_restore with --single-transaction, so if we get an error while executing a CREATE TABLE, the whole pg_restore will fail and the user will be notified. Regards Sachin
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Mar 20, 2023 at 12:25 PM Masahiko Sawada wrote: > > On Fri, Mar 17, 2023 at 4:49 PM Masahiko Sawada wrote: > > > > On Fri, Mar 17, 2023 at 4:03 PM John Naylor > > wrote: > > > > > > On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada wrote: > > > > > > > > On Tue, Mar 14, 2023 at 8:27 PM John Naylor > > > > wrote: > > > > > > > > > > I wrote: > > > > > > > > > > > > > Since the block-level measurement is likely overestimating quite a bit, I propose to simply reverse the order of the actions here, effectively reporting progress for the *last page* and not the current one: First update progress with the current memory usage, then add tids for this page. If this allocated a new block, only a small bit of that will be written to. If this block pushes it over the limit, we will detect that up at the top of the loop. It's kind of like our earlier attempts at a "fudge factor", but simpler and less brittle. And, as far as OS pages we have actually written to, I think it'll effectively respect the memory limit, at least in the local mem case. And the numbers will make sense. > > > > > > > > I still like my idea at the top of the page -- at least for vacuum and m_w_m. It's still not completely clear if it's right but I've got nothing better. It also ignores the work_mem issue, but I've given up anticipating all future cases at the moment. > > > > > > > IIUC you suggested measuring memory usage by tracking how much memory > > > > chunks are allocated within a block. If your idea at the top of the > > > > page follows this method, it still doesn't deal with the point Andres > > > > mentioned. > > > > > > Right, but that idea was orthogonal to how we measure memory use, and in fact mentions blocks specifically. The re-ordering was just to make sure that progress reporting didn't show current-use > max-use. > > > > Right. I still like your re-ordering idea. It's true that the most > > area of the last allocated block before heap scanning stops is not > > actually used yet. 
I'm guessing we can just check if the context > > memory has gone over the limit. But I'm concerned it might not work > > well in systems where overcommit memory is disabled. > > > > > > > > However, the big question remains DSA, since a new segment can be as large as the entire previous set of allocations. It seems it just wasn't designed for things where memory growth is unpredictable. > > aset.c also has a similar characteristic; allocates an 8K block upon > the first allocation in a context, and doubles that size for each > successive block request. But we can specify the initial block size > and max blocksize. This made me think of another idea to specify both > to DSA and both values are calculated based on m_w_m. For example, we That's an interesting idea, and the analogous behavior to aset could be a good thing for readability and maintainability. Worth seeing if it's workable. -- John Naylor EDB: http://www.enterprisedb.com
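For readers unfamiliar with aset.c's geometry mentioned above: blocks double in size from a configurable initial block size up to a configurable maximum, so total allocation grows geometrically and the most recent block can be nearly as large as everything allocated before it. A rough sketch in Python, purely illustrative (the particular sizes and limit below are made-up placeholders, not values from any patch):

```python
def block_sizes(init_size, max_size):
    """Yield successive block sizes: double each time, clamped at max_size."""
    size = init_size
    while True:
        yield size
        size = min(size * 2, max_size)

def blocks_until_limit(init_size, max_size, limit):
    """Count how many blocks are allocated, and the total bytes held,
    when the running total first exceeds `limit` (the overshoot)."""
    total = 0
    count = 0
    for size in block_sizes(init_size, max_size):
        total += size
        count += 1
        if total > limit:
            break
    return count, total
```

With unbounded doubling, the overshoot past the limit can approach the size of all prior allocations combined; capping max_size (e.g., deriving both bounds from m_w_m, as suggested) bounds the overshoot to at most max_size bytes.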
Re: logical decoding and replication of sequences, take 2
On Mon, Mar 20, 2023 at 5:13 PM Tomas Vondra wrote: > > On 3/20/23 12:00, Amit Kapila wrote: > > On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra > > wrote: > >> > >> > >> I don't understand why we'd need WAL from before the slot is created, > >> which happens before copy_sequence so the sync will see a more recent > >> state (reflecting all changes up to the slot LSN). > >> > > > > Imagine the following sequence of events: > > 1. Operation on a sequence seq-1 which requires WAL. Say, this is done > > at LSN 1000. > > 2. Some other random operations on unrelated objects. This would > > increase LSN to 2000. > > 3. Create a slot that uses current LSN 2000. > > 4. Copy sequence seq-1 where you will get the LSN value as 1000. Then > > you will use LSN 1000 as a starting point to start replication in > > sequence sync worker. > > > > It is quite possible that WAL from LSN 1000 may not be present. Now, > > it may be possible that we use the slot's LSN in this case but > > currently, it may not be possible without some changes in the slot > > machinery. Even, if we somehow solve this, we have the below problem > > where we can miss some concurrent activity. > > > > I think the question is what would be the WAL-requiring operation at LSN > 1000. If it's just regular nextval(), then we *will* see it during > copy_sequence - sequences are not transactional in the MVCC sense. > > If it's an ALTER SEQUENCE, I guess it might create a new relfilenode, > and then we might fail to apply this - that'd be bad. > > I wonder if we'd allow actually discarding the WAL while building the > consistent snapshot, though. > No, as soon as we reserve the WAL location, we update the slot's minLSN (replicationSlotMinLSN) which would prevent the required WAL from being removed. > You're however right we can't just decide > this based on LSN, we'd probably need to compare the relfilenodes too or > something like that ... 
> > >> I think the only "issue" are the WAL records after the slot LSN, or more > >> precisely deciding which of the decoded changes to apply. > >> > >> > >>> Now, for the second idea which is to directly use > >>> pg_current_wal_insert_lsn(), I think we won't be able to ensure that > >>> the changes covered by in-progress transactions like the one with > >>> Alter Sequence I have given example would be streamed later after the > >>> initial copy. Because the LSN returned by pg_current_wal_insert_lsn() > >>> could be an LSN after the LSN associated with Alter Sequence but > >>> before the corresponding xact's commit. > >> > >> Yeah, I think you're right - the locking itself is not sufficient to > >> prevent this ordering of operations. copy_sequence would have to lock > >> the sequence exclusively, which seems bit disruptive. > >> > > > > Right, that doesn't sound like a good idea. > > > > Although, maybe we could use a less strict lock level? I mean, one that > allows nextval() to continue, but would conflict with ALTER SEQUENCE. > I don't know if that is a good idea, but are you imagining a special interface/mechanism just for logical replication? As far as I can see, you have used SELECT to fetch the sequence values. -- With Regards, Amit Kapila.
Question: Do we have a rule to use "PostgreSQL" and "<productname>PostgreSQL</productname>" separately?
Dear hackers, While checking the documentation, I found that one line notes our product as "<productname>PostgreSQL</productname>", whereas another line notes it as just "PostgreSQL". For example, in bgworker.sgml: ``` <productname>PostgreSQL</productname> can be extended to run user-supplied code in separate processes. ... These processes are attached to <productname>PostgreSQL</productname>'s shared memory area and have the option to connect to databases internally; ... ``` It seems that the tag is not used when the string is used as a link or title, but I cannot find any other rule for using them. Could you please point me to the discussions or wiki page about this? Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: allow_in_place_tablespaces vs. pg_basebackup
On Thu, Mar 9, 2023 at 4:15 PM Robert Haas wrote: > Now that I'm done grumbling, here's a patch. Anyone want to comment on this? -- Robert Haas EDB: http://www.enterprisedb.com
Re: logical decoding and replication of sequences, take 2
On 3/20/23 12:00, Amit Kapila wrote: > On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra > wrote: >> >> >> On 3/20/23 04:42, Amit Kapila wrote: >>> On Sat, Mar 18, 2023 at 8:49 PM Tomas Vondra >>> wrote: On 3/18/23 06:35, Amit Kapila wrote: > On Sat, Mar 18, 2023 at 3:13 AM Tomas Vondra > wrote: >> >> ... >> >> Clearly, for sequences we can't quite rely on snapshots/slots, we need >> to get the LSN to decide what changes to apply/skip from somewhere else. >> I wonder if we can just ignore the queued changes in tablesync, but I >> guess not - there can be queued increments after reading the sequence >> state, and we need to apply those. But maybe we could use the page LSN >> from the relfilenode - that should be the LSN of the last WAL record. >> >> Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we >> use to read the sequence state ... >> > > What if some Alter Sequence is performed before the copy starts and > after the copy is finished, the containing transaction rolled back? > Won't it copy something which shouldn't have been copied? > That shouldn't be possible - the alter creates a new relfilenode and it's invisible until commit. So either it gets committed (and then replicated), or it remains invisible to the SELECT during sync. >>> >>> Okay, however, we need to ensure that such a change will later be >>> replicated and also need to ensure that the required WAL doesn't get >>> removed. >>> >>> Say, if we use your first idea of page LSN from the relfilenode, then >>> how do we ensure that the corresponding WAL doesn't get removed when >>> later the sync worker tries to start replication from that LSN? I am >>> imagining here the sync_sequence_slot will be created before >>> copy_sequence but even then it is possible that the sequence has not >>> been updated for a long time and the LSN location will be in the past >>> (as compared to the slot's LSN) which means the corresponding WAL >>> could be removed. 
Now, here we can't directly start using the slot's >>> LSN to stream changes because there is no correlation of it with the >>> LSN (page LSN of sequence's relfilnode) where we want to start >>> streaming. >>> >> >> I don't understand why we'd need WAL from before the slot is created, >> which happens before copy_sequence so the sync will see a more recent >> state (reflecting all changes up to the slot LSN). >> > > Imagine the following sequence of events: > 1. Operation on a sequence seq-1 which requires WAL. Say, this is done > at LSN 1000. > 2. Some other random operations on unrelated objects. This would > increase LSN to 2000. > 3. Create a slot that uses current LSN 2000. > 4. Copy sequence seq-1 where you will get the LSN value as 1000. Then > you will use LSN 1000 as a starting point to start replication in > sequence sync worker. > > It is quite possible that WAL from LSN 1000 may not be present. Now, > it may be possible that we use the slot's LSN in this case but > currently, it may not be possible without some changes in the slot > machinery. Even, if we somehow solve this, we have the below problem > where we can miss some concurrent activity. > I think the question is what would be the WAL-requiring operation at LSN 1000. If it's just regular nextval(), then we *will* see it during copy_sequence - sequences are not transactional in the MVCC sense. If it's an ALTER SEQUENCE, I guess it might create a new relfilenode, and then we might fail to apply this - that'd be bad. I wonder if we'd allow actually discarding the WAL while building the consistent snapshot, though. You're however right we can't just decide this based on LSN, we'd probably need to compare the relfilenodes too or something like that ... >> I think the only "issue" are the WAL records after the slot LSN, or more >> precisely deciding which of the decoded changes to apply. 
>> >> >>> Now, for the second idea which is to directly use >>> pg_current_wal_insert_lsn(), I think we won't be able to ensure that >>> the changes covered by in-progress transactions like the one with >>> Alter Sequence I have given example would be streamed later after the >>> initial copy. Because the LSN returned by pg_current_wal_insert_lsn() >>> could be an LSN after the LSN associated with Alter Sequence but >>> before the corresponding xact's commit. >> >> Yeah, I think you're right - the locking itself is not sufficient to >> prevent this ordering of operations. copy_sequence would have to lock >> the sequence exclusively, which seems bit disruptive. >> > > Right, that doesn't sound like a good idea. > Although, maybe we could use a less strict lock level? I mean, one that allows nextval() to continue, but would conflict with ALTER SEQUENCE. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: server log inflates due to pg_logical_slot_peek_changes/pg_logical_slot_get_changes calls
Thanks, Ashutosh, for the reply. I think those messages are useful when debugging logical replication > problems (imagine missing transaction or inconsistent data between > publisher and subscriber). I don't think pg_logical_slot_get_changes() > or pg_logical_slot_peek_changes() are expected to be called frequently > in a loop. Yeah, right. But can you please shed some light on when these functions should be called, or are they used only for testing purposes? Instead you should open a replication connection to > continue to receive logical changes ... forever. > Yes, this is what I have decided to resort to now. Why do you need to call pg_logical_slot_peek_changes() and > pg_logical_slot_get_changes() frequently? > I was just playing around with logical replication and thought of doing this quick test where, at every time interval, I read the changes using pg_logical_slot_peek_changes(), make sure they are consumed to a consistent state, and only then use pg_logical_slot_get_changes() to advance the slot. Regards, Jeevan Ladhe
Re: logical decoding and replication of sequences, take 2
On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra wrote: > > > On 3/20/23 04:42, Amit Kapila wrote: > > On Sat, Mar 18, 2023 at 8:49 PM Tomas Vondra > > wrote: > >> > >> On 3/18/23 06:35, Amit Kapila wrote: > >>> On Sat, Mar 18, 2023 at 3:13 AM Tomas Vondra > >>> wrote: > > ... > > Clearly, for sequences we can't quite rely on snapshots/slots, we need > to get the LSN to decide what changes to apply/skip from somewhere else. > I wonder if we can just ignore the queued changes in tablesync, but I > guess not - there can be queued increments after reading the sequence > state, and we need to apply those. But maybe we could use the page LSN > from the relfilenode - that should be the LSN of the last WAL record. > > Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we > use to read the sequence state ... > > >>> > >>> What if some Alter Sequence is performed before the copy starts and > >>> after the copy is finished, the containing transaction rolled back? > >>> Won't it copy something which shouldn't have been copied? > >>> > >> > >> That shouldn't be possible - the alter creates a new relfilenode and > >> it's invisible until commit. So either it gets committed (and then > >> replicated), or it remains invisible to the SELECT during sync. > >> > > > > Okay, however, we need to ensure that such a change will later be > > replicated and also need to ensure that the required WAL doesn't get > > removed. > > > > Say, if we use your first idea of page LSN from the relfilenode, then > > how do we ensure that the corresponding WAL doesn't get removed when > > later the sync worker tries to start replication from that LSN? I am > > imagining here the sync_sequence_slot will be created before > > copy_sequence but even then it is possible that the sequence has not > > been updated for a long time and the LSN location will be in the past > > (as compared to the slot's LSN) which means the corresponding WAL > > could be removed. 
Now, here we can't directly start using the slot's > > LSN to stream changes because there is no correlation of it with the > > LSN (page LSN of sequence's relfilnode) where we want to start > > streaming. > > > > I don't understand why we'd need WAL from before the slot is created, > which happens before copy_sequence so the sync will see a more recent > state (reflecting all changes up to the slot LSN). > Imagine the following sequence of events: 1. Operation on a sequence seq-1 which requires WAL. Say, this is done at LSN 1000. 2. Some other random operations on unrelated objects. This would increase LSN to 2000. 3. Create a slot that uses current LSN 2000. 4. Copy sequence seq-1 where you will get the LSN value as 1000. Then you will use LSN 1000 as a starting point to start replication in sequence sync worker. It is quite possible that WAL from LSN 1000 may not be present. Now, it may be possible that we use the slot's LSN in this case but currently, it may not be possible without some changes in the slot machinery. Even, if we somehow solve this, we have the below problem where we can miss some concurrent activity. > I think the only "issue" are the WAL records after the slot LSN, or more > precisely deciding which of the decoded changes to apply. > > > > Now, for the second idea which is to directly use > > pg_current_wal_insert_lsn(), I think we won't be able to ensure that > > the changes covered by in-progress transactions like the one with > > Alter Sequence I have given example would be streamed later after the > > initial copy. Because the LSN returned by pg_current_wal_insert_lsn() > > could be an LSN after the LSN associated with Alter Sequence but > > before the corresponding xact's commit. > > Yeah, I think you're right - the locking itself is not sufficient to > prevent this ordering of operations. copy_sequence would have to lock > the sequence exclusively, which seems bit disruptive. > Right, that doesn't sound like a good idea. 
-- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
Hi, Please see the attached patch. On Sat, 18 Mar 2023 at 12:03, vignesh C wrote: > On Fri, 17 Mar 2023 at 17:55, Melih Mutlu wrote: > 1) Currently we refer the link to the beginning of create subscription > page, this can be changed to refer to binary option contents in create > subscription: > Done. > 2) Running pgperltidy shows the test script 014_binary.pl could be > slightly improved as in the attachment. > Couldn't apply the patch you attached, but I ran pgperltidy myself and I guess the result should be similar. On Sat, 18 Mar 2023 at 12:41, Peter Smith wrote: > Commit message > > 1. > Binary copy is supported for v16 or later. > Done as you suggested. On Sat, 18 Mar 2023 at 13:11, Amit Kapila wrote: > On Sat, Mar 18, 2023 at 3:11 PM Peter Smith wrote: > > > > SUGGESTION > > If the publisher is v16 or later, then any initial table > > synchronization will use the same format as specified by the > > subscription binary mode. If the publisher is before v16, then any > > initial table synchronization will use text format regardless of the > > subscription binary mode. > > > > I agree that the previous comment should be updated but I would prefer > something along the lines: "Prior to v16, initial table > synchronization will use text format even if the binary option is > enabled for a subscription." > Done. On Mon, 20 Mar 2023 at 05:13, Amit Kapila wrote: > On Mon, Mar 20, 2023 at 3:37 AM Peter Smith wrote: > > > > There are a couple of TAP tests where the copy binary is expected to > > fail. And when it fails, you do binary=false (changing the format back > > to 'text') so the test is then expected to be able to proceed. > > > > I don't know if this happens in practice, but IIUC in theory, if the > > timing is extremely bad, the tablesync could relaunch in binary mode > > multiple times (any fail multiple times?) before your binary=false > > change takes effect. 
> > > > So, I was wondering if it could help to use the subscription > > 'disable_on_error=true' parameter for those cases so that the > > tablesync won't needlessly attempt to relaunch until you have set > > binary=false and then re-enabled the subscription. > > > > +1. That would make tests more reliable. > Done. On Mon, 20 Mar 2023 at 07:13, Hayato Kuroda (Fujitsu) wrote: > Dear Melih, > > Thank you for updating the patch. > I checked your added description about initial data sync and I think it's > OK. > > Few minor comments: > Fixed both comments. Thanks, -- Melih Mutlu Microsoft v18-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
Hi, On 3/20/23 12:43 AM, Michael Paquier wrote: At the end, documenting both still sounds like the best move to me. Agree. Please find attached v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patch doing so. I did not put the exact same wording as the one being removed in ddfc2d9, as: - For pg_stat_get_xact_blocks_hit(), I think it's better to stay closer to the pg_statio_all_tables.heap_blks_hit definition. - For pg_stat_get_xact_blocks_fetched(), I think that using "buffer" is better (than "block") as, in the end, it's related to pgstat_count_buffer_read(). In the end, there is a choice to be made for both between the wordings "block" and "buffer". Indeed, their counters get incremented in "buffer" macros while being retrieved in those "blocks" functions. "Buffer" sounds more appropriate to me, so the attached has been done that way. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 6249bb50d0..fba8e892cf 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5697,6 +5697,32 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + + pg_stat_get_xact_blocks_fetched + +pg_stat_get_xact_blocks_fetched ( oid ) +bigint + + +Returns the number of buffer fetches for table or index, in the current transaction + + + + + + + pg_stat_get_xact_blocks_hit + +pg_stat_get_xact_blocks_hit ( oid ) +bigint + + +Returns the number of buffer hits for table or index, in the current transaction + + +
Re: meson documentation build open issues
On 20.03.23 03:33, Andres Freund wrote: I did end up getting stuck when hacking on this, and ended up adding css support for nochunk and support for the website style for htmlhelp and nochunk, as well as obsoleting the need for copying the css files... But perhaps that's a bit too much. Updated set of patches attached. This one works in older meson versions too and adds install-world and install-quiet targets. Oh, this patch set grew quite quickly. ;-) [PATCH v2 1/8] meson: rename html_help target to htmlhelp This is obvious. [PATCH v2 5/8] docs: html: copy images to output as part of xslt build Making the XSLT stylesheets do the copying has some appeal. I think it would only work for SVG (or other XML) files, which I guess is okay, but maybe the templates should have a filter on format="SVG" or something. Also, this copying actually modifies the files in some XML-equivalent way. Also okay, I think, but worth noting. Not sure why you removed this comment, since the code still exists. [PATCH v2 6/8] wip: docs: copy or inline css This seems pretty complicated compared to just copying a file?
Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)
On Mon, 20 Mar 2023 at 11:24, Matthias van de Meent wrote: > > On Mon, 20 Mar 2023 at 11:11, Tomas Vondra > wrote: > > > > On 3/14/23 15:41, Matthias van de Meent wrote: > > > On Tue, 14 Mar 2023 at 14:49, Tomas Vondra > > > wrote: > > >> > > >>> ... > > > > > >> If you agree with these changes, I'll get it committed. > > > > > > Yes, thanks! > > > > > > > I've tweaked the patch per the last round of comments, cleaned up the > > commit message a bit (it still talked about unused bit in tuple header > > and so on), and pushed it. > > > > Thanks for fixing the issues that got the patch reverted last year! > > Thanks for helping getting this in! Thanks for fixing the problems! > > Kind regards, > > Matthias van de Meent. > >
Re: server log inflates due to pg_logical_slot_peek_changes/pg_logical_slot_get_changes calls
On Sun, Mar 19, 2023 at 4:59 PM Jeevan Ladhe wrote: > > Hi, > > I observed absurd behaviour while using pg_logical_slot_peek_changes() > and pg_logical_slot_get_changes(). Whenever any of these two functions > are called to read the changes using a decoder plugin, the following > messages are printed in the log for every single such call. > > 2023-03-19 16:36:06.040 IST [30099] LOG: starting logical decoding for slot > "test_slot1" > 2023-03-19 16:36:06.040 IST [30099] DETAIL: Streaming transactions > committing after 0/851DFD8, reading WAL from 0/851DFA0. > 2023-03-19 16:36:06.040 IST [30099] STATEMENT: SELECT data FROM > pg_logical_slot_get_changes('test_slot1', NULL, NULL, 'format-version', '2'); > 2023-03-19 16:36:06.040 IST [30099] LOG: logical decoding found consistent > point at 0/851DFA0 > 2023-03-19 16:36:06.040 IST [30099] DETAIL: There are no running > transactions. > 2023-03-19 16:36:06.040 IST [30099] STATEMENT: SELECT data FROM > pg_logical_slot_get_changes('test_slot1', NULL, NULL, 'format-version', '2'); > > This log is printed on every single call to peek/get functions and bloats > the server log file by a huge amount when called in the loop for reading > the changes. > > IMHO, printing the message every time we create the context for > decoding a slot using pg_logical_slot_get_changes() seems over-burn. > Wondering if instead of LOG messages, should we mark these as > DEBUG1 in SnapBuildFindSnapshot() and CreateDecodingContext() > respectively? I can produce a patch for the same if we agree. > I think those messages are useful when debugging logical replication problems (imagine missing transaction or inconsistent data between publisher and subscriber). I don't think pg_logical_slot_get_changes() or pg_logical_slot_peek_changes() are expected to be called frequently in a loop. Instead you should open a replication connection to continue to receive logical changes ... forever. 
Why do you need to call pg_logical_slot_peek_changes() and pg_logical_slot_get_changes() frequently? -- Best Wishes, Ashutosh Bapat
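For the continuous-consumption approach Ashutosh suggests, a replication connection can be opened from the command line with pg_recvlogical rather than polling the SQL functions in a loop. A rough sketch, reusing the slot name and plugin option from the report above (database name and exact options are assumptions; this needs a running server with the slot already created):

```shell
# Stream decoded changes continuously from the existing logical slot,
# instead of repeatedly calling pg_logical_slot_get_changes().
# The decoding context is created once, so the "starting logical
# decoding" messages are not emitted per call.
pg_recvlogical -d postgres \
    --slot test_slot1 \
    --start \
    -o format-version=2 \
    -f -    # write the change stream to stdout
```
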
Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)
On Mon, 20 Mar 2023 at 11:11, Tomas Vondra wrote:
>
> On 3/14/23 15:41, Matthias van de Meent wrote:
> > On Tue, 14 Mar 2023 at 14:49, Tomas Vondra wrote:
> >>
> >>> ...
> >
> >> If you agree with these changes, I'll get it committed.
> >
> > Yes, thanks!
> >
>
> I've tweaked the patch per the last round of comments, cleaned up the
> commit message a bit (it still talked about an unused bit in the tuple
> header and so on), and pushed it.
>
> Thanks for fixing the issues that got the patch reverted last year!

Thanks for helping get this in!

Kind regards,

Matthias van de Meent
Re: Data is copied twice when specifying both child and parent table in publication
On Mon, Mar 20, 2023 at 1:02 PM Peter Smith wrote:
>
> ==
> src/include/catalog/pg_proc.dat
>
> 4.
> +{ oid => '6119',
> +  descr => 'get information of the tables in the given publication array',
>
> Should that be worded in a way to make it more clear that the
> "publication array" is really an "array of publication names"?
>

I don't know how important it is to say that the array is an array of
publication names, but the current description can be improved. How about
something like: "get information of the tables that are part of the
specified publications"?

Few other comments:
=
1.
 foreach(lc2, ancestors)
 {
     Oid         ancestor = lfirst_oid(lc2);
+    ListCell   *lc3;

     /* Check if the parent table exists in the published table list. */
-    if (list_member_oid(relids, ancestor))
+    foreach(lc3, table_infos)
     {
-        skip = true;
-        break;
+        Oid         relid = ((published_rel *) lfirst(lc3))->relid;
+
+        if (relid == ancestor)
+        {
+            skip = true;
+            break;
+        }
     }
+
+    if (skip)
+        break;
 }

-if (!skip)
-    result = lappend_oid(result, relid);
+if (skip)
+    table_infos = foreach_delete_current(table_infos, lc);

The usage of skip looks a bit ugly to me. Can we move the code for the
inner loop to a separate function (like is_ancestor_member_tableinfos())
and remove the current cell if it returns true?

2.
  * Filter out the partitions whose parent tables were also specified in
  * the publication.
  */
-static List *
-filter_partitions(List *relids)
+static void
+filter_partitions(List *table_infos)

The comment atop filter_partitions is no longer valid. Can we slightly
change it to: "Filter out the partitions whose parent tables are also
present in the list."?

3.
-# Note: We create two separate tables, not a partitioned one, so that we can
-# easily identify through which relation were the changes replicated.
+# Note: We only create one table (tab4) here. We specified
+# publish_via_partition_root = true (see pub_all and pub_lower_level above), so
+# all data will be replicated to that table.
 $node_subscriber2->safe_psql('postgres',
 	"CREATE TABLE tab4 (a int PRIMARY KEY)");
-$node_subscriber2->safe_psql('postgres',
-	"CREATE TABLE tab4_1 (a int PRIMARY KEY)");

I am not sure if it is a good idea to remove tab4_1 here. It is testing
something different, as mentioned in the comments. Also, I don't see any
data in tab4 for the initial sync, so I am not sure if this tests the
behavior changed by this patch.

4.
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -959,7 +959,8 @@ $node_publisher->safe_psql(
 	CREATE TABLE test_root_1 PARTITION OF test_root FOR VALUES FROM (1) TO (10);
 	CREATE TABLE test_root_2 PARTITION OF test_root FOR VALUES FROM (10) TO (20);

-	CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH (publish_via_partition_root = true);
+	CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH (publish_via_partition_root = true);
+	CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH (publish_via_partition_root = true);

 	-- initial data
 	INSERT INTO test_root VALUES (1, 2, 3);
@@ -968,7 +969,7 @@ $node_subscriber->safe_psql(
 	'postgres', qq(
-	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true;
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true_1, pub_root_true_2;

It is not clear to me what exactly you want to test here. Please add some
comments.

5. I think you can merge the 0001 and 0003 patches.

Apart from the above, attached is a patch to change some of the comments
in the patch.

--
With Regards,
Amit Kapila.
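The refactoring suggested in comment 1 above — move the ancestor lookup into a helper (the proposed is_ancestor_member_tableinfos()) and drop the current entry when it matches — can be illustrated outside of Postgres's C List API. The following Python analogue is only a sketch of the intended logic, not the actual patch code; the dict-based table_infos shape and the ancestors_of mapping are made-up stand-ins for the C structures:

```python
def is_ancestor_in_table_infos(ancestors, table_infos):
    """Return True if any ancestor of a partition is itself present in
    the published-table list (analogue of the proposed helper)."""
    published = {info["relid"] for info in table_infos}
    return any(a in published for a in ancestors)

def filter_partitions(table_infos, ancestors_of):
    """Drop partitions whose parent table is also in the list, so the
    initial table sync does not copy their rows twice. (The C version
    deletes cells in place; this sketch returns a new list.)"""
    return [
        info for info in table_infos
        if not is_ancestor_in_table_infos(
            ancestors_of.get(info["relid"], []), table_infos)
    ]

# Example: relid 102 is a partition of 100; both are published, so 102
# is filtered out. 200 has no published ancestor and stays.
table_infos = [{"relid": 100}, {"relid": 102}, {"relid": 200}]
ancestors_of = {102: [100]}
print(filter_partitions(table_infos, ancestors_of))
# → [{'relid': 100}, {'relid': 200}]
```

The point of the helper is that the outer loop no longer needs the skip flag: the membership test is a boolean question, and deletion happens at the caller.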
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 82941a0ce7..44fc371d2c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1949,22 +1949,15 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)

 	initStringInfo();

-	/*
-	 * Get namespace, relname and column list (if supported) of the tables
-	 * belonging to the specified publications.
-	 *
-	 * Get the list of tables from the publisher. The partition table whose
-	 * ancestor is also in this list will be ignored, otherwise the initial
-	 * data in the partition table would be replicated twice.
-	 *
-	 * From version 16, the parameter of the function
-	 * pg_get_publication_tables can be an array of publications. The
-	 * partition table whose ancestor is also published in this publication
-	 * array will be filtered out in this function.
-	 */
+	/* Get the list of tables from the publisher. */
 	if (server_version >= 16)
 	{
 		/*
+		 * From version 16, we allowed passing multiple publications to the
+		 * function pg_get_publication_tables. This helped to filter out the