RE: Partial aggregates pushdown
Hi Mr. Momjian,

> First, am I correct?

Yes, you are correct. This patch uses new special aggregate functions for partial aggregation (we call these partialaggfuncs).

> Second, how far away is this from being committable and/or what work needs to be
> done to get it committable, either for PG 16 or 17?

I believe there are three items. It is not clear whether items 1 and 2 are necessary; item 3 is clearly necessary. I would like to hear the development community's opinion on whether 1 and 2 need to be addressed.

1. Making partialaggfunc a user-defined function
In v17, I made the partialaggfuncs built-in functions. Because of this approach, v17 changes the specification of the BKI file and pg_aggregate. For now, partialaggfuncs are needed only by postgres_fdw, which is just an extension of PostgreSQL. In the future, any revision of the BKI file or pg_aggregate specifications made while modifying existing PostgreSQL functionality would have to be aligned with this patch's changes. I am concerned that this may be undesirable, so I am thinking that v17 should be modified to make partialaggfunc a user-defined function.

2. Automating the creation of partialaggfunc definitions
In developing v17, I manually created the partialaggfunc definitions for avg, min, max, sum, and count. I am concerned that this may be undesirable, so I am thinking that v17 should be modified to automate creating partialaggfunc definitions for all built-in aggregate functions.

3. Documentation
I need to add an explanation of partialaggfunc to the postgres_fdw documentation and other places.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center
Mitsubishi Electric Corporation
regression coverage gaps for gist and hash indexes
Hi,

I was working on committing patch 0001 from [1] and was a bit confused about some of the changes to the WAL format for gist and hash index vacuum. It looked to me like the changed code just flat out would not work. Turns out the problem is that we don't reach the deletion code for hash and gist vacuum in the tests.

gist:
> Oh, I see. We apparently don't reach the gist deletion code in the tests:
> https://coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html#674
> https://coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html#174
>
> And indeed, if I add an abort() into , it's not reached.
>
> And it's not because tests use a temp table, the caller is also unreachable:
> https://coverage.postgresql.org/src/backend/access/gist/gist.c.gcov.html#1643

hash:
> And there also are no tests:
> https://coverage.postgresql.org/src/backend/access/hash/hashinsert.c.gcov.html#372

I've since looked at other index AMs.

spgist's XLOG_SPGIST_VACUUM_ROOT is emitted, but not replayed:
https://coverage.postgresql.org/src/backend/access/spgist/spgvacuum.c.gcov.html#474
https://coverage.postgresql.org/src/backend/access/spgist/spgxlog.c.gcov.html#962

gin's XLOG_GIN_VACUUM_DATA_LEAF_PAGE is not emitted, but only because of a RelationNeedsWAL() check:
https://coverage.postgresql.org/src/backend/access/gin/gindatapage.c.gcov.html#852

I also looked at heapam: XLOG_HEAP2_LOCK_UPDATED is emitted, but not replayed:
https://coverage.postgresql.org/src/backend/access/heap/heapam.c.gcov.html#5487
https://coverage.postgresql.org/src/backend/access/heap/heapam.c.gcov.html#9965

same for XLOG_HEAP2_REWRITE:
https://coverage.postgresql.org/src/backend/access/heap/rewriteheap.c.gcov.html#928
https://coverage.postgresql.org/src/backend/access/heap/heapam.c.gcov.html#9975

and XLOG_HEAP_TRUNCATE (ugh, that record is quite the layering violation):
https://coverage.postgresql.org/src/backend/commands/tablecmds.c.gcov.html#2128
https://coverage.postgresql.org/src/backend/access/heap/heapam.c.gcov.html#9918

The fact that those cases aren't replayed isn't too surprising - XLOG_HEAP2_LOCK_UPDATED is exercised by isolationtester, XLOG_HEAP2_REWRITE and XLOG_HEAP_TRUNCATE by contrib/test_decoding. Neither is part of 027_stream_regress.pl.

The lack of any coverage of hash and gist deletion/vacuum seems quite concerning to me.

Greetings,

Andres Freund

[1] https://postgr.es/m/da7184cf-c7b9-c333-801e-0e7507a23...@gmail.com
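For context, a minimal sketch (not from this thread, and not the test that was ultimately written) of the kind of SQL a regression or recovery test might use to try to reach the unexercised gist deletion path: bulk-load a gist index on a logged table, delete most of the rows, and VACUUM so any resulting page deletions are WAL-logged. Whether page deletion is actually reached depends on enough leaf pages becoming completely empty, so this is illustrative only.

CREATE TABLE gist_del_test (p point);
CREATE INDEX gist_del_test_idx ON gist_del_test USING gist (p);
INSERT INTO gist_del_test SELECT point(i, i) FROM generate_series(1, 100000) i;
DELETE FROM gist_del_test WHERE p[0] > 100;   -- empty out most leaf pages
VACUUM gist_del_test;                         -- gist vacuum may delete empty pages, emitting WAL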
Re: Minimal logical decoding on standbys
Hi,

On 2023-03-30 18:23:41 +0200, Drouvot, Bertrand wrote:
> On 3/30/23 9:04 AM, Andres Freund wrote:
> > I think this commit is ready to go. Unless somebody thinks differently, I
> > think I might push it tomorrow.
>
> Great! Once done, I'll submit a new patch so that GlobalVisTestFor() can make
> use of the heap relation in vacuumRedirectAndPlaceholder() (which will be possible
> once 0001 is committed).

Unfortunately I did find an issue doing a pre-commit review of the patch.

The patch adds VISIBILITYMAP_IS_CATALOG_REL to xl_heap_visible.flags - but it does not remove the bit before calling visibilitymap_set(). This ends up corrupting the visibility map, because we'll set a bit for another page.

It's unfortunate that visibilitymap_set() doesn't assert that just the correct bits are passed in. It does assert that at least one valid bit is set, but that's not enough, as this case shows.

I noticed this when looking into the changes to visibilitymapdefs.h in more detail. I don't like how it ends up in the patch:

> --- a/src/include/access/visibilitymapdefs.h
> +++ b/src/include/access/visibilitymapdefs.h
> @@ -17,9 +17,11 @@
>  #define BITS_PER_HEAPBLOCK 2
>
>  /* Flags for bit map */
> -#define VISIBILITYMAP_ALL_VISIBLE	0x01
> -#define VISIBILITYMAP_ALL_FROZEN	0x02
> -#define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
> -										 * flags bits */
> +#define VISIBILITYMAP_ALL_VISIBLE		0x01
> +#define VISIBILITYMAP_ALL_FROZEN		0x02
> +#define VISIBILITYMAP_VALID_BITS		0x03	/* OR of all valid visibilitymap
> +												 * flags bits */
> +#define VISIBILITYMAP_IS_CATALOG_REL	0x04	/* to handle recovery conflict during logical
> +												 * decoding on standby */

On a casual read, one very well might think that VISIBILITYMAP_IS_CATALOG_REL is a valid bit that could be set in the VM. I am thinking of instead creating a separate namespace for the "xlog only" bits:

/*
 * To detect recovery conflicts during logical decoding on a standby, we need
 * to know if a table is a user catalog table. For that we add an additional
 * bit into xl_heap_visible.flags, in addition to the above.
 *
 * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
 */
#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)

That allows heap_xlog_visible() to do:

	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
	vmbits = (xlrec->flags & VISIBILITYMAP_VALID_BITS);

and pass vmbits instead of xlrec->flags to visibilitymap_set().

I'm also thinking of splitting the patch into two. One patch to pass down the heap relation into the new places, and another for the rest. As evidenced above, looking at the actual behaviour changes is important...
Given how the patch changes the struct for XLOG_GIST_DELETE:

> diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
> index 2ce9366277..93fb9d438a 100644
> --- a/src/include/access/gistxlog.h
> +++ b/src/include/access/gistxlog.h
> @@ -51,11 +51,14 @@ typedef struct gistxlogDelete
>  {
>  	TransactionId snapshotConflictHorizon;
>  	uint16		ntodelete;		/* number of deleted offsets */
> +	bool		isCatalogRel;	/* to handle recovery conflict during logical
> +								 * decoding on standby */
>
> -	/* TODELETE OFFSET NUMBER ARRAY FOLLOWS */
> +	/* TODELETE OFFSET NUMBERS */
> +	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
>  } gistxlogDelete;
>
> -#define SizeOfGistxlogDelete	(offsetof(gistxlogDelete, ntodelete) + sizeof(uint16))
> +#define SizeOfGistxlogDelete	offsetof(gistxlogDelete, offsets)

and XLOG_HASH_VACUUM_ONE_PAGE:

> diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
> index 9894ab9afe..6c5535fe73 100644
> --- a/src/include/access/hash_xlog.h
> +++ b/src/include/access/hash_xlog.h
> @@ -252,12 +252,14 @@ typedef struct xl_hash_vacuum_one_page
>  {
>  	TransactionId snapshotConflictHorizon;
>  	uint16		ntuples;
> +	bool		isCatalogRel;	/* to handle recovery conflict during logical
> +								 * decoding on standby */
>
> -	/* TARGET OFFSET
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Mon, 20 Mar 2023 at 11:50, Melanie Plageman wrote:
> Attached is an updated v6.

I had a look over the v6-0001 patch. There are a few things I think could be done better:

"Some operations will access a large number of pages at a time", does this really need "at a time"? I think it's more relevant that the operation uses a large number of pages.

Missing around Buffer Access Strategy.

Various things could be linked to other sections of the glossary, e.g. pages could link to glossary-data-page, shared buffers could link to glossary-shared-memory and WAL could link to glossary-wal.

The final paragraph should have tags around the various commands that you list.

I have adjusted those and slightly reworded a few other things. See the attached .diff which can be applied atop of v6-0001.

David

v6-0001-adjustments.diff
Description: Binary data
Re: Thoughts on using Text::Template for our autogenerated code?
On Fri, Mar 31, 2023 at 4:15 AM Corey Huinker wrote: > For those who already responded, are your concerns limited to the dependency issue? Would you have concerns about a templating library that was developed by us and included in-tree? Libraries (and abstractions in general) require some mental effort to interface with them (that also means debugging when the output fails to match expectations), not to mention maintenance cost. There has to be a compensating benefit in return. The cost-to-benefit ratio here seems unfavorable -- seems like inventing a machine that ties shoelaces, but we usually wear sandals. -- John Naylor EDB: http://www.enterprisedb.com
Re: Data is copied twice when specifying both child and parent table in publication
On Fri, Mar 31, 2023 at 5:15 AM Jacob Champion wrote:
>
> On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila wrote:
> > Pushed.
>
> While rebasing my logical-roots patch over the top of this, I ran into
> another situation where mixed viaroot settings can duplicate data. The
> key idea is to subscribe to two publications with mixed settings, as
> before, and add a partition root that's already been replicated with
> viaroot=false to the other publication with viaroot=true.
>
> pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
> pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
> pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH (publish_via_partition_root);
> -- populate with data, then switch to subscription side
> sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all, pub_other;
> -- switch back to publication
> pub=# ALTER PUBLICATION pub_other ADD TABLE part;
> -- and back to subscription
> sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
> -- data is now duplicated
>
> (Standalone reproduction attached.)
>
> This is similar to what happens if you alter the publish_via_partition_root
> setting for an existing publication, but I'd argue it's easier to hit by
> accident. Is this part of the same class of bugs, or is it different (or
> even expected) behavior?

Hi Jacob.

I tried your example, and I can see that after the REFRESH the tablesync for the added table 'part' is launched and performs the copy, causing the duplicate data.

sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
ALTER SUBSCRIPTION
sub=# 2023-03-31 13:09:30.348 AEDT [334] LOG: logical replication table synchronization worker for subscription "sub", table "part" has started
...

The duplicate data happens because REFRESH PUBLICATION has the default refresh option of copy_data=true.

Although the result is at first a bit unexpected, I am not sure if anything can be done to make it do what you probably hoped it would do. For example, just imagine the logic could be made smarter, recognizing that since 'part_def' was already being subscribed it should NOT use the default copy_data=true when the REFRESH launches the ancestor table 'part'... Even if that logic were implemented, I have a feeling you could *still* run into problems if the 'part' table were made of multiple partitions. I think you might get into a situation where you DO want some partition data copied (because you did not have it yet, but now that you are subscribing to the root you want it), while at the same time you DON'T want to get duplicated data from other partitions (because you already knew about those ones -- like your example does).

So, I am not sure what the answer is, or maybe there isn't one. At least, we need to check there are sufficient "BE CAREFUL" warnings in the documentation for scenarios like this.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
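For reference, a hedged illustration of one way to avoid the duplicate copy in this particular scenario (standard syntax, not a fix for the underlying behaviour): refresh without the default data copy, at the cost of also skipping any data you genuinely do still need.

sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION WITH (copy_data = false);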
Re: Add shared buffer hits to pg_stat_io
Hi, On 2023-03-09 08:23:46 -0500, Melanie Plageman wrote: > Good idea. v3 attached. I committed this, after some small regression test changes. I was worried that the query for testing buffer hits might silently change in the future, so I added an EXPLAIN for the query. Also removed the need for the explicit RESETs by using BEGIN; SET LOCAL ...; query; COMMIT;. Thanks for the patch Melanie and the review Bertrand. I'm excited about finally being able to compute meaningful cache hit ratios :) Regards, Andres
RE: Non-superuser subscription owners
On Friday, March 31, 2023 12:05 AM Robert Haas wrote:

Hi,

> On Tue, Mar 28, 2023 at 1:52 PM Jeff Davis wrote:
> > On Fri, 2023-03-24 at 00:17 -0700, Jeff Davis wrote:
> > > The other patch you posted seems like it makes a lot of progress in
> > > that direction, and I think that should go in first. That was one of
> > > the items I suggested previously[2], so thank you for working on
> > > that.
> >
> > The above is not a hard objection.
>
> The other patch is starting to go in a direction that is going to have some
> conflicts with this one, so I went ahead and committed this one to avoid
> rebasing pain.

I noticed the BF[1] report a core dump after this commit.

#0  0xfd581864 in _lwp_kill () from /usr/lib/libc.so.12
#0  0xfd581864 in _lwp_kill () from /usr/lib/libc.so.12
#1  0xfd5817dc in raise () from /usr/lib/libc.so.12
#2  0xfd581c88 in abort () from /usr/lib/libc.so.12
#3  0x01e6c8d4 in ExceptionalCondition (conditionName=conditionName@entry=0x2007758 "IsTransactionState()", fileName=fileName@entry=0x20565c4 "catcache.c", lineNumber=lineNumber@entry=1208) at assert.c:66
#4  0x01e4e404 in SearchCatCacheInternal (cache=0xfd21e500, nkeys=nkeys@entry=1, v1=v1@entry=28985, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1208
#5  0x01e4eea0 in SearchCatCache1 (cache=, v1=v1@entry=28985) at catcache.c:1162
#6  0x01e66e34 in SearchSysCache1 (cacheId=cacheId@entry=11, key1=key1@entry=28985) at syscache.c:825
#7  0x01e98c40 in superuser_arg (roleid=28985) at superuser.c:70
#8  0x01c657bc in ApplyWorkerMain (main_arg=) at worker.c:4552
#9  0x01c1ceac in StartBackgroundWorker () at bgworker.c:861
#10 0x01c23be0 in do_start_bgworker (rw=) at postmaster.c:5762
#11 maybe_start_bgworkers () at postmaster.c:5986
#12 0x01c2459c in process_pm_pmsignal () at postmaster.c:5149
#13 ServerLoop () at postmaster.c:1770
#14 0x01c26cdc in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0xe0e4) at postmaster.c:1463
#15 0x01ee2c8c in main (argc=4, argv=0xe0e4) at main.c:200

It looks like the superuser check is done outside of a transaction. I haven't checked why it only failed on one BF animal, but it seems we can put the check inside the transaction like the following:

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 6fd674b5d6..08f10fc331 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -4545,12 +4545,13 @@ ApplyWorkerMain(Datum main_arg)
 		replorigin_session_setup(originid, 0);
 		replorigin_session_origin = originid;
 		origin_startpos = replorigin_session_get_progress(false);
-		CommitTransactionCommand();

 		/* Is the use of a password mandatory? */
 		must_use_password = MySubscription->passwordrequired &&
 			!superuser_arg(MySubscription->owner);
+		CommitTransactionCommand();
+

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-03-30%2019%3A41%3A08

Best Regards,
Hou Zhijie
Re: Request for comment on setting binary format output per session
Dave Cramer On Thu, 30 Mar 2023 at 15:40, Jeff Davis wrote: > On Thu, 2023-03-30 at 07:06 -0500, Merlin Moncure wrote: > > This ought to be impossible IMO. All libpq routines except PQexec > > have an explicit expectation on format (via resultformat parameter) > > that should not be overridden. PQexec ought to be explicitly > > documented and wired to only request text format data. > > Right now it's clearly documented[1] which formats will be returned for > a given Bind message. That can be seen as the root of the problem with > psql -- we are breaking the protocol by returning binary when psql can > rightfully expect text. > > It is a minor break, because something needed to send the "SET > binary_formats='...'" command, but the protocol docs would need to be > updated for sure. > > > participating clients to receive GUC configured format. I do not > > think that libpq's result format being able to be overridden by GUC > > is a good idea at all, the library has to to participate, and I > > think can be made to so so without adjusting the interface (say, by > > resultFormat = 3). > > Interesting idea. I suppose you'd need to specify 3 for all result > columns? That is a protocol change, but wouldn't "break" older clients. > The newer clients would need to make sure that they're connecting to > v16+, so using the protocol version alone wouldn't be enough. Hmm. > I'm confused. How does using resultFormat=3 change anything ? Dave
Re: Partial aggregates pushdown
On Thu, Dec 15, 2022 at 10:23:05PM +0000, fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Freund.
>
> > cfbot complains about some compiler warnings when building with clang:
> > https://cirrus-ci.com/task/6606268580757504
> >
> > deparse.c:3459:22: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality]
> >         if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> >             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
> > deparse.c:3459:22: note: remove extraneous parentheses around the comparison to silence this warning
> >         if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> >             ~                                  ~
> > deparse.c:3459:22: note: use '=' to turn this equality comparison into an assignment
> >         if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> >                             ^~
> >                             =
>
> I fixed this error.

Considering we only have a week left before feature freeze, I wanted to review the patch from this commitfest item:

	https://commitfest.postgresql.org/42/4019/

The most recent patch is attached. This feature has been in development since 2021, and it is something that will allow new workloads for Postgres, specifically data warehouse sharding workloads.

We currently allow parallel aggregates when the table is on the same machine, and we allow partitionwise aggregates on FDWs only with GROUP BY keys matching partition keys. The first is possible since we can share data structures between background workers, and the second is possible because if the GROUP BY includes the partition key, we are really just appending aggregate rows, not combining aggregate computations.

What we can't do without this patch is to push aggregates that require partial aggregate computations (no partition key GROUP BY) to FDW partitions, because we don't have a clean way to pass such information from the remote FDW server to the finalize backend. I think that is what this patch does.

First, am I correct? Second, how far away is this from being committable and/or what work needs to be done to get it committable, either for PG 16 or 17?

--
  Bruce Momjian                            https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws. They make you human, rather than perfect,
  which you will never be.
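To make the target workload concrete, here is a hypothetical illustration (not taken from the patch or its tests) of the case described above: an aggregate whose GROUP BY does not include the partition key, computed over postgres_fdw partitions. The server and table names are made up, and the foreign servers are assumed to already exist.

CREATE TABLE orders (id int, region text, amount numeric) PARTITION BY HASH (id);
CREATE FOREIGN TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 0) SERVER shard0;
CREATE FOREIGN TABLE orders_p1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 1) SERVER shard1;

-- The GROUP BY key is not the partition key, so today the partial aggregation
-- over each foreign partition cannot be pushed to the remote servers; with the
-- patch, each shard could compute partial states that the local server combines
-- and finalizes.
SELECT region, avg(amount) FROM orders GROUP BY region;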
>From 71993e37093b3cec325de5989a03afb3073775aa Mon Sep 17 00:00:00 2001
From: Yuki Fujii
Date: Fri, 16 Dec 2022 09:33:30 +0900
Subject: [PATCH] Partial aggregates push down v17

---
 contrib/postgres_fdw/deparse.c                |  97 +-
 .../postgres_fdw/expected/postgres_fdw.out    | 296 +-
 contrib/postgres_fdw/option.c                 |  24 ++
 contrib/postgres_fdw/postgres_fdw.c           |  22 +-
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  85 -
 src/backend/catalog/pg_aggregate.c            |  32 ++
 src/backend/commands/aggregatecmds.c          |   8 +
 src/bin/pg_dump/pg_dump.c                     |  25 +-
 src/include/catalog/pg_aggregate.dat          |  62 +++-
 src/include/catalog/pg_aggregate.h            |  11 +
 src/include/catalog/pg_proc.dat               |  35 +++
 .../regress/expected/create_aggregate.out     |  24 ++
 src/test/regress/expected/oidjoins.out        |   1 +
 src/test/regress/sql/create_aggregate.sql     |  24 ++
 15 files changed, 715 insertions(+), 34 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 9524765650..8e5a45ceaa 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -202,7 +202,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno,
 							int *colno);
 static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel,
 										  int *relno, int *colno);
-
+static bool partial_agg_ok(Aggref* agg, PgFdwRelationInfo* fpinfo);
 
 /*
  * Examine each qual clause in input_conds, and classify them into two groups,
@@ -907,8 +907,9 @@ foreign_expr_walker(Node *node,
 				if (!IS_UPPER_REL(glob_cxt->foreignrel))
 					return false;
 
-				/* Only non-split aggregates are pushable. */
-				if (agg->aggsplit != AGGSPLIT_SIMPLE)
+				if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL))
+					return false;
+				if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg, fpinfo))
 					return false;
 
 				/* As usual, it must be shippable. */
@@ -3448,14 +3449,37 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context)
 	StringInfo	buf = context->buf;
 	bool		use_variadic;
 
-	/* Only basic, non-split aggregation accepted. */
-	Assert(node->aggsplit == AGGSPLIT_SIMPLE);
+
+	Assert((node->aggsplit == AGGSPLIT_SIMPLE) ||
+		(node->aggsplit == AGGSPLIT_INITIAL_SERIAL));
 
 	/* Check if need to print VARIADIC (cf. ruleutils.c) */
 	use_variadic = node->aggvariadic;
 
-	/* Find aggregate name from aggfnoid which is a pg_proc
Re: Thoughts on using Text::Template for our autogenerated code?
Andres Freund writes: > On 2023-03-30 17:15:20 -0400, Corey Huinker wrote: >> For those who already responded, are your concerns limited to the >> dependency issue? Would you have concerns about a templating library that >> was developed by us and included in-tree? I'm not suggesting suggesting we >> write one at this time, at least not until after we make a here-doc-ing >> pass first, but I want to understand the concerns before embarking on any >> refactoring. > The dependency is/was my main issue. But I'm also somewhat doubtful that what > we do warrants the use of a template library in the first place, but I could > be convinced by a patch showing it to be a significant improvement. Yeah, it's somewhat hard to believe that the cost/benefit ratio would be attractive. But maybe you could mock up some examples of what the input could look like, and get people on board (or not) before writing any code. regards, tom lane
Re: Support logical replication of DDLs
It seems that lately, the patch attachments are lacking version numbers. It causes unnecessary confusion. For example, I sometimes fetch patches from this thread locally to "diff" them with previous patches to get a rough overview of the changes -- that has now become more difficult. Can you please reinstate the name convention of having version numbers for all patch attachments? IMO *every* post that includes patches should unconditionally increment the patch version -- even if the new patches are just a rebase or some other trivial change. Version numbers make it clear what patches are the latest, you will be easily able to unambiguously refer to them by name in subsequent posts, and when copied to your local computer they won't clash with any older copied patches. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Add pg_walinspect function with block info columns
On Thu, Mar 30, 2023 at 2:41 PM Peter Geoghegan wrote:
> pg@regression:5432 [1402115]=# SELECT
>     count(*)
> FROM
>     pg_get_wal_block_info ('0/10E9D80', '/', true);
> ┌─[ RECORD 1 ]───────┐
> │ count │ 17,031,979 │
> └───────┴────────────┘
>
> Time: 15235.499 ms (00:15.235)
>
> This time is also typical of what I saw. The variance was fairly low,
> so I won't bother describing it.

If I rerun the same test case with pg_get_wal_records_info (same WAL records, same system) then I find that it takes about 16 and a half seconds. So my patch makes pg_get_wal_block_info a little bit faster than pg_get_wal_records_info for this test case, and likely many interesting cases (assuming that the user opts out of fetching block_data and block_fpi_data values when running pg_get_wal_block_info, per the patch).

This result closely matches what I was expecting. We're doing almost the same amount of work when each function is called, so naturally the runtime almost matches. Note that pg_get_wal_records_info does slightly *more* work here, since it alone must output rows for commit records. Unlike pg_get_wal_block_info, which (by design) never outputs rows for WAL records that lack block references.

--
Peter Geoghegan
Re: Add pg_walinspect function with block info columns
On Wed, Mar 29, 2023 at 8:28 PM Bharath Rupireddy wrote:
> I took a look at v9 and LGTM.

Pushed, thanks.

There is still an outstanding question around the overhead of outputting FPIs and even block data from pg_get_wal_block_info(). At one point Melanie suggested that we'd need to do something about that, and I tend to agree.

The attached patch provides an optional parameter that will make pg_get_wal_block_info return NULLs for both block_data and block_fpi_data, no matter whether or not there is something to show. Note that this only affects those two bytea columns; we'll still show everything else, including valid block_data_length and block_fpi_length values (so the metadata describing the on-disk size of block_data and block_fpi_data is unaffected).

To test this patch, I ran pgbench for about 5 minutes, using a fairly standard configuration with added indexes and with wal_log_hints enabled. I ended up with the following WAL records afterwards:

pg@regression:5432 [1402115]=# SELECT
    "resource_manager/record_type" t,
    pg_size_pretty(combined_size) s,
    fpi_size_percentage perc_fpi
FROM
    pg_get_wal_stats ('0/10E9D80', '/', FALSE)
WHERE
    combined_size > 0;
┌─[ RECORD 1 ]──────────────────┐
│ t        │ XLOG               │
│ s        │ 1557 MB            │
│ perc_fpi │ 22.029466865781302 │
├─[ RECORD 2 ]──────────────────┤
│ t        │ Transaction        │
│ s        │ 49 MB              │
│ perc_fpi │ 0                  │
├─[ RECORD 3 ]──────────────────┤
│ t        │ Storage            │
│ s        │ 13 kB              │
│ perc_fpi │ 0                  │
├─[ RECORD 4 ]──────────────────┤
│ t        │ CLOG               │
│ s        │ 1380 bytes         │
│ perc_fpi │ 0                  │
├─[ RECORD 5 ]──────────────────┤
│ t        │ Database           │
│ s        │ 118 bytes          │
│ perc_fpi │ 0                  │
├─[ RECORD 6 ]──────────────────┤
│ t        │ RelMap             │
│ s        │ 565 bytes          │
│ perc_fpi │ 0                  │
├─[ RECORD 7 ]──────────────────┤
│ t        │ Standby            │
│ s        │ 30 kB              │
│ perc_fpi │ 0                  │
├─[ RECORD 8 ]──────────────────┤
│ t        │ Heap2              │
│ s        │ 4235 MB            │
│ perc_fpi │ 0.6731388657682449 │
├─[ RECORD 9 ]──────────────────┤
│ t        │ Heap               │
│ s        │ 4482 MB            │
│ perc_fpi │ 54.46811493602934  │
├─[ RECORD 10 ]─────────────────┤
│ t        │ Btree              │
│ s        │ 1786 MB            │
│ perc_fpi │ 22.829279332421116 │
└──────────┴────────────────────┘

Time: 3618.693 ms (00:03.619)

So about 12GB of WAL -- certainly enough to be a challenge for pg_walinspect.

I then ran the following query several times over the same LSN range as before, with my patch applied, but with behavior equivalent to current git HEAD (this is with outputting block_data and block_fpi_data values still turned on):

pg@regression:5432 [1402115]=# SELECT
    count(*)
FROM
    pg_get_wal_block_info ('0/10E9D80', '/', false);
┌─[ RECORD 1 ]───────┐
│ count │ 17,031,979 │
└───────┴────────────┘

Time: 35171.463 ms (00:35.171)

The time shown here is typical of what I saw. And now the same query, but without any overhead for outputting block_data and block_fpi_data values:

pg@regression:5432 [1402115]=# SELECT
    count(*)
FROM
    pg_get_wal_block_info ('0/10E9D80', '/', true);
┌─[ RECORD 1 ]───────┐
│ count │ 17,031,979 │
└───────┴────────────┘

Time: 15235.499 ms (00:15.235)

This time is also typical of what I saw. The variance was fairly low, so I won't bother describing it.

I think that this is a compelling reason to apply the patch. It would be possible to get about 75% of the benefit shown here by just suppressing block_fpi_data output, without suppressing block_data, but I think that it makes sense to either suppress both or neither. Things like page split records can write a fairly large amount of WAL in a way that resembles an FPI, even though technically no FPI is involved.

If there are no objections, I'll move ahead with committing something along the lines of this patch in the next couple of days.

--
Peter Geoghegan

v1-0001-pg_get_wal_block_info-suppress-block-data-outputs.patch
Description: Binary data
Re: Thoughts on using Text::Template for our autogenerated code?
Hi, On 2023-03-30 17:15:20 -0400, Corey Huinker wrote: > For those who already responded, are your concerns limited to the > dependency issue? Would you have concerns about a templating library that > was developed by us and included in-tree? I'm not suggesting suggesting we > write one at this time, at least not until after we make a here-doc-ing > pass first, but I want to understand the concerns before embarking on any > refactoring. The dependency is/was my main issue. But I'm also somewhat doubtful that what we do warrants the use of a template library in the first place, but I could be convinced by a patch showing it to be a significant improvement. Greetings, Andres Freund
Re: Refactor calculations to use instr_time
Hi, I pushed this version! Thanks all, for the contribution and reviews. > > I would either put WalUsage into PgStat_PendingWalStats (so that it has > > all the same members as PgStat_WalStats), or figure out a way to > > maintain WalUsage separately from PgStat_WalStats or something else. > > Worst case, add more comments to the struct definitions to explain why > > they have the members they have and how WalUsage relates to them. > > Yes, but like Andres said this could be better done as a separate patch. I invite you to write a patch for that for 17... Greetings, Andres Freund
Re: ICU locale validation / canonicalization
On Thu, 2023-03-30 at 08:59 +0200, Peter Eisentraut wrote:
> I don't think the special handling of IsBinaryUpgrade is needed or
> wanted. I would hope that with this feature, all old-style locale IDs
> would go away, but this way we would keep them forever. If we believe
> that canonicalization is safe, then I don't see why we cannot apply it
> during binary upgrade.

There are two issues:

1. Failures can occur. For instance, if an invalid attribute is used, like '@collStrength=primary', then we can't canonicalize it (or if we do, it could end up being not what the user intended).

2. Version 15 and earlier have a subtle bug: it passes the raw locale straight to ucol_open(), and if the locale is "fr_CA.UTF-8" ucol_open() mis-parses it to have language "fr" with no region. If you canonicalize first, it properly parses the locale and produces "fr-CA", which results in a different collator. The 15 behavior is wrong, and this canonicalization patch will fix it, but it doesn't do so during pg_upgrade because that could change the collator and corrupt an index.

The current patch deals with these problems by simply preserving the locale (valid or not) during pg_upgrade, and only canonicalizing new collations and databases (so #2 is only fixed for new collations/databases). I think that's a good trade-off because a lot more users will be on ICU now that it's the default, so let's avoid creating more of the problem cases for those new users.

To get to perfectly-canonicalized catalogs for upgrades from earlier versions:

* We need a way to detect #2, which I posted some code for in an uncommitted revision[1] of this patch series.
* We need a way to detect #1 and #2 during the pg_upgrade --check phase.
* We need actions that the user can take to correct the problems. I have some ideas but they could use some discussion.

I'm not sure all of those will be ready for v16, though.

Regards,
Jeff Davis

[1] See check_equivalent_icu_locales() and calling code here:
https://www.postgresql.org/message-id/8c7af6820aed94dc7bc259d2aa7f9663518e6137.ca...@j-davis.com
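For clarity, hedged illustrations (not from the patch) of the two problem cases described above; the collation names are made up:

-- (1) an attribute spelling that cannot be safely canonicalized:
CREATE COLLATION coll_bad_attr (provider = icu, locale = '@collStrength=primary');

-- (2) a locale string that the pre-16 code effectively parsed as just "fr",
--     but which canonicalizes to "fr-CA" and therefore selects a different collator:
CREATE COLLATION coll_fr_ca (provider = icu, locale = 'fr_CA.UTF-8');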
Re: Thoughts on using Text::Template for our autogenerated code?
> I don't think that's remotely a starter. Asking people to install an old
> and possibly buggy version of a perl module is not something we should do.
>
> I think the barrier for this is pretty high. I try to keep module
> requirements for the buildfarm client pretty minimal, and this could affect
> a much larger group of people.

Those are good reasons.

For those who already responded, are your concerns limited to the dependency issue? Would you have concerns about a templating library that was developed by us and included in-tree? I'm not suggesting we write one at this time, at least not until after we make a here-doc-ing pass first, but I want to understand the concerns before embarking on any refactoring.
Re: Thoughts on using Text::Template for our autogenerated code?
> > I think many of those could just be replaced by multi-line strings and/or > here > documents to get most of the win. > I agree that a pretty good chunk of it can be done with here-docs, but template files do have attractive features (separation of concerns, syntax highlighting, etc) that made it worth asking.
Re: pgsql: Clean up role created in new subscription test.
> On 30 Mar 2023, at 22:29, Tom Lane wrote: > Daniel Gustafsson writes: >>> On 30 Mar 2023, at 20:44, Tom Lane wrote: >>> Another idea could be for pg_regress to enforce that "select count(*) >>> from pg_roles" gives the same answer before and after the test run. > >> That wouldn't prevent the contents of pg_roles to have changed though, so >> there >> is a (slim) false positive risk with that no? > > Well, we could do "select rolname from pg_roles order by 1" and > actually compare the results of the two selects. That might be > advisable anyway, in order to produce a complaint with useful > detail when there is something wrong. I can see the value in doing something like this to keep us honest. -- Daniel Gustafsson
Re: pgsql: Clean up role created in new subscription test.
Daniel Gustafsson writes: >> On 30 Mar 2023, at 20:44, Tom Lane wrote: >> Maybe it'd be close enough to expect there to be no roles named >> "regress_xxx". In combination with >> -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, that would prevent us >> from accidentally leaving stuff behind, and we could hope that it doesn't >> cause false failures in real installations. > Would that check be always on or only when pg_regress is compiled with > -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS? I envisioned it as being on all the time. >> Another idea could be for pg_regress to enforce that "select count(*) >> from pg_roles" gives the same answer before and after the test run. > That wouldn't prevent the contents of pg_roles to have changed though, so > there > is a (slim) false positive risk with that no? Well, we could do "select rolname from pg_roles order by 1" and actually compare the results of the two selects. That might be advisable anyway, in order to produce a complaint with useful detail when there is something wrong. regards, tom lane
Re: Transparent column encryption
On 30.03.23 20:35, Stephen Frost wrote: I do feel that column encryption is a useful capability and there's large parts of this approach that I agree with, but I dislike the idea of having our clients be able to depend on what gets returned for non-encrypted columns while not being able to trust what encrypted column results are and then trying to say it's 'transparent'. To that end, it seems like just saying they get back a bytea and making it clear that they have to provide the validation would be clear, while keeping much of the rest. [Note that the word "transparent" has been removed from the feature name. I just didn't change the email thread name.] These thoughts are reasonable, but I think there is a tradeoff to be made between having featureful data validation and enhanced security. If you want your database system to validate your data, you have to send it in plain text. If you want to have your database system not see the plain text, then it cannot validate it. But there is still utility in it. You can't really depend on what gets returned even in the non-encrypted case, unless you cryptographically sign the schema against modification or something like that. So realistically, a client that cares strongly about the data it receives has to do some kind of client-side validation anyway. Note also that higher-level client libraries like JDBC effectively do client-side data validation, for example when you call ResultSet.getInt() etc. This is also one of the reasons why the user facing type declaration exists. You could just make all encrypted columns of type "opaque" or something and not make any promises about what's inside. But client APIs sort or rely on the application being able to ask the result set for what's inside a column value. If we just say, we don't know, then applications (or driver APIs) will have to be changed to accommodate that, but the intention was to not require that. So instead we say, it's supposed to be int, and then if it's sometimes actually not int, then your application throws an exception you can deal with. This is arguably a better developer experience, even if it concerns the data type purist. But do you have a different idea about how it should work? Expanding out from that I'd imagine, pie-in-the-sky and in some far off land, having our data type in/out validation functions moved to the common library and then adding client-side validation of the data going in/out of the encrypted columns would allow application developers to be able to trust what we're returning (as long as they're using libpq- and we'd have to document that independent implementations of the protocol have to provide this or just continue to return bytea's). As mentioned, some client libraries effectively already do that. But even if we could make this much more comprehensive, I don't see how this can ever actually satisfy your point. It would require that all participating clients apply validation all the time, and all other clients can rely on that happening, which is impossible.
Re: Thoughts on using Text::Template for our autogenerated code?
On 2023-03-30 Th 15:06, Daniel Gustafsson wrote: On 30 Mar 2023, at 19:06, Corey Huinker wrote: Some other digging around shows that the module has been around since 1996 (Perl5 was 1994) and hasn't had a feature update (or any update for that matter) since 2003. So it should meet our baseline 5.14 requirement, which came out in 2011. I assume you then mean tying this to 1.44 (or another version?), since AFAICT there has been both features and bugfixes well after 2003? https://metacpan.org/dist/Text-Template/changes I don't think that's remotely a starter. Asking people to install an old and possibly buggy version of a perl module is not something we should do. I think the barrier for this is pretty high. I try to keep module requirements for the buildfarm client pretty minimal, and this could affect a much larger group of people. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: running logical replication as the subscription owner
On Thu, Mar 30, 2023 at 2:52 PM Jeff Davis wrote: > > Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET > > ROLE to A. With the patch as written, actions on B's tables are not > > confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on > > C's > > tables are. > > It's interesting that it's not transitive, but I'm not sure whether > that's an argument for or against the current approach, or where it > fits (or doesn't fit) with my suggestion. Why do you consider it > important that C's actions are SECURITY_RESTRICTED_OPERATIONs? So that C can't try to hack into A's account. I mean the point here is that B already has permissions to get into A's account whenever they like, without any hacking. So we don't need to impose SECURITY_RESTRICTED_OPERATION when running as B, because the only purpose of SECURITY_RESTRICTED_OPERATION is to prevent the role to which we're switching from attacking the role from which we're switching. And that's how the patch is currently coded. You proposed removing that behavior, on the theory that if the SECURITY_RESTRICTED_OPERATION restrictions were a problem, someone could activate the run_as_owner option (or whatever we end up calling it). But the run_as_owner option applies to the entire subscription. If A activates that option, then B's hypothetical triggers that run afoul of the SECURITY_RESTRICTED_OPERATION restrictions start working again (woohoo!) but they're now vulnerable to attacks from C. With the patch as coded, A doesn't need to use run_as_owner, everything still just works for B, and A is still protected against C. > In version 16, the subscription owner is almost certainly a superuser, > and the table owner almost certainly is not, so there's little chance > that it just happens that the table owner has that privilege already. > > I don't think we want to encourage such grants to proliferate any more > than we want the option you introduce in 0002 to proliferate. Arguably, > it's worse. I don't necessarily find those role grants to be a problem. Obviously it depends on the use case. If you're hoping to be able to set up an account whose only purpose in life is to own subscriptions and which should have as few permissions as possible, then those role grants suck, and a hypothetical future feature where you can GRANT REPLICATION ON TABLE t1 TO subscription_owning_user will be far better. But I imagine CREATE SUBSCRIPTION being used either by superusers or by people who already have those role grants anyway, because I imagine replication as something that a highly privileged user configures on behalf of everyone who uses the system. And in that case those role grants aren't something new that you do specifically for logical replication - they're already there because you need them to administer stuff. Or you're the superuser and don't need them anyway. > And let's say a user says "I upgraded and my trigger broke logical > replication with message about a security-restricted operation... how > do I get up and running again?". With the patches as-written, we will > have two answers to that question: > > * GRANT subscription_owner TO table_owner WITH SET TRUE > * ALTER SUBSCRIPTION ... ($awesome_option_name=false) > > Under what circumstances would we recommend the former vs. the latter? Well, the latter is clearly better because it has such an awesome option name, right? More seriously, my theory is that there's very little use case for having a replication trigger, default expression, etc. 
that is performing a security restricted operation. And if someone does have a use case, and it's between users that can't already SET ROLE back and forth, then the setup is pretty dubious from a security perspective and maybe the user ought to rethink it. And if they don't want to rethink it, then they need to throw security out the window, and I don't really care which of those commands they use to do it, but the second one would probably break less other stuff for them, so I'd likely recommend that one. > > I don't think run_as_owner is terrible, despite the ambiguity. > > I won't object but I'm not thrilled. Let's see if anyone else weighs in. -- Robert Haas EDB: http://www.enterprisedb.com
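For readers following along, a hedged SQL sketch (not from the thread) of the A/B/C arrangement described above; the role names and grants are illustrative only:

CREATE ROLE a;
CREATE ROLE b;
CREATE ROLE c;
GRANT b TO a WITH SET TRUE;   -- A can SET ROLE to B
GRANT c TO a WITH SET TRUE;   -- A can SET ROLE to C
GRANT a TO b WITH SET TRUE;   -- B can SET ROLE back to A

-- With the patch as written, applying changes to B's tables is not treated as a
-- SECURITY_RESTRICTED_OPERATION (B could become A anyway), but applying changes
-- to C's tables is, since C cannot SET ROLE to A.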
Re: Transparent column encryption
On 30.03.23 17:55, Andres Freund wrote: I find it very hard to belief that details of the catalog representation like this will matter to users. How would would it conceivably affect users that we store (key, encryption method) in pg_attribute vs storing an oid that's effectively a foreign key reference to (key, encryption method)? The change you are alluding to would also affect how the DDL commands work and interoperate, so it affects the user. But also, let's not drive this design decision bottom up. Let's go from how we want the data model and the DDL to work and then figure out suitable ways to record that. I don't really know if you are just worried about the catalog size, or you find an actual fault with the data model, or you just find it subjectively odd. The feature is arguably useful without typmod support, e.g., for text. We could ship it like that, then do some work to reorganize pg_attribute and tuple descriptors to relieve some pressure on each byte, and then add the typmod support back in in a future release. I think that is a workable compromise. I doubt that shipping a version of column encryption that breaks our type system is a good idea. I don't follow how you get from that to claiming that it breaks the type system. Please provide more details.
Re: Request for comment on setting binary format output per session
On Thu, 2023-03-30 at 07:06 -0500, Merlin Moncure wrote: > This ought to be impossible IMO. All libpq routines except PQexec > have an explicit expectation on format (via resultformat parameter) > that should not be overridden. PQexec ought to be explicitly > documented and wired to only request text format data. Right now it's clearly documented[1] which formats will be returned for a given Bind message. That can be seen as the root of the problem with psql -- we are breaking the protocol by returning binary when psql can rightfully expect text. It is a minor break, because something needed to send the "SET binary_formats='...'" command, but the protocol docs would need to be updated for sure. > participating clients to receive GUC configured format. I do not > think that libpq's result format being able to be overridden by GUC > is a good idea at all, the library has to to participate, and I > think can be made to so so without adjusting the interface (say, by > resultFormat = 3). Interesting idea. I suppose you'd need to specify 3 for all result columns? That is a protocol change, but wouldn't "break" older clients. The newer clients would need to make sure that they're connecting to v16+, so using the protocol version alone wouldn't be enough. Hmm. Regards, Jeff Davis [1] https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-BIND
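For concreteness, a hedged sketch of the session-level interaction being discussed; the setting name comes from the proposal referenced above, and the value shown is purely illustrative:

SET binary_formats = 'int4, int8, timestamptz';   -- hypothetical value syntax

-- After this, a Bind message that asks for the text result format could still
-- receive binary output for the listed types, which is the protocol expectation
-- being broken for clients (like psql) that did not opt in.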
Re: Should vacuum process config file reload more often
> On 30 Mar 2023, at 04:57, Masahiko Sawada wrote:

> As another idea, why don't we use macros for that? For example,
> suppose VacuumCostStatus is like:
>
> typedef enum VacuumCostStatus
> {
>     VACUUM_COST_INACTIVE_LOCKED = 0,
>     VACUUM_COST_INACTIVE,
>     VACUUM_COST_ACTIVE,
> } VacuumCostStatus;
> VacuumCostStatus VacuumCost;
>
> non-vacuum code can use the following macros:
>
> #define VacuumCostActive() (VacuumCost == VACUUM_COST_ACTIVE)
> #define VacuumCostInactive() (VacuumCost <= VACUUM_COST_INACTIVE)
> // or we can use !VacuumCostActive() instead.

I'm in favor of something along these lines. A variable with a name that implies a boolean value (active/inactive) but actually contains a tri-value is easily misunderstood. A VacuumCostState tri-value variable (or a better name) with a set of convenient macros for extracting the boolean active/inactive that most of the code needs to be concerned with would make for more readable code, I think.

--
Daniel Gustafsson
Re: Parallel Full Hash Join
On Mon, Mar 27, 2023 at 7:04 PM Thomas Munro wrote: > I found another problem. I realised that ... FULL JOIN ... LIMIT n > might be able to give wrong answers with unlucky scheduling. > Unfortunately I have been unable to reproduce the phenomenon I am > imagining yet but I can't think of any mechanism that prevents the > following sequence of events: > > P0 probes, pulls n tuples from the outer relation and then the > executor starts to shut down (see commit 3452dc52 which pushed down > LIMIT), but just then P1 attaches, right before P0 does. P1 > continues, and finds < n outer tuples while probing and then runs out > so it enters the unmatched scan phase, and starts emitting bogusly > unmatched tuples. Some outer tuples we needed to get the complete set > of match bits and thus the right answer were buffered inside P0's > subplan and abandoned. > > I've attached a simple fixup for this problem. Short version: if > you're abandoning your PHJ_BATCH_PROBE phase without reaching the end, > you must be shutting down, so the executor must think it's OK to > abandon tuples this process has buffered, so it must also be OK to > throw all unmatched tuples out the window too, as if this process was > about to emit them. Right? I understand the scenario you are thinking of, however, I question how those incorrectly formed tuples would ever be returned by the query. The hashjoin would only start to shutdown once enough tuples had been emitted to satisfy the limit, at which point, those tuples buffered in p0 may be emitted by this worker but wouldn't be included in the query result, no? I suppose even if what I said is true, we do not want the hashjoin node to ever produce incorrect tuples. In which case, your fix seems correct to me. > With all the long and abstract discussion of hard to explain problems > in this thread and related threads, I thought I should take a step > back and figure out a way to demonstrate what this thing really does > visually. I wanted to show that this is a very useful feature that > unlocks previously unobtainable parallelism, and to show the > compromise we've had to make so far in an intuitive way. With some > extra instrumentation hacked up locally, I produced the attached > "progress" graphs for a very simple query: SELECT COUNT(*) FROM r FULL > JOIN s USING (i). Imagine a time axis along the bottom, but I didn't > bother to add numbers because I'm just trying to convey the 'shape' of > execution with relative times and synchronisation points. > > Figures 1-3 show that phases 'h' (hash) and 'p' (probe) are > parallelised and finish sooner as we add more processes to help out, > but 's' (= the unmatched inner tuple scan) is not. Note that if all > inner tuples are matched, 's' becomes extremely small and the > parallelism is almost as fair as a plain old inner join, but here I've > maximised it: all inner tuples were unmatched, because the two > relations have no matches at all. Even if we achieve perfect linear > scalability for the other phases, the speedup will be governed by > https://en.wikipedia.org/wiki/Amdahl%27s_law and the only thing that > can mitigate that is if there is more useful work those early-quitting > processes could do somewhere else in your query plan. > > Figure 4 shows that it gets a lot fairer in a multi-batch join, > because there is usually useful work to do on other batches of the > same join. 
Notice how processes initially work on loading, probing > and scanning different batches to reduce contention, but they are > capable of ganging up to load and/or probe the same batch if there is > nothing else left to do (for example P2 and P3 both work on p5 near > the end). For now, they can't do that for the s phases. (BTW, the > little gaps before loading is the allocation phase that I didn't > bother to plot because they can't fit a label on them; this > visualisation technique is a WIP.) > > With the "opportunistic" change we are discussing for later work, > figure 4 would become completely fair (P0 and P2 would be able to join > in and help out with s6 and s7), but single-batch figures 1-3 would > not (that would require a different executor design AFAICT, or a > eureka insight we haven't had yet; see long-winded discussion). Cool diagrams! > The last things I'm thinking about now: Are the planner changes > right? I think the current changes are correct. I wonder if we have to change anything in initial/final_cost_hashjoin to account for the fact that for a single batch full/right parallel hash join, part of the execution is serial. And, if so, do we need to consider the estimated number of unmatched tuples to be emitted? > Are the tests enough? So, the tests currently in the patch set cover the unmatched tuple scan phase for single batch parallel full hash join. I've attached the dumbest possible addition to that which adds in a multi-batch full parallel hash join case. I did not do any checking to ensure I picked the case which would
Re: pgsql: Clean up role created in new subscription test.
> On 30 Mar 2023, at 20:44, Tom Lane wrote: > Maybe it'd be close enough to expect there to be no roles named > "regress_xxx". In combination with > -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, that would prevent us > from accidentally leaving stuff behind, and we could hope that it doesn't > cause false failures in real installations. Would that check be always on or only when pg_regress is compiled with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS? > Another idea could be for pg_regress to enforce that "select count(*) > from pg_roles" gives the same answer before and after the test run. That wouldn't prevent the contents of pg_roles to have changed though, so there is a (slim) false positive risk with that no? -- Daniel Gustafsson
Re: ResourceOwner refactoring
Hi,

> This patch, although moderately complicated, was moved between several
> commitfests. I think it would be great if it made it to PG16. I'm
> inclined to change the status of the patchset to RfC in a bit, unless
> anyone has a second opinion.

> I added a test module in src/test/modules/test_resowner to test those cases.

Hmm... it looks like a file is missing in the patchset. When building with Autotools:

```
== running regression test queries        ==
test test_resowner                ... /bin/sh: 1: cannot open /home/eax/projects/postgresql/src/test/modules/test_resowner/sql/test_resowner.sql: No such file
```

I wonder why the tests pass when using Meson.

--
Best regards,
Aleksander Alekseev
Re: Thoughts on using Text::Template for our autogenerated code?
> On 30 Mar 2023, at 19:06, Corey Huinker wrote: > Some other digging around shows that the module has been around since 1996 > (Perl5 was 1994) and hasn't had a feature update (or any update for that > matter) since 2003. So it should meet our baseline 5.14 requirement, which > came out in 2011. I assume you then mean tying this to 1.44 (or another version?), since AFAICT there has been both features and bugfixes well after 2003? https://metacpan.org/dist/Text-Template/changes -- Daniel Gustafsson
Re: running logical replication as the subscription owner
On Thu, 2023-03-30 at 09:41 -0400, Robert Haas wrote: > On Thu, Mar 30, 2023 at 1:19 AM Jeff Davis wrote: > > I say just take the special case out of 0001. If the trigger > > doesn't > > work as a SECURITY_RESTRICTED_OPERATION, and is also ENABLED > > ALWAYS, > > then the user can just use the new option in 0002 to get the old > > behavior. I don't see a reason to implicitly give them the old > > behavior, as 0001 does. > > Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET > ROLE to A. With the patch as written, actions on B's tables are not > confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on > C's > tables are. It's interesting that it's not transitive, but I'm not sure whether that's an argument for or against the current approach, or where it fits (or doesn't fit) with my suggestion. Why do you consider it important that C's actions are SECURITY_RESTRICTED_OPERATIONs? > I think we want to do everything possible to avoid people feeling > like > they need to turn on this new option. I'm not sure we'll ever be able > to get rid of it, but we certainly should avoid doing things that > make > it more likely that it will be needed. I don't think it helps much, though. While I previously said that the special-case behavior is implicit (which is true), it still almost certainly requires a manual step: GRANT subscription_owner TO table_owner WITH SET; In version 16, the subscription owner is almost certainly a superuser, and the table owner almost certainly is not, so there's little chance that it just happens that the table owner has that privilege already. I don't think we want to encourage such grants to proliferate any more than we want the option you introduce in 0002 to proliferate. Arguably, it's worse. And let's say a user says "I upgraded and my trigger broke logical replication with message about a security-restricted operation... how do I get up and running again?". With the patches as-written, we will have two answers to that question: * GRANT subscription_owner TO table_owner WITH SET TRUE * ALTER SUBSCRIPTION ... ($awesome_option_name=false) Under what circumstances would we recommend the former vs. the latter? > > > I agree that the naming is somewhat problematic, but I don't like > trust_table_owners. It's not clear enough about what actually > happens. > I want the name to describe behavior, not sentiment. On reflection, I agree here. We want it to communicate something about the behavior or mechanism. > run_as_subscription_owner removes the ambiguity, but is long. Then fewer people will use it, which might be a good thing. > I can think of other alternatives, like user_switching or > switch_to_table_owner or no_user_switching or various other things, > but none of them seem very good to me. I like the idea of using "switch" (or some synonym) because it's technically more correct. The subscription always runs as the subscription owner; we are just switching temporarily while applying a change. > Another idea could be to make the option non-Boolean. This is > comically long and I can't seriously recommend it, but just to > illustrate the point, if you type CREATE SUBSCRIPTION ... WITH > (execute_code_as_owner_of_which_object = subscription) then you > certainly should know what you've signed up for! If there were a > shorter version that were still clear, I could go for that, but I'm > having trouble coming up with exact wording. 
I don't care for that -- it communicates the options as co-equal and maybe something that would live forever (or even have more options in the future). I'd prefer that nobody uses the non-switching behavior except for migration purposes or weird use cases we don't really understand. > I don't think run_as_owner is terrible, despite the ambiguity. I won't object but I'm not thrilled. > Regards, Jeff Davis
Re: Thoughts on using Text::Template for our autogenerated code?
Hi, On 2023-03-30 13:06:46 -0400, Corey Huinker wrote: > Is there a barrier to us using non-core perl modules, in this case > Text::Template? I don't think we should have a hard build-time dependency on non-core perl modules. On some operating systems having to install such dependencies is quite painful (e.g. windows). > I think it would be a tremendous improvement in readability and > maintainability over our current series of print statements, some > multiline, some not. I think many of those could just be replaced by multi-line strings and/or here documents to get most of the win. Greetings, Andres Freund
Re: pgsql: Clean up role created in new subscription test.
Robert Haas writes: > On Thu, Mar 30, 2023 at 1:07 PM Tom Lane wrote: >> This oversight broke repeated runs of "make installcheck". > GH. You would think that I would have learned better by now, but > evidently not. Is there some way we can add an automated guard against > this? Hm. We could add a final test step that prints out all still-existing roles, but the trick is to have it not fail in a legitimate installcheck context (ie, when there are indeed some pre-existing roles). Maybe it'd be close enough to expect there to be no roles named "regress_xxx". In combination with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, that would prevent us from accidentally leaving stuff behind, and we could hope that it doesn't cause false failures in real installations. Another idea could be for pg_regress to enforce that "select count(*) from pg_roles" gives the same answer before and after the test run. That would then be enforced against all pg_regress suites not just the main one, but perhaps that's good. Likewise for tablespaces, subscriptions, and other globally-visible objects, of course. regards, tom lane
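As a rough illustration of the kind of guard discussed above (a sketch only, not part of any patch), a final pg_regress step could either flag leftover test-created roles by the "regress_" naming convention, or compare counts of globally-visible objects taken before and after the suite:

-- Sketch: fail if any test-created roles survive the run
-- (relies on the regress_ prefix enforced by -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS).
SELECT rolname FROM pg_roles WHERE rolname LIKE 'regress\_%' ORDER BY rolname;

-- ... or compare simple counts taken before and after the suite,
-- and do the same for other globally-visible objects.
SELECT count(*) FROM pg_roles;
SELECT count(*) FROM pg_tablespace;
SELECT count(*) FROM pg_subscription;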
Re: Transparent column encryption
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2023-03-30 16:01:46 +0200, Peter Eisentraut wrote: > > On 30.03.23 03:29, Andres Freund wrote: > > > > One might think that, but the precedent in other equivalent systems is > > > > that > > > > you reference the key and the algorithm separately. There is some > > > > (admittedly not very conclusive) discussion about this near [0]. > > > > > > > > [0]: > > > > https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 > > > > > > I'm very much not convinced by that. Either way, there at least there > > > should > > > be a comment mentioning that we intentionally try to allow that. > > > > > > Even if this feature is something we want (why?), ISTM that this should > > > not be > > > implemented by having multiple fields in pg_attribute, but instead by a > > > table > > > referenced by by pg_attribute.attcek. > > > > I don't know if it is clear to everyone here, but the key data model and the > > surrounding DDL are exact copies of the equivalent MS SQL Server feature. > > When I was first studying it, I had the exact same doubts about this. But > > as I was learning more about it, it does make sense, because this matches a > > common pattern in key management systems, which is relevant because these > > keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it > > is plausible that those people knew what they were doing, and 2) it would be > > preferable to maintain alignment and not create something that looks the > > same but is different in some small but important details. I was wondering about this- is it really exactly the same, down to the point that there's zero checking of what the data returned actually is after it's decrypted and given to the application, and if it actually matches the claimed data type? > I find it very hard to belief that details of the catalog representation like > this will matter to users. How would would it conceivably affect users that we > store (key, encryption method) in pg_attribute vs storing an oid that's > effectively a foreign key reference to (key, encryption method)? I do agree with this. > > > > With the proposed removal of usertypmod, it's only two fields: the link > > > > to > > > > the key and the user-facing type. > > > > > > That feels far less clean. I think loosing the ability to set the > > > precision of > > > a numeric, or the SRID for postgis datums won't be received very well? > > > > In my mind, and I probably wasn't explicit about this, I'm thinking about > > what can be done now versus later. > > > > The feature is arguably useful without typmod support, e.g., for text. We > > could ship it like that, then do some work to reorganize pg_attribute and > > tuple descriptors to relieve some pressure on each byte, and then add the > > typmod support back in in a future release. I think that is a workable > > compromise. > > I doubt that shipping a version of column encryption that breaks our type > system is a good idea. And this. I do feel that column encryption is a useful capability and there's large parts of this approach that I agree with, but I dislike the idea of having our clients be able to depend on what gets returned for non-encrypted columns while not being able to trust what encrypted column results are and then trying to say it's 'transparent'. 
To that end, it seems like just saying they get back a bytea and making it clear that they have to provide the validation would be clear, while keeping much of the rest. Expanding out from that I'd imagine, pie-in-the-sky and in some far off land, having our data type in/out validation functions moved to the common library and then adding client-side validation of the data going in/out of the encrypted columns would allow application developers to be able to trust what we're returning (as long as they're using libpq- and we'd have to document that independent implementations of the protocol have to provide this or just continue to return bytea's). Not sure how we'd manage to provide support for extensions though. Thanks, Stephen signature.asc Description: PGP signature
Re: pgsql: Clean up role created in new subscription test.
On Thu, Mar 30, 2023 at 1:07 PM Tom Lane wrote: > Clean up role created in new subscription test. > > This oversight broke repeated runs of "make installcheck". GH. You would think that I would have learned better by now, but evidently not. Is there some way we can add an automated guard against this? -- Robert Haas EDB: http://www.enterprisedb.com
Re: running logical replication as the subscription owner
On Thu, Mar 30, 2023 at 11:11 AM Jelte Fennema wrote: > Regarding the actual patch. I think the code looks good. Mainly the > tests and docs are lacking for the new option. Like I said for the > tests you can borrow the tests I updated for my v2 patch, I think > those should work fine for the new option. I took a look at that but I didn't really feel like that was quite the direction I wanted to go. I'd actually like to separate the tests of the new option out into their own file, so that if for some reason we decide we want to remove it in the future, it's easier to nuke all the associated tests. Also, quite frankly, I think we've gone way overboard in terms of loading too many tests into a single file, with the result that it's very hard to understand exactly what and all the file is actually testing and what it's intended to be testing. So the attached 0002 does it that way. I've also amended 0001 and 0002 with documentation changes that I hope are appropriate. I noticed along the way that my earlier commits had missed one place that needed to be updated by the pg_create_subscription patch I created earlier. A fix for that is included in 0001, but it can be broken out and committed separately if somebody feels strongly about it. I personally don't think it's worth it. -- Robert Haas EDB: http://www.enterprisedb.com v3-0002-Add-a-run_as_owner-option-to-subscriptions.patch Description: Binary data v3-0001-Perform-logical-replication-actions-as-the-table-.patch Description: Binary data
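For readers following along, the option exercised by the proposed 0002 tests looks roughly like this at the SQL level. This is a sketch based on the description above, not committed syntax: the subscription, publication, and connection-string values are placeholders, and the option name itself is still being debated in this thread.

-- Hypothetical example of the proposed 0002 option.
CREATE SUBSCRIPTION regress_sub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION regress_pub
    WITH (run_as_owner = true);  -- keep applying changes as the subscription owner, as before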
Re: Data is copied twice when specifying both child and parent table in publication
On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila wrote: > Pushed. While rebasing my logical-roots patch over the top of this, I ran into another situation where mixed viaroot settings can duplicate data. The key idea is to subscribe to two publications with mixed settings, as before, and add a partition root that's already been replicated with viaroot=false to the other publication with viaroot=true.

pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH (publish_via_partition_root);
-- populate with data, then switch to subscription side
sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all, pub_other;
-- switch back to publication
pub=# ALTER PUBLICATION pub_other ADD TABLE part;
-- and back to subscription
sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
-- data is now duplicated

(Standalone reproduction attached.) This is similar to what happens if you alter the publish_via_partition_root setting for an existing publication, but I'd argue it's easier to hit by accident. Is this part of the same class of bugs, or is it different (or even expected) behavior? Thanks, --Jacob repro.sql Description: application/sql
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
Greetings, * Jacob Champion (jchamp...@timescale.com) wrote: > On 3/20/23 09:32, Robert Haas wrote: > > I think this is the root of our disagreement. > > Agreed. I've read all the way back to the $SUBJECT change to try and get an understanding of the questions here and it's not been easy, in part, I think, due to the verbiage but also the perhaps lack of concrete examples and instead references to other systems and protocols. > > My understanding of the > > previous discussion is that people think that the major problem here > > is the wraparound-to-superuser attack. That is, in general, we expect > > that when we connect to a database over the network, we expect it to > > do some kind of active authentication, like asking us for a password, > > or asking us for an SSL certificate that isn't just lying around for > > anyone to use. However, in the specific case of a local connection, we > > have a reliable way of knowing who the remote user is without any kind > > of active authentication, namely 'peer' authentication or perhaps even > > 'trust' if we trust all the local users, and so we don't judge it > > unreasonable to allow local connections without any form of active > > authentication. There can be some scenarios where even over a network > > we can know the identity of the person connecting with complete > > certainty, e.g. if endpoints are locked down such that the source IP > > address is a reliable indicator of who is initiating the connection, > > but in general when there's a network involved you don't know who the > > person making the connection is and need to do something extra to > > figure it out. > > Okay, but this is walking back from the network example you just > described upthread. Do you still consider that in scope, or...? The concern around the network certainly needs to be in-scope overall. > > If you accept this characterization of the problem, > > I'm not going to say yes or no just yet, because I don't understand your > rationale for where to draw the lines. > > If you just want the bare minimum thing that will solve the localhost > case, require_auth landed this week. Login triggers are not yet a thing, > so `require_auth=password,md5,scram-sha-256` ensures active > authentication. You don't even have to disallow localhost connections, > as far as I can tell; they'll work as intended. I do think require_auth helps us move in a positive direction. As I mentioned elsewhere, I don't think we highlight it nearly enough in the postgres_fdw documentation. Let's look at that in a bit more depth with concrete examples and perhaps everyone will be able to get a bit more understanding of the issues. Client is psql Proxy is some PG server that's got postgres_fdw Target is another PG server, that is being connected to from Proxy Authentication is via GSS/Kerberos with proxied credentials What do we want to require the user to configure to make this secure? Proxy's pg_hba configured to require GSS auth from Client. Target's pg_hba configured to require GSS auth from Proxy. Who are we trusting with what? In particular, I'd argue that the user who is able to install the postgres_fdw extension and the user who is able to issue the CREATE SERVER are largely trusted; at least in so far as the user doing CREATE SERVER is allowed to create the server and through that allowed to make outbound connections from the Proxy. Therefore, the Proxy is configured with postgres_fdw and with a trusted user performing the CREATE SERVER. What doesn't this handle today? 
Connection side-effects are one problem- once the CREATE SERVER is done, any user with USAGE rights on the server can create a USER MAPPING for themselves, either with a password or without one (if they're able to proxy GSS credentials to the system). They aren't able to set password_required though, which defaults to true. However, without having require_auth set, they're able to cause the Proxy to reach an authentication stage with the Target that might not match what credentials they're supposed to be providing. We attempt to address this by checking post-auth to Target that we used the credentials to connect that we expected to- if GSS credentials were proxied, then we expect to use those. If a password was provided then we expect to use a password to auth (only checked after we see if GSS credentials were proxied and used). The issue here is 'post-auth' bit, we'd prefer to fail the connection pre-auth if it isn't what we're expecting. Should we then explicit set require_auth=gss when GSS credentials are proxied? Also, if a password is provided, then explicitly set require_auth=scram-sha-256? Or default to these, at least, and allow the CREATE SERVER user to override our choices? Or should it be a USER MAPPING option that's restricted? Or not? > > I think that what you're proposing is that B and C can just be allowed > > to proxy to A and A can say "hey, by the way, I'm just gonna let you > > in without asking
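To make the moving parts concrete, a rough sketch of the Proxy-side setup being discussed might look like the following. All names are placeholders, and it assumes require_auth (a PG16 libpq option) is simply passed through as a postgres_fdw server option; whether postgres_fdw should set it implicitly, restrict it, or leave it to the CREATE SERVER user is exactly the open question above.

-- Hypothetical configuration on the Proxy, assuming GSS credentials are proxied from the Client.
CREATE SERVER target_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'target.example.com', dbname 'appdb', require_auth 'gss');

-- The user relies on proxied GSS credentials rather than a stored password.
CREATE USER MAPPING FOR app_user SERVER target_srv;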
Re: Thoughts on using Text::Template for our autogenerated code?
Corey Huinker writes: > Is there a barrier to us using non-core perl modules, in this case > Text::Template? Use for what exactly? I'd be hesitant to require such things to build from a tarball, or to run regression tests. If it's used to build a generated file that we include in tarballs, that might be workable ... but I'd bet a good fraction of the buildfarm falls over (looks like all four of my animals would), and you might get push-back from developers too. > I think it would be a tremendous improvement in readability and > maintainability over our current series of print statements, some > multiline, some not. I suspect it'd have to be quite a remarkable improvement to justify adding a new required build tool ... but show us an example. regards, tom lane
Re: [POC] Allow an extension to add data into Query and PlannedStmt nodes
On 30/3/2023 12:57, Julien Rouhaud wrote: Extensions could need to pass some query-related data through all stages of the query planning and execution. As a trivial example, pg_stat_statements uses queryid at the end of execution to save some statistics. One more reason - extensions now conflict on queryid value and the logic of its changing. With this patch, it can be managed. I just had a quick lookc at the patch, and IIUC it doesn't really help on that side, as there's still a single official "queryid" that's computed, stored everywhere and later used by pg_stat_statements (which does then store in additionally to that official queryid). Thank you for the attention! This patch doesn't try to solve the problem of oneness of queryId. In this patch we change pg_stat_statements and it doesn't set 0 into queryId field according to its internal logic. And another extension should do the same - use queryId on your own but not erase it - erase your private copy in the ext_field. Ruthless pgbench benchmark shows that we got some overhead: 1.6% - in default mode 4% - in prepared mode ~0.1% in extended mode. That's a quite significant overhead. But the only reason to accept such a change is to actually use it to store additional data, so it would be interesting to see what the overhead is like once you store at least 2 different values there. Yeah, but as I said earlier, it can be reduced to much smaller value just with simple optimization. Here I intentionally avoid it to discuss the core concept. - Are we need to invent a registration procedure to do away with the names of entries and use some compact integer IDs? Note that the patch as proposed doesn't have any defense for two extensions trying to register something with the same name, or update a stored value, as AddExtensionDataToNode() simply prepend the new value to the list. You can actually update the value by just storing the new value, but it will add a significant overhead to every other extension that want to read another value. Thanks a lot! Patch in attachment implements such an idea - extension can allocate some entries and use these private IDs to add entries. I hope, an extension would prefer to use only one entry for all the data to manage overhead, but still. The API is also quite limited as each stored value has a single identifier. What if your extension needs to store multiple things? Since it's all based on Node you can't really store some custom struct, so you probably have to end up with things like "my_extension.my_val1", "my_extension.my_val2" which isn't great. Main idea here - if an extension holds custom struct and want to pass it along all planning and execution stages it should use extensible node with custom read/write/copy routines. -- regards, Andrey Lepikhov Postgres Professional From ab101322330684e9839e46c26f70ad5462e40dac Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Thu, 30 Mar 2023 21:49:32 +0500 Subject: [PATCH 2/2] Add ExtendedData entry registration routines to use entryId instead of symbolic name. 
--- .../pg_stat_statements/pg_stat_statements.c | 7 +- src/backend/commands/extension.c | 69 +++ src/include/commands/extension.h | 6 +- src/include/nodes/parsenodes.h| 2 +- 4 files changed, 64 insertions(+), 20 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index fc47661c86..5b21163365 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -109,11 +109,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; !IsA(n, DeallocateStmt)) #define GET_QUERYID(node) \ - (Bigint *) GetExtensionData(node->ext_field, "pg_stat_statements") + (Bigint *) GetExtensionData(node->ext_field, extendedEntryID) #define INSERT_QUERYID(node, queryid) \ castNode(Bigint, AddExtensionDataToNode((Node *) node, \ - "pg_stat_statements", \ + extendedEntryID, \ (Node *) makeBigint((int64) queryid), \ true)) /* @@ -298,6 +298,7 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static int extendedEntryID = -1; /* Use to add queryId into the Query struct
Thoughts on using Text::Template for our autogenerated code?
Is there a barrier to us using non-core perl modules, in this case Text::Template? I think it would be a tremendous improvement in readability and maintainability over our current series of print statements, some multiline, some not. The module itself works like this https://www.perlmonks.org/?node_id=33296 Some other digging around shows that the module has been around since 1996 (Perl5 was 1994) and hasn't had a feature update (or any update for that matter) since 2003. So it should meet our baseline 5.14 requirement, which came out in 2011. I'm happy to proceed with a proof-of-concept so that people can see the costs/benefits, but wanted to first make sure it wasn't a total non-starter.
Re: Autogenerate some wait events code and documentation
On Wed, Mar 29, 2023 at 8:51 AM Drouvot, Bertrand < bertranddrouvot...@gmail.com> wrote: > Hi, > > On 3/29/23 11:44 AM, Drouvot, Bertrand wrote: > > > > > Looking forward to your feedback, > > Just realized that more polishing was needed. > > Done in V2 attached. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com I think this is good work, but I can't help thinking it would be easier to understand and maintain if we used a template engine like Text::Template, and filled out the template with the variant bits. I'll ask that question in another thread for higher visibility.
Re: Minimal logical decoding on standbys
Hi, On Thu, Mar 30, 2023 at 2:45 PM Jeff Davis wrote: > > On Thu, 2023-03-02 at 23:58 -0800, Jeff Davis wrote: > > On Thu, 2023-03-02 at 11:45 -0800, Jeff Davis wrote: > > > In this case it looks easier to add the right API than to be sure > > > about > > > whether it's needed or not. > > > > I attached a sketch of one approach. I'm not very confident that it's > > the right API or even that it works as I intended it, but if others > > like the approach I can work on it some more. > > Another approach might be to extend WaitEventSets() to be able to wait > on Condition Variables, rather than Condition Variables waiting on > WaitEventSets. Thoughts?

+1 to extend CV. If we extend WaitEventSet() to be able to wait on a CV, it would make the code simpler, but we would need to change both CV and WaitEventSet().

On Fri, Mar 10, 2023 at 8:34 PM Drouvot, Bertrand wrote: > > I gave it a try, so please find attached > v2-0001-Introduce-ConditionVariableEventSleep.txt (implementing the comments > above) and 0004_new_API.txt to put the new API in the logical decoding on > standby context.

@@ -180,13 +203,25 @@ ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
  * by something other than ConditionVariableSignal; though we don't
  * guarantee not to return spuriously, we'll avoid this obvious case.
  */
- SpinLockAcquire(&cv->mutex);
- if (!proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
+
+ if (cv)
  {
- done = true;
- proclist_push_tail(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
+ SpinLockAcquire(&cv->mutex);
+ if (!proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
+ {
+ done = true;
+ proclist_push_tail(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
+ }
+ SpinLockRelease(&cv->mutex);
  }

This change looks odd to me since it accepts cv being NULL in spite of calling ConditionVariableEventSleep() for cv. I think that this is because in 0004_new_API.txt, we use ConditionVariableEventSleep() in both the not-in-recovery case and the recovery-in-progress case in WalSndWaitForWal() as follows:

- WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+ ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, wakeEvents, NULL);
+ ConditionVariableEventSleep(cv, RecoveryInProgress, FeBeWaitSet, NULL,
+                             sleeptime, wait_event);
 }

But I don't think we need to use ConditionVariableEventSleep() in the not-in-recovery case. If I correctly understand the problem this patch wants to deal with, in the logical-decoding-on-standby case the walsender needs to be woken up on the following events:

* condition variable
* timeout
* socket writable (if pq_is_send_pending() is true)

(socket readable event should also be included to avoid wal_receiver_timeout BTW?)

On the other hand, in the not-in-recovery case, the events are:

* socket readable
* socket writable (if pq_is_send_pending() is true)
* latch
* timeout

I think we don't need to change anything for the latter case, as WalSndWait() works perfectly there. As for the former case, since we need to wait for the CV, a timeout, or socket writable, we can use ConditionVariableEventSleep(). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
FW: Add the ability to limit the amount of memory that can be allocated to backends.
Hi Reid! Some thoughts I was looking at lmgr/proc.c, and I see a potential integer overflow - both max_total_bkend_mem and result are declared as “int”, so the expression “max_total_bkend_mem * 1024 * 1024 - result * 1024 * 1024” could have a problem when max_total_bkend_mem is set to 2G or more. /* * Account for shared memory size and initialize * max_total_bkend_mem_bytes. */ pg_atomic_init_u64(>max_total_bkend_mem_bytes, max_total_bkend_mem * 1024 * 1024 - result * 1024 * 1024); As more of a style thing (and definitely not an error), the calling code might look smoother if the memory check and error handling were moved into a helper function, say “backend_malloc”. For example, the following calling code /* Do not exceed maximum allowed memory allocation */ if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock))) { MemoryContextStats(TopMemoryContext); ereport(ERROR, (errcode(ERRCODE_OUT_OF_MEMORY), errmsg("out of memory - exceeds max_total_backend_memory"), errdetail("Failed while creating memory context \"%s\".", name))); } slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock)); if (slab == NULL) …. Could become a single line: Slab = (SlabContext *) backend_malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock); Note this is a change in error handling as backend_malloc() throws memory errors. I think this change is already happening, as the error throws you’ve already added are potentially visible to callers (to be verified). It also could result in less informative error messages without the specifics of “while creating memory context”. Still, it pulls repeated code into a single wrapper and might be worth considering. I do appreciate the changes in updating the global memory counter. My recollection is the original version updated stats with every allocation, and there was something that looked like a spinlock around the update. (My memory may be wrong …). The new approach eliminates contention, both by having fewer updates and by using atomic ops. Excellent. I also have some thoughts about simplifying the atomic update logic you are currently using. I need to think about it a bit more and will get back to you later on that. * John Morris
Re: Non-superuser subscription owners
On Tue, Mar 28, 2023 at 1:52 PM Jeff Davis wrote: > On Fri, 2023-03-24 at 00:17 -0700, Jeff Davis wrote: > > The other patch you posted seems like it makes a lot of progress in > > that direction, and I think that should go in first. That was one of > > the items I suggested previously[2], so thank you for working on > > that. > > The above is not a hard objection. The other patch is starting to go in a direction that is going to have some conflicts with this one, so I went ahead and committed this one to avoid rebasing pain. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
Hi, On 2023-03-30 16:01:46 +0200, Peter Eisentraut wrote: > On 30.03.23 03:29, Andres Freund wrote: > > > One might think that, but the precedent in other equivalent systems is > > > that > > > you reference the key and the algorithm separately. There is some > > > (admittedly not very conclusive) discussion about this near [0]. > > > > > > [0]: > > > https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 > > > > I'm very much not convinced by that. Either way, there at least there should > > be a comment mentioning that we intentionally try to allow that. > > > > Even if this feature is something we want (why?), ISTM that this should not > > be > > implemented by having multiple fields in pg_attribute, but instead by a > > table > > referenced by by pg_attribute.attcek. > > I don't know if it is clear to everyone here, but the key data model and the > surrounding DDL are exact copies of the equivalent MS SQL Server feature. > When I was first studying it, I had the exact same doubts about this. But > as I was learning more about it, it does make sense, because this matches a > common pattern in key management systems, which is relevant because these > keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it > is plausible that those people knew what they were doing, and 2) it would be > preferable to maintain alignment and not create something that looks the > same but is different in some small but important details. I find it very hard to belief that details of the catalog representation like this will matter to users. How would would it conceivably affect users that we store (key, encryption method) in pg_attribute vs storing an oid that's effectively a foreign key reference to (key, encryption method)? > > > With the proposed removal of usertypmod, it's only two fields: the link to > > > the key and the user-facing type. > > > > That feels far less clean. I think loosing the ability to set the precision > > of > > a numeric, or the SRID for postgis datums won't be received very well? > > In my mind, and I probably wasn't explicit about this, I'm thinking about > what can be done now versus later. > > The feature is arguably useful without typmod support, e.g., for text. We > could ship it like that, then do some work to reorganize pg_attribute and > tuple descriptors to relieve some pressure on each byte, and then add the > typmod support back in in a future release. I think that is a workable > compromise. I doubt that shipping a version of column encryption that breaks our type system is a good idea. Greetings, Andres Freund
Re: Images storing techniques
On Thu, Mar 30, 2023 at 11:30:37AM +0200, Gaetano Mendola wrote: > I would suggest to store your images in a file system and the store paths to > those images. > > You can keep files and entries in your database synced via triggers/stored > procedures (eventually written in python since pgsql doesn't allow you to > interact with the file system). You might want to read this blog entry: https://momjian.us/main/blogs/pgblog/2017.html#November_6_2017 -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: running logical replication as the subscription owner
On Thu, 30 Mar 2023 at 15:42, Robert Haas wrote: > On Thu, Mar 30, 2023 at 1:19 AM Jeff Davis wrote: > > I say just take the special case out of 0001. If the trigger doesn't > > work as a SECURITY_RESTRICTED_OPERATION, and is also ENABLED ALWAYS, > > then the user can just use the new option in 0002 to get the old > > behavior. I don't see a reason to implicitly give them the old > > behavior, as 0001 does. > > Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET > ROLE to A. With the patch as written, actions on B's tables are not > confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on C's > tables are. I think that's fair. There's no need to set SECURITY_RESTRICTED_OPERATION if it doesn't protect you anyway, and indeed it might break things. To be clear I do think it's important to still switch to the table owner, simply for consistency. But that's done in the patch so that's fine. > I agree that the naming is somewhat problematic, but I don't like > trust_table_owners. It's not clear enough about what actually happens. > I want the name to describe behavior, not sentiment. For security related things, I think sentiment is often just as important as the actual behaviour. It's not without reason that newer javascript frameworks have things like dangerouslySetInnerHTML, to scare people away from using it unless they know what they are doing. > I don't think run_as_owner is terrible, despite the ambiguity. It's > talking about the owner of the object on which the property is being > set. Isn't that the most natural interpretation? I'd be pretty > surprised if I set a property called run_as_owner on object A and it > ran as the owner of some other object B. I think that's fair and I'd be happy with run_as_owner. If someone doesn't understand which owner, they should probably check the documentation anyways to understand the implications. Regarding the actual patch. I think the code looks good. Mainly the tests and docs are lacking for the new option. Like I said for the tests you can borrow the tests I updated for my v2 patch, I think those should work fine for the new option. On Thu, 30 Mar 2023 at 15:42, Robert Haas wrote: > > On Thu, Mar 30, 2023 at 1:19 AM Jeff Davis wrote: > > I say just take the special case out of 0001. If the trigger doesn't > > work as a SECURITY_RESTRICTED_OPERATION, and is also ENABLED ALWAYS, > > then the user can just use the new option in 0002 to get the old > > behavior. I don't see a reason to implicitly give them the old > > behavior, as 0001 does. > > Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET > ROLE to A. With the patch as written, actions on B's tables are not > confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on C's > tables are. > > I think we want to do everything possible to avoid people feeling like > they need to turn on this new option. I'm not sure we'll ever be able > to get rid of it, but we certainly should avoid doing things that make > it more likely that it will be needed. > > > > 0002 adds a run_as_owner option to subscriptions. This doesn't make > > > the updated regression tests fail, because they don't use it. If you > > > revert the changes to 027_nosuperuser.pl then you get failures (as > > > one > > > would hope) and if you then add WITH (run_as_owner = true) to the > > > CREATE SUBSCRIPTION command in 027_nosuperuser.pl then it passes > > > again. 
I need to spend some more time thinking about what the tests > > > actually ought to look like if we go this route -- I haven't looked > > > through what Jelte proposed yet -- and also the documentation would > > > need a bunch of updating. > > > > "run_as_owner" is ambiguous -- subscription owner or table owner? > > > > I would prefer something like "trust_table_owners". That would > > communicate that the user shouldn't choose it unless they know what > > they're doing. > > I agree that the naming is somewhat problematic, but I don't like > trust_table_owners. It's not clear enough about what actually happens. > I want the name to describe behavior, not sentiment. > > run_as_subscription_owner removes the ambiguity, but is long. > > run_as_table_owner is a bit shorter, and we could do that with the > sense reversed. Is that equally clear? I'm not sure. > > I can think of other alternatives, like user_switching or > switch_to_table_owner or no_user_switching or various other things, > but none of them seem very good to me. > > Another idea could be to make the option non-Boolean. This is > comically long and I can't seriously recommend it, but just to > illustrate the point, if you type CREATE SUBSCRIPTION ... WITH > (execute_code_as_owner_of_which_object = subscription) then you > certainly should know what you've signed up for! If there were a > shorter version that were still clear, I could go for that, but I'm > having trouble coming up with exact wording. > > I don't think run_as_owner is terrible, despite the ambiguity.
Re: meson/msys2 fails with plperl/Strawberry
On 2023-03-27 Mo 13:18, Andres Freund wrote: Hi, On 2023-03-26 21:13:41 -0400, Andrew Dunstan wrote: On Mar 26, 2023, at 5:28 PM, Andres Freund wrote: On 2023-03-26 12:39:08 -0700, Andres Freund wrote: First: I am *not* arguing we shouldn't repair building against strawberry perl with mingw. Hm - can you describe the failure more - I just tried, and it worked to build against strawberry perl on mingw, without any issues. All I did was set -DPERL="c:/strawberrly/perl/bin/perl.exe". That might be the secret sauce I’m missing. I will be offline for a day or three, will test when I’m back. It should suffice to put strawberry perl first in PATH. All that the -DPERL does is to use that, instead of 'perl' from PATH. If putting strawberry perl ahead in PATH failed, something else must have been going on... Yeah, What it actually needed was a system upgrade. Sorry for the noise. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: pgindent vs. git whitespace check
On Wed, Mar 29, 2023 at 08:26:23PM +0200, Daniel Gustafsson wrote: > > On 29 Mar 2023, at 19:18, Bruce Momjian wrote: > > We would have to convert all supported branches, and tell all forks to > > do the same (hopefully at the same time). The new standard would then > > be for all single-line comments to use // instead of /* ... */. > > That still leaves every patch which is in flight on -hackers, and conflicts in > local development trees etc. It's doable (apart from forks, but that cannot > be > our core concern), but I personally can't see the price paid justify the > result. Yes, this would have to be done at the start of a new release cycle. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Transparent column encryption
On 30.03.23 03:29, Andres Freund wrote: One might think that, but the precedent in other equivalent systems is that you reference the key and the algorithm separately. There is some (admittedly not very conclusive) discussion about this near [0]. [0]: https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 I'm very much not convinced by that. Either way, there at least there should be a comment mentioning that we intentionally try to allow that. Even if this feature is something we want (why?), ISTM that this should not be implemented by having multiple fields in pg_attribute, but instead by a table referenced by by pg_attribute.attcek. I don't know if it is clear to everyone here, but the key data model and the surrounding DDL are exact copies of the equivalent MS SQL Server feature. When I was first studying it, I had the exact same doubts about this. But as I was learning more about it, it does make sense, because this matches a common pattern in key management systems, which is relevant because these keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it is plausible that those people knew what they were doing, and 2) it would be preferable to maintain alignment and not create something that looks the same but is different in some small but important details. With the proposed removal of usertypmod, it's only two fields: the link to the key and the user-facing type. That feels far less clean. I think loosing the ability to set the precision of a numeric, or the SRID for postgis datums won't be received very well? In my mind, and I probably wasn't explicit about this, I'm thinking about what can be done now versus later. The feature is arguably useful without typmod support, e.g., for text. We could ship it like that, then do some work to reorganize pg_attribute and tuple descriptors to relieve some pressure on each byte, and then add the typmod support back in in a future release. I think that is a workable compromise.
Re: running logical replication as the subscription owner
On Thu, Mar 30, 2023 at 1:19 AM Jeff Davis wrote: > I say just take the special case out of 0001. If the trigger doesn't > work as a SECURITY_RESTRICTED_OPERATION, and is also ENABLED ALWAYS, > then the user can just use the new option in 0002 to get the old > behavior. I don't see a reason to implicitly give them the old > behavior, as 0001 does. Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET ROLE to A. With the patch as written, actions on B's tables are not confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on C's tables are. I think we want to do everything possible to avoid people feeling like they need to turn on this new option. I'm not sure we'll ever be able to get rid of it, but we certainly should avoid doing things that make it more likely that it will be needed. > > 0002 adds a run_as_owner option to subscriptions. This doesn't make > > the updated regression tests fail, because they don't use it. If you > > revert the changes to 027_nosuperuser.pl then you get failures (as > > one > > would hope) and if you then add WITH (run_as_owner = true) to the > > CREATE SUBSCRIPTION command in 027_nosuperuser.pl then it passes > > again. I need to spend some more time thinking about what the tests > > actually ought to look like if we go this route -- I haven't looked > > through what Jelte proposed yet -- and also the documentation would > > need a bunch of updating. > > "run_as_owner" is ambiguous -- subscription owner or table owner? > > I would prefer something like "trust_table_owners". That would > communicate that the user shouldn't choose it unless they know what > they're doing. I agree that the naming is somewhat problematic, but I don't like trust_table_owners. It's not clear enough about what actually happens. I want the name to describe behavior, not sentiment. run_as_subscription_owner removes the ambiguity, but is long. run_as_table_owner is a bit shorter, and we could do that with the sense reversed. Is that equally clear? I'm not sure. I can think of other alternatives, like user_switching or switch_to_table_owner or no_user_switching or various other things, but none of them seem very good to me. Another idea could be to make the option non-Boolean. This is comically long and I can't seriously recommend it, but just to illustrate the point, if you type CREATE SUBSCRIPTION ... WITH (execute_code_as_owner_of_which_object = subscription) then you certainly should know what you've signed up for! If there were a shorter version that were still clear, I could go for that, but I'm having trouble coming up with exact wording. I don't think run_as_owner is terrible, despite the ambiguity. It's talking about the owner of the object on which the property is being set. Isn't that the most natural interpretation? I'd be pretty surprised if I set a property called run_as_owner on object A and it ran as the owner of some other object B. I think our notion of how ambiguous this is may be somewhat inflated by the fact that we've just had a huge conversation about whether it should be the table owner or the subscription owner, so those possibilities are etched in our mind in a way that maybe they won't be for people coming at this fresh. But it's hard to be sure what other people will think about something, and I don't want to be too optimistic about the name I picked, either. > If you are worried that *I* think 0002 would be a poor call, then no, I > don't. 
Initially I didn't like the idea of supporting two behaviors > (and who would?), but we probably can't avoid it at least for right > now. OK, good. Then we have a way forward that nobody's too upset about. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Images storing techniques
On 29.03.23 23:29, Riccardo Gobbo wrote: Question: for better performance is it better to store images as BYTEA or convert every image in base64 and store the generated string (so in html it's enough to insert the base64 string in the tag)? Converting an image to base64 would use about 30% more memory than storing the image's bytes directly, but I don't know if working with characters rather than bytes could have more pros than cons.

Storing as bytea is better.
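A minimal sketch of the bytea approach (the table and column names are just examples); any base64 encoding then happens client-side at presentation time, if it is needed at all:

CREATE TABLE images (
    id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    filename text NOT NULL,
    data     bytea NOT NULL
);

-- In practice the bytes are bound as a parameter from the client;
-- a tiny hex literal is used here purely for illustration.
INSERT INTO images (filename, data) VALUES ('tiny.gif', '\x47494638'::bytea);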
Re: Schema variables - new implementation for Postgres 15
On 30.03.23 10:49, Pavel Stehule wrote: If I reorganize the patch to the following structure, can be it useful for you? 1. really basic functionality (no temporary variables, no def expressions, no memory cleaning) SELECT variable LET should be supported + doc, + related tests. 2. support for temporary variables (session, transaction scope), memory cleaning at the end of transaction 3. PL/pgSQL support 4. pg_dump 5. shadowing warning 6. ... others ... That seems like an ok approach. The pg_dump support should probably go into the first patch, so it's self-contained.
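For context, the "really basic functionality" of patch 1 boils down to something like the following. This uses the syntax proposed by the patch series under review (not committed, and the details may still change), with a made-up variable name:

-- Sketch of the proposed session-variable syntax from this thread.
CREATE VARIABLE public.current_tenant AS integer;

LET public.current_tenant = 42;   -- assign a value for this session

SELECT public.current_tenant;     -- read it back, e.g. from queries or views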
Re: Initial Schema Sync for Logical Replication
On Thu, Mar 30, 2023 at 12:18 AM Masahiko Sawada wrote: > > On Wed, Mar 29, 2023 at 7:57 PM Kumar, Sachin wrote: > > > > > > > > From: Amit Kapila > > > > > > > I think we won't be able to use same snapshot because the > > > > > > > transaction will be committed. > > > > > > > In CreateSubscription() we can use the transaction snapshot from > > > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am > > > > > > > not sure about this part maybe walrcv_disconnect() calls the > > > > > > > commits > > > internally ?). > > > > > > > So somehow we need to keep this snapshot alive, even after > > > > > > > transaction is committed(or delay committing the transaction , > > > > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so we > > > > > > > can have a restart before tableSync is able to use the same > > > > > > > snapshot.) > > > > > > > > > > > > > > > > > > > Can we think of getting the table data as well along with schema > > > > > > via pg_dump? Won't then both schema and initial data will > > > > > > correspond to the same snapshot? > > > > > > > > > > Right , that will work, Thanks! > > > > > > > > While it works, we cannot get the initial data in parallel, no? > > > > > > > > I was thinking each TableSync process will call pg_dump --table, This way > > if we have N > > tableSync process, we can have N pg_dump --table=table_name called in > > parallel. > > In fact we can use --schema-only to get schema and then let COPY take care > > of data > > syncing . We will use same snapshot for pg_dump as well as COPY table. > > How can we postpone creating the pg_subscription_rel entries until the > tablesync worker starts and does the schema sync? I think that since > pg_subscription_rel entry needs the table OID, we need either to do > the schema sync before creating the entry (i.e, during CREATE > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The > apply worker needs the information of tables to sync in order to > launch the tablesync workers, but it needs to create the table schema > to get that information. For the above reason, I think that step 6 of the initial proposal won't work. If we can have the tablesync worker create an entry of pg_subscription_rel after creating the table, it may give us the flexibility to perform the initial sync. One idea is that we add a relname field to pg_subscription_rel so that we can create entries with relname instead of OID if the table is not created yet. Once the table is created, we clear the relname field and set the OID of the table instead. It's not an ideal solution but we might make it simpler later. Assuming that it's feasible, I'm considering another approach for the initial sync in order to address the concurrent DDLs. The basic idea is to somewhat follow how pg_dump/restore to dump/restore the database data. We divide the synchronization phase (including both schema and data) up into three phases: pre-data, table-data, post-data. These mostly follow the --section option of pg_dump. 1. The backend process performing CREATE SUBSCRIPTION creates the subscription but doesn't create pg_subscription_rel entries yet. 2. Before starting the streaming, the apply worker fetches the table list from the publisher, create pg_subscription_rel entries for them, and dumps+restores database objects that tables could depend on but don't depend on tables such as TYPE, OPERATOR, ACCESS METHOD etc (i.e. pre-data). 3. The apply worker launches the tablesync workers for tables that need to be synchronized. 
There might be DDLs executed on the publisher for tables before the tablesync worker starts. But the apply worker needs to apply DDLs for pre-data database objects. OTOH, it can ignore DDLs for not-synced-yet tables and other database objects such as INDEX, TRIGGER, RULE, etc (i.e. post-data). 4. The tablesync worker creates its replication slot, dumps+restores the table schema, update the pg_subscription_rel, and perform COPY. These operations should be done in the same transaction. 5. After finishing COPY, the tablesync worker dumps indexes (and perhaps constraints) of the table and creates them (which possibly takes a long time). Then it starts to catch up, same as today. The apply worker needs to wait for the tablesync worker to catch up. We need to repeat these steps until we complete the initial data copy and create indexes for all tables, IOW until all pg_subscription_rel status becomes READY. 6. If the apply worker confirms all tables are READY, it starts another sync worker who is responsible for the post-data database objects such as TRIGGER, RULE, POLICY etc (i.e. post-data). While the sync worker is starting up or working, the apply worker applies changes for pre-data database objects as well as READY tables. 7. Similar to the tablesync worker, this sync worker creates its replication slot and sets the returned LSN somewhere, say pg_subscription. 8. The sync worker dumps and restores these
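As a side note for readers, the per-table state the steps above keep referring to is what pg_subscription_rel already tracks today; the proposal would extend when and how its entries are created. A quick way to inspect it:

-- 'r' (READY) means the initial sync for that table has finished;
-- other states mean the copy or catch-up phase is still in progress.
SELECT srsubid, srrelid::regclass AS relation, srsubstate
FROM pg_subscription_rel;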
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On Fri, Mar 24, 2023 at 5:47 PM Jacob Champion wrote: > Okay, but this is walking back from the network example you just > described upthread. Do you still consider that in scope, or...? Sorry, I don't know which example you mean. > > If machines B and C aren't under our control such that we can > > configure them that way, then the configuration is fundamentally > > insecure in a way that we can't really fix. > > Here's probably our biggest point of contention. You're unlikely to > convince me that this is the DBA's fault. > > If machines B and C aren't under our control, then our *protocol* is > fundamentally insecure in a way that we have the ability to fix, in a > way that's already been characterized in security literature. I guess I wouldn't have a problem blaming the DBA here, but you seem to be telling me that the security literature has settled on another kind of approach, and I'm not in a position to dispute that. It still feels weird to me, though. -- Robert Haas EDB: http://www.enterprisedb.com
Re: JsonPath version bits
Hi hackers! Could the 1 byte from the JsonPath header be used to store version? Or how many bits from the header could be used for the version value? On Mon, Mar 27, 2023 at 12:54 PM Nikita Malakhov wrote: > Hi hackers! > > I've got a question on the JsonPath header - currently the header size > is 4 bytes, where there are version and mode bits. Is there somewhere > a defined size of the version part? There are some extensions working > with JsonPath, and we have some too, thus it is important how many > bits is it possible to use to store a version value? > > -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
Re: Request for comment on setting binary format output per session
On Wed, Mar 29, 2023 at 11:04 AM Jeff Davis wrote: > I'm not clear on what proposal you are making and/or endorsing? > ha -- was just backing up dave's GUC idea. > 1. Fix our own clients, like psql, to check for binary data they can't > process. > This ought to be impossible IMO. All libpq routines except PQexec have an explicit expectation on format (via the resultformat parameter) that should not be overridden. PQexec ought to be explicitly documented and wired to only request text format data. resultformat can be extended now or later to allow participating clients to receive the GUC-configured format. I do not think that libpq's result format being able to be overridden by GUC is a good idea at all; the library has to participate, and I think it can be made to do so without adjusting the interface (say, by resultFormat = 3). Similarly, in the JDBC world, it ought to be up to the driver to determine when it wants the server to flex wire formats, but it must be able to override the server's decision. merlin
Re: Support logical replication of DDLs
On Thu, 30 Mar 2023 at 13:29, houzj.f...@fujitsu.com wrote: > > > > > -Original Message- > > From: houzj.f...@fujitsu.com > > Sent: Thursday, March 30, 2023 2:37 PM > > > > On Tuesday, March 28, 2023 12:13 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Monday, March 27, 2023 8:08 PM Amit Kapila > > > > > wrote: > > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane wrote: > > > > > > > > > > > > > > > > > I suggest taking a couple of steps back from the minutiae of the > > > > > > patch, and spending some hard effort thinking about how the thing > > > > > > would be controlled in a useful fashion (that is, a real design > > > > > > for the filtering that was mentioned at the very outset), and > > > > > > about the security issues, and about how we could get to a > > committable > > > patch. > > > > > > > > > > > > > > > > Agreed. I'll try to summarize the discussion we have till now on > > > > > this and share my thoughts on the same in a separate email. > > > > > > > > > > > > > The idea to control what could be replicated is to introduce a new > > > > publication option 'ddl' along with current options 'publish' and > > > > 'publish_via_partition_root'. The values of this new option could be > > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of > > > > all supported DDL commands. Example usage for this would be: > > > > Example: > > > > Create a new publication with all ddl replication enabled: > > > > CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all'); > > > > > > > > Enable table ddl replication for an existing Publication: > > > > ALTER PUBLICATION pub2 SET (ddl = 'table'); > > > > > > > > This is what seems to have been discussed but I think we can even > > > > extend it to support based on operations/commands, say one would like > > > > to publish only 'create' and 'drop' of tables. Then we can extend the > > > > existing publish option to have values like 'create', 'alter', and > > > > 'drop'. > > > > > > > > Another thing we are considering related to this is at what level > > > > these additional options should be specified. We have three variants > > > > FOR TABLE, FOR ALL TABLES, and FOR TABLES IN SCHEMA that enables > > > > replication. Now, for the sake of simplicity, this new option is > > > > discussed to be provided only with FOR ALL TABLES variant but I think > > > > we can provide it with other variants with some additional > > > > restrictions like with FOR TABLE, we can only specify 'alter' and > > > > 'drop' for publish option. Now, though possible, it brings additional > > > > complexity to support it with variants other than FOR ALL TABLES > > > > because then we need to ensure additional filtering and possible > > > > modification of the content we have to send to downstream. So, we can > > even > > > decide to first support it only FOR ALL TABLES variant. > > > > > > > > The other point to consider for publish option 'ddl = table' is > > > > whether we need to allow replicating dependent objects like say some > > > > user-defined type is used in the table. I guess the difficulty here > > > > would be to identify which dependents we want to allow. > > > > > > > > I think in the first version we should allow to replicate only some of > > > > the objects instead of everything. For example, can we consider only > > > > allowing tables and indexes in the first version? Then extend it in a > > > > phased > > > manner? 
> > > > > > I think supporting table related stuff in the first version makes sense > > > and the > > > patch size could be reduced to a suitable size. > > > > Based on the discussion, I split the patch into four parts: Table DDL > > replication(0001,0002), Index DDL replication(0003), ownership stuff for > > table > > and index(0004), other DDL's replication(0005). > > > > In this version, I mainly tried to split the patch set, and there are few > > OPEN items we need to address later: > > > > 1) The current publication "ddl" option only have two values: table, all. We > >also need to add index and maybe other objects in the list. > > > > 2) Need to improve the syntax stuff. Currently, we store the option value of > >the "with (ddl = xx)" via different columns in the catalog, the > >catalog(pg_publication) will have more and more columns as we add > > support > >for logical replication of other objects in the future. We could store > > it as > >an text array instead. > > > >OTOH, since we have proposed some other more flexible syntax to -hackers, > > the current > >syntax might be changed which might also solve this problem. > > > > 3) The test_ddl_deparse_regress test module is not included in the set, > > because > >I think we also need to split it into table stuff, index stuff and > > others, > >we can share it after finishing that. > > > > 4) The patch set could be spitted further to make it easier
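Putting the pieces of the proposal quoted above together, the user-facing surface being discussed would look roughly like this. None of it is committed syntax, the option names and values are still being debated, and the last line is purely an illustration of the operation-level filtering idea floated in the thread:

-- Replicate all supported DDL for an all-tables publication.
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (ddl = 'all');

-- Restrict an existing publication to table DDL only.
ALTER PUBLICATION pub2 SET (ddl = 'table');

-- Possible further refinement floated above (hypothetical, not proposed verbatim):
-- CREATE PUBLICATION pub3 FOR ALL TABLES WITH (ddl = 'table', publish = 'create, drop');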
Re: "variable not found in subplan target list"
On 2023-Mar-29, Amit Langote wrote: > On Wed, Mar 29, 2023 at 3:39 AM Tom Lane wrote: > > Alvaro Herrera writes: > > > So I'm back home and found a couple more weird errors in the log: > > > > > ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 > > > DETALLE: plan node relids (b 1), pruneinfo relids (b 36) > > > > This one reproduces for me. > > I've looked into this one and the attached patch fixes it for me. > Turns out set_plan_refs()'s idea of when the entries from > PlannerInfo.partPruneInfos are transferred into > PlannerGlobal.partPruneInfo was wrong. Thanks for the patch. I've pushed it to github for CI testing, and if there are no problems I'll put it in. > Though, I wonder if we need to keep ec386948948 that introduced the > notion of part_prune_index around if the project that needed it [1] > has moved on to an entirely different approach altogether, one that > doesn't require hacking up the pruning code. Hmm, that's indeed tempting. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Support logical replication of DDLs
On Wed, Mar 29, 2023 at 10:23 PM Zheng Li wrote: > > On Wed, Mar 29, 2023 at 5:13 AM Amit Kapila wrote: > > > > On Wed, Mar 29, 2023 at 2:49 AM Zheng Li wrote: > > > > > > > > > I agree that a full fledged DDL deparser and DDL replication is too > > > big of a task for one patch. I think we may consider approaching this > > > feature in the following ways: > > > 1. Phased development and testing as discussed in other emails. > > > Probably support table commands first (as they are the most common > > > DDLs), then the other commands in multiple phases. > > > 2. Provide a subscription option to receive the DDL change, raise a > > > notice and to skip applying the change. The users can listen to the > > > DDL notice and implement application logic to apply the change if > > > needed. The idea is we can start gathering user feedback by providing > > > a somewhat useful feature (compared to doing nothing about DDLs), but > > > also avoid heading straight into the potential footgun situation > > > caused by automatically applying any mal-formatted DDLs. > > > > > > > Doesn't this mean that we still need to support deparsing of such DDLs > > which is what I think we don't want? > > I think we can send the plain DDL command string and the search_path > if we don't insist on applying it in the first version. Maybe the > deparser can be integrated later when we're confident that it's > ready/subset of it is ready. > I think this will have overhead for users that won't need it and we have to anyway remove it later when deparsing of such commands is added. Personally, I don't think we need to do this to catch the apply errors. -- With Regards, Amit Kapila.
Re: Support logical replication of DDLs
On Thu, Mar 30, 2023 at 3:16 PM vignesh C wrote: > > On Thu, 30 Mar 2023 at 13:29, houzj.f...@fujitsu.com > wrote: > > > > > > > > > -Original Message- > > > From: houzj.f...@fujitsu.com > > > Sent: Thursday, March 30, 2023 2:37 PM > > > > > > On Tuesday, March 28, 2023 12:13 PM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > On Monday, March 27, 2023 8:08 PM Amit Kapila > > > > > > > wrote: > > > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane wrote: > > > > > > > > > > > > > > > > > > > > I suggest taking a couple of steps back from the minutiae of the > > > > > > > patch, and spending some hard effort thinking about how the thing > > > > > > > would be controlled in a useful fashion (that is, a real design > > > > > > > for the filtering that was mentioned at the very outset), and > > > > > > > about the security issues, and about how we could get to a > > > committable > > > > patch. > > > > > > > > > > > > > > > > > > > Agreed. I'll try to summarize the discussion we have till now on > > > > > > this and share my thoughts on the same in a separate email. > > > > > > > > > > > > > > > > The idea to control what could be replicated is to introduce a new > > > > > publication option 'ddl' along with current options 'publish' and > > > > > 'publish_via_partition_root'. The values of this new option could be > > > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of > > > > > all supported DDL commands. Example usage for this would be: > > > > > Example: > > > > > Create a new publication with all ddl replication enabled: > > > > > CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all'); > > > > > > > > > > Enable table ddl replication for an existing Publication: > > > > > ALTER PUBLICATION pub2 SET (ddl = 'table'); > > > > > > > > > > This is what seems to have been discussed but I think we can even > > > > > extend it to support based on operations/commands, say one would like > > > > > to publish only 'create' and 'drop' of tables. Then we can extend the > > > > > existing publish option to have values like 'create', 'alter', and > > > > > 'drop'. > > > > > > > > > > Another thing we are considering related to this is at what level > > > > > these additional options should be specified. We have three variants > > > > > FOR TABLE, FOR ALL TABLES, and FOR TABLES IN SCHEMA that enables > > > > > replication. Now, for the sake of simplicity, this new option is > > > > > discussed to be provided only with FOR ALL TABLES variant but I think > > > > > we can provide it with other variants with some additional > > > > > restrictions like with FOR TABLE, we can only specify 'alter' and > > > > > 'drop' for publish option. Now, though possible, it brings additional > > > > > complexity to support it with variants other than FOR ALL TABLES > > > > > because then we need to ensure additional filtering and possible > > > > > modification of the content we have to send to downstream. So, we can > > > even > > > > decide to first support it only FOR ALL TABLES variant. > > > > > > > > > > The other point to consider for publish option 'ddl = table' is > > > > > whether we need to allow replicating dependent objects like say some > > > > > user-defined type is used in the table. I guess the difficulty here > > > > > would be to identify which dependents we want to allow. 
> > > > > > > > > > I think in the first version we should allow to replicate only some of > > > > > the objects instead of everything. For example, can we consider only > > > > > allowing tables and indexes in the first version? Then extend it in a > > > > > phased > > > > manner? > > > > > > > > I think supporting table related stuff in the first version makes sense > > > > and the > > > > patch size could be reduced to a suitable size. > > > > > > Based on the discussion, I split the patch into four parts: Table DDL > > > replication(0001,0002), Index DDL replication(0003), ownership stuff for > > > table > > > and index(0004), other DDL's replication(0005). > > > > > > In this version, I mainly tried to split the patch set, and there are few > > > OPEN items we need to address later: > > > > > > 1) The current publication "ddl" option only have two values: table, all. > > > We > > >also need to add index and maybe other objects in the list. > > > > > > 2) Need to improve the syntax stuff. Currently, we store the option value > > > of > > >the "with (ddl = xx)" via different columns in the catalog, the > > >catalog(pg_publication) will have more and more columns as we add > > > support > > >for logical replication of other objects in the future. We could store > > > it as > > >an text array instead. > > > > > >OTOH, since we have proposed some other more flexible syntax to > > > -hackers, > > > the current > > >syntax might be changed which might also solve this problem. > > > > > >
Re: PGdoc: add ID attribute to create_publication.sgml
On Thu, Mar 30, 2023 at 9:22 AM Peter Smith wrote: > > I have marked the CF entry for this patch as "ready for committer" > LGTM. I'll push this tomorrow unless there are more comments for it. -- With Regards, Amit Kapila.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 30 Mar 2023 at 10:07, Denis Laxalde wrote: > Patch 5 is missing respective changes; please find attached a fixup > patch for these. Thanks, attached are newly rebased patches that include this change. I also cast the result of PQcancelSend to void in the one case where it's ignored on purpose. Note that the patchset shrunk by one, since the original patch 0002 has been committed now. v19-0002-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v19-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data v19-0003-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v19-0004-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
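For readers skimming the thread, a minimal sketch of how a caller would use the proposed API, based on the v19 patches attached above (this API is not committed yet, so names and semantics may still change):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Sketch only: PGcancelConn, PQcancelConn(), PQcancelSend(),
     * PQcancelErrorMessage() and PQcancelFinish() are introduced by the
     * patches in this thread and are not part of released libpq. */
    static void
    cancel_current_query(PGconn *conn)
    {
        PGcancelConn *cancelConn = PQcancelConn(conn);

        if (!PQcancelSend(cancelConn))      /* returns 1 on success, 0 on failure */
            fprintf(stderr, "could not cancel: %s",
                    PQcancelErrorMessage(cancelConn));

        PQcancelFinish(cancelConn);         /* releases the PGcancelConn */
    }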
Re: [EXTERNAL] Support load balancing in libpq
> On 30 Mar 2023, at 10:21, Daniel Gustafsson wrote: > >> On 30 Mar 2023, at 10:00, Julien Rouhaud wrote: >> >> On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson wrote: >>> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) wrote: >>> While checking the buildfarm, I found a failure on NetBSD caused by the added code[1]: >>> >>> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc >>> 4.7.2) >>> has the same error. I'll look into it today to get a fix committed. >> >> This is an i686 machine, so it probably has the same void * >> difference. Building with -m32 might be enough to reproduce the >> problem. > > Makes sense. I think the best option is to simply remove conn from being part > of the seed and rely on the other values. Will apply that after a testrun. After some offlist discussion I ended up pushing the proposed uintptr_t fix instead, now waiting for these animals to build. -- Daniel Gustafsson
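The committed change itself is not quoted in this thread, but the shape of the uintptr_t fix being discussed is a cast through uintptr_t so that a 32-bit pointer widens to the 64-bit seed without the "cast from pointer to integer of different size" warning. A small illustration, with the helper name invented for this sketch:

    #include <stdint.h>

    /* Hypothetical helper: mix a connection pointer into a 64-bit PRNG seed.
     * Casting to uintptr_t first is what keeps 32-bit builds (e.g. the i686
     * and NetBSD animals mentioned above) warning-free. */
    static uint64_t
    mix_pointer_into_seed(uint64_t seed, const void *conn)
    {
        return seed ^ (uint64_t) (uintptr_t) conn;
    }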
Re: Support logical replication of DDLs
On Thu, 30 Mar 2023 at 13:29, houzj.f...@fujitsu.com wrote: > > > > > -Original Message- > > From: houzj.f...@fujitsu.com > > Sent: Thursday, March 30, 2023 2:37 PM > > > > On Tuesday, March 28, 2023 12:13 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Monday, March 27, 2023 8:08 PM Amit Kapila > > > > > wrote: > > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane wrote: > > > > > > > > > > > > > > > > > I suggest taking a couple of steps back from the minutiae of the > > > > > > patch, and spending some hard effort thinking about how the thing > > > > > > would be controlled in a useful fashion (that is, a real design > > > > > > for the filtering that was mentioned at the very outset), and > > > > > > about the security issues, and about how we could get to a > > committable > > > patch. > > > > > > > > > > > > > > > > Agreed. I'll try to summarize the discussion we have till now on > > > > > this and share my thoughts on the same in a separate email. > > > > > > > > > > > > > The idea to control what could be replicated is to introduce a new > > > > publication option 'ddl' along with current options 'publish' and > > > > 'publish_via_partition_root'. The values of this new option could be > > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of > > > > all supported DDL commands. Example usage for this would be: > > > > Example: > > > > Create a new publication with all ddl replication enabled: > > > > CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all'); > > > > > > > > Enable table ddl replication for an existing Publication: > > > > ALTER PUBLICATION pub2 SET (ddl = 'table'); > > > > > > > > This is what seems to have been discussed but I think we can even > > > > extend it to support based on operations/commands, say one would like > > > > to publish only 'create' and 'drop' of tables. Then we can extend the > > > > existing publish option to have values like 'create', 'alter', and > > > > 'drop'. > > > > > > > > Another thing we are considering related to this is at what level > > > > these additional options should be specified. We have three variants > > > > FOR TABLE, FOR ALL TABLES, and FOR TABLES IN SCHEMA that enables > > > > replication. Now, for the sake of simplicity, this new option is > > > > discussed to be provided only with FOR ALL TABLES variant but I think > > > > we can provide it with other variants with some additional > > > > restrictions like with FOR TABLE, we can only specify 'alter' and > > > > 'drop' for publish option. Now, though possible, it brings additional > > > > complexity to support it with variants other than FOR ALL TABLES > > > > because then we need to ensure additional filtering and possible > > > > modification of the content we have to send to downstream. So, we can > > even > > > decide to first support it only FOR ALL TABLES variant. > > > > > > > > The other point to consider for publish option 'ddl = table' is > > > > whether we need to allow replicating dependent objects like say some > > > > user-defined type is used in the table. I guess the difficulty here > > > > would be to identify which dependents we want to allow. > > > > > > > > I think in the first version we should allow to replicate only some of > > > > the objects instead of everything. For example, can we consider only > > > > allowing tables and indexes in the first version? Then extend it in a > > > > phased > > > manner? 
> > > > > > I think supporting table related stuff in the first version makes sense > > > and the > > > patch size could be reduced to a suitable size. > > > > Based on the discussion, I split the patch into four parts: Table DDL > > replication(0001,0002), Index DDL replication(0003), ownership stuff for > > table > > and index(0004), other DDL's replication(0005). > > > > In this version, I mainly tried to split the patch set, and there are few > > OPEN items we need to address later: > > > > 1) The current publication "ddl" option only have two values: table, all. We > >also need to add index and maybe other objects in the list. > > > > 2) Need to improve the syntax stuff. Currently, we store the option value of > >the "with (ddl = xx)" via different columns in the catalog, the > >catalog(pg_publication) will have more and more columns as we add > > support > >for logical replication of other objects in the future. We could store > > it as > >an text array instead. > > > >OTOH, since we have proposed some other more flexible syntax to -hackers, > > the current > >syntax might be changed which might also solve this problem. > > > > 3) The test_ddl_deparse_regress test module is not included in the set, > > because > >I think we also need to split it into table stuff, index stuff and > > others, > >we can share it after finishing that. > > > > 4) The patch set could be spitted further to make it easier
Re: Images storing techniques
I would suggest storing your images in the file system and then storing only the paths to those images in the database. You can keep the files and the database entries in sync via triggers/stored procedures (possibly written in Python, since PL/pgSQL doesn't let you interact with the file system). On Thu, Mar 30, 2023, 11:22 Riccardo Gobbo < riccardo.gobb...@studenti.unipd.it> wrote: > Good evening, > I'm a Master degree student at University of Padua in Italy and I'm > developing a web application as assignment for the Web application course. > > Context: the Web application that my group is developing would ideally be > used to manage county side fairs where there would be foods and drinks, > these displayed into a digital menu. > The application uses postgre to implement a database where stores data, > mostly strings as emails and orders but also some images (representing the > dishes). > The web pages are created using java servlets and jbc > > Question: for better performance is it better to store images as BYTEA or > convert every image in base64 and store the generated string (so in html > it's enough to insert the base64 string in the tag)? > Converting an image in base64 would use a 30% more memory than storing > directly the image's bytes, but I don't know if working with characters > rather than bytes could have more prons than cons > > Thank for the time you dedicated for the answer and I apologize both for > disturbing you and my English. > > Best regards, Riccardo. > > Computer Engineering > Mat. 2082156 >
Images storing techniques
Good evening, I'm a Master's degree student at the University of Padua in Italy and I'm developing a web application as an assignment for the Web Applications course. Context: the web application that my group is developing would ideally be used to manage countryside fairs offering food and drinks, displayed in a digital menu. The application uses PostgreSQL to implement a database where it stores data, mostly strings such as emails and orders, but also some images (representing the dishes). The web pages are created using Java servlets and JDBC. Question: for better performance, is it better to store images as BYTEA, or to convert every image to base64 and store the generated string (so in HTML it's enough to insert the base64 string in the <img> tag)? Converting an image to base64 uses about 30% more memory than storing the image's bytes directly, but I don't know whether working with characters rather than bytes has more pros than cons. Thanks for the time you dedicated to the answer, and I apologize both for disturbing you and for my English. Best regards, Riccardo. Computer Engineering Mat. 2082156
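For the BYTEA route, the key point is to pass the raw bytes as a binary parameter rather than a base64 text string. A minimal libpq sketch follows (C is used here since that is the list's usual language; with JDBC the equivalent is PreparedStatement.setBytes(), and the table and column names below are invented):

    #include <libpq-fe.h>

    /* Sketch: insert raw image bytes into a bytea column, no base64 step. */
    static int
    store_image(PGconn *conn, const char *imgbytes, int imglen)
    {
        const char *values[1] = { imgbytes };
        int         lengths[1] = { imglen };
        int         formats[1] = { 1 };     /* 1 = binary parameter */
        PGresult   *res;
        int         ok;

        res = PQexecParams(conn,
                           "INSERT INTO dish (photo) VALUES ($1)",
                           1, NULL, values, lengths, formats, 0);
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        PQclear(res);
        return ok;
    }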
Re: Schema variables - new implementation for Postgres 15
Hi st 29. 3. 2023 v 12:17 odesílatel Peter Eisentraut < peter.eisentr...@enterprisedb.com> napsal: > On 24.03.23 08:04, Pavel Stehule wrote: > > Maybe I can divide the patch 0002-session-variables to three sections - > > related to memory management, planning and execution? > > Personally, I find the existing split not helpful. There is no value > (to me) in putting code, documentation, and tests in three separate > patches. This is in fact counter-helpful (to me). Things like the > DISCARD command (0005) and the error messages changes (0009) can be > separate patches, but most of the rest should probably be a single patch. > > I know you have been asked earlier in the thread to provide smaller > patches, so don't change it just for me, but this is my opinion. > If I reorganize the patch into the following structure, would it be useful for you? 1. really basic functionality (no temporary variables, no default expressions, no memory cleaning); SELECT of a variable and LET should be supported, plus docs and related tests. 2. support for temporary variables (session and transaction scope), memory cleaning at the end of a transaction 3. PL/pgSQL support 4. pg_dump 5. shadowing warning 6. ... others ... Would that be better for you? Regards Pavel
Re: [EXTERNAL] Support load balancing in libpq
> On 30 Mar 2023, at 10:00, Julien Rouhaud wrote: > > On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson wrote: >> >>> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) >>> wrote: >> >>> While checking the buildfarm, I found a failure on NetBSD caused by the >>> added code[1]: >> >> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc >> 4.7.2) >> has the same error. I'll look into it today to get a fix committed. > > This is an i686 machine, so it probably has the same void * > difference. Building with -m32 might be enough to reproduce the > problem. Makes sense. I think the best option is to simply remove conn from being part of the seed and rely on the other values. Will apply that after a testrun. -- Daniel Gustafsson
RE: doc: add missing "id" attributes to extension packaging page
Dear Brar, Thank you for updating the patch. The patch looks good to me. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema a écrit : > On Wed, 29 Mar 2023 at 10:43, Denis Laxalde wrote: > > More importantly, not having PQcancelSend() creating the PGcancelConn > > makes reuse of that value, passing through PQcancelReset(), more > > intuitive. E.g., in the tests: > > You convinced me. Attached is an updated patch where PQcancelSend > takes the PGcancelConn and returns 1 or 0. Patch 5 is missing respective changes; please find attached a fixup patch for these. >From c9e59fb3e30db1bfab75be9fdd4afbc227a5270e Mon Sep 17 00:00:00 2001 From: Denis Laxalde Date: Thu, 30 Mar 2023 09:19:18 +0200 Subject: [PATCH] fixup! Start using new libpq cancel APIs --- contrib/dblink/dblink.c | 4 ++-- src/fe_utils/connect_utils.c | 4 +++- src/test/isolation/isolationtester.c | 4 ++-- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index e139f66e11..073795f088 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1332,11 +1332,11 @@ dblink_cancel_query(PG_FUNCTION_ARGS) dblink_init(); conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0))); - cancelConn = PQcancelSend(conn); + cancelConn = PQcancelConn(conn); PG_TRY(); { - if (PQcancelStatus(cancelConn) == CONNECTION_BAD) + if (!PQcancelSend(cancelConn)) { msg = pchomp(PQcancelErrorMessage(cancelConn)); } diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c index b32448c010..1cfd717217 100644 --- a/src/fe_utils/connect_utils.c +++ b/src/fe_utils/connect_utils.c @@ -161,7 +161,9 @@ disconnectDatabase(PGconn *conn) if (PQtransactionStatus(conn) == PQTRANS_ACTIVE) { - PQcancelFinish(PQcancelSend(conn)); + PGcancelConn *cancelConn = PQcancelConn(conn); + PQcancelSend(cancelConn); + PQcancelFinish(cancelConn); } PQfinish(conn); diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c index 3781f7982b..de31a87571 100644 --- a/src/test/isolation/isolationtester.c +++ b/src/test/isolation/isolationtester.c @@ -946,9 +946,9 @@ try_complete_step(TestSpec *testspec, PermutationStep *pstep, int flags) */ if (td > max_step_wait && !canceled) { -PGcancelConn *cancel_conn = PQcancelSend(conn); +PGcancelConn *cancel_conn = PQcancelConn(conn); -if (PQcancelStatus(cancel_conn) == CONNECTION_OK) +if (PQcancelSend(cancel_conn)) { /* * print to stdout not stderr, as this should appear in -- 2.30.2
Re: Schema variables - new implementation for Postgres 15
Hi ne 26. 3. 2023 v 19:44 odesílatel Dmitry Dolgov <9erthali...@gmail.com> napsal: > > On Fri, Mar 24, 2023 at 08:04:08AM +0100, Pavel Stehule wrote: > > čt 23. 3. 2023 v 19:54 odesílatel Pavel Stehule > > > napsal: > > > > > čt 23. 3. 2023 v 16:33 odesílatel Peter Eisentraut < > > > peter.eisentr...@enterprisedb.com> napsal: > > > > > >> The other issue is that by its nature this patch adds a lot of code > in a > > >> lot of places. Large patches are more likely to be successful if they > > >> add a lot of code in one place or smaller amounts of code in a lot of > > >> places. But this patch does both and it's just overwhelming. There > is > > >> so much new internal functionality and terminology. Variables can be > > >> created, registered, initialized, stored, copied, prepared, set, > freed, > > >> removed, released, synced, dropped, and more. I don't know if anyone > > >> has actually reviewed all that in detail. > > >> > > >> Has any effort been made to make this simpler, smaller, reduce scope, > > >> refactoring, find commonalities with other features, try to manage the > > >> complexity somehow? > > >> > > > I agree that this patch is large, but almost all code is simple. > Complex > > > code is "only" in 0002-session-variables.patch (113KB/438KB). > > > > > > Now, I have no idea how the functionality can be sensibly reduced or > > > divided (no without significant performance loss). I see two difficult > > > points in this code: > > > > > > 1. when to clean memory. The code implements cleaning very accurately - > > > and this is unique in Postgres. Partially I implement some > functionality of > > > storage manager. Probably no code from Postgres can be reused, because > > > there is not any support for global temporary objects. Cleaning based > on > > > sinval messages processing is difficult, but there is nothing else. > The > > > code is a little bit more complex, because there are three types of > session > > > variables: a) session variables, b) temp session variables, c) session > > > variables with transaction scope. Maybe @c can be removed, and maybe we > > > don't need to support not null default (this can simplify > initialization). > > > What do you think about it? > > > > > > 2. how to pass a variable's value to the executor. The implementation > is > > > based on extending the Param node, but it cannot reuse query params > buffers > > > and implements own. > > > But it is hard to simplify code, because we want to support usage > > > variables in queries, and usage in PL/pgSQL expressions too. And both > are > > > processed differently. > > > > > > > Maybe I can divide the patch 0002-session-variables to three sections - > > related to memory management, planning and execution? > > I agree, the patch scale is a bit overwhelming. It's worth noting that > due to the nature of this change certain heavy lifting has to be done in > any case, plus I've got an impression that some part of the patch are > quite solid (although I haven't reviewed everything, did anyone achieve > that milestone?). But still, it would be of great help to simplify the > current implementation, and I'm afraid the only way of doing this is to > make trades-off about functionality vs change size & complexity. > There is not too much space for reduction - more - sometimes there is code reuse between features. 
I can reduce temporary session variables, but the same AtSubXact routines are used by the memory-purging routines, so only by dropping all dependent features would you get an interesting number of removed lines. I can imagine a very reduced feature set like 1) no temporary variables, no reset at transaction end 2) no default expressions - the default is null 3) direct memory cleaning on drop (without the possibility of a saved value after a reverted drop), or always cleaning at session end Note - @1 and @3 share code. This reduced implementation can still be useful. It probably doesn't remove too much code, but it can remove non-trivial code. I believe almost all of the remaining code will be close to trivial. > > Maybe instead splitting the patch into implementation components, it's > possible to split it feature-by-feature, where every single patch would > represent an independent (to a certain degree) functionality? I have in > mind something like: catalog changes; base implementation; ACL support; > xact actions implementation (on commit drop, etc); variables with > default value; shadowing; etc. If such approach is possible, it will > give us: flexibility to apply only a subset of the whole patch series; > some understanding how much complexity is coming from each feature. What > do you think about this idea? > I think cleaning and dropping can be moved to a separate patch. ACL support uses generic infrastructure (it is only a few lines). Patch 02 can be split - I am not sure how these parts can be independent. I'll try to split this patch, and we will see if it is better. > I also recall
Re: [EXTERNAL] Support load balancing in libpq
On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson wrote: > > > On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) > > wrote: > > > While checking the buildfarm, I found a failure on NetBSD caused by the > > added code[1]: > > Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc > 4.7.2) > has the same error. I'll look into it today to get a fix committed. This is an i686 machine, so it probably has the same void * difference. Building with -m32 might be enough to reproduce the problem.
Re: [POC] Allow an extension to add data into Query and PlannedStmt nodes
Hi, On Wed, Mar 29, 2023 at 12:02:30PM +0500, Andrey Lepikhov wrote: > > Previously, we read int this mailing list some controversial opinions on > queryid generation and Jumbling technique. Here we don't intend to solve > these problems but help an extension at least don't conflict with others on > the queryId value. > > Extensions could need to pass some query-related data through all stages of > the query planning and execution. As a trivial example, pg_stat_statements > uses queryid at the end of execution to save some statistics. One more > reason - extensions now conflict on queryid value and the logic of its > changing. With this patch, it can be managed. I just had a quick look at the patch, and IIUC it doesn't really help on that side, as there's still a single official "queryid" that's computed, stored everywhere and later used by pg_stat_statements (which then additionally stores that official queryid). You can currently change the main jumbling algorithm with a custom extension, and all extensions will then use it as the source of truth, but I guess what you want is to e.g. have an additional and semantically different queryid, and create multiple ecosystems of extensions, each using one or the other source of queryid without changing the other ecosystem's behavior. > > This patch introduces the structure 'ExtensionData' which allows to manage > of a list of entries with a couple of interface functions > addExtensionDataToNode() and GetExtensionData(). Keep in mind the possible > future hiding of this structure from the public interface. > An extension should invent a symbolic key to identify its data. It may > invent as many additional keys as it wants but the best option here - is no > more than one entry for each extension. > Usage of this machinery is demonstrated by the pg_stat_statements example - > here we introduced Bigint node just for natively storing of queryId value. > > Ruthless pgbench benchmark shows that we got some overhead: > 1.6% - in default mode > 4% - in prepared mode > ~0.1% in extended mode. That's quite a significant overhead. But the only reason to accept such a change is to actually use it to store additional data, so it would be interesting to see what the overhead is like once you store at least 2 different values there. > - Are we need to invent a registration procedure to do away with the names > of entries and use some compact integer IDs? Note that the patch as proposed doesn't have any defense against two extensions trying to register something with the same name, or update a stored value, as AddExtensionDataToNode() simply prepends the new value to the list. You can actually update the value by just storing the new value, but it will add a significant overhead to every other extension that wants to read another value. The API is also quite limited as each stored value has a single identifier. What if your extension needs to store multiple things? Since it's all based on Node you can't really store some custom struct, so you probably have to end up with things like "my_extension.my_val1", "my_extension.my_val2" which isn't great.
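To make the key-naming concern concrete, usage of the proposed interface might look roughly like the following. This is a sketch only: the exact signatures are defined by the POC patch and may differ from what is shown here, and the key strings are invented.

    /* Hypothetical usage sketch of the proposed ExtensionData interface. */
    void
    my_ext_stash(Query *query, Node *my_value)
    {
        /* each extension invents its own symbolic key(s) */
        addExtensionDataToNode((Node *) query, "my_extension.queryid", my_value);
    }

    Node *
    my_ext_fetch(Query *query)
    {
        return GetExtensionData((Node *) query, "my_extension.queryid");
    }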
Re: Minimal logical decoding on standbys
Hi, On 2023-03-04 12:19:57 +0100, Drouvot, Bertrand wrote: > Subject: [PATCH v52 1/6] Add info in WAL records in preparation for logical > slot conflict handling. > > Overall design: > > 1. We want to enable logical decoding on standbys, but replay of WAL > from the primary might remove data that is needed by logical decoding, > causing error(s) on the standby. To prevent those errors, a new replication > conflict scenario needs to be addressed (as much as hot standby does). > > 2. Our chosen strategy for dealing with this type of replication slot > is to invalidate logical slots for which needed data has been removed. > > 3. To do this we need the latestRemovedXid for each change, just as we > do for physical replication conflicts, but we also need to know > whether any particular change was to data that logical replication > might access. That way, during WAL replay, we know when there is a risk of > conflict and, if so, if there is a conflict. > > 4. We can't rely on the standby's relcache entries for this purpose in > any way, because the startup process can't access catalog contents. > > 5. Therefore every WAL record that potentially removes data from the > index or heap must carry a flag indicating whether or not it is one > that might be accessed during logical decoding. > > Why do we need this for logical decoding on standby? > > First, let's forget about logical decoding on standby and recall that > on a primary database, any catalog rows that may be needed by a logical > decoding replication slot are not removed. > > This is done thanks to the catalog_xmin associated with the logical > replication slot. > > But, with logical decoding on standby, in the following cases: > > - hot_standby_feedback is off > - hot_standby_feedback is on but there is no a physical slot between > the primary and the standby. Then, hot_standby_feedback will work, > but only while the connection is alive (for example a node restart > would break it) > > Then, the primary may delete system catalog rows that could be needed > by the logical decoding on the standby (as it does not know about the > catalog_xmin on the standby). > > So, it’s mandatory to identify those rows and invalidate the slots > that may need them if any. Identifying those rows is the purpose of > this commit. This is a very nice commit message. > Implementation: > > When a WAL replay on standby indicates that a catalog table tuple is > to be deleted by an xid that is greater than a logical slot's > catalog_xmin, then that means the slot's catalog_xmin conflicts with > the xid, and we need to handle the conflict. While subsequent commits > will do the actual conflict handling, this commit adds a new field > isCatalogRel in such WAL records (and a new bit set in the > xl_heap_visible flags field), that is true for catalog tables, so as to > arrange for conflict handling. > > The affected WAL records are the ones that already contain the > snapshotConflictHorizon field, namely: > > - gistxlogDelete > - gistxlogPageReuse > - xl_hash_vacuum_one_page > - xl_heap_prune > - xl_heap_freeze_page > - xl_heap_visible > - xl_btree_reuse_page > - xl_btree_delete > - spgxlogVacuumRedirect > > Due to this new field being added, xl_hash_vacuum_one_page and > gistxlogDelete do now contain the offsets to be deleted as a > FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignement. > It's not needed on the others struct where isCatalogRel has > been added. 
> > Author: Andres Freund (in an older version), Amit Khandekar, Bertrand > Drouvot I think you're first author on this one by now. I think this commit is ready to go. Unless somebody thinks differently, I think I might push it tomorrow. > Subject: [PATCH v52 2/6] Handle logical slot conflicts on standby. > @@ -6807,7 +6808,8 @@ CreateCheckPoint(int flags) >*/ > XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); > KeepLogSeg(recptr, &_logSegNo); > - if (InvalidateObsoleteReplicationSlots(_logSegNo)) > + InvalidateObsoleteReplicationSlots(_logSegNo, , InvalidOid, > NULL); > + if (invalidated) > { > /* >* Some slots have been invalidated; recalculate the old-segment I don't really understand why you changed InvalidateObsoleteReplicationSlots to return void instead of bool, and then added an output boolean argument via a pointer? > @@ -7964,6 +7968,22 @@ xlog_redo(XLogReaderState *record) > /* Update our copy of the parameters in pg_control */ > memcpy(, XLogRecGetData(record), > sizeof(xl_parameter_change)); > > + /* > + * Invalidate logical slots if we are in hot standby and the > primary does not > + * have a WAL level sufficient for logical decoding. No need to > search > + * for potentially conflicting logically slots if standby is > running > + * with wal_level lower than logical, because in that case,
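For readers who have not looked at the patch: the new isCatalogRel flag is filled in on the primary at WAL-record construction time, using the same test that already tells logical decoding whether a relation's tuples might be needed. In current sources that test looks roughly like this (paraphrased from src/include/utils/rel.h):

    /* True when logical decoding could need this relation's tuples:
     * wal_level >= logical, the relation is WAL-logged, and it is either a
     * system catalog or a table marked with user_catalog_table. */
    #define RelationIsAccessibleInLogicalDecoding(relation) \
        (XLogLogicalInfoActive() && \
         RelationNeedsWAL(relation) && \
         (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))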
Re: [EXTERNAL] Support load balancing in libpq
> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) > wrote: > While checking the buildfarm, I found a failure on NetBSD caused by the added > code[1]: Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 4.7.2) has the same error. I'll look into it today to get a fix committed. -- Daniel Gustafsson
Re: ICU locale validation / canonicalization
On 30.03.23 04:33, Jeff Davis wrote: Attached is a new version of the final patch, which performs canonicalization. I'm not 100% sure that it's wanted, but it still seems like a good idea to get the locales into a standard format in the catalogs, and if a lot more people start using ICU in v16 (because it's the default), then it would be a good time to do it. But perhaps there are risks? I say, let's do it. I don't think we should show the notice when the canonicalization doesn't change anything. This is not useful: +NOTICE: using language tag "und-u-kf-upper" for locale "und-u-kf-upper" Also, the message should be phrased more from the perspective of the user instead of using ICU jargon, like NOTICE: using canonicalized form "%s" for locale specification "%s" (Still too many big words?) I don't think the special handling of IsBinaryUpgrade is needed or wanted. I would hope that with this feature, all old-style locale IDs would go away, but this way we would keep them forever. If we believe that canonicalization is safe, then I don't see why we cannot apply it during binary upgrade. Needs documentation updates in doc/src/sgml/charset.sgml.
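Putting the two suggestions above together, the notice logic could look something like this (a sketch only; the variable names are invented):

    /* Only emit the notice when canonicalization actually changed the
     * string, and phrase it for users rather than in ICU jargon. */
    if (strcmp(langtag, locale_str) != 0)
        ereport(NOTICE,
                (errmsg("using canonicalized form \"%s\" for locale specification \"%s\"",
                        langtag, locale_str)));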
Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
On Wed, 29 Mar 2023 21:32:05 -0700 Noah Misch wrote: > On Sun, Jan 22, 2023 at 02:42:46PM -0600, Karl O. Pinc wrote: > > v10-0001-List-trusted-and-obsolete-extensions.patch > > > + > > + These modules and extensions are obsolete: > > + > > + > > + > > + > > + > > + > > Commit a013738 incorporated this change. Since xml2 is the only > in-tree way to use XSLT from SQL, I think xml2 is not obsolete. Some > individual functions, e.g. xml_valid(), are obsolete. (There are > years-old threats to render the module obsolete, but this has never > happened.) Your point seems valid but this is above my station. I have no idea as to how to best resolve this, or even how to make the resolution happen now that the change has been committed. Someone who knows more than me about the situation is needed to change the phrasing, or re-categorize, or rework the xml2 module docs, or come up with new categories of obsolescence-like states, or provide access to libxslt from PG, or something. I am invested in the patch and appreciate being cc-ed. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
pg_basebackup: Correct type of WalSegSz
The pg_basebackup code has WalSegSz as uint32, whereas the rest of the code has it as int. This seems confusing, and using the extra range wouldn't actually work. This was in the original commit (fc49e24fa6), but I suppose it was an oversight.From 43c6b03b0d2cc044989de7722b8fd7b136fded7b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 30 Mar 2023 08:19:04 +0200 Subject: [PATCH] pg_basebackup: Correct type of WalSegSz The pg_basebackup code had WalSegSz as uint32, whereas the rest of the code has it as int. This seems confusing, and using the extra range wouldn't actually work. --- src/bin/pg_basebackup/streamutil.c | 2 +- src/bin/pg_basebackup/streamutil.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c index e7618f4617..15514599c4 100644 --- a/src/bin/pg_basebackup/streamutil.c +++ b/src/bin/pg_basebackup/streamutil.c @@ -31,7 +31,7 @@ #define ERRCODE_DUPLICATE_OBJECT "42710" -uint32 WalSegSz; +intWalSegSz; static bool RetrieveDataDirCreatePerm(PGconn *conn); diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h index 04ceb30971..268c163213 100644 --- a/src/bin/pg_basebackup/streamutil.h +++ b/src/bin/pg_basebackup/streamutil.h @@ -24,7 +24,7 @@ extern char *dbuser; extern char *dbport; extern char *dbname; extern int dbgetpassword; -extern uint32 WalSegSz; +extern int WalSegSz; /* Connection kept global so we can disconnect easily */ extern PGconn *conn; -- 2.40.0
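For context on why a plain int is wide enough: the server only accepts power-of-two segment sizes between 1 MB and 1 GB, so the extra range of uint32 can never actually be used. The existing validity check is roughly the following (paraphrased from src/include/access/xlog_internal.h):

    /* A WAL segment size must be a power of two between 1 MB and 1 GB, so
     * the value always fits comfortably in a signed int. */
    #define WalSegMinSize (1024 * 1024)
    #define WalSegMaxSize (1024 * 1024 * 1024)
    #define IsValidWalSegSize(size) \
        (((size) & ((size) - 1)) == 0 && \
         (size) >= WalSegMinSize && (size) <= WalSegMaxSize)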