RE: Partial aggregates pushdown
Hi Mr. Momjian,

> First, am I correct?

Yes, you are correct. This patch uses new special aggregate functions for partial aggregation (we call these partialaggfuncs).

> Second, how far away is this from being committable and/or what work needs to be
> done to get it committable, either for PG 16 or 17?

I believe there are three items. It is not clear whether items 1 and 2 are necessary; item 3 is clearly necessary. I would like to hear the development community's opinion on whether 1 and 2 need to be addressed.

1. Making partialaggfunc a user-defined function
In v17, I made the partialaggfuncs built-in functions. Because of this approach, v17 changes the specification of the BKI file and pg_aggregate. For now, partialaggfuncs are needed only by postgres_fdw, which is just an extension of PostgreSQL. In the future, any revision of the BKI file or pg_aggregate specifications made while modifying existing PostgreSQL functionality would have to be aligned with this patch's changes. I am concerned that this may be undesirable, so I am thinking that v17 should be modified to make partialaggfunc a user-defined function.

2. Automating the creation of partialaggfunc definitions
In developing v17, I manually created the partialaggfunc definitions for avg, min, max, sum, and count. I am concerned that this may be undesirable, so I am thinking that v17 should be modified to automate creating partialaggfunc definitions for all built-in aggregate functions.

3. Documentation
I need to add an explanation of partialaggfunc to the postgres_fdw documentation and other places.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center
Mitsubishi Electric Corporation
regression coverage gaps for gist and hash indexes
Hi,

I was working on committing patch 0001 from [1] and was a bit confused about some of the changes to the WAL format for gist and hash index vacuum. It looked to me like the changed code just flat out would not work. Turns out the problem is that we don't reach the deletion code for hash and gist vacuum in the tests.

gist:
> Oh, I see. We apparently don't reach the gist deletion code in the tests:
> https://coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html#674
> https://coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html#174
>
> And indeed, if I add an abort() into , it's not reached.
>
> And it's not because tests use a temp table, the caller is also unreachable:
> https://coverage.postgresql.org/src/backend/access/gist/gist.c.gcov.html#1643

hash:
> And there also are no tests:
> https://coverage.postgresql.org/src/backend/access/hash/hashinsert.c.gcov.html#372

I've since looked at other index AMs.

spgist's XLOG_SPGIST_VACUUM_ROOT is emitted, but not replayed:
https://coverage.postgresql.org/src/backend/access/spgist/spgvacuum.c.gcov.html#474
https://coverage.postgresql.org/src/backend/access/spgist/spgxlog.c.gcov.html#962

gin's XLOG_GIN_VACUUM_DATA_LEAF_PAGE is not emitted, but only because of a RelationNeedsWAL() check:
https://coverage.postgresql.org/src/backend/access/gin/gindatapage.c.gcov.html#852

I also looked at heapam: XLOG_HEAP2_LOCK_UPDATED is emitted, but not replayed:
https://coverage.postgresql.org/src/backend/access/heap/heapam.c.gcov.html#5487
https://coverage.postgresql.org/src/backend/access/heap/heapam.c.gcov.html#9965

same for XLOG_HEAP2_REWRITE:
https://coverage.postgresql.org/src/backend/access/heap/rewriteheap.c.gcov.html#928
https://coverage.postgresql.org/src/backend/access/heap/heapam.c.gcov.html#9975

and XLOG_HEAP_TRUNCATE (ugh, that record is quite the layering violation):
https://coverage.postgresql.org/src/backend/commands/tablecmds.c.gcov.html#2128
https://coverage.postgresql.org/src/backend/access/heap/heapam.c.gcov.html#9918

The fact that those cases aren't replayed isn't too surprising - XLOG_HEAP2_LOCK_UPDATED is exercised by isolationtester, XLOG_HEAP2_REWRITE and XLOG_HEAP_TRUNCATE by contrib/test_decoding. Neither is part of 027_stream_regress.pl.

The lack of any coverage of hash and gist deletion/vacuum seems quite concerning to me.

Greetings,

Andres Freund

[1] https://postgr.es/m/da7184cf-c7b9-c333-801e-0e7507a23...@gmail.com
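For context, a minimal sketch (not from this thread, and not the test that was ultimately written) of the kind of SQL a regression or recovery test might use to try to reach the unexercised gist deletion path: bulk-load a gist index on a logged table, delete most of the rows, and VACUUM so any resulting page deletions are WAL-logged. Whether page deletion is actually reached depends on enough leaf pages becoming completely empty, so this is illustrative only.

CREATE TABLE gist_del_test (p point);
CREATE INDEX gist_del_test_idx ON gist_del_test USING gist (p);
INSERT INTO gist_del_test SELECT point(i, i) FROM generate_series(1, 100000) i;
DELETE FROM gist_del_test WHERE p[0] > 100;   -- empty out most leaf pages
VACUUM gist_del_test;                         -- gist vacuum may delete empty pages, emitting WAL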
Re: Minimal logical decoding on standbys
Hi,

On 2023-03-30 18:23:41 +0200, Drouvot, Bertrand wrote:
> On 3/30/23 9:04 AM, Andres Freund wrote:
> > I think this commit is ready to go. Unless somebody thinks differently, I
> > think I might push it tomorrow.
>
> Great! Once done, I'll submit a new patch so that GlobalVisTestFor() can make
> use of the heap relation in vacuumRedirectAndPlaceholder() (which will be possible
> once 0001 is committed).

Unfortunately I did find an issue doing a pre-commit review of the patch.

The patch adds VISIBILITYMAP_IS_CATALOG_REL to xl_heap_visible.flags - but it does not remove the bit before calling visibilitymap_set(). This ends up corrupting the visibility map, because we'll set a bit for another page.

It's unfortunate that visibilitymap_set() doesn't assert that just the correct bits are passed in. It does assert that at least one valid bit is set, but that's not enough, as this case shows.

I noticed this when looking into the changes to visibilitymapdefs.h in more detail. I don't like how it ends up in the patch:

> --- a/src/include/access/visibilitymapdefs.h
> +++ b/src/include/access/visibilitymapdefs.h
> @@ -17,9 +17,11 @@
>  #define BITS_PER_HEAPBLOCK 2
>
>  /* Flags for bit map */
> -#define VISIBILITYMAP_ALL_VISIBLE	0x01
> -#define VISIBILITYMAP_ALL_FROZEN	0x02
> -#define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
> -										 * flags bits */
> +#define VISIBILITYMAP_ALL_VISIBLE		0x01
> +#define VISIBILITYMAP_ALL_FROZEN		0x02
> +#define VISIBILITYMAP_VALID_BITS		0x03	/* OR of all valid visibilitymap
> +												 * flags bits */
> +#define VISIBILITYMAP_IS_CATALOG_REL	0x04	/* to handle recovery conflict during logical
> +												 * decoding on standby */

On a casual read, one very well might think that VISIBILITYMAP_IS_CATALOG_REL is a valid bit that could be set in the VM. I am thinking of instead creating a separate namespace for the "xlog only" bits:

/*
 * To detect recovery conflicts during logical decoding on a standby, we need
 * to know if a table is a user catalog table. For that we add an additional
 * bit into xl_heap_visible.flags, in addition to the above.
 *
 * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
 */
#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)

That allows heap_xlog_visible() to do:

	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
	vmbits = (xlrec->flags & VISIBILITYMAP_VALID_BITS);

and pass vmbits instead of xlrec->flags to visibilitymap_set().

I'm also thinking of splitting the patch into two. One patch to pass down the heap relation into the new places, and another for the rest. As evidenced above, looking at the actual behaviour changes is important...
Given how the patch changes the struct for XLOG_GIST_DELETE:

> diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
> index 2ce9366277..93fb9d438a 100644
> --- a/src/include/access/gistxlog.h
> +++ b/src/include/access/gistxlog.h
> @@ -51,11 +51,14 @@ typedef struct gistxlogDelete
>  {
>  	TransactionId snapshotConflictHorizon;
>  	uint16		ntodelete;		/* number of deleted offsets */
> +	bool		isCatalogRel;	/* to handle recovery conflict during logical
> +								 * decoding on standby */
>
> -	/* TODELETE OFFSET NUMBER ARRAY FOLLOWS */
> +	/* TODELETE OFFSET NUMBERS */
> +	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
>  } gistxlogDelete;
>
> -#define SizeOfGistxlogDelete	(offsetof(gistxlogDelete, ntodelete) + sizeof(uint16))
> +#define SizeOfGistxlogDelete	offsetof(gistxlogDelete, offsets)

and XLOG_HASH_VACUUM_ONE_PAGE:

> diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
> index 9894ab9afe..6c5535fe73 100644
> --- a/src/include/access/hash_xlog.h
> +++ b/src/include/access/hash_xlog.h
> @@ -252,12 +252,14 @@ typedef struct xl_hash_vacuum_one_page
>  {
>  	TransactionId snapshotConflictHorizon;
>  	uint16		ntuples;
> +	bool		isCatalogRel;	/* to handle recovery conflict during logical
> +								 * decoding on standby */
>
> -	/* TARGET OFFSET
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Mon, 20 Mar 2023 at 11:50, Melanie Plageman wrote:
> Attached is an updated v6.

I had a look over the v6-0001 patch. There are a few things I think could be done better:

"Some operations will access a large number of pages at a time", does this really need "at a time"? I think it's more relevant that the operation uses a large number of pages.

Missing around Buffer Access Strategy.

Various things could be linked to other sections of the glossary, e.g. pages could link to glossary-data-page, shared buffers could link to glossary-shared-memory and WAL could link to glossary-wal.

The final paragraph should have tags around the various commands that you list.

I have adjusted those and slightly reworded a few other things. See the attached .diff which can be applied atop of v6-0001.

David

v6-0001-adjustments.diff
Description: Binary data
Re: Thoughts on using Text::Template for our autogenerated code?
On Fri, Mar 31, 2023 at 4:15 AM Corey Huinker wrote: > For those who already responded, are your concerns limited to the dependency issue? Would you have concerns about a templating library that was developed by us and included in-tree? Libraries (and abstractions in general) require some mental effort to interface with them (that also means debugging when the output fails to match expectations), not to mention maintenance cost. There has to be a compensating benefit in return. The cost-to-benefit ratio here seems unfavorable -- seems like inventing a machine that ties shoelaces, but we usually wear sandals. -- John Naylor EDB: http://www.enterprisedb.com
Re: Data is copied twice when specifying both child and parent table in publication
On Fri, Mar 31, 2023 at 5:15 AM Jacob Champion wrote:
>
> On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila wrote:
> > Pushed.
>
> While rebasing my logical-roots patch over the top of this, I ran into
> another situation where mixed viaroot settings can duplicate data. The
> key idea is to subscribe to two publications with mixed settings, as
> before, and add a partition root that's already been replicated with
> viaroot=false to the other publication with viaroot=true.
>
> pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
> pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
> pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH (publish_via_partition_root);
> -- populate with data, then switch to subscription side
> sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all, pub_other;
> -- switch back to publication
> pub=# ALTER PUBLICATION pub_other ADD TABLE part;
> -- and back to subscription
> sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
> -- data is now duplicated
>
> (Standalone reproduction attached.)
>
> This is similar to what happens if you alter the publish_via_partition_root
> setting for an existing publication, but I'd argue it's easier to hit by
> accident. Is this part of the same class of bugs, or is it different (or
> even expected) behavior?

Hi Jacob.

I tried your example, and I can see that after the REFRESH the tablesync for the added table 'part' is launched and performs the copy, causing the duplicate data.

sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
ALTER SUBSCRIPTION
sub=# 2023-03-31 13:09:30.348 AEDT [334] LOG: logical replication table synchronization worker for subscription "sub", table "part" has started
...

The duplicate data happens because REFRESH PUBLICATION has the default refresh option of copy_data=true.

Although the result is at first a bit unexpected, I am not sure if anything can be done to make it do what you probably hoped it would do. For example, just imagine the logic could be made smarter, recognizing that since 'part_def' was already being subscribed it should NOT use the default copy_data=true when the REFRESH launches the ancestor table 'part'... Even if that logic were implemented, I have a feeling you could *still* run into problems if the 'part' table were made of multiple partitions. I think you might get into a situation where you DO want some partition data copied (because you did not have it yet, but now that you are subscribing to the root you want it), while at the same time you DON'T want to get duplicated data from other partitions (because you already knew about those ones -- like your example does).

So, I am not sure what the answer is, or maybe there isn't one. At least, we need to check there are sufficient "BE CAREFUL" warnings in the documentation for scenarios like this.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
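For reference, a hedged illustration of one way to avoid the duplicate copy in this particular scenario (standard syntax, not a fix for the underlying behaviour): refresh without the default data copy, at the cost of also skipping any data you genuinely do still need.

sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION WITH (copy_data = false);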
Re: Add shared buffer hits to pg_stat_io
Hi, On 2023-03-09 08:23:46 -0500, Melanie Plageman wrote: > Good idea. v3 attached. I committed this, after some small regression test changes. I was worried that the query for testing buffer hits might silently change in the future, so I added an EXPLAIN for the query. Also removed the need for the explicit RESETs by using BEGIN; SET LOCAL ...; query; COMMIT;. Thanks for the patch Melanie and the review Bertrand. I'm excited about finally being able to compute meaningful cache hit ratios :) Regards, Andres
RE: Non-superuser subscription owners
On Friday, March 31, 2023 12:05 AM Robert Haas wrote:

Hi,

> On Tue, Mar 28, 2023 at 1:52 PM Jeff Davis wrote:
> > On Fri, 2023-03-24 at 00:17 -0700, Jeff Davis wrote:
> > > The other patch you posted seems like it makes a lot of progress in
> > > that direction, and I think that should go in first. That was one of
> > > the items I suggested previously[2], so thank you for working on
> > > that.
> >
> > The above is not a hard objection.
>
> The other patch is starting to go in a direction that is going to have some
> conflicts with this one, so I went ahead and committed this one to avoid
> rebasing pain.

I noticed the BF[1] report a core dump after this commit.

#0  0xfd581864 in _lwp_kill () from /usr/lib/libc.so.12
#0  0xfd581864 in _lwp_kill () from /usr/lib/libc.so.12
#1  0xfd5817dc in raise () from /usr/lib/libc.so.12
#2  0xfd581c88 in abort () from /usr/lib/libc.so.12
#3  0x01e6c8d4 in ExceptionalCondition (conditionName=conditionName@entry=0x2007758 "IsTransactionState()", fileName=fileName@entry=0x20565c4 "catcache.c", lineNumber=lineNumber@entry=1208) at assert.c:66
#4  0x01e4e404 in SearchCatCacheInternal (cache=0xfd21e500, nkeys=nkeys@entry=1, v1=v1@entry=28985, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1208
#5  0x01e4eea0 in SearchCatCache1 (cache=, v1=v1@entry=28985) at catcache.c:1162
#6  0x01e66e34 in SearchSysCache1 (cacheId=cacheId@entry=11, key1=key1@entry=28985) at syscache.c:825
#7  0x01e98c40 in superuser_arg (roleid=28985) at superuser.c:70
#8  0x01c657bc in ApplyWorkerMain (main_arg=) at worker.c:4552
#9  0x01c1ceac in StartBackgroundWorker () at bgworker.c:861
#10 0x01c23be0 in do_start_bgworker (rw=) at postmaster.c:5762
#11 maybe_start_bgworkers () at postmaster.c:5986
#12 0x01c2459c in process_pm_pmsignal () at postmaster.c:5149
#13 ServerLoop () at postmaster.c:1770
#14 0x01c26cdc in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0xe0e4) at postmaster.c:1463
#15 0x01ee2c8c in main (argc=4, argv=0xe0e4) at main.c:200

It looks like the superuser check is done outside of a transaction. I haven't checked why it only failed on one BF animal, but it seems we can put the check inside the transaction like the following:

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 6fd674b5d6..08f10fc331 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -4545,12 +4545,13 @@ ApplyWorkerMain(Datum main_arg)
 		replorigin_session_setup(originid, 0);
 		replorigin_session_origin = originid;
 		origin_startpos = replorigin_session_get_progress(false);
-		CommitTransactionCommand();

 		/* Is the use of a password mandatory? */
 		must_use_password = MySubscription->passwordrequired &&
 			!superuser_arg(MySubscription->owner);
+		CommitTransactionCommand();
+

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-03-30%2019%3A41%3A08

Best Regards,
Hou Zhijie
Re: Request for comment on setting binary format output per session
Dave Cramer On Thu, 30 Mar 2023 at 15:40, Jeff Davis wrote: > On Thu, 2023-03-30 at 07:06 -0500, Merlin Moncure wrote: > > This ought to be impossible IMO. All libpq routines except PQexec > > have an explicit expectation on format (via resultformat parameter) > > that should not be overridden. PQexec ought to be explicitly > > documented and wired to only request text format data. > > Right now it's clearly documented[1] which formats will be returned for > a given Bind message. That can be seen as the root of the problem with > psql -- we are breaking the protocol by returning binary when psql can > rightfully expect text. > > It is a minor break, because something needed to send the "SET > binary_formats='...'" command, but the protocol docs would need to be > updated for sure. > > > participating clients to receive GUC configured format. I do not > > think that libpq's result format being able to be overridden by GUC > > is a good idea at all, the library has to to participate, and I > > think can be made to so so without adjusting the interface (say, by > > resultFormat = 3). > > Interesting idea. I suppose you'd need to specify 3 for all result > columns? That is a protocol change, but wouldn't "break" older clients. > The newer clients would need to make sure that they're connecting to > v16+, so using the protocol version alone wouldn't be enough. Hmm. > I'm confused. How does using resultFormat=3 change anything ? Dave
Re: Partial aggregates pushdown
On Thu, Dec 15, 2022 at 10:23:05PM +0000, fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Freund.
>
> > cfbot complains about some compiler warnings when building with clang:
> > https://cirrus-ci.com/task/6606268580757504
> >
> > deparse.c:3459:22: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality]
> >         if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> >             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
> > deparse.c:3459:22: note: remove extraneous parentheses around the comparison to silence this warning
> >         if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> >             ~                                  ~
> > deparse.c:3459:22: note: use '=' to turn this equality comparison into an assignment
> >         if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> >                             ^~
> >                             =
>
> I fixed this error.

Considering we only have a week left before feature freeze, I wanted to review the patch from this commitfest item:

	https://commitfest.postgresql.org/42/4019/

The most recent patch is attached. This feature has been in development since 2021, and it is something that will allow new workloads for Postgres, specifically data warehouse sharding workloads.

We currently allow parallel aggregates when the table is on the same machine, and we allow partitionwise aggregates on FDWs only with GROUP BY keys matching partition keys. The first is possible since we can share data structures between background workers, and the second is possible because if the GROUP BY includes the partition key, we are really just appending aggregate rows, not combining aggregate computations.

What we can't do without this patch is to push aggregates that require partial aggregate computations (no partition key GROUP BY) to FDW partitions, because we don't have a clean way to pass such information from the remote FDW server to the finalize backend. I think that is what this patch does.

First, am I correct? Second, how far away is this from being committable and/or what work needs to be done to get it committable, either for PG 16 or 17?

--
  Bruce Momjian                            https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws. They make you human, rather than perfect,
  which you will never be.
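To make the target workload concrete, here is a hypothetical illustration (not taken from the patch or its tests) of the case described above: an aggregate whose GROUP BY does not include the partition key, computed over postgres_fdw partitions. The server and table names are made up, and the foreign servers are assumed to already exist.

CREATE TABLE orders (id int, region text, amount numeric) PARTITION BY HASH (id);
CREATE FOREIGN TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 0) SERVER shard0;
CREATE FOREIGN TABLE orders_p1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 1) SERVER shard1;

-- The GROUP BY key is not the partition key, so today the partial aggregation
-- over each foreign partition cannot be pushed to the remote servers; with the
-- patch, each shard could compute partial states that the local server combines
-- and finalizes.
SELECT region, avg(amount) FROM orders GROUP BY region;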
>From 71993e37093b3cec325de5989a03afb3073775aa Mon Sep 17 00:00:00 2001
From: Yuki Fujii
Date: Fri, 16 Dec 2022 09:33:30 +0900
Subject: [PATCH] Partial aggregates push down v17

---
 contrib/postgres_fdw/deparse.c                |  97 +-
 .../postgres_fdw/expected/postgres_fdw.out    | 296 +-
 contrib/postgres_fdw/option.c                 |  24 ++
 contrib/postgres_fdw/postgres_fdw.c           |  22 +-
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  85 -
 src/backend/catalog/pg_aggregate.c            |  32 ++
 src/backend/commands/aggregatecmds.c          |   8 +
 src/bin/pg_dump/pg_dump.c                     |  25 +-
 src/include/catalog/pg_aggregate.dat          |  62 +++-
 src/include/catalog/pg_aggregate.h            |  11 +
 src/include/catalog/pg_proc.dat               |  35 +++
 .../regress/expected/create_aggregate.out     |  24 ++
 src/test/regress/expected/oidjoins.out        |   1 +
 src/test/regress/sql/create_aggregate.sql     |  24 ++
 15 files changed, 715 insertions(+), 34 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 9524765650..8e5a45ceaa 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -202,7 +202,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno,
 							int *colno);
 static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel,
 										  int *relno, int *colno);
-
+static bool partial_agg_ok(Aggref* agg, PgFdwRelationInfo* fpinfo);
 
 /*
  * Examine each qual clause in input_conds, and classify them into two groups,
@@ -907,8 +907,9 @@ foreign_expr_walker(Node *node,
 				if (!IS_UPPER_REL(glob_cxt->foreignrel))
 					return false;
 
-				/* Only non-split aggregates are pushable. */
-				if (agg->aggsplit != AGGSPLIT_SIMPLE)
+				if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL))
+					return false;
+				if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg, fpinfo))
 					return false;
 
 				/* As usual, it must be shippable. */
@@ -3448,14 +3449,37 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context)
 	StringInfo	buf = context->buf;
 	bool		use_variadic;
 
-	/* Only basic, non-split aggregation accepted. */
-	Assert(node->aggsplit == AGGSPLIT_SIMPLE);
+
+	Assert((node->aggsplit == AGGSPLIT_SIMPLE) ||
+		(node->aggsplit == AGGSPLIT_INITIAL_SERIAL));
 
 	/* Check if need to print VARIADIC (cf. ruleutils.c) */
 	use_variadic = node->aggvariadic;
 
-	/* Find aggregate name from aggfnoid which is a pg_proc
Re: Thoughts on using Text::Template for our autogenerated code?
Andres Freund writes: > On 2023-03-30 17:15:20 -0400, Corey Huinker wrote: >> For those who already responded, are your concerns limited to the >> dependency issue? Would you have concerns about a templating library that >> was developed by us and included in-tree? I'm not suggesting suggesting we >> write one at this time, at least not until after we make a here-doc-ing >> pass first, but I want to understand the concerns before embarking on any >> refactoring. > The dependency is/was my main issue. But I'm also somewhat doubtful that what > we do warrants the use of a template library in the first place, but I could > be convinced by a patch showing it to be a significant improvement. Yeah, it's somewhat hard to believe that the cost/benefit ratio would be attractive. But maybe you could mock up some examples of what the input could look like, and get people on board (or not) before writing any code. regards, tom lane
Re: Support logical replication of DDLs
It seems that lately, the patch attachments are lacking version numbers. It causes unnecessary confusion. For example, I sometimes fetch patches from this thread locally to "diff" them with previous patches to get a rough overview of the changes -- that has now become more difficult. Can you please reinstate the name convention of having version numbers for all patch attachments? IMO *every* post that includes patches should unconditionally increment the patch version -- even if the new patches are just a rebase or some other trivial change. Version numbers make it clear what patches are the latest, you will be easily able to unambiguously refer to them by name in subsequent posts, and when copied to your local computer they won't clash with any older copied patches. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Add pg_walinspect function with block info columns
On Thu, Mar 30, 2023 at 2:41 PM Peter Geoghegan wrote:
> pg@regression:5432 [1402115]=# SELECT
>     count(*)
> FROM
>     pg_get_wal_block_info ('0/10E9D80', '/', true);
> ┌─[ RECORD 1 ]───────┐
> │ count │ 17,031,979 │
> └───────┴────────────┘
>
> Time: 15235.499 ms (00:15.235)
>
> This time is also typical of what I saw. The variance was fairly low,
> so I won't bother describing it.

If I rerun the same test case with pg_get_wal_records_info (same WAL records, same system) then I find that it takes about 16 and a half seconds. So my patch makes pg_get_wal_block_info a little bit faster than pg_get_wal_records_info for this test case, and likely many interesting cases (assuming that the user opts out of fetching block_data and block_fpi_data values when running pg_get_wal_block_info, per the patch).

This result closely matches what I was expecting. We're doing almost the same amount of work when each function is called, so naturally the runtime almost matches. Note that pg_get_wal_records_info does slightly *more* work here, since it alone must output rows for commit records. Unlike pg_get_wal_block_info, which (by design) never outputs rows for WAL records that lack block references.

--
Peter Geoghegan
Re: Add pg_walinspect function with block info columns
On Wed, Mar 29, 2023 at 8:28 PM Bharath Rupireddy wrote:
> I took a look at v9 and LGTM.

Pushed, thanks.

There is still an outstanding question around the overhead of outputting FPIs and even block data from pg_get_wal_block_info(). At one point Melanie suggested that we'd need to do something about that, and I tend to agree.

The attached patch provides an optional parameter that will make pg_get_wal_block_info return NULLs for both block_data and block_fpi_data, no matter whether or not there is something to show. Note that this only affects those two bytea columns; we'll still show everything else, including valid block_data_length and block_fpi_length values (so the metadata describing the on-disk size of block_data and block_fpi_data is unaffected).

To test this patch, I ran pgbench for about 5 minutes, using a fairly standard configuration with added indexes and with wal_log_hints enabled. I ended up with the following WAL records afterwards:

pg@regression:5432 [1402115]=# SELECT
    "resource_manager/record_type" t,
    pg_size_pretty(combined_size) s,
    fpi_size_percentage perc_fpi
FROM
    pg_get_wal_stats ('0/10E9D80', '/', FALSE)
WHERE
    combined_size > 0;
┌─[ RECORD 1 ]──────────────────┐
│ t        │ XLOG               │
│ s        │ 1557 MB            │
│ perc_fpi │ 22.029466865781302 │
├─[ RECORD 2 ]──────────────────┤
│ t        │ Transaction        │
│ s        │ 49 MB              │
│ perc_fpi │ 0                  │
├─[ RECORD 3 ]──────────────────┤
│ t        │ Storage            │
│ s        │ 13 kB              │
│ perc_fpi │ 0                  │
├─[ RECORD 4 ]──────────────────┤
│ t        │ CLOG               │
│ s        │ 1380 bytes         │
│ perc_fpi │ 0                  │
├─[ RECORD 5 ]──────────────────┤
│ t        │ Database           │
│ s        │ 118 bytes          │
│ perc_fpi │ 0                  │
├─[ RECORD 6 ]──────────────────┤
│ t        │ RelMap             │
│ s        │ 565 bytes          │
│ perc_fpi │ 0                  │
├─[ RECORD 7 ]──────────────────┤
│ t        │ Standby            │
│ s        │ 30 kB              │
│ perc_fpi │ 0                  │
├─[ RECORD 8 ]──────────────────┤
│ t        │ Heap2              │
│ s        │ 4235 MB            │
│ perc_fpi │ 0.6731388657682449 │
├─[ RECORD 9 ]──────────────────┤
│ t        │ Heap               │
│ s        │ 4482 MB            │
│ perc_fpi │ 54.46811493602934  │
├─[ RECORD 10 ]─────────────────┤
│ t        │ Btree              │
│ s        │ 1786 MB            │
│ perc_fpi │ 22.829279332421116 │
└──────────┴────────────────────┘

Time: 3618.693 ms (00:03.619)

So about 12GB of WAL -- certainly enough to be a challenge for pg_walinspect.

I then ran the following query several times over the same LSN range as before, with my patch applied, but with behavior equivalent to current git HEAD (this is with outputting block_data and block_fpi_data values still turned on):

pg@regression:5432 [1402115]=# SELECT
    count(*)
FROM
    pg_get_wal_block_info ('0/10E9D80', '/', false);
┌─[ RECORD 1 ]───────┐
│ count │ 17,031,979 │
└───────┴────────────┘

Time: 35171.463 ms (00:35.171)

The time shown here is typical of what I saw. And now the same query, but without any overhead for outputting block_data and block_fpi_data values:

pg@regression:5432 [1402115]=# SELECT
    count(*)
FROM
    pg_get_wal_block_info ('0/10E9D80', '/', true);
┌─[ RECORD 1 ]───────┐
│ count │ 17,031,979 │
└───────┴────────────┘

Time: 15235.499 ms (00:15.235)

This time is also typical of what I saw. The variance was fairly low, so I won't bother describing it.

I think that this is a compelling reason to apply the patch. It would be possible to get about 75% of the benefit shown here by just suppressing block_fpi_data output, without suppressing block_data, but I think that it makes sense to either suppress both or neither. Things like page split records can write a fairly large amount of WAL in a way that resembles an FPI, even though technically no FPI is involved.

If there are no objections, I'll move ahead with committing something along the lines of this patch in the next couple of days.

--
Peter Geoghegan

v1-0001-pg_get_wal_block_info-suppress-block-data-outputs.patch
Description: Binary data
Re: Thoughts on using Text::Template for our autogenerated code?
Hi, On 2023-03-30 17:15:20 -0400, Corey Huinker wrote: > For those who already responded, are your concerns limited to the > dependency issue? Would you have concerns about a templating library that > was developed by us and included in-tree? I'm not suggesting suggesting we > write one at this time, at least not until after we make a here-doc-ing > pass first, but I want to understand the concerns before embarking on any > refactoring. The dependency is/was my main issue. But I'm also somewhat doubtful that what we do warrants the use of a template library in the first place, but I could be convinced by a patch showing it to be a significant improvement. Greetings, Andres Freund
Re: Refactor calculations to use instr_time
Hi, I pushed this version! Thanks all, for the contribution and reviews. > > I would either put WalUsage into PgStat_PendingWalStats (so that it has > > all the same members as PgStat_WalStats), or figure out a way to > > maintain WalUsage separately from PgStat_WalStats or something else. > > Worst case, add more comments to the struct definitions to explain why > > they have the members they have and how WalUsage relates to them. > > Yes, but like Andres said this could be better done as a separate patch. I invite you to write a patch for that for 17... Greetings, Andres Freund
Re: ICU locale validation / canonicalization
On Thu, 2023-03-30 at 08:59 +0200, Peter Eisentraut wrote:
> I don't think the special handling of IsBinaryUpgrade is needed or
> wanted. I would hope that with this feature, all old-style locale IDs
> would go away, but this way we would keep them forever. If we believe
> that canonicalization is safe, then I don't see why we cannot apply it
> during binary upgrade.

There are two issues:

1. Failures can occur. For instance, if an invalid attribute is used, like '@collStrength=primary', then we can't canonicalize it (or if we do, it could end up being not what the user intended).

2. Version 15 and earlier have a subtle bug: it passes the raw locale straight to ucol_open(), and if the locale is "fr_CA.UTF-8" ucol_open() mis-parses it to have language "fr" with no region. If you canonicalize first, it properly parses the locale and produces "fr-CA", which results in a different collator. The 15 behavior is wrong, and this canonicalization patch will fix it, but it doesn't do so during pg_upgrade because that could change the collator and corrupt an index.

The current patch deals with these problems by simply preserving the locale (valid or not) during pg_upgrade, and only canonicalizing new collations and databases (so #2 is only fixed for new collations/databases). I think that's a good trade-off because a lot more users will be on ICU now that it's the default, so let's avoid creating more of the problem cases for those new users.

To get to perfectly-canonicalized catalogs for upgrades from earlier versions:

* We need a way to detect #2, which I posted some code for in an uncommitted revision[1] of this patch series.
* We need a way to detect #1 and #2 during the pg_upgrade --check phase.
* We need actions that the user can take to correct the problems. I have some ideas but they could use some discussion.

I'm not sure all of those will be ready for v16, though.

Regards,
Jeff Davis

[1] See check_equivalent_icu_locales() and calling code here:
https://www.postgresql.org/message-id/8c7af6820aed94dc7bc259d2aa7f9663518e6137.ca...@j-davis.com
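For clarity, hedged illustrations (not from the patch) of the two problem cases described above; the collation names are made up:

-- (1) an attribute spelling that cannot be safely canonicalized:
CREATE COLLATION coll_bad_attr (provider = icu, locale = '@collStrength=primary');

-- (2) a locale string that the pre-16 code effectively parsed as just "fr",
--     but which canonicalizes to "fr-CA" and therefore selects a different collator:
CREATE COLLATION coll_fr_ca (provider = icu, locale = 'fr_CA.UTF-8');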
Re: Thoughts on using Text::Template for our autogenerated code?
> I don't think that's remotely a starter. Asking people to install an old
> and possibly buggy version of a perl module is not something we should do.
>
> I think the barrier for this is pretty high. I try to keep module
> requirements for the buildfarm client pretty minimal, and this could affect
> a much larger group of people.

Those are good reasons.

For those who already responded, are your concerns limited to the dependency issue? Would you have concerns about a templating library that was developed by us and included in-tree? I'm not suggesting we write one at this time, at least not until after we make a here-doc-ing pass first, but I want to understand the concerns before embarking on any refactoring.
Re: Thoughts on using Text::Template for our autogenerated code?
> > I think many of those could just be replaced by multi-line strings and/or > here > documents to get most of the win. > I agree that a pretty good chunk of it can be done with here-docs, but template files do have attractive features (separation of concerns, syntax highlighting, etc) that made it worth asking.
Re: pgsql: Clean up role created in new subscription test.
> On 30 Mar 2023, at 22:29, Tom Lane wrote: > Daniel Gustafsson writes: >>> On 30 Mar 2023, at 20:44, Tom Lane wrote: >>> Another idea could be for pg_regress to enforce that "select count(*) >>> from pg_roles" gives the same answer before and after the test run. > >> That wouldn't prevent the contents of pg_roles to have changed though, so >> there >> is a (slim) false positive risk with that no? > > Well, we could do "select rolname from pg_roles order by 1" and > actually compare the results of the two selects. That might be > advisable anyway, in order to produce a complaint with useful > detail when there is something wrong. I can see the value in doing something like this to keep us honest. -- Daniel Gustafsson
Re: pgsql: Clean up role created in new subscription test.
Daniel Gustafsson writes: >> On 30 Mar 2023, at 20:44, Tom Lane wrote: >> Maybe it'd be close enough to expect there to be no roles named >> "regress_xxx". In combination with >> -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, that would prevent us >> from accidentally leaving stuff behind, and we could hope that it doesn't >> cause false failures in real installations. > Would that check be always on or only when pg_regress is compiled with > -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS? I envisioned it as being on all the time. >> Another idea could be for pg_regress to enforce that "select count(*) >> from pg_roles" gives the same answer before and after the test run. > That wouldn't prevent the contents of pg_roles to have changed though, so > there > is a (slim) false positive risk with that no? Well, we could do "select rolname from pg_roles order by 1" and actually compare the results of the two selects. That might be advisable anyway, in order to produce a complaint with useful detail when there is something wrong. regards, tom lane
Re: Transparent column encryption
On 30.03.23 20:35, Stephen Frost wrote: I do feel that column encryption is a useful capability and there's large parts of this approach that I agree with, but I dislike the idea of having our clients be able to depend on what gets returned for non-encrypted columns while not being able to trust what encrypted column results are and then trying to say it's 'transparent'. To that end, it seems like just saying they get back a bytea and making it clear that they have to provide the validation would be clear, while keeping much of the rest. [Note that the word "transparent" has been removed from the feature name. I just didn't change the email thread name.] These thoughts are reasonable, but I think there is a tradeoff to be made between having featureful data validation and enhanced security. If you want your database system to validate your data, you have to send it in plain text. If you want to have your database system not see the plain text, then it cannot validate it. But there is still utility in it. You can't really depend on what gets returned even in the non-encrypted case, unless you cryptographically sign the schema against modification or something like that. So realistically, a client that cares strongly about the data it receives has to do some kind of client-side validation anyway. Note also that higher-level client libraries like JDBC effectively do client-side data validation, for example when you call ResultSet.getInt() etc. This is also one of the reasons why the user facing type declaration exists. You could just make all encrypted columns of type "opaque" or something and not make any promises about what's inside. But client APIs sort or rely on the application being able to ask the result set for what's inside a column value. If we just say, we don't know, then applications (or driver APIs) will have to be changed to accommodate that, but the intention was to not require that. So instead we say, it's supposed to be int, and then if it's sometimes actually not int, then your application throws an exception you can deal with. This is arguably a better developer experience, even if it concerns the data type purist. But do you have a different idea about how it should work? Expanding out from that I'd imagine, pie-in-the-sky and in some far off land, having our data type in/out validation functions moved to the common library and then adding client-side validation of the data going in/out of the encrypted columns would allow application developers to be able to trust what we're returning (as long as they're using libpq- and we'd have to document that independent implementations of the protocol have to provide this or just continue to return bytea's). As mentioned, some client libraries effectively already do that. But even if we could make this much more comprehensive, I don't see how this can ever actually satisfy your point. It would require that all participating clients apply validation all the time, and all other clients can rely on that happening, which is impossible.
Re: Thoughts on using Text::Template for our autogenerated code?
On 2023-03-30 Th 15:06, Daniel Gustafsson wrote: On 30 Mar 2023, at 19:06, Corey Huinker wrote: Some other digging around shows that the module has been around since 1996 (Perl5 was 1994) and hasn't had a feature update (or any update for that matter) since 2003. So it should meet our baseline 5.14 requirement, which came out in 2011. I assume you then mean tying this to 1.44 (or another version?), since AFAICT there has been both features and bugfixes well after 2003? https://metacpan.org/dist/Text-Template/changes I don't think that's remotely a starter. Asking people to install an old and possibly buggy version of a perl module is not something we should do. I think the barrier for this is pretty high. I try to keep module requirements for the buildfarm client pretty minimal, and this could affect a much larger group of people. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: running logical replication as the subscription owner
On Thu, Mar 30, 2023 at 2:52 PM Jeff Davis wrote: > > Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET > > ROLE to A. With the patch as written, actions on B's tables are not > > confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on > > C's > > tables are. > > It's interesting that it's not transitive, but I'm not sure whether > that's an argument for or against the current approach, or where it > fits (or doesn't fit) with my suggestion. Why do you consider it > important that C's actions are SECURITY_RESTRICTED_OPERATIONs? So that C can't try to hack into A's account. I mean the point here is that B already has permissions to get into A's account whenever they like, without any hacking. So we don't need to impose SECURITY_RESTRICTED_OPERATION when running as B, because the only purpose of SECURITY_RESTRICTED_OPERATION is to prevent the role to which we're switching from attacking the role from which we're switching. And that's how the patch is currently coded. You proposed removing that behavior, on the theory that if the SECURITY_RESTRICTED_OPERATION restrictions were a problem, someone could activate the run_as_owner option (or whatever we end up calling it). But the run_as_owner option applies to the entire subscription. If A activates that option, then B's hypothetical triggers that run afoul of the SECURITY_RESTRICTED_OPERATION restrictions start working again (woohoo!) but they're now vulnerable to attacks from C. With the patch as coded, A doesn't need to use run_as_owner, everything still just works for B, and A is still protected against C. > In version 16, the subscription owner is almost certainly a superuser, > and the table owner almost certainly is not, so there's little chance > that it just happens that the table owner has that privilege already. > > I don't think we want to encourage such grants to proliferate any more > than we want the option you introduce in 0002 to proliferate. Arguably, > it's worse. I don't necessarily find those role grants to be a problem. Obviously it depends on the use case. If you're hoping to be able to set up an account whose only purpose in life is to own subscriptions and which should have as few permissions as possible, then those role grants suck, and a hypothetical future feature where you can GRANT REPLICATION ON TABLE t1 TO subscription_owning_user will be far better. But I imagine CREATE SUBSCRIPTION being used either by superusers or by people who already have those role grants anyway, because I imagine replication as something that a highly privileged user configures on behalf of everyone who uses the system. And in that case those role grants aren't something new that you do specifically for logical replication - they're already there because you need them to administer stuff. Or you're the superuser and don't need them anyway. > And let's say a user says "I upgraded and my trigger broke logical > replication with message about a security-restricted operation... how > do I get up and running again?". With the patches as-written, we will > have two answers to that question: > > * GRANT subscription_owner TO table_owner WITH SET TRUE > * ALTER SUBSCRIPTION ... ($awesome_option_name=false) > > Under what circumstances would we recommend the former vs. the latter? Well, the latter is clearly better because it has such an awesome option name, right? More seriously, my theory is that there's very little use case for having a replication trigger, default expression, etc. 
that is performing a security restricted operation. And if someone does have a use case, and it's between users that can't already SET ROLE back and forth, then the setup is pretty dubious from a security perspective and maybe the user ought to rethink it. And if they don't want to rethink it, then they need to throw security out the window, and I don't really care which of those commands they use to do it, but the second one would probably break less other stuff for them, so I'd likely recommend that one. > > I don't think run_as_owner is terrible, despite the ambiguity. > > I won't object but I'm not thrilled. Let's see if anyone else weighs in. -- Robert Haas EDB: http://www.enterprisedb.com
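For readers following along, a hedged SQL sketch (not from the thread) of the A/B/C arrangement described above; the role names and grants are illustrative only:

CREATE ROLE a;
CREATE ROLE b;
CREATE ROLE c;
GRANT b TO a WITH SET TRUE;   -- A can SET ROLE to B
GRANT c TO a WITH SET TRUE;   -- A can SET ROLE to C
GRANT a TO b WITH SET TRUE;   -- B can SET ROLE back to A

-- With the patch as written, applying changes to B's tables is not treated as a
-- SECURITY_RESTRICTED_OPERATION (B could become A anyway), but applying changes
-- to C's tables is, since C cannot SET ROLE to A.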
Re: Transparent column encryption
On 30.03.23 17:55, Andres Freund wrote: I find it very hard to belief that details of the catalog representation like this will matter to users. How would would it conceivably affect users that we store (key, encryption method) in pg_attribute vs storing an oid that's effectively a foreign key reference to (key, encryption method)? The change you are alluding to would also affect how the DDL commands work and interoperate, so it affects the user. But also, let's not drive this design decision bottom up. Let's go from how we want the data model and the DDL to work and then figure out suitable ways to record that. I don't really know if you are just worried about the catalog size, or you find an actual fault with the data model, or you just find it subjectively odd. The feature is arguably useful without typmod support, e.g., for text. We could ship it like that, then do some work to reorganize pg_attribute and tuple descriptors to relieve some pressure on each byte, and then add the typmod support back in in a future release. I think that is a workable compromise. I doubt that shipping a version of column encryption that breaks our type system is a good idea. I don't follow how you get from that to claiming that it breaks the type system. Please provide more details.
Re: Request for comment on setting binary format output per session
On Thu, 2023-03-30 at 07:06 -0500, Merlin Moncure wrote: > This ought to be impossible IMO. All libpq routines except PQexec > have an explicit expectation on format (via resultformat parameter) > that should not be overridden. PQexec ought to be explicitly > documented and wired to only request text format data. Right now it's clearly documented[1] which formats will be returned for a given Bind message. That can be seen as the root of the problem with psql -- we are breaking the protocol by returning binary when psql can rightfully expect text. It is a minor break, because something needed to send the "SET binary_formats='...'" command, but the protocol docs would need to be updated for sure. > participating clients to receive GUC configured format. I do not > think that libpq's result format being able to be overridden by GUC > is a good idea at all, the library has to to participate, and I > think can be made to so so without adjusting the interface (say, by > resultFormat = 3). Interesting idea. I suppose you'd need to specify 3 for all result columns? That is a protocol change, but wouldn't "break" older clients. The newer clients would need to make sure that they're connecting to v16+, so using the protocol version alone wouldn't be enough. Hmm. Regards, Jeff Davis [1] https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-BIND
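For concreteness, a hedged sketch of the session-level interaction being discussed; the setting name comes from the proposal referenced above, and the value shown is purely illustrative:

SET binary_formats = 'int4, int8, timestamptz';   -- hypothetical value syntax

-- After this, a Bind message that asks for the text result format could still
-- receive binary output for the listed types, which is the protocol expectation
-- being broken for clients (like psql) that did not opt in.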
Re: Should vacuum process config file reload more often
> On 30 Mar 2023, at 04:57, Masahiko Sawada wrote:

> As another idea, why don't we use macros for that? For example,
> suppose VacuumCostStatus is like:
>
> typedef enum VacuumCostStatus
> {
>     VACUUM_COST_INACTIVE_LOCKED = 0,
>     VACUUM_COST_INACTIVE,
>     VACUUM_COST_ACTIVE,
> } VacuumCostStatus;
> VacuumCostStatus VacuumCost;
>
> non-vacuum code can use the following macros:
>
> #define VacuumCostActive() (VacuumCost == VACUUM_COST_ACTIVE)
> #define VacuumCostInactive() (VacuumCost <= VACUUM_COST_INACTIVE)
> // or we can use !VacuumCostActive() instead.

I'm in favor of something along these lines. A variable with a name that implies a boolean value (active/inactive) but actually contains a tri-value is easily misunderstood. A VacuumCostState tri-value variable (or a better name) with a set of convenient macros for extracting the boolean active/inactive that most of the code needs to be concerned with would make for more readable code, I think.

--
Daniel Gustafsson
Re: Parallel Full Hash Join
On Mon, Mar 27, 2023 at 7:04 PM Thomas Munro wrote: > I found another problem. I realised that ... FULL JOIN ... LIMIT n > might be able to give wrong answers with unlucky scheduling. > Unfortunately I have been unable to reproduce the phenomenon I am > imagining yet but I can't think of any mechanism that prevents the > following sequence of events: > > P0 probes, pulls n tuples from the outer relation and then the > executor starts to shut down (see commit 3452dc52 which pushed down > LIMIT), but just then P1 attaches, right before P0 does. P1 > continues, and finds < n outer tuples while probing and then runs out > so it enters the unmatched scan phase, and starts emitting bogusly > unmatched tuples. Some outer tuples we needed to get the complete set > of match bits and thus the right answer were buffered inside P0's > subplan and abandoned. > > I've attached a simple fixup for this problem. Short version: if > you're abandoning your PHJ_BATCH_PROBE phase without reaching the end, > you must be shutting down, so the executor must think it's OK to > abandon tuples this process has buffered, so it must also be OK to > throw all unmatched tuples out the window too, as if this process was > about to emit them. Right? I understand the scenario you are thinking of, however, I question how those incorrectly formed tuples would ever be returned by the query. The hashjoin would only start to shutdown once enough tuples had been emitted to satisfy the limit, at which point, those tuples buffered in p0 may be emitted by this worker but wouldn't be included in the query result, no? I suppose even if what I said is true, we do not want the hashjoin node to ever produce incorrect tuples. In which case, your fix seems correct to me. > With all the long and abstract discussion of hard to explain problems > in this thread and related threads, I thought I should take a step > back and figure out a way to demonstrate what this thing really does > visually. I wanted to show that this is a very useful feature that > unlocks previously unobtainable parallelism, and to show the > compromise we've had to make so far in an intuitive way. With some > extra instrumentation hacked up locally, I produced the attached > "progress" graphs for a very simple query: SELECT COUNT(*) FROM r FULL > JOIN s USING (i). Imagine a time axis along the bottom, but I didn't > bother to add numbers because I'm just trying to convey the 'shape' of > execution with relative times and synchronisation points. > > Figures 1-3 show that phases 'h' (hash) and 'p' (probe) are > parallelised and finish sooner as we add more processes to help out, > but 's' (= the unmatched inner tuple scan) is not. Note that if all > inner tuples are matched, 's' becomes extremely small and the > parallelism is almost as fair as a plain old inner join, but here I've > maximised it: all inner tuples were unmatched, because the two > relations have no matches at all. Even if we achieve perfect linear > scalability for the other phases, the speedup will be governed by > https://en.wikipedia.org/wiki/Amdahl%27s_law and the only thing that > can mitigate that is if there is more useful work those early-quitting > processes could do somewhere else in your query plan. > > Figure 4 shows that it gets a lot fairer in a multi-batch join, > because there is usually useful work to do on other batches of the > same join. 
Notice how processes initially work on loading, probing > and scanning different batches to reduce contention, but they are > capable of ganging up to load and/or probe the same batch if there is > nothing else left to do (for example P2 and P3 both work on p5 near > the end). For now, they can't do that for the s phases. (BTW, the > little gaps before loading is the allocation phase that I didn't > bother to plot because they can't fit a label on them; this > visualisation technique is a WIP.) > > With the "opportunistic" change we are discussing for later work, > figure 4 would become completely fair (P0 and P2 would be able to join > in and help out with s6 and s7), but single-batch figures 1-3 would > not (that would require a different executor design AFAICT, or a > eureka insight we haven't had yet; see long-winded discussion). Cool diagrams! > The last things I'm thinking about now: Are the planner changes > right? I think the current changes are correct. I wonder if we have to change anything in initial/final_cost_hashjoin to account for the fact that for a single batch full/right parallel hash join, part of the execution is serial. And, if so, do we need to consider the estimated number of unmatched tuples to be emitted? > Are the tests enough? So, the tests currently in the patch set cover the unmatched tuple scan phase for single batch parallel full hash join. I've attached the dumbest possible addition to that which adds in a multi-batch full parallel hash join case. I did not do any checking to ensure I picked the case which would
Re: pgsql: Clean up role created in new subscription test.
> On 30 Mar 2023, at 20:44, Tom Lane wrote: > Maybe it'd be close enough to expect there to be no roles named > "regress_xxx". In combination with > -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, that would prevent us > from accidentally leaving stuff behind, and we could hope that it doesn't > cause false failures in real installations. Would that check be always on or only when pg_regress is compiled with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS? > Another idea could be for pg_regress to enforce that "select count(*) > from pg_roles" gives the same answer before and after the test run. That wouldn't prevent the contents of pg_roles to have changed though, so there is a (slim) false positive risk with that no? -- Daniel Gustafsson
Re: ResourceOwner refactoring
Hi,

> This patch, although moderately complicated, was moved between several
> commitfests. I think it would be great if it made it to PG16. I'm
> inclined to change the status of the patchset to RfC in a bit, unless
> anyone has a second opinion.

> I added a test module in src/test/modules/test_resowner to test those cases.

Hmm... it looks like a file is missing in the patchset. When building with Autotools:

```
== running regression test queries        ==
test test_resowner                ... /bin/sh: 1: cannot open /home/eax/projects/postgresql/src/test/modules/test_resowner/sql/test_resowner.sql: No such file
```

I wonder why the tests pass when using Meson.

--
Best regards,
Aleksander Alekseev
Re: Thoughts on using Text::Template for our autogenerated code?
> On 30 Mar 2023, at 19:06, Corey Huinker wrote: > Some other digging around shows that the module has been around since 1996 > (Perl5 was 1994) and hasn't had a feature update (or any update for that > matter) since 2003. So it should meet our baseline 5.14 requirement, which > came out in 2011. I assume you then mean tying this to 1.44 (or another version?), since AFAICT there has been both features and bugfixes well after 2003? https://metacpan.org/dist/Text-Template/changes -- Daniel Gustafsson
Re: running logical replication as the subscription owner
On Thu, 2023-03-30 at 09:41 -0400, Robert Haas wrote: > On Thu, Mar 30, 2023 at 1:19 AM Jeff Davis wrote: > > I say just take the special case out of 0001. If the trigger > > doesn't > > work as a SECURITY_RESTRICTED_OPERATION, and is also ENABLED > > ALWAYS, > > then the user can just use the new option in 0002 to get the old > > behavior. I don't see a reason to implicitly give them the old > > behavior, as 0001 does. > > Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET > ROLE to A. With the patch as written, actions on B's tables are not > confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on > C's > tables are. It's interesting that it's not transitive, but I'm not sure whether that's an argument for or against the current approach, or where it fits (or doesn't fit) with my suggestion. Why do you consider it important that C's actions are SECURITY_RESTRICTED_OPERATIONs? > I think we want to do everything possible to avoid people feeling > like > they need to turn on this new option. I'm not sure we'll ever be able > to get rid of it, but we certainly should avoid doing things that > make > it more likely that it will be needed. I don't think it helps much, though. While I previously said that the special-case behavior is implicit (which is true), it still almost certainly requires a manual step: GRANT subscription_owner TO table_owner WITH SET; In version 16, the subscription owner is almost certainly a superuser, and the table owner almost certainly is not, so there's little chance that it just happens that the table owner has that privilege already. I don't think we want to encourage such grants to proliferate any more than we want the option you introduce in 0002 to proliferate. Arguably, it's worse. And let's say a user says "I upgraded and my trigger broke logical replication with message about a security-restricted operation... how do I get up and running again?". With the patches as-written, we will have two answers to that question: * GRANT subscription_owner TO table_owner WITH SET TRUE * ALTER SUBSCRIPTION ... ($awesome_option_name=false) Under what circumstances would we recommend the former vs. the latter? > > > I agree that the naming is somewhat problematic, but I don't like > trust_table_owners. It's not clear enough about what actually > happens. > I want the name to describe behavior, not sentiment. On reflection, I agree here. We want it to communicate something about the behavior or mechanism. > run_as_subscription_owner removes the ambiguity, but is long. Then fewer people will use it, which might be a good thing. > I can think of other alternatives, like user_switching or > switch_to_table_owner or no_user_switching or various other things, > but none of them seem very good to me. I like the idea of using "switch" (or some synonym) because it's technically more correct. The subscription always runs as the subscription owner; we are just switching temporarily while applying a change. > Another idea could be to make the option non-Boolean. This is > comically long and I can't seriously recommend it, but just to > illustrate the point, if you type CREATE SUBSCRIPTION ... WITH > (execute_code_as_owner_of_which_object = subscription) then you > certainly should know what you've signed up for! If there were a > shorter version that were still clear, I could go for that, but I'm > having trouble coming up with exact wording. 
I don't care for that -- it communicates the options as co-equal and maybe something that would live forever (or even have more options in the future). I'd prefer that nobody uses the non-switching behavior except for migration purposes or weird use cases we don't really understand. > I don't think run_as_owner is terrible, despite the ambiguity. I won't object but I'm not thrilled. > Regards, Jeff Davis
Re: Thoughts on using Text::Template for our autogenerated code?
Hi, On 2023-03-30 13:06:46 -0400, Corey Huinker wrote: > Is there a barrier to us using non-core perl modules, in this case > Text::Template? I don't think we should have a hard build-time dependency on non-core perl modules. On some operating systems having to install such dependencies is quite painful (e.g. windows). > I think it would be a tremendous improvement in readability and > maintainability over our current series of print statements, some > multiline, some not. I think many of those could just be replaced by multi-line strings and/or here documents to get most of the win. Greetings, Andres Freund
Re: pgsql: Clean up role created in new subscription test.
Robert Haas writes: > On Thu, Mar 30, 2023 at 1:07 PM Tom Lane wrote: >> This oversight broke repeated runs of "make installcheck". > GH. You would think that I would have learned better by now, but > evidently not. Is there some way we can add an automated guard against > this? Hm. We could add a final test step that prints out all still-existing roles, but the trick is to have it not fail in a legitimate installcheck context (ie, when there are indeed some pre-existing roles). Maybe it'd be close enough to expect there to be no roles named "regress_xxx". In combination with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, that would prevent us from accidentally leaving stuff behind, and we could hope that it doesn't cause false failures in real installations. Another idea could be for pg_regress to enforce that "select count(*) from pg_roles" gives the same answer before and after the test run. That would then be enforced against all pg_regress suites not just the main one, but perhaps that's good. Likewise for tablespaces, subscriptions, and other globally-visible objects, of course. regards, tom lane
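As a rough illustration of the kind of guard discussed above (a sketch only, not part of any patch), a final pg_regress step could either flag leftover test-created roles by the "regress_" naming convention, or compare counts of globally-visible objects taken before and after the suite:

-- Sketch: fail if any test-created roles survive the run
-- (relies on the regress_ prefix enforced by -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS).
SELECT rolname FROM pg_roles WHERE rolname LIKE 'regress\_%' ORDER BY rolname;

-- ... or compare simple counts taken before and after the suite,
-- and do the same for other globally-visible objects.
SELECT count(*) FROM pg_roles;
SELECT count(*) FROM pg_tablespace;
SELECT count(*) FROM pg_subscription;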
Re: Transparent column encryption
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2023-03-30 16:01:46 +0200, Peter Eisentraut wrote: > > On 30.03.23 03:29, Andres Freund wrote: > > > > One might think that, but the precedent in other equivalent systems is > > > > that > > > > you reference the key and the algorithm separately. There is some > > > > (admittedly not very conclusive) discussion about this near [0]. > > > > > > > > [0]: > > > > https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 > > > > > > I'm very much not convinced by that. Either way, there at least there > > > should > > > be a comment mentioning that we intentionally try to allow that. > > > > > > Even if this feature is something we want (why?), ISTM that this should > > > not be > > > implemented by having multiple fields in pg_attribute, but instead by a > > > table > > > referenced by by pg_attribute.attcek. > > > > I don't know if it is clear to everyone here, but the key data model and the > > surrounding DDL are exact copies of the equivalent MS SQL Server feature. > > When I was first studying it, I had the exact same doubts about this. But > > as I was learning more about it, it does make sense, because this matches a > > common pattern in key management systems, which is relevant because these > > keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it > > is plausible that those people knew what they were doing, and 2) it would be > > preferable to maintain alignment and not create something that looks the > > same but is different in some small but important details. I was wondering about this- is it really exactly the same, down to the point that there's zero checking of what the data returned actually is after it's decrypted and given to the application, and if it actually matches the claimed data type? > I find it very hard to belief that details of the catalog representation like > this will matter to users. How would would it conceivably affect users that we > store (key, encryption method) in pg_attribute vs storing an oid that's > effectively a foreign key reference to (key, encryption method)? I do agree with this. > > > > With the proposed removal of usertypmod, it's only two fields: the link > > > > to > > > > the key and the user-facing type. > > > > > > That feels far less clean. I think loosing the ability to set the > > > precision of > > > a numeric, or the SRID for postgis datums won't be received very well? > > > > In my mind, and I probably wasn't explicit about this, I'm thinking about > > what can be done now versus later. > > > > The feature is arguably useful without typmod support, e.g., for text. We > > could ship it like that, then do some work to reorganize pg_attribute and > > tuple descriptors to relieve some pressure on each byte, and then add the > > typmod support back in in a future release. I think that is a workable > > compromise. > > I doubt that shipping a version of column encryption that breaks our type > system is a good idea. And this. I do feel that column encryption is a useful capability and there's large parts of this approach that I agree with, but I dislike the idea of having our clients be able to depend on what gets returned for non-encrypted columns while not being able to trust what encrypted column results are and then trying to say it's 'transparent'. 
To that end, it seems like just saying they get back a bytea and making it clear that they have to provide the validation would be clear, while keeping much of the rest. Expanding out from that I'd imagine, pie-in-the-sky and in some far off land, having our data type in/out validation functions moved to the common library and then adding client-side validation of the data going in/out of the encrypted columns would allow application developers to be able to trust what we're returning (as long as they're using libpq- and we'd have to document that independent implementations of the protocol have to provide this or just continue to return bytea's). Not sure how we'd manage to provide support for extensions though. Thanks, Stephen signature.asc Description: PGP signature
Re: pgsql: Clean up role created in new subscription test.
On Thu, Mar 30, 2023 at 1:07 PM Tom Lane wrote: > Clean up role created in new subscription test. > > This oversight broke repeated runs of "make installcheck". GH. You would think that I would have learned better by now, but evidently not. Is there some way we can add an automated guard against this? -- Robert Haas EDB: http://www.enterprisedb.com
Re: running logical replication as the subscription owner
On Thu, Mar 30, 2023 at 11:11 AM Jelte Fennema wrote: > Regarding the actual patch. I think the code looks good. Mainly the > tests and docs are lacking for the new option. Like I said for the > tests you can borrow the tests I updated for my v2 patch, I think > those should work fine for the new option. I took a look at that but I didn't really feel like that was quite the direction I wanted to go. I'd actually like to separate the tests of the new option out into their own file, so that if for some reason we decide we want to remove it in the future, it's easier to nuke all the associated tests. Also, quite frankly, I think we've gone way overboard in terms of loading too many tests into a single file, with the result that it's very hard to understand exactly what and all the file is actually testing and what it's intended to be testing. So the attached 0002 does it that way. I've also amended 0001 and 0002 with documentation changes that I hope are appropriate. I noticed along the way that my earlier commits had missed one place that needed to be updated by the pg_create_subscription patch I created earlier. A fix for that is included in 0001, but it can be broken out and committed separately if somebody feels strongly about it. I personally don't think it's worth it. -- Robert Haas EDB: http://www.enterprisedb.com v3-0002-Add-a-run_as_owner-option-to-subscriptions.patch Description: Binary data v3-0001-Perform-logical-replication-actions-as-the-table-.patch Description: Binary data
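For readers following along, the option exercised by the proposed 0002 tests looks roughly like this at the SQL level. This is a sketch based on the description above, not committed syntax: the subscription, publication, and connection-string values are placeholders, and the option name itself is still being debated in this thread.

-- Hypothetical example of the proposed 0002 option.
CREATE SUBSCRIPTION regress_sub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION regress_pub
    WITH (run_as_owner = true);  -- keep applying changes as the subscription owner, as before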
Re: Data is copied twice when specifying both child and parent table in publication
On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila wrote: > Pushed. While rebasing my logical-roots patch over the top of this, I ran into another situation where mixed viaroot settings can duplicate data. The key idea is to subscribe to two publications with mixed settings, as before, and add a partition root that's already been replicated with viaroot=false to the other publication with viaroot=true.

pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH (publish_via_partition_root);
-- populate with data, then switch to subscription side
sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all, pub_other;
-- switch back to publication
pub=# ALTER PUBLICATION pub_other ADD TABLE part;
-- and back to subscription
sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
-- data is now duplicated

(Standalone reproduction attached.) This is similar to what happens if you alter the publish_via_partition_root setting for an existing publication, but I'd argue it's easier to hit by accident. Is this part of the same class of bugs, or is it different (or even expected) behavior? Thanks, --Jacob repro.sql Description: application/sql
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
Greetings, * Jacob Champion (jchamp...@timescale.com) wrote: > On 3/20/23 09:32, Robert Haas wrote: > > I think this is the root of our disagreement. > > Agreed. I've read all the way back to the $SUBJECT change to try and get an understanding of the questions here and it's not been easy, in part, I think, due to the verbiage but also the perhaps lack of concrete examples and instead references to other systems and protocols. > > My understanding of the > > previous discussion is that people think that the major problem here > > is the wraparound-to-superuser attack. That is, in general, we expect > > that when we connect to a database over the network, we expect it to > > do some kind of active authentication, like asking us for a password, > > or asking us for an SSL certificate that isn't just lying around for > > anyone to use. However, in the specific case of a local connection, we > > have a reliable way of knowing who the remote user is without any kind > > of active authentication, namely 'peer' authentication or perhaps even > > 'trust' if we trust all the local users, and so we don't judge it > > unreasonable to allow local connections without any form of active > > authentication. There can be some scenarios where even over a network > > we can know the identity of the person connecting with complete > > certainty, e.g. if endpoints are locked down such that the source IP > > address is a reliable indicator of who is initiating the connection, > > but in general when there's a network involved you don't know who the > > person making the connection is and need to do something extra to > > figure it out. > > Okay, but this is walking back from the network example you just > described upthread. Do you still consider that in scope, or...? The concern around the network certainly needs to be in-scope overall. > > If you accept this characterization of the problem, > > I'm not going to say yes or no just yet, because I don't understand your > rationale for where to draw the lines. > > If you just want the bare minimum thing that will solve the localhost > case, require_auth landed this week. Login triggers are not yet a thing, > so `require_auth=password,md5,scram-sha-256` ensures active > authentication. You don't even have to disallow localhost connections, > as far as I can tell; they'll work as intended. I do think require_auth helps us move in a positive direction. As I mentioned elsewhere, I don't think we highlight it nearly enough in the postgres_fdw documentation. Let's look at that in a bit more depth with concrete examples and perhaps everyone will be able to get a bit more understanding of the issues. Client is psql Proxy is some PG server that's got postgres_fdw Target is another PG server, that is being connected to from Proxy Authentication is via GSS/Kerberos with proxied credentials What do we want to require the user to configure to make this secure? Proxy's pg_hba configured to require GSS auth from Client. Target's pg_hba configured to require GSS auth from Proxy. Who are we trusting with what? In particular, I'd argue that the user who is able to install the postgres_fdw extension and the user who is able to issue the CREATE SERVER are largely trusted; at least in so far as the user doing CREATE SERVER is allowed to create the server and through that allowed to make outbound connections from the Proxy. Therefore, the Proxy is configured with postgres_fdw and with a trusted user performing the CREATE SERVER. What doesn't this handle today? 
Connection side-effects are one problem- once the CREATE SERVER is done, any user with USAGE rights on the server can create a USER MAPPING for themselves, either with a password or without one (if they're able to proxy GSS credentials to the system). They aren't able to set password_required though, which defaults to true. However, without having require_auth set, they're able to cause the Proxy to reach an authentication stage with the Target that might not match what credentials they're supposed to be providing. We attempt to address this by checking post-auth to Target that we used the credentials to connect that we expected to- if GSS credentials were proxied, then we expect to use those. If a password was provided then we expect to use a password to auth (only checked after we see if GSS credentials were proxied and used). The issue here is 'post-auth' bit, we'd prefer to fail the connection pre-auth if it isn't what we're expecting. Should we then explicit set require_auth=gss when GSS credentials are proxied? Also, if a password is provided, then explicitly set require_auth=scram-sha-256? Or default to these, at least, and allow the CREATE SERVER user to override our choices? Or should it be a USER MAPPING option that's restricted? Or not? > > I think that what you're proposing is that B and C can just be allowed > > to proxy to A and A can say "hey, by the way, I'm just gonna let you > > in without asking
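To make the moving parts concrete, a rough sketch of the Proxy-side setup being discussed might look like the following. All names are placeholders, and it assumes require_auth (a PG16 libpq option) is simply passed through as a postgres_fdw server option; whether postgres_fdw should set it implicitly, restrict it, or leave it to the CREATE SERVER user is exactly the open question above.

-- Hypothetical configuration on the Proxy, assuming GSS credentials are proxied from the Client.
CREATE SERVER target_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'target.example.com', dbname 'appdb', require_auth 'gss');

-- The user relies on proxied GSS credentials rather than a stored password.
CREATE USER MAPPING FOR app_user SERVER target_srv;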
Re: Thoughts on using Text::Template for our autogenerated code?
Corey Huinker writes: > Is there a barrier to us using non-core perl modules, in this case > Text::Template? Use for what exactly? I'd be hesitant to require such things to build from a tarball, or to run regression tests. If it's used to build a generated file that we include in tarballs, that might be workable ... but I'd bet a good fraction of the buildfarm falls over (looks like all four of my animals would), and you might get push-back from developers too. > I think it would be a tremendous improvement in readability and > maintainability over our current series of print statements, some > multiline, some not. I suspect it'd have to be quite a remarkable improvement to justify adding a new required build tool ... but show us an example. regards, tom lane
Re: [POC] Allow an extension to add data into Query and PlannedStmt nodes
On 30/3/2023 12:57, Julien Rouhaud wrote: Extensions could need to pass some query-related data through all stages of the query planning and execution. As a trivial example, pg_stat_statements uses queryid at the end of execution to save some statistics. One more reason - extensions now conflict on queryid value and the logic of its changing. With this patch, it can be managed. I just had a quick lookc at the patch, and IIUC it doesn't really help on that side, as there's still a single official "queryid" that's computed, stored everywhere and later used by pg_stat_statements (which does then store in additionally to that official queryid). Thank you for the attention! This patch doesn't try to solve the problem of oneness of queryId. In this patch we change pg_stat_statements and it doesn't set 0 into queryId field according to its internal logic. And another extension should do the same - use queryId on your own but not erase it - erase your private copy in the ext_field. Ruthless pgbench benchmark shows that we got some overhead: 1.6% - in default mode 4% - in prepared mode ~0.1% in extended mode. That's a quite significant overhead. But the only reason to accept such a change is to actually use it to store additional data, so it would be interesting to see what the overhead is like once you store at least 2 different values there. Yeah, but as I said earlier, it can be reduced to much smaller value just with simple optimization. Here I intentionally avoid it to discuss the core concept. - Are we need to invent a registration procedure to do away with the names of entries and use some compact integer IDs? Note that the patch as proposed doesn't have any defense for two extensions trying to register something with the same name, or update a stored value, as AddExtensionDataToNode() simply prepend the new value to the list. You can actually update the value by just storing the new value, but it will add a significant overhead to every other extension that want to read another value. Thanks a lot! Patch in attachment implements such an idea - extension can allocate some entries and use these private IDs to add entries. I hope, an extension would prefer to use only one entry for all the data to manage overhead, but still. The API is also quite limited as each stored value has a single identifier. What if your extension needs to store multiple things? Since it's all based on Node you can't really store some custom struct, so you probably have to end up with things like "my_extension.my_val1", "my_extension.my_val2" which isn't great. Main idea here - if an extension holds custom struct and want to pass it along all planning and execution stages it should use extensible node with custom read/write/copy routines. -- regards, Andrey Lepikhov Postgres Professional From ab101322330684e9839e46c26f70ad5462e40dac Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Thu, 30 Mar 2023 21:49:32 +0500 Subject: [PATCH 2/2] Add ExtendedData entry registration routines to use entryId instead of symbolic name. 
--- .../pg_stat_statements/pg_stat_statements.c | 7 +- src/backend/commands/extension.c | 69 +++ src/include/commands/extension.h | 6 +- src/include/nodes/parsenodes.h| 2 +- 4 files changed, 64 insertions(+), 20 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index fc47661c86..5b21163365 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -109,11 +109,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; !IsA(n, DeallocateStmt)) #define GET_QUERYID(node) \ - (Bigint *) GetExtensionData(node->ext_field, "pg_stat_statements") + (Bigint *) GetExtensionData(node->ext_field, extendedEntryID) #define INSERT_QUERYID(node, queryid) \ castNode(Bigint, AddExtensionDataToNode((Node *) node, \ - "pg_stat_statements", \ + extendedEntryID, \ (Node *) makeBigint((int64) queryid), \ true)) /* @@ -298,6 +298,7 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static int extendedEntryID = -1; /* Use to add queryId into the Query struct
Thoughts on using Text::Template for our autogenerated code?
Is there a barrier to us using non-core perl modules, in this case Text::Template? I think it would be a tremendous improvement in readability and maintainability over our current series of print statements, some multiline, some not. The module itself works like this https://www.perlmonks.org/?node_id=33296 Some other digging around shows that the module has been around since 1996 (Perl5 was 1994) and hasn't had a feature update (or any update for that matter) since 2003. So it should meet our baseline 5.14 requirement, which came out in 2011. I'm happy to proceed with a proof-of-concept so that people can see the costs/benefits, but wanted to first make sure it wasn't a total non-starter.
Re: Autogenerate some wait events code and documentation
On Wed, Mar 29, 2023 at 8:51 AM Drouvot, Bertrand < bertranddrouvot...@gmail.com> wrote: > Hi, > > On 3/29/23 11:44 AM, Drouvot, Bertrand wrote: > > > > > Looking forward to your feedback, > > Just realized that more polishing was needed. > > Done in V2 attached. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com I think this is good work, but I can't help thinking it would be easier to understand and maintain if we used a template engine like Text::Template, and filled out the template with the variant bits. I'll ask that question in another thread for higher visibility.
Re: Minimal logical decoding on standbys
Hi, On Thu, Mar 30, 2023 at 2:45 PM Jeff Davis wrote: > > On Thu, 2023-03-02 at 23:58 -0800, Jeff Davis wrote: > > On Thu, 2023-03-02 at 11:45 -0800, Jeff Davis wrote: > > > In this case it looks easier to add the right API than to be sure > > > about > > > whether it's needed or not. > > > > I attached a sketch of one approach. I'm not very confident that it's > > the right API or even that it works as I intended it, but if others > > like the approach I can work on it some more. > > Another approach might be to extend WaitEventSets() to be able to wait > on Condition Variables, rather than Condition Variables waiting on > WaitEventSets. Thoughts?

+1 to extend CV. If we extend WaitEventSet() to be able to wait on a CV, it would make the code simpler, but we would need to change both CV and WaitEventSet().

On Fri, Mar 10, 2023 at 8:34 PM Drouvot, Bertrand wrote: > > I gave it a try, so please find attached > v2-0001-Introduce-ConditionVariableEventSleep.txt (implementing the comments > above) and 0004_new_API.txt to put the new API in the logical decoding on > standby context.

@@ -180,13 +203,25 @@ ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
  * by something other than ConditionVariableSignal; though we don't
  * guarantee not to return spuriously, we'll avoid this obvious case.
  */
- SpinLockAcquire(&cv->mutex);
- if (!proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
+
+ if (cv)
  {
- done = true;
- proclist_push_tail(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
+ SpinLockAcquire(&cv->mutex);
+ if (!proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
+ {
+ done = true;
+ proclist_push_tail(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
+ }
+ SpinLockRelease(&cv->mutex);
  }

This change looks odd to me since it accepts cv being NULL in spite of calling ConditionVariableEventSleep() for cv. I think that this is because in 0004_new_API.txt, we use ConditionVariableEventSleep() in both the not-in-recovery case and the recovery-in-progress case in WalSndWaitForWal() as follows:

- WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+ ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, wakeEvents, NULL);
+ ConditionVariableEventSleep(cv, RecoveryInProgress, FeBeWaitSet, NULL,
+                             sleeptime, wait_event);
 }

But I don't think we need to use ConditionVariableEventSleep() in the not-in-recovery case. If I correctly understand the problem this patch wants to deal with, in the logical-decoding-on-standby case the walsender needs to be woken up on the following events:

* condition variable
* timeout
* socket writable (if pq_is_send_pending() is true)

(socket readable event should also be included to avoid wal_receiver_timeout BTW?)

On the other hand, in the not-in-recovery case, the events are:

* socket readable
* socket writable (if pq_is_send_pending() is true)
* latch
* timeout

I think we don't need to change anything for the latter case, as WalSndWait() works perfectly there. As for the former case, since we need to wait for the CV, a timeout, or socket writable, we can use ConditionVariableEventSleep(). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
FW: Add the ability to limit the amount of memory that can be allocated to backends.
Hi Reid! Some thoughts I was looking at lmgr/proc.c, and I see a potential integer overflow - both max_total_bkend_mem and result are declared as “int”, so the expression “max_total_bkend_mem * 1024 * 1024 - result * 1024 * 1024” could have a problem when max_total_bkend_mem is set to 2G or more. /* * Account for shared memory size and initialize * max_total_bkend_mem_bytes. */ pg_atomic_init_u64(>max_total_bkend_mem_bytes, max_total_bkend_mem * 1024 * 1024 - result * 1024 * 1024); As more of a style thing (and definitely not an error), the calling code might look smoother if the memory check and error handling were moved into a helper function, say “backend_malloc”. For example, the following calling code /* Do not exceed maximum allowed memory allocation */ if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock))) { MemoryContextStats(TopMemoryContext); ereport(ERROR, (errcode(ERRCODE_OUT_OF_MEMORY), errmsg("out of memory - exceeds max_total_backend_memory"), errdetail("Failed while creating memory context \"%s\".", name))); } slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock)); if (slab == NULL) …. Could become a single line: Slab = (SlabContext *) backend_malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock); Note this is a change in error handling as backend_malloc() throws memory errors. I think this change is already happening, as the error throws you’ve already added are potentially visible to callers (to be verified). It also could result in less informative error messages without the specifics of “while creating memory context”. Still, it pulls repeated code into a single wrapper and might be worth considering. I do appreciate the changes in updating the global memory counter. My recollection is the original version updated stats with every allocation, and there was something that looked like a spinlock around the update. (My memory may be wrong …). The new approach eliminates contention, both by having fewer updates and by using atomic ops. Excellent. I also have some thoughts about simplifying the atomic update logic you are currently using. I need to think about it a bit more and will get back to you later on that. * John Morris
Re: Non-superuser subscription owners
On Tue, Mar 28, 2023 at 1:52 PM Jeff Davis wrote: > On Fri, 2023-03-24 at 00:17 -0700, Jeff Davis wrote: > > The other patch you posted seems like it makes a lot of progress in > > that direction, and I think that should go in first. That was one of > > the items I suggested previously[2], so thank you for working on > > that. > > The above is not a hard objection. The other patch is starting to go in a direction that is going to have some conflicts with this one, so I went ahead and committed this one to avoid rebasing pain. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
Hi, On 2023-03-30 16:01:46 +0200, Peter Eisentraut wrote: > On 30.03.23 03:29, Andres Freund wrote: > > > One might think that, but the precedent in other equivalent systems is > > > that > > > you reference the key and the algorithm separately. There is some > > > (admittedly not very conclusive) discussion about this near [0]. > > > > > > [0]: > > > https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 > > > > I'm very much not convinced by that. Either way, there at least there should > > be a comment mentioning that we intentionally try to allow that. > > > > Even if this feature is something we want (why?), ISTM that this should not > > be > > implemented by having multiple fields in pg_attribute, but instead by a > > table > > referenced by by pg_attribute.attcek. > > I don't know if it is clear to everyone here, but the key data model and the > surrounding DDL are exact copies of the equivalent MS SQL Server feature. > When I was first studying it, I had the exact same doubts about this. But > as I was learning more about it, it does make sense, because this matches a > common pattern in key management systems, which is relevant because these > keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it > is plausible that those people knew what they were doing, and 2) it would be > preferable to maintain alignment and not create something that looks the > same but is different in some small but important details. I find it very hard to belief that details of the catalog representation like this will matter to users. How would would it conceivably affect users that we store (key, encryption method) in pg_attribute vs storing an oid that's effectively a foreign key reference to (key, encryption method)? > > > With the proposed removal of usertypmod, it's only two fields: the link to > > > the key and the user-facing type. > > > > That feels far less clean. I think loosing the ability to set the precision > > of > > a numeric, or the SRID for postgis datums won't be received very well? > > In my mind, and I probably wasn't explicit about this, I'm thinking about > what can be done now versus later. > > The feature is arguably useful without typmod support, e.g., for text. We > could ship it like that, then do some work to reorganize pg_attribute and > tuple descriptors to relieve some pressure on each byte, and then add the > typmod support back in in a future release. I think that is a workable > compromise. I doubt that shipping a version of column encryption that breaks our type system is a good idea. Greetings, Andres Freund
Re: Images storing techniques
On Thu, Mar 30, 2023 at 11:30:37AM +0200, Gaetano Mendola wrote: > I would suggest to store your images in a file system and the store paths to > those images. > > You can keep files and entries in your database synced via triggers/stored > procedures (eventually written in python since pgsql doesn't allow you to > interact with the file system). You might want to read this blog entry: https://momjian.us/main/blogs/pgblog/2017.html#November_6_2017 -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: running logical replication as the subscription owner
On Thu, 30 Mar 2023 at 15:42, Robert Haas wrote: > On Thu, Mar 30, 2023 at 1:19 AM Jeff Davis wrote: > > I say just take the special case out of 0001. If the trigger doesn't > > work as a SECURITY_RESTRICTED_OPERATION, and is also ENABLED ALWAYS, > > then the user can just use the new option in 0002 to get the old > > behavior. I don't see a reason to implicitly give them the old > > behavior, as 0001 does. > > Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET > ROLE to A. With the patch as written, actions on B's tables are not > confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on C's > tables are. I think that's fair. There's no need to set SECURITY_RESTRICTED_OPERATION if it doesn't protect you anyway, and indeed it might break things. To be clear I do think it's important to still switch to the table owner, simply for consistency. But that's done in the patch so that's fine. > I agree that the naming is somewhat problematic, but I don't like > trust_table_owners. It's not clear enough about what actually happens. > I want the name to describe behavior, not sentiment. For security related things, I think sentiment is often just as important as the actual behaviour. It's not without reason that newer javascript frameworks have things like dangerouslySetInnerHTML, to scare people away from using it unless they know what they are doing. > I don't think run_as_owner is terrible, despite the ambiguity. It's > talking about the owner of the object on which the property is being > set. Isn't that the most natural interpretation? I'd be pretty > surprised if I set a property called run_as_owner on object A and it > ran as the owner of some other object B. I think that's fair and I'd be happy with run_as_owner. If someone doesn't understand which owner, they should probably check the documentation anyways to understand the implications. Regarding the actual patch. I think the code looks good. Mainly the tests and docs are lacking for the new option. Like I said for the tests you can borrow the tests I updated for my v2 patch, I think those should work fine for the new option. On Thu, 30 Mar 2023 at 15:42, Robert Haas wrote: > > On Thu, Mar 30, 2023 at 1:19 AM Jeff Davis wrote: > > I say just take the special case out of 0001. If the trigger doesn't > > work as a SECURITY_RESTRICTED_OPERATION, and is also ENABLED ALWAYS, > > then the user can just use the new option in 0002 to get the old > > behavior. I don't see a reason to implicitly give them the old > > behavior, as 0001 does. > > Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET > ROLE to A. With the patch as written, actions on B's tables are not > confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on C's > tables are. > > I think we want to do everything possible to avoid people feeling like > they need to turn on this new option. I'm not sure we'll ever be able > to get rid of it, but we certainly should avoid doing things that make > it more likely that it will be needed. > > > > 0002 adds a run_as_owner option to subscriptions. This doesn't make > > > the updated regression tests fail, because they don't use it. If you > > > revert the changes to 027_nosuperuser.pl then you get failures (as > > > one > > > would hope) and if you then add WITH (run_as_owner = true) to the > > > CREATE SUBSCRIPTION command in 027_nosuperuser.pl then it passes > > > again. 
I need to spend some more time thinking about what the tests > > > actually ought to look like if we go this route -- I haven't looked > > > through what Jelte proposed yet -- and also the documentation would > > > need a bunch of updating. > > > > "run_as_owner" is ambiguous -- subscription owner or table owner? > > > > I would prefer something like "trust_table_owners". That would > > communicate that the user shouldn't choose it unless they know what > > they're doing. > > I agree that the naming is somewhat problematic, but I don't like > trust_table_owners. It's not clear enough about what actually happens. > I want the name to describe behavior, not sentiment. > > run_as_subscription_owner removes the ambiguity, but is long. > > run_as_table_owner is a bit shorter, and we could do that with the > sense reversed. Is that equally clear? I'm not sure. > > I can think of other alternatives, like user_switching or > switch_to_table_owner or no_user_switching or various other things, > but none of them seem very good to me. > > Another idea could be to make the option non-Boolean. This is > comically long and I can't seriously recommend it, but just to > illustrate the point, if you type CREATE SUBSCRIPTION ... WITH > (execute_code_as_owner_of_which_object = subscription) then you > certainly should know what you've signed up for! If there were a > shorter version that were still clear, I could go for that, but I'm > having trouble coming up with exact wording. > > I don't think run_as_owner is terrible, despite the ambiguity.
Re: meson/msys2 fails with plperl/Strawberry
On 2023-03-27 Mo 13:18, Andres Freund wrote: Hi, On 2023-03-26 21:13:41 -0400, Andrew Dunstan wrote: On Mar 26, 2023, at 5:28 PM, Andres Freund wrote: On 2023-03-26 12:39:08 -0700, Andres Freund wrote: First: I am *not* arguing we shouldn't repair building against strawberry perl with mingw. Hm - can you describe the failure more - I just tried, and it worked to build against strawberry perl on mingw, without any issues. All I did was set -DPERL="c:/strawberrly/perl/bin/perl.exe". That might be the secret sauce I’m missing. I will be offline for a day or three, will test when I’m back. It should suffice to put strawberry perl first in PATH. All that the -DPERL does is to use that, instead of 'perl' from PATH. If putting strawberry perl ahead in PATH failed, something else must have been going on... Yeah, What it actually needed was a system upgrade. Sorry for the noise. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: pgindent vs. git whitespace check
On Wed, Mar 29, 2023 at 08:26:23PM +0200, Daniel Gustafsson wrote: > > On 29 Mar 2023, at 19:18, Bruce Momjian wrote: > > We would have to convert all supported branches, and tell all forks to > > do the same (hopefully at the same time). The new standard would then > > be for all single-line comments to use // instead of /* ... */. > > That still leaves every patch which is in flight on -hackers, and conflicts in > local development trees etc. It's doable (apart from forks, but that cannot > be > our core concern), but I personally can't see the price paid justify the > result. Yes, this would have to be done at the start of a new release cycle. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Transparent column encryption
On 30.03.23 03:29, Andres Freund wrote: One might think that, but the precedent in other equivalent systems is that you reference the key and the algorithm separately. There is some (admittedly not very conclusive) discussion about this near [0]. [0]: https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 I'm very much not convinced by that. Either way, there at least there should be a comment mentioning that we intentionally try to allow that. Even if this feature is something we want (why?), ISTM that this should not be implemented by having multiple fields in pg_attribute, but instead by a table referenced by by pg_attribute.attcek. I don't know if it is clear to everyone here, but the key data model and the surrounding DDL are exact copies of the equivalent MS SQL Server feature. When I was first studying it, I had the exact same doubts about this. But as I was learning more about it, it does make sense, because this matches a common pattern in key management systems, which is relevant because these keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it is plausible that those people knew what they were doing, and 2) it would be preferable to maintain alignment and not create something that looks the same but is different in some small but important details. With the proposed removal of usertypmod, it's only two fields: the link to the key and the user-facing type. That feels far less clean. I think loosing the ability to set the precision of a numeric, or the SRID for postgis datums won't be received very well? In my mind, and I probably wasn't explicit about this, I'm thinking about what can be done now versus later. The feature is arguably useful without typmod support, e.g., for text. We could ship it like that, then do some work to reorganize pg_attribute and tuple descriptors to relieve some pressure on each byte, and then add the typmod support back in in a future release. I think that is a workable compromise.
Re: running logical replication as the subscription owner
On Thu, Mar 30, 2023 at 1:19 AM Jeff Davis wrote: > I say just take the special case out of 0001. If the trigger doesn't > work as a SECURITY_RESTRICTED_OPERATION, and is also ENABLED ALWAYS, > then the user can just use the new option in 0002 to get the old > behavior. I don't see a reason to implicitly give them the old > behavior, as 0001 does. Mmm, I don't agree. Suppose A can SET ROLE to B or C, and B can SET ROLE to A. With the patch as written, actions on B's tables are not confined by the SECURITY_RESTRICTED_OPERATION flag, but actions on C's tables are. I think we want to do everything possible to avoid people feeling like they need to turn on this new option. I'm not sure we'll ever be able to get rid of it, but we certainly should avoid doing things that make it more likely that it will be needed. > > 0002 adds a run_as_owner option to subscriptions. This doesn't make > > the updated regression tests fail, because they don't use it. If you > > revert the changes to 027_nosuperuser.pl then you get failures (as > > one > > would hope) and if you then add WITH (run_as_owner = true) to the > > CREATE SUBSCRIPTION command in 027_nosuperuser.pl then it passes > > again. I need to spend some more time thinking about what the tests > > actually ought to look like if we go this route -- I haven't looked > > through what Jelte proposed yet -- and also the documentation would > > need a bunch of updating. > > "run_as_owner" is ambiguous -- subscription owner or table owner? > > I would prefer something like "trust_table_owners". That would > communicate that the user shouldn't choose it unless they know what > they're doing. I agree that the naming is somewhat problematic, but I don't like trust_table_owners. It's not clear enough about what actually happens. I want the name to describe behavior, not sentiment. run_as_subscription_owner removes the ambiguity, but is long. run_as_table_owner is a bit shorter, and we could do that with the sense reversed. Is that equally clear? I'm not sure. I can think of other alternatives, like user_switching or switch_to_table_owner or no_user_switching or various other things, but none of them seem very good to me. Another idea could be to make the option non-Boolean. This is comically long and I can't seriously recommend it, but just to illustrate the point, if you type CREATE SUBSCRIPTION ... WITH (execute_code_as_owner_of_which_object = subscription) then you certainly should know what you've signed up for! If there were a shorter version that were still clear, I could go for that, but I'm having trouble coming up with exact wording. I don't think run_as_owner is terrible, despite the ambiguity. It's talking about the owner of the object on which the property is being set. Isn't that the most natural interpretation? I'd be pretty surprised if I set a property called run_as_owner on object A and it ran as the owner of some other object B. I think our notion of how ambiguous this is may be somewhat inflated by the fact that we've just had a huge conversation about whether it should be the table owner or the subscription owner, so those possibilities are etched in our mind in a way that maybe they won't be for people coming at this fresh. But it's hard to be sure what other people will think about something, and I don't want to be too optimistic about the name I picked, either. > If you are worried that *I* think 0002 would be a poor call, then no, I > don't. 
Initially I didn't like the idea of supporting two behaviors > (and who would?), but we probably can't avoid it at least for right > now. OK, good. Then we have a way forward that nobody's too upset about. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Images storing techniques
On 29.03.23 23:29, Riccardo Gobbo wrote: Question: for better performance is it better to store images as BYTEA or convert every image in base64 and store the generated string (so in html it's enough to insert the base64 string in the tag)? Converting an image to base64 would use about 30% more memory than storing the image's bytes directly, but I don't know if working with characters rather than bytes could have more pros than cons.

Storing as bytea is better.
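A minimal sketch of the bytea approach (the table and column names are just examples); any base64 encoding then happens client-side at presentation time, if it is needed at all:

CREATE TABLE images (
    id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    filename text NOT NULL,
    data     bytea NOT NULL
);

-- In practice the bytes are bound as a parameter from the client;
-- a tiny hex literal is used here purely for illustration.
INSERT INTO images (filename, data) VALUES ('tiny.gif', '\x47494638'::bytea);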
Re: Schema variables - new implementation for Postgres 15
On 30.03.23 10:49, Pavel Stehule wrote: If I reorganize the patch to the following structure, can be it useful for you? 1. really basic functionality (no temporary variables, no def expressions, no memory cleaning) SELECT variable LET should be supported + doc, + related tests. 2. support for temporary variables (session, transaction scope), memory cleaning at the end of transaction 3. PL/pgSQL support 4. pg_dump 5. shadowing warning 6. ... others ... That seems like an ok approach. The pg_dump support should probably go into the first patch, so it's self-contained.
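For context, the "really basic functionality" of patch 1 boils down to something like the following. This uses the syntax proposed by the patch series under review (not committed, and the details may still change), with a made-up variable name:

-- Sketch of the proposed session-variable syntax from this thread.
CREATE VARIABLE public.current_tenant AS integer;

LET public.current_tenant = 42;   -- assign a value for this session

SELECT public.current_tenant;     -- read it back, e.g. from queries or views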
Re: Initial Schema Sync for Logical Replication
On Thu, Mar 30, 2023 at 12:18 AM Masahiko Sawada wrote: > > On Wed, Mar 29, 2023 at 7:57 PM Kumar, Sachin wrote: > > > > > > > > From: Amit Kapila > > > > > > > I think we won't be able to use same snapshot because the > > > > > > > transaction will be committed. > > > > > > > In CreateSubscription() we can use the transaction snapshot from > > > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am > > > > > > > not sure about this part maybe walrcv_disconnect() calls the > > > > > > > commits > > > internally ?). > > > > > > > So somehow we need to keep this snapshot alive, even after > > > > > > > transaction is committed(or delay committing the transaction , > > > > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so we > > > > > > > can have a restart before tableSync is able to use the same > > > > > > > snapshot.) > > > > > > > > > > > > > > > > > > > Can we think of getting the table data as well along with schema > > > > > > via pg_dump? Won't then both schema and initial data will > > > > > > correspond to the same snapshot? > > > > > > > > > > Right , that will work, Thanks! > > > > > > > > While it works, we cannot get the initial data in parallel, no? > > > > > > > > I was thinking each TableSync process will call pg_dump --table, This way > > if we have N > > tableSync process, we can have N pg_dump --table=table_name called in > > parallel. > > In fact we can use --schema-only to get schema and then let COPY take care > > of data > > syncing . We will use same snapshot for pg_dump as well as COPY table. > > How can we postpone creating the pg_subscription_rel entries until the > tablesync worker starts and does the schema sync? I think that since > pg_subscription_rel entry needs the table OID, we need either to do > the schema sync before creating the entry (i.e, during CREATE > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The > apply worker needs the information of tables to sync in order to > launch the tablesync workers, but it needs to create the table schema > to get that information. For the above reason, I think that step 6 of the initial proposal won't work. If we can have the tablesync worker create an entry of pg_subscription_rel after creating the table, it may give us the flexibility to perform the initial sync. One idea is that we add a relname field to pg_subscription_rel so that we can create entries with relname instead of OID if the table is not created yet. Once the table is created, we clear the relname field and set the OID of the table instead. It's not an ideal solution but we might make it simpler later. Assuming that it's feasible, I'm considering another approach for the initial sync in order to address the concurrent DDLs. The basic idea is to somewhat follow how pg_dump/restore to dump/restore the database data. We divide the synchronization phase (including both schema and data) up into three phases: pre-data, table-data, post-data. These mostly follow the --section option of pg_dump. 1. The backend process performing CREATE SUBSCRIPTION creates the subscription but doesn't create pg_subscription_rel entries yet. 2. Before starting the streaming, the apply worker fetches the table list from the publisher, create pg_subscription_rel entries for them, and dumps+restores database objects that tables could depend on but don't depend on tables such as TYPE, OPERATOR, ACCESS METHOD etc (i.e. pre-data). 3. The apply worker launches the tablesync workers for tables that need to be synchronized. 
There might be DDLs executed on the publisher for tables before the tablesync worker starts. But the apply worker needs to apply DDLs for pre-data database objects. OTOH, it can ignore DDLs for not-synced-yet tables and other database objects such as INDEX, TRIGGER, RULE, etc (i.e. post-data). 4. The tablesync worker creates its replication slot, dumps+restores the table schema, update the pg_subscription_rel, and perform COPY. These operations should be done in the same transaction. 5. After finishing COPY, the tablesync worker dumps indexes (and perhaps constraints) of the table and creates them (which possibly takes a long time). Then it starts to catch up, same as today. The apply worker needs to wait for the tablesync worker to catch up. We need to repeat these steps until we complete the initial data copy and create indexes for all tables, IOW until all pg_subscription_rel status becomes READY. 6. If the apply worker confirms all tables are READY, it starts another sync worker who is responsible for the post-data database objects such as TRIGGER, RULE, POLICY etc (i.e. post-data). While the sync worker is starting up or working, the apply worker applies changes for pre-data database objects as well as READY tables. 7. Similar to the tablesync worker, this sync worker creates its replication slot and sets the returned LSN somewhere, say pg_subscription. 8. The sync worker dumps and restores these
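As a side note for readers, the per-table state the steps above keep referring to is what pg_subscription_rel already tracks today; the proposal would extend when and how its entries are created. A quick way to inspect it:

-- 'r' (READY) means the initial sync for that table has finished;
-- other states mean the copy or catch-up phase is still in progress.
SELECT srsubid, srrelid::regclass AS relation, srsubstate
FROM pg_subscription_rel;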
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On Fri, Mar 24, 2023 at 5:47 PM Jacob Champion wrote: > Okay, but this is walking back from the network example you just > described upthread. Do you still consider that in scope, or...? Sorry, I don't know which example you mean. > > If machines B and C aren't under our control such that we can > > configure them that way, then the configuration is fundamentally > > insecure in a way that we can't really fix. > > Here's probably our biggest point of contention. You're unlikely to > convince me that this is the DBA's fault. > > If machines B and C aren't under our control, then our *protocol* is > fundamentally insecure in a way that we have the ability to fix, in a > way that's already been characterized in security literature. I guess I wouldn't have a problem blaming the DBA here, but you seem to be telling me that the security literature has settled on another kind of approach, and I'm not in a position to dispute that. It still feels weird to me, though. -- Robert Haas EDB: http://www.enterprisedb.com
Re: JsonPath version bits
Hi hackers! Could the 1 byte from the JsonPath header be used to store version? Or how many bits from the header could be used for the version value? On Mon, Mar 27, 2023 at 12:54 PM Nikita Malakhov wrote: > Hi hackers! > > I've got a question on the JsonPath header - currently the header size > is 4 bytes, where there are version and mode bits. Is there somewhere > a defined size of the version part? There are some extensions working > with JsonPath, and we have some too, thus it is important how many > bits is it possible to use to store a version value? > > -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
Re: Request for comment on setting binary format output per session
On Wed, Mar 29, 2023 at 11:04 AM Jeff Davis wrote: > I'm not clear on what proposal you are making and/or endorsing? > ha -- was just backing up dave's GUC idea. > 1. Fix our own clients, like psql, to check for binary data they can't > process. > This ought to be impossible IMO. All libpq routines except PQexec have an explicit expectation on format (via the resultformat parameter) that should not be overridden. PQexec ought to be explicitly documented and wired to only request text format data. resultformat can be extended now or later to allow participating clients to receive the GUC-configured format. I do not think that libpq's result format being able to be overridden by GUC is a good idea at all; the library has to participate, and I think it can be made to do so without adjusting the interface (say, by resultFormat = 3). Similarly, in the JDBC world, it ought to be up to the driver to determine when it wants the server to flex wire formats, but it must be able to override the server's decision. merlin
Re: Support logical replication of DDLs
On Thu, 30 Mar 2023 at 13:29, houzj.f...@fujitsu.com wrote: > > > > > -Original Message- > > From: houzj.f...@fujitsu.com > > Sent: Thursday, March 30, 2023 2:37 PM > > > > On Tuesday, March 28, 2023 12:13 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Monday, March 27, 2023 8:08 PM Amit Kapila > > > > > wrote: > > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane wrote: > > > > > > > > > > > > > > > > > I suggest taking a couple of steps back from the minutiae of the > > > > > > patch, and spending some hard effort thinking about how the thing > > > > > > would be controlled in a useful fashion (that is, a real design > > > > > > for the filtering that was mentioned at the very outset), and > > > > > > about the security issues, and about how we could get to a > > committable > > > patch. > > > > > > > > > > > > > > > > Agreed. I'll try to summarize the discussion we have till now on > > > > > this and share my thoughts on the same in a separate email. > > > > > > > > > > > > > The idea to control what could be replicated is to introduce a new > > > > publication option 'ddl' along with current options 'publish' and > > > > 'publish_via_partition_root'. The values of this new option could be > > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of > > > > all supported DDL commands. Example usage for this would be: > > > > Example: > > > > Create a new publication with all ddl replication enabled: > > > > CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all'); > > > > > > > > Enable table ddl replication for an existing Publication: > > > > ALTER PUBLICATION pub2 SET (ddl = 'table'); > > > > > > > > This is what seems to have been discussed but I think we can even > > > > extend it to support based on operations/commands, say one would like > > > > to publish only 'create' and 'drop' of tables. Then we can extend the > > > > existing publish option to have values like 'create', 'alter', and > > > > 'drop'. > > > > > > > > Another thing we are considering related to this is at what level > > > > these additional options should be specified. We have three variants > > > > FOR TABLE, FOR ALL TABLES, and FOR TABLES IN SCHEMA that enables > > > > replication. Now, for the sake of simplicity, this new option is > > > > discussed to be provided only with FOR ALL TABLES variant but I think > > > > we can provide it with other variants with some additional > > > > restrictions like with FOR TABLE, we can only specify 'alter' and > > > > 'drop' for publish option. Now, though possible, it brings additional > > > > complexity to support it with variants other than FOR ALL TABLES > > > > because then we need to ensure additional filtering and possible > > > > modification of the content we have to send to downstream. So, we can > > even > > > decide to first support it only FOR ALL TABLES variant. > > > > > > > > The other point to consider for publish option 'ddl = table' is > > > > whether we need to allow replicating dependent objects like say some > > > > user-defined type is used in the table. I guess the difficulty here > > > > would be to identify which dependents we want to allow. > > > > > > > > I think in the first version we should allow to replicate only some of > > > > the objects instead of everything. For example, can we consider only > > > > allowing tables and indexes in the first version? Then extend it in a > > > > phased > > > manner? 
> > > > > > I think supporting table related stuff in the first version makes sense > > > and the > > > patch size could be reduced to a suitable size. > > > > Based on the discussion, I split the patch into four parts: Table DDL > > replication(0001,0002), Index DDL replication(0003), ownership stuff for > > table > > and index(0004), other DDL's replication(0005). > > > > In this version, I mainly tried to split the patch set, and there are few > > OPEN items we need to address later: > > > > 1) The current publication "ddl" option only have two values: table, all. We > >also need to add index and maybe other objects in the list. > > > > 2) Need to improve the syntax stuff. Currently, we store the option value of > >the "with (ddl = xx)" via different columns in the catalog, the > >catalog(pg_publication) will have more and more columns as we add > > support > >for logical replication of other objects in the future. We could store > > it as > >an text array instead. > > > >OTOH, since we have proposed some other more flexible syntax to -hackers, > > the current > >syntax might be changed which might also solve this problem. > > > > 3) The test_ddl_deparse_regress test module is not included in the set, > > because > >I think we also need to split it into table stuff, index stuff and > > others, > >we can share it after finishing that. > > > > 4) The patch set could be spitted further to make it easier
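Putting the pieces of the proposal quoted above together, the user-facing surface being discussed would look roughly like this. None of it is committed syntax, the option names and values are still being debated, and the last line is purely an illustration of the operation-level filtering idea floated in the thread:

-- Replicate all supported DDL for an all-tables publication.
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (ddl = 'all');

-- Restrict an existing publication to table DDL only.
ALTER PUBLICATION pub2 SET (ddl = 'table');

-- Possible further refinement floated above (hypothetical, not proposed verbatim):
-- CREATE PUBLICATION pub3 FOR ALL TABLES WITH (ddl = 'table', publish = 'create, drop');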
Re: "variable not found in subplan target list"
On 2023-Mar-29, Amit Langote wrote: > On Wed, Mar 29, 2023 at 3:39 AM Tom Lane wrote: > > Alvaro Herrera writes: > > > So I'm back home and found a couple more weird errors in the log: > > > > > ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 > > > DETALLE: plan node relids (b 1), pruneinfo relids (b 36) > > > > This one reproduces for me. > > I've looked into this one and the attached patch fixes it for me. > Turns out set_plan_refs()'s idea of when the entries from > PlannerInfo.partPruneInfos are transferred into > PlannerGlobal.partPruneInfo was wrong. Thanks for the patch. I've pushed it to github for CI testing, and if there are no problems I'll put it in. > Though, I wonder if we need to keep ec386948948 that introduced the > notion of part_prune_index around if the project that needed it [1] > has moved on to an entirely different approach altogether, one that > doesn't require hacking up the pruning code. Hmm, that's indeed tempting. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Support logical replication of DDLs
On Wed, Mar 29, 2023 at 10:23 PM Zheng Li wrote: > > On Wed, Mar 29, 2023 at 5:13 AM Amit Kapila wrote: > > > > On Wed, Mar 29, 2023 at 2:49 AM Zheng Li wrote: > > > > > > > > > I agree that a full fledged DDL deparser and DDL replication is too > > > big of a task for one patch. I think we may consider approaching this > > > feature in the following ways: > > > 1. Phased development and testing as discussed in other emails. > > > Probably support table commands first (as they are the most common > > > DDLs), then the other commands in multiple phases. > > > 2. Provide a subscription option to receive the DDL change, raise a > > > notice and to skip applying the change. The users can listen to the > > > DDL notice and implement application logic to apply the change if > > > needed. The idea is we can start gathering user feedback by providing > > > a somewhat useful feature (compared to doing nothing about DDLs), but > > > also avoid heading straight into the potential footgun situation > > > caused by automatically applying any mal-formatted DDLs. > > > > > > > Doesn't this mean that we still need to support deparsing of such DDLs > > which is what I think we don't want? > > I think we can send the plain DDL command string and the search_path > if we don't insist on applying it in the first version. Maybe the > deparser can be integrated later when we're confident that it's > ready/subset of it is ready. > I think this will have overhead for users that won't need it and we have to anyway remove it later when deparsing of such commands is added. Personally, I don't think we need to do this to catch the apply errors. -- With Regards, Amit Kapila.
Re: Support logical replication of DDLs
On Thu, Mar 30, 2023 at 3:16 PM vignesh C wrote: > > On Thu, 30 Mar 2023 at 13:29, houzj.f...@fujitsu.com > wrote: > > > > > > > > > -Original Message- > > > From: houzj.f...@fujitsu.com > > > Sent: Thursday, March 30, 2023 2:37 PM > > > > > > On Tuesday, March 28, 2023 12:13 PM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > On Monday, March 27, 2023 8:08 PM Amit Kapila > > > > > > > wrote: > > > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane wrote: > > > > > > > > > > > > > > > > > > > > I suggest taking a couple of steps back from the minutiae of the > > > > > > > patch, and spending some hard effort thinking about how the thing > > > > > > > would be controlled in a useful fashion (that is, a real design > > > > > > > for the filtering that was mentioned at the very outset), and > > > > > > > about the security issues, and about how we could get to a > > > committable > > > > patch. > > > > > > > > > > > > > > > > > > > Agreed. I'll try to summarize the discussion we have till now on > > > > > > this and share my thoughts on the same in a separate email. > > > > > > > > > > > > > > > > The idea to control what could be replicated is to introduce a new > > > > > publication option 'ddl' along with current options 'publish' and > > > > > 'publish_via_partition_root'. The values of this new option could be > > > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of > > > > > all supported DDL commands. Example usage for this would be: > > > > > Example: > > > > > Create a new publication with all ddl replication enabled: > > > > > CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all'); > > > > > > > > > > Enable table ddl replication for an existing Publication: > > > > > ALTER PUBLICATION pub2 SET (ddl = 'table'); > > > > > > > > > > This is what seems to have been discussed but I think we can even > > > > > extend it to support based on operations/commands, say one would like > > > > > to publish only 'create' and 'drop' of tables. Then we can extend the > > > > > existing publish option to have values like 'create', 'alter', and > > > > > 'drop'. > > > > > > > > > > Another thing we are considering related to this is at what level > > > > > these additional options should be specified. We have three variants > > > > > FOR TABLE, FOR ALL TABLES, and FOR TABLES IN SCHEMA that enables > > > > > replication. Now, for the sake of simplicity, this new option is > > > > > discussed to be provided only with FOR ALL TABLES variant but I think > > > > > we can provide it with other variants with some additional > > > > > restrictions like with FOR TABLE, we can only specify 'alter' and > > > > > 'drop' for publish option. Now, though possible, it brings additional > > > > > complexity to support it with variants other than FOR ALL TABLES > > > > > because then we need to ensure additional filtering and possible > > > > > modification of the content we have to send to downstream. So, we can > > > even > > > > decide to first support it only FOR ALL TABLES variant. > > > > > > > > > > The other point to consider for publish option 'ddl = table' is > > > > > whether we need to allow replicating dependent objects like say some > > > > > user-defined type is used in the table. I guess the difficulty here > > > > > would be to identify which dependents we want to allow. 
> > > > > > > > > > I think in the first version we should allow to replicate only some of > > > > > the objects instead of everything. For example, can we consider only > > > > > allowing tables and indexes in the first version? Then extend it in a > > > > > phased > > > > manner? > > > > > > > > I think supporting table related stuff in the first version makes sense > > > > and the > > > > patch size could be reduced to a suitable size. > > > > > > Based on the discussion, I split the patch into four parts: Table DDL > > > replication(0001,0002), Index DDL replication(0003), ownership stuff for > > > table > > > and index(0004), other DDL's replication(0005). > > > > > > In this version, I mainly tried to split the patch set, and there are few > > > OPEN items we need to address later: > > > > > > 1) The current publication "ddl" option only have two values: table, all. > > > We > > >also need to add index and maybe other objects in the list. > > > > > > 2) Need to improve the syntax stuff. Currently, we store the option value > > > of > > >the "with (ddl = xx)" via different columns in the catalog, the > > >catalog(pg_publication) will have more and more columns as we add > > > support > > >for logical replication of other objects in the future. We could store > > > it as > > >an text array instead. > > > > > >OTOH, since we have proposed some other more flexible syntax to > > > -hackers, > > > the current > > >syntax might be changed which might also solve this problem. > > > > > >
Re: PGdoc: add ID attribute to create_publication.sgml
On Thu, Mar 30, 2023 at 9:22 AM Peter Smith wrote: > > I have marked the CF entry for this patch as "ready for committer" > LGTM. I'll push this tomorrow unless there are more comments for it. -- With Regards, Amit Kapila.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 30 Mar 2023 at 10:07, Denis Laxalde wrote: > Patch 5 is missing respective changes; please find attached a fixup > patch for these. Thanks, attached are newly rebased patches that include this change. I also cast the result of PQcancelSend to void in the one case where it's ignored on purpose. Note that the patchset shrunk by one, since the original patch 0002 has been committed now. v19-0002-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v19-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data v19-0003-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v19-0004-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
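For readers skimming the thread, a minimal sketch of how a caller would use the proposed API, based on the v19 patches attached above (this API is not committed yet, so names and semantics may still change):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Sketch only: PGcancelConn, PQcancelConn(), PQcancelSend(),
     * PQcancelErrorMessage() and PQcancelFinish() are introduced by the
     * patches in this thread and are not part of released libpq. */
    static void
    cancel_current_query(PGconn *conn)
    {
        PGcancelConn *cancelConn = PQcancelConn(conn);

        if (!PQcancelSend(cancelConn))      /* returns 1 on success, 0 on failure */
            fprintf(stderr, "could not cancel: %s",
                    PQcancelErrorMessage(cancelConn));

        PQcancelFinish(cancelConn);         /* releases the PGcancelConn */
    }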
Re: [EXTERNAL] Support load balancing in libpq
> On 30 Mar 2023, at 10:21, Daniel Gustafsson wrote: > >> On 30 Mar 2023, at 10:00, Julien Rouhaud wrote: >> >> On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson wrote: >>> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) wrote: >>> While checking the buildfarm, I found a failure on NetBSD caused by the added code[1]: >>> >>> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc >>> 4.7.2) >>> has the same error. I'll look into it today to get a fix committed. >> >> This is an i686 machine, so it probably has the same void * >> difference. Building with -m32 might be enough to reproduce the >> problem. > > Makes sense. I think the best option is to simply remove conn from being part > of the seed and rely on the other values. Will apply that after a testrun. After some offlist discussion I ended up pushing the proposed uintptr_t fix instead, now waiting for these animals to build. -- Daniel Gustafsson
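The committed change itself is not quoted in this thread, but the shape of the uintptr_t fix being discussed is a cast through uintptr_t so that a 32-bit pointer widens to the 64-bit seed without the "cast from pointer to integer of different size" warning. A small illustration, with the helper name invented for this sketch:

    #include <stdint.h>

    /* Hypothetical helper: mix a connection pointer into a 64-bit PRNG seed.
     * Casting to uintptr_t first is what keeps 32-bit builds (e.g. the i686
     * and NetBSD animals mentioned above) warning-free. */
    static uint64_t
    mix_pointer_into_seed(uint64_t seed, const void *conn)
    {
        return seed ^ (uint64_t) (uintptr_t) conn;
    }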
Re: Support logical replication of DDLs
On Thu, 30 Mar 2023 at 13:29, houzj.f...@fujitsu.com wrote: > > > > > -Original Message- > > From: houzj.f...@fujitsu.com > > Sent: Thursday, March 30, 2023 2:37 PM > > > > On Tuesday, March 28, 2023 12:13 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Monday, March 27, 2023 8:08 PM Amit Kapila > > > > > wrote: > > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane wrote: > > > > > > > > > > > > > > > > > I suggest taking a couple of steps back from the minutiae of the > > > > > > patch, and spending some hard effort thinking about how the thing > > > > > > would be controlled in a useful fashion (that is, a real design > > > > > > for the filtering that was mentioned at the very outset), and > > > > > > about the security issues, and about how we could get to a > > committable > > > patch. > > > > > > > > > > > > > > > > Agreed. I'll try to summarize the discussion we have till now on > > > > > this and share my thoughts on the same in a separate email. > > > > > > > > > > > > > The idea to control what could be replicated is to introduce a new > > > > publication option 'ddl' along with current options 'publish' and > > > > 'publish_via_partition_root'. The values of this new option could be > > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of > > > > all supported DDL commands. Example usage for this would be: > > > > Example: > > > > Create a new publication with all ddl replication enabled: > > > > CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all'); > > > > > > > > Enable table ddl replication for an existing Publication: > > > > ALTER PUBLICATION pub2 SET (ddl = 'table'); > > > > > > > > This is what seems to have been discussed but I think we can even > > > > extend it to support based on operations/commands, say one would like > > > > to publish only 'create' and 'drop' of tables. Then we can extend the > > > > existing publish option to have values like 'create', 'alter', and > > > > 'drop'. > > > > > > > > Another thing we are considering related to this is at what level > > > > these additional options should be specified. We have three variants > > > > FOR TABLE, FOR ALL TABLES, and FOR TABLES IN SCHEMA that enables > > > > replication. Now, for the sake of simplicity, this new option is > > > > discussed to be provided only with FOR ALL TABLES variant but I think > > > > we can provide it with other variants with some additional > > > > restrictions like with FOR TABLE, we can only specify 'alter' and > > > > 'drop' for publish option. Now, though possible, it brings additional > > > > complexity to support it with variants other than FOR ALL TABLES > > > > because then we need to ensure additional filtering and possible > > > > modification of the content we have to send to downstream. So, we can > > even > > > decide to first support it only FOR ALL TABLES variant. > > > > > > > > The other point to consider for publish option 'ddl = table' is > > > > whether we need to allow replicating dependent objects like say some > > > > user-defined type is used in the table. I guess the difficulty here > > > > would be to identify which dependents we want to allow. > > > > > > > > I think in the first version we should allow to replicate only some of > > > > the objects instead of everything. For example, can we consider only > > > > allowing tables and indexes in the first version? Then extend it in a > > > > phased > > > manner? 
> > > > > > I think supporting table related stuff in the first version makes sense > > > and the > > > patch size could be reduced to a suitable size. > > > > Based on the discussion, I split the patch into four parts: Table DDL > > replication(0001,0002), Index DDL replication(0003), ownership stuff for > > table > > and index(0004), other DDL's replication(0005). > > > > In this version, I mainly tried to split the patch set, and there are few > > OPEN items we need to address later: > > > > 1) The current publication "ddl" option only have two values: table, all. We > >also need to add index and maybe other objects in the list. > > > > 2) Need to improve the syntax stuff. Currently, we store the option value of > >the "with (ddl = xx)" via different columns in the catalog, the > >catalog(pg_publication) will have more and more columns as we add > > support > >for logical replication of other objects in the future. We could store > > it as > >an text array instead. > > > >OTOH, since we have proposed some other more flexible syntax to -hackers, > > the current > >syntax might be changed which might also solve this problem. > > > > 3) The test_ddl_deparse_regress test module is not included in the set, > > because > >I think we also need to split it into table stuff, index stuff and > > others, > >we can share it after finishing that. > > > > 4) The patch set could be spitted further to make it easier
Re: Images storing techniques
I would suggest storing your images in the file system and then storing only the paths to those images in the database. You can keep the files and the database entries in sync via triggers/stored procedures (possibly written in Python, since PL/pgSQL doesn't let you interact with the file system). On Thu, Mar 30, 2023, 11:22 Riccardo Gobbo < riccardo.gobb...@studenti.unipd.it> wrote: > Good evening, > I'm a Master degree student at University of Padua in Italy and I'm > developing a web application as assignment for the Web application course. > > Context: the Web application that my group is developing would ideally be > used to manage county side fairs where there would be foods and drinks, > these displayed into a digital menu. > The application uses postgre to implement a database where stores data, > mostly strings as emails and orders but also some images (representing the > dishes). > The web pages are created using java servlets and jbc > > Question: for better performance is it better to store images as BYTEA or > convert every image in base64 and store the generated string (so in html > it's enough to insert the base64 string in the tag)? > Converting an image in base64 would use a 30% more memory than storing > directly the image's bytes, but I don't know if working with characters > rather than bytes could have more prons than cons > > Thank for the time you dedicated for the answer and I apologize both for > disturbing you and my English. > > Best regards, Riccardo. > > Computer Engineering > Mat. 2082156 >
Images storing techniques
Good evening, I'm a Master's degree student at the University of Padua in Italy and I'm developing a web application as an assignment for the Web Applications course. Context: the web application that my group is developing would ideally be used to manage countryside fairs offering food and drinks, displayed in a digital menu. The application uses PostgreSQL to implement a database where it stores data, mostly strings such as emails and orders, but also some images (representing the dishes). The web pages are created using Java servlets and JDBC. Question: for better performance, is it better to store images as BYTEA, or to convert every image to base64 and store the generated string (so in HTML it's enough to insert the base64 string in the <img> tag)? Converting an image to base64 uses about 30% more memory than storing the image's bytes directly, but I don't know whether working with characters rather than bytes has more pros than cons. Thanks for the time you dedicated to the answer, and I apologize both for disturbing you and for my English. Best regards, Riccardo. Computer Engineering Mat. 2082156
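For the BYTEA route, the key point is to pass the raw bytes as a binary parameter rather than a base64 text string. A minimal libpq sketch follows (C is used here since that is the list's usual language; with JDBC the equivalent is PreparedStatement.setBytes(), and the table and column names below are invented):

    #include <libpq-fe.h>

    /* Sketch: insert raw image bytes into a bytea column, no base64 step. */
    static int
    store_image(PGconn *conn, const char *imgbytes, int imglen)
    {
        const char *values[1] = { imgbytes };
        int         lengths[1] = { imglen };
        int         formats[1] = { 1 };     /* 1 = binary parameter */
        PGresult   *res;
        int         ok;

        res = PQexecParams(conn,
                           "INSERT INTO dish (photo) VALUES ($1)",
                           1, NULL, values, lengths, formats, 0);
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        PQclear(res);
        return ok;
    }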
Re: Schema variables - new implementation for Postgres 15
Hi st 29. 3. 2023 v 12:17 odesílatel Peter Eisentraut < peter.eisentr...@enterprisedb.com> napsal: > On 24.03.23 08:04, Pavel Stehule wrote: > > Maybe I can divide the patch 0002-session-variables to three sections - > > related to memory management, planning and execution? > > Personally, I find the existing split not helpful. There is no value > (to me) in putting code, documentation, and tests in three separate > patches. This is in fact counter-helpful (to me). Things like the > DISCARD command (0005) and the error messages changes (0009) can be > separate patches, but most of the rest should probably be a single patch. > > I know you have been asked earlier in the thread to provide smaller > patches, so don't change it just for me, but this is my opinion. > If I reorganize the patch into the following structure, would it be useful for you? 1. really basic functionality (no temporary variables, no default expressions, no memory cleaning); SELECT of a variable and LET should be supported, plus docs and related tests. 2. support for temporary variables (session and transaction scope), memory cleaning at the end of a transaction 3. PL/pgSQL support 4. pg_dump 5. shadowing warning 6. ... others ... Would that be better for you? Regards Pavel
Re: [EXTERNAL] Support load balancing in libpq
> On 30 Mar 2023, at 10:00, Julien Rouhaud wrote: > > On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson wrote: >> >>> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) >>> wrote: >> >>> While checking the buildfarm, I found a failure on NetBSD caused by the >>> added code[1]: >> >> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc >> 4.7.2) >> has the same error. I'll look into it today to get a fix committed. > > This is an i686 machine, so it probably has the same void * > difference. Building with -m32 might be enough to reproduce the > problem. Makes sense. I think the best option is to simply remove conn from being part of the seed and rely on the other values. Will apply that after a testrun. -- Daniel Gustafsson
RE: doc: add missing "id" attributes to extension packaging page
Dear Brar, Thank you for updating the patch. The patch looks good to me. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema a écrit : > On Wed, 29 Mar 2023 at 10:43, Denis Laxalde wrote: > > More importantly, not having PQcancelSend() creating the PGcancelConn > > makes reuse of that value, passing through PQcancelReset(), more > > intuitive. E.g., in the tests: > > You convinced me. Attached is an updated patch where PQcancelSend > takes the PGcancelConn and returns 1 or 0. Patch 5 is missing respective changes; please find attached a fixup patch for these. >From c9e59fb3e30db1bfab75be9fdd4afbc227a5270e Mon Sep 17 00:00:00 2001 From: Denis Laxalde Date: Thu, 30 Mar 2023 09:19:18 +0200 Subject: [PATCH] fixup! Start using new libpq cancel APIs --- contrib/dblink/dblink.c | 4 ++-- src/fe_utils/connect_utils.c | 4 +++- src/test/isolation/isolationtester.c | 4 ++-- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index e139f66e11..073795f088 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1332,11 +1332,11 @@ dblink_cancel_query(PG_FUNCTION_ARGS) dblink_init(); conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0))); - cancelConn = PQcancelSend(conn); + cancelConn = PQcancelConn(conn); PG_TRY(); { - if (PQcancelStatus(cancelConn) == CONNECTION_BAD) + if (!PQcancelSend(cancelConn)) { msg = pchomp(PQcancelErrorMessage(cancelConn)); } diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c index b32448c010..1cfd717217 100644 --- a/src/fe_utils/connect_utils.c +++ b/src/fe_utils/connect_utils.c @@ -161,7 +161,9 @@ disconnectDatabase(PGconn *conn) if (PQtransactionStatus(conn) == PQTRANS_ACTIVE) { - PQcancelFinish(PQcancelSend(conn)); + PGcancelConn *cancelConn = PQcancelConn(conn); + PQcancelSend(cancelConn); + PQcancelFinish(cancelConn); } PQfinish(conn); diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c index 3781f7982b..de31a87571 100644 --- a/src/test/isolation/isolationtester.c +++ b/src/test/isolation/isolationtester.c @@ -946,9 +946,9 @@ try_complete_step(TestSpec *testspec, PermutationStep *pstep, int flags) */ if (td > max_step_wait && !canceled) { -PGcancelConn *cancel_conn = PQcancelSend(conn); +PGcancelConn *cancel_conn = PQcancelConn(conn); -if (PQcancelStatus(cancel_conn) == CONNECTION_OK) +if (PQcancelSend(cancel_conn)) { /* * print to stdout not stderr, as this should appear in -- 2.30.2
Re: Schema variables - new implementation for Postgres 15
Hi ne 26. 3. 2023 v 19:44 odesílatel Dmitry Dolgov <9erthali...@gmail.com> napsal: > > On Fri, Mar 24, 2023 at 08:04:08AM +0100, Pavel Stehule wrote: > > čt 23. 3. 2023 v 19:54 odesílatel Pavel Stehule > > > napsal: > > > > > čt 23. 3. 2023 v 16:33 odesílatel Peter Eisentraut < > > > peter.eisentr...@enterprisedb.com> napsal: > > > > > >> The other issue is that by its nature this patch adds a lot of code > in a > > >> lot of places. Large patches are more likely to be successful if they > > >> add a lot of code in one place or smaller amounts of code in a lot of > > >> places. But this patch does both and it's just overwhelming. There > is > > >> so much new internal functionality and terminology. Variables can be > > >> created, registered, initialized, stored, copied, prepared, set, > freed, > > >> removed, released, synced, dropped, and more. I don't know if anyone > > >> has actually reviewed all that in detail. > > >> > > >> Has any effort been made to make this simpler, smaller, reduce scope, > > >> refactoring, find commonalities with other features, try to manage the > > >> complexity somehow? > > >> > > > I agree that this patch is large, but almost all code is simple. > Complex > > > code is "only" in 0002-session-variables.patch (113KB/438KB). > > > > > > Now, I have no idea how the functionality can be sensibly reduced or > > > divided (no without significant performance loss). I see two difficult > > > points in this code: > > > > > > 1. when to clean memory. The code implements cleaning very accurately - > > > and this is unique in Postgres. Partially I implement some > functionality of > > > storage manager. Probably no code from Postgres can be reused, because > > > there is not any support for global temporary objects. Cleaning based > on > > > sinval messages processing is difficult, but there is nothing else. > The > > > code is a little bit more complex, because there are three types of > session > > > variables: a) session variables, b) temp session variables, c) session > > > variables with transaction scope. Maybe @c can be removed, and maybe we > > > don't need to support not null default (this can simplify > initialization). > > > What do you think about it? > > > > > > 2. how to pass a variable's value to the executor. The implementation > is > > > based on extending the Param node, but it cannot reuse query params > buffers > > > and implements own. > > > But it is hard to simplify code, because we want to support usage > > > variables in queries, and usage in PL/pgSQL expressions too. And both > are > > > processed differently. > > > > > > > Maybe I can divide the patch 0002-session-variables to three sections - > > related to memory management, planning and execution? > > I agree, the patch scale is a bit overwhelming. It's worth noting that > due to the nature of this change certain heavy lifting has to be done in > any case, plus I've got an impression that some part of the patch are > quite solid (although I haven't reviewed everything, did anyone achieve > that milestone?). But still, it would be of great help to simplify the > current implementation, and I'm afraid the only way of doing this is to > make trades-off about functionality vs change size & complexity. > There is not too much space for reduction - more - sometimes there is code reuse between features. 
I can reduce temporary session variables, but the same AtSubXact routines are used by the memory-purging routines, so only by dropping all dependent features would you get an interesting number of removed lines. I can imagine a very reduced feature set like 1) no temporary variables, no reset at transaction end 2) no default expressions - the default is null 3) direct memory cleaning on drop (without the possibility of a saved value after a reverted drop), or always cleaning at session end Note - @1 and @3 share code. This reduced implementation can still be useful. It probably doesn't remove too much code, but it can remove non-trivial code. I believe almost all of the remaining code will be close to trivial. > > Maybe instead splitting the patch into implementation components, it's > possible to split it feature-by-feature, where every single patch would > represent an independent (to a certain degree) functionality? I have in > mind something like: catalog changes; base implementation; ACL support; > xact actions implementation (on commit drop, etc); variables with > default value; shadowing; etc. If such approach is possible, it will > give us: flexibility to apply only a subset of the whole patch series; > some understanding how much complexity is coming from each feature. What > do you think about this idea? > I think cleaning and dropping can be moved to a separate patch. ACL support uses generic infrastructure (it is only a few lines). Patch 02 can be split - I am not sure how these parts can be independent. I'll try to split this patch, and we will see if it is better. > I also recall
Re: [EXTERNAL] Support load balancing in libpq
On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson wrote: > > > On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) > > wrote: > > > While checking the buildfarm, I found a failure on NetBSD caused by the > > added code[1]: > > Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc > 4.7.2) > has the same error. I'll look into it today to get a fix committed. This is an i686 machine, so it probably has the same void * difference. Building with -m32 might be enough to reproduce the problem.
Re: [POC] Allow an extension to add data into Query and PlannedStmt nodes
Hi, On Wed, Mar 29, 2023 at 12:02:30PM +0500, Andrey Lepikhov wrote: > > Previously, we read int this mailing list some controversial opinions on > queryid generation and Jumbling technique. Here we don't intend to solve > these problems but help an extension at least don't conflict with others on > the queryId value. > > Extensions could need to pass some query-related data through all stages of > the query planning and execution. As a trivial example, pg_stat_statements > uses queryid at the end of execution to save some statistics. One more > reason - extensions now conflict on queryid value and the logic of its > changing. With this patch, it can be managed. I just had a quick look at the patch, and IIUC it doesn't really help on that side, as there's still a single official "queryid" that's computed, stored everywhere and later used by pg_stat_statements (which then additionally stores that official queryid). You can currently change the main jumbling algorithm with a custom extension, and all extensions will then use it as the source of truth, but I guess what you want is to e.g. have an additional and semantically different queryid, and create multiple ecosystems of extensions, each using one or the other source of queryid without changing the other ecosystem's behavior. > > This patch introduces the structure 'ExtensionData' which allows to manage > of a list of entries with a couple of interface functions > addExtensionDataToNode() and GetExtensionData(). Keep in mind the possible > future hiding of this structure from the public interface. > An extension should invent a symbolic key to identify its data. It may > invent as many additional keys as it wants but the best option here - is no > more than one entry for each extension. > Usage of this machinery is demonstrated by the pg_stat_statements example - > here we introduced Bigint node just for natively storing of queryId value. > > Ruthless pgbench benchmark shows that we got some overhead: > 1.6% - in default mode > 4% - in prepared mode > ~0.1% in extended mode. That's quite a significant overhead. But the only reason to accept such a change is to actually use it to store additional data, so it would be interesting to see what the overhead is like once you store at least 2 different values there. > - Are we need to invent a registration procedure to do away with the names > of entries and use some compact integer IDs? Note that the patch as proposed doesn't have any defense against two extensions trying to register something with the same name, or update a stored value, as AddExtensionDataToNode() simply prepends the new value to the list. You can actually update the value by just storing the new value, but it will add a significant overhead to every other extension that wants to read another value. The API is also quite limited as each stored value has a single identifier. What if your extension needs to store multiple things? Since it's all based on Node you can't really store some custom struct, so you probably have to end up with things like "my_extension.my_val1", "my_extension.my_val2" which isn't great.
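To make the key-naming concern concrete, usage of the proposed interface might look roughly like the following. This is a sketch only: the exact signatures are defined by the POC patch and may differ from what is shown here, and the key strings are invented.

    /* Hypothetical usage sketch of the proposed ExtensionData interface. */
    void
    my_ext_stash(Query *query, Node *my_value)
    {
        /* each extension invents its own symbolic key(s) */
        addExtensionDataToNode((Node *) query, "my_extension.queryid", my_value);
    }

    Node *
    my_ext_fetch(Query *query)
    {
        return GetExtensionData((Node *) query, "my_extension.queryid");
    }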
Re: Minimal logical decoding on standbys
Hi, On 2023-03-04 12:19:57 +0100, Drouvot, Bertrand wrote: > Subject: [PATCH v52 1/6] Add info in WAL records in preparation for logical > slot conflict handling. > > Overall design: > > 1. We want to enable logical decoding on standbys, but replay of WAL > from the primary might remove data that is needed by logical decoding, > causing error(s) on the standby. To prevent those errors, a new replication > conflict scenario needs to be addressed (as much as hot standby does). > > 2. Our chosen strategy for dealing with this type of replication slot > is to invalidate logical slots for which needed data has been removed. > > 3. To do this we need the latestRemovedXid for each change, just as we > do for physical replication conflicts, but we also need to know > whether any particular change was to data that logical replication > might access. That way, during WAL replay, we know when there is a risk of > conflict and, if so, if there is a conflict. > > 4. We can't rely on the standby's relcache entries for this purpose in > any way, because the startup process can't access catalog contents. > > 5. Therefore every WAL record that potentially removes data from the > index or heap must carry a flag indicating whether or not it is one > that might be accessed during logical decoding. > > Why do we need this for logical decoding on standby? > > First, let's forget about logical decoding on standby and recall that > on a primary database, any catalog rows that may be needed by a logical > decoding replication slot are not removed. > > This is done thanks to the catalog_xmin associated with the logical > replication slot. > > But, with logical decoding on standby, in the following cases: > > - hot_standby_feedback is off > - hot_standby_feedback is on but there is no a physical slot between > the primary and the standby. Then, hot_standby_feedback will work, > but only while the connection is alive (for example a node restart > would break it) > > Then, the primary may delete system catalog rows that could be needed > by the logical decoding on the standby (as it does not know about the > catalog_xmin on the standby). > > So, it’s mandatory to identify those rows and invalidate the slots > that may need them if any. Identifying those rows is the purpose of > this commit. This is a very nice commit message. > Implementation: > > When a WAL replay on standby indicates that a catalog table tuple is > to be deleted by an xid that is greater than a logical slot's > catalog_xmin, then that means the slot's catalog_xmin conflicts with > the xid, and we need to handle the conflict. While subsequent commits > will do the actual conflict handling, this commit adds a new field > isCatalogRel in such WAL records (and a new bit set in the > xl_heap_visible flags field), that is true for catalog tables, so as to > arrange for conflict handling. > > The affected WAL records are the ones that already contain the > snapshotConflictHorizon field, namely: > > - gistxlogDelete > - gistxlogPageReuse > - xl_hash_vacuum_one_page > - xl_heap_prune > - xl_heap_freeze_page > - xl_heap_visible > - xl_btree_reuse_page > - xl_btree_delete > - spgxlogVacuumRedirect > > Due to this new field being added, xl_hash_vacuum_one_page and > gistxlogDelete do now contain the offsets to be deleted as a > FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignement. > It's not needed on the others struct where isCatalogRel has > been added. 
> > Author: Andres Freund (in an older version), Amit Khandekar, Bertrand > Drouvot I think you're first author on this one by now. I think this commit is ready to go. Unless somebody thinks differently, I think I might push it tomorrow. > Subject: [PATCH v52 2/6] Handle logical slot conflicts on standby. > @@ -6807,7 +6808,8 @@ CreateCheckPoint(int flags) >*/ > XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); > KeepLogSeg(recptr, &_logSegNo); > - if (InvalidateObsoleteReplicationSlots(_logSegNo)) > + InvalidateObsoleteReplicationSlots(_logSegNo, , InvalidOid, > NULL); > + if (invalidated) > { > /* >* Some slots have been invalidated; recalculate the old-segment I don't really understand why you changed InvalidateObsoleteReplicationSlots to return void instead of bool, and then added an output boolean argument via a pointer? > @@ -7964,6 +7968,22 @@ xlog_redo(XLogReaderState *record) > /* Update our copy of the parameters in pg_control */ > memcpy(, XLogRecGetData(record), > sizeof(xl_parameter_change)); > > + /* > + * Invalidate logical slots if we are in hot standby and the > primary does not > + * have a WAL level sufficient for logical decoding. No need to > search > + * for potentially conflicting logically slots if standby is > running > + * with wal_level lower than logical, because in that case,
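For readers who have not looked at the patch: the new isCatalogRel flag is filled in on the primary at WAL-record construction time, using the same test that already tells logical decoding whether a relation's tuples might be needed. In current sources that test looks roughly like this (paraphrased from src/include/utils/rel.h):

    /* True when logical decoding could need this relation's tuples:
     * wal_level >= logical, the relation is WAL-logged, and it is either a
     * system catalog or a table marked with user_catalog_table. */
    #define RelationIsAccessibleInLogicalDecoding(relation) \
        (XLogLogicalInfoActive() && \
         RelationNeedsWAL(relation) && \
         (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))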
Re: [EXTERNAL] Support load balancing in libpq
> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) > wrote: > While checking the buildfarm, I found a failure on NetBSD caused by the added > code[1]: Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 4.7.2) has the same error. I'll look into it today to get a fix committed. -- Daniel Gustafsson
Re: ICU locale validation / canonicalization
On 30.03.23 04:33, Jeff Davis wrote: Attached is a new version of the final patch, which performs canonicalization. I'm not 100% sure that it's wanted, but it still seems like a good idea to get the locales into a standard format in the catalogs, and if a lot more people start using ICU in v16 (because it's the default), then it would be a good time to do it. But perhaps there are risks? I say, let's do it. I don't think we should show the notice when the canonicalization doesn't change anything. This is not useful: +NOTICE: using language tag "und-u-kf-upper" for locale "und-u-kf-upper" Also, the message should be phrased more from the perspective of the user instead of using ICU jargon, like NOTICE: using canonicalized form "%s" for locale specification "%s" (Still too many big words?) I don't think the special handling of IsBinaryUpgrade is needed or wanted. I would hope that with this feature, all old-style locale IDs would go away, but this way we would keep them forever. If we believe that canonicalization is safe, then I don't see why we cannot apply it during binary upgrade. Needs documentation updates in doc/src/sgml/charset.sgml.
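Putting the two suggestions above together, the notice logic could look something like this (a sketch only; the variable names are invented):

    /* Only emit the notice when canonicalization actually changed the
     * string, and phrase it for users rather than in ICU jargon. */
    if (strcmp(langtag, locale_str) != 0)
        ereport(NOTICE,
                (errmsg("using canonicalized form \"%s\" for locale specification \"%s\"",
                        langtag, locale_str)));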
Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
On Wed, 29 Mar 2023 21:32:05 -0700 Noah Misch wrote: > On Sun, Jan 22, 2023 at 02:42:46PM -0600, Karl O. Pinc wrote: > > v10-0001-List-trusted-and-obsolete-extensions.patch > > > + > > + These modules and extensions are obsolete: > > + > > + > > + > > + > > + > > + > > Commit a013738 incorporated this change. Since xml2 is the only > in-tree way to use XSLT from SQL, I think xml2 is not obsolete. Some > individual functions, e.g. xml_valid(), are obsolete. (There are > years-old threats to render the module obsolete, but this has never > happened.) Your point seems valid but this is above my station. I have no idea as to how to best resolve this, or even how to make the resolution happen now that the change has been committed. Someone who knows more than me about the situation is needed to change the phrasing, or re-categorize, or rework the xml2 module docs, or come up with new categories of obsolescence-like states, or provide access to libxslt from PG, or something. I am invested in the patch and appreciate being cc-ed. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
pg_basebackup: Correct type of WalSegSz
The pg_basebackup code has WalSegSz as uint32, whereas the rest of the code has it as int. This seems confusing, and using the extra range wouldn't actually work. This was in the original commit (fc49e24fa6), but I suppose it was an oversight.From 43c6b03b0d2cc044989de7722b8fd7b136fded7b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 30 Mar 2023 08:19:04 +0200 Subject: [PATCH] pg_basebackup: Correct type of WalSegSz The pg_basebackup code had WalSegSz as uint32, whereas the rest of the code has it as int. This seems confusing, and using the extra range wouldn't actually work. --- src/bin/pg_basebackup/streamutil.c | 2 +- src/bin/pg_basebackup/streamutil.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c index e7618f4617..15514599c4 100644 --- a/src/bin/pg_basebackup/streamutil.c +++ b/src/bin/pg_basebackup/streamutil.c @@ -31,7 +31,7 @@ #define ERRCODE_DUPLICATE_OBJECT "42710" -uint32 WalSegSz; +intWalSegSz; static bool RetrieveDataDirCreatePerm(PGconn *conn); diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h index 04ceb30971..268c163213 100644 --- a/src/bin/pg_basebackup/streamutil.h +++ b/src/bin/pg_basebackup/streamutil.h @@ -24,7 +24,7 @@ extern char *dbuser; extern char *dbport; extern char *dbname; extern int dbgetpassword; -extern uint32 WalSegSz; +extern int WalSegSz; /* Connection kept global so we can disconnect easily */ extern PGconn *conn; -- 2.40.0
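For context on why a plain int is wide enough: the server only accepts power-of-two segment sizes between 1 MB and 1 GB, so the extra range of uint32 can never actually be used. The existing validity check is roughly the following (paraphrased from src/include/access/xlog_internal.h):

    /* A WAL segment size must be a power of two between 1 MB and 1 GB, so
     * the value always fits comfortably in a signed int. */
    #define WalSegMinSize (1024 * 1024)
    #define WalSegMaxSize (1024 * 1024 * 1024)
    #define IsValidWalSegSize(size) \
        (((size) & ((size) - 1)) == 0 && \
         (size) >= WalSegMinSize && (size) <= WalSegMaxSize)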