Re: Add index scan progress to pg_stat_progress_vacuum

2022-06-02 Thread Masahiko Sawada
On Fri, May 27, 2022 at 10:52 AM Imseih (AWS), Sami  wrote:
>
> >Another idea I came up with is that we can wait for all index vacuums
> >to finish while checking and updating the progress information, and
> >then calls WaitForParallelWorkersToFinish after confirming all index
> >status became COMPLETED. That way, we don’t need to change the
> >parallel query infrastructure. What do you think?
>
> Thinking about this a bit more, the idea of using
> WaitForParallelWorkersToFinish
> will not work if you have a leader that is
> stuck on a large index. The progress will not be updated
> until the leader completes, even if the parallel workers
> finish.

Right.

>
> What are your thought about piggybacking on the
> vacuum_delay_point to update progress. The leader can
> perhaps keep a counter to update progress every few thousand
> calls to vacuum_delay_point.
>
> This goes back to your original idea to keep updating progress
> while scanning the indexes.

I think we can have the leader process wait for all index statuses to
become COMPLETED before calling WaitForParallelWorkersToFinish(). While
waiting, the leader can keep updating its progress information. Once
the leader has confirmed that all index statuses are COMPLETED, it can
wait for the workers to finish with WaitForParallelWorkersToFinish().
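
Something like the following shape, perhaps (only a rough sketch: the
ParallelVacuumState / PVIndStats fields mirror the existing bookkeeping
in vacuumparallel.c, while the function itself and the
PROGRESS_VACUUM_INDEXES_COMPLETED counter are made-up names for
illustration, not from any posted patch):

static void
parallel_vacuum_wait_for_index_completion(ParallelVacuumState *pvs)
{
    for (;;)
    {
        int         ncompleted = 0;

        for (int i = 0; i < pvs->nindexes; i++)
        {
            if (pvs->indstats[i].status == PARALLEL_INDVAC_STATUS_COMPLETED)
                ncompleted++;
        }

        /* keep pg_stat_progress_vacuum current while we wait */
        pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_COMPLETED,
                                     ncompleted);

        if (ncompleted == pvs->nindexes)
            break;

        CHECK_FOR_INTERRUPTS();
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         100L, WAIT_EVENT_PARALLEL_FINISH);
        ResetLatch(MyLatch);
    }

    /* all index statuses are COMPLETED, so this should return quickly */
    WaitForParallelWorkersToFinish(pvs->pcxt);
}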

Regarding updating the progress in vacuum_delay_point(), it might have
side effects since it's called for every page and is used not only by
vacuum but also by other commands such as analyze, but it seems worth
trying.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Handle infinite recursion in logical replication setup

2022-06-02 Thread Peter Smith
Please find below my review comments for patch v17-0002:

==

1. Commit message

This patch adds a new SUBSCRIPTION parameter "origin". It Specifies whether
the subscription will request the publisher to only send changes that
originated locally, or to send any changes regardless of origin.

"It Specifies" -> "It specifies"

~~~

2. Commit message

Usage:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port='
PUBLICATION pub1 with (origin = local);

"with" -> "WITH"

==

3. doc/src/sgml/catalogs.sgml

+  
+   Possible origin values are local,
+   any, or NULL if none is specified.
+   If local, the subscription will request the
+   publisher to only send changes that originated locally. If
+   any, the publisher sends any changes regardless of
+   their origin.
+  

Should this also mention that NULL (default) is equivalent to 'any'?

SUGGESTION
If any (or NULL), the publisher
sends any changes regardless of their origin.

==

4. src/backend/catalog/pg_subscription.c

@@ -72,6 +72,16 @@ GetSubscription(Oid subid, bool missing_ok)
  sub->twophasestate = subform->subtwophasestate;
  sub->disableonerr = subform->subdisableonerr;

+ datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+ tup,
+ Anum_pg_subscription_suborigin,
+ &isnull);
+
+ if (!isnull)
+ sub->origin = TextDatumGetCString(datum);
+ else
+ sub->origin = NULL;
+
  /* Get conninfo */
  datum = SysCacheGetAttr(SUBSCRIPTIONOID,
  tup,

Missing comment like the nearby code has:
/* Get origin */

==

5. src/backend/replication/logical/worker.c

+/* Macro for comparing string fields that might be NULL */
+#define equalstr(a, b) \
+ (((a) != NULL && (b) != NULL) ? (strcmp(a, b) == 0) : (a) == (b))
+

Should that have some extra parens for the macro args?
e.g. "strcmp((a), (b))"

~~~

6. src/backend/replication/logical/worker.c - maybe_reread_subscription

@@ -3059,6 +3063,7 @@ maybe_reread_subscription(void)
  strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
  newsub->binary != MySubscription->binary ||
  newsub->stream != MySubscription->stream ||
+ equalstr(newsub->origin, MySubscription->origin) ||
  newsub->owner != MySubscription->owner ||
  !equal(newsub->publications, MySubscription->publications))
  {

Is that right? Shouldn't it say !equalstr(...)?

==

7. src/backend/replication/pgoutput/pgoutput.c - parse_output_parameters

@@ -380,6 +382,16 @@ parse_output_parameters(List *options, PGOutputData *data)

  data->two_phase = defGetBoolean(defel);
  }
+ else if (strcmp(defel->defname, "origin") == 0)
+ {
+ if (origin_option_given)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ origin_option_given = true;
+
+ data->origin = defGetString(defel);
+ }

Should this function also be validating that the origin parameter
value is only permitted to be one of "local" or "any"?
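
If so, one possible shape for such a check (only a sketch; the exact
error code and message wording are debatable) could be:

    data->origin = defGetString(defel);

    if (strcmp(data->origin, "local") != 0 &&
        strcmp(data->origin, "any") != 0)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("unrecognized origin value: \"%s\"", data->origin)));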

~~~

8. src/backend/replication/pgoutput/pgoutput.c - pgoutput_origin_filter

@@ -1698,12 +1710,20 @@ pgoutput_message(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
 }

 /*
- * Currently we always forward.
+ * Return true if the data source (origin) is remote and user has requested
+ * only local data, false otherwise.
  */
 static bool
 pgoutput_origin_filter(LogicalDecodingContext *ctx,
 RepOriginId origin_id)
 {
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
+ if (data->origin &&
+ (strcmp(data->origin, "local") == 0) &&
+ origin_id != InvalidRepOriginId)
+ return true;
+
  return false;
 }

8a.
Could rewrite the condition by putting the strcmp last so you can
avoid doing unnecessary strcmp.

e.g.
+ if (data->origin &&
+ origin_id != InvalidRepOriginId &&
+ strcmp(data->origin, "local") == 0)


8b.
I also wondered if it might be worth considering caching the origin
parameter value when it was parsed so that you can avoid doing any
strcmp at all during this function. Because otherwise this might be
called millions of times, right?
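
For example (purely illustrative; an "origin_local" field does not exist
in the patch), parse_output_parameters() could cache the result of the
comparison once:

    data->origin = defGetString(defel);
    data->origin_local = (strcmp(data->origin, "local") == 0);

and then pgoutput_origin_filter() would reduce to:

    return data->origin_local && origin_id != InvalidRepOriginId;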

==

9. src/include/catalog/pg_subscription.h

@@ -87,6 +87,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW

  /* List of publications subscribed to */
  text subpublications[1] BKI_FORCE_NOT_NULL;
+
+ /* Publish the data originated from the specified origin */
+ text suborigin;
 #endif
 } FormData_pg_subscription;

SUGGESTION (for the comment text)
/* Only publish data originating from the specified origin */

~~~

10 src/include/catalog/pg_subscription.h - Subscription

@@ -118,6 +121,9 @@ typedef struct Subscription
  char*slotname; /* Name of the replication slot */
  char*synccommit; /* Synchronous commit setting for worker */
  List*publications; /* List of publication names to subscribe to */
+ char*origin; /* Publish the data originated from the
+ * specified origin */
+
 } Subscription;

10a.
Reword comment same as suggested in review comment #9

10b.
Remove spurious blank line

==

11. src/include/replication/walreceiver.h

@@ -183,6 +183,8 @@ 

Re: Handle infinite recursion in logical replication setup

2022-06-02 Thread Peter Smith
Please find below my review comments for patch v17-0001:

==

1. Commit message
Add a missing test to verify only-local option in test_decoding plugin.

"option" -> "parameter"

==

2. contrib/test_decoding/sql/replorigin.sql

2a.
+-- Verify that remote origin data is not returned with only-local option

SUGGESTION
-- Verify the behaviour of the only-local parameter

2b.
+-- Remote origin data returned when only-local option is not set

"option" -> "parameter"

2c.
+-- Remote origin data not returned when only-local option is set

"option" -> "parameter"

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: some aspects of our qsort might not be ideal

2022-06-02 Thread Peter Geoghegan
On Thu, Jun 2, 2022 at 9:24 PM John Naylor  wrote:
> I had heard of it but not looked into it deeply. I did read that Java
> 7 uses dual pivot quicksort for primitives and timsort for objects. I
> wasn't sure if dual pivot was not good for objects (which could have
> possibly-complex comparators) or if timsort was just simply good for
> Java's use cases. It seems accessible to try doing, so I'll look into
> that.

I think that Timsort is most useful with cases where individual
comparisons are inherently very expensive (perhaps even implemented in
Python), which might be common with objects due to the indirection
that Python (and even Java) imposes on objects in general.

I would imagine that this is a lot less relevant in
Postgres/tuplesort, because we have optimizations like abbreviated
keys. And because we're mostly just sorting tuples on one or more
attributes of scalar types.

The abbreviated keys optimization is very much something that comes
from the world of databases, not the world of sorting. It's pretty much a
domain-specific technique. That seems relevant to me.

--
Peter Geoghegan




Re: some aspects of our qsort might not be ideal

2022-06-02 Thread John Naylor
On Fri, Jun 3, 2022 at 10:44 AM Peter Geoghegan  wrote:
>
> What about dual-pivot quicksort, which is used in Java 7+? That is the
> de facto successor to Bentley & McIlroy. In fact, Jon Bentley himself
> collaborated with its author, and provided benchmarking input. The
> underlying philosophy is essentially the same as the original -- it
> is supposed to be an "industrial strength" quicksort, with various
> adversarial cases considered directly.

I had heard of it but not looked into it deeply. I did read that Java
7 uses dual pivot quicksort for primitives and timsort for objects. I
wasn't sure if dual pivot was not good for objects (which could have
possibly-complex comparators) or if timsort was just simply good for
Java's use cases. It seems accessible to try doing, so I'll look into
that.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: PG15 beta1 sort performance regression due to Generation context change

2022-06-02 Thread John Naylor
On Fri, Jun 3, 2022 at 10:14 AM David Rowley  wrote:
>
> On Wed, 1 Jun 2022 at 03:09, Tom Lane  wrote:
> > Right now my vote would be to leave things as they stand for v15 ---
> > the performance loss that started this thread occurs in a narrow
> > enough set of circumstances that I don't feel too much angst about
> > it being the price of winning in most other circumstances.  We can
> > investigate these options at leisure for v16 or later.
>
> I've been hesitating a little to put my views here as I wanted to see
> what the other views were first.  My thoughts are generally in
> agreement with you, i.e., to do nothing for PG15 about this.  My
> reasoning is:
>
> 1. Most cases are faster as a result of using generation contexts for sorting.
> 2. The slowdown cases seem rare and the speedup cases are much more common.
> 3. There were performance cliffs in PG14 if a column was added to a
> table to make the tuple size cross a power-of-2 boundary which I don't
> recall anyone complaining about. PG15 makes the performance drop more
> gradual as tuple sizes increase. Performance is more predictable as a
> result.
> 4. As I just demonstrated in [1], if anyone is caught by this and has
> a problem, the work_mem size increase required seems very small to get
> performance back to better than in PG14. I found that setting work_mem
> to 64.3MB makes PG15 faster than PG14 for the problem case. If anyone
> happened to hit this case and find the performance regression
> unacceptable then they have a way out... increase work_mem a little.

Since #4 is such a small lift, I'd be comfortable with closing the open item.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: pg_upgrade test writes to source directory

2022-06-02 Thread Andres Freund
On 2022-06-03 12:29:04 +0900, Michael Paquier wrote:
> On Thu, Jun 02, 2022 at 05:17:34PM -0400, Andrew Dunstan wrote:
> > I hope we fix the original issue soon - it's apparently been the cause
> > of numerous buildfarm failures that it was on my list to investigate
> > e.g.
> > 
> 
> Oops.  Thanks, Andrew, I was not aware of that.  I don't really want
> to wait more if this impacts some of the buildfarm animals.  Even if
> we don't conclude with the use of TESTOUTDIR for the time being, I see
> no strong objections in using TESTDIR/tmp_check, aka
> ${PostgreSQL::Test::Utils::tmp_check}.  So I propose to apply a fix
> doing that in the next 24 hours or so.  We can always switch to a
> different path once we decide something else, if necessary.

+1

Greetings,

Andres Freund




Re: some aspects of our qsort might not be ideal

2022-06-02 Thread Peter Geoghegan
On Thu, Jun 2, 2022 at 8:33 PM John Naylor  wrote:
> Attached is a draft series that implements some but not all features
> of pattern-defeating quicksort, namely the ones I thought were
> interesting for us. Recently this quicksort variant got committed for
> the next release of the Go language 1.19 [1] (which in turn was based
> on that of Rust [2]), and that implementation was a helpful additional
> example. The bottom line is that replacing the partitioning scheme
> this way is likely not worth doing because of our particular use
> cases, but along the way I found some other things that might be worth
> doing, so some good may come out of this.

What about dual-pivot quicksort, which is used in Java 7+? That is the
de facto successor to Bentley & McIlroy. In fact, Jon Bentley himself
collaborated with its author, and provided benchmarking input. The
underlying philosophy is essentially the same as the original -- it
is supposed to be an "industrial strength" quicksort, with various
adversarial cases considered directly.

See:

https://www.wild-inter.net/publications/wild-2018b.pdf

https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf

At one point quite a few years back I planned on investigating it
myself, but never followed through.

--
Peter Geoghegan




Re: pg_upgrade test writes to source directory

2022-06-02 Thread Michael Paquier
On Thu, Jun 02, 2022 at 05:17:34PM -0400, Andrew Dunstan wrote:
> I hope we fix the original issue soon - it's apparently been the cause
> of numerous buildfarm failures that it was on my list to investigate
> e.g.
> 

Oops.  Thanks, Andrew, I was not aware of that.  I don't really want
to wait more if this impacts some of the buildfarm animals.  Even if
we don't conclude with the use of TESTOUTDIR for the time being, I see
no strong objections in using TESTDIR/tmp_check, aka
${PostgreSQL::Test::Utils::tmp_check}.  So I propose to apply a fix
doing that in the next 24 hours or so.  We can always switch to a
different path once we decide something else, if necessary.
--
Michael




Re: PG15 beta1 sort performance regression due to Generation context change

2022-06-02 Thread David Rowley
On Wed, 1 Jun 2022 at 03:09, Tom Lane  wrote:
> Right now my vote would be to leave things as they stand for v15 ---
> the performance loss that started this thread occurs in a narrow
> enough set of circumstances that I don't feel too much angst about
> it being the price of winning in most other circumstances.  We can
> investigate these options at leisure for v16 or later.

I've been hesitating a little to put my views here as I wanted to see
what the other views were first.  My thoughts are generally in
agreement with you, i.e., to do nothing for PG15 about this.  My
reasoning is:

1. Most cases are faster as a result of using generation contexts for sorting.
2. The slowdown cases seem rare and the speedup cases are much more common.
3. There were performance cliffs in PG14 if a column was added to a
table to make the tuple size cross a power-of-2 boundary which I don't
recall anyone complaining about. PG15 makes the performance drop more
gradual as tuple sizes increase. Performance is more predictable as a
result.
4. As I just demonstrated in [1], if anyone is caught by this and has
a problem, the work_mem size increase required seems very small to get
performance back to better than in PG14. I found that setting work_mem
to 64.3MB makes PG15 faster than PG14 for the problem case. If anyone
happened to hit this case and find the performance regression
unacceptable then they have a way out... increase work_mem a little.

Also, in terms of what we might do to improve this situation for PG16:
I was also discussing this off-list with Andres, which resulted in him
prototyping a patch [2] to store the memory context type in 3 bits of
the 64 bits just prior to the pointer; that tag is used to look up a
memory context method table so that we can call the correct function.
I've been hacking around with this, added some optimisations, and got
the memory allocation test [3] (modified to use aset.c rather than
generation.c) showing very promising results when comparing this patch
to master.
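
To make the mechanism concrete, here is a small standalone toy (not
Andres' patch, and none of these names come from PostgreSQL) that
stashes a 3-bit allocator tag in the 64-bit word immediately before the
returned pointer and then dispatches a pfree-style call through a
method table:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_METHOD_MASK   UINT64_C(0x7)   /* low 3 bits of the header word */

static void toy_aset_free(void *chunk);
static void toy_generation_free(void *chunk);

typedef struct ChunkMethods
{
    const char *name;
    void        (*free_chunk) (void *chunk);
} ChunkMethods;

static const ChunkMethods toy_methods[] = {
    {"aset", toy_aset_free},
    {"generation", toy_generation_free},
};

static void *
toy_alloc(size_t size, uint64_t method_id)
{
    uint64_t   *header = malloc(sizeof(uint64_t) + size);

    /* the real prototype packs more into this word; here only the tag */
    *header = method_id & CHUNK_METHOD_MASK;
    return header + 1;
}

static void
toy_free(void *chunk)
{
    uint64_t    header = *((uint64_t *) chunk - 1);

    /* read the tag back from the preceding word and dispatch */
    toy_methods[header & CHUNK_METHOD_MASK].free_chunk(chunk);
}

static void
toy_aset_free(void *chunk)
{
    printf("freed via aset-style method table\n");
    free((uint64_t *) chunk - 1);
}

static void
toy_generation_free(void *chunk)
{
    printf("freed via generation-style method table\n");
    free((uint64_t *) chunk - 1);
}

int
main(void)
{
    void       *a = toy_alloc(64, 0);
    void       *b = toy_alloc(64, 1);

    toy_free(a);                /* looks up method 0 */
    toy_free(b);                /* looks up method 1 */
    return 0;
}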

There are still a few slowdowns, but allocation sizes from 16 bytes up
to 256 bytes are looking pretty good: up to ~10% faster compared to
master.

(lower is better)

size  compare
   8  114.86%
  16   89.04%
  32   90.95%
  64   94.17%
 128   93.36%
 256   96.57%
 512  101.25%
1024  109.88%
2048  100.87%

There's quite a bit more work to do for deciding how to handle large
allocations, and there's also likely more that can be done to further
shrink the existing chunk headers for each of the 3 existing memory
allocators.

David

[1] 
https://www.postgresql.org/message-id/CAApHDvq8MoEMxHN+f=rccfwcfr30an1w3uokruunnplvrr3...@mail.gmail.com
[2] https://github.com/anarazel/postgres/tree/mctx-chunk
[3] 
https://www.postgresql.org/message-id/attachment/134021/allocate_performance_function.patch




Re: pgsql: Use pre-fetching for ANALYZE

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 19:30:16 -0700, Andres Freund wrote:
> On 2021-03-16 18:48:08 +, Stephen Frost wrote:
> > Use pre-fetching for ANALYZE
> > 
> > When we have posix_fadvise() available, we can improve the performance
> > of an ANALYZE by quite a bit by using it to inform the kernel of the
> > blocks that we're going to be asking for.  Similar to bitmap index
> > scans, the number of buffers pre-fetched is based off of the
> > maintenance_io_concurrency setting (for the particular tablespace or,
> > if not set, globally, via get_tablespace_maintenance_io_concurrency()).
> 
> I just looked at this as part of debugging a crash / hang in the AIO patch.
> 
> The code does:
> 
> 	block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);
> 
> #ifdef USE_PREFETCH
> 
> 	/*
> 	 * When pre-fetching, after we get a block, tell the kernel about the
> 	 * next one we will want, if there's any left.
> 	 *
> 	 * We want to do this even if the table_scan_analyze_next_block() call
> 	 * above decides against analyzing the block it picked.
> 	 */
> 	if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
> 		PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
> #endif
> 
> I.e. we lock a buffer and *then* we prefetch another buffer. That seems like a
> quite bad idea to me. Why are we doing IO while holding a content lock, if we
> can avoid it?

It also seems decidedly not great from a layering POV to do the IO in
analyze.c. There's no guarantee that the tableam maps blocks in a way that's
compatible with PrefetchBuffer().  Yes, the bitmap heap scan code does
something similar, but a) that is opt-in for the AM, and b) there's a comment
saying it's quite crufty and should be fixed.

Greetings,

Andres Freund




Re: pgsql: Use pre-fetching for ANALYZE

2022-06-02 Thread Andres Freund
Hi,

On 2021-03-16 18:48:08 +, Stephen Frost wrote:
> Use pre-fetching for ANALYZE
> 
> When we have posix_fadvise() available, we can improve the performance
> of an ANALYZE by quite a bit by using it to inform the kernel of the
> blocks that we're going to be asking for.  Similar to bitmap index
> scans, the number of buffers pre-fetched is based off of the
> maintenance_io_concurrency setting (for the particular tablespace or,
> if not set, globally, via get_tablespace_maintenance_io_concurrency()).

I just looked at this as part of debugging a crash / hang in the AIO patch.

The code does:

	block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);

#ifdef USE_PREFETCH

	/*
	 * When pre-fetching, after we get a block, tell the kernel about the
	 * next one we will want, if there's any left.
	 *
	 * We want to do this even if the table_scan_analyze_next_block() call
	 * above decides against analyzing the block it picked.
	 */
	if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
		PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
#endif

I.e. we lock a buffer and *then* we prefetch another buffer. That seems like a
quite bad idea to me. Why are we doing IO while holding a content lock, if we
can avoid it?
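
One way to avoid that (just a sketch of the idea, not a tested change)
would be to issue the advisory read before acquiring the next block at
all, along the lines of:

#ifdef USE_PREFETCH
	/* advise the kernel before we pin and lock the block we're about to use */
	if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
		PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
#endif

	block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);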

Greetings,

Andres Freund




Re: Multi-Master Logical Replication

2022-06-02 Thread Bruce Momjian
On Thu, Jun  2, 2022 at 05:12:49PM +1000, Peter Smith wrote:
> On Thu, Jun 2, 2022 at 12:03 AM Bruce Momjian  wrote:
> >
> > On Wed, Jun  1, 2022 at 10:27:27AM +0530, Amit Kapila wrote:
> ...
> 
> > My big point is that you should not be showing up with a patch but
> > rather have these discussions to get agreement that this is the
> > direction the community wants to go.
> 
> The purpose of posting the POC patch was certainly not to present a
> fait accompli design/implementation.
> 
> We wanted to solicit some community feedback about the desirability of
> the feature, but because LRG is complicated to describe we felt that
> having a basic functional POC might help to better understand the
> proposal. Also, we thought the ability to experiment with the proposed
> API could help people to decide whether LRG is something worth
> pursuing or not.

I don't think the POC is helping, and I am not sure we really want to
support this style of architecture due to its complexity vs other
options.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: convert libpq uri-regress tests to tap test

2022-06-02 Thread Michael Paquier
On Thu, Jun 02, 2022 at 09:48:25AM -0700, Andres Freund wrote:
> On 2022-06-01 13:59:06 +0900, Michael Paquier wrote:
>> So, this leads to something like the attached.  Does that sound fine
>> to you?
> 
> That looks reasonable to me. Do you want to apply it or will you?

Thanks for double-checking!  I should be able to take care of that
today.
--
Michael




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-06-02 Thread Michael Paquier
On Thu, Jun 02, 2022 at 03:56:28PM -0700, Jacob Champion wrote:
> All right, here's the full list of previous suggestions, I think:
> 
> - SharedPort
> - MyProcShared
> - ProcParallelInfo
> - ProcInfoParallel
> - ParallelProcInfo
> - ParallelPortInfo
> 
> I have a few new proposals:
> 
> - GlobalPortInfo
> - GlobalConnInfo
> - Synced[Port/Conn]Info
> - Worker[Port/Conn]Info (but I think this suffers from the exact same
> problem as Parallel)
> - ThingsThatHappenToBeSynchronizedAcrossProcessesWhenParallelismIsInUse
> - OrderImporterConsumerTemplateBeanFactory

ParallelPortInfo sounds kind of right for the job to me in this set of
proposals, as the data is from the Port, and that's some information
shared between all the parallel workers and the leader.
--
Michael




margay fails assertion in stats/dsa/dsm code

2022-06-02 Thread Thomas Munro
Hi,

BF animal margay (a newly started Solaris 11.4/Sparc/GCC 11.2 box) is
sometimes failing with:

TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c",
Line: 1069, PID: 9038)

I can't immediately see why it's doing this, but my tool that looks
for assertion failures hasn't seen that on any other system.  Example
stack (trimmed from the log); in this case it's a regular backend, other
times it was a parallel worker:

TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c",
Line: 1069, PID: 3944)
ExceptionalCondition+0x64 [0x1008bb348]
dsm_segment_address+0x44 [0x1006ff7d0]
get_segment_by_index+0x7c [0x1008ee960]
dsa_get_address+0x9c [0x1008ef754]
pgstat_get_entry_ref+0x1068 [0x10075f348]
pgstat_prep_pending_entry+0x58 [0x100758424]
pgstat_assoc_relation+0x44 [0x10075b314]
_bt_first+0x9ac [0x10036cd78]
btgettuple+0x10c [0x1003653a8]
index_getnext_tid+0x4c [0x1003531c4]
index_getnext_slot+0x78 [0x100353564]
systable_getnext+0x18 [0x1003519b4]
SearchCatCacheMiss+0x74 [0x10089ce18]
SearchCatCacheInternal+0x1c0 [0x10089d0a4]
GetSysCacheOid+0x34 [0x1008b5ca4]
get_role_oid+0x18 [0x100767444]
hba_getauthmethod+0x8 [0x100599da4]
ClientAuthentication+0x1c [0x10058cb68]
InitPostgres+0xacc [0x1008d26b8]
PostgresMain+0x94 [0x1007397f8]
ServerLoop+0x1184 [0x1006739e8]
PostmasterMain+0x1400 [0x10067520c]
main+0x2e0 [0x1005a28c0]
_start+0x64 [0x1002c5c44]

I know that on Solaris we use dynamic_shared_memory=posix.  The other
Solaris/Sparc system is wrasse, and it's not doing this.  I don't see
it yet, but figured I'd report this much to the list in case someone
else does.




Re: PG15 beta1 sort performance regression due to Generation context change

2022-06-02 Thread Robert Haas
On Thu, Jun 2, 2022 at 5:37 PM David Rowley  wrote:
> I had a quick look at that for the problem case and we're very close
> in terms of work_mem size to better performance. A work_mem of just
> 64.3MB brings the performance back to better than PG14.

This is one of the things that I find super-frustrating about work_mem
and sorting. I mean, we all know that work_mem is hard to tune because
it's per-node rather than per-query or per-backend, but on top of
that, sort performance doesn't change smoothly as you vary it. I've
seen really different work_mem settings produce only slightly
different performance, and here you have the opposite: only slightly
different work_mem settings produce significantly different
performance. It's not even the case that more memory is necessarily
better than less.

I have no idea what to do about this, and even if I did, it's too late
to redesign v15. But I somehow feel like the whole model is just
wrong. Sorting shouldn't use more memory unless it's actually going to
speed things up -- and not just any speed-up, but one that's
significant compared to the additional expenditure of memory. But the
fact that the sorting code just treats the memory budget as an input,
and is not adaptive in any way, seems pretty bad. It means we use up
all that memory even if a much smaller amount of memory would deliver
the same performance, or even better performance.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-06-02 Thread Jacob Champion
On Thu, Jun 2, 2022 at 6:52 AM Robert Haas  wrote:
> I'm not sure what it SHOULD be called, exactly: that's one of the hard
> problems in computer science.[1]

Yeah...

All right, here's the full list of previous suggestions, I think:

- SharedPort
- MyProcShared
- ProcParallelInfo
- ProcInfoParallel
- ParallelProcInfo
- ParallelPortInfo

I have a few new proposals:

- GlobalPortInfo
- GlobalConnInfo
- Synced[Port/Conn]Info
- Worker[Port/Conn]Info (but I think this suffers from the exact same
problem as Parallel)
- ThingsThatHappenToBeSynchronizedAcrossProcessesWhenParallelismIsInUse
- OrderImporterConsumerTemplateBeanFactory

I am struggling to come up with a single adjective that captures this
concept of sometimes-synchronized, that isn't conflicting with
existing uses (like Shared). Other suggestions are very welcome.

Thanks,
--Jacob




oat_post_create expected behavior

2022-06-02 Thread Mary Xu
Hello,

I was using an object access hook for oat_post_create access while creating
an extension and expected that I would be able to query for the newly
created extension with get_extension_oid(), but it was returning
InvalidOid. However, the same process works for triggers, so I was
wondering what the expected behavior is.
The documentation in objectaccess.h doesn't mention anything about
CommandCounterIncrement() for POST_CREATE, which implied to me that it
wasn't something I would need to worry about.
One option I thought of was this patch where CCI is called before the
access hook so that the new tuple is visible in the hook. Another option
would be to revise the documentation to reflect the expected behavior.
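
For reference, the kind of hook I'm describing looks roughly like this
(my_object_access and the hard-coded extension name are only for
illustration):

static void
my_object_access(ObjectAccessType access, Oid classId, Oid objectId,
                 int subId, void *arg)
{
    if (access == OAT_POST_CREATE && classId == ExtensionRelationId)
    {
        /*
         * Without an intervening CommandCounterIncrement(), this lookup
         * returns InvalidOid even though the extension was just created.
         */
        Oid         extoid = get_extension_oid("test_ext1", true);

        elog(NOTICE, "extension oid seen by hook: %u", extoid);
    }
}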

Thanks,

Mary Xu
diff --git a/src/backend/catalog/objectaccess.c b/src/backend/catalog/objectaccess.c
index 1c51df02d2..e098443915 100644
--- a/src/backend/catalog/objectaccess.c
+++ b/src/backend/catalog/objectaccess.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "access/xact.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_namespace.h"
@@ -40,6 +41,7 @@ RunObjectPostCreateHook(Oid classId, Oid objectId, int subId,
 	memset(&pc_arg, 0, sizeof(ObjectAccessPostCreate));
 	pc_arg.is_internal = is_internal;
 
+	CommandCounterIncrement();
 	(*object_access_hook) (OAT_POST_CREATE,
 		   classId, objectId, subId,
 		   (void *) &pc_arg);
diff --git a/src/test/modules/test_oat_hooks/Makefile b/src/test/modules/test_oat_hooks/Makefile
index 2b5d2687f8..74ef7f0239 100644
--- a/src/test/modules/test_oat_hooks/Makefile
+++ b/src/test/modules/test_oat_hooks/Makefile
@@ -6,6 +6,10 @@ OBJS = \
 	test_oat_hooks.o
 PGFILEDESC = "test_oat_hooks - example use of object access hooks"
 
+EXTENSION = test_ext1
+
+DATA = test_ext1--1.0.sql
+
 REGRESS = test_oat_hooks
 
 # disable installcheck for now
diff --git a/src/test/modules/test_oat_hooks/expected/test_oat_hooks.out b/src/test/modules/test_oat_hooks/expected/test_oat_hooks.out
index 39b274b8fa..c680bf4d9e 100644
--- a/src/test/modules/test_oat_hooks/expected/test_oat_hooks.out
+++ b/src/test/modules/test_oat_hooks/expected/test_oat_hooks.out
@@ -61,6 +61,13 @@ NOTICE:  in process utility: superuser attempting CreateRoleStmt
 NOTICE:  in object access: superuser attempting create (subId=0x0) [explicit]
 NOTICE:  in object access: superuser finished create (subId=0x0) [explicit]
 NOTICE:  in process utility: superuser finished CreateRoleStmt
+CREATE EXTENSION test_ext1;
+NOTICE:  in process utility: superuser attempting CreateExtensionStmt
+NOTICE:  in object access: superuser attempting namespace search (subId=0x0) [no report on violation, allowed]
+NOTICE:  in object access: superuser finished namespace search (subId=0x0) [no report on violation, allowed]
+NOTICE:  in object access: superuser attempting create (subId=0x0) [explicit]
+NOTICE:  in object access: superuser finished create (subId=0x0) [explicit]
+NOTICE:  in process utility: superuser finished CreateExtensionStmt
 CREATE TABLE regress_test_table (t text);
 NOTICE:  in process utility: superuser attempting CreateStmt
 NOTICE:  in object access: superuser attempting namespace search (subId=0x0) [no report on violation, allowed]
@@ -293,6 +300,8 @@ privileges for parameter test_oat_hooks.user_var2
 privileges for parameter test_oat_hooks.super_var2
 privileges for parameter none.such
 privileges for parameter another.bogus
+DROP EXTENSION test_ext1;
+DROP TABLE regress_test_table;
 REVOKE ALL PRIVILEGES ON PARAMETER
 	none.such, another.bogus,
 	test_oat_hooks.user_var1, test_oat_hooks.super_var1,
diff --git a/src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql b/src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql
index 8b6d5373aa..4e40d23571 100644
--- a/src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql
+++ b/src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql
@@ -36,6 +36,7 @@ REVOKE ALL ON PARAMETER "none.such" FROM PUBLIC;
 
 -- Create objects for use in the test
 CREATE USER regress_test_user;
+CREATE EXTENSION test_ext1;
 CREATE TABLE regress_test_table (t text);
 GRANT SELECT ON Table regress_test_table TO public;
 CREATE FUNCTION regress_test_func (t text) RETURNS text AS $$
@@ -92,6 +93,8 @@ RESET SESSION AUTHORIZATION;
 
 SET test_oat_hooks.audit = false;
 DROP ROLE regress_role_joe;  -- fails
+DROP EXTENSION test_ext1;
+DROP TABLE regress_test_table;
 REVOKE ALL PRIVILEGES ON PARAMETER
 	none.such, another.bogus,
 	test_oat_hooks.user_var1, test_oat_hooks.super_var1,
diff --git a/src/test/modules/test_oat_hooks/test_ext1--1.0.sql b/src/test/modules/test_oat_hooks/test_ext1--1.0.sql
new file mode 100644
index 00..fc15f19684
--- /dev/null
+++ b/src/test/modules/test_oat_hooks/test_ext1--1.0.sql
@@ -0,0 +1,2 @@
+/* src/test/modules/test_oat_hooks/test_ext1--1.0.sql */
+/* empty sql file */
diff --git a/src/test/modules/test_oat_hooks/test_ext1.control b/src/test/modules/test_oat_hooks/test_ext1.control

Re: [PATCH] Compression dictionaries for JSONB

2022-06-02 Thread Jacob Champion
On Thu, Jun 2, 2022 at 6:30 AM Aleksander Alekseev
 wrote:
> > I saw there was some previous discussion about dictionary size. It
> > looks like this approach would put all dictionaries into a shared OID
> > pool. Since I don't know what a "standard" use case is, is there any
> > risk of OID exhaustion for larger deployments with many dictionaries?
> > Or is 2**32 so comparatively large that it's not really a serious
> > concern?
>
> I agree, this is a drawback of the current implementation. To be honest,
> I simply followed the example of how ENUMs are implemented. I'm not 100% sure
> if we should be worried here (apparently, freed OIDs are reused). I'm OK with
> using a separate sequence if someone could second this. This is the first time
> I'm altering the catalog so I'm not certain what the best practices are.

I think reuse should be fine (if a bit slower, but offhand that
doesn't seem like an important bottleneck). Users may be unamused to
find that one large dictionary has prevented the creation of any new
entries in other dictionaries, though. But again, I have no intuition
for the size of a production-grade compression dictionary, and maybe
it's silly to assume that normal use would ever reach the OID limit.

> > I see the algorithm description, but I'm curious to know whether it's
> > based on some other existing compression scheme, for the sake of
> > comparison. It seems like it shares similarities with the Snappy
> > scheme?
> >
> > Could you talk more about what the expected ratios and runtime
> > characteristics are? Best I can see is that compression runtime is
> > something like O(n * e * log d) where n is the length of the input, e
> > is the maximum length of a dictionary entry, and d is the number of
> > entries in the dictionary. Since e and d are constant for a given
> > static dictionary, how well the dictionary is constructed is
> > presumably important.
>
> The algorithm is almost identical to the one I used in ZSON extension [1]
> except the fact that ZSON uses 16-bit codes. In docs/benchmark.md you
> will find approximate ratios to expect, etc.

That's assuming a machine-trained dictionary, though, which isn't part
of the proposal now. Is there a performance/ratio sample for a "best
practice" hand-written dictionary?

> > That reminds me. How do people expect to generate a "good" dictionary
> > in practice? Would they somehow get the JSONB representations out of
> > Postgres and run a training program over the blobs? I see some
> > reference to training functions in the prior threads but don't see any
> > breadcrumbs in the code.
>
> So far we agreed that in the first implementation it will be done manually.
> In the future it will be possible to update the dictionaries automatically
> during VACUUM. The idea of something similar to zson_learn() procedure, as
> I recall, didn't get much support, so we probably will not have it, or
> at least it is not a priority.

Hm... I'm skeptical that a manually-constructed set of compression
dictionaries would be maintainable over time or at scale. But I'm not
the target audience so I will let others weigh in here instead.

> > > - Alternative compression algorithms. Note that this will not require any
> > > further changes in the catalog, only the values we write to pg_type and
> > > pg_cast will differ.
> >
> > Could you expand on this? I.e. why would alternative algorithms not
> > need catalog changes? It seems like the only schemes that could be
> > used with pg_catalog.pg_dict are those that expect to map a byte
> > string to a number. Is that general enough to cover other standard
> > compression algorithms?
>
> Sure. When creating a new dictionary pg_type and pg_cast are modified like 
> this:
>
>  =# CREATE TYPE mydict AS DICTIONARY OF JSONB ('abcdef', 'ghijkl');
> CREATE TYPE
>  =# SELECT * FROM pg_type WHERE typname = 'mydict';
> -[ RECORD 1 ]--+---
> oid| 16397
> typname| mydict
> typnamespace   | 2200
> ...
> typarray   | 16396
> typinput   | dictionary_in
> typoutput  | dictionary_out
> ...
>
> =# SELECT c.*, p.proname FROM pg_cast AS c
>  LEFT JOIN pg_proc AS p
>  ON p.oid = c.castfunc
>   WHERE c.castsource = 16397 or c.casttarget = 16397;
> -[ RECORD 1 ]-
> oid | 16400
> castsource  | 3802
> casttarget  | 16397
> castfunc| 9866
> castcontext | a
> castmethod  | f
> proname | jsonb_dictionary
> -[ RECORD 2 ]-
> oid | 16401
> castsource  | 16397
> casttarget  | 3802
> castfunc| 9867
> castcontext | i
> castmethod  | f
> proname | dictionary_jsonb
> -[ RECORD 3 ]-
> oid | 16402
> castsource  | 16397
> casttarget  | 17
> castfunc| 9868
> castcontext | e
> castmethod  | f
> proname | dictionary_bytea
>
> In order to add a new algorithm you simply need to provide alternatives
> to dictionary_in / dictionary_out / jsonb_dictionary / dictionary_jsonb and
> specify them in the 

Re: Assertion failure with barriers in parallel hash join

2022-06-02 Thread Thomas Munro
On Thu, Jun 2, 2022 at 9:31 PM David Geier  wrote:
> We recently encountered the same bug in the field. Oleksii Kozlov managed to 
> come up with reproduction steps, which reliably trigger it. Interestingly, 
> the bug does not only manifest as failing assertion, but also as segmentation 
> fault; in builds with disabled and with enabled (!) assertions. So it can 
> crash production environments. We applied the proposed patch v3 from Melanie 
> to the REL_14_3 branch and can confirm that with the patch neither the 
> assertion nor the segmentation fault still occur.

Thanks for the report, testing, and for creating the CF entry.

I assume you are using parallel_leader_participation=off, and the
reason we haven't heard more about this is because few people do that.
By coincidence I was just about to restart a bunch of hash join
projects and have been paging the topic area back into my brain, so
I'll start with another round of testing/analysis of this bug/patch
next week.




Re: Proposal: adding a better description in psql command about large objects

2022-06-02 Thread Nathan Bossart
On Thu, Jun 02, 2022 at 11:12:46AM +0200, Thibaud W. wrote:
> Attached is a small patch to add a description to the meta commands
> regarding large objects.

This seems reasonable to me.  Your patch wasn't applying for some reason,
so I created a new one with a commit message and some small adjustments.
What do you think?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From ca23b5c3b1ad3af4c47bb6d438c58a6e3ba5a1b6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 2 Jun 2022 14:35:31 -0700
Subject: [PATCH v2 1/1] Add descriptions for psql's large object backslash
 commands.

These should be mostly self-explanatory, but they are the only
backslash commands lacking individual short descriptions.

Author: Thibaud W.
Reviewed by: Nathan Bossart
Description: https://postgr.es/m/43f0439c-df3e-a045-ac99-af33523cc2d4%40dalibo.com
---
 src/bin/psql/help.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 49eb116f33..54580ac928 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -321,10 +321,10 @@ slashUsage(unsigned short int pager)
 	fprintf(output, "\n");
 
 	fprintf(output, _("Large Objects\n"));
-	fprintf(output, _("  \\lo_export LOBOID FILE\n"
-	  "  \\lo_import FILE [COMMENT]\n"
-	  "  \\lo_list[+]\n"
-	  "  \\lo_unlink LOBOID  large object operations\n"));
+	fprintf(output, _("  \\lo_export LOBOID FILE     export large object to file\n"
+	  "  \\lo_import FILE [COMMENT]  import large object from file\n"
+	  "  \\lo_list[+]                list large objects\n"
+	  "  \\lo_unlink LOBOID          delete a large object\n"));
 
 	ClosePager(output);
 }
-- 
2.25.1



Re: PG15 beta1 sort performance regression due to Generation context change

2022-06-02 Thread David Rowley
On Thu, 2 Jun 2022 at 20:20, John Naylor  wrote:
> If anything, changing work_mem is an
> easy to understand (although sometimes not practical) workaround.

I had a quick look at that for the problem case and we're very close
in terms of work_mem size to better performance. A work_mem of just
64.3MB brings the performance back to better than PG14.

postgres=# set work_mem = '64.2MB';
postgres=# \i bench.sql
 c1 | c2 | c3 | c4 | c5 | c6
----+----+----+----+----+----
(0 rows)

Time: 28949.942 ms (00:28.950)
postgres=# set work_mem = '64.3MB';
postgres=# \i bench.sql
 c1 | c2 | c3 | c4 | c5 | c6
----+----+----+----+----+----
(0 rows)

Time: 19759.552 ms (00:19.760)

David




Re: pg_upgrade test writes to source directory

2022-06-02 Thread Andrew Dunstan


On 2022-06-01 We 20:37, Michael Paquier wrote:
> On Wed, Jun 01, 2022 at 02:11:12PM -0700, Andres Freund wrote:
>> Until recently TESTDIR needed to point to the build directory containing the
>> binaries. But I'd like to be able to separate test log output from the build
>> tree, so that it's easier to capture files generated by tests for CI /
>> buildfarm. The goal is to have a separate directory for each test, so we can
>> present logs for failed tests separately. That was impossible with TESTDIR,
>> because it needed to point to the build directory.
> FWIW, this argument sounds sensible to me since I looked at 0001, not
> only for the log files, but also to help in the capture of files
> generated by the tests like 010_tab_completion.pl.
>
> I don't know yet what to do about this part, so for now I have fixed
> the other issue reported by Peter where the test names were missing.


I hope we fix the original issue soon - it's apparently been the cause
of numerous buildfarm failures that were on my list to investigate,
e.g.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [RFC] building postgres with meson

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 15:53:50 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Oh. I executed maintainer-clean inside src/backend/parser/, and thus didn't
> > see it getting cleaned up.
> 
> > It seems pretty darn grotty that src/backend/parser/Makefile explicitly 
> > states
> > that gram.c ... aren't cleaned "here", but then src/backend/Makefile does
> > clean them up.
> 
> I agree the factorization of this ain't great.  I'd think about improving
> it, were it not that we're trying to get rid of it.

+1. I think I just wanted to excuse my confusion...


> (But with meson, the whole idea of building or cleaning just part of the
> tree is out the window anyway, no?)

Cleaning parts of the tree isn't supported as far as I know (not that I've
needed it). You can build parts of the tree by specifying the target
(e.g. ninja src/backend/postgres) or by specifying meta-targets (e.g. ninja
contrib backend). I've thought about contributing a patch to meson to
automatically generate targets for each directory that has sub-targets - it's
just a few lines.

Greetings,

Andres Freund




Re: replacing role-level NOINHERIT with a grant-level option

2022-06-02 Thread Andrew Dunstan


On 2022-06-02 Th 14:06, Robert Haas wrote:
> On Thu, Jun 2, 2022 at 1:17 PM Tom Lane  wrote:
>> Point 2 would cause every existing pg_dumpall script to fail, which
>> seems like kind of a large gotcha.  Less unpleasant alternatives
>> could include
>>
>> * Continue to accept the syntax, but ignore it, maybe with a WARNING
>> for the alternative that doesn't correspond to the new behavior.
>>
>> * Keep pg_authid.rolinherit, and have it act as supplying the default
>> behavior for subsequent GRANTs to that role.
> Of those two alternatives, I would certainly prefer the first, because
> the second doesn't actually get rid of the ugly wart. It just adds a
> non-ugly thing that we have to maintain along with the ugly thing,
> apparently in perpetuity. If we do the first of these, we can probably
> remove the obsolete syntax at some point in the distant future, and in
> the meantime, we don't have to figure out how it's supposed to
> interact with existing features or new ones, since the actual feature
> is already removed.
>

+1


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Fix pg_upgrade test from v10

2022-06-02 Thread Andrew Dunstan


On 2022-06-01 We 21:37, Michael Paquier wrote:
> On Thu, Jun 02, 2022 at 04:22:52AM +0300, Anton A. Melnikov wrote:
>> Found out that test for pg_upgrade (test.sh for 11-14 and 002_pg_upgrade.pl
>> for 15+) doesn't work from 10th versions to higher ones due to incompatible
>> options for initdb and default PGDATA permissions.
> Yeah, there are still TODOs in this stuff.  Those tests can also break
> easily depending on the dump you are pushing to the old node when
> doing cross-version upgrades.  Perl makes it a bit easier to reason
> about improving this area in the future, though, and MSVC is able to
> catch up on that.
>
>>  # To increase coverage of non-standard segment size and group access without
>>  # increasing test runtime, run these tests with a custom setting.
>>  # --allow-group-access and --wal-segsize have been added in v11.
>> -$oldnode->init(extra => [ '--wal-segsize', '1', '--allow-group-access' ]);
>> +my ($oldverstr) = `$ENV{oldinstall}/bin/pg_ctl --version` =~ /(\d+\.\d+)/;
>> +my ($oldver) =  (version->parse(${oldverstr}));
>> +$oldnode->init(extra => [ '--wal-segsize', '1', '--allow-group-access' ])
>> +if $oldver >= version->parse('11.0');
>> +$oldnode->init()
>> +if $oldver < version->parse('11.0');
> A node's pg_version is assigned via _set_pg_version() when creating it
> using PostgreSQL::Test::Cluster::new().  In order to make the
> difference with the set of initdb options to use when setting up the
> old node, it would be simpler to rely on that, no?  Version.pm is able
> to handle integer as well as string comparisons for the version
> strings.


Both these patches look dubious.


1. There is no mention of why there's a change w.r.t. Cygwin and 
permissions checks. Maybe it's ok, but it seems off topic and is
certainly not referred to in the patch submission.

2. As Michael says, we should not be using perl's version module, we
should be using the version object built into each
PostgreSQL::Test::Cluster instance.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_auth_members.grantor is bunk

2022-06-02 Thread Robert Haas
On Thu, Jun 2, 2022 at 3:51 PM Tom Lane  wrote:
> > I sort of thought http://postgr.es/m/3981966.1646429...@sss.pgh.pa.us
> > constituted a completed investigation of this sort. No?
>
> I didn't think so.  It's clear that the spec expects us to track the
> grantor, but I didn't chase down what it expects us to *do* with that
> information, nor what it thinks the rules are for merging multiple
> authorizations.

Hmm, OK. Well, one problem is that I've never had any luck
interpreting what the spec says about anything, and I've sort of given
up. But even if that were not so, I'm a little unclear what other
conclusion is possible here. The spec either wants the same behavior
that we already have for other object types, which is what I am here
proposing that we do, or it wants something different. If it wants
something different, it probably wants that for all object types, not
just roles. Since I doubt we would want the behavior for roles to be
inconsistent with what we do for all other object types, in that case
we would probably either change the behavior for all other object
types to something new, and then clean up the role stuff afterwards,
or else first do what I proposed here and then later change it all at
once. In which case the proposal that I've made is as good a way to
start as any.

Now, if it happens to be the case that the spec proposes a different
behavior for roles than for non-role objects, and if the behavior for
roles is something other than the one we currently have for non-role
objects, then I'd agree that the plan I propose here needs revision. I
suspect that's unlikely, but I can't make anything of the spec, so
maybe?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: replacing role-level NOINHERIT with a grant-level option

2022-06-02 Thread Nathan Bossart
On Thu, Jun 02, 2022 at 03:37:34PM -0400, Robert Haas wrote:
> On Thu, Jun 2, 2022 at 2:07 PM Nathan Bossart  
> wrote:
>> I think we should also consider replacing role attributes with predefined
>> roles.  I'm not sure that this proposal totally prepares us for such a
>> change, given role attributes apply only to the specific role for which
>> they are set and aren't inherited.  ISTM in order to support that, we'd
>> need even more enhanced functionality.  For example, if I want 'robert' to
>> be a superuser, and I want 'joe' to inherit the privileges of 'robert' but
>> not 'pg_superuser', you'd need some way to specify inheriting only certain
>> privileges possessed by an intermediate role.
> 
> I guess we could think about adding something like an ONLY clause,
> like GRANT ONLY robert TO joe. I feel a little bit uncomfortable about
> that, though, because it assumes that robert is a superuser but his
> own privileges are distinguishable from those of the superuser. Are
> they really? If I can assume robert's identity, I can presumably
> Trojan my way into the superuser account pretty easily. I'll just
> define a little trigger on one of his tables. I don't really see a way
> where we can ever make it safe to grant a non-superuser membership in
> a superuser role.

I was primarily looking at this from the angle of preserving current
behavior when upgrading from a version with role attributes to a version
without them.  If it's alright that a role with privileges of a superuser
role begins being treated like a superuser after an upgrade, then we
probably don't need something like GRANT ONLY.  I bet that's how a lot of
people expect role attributes to work, anyway.  I'm sure I did at some
point.

> But even if there is a way, I think that is a separate patch from what
> I'm proposing here. [NO]INHERIT only has to do with what privileges
> you can exercise without SET ROLE. To solve the problem you're talking
> about here, you'd need a way to control what privileges are conferred
> in any manner, which is related, but different.

I agree that the role-attribute-to-predefined-role stuff needs its own
thread.  I just think it's worth designing this stuff with that in mind.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [RFC] building postgres with meson

2022-06-02 Thread Tom Lane
Andres Freund  writes:
> Oh. I executed maintainer-clean inside src/backend/parser/, and thus didn't
> see it getting cleaned up.

> It seems pretty darn grotty that src/backend/parser/Makefile explicitly states
> that gram.c ... aren't cleaned "here", but then src/backend/Makefile does
> clean them up.

I agree the factorization of this ain't great.  I'd think about improving
it, were it not that we're trying to get rid of it.

(But with meson, the whole idea of building or cleaning just part of the
tree is out the window anyway, no?)

regards, tom lane




Re: pg_auth_members.grantor is bunk

2022-06-02 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 2, 2022 at 3:15 PM Tom Lane  wrote:
>> Maybe.  What I was pointing out is that this is SQL-standard syntax
>> and there are SQL-standard semantics that it ought to be implementing.
>> Probably those semantics match what you describe here, but we ought
>> to dive into the spec and make sure before we spend a lot of effort.
>> It's not quite clear to me whether the spec defines any particular
>> unique key (identity) for the set of role authorizations.

> I sort of thought http://postgr.es/m/3981966.1646429...@sss.pgh.pa.us
> constituted a completed investigation of this sort. No?

I didn't think so.  It's clear that the spec expects us to track the
grantor, but I didn't chase down what it expects us to *do* with that
information, nor what it thinks the rules are for merging multiple
authorizations.

regards, tom lane




Re: pg_auth_members.grantor is bunk

2022-06-02 Thread Robert Haas
On Thu, Jun 2, 2022 at 3:15 PM Tom Lane  wrote:
> Maybe.  What I was pointing out is that this is SQL-standard syntax
> and there are SQL-standard semantics that it ought to be implementing.
> Probably those semantics match what you describe here, but we ought
> to dive into the spec and make sure before we spend a lot of effort.
> It's not quite clear to me whether the spec defines any particular
> unique key (identity) for the set of role authorizations.

I sort of thought http://postgr.es/m/3981966.1646429...@sss.pgh.pa.us
constituted a completed investigation of this sort. No?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: replacing role-level NOINHERIT with a grant-level option

2022-06-02 Thread Robert Haas
On Thu, Jun 2, 2022 at 2:07 PM Nathan Bossart  wrote:
> I think we should also consider replacing role attributes with predefined
> roles.  I'm not sure that this proposal totally prepares us for such a
> change, given role attributes apply only to the specific role for which
> they are set and aren't inherited.  ISTM in order to support that, we'd
> need even more enhanced functionality.  For example, if I want 'robert' to
> be a superuser, and I want 'joe' to inherit the privileges of 'robert' but
> not 'pg_superuser', you'd need some way to specify inheriting only certain
> privileges possessed by an intermediate role.

I guess we could think about adding something like an ONLY clause,
like GRANT ONLY robert TO joe. I feel a little bit uncomfortable about
that, though, because it assumes that robert is a superuser but his
own privileges are distinguishable from those of the superuser. Are
they really? If I can assume robert's identity, I can presumably
Trojan my way into the superuser account pretty easily. I'll just
define a little trigger on one of his tables. I don't really see a way
where we can ever make it safe to grant a non-superuser membership in
a superuser role.

But even if there is a way, I think that is a separate patch from what
I'm proposing here. [NO]INHERIT only has to do with what privileges
you can exercise without SET ROLE. To solve the problem you're talking
about here, you'd need a way to control what privileges are conferred
in any manner, which is related, but different.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 15:05:10 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-06-02 13:33:51 -0400, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> (*) do we really not have a target that removes bison / flex output?
> 
> >> maintainer-clean
> 
> > Don't think so:
> 
> See about line 300 in src/backend/Makefile.  In any case, it's
> easy to show by experiment that it does.
> 
> $ make maintainer-clean
> ...
> $ git status --ignored
> On branch master
> Your branch is up to date with 'origin/master'.
> 
> nothing to commit, working tree clean

Oh. I executed maintainer-clean inside src/backend/parser/, and thus didn't
see it getting cleaned up.

It seems pretty darn grotty that src/backend/parser/Makefile explicitly states
that gram.c ... aren't cleaned "here", but then src/backend/Makefile does
clean them up.

Greetings,

Andres Freund




Re: pg_auth_members.grantor is bunk

2022-06-02 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 4, 2022 at 4:34 PM Tom Lane  wrote:
>> If we are not tracking the grantors of role authorizations,
>> then we are doing it wrong and we ought to fix that.

> So let's talk about how we could fix this. In a vacuum I'd say this is
> just a feature that never got finished and we should rip the whole
> thing out. That is, remove pg_auth_members.grantor entirely and at
> most keep some do-nothing syntax around for backward compatibility.
> However, what Tom is saying in the text quoted above is that we ought
> to have something that actually works, which is more challenging.
> Apparently, the desired behavior here is for this to work like grants
> on non-role objects, where executing "GRANT SELECT ON TABLE s1 TO foo"
> under two different user accounts bar and baz that both have
> permissions to grant that privilege creates two independent grants
> that can be independently revoked.

Maybe.  What I was pointing out is that this is SQL-standard syntax
and there are SQL-standard semantics that it ought to be implementing.
Probably those semantics match what you describe here, but we ought
to dive into the spec and make sure before we spend a lot of effort.
It's not quite clear to me whether the spec defines any particular
unique key (identity) for the set of role authorizations.

regards, tom lane




Re: [RFC] building postgres with meson

2022-06-02 Thread Tom Lane
Andres Freund  writes:
> On 2022-06-02 13:33:51 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> (*) do we really not have a target that removes bison / flex output?

>> maintainer-clean

> Don't think so:

See about line 300 in src/backend/Makefile.  In any case, it's
easy to show by experiment that it does.

$ make maintainer-clean
...
$ git status --ignored
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

regards, tom lane




pg_auth_members.grantor is bunk

2022-06-02 Thread Robert Haas
On Fri, Mar 4, 2022 at 4:34 PM Tom Lane  wrote:
> If we are not tracking the grantors of role authorizations,
> then we are doing it wrong and we ought to fix that.

We are definitely doing it wrong. It's not that we aren't doing it at
all, but we are doing it incorrectly. If user foo executes "GRANT foo
TO bar GRANTED BY quux", pg_auth_members.grantor gets the OID of role
"quux". Without the "GRANTED BY" clause, it gets the OID of role
"foo". But no dependency is created; therefore, the OID of that column
can point to a role that no longer exists, or potentially to a role
that did exist at one point, was dropped, and was later replaced by
some other role that happens to get the same OID. pg_dump handles this
by dumping "GRANTED BY whoever" if pg_auth_members.grantor is a
still-extant role and omitting the clause if not. This would be a
security vulnerability if there were any logic in the backend that
actually did anything with pg_auth_members.grantor, because if the
original grantor is removed and replaced by another role with the same
OID, a dump/restore could change the notional grantor. Since there is
no such logic, I don't think it's insecure, but it's still really
lame.
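
As a minimal sketch of the behavior just described (role names hypothetical):

```
GRANT foo TO bar GRANTED BY quux;   -- pg_auth_members.grantor = OID of quux
GRANT foo TO baz;                   -- grantor = OID of the role running GRANT
DROP ROLE quux;                     -- allowed: no dependency was recorded, so
                                    -- the first row's grantor OID now dangles
```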

So let's talk about how we could fix this. In a vacuum I'd say this is
just a feature that never got finished and we should rip the whole
thing out. That is, remove pg_auth_members.grantor entirely and at
most keep some do-nothing syntax around for backward compatibility.
However, what Tom is saying in the text quoted above is that we ought
to have something that actually works, which is more challenging.
Apparently, the desired behavior here is for this to work like grants
on non-role objects, where executing "GRANT SELECT ON TABLE s1 TO foo"
under two different user accounts bar and baz that both have
permissions to grant that privilege creates two independent grants
that can be independently revoked. To get there, we'll have to change
a good few things -- not only will we need a dependency to prevent a
grantor from being dropped without revoking the grant, but we need to
change the primary key of pg_auth_members from (roleid, member) to
(roleid, member, grantor). Then we'll also have to change the behavior
of the GRANT and REVOKE commands at the SQL level, and also the
behavior of pg_dump, which will need to dump and restore all grants.

I'm open to other proposals, but my thought is that it might be
simplest to try to clean this up in two steps. In step one, the only
goal would be to make pg_auth_members.grantor reliably sane. In other
words, we'd add a dependency on the grantor when a role is granted to
another role. You could still only have one grant of role A to role B,
but the notional grantor C would always be a user that actually
exists. I suspect it would be a really good idea to also patch pg_dump
to not ever dump the grantor when working from an older release,
because the information is not necessarily reliable and I fear that
propagating it forward could lead to broken stuff or maybe even
security hazards as noted above. Then, in step two, we change things
around to allow multiple grants of the same role to the same other
role, one per grantor. Now you've achieved parity between the behavior
we have for roles and the behavior we have for permissions on other
kinds of SQL objects.
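
As a sketch of that target behavior (hypothetical roles, with bar and baz both
holding ADMIN OPTION on a), mirroring how table privileges already work:

```
SET ROLE bar;
GRANT a TO b;       -- grant recorded with grantor = bar
RESET ROLE;
SET ROLE baz;
GRANT a TO b;       -- a second, independent grant with grantor = baz
RESET ROLE;
SET ROLE bar;
REVOKE a FROM b;    -- removes only bar's grant; baz's grant remains
```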

There may be other improvements we want to make in this area -
previous discussions have suggested various ideas - but it seems to me
that making the behavior sane and consistent with other types of
objects would be a good start. That way, if we decide we do want to
change anything else, we will be starting from a firm foundation,
rather than building on sand.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: replacing role-level NOINHERIT with a grant-level option

2022-06-02 Thread Nathan Bossart
On Thu, Jun 02, 2022 at 12:26:48PM -0400, Robert Haas wrote:
> On Mon, Feb 7, 2022 at 11:13 AM Joe Conway  wrote:
>> > It seems to me that the INHERIT role flag isn't very well-considered.
>> > Inheritance, or the lack of it, ought to be decided separately for
>> > each inherited role. However, that would be a major architectural
>> > change.
>>
>> Agreed -- that would be useful.
> 
> Is this a kind of change people would support? Here's a quick sketch:

Yes.

> The idea here is that, today, a role must either inherit privileges
> from all roles of which it is a member or none of them. With this
> proposal, you could make a role inherit some of those privileges but
> not others. I think it's not difficult to see how that might be
> useful: for example, user A could be granted non-login role X with
> inheritance and login role B without inheritance. That would allow
> user A to exercise the privileges of a member of group X at all times,
> and the privileges of user B only with an explicit SET ROLE operation.
> That way, A is empowered to act for B when necessary, but won't
> accidentally do so.

I think we should also consider replacing role attributes with predefined
roles.  I'm not sure that this proposal totally prepares us for such a
change, given role attributes apply only to the specific role for which
they are set and aren't inherited.  ISTM in order to support that, we'd
need even more enhanced functionality.  For example, if I want 'robert' to
be a superuser, and I want 'joe' to inherit the privileges of 'robert' but
not 'pg_superuser', you'd need some way to specify inheriting only certain
privileges possessed by an intermediate role.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: replacing role-level NOINHERIT with a grant-level option

2022-06-02 Thread Robert Haas
On Thu, Jun 2, 2022 at 1:17 PM Tom Lane  wrote:
> Point 2 would cause every existing pg_dumpall script to fail, which
> seems like kind of a large gotcha.  Less unpleasant alternatives
> could include
>
> * Continue to accept the syntax, but ignore it, maybe with a WARNING
> for the alternative that doesn't correspond to the new behavior.
>
> * Keep pg_authid.rolinherit, and have it act as supplying the default
> behavior for subsequent GRANTs to that role.

Of those two alternatives, I would certainly prefer the first, because
the second doesn't actually get rid of the ugly wart. It just adds a
non-ugly thing that we have to maintain along with the ugly thing,
apparently in perpetuity. If we do the first of these, we can probably
remove the obsolete syntax at some point in the distant future, and in
the meantime, we don't have to figure out how it's supposed to
interact with existing features or new ones, since the actual feature
is already removed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 13:33:51 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > (*) do we really not have a target that removes bison / flex output?
> 
> maintainer-clean

Don't think so:

# gram.c, gram.h, and scan.c are in the distribution tarball, so they
# are not cleaned here.
clean distclean maintainer-clean:
rm -f lex.backup

Greetings,

Andres Freund




Re: [RFC] building postgres with meson

2022-06-02 Thread Tom Lane
Andres Freund  writes:
> (*) do we really not have a target that removes bison / flex output?

maintainer-clean

regards, tom lane




Re: [RFC] building postgres with meson

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 13:08:49 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'm not quite sure what the proper behaviour is when doing an out-of-tree
> > build with meson (all builds are out-of-tree), with a pre-existing flex /
> > bison output in the source tree that is out of date.
> 
> Definitely sounds like a gotcha.
> 
> On the one hand, there's been some discussion already of removing all
> derived files from tarballs and just insisting that users provide all
> needed tools when building from source.  If we did that, it could be
> sufficient for the meson build to check that no such files are present
> in the source tree.  (Checking a couple of them would be enough, likely.)

There already is a check for pg_config.h, so the most obvious source of this
is addressed. Just didn't think about the files that make clean doesn't remove
:/.


> On the other hand, I'm not sure that I want such a change to be forced
> by a toolchain change.  It definitely seems a bit contrary to the plan
> we'd formed of allowing meson and make-based builds to coexist for
> a few years, because we'd be breaking at least some make-based build
> processes.

Agreed. I think it'd be pretty reasonable to not include flex / bison
output. They're not hard to acquire. The docs are perhaps another story.

I think it might be fine to say that make reallyclean (*) is required if
there's some conflicting in-source tree file?


> Could we have the meson build check that, say, if gram.c exists it
> is newer than gram.y?  Or get it to ignore an in-tree gram.c?

I suspect the problem with ignoring is gram.h, that's probably a bit harder to
ignore. Right now I'm leaning towards either always erroring out if there's
bison/flex output in the source tree (with a hint towards make
reallyclean(*)), or erroring out if they're out of date (again with a hint
towards reallyclean)?

Alternatively we could just remove the generated .c/h files from the source
dir, as a part of regenerating them in the build dir? But I like the idea of
the source dir being readonly outside of explicit targets modifying sources
(e.g. update-unicode or such).

Greetings,

Andres Freund

(*) do we really not have a target that removes bison / flex output?




Re: replacing role-level NOINHERIT with a grant-level option

2022-06-02 Thread Tom Lane
Robert Haas  writes:
> Is this a kind of change people would support? Here's a quick sketch:

> 1. Extend the GRANT role_name TO role_name [ WITH ADMIN OPTION ] with
> a new, optional clause, something like WITH NO INHERIT or WITH
> NOINHERIT or WITHOUT INHERIT.
> 2. Remove the INHERIT | NOINHERIT flag from CREATE ROLE and ALTER ROLE.
> 3. Replace pg_authid.rolinherit with pg_auth_members.inherit. Any
> place where we would have considered rolinherit, instead consider the
> inherit flag for the particular pg_auth_members entry at issue.
> 4. When dumping from an old version, dump all grants to NOINHERIT
> roles as non-inheritable grants.

Point 2 would cause every existing pg_dumpall script to fail, which
seems like kind of a large gotcha.  Less unpleasant alternatives
could include

* Continue to accept the syntax, but ignore it, maybe with a WARNING
for the alternative that doesn't correspond to the new behavior.

* Keep pg_authid.rolinherit, and have it act as supplying the default
behavior for subsequent GRANTs to that role.

Perhaps there are other ways.

regards, tom lane




Re: [RFC] building postgres with meson

2022-06-02 Thread Tom Lane
Andres Freund  writes:
> I'm not quite sure what the proper behaviour is when doing an out-of-tree
> build with meson (all builds are out-of-tree), with a pre-existing flex /
> bison output in the source tree that is out of date.

Definitely sounds like a gotcha.

On the one hand, there's been some discussion already of removing all
derived files from tarballs and just insisting that users provide all
needed tools when building from source.  If we did that, it could be
sufficient for the meson build to check that no such files are present
in the source tree.  (Checking a couple of them would be enough, likely.)

On the other hand, I'm not sure that I want such a change to be forced
by a toolchain change.  It definitely seems a bit contrary to the plan
we'd formed of allowing meson and make-based builds to coexist for
a few years, because we'd be breaking at least some make-based build
processes.

Could we have the meson build check that, say, if gram.c exists it
is newer than gram.y?  Or get it to ignore an in-tree gram.c?

regards, tom lane




Re: convert libpq uri-regress tests to tap test

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-01 13:59:06 +0900, Michael Paquier wrote:
> On Tue, May 31, 2022 at 01:58:25PM +0900, Michael Paquier wrote:
> > Why don't you just use src/interfaces/ instead of adding a direct
> > path to libpq?
> 
> So, this leads to something like the attached.  Does that sound fine
> to you?

That looks reasonable to me. Do you want to apply it or will you?

Regards,

Andres




Re: convert libpq uri-regress tests to tap test

2022-06-02 Thread Andres Freund
Hi,

On 2022-05-29 10:18:50 -0500, Justin Pryzby wrote:
> On Sat, Feb 26, 2022 at 05:46:26PM -0800, Andres Freund wrote:
> > On 2022-02-25 17:52:29 -0800, Andres Freund wrote:
> > > I'd like to commit 0001 and 0002 soon, unless somebody sees a reason not 
> > > to?
> > 
> > Pushed.
> 
> If I'm not wrong, this isn't being run by check-world.

Oops, yes. Thanks for catching!




Re: [RFC] building postgres with meson

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-01 12:39:50 +0300, Aleksander Alekseev wrote:
> > > ```
> > > ../src/include/parser/kwlist.h:332:25: error: ‘PARAMETER’ undeclared here (not in a function)
> > > 332 | PG_KEYWORD("parameter", PARAMETER, UNRESERVED_KEYWORD, BARE_LABEL)
> > >
> > > ../src/interfaces/ecpg/preproc/keywords.c:32:55: note: in definition of macro ‘PG_KEYWORD’
> > > 32 | #define PG_KEYWORD(kwname, value, category, collabel) value,
> > > ```
> >
> > Huh. I've not seen this before - could you provide a bit more detail about
> > what you did? CI isn't testing ubuntu, but it is testing Debian, so I'd 
> > expect
> > this to work.
> 
> I used PIP to install Meson, since the default APT package is too old, v0.53:
> 
> $ pip3 install --user meson
> $ meson --version
> 0.62.1
> $ ninja --version
> 1.10.0
> 
> The branch was checked out as it was described in the first email.
> Then to reproduce the issue:
> 
> $ git status
> On branch meson
> Your branch is up to date with 'andres/meson'.
> $ git fetch andres
> $ git rebase -i andres/meson
> $ meson setup build --buildtype debug
> $ cd build
> $ ninja
> 
> This is pretty much the default Ubuntu 20.04.4 LTS system with all the
> recent updates installed, so it shouldn't be a problem to reproduce
> the issue with a VM.

Chatting with a colleague (who unbeknownst to me hit something similar in the
past) I think we figured it out. It's not due to Ubuntu 20.04 or such. It's
likely due to previously having an in-tree build with autoconf, doing make
clean, doing a git pull, then building with meson. The meson build doesn't yet
handle pre-existing flex / bison output.

I had tried to defend against conflicts with in-tree builds by detecting an
in-tree pg_config.h, but that doesn't help with files that aren't removed by
make clean. Like bison / flex output.

And I didn't notice this problem because it doesn't cause visible issues until
the lexer / grammar changes...


I'm not quite sure what the proper behaviour is when doing an out-of-tree
build with meson (all builds are out-of-tree), with a pre-existing flex /
bison output in the source tree that is out of date.

Greetings,

Andres Freund




Re: compiler warnings with gcc 4.8 and -Og

2022-06-02 Thread Tom Lane
Andres Freund  writes:
> I tried to use -Og many times, but in the end mostly gave up, because it still
> makes debugging harder compared to -O0.

Yeah.  My own habit is to build with -O2 normally.  If I'm trying to
debug some bit of code and find that I can't follow things adequately
in gdb, I recompile just the relevant file(s) with -O0.
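
For example, a minimal sketch of that workflow (assuming the PROFILE hook in
src/Makefile.global, which appends extra flags to CFLAGS, and a tree already
configured with -O2):

```
$ cd src/backend/executor
$ rm -f nodeSort.o
$ make PROFILE='-O0 -g3' nodeSort.o    # recompile just this file unoptimized
$ cd ../../.. && make && make install  # relink the server with the new object
```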

regards, tom lane




replacing role-level NOINHERIT with a grant-level option

2022-06-02 Thread Robert Haas
On Mon, Feb 7, 2022 at 11:13 AM Joe Conway  wrote:
> > It seems to me that the INHERIT role flag isn't very well-considered.
> > Inheritance, or the lack of it, ought to be decided separately for
> > each inherited role. However, that would be a major architectural
> > change.
>
> Agreed -- that would be useful.

Is this a kind of change people would support? Here's a quick sketch:

1. Extend the GRANT role_name TO role_name [ WITH ADMIN OPTION ] with
a new, optional clause, something like WITH NO INHERIT or WITH
NOINHERIT or WITHOUT INHERIT.
2. Remove the INHERIT | NOINHERIT flag from CREATE ROLE and ALTER ROLE.
3. Replace pg_authid.rolinherit with pg_auth_members.inherit. Any
place where we would have considered rolinherit, instead consider the
inherit flag for the particular pg_auth_members entry at issue.
4. When dumping from an old version, dump all grants to NOINHERIT
roles as non-inheritable grants.

The idea here is that, today, a role must either inherit privileges
from all roles of which it is a member or none of them. With this
proposal, you could make a role inherit some of those privileges but
not others. I think it's not difficult to see how that might be
useful: for example, user A could be granted non-login role X with
inheritance and login role B without inheritance. That would allow
user A to exercise the privileges of a member of group X at all times,
and the privileges of user B only with an explicit SET ROLE operation.
That way, A is empowered to act for B when necessary, but won't
accidentally do so.
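
A sketch of what that could look like with one of the spellings floated above
(none of this exists today; role names are made up):

```
CREATE ROLE x NOLOGIN;
CREATE ROLE b LOGIN;
CREATE ROLE a LOGIN;
GRANT x TO a;                   -- a exercises x's privileges implicitly
GRANT b TO a WITHOUT INHERIT;   -- a must SET ROLE b to use b's privileges
```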

It's been proposed elsewhere by Stephen and others that we ought to
have the ability to grant ADMIN OPTION on a role without granting
membership in that role. There's some overlap between these two
proposals. If your concern is just about accident prevention, you will
be happy to use this instead of that. If you want real access
restrictions, you will want that, not this. I think that's OK. Doing
this first doesn't mean we can't do that later. What about the
reverse? Could we add ADMIN-without-membership first, and then decide
whether to do this later? Maybe so, but I have come to feel that
NOINHERIT is such an ugly wart that we'll be happier the sooner we
kill it. It seems to make every discussion that we have about any
other potential change in this area more difficult, and I venture that
ADMIN-without-membership wouldn't turn out to be an exception.

Based on previous discussions I think that, long term, we're probably
headed toward a future where role grants have a bunch of different
flags, each of which controls a single behavior: whether you can
implicitly use the privileges of the role, whether you can access them
via SET ROLE, whether you can grant the role to others, or revoke it
from them, etc. I don't know exactly what the final list will look
like, and hopefully it won't be so long that it makes us all wish we
were dead, but there doesn't seem to be any possibility that implicit
inheritance of permissions won't be in that list. I spent a few
minutes considering whether I ought to instead propose that we just
nuke NOINHERIT from orbit and replace it with absolutely nothing, and
concluded that such a proposal had no hope of attracting consensus.
Perhaps the idea of replacing it with something that is more powerful and,
at least IMHO, cleaner will fare better.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Defer selection of asynchronous subplans until the executor initialization stage

2022-06-02 Thread Zhihong Yu
On Thu, Jun 2, 2022 at 5:08 AM Etsuro Fujita 
wrote:

> On Wed, Apr 6, 2022 at 3:58 PM Etsuro Fujita 
> wrote:
> > I have committed the patch after modifying it as such.
>
> The patch calls trivial_subqueryscan() during create_append_plan() to
> determine the triviality of a SubqueryScan that is a child of an
> Append node.  Unlike when calling it from
> set_subqueryscan_references(), this is done before some
> post-processing such as set_plan_references() on the subquery.  The
> reason why this is safe wouldn't be that obvious, so I added to
> trivial_subqueryscan() comments explaining this.  Attached is a patch
> for that.
>
> Best regards,
> Etsuro Fujita
>
Hi,
Suggestion on formatting the comment:

+ * node (or that for any plan node in the subplan tree), 2)
+ * set_plan_references() modifies the tlist for every plan node in the

It would be more readable if `2)` is put at the beginning of the second
line above.

+ * preserves the length and order of the tlist, and 3) set_plan_references()
+ * might delete the topmost plan node like an Append or MergeAppend from the

Similarly you can move `3) set_plan_references()` to the beginning of the
next line.

Cheers


Re: [RFC] building postgres with meson

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 15:34:23 +0300, Aleksander Alekseev wrote:
> Hi Andres,
> 
> > Cool. I think I pushed a fix/workaround for the issue now. Still can't 
> > decide
> > whether it's apple's or meson's fault.
> 
> Many thanks! The fix solved the problem, I can compile with -Dldap=enabled 
> now.
> The code passes the tests too.

Cool.


> > I suspect the errors might be due to CC/CPPFLAGS/... not being defined. I 
> > can
> > try to make it output something roughly compatible with the old output, for
> > most of those I already started to compute values for the PGXS compat stuff 
> > I
> > was hacking on recently.
> 
> Yes, that could explain the problem. Just for the record, I get several errors
> regarding src/export.h in TimescaleDB code [1]:
> 

I think this is timescale's issue. Why are you defining / undefining
PGDLLEXPORT?

Part of the patch series is to use visibility attributes, and your #if
TS_EMPTY(PGDLLEXPORT) thing can't handle that.

Greetings,

Andres Freund




Re: compiler warnings with gcc 4.8 and -Og

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 01:09:58 -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > forking: <20220302205058.gj15...@telsasoft.com>: Re: Adding CI to our tree
> > On Wed, Mar 02, 2022 at 02:50:58PM -0600, Justin Pryzby wrote:
> >> BTW (regarding the last patch), I just noticed that -Og optimization can 
> >> cause
> >> warnings with gcc-4.8.5-39.el7.x86_64.
> 
> I'm a little dubious about whether -Og is a case we should pay special
> attention to?  Our standard optimization setting for gcc is -O2, and
> once you go away from that there are any number of weird cases that
> may or may not produce warnings.  I'm not entirely willing to buy
> the proposition that we must suppress warnings on
> any-random-gcc-version combined with any-random-options.

I think it'd be useful to have -Og in a usable state, despite my nearby
griping about it. It makes our tests use noticeably fewer CPU cycles, and
debugging is less annoying than with -O2. It's also faster to compile.

However, making that effort for a compiler version that went out of support
in 2015 doesn't seem useful. It may be worth paying some attention to not
producing too many warnings on LTS distribution compilers when compiling with
production-oriented flags, but nobody should develop on them.

Greetings,

Andres Freund




Re: compiler warnings with gcc 4.8 and -Og

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 10:33:52 -0400, Tom Lane wrote:
> Matthias van de Meent  writes:
> > On Thu, 2 Jun 2022, 07:10 Tom Lane,  wrote:
> >> I'm a little dubious about whether -Og is a case we should pay special
> >> attention to?
> 
> > The "Developer FAQ" page on the wiki suggests that when you develop
> > with gcc that you use CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"
> > during development, so I'd hardly call -Og "any random option".
> 
> I have no idea who wrote that FAQ entry, and I'd certainly not
> accept it as being project policy.

I don't know either. However:

> I'd actually say that's an excellent example of adding some random compiler
> options.

To me they mostly make sense. -g3 with -ggdb makes gcc emit enough information
about macros that the debugger can interpret them. -fno-omit-frame-pointer
makes profiling with call graphs much cheaper.

I tried to use -Og many times, but in the end mostly gave up, because it still
makes debugging harder compared to -O0.

Greetings,

Andres Freund




Re: compiler warnings with gcc 4.8 and -Og

2022-06-02 Thread Andres Freund
Hi,

On 2022-06-02 13:24:28 +0900, Michael Paquier wrote:
> On Wed, Jun 01, 2022 at 09:42:44PM -0500, Justin Pryzby wrote:
> > Today's "warnings" thread suggests to me that these are worth fixing - it 
> > seems
> > reasonable to compile postgres 14 on centos7 (as I sometimes have done), and
> > the patch seems even more reasonable when backpatched to older versions.
> > (Also, I wonder if there's any consideration to backpatch cirrus.yaml, which
> > uses -Og)
> 
> https://en.wikipedia.org/wiki/CentOS#CentOS_releases tells that centos
> 7 will be supported until the end of 2024, so I would fix that.

To me fixing gcc 4.8 warnings feels like a fool's errand, unless they're
frequent enough to make compilation exceedingly verbose (e.g. warnings in a
header).


> > The patch below applies and fixes warnings back to v13.
> 
> I don't mind fixing what you have here, as a first step.  All those
> cases are telling us that the compiler does not see PG_TRY() as
> something it can rely on to set up each variable.  7292fd8 is
> complaining about the same point, actually, aka setjmp clobbering the
> variable, isn't it?  So wouldn't it be better to initialize them, as
> your patch does, but also mark them as volatile?  In short, what
> happens with -Wclobber and a non-optimized compilation?

FWIW, I found -Wclobber to be so buggy as to be pointless.

I don't think it needs to be volatile because it'll not be accessed if we error
out? At least in the first instances in Justin's patch.

Greetings,

Andres Freund




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-06-02 Thread Greg Stark
On Tue, 31 May 2022 at 12:00, Vladimir Sitnikov
 wrote:
>
> Please, do not suggest me avoid 65535 parameters. What I wanted was just to 
> test that the driver was able to handle 65535 parameters.

I don't think we have regression tests to cover things at these
limits; that might be worth adding if they're not too awkward to
maintain.

-- 
greg




Re: compiler warnings with gcc 4.8 and -Og

2022-06-02 Thread Tom Lane
Matthias van de Meent  writes:
> On Thu, 2 Jun 2022, 07:10 Tom Lane,  wrote:
>> I'm a little dubious about whether -Og is a case we should pay special
>> attention to?

> The "Developer FAQ" page on the wiki suggests that when you develop
> with gcc that you use CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"
> during development, so I'd hardly call -Og "any random option".

I have no idea who wrote that FAQ entry, and I'd certainly not
accept it as being project policy.  I'd actually say that's an
excellent example of adding some random compiler options.

regards, tom lane




Re: compiler warnings with gcc 4.8 and -Og

2022-06-02 Thread Matthias van de Meent
On Thu, 2 Jun 2022, 07:10 Tom Lane,  wrote:
>
> Justin Pryzby  writes:
> > forking: <20220302205058.gj15...@telsasoft.com>: Re: Adding CI to our tree
> > On Wed, Mar 02, 2022 at 02:50:58PM -0600, Justin Pryzby wrote:
> >> BTW (regarding the last patch), I just noticed that -Og optimization can 
> >> cause
> >> warnings with gcc-4.8.5-39.el7.x86_64.
>
> I'm a little dubious about whether -Og is a case we should pay special
> attention to?  Our standard optimization setting for gcc is -O2, and
> once you go away from that there are any number of weird cases that
> may or may not produce warnings.  I'm not entirely willing to buy
> the proposition that we must suppress warnings on
> any-random-gcc-version combined with any-random-options.

The "Developer FAQ" page on the wiki suggests that when you develop
with gcc that you use CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"
during development, so I'd hardly call -Og "any random option".

-Matthias




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-06-02 Thread Robert Haas
On Tue, May 31, 2022 at 6:21 PM Jacob Champion  wrote:
> v10 is rebased over latest; I've also added a PGDLLIMPORT to the new global.

I took a quick look at this and it doesn't seem crazy to me, except
that I think ParallelProcInfo is a bad name for it. It's kind of
generic, because neither "proc" nor "info" means a whole lot. It's
also kind of wrong, because I think "parallel" should be things that
have to do with parallelism, not just things that happen to be
synchronized across processes when parallelism is in use. It doesn't
make sense to me to have something called a ParallelProcInfo that is
used for every single connection in the universe even if parallelism
is completely disabled on the system.

I'm not sure what it SHOULD be called, exactly: that's one of the hard
problems in computer science.[1]

-- 
Robert Haas
EDB: http://www.enterprisedb.com

[1] https://martinfowler.com/bliki/TwoHardThings.html




Re: compiler warnings with gcc 4.8 and -Og

2022-06-02 Thread Daniel Gustafsson
> On 2 Jun 2022, at 07:09, Tom Lane  wrote:

> I'm a little dubious about whether -Og is a case we should pay special
> attention to?  Our standard optimization setting for gcc is -O2, and
> once you go away from that there are any number of weird cases that
> may or may not produce warnings.

I think we should pick one level to keep warning free, and stick to that.  In
light of that, -O2 seems a lot more appealing than -Og.

--
Daniel Gustafsson   https://vmware.com/





Re: [PATCH] Compression dictionaries for JSONB

2022-06-02 Thread Aleksander Alekseev
Hi Jacob,

Many thanks for your feedback!

> I saw there was some previous discussion about dictionary size. It
> looks like this approach would put all dictionaries into a shared OID
> pool. Since I don't know what a "standard" use case is, is there any
> risk of OID exhaustion for larger deployments with many dictionaries?
> Or is 2**32 so comparatively large that it's not really a serious
> concern?

I agree, this is a drawback of the current implementation. To be honest,
I simply followed the example of how ENUMs are implemented. I'm not 100% sure
if we should be worried here (apparently, freed OIDs are reused). I'm OK with
using a separate sequence if someone could second this. This is the first time
I'm altering the catalog so I'm not certain what the best practices are.

> I see the algorithm description, but I'm curious to know whether it's
> based on some other existing compression scheme, for the sake of
> comparison. It seems like it shares similarities with the Snappy
> scheme?
>
> Could you talk more about what the expected ratios and runtime
> characteristics are? Best I can see is that compression runtime is
> something like O(n * e * log d) where n is the length of the input, e
> is the maximum length of a dictionary entry, and d is the number of
> entries in the dictionary. Since e and d are constant for a given
> static dictionary, how well the dictionary is constructed is
> presumably important.

The algorithm is almost identical to the one I used in ZSON extension [1]
except the fact that ZSON uses 16-bit codes. In docs/benchmark.md you will find
approximate ratios to expect, etc. The reasons why this particular algorithm
was chosen are:

1. It was extensively tested in the past and seem to work OK for existing
   ZSON users.
2. It doesn't use any knowledge regarding the data structure and thus can be
   reused for TEXT/XML/etc as-is.
3. Previously we agreed that at some point users will be able to change the
   algorithm (the same way as they can do it for TOAST now) so which algorithm
   will be used in the first implementation is not that important. I simply
   choose the already existing one.

> > Current limitations (todo):
> > - ALTER TYPE is not implemented
>
> That reminds me. How do people expect to generate a "good" dictionary
> in practice? Would they somehow get the JSONB representations out of
> Postgres and run a training program over the blobs? I see some
> reference to training functions in the prior threads but don't see any
> breadcrumbs in the code.

So far we agreed that in the first implementation it will be done manually.
In the future it will be possible to update the dictionaries automatically
during VACUUM. The idea of something similar to zson_learn() procedure, as
I recall, didn't get much support, so we probably will not have it, or at least
it is not a priority.

> > - Alternative compression algorithms. Note that this will not require any
> > further changes in the catalog, only the values we write to pg_type and
> > pg_cast will differ.
>
> Could you expand on this? I.e. why would alternative algorithms not
> need catalog changes? It seems like the only schemes that could be
> used with pg_catalog.pg_dict are those that expect to map a byte
> string to a number. Is that general enough to cover other standard
> compression algorithms?

Sure. When creating a new dictionary pg_type and pg_cast are modified like this:

 =# CREATE TYPE mydict AS DICTIONARY OF JSONB ('abcdef', 'ghijkl');
CREATE TYPE
 =# SELECT * FROM pg_type WHERE typname = 'mydict';
-[ RECORD 1 ]--+---
oid| 16397
typname| mydict
typnamespace   | 2200
...
typarray   | 16396
typinput   | dictionary_in
typoutput  | dictionary_out
...

=# SELECT c.*, p.proname FROM pg_cast AS c
 LEFT JOIN pg_proc AS p
 ON p.oid = c.castfunc
  WHERE c.castsource = 16397 or c.casttarget = 16397;
-[ RECORD 1 ]-
oid | 16400
castsource  | 3802
casttarget  | 16397
castfunc| 9866
castcontext | a
castmethod  | f
proname | jsonb_dictionary
-[ RECORD 2 ]-
oid | 16401
castsource  | 16397
casttarget  | 3802
castfunc| 9867
castcontext | i
castmethod  | f
proname | dictionary_jsonb
-[ RECORD 3 ]-
oid | 16402
castsource  | 16397
casttarget  | 17
castfunc| 9868
castcontext | e
castmethod  | f
proname | dictionary_bytea

In order to add a new algorithm you simply need to provide alternatives
to dictionary_in / dictionary_out / jsonb_dictionary / dictionary_jsonb and
specify them in the catalog instead. The catalog schema will remain the same.
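
For example, a purely hypothetical usage sketch on top of the mydict type
created above, assuming the casts behave as the pg_cast entries suggest:

```
CREATE TABLE docs (payload mydict);
INSERT INTO docs VALUES ('{"key": "abcdef"}'::jsonb);  -- assignment cast jsonb -> mydict
SELECT payload::jsonb -> 'key' FROM docs;              -- cast back to jsonb for operators
```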

> It also seems strange to use a dictionary of C strings to compress
> binary data; wouldn't we want to be able to compress zero bytes too?

That's a good point. Again, here I simply followed the example of the ENUMs
implementation. Since compression dictionaries are intended to be used with
text-like types such as JSONB (and also JSON, TEXT 

Re: [RFC] building postgres with meson

2022-06-02 Thread Aleksander Alekseev
Hi Andres,

> Cool. I think I pushed a fix/workaround for the issue now. Still can't decide
> whether it's apple's or meson's fault.

Many thanks! The fix solved the problem, I can compile with -Dldap=enabled now.
The code passes the tests too.

> > $ meson configure -Dldap=disabled
> > $ meson configure -Dssl=openssl
> > $ meson configure -Dprefix=/Users/eax/pginstall
>
> FYI, you can set multiple options in one go ;)

Thanks! ;)

> Makes sense. Currently we don't fill the --configure thing, because there
> configure wasn't used. We could try to generate something compatible from
> meson options, but I'm not sure that's a good plan.

If pg_config output was 100% backward compatible with Autotools one, that would
simplify the lives of the extension developers for sure. However, considering
that at PGCon we agreed that both Autotools and Meson will be maintained for
several releases, personally I wouldn't say that this compatibility is
necessary, nor is it realistically deliverable. Nevertheless, IMO there should
be a stable and documented way to determine the PostgreSQL version (this can be
done with `pg_config --version` for both Autotools and Meson), the build tool
used (no way to determine) and the build options (no way to determine
for Meson).

> I suspect the errors might be due to CC/CPPFLAGS/... not being defined. I can
> try to make it output something roughly compatible with the old output, for
> most of those I already started to compute values for the PGXS compat stuff I
> was hacking on recently.

Yes, that could explain the problem. Just for the record, I get several errors
regarding src/export.h in TimescaleDB code [1]:

```
/Users/eax/projects/c/timescaledb/src/export.h:26:5: error: pasting formed
  ')87628', an invalid preprocessing token [clang-diagnostic-error]
#if TS_EMPTY(PGDLLEXPORT)
 ^
/Users/eax/projects/c/timescaledb/src/export.h:17:22: note: expanded from
  macro 'TS_EMPTY'
#define TS_EMPTY(x) (TS_CAT(x, 87628) == 87628)
 ^
/Users/eax/projects/c/timescaledb/src/export.h:15:23: note: expanded from
  macro 'TS_CAT'
#define TS_CAT(x, y) x##y
  ^
/Users/eax/projects/c/timescaledb/src/export.h:26:14: error: function-like
  macro '__attribute__' is not defined [clang-diagnostic-error]
#if TS_EMPTY(PGDLLEXPORT)
 ^
/Users/eax/pginstall/include/postgresql/server/c.h:1339:21: note: expanded
  from macro 'PGDLLEXPORT'
#define PGDLLEXPORT __attribute__((visibility("default")))
^
/Users/eax/projects/c/timescaledb/src/export.h:30:2: error: "PGDLLEXPORT is
  already defined" [clang-diagnostic-error]
#error "PGDLLEXPORT is already defined"
 ^
1 warning and 3 errors generated.
Error while processing /Users/eax/projects/c/timescaledb/src/extension.c
```

[1]: https://github.com/timescale/timescaledb/blob/main/src/export.h

-- 
Best regards,
Aleksander Alekseev




Re: Defer selection of asynchronous subplans until the executor initialization stage

2022-06-02 Thread Etsuro Fujita
On Wed, Apr 6, 2022 at 3:58 PM Etsuro Fujita  wrote:
> I have committed the patch after modifying it as such.

The patch calls trivial_subqueryscan() during create_append_plan() to
determine the triviality of a SubqueryScan that is a child of an
Append node.  Unlike when calling it from
set_subqueryscan_references(), this is done before some
post-processing such as set_plan_references() on the subquery.  The
reason why this is safe wouldn't be that obvious, so I added to
trivial_subqueryscan() comments explaining this.  Attached is a patch
for that.

Best regards,
Etsuro Fujita


Improve-comments-for-trivial_subqueryscan.patch
Description: Binary data


Re: bogus: logical replication rows/cols combinations

2022-06-02 Thread Amit Kapila
On Fri, May 27, 2022 at 1:04 PM houzj.f...@fujitsu.com
 wrote:
>
> On Friday, May 27, 2022 1:54 PM Justin Pryzby  wrote:
> >
> > On Fri, May 27, 2022 at 11:17:00AM +0530, Amit Kapila wrote:
> > > On Tue, May 24, 2022 at 11:03 AM houzj.f...@fujitsu.com
> >  wrote:
> > > >
> > > > On Friday, May 20, 2022 11:06 AM Amit Kapila 
> > wrote:
> > > >
> > > > Thanks for pointing it out. Here is the new version patch which add this
> > version check.
> > >
> > > I have added/edited a few comments and ran pgindent. The attached
> > > looks good to me. I'll push this early next week unless there are more
> > > comments/suggestions.
> >
> > A minor doc review.
> > Note that I also sent some doc comments at
> > 20220519120724.go19...@telsasoft.com.
> >
> > +  lists among publications in which case ALTER
> > PUBLICATION
> > +  command will be successful but later the WalSender in publisher
> > + or the
> >
> > COMMA in which
> >
> > remove "command" ?
> >
> > s/in publisher/on the publisher/
> >
> > +   Subscription having several publications in which the same table has 
> > been
> > +   published with different column lists is not supported.
> >
> > Either "Subscriptions having .. are not supported"; or, "A subscription 
> > having ..
> > is not supported".
>
> Thanks for the comments. Here is the new version patch set which fixes these.
>

I have pushed the bug-fix patch. I'll look at the language
improvements patch next.

-- 
With Regards,
Amit Kapila.




Re: Costing elided SubqueryScans more nearly correctly

2022-06-02 Thread Etsuro Fujita
On Thu, May 5, 2022 at 4:30 PM Richard Guo  wrote:
> On Thu, May 5, 2022 at 7:03 AM Tom Lane  wrote:
>> 1015 improvements to 14 disimprovements isn't a bad score.  I'm
>> a bit surprised there are that many removable SubqueryScans TBH;
>> maybe that's an artifact of all the "SELECT *" queries.

> The patch looks sane to me. 1015 vs 14 is a good win.

+1

Best regards,
Etsuro Fujita




Re: PG15 beta1 sort performance regression due to Generation context change

2022-06-02 Thread Yura Sokolov
On Fri, 27/05/2022 at 10:51 -0400, Tom Lane writes:
> Yura Sokolov  writes:
> > On Tue, 24/05/2022 at 17:39 -0700, Andres Freund writes:
> > > A variation on your patch would be to only store the offset to the block
> > > header - that should always fit into 32bit (huge allocations being their 
> > > own
> > > block, which is why this wouldn't work for storing an offset to the
> > > context).
> > I'm +1 for this.
> 
> Given David's results in the preceding message, I don't think I am.

But David did the opposite: he removed the pointer to the block and kept the
pointer to the context. Then the code has to do a bsearch to find the actual block.

> A scheme like this would add more arithmetic and at least one more
> indirection to GetMemoryChunkContext(), and we already know that
> adding even a test-and-branch there has measurable cost.  (I wonder
> if using unlikely() on the test would help?  But it's not unlikely
> in a generation-context-heavy use case.)

Well, it should be tested.

> There would also be a good
> deal of complication and ensuing slowdown created by the need for
> oversize chunks to be a completely different kind of animal with a
> different header.

Why? encoded_size could handle both small sizes and large sizes, given that
the actual (not requested) allocation size is rounded to the page size.
There's no need for a different chunk header.
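
A minimal sketch of the layout being discussed (illustrative names only, not
the actual PostgreSQL structures): the chunk stores a 32-bit offset back to its
block header, and the block header holds the pointer to the owning context.

```
#include <stdint.h>

/* Every block starts with a header that knows its owning context. */
typedef struct BlockHeader
{
    void       *context;        /* owning memory context */
} BlockHeader;

/* Common chunk header: 32-bit offset back to the block, 32-bit size word. */
typedef struct ChunkHeader
{
    uint32_t    block_offset;   /* distance from this chunk to its BlockHeader */
    uint32_t    encoded_size;   /* requested size, or an encoding that also
                                 * covers oversized, page-rounded chunks */
} ChunkHeader;

static inline BlockHeader *
ChunkGetBlock(ChunkHeader *chunk)
{
    return (BlockHeader *) ((char *) chunk - chunk->block_offset);
}

static inline void *
ChunkGetContext(ChunkHeader *chunk)
{
    return ChunkGetBlock(chunk)->context;
}
```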

> I'm also not very happy about this:
>
> > And with this change every memory context kind can have same header:
> 
> IMO that's a bug not a feature.  It puts significant constraints on how
> context types can be designed.

Nothing prevents adding additional data before the common header.


regards

Yura





Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT

2022-06-02 Thread Amit Langote
On Thu, Jun 2, 2022 at 6:14 PM Etsuro Fujita  wrote:
> On Thu, Jun 2, 2022 at 10:23 AM Amit Langote  wrote:
> > On Wed, Jun 1, 2022 at 6:15 PM Etsuro Fujita  
> > wrote:
> > > On Tue, May 31, 2022 at 9:35 PM Robert Haas  wrote:
> > > > I would probably just update the synopsis. It's not very hard to
> > > > figure out what's likely to happen even without clicking through the
> > > > link, so it seems like it's just being long-winded to duplicate the
> > > > stuff here. But I don't care much if you feel otherwise.
> > >
> > > It looks like there are pros and cons.  I think it’s a matter of
> > > preference, though.
> > >
> > > I thought it would be an improvement, but I agree that we can live
> > > without it, so I changed my mind; I'll go with my version.  I think we
> > > could revisit this later.
> >
> > I guess I'm fine with leaving the text as-is, though slightly bothered
> > by leaving the phrase "partition of the given parent table with
> > specified partition bound values" to also cover the DEFAULT partition
> > case.
>
> I think we should discuss this separately, maybe as a HEAD-only patch,
> so I pushed my version, leaving the description as-is.

No problem, thanks for the fix.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




RE: Perform streaming logical transactions by background workers and parallel apply

2022-06-02 Thread wangw.f...@fujitsu.com
On Sun, May 29, 2022 8:25 PM osumi.takami...@fujitsu.com 
 wrote:
> Hi,
> 
> 
> Some review comments on the new patches from v6-0001 to v6-0004.
Thanks for your comments.

> 
> 
> (1) create_subscription.sgml
> 
> +  the transaction is committed. Note that if an error happens when
> +  applying changes in a background worker, it might not report the
> +  finish LSN of the remote transaction in the server log.
> 
> I suggest to add a couple of sentences like below
> to the section of logical-replication-conflicts in logical-replication.sgml.
> 
> "
> Setting streaming mode to 'apply' can export invalid LSN as
> finish LSN of failed transaction. Changing the streaming mode and
> making the same conflict writes the finish LSN of the
> failed transaction in the server log if required.
> "
Add the sentences as suggested.

> (2) ApplyBgworkerMain
> 
> 
> +   PG_TRY();
> +   {
> +   LogicalApplyBgwLoop(mqh, pst);
> +   }
> +   PG_CATCH();
> +   {
> 
> ...
> 
> +   pgstat_report_subscription_error(MySubscription->oid, false);
> +
> +   PG_RE_THROW();
> +   }
> +   PG_END_TRY();
> 
> 
> When I stream a transaction in-progress and it causes an error(duplication 
> error),
> seemingly the subscription stats (values in pg_stat_subscription_stats) don't
> get updated properly. The 2nd argument should be true for apply error.
> 
> Also, I observe that both apply_error_count and sync_error_count
> get updated together by error. I think we need to check this point as well.
Yes, we should pass "true" as the 2nd argument here to log the apply error.
And after checking the second point you mentioned, I think it is caused by the
first point plus another reason:
With the v6 (or v7) patch and the "apply" option specified, when a streamed
transaction causes an error (a duplication error), the function
pgstat_report_subscription_error is invoked twice (in the main apply worker and
in the apply background worker; see ApplyWorkerMain()->start_apply() and
ApplyBgworkerMain). This means that for the same error we send the stats
message twice.
So to fix this, I removed the code that you mentioned and now just invoke
LogicalApplyBgwLoop here.

> 
> 
> 
> (3) logicalrep_write_attrs
> 
> +   if (rel->rd_rel->relhasindex)
> +   {
> +   List   *indexoidlist = RelationGetIndexList(rel);
> +   ListCell   *indexoidscan;
> +   foreach(indexoidscan, indexoidlist)
> 
> and
> 
> +   if (indexRel->rd_index->indisunique)
> +   {
> +   int i;
> +   /* Add referenced attributes to idindexattrs */
> +   for (i = 0; i < indexRel->rd_index->indnatts; i++)
> 
> We don't have each blank line after variable declarations.
> There might be some other codes where this point can be applied.
> Please check.
Improve the formatting as you suggested. And I ran pgindent on the new patches.

> (4)
> 
> +   /*
> +* If any unique index exist, check that they are same as remoterel.
> +*/
> +   if (!rel->sameunique)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("cannot replicate relation with different unique index"),
> +errhint("Please change the streaming option to 'on' instead of 'apply'.")));
> 
> 
> When I create a logical replication setup with different constraints
> and let streaming of in-progress transaction run,
> I keep getting this error.
> 
> This should be documented as a restriction or something,
> to let users know the replication progress can't go forward by
> any differences written like in the commit-message in v6-0003.
> 
> Also, it would be preferable to test this as well, if we
> don't dislike having TAP tests for this.
Yes, you are right. Thanks for the reminder.
I added this in the paragraph introducing value "apply" in
create_subscription.sgml:
```
To run in this mode, there are following two requirements. The first
is that the unique column should be the same between publisher and
subscriber; the second is that there should not be any non-immutable
function in subscriber-side replicated table.
```
Also added the related tests. (refer to 032_streaming_apply.pl in v8-0003)

I also made some other changes. The new patches and the modification details
were attached in [1].

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB62758A881FF3240171B7B21B9EDE9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei


RE: Perform streaming logical transactions by background workers and parallel apply

2022-06-02 Thread wangw.f...@fujitsu.com
On Mon, May 30, 2022 7:38 PM Amit Kapila  wrote:
> Few comments/suggestions for 0001 and 0003
> =
> 0001
> 
Thanks for your comments.

> 1.
> + else
> + snprintf(bgw.bgw_name, BGW_MAXLEN,
> + "logical replication apply worker for subscription %u", subid);
> 
> Can we slightly change the message to: "logical replication background
> apply worker for subscription %u"?
Improve the message as suggested.

> 2. Can we think of separating the new logic for applying the xact by
> bgworker into a new file like applybgwroker or applyparallel? We have
> previously done the same in the case of vacuum (see vacuumparallel.c).
Improve the patch as suggested. I separated the new logic related to apply
background worker to new file src/backend/replication/logical/applybgwroker.c.

> 3.
> + /*
> + * XXX The publisher side doesn't always send relation update messages
> + * after the streaming transaction, so update the relation in main
> + * apply worker here.
> + */
> + if (action == LOGICAL_REP_MSG_RELATION)
> + {
> + LogicalRepRelation *rel = logicalrep_read_rel(s);
> + logicalrep_relmap_update(rel);
> + }
> 
> I think the publisher side won't send the relation update message
> after streaming transaction only if it has already been sent for a
> non-streaming transaction in which case we don't need to update the
> local cache here. This is as per my understanding of
> maybe_send_schema(), do let me know if I am missing something? If my
> understanding is correct then we don't need this change.
I think we need this change because the publisher invokes
cleanup_rel_sync_cache when committing a streaming transaction, and that sets
"schema_sent" to true for the related entry. Later, the publisher may not send
this schema in maybe_send_schema because the schema was already sent
(schema_sent = true).
Without this change, it would cause an error in the following case:
Suppose that after the walsender starts, we first commit a streaming
transaction. The walsender sends a relation update message, and only the apply
background worker can update its relation map cache from that message. After
this, if we commit a non-streamed transaction touching the same replicated
table, the walsender will not send a relation update message, so the main
apply worker would never get one.
So I think we need this change to update the relation map cache not only in
the apply background worker but also in the main apply worker.
In addition, we should also handle the LOGICAL_REP_MSG_TYPE message just like
LOGICAL_REP_MSG_RELATION, so I improved the code you mentioned. BTW, I
simplified handle_streamed_transaction().

> 4.
> + * For the main apply worker, if in streaming mode (receiving a block of
> + * streamed transaction), we send the data to the apply background worker.
>   *
> - * If in streaming mode (receiving a block of streamed transaction), we
> - * simply redirect it to a file for the proper toplevel transaction.
> 
> This comment is slightly confusing. Can we change it to something
> like: "In streaming case (receiving a block of streamed transaction),
> for SUBSTREAM_ON mode, we simply redirect it to a file for the proper
> toplevel transaction, and for SUBSTREAM_APPLY mode, we send the
> changes to background apply worker."?
Improve the comments as suggested.

> 5.
> +apply_handle_stream_abort(StringInfo s)
>  {
> ...
> ...
> + /*
> + * If the two XIDs are the same, it's in fact abort of toplevel xact,
> + * so just free the subxactlist.
> + */
> + if (subxid == xid)
> + {
> + set_apply_error_context_xact(subxid, InvalidXLogRecPtr);
> 
> - fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path,
> O_RDONLY,
> - false);
> + AbortCurrentTransaction();
> 
> - buffer = palloc(BLCKSZ);
> + EndTransactionBlock(false);
> + CommitTransactionCommand();
> +
> + in_remote_transaction = false;
> ...
> ...
> }
> 
> Here, can we update the replication origin as we are doing in
> apply_handle_rollback_prepared? Currently, we don't do it because we
> are just cleaning up temporary files for which we don't even have a
> transaction. Also, we don't have the required infrastructure to
> advance origins for aborts as we have for abort prepared. See commits
> [1eb6d6527a][8a812e5106]. If we think it is a good idea then I think
> we need to send abort_lsn and abort_time from the publisher and we
> need to be careful to make it work with lower subscriber versions that
> don't have the facility to process these additional values.
I think it is a good idea. I will consider this and add this part in the next
version.

> 0003
> 
> 6.
> + /*
> + * If any unique index exist, check that they are same as remoterel.
> + */
> + if (!rel->sameunique)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot replicate relation with different unique index"),
> + errhint("Please change the streaming option to 'on' instead of 'apply'.")));
> 
> I think we can do better here. 

Re: Assertion failure with barriers in parallel hash join

2022-06-02 Thread David Geier
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Hi all,

We recently encountered the same bug in the field. Oleksii Kozlov managed to 
come up with reproduction steps, which reliably trigger it. Interestingly, the 
bug manifests not only as a failing assertion but also as a segmentation 
fault, in builds with assertions disabled and with them enabled (!). So it can crash 
production environments. We applied the proposed patch v3 from Melanie to the 
REL_14_3 branch and can confirm that with the patch neither the assertion failure 
nor the segmentation fault occurs anymore.

I have also glanced at the code and the implementation looks fine. However, I'm 
not an expert on the fairly involved hash join state machine.
There seems to be no need for additional documentation.

For completeness here is the stack trace of the segmentation fault.
Like the stack trace from the assertion failure initially shared by Michael and 
also encountered by us, the stack trace of the segmentation fault also contains 
ExecParallelHashJoinNewBatch().

#9  | Source "/opt/src/backend/executor/execMain.c", line 361, in 
standard_ExecutorRun
| Source "/opt/src/backend/executor/execMain.c", line 1551, in ExecutePlan
  Source "/opt/src/include/executor/executor.h", line 257, in ExecProcNode 
[0x657e4d]
#8  | Source "/opt/src/backend/executor/nodeAgg.c", line 2179, in ExecAgg
  Source "/opt/src/backend/executor/nodeAgg.c", line 2364, in 
agg_retrieve_direct [0x66ba60]
#7  | Source "/opt/src/backend/executor/nodeAgg.c", line 581, in 
fetch_input_tuple
  Source "/opt/src/include/executor/executor.h", line 257, in ExecProcNode 
[0x66d585]
#6  | Source "/opt/src/backend/executor/nodeHashjoin.c", line 607, in 
ExecParallelHashJoin
| Source "/opt/src/backend/executor/nodeHashjoin.c", line 560, in 
ExecHashJoinImpl
  Source "/opt/src/backend/executor/nodeHashjoin.c", line 1132, in 
ExecParallelHashJoinNewBatch [0x67a89b]
#5  | Source "/opt/src/backend/storage/ipc/barrier.c", line 242, in 
BarrierAttach
  Source "/opt/src/include/storage/s_lock.h", line 228, in tas [0x7c2a1b]
#4Object "/lib/x86_64-linux-gnu/libpthread.so.0", at 0x7f4db364841f, in 
__funlockfile

--
David Geier
(ServiceNow)

The new status of this patch is: Ready for Committer


Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT

2022-06-02 Thread Etsuro Fujita
On Thu, Jun 2, 2022 at 10:23 AM Amit Langote  wrote:
> On Wed, Jun 1, 2022 at 6:15 PM Etsuro Fujita  wrote:
> > On Tue, May 31, 2022 at 9:35 PM Robert Haas  wrote:
> > > I would probably just update the synopsis. It's not very hard to
> > > figure out what's likely to happen even without clicking through the
> > > link, so it seems like it's just being long-winded to duplicate the
> > > stuff here. But I don't care much if you feel otherwise.
> >
> > It looks like there are pros and cons.  I think it’s a matter of
> > preference, though.
> >
> > I thought it would be an improvement, but I agree that we can live
> > without it, so I changed my mind; I'll go with my version.  I think we
> > could revisit this later.
>
> I guess I'm fine with leaving the text as-is, though slightly bothered
> by leaving the phrase "partition of the given parent table with
> specified partition bound values" to also cover the DEFAULT partition
> case.

I think we should discuss this separately, maybe as a HEAD-only patch,
so I pushed my version, leaving the description as-is.

Best regards,
Etsuro Fujita




Proposal: adding a better description in psql command about large objects

2022-06-02 Thread Thibaud W.

Hello,

Attached is a small patch to add descriptions to the meta-commands regarding
large objects.


The current description when using psql --help=commands is:

Large Objects
   \lo_export LOBOID FILE
   \lo_import FILE [COMMENT]
   \lo_list
   \lo_unlink LOBOID  large object operations

The proposed description is:

Large Objects
   \lo_export LOBOID FILE     export large object to a file
   \lo_import FILE [COMMENT]  import large object from a file
   \lo_list                   list large objects
   \lo_unlink LOBOID          delete a large object


I tried to align the descriptions with those of the other meta-commands.
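
For reference, a minimal session exercising these commands might look like
this (the OID 16404 is just an example value assigned by the server, and the
server output is omitted):

postgres=# \lo_import 'logo.png' 'company logo'
postgres=# \lo_list
postgres=# \lo_export 16404 '/tmp/logo_copy.png'
postgres=# \lo_unlink 16404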

Thanks.
Regards.
--
Thibaud W.
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 49eb116f33..5af7abca9e 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -321,10 +321,10 @@ slashUsage(unsigned short int pager)
fprintf(output, "\n");
 
fprintf(output, _("Large Objects\n"));
-   fprintf(output, _("  \\lo_export LOBOID FILE\n"
-                     "  \\lo_import FILE [COMMENT]\n"
-                     "  \\lo_list[+]\n"
-                     "  \\lo_unlink LOBOID  large object operations\n"));
+   fprintf(output, _("  \\lo_export LOBOID FILE     export large object to a file\n"));
+   fprintf(output, _("  \\lo_import FILE [COMMENT]  import large object from a file\n"));
+   fprintf(output, _("  \\lo_list[+]                list large objects\n"));
+   fprintf(output, _("  \\lo_unlink LOBOID          delete a large object\n"));
 
ClosePager(output);
 }


Re: PG15 beta1 sort performance regression due to Generation context change

2022-06-02 Thread John Naylor
I ran a shorter version of David's script with just 6-9 attributes to
try to reproduce the problem area (spreadsheet with graph attached).
My test is also different in that I compare HEAD with just reverting
40af10b57. This shows a 60% increase in runtime on HEAD for 64MB
work_mem and 64-byte tuples. It also shows a 20% regression for 32MB
work_mem and 64-byte tuples.

I don't have anything to add to the discussion about whether something
needs to be done here for PG15. If anything, changing work_mem is an
easy-to-understand (although sometimes impractical) workaround.
-- 
John Naylor
EDB: http://www.enterprisedb.com


Sort benchmark PG15 vs revert gen ctx JCN.ods
Description: application/vnd.oasis.opendocument.spreadsheet


RE: effective_io_concurrency and NVMe devices

2022-06-02 Thread Jakub Wartak
Hi Nathan,

> > NVMe devices have a maximum queue length of 64k:
[..]
> > but our effective_io_concurrency maximum is 1,000:
[..]
> > Should we increase its maximum to 64k?  Backpatched?  (SATA has a
> > maximum queue length of 256.)
> 
> If there are demonstrable improvements with higher values, this seems
> reasonable to me.  I would even suggest removing the limit completely so
> this doesn't need to be revisited in the future.

Well, are there any? I remember playing with this (although for Stephen's
ANALYZE case [1]) and got quite contrary results [2] -- going from 8 to 16
actually degraded performance.
I somehow struggle to understand how 1000+ fadvise() syscalls would be a net
benefit on storage with a latency of ~0.1..0.3 ms, as each syscall is
overhead on its own (quite the contrary, it should help on very slow
storage?).
Pardon me if I'm wrong (I don't have time to look up the code now), but maybe
the bitmap scan / fadvise() logic would first need to aggregate the fadvise()
offsets/lengths into bigger fadvise() syscalls, so that in the end the I/O
concurrency observable at the hardware level would be higher (assuming that
the fs/LVM/dm/mq layers would split that into more parallel I/Os).
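
For anyone who wants to check for demonstrable improvements on their own
device, the comparison boils down to something like the following (table and
tablespace names are made up; the setting only matters for prefetch-heavy
plans such as bitmap heap scans, so the query has to actually choose one):

-- baseline
SET effective_io_concurrency = 8;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders
WHERE order_date BETWEEN '2022-01-01' AND '2022-02-01';

-- much deeper prefetch queue
SET effective_io_concurrency = 512;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders
WHERE order_date BETWEEN '2022-01-01' AND '2022-02-01';

-- or persist the setting for a tablespace sitting on the NVMe device
ALTER TABLESPACE nvme_ts SET (effective_io_concurrency = 512);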

-J.

[1] - https://commitfest.postgresql.org/30/2799/
[2] - 
https://www.postgresql.org/message-id/flat/vi1pr0701mb69603a433348edcf783c6ecbf6...@vi1pr0701mb6960.eurprd07.prod.outlook.com


 





Re: Multi-Master Logical Replication

2022-06-02 Thread Peter Smith
On Thu, Jun 2, 2022 at 12:03 AM Bruce Momjian  wrote:
>
> On Wed, Jun  1, 2022 at 10:27:27AM +0530, Amit Kapila wrote:
...

> My big point is that you should not be showing up with a patch but
> rather have these discussions to get agreement that this is the
> direction the community wants to go.

The purpose of posting the POC patch was certainly not to present a
fait accompli design/implementation.

We wanted to solicit some community feedback about the desirability of
the feature, but because LRG is complicated to describe we felt that
having a basic functional POC might help people better understand the
proposal. Also, we thought the ability to experiment with the proposed
API could help people decide whether LRG is something worth pursuing or
not.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Proposal: add a debug message about using geqo

2022-06-02 Thread KAWAMOTO Masaya
On Tue, 10 May 2022 18:49:54 +0530
Ashutosh Bapat  wrote:

> If we add that information to EXPLAIN output, the user won't need
> access to server logs.
> 
> May be we need it in both the places.

That sounds like a nice idea. But I don't think postgres currently shows
in the EXPLAIN output why a particular plan was selected. Would it be
appropriate to show that GEQO was used in the EXPLAIN output?

As a test, I created a patch that adds information about GEQO to the
EXPLAIN output via a GEQO option. An output example is shown below.
What do you think about the location and content of the GEQO information?

postgres=# explain (geqo) select o.id, o.date, c.name as customer_name, 
bar.amount as total_amount
from orders o join customer c on o.customer_id = c.id
join (select foo.id as id, sum(foo.amount) as amount
from (select od.order_id as id, p.name as name, od.quantity as 
quantity, (p.price * od.quantity) as amount
  from order_detail od join product p on od.product_id = p.id
  ) as foo
group by id) as bar on o.id = bar.id ;
   QUERY PLAN

 Hash Join  (cost=118.75..155.04 rows=200 width=48)
   Hash Cond: (o.customer_id = c.id)
   ->  Hash Join  (cost=94.58..130.34 rows=200 width=20)
 Hash Cond: (o.id = bar.id)
 ->  Seq Scan on orders o  (cost=0.00..30.40 rows=2040 width=12)
 ->  Hash  (cost=92.08..92.08 rows=200 width=12)
   ->  Subquery Scan on bar  (cost=88.08..92.08 rows=200 width=12)
 ->  HashAggregate  (cost=88.08..90.08 rows=200 width=12)
   Group Key: od.order_id
   ->  Hash Join  (cost=37.00..72.78 rows=2040 width=12)
 Hash Cond: (od.product_id = p.id)
 ->  Seq Scan on order_detail od  
(cost=0.00..30.40 rows=2040 width=12)
 ->  Hash  (cost=22.00..22.00 rows=1200 width=8)
   ->  Seq Scan on product p  
(cost=0.00..22.00 rows=1200 width=8)
   ->  Hash  (cost=16.30..16.30 rows=630 width=36)
 ->  Seq Scan on customer c  (cost=0.00..16.30 rows=630 width=36)
 GeqoDetails: GEQO: used, geqo_threshold: 3, Max join nodes: 3
(17 rows)

postgres=# set geqo_threshold to 16;
SET
postgres=# explain (geqo) select ... ;
   QUERY PLAN

 Hash Join  (cost=118.75..155.04 rows=200 width=48)
   Hash Cond: (o.customer_id = c.id)
   ->  Hash Join  (cost=94.58..130.34 rows=200 width=20)
 Hash Cond: (o.id = bar.id)
 ->  Seq Scan on orders o  (cost=0.00..30.40 rows=2040 width=12)
 ->  Hash  (cost=92.08..92.08 rows=200 width=12)
   ->  Subquery Scan on bar  (cost=88.08..92.08 rows=200 width=12)
 ->  HashAggregate  (cost=88.08..90.08 rows=200 width=12)
   Group Key: od.order_id
   ->  Hash Join  (cost=37.00..72.78 rows=2040 width=12)
 Hash Cond: (od.product_id = p.id)
 ->  Seq Scan on order_detail od  
(cost=0.00..30.40 rows=2040 width=12)
 ->  Hash  (cost=22.00..22.00 rows=1200 width=8)
   ->  Seq Scan on product p  
(cost=0.00..22.00 rows=1200 width=8)
   ->  Hash  (cost=16.30..16.30 rows=630 width=36)
 ->  Seq Scan on customer c  (cost=0.00..16.30 rows=630 width=36)
 GeqoDetails: GEQO: not used, geqo_threshold: 16, Max join nodes: 3
(17 rows)

postgres=# explain (analyze, settings, geqo) select ...;
QUERY PLAN


--
 Hash Join  (cost=118.75..155.04 rows=200 width=48) (actual time=0.104..0.113 
rows=3 loops=1)
   Hash Cond: (o.customer_id = c.id)
   ->  Hash Join  (cost=94.58..130.34 rows=200 width=20) (actual 
time=0.042..0.048 rows=3 loops=1)
 Hash Cond: (o.id = bar.id)
 ->  Seq Scan on orders o  (cost=0.00..30.40 rows=2040 width=12) 
(actual time=0.003..0.005 rows=3 loops=1)
 ->  Hash  (cost=92.08..92.08 rows=200 width=12) (actual 
time=0.034..0.037 rows=3 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  Subquery Scan on bar  (cost=88.08..92.08 rows=200 width=12) 
(actual time=0.031..0.035 rows=3 loops=1)
 ->  HashAggregate  (cost=88.08..90.08 rows=200 width=12) 
(actual time=0.030..0.033 rows=3 loops=1)
   Group Key: od.order_id
   Batches: 1  Memory Usage: 56kB
   ->  

Re: Multi-Master Logical Replication

2022-06-02 Thread Amit Kapila
On Wed, Jun 1, 2022 at 7:33 PM Bruce Momjian  wrote:
>
> On Wed, Jun  1, 2022 at 10:27:27AM +0530, Amit Kapila wrote:
> > On Tue, May 31, 2022 at 7:36 PM Bruce Momjian  wrote:
> > > Uh, thinking some more, why would anyone set things up this way ---
> > > having part of a table being primary on one server and a different part
> > > of the table be a subscriber.  Seems it would be simpler and safer to
> > > create two child tables and have one be primary on only one server.
> > > Users can access both tables using the parent.
> >
> > Yes, users can choose to do that way but still, to keep the nodes in
> > sync and continuity of operations, it will be very difficult to manage
> > the operations without the LRG APIs. Let us consider a simple two-node
> > example where on each node there is Table T that has partitions P1 and
> > P2. As far as I can understand, one needs to have the below kind of
> > set-up to allow local operations on geographically distributed nodes.
> >
> > Node-1:
> > node1 writes to P1
> > node1 publishes P1
> > node2 subscribes to P1 of node1
> >
> > Node-2:
> > node2 writes to P2
> > node2 publishes P2
> > node1 subscribes to P2 on node2
>
> Yes, that is how you would set it up.
>
> > In this setup, we need to publish individual partitions, otherwise, we
> > will face the loop problem where the data sent by node-1 to node-2 via
> > logical replication will again come back to it causing problems like
> > constraints violations, duplicate data, etc. There could be other ways
> > to do this set up with current logical replication commands (for ex.
> > publishing via root table) but that would require ways to avoid loops
> > and could have other challenges.
>
> Right, individual partitions.
>
> > Now, in such a setup/scheme, consider a scenario (scenario-1), where
> > node-2 went off (either it crashes, went out of network, just died,
> > etc.) and comes up after some time. Now, one can either make the
> > node-2 available by fixing the problem it has or can promote standby
> > in that location (if any) to become master, both might require some
> > time. In the meantime to continue the operations (which provides a
> > seamless experience to users), users will be connected to node-1 to
> > perform the required write operations. Now, to achieve this without
> > LRG APIs, it will be quite complex for users to keep the data in sync.
> > One needs to perform various steps to get the partition P2 data that
> > went to node-1 till the time node-2 was not available. On node-1, it
> > has to publish P2 changes for the time node-2 becomes available with
> > the help of Create/Drop Publication APIs. And when node-2 comes back,
> > it has to create a subscription for the above publication pub-2 to get
> > that data, ensure both the nodes and in sync, and then allow
> > operations on node-2.
>
> Well, you are going to need to modify the app so it knows it can write
> to both partitions on failover anyway.
>

I am not sure this point is clear to me. From what I understand, there
are two possibilities for the app in this case, and both seem to be
problematic.

(a) The app can be taught to write to the P2 partition on node-1 while
node-2 is unavailable. If so, how will we get the partition P2 data that
went to node-1 during that time over to node-2? If we don't get the data
to node-2, then operations on node-2 (once it comes back) can return
incorrect results. Also, we need to ensure that all the P2 data that went
to node-1 is replicated to all other nodes in the system, and for that we
also need to create new subscriptions pointing to node-1. It is easier to
think of doing this with physical replication, where after failover the
old master node can start following the new node and the app just needs
to be taught to write to the new master node. I can't see how we can
achieve that with the current logical replication APIs (apart from
following the complex steps shared by me). One of the purposes of these
new LRG APIs is to ensure that users don't need to follow those complex
steps after failover.
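
Just to make those manual steps concrete, the catch-up after node-2 comes
back would look roughly like the following (object names and the connection
string are only illustrative, and this deliberately glosses over the
initial-copy handling and loop avoidance, which is exactly where it gets
complicated):

-- On node-1: publish the P2 changes that accumulated while node-2 was down.
CREATE PUBLICATION pub_p2_catchup FOR TABLE p2;

-- On node-2, once it is back: pull those changes from node-1. Whether and
-- how to do the initial copy without duplicating the rows node-2 already
-- has is part of the complexity mentioned above.
CREATE SUBSCRIPTION sub_p2_catchup
    CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub_p2_catchup;

-- After verifying that both nodes are in sync again, tear this down and
-- resume the normal publication/subscription arrangement.
DROP SUBSCRIPTION sub_p2_catchup;   -- on node-2
DROP PUBLICATION pub_p2_catchup;    -- on node-1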

(b) The other possibility is that the app is responsible for ensuring
that the same data is written on both node-1 and node-2 while one of
them is unavailable. For that, the app needs to store the data somewhere
for as long as one of the nodes is unavailable and then write it once
that node becomes available again. Also, this won't be practical when
there are more partitions (say 10 or more), as all the partitions' data
would need to be present on each node. I think it is the responsibility
of the database to keep the data in sync among nodes when one or more of
the nodes are not available.

-- 
With Regards,
Amit Kapila.