Re: Support for VACUUMing Foreign Tables

2021-05-13 Thread Michael Paquier
On Fri, May 14, 2021 at 01:05:02AM +, tsunakawa.ta...@fujitsu.com wrote:
> Could you let us imagine more concretely how useful it will be?
> While TRUNCATE can be part of an application's data processing as an
> alternative to DELETE, I think VACUUM is purely data storage
> maintenance that's performed by the DBA and can naturally be done
> locally on the server where the table resides.  (The existing
> ANALYZE on FDW is an exception; it's useful to also have data
> statistics locally.)

The concept of vacuuming applies to PG because of the concepts behind
its MVCC implementation.  Thinking more broadly, in what way could that
apply to FDWs in general?

> How about adding a routine to the FDW interface that allows
> executing an arbitrary command like the following?  VACUUM will be
> able to use this.
> 
> PGresult *DoCommandPathThrough(ForeignTable *table, const char *command);
> 
> Or, maybe it's more flexible to use ForeignServer instead of ForeignTable.

Being able to pass arbitrary command strings down to remote servers
sounds like a recipe for security holes, IMO.
--
Michael


signature.asc
Description: PGP signature


Re: subscriptioncheck failure

2021-05-13 Thread Michael Paquier
On Thu, May 13, 2021 at 07:05:55PM +0530, vignesh C wrote:
> Thanks for the comments, Please find the attached patch having the changes.

Cool, thanks for the new version.  I have spent some time
understanding the initial report from Amit as well as what you are
proposing here, and refactoring the test so that the set of CREATE/ALTER
SUBSCRIPTION commands is contained within this routine is a good idea.

I would have made the comment on top of setup_subscription a bit more
talkative regarding the fact that it may reuse an existing
subscription, but that's a nit.  Let's wait for Amit and see what he
thinks about what you are proposing.
--
Michael


signature.asc
Description: PGP signature


Re: OOM in spgist insert

2021-05-13 Thread Dilip Kumar
On Fri, May 14, 2021 at 6:31 AM Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > On 2021-May-13, Tom Lane wrote:
> >> What do people think about back-patching this?  In existing branches,
> >> we've defined it to be an opclass bug if it fails to shorten the leaf
> >> datum enough.  But not having any defenses against that seems like
> >> not a great idea.  OTOH, the 10-cycles-to-show-progress rule could be
> >> argued to be an API break.
>
> > I think if the alternative is to throw an error, we can afford to retry
> > quite a few more times than 10 in order not have that called an API
> > break.  Say, retry (MAXIMUM_ALIGNOF << 3) times or so (if you want to
> > parameterize on maxalign).  It's not like this is going to be a
> > performance drag where not needed .. but I think leaving back-branches
> > unfixed is not great.
>
> Hm.  Index bloat is not something to totally ignore, though, so I'm
> not sure what the best cutoff is.
>
> Anyway, here is a patch set teased apart into committable bites,
> and with your other points addressed.

I have tested with my original issue, and this patch fixes it.
Thanks!


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Race condition in recovery?

2021-05-13 Thread Kyotaro Horiguchi
At Fri, 14 May 2021 14:12:31 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Thu, 13 May 2021 17:07:31 -0400, Robert Haas  wrote 
> in 
> > On Mon, May 10, 2021 at 4:35 AM Kyotaro Horiguchi
> >  wrote:
> > > It seems to me the issue here is not a race condition but
> > > WaitForWALToBecomeAvailable initializing expectedTLEs with the history
> > > of a improper timeline. So using recoveryTargetTLI instead of
> > > receiveTLI for the case fixes this issue.
> > 
> > I agree.
> > 
> > > I believe the 004_timeline_switch.pl detects your issue.  And the
> > > attached change fixes it.
> > 
> > So why does this use recoveryTargetTLI instead of receiveTLI only
> > conditionally? Why not do it all the time?
> 
> The commit ee994272ca apparently says that there's a case where primary 

This is not an incomplete line but just garbage.

> > The hard thing about this code is that the assumptions are not very
> > clear. If we don't know why something is a certain way, then we might
> > break things if we change it. Worse yet, if nobody else knows why it's
> > like that either, then who knows what assumptions they might be
> > making? It's hard to be sure that any change is safe.
> 
> Thanks for the comment.
> 
> > But that being said, we have a clear definition from the comments for
> > what expectedTLEs is supposed to contain, and it's only going to end
> > up with those contents if we initialize it from recoveryTargetTLI. So
> > I am inclined to think that we ought to do that always, and if it
> 
> Yes, I also found it after that, and agreed.  Desynchronization
> between recoveryTargetTLI and expectedTLEs prevents
> rescanLatestTimeline from working.
> 
> > breaks something, then that's a sign that some other part of the code
> > also needs fixing, because apparently that hypothetical other part of
> > the code doesn't work if expctedTLEs contains what the comments say
> > that it should.
> 
> After some more inspection, I'm now also sure that it is a
> typo/thinko.  Other than while fetching the first checkpoint,
> receiveTLI is always in the history of recoveryTargetTLI; otherwise,
> recoveryTargetTLI is equal to receiveTLI.
> 
> I first read the commit message as "waiting to fetch possible
> future history files to know if there's a future for the current
> timeline".  However, now I read it as "don't bother waiting for
> possibly-unavailable history files when we're reading the first
> checkpoint, whose timeline is already known to us".  If that is
> correct, we don't bother considering future history files coming from
> the primary there.
> 
> > Now maybe that's the wrong idea. But if so, then we're saying that the
> > definition of expectedTLEs needs to be changed, and we should update
> > the comments with the new definition, whatever it is. A lot of the
> > confusion here results from the fact that the code and comments are
> > inconsistent and we can't tell whether that's intentional or
> > inadvertent. Let's not leave the next person who looks at this code
> > wondering the same thing about whatever changes we make.
> 
> Ok.  The reason why we haven't had a complaint about that would be
> that it is rare that pg_wal is wiped out before a standby connects to
> a just-promoted primary.  I'm not sure about the tool Dilip is using,
> though.
> 
> So the result is the attached.  This would be back-patchable to 9.3
> (or 9.6?), but I doubt that we should do so, as we don't seem to have
> had a complaint on this issue and we don't have full faith in this fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: parallel vacuum - few questions on docs, comments and code

2021-05-13 Thread Dilip Kumar
On Thu, May 13, 2021 at 9:00 PM Bharath Rupireddy
 wrote:
>
> Done that way.
>
> PSA patch.

Your changes look good.  About changing "non-negative integer" to
"greater than or equal to zero", there is another thread [1]; I am not
sure whether we have concluded anything there yet.

- pg_log_error("parallel vacuum degree must be a non-negative integer");
+ pg_log_error("parallel workers for vacuum must be greater than or
equal to zero");
  exit(1);

[1] 
https://www.postgresql.org/message-id/os0pr01mb5716415335a06b489f1b3a8194...@os0pr01mb5716.jpnprd01.prod.outlook.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Race condition in recovery?

2021-05-13 Thread Kyotaro Horiguchi
At Thu, 13 May 2021 17:07:31 -0400, Robert Haas  wrote 
in 
> On Mon, May 10, 2021 at 4:35 AM Kyotaro Horiguchi
>  wrote:
> > It seems to me the issue here is not a race condition but
> > WaitForWALToBecomeAvailable initializing expectedTLEs with the history
> > of an improper timeline. So using recoveryTargetTLI instead of
> > receiveTLI for the case fixes this issue.
> 
> I agree.
> 
> > I believe the 004_timeline_switch.pl detects your issue.  And the
> > attached change fixes it.
> 
> So why does this use recoveryTargetTLI instead of receiveTLI only
> conditionally? Why not do it all the time?

The commit ee994272ca apparently says that there's a case where primary 

> The hard thing about this code is that the assumptions are not very
> clear. If we don't know why something is a certain way, then we might
> break things if we change it. Worse yet, if nobody else knows why it's
> like that either, then who knows what assumptions they might be
> making? It's hard to be sure that any change is safe.

Thanks for the comment.

> But that being said, we have a clear definition from the comments for
> what expectedTLEs is supposed to contain, and it's only going to end
> up with those contents if we initialize it from recoveryTargetTLI. So
> I am inclined to think that we ought to do that always, and if it

Yes, I also found it after that, and agreed.  Desynchronization
between recoveryTargetTLI and expectedTLEs prevents
rescanLatestTimeline from working.

> breaks something, then that's a sign that some other part of the code
> also needs fixing, because apparently that hypothetical other part of
> the code doesn't work if expctedTLEs contains what the comments say
> that it should.

After some more inspection, I'm now also sure that it is a
typo/thinko.  Other than while fetching the first checkpoint,
receiveTLI is always in the history of recoveryTargetTLI; otherwise,
recoveryTargetTLI is equal to receiveTLI.

I first read the commit message as "waiting to fetch possible
future history files to know if there's a future for the current
timeline".  However, now I read it as "don't bother waiting for
possibly-unavailable history files when we're reading the first
checkpoint, whose timeline is already known to us".  If that is
correct, we don't bother considering future history files coming from
the primary there.

> Now maybe that's the wrong idea. But if so, then we're saying that the
> definition of expectedTLEs needs to be changed, and we should update
> the comments with the new definition, whatever it is. A lot of the
> confusion here results from the fact that the code and comments are
> inconsistent and we can't tell whether that's intentional or
> inadvertent. Let's not leave the next person who looks at this code
> wondering the same thing about whatever changes we make.

Ok.  The reason why we haven't had a complaint about that would be
that it is rare that pg_wal is wiped out before a standby connects to
a just-promoted primary.  I'm not sure about the tool Dilip is using,
though.

So the result is the attached.  This would be back-patchable to 9.3
(or 9.6?), but I doubt that we should do so, as we don't seem to have
had a complaint on this issue and we don't have full faith in this fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 94e695031e7aa78670b1d0fd63f6cfed0d501611 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 14 May 2021 13:51:01 +0900
Subject: [PATCH v2] Set expectedTLEs correctly based on recoveryTargetTLI

When a standby starts streaming before expectedTLEs is determined, the
variable is set based on the timeline of the WAL segment just streamed
from the primary.  If we fetch the last checkpoint from an older
timeline, this behavior leads to setting expectedTLEs based on a
timeline older than recoveryTargetTLI; later calls to
rescanLatestTimeLine then no longer update recoveryTargetTLI and
expectedTLEs, and the standby stays frozen at that point.

This behavior was introduced by commit ee994272ca, but there's no
explanation for breaking the defined relationship between the two
variables, and there does not seem to be a case where the behavior is
useful.

Make things consistent by fixing WaitForWALToBecomeAvailable to set
expectedTLEs from recoveryTargetTLI rather than receiveTLI.
---
 src/backend/access/transam/xlog.c  |  3 +-
 src/test/recovery/t/004_timeline_switch.pl | 72 +-
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8d163f190f..ef8b6d2c8d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12662,7 +12662,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		if (readFile < 0)
 		{
 			if (!expectedTLEs)
-expectedTLEs = readTimeLineHistory(receiveTLI);
+expectedTLEs =
+	readTimeLineHistory(recoveryTargetTLI);
 			readFile = 

Re: Race condition in recovery?

2021-05-13 Thread Dilip Kumar
On Fri, May 14, 2021 at 2:37 AM Robert Haas  wrote:
>
> So why does this use recoveryTargetTLI instead of receiveTLI only
> conditionally? Why not do it all the time?
>
> The hard thing about this code is that the assumptions are not very
> clear. If we don't know why something is a certain way, then we might
> break things if we change it. Worse yet, if nobody else knows why it's
> like that either, then who knows what assumptions they might be
> making? It's hard to be sure that any change is safe.
>
> But that being said, we have a clear definition from the comments for
> what expectedTLEs is supposed to contain, and it's only going to end
> up with those contents if we initialize it from recoveryTargetTLI. So
> I am inclined to think that we ought to do that always, and if it
> breaks something, then that's a sign that some other part of the code
> also needs fixing, because apparently that hypothetical other part of
> the code doesn't work if expctedTLEs contains what the comments say
> that it should.
>
> Now maybe that's the wrong idea. But if so, then we're saying that the
> definition of expectedTLEs needs to be changed, and we should update
> the comments with the new definition, whatever it is. A lot of the
> confusion here results from the fact that the code and comments are
> inconsistent and we can't tell whether that's intentional or
> inadvertent. Let's not leave the next person who looks at this code
> wondering the same thing about whatever changes we make.

I am not sure whether you have noticed the commit that changed the
definition of expectedTLEs.  Heikki committed that change, so I am
adding him to the list to get his opinion.

=
ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas   2013-01-03 14:11:58
Committer: Heikki Linnakangas   2013-01-03 14:11:58

Delay reading timeline history file until it's fetched from master.
.
 Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
=

Part of this commit message says that it will not identify the
recoveryTargetTLI as the ancestor of the checkpoint timeline (without
the history file).  I did not understand what it is trying to say.  Is
it trying to say that even though recoveryTargetTLI is the ancestor of
the checkpoint timeline, we cannot track that because we don't have a
history file?  So, to handle this problem, the definition of
expectedTLEs was changed to directly point to the checkpoint timeline?

Before this commit, we were directly initializing expectedTLEs with
the history file of recoveryTargetTLI; we were not even waiting to
read the checkpoint.  This commit changed that.

I am referring to the below code which was deleted by this commit:


@@ -5279,49 +5299,6 @@ StartupXLOG(void)
  */
  readRecoveryCommandFile();

- /* Now we can determine the list of expected TLIs */
- expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
-
- /*
- * If the location of the checkpoint record is not on the expected
- * timeline in the history of the requested timeline, we cannot proceed:
- * the backup is not part of the history of the requested timeline.
- */
- if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
- ControlFile->checkPointCopy.ThisTimeLineID)
- {
=

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Tom Lane
I wrote:
> Maybe we should revert this thing pending somebody doing the work to
> make a version of queryid labeling that actually is negligibly cheap.
> It certainly seems like that could be done; one more traversal of the
> parse tree can't be that expensive in itself.  I suspect that the
> performance problem is with the particular hashing mechanism that
> was used, which looks mighty ad-hoc anyway.

To put a little bit of meat on that idea, I experimented with jacking
up the "jumble" calculation and driving some other implementations
underneath.

I thought that Julien's "worst case" scenario was pretty far from the
worst case, since it involved a join, which a lot of simple queries
don't have.  I tested this scenario instead:

$ cat naive.sql
SELECT * FROM pg_class c ORDER BY oid DESC LIMIT 1;
$ pgbench -n -f naive.sql -T 60 postgres

which is still complicated enough that there's work for the
query fingerprinter to do, but not so much for planning and
execution.

I confirm that on HEAD, there's a noticeable TPS penalty from
turning on compute_query_id: about 3.2% on my machine.

The first patch attached replaces the "jumble" calculation
with two CRC32s (two so that we still get 64 bits out at
the end).  I see 2.7% penalty with this version.  Now,
I'm using an Intel machine with
#define USE_SSE42_CRC32C_WITH_RUNTIME_CHECK 1
so on machines without any hardware CRC support, this'd
likely be a loss.  But it still proves the point that the
existing implementation is just not very speedy.

I then tried a really dumb xor'ing implementation, and
that gives me a slowdown of 2.2%.  This could undoubtedly
be improved on further, say by unrolling the loop or
processing multiple bytes at once.  One problem with it
is that I suspect it will tend to concentrate the entropy
into the third/fourth and seventh/eighth bytes of the
accumulator, since so many of the fields being jumbled
are 4-byte or 8-byte fields with most of the entropy in
their low-order bits.  Probably that could be improved
with a bit more thought -- say, an extra bump of the
nextbyte pointer after each field.
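
To make that concrete, here is an illustrative sketch of such an
xor'ing accumulator with the per-field bump (made-up names, not the
attached patch):

    #include <stddef.h>

    #define ACC_SIZE 8              /* 64 bits of accumulator */

    typedef struct XorJumble
    {
        unsigned char acc[ACC_SIZE];
        size_t      nextbyte;       /* rotating write position */
    } XorJumble;

    static void
    xor_jumble_append(XorJumble *st, const void *item, size_t size)
    {
        const unsigned char *data = (const unsigned char *) item;

        /* fold each byte of the field into the accumulator, rotating */
        while (size-- > 0)
            st->acc[st->nextbyte++ % ACC_SIZE] ^= *data++;

        /* the extra bump: successive same-size fields won't line up */
        st->nextbyte++;
    }

The point of the final increment is to keep the low-order bytes of
consecutive 4- and 8-byte fields from always landing on the same
accumulator positions.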

Anyway, I think that what we have here is quite an inferior
implementation, and we can do better with some more thought.

regards, tom lane

diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..74ed555ed7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -41,7 +41,7 @@
 
 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
-		 const unsigned char *item, Size size);
+		 const void *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
 static void JumbleRangeTable(JumbleState *jstate, List *rtable);
 static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
@@ -106,9 +106,11 @@ JumbleQuery(Query *query, const char *querytext)
 	{
 		jstate = (JumbleState *) palloc(sizeof(JumbleState));
 
-		/* Set up workspace for query jumbling */
-		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-		jstate->jumble_len = 0;
+		/* Initialize CRCs for query jumbling */
+		INIT_CRC32C(jstate->crc0);
+		INIT_CRC32C(jstate->crc1);
+		jstate->whichcrc = 0;
+		/* Initialize state for tracking where constants are */
 		jstate->clocations_buf_size = 32;
 		jstate->clocations = (LocationLen *)
 			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -117,9 +119,11 @@ JumbleQuery(Query *query, const char *querytext)
 
 		/* Compute query ID and mark the Query node with it */
 		JumbleQueryInternal(jstate, query);
-		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
-		  jstate->jumble_len,
-		  0));
+
+		FIN_CRC32C(jstate->crc0);
+		FIN_CRC32C(jstate->crc1);
+		query->queryId = (((uint64) jstate->crc0) << 32) |
+			((uint64) jstate->crc1);
 
 		/*
 		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
@@ -165,36 +169,18 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l
  * the current jumble.
  */
 static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+AppendJumble(JumbleState *jstate, const void *item, Size size)
 {
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
+	if (jstate->whichcrc)
 	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-		  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, 

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2021-05-13 Thread David Rowley
On Fri, 14 May 2021 at 11:22, Tom Lane  wrote:
> I recall somebody (David Rowley, maybe?  Too lazy to check archives.)
> working on this idea a while ago, but he didn't get to the point of
> a committable patch.

Yeah. Me. The discussion is in [1].

David

[1] 
https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Julien Rouhaud
On Fri, May 14, 2021 at 12:20:00PM +0900, Fujii Masao wrote:
> 
> 
> On 2021/05/14 9:04, Alvaro Herrera wrote:
> > Here's a first attempt at what was suggested.  If you say "auto" it
> > remains auto in SHOW, but it gets enabled if a module asks for it.
> > 
> > Not final yet, but I thought I'd throw it out for early commentary ...
> 
> Many thanks! The patch basically looks good to me.
> 
> +void
> +EnableQueryId(void)
> +{
> + if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
> + auto_query_id_enabled = true;
> 
> Shouldn't EnableQueryId() enable auto_query_id_enabled whatever 
> compute_query_id is?
> Otherwise, for example, the following scenario can happen and it's a bit 
> strange.
> 
> 1. The server starts up with shared_preload_libraries=pg_stat_statements and 
> compute_query_id=on
> 2. compute_query_id is set to auto and the configuration file is reloaded
> Then, even though compute_query_id is auto and pg_stat_statements is loaded,
> query ids are not computed and no queries are tracked by pg_stat_statements.

+1.  Note that if you switch from compute_query_id = off + custom
query_id + pg_stat_statements to compute_query_id = auto, then things will
immediately break (as we instruct third-party plugin authors to error out in
that case), which is way better than breaking at the next random service
restart.
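
To sketch the suggested fix in code (variable names assumed from
Alvaro's draft patch, which may still change), EnableQueryId() would
set the flag unconditionally:

    void
    EnableQueryId(void)
    {
        /*
         * Sketch only: remember that some module asked for query-id
         * computation, regardless of the current compute_query_id
         * setting, so that a later reload to "auto" keeps it enabled.
         */
        auto_query_id_enabled = true;
    }

That way the module's request survives any later change of
compute_query_id to auto, matching the scenario described above.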




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Fujii Masao




On 2021/05/14 9:04, Alvaro Herrera wrote:

Here's a first attempt at what was suggested.  If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...


Many thanks! The patch basically looks good to me.

+void
+EnableQueryId(void)
+{
+   if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
+   auto_query_id_enabled = true;

Shouldn't EnableQueryId() enable auto_query_id_enabled whatever 
compute_query_id is?
Otherwise, for example, the following scenario can happen and it's a bit 
strange.

1. The server starts up with shared_preload_libraries=pg_stat_statements and 
compute_query_id=on
2. compute_query_id is set to auto and the configuration file is reloaded
Then, even though compute_query_id is auto and pg_stat_statements is loaded,
query ids are not computed and no queries are tracked by pg_stat_statements.


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-05-13 Thread Peter Geoghegan
On Thu, May 13, 2021 at 7:14 PM Michael Paquier  wrote:
> Perhaps that's an awful deal, but based on which facts can you really
> say that this new behavior of needing at least 2% of relation pages
> with some dead items to clean up indexes is not a worse deal in some
> cases?

If I thought that it simply wasn't possible then I wouldn't have
accepted the need to make it possible to disable. This is a
cost/benefit decision problem, which must be made based on imperfect
information -- there are no absolute certainties. But I'm certain
about one thing: there is a large practical difference between the
optimization causing terrible performance in certain scenarios and the
optimization causing slightly suboptimal performance in certain
scenarios. A tiny risk of the former scenario is *much* worse than a
relatively large risk of the latter scenario. There needs to be a
sense of proportion about risk.

> This may cause more problems for the in-core index AMs, as
> much as it could impact any out-of-core index AM, no?

I don't understand what you mean here.

> What about
> other values like 1%, or even 5%?  My guess is that there would be an
> ask to have more control on that, though that stands as my opinion.

How did you arrive at that guess? Why do you believe that? This is the
second time I've asked.

> Saying that, as long as there is a way to disable that for the users
> with autovacuum and manual vacuums, I'd be fine.  It is worth noting
> that adding a GUC to control this optimization would make the code
> more confusing, as there is already do_index_cleanup, a
> vacuum_index_cleanup reloption, and specifying vacuum_index_cleanup to
> TRUE may cause the index cleanup to not actually kick in if the 2% bar is
> not reached.

I don't intend to add a GUC. A reloption should suffice.

Your interpretation of what specifying vacuum_index_cleanup (the
VACUUM command option) represents doesn't seem particularly justified
to me. To me it just means "index cleanup and vacuuming are not
explicitly disabled, the default behavior". It's an option largely
intended for emergencies, and largely superseded by the failsafe
mechanism. This interpretation is justified by well established
precedent: it has long been possible for VACUUM to skip heap page
pruning and even heap page vacuuming just because a super-exclusive
lock could not be acquired (though the latter case no longer happens
due to the same work inside vacuumlazy.c) -- which also implies
skipping some index vacuuming, without it ever being apparent to the
user.

-- 
Peter Geoghegan




RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-13 Thread Pengchengliu
Hi Greg,

  Thanks for your reply and test.

  

  When the main process gets the transaction snapshot in 
InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot's xmin 
very likely follows the active snapshot's xmin.

  

  Using gdb, it is easy to verify this.

  Create the environment as below:

  

  1, Use PG13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7) to initialize a 
cluster database.

  2, Append the following to postgresql.conf:

  

  max_connections = 2000

  parallel_setup_cost=0

  parallel_tuple_cost=0

  min_parallel_table_scan_size=0

  max_parallel_workers_per_gather=8

  max_parallel_workers = 128 

  

  3, Start the cluster database. Use the init_test.sql script in the 
attachment to create some test tables.

  4, Use the sub_120.sql script in the attachment with pgbench to test it.

  

  pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 1800

 

 

 

  Then you can log in to the database and use GDB to verify it.

  1, First use EXPLAIN to make sure that forcing parallelism works.

 

  postgres=# explain (verbose,analyze) select count(*) from contend1;
                                                                  QUERY PLAN
  -------------------------------------------------------------------------------------------------------------------------------------------
   Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
     Output: count(*)
     ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
           Output: (PARTIAL count(*))
           Workers Planned: 8
           Workers Launched: 0
           ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
                 Output: PARTIAL count(*)
                 ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
                       Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
   Planning Time: 0.123 ms
   Execution Time: 1075.588 ms

  postgres=# select pg_backend_pid();
   pg_backend_pid 
  ----------------
          2182678



 

  2, Use gdb to attach to our backend process. Add a breakpoint at 
parallel.c:219 and continue.

  

  gdb -q -p 2182678
  ...
  (gdb) b parallel.c:219
  Breakpoint 1 at 0x55d085: file 
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
  (gdb) c
  Continuing.

 

  3, In the psql client, we can execute the explain command from step 1 again.

  After we hit the breakpoint in gdb, we wait a moment, then execute "next".

  Use gdb to check active_snapshot and transaction_snapshot: 
active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.

  When I test it with gdb, sometimes active_snapshot is the same as 
transaction_snapshot; in that case, try it multiple times and wait a longer 
time before executing "next".

   

   Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)
       at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219
   219          Snapshot    transaction_snapshot = GetTransactionSnapshot();
   (gdb) n
   220          Snapshot    active_snapshot = GetActiveSnapshot();
   (gdb)
   223          oldcontext = MemoryContextSwitchTo(TopTransactionContext);
   (gdb) p *transaction_snapshot
   $1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,
     subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,
     active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
   (gdb) p *active_snapshot
   $2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,
     suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,
     regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}
   (gdb)

 

Thanks

Pengcheng

  

  

 

-Original Message-
From: Greg Nancarrow  
Sent: 2021年5月13日 22:15
To: Pengchengliu 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

 

On Thu, May 13, 2021 at 11:25 AM Pengchengliu < 
 pengcheng...@tju.edu.cn> wrote:

> 

 

> Hi Andres,

>  Thanks for your reply.

 

Er, it's Greg who has replied so far (not Andres).

 

> 

>   And if you still cannot reproduce it in 2 minutes, could you run pgbench 
> for a longer time, such as 30 or 60 minutes?

> 

 

Actually, I did run it, multiple times, for more than 60 minutes, but no 

Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Julien Rouhaud
On Thu, May 13, 2021 at 09:47:02PM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Fri, May 14, 2021 at 3:12 AM Bruce Momjian  wrote:
> >> I was surprised it was ~2%.
> 
> > Just to be clear, the 2% was a worst-case scenario, i.e. a very fast
> > read-only query on small data returning a single row.  As soon as you
> > get something more realistic / expensive the overhead goes away.
> 
> Of course, for plenty of people that IS the realistic scenario that
> they care about max performance for.

I'm not arguing that the scenario is unrealistic.  I'm arguing that retrieving
the first row of a join between pg_class and pg_attribute on an otherwise
vanilla database may not be the most representative workload, especially when
you take into account that it was done on hardware that still took 3 ms to do
that.

Unfortunately my laptop is pretty old and has already proven multiple times to
give unreliable benchmark results, so I'm not confident at all that those 2%
are even real outside of my machine.




RE: Forget close an open relation in ReorderBufferProcessTXN()

2021-05-13 Thread osumi.takami...@fujitsu.com
On Thursday, May 13, 2021 7:43 PM Amit Kapila  wrote:
> On Tue, Apr 20, 2021 at 8:36 AM Amit Langote 
> wrote:
> > Back in:
> https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTP
> U0L5%2
> > BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com
> >
> > I had proposed to move the map creation from maybe_send_schema() to
> > get_rel_sync_entry(), mainly because the latter is where I realized it
> > belongs, though a bit too late.
> 
> It seems in get_rel_sync_entry, it will only build the map again when there is
> any invalidation in publication_rel. Don't we need to build it after any DDL 
> on
> the relation itself? I haven't tried this with a test so I might be missing
> something.
Yeah, the patch not only tries to address the memory leak
but also changes the timing (condition) for calling convert_tuples_by_name.
This is because the patch placed the call within the condition of 
!entry->replicate_valid in get_rel_sync_entry.
OTOH, OSS HEAD calls it based on RelationSyncEntry's schema_sent in 
maybe_send_schema.

The two flags (replicate_valid and schema_sent) are somehow reset at 
different timings.
InvalidateSystemCaches resets both flags, but schema_sent is also reset by 
LocalExecuteInvalidationMessage,
while replicate_valid is reset by CallSyscacheCallbacks.

IIUC, InvalidateSystemCaches, which applies to both flags, is called
when a transaction starts, via AtStart_Cache, and when a table lock is taken 
via LockRelationOid, etc.
Accordingly, I think we can notice changes after any DDL on the relation.

But, as for the different timings, we need to know the impact of the change 
accurately.
LocalExecuteInvalidationMessage is called from functions in reorderbuffer
(e.g. ReorderBufferImmediateInvalidation, ReorderBufferExecuteInvalidations).
It seems to me that changing the condition in the patch
reduces the chance of the reorderbuffer's proactive reset of
the flag, which ultimately leads to rebuilding the map.

Langote-san, could you please explain this perspective?


Best Regards,
Takamichi Osumi



Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-05-13 Thread Michael Paquier
On Thu, May 13, 2021 at 01:27:44PM -0700, Peter Geoghegan wrote:
> Almost all of the benefit of the optimization is available with the
> current BYPASS_THRESHOLD_PAGES threshold (2% of heap pages have
> LP_DEAD items), which has less risk than a higher threshold. I don't
> think it matters much if we have the occasional unnecessary round of
> index vacuuming on account of not applying the optimization. The truly
> important benefit of the optimization is to not do unnecessary index
> vacuuming all the time in the extreme cases where it's really hard to
> justify. There is currently zero evidence that anything higher than 2%
> will ever help anybody to an appreciable degree. (There is also no
> evidence that the optimization will ever need to be disabled, but I
> accept the need to be conservative and offer an off switch -- the
> precautionary principle applies when talking about new harms.)
> 
> Not having to scan every index on every VACUUM, but only every 5th or
> so VACUUM is a huge improvement. But going from every 5th VACUUM to
> every 10th VACUUM? That's at best a tiny additional improvement in
> exchange for what I'd guess is a roughly linear increase in risk
> (maybe a greater-than-linear increase, even). That's an awful deal.

Perhaps that's an awful deal, but based on which facts can you really
say that this new behavior of needing at least 2% of relation pages
with some dead items to clean up indexes is not a worse deal in some
cases?  This may cause more problems for the in-core index AMs, as
much as it could impact any out-of-core index AM, no?  What about
other values like 1%, or even 5%?  My guess is that there would be an
ask to have more control on that, though that stands as my opinion.

Saying that, as long as there is a way to disable that for the users
with autovacuum and manual vacuums, I'd be fine.  It is worth noting
that adding a GUC to control this optimization would make the code
more confusing, as there is already do_index_cleanup, a
vacuum_index_cleanup reloption, and specifying vacuum_index_cleanup to
TRUE may cause the index cleanup to not actually kick in if the 2% bar is
not reached.
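
For readers following along, the check being debated has roughly this
shape (an illustrative sketch with made-up names, not the committed
code):

    #define BYPASS_THRESHOLD_PAGES  0.02    /* the 2% bar discussed above */

    static bool
    bypass_index_vacuuming(long rel_pages, long lpdead_item_pages)
    {
        /* skip the index pass when few heap pages carry LP_DEAD items */
        return lpdead_item_pages < (long) (rel_pages * BYPASS_THRESHOLD_PAGES);
    }

The questions above are whether 0.02 is the right constant and whether
users should get a knob for it.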
--
Michael


signature.asc
Description: PGP signature


Re: alter subscription drop publication fixes

2021-05-13 Thread Japin Li


On Thu, 13 May 2021 at 22:13, vignesh C  wrote:
> On Wed, May 12, 2021 at 10:15 PM Bharath Rupireddy
>  wrote:
>>
>> On Wed, May 12, 2021 at 9:55 PM vignesh C  wrote:
>> > While I was reviewing one of the logical decoding features, I found a
>> > few issues in alter subscription drop publication.
>>
>> Thanks!
>>
>> > Alter subscription drop publication does not support copy_data option,
>> > that needs to be removed from tab completion.
>>
>> +1. You may want to also change set_publication_option (to something
>> like drop_publication_option with only the refresh option) for the drop in
>> the docs? Because "Additionally, refresh options as described under
>> REFRESH PUBLICATION may be specified." doesn't make sense.
>>
>> > Dropping all the publications present in the subscription using alter
>> > subscription drop publication would throw "subscription must contain
>> > at least one publication". This message was slightly confusing to me,
>> > as even though some publication was present on the subscription, I was
>> > not able to drop it. Instead I feel we could throw an error message
>> > something like "dropping specified publication will result in
>> > subscription without any publication, this is not supported".
>>
>> -1 for that long message. The intention of that error was to throw an
>> error if all the publications of a subscription are dropped. If that's
>> so confusing, then you could just let the error message be
>> "subscription must contain at least one publication", add an error
>> detail "Subscription without any publication is not allowed to
>> exist/is not supported." or "Removing/Dropping all the publications
>> from a subscription is not allowed/supported." or some other better
>> wording.
>>
>
> Modified the error message to "errmsg("cannot drop all the
> publications of the subscriber \"%s\"", subname)".
> I have separated the Drop publication documentation contents. There
> are some duplicate contents but the readability is slightly better.
> Thoughts?
>
>> > merge_publications can be called after validation of the options
>> > specified, I think we should check if the options specified are
>> > correct or not before checking the actual publications.
>>
>> +1. That was a miss in the original feature.
>
> Attached patch has the changes for the same.
>

Thanks for updating the patch.  I have a few comments on the new patch.

-  ADD adds additional publications,
-  DROP removes publications from the list of
+  ADD adds additional publications from the list of

I think we should change the word 'from' to 'to'.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: PG 14 release notes, first draft

2021-05-13 Thread Justin Pryzby
On Thu, May 13, 2021 at 09:01:41PM -0500, Justin Pryzby wrote:
> These should be merged:
> > 756ab29124 Allow pageinspect to inspect GiST indexes (Andrey Borodin, 
> > Heikki Linnakangas)
> > 9e596b65f4 Add "LP_DEAD item?" column to GiST pageinspect functions

Sorry, this was my error while reconciling my list with yours.
Your notes only have one item for these, which is correct.

Possibly you'd want to include the 9e59 commit in the comment (but it wouldn't
have avoided my own confusion, though).

-- 
Justin




Re: PG 14 release notes, first draft

2021-05-13 Thread Justin Pryzby
On Mon, May 10, 2021 at 09:40:45AM -0500, Justin Pryzby wrote:
> Should any of these be included?

New SQL-accessible functionality should be included:
> 8c15a29745 Allow ALTER TYPE to update an existing type's typsubscript value.

These should be merged:
> 756ab29124 Allow pageinspect to inspect GiST indexes (Andrey Borodin, Heikki 
> Linnakangas)
> 9e596b65f4 Add "LP_DEAD item?" column to GiST pageinspect functions

I'm undecided on this one:
> 7db0cd2145 Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE

People who didn't use FREEZE before (because it didn't work or otherwise)
might want to start using it.

I'm withdrawing these, as modifications to existing log messages don't need to
be included:

> 10a5b35a00 Report resource usage at the end of recovery
> 7e453634bb Add additional information in the vacuum error context.
> 1ea396362b Improve logging of bad parameter values in BIND messages.

-- 
Justin




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Tom Lane
Julien Rouhaud  writes:
> On Fri, May 14, 2021 at 3:12 AM Bruce Momjian  wrote:
>> I was surprised it was ~2%.

> Just to be clear, the 2% was a worst-case scenario, i.e. a very fast
> read-only query on small data returning a single row.  As soon as you
> get something more realistic / expensive the overhead goes away.

Of course, for plenty of people that IS the realistic scenario that
they care about max performance for.

regards, tom lane




Re: OOM in spgist insert

2021-05-13 Thread Tom Lane
I wrote:
> Anyway, here is a patch set teased apart into committable bites,
> and with your other points addressed.

Oh, maybe some docs would be a good thing too ...

regards, tom lane

diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index cfb2b3c836..18f1f3cdbd 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -978,6 +978,18 @@ LANGUAGE C STRICT;
fails to do that, the SP-GiST core resorts to
extraordinary measures described in .
   
+
+  
+   When longValuesOK is true, it is expected
+   that successive levels of the SP-GiST tree will
+   absorb more and more information into the prefixes and node labels of
+   the inner tuples, making the required leaf datum smaller and smaller,
+   so that eventually it will fit on a page.
+   To prevent bugs in operator classes from causing infinite insertion
+   loops, the SP-GiST core will raise an error if the
+   leaf datum does not become any smaller within ten cycles
+   of choose method calls.
+  
  
 
  


Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:
> On Fri, May 14, 2021 at 8:13 AM Bruce Momjian  wrote:
> >
> > On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
> > > Here's a first attempt at what was suggested.  If you say "auto" it
> > > remains auto in SHOW, but it gets enabled if a module asks for it.
> > >
> > > Not final yet, but I thought I'd throw it out for early commentary ...
> >
> > I certainly like this idea better than having the extension change the
> > output of the GUC.
> 
> Oh, I didn't understand that it was the major blocker.  I'm fine with it too.

I think if we keep the output as 'auto', and document that you check
pg_stat_activity for a hash to see if it is enabled, that gets us pretty
far.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: PG 14 release notes, first draft

2021-05-13 Thread Amit Langote
On Thu, May 13, 2021 at 11:59 PM Bruce Momjian  wrote:
> On Thu, May 13, 2021 at 02:46:58PM +0900, Amit Langote wrote:
> > How about writing the 2nd line instead as:
> >
> > Updates/deletes on partitioned tables can now use execution-time
> > partition pruning.
> >
> > We don't seem to use the term "run-time pruning" anywhere else in the
> > documentation, and "pruning of updates/deletes" sounds strange.
>
> Good point, updated text:
>
> 
> 
>
> 
> Improve the performance of updates/deletes on partitioned tables
> when only a few partitions are affected (Amit Langote, Tom Lane)
> 
>
> 
> This also allows updates/deletes on partitioned tables to use
> execution-time partition pruning.
> 
> 

Thank you.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Julien Rouhaud
On Fri, May 14, 2021 at 8:13 AM Bruce Momjian  wrote:
>
> On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
> > Here's a first attempt at what was suggested.  If you say "auto" it
> > remains auto in SHOW, but it gets enabled if a module asks for it.
> >
> > Not final yet, but I thought I'd throw it out for early commentary ...
>
> I certainly like this idea better than having the extension change the
> output of the GUC.

Oh, I didn't understand that it was the major blocker.  I'm fine with it too.




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Julien Rouhaud
On Fri, May 14, 2021 at 3:12 AM Bruce Momjian  wrote:
>
> > Maybe we should revert this thing pending somebody doing the work to
> > make a version of queryid labeling that actually is negligibly cheap.
> > It certainly seems like that could be done; one more traversal of the
> > parse tree can't be that expensive in itself.  I suspect that the
> > performance problem is with the particular hashing mechanism that
> > was used, which looks mighty ad-hoc anyway.
>
> I was surprised it was ~2%.

Just to be clear, the 2% was a worst-case scenario, i.e. a very fast
read-only query on small data returning a single row.  As soon as you
get something more realistic / expensive the overhead goes away.  For
reference here is the detail:
https://www.postgresql.org/message-id/CAOBaU_ZVmGPfKTwZ6cM_qdzaF2E1gMkrLDMwwLy4Z1JxQ6=c...@mail.gmail.com




Re: PG 14 release notes, first draft

2021-05-13 Thread Bruce Momjian
On Mon, May 10, 2021 at 02:03:08AM -0400, Bruce Momjian wrote:
> I have committed the first draft of the PG 14 release notes.  You can
> see the most current  build of them here:
> 
>   https://momjian.us/pgsql_docs/release-14.html
> 
> I need clarification on many items, and the document still needs its
> items properly ordered, and markup added.  I also expect a lot of
> feedback.
> 
> I plan to work on completing this document this coming week in
> preparation for beta next week.

I have ordered the items in each section.  My next job is to add markup
and indenting to the XML.

FYI, the PG 14 release item count is much higher than usual:

release-7.4:  263
release-8.0:  230
release-8.1:  174
release-8.2:  215
release-8.3:  214
release-8.4:  314
release-9.0:  237
release-9.1:  203
release-9.2:  238
release-9.3:  177
release-9.4:  211
release-9.5:  193
release-9.6:  214
release-10:  189
release-11:  170
release-12:  180
release-13:  178
release-14:  220

PG 14 is a 23% increase over our previous release.  I think the cause is
either more hackers/sponsors, Covid, or both.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: What is lurking in the shadows?

2021-05-13 Thread David Rowley
On Fri, 14 May 2021 at 12:00, Peter Smith  wrote:
> That bug led me to wonder if similar problems might be going
> undetected elsewhere in the code. There is a gcc compiler option [3]
> -Wshadow which informs about the similar scenario where one variable
> is "shadowing" another (e.g. redeclaring a variable with the same name
> as one at an outer scope).

> For now, I am not sure how to proceed with this information. Hence this 
> post...

I'm inclined to think that, since a bug has already been found due to a
local variable shadowing a global one, it would be good to review
these and then consider if it's worth doing any renaming. I think the
process of looking at each warning individually will allow us to
determine: a) whether there are any bugs, and b) whether it's worth
doing any renaming.

I see GCC also has -Wshadow=compatible-local, which warns when the
shadowing involves local vars of compatible types.  -Wshadow=local
warns on any local-variable shadowing, and the option you used is the
same as -Wshadow=global.
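
To illustrate the distinction with a made-up example (not code from
the tree):

    int         wal_level;      /* file-scope global */

    void
    f(int n)
    {
        int     i = n;
        char   *p = 0;

        {
            int     i = 0;      /* =compatible-local, =local and =global
                                 * all warn: shadows a local of a
                                 * compatible type */
            int     p = 1;      /* =local and =global warn, but
                                 * =compatible-local doesn't: the types
                                 * are incompatible */
            int     wal_level = 2;  /* only =global (plain -Wshadow)
                                     * warns: shadows a global */

            (void) i;
            (void) p;
            (void) wal_level;
        }
        (void) p;
    }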

I'd say it might be worth aspiring to reduce the warnings from
building with these flags. If we reduced these down then it might
allow us to more easily identify cases where there are actual bugs.
Maybe we can get to a point where we could enable either
-Wshadow=compatible-local or -Wshadow=local.  I doubt we could ever
get to a stage where -Wshadow=global would work for us.  There's also
some quite old discussion in [1] that you might want to review.

I don't pretend to have found the best example of ones that we might
want to leave alone, but:

pg_controldata.c: In function ‘wal_level_str’:
pg_controldata.c:73:24: warning: declaration of ‘wal_level’ shadows a
global declaration [-Wshadow]
 wal_level_str(WalLevel wal_level)
^
In file included from pg_controldata.c:24:0:
../../../src/include/access/xlog.h:187:24: warning: shadowed
declaration is here [-Wshadow]
 extern PGDLLIMPORT int wal_level;

I wonder if it would really clear up much if the parameter name there
was renamed not to shadow the GUC variable's name.

Also, doing any renaming here is not without risk that we break
something, so certainly PG15 at the earliest, unless there is an
actual bug.

I imagine starting with a patch that fixes the ones where the name
does not have much meaning. e.g, i, buf, tmp, lc

We also need to take into account that renaming variables here can
increase the overhead of backpatching fixes.  The process of fixing
those up to make the patch apply to the back branch does increase the
chances that bugs could make their way into the back branches.
However, a bug is probably more likely if the patch was
written for the back branch first, since there's then a bigger opportunity
for the patch author to pick the wrong variable name when converting the
patch to work with master. In the reverse case, that does not seem as likely,
due to both variables having the same name.

David

[1] 
https://www.postgresql.org/message-id/flat/877k1psmpf.fsf%40mailbox.samurai.com




RE: Support for VACUUMing Foreign Tables

2021-05-13 Thread tsunakawa.ta...@fujitsu.com
From: Bharath Rupireddy 
> I think it will be useful to allow foreign tables to be VACUUMed if
> the underlying FDW supports, currently VACUUM doesn't support foreign
> tables, see [1].

Could you let us imagine more concretely how useful it will be?  While TRUNCATE 
can be part of an application's data processing as an alternative to DELETE, I 
think VACUUM is purely data storage maintenance that's performed by the DBA 
and can naturally be done locally on the server where the table resides.  (The 
existing ANALYZE on FDW is an exception; it's useful to also have data 
statistics locally.)


> this may not be much useful for FDWs that connect to remote non-MVCC
> databases where the concept of VACUUM may not apply, but for
> postgres_fdw and others it might help.

Can you show some examples of "others"?  I believe we should be careful not to 
make the FDW interface a swamp for functions that are only convenient for 
PostgreSQL.

How about adding a routine to the FDW interface that allows executing an 
arbitrary command like the following?  VACUUM will be able to use this.

PGresult *DoCommandPathThrough(ForeignTable *table, const char *command);

Or, maybe it's more flexible to use ForeignServer instead of ForeignTable.
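
For illustration, here is a purely hypothetical sketch of how
postgres_fdw might implement such a callback (the callback itself and
the GetRemoteConnection() helper are assumptions for this example, not
existing API):

    #include "postgres.h"
    #include "foreign/foreign.h"
    #include "libpq-fe.h"

    static PGresult *
    postgresDoCommandPathThrough(ForeignTable *table, const char *command)
    {
        /* GetRemoteConnection() stands in for postgres_fdw's connection cache */
        PGconn     *conn = GetRemoteConnection(table->serverid);
        PGresult   *res = PQexec(conn, command);

        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
            ereport(ERROR,
                    (errmsg("remote command failed: %s",
                            PQerrorMessage(conn))));

        return res;
    }

VACUUM could then build the appropriate command text and call this
through the FDW routine table.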


Regards
Takayuki Tsunakawa



Re: Testing autovacuum wraparound (including failsafe)

2021-05-13 Thread Peter Geoghegan
On Fri, Apr 23, 2021 at 7:56 PM Peter Geoghegan  wrote:
> I'm convinced -- decoupling the logic from the one-pass-not-two pass
> case seems likely to be simpler and more useful. For both the one pass
> and two pass/has indexes case.

Attached draft patch does it that way.

-- 
Peter Geoghegan


v1-0001-Consider-triggering-failsafe-during-first-scan.patch
Description: Binary data


Re: OOM in spgist insert

2021-05-13 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-May-13, Tom Lane wrote:
>> What do people think about back-patching this?  In existing branches,
>> we've defined it to be an opclass bug if it fails to shorten the leaf
>> datum enough.  But not having any defenses against that seems like
>> not a great idea.  OTOH, the 10-cycles-to-show-progress rule could be
>> argued to be an API break.

> I think if the alternative is to throw an error, we can afford to retry
> quite a few more times than 10 in order not have that called an API
> break.  Say, retry (MAXIMUM_ALIGNOF << 3) times or so (if you want to
> parameterize on maxalign).  It's not like this is going to be a
> performance drag where not needed .. but I think leaving back-branches
> unfixed is not great.

Hm.  Index bloat is not something to totally ignore, though, so I'm
not sure what the best cutoff is.

Anyway, here is a patch set teased apart into committable bites,
and with your other points addressed.

regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6200699ddd..dd2ade7bb6 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -554,7 +554,7 @@ ProcessClientWriteInterrupt(bool blocked)
 		{
 			/*
 			 * Don't mess with whereToSendOutput if ProcessInterrupts wouldn't
-			 * do anything.
+			 * service ProcDiePending.
 			 */
 			if (InterruptHoldoffCount == 0 && CritSectionCount == 0)
 			{
@@ -3118,6 +3118,12 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
  * If an interrupt condition is pending, and it's safe to service it,
  * then clear the flag and accept the interrupt.  Called only when
  * InterruptPending is true.
+ *
+ * Note: if INTERRUPTS_CAN_BE_PROCESSED() is true, then ProcessInterrupts
+ * is guaranteed to clear the InterruptPending flag before returning.
+ * (This is not the same as guaranteeing that it's still clear when we
+ * return; another interrupt could have arrived.  But we promise that
+ * any pre-existing one will have been serviced.)
  */
 void
 ProcessInterrupts(void)
@@ -3248,7 +3254,11 @@ ProcessInterrupts(void)
 	{
 		/*
 		 * Re-arm InterruptPending so that we process the cancel request as
-		 * soon as we're done reading the message.
+		 * soon as we're done reading the message.  (XXX this is seriously
+		 * ugly: it complicates INTERRUPTS_CAN_BE_PROCESSED(), and it means we
+		 * can't use that macro directly as the initial test in this function,
+		 * meaning that this code also creates opportunities for other bugs to
+		 * appear.)
 		 */
 		InterruptPending = true;
 	}
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 95202d37af..b95bb956b6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -57,6 +57,15 @@
  * allowing die interrupts: HOLD_CANCEL_INTERRUPTS() and
  * RESUME_CANCEL_INTERRUPTS().
  *
+ * Note that ProcessInterrupts() has also acquired a number of tasks that
+ * do not necessarily cause a query-cancel-or-die response.  Hence, it's
+ * possible that it will just clear InterruptPending and return.
+ *
+ * INTERRUPTS_PENDING_CONDITION() can be checked to see whether an
+ * interrupt needs to be serviced, without trying to do so immediately.
+ * Some callers are also interested in INTERRUPTS_CAN_BE_PROCESSED(),
+ * which tells whether ProcessInterrupts is sure to clear the interrupt.
+ *
  * Special mechanisms are used to let an interrupt be accepted when we are
  * waiting for a lock or when we are waiting for command input (but, of
  * course, only if the interrupt holdoff counter is zero).  See the
@@ -97,24 +106,27 @@ extern PGDLLIMPORT volatile uint32 CritSectionCount;
 /* in tcop/postgres.c */
 extern void ProcessInterrupts(void);
 
+/* Test whether an interrupt is pending */
 #ifndef WIN32
+#define INTERRUPTS_PENDING_CONDITION() \
+	(unlikely(InterruptPending))
+#else
+#define INTERRUPTS_PENDING_CONDITION() \
+	(unlikely(UNBLOCKED_SIGNAL_QUEUE()) ? pgwin32_dispatch_queued_signals() : 0,  \
+	 unlikely(InterruptPending))
+#endif
 
+/* Service interrupt if one is pending */
 #define CHECK_FOR_INTERRUPTS() \
 do { \
-	if (unlikely(InterruptPending)) \
-		ProcessInterrupts(); \
-} while(0)
-#else			/* WIN32 */
-
-#define CHECK_FOR_INTERRUPTS() \
-do { \
-	if (unlikely(UNBLOCKED_SIGNAL_QUEUE())) \
-		pgwin32_dispatch_queued_signals(); \
-	if (unlikely(InterruptPending)) \
+	if (INTERRUPTS_PENDING_CONDITION()) \
 		ProcessInterrupts(); \
 } while(0)
-#endif			/* WIN32 */
 
+/* Is ProcessInterrupts() guaranteed to clear InterruptPending? */
+#define INTERRUPTS_CAN_BE_PROCESSED() \
+	(InterruptHoldoffCount == 0 && CritSectionCount == 0 && \
+	 QueryCancelHoldoffCount == 0)
 
 #define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)
 
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 4d380c99f0..88ae2499dd 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -1905,13 +1905,14 

Re: Executor code - found an instance of a WHILE that should just be an IF

2021-05-13 Thread Greg Nancarrow
On Fri, May 14, 2021 at 10:27 AM David Rowley  wrote:
>
> Thanks for the votes.  Pushed.

Thanks!

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Executor code - found an instance of a WHILE that should just be an IF

2021-05-13 Thread David Rowley
On Fri, 14 May 2021 at 00:27, Julien Rouhaud  wrote:
>
> On Thu, May 13, 2021 at 08:06:18PM +0900, Michael Paquier wrote:
> > On Thu, May 13, 2021 at 08:20:36PM +1200, David Rowley wrote:
> > > Since there's no bug fix here, I thought that there's not much point
> > > in backpatching this.
> >
> > Indeed.  I would not bother with a back-patch either.
> >
> > > Does anyone object to making this small change in master?
> >
> > No objections from here.
>
> +1 to both.

Thanks for the votes.  Pushed.

David




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
> Here's a first attempt at what was suggested.  If you say "auto" it
> remains auto in SHOW, but it gets enabled if a module asks for it.
> 
> Not final yet, but I thought I'd throw it out for early commentary ...

I certainly like this idea better than having the extension change the
output of the GUC.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Alvaro Herrera
Here's a first attempt at what was suggested.  If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

-- 
Álvaro Herrera   Valdivia, Chile
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 40b5109b55..f7ce01539f 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -1067,4 +1067,20 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
  2
 (1 row)
 
+-- Check that pg_stat_statements() will complain if the configuration appears
+-- to be broken.
+SET compute_query_id = off;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT count(*) FROM pg_stat_statements;
+WARNING:  query identifier calculation is disabled
+ count 
+---
+ 0
+(1 row)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a85f962801..fab684a649 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -369,6 +369,12 @@ _PG_init(void)
 	if (!process_shared_preload_libraries_in_progress)
 		return;
 
+	/*
+	 * Inform the postmaster that we want to enable query_id calculation if
+	 * compute_query_id is set to auto.
+	 */
+	EnableQueryId();
+
 	/*
 	 * Define (or redefine) custom GUC variables.
 	 */
@@ -1617,6 +1623,18 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 	 */
 	LWLockAcquire(pgss->lock, LW_SHARED);
 
+	/*
+	 * If no query_id has been computed for the calling query and there are
+	 * no entries stored, then there is likely a configuration error that the
+	 * caller may not be aware of, so point it out.
+	 */
+	if (pgstat_get_my_query_id() == UINT64CONST(0) &&
+		compute_query_id == COMPUTE_QUERY_ID_OFF &&
+		hash_get_num_entries(pgss_hash) == 0)
+		ereport(WARNING,
+errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("query identifier calculation is disabled"));
+
 	if (showtext)
 	{
 		/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index e47b26040f..13346e2807 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1,2 +1 @@
 shared_preload_libraries = 'pg_stat_statements'
-compute_query_id = on
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index bc3b6493e6..827a8e3d18 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -437,4 +437,10 @@ SELECT (
 
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
 
+-- Check that pg_stat_statements() will complain if the configuration appears
+-- to be broken.
+SET compute_query_id = off;
+SELECT pg_stat_statements_reset();
+SELECT count(*) FROM pg_stat_statements;
+
 DROP EXTENSION pg_stat_statements;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45bd1f1b7e..60d8b24f5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7627,7 +7627,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
  
 
  
-  compute_query_id (boolean)
+  compute_query_id (enum)
   
compute_query_id configuration parameter
   
@@ -7643,7 +7643,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 identifier to be computed.  Note that an external module can
 alternatively be used if the in-core query identifier computation
 method is not acceptable.  In this case, in-core computation
-must be disabled.  The default is off.
+must always be disabled.
+Valid values are off (always disabled),
+on (always enabled) and auto,
+which lets modules such as 
+automatically enable it.
+The default is auto.


 
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index bc2b6038ee..acfb134797 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -22,10 +22,15 @@
 
  
   The module will not track statistics unless query
-  identifiers are calculated.  This can be done by enabling  or using a third-party module that
-  computes its own query identifiers.  Note that all statistics tracked
-  by this module must be reset if the query identifier method is changed.
+  identifiers are calculated.  This is done automatically when this
+  extension is configured in shared_preload_libraries and
+   is set to auto
+  (which is the default value), or always if  is set to on.
+  You must set  to off
+  if you 

Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Michael Paquier
On Thu, May 13, 2021 at 07:19:11PM -0400, Andrew Dunstan wrote:
> On 5/13/21 3:04 PM, Alvaro Herrera wrote:
>> I'm happy to act as committer for that if he wants to step away from it.
>> I'm already used to being lapidated at every corner anyway.
> 
> Many thanks Alvaro, among other things for teaching me a new word.

+1.  Thanks, Alvaro.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-13 Thread Justin Pryzby
On Thu, May 13, 2021 at 05:33:33PM -0400, Alvaro Herrera wrote:
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -817,6 +817,11 @@ analyze threshold = analyze base threshold + analyze 
> scale factor * number of tu
>  
>  is compared to the total number of tuples inserted, updated, or deleted
>  since the last ANALYZE.
> +For partitioned tables, inserts and updates on partitions are counted
> +towards this threshold; however partition meta-operations such as
> +attachment, detachment or drop are not, so running a manual
> +ANALYZE is recommended if the partition added or
> +removed contains a statistically significant volume of data.

I suggest: "Inserts, updates and deletes on partitions of a partitioned table
are counted towards this threshold; however DDL operations such as ATTACH,
DETACH and DROP are not, ...
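
(As a worked instance of that formula, assuming the default settings of
autovacuum_analyze_threshold = 50 and autovacuum_analyze_scale_factor = 0.1:
a partitioned table whose partitions hold 1,000,000 tuples in total gets
auto-analyzed once 50 + 0.1 * 1,000,000 = 100,050 tuples have been inserted,
updated, or deleted since the last ANALYZE.)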

> +   and in addition it will analyze each individual partition separately.

remove "and" and say in addition COMMA
Or:
"it will also recursive into each partition and update its statistics."

> +   By contrast, if the table being analyzed has inheritance children,
> +   ANALYZE will gather statistics for that table twice:
> +   once on the rows of the parent table only, and a second time on the
> +   rows of the parent table with all of its children.  This second set of
> +   statistics is needed when planning queries that traverse the entire
> +   inheritance tree.  The children tables are not individually analyzed
> +   in this case.

say "The child tables themselves.."

> +  
> +   For tables with inheritance children, the autovacuum daemon only
> +   counts inserts and deletes in the parent table itself when deciding
> +   whether to trigger an automatic analyze for that table.  If that table
> +   is rarely inserted into or updated, the inheritance statistics will
> +   not be up to date unless you run ANALYZE manually.
> +  

This should be emphasized:
Tuples changed in inheritance children do not count towards analyze on the
parent table.  If the parent table is empty or rarely changed, it may never
be processed by autovacuum.  It's necessary to periodically run a manual
ANALYZE to keep the statistics of the table hierarchy up to date.

I don't know why it says "inserted or updated" but doesn't say "or deleted" -
that seems like a back-patchable fix.

> +++ b/doc/src/sgml/ref/pg_restore.sgml
> @@ -922,8 +922,10 @@ CREATE DATABASE foo WITH TEMPLATE template0;
>  
>
> Once restored, it is wise to run ANALYZE on each
> -   restored table so the optimizer has useful statistics; see
> -and
> +   restored table so the optimizer has useful statistics.
> +   If the table is a partition or an inheritance child, it may also be useful
> +   to analyze the parent table.
> +   See  and
>  for more information.

maybe say: "analyze the parent to update statistics for the table hierarchy".




Re: Always bump PG_CONTROL_VERSION?

2021-05-13 Thread Tom Lane
Jan Wieck  writes:
> On 5/13/21 6:45 PM, Tom Lane wrote:
>> Bumping the version in the commit that changes
>> things is not optional, because if you don't do that then you'll
>> probably burn some other developer also working on HEAD.  So
>> I don't want people thinking they can skip this because it was
>> done at the beginning of the development cycle.

> And we make sure this is done how?

Peer pressure?  It's not that different from N other ways to
do a patch incorrectly, of course.

regards, tom lane




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 07:19:11PM -0400, Andrew Dunstan wrote:
> 
> On 5/13/21 3:04 PM, Alvaro Herrera wrote:
> > On 2021-May-13, Stephen Frost wrote:
> >
> >> Or just accept that this is a bit hokey with the 'auto' approach.  I get
> >> Bruce has concerns about it but I'm not convinced that it's actually all
> >> that bad to go with that.
> > Yeah, I think the alleged confusion there is overstated.
> >
> > I'm happy to act as committer for that if he wants to step away from it.
> > I'm already used to being lapidated at every corner anyway.
> >
> 
> 
> Many thanks Alvaro, among other things for teaching me a new word.
> 
> (delapidated) andrew

Yes, I had to look it up too.  :-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2021-05-13 Thread Tom Lane
Dmitry Astapov  writes:
> Am I right in thinking that elimination the join condition is actually
> quite important part of the process?
> Could it possibly be the main reason for =ANY/(x IN (..)) not to be
> optimized the same way?

Yup.

> Is it still hard when one thinks about =ANY or (column in (val1, val2,
> val3, ...)) as well?

Yeah.  For instance, if you have
   WHERE a = b AND a IN (1,2,3)
then yes, you could deduce "b IN (1,2,3)", but this would not give you
license to drop the "a = b" condition.  So now you have to figure out
what the selectivity of that is after the application of the partially
redundant IN clauses.

I recall somebody (David Rowley, maybe?  Too lazy to check archives.)
working on this idea awhile ago, but he didn't get to the point of
a committable patch.

regards, tom lane




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Andrew Dunstan


On 5/13/21 3:04 PM, Alvaro Herrera wrote:
> On 2021-May-13, Stephen Frost wrote:
>
>> Or just accept that this is a bit hokey with the 'auto' approach.  I get
>> Bruce has concerns about it but I'm not convinced that it's actually all
>> that bad to go with that.
> Yeah, I think the alleged confusion there is overstated.
>
> I'm happy to act as committer for that if he wants to step away from it.
> I'm already used to being lapidated at every corner anyway.
>


Many thanks Alvaro, among other things for teaching me a new word.


cheers


(delapidated) andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: OOM in spgist insert

2021-05-13 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm, it looks OK to me, but I wonder why you kept the original
> CHECK_FOR_INTERRUPTS()s since these would be done once we've broken out
> of the loop anyway.  I tested Dilip's original test case and while we
> still die on OOM, we're able to interrupt it before dying.

Hm.  My thought was that in the cases where InterruptPending is set for
some reason other than a query cancel, we could let ProcessInterrupts
service it at less cost than abandoning and retrying the index insertion.
On reflection though, that only works for the first CHECK_FOR_INTERRUPTS
at the top of the loop, and only the first time through, because during
later calls we'll be holding buffer locks.

Maybe the best idea is to have one CHECK_FOR_INTERRUPTS at the top of
the function, in hopes of clearing out any already-pending interrupts,
and then just use the condition test inside the loop.

> Not related to this patch -- I was bothered by the UnlockReleaseBuffer
> calls at the bottom of spgdoinsert that leave the buffer still set in
> the structs.  It's not a problem if you look only at this routine, but I
> notice that callee doPickSplit does the same thing, so maybe there is
> some weird situation in which you could end up with the buffer variable
> set, but we don't hold lock nor pin on the page, so an attempt to clean
> up would break.

Maybe I'm confused, but aren't those just local variables that are about
to go out of scope anyway?  Clearing them seems more likely to draw
compiler warnings about dead stores than accomplish something useful.

regards, tom lane




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-13 Thread Alvaro Herrera
With English fixes from Bruce.

I think the note about autovacuum in the reference page for ANALYZE is a
bit out of place, but not enough to make me move the whole paragraph
elsewhere.  Maybe we should try to do that sometime.

-- 
Álvaro Herrera   Valdivia, Chile
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
From 2c3e913543ff76a1b170fe6e9bf2aeb8c7e13852 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 13 May 2021 16:24:11 -0400
Subject: [PATCH v2] update docs on analyze on partitioned tables

---
 doc/src/sgml/maintenance.sgml|  5 
 doc/src/sgml/perform.sgml|  3 ++-
 doc/src/sgml/ref/analyze.sgml| 40 +++-
 doc/src/sgml/ref/pg_restore.sgml |  6 +++--
 4 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index de7fd75e1c..b390debf2e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -817,6 +817,11 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 
 is compared to the total number of tuples inserted, updated, or deleted
 since the last ANALYZE.
+For partitioned tables, inserts and updates on partitions are counted
+towards this threshold; however partition meta-operations such as
+attachment, detachment or drop are not, so running a manual
+ANALYZE is recommended if the partition added or
+removed contains a statistically significant volume of data.

 

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..ddd6c3ff3e 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1767,7 +1767,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

 Whenever you have significantly altered the distribution of data
 within a table, running ANALYZE is strongly recommended. This
-includes bulk loading large amounts of data into the table.  Running
+includes bulk loading large amounts of data into the table as well as
+attaching, detaching or dropping partitions.  Running
 ANALYZE (or VACUUM ANALYZE)
 ensures that the planner has up-to-date statistics about the
 table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc161..f99e49798e 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,20 +250,38 @@ ANALYZE [ VERBOSE ] [ table_and_columns
 
   
-If the table being analyzed has one or more children,
-ANALYZE will gather statistics twice: once on the
-rows of the parent table only, and a second time on the rows of the
-parent table with all of its children.  This second set of statistics
-is needed when planning queries that traverse the entire inheritance
-tree.  The autovacuum daemon, however, will only consider inserts or
-updates on the parent table itself when deciding whether to trigger an
-automatic analyze for that table.  If that table is rarely inserted into
-or updated, the inheritance statistics will not be up to date unless you
-run ANALYZE manually.
+   If the table being analyzed is partitioned, ANALYZE
+   will gather statistics by sampling blocks randomly from its partitions;
+   and in addition it will analyze each individual partition separately.
+   (However, in multi-level partitioning scenarios, each leaf partition
+   will only be analyzed once.)
+   By contrast, if the table being analyzed has inheritance children,
+   ANALYZE will gather statistics for that table twice:
+   once on the rows of the parent table only, and a second time on the
+   rows of the parent table with all of its children.  This second set of
+   statistics is needed when planning queries that traverse the entire
+   inheritance tree.  The child tables are not individually analyzed
+   in this case.
   
 
   
-If any of the child tables are foreign tables whose foreign data wrappers
+   The autovacuum daemon counts inserts and updates in the partitions
+   to determine if auto-analyze is needed.  However, adding or
+   removing partitions does not affect autovacuum daemon decisions,
+   so triggering a manual ANALYZE is recommended when
+   this occurs.
+  
+
+  
+   For tables with inheritance children, the autovacuum daemon only
+   counts inserts and deletes in the parent table itself when deciding
+   whether to trigger an automatic analyze for that table.  If that table
+   is rarely inserted into or updated, the inheritance statistics will
+   not be up to date unless you run ANALYZE manually.
+  
+
+  
+If any of the child tables or partitions are foreign tables whose foreign data wrappers
 do not support ANALYZE, those child tables are ignored while
 gathering inheritance statistics.
   
diff --git a/doc/src/sgml/ref/pg_restore.sgml 

Re: Always bump PG_CONTROL_VERSION?

2021-05-13 Thread Jan Wieck

On 5/13/21 6:45 PM, Tom Lane wrote:

  Bumping the version in the commit that changes
things is not optional, because if you don't do that then you'll
probably burn some other developer also working on HEAD.  So
I don't want people thinking they can skip this because it was
done at the beginning of the development cycle.


And we make sure this is done how?


Regards, Jan

--
Jan Wieck
Postgres User since 1994




Re: OOM in spgist insert

2021-05-13 Thread Pavel Borisov
I think it's good to backpatch the check, as it doesn't change acceptable
behavior; it just replaces an infinite loop with something acceptable.


Re: OOM in spgist insert

2021-05-13 Thread Alvaro Herrera
On 2021-May-13, Tom Lane wrote:

> What do people think about back-patching this?  In existing branches,
> we've defined it to be an opclass bug if it fails to shorten the leaf
> datum enough.  But not having any defenses against that seems like
> not a great idea.  OTOH, the 10-cycles-to-show-progress rule could be
> argued to be an API break.

I think if the alternative is to throw an error, we can afford to retry
quite a few more times than 10 in order not to have that called an API
break.  Say, retry (MAXIMUM_ALIGNOF << 3) times or so (if you want to
parameterize on maxalign).  It's not like this is going to be a
performance drag where it's not needed ... but I think leaving
back-branches unfixed is not great.
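
For instance (just a sketch of that suggestion; the macro name is
invented here):

	/* retry cap scaled by maxalign, e.g. 64 when MAXIMUM_ALIGNOF is 8 */
	#define SPGIST_MAX_NO_PROGRESS_CYCLES	(MAXIMUM_ALIGNOF << 3)

	if (++numNoProgressCycles >= SPGIST_MAX_NO_PROGRESS_CYCLES)
		ereport(ERROR, ...);	/* give up, as the patch does after 10 */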

I did run Dilip's test case as well as your new regression test, and
both work as intended with your new code (and both OOM-crash the
original code).

-- 
Álvaro Herrera   Valdivia, Chile




Re: Always bump PG_CONTROL_VERSION?

2021-05-13 Thread Tom Lane
Jan Wieck  writes:
> Regardless of PG_VERSION doing the job or not, shouldn't there be a bump 
> in PG_CONTROL_VERSION whenever there is a structural or semantic change 
> in the control file data? And wouldn't the easiest way to ensure that be 
> to bump the version with every release?

No, the way to do that is to change the version number in the commit
that changes the file's contents.

> Also, can someone give me a good reason NOT to bump the version?

It creates unnecessary churn, not to mention a false sense of
complacency.  Bumping the version in the commit that changes
things is not optional, because if you don't do that then you'll
probably burn some other developer also working on HEAD.  So
I don't want people thinking they can skip this because it was
done at the beginning of the development cycle.  We've learned
these things the hard way for CATVERSION.  I think the only reason
that PG_CONTROL version or WAL version might seem different is
that we haven't changed them often enough for people to have fresh
memories of problems.

regards, tom lane




Re: OOM in spgist insert

2021-05-13 Thread Alvaro Herrera
On 2021-May-13, Tom Lane wrote:

> Alvaro Herrera  writes:
> > (Looking again, the nbtpage.c hunk might have been made obsolete by
> > c34787f91058 and other commits).
> 
> OK.  Here's a revision that adopts your idea, except that I left
> out the nbtpage.c change since you aren't sure of that part.

Thanks.

> I added a macro that allows spgdoinsert to Assert that it's not
> called in a context where the infinite-loop-due-to-InterruptPending
> risk would arise.  This is a little bit fragile, because it'd be
> easy for ill-considered changes to ProcessInterrupts to break it,
> but it's better than nothing.

Hmm, it looks OK to me, but I wonder why you kept the original
CHECK_FOR_INTERRUPTS()s since these would be done once we've broken out
of the loop anyway.  I tested Dilip's original test case and while we
still die on OOM, we're able to interrupt it before dying.


Not related to this patch -- I was bothered by the UnlockReleaseBuffer
calls at the bottom of spgdoinsert that leave the buffer still set in
the structs.  It's not a problem if you look only at this routine, but I
notice that callee doPickSplit does the same thing, so maybe there is
some weird situation in which you could end up with the buffer variable
set while we hold neither lock nor pin on the page, so an attempt to clean
up would break.  I don't know enough about spgist to figure out how to
craft a test case; maybe it's impossible to reach for some reason, but
it seems like a glass-on-the-beach sort of thing.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."  (Brian Kernighan)




Re: OOM in spgist insert

2021-05-13 Thread Tom Lane
Here's a patch that attempts to prevent the infinite loop in spgdoinsert
by checking whether the proposed leaf tuple is getting smaller at each
iteration.

We can't be totally rigid about that, because for example if the opclass
shortens a 7-byte string to 5 bytes, that will make no difference in the
tuple's size after alignment.  I first tried to handle that by checking
datumGetSize() of the key datum itself, but observed that spgtextproc.c
has some cases where it'll return an empty leaf-datum string at two
levels before succeeding.  Maybe it'd be okay to fail that on the
grounds that it can't become any smaller later.  But on the whole, and
considering the existing comment's concerns about opclasses that don't
shorten the datum every time, it seems like a good idea to allow some
fuzz here.  So what this patch does is to allow up to 10 cycles with no
reduction in the actual leaf tuple size before failing.  That way we can
handle slop due to alignment roundoff and slop due to opclass corner
cases with a single, very cheap mechanism.  It does mean that we might
build a few more useless inner tuples before failing --- but that seems
like a good tradeoff for *not* failing in cases where the opclass is
able to shorten the leaf datum sufficiently.
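
In outline, the accounting described above amounts to this (a sketch; the
hunk that implements it is truncated below, so the exact placement within
spgdoinsert's loop is approximate):

	prevLeafSize = leafSize;
	/* ... a choose/picksplit cycle may shorten the leaf datums ... */
	if (leafSize < prevLeafSize)
		numNoProgressCycles = 0;	/* tuple got smaller: real progress */
	else if (++numNoProgressCycles >= 10)
		ereport(ERROR, ...);		/* ten cycles without shrinkage: give up */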

I have not bothered to tease apart the query-cancel and infinite-loop
parts of the patch, but probably should do that before committing.

What do people think about back-patching this?  In existing branches,
we've defined it to be an opclass bug if it fails to shorten the leaf
datum enough.  But not having any defenses against that seems like
not a great idea.  OTOH, the 10-cycles-to-show-progress rule could be
argued to be an API break.

regards, tom lane

diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 4d380c99f0..e882b0cea4 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -669,7 +669,8 @@ checkAllTheSame(spgPickSplitIn *in, spgPickSplitOut *out, bool tooBig,
  * will eventually terminate if lack of balance is the issue.  If the tuple
  * is too big, we assume that repeated picksplit operations will eventually
  * make it small enough by repeated prefix-stripping.  A broken opclass could
- * make this an infinite loop, though.
+ * make this an infinite loop, though, so spgdoinsert() checks that the
+ * leaf datums get smaller each time.
  */
 static bool
 doPickSplit(Relation index, SpGistState *state,
@@ -1905,18 +1906,21 @@ spgSplitNodeAction(Relation index, SpGistState *state,
  * Insert one item into the index.
  *
  * Returns true on success, false if we failed to complete the insertion
- * because of conflict with a concurrent insert.  In the latter case,
- * caller should re-call spgdoinsert() with the same args.
+ * (typically because of conflict with a concurrent insert).  In the latter
+ * case, caller should re-call spgdoinsert() with the same args.
  */
 bool
 spgdoinsert(Relation index, SpGistState *state,
 			ItemPointer heapPtr, Datum *datums, bool *isnulls)
 {
+	bool		result = true;
 	TupleDesc	leafDescriptor = state->leafTupDesc;
 	bool		isnull = isnulls[spgKeyColumn];
 	int			level = 0;
 	Datum		leafDatums[INDEX_MAX_KEYS];
 	int			leafSize;
+	int			prevLeafSize;
+	int			numNoProgressCycles = 0;
 	SPPageDesc	current,
 parent;
 	FmgrInfo   *procinfo = NULL;
@@ -1987,9 +1991,10 @@ spgdoinsert(Relation index, SpGistState *state,
 
 	/*
 	 * If it isn't gonna fit, and the opclass can't reduce the datum size by
-	 * suffixing, bail out now rather than getting into an endless loop.
+	 * suffixing, bail out now rather than doing a lot of useless work.
 	 */
-	if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
+	if (leafSize > SPGIST_PAGE_CAPACITY &&
+		(isnull || !state->config.longValuesOK))
 		ereport(ERROR,
 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  errmsg("index row size %zu exceeds maximum %zu for index \"%s\"",
@@ -2019,9 +2024,17 @@ spgdoinsert(Relation index, SpGistState *state,
 		/*
 		 * Bail out if query cancel is pending.  We must have this somewhere
 		 * in the loop since a broken opclass could produce an infinite
-		 * picksplit loop.
+		 * picksplit loop.  Moreover, because it's typically the case that
+		 * we're holding buffer lock(s), ProcessInterrupts() won't be able to
+		 * throw an error now.  Hence, if we see that the interrupt condition
+		 * remains true, break out of the loop and deal with the case below.
 		 */
 		CHECK_FOR_INTERRUPTS();
+		if (INTERRUPTS_PENDING_CONDITION())
+		{
+			result = false;
+			break;
+		}
 
 		if (current.blkno == InvalidBlockNumber)
 		{
@@ -2140,10 +2153,15 @@ spgdoinsert(Relation index, SpGistState *state,
 			 * spgAddNode and spgSplitTuple cases will loop back to here to
 			 * complete the insertion operation.  Just in case the choose
 			 * function is broken and produces add or split requests
-			 * repeatedly, check for query 

Re: Always bump PG_CONTROL_VERSION?

2021-05-13 Thread Jan Wieck

On 5/12/21 10:04 PM, Michael Paquier wrote:

On Wed, May 12, 2021 at 01:30:27PM -0700, Andres Freund wrote:

That said, I don't think it's a good practice to use the control file
version as an identifier for the major version. Who knows, it might be
necessary to add an optional new format in a minor version at some point
or such crazyness. And then there's the beta stuff you'd mentioned, etc.


Yes, PG_VERSION, as you wrote upthread already, is already fine for
the job, and FWIW, I have yet to see a case where being able to easily
detect the minor version in a data folder matters.


Regardless of PG_VERSION doing the job or not, shouldn't there be a bump 
in PG_CONTROL_VERSION whenever there is a structural or semantic change 
in the control file data? And wouldn't the easiest way to ensure that be 
to bump the version with every release?


Also, can someone give me a good reason NOT to bump the version?


Thanks, Jan

--
Jan Wieck
Postgres User since 1994




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2021-05-13 Thread Dmitry Astapov
On Wed, May 12, 2021 at 4:54 PM Tom Lane  wrote:

> Dmitry Astapov  writes:
> > I am trying to understand the behaviour of the query planner regarding
> the
> > push-down of the conditions "through" the join.
>
> I think your mental model is wrong.  What's actually happening here is
> that the planner uses equivalence classes to deduce implied conditions.
> That is, we have the join condition a.adate = b.bdate and then you've
> added the where condition a.adate = '2021-05-12'.  Transitivity implies
> that b.bdate = '2021-05-12', so we deduce that condition and are able
> to apply it at the relation scan of b.  Furthermore, having restricted
> both a.adate and b.bdate to the same constant value at the scan level,
> we no longer need to apply the join condition a.adate = b.bdate at all.
> This is important not only to avoid the (probably minor) inefficiency
> of rechecking the join condition, but because if we believed that all
> three conditions were independently applicable, we'd come out with a
> serious underestimate of the size of the join result.
>

Thank you very much, my mental model was indeed incorrect, and the above is
very helpful.
Am I right in thinking that eliminating the join condition is actually
quite an important part of the process?
Could it possibly be the main reason for =ANY/(x IN (..)) not to be
optimized the same way?


>
> > In my experiments, I was never able to get an execution plan that "pushes
> > down" any condition apart from (=) through to the right side of the join,
>
> None of the argument sketched above works for non-equality conditions.
> There are some situations where you could probably figure out how to
> use transitivity to deduce some implied condition, but cleaning things
> up so that you don't have redundant conditions fouling up the join
> size estimates seems like a hard problem.
>

I agree about inequality conditions, this problem seems to be rather hard
to tackle in the general case.

Is it still hard when one thinks about =ANY or (column in (val1, val2,
val3, ...)) as well?
I am thinking that =ANY would be a decent workaround for (x BETWEEN a AND
b) in quite a lot of cases, if it was propagated to all the columns in the
equivalence class.



> > Equally surprising is that I was unable to find documentation or past
> > mailing list discussions of this or similar topic, which leads me to
> > believe that I am just not familiar with the proper terminology and can't
> > come up with the right search terms.
>
> src/backend/optimizer/README has a discussion of equivalence classes.
>
Thank you, this gives me a plethora of keywords for further searches.

I realize that it is possibly off-topic here, but what about workarounds
for inequality constraints, joins and views? Maybe you could give me some
pointers here as well?

My tables are large to huge (think OLAP, not OLTP). I found that when I have
a view that joins several (2 to 10) tables on a column that is
semantically the same in all of them (let's say it is ID and we join on
ID), I do not have many avenues to efficiently select from such a view for a
list of IDs at the same time.

I could:
1) Do lots of fast queries and union them:
select * from vw where id=ID1 union all select * from vw where id=ID2
..., which is only really feasible if the query is generated by the
program

2) expose all ID columns from all the tables used in the view body and do:
select * from vw where id=ANY() and id1=ANY() and id2=ANY() and id3=ANY()
...
This only works well if the view hierarchy is flat (no views on views). If
there are other views that use this view, re-exporting the extra columns
quickly snowballs, and you might need column renaming if the same view ends
up being used more than once through two different dependency paths. Plus
people not familiar with the problem tend to omit "clearly superfluous"
columns from the new views they build on top.

3) forbid views that join tables larger than a certain size, or dismantle
views that become inefficient (this only works if the problem is detected
fast enough, before the view becomes popular)

So all of the workarounds I see in front of me right now are somewhat sad,
but they are necessary, as not doing them means that queries would take
hours or days instead of minutes.

Is there anything better that I have not considered in terms of workarounds?


-- 
D. Astapov


Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2021-05-13 Thread Michail Nikolaev
Hello.

Added a check for standby promotion with the long transaction to the
test (code and docs are unchanged).

Thanks,
Michail.
From c5e1053805c537b50b0922151bcf127754500adb Mon Sep 17 00:00:00 2001
From: Michail Nikolaev 
Date: Fri, 14 May 2021 00:32:30 +0300
Subject: [PATCH v3 3/4] test

---
 src/test/recovery/Makefile|   1 +
 .../recovery/t/022_standby_index_lp_dead.pl   | 265 ++
 2 files changed, 266 insertions(+)
 create mode 100644 src/test/recovery/t/022_standby_index_lp_dead.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index 96442ceb4e..6399184a8c 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -10,6 +10,7 @@
 #-
 
 EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL+=contrib/pageinspect
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/022_standby_index_lp_dead.pl b/src/test/recovery/t/022_standby_index_lp_dead.pl
new file mode 100644
index 00..fc91f789a1
--- /dev/null
+++ b/src/test/recovery/t/022_standby_index_lp_dead.pl
@@ -0,0 +1,265 @@
+# Checks that index hint bits on standby work as expected.
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 18;
+use Config;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->append_conf('postgresql.conf', qq{
+autovacuum = off
+enable_seqscan = off
+enable_indexonlyscan = off
+});
+$node_primary->start;
+
+$node_primary->safe_psql('postgres', 'CREATE EXTENSION pageinspect');
+# Create test table with primary index
+$node_primary->safe_psql(
+'postgres', 'CREATE TABLE test_table (id int, value int)');
+$node_primary->safe_psql(
+'postgres', 'CREATE INDEX test_index ON test_table (value, id)');
+# Fill it with some data; keep the number of rows small to avoid a
+# heap_page_prune_opt call, which would cause a recovery conflict that
+# hides the conflict caused by index hint bits
+$node_primary->safe_psql('postgres',
+'INSERT INTO test_table VALUES (generate_series(1, 30), 0)');
+# And vacuum to allow index hint bits to be set
+$node_primary->safe_psql('postgres', 'VACUUM test_table');
+# Fail fast in case of FPWs coming from the primary
+$node_primary->safe_psql('postgres', 'CHECKPOINT');
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Restore standby node from the backup
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+has_streaming => 1);
+
+my $standby_settings = qq{
+max_standby_streaming_delay = 1
+wal_receiver_status_interval = 1
+hot_standby_feedback = off
+enable_seqscan = off
+enable_indexonlyscan = off
+};
+$node_standby_1->append_conf('postgresql.conf', $standby_settings);
+$node_standby_1->start;
+
+$node_standby_1->backup($backup_name);
+
+# Create second standby node linking to standby 1
+my $node_standby_2 = get_new_node('standby_2');
+$node_standby_2->init_from_backup($node_standby_1, $backup_name,
+has_streaming => 1);
+$node_standby_2->append_conf('postgresql.conf', $standby_settings);
+$node_standby_2->start;
+
+# To avoid hanging while expecting some specific input from a psql
+# instance being driven by us, add a timeout high enough that it
+# should never trigger even on very slow machines, unless something
+# is really wrong.
+my $psql_timeout = IPC::Run::timer(10);
+
+# One psql to run commands at the repeatable read isolation level
+my %psql_standby_repeatable_read = ('stdin' => '', 'stdout' => '', 'stderr' => '');
+$psql_standby_repeatable_read{run} =
+IPC::Run::start(
+[ 'psql', '-XAb', '-f', '-', '-d', $node_standby_1->connstr('postgres') ],
+'<', \$psql_standby_repeatable_read{stdin},
+'>', \$psql_standby_repeatable_read{stdout},
+'2>', \$psql_standby_repeatable_read{stderr},
+$psql_timeout);
+
+# Another psql to run commands at the read committed isolation level
+my %psql_standby_read_committed = ('stdin' => '', 'stdout' => '', 'stderr' => '');
+$psql_standby_read_committed{run} =
+IPC::Run::start(
+[ 'psql', '-XAb', '-f', '-', '-d', $node_standby_1->connstr('postgres') ],
+'<', \$psql_standby_read_committed{stdin},
+'>', \$psql_standby_read_committed{stdout},
+'2>', \$psql_standby_read_committed{stderr},
+$psql_timeout);
+
+# Start RR transaction and read first row from index
+ok(send_query_and_wait(\%psql_standby_repeatable_read,
+q[
+BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SELECT id FROM test_table WHERE value = 0 ORDER BY id LIMIT 1;
+],
+qr/1\n\(1 row\)/m),
+'row is visible in repeatable read');
+
+# Start RC transaction and read first row from index
+ok(send_query_and_wait(\%psql_standby_read_committed,
+q[
+BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;

Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-13 Thread Alvaro Herrera
New version, a bit more ambitious.  I think it's better to describe
behavior for partitioned tables ahead of inheritance.  Also, in the
ANALYZE reference page I split the topic in two: in one single paragraph
we now describe what happens with manual analyze for partitioned tables
and inheritance hierarchies; we describe the behavior of autovacuum in
one separate paragraph for each type of hierarchy, since the differences
are stark.

I noticed that difference while verifying the behavior that I was to
document.  If you look at ANALYZE VERBOSE output, it seems a bit
wasteful:

create table part (a int) partition by list (a);
create table part0 partition of part for values in (0);
create table part1 partition of part for values in (1);
create table part23 partition of part for values in (2, 3) partition by list (a);
create table part2 partition of part23 for values in (2);
create table part3 partition of part23 for values in (3);
insert into part select g%4 from generate_series(1, 5000) g;

analyze verbose part;

INFO:  analyzing "public.part" inheritance tree
INFO:  "part1": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 
dead rows; 7500 rows in sample, 12500060 estimated total rows
INFO:  "part2": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 
dead rows; 7500 rows in sample, 12500060 estimated total rows
INFO:  "part3": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 
dead rows; 7500 rows in sample, 12500060 estimated total rows
INFO:  "part4": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 
dead rows; 7500 rows in sample, 12500060 estimated total rows
INFO:  analyzing "public.part1"
INFO:  "part1": scanned 3 of 55310 pages, containing 6779940 live rows and 
0 dead rows; 3 rows in sample, 12499949 estimated total rows
INFO:  analyzing "public.part2"
INFO:  "part2": scanned 3 of 55310 pages, containing 6779940 live rows and 
0 dead rows; 3 rows in sample, 12499949 estimated total rows
INFO:  analyzing "public.part34" inheritance tree
INFO:  "part3": scanned 15000 of 55310 pages, containing 339 live rows and 
0 dead rows; 15000 rows in sample, 12500060 estimated total rows
INFO:  "part4": scanned 15000 of 55310 pages, containing 3389940 live rows and 
0 dead rows; 15000 rows in sample, 12499839 estimated total rows
INFO:  analyzing "public.part3"
INFO:  "part3": scanned 3 of 55310 pages, containing 678 live rows and 
0 dead rows; 3 rows in sample, 12500060 estimated total rows
INFO:  analyzing "public.part4"
INFO:  "part4": scanned 3 of 55310 pages, containing 678 live rows and 
0 dead rows; 3 rows in sample, 12500060 estimated total rows
ANALYZE

-- 
Álvaro Herrera   Valdivia, Chile
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)
From 6961e64a3ad5bfd10a14f544c470dbb93f9aadc3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 13 May 2021 16:24:11 -0400
Subject: [PATCH] update docs on analyze on partitioned tables

---
 doc/src/sgml/maintenance.sgml|  5 
 doc/src/sgml/perform.sgml|  3 ++-
 doc/src/sgml/ref/analyze.sgml| 40 +++-
 doc/src/sgml/ref/pg_restore.sgml |  6 +++--
 4 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index de7fd75e1c..b390debf2e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -817,6 +817,11 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 
 is compared to the total number of tuples inserted, updated, or deleted
 since the last ANALYZE.
+For partitioned tables, inserts and updates on partitions are counted
+towards this threshold; however partition meta-operations such as
+attachment, detachment or drop are not, so running a manual
+ANALYZE is recommended if the partition added or
+removed contains a statistically significant volume of data.

 

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..ddd6c3ff3e 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1767,7 +1767,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

 Whenever you have significantly altered the distribution of data
 within a table, running ANALYZE is strongly recommended. This
-includes bulk loading large amounts of data into the table.  Running
+includes bulk loading large amounts of data into the table as well as
+attaching, detaching or dropping partitions.  Running
 ANALYZE (or VACUUM ANALYZE)
 ensures that the planner has up-to-date statistics about the
 table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc161..8f8d3af985 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ 

Re: Race condition in recovery?

2021-05-13 Thread Robert Haas
On Mon, May 10, 2021 at 4:35 AM Kyotaro Horiguchi
 wrote:
> It seems to me the issue here is not a race condition but
> WaitForWALToBecomeAvailable initializing expectedTLEs with the history
> of a improper timeline. So using recoveryTargetTLI instead of
> receiveTLI for the case fixes this issue.

I agree.

> I believe the 004_timeline_switch.pl detects your issue.  And the
> attached change fixes it.

So why does this use recoveryTargetTLI instead of receiveTLI only
conditionally? Why not do it all the time?

The hard thing about this code is that the assumptions are not very
clear. If we don't know why something is a certain way, then we might
break things if we change it. Worse yet, if nobody else knows why it's
like that either, then who knows what assumptions they might be
making? It's hard to be sure that any change is safe.

But that being said, we have a clear definition from the comments for
what expectedTLEs is supposed to contain, and it's only going to end
up with those contents if we initialize it from recoveryTargetTLI. So
I am inclined to think that we ought to do that always, and if it
breaks something, then that's a sign that some other part of the code
also needs fixing, because apparently that hypothetical other part of
the code doesn't work if expectedTLEs contains what the comments say
that it should.
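
In code form, the always-initialize-from-the-target version being argued
for would look roughly like this (a sketch, not a concrete patch):

	if (!expectedTLEs)
	{
		/*
		 * Derive the list from the recovery target timeline, per the
		 * definition in the comments, rather than from the timeline of
		 * whatever WAL segment we happened to receive first.
		 */
		expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
	}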

Now maybe that's the wrong idea. But if so, then we're saying that the
definition of expectedTLEs needs to be changed, and we should update
the comments with the new definition, whatever it is. A lot of the
confusion here results from the fact that the code and comments are
inconsistent and we can't tell whether that's intentional or
inadvertent. Let's not leave the next person who looks at this code
wondering the same thing about whatever changes we make.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: amvalidate(): cache lookup failed for operator class 123

2021-05-13 Thread Tom Lane
Robert Haas  writes:
> On Thu, May 13, 2021 at 2:22 PM Tom Lane  wrote:
>> Meh.  I'm not convinced that that position ought to apply to amvalidate.

> I am still of the opinion that we ought to apply it across the board,
> for consistency.

The main reason I'm concerned about applying that rule to amvalidate
is that then how do you know what's actually an error case?

As a hardly-irrelevant counterexample, we have a whole bunch of
regression tests that do something like

SELECT ...
WHERE NOT amvalidate(oid);

Every one of those is silently and dangerously wrong if amvalidate
might sometimes return null: NOT NULL evaluates to NULL, so the WHERE
clause filters the row out and the broken opclass passes as valid.

regards, tom lane




Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-05-13 Thread Peter Geoghegan
On Thu, May 13, 2021 at 5:06 AM Justin Pryzby  wrote:
> Why is the allowed range from 0 to 0.05?  Why not 0.10 or 1.0 ?
> The old GUC of the same name had max 1e10.

It also had a completely different purpose and default.

> I think a reduced range and a redefinition of the GUC would need to be called
> out as an incompatibility.

The justification from Michael for this approach is that not having
this level of control would be weird, at least to him. But that
justification itself seems weird to me; why start from the premise
that you need a knob (as opposed to an off switch) at all? Why not
start with the way the mechanism works (or is intended to work) in
practice? Most individual tables will *never* have VACUUM apply the
optimization with *any* reasonable threshold value, so we only need to
consider the subset of tables/workloads where it *might* make sense to
skip index vacuuming. This is more qualitative than quantitative.

It makes zero sense to treat the threshold as a universal scale --
this is one reason why I don't want to expose a true tunable knob to
users. Though the threshold-driven/BYPASS_THRESHOLD_PAGES design is
not exactly something with stable behavior for a given table, it
almost works like that in practice: tables tend to usually skip index
vacuuming, or never skip it even once. There is a clear bifurcation
along this line when you view how VACUUM behaves with a variety of
different tables using the new autovacuum logging stuff.

Almost all of the benefit of the optimization is available with the
current BYPASS_THRESHOLD_PAGES threshold (2% of heap pages have
LP_DEAD items), which has less risk than a higher threshold. I don't
think it matters much if we have the occasional unnecessary round of
index vacuuming on account of not applying the optimization. The truly
important benefit of the optimization is to not do unnecessary index
vacuuming all the time in the extreme cases where it's really hard to
justify. There is currently zero evidence that anything higher than 2%
will ever help anybody to an appreciable degree. (There is also no
evidence that the optimization will ever need to be disabled, but I
accept the need to be conservative and offer an off switch -- the
precautionary principle applies when talking about new harms.)
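
To make the mechanism concrete, the bypass test amounts to roughly the
following (a simplified sketch of the VACUUM logic; the real check also
caps the absolute number of LP_DEAD items before it will skip):

	/* BYPASS_THRESHOLD_PAGES is 0.02, i.e. 2% of heap pages */
	threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
	bypass = (vacrel->lpdead_item_pages < threshold);
	if (bypass)
	{
		/* Skip index vacuuming this round; LP_DEAD items in the heap
		 * remain, to be reclaimed by some future VACUUM. */
	}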

Not having to scan every index on every VACUUM, but only every 5th or
so VACUUM is a huge improvement. But going from every 5th VACUUM to
every 10th VACUUM? That's at best a tiny additional improvement in
exchange for what I'd guess is a roughly linear increase in risk
(maybe a greater-than-linear increase, even). That's an awful deal.

-- 
Peter Geoghegan




Re: amvalidate(): cache lookup failed for operator class 123

2021-05-13 Thread Robert Haas
On Thu, May 13, 2021 at 2:22 PM Tom Lane  wrote:
> Meh.  I'm not convinced that that position ought to apply to amvalidate.

I am still of the opinion that we ought to apply it across the board,
for consistency. It makes it easier for humans to know which problems
are known to be reachable and which are thought to be can't-happen and
thus bugs. If we fix cases like this to return a real error code, then
anything that comes up as XX000 is likely to be a real bug, whereas if
we don't, the things that we're not concerned about have to be
filtered out by some other method, probably involving a human being.
If the filter that human being has to apply further involves reading
Tom Lane's mind and knowing what he will think about a particular
report, or alternatively asking him, it just makes complicated
something that we could have made simple.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Granting control of SUSET gucs to non-superusers

2021-05-13 Thread Mark Dilger



> On May 13, 2021, at 12:18 PM, Jacob Champion  wrote:
> 
> On Thu, 2021-05-13 at 11:42 -0700, Mark Dilger wrote:
>> The distinction that Theme+Security would make is that capabilities
>> can be categorized by the area of the system:
>>  -- planner
>>  -- replication
>>  -- logging
>>  ...
>> but also by the security implications of what is being done:
>>  -- host
>>  -- schema
>>  -- network
> Since the "security" buckets are being used for both proposals -- how
> would you deal with overlap between them? When a GUC gives you enough
> host access to bleed into the schema and network domains, does it get
> all three attributes assigned to it, and thus require membership in all
> three roles?

Yeah, from a security standpoint, pg_host_admin basically gives everything 
away.  I doubt service providers would give the "host" or "network" security to 
their tenants, but they would probably consider giving "schema" security to the 
tenants.

> (Thanks, by the way, for this thread -- I think a "capability system"
> for superuser access is a great idea.)

I am happy to work on this, and appreciate the feedback.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Granting control of SUSET gucs to non-superusers

2021-05-13 Thread Stephen Frost
Greetings,

* Jacob Champion (pchamp...@vmware.com) wrote:
> On Thu, 2021-05-13 at 11:42 -0700, Mark Dilger wrote:
> > The distinction that Theme+Security would make is that capabilities
> > can be categorized by the area of the system:
> >   -- planner
> >   -- replication
> >   -- logging
> >   ...
> > but also by the security implications of what is being done:
> >   -- host
> >   -- schema
> >   -- network
> Since the "security" buckets are being used for both proposals -- how
> would you deal with overlap between them? When a GUC gives you enough
> host access to bleed into the schema and network domains, does it get
> all three attributes assigned to it, and thus require membership in all
> three roles?

The question is about exactly what the operation is, not about what that
operation might allow someone to be able to do by using that access.

'network' might, in theory, allow someone to connect out on a port that
happens to have a bash shell that's running as root on the local box too,
which means that it "could" be used to gain 'host' access, but that's not
really our concern.

To that point, if it's allowing access to run programs on the host then
'host' is required, but I don't think we should also require 'network'
for 'run programs on the host' because someone might run 'curl' with
that access- that's an issue for the admin and the curl utility to
figure out.

> (Thanks, by the way, for this thread -- I think a "capability system"
> for superuser access is a great idea.)

We've been working in that direction for a long time. :)

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Granting control of SUSET gucs to non-superusers

2021-05-13 Thread Jacob Champion
On Thu, 2021-05-13 at 11:42 -0700, Mark Dilger wrote:
> The distinction that Theme+Security would make is that capabilities
> can be categorized by the area of the system:
>   -- planner
>   -- replication
>   -- logging
>   ...
> but also by the security implications of what is being done:
>   -- host
>   -- schema
>   -- network
Since the "security" buckets are being used for both proposals -- how
you would deal with overlap between them? When a GUC gives you enough
host access to bleed into the schema and network domains, does it get
all three attributes assigned to it, and thus require membership in all
three roles?

(Thanks, by the way, for this thread -- I think a "capability system"
for superuser access is a great idea.)

--Jacob


Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 03:04:30PM -0400, Álvaro Herrera wrote:
> On 2021-May-13, Stephen Frost wrote:
> 
> > Or just accept that this is a bit hokey with the 'auto' approach.  I get
> > Bruce has concerns about it but I'm not convinced that it's actually all
> > that bad to go with that.
> 
> Yeah, I think the alleged confusion there is overstated.
> 
> I'm happy to act as committer for that if he wants to step away from it.
> I'm already used to being lapidated at every corner anyway.

OK, feel free to take ownership of it, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 02:29:09PM -0400, Tom Lane wrote:
> Stephen Frost  writes:
> > There's a ridiculously simple option here which is: drop the idea that
> > we support an extension redefining the query id and then just make it
> > on/off with the default to be 'on'.
> 
> I do not think that defaulting it to 'on' is acceptable unless you can
> show that the added overhead is negligible.  IIUC the measurements that
> have been done show the opposite.

Agreed.

> Maybe we should revert this thing pending somebody doing the work to
> make a version of queryid labeling that actually is negligibly cheap.
> It certainly seems like that could be done; one more traversal of the
> parse tree can't be that expensive in itself.  I suspect that the
> performance problem is with the particular hashing mechanism that
> was used, which looks mighty ad-hoc anyway.

I was surprised it was ~2%.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Alvaro Herrera
On 2021-May-13, Stephen Frost wrote:

> Or just accept that this is a bit hokey with the 'auto' approach.  I get
> Bruce has concerns about it but I'm not convinced that it's actually all
> that bad to go with that.

Yeah, I think the alleged confusion there is overstated.

I'm happy to act as committer for that if he wants to step away from it.
I'm already used to being lapidated at every corner anyway.

-- 
Álvaro Herrera   Valdivia, Chile
"E pur si muove" (Galileo Galilei)




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > There's a ridiculously simple option here which is: drop the idea that
> > we support an extension redefining the query id and then just make it
> > on/off with the default to be 'on'.
> 
> I do not think that defaulting it to 'on' is acceptable unless you can
> show that the added overhead is negligible.  IIUC the measurements that
> have been done show the opposite.

Ah, right, it had only been done before when pg_stat_statements was
loaded...  In which case, it seems like we should:

a) go back to that

b) if someone wants an alternative query ID, tell them to add it to
   pg_stat_statements and make it configurable *there*

c) Have pg_stat_statements provide another function/view/etc that folks
   can use to get a queryid for an ongoing query (sketched just below)?

d) Maybe come up with a way for extensions, generically, to make a value
   available to log_line_prefix?  That could be pretty interesting.
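
For (c), that might look something like this (purely hypothetical; no
such function exists today):

SELECT pg_stat_statements_query_id();  -- hypothetical: queryid of the current statement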

Or just accept that this is a bit hokey with the 'auto' approach.  I get
Bruce has concerns about it but I'm not convinced that it's actually all
that bad to go with that.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Granting control of SUSET gucs to non-superusers

2021-05-13 Thread Mark Dilger



> On May 13, 2021, at 10:41 AM, Stephen Frost  wrote:
> 
> Greetings,
> 
> * Mark Dilger (mark.dil...@enterprisedb.com) wrote:
>>> On May 12, 2021, at 12:58 PM, Robert Haas  wrote:
>>> - Group things by which section of postgresql.conf they're in, and
>>> then further restrict some of them as security-sensitive. This is
>>> reasonably close to what you've got, but not exactly what you've got.
>>> One issue is that it risks separating things that are in practice not
>>> useful to separate, creating more predefined roles to manage than we
>>> really need. With your division, what are the chances that someone
>>> wants to grant pg_stats_settings but not pg_maintenance_settings or
>>> the other way around?
>> 
>> I think our conversation off-list is worth reiterating here
>> 
>> When classifying GUC variables, the philosophy of classification needs to be 
>> consistent and easily understandable so that, among other considerations, 
>> all future GUC variables have a reasonable chance of being classified correctly 
>> by their patch authors and committers.  The patch I posted falls short in 
>> this regard.  You and I discussed two organizational options:
>> 
>> Theme+Security:
>>  - security is considered as falling into three groupings:  (a) host 
>> security, which includes files and permissions, running external commands, 
>> etc., (b) network security, which includes all connection options and 
>> authentications, and (c) schema security, which includes database internal 
>> object security like rls, object ownership, etc.
>>  - theme is based on the GUC config_group, either having one theme per 
>> config_group, or basing the theme on the prefix of the config_group such 
>> that, for example, QUERY_TUNING_METHOD, QUERY_TUNING_COST, 
>> QUERY_TUNING_GEQO, and QUERY_TUNING_OTHER could all be in one theme named 
>> "pg_query_tuning".
>> 
>> Admin+Security
>>  - security works the same as Theme+Security
>>  - a pg_admin role is required to set all non PGC_USERSET gucs, but some of 
>> those gucs *also* require one or more of the security roles
>> 
>> The Theme+Security approach might be insufficient for extensibility, given 
>> that 3rd-party custom GUCs might not have a corresponding theme.  The 
>> Admin+Security approach appears better in this regard.
>> 
>> Admin+Security seems sufficient, in conjunction with Chapman's idea of 
>> extensible check_hooks.
> 
> I'm not entirely following what the difference here is that's being
> suggested.  At a high level, I like the idea of defining capabilities
> along the lines of "host security", "network security", "schema
> security".  I do think we should consider maybe breaking those down a
> bit more but I don't know that we'd really need to have much more.

The distinction that Theme+Security would make is that capabilities can be 
categorized by the area of the system:
  -- planner
  -- replication
  -- logging
  ...
but also by the security implications of what is being done:
  -- host
  -- schema
  -- network

So if a GUC variable is related to replication, but also impacts the security 
of libpq connections to the server, then you'd need to be a member of both 
pg_replication_role and pg_network_admin.  If another GUC variable is related 
to logging, but also impacts the file permissions or ownership of the log file, 
you'd need to be a member of both pg_logging_role and pg_host_admin.


The Admin+Security idea would instead say that to SET any GUC variable other 
than PGC_USERSET gucs, or to ALTER SYSTEM SET on any GUC variable, you'd need 
to be a member of pg_admin_role.  If the GUC variable also impacts host 
security (file permissions, etc.) you'd have to also be a member of 
pg_host_admin, or if it impacts security of connections to the server, you'd 
have to also be a member of pg_network_admin.

I'm just making up names like "pg_replication_role" and such for illustration.
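
To make that concrete, here is a minimal sketch using those made-up
names (none of these roles exists today, and log_file_mode is just an
example of a GUC that also touches host security):

GRANT pg_admin_role TO carol;             -- needed to set any non-USERSET GUC
GRANT pg_host_admin TO carol;             -- log_file_mode affects file permissions
ALTER SYSTEM SET log_file_mode = '0640';  -- run as carol; would require both grants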

> In general, I'm not really keen on such a generic role as 'pg_admin'.  I
> would have thought we'd have a matrix where we have categories for GUCs
> and roles which are allowed to modify those categories, with the
> additional requirement of having host/network/schema capability for
> those GUCs which imply that level of access.  

Yeah, that's the Theme+Security idea, or at least it seems so to me.

> Having the low-level
> capabilities plus the GUC groups would seem likely to cover most cases
> that 3rd party extensions might wish for, in a pretty granular way,
> though we could always consider adding more in the future.

I'm imagining the security concerns splitting three ways, and the themes 
splitting on the order of ten different ways.  We can quibble over how
fine-grained the themes should be.  There is a simplicity argument to having them be 
one-to-one with the config_group.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Tom Lane
Stephen Frost  writes:
> There's a ridiculously simple option here which is: drop the idea that
> we support an extension redefining the query id and then just make it
> on/off with the default to be 'on'.

I do not think that defaulting it to 'on' is acceptable unless you can
show that the added overhead is negligible.  IIUC the measurements that
have been done show the opposite.

Maybe we should revert this thing pending somebody doing the work to
make a version of queryid labeling that actually is negligibly cheap.
It certainly seems like that could be done; one more traversal of the
parse tree can't be that expensive in itself.  I suspect that the
performance problem is with the particular hashing mechanism that
was used, which looks mighty ad-hoc anyway.

regards, tom lane




Re: amvalidate(): cache lookup failed for operator class 123

2021-05-13 Thread Justin Pryzby
On Thu, May 13, 2021 at 02:22:16PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > Per sqlsmith.
> > postgres=# select amvalidate(123);
> > ERROR:  cache lookup failed for operator class 123
> > postgres=# \errverbose 
> > ERROR:  XX000: cache lookup failed for operator class 123
> > LOCATION:  amvalidate, amapi.c:125
> 
> > The usual expectation is that sql callable functions should return null 
> > rather
> > than hitting elog().
> 
> Meh.  I'm not convinced that that position ought to apply to amvalidate.
> Under what circumstances would you be calling that on an opclass that
> might be about to be dropped?

Sure, no problem.  I'm just passing on the message :)

-- 
Justin




Re: amvalidate(): cache lookup failed for operator class 123

2021-05-13 Thread Tom Lane
Justin Pryzby  writes:
> Per sqlsmith.
> postgres=# select amvalidate(123);
> ERROR:  cache lookup failed for operator class 123
> postgres=# \errverbose 
> ERROR:  XX000: cache lookup failed for operator class 123
> LOCATION:  amvalidate, amapi.c:125

> The usual expectation is that sql callable functions should return null rather
> than hitting elog().

Meh.  I'm not convinced that that position ought to apply to amvalidate.
Under what circumstances would you be calling that on an opclass that
might be about to be dropped?
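
The supported pattern is to sweep live catalog entries, along the lines
of what opr_sanity does:

SELECT oid, opcname FROM pg_opclass WHERE NOT amvalidate(oid);

where every OID passed in comes straight from pg_opclass.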

regards, tom lane




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, May 13, 2021 at 07:39:45PM +0200, Christoph Berg wrote:
> > Re: Bruce Momjian
> > > Well, now that we have clear warnings when it is misconfigured,
> > > especially when querying the pg_stat_statements view, are these
> > > complaints still valid?   Also, how is modifying
> > > shared_preload_libraries zero-config, but modifying
> > > shared_preload_libraries and compute_query_id a huge burden?
> > 
> > It's zero-config in the sense that if you want to have
> > pg_stat_statements, loading that module via shared_preload_libraries
> > is just natural.

Not sure about natural but it's certainly what folks have at least
become used to.  We should be working to eliminate it though.

> > Having to set compute_query_id isn't natural. It's a setting with a
> > completely different name, and the connection of pg_stat_statements to
> > compute_query_id isn't obvious.
> > 
> > The reasoning with "we have warnings and stuff" might be ok if
> > pg_stat_statements were a new thing, but it has worked via
> > shared_preload_libraries only for the last decade, and requiring
> > something extra will confuse probably every single user of
> > pg_stat_statements out there.

As we keep seeing, over and over.  The ongoing comments claiming that
it's "just" a minor additional configuration tweak fall pretty flat when
you consider the number of times it's already been brought up, and who
it has been brought up by.

> > Perhaps worse, note that these warnings will likely first be seen by
> > the end users of databases, not by the admin performing the initial
> > setup or upgrade, who will not be able to fix the problem themselves.

I don't think this is appreciated anywhere near well enough.  This takes
existing perfectly working configurations and actively breaks them in a
manner that isn't obvious and isn't something that an admin would have
any idea about until after they've upgraded and then started trying to
query the view.  That's pretty terrible.

> Well, but with this extra configuration, the query id shows up in a lot
> more places.  I can't imagine how this could be done cleanly without
> requiring extra configuration, unless the query_id computation was
> cheaper to compute and we could enable it by default.

There's a ridiculously simple option here which is: drop the idea that
we support an extension redefining the query id and then just make it
on/off with the default to be 'on'.  If people actually have a problem
with it being on and they don't want to use pg_stat_statements then they
can turn it off.  This won't break any existing configs that are out
there in the field and avoids the complexity of having some kind of
'auto' config.  I do agree with the general idea of wanting to be
extensible but I'm not convinced that, in this particular case, it's
worth all of this.  I'm somewhat convinced that having a way to disable
the query id is useful in limited cases and if people want a way to do
that, then we can give that to them in a straightforward way that doesn't
break things.
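
Concretely, the entire user-facing surface would then be something like
(a sketch, assuming 'on' becomes the default):

ALTER SYSTEM SET compute_query_id = off;  -- the one remaining knob
SELECT pg_reload_conf();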

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 07:39:45PM +0200, Christoph Berg wrote:
> Re: Bruce Momjian
> > Well, now that we have clear warnings when it is misconfigured,
> > especially when querying the pg_stat_statements view, are these
> > complaints still valid?   Also, how is modifying
> > shared_preload_libraries zero-config, but modifying
> > shared_preload_libraries and compute_query_id a huge burden?
> 
> It's zero-config in the sense that if you want to have
> pg_stat_statements, loading that module via shared_preload_libraries
> is just natural.
> 
> Having to set compute_query_id isn't natural. It's a setting with a
> completely different name, and the connection of pg_stat_statements to
> compute_query_id isn't obvious.
> 
> The reasoning with "we have warnings and stuff" might be ok if
> pg_stat_statements were a new thing, but it has worked via
> shared_preload_libraries only for the last decade, and requiring
> something extra will confuse probably every single user of
> pg_stat_statements out there.
> 
> Perhaps worse, note that these warnings will likely first be seen by
> the end users of databases, not by the admin performing the initial
> setup or upgrade, who will not be able to fix the problem themselves.

Well, but with this extra configuration, the query id shows up in a lot
more places.  I can't imagine how this could be done cleanly without
requiring extra configuration, unless the query_id computation was
cheaper to compute and we could enable it by default.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 01:51:07PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:
> > > I'm coming around to have a similar feeling.  While having an
> > > alternative query ID might be useful, I have a hard time seeing it as
> > > likely to be a hugely popular feature that is worth a lot of users
> > > complaining (as we've seen already, multiple times, before even getting
> > > to beta...) that things aren't working anymore.  That we can't figure
> > > out which libraries to load automatically based on what extensions have
> > > been installed and therefore make everyone have to change
> > > shared_preload_libraries isn't a good thing and requiring additional
> > > configuration for extremely common extensions like pg_stat_statements is
> > > making it worse.
> > 
> > Would someone please explain what is wrong with what is in the tree
> > now, except that it needs additional warnings about misconfiguration? 
> > Requiring two postgresql.conf changes instead of one doesn't seem like a
> > valid complaint to me, especially if the warnings are in place and the
> > release notes mention it.
> 
> Will you be updating pg_upgrade to detect and throw a warning during
> check in the event that it discovers a broken config?

Uh, how does this relate to pg_upgrade?  Are you saying someone
misconfigures the new system with pg_stat_statements but not query id? 
The server would still start and upgrade, no?  How is this different
from any other GUC we rename?  I am not following much of the logic in
this discussion, frankly.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:
> > I'm coming around to have a similar feeling.  While having an
> > alternative query ID might be useful, I have a hard time seeing it as
> > likely to be a hugely popular feature that is worth a lot of users
> > complaining (as we've seen already, multiple times, before even getting
> > to beta...) that things aren't working anymore.  That we can't figure
> > out which libraries to load automatically based on what extensions have
> > been installed and therefore make everyone have to change
> > shared_preload_libraries isn't a good thing and requiring additional
> > configuration for extremely common extensions like pg_stat_statements is
> > making it worse.
> 
> Would someone please explain what is wrong with what is in the tree
> now, except that it needs additional warnings about misconfiguration? 
> Requiring two postgresql.conf changes instead of one doesn't seem like a
> valid complaint to me, especially if the warnings are in place and the
> release notes mention it.

Will you be updating pg_upgrade to detect and throw a warning during
check in the event that it discovers a broken config?

If not, then I don't think you're correct in arguing that this need for
additional configuration isn't a valid complaint.
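
Roughly, such a check boils down to flagging clusters where the
extension is installed but query-id computation is off; a sketch only
(pg_upgrade would have to run something like this per database on the
new cluster):

SELECT 1 FROM pg_extension
 WHERE extname = 'pg_stat_statements'
   AND current_setting('compute_query_id') = 'off';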

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Granting control of SUSET gucs to non-superusers

2021-05-13 Thread Stephen Frost
Greetings,

* Mark Dilger (mark.dil...@enterprisedb.com) wrote:
> > On May 12, 2021, at 12:58 PM, Robert Haas  wrote:
> > - Group things by which section of postgresql.conf they're in, and
> > then further restrict some of them as security-sensitive. This is
> > reasonably close to what you've got, but not exactly what you've got.
> > One issue is that it risks separating things that are in practice not
> > useful to separate, creating more predefined roles to manage than we
> > really need. With your division, what are the chances that someone
> > wants to grant pg_stats_settings but not pg_maintenance_settings or
> > the other way around?
> 
> I think our conversation off-list is worth reiterating here
> 
> When classifying GUC variables, the philosophy of classification needs to be 
> consistent and easily understandable so that, among other considerations, all 
> future GUC variables have a reasonable chance of being classified correctly by 
> their patch authors and committers.  The patch I posted falls short in this 
> regard.  You and I discussed two organizational options:
> 
> Theme+Security:
>   - security is considered as falling into three groupings:  (a) host 
> security, which includes files and permissions, running external commands, 
> etc., (b) network security, which includes all connection options and 
> authentications, and (c) schema security, which includes database internal 
> object security like rls, object ownership, etc.
>   - theme is based on the GUC config_group, either having one theme per 
> config_group, or basing the theme on the prefix of the config_group such 
> that, for example, QUERY_TUNING_METHOD, QUERY_TUNING_COST, QUERY_TUNING_GEQO, 
> and QUERY_TUNING_OTHER could all be in one theme named "pg_query_tuning".
> 
> Admin+Security
>   - security works the same as Theme+Security
>   - a pg_admin role is required to set all non PGC_USERSET gucs, but some of 
> those gucs *also* require one or more of the security roles
> 
> The Theme+Security approach might be insufficient for extensibility, given 
> that 3rd-party custom GUCs might not have a corresponding theme.  The 
> Admin+Security approach appears better in this regard.
> 
> Admin+Security seems sufficient, in conjunction with Chapman's idea of 
> extensible check_hooks.

I'm not entirely following what the difference here is that's being
suggested.  At a high level, I like the idea of defining capabilities
along the lines of "host security", "network security", "schema
security".  I do think we should consider maybe breaking those down a
bit more but I don't know that we'd really need to have much more.

In general, I'm not really keen on such a generic role as 'pg_admin'.  I
would have thought we'd have a matrix where we have categories for GUCs
and roles which are allowed to modify those categories, with the
additional requirement of having host/network/schema capability for
those GUCs which imply that level of access.  Having the low-level
capabilities plus the GUC groups would seem likely to cover most cases
that 3rd party extensions might wish for, in a pretty granular way,
though we could always consider adding more in the future.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:
> I'm coming around to have a similar feeling.  While having an
> alternative query ID might be useful, I have a hard time seeing it as
> likely to be a hugely popular feature that is worth a lot of users
> complaining (as we've seen already, multiple times, before even getting
> to beta...) that things aren't working anymore.  That we can't figure
> out which libraries to load automatically based on what extensions have
> been installed and therefore make everyone have to change
> shared_preload_libraries isn't a good thing and requiring additional
> configuration for extremely common extensions like pg_stat_statements is
> making it worse.

Would someone please explain what is wrong with what is in the tree
now, except that it needs additional warnings about misconfiguration? 
Requiring two postgresql.conf changes instead of one doesn't seem like a
valid complaint to me, especially if the warnings are in place and the
release notes mention it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Christoph Berg
Re: Bruce Momjian
> Well, now that we have clear warnings when it is misconfigured,
> especially when querying the pg_stat_statements view, are these
> complaints still valid?   Also, how is modifying
> shared_preload_libraries zero-config, but modifying
> shared_preload_libraries and compute_query_id a huge burden?

It's zero-config in the sense that if you want to have
pg_stat_statements, loading that module via shared_preload_libraries
is just natural.

Having to set compute_query_id isn't natural. It's a setting with a
completely different name, and the connection of pg_stat_statements to
compute_query_id isn't obvious.

The reasoning with "we have warnings and stuff" might be ok if
pg_stat_statements were a new thing, but it has worked via
shared_preload_libraries only for the last decade, and requiring
something extra will confuse probably every single user of
pg_stat_statements out there.

Perhaps worse, note that these warnings will likely first be seen by
the end users of databases, not by the admin performing the initial
setup or upgrade, who will not be able to fix the problem themselves.

Christoph




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 01:17:16PM -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > The only thing that bugs me is that we're pretty damn late in the
> > process to be engaging in this amount of design.
> 
> Indeed.  I feel that this feature was forced in before it was really
> ready.

The user API has always been a challenge for this feature but I thought
we had it ironed out.  What I didn't anticipate were the configuration
complaints, and if those are valid, the feature should be removed since
we can't improve it at this point, nor do I have any idea if that is
even possible without unacceptable negatives.  If the configuration
complaints are invalid, what we have now is very good, I think, though
adding more warnings about misconfiguration would be wise.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Andrew Dunstan  writes:
> > The only thing that bugs me is that we're pretty damn late in the
> > process to be engaging in this amount of design.
> 
> Indeed.  I feel that this feature was forced in before it was really
> ready.

I'm coming around to have a similar feeling.  While having an
alternative query ID might be useful, I have a hard time seeing it as
likely to be a hugely popular feature that is worth a lot of users
complaining (as we've seen already, multiple times, before even getting
to beta...) that things aren't working anymore.  That we can't figure
out which libraries to load automatically based on what extensions have
been installed and therefore make everyone have to change
shared_preload_libraries isn't a good thing and requiring additional
configuration for extremely common extensions like pg_stat_statements is
making it worse.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: OOM in spgist insert

2021-05-13 Thread Tom Lane
Alvaro Herrera  writes:
> (Looking again, the nbtpage.c hunk might have been made obsolete by
> c34787f91058 and other commits).

OK.  Here's a revision that adopts your idea, except that I left
out the nbtpage.c change since you aren't sure of that part.

I added a macro that allows spgdoinsert to Assert that it's not
called in a context where the infinite-loop-due-to-InterruptPending
risk would arise.  This is a little bit fragile, because it'd be
easy for ill-considered changes to ProcessInterrupts to break it,
but it's better than nothing.

regards, tom lane

diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 4d380c99f0..1c14fc7f76 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -1905,13 +1905,14 @@ spgSplitNodeAction(Relation index, SpGistState *state,
  * Insert one item into the index.
  *
  * Returns true on success, false if we failed to complete the insertion
- * because of conflict with a concurrent insert.  In the latter case,
- * caller should re-call spgdoinsert() with the same args.
+ * (typically because of conflict with a concurrent insert).  In the latter
+ * case, caller should re-call spgdoinsert() with the same args.
  */
 bool
 spgdoinsert(Relation index, SpGistState *state,
 			ItemPointer heapPtr, Datum *datums, bool *isnulls)
 {
+	bool		result = true;
 	TupleDesc	leafDescriptor = state->leafTupDesc;
 	bool		isnull = isnulls[spgKeyColumn];
 	int			level = 0;
@@ -2019,9 +2020,17 @@ spgdoinsert(Relation index, SpGistState *state,
 		/*
 		 * Bail out if query cancel is pending.  We must have this somewhere
 		 * in the loop since a broken opclass could produce an infinite
-		 * picksplit loop.
+		 * picksplit loop.  Moreover, because it's typically the case that
+		 * we're holding buffer lock(s), ProcessInterrupts() won't be able to
+		 * throw an error now.  Hence, if we see that the interrupt condition
+		 * remains true, break out of the loop and deal with the case below.
 		 */
 		CHECK_FOR_INTERRUPTS();
+		if (INTERRUPTS_PENDING_CONDITION())
+		{
+			result = false;
+			break;
+		}
 
 		if (current.blkno == InvalidBlockNumber)
 		{
@@ -2140,10 +2149,15 @@ spgdoinsert(Relation index, SpGistState *state,
 			 * spgAddNode and spgSplitTuple cases will loop back to here to
 			 * complete the insertion operation.  Just in case the choose
 			 * function is broken and produces add or split requests
-			 * repeatedly, check for query cancel.
+			 * repeatedly, check for query cancel (see comments above).
 			 */
 	process_inner_tuple:
 			CHECK_FOR_INTERRUPTS();
+			if (INTERRUPTS_PENDING_CONDITION())
+			{
+				result = false;
+				break;
+			}
 
 			innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
 														PageGetItemId(current.page, current.offnum));
@@ -2267,5 +2281,21 @@ spgdoinsert(Relation index, SpGistState *state,
 		UnlockReleaseBuffer(parent.buffer);
 	}
 
-	return true;
+	/*
+	 * We do not support being called while some outer function is holding a
+	 * buffer lock (or any other reason to postpone query cancels).  If that
+	 * were the case, telling the caller to retry would create an infinite
+	 * loop.
+	 */
+	Assert(INTERRUPTS_CAN_BE_PROCESSED());
+
+	/*
+	 * Finally, check for interrupts again.  If there was a query cancel,
+	 * ProcessInterrupts() will be able to throw the error now.  If it was
+	 * some other kind of interrupt that can just be cleared, return false to
+	 * tell our caller to retry.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	return result;
 }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6200699ddd..dd2ade7bb6 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -554,7 +554,7 @@ ProcessClientWriteInterrupt(bool blocked)
 		{
 			/*
 			 * Don't mess with whereToSendOutput if ProcessInterrupts wouldn't
-			 * do anything.
+			 * service ProcDiePending.
 			 */
 			if (InterruptHoldoffCount == 0 && CritSectionCount == 0)
 			{
@@ -3118,6 +3118,12 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
  * If an interrupt condition is pending, and it's safe to service it,
  * then clear the flag and accept the interrupt.  Called only when
  * InterruptPending is true.
+ *
+ * Note: if INTERRUPTS_CAN_BE_PROCESSED() is true, then ProcessInterrupts
+ * is guaranteed to clear the InterruptPending flag before returning.
+ * (This is not the same as guaranteeing that it's still clear when we
+ * return; another interrupt could have arrived.  But we promise that
+ * any pre-existing one will have been serviced.)
  */
 void
 ProcessInterrupts(void)
@@ -3248,7 +3254,11 @@ ProcessInterrupts(void)
 	{
 		/*
 		 * Re-arm InterruptPending so that we process the cancel request as
-		 * soon as we're done reading the message.
+		 * soon as we're done reading the message.  (XXX this is seriously
+		 * ugly: it complicates INTERRUPTS_CAN_BE_PROCESSED(), and it means we
+		 * can't 

Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Tom Lane
Andrew Dunstan  writes:
> The only thing that bugs me is that we're pretty damn late in the
> process to be engaging in this amount of design.

Indeed.  I feel that this feature was forced in before it was really
ready.

regards, tom lane




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 12:45:07PM -0400, Andrew Dunstan wrote:
> 
> On 5/13/21 12:18 AM, Fujii Masao wrote:
> >
> >
> >
> If we do this, compute_query_id=auto seems to be similar to
> huge_pages=try.
> When huge_pages=try, whether huge pages are actually used is determined
> outside PostgreSQL (i.e., by the OS setting in this case). Neither
> pg_settings.setting nor pg_settings.source is changed in that case.
> >
> >
> 
> Not a bad analogy, showing there's some precedent for this sort of thing.
> 
> 
> The only thing that bugs me is that we're pretty damn late in the
> process to be engaging in this amount of design.

The issue is that there is no external way to check what query id
computation is being used, unlike huge pages, which can be queried from
the operating system.  I also agree it is late, and discussion of auto
continues to show cases where this makes later improvements more
complex.
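
The closest thing to an external check is whether a query ID shows up
at all, e.g.:

SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid();

which comes back NULL when nothing is computing query ids, but says
nothing about which method produced a non-NULL value.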

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





amvalidate(): cache lookup failed for operator class 123

2021-05-13 Thread Justin Pryzby
Per sqlsmith.

postgres=# select amvalidate(123);
ERROR:  cache lookup failed for operator class 123
postgres=# \errverbose 
ERROR:  XX000: cache lookup failed for operator class 123
LOCATION:  amvalidate, amapi.c:125

The usual expectation is that sql callable functions should return null rather
than hitting elog().

-- 
Justin




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Andrew Dunstan


On 5/13/21 12:18 AM, Fujii Masao wrote:
>
>
>
> If we do this, compute_query_id=auto seems to be similar to
> huge_pages=try.
> When huge_pages=try, whether huge pages are actually used is determined
> outside PostgreSQL (i.e., by the OS setting in this case). Neither
> pg_settings.setting nor pg_settings.source is changed in that case.
>
>

Not a bad analogy, showing there's some precedent for this sort of thing.


The only thing that bugs me is that we're pretty damn late in the
process to be engaging in this amount of design.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 09:30:55AM -0700, Maciek Sakrejda wrote:
> On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud  wrote:
> >
> > On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> > >
> > > The warning makes it clear that there's something wrong, but not how
> > > to fix it
> >
> > I'm confused, are we talking about the new warning in v2 as suggested by
> > Pavel?  It should make things quite clear:
> >
> > +SELECT count(*) FROM pg_stat_statements;
> > +WARNING:  Query identifier calculation seems to be disabled
> > +HINT:  If you don't want to use a third-party module to compute query 
> > identifiers, you may want to enable compute_query_id
> >
> > The wording can of course be improved.
> 
> I meant that no warning can be as clear as things just working, but I
> do have feedback on the specific message here:
> 
>  * "seems to" be disabled? Is it? Any reason not to be more definitive here?
>  * On reading the beginning of the hint, I can see users asking
> themselves, "Do I want to use a third-party module to compute query
> identifiers?"
>  * "may want to enable"? Are there any situations where I don't want
> to use a third-party module *and* I don't want to enable
> compute_query_id?
> 
> So maybe something like
> 
> > +SELECT count(*) FROM pg_stat_statements;
> > +WARNING:  Query identifier calculation is disabled
> > +HINT:  You must enable compute_query_id or configure a third-party module 
> > to compute query identifiers in order to use pg_stat_statements.

Yes, I like this.  The reason the old message was so vague is that
'auto', the default some people wanted, didn't issue that error, only
'off' did, so there was an assumption you wanted a custom module since
you changed the value to off.  If we are going with just on/off, no
auto, the message you suggest, leading with compute_query_id, is the
right approach.

> (I admit "configure a third-party module" is pretty vague, but I think
> that suggests it's only an option to consider if you know what you're
> doing.)

Seems fine to me.

> Also, if you're configuring this for usage with a tool like pganalyze,
> and neglect to run a manual query (we guide users to do that, but they
> may skip that step), the warnings may not even be visible (the Go
> driver we are using does not surface client warnings). Should this be
> an error instead of a warning? Is it ever useful to get an empty
> result set from querying pg_stat_statements? Using an error here would
> parallel the behavior of shared_preload_libraries not including
> pg_stat_statements.

Good question.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: OOM in spgist insert

2021-05-13 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-May-13, Tom Lane wrote:
>> BTW, another nasty thing I discovered while testing this is that
>> the CHECK_FOR_INTERRUPTS() at line 2146 is useless, because
>> we're holding a buffer lock there so InterruptHoldoffCount > 0.
>> So once you get into this loop you can't even cancel the query.
>> Seems like that needs a fix, too.

Attached is a WIP patch for that part.  Basically, if it looks
like there's an interrupt pending, make spgdoinsert() fall out of
its loop, so it'll release the buffer locks, and then recheck
CHECK_FOR_INTERRUPTS() at a point where it can really throw the
error.  Now, the hole in this approach is what if ProcessInterrupts
still declines to throw the error?  I've made the patch make use of
the existing provision to retry spgdoinsert() in case of deadlock,
so that it just goes around and tries again.  But that doesn't seem
terribly satisfactory, because if InterruptPending remains set then
we'll just have an infinite loop at that outer level.  IOW, this
patch can only cope with the situation where there's not any outer
function holding a buffer lock (or other reason to prevent the
query cancel from taking effect).  Maybe that's good enough, but
I'm not terribly pleased with it.

> This comment made me remember a patch I've had for a while, which splits
> the CHECK_FOR_INTERRUPTS() definition in two -- one of them is
> INTERRUPTS_PENDING_CONDITION() which let us test the condition
> separately; that allows the lock we hold to be released prior to
> actually processing the interrupts.

Hm.  Yeah, I was feeling that this patch is unduly friendly with
the innards of CHECK_FOR_INTERRUPTS().  So maybe some refactoring
is called for, but I'm not quite sure what it should look like.
Testing the condition separately is fine, but what about the case
of ProcessInterrupts refusing to pull the trigger?

regards, tom lane

diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 4d380c99f0..82e68c4ab7 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -1905,13 +1905,14 @@ spgSplitNodeAction(Relation index, SpGistState *state,
  * Insert one item into the index.
  *
  * Returns true on success, false if we failed to complete the insertion
- * because of conflict with a concurrent insert.  In the latter case,
- * caller should re-call spgdoinsert() with the same args.
+ * (typically because of conflict with a concurrent insert).  In the latter
+ * case, caller should re-call spgdoinsert() with the same args.
  */
 bool
 spgdoinsert(Relation index, SpGistState *state,
 			ItemPointer heapPtr, Datum *datums, bool *isnulls)
 {
+	bool		result = true;
 	TupleDesc	leafDescriptor = state->leafTupDesc;
 	bool		isnull = isnulls[spgKeyColumn];
 	int			level = 0;
@@ -2019,9 +2020,17 @@ spgdoinsert(Relation index, SpGistState *state,
 		/*
 		 * Bail out if query cancel is pending.  We must have this somewhere
 		 * in the loop since a broken opclass could produce an infinite
-		 * picksplit loop.
+		 * picksplit loop.  Moreover, because it's typically the case that
+		 * we're holding buffer lock(s), ProcessInterrupts() won't be able to
+		 * throw an error now.  Hence, if we see that InterruptPending remains
+		 * true, break out of the loop and deal with the problem below.
 		 */
 		CHECK_FOR_INTERRUPTS();
+		if (unlikely(InterruptPending))
+		{
+			result = false;
+			break;
+		}
 
 		if (current.blkno == InvalidBlockNumber)
 		{
@@ -2140,10 +2149,15 @@ spgdoinsert(Relation index, SpGistState *state,
 			 * spgAddNode and spgSplitTuple cases will loop back to here to
 			 * complete the insertion operation.  Just in case the choose
 			 * function is broken and produces add or split requests
-			 * repeatedly, check for query cancel.
+			 * repeatedly, check for query cancel (see comments above).
 			 */
 	process_inner_tuple:
 			CHECK_FOR_INTERRUPTS();
+			if (unlikely(InterruptPending))
+			{
+				result = false;
+				break;
+			}
 
 			innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
 														PageGetItemId(current.page, current.offnum));
@@ -2267,5 +2281,12 @@ spgdoinsert(Relation index, SpGistState *state,
 		UnlockReleaseBuffer(parent.buffer);
 	}
 
-	return true;
+	/*
+	 * Finally, check for interrupts again.  If there was one,
+	 * ProcessInterrupts() will be able to throw the error now.  But should it
+	 * not do so for some reason, return false to cause a retry.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	return result;
 }


Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Maciek Sakrejda
On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud  wrote:
>
> On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> >
> > The warning makes it clear that there's something wrong, but not how
> > to fix it
>
> I'm confused, are we talking about the new warning in v2 as suggested by
> Pavel?  It should make things quite clear:
>
> +SELECT count(*) FROM pg_stat_statements;
> +WARNING:  Query identifier calculation seems to be disabled
> +HINT:  If you don't want to use a third-party module to compute query 
> identifiers, you may want to enable compute_query_id
>
> The wording can of course be improved.

I meant that no warning can be as clear as things just working, but I
do have feedback on the specific message here:

 * "seems to" be disabled? Is it? Any reason not to be more definitive here?
 * On reading the beginning of the hint, I can see users asking
themselves, "Do I want to use a third-party module to compute query
identifiers?"
 * "may want to enable"? Are there any situations where I don't want
to use a third-party module *and* I don't want to enable
compute_query_id?

So maybe something like

> +SELECT count(*) FROM pg_stat_statements;
> +WARNING:  Query identifier calculation is disabled
> +HINT:  You must enable compute_query_id or configure a third-party module to 
> compute query identifiers in order to use pg_stat_statements.

(I admit "configure a third-party module" is pretty vague, but I think
that suggests it's only an option to consider if you know what you're
doing.)

Also, if you're configuring this for usage with a tool like pganalyze,
and neglect to run a manual query (we guide users to do that, but they
may skip that step), the warnings may not even be visible (the Go
driver we are using does not surface client warnings). Should this be
an error instead of a warning? Is it ever useful to get an empty
result set from querying pg_stat_statements? Using an error here would
parallel the behavior of shared_preload_libraries not including
pg_stat_statements.




Re: OOM in spgist insert

2021-05-13 Thread Alvaro Herrera
On 2021-May-13, Alvaro Herrera wrote:

> The btree code modified was found to be an actual problem in production
> when a btree is corrupted in such a way that vacuum would get an
> infinite loop.  I don't remember the exact details but I think we saw
> vacuum running for a couple of weeks, and had to restart the server in
> order to terminate it (since it wouldn't respond to signals).

(Looking again, the nbtpage.c hunk might have been made obsolete by
c34787f91058 and other commits).

-- 
Álvaro Herrera   Valdivia, Chile




Re: Granting control of SUSET gucs to non-superusers

2021-05-13 Thread Mark Dilger



> On May 12, 2021, at 12:58 PM, Robert Haas  wrote:
> 
> - Group things by which section of postgresql.conf they're in, and
> then further restrict some of them as security-sensitive. This is
> reasonably close to what you've got, but not exactly what you've got.
> One issue is that it risks separating things that are in practice not
> useful to separate, creating more predefined roles to manage than we
> really need. With your division, what are the chances that someone
> wants to grant pg_stats_settings but not pg_maintenance_settings or
> the other way around?

I think our conversation off-list is worth reiterating here

When classifying GUC variables, the philosophy of classification needs to be 
consistent and easily understandable so that, among other considerations, all 
future GUC variables have a reasonable chance of being classified correctly by 
their patch authors and committers.  The patch I posted falls short in this 
regard.  You and I discussed two organizational options:

Theme+Security:
  - security is considered as falling into three groupings:  (a) host security, 
which includes files and permissions, running external commands, etc., (b) 
network security, which includes all connection options and authentications, 
and (c) schema security, which includes database internal object security like 
rls, object ownership, etc.
  - theme is based on the GUC config_group, either having one theme per 
config_group, or basing the theme on the prefix of the config_group such that, 
for example, QUERY_TUNING_METHOD, QUERY_TUNING_COST, QUERY_TUNING_GEQO, and 
QUERY_TUNING_OTHER could all be in one theme named "pg_query_tuning".

Admin+Security
  - security works the same as Theme+Security
  - a pg_admin role is required to set all non PGC_USERSET gucs, but some of 
those gucs *also* require one or more of the security roles

The Theme+Security approach might be insufficient for extensibility, given that 
3rd-party custom GUCs might not have a corresponding theme.  The Admin+Security 
approach appears better in this regard.

Admin+Security seems sufficient, in conjunction with Chapman's idea of 
extensible check_hooks.

Thoughts?


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 11:35:13PM +0800, Julien Rouhaud wrote:
> On Thu, May 13, 2021 at 10:41:43AM -0400, Bruce Momjian wrote:
> > 
> > Well, now that we have clear warnings when it is misconfigured,
> > especially when querying the pg_stat_statements view, are these
> > complaints still valid?
> 
> I'm personally fine with it, and I can send a new version with just the
> warning when calling pg_stat_statements() or one of the view(s).  Or were there
> other warnings that you were referring to?

No, that was the big fix that made misconfiguration very clear to users
who didn't see the change before.

> > I am personally not comfortable committing a patch to add an auto option
> > the way it is implemented, so another committer will need to take
> > ownership of this, or the entire feature can be removed.
> 
> That's fair.  Just to be clear, I'm assuming that you also don't like
> Horiguchi-san's approach either?

Uh, anything with 'auto', I don't like.  I am afraid if I commit it, I
would feel responsible for the long tail of confusion this will cause
users, which is why I was saying I would rather remove it than be
responsible for causing such confusion.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: OOM in spgist insert

2021-05-13 Thread Alvaro Herrera
On 2021-May-13, Tom Lane wrote:

> BTW, another nasty thing I discovered while testing this is that
> the CHECK_FOR_INTERRUPTS() at line 2146 is useless, because
> we're holding a buffer lock there so InterruptHoldoffCount > 0.
> So once you get into this loop you can't even cancel the query.
> Seems like that needs a fix, too.

This comment made me remember a patch I've had for a while, which splits
the CHECK_FOR_INTERRUPTS() definition in two -- one of them is
INTERRUPTS_PENDING_CONDITION() which let us test the condition
separately; that allows the lock we hold to be released prior to
actually processing the interrupts.

The btree code modified was found to be an actual problem in production
when a btree is corrupted in such a way that vacuum would get an
infinite loop.  I don't remember the exact details but I think we saw
vacuum running for a couple of weeks, and had to restart the server in
order to terminate it (since it wouldn't respond to signals).

-- 
Álvaro Herrera   Valdivia, Chile
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."   (Fotis)
   (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)
From 5a008141f135bef5ba933b1e3b65c457f58ad85a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 13 May 2021 11:41:19 -0400
Subject: [PATCH] Split CHECK_FOR_INTERRUPTS

This allows the condition to be checked even when in an interrupts-held
situation, so that we can exit that (eg. release some lock we know we're
holding) in order to process them.
---
 src/backend/access/nbtree/nbtpage.c |  7 +++
 src/include/miscadmin.h | 20 
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index ebec8fa5b8..00de713035 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -2397,6 +2397,13 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 		{
 			bool		leftsibvalid = true;
 
+			if (INTERRUPTS_PENDING_CONDITION())
+			{
+				_bt_relbuf(rel, leafbuf);
+				ProcessInterrupts();
+				return false;	/* should not occur */
+			}
+
 			/*
 			 * Before we follow the link from the page that was the left
 			 * sibling mere moments ago, validate its right link.  This
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 95202d37af..c5c441c2e9 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -98,23 +98,19 @@ extern PGDLLIMPORT volatile uint32 CritSectionCount;
 extern void ProcessInterrupts(void);
 
 #ifndef WIN32
+#define INTERRUPTS_PENDING_CONDITION() \
+	(unlikely(InterruptPending))
+#else
+#define INTERRUPTS_PENDING_CONDITION() \
+	(unlikely(UNBLOCKED_SIGNAL_QUEUE()) ? pgwin32_dispatch_queued_signals() : 0,  \
+	 unlikely(InterruptPending))
+#endif
 
 #define CHECK_FOR_INTERRUPTS() \
 do { \
-	if (unlikely(InterruptPending)) \
+	if (INTERRUPTS_PENDING_CONDITION()) \
 		ProcessInterrupts(); \
 } while(0)
-#else			/* WIN32 */
-
-#define CHECK_FOR_INTERRUPTS() \
-do { \
-	if (unlikely(UNBLOCKED_SIGNAL_QUEUE())) \
-		pgwin32_dispatch_queued_signals(); \
-	if (unlikely(InterruptPending)) \
-		ProcessInterrupts(); \
-} while(0)
-#endif			/* WIN32 */
-
 
 #define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)
 
-- 
2.20.1



Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> > Well, now that we have clear warnings when it is misconfigured,
> > especially when querying the pg_stat_statements view, are these
> > complaints still valid?   Also, how is modifying
> > shared_preload_libraries zero-config, but modifying
> > shared_preload_libraries and compute_query_id a huge burden?
> 
> The warning makes it clear that there's something wrong, but not how
> to fix it (as I noted in another message in this thread, a web search
> for pg_stat_statements docs still leads to an obsolete version). I
> don't think anyone is arguing that this is insurmountable for all
> users, but it is additional friction, and every bit of friction makes
> Postgres harder to use. Users don't read documentation, or misread
> documentation, or just are not sure what the documentation or the
> warning is telling them, in spite of our best efforts.

Well, then let's have the error message tell them what is wrong and how
to fix it.  My issue is that 'auto' spreads confusion around the entire
API, as you can see from the discussion in this thread.

> And you're right, modifying shared_preload_libraries is not
> zero-config--I would love it if we could drop that requirement ;).
> Still, adding another setting is clearly one more thing to get wrong.
> 
> > I am personally not comfortable committing a patch to add an auto option
> > the way it is implemented, so another committer will need to take
> > ownership of this, or the entire feature can be removed.
> 
> Assuming we do want to avoid additional configuration requirements for
> pg_stat_statements, is there another mechanism you feel would work
> better? Or is that constraint incompatible with sane behavior for this
> feature?

I think we just need to leave it as on/off, and then help people find
the way to fix it if they misconfigure it, which I think has already been
shown to be possible.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Julien Rouhaud
On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> 
> The warning makes it clear that there's something wrong, but not how
> to fix it

I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
It should make things quite clear:

+SELECT count(*) FROM pg_stat_statements;
+WARNING:  Query identifier calculation seems to be disabled
+HINT:  If you don't want to use a third-party module to compute query 
identifiers, you may want to enable compute_query_id

The wording can of course be improved.




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Julien Rouhaud
On Thu, May 13, 2021 at 10:41:43AM -0400, Bruce Momjian wrote:
> 
> Well, now that we have clear warnings when it is misconfigured,
> especially when querying the pg_stat_statements view, are these
> complaints still valid?

I'm personally fine with it, and I can send a new version with just the
warning when calling pg_stat_statements() or one of the view(s).  Or were there
other warnings that you were referring to?

> I am personally not comfortable committing a patch to add an auto option
> the way it is implemented, so another committer will need to take
> ownership of this, or the entire feature can be removed.

That's fair.  Just to be clear, I'm assuming that you also don't like
Horiguchi-san's approach either?




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Maciek Sakrejda
On Thu, May 13, 2021 at 7:42 AM Bruce Momjian  wrote:
>
> On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote:
> > On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> > I don't know what to say.  So here is a summary of the complaints that I'm
> > aware of:
> >
> > - 
> > https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.ca...@moonset.ru
> > That was only a couple of days after the commit just before the feature 
> > freeze,
> > so it may be the less relevant complaint.
> >
> > - 
> > https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
> > Did a git bisect to find the commit that changed the behavior and somehow
> > didn't notice the new setting
> >
> > - this thread, with Fujii-san saying:
> >
> > > I'm afraid that users may easily forget to enable compute_query_id when 
> > > using
> > > pg_stat_statements (because this setting was not necessary in v13 or 
> > > before)
> >
> > - this thread, with Peter E. saying:
> >
> > > Now there is the additional burden of turning on this weird setting that
> > > no one understands.  That's a 50% increase in burden.  And almost no one 
> > > will
> > > want to use a nondefault setting.  pg_stat_statements is pretty popular.  
> > > I
> > > think leaving in this requirement will lead to widespread confusion and
> > > complaints.
> >
> > - this thread, with Pavel saying:
> >
> > > Until now, the pg_stat_statements was zero-config. So the change is not 
> > > user
> > > friendly.
> >
> > So it's a mix of "it's changing something that hadn't changed in a long time"
> > and "it's adding extra footgun and/or burden as it's not doing by default 
> > what
> > the majority of users will want", with an overwhelming majority of people
> > supporting the "we don't want that extra burden".
>
> Well, now that we have clear warnings when it is misconfigured,
> especially when querying the pg_stat_statements view, are these
> complaints still valid?   Also, how is modifying
> shared_preload_libraries zero-config, but modifying
> shared_preload_libraries and compute_query_id a huge burden?

The warning makes it clear that there's something wrong, but not how
to fix it (as I noted in another message in this thread, a web search
for pg_stat_statements docs still leads to an obsolete version). I
don't think anyone is arguing that this is insurmountable for all
users, but it is additional friction, and every bit of friction makes
Postgres harder to use. Users don't read documentation, or misread
documentation, or just are not sure what the documentation or the
warning is telling them, in spite of our best efforts.

And you're right, modifying shared_preload_libraries is not
zero-config--I would love it if we could drop that requirement ;).
Still, adding another setting is clearly one more thing to get wrong.

> I am personally not comfortable committing a patch to add an auto option
> the way it is implemented, so another committer will need to take
> ownership of this, or the entire feature can be removed.

Assuming we do want to avoid additional configuration requirements for
pg_stat_statements, is there another mechanism you feel would work
better? Or is that constraint incompatible with sane behavior for this
feature?




Re: parallel vacuum - few questions on docs, comments and code

2021-05-13 Thread Bharath Rupireddy
On Thu, May 13, 2021 at 7:00 PM Masahiko Sawada  wrote:
> > > Yeah, I get it. Even if users don't specify a parallel option there
> > > are chances that parallelism is picked. So the parallel degree is the
> > > final number of workers that are chosen by the server for vacuuming
> > > indexes. And, I think that parallel degree is something internal to
> > > the server, so it's better to replace it in vacuumdb.sgml, change
> > > PARALLEL_DEGREE to PARALLEL_WORKERS in vacuumdb.c and change the error
> > > message "parallel vacuum degree must be a non-negative integer" to
> > > "parallel workers for vacuum must be greater than or equal to zero".
> > >
> > > Thoughts?
>
> I'm fine with this change.

Thanks.

> is important here but another idea to improve the error message would
> be to change "parallel vacuum degree must be between 0 and %d" to "the
> number of parallel workers must be between 0 and %d" (or using
> "parallel workers for vacuum" instead of "the number of parallel
> workers") while leaving the other message as it is.

Done that way.

PSA patch.
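
For readers skimming the patch, the reworded messages correspond to this
user-facing behavior (values taken from the regression test changes below):

    VACUUM (PARALLEL 2) pvactst;   -- vacuum indexes with up to 2 parallel workers
    VACUUM (PARALLEL 0) pvactst;   -- disable parallel index vacuuming
    VACUUM (PARALLEL -1) pvactst;  -- rejected:
    -- ERROR:  parallel workers for vacuum must be between 0 and 1024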

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 514128927d51a5b0578e9b1ce969a68fe8d62b1c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 13 May 2021 20:29:38 +0530
Subject: [PATCH v1] Parallel Vacuum-Reword Error Messages and Docs

---
 doc/src/sgml/ref/vacuumdb.sgml   | 2 +-
 src/backend/commands/vacuum.c| 2 +-
 src/bin/scripts/vacuumdb.c   | 4 ++--
 src/test/regress/expected/vacuum.out | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 843a82e871..05249c4772 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -279,7 +279,7 @@ PostgreSQL documentation
   --parallel=parallel_degree
   

-Specify the parallel degree of parallel vacuum.
+Specify the number of parallel workers for parallel vacuum.
 This allows the vacuum to leverage multiple CPUs to process indexes.
 See .

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d549d0d86f..7421d7cfbd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -165,7 +165,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 if (nworkers < 0 || nworkers > MAX_PARALLEL_WORKER_LIMIT)
 	ereport(ERROR,
 			(errcode(ERRCODE_SYNTAX_ERROR),
-			 errmsg("parallel vacuum degree must be between 0 and %d",
+			 errmsg("parallel workers for vacuum must be between 0 and %d",
 	MAX_PARALLEL_WORKER_LIMIT),
 			 parser_errposition(pstate, opt->location)));
 
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 7901c41f16..069a861aab 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -200,7 +200,7 @@ main(int argc, char *argv[])
 vacopts.parallel_workers = atoi(optarg);
 if (vacopts.parallel_workers < 0)
 {
-	pg_log_error("parallel vacuum degree must be a non-negative integer");
+	pg_log_error("parallel workers for vacuum must be greater than or equal to zero");
 	exit(1);
 }
 break;
@@ -1000,7 +1000,7 @@ help(const char *progname)
 	printf(_("  --no-index-cleanup  don't remove index entries that point to dead tuples\n"));
 	printf(_("  --no-process-toast  skip the TOAST table associated with the table to vacuum\n"));
 	printf(_("  --no-truncate   don't truncate empty pages at the end of the table\n"));
-	printf(_("  -P, --parallel=PARALLEL_DEGREE  use this many background workers for vacuum, if available\n"));
+	printf(_("  -P, --parallel=PARALLEL_WORKERS use this many background workers for vacuum, if available\n"));
 	printf(_("  -q, --quiet don't write any messages\n"));
 	printf(_("  --skip-locked   skip relations that cannot be immediately locked\n"));
 	printf(_("  -t, --table='TABLE[(COLUMNS)]'  vacuum specific table(s) only\n"));
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 90cea6caa8..5e657849aa 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -110,7 +110,7 @@ VACUUM (PARALLEL 2) pvactst;
 UPDATE pvactst SET i = i WHERE i < 1000;
 VACUUM (PARALLEL 0) pvactst; -- disable parallel vacuum
 VACUUM (PARALLEL -1) pvactst; -- error
-ERROR:  parallel vacuum degree must be between 0 and 1024
+ERROR:  parallel workers for vacuum must be between 0 and 1024
 LINE 1: VACUUM (PARALLEL -1) pvactst;
 ^
 VACUUM (PARALLEL 2, INDEX_CLEANUP FALSE) pvactst;
-- 
2.25.1



Re: OOM in spgist insert

2021-05-13 Thread Tom Lane
Pavel Borisov  writes:
> [ v4-0001-When-there-are-INCLUDEd-columns-in-SpGist-check-t.patch ]

I don't like this patch one bit --- it's basically disabling a fairly
important SPGiST feature as soon as there are included columns.
What's more, it's not really giving us any better defense against
the infinite-picksplit-loop problem than we had before.

I wonder if we should give up on the theory posited circa
spgdoinsert.c:2213:

 * Note: if the opclass sets longValuesOK, we rely on the
 * choose function to eventually shorten the leafDatum
 * enough to fit on a page.  We could add a test here to
 * complain if the datum doesn't get visibly shorter each
 * time, but that could get in the way of opclasses that
 * "simplify" datums in a way that doesn't necessarily
 * lead to physical shortening on every cycle.

The argument that there might be a longValuesOK opclass that *doesn't*
shorten the datum each time seems fairly hypothetical to me.

An alternative way of looking at things is that this is the opclass's
fault.  It should have realized that it's not making progress, and
thrown an error.  However, I'm not sure if the opclass picksplit
function has enough info to throw a meaningful error.  It looks to
me like the trouble case from spg_text_picksplit's point of view
is that it's given a single tuple containing a zero-length string,
so that there is no prefix it can strip.  However, it seems possible
that that could be a legitimate case.  Even if it's not, the opclass
function doesn't have (for instance) the name of the index to cite
in an error message.  Nor did we give it a way to return a failure
indication, which is seeming like a mistake right now.
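
For reference, a sketch of the shape of the trouble case (schema and sizes
are my guesses here, not the original reproducer):

    -- A zero-length key leaves spg_text_picksplit no prefix to strip, so
    -- if the INCLUDE payload keeps the leaf tuple too large for a page,
    -- the choose/picksplit cycle can spin without making progress.
    CREATE TABLE spg_t (k text, payload text);
    CREATE INDEX spg_idx ON spg_t USING spgist (k) INCLUDE (payload);
    INSERT INTO spg_t
        SELECT '', string_agg(md5(i::text), '') FROM generate_series(1, 300) i;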

BTW, another nasty thing I discovered while testing this is that
the CHECK_FOR_INTERRUPTS() at line 2146 is useless, because
we're holding a buffer lock there so InterruptHoldoffCount > 0.
So once you get into this loop you can't even cancel the query.
Seems like that needs a fix, too.

regards, tom lane




Re: PG 14 release notes, first draft

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 02:46:58PM +0900, Amit Langote wrote:
> How about writing the 2nd line instead as:
> 
> Updates/deletes on partitioned tables can now use execution-time
> partition pruning.
> 
> We don't seem to use the term "run-time pruning" anywhere else in the
> documentation, and "pruning of updates/deletes" sounds strange.

Good point, updated text:

    Improve the performance of updates/deletes on partitioned tables
    when only a few partitions are affected (Amit Langote, Tom Lane)

    This also allows updates/deletes on partitioned tables to use
    execution-time partition pruning.
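
A quick illustration of the second sentence (example is mine, with invented
names):

    -- With a generic plan, v14 can prune unaffected partitions of an
    -- UPDATE/DELETE at execution time rather than scanning them all.
    CREATE TABLE meas (ts date, v int) PARTITION BY RANGE (ts);
    CREATE TABLE meas_2020 PARTITION OF meas FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
    CREATE TABLE meas_2021 PARTITION OF meas FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
    PREPARE upd(date) AS UPDATE meas SET v = 0 WHERE ts = $1;
    EXPLAIN (COSTS OFF) EXECUTE upd('2021-05-13');  -- only meas_2021 is scanned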



-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: alter subscription drop publication fixes

2021-05-13 Thread Bharath Rupireddy
On Thu, May 13, 2021 at 7:43 PM vignesh C  wrote:
> I have separated the Drop publication documentation contents. There
> are some duplicate contents but the readability is slightly better.
> Thoughts?

-ALTER SUBSCRIPTION name DROP PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
+ALTER SUBSCRIPTION name DROP PUBLICATION publication_name [, ...] [ WITH ( refresh [= value] ) ]

IMO, let's not list the "refresh" option directly here. If we don't
want to add a new list of operations "drop_publication_option", you
could just mention a note "Except for DROP PUBLICATION, the refresh
options as described under REFRESH PUBLICATION may be specified." or
"Additionally, refresh options as described under REFRESH PUBLICATION
may be specified, except for DROP PUBLICATION."
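
Whichever wording we settle on, the behavior being documented is (names
invented for the example):

    -- refresh=false skips the implicit refresh of the dropped publications:
    ALTER SUBSCRIPTION mysub DROP PUBLICATION mypub WITH (refresh = false);
    -- ...in which case it should be run separately afterwards:
    ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;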

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: compute_query_id and pg_stat_statements

2021-05-13 Thread Bruce Momjian
On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote:
> On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> I don't know what to say.  So here is a summary of the complaints that I'm
> aware of:
> 
> - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.ca...@moonset.ru
> That was only a couple of days after the commit just before the feature
> freeze, so it may be the less relevant complaint.
> 
> - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
> Did a git bisect to find the commit that changed the behavior and somehow
> didn't notice the new setting
> 
> - this thread, with Fujii-san saying:
> 
> > I'm afraid that users may easily forget to enable compute_query_id when
> > using pg_stat_statements (because this setting was not necessary in v13 or
> > before)
> 
> - this thread, with Peter E. saying:
> 
> > Now there is the additional burden of turning on this weird setting that
> > no one understands.  That's a 50% increase in burden.  And almost no one
> > will want to use a nondefault setting.  pg_stat_statements is pretty
> > popular.  I think leaving in this requirement will lead to widespread
> > confusion and complaints.
> 
> - this thread, with Pavel saying:
> 
> > Until now, the pg_stat_statements was zero-config. So the change is not
> > user friendly.
> 
> So it's a mix of "it's changing something that didn't change in a long time"
> and "it's adding an extra footgun and/or burden as it's not doing by default
> what the majority of users will want", with an overwhelming majority of
> people supporting the "we don't want that extra burden".

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid?   Also, how is modifying
shared_preload_libraries zero-config, but modifying
shared_preload_libraries and compute_query_id a huge burden?

I am personally not comfortable committing a patch to add an auto option
the way it is implemented, so another committer will need to take
ownership of this, or the entire feature can be removed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Enhanced error message to include hint messages for redundant options error

2021-05-13 Thread vignesh C
On Thu, May 13, 2021 at 4:58 AM Alvaro Herrera  wrote:
>
> You can avoid duplicating the ereport like this:
>
> +   ereport(ERROR,
> +   (errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("option \"%s\" specified more than once", defel->defname),
> +parser ? parser_errposition(pstate, defel->location) : 0));
>
> ... also, since e3a87b4991cc you can now elide the parens around the
> auxiliary function calls:
>

Modified.

> +ereport(ERROR,
> +errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("option \"%s\" specified more than once", defel->defname),
> +parser ? parser_errposition(pstate, defel->location) : 0));
>
> Please do add a pg_attribute_noreturn() decorator.  I'm not sure if any
> compilers will complain about the code flow if you have that, but I
> expect many (all?) will if you don't.

Modified.

Thanks for the comments. The attached patch has the changes.
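
As a quick illustration of the new wording (example is mine, mirroring the
copy2.out changes in the patch; assumes a table t exists):

    COPY t FROM stdin (FORMAT csv, FORMAT csv);
    -- before: ERROR:  conflicting or redundant options
    -- after:  ERROR:  option "format" specified more than once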

Regards,
Vignesh
From c2362f0c7c75675639cba8603b8fb7acd38c506d Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Mon, 26 Apr 2021 18:40:36 +0530
Subject: [PATCH v8] Enhance error message.

Enhanced error message, so that the user can easily identify the error.
---
 contrib/file_fdw/file_fdw.c |  13 +--
 src/backend/catalog/aclchk.c|  14 +--
 src/backend/commands/copy.c |  57 +++---
 src/backend/commands/dbcommands.c   |  76 +++--
 src/backend/commands/extension.c|  23 +---
 src/backend/commands/foreigncmds.c  |  21 ++--
 src/backend/commands/functioncmds.c |  59 --
 src/backend/commands/publicationcmds.c  |  30 ++---
 src/backend/commands/sequence.c |  48 ++--
 src/backend/commands/subscriptioncmds.c |  66 +--
 src/backend/commands/tablecmds.c|   4 +-
 src/backend/commands/typecmds.c |  34 ++
 src/backend/commands/user.c | 118 +---
 src/backend/parser/parse_utilcmd.c  |   4 +-
 src/backend/replication/pgoutput/pgoutput.c |  23 ++--
 src/backend/replication/walsender.c |  15 +--
 src/backend/tcop/utility.c  |  20 ++--
 src/include/commands/defrem.h   |  16 ++-
 src/include/commands/publicationcmds.h  |   4 +-
 src/include/commands/subscriptioncmds.h |   4 +-
 src/include/commands/typecmds.h |   2 +-
 src/include/commands/user.h |   2 +-
 src/test/regress/expected/copy2.out |  24 ++--
 src/test/regress/expected/foreign_data.out  |   8 +-
 src/test/regress/expected/publication.out   |   4 +-
 25 files changed, 235 insertions(+), 454 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2c2f149fb0..f49dd47930 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -200,6 +200,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	char	   *filename = NULL;
 	DefElem*force_not_null = NULL;
 	DefElem*force_null = NULL;
+	DefElem*def;
 	List	   *other_options = NIL;
 	ListCell   *cell;
 
@@ -209,7 +210,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	 */
 	foreach(cell, options_list)
 	{
-		DefElem*def = (DefElem *) lfirst(cell);
+		def = (DefElem *) lfirst(cell);
 
 		if (!is_valid_option(def->defname, catalog))
 		{
@@ -290,10 +291,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		else if (strcmp(def->defname, "force_not_null") == 0)
 		{
 			if (force_not_null)
-ereport(ERROR,
-		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("conflicting or redundant options"),
-		 errhint("Option \"force_not_null\" supplied more than once for a column.")));
+ReportDuplicateOptionError(def, NULL);
 			force_not_null = def;
 			/* Don't care what the value is, as long as it's a legal boolean */
 			(void) defGetBoolean(def);
@@ -302,10 +300,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		else if (strcmp(def->defname, "force_null") == 0)
 		{
 			if (force_null)
-ereport(ERROR,
-		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("conflicting or redundant options"),
-		 errhint("Option \"force_null\" supplied more than once for a column.")));
+ReportDuplicateOptionError(def, NULL);
 			force_null = def;
 			(void) defGetBoolean(def);
 		}
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 53392414f1..264cdc2730 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -59,6 +59,7 @@
 #include "catalog/pg_ts_template.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
+#include "commands/defrem.h"
 #include "commands/event_trigger.h"
 #include "commands/extension.h"
 #include "commands/proclang.h"
@@ -910,30 +911,25 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s
 	List	   *nspnames = NIL;
 	DefElem*drolespecs = NULL;
 	DefElem*dnspnames = NULL;
+	

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-13 Thread Greg Nancarrow
On Thu, May 13, 2021 at 11:25 AM Pengchengliu  wrote:
>

> Hi Andres,
>  Thanks for your reply.

Er, it's Greg who has replied so far (not Andres).

>
>   And if you still cannot reproduce it in 2 minutes, could you run pgbench
> for a longer time, such as 30 or 60 minutes?
>

Actually, I did run it, multiple times, for more than 60 minutes, but
no assert/crash/coredump occurred in my environment.

>
>   The parallel work process:
>
>   1, Get a snapshot and set TransactionXmin itself, in
> ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.
>
>   2, According to PARALLEL_KEY_TRANSACTION_SNAPSHOT (xmin 799425, xmax 82229)
> from the main process, set TransactionXmin to 799425 in
> ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.
>
>   3,
> ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel
> gets the active snapshot (xmin 799162, xmax 82206) from the main process, and
> sets this snapshot in scan->rs_base.rs_snapshot.
>
>   4, The parallel scan begins with the active snapshot (xmin 799162, xmax
> 82206) and TransactionXmin (799425); when it scans a tuple (xmin 799225), the
> SubTransGetTopmostTransaction assert fires, in
> HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.
>

I added some logging at a couple of points in the code:
1) In the Worker process code - ParallelWorkerMain() - where it
restores the serialized transaction and active snapshots (i.e. passed
to the Worker from the main process).
2) In the HeapTupleSatisfiesMVCC() function, immediately before it
calls XidInMVCCSnapshot()

After running it for an hour, examination of the log showed that in
ALL cases, the transaction snapshot xmin,xmax was always THE SAME as
the active snapshot xmin,xmax.
(Can you verify that this occurs on your system when things are
working, prior to the coredump?)

This is different to what you are getting in your environment (at
least, different to what you described when the problem occurs).
In your case, you say that the main process gets "the newer
transaction snapshot" - where exactly is this happening in your case?
(or this is what you don't yet know?)
Perhaps very occasionally this somehow happens on your system and
triggers the Assert (and coredump)?  I have not been able to reproduce
that on my system.

Have you reproduced this issue on any other system, using the same
steps as you provided?
I'm wondering if there might be something else in your environment
that may be influencing this problem.
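
In case it helps further reproduction attempts, forcing parallel sequential
scans as often as possible should widen the window (illustrative settings):

    SET parallel_setup_cost = 0;
    SET parallel_tuple_cost = 0;
    SET min_parallel_table_scan_size = 0;
    SET max_parallel_workers_per_gather = 4;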


Regards,
Greg Nancarrow
Fujitsu Australia




Re: alter subscription drop publication fixes

2021-05-13 Thread vignesh C
On Wed, May 12, 2021 at 10:15 PM Bharath Rupireddy
 wrote:
>
> On Wed, May 12, 2021 at 9:55 PM vignesh C  wrote:
> > While I was reviewing one of the logical decoding features, I found a
> > few issues in alter subscription drop publication.
>
> Thanks!
>
> > Alter subscription drop publication does not support the copy_data
> > option, so that needs to be removed from tab completion.
>
> +1. You may want to also change set_publication_option (to something
> like drop_publication_option with only the refresh option) for the drop
> in the docs? Because "Additionally, refresh options as described under
> REFRESH PUBLICATION may be specified." doesn't make sense.
>
> > Dropping all the publications present in the subscription using alter
> > subscription drop publication would throw "subscription must contain
> > at least one publication". This message was slightly confusing to me,
> > as even though some publication was present on the subscription I was
> > not able to drop it. Instead I feel we could throw an error message
> > something like "dropping specified publication will result in
> > subscription without any publication, this is not supported".
>
> -1 for that long message. The intention of that error was to throw an
> error if all the publications of a subscription are dropped. If that's
> so confusing, then you could just let the error message be
> "subscription must contain at least one publication", add an error
> detail "Subscription without any publication is not allowed to
> exist/is not supported." or "Removing/Dropping all the publications
> from a subscription is not allowed/supported." or some other better
> wording.
>

Modified the error message to "errmsg("cannot drop all the
publications of the subscriber \"%s\"", subname)".
I have separated the Drop publication documentation contents. There
are some duplicate contents but the readability is slightly better.
Thoughts?

> > merge_publications can be called after validation of the options
> > specified, I think we should check if the options specified are
> > correct or not before checking the actual publications.
>
> +1. That was a miss in the original feature.

The attached patch has the changes.

Regards,
Vignesh
From b73e99bc2001c23f64fb3b4f220e1a4439614dd6 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Mon, 10 May 2021 12:15:28 +0530
Subject: [PATCH v2] Fixes in alter subscription drop publication.

Fixes in alter subscription drop publication.
---
 doc/src/sgml/ref/alter_subscription.sgml   | 37 ++
 src/backend/commands/subscriptioncmds.c|  6 ++--
 src/bin/psql/tab-complete.c|  9 --
 src/test/regress/expected/subscription.out |  2 +-
 4 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 367ac814f4..2bf836ed5a 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -24,7 +24,7 @@ PostgreSQL documentation
 ALTER SUBSCRIPTION name CONNECTION 'conninfo'
 ALTER SUBSCRIPTION name SET PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
 ALTER SUBSCRIPTION name ADD PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
-ALTER SUBSCRIPTION name DROP PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
+ALTER SUBSCRIPTION name DROP PUBLICATION publication_name [, ...] [ WITH ( refresh [= value] ) ]
 ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( refresh_option [= value] [, ... ] ) ]
 ALTER SUBSCRIPTION name ENABLE
 ALTER SUBSCRIPTION name DISABLE
@@ -94,21 +94,46 @@ ALTER SUBSCRIPTION name RENAME TO <
 

 
+   
+DROP PUBLICATION publication_name
+
+ 
+  Removes the specified publications from the list of publications.  See
+   for more information.  By
+  default the dropped publications are refreshed.
+ 
+
+ 
+  The following option is supported:
+
+  
+   
+refresh (boolean)
+
+ 
+  When false, the command will not try to refresh table information.
+  REFRESH PUBLICATION should then be executed separately.
+  The default is true.
+ 
+
+   
+  
+ 
+
+   
+

 SET PUBLICATION publication_name
 ADD PUBLICATION publication_name
-DROP PUBLICATION publication_name
 
  
   Changes the list of subscribed publications.  SET
   replaces the entire list of publications with a new list,
-  ADD adds additional publications,
-  DROP removes publications from the list of
+  ADD adds additional publications from the list of
   publications.  See  for more
   information.  By default, this command will also act like
   REFRESH PUBLICATION, except that in case of
-  ADD or DROP, only the added or
-  dropped publications are refreshed.
+  ADD, only the added 
