Re: Remove a redundant condition check

2020-06-26 Thread Amit Kapila
On Fri, Jun 26, 2020 at 3:32 PM Michael Paquier  wrote:
>
> On Fri, Jun 26, 2020 at 02:39:22PM +0530, Amit Kapila wrote:
> > It seems we forgot to remove the additional check for switchedTLI
> > while adding a new check.  I think we can remove this duplicate check
> > in the HEAD code.  I am not sure if it is worth to backpatch such a
> > change.
>
> Yes, there is no point to keep this check so let's clean up this
> code.  I also see no need to do a backpatch here, this is purely
> cosmetic.
>

Thanks for the confirmation, pushed!

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: xid wraparound danger due to INDEX_CLEANUP false

2020-06-26 Thread Masahiko Sawada
On Sat, 27 Jun 2020 at 08:00, Peter Geoghegan  wrote:
>
> On Fri, Jun 26, 2020 at 5:39 AM Robert Haas  wrote:
> > My opinion is that there's no need to change the code in the
> > back-branches, and that I don't really like the approach in master
> > either.
>
> I guess it's hard to see a way that we could fix this in the
> backbranches, provided we aren't willing to tolerate a big refactor,
> or a cleanup scan of the index (note that I mean btvacuumcleanup(),
> not btvacuumscan(), which is quite different).

Agreed.

>
> > I think what we're saying is that there is no worse consequence to
> > turning off index_cleanup than some bloat that isn't likely to be
> > recovered unless you REINDEX.
>
> That's true.

Regarding the extent of the impact, this bug will affect users who have
kept vacuum_index_cleanup off, or who have repeatedly run manual VACUUM
with INDEX_CLEANUP off, for a long time after some vacuums had already
run. On the other hand, users who use INDEX_CLEANUP off only on the
spot, or who turn vacuum_index_cleanup off for the table from the start,
would be less affected or not affected at all.
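
For clarity, these are the two usage patterns I am contrasting (a rough
sketch; the table name is made up):

    -- one-off manual vacuum that skips index cleanup
    VACUUM (INDEX_CLEANUP FALSE) some_table;

    -- storage parameter, applies to every (auto)vacuum of the table
    ALTER TABLE some_table SET (vacuum_index_cleanup = off);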

>
> > In retrospect, I regret committing this patch without better
> > understanding the issues in this area. That was a fail on my part. At
> > the same time, it doesn't really sound like the issues are all that
> > bad. The potential index bloat does suck, but it can still suck less
> > than the alternatives, and we have evidence that for at least one
> > user, it was worth a major version upgrade just to replace the
> > suckitude they had with the suckitude this patch creates.
>
> I actually agree -- this is a really important feature, and I'm glad
> that we have it. Even in this slightly flawed form. I remember a great
> need for the feature back when I was involved in supporting Postgres
> in production.

I apologize for writing this patch without enough consideration. I
should have been more careful, as I had learned about the nbtree page
recycling strategy when discussing the vacuum_cleanup_index_scale_factor patch.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Remove Extra palloc Of raw_buf For Binary Format In COPY FROM

2020-06-26 Thread Bharath Rupireddy
Thanks Rushabh and Vignesh for the comments.

>
> One comment:
> We could change below code:
> + */
> + if (!cstate->binary)
> + cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
> + else
> + cstate->raw_buf = NULL;
> to:
> cstate->raw_buf = (cstate->binary) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
>

Attached the patch with the above changes.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v3-0001-Remove-Extra-palloc-Of-raw_buf-For-Binary-Format-.patch
Description: Binary data


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-06-26 Thread David G. Johnston
On Sun, Jun 21, 2020 at 10:56 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> When a query on foreign table is executed from a local session using
> postgres_fdw, as expected the local postgres backend opens a
> connection which causes a remote session/backend to be opened on the
> remote postgres server for query execution.
>
> [...]


> I propose to have a new session level GUC called
> "enable_connectioncache"(name can be changed if it doesn't correctly
> mean the purpose) with the default value being true which means that
> all the remote connections are cached. If set to false, the
> connections are not cached and so are remote sessions closed by the local
> backend/session at
> the end of each remote transaction.
>
> [...]

> Thoughts?
>
> Test Case:
> without patch:
> 1. Run the query on foreign table
> 2. Look for the backend/session opened on the remote postgres server, it
> exists till the local session remains active.
>
> with patch:
> 1. SET enable_connectioncache TO false;
> 2. Run the query on the foreign table
> 3. Look for the backend/session opened on the remote postgres server, it
> should not exist.
>

If this is just going to apply to postgres_fdw, why not have that
module provide a function "disconnect_open_sessions()" or the like that
does this upon user command?  I suppose there would be some potential value
in having this be settable per-user, but that wouldn't preclude the usefulness
of a function.  And by having a function, the usefulness of the GUC seems
reduced.  On a related note, is there any entanglement here with the
supplied dblink and/or dblink_fdw [1] modules?  They provide connect
and disconnect functions and also leverage postgres_fdw (or dblink_fdw if
specified, which brings us back to the question of whether this option
should be respected by that FDW).
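
To sketch what I mean (the function name is purely illustrative, it does
not exist today):

    -- hypothetical one-shot cleanup, run on user command
    SELECT postgres_fdw_disconnect_open_sessions();

    -- versus the proposed session-level GUC
    SET enable_connectioncache TO false;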

Otherwise, I would imagine that having multiple queries execute before
wanting to drop the connection would be desirable so at minimum a test case
that does something like:

SELECT count(*) FROM remote.tbl1;
-- connection still open
SET enable_connectioncache TO false;
SELECT count(*) FROM remote.tbl2;
-- now it was closed

Or maybe even better, have the close action happen on a transaction
boundary.

And if it doesn't just apply to postgres_fdw (or at least doesn't have to)
then the description text should be less specific.

David J.

[1] The only place I see "dblink_fdw" in the documentation is in the dblink
module's dblink_connect page.  I would probably modify that page to say:
"It is recommended to use the foreign-data wrapper dblink_fdw (installed by
this module) when defining the foreign server." (adding the parenthetical).


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-06-26 Thread vignesh C
On Mon, Jun 22, 2020 at 11:26 AM Bharath Rupireddy
 wrote:
> Attached the initial patch(based on commit
> 9550ea3027aa4f290c998afd8836a927df40b09d), test setup.
>

make check is failing:

sysviews.out	2020-06-27 07:22:32.162146320 +0530
@@ -73,6 +73,7 @@
             name            | setting
 ----------------------------+---------
  enable_bitmapscan          | on
+ enable_connectioncache     | on

one of the test expect files needs to be updated.
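
For reference, that expected output comes from a query in sysviews.sql
along these lines (quoting from memory, so double-check the file):

    SELECT name, setting
      FROM pg_settings
     WHERE name LIKE 'enable%';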

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-06-26 Thread vignesh C
On Mon, Jun 22, 2020 at 11:26 AM Bharath Rupireddy
 wrote:
>
> Hi,
>
> When a query on foreign table is executed from a local session using
> postgres_fdw, as expected the local postgres backend opens a
> connection which causes a remote session/backend to be opened on the
> remote postgres server for query execution.
>
> One observation is that, even after the query is finished, the remote
> session/backend still persists on the remote postgres server. Upon
> researching, I found that there is a concept of Connection Caching for
> the remote connections made using postgres_fdw. Local backend/session
> can cache up to 8 different connections per backend. This caching is
> useful as it avoids the cost of reestablishing new connections per
> foreign query.
>
> However, at times, there may be situations where the long lasting
> local sessions may execute very few foreign queries and remaining all
> are local queries, in this scenario, the remote sessions opened by the
> local sessions/backends may not be useful as they remain idle and eat
> up the remote server connections capacity. This problem gets even
> worse(though this use case is a bit imaginary) if all of
> max_connections(default 100 and each backend caching 8 remote
> connections) local sessions open remote sessions and they are cached
> in the local backend.
>
> I propose to have a new session level GUC called
> "enable_connectioncache"(name can be changed if it doesn't correctly
> mean the purpose) with the default value being true which means that
> all the remote connections are cached. If set to false, the
> connections are not cached and so are remote sessions closed by the local 
> backend/session at
> the end of each remote transaction.
>
> Attached the initial patch(based on commit
> 9550ea3027aa4f290c998afd8836a927df40b09d), test setup.

Few comments:

 #backend_flush_after = 0   # measured in pages, 0 disables
-
+#enable_connectioncache = on
This GUC could be placed in the CONNECTIONS AND AUTHENTICATION section.

+
+   /* see if the cache was for postgres_fdw connections and
+  user chose to disable connection caching*/
+   if ((strcmp(hashp->tabname,"postgres_fdw connections") == 0) &&
+   !enable_connectioncache)

Should be changed to postgres style commenting like:
/*
 * See if the cache was for postgres_fdw connections and
 * user chose to disable connection caching.
 */

+   /* if true, fdw connections in a session are cached, else
+  discarded at the end of every remote transaction.
+   */
+   boolenableconncache;
Should be changed to postgres style commenting.

+/* parameter for enabling fdw connection hashing */
+bool   enable_connectioncache = true;
+

Should this be connection caching?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Remove Extra palloc Of raw_buf For Binary Format In COPY FROM

2020-06-26 Thread vignesh C
On Fri, Jun 26, 2020 at 6:15 PM Rushabh Lathia  wrote:
>
>
>
> On Fri, Jun 26, 2020 at 3:16 PM Bharath Rupireddy 
>  wrote:
>>
>> Hi Hackers,
>>
>> There seems to be an extra palloc of 64KB of raw_buf for binary format
>> files which is not required
>> as copy logic for binary files don't use raw_buf, instead, attribute_buf
>> is used in CopyReadBinaryAttribute.
>
>
> +1
>
> I looked at the patch and the changes looked good. Couple of comments;
>
> 1)
>
> +
> + /* For binary files raw_buf is not used,
> + * instead, attribute_buf is used in
> + * CopyReadBinaryAttribute. Hence, don't palloc
> + * raw_buf.
> + */
>
> Not a PG style of commenting.
>
> 2)  In non-binary mode, should assign NULL the raw_buf.
>
> Attaching patch with those changes.
>

+1 for the patch.

One comment:
We could change below code:
+ */
+ if (!cstate->binary)
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ else
+ cstate->raw_buf = NULL;
to:
cstate->raw_buf = (cstate->binary) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Added tab completion for the missing options in copy statement

2020-06-26 Thread vignesh C
Hi,

I found that tab completion for some parts of the COPY statement was
missing, specifically for the following cases:
1) COPY [BINARY]  FROM filename -> "BINARY", "DELIMITER", "NULL",
"CSV", "ENCODING", "WITH (", "WHERE" should be shown.
2) COPY [BINARY]  TO filename -> "BINARY", "DELIMITER", "NULL",
"CSV", "ENCODING", "WITH (" should be shown.
3) COPY [BINARY]  FROM filename WITH options -> "WHERE" should be shown.

I could not find any test cases for tab completion, hence no tests
were added. Attached a patch which has the fix for the same.
Thoughts?
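
To illustrate the positions involved, the new completions target
statements of roughly this shape (table, file and option names are just
examples):

    COPY mytable FROM '/tmp/data.csv'
        WITH (FORMAT csv, HEADER true)
        WHERE id > 0;

    COPY BINARY mytable TO '/tmp/data.bin';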

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From f7eb10b548f43816cc578bae1d60bb9a739f989a Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Thu, 25 Jun 2020 06:12:32 +0530
Subject: [PATCH] Tab completion for copy statement.

Tab completion for suggesting WITH, OPTIONS & WHERE was missing, this patch has
the fix to suggest the same. I did not see any tests for tab completion, hence
no tests were added.
---
 src/bin/psql/tab-complete.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index eb01885..0ffe31a 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2340,11 +2340,30 @@ psql_completion(const char *text, int start, int end)
 		completion_force_quote = false;
 		matches = rl_completion_matches(text, complete_from_files);
 	}
-	/* Offer options after COPY [BINARY]  FROM|TO filename */
-	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny) ||
-			 Matches("COPY", "BINARY", MatchAny, "FROM|TO", MatchAny))
+
+	/* Offer options after COPY [BINARY]  FROM filename */
+	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny) ||
+			 Matches("COPY", "BINARY", MatchAny, "FROM", MatchAny))
+		COMPLETE_WITH("BINARY", "DELIMITER", "NULL", "CSV",
+	  "ENCODING", "WITH (", "WHERE");
+
+	/* Offer options after COPY [BINARY]  TO filename */
+	else if (Matches("COPY|\\copy", MatchAny, "TO", MatchAny) ||
+			 Matches("COPY", "BINARY", MatchAny, "TO", MatchAny))
 		COMPLETE_WITH("BINARY", "DELIMITER", "NULL", "CSV",
-	  "ENCODING");
+	  "ENCODING", "WITH (");
+
+	/* Offer options after COPY [BINARY]  FROM|TO filename WITH ( */
+	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(") ||
+			 Matches("COPY", "BINARY", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
+		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
+	  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+	  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+
+	/* Offer options after COPY [BINARY]  FROM filename WITH options */
+	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny) ||
+			 Matches("COPY", "BINARY", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
+		COMPLETE_WITH("WHERE");
 
 	/* Offer options after COPY [BINARY]  FROM|TO filename CSV */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "CSV") ||
-- 
1.8.3.1



Re: Review for GetWALAvailability()

2020-06-26 Thread Alvaro Herrera
On 2020-Jun-25, Kyotaro Horiguchi wrote:

> It is error-prone restriction, as discussed before.
> 
> Without changing updator-side, invalid restart_lsn AND valid
> invalidated_at can be regarded as the lost state. With the following
> change suggested by Fujii-san we can avoid the confusing status.
> 
> With attached first patch on top of the slot-dirtify fix below, we get
> "lost" for invalidated slots after restart.

Makes sense.  I pushed this change, thanks.
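
(For anyone following along at home, the state in question is what now
shows up as wal_status = 'lost' in the view, e.g.:)

    SELECT slot_name, restart_lsn, wal_status
      FROM pg_replication_slots;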


> The confusing status can be avoided without fixing it, but I prefer to
> fix it.  As Fujii-san suggested upthread, couldn't we remember
> lastRemovedSegNo in the contorl file? (Yeah, it cuases a bump of
> PG_CONTROL_VERSION and CATALOG_VERSION_NO?).

I think that's a pg14 change.  Feel free to submit a patch to the
commitfest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Peter Geoghegan
On Fri, Jun 26, 2020 at 4:59 PM Tomas Vondra
 wrote:
> I agree larger work_mem for hashagg (and thus less spilling) may mean
> lower work_mem for so some other nodes that are less sensitive to this.
> But I think this needs to be formulated as a cost-based decision,
> although I don't know how to do that for the reasons I explained before
> (bottom-up plan construction vs. distributing the memory budget).

Why do you think that it needs to be formulated as a cost-based
decision? That's probably true of a scheme that allocates memory very
intelligently, but what about an approach that's slightly better than
work_mem?

What problems do you foresee (if any) with adding a hash_mem GUC that
gets used for both planning and execution for hash aggregate and hash
join nodes, in about the same way as work_mem is now?

> FWIW some databases already do something like this - SQL Server has
> something called "memory grant" which I think mostly does what you
> described here.

Same is true of Oracle. But Oracle also has simple work_mem-like
settings for sorting and hashing. People don't really use them
anymore, but apparently it was once common for the DBA to explicitly
give over more memory to hashing -- much like the hash_mem setting I
asked about. IIRC the same is true of DB2.

> The difference between sort and hashagg spills is that for sorts there
> is no behavior change. Plans that did (not) spill in v12 will behave the
> same way on v13, modulo some random perturbation. For hashagg that's not
> the case - some queries that did not spill before will spill now.
>
> So even if the hashagg spills are roughly equal to sort spills, both are
> significantly more expensive than not spilling.

Just to make sure we're on the same page: both are significantly more
expensive than a hash aggregate not spilling *specifically*. OTOH, a
group aggregate may not be much slower when it spills compared to an
in-memory sort group aggregate. It may even be noticeably faster, due
to caching effects, as you mentioned at one point upthread.

This is the property that makes hash aggregate special, and justifies
giving it more memory than other nodes on a system-wide basis (the
same thing applies to hash join). This could even work as a multiplier
of work_mem.
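
To make that concrete, I'm imagining knobs in the spirit of (names
purely illustrative of the proposal, not settings you can rely on):

    -- absolute form
    SET hash_mem = '256MB';

    -- or the multiplier form, scaling the limit for hash-based nodes
    SET work_mem = '64MB';
    SET hash_mem_multiplier = 2.0;   -- hash nodes could use up to 128MB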

-- 
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Bruce Momjian
On Sat, Jun 27, 2020 at 01:58:50AM +0200, Tomas Vondra wrote:
> > Since work_mem affect the optimizer choices, I can imagine it getting
> > complex since nodes would have to ask the global work_mem allocator how
> > much memory it _might_ get, but then ask for final work_mem during
> > execution, and they might differ.  Still, our spill costs are so high
> > for so many node types, that reducing spills seems like it would be a
> > win, even if it sometimes causes poorer plans.
> > 
> 
> I may not understand what you mean by "poorer plans" here, but I find it
> hard to accept that reducing spills is generally worth poorer plans.

We might cost a plan based on a work_mem that the global allocator
thinks it will give us when we are in the executor, but that might
change by the time we are actually in the executor.  We could code it so
that an optimizer request is always honored in the executor, but prepared
plans would be a problem, or perhaps already are if you prepare a plan and
change work_mem before EXECUTE.
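
That last case is already easy to hit today, e.g. (table and column
names made up):

    SET work_mem = '256MB';
    PREPARE q AS SELECT col, count(*) FROM t GROUP BY col;
    SET work_mem = '4MB';
    EXECUTE q;  -- may execute a plan whose costing assumed the old work_mem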

> I agree larger work_mem for hashagg (and thus less spilling) may mean
> lower work_mem for so some other nodes that are less sensitive to this.
> But I think this needs to be formulated as a cost-based decision,
> although I don't know how to do that for the reasons I explained before
> (bottom-up plan construction vs. distributing the memory budget).
> 
> FWIW some databases already do something like this - SQL Server has
> something called "memory grant" which I think mostly does what you
> described here.

Yep, something like that.

> > > > Also, doesn't this blog entry also show that spiling to disk for ORDER
> > > > BY is similarly slow compared to hash aggs?
> > > >
> > > > https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012
> > > 
> > > The post does not mention hashagg at all, so I'm not sure how could it
> > > show that? But I think you're right the spilling itself is not that far
> > > away, in most cases (thanks to the recent fixes made by Jeff).
> > 
> > Yeah, I was just measuring ORDER BY spill, but it seems to be a similar
> > overhead to hashagg spill, which is being singled out in this discussion
> > as particularly expensive, and I am questioning that.
> > 
> 
> The difference between sort and hashagg spills is that for sorts there
> is no behavior change. Plans that did (not) spill in v12 will behave the
> same way on v13, modulo some random perturbation. For hashagg that's not
> the case - some queries that did not spill before will spill now.

Well, my point is that we already had ORDER BY problems, and if hash agg
now has them too in PG 13, I am fine with that.  We don't guarantee no
problems in major versions.  If we want to add a general knob that says,
"Hey allow this node to exceed work_mem by X%," I don't see the point
--- just increase work_mem, or have different work_mem settings for
different node types, as I outlined previously.

> So even if the hashagg spills are roughly equal to sort spills, both are
> significantly more expensive than not spilling.

Yes, but that means we need a more general fix and worrying about hash
agg is not addressing the core issue.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Tomas Vondra

On Fri, Jun 26, 2020 at 07:00:20PM -0400, Bruce Momjian wrote:
> On Fri, Jun 26, 2020 at 07:45:13PM +0200, Tomas Vondra wrote:
> > On Fri, Jun 26, 2020 at 12:37:26PM -0400, Bruce Momjian wrote:
> > > I was thinking more of being able to allocate a single value to be
> > > shared by all active sesions.
> >
> > Not sure I understand. What "single value" do you mean?
>
> I was thinking of a full-cluster work_mem maximum allocation that could
> be given to various backends that request it.
>
> Imagine we set the cluster-wide total of work_mem to 1GB.  If a session
> asks for 100MB, if there are no other active sessions, it can grant the
> entire 100MB.  If there are other sessions running, and 500MB has
> already been allocated, maybe it is only given an active per-node
> work_mem of 50MB.  As the amount of unallocated cluster-wide work_mem
> gets smaller, requests are granted smaller actual allocations.
>
> What we do now makes little sense, because we might have lots of free
> memory, but we force nodes to spill to disk when they exceed a fixed
> work_mem.  I realize this is very imprecise, because you don't know what
> future work_mem requests are coming, or how long until existing
> allocations are freed, but it seems it would have to be better than what
> we do now.

> > Wasn't the idea was to replace work_mem with something like query_mem?
> > That'd be nice, but I think it's inherently circular - we don't know how
> > to distribute this to different nodes until we know which nodes will
> > need a buffer, but the buffer size is important for costing (so we need
> > it when constructing the paths).
> >
> > Plus then there's the question whether all nodes should get the same
> > fraction, or less sensitive nodes should get smaller chunks, etc.
> > Ultimately this would be based on costing too, I think, but it makes it
> > soe much complex ...
>
> Since work_mem affect the optimizer choices, I can imagine it getting
> complex since nodes would have to ask the global work_mem allocator how
> much memory it _might_ get, but then ask for final work_mem during
> execution, and they might differ.  Still, our spill costs are so high
> for so many node types, that reducing spills seems like it would be a
> win, even if it sometimes causes poorer plans.

I may not understand what you mean by "poorer plans" here, but I find it
hard to accept that reducing spills is generally worth poorer plans.

I agree larger work_mem for hashagg (and thus less spilling) may mean
lower work_mem for some other nodes that are less sensitive to this.
But I think this needs to be formulated as a cost-based decision,
although I don't know how to do that for the reasons I explained before
(bottom-up plan construction vs. distributing the memory budget).

FWIW some databases already do something like this - SQL Server has
something called "memory grant" which I think mostly does what you
described here.

> > > Also, doesn't this blog entry also show that spiling to disk for ORDER
> > > BY is similarly slow compared to hash aggs?
> > >
> > > https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012
> >
> > The post does not mention hashagg at all, so I'm not sure how could it
> > show that? But I think you're right the spilling itself is not that far
> > away, in most cases (thanks to the recent fixes made by Jeff).
>
> Yeah, I was just measuring ORDER BY spill, but it seems to be a similar
> overhead to hashagg spill, which is being singled out in this discussion
> as particularly expensive, and I am questioning that.

The difference between sort and hashagg spills is that for sorts there
is no behavior change. Plans that did (not) spill in v12 will behave the
same way on v13, modulo some random perturbation. For hashagg that's not
the case - some queries that did not spill before will spill now.

So even if the hashagg spills are roughly equal to sort spills, both are
significantly more expensive than not spilling.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 01:53:05PM -0700, Peter Geoghegan wrote:
> On Thu, Jun 25, 2020 at 1:36 PM Andres Freund  wrote:
> > 12  28164.865 ms
> >
> > fast ssd:
> > HEAD        92520.680 ms
> >
> > magnetic:
> > HEAD        183968.538 ms
> >
> > (no reads, there's plenty enough memory. Just writes because the age /
> > amount thresholds for dirty data are reached)
> >
> > In the magnetic case we're IO bottlenecked nearly the whole time.
> 
> I agree with almost everything you've said on this thread, but at the
> same time I question the emphasis on I/O here. You've shown that
> spinning rust is about twice as slow as a fast SSD here. Fair enough,
> but to me the real story is that spilling is clearly a lot slower in
> general, regardless of how fast the storage subsystem happens to be (I
> wonder how fast it is with a ramdisk). To me, it makes more sense to

This blog entry shows ORDER BY using ram disk, SSD, and magnetic:

https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012

It is from 2012, but I can re-run the test if you want.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Peter Geoghegan
On Fri, Jun 26, 2020 at 4:00 PM Bruce Momjian  wrote:
> Imagine we set the cluster-wide total of work_mem to 1GB.  If a session
> asks for 100MB, if there are no other active sessions, it can grant the
> entire 100MB.  If there are other sessions running, and 500MB has
> already been allocated, maybe it is only given an active per-node
> work_mem of 50MB.  As the amount of unallocated cluster-wide work_mem
> gets smaller, requests are granted smaller actual allocations.

I think that that's the right approach long term. But right now the
DBA has no way to give hash-based nodes more memory, even though it's
clear that that's where it's truly needed in most cases, across almost
workloads. I think that that's the really glaring problem.

This is just the intrinsic nature of hash-based aggregation and hash
join vs sort-based aggregation and merge join (roughly speaking). It's
much more valuable to be able to do hash-based aggregation in one
pass, especially in cases where hashing already did particularly well
in Postgres v12.

> What we do now makes little sense, because we might have lots of free
> memory, but we force nodes to spill to disk when they exceed a fixed
> work_mem.  I realize this is very imprecise, because you don't know what
> future work_mem requests are coming, or how long until existing
> allocations are freed, but it seems it would have to be better than what
> we do now.

Postgres 13 made hash aggregate respect work_mem. Perhaps it would
have made more sense to teach work_mem to respect hash aggregate,
though.

Hash aggregate cannot consume an unbounded amount of memory in v13,
since the old behavior was clearly unreasonable. Which is great. But
it may be even more unreasonable to force users to conservatively set
the limit on the size of the hash table in an artificial, generic way.

> Since work_mem affect the optimizer choices, I can imagine it getting
> complex since nodes would have to ask the global work_mem allocator how
> much memory it _might_ get, but then ask for final work_mem during
> execution, and they might differ.  Still, our spill costs are so high
> for so many node types, that reducing spills seems like it would be a
> win, even if it sometimes causes poorer plans.

I don't think it's really about the spill costs, at least in one
important sense. If performing a hash aggregate in memory uses twice
as much memory as spilling (with either sorting or hashing), but the
operation completes in one third the time, you have actually saved
memory in the aggregate (no pun intended). Also, the query is 3x
faster, which is a nice bonus! I don't think that this kind of
scenario is rare.

-- 
Peter Geoghegan




Re: Possible NULL dereferencing (src/backend/tcop/pquery.c)

2020-06-26 Thread Ranier Vilela
Em sex., 26 de jun. de 2020 às 18:24, Tom Lane  escreveu:

> Ranier Vilela  writes:
> > 1.Assertion check
> > /* Caller messed up if we have neither a ready query nor held data. */
> > Assert(queryDesc || portal->holdStore);
>
> > But in release, if QueryDesc is NULL and portal->holdStore is NULL too,
> > when Call PushActiveSnapshot  *deference* NULL check can happen.
>
> > 2. if (portal->atEnd || count <= 0) is True
> > No need to recheck count against FETCH_ALL.
>
> > Is it worth correcting them?
>
> No.
>
> The assertion already says that that's a case that cannot happen.
> Or to look at it another way: if the case were to occur in a devel
> build, you'd get a core dump at the assertion.  If the case were
> to occur in a production build, you'd get a core dump at the
> dereference.  Not much difference.  Either way, it's a *caller*
> bug, because the caller is supposed to make sure this cannot happen.
> If we thought that it could possibly happen, we would use an ereport
> but not an assertion; having both for the same condition is quite
> misguided.
>
OK, that's the job of the assertion.
But I still worry that, in some rare cases, portal->holdStore might be
corrupted in some way and the function still gets called, causing a
segmentation fault.

>
> (If Coverity is whining about this for you, there's something wrong
> with your Coverity settings.  In the project's instance, Coverity
> accepts assertions as assertions.)
>
Probably, because it reports this:
CID 10127 (#2 of 2): Dereference after null check (FORWARD_NULL)
8. var_deref_op: Dereferencing null pointer queryDesc.

>
> I'm unimpressed with the other proposed change too; it's making the logic
> more complicated and fragile for a completely negligible "performance
> gain".  Moreover the compiler could probably make the same optimization.
>
Ok.

Anyway, thank you for responding; your observations are always valuable
and help me learn "the Postgres way" of developing.
It's not easy.

best regards,
Ranier Vilela


Re: xid wraparound danger due to INDEX_CLEANUP false

2020-06-26 Thread Peter Geoghegan
On Fri, Jun 26, 2020 at 5:39 AM Robert Haas  wrote:
> My opinion is that there's no need to change the code in the
> back-branches, and that I don't really like the approach in master
> either.

I guess it's hard to see a way that we could fix this in the
backbranches, provided we aren't willing to tolerate a big refactor,
or a cleanup scan of the index (note that I mean btvacuumcleanup(),
not btvacuumscan(), which is quite different).

> I think what we're saying is that there is no worse consequence to
> turning off index_cleanup than some bloat that isn't likely to be
> recovered unless you REINDEX.

That's true.

> Now, what about master? I think it's fine to offer the AM a callback
> even when index_cleanup = false, for example so that it can freeze
> something in its metapage, but I don't agree with passing it the TIDs.
> That seems like it's just inviting it to ignore the emergency brake,
> and it's also incurring real overhead, because remembering all those
> TIDs can use a lot of memory.

You don't have to do anything with TIDs passed from vacuumlazy.c to
recycle pages that need to be recycled, since you only have to go
through btvacuumcleanup() to avoid the problem that we're talking
about (you don't have to call btvacuumscan() to kill TIDs that
vacuumlazy.c will have pruned). Killing TIDs/tuples in the index was
never something that would make sense, even within the confines of the
existing flawed nbtree recycling design. However, you do need to scan
the entire index to do that much. FWIW, that doesn't seem like it
*totally* violates the spirit of "index_cleanup = false", since you're
still not doing most of the usual nbtree vacuuming stuff (even though
you have to scan the index, there is still much less work total).

> If that API limitation causes a problem
> for some future index AM, that will be a good point to discuss when
> the patch for said AM is submitted for review. I entirely agree with
> you that the way btree arranges for btree recycling is crude, and I
> would be delighted if you want to improve it, either for v14 or for
> any future release, or if somebody else wants to do so. However, even
> if that never happens, so what?

I think that it's important to be able to describe an ideal (though
still realistic) design, even if it might remain aspirational for a
long time. I suspect that pushing the mechanism down into index AMs
has other non-obvious benefits.

> In retrospect, I regret committing this patch without better
> understanding the issues in this area. That was a fail on my part. At
> the same time, it doesn't really sound like the issues are all that
> bad. The potential index bloat does suck, but it can still suck less
> than the alternatives, and we have evidence that for at least one
> user, it was worth a major version upgrade just to replace the
> suckitude they had with the suckitude this patch creates.

I actually agree -- this is a really important feature, and I'm glad
that we have it. Even in this slightly flawed form. I remember a great
need for the feature back when I was involved in supporting Postgres
in production.

-- 
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 07:45:13PM +0200, Tomas Vondra wrote:
> On Fri, Jun 26, 2020 at 12:37:26PM -0400, Bruce Momjian wrote:
> > I was thinking more of being able to allocate a single value to be
> > shared by all active sesions.
> 
> Not sure I understand. What "single value" do you mean?

I was thinking of a full-cluster work_mem maximum allocation that could
be given to various backends that request it.

Imagine we set the cluster-wide total of work_mem to 1GB.  If a session
asks for 100MB, if there are no other active sessions, it can grant the
entire 100MB.  If there are other sessions running, and 500MB has
already been allocated, maybe it is only given an active per-node
work_mem of 50MB.  As the amount of unallocated cluster-wide work_mem
gets smaller, requests are granted smaller actual allocations.

What we do now makes little sense, because we might have lots of free
memory, but we force nodes to spill to disk when they exceed a fixed
work_mem.  I realize this is very imprecise, because you don't know what
future work_mem requests are coming, or how long until existing
allocations are freed, but it seems it would have to be better than what
we do now.
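
To spell out the contrast (the cluster-wide setting below is purely
hypothetical, just to illustrate the idea):

    -- today: a fixed limit, applied per node and per session
    SET work_mem = '100MB';

    -- the idea: a cluster-wide budget that individual requests are granted from
    -- (hypothetical GUC, does not exist)
    ALTER SYSTEM SET cluster_work_mem = '1GB';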

> Wasn't the idea was to replace work_mem with something like query_mem?
> That'd be nice, but I think it's inherently circular - we don't know how
> to distribute this to different nodes until we know which nodes will
> need a buffer, but the buffer size is important for costing (so we need
> it when constructing the paths).
> 
> Plus then there's the question whether all nodes should get the same
> fraction, or less sensitive nodes should get smaller chunks, etc.
> Ultimately this would be based on costing too, I think, but it makes it
> soe much complex ...

Since work_mem affects the optimizer choices, I can imagine it getting
complex since nodes would have to ask the global work_mem allocator how
much memory it _might_ get, but then ask for final work_mem during
execution, and they might differ.  Still, our spill costs are so high
for so many node types, that reducing spills seems like it would be a
win, even if it sometimes causes poorer plans.

> > Also, doesn't this blog entry also show that spiling to disk for ORDER
> > BY is similarly slow compared to hash aggs?
> > 
> > https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012
> 
> The post does not mention hashagg at all, so I'm not sure how could it
> show that? But I think you're right the spilling itself is not that far
> away, in most cases (thanks to the recent fixes made by Jeff).

Yeah, I was just measuring ORDER BY spill, but it seems to be a similar
overhead to hashagg spill, which is being singled out in this discussion
as particularly expensive, and I am questioning that.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: PG 13 release notes, first draft

2020-06-26 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 04:23:26PM -0400, Alvaro Herrera wrote:
> Reading Luis Carril's other entry in the relnotes,
> 
>  Allow pg_dump --include-foreign-data to dump data from foreign servers (Luis 
> Carril)
> 
> It seems to suggest that --include-foreign-data existed previously,
> which is not true. I would have worded it as "Add --include-foreign-data
> option to pg_dump to allow dumping data from foreign servers".

OK, pg_dump item about FOREIGN keyword removed from PG 13 release notes,
and the above item clarified.   Patch attached and applied to PG 13.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/doc/src/sgml/release-13.sgml b/doc/src/sgml/release-13.sgml
new file mode 100644
index 401c557..45d8f0b
*** a/doc/src/sgml/release-13.sgml
--- b/doc/src/sgml/release-13.sgml
*** Author: Peter Eisentraut 
   
  
-  
- 
- 
-   
-Add FOREIGN to ALTER statements,
-if appropriate (Luis Carril)
-   
- 
-   
-WHAT IS THIS ABOUT?
-   
-  
- 
  
  
 
--- 1534,1539 
*** Author: Alvaro Herrera 
  

!Allow pg_dump
!--include-foreign-data to dump data from foreign
!servers (Luis Carril)

   
  
--- 2398,2407 
  -->
  

!Add pg_dump
!option --include-foreign-data to dump data from
!foreign servers (Luis Carril)

   
  


Re: Weird failures on lorikeet

2020-06-26 Thread Andrew Dunstan


On 6/25/20 2:42 PM, Andrew Dunstan wrote:
> On 6/25/20 12:52 PM, Alvaro Herrera wrote:
>> Is there something going on with lorikeet again?  I see this:
>>
>> 2020-06-25 01:55:13.380 EDT [5ef43c40.21e0:85] pg_regress/typed_table LOG:  
>> statement: SELECT c.oid::pg_catalog.regclass
>>  FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i
>>  WHERE c.oid = i.inhparent AND i.inhrelid = '21420'
>>AND c.relkind != 'p' AND c.relkind != 'I'
>>  ORDER BY inhseqno;
>> *** starting debugger for pid 4456, tid 2644
>> 2020-06-25 01:55:13.393 EDT [5ef43c40.21e0:86] pg_regress/typed_table LOG:  
>> statement: SELECT c.oid::pg_catalog.regclass, c.relkind, 
>> pg_catalog.pg_get_expr(c.relpartbound, c.oid)
>>  FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i
>>  WHERE c.oid = i.inhrelid AND i.inhparent = '21420'
>>  ORDER BY pg_catalog.pg_get_expr(c.relpartbound, c.oid) = 'DEFAULT', 
>> c.oid::pg_catalog.regclass::pg_catalog.text;
>>   1 [main] postgres 4456 try_to_debug: Failed to start debugger, Win32 
>> error 2
>> 2020-06-25 01:55:13.455 EDT [5ef43c40.21e0:87] pg_regress/typed_table LOG:  
>> statement: CREATE TABLE persons3 OF person_type (
>>  PRIMARY KEY (id),
>>  name NOT NULL DEFAULT ''
>>  );
>> *** continuing pid 4456 from debugger call (0)
>> *** starting debugger for pid 4456, tid 2644
>>   48849 [main] postgres 4456 try_to_debug: Failed to start debugger, Win32 
>> error 2
>> *** continuing pid 4456 from debugger call (0)
>> 2020-06-25 01:55:18.181 EDT [5ef43bd6.2824:4] LOG:  server process (PID 
>> 4456) was terminated by signal 11: Segmentation fault
>> 2020-06-25 01:55:18.181 EDT [5ef43bd6.2824:5] DETAIL:  Failed process was 
>> running: drop table some_tab cascade;
>>
>
> I don't know what that's about. I'll reboot the machine presently and
> see if it persists.
>
>

I've disabled the core dumper, because I've never got it working
properly anyway. But we might well still get the segfault.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: pg_dump bug for extension owned tables

2020-06-26 Thread Andrew Dunstan


On 6/26/20 2:10 PM, Fabrízio de Royes Mello wrote:
>
> On Fri, Jun 26, 2020 at 11:55 AM Fabrízio de Royes Mello
> <fabriziome...@gmail.com> wrote:
> >
> >
> > On Fri, Jun 26, 2020 at 11:24 AM Andrew Dunstan
>  > wrote:
> > >
> > >
> > > On 6/26/20 9:57 AM, Andrew Dunstan wrote:
> > > > It appears that for extension owned tables tbinfo.attgenerated isn't
> > > > being properly populated, so line 2050 in REL_12_STABLE, which
> is line
> > > > 2109 in git tip, is failing.
> > > >
> > > >
> > >
> > > Should have mentioned this is in src/bin/pg_dump/pg_dump.c
> > >
> >
> > Having a look on it.
> >
>
> Seems when qualify the schemaname the the "tbinfo->interesting" field
> is not setted for extensions objects, so the getTableAttrs can't fill
> the attgenerated field properly.
>
> I'm not 100% sure it's the correct way but the attached patch works
> for me and all tests passed. Maybe we should add more TAP tests?
>
>


Thanks for this.

It looks sane to me, I'll let others weigh in on it though. Yes we
should have a TAP test for it.
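
Presumably a TAP test just needs a small test extension whose script
creates a config table with a generated column, roughly (names made up):

    CREATE TABLE ext_cfg_table (
        id  int PRIMARY KEY,
        id2 int GENERATED ALWAYS AS (id * 2) STORED
    );
    -- mark it as an extension configuration table so pg_dump dumps its data
    SELECT pg_catalog.pg_extension_config_dump('ext_cfg_table', '');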


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: should libpq also require TLSv1.2 by default?

2020-06-26 Thread Tom Lane
Daniel Gustafsson  writes:
> SSL_R_UNKNOWN_PROTOCOL seem to covers cases when someone manages to perform
> something which OpenSSL believes is a broken SSLv2 connection, but their own
> client-level code use it to refer to SSL as well as TLS.  Maybe it's worth
> adding as a belts and suspenders type thing?

No objection on my part.

> Is this targeting v13 or v14?  In case of the former, the release notes entry
> for raising the default minimum version should perhaps be tweaked as it now
> just refers to the GUC which is a tad misleading.

I think Peter is proposing that we change this in v13.  I didn't look
at the release notes; usually we cover this sort of thing in-bulk
when we update the release notes later in beta.

> If anything it might useful to document in the comment that we're only
> concerned with TLS versions, SSL2/3 are disabled in the library 
> initialization.

Good point.

regards, tom lane




Re: Possible NULL dereferencing (src/backend/tcop/pquery.c)

2020-06-26 Thread Tom Lane
Ranier Vilela  writes:
> 1.Assertion check
> /* Caller messed up if we have neither a ready query nor held data. */
> Assert(queryDesc || portal->holdStore);

> But in release, if QueryDesc is NULL and portal->holdStore is NULL too,
> when Call PushActiveSnapshot  *deference* NULL check can happen.

> 2. if (portal->atEnd || count <= 0) is True
> No need to recheck count against FETCH_ALL.

> Is it worth correcting them?

No.

The assertion already says that that's a case that cannot happen.
Or to look at it another way: if the case were to occur in a devel
build, you'd get a core dump at the assertion.  If the case were
to occur in a production build, you'd get a core dump at the
dereference.  Not much difference.  Either way, it's a *caller*
bug, because the caller is supposed to make sure this cannot happen.
If we thought that it could possibly happen, we would use an ereport
but not an assertion; having both for the same condition is quite
misguided.

(If Coverity is whining about this for you, there's something wrong
with your Coverity settings.  In the project's instance, Coverity
accepts assertions as assertions.)

I'm unimpressed with the other proposed change too; it's making the logic
more complicated and fragile for a completely negligible "performance
gain".  Moreover the compiler could probably make the same optimization.

regards, tom lane




Re: should libpq also require TLSv1.2 by default?

2020-06-26 Thread Daniel Gustafsson
> On 26 Jun 2020, at 22:22, Tom Lane  wrote:
> 
> I wrote:
>> Anybdy have a better idea?  Is there a reasonably direct way to ask
>> OpenSSL what its min and max versions are?
> 
> After some digging, there apparently is not.

AFAIK everyone either #ifdef around the TLS1_x_VERSION macros or the OpenSSL
versioning and use hardcoded knowledge based on that.  The latter is fairly
shaky since configure options can disable protocols.  At least in past
versions, the validation for protocol range in OpenSSL ssl_lib was doing pretty
much that too.

> So I propose the attached.

SSL_R_UNKNOWN_PROTOCOL seems to cover cases where someone manages to perform
something which OpenSSL believes is a broken SSLv2 connection, but their own
client-level code uses it to refer to SSL as well as TLS.  Maybe it's worth
adding as a belts-and-suspenders type thing?

I've only had a chance to read the patches, but they read pretty much just like
I had in mind that we could do this. +1 on both patches from an eye-ball POV.

Is this targeting v13 or v14?  In case of the former, the release notes entry
for raising the default minimum version should perhaps be tweaked as it now
just refers to the GUC which is a tad misleading.

> The hack in openssl.h to guess the
> min/max supported versions is certainly nothing but a hack;
> but I see no way to do better.

If anything, it might be useful to document in the comment that we're only
concerned with TLS versions; SSL2/3 are disabled in the library initialization.

cheers ./daniel



Re: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

2020-06-26 Thread Alexey Kondratov

On 2020-06-23 04:18, Michael Paquier wrote:
> On Mon, Jun 22, 2020 at 08:18:58PM +0300, Alexey Kondratov wrote:
> > Things get worse when we allow specifying an older LSN, since it has a
> > higher chances to be at the horizon of deletion by checkpointer.
> > Anyway, if I get it correctly, with a current patch slot will be created
> > successfully, but will be obsolete and should be invalidated by the next
> > checkpoint.
>
> Is that a behavior acceptable for the end user?  For example, a
> physical slot that is created to immediately reserve WAL may get
> invalidated, causing it to actually not keep WAL around contrary to
> what the user has wanted the command to do.

I can imagine that it could be acceptable in the initially proposed
scenario for someone, since creation of a slot with a historical
restart_lsn is already unpredictable: the required segment may or may
not exist. However, adding an undefined behaviour even after the slot
has been created does not look good to me anyway.

I have looked closely at the checkpointer code, and another problem is
that it decides once which WAL segments to delete, based on
replicationSlotMinLSN, and does not check anything before the actual
file deletion. That way the gap for a possible race is even wider. I do
not know how to completely get rid of this race without introducing
some locking mechanism, which may be costly.
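
For anyone skimming the thread, the existing call and the rough shape of
the proposed one are (the restart_lsn argument is shown loosely here,
see the patch for the exact signature):

    -- existing: reserve WAL starting at the current insert position
    SELECT pg_create_physical_replication_slot('slot1', true);

    -- proposed: additionally accept an older restart_lsn (illustrative only)
    SELECT pg_create_physical_replication_slot('slot1', true, false, '0/1234000');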


Thanks for feedback
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Peter Geoghegan
On Thu, Jun 25, 2020 at 1:36 PM Andres Freund  wrote:
> 12  28164.865 ms
>
> fast ssd:
> HEAD        92520.680 ms
>
> magnetic:
> HEAD        183968.538 ms
>
> (no reads, there's plenty enough memory. Just writes because the age /
> amount thresholds for dirty data are reached)
>
> In the magnetic case we're IO bottlenecked nearly the whole time.

I agree with almost everything you've said on this thread, but at the
same time I question the emphasis on I/O here. You've shown that
spinning rust is about twice as slow as a fast SSD here. Fair enough,
but to me the real story is that spilling is clearly a lot slower in
general, regardless of how fast the storage subsystem happens to be (I
wonder how fast it is with a ramdisk). To me, it makes more sense to
think of the problem here as the fact that v13 will *not* do
aggregation using the fast strategy (i.e. in-memory) -- as opposed to
the problem being that v13 does the aggregation using the slow
strategy (which is assumed to be slow because it involves I/O instead
of memory buffers).

I get approximately the same query runtimes with your "make the
transition state a bit bigger" test case. With "set enable_hashagg =
off", I get a group aggregate + sort. It spills to disk, even with
'work_mem = '15GB'" -- leaving 4 runs to merge at the end. That takes
63702.992 ms on v13. But if I reduce the amount of work_mem radically,
to only 16MB (a x960 decrease!), then the run time only increases by
~30% -- it's only 83123.926 ms. So we're talking about a ~200%
increase (for hash aggregate) versus a ~30% increase (for groupagg +
sort) on fast SSDs.

Changing the cost of I/O in the context of hashaggregate seems like it
misses the point. Jeff recently said "Overall, the IO pattern is
better for Sort, but not dramatically so". Whatever the IO pattern may
be, I think that it's pretty clear that the performance
characteristics of hash aggregation with limited memory are very
different to groupaggregate + sort, at least when only a fraction of
the optimal amount of memory we'd like is available. It's true that
hash aggregate was weird among plan nodes in v12, and is now in some
sense less weird among plan nodes. And yet we have a new problem now
-- so where does that leave that whole "weirdness" framing? ISTM that
the work_mem framework was and is the main problem. We seem to have
lost a crutch that ameliorated the problem before now, even though
that amelioration was kind of an accident. Or a thing that user apps
evolved to rely on.

--
Peter Geoghegan




Re: Why forbid "INSERT INTO t () VALUES ();"

2020-06-26 Thread Fabien COELHO



Hello Isaac,


This is not the only area where empty tuples are not supported. Consider:

PRIMARY KEY ()

This should mean the table may only contain a single row, but is not
supported.


Yep. This is exactly the kind of case for which I was trying the
command, after reading Bruce Momjian's blog
(https://momjian.us/main/blogs/pgblog/2020.html#June_22_2020) about
one-row tables, and thinking about how to improve it and allow enforcing a
singleton table simply, which is something I have needed several times in
the past.



On the plus side, empty rows are supported, although the explicit ROW
keyword is required.


Yet another weirdness.
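
For the record, the asymmetry looks roughly like this (t is any table):

    -- rejected: empty column list and empty VALUES list
    INSERT INTO t () VALUES ();     -- syntax error

    -- accepted alternatives
    INSERT INTO t DEFAULT VALUES;
    SELECT ROW();                   -- zero-column row value, with explicit ROW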

Thanks for the comments.

--
Fabien.




Re: should libpq also require TLSv1.2 by default?

2020-06-26 Thread Tom Lane
I wrote:
> Anybdy have a better idea?  Is there a reasonably direct way to ask
> OpenSSL what its min and max versions are?

After some digging, there apparently is not.  At first glance it would
seem that SSL_get_min_proto_version/SSL_get_max_proto_version should
help, but in reality they're just blindingly useless, because they
return zero in most cases of interest.  And when they don't return zero
they might give us a code that we don't recognize, so there's no future
proofing to be had from using them.  Plus they don't exist before
openssl 1.1.1.

It looks like, when they exist, we could use them to discover any
restrictions openssl.cnf has set on the allowed protocol versions ...
but I'm not really convinced that's worth the trouble.  If we up the
libpq default to TLSv1.2 then there probably won't be any real-world
cases where openssl.cnf affects our results.

So I propose the attached.  The hack in openssl.h to guess the
min/max supported versions is certainly nothing but a hack;
but I see no way to do better.

regards, tom lane

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 8adf64c78e..778b166753 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -72,6 +72,7 @@ static bool dummy_ssl_passwd_cb_called = false;
 static bool ssl_is_server_start;
 
 static int	ssl_protocol_version_to_openssl(int v);
+static const char *ssl_protocol_version_to_string(int v);
 
 /*  */
 /*		 Public interface		*/
@@ -365,6 +366,7 @@ be_tls_open_server(Port *port)
 	int			err;
 	int			waitfor;
 	unsigned long ecode;
+	bool		give_proto_hint;
 
 	Assert(!port->ssl);
 	Assert(!port->peer);
@@ -451,10 +453,48 @@ aloop:
 			 errmsg("could not accept SSL connection: EOF detected")));
 break;
 			case SSL_ERROR_SSL:
+switch (ERR_GET_REASON(ecode))
+{
+		/*
+		 * NO_PROTOCOLS_AVAILABLE occurs if the client and
+		 * server min/max protocol version limits don't
+		 * overlap.  UNSUPPORTED_PROTOCOL,
+		 * WRONG_VERSION_NUMBER, and
+		 * TLSV1_ALERT_PROTOCOL_VERSION have been observed
+		 * when trying to communicate with an old OpenSSL
+		 * library.  It's not very clear what would make
+		 * OpenSSL return the other codes listed here, but a
+		 * hint about protocol versions seems like it's
+		 * appropriate for all.
+		 */
+	case SSL_R_NO_PROTOCOLS_AVAILABLE:
+	case SSL_R_UNSUPPORTED_PROTOCOL:
+	case SSL_R_BAD_PROTOCOL_VERSION_NUMBER:
+	case SSL_R_UNKNOWN_SSL_VERSION:
+	case SSL_R_UNSUPPORTED_SSL_VERSION:
+	case SSL_R_VERSION_TOO_HIGH:
+	case SSL_R_VERSION_TOO_LOW:
+	case SSL_R_WRONG_SSL_VERSION:
+	case SSL_R_WRONG_VERSION_NUMBER:
+	case SSL_R_TLSV1_ALERT_PROTOCOL_VERSION:
+		give_proto_hint = true;
+		break;
+	default:
+		give_proto_hint = false;
+		break;
+}
 ereport(COMMERROR,
 		(errcode(ERRCODE_PROTOCOL_VIOLATION),
 		 errmsg("could not accept SSL connection: %s",
-SSLerrmessage(ecode;
+SSLerrmessage(ecode)),
+		 give_proto_hint ?
+		 errhint("This may indicate that the client does not support any SSL protocol version between %s and %s.",
+ ssl_min_protocol_version ?
+ ssl_protocol_version_to_string(ssl_min_protocol_version) :
+ MIN_OPENSSL_TLS_VERSION,
+ ssl_max_protocol_version ?
+ ssl_protocol_version_to_string(ssl_max_protocol_version) :
+ MAX_OPENSSL_TLS_VERSION) : 0));
 break;
 			case SSL_ERROR_ZERO_RETURN:
 ereport(COMMERROR,
@@ -1328,6 +1368,29 @@ ssl_protocol_version_to_openssl(int v)
 	return -1;
 }
 
+/*
+ * Likewise provide a mapping to strings.
+ */
+static const char *
+ssl_protocol_version_to_string(int v)
+{
+	switch (v)
+	{
+		case PG_TLS_ANY:
+			return "any";
+		case PG_TLS1_VERSION:
+			return "TLSv1";
+		case PG_TLS1_1_VERSION:
+			return "TLSv1.1";
+		case PG_TLS1_2_VERSION:
+			return "TLSv1.2";
+		case PG_TLS1_3_VERSION:
+			return "TLSv1.3";
+	}
+
+	return "(unrecognized)";
+}
+
 
 static void
 default_openssl_tls_init(SSL_CTX *context, bool isServerStart)
diff --git a/src/include/common/openssl.h b/src/include/common/openssl.h
index 47fa129994..e4ef53ef70 100644
--- a/src/include/common/openssl.h
+++ b/src/include/common/openssl.h
@@ -17,12 +17,30 @@
 #ifdef USE_OPENSSL
 #include 
 
+/*
+ * OpenSSL doesn't provide any very nice way to identify the min/max
+ * protocol versions the library supports, so we fake it as best we can.
+ * Note in particular that this doesn't account for restrictions that
+ * might be specified in the installation's openssl.cnf.
+ */
+#define MIN_OPENSSL_TLS_VERSION  "TLSv1"
+
+#if defined(TLS1_3_VERSION)
+#define MAX_OPENSSL_TLS_VERSION  "TLSv1.3"
+#elif defined(TLS1_2_VERSION)
+#define MAX_OPENSSL_TLS_VERSION  "TLSv1.2"
+#elif defined(TLS1_1_VERSION)
+#define 

Re: PG 13 release notes, first draft

2020-06-26 Thread Alvaro Herrera
Reading Luis Carril's other entry in the relnotes,

 Allow pg_dump --include-foreign-data to dump data from foreign servers (Luis 
Carril)

It seems to suggest that --include-foreign-data existed previously,
which is not true. I would have worded it as "Add --include-foreign-data
option to pg_dump to allow dumping data from foreign servers".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG 13 release notes, first draft

2020-06-26 Thread Alvaro Herrera
On 2020-Jun-26, Bruce Momjian wrote:

> On Fri, Jun 26, 2020 at 05:24:16PM +0900, Masahiko Sawada wrote:

> > Author: Alvaro Herrera 
> > 2020-03-20 [4e6209134] pg_dump: Add FOREIGN to ALTER statements, if 
> > appropriate
> > -->
> > 
> >   
> >Add FOREIGN to ALTER 
> > statements,
> >if appropriate (Luis Carril)
> >   

> > IIUC this entry is about that pg_dump adds FOREIGN word to ALTER TABLE
> > command. Please find the attached patch.
> 
> OK, so if that is, what used to happen before?  Did it still work
> without the FOREIGN keyword?  If so, I am thinking we should just remove
> this item.

I tend to agree; it's not a change significant enough to be documented
in the relnotes, I think.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_dump bug for extension owned tables

2020-06-26 Thread Fabrízio de Royes Mello
On Fri, Jun 26, 2020 at 11:55 AM Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
>
> On Fri, Jun 26, 2020 at 11:24 AM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:
> >
> >
> > On 6/26/20 9:57 AM, Andrew Dunstan wrote:
> > > It appears that for extension owned tables tbinfo.attgenerated isn't
> > > being properly populated, so line 2050 in REL_12_STABLE, which is line
> > > 2109 in git tip, is failing.
> > >
> > >
> >
> > Should have mentioned this is in src/bin/pg_dump/pg_dump.c
> >
>
> Having a look on it.
>

It seems that when we qualify the schema name, the "tbinfo->interesting"
field is not set for extension objects, so getTableAttrs() can't fill the
attgenerated field properly.

I'm not 100% sure it's the correct way but the attached patch works for me
and all tests passed. Maybe we should add more TAP tests?

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a41a3db876..e12811832f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -18063,6 +18063,9 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[],
 		if (strlen(extconditionarray[j]) > 0)
 			configtbl->dataObj->filtercond = pg_strdup(extconditionarray[j]);
 	}
+
+	if (configtbl->dobj.dump != DUMP_COMPONENT_NONE)
+		configtbl->interesting = true;
 }
 			}
 		}


Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Tomas Vondra

On Fri, Jun 26, 2020 at 12:37:26PM -0400, Bruce Momjian wrote:

On Fri, Jun 26, 2020 at 04:44:14PM +0200, Tomas Vondra wrote:

On Fri, Jun 26, 2020 at 12:02:10AM -0400, Bruce Momjian wrote:
> On Fri, Jun 26, 2020 at 01:53:57AM +0200, Tomas Vondra wrote:
> > I'm not saying it's not beneficial to use different limits for different
> > nodes. Some nodes are less sensitive to the size (e.g. sorting often
> > gets faster with smaller work_mem). But I think we should instead have a
> > per-session limit, and the planner should "distribute" the memory to
> > different nodes. It's a hard problem, of course.
>
> Yeah, I am actually confused why we haven't developed a global memory
> allocation strategy and continue to use per-session work_mem.
>

I think it's pretty hard problem, actually. One of the reasons is that


Yes, it is a hard problem, because it is balancing memory for shared
buffers, work_mem, and kernel buffers:

https://momjian.us/main/blogs/pgblog/2018.html#December_7_2018

I think the big problem is that the work_mem value is not one value but
a floating value that differs per query, per session, and with concurrent
session activity.


the costing of a node depends on the amount of memory available to the
node, but as we're building the plan bottom-up, we have no information
about the nodes above us. So we don't know if there are operations that
will need memory, how sensitive they are, etc.

And so far the per-node limit served us pretty well, I think. So I'm not
very confused we don't have the per-session limit yet, TBH.


I was thinking more of being able to allocate a single value to be
shared by all active sesions.



Not sure I understand. What "single value" do you mean?

Wasn't the idea to replace work_mem with something like query_mem?
That'd be nice, but I think it's inherently circular - we don't know how
to distribute this to different nodes until we know which nodes will
need a buffer, but the buffer size is important for costing (so we need
it when constructing the paths).

Plus then there's the question whether all nodes should get the same
fraction, or less sensitive nodes should get smaller chunks, etc.
Ultimately this would be based on costing too, I think, but it makes it
so much more complex ...


Also, doesn't this blog entry also show that spilling to disk for ORDER
BY is similarly slow compared to hash aggs?

https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012



The post does not mention hashagg at all, so I'm not sure how it could
show that. But I think you're right that the spilling itself is not that far
away, in most cases (thanks to the recent fixes made by Jeff).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG 13 release notes, first draft

2020-06-26 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 05:24:16PM +0900, Masahiko Sawada wrote:
> Hi,
> 
> I realized that PG 13 release note still has the following entry:
> 
> 
> 
>   
>Add FOREIGN to ALTER statements,
>if appropriate (Luis Carril)
>   
> 
>   
>WHAT IS THIS ABOUT?
>   
>  
> 
> 
> 
> IIUC this entry is about pg_dump adding the FOREIGN keyword to ALTER TABLE
> commands. Please find the attached patch.

OK, so if that is the case, what used to happen before?  Did it still work
without the FOREIGN keyword?  If so, I am thinking we should just remove
this item.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2020-06-26 Thread Tom Lane
Quan Zongliang  writes:
> The reason is that the error message in the bit_in / varbit_in function 
> is output directly using %c. Causes the log file to not be decoded 
> correctly.

> The attachment is a patch.

I'm really quite skeptical of the premise here.  We do not guarantee that
the postmaster log file is valid in any particular encoding; it'd be
nearly impossible to do so if the cluster contains databases using
different encodings.  So I think you'd be way better off to reformulate
your log-reading code to be less fragile.

Even granting the premise, the proposed patch seems like a significant
decrease in user-friendliness for typical cases.  I'd rather see us
make an effort to print one valid-per-the-DB-encoding character.
Now that we can rely on snprintf to count %s restrictions in bytes,
I think something like this should work:

 errmsg("\"%.*s\" is not a valid binary digit",
pg_mblen(sp), sp)));
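
In context, one of the varbit.c call sites would then look roughly like
this (untested sketch, not committed code):

	/* Report one whole (possibly multibyte) character, not a single byte */
	if (*sp != '0' && *sp != '1')
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
				 errmsg("\"%.*s\" is not a valid binary digit",
						pg_mblen(sp), sp)));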

But the real problem is that this is only the tip of the iceberg.
You didn't even hit all the %c usages in varbit.c.  A quick grep finds
these other spots that can doubtless be made to do the same thing:

acl.c:899:           elog(ERROR, "unrecognized objtype abbreviation: %c", objtypec);
arrayfuncs.c:507:    errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:554:    errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:584:    errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:591:    errdetail("Unmatched \"%c\" character.", '}')));
arrayfuncs.c:633:    errdetail("Unexpected \"%c\" character.",
encode.c:184:        errmsg("invalid hexadecimal digit: \"%c\"", c)));
encode.c:341:        errmsg("invalid symbol \"%c\" while decoding base64 sequence", (int) c)));
formatting.c:3298:   errmsg("unmatched format separator \"%c\"",
jsonpath_gram.c:2390: errdetail("unrecognized flag character \"%c\" in LIKE_REGEX predicate",
regexp.c:426:        errmsg("invalid regular expression option: \"%c\"",
tsvector_op.c:312:   elog(ERROR, "unrecognized weight: %c", char_weight);
tsvector_op.c:872:   errmsg("unrecognized weight: \"%c\"", char_weight)));
varbit.c:233:        errmsg("\"%c\" is not a valid binary digit",
varbit.c:258:        errmsg("\"%c\" is not a valid hexadecimal digit",
varbit.c:534:        errmsg("\"%c\" is not a valid binary digit",
varbit.c:559:        errmsg("\"%c\" is not a valid hexadecimal digit",
varlena.c:5589:      errmsg("unrecognized format() type specifier \"%c\"",
varlena.c:5710:      errmsg("unrecognized format() type specifier \"%c\"",

and that's just in src/backend/utils/adt/.

regards, tom lane




Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 04:44:14PM +0200, Tomas Vondra wrote:
> On Fri, Jun 26, 2020 at 12:02:10AM -0400, Bruce Momjian wrote:
> > On Fri, Jun 26, 2020 at 01:53:57AM +0200, Tomas Vondra wrote:
> > > I'm not saying it's not beneficial to use different limits for different
> > > nodes. Some nodes are less sensitive to the size (e.g. sorting often
> > > gets faster with smaller work_mem). But I think we should instead have a
> > > per-session limit, and the planner should "distribute" the memory to
> > > different nodes. It's a hard problem, of course.
> > 
> > Yeah, I am actually confused why we haven't developed a global memory
> > allocation strategy and continue to use per-session work_mem.
> > 
> 
> I think it's pretty hard problem, actually. One of the reasons is that

Yes, it is a hard problem, because it is balancing memory for shared
buffers, work_mem, and kernel buffers:

https://momjian.us/main/blogs/pgblog/2018.html#December_7_2018

I think the big problem is that the work_mem value is not one value but
a floating value that differs per query, per session, and with concurrent
session activity.

> the costing of a node depends on the amount of memory available to the
> node, but as we're building the plan bottom-up, we have no information
> about the nodes above us. So we don't know if there are operations that
> will need memory, how sensitive they are, etc.
> 
> And so far the per-node limit served us pretty well, I think. So I'm not
> very confused we don't have the per-session limit yet, TBH.

I was thinking more of being able to allocate a single value to be
shared by all active sesions.

Also, doesn't this blog entry also show that spilling to disk for ORDER
BY is similarly slow compared to hash aggs?

https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: should libpq also require TLSv1.2 by default?

2020-06-26 Thread Tom Lane
Here is a quick attempt at getting libpq and the server to report
suitable hint messages about SSL version problems.

The main thing I don't like about this as formulated is that I hard-wired
knowledge of the minimum and maximum SSL versions into the hint messages.
That's clearly not very maintainable, but it seems really hard to keep the
messages readable/useful without giving concrete version numbers.
Anybody have a better idea?  Is there a reasonably direct way to ask
OpenSSL what its min and max versions are?

regards, tom lane

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 8adf64c78e..380268636a 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -72,6 +72,7 @@ static bool dummy_ssl_passwd_cb_called = false;
 static bool ssl_is_server_start;
 
 static int	ssl_protocol_version_to_openssl(int v);
+static const char *ssl_protocol_version_to_string(int v);
 
 /*  */
 /*		 Public interface		*/
@@ -365,6 +366,7 @@ be_tls_open_server(Port *port)
 	int			err;
 	int			waitfor;
 	unsigned long ecode;
+	bool		give_proto_hint;
 
 	Assert(!port->ssl);
 	Assert(!port->peer);
@@ -451,10 +453,44 @@ aloop:
 			 errmsg("could not accept SSL connection: EOF detected")));
 break;
 			case SSL_ERROR_SSL:
+switch (ERR_GET_REASON(ecode))
+{
+		/*
+		 * NO_PROTOCOLS_AVAILABLE occurs if the client and
+		 * server min/max protocol version limits don't
+		 * overlap.  UNSUPPORTED_PROTOCOL and
+		 * WRONG_VERSION_NUMBER have been observed when trying
+		 * to communicate with an old OpenSSL library.  It's
+		 * not very clear what would make OpenSSL return the
+		 * other codes listed here, but a hint about protocol
+		 * versions seems like it's appropriate for all.
+		 */
+	case SSL_R_NO_PROTOCOLS_AVAILABLE:
+	case SSL_R_UNSUPPORTED_PROTOCOL:
+	case SSL_R_BAD_PROTOCOL_VERSION_NUMBER:
+	case SSL_R_UNKNOWN_SSL_VERSION:
+	case SSL_R_UNSUPPORTED_SSL_VERSION:
+	case SSL_R_VERSION_TOO_HIGH:
+	case SSL_R_VERSION_TOO_LOW:
+	case SSL_R_WRONG_SSL_VERSION:
+	case SSL_R_WRONG_VERSION_NUMBER:
+		give_proto_hint = true;
+		break;
+	default:
+		give_proto_hint = false;
+		break;
+}
 ereport(COMMERROR,
 		(errcode(ERRCODE_PROTOCOL_VIOLATION),
 		 errmsg("could not accept SSL connection: %s",
-SSLerrmessage(ecode;
+SSLerrmessage(ecode)),
+		 give_proto_hint ? errhint("This may indicate that the client does not support any SSL protocol version between %s and %s.",
+   ssl_min_protocol_version ?
+   ssl_protocol_version_to_string(ssl_min_protocol_version) :
+   "TLSv1",
+   ssl_max_protocol_version ?
+   ssl_protocol_version_to_string(ssl_max_protocol_version) :
+   "TLSv1.3") : 0));
 break;
 			case SSL_ERROR_ZERO_RETURN:
 ereport(COMMERROR,
@@ -1328,6 +1364,29 @@ ssl_protocol_version_to_openssl(int v)
 	return -1;
 }
 
+/*
+ * Likewise provide a mapping to strings.
+ */
+static const char *
+ssl_protocol_version_to_string(int v)
+{
+	switch (v)
+	{
+		case PG_TLS_ANY:
+			return "any";
+		case PG_TLS1_VERSION:
+			return "TLSv1";
+		case PG_TLS1_1_VERSION:
+			return "TLSv1.1";
+		case PG_TLS1_2_VERSION:
+			return "TLSv1.2";
+		case PG_TLS1_3_VERSION:
+			return "TLSv1.3";
+	}
+
+	return "(unrecognized)";
+}
+
 
 static void
 default_openssl_tls_init(SSL_CTX *context, bool isServerStart)
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 2d813ef5f9..71407bffd4 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1304,6 +1304,40 @@ open_client_SSL(PGconn *conn)
 	  libpq_gettext("SSL error: %s\n"),
 	  err);
 	SSLerrfree(err);
+	switch (ERR_GET_REASON(ecode))
+	{
+			/*
+			 * NO_PROTOCOLS_AVAILABLE occurs if the client and
+			 * server min/max protocol version limits don't
+			 * overlap.  UNSUPPORTED_PROTOCOL and
+			 * WRONG_VERSION_NUMBER have been observed when
+			 * trying to communicate with an old OpenSSL
+			 * library.  It's not very clear what would make
+			 * OpenSSL return the other codes listed here, but
+			 * a hint about protocol versions seems like it's
+			 * appropriate for all.
+			 */
+		case SSL_R_NO_PROTOCOLS_AVAILABLE:
+		case SSL_R_UNSUPPORTED_PROTOCOL:
+		case SSL_R_BAD_PROTOCOL_VERSION_NUMBER:
+		case SSL_R_UNKNOWN_SSL_VERSION:
+		case SSL_R_UNSUPPORTED_SSL_VERSION:
+		case SSL_R_VERSION_TOO_HIGH:
+		case SSL_R_VERSION_TOO_LOW:
+		case SSL_R_WRONG_SSL_VERSION:
+		case SSL_R_WRONG_VERSION_NUMBER:
+			appendPQExpBuffer(>errorMessage,
+			  libpq_gettext("This may indicate that the 

Re: pg_dump bug for extension owned tables

2020-06-26 Thread Fabrízio de Royes Mello
On Fri, Jun 26, 2020 at 11:24 AM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:
>
>
> On 6/26/20 9:57 AM, Andrew Dunstan wrote:
> > It appears that for extension owned tables tbinfo.attgenerated isn't
> > being properly populated, so line 2050 in REL_12_STABLE, which is line
> > 2109 in git tip, is failing.
> >
> >
>
> Should have mentioned this is in src/bin/pg_dump/pg_dump.c
>

Having a look on it.

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Tomas Vondra

On Fri, Jun 26, 2020 at 12:02:10AM -0400, Bruce Momjian wrote:

On Fri, Jun 26, 2020 at 01:53:57AM +0200, Tomas Vondra wrote:

I'm not saying it's not beneficial to use different limits for different
nodes. Some nodes are less sensitive to the size (e.g. sorting often
gets faster with smaller work_mem). But I think we should instead have a
per-session limit, and the planner should "distribute" the memory to
different nodes. It's a hard problem, of course.


Yeah, I am actually confused why we haven't developed a global memory
allocation strategy and continue to use per-session work_mem.



I think it's pretty hard problem, actually. One of the reasons is that
the costing of a node depends on the amount of memory available to the
node, but as we're building the plan bottom-up, we have no information
about the nodes above us. So we don't know if there are operations that
will need memory, how sensitive they are, etc.

And so far the per-node limit served us pretty well, I think. So I'm not
very confused we don't have the per-session limit yet, TBH.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Possible NULL dereferencing (src/backend/tcop/pquery.c)

2020-06-26 Thread Ranier Vilela
Hi,

Per Coverity.
Perhaps this is excessive caution; the assertion check has probably
already caught all possible errors. But the redundancy costs little and
may be worth it.

1. Assertion check
/* Caller messed up if we have neither a ready query nor held data. */
Assert(queryDesc || portal->holdStore);

But in a release build, if queryDesc is NULL and portal->holdStore is NULL
too, the code can dereference a NULL queryDesc when it calls
PushActiveSnapshot.

2. If (portal->atEnd || count <= 0) is true, there is no need to recheck
count against FETCH_ALL.

Is it worth correcting them?
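
For illustration, something along these lines (a sketch only, not
necessarily what the attached patch does):

	/* Caller messed up if we have neither a ready query nor held data. */
	Assert(queryDesc || portal->holdStore);

	/*
	 * Belt-and-suspenders check for release builds, where the Assert()
	 * is compiled out: fail cleanly instead of dereferencing a NULL
	 * queryDesc further down.
	 */
	if (queryDesc == NULL && portal->holdStore == NULL)
		elog(ERROR, "portal has neither a ready query nor held data");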

regards,
Ranier Vilela


fix_null_deference_pquery.patch
Description: Binary data


Re: pg_dump bug for extension owned tables

2020-06-26 Thread Andrew Dunstan


On 6/26/20 9:57 AM, Andrew Dunstan wrote:
> It appears that for extension owned tables tbinfo.attgenerated isn't
> being properly populated, so line 2050 in REL_12_STABLE, which is line
> 2109 in git tip, is failing.
>
>

Should have mentioned this is in src/bin/pg_dump/pg_dump.c


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





pg_dump bug for extension owned tables

2020-06-26 Thread Andrew Dunstan


Found when working against release 12.


Given the following extension:


::
share/postgresql/extension/dummy--1.0.0.sql
::
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION dummy" to load this file. \quit

CREATE TABLE @extschema@.dummytab (
   a int,
   b int,
   c int);
SELECT pg_catalog.pg_extension_config_dump('dummytab', '');


::
share/postgresql/extension/dummy.control
::
# dummy extension
comment = 'dummy'
default_version = '1.0.0'
relocatable = false


and this use of it:


bin/psql -c 'create schema dummy; create extension  dummy schema dummy;
insert into dummy.dummytab values(1,2,3);'


this command segfaults:


bin/pg_dump -a --column-inserts -n dummy


It appears that for extension owned tables tbinfo.attgenerated isn't
being properly populated, so line 2050 in REL_12_STABLE, which is line
2109 in git tip, is failing.


I'm looking for a fix, but if anyone has a quick fix that would be nice :-)


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: should libpq also require TLSv1.2 by default?

2020-06-26 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 26 Jun 2020, at 00:44, Tom Lane  wrote:
>> BTW, the server-side report of the problem looks like
>> LOG:  could not accept SSL connection: wrong version number

> I can totally see some thinking that it's the psql version at client side 
> which
> is referred to and not the TLS protocol version.  Perhaps we should add a hint
> there as well?

Not sure.  We can't fix it in the case we're mainly concerned about,
namely an out-of-support server version.  At the same time, it's certainly
true that "version number" is way too under-specified in this context.
Maybe improving this against the day that TLSv2 exists would be smart.

regards, tom lane




RE: Remove a redundant condition check

2020-06-26 Thread Ádám Balogh
Hello,

-Original Message-
From: Amit Kapila  
Sent: Friday, June 26, 2020 11:09
To: Ádám Balogh 
Cc: PostgreSQL Hackers 
Subject: Re: Remove a redundant condition check

>On Thu, Jun 25, 2020 at 11:23 PM Ádám Balogh  wrote:
>>
>>
>> A one line change to remove a duplicate check. This duplicate check was 
>> detected during testing my contribution to a static code analysis tool. 
>> There is no functional change, no new tests needed.
>
> Yeah, this duplicate check is added as part of commit b2a5545bd6.  See below 
> part of change.
>
> - /*
> - * If this record was a timeline switch, wake up any
> - * walsenders to notice that we are on a new timeline.
> - */
> - if (switchedTLI && AllowCascadeReplication())
> - WalSndWakeup();
> + /* Is this a timeline switch? */
> + if (switchedTLI)
> + {
> + /*
> + * Before we continue on the new timeline, clean up any
> + * (possibly bogus) future WAL segments on the old timeline.
> + */
> + RemoveNonParentXlogFiles(EndRecPtr, ThisTimeLineID);
> +
> + /*
> + * Wake up any walsenders to notice that we are on a new
> + * timeline.
> + */
> + if (switchedTLI && AllowCascadeReplication()) WalSndWakeup(); }
>
> It seems we forgot to remove the additional check for switchedTLI while 
> adding a new check.  I think we can remove this duplicate > > check in the 
> HEAD code.  I am not sure if it is worth to backpatch such a change.

Thank you for confirming it. I do not think it is worth backpatching;
it is just a readability issue.
Regards,

Ádám



Re: [Patch] ALTER SYSTEM READ ONLY

2020-06-26 Thread Robert Haas
On Fri, Jun 26, 2020 at 5:59 AM Michael Paquier  wrote:
> Any operation working on on-disk relation blocks needs to have a
> consistent state, and a clean shutdown gives this guarantee thanks to
> the shutdown checkpoint (see also pg_rewind).  There are two states in
> the control file, shutdown for a primary and shutdown while in
> recovery to cover that.  So if you stop the server cleanly but fail to
> see a proper state with pg_checksums, it seems to me that the proposed
> patch does not handle correctly the state of the cluster in the
> control file at shutdown.  That's not good.

I think it is actually very good. If a feature that supposedly
prevents writing WAL permitted a shutdown checkpoint to be written, it
would be failing to accomplish its design goal. There is not much of a
use case for a feature that stops WAL from being written except when
it doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Remove Extra palloc Of raw_buf For Binary Format In COPY FROM

2020-06-26 Thread Rushabh Lathia
On Fri, Jun 26, 2020 at 3:16 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> Hi Hackers,
>
> There seems to be an extra palloc of 64KB for raw_buf for binary format
> files, which is not required as the copy logic for binary files doesn't
> use raw_buf; instead, attribute_buf is used in CopyReadBinaryAttribute.
>

+1

I looked at the patch and the changes looked good. A couple of comments:

1)

+
+ /* For binary files raw_buf is not used,
+ * instead, attribute_buf is used in
+ * CopyReadBinaryAttribute. Hence, don't palloc
+ * raw_buf.
+ */

Not a PG style of commenting.

2) In the binary case, raw_buf should be assigned NULL.

Attaching patch with those changes.



> Attached is a patch, which places a check to avoid this unnecessary 64KB
> palloc.
>
> Request the community to take this patch, if it is useful.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Thanks,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc4..97170d3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3368,7 +3368,17 @@ BeginCopyFrom(ParseState *pstate,
 	initStringInfo(>attribute_buf);
 	initStringInfo(>line_buf);
 	cstate->line_buf_converted = false;
-	cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+
+	/*
+	 * For binary files raw_buf is not used and instead attribute_buf
+	 * is used in CopyReadBinaryAttribute. Hence, don't palloc raw_buf
+	 * for binary files.
+	 */
+	if (!cstate->binary)
+		cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+	else
+		cstate->raw_buf = NULL;
+
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 
 	/* Assign range table, we'll need it in CopyFrom. */


Re: xid wraparound danger due to INDEX_CLEANUP false

2020-06-26 Thread Robert Haas
On Thu, Jun 25, 2020 at 8:44 PM Peter Geoghegan  wrote:
> I am sure about this much: The design embodied by Masahiko's patch is
> clearly a better one overall, even if it doesn't fix the problem on
> its own. I agree that we cannot allow nbtree to ignore "INDEX_CLEANUP
> = off", even if that means leaking pages that could otherwise be
> recycled. I'm not sure what we should do about any of this in the back
> branches, though. I wish I had a simple idea about what to do there.

My opinion is that there's no need to change the code in the
back-branches, and that I don't really like the approach in master
either.

I think what we're saying is that there is no worse consequence to
turning off index_cleanup than some bloat that isn't likely to be
recovered unless you REINDEX. If the problem in question were going to
cause data loss or data corruption or something, we'd have to take
stronger action, but I don't think anyone's saying that this is the
case. Therefore, I think we can handle the back-branches by letting
users know about the bloat hazard and suggesting that they avoid this
option unless it's necessary to avoid running out of XIDs.

Now, what about master? I think it's fine to offer the AM a callback
even when index_cleanup = false, for example so that it can freeze
something in its metapage, but I don't agree with passing it the TIDs.
That seems like it's just inviting it to ignore the emergency brake,
and it's also incurring real overhead, because remembering all those
TIDs can use a lot of memory. If that API limitation causes a problem
for some future index AM, that will be a good point to discuss when
the patch for said AM is submitted for review. I entirely agree with
you that the way btree arranges for btree recycling is crude, and I
would be delighted if you want to improve it, either for v14 or for
any future release, or if somebody else wants to do so. However, even
if that never happens, so what?

In retrospect, I regret committing this patch without better
understanding the issues in this area. That was a fail on my part. At
the same time, it doesn't really sound like the issues are all that
bad. The potential index bloat does suck, but it can still suck less
than the alternatives, and we have evidence that for at least one
user, it was worth a major version upgrade just to replace the
suckitude they had with the suckitude this patch creates.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




ModifyTable overheads in generic plans

2020-06-26 Thread Amit Langote
Hi,

I would like to discuss a refactoring patch that builds on top of the
patches at [1] to address $subject.  To get an idea for what
eliminating these overheads looks like, take a look at the following
benchmarking results.

Note 1: I've forced the use of generic plan by setting plan_cache_mode
to 'force_generic_plan'

Note 2: The individual TPS figures as measured are quite noisy; I just
want to show the rough trend with an increasing number of partitions.

pgbench -i -s 10 --partitions={0, 10, 100, 1000}
pgbench -T120 -f test.sql -M prepared

test.sql:
\set aid random(1, 100)
update pgbench_accounts set abalance = abalance + 1 where aid = :aid;

Without any of the patches:

0   tps = 13045.485121 (excluding connections establishing)
10  tps = 9358.157433 (excluding connections establishing)
100 tps = 1878.274500 (excluding connections establishing)
1000 tps = 84.684695 (excluding connections establishing)

The slowdown as the partition count increases can be explained by the
fact that UPDATE and DELETE can't currently use runtime partition
pruning.  So, even if any given transaction is only updating a single
tuple in a single partition, the plans for *all* partitions are being
initialized and also the ResultRelInfos.  That is, a lot of useless
work being done in InitPlan() and ExecInitModifyTable().

With the patches at [1] (latest 0001+0002 posted there), whereby the
generic plan for UPDATE can now perform runtime pruning, numbers can
be seen to improve, slightly:

0   tps = 12743.487196 (excluding connections establishing)
10  tps = 12644.240748 (excluding connections establishing)
100 tps = 4158.123345 (excluding connections establishing)
1000 tps = 391.248067 (excluding connections establishing)

So even though runtime pruning enabled by those patches ensures that
the useless plans are left untouched by the executor, the
ResultRelInfos are still being made assuming *all* result relations
will be processed.  With the attached patches (0001+0002+0003) that I
want to discuss here in this thread, numbers are further improved:

0   tps = 13419.283168 (excluding connections establishing)
10  tps = 12588.016095 (excluding connections establishing)
100 tps = 8560.824225 (excluding connections establishing)
1000 tps = 1926.553901 (excluding connections establishing)

0001 and 0002 are preparatory patches.  0003 teaches nodeModifyTable.c
to make the ResultRelInfo for a given result relation lazily, that is,
when the plan producing tuples to be updated/deleted actually produces
one that belongs to that relation.  So, if a transaction only updates
one tuple, then only one ResultRelInfo would be made.  For larger
partition counts, that saves a significant amount of work.
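
To sketch the idea (with invented names -- mt_resultOidHash and
BuildResultRelInfo() are shown only for illustration and don't appear as
such in the attached patches), the lazy lookup would be roughly:

	typedef struct ResultRelOidEntry
	{
		Oid			relid;		/* hash key: result relation OID */
		ResultRelInfo *rri;		/* built on first use */
	} ResultRelOidEntry;

	static ResultRelInfo *
	GetResultRelInfoLazily(ModifyTableState *mtstate, Oid relid)
	{
		bool		found;
		ResultRelOidEntry *entry;

		entry = (ResultRelOidEntry *)
			hash_search(mtstate->mt_resultOidHash,	/* invented field */
						&relid, HASH_ENTER, &found);
		if (!found)
			entry->rri = BuildResultRelInfo(mtstate, relid);	/* invented */

		return entry->rri;
	}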

However, there's one new loop in ExecInitModifyTable() added by the
patches at [1] that loops over all partitions, which I haven't been
able to eliminate so far and I'm seeing it cause significant
bottleneck at higher partition counts.  The loop is meant to create a
hash table that maps result relation OIDs to their offsets in the
PlannedStmt.resultRelations list.  We need this mapping, because the
ResultRelInfos are accessed from the query-global array using that
offset.  One approach that was mentioned by David Rowley at [1] to not
have do this mapping is to make the result relation's scan node's
targetlist emit the relation's RT index or ordinal position to begin
with, instead of the table OID, but I haven't figured out a way to do
that.

Having taken care of the ModifyTable overheads (except the one
mentioned in the last paragraph), a few more bottlenecks are seen to
pop up at higher partition counts.  Basically, they result from doing
some pre-execution actions on relations contained in the plan by
traversing the flat range table in whole.

1. AcquireExecutorLocks(): locks *all* partitions before executing the
plan tree but runtime pruning allows to skip scanning all but one

2. ExecCheckRTPerms(): checks permissions of *all* partitions before
executing the plan tree, but maybe it's okay to check only the ones
that will be accessed

Problem 1 has been discussed before and David Rowley even developed a
patch that was discussed at [2].  The approach taken in the patch was
to delay locking of the partitions contained in a generic plan that
are potentially runtime pruneable, although as also described in the
linked thread, that approach has a race condition whereby a concurrent
session may invalidate the generic plan by altering a partition in the
window between when AcquireExecutorLocks() runs on the plan and the
plan is executed.

Another solution suggested to me by Robert Haas in an off-list
discussion is to teach AcquireExecutorLocks() or the nearby code to
perform EXTERN parameter based pruning before passing the plan tree to
the executor and lock partitions that survive that pruning.  It's
perhaps doable if we refactor the ExecFindInitialMatchingSubPlans() to
not require a full-blown 

Re: Online checksums patch - once again

2020-06-26 Thread Daniel Gustafsson
> On 26 Jun 2020, at 14:12, Robert Haas  wrote:
> 
> On Thu, Jun 25, 2020 at 5:43 AM Daniel Gustafsson  wrote:
>> Sorry being a bit thick, can you elaborate which case you're thinking about?
>> CREATE TABLE sets the attribute according to the value of data_checksums, and
>> before enabling checksums (and before changing data_checksums to inprogress)
>> the bgworker will update all relhaschecksums from true (if any) to false.  
>> Once
>> the state is set to inprogress all new relations will set relhaschecksums to
>> true.
> 
> Oh, I think I was the one who was confused. I guess relhaschecksums
> only really has meaning when we're in the process of enabling
> checksums? So if we're in that state, then the Boolean tells us
> whether a particular relation is done, and otherwise it doesn't
> matter?

That is correct (which is why the name is terrible since it doesn't convey
that at all).

cheers ./daniel



Re: should libpq also require TLSv1.2 by default?

2020-06-26 Thread Daniel Gustafsson
> On 26 Jun 2020, at 00:44, Tom Lane  wrote:

> My feeling now is that we'd be better off defaulting
> ssl_min_protocol_version to something nonempty, just to make this
> behavior platform-independent.  We certainly can't leave the docs
> as they are.

Yeah, given the consensus in this thread and your findings I think we should
default to TLSv1.2 as originally proposed.

I still think there will be instances of existing connections to old servers
that will all of a sudden break, but it's probably true that it's not a common
setup.  Optimizing for the majority and helping the minority with documentation
is IMO the winning move.

> Also, I confirm that the failure looks like
> 
> $ psql -h ... -d "dbname=postgres sslmode=require"
> psql: error: could not connect to server: SSL error: unsupported protocol
> 
> While that's not *that* awful, if you realize that "protocol" means
> TLS version, many people probably won't without a hint.  It does not
> help any that the message doesn't mention either the offered TLS version
> or the version limits being enforced.  I'm not sure we can do anything
> about the former, but reducing the number of variables affecting the
> latter seems like a smart idea.

+1

> BTW, the server-side report of the problem looks like
> 
> LOG:  could not accept SSL connection: wrong version number

I can totally see some thinking that it's the psql version at client side which
is referred to and not the TLS protocol version.  Perhaps we should add a hint
there as well?

cheers ./daniel



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-06-26 Thread Daniel Gustafsson
> On 26 Jun 2020, at 10:11, Michael Paquier  wrote:
> 
> On Thu, Jun 25, 2020 at 09:38:23AM +0200, Daniel Gustafsson wrote:
>> Attached is a rebased version which was updated to handle the changes for op
>> class parameters introduced in 911e70207703799605.
> 
> Thanks for the updated version.

Thanks for reviewing!

> While re-reading the code, I got cold feet with the changes done in
> recordDependencyOn().  Sure, we could do as you propose, but that does
> not have to be part of this patch I think, aimed at switching more
> catalogs to use multi inserts, and it just feels a duplicate of
> recordMultipleDependencies(), with the same comments copied all over
> the place, etc.

Fair enough, I can take that to another patch later in the cycle.

> MAX_TEMPLATE_BYTES in pg_shdepend.c needs a comment to explain that
> this is to cap the number of slots used in
> copyTemplateDependencies() for pg_shdepend.

Agreed, +1 on the proposed wording.

> Not much a fan of the changes in GenerateTypeDependencies(),
> particularly the use of refobjs[8], capped to the number of items from
> typeForm.  If we add new members I think that this would break
> easily without us actually noticing that it broke.  

Yeah, thats not good, it's better to leave that out.

> The use of
> ObjectAddressSet() is a good idea though, but it does not feel
> consistent if you don't apply the same coding rule to typbasetype,
> typcollation or typelem.  I am also thinking to split this part of the
> cleanup in a first independent patch.

+1 on splitting into a separate patch.

> pg_constraint.c, pg_operator.c, extension.c and pg_aggregate.c were
> using ObjectAddressSubSet() with subset set to 0 when registering a
> dependency.  It is simpler to just use ObjectAddressSet().

Fair enough, either way, I don't have strong opinions.

> As this
> updates the way dependencies are tracked and recorded, that's better
> if kept in the main patch.

Agreed.

> +   /* TODO is nreferenced a reasonable allocation of slots? */
> +   slot = palloc(sizeof(TupleTableSlot *) * nreferenced);
> It seems to me that we could just apply the same rule as for
> pg_attribute and pg_shdepend, no?

I think so, I see no reason not to.

> CatalogTupleInsertWithInfo() becomes mostly unused with this patch,
> its only caller being now LOs.  Just noticing, I'd rather not remove
> it for now.

Agreed, let's not bite off that too here, there's enough to chew on.

> The attached includes a bunch of modifications I have done while going
> through the patch (I indend to split and apply the changes of
> pg_type.c separately first, just lacked of time now to send a proper
> split), and there is the number of slots for pg_depend insertions that
> still needs to be addressed.  On top of that pgindent has not been run
> yet.  That's all I have for today, overall the patch is taking a
> committable shape :)

I like it, thanks for hacking on it.  I will take another look at it later
today when back at my laptop.

cheers ./daniel





Re: pgsql: Enable Unix-domain sockets support on Windows

2020-06-26 Thread Amit Kapila
On Sat, Mar 28, 2020 at 7:37 PM Peter Eisentraut  wrote:
>
> Enable Unix-domain sockets support on Windows
>

+
+/*
+ * Windows headers don't define this structure, but you can define it yourself
+ * to use the functionality.
+ */
+struct sockaddr_un
+{
+   unsigned short sun_family;
+   char sun_path[108];
+};

I was going through this feature and reading about Windows support for
it.  I came across a few links which suggest that this structure is
defined in <afunix.h>.  Is there a reason for not using this via
afunix.h?
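
Something along these lines is what I had in mind (HAVE_AFUNIX_H being a
hypothetical configure-detected symbol, used here only for illustration):

	#ifdef HAVE_AFUNIX_H
	#include <afunix.h>
	#else
	/* Fallback for SDKs that don't ship the header */
	struct sockaddr_un
	{
		unsigned short sun_family;
		char		sun_path[108];
	};
	#endif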

[1] - https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/
[2] - https://gist.github.com/NZSmartie/079d8f894ee94f3035306cb23d49addc

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Online checksums patch - once again

2020-06-26 Thread Robert Haas
On Thu, Jun 25, 2020 at 5:43 AM Daniel Gustafsson  wrote:
> Sorry being a bit thick, can you elaborate which case you're thinking about?
> CREATE TABLE sets the attribute according to the value of data_checksums, and
> before enabling checksums (and before changing data_checksums to inprogress)
> the bgworker will update all relhaschecksums from true (if any) to false.  
> Once
> the state is set to inprogress all new relations will set relhaschecksums to
> true.

Oh, I think I was the one who was confused. I guess relhaschecksums
only really has meaning when we're in the process of enabling
checksums? So if we're in that state, then the Boolean tells us
whether a particular relation is done, and otherwise it doesn't
matter?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: TAP tests and symlinks on Windows

2020-06-26 Thread Peter Eisentraut

On 2020-06-23 12:55, Michael Paquier wrote:

I have implemented a patch based on the feedback received that does
the following, tested with all three patterns (MSVC only on Windows):
- Assume that all non-Windows platform have a proper symlink
implementation for perl.
- If on Windows, check for the presence of Win32::Symlink:
-- If the module is not detected, skip the tests not supported.
-- If the module is detected, run them.


We should be more accurate about things like this:

+# The following tests test symlinks. Windows may not have symlinks, so
+# skip there.

The issue isn't whether Windows has symlinks, since all versions of 
Windows supported by PostgreSQL do (AFAIK).  The issue is only whether 
the Perl installation that runs the tests has symlink support.  And that 
is only necessary if the test itself wants to create or inspect 
symlinks.  For example, there are existing tests involving tablespaces 
that work just fine on Windows.


Relatedly, your patch ends up skipping the tests on MSYS2, even though 
Perl supports symlinks there out of the box.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Windows regress fails (latest HEAD)

2020-06-26 Thread Ranier Vilela
On Thu, Jun 11, 2020 at 10:28 AM Ranier Vilela 
wrote:

> On Thu, Jun 11, 2020 at 10:01 AM Andrew Dunstan <
> andrew.duns...@2ndquadrant.com> wrote:
>
>>
>> On 6/11/20 8:52 AM, Ranier Vilela wrote:
>> > Hi,
>> > Latest HEAD, fails with windows regress tests.
>> >
>> >  float8   ... FAILED  517 ms
>> >  partition_prune  ... FAILED 3085 ms
>> >
>> >
>>
>> The first thing you should do when you find this is to see if there is a
>> buildfarm report of the failure. If there isn't then try to work out why.
>>
> Sorry, I will have to research the buildfarm, I have no reference to it.
>
>
>>
>> Also, when making a report like this, it is essential to let us know
>> things like:
>>
>>   * which commit is causing the failure (git bisect is good for finding
>> this)
>>
> Thanks for hit (git bisect).
>
>
>>   * what Windows version you're testing on
>>
> Windows 10 (2004)
>
>   * which compiler you're using
>>
> msvc 2019 (64 bits)
>

Just for the record, in case anyone else is using msvc 2019:
I'm using the latest msvc 2019 64 bits (16.6.0).
Probably this is a compiler optimization bug.
vcregress check with a DEBUG build passes all 200 tests.

regards,
Ranier Vilela


Re: Remove a redundant condition check

2020-06-26 Thread Ranier Vilela
On Fri, Jun 26, 2020 at 6:09 AM Amit Kapila 
wrote:

> On Thu, Jun 25, 2020 at 11:23 PM Ádám Balogh 
> wrote:
> >
> >
> > A one line change to remove a duplicate check. This duplicate check was
> detected during testing my contribution to a static code analysis tool.
> There is no functional change, no new tests needed.
> >
> >
>
> Yeah, this duplicate check is added as part of commit b2a5545bd6.  See
> below part of change.
>
> - /*
> - * If this record was a timeline switch, wake up any
> - * walsenders to notice that we are on a new timeline.
> - */
> - if (switchedTLI && AllowCascadeReplication())
> - WalSndWakeup();
> + /* Is this a timeline switch? */
> + if (switchedTLI)
> + {
> + /*
> + * Before we continue on the new timeline, clean up any
> + * (possibly bogus) future WAL segments on the old timeline.
> + */
> + RemoveNonParentXlogFiles(EndRecPtr, ThisTimeLineID);
> +
> + /*
> + * Wake up any walsenders to notice that we are on a new
> + * timeline.
> + */
> + if (switchedTLI && AllowCascadeReplication())
> + WalSndWakeup();
> + }
>
> It seems we forgot to remove the additional check for switchedTLI
> while adding a new check.  I think we can remove this duplicate check
> in the HEAD code.  I am not sure if it is worth to backpatch such a
> change.
>
+1
Great to know that this is finally going to be fixed. (1)

regards,
Ranier Vilela
1.
https://www.postgresql.org/message-id/CAEudQAocMqfqt0t64HNo39Z73jMey60WmeryB%2BWFDg3BZpCf%3Dg%40mail.gmail.com


Re: [Patch] ALTER SYSTEM READ ONLY

2020-06-26 Thread Amul Sul
On Fri, Jun 26, 2020 at 12:15 PM Michael Banck
 wrote:
>
> Hi,
>
> On Wed, Jun 24, 2020 at 01:54:29PM +0530, tushar wrote:
> > On 6/22/20 11:59 AM, Amul Sul wrote:
> > > 2. Now skipping the startup checkpoint if the system is read-only mode, as
> > > discussed [2].
> >
> > I am not able to perform pg_checksums o/p after shutting down my server in
> > read only  mode .
> >
> > Steps -
> >
> > 1.initdb (./initdb -k -D data)
> > 2.start the server(./pg_ctl -D data start)
> > 3.connect to psql (./psql postgres)
> > 4.Fire query (alter system read only;)
> > 5.shutdown the server(./pg_ctl -D data stop)
> > 6.pg_checksums
> >
> > [edb@tushar-ldap-docker bin]$ ./pg_checksums -D data
> > pg_checksums: error: cluster must be shut down
> > [edb@tushar-ldap-docker bin]$
>
> What's the 'Database cluster state' from pg_controldata at this point?
>
"in production"

Regards,
Amul




Re: Remove a redundant condition check

2020-06-26 Thread Michael Paquier
On Fri, Jun 26, 2020 at 02:39:22PM +0530, Amit Kapila wrote:
> It seems we forgot to remove the additional check for switchedTLI
> while adding a new check.  I think we can remove this duplicate check
> in the HEAD code.  I am not sure if it is worth to backpatch such a
> change.

Yes, there is no point to keep this check so let's clean up this
code.  I also see no need to do a backpatch here, this is purely
cosmetic.
--
Michael


signature.asc
Description: PGP signature


Re: [Patch] ALTER SYSTEM READ ONLY

2020-06-26 Thread Michael Paquier
On Fri, Jun 26, 2020 at 10:11:41AM +0530, Amul Sul wrote:
> I think that's expected since the server isn't clean shutdown, similar error 
> can
> be seen with any server which has been shutdown in immediate mode
> (pg_clt -D data_dir -m i).

Any operation working on on-disk relation blocks needs to have a
consistent state, and a clean shutdown gives this guarantee thanks to
the shutdown checkpoint (see also pg_rewind).  There are two states in
the control file, shutdown for a primary and shutdown while in
recovery to cover that.  So if you stop the server cleanly but fail to
see a proper state with pg_checksums, it seems to me that the proposed
patch does not handle correctly the state of the cluster in the
control file at shutdown.  That's not good.
--
Michael


signature.asc
Description: PGP signature


[PATCH] Remove Extra palloc Of raw_buf For Binary Format In COPY FROM

2020-06-26 Thread Bharath Rupireddy
Hi Hackers,

There seems to be an extra palloc of 64KB for raw_buf for binary format
files, which is not required as the copy logic for binary files doesn't
use raw_buf; instead, attribute_buf is used in CopyReadBinaryAttribute.

Attached is a patch, which places a check to avoid this unnecessary 64KB palloc.

Request the community to take this patch, if it is useful.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Remove-Extra-palloc-Of-raw_buf-For-Binary-Format-.patch
Description: Binary data


Re: PostgreSQL 13 Beta 1 Release Announcement Draft

2020-06-26 Thread Sébastien Lardière
On 19/05/2020 04:29, Jonathan S. Katz wrote:
> Hi,
>
> Attached is a draft of the release announcement for the PostgreSQL 13
> Beta 1 release this week.
>
>
Hi,

Maybe I'm too late, but in this paragraph:


> `psql` now includes the `\warn` command that is similar to the `\echo`
command
> in terms of outputting data, except `\warn` sends it to stderr. And in
case you
> need additional guidance on any of the PostgreSQL commands, the
`--help` flag
> now includes a link to
[https://www.postgresql.org](https://www.postgresql.org).

shouldn't --help be /help?

Same thing in the release note
(https://www.postgresql.org/docs/13/release-13.html) :

> Add the PostgreSQL home page to command-line |--help| output (Peter
Eisentraut)

as it probably refers to 27f3dea64833d68c1fa08c1e5d26176a579f69c8, doesn't
it?

regards,

-- 
Sébastien



Re: Remove a redundant condition check

2020-06-26 Thread Amit Kapila
On Thu, Jun 25, 2020 at 11:23 PM Ádám Balogh  wrote:
>
>
> A one line change to remove a duplicate check. This duplicate check was 
> detected during testing my contribution to a static code analysis tool. There 
> is no functional change, no new tests needed.
>
>

Yeah, this duplicate check is added as part of commit b2a5545bd6.  See
below part of change.

- /*
- * If this record was a timeline switch, wake up any
- * walsenders to notice that we are on a new timeline.
- */
- if (switchedTLI && AllowCascadeReplication())
- WalSndWakeup();
+ /* Is this a timeline switch? */
+ if (switchedTLI)
+ {
+ /*
+ * Before we continue on the new timeline, clean up any
+ * (possibly bogus) future WAL segments on the old timeline.
+ */
+ RemoveNonParentXlogFiles(EndRecPtr, ThisTimeLineID);
+
+ /*
+ * Wake up any walsenders to notice that we are on a new
+ * timeline.
+ */
+ if (switchedTLI && AllowCascadeReplication())
+ WalSndWakeup();
+ }

It seems we forgot to remove the additional check for switchedTLI
while adding a new check.  I think we can remove this duplicate check
in the HEAD code.  I am not sure if it is worth to backpatch such a
change.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-06-26 Thread Magnus Hagander
On Tue, Jun 23, 2020 at 12:18 PM Tomas Vondra 
wrote:

> On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
> >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada
> > wrote:
> >>
> >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
> >> >
> >> > >
> >> > >What if the decoding has been performed by multiple backends using
> the
> >> > >same slot?  In that case, it will be difficult to make the judgment
> >> > >for the value of logical_decoding_work_mem based on stats.  It would
> >> > >make sense if we provide a way to set logical_decoding_work_mem for a
> >> > >slot but not sure if that is better than what we have now.
> >> > >
> >>
> >> I thought that the stats are relevant to what
> >> logical_decoding_work_mem value was but not with who performed logical
> >> decoding. So even if multiple backends perform logical decoding using
> >> the same slot, the user can directly use stats as long as
> >> logical_decoding_work_mem value doesn’t change.
> >>
> >
> >I think if you maintain these stats at the slot level, you probably
> >need to use spinlock or atomic ops in order to update those as slots
> >can be used from multiple backends whereas currently, we don't need
> >that.
>
> IMHO storing the stats in the slot itself is a bad idea. We have the
> statistics collector for exactly this purpose, and it's receiving data
> over UDP without any extra locking etc.
>

Yeah, that seems much more appropriate. Of course, where they are exposed
is a different question.


>> > >What problems do we see in displaying these for each process?  I think
> >> > >users might want to see the stats for the exited processes or after
> >> > >server restart but I think both of those are not even possible today.
> >> > >I think the stats are available till the corresponding WALSender
> >> > >process is active.
> >>
> >> I might want to see the stats for the exited processes or after server
> >> restart. But I'm inclined to agree with displaying the stats per
> >> process if the stats are displayed on a separate view (e.g.
> >> pg_stat_replication_slots).
> >>
> >
> >Yeah, as told previously, this makes more sense to me.
> >
> >Do you think we should try to write a POC patch using a per-process
> >entry approach and see what difficulties we are facing and does it
> >give the stats in a way we are imagining but OTOH, we can wait for
> >some more to see if there is clear winner approach here?
> >
>
> I may be missing something obvious, but I still see no point in tracking
> per-process stats. We don't have that for other stats, and I'm not sure
> how common is the scenario when a given slot is decoded by many
> backends. I'd say vast majority of cases are simply running decoding
> from a walsender, which may occasionally restart, but I doubt the users
> are interested in per-pid data - they probably want aggregated data.
>

Well, technically we do -- we have the pg_stat_xact_* views. However, those
are only viewable from *inside* the session itself (which can sometimes be
quite annoying).

This does somewhat apply in that normal transactions send their stats
batches at transaction end. If this is data we'd be interested in viewing
inside of that, a more direct exposure would be needed -- such as the way
we do with LSNs in pg_stat_replication or whatever.

For long-term monitoring, people definitely want aggregate data I'd say.
The "realtime data" if we call it that is in my experience mostly
interesting if you want to define alerts etc ("replication standby is too
far behind" is alertable through that, whereas things like "total amount of
replication traffic over the past hour" is something that's more
trend-alertable which is typically handled in a separate system pulling the
aggregate stats)


Can someone explain a plausible scenario for which tracking per-process
> stats would be needed, and simply computing deltas would not work? How
> will you know which old PID is which, what will you do when a PID is
> reused, and so on?
>

I fail to see that one as well, in a real-world scenario. Maybe if you want
to do a one-off point-tuning of one tiny piece of a system? But then you
will also need the long-term statistics to follow up on whether what you
did was correct anyway...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Resetting spilled txn statistics in pg_stat_replication

2020-06-26 Thread Amit Kapila
On Fri, Jun 26, 2020 at 11:31 AM Masahiko Sawada
 wrote:
>
> On Thu, 25 Jun 2020 at 19:35, Amit Kapila  wrote:
> >
> > On Tue, Jun 23, 2020 at 6:39 PM Amit Kapila  wrote:
> > >
> > > On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra
> > >  wrote:
> > > >
> > > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
> > > > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada
> > > > > wrote:
> > > > >>
> > > > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra 
> > > > >>  wrote:
> > > > >> >
> > > > >> > >
> > > > >> > >What if the decoding has been performed by multiple backends 
> > > > >> > >using the
> > > > >> > >same slot?  In that case, it will be difficult to make the 
> > > > >> > >judgment
> > > > >> > >for the value of logical_decoding_work_mem based on stats.  It 
> > > > >> > >would
> > > > >> > >make sense if we provide a way to set logical_decoding_work_mem 
> > > > >> > >for a
> > > > >> > >slot but not sure if that is better than what we have now.
> > > > >> > >
> > > > >>
> > > > >> I thought that the stats are relevant to what
> > > > >> logical_decoding_work_mem value was but not with who performed 
> > > > >> logical
> > > > >> decoding. So even if multiple backends perform logical decoding using
> > > > >> the same slot, the user can directly use stats as long as
> > > > >> logical_decoding_work_mem value doesn’t change.
> > > > >>
> >
> > Today, I thought about it again, and if we consider the point that
> > logical_decoding_work_mem value doesn’t change much then having the
> > stats at slot-level would also allow computing
> > logical_decoding_work_mem based on stats.  Do you think it is a
> > reasonable assumption that users won't change
> > logical_decoding_work_mem for different processes (WALSender, etc.)?
>
> FWIW, if we use logical_decoding_work_mem as a threshold of starting
> of sending changes to a subscriber, I think there might be use cases
> where the user wants to set different logical_decoding_work_mem values
> to different wal senders. For example, setting a lower value to
> minimize the latency of synchronous logical replication to a near-site
> whereas setting a large value to minimize the amount of data sent to a
> far site.
>

How can setting a large value minimize the amount of data sent?
One possibility is if there are a lot of transaction aborts and
the transactions are not large enough to cross the
logical_decoding_work_mem threshold, but such cases shouldn't be many.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Transactions involving multiple postgres foreign servers, take 2

2020-06-26 Thread Amit Kapila
On Fri, Jun 26, 2020 at 10:50 AM Masahiko Sawada
 wrote:
>
> On Tue, 23 Jun 2020 at 13:26, Amit Kapila  wrote:
> >
> >
> > I think at this stage it is important that we do some study of various
> > approaches to achieve this work and come up with a comparison of the
> > pros and cons of each approach (a) what this patch provides, (b) what
> > is implemented in Global Snapshots patch [1], (c) if possible, what is
> > implemented in Postgres-XL.  I fear that if go too far in spending
> > effort on this and later discovered that it can be better done via
> > some other available patch/work (maybe due to a reasons like that
> > approach can easily extended to provide atomic visibility or the
> > design is more robust, etc.) then it can lead to a lot of rework.
>
> Yeah, I have no objection to that plan but I think we also need to
> keep in mind that (b), (c), and whatever we are thinking about global
> consistency are talking about only PostgreSQL (and postgres_fdw).
>

I think we should explore whether those approaches could be extended
for FDWs, and if not, then that could be considered a disadvantage of
that approach.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Creating a function for exposing memory usage of backend process

2020-06-26 Thread Kasahara Tatsuhito
Hi,

On Fri, Jun 26, 2020 at 3:42 PM Bharath Rupireddy
 wrote:
> While going through the mail chain on relation, plan and catalogue
> caching [1], I'm wondering whether there is a way to know the
> current relation, plan and catalogue cache sizes. If there is a way
> already, please ignore this, and I would be grateful if someone could
> point me to that.
AFAIK, the only way so far to get statistics on a PostgreSQL backend's
internal local memory usage is to call MemoryContextStats() via gdb,
which writes the information to the server log.
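
For reference, a typical session looks roughly like this (the backend
PID is a placeholder for whichever process you want to inspect):

  $ gdb -p <backend PID>
  (gdb) call MemoryContextStats(TopMemoryContext)
  (gdb) detach
  (gdb) quit

The per-context statistics are written to that backend's stderr, so
they end up in the server log.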

> If there is no such way to know the cache sizes and other info, such
> as statistics, number of entries, cache misses, hits, etc., can the
> approach discussed here be applied?
I think the answer is partially yes.

> If the user knows the cache statistics and other information, maybe
> we can allow the user to take appropriate actions, such as deleting a
> few entries through a command or some other way.
Yeah, one of the purposes of the features we are discussing here is to
use them in such situations.

Regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: PG 13 release notes, first draft

2020-06-26 Thread Masahiko Sawada
Hi,

I realized that the PG 13 release notes still have the following entry:



  
   Add FOREIGN to ALTER statements,
   if appropriate (Luis Carril)
  

  
   WHAT IS THIS ABOUT?
  
 



IIUC this entry is about pg_dump adding the FOREIGN keyword to ALTER
TABLE commands for foreign tables. Please find the attached patch.
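
For context, if I understand the change correctly, the ALTER statements
that pg_dump emits for a foreign table now spell out ALTER FOREIGN TABLE
instead of plain ALTER TABLE, e.g. (table and column names here are
just placeholders):

  ALTER FOREIGN TABLE public.ft ALTER COLUMN c SET DEFAULT 0;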

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pg13_release_note.patch
Description: Binary data


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-06-26 Thread Michael Paquier
On Thu, Jun 25, 2020 at 09:38:23AM +0200, Daniel Gustafsson wrote:
> Attached is a rebased version which was updated to handle the changes for op
> class parameters introduced in 911e70207703799605.

Thanks for the updated version.

While re-reading the code, I got cold feet about the changes done in
recordDependencyOn().  Sure, we could do as you propose, but I think
that does not have to be part of this patch, which is aimed at
switching more catalogs to use multi inserts, and it just feels like a
duplicate of recordMultipleDependencies(), with the same comments
copied all over the place, etc.

MAX_TEMPLATE_BYTES in pg_shdepend.c needs a comment to explain that
this is to cap the number of slots used in
copyTemplateDependencies() for pg_shdepend.

I am not much of a fan of the changes in GenerateTypeDependencies(),
particularly the use of refobjs[8], capped to the number of items from
typeForm.  If we add new members, I think that this would break
easily without us actually noticing that it broke.  The use of
ObjectAddressSet() is a good idea though, but it does not feel
consistent if you don't apply the same coding rule to typbasetype,
typcollation or typelem.  I am also thinking of splitting this part of
the cleanup into a first independent patch.

pg_constraint.c, pg_operator.c, extension.c and pg_aggregate.c were
using ObjectAddressSubSet() with the subset set to 0 when registering a
dependency.  It is simpler to just use ObjectAddressSet().  As this
updates the way dependencies are tracked and recorded, that's better
kept in the main patch.
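
For illustration, the two forms are equivalent when the sub-object id
is 0 (the class id and OID below are just placeholders, not taken from
the patch):

  ObjectAddress referenced;

  /* what those call sites currently spell out */
  ObjectAddressSubSet(referenced, RelationRelationId, someOid, 0);

  /* simpler, equivalent form */
  ObjectAddressSet(referenced, RelationRelationId, someOid);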

+   /* TODO is nreferenced a reasonable allocation of slots? */
+   slot = palloc(sizeof(TupleTableSlot *) * nreferenced);
It seems to me that we could just apply the same rule as for
pg_attribute and pg_shdepend, no?
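
A rough sketch of what that rule could look like here, with
MAX_DEPEND_INSERT_BYTES being a made-up name for a byte budget similar
to what the patch uses for pg_attribute and pg_shdepend:

  #define MAX_DEPEND_INSERT_BYTES 65535

  /* cap the slot count by a byte budget instead of one slot per dependency */
  nslots = Min(nreferenced, MAX_DEPEND_INSERT_BYTES / sizeof(FormData_pg_depend));
  slot = palloc(sizeof(TupleTableSlot *) * nslots);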

CatalogTupleInsertWithInfo() becomes mostly unused with this patch,
its only remaining caller now being large objects.  Just noting that;
I'd rather not remove it for now.

The attached includes a bunch of modifications I have done while going
through the patch (I intend to split out and apply the changes to
pg_type.c separately first; I just lacked the time to send a proper
split now), and the number of slots for pg_depend insertions still
needs to be addressed.  On top of that, pgindent has not been run
yet.  That's all I have for today; overall the patch is taking a
committable shape :)
--
Michael
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index a8f7e9965b..b7626a7ecf 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -178,6 +178,8 @@ extern void record_object_address_dependencies(const ObjectAddress *depender,
 
 extern void sort_object_addresses(ObjectAddresses *addrs);
 
+extern void reset_object_addresses(ObjectAddresses *addrs);
+
 extern void free_object_addresses(ObjectAddresses *addrs);
 
 /* in pg_depend.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index cbfdfe2abe..d31141c1a2 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -93,10 +93,11 @@ extern void heap_truncate_check_FKs(List *relations, bool tempTables);
 
 extern List *heap_truncate_find_FKs(List *relationIds);
 
-extern void InsertPgAttributeTuple(Relation pg_attribute_rel,
-   Form_pg_attribute new_attribute,
-   Datum attoptions,
-   CatalogIndexState indstate);
+extern void InsertPgAttributeTuples(Relation pg_attribute_rel,
+	TupleDesc tupdesc,
+	Oid new_rel_oid,
+	Datum *attoptions,
+	CatalogIndexState indstate);
 
 extern void InsertPgClassTuple(Relation pg_class_desc,
 			   Relation new_rel_desc,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 8be303870f..a7e2a9b26b 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -19,6 +19,7 @@
 #define INDEXING_H
 
 #include "access/htup.h"
+#include "nodes/execnodes.h"
 #include "utils/relcache.h"
 
 /*
@@ -36,6 +37,10 @@ extern void CatalogCloseIndexes(CatalogIndexState indstate);
 extern void CatalogTupleInsert(Relation heapRel, HeapTuple tup);
 extern void CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
 	   CatalogIndexState indstate);
+extern void CatalogTuplesMultiInsertWithInfo(Relation heapRel,
+			 TupleTableSlot **slot,
+			 int ntuples,
+			 CatalogIndexState indstate);
 extern void CatalogTupleUpdate(Relation heapRel, ItemPointer otid,
 			   HeapTuple tup);
 extern void CatalogTupleUpdateWithInfo(Relation heapRel,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d1bb..3334bef458 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2168,10 +2168,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			RelationPutHeapTuple(relation, buffer, heaptup, false);
 
-			/*
-			 * We don't use heap_multi_insert for catalog tuples yet, but
-			 * better be prepared...
-			 

Re: [patch] demote

2020-06-26 Thread Kyotaro Horiguchi
Mmm. Fat finger..

At Fri, 26 Jun 2020 16:14:38 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Hello.
> 
> If we are going to dive so close to server shutdown, we can just
> utilize the restart-after-crash path, which we can assume to work
> reliably. The attached is a quite rough sketch of that, hijacking the
> smart shutdown path for convenience, but it seems to work.  "pg_ctl
> -m s -W stop" lets the server demote.
> 
> > I hadn't time to investigate Robert's concern about shared memory for 
> > snapshot
> > during recovery.
> 
> The patch does all required cleanup of resources including shared

The path does all required cleanup of..

> memory, I believe.  It's enough if we don't need to keep any resources
> alive?
> 
> > The patch doesn't deal with prepared xact yet. Testing 
> > "start->demote->promote"
> > raise an assert if some prepared xact exist. I suppose I will rollback them
> > during demote in next patch version.
> > 
> > I'm not sure how to divide this patch in multiple small independent steps. I
> > suppose I can split it like:
> > 
> > 1. add demote checkpoint
> > 2. support demote: mostly postmaster, startup/xlog and checkpointer related
> >code
> > 3. cli using pg_ctl demote
> > 
> > ...But I'm not sure it worth it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [patch] demote

2020-06-26 Thread Kyotaro Horiguchi
Hello.

At Thu, 25 Jun 2020 19:27:54 +0200, Jehan-Guillaume de Rorthais 
 wrote in 
> Here is a summary of my work during the last few days on this demote approach.
> 
> Please, find in attachment v2-0001-Demote-PoC.patch and the comments in the
> commit message and as FIXME in code.
> 
> The patch is not finished or bug-free yet, I'm still not very happy with the
> coding style, it probably lack some more code documentation, but a lot has
> changed since v1. It's still a PoC to push the discussion a bit further after
> being myself silent for some days.
> 
> The patch is currently relying on a demote checkpoint. I understand a forced
> checkpoint overhead can be massive and cause major wait/downtime. But I keep
> this for a later step. Maybe we should be able to cancel a running checkpoint?
> Or leave it to its synching work but discard the result without wirting it to
> XLog?

If we are going to dive so close to server shutdown, we can just
utilize the restart-after-crash path, which we can assume to work
reliably. The attached is a quite rough sketch of that, hijacking the
smart shutdown path for convenience, but it seems to work.  "pg_ctl
-m s -W stop" lets the server demote.

> I hadn't time to investigate Robert's concern about shared memory for snapshot
> during recovery.

The patch does all required cleanup of resources including shared
memory, I believe.  It's enough if we don't need to keep any resources
alive?

> The patch doesn't deal with prepared xact yet. Testing 
> "start->demote->promote"
> raise an assert if some prepared xact exist. I suppose I will rollback them
> during demote in next patch version.
> 
> I'm not sure how to divide this patch in multiple small independent steps. I
> suppose I can split it like:
> 
> 1. add demote checkpoint
> 2. support demote: mostly postmaster, startup/xlog and checkpointer related
>code
> 3. cli using pg_ctl demote
> 
> ...But I'm not sure it worth it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b4d475bb0b..a4adf3e587 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2752,6 +2752,7 @@ SIGHUP_handler(SIGNAL_ARGS)
 /*
  * pmdie -- signal handler for processing various postmaster signals.
  */
+static bool		demoting = false;
 static void
 pmdie(SIGNAL_ARGS)
 {
@@ -2774,59 +2775,17 @@ pmdie(SIGNAL_ARGS)
 		case SIGTERM:
 
 			/*
-			 * Smart Shutdown:
+			 * XXX: Hijacked as DEMOTE
 			 *
-			 * Wait for children to end their work, then shut down.
+			 * Runs fast shutdown, then restart as standby
 			 */
 			if (Shutdown >= SmartShutdown)
 break;
 			Shutdown = SmartShutdown;
 			ereport(LOG,
-	(errmsg("received smart shutdown request")));
-
-			/* Report status */
-			AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STOPPING);
-#ifdef USE_SYSTEMD
-			sd_notify(0, "STOPPING=1");
-#endif
-
-			if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
-			{
-/* autovac workers are told to shut down immediately */
-/* and bgworkers too; does this need tweaking? */
-SignalSomeChildren(SIGTERM,
-   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-/* and the autovac launcher too */
-if (AutoVacPID != 0)
-	signal_child(AutoVacPID, SIGTERM);
-/* and the bgwriter too */
-if (BgWriterPID != 0)
-	signal_child(BgWriterPID, SIGTERM);
-/* and the walwriter too */
-if (WalWriterPID != 0)
-	signal_child(WalWriterPID, SIGTERM);
-
-/*
- * If we're in recovery, we can't kill the startup process
- * right away, because at present doing so does not release
- * its locks.  We might want to change this in a future
- * release.  For the time being, the PM_WAIT_READONLY state
- * indicates that we're waiting for the regular (read only)
- * backends to die off; once they do, we'll kill the startup
- * and walreceiver processes.
- */
-pmState = (pmState == PM_RUN) ?
-	PM_WAIT_BACKUP : PM_WAIT_READONLY;
-			}
-
-			/*
-			 * Now wait for online backup mode to end and backends to exit. If
-			 * that is already the case, PostmasterStateMachine will take the
-			 * next step.
-			 */
-			PostmasterStateMachine();
-			break;
+	(errmsg("received demote request")));
+			demoting = true;
+			/* FALL THROUGH */
 
 		case SIGINT:
 
@@ -2839,8 +2798,10 @@ pmdie(SIGNAL_ARGS)
 			if (Shutdown >= FastShutdown)
 break;
 			Shutdown = FastShutdown;
-			ereport(LOG,
-	(errmsg("received fast shutdown request")));
+
+			if (!demoting)
+ereport(LOG,
+		(errmsg("received fast shutdown request")));
 
 			/* Report status */
 			AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STOPPING);
@@ -2887,6 +2848,13 @@ pmdie(SIGNAL_ARGS)
 pmState = PM_WAIT_BACKENDS;
 			}
 
+			/* create standby signal file */
+			{
+FILE *standby_file = 

Re: Why forbid "INSERT INTO t () VALUES ();"

2020-06-26 Thread Peter Eisentraut

On 2020-06-25 18:07, Tom Lane wrote:

So I'm still not convinced we should do this.  "MySQL is incapable
of conforming to the standard" is a really lousy reason for us to do
something.


Conformance to the standard means that the syntax described in the 
standard behaves as specified in the standard.  It doesn't mean you 
can't have additional syntax that is not in the standard.
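
For concreteness, the two forms in question are (t being any table
whose columns can all take their default values):

  INSERT INTO t DEFAULT VALUES;   -- standard syntax, accepted today
  INSERT INTO t () VALUES ();     -- non-standard form under discussion, currently rejected

so accepting the second form would be an addition on top of the
standard syntax, not a deviation from it.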


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] ALTER SYSTEM READ ONLY

2020-06-26 Thread Michael Banck
Hi,

On Wed, Jun 24, 2020 at 01:54:29PM +0530, tushar wrote:
> On 6/22/20 11:59 AM, Amul Sul wrote:
> > 2. Now skipping the startup checkpoint if the system is read-only mode, as
> > discussed [2].
> 
> I am not able to perform pg_checksums o/p after shutting down my server in
> read only  mode .
> 
> Steps -
> 
> 1.initdb (./initdb -k -D data)
> 2.start the server(./pg_ctl -D data start)
> 3.connect to psql (./psql postgres)
> 4.Fire query (alter system read only;)
> 5.shutdown the server(./pg_ctl -D data stop)
> 6.pg_checksums
> 
> [edb@tushar-ldap-docker bin]$ ./pg_checksums -D data
> pg_checksums: error: cluster must be shut down
> [edb@tushar-ldap-docker bin]$

What's the 'Database cluster state' from pg_controldata at this point?
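
(That can be checked with, e.g.:

  pg_controldata data | grep "Database cluster state"

assuming "data" is the data directory from the steps above.)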


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz




bugfix: invalid bit/varbit input causes the log file to be unreadable

2020-06-26 Thread Quan Zongliang


The bit/varbit type input functions cause file_fdw to fail to read the 
logfile normally.


1. Server conf:
server_encoding = UTF8
locale = zh_CN.UTF-8

2. Create external tables using file_fdw

CREATE EXTENSION file_fdw;
CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE pglog (
 log_time timestamp(3) with time zone,
 user_name text,
 database_name text,
 process_id integer,
 connection_from text,
 session_id text,
 session_line_num bigint,
 command_tag text,
 session_start_time timestamp with time zone,
 virtual_transaction_id text,
 transaction_id bigint,
 error_severity text,
 sql_state_code text,
 message text,
 detail text,
 hint text,
 internal_query text,
 internal_query_pos integer,
 context text,
 query text,
 query_pos integer,
 location text,
 application_name text
) SERVER pglog
OPTIONS ( filename 'log/postgresql-2020-06-16_213409.csv',
 format 'csv');

Everything works normally up to this point.

3. bit/varbit input
 select b'Ù';

The foreign table cannot be accessed. SELECT * FROM pglog will get:
invalid byte sequence for encoding "UTF8": 0xc3 0x22


The reason is that the error message in the bit_in / varbit_in
functions outputs the offending character directly using %c.  For a
multi-byte character this emits a single raw byte, so the log file can
no longer be decoded correctly.
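
To make the effect concrete (assuming a UTF-8 database, and going by
the patch below):

  SELECT b'Ù';
  -- before the patch: the message embeds the raw first byte of the
  -- multi-byte character, so the CSV log contains the invalid
  -- sequence 0xc3 0x22 reported above
  -- with the patch:  ERROR:  "0xC3" is not a valid binary digit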


The attachment is a patch.



diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c
index f0c6a44b84..506cc35446 100644
--- a/src/backend/utils/adt/varbit.c
+++ b/src/backend/utils/adt/varbit.c
@@ -230,8 +230,8 @@ bit_in(PG_FUNCTION_ARGS)
 		else if (*sp != '0')
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("\"%c\" is not a valid binary digit",
-							*sp)));
+					 errmsg("\"0x%02X\" is not a valid binary digit",
+							(unsigned char)*sp)));
 
 		x >>= 1;
 		if (x == 0)
@@ -531,8 +531,8 @@ varbit_in(PG_FUNCTION_ARGS)
 		else if (*sp != '0')
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("\"%c\" is not a valid binary digit",
-							*sp)));
+					 errmsg("\"0x%02X\" is not a valid binary digit",
+							(unsigned char)*sp)));
 
 		x >>= 1;
 		if (x == 0)


Re: Creating a function for exposing memory usage of backend process

2020-06-26 Thread Bharath Rupireddy
Hi,

While going through the mail chain on relation, plan and catalogue
caching [1], I'm wondering whether there is a way to know the
current relation, plan and catalogue cache sizes. If there is a way
already, please ignore this, and I would be grateful if someone could
point me to that.

Posting this here as I felt it's relevant.

If there is no such way to know the cache sizes and other info, such
as statistics, number of entries, cache misses, hits, etc., can the
approach discussed here be applied?

If the user knows the cache statistics and other information, maybe
we can allow the user to take appropriate actions, such as deleting a
few entries through a command or some other way.

I'm sorry if I'm diverting the topic being discussed in this mail
thread; please ignore this if it is irrelevant.

[1] - 
https://www.postgresql.org/message-id/flat/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-06-26 Thread Amit Kapila
On Thu, Jun 25, 2020 at 7:11 PM Dilip Kumar  wrote:
>
> On Wed, Jun 24, 2020 at 4:04 PM Amit Kapila  wrote:
> >
> >
> > Review comments on various patches.
> >
> > poc_shared_fileset_cleanup_on_procexit
> > =
> > 1.
> > - ent->subxact_fileset =
> > - MemoryContextAlloc(ApplyContext, sizeof(SharedFileSet));
> > + MemoryContext oldctx;
> >
> > + /* Shared fileset handle must be allocated in the persistent context */
> > + oldctx = MemoryContextSwitchTo(ApplyContext);
> > + ent->subxact_fileset = palloc(sizeof(SharedFileSet));
> >   SharedFileSetInit(ent->subxact_fileset, NULL);
> > + MemoryContextSwitchTo(oldctx);
> >   fd = BufFileCreateShared(ent->subxact_fileset, path);
> >
> > Why is this change required for this patch and why we only cover
> > SharedFileSetInit in the Apply context and not BufFileCreateShared?
> > The comment is also not very clear on this point.
>
> Added the comments for the same.
>

1.
+ /*
+ * Shared fileset handle must be allocated in the persistent context.
+ * Also, SharedFileSetInit allocate the memory for sharefileset list
+ * so we need to allocate that in the long term meemory context.
+ */

How about "We need to maintain shared fileset across multiple stream
open/close calls.  So, we allocate it in a persistent context."

2.
+ /*
+ * If the caller is following the dsm based cleanup then we don't
+ * maintain the filesetlist so return.
+ */
+ if (filesetlist == NULL)
+ return;

The check here should use 'NIL' instead of 'NULL'
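
i.e., something like:

  if (filesetlist == NIL)
      return;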

Other than that, the changes in this particular patch look good to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-06-26 Thread Masahiko Sawada
On Thu, 25 Jun 2020 at 19:35, Amit Kapila  wrote:
>
> On Tue, Jun 23, 2020 at 6:39 PM Amit Kapila  wrote:
> >
> > On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra
> >  wrote:
> > >
> > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
> > > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada
> > > > wrote:
> > > >>
> > > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra 
> > > >>  wrote:
> > > >> >
> > > >> > >
> > > >> > >What if the decoding has been performed by multiple backends using 
> > > >> > >the
> > > >> > >same slot?  In that case, it will be difficult to make the judgment
> > > >> > >for the value of logical_decoding_work_mem based on stats.  It would
> > > >> > >make sense if we provide a way to set logical_decoding_work_mem for 
> > > >> > >a
> > > >> > >slot but not sure if that is better than what we have now.
> > > >> > >
> > > >>
> > > >> I thought that the stats are relevant to what
> > > >> logical_decoding_work_mem value was but not with who performed logical
> > > >> decoding. So even if multiple backends perform logical decoding using
> > > >> the same slot, the user can directly use stats as long as
> > > >> logical_decoding_work_mem value doesn’t change.
> > > >>
>
> Today, I thought about it again, and if we consider the point that
> logical_decoding_work_mem value doesn’t change much then having the
> stats at slot-level would also allow computing
> logical_decoding_work_mem based on stats.  Do you think it is a
> reasonable assumption that users won't change
> logical_decoding_work_mem for different processes (WALSender, etc.)?

FWIW, if we use logical_decoding_work_mem as a threshold for starting
to send changes to a subscriber, I think there might be use cases
where the user wants to set different logical_decoding_work_mem values
for different wal senders. For example, setting a lower value to
minimize the latency of synchronous logical replication to a near site,
whereas setting a larger value to minimize the amount of data sent to a
far site.
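
For what it's worth, since logical_decoding_work_mem is an ordinary
user-settable GUC, per-walsender values should already be achievable
today by giving each subscriber its own replication role, e.g. (role
names below are made up):

  ALTER ROLE near_site_rep SET logical_decoding_work_mem = '4MB';
  ALTER ROLE far_site_rep SET logical_decoding_work_mem = '1GB';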

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services