date:20201218

On Fri, Dec 18, 2020 at 11:51:55AM +0200, Heikki Linnakangas wrote:
> On 18/12/2020 11:35, Heikki Linnakangas wrote:
> > BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
> > need two structs? They're both allocated and controlled by the
> > cryptohash implementation. It would seem simpler to have just one.
> 
> Something like this. Passes regression tests, but otherwise untested.

Interesting.  I have looked at that with a fresh mind, thanks for the
idea.  This reduces the number of allocations to one making the error
handling a no-brainer, at the cost of hiding the cryptohash type
directly to the caller.  I originally thought that this would be
useful as I recall reading cases in the OpenSSL code doing checks on
hash type used, but perhaps that's just some over-engineered thoughts
from my side.  I have found a couple of small-ish issues, please see
my comments below.

+   /*
+* FIXME: this always allocates enough space for the largest hash.
+* A smaller allocation would be enough for md5, sha224 and sha256.
+*/
I am not sure that this is worth complicating more, and we are not
talking about a lot of memory (sha512 requires 208 bytes, sha224/256
104 bytes, md5 96 bytes with a quick measurement).  This makes free()
equally more simple.  So just allocating the amount of memory based on
the max size in the union looks fine to me.  I would add a memset(0)
after this allocation though.

-#define ALLOC(size) palloc(size)
+#define ALLOC(size) MemoryContextAllocExtended(TopMemoryContext, size, 
MCXT_ALLOC_NO_OOM)
As the only allocation in TopMemoryContext is for the context, it
would be fine to not use MCXT_ALLOC_NO_OOM here, and fail so as
callers in the backend don't need to worry about create() returning
NULL.

-   state->evpctx = EVP_MD_CTX_create();
+   ctx->evpctx = EVP_MD_CTX_create();

-   if (state->evpctx == NULL)
+   if (ctx->evpctx == NULL)
{
If EVP_MD_CTX_create() fails, you would leak memory with the context
allocated in TopMemoryContext.  So this requires a free of the context
before the elog(ERROR).

+   /*
+* Make sure that the resource owner has space to remember this
+* reference. This can error out with "out of memory", so do this
+* before any other allocation to avoid leaking.
+*/
 #ifndef FRONTEND
 ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
 #endif
Right.  Good point.

-   /* OpenSSL internals return 1 on success, 0 on failure */
+   /* openssl internals return 1 on success, 0 on failure */
It seems to me that this was not wanted.

At the same time, I have taken care of your comment from upthread to
return a failure if the caller passes NULL for the context, and
adjusted some comments.  What do you think of the attached?
--
Michael
diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h
index 6ead1cb8e5..817e07c9a2 100644
--- a/src/include/common/cryptohash.h
+++ b/src/include/common/cryptohash.h
@@ -25,12 +25,8 @@ typedef enum
 	PG_SHA512
 } pg_cryptohash_type;
 
-typedef struct pg_cryptohash_ctx
-{
-	pg_cryptohash_type type;
-	/* private area used by each hash implementation */
-	void	   *data;
-} pg_cryptohash_ctx;
+/* opaque context, private to each cryptohash implementation */
+typedef struct pg_cryptohash_ctx pg_cryptohash_ctx;
 
 extern pg_cryptohash_ctx *pg_cryptohash_create(pg_cryptohash_type type);
 extern int	pg_cryptohash_init(pg_cryptohash_ctx *ctx);
diff --git a/src/common/cryptohash.c b/src/common/cryptohash.c
index cf4588bad7..f385bd778f 100644
--- a/src/common/cryptohash.c
+++ b/src/common/cryptohash.c
@@ -39,6 +39,21 @@
 #define FREE(ptr) free(ptr)
 #endif
 
+/* Internal pg_cryptohash_ctx structure */
+struct pg_cryptohash_ctx
+{
+	pg_cryptohash_type type;
+
+	union
+	{
+		pg_md5_ctx		md5;
+		pg_sha224_ctx	sha224;
+		pg_sha256_ctx	sha256;
+		pg_sha384_ctx	sha384;
+		pg_sha512_ctx	sha512;
+	} data;
+};
+
 /*
  * pg_cryptohash_create
  *
@@ -50,38 +65,18 @@ pg_cryptohash_create(pg_cryptohash_type type)
 {
 	pg_cryptohash_ctx *ctx;
 
+	/*
+	 * Note that this always allocates enough space for the largest hash.
+	 * A smaller allocation would be enough for md5, sha224 and sha256,
+	 * but the small extra amount of memory does not make it worth
+	 * complicating this code.
+	 */
 	ctx = ALLOC(sizeof(pg_cryptohash_ctx));
 	if (ctx == NULL)
 		return NULL;
-
+	memset(ctx, 0, sizeof(pg_cryptohash_ctx));
 	ctx->type = type;
 
-	switch (type)
-	{
-		case PG_MD5:
-			ctx->data = ALLOC(sizeof(pg_md5_ctx));
-			break;
-		case PG_SHA224:
-			ctx->data = ALLOC(sizeof(pg_sha224_ctx));
-			break;
-		case PG_SHA256:
-			ctx->data = ALLOC(sizeof(pg_sha256_ctx));
-			break;
-		case PG_SHA384:
-			ctx->data = ALLOC(sizeof(pg_sha384_ctx));
-			break;
-		case PG_SHA512:
-			ctx->data = ALLOC(sizeof(pg_sha512_ctx));
-			break;
-	}
-
-	if (ctx->data == NULL)
-	{
-		explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
-		FREE(ctx);
-		return NULL;
-	}
-
 	return ctx;
 }
 
@@ -95,24 +90,24

[PATCH] Identify LWLocks in tracepoints

2020-12-18 Thread Craig Ringer

Hi all

The attached patch set follows on from the discussion in [1] "Add LWLock
blocker(s) information" by adding the actual LWLock* and the numeric
tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.

This does not provide complete information on blockers, because it's not
necessarily valid to compare any two LWLock* pointers between two process
address spaces. The locks could be in DSM segments, and those DSM segments
could be mapped at different addresses.

I wasn't able to work out a sensible way to map a LWLock* to any sort of
(tranche-id, lock-index) because there's no requirement that locks in a
tranche be contiguous or known individually to the lmgr.

Despite that, the patches improve the information available for LWLock
analysis significantly.

Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
fired from LWLockWaitForVar, despite that function never actually acquiring
the lock.

Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
it possible to differentiate between individual locks within a tranche, and
(so long as they aren't tranches in a DSM segment) compare locks between
processes. That means you can do lock-order analysis etc, which was not
previously especially feasible. Traces also don't have to do userspace
reads for the tranche name all the time, so the trace can run with lower
overhead.

Patch 3 adds a single-path tracepoint for all lock acquires and releases,
so you only have to probe the lwlock__acquired and lwlock__release events
to see all acquires/releases, whether conditional or otherwise. It also
adds start markers that can be used for timing wallclock duration of LWLock
acquires/releases.

Patch 4 adds some comments on LWLock tranches to try to address some points
I found confusing and hard to understand when investigating this topic.



[1]
https://www.postgresql.org/message-id/CAGRY4nz%3DSEs3qc1R6xD3max7sg3kS-L81eJk2aLUWSQAeAFJTA%40mail.gmail.com
.
From 583c818e3121c0f7c6506b434497c81ae94ee9cb Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 19 Nov 2020 17:30:47 +0800
Subject: [PATCH v1 4/4] Comments on LWLock tranches

---
 src/backend/storage/lmgr/lwlock.c | 49 +--
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index cfdfa7f328..123bcc463e 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -112,11 +112,14 @@ extern slock_t *ShmemLock;
  *
  * 1. The individually-named locks defined in lwlocknames.h each have their
  * own tranche.  The names of these tranches appear in IndividualLWLockNames[]
- * in lwlocknames.c.
+ * in lwlocknames.c. The LWLock structs are allocated in MainLWLockArray.
  *
  * 2. There are some predefined tranches for built-in groups of locks.
  * These are listed in enum BuiltinTrancheIds in lwlock.h, and their names
- * appear in BuiltinTrancheNames[] below.
+ * appear in BuiltinTrancheNames[] below. The LWLock structs are allocated
+ * elsewhere under the control of the subsystem that manages the tranche. The
+ * LWLock code does not know or care where in shared memory they are allocated
+ * or how many there are in a tranche.
  *
  * 3. Extensions can create new tranches, via either RequestNamedLWLockTranche
  * or LWLockRegisterTranche.  The names of these that are known in the current
@@ -196,6 +199,13 @@ static int	LWLockTrancheNamesAllocated = 0;
  * This points to the main array of LWLocks in shared memory.  Backends inherit
  * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
  * where we have special measures to pass it down).
+ *
+ * This array holds individual LWLocks and LWLocks allocated in named tranches.
+ *
+ * It does not hold locks for any LWLock that's separately initialized with
+ * LWLockInitialize(). Locks in tranches listed in BuiltinTrancheIds or
+ * allocated with LWLockNewTrancheId() can be embedded in other structs
+ * anywhere in shared memory.
  */
 LWLockPadded *MainLWLockArray = NULL;
 
@@ -593,6 +603,12 @@ InitLWLockAccess(void)
  * Caller needs to retrieve the requested number of LWLocks starting from
  * the base lock address returned by this API.  This can be used for
  * tranches that are requested by using RequestNamedLWLockTranche() API.
+ *
+ * The locks are already initialized.
+ *
+ * This function can not be used for locks in builtin tranches or tranches
+ * registered with LWLockRegisterTranche(). There is no way to look those locks
+ * up by name.
  */
 LWLockPadded *
 GetNamedLWLockTranche(const char *tranche_name)
@@ -647,6 +663,14 @@ LWLockNewTrancheId(void)
  *
  * The tranche name will be user-visible as a wait event name, so try to
  * use a name that fits the style for those.
+ *
+ * The tranche ID should be a user-defined tranche ID acquired from
+ * LWLockNewTrancheId(). It is not necessary to call this for tranches
+ * allocated by RequestNamedLWLockTranche

Re: Commit fest manager for 2021-01

On Sat, Dec 19, 2020 at 10:03:47AM +0530, Amit Kapila wrote:
> Glad to hear. I am confident that you can do justice to this role.

I also think you will do just fine.  Thanks for taking care of this.
--
Michael


signature.asc
Description: PGP signature

Re: Commit fest manager for 2021-01

2020-12-18 Thread Amit Kapila

On Sat, Dec 19, 2020 at 9:10 AM Masahiko Sawada  wrote:
>
> Hi all,
>
> The next commit fest is going to begin in two weeks.
>
> I would like to volunteer as commit fest manager for 2021-01
>

Glad to hear. I am confident that you can do justice to this role.

 if the
> role is not filled and there are no objections.
>

I haven't seen anybody volunteering yet.

-- 
With Regards,
Amit Kapila.

Commit fest manager for 2021-01

2020-12-18 Thread Masahiko Sawada

Hi all,

The next commit fest is going to begin in two weeks.

I would like to volunteer as commit fest manager for 2021-01 if the
role is not filled and there are no objections.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

Re: Proposed patch for key managment

On Fri, Dec 18, 2020 at 04:36:24PM -0500, Stephen Frost wrote:
> > Starting with the passphrase key wrapper, since it's what's in place now.
> > 
> >  - cluster_key_challenge_command = 'password_key_wrapper %q'
> 
> I do tend to like this idea of having what
> cluster_key_challenge_command, or whatever we call it, expects is an
> actual key and have the command that is run be a separate command that
> takes the passphrase and runs the KDF (key derivation function) on it,
> when a passphrase is what the user wishes to use.
> 
> That generally makes more sense to me than having the key derivation
> effort built into the backend which I have a hard time seeing any
> particular reason for, as long we're already calling out to some
> external utility of some kind to get the key.

I have modified the patch to do as you suggested, and as you explained
to me on the phone.  :-)  Instead of having the server hash the pass
phrase provided by the cluster_passphrase_command, have the
cluster_passphrase_command generate a sha256 hash if the user-provided
input is not already 64 hex bytes.  If it is already 64 hex bytes, e.g,
via openssl rand -hex 32, no need to have the server hash that again. 
Double-hashing is less secure.

I have modified the attached patch to do that, and the scripts --- all
my tests pass.  FYI, I moved hex_decode to src/common so that front-end
could use it, and removed it from ecpg since ecpg can use the common one
now too.

> > Sorry, I wasn't trying to hand wave it away. For automated
> > interactions, like big iron HSMs or cloud KSMs, the difference between
> > 2 operations and 10 operations to start a DB server won't matter. For
> > an admin/operator having to type 10 passwords or get 10 clean
> > thumbprint scans, it would be horrible. My underlying question was, is
> > that toil the only problem to be solved, or is there another reason to
> > get into key combination, key splitting and the related issues which
> > are less documented and less well understood than key wrapping.
> 
> I appreciate that you're not trying to hand wave it away but this also
> didn't really answer the actual question- what's the advantage to having
> all of the keys provided by something external rather than having that
> external thing provide just one 'main' key, which we then use to decrypt
> our enveloped keys that we actually use?  I can think of some possible
> advantages but I'd really like to know what advantages you're seeing in
> doing that.

I am not going be as kind.  Our workflow is:

Desirability -> Design -> Implement -> Test -> Review -> Commit
https://wiki.postgresql.org/wiki/Todo#Development_Process

I have already asked about the first item, and received a reply talking
about the design --- that is not helpful.  I only have so much patience
for the "I want my secret decoder ring to glow in the dark" type of
feature additions to this already complex feature.  Unless we stay on
Desirability, I will not be replying.  If you can't answer the
Desirability question, well, talking about items farther right on the
list is not very helpful.

Now, I will say that your question about how a KMS will use this got me
thinking about how to test this, and that got me to implement the AWS
Secret Manager script, so that we definitely helpful.  My point is that
I don't think it is helpful to get into long discussions unless the
Desirability is clearly answered.  This is not just related to this
thread --- this kind of jump-over-Desirability has happened a lot in
relation to this feature, so I thought it would be good to clearly state
it now.

> > > This seems like a crux of at least one concern- that the current patch
> > > is deriving the actual KEK from the passphrase and not just taking the
> > > provided value (at least, that's what it looks like from a *very* quick
> > > look into it), and that's the part that I was suggesting that we might
> > > add an option for- to indicate if the cluster passphrase command is
> > > actually returning a passphrase which should be used to derive a key, or
> > > if it's returning a key directly to be used.  That does seem to be a
> > > material difference to me and one which we should care about.
> > 
> > Yes. Caring about that is the reason I've been making a nuisance of myself.
> 
> Right, I think we can agree on this aspect and I've chatted with Bruce
> about it some also.  When a direct key can be provided, we should use
> that, and not run it through a KDF.  This seems like a particularly
> important case that we should care about even at this early stage.

Agreed, and done in the attached patch.  :-)  I am thankful for all the
ideas that has helped move this feature forward.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

key.diff.gz
Description: application/gzip

key-alter.diff.gz
Description: application/gzip

pass_aws.sh

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

On Fri, Dec 18, 2020 at 01:04:27PM +0200, Heikki Linnakangas wrote:
> BTW, it's a bit weird that the pg_cryptohash_init/update/final() functions
> return success, if the ctx argument is NULL. It would seem more sensible for
> them to return an error.

Okay.

> That way, if a caller forgets to check for NULL
> result from pg_cryptohash_create(), but correctly checks the result of those
> other functions, it would catch the error. In fact, if we documented that
> pg_cryptohash_create() can return NULL, and pg_cryptohash_final() always
> returns error on NULL argument, then it would be sufficient for the callers
> to only check the return value of pg_cryptohash_final(). So the usage
> pattern would be:
> 
> ctx = pg_cryptohash_create(PG_MD5);
> pg_cryptohash_inti(ctx);
> pg_update(ctx, data, size);
> pg_update(ctx, moredata, size);
> if (pg_cryptohash_final(ctx, &hash) < 0)
> elog(ERROR, "md5 calculation failed");
> pg_cryptohash_free(ctx);

I'd rather keep the init and update routines to return an error code
directly.  This is more consistent with OpenSSL (note that libnss does
not return error codes for the init, update and final but it is
possible to grab for errors then react on that).  And we have even in
our tree code paths a-la-pgcrypto that have callbacks for each phase
with some processing in-between.  HMAC also gets a bit cleaner by
keeping this flexibility IMO.
--
Michael


signature.asc
Description: PGP signature

Re: Refactoring HMAC in the core code

On Fri, Dec 18, 2020 at 07:42:02PM -0500, Bruce Momjian wrote:
> > Please note that on a related thread that I have begun yesterday,
> > Heikki has suggested some changes in the way we handle the opaque data
> > used by each cryptohash implementation.
> > https://www.postgresql.org/message-id/6ebe7f1f-bf37-2688-2ac1-a081d2783...@iki.fi
> > 
> > As the design used on this thread for HMAC is similar to what I did
> > for cryptohashes, it would be good to conclude first on the interface
> > there, and then come back here so as a consistent design is used.  As
> > a whole, I don't think that there is any need to rush for this stuff.
> > I would rather wait more and make sure that we agree on an interface
> > people are happy enough with.
> 
> Others are waiting to continue working. I am not going to hold up a
> patch over a one function, two-line API issue.  I will deal with
> whatever new API you choose, and mine will work fine using the OpenSSL
> API until I do.

I will also point out that my patch is going to be bigger and bigger,
and harder to review, the longer I work on it.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Refactoring HMAC in the core code

On Sat, Dec 19, 2020 at 09:35:57AM +0900, Michael Paquier wrote:
> On Fri, Dec 18, 2020 at 10:48:00AM -0500, Bruce Momjian wrote:
> > Great.  See my questions in the key manager thread about whether I
> > should use the init/update/final API or just keep the one-line version.
> > As far as when to commit this, I think the quiet time is actually better
> > because if you break something, it is less of a disruption while you fix
> > it.
> 
> Please note that on a related thread that I have begun yesterday,
> Heikki has suggested some changes in the way we handle the opaque data
> used by each cryptohash implementation.
> https://www.postgresql.org/message-id/6ebe7f1f-bf37-2688-2ac1-a081d2783...@iki.fi
> 
> As the design used on this thread for HMAC is similar to what I did
> for cryptohashes, it would be good to conclude first on the interface
> there, and then come back here so as a consistent design is used.  As
> a whole, I don't think that there is any need to rush for this stuff.
> I would rather wait more and make sure that we agree on an interface
> people are happy enough with.

Others are waiting to continue working. I am not going to hold up a
patch over a one function, two-line API issue.  I will deal with
whatever new API you choose, and mine will work fine using the OpenSSL
API until I do.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Refactoring HMAC in the core code

On Fri, Dec 18, 2020 at 10:48:00AM -0500, Bruce Momjian wrote:
> Great.  See my questions in the key manager thread about whether I
> should use the init/update/final API or just keep the one-line version.
> As far as when to commit this, I think the quiet time is actually better
> because if you break something, it is less of a disruption while you fix
> it.

Please note that on a related thread that I have begun yesterday,
Heikki has suggested some changes in the way we handle the opaque data
used by each cryptohash implementation.
https://www.postgresql.org/message-id/6ebe7f1f-bf37-2688-2ac1-a081d2783...@iki.fi

As the design used on this thread for HMAC is similar to what I did
for cryptohashes, it would be good to conclude first on the interface
there, and then come back here so as a consistent design is used.  As
a whole, I don't think that there is any need to rush for this stuff.
I would rather wait more and make sure that we agree on an interface
people are happy enough with.
--
Michael


signature.asc
Description: PGP signature

Re: terminate called after throwing an instance of 'std::bad_alloc'

2020-12-18 Thread Justin Pryzby

I'm re-raising this issue (since I needed to track down why I'd reflexively
saved a 512g backup of the filesystem where this first happened).

I'm 99% sure the "bad_alloc" is from LLVM.  It happened multiple times on
different servers (running a similar report) after setting jit=on during pg13
upgrade, and never happened since re-setting jit=off.

I reproduced a memory leak, but I don't think you publicized a fix for it.

I *think* the memory leak probably explains the bad_alloc.  We were seeing
postgres killed by OOM every ~16hours, although I don't see how that's possible
from such a tiny leak.  Our query also uses postgis (at first I said it
didn't), so it may be that they're involved.

I'd be happy to run with a prototype fix for the leak to see if the other issue
does (not) recur.

On Wed, Oct 07, 2020 at 04:02:42PM -0700, Andres Freund wrote:
> Hi,
> 
> Sorry for the second send of this email, but somehow I managed to mangle
> the headers in the last version I sent.
> 
> On 2020-10-03 19:01:27 -0700, Andres Freund wrote:
> > On 2020-10-03 18:30:46 -0500, Justin Pryzby wrote:
> > > On Sat, Oct 03, 2020 at 04:01:49PM -0700, Andres Freund wrote:
> > > > > I was able to make a small, apparent leak like so.  This applies to
> > > > > postgis3.0/postgres12/LLVM5/centos7, and
> > > > > postgis3.1alpha/postgres13/LLVM9/centos8.
> > > > >
> > > > > Inlining is essential to see the leak, optimization is not.  Showing 
> > > > > every 9th
> > > > > query:
> > > > >
> > > > > $ for a in `seq 1 99`; do echo "SET jit_inline_above_cost=1; SET 
> > > > > jit_optimize_above_cost=-1; SET jit_above_cost=0; SET 
> > > > > jit_expressions=on; SET log_executor_stats=on; SET 
> > > > > client_min_messages=debug; SET jit=on; explain analyze SELECT 
> > > > > st_x(site_geo) FROM sites WHERE site_geo IS NOT NULL;"; done |psql ts 
> > > > > 2>&1 |grep 'resident' |sed -n '1~9p'
> > > > > !   46024 kB max resident size
> > > > > !   46492 kB max resident size
> > > >
> > > > Huh, that's certainly not good. Looking.
> > >
> > > Reproduce like:
> > > postgres=# CREATE EXTENSION postgis;
> > > postgres=# CREATE TABLE geo(g geometry(point));
> > > postgres=# INSERT INTO geo SELECT st_GeometryFromText('POINT(11 12)',0) 
> > > FROM generate_series(1,9);
> > > postgres=# SET jit_above_cost=0; SET jit_inline_above_cost=0; SET 
> > > client_min_messages=debug; SET log_executor_stats=on; explain analyze 
> > > SELECT st_x(g) FROM geo;
> >
> > It's an issue inside LLVM. I'm not yet sure how to avoid it. Basically
> > whenever we load an IR module to inline from, it renames its type names
> > to make them not conflict with already known type names in the current
> > LLVMContext. That, over time, leads to a large number of type names in
> > memory. We can avoid the re-loading of the module by instead cloning it,
> > but that still leads to new types being created for some internal
> > reasons.  It's possible to create/drop separate LLVMContext instances,
> > but we'd have to figure out when to do so - it's too expensive to always
> > do that.
> 
> I spent a lot of time (probably too much time) trying to find a good way
> out of that. I think I mostly figured it out, but it's gonna take me a
> bit longer to nail down the loose ends. The nice part is that it
> actually speeds up inlining a lot. The bad part is that it's not a small
> change, so I don't yet know how to deal with existing releases.
> 
> Details:
> 
> The issue with the leaks is basically that when LLVM loads a bitcode
> module it doesn't "unify" named types with types in other bitcode module
> - they could after all be entirely independent. That means that we end
> up with a lot of types after a while of running.
> 
> That ought to be addressable by not re-loading modules we inline from
> from disk every time we inline, but unfortunately that doesn't work well
> either: The "cross module import" logic unfortunately modifies the types
> used in the "source module", leading to subsequent imports not working
> well. That last issue is why I had moved to re-loading modules from
> "disk" during import.
> 
> The partial patch I've written doesn't use the existing cross module
> import code anymore, but moves to the lower level functionality it
> uses. For traditional compilers, which is what that linker is mainly
> written for, it makes sense to reduce memory usage by "destructively"
> moving code from the source into the target - it'll only be used
> once. But for the JIT case it's better to do so non-destructively. That
> requires us to implement some "type resolution" between the source and
> target modules, but then gains not needing to reload / copy the source
> module anymore, saving a lot of the costs.
> 
> 
> It's possible that for existing releases we ought to go for a simpler
> approach, even if it's not as good. The only really feasible way I see
> is to basically count the number of times inlining has been done, and
> then just re-initialize after a certain tim

Re: [PATCH] Implements SPIN_LOCK on ARM

David CARLIER  writes:
> Hi here a little update proposal for ARM architecture.

This sort of thing is not a "little proposal" where you can just
send in an unsupported patch and expect it to be accepted.
You need to provide some evidence that (a) it actually does anything
useful and (b) it isn't a net loss on some ARM architectures.

For comparison's sake, see

https://www.postgresql.org/message-id/flat/CAB10pyamDkTFWU_BVGeEVmkc8%3DEhgCjr6QBk02SCdJtKpHkdFw%40mail.gmail.com

where we still haven't pulled the trigger despite a great deal
more than zero testing.

FWIW, some casual googling suggests that ARM "yield" is not
all that much like x86 "pause": it supposedly encourages
the system to swap control away from the thread altogether,
exactly what we *don't* want in a spinloop.  So I'm a little
doubtful whether there's a case to be made for this at all.
But for sure, you haven't tried to make a case.

regards, tom lane

[PATCH] Implements SPIN_LOCK on ARM

2020-12-18 Thread David CARLIER

Hi here a little update proposal for ARM architecture.

Kind regards.


0001-Implements-SPIN_LOCK-macro-for-ARM.patch
Description: Binary data

Re: Proposed patch for key managment

2020-12-18 Thread Першин Юрий Петрович

Greetings Alastair,

* Alastair Turner (min...@decodable.me) wrote:
> On Wed, 16 Dec 2020 at 22:43, Stephen Frost  wrote:
> > If I'm following, you're suggesting something like:
> >
> > cluster_passphrase_command = 'aws get %q'
> >
> > and then '%q' gets replaced with "Please provide the WAL DEK: ", or
> > something like that?  Prompting the user for each key?  Not sure how
> > well that's work if want to automate everything though.
> >
> > If you have specific ideas, it'd really be helpful to give examples of
> > what you're thinking.
> 
> I can think of three specific ideas off the top of my head: the
> passphrase key wrapper, the secret store and the cloud/HW KMS.
> 
> Since the examples expand the purpose of cluster_passphrase_command,
> let's call it cluster_key_challenge_command in the examples.

These ideas don't seem too bad but I'd think we'd pass which key we want
on the command-line using a %i or something like that, rather than using
stdin, unless there's some reason that would be an issue..?  Certainly
the CLI utilities I've seen tend to expect the name of the secret that
you're asking for to be passed on the command line.

> Starting with the passphrase key wrapper, since it's what's in place now.
> 
>  - cluster_key_challenge_command = 'password_key_wrapper %q'

I do tend to like this idea of having what
cluster_key_challenge_command, or whatever we call it, expects is an
actual key and have the command that is run be a separate command that
takes the passphrase and runs the KDF (key derivation function) on it,
when a passphrase is what the user wishes to use.

That generally makes more sense to me than having the key derivation
effort built into the backend which I have a hard time seeing any
particular reason for, as long we're already calling out to some
external utility of some kind to get the key.

>  - Supplied on stdin to the process is the wrapped DEK (either a
> combined key for db files and WAL or one for each, on separate calls)
>  - %q is "Please provide WAL key wrapper password" or just "...provide
> key wrapper password"
>  - Returned on stdout is the unwrapped DEK

> For a secret store
> 
>  - cluster_key_challenge_command = 'vault_key_fetch'
>  - Supplied on stdin to the process is the secret's identifier 
> (pg_dek_xxUUIDxx)
>  - Returned on stdout is the DEK, which may never have been wrapped,
> depending on implementation
>  - Access control to the secret store is managed through the challenge
> command's own config, certs, HBA, ...

> For an HSM or cloud KMS
> 
>  - cluster_key_challenge_command = 'gcp_kms_key_fetch'
>  - Supplied on stdin to the process is the the wrapped DEK (individual
> or combined)
>  - Returned on stdout is the DEK (individual or combined)
>  - Access control to the kms is managed through the challenge
> command's own config, certs, HBA, ...
> 
> The secret store and HSM/KMS options may be promptless, and therefore
> amenable to automation, depending on the setup of those clients.

We can't really have what is passed on stdin, or not, be different
without having some other option say which it is and that seems like
it'd be making it overly complicated, and I get why Bruce would rather
not make this too complicated.

With the thought of trying to keep it a reasonably simple interface, I
had a thought along these lines:

- Separate command that runs the KDF, this is simple enough as it's
  really just a hash, and it returns the key on stdout.
- initdb time option that says if we're going to have PG manage the
  sub-keys, or not.
- cluster_key_command defined as "a command that is passed the ID of
  the key, or keys, required for the cluster to start"

initdb --pg-subkeys
  - Calls cluster_key_command once with "$PG_SYSTEM_ID-main" or similar
and expects the main key to be provided on stdout.  Everything else
is then more-or-less as is today: PG generates DEK sub-keys for data
and WAL and then encrypts them with the main key and stores them.

As long as the enveloped keys file exists on the filesystem, when PG
starts, it'll call the cluster_key_command and will expect the 'main'
key to be provided and it'll then decrypt and verify the DEK sub-keys,
very similar to today.

In this scenario, cluster_key_command might be set to call a command
which accepts a passphrase and runs a KDF on it, or it might be set to
call out to an external vaulting system or to a Yubikey, etc.

initdb --no-pg-subkeys
  - Calls cluster_key_command for each of the sub-keys that "pg-subkeys"
would normally generate itself, passing "$PG_SYSTEM_ID-keyid" for
each (eg: $PG_SYSTEM_ID-data, $PG_SYSTEM_ID-wal), and getting back
the keys on stdout to use.

When PG starts, it sees that the enveloped keys file doesn't exist and
does the same as initdb did- calls cluster_key_command multiple times to
get the keys which are needed.  We'd want to make sure to have a way
early on that checks that the data DEK provided actually decrypts the
cluster, and bail out otherw

libpq @windows : leaked singlethread_lock makes AppVerifier unhappy

HI

address d2be8fe8 found in
_DPH_HEAP_ROOT @ fe321000
in busy allocation (  DPH_HEAP_BLOCK: UserAddr UserSize -   
  VirtAddr VirtSize)
d2dd3444: d2be8fe8   18 -   
  d2be8000 2000
576bab70 verifier!AVrfDebugPageHeapAllocate+0x0240
77aa909b ntdll!RtlDebugAllocateHeap+0x0039
779fbbad ntdll!RtlpAllocateHeap+0x00ed
779fb0cf ntdll!RtlpAllocateHeapInternal+0x022f
779fae8e ntdll!RtlAllocateHeap+0x003e
5778aa2f vrfcore!VfCoreRtlAllocateHeap+0x001f
56b5256c vfbasics!AVrfpRtlAllocateHeap+0x00dc
7558a9f6 ucrtbase!_malloc_base+0x0026
56b538b8 vfbasics!AVrfp_ucrt_malloc+0x0038
*** WARNING: Unable to verify checksum for C:\Program Files 
(x86)\psqlODBC\1300\bin\LIBPQ.dll
d2d973ab LIBPQ!appendBinaryPQExpBuffer+0x016b
d2d855f3 LIBPQ!PQpingParams+0x02a3
d2d82406 LIBPQ!PQencryptPasswordConn+0x0346
d2d83c0d LIBPQ!PQconnectPoll+0x0c4d
d2d85822 LIBPQ!PQpingParams+0x04d2
d2d8463a LIBPQ!PQconnectdbParams+0x002a
*** WARNING: Unable to verify checksum for C:\Program Files 
(x86)\psqlODBC\1300\bin\psqlodbc35w.dll
51798c6c psqlodbc35w!LIBPQ_connect+0x051c 
[c:\mingw\git\psqlodbc-13.00.\connection.c @ 2879]
51795701 psqlodbc35w!CC_connect+0x00c1 
[c:\mingw\git\psqlodbc-13.00.\connection.c @ 1110]
517ac698 psqlodbc35w!PGAPI_DriverConnect+0x02f8 
[c:\mingw\git\psqlodbc-13.00.\drvconn.c @ 233]
517c3bad psqlodbc35w!SQLDriverConnectW+0x016d 
[c:\mingw\git\psqlodbc-13.00.\odbcapiw.c @ 163]
6f7733dc ODBC32!SQLInternalDriverConnectW+0x014c
6f770fb0 ODBC32!SQLDriverConnectW+0x0ac0
535d2a4a msdasql!CODBCHandle::OHDriverConnect+0x00da
535c554c msdasql!CImpIDBInitialize::Initialize+0x02ec
519c079a oledb32!CDBInitialize::DoInitialize+0x003b
519beaa4 oledb32!CDBInitialize::Initialize+0x0034
519c0d12 oledb32!CDCMPool::CreateResource+0x0162
51873f3b comsvcs!CHolder::SafeDispenserDriver::CreateResource+0x005b
51871c5e comsvcs!CHolder::AllocResource+0x01fe
519bb6da oledb32!CDCMPool::DrawResource+0x014a
519bb24b oledb32!CDCMPoolManager::DrawResource+0x020b
519b8233 oledb32!CDPO::Initialize+0x0263
51a993ab msado15!_ConnectAsync+0x01ab

Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-18 Thread Dmitry Dolgov

> On Thu, Dec 17, 2020 at 03:29:35PM -0500, Tom Lane wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
> > On Thu, Dec 17, 2020 at 01:49:17PM -0500, Tom Lane wrote:
> >> We can certainly reconsider the API for the parsing hook if there's
> >> really a good reason for these to be different types, but it seems
> >> like that would just be encouraging poor design.
>
> > To be more specific, this is the current behaviour (an example from the
> > tests) and it doesn't seem right:
>
> > =# update test_jsonb_subscript
> >set test_json['a'] = 3 where id = 1;
> > UPDATE 1
> > =# select jsonb_typeof(test_json->'a')
> >from test_jsonb_subscript where id = 1;
> >  jsonb_typeof
> >  --
> >   string
>
>
> I'm rather inclined to think that the result of subscripting a
> jsonb (and therefore also the required source type for assignment)
> should be jsonb, not just text.  In that case, something like
>   update ... set jsoncol['a'] = 3
> would fail, because there's no cast from integer to jsonb.  You'd
> have to write one of
>   update ... set jsoncol['a'] = '3'
>   update ... set jsoncol['a'] = '"3"'
> to clarify how you wanted the input to be interpreted.
> But that seems like a good thing, just as it is for jsonb_in.

Yep, that makes sense, will go with this idea.

Re: WIP: System Versioned Temporal Table

2020-12-18 Thread Ryan Lambert

On Thu, Nov 19, 2020 at 11:04 AM Surafel Temesgen 
wrote:

>
> Attached is a rebased one.
> regards
> Surafel
>

Thank you for your work on this!  The v7 patch fails on the current master
branch.  Error from make:

gram.y:16695:1: error: static declaration of ‘makeAndExpr’ follows
non-static declaration
 makeAndExpr(Node *lexpr, Node *rexpr, int location)
 ^~~
In file included from gram.y:58:0:
../../../src/include/nodes/makefuncs.h:108:14: note: previous declaration
of ‘makeAndExpr’ was here
 extern Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
  ^~~
gram.y:16695:1: warning: ‘makeAndExpr’ defined but not used
[-Wunused-function]
 makeAndExpr(Node *lexpr, Node *rexpr, int location)
 ^~~



The docs have two instances of "EndtTime" that should be "EndTime".

Ryan Lambert

Re: [PATCH] Automatic HASH and LIST partition creation

2020-12-18 Thread Pavel Borisov

I've realized one strange effect in current grammar parsing: if I do

CREATE TABLE foo (a int) PARTITION BY LIST (a) CONFIGURATION (a 1);
ERROR:  unrecognized auto partition bound specification "a"

I consulted the patch code and realized that in fact, the patch considers
it the (invalid) HASH bounds (doesn't find a word 'modulus') unless it is
specified to be (still invalid) LIST. This is due to the fact that the
grammar parser is not context-aware and in the patch, we tried to avoid the
new parser keyword MODULUS. The effect is that inside a CONFIGURATION
parentheses in case of HASH bounds we don't have a single keyword for the
parser to determine it is really a HASH case.

It doesn't make the patch work wrongly, besides it checks the validity of
all types of bounds in the HASH case even when the partitioning is not
HASH. I find this slightly bogus. This is because the parser can not
determine the type of partitioning inside the configuration clause and this
makes adding new syntax (e.g. adding RANGE partitioning configuration
inside CONFIGURATION parentheses) complicated.

So I have one more syntax proposal: to have separate keywords
inside CONFIGURATION parentheses for each partitioning type.
E.g:
CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES IN
(1,2),(3,4) DEFAULT PARTITION foo_def);
CREATE TABLE foo(a int) PARTITION BY HASH(a) CONFIGURATION (FOR VALUES WITH
MODULUS 3);
CREATE TABLE foo(a int) PARTITION BY RAGE(a) CONFIGURATION (FOR VALUES FROM
1 TO 1000 INTERVAL 10 DEFAULT PARTITION foo_def);

This proposal is in accordance with the current syntax of declarative
partitioning: CREATE TABLE foo_1 PARTITION OF foo FOR VALUES ...

Some more facultative proposals incremental to the abovementioned:
1. Omit CONFIGURATION with/without parentheses. This makes syntax closer
to (non-automatic) declarative partitioning syntax but the clause
seems less legible (in my opinion).
2. Omit just FOR VALUES. This makes the clause short, but adds a difference
to (non-automatic) declarative partitioning syntax.

 I'm planning also to add RANGE partitioning syntax to this in the future
and I will be happy if all three types of the syntax could come along
easily.
 I very much appreciate your views on this especially regarding that
changes can be still made easily because the patch is not committed yet.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: New Table Access Methods for Multi and Single Inserts

2020-12-18 Thread Justin Pryzby

On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby  wrote:
> > Are you thinking that TableInsertState would eventually have additional
> > attributes which would apply to other tableams, but not to heap ?  Is
> > heap_insert_begin() really specific to heap ?  It's allocating and 
> > populating a
> > structure based on its arguments, but those same arguments would be passed 
> > to
> > every other AM's insert_begin routine, too.  Do you need a more flexible 
> > data
> > structure, something that would also accomodate extensions?  I'm thinking of
> > reloptions as a loose analogy.
> 
> I could not think of other tableam attributes now. But +1 to have that
> kind of flexible structure for TableInsertState. So, it can have
> tableam type and attributes within the union for each type.

Right now you have heap_insert_begin(), and I asked if it was really
heap-specific.  Right now, it populates a struct based on a static list of
arguments, which are what heap uses.  

If you were to implement a burp_insert_begin(), how would it differ from
heap's?  With the current API, they'd (have to) be the same, which means either
that it should apply to all AMs (or have a "default" implementation that can be
overridden by an AM), or that this API assumes that other AMs will want to do
exactly what heap does, and fails to allow other AMs to implement optimizations
for bulk inserts as claimed.

I don't think using a "union" solves the problem, since it can only accommodate
core AMs, and not extensions, so I suggested something like reloptions, which
have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
toast.autovacuum_enabled).

-- 
Justin

Re: allow to \dtS+ pg_toast.*

2020-12-18 Thread Justin Pryzby

On Fri, Dec 18, 2020 at 12:43:07PM +0100, Laurenz Albe wrote:
> On Fri, 2020-12-18 at 00:58 -0600, Justin Pryzby wrote:
> > On Thu, Dec 17, 2020 at 04:16:52PM +0100, Laurenz Albe wrote:
> > > On Mon, 2020-11-30 at 10:54 -0600, Justin Pryzby wrote:
> > > > This makes toast tables a bit less special and easier to inspect.
> > > > 
> > > > postgres=# \dtS+ pg_toast.pg_toast_2619
> > > >  pg_toast | pg_toast_2619 | toast table | pryzbyj | permanent   | heap  
> > > > | 56 kB | 
> > > > 
> > > > This follows commit from last year:
> > > > | eb5472da9 make \d pg_toast.foo show its indices ; and, \d toast show 
> > > > its main table
> > > 
> > > This would indeed be convenient.
> > > 
> > > While playing around with it, I found the following oddity:
> > > 
> > > regression=# \dtS pg_toast.pg_toast_30701
> > >  pg_toast | pg_toast_30701 | toast table | laurenz
> > > 
> > > regression=# \dt pg_toast.pg_toast_30701
> > > Did not find any relation named "pg_toast.pg_toast_30701".
> > > 
> > > Now this doesn't seem right.  To my understanding, \dtS should do the 
> > > same as \dt,
> > > except that it should also search in "pg_catalog" if no schema was 
> > > provided.
> > 
> > You mean that if pg_toast.* should be shown if a matching "pattern" is 
> > given,
> > even if "S" was not used.  I think you're right.  The behavior I implemented
> > was intended to provide a bit of historic compatibility towards hiding toast
> > tables, but I think it's not needed, since they're not shown anyway unless
> > someone includes "S", specifies the "pg_toast." schema, or pg_toast is in 
> > their
> > search path.  See attached.
> 
> Yes, exactly.
> 
> I wonder why the modification in "listPartitionedTables" is necessary.
> Surely there cannot be any partitioned toast tables, can there?

The comment should be removed for consistency.
And I changed the code for consistency with listTables (from which I assume
listPartitionedTables was derived - I was involved in the last stages of that
patch).  It doesn't need to exclude pg_catalog or information_schema, either,
but it's kept the same for consistency.  That part could also be removed.

> > > Another thing that is missing is tab completion for
> > > regression=# \dtS pg_toast.pg_
> > > This should work just like for \d and \dS.
..
> If I want to know how big the TOAST table of relation 87654 is,
> I think it is convenient if I can tab to
> 
> \dt+ pg_toast.pg_toast_

I agree that it's nice to complete the schema name, but I'm still not convinced
this part should be included.

The way to include pg_toast.pg_toast is if toast relations are included, which
is exactly what Tom pointed out is usually unhelpful.  If you include toast
relations, tab completion might give "pg_toast.pg_toast_14..." when you wanted
to paste "145678" - you'd need to remove the common suffix that it found.

I considered whether "toast table" should be capitalized (as it is for "\d")
but I think it should stay lowercase.

-- 
Justin
>From 987bbcb083507426018bfccd1eae7552468c57cb Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 28 Oct 2020 23:51:39 -0500
Subject: [PATCH] Allow to \dti+ pg_toast.*

This reverts commit 81fc5df83.  Rather than being consistent by hiding toast
tables, instead allow toast tables to be displayed by specifying "S", or their
schema, or by including pg_toast in search_path.
---
 src/bin/psql/describe.c | 24 ++--
 src/bin/psql/tab-complete.c | 13 -
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 8923eee752..b7a8bd4121 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3762,6 +3762,7 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 	  " WHEN " CppAsString2(RELKIND_INDEX) " THEN '%s'"
 	  " WHEN " CppAsString2(RELKIND_SEQUENCE) " THEN '%s'"
 	  " WHEN 's' THEN '%s'"
+	  " WHEN " CppAsString2(RELKIND_TOASTVALUE) " THEN '%s'"
 	  " WHEN " CppAsString2(RELKIND_FOREIGN_TABLE) " THEN '%s'"
 	  " WHEN " CppAsString2(RELKIND_PARTITIONED_TABLE) " THEN '%s'"
 	  " WHEN " CppAsString2(RELKIND_PARTITIONED_INDEX) " THEN '%s'"
@@ -3775,6 +3776,7 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 	  gettext_noop("index"),
 	  gettext_noop("sequence"),
 	  gettext_noop("special"),
+	  gettext_noop("toast table"),
 	  gettext_noop("foreign table"),
 	  gettext_noop("partitioned table"),
 	  gettext_noop("partitioned index"),
@@ -3870,6 +3872,8 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 		appendPQExpBufferStr(&buf, CppAsString2(RELKIND_SEQUENCE) ",");
 	if (showSystem || pattern)
 		appendPQExpBufferStr(&buf, "'s',"); /* was RELKIND_SPECIAL */
+	if ((showSystem || pattern) && showTables)
+		appendPQExpBufferStr(&buf, "'t',"); /* toast tables */
 	if (showForeign)
 		appendPQExpBufferStr(&buf, CppAsSt

Weird special case in jsonb_concat()

The discussion in [1] pointed out that the existing documentation
for the "jsonb || jsonb" concatenation operator is far short of
reality: it fails to acknowledge that the operator will accept
any cases other than two jsonb array inputs or two jsonb object
inputs.

I'd about concluded that other cases were handled as if by
wrapping non-array inputs in one-element arrays and then
proceeding as for two arrays.  That works for most scenarios, eg

regression=# select '[3]'::jsonb || '{}'::jsonb;
 ?column? 
--
 [3, {}]
(1 row)

regression=# select '3'::jsonb || '[]'::jsonb;
 ?column? 
--
 [3]
(1 row)

regression=# select '3'::jsonb || '4'::jsonb;
 ?column? 
--
 [3, 4]
(1 row)

However, further experimentation found a case that fails:

regression=# select '3'::jsonb || '{}'::jsonb;
ERROR:  invalid concatenation of jsonb objects

I wonder what is the point of this weird exception, and whether
whoever devised it can provide a concise explanation of what
they think the full behavior of "jsonb || jsonb" is.  Why isn't
'[3, {}]' a reasonable result here, if the cases above are OK?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/0d72b76d-ca2b-4263--d6dfca861c51%40www.fastmail.com

Re: Proposed patch for key managment

On Fri, Dec 18, 2020 at 11:13:44AM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Fri, Dec 18, 2020 at 08:18:53AM -0500, Stephen Frost wrote:
> > > - C API in backend - we should try to at least set up the structure to
> > >   allow multiple encryption implementations, either via different
> > >   libraries or if someone feels it'd be useful to write a built-in
> > >   implementation, but as I mentioned just a moment ago, we aren't going
> > >   to get this perfect and we should accept that.
> > 
> > All my OpenSSL-specific stuff is isolated in src/common.
> 
> ... and in 'openssl' files, with 'generic' functions on top, right?
> That's what I recall from before and seems like the right approach to at
> least start with, and then we likely make adjustments as needed to the
> API when we add another encryption library.

Uh, I did it the way cryptohash_openssl.c does it. Sawada-san's patch did
it kind of like that, but had #ifdef calls to OpenSSL versions, while
the cryptohash_openssl.c method has two files with exactly the same
function names, and they are conditionally compiled in the makefile
based on makefile defines, and that is how I did it.  Michael Paquier
pointed this out, and the msvc changes needed to handle it.

> > > > I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
> > > > so the single HMAC function is not in the cipher file anymore, attached.
> > > 
> > > Will try to look at this soon, but in general the idea seems right.
> > 
> > Should I be using the init/update/final HMAC API that SCRAM uses so it
> > is easier to integrate into Michael's patch?  I can do that, but didn't
> > because that is going to require me to create a much larger footprint in
> > the main code, and that might be harder to integrate once Michael's API
> > is final.  It is easier for me to change one line to five lines than to
> > change five lines to five different lines.
> 
> For my 2c, ideally we'd get a review done of Michael's changes and just
> get that committed and have your work here rebased over that.  I don't

That would be nice.

> see any reason why we can't get that done in relatively short order.
> Just because it's registered in the January CF doesn't mean we actually
> have to wait for that to get done, it just needs a review from another
> committer (or self certification from Michael if he's happy with it).

Agreed.  I just don't want to wait until mid-January to get this into
the tree, and I am going to defer to Michael's timeline on this, which
is why I figured I might need to go first.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

initdb fails under bleeding-edge valgrind

With valgrind 3.16.1, we fail to get through initdb:

==00:00:00:41.608 11346== Source and destination overlap in memcpy(0xc190a8, 
0xc190a8, 512)
==00:00:00:41.609 11346==at 0x486C674: __GI_memcpy 
(vg_replace_strmem.c:1035)
==00:00:00:41.609 11346==by 0x9017DB: write_relmap_file (relmapper.c:932)
==00:00:00:41.609 11346==by 0x90243B: RelationMapFinishBootstrap 
(relmapper.c:571)
==00:00:00:41.609 11346==by 0x551083: BootstrapModeMain (bootstrap.c:530)
==00:00:00:41.609 11346==by 0x551083: AuxiliaryProcessMain (bootstrap.c:436)
==00:00:00:41.609 11346==by 0x483E8F: main (main.c:201)
==00:00:00:41.609 11346== 
==00:00:00:41.615 11346== Source and destination overlap in memcpy(0xc192a8, 
0xc192a8, 512)
==00:00:00:41.615 11346==at 0x486C674: __GI_memcpy 
(vg_replace_strmem.c:1035)
==00:00:00:41.615 11346==by 0x9017DB: write_relmap_file (relmapper.c:932)
==00:00:00:41.615 11346==by 0x551083: BootstrapModeMain (bootstrap.c:530)
==00:00:00:41.615 11346==by 0x551083: AuxiliaryProcessMain (bootstrap.c:436)
==00:00:00:41.615 11346==by 0x483E8F: main (main.c:201)
==00:00:00:41.615 11346== 

I'm a bit surprised that we've not seen other complaints from picky
memcpy implementations, because this code path definitely is passing
the same source and destination pointers.

Evidently we need something like this in write_relmap_file:

/* Success, update permanent copy */
-   memcpy(realmap, newmap, sizeof(RelMapFile));
+   if (realmap != newmap)
+   memcpy(realmap, newmap, sizeof(RelMapFile));

Or possibly it'd be cleaner to have RelationMapFinishBootstrap
supply a local RelMapFile variable to construct the mappings in.

regards, tom lane

Re: Refactor routine to check for ASCII-only case

Greetings,

* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> On 18/12/2020 05:57, Michael Paquier wrote:
> >As mentioned in [1], there are three places where there is the same
> >routine to check if a string is made only of ASCII characters.
> >
> >This makes for a small-ish but nice cleanup, as per the attached.
> 
> +1

Yeah, in a quick look, this looks like a good improvement.

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: Proposed patch for key managment

Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Dec 18, 2020 at 08:18:53AM -0500, Stephen Frost wrote:
> > - C API in backend - we should try to at least set up the structure to
> >   allow multiple encryption implementations, either via different
> >   libraries or if someone feels it'd be useful to write a built-in
> >   implementation, but as I mentioned just a moment ago, we aren't going
> >   to get this perfect and we should accept that.
> 
> All my OpenSSL-specific stuff is isolated in src/common.

... and in 'openssl' files, with 'generic' functions on top, right?
That's what I recall from before and seems like the right approach to at
least start with, and then we likely make adjustments as needed to the
API when we add another encryption library.

> > - User API - we should avoid having OpenSSL-specific bits exposed to
> >   users, and I don't think we do today, so I don't think this is an
> >   issue at the moment.
> 
> Right, there is nothing OpenSSL-specific on the user side, but some of
> the scripts assume you can pass an open file descriptor to a
> PKCS11-enabled tool, and I don't know if non-OpenSSL libraries support
> that.

That generally seems alright to me.

> Ideally, we would have scripts for each library to use their
> command-line API for this.  I am hestitant to rename the scripts to
> contain the name openssl because I am unclear if we will ever have
> non-OpenSSL implementations of these.  I updated the script comments at
> the top to indicate "It uses OpenSSL with PKCS11 enabled via OpenSC.".

I'm not too worried about the specific names of those scripts.
Definitely having comments that indicate what the dependencies are is
certainly good.

> One point is that the passphrase scripts are useful for cluster file
> encryption _and_ unlocking TLS certificates.

That's certainly good.

> > > I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
> > > so the single HMAC function is not in the cipher file anymore, attached.
> > 
> > Will try to look at this soon, but in general the idea seems right.
> 
> Should I be using the init/update/final HMAC API that SCRAM uses so it
> is easier to integrate into Michael's patch?  I can do that, but didn't
> because that is going to require me to create a much larger footprint in
> the main code, and that might be harder to integrate once Michael's API
> is final.  It is easier for me to change one line to five lines than to
> change five lines to five different lines.

For my 2c, ideally we'd get a review done of Michael's changes and just
get that committed and have your work here rebased over that.  I don't
see any reason why we can't get that done in relatively short order.
Just because it's registered in the January CF doesn't mean we actually
have to wait for that to get done, it just needs a review from another
committer (or self certification from Michael if he's happy with it).

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: [DOC] Document concurrent index builds waiting on each other

2020-12-18 Thread James Coleman

On Tue, Dec 1, 2020 at 8:05 PM James Coleman  wrote:
>
> On Tue, Dec 1, 2020 at 6:51 PM Alvaro Herrera  wrote:
> >
> > On 2020-Nov-30, James Coleman wrote:
> >
> > > On Mon, Nov 30, 2020 at 4:53 PM Alvaro Herrera  
> > > wrote:
> > > >
> > > > On 2020-Sep-30, Michael Paquier wrote:
> >
> > > > Yeah, I think it might be more sensible to document this in
> > > > maintenance.sgml, as part of the paragraph that discusses removing
> > > > tuples "to save space".  But making it inline with the rest of the flow,
> > > > it seems to distract from higher-level considerations, so I suggest to
> > > > make it a footnote instead.
> > >
> > > I have mixed feelings about wholesale moving it; users aren't likely
> > > to read the vacuum doc when considering how running CIC might impact
> > > their system, though I do understand why it otherwise fits there.
> >
> > Makes sense.  ISTM that if we want to have a cautionary blurb CIC docs,
> > it should go in REINDEX CONCURRENTLY as well.
>
> Agreed. Or, alternatively, a blurb something like "Please note how CIC
> interacts with VACUUM ...", and then the primary language in
> maintenance.sgml. That would have the benefit of maintaining the core
> language in only one place.

Any thoughts on this?

James

Re: how to use valgrind for TAP tests

"osumi.takami...@fujitsu.com"  writes:
> I have a question about how to execute valgrind with TAP tests
> in order to check some patches in the community.
> My main interest is testing src/test/subscription now but
> is there any general way to do it ?

The standard solution is

(1) Build normally (well, with -DUSE_VALGRIND)
(2) Move the postgres executable aside, say
mv src/backend/postgres src/backend/postgres.orig
(3) Replace the executable with a wrapper script that invokes
valgrind on the original executable
(4) Now you can run "make check" with a valgrind'ed server,
as well as things that depend on "make check", such as TAP tests

The script I use for (3) is attached; adjust paths and options to taste.

regards, tom lane

#!/bin/sh

# Use this script as a wrapper to run PG under valgrind in "make check"
# scenarios.  Rename the built postgres executable to postgres.orig
# and then copy this script to $PGBLDROOT/src/backend/postgres.

# Note: a disadvantage of this approach is that the backend will see
# $PGBLDROOT/src/backend/ as its my_exec_path, hence will not pick up
# anything in the temp install dir, but will go to the configured install
# dir.  Must "make install" extensions and other stuff to work properly.

exec valgrind \
--quiet \
--suppressions=$PGSRCROOT/src/tools/valgrind.supp \
--suppressions=$HOME/misc/my-valgrind.supp \
--time-stamp=yes --error-exitcode=128 --trace-children=yes \
--log-file=$HOME/valgrind-logs/%p.log $PMVALGRIND \
$PGBLDROOT/src/backend/postgres.orig "$@"

Re: Refactoring HMAC in the core code

On Fri, Dec 18, 2020 at 03:46:42PM +0900, Michael Paquier wrote:
> On Fri, Dec 18, 2020 at 08:41:01AM +0900, Michael Paquier wrote:
> > Knowing that we are in a period of vacations for a lot of people, and
> > that this is a sensitive area of the code that involves
> > authentication, I think that it is better to let this thread brew
> > longer and get more eyes to look at it.  As this also concerns
> > external SSL libraries like libnss, making sure that the APIs have a
> > shape flexible enough would be good.  Based on my own checks with
> > OpenSSL and libnss, I think that's more than enough.  But let's be
> > sure.
...
> This has been tested on Windows and Linux across all the versions of
> OpenSSL we support on HEAD.  I am also attaching a small module called
> hmacfuncs that I used as a way to validate this patch across all the
> versions of OpenSSL and the fallback implementation.  As a reference,
> this matches with the results from Wikipedia here:
> https://en.wikipedia.org/wiki/HMAC#Examples

Great.  See my questions in the key manager thread about whether I
should use the init/update/final API or just keep the one-line version.
As far as when to commit this, I think the quiet time is actually better
because if you break something, it is less of a disruption while you fix
it.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Proposed patch for key managment

On Fri, Dec 18, 2020 at 08:18:53AM -0500, Stephen Frost wrote:
> What I would be thinking about with this are really three pieces-
> 
> - C API for libpq (not relevant for this, but we have had issues in the
>   past with exposing OpenSSL-specific things there)

Right.

> - C API in backend - we should try to at least set up the structure to
>   allow multiple encryption implementations, either via different
>   libraries or if someone feels it'd be useful to write a built-in
>   implementation, but as I mentioned just a moment ago, we aren't going
>   to get this perfect and we should accept that.

All my OpenSSL-specific stuff is isolated in src/common.

> - User API - we should avoid having OpenSSL-specific bits exposed to
>   users, and I don't think we do today, so I don't think this is an
>   issue at the moment.

Right, there is nothing OpenSSL-specific on the user side, but some of
the scripts assume you can pass an open file descriptor to a
PKCS11-enabled tool, and I don't know if non-OpenSSL libraries support
that.

Ideally, we would have scripts for each library to use their
command-line API for this.  I am hestitant to rename the scripts to
contain the name openssl because I am unclear if we will ever have
non-OpenSSL implementations of these.  I updated the script comments at
the top to indicate "It uses OpenSSL with PKCS11 enabled via OpenSC.".

One point is that the passphrase scripts are useful for cluster file
encryption _and_ unlocking TLS certificates.

> > I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
> > so the single HMAC function is not in the cipher file anymore, attached.
> 
> Will try to look at this soon, but in general the idea seems right.

Should I be using the init/update/final HMAC API that SCRAM uses so it
is easier to integrate into Michael's patch?  I can do that, but didn't
because that is going to require me to create a much larger footprint in
the main code, and that might be harder to integrate once Michael's API
is final.  It is easier for me to change one line to five lines than to
change five lines to five different lines.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: pgbench failed when -f option contains a char '@'

Heikki Linnakangas  writes:
> I think we should just leave this as it is. The user can simply rename 
> the file.

Yeah.  The assumption when we defined the script-weight syntax was that
there's no particular reason to use "@" in a script file name, and
I don't see why that's a bad assumption.

> Or maybe one change would be worthwhile here: First check if the part 
> after the @ contains only digits. If doesn't, then assume it's part of 
> the filename rather than a weight. That would fix this for cases like 
> "f...@1.sql", although not for "foo@1".

I do not like introducing ambiguity of that sort.  Not being entirely
clear on which script file is going to be read seems like a recipe
for security issues.

regards, tom lane

Re: Proposed patch for key managment

Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Dec 18, 2020 at 10:01:22AM +0900, Michael Paquier wrote:
> > On Thu, Dec 17, 2020 at 12:10:22PM -0500, Bruce Momjian wrote:
> > > On Thu, Dec 17, 2020 at 11:39:55AM -0500, Stephen Frost wrote:
> > >> I don't think there's any need for us to implement a fallback
> > >> implementation of AES.  I'm not entirely sure we need it for hashes
> > >> but since we've already got it...
> > 
> > We need fallback implementations for cryptohashes (MD5, SHA1/2) and
> > HMAC because we have SCRAM authentication, pgcrypto and SQL functions
> > that should be able to work even when not building with any SSL
> > libraries.  So that's here to stay to ensure compatibility with what
> > we do.  All this stuff is already in the tree for ages, it was just
> > not shaped for a more centralized reuse.
> 
> One question is whether we want to support cluster file encryption
> without OpenSSL --- right now we can't.  I am wondering if we really
> need the hardware acceleration of OpenSSL for AES so doing our own AES
> implementation might not even make sense, performance-wise.

No, I don't think we need to support file encryption independently of
any library being available.  Maybe some day we will, but that should
really just be another option to build with.

Guess it wasn't clear, but I was being a bit tounge-in-cheek regarding
the idea of dropping SCRAM support if we aren't built with OpenSSL.

> > > Agreed.  I think there is serious risk we would do AES in a different
> > > way than OpenSSL, especially if I did it.  ;-)  We can add a native AES
> > > one day if we want, but as stated by Michael Paquier, it has to be
> > > tested so we are sure it returns exactly the same values as OpenSSL.
> > 
> > I think that it would be good to put some generalization here, and
> > look at options that are offered by other SSL libraries, like libnss
> > so as we don't finish with a design that restricts the use of a given
> > feature only to OpenSSL.
> 
> Uh, you mean the C API or the user API?  I don't think we have any
> OpenSSL requirement at the user level, except that my examples use
> command-line openssl to generate the passphrase.

What I would be thinking about with this are really three pieces-

- C API for libpq (not relevant for this, but we have had issues in the
  past with exposing OpenSSL-specific things there)

- C API in backend - we should try to at least set up the structure to
  allow multiple encryption implementations, either via different
  libraries or if someone feels it'd be useful to write a built-in
  implementation, but as I mentioned just a moment ago, we aren't going
  to get this perfect and we should accept that.

- User API - we should avoid having OpenSSL-specific bits exposed to
  users, and I don't think we do today, so I don't think this is an
  issue at the moment.

> I split apart my patch to create cipher{_openssl}.c and hmac{_openssl}.c
> so the single HMAC function is not in the cipher file anymore, attached.

Will try to look at this soon, but in general the idea seems right.

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: postgres_fdw - cached connection leaks if the associated user mapping/foreign server is dropped

2020-12-18 Thread Bharath Rupireddy

On Fri, Dec 18, 2020 at 6:39 PM Bharath Rupireddy
 wrote:
>
> On Fri, Dec 18, 2020 at 5:06 PM Hou, Zhijie  wrote:
> > I have an issue about the existing testcase.
> >
> > """
> > -- Test that alteration of server options causes reconnection
> > SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should work
> > ALTER SERVER loopback OPTIONS (SET dbname 'no such database');
> > SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should fail
> > DO $d$
> > BEGIN
> > EXECUTE $$ALTER SERVER loopback
> > OPTIONS (SET dbname '$$||current_database()||$$')$$;
> > END;
> > $d$;
> > SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should work again
> > """
> >
> > IMO, the above case is designed to test the following code[1]:
> > With the patch, it seems the following code[1] will not work for this case, 
> > right?
> > (It seems the connection will be disconnect in pgfdw_xact_callback)
> >
> > I do not know does it matter, or should we add a testcase to cover that?
> >
> > [1] /*
> >  * If the connection needs to be remade due to invalidation, 
> > disconnect as
> >  * soon as we're out of all transactions.
> >  */
> > if (entry->conn != NULL && entry->invalidated && entry->xact_depth 
> > == 0)
> > {
> > elog(DEBUG3, "closing connection %p for option changes to 
> > take effect",
> >  entry->conn);
> > disconnect_pg_server(entry);
> > }
>
> Yes you are right. With the patch if the server is altered in the same
> session in which the connection exists, the connection gets closed at
> the end of that alter query txn, not at the beginning of the next
> foreign tbl query. So, that part of the code in GetConnection()
> doesn't get hit. Having said that, this code gets hit when the alter
> query is run in another session and the connection in the current
> session gets disconnected at the start of the next foreign tbl query.
>
> If we want to cover it with a test case, we might have to alter the
> foreign server in another session. I'm not sure whether we can open
> another session from the current psql session and run a sql command.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com

Re: Proposed patch for key managment

Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Thu, Dec 17, 2020 at 12:10:22PM -0500, Bruce Momjian wrote:
> > Agreed.  I think there is serious risk we would do AES in a different
> > way than OpenSSL, especially if I did it.  ;-)  We can add a native AES
> > one day if we want, but as stated by Michael Paquier, it has to be
> > tested so we are sure it returns exactly the same values as OpenSSL.
> 
> I think that it would be good to put some generalization here, and
> look at options that are offered by other SSL libraries, like libnss
> so as we don't finish with a design that restricts the use of a given
> feature only to OpenSSL.

While I agree with the general idea proposed here, I don't know that we
need to push super hard on it to be somehow perfect right now because it
simply won't be until we actually add support for another library, and I
don't think that's really this patch's responsibility.

So, yes, let's lay the groundwork and structure and perhaps spend a bit
of time looking at other libraries, but not demand this patch also add
support for a second library today, and accept that that means that the
structure we put in place may not end up being exactly perfect.

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Single transaction in the tablesync worker?

2020-12-18 Thread Peter Smith

Hi Amit.

PSA my v4 WIP patch for the Solution1.

This patch applies onto the v30 patch set [1] from other 2PC thread:
[1]
https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com

Although incomplete it does still pass all the make check, and
src/test/subscription TAP tests.

Coded / WIP:

* tablesync slot is now permanent instead of temporary

* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions

* tablesync worked now allowing multiple tx instead of single tx

* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.

* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.

* tablesync now sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for apply worker)

* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply

TODO / Known Issues:

* the current implementation of tablesync drop slot (e.g. from
DropSubscription or finish_sync_worker) regenerates the tablesync slot
name so it knows what slot to drop. The current code might be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would fail to be able to find the tablesync
slot.

* I think if there are crashed tablesync workers then they are not
known to DropSubscription. So this might be a problem to cleanup slots
and/or origin tracking belonging to those unknown workers.

* help / comments / cleanup

* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development

---

Kind Regards,
Peter Smith.
Fujitsu Australia

v4-0002-WIP-patch-for-Solution1.patch
Description: Binary data

v4-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch
Description: Binary data

Double partition lock in bufmgr

2020-12-18 Thread Konstantin Knizhnik


Hi hackers,

I am investigating incident with one of out customers: performance of 
the system isdropped dramatically.
Stack traces of all backends can be found here: 
http://www.garret.ru/diag_20201217_102056.stacks_59644

(this file is 6Mb so I have not attached it to this mail).

What I have see in this stack traces is that 642 backends and blocked in 
LWLockAcquire,

mostly in obtaining shared buffer lock:

#0  0x7f0e7fe7a087 in semop () from /lib64/libc.so.6
#1  0x00682fb1 in PGSemaphoreLock 
(sema=sema@entry=0x7f0e1c1f63a0) at pg_sema.c:387
#2  0x006ed60b in LWLockAcquire (lock=lock@entry=0x7e8b6176d800, 
mode=mode@entry=LW_SHARED) at lwlock.c:1338
#3  0x006c88a7 in BufferAlloc (foundPtr=0x7ffcc3c8de9b "\001", 
strategy=0x0, blockNum=997, forkNum=MAIN_FORKNUM, relpersistence=112 
'p', smgr=0x2fb2df8) at bufmgr.c:1177
#4  ReadBuffer_common (smgr=0x2fb2df8, relpersistence=, 
relkind=, forkNum=forkNum@entry=MAIN_FORKNUM, 
blockNum=blockNum@entry=997, mode=RBM_NORMAL, strategy=0x0, 
hit=hit@entry=0x7ffcc3c8df97 "") at bufmgr.c:894
#5  0x006c928b in ReadBufferExtended (reln=0x32c7ed0, 
forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=997, 
mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:753
#6  0x006c93ab in ReadBuffer (blockNum=, 
reln=) at bufmgr.c:685

...

Only 11 locks from this 642 are unique.
Moreover: 358 backends are waiting for one lock and 183 - for another.

There are two backends (pids 291121 and 285927) which are trying to 
obtain exclusive lock while already holding another exclusive lock.

And them block all other backends.

This is single place in bufmgr (and in postgres) where process tries to 
lock two buffers:


        /*
         * To change the association of a valid buffer, we'll need to have
         * exclusive lock on both the old and new mapping partitions.
         */
        if (oldFlags & BM_TAG_VALID)
        {
    ...
            /*
             * Must lock the lower-numbered partition first to avoid
             * deadlocks.
             */
            if (oldPartitionLock < newPartitionLock)
            {
                LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
            }
            else if (oldPartitionLock > newPartitionLock)
            {
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
                LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
            }

This two backends are blocked in the second lock request.
I read all connects in bufmgr.c and README file but didn't find 
explanation why do we need to lock both partitions.
Why it is not possible first free old buffer (as it is done in 
InvalidateBuffer) and then repeat attempt to allocate the buffer?


Yes, it may require more efforts than just "gabbing" the buffer.
But in this case there is no need to keep two locks.

I wonder if somebody in the past  faced with the similar symptoms and 
was this problem with holding locks of two partitions in bufmgr already 
discussed?


P.S.
The customer is using 9.6 version of Postgres, but I have checked that 
the same code fragment is present in the master.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: allow to \dtS+ pg_toast.*

2020-12-18 Thread Laurenz Albe

On Fri, 2020-12-18 at 00:58 -0600, Justin Pryzby wrote:
> On Thu, Dec 17, 2020 at 04:16:52PM +0100, Laurenz Albe wrote:
> > On Mon, 2020-11-30 at 10:54 -0600, Justin Pryzby wrote:
> > > This makes toast tables a bit less special and easier to inspect.
> > > 
> > > postgres=# \dtS+ pg_toast.pg_toast_2619
> > >  pg_toast | pg_toast_2619 | toast table | pryzbyj | permanent   | heap
> > >   | 56 kB | 
> > > 
> > > This follows commit from last year:
> > > | eb5472da9 make \d pg_toast.foo show its indices ; and, \d toast show 
> > > its main table
> > 
> > This would indeed be convenient.
> > 
> > While playing around with it, I found the following oddity:
> > 
> > regression=# \dtS pg_toast.pg_toast_30701
> >  pg_toast | pg_toast_30701 | toast table | laurenz
> > 
> > regression=# \dt pg_toast.pg_toast_30701
> > Did not find any relation named "pg_toast.pg_toast_30701".
> > 
> > Now this doesn't seem right.  To my understanding, \dtS should do the same 
> > as \dt,
> > except that it should also search in "pg_catalog" if no schema was provided.
> 
> You mean that if pg_toast.* should be shown if a matching "pattern" is given,
> even if "S" was not used.  I think you're right.  The behavior I implemented
> was intended to provide a bit of historic compatibility towards hiding toast
> tables, but I think it's not needed, since they're not shown anyway unless
> someone includes "S", specifies the "pg_toast." schema, or pg_toast is in 
> their
> search path.  See attached.

Yes, exactly.

I wonder why the modification in "listPartitionedTables" is necessary.
Surely there cannot be any partitioned toast tables, can there?

> > Another thing that is missing is tab completion for
> > regression=# \dtS pg_toast.pg_
> > This should work just like for \d and \dS.
> 
> Tom objected to this in the past, humorously to me:
> 
> https://www.postgresql.org/message-id/14255.1536781...@sss.pgh.pa.us

That was about VACUUM and ANALYZE, and I certainly see the point
that this is not a frequent enough use case to warrant guiding
people there with tab completion.

But I don't take that as an argument against tab completion in our case.
If I want to know how big the TOAST table of relation 87654 is,
I think it is convenient if I can tab to

\dt+ pg_toast.pg_toast_

If you want to abolish special treatment for TOAST tables in \dt(S),
go all the way.

Yours,
Laurenz Albe

RE: postgres_fdw - cached connection leaks if the associated user mapping/foreign server is dropped

2020-12-18 Thread Hou, Zhijie

Hi

I have an issue about the existing testcase.

"""
-- Test that alteration of server options causes reconnection SELECT c3, c4 
FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should work ALTER SERVER loopback OPTIONS 
(SET dbname 'no such database'); SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 
1;  -- should fail DO $d$
BEGIN
EXECUTE $$ALTER SERVER loopback
OPTIONS (SET dbname '$$||current_database()||$$')$$;
END;
$d$;
SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should work again """

IMO, the above case is designed to test the following code[1]:
With the patch, it seems the following code[1] will not work for this case, 
right?
(It seems the connection will be disconnect in pgfdw_xact_callback)

I do not know does it matter, or should we add a testcase to cover that?

[1] /*
 * If the connection needs to be remade due to invalidation, disconnect 
as
 * soon as we're out of all transactions.
 */
if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
{
elog(DEBUG3, "closing connection %p for option changes to take 
effect",
 entry->conn);
disconnect_pg_server(entry);
}


Best regards,
houzj

Re: Incorrect allocation handling for cryptohash functions with OpenSSL


On 18/12/2020 12:55, Heikki Linnakangas wrote:

On 18/12/2020 12:10, Michael Paquier wrote:

On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:

pg_cryptohash_create() is now susceptible to leaking memory in
TopMemoryContext, if the allocations fail. I think the attached should fix
it (but I haven't tested it at all).


Yeah, you are right here.  If the second allocation fails the first
one would leak.  I don't think that your suggested fix is completely
right though because it ignores that the callers of
pg_cryptohash_create() in the backend expect an error all the time, so
it could crash.


Ah, right.


Perhaps we should just bite the bullet and switch the
OpenSSL and fallback implementations to use allocation APIs that never
cause an error, and always return NULL?  That would have the advantage
to be more consistent with the frontend that relies in malloc(), at
the cost of requiring more changes for the backend code where the
_create() call would need to handle the NULL case properly.  The
backend calls are already aware of errors so that would not be
invasive except for the addition of some elog(ERROR) or similar, and
we could change the fallback implementation to use palloc_extended()
with MCXT_ALLOC_NO_OOM.


-1. On the contrary, I think we should reduce the number of checks
needed in the callers, and prefer throwing errors, if possible. It's too
easy to forget the check, and it makes the code more verbose, too.

In fact, it might be better if pg_cryptohash_init() and
pg_cryptohash_update() didn't return errors either. If an error happens,
they could just set a flag in the pg_cryptohash_ctx, and
pg_cryptohash_final() function would return the error. That way, you
would only need to check for error return in the call to
pg_cryptohash_final().


BTW, it's a bit weird that the pg_cryptohash_init/update/final() 
functions return success, if the ctx argument is NULL. It would seem 
more sensible for them to return an error. That way, if a caller forgets 
to check for NULL result from pg_cryptohash_create(), but correctly 
checks the result of those other functions, it would catch the error. In 
fact, if we documented that pg_cryptohash_create() can return NULL, and 
pg_cryptohash_final() always returns error on NULL argument, then it 
would be sufficient for the callers to only check the return value of 
pg_cryptohash_final(). So the usage pattern would be:


ctx = pg_cryptohash_create(PG_MD5);
pg_cryptohash_inti(ctx);
pg_update(ctx, data, size);
pg_update(ctx, moredata, size);
if (pg_cryptohash_final(ctx, &hash) < 0)
elog(ERROR, "md5 calculation failed");
pg_cryptohash_free(ctx);

- Heikki

Re: Incorrect allocation handling for cryptohash functions with OpenSSL


On 18/12/2020 12:10, Michael Paquier wrote:

On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:

pg_cryptohash_create() is now susceptible to leaking memory in
TopMemoryContext, if the allocations fail. I think the attached should fix
it (but I haven't tested it at all).


Yeah, you are right here.  If the second allocation fails the first
one would leak.  I don't think that your suggested fix is completely
right though because it ignores that the callers of
pg_cryptohash_create() in the backend expect an error all the time, so
it could crash.


Ah, right.


Perhaps we should just bite the bullet and switch the
OpenSSL and fallback implementations to use allocation APIs that never
cause an error, and always return NULL?  That would have the advantage
to be more consistent with the frontend that relies in malloc(), at
the cost of requiring more changes for the backend code where the
_create() call would need to handle the NULL case properly.  The
backend calls are already aware of errors so that would not be
invasive except for the addition of some elog(ERROR) or similar, and
we could change the fallback implementation to use palloc_extended()
with MCXT_ALLOC_NO_OOM.


-1. On the contrary, I think we should reduce the number of checks 
needed in the callers, and prefer throwing errors, if possible. It's too 
easy to forget the check, and it makes the code more verbose, too.


In fact, it might be better if pg_cryptohash_init() and 
pg_cryptohash_update() didn't return errors either. If an error happens, 
they could just set a flag in the pg_cryptohash_ctx, and 
pg_cryptohash_final() function would return the error. That way, you 
would only need to check for error return in the call to 
pg_cryptohash_final().



BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need
two structs? They're both allocated and controlled by the cryptohash
implementation. It would seem simpler to have just one.


Depending on the implementation, the data to track can be completely
different, and this split allows to know about the resowner dependency
only in the OpenSSL part of cryptohashes, without having to include
this knowledge in neither cryptohash.h nor in the fallback
implementation that can just use palloc() in the backend.



...  And I wanted to keep track of the type of cryptohash directly in
the shared structure.  ;)


You could also define a shared header, with the rest of the struct being 
implementation-specific:


typedef struct pg_cryptohash_ctx
{
pg_cryptohash_type type;

/* implementation-specific data follows */
} pg_cryptohash_ctx;

- Heikki

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

On Fri, Dec 18, 2020 at 11:51:55AM +0200, Heikki Linnakangas wrote:
> Something like this. Passes regression tests, but otherwise untested.

...  And I wanted to keep track of the type of cryptohash directly in
the shared structure.  ;)
--
Michael


signature.asc
Description: PGP signature

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
> pg_cryptohash_create() is now susceptible to leaking memory in
> TopMemoryContext, if the allocations fail. I think the attached should fix
> it (but I haven't tested it at all).

Yeah, you are right here.  If the second allocation fails the first
one would leak.  I don't think that your suggested fix is completely
right though because it ignores that the callers of
pg_cryptohash_create() in the backend expect an error all the time, so
it could crash.  Perhaps we should just bite the bullet and switch the
OpenSSL and fallback implementations to use allocation APIs that never
cause an error, and always return NULL?  That would have the advantage
to be more consistent with the frontend that relies in malloc(), at
the cost of requiring more changes for the backend code where the
_create() call would need to handle the NULL case properly.  The
backend calls are already aware of errors so that would not be
invasive except for the addition of some elog(ERROR) or similar, and
we could change the fallback implementation to use palloc_extended()
with MCXT_ALLOC_NO_OOM.

> BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need
> two structs? They're both allocated and controlled by the cryptohash
> implementation. It would seem simpler to have just one.

Depending on the implementation, the data to track can be completely 
different, and this split allows to know about the resowner dependency
only in the OpenSSL part of cryptohashes, without having to include
this knowledge in neither cryptohash.h nor in the fallback
implementation that can just use palloc() in the backend.
--
Michael

signature.asc
Description: PGP signature

Re: Misleading comment in prologue of ReorderBufferQueueMessage

2020-12-18 Thread Ashutosh Bapat

On Wed, Dec 16, 2020 at 8:00 AM Amit Kapila  wrote:

> On Tue, Dec 15, 2020 at 11:25 AM Ashutosh Bapat
>  wrote:
> >
> > On Mon, Dec 14, 2020 at 3:14 PM Amit Kapila 
> wrote:
> >>
> >> On Mon, Dec 14, 2020 at 2:45 PM Ashutosh Bapat
> >>  wrote:
> >> >
> >> > The name of the function suggests that the given message will be
> queued in ReorderBuffer. The prologue of the function says so too
> >> >  776 /*
> >> >  777  * Queue message into a transaction so it can be processed upon
> commit.
> >> >  778  */
> >> > It led me to think that a non-transactional message is processed
> along with the surrounding transaction, esp. when it has an associated xid.
> >> >
> >> > But in reality, the function queues only a transactional message and
> decoders a non-transactional message immediately without waiting for a
> commit.
> >> >
> >> > We should modify the prologue to say
> >> > "Queue a transactional message into a transaction so that it can be
> processed upon commit. A non-transactional message is processed
> immediately." and also change the name of the function to
> ReorderBufferProcessMessage(), but the later may break API compatibility.
> >> >
> >>
> >> +1 for the comment change but I am not sure if it is a good idea to
> >> change the API name.
> >>
> > Can you please review wording? I will create a patch with updated
> wording.
> >
>
> How about something like below:
> A transactional message is queued to be processed upon commit and a
> non-transactional message gets processed immediately.
>

Used this one. PFA patch.

--
Best Wishes,
Ashutosh
From b23d080b10ebcb210cb186ab3d4832e2448d71b7 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 18 Dec 2020 15:33:48 +0530
Subject: [PATCH] Update prologue of ReorderBufferQueueMessage()

The prologue of this function describes behaviour in case of a
transactional WAL message only, but it accepts both transactional and
non-transactional WAL messages. Update the prologue to describe
behaviour in case of non-transactional WAL message as well.

Ashutosh Bapat, rephrased by Amit Kapila
---
 src/backend/replication/logical/reorderbuffer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 301baff244..85e78328b2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -783,7 +783,8 @@ ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn,
 }
 
 /*
- * Queue message into a transaction so it can be processed upon commit.
+ * A transactional message is queued to be processed upon commit and a
+ * non-transactional message gets processed immediately.
  */
 void
 ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
-- 
2.17.1

Re: Refactor routine to check for ASCII-only case


On 18/12/2020 05:57, Michael Paquier wrote:

As mentioned in [1], there are three places where there is the same
routine to check if a string is made only of ASCII characters.

This makes for a small-ish but nice cleanup, as per the attached.


+1

- Heikki

Re: [BUG] orphaned function

2020-12-18 Thread Gilles Darold


Le 18/12/2020 à 00:26, Tom Lane a écrit :

Gilles Darold  writes:

The same problem applies if the returned type or the procedural language
is dropped. I have tried to fix that in ProcedureCreate() by using an
AccessSharedLock for the data types and languages involved in the
function declaration. With this patch the race condition with parameters
types, return types and PL languages are fixed. Only data types and
procedural languages with Oid greater than FirstBootstrapObjectId will
be locked locked. But this is probably more complex than this fix so it
is proposed as a separate patch
(v1-003-orphaned_function_types_language.patch) to not interfere with
the applying of Bertran's patch.

Indeed.  This points up one of the things that is standing in the way
of any serious consideration of applying this patch.  To my mind there
are two fundamental, somewhat interrelated considerations that haven't
been meaningfully addressed:

1. What set of objects is it worth trying to remove this race condition
for, and with respect to what dependencies?  Bertrand gave no
justification for worrying about function-to-namespace dependencies
specifically, and you've likewise given none for expanding the patch
to consider function-to-datatype dependencies.  There are dozens more
cases that could be considered; but I sure don't want to be processing
another ad-hoc fix every time someone thinks they're worried about
another specific case.

Taking a practical viewpoint, I'm far more worried about dependencies
of tables than those of any other kind of object.  A messed-up function
definition doesn't cost you much: you can just drop and recreate the
function.  A table full of data is way more trouble to recreate, and
indeed the data might be irreplaceable.  So it seems pretty silly to
propose that we try to remove race conditions for functions' dependencies
on datatypes without removing the same race condition for tables'
dependencies on their column datatypes.

But any of these options lead to the same question: why stop there?
An approach that would actually be defensible, perhaps, is to incorporate
this functionality into the dependency mechanism: any time we're about to
create a pg_depend or pg_shdepend entry, take out an AccessShareLock on
the referenced object, and then check to make sure it still exists.
This would improve the dependency mechanism so it prevents creation-time
race conditions, not just attempts to drop dependencies of
already-committed objects.  It would mean that the one patch would be
the end of the problem, rather than looking forward to a steady drip of
new fixes.

2. How much are we willing to pay for this added protection?  The fact
that we've gotten along fine without it for years suggests that the
answer needs to be "not much".  But I'm not sure that that actually
is the answer, especially if we don't have a policy that says "we'll
protect against these race conditions but no other ones".  I think
there are possibly-serious costs in three different areas:

* Speed.  How much do all these additional lock acquisitions slow
down a typical DDL command?

* Number of locks taken per transaction.  This'd be particularly an
issue for pg_restore runs using single-transaction mode: they'd take
not only locks on the objects they create, but also a bunch of
reference-protection locks.  It's not very hard to believe that this'd
make a difference in whether restoring a large database is possible
without increasing max_locks_per_transaction.

* Risk of deadlock.  The reference locks themselves should be sharable,
so maybe there isn't much of a problem, but I want to see this question
seriously analyzed not just ignored.

Obviously, the magnitude of these costs depends a lot on how many
dependencies we want to remove the race condition for.  But that's
exactly the reason why I don't want a piecemeal approach of fixing
some problems now and some other problems later.  That's way too
much like the old recipe for boiling a frog: we could gradually get
into serious performance problems without anyone ever having stopped
to consider the issue.

In short, I think we should either go for a 100% solution if careful
analysis shows we can afford it, or else have a reasoned policy
why we are going to close these specific race conditions and no others
(implying that we'll reject future patches in the same area).  We
haven't got either thing in this thread as it stands, so I do not
think it's anywhere near being ready to commit.

regards, tom lane



Thanks for the review,


Honestly I have never met these races condition in 25 years of 
PostgreSQL and will probably never but clearly I don't have the Amazon 
database services workload. I let Bertrand decide what to do with the 
patch and address the races condition with a more global approach 
following your advices if more work is done on this topic. I tag the 
patch as "Waiting on author".



Best regards,

--
Gilles Darold
LzLabs GmbH
http://www.lzlabs

Re: Incorrect allocation handling for cryptohash functions with OpenSSL


On 18/12/2020 11:35, Heikki Linnakangas wrote:

BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
need two structs? They're both allocated and controlled by the
cryptohash implementation. It would seem simpler to have just one.


Something like this. Passes regression tests, but otherwise untested.

- Heikki
diff --git a/src/common/cryptohash.c b/src/common/cryptohash.c
index cf4588bad72..d3fa32888f0 100644
--- a/src/common/cryptohash.c
+++ b/src/common/cryptohash.c
@@ -39,6 +39,20 @@
 #define FREE(ptr) free(ptr)
 #endif
 
+struct pg_cryptohash_ctx
+{
+	pg_cryptohash_type type;
+
+	union
+	{
+		pg_md5_ctx		md5;
+		pg_sha224_ctx	sha224;
+		pg_sha256_ctx	sha256;
+		pg_sha384_ctx	sha384;
+		pg_sha512_ctx	sha512;
+	} data;
+};
+
 /*
  * pg_cryptohash_create
  *
@@ -50,38 +64,15 @@ pg_cryptohash_create(pg_cryptohash_type type)
 {
 	pg_cryptohash_ctx *ctx;
 
+	/*
+	 * FIXME: this always allocates enough space for the largest hash.
+	 * A smaller allocation would be enough for md5, sha224 and sha256.
+	 */
 	ctx = ALLOC(sizeof(pg_cryptohash_ctx));
 	if (ctx == NULL)
 		return NULL;
-
 	ctx->type = type;
 
-	switch (type)
-	{
-		case PG_MD5:
-			ctx->data = ALLOC(sizeof(pg_md5_ctx));
-			break;
-		case PG_SHA224:
-			ctx->data = ALLOC(sizeof(pg_sha224_ctx));
-			break;
-		case PG_SHA256:
-			ctx->data = ALLOC(sizeof(pg_sha256_ctx));
-			break;
-		case PG_SHA384:
-			ctx->data = ALLOC(sizeof(pg_sha384_ctx));
-			break;
-		case PG_SHA512:
-			ctx->data = ALLOC(sizeof(pg_sha512_ctx));
-			break;
-	}
-
-	if (ctx->data == NULL)
-	{
-		explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
-		FREE(ctx);
-		return NULL;
-	}
-
 	return ctx;
 }
 
@@ -100,19 +91,19 @@ pg_cryptohash_init(pg_cryptohash_ctx *ctx)
 	switch (ctx->type)
 	{
 		case PG_MD5:
-			pg_md5_init((pg_md5_ctx *) ctx->data);
+			pg_md5_init(&ctx->data.md5);
 			break;
 		case PG_SHA224:
-			pg_sha224_init((pg_sha224_ctx *) ctx->data);
+			pg_sha224_init(&ctx->data.sha224);
 			break;
 		case PG_SHA256:
-			pg_sha256_init((pg_sha256_ctx *) ctx->data);
+			pg_sha256_init(&ctx->data.sha256);
 			break;
 		case PG_SHA384:
-			pg_sha384_init((pg_sha384_ctx *) ctx->data);
+			pg_sha384_init(&ctx->data.sha384);
 			break;
 		case PG_SHA512:
-			pg_sha512_init((pg_sha512_ctx *) ctx->data);
+			pg_sha512_init(&ctx->data.sha512);
 			break;
 	}
 
@@ -134,19 +125,19 @@ pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len)
 	switch (ctx->type)
 	{
 		case PG_MD5:
-			pg_md5_update((pg_md5_ctx *) ctx->data, data, len);
+			pg_md5_update(&ctx->data.md5, data, len);
 			break;
 		case PG_SHA224:
-			pg_sha224_update((pg_sha224_ctx *) ctx->data, data, len);
+			pg_sha224_update(&ctx->data.sha224, data, len);
 			break;
 		case PG_SHA256:
-			pg_sha256_update((pg_sha256_ctx *) ctx->data, data, len);
+			pg_sha256_update(&ctx->data.sha256, data, len);
 			break;
 		case PG_SHA384:
-			pg_sha384_update((pg_sha384_ctx *) ctx->data, data, len);
+			pg_sha384_update(&ctx->data.sha384, data, len);
 			break;
 		case PG_SHA512:
-			pg_sha512_update((pg_sha512_ctx *) ctx->data, data, len);
+			pg_sha512_update(&ctx->data.sha512, data, len);
 			break;
 	}
 
@@ -168,19 +159,19 @@ pg_cryptohash_final(pg_cryptohash_ctx *ctx, uint8 *dest)
 	switch (ctx->type)
 	{
 		case PG_MD5:
-			pg_md5_final((pg_md5_ctx *) ctx->data, dest);
+			pg_md5_final(&ctx->data.md5, dest);
 			break;
 		case PG_SHA224:
-			pg_sha224_final((pg_sha224_ctx *) ctx->data, dest);
+			pg_sha224_final(&ctx->data.sha224, dest);
 			break;
 		case PG_SHA256:
-			pg_sha256_final((pg_sha256_ctx *) ctx->data, dest);
+			pg_sha256_final(&ctx->data.sha256, dest);
 			break;
 		case PG_SHA384:
-			pg_sha384_final((pg_sha384_ctx *) ctx->data, dest);
+			pg_sha384_final(&ctx->data.sha384, dest);
 			break;
 		case PG_SHA512:
-			pg_sha512_final((pg_sha512_ctx *) ctx->data, dest);
+			pg_sha512_final(&ctx->data.sha512, dest);
 			break;
 	}
 
@@ -198,26 +189,6 @@ pg_cryptohash_free(pg_cryptohash_ctx *ctx)
 	if (ctx == NULL)
 		return;
 
-	switch (ctx->type)
-	{
-		case PG_MD5:
-			explicit_bzero(ctx->data, sizeof(pg_md5_ctx));
-			break;
-		case PG_SHA224:
-			explicit_bzero(ctx->data, sizeof(pg_sha224_ctx));
-			break;
-		case PG_SHA256:
-			explicit_bzero(ctx->data, sizeof(pg_sha256_ctx));
-			break;
-		case PG_SHA384:
-			explicit_bzero(ctx->data, sizeof(pg_sha384_ctx));
-			break;
-		case PG_SHA512:
-			explicit_bzero(ctx->data, sizeof(pg_sha512_ctx));
-			break;
-	}
-
-	FREE(ctx->data);
 	explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
 	FREE(ctx);
 }
diff --git a/src/common/cryptohash_openssl.c b/src/common/cryptohash_openssl.c
index 118651c4153..24b9636d1ca 100644
--- a/src/common/cryptohash_openssl.c
+++ b/src/common/cryptohash_openssl.c
@@ -35,7 +35,7 @@
  * use malloc to be able to return a failure status back to the caller.
  */
 #ifndef FRONTEND
-#define ALLOC(size) palloc(size)
+#define ALLOC(size) MemoryContextAllocExtended(TopMemoryContext, size, MCXT_ALLOC_NO_OOM)
 #define FR

Re: Deadlock between backend and recovery may not be detected

2020-12-18 Thread Fujii Masao

On 2020/12/17 2:15, Fujii Masao wrote:

On 2020/12/16 23:28, Drouvot, Bertrand wrote:

Hi,

On 12/16/20 2:36 PM, Victor Yegorov wrote:

*CAUTION*: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.

ср, 16 дек. 2020 г. в 13:49, Fujii Masao mailto:masao.fu...@oss.nttdata.com>>:

    After doing this procedure, you can see the startup process and backend
    wait for the table lock each other, i.e., deadlock. But this deadlock 
remains
    even after deadlock_timeout passes.

    This seems a bug to me.

+1

    > * Deadlocks involving the Startup process and an ordinary backend process
    > * will be detected by the deadlock detector within the ordinary backend.

    The cause of this issue seems that ResolveRecoveryConflictWithLock() that
    the startup process calls when recovery conflict on lock happens doesn't
    take care of deadlock case at all. You can see this fact by reading the 
above
    source code comment for ResolveRecoveryConflictWithLock().

    To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
    timer in ResolveRecoveryConflictWithLock() so that the startup process can
    send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
    Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
    the backend should check whether the deadlock actually happens or not.
    Attached is the POC patch implimenting this.

good catch!

I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be 
set in ResolveRecoveryConflictWithLock() too (it is already set in 
ResolveRecoveryConflictWithBufferPin()).

So + 1 to consider this as a bug and for the way the patch proposes to fix it.

Thanks Victor and Bertrand for agreeing!
Attached is the updated version of the patch.

Attached is v3 of the patch. Could you review this version?

While the startup process is waiting for recovery conflict on buffer pin,
it repeats sending the request for deadlock check to all the backends
every deadlock_timeout. This may increase the workload in the startup
process and backends, but since this is the original behavior, the patch
doesn't change that. Also in practice this may not be so harmful because
the period that the buffer is kept pinned is basically not so long.

On the other hand, IMO we should avoid this issue while the startup
process is waiting for recovery conflict on locks since it can take
a long time to release the locks. So the patch tries to fix it.

Or I'm overthinking this? If this doesn't need to be handled,
the patch can be simplified more. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/storage/ipc/procarray.c 
b/src/backend/storage/ipc/procarray.c
index ee912b9d5e..7c1a3f8b3f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3324,6 +3324,13 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid 
dbOid)
  */
 pid_t
 CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
+{
+   return SignalVirtualTransaction(vxid, sigmode, true);
+}
+
+pid_t
+SignalVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode,
+bool conflictPending)
 {
ProcArrayStruct *arrayP = procArray;
int index;
@@ -3342,7 +3349,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, 
ProcSignalReason sigmode)
if (procvxid.backendId == vxid.backendId &&
procvxid.localTransactionId == vxid.localTransactionId)
{
-   proc->recoveryConflictPending = true;
+   proc->recoveryConflictPending = conflictPending;
pid = proc->pid;
if (pid != 0)
{
diff --git a/src/backend/storage/ipc/standby.c 
b/src/backend/storage/ipc/standby.c
index 92d9027776..3164aed423 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,6 +42,10 @@ int  max_standby_streaming_delay = 30 * 1000;

 static HTAB *RecoveryLockLists;

+/* Flags set by timeout handlers */
+static volatile sig_atomic_t got_standby_deadlock_timeout = false;
+static volatile sig_atomic_t got_standby_lock_timeout = false;
+
 static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId 
*waitlist,

   ProcSignalReason reason,

   uint32 wait_event_info,
@@ -395,8 +399,10 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
  * lock.  As we are already queued to be granted the lock, no new lock
  * requests conflicting w

Re: Incorrect allocation handling for cryptohash functions with OpenSSL


On 18/12/2020 09:35, Michael Paquier wrote:

Hi all,

As of the work done in 87ae9691, I have played with error injections
in the code paths using this code, but forgot to count for cases where
cascading resowner cleanups are involved.  Like other resources (JIT,
DSM, etc.), this requires an allocation in TopMemoryContext to make
sure that nothing gets forgotten or cleaned up on the way until the
resowner that did the cryptohash allocation is handled.

Attached is a small extension I have played with by doing some error
injections, and a patch.  If there are no objections, I would like to
commit this fix.


pg_cryptohash_create() is now susceptible to leaking memory in 
TopMemoryContext, if the allocations fail. I think the attached should 
fix it (but I haven't tested it at all).


BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we 
need two structs? They're both allocated and controlled by the 
cryptohash implementation. It would seem simpler to have just one.


- Heikki
diff --git a/src/common/cryptohash_openssl.c b/src/common/cryptohash_openssl.c
index 118651c4153..bb303dd6c8d 100644
--- a/src/common/cryptohash_openssl.c
+++ b/src/common/cryptohash_openssl.c
@@ -35,7 +35,7 @@
  * use malloc to be able to return a failure status back to the caller.
  */
 #ifndef FRONTEND
-#define ALLOC(size) palloc(size)
+#define ALLOC(size) MemoryContextAllocExtended(TopMemoryContext, size, MCXT_ALLOC_NO_OOM)
 #define FREE(ptr) pfree(ptr)
 #else
 #define ALLOC(size) malloc(size)
@@ -69,6 +69,10 @@ pg_cryptohash_create(pg_cryptohash_type type)
 	pg_cryptohash_ctx *ctx;
 	pg_cryptohash_state *state;
 
+#ifndef FRONTEND
+	ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
+#endif
+
 	ctx = ALLOC(sizeof(pg_cryptohash_ctx));
 	if (ctx == NULL)
 		return NULL;
@@ -84,10 +88,6 @@ pg_cryptohash_create(pg_cryptohash_type type)
 	ctx->data = state;
 	ctx->type = type;
 
-#ifndef FRONTEND
-	ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
-#endif
-
 	/*
 	 * Initialization takes care of assigning the correct type for OpenSSL.
 	 */

Re: pgbench failed when -f option contains a char '@'