Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2018-09-08 Thread Thomas Munro
On Thu, Jan 11, 2018 at 3:01 PM Thomas Munro
 wrote:
> So I agree with Tom's suggestion:
>
> On Wed, Oct 4, 2017 at 2:29 PM, Tom Lane  wrote:
> > Perhaps serialize the contents into an array in DSM, then rebuild a hash
> > table from that in the worker.  Robert might have a better idea though.
>
> I'd happily volunteer to write or review a patch to do that.  Is there
> a rebase of the stuff that got reverted, to build on?

Here is a draft patch showing the approach discussed for transmitting
enum_blacklist in parallel workers.  This should be useful for
reviving the code reverted by 93a1af0b.

--
Thomas Munro
http://www.enterprisedb.com
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index c168118467..c95caee017 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -19,6 +19,7 @@
 #include "access/session.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "catalog/pg_enum.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "commands/async.h"
@@ -71,6 +72,7 @@
 #define PARALLEL_KEY_SESSION_DSM			UINT64CONST(0x000A)
 #define PARALLEL_KEY_REINDEX_STATE			UINT64CONST(0x000B)
 #define PARALLEL_KEY_RELMAPPER_STATE		UINT64CONST(0x000C)
+#define PARALLEL_KEY_ENUMBLACKLIST			UINT64CONST(0x000D)
 
 /* Fixed-size parallel state. */
 typedef struct FixedParallelState
@@ -208,6 +210,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	Size		tstatelen = 0;
 	Size		reindexlen = 0;
 	Size		relmapperlen = 0;
+	Size		enumblacklistlen = 0;
 	Size		segsize = 0;
 	int			i;
 	FixedParallelState *fps;
@@ -261,8 +264,10 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		shm_toc_estimate_chunk(>estimator, reindexlen);
 		relmapperlen = EstimateRelationMapSpace();
 		shm_toc_estimate_chunk(>estimator, relmapperlen);
+		enumblacklistlen = EstimateEnumBlacklistSpace();
+		shm_toc_estimate_chunk(>estimator, enumblacklistlen);
 		/* If you add more chunks here, you probably need to add keys. */
-		shm_toc_estimate_keys(>estimator, 9);
+		shm_toc_estimate_keys(>estimator, 10);
 
 		/* Estimate space need for error queues. */
 		StaticAssertStmt(BUFFERALIGN(PARALLEL_ERROR_QUEUE_SIZE) ==
@@ -336,6 +341,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		char	   *error_queue_space;
 		char	   *session_dsm_handle_space;
 		char	   *entrypointstate;
+		char	   *enumblacklistspace;
 		Size		lnamelen;
 
 		/* Serialize shared libraries we have loaded. */
@@ -385,6 +391,12 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		shm_toc_insert(pcxt->toc, PARALLEL_KEY_RELMAPPER_STATE,
 	   relmapperspace);
 
+		/* Serialize enum blacklist state. */
+		enumblacklistspace = shm_toc_allocate(pcxt->toc, enumblacklistlen);
+		SerializeEnumBlacklist(enumblacklistspace);
+		shm_toc_insert(pcxt->toc, PARALLEL_KEY_ENUMBLACKLIST,
+	   enumblacklistspace);
+
 		/* Allocate space for worker information. */
 		pcxt->worker = palloc0(sizeof(ParallelWorkerInfo) * pcxt->nworkers);
 
@@ -1218,6 +1230,7 @@ ParallelWorkerMain(Datum main_arg)
 	char	   *tstatespace;
 	char	   *reindexspace;
 	char	   *relmapperspace;
+	char	   *enumblacklistspace;
 	StringInfoData msgbuf;
 	char	   *session_dsm_handle_space;
 
@@ -1397,6 +1410,11 @@ ParallelWorkerMain(Datum main_arg)
 	relmapperspace = shm_toc_lookup(toc, PARALLEL_KEY_RELMAPPER_STATE, false);
 	RestoreRelationMap(relmapperspace);
 
+	/* Restore enum blacklist. */
+	enumblacklistspace = shm_toc_lookup(toc, PARALLEL_KEY_ENUMBLACKLIST,
+		false);
+	RestoreEnumBlacklist(enumblacklistspace);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index fde361f367..87a12996c7 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -28,6 +28,7 @@
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
 
@@ -38,6 +39,7 @@ Oid			binary_upgrade_next_pg_enum_oid = InvalidOid;
 static void RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems);
 static int	sort_order_cmp(const void *p1, const void *p2);
 
+static HTAB *enum_blacklist = NULL;
 
 /*
  * EnumValuesCreate
@@ -620,3 +622,56 @@ sort_order_cmp(const void *p1, const void *p2)
 	else
 		return 0;
 }
+
+Size
+EstimateEnumBlacklistSpace(void)
+{
+	if (!enum_blacklist)
+		return sizeof(Oid);
+	return sizeof(Oid) + sizeof(Oid) * hash_get_num_entries(enum_blacklist);
+}
+
+void
+SerializeEnumBlacklist(void *space)
+{
+	Oid *serialized = (Oid *) space;
+	HASH_SEQ_STATUS status;
+	Oid *value;
+
+	if (!enum_blacklist)
+	{
+		*serialized = 0;
+		return;
+	}
+
+	*serialized = hash_get_num_entries(enum_blacklist);
+	hash_seq_init(, enum_blacklist);
+	while ((value = (Oid *) hash_seq_search()))
+		*serialized++ = *value;
+}
+
+void

Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2018-01-12 Thread Robert Haas
On Fri, Jan 12, 2018 at 5:06 PM, Andres Freund  wrote:
> OTOH, it seems quite likely that we'll add more transaction-lifetime
> shared data (e.g. combocid), so building per-xact infrastructure
> actually seems like a good idea.

Sure, but there's no urgency about it.  Particularly if the first cut
at this is implemented using dshash, moving the dshash to a different
DSA with a different lifespan later should be easy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2018-01-12 Thread Andres Freund
On 2018-01-12 07:51:34 -0500, Robert Haas wrote:
> On Thu, Jan 11, 2018 at 11:01 PM, Thomas Munro
>  wrote:
> > Are you saying we should do the work now to create a per-transaction
> > DSM segment + DSA area + thing that every backend attaches to?
> 
> No, I was just thinking you could stuff it into the per-parallel-query
> DSM/DSA.  But...
> 
> > I didn't think creating backend local hash tables would be a problem
> > because it's a vanishingly rare occurrence for the hash table to be
> > created at all (ie when you've altered an enum), and if created, to
> > have more than a couple of entries in it.
> 
> ...this is also a fair point.

OTOH, it seems quite likely that we'll add more transaction-lifetime
shared data (e.g. combocid), so building per-xact infrastructure
actually seems like a good idea.

Greetings,

Andres Freund



Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2018-01-12 Thread Robert Haas
On Thu, Jan 11, 2018 at 11:01 PM, Thomas Munro
 wrote:
> Are you saying we should do the work now to create a per-transaction
> DSM segment + DSA area + thing that every backend attaches to?

No, I was just thinking you could stuff it into the per-parallel-query
DSM/DSA.  But...

> I didn't think creating backend local hash tables would be a problem
> because it's a vanishingly rare occurrence for the hash table to be
> created at all (ie when you've altered an enum), and if created, to
> have more than a couple of entries in it.

...this is also a fair point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2018-01-11 Thread Thomas Munro
On Fri, Jan 12, 2018 at 4:19 PM, Robert Haas  wrote:
> On Thu, Jan 11, 2018 at 6:01 PM, Thomas Munro
>  wrote:
>> [ the data isn't session lifetime ]
>>
>> So I agree with Tom's suggestion:
>>
>> On Wed, Oct 4, 2017 at 2:29 PM, Tom Lane  wrote:
>>> Perhaps serialize the contents into an array in DSM, then rebuild a hash
>>> table from that in the worker.  Robert might have a better idea though.
>>
>> I'd happily volunteer to write or review a patch to do that.  Is there
>> a rebase of the stuff that got reverted, to build on?
>
> Those seem like reasons not to use Session, but not necessarily
> reasons not to have the leader directly build the dshash that the
> workers access rather than building a separate hash table in every
> worker.

Are you saying we should do the work now to create a per-transaction
DSM segment + DSA area + thing that every backend attaches to?  I
guess that would be much like the per-session one, except that we'd
reset it and end of xact (a DSA facility we don't have yet but which
would amount to zapping all superblocks, freeing all but one segment
or something).  Like the per-session one, I suppose it would be
created on demand when you run your first parallel query, and then
survive as long as the leader backend.  I assumed that we'd do that
work when we really need it for writable parallel query, but that it'd
be overkill for this.

Note that the shared record thing still has the backend local cache in
front of the shared registry, to avoid acquiring locks, so doesn't
actually avoid creating a per-backend hash table, though its entries
point directly to TupleDesc objects in shm.

> Maybe having every worker build a separate hash table is a good idea
> for some reason, but it's not clear to me that you've stated such a
> reason.

I didn't think creating backend local hash tables would be a problem
because it's a vanishingly rare occurrence for the hash table to be
created at all (ie when you've altered an enum), and if created, to
have more than a couple of entries in it.  But here's another idea, at
the small end of the how-much-work spectrum:

1.  Put the OIDs into a sorted array in the DSM segment.
2.  Arrange for EnumBlacklisted() (assuming we're talking about the
infrastructure from commit 1635e80d that was later reverted) to get
its hands on a pointer to that + size and binary search it, instead of
looking in the hash table (enum_blacklist), when running in a worker.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2018-01-11 Thread Robert Haas
On Thu, Jan 11, 2018 at 6:01 PM, Thomas Munro
 wrote:
> [ the data isn't session lifetime ]
>
> So I agree with Tom's suggestion:
>
> On Wed, Oct 4, 2017 at 2:29 PM, Tom Lane  wrote:
>> Perhaps serialize the contents into an array in DSM, then rebuild a hash
>> table from that in the worker.  Robert might have a better idea though.
>
> I'd happily volunteer to write or review a patch to do that.  Is there
> a rebase of the stuff that got reverted, to build on?

Those seem like reasons not to use Session, but not necessarily
reasons not to have the leader directly build the dshash that the
workers access rather than building a separate hash table in every
worker.

Maybe having every worker build a separate hash table is a good idea
for some reason, but it's not clear to me that you've stated such a
reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2018-01-11 Thread Thomas Munro
On Fri, Oct 6, 2017 at 2:45 AM, Robert Haas  wrote:
> On Tue, Oct 3, 2017 at 9:38 PM, Andres Freund  wrote:
>>> Do you have any suggestion as to how we should transmit the blacklist to
>>> parallel workers?
>>
>> How about storing them in the a dshash table instead of dynahash?
>> Similar to how we're now dealing with the shared typmod registry stuff?
>> It should be fairly simple to now simply add a new struct Session member
>> shared_enum_whatevs_table.
>
> Yeah, that approach seems worth exploring.

Andrew Dunstan asked me off-list which README covers that stuff.  Erm,
there isn't one, so I was going to write some explanation here to see
if that could help... but after looking at this I'm not sure I agree
it's the right approach anyway.

The reason commit cc5f8136 introduced Session and
SharedRecordTypmodRegistry was that we have some state that is
session-scoped and writable by any worker.  In contrast:

1.  The enum OID blacklist has transaction scope.  If you wanted to
put it into the Session you'd have to explicitly zap it at end of
transaction.  Incidentally dshash has no fast reset, though that could
be added, but I'd probably want fast per-transaction cleanup to skip
retail destruction entirely and simply give back all the memory, just
like MemoryContext does.  There are other transaction-scoped things
we'll eventually want to share, like ComboCIDs, so I think we'll need
something like that, but no one has been brave enough to propose the
machinery yet.

2.  The enum OID blacklist is read-only for workers.  They don't
create new blacklisted enums and don't see that they ever will.

So I agree with Tom's suggestion:

On Wed, Oct 4, 2017 at 2:29 PM, Tom Lane  wrote:
> Perhaps serialize the contents into an array in DSM, then rebuild a hash
> table from that in the worker.  Robert might have a better idea though.

I'd happily volunteer to write or review a patch to do that.  Is there
a rebase of the stuff that got reverted, to build on?

-- 
Thomas Munro
http://www.enterprisedb.com