Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On Thu, Jan 11, 2018 at 3:01 PM Thomas Munro wrote:
> So I agree with Tom's suggestion:
>
> On Wed, Oct 4, 2017 at 2:29 PM, Tom Lane wrote:
>> Perhaps serialize the contents into an array in DSM, then rebuild a hash
>> table from that in the worker.  Robert might have a better idea though.
>
> I'd happily volunteer to write or review a patch to do that.  Is there
> a rebase of the stuff that got reverted, to build on?

Here is a draft patch showing the approach discussed for transmitting
enum_blacklist to parallel workers.  This should be useful for reviving
the code reverted by 93a1af0b.

-- 
Thomas Munro
http://www.enterprisedb.com

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index c168118467..c95caee017 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -19,6 +19,7 @@
 #include "access/session.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "catalog/pg_enum.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "commands/async.h"
@@ -71,6 +72,7 @@
 #define PARALLEL_KEY_SESSION_DSM		UINT64CONST(0x000A)
 #define PARALLEL_KEY_REINDEX_STATE		UINT64CONST(0x000B)
 #define PARALLEL_KEY_RELMAPPER_STATE	UINT64CONST(0x000C)
+#define PARALLEL_KEY_ENUMBLACKLIST		UINT64CONST(0x000D)
 
 /* Fixed-size parallel state. */
 typedef struct FixedParallelState
@@ -208,6 +210,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	Size		tstatelen = 0;
 	Size		reindexlen = 0;
 	Size		relmapperlen = 0;
+	Size		enumblacklistlen = 0;
 	Size		segsize = 0;
 	int			i;
 	FixedParallelState *fps;
@@ -261,8 +264,10 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		shm_toc_estimate_chunk(&pcxt->estimator, reindexlen);
 		relmapperlen = EstimateRelationMapSpace();
 		shm_toc_estimate_chunk(&pcxt->estimator, relmapperlen);
+		enumblacklistlen = EstimateEnumBlacklistSpace();
+		shm_toc_estimate_chunk(&pcxt->estimator, enumblacklistlen);
 		/* If you add more chunks here, you probably need to add keys. */
-		shm_toc_estimate_keys(&pcxt->estimator, 9);
+		shm_toc_estimate_keys(&pcxt->estimator, 10);
 
 		/* Estimate space need for error queues. */
 		StaticAssertStmt(BUFFERALIGN(PARALLEL_ERROR_QUEUE_SIZE) ==
@@ -336,6 +341,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		char	   *error_queue_space;
 		char	   *session_dsm_handle_space;
 		char	   *entrypointstate;
+		char	   *enumblacklistspace;
 		Size		lnamelen;
 
 		/* Serialize shared libraries we have loaded. */
@@ -385,6 +391,12 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		shm_toc_insert(pcxt->toc, PARALLEL_KEY_RELMAPPER_STATE,
 					   relmapperspace);
 
+		/* Serialize enum blacklist state. */
+		enumblacklistspace = shm_toc_allocate(pcxt->toc, enumblacklistlen);
+		SerializeEnumBlacklist(enumblacklistspace);
+		shm_toc_insert(pcxt->toc, PARALLEL_KEY_ENUMBLACKLIST,
+					   enumblacklistspace);
+
 		/* Allocate space for worker information. */
 		pcxt->worker = palloc0(sizeof(ParallelWorkerInfo) * pcxt->nworkers);
@@ -1218,6 +1230,7 @@ ParallelWorkerMain(Datum main_arg)
 	char	   *tstatespace;
 	char	   *reindexspace;
 	char	   *relmapperspace;
+	char	   *enumblacklistspace;
 	StringInfoData msgbuf;
 	char	   *session_dsm_handle_space;
@@ -1397,6 +1410,11 @@ ParallelWorkerMain(Datum main_arg)
 	relmapperspace = shm_toc_lookup(toc, PARALLEL_KEY_RELMAPPER_STATE, false);
 	RestoreRelationMap(relmapperspace);
 
+	/* Restore enum blacklist. */
+	enumblacklistspace = shm_toc_lookup(toc, PARALLEL_KEY_ENUMBLACKLIST,
+										false);
+	RestoreEnumBlacklist(enumblacklistspace);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index fde361f367..87a12996c7 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -28,6 +28,7 @@
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
 
@@ -38,6 +39,7 @@ Oid			binary_upgrade_next_pg_enum_oid = InvalidOid;
 static void RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems);
 static int	sort_order_cmp(const void *p1, const void *p2);
 
+static HTAB *enum_blacklist = NULL;
 
 /*
  * EnumValuesCreate
@@ -620,3 +622,56 @@ sort_order_cmp(const void *p1, const void *p2)
 	else
 		return 0;
 }
+
+Size
+EstimateEnumBlacklistSpace(void)
+{
+	if (!enum_blacklist)
+		return sizeof(Oid);
+	return sizeof(Oid) + sizeof(Oid) * hash_get_num_entries(enum_blacklist);
+}
+
+void
+SerializeEnumBlacklist(void *space)
+{
+	Oid		   *serialized = (Oid *) space;
+	HASH_SEQ_STATUS status;
+	Oid		   *value;
+
+	if (!enum_blacklist)
+	{
+		*serialized = 0;
+		return;
+	}
+
+	*serialized++ = hash_get_num_entries(enum_blacklist);
+	hash_seq_init(&status, enum_blacklist);
+	while ((value = (Oid *) hash_seq_search(&status)))
+		*serialized++ = *value;
+}
+
+void
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On Fri, Jan 12, 2018 at 5:06 PM, Andres Freund wrote:
> OTOH, it seems quite likely that we'll add more transaction-lifetime
> shared data (e.g. combocid), so building per-xact infrastructure
> actually seems like a good idea.

Sure, but there's no urgency about it.  Particularly if the first cut
at this is implemented using dshash, moving the dshash to a different
DSA with a different lifespan later should be easy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On 2018-01-12 07:51:34 -0500, Robert Haas wrote:
> On Thu, Jan 11, 2018 at 11:01 PM, Thomas Munro wrote:
>> Are you saying we should do the work now to create a per-transaction
>> DSM segment + DSA area + thing that every backend attaches to?
>
> No, I was just thinking you could stuff it into the per-parallel-query
> DSM/DSA.  But...
>
>> I didn't think creating backend local hash tables would be a problem
>> because it's a vanishingly rare occurrence for the hash table to be
>> created at all (ie when you've altered an enum), and if created, to
>> have more than a couple of entries in it.
>
> ...this is also a fair point.

OTOH, it seems quite likely that we'll add more transaction-lifetime
shared data (e.g. combocid), so building per-xact infrastructure
actually seems like a good idea.

Greetings,

Andres Freund
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On Thu, Jan 11, 2018 at 11:01 PM, Thomas Munro wrote:
> Are you saying we should do the work now to create a per-transaction
> DSM segment + DSA area + thing that every backend attaches to?

No, I was just thinking you could stuff it into the per-parallel-query
DSM/DSA.  But...

> I didn't think creating backend local hash tables would be a problem
> because it's a vanishingly rare occurrence for the hash table to be
> created at all (ie when you've altered an enum), and if created, to
> have more than a couple of entries in it.

...this is also a fair point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On Fri, Jan 12, 2018 at 4:19 PM, Robert Haas wrote:
> On Thu, Jan 11, 2018 at 6:01 PM, Thomas Munro wrote:
>> [ the data isn't session lifetime ]
>>
>> So I agree with Tom's suggestion:
>>
>> On Wed, Oct 4, 2017 at 2:29 PM, Tom Lane wrote:
>>> Perhaps serialize the contents into an array in DSM, then rebuild a hash
>>> table from that in the worker.  Robert might have a better idea though.
>>
>> I'd happily volunteer to write or review a patch to do that.  Is there
>> a rebase of the stuff that got reverted, to build on?
>
> Those seem like reasons not to use Session, but not necessarily
> reasons not to have the leader directly build the dshash that the
> workers access rather than building a separate hash table in every
> worker.

Are you saying we should do the work now to create a per-transaction
DSM segment + DSA area + thing that every backend attaches to?  I
guess that would be much like the per-session one, except that we'd
reset it at end of xact (a DSA facility we don't have yet but which
would amount to zapping all superblocks, freeing all but one segment
or something).  Like the per-session one, I suppose it would be
created on demand when you run your first parallel query, and then
survive as long as the leader backend.  I assumed that we'd do that
work when we really need it for writable parallel query, but that it'd
be overkill for this.

Note that the shared record thing still has the backend-local cache in
front of the shared registry, to avoid acquiring locks, so it doesn't
actually avoid creating a per-backend hash table, though its entries
point directly to TupleDesc objects in shm.

> Maybe having every worker build a separate hash table is a good idea
> for some reason, but it's not clear to me that you've stated such a
> reason.

I didn't think creating backend-local hash tables would be a problem
because it's a vanishingly rare occurrence for the hash table to be
created at all (ie when you've altered an enum), and if created, to
have more than a couple of entries in it.

But here's another idea, at the small end of the how-much-work
spectrum:

1.  Put the OIDs into a sorted array in the DSM segment.

2.  Arrange for EnumBlacklisted() (assuming we're talking about the
infrastructure from commit 1635e80d that was later reverted) to get
its hands on a pointer to that + size and binary search it, instead of
looking in the hash table (enum_blacklist), when running in a worker.

-- 
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On Thu, Jan 11, 2018 at 6:01 PM, Thomas Munro wrote:
> [ the data isn't session lifetime ]
>
> So I agree with Tom's suggestion:
>
> On Wed, Oct 4, 2017 at 2:29 PM, Tom Lane wrote:
>> Perhaps serialize the contents into an array in DSM, then rebuild a hash
>> table from that in the worker.  Robert might have a better idea though.
>
> I'd happily volunteer to write or review a patch to do that.  Is there
> a rebase of the stuff that got reverted, to build on?

Those seem like reasons not to use Session, but not necessarily
reasons not to have the leader directly build the dshash that the
workers access rather than building a separate hash table in every
worker.  Maybe having every worker build a separate hash table is a
good idea for some reason, but it's not clear to me that you've stated
such a reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On Fri, Oct 6, 2017 at 2:45 AM, Robert Haas wrote:
> On Tue, Oct 3, 2017 at 9:38 PM, Andres Freund wrote:
>>> Do you have any suggestion as to how we should transmit the blacklist to
>>> parallel workers?
>>
>> How about storing them in a dshash table instead of dynahash?
>> Similar to how we're now dealing with the shared typmod registry stuff?
>> It should be fairly simple to now simply add a new struct Session member
>> shared_enum_whatevs_table.
>
> Yeah, that approach seems worth exploring.

Andrew Dunstan asked me off-list which README covers that stuff.  Erm,
there isn't one, so I was going to write some explanation here to see
if that could help... but after looking at this I'm not sure I agree
it's the right approach anyway.

The reason commit cc5f8136 introduced Session and
SharedRecordTypmodRegistry was that we have some state that is
session-scoped and writable by any worker.  In contrast:

1.  The enum OID blacklist has transaction scope.  If you wanted to
put it into the Session you'd have to explicitly zap it at end of
transaction.  Incidentally dshash has no fast reset, though that could
be added, but I'd probably want fast per-transaction cleanup to skip
retail destruction entirely and simply give back all the memory, just
like MemoryContext does.  There are other transaction-scoped things
we'll eventually want to share, like ComboCIDs, so I think we'll need
something like that, but no one has been brave enough to propose the
machinery yet.

2.  The enum OID blacklist is read-only for workers.  They don't
create new blacklisted enums and don't see that they ever will.

So I agree with Tom's suggestion:

On Wed, Oct 4, 2017 at 2:29 PM, Tom Lane wrote:
> Perhaps serialize the contents into an array in DSM, then rebuild a hash
> table from that in the worker.  Robert might have a better idea though.

I'd happily volunteer to write or review a patch to do that.  Is there
a rebase of the stuff that got reverted, to build on?

-- 
Thomas Munro
http://www.enterprisedb.com