On Thu, Dec 8, 2016 at 5:23 PM, Robert Haas <[email protected]> wrote:
> On Wed, Nov 23, 2016 at 3:33 AM, Andres Freund <[email protected]> wrote:
>> On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
>>> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <[email protected]> wrote:
>>> > I've a working fix for this, and for a similar issue Robert found. I'm
>>> > still playing around with it, but basically the fix is to make the
>>> > growth policy a bit more adaptive.
>>>
>>> Any chance you can post a patch soon?
>>
>> Here's my WIP series addressing this and related problems. With this
>> we're again noticeably faster than the dynahash implementation, in both
>> the case here, and the query you brought up over IM.
>>
>> This definitely needs some more TLC, but the general approach seems
>> good. I particularly like that it apparently allows us to increase the
>> default fillfactor without much downside according to my measurements.
>
> Are you going to commit something here? At least enough to make
> Finalize HashAgg -> Gather -> Partial HashAgg terminate in finite
> time? Because the fact that it doesn't really sucks.
I took a look at Andres's patches today and saw that they can't really
be applied as-is, because they "temporarily" change the master's
ParallelWorkerNumber but have no provision for restoring it after an
ERROR. I think that approach isn't right anyway; it seems to me that
what we want to do is set hash_iv based on we're in a Partial HashAgg,
not whether we're in a parallel worker. For instance, if we're
somehow in a nodeSubplan.c there's no need for this just because we
happen to be in a worker -- I think. That led me to develop the
attached patch.
This may not be perfect, but it causes TPC-H Q18 to complete instead
of hanging forever, so I'm going to commit it RSN unless there are
loud objections combined with promising steps toward some alternative
fix. It's been over a month since these problems were reported, and
it is not good that the tree has been in a broken state for that
entire time.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index 94cc59d..3149fbe 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -18,6 +18,8 @@
*/
#include "postgres.h"
+#include "access/hash.h"
+#include "access/parallel.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "utils/lsyscache.h"
@@ -289,7 +291,8 @@ BuildTupleHashTable(int numCols, AttrNumber *keyColIdx,
FmgrInfo *eqfunctions,
FmgrInfo *hashfunctions,
long nbuckets, Size additionalsize,
- MemoryContext tablecxt, MemoryContext tempcxt)
+ MemoryContext tablecxt, MemoryContext tempcxt,
+ bool use_variable_hash_iv)
{
TupleHashTable hashtable;
Size entrysize = sizeof(TupleHashEntryData) + additionalsize;
@@ -314,6 +317,19 @@ BuildTupleHashTable(int numCols, AttrNumber *keyColIdx,
hashtable->in_hash_funcs = NULL;
hashtable->cur_eq_funcs = NULL;
+ /*
+ * If parallelism is in use, even if the master backend is performing the
+ * scan itself, we don't want to create the hashtable exactly the same way
+ * in all workers. As hashtables are iterated over in keyspace-order,
+ * doing so in all processes in the same way is likely to lead to
+ * "unbalanced" hashtables when the table size initially is
+ * underestimated.
+ */
+ if (use_variable_hash_iv)
+ hashtable->hash_iv = hash_uint32(ParallelWorkerNumber);
+ else
+ hashtable->hash_iv = 0;
+
hashtable->hashtab = tuplehash_create(tablecxt, nbuckets);
hashtable->hashtab->private_data = hashtable;
@@ -450,7 +466,7 @@ TupleHashTableHash(struct tuplehash_hash *tb, const MinimalTuple tuple)
TupleHashTable hashtable = (TupleHashTable) tb->private_data;
int numCols = hashtable->numCols;
AttrNumber *keyColIdx = hashtable->keyColIdx;
- uint32 hashkey = 0;
+ uint32 hashkey = hashtable->hash_iv;
TupleTableSlot *slot;
FmgrInfo *hashfunctions;
int i;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index eefb3d6..a093862 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1723,7 +1723,8 @@ build_hash_table(AggState *aggstate)
node->numGroups,
additionalsize,
aggstate->aggcontexts[0]->ecxt_per_tuple_memory,
- tmpmem);
+ tmpmem,
+ !DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit));
}
/*
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index acded07..5b734c0 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -43,7 +43,8 @@ build_hash_table(RecursiveUnionState *rustate)
node->numGroups,
0,
rustate->tableContext,
- rustate->tempContext);
+ rustate->tempContext,
+ false);
}
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index e94555e..760b935 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -130,7 +130,8 @@ build_hash_table(SetOpState *setopstate)
node->numGroups,
0,
setopstate->tableContext,
- setopstate->tempContext);
+ setopstate->tempContext,
+ false);
}
/*
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index 8ca8fc4..d343600 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -510,7 +510,8 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
nbuckets,
0,
node->hashtablecxt,
- node->hashtempcxt);
+ node->hashtempcxt,
+ false);
if (!subplan->unknownEqFalse)
{
@@ -529,7 +530,8 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
nbuckets,
0,
node->hashtablecxt,
- node->hashtempcxt);
+ node->hashtempcxt,
+ false);
}
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index b4d09f9..3f649fa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -143,7 +143,7 @@ extern TupleHashTable BuildTupleHashTable(int numCols, AttrNumber *keyColIdx,
FmgrInfo *hashfunctions,
long nbuckets, Size additionalsize,
MemoryContext tablecxt,
- MemoryContext tempcxt);
+ MemoryContext tempcxt, bool use_variable_hash_iv);
extern TupleHashEntry LookupTupleHashEntry(TupleHashTable hashtable,
TupleTableSlot *slot,
bool *isnew);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1de5c81..703604a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -533,6 +533,7 @@ typedef struct TupleHashTableData
TupleTableSlot *inputslot; /* current input tuple's slot */
FmgrInfo *in_hash_funcs; /* hash functions for input datatype(s) */
FmgrInfo *cur_eq_funcs; /* equality functions for input vs. table */
+ uint32 hash_iv; /* hash-function IV */
} TupleHashTableData;
typedef tuplehash_iterator TupleHashIterator;
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers