Thank you for your review!

On Mon, 8 May 2023 11:56:48 -0400
Melanie Plageman <melanieplage...@gmail.com> wrote:

> ...
> > 4. accessor->read_buffer is currently allocated in accessor->context as
> > well.
> > 
> >    This buffer holds tuple read from the fileset. This is still a buffer,
> > but not related to any file anymore...
> > 
> > Because of 3 and 4, and the gross context granularity of SharedTuplestore
> > related-code, I'm now wondering if "fileCxt" shouldn't be renamed "bufCxt".
> 
> I think bufCxt is a potentially confusing name. The context contains
> pointers to the batch files and saying those are related to buffers is
> confusing. It also sounds like it could include any kind of buffer
> related to the hashtable or hash join.
> 
> Perhaps we could call this memory context the "spillCxt"--indicating it
> is for the memory required for spilling to permanent storage while
> executing hash joins.

"Spilling" seems fair and a large enough net to grab everything around temp
files and accessing them.

> I discuss this more in my code review below.
> 
> > From c5ed2ae2c2749af4f5058b012dc5e8a9e1529127 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <j...@dalibo.com>
> > Date: Mon, 27 Mar 2023 15:54:39 +0200
> > Subject: [PATCH 2/3] Allocate hash batches related BufFile in a dedicated
> >  context  
> 
> > diff --git a/src/backend/executor/nodeHash.c
> > b/src/backend/executor/nodeHash.c index 5fd1c5553b..a4fbf29301 100644
> > --- a/src/backend/executor/nodeHash.c
> > +++ b/src/backend/executor/nodeHash.c
> > @@ -570,15 +574,21 @@ ExecHashTableCreate(HashState *state, List
> > *hashOperators, List *hashCollations, 
> >     if (nbatch > 1 && hashtable->parallel_state == NULL)
> >     {
> > +           MemoryContext oldctx;
> > +
> >             /*
> >              * allocate and initialize the file arrays in hashCxt (not
> > needed for
> >              * parallel case which uses shared tuplestores instead of
> > raw files) */
> > +           oldctx = MemoryContextSwitchTo(hashtable->fileCxt);
> > +
> >             hashtable->innerBatchFile = palloc0_array(BufFile *,
> > nbatch); hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
> >             /* The files will not be opened until needed... */
> >             /* ... but make sure we have temp tablespaces established
> > for them */  
> 
> I haven't studied it closely, but I wonder if we shouldn't switch back
> into the oldctx before calling PrepareTempTablespaces().
> PrepareTempTablespaces() is explicit about what memory context it is
> allocating in, however, I'm not sure it makes sense to call it in the
> fileCxt. If you have a reason, you should add a comment and do the same
> in ExecHashIncreaseNumBatches().

I had no reason. I catched it in ExecHashIncreaseNumBatches() while reviewing
myself, but not here.

Line moved in v6.

> > @@ -934,13 +943,16 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
> >                hashtable, nbatch, hashtable->spaceUsed);
> >  #endif
> >  
> > -   oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
> > -
> >     if (hashtable->innerBatchFile == NULL)
> >     {
> > +           MemoryContext oldcxt =
> > MemoryContextSwitchTo(hashtable->fileCxt); +
> >             /* we had no file arrays before */
> >             hashtable->innerBatchFile = palloc0_array(BufFile *,
> > nbatch); hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
> > +  
> > +           MemoryContextSwitchTo(oldcxt);
> > +
> >             /* time to establish the temp tablespaces, too */
> >             PrepareTempTablespaces();
> >     }
> > @@ -951,8 +963,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)  
> 
> I don't see a reason to call repalloc0_array() in a different context
> than the initial palloc0_array().

Unless I'm wrong, wrapping the whole if/else blocks in memory context
hashtable->fileCxt seemed useless as repalloc() actually realloc in the context
the original allocation occurred. So I only wrapped the palloc() calls.

> > diff --git a/src/backend/executor/nodeHashjoin.c
> > ...
> > @@ -1308,21 +1310,27 @@ ExecParallelHashJoinNewBatch(HashJoinState
> > *hjstate)
> >   * The data recorded in the file for each tuple is its hash value,  
> 
> It doesn't sound accurate to me to say that it should be called *in* the
> HashBatchFiles context. We now switch into that context, so you can
> probably remove this comment and add a comment above the switch into the
> filecxt which explains that the temp file buffers should be allocated in
> the filecxt (both because they need to be allocated in a sufficiently
> long-lived context and to increase visibility of their memory overhead).

Indeed. Comment moved and reworded in v6.

> > diff --git a/src/include/executor/hashjoin.h
> > b/src/include/executor/hashjoin.h index 8ee59d2c71..74867c3e40 100644
> > --- a/src/include/executor/hashjoin.h
> > +++ b/src/include/executor/hashjoin.h
> > @@ -25,10 +25,14 @@
> >   *
> >   * Each active hashjoin has a HashJoinTable control block, which is
> >   * palloc'd in the executor's per-query context.  All other storage needed
> > - * for the hashjoin is kept in private memory contexts, two for each
> > hashjoin.
> > + * for the hashjoin is kept in private memory contexts, three for each
> > + * hashjoin:  
> 
> Maybe "hash table control block". I know the phrase "control block" is
> used elsewhere in the comments, but it is a bit confusing. Also, I wish
> there was a way to make it clear this is for the hashtable but relevant
> to all batches.

I tried to reword the comment with this additional info in mind in v6. Does it
match what you had in mind?

> ...
> So, if we are going to allocate the array of pointers to the spill files
> in the fileCxt, we should revise this comment. As I mentioned above, I
> agree with you that if the SharedTupleStore-related buffers are also
> allocated in this context, perhaps it shouldn't be called the fileCxt.
> 
> One idea I had is calling it the spillCxt. Almost all memory allocated in this
> context is related to needing to spill to permanent storage during execution.

Agree

> The one potential confusing part of this is batch 0 for parallel hash
> join. I would have to dig back into it again, but from a cursory look at
> ExecParallelHashJoinSetUpBatches() it seems like batch 0 also gets a
> shared tuplestore with associated accessors and files even if it is a
> single batch parallel hashjoin.
> 
> Are the parallel hash join read_buffer and write_chunk also used for a
> single batch parallel hash join?

I don't think so.

For the inner side, there's various Assert() around the batchno==0 special
case. Plus, it always has his own block when inserting in a batch, to directly
write in shared memory calling ExecParallelHashPushTuple().

The outer side of the join actually creates all batches using shared tuple
storage mechanism, including batch 0, **only** if the number of batch is
greater than 1. See in ExecParallelHashJoinOuterGetTuple:

  /*
   * In the Parallel Hash case we only run the outer plan directly for
   * single-batch hash joins. Otherwise we have to go to batch files, even
   * for batch 0.
   */
  if (curbatch == 0 && hashtable->nbatch == 1)
  {
        slot = ExecProcNode(outerNode);

So, for a single batch PHJ, it seems there's no temp files involved.

> Though, perhaps spillCxt still makes sense? It doesn't specify
> multi-batch...

I'm not sure the see where would be the confusing part here? Is it that some
STS mechanism are allocated but never used? When the number of batch is 1, it
doesn't really matter much I suppose, as the context consumption stays
really low. Plus, there's some other useless STS/context around there (ie. inner
batch 0 and batch context in PHJ). I'm not sure it worth trying optimizing this
compare to the cost of the added code complexity.

Or am I off-topic and missing something obvious?

> > --- a/src/include/executor/nodeHashjoin.h
> > +++ b/src/include/executor/nodeHashjoin.h
> > @@ -29,6 +29,6 @@ extern void ExecHashJoinInitializeWorker(HashJoinState
> > *state, ParallelWorkerContext *pwcxt);
> >    
> 
> I would add a comment explaining why ExecHashJoinSaveTuple() is passed
> with the fileCxt as a parameter.
> 
> >  extern void ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,

Isn't the comment added in the function itself, in v6, enough? It seems
uncommon to comment on function parameters in headers.

Last, about your TODO in 0001 patch, do you mean that we should document
that after splitting a batch N, its rows can only redispatch in N0 or N1 ?

Regards,
>From 33d799fe230cc3f3c65be07326f297e0bc7fb07c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 30 Apr 2020 07:16:28 -0700
Subject: [PATCH 1/3] Describe hybrid hash join implementation

This is just a draft to spark conversation on what a good comment might
be like in this file on how the hybrid hash join algorithm is
implemented in Postgres. I'm pretty sure this is the accepted term for
this algorithm https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join
---
 src/backend/executor/nodeHashjoin.c | 36 +++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 0a3f32f731..920d1831c2 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -10,6 +10,42 @@
  * IDENTIFICATION
  *	  src/backend/executor/nodeHashjoin.c
  *
+ *   HYBRID HASH JOIN
+ *
+ *  If the inner side tuples of a hash join do not fit in memory, the hash join
+ *  can be executed in multiple batches.
+ *
+ *  If the statistics on the inner side relation are accurate, planner chooses a
+ *  multi-batch strategy and estimates the number of batches.
+ *
+ *  The query executor measures the real size of the hashtable and increases the
+ *  number of batches if the hashtable grows too large.
+ *
+ *  The number of batches is always a power of two, so an increase in the number
+ *  of batches doubles it.
+ *
+ *  Serial hash join measures batch size lazily -- waiting until it is loading a
+ *  batch to determine if it will fit in memory. While inserting tuples into the
+ *  hashtable, serial hash join will, if that tuple were to exceed work_mem,
+ *  dump out the hashtable and reassign them either to other batch files or the
+ *  current batch resident in the hashtable.
+ *
+ *  Parallel hash join, on the other hand, completes all changes to the number
+ *  of batches during the build phase. If it increases the number of batches, it
+ *  dumps out all the tuples from all batches and reassigns them to entirely new
+ *  batch files. Then it checks every batch to ensure it will fit in the space
+ *  budget for the query.
+ *
+ *  In both parallel and serial hash join, the executor currently makes a best
+ *  effort. If a particular batch will not fit in memory, it tries doubling the
+ *  number of batches. If after a batch increase, there is a batch which
+ *  retained all or none of its tuples, the executor disables growth in the
+ *  number of batches globally. After growth is disabled, all batches that would
+ *  have previously triggered an increase in the number of batches instead
+ *  exceed the space allowed.
+ *
+ *  TODO: should we discuss that tuples can only spill forward?
+ *
  * PARALLELISM
  *
  * Hash joins can participate in parallel query execution in several ways.  A
-- 
2.39.2

>From 6c9056979eb4d638f0555a05453686e01b1d1d11 Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais <j...@dalibo.com>
Date: Mon, 27 Mar 2023 15:54:39 +0200
Subject: [PATCH 2/3] Allocate hash batches related BufFile in a dedicated
 context

---
 src/backend/executor/nodeHash.c           | 43 ++++++++++++++++-------
 src/backend/executor/nodeHashjoin.c       | 24 +++++++++----
 src/backend/utils/sort/sharedtuplestore.c |  8 +++++
 src/include/executor/hashjoin.h           | 30 ++++++++++------
 src/include/executor/nodeHashjoin.h       |  2 +-
 5 files changed, 76 insertions(+), 31 deletions(-)

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 5fd1c5553b..e529f50a3e 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -484,7 +484,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	 *
 	 * The hashtable control block is just palloc'd from the executor's
 	 * per-query memory context.  Everything else should be kept inside the
-	 * subsidiary hashCxt or batchCxt.
+	 * subsidiary hashCxt, batchCxt or spillCxt.
 	 */
 	hashtable = palloc_object(HashJoinTableData);
 	hashtable->nbuckets = nbuckets;
@@ -538,6 +538,10 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 												"HashBatchContext",
 												ALLOCSET_DEFAULT_SIZES);
 
+	hashtable->spillCxt = AllocSetContextCreate(hashtable->hashCxt,
+								 "HashSpillContext",
+								 ALLOCSET_DEFAULT_SIZES);
+
 	/* Allocate data that will live for the life of the hashjoin */
 
 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
@@ -570,12 +574,19 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 
 	if (nbatch > 1 && hashtable->parallel_state == NULL)
 	{
+		MemoryContext oldctx;
+
 		/*
 		 * allocate and initialize the file arrays in hashCxt (not needed for
 		 * parallel case which uses shared tuplestores instead of raw files)
 		 */
+		oldctx = MemoryContextSwitchTo(hashtable->spillCxt);
+
 		hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
 		hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
+
+		MemoryContextSwitchTo(oldctx);
+
 		/* The files will not be opened until needed... */
 		/* ... but make sure we have temp tablespaces established for them */
 		PrepareTempTablespaces();
@@ -913,7 +924,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 	int			oldnbatch = hashtable->nbatch;
 	int			curbatch = hashtable->curbatch;
 	int			nbatch;
-	MemoryContext oldcxt;
 	long		ninmemory;
 	long		nfreed;
 	HashMemoryChunk oldchunks;
@@ -934,13 +944,16 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 		   hashtable, nbatch, hashtable->spaceUsed);
 #endif
 
-	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
-
 	if (hashtable->innerBatchFile == NULL)
 	{
+		MemoryContext oldcxt = MemoryContextSwitchTo(hashtable->spillCxt);
+
 		/* we had no file arrays before */
 		hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
 		hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
+
+		MemoryContextSwitchTo(oldcxt);
+
 		/* time to establish the temp tablespaces, too */
 		PrepareTempTablespaces();
 	}
@@ -951,8 +964,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 		hashtable->outerBatchFile = repalloc0_array(hashtable->outerBatchFile, BufFile *, oldnbatch, nbatch);
 	}
 
-	MemoryContextSwitchTo(oldcxt);
-
 	hashtable->nbatch = nbatch;
 
 	/*
@@ -1024,7 +1035,8 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 				Assert(batchno > curbatch);
 				ExecHashJoinSaveTuple(HJTUPLE_MINTUPLE(hashTuple),
 									  hashTuple->hashvalue,
-									  &hashtable->innerBatchFile[batchno]);
+									  &hashtable->innerBatchFile[batchno],
+									  hashtable->spillCxt);
 
 				hashtable->spaceUsed -= hashTupleSize;
 				nfreed++;
@@ -1683,7 +1695,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		Assert(batchno > hashtable->curbatch);
 		ExecHashJoinSaveTuple(tuple,
 							  hashvalue,
-							  &hashtable->innerBatchFile[batchno]);
+							  &hashtable->innerBatchFile[batchno],
+							  hashtable->spillCxt);
 	}
 
 	if (shouldFree)
@@ -2664,7 +2677,8 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 			/* Put the tuple into a temp file for later batches */
 			Assert(batchno > hashtable->curbatch);
 			ExecHashJoinSaveTuple(tuple, hashvalue,
-								  &hashtable->innerBatchFile[batchno]);
+								  &hashtable->innerBatchFile[batchno],
+								  hashtable->spillCxt);
 			pfree(hashTuple);
 			hashtable->spaceUsed -= tupleSize;
 			hashtable->spaceUsedSkew -= tupleSize;
@@ -3093,8 +3107,11 @@ ExecParallelHashJoinSetUpBatches(HashJoinTable hashtable, int nbatch)
 	pstate->nbatch = nbatch;
 	batches = dsa_get_address(hashtable->area, pstate->batches);
 
-	/* Use hash join memory context. */
-	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
+	/*
+	 * Use hash join spill memory context to allocate accessors and their
+	 * buffers
+	 */
+	oldcxt = MemoryContextSwitchTo(hashtable->spillCxt);
 
 	/* Allocate this backend's accessor array. */
 	hashtable->nbatch = nbatch;
@@ -3196,8 +3213,8 @@ ExecParallelHashEnsureBatchAccessors(HashJoinTable hashtable)
 	 */
 	Assert(DsaPointerIsValid(pstate->batches));
 
-	/* Use hash join memory context. */
-	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
+	/* Use hash join spill memory context to allocate accessors */
+	oldcxt = MemoryContextSwitchTo(hashtable->spillCxt);
 
 	/* Allocate this backend's accessor array. */
 	hashtable->nbatch = pstate->nbatch;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 920d1831c2..3b903cc905 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -486,7 +486,8 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
 					Assert(parallel_state == NULL);
 					Assert(batchno > hashtable->curbatch);
 					ExecHashJoinSaveTuple(mintuple, hashvalue,
-										  &hashtable->outerBatchFile[batchno]);
+										  &hashtable->outerBatchFile[batchno],
+										  hashtable->spillCxt);
 
 					if (shouldFree)
 						heap_free_minimal_tuple(mintuple);
@@ -1307,22 +1308,31 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
  *
  * The data recorded in the file for each tuple is its hash value,
  * then the tuple in MinimalTuple format.
- *
- * Note: it is important always to call this in the regular executor
- * context, not in a shorter-lived context; else the temp file buffers
- * will get messed up.
  */
 void
 ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-					  BufFile **fileptr)
+					  BufFile **fileptr, MemoryContext spillcxt)
 {
 	BufFile    *file = *fileptr;
 
 	if (file == NULL)
 	{
-		/* First write to this batch file, so open it. */
+		MemoryContext oldctx;
+
+		/*
+		 * First write to this batch file, so open it.
+		 *
+		 * It is important to always create BufFile'd temp files in the
+		 * HashSpillContext context. First to avoid a shorter-lived context,
+		 * else the temp file buffers will get messed up. Second, for a better
+		 * accounting of the spilling memory consumption.
+		 */
+		oldctx = MemoryContextSwitchTo(spillcxt);
+
 		file = BufFileCreateTemp(false);
 		*fileptr = file;
+
+		MemoryContextSwitchTo(oldctx);
 	}
 
 	BufFileWrite(file, &hashvalue, sizeof(uint32));
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 0831249159..236be65f22 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -308,11 +308,15 @@ sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data,
 	{
 		SharedTuplestoreParticipant *participant;
 		char		name[MAXPGPATH];
+		MemoryContext oldcxt;
 
 		/* Create one.  Only this backend will write into it. */
 		sts_filename(name, accessor, accessor->participant);
+
+		oldcxt = MemoryContextSwitchTo(accessor->context);
 		accessor->write_file =
 			BufFileCreateFileSet(&accessor->fileset->fs, name);
+		MemoryContextSwitchTo(oldcxt);
 
 		/* Set up the shared state for this backend's file. */
 		participant = &accessor->sts->participants[accessor->participant];
@@ -527,11 +531,15 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
 			if (accessor->read_file == NULL)
 			{
 				char		name[MAXPGPATH];
+				MemoryContext oldcxt;
 
 				sts_filename(name, accessor, accessor->read_participant);
+
+				oldcxt = MemoryContextSwitchTo(accessor->context);
 				accessor->read_file =
 					BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
 									   false);
+				MemoryContextSwitchTo(oldcxt);
 			}
 
 			/* Seek and load the chunk header. */
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 8ee59d2c71..09113f0192 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -23,22 +23,31 @@
 /* ----------------------------------------------------------------
  *				hash-join hash table structures
  *
- * Each active hashjoin has a HashJoinTable control block, which is
- * palloc'd in the executor's per-query context.  All other storage needed
- * for the hashjoin is kept in private memory contexts, two for each hashjoin.
- * This makes it easy and fast to release the storage when we don't need it
- * anymore.  (Exception: data associated with the temp files lives in the
- * per-query context too, since we always call buffile.c in that context.)
+ * Each active hashjoin has a HashJoinTable structure, which is
+ * palloc'd in the executor's per-query context.  Other storages needed for
+ * each hashjoin is kept in child contexts, three for each hashjoin:
+ *   - HashTableContext (hashCxt): the parent hash table storage context
+ *     - HashSpillContext (spillCxt): storage for temp files buffers
+ *     - HashBatchContext (batchCxt): storage for a batch in serial hash join
  *
  * The hashtable contexts are made children of the per-query context, ensuring
  * that they will be discarded at end of statement even if the join is
  * aborted early by an error.  (Likewise, any temporary files we make will
  * be cleaned up by the virtual file manager in event of an error.)
  *
- * Storage that should live through the entire join is allocated from the
- * "hashCxt", while storage that is only wanted for the current batch is
- * allocated in the "batchCxt".  By resetting the batchCxt at the end of
- * each batch, we free all the per-batch storage reliably and without tedium.
+ * The "hashCxt" context makes it easy and fast to release the storage when we
+ * don't need it anymore. Storage that should live through the entire join is
+ * allocated from the "hashCxt" (mainly hash's meta datas).
+ *
+ * Data associated with temp files lives in the "spillCxt" context which lives
+ * during the entire join as temp files might need to survives batches. These
+ * files are explicitly destroyed by calling BufFileClose() when the code is
+ * done with them. The aim of this context is to help accounting the memory
+ * allocations dedicated to temp files and their buffers.
+ *
+ * Finaly, storage that is only wanted for the current batch is allocated in
+ * the "batchCxt".  By resetting the batchCxt at the end of each batch, we free
+ * all the per-batch storage reliably and without tedium.
  *
  * During first scan of inner relation, we get its tuples from executor.
  * If nbatch > 1 then tuples that don't belong in first batch get saved
@@ -350,6 +359,7 @@ typedef struct HashJoinTableData
 
 	MemoryContext hashCxt;		/* context for whole-hash-join storage */
 	MemoryContext batchCxt;		/* context for this-batch-only storage */
+	MemoryContext spillCxt;		/* context for spilling to temp files */
 
 	/* used for dense allocation of tuples (into linked chunks) */
 	HashMemoryChunk chunks;		/* one list for the whole batch */
diff --git a/src/include/executor/nodeHashjoin.h b/src/include/executor/nodeHashjoin.h
index d367070883..958df42240 100644
--- a/src/include/executor/nodeHashjoin.h
+++ b/src/include/executor/nodeHashjoin.h
@@ -29,6 +29,6 @@ extern void ExecHashJoinInitializeWorker(HashJoinState *state,
 										 ParallelWorkerContext *pwcxt);
 
 extern void ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-								  BufFile **fileptr);
+								  BufFile **fileptr, MemoryContext spillcxt);
 
 #endif							/* NODEHASHJOIN_H */
-- 
2.39.2

Reply via email to