Hi,

On 8.1.2015 08:53, Ali Akbar wrote:
> In the CF, the status becomes "Needs Review". Let's continue our
> discussion of makeArrayResult* behavior if subcontext=false and
> release=true (more below):
> 2014-12-22 8:08 GMT+07:00 Ali Akbar <the.ap...@gmail.com
> <mailto:the.ap...@gmail.com>>:
> 
> 
>     With this API, i think we should make it clear if we call
>     initArrayResult with subcontext=false, we can't call
>     makeArrayResult, but we must use makeMdArrayResult directly.
> 
>     Or better, we can modify makeArrayResult to release according to
>     astate->private_cxt:
> 
>         @@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate,
>             dims[0] = astate->nelems;
>             lbs[0] = 1;
>          
>         -   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
>         true);
>         +   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
>         astate->private_cxt);

I've done this, so makeArrayResult() uses the private_cxt flag.

>     Or else we implement what you suggest below (more comments below):
> 
>         Thinking about the 'release' flag a bit more - maybe we could do
>         this
>         instead:
> 
>             if (release && astate->private_cxt)
>                 MemoryContextDelete(astate->mcontext);
>             else if (release)
>             {
>                 pfree(astate->dvalues);
>                 pfree(astate->dnulls);
>                 pfree(astate);
>             }
> 
>         i.e. either destroy the whole context if possible, and just free the
>         memory when using a shared memory context. But I'm afraid this would
>         penalize the shared memory context, because that's intended for
>         cases where all the build states coexist in parallel and then at some
>         point are all converted into a result and thrown away. Adding pfree()
>         calls is no improvement here, and just wastes cycles.
> 
> 
>     As per Tom's comment, i'm using "parent memory context" instead of
>     "shared memory context" below.
> 
>     In the future, if some code writer decided to use subcontext=false,
>     to save memory in cases where there are many array accumulation, and
>     the parent memory context is long-living, current code can cause
>     memory leak. So i think we should implement your suggestion
>     (pfreeing astate), and warn the implication in the API comment. The
>     API user must choose between release=true, wasting cycles but
>     preventing memory leak, or managing memory from the parent memory
>     context.

I'm wondering whether this is necessary after fixing makeArrayResult to
use the privat_cxt flag. It's still possible to call makeMdArrayResult
directly (with the wrong 'release' value).

Another option might be to get rid of the 'release' flag altogether, and
just use the 'private_cxt' - I'm not aware of a code using release=false
with private_cxt=true (e.g. to build the same array twice from the same
astate). But maybe there's such code, and another downside is that it'd
break the existing API.

>     In one possible use case, for efficiency maybe the caller will
>     create a special parent memory context for all array accumulation.
>     Then uses makeArrayResult* with release=false, and in the end
>     releasing the parent memory context once for all.

Yeah, although I'd much rather not break the existing code at all. That
is - my goal is not to make it slower unless absolutely necessary (and
in that case the code may be fixed per your suggestion). But I'm not
convinced it's worth it.

> As for the v6 patch:
> - the patch applies cleanly to master
> - make check is successfull
> - memory benefit is still there
> - performance benefit i think is negligible
> 
> Reviewing the code, found this:
> 
>     @@ -573,7 +578,22 @@ array_agg_array_transfn(PG_FUNCTION_ARGS)
>              elog(ERROR, "array_agg_array_transfn called in
>     non-aggregate context");
>          }
>      
>     -    state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *)
>     PG_GETARG_POINTER(0);
>     +
>     +    if (PG_ARGISNULL(0))
>     +    {
>     +        Oid            element_type = get_element_type(arg1_typeid);
>     +
>     +        if (!OidIsValid(element_type))
>     +            ereport(ERROR,
>     +                    (errcode(ERRCODE_DATATYPE_MISMATCH),
>     +                     errmsg("data type %s is not an array type",
>     +                            format_type_be(arg1_typeid))));
> 
> 
> digging more, it looks like those code required because
> accumArrayResultArr checks the element type:
> 
>             /* First time through --- initialize */
>             Oid            element_type = get_element_type(array_type);
> 
>             if (!OidIsValid(element_type))
>                 ereport(ERROR,
>                         (errcode(ERRCODE_DATATYPE_MISMATCH),
>                          errmsg("data type %s is not an array type",
>                                 format_type_be(array_type))));
>             astate = initArrayResultArr(array_type, element_type,
>     rcontext, true);
> 
> 
> I think it should be in initArrayResultArr, because it is an
> initialization-only check, shouldn't it?

Yes, seems reasonable. There are three places where initArrayResultArr
is called, and at two of them it looks just like the code you posted.

    Oid element_type = get_element_type(array_type);

    if (! OidIsValid(element_type)) { ... error ... }

    astate = initArrayResultArr(array_type, element_type, ...)

The last place is initArrayResultAny() is different because it uses the
element_type to decide whether to use initArrayResultArr or plain
initArrayResult. So the error-handling is missing here.

But I think it makes sense to move the error handling into
initArrayResultArr(), including the get_element_type() call, and remove
the element_type from the signature. This means initArrayResultAny()
will call the get_element_type() twice, but I guess that's negligible.
And everyone who calls initArrayResultArr() will get the error handling
for free.

Patch v7 attached, implementing those two changes, i.e.

  * makeMdArrayResult(..., astate->private_cxt)
  * move error handling into initArrayResultArr()
  * remove element_type from initArrayResultArr() signature

regards
Tomas Vondra

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index f3ce1d7..9eb4d63 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -262,7 +262,7 @@ ExecScanSubPlan(SubPlanState *node,
 	/* Initialize ArrayBuildStateAny in caller's context, if needed */
 	if (subLinkType == ARRAY_SUBLINK)
 		astate = initArrayResultAny(subplan->firstColType,
-									CurrentMemoryContext);
+									CurrentMemoryContext, true);
 
 	/*
 	 * We are probably in a short-lived expression-evaluation context. Switch
@@ -964,7 +964,7 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext)
 	/* Initialize ArrayBuildStateAny in caller's context, if needed */
 	if (subLinkType == ARRAY_SUBLINK)
 		astate = initArrayResultAny(subplan->firstColType,
-									CurrentMemoryContext);
+									CurrentMemoryContext, true);
 
 	/*
 	 * Must switch to per-query memory context.
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 600646e..1386a71 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -498,8 +498,13 @@ array_agg_transfn(PG_FUNCTION_ARGS)
 		elog(ERROR, "array_agg_transfn called in non-aggregate context");
 	}
 
-	state = PG_ARGISNULL(0) ? NULL : (ArrayBuildState *) PG_GETARG_POINTER(0);
+	if (PG_ARGISNULL(0))
+		state = initArrayResult(arg1_typeid, aggcontext, false);
+	else
+		state = (ArrayBuildState *) PG_GETARG_POINTER(0);
+
 	elem = PG_ARGISNULL(1) ? (Datum) 0 : PG_GETARG_DATUM(1);
+
 	state = accumArrayResult(state,
 							 elem,
 							 PG_ARGISNULL(1),
@@ -573,7 +578,12 @@ array_agg_array_transfn(PG_FUNCTION_ARGS)
 		elog(ERROR, "array_agg_array_transfn called in non-aggregate context");
 	}
 
-	state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *) PG_GETARG_POINTER(0);
+
+	if (PG_ARGISNULL(0))
+		state = initArrayResultArr(arg1_typeid, aggcontext, false);
+	else
+		state = (ArrayBuildStateArr *) PG_GETARG_POINTER(0);
+
 	state = accumArrayResultArr(state,
 								PG_GETARG_DATUM(1),
 								PG_ARGISNULL(1),
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 5591b46..cccc1b2 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4606,6 +4606,7 @@ array_insert_slice(ArrayType *destArray,
  *
  *	element_type is the array element type (must be a valid array element type)
  *	rcontext is where to keep working state
+ *	subcontext is a flag determining whether to use a separate memory context
  *
  * Note: there are two common schemes for using accumArrayResult().
  * In the older scheme, you start with a NULL ArrayBuildState pointer, and
@@ -4615,24 +4616,36 @@ array_insert_slice(ArrayType *destArray,
  * once per element.  In this scheme you always end with a non-NULL pointer
  * that you can pass to makeArrayResult; you get an empty array if there
  * were no elements.  This is preferred if an empty array is what you want.
+ *
+ * You may choose whether to create a separate memory context for the array
+ * builder, or whether to allocate it directly within rcontext (along with
+ * various other pieces). A separate context has the benefit that it may
+ * be released by deleting the whole context. The main reason why not to
+ * use private contexts is memory efficiency - the minimum size of a context
+ * is 8kB (ALLOCSET_DEFAULT_INITSIZE) even if there's a single value in the
+ * array. When there are many states in parallel (e.g. as in hash aggregate)
+ * this may result in significant memory bloat (up to causing OOM) and also
+ * overhead with managing so many memory contexts.
  */
 ArrayBuildState *
-initArrayResult(Oid element_type, MemoryContext rcontext)
+initArrayResult(Oid element_type, MemoryContext rcontext, bool subcontext)
 {
 	ArrayBuildState *astate;
-	MemoryContext arr_context;
+	MemoryContext arr_context = rcontext;	/* by default use the parent ctx */
 
 	/* Make a temporary context to hold all the junk */
-	arr_context = AllocSetContextCreate(rcontext,
-										"accumArrayResult",
-										ALLOCSET_DEFAULT_MINSIZE,
-										ALLOCSET_DEFAULT_INITSIZE,
-										ALLOCSET_DEFAULT_MAXSIZE);
+	if (subcontext)
+		arr_context = AllocSetContextCreate(rcontext,
+											"accumArrayResult",
+											ALLOCSET_DEFAULT_MINSIZE,
+											ALLOCSET_DEFAULT_INITSIZE,
+											ALLOCSET_DEFAULT_MAXSIZE);
 
 	astate = (ArrayBuildState *)
 		MemoryContextAlloc(arr_context, sizeof(ArrayBuildState));
 	astate->mcontext = arr_context;
-	astate->alen = 64;			/* arbitrary starting array size */
+	astate->private_cxt = subcontext;
+	astate->alen = 8;			/* arbitrary starting array size */
 	astate->dvalues = (Datum *)
 		MemoryContextAlloc(arr_context, astate->alen * sizeof(Datum));
 	astate->dnulls = (bool *)
@@ -4666,7 +4679,7 @@ accumArrayResult(ArrayBuildState *astate,
 	if (astate == NULL)
 	{
 		/* First time through --- initialize */
-		astate = initArrayResult(element_type, rcontext);
+		astate = initArrayResult(element_type, rcontext, true);
 	}
 	else
 	{
@@ -4729,7 +4742,8 @@ makeArrayResult(ArrayBuildState *astate,
 	dims[0] = astate->nelems;
 	lbs[0] = 1;
 
-	return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true);
+	return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
+							 astate->private_cxt);
 }
 
 /*
@@ -4768,6 +4782,9 @@ makeMdArrayResult(ArrayBuildState *astate,
 
 	MemoryContextSwitchTo(oldcontext);
 
+	/* we can only release the context if it's a private one. */
+	Assert(! (release && !astate->private_cxt));
+
 	/* Clean up all the junk */
 	if (release)
 		MemoryContextDelete(astate->mcontext);
@@ -4789,24 +4806,35 @@ makeMdArrayResult(ArrayBuildState *astate,
  *	array_type is the array type (must be a valid varlena array type)
  *	element_type is the type of the array's elements
  *	rcontext is where to keep working state
+ *	subcontext is a flag determining whether to use a separate memory context
  */
 ArrayBuildStateArr *
-initArrayResultArr(Oid array_type, Oid element_type, MemoryContext rcontext)
+initArrayResultArr(Oid array_type, MemoryContext rcontext, bool subcontext)
 {
 	ArrayBuildStateArr *astate;
-	MemoryContext arr_context;
+	MemoryContext arr_context = rcontext;   /* by default use the parent ctx */
+
+	Oid element_type = get_element_type(array_type);
+
+	if (!OidIsValid(element_type))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("data type %s is not an array type",
+						format_type_be(array_type))));
 
 	/* Make a temporary context to hold all the junk */
-	arr_context = AllocSetContextCreate(rcontext,
-										"accumArrayResultArr",
-										ALLOCSET_DEFAULT_MINSIZE,
-										ALLOCSET_DEFAULT_INITSIZE,
-										ALLOCSET_DEFAULT_MAXSIZE);
+	if (subcontext)
+		arr_context = AllocSetContextCreate(rcontext,
+											"accumArrayResultArr",
+											ALLOCSET_DEFAULT_MINSIZE,
+											ALLOCSET_DEFAULT_INITSIZE,
+											ALLOCSET_DEFAULT_MAXSIZE);
 
 	/* Note we initialize all fields to zero */
 	astate = (ArrayBuildStateArr *)
 		MemoryContextAllocZero(arr_context, sizeof(ArrayBuildStateArr));
 	astate->mcontext = arr_context;
+	astate->private_cxt = subcontext;
 
 	/* Save relevant datatype information */
 	astate->array_type = array_type;
@@ -4853,21 +4881,9 @@ accumArrayResultArr(ArrayBuildStateArr *astate,
 	arg = DatumGetArrayTypeP(dvalue);
 
 	if (astate == NULL)
-	{
-		/* First time through --- initialize */
-		Oid			element_type = get_element_type(array_type);
-
-		if (!OidIsValid(element_type))
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("data type %s is not an array type",
-							format_type_be(array_type))));
-		astate = initArrayResultArr(array_type, element_type, rcontext);
-	}
+		astate = initArrayResultArr(array_type, rcontext, true);
 	else
-	{
 		Assert(astate->array_type == array_type);
-	}
 
 	oldcontext = MemoryContextSwitchTo(astate->mcontext);
 
@@ -5044,6 +5060,9 @@ makeArrayResultArr(ArrayBuildStateArr *astate,
 
 	MemoryContextSwitchTo(oldcontext);
 
+	/* we can only release the context if it's a private one. */
+	Assert(! (release && !astate->private_cxt));
+
 	/* Clean up all the junk */
 	if (release)
 		MemoryContextDelete(astate->mcontext);
@@ -5062,9 +5081,10 @@ makeArrayResultArr(ArrayBuildStateArr *astate,
  *
  *	input_type is the input datatype (either element or array type)
  *	rcontext is where to keep working state
+ *	subcontext is a flag determining whether to use a separate memory context
  */
 ArrayBuildStateAny *
-initArrayResultAny(Oid input_type, MemoryContext rcontext)
+initArrayResultAny(Oid input_type, MemoryContext rcontext, bool subcontext)
 {
 	ArrayBuildStateAny *astate;
 	Oid			element_type = get_element_type(input_type);
@@ -5074,7 +5094,7 @@ initArrayResultAny(Oid input_type, MemoryContext rcontext)
 		/* Array case */
 		ArrayBuildStateArr *arraystate;
 
-		arraystate = initArrayResultArr(input_type, element_type, rcontext);
+		arraystate = initArrayResultArr(input_type, rcontext, subcontext);
 		astate = (ArrayBuildStateAny *)
 			MemoryContextAlloc(arraystate->mcontext,
 							   sizeof(ArrayBuildStateAny));
@@ -5089,7 +5109,7 @@ initArrayResultAny(Oid input_type, MemoryContext rcontext)
 		/* Let's just check that we have a type that can be put into arrays */
 		Assert(OidIsValid(get_array_type(input_type)));
 
-		scalarstate = initArrayResult(input_type, rcontext);
+		scalarstate = initArrayResult(input_type, rcontext, subcontext);
 		astate = (ArrayBuildStateAny *)
 			MemoryContextAlloc(scalarstate->mcontext,
 							   sizeof(ArrayBuildStateAny));
@@ -5115,7 +5135,7 @@ accumArrayResultAny(ArrayBuildStateAny *astate,
 					MemoryContext rcontext)
 {
 	if (astate == NULL)
-		astate = initArrayResultAny(input_type, rcontext);
+		astate = initArrayResultAny(input_type, rcontext, true);
 
 	if (astate->scalarstate)
 		(void) accumArrayResult(astate->scalarstate,
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index bfe9447..8bb7144 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3948,7 +3948,7 @@ xpath(PG_FUNCTION_ARGS)
 	ArrayType  *namespaces = PG_GETARG_ARRAYTYPE_P(2);
 	ArrayBuildState *astate;
 
-	astate = initArrayResult(XMLOID, CurrentMemoryContext);
+	astate = initArrayResult(XMLOID, CurrentMemoryContext, true);
 	xpath_internal(xpath_expr_text, data, namespaces,
 				   NULL, astate);
 	PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate, CurrentMemoryContext));
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 694bce7..d0b6836 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -89,6 +89,7 @@ typedef struct ArrayBuildState
 	int16		typlen;			/* needed info about datatype */
 	bool		typbyval;
 	char		typalign;
+	bool		private_cxt;	/* use private memory context */
 } ArrayBuildState;
 
 /*
@@ -109,6 +110,7 @@ typedef struct ArrayBuildStateArr
 	int			lbs[MAXDIM];
 	Oid			array_type;		/* data type of the arrays */
 	Oid			element_type;	/* data type of the array elements */
+	bool		private_cxt;	/* use private memory context */
 } ArrayBuildStateArr;
 
 /*
@@ -286,7 +288,7 @@ extern void deconstruct_array(ArrayType *array,
 extern bool array_contains_nulls(ArrayType *array);
 
 extern ArrayBuildState *initArrayResult(Oid element_type,
-				MemoryContext rcontext);
+				MemoryContext rcontext, bool subcontext);
 extern ArrayBuildState *accumArrayResult(ArrayBuildState *astate,
 				 Datum dvalue, bool disnull,
 				 Oid element_type,
@@ -296,8 +298,8 @@ extern Datum makeArrayResult(ArrayBuildState *astate,
 extern Datum makeMdArrayResult(ArrayBuildState *astate, int ndims,
 				  int *dims, int *lbs, MemoryContext rcontext, bool release);
 
-extern ArrayBuildStateArr *initArrayResultArr(Oid array_type, Oid element_type,
-				   MemoryContext rcontext);
+extern ArrayBuildStateArr *initArrayResultArr(Oid array_type,
+				   MemoryContext rcontext, bool subcontext);
 extern ArrayBuildStateArr *accumArrayResultArr(ArrayBuildStateArr *astate,
 					Datum dvalue, bool disnull,
 					Oid array_type,
@@ -306,7 +308,7 @@ extern Datum makeArrayResultArr(ArrayBuildStateArr *astate,
 				   MemoryContext rcontext, bool release);
 
 extern ArrayBuildStateAny *initArrayResultAny(Oid input_type,
-				   MemoryContext rcontext);
+				   MemoryContext rcontext, bool subcontext);
 extern ArrayBuildStateAny *accumArrayResultAny(ArrayBuildStateAny *astate,
 					Datum dvalue, bool disnull,
 					Oid input_type,
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 492c1ef..e3dda5d 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -1218,7 +1218,7 @@ plperl_array_to_datum(SV *src, Oid typid, int32 typmod)
 				 errmsg("cannot convert Perl array to non-array type %s",
 						format_type_be(typid))));
 
-	astate = initArrayResult(elemtypid, CurrentMemoryContext);
+	astate = initArrayResult(elemtypid, CurrentMemoryContext, true);
 
 	_sv_to_datum_finfo(elemtypid, &finfo, &typioparam);
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to