Hi! First of all, thanks for the review - the insights and comments are spot-on. More comments below.
On 20.12.2014 09:26, Ali Akbar wrote: > > 2014-12-16 11:01 GMT+07:00 Ali Akbar <the.ap...@gmail.com > <mailto:the.ap...@gmail.com>>: > > > > 2014-12-16 10:47 GMT+07:00 Ali Akbar <the.ap...@gmail.com > <mailto:the.ap...@gmail.com>>: > > > 2014-12-16 6:27 GMT+07:00 Tomas Vondra <t...@fuzzy.cz > <mailto:t...@fuzzy.cz>>: > Just fast-viewing the patch. > > The patch is not implementing the checking for not creating new > context in initArrayResultArr. I think we should implement it > also there for consistency (and preventing future problems). You're right that initArrayResultArr was missing the code deciding whether to create a subcontext or reuse the parent one, and the fix you proposed (i.e. reusing code from initArrayResult) is IMHO the right one. > Testing the performance with your query, looks promising: speedup is > between 12% ~ 15%. > > Because i'm using 32-bit systems, setting work_mem to 1024GB failed: > > ERROR: 1073741824 is outside the valid range for parameter > "work_mem" (64 .. 2097151) > STATEMENT: SET work_mem = '1024GB'; > > psql:/media/truecrypt1/oss/postgresql/postgresql/../patches/array-agg.sql:20: > ERROR: 1073741824 is outside the valid range for parameter > "work_mem" (64 .. 2097151) Yes, that's pretty clearly because of the 2GB limit on 32-bit systems. > Maybe because of that, in the large groups a test, the speedup is awesome: > > master: 16,819 ms > > with patch: 1,720 ms Probably. It's difficult to say without explain plans or something, but it's probably using a different plan (e.g. group aggregate). > Looks like with master, postgres resort to disk, but with the patch it > fits in memory. I'd bet that's not postgres, but system using a swap (because postgres allocates a lot of memory). > Note: I hasn't tested the large dataset. > > As expected, testing array_agg(anyarray), the performance is still the > same, because the subcontext hasn't implemented there (test script > modified from Tomas', attached). > > I implemented the subcontext checking in initArrayResultArr by changing > the v3 patch like this: > > +++ b/src/backend/utils/adt/arrayfuncs.c > @@ -4797,10 +4797,11 @@ initArrayResultArr(Oid array_type, Oid > element_type, MemoryContext rcontext, > bool subcontext) > { > ArrayBuildStateArr *astate; > - MemoryContext arr_context; > + MemoryContext arr_context = rcontext; /* by default use the > parent ctx */ > > /* Make a temporary context to hold all the junk */ > - arr_context = AllocSetContextCreate(rcontext, > + if (subcontext) > + arr_context = AllocSetContextCreate(rcontext, > "accumArrayResultArr", > ALLOCSET_DEFAULT_MINSIZE, > ALLOCSET_DEFAULT_INITSIZE, > > > Testing the performance, it got the 12%~15% speedup. Good. (patch attached) Nice, and it's consistent with my measurements on scalar values. > > Looking at the modification in accumArrayResult* functions, i don't > really comfortable with: > > 1. Code that calls accumArrayResult* after explicitly calling > initArrayResult* must always passing subcontext, but it has no > effect. > 2. All existing codes that calls accumArrayResult must be changed. > > Just an idea: why don't we minimize the change in API like this: > > 1. Adding parameter bool subcontext, only in initArrayResult* > functions but not in accumArrayResult* > 2. Code that want to not creating subcontext must calls > initArrayResult* explicitly. > > Other codes that calls directly to accumArrayResult can only be > changed in the call to makeArrayResult* (with release=true > parameter). In places that we don't want to create subcontext (as in > array_agg_transfn), modify it to use initArrayResult* before calling > accumArrayResult*. > > What do you think? I think it's an interesting idea. I've been considering this before, when thinking about the best way to keep the calls to the various methods consistent (eg. enforcing the use of release=true only with subcontexts). What I ended up doing (see the v4 patch attached) is that I (1) added 'private_cxt' flag to the ArrayBuildState[Arr] struct, tracking whether there's a private memory context (2) rolled back all the API changes, except for the initArray* methods (as you proposed) This has the positive benefit that it allows checking consistency of the calls - you can still do initArrayResult(..., subcontext=false) ... makeArrayResult(..., release=true) but it won't reset the memory context, and with assert-enabled build it will actually fail. Another positive benefit is that this won't break the code unless it uses the new API. This is a problem especially with external code (e.g. extensions), but the new API (initArray*) is not part of 9.4 so there's no such code. So that's nice. The one annoying thing is that this makes the API slighly unbalanced. With the new API you can use a shared memory context, which with the old one (not using the initArray* methods) you can't. But I'm OK with that, and it makes the patch smaller (15kB -> 11kB). > As per your concern about calling initArrayResult* with > subcontext=false, while makeArrayResult* with release=true: > > Also, it seems that using 'subcontext=false' and then 'release=true' > would be a bug. Maybe it would be appropriate to store the > 'subcontext' value into the ArrayBuildState and then throw an error > if makeArrayResult* is called with (release=true && subcontext=false). > > > Yes, i think we should do that to minimize unexpected coding errors. > In makeArrayResult*, i think its better to not throwing an error, but > using assertions: > > Assert(release == false || astate->subcontext == true); Yes. I called the flag 'private_cxt' but that's a minor difference. The assert I used is this: /* we can only release the context if it's a private one. */ Assert(! (release && !astate->private_cxt)); regards Tomas
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index d9faf20..9c97755 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -262,7 +262,7 @@ ExecScanSubPlan(SubPlanState *node, /* Initialize ArrayBuildStateAny in caller's context, if needed */ if (subLinkType == ARRAY_SUBLINK) astate = initArrayResultAny(subplan->firstColType, - CurrentMemoryContext); + CurrentMemoryContext, true); /* * We are probably in a short-lived expression-evaluation context. Switch @@ -964,7 +964,7 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext) /* Initialize ArrayBuildStateAny in caller's context, if needed */ if (subLinkType == ARRAY_SUBLINK) astate = initArrayResultAny(subplan->firstColType, - CurrentMemoryContext); + CurrentMemoryContext, true); /* * Must switch to per-query memory context. diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c index 50ea4d2..f434fdd 100644 --- a/src/backend/utils/adt/array_userfuncs.c +++ b/src/backend/utils/adt/array_userfuncs.c @@ -498,8 +498,13 @@ array_agg_transfn(PG_FUNCTION_ARGS) elog(ERROR, "array_agg_transfn called in non-aggregate context"); } - state = PG_ARGISNULL(0) ? NULL : (ArrayBuildState *) PG_GETARG_POINTER(0); + if (PG_ARGISNULL(0)) + state = initArrayResult(arg1_typeid, aggcontext, false); + else + state = (ArrayBuildState *) PG_GETARG_POINTER(0); + elem = PG_ARGISNULL(1) ? (Datum) 0 : PG_GETARG_DATUM(1); + state = accumArrayResult(state, elem, PG_ARGISNULL(1), @@ -573,7 +578,22 @@ array_agg_array_transfn(PG_FUNCTION_ARGS) elog(ERROR, "array_agg_array_transfn called in non-aggregate context"); } - state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *) PG_GETARG_POINTER(0); + + if (PG_ARGISNULL(0)) + { + Oid element_type = get_element_type(arg1_typeid); + + if (!OidIsValid(element_type)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("data type %s is not an array type", + format_type_be(arg1_typeid)))); + + state = initArrayResultArr(arg1_typeid, element_type, aggcontext, false); + } + else + state = (ArrayBuildStateArr *) PG_GETARG_POINTER(0); + state = accumArrayResultArr(state, PG_GETARG_DATUM(1), PG_ARGISNULL(1), diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index 933c6b0..51e088c 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -4617,22 +4617,24 @@ array_insert_slice(ArrayType *destArray, * were no elements. This is preferred if an empty array is what you want. */ ArrayBuildState * -initArrayResult(Oid element_type, MemoryContext rcontext) +initArrayResult(Oid element_type, MemoryContext rcontext, bool subcontext) { ArrayBuildState *astate; - MemoryContext arr_context; + MemoryContext arr_context = rcontext; /* by default use the parent ctx */ /* Make a temporary context to hold all the junk */ - arr_context = AllocSetContextCreate(rcontext, - "accumArrayResult", - ALLOCSET_DEFAULT_MINSIZE, - ALLOCSET_DEFAULT_INITSIZE, - ALLOCSET_DEFAULT_MAXSIZE); + if (subcontext) + arr_context = AllocSetContextCreate(rcontext, + "accumArrayResult", + ALLOCSET_DEFAULT_MINSIZE, + ALLOCSET_DEFAULT_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); astate = (ArrayBuildState *) MemoryContextAlloc(arr_context, sizeof(ArrayBuildState)); astate->mcontext = arr_context; - astate->alen = 64; /* arbitrary starting array size */ + astate->private_cxt = subcontext; + astate->alen = 8; /* arbitrary starting array size */ astate->dvalues = (Datum *) MemoryContextAlloc(arr_context, astate->alen * sizeof(Datum)); astate->dnulls = (bool *) @@ -4666,7 +4668,7 @@ accumArrayResult(ArrayBuildState *astate, if (astate == NULL) { /* First time through --- initialize */ - astate = initArrayResult(element_type, rcontext); + astate = initArrayResult(element_type, rcontext, false); } else { @@ -4768,6 +4770,9 @@ makeMdArrayResult(ArrayBuildState *astate, MemoryContextSwitchTo(oldcontext); + /* we can only release the context if it's a private one. */ + Assert(! (release && !astate->private_cxt)); + /* Clean up all the junk */ if (release) MemoryContextDelete(astate->mcontext); @@ -4791,22 +4796,25 @@ makeMdArrayResult(ArrayBuildState *astate, * rcontext is where to keep working state */ ArrayBuildStateArr * -initArrayResultArr(Oid array_type, Oid element_type, MemoryContext rcontext) +initArrayResultArr(Oid array_type, Oid element_type, MemoryContext rcontext, + bool subcontext) { ArrayBuildStateArr *astate; - MemoryContext arr_context; + MemoryContext arr_context = rcontext; /* by default use the parent ctx */ /* Make a temporary context to hold all the junk */ - arr_context = AllocSetContextCreate(rcontext, - "accumArrayResultArr", - ALLOCSET_DEFAULT_MINSIZE, - ALLOCSET_DEFAULT_INITSIZE, - ALLOCSET_DEFAULT_MAXSIZE); + if (subcontext) + arr_context = AllocSetContextCreate(rcontext, + "accumArrayResultArr", + ALLOCSET_DEFAULT_MINSIZE, + ALLOCSET_DEFAULT_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); /* Note we initialize all fields to zero */ astate = (ArrayBuildStateArr *) MemoryContextAllocZero(arr_context, sizeof(ArrayBuildStateArr)); astate->mcontext = arr_context; + astate->private_cxt = subcontext; /* Save relevant datatype information */ astate->array_type = array_type; @@ -4862,7 +4870,7 @@ accumArrayResultArr(ArrayBuildStateArr *astate, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("data type %s is not an array type", format_type_be(array_type)))); - astate = initArrayResultArr(array_type, element_type, rcontext); + astate = initArrayResultArr(array_type, element_type, rcontext, false); } else { @@ -5044,6 +5052,9 @@ makeArrayResultArr(ArrayBuildStateArr *astate, MemoryContextSwitchTo(oldcontext); + /* we can only release the context if it's a private one. */ + Assert(! (release && !astate->private_cxt)); + /* Clean up all the junk */ if (release) MemoryContextDelete(astate->mcontext); @@ -5064,7 +5075,7 @@ makeArrayResultArr(ArrayBuildStateArr *astate, * rcontext is where to keep working state */ ArrayBuildStateAny * -initArrayResultAny(Oid input_type, MemoryContext rcontext) +initArrayResultAny(Oid input_type, MemoryContext rcontext, bool subcontext) { ArrayBuildStateAny *astate; Oid element_type = get_element_type(input_type); @@ -5074,7 +5085,7 @@ initArrayResultAny(Oid input_type, MemoryContext rcontext) /* Array case */ ArrayBuildStateArr *arraystate; - arraystate = initArrayResultArr(input_type, element_type, rcontext); + arraystate = initArrayResultArr(input_type, element_type, rcontext, subcontext); astate = (ArrayBuildStateAny *) MemoryContextAlloc(arraystate->mcontext, sizeof(ArrayBuildStateAny)); @@ -5089,7 +5100,7 @@ initArrayResultAny(Oid input_type, MemoryContext rcontext) /* Let's just check that we have a type that can be put into arrays */ Assert(OidIsValid(get_array_type(input_type))); - scalarstate = initArrayResult(input_type, rcontext); + scalarstate = initArrayResult(input_type, rcontext, subcontext); astate = (ArrayBuildStateAny *) MemoryContextAlloc(scalarstate->mcontext, sizeof(ArrayBuildStateAny)); @@ -5115,7 +5126,7 @@ accumArrayResultAny(ArrayBuildStateAny *astate, MemoryContext rcontext) { if (astate == NULL) - astate = initArrayResultAny(input_type, rcontext); + astate = initArrayResultAny(input_type, rcontext, false); if (astate->scalarstate) (void) accumArrayResult(astate->scalarstate, diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 7d90312..a3fc7b0 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -3930,7 +3930,7 @@ xpath(PG_FUNCTION_ARGS) ArrayType *namespaces = PG_GETARG_ARRAYTYPE_P(2); ArrayBuildState *astate; - astate = initArrayResult(XMLOID, CurrentMemoryContext); + astate = initArrayResult(XMLOID, CurrentMemoryContext, true); xpath_internal(xpath_expr_text, data, namespaces, NULL, astate); PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate, CurrentMemoryContext)); diff --git a/src/include/utils/array.h b/src/include/utils/array.h index e1d5749..cf0b72c 100644 --- a/src/include/utils/array.h +++ b/src/include/utils/array.h @@ -89,6 +89,7 @@ typedef struct ArrayBuildState int16 typlen; /* needed info about datatype */ bool typbyval; char typalign; + bool private_cxt; /* use private memory context */ } ArrayBuildState; /* @@ -109,6 +110,7 @@ typedef struct ArrayBuildStateArr int lbs[MAXDIM]; Oid array_type; /* data type of the arrays */ Oid element_type; /* data type of the array elements */ + bool private_cxt; /* use private memory context */ } ArrayBuildStateArr; /* @@ -286,7 +288,7 @@ extern void deconstruct_array(ArrayType *array, extern bool array_contains_nulls(ArrayType *array); extern ArrayBuildState *initArrayResult(Oid element_type, - MemoryContext rcontext); + MemoryContext rcontext, bool subcontext); extern ArrayBuildState *accumArrayResult(ArrayBuildState *astate, Datum dvalue, bool disnull, Oid element_type, @@ -297,7 +299,7 @@ extern Datum makeMdArrayResult(ArrayBuildState *astate, int ndims, int *dims, int *lbs, MemoryContext rcontext, bool release); extern ArrayBuildStateArr *initArrayResultArr(Oid array_type, Oid element_type, - MemoryContext rcontext); + MemoryContext rcontext, bool subcontext); extern ArrayBuildStateArr *accumArrayResultArr(ArrayBuildStateArr *astate, Datum dvalue, bool disnull, Oid array_type, @@ -306,7 +308,7 @@ extern Datum makeArrayResultArr(ArrayBuildStateArr *astate, MemoryContext rcontext, bool release); extern ArrayBuildStateAny *initArrayResultAny(Oid input_type, - MemoryContext rcontext); + MemoryContext rcontext, bool subcontext); extern ArrayBuildStateAny *accumArrayResultAny(ArrayBuildStateAny *astate, Datum dvalue, bool disnull, Oid input_type, diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c index 492c1ef..e3dda5d 100644 --- a/src/pl/plperl/plperl.c +++ b/src/pl/plperl/plperl.c @@ -1218,7 +1218,7 @@ plperl_array_to_datum(SV *src, Oid typid, int32 typmod) errmsg("cannot convert Perl array to non-array type %s", format_type_be(typid)))); - astate = initArrayResult(elemtypid, CurrentMemoryContext); + astate = initArrayResult(elemtypid, CurrentMemoryContext, true); _sv_to_datum_finfo(elemtypid, &finfo, &typioparam);
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers