Updated patch attached. 2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.steh...@gmail.com>: >>>> >>>>> I agree with your proposal. I have a few comments to design: >>>>> >>>> >>>>> 1. patch doesn't hold documentation and regress tests, please append >>>>> it. >>>>> >>>> i've added some regression tests in arrays.sql and aggregate.sql.
I've only found the documentation of array_agg. Where is the doc for array(subselect) defined? > 2. this functionality (multidimensional aggregation) can be interesting >>>>> more times, so maybe some interface like array builder should be >>>>> preferred. >>>>> >>>> We already have accumArrayResult and makeArrayResult/makeMdArrayResult, >>>> haven't we? >>>> >>>> Actually array_agg(anyarray) can be implemented by using >>>> accumArrayResult and makeMdArrayResult, but in the process we will >>>> deconstruct the array data and null bit-flags into ArrayBuildState->dvalues >>>> and dnulls. And we will reconstruct it through makeMdArrayResult. I think >>>> this way it's not efficient, so i decided to reimplement it and memcpy the >>>> array data and null flags as-is. >>>> >>> >>> aha, so isn't better to fix a performance for accumArrayResult ? >>> >> >> Ok, i'll go this route. While reading the code of array(subselect) as you >> pointed below, i think the easiest way to implement support for both >> array_agg(anyarray) and array(subselect) is to change accumArrayResult so >> it operates both with scalar datum(s) and array datums, with performance >> optimization for the latter. >> > implemented it by modifying ArrayBuildState to ArrayBuildStateArray and ArrayBuildStateScalar (do you have any idea for better struct naming?) > In other places, i think it's clearer if we just use accumArrayResult and >>>> makeArrayResult/makeMdArrayResult as API for building (multidimensional) >>>> arrays. >>>> >>>> >>>>> 3. array_agg was consistent with array(subselect), so it should be >>>>> fixed too >>>>> >>>>> postgres=# select array_agg(a) from test; >>>>> array_agg >>>>> ----------------------- >>>>> {{1,2,3,4},{1,2,3,4}} >>>>> (1 row) >>>>> >>>>> postgres=# select array(select a from test); >>>>> ERROR: could not find array type for data type integer[] >>>>> >>>> >>>> I'm pretty new in postgresql development. Can you point out where is >>>> array(subselect) implemented? >>>> >>> >>> where you can start? >>> >>> postgres=# explain select array(select a from test); >>> ERROR: 42704: could not find array type for data type integer[] >>> LOCATION: exprType, nodeFuncs.c:116 >>> >>> look to code ... your magic keyword is a ARRAY_SUBLINK .. search in >>> postgresql sources this keyword >>> >> > attention: probably we don't would to allow arrays everywhere. > I've changed the places where i think it's appropriate. > 4. why you use a magic constant (64) there? >>>>> >>>>> + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); >>>>> + astate->aitems = 64 * nitems; >>>>> >>>>> + astate->nullbitmap = (bits8 *) >>>>> + repalloc(astate->nullbitmap, (astate->aitems + 7) / >>>>> 8); >>>>> >>>> >>>> just follow the arbitrary size choosen in accumArrayResult >>>> (arrayfuncs.c): >>>> astate->alen = 64; /* arbitrary starting array size */ >>>> >>>> it can be any number not too small and too big. Too small, and we will >>>> realloc shortly. Too big, we will end up wasting memory. >>>> >>> >>> you can try to alloc 1KB instead as start -- it is used more times in >>> Postgres. Then a overhead is max 1KB per agg call - what is acceptable. >>> >>> You take this value from accumArrayResult, but it is targeted for >>> shorted scalars - you should to expect so any array will be much larger. >>> >> this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 * number of items in array. Regards, -- Ali Akbar
*** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *************** *** 12046,12051 **** NULL baz</literallayout>(3 rows)</entry> --- 12046,12067 ---- <row> <entry> <indexterm> + <primary>array_agg</primary> + </indexterm> + <function>array_agg(<replaceable class="parameter">anyarray</replaceable>)</function> + </entry> + <entry> + any + </entry> + <entry> + the same array type as input type + </entry> + <entry>input arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input.</entry> + </row> + + <row> + <entry> + <indexterm> <primary>average</primary> </indexterm> <indexterm> *** a/src/backend/nodes/nodeFuncs.c --- b/src/backend/nodes/nodeFuncs.c *************** *** 108,119 **** exprType(const Node *expr) type = exprType((Node *) tent->expr); if (sublink->subLinkType == ARRAY_SUBLINK) { ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg("could not find array type for data type %s", ! format_type_be(exprType((Node *) tent->expr))))); } } else if (sublink->subLinkType == MULTIEXPR_SUBLINK) --- 108,123 ---- type = exprType((Node *) tent->expr); if (sublink->subLinkType == ARRAY_SUBLINK) { ! if (!OidIsValid(get_element_type(type))) ! { ! /* not array, so check for its array type */ ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg("could not find array type for data type %s", ! format_type_be(exprType((Node *) tent->expr))))); ! } } } else if (sublink->subLinkType == MULTIEXPR_SUBLINK) *************** *** 139,150 **** exprType(const Node *expr) type = subplan->firstColType; if (subplan->subLinkType == ARRAY_SUBLINK) { ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg("could not find array type for data type %s", ! format_type_be(subplan->firstColType)))); } } else if (subplan->subLinkType == MULTIEXPR_SUBLINK) --- 143,158 ---- type = subplan->firstColType; if (subplan->subLinkType == ARRAY_SUBLINK) { ! if (!OidIsValid(get_element_type(type))) ! { ! /* not array, so check for its array type */ ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg("could not find array type for data type %s", ! format_type_be(subplan->firstColType)))); ! } } } else if (subplan->subLinkType == MULTIEXPR_SUBLINK) *** a/src/backend/optimizer/plan/subselect.c --- b/src/backend/optimizer/plan/subselect.c *************** *** 668,677 **** build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, Assert(!te->resjunk); Assert(testexpr == NULL); ! arraytype = get_array_type(exprType((Node *) te->expr)); ! if (!OidIsValid(arraytype)) ! elog(ERROR, "could not find array type for datatype %s", ! format_type_be(exprType((Node *) te->expr))); prm = generate_new_param(root, arraytype, exprTypmod((Node *) te->expr), --- 668,683 ---- Assert(!te->resjunk); Assert(testexpr == NULL); ! ! arraytype = exprType((Node *) te->expr); ! if (!OidIsValid(get_element_type(arraytype))) ! { ! /* not array, so get the array type */ ! arraytype = get_array_type(exprType((Node *) te->expr)); ! if (!OidIsValid(arraytype)) ! elog(ERROR, "could not find array type for datatype %s", ! format_type_be(exprType((Node *) te->expr))); ! } prm = generate_new_param(root, arraytype, exprTypmod((Node *) te->expr), *** a/src/backend/utils/adt/array_userfuncs.c --- b/src/backend/utils/adt/array_userfuncs.c *************** *** 16,22 **** #include "utils/builtins.h" #include "utils/lsyscache.h" - /*----------------------------------------------------------------------------- * array_push : * push an element onto either end of a one-dimensional array --- 16,21 ---- *** a/src/backend/utils/adt/arrayfuncs.c --- b/src/backend/utils/adt/arrayfuncs.c *************** *** 145,151 **** static int width_bucket_array_variable(Datum operand, Oid collation, TypeCacheEntry *typentry); - /* * array_in : * converts an array from the external format in "string" to --- 145,150 ---- *** a/src/include/catalog/pg_aggregate.h --- b/src/include/catalog/pg_aggregate.h *************** *** 275,280 **** DATA(insert ( 2901 n 0 xmlconcat2 - - - - f f 0 142 0 0 0 _null_ --- 275,281 ---- /* array */ DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ )); + DATA(insert ( 6005 n 0 array_agg_anyarray_transfn array_agg_anyarray_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ )); /* text */ DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ )); *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h *************** *** 879,889 **** DATA(insert OID = 3167 ( array_remove PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 DESCR("remove any occurrences of an element from an array"); DATA(insert OID = 3168 ( array_replace PGNSP PGUID 12 1 0 0 0 f f f f f f i 3 0 2277 "2277 2283 2283" _null_ _null_ _null_ _null_ array_replace _null_ _null_ _null_ )); DESCR("replace any occurrences of an element in an array"); ! DATA(insert OID = 2333 ( array_agg_transfn PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2281 "2281 2283" _null_ _null_ _null_ _null_ array_agg_transfn _null_ _null_ _null_ )); DESCR("aggregate transition function"); ! DATA(insert OID = 2334 ( array_agg_finalfn PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2281 2283" _null_ _null_ _null_ _null_ array_agg_finalfn _null_ _null_ _null_ )); DESCR("aggregate final function"); ! DATA(insert OID = 2335 ( array_agg PGNSP PGUID 12 1 0 0 0 t f f f f f i 1 0 2277 "2283" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ )); DESCR("concatenate aggregate input into an array"); DATA(insert OID = 3218 ( width_bucket PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "2283 2277" _null_ _null_ _null_ _null_ width_bucket_array _null_ _null_ _null_ )); DESCR("bucket number of operand given a sorted array of bucket lower bounds"); --- 879,895 ---- DESCR("remove any occurrences of an element from an array"); DATA(insert OID = 3168 ( array_replace PGNSP PGUID 12 1 0 0 0 f f f f f f i 3 0 2277 "2277 2283 2283" _null_ _null_ _null_ _null_ array_replace _null_ _null_ _null_ )); DESCR("replace any occurrences of an element in an array"); ! DATA(insert OID = 2333 ( array_agg_transfn PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2281 "2281 2776" _null_ _null_ _null_ _null_ array_agg_transfn _null_ _null_ _null_ )); DESCR("aggregate transition function"); ! DATA(insert OID = 2334 ( array_agg_finalfn PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2281 2776" _null_ _null_ _null_ _null_ array_agg_finalfn _null_ _null_ _null_ )); DESCR("aggregate final function"); ! DATA(insert OID = 2335 ( array_agg PGNSP PGUID 12 1 0 0 0 t f f f f f i 1 0 2277 "2776" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ )); ! DESCR("concatenate aggregate input into an array"); ! DATA(insert OID = 6003 ( array_agg_anyarray_transfn PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2281 "2281 2277" _null_ _null_ _null_ _null_ array_agg_anyarray_transfn _null_ _null_ _null_ )); ! DESCR("aggregate transition function"); ! DATA(insert OID = 6004 ( array_agg_anyarray_finalfn PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2281 2277" _null_ _null_ _null_ _null_ array_agg_anyarray_finalfn _null_ _null_ _null_ )); ! DESCR("aggregate final function"); ! DATA(insert OID = 6005 ( array_agg PGNSP PGUID 12 1 0 0 0 t f f f f f i 1 0 2277 "2277" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ )); DESCR("concatenate aggregate input into an array"); DATA(insert OID = 3218 ( width_bucket PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "2283 2277" _null_ _null_ _null_ _null_ width_bucket_array _null_ _null_ _null_ )); DESCR("bucket number of operand given a sorted array of bucket lower bounds"); *** a/src/include/utils/array.h --- b/src/include/utils/array.h *************** *** 76,94 **** typedef struct /* * working state for accumArrayResult() and friends */ typedef struct ArrayBuildState { MemoryContext mcontext; /* where all the temp stuff is kept */ Datum *dvalues; /* array of accumulated Datums */ bool *dnulls; /* array of is-null flags for Datums */ int alen; /* allocated length of above arrays */ int nelems; /* number of valid entries in above arrays */ ! Oid element_type; /* data type of the Datums */ ! int16 typlen; /* needed info about datatype */ ! bool typbyval; ! char typalign; ! } ArrayBuildState; /* * structure to cache type metadata needed for array manipulation --- 76,132 ---- /* * working state for accumArrayResult() and friends + * + * is_array_accum: whether accumulating array values. + * (if true must be casted to ArrayBuildStateArray, else + * cast to ArrayBuildStateScalar) */ typedef struct ArrayBuildState { + bool is_array_accum; MemoryContext mcontext; /* where all the temp stuff is kept */ + Oid element_type; /* data type of the Datums */ + int16 typlen; /* needed info about datatype */ + bool typbyval; + char typalign; + } ArrayBuildState; + + /* + * array build state for array accumulation of scalar datums + */ + typedef struct ArrayBuildStateScalar + { + ArrayBuildState astate; + Datum *dvalues; /* array of accumulated Datums */ bool *dnulls; /* array of is-null flags for Datums */ int alen; /* allocated length of above arrays */ int nelems; /* number of valid entries in above arrays */ ! } ArrayBuildStateScalar; ! ! ! /* ! * array build state for array accumulation of array datums ! */ ! typedef struct ! { ! ArrayBuildState astate; ! ! char *data; /* array of accumulated data */ ! bits8 *nullbitmap; /* bitmap of is-null flags for data */ ! ! int abytes; /* allocated length of above arrays */ ! int aitems; /* allocated length of above arrays */ ! int nbytes; /* number of used bytes in above arrays */ ! int nitems; /* number of elements in above arrays */ ! int narray; /* number of array accumulated */ ! ! int ndims; /* element dimensions */ ! int *dims; ! int *lbs; ! ! bool hasnull; /* any element has null */ ! } ArrayBuildStateArray; /* * structure to cache type metadata needed for array manipulation *************** *** 260,266 **** extern Datum makeArrayResult(ArrayBuildState *astate, MemoryContext rcontext); extern Datum makeMdArrayResult(ArrayBuildState *astate, int ndims, int *dims, int *lbs, MemoryContext rcontext, bool release); ! extern ArrayIterator array_create_iterator(ArrayType *arr, int slice_ndim); extern bool array_iterate(ArrayIterator iterator, Datum *value, bool *isnull); extern void array_free_iterator(ArrayIterator iterator); --- 298,305 ---- MemoryContext rcontext); extern Datum makeMdArrayResult(ArrayBuildState *astate, int ndims, int *dims, int *lbs, MemoryContext rcontext, bool release); ! extern Datum makeArrayResultArray(ArrayBuildStateArray *astate, ! MemoryContext rcontext, bool release); extern ArrayIterator array_create_iterator(ArrayType *arr, int slice_ndim); extern bool array_iterate(ArrayIterator iterator, Datum *value, bool *isnull); extern void array_free_iterator(ArrayIterator iterator); *************** *** 293,298 **** extern ArrayType *create_singleton_array(FunctionCallInfo fcinfo, --- 332,340 ---- extern Datum array_agg_transfn(PG_FUNCTION_ARGS); extern Datum array_agg_finalfn(PG_FUNCTION_ARGS); + extern Datum array_agg_anyarray_transfn(PG_FUNCTION_ARGS); + extern Datum array_agg_anyarray_finalfn(PG_FUNCTION_ARGS); + /* * prototypes for functions defined in array_typanalyze.c */ *** a/src/test/regress/expected/aggregates.out --- b/src/test/regress/expected/aggregates.out *************** *** 914,919 **** select array_agg(distinct a order by a desc nulls last) --- 914,946 ---- {3,2,1,NULL} (1 row) + -- array_agg(anyarray) + select array_agg(ar) + from (values ('{1,2}'::int[]), ('{3,4}'::int[])) v(ar); + array_agg + --------------- + {{1,2},{3,4}} + (1 row) + + select array_agg(distinct ar order by ar desc) + from (select array[i / 2] from generate_series(1,10) a(i)) b(ar); + array_agg + --------------------------- + {{5},{4},{3},{2},{1},{0}} + (1 row) + + select array_agg(ar) + from (select array_agg(array[i, i+1, i-1]) + from generate_series(1,2) a(i)) b(ar); + array_agg + --------------------- + {{{1,2,0},{2,3,1}}} + (1 row) + + select array_agg('{}'::int[]) from generate_series(1,2); + ERROR: cannot accumulate empty arrays + select array_agg(null::int[]) from generate_series(1,2); + ERROR: cannot accumulate null arrays -- multi-arg aggs, strict/nonstrict, distinct/order by select aggfstr(a,b,c) from (values (1,3,'foo'),(0,null,null),(2,2,'bar'),(3,1,'baz')) v(a,b,c); *** a/src/test/regress/expected/arrays.out --- b/src/test/regress/expected/arrays.out *************** *** 1521,1526 **** select array_agg(unique1) from tenk1 where unique1 < -15; --- 1521,1543 ---- (1 row) + select array(select unique1 from tenk1 where unique1 < 15 order by unique1); + array + -------------------------------------- + {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14} + (1 row) + + select array(select array[i,i/2] from generate_series(1,5) a(i)); + array + --------------------------------- + {{1,0},{2,1},{3,1},{4,2},{5,2}} + (1 row) + + -- cannot accumulate null arrays and empty arrays + select array(select null::int[]); + ERROR: cannot accumulate null arrays + select array(select '{}'::int[]); + ERROR: cannot accumulate empty arrays select unnest(array[1,2,3]); unnest -------- *** a/src/test/regress/sql/aggregates.sql --- b/src/test/regress/sql/aggregates.sql *************** *** 322,327 **** select array_agg(distinct a order by a desc) --- 322,339 ---- select array_agg(distinct a order by a desc nulls last) from (values (1),(2),(1),(3),(null),(2)) v(a); + -- array_agg(anyarray) + select array_agg(ar) + from (values ('{1,2}'::int[]), ('{3,4}'::int[])) v(ar); + select array_agg(distinct ar order by ar desc) + from (select array[i / 2] from generate_series(1,10) a(i)) b(ar); + select array_agg(ar) + from (select array_agg(array[i, i+1, i-1]) + from generate_series(1,2) a(i)) b(ar); + + select array_agg('{}'::int[]) from generate_series(1,2); + select array_agg(null::int[]) from generate_series(1,2); + -- multi-arg aggs, strict/nonstrict, distinct/order by select aggfstr(a,b,c) *** a/src/test/regress/sql/arrays.sql --- b/src/test/regress/sql/arrays.sql *************** *** 432,437 **** select array_agg(ten) from (select ten from tenk1 where unique1 < 15 order by un --- 432,443 ---- select array_agg(nullif(ten, 4)) from (select ten from tenk1 where unique1 < 15 order by unique1) ss; select array_agg(unique1) from tenk1 where unique1 < -15; + select array(select unique1 from tenk1 where unique1 < 15 order by unique1); + select array(select array[i,i/2] from generate_series(1,5) a(i)); + -- cannot accumulate null arrays and empty arrays + select array(select null::int[]); + select array(select '{}'::int[]); + select unnest(array[1,2,3]); select * from unnest(array[1,2,3]); select unnest(array[1,2,3,4.5]::float8[]);
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers