Hello Dean, Thanks for the detailed feedback! Here is the rebased version.
> 1). In the tests: > > +select setseed(0.8); > +select rand_array(10, 0, 3, 50::int, 80::int); .. > this should really have a comment block to distinguish these new tests > from the preceeding normal_rand() tests. > 3). It's only necessary to call setseed() once to get a reproducible > set of results from then on. > 4). I'd use setseed(0) or setseed(0.5), since those values are exactly > representable as double precision values, unlike 0.8, ensuring that it > works the same on all platforms. It might not matter, but why take the > risk? All Done, very interesting about the reason why the 0.8 is bad. > 2). It's worth also having tests for the error cases. After the code refactor, looks the ERROR is not reachable, so I added both Assert(false) and elog(ERROR) with no test case for that. > > 5). The float8 case still has minimum and maximum value arguments that > it just ignores. It should either not take those arguments, or it > should respect them, and try to return float8 values in the requested > range. I suspect that trying to return float8 values in the requested > range would be hard, if not impossible, due to the inexact nature of > double precision arithmetic. That's why I suggested earlier having the > float8 function not take minval/maxval arguments, and just return > values in the range 0 to 1. > > 11). If the float8 case is made to not have minval/maxval arguments, this > code: > > + Datum minval = PG_GETARG_DATUM(3), > + maxval = PG_GETARG_DATUM(4); > > and the FunctionCallInfo setup code needs to be different for the > float8 case, in order to avoid reading and setting undefined > arguments. Perhaps introduce a "need_val_bounds" boolean variable, > based on the datatype. I should have noticed the float8 issue after reading your first reply.. Actually I don't have speical reason for which I have to support float8. At least when I working on the patch, I want some toast and non-toast data type, so supporting both int/bigint and numeric should be good, at least for my purpose. So in the attached version, I removed the support for float8 for simplification. > > 6). This new function: > > +static Datum > +rand_array_internal(FunctionCallInfo fcinfo, Oid datatype) > +{ > > should have a comment block. In particular, it should document what > its inputs and outputs are. Done. > > 7). This code: > > + int num_tuples = PG_GETARG_INT32(0); > + int minlen = PG_GETARG_INT32(1); > + int maxlen = PG_GETARG_INT32(2); > + Datum minval = PG_GETARG_DATUM(3), > + maxval = PG_GETARG_DATUM(4); > + rand_array_fctx *fctx; > + > + if (datatype == INT4OID) > + random_fn_oid = F_RANDOM_INT4_INT4; > + else if (datatype == INT8OID) ... > should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point > repeating all those checks for every call. Done. > > 8). I think it would be neater to code the "if (datatype == INT4OID) > ... else if ..." as a switch statement. Done. > > > 9). I would swap the last 2 bound checks round, and simplify the error > messages as follows: Done. > > Also note that primary error messages should not end with a period. > See https://www.postgresql.org/docs/current/error-style-guide.html Thanks for sharing this! > 10). In this error: > > + elog(ERROR, "unsupported type %d for rand_array function.", > + datatype); > > "datatype" is of type Oid, which is unsigned, and so it should use > "%u" not "%d". Also, as above, it should not end with a period, so it > should be: > > + elog(ERROR, "unsupported type %u for rand_array function", > + datatype); Done. > > 12). This code: > > + random_len_fcinfo->args[0].value = minlen; > + random_len_fcinfo->args[1].value = maxlen; > > should really be: > > + random_len_fcinfo->args[0].value = Int32GetDatum(minlen); > + random_len_fcinfo->args[1].value = Int32GetDatum(maxlen); > > It amounts to the same thing, but documents the fact that it's > converting an integer to a Datum. Done. > > > 13). These new functions are significantly under-documented, > especially when compared to all the other functions on > https://www.postgresql.org/docs/current/tablefunc.html > > They really should have their own subsection, along the same lines as > "F.41.1.1. Normal_rand", explaining what the functions do, what their > arguments mean, and giving a couple of usage examples. Done, very impresive feedback and I know why PostgreSQL can always have user friendly documentation now. -- Best Regards Andy Fan
>From 2024871241c35959a259e54a8151d4e9372dbfbf Mon Sep 17 00:00:00 2001 From: Andy Fan <zhihuifan1...@163.com> Date: Mon, 26 Aug 2024 18:50:57 +0800 Subject: [PATCH v20241016 1/1] Add functions rand_array function to contrib/tablefunc. It produces an array of numeric_type with its length in the range of [minlen,maxlen] and each value is in the range of [minval,maxval]. --- contrib/tablefunc/Makefile | 2 +- contrib/tablefunc/expected/tablefunc.out | 55 +++++++ contrib/tablefunc/sql/tablefunc.sql | 11 ++ contrib/tablefunc/tablefunc--1.0--1.1.sql | 17 +++ contrib/tablefunc/tablefunc.c | 168 ++++++++++++++++++++++ contrib/tablefunc/tablefunc.control | 2 +- doc/src/sgml/tablefunc.sgml | 88 +++++++++++- src/backend/utils/adt/arrayfuncs.c | 1 + 8 files changed, 341 insertions(+), 3 deletions(-) create mode 100644 contrib/tablefunc/tablefunc--1.0--1.1.sql diff --git a/contrib/tablefunc/Makefile b/contrib/tablefunc/Makefile index 191a3a1d38..f0c67308fd 100644 --- a/contrib/tablefunc/Makefile +++ b/contrib/tablefunc/Makefile @@ -3,7 +3,7 @@ MODULES = tablefunc EXTENSION = tablefunc -DATA = tablefunc--1.0.sql +DATA = tablefunc--1.0.sql tablefunc--1.0--1.1.sql PGFILEDESC = "tablefunc - various functions that return tables" REGRESS = tablefunc diff --git a/contrib/tablefunc/expected/tablefunc.out b/contrib/tablefunc/expected/tablefunc.out index ddece79029..2562da6b0f 100644 --- a/contrib/tablefunc/expected/tablefunc.out +++ b/contrib/tablefunc/expected/tablefunc.out @@ -12,6 +12,61 @@ SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2); -- negative number of tuples SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2); ERROR: number of rows cannot be negative +-- +-- random_array() +-- produce the result with a stable seed. +-- +select setseed(0); + setseed +--------- + +(1 row) + +select rand_array(10, 0, 3, 50::int, 80::int); + rand_array +------------ + {63,71,66} + {64} + {54} + {72,64,60} + {75} + {53,73} + {69} + {74,67} + {65} + {73} +(10 rows) + +select rand_array(10, 0, 3, 50::bigint, 80::bigint); + rand_array +------------ + {} + {59,53} + {72} + {80,79} + {71} + {80,80} + {61,64} + {62,76,80} + {} + {} +(10 rows) + +select rand_array(10, 0, 3, 50::numeric, 80::numeric); + rand_array +------------ + {77,66,73} + {51,53} + {65,54} + {72} + {55,72} + {} + {55,70,64} + {75} + {} + {} +(10 rows) + -- -- crosstab() -- diff --git a/contrib/tablefunc/sql/tablefunc.sql b/contrib/tablefunc/sql/tablefunc.sql index 0fb8e40de2..f474c1ab6e 100644 --- a/contrib/tablefunc/sql/tablefunc.sql +++ b/contrib/tablefunc/sql/tablefunc.sql @@ -8,6 +8,17 @@ SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2); -- negative number of tuples SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2); + +-- +-- random_array() +-- produce the result with a stable seed. +-- +select setseed(0); +select rand_array(10, 0, 3, 50::int, 80::int); +select rand_array(10, 0, 3, 50::bigint, 80::bigint); +select rand_array(10, 0, 3, 50::numeric, 80::numeric); + + -- -- crosstab() -- diff --git a/contrib/tablefunc/tablefunc--1.0--1.1.sql b/contrib/tablefunc/tablefunc--1.0--1.1.sql new file mode 100644 index 0000000000..baf9176e98 --- /dev/null +++ b/contrib/tablefunc/tablefunc--1.0--1.1.sql @@ -0,0 +1,17 @@ +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use "ALTER EXTENSION tablefunc UPDATE TO '1.1'" to load this file. \quit + +CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval int, maxval int) +RETURNS setof int[] +AS 'MODULE_PATHNAME','rand_array_int' +LANGUAGE C VOLATILE STRICT; + +CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval bigint, maxval bigint) +RETURNS setof bigint[] +AS 'MODULE_PATHNAME','rand_array_bigint' +LANGUAGE C VOLATILE STRICT; + +CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval numeric, maxval numeric) +RETURNS setof numeric[] +AS 'MODULE_PATHNAME','rand_array_numeric' +LANGUAGE C VOLATILE STRICT; diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c index 2a25607a2a..fc7099612d 100644 --- a/contrib/tablefunc/tablefunc.c +++ b/contrib/tablefunc/tablefunc.c @@ -42,7 +42,9 @@ #include "lib/stringinfo.h" #include "miscadmin.h" #include "tablefunc.h" +#include "utils/array.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" PG_MODULE_MAGIC; @@ -91,6 +93,12 @@ typedef struct bool use_carry; /* use second generated value */ } normal_rand_fctx; +typedef struct +{ + FunctionCallInfo random_val_fcinfo; + FunctionCallInfo random_len_fcinfo; +} rand_array_fctx; + #define xpfree(var_) \ do { \ if (var_ != NULL) \ @@ -269,6 +277,166 @@ normal_rand(PG_FUNCTION_ARGS) SRF_RETURN_DONE(funcctx); } +/* + * rand_array_internal + * Fill the requested number of array with random values for the + * given range and type into fcinfo. + * + * Inputs: + * fcinfo: includes the number of rows, minlen and maxlen for the array, + * minval and maxval for the element in the array. + * datatype: the data type for the array element. + */ +static Datum +rand_array_internal(FunctionCallInfo fcinfo, Oid datatype) +{ + FuncCallContext *funcctx; + rand_array_fctx *fctx; + + /* stuff done only on the first call of the function */ + if (SRF_IS_FIRSTCALL()) + { + int num_tuples = PG_GETARG_INT32(0); + int minlen = PG_GETARG_INT32(1); + int maxlen = PG_GETARG_INT32(2); + Datum minval = PG_GETARG_DATUM(3), + maxval = PG_GETARG_DATUM(4); + + MemoryContext oldcontext; + + FmgrInfo *random_len_flinfo, *random_val_flinfo; + FunctionCallInfo random_len_fcinfo, random_val_fcinfo; + Oid random_fn_oid; + + switch(datatype) + { + case INT4OID: + random_fn_oid = F_RANDOM_INT4_INT4; + break; + case INT8OID: + random_fn_oid = F_RANDOM_INT8_INT8; + break; + case NUMERICOID: + random_fn_oid = F_RANDOM_NUMERIC_NUMERIC; + break; + default: + elog(ERROR, "unsupported type %u for rand_array function", + datatype); + break; + } + + if (num_tuples < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of rows cannot be negative"))); + + if (minlen < 0) + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("minlen must be greater than or equal to zero")); + + if (maxlen < minlen) + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("maxlen must be greater than or equal to minlen")); + + /* create a function context for cross-call persistence */ + funcctx = SRF_FIRSTCALL_INIT(); + + /* + * switch to memory context appropriate for multiple function calls + */ + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); + + funcctx->max_calls = num_tuples; + + /* allocate memory for user context */ + fctx = (rand_array_fctx *) palloc(sizeof(rand_array_fctx)); + + /* build the random_len_fcinfo */ + random_len_fcinfo = (FunctionCallInfo) palloc0(SizeForFunctionCallInfo(2)); + random_len_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo)); + fmgr_info(F_RANDOM_INT4_INT4, random_len_flinfo); + InitFunctionCallInfoData(*random_len_fcinfo, random_len_flinfo, 2, + InvalidOid, NULL, NULL); + + random_len_fcinfo->args[0].isnull = false; + random_len_fcinfo->args[1].isnull = false; + random_len_fcinfo->args[0].value = Int32GetDatum(minlen); + random_len_fcinfo->args[1].value = Int32GetDatum(maxlen); + + /* build the random_val_fcinfo for the specified data type */ + random_val_fcinfo = (FunctionCallInfo) palloc0(SizeForFunctionCallInfo(2)); + random_val_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo)); + fmgr_info(random_fn_oid, random_val_flinfo); + InitFunctionCallInfoData(*random_val_fcinfo, random_val_flinfo, 2, + InvalidOid, NULL, NULL); + + random_val_fcinfo->args[0].isnull = false; + random_val_fcinfo->args[1].isnull = false; + random_val_fcinfo->args[0].value = minval; + random_val_fcinfo->args[1].value = maxval; + + fctx->random_val_fcinfo = random_val_fcinfo; + fctx->random_len_fcinfo = random_len_fcinfo; + + funcctx->user_fctx = fctx; + + MemoryContextSwitchTo(oldcontext); + } + + /* stuff done on every call of the function */ + funcctx = SRF_PERCALL_SETUP(); + + fctx = funcctx->user_fctx; + + if (funcctx->call_cntr < funcctx->max_calls) + { + int array_len; + int i; + Datum *results; + + array_len = Int32GetDatum(FunctionCallInvoke(fctx->random_len_fcinfo)); + + results = palloc(array_len * sizeof(Datum)); + + for(i = 0; i < array_len; i++) + results[i] = FunctionCallInvoke(fctx->random_val_fcinfo); + + + SRF_RETURN_NEXT(funcctx, PointerGetDatum( + construct_array_builtin(results, array_len, datatype))); + } + else + /* do when there is no more left */ + SRF_RETURN_DONE(funcctx); +} + + +PG_FUNCTION_INFO_V1(rand_array_int); +Datum +rand_array_int(PG_FUNCTION_ARGS) +{ + return rand_array_internal(fcinfo, INT4OID); +} + + +PG_FUNCTION_INFO_V1(rand_array_bigint); +Datum +rand_array_bigint(PG_FUNCTION_ARGS) +{ + return rand_array_internal(fcinfo, INT8OID); +} + + +PG_FUNCTION_INFO_V1(rand_array_numeric); +Datum +rand_array_numeric(PG_FUNCTION_ARGS) +{ + return rand_array_internal(fcinfo, NUMERICOID); +} + + /* * get_normal_pair() * Assigns normally distributed (Gaussian) values to a pair of provided diff --git a/contrib/tablefunc/tablefunc.control b/contrib/tablefunc/tablefunc.control index 7b25d16170..9cc6222a4f 100644 --- a/contrib/tablefunc/tablefunc.control +++ b/contrib/tablefunc/tablefunc.control @@ -1,6 +1,6 @@ # tablefunc extension comment = 'functions that manipulate whole tables, including crosstab' -default_version = '1.0' +default_version = '1.1' module_pathname = '$libdir/tablefunc' relocatable = true trusted = true diff --git a/doc/src/sgml/tablefunc.sgml b/doc/src/sgml/tablefunc.sgml index e10fe7009d..6eab69a129 100644 --- a/doc/src/sgml/tablefunc.sgml +++ b/doc/src/sgml/tablefunc.sgml @@ -53,6 +53,17 @@ </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>rand_array</function> (<parameter>numvals</parameter> <type>integer</type>,<parameter>minlen</parameter> <type>int4</type>, <parameter>maxlen</parameter> <type>int4</type>,<parameter>minval</parameter> <type><replaceable>numeric_type</replaceable></type>, <parameter>maxval</parameter> <type><replaceable>numeric_type</replaceable></type> ) + <returnvalue>setof <replaceable>numeric_type</replaceable>[]</returnvalue> + </para> + <para> + Produces a set of random numeric_type[], uses the same deterministic pseudo-random number generator as random(). + </para> + </entry> + </row> + <row> <entry role="func_table_entry"><para role="func_signature"> <function>crosstab</function> ( <parameter>sql</parameter> <type>text</type> ) @@ -849,7 +860,82 @@ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2' (6 rows) </programlisting> </para> - </sect3> + </sect3> + + <sect3 id="tablefunc-functions-rand-array"> + <title><function>rand_array</function></title> + + <indexterm> + <primary>rand_array</primary> + </indexterm> + + <synopsis> + rand_array(int numvals, int minlen, int maxlen, int numeric, int numeric) returns setof numeric + </synopsis> + + <para> + <function>rand_array</function> produces a set of array with its lenght belongs + to [minlen, maxlen] and each element in the array belongs to [minval, maxval]. + </para> + + <para> + <parameter>numvals</parameter> is the number of values to be returned + from the function. The array length is controlled in [<parameter>minlen</parameter>, + <parameter>maxlen</parameter>], and the value of each element in the array + is controlled in [<parameter>minval</parameter>, <parameter>maxval</parameter>] + </para> + + <para> + For example, this call requests 10 arrays with each array's length is between + 3 and 5, and each element in the array is between 2.1 and 3.5. The final value + follows the impact of setseed(). + </para> + + <screen> +test=# SELECT * FROM rand_array(10, 3, 5, 3.0, 5.0); + rand_array +----------------------- + {5.0,3.2,3.6,3.7,4.7} + {5.0,4.0,5.0,4.8,4.3} + {3.5,4.3,4.3,3.7} + {3.8,3.9,4.8,5.0,4.2} + {5.0,4.7,4.9} + {3.5,5.0,3.5,4.8} + {4.9,3.8,3.6,4.2} + {4.0,4.5,3.5} + {4.2,3.4,3.8,3.8,3.9} + {4.4,4.2,4.0,4.1} +(10 rows) + </screen> + + <para> + Besides numeric, both int4 and int8 are supported. + </para> + + <screen> +test=# SELECT pg_typeof(rand_array), * FROM rand_array(5, 3, 5, 3::int4, 5.0::int4); + pg_typeof | rand_array +-----------+------------- + integer[] | {5,5,5,4,5} + integer[] | {4,5,3,4} + integer[] | {4,5,4,3} + integer[] | {5,4,4,3} + integer[] | {3,4,3,3,5} +(5 rows) + +test=# SELECT pg_typeof(rand_array), * FROM rand_array(5, 3, 5, 3::int8, 5.0::int8); + pg_typeof | rand_array +-----------+------------- + bigint[] | {3,5,5,4,4} + bigint[] | {5,4,5,5} + bigint[] | {3,4,3} + bigint[] | {5,3,5,5} + bigint[] | {5,5,5,4} +(5 rows) + </screen> + </sect3> + + </sect2> diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index a715e7e0b8..4255aede2b 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -3442,6 +3442,7 @@ construct_array_builtin(Datum *elems, int nelems, Oid elmtype) break; case TEXTOID: + case NUMERICOID: elmlen = -1; elmbyval = false; elmalign = TYPALIGN_INT; -- 2.45.1