Hello Dean,
Thanks for the detailed feedback! Here is the rebased version.
> 1). In the tests:
>
> +select setseed(0.8);
> +select rand_array(10, 0, 3, 50::int, 80::int);
..
> this should really have a comment block to distinguish these new tests
> from the preceeding normal_rand() tests.
> 3). It's only necessary to call setseed() once to get a reproducible
> set of results from then on.
> 4). I'd use setseed(0) or setseed(0.5), since those values are exactly
> representable as double precision values, unlike 0.8, ensuring that it
> works the same on all platforms. It might not matter, but why take the
> risk?
All Done, very interesting about the reason why the 0.8 is bad.
> 2). It's worth also having tests for the error cases.
After the code refactor, looks the ERROR is not reachable, so I added
both Assert(false) and elog(ERROR) with no test case for that.
>
> 5). The float8 case still has minimum and maximum value arguments that
> it just ignores. It should either not take those arguments, or it
> should respect them, and try to return float8 values in the requested
> range. I suspect that trying to return float8 values in the requested
> range would be hard, if not impossible, due to the inexact nature of
> double precision arithmetic. That's why I suggested earlier having the
> float8 function not take minval/maxval arguments, and just return
> values in the range 0 to 1.
>
> 11). If the float8 case is made to not have minval/maxval arguments, this
> code:
>
> + Datum minval = PG_GETARG_DATUM(3),
> + maxval = PG_GETARG_DATUM(4);
>
> and the FunctionCallInfo setup code needs to be different for the
> float8 case, in order to avoid reading and setting undefined
> arguments. Perhaps introduce a "need_val_bounds" boolean variable,
> based on the datatype.
I should have noticed the float8 issue after reading your first
reply.. Actually I don't have speical reason for which I have to support
float8. At least when I working on the patch, I want some toast and
non-toast data type, so supporting both int/bigint and numeric should be
good, at least for my purpose. So in the attached version, I removed the
support for float8 for simplification.
>
> 6). This new function:
>
> +static Datum
> +rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
> +{
>
> should have a comment block. In particular, it should document what
> its inputs and outputs are.
Done.
>
> 7). This code:
>
> + int num_tuples = PG_GETARG_INT32(0);
> + int minlen = PG_GETARG_INT32(1);
> + int maxlen = PG_GETARG_INT32(2);
> + Datum minval = PG_GETARG_DATUM(3),
> + maxval = PG_GETARG_DATUM(4);
> + rand_array_fctx *fctx;
> +
> + if (datatype == INT4OID)
> + random_fn_oid = F_RANDOM_INT4_INT4;
> + else if (datatype == INT8OID)
...
> should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point
> repeating all those checks for every call.
Done.
>
> 8). I think it would be neater to code the "if (datatype == INT4OID)
> ... else if ..." as a switch statement.
Done.
>
>
> 9). I would swap the last 2 bound checks round, and simplify the error
> messages as follows:
Done.
>
> Also note that primary error messages should not end with a period.
> See https://www.postgresql.org/docs/current/error-style-guide.html
Thanks for sharing this!
> 10). In this error:
>
> + elog(ERROR, "unsupported type %d for rand_array function.",
> + datatype);
>
> "datatype" is of type Oid, which is unsigned, and so it should use
> "%u" not "%d". Also, as above, it should not end with a period, so it
> should be:
>
> + elog(ERROR, "unsupported type %u for rand_array function",
> + datatype);
Done.
>
> 12). This code:
>
> + random_len_fcinfo->args[0].value = minlen;
> + random_len_fcinfo->args[1].value = maxlen;
>
> should really be:
>
> + random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
> + random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);
>
> It amounts to the same thing, but documents the fact that it's
> converting an integer to a Datum.
Done.
>
>
> 13). These new functions are significantly under-documented,
> especially when compared to all the other functions on
> https://www.postgresql.org/docs/current/tablefunc.html
>
> They really should have their own subsection, along the same lines as
> "F.41.1.1. Normal_rand", explaining what the functions do, what their
> arguments mean, and giving a couple of usage examples.
Done, very impresive feedback and I know why PostgreSQL can always have
user friendly documentation now.
--
Best Regards
Andy Fan
>From 2024871241c35959a259e54a8151d4e9372dbfbf Mon Sep 17 00:00:00 2001
From: Andy Fan <[email protected]>
Date: Mon, 26 Aug 2024 18:50:57 +0800
Subject: [PATCH v20241016 1/1] Add functions rand_array function to
contrib/tablefunc.
It produces an array of numeric_type with its length in the range of
[minlen,maxlen] and each value is in the range of [minval,maxval].
---
contrib/tablefunc/Makefile | 2 +-
contrib/tablefunc/expected/tablefunc.out | 55 +++++++
contrib/tablefunc/sql/tablefunc.sql | 11 ++
contrib/tablefunc/tablefunc--1.0--1.1.sql | 17 +++
contrib/tablefunc/tablefunc.c | 168 ++++++++++++++++++++++
contrib/tablefunc/tablefunc.control | 2 +-
doc/src/sgml/tablefunc.sgml | 88 +++++++++++-
src/backend/utils/adt/arrayfuncs.c | 1 +
8 files changed, 341 insertions(+), 3 deletions(-)
create mode 100644 contrib/tablefunc/tablefunc--1.0--1.1.sql
diff --git a/contrib/tablefunc/Makefile b/contrib/tablefunc/Makefile
index 191a3a1d38..f0c67308fd 100644
--- a/contrib/tablefunc/Makefile
+++ b/contrib/tablefunc/Makefile
@@ -3,7 +3,7 @@
MODULES = tablefunc
EXTENSION = tablefunc
-DATA = tablefunc--1.0.sql
+DATA = tablefunc--1.0.sql tablefunc--1.0--1.1.sql
PGFILEDESC = "tablefunc - various functions that return tables"
REGRESS = tablefunc
diff --git a/contrib/tablefunc/expected/tablefunc.out b/contrib/tablefunc/expected/tablefunc.out
index ddece79029..2562da6b0f 100644
--- a/contrib/tablefunc/expected/tablefunc.out
+++ b/contrib/tablefunc/expected/tablefunc.out
@@ -12,6 +12,61 @@ SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2);
-- negative number of tuples
SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2);
ERROR: number of rows cannot be negative
+--
+-- random_array()
+-- produce the result with a stable seed.
+--
+select setseed(0);
+ setseed
+---------
+
+(1 row)
+
+select rand_array(10, 0, 3, 50::int, 80::int);
+ rand_array
+------------
+ {63,71,66}
+ {64}
+ {54}
+ {72,64,60}
+ {75}
+ {53,73}
+ {69}
+ {74,67}
+ {65}
+ {73}
+(10 rows)
+
+select rand_array(10, 0, 3, 50::bigint, 80::bigint);
+ rand_array
+------------
+ {}
+ {59,53}
+ {72}
+ {80,79}
+ {71}
+ {80,80}
+ {61,64}
+ {62,76,80}
+ {}
+ {}
+(10 rows)
+
+select rand_array(10, 0, 3, 50::numeric, 80::numeric);
+ rand_array
+------------
+ {77,66,73}
+ {51,53}
+ {65,54}
+ {72}
+ {55,72}
+ {}
+ {55,70,64}
+ {75}
+ {}
+ {}
+(10 rows)
+
--
-- crosstab()
--
diff --git a/contrib/tablefunc/sql/tablefunc.sql b/contrib/tablefunc/sql/tablefunc.sql
index 0fb8e40de2..f474c1ab6e 100644
--- a/contrib/tablefunc/sql/tablefunc.sql
+++ b/contrib/tablefunc/sql/tablefunc.sql
@@ -8,6 +8,17 @@ SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2);
-- negative number of tuples
SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2);
+
+--
+-- random_array()
+-- produce the result with a stable seed.
+--
+select setseed(0);
+select rand_array(10, 0, 3, 50::int, 80::int);
+select rand_array(10, 0, 3, 50::bigint, 80::bigint);
+select rand_array(10, 0, 3, 50::numeric, 80::numeric);
+
+
--
-- crosstab()
--
diff --git a/contrib/tablefunc/tablefunc--1.0--1.1.sql b/contrib/tablefunc/tablefunc--1.0--1.1.sql
new file mode 100644
index 0000000000..baf9176e98
--- /dev/null
+++ b/contrib/tablefunc/tablefunc--1.0--1.1.sql
@@ -0,0 +1,17 @@
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION tablefunc UPDATE TO '1.1'" to load this file. \quit
+
+CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval int, maxval int)
+RETURNS setof int[]
+AS 'MODULE_PATHNAME','rand_array_int'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval bigint, maxval bigint)
+RETURNS setof bigint[]
+AS 'MODULE_PATHNAME','rand_array_bigint'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval numeric, maxval numeric)
+RETURNS setof numeric[]
+AS 'MODULE_PATHNAME','rand_array_numeric'
+LANGUAGE C VOLATILE STRICT;
diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 2a25607a2a..fc7099612d 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -42,7 +42,9 @@
#include "lib/stringinfo.h"
#include "miscadmin.h"
#include "tablefunc.h"
+#include "utils/array.h"
#include "utils/builtins.h"
+#include "utils/fmgroids.h"
PG_MODULE_MAGIC;
@@ -91,6 +93,12 @@ typedef struct
bool use_carry; /* use second generated value */
} normal_rand_fctx;
+typedef struct
+{
+ FunctionCallInfo random_val_fcinfo;
+ FunctionCallInfo random_len_fcinfo;
+} rand_array_fctx;
+
#define xpfree(var_) \
do { \
if (var_ != NULL) \
@@ -269,6 +277,166 @@ normal_rand(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funcctx);
}
+/*
+ * rand_array_internal
+ * Fill the requested number of array with random values for the
+ * given range and type into fcinfo.
+ *
+ * Inputs:
+ * fcinfo: includes the number of rows, minlen and maxlen for the array,
+ * minval and maxval for the element in the array.
+ * datatype: the data type for the array element.
+ */
+static Datum
+rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
+{
+ FuncCallContext *funcctx;
+ rand_array_fctx *fctx;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ int num_tuples = PG_GETARG_INT32(0);
+ int minlen = PG_GETARG_INT32(1);
+ int maxlen = PG_GETARG_INT32(2);
+ Datum minval = PG_GETARG_DATUM(3),
+ maxval = PG_GETARG_DATUM(4);
+
+ MemoryContext oldcontext;
+
+ FmgrInfo *random_len_flinfo, *random_val_flinfo;
+ FunctionCallInfo random_len_fcinfo, random_val_fcinfo;
+ Oid random_fn_oid;
+
+ switch(datatype)
+ {
+ case INT4OID:
+ random_fn_oid = F_RANDOM_INT4_INT4;
+ break;
+ case INT8OID:
+ random_fn_oid = F_RANDOM_INT8_INT8;
+ break;
+ case NUMERICOID:
+ random_fn_oid = F_RANDOM_NUMERIC_NUMERIC;
+ break;
+ default:
+ elog(ERROR, "unsupported type %u for rand_array function",
+ datatype);
+ break;
+ }
+
+ if (num_tuples < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("number of rows cannot be negative")));
+
+ if (minlen < 0)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("minlen must be greater than or equal to zero"));
+
+ if (maxlen < minlen)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("maxlen must be greater than or equal to minlen"));
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /*
+ * switch to memory context appropriate for multiple function calls
+ */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ funcctx->max_calls = num_tuples;
+
+ /* allocate memory for user context */
+ fctx = (rand_array_fctx *) palloc(sizeof(rand_array_fctx));
+
+ /* build the random_len_fcinfo */
+ random_len_fcinfo = (FunctionCallInfo) palloc0(SizeForFunctionCallInfo(2));
+ random_len_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo));
+ fmgr_info(F_RANDOM_INT4_INT4, random_len_flinfo);
+ InitFunctionCallInfoData(*random_len_fcinfo, random_len_flinfo, 2,
+ InvalidOid, NULL, NULL);
+
+ random_len_fcinfo->args[0].isnull = false;
+ random_len_fcinfo->args[1].isnull = false;
+ random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
+ random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);
+
+ /* build the random_val_fcinfo for the specified data type */
+ random_val_fcinfo = (FunctionCallInfo) palloc0(SizeForFunctionCallInfo(2));
+ random_val_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo));
+ fmgr_info(random_fn_oid, random_val_flinfo);
+ InitFunctionCallInfoData(*random_val_fcinfo, random_val_flinfo, 2,
+ InvalidOid, NULL, NULL);
+
+ random_val_fcinfo->args[0].isnull = false;
+ random_val_fcinfo->args[1].isnull = false;
+ random_val_fcinfo->args[0].value = minval;
+ random_val_fcinfo->args[1].value = maxval;
+
+ fctx->random_val_fcinfo = random_val_fcinfo;
+ fctx->random_len_fcinfo = random_len_fcinfo;
+
+ funcctx->user_fctx = fctx;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ fctx = funcctx->user_fctx;
+
+ if (funcctx->call_cntr < funcctx->max_calls)
+ {
+ int array_len;
+ int i;
+ Datum *results;
+
+ array_len = Int32GetDatum(FunctionCallInvoke(fctx->random_len_fcinfo));
+
+ results = palloc(array_len * sizeof(Datum));
+
+ for(i = 0; i < array_len; i++)
+ results[i] = FunctionCallInvoke(fctx->random_val_fcinfo);
+
+
+ SRF_RETURN_NEXT(funcctx, PointerGetDatum(
+ construct_array_builtin(results, array_len, datatype)));
+ }
+ else
+ /* do when there is no more left */
+ SRF_RETURN_DONE(funcctx);
+}
+
+
+PG_FUNCTION_INFO_V1(rand_array_int);
+Datum
+rand_array_int(PG_FUNCTION_ARGS)
+{
+ return rand_array_internal(fcinfo, INT4OID);
+}
+
+
+PG_FUNCTION_INFO_V1(rand_array_bigint);
+Datum
+rand_array_bigint(PG_FUNCTION_ARGS)
+{
+ return rand_array_internal(fcinfo, INT8OID);
+}
+
+
+PG_FUNCTION_INFO_V1(rand_array_numeric);
+Datum
+rand_array_numeric(PG_FUNCTION_ARGS)
+{
+ return rand_array_internal(fcinfo, NUMERICOID);
+}
+
+
/*
* get_normal_pair()
* Assigns normally distributed (Gaussian) values to a pair of provided
diff --git a/contrib/tablefunc/tablefunc.control b/contrib/tablefunc/tablefunc.control
index 7b25d16170..9cc6222a4f 100644
--- a/contrib/tablefunc/tablefunc.control
+++ b/contrib/tablefunc/tablefunc.control
@@ -1,6 +1,6 @@
# tablefunc extension
comment = 'functions that manipulate whole tables, including crosstab'
-default_version = '1.0'
+default_version = '1.1'
module_pathname = '$libdir/tablefunc'
relocatable = true
trusted = true
diff --git a/doc/src/sgml/tablefunc.sgml b/doc/src/sgml/tablefunc.sgml
index e10fe7009d..6eab69a129 100644
--- a/doc/src/sgml/tablefunc.sgml
+++ b/doc/src/sgml/tablefunc.sgml
@@ -53,6 +53,17 @@
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <function>rand_array</function> (<parameter>numvals</parameter> <type>integer</type>,<parameter>minlen</parameter> <type>int4</type>, <parameter>maxlen</parameter> <type>int4</type>,<parameter>minval</parameter> <type><replaceable>numeric_type</replaceable></type>, <parameter>maxval</parameter> <type><replaceable>numeric_type</replaceable></type> )
+ <returnvalue>setof <replaceable>numeric_type</replaceable>[]</returnvalue>
+ </para>
+ <para>
+ Produces a set of random numeric_type[], uses the same deterministic pseudo-random number generator as random().
+ </para>
+ </entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<function>crosstab</function> ( <parameter>sql</parameter> <type>text</type> )
@@ -849,7 +860,82 @@ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2'
(6 rows)
</programlisting>
</para>
- </sect3>
+ </sect3>
+
+ <sect3 id="tablefunc-functions-rand-array">
+ <title><function>rand_array</function></title>
+
+ <indexterm>
+ <primary>rand_array</primary>
+ </indexterm>
+
+ <synopsis>
+ rand_array(int numvals, int minlen, int maxlen, int numeric, int numeric) returns setof numeric
+ </synopsis>
+
+ <para>
+ <function>rand_array</function> produces a set of array with its lenght belongs
+ to [minlen, maxlen] and each element in the array belongs to [minval, maxval].
+ </para>
+
+ <para>
+ <parameter>numvals</parameter> is the number of values to be returned
+ from the function. The array length is controlled in [<parameter>minlen</parameter>,
+ <parameter>maxlen</parameter>], and the value of each element in the array
+ is controlled in [<parameter>minval</parameter>, <parameter>maxval</parameter>]
+ </para>
+
+ <para>
+ For example, this call requests 10 arrays with each array's length is between
+ 3 and 5, and each element in the array is between 2.1 and 3.5. The final value
+ follows the impact of setseed().
+ </para>
+
+ <screen>
+test=# SELECT * FROM rand_array(10, 3, 5, 3.0, 5.0);
+ rand_array
+-----------------------
+ {5.0,3.2,3.6,3.7,4.7}
+ {5.0,4.0,5.0,4.8,4.3}
+ {3.5,4.3,4.3,3.7}
+ {3.8,3.9,4.8,5.0,4.2}
+ {5.0,4.7,4.9}
+ {3.5,5.0,3.5,4.8}
+ {4.9,3.8,3.6,4.2}
+ {4.0,4.5,3.5}
+ {4.2,3.4,3.8,3.8,3.9}
+ {4.4,4.2,4.0,4.1}
+(10 rows)
+ </screen>
+
+ <para>
+ Besides numeric, both int4 and int8 are supported.
+ </para>
+
+ <screen>
+test=# SELECT pg_typeof(rand_array), * FROM rand_array(5, 3, 5, 3::int4, 5.0::int4);
+ pg_typeof | rand_array
+-----------+-------------
+ integer[] | {5,5,5,4,5}
+ integer[] | {4,5,3,4}
+ integer[] | {4,5,4,3}
+ integer[] | {5,4,4,3}
+ integer[] | {3,4,3,3,5}
+(5 rows)
+
+test=# SELECT pg_typeof(rand_array), * FROM rand_array(5, 3, 5, 3::int8, 5.0::int8);
+ pg_typeof | rand_array
+-----------+-------------
+ bigint[] | {3,5,5,4,4}
+ bigint[] | {5,4,5,5}
+ bigint[] | {3,4,3}
+ bigint[] | {5,3,5,5}
+ bigint[] | {5,5,5,4}
+(5 rows)
+ </screen>
+ </sect3>
+
+
</sect2>
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index a715e7e0b8..4255aede2b 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -3442,6 +3442,7 @@ construct_array_builtin(Datum *elems, int nelems, Oid elmtype)
break;
case TEXTOID:
+ case NUMERICOID:
elmlen = -1;
elmbyval = false;
elmalign = TYPALIGN_INT;
--
2.45.1