Re: New function normal_rand_array function to contrib/tablefunc.

Andy Fan Wed, 16 Oct 2024 00:43:53 -0700

Hello Dean,

 Thanks for the detailed feedback! Here is the rebased version.


> 1). In the tests:
>
> +select setseed(0.8);
> +select rand_array(10, 0, 3, 50::int, 80::int);
..
> this should really have a comment block to distinguish these new tests
> from the preceeding normal_rand() tests.

> 3). It's only necessary to call setseed() once to get a reproducible
> set of results from then on.

> 4). I'd use setseed(0) or setseed(0.5), since those values are exactly
> representable as double precision values, unlike 0.8, ensuring that it
> works the same on all platforms. It might not matter, but why take the
> risk?

All Done, very interesting about the reason why the 0.8 is bad.

> 2). It's worth also having tests for the error cases.

After the code refactor, looks the ERROR is not reachable, so I added
both Assert(false) and elog(ERROR) with no test case for that.

>
> 5). The float8 case still has minimum and maximum value arguments that
> it just ignores. It should either not take those arguments, or it
> should respect them, and try to return float8 values in the requested
> range. I suspect that trying to return  float8 values in the requested
> range would be hard, if not impossible, due to the inexact nature of
> double precision arithmetic. That's why I suggested earlier having the
> float8 function not take minval/maxval arguments, and just return
> values in the range 0 to 1.
>
> 11). If the float8 case is made to not have minval/maxval arguments, this 
> code:
>
> +    Datum minval = PG_GETARG_DATUM(3),
> +        maxval = PG_GETARG_DATUM(4);
>
> and the FunctionCallInfo setup code needs to be different for the
> float8 case, in order to avoid reading and setting undefined
> arguments. Perhaps introduce a "need_val_bounds" boolean variable,
> based on the datatype.

I should have noticed the float8 issue after reading your first
reply.. Actually I don't have speical reason for which I have to support
float8. At least when I working on the patch, I want some toast and
non-toast data type, so supporting both int/bigint and numeric should be
good, at least for my purpose. So in the attached version, I removed the
support for float8 for simplification.

>
> 6). This new function:
>
> +static Datum
> +rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
> +{
>
> should have a comment block. In particular, it should document what
> its inputs and outputs are.

Done.
>
> 7). This code:
>
> +    int num_tuples = PG_GETARG_INT32(0);
> +    int minlen = PG_GETARG_INT32(1);
> +    int maxlen = PG_GETARG_INT32(2);
> +    Datum minval = PG_GETARG_DATUM(3),
> +        maxval = PG_GETARG_DATUM(4);
> +    rand_array_fctx *fctx;
> +
> +    if (datatype == INT4OID)
> +        random_fn_oid = F_RANDOM_INT4_INT4;
> +    else if (datatype == INT8OID)
...
> should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point
> repeating all those checks for every call.

Done.
>
> 8). I think it would be neater to code the "if (datatype == INT4OID)
> ... else if ..." as a switch statement.
Done.
>
>
> 9). I would swap the last 2 bound checks round, and simplify the error
> messages as follows:

Done.
>
> Also note that primary error messages should not end with a period.
> See https://www.postgresql.org/docs/current/error-style-guide.html

Thanks for sharing this!

> 10). In this error:
>
> +        elog(ERROR, "unsupported type %d for rand_array function.",
> +             datatype);
>
> "datatype" is of type Oid, which is unsigned, and so it should use
> "%u" not "%d". Also, as above, it should not end with a period, so it
> should be:
>
> +        elog(ERROR, "unsupported type %u for rand_array function",
> +             datatype);

Done.

>
> 12). This code:
>
> +        random_len_fcinfo->args[0].value = minlen;
> +        random_len_fcinfo->args[1].value = maxlen;
>
> should really be:
>
> +        random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
> +        random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);
>
> It amounts to the same thing, but documents the fact that it's
> converting an integer to a Datum.

Done. 
>
>
> 13). These new functions are significantly under-documented,
> especially when compared to all the other functions on
> https://www.postgresql.org/docs/current/tablefunc.html
>
> They really should have their own subsection, along the same lines as
> "F.41.1.1. Normal_rand", explaining what the functions do, what their
> arguments mean, and giving a couple of usage examples.

Done, very impresive feedback and I know why PostgreSQL can always have
user friendly documentation now.

-- 
Best Regards
Andy Fan

>From 2024871241c35959a259e54a8151d4e9372dbfbf Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihuifan1...@163.com>
Date: Mon, 26 Aug 2024 18:50:57 +0800
Subject: [PATCH v20241016 1/1] Add functions rand_array function to
 contrib/tablefunc.

It produces an array of numeric_type with its length in the range of
[minlen,maxlen] and each value is in the range of [minval,maxval].
---
 contrib/tablefunc/Makefile                |   2 +-
 contrib/tablefunc/expected/tablefunc.out  |  55 +++++++
 contrib/tablefunc/sql/tablefunc.sql       |  11 ++
 contrib/tablefunc/tablefunc--1.0--1.1.sql |  17 +++
 contrib/tablefunc/tablefunc.c             | 168 ++++++++++++++++++++++
 contrib/tablefunc/tablefunc.control       |   2 +-
 doc/src/sgml/tablefunc.sgml               |  88 +++++++++++-
 src/backend/utils/adt/arrayfuncs.c        |   1 +
 8 files changed, 341 insertions(+), 3 deletions(-)
 create mode 100644 contrib/tablefunc/tablefunc--1.0--1.1.sql

diff --git a/contrib/tablefunc/Makefile b/contrib/tablefunc/Makefile
index 191a3a1d38..f0c67308fd 100644
--- a/contrib/tablefunc/Makefile
+++ b/contrib/tablefunc/Makefile
@@ -3,7 +3,7 @@
 MODULES = tablefunc
 
 EXTENSION = tablefunc
-DATA = tablefunc--1.0.sql
+DATA = tablefunc--1.0.sql tablefunc--1.0--1.1.sql
 PGFILEDESC = "tablefunc - various functions that return tables"
 
 REGRESS = tablefunc
diff --git a/contrib/tablefunc/expected/tablefunc.out b/contrib/tablefunc/expected/tablefunc.out
index ddece79029..2562da6b0f 100644
--- a/contrib/tablefunc/expected/tablefunc.out
+++ b/contrib/tablefunc/expected/tablefunc.out
@@ -12,6 +12,61 @@ SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2);
 -- negative number of tuples
 SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2);
 ERROR:  number of rows cannot be negative
+--
+-- random_array()
+-- produce the result with a stable seed.
+--
+select setseed(0);
+ setseed 
+---------
+ 
+(1 row)
+
+select rand_array(10, 0, 3, 50::int, 80::int);
+ rand_array 
+------------
+ {63,71,66}
+ {64}
+ {54}
+ {72,64,60}
+ {75}
+ {53,73}
+ {69}
+ {74,67}
+ {65}
+ {73}
+(10 rows)
+
+select rand_array(10, 0, 3, 50::bigint, 80::bigint);
+ rand_array 
+------------
+ {}
+ {59,53}
+ {72}
+ {80,79}
+ {71}
+ {80,80}
+ {61,64}
+ {62,76,80}
+ {}
+ {}
+(10 rows)
+
+select rand_array(10, 0, 3, 50::numeric, 80::numeric);
+ rand_array 
+------------
+ {77,66,73}
+ {51,53}
+ {65,54}
+ {72}
+ {55,72}
+ {}
+ {55,70,64}
+ {75}
+ {}
+ {}
+(10 rows)
+
 --
 -- crosstab()
 --
diff --git a/contrib/tablefunc/sql/tablefunc.sql b/contrib/tablefunc/sql/tablefunc.sql
index 0fb8e40de2..f474c1ab6e 100644
--- a/contrib/tablefunc/sql/tablefunc.sql
+++ b/contrib/tablefunc/sql/tablefunc.sql
@@ -8,6 +8,17 @@ SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2);
 -- negative number of tuples
 SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2);
 
+
+--
+-- random_array()
+-- produce the result with a stable seed.
+--
+select setseed(0);
+select rand_array(10, 0, 3, 50::int, 80::int);
+select rand_array(10, 0, 3, 50::bigint, 80::bigint);
+select rand_array(10, 0, 3, 50::numeric, 80::numeric);
+
+
 --
 -- crosstab()
 --
diff --git a/contrib/tablefunc/tablefunc--1.0--1.1.sql b/contrib/tablefunc/tablefunc--1.0--1.1.sql
new file mode 100644
index 0000000000..baf9176e98
--- /dev/null
+++ b/contrib/tablefunc/tablefunc--1.0--1.1.sql
@@ -0,0 +1,17 @@
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION tablefunc UPDATE TO '1.1'" to load this file. \quit
+
+CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval int, maxval int)
+RETURNS setof int[]
+AS 'MODULE_PATHNAME','rand_array_int'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval bigint, maxval bigint)
+RETURNS setof bigint[]
+AS 'MODULE_PATHNAME','rand_array_bigint'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE FUNCTION rand_array(numvals int, minlen int, maxlen int, minval numeric, maxval numeric)
+RETURNS setof numeric[]
+AS 'MODULE_PATHNAME','rand_array_numeric'
+LANGUAGE C VOLATILE STRICT;
diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 2a25607a2a..fc7099612d 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -42,7 +42,9 @@
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "tablefunc.h"
+#include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 
 PG_MODULE_MAGIC;
 
@@ -91,6 +93,12 @@ typedef struct
 	bool		use_carry;		/* use second generated value */
 } normal_rand_fctx;
 
+typedef struct
+{
+	FunctionCallInfo random_val_fcinfo;
+	FunctionCallInfo random_len_fcinfo;
+} rand_array_fctx;
+
 #define xpfree(var_) \
 	do { \
 		if (var_ != NULL) \
@@ -269,6 +277,166 @@ normal_rand(PG_FUNCTION_ARGS)
 		SRF_RETURN_DONE(funcctx);
 }
 
+/*
+ * rand_array_internal
+ *		Fill the requested number of array with random values for the
+ *		given range and type into fcinfo.
+ *
+ * Inputs:
+ * fcinfo: includes the number of rows, minlen and maxlen for the array,
+ * minval and maxval for the element in the array.
+ * datatype: the data type for the array element.
+ */
+static Datum
+rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
+{
+	FuncCallContext *funcctx;
+	rand_array_fctx *fctx;
+
+	/* stuff done only on the first call of the function */
+	if (SRF_IS_FIRSTCALL())
+	{
+		int num_tuples = PG_GETARG_INT32(0);
+		int minlen = PG_GETARG_INT32(1);
+		int maxlen = PG_GETARG_INT32(2);
+		Datum minval = PG_GETARG_DATUM(3),
+			maxval = PG_GETARG_DATUM(4);
+
+		MemoryContext oldcontext;
+
+		FmgrInfo	*random_len_flinfo, *random_val_flinfo;
+		FunctionCallInfo random_len_fcinfo, random_val_fcinfo;
+		Oid random_fn_oid;
+
+		switch(datatype)
+		{
+			case INT4OID:
+				random_fn_oid = F_RANDOM_INT4_INT4;
+				break;
+			case INT8OID:
+				random_fn_oid = F_RANDOM_INT8_INT8;
+				break;
+			case NUMERICOID:
+				random_fn_oid = F_RANDOM_NUMERIC_NUMERIC;
+				break;
+			default:
+				elog(ERROR, "unsupported type %u for rand_array function",
+					 datatype);
+				break;
+		}
+
+		if (num_tuples < 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("number of rows cannot be negative")));
+
+		if (minlen < 0)
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("minlen must be greater than or equal to zero"));
+
+		if (maxlen < minlen)
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("maxlen must be greater than or equal to minlen"));
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		funcctx->max_calls = num_tuples;
+
+		/* allocate memory for user context */
+		fctx = (rand_array_fctx *) palloc(sizeof(rand_array_fctx));
+
+		/* build the random_len_fcinfo */
+		random_len_fcinfo = (FunctionCallInfo) palloc0(SizeForFunctionCallInfo(2));
+		random_len_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo));
+		fmgr_info(F_RANDOM_INT4_INT4, random_len_flinfo);
+		InitFunctionCallInfoData(*random_len_fcinfo, random_len_flinfo, 2,
+								 InvalidOid, NULL, NULL);
+
+		random_len_fcinfo->args[0].isnull = false;
+		random_len_fcinfo->args[1].isnull = false;
+		random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
+		random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);
+
+		/* build the random_val_fcinfo for the specified data type */
+		random_val_fcinfo = (FunctionCallInfo) palloc0(SizeForFunctionCallInfo(2));
+		random_val_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo));
+		fmgr_info(random_fn_oid, random_val_flinfo);
+		InitFunctionCallInfoData(*random_val_fcinfo, random_val_flinfo, 2,
+								 InvalidOid, NULL, NULL);
+
+		random_val_fcinfo->args[0].isnull = false;
+		random_val_fcinfo->args[1].isnull = false;
+		random_val_fcinfo->args[0].value = minval;
+		random_val_fcinfo->args[1].value = maxval;
+
+		fctx->random_val_fcinfo = random_val_fcinfo;
+		fctx->random_len_fcinfo = random_len_fcinfo;
+
+		funcctx->user_fctx = fctx;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	/* stuff done on every call of the function */
+	funcctx = SRF_PERCALL_SETUP();
+
+	fctx = funcctx->user_fctx;
+
+	if (funcctx->call_cntr < funcctx->max_calls)
+	{
+		int		array_len;
+		int		i;
+		Datum	*results;
+
+		array_len = Int32GetDatum(FunctionCallInvoke(fctx->random_len_fcinfo));
+
+		results = palloc(array_len * sizeof(Datum));
+
+		for(i = 0; i < array_len; i++)
+			results[i] = FunctionCallInvoke(fctx->random_val_fcinfo);
+
+
+		SRF_RETURN_NEXT(funcctx, PointerGetDatum(
+							construct_array_builtin(results, array_len, datatype)));
+	}
+	else
+		/* do when there is no more left */
+		SRF_RETURN_DONE(funcctx);
+}
+
+
+PG_FUNCTION_INFO_V1(rand_array_int);
+Datum
+rand_array_int(PG_FUNCTION_ARGS)
+{
+	return rand_array_internal(fcinfo, INT4OID);
+}
+
+
+PG_FUNCTION_INFO_V1(rand_array_bigint);
+Datum
+rand_array_bigint(PG_FUNCTION_ARGS)
+{
+	return rand_array_internal(fcinfo, INT8OID);
+}
+
+
+PG_FUNCTION_INFO_V1(rand_array_numeric);
+Datum
+rand_array_numeric(PG_FUNCTION_ARGS)
+{
+	return rand_array_internal(fcinfo, NUMERICOID);
+}
+
+
 /*
  * get_normal_pair()
  * Assigns normally distributed (Gaussian) values to a pair of provided
diff --git a/contrib/tablefunc/tablefunc.control b/contrib/tablefunc/tablefunc.control
index 7b25d16170..9cc6222a4f 100644
--- a/contrib/tablefunc/tablefunc.control
+++ b/contrib/tablefunc/tablefunc.control
@@ -1,6 +1,6 @@
 # tablefunc extension
 comment = 'functions that manipulate whole tables, including crosstab'
-default_version = '1.0'
+default_version = '1.1'
 module_pathname = '$libdir/tablefunc'
 relocatable = true
 trusted = true
diff --git a/doc/src/sgml/tablefunc.sgml b/doc/src/sgml/tablefunc.sgml
index e10fe7009d..6eab69a129 100644
--- a/doc/src/sgml/tablefunc.sgml
+++ b/doc/src/sgml/tablefunc.sgml
@@ -53,6 +53,17 @@
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <function>rand_array</function> (<parameter>numvals</parameter> <type>integer</type>,<parameter>minlen</parameter> <type>int4</type>, <parameter>maxlen</parameter> <type>int4</type>,<parameter>minval</parameter> <type><replaceable>numeric_type</replaceable></type>, <parameter>maxval</parameter> <type><replaceable>numeric_type</replaceable></type> )
+        <returnvalue>setof <replaceable>numeric_type</replaceable>[]</returnvalue>
+       </para>
+       <para>
+        Produces a set of random numeric_type[], uses the same deterministic pseudo-random number generator as random().
+       </para>
+       </entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <function>crosstab</function> ( <parameter>sql</parameter> <type>text</type> )
@@ -849,7 +860,82 @@ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2'
 (6 rows)
 </programlisting>
     </para>
-   </sect3>
+  </sect3>
+
+  <sect3 id="tablefunc-functions-rand-array">
+   <title><function>rand_array</function></title>
+
+   <indexterm>
+    <primary>rand_array</primary>
+   </indexterm>
+
+   <synopsis>
+     rand_array(int numvals, int minlen, int maxlen, int numeric, int numeric) returns setof numeric
+   </synopsis>
+
+    <para>
+      <function>rand_array</function> produces a set of array with its lenght belongs
+      to [minlen, maxlen] and each element in the array belongs to [minval, maxval].
+    </para>
+
+    <para>
+     <parameter>numvals</parameter> is the number of values to be returned
+     from the function. The array length is controlled in [<parameter>minlen</parameter>,
+     <parameter>maxlen</parameter>], and the value of each element in the array
+     is controlled in [<parameter>minval</parameter>, <parameter>maxval</parameter>]
+    </para>
+
+    <para>
+      For example, this call requests 10 arrays with each array's length is between
+      3 and 5, and each element in the array is between 2.1 and 3.5. The final value
+      follows the impact of setseed().
+    </para>
+
+    <screen>
+test=# SELECT * FROM rand_array(10, 3, 5, 3.0, 5.0);
+      rand_array       
+-----------------------
+ {5.0,3.2,3.6,3.7,4.7}
+ {5.0,4.0,5.0,4.8,4.3}
+ {3.5,4.3,4.3,3.7}
+ {3.8,3.9,4.8,5.0,4.2}
+ {5.0,4.7,4.9}
+ {3.5,5.0,3.5,4.8}
+ {4.9,3.8,3.6,4.2}
+ {4.0,4.5,3.5}
+ {4.2,3.4,3.8,3.8,3.9}
+ {4.4,4.2,4.0,4.1}
+(10 rows)
+    </screen>
+
+    <para>
+      Besides numeric, both int4 and int8 are supported.
+    </para>
+
+    <screen>
+test=# SELECT pg_typeof(rand_array), * FROM rand_array(5, 3, 5, 3::int4, 5.0::int4);
+ pg_typeof | rand_array  
+-----------+-------------
+ integer[] | {5,5,5,4,5}
+ integer[] | {4,5,3,4}
+ integer[] | {4,5,4,3}
+ integer[] | {5,4,4,3}
+ integer[] | {3,4,3,3,5}
+(5 rows)
+
+test=# SELECT pg_typeof(rand_array), * FROM rand_array(5, 3, 5, 3::int8, 5.0::int8);
+ pg_typeof | rand_array  
+-----------+-------------
+ bigint[]  | {3,5,5,4,4}
+ bigint[]  | {5,4,5,5}
+ bigint[]  | {3,4,3}
+ bigint[]  | {5,3,5,5}
+ bigint[]  | {5,5,5,4}
+(5 rows)
+    </screen>
+  </sect3>
+
+  
 
  </sect2>
 
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index a715e7e0b8..4255aede2b 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -3442,6 +3442,7 @@ construct_array_builtin(Datum *elems, int nelems, Oid elmtype)
 			break;
 
 		case TEXTOID:
+		case NUMERICOID:
 			elmlen = -1;
 			elmbyval = false;
 			elmalign = TYPALIGN_INT;
-- 
2.45.1

Re: New function normal_rand_array function to contrib/tablefunc.

Reply via email to