Hi hackers,

We found that several functions -- namely numeric_combine,
numeric_avg_combine, numeric_poly_combine, and int8_avg_combine -- are
returning NULL without signaling the nullity of datum in fcinfo.isnull.
This is obscured by the fact that the only functions in core (finalfunc
for various aggregates) that those return values feed into happen to
tolerate (or rather, not quite distinguish) zero-but-not-NULL trans
values.

In Greenplum, this behavior becomes problematic because Greenplum
serializes internal trans values before spilling the hash table. The
serial functions (numeric_serialize and friends) are strict functions
that will blow up when they are given null (either in the C sense or the
SQL sense) inputs.

In Postgres if we change hash aggregation in the future to spill the
hash table (vis-à-vis the input tuples), this issues would manifest
itself in the final aggregate because we'll serialize the combined (and
likely incorrectly null) trans values.

Please find attached a small patch fixing said issue.  Originally
reported by Denis Smirnov over at
https://github.com/greenplum-db/gpdb/pull/9878

Cheers,
Jesse and Deep
From edeeaa220fbc3d082d133b29a8c394632588d11a Mon Sep 17 00:00:00 2001
From: Denis Smirnov <s...@arenadata.io>
Date: Thu, 9 Apr 2020 16:10:29 -0700
Subject: [PATCH] Properly mark NULL returns in numeric aggregates

When a function returns NULL (in the SQL sense), it is expected to
signal that in fcinfo.isnull . numeric_combine and friends break this
rule by directly returning a NULL datum without signaling the nullity.
This is obscured by the fact that the only functions in core (finalfunc
for various aggregates) that those return values feed into happen to
tolerate (or rather, not quite distinguish) zero-but-not-NULL trans
values.

In Greenplum, this behavior becomes problematic because Greenplum
serializes internal trans values before spilling the hash table. The
serial functions (numeric_serialize and friends) are strict functions
that will blow up when they are given null (either in the C sense or the
SQL sense) inputs.
---
 src/backend/utils/adt/numeric.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 9986132b45e..57896de6edf 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -3946,7 +3946,11 @@ numeric_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (NumericAggState *) PG_GETARG_POINTER(1);
 
 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}
 
 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4034,7 +4038,11 @@ numeric_avg_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (NumericAggState *) PG_GETARG_POINTER(1);
 
 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}
 
 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4543,7 +4551,11 @@ numeric_poly_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(1);
 
 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}
 
 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4774,7 +4786,11 @@ int8_avg_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(1);
 
 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}
 
 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
-- 
2.26.0

Reply via email to