Hello

I found probably hard problem in cooperation with window functions :(

tuplesort_begin_datum creates child context inside aggcontext. This
context is used for tuplestore. But when this function is called from
WindowAgg_Aggregates context - someone drops all child context every
iteration, and then tuplestore state is invalid. For this moment we
can block using a median function as window function, but it should be
solved better - if it is possible - I don't see inside window function
implementation.

2010/9/20 Hitoshi Harada <umi.tan...@gmail.com>:
> 2010/8/19 Pavel Stehule <pavel.steh...@gmail.com>:
>> Hello
>>
>> I am sending a prototype implementation of functions median and
>> percentile. This implementation is very simple and I moved it to
>> contrib for this moment - it is more easy maintainable. Later I'll
>> move it to core.
>
> I've reviewed this patch.
>
> * The patch can apply cleanly and make doesn't print any errors nor
> warnings. But it doesn't touch contrib/Makefile so I had to make by
> changing dir to contrib/median.
>

fixed

> * Cosmetic coding style should be more appropriate, including trailing
> white spaces and indentation positions.
>

can you specify where, please?

> * Since these two aggregates use tuplesort inside it, there're
> possible risk to cause out of memory in normal use case. I don't think
> this fact is critical, but at least some notation should be referred
> in docs.
>

it is same as windows function, no? So the risk is same.

> * It doesn't contain any document nor regression tests.
>

yes, it is my fault, some median regress test appended, documentation
I am will sending later - resp. if you agree with current form of
patch, I'll finalize this patch and remove "median" to core. I put
these functions to contrib just for simple testing of proof concept.

> * They should be callable in window function context; for example
>
> contrib_regression=# select median(a) over (order by a) from t1;
> ERROR:  invalid tuplesort state
>
> or at least user-friend message should be printed.
>

the problem is in windows function implementation - partially fixed -
blocked from windows context

> * The returned type is controversy. median(int) returns float8 as the
> patch intended, but avg(int) returns numeric. AFAIK only avg(float8)
> returns float8.
>

fixed

> * percentile() is more problematic; first, the second argument for the
> aggregate takes N of "N%ile" as int, like 50 if you want 50%ile,  but
> it doesn't care about negative values or more than 100. In addition,
> the second argument is taken at the first non-NULL value of the first
> argument, but the second argument is semantically constant. For
> example, for (1.. 10) value of a in table t1,
>

yes, I understand - I don't have a problem to modify it. Just this has
same behave as string_agg. This is again deeper problem - we cannot
ensure so some parameter of aggregation function will be a constant. -
so it cannot be well solved now :( Have you some idea? There isn't any
tool for checking if some parameter is constant or not.

I removed percentile now.

> contrib_regression=# select percentile(a, a * 10 order by a) from t1;
>  percentile
> ------------
>          1
> (1 row)
>
> contrib_regression=# select percentile(a, a * 10 order by a desc) from t1;
>  percentile
> ------------
>         10
> (1 row)
>
> and if the argument comes from the subquery which doesn't contain
> ORDER BY clause, you cannot predict the result.
>
> That's all of my quick review up to now.
>
> Regards,
>
> --
> Hitoshi Harada
>
*** ./contrib/Makefile.orig	2010-06-14 18:17:56.000000000 +0200
--- ./contrib/Makefile	2010-09-21 19:24:59.211190814 +0200
***************
*** 23,28 ****
--- 23,29 ----
  		isn		\
  		lo		\
  		ltree		\
+ 		median		\
  		oid2name	\
  		pageinspect	\
  		passwordcheck	\
*** ./contrib/median/expected/median.out.orig	2010-09-21 22:03:24.749899797 +0200
--- ./contrib/median/expected/median.out	2010-09-21 22:01:46.000000000 +0200
***************
*** 0 ****
--- 1,30 ----
+ -- import library
+ SET client_min_messages = warning;
+ \set ECHO none
+ -- check result for numeric datatypes
+ create table median_test(
+ 	a int, 
+ 	b int8, 
+ 	c float, 
+ 	d numeric, 
+ 	e double precision
+ );
+ insert into median_test
+    select i,i,i,i,i
+       from generate_series(1,10) g(i);
+ select median(a), median(b), median(c), median(d), median(e) from median_test;
+        median       |       median       | median |       median       | median 
+ --------------------+--------------------+--------+--------------------+--------
+  5.5000000000000000 | 5.5000000000000000 |    5.5 | 5.5000000000000000 |    5.5
+ (1 row)
+ 
+ truncate table median_test;
+ insert into median_test
+    select i,i,i,i,i
+       from generate_series(1,11) g(i);
+ select median(a), median(b), median(c), median(d), median(e) from median_test;
+  median | median | median | median | median 
+ --------+--------+--------+--------+--------
+       6 |      6 |      6 |      6 |      6
+ (1 row)
+ 
*** ./contrib/median/Makefile.orig	2010-08-19 12:38:56.144777253 +0200
--- ./contrib/median/Makefile	2010-08-18 20:23:39.180156339 +0200
***************
*** 0 ****
--- 1,17 ----
+ # $PostgreSQL: pgsql/contrib/median/Makefile,v 1.1 2008/07/29 18:31:20 tgl Exp $
+ 
+ MODULES = median
+ DATA_built = median.sql
+ DATA = uninstall_median.sql
+ REGRESS = median
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/median
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
*** ./contrib/median/median.c.orig	2010-08-19 12:39:01.456650776 +0200
--- ./contrib/median/median.c	2010-09-21 22:22:09.745898468 +0200
***************
*** 0 ****
--- 1,287 ----
+ /*
+  * $PostgreSQL: pgsql/contrib/citext/citext.c,v 1.2 2009/06/11 14:48:50 momjian Exp $
+  */
+ #include "postgres.h"
+ 
+ #include "funcapi.h"
+ #include "miscadmin.h"
+ #include "catalog/pg_type.h"
+ #include "parser/parse_coerce.h"
+ #include "parser/parse_oper.h"
+ #include "utils/builtins.h"
+ #include "utils/numeric.h"
+ #include "utils/tuplesort.h"
+ 
+ Datum median_transfn(PG_FUNCTION_ARGS);
+ Datum median_finalfn_double(PG_FUNCTION_ARGS);
+ Datum median_finalfn_numeric(PG_FUNCTION_ARGS);
+ 
+ Datum percentile_transfn(PG_FUNCTION_ARGS);
+ Datum percentile_finalfn(PG_FUNCTION_ARGS);
+ 
+ 
+ #ifdef PG_MODULE_MAGIC
+ PG_MODULE_MAGIC;
+ #endif
+ 
+ PG_FUNCTION_INFO_V1(median_transfn);
+ PG_FUNCTION_INFO_V1(median_finalfn_double);
+ PG_FUNCTION_INFO_V1(median_finalfn_numeric);
+ PG_FUNCTION_INFO_V1(percentile_transfn);
+ PG_FUNCTION_INFO_V1(percentile_finalfn);
+ 
+ typedef struct
+ {
+ 	int	nelems;		/* number of valid entries */
+ 	Tuplesortstate *sortstate;
+ 	FmgrInfo	cast_func_finfo;
+ 	int	p;		/* nth for percentille */
+ 	MemoryContext	aggcontext;
+ } StatAggState;
+ 
+ static StatAggState *
+ makeStatAggState(FunctionCallInfo fcinfo)
+ {
+ 	MemoryContext oldctx;
+ 	MemoryContext aggcontext;
+ 	StatAggState *aggstate;
+ 	Oid	sortop,
+ 			castfunc;
+ 	Oid	   valtype;
+ 	Oid		targettype = InvalidOid;
+ 	CoercionPathType		pathtype;
+ 
+ 	/* 
+ 	 * A tuplesort creates a child Memory Context "TupleSort", but
+ 	 * a current implementation of window functions drops all child
+ 	 * context every iteration, and therefore this function cannot
+ 	 * be called as windows function for this moment.
+ 	 */
+ 	if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
+ 	{
+ 		/* cannot be called inside like windowAggregates */
+ 		elog(ERROR, "median_transfn called in windows aggregate context");
+ 	}
+ 
+ 	if (!AggCheckCallContext(fcinfo, &aggcontext))
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "string_agg_transfn called in non-aggregate context");
+ 	}
+ 
+ 	oldctx = MemoryContextSwitchTo(aggcontext);
+ 	
+ 	aggstate = (StatAggState *) palloc0(sizeof(StatAggState));
+ 	aggstate->nelems = 0;
+ 	aggstate->aggcontext = aggcontext;
+ 	
+ 	valtype = get_fn_expr_argtype(fcinfo->flinfo, 1);
+ 	get_sort_group_operators(valtype,
+ 						    true, false, false,
+ 						    &sortop, NULL, NULL);
+ 
+ 	aggstate->sortstate = tuplesort_begin_datum(valtype,
+ 							sortop,
+ 							SORTBY_NULLS_DEFAULT,
+ 							work_mem, false);
+ 
+ 	MemoryContextSwitchTo(oldctx);
+ 
+ 	/* 
+ 	 * The result type will be numeric for integer types and for numeric type.
+ 	 * For other types resulting type is double precission. This is consistent
+ 	 * with other aggregates like AVG,..
+ 	 */
+ 	switch (valtype)
+ 	{
+ 		case FLOAT8OID:
+ 		case FLOAT4OID:
+ 			targettype = FLOAT8OID;
+ 			break;
+ 		
+ 		case INT2OID:
+ 		case INT4OID:
+ 		case INT8OID:
+ 		case NUMERICOID:
+ 			targettype = NUMERICOID;
+ 			break;
+ 		
+ 		default:
+ 			elog(ERROR, "unsupported data type with oid %d", valtype);
+ 	}
+ 	if (valtype != targettype)
+ 	{
+ 		/* find a cast function */
+ 		pathtype = find_coercion_pathway(targettype, valtype,
+ 									COERCION_EXPLICIT,
+ 									&castfunc);
+ 		if (pathtype == COERCION_PATH_FUNC)
+ 		{
+ 			Assert(OidIsValid(castfunc));
+ 			fmgr_info_cxt(castfunc, &aggstate->cast_func_finfo,
+ 									    aggcontext);
+ 		} 
+ 		else if (pathtype == COERCION_PATH_RELABELTYPE)
+ 		{
+ 			aggstate->cast_func_finfo.fn_oid = InvalidOid;
+ 		}
+ 		else 
+ 			elog(ERROR, "no conversion function from %s %s",
+ 					 format_type_be(valtype),
+ 					 format_type_be(targettype));
+ 	}
+ 
+ 	return aggstate;
+ }
+ 
+ /*
+  *  append a non NULL value to tuplesort
+  */
+ Datum
+ median_transfn(PG_FUNCTION_ARGS)
+ {
+ 	StatAggState *aggstate;
+ 	
+ 	aggstate = PG_ARGISNULL(0) ? NULL : (StatAggState *) PG_GETARG_POINTER(0);
+ 	
+ 	if (!PG_ARGISNULL(1))
+ 	{
+ 		if (aggstate == NULL)
+ 			aggstate = makeStatAggState(fcinfo);
+ 		
+ 		tuplesort_putdatum(aggstate->sortstate, PG_GETARG_DATUM(1), false);
+ 		aggstate->nelems++;
+ 	}
+ 
+ 	PG_RETURN_POINTER(aggstate);
+ }
+ 
+ static double 
+ to_double(Datum value, FmgrInfo *cast_func_finfo)
+ {
+ 	/* when valtype is same as target type, returns directly */
+ 	if (cast_func_finfo->fn_oid == InvalidOid)
+ 		return DatumGetFloat8(value);
+ 	
+ 	return DatumGetFloat8(FunctionCall1(cast_func_finfo, value));
+ }
+ 
+ Datum
+ median_finalfn_double(PG_FUNCTION_ARGS)
+ {
+ 	StatAggState *aggstate;
+ 	
+ 	Assert(AggCheckCallContext(fcinfo, NULL));
+ 
+ 	aggstate = PG_ARGISNULL(0) ? NULL : (StatAggState *) PG_GETARG_POINTER(0);
+ 
+ 	if (aggstate != NULL)
+ 	{
+ 		int	lidx;
+ 		int	hidx;
+ 		Datum	   value;
+ 		bool	isNull;
+ 		int		i = 1;
+ 		double	result = 0;
+ 		
+ 		hidx = aggstate->nelems / 2 + 1;
+ 		lidx = (aggstate->nelems + 1) / 2;
+ 
+ 		tuplesort_performsort(aggstate->sortstate);
+ 		
+ 		while (tuplesort_getdatum(aggstate->sortstate, 
+ 								true,
+ 								      &value, &isNull))
+ 		{
+ 			if (i++ == lidx)
+ 			{
+ 				result = to_double(value, &aggstate->cast_func_finfo);
+ 				
+ 				if (lidx != hidx)
+ 				{
+ 					tuplesort_getdatum(aggstate->sortstate,
+ 										true,
+ 										      &value, &isNull);
+ 					result = (result + to_double(value, &aggstate->cast_func_finfo)) / 2.0;
+ 				}
+ 				break;
+ 			}
+ 		}
+ 		
+ 		tuplesort_end(aggstate->sortstate);
+ 
+ 		PG_RETURN_FLOAT8(result);
+ 	}
+ 	else
+ 		PG_RETURN_NULL();
+ }
+ 
+ static Numeric
+ to_numeric(Datum value, FmgrInfo *cast_func_finfo)
+ {
+ 	/* when valtype is same as target type, returns directly */
+ 	if (cast_func_finfo->fn_oid == InvalidOid)
+ 		return DatumGetNumeric(value);
+ 
+ 	return DatumGetNumeric(FunctionCall1(cast_func_finfo, value));
+ }
+ 
+ 
+ Datum
+ median_finalfn_numeric(PG_FUNCTION_ARGS)
+ {
+ 	StatAggState *aggstate;
+ 	
+ 	Assert(AggCheckCallContext(fcinfo, NULL));
+ 
+ 	aggstate = PG_ARGISNULL(0) ? NULL : (StatAggState *) PG_GETARG_POINTER(0);
+ 
+ 	if (aggstate != NULL)
+ 	{
+ 		int	lidx;
+ 		int	hidx;
+ 		Datum	   value;
+ 		bool	isNull;
+ 		int		i = 1;
+ 		Numeric result = NULL;		/* be compiler quiet */
+ 		
+ 		hidx = aggstate->nelems / 2 + 1;
+ 		lidx = (aggstate->nelems + 1) / 2;
+ 
+ 		tuplesort_performsort(aggstate->sortstate);
+ 		
+ 		while (tuplesort_getdatum(aggstate->sortstate, 
+ 								true,
+ 								      &value, &isNull))
+ 		{
+ 			if (i++ == lidx)
+ 			{
+ 				result = to_numeric(value, &aggstate->cast_func_finfo);
+ 				
+ 				if (lidx != hidx)
+ 				{
+ 					Numeric stack;
+ 					
+ 					tuplesort_getdatum(aggstate->sortstate,
+ 										true,
+ 										      &value, &isNull);
+ 					stack = to_numeric(value, &aggstate->cast_func_finfo);
+ 					stack = DatumGetNumeric(DirectFunctionCall2(numeric_add,
+ 												NumericGetDatum(stack),
+ 												NumericGetDatum(result)));
+ 					result = DatumGetNumeric(DirectFunctionCall2(numeric_div,
+ 												NumericGetDatum(stack),
+ 												DirectFunctionCall1(float4_numeric,
+ 																    Float4GetDatum(2.0))));
+ 				}
+ 				break;
+ 			}
+ 		}
+ 		
+ 		tuplesort_end(aggstate->sortstate);
+ 
+ 		PG_RETURN_NUMERIC(result);
+ 	}
+ 	else
+ 		PG_RETURN_NULL();
+ }
*** ./contrib/median/median.sql.in.orig	2010-08-19 12:39:06.192775857 +0200
--- ./contrib/median/median.sql.in	2010-09-21 21:52:18.730897577 +0200
***************
*** 0 ****
--- 1,50 ----
+ CREATE OR REPLACE FUNCTION median_transfn(internal, anyelement)
+ RETURNS internal
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE;
+ 
+ CREATE OR REPLACE FUNCTION median_finalfn_double(internal)
+ RETURNS double precision
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE;
+ 
+ CREATE OR REPLACE FUNCTION median_finalfn_numeric(internal)
+ RETURNS numeric
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE;
+ 
+ CREATE AGGREGATE median(bigint) (
+     SFUNC=median_transfn,
+     STYPE=internal,
+     FINALFUNC=median_finalfn_numeric
+ );
+ 
+ CREATE AGGREGATE median(double precision) (
+     SFUNC=median_transfn,
+     STYPE=internal,
+     FINALFUNC=median_finalfn_double
+ );
+ 
+ CREATE AGGREGATE median(integer) (
+     SFUNC=median_transfn,
+     STYPE=internal,
+     FINALFUNC=median_finalfn_numeric
+ );
+ 
+ CREATE AGGREGATE median(numeric) (
+     SFUNC=median_transfn,
+     STYPE=internal,
+     FINALFUNC=median_finalfn_numeric
+ );
+ 
+ CREATE AGGREGATE median(real) (
+     SFUNC=median_transfn,
+     STYPE=internal,
+     FINALFUNC=median_finalfn_double
+ );
+ 
+ CREATE AGGREGATE median(smallint) (
+     SFUNC=median_transfn,
+     STYPE=internal,
+     FINALFUNC=median_finalfn_numeric
+ );
*** ./contrib/median/sql/median.sql.orig	2010-09-21 21:59:24.909024776 +0200
--- ./contrib/median/sql/median.sql	2010-09-21 22:01:35.996896570 +0200
***************
*** 0 ****
--- 1,32 ----
+ -- import library
+ 
+ SET client_min_messages = warning;
+ \set ECHO none
+ \i median.sql
+ RESET client_min_messages;
+ \set ECHO all
+ 
+ -- check result for numeric datatypes
+ 
+ create table median_test(
+ 	a int, 
+ 	b int8, 
+ 	c float, 
+ 	d numeric, 
+ 	e double precision
+ );
+ 
+ insert into median_test
+    select i,i,i,i,i
+       from generate_series(1,10) g(i);
+ 
+ select median(a), median(b), median(c), median(d), median(e) from median_test;
+ 
+ truncate table median_test;
+ 
+ insert into median_test
+    select i,i,i,i,i
+       from generate_series(1,11) g(i);
+ 
+ select median(a), median(b), median(c), median(d), median(e) from median_test;
+ 
*** ./contrib/median/uninstall_median.sql.orig	2010-08-19 12:39:11.712777158 +0200
--- ./contrib/median/uninstall_median.sql	2010-09-21 21:52:08.794896392 +0200
***************
*** 0 ****
--- 1,9 ----
+ DROP FUNCTION median_transfn(internal, anyelement);
+ DROP FUNCTION median_finalfn_double(internal);
+ DROP FUNCTION median_finalfn_numeric(internal);
+ DROP AGGREGATE median(bigint);
+ DROP AGGREGATE median(double precison);
+ DROP AGGREGATE median(integer);
+ DROP AGGREGATE median(numeric);
+ DROP AGGREGATE median(real);
+ DROP AGGREGATE median(smallint);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to