subject:"\[HACKERS\] Window Functions"

Re: [HACKERS] Window functions can be created with defaults, but they don't work

2013-11-05 Thread Tom Lane

I wrote:
> Attached is a proposed patch against HEAD that fixes this by supporting
> default arguments properly for window functions.  In passing, it also
> allows named-argument notation in window function calls, since that's
> free once the other thing works (because the same subroutine fixes up
> both things).

Hm, well, almost free --- turns out ruleutils.c was assuming WindowFuncs
couldn't contain named arguments.  Updated patch attached.

regards, tom lane

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index e3dbc4b..16aade4 100644
*** a/doc/src/sgml/syntax.sgml
--- b/doc/src/sgml/syntax.sgml
*** SELECT concat_lower_or_upper('Hello', 'W
*** 2527,2534 
  
 
  
!  Named and mixed call notations can currently be used only with regular
!  functions, not with aggregate functions or window functions.
  
 

--- 2527,2535 
  
 
  
!  Named and mixed call notations currently cannot be used when calling an
!  aggregate function (but they do work when an aggregate function is used
!  as a window function).
  
 

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 76c032c..4fa73a9 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** eval_const_expressions_mutator(Node *nod
*** 2251,2256 
--- 2251,2306 
   */
  return (Node *) copyObject(param);
  			}
+ 		case T_WindowFunc:
+ 			{
+ WindowFunc *expr = (WindowFunc *) node;
+ Oid			funcid = expr->winfnoid;
+ List	   *args;
+ Expr	   *aggfilter;
+ HeapTuple	func_tuple;
+ WindowFunc *newexpr;
+ 
+ /*
+  * We can't really simplify a WindowFunc node, but we mustn't
+  * just fall through to the default processing, because we
+  * have to apply expand_function_arguments to its argument
+  * list.  That takes care of inserting default arguments and
+  * expanding named-argument notation.
+  */
+ func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+ if (!HeapTupleIsValid(func_tuple))
+ 	elog(ERROR, "cache lookup failed for function %u", funcid);
+ 
+ args = expand_function_arguments(expr->args, expr->wintype,
+  func_tuple);
+ 
+ ReleaseSysCache(func_tuple);
+ 
+ /* Now, recursively simplify the args (which are a List) */
+ args = (List *)
+ 	expression_tree_mutator((Node *) args,
+ 			eval_const_expressions_mutator,
+ 			(void *) context);
+ /* ... and the filter expression, which isn't */
+ aggfilter = (Expr *)
+ 	eval_const_expressions_mutator((Node *) expr->aggfilter,
+    context);
+ 
+ /* And build the replacement WindowFunc node */
+ newexpr = makeNode(WindowFunc);
+ newexpr->winfnoid = expr->winfnoid;
+ newexpr->wintype = expr->wintype;
+ newexpr->wincollid = expr->wincollid;
+ newexpr->inputcollid = expr->inputcollid;
+ newexpr->args = args;
+ newexpr->aggfilter = aggfilter;
+ newexpr->winref = expr->winref;
+ newexpr->winstar = expr->winstar;
+ newexpr->winagg = expr->winagg;
+ newexpr->location = expr->location;
+ 
+ return (Node *) newexpr;
+ 			}
  		case T_FuncExpr:
  			{
  FuncExpr   *expr = (FuncExpr *) node;
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 2bd24c8..ede36d1 100644
*** a/src/backend/parser/parse_func.c
--- b/src/backend/parser/parse_func.c
*** ParseFuncOrColumn(ParseState *pstate, Li
*** 537,553 
  	 errmsg("window functions cannot return sets"),
  	 parser_errposition(pstate, location)));
  
- 		/*
- 		 * We might want to support this later, but for now reject it because
- 		 * the planner and executor wouldn't cope with NamedArgExprs in a
- 		 * WindowFunc node.
- 		 */
- 		if (argnames != NIL)
- 			ereport(ERROR,
- 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- 	 errmsg("window functions cannot use named arguments"),
- 	 parser_errposition(pstate, location)));
- 
  		/* parse_agg.c does additional window-func-specific processing */
  		transformWindowFuncCall(pstate, wfunc, over);
  
--- 537,542 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 04b1c4f..915fb7a 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*** get_windowfunc_expr(WindowFunc *wfunc, d
*** 7461,7466 
--- 7461,7467 
  	StringInfo	buf = context->buf;
  	Oid			argtypes[FUNC_MAX_ARGS];
  	int			nargs;
+ 	List	   *argnames;
  	ListCell   *l;
  
  	if (list_length(wfunc->args) > FUNC_MAX_ARGS)
*** get_windowfunc_expr(WindowFunc *wfunc, d
*** 7468,7485 
  (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
   errmsg("too many arguments")));
  	nargs = 0;
  	foreach(l, wfunc->args)
  	{
  		Node	   *arg = (Node *) lfirst(l);
  
! 		Assert(!IsA(arg, NamedArgExpr));

Re: [HACKERS] Window functions can be created with defaults, but they don't work

2013-11-05 Thread Tom Lane

I wrote:
> I noticed this while poking at the variadic-aggregates issue:
> regression=# create function nth_value_def(anyelement, integer = 1) returns 
> anyelement language internal window immutable strict as 'window_nth_value';
> CREATE FUNCTION
> regression=# SELECT nth_value_def(ten) OVER (PARTITION BY four), ten, four
> FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten)s;
> The connection to the server was lost. Attempting reset: Failed.

Attached is a proposed patch against HEAD that fixes this by supporting
default arguments properly for window functions.  In passing, it also
allows named-argument notation in window function calls, since that's
free once the other thing works (because the same subroutine fixes up
both things).

> The reason this crashes is that the planner doesn't apply
> default-insertion to WindowFunc nodes, only to FuncExprs.  We could make
> it do that, probably, but that seems to me like a feature addition.
> I think a more reasonable approach for back-patching is to fix function
> creation to disallow attaching defaults to window functions (or
> aggregates, for that matter, which would have the same issue if CREATE
> AGGREGATE had the syntax option to specify defaults).  ProcedureCreate
> seems like an appropriate place, since it already contains a lot of sanity
> checks of this sort.

Having now done the patch to fix it properly, I'm more inclined to think
that maybe we should just back-patch this rather than inserting an error
check.  It seems pretty low-risk.

Comments?

regards, tom lane

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index e3dbc4b..16aade4 100644
*** a/doc/src/sgml/syntax.sgml
--- b/doc/src/sgml/syntax.sgml
*** SELECT concat_lower_or_upper('Hello', 'W
*** 2527,2534 
  
 
  
!  Named and mixed call notations can currently be used only with regular
!  functions, not with aggregate functions or window functions.
  
 

--- 2527,2535 
  
 
  
!  Named and mixed call notations currently cannot be used when calling an
!  aggregate function (but they do work when an aggregate function is used
!  as a window function).
  
 

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 76c032c..4fa73a9 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** eval_const_expressions_mutator(Node *nod
*** 2251,2256 
--- 2251,2306 
   */
  return (Node *) copyObject(param);
  			}
+ 		case T_WindowFunc:
+ 			{
+ WindowFunc *expr = (WindowFunc *) node;
+ Oid			funcid = expr->winfnoid;
+ List	   *args;
+ Expr	   *aggfilter;
+ HeapTuple	func_tuple;
+ WindowFunc *newexpr;
+ 
+ /*
+  * We can't really simplify a WindowFunc node, but we mustn't
+  * just fall through to the default processing, because we
+  * have to apply expand_function_arguments to its argument
+  * list.  That takes care of inserting default arguments and
+  * expanding named-argument notation.
+  */
+ func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+ if (!HeapTupleIsValid(func_tuple))
+ 	elog(ERROR, "cache lookup failed for function %u", funcid);
+ 
+ args = expand_function_arguments(expr->args, expr->wintype,
+  func_tuple);
+ 
+ ReleaseSysCache(func_tuple);
+ 
+ /* Now, recursively simplify the args (which are a List) */
+ args = (List *)
+ 	expression_tree_mutator((Node *) args,
+ 			eval_const_expressions_mutator,
+ 			(void *) context);
+ /* ... and the filter expression, which isn't */
+ aggfilter = (Expr *)
+ 	eval_const_expressions_mutator((Node *) expr->aggfilter,
+    context);
+ 
+ /* And build the replacement WindowFunc node */
+ newexpr = makeNode(WindowFunc);
+ newexpr->winfnoid = expr->winfnoid;
+ newexpr->wintype = expr->wintype;
+ newexpr->wincollid = expr->wincollid;
+ newexpr->inputcollid = expr->inputcollid;
+ newexpr->args = args;
+ newexpr->aggfilter = aggfilter;
+ newexpr->winref = expr->winref;
+ newexpr->winstar = expr->winstar;
+ newexpr->winagg = expr->winagg;
+ newexpr->location = expr->location;
+ 
+ return (Node *) newexpr;
+ 			}
  		case T_FuncExpr:
  			{
  FuncExpr   *expr = (FuncExpr *) node;
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 2bd24c8..ede36d1 100644
*** a/src/backend/parser/parse_func.c
--- b/src/backend/parser/parse_func.c
*** ParseFuncOrColumn(ParseState *pstate, Li
*** 537,553 
  	 errmsg("window functions cannot return sets"),
  	 parser_errposition(pstate, location)));
  
- 		/*
- 		 * We might want to support this later, but for now reject it because
- 		 * the planner and executor wouldn't cope with NamedArgExprs in a
- 		 * WindowFunc node.
- 		 */

Re: [HACKERS] Window functions can be created with defaults, but they don't work

2013-08-30 Thread Tom Lane

Robert Haas  writes:
> On Fri, Aug 30, 2013 at 6:14 PM, Tom Lane  wrote:
>> The reason this crashes is that the planner doesn't apply
>> default-insertion to WindowFunc nodes, only to FuncExprs.

> I'm not sure I agree.  Under that approach, any functions that have
> already been created like that will still crash the server.  A
> malicious user could create a function like this now and wait to
> crontab it until the day he's leaving the company.  Or there are more
> accidental scenarios as well.

The crash is only possible because the underlying internal-language
function doesn't sanity-check its input enough to catch the case of too
few arguments.  As such, it's not that different from hundreds of other
cases where a superuser can cause a crash by misdeclaring the arguments to
an internal-language function.  So I don't find your argument compelling.
I'd even say this was user error, except that it's not obvious that this
case shouldn't work.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions can be created with defaults, but they don't work

2013-08-30 Thread Robert Haas

On Fri, Aug 30, 2013 at 6:14 PM, Tom Lane  wrote:
> I noticed this while poking at the variadic-aggregates issue:
>
> regression=# create function nth_value_def(anyelement, integer = 1) returns 
> anyelement language internal window immutable strict as 'window_nth_value';
> CREATE FUNCTION
> regression=# SELECT nth_value_def(ten) OVER (PARTITION BY four), ten, four
> FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten)s;
> The connection to the server was lost. Attempting reset: Failed.
>
> The reason this crashes is that the planner doesn't apply
> default-insertion to WindowFunc nodes, only to FuncExprs.  We could make
> it do that, probably, but that seems to me like a feature addition.
> I think a more reasonable approach for back-patching is to fix function
> creation to disallow attaching defaults to window functions (or
> aggregates, for that matter, which would have the same issue if CREATE
> AGGREGATE had the syntax option to specify defaults).  ProcedureCreate
> seems like an appropriate place, since it already contains a lot of sanity
> checks of this sort.

I'm not sure I agree.  Under that approach, any functions that have
already been created like that will still crash the server.  A
malicious user could create a function like this now and wait to
crontab it until the day he's leaving the company.  Or there are more
accidental scenarios as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window functions can be created with defaults, but they don't work

2013-08-30 Thread Tom Lane

I noticed this while poking at the variadic-aggregates issue:

regression=# create function nth_value_def(anyelement, integer = 1) returns 
anyelement language internal window immutable strict as 'window_nth_value';
CREATE FUNCTION
regression=# SELECT nth_value_def(ten) OVER (PARTITION BY four), ten, four
FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten)s;
The connection to the server was lost. Attempting reset: Failed.

The reason this crashes is that the planner doesn't apply
default-insertion to WindowFunc nodes, only to FuncExprs.  We could make
it do that, probably, but that seems to me like a feature addition.
I think a more reasonable approach for back-patching is to fix function
creation to disallow attaching defaults to window functions (or
aggregates, for that matter, which would have the same issue if CREATE
AGGREGATE had the syntax option to specify defaults).  ProcedureCreate
seems like an appropriate place, since it already contains a lot of sanity
checks of this sort.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions seem to inhibit push-down of quals into views

2010-08-13 Thread Tom Lane

Alvaro Herrera  writes:
> CREATE TABLE foo AS SELECT a, a % 10 AS b FROM generate_series(1, 10) a;
> CREATE INDEX a_b ON foo (b);
> CREATE VIEW bar AS SELECT a, b, lead(a, 1) OVER () FROM foo;

> explain select a, b, lead(a, 1) over () from foo where b = 2;
> explain select * from bar where b = 2;

Those are not equivalent queries.  In the first case b=2 is supposed to be
applied before window function evaluation, in the second case not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window functions seem to inhibit push-down of quals into views

2010-08-13 Thread Alvaro Herrera

Hi,

I've got a table and view defined like this:

CREATE TABLE foo AS SELECT a, a % 10 AS b FROM generate_series(1, 10) a;
CREATE INDEX a_b ON foo (b);
CREATE VIEW bar AS SELECT a, b, lead(a, 1) OVER () FROM foo;

Now, if I query the table directly instead of going through the view, a
WHERE condition can be pushed down to the table scan:

explain select a, b, lead(a, 1) over () from foo where b = 2;
QUERY PLAN 
---
 WindowAgg  (cost=12.14..488.72 rows=500 width=8)
   ->  Bitmap Heap Scan on foo  (cost=12.14..482.47 rows=500 width=8)
 Recheck Cond: (b = 2)
 ->  Bitmap Index Scan on a_b  (cost=0.00..12.01 rows=500 width=0)
   Index Cond: (b = 2)
(5 filas)

However, if I instead query the view, the qual is applied to a SubqueryScan
instead, and the table is scanned with no qual at all:

alvherre=# explain select * from bar where b = 2;
  QUERY PLAN   
---
 Subquery Scan bar  (cost=0.00..3943.00 rows=500 width=12)
   Filter: (bar.b = 2)
   ->  WindowAgg  (cost=0.00..2693.00 rows=10 width=8)
 ->  Seq Scan on foo  (cost=0.00..1443.00 rows=10 width=8)
(4 filas)

The view is behaving like this:

alvherre=# explain select * from (select a, b, lead(a, 1) over () from foo) b 
where b = 2;
  QUERY PLAN   
---
 Subquery Scan b  (cost=0.00..3943.00 rows=500 width=12)
   Filter: (b.b = 2)
   ->  WindowAgg  (cost=0.00..2693.00 rows=10 width=8)
 ->  Seq Scan on foo  (cost=0.00..1443.00 rows=10 width=8)
(4 filas)



This is a killer for useful views on top of queries with window
functions :-(

Is this a optimizer shortcoming?

-- 
Álvaro Herrera 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] window functions maybe bug

2009-09-02 Thread Pavel Stehule

2009/9/2 Tom Lane :
> Pavel Stehule  writes:
>> create table x1 (a integer);
>> insert into x1 
>> values(2),(2),(3),(3),(4),(4),(5),(5),(6),(6),(6),(8),(9),(9),(10),(10);
>
>> postgres=# select row_number() over (order by a), row_number() over
>> (order by a desc) from x1;
>>  row_number | row_number
>> +
>>          16 |          1
>>          15 |          2
>>          14 |          3
>>          11 |          4
>>          13 |          5
>>          12 |          6
>>           9 |          7
>>          10 |          8
>>           7 |          9
>>           8 |         10
>>           5 |         11
>>           6 |         12
>>           4 |         13
>>           3 |         14
>>           1 |         15
>>           2 |         16
>> (16 rows)
>
>> I am not sure, is this correct?
>
> I don't see any grounds for arguing that it's wrong.  The results for
> rows with equal "a" values are indeterminate.

I can understand it. So I found Joe Celko's bug :)

regards
Pavel Stehule

>
>                        regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] window functions maybe bug

2009-09-02 Thread Tom Lane

Pavel Stehule  writes:
> create table x1 (a integer);
> insert into x1 
> values(2),(2),(3),(3),(4),(4),(5),(5),(6),(6),(6),(8),(9),(9),(10),(10);

> postgres=# select row_number() over (order by a), row_number() over
> (order by a desc) from x1;
>  row_number | row_number
> +
>  16 |  1
>  15 |  2
>  14 |  3
>  11 |  4
>  13 |  5
>  12 |  6
>   9 |  7
>  10 |  8
>   7 |  9
>   8 | 10
>   5 | 11
>   6 | 12
>   4 | 13
>   3 | 14
>   1 | 15
>   2 | 16
> (16 rows)

> I am not sure, is this correct?

I don't see any grounds for arguing that it's wrong.  The results for
rows with equal "a" values are indeterminate.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] window functions maybe bug

2009-09-02 Thread Pavel Stehule

Hello,

I wrote article about statistical function - when I tested Joe Celko's
method, I found some problems on not unique dataset:

on distinct dataset is rule so rows here is max(hi), then there is min(lo):

create table x1 (a integer);

insert into x1 select generate_series(1,10);

postgres=# select row_number() over (order by a), row_number() over
(order by a desc) from x1;
 row_number | row_number
+
 10 |  1
  9 |  2
  8 |  3
  7 |  4
  6 |  5
  5 |  6
  4 |  7
  3 |  8
  2 |  9
  1 | 10
(10 rows)

but on other set I got

truncate table x1;
insert into x1 
values(2),(2),(3),(3),(4),(4),(5),(5),(6),(6),(6),(8),(9),(9),(10),(10);

postgres=# select row_number() over (order by a), row_number() over
(order by a desc) from x1;
 row_number | row_number
+
 16 |  1
 15 |  2
 14 |  3
 11 |  4
 13 |  5
 12 |  6
  9 |  7
 10 |  8
  7 |  9
  8 | 10
  5 | 11
  6 | 12
  4 | 13
  3 | 14
  1 | 15
  2 | 16
(16 rows)

I am not sure, is this correct? When this solution is correct, then
Joe Celko's method for median calculation is buggy.

Regards
Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-27 Thread Hitoshi Harada

2008/12/27 Tom Lane :
> "Robert Haas"  writes:
>> Unfortunately, if we don't want to add an explicit iswindowable flag
>> (and I understand that that's ugly), then I think this is the way to
>> go.  It's a shame that people will have to make code changes, but
>> inventing a fake AggState object just to get around this problem
>> sounds worse.  The array_agg code is new and the fact that it doesn't
>> follow the design pattern should be considered a bug in that code
>> rather than a justification for an ugly workaround.
>
> Well, array_agg may be new but it's simply a re-implementation of a
> design pattern that existed in contrib/intagg since 7.3 or so.  I have
> no problem with fixing array_agg --- what I'm wondering about is who
> has copied intagg before.

We agree that the best solution for ten core aggregates is to rewrite
them to support or not support WindowAgg, so the care for third party
aggregates copied from intagg is nothing but announcing that the
behavior is changing. -- if we had better alternative we should do it,
but it seems to me that there's no way not to break the non-core
aggregates.

SInce at t least you must compile the modules again on 8.4 release,
compiling time warnings or something is the best announcing for now.
Or any other suggestions?

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-26 Thread Tom Lane

"Robert Haas"  writes:
> Unfortunately, if we don't want to add an explicit iswindowable flag
> (and I understand that that's ugly), then I think this is the way to
> go.  It's a shame that people will have to make code changes, but
> inventing a fake AggState object just to get around this problem
> sounds worse.  The array_agg code is new and the fact that it doesn't
> follow the design pattern should be considered a bug in that code
> rather than a justification for an ugly workaround.

Well, array_agg may be new but it's simply a re-implementation of a
design pattern that existed in contrib/intagg since 7.3 or so.  I have
no problem with fixing array_agg --- what I'm wondering about is who
has copied intagg before.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-26 Thread Robert Haas

> 1. Go back to Hitoshi's plan of passing WindowAggState to the
> aggregates.  This will require changing every one of the ten aggregates
> in the core distro, as well as every third-party aggregate that has
> a similar optimization; and we just have to keep our fingers crossed
> that anyone who's taking a short-cut will fix their code before it
> fails in the field.

Unfortunately, if we don't want to add an explicit iswindowable flag
(and I understand that that's ugly), then I think this is the way to
go.  It's a shame that people will have to make code changes, but
inventing a fake AggState object just to get around this problem
sounds worse.  The array_agg code is new and the fact that it doesn't
follow the design pattern should be considered a bug in that code
rather than a justification for an ugly workaround.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-26 Thread David Fetter

On Fri, Dec 26, 2008 at 02:17:29PM -0500, Tom Lane wrote:

> So the alternatives I see are:
> 
> 1. Go back to Hitoshi's plan of passing WindowAggState to the
> aggregates.  This will require changing every one of the ten aggregates
> in the core distro, as well as every third-party aggregate that has
> a similar optimization; and we just have to keep our fingers crossed
> that anyone who's taking a short-cut will fix their code before it
> fails in the field.
> 
> 2. Use an intermediate dummy AggState as I have in my version, but
> document some convention for telling this from a "real" AggState
> when needed.  (Not hard, we just pick some field that would never be
> zero in a real AggState and document testing that.)  This is certainly
> on the ugly side, but it would very substantially cut the number of
> places that need changes.  Only aggregates that are doing something
> irreversible in their final-functions would need to be touched.
> 
> If we were working in a green field then #1 would clearly be the
> preferable choice, but worrying about compatibility with existing
> third-party aggregates is making me lean to #2.  Comments?

Exactly how large is this third-party aggregate problem?  Rather than
support a huge wart, we could just tell people starting now and
prominently in the release notes that such things need a re-do and
point to examples of how it's done.

The case this won't work for is where a vendor of a proprietary C
extension is gone and/or won't update their stuff for 8.4, and it
seems to me that we can't, and shouldn't try to, take responsibility
for that use case anyhow.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-26 Thread Tom Lane

"Joshua D. Drake"  writes:
> I believe the goal should be correctness but why not both? Fix what we
> can and put in place a "work around" that would be removed in 8.5?

Why not both what?  The driving concern here is that there might be
third-party aggregates that will dump core if invoked as window
functions, and there is simply not anything we can do to prevent that.
Short of refusing to call them at all, which strikes me as an extreme
overreaction since most of them can be expected to work fine.  (In
particular, if we went with solution #1 then *all* the ones that were
playing by the documented rules would still work.)

Also, there is no such thing as "a workaround we can remove in 8.5".
If we put in something like a "safe as window function" attribute for
CREATE AGGREGATE, we'll have to support it till the end of time.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-26 Thread Joshua D. Drake

On Fri, 2008-12-26 at 14:17 -0500, Tom Lane wrote:
> Greg Stark  writes:
> > Yeah, it seems like adding a flag like iswindowable to aggregate  
> > functions is the safest option.
> 

> So the alternatives I see are:
> 
> 1. Go back to Hitoshi's plan of passing WindowAggState to the
> aggregates.  This will require changing every one of the ten aggregates
> in the core distro, as well as every third-party aggregate that has
> a similar optimization; and we just have to keep our fingers crossed
> that anyone who's taking a short-cut will fix their code before it
> fails in the field.
> 
> 2. Use an intermediate dummy AggState as I have in my version, but
> document some convention for telling this from a "real" AggState
> when needed.  (Not hard, we just pick some field that would never be
> zero in a real AggState and document testing that.)  This is certainly
> on the ugly side, but it would very substantially cut the number of
> places that need changes.  Only aggregates that are doing something
> irreversible in their final-functions would need to be touched.
> 
> If we were working in a green field then #1 would clearly be the
> preferable choice, but worrying about compatibility with existing
> third-party aggregates is making me lean to #2.  Comments?
> 
>   regards, tom lane
> 

I believe the goal should be correctness but why not both? Fix what we
can and put in place a "work around" that would be removed in 8.5?

Joshua D. Drake


-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-26 Thread Tom Lane

Greg Stark  writes:
> Yeah, it seems like adding a flag like iswindowable to aggregate  
> functions is the safest option.

I agree with Hitoshi-san: that's passing information in the wrong
direction.  The right direction is to make it visible to the called
function which context it's being called in, so that it can do the
right thing (or at worst, throw an error if it can't).

The current state of play is that the documented expectation for
aggregate functions is that they should do

if (fcinfo->context && IsA(fcinfo->context, AggState))
... optimized code for aggregate case ...
else
... support regular call, or throw error ...

However, a bit of grepping for AggState shows that this expectation
isn't met very well within the core distribution, let alone elsewhere.
There are about 10 aggregates that have optimizations like this, only
8 of which are playing by the rules --- the violators are:

tsearch2's tsa_rewrite_accum: just assumes it's been passed an AggState,
dumps core (or worse) if not
array_agg: Asserts it's been passed an AggState, dumps core if not

As submitted, Hitoshi's patch made the WindowAgg code pass its
WindowAggState to the aggregate functions, which at best would have the
result of disabling the internal optimizations of the aggregates, and
would result in a core dump in any aggregate that was taking a shortcut.
The working copy I have right now does this:

if (numaggs > 0)
{
/*
 * Set up a mostly-dummy AggState to be passed to plain aggregates
 * so they know they can optimize things.  We don't supply any useful
 * info except for the ID of the aggregate-lifetime context.
 */
winstate->aggstate = makeNode(AggState);
winstate->aggstate->aggcontext = winstate->wincontext;
}

and then passes the dummy AggState to plain aggregates, instead of
WindowAggState.  This makes count(*) and most of the other aggs work
as desired, but it's not sufficient for array_agg because of that
release-the-data-structure-in-the-final-function optimization.

So the alternatives I see are:

1. Go back to Hitoshi's plan of passing WindowAggState to the
aggregates.  This will require changing every one of the ten aggregates
in the core distro, as well as every third-party aggregate that has
a similar optimization; and we just have to keep our fingers crossed
that anyone who's taking a short-cut will fix their code before it
fails in the field.

2. Use an intermediate dummy AggState as I have in my version, but
document some convention for telling this from a "real" AggState
when needed.  (Not hard, we just pick some field that would never be
zero in a real AggState and document testing that.)  This is certainly
on the ugly side, but it would very substantially cut the number of
places that need changes.  Only aggregates that are doing something
irreversible in their final-functions would need to be touched.

If we were working in a green field then #1 would clearly be the
preferable choice, but worrying about compatibility with existing
third-party aggregates is making me lean to #2.  Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-25 Thread Hitoshi Harada

2008/12/25 Pavel Stehule :
> 2008/12/25 Hitoshi Harada :
>> 2008/12/25 Greg Stark :
>>> Yeah, it seems like adding a flag like iswindowable to aggregate functions
>>> is the safest option.
>>>
>>> It would be nice if it represented an abstract property of the state
>>> function or final function rather than just "works with the implementation
>>> of window functions". I'm not sure what that property is though -
>>> isidempotent? isreentrant? Maybe just  a vague isrepeatable?
>>
>> No, I meant wrinting such like:
>>
>> Datum
>> some_trans_fn(PG_FUNCTION_ARGS)
>> {
>>  if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
>>elog(ERROR, "some_agg does not support window aggregate");
>>
>> ...
>> }
>>
>> rather than adding column to catalog. To add flag you must add new
>> syntax for CREATE AGGREGATE, which is slightly more painful.
>>
>
> enhancing of CREATE AGGREGATE syntax should be better, it could solve
> problem with compatibility.
>

Most of the aggregates have no problem with this issue and the rest
are fixable like array_agg. So adding a column and syntax is too much,
I guess. My suggestion of raising error is urgent patch for third
party aggregates that are copied from contrib/intagg.

But if there is a chance of general approach to call repeatable
aggregate other than WindowAgg, that should be considered.


Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-25 Thread Pavel Stehule

2008/12/25 Hitoshi Harada :
> 2008/12/25 Greg Stark :
>> Yeah, it seems like adding a flag like iswindowable to aggregate functions
>> is the safest option.
>>
>> It would be nice if it represented an abstract property of the state
>> function or final function rather than just "works with the implementation
>> of window functions". I'm not sure what that property is though -
>> isidempotent? isreentrant? Maybe just  a vague isrepeatable?
>
> No, I meant wrinting such like:
>
> Datum
> some_trans_fn(PG_FUNCTION_ARGS)
> {
>  if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
>elog(ERROR, "some_agg does not support window aggregate");
>
> ...
> }
>
> rather than adding column to catalog. To add flag you must add new
> syntax for CREATE AGGREGATE, which is slightly more painful.
>

enhancing of CREATE AGGREGATE syntax should be better, it could solve
problem with compatibility.

regards
Pavel Stehule

> Regards,
>
> --
> Hitoshi Harada
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-25 Thread Hitoshi Harada

2008/12/25 Greg Stark :
> Yeah, it seems like adding a flag like iswindowable to aggregate functions
> is the safest option.
>
> It would be nice if it represented an abstract property of the state
> function or final function rather than just "works with the implementation
> of window functions". I'm not sure what that property is though -
> isidempotent? isreentrant? Maybe just  a vague isrepeatable?

No, I meant wrinting such like:

Datum
some_trans_fn(PG_FUNCTION_ARGS)
{
  if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
elog(ERROR, "some_agg does not support window aggregate");

...
}

rather than adding column to catalog. To add flag you must add new
syntax for CREATE AGGREGATE, which is slightly more painful.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-25 Thread Greg Stark

Yeah, it seems like adding a flag like iswindowable to aggregate  
functions is the safest option.


It would be nice if it represented an abstract property of the state  
function or final function rather than just "works with the  
implementation of window functions". I'm not sure what that property  
is though - isidempotent? isreentrant? Maybe just  a vague isrepeatable?


--
Greg


On 24 Dec 2008, at 20:16, "Hitoshi Harada"  wrote:


2008/12/25 Tom Lane :

Gregory Stark  writes:

Tom Lane  writes:
Unless we want to move the goalposts on what an aggregate is  
allowed
to do internally, we're going to have to change this to re- 
aggregate

repeatedly.  Neither prospect is appetizing in the least.


Does it currently copy the state datum before calling the final  
function?

Would that help?


No.  The entire point of what we have now formalized as "aggregates  
with

internal-type transition values" is that the transition value isn't
necessarily a single palloc chunk.  For stuff like array_agg, the
performance costs of requiring that are enormous.

On looking at what array_agg does, it seems the issue there is that
the final-function actually deletes the working state when it thinks
it's done with it (see construct_md_array).   It would probably be
possible to fix things so that it doesn't do that when it's called by
a WindowAgg instead of a regular Agg.  What I'm more concerned about
is what third-party code will break.  contrib/intagg has done this
type of thing since forever, and I'm sure people have copied that...


I have concerned it once before on the first design of the window
functions. I don't have much idea about all over the aggregate
functions but at least count(*) does some assumption of AggState in
its context. So I concluded when the window functions are introduced
it must be announced that if you use aggregate in the window context,
you must be sure it supports window as well as aggregate. It is
because currently (<= 8.3) aggregates are assumed it is called in
AggState only but the assumption would be broken now. It is designed,
and announcing helps much third party code to support or not to
support window functions (i.e. you can stop by error if window is not
supported by the function).

Regards,

--
Hitoshi Harada


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-24 Thread Hitoshi Harada

2008/12/25 Tom Lane :
> Gregory Stark  writes:
>> Tom Lane  writes:
>>> Unless we want to move the goalposts on what an aggregate is allowed
>>> to do internally, we're going to have to change this to re-aggregate
>>> repeatedly.  Neither prospect is appetizing in the least.
>
>> Does it currently copy the state datum before calling the final function?
>> Would that help?
>
> No.  The entire point of what we have now formalized as "aggregates with
> internal-type transition values" is that the transition value isn't
> necessarily a single palloc chunk.  For stuff like array_agg, the
> performance costs of requiring that are enormous.
>
> On looking at what array_agg does, it seems the issue there is that
> the final-function actually deletes the working state when it thinks
> it's done with it (see construct_md_array).   It would probably be
> possible to fix things so that it doesn't do that when it's called by
> a WindowAgg instead of a regular Agg.  What I'm more concerned about
> is what third-party code will break.  contrib/intagg has done this
> type of thing since forever, and I'm sure people have copied that...

I have concerned it once before on the first design of the window
functions. I don't have much idea about all over the aggregate
functions but at least count(*) does some assumption of AggState in
its context. So I concluded when the window functions are introduced
it must be announced that if you use aggregate in the window context,
you must be sure it supports window as well as aggregate. It is
because currently (<= 8.3) aggregates are assumed it is called in
AggState only but the assumption would be broken now. It is designed,
and announcing helps much third party code to support or not to
support window functions (i.e. you can stop by error if window is not
supported by the function).

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-24 Thread Tom Lane

Gregory Stark  writes:
> Tom Lane  writes:
>> Unless we want to move the goalposts on what an aggregate is allowed
>> to do internally, we're going to have to change this to re-aggregate
>> repeatedly.  Neither prospect is appetizing in the least.

> Does it currently copy the state datum before calling the final function?
> Would that help?

No.  The entire point of what we have now formalized as "aggregates with
internal-type transition values" is that the transition value isn't
necessarily a single palloc chunk.  For stuff like array_agg, the
performance costs of requiring that are enormous.

On looking at what array_agg does, it seems the issue there is that
the final-function actually deletes the working state when it thinks
it's done with it (see construct_md_array).   It would probably be
possible to fix things so that it doesn't do that when it's called by
a WindowAgg instead of a regular Agg.  What I'm more concerned about
is what third-party code will break.  contrib/intagg has done this
type of thing since forever, and I'm sure people have copied that...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window-functions patch handling of aggregates

2008-12-23 Thread Gregory Stark

Tom Lane  writes:

> The window functions patch is laboring under the delusion that it can
> call an aggregate's final function and then go back to invoking the
> transfn some more on the same data.  This is merest fantasy :-(
>
> regression=# select array_agg(q1) over(order by q1) from int8_tbl;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> Unless we want to move the goalposts on what an aggregate is allowed
> to do internally, we're going to have to change this to re-aggregate
> repeatedly.  Neither prospect is appetizing in the least.

Does it currently copy the state datum before calling the final function?
Would that help?

It does seem like the abstract interface we have for aggregate functions is a
strength and we should leverage that. The burden of supporting more complex
cases should be borne by the users that are bending the rules.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window-functions patch handling of aggregates

2008-12-23 Thread Tom Lane

The window functions patch is laboring under the delusion that it can
call an aggregate's final function and then go back to invoking the
transfn some more on the same data.  This is merest fantasy :-(

regression=# select array_agg(q1) over(order by q1) from int8_tbl;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Unless we want to move the goalposts on what an aggregate is allowed
to do internally, we're going to have to change this to re-aggregate
repeatedly.  Neither prospect is appetizing in the least.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions review

2008-11-21 Thread Hitoshi Harada

2008/11/22 Heikki Linnakangas <[EMAIL PROTECTED]>:
> Just to let you know, I'm hacking this patch again, the executor part in
> particular. I've got a stripped out the unfinished window frame stuff,
> refactored the Window object API so that the window functions don't get to,
> and don't need to, access tuple slots directly. And a bunch of other
> simplifications as well. I'll post as soon as it's in a more readable state.
>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>

I'm so glad to hear that. Currently I don't have time to refactor much
in executor, but only fixes such like what David reported. Feel free
to ask me about any cut of the patch.

Regards,



-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions review

2008-11-21 Thread Heikki Linnakangas

Just to let you know, I'm hacking this patch again, the executor part in 
particular. I've got a stripped out the unfinished window frame stuff, 
refactored the Window object API so that the window functions don't get 
to, and don't need to, access tuple slots directly. And a bunch of other 
simplifications as well. I'll post as soon as it's in a more readable state.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions review

2008-11-12 Thread Hitoshi Harada

Thank you for your reviewing my code.

2008/11/12 Heikki Linnakangas <[EMAIL PROTECTED]>:
> I've been slicing and dicing this patch for the last few days. There's a lot
> of code in there, but here's some initial comments:
>
> The code to initialize, advance, and finalize an aggregate should be shared
> between Agg and Window nodes.
>
> I'm a bit disappointed that we need so much code to support the window
> aggregates. I figured we'd just have a special window function that calls
> the initialize/advance/finalize functions for the right aggregate, working
> with the WindowObject object like other window functions. But we have in
> fact very separate code path for aggregates. I feel we should implement the
> aggregates as just a thin wrapper window function that I just described. Or,
> perhaps the aggregate functionality should be moved to Agg node, although if
> we do that, we'd probably have to change that again when we implement the
> window frames. Or perhaps it should be completely new node type, though
> you'd really want to share the tuplestore with the Window node if there's
> any window functions in the same query, using the same window specification.

I am disappointed at that, too. I was thinking as you do, that special
thin wrapper would do for pure aggregates. It seems, however,
impossible to me. Aggregates have two separate functions then
initialize value, store intermediate results, pass if each function is
strict, and so on. My rough sketch of wrapper for pure aggregates was
eval_windowaggregate() but actually they need to initialize, advance,
and finalize. As long as we have the same structure for aggregates,
they're always needed.
So the next choice is to share those three with Agg. This sounds sane.
I've not touched any code in nodeAgg.c. If we really need it I will do
it. Then we can remove initialize_windowaggregate(),
advance_windowaggregate(), finalize_windowaggregate(), and
initialize_peragg() from nodeWindow.c. They are almost same
correspoinding functions in Agg, except advance requires window
function API in Window. And if we do that, PerAgg and PerGroup should
be shared also.
The reason I didn't touch nodeAgg.c and didn't share them is that
there are only two nodes that use the share code. I wasn't sure if it
is general enough. Or I thought it'd be better to seperate code than
to share it so we keep them simple. What do you think?

> I don't like that the window functions are passed Var and Const nodes as
> arguments. A user-defined function shouldn't see those internal data
> structures, even though they're just passed as opaque arguments to
> WinRowGetArg. You could pass WinRowGetArg just argument numbers, and have it
> fetch the Expr nodes.

I understand what you point. The current signature of WinRowGetArg is

Datum WinRowGetArg(WindowObject winobj, ExprState *argstate, bool *isnull);

And if we use a number as arguments instead of argstate, we need
fcinfo. How do you think we should pass it? WindowObject may hold it
but it is data related with the window logically, not with each
function. The alternative is to add one more argument to
WinRowGetArg(). I don't like long argument list. But if we really need
it, no way to deny it.

> The incomplete support for window frames should be removed. It was good that
> you did it, so that we have a sketch of how it'd work and got some
> confidence that we won't have to rip out and rewrite all of this in the
> future to support the window frames. But if it's not going to go into this
> release, we should take it out.

Agreed. I'll remove it. Until recently I was wondering if I could try
to implement the FRAME clause. But actually it's too late for 8.4. Or
it would have to be after the current patch is commited.

> The buffering strategies are very coarse. For aggregates, as long as we
> don't support window frames, row buffering is enough. Even when
> partition-buffering is needed, we shouldn't need to fetch the whole
> partition into the tuplestore before we start processing. lead/lag for
> example can start returning values earlier. That's just an optimization,
> though, so maybe we can live with that for now.

Aggregates() RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW doesn't
require frame buffering but RANGE BETEWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING (that is OVER() ) needs it. You're telling me to
switch the strategy depending on cases? Hmmm, let me see.



-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window functions review

2008-11-12 Thread Heikki Linnakangas

I've been slicing and dicing this patch for the last few days. There's a 
lot of code in there, but here's some initial comments:


The code to initialize, advance, and finalize an aggregate should be 
shared between Agg and Window nodes.


I'm a bit disappointed that we need so much code to support the window 
aggregates. I figured we'd just have a special window function that 
calls the initialize/advance/finalize functions for the right aggregate, 
working with the WindowObject object like other window functions. But we 
have in fact very separate code path for aggregates. I feel we should 
implement the aggregates as just a thin wrapper window function that I 
just described. Or, perhaps the aggregate functionality should be moved 
to Agg node, although if we do that, we'd probably have to change that 
again when we implement the window frames. Or perhaps it should be 
completely new node type, though you'd really want to share the 
tuplestore with the Window node if there's any window functions in the 
same query, using the same window specification.


I don't like that the window functions are passed Var and Const nodes as 
arguments. A user-defined function shouldn't see those internal data 
structures, even though they're just passed as opaque arguments to 
WinRowGetArg. You could pass WinRowGetArg just argument numbers, and 
have it fetch the Expr nodes.


The incomplete support for window frames should be removed. It was good 
that you did it, so that we have a sketch of how it'd work and got some 
confidence that we won't have to rip out and rewrite all of this in the 
future to support the window frames. But if it's not going to go into 
this release, we should take it out.


The buffering strategies are very coarse. For aggregates, as long as we 
don't support window frames, row buffering is enough. Even when 
partition-buffering is needed, we shouldn't need to fetch the whole 
partition into the tuplestore before we start processing. lead/lag for 
example can start returning values earlier. That's just an optimization, 
though, so maybe we can live with that for now.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: patch for CommitFest:Nov.

2008-10-31 Thread Hitoshi Harada

2008/11/1 Tom Lane <[EMAIL PROTECTED]>:
> Gregory Stark <[EMAIL PROTECTED]> writes:
>> "Hitoshi Harada" <[EMAIL PROTECTED]> writes:
>>> 2008/11/1 David Fetter <[EMAIL PROTECTED]>:
>>> I've ever sent a patch over 100k and failed. Actually how much is the
>>> limitation of the patch size? And if the patch is too huge, is it
>>> better to split the patch than send an external link?
>
> I'd suggest splitting the patch into sections if necessary.  A patch
> that's over 100K zipped is likely to be unmanageable from a reviewing
> standpoint anyhow --- it would be better to think about how to factor
> it into separate patches ...

OK, but a half of my patch is based on pg_proc.h so reviewing is not
so complexing as its size.

> But in any case, Alvaro is correct to complain about external links.
> We want the patch to be in the list archives.

Agree. So I suppose the limitation can be bigger up to 500k or so.
Nowadays, network and mail clients wouldn't be annoyed with that size.
But I will follow the current rule. Next time, I'll try split patch.



-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: patch for CommitFest:Nov.

2008-10-31 Thread Tom Lane

Gregory Stark <[EMAIL PROTECTED]> writes:
> "Hitoshi Harada" <[EMAIL PROTECTED]> writes:
>> 2008/11/1 David Fetter <[EMAIL PROTECTED]>:
>> I've ever sent a patch over 100k and failed. Actually how much is the
>> limitation of the patch size? And if the patch is too huge, is it
>> better to split the patch than send an external link?

I'd suggest splitting the patch into sections if necessary.  A patch
that's over 100K zipped is likely to be unmanageable from a reviewing
standpoint anyhow --- it would be better to think about how to factor
it into separate patches ...

But in any case, Alvaro is correct to complain about external links.
We want the patch to be in the list archives.

> Isn't this like the third time we've run into this and said we were going to
> raise/erase the limit?

Uh, we did.  You'll notice David's 140K email got through.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: patch for CommitFest:Nov.

2008-10-31 Thread Gregory Stark

"Hitoshi Harada" <[EMAIL PROTECTED]> writes:

> 2008/11/1 David Fetter <[EMAIL PROTECTED]>:
>> On Fri, Oct 31, 2008 at 01:00:38PM -0300, Alvaro Herrera wrote:
>>> Hitoshi Harada escribió:
>>>
>>> > [Patch itself]
>>> > http://umitanuki.net/pgsql/wfv08/window_functions.patch.20081031.gz
>>>
>>> Please send the patch to the pgsql-hackers list too.  That way we will
>>> have the patch around, even if the site above goes away in a few years.
>>
>> Here's a bzip2 version, which I hope will get through, as it's over
>> 100kB.
>>
> I've ever sent a patch over 100k and failed. Actually how much is the
> limitation of the patch size? And if the patch is too huge, is it
> better to split the patch than send an external link?

I suppose you could upload the patch to the wiki which just gives a warning
but lets you go ahead.

Isn't this like the third time we've run into this and said we were going to
raise/erase the limit?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: patch for CommitFest:Nov.

2008-10-31 Thread Hitoshi Harada

2008/11/1 David Fetter <[EMAIL PROTECTED]>:
> On Fri, Oct 31, 2008 at 01:00:38PM -0300, Alvaro Herrera wrote:
>> Hitoshi Harada escribió:
>>
>> > [Patch itself]
>> > http://umitanuki.net/pgsql/wfv08/window_functions.patch.20081031.gz
>>
>> Please send the patch to the pgsql-hackers list too.  That way we will
>> have the patch around, even if the site above goes away in a few years.
>
> Here's a bzip2 version, which I hope will get through, as it's over
> 100kB.
>
> Cheers,
> David.
> --
> David Fetter <[EMAIL PROTECTED]> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter  XMPP: [EMAIL PROTECTED]
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>

I've ever sent a patch over 100k and failed. Actually how much is the
limitation of the patch size? And if the patch is too huge, is it
better to split the patch than send an external link?

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: patch for CommitFest:Nov.

2008-10-31 Thread Alvaro Herrera

Hitoshi Harada escribió:

> [Patch itself]
> http://umitanuki.net/pgsql/wfv08/window_functions.patch.20081031.gz

Please send the patch to the pgsql-hackers list too.  That way we will
have the patch around, even if the site above goes away in a few years.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window Functions: patch for CommitFest:Nov.

2008-10-31 Thread Hitoshi Harada

I have completed my work on the patch for commit fest.

[Design Doc]
http://umitanuki.net/pgsql/wfv08/design.html

[Patch itself]
http://umitanuki.net/pgsql/wfv08/window_functions.patch.20081031.gz

[Git repository]
http://git.postgresql.org/git/~davidfetter/window_functions/.git

All compiler warnings, bison conflicts and known crashes are fixed.
The sgml documentation including keyword (non-)reserved tables is
updated. Test cases are brushed up. A few improvements around APIs are
done and row_number()/rank() cases are optimized not to store all the
rows contained in the partition.

I hope it will come to 8.4.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Hitoshi Harada

2008/10/29 Tom Lane <[EMAIL PROTECTED]>:
> Martijn van Oosterhout <[EMAIL PROTECTED]> writes:
>> On Tue, Oct 28, 2008 at 01:50:26PM -0400, Tom Lane wrote:
>>> ... So it might be possible to fix
>>> by attaching some new precedence level to the ROWS token.
>
>> Yes. Bison's default is to shift, which means that if you do nothing it
>> will treat ROWS as part of the expression if it makes any sense at all.
>> Given the requirement for a following UNBOUNDED or BETWEEN, the only
>> problem is that you'll get a syntax error if the expr_list ends in a
>> postfix operator, I don't see how you get hidden ambiguity.
>
> Hmm, now I see what you meant; that's a little different than what I was
> envisioning.  I was thinking of trying to force a parse decision that
> would support the windowing syntax, whereas you propose forcing a
> parse decision that does the opposite, and making the user parenthesize
> if he's got a conflict.
>
> What the choice seems to come down to is making ROWS and RANGE reserved
> (in some form or other) versus creating a corner case for users of
> postfix operators.  Phrased that way it does seem like the second
> alternative is better.
>
> Hitoshi: you can probably make this happen by including ROWS and RANGE
> in the %nonassoc IDENT precedence declaration, but you'll want to test
> to make sure the right things happen.
>

Bison and parsing are quite new to me so it'll take a little time but
I will try it. One thing, the words following after ROWS/RANGE are not
only UNBOUNDED and BETWEEN but also CURRENT and "unsigned constant"
though. Still the phrasing approach doesn't seem less hope?

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Tom Lane

Martijn van Oosterhout <[EMAIL PROTECTED]> writes:
> On Tue, Oct 28, 2008 at 01:50:26PM -0400, Tom Lane wrote:
>> ... So it might be possible to fix
>> by attaching some new precedence level to the ROWS token.

> Yes. Bison's default is to shift, which means that if you do nothing it
> will treat ROWS as part of the expression if it makes any sense at all.
> Given the requirement for a following UNBOUNDED or BETWEEN, the only
> problem is that you'll get a syntax error if the expr_list ends in a
> postfix operator, I don't see how you get hidden ambiguity.

Hmm, now I see what you meant; that's a little different than what I was
envisioning.  I was thinking of trying to force a parse decision that
would support the windowing syntax, whereas you propose forcing a
parse decision that does the opposite, and making the user parenthesize
if he's got a conflict.

What the choice seems to come down to is making ROWS and RANGE reserved
(in some form or other) versus creating a corner case for users of
postfix operators.  Phrased that way it does seem like the second
alternative is better.

Hitoshi: you can probably make this happen by including ROWS and RANGE
in the %nonassoc IDENT precedence declaration, but you'll want to test
to make sure the right things happen.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Martijn van Oosterhout

On Tue, Oct 28, 2008 at 01:50:26PM -0400, Tom Lane wrote:
> > Given that the only problematic case is if expr_list ends with a
> > postfix operator, wouldn't it be sufficient to simply decree that in
> > that case you need parentheses? Seems a lot less painful than adding
> > two reserved words.
> 
> Hmm ... actually, it might be possible to fix it with a suitable
> precedence declaration?  The trick is to make sure that in
>   ... ORDER BY foo ! ROWS ...
> the operator is taken as postfix not infix, which is the exact opposite
> of what we did for AS-less column aliases (and, in fact, is the opposite
> of what bison will do by default, IIRC).  So it might be possible to fix
> by attaching some new precedence level to the ROWS token.

Yes. Bison's default is to shift, which means that if you do nothing it
will treat ROWS as part of the expression if it makes any sense at all.
Given the requirement for a following UNBOUNDED or BETWEEN, the only
problem is that you'll get a syntax error if the expr_list ends in a
postfix operator, I don't see how you get hidden ambiguity.

Operator precedence is exactly the way to do this, since operator
precedence rules exist solely to resolve the shift/reduce conflicts.

You're right about the documentation though. I suppose you could put in
the documentation for the ROWS stuff something along the lines of: If
the last expression of your ORDER by ends in a postfix operator, you
must use parentheses. How many postfix operators are in common use in
ORDER BY expressions anyway?

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.

signature.asc
Description: Digital signature

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Tom Lane

Martijn van Oosterhout <[EMAIL PROTECTED]> writes:
> On Tue, Oct 28, 2008 at 12:38:09PM -0400, Tom Lane wrote:
>> Right offhand, I don't see any alternative but to make both ROWS and
>> RANGE reserved.  It's pretty annoying since that might break existing
>> applications that have been using these as identifiers, but the SQL
>> committee seems to care little about that :-(

> Given that the only problematic case is if expr_list ends with a
> postfix operator, wouldn't it be sufficient to simply decree that in
> that case you need parentheses? Seems a lot less painful than adding
> two reserved words.

Hmm ... actually, it might be possible to fix it with a suitable
precedence declaration?  The trick is to make sure that in
... ORDER BY foo ! ROWS ...
the operator is taken as postfix not infix, which is the exact opposite
of what we did for AS-less column aliases (and, in fact, is the opposite
of what bison will do by default, IIRC).  So it might be possible to fix
by attaching some new precedence level to the ROWS token.

On the other hand, this would still mean that ROWS acts oddly
differently from any other column name, and in a way that would be hard
to explain or document.  So I'm really not sure that this is a better
solution than reserving it.

Still another trick in our toolbox is to use merged tokens to fix the
lack of lookahead.  If I read the spec correctly, the problematic cases
of ROWS and RANGE must be followed by UNBOUNDED or BETWEEN, so folding
these token pairs into four merged-tokens would get rid of the need to
reserve anything.  Merged tokens create their own little oddities too
though, as I was just complaining to Peter.  I wonder whether it would
be possible to improve on that problem by arranging some way for the
grammar to signal the lexer about whether lookahead is needed, and thus
not do the merging in contexts where it couldn't be appropriate.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Martijn van Oosterhout

On Tue, Oct 28, 2008 at 12:38:09PM -0400, Tom Lane wrote:
> "Hitoshi Harada" <[EMAIL PROTECTED]> writes:
> > In window specifications, we have
> 
> > OVER (ORDER BY expr_list [(ROWS|RANGE) ... ])
> 
> > and currently "ROWS" is not reserved so bison is confused with cases
> > of "ROWS" included in expr_list and in FRAME clause. Because there is
> > no delimiter between ORDER BY clause and FRAME (that is (ROWS |
> > RANGE)) clause, "ROWS" can be in expr_list as a_expr.
> 
> Right offhand, I don't see any alternative but to make both ROWS and
> RANGE reserved.  It's pretty annoying since that might break existing
> applications that have been using these as identifiers, but the SQL
> committee seems to care little about that :-(

Given that the only problematic case is if expr_list ends with a
postfix operator, wouldn't it be sufficient to simply decree that in
that case you need parentheses? Seems a lot less painful than adding
two reserved words.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Hitoshi Harada

2008/10/29 Tom Lane <[EMAIL PROTECTED]>:
> "Hitoshi Harada" <[EMAIL PROTECTED]> writes:
>> Can "ROWS" be reserved_keyword?
>
>> In window specifications, we have
>
>> OVER (ORDER BY expr_list [(ROWS|RANGE) ... ])
>
>> and currently "ROWS" is not reserved so bison is confused with cases
>> of "ROWS" included in expr_list and in FRAME clause. Because there is
>> no delimiter between ORDER BY clause and FRAME (that is (ROWS |
>> RANGE)) clause, "ROWS" can be in expr_list as a_expr.
>
> Right offhand, I don't see any alternative but to make both ROWS and
> RANGE reserved.  It's pretty annoying since that might break existing
> applications that have been using these as identifiers, but the SQL
> committee seems to care little about that :-(
>
> BTW, finding this sort of problem is exactly why ignoring shift/reduce
> conflicts is a bad idea.  You would've ended up with unexpected
> behaviors given the wrong input.
>

I see it now. This is so good study to me, though it spent me much
time. Thanks anyway.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Tom Lane

"Hitoshi Harada" <[EMAIL PROTECTED]> writes:
> Can "ROWS" be reserved_keyword?

> In window specifications, we have

> OVER (ORDER BY expr_list [(ROWS|RANGE) ... ])

> and currently "ROWS" is not reserved so bison is confused with cases
> of "ROWS" included in expr_list and in FRAME clause. Because there is
> no delimiter between ORDER BY clause and FRAME (that is (ROWS |
> RANGE)) clause, "ROWS" can be in expr_list as a_expr.

Right offhand, I don't see any alternative but to make both ROWS and
RANGE reserved.  It's pretty annoying since that might break existing
applications that have been using these as identifiers, but the SQL
committee seems to care little about that :-(

BTW, finding this sort of problem is exactly why ignoring shift/reduce
conflicts is a bad idea.  You would've ended up with unexpected
behaviors given the wrong input.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Hitoshi Harada

2008/10/28 Tom Lane <[EMAIL PROTECTED]>:
> "Hitoshi Harada" <[EMAIL PROTECTED]> writes:
>> OK, I'll try to remove it. I'm not used to bison so my first task is
>> to find where the conflict is...
>
> Use bison -v to get details of where the conflict is.  I find that
> the most common way to fix things is to postpone where the parser
> has to make a decision, which usually ends up meaning a slightly
> more verbose set of productions --- for instance instead of
>
>foo: bar opt_baz
>
>opt_baz: baz | /*EMPTY*/
>
> you might have to do
>
>foo: bar baz | bar
>

Thanks for your advice. And I found the problem is around FRAME
clause. Now my question is:

Can "ROWS" be reserved_keyword?

In window specifications, we have

OVER (ORDER BY expr_list [(ROWS|RANGE) ... ])

and currently "ROWS" is not reserved so bison is confused with cases
of "ROWS" included in expr_list and in FRAME clause. Because there is
no delimiter between ORDER BY clause and FRAME (that is (ROWS |
RANGE)) clause, "ROWS" can be in expr_list as a_expr. I see "ROWS" has
been an unreserved keyword and that many places in gram.y such cases
have been avoided by other haking methods, but in this case, isn't it
possible such? If I missed something let me know it.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Tom Lane

"Hitoshi Harada" <[EMAIL PROTECTED]> writes:
> OK, I'll try to remove it. I'm not used to bison so my first task is
> to find where the conflict is...

Use bison -v to get details of where the conflict is.  I find that
the most common way to fix things is to postpone where the parser
has to make a decision, which usually ends up meaning a slightly
more verbose set of productions --- for instance instead of

foo: bar opt_baz

opt_baz: baz | /*EMPTY*/

you might have to do

foo: bar baz | bar

(which is actually shorter in this pseudo-example, but wouldn't be
if bar were complicated or foo had a lot of alternatives already).
The reason this can fix it is that the parser doesn't have to commit
to which case applies until after it's read the tokens for baz.
In the first form, it has to decide whether baz is there or not
without looking any further ahead than the first token of baz.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Hitoshi Harada

2008/10/28 Tom Lane <[EMAIL PROTECTED]>:
> "Hitoshi Harada" <[EMAIL PROTECTED]> writes:
>> 2008/10/28 ITAGAKI Takahiro <[EMAIL PROTECTED]>:
>>> I tested the patch on mingw (Windows) and
>>> got the following warning and error:
>>>
>>> A. gram.y: conflicts: 3 shift/reduce
>>> B. include/nodes/plannodes.h:650: error: syntax error before "uint"
>>>
>>> I have no idea about A.
>
>> I have noticed it but didn't think it is a problem, but it doesn't
>> occur in production, does it?
>
> We have a zero-tolerance policy for bison warnings.  Patches that
> introduce shift/reduce conflicts *will* be rejected.  (And no, %expect
> isn't an acceptable fix.  The problem with it is you can't be sure
> which warnings it ignored.  In a grammar that gets hacked on as often
> as PG's does, we couldn't rely on the conflicts to not move around,
> possibly resulting in unforeseen misbehavior.)
>
>regards, tom lane
>

OK, I'll try to remove it. I'm not used to bison so my first task is
to find where the conflict is...

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Tom Lane

"Hitoshi Harada" <[EMAIL PROTECTED]> writes:
> 2008/10/28 ITAGAKI Takahiro <[EMAIL PROTECTED]>:
>> I tested the patch on mingw (Windows) and
>> got the following warning and error:
>> 
>> A. gram.y: conflicts: 3 shift/reduce
>> B. include/nodes/plannodes.h:650: error: syntax error before "uint"
>> 
>> I have no idea about A.

> I have noticed it but didn't think it is a problem, but it doesn't
> occur in production, does it?

We have a zero-tolerance policy for bison warnings.  Patches that
introduce shift/reduce conflicts *will* be rejected.  (And no, %expect
isn't an acceptable fix.  The problem with it is you can't be sure
which warnings it ignored.  In a grammar that gets hacked on as often
as PG's does, we couldn't rely on the conflicts to not move around,
possibly resulting in unforeseen misbehavior.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Hitoshi Harada

2008/10/28 ITAGAKI Takahiro <[EMAIL PROTECTED]>:
>
> "Hitoshi Harada" <[EMAIL PROTECTED]> wrote:
>
>> And I fixed this problem, confirming  with/without debug/cassert/gcc
>> -O and push it to git. If you want delta patch, please see
>> http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=fbf19bfd0c8d2ac083b775f4cc724ec66e74fa8f
>
> Now it passed all regression tests!
>
>
> There is still one trivial warning:
>parse_func.c: In function `ParseFuncOrColumn':
>parse_func.c:77: warning: 'retval' might be used uninitialized in this 
> function
>
> It looks completely safe, but I suggest to initialize
> the variable only to keep compiler quiet.
>
> [src/backend/parser/parse_func.c] ParseFuncOrColumn()
>else
> +   {
>elog(ERROR, "unknown function status");
> +   retval = NULL;  /* keep compiler quiet */
> +   }
>

Agreed. I've just noticed it as well and I think it would be more much
like original code that the last "else if" clause gets "else". Anyway,
thanks.


Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread ITAGAKI Takahiro


"Hitoshi Harada" <[EMAIL PROTECTED]> wrote:

> And I fixed this problem, confirming  with/without debug/cassert/gcc
> -O and push it to git. If you want delta patch, please see
> http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=fbf19bfd0c8d2ac083b775f4cc724ec66e74fa8f

Now it passed all regression tests!


There is still one trivial warning:
parse_func.c: In function `ParseFuncOrColumn':
parse_func.c:77: warning: 'retval' might be used uninitialized in this 
function

It looks completely safe, but I suggest to initialize
the variable only to keep compiler quiet.

[src/backend/parser/parse_func.c] ParseFuncOrColumn()
else
+   {
elog(ERROR, "unknown function status");
+   retval = NULL;  /* keep compiler quiet */
+   }

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-28 Thread Hitoshi Harada

2008/10/28 Hitoshi Harada <[EMAIL PROTECTED]>:
> Thanks for your testing all!
>
> 2008/10/28 ITAGAKI Takahiro <[EMAIL PROTECTED]>:
>> "Hitoshi Harada" <[EMAIL PROTECTED]> wrote:
>>
>>
>>> select relid,AVG(seq_Scan) OVER (ORDER BY relid)
>>> FROM pg_stat_user_tables
>>> GROUP BY relid,seq_scan;
>>
>> crashes with segfault.
>
> I reproduced it on linux of my environment, building without
> debug/cassert. It could be a problem around there.
>

And I fixed this problem, confirming  with/without debug/cassert/gcc
-O and push it to git. If you want delta patch, please see

http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=fbf19bfd0c8d2ac083b775f4cc724ec66e74fa8f

Thanks for all of your feedbacks.
Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-27 Thread Hitoshi Harada

Thanks for your testing all!

2008/10/28 ITAGAKI Takahiro <[EMAIL PROTECTED]>:
> "Hitoshi Harada" <[EMAIL PROTECTED]> wrote:
>
>> As I promised, version 7 of the window functions is now released.
>> patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz
>
> I tested the patch on mingw (Windows) and
> got the following warning and error:
>
>  A. gram.y: conflicts: 3 shift/reduce
>  B. include/nodes/plannodes.h:650: error: syntax error before "uint"
>
> I have no idea about A.

I have noticed it but didn't think it is a problem, but it doesn't
occur in production, does it?

>
> B is reported on the type 'uint' in struct Window.
> I can compile the code if I replace 'uint' to 'uint32'
>
> typedef struct Window
> {
> ...
>uintpreceding_rows; /* used only when FRAME_ROWS */
>uintfollowing_rows; /* used only when FRAME_ROWS */

I didn't know it. will fix it.

>
>> SELECT sum(i) OVER (ORDER BY i) FROM generate_series(1, 10) i;
>
> works fine, but...
>
>> select relid,AVG(seq_Scan) OVER (ORDER BY relid)
>> FROM pg_stat_user_tables
>> GROUP BY relid,seq_scan;
>
> crashes with segfault.

I reproduced it on linux of my environment, building without
debug/cassert. It could be a problem around there.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-27 Thread ITAGAKI Takahiro

"Hitoshi Harada" <[EMAIL PROTECTED]> wrote:

> As I promised, version 7 of the window functions is now released.
> patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz

I tested the patch on mingw (Windows) and
got the following warning and error:

  A. gram.y: conflicts: 3 shift/reduce
  B. include/nodes/plannodes.h:650: error: syntax error before "uint"

I have no idea about A.

B is reported on the type 'uint' in struct Window.
I can compile the code if I replace 'uint' to 'uint32' 

typedef struct Window
{
...
uintpreceding_rows; /* used only when FRAME_ROWS */
uintfollowing_rows; /* used only when FRAME_ROWS */

> SELECT sum(i) OVER (ORDER BY i) FROM generate_series(1, 10) i;

works fine, but...

> select relid,AVG(seq_Scan) OVER (ORDER BY relid)
> FROM pg_stat_user_tables
> GROUP BY relid,seq_scan;

crashes with segfault.

=# SELECT version();
   version
--
 PostgreSQL 8.4devel on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.5 
(mingw-vista special r3)
(1 row)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-27 Thread Hitoshi Harada

2008/10/28 David Rowley <[EMAIL PROTECTED]>:
> Hitoshi Harada Wrote:
>
>> As I promised, version 7 of the window functions is now released. At
>> the same time, git repository branch comes back to master.
>>
>> git: http://git.postgresql.org/?p=~davidfetter/window_functions/.git
>> patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz
>
> I've been testing for a couple of hours now and just comparing results to
> the results Oracle gives me. Perhaps not the best method but it's faster
> than reading through the spec.
>
> I managed to get a seg-fault with the following query.
>
> select salary,dense_rank() over (order by salary desc)
> from employee
> group by salary;
>
>
> It's the group by that does it.
>
> test=# select salary,dense_rank() over (order by salary desc) from employee
> group by salary;
> server closed the connection unexpectedly
>This probably means the server terminated abnormally
>before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
>
> It seems to be possible to crash not just dense_rank() and rank().
>
> This crashes too.
>
> select relid,AVG(seq_Scan) OVER (ORDER BY relid)
> FROM pg_stat_user_tables
> GROUP BY relid,seq_scan;

sample=# select relid,AVG(seq_Scan) OVER (ORDER BY relid)
sample-# FROM pg_stat_user_tables
sample-# GROUP BY relid,seq_scan;
 relid |avg
---+
 16385 | 7.
 16391 | 3.5000
 16394 | 2.
(3 rows)

Hmm, I tested it on my environment, but doesn't crash. If you have
further information on that, please tell me. And yes, I haven't paid
attention for such cases. I think at least regression test is
necessary to be added.

> Looking forward to seeing windowing functions in 8.4!

Thanks for your testing.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-27 Thread David Rowley

Hitoshi Harada Wrote:

> As I promised, version 7 of the window functions is now released. At
> the same time, git repository branch comes back to master.
> 
> git: http://git.postgresql.org/?p=~davidfetter/window_functions/.git
> patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz

I've been testing for a couple of hours now and just comparing results to
the results Oracle gives me. Perhaps not the best method but it's faster
than reading through the spec.

I managed to get a seg-fault with the following query.

select salary,dense_rank() over (order by salary desc)
from employee
group by salary;


It's the group by that does it.

test=# select salary,dense_rank() over (order by salary desc) from employee
group by salary;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


It seems to be possible to crash not just dense_rank() and rank(). 

This crashes too.

select relid,AVG(seq_Scan) OVER (ORDER BY relid)
FROM pg_stat_user_tables
GROUP BY relid,seq_scan;

Oracle allows both these queries.

Of course I changed table names and column names to make the test case a bit
easier to re-produce.

Looking forward to seeing windowing functions in 8.4!

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window Functions: v07 APIs and buffering strateties

2008-10-27 Thread Hitoshi Harada

As I promised, version 7 of the window functions is now released. At
the same time, git repository branch comes back to master.

git: http://git.postgresql.org/?p=~davidfetter/window_functions/.git
patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz

It's too huge to send it to this list :) and I don't have time enough
to update the usual document but more comments in source code were put
than before so check it out!

Thanks to those feedbacks I found the basic cumulative aggregate is
necessary even though FRAME clause is not supported. In window
specifications like:
WINDOW w AS (ORDER BY id)
the window frame is implicitly made as BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW. But still SQL spec window functions like percent_rank()
need all rows of the partition. So I introduce buffering strategy in
Window node.

So now we have:
SELECT sum(i) OVER (ORDER BY i) FROM generate_series(1, 10) i;
 sum
-
   1
   3
   6
  10
  15
  21
  28
  36
  45
  55
(10 rows)

and all SQL spec functions.

I guess this release covers all the features I had intended when I was
started. Though FRAME clause support is not done yet, a basic
execution mechanism is prepared so if the parser supports it, the
executor doesn't worry about it. But before heading for FRAME clause,
we must support buffering strategies, and FRAME clause may be in 8.5
cycle. In this release, I wrote around the Partition buffering so no
optimization for row_number() is implemented, though you can be
inspired how to add another buffering when reading nodeWindow.c. For
the future buffering addition, I defined and classified those Window
function APIs:

extern int64 WinRowCurrentPos(WindowObject winobj);
extern Datum WinRowGetArg(WindowObject winobj, ExprState *argstate,
bool *isnull);
extern bool WinRowGetTuple(WindowObject winobj, TupleTableSlot *slot);

extern bool WinFrameShrinked(WindowObject winobj);
extern bool WinFrameExtended(WindowObject winobj);
extern int64 WinFrameGetRowNum(WindowObject winobj);
extern Datum WinFrameGetArg(WindowObject winobj, ExprState *argstate,
int relpos, int seektype, bool *isnull);
extern bool WinFrameGetTuple(WindowObject winobj, TupleTableSlot *slot,
int relpos, int seektype);
extern int WinFrameShrinkingNum(WindowObject winobj);
extern int WinFrameExtendedNum(WindowObject winobj);

extern int64 WinPartGetRowNum(WindowObject winobj);
extern Datum WinPartGetArg(WindowObject winobj, ExprState *argstate,
int relpos, int seektype, bool *isnull);
extern bool WinPartGetTuple(WindowObject winobj, TupleTableSlot *slot,
int relpos, int seektype);

extern WindowIter WinFrameStartIter(WindowObject winobj, int pos);
extern WindowIter WinPartStartIter(WindowObject winobj, int pos);
extern bool WinIterNext(WindowIter iter);
extern Datum WinIterGetArg(WindowIter iter, ExprState *argstate, bool *isnull);
extern bool WinIterGetTuple(WindowIter iter, TupleTableSlot *slot);

With these APIs, you can write advanced window aggregate subtracting
shrinking rows of the frame, and buffering supports.

I believe I can send another patch until the next commit fest, but not
sure about sgml documentation. If someone is interested in this
feature, feel free to help me in documentation! I think the lack is
around SQL spec window functions, how the window frame works (and that
we don't support FRAME clause in this release), and basic concept of
window function architecture. We don't have to touch deep inside of
window function APIs and buffering strategies since the window
functions cannot be defined as user function now.

Comments, feedbacks?

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: buffering strategy

2008-10-20 Thread Hitoshi Harada

2008/10/21 Heikki Linnakangas <[EMAIL PROTECTED]>:
> Hitoshi Harada wrote:
>>
>> The real problem is not how to cut off preceding rows, but how to read
>> ahead after the current row. I intend to avoid reading ahead until end
>> of the partition for only row_number() that doesn't need any following
>> rows. Sometimes we have to store whole the partition before returning
>> the first result and sometimes not. It depends on function categories,
>> or function access range. My current idea is classify Window function
>> API to three parallel to buffering strategies.
>
> Could the rows be read ahead on demand? If the window function calls
> window_getarg on a row that's not yet fetched, fetch forward to that row.

Well, it could be possible. But from my current view it will be very
complicated and might be impossible. So I will try to implement basic
approach, and let's consider your approach then. We keep stay in
private API so that we have time to consider again.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: buffering strategy

2008-10-20 Thread Heikki Linnakangas


Hitoshi Harada wrote:

The real problem is not how to cut off preceding rows, but how to read
ahead after the current row. I intend to avoid reading ahead until end
of the partition for only row_number() that doesn't need any following
rows. Sometimes we have to store whole the partition before returning
the first result and sometimes not. It depends on function categories,
or function access range. My current idea is classify Window function
API to three parallel to buffering strategies.


Could the rows be read ahead on demand? If the window function calls 
window_getarg on a row that's not yet fetched, fetch forward to that row.



And the lag()/lead(), spec says "OFFSET is exact numeric literal" but
we postgres doesn't have mechanism to limit the function argument data
type to Const integer only. So I am thinking about OFFSET for
lag()/lead() may be dynamic integer variable. If it comes, even those
functions don't know how many rows should be cut off. The lag()/lead()
can access any row of partition, per spec.


Hmm, yeah, that's a bit of a problem :-(.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: buffering strategy

2008-10-20 Thread Hitoshi Harada

2008/10/20 Heikki Linnakangas <[EMAIL PROTECTED]>:
> Hitoshi Harada wrote:
>>
>> Hi,
>>
>> 2008/10/20 Simon Riggs <[EMAIL PROTECTED]>:
>>>
>>> On Mon, 2008-10-20 at 10:32 +0900, Hitoshi Harada wrote:
>>>
 So I propose three Window node buffering strategies,
 row/frame/partition buffering so as to avoid unnecessary row
 buffering.
>>>
>>> Sounds good from here. Can I suggest you release the code in phases?
>>>
>>> It would be better if we got just one of those done (row?), than to
>>> attempt all 3 and end up with none because of emerging details.
>>
>> Thank you for your feedback.
>> Ok, actually the first will be partition buffering, because it covers
>> all of the functions' requirement. The row buffering is kind of
>> optimization in the special case.
>
> The thought I had during the last commit fest was that the function would
> call a callback, something like window_forget(pos), that would tell the
> system that it can release any rows before the given position. That way you
> only need one method, and it's also be optimal for functions like lag(),
> that doesn't fit perfectly into either the row or frame buffering category.
> Or do we need the information at plan-time?
>

Right. In the last commit fest we discussed about run-time cut-off
signal API. But I have finally come to that we must know how to buffer
*before* any execution.

The real problem is not how to cut off preceding rows, but how to read
ahead after the current row. I intend to avoid reading ahead until end
of the partition for only row_number() that doesn't need any following
rows. Sometimes we have to store whole the partition before returning
the first result and sometimes not. It depends on function categories,
or function access range. My current idea is classify Window function
API to three parallel to buffering strategies.

And the lag()/lead(), spec says "OFFSET is exact numeric literal" but
we postgres doesn't have mechanism to limit the function argument data
type to Const integer only. So I am thinking about OFFSET for
lag()/lead() may be dynamic integer variable. If it comes, even those
functions don't know how many rows should be cut off. The lag()/lead()
can access any row of partition, per spec.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: buffering strategy

2008-10-20 Thread Heikki Linnakangas


Hitoshi Harada wrote:

Hi,

2008/10/20 Simon Riggs <[EMAIL PROTECTED]>:

On Mon, 2008-10-20 at 10:32 +0900, Hitoshi Harada wrote:


So I propose three Window node buffering strategies,
row/frame/partition buffering so as to avoid unnecessary row
buffering.

Sounds good from here. Can I suggest you release the code in phases?

It would be better if we got just one of those done (row?), than to
attempt all 3 and end up with none because of emerging details.


Thank you for your feedback.
Ok, actually the first will be partition buffering, because it covers
all of the functions' requirement. The row buffering is kind of
optimization in the special case.


The thought I had during the last commit fest was that the function 
would call a callback, something like window_forget(pos), that would 
tell the system that it can release any rows before the given position. 
That way you only need one method, and it's also be optimal for 
functions like lag(), that doesn't fit perfectly into either the row or 
frame buffering category. Or do we need the information at plan-time?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: buffering strategy

2008-10-20 Thread Hitoshi Harada

Hi,

2008/10/20 Simon Riggs <[EMAIL PROTECTED]>:
>
> On Mon, 2008-10-20 at 10:32 +0900, Hitoshi Harada wrote:
>
>> So I propose three Window node buffering strategies,
>> row/frame/partition buffering so as to avoid unnecessary row
>> buffering.
>
> Sounds good from here. Can I suggest you release the code in phases?
>
> It would be better if we got just one of those done (row?), than to
> attempt all 3 and end up with none because of emerging details.

Thank you for your feedback.
Ok, actually the first will be partition buffering, because it covers
all of the functions' requirement. The row buffering is kind of
optimization in the special case.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions: buffering strategy

2008-10-19 Thread Simon Riggs

On Mon, 2008-10-20 at 10:32 +0900, Hitoshi Harada wrote:

> So I propose three Window node buffering strategies,
> row/frame/partition buffering so as to avoid unnecessary row
> buffering.

Sounds good from here. Can I suggest you release the code in phases?

It would be better if we got just one of those done (row?), than to
attempt all 3 and end up with none because of emerging details.

Good luck.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window Functions: buffering strategy

2008-10-19 Thread Hitoshi Harada

> I can find how to do it with the new (window execution model) design,
> (and the design is suitable to fix it above,) but at first before
> going into trivial specs, I would like core hackers to review the
> model is better than before or not. Thank you for your cooperation.

So no objections appeared. I am progressing with the new model.

Thanks to feedbacks against the previous patch, I am inclined to think
that we need to have different row buffering strategy for window
functions. As we discussed, depends on functions different bufferings
are needed. Then I read spec again and classified functions.

- row_number() -- row buffering
 Obviously no buffering, or what you need is only where you are in the partition

- rank() -- row buffering
 You need two rows, previous row and current row to determine if you
rank up or not. But it is solvable if the function cache the previous
row so row buffering is needed.

- dense_rank() -- row buffering
 Same as rank().

- percent_rank() -- partition buffering
 In addition to rank() requirement, you have to see the last row in
the partition to count up how many rows are contained in the
partition. The spec says: defined as (RK–1)/(NR–1), where RK is
defined to be the RANK of R and NR is defined to be the number of rows
in the window partition of R.

- cume_dist() -- partition buffering
 Almost same as percent_rank(), partition buffering.

- ntile() -- partition buffering
 You have to know how many rows in the partition to determine how many
rows are contained in each bucket.

- lead() -- partition buffering
 The spec says: returns the value of VE evaluated on a row that is
OFFSET number of rows after R within P (partition).

- lag() -- partition buffering
 Same as lead().

- first_value() -- frame buffering
 The spec says: return the value of VE evaluated on the first row of
the window frame.

- last_value() -- frame buffering
 Same as first_value().

- nth_value() -- frame buffering
 Same as first_value()

- aggregates -- frame buffering
 aggregates all rows in the frame.

So I propose three Window node buffering strategies,
row/frame/partition bufferinig so as to avoid unnecessary row
buffering. Each window functions have buffering strategy number and
planner detect which strategy is at least needed for the execution. If
the node contains only row_number() then it needs row buffering but if
row_number() and lead() then partition bufferinig is needed.
Temporarily until window function APIs are public out, the strategy
numbers for each function can be defined in the source code as macro
or something, and when they are public we must have pg_wfunc or
something to register the strategy. If the function violate the
declared strategy and touched different API, for example if the
row_number() is about to call window_paritition_rows(), error will
occur.

After reading spec closer, I found row_number(), rank(), etc. doesn't
care if there is a moving frame. Only aggregates and
first/last/nth_value()s care it.

Now I guess I know what to do but I don't know how to do it :)  But I
am going to send another patch until next commit fest. I appreciate
for you comments.

Regeards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions

2008-10-14 Thread Hitoshi Harada

2008/10/15 Andreas Joseph Krogh <[EMAIL PROTECTED]>:
> On Tuesday 14 October 2008 18:19:07 Hannu Krosing wrote:
>> On Tue, 2008-10-14 at 11:05 +0200, Andreas Joseph Krogh wrote:
>> > Hi all.
>> > This is not very "hackers"-related, but related to the topic of 
>> > window-funcitons, which seems to be discussed quite a bit on "hackers" 
>> > these days.
>> >
>> > Can window-functions in PG be used to return "total number of rows" in a 
>> > "paged result"?
>> > Say you have:
>> > SELECT p.id, p.firstname
>> >   FROM person p
>> >  ORDER BY p.firstname ASC
>> >  LIMIT 10 OFFSET 10
>> >
>> > Is it possible to use some window-function to return the "total-number of 
>> > columns" in a separate column?
>> >
>> > In Oracle one can do
>> > SELECT q.*, max(rownum) over() as total_rows FROM (subquery)
>> > which returns the total number or columns in a separate column. This is 
>> > very handy for web-pages which for example need to display the rist 20 
>> > results of several million, without having to do a separate count(*) query.
>>
>> no need to use window functions here, just ask for max inline:
>>
>>
>> hannu=# select rownum, word, (select max(rownum) from words) as maxrow
>> from words limit 10;
>>  rownum |   word| maxrow
>> +---+
>>   1 |   |  98569
>>   2 | A |  98569
>>   3 | A's   |  98569
>>   4 | AOL   |  98569
>>   5 | AOL's |  98569
>>   6 | Aachen|  98569
>>   7 | Aachen's  |  98569
>>   8 | Aaliyah   |  98569
>>   9 | Aaliyah's |  98569
>>  10 | Aaron |  98569
>> (10 rows)
>
> Where do you get your "rownum"-column from here? It's a pseudo-column in 
> Oracle which is computed for each row in the "result-set", it's not a column 
> in a table somewhere, which is why I figured I must use window-funciton, or 
> "analytical function" as Oracle calls them, to operate on the *result-set* to 
> retrieve the maximum number of rows which satisfies the query.
>
> As far as I understand the ROW_NUMBER() window-funciton can be used to 
> construct "limit with offset"-queries in a SQL-spec-compliant way.
>
> Say I want to retrieve an ordered list of persons (by name):
>
> SELECT * FROM (
> SELECT ROW_NUMBER() OVER (order by p.name) as rnum, q.*
>  FROM (
> SELECT p.id, p.name FROM person p where p.birth_date > '2000-01-01'
> ) q
> ) r
>  WHERE r.rnum between 11 AND 20
> ;
>
> This is good and works in Oracle, PG >= 8.4 and others that implements 
> spec-compliant window-functions. This is fine, but in Oracle I can extend 
> this query to this for getting the total-number (not just the "page" 11-20) 
> of persons matching in a separate column:
>
> SELECT * FROM (
> SELECT ROW_NUMBER() OVER (order by p.name) as rnum, q.*, max(rownum) over() 
> as total_rows
>  FROM (
> SELECT p.id, p.name FROM person p where p.birth_date > '2000-01-01'
> ) q
> ) r
>  WHERE r.rnum between 11 AND 20
> ;
>
> So my question is: Will PG, with window functions, provide a similar 
> mechanism for retrieving the total number of rows in the "result-set" without 
> actually retrieving them all? I understand that PG might have to visit them 
> all in order to retrieve that count, but that's OK.

Yeah, the half part of my purpose is for that. Manytimes we want
values based on cross-row without reducing or aggregate rows. The rest
of my purpose is for analytical methods such as cumulative aggregates.
As you point, internally postgres must see all the rows to determine
the maximum of row_number() so it's not so efficiently as you feel but
I beleive (and hope) it is elegant enough and perform well
considerablely.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions

2008-10-14 Thread Hannu Krosing

On Tue, 2008-10-14 at 19:04 +0200, Andreas Joseph Krogh wrote:
> On Tuesday 14 October 2008 18:19:07 Hannu Krosing wrote:
> > On Tue, 2008-10-14 at 11:05 +0200, Andreas Joseph Krogh wrote:
> > > Hi all.
> > > This is not very "hackers"-related, but related to the topic of 
> > > window-funcitons, which seems to be discussed quite a bit on "hackers" 
> > > these days.
> > > 
> > > Can window-functions in PG be used to return "total number of rows" in a 
> > > "paged result"?
> > > Say you have:
> > > SELECT p.id, p.firstname
> > >   FROM person p
> > >  ORDER BY p.firstname ASC
> > >  LIMIT 10 OFFSET 10
> > > 
> > > Is it possible to use some window-function to return the "total-number of 
> > > columns" in a separate column?
> > > 
> > > In Oracle one can do 
> > > SELECT q.*, max(rownum) over() as total_rows FROM (subquery)
> > > which returns the total number or columns in a separate column. This is 
> > > very handy for web-pages which for example need to display the rist 20 
> > > results of several million, without having to do a separate count(*) 
> > > query.
> > 
> > no need to use window functions here, just ask for max inline:
> > 
> > 
> > hannu=# select rownum, word, (select max(rownum) from words) as maxrow
> > from words limit 10;
> >  rownum |   word| maxrow 
> > +---+
> >   1 |   |  98569
> >   2 | A |  98569
> >   3 | A's   |  98569
> >   4 | AOL   |  98569
> >   5 | AOL's |  98569
> >   6 | Aachen|  98569
> >   7 | Aachen's  |  98569
> >   8 | Aaliyah   |  98569
> >   9 | Aaliyah's |  98569
> >  10 | Aaron |  98569
> > (10 rows)
> 
> Where do you get your "rownum"-column from here? It's a pseudo-column in 
> Oracle 
> which is computed for each row in the "result-set", it's not a column in a 
> table 
> somewhere, which is why I figured I must use window-funciton, or "analytical 
> function" 
> as Oracle calls them, to operate on the *result-set* to retrieve the maximum 
> number of 
> rows which satisfies the query.

ok, I misunderstood your intent

I guess you can use the non-recursive variant WITH syntax (aka CTE aka
Recursive queries) to get what you want.

> As far as I understand the ROW_NUMBER() window-funciton can be used to 
> construct "limit with offset"-queries in a SQL-spec-compliant way.
> 
> Say I want to retrieve an ordered list of persons (by name):
> 
> SELECT * FROM (
> SELECT ROW_NUMBER() OVER (order by p.name) as rnum, q.*
>   FROM (
> SELECT p.id, p.name FROM person p where p.birth_date > '2000-01-01'
> ) q
> ) r
>  WHERE r.rnum between 11 AND 20
> ;
> 
> This is good and works in Oracle, PG >= 8.4 and others that implements 
> spec-compliant window-functions. This is fine, but in Oracle I can extend 
> this query to this for getting the total-number (not just the "page" 11-20) 
> of persons matching in a separate column:
> 
> SELECT * FROM (
> SELECT ROW_NUMBER() OVER (order by p.name) as rnum, q.*, max(rownum) over() 
> as total_rows
>   FROM (
> SELECT p.id, p.name FROM person p where p.birth_date > '2000-01-01'
> ) q
> ) r
>  WHERE r.rnum between 11 AND 20
> ;
> 
> So my question is: Will PG, with window functions, provide a similar 
> mechanism for retrieving the total number of rows in the "result-set" without 
> actually retrieving them all? I understand that PG might have to visit them 
> all in order to retrieve that count, but that's OK.
> 
> What I'm looking for is an elegant solution to what's becomming a more common 
> requirement in web-applications these days: To display pageable lists with a 
> "total-count", and to do that with *one* query, preferrably using 
> standard-compliant SQL.
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions

2008-10-14 Thread Andreas Joseph Krogh

On Tuesday 14 October 2008 18:19:07 Hannu Krosing wrote:
> On Tue, 2008-10-14 at 11:05 +0200, Andreas Joseph Krogh wrote:
> > Hi all.
> > This is not very "hackers"-related, but related to the topic of 
> > window-funcitons, which seems to be discussed quite a bit on "hackers" 
> > these days.
> > 
> > Can window-functions in PG be used to return "total number of rows" in a 
> > "paged result"?
> > Say you have:
> > SELECT p.id, p.firstname
> >   FROM person p
> >  ORDER BY p.firstname ASC
> >  LIMIT 10 OFFSET 10
> > 
> > Is it possible to use some window-function to return the "total-number of 
> > columns" in a separate column?
> > 
> > In Oracle one can do 
> > SELECT q.*, max(rownum) over() as total_rows FROM (subquery)
> > which returns the total number or columns in a separate column. This is 
> > very handy for web-pages which for example need to display the rist 20 
> > results of several million, without having to do a separate count(*) query.
> 
> no need to use window functions here, just ask for max inline:
> 
> 
> hannu=# select rownum, word, (select max(rownum) from words) as maxrow
> from words limit 10;
>  rownum |   word| maxrow 
> +---+
>   1 |   |  98569
>   2 | A |  98569
>   3 | A's   |  98569
>   4 | AOL   |  98569
>   5 | AOL's |  98569
>   6 | Aachen|  98569
>   7 | Aachen's  |  98569
>   8 | Aaliyah   |  98569
>   9 | Aaliyah's |  98569
>  10 | Aaron |  98569
> (10 rows)

Where do you get your "rownum"-column from here? It's a pseudo-column in Oracle 
which is computed for each row in the "result-set", it's not a column in a 
table somewhere, which is why I figured I must use window-funciton, or 
"analytical function" as Oracle calls them, to operate on the *result-set* to 
retrieve the maximum number of rows which satisfies the query.

As far as I understand the ROW_NUMBER() window-funciton can be used to 
construct "limit with offset"-queries in a SQL-spec-compliant way.

Say I want to retrieve an ordered list of persons (by name):

SELECT * FROM (
SELECT ROW_NUMBER() OVER (order by p.name) as rnum, q.*
  FROM (
SELECT p.id, p.name FROM person p where p.birth_date > '2000-01-01'
) q
) r
 WHERE r.rnum between 11 AND 20
;

This is good and works in Oracle, PG >= 8.4 and others that implements 
spec-compliant window-functions. This is fine, but in Oracle I can extend this 
query to this for getting the total-number (not just the "page" 11-20) of 
persons matching in a separate column:

SELECT * FROM (
SELECT ROW_NUMBER() OVER (order by p.name) as rnum, q.*, max(rownum) over() as 
total_rows
  FROM (
SELECT p.id, p.name FROM person p where p.birth_date > '2000-01-01'
) q
) r
 WHERE r.rnum between 11 AND 20
;

So my question is: Will PG, with window functions, provide a similar mechanism 
for retrieving the total number of rows in the "result-set" without actually 
retrieving them all? I understand that PG might have to visit them all in order 
to retrieve that count, but that's OK.

What I'm looking for is an elegant solution to what's becomming a more common 
requirement in web-applications these days: To display pageable lists with a 
"total-count", and to do that with *one* query, preferrably using 
standard-compliant SQL.

-- 
Andreas Joseph Krogh <[EMAIL PROTECTED]>
Senior Software Developer / CEO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Karenslyst Allé 11  | know how to do a thing and to watch |
PO. Box 529 Skøyen  | somebody else doing it wrong, without   |
0214 Oslo   | comment.|
NORWAY  | |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions

2008-10-14 Thread Hannu Krosing

On Tue, 2008-10-14 at 11:05 +0200, Andreas Joseph Krogh wrote:
> Hi all.
> This is not very "hackers"-related, but related to the topic of 
> window-funcitons, which seems to be discussed quite a bit on "hackers" these 
> days.
> 
> Can window-functions in PG be used to return "total number of rows" in a 
> "paged result"?
> Say you have:
> SELECT p.id, p.firstname
>   FROM person p
>  ORDER BY p.firstname ASC
>  LIMIT 10 OFFSET 10
> 
> Is it possible to use some window-function to return the "total-number of 
> columns" in a separate column?
> 
> In Oracle one can do 
> SELECT q.*, max(rownum) over() as total_rows FROM (subquery)
> which returns the total number or columns in a separate column. This is very 
> handy for web-pages which for example need to display the rist 20 results of 
> several million, without having to do a separate count(*) query.

no need to use window functions here, just ask for max inline:


hannu=# select rownum, word, (select max(rownum) from words) as maxrow
from words limit 10;
 rownum |   word| maxrow 
+---+
  1 |   |  98569
  2 | A |  98569
  3 | A's   |  98569
  4 | AOL   |  98569
  5 | AOL's |  98569
  6 | Aachen|  98569
  7 | Aachen's  |  98569
  8 | Aaliyah   |  98569
  9 | Aaliyah's |  98569
 10 | Aaron |  98569
(10 rows)


-
Hannu Krosing



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window Functions

2008-10-14 Thread Andreas Joseph Krogh

Hi all.
This is not very "hackers"-related, but related to the topic of 
window-funcitons, which seems to be discussed quite a bit on "hackers" these 
days.

Can window-functions in PG be used to return "total number of rows" in a "paged 
result"?
Say you have:
SELECT p.id, p.firstname
  FROM person p
 ORDER BY p.firstname ASC
 LIMIT 10 OFFSET 10

Is it possible to use some window-function to return the "total-number of 
columns" in a separate column?

In Oracle one can do 
SELECT q.*, max(rownum) over() as total_rows FROM (subquery)
which returns the total number or columns in a separate column. This is very 
handy for web-pages which for example need to display the rist 20 results of 
several million, without having to do a separate count(*) query.

-- 
Andreas Joseph Krogh <[EMAIL PROTECTED]>
Senior Software Developer / CEO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Karenslyst Allé 11  | know how to do a thing and to watch |
PO. Box 529 Skøyen  | somebody else doing it wrong, without   |
0214 Oslo   | comment.|
NORWAY  | |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions patch v06

2008-10-13 Thread Tom Lane

"Hitoshi Harada" <[EMAIL PROTECTED]> writes:
> I agree I need to work on that. Also from the spec, "RESPECT NULLS /
> IGNORE NULLS" may be specified but not supported yet. This syntax
> specification is out of the postgres general function call so I wonder
> if those functions are treated specially or not.

Egad, the SQL committee has certainly been taken over by creeping
COBOL-itis when it comes to inventing random new syntax ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions patch v06

2008-10-13 Thread Hitoshi Harada

2008/10/14 David Rowley <[EMAIL PROTECTED]>:
> Hitoshi Harada wrote:
>>I made up my mind to scratch former window functions and redesigned
>>completely new execution model, based on the discussion with Heikki.
>>Attached is the v06 against HEAD today.
>>http://umitanuki.net/pgsql/wfv06/design.html
>
> First off, fantastic work!
>
> In my eyes this and WITH RECURSIVE are a big step for both Postgres and open
> source RBDMS'.
>
> Only, one small query with LEAD() and LAG()
>
> Going by http://www.wiscorp.com/sql200n.zip
>
> "The lead and lag functions each take three arguments, a 
> VE, an 
> OFFSET, and a  DEFAULT. For each row R within the window
> partition P of R defined by
> a window structure descriptor, the lag function returns the value of VE
> evaluated on a row that is OFFSET
> number of rows before R within P, and the lead function returns the value of
> VE evaluated on a row that is
> OFFSET number of rows after R within P. The value of DEFAULT is returned as
> the result if there is no row
> corresponding to the OFFSET number of rows before R within P (for the lag
> function) or after R within P (for
> the lead function). In addition, RESPECT NULLS or IGNORE NULLS can be
> specified to indicate whether
> the rows within P for which VE evaluates to the null value are preserved or
> eliminated"
>
> So going by that:
> SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee;
>
> Would use 'None' for rows that would be out of the bounds of the window.
>
> The current patch only seems to accept 2 arguments.
> ERROR:  function lag(character varying, integer, unknown) does not exist
>
>
>

Thanks for your feedback.

I agree I need to work on that. Also from the spec, "RESPECT NULLS /
IGNORE NULLS" may be specified but not supported yet. This syntax
specification is out of the postgres general function call so I wonder
if those functions are treated specially or not.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions patch v06

2008-10-13 Thread David Rowley

Hitoshi Harada wrote:
>I made up my mind to scratch former window functions and redesigned
>completely new execution model, based on the discussion with Heikki.
>Attached is the v06 against HEAD today.
>http://umitanuki.net/pgsql/wfv06/design.html

First off, fantastic work!

In my eyes this and WITH RECURSIVE are a big step for both Postgres and open
source RBDMS'.

Only, one small query with LEAD() and LAG()

Going by http://www.wiscorp.com/sql200n.zip 

"The lead and lag functions each take three arguments, a 
VE, an 
OFFSET, and a  DEFAULT. For each row R within the window
partition P of R defined by
a window structure descriptor, the lag function returns the value of VE
evaluated on a row that is OFFSET
number of rows before R within P, and the lead function returns the value of
VE evaluated on a row that is
OFFSET number of rows after R within P. The value of DEFAULT is returned as
the result if there is no row
corresponding to the OFFSET number of rows before R within P (for the lag
function) or after R within P (for
the lead function). In addition, RESPECT NULLS or IGNORE NULLS can be
specified to indicate whether
the rows within P for which VE evaluates to the null value are preserved or
eliminated"

So going by that:
SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee;

Would use 'None' for rows that would be out of the bounds of the window.

The current patch only seems to accept 2 arguments.
ERROR:  function lag(character varying, integer, unknown) does not exist

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions patch v06

2008-10-12 Thread Hitoshi Harada

I confirmed this on Oracle:

select last_value(id) over (order by id) as last_id, id from foo;

LAST_ID ID
--- --
0   0
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10  10

So when you specify ORDER BY clause on window definition, the frame
always contains rows preceding from current row, not only on
aggregate.


>
> Doing a bit of poking around in the spec and the Oracle documentation,
> I think (but I'm not 100% sure) that the results returned were correct
> for the query:
>
> postgres=# select a, sum(a) over () from generate_series(1,10) a;
> ERROR:  either PARTITION BY or ORDER BY must be specified in window clause

The empty window definition is forbidden temporarily because I didn't
know how to take it although there's no problem to execute it. I
haven't found any description on spec yet, I know at least Oracle
allows it.

I can find how to do it with the new (window execution model) design,
(and the design is suitable to fix it above,) but at first before
going into trivial specs, I would like core hackers to review the
model is better than before or not. Thank you for your cooperation.


Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions patch v06

2008-10-11 Thread Ian Caulfield

2008/10/11 Hitoshi Harada <[EMAIL PROTECTED]>:
> I am drunk. I forgot cc to -hackers. The talk between me and Ian was like 
> that.
>
> 2008/10/12 Hitoshi Harada <[EMAIL PROTECTED]>:
>> 2008/10/12 Ian Caulfield <[EMAIL PROTECTED]>:
>>> 2008/10/11 Hitoshi Harada <[EMAIL PROTECTED]>:
 2008/10/12 Ian Caulfield <[EMAIL PROTECTED]>:
> 2008/10/11 Hitoshi Harada <[EMAIL PROTECTED]>
>>
>> I'm afraid the patch was too huge, trying to send it again without 
>> attachment...
>>
>> I made up my mind to scratch former window functions and redesigned
>> completely new execution model, based on the discussion with Heikki.
>> Attached is the v06 against HEAD today.
>
> Small nit - I get this from the following query:
>
> postgres=# select a, sum(a) over (order by a) from generate_series(1,10) 
> a;
>  a  | sum
> +-
>  1 |  55
>  2 |  55
>  3 |  55
>  4 |  55
>  5 |  55
>  6 |  55
>  7 |  55
>  8 |  55
>  9 |  55
>  10 |  55
> (10 rows)
>
> From what I can tell of the spec, the 'sum' column should contain a
> running sum (ie 1,3,6 etc). You mention that window frames haven't
> been implemented, but it seems like this case should return an error
> rather than the wrong results...
>
> Thanks,
> Ian
>

 Thanks for notice.
 I didn't know that. Ordered aggregate has only rows until current row?
 I guess I need read more spec.
>>>
>>> That's how I read it, the relevant part of the spec seems to be:
>>>
>>> 5) WD also defines for each row R of RTE the window frame WF of R,
>>> consisting of a collection of rows. WF
>>> is defined as follows.
>>>
>>> Case:
>>> a) If WD has no window framing clause, then
>>>
>>> Case:
>>> i) If the window ordering clause of WD is not present, then WF is the
>>> window partition of R.
>>> ii) Otherwise, WF consists of all rows of the partition of R that
>>> precede R or are peers of R in the
>>> window ordering of the window partition defined by the window ordering 
>>> clause.
>>>
>>> Ian
>>>
>>
>> It seems you're right. I will fix it soon probably.
>> By this spec, some of the regression tests including nth_value() etc.
>> are wrong. Generally we hold only preceding rows in the frame when
>> ORDER BY is specified, not only aggregate case.
>> Thanks again.

Doing a bit of poking around in the spec and the Oracle documentation,
I think (but I'm not 100% sure) that the results returned were correct
for the query:

postgres=# select a, sum(a) over () from generate_series(1,10) a;
ERROR:  either PARTITION BY or ORDER BY must be specified in window clause

Howver, someone who is better at parsing the spec than I am probably
ought to check...

Ian

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions patch v06

2008-10-11 Thread Hitoshi Harada

I am drunk. I forgot cc to -hackers. The talk between me and Ian was like that.

2008/10/12 Hitoshi Harada <[EMAIL PROTECTED]>:
> 2008/10/12 Ian Caulfield <[EMAIL PROTECTED]>:
>> 2008/10/11 Hitoshi Harada <[EMAIL PROTECTED]>:
>>> 2008/10/12 Ian Caulfield <[EMAIL PROTECTED]>:
 2008/10/11 Hitoshi Harada <[EMAIL PROTECTED]>
>
> I'm afraid the patch was too huge, trying to send it again without 
> attachment...
>
> I made up my mind to scratch former window functions and redesigned
> completely new execution model, based on the discussion with Heikki.
> Attached is the v06 against HEAD today.

 Small nit - I get this from the following query:

 postgres=# select a, sum(a) over (order by a) from generate_series(1,10) a;
  a  | sum
 +-
  1 |  55
  2 |  55
  3 |  55
  4 |  55
  5 |  55
  6 |  55
  7 |  55
  8 |  55
  9 |  55
  10 |  55
 (10 rows)

 From what I can tell of the spec, the 'sum' column should contain a
 running sum (ie 1,3,6 etc). You mention that window frames haven't
 been implemented, but it seems like this case should return an error
 rather than the wrong results...

 Thanks,
 Ian

>>>
>>> Thanks for notice.
>>> I didn't know that. Ordered aggregate has only rows until current row?
>>> I guess I need read more spec.
>>
>> That's how I read it, the relevant part of the spec seems to be:
>>
>> 5) WD also defines for each row R of RTE the window frame WF of R,
>> consisting of a collection of rows. WF
>> is defined as follows.
>>
>> Case:
>> a) If WD has no window framing clause, then
>>
>> Case:
>> i) If the window ordering clause of WD is not present, then WF is the
>> window partition of R.
>> ii) Otherwise, WF consists of all rows of the partition of R that
>> precede R or are peers of R in the
>> window ordering of the window partition defined by the window ordering 
>> clause.
>>
>> Ian
>>
>
> It seems you're right. I will fix it soon probably.
> By this spec, some of the regression tests including nth_value() etc.
> are wrong. Generally we hold only preceding rows in the frame when
> ORDER BY is specified, not only aggregate case.
> Thanks again.
>
>
> Regards,
>
>
> --
> Hitoshi Harada
>



-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions patch v06

2008-10-11 Thread Ian Caulfield

2008/10/11 Hitoshi Harada <[EMAIL PROTECTED]>
>
> I'm afraid the patch was too huge, trying to send it again without 
> attachment...
>
> I made up my mind to scratch former window functions and redesigned
> completely new execution model, based on the discussion with Heikki.
> Attached is the v06 against HEAD today.

Small nit - I get this from the following query:

postgres=# select a, sum(a) over (order by a) from generate_series(1,10) a;
 a  | sum
+-
  1 |  55
  2 |  55
  3 |  55
  4 |  55
  5 |  55
  6 |  55
  7 |  55
  8 |  55
  9 |  55
 10 |  55
(10 rows)

>From what I can tell of the spec, the 'sum' column should contain a
running sum (ie 1,3,6 etc). You mention that window frames haven't
been implemented, but it seems like this case should return an error
rather than the wrong results...

Thanks,
Ian

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window Functions patch v06

2008-10-11 Thread Hitoshi Harada

I'm afraid the patch was too huge, trying to send it again without attachment...

I made up my mind to scratch former window functions and redesigned
completely new execution model, based on the discussion with Heikki.
Attached is the v06 against HEAD today.

http://umitanuki.net/pgsql/wfv06/design.html

The concrete design concept is noted in nodeWindow.c, quoted here:

 * Note that the solid concept of the window functions is "they can access
 * arbitrary rows within the frame as they want". As far as we keep the rule
 * of thumb, any kind of optimization is allowed.

And I threw away an idea that window functions are derived from
aggregates, then window functions are only simple single function,
which have capability to arbitrary random row access. By this mean,
aggregates are special subset of window functions so that aggregates
are treated special case in nodeWindow.c, processed in
eval_windowaggregate(), while normal window functions are in
eval_windowfunction().

By giving random row access, not only able to integrate aggregate to
window mechanism but we now have lead()/lag() support. Furthermore, I
can see FRAME clause (i.e. moving frame aggregate/cumulative
computation) can be added for 8.4 with this design though not
implemented yet.

Tom,
I added a function to tuplestore.c,
tuplestore_write_to_read_pointer(), that allows to copy write pointer
position to the one of readptrs. This is required for my case because
when the functions access randomly rows that the executor haven't read
out will be fetched (i.e. last row of the frame). And I hope you see
the nth_value() function require really arbitrary row access.

P.S.
As a volunteer, my spare time to work on this project is getting
limited, as my daily work is getting busy for the end of the year
(this is a Japanese traditional business sense -:). I don't know how
much I can work for the next commit fest but I can't believe it if
this feature would be missed in 8.4 *only* for my own tiny business
clients!

Regards,



-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-09 Thread Hitoshi Harada

>> Also, current implementation has only a type of plan which uses sort
>> operation. It should be optimized by re-position the windows and/or
>> using hashtable.
>
> I would like to see some performance test results also. It would be good
> to know whether they are fast/slow etc.. It will definitely help the
> case for inclusion if they are faster than alternative multi-statement
> approaches to solving the basic data access problems.
>

Just for the report, I attach the result I have tested today. You see
the result says the current window function is faster than
sort-operated self-join and slower than hashagg-operated self-join.

This test is on the Redhat Linux ES3 Xeon 2.13GHz with 100,000 rows 2
column integers. I wrote simple perl script using psql invoking the
shell so it may contain the invocation overhead overall.


test0   test1   test2   test3   test4   test5

689.502 416.633 257.970 1195.294954.318 1204.292
687.254 447.676 256.629 1075.342949.711 1154.754
700.602 421.818 260.742 1105.680926.462 1203.012
736.594 476.388 334.310 1157.818978.861 1199.944
676.572 418.782 270.270 1060.900909.474 1175.079
687.260 428.564 257.032 1069.0131045.3871275.988
700.252 429.289 263.216 1074.7491018.9681273.965
719.478 445.218 258.464 1087.9321015.7441273.637
694.865 453.737 261.286 1065.2291039.9411262.208
685.756 430.169 258.017 1124.7951102.0551297.603

697.81  436.83  267.79  1101.68 994.09  1232.05

test0   SELECT sum(amount) OVER (PARTITION BY sector) FROM bench1;
test1   SELECT amount FROM bench1 ORDER BY sector;
test2   SELECT sum(amount) FROM bench1 GROUP BY sector;
test3   SELECT id, amount - avg(amount) OVER (PARTITION BY sector) FROM bench1;
test4   SELECT id, amount - avg FROM bench1 INNER JOIN(SELECT sector,
avg(amount) FROM bench1 GROUP BY sector)t USING(sector)
test5   SET enable_hashagg TO off; SELECT id, amount - avg FROM bench1
INNER JOIN(SELECT sector, avg(amount) FROM bench1 GROUP BY sector)t
USING(sector)

I'll include this test in my docs later.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-06 Thread Hitoshi Harada

2008/9/5 Heikki Linnakangas <[EMAIL PROTECTED]>:
> Heikki Linnakangas wrote:
>>
>> I'll review the parser/planner changes from the current patch.
>
> Looks pretty sane to me. Few issues:
>
> Is it always OK to share a window between two separate window function
> invocations, if they both happen to have identical OVER clause? It seems OK
> for stable functions, but I'm not sure that's correct for expressions
> involving volatile functions. I wonder if the SQL spec has anything to say
> about that.

It may be here:

---quote---
In general, two s are computed independently, each
one performing its own sort of its data, even if they use the same
data and the same . Since sorts may specify
partial orderings,
the computation of s is inevitably non-deterministic
to the extent that the ordering is not total. Nevertheless, the user
may desire that two s be computed using the same
ordering, so that, for example, two moving aggregates move through the
rows of a partition in precisely the same order.

Two s are computed using the same (possibly
non-deterministic) window ordering of the rows if any of the following
are true:

― The s identify the same window structure descriptor.

― The s' window structure descriptors have window
partitioning clauses that enumerate the same number of column
references, and those column references are pairwise equivalent in
their order
of occurrence; and their window structure descriptors have window
ordering clauses with the same number of s, and those s are all column references, and those column references are
pairwise equivalent in their order of occurrence, and the s pairwise specify or imply s that
specify equivalent s, the same  (ASC or DESC), and the same  (NULLS
FIRST or NULLS LAST).

― The window structure descriptor of one  is the
ordering window of the other , or both window
structure descriptors identify the same ordering window.
/---quote---

But it doesn't say anything about volatile functions. Do you have
example that is bad with the current design?

The other issuses are OK. I missed those cases. will fix them.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-04 Thread Heikki Linnakangas


Heikki Linnakangas wrote:

I'll review the parser/planner changes from the current patch.


Looks pretty sane to me. Few issues:

Is it always OK to share a window between two separate window function 
invocations, if they both happen to have identical OVER clause? It seems 
OK for stable functions, but I'm not sure that's correct for expressions 
involving volatile functions. I wonder if the SQL spec has anything to 
say about that.


As you noted in the comments, the planner could be a lot smarter about 
avoiding sorts. Currently you just always put a sort node below each 
Window, and another on top of them if there's an ORDER BY. There's 
obviously a lot of room for improvement there.


This line:

result_plan->targetlist = preprocess_window(tlist, result_plan);
in planner.c, won't work if result_plan is one that can't do projection. 
 A few screenfuls above that call, there's this:

/*
 * If the top-level plan node is one that 
cannot do expression
 * evaluation, we must insert a Result node to 
project the
 * desired tlist.
 */
if (!is_projection_capable_plan(result_plan))
{
result_plan = (Plan *) make_result(root,

   sub_tlist,

   NULL,

   result_plan);
}
else
{
/*
 * Otherwise, just replace the 
subplan's flat tlist with
 * the desired tlist.
 */
result_plan->targetlist = sub_tlist;
}
which is what you need to do with the preprocess_window call as well. I 
think this is caused by that:
postgres=# explain SELECT row_number() OVER (ORDER BY id*10) FROM 
(SELECT * FROM foo UNION ALL SELECT * FROM foo) sq;

ERROR:  bogus varattno for OUTER var: 2

And then there's these:

postgres=# explain SELECT * FROm foo WHERE row_number() OVER (ORDER BY 
id) > 10;

ERROR:  winaggref found in non-Window plan node
postgres=# explain UPDATE foo SET id = 10 RETURNING (ROW_NUMBER() OVER 
(ORDER BY random())) ;

ERROR:  winaggref found in non-Window plan node
postgres=# explain SELECT row_number() OVER (ORDER BY (1)) FROm foo;
ERROR:  ORDER/GROUP BY expression not found in targetlist
postgres=# explain SELECT row_number() OVER (ORDER BY length('foo')) 
FROM foo;

ERROR:  could not find pathkey item to sort

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-04 Thread Heikki Linnakangas


Hitoshi Harada wrote:

BTW, I think it is better to put together the discussion points we
have done as "general roadmap to complete window functions". It is not
about the features for the next release but is the complete tasks.
Where to go? Wiki, or my design docs?


That's up to you, really. I like your design docs page, but you're free 
to use the Wiki as well, of course.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions doc patch

2008-09-03 Thread Hitoshi Harada

2008/9/4 Erikj <[EMAIL PROTECTED]>:
> Dear Hitoshi,
>
> I noticed the folowing typo in the doc sgml:
>
>  'rownumber()', instead of 'row_number()' ( 2x )
>
> hth
>
> Erik Rijkers
>
>
>
> *** doc/src/sgml/func.sgml.orig 2008-09-03 17:20:28.130229027 +0200
> --- doc/src/sgml/func.sgml  2008-09-03 17:21:01.331907454 +0200
> ***
> *** 10092,10100 
>   
>
> 
> ! rownumber()
> 
> !rownumber() OVER (ORDER BY  class="parameter">expression)
>
>
> bigint
> --- 10092,10100 
>   
>
> 
> ! row_number()
> 
> !row_number() OVER (ORDER BY  class="parameter">expression)
>
>
> bigint
>

Ah, thanks. It's my mistake. Both of SQL spec and my implementation in
pg_proc say it is row_number, not rownumber.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Window functions doc patch

2008-09-03 Thread Erikj

Dear Hitoshi,

I noticed the folowing typo in the doc sgml:

  'rownumber()', instead of 'row_number()' ( 2x )

hth

Erik Rijkers



*** doc/src/sgml/func.sgml.orig 2008-09-03 17:20:28.130229027 +0200
--- doc/src/sgml/func.sgml  2008-09-03 17:21:01.331907454 +0200
***
*** 10092,10100 
   

 
! rownumber()
 
!rownumber() OVER (ORDER BY expression)


 bigint
--- 10092,10100 
   

 
! row_number()
 
!row_number() OVER (ORDER BY expression)


 bigint



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-03 Thread Hitoshi Harada

2008/9/3 Heikki Linnakangas <[EMAIL PROTECTED]>:
> Hitoshi Harada wrote:
>>
>>> I'd suggest:
>>>
>>> 1. Implement Window node, with the capability to invoke an aggregate
>>> function, using the above API. Implement required parser/planner changes.
>>> Implement a few simple ranking aggregates using the API.
>>> 2. Implement glue to call normal aggregates through the new API
>>> 3. Implement the optimization to drop rows that are no longer needed
>>> (signal_cutoff can be a no-op until this phase)
>>> 4. Implement window framing (the frame can always be all rows in the
>>> partition, or all rows until the current row, until this phase)
>>> 5. Expose the new API to user-defined aggregates. It can be an internal
>>> API
>>> only used by built-in functions until this phase
>>>
>>> I believe you already have phase 1 in your patch, except for the API
>>> changes.
>>
>> I am willing to challenge to implement the API above, after maintain
>> the current patch adding docs and tests. Since the API includes
>> changes much more like Aggregate syntax than current patch, I'm not
>> sure if I can finish it by next commit fest, which is said to be
>> "feature freeze". For safety, remain the current patch to review
>> excluding API and executor then if I fail to finish use it for next
>> release. Git helps it by cutting a branch, does it? How do you think?
>
> We do allow changes to the user manual after the feature freeze, so I'd
> suggest concentrating on the code and tests first. Code comments and
> internal docs are important, though, for easy review.
>
> I'm sure we won't get all the way to phase 5 for 8.4, but if we can even get
> 1-3, plus some of the most important window functions, this this will be a
> great release!

OK, so first tests and internal docs/comments, then comes trying to
catch API , finally docs.

BTW, I think it is better to put together the discussion points we
have done as "general roadmap to complete window functions". It is not
about the features for the next release but is the complete tasks.
Where to go? Wiki, or my design docs?

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-03 Thread Simon Riggs


On Wed, 2008-09-03 at 09:51 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Tue, 2008-09-02 at 15:51 +0300, Heikki Linnakangas wrote:
> > 
> >> The needs of access to the rows are so different that it seems best to
> >> me to delegate the buffering to the window function.
> > 
> > That seems sensible in some ways, not others.
> 
> In the API I proposed later in that mail, the buffering is actually done 
> by the executor node, not by the window function. Instead, the window 
> function can request abitrary rows of the frame from the executor, and 
> can signal that some rows are no longer required, allowing them to be 
> discarded.

I'm happy with that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-03 Thread Heikki Linnakangas


Hitoshi Harada wrote:

2008/9/2 Heikki Linnakangas <[EMAIL PROTECTED]>:

Hitoshi Harada wrote:

2008/9/2 Heikki Linnakangas <[EMAIL PROTECTED]>:
In my understanding, the "Window Frame" is defined
by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
contrast to "Window Partition" defined by "PARTITION BY" clause. A
frame slides within a partition or there's only one frame if those
clauses are not specified. The current patch has not implemented this
yet. I'll update the docs.

Yes, that's how I read it as well. Another way to think of it is that
there's always a window frame, but if no ROWS BETWEEN or similar clause is
given, the window frame spans the whole partition (or from the 1st row of
the partition, up to the current row, if ORDER BY is given).


I don't like to call the second type "ranking aggregates" because it
may refer to only ranking functions though there are more types of
function like ntile(), lead() and lag(). But "window functions"
doesn't seem appropriate either since it's ambiguous with the general
name of window expressions.

Yep, confusing :-(. The SQL standard says that "a window function is one of:
a rank function, a distribution function, a row number function, a window
aggregate function, the ntile function, the lead function, the lag function,
the first-value function, the last-value function, the nth-value function".
So, window aggregate functions are a special class of window functions, and
there's no term to refer to all the rest of the window functions excluding
window aggregate functions.

Your docSQL spec
Window expression   Window function
Window function Any window function other than a window aggregate
function
Window aggregateWindow aggregate function

I tried to coin the term "ranking aggregate" for the SQL2008 term "Any
window function other than a window aggregate function", but you're right
that that's still confusing, because the SQL2008 term "rank function"
includes only RANK() and DENSE_RANK().

The spec calls them "group aggregate functions", when they're used with
GROUP BY, rather than as a window function. I think we could use that term.


Agree. So from now on, we use "window functions" for all kinds of
functions including window aggregates. "Window expression" is
discarded. "Window functions" also means the mechanism to support
these functions to process and this project.


Your proposal is smarter than the current implementation. But it
doesn't seem complete in some way. From logical point of view, the
window functions should be allowed to access whole of rows in the
frame the current row belongs to (c.f. inverse distribution
functions).

By the whole of rows, do you mean
a) the chosen value or expression of all rows, or
b) all columns of the input rows?


a). I mean all input rows in a window frame. But later I found
"inverse distribution function" is not one of window functions. That
is actually one of aggregate functions. Forget about it.


Different window functions have different needs. RANK() for example does
need to see all columns, to compare them, but it only needs access to the
previous and current row. CUME_DIST on the other hand needs access to all
columns of all rows, and LAG needs access to a specific column of a fixed
number of rows. And row_number needs nothing.

The needs of access to the rows are so different that it seems best to me to
delegate the buffering to the window function.


Delegating optimization to them depending on functions' needs is a
good idea. So executor can concentrate on the window function process
flow. Let's unify it in executor and let trivial optimizations get
into individual functions.


Actually, looking closer to the ranking functions, they don't really need
access to all columns. They just need to be able to compare them, according
to the window ordering, so we should probably only provide access to the
arguments of the aggregate, evaluated for any row in the frame, and a
comparator function, that can determine if any given rows in the frame are
<, = or >.


That is kind of problem. If your task is only to define that window
node executor simply stores window frame rows and pass them to window
functions as they need, the rank functions' needs don't come. As you
point out, rank functions need ordering key columns and its
comparators. So you have to imagine what comes next? What will be
wanted other than ordering key columns, if we think about "universe
window functions" much more than SQL spec says?


It might be a good idea to google around what window functions other 
DBMSs support, and see if this scheme could support all of them. I 
couldn't find any that it couldn't, but I didn't look very hard.



Let's look at the trivial, generic, and slow implementation first, and then
figure out what tricks we can do to make it faster. I gather that that
generic algorithm, following how the SQL spec defines the window frame,
looks like this:


So as to satisfy all of

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Heikki Linnakangas


Simon Riggs wrote:

On Tue, 2008-09-02 at 15:51 +0300, Heikki Linnakangas wrote:


The needs of access to the rows are so different that it seems best to
me to delegate the buffering to the window function.


That seems sensible in some ways, not others.


In the API I proposed later in that mail, the buffering is actually done 
by the executor node, not by the window function. Instead, the window 
function can request abitrary rows of the frame from the executor, and 
can signal that some rows are no longer required, allowing them to be 
discarded.



Some of the window functions, like lead and lag merely specify window
size and shape for other functions to act upon.


I don't  understand that. LEAD/LAG return a value. They don't affect the 
size or shape of the window in any way. It doesn't affect other functions.



For those types of
request I don't see any need for custom functions, whereas for the
comparison/calculation functions there might be a need.

We don't need to implement all the things the SQL Standard calls window
functions with a 1:1 mapping to Postgres functions.


Sure, we have special hacks for things like MIN/MAX already. But using 
PostgreSQL functions does seem like the simplest solution to me, as the 
backend code can get quite complex if we have to add special handling 
for different window functions. LEAD/LAG fall quite nicely into the 
framework I proposed, but if something comes along that doesn't, then 
we'll have to extend the framework or add a special case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Simon Riggs

On Tue, 2008-09-02 at 15:51 +0300, Heikki Linnakangas wrote:

> The needs of access to the rows are so different that it seems best to
> me to delegate the buffering to the window function.

That seems sensible in some ways, not others.

Some of the window functions, like lead and lag merely specify window
size and shape for other functions to act upon. For those types of
request I don't see any need for custom functions, whereas for the
comparison/calculation functions there might be a need.

We don't need to implement all the things the SQL Standard calls window
functions with a 1:1 mapping to Postgres functions.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Hitoshi Harada

2008/9/2 Heikki Linnakangas <[EMAIL PROTECTED]>:
> Hitoshi Harada wrote:
>>
>> 2008/9/2 Heikki Linnakangas <[EMAIL PROTECTED]>:
>> In my understanding, the "Window Frame" is defined
>> by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
>> contrast to "Window Partition" defined by "PARTITION BY" clause. A
>> frame slides within a partition or there's only one frame if those
>> clauses are not specified. The current patch has not implemented this
>> yet. I'll update the docs.
>
> Yes, that's how I read it as well. Another way to think of it is that
> there's always a window frame, but if no ROWS BETWEEN or similar clause is
> given, the window frame spans the whole partition (or from the 1st row of
> the partition, up to the current row, if ORDER BY is given).
>
>> I don't like to call the second type "ranking aggregates" because it
>> may refer to only ranking functions though there are more types of
>> function like ntile(), lead() and lag(). But "window functions"
>> doesn't seem appropriate either since it's ambiguous with the general
>> name of window expressions.
>
> Yep, confusing :-(. The SQL standard says that "a window function is one of:
> a rank function, a distribution function, a row number function, a window
> aggregate function, the ntile function, the lead function, the lag function,
> the first-value function, the last-value function, the nth-value function".
> So, window aggregate functions are a special class of window functions, and
> there's no term to refer to all the rest of the window functions excluding
> window aggregate functions.
>
> Your docSQL spec
> Window expression   Window function
> Window function Any window function other than a window aggregate
> function
> Window aggregateWindow aggregate function
>
> I tried to coin the term "ranking aggregate" for the SQL2008 term "Any
> window function other than a window aggregate function", but you're right
> that that's still confusing, because the SQL2008 term "rank function"
> includes only RANK() and DENSE_RANK().
>
> The spec calls them "group aggregate functions", when they're used with
> GROUP BY, rather than as a window function. I think we could use that term.

Agree. So from now on, we use "window functions" for all kinds of
functions including window aggregates. "Window expression" is
discarded. "Window functions" also means the mechanism to support
these functions to process and this project.

>> Your proposal is smarter than the current implementation. But it
>> doesn't seem complete in some way. From logical point of view, the
>> window functions should be allowed to access whole of rows in the
>> frame the current row belongs to (c.f. inverse distribution
>> functions).
>
> By the whole of rows, do you mean
> a) the chosen value or expression of all rows, or
> b) all columns of the input rows?

a). I mean all input rows in a window frame. But later I found
"inverse distribution function" is not one of window functions. That
is actually one of aggregate functions. Forget about it.

>
> Different window functions have different needs. RANK() for example does
> need to see all columns, to compare them, but it only needs access to the
> previous and current row. CUME_DIST on the other hand needs access to all
> columns of all rows, and LAG needs access to a specific column of a fixed
> number of rows. And row_number needs nothing.
>
> The needs of access to the rows are so different that it seems best to me to
> delegate the buffering to the window function.

Delegating optimization to them depending on functions' needs is a
good idea. So executor can concentrate on the window function process
flow. Let's unify it in executor and let trivial optimizations get
into individual functions.

>
> Actually, looking closer to the ranking functions, they don't really need
> access to all columns. They just need to be able to compare them, according
> to the window ordering, so we should probably only provide access to the
> arguments of the aggregate, evaluated for any row in the frame, and a
> comparator function, that can determine if any given rows in the frame are
> <, = or >.

That is kind of problem. If your task is only to define that window
node executor simply stores window frame rows and pass them to window
functions as they need, the rank functions' needs don't come. As you
point out, rank functions need ordering key columns and its
comparators. So you have to imagine what comes next? What will be
wanted other than ordering key columns, if we think about "universe
window functions" much more than SQL spec says?

>
>> From implementing and performance point of view, there
>> need to consider such case like mixing window aggregates and ranking
>> aggregates in the same window, which means it is better that the two
>> types of functions are processed in the similar flow. Also logically,
>> SQL spec doesn't forbid ranking aggregates in sliding window frames.
>
> It do

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Neil Conway

On Tue, Sep 2, 2008 at 9:35 AM, David Fetter <[EMAIL PROTECTED]> wrote:
> Any chance we can buy a few copies of the official one for use on the
> project?

AFAIK there is no significant difference between the "official"
standard and the draft version available online, so I don't see the
point.

Neil

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Simon Riggs


On Tue, 2008-09-02 at 09:35 -0700, David Fetter wrote:
> On Tue, Sep 02, 2008 at 12:42:45PM +0100, Simon Riggs wrote:
> > 
> > On Tue, 2008-09-02 at 03:14 -0400, Tom Lane wrote:
> > > David Fetter <[EMAIL PROTECTED]> writes:
> > > > On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
> > > >> It's not like we haven't seen a SQL draft go down in flames
> > > >> before.
> > > 
> > > > Do you think that anything in the windowing functions section
> > > > will disappear?
> > > 
> > > Who's to say?
> > > 
> > > I have no objection to looking at the 2003 and 200n documents in
> > > parallel, especially if there are places where 200n clarifies the
> > > intent of 2003.  But I'd be suspicious of designing around
> > > entirely-new features presented by 200n.
> > 
> > I have confirmation from Michael Gorman, Wiscorp, that 
> > 
> > > The new standard was approved in early Summer. SQL 2008 is
> > > finished.
> > 
> > So as of now, SQL2008 exists, all hail. SQL2003 and earlier versions
> > have been superseded and can be ignored.
> 
> Any chance we can buy a few copies of the official one for use on the
> project?

You have to buy them from ISO website I think, but it was $00s when I
looked some years back - we'd probably need a dozen copies at least
since we hardly ever meet. Michael's .pdf was the final version, so we
do actually have the final form even if the page formatting somewhat
different.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread David Fetter

On Tue, Sep 02, 2008 at 12:42:45PM +0100, Simon Riggs wrote:
> 
> On Tue, 2008-09-02 at 03:14 -0400, Tom Lane wrote:
> > David Fetter <[EMAIL PROTECTED]> writes:
> > > On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
> > >> It's not like we haven't seen a SQL draft go down in flames
> > >> before.
> > 
> > > Do you think that anything in the windowing functions section
> > > will disappear?
> > 
> > Who's to say?
> > 
> > I have no objection to looking at the 2003 and 200n documents in
> > parallel, especially if there are places where 200n clarifies the
> > intent of 2003.  But I'd be suspicious of designing around
> > entirely-new features presented by 200n.
> 
> I have confirmation from Michael Gorman, Wiscorp, that 
> 
> > The new standard was approved in early Summer. SQL 2008 is
> > finished.
> 
> So as of now, SQL2008 exists, all hail. SQL2003 and earlier versions
> have been superseded and can be ignored.

Any chance we can buy a few copies of the official one for use on the
project?

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Heikki Linnakangas


Martijn van Oosterhout wrote:

On Tue, Sep 02, 2008 at 10:44:31AM +0100, Simon Riggs wrote:

If we only have the combined (brain * time) to get a partial
implementation in for this release then I would urge we go for that,
rather than wait for perfection - as long as there are no other negative
effects.


"premature optimization is the root of all evil." (Knuth, Donald.)

"make it work, then make it better".

Getting a partial implementation that works out now is far better than
waiting until its perfect.


Sure. Just have to watch out that we don't follow a dead-end, making it 
harder to add missing functionality later on.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Heikki Linnakangas


Hitoshi Harada wrote:

2008/9/2 Heikki Linnakangas <[EMAIL PROTECTED]>:
In my understanding, the "Window Frame" is defined
by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
contrast to "Window Partition" defined by "PARTITION BY" clause. A
frame slides within a partition or there's only one frame if those
clauses are not specified. The current patch has not implemented this
yet. I'll update the docs.


Yes, that's how I read it as well. Another way to think of it is that 
there's always a window frame, but if no ROWS BETWEEN or similar clause 
is given, the window frame spans the whole partition (or from the 1st 
row of the partition, up to the current row, if ORDER BY is given).



I don't like to call the second type "ranking aggregates" because it
may refer to only ranking functions though there are more types of
function like ntile(), lead() and lag(). But "window functions"
doesn't seem appropriate either since it's ambiguous with the general
name of window expressions.


Yep, confusing :-(. The SQL standard says that "a window function is one 
of: a rank function, a distribution function, a row number function, a 
window aggregate function, the ntile function, the lead function, the 
lag function, the first-value function, the last-value function, the 
nth-value function". So, window aggregate functions are a special class 
of window functions, and there's no term to refer to all the rest of the 
window functions excluding window aggregate functions.


Your docSQL spec
Window expression   Window function
Window function Any window function other than a window aggregate 
function
Window aggregateWindow aggregate function

I tried to coin the term "ranking aggregate" for the SQL2008 term "Any 
window function other than a window aggregate function", but you're 
right that that's still confusing, because the SQL2008 term "rank 
function" includes only RANK() and DENSE_RANK().


The spec calls them "group aggregate functions", when they're used with 
GROUP BY, rather than as a window function. I think we could use that term.



Your proposal is smarter than the current implementation. But it
doesn't seem complete in some way. From logical point of view, the
window functions should be allowed to access whole of rows in the
frame the current row belongs to (c.f. inverse distribution
functions). 


By the whole of rows, do you mean
a) the chosen value or expression of all rows, or
b) all columns of the input rows?

Different window functions have different needs. RANK() for example does 
need to see all columns, to compare them, but it only needs access to 
the previous and current row. CUME_DIST on the other hand needs access 
to all columns of all rows, and LAG needs access to a specific column of 
a fixed number of rows. And row_number needs nothing.


The needs of access to the rows are so different that it seems best to 
me to delegate the buffering to the window function.


Actually, looking closer to the ranking functions, they don't really 
need access to all columns. They just need to be able to compare them, 
according to the window ordering, so we should probably only provide 
access to the arguments of the aggregate, evaluated for any row in the 
frame, and a comparator function, that can determine if any given rows 
in the frame are <, = or >.



From implementing and performance point of view, there
need to consider such case like mixing window aggregates and ranking
aggregates in the same window, which means it is better that the two
types of functions are processed in the similar flow. Also logically,
SQL spec doesn't forbid ranking aggregates in sliding window frames.


It doesn't? Oh, bummer. It seems we need a grand unified theory of 
ranking and other aggregates then :-(. How does something like RANK work 
with a window frame? What does it return, if the current row is excluded 
from the window frame, e.g with EXCLUDE CURRENT ROW?



Let's look at the trivial, generic, and slow implementation first, and 
then figure out what tricks we can do to make it faster. I gather that 
that generic algorithm, following how the SQL spec defines the window 
frame, looks like this:


0. Construct the ordered set of tuples, containing all the tuples in the 
partition.

For each row, start with the ordered set constructed in step 0, and:
1. Remove all tuples from the set that are before the start of the 
sliding frame
2. Remove all tuples from the set that are after the end of the sliding 
frame
3. Remove all tuples specified by the window-frame-exclusion clause 
(EXCLUDE CURRENT ROW/GROUP/TIES/NO OTHERS).

4. Run the aggregate over all tuples still in the set.

Now, the first obvious optimization is that we don't want to construct 
the window frame from scratch for each row. Instead:


0. Start with an empty ordered set S.
For each row:
1. Read forward until we hit the end of the new sliding frame, and add 
those rows to S.

2. Remove all tuples

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Hitoshi Harada

2008/9/2 Simon Riggs <[EMAIL PROTECTED]>:
> If you've done all of that, then I'm impressed. Well done.
>
> Few general comments
>
> * The docs talk about "windowing functions", yet you talk about "window
> functions" here. I think the latter is correct, but whichever we choose
> we should be consistent (and hopefully matching SQL Standard).

That's what I'm embarrassed. Now we have "window functions" meaning
two, one for generic name of window expressions and the other for
non-window-aggregates. It is a word play, which is difficult problem
for non-native people, but... let's use "window functions". I'll
revise sgml docs.

> * You don't use duplicate the examples from the docs into the tests,
> which is always a good way to get conflicting reports from users. :-)
>
> * The tests seem very light for such a huge range of new functionality.
> (8 tests is hardly sufficient). I'd like to see a wide range of tests -
> probably 5-10 times as many individual test statements. I would also
> like to see test failures that illustrate the as-yet unimplemented
> features and the warning messages that are thrown - this will help us
> understand exactly what is missing also. It would also be useful to see
> other common coding mistakes/misconceptions and the corresponding error
> messages.
>
>> Also, current implementation has only a type of plan which uses sort
>> operation. It should be optimized by re-position the windows and/or
>> using hashtable.
>
> I would like to see some performance test results also. It would be good
> to know whether they are fast/slow etc.. It will definitely help the
> case for inclusion if they are faster than alternative multi-statement
> approaches to solving the basic data access problems.
>

OK, thanks for your advices. I'll work on tests, docs and benchmarks,
then send in another patch in a week or so.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Simon Riggs


On Tue, 2008-09-02 at 03:14 -0400, Tom Lane wrote:
> David Fetter <[EMAIL PROTECTED]> writes:
> > On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
> >> It's not like we haven't seen a SQL draft go down in flames before.
> 
> > Do you think that anything in the windowing functions section will
> > disappear?
> 
> Who's to say?
> 
> I have no objection to looking at the 2003 and 200n documents in
> parallel, especially if there are places where 200n clarifies the
> intent of 2003.  But I'd be suspicious of designing around
> entirely-new features presented by 200n.

I have confirmation from Michael Gorman, Wiscorp, that 

> The new standard was approved in early Summer. SQL 2008 is finished.

So as of now, SQL2008 exists, all hail. SQL2003 and earlier versions
have been superseded and can be ignored.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Martijn van Oosterhout

On Tue, Sep 02, 2008 at 10:44:31AM +0100, Simon Riggs wrote:
> If we only have the combined (brain * time) to get a partial
> implementation in for this release then I would urge we go for that,
> rather than wait for perfection - as long as there are no other negative
> effects.

"premature optimization is the root of all evil." (Knuth, Donald.)

"make it work, then make it better".

Getting a partial implementation that works out now is far better than
waiting until its perfect.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Simon Riggs

On Mon, 2008-09-01 at 21:00 +0300, Heikki Linnakangas wrote:

> 1. It's important that what gets committed now can be extended to handle 
> all of the window function stuff in SQL2003 in the future, as well as 
> user-defined-window-functions in the spirit of PostgreSQL extensibility. 
> Even if we don't implement all of it in this release.

I think whatever public APIs get published now must be sufficient to
support user-defined-window functions across all future releases, so on
that point I agree completely. (One reason why I argued earlier in
favour of avoiding an API for now).

We shouldn't restrict the implementation of the internals to be upward
compatible though because I foresee some aspect of complexity stalling
and thus killing the patch in the short term if we do that. We don't
have much time left for this release.

If we only have the combined (brain * time) to get a partial
implementation in for this release then I would urge we go for that,
rather than wait for perfection - as long as there are no other negative
effects.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Simon Riggs

On Sat, 2008-08-30 at 02:04 +0900, Hitoshi Harada wrote:

> Here's the latest window functions patch against HEAD. It seems to be
> ready for the September commit fest, as added documents, WINDOW clause
> feature and misc tests. I guess this would be the window functions
> feature freeze for 8.4. The remaining feature will be implemented for
> the later release.
> 
> This patch consists of:
> - windowed aggregates
> - cooperate with GROUP BY aggregates
> - some optimizations with multiple windows
> - ranking functions
> - WINDOW clause
> - windowing SQL regression tests
> - sgml documents
> 
> Looking up the total road map, the dropped features are:
> 
> - sliding window (window framing)
> - lead(), lag(), etc. that reach for random rows
> - user defined window functions
> 
> The first and second topics are difficult to implement currently.
> Because these features require random row access, it seems that
> tuplestore would be able to save multiple positions to mark/restore.
> This is fundamental change that is over my capability. Also, user
> defined window functions seem to have much more to decide. I think I
> can't put into shape the general needs of user's window functions now.
> Lacking these feature, this stage looks compatible to SQLServer 2005,
> while Oracle and DB2 have almost full of the specification.

If you've done all of that, then I'm impressed. Well done.

Few general comments

* The docs talk about "windowing functions", yet you talk about "window
functions" here. I think the latter is correct, but whichever we choose
we should be consistent (and hopefully matching SQL Standard).

* You don't use duplicate the examples from the docs into the tests,
which is always a good way to get conflicting reports from users. :-)

* The tests seem very light for such a huge range of new functionality.
(8 tests is hardly sufficient). I'd like to see a wide range of tests -
probably 5-10 times as many individual test statements. I would also
like to see test failures that illustrate the as-yet unimplemented
features and the warning messages that are thrown - this will help us
understand exactly what is missing also. It would also be useful to see
other common coding mistakes/misconceptions and the corresponding error
messages.

> Also, current implementation has only a type of plan which uses sort
> operation. It should be optimized by re-position the windows and/or
> using hashtable.

I would like to see some performance test results also. It would be good
to know whether they are fast/slow etc.. It will definitely help the
case for inclusion if they are faster than alternative multi-statement
approaches to solving the basic data access problems.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Hitoshi Harada

2008/9/2 Heikki Linnakangas <[EMAIL PROTECTED]>:
> Gregory Stark wrote:
>> What would the executor do for a query like
>>
>> SELECT lead(x,1),lead(y,2),lead(y,3)
>>
>> It would not only have to keep a tuplestore to buffer the output but it
>> would
>> have to deal with receiving data from different SRFs at different times.
>> The
>> best approach I can think of would be to keep a tuplestore for each SRF
>> using
>> themas queues, reading old values from the head as soon as they all have
>> at
>> least one new value in them.
>
> Hitoshi solved that creating a separate Window node for each window
> function. So the plan tree for that would look something like:
>
> Window (lead(x,1))
>  Window (lead(y,2))
>Window (lead(y,3))
>  Seq Scan ...
>
> That keeps the Window node implementation quite simple because it only needs
> to handle one window function at a time.

To say more accurately, one Window node can handle more than one
window function. If it is thought to be the same using equalFuncs
comparing targetlists, some functions are put into one node.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Stefan Kaltenbrunner


Tom Lane wrote:

David Fetter <[EMAIL PROTECTED]> writes:

On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:

It's not like we haven't seen a SQL draft go down in flames before.



Do you think that anything in the windowing functions section will
disappear?


Who's to say?

I have no objection to looking at the 2003 and 200n documents in
parallel, especially if there are places where 200n clarifies the
intent of 2003.  But I'd be suspicious of designing around
entirely-new features presented by 200n.


well http://www.wiscorp.com/SQLStandards.html

states:

"This points to the documents which wlll likely be the documents that 
represent the SQL 2008 Standard. These documents are out for 
International Standard ballot at this time. The vote is an Up/Down vote. 
No changes allowed."



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-02 Thread Tom Lane

David Fetter <[EMAIL PROTECTED]> writes:
> On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
>> It's not like we haven't seen a SQL draft go down in flames before.

> Do you think that anything in the windowing functions section will
> disappear?

Who's to say?

I have no objection to looking at the 2003 and 200n documents in
parallel, especially if there are places where 200n clarifies the
intent of 2003.  But I'd be suspicious of designing around
entirely-new features presented by 200n.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-01 Thread David Fetter

On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> > David Fetter wrote:
> >> On Mon, Sep 01, 2008 at 09:00:47PM +0300, Heikki Linnakangas wrote:
> >>> Ok, I'm starting to read up on SQL2003 window functions,
> >> 
> >> Maybe it would be better to read the SQL2008 standard
> >>  :)
> 
> > Ah, thanks!
> 
> Er, s/2008 standard/200n draft with uncertain chances of passage/
> 
> It's not like we haven't seen a SQL draft go down in flames before.

Do you think that anything in the windowing functions section will
disappear?

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 111 matches

Mail list logo