Re: FETCH FIRST clause WITH TIES option

2020-05-06 Thread Alvaro Herrera
On 2020-Apr-08, Andrew Gierth wrote:

> > "Alvaro" == Alvaro Herrera  writes:
> 
>  >> This needs to be fixed in ruleutils, IMO, not by changing what the
>  >> grammar accepts.
> 
>  Alvaro> Fair. I didn't change what the grammar accepts. I ended up only
>  Alvaro> throwing an error in parse analysis when a bare NULL const is
>  Alvaro> seen.
> 
> This seems far too arbitrary to me.

Oops, I saw this comment only now.

We're still on time to fix this, if there's something that needs fixing.
Let's discuss what changes are needed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2020-04-07 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 >> This needs to be fixed in ruleutils, IMO, not by changing what the
 >> grammar accepts.

 Alvaro> Fair. I didn't change what the grammar accepts. I ended up only
 Alvaro> throwing an error in parse analysis when a bare NULL const is
 Alvaro> seen.

This seems far too arbitrary to me.

-- 
Andrew (irc:RhodiumToad)




Re: FETCH FIRST clause WITH TIES option

2020-04-07 Thread Alvaro Herrera
On 2020-Apr-08, Andrew Gierth wrote:

> > "Alvaro" == Alvaro Herrera  writes:
> 
>  Alvaro> It turns out that the SQL standard is much more limited in what
>  Alvaro> it will accept there. But our grammar (what we'll accept for
>  Alvaro> the ancient LIMIT clause) is very lenient -- it'll take just
>  Alvaro> any expression. I thought about reducing that to NumericOnly
>  Alvaro> for FETCH FIRST .. WITH TIES, but then I have to pick: 1)
>  Alvaro> gram.y fails to compile because of a reduce/reduce conflict, or
>  Alvaro> 2) also restricting FETCH FIRST .. ONLY to NumericOnly. Neither
>  Alvaro> of those seemed very palatable.
> 
> FETCH FIRST ... ONLY was made _too_ restrictive initially, such that it
> didn't allow parameters (which are allowed by the spec); see 1da162e1f.

Hmm, oh, I see.

> (That change didn't present a problem for ruleutils, because FETCH FIRST
> ... ONLY is output as a LIMIT clause instead.)

Right, I noticed that and kept it unchanged.

> This needs to be fixed in ruleutils, IMO, not by changing what the
> grammar accepts.

Fair.  I didn't change what the grammar accepts.  I ended up only
throwing an error in parse analysis when a bare NULL const is seen.
I guess we could fix ruleutils to add parens when NULL is seen, but
I'm not sure it's necessary or useful.  (LIMIT uses a null to represent
the LIMIT ALL case.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2020-04-07 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 Alvaro> It turns out that the SQL standard is much more limited in what
 Alvaro> it will accept there. But our grammar (what we'll accept for
 Alvaro> the ancient LIMIT clause) is very lenient -- it'll take just
 Alvaro> any expression. I thought about reducing that to NumericOnly
 Alvaro> for FETCH FIRST .. WITH TIES, but then I have to pick: 1)
 Alvaro> gram.y fails to compile because of a reduce/reduce conflict, or
 Alvaro> 2) also restricting FETCH FIRST .. ONLY to NumericOnly. Neither
 Alvaro> of those seemed very palatable.

FETCH FIRST ... ONLY was made _too_ restrictive initially, such that it
didn't allow parameters (which are allowed by the spec); see 1da162e1f.

(That change didn't present a problem for ruleutils, because FETCH FIRST
... ONLY is output as a LIMIT clause instead.)

This needs to be fixed in ruleutils, IMO, not by changing what the
grammar accepts.

-- 
Andrew.




Re: FETCH FIRST clause WITH TIES option

2020-04-07 Thread Alvaro Herrera
Hello

On 2020-Apr-07, Andres Freund wrote:

> On 2020-04-07 16:36:54 -0400, Alvaro Herrera wrote:
> > Pushed, with some additional changes.
> 
> This triggers a new warning for me (gcc-10):
> /home/andres/src/postgresql/src/backend/executor/nodeLimit.c: In function 
> ‘ExecLimit’:
> /home/andres/src/postgresql/src/backend/executor/nodeLimit.c:136:7: warning: 
> this statement may fall through [-Wimplicit-fallthrough=]
>   136 |if (ScanDirectionIsForward(direction))
>   |   ^
> /home/andres/src/postgresql/src/backend/executor/nodeLimit.c:216:3: note: here
>   216 |   case LIMIT_WINDOWEND_TIES:
>   |   ^~~~
> 
> I've not looked at it in any sort of detail, but it looks like it might
> be a false positive, with the "fall-through" comment not being
> sufficient to quiesce the compiler?

It's on purpose, yeah, but I can understand the compiler not getting it.

> Cosmetically I would agree that falling through to the next case" a few
> blocks deep inside a case: isn't the prettiest...

That's true ... maybe a fix would be to split that stuff to a
subroutine?

Thanks

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2020-04-07 Thread Andres Freund
Hi,

On 2020-04-07 16:36:54 -0400, Alvaro Herrera wrote:
> Pushed, with some additional changes.

This triggers a new warning for me (gcc-10):
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c: In function 
‘ExecLimit’:
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c:136:7: warning: 
this statement may fall through [-Wimplicit-fallthrough=]
  136 |if (ScanDirectionIsForward(direction))
  |   ^
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c:216:3: note: here
  216 |   case LIMIT_WINDOWEND_TIES:
  |   ^~~~

I've not looked at it in any sort of detail, but it looks like it might
be a false positive, with the "fall-through" comment not being
sufficient to quiesce the compiler?

Cosmetically I would agree that falling through to the next case" a few
blocks deep inside a case: isn't the prettiest...

- Andres




Re: FETCH FIRST clause WITH TIES option

2020-04-07 Thread Alvaro Herrera
Pushed, with some additional changes.

So *of course* when I add tests to verify that ruleutils, I find a case
that does not work properly -- meaning, you get a view that pg_dump
emits in a way that won't be accepted:

CREATE VIEW limit_thousand_v_3 AS SELECT thousand FROM onek WHERE thousand < 995
ORDER BY thousand FETCH FIRST NULL ROWS WITH TIES;

note the "NULL" there.  ruleutils would gladly print this out as:

View definition:
 SELECT onek.thousand
   FROM onek
  WHERE onek.thousand < 995
  ORDER BY onek.thousand
 FETCH FIRST NULL::integer ROWS WITH TIES;

which is then not accepted.

The best fix I could come up for this was to reject a bare NULL in the
limit clause.  It's a very stupid fix, because you can still give it a
NULL, using
CREATE VIEW limit_thousand_v_3 AS SELECT thousand FROM onek WHERE thousand < 995
ORDER BY thousand FETCH FIRST (NULL+1) ROWS WITH TIES;
and the like.  But when ruleutils get this, it will add the parens,
which will magically make it work.

It turns out that the SQL standard is much more limited in what it will
accept there.  But our grammar (what we'll accept for the ancient LIMIT
clause) is very lenient -- it'll take just any expression.  I thought
about reducing that to NumericOnly for FETCH FIRST .. WITH TIES, but
then I have to pick: 1) gram.y fails to compile because of a
reduce/reduce conflict, or 2) also restricting FETCH FIRST .. ONLY to
NumericOnly.  Neither of those seemed very palatable.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2020-04-06 Thread Alvaro Herrera
Some ruleutils.c code added by this patch is not covered by tests:

5246 : /* Add the LIMIT clause if given */
52471115 : if (query->limitOffset != NULL)
5248 : {
5249   0 : appendContextKeyword(context, " OFFSET ",
5250 :  -PRETTYINDENT_STD, 
PRETTYINDENT_STD, 0);
5251   0 : get_rule_expr(query->limitOffset, context, 
false);
5252 : }
52531115 : if (query->limitOption == LIMIT_OPTION_WITH_TIES)
5254 : {
5255   0 : appendContextKeyword(context, " FETCH FIRST ",
5256 :  -PRETTYINDENT_STD, 
PRETTYINDENT_STD, 0);
5257   0 : get_rule_expr(query->limitCount, context, false);
5258   0 : appendContextKeyword(context, " ROWS WITH TIES ",
5259 :  -PRETTYINDENT_STD, 
PRETTYINDENT_STD, 0);
5260 : }
52611115 : if (query->limitCount != NULL && query->limitOption 
!= LIMIT_OPTION_WITH_TIES)
5262 : {
5263   2 : appendContextKeyword(context, " LIMIT ",
5264 :  -PRETTYINDENT_STD, 
PRETTYINDENT_STD, 0);
5265   2 : if (IsA(query->limitCount, Const) &&
5266   0 : ((Const *) query->limitCount)->constisnull)
5267   0 : appendStringInfoString(buf, "ALL");
5268 : else
5269   2 : get_rule_expr(query->limitCount, context, 
false);
5270 : }

Other than that, the patch seems good to go to me, so unless there are
objections, I intend to get this pushed tomorrow.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2020-04-06 Thread Alvaro Herrera
There was a minor conflict in planmain.h.  Here's a refreshed version.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 51dee439e5165efad186c48d1a3b9119f90542e8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 6 Apr 2020 20:31:17 -0400
Subject: [PATCH v16] FETCH FIRST WITH TIES

XXX must bump catversion

Author: Surafel Temesgen
Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_se...@mail.gmail.com
Discussion: https://postgr.es/m/87o8wvz253@news-spur.riddles.org.uk
---
 doc/src/sgml/ref/select.sgml|  15 ++-
 src/backend/catalog/sql_features.txt|   2 +-
 src/backend/executor/nodeLimit.c| 169 +---
 src/backend/nodes/copyfuncs.c   |   7 +
 src/backend/nodes/equalfuncs.c  |   2 +
 src/backend/nodes/outfuncs.c|   7 +
 src/backend/nodes/readfuncs.c   |   6 +
 src/backend/optimizer/plan/createplan.c |  44 +-
 src/backend/optimizer/plan/planner.c|   1 +
 src/backend/optimizer/util/pathnode.c   |   2 +
 src/backend/parser/analyze.c|   3 +
 src/backend/parser/gram.y   | 117 
 src/backend/utils/adt/ruleutils.c   |  10 +-
 src/include/nodes/execnodes.h   |   4 +
 src/include/nodes/nodes.h   |  13 ++
 src/include/nodes/parsenodes.h  |   2 +
 src/include/nodes/pathnodes.h   |   1 +
 src/include/nodes/plannodes.h   |   5 +
 src/include/optimizer/pathnode.h|   1 +
 src/include/optimizer/planmain.h|   5 +-
 src/test/regress/expected/limit.out | 104 +++
 src/test/regress/sql/limit.sql  |  31 +
 22 files changed, 491 insertions(+), 60 deletions(-)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 691e402803..2b11c38087 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1438,7 +1438,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1448,10 +1448,13 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
-and ROWS as well as FIRST
-and NEXT are noise words that don't influence
-the effects of these clauses.
+The WITH TIES option is used to return any additional
+rows that tie for the last place in the result set according to
+ORDER BY clause; ORDER BY
+is mandatory in this case.
+ROW and ROWS as well as
+FIRST and NEXT are noise
+words that don't influence the effects of these clauses.
 According to the standard, the OFFSET clause must come
 before the FETCH clause if both are present; but
 PostgreSQL is laxer and allows either order.
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3a40b027d4..b6e58e8493 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -345,7 +345,7 @@ F863	Nested  in 			YES
 F864	Top-level  in views			YES	
 F865	 in 			YES	
 F866	FETCH FIRST clause: PERCENT option			NO	
-F867	FETCH FIRST clause: WITH TIES option			NO	
+F867	FETCH FIRST clause: WITH TIES option			YES	
 R010	Row pattern recognition: FROM clause			NO	
 R020	Row pattern recognition: WINDOW clause			NO	
 R030	Row pattern recognition: full aggregate support			NO	
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index d792e97cfa..00e986c027 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -102,6 +103,15 @@ ExecLimit(PlanState *pstate)
 	node->lstate = LIMIT_EMPTY;
 	return NULL;
 }
+/*
+ * Tuple at limit is needed for comparation in subsequent execution
+ * to detect ties.
+ */
+if (node->limitOption == LIMIT_OPTION_WITH_TIES &&
+	node->position - node->offset == node->count - 1)
+{
+

Re: FETCH FIRST clause WITH TIES option

2020-01-21 Thread Surafel Temesgen
On Thu, Nov 28, 2019 at 5:49 PM Alvaro Herrera 
wrote:

> On 2019-Nov-28, Surafel Temesgen wrote:
>
> > On Thu, Nov 28, 2019 at 12:36 AM Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
> >
> > > I think you should add a /* fall-though */ comment after changing
> state.
> > > Like this (this flow seems clearer; also DRY):
> > >
> > > if (!node->noCount &&
> > > node->position - node->offset >= node->count)
> > > {
> > > if (node->limitOption == LIMIT_OPTION_COUNT)
> > > {
> > > node->lstate = LIMIT_WINDOWEND;
> > > return NULL;
> > > }
> > > else
> > > {
> > > node->lstate = LIMIT_WINDOWEND_TIES;
> > > /* fall-through */
> > > }
> > > }
> > > else
> > > ...
> >
> > changed
>
> But you did not read my code snippet, did you ...?
>

I don't see it. changed to it and rebased to current master

regards
Surafel
From 944626bf7568604bd0a1b5785c6db009baf39d0c Mon Sep 17 00:00:00 2001
From: Surafel Temesgen 
Date: Wed, 22 Jan 2020 09:10:59 +0300
Subject: [PATCH]  fetch first with ties

---
 doc/src/sgml/ref/select.sgml|  15 ++-
 src/backend/catalog/sql_features.txt|   2 +-
 src/backend/executor/nodeLimit.c| 169 +---
 src/backend/nodes/copyfuncs.c   |   7 +
 src/backend/nodes/equalfuncs.c  |   2 +
 src/backend/nodes/outfuncs.c|   7 +
 src/backend/nodes/readfuncs.c   |   6 +
 src/backend/optimizer/plan/createplan.c |  44 +-
 src/backend/optimizer/plan/planner.c|   1 +
 src/backend/optimizer/util/pathnode.c   |   2 +
 src/backend/parser/analyze.c|   3 +
 src/backend/parser/gram.y   | 117 
 src/backend/utils/adt/ruleutils.c   |  10 +-
 src/include/nodes/execnodes.h   |   4 +
 src/include/nodes/nodes.h   |  13 ++
 src/include/nodes/parsenodes.h  |   2 +
 src/include/nodes/pathnodes.h   |   1 +
 src/include/nodes/plannodes.h   |   5 +
 src/include/optimizer/pathnode.h|   1 +
 src/include/optimizer/planmain.h|   3 +-
 src/test/regress/expected/limit.out | 104 +++
 src/test/regress/sql/limit.sql  |  31 +
 22 files changed, 489 insertions(+), 60 deletions(-)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 691e402803..2b11c38087 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1438,7 +1438,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1448,10 +1448,13 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
-and ROWS as well as FIRST
-and NEXT are noise words that don't influence
-the effects of these clauses.
+The WITH TIES option is used to return any additional
+rows that tie for the last place in the result set according to
+ORDER BY clause; ORDER BY
+is mandatory in this case.
+ROW and ROWS as well as
+FIRST and NEXT are noise
+words that don't influence the effects of these clauses.
 According to the standard, the OFFSET clause must come
 before the FETCH clause if both are present; but
 PostgreSQL is laxer and allows either order.
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f840ddfd2..746c9a2516 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -345,7 +345,7 @@ F863	Nested  in 			YES
 F864	Top-level  in views			YES	
 F865	 in 			YES	
 F866	FETCH FIRST clause: PERCENT option			NO	
-F867	FETCH FIRST clause: WITH TIES option			NO	
+F867	FETCH FIRST clause: WITH TIES option			YES	
 R010	Row pattern recognition: FRO

Re: FETCH FIRST clause WITH TIES option

2020-01-05 Thread Tomas Vondra

On Fri, Nov 29, 2019 at 05:19:00AM +, Andrew Gierth wrote:

"Alvaro" == Alvaro Herrera  writes:


Alvaro> First, I noticed that there's a significant unanswered issue
Alvaro> from Andrew Gierth about this using a completely different
Alvaro> mechanism, namely an implicit window function. Robert was
Alvaro> concerned about the performance of Andrew's proposed approach,
Alvaro> but I don't see any reply from Surafel on the whole issue.

Alvaro> Andrew: Would you rather not have this feature in this form at
Alvaro> all?

I have done a proof-of-concept hack for my alternative approach, which I
will post in another thread so as not to confuse the cfbot.

The main advantage, as I see it, of a window function approach is that
it can also work for PERCENT with only a change to the generated
expression; the executor part of the code can handle any stopping
condition. It can also be generalized (though the POC does not do this)
to allow an SQL-level extension, something like "LIMIT WHEN condition"
to indicate that it should stop just before the first row for which the
condition is true. Whether this is a desirable feature or not is another
question.



For the record, the alternative approach was posted here:

https://www.postgresql.org/message-id/flat/87o8wvz253@news-spur.riddles.org.uk

I think we need to make a decision about which road to take - shall we
continue with what Surafel originally proposed, or should we instead
adopt Andrew's patch?

Andrew's patch looks significantly simpler, although it probably will
need more work to get it committable. My main concern about using window
functions was performance, so I think we should benchmark. For example,
considering window functions are not parallel safe, would this have
negative impact on parallel queries?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2019-11-28 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 Alvaro> First, I noticed that there's a significant unanswered issue
 Alvaro> from Andrew Gierth about this using a completely different
 Alvaro> mechanism, namely an implicit window function. Robert was
 Alvaro> concerned about the performance of Andrew's proposed approach,
 Alvaro> but I don't see any reply from Surafel on the whole issue.
 
 Alvaro> Andrew: Would you rather not have this feature in this form at
 Alvaro> all?

I have done a proof-of-concept hack for my alternative approach, which I
will post in another thread so as not to confuse the cfbot.

The main advantage, as I see it, of a window function approach is that
it can also work for PERCENT with only a change to the generated
expression; the executor part of the code can handle any stopping
condition. It can also be generalized (though the POC does not do this)
to allow an SQL-level extension, something like "LIMIT WHEN condition"
to indicate that it should stop just before the first row for which the
condition is true. Whether this is a desirable feature or not is another
question.

-- 
Andrew.




Re: FETCH FIRST clause WITH TIES option

2019-11-28 Thread Alvaro Herrera
On 2019-Nov-28, Surafel Temesgen wrote:

> On Thu, Nov 28, 2019 at 12:36 AM Alvaro Herrera 
> wrote:
> 
> > I think you should add a /* fall-though */ comment after changing state.
> > Like this (this flow seems clearer; also DRY):
> >
> > if (!node->noCount &&
> > node->position - node->offset >= node->count)
> > {
> > if (node->limitOption == LIMIT_OPTION_COUNT)
> > {
> > node->lstate = LIMIT_WINDOWEND;
> > return NULL;
> > }
> > else
> > {
> > node->lstate = LIMIT_WINDOWEND_TIES;
> > /* fall-through */
> > }
> > }
> > else
> > ...
>
> changed

But you did not read my code snippet, did you ...?

> > I think you need to stare a that thing a little harder.
>
> This is because the new state didn't know about backward scan
> and there was off by one error it scan one position past to detect
> ties. The attached patch fix both

Oh, thanks, glad it was that easy.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2019-11-28 Thread Surafel Temesgen
On Thu, Nov 28, 2019 at 12:36 AM Alvaro Herrera 
wrote:

> Thanks.
>
> (I would suggest renaming the new state LIMIT_WINDOWEND_TIES, because it
> seems to convey what it does a little better.)
>
> I think you should add a /* fall-though */ comment after changing state.
> Like this (this flow seems clearer; also DRY):
>
> if (!node->noCount &&
> node->position - node->offset >= node->count)
> {
> if (node->limitOption == LIMIT_OPTION_COUNT)
> {
> node->lstate = LIMIT_WINDOWEND;
> return NULL;
> }
> else
> {
> node->lstate = LIMIT_WINDOWEND_TIES;
> /* fall-through */
> }
> }
> else
> ...
>
>
changed


> I've been playing with this a little more, and I think you've overlooked
> a few places in ExecLimit that need to be changed.  In the regression
> database, I tried this:
>
> 55432 13devel 17282=# begin;
> BEGIN
> 55432 13devel 17282=# declare c1 cursor for select * from int8_tbl order
> by q1 fetch first 2 rows with ties;
> DECLARE CURSOR
> 55432 13devel 17282=# fetch all in c1;
>  q1  │q2
> ─┼──
>  123 │  456
>  123 │ 4567890123456789
> (2 filas)
>
> 55432 13devel 17282=# fetch all in c1;
>  q1 │ q2
> ┼
> (0 filas)
>
> 55432 13devel 17282=# fetch forward all in c1;
>  q1 │ q2
> ┼
> (0 filas)
>
> So far so good .. things look normal.  But here's where things get
> crazy:
>
> 55432 13devel 17282=# fetch backward all in c1;
> q1│q2
> ──┼──
>  4567890123456789 │  123
>   123 │ 4567890123456789
> (2 filas)
>
> (huh???)
>
> 55432 13devel 17282=# fetch backward all in c1;
>  q1 │ q2
> ┼
> (0 filas)
>
> Okay -- ignoring the fact that we got a row that should not be in the
> result, we've exhausted the cursor going back.  Let's scan it again from
> the start:
>
> 55432 13devel 17282=# fetch forward all in c1;
> q1│q2
> ──┼───
>   123 │  4567890123456789
>  4567890123456789 │   123
>  4567890123456789 │  4567890123456789
>  4567890123456789 │ -4567890123456789
> (4 filas)
>
> This is just crazy.
>
> I think you need to stare a that thing a little harder.
>
>
This is because the new state didn't know about backward scan
and there was off by one error it scan one position past to detect
ties. The attached patch fix both
regards
Surafel
From cad411bf6bd5ffc7f5bb5cb8d9ce456fc55ac694 Mon Sep 17 00:00:00 2001
From: Surafel Temesgen 
Date: Thu, 28 Nov 2019 16:18:45 +0300
Subject: [PATCH] FETCH FIRST clause WITH TIES option

---
 doc/src/sgml/ref/select.sgml|  15 ++-
 src/backend/catalog/sql_features.txt|   2 +-
 src/backend/executor/nodeLimit.c| 156 +---
 src/backend/nodes/copyfuncs.c   |   7 ++
 src/backend/nodes/equalfuncs.c  |   2 +
 src/backend/nodes/outfuncs.c|   7 ++
 src/backend/nodes/readfuncs.c   |   6 +
 src/backend/optimizer/plan/createplan.c |  44 ++-
 src/backend/optimizer/plan/planner.c|   1 +
 src/backend/optimizer/util/pathnode.c   |   2 +
 src/backend/parser/analyze.c|   3 +
 src/backend/parser/gram.y   | 117 ++
 src/backend/utils/adt/ruleutils.c   |  10 +-
 src/include/nodes/execnodes.h   |   4 +
 src/include/nodes/nodes.h   |  13 ++
 src/include/nodes/parsenodes.h  |   2 +
 src/include/nodes/pathnodes.h   |   1 +
 src/include/nodes/plannodes.h   |   5 +
 src/include/optimizer/pathnode.h|   1 +
 src/include/optimizer/planmain.h|   3 +-
 src/test/regress/expected/limit.out | 104 
 src/test/regress/sql/limit.sql  |  31 +
 22 files changed, 482 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 691e402803..2b11c38087 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_

Re: FETCH FIRST clause WITH TIES option

2019-11-27 Thread Alvaro Herrera
Thanks.

(I would suggest renaming the new state LIMIT_WINDOWEND_TIES, because it
seems to convey what it does a little better.)

I think you should add a /* fall-though */ comment after changing state.
Like this (this flow seems clearer; also DRY):

if (!node->noCount &&
node->position - node->offset >= node->count)
{
if (node->limitOption == LIMIT_OPTION_COUNT)
{
node->lstate = LIMIT_WINDOWEND;
return NULL;
}
else
{
node->lstate = LIMIT_WINDOWEND_TIES;
/* fall-through */
}
}
else
...

I've been playing with this a little more, and I think you've overlooked
a few places in ExecLimit that need to be changed.  In the regression
database, I tried this:

55432 13devel 17282=# begin;
BEGIN
55432 13devel 17282=# declare c1 cursor for select * from int8_tbl order by q1 
fetch first 2 rows with ties;
DECLARE CURSOR
55432 13devel 17282=# fetch all in c1;
 q1  │q2
─┼──
 123 │  456
 123 │ 4567890123456789
(2 filas)

55432 13devel 17282=# fetch all in c1;
 q1 │ q2 
┼
(0 filas)

55432 13devel 17282=# fetch forward all in c1;
 q1 │ q2 
┼
(0 filas)

So far so good .. things look normal.  But here's where things get
crazy:

55432 13devel 17282=# fetch backward all in c1;
q1│q2
──┼──
 4567890123456789 │  123
  123 │ 4567890123456789
(2 filas)

(huh???)

55432 13devel 17282=# fetch backward all in c1;
 q1 │ q2 
┼
(0 filas)

Okay -- ignoring the fact that we got a row that should not be in the
result, we've exhausted the cursor going back.  Let's scan it again from
the start:

55432 13devel 17282=# fetch forward all in c1;
q1│q2 
──┼───
  123 │  4567890123456789
 4567890123456789 │   123
 4567890123456789 │  4567890123456789
 4567890123456789 │ -4567890123456789
(4 filas)

This is just crazy.

I think you need to stare a that thing a little harder.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2019-11-27 Thread Surafel Temesgen
On Tue, Nov 26, 2019 at 6:09 PM Alvaro Herrera 
wrote:

> I rebased this patch, and my proposed changes are in 0002.
>

Thank you

>
> Looking at the changes in ExecLimit, I'm wondering if it would be better
> to add a new state to the state machine there -- instead of doing all
> the work in duplicative code in the LIMIT_INWINDOW case, have that one
> only save the current end-of-window tuple and jump to
> LIMIT_WINDOWEND_WITH_TIE (or something) which then returns all tuples that
> match the current one.  I think that would end up being cleaner.  Please
> research.
>

Done
regards
Surafel
From 9d02731c2d96bedb72d5f526a64f1fd28ee334c5 Mon Sep 17 00:00:00 2001
From: Surafel Temesgen 
Date: Wed, 27 Nov 2019 11:46:45 +0300
Subject: [PATCH] limit with ties

---
 doc/src/sgml/ref/select.sgml|  15 +--
 src/backend/catalog/sql_features.txt|   2 +-
 src/backend/executor/nodeLimit.c| 117 
 src/backend/nodes/copyfuncs.c   |   7 ++
 src/backend/nodes/equalfuncs.c  |   2 +
 src/backend/nodes/outfuncs.c|   7 ++
 src/backend/nodes/readfuncs.c   |   6 ++
 src/backend/optimizer/plan/createplan.c |  44 -
 src/backend/optimizer/plan/planner.c|   1 +
 src/backend/optimizer/util/pathnode.c   |   2 +
 src/backend/parser/analyze.c|   3 +
 src/backend/parser/gram.y   | 117 ++--
 src/backend/utils/adt/ruleutils.c   |  10 +-
 src/include/nodes/execnodes.h   |   4 +
 src/include/nodes/nodes.h   |  13 +++
 src/include/nodes/parsenodes.h  |   2 +
 src/include/nodes/pathnodes.h   |   1 +
 src/include/nodes/plannodes.h   |   5 +
 src/include/optimizer/pathnode.h|   1 +
 src/include/optimizer/planmain.h|   3 +-
 src/test/regress/expected/limit.out |  52 +++
 src/test/regress/sql/limit.sql  |  21 +
 22 files changed, 381 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 691e402803..2b11c38087 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1438,7 +1438,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1448,10 +1448,13 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
-and ROWS as well as FIRST
-and NEXT are noise words that don't influence
-the effects of these clauses.
+The WITH TIES option is used to return any additional
+rows that tie for the last place in the result set according to
+ORDER BY clause; ORDER BY
+is mandatory in this case.
+ROW and ROWS as well as
+FIRST and NEXT are noise
+words that don't influence the effects of these clauses.
 According to the standard, the OFFSET clause must come
 before the FETCH clause if both are present; but
 PostgreSQL is laxer and allows either order.
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index ab3e381cff..cd720eb19a 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -345,7 +345,7 @@ F863	Nested  in 			YES
 F864	Top-level  in views			YES	
 F865	 in 			YES	
 F866	FETCH FIRST clause: PERCENT option			NO	
-F867	FETCH FIRST clause: WITH TIES option			NO	
+F867	FETCH FIRST clause: WITH TIES option			YES	
 R010	Row pattern recognition: FROM clause			NO	
 R020	Row pattern recognition: WINDOW clause			NO	
 R030	Row pattern recognition: full aggregate support			NO	
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5e4d02ce4a..1b96a6ee8f 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -102,6 +103,15 @@ ExecLimit(PlanState *pstate)
 	node->lstate = LIMIT_EMPTY;
 	return NULL;
 }
+/*
+ * Tuple at limit is needed 

Re: FETCH FIRST clause WITH TIES option

2019-11-26 Thread Alvaro Herrera
I rebased this patch, and my proposed changes are in 0002.

Looking at the changes in ExecLimit, I'm wondering if it would be better
to add a new state to the state machine there -- instead of doing all
the work in duplicative code in the LIMIT_INWINDOW case, have that one
only save the current end-of-window tuple and jump to
LIMIT_WINDOWEND_WITH_TIE (or something) which then returns all tuples that
match the current one.  I think that would end up being cleaner.  Please
research.

Marking Waiting on Author.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 0860d9e68ec35a641e3cdf910d1064670dd4cb1d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 25 Nov 2019 17:09:50 -0300
Subject: [PATCH v14 1/2] v13

---
 doc/src/sgml/ref/select.sgml|  15 +--
 src/backend/catalog/sql_features.txt|   2 +-
 src/backend/executor/nodeLimit.c| 109 +---
 src/backend/nodes/copyfuncs.c   |   7 ++
 src/backend/nodes/equalfuncs.c  |   2 +
 src/backend/nodes/outfuncs.c|   7 ++
 src/backend/nodes/readfuncs.c   |   6 ++
 src/backend/optimizer/plan/createplan.c |  44 +++-
 src/backend/optimizer/plan/planner.c|   1 +
 src/backend/optimizer/util/pathnode.c   |   2 +
 src/backend/parser/analyze.c|   3 +
 src/backend/parser/gram.y   | 129 
 src/backend/utils/adt/ruleutils.c   |  10 +-
 src/include/nodes/execnodes.h   |   3 +
 src/include/nodes/nodes.h   |  13 +++
 src/include/nodes/parsenodes.h  |   2 +
 src/include/nodes/pathnodes.h   |   1 +
 src/include/nodes/plannodes.h   |   5 +
 src/include/optimizer/pathnode.h|   1 +
 src/include/optimizer/planmain.h|   3 +-
 src/test/regress/expected/limit.out |  52 ++
 src/test/regress/sql/limit.sql  |  21 
 22 files changed, 393 insertions(+), 45 deletions(-)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 691e402803..2b11c38087 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1438,7 +1438,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1448,10 +1448,13 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
-and ROWS as well as FIRST
-and NEXT are noise words that don't influence
-the effects of these clauses.
+The WITH TIES option is used to return any additional
+rows that tie for the last place in the result set according to
+ORDER BY clause; ORDER BY
+is mandatory in this case.
+ROW and ROWS as well as
+FIRST and NEXT are noise
+words that don't influence the effects of these clauses.
 According to the standard, the OFFSET clause must come
 before the FETCH clause if both are present; but
 PostgreSQL is laxer and allows either order.
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index ab3e381cff..cd720eb19a 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -345,7 +345,7 @@ F863	Nested  in 			YES
 F864	Top-level  in views			YES	
 F865	 in 			YES	
 F866	FETCH FIRST clause: PERCENT option			NO	
-F867	FETCH FIRST clause: WITH TIES option			NO	
+F867	FETCH FIRST clause: WITH TIES option			YES	
 R010	Row pattern recognition: FROM clause			NO	
 R020	Row pattern recognition: WINDOW clause			NO	
 R030	Row pattern recognition: full aggregate support			NO	
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5e4d02ce4a..0152ab2c6e 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -102,6 +103,15 @@ ExecLimit(PlanState *pstate)
 	node->lstate = LIMIT_EMPTY;
 	return NULL;
 }
+/*
+ * Tuple at li

Re: FETCH FIRST clause WITH TIES option

2019-11-25 Thread Alvaro Herrera
(Prior to posting this delta patch, the CF bot appeared happy with this
patch.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2019-11-25 Thread Alvaro Herrera
On 2019-Nov-11, Alvaro Herrera wrote:

> I'm not sure the proposed changes to gram.y are all that great, though.

Here's a proposed simplification of the gram.y changes.  There are two
things here:

1. cosmetic: we don't need the LimitClause struct; we can use just
   SelectLimit, and return that from limit_clause; that can be
   complemented using the offset_clause if there's any at select_limit
   level.

2. there's a gratuituous palloc() in opt_select_limit when there's no
   clause, that seems to be there just so that NULLs can be returned.
   That's one extra palloc for SELECTs parsed using one the affected
   productions ... it's not every single select, but it seems bad enough
   it's worth fixing.

I fixed #2 by just checking whether the return from opt_select_limit is
NULL.  ISTM it'd be better to pass the SelectLimit pointer to
insertSelectOptions instead (which is a static function in gram.y anyway
so there's no difficulty there).

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e776215155..f4304a45b9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -135,13 +135,6 @@ typedef struct SelectLimit
 	LimitOption limitOption;
 } SelectLimit;
 
-/* Private struct for the result of limit_clause production */
-typedef struct LimitClause
-{
-	Node *limitCount;
-	LimitOption limitOption;
-} LimitClause;
-
 /* ConstraintAttributeSpec yields an integer bitmask of these flags: */
 #define CAS_NOT_DEFERRABLE			0x01
 #define CAS_DEFERRABLE0x02
@@ -258,8 +251,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	PartitionSpec		*partspec;
 	PartitionBoundSpec	*partboundspec;
 	RoleSpec			*rolespec;
-	struct SelectLimit	*SelectLimit;
-	struct LimitClause	*LimitClause;
+	struct SelectLimit	*selectlimit;
 }
 
 %type 	stmt schema_stmt
@@ -392,8 +384,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	import_qualification_type
 %type  import_qualification
 %type 	vacuum_relation
-%type  opt_select_limit select_limit
-%type  limit_clause
+%type  opt_select_limit select_limit limit_clause
 
 %type 	stmtblock stmtmulti
 OptTableElementList TableElementList OptInherit definition
@@ -11343,8 +11334,9 @@ select_no_parens:
 			| select_clause opt_sort_clause for_locking_clause opt_select_limit
 {
 	insertSelectOptions((SelectStmt *) $1, $2, $3,
-		($4)->limitOffset, ($4)->limitCount,
-		($4)->limitOption,
+		($4) ? ($4)->limitOffset : NULL,
+		($4) ? ($4)->limitCount : NULL,
+		($4) ? ($4)->limitOption : LIMIT_OPTION_DEFAULT,
 		NULL,
 		yyscanner);
 	$$ = $1;
@@ -11362,7 +11354,7 @@ select_no_parens:
 {
 	insertSelectOptions((SelectStmt *) $2, NULL, NIL,
 		NULL, NULL,
-		LIMIT_OPTION_DEFAULT,$1,
+		LIMIT_OPTION_DEFAULT, $1,
 		yyscanner);
 	$$ = $2;
 }
@@ -11370,15 +11362,16 @@ select_no_parens:
 {
 	insertSelectOptions((SelectStmt *) $2, $3, NIL,
 		NULL, NULL,
-		LIMIT_OPTION_DEFAULT,$1,
+		LIMIT_OPTION_DEFAULT, $1,
 		yyscanner);
 	$$ = $2;
 }
 			| with_clause select_clause opt_sort_clause for_locking_clause opt_select_limit
 {
 	insertSelectOptions((SelectStmt *) $2, $3, $4,
-		($5)->limitOffset, ($5)->limitCount,
-		($5)->limitOption,
+		($5) ? ($5)->limitOffset : NULL,
+		($5) ? ($5)->limitCount : NULL,
+		($5) ? ($5)->limitOption : LIMIT_OPTION_DEFAULT,
 		$1,
 		yyscanner);
 	$$ = $2;
@@ -11683,27 +11676,17 @@ sortby:		a_expr USING qual_all_Op opt_nulls_order
 select_limit:
 			limit_clause offset_clause
 {
-	SelectLimit *n = (SelectLimit *) palloc(sizeof(SelectLimit));
-	n->limitOffset = $2;
-	n->limitCount = ($1)->limitCount;
-	n->limitOption = ($1)->limitOption;
-	$$ = n;
+	$$ = $1;
+	($$)->limitOffset = $2;
 }
 			| offset_clause limit_clause
 {
-	SelectLimit *n = (SelectLimit *) palloc(sizeof(SelectLimit));
-	n->limitOffset = $1;
-	n->limitCount = ($2)->limitCount;
-	n->limitOption = ($2)->limitOption;
-	$$ = n;
+	$$ = $2;
+	($$)->limitOffset = $1;
 }
 			| limit_clause
 {
-	SelectLimit *n = (SelectLimit *) palloc(sizeof(SelectLimit));
-	n->limitOffset = NULL;
-	n->limitCount = ($1)->limitCount;
-	n->limitOption = ($1)->limitOption;
-	$$ = n;
+	$$ = $1;
 }
 			| offset_clause
 {
@@ -11717,20 +11700,14 @@ select_limit:
 
 opt_select_limit:
 			select_limit		{ $$ = $1; }
-			| /* EMPTY */
-{
-	SelectLimit *n = (SelectLimit *) palloc(sizeof(SelectLimit));
-	n->limitOffset = NULL;
-	n->limitCount = NULL;
-	n->limitOption = LIMIT_OPTION_DEFAULT;
-	$$ = n;
-}
+			| /* EMPTY */		{ $$ 

Re: FETCH FIRST clause WITH TIES option

2019-11-12 Thread Surafel Temesgen
On Mon, Nov 11, 2019 at 5:56 PM Alvaro Herrera 
wrote:

> First, I noticed that there's a significant unanswered issue from Andrew
> Gierth about this using a completely different mechanism, namely an
> implicit window function.  I see that Robert was concerned about the
> performance of Andrew's proposed approach, but I don't see any reply
> from Surafel on the whole issue.
>
>
i don't know much about window function implementation but am sure teaching
window
function about limit and implementing limit
*variant on top of it will not be much simpler *

*and performant than the current implementation. i also think more
appropriate place to *

*include limit variant is limitNode not other place which can case
redundancy  and *

*maintainability issue *


*regards *

*Surafel *


Re: FETCH FIRST clause WITH TIES option

2019-11-11 Thread Alvaro Herrera
(BTW another small issue in the patch is the comments in gram.y that
talk about ONLY being a reserved word about the SQL:2008 syntax; ISTM
that that comment should now mention both ONLY as well as WITH being
fully reserved words.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2019-11-11 Thread Alvaro Herrera
r count value is required by
@@ -1440,10 +1440,13 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
-and ROWS as well as FIRST
-and NEXT are noise words that don't influence
-the effects of these clauses.
+The WITH TIES option is used to return any additional
+rows that tie for the last place in the result set according to
+ORDER BY clause; ORDER BY
+is mandatory in this case.
+ROW and ROWS as well as
+FIRST and NEXT are noise
+words that don't influence the effects of these clauses.
 According to the standard, the OFFSET clause must come
 before the FETCH clause if both are present; but
 PostgreSQL is laxer and allows either order.
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index ab3e381cff..cd720eb19a 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -345,7 +345,7 @@ F863	Nested  in 			YES
 F864	Top-level  in views			YES	
 F865	 in 			YES	
 F866	FETCH FIRST clause: PERCENT option			NO	
-F867	FETCH FIRST clause: WITH TIES option			NO	
+F867	FETCH FIRST clause: WITH TIES option			YES	
 R010	Row pattern recognition: FROM clause			NO	
 R020	Row pattern recognition: WINDOW clause			NO	
 R030	Row pattern recognition: full aggregate support			NO	
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..04ff17a229 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -102,6 +103,15 @@ ExecLimit(PlanState *pstate)
 	node->lstate = LIMIT_EMPTY;
 	return NULL;
 }
+/*
+ * Tuple at limit is needed for comparation in subsequent execution
+ * to detect ties.
+ */
+if (node->limitOption == LIMIT_OPTION_WITH_TIES &&
+	node->position - node->offset == node->count - 1)
+{
+	ExecCopySlot(node->last_slot, slot);
+}
 node->subSlot = slot;
 if (++node->position > node->offset)
 	break;
@@ -126,12 +136,16 @@ ExecLimit(PlanState *pstate)
 			{
 /*
  * Forwards scan, so check for stepping off end of window. If
- * we are at the end of the window, return NULL without
- * advancing the subplan or the position variable; but change
- * the state machine state to record having done so.
+ * we are at the end of the window, the behavior depends whether
+ * ONLY or WITH TIES was specified.  In case of ONLY, we return
+ * NULL without advancing the subplan or the position variable;
+ * but change the state machine state to record having done so.
+ * In the WITH TIES mode, we need to advance the subplan until
+ * we find the first row with different ORDER BY pathkeys.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == LIMIT_OPTION_COUNT)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -144,18 +158,69 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
-
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == LIMIT_OPTION_WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		 * If we know we won't need to back up, we can release
+		 * resources at this point.
+		 */
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+
+	/*
+	 * Tuple at limit is neede

Re: FETCH FIRST clause WITH TIES option

2019-09-24 Thread Tom Lane
Alvaro Herrera  writes:
> create table w (a point);

> # select * from w order by a fetch first 2 rows with ties;
> ERROR:  could not identify an ordering operator for type point
> LINE 1: select * from w order by a fetch first 2 rows with ties;
>   ^
> HINT:  Use an explicit ordering operator or modify the query.

> I'm not sure that the HINT is useful here.

That's not new to this patch, HEAD does the same:

regression=# create table w (a point);
CREATE TABLE
regression=# select * from w order by a ;
ERROR:  could not identify an ordering operator for type point
LINE 1: select * from w order by a ;
 ^
HINT:  Use an explicit ordering operator or modify the query.

It is a meaningful hint IMO, since (in theory) you could add
something like "USING <<" to the ORDER BY to specify a
particular ordering operator.  The fact that no suitable
operator is actually available in core doesn't seem like
a reason not to give the hint.

regards, tom lane




Re: FETCH FIRST clause WITH TIES option

2019-09-24 Thread Alvaro Herrera
Hmm,

create table w (a point);

# select * from w order by a fetch first 2 rows with ties;
ERROR:  could not identify an ordering operator for type point
LINE 1: select * from w order by a fetch first 2 rows with ties;
  ^
HINT:  Use an explicit ordering operator or modify the query.

I'm not sure that the HINT is useful here.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2019-09-10 Thread Surafel Temesgen
On Fri, Sep 6, 2019 at 5:07 PM Alvaro Herrera from 2ndQuadrant <
alvhe...@alvh.no-ip.org> wrote:

> On 2019-Sep-06, Surafel Temesgen wrote:
>
> > > ... yet this doesn't appear to have resulted in any change in the code,
> > > or I just missed it.  Are you going to update the patch per that?
> >
> > Its already done in v9 of the patch attached by Tomas
>
> Oh, I missed that.  Great, so we're expecting some small updates from
> you then and I think you can mark ready for committer when you post it.
> Thanks.
>
>
Done
regards
Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..e83d309c5b 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for the last place in the result set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 059ec02cd0..54628657e6 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -349,7 +349,7 @@ F863	Nested  in 			YES
 F864	Top-level  in views			YES	
 F865	 in 			YES	
 F866	FETCH FIRST clause: PERCENT option			NO	
-F867	FETCH FIRST clause: WITH TIES option			NO	
+F867	FETCH FIRST clause: WITH TIES option			YES	
 R010	Row pattern recognition: FROM clause			NO	
 R020	Row pattern recognition: WINDOW clause			NO	
 R030	Row pattern recognition: full aggregate support			NO	
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..04ff17a229 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -102,6 +103,15 @@ ExecLimit(PlanState *pstate)
 	node->lstate = LIMIT_EMPTY;
 	return NULL;
 }
+/*
+ * Tuple at limit is needed for comparation in subsequent execution
+ * to detect ties.
+ */
+if (node->limitOption == LIMIT_OPTION_WITH_TIES &&
+	node->position - node->offset == node->count - 1)
+{
+	ExecCopySlot(node->last_slot, slot);
+}
 node->subSlot = slot;
 if (++node->position > node->offset)
 	break;
@@ -126,12 +136,16 @@ ExecLimit(PlanState *pstate)
 			{
 /*
  * Forwards scan, so check for stepping off end of window. If
- * we are at the end of the window, return NULL without
- * advancing the subplan or the position variable; but change
- * the state machine state to record having done so.
+ * we are at the end of the window, the behavior depends whether
+ * ONLY or WITH TIES was specified.  In case of ONLY, we return
+ * NULL without advancing the subplan or the position variable;
+ * but change the state machine state to record having done so.
+ * In the WITH TIES mode, we need to advance the subplan until
+ * we find the first row with different ORDER BY pathkeys.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == LIMIT_OPTION_COUNT)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -144,18 +158,69 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == LIMIT_OPTION_WITH_TIES)
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	

Re: FETCH FIRST clause WITH TIES option

2019-09-06 Thread Alvaro Herrera from 2ndQuadrant
On 2019-Sep-06, Surafel Temesgen wrote:

> > ... yet this doesn't appear to have resulted in any change in the code,
> > or I just missed it.  Are you going to update the patch per that?
>
> Its already done in v9 of the patch attached by Tomas

Oh, I missed that.  Great, so we're expecting some small updates from
you then and I think you can mark ready for committer when you post it.
Thanks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2019-09-06 Thread Surafel Temesgen
Hi Alvaro,
On Fri, Sep 6, 2019 at 1:52 AM Alvaro Herrera from 2ndQuadrant <
alvhe...@alvh.no-ip.org> wrote:

> As Tom just said in the thread for PERCENT, the gram.y changes need a
> better representation.  Also, rename EXACT_NUMBER, per that thread.
>
> As far as I can tell, this concerns feature F867.  I think we should
> mark that as supported after this patch -- please edit
> src/backend/catalog/sql_features.txt.
>

ok


>
> Earlier in the thread, Tomas Vondra said:
>
> > 3) I'm a bit confused by the initialization added to ExecInitLimit. It
> > first gets the tuple descriptor from the limitstate (it should not do
> so
>
> > directly but use ExecGetResultType). But when it creates the extra
> slot,
>
> > it uses ops extracted from the outer plan. That's strange, I guess ...
>
>
> >
>
>
> > And then it extracts the descriptor from the outer plan and uses it
> when
>
> > calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
>
>
> > the last_slot, which is using a descriptor from the limitstate.
>
>
> >
>
>
> > IMHO all of this should use descriptor/ops from the outer plan, no? It
>
>
> > probably does not change anything because limit does not project, but
> it
>
> > seems confusing.
>
>
>
> and you replied:
>
> > agree
>
> ... yet this doesn't appear to have resulted in any change in the code,
> or I just missed it.  Are you going to update the patch per that?
>
>
Its already done in v9 of the patch attached by Tomas

regards
Surafel


Re: FETCH FIRST clause WITH TIES option

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
As Tom just said in the thread for PERCENT, the gram.y changes need a
better representation.  Also, rename EXACT_NUMBER, per that thread.

As far as I can tell, this concerns feature F867.  I think we should
mark that as supported after this patch -- please edit
src/backend/catalog/sql_features.txt.

Earlier in the thread, Tomas Vondra said:

> 3) I'm a bit confused by the initialization added to ExecInitLimit. It
> first gets the tuple descriptor from the limitstate (it should not do so  
>   
> 
> directly but use ExecGetResultType). But when it creates the extra slot,  
>   
> 
> it uses ops extracted from the outer plan. That's strange, I guess ...
>   
> 
>   
>   
> 
> And then it extracts the descriptor from the outer plan and uses it when  
>   
> 
> calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
>   
> 
> the last_slot, which is using a descriptor from the limitstate.   
>   
> 
>   
>   
> 
> IMHO all of this should use descriptor/ops from the outer plan, no? It
>   
> 
> probably does not change anything because limit does not project, but it  
>   
> 
> seems confusing.  
>   
> 

and you replied:

> agree

... yet this doesn't appear to have resulted in any change in the code,
or I just missed it.  Are you going to update the patch per that?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

2019-07-03 Thread Surafel Temesgen
Hi Erik,
Thank you for looking at it.

Can you have a look?
>
>

It is because of the patch didn't handle that edge case.
attache patch contain a fix for it

regards
Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..e83d309c5b 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for the last place in the result set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..cf557ecb1c 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -102,6 +103,16 @@ ExecLimit(PlanState *pstate)
 	node->lstate = LIMIT_EMPTY;
 	return NULL;
 }
+
+/*
+ * Tuple at limit is needed for comparation in subsequent execution
+ * to detect ties.
+ */
+if (node->limitOption == WITH_TIES &&
+	node->position - node->offset == node->count - 1)
+{
+	ExecCopySlot(node->last_slot, slot);
+}
 node->subSlot = slot;
 if (++node->position > node->offset)
 	break;
@@ -126,12 +137,16 @@ ExecLimit(PlanState *pstate)
 			{
 /*
  * Forwards scan, so check for stepping off end of window. If
- * we are at the end of the window, return NULL without
- * advancing the subplan or the position variable; but change
- * the state machine state to record having done so.
+ * we are at the end of the window, the behavior depends whether
+ * ONLY or WITH TIES was specified.  In case of ONLY, we return
+ * NULL without advancing the subplan or the position variable;
+ * but change the state machine state to record having done so.
+ * In the WITH TIES mode, we need to advance the subplan until
+ * we find the first row with different ORDER BY pathkeys.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == EXACT_NUMBER)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -144,18 +159,69 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		 * If we know we won't need to back up, we can release
+		 * resources at this point.
+		 */
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+}
+else
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;

Re: FETCH FIRST clause WITH TIES option

2019-07-03 Thread Erik Rijkers

On 2019-07-01 19:38, Surafel Temesgen wrote:
Thank you for informing. attach is a rebased patch against current 
master

[...]
[fetch_first_with_ties_v10.patch]


Hi Surafel,

The patch applies OK, make check is OK, compiles OK.

But I get:

TRAP: FailedAssertion("!(!(((slot)->tts_flags & (1 << 1)) != 0))", File: 
"execTuples.c", Line: 491)


when running a variant ('fetch 1' instead of 'fetch 2') of the test SQL 
against src/test/regress/data/onek.data:


(in the script below the location of the file 'onek.data' will have to 
be changed)


- 8< -
#!/bin/bash

echo "
drop   table if exists onek ;
create table onek (
unique1  int4,
unique2  int4,
two  int4,
four int4,
ten  int4,
twenty   int4,
hundred  int4,
thousand int4,
twothousand  int4,
fivethousint4,
tenthous int4,
odd  int4,
even int4,
stringu1 name,
stringu2 name,
string4  name
);

copy onek from 
'/home/aardvark/pg_stuff/pg_sandbox/pgsql.fetch_first_with_ties/src/test/regress/data/onek.data';


create index   onek_unique1 on onek using btree(unique1 
int4_ops);

create index onek_unique2  on onek using btree(unique2 int4_ops);
create index onek_hundred  on onek using btree(hundred int4_ops);
create index onek_stringu1 on onek using btree(stringu1 name_ops);

-- OK:
select  * from onek
where thousand < 5 order by thousand
fetch first 1 rows only
;

-- crashes:
select  * from onek
where thousand < 5 order by thousand
fetch first 1 rows with ties
;

" | psql -qXa
- 8< -

Can you have a look?


thanks,

Erik Rijkers






Re: FETCH FIRST clause WITH TIES option

2019-07-01 Thread Surafel Temesgen
Hi Thomas
On Mon, Jul 1, 2019 at 1:31 PM Thomas Munro  wrote:

> On Mon, Apr 8, 2019 at 8:26 PM Surafel Temesgen 
> wrote:
> > agree
>
> Hi Surafel,
>
> A new Commitfest is starting.  Can you please post a fresh rebase of this
> patch?
>
>
Thank you for informing. attach is a rebased patch against current master

regards
Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..e83d309c5b 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for the last place in the result set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..60660e710f 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -126,12 +127,16 @@ ExecLimit(PlanState *pstate)
 			{
 /*
  * Forwards scan, so check for stepping off end of window. If
- * we are at the end of the window, return NULL without
- * advancing the subplan or the position variable; but change
- * the state machine state to record having done so.
+ * we are at the end of the window, the behavior depends whether
+ * ONLY or WITH TIES was specified.  In case of ONLY, we return
+ * NULL without advancing the subplan or the position variable;
+ * but change the state machine state to record having done so.
+ * In the WITH TIES mode, we need to advance the subplan until
+ * we find the first row with different ORDER BY pathkeys.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == EXACT_NUMBER)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -144,18 +149,69 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		 * If we know we won't need to back up, we can release
+		 * resources at this point.
+		 */
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+}
+else
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+
+	/*
+	 * Tuple at limit is needed for comparation in subsequent execution
+	 * to detect ties.
+	 */
+	if (node->limitOption == WITH_TIES &&
+		node->position - node->offset == node->count - 1)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	

Re: FETCH FIRST clause WITH TIES option

2019-07-01 Thread Thomas Munro
On Mon, Apr 8, 2019 at 8:26 PM Surafel Temesgen  wrote:
> agree

Hi Surafel,

A new Commitfest is starting.  Can you please post a fresh rebase of this patch?

Thanks,

-- 
Thomas Munro
https://enterprisedb.com




Re: FETCH FIRST clause WITH TIES option

2019-04-08 Thread Surafel Temesgen
On Sun, Apr 7, 2019 at 1:39 AM Tomas Vondra 
wrote:

>
> 1) I've removed the costing changes. As Tom mentions elsewhere in this
> thread, that's probably not needed for v1 and it's true those estimates
> are probably somewhat sketchy anyway.
>
>
> 2) v8 did this (per my comment):
>
> -   ExecSetTupleBound(compute_tuples_needed(node),
> outerPlanState(node));
> +   if(node->limitOption == EXACT_NUMBER)
> +   ExecSetTupleBound(compute_tuples_needed(node),
> outerPlanState(node));
>
> but I noticed the comment immediately above that says
>
>   * Notify child node about limit.  Note: think not to "optimize" by
>   * skipping ExecSetTupleBound if compute_tuples_needed returns < 0.  We
>   * must update the child node anyway, in case this is a rescan and the
>   * previous time we got a different result.
>
> which applies to WITH_TIES too, no? So I've modified compute_tuples_needed
> to return -1, and reverted this change. Not sure, though.
>
>
>
I agree that passing -1 to  ExecSetTupleBound is more appropriate
implementation


3) I'm a bit confused by the initialization added to ExecInitLimit. It
> first gets the tuple descriptor from the limitstate (it should not do so
> directly but use ExecGetResultType). But when it creates the extra slot,
> it uses ops extracted from the outer plan. That's strange, I guess ...
>
> And then it extracts the descriptor from the outer plan and uses it when
> calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
> the last_slot, which is using a descriptor from the limitstate.
>
> IMHO all of this should use descriptor/ops from the outer plan, no? It
> probably does not change anything because limit does not project, but it
> seems confusing.
>


agree

regards
Surafel


Re: FETCH FIRST clause WITH TIES option

2019-04-06 Thread Tomas Vondra

Hi Surafel,

I've been looking at the patch with the intent to commit it, but once
again I ran into stuff that seems suspicious to me but am not familiar
with enough to say if it's OK. I'm sorry about that :-(

First, a couple of tweaks in the attached v9 of the patch:


1) I've removed the costing changes. As Tom mentions elsewhere in this
thread, that's probably not needed for v1 and it's true those estimates
are probably somewhat sketchy anyway.


2) v8 did this (per my comment):

-   ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+   if(node->limitOption == EXACT_NUMBER)
+   ExecSetTupleBound(compute_tuples_needed(node), 
outerPlanState(node));

but I noticed the comment immediately above that says

 * Notify child node about limit.  Note: think not to "optimize" by
 * skipping ExecSetTupleBound if compute_tuples_needed returns < 0.  We
 * must update the child node anyway, in case this is a rescan and the
 * previous time we got a different result.

which applies to WITH_TIES too, no? So I've modified compute_tuples_needed
to return -1, and reverted this change. Not sure, though.


3) I'm a bit confused by the initialization added to ExecInitLimit. It
first gets the tuple descriptor from the limitstate (it should not do so
directly but use ExecGetResultType). But when it creates the extra slot,
it uses ops extracted from the outer plan. That's strange, I guess ...

And then it extracts the descriptor from the outer plan and uses it when
calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
the last_slot, which is using a descriptor from the limitstate.

IMHO all of this should use descriptor/ops from the outer plan, no? It
probably does not change anything because limit does not project, but it
seems confusing.


cheers

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..e83d309c5b 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | 
DESC | USING operator ] [ NULLS { 
FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] 
]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] 
[...] ]
 
 where from_item can be 
one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] 
{ ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] 
{ ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for the last place in the result set according to ORDER 
BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..60660e710f 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *   /* return: a 
tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
LimitState *node = castNode(LimitState, pstate);
+   ExprContext *econtext = node->ps.ps_ExprContext;
ScanDirection direction;
TupleTableSlot *slot;
PlanState  *outerPlan;
@@ -126,12 +127,16 @@ ExecLimit(PlanState *pstate)
{
/*
 * Forwards scan, so check for stepping off end 
of window. If
-* we are at the end of the window, return NULL 
without
-* advancing the subplan or the position 
variable; but change
-* the state machine state to record having 
done so.
+* we are at the end of the window, the 
behavior depends whether
+* ONLY or WITH TIES was specified.  In case of 
ONLY, we return
+* NULL without advancing the subplan or the 
position variable;
+* but change the state machine state to record 
having done so.
+* In the WITH TIES mode, we need to advance 
the subplan until
+* we find the first row with different ORDER 
BY pathkeys.

Re: FETCH FIRST clause WITH TIES option

2019-04-03 Thread Tomas Vondra

On Wed, Apr 03, 2019 at 03:08:05PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

I've tried to fix the merge conflict (essentially by moving some of the
code to adjust_limit_rows_costs(), but I'm wondering if the code added to
create_limit_path is actually correct
...
Firstly, this seriously needs some comment explaining why we do this.


I've not looked at this patch, but TBH I wonder why it is touching
planner rowcount estimation at all.  I find it doubtful either that
a correction for WITH TIES would be significant in most use-cases,
or that we could estimate it accurately if it was significant.
It certainly doesn't seem like something that needs to be messed
with in v1 of the feature.



FWIW it was me who suggested to tweak the cardinality estimation this way,
but if we want to leave it out from v1, I'm OK with that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: FETCH FIRST clause WITH TIES option

2019-04-03 Thread Tom Lane
Tomas Vondra  writes:
> I've tried to fix the merge conflict (essentially by moving some of the
> code to adjust_limit_rows_costs(), but I'm wondering if the code added to
> create_limit_path is actually correct
> ...
> Firstly, this seriously needs some comment explaining why we do this.

I've not looked at this patch, but TBH I wonder why it is touching
planner rowcount estimation at all.  I find it doubtful either that
a correction for WITH TIES would be significant in most use-cases,
or that we could estimate it accurately if it was significant.
It certainly doesn't seem like something that needs to be messed
with in v1 of the feature.

regards, tom lane




Re: FETCH FIRST clause WITH TIES option

2019-04-03 Thread Tomas Vondra

Hi,

Unfortunately this got broken again, this time by aef65db676 :-(

I've tried to fix the merge conflict (essentially by moving some of the
code to adjust_limit_rows_costs(), but I'm wondering if the code added to
create_limit_path is actually correct

   if (count_est != 0)
   {
   doublecount_rows;

   if (count_est > 0)
   count_rows = (double) count_est;
   else
   count_rows = clamp_row_est(subpath->rows * 0.10);

   if (limitOption == WITH_TIES)
   {
   ...
   count_rows = Max(avgGroupSize, count_est + (...));
   }
   ...
   }

Firstly, this seriously needs some comment explaining why we do this. But
more importantly - shouldn't it really be

   count_rows = Max(avgGroupSize, count_rows + (...));

instead of using count_est again (which might easily be -1 anyway)?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Re: FETCH FIRST clause WITH TIES option

2019-04-01 Thread Surafel Temesgen
On Sun, Mar 31, 2019 at 3:14 AM Tomas Vondra 
wrote:

>
>
> Hi,
>
> I got to look at the patch today, with the intent to commit, but sadly I
> ran into a couple of minor issues that I don't feel comfortable fixing
> on my own. Attached is a patch highlighling some of the places (0001 is
> your v7 patch, to keep the cfbot happy).
>
>
Thank you

>
> 1) the docs documented this as
>
>... [ ONLY | WITH TIES ]
>
> but that's wrong, because it implies those options are optional (i.e.
> the user may not specify anything). That's not the case, exactly one
> of those options needs to be specified, so it should have been
>
>... { ONLY | WITH TIES }
>
>
> 2) The comment in ExecLimit() needs to be updated to explain that WITH
> TIES changes the behavior.
>
>
> 3) Minor code style issues (no space before * on comment lines, {}
> around single-line if statements, ...).
>
>
> 4) The ExecLimit() does this
>
> if (node->limitOption == WITH_TIES)
> ExecCopySlot(node->last_slot, slot);
>
> but I think we only really need to do that for the last tuple in the
> window, no? Would it be a useful optimization?
>
>
>
I think it is good optimization .Fixed

5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
> code actually prints uniqOperators. Secondly, why does the code use
> these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
> WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
> arrays? I haven't tested this, but looking at the READ_ counterparts, I
> don't see why that would be the case.
>
>
>
Fixed

regards
Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..e83d309c5b 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for the last place in the result set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..994ac7f089 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -126,12 +127,16 @@ ExecLimit(PlanState *pstate)
 			{
 /*
  * Forwards scan, so check for stepping off end of window. If
- * we are at the end of the window, return NULL without
- * advancing the subplan or the position variable; but change
- * the state machine state to record having done so.
+ * we are at the end of the window, the behavior depends whether
+ * ONLY or WITH TIES was specified.  In case of ONLY, we return
+ * NULL without advancing the subplan or the position variable;
+ * but change the state machine state to record having done so.
+ * In the WITH TIES mode, we need to advance the subplan until
+ * we find the first row with different ORDER BY pathkeys.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == EXACT_NUMBER)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -144,18 +149,69 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the 

Re: Re: FETCH FIRST clause WITH TIES option

2019-03-30 Thread Tomas Vondra

On Sun, Mar 31, 2019 at 01:14:46AM +0100, Tomas Vondra wrote:

On Fri, Mar 29, 2019 at 01:56:48AM +0100, Tomas Vondra wrote:

On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote:

On Mon, Mar 25, 2019 at 11:56 AM David Steele  wrote:


This patch no longer passes testing so marked Waiting on Author.



Thank  you for informing. Fixed


Thanks for the updated patch.  I do have this on my list of patches that
I'd like to commit in this CF - likely tomorrow after one more round of
review, or so.



Hi,

I got to look at the patch today, with the intent to commit, but sadly I
ran into a couple of minor issues that I don't feel comfortable fixing
on my own. Attached is a patch highlighling some of the places (0001 is
your v7 patch, to keep the cfbot happy).


1) the docs documented this as

 ... [ ONLY | WITH TIES ]

but that's wrong, because it implies those options are optional (i.e.
the user may not specify anything). That's not the case, exactly one
of those options needs to be specified, so it should have been

 ... { ONLY | WITH TIES }


2) The comment in ExecLimit() needs to be updated to explain that WITH
TIES changes the behavior.


3) Minor code style issues (no space before * on comment lines, {}
around single-line if statements, ...).


4) The ExecLimit() does this

  if (node->limitOption == WITH_TIES)
  ExecCopySlot(node->last_slot, slot);

but I think we only really need to do that for the last tuple in the
window, no? Would it be a useful optimization?


5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
code actually prints uniqOperators. Secondly, why does the code use
these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
arrays? I haven't tested this, but looking at the READ_ counterparts, I
don't see why that would be the case.



Actually, two more minor comments:

6) There's some confusing naming - in plannodes.h the fields added to
the Limit node are called uniqSomething, but in other places the patch
uses sortSomething, ordSomething. I suggest more consistent naming.

7) The LimitOption enum has two items - WITH_ONLY and WITH_TIES. That's
a bit strange, because there's nothing like "WITH ONLY".


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Re: FETCH FIRST clause WITH TIES option

2019-03-30 Thread Tomas Vondra

On Fri, Mar 29, 2019 at 01:56:48AM +0100, Tomas Vondra wrote:

On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote:

On Mon, Mar 25, 2019 at 11:56 AM David Steele  wrote:


This patch no longer passes testing so marked Waiting on Author.



Thank  you for informing. Fixed


Thanks for the updated patch.  I do have this on my list of patches that
I'd like to commit in this CF - likely tomorrow after one more round of
review, or so.



Hi,

I got to look at the patch today, with the intent to commit, but sadly I
ran into a couple of minor issues that I don't feel comfortable fixing
on my own. Attached is a patch highlighling some of the places (0001 is
your v7 patch, to keep the cfbot happy).


1) the docs documented this as

  ... [ ONLY | WITH TIES ]

but that's wrong, because it implies those options are optional (i.e.
the user may not specify anything). That's not the case, exactly one
of those options needs to be specified, so it should have been

  ... { ONLY | WITH TIES }


2) The comment in ExecLimit() needs to be updated to explain that WITH
TIES changes the behavior.


3) Minor code style issues (no space before * on comment lines, {}
around single-line if statements, ...).


4) The ExecLimit() does this

   if (node->limitOption == WITH_TIES)
   ExecCopySlot(node->last_slot, slot);

but I think we only really need to do that for the last tuple in the
window, no? Would it be a useful optimization?


5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
code actually prints uniqOperators. Secondly, why does the code use
these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
arrays? I haven't tested this, but looking at the READ_ counterparts, I
don't see why that would be the case.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
>From af9b8b8edcad84f88fc846ab3ce3f77cfb94230e Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 31 Mar 2019 00:04:31 +0100
Subject: [PATCH 1/2] fetch_first_with_ties_v7

---
 doc/src/sgml/ref/select.sgml|  9 ++-
 src/backend/executor/nodeLimit.c| 91 ++---
 src/backend/nodes/copyfuncs.c   |  7 ++
 src/backend/nodes/equalfuncs.c  |  2 +
 src/backend/nodes/outfuncs.c| 31 +
 src/backend/nodes/readfuncs.c   |  6 ++
 src/backend/optimizer/plan/createplan.c | 44 +++-
 src/backend/optimizer/plan/planner.c|  1 +
 src/backend/optimizer/util/pathnode.c   | 16 +
 src/backend/parser/analyze.c|  3 +
 src/backend/parser/gram.y   | 49 +
 src/include/nodes/execnodes.h   |  3 +
 src/include/nodes/nodes.h   | 12 
 src/include/nodes/parsenodes.h  |  2 +
 src/include/nodes/pathnodes.h   |  1 +
 src/include/nodes/plannodes.h   |  5 ++
 src/include/optimizer/pathnode.h|  1 +
 src/include/optimizer/planmain.h|  3 +-
 src/test/regress/expected/limit.out | 35 ++
 src/test/regress/sql/limit.sql  | 17 +
 20 files changed, 307 insertions(+), 31 deletions(-)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..b3b045ea87 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | 
DESC | USING operator ] [ NULLS { 
FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] 
]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] 
[...] ]
 
 where from_item can be 
one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] 
{ ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] 
{ ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to 
ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..e8aed10177 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *   /* return: a 
tuple or 

Re: Re: FETCH FIRST clause WITH TIES option

2019-03-28 Thread Tomas Vondra

On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote:

On Mon, Mar 25, 2019 at 11:56 AM David Steele  wrote:


This patch no longer passes testing so marked Waiting on Author.



Thank  you for informing. Fixed


Thanks for the updated patch.  I do have this on my list of patches that
I'd like to commit in this CF - likely tomorrow after one more round of
review, or so.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Re: FETCH FIRST clause WITH TIES option

2019-03-26 Thread Surafel Temesgen
On Mon, Mar 25, 2019 at 11:56 AM David Steele  wrote:

> This patch no longer passes testing so marked Waiting on Author.
>
>
Thank  you for informing. Fixed
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..b3b045ea87 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..e8aed10177 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -144,18 +146,64 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+		return NULL;
+	}
+
+}
+else
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +359,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ -374,6 +423,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 		  (PlanState *) limitstate);
+	limitstate->limitOption = node->limitOption;
 
 	/*
 	 * Initialize result type.
@@ -390,6 +440,25 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 	 */
 	limitstate->ps.ps_ProjInfo = NULL;
 
+	/*
+	 * Initialize 

Re: Re: FETCH FIRST clause WITH TIES option

2019-03-25 Thread David Steele

On 2/4/19 2:37 PM, Surafel Temesgen wrote:



On Mon, Feb 4, 2019 at 8:29 AM Michael Paquier > wrote:



This patch needs a rebase because of conflicts done recently for
pluggable storage, so moved to next CF, waiting on author.
--


Thank you . rebased against current master


This patch no longer passes testing so marked Waiting on Author.

Regards,
--
-David
da...@pgmasters.net



Re: FETCH FIRST clause WITH TIES option

2019-02-04 Thread Surafel Temesgen
On Mon, Feb 4, 2019 at 8:29 AM Michael Paquier  wrote:

>
> This patch needs a rebase because of conflicts done recently for
> pluggable storage, so moved to next CF, waiting on author.
> --
>

Thank you . rebased against current master

regards
Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..ed7249daeb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..5e7790c0d6 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ -374,6 +424,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 		  (PlanState *) limitstate);
+	limitstate->limitOption = node->limitOption;
 
 	/*
 	 * Initialize result type.
@@ -390,6 +441,24 @@ ExecInitLimit(Limit 

Re: FETCH FIRST clause WITH TIES option

2019-02-03 Thread Michael Paquier
On Wed, Jan 16, 2019 at 11:45:46AM +0300, Surafel Temesgen wrote:
> The attached patch use your suggestion uptread

This patch needs a rebase because of conflicts done recently for
pluggable storage, so moved to next CF, waiting on author.
--
Michael


signature.asc
Description: PGP signature


Re: FETCH FIRST clause WITH TIES option

2019-01-16 Thread Surafel Temesgen
On Tue, Jan 15, 2019 at 2:51 PM Tomas Vondra 
wrote:


> What do you mean by "raw statistic"?
>

I mean without further calculation to consider other operation


>
> IMHO the best thing you can do is call estimate_num_groups() and combine
> that with the number of input rows. That shall benefit from ndistinct
> coefficients when available, etc. I've been thinking that considering
> the unreliability of grouping estimates we should use a multiple of the
> average size (because there may be much larger groups), but I think
> that's quite unprecipled and I'd much rather try without it first.
>
>
thank you for clarifying.

The attache patch use your suggestion uptread

regards

Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..ed7249daeb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..5e7790c0d6 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == 

Re: FETCH FIRST clause WITH TIES option

2019-01-15 Thread Tomas Vondra



On 1/15/19 11:07 AM, Surafel Temesgen wrote:
> 
> 
> On Wed, Jan 2, 2019 at 6:19 PM Tomas Vondra
> mailto:tomas.von...@2ndquadrant.com>> wrote:
> 
> After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
> patch should tweak estimates in some way. Currently, the cardinality
> estimate is the same as for plain LIMIT, using the requested number of
> rows. But let's say there are very few large groups - that will
> naturally increase the number of rows produced.
> 
> As an example, let's say the subplan produces 1M rows, and there are
> 1000 groups (when split according to the ORDER BY clause).
> 
> 
>  
> 
> can we use ORDER BY column raw statistic in limit node reliably?
> because it seems to me it can be affected by other operation in a
> subplan like filter condition
> 

What do you mean by "raw statistic"? Using stats from the underlying
table is not quite possible, because you might be operating on top of
join or something like that.

IMHO the best thing you can do is call estimate_num_groups() and combine
that with the number of input rows. That shall benefit from ndistinct
coefficients when available, etc. I've been thinking that considering
the unreliability of grouping estimates we should use a multiple of the
average size (because there may be much larger groups), but I think
that's quite unprecipled and I'd much rather try without it first.

But maybe we can do better when there really is a single table to
consider, in which case we might look at MCV lists and estimate the
largest group. That would give us a much better idea of the worst case.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

2019-01-15 Thread Surafel Temesgen
On Wed, Jan 2, 2019 at 6:19 PM Tomas Vondra 
wrote:

> After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
> patch should tweak estimates in some way. Currently, the cardinality
> estimate is the same as for plain LIMIT, using the requested number of
> rows. But let's say there are very few large groups - that will
> naturally increase the number of rows produced.
>
> As an example, let's say the subplan produces 1M rows, and there are
> 1000 groups (when split according to the ORDER BY clause).
>



can we use ORDER BY column raw statistic in limit node reliably? because it
seems to me it can be affected by other operation in a subplan like filter
condition

regards

Surafel


Re: FETCH FIRST clause WITH TIES option

2019-01-02 Thread Tomas Vondra



On 1/2/19 11:51 AM, Surafel Temesgen wrote:
> 
> 
> On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra
> mailto:tomas.von...@2ndquadrant.com>> wrote:
> 
> >
> >     The attached patch include all the comment given by Tomas and i
> >     check sql standard about LIMIT and this feature
> >
> 
> Unfortunately, it seems the "limit" regression tests fail for some
> reason - the output mismatches the expected results for some reason. It
> seems as if the WITH TIES code affects ordering of the results within
> the group. See the attached file.
> 
> 
> Yes the reason is the order of returned row is not always the same. I
> remove other columns from the result set to get constant result
> 

Thanks, that seems reasonable.

After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak estimates in some way. Currently, the cardinality
estimate is the same as for plain LIMIT, using the requested number of
rows. But let's say there are very few large groups - that will
naturally increase the number of rows produced.

As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause). Consider a
query with "FETCH FIRST 10 ROWS WITH TIES". AFAICS the current estimate
will be 10, but in fact we know that it's likely to produce ~1000 rows
(because that's the average group size).

So I think the patch should tweak the estimate to be

  limitCount + (avgGroupSize/2);

or perhaps

  Max(avgGroupSize, limitCount + (avgGroupSize/2))

The 1/2 is there because we don't know where the group starts.

Of course, using average group size like this is rather crude, but it's
about the best thing we can do. In principle, increasing the cardinality
estimate is the right thing to do.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

2019-01-02 Thread Surafel Temesgen
On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra 
wrote:

> >
> > The attached patch include all the comment given by Tomas and i
> > check sql standard about LIMIT and this feature
> >
>
> Unfortunately, it seems the "limit" regression tests fail for some
> reason - the output mismatches the expected results for some reason. It
> seems as if the WITH TIES code affects ordering of the results within
> the group. See the attached file.
>
>
Yes the reason is the order of returned row is not always the same. I
remove other columns from the result set to get constant result

Regards

Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..ed7249daeb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 6792f9e86c..4fc3f61b26 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ 

Re: FETCH FIRST clause WITH TIES option

2019-01-01 Thread Tomas Vondra
Hi,

On 11/24/18 10:28 AM, Surafel Temesgen wrote:
> Attach is rebased patch against the current master
> regards
> Surafel
> 
> On Thu, Nov 1, 2018 at 2:28 PM Surafel Temesgen  > wrote:
> 
> hi,
> 
> The attached patch include all the comment given by Tomas and i
> check sql standard about LIMIT and this feature
> 

Unfortunately, it seems the "limit" regression tests fail for some
reason - the output mismatches the expected results for some reason. It
seems as if the WITH TIES code affects ordering of the results within
the group. See the attached file.

> it did not say anything about it but I think its good idea to
> include it to LIMIT too and I will add it if we have consensus on it.
> 

Hmm, I'm not sure that's needed. I don't see an urgent need to do that
in v1 of the patch.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
*** /home/user/work/postgres/src/test/regress/expected/limit.out
2019-01-01 18:04:32.863729423 +0100
--- /home/user/work/postgres/src/test/regress/results/limit.out 2019-01-01 
18:14:13.328729423 +0100
***
*** 512,527 
ORDER BY thousand FETCH FIRST 2 ROW WITH TIES;
   two | unique1 | unique2 | stringu1 | thousand 
  -+-+-+--+--
-  | 800 |   9 | UE   |0
   | 200 |  95 | SH   |0
-  | 500 | 262 | GT   |0
-  | 400 | 309 | KP   |0
-  | 300 | 374 | OL   |0
-  | 600 | 402 | CX   |0
   | 700 | 434 | YA   |0
   | 900 | 913 | QI   |0
-  | 100 | 946 | WD   |0
   |   0 | 998 | AA   |0
  (10 rows)
  
  SELECT ''::text AS two, unique1, unique2, stringu1, thousand
--- 512,527 
ORDER BY thousand FETCH FIRST 2 ROW WITH TIES;
   two | unique1 | unique2 | stringu1 | thousand 
  -+-+-+--+--
   | 200 |  95 | SH   |0
   | 700 | 434 | YA   |0
+  | 600 | 402 | CX   |0
   | 900 | 913 | QI   |0
   |   0 | 998 | AA   |0
+  | 500 | 262 | GT   |0
+  | 800 |   9 | UE   |0
+  | 100 | 946 | WD   |0
+  | 400 | 309 | KP   |0
+  | 300 | 374 | OL   |0
  (10 rows)
  
  SELECT ''::text AS two, unique1, unique2, stringu1, thousand
***
*** 529,536 
ORDER BY thousand FETCH FIRST 2 ROW ONLY;
   two | unique1 | unique2 | stringu1 | thousand 
  -+-+-+--+--
   | 200 |  95 | SH   |0
-  | 500 | 262 | GT   |0
  (2 rows)
  
  -- should fail
--- 529,536 
ORDER BY thousand FETCH FIRST 2 ROW ONLY;
   two | unique1 | unique2 | stringu1 | thousand 
  -+-+-+--+--
+  | 800 |   9 | UE   |0
   | 200 |  95 | SH   |0
  (2 rows)
  
  -- should fail

==



Re: FETCH FIRST clause WITH TIES option

2018-11-24 Thread Surafel Temesgen
Attach is rebased patch against the current master
regards
Surafel

On Thu, Nov 1, 2018 at 2:28 PM Surafel Temesgen 
wrote:

> hi,
>
> The attached patch include all the comment given by Tomas and i check sql
> standard about LIMIT and this feature
>
> it did not say anything about it but I think its good idea to include it
> to LIMIT too and I will add it if we have consensus on it.
>
> regards
>
> surafel
>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..ed7249daeb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 6792f9e86c..4fc3f61b26 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ -374,6 +424,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 	

Re: FETCH FIRST clause WITH TIES option

2018-11-01 Thread Surafel Temesgen
hi,

The attached patch include all the comment given by Tomas and i check sql
standard about LIMIT and this feature

it did not say anything about it but I think its good idea to include it to
LIMIT too and I will add it if we have consensus on it.

regards

surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..b649b1ca7a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1408,6 +1408,9 @@ FETCH { FIRST | NEXT } [ count ] {
 If count is
 omitted in a FETCH clause, it defaults to 1.
 ROW
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index bb28cf7d1d..dbc060e9a9 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ -374,6 +424,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 		  (PlanState *) limitstate);
+	limitstate->limitOption = node->limitOption;
 
 	/*
 	 * Initialize result slot and type. (XXX not actually used, but upper
@@ -387,6 

Re: FETCH FIRST clause WITH TIES option

2018-11-01 Thread Tomas Vondra

On 10/31/2018 06:17 PM, Robert Haas wrote:

On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth
 wrote:

Then FETCH FIRST N WITH TIES becomes "stop when the expression
   rank() over (order by ...) <= N
is no longer true" (where the ... is the existing top level order by)


Wouldn't that be wicked expensive compared to something hard-coded
that does the same thing?



Not sure, but that was one of my concerns too, particularly for the 
simple FETCH FIRST N WITH TIES case. But I think Andrew has a point it 
would make it much easier to implement the PERCENT case.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

2018-10-31 Thread Robert Haas
On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth
 wrote:
> Then FETCH FIRST N WITH TIES becomes "stop when the expression
>   rank() over (order by ...) <= N
> is no longer true" (where the ... is the existing top level order by)

Wouldn't that be wicked expensive compared to something hard-coded
that does the same thing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: FETCH FIRST clause WITH TIES option

2018-10-29 Thread Tomas Vondra

On 10/29/2018 05:47 PM, Andrew Gierth wrote:

"Tomas" == Tomas Vondra  writes:


  >> I still think that this is the wrong approach. Implementing WITH
  >> TIES and PERCENT together using an implicit window function call
  >> kills two birds with one very small stone (the only executor change
  >> needed would be teaching LIMIT to be able to stop on a boolean
  >> condition), with maximum reuse of existing facilities.

  Tomas> Hmmm, maybe. How would that work, exactly? Wouldn't that mean
  Tomas> extra overhead (the window functions are hardly free) and
  Tomas> limitations?

They're not free, but then this case probably shouldn't be treated as a
particularly hot code path.

The basic idea is this: extend the Limit node to be able to stop on a
boolean expression, such that the first false value ends the result.

Then FETCH FIRST N WITH TIES becomes "stop when the expression
   rank() over (order by ...) <= N
is no longer true" (where the ... is the existing top level order by)

and FETCH FIRST N PERCENT is the same but with percent_rank (or
cume_dist, I'd have to check the exact semantics)

rank() doesn't need to read ahead in the input. percent_rank of course
does, but it's clearly impossible to implement PERCENT without doing
that.



OK. I was under the impression the window function would need to see all 
the input rows, but that seems not to be the case in general.



Also, one could consider extending LIMIT to support arbitrary
expressions, with some syntax like LIMIT WHEN (condition) which would be
the general form.



Hmmm. I can't really recall needing such thing, but of course - that 
does not prove it'd not be a good thing.



  Tomas> FWIW, I doubt the patch can be much smaller/simpler - a
  Tomas> significant part of the new stuff is in gram.y and node read/out
  Tomas> infrastructure, the changes to LIMIT node are fairly minimal.

It's not a matter of patch size as such, but reuse of mechanisms rather
than adding new ones. Also doing WITH TIES as a special case in the
Limit node doesn't make adding PERCENT any easier at all, whereas the
above gets it basically for free.



Makes sense, I guess. At first I was thinking that this certainly does 
not add more new mechanisms than allowing LIMIT to terminate on boolean 
expression. But you're right about the WITH PERCENT part - using window 
functions would make adding this much simpler.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

2018-10-29 Thread Andrew Gierth
> "Tomas" == Tomas Vondra  writes:

 >> I still think that this is the wrong approach. Implementing WITH
 >> TIES and PERCENT together using an implicit window function call
 >> kills two birds with one very small stone (the only executor change
 >> needed would be teaching LIMIT to be able to stop on a boolean
 >> condition), with maximum reuse of existing facilities.

 Tomas> Hmmm, maybe. How would that work, exactly? Wouldn't that mean
 Tomas> extra overhead (the window functions are hardly free) and
 Tomas> limitations?

They're not free, but then this case probably shouldn't be treated as a
particularly hot code path.

The basic idea is this: extend the Limit node to be able to stop on a
boolean expression, such that the first false value ends the result.

Then FETCH FIRST N WITH TIES becomes "stop when the expression
  rank() over (order by ...) <= N
is no longer true" (where the ... is the existing top level order by)

and FETCH FIRST N PERCENT is the same but with percent_rank (or
cume_dist, I'd have to check the exact semantics)

rank() doesn't need to read ahead in the input. percent_rank of course
does, but it's clearly impossible to implement PERCENT without doing
that.

Also, one could consider extending LIMIT to support arbitrary
expressions, with some syntax like LIMIT WHEN (condition) which would be
the general form.

 Tomas> FWIW, I doubt the patch can be much smaller/simpler - a
 Tomas> significant part of the new stuff is in gram.y and node read/out
 Tomas> infrastructure, the changes to LIMIT node are fairly minimal.

It's not a matter of patch size as such, but reuse of mechanisms rather
than adding new ones. Also doing WITH TIES as a special case in the
Limit node doesn't make adding PERCENT any easier at all, whereas the
above gets it basically for free.

-- 
Andrew (irc:RhodiumToad)



Re: FETCH FIRST clause WITH TIES option

2018-10-29 Thread Tomas Vondra

On 10/29/2018 04:17 PM, Andrew Gierth wrote:

"Tomas" == Tomas Vondra  writes:


  > On 10/26/2018 12:28 PM, Surafel Temesgen wrote:
  >> hello ,
  >>
  >> The WITH TIES keyword is sql standard that specifies any peers of
  >> retained rows to retained in the result set too .which means
  >> according to ordering key the result set can includes additional rows
  >> which have ties on the last position, if there are any and It work
  >> with ORDER BY query.

  Tomas> Thanks for the patch. I've looked at it today, and it seems
  Tomas> mostly OK, with a couple of minor issues. Most of it is code
  Tomas> formatting and comment wording issues, so I'm not going to go
  Tomas> through them here - see the attached 0002 patch (0001 is your
  Tomas> patch, rebased to current master).

I still think that this is the wrong approach. Implementing WITH TIES
and PERCENT together using an implicit window function call kills two
birds with one very small stone (the only executor change needed would
be teaching LIMIT to be able to stop on a boolean condition), with
maximum reuse of existing facilities.



Hmmm, maybe. How would that work, exactly? Wouldn't that mean extra 
overhead (the window functions are hardly free) and limitations? Perhaps 
that was discussed in some other thread in the past?


FWIW, I doubt the patch can be much smaller/simpler - a significant part 
of the new stuff is in gram.y and node read/out infrastructure, the 
changes to LIMIT node are fairly minimal.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

2018-10-29 Thread Andrew Gierth
> "Tomas" == Tomas Vondra  writes:

 > On 10/26/2018 12:28 PM, Surafel Temesgen wrote:
 >> hello ,
 >> 
 >> The WITH TIES keyword is sql standard that specifies any peers of 
 >> retained rows to retained in the result set too .which means 
 >> according to ordering key the result set can includes additional rows
 >> which have ties on the last position, if there are any and It work
 >> with ORDER BY query.

 Tomas> Thanks for the patch. I've looked at it today, and it seems
 Tomas> mostly OK, with a couple of minor issues. Most of it is code
 Tomas> formatting and comment wording issues, so I'm not going to go
 Tomas> through them here - see the attached 0002 patch (0001 is your
 Tomas> patch, rebased to current master).

I still think that this is the wrong approach. Implementing WITH TIES
and PERCENT together using an implicit window function call kills two
birds with one very small stone (the only executor change needed would
be teaching LIMIT to be able to stop on a boolean condition), with
maximum reuse of existing facilities.

-- 
Andrew (irc:RhodiumToad)



Re: FETCH FIRST clause WITH TIES option

2018-10-28 Thread Tomas Vondra
Hello Surafel,

On 10/26/2018 12:28 PM, Surafel Temesgen wrote:
> hello ,
> 
> The WITH TIES keyword is sql standard that specifies any peers of 
> retained rows to retained in the result set too .which means 
> according to ordering key the result set can includes additional rows
> which have ties on the last position, if there are any and It work
> with ORDER BY query.
>

Thanks for the patch. I've looked at it today, and it seems mostly OK,
with a couple of minor issues. Most of it is code formatting and comment
wording issues, so I'm not going to go through them here - see the
attached 0002 patch (0001 is your patch, rebased to current master).

There's one place that I think is incorrect, and that's this bit from
ExecLimit:

}else
/*
 * Get next tuple from subplan, if any.
 */
slot = ExecProcNode(outerPlan);
if (TupIsNull(slot))
{
node->lstate = LIMIT_SUBPLANEOF;
return NULL;
}
if (node->limitOption == WITH_TIES)
{
ExecCopySlot(node->last_slot, slot);
}
node->subSlot = slot;
node->position++;

Note that the "else" guards only the very first line, not the whole
block. This seems wrong, i.e. there should be {} around the whole block.

I'm also suggesting to slightly change the create_limit_plan(), to keep
a single make_limit call. IMHO that makes it easier to understand,
although I admit it's fairly subjective.

One question is whether to make this work for LIMIT too, not just for
FETCH FIRST. I'm not sure how difficult would that be (it should be a
grammar-only tweak I guess), or if it's a good idea in general. But I
find it quite confusing that various comments say things like this:

  LimitOption  limitOption; /* LIMIT with ties or  exact number */

while in fact it does not work with LIMIT.

> 
> The attach patch is a finished implementation of it except it did not
> have a regression test because am not sure of changing the test data set
> for it and I can’t find fetch first clause regression test too.
> 

Well, that's not a very good reason not to have tests for this new
improvement. FWIW there are a couple of places in regression tests where
FETCH FIRST is used, see this:

  $ grep -i 'fetch first' src/test/regress/sql/*
  src/test/regress/sql/hs_standby_allowed.sql:fetch first from hsc;
  src/test/regress/sql/tablesample.sql:FETCH FIRST FROM tablesample_cur;
  src/test/regress/sql/tablesample.sql:FETCH FIRST FROM tablesample_cur;
  src/test/regress/sql/tidscan.sql:FETCH FIRST FROM c;

But those places don't seem like very good match for testing the new
stuff, because those are primarily testing something else. I suggest we
add the new tests into limit.sql.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 7f5549614c423a1da060da5f7598e9d0836d03a8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 28 Oct 2018 18:38:47 +0100
Subject: [PATCH 2/2] review

---
 doc/src/sgml/ref/select.sgml|  6 ++--
 src/backend/executor/nodeLimit.c| 52 +++--
 src/backend/optimizer/plan/createplan.c | 25 +++-
 src/include/nodes/execnodes.h   |  4 +--
 src/include/nodes/parsenodes.h  |  4 +--
 src/include/nodes/plannodes.h   |  2 +-
 src/include/nodes/relation.h|  2 +-
 7 files changed, 49 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index c81ac04108..b649b1ca7a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1408,9 +1408,9 @@ FETCH { FIRST | NEXT } [ count ] {
 If count is
 omitted in a FETCH clause, it defaults to 1.
 ROW
-WITH TIES option is Used to return two or more rows 
-that tie for last place in the limit results set according to ORDER BY 
-clause (ORDER BY clause must be specified in this case). 
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 93f8813972..724a8e09c1 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -132,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count && node->limitOption == WITH_ONLY)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,23 +146,24 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
-
 else if (!node->noCount &&
-	node->position - node->offset >= node->count && 

FETCH FIRST clause WITH TIES option

2018-10-26 Thread Surafel Temesgen
hello ,

The WITH TIES keyword is sql standard that specifies any peers of retained
rows

to retained in the result set too .which means according to ordering key
the result set can includes additional rows which have ties on the last
position, if there are any and It work with ORDER BY query

The attach patch is a finished implementation of it except it did not have
a regression test because am not sure of changing the test data set for it
and I can’t find fetch first clause regression test too.

Regards

Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..a93e856946 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1408,6 +1408,9 @@ FETCH { FIRST | NEXT } [ count ] {
 If count is
 omitted in a FETCH clause, it defaults to 1.
 ROW
+WITH TIES option is Used to return two or more rows 
+that tie for last place in the limit results set according to ORDER BY 
+clause (ORDER BY clause must be specified in this case). 
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index bb28cf7d1d..93f8813972 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,7 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count && node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,6 +146,45 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
+else if (!node->noCount &&
+	node->position - node->offset >= node->count && node->limitOption == WITH_TIES)
+{
+	/*
+	* Get next tuple from subplan, if any.
+	*/
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	* Test if the new tuple and the last tuple match.
+	* If so we return the tuple.
+	*/
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}else
 /*
  * Get next tuple from subplan, if any.
  */
@@ -154,6 +194,10 @@ ExecLimit(PlanState *pstate)
 	node->lstate = LIMIT_SUBPLANEOF;
 	return NULL;
 }
+if (node->limitOption == WITH_TIES)
+{
+	ExecCopySlot(node->last_slot, slot);
+}
 node->subSlot = slot;
 node->position++;
 			}
@@ -311,6 +355,7 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
+	if(node->limitOption == WITH_ONLY)
 	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
@@ -374,6 +419,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 		  (PlanState *) limitstate);
+	limitstate->limitOption = node->limitOption;
 
 	/*
 	 * Initialize result slot and type. (XXX not actually used, but upper
@@ -387,6 +433,19 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 	 */
 	limitstate->ps.ps_ProjInfo = NULL;
 
+	if (node->limitOption == WITH_TIES)
+	{