Re: [HACKERS] [PATCH] Incremental sort

2018-10-28 Thread Tomas Vondra
Hi Alexander,

On 06/01/2018 04:22 PM, Alexander Korotkov wrote:
> Hi, James!
> 
> On Thu, May 31, 2018 at 11:10 PM James Coleman wrote:
> 
> I've attached an updated copy of the patch that applies cleanly to
> current master.
> 
> 
> Thank you for rebasing this patch.  Next time you send a patch,
> please make sure you've bumped its version, even if you made no
> changes besides a pure rebase.  Otherwise it is hard to distinguish
> patch versions, because the patch files have exactly the same names.
> 
> I'd like to note that I'm going to provide a revised version of this 
> patch for the next commitfest.  After some conversations at PGCon
> regarding this patch, I'm more confident that providing separate
> paths for incremental sorts is the right approach.  The next revision
> of this patch will provide incremental sort paths in more cases.
> 

Do you plan to submit an updated patch version for the November CF? It
would be good to look at the costing model soon, otherwise it might end
up missing PG12, and that would be unfortunate.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: [HACKERS] [PATCH] Incremental sort

2018-06-01 Thread Alexander Korotkov
Hi, James!

On Thu, May 31, 2018 at 11:10 PM James Coleman  wrote:

> I've attached an updated copy of the patch that applies cleanly to
> current master.
>

Thank you for rebasing this patch.  Next time you send a patch, please make
sure you've bumped its version, even if you made no changes besides a pure
rebase.  Otherwise it is hard to distinguish patch versions, because the
patch files have exactly the same names.

I'd like to note that I'm going to provide a revised version of this patch
for the next commitfest.
After some conversations at PGCon regarding this patch, I'm more confident
that providing separate paths for incremental sorts is the right approach.
The next revision of this patch will provide incremental sort paths in more
cases.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Re: [HACKERS] [PATCH] Incremental sort

2018-05-31 Thread James Coleman
I've attached an updated copy of the patch that applies cleanly to 
current master.



commit 6428245702a40b3e3fa11bb64b7611cdd33a0778
Author: Alexander Korotkov 
Date:   Sat Apr 7 18:51:20 2018 +0300

Implement incremental sort

Incremental sort is an optimized variant of multikey sort for cases when the
input is already sorted by a prefix of the sort keys.  For example, when a
sort by (key1, key2 ... keyN) is requested, and the input is already sorted by
(key1, key2 ... keyM), M < N, we can divide the input into groups where keys
(key1, ... keyM) are equal, and only sort on the remaining columns.

Incremental sort can give a huge benefit when a LIMIT clause is specified,
since then it doesn't even have to read the whole input.  Another benefit
of incremental sort is that sorting data in small groups may help to avoid
using disk during the sort.  However, on small datasets which fit into memory,
incremental sort may be slightly slower than a full sort.  That is reflected
in the costing.

This patch implements a very basic usage of incremental sort: it is used
only in create_ordered_paths(), although the sort could help in many more
cases, for instance in merge joins.  The latter would require many more
changes in the optimizer and is postponed to future releases.
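
As a hedged illustration of the scenario the message describes (hypothetical
table and column names, not part of the patch; whether this exact plan is
chosen depends on the costing discussed below), an index can provide input
ordered by a prefix of the requested sort keys, so each prefix group is
sorted separately instead of sorting, and possibly spilling, the whole input
at once:

    -- Hypothetical schema: the index yields rows ordered by device_id, a
    -- prefix of the requested sort keys (device_id, ts).
    CREATE TABLE events (device_id int, ts timestamptz, payload text);
    CREATE INDEX events_device_idx ON events (device_id);
    INSERT INTO events
    SELECT i % 1000, now() - random() * interval '30 days', md5(i::text)
    FROM generate_series(1, 1000000) i;
    ANALYZE events;
    -- Each per-device group can be sorted on ts on its own, one group at a
    -- time, which may stay in memory even when a sort of the full table
    -- would spill to disk.
    EXPLAIN SELECT * FROM events ORDER BY device_id, ts;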

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e4d9469fdd..61775e6726 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1999,28 +1999,62 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2
  119
 (10 rows)
 
--- CROSS JOIN, not pushed down
+-- CROSS JOIN, not pushed down.  For this query, essential optimization is top-N
+-- sort.  But it can't be processed at remote side, because we never do LIMIT
+-- push down.  Assuming that sorting is not worth it to push down, CROSS JOIN
+-- is also not pushed down in order to transfer less tuples over network.
 EXPLAIN (VERBOSE, COSTS OFF)
-SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
- QUERY PLAN  
--
+SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 100 LIMIT 10;
+QUERY PLAN
+--
  Limit
-   Output: t1.c1, t2.c1
+   Output: t1.c3, t2.c3
->  Sort
- Output: t1.c1, t2.c1
- Sort Key: t1.c1, t2.c1
+ Output: t1.c3, t2.c3
+ Sort Key: t1.c3, t2.c3
  ->  Nested Loop
-   Output: t1.c1, t2.c1
+   Output: t1.c3, t2.c3
->  Foreign Scan on public.ft1 t1
- Output: t1.c1
- Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+ Output: t1.c3
+ Remote SQL: SELECT c3 FROM "S 1"."T 1"
->  Materialize
- Output: t2.c1
+ Output: t2.c3
  ->  Foreign Scan on public.ft2 t2
-   Output: t2.c1
-   Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+   Output: t2.c3
+   Remote SQL: SELECT c3 FROM "S 1"."T 1"
 (15 rows)
 
+SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 100 LIMIT 10;
+  c3   |  c3   
+---+---
+ 1 | 00101
+ 1 | 00102
+ 1 | 00103
+ 1 | 00104
+ 1 | 00105
+ 1 | 00106
+ 1 | 00107
+ 1 | 00108
+ 1 | 00109
+ 1 | 00110
+(10 rows)
+
+-- CROSS JOIN, pushed down.  Unlike previous query, remote side is able to
+-- return tuples in given order without full sort, but using index scan and
+-- incremental sort.  This is much cheaper than full sort on local side, even
+-- despite we don't know LIMIT on remote side.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+QUERY PLAN
+---
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Foreign Scan
+ Output: t1.c1, t2.c1
+ Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+ Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) ORDER BY r1."C 1" ASC NULLS LAST, r2."C 1" ASC NULLS LAST
+(6 rows)
+
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
  c1 | c1  
 +-
diff 

Re: [HACKERS] [PATCH] Incremental sort

2018-04-10 Thread Alexander Korotkov
On Tue, Apr 10, 2018 at 4:15 PM, David Steele  wrote:

> On 4/9/18 11:56 AM, Alexander Korotkov wrote:
> >
> > Thank you very much for your efforts on reviewing this patch.
> > I'm looking forward to working with you on this feature for PG12.
>
> Since there's a new patch I have changed the status to Needs Review and
> moved the entry to the next CF.
>

Right, thank you.

> Still, it seems to me that discussion and new patches will be required
> to address Tom's concerns.
>

Sounds correct to me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2018-04-10 Thread David Steele
On 4/9/18 11:56 AM, Alexander Korotkov wrote:
> 
> Thank you very much for your efforts on reviewing this patch.
> I'm looking forward to working with you on this feature for PG12.

Since there's a new patch I have changed the status to Needs Review and
moved the entry to the next CF.

Still, it seems to me that discussion and new patches will be required
to address Tom's concerns.

Regards,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] [PATCH] Incremental sort

2018-04-09 Thread Alexander Korotkov
On Sat, Apr 7, 2018 at 11:57 PM, Tomas Vondra wrote:

> On 04/07/2018 06:23 PM, Tom Lane wrote:
> > Teodor Sigaev  writes:
> >>> I dunno, how would you estimate whether this is actually a win or not?
> >>> I don't think our model of sort costs is anywhere near refined enough
> >>> or accurate enough to reliably predict whether this is better than
> >>> just doing it in one step.  Even if the cost model is good, it's not
> >>> going to be better than our statistics about the number/size of the
> >>> groups in the first column(s), and that's a notoriously unreliable
> stat.
> >
> >> I think that improvement in the cost calculation of sort should be a
> >> separate patch, not directly connected to this one.  Postponing patches
> >> until some other part is ready, to get the maximum improvement for the
> >> postponed ones, doesn't seem very good to me, especially if the patch
> >> offers some improvement right now.
> >
> > No, you misunderstand the point of my argument.  Without a reasonably
> > reliable cost model, this patch could easily make performance *worse*
> > not better for many people, due to choosing incremental-sort plans
> > where they were really a loss.
> >
>
> Yeah. Essentially the patch could push the planner to pick a path that
> has low startup cost (and very high total cost), assuming it'll only
> need to read small part of the input. But if the number of groups in the
> input is low (perhaps just one huge group), that would be a regression.
>

Yes, I think the biggest risk here is too small a number of groups.  More
precisely, the risk is groups that are too large, even though the total
number of groups might be large enough.
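
For illustration, a minimal SQL sketch of that risk (hypothetical table, not
from the patch or its tests): the number of distinct prefix values looks
healthy, yet most rows fall into a single group, so the very first group an
incremental sort hits is almost the whole input.

    CREATE TABLE skewed (a int, b int);
    INSERT INTO skewed
    SELECT CASE WHEN random() < 0.9 THEN 0 ELSE (random() * 100000)::int END,
           (random() * 1000000)::int
    FROM generate_series(1, 1000000);
    CREATE INDEX skewed_a_idx ON skewed (a);
    ANALYZE skewed;
    -- ~100k distinct values of "a", but ~900k rows share a = 0, so sorting
    -- the first group on (a, b) is nearly as expensive as a full sort.
    EXPLAIN SELECT * FROM skewed ORDER BY a, b;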

> If we were at the start of a development cycle and work were being
> > promised to be done later in the cycle to improve the planning aspect,
> > I'd be more charitable about it.  But this isn't merely the end of a
> > cycle, it's the *last day*.  Now is not the time to commit stuff that
> > needs, or even just might need, follow-on work.
> >
>
> +1 to that
>
> FWIW I'm willing to spend some time on the patch for PG12, particularly
> on the planner / costing part. The potential gains are too interesting
>

Thank you very much for your efforts on reviewing this patch.
I'm looking forward to working with you on this feature for PG12.

FWIW, I think that we're moving this patch in the right direction by
providing separate paths for incremental sort.  It's much better than
deciding between full and incremental sort in place.  For sure, considering
extra paths might cause a planning time regression.  But I think the
same statement is true about many other planning optimizations.
One thing we can do is to make enable_incrementalsort = off by
default.  Then only users who understand the importance of incremental
sort will get the extra planning time.
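
From a user's perspective that would simply be a session- or system-level
opt-in; a small sketch, assuming the enable_incrementalsort GUC from this
patch and the test(id, v1, v2) table from the recap example elsewhere in this
thread:

    -- Opt in for the current session and check whether the planner picks an
    -- incremental sort for a presorted-prefix query.
    SET enable_incrementalsort = on;
    EXPLAIN SELECT * FROM test ORDER BY v1, v2 LIMIT 10;
    -- Turn it back off if the extra planning work is unwanted.
    SET enable_incrementalsort = off;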

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Tomas Vondra
On 04/07/2018 06:23 PM, Tom Lane wrote:
> Teodor Sigaev  writes:
>>> I dunno, how would you estimate whether this is actually a win or not?
>>> I don't think our model of sort costs is anywhere near refined enough
>>> or accurate enough to reliably predict whether this is better than
>>> just doing it in one step.  Even if the cost model is good, it's not
>>> going to be better than our statistics about the number/size of the
>>> groups in the first column(s), and that's a notoriously unreliable stat.
> 
>> I think that improvement in the cost calculation of sort should be a
>> separate patch, not directly connected to this one.  Postponing patches
>> until some other part is ready, to get the maximum improvement for the
>> postponed ones, doesn't seem very good to me, especially if the patch
>> offers some improvement right now.
> 
> No, you misunderstand the point of my argument.  Without a reasonably
> reliable cost model, this patch could easily make performance *worse*
> not better for many people, due to choosing incremental-sort plans
> where they were really a loss.
> 

Yeah. Essentially the patch could push the planner to pick a path that
has low startup cost (and very high total cost), assuming it'll only
need to read small part of the input. But if the number of groups in the
input is low (perhaps just one huge group), that would be a regression.

> If we were at the start of a development cycle and work were being
> promised to be done later in the cycle to improve the planning aspect,
> I'd be more charitable about it.  But this isn't merely the end of a
> cycle, it's the *last day*.  Now is not the time to commit stuff that
> needs, or even just might need, follow-on work.
> 

+1 to that

FWIW I'm willing to spend some time on the patch for PG12, particularly
on the planner / costing part. The potential gains are too interesting.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Tom Lane
Teodor Sigaev  writes:
>> I dunno, how would you estimate whether this is actually a win or not?
>> I don't think our model of sort costs is anywhere near refined enough
>> or accurate enough to reliably predict whether this is better than
>> just doing it in one step.  Even if the cost model is good, it's not
>> going to be better than our statistics about the number/size of the
>> groups in the first column(s), and that's a notoriously unreliable stat.

> I think that improvement in the cost calculation of sort should be a
> separate patch, not directly connected to this one.  Postponing patches
> until some other part is ready, to get the maximum improvement for the
> postponed ones, doesn't seem very good to me, especially if the patch
> offers some improvement right now.

No, you misunderstand the point of my argument.  Without a reasonably
reliable cost model, this patch could easily make performance *worse*
not better for many people, due to choosing incremental-sort plans
where they were really a loss.

If we were at the start of a development cycle and work were being
promised to be done later in the cycle to improve the planning aspect,
I'd be more charitable about it.  But this isn't merely the end of a
cycle, it's the *last day*.  Now is not the time to commit stuff that
needs, or even just might need, follow-on work.

regards, tom lane



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Alexander Korotkov
On Sat, Apr 7, 2018 at 5:37 PM, Teodor Sigaev  wrote:

>> by this patch.  Revised version is attached.
>
> Fine, patch got several rounds of review in all its parts.  Are there any
> places which should be improved before commit?


Also, I found that after Alexander Kuzmenkov's planner changes, incremental
sort was used in cheapest_input_path() only if its child is the cheapest
total path.  That made incremental sort almost never get used.
I've changed that to consider an incremental sort path when we have some
presorted columns.  I also had to put the changes in the postgres_fdw
regression tests back, because incremental sort gets used right there.

This revision of the patch also includes a commit message.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-26.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Teodor Sigaev

> I dunno, how would you estimate whether this is actually a win or not?
> I don't think our model of sort costs is anywhere near refined enough
> or accurate enough to reliably predict whether this is better than
> just doing it in one step.  Even if the cost model is good, it's not
> going to be better than our statistics about the number/size of the
> groups in the first column(s), and that's a notoriously unreliable stat.

I think that improvement in the cost calculation of sort should be a
separate patch, not directly connected to this one.  Postponing patches
until some other part is ready, to get the maximum improvement for the
postponed ones, doesn't seem very good to me, especially if the patch
offers some improvement right now.



--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Andres Freund
On 2018-04-07 12:06:52 -0400, Tom Lane wrote:
> Alexander Korotkov  writes:
> > On Wed, Mar 28, 2018 at 6:38 PM, Alvaro Herrera wrote:
> >> Can we have a recap on what the patch *does*?
> 
> > The general idea hasn't changed much since the first email.
> > Incremental sort gives a benefit when you need to sort your dataset
> > by some list of columns while the input is already presorted
> > by some prefix of that list of columns.  Then you don't do a full sort
> > of the dataset, but rather sort the groups where the values of the prefix
> > columns are equal (see the header comment in nodeIncrementalSort.c).
> 
> I dunno, how would you estimate whether this is actually a win or not?
> I don't think our model of sort costs is anywhere near refined enough
> or accurate enough to reliably predict whether this is better than
> just doing it in one step.  Even if the cost model is good, it's not
> going to be better than our statistics about the number/size of the
> groups in the first column(s), and that's a notoriously unreliable stat.
> 
> Given that we already have more than enough dubious patches that have
> been shoved in in the last few days, I'd rather not pile on stuff that
> there's any question about.

I don't disagree with any of that. Just wanted to pipe up to say that
there's a fair argument to be made that this patch, which has lingered
for years, "deserves" more to mature in tree than some of the rest.

Greetings,

Andres Freund



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Tom Lane
Alexander Korotkov  writes:
> On Wed, Mar 28, 2018 at 6:38 PM, Alvaro Herrera wrote:
>> Can we have a recap on what the patch *does*?

> The general idea hasn't changed much since the first email.
> Incremental sort gives a benefit when you need to sort your dataset
> by some list of columns while the input is already presorted
> by some prefix of that list of columns.  Then you don't do a full sort
> of the dataset, but rather sort the groups where the values of the prefix
> columns are equal (see the header comment in nodeIncrementalSort.c).

I dunno, how would you estimate whether this is actually a win or not?
I don't think our model of sort costs is anywhere near refined enough
or accurate enough to reliably predict whether this is better than
just doing it in one step.  Even if the cost model is good, it's not
going to be better than our statistics about the number/size of the
groups in the first column(s), and that's a notoriously unreliable stat.

Given that we already have more than enough dubious patches that have
been shoved in in the last few days, I'd rather not pile on stuff that
there's any question about.

regards, tom lane



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Alexander Korotkov
On Wed, Mar 28, 2018 at 6:38 PM, Alvaro Herrera wrote:

> Teodor Sigaev wrote:
> > > BTW, patch had conflicts with master.  Please, find rebased version attached.
> >
> > Despite the patch conflict, the patch looks committable.  Does anybody have
> > objections to committing it?
> >
> > The patch received several rounds of review during 2 years, and it seems to
> > me that keeping it out of the sources may cause it to be lost, although it
> > offers a performance improvement in rather wide use cases.
>
> Can we have a recap on what the patch *does*?  I see there's a
> description in Alexander's first email
> https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=
> V7wgDaTXdDd9=gon-...@mail.gmail.com
> but that was a long time ago, and the patch has likely changed in the
> meantime ...
>

The general idea hasn't changed much since the first email.
Incremental sort gives a benefit when you need to sort your dataset
by some list of columns while the input is already presorted
by some prefix of that list of columns.  Then you don't do a full sort
of the dataset, but rather sort the groups where the values of the prefix
columns are equal (see the header comment in nodeIncrementalSort.c).

The same example as in the first letter works, but the plan displays
differently.

create table test as (select id, (random()*1)::int as v1, random() as
v2 from generate_series(1,100) id);
create index test_v1_idx on test (v1);

# explain select * from test order by v1, v2 limit 10;
  QUERY PLAN
---
 Limit  (cost=1.26..1.26 rows=10 width=16)
   ->  Incremental Sort  (cost=1.26..1.42 rows=100 width=16)
 Sort Key: v1, v2
 Presorted Key: v1
 ->  Index Scan using test_v1_idx on test  (cost=0.42..47602.50
rows=100 width=16)
(5 rows)

# select * from test order by v1, v2 limit 10;
   id   | v1 | v2
++
 216426 |  0 | 0.0697950166650116
  96649 |  0 |  0.230586454737931
 892243 |  0 |  0.677791305817664
 323001 |  0 |  0.708638620562851
  87458 |  0 |  0.923310813494027
 224291 |  0 |0.9349579163827
 446366 |  0 |  0.984529701061547
 376781 |  0 |  0.997424073051661
 768246 |  1 |  0.127851997036487
 666102 |  1 |   0.27093240711838
(10 rows)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Tomas Vondra
On 04/07/2018 04:37 PM, Teodor Sigaev wrote:
>> by this patch.  Revised version is attached.
> 
> Fine, patch got several rounds of review in all its parts.  Are there any
> places which should be improved before commit?
> 

I personally feel rather uneasy about committing it, TBH.

While I don't see any obvious issues in the patch at the moment, the
recent changes were rather significant so I might easily miss some
unexpected consequences. (OTOH it's true it was mostly about reduction
of scope, to limit the risks.)

I don't have time to do more review and testing on the latest patch
version, unfortunately, certainly not before the CF end.


So I guess the ultimate review / decision is up to you ...

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Teodor Sigaev

> by this patch.  Revised version is attached.


Fine, patch got several rounds of review in all its parts.  Are there any
places which should be improved before commit?


--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Alexander Korotkov
On Sat, Apr 7, 2018 at 4:56 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Apr 6, 2018 at 11:40 PM, Alexander Kuzmenkov <
> a.kuzmen...@postgrespro.ru> wrote:
>
>> On 06.04.2018 20:26, Tomas Vondra wrote:
>>
>>> I personally am OK with reducing the scope of the patch like this. It's
>>> still beneficial for the common ORDER BY + LIMIT case, which is good. I
>>> don't think it may negatively affect other cases (at least I can't think
>>> of any).
>>>
>>
>> I think we can reduce it even further. Just try incremental sort along
>> with full sort over the cheapest path in create_ordered_paths, and don't
>> touch anything else. This is a very minimal and a probably safe start, and
>> then we can continue working on other, more complex cases. In the attached
>> patch I tried to do this. We probably should also remove changes in
>> make_sort() and create a separate function make_incremental_sort() for it,
>> but I'm done for today.
>
>
> I've further decoupled sort and incremental sort, providing them with
> separate functions for plan creation.
>
> 2) Likewise, I've suggested that the claim about abbreviated keys in
>>>
>> nodeIncrementalsort.c is dubious. No response, and the XXX comment was
>>> instead merged into the patch:
>>>
>>>   * XXX The claim about abbreviated keys seems rather dubious, IMHO.
>>>
>>
>> Not sure about that, maybe just use abbreviated keys for the first
>> version? Later we can research this more closely and maybe start deciding
>> whether to use abbrev on planning stage.
>
>
> That comes from the time when we were trying to make incremental sort
> always be no worse than full sort.  Now we have separate paths for full
> and incremental sorts, and some costing penalty for incremental sort.
> So incremental sort should be selected only when it's expected to give
> a big win.  Thus, we can give up on this optimization, at least in the
> initial version.
>
> So, removed.
>
> 4) It's not clear to me why INITIAL_MEMTUPSIZE is defined the way it is.
>>> There needs to be a comment - the intent seems to be making it large
>>> enough to exceed ALLOCSET_SEPARATE_THRESHOLD, but it's not quite clear
>>> why that's a good idea.
>>>
>>
>> Not sure myself, let's ask the other Alexander.
>
>
> I've added a comment to INITIAL_MEMTUPSIZE.  However, to be fair, it's not
> an invention of this patch: the initial size of the memtuples array was
> the same previously.  What this patch does is just move it into a macro.
>
> Also, this note hadn't been addressed yet.
>
> On Sat, Mar 31, 2018 at 11:43 PM, Tomas Vondra wrote:
>
>> I'm wondering if a static MIN_GROUP_SIZE is good idea. For example, what
>> if the subplan is expected to return only very few tuples (say, 33), but
>> the query includes LIMIT 1. Now, let's assume the startup/total cost of
>> the subplan is 1 and 100. With MIN_GROUP_SIZE 32 we're bound to
>> execute it pretty much till the end, while we could terminate after the
>> first tuple (if the prefix changes).
>>
>> So I think we should use a Min(limit,MIN_GROUP_SIZE) here, and perhaps
>> this should depend on average group size too.
>>
>
> I agree with that.  For bounded sort, the attached patch now selects the
> minimal group size as Min(DEFAULT_MIN_GROUP_SIZE, bound).  That should
> improve the "LIMIT small_number" case.
>

I've just noticed that incremental sort is now not used in
contrib/postgres_fdw.
That's even better, given that we're going to limit the use cases of
incremental sort.  I've rolled back all the changes made to the tests of
contrib/postgres_fdw
by this patch.  Revised version is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-25.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Alexander Korotkov
On Fri, Apr 6, 2018 at 11:40 PM, Alexander Kuzmenkov <
a.kuzmen...@postgrespro.ru> wrote:

> On 06.04.2018 20:26, Tomas Vondra wrote:
>
>> I personally am OK with reducing the scope of the patch like this. It's
>> still beneficial for the common ORDER BY + LIMIT case, which is good. I
>> don't think it may negatively affect other cases (at least I can't think
>> of any).
>>
>
> I think we can reduce it even further. Just try incremental sort along
> with full sort over the cheapest path in create_ordered_paths, and don't
> touch anything else. This is a very minimal and a probably safe start, and
> then we can continue working on other, more complex cases. In the attached
> patch I tried to do this. We probably should also remove changes in
> make_sort() and create a separate function make_incremental_sort() for it,
> but I'm done for today.


I've further decoupled sort and incremental sort, providing them with
separate functions for plan creation.

2) Likewise, I've suggested that the claim about abbreviated keys in
>>
> nodeIncrementalsort.c is dubious. No response, and the XXX comment was
>> instead merged into the patch:
>>
>>   * XXX The claim about abbreviated keys seems rather dubious, IMHO.
>>
>
> Not sure about that, maybe just use abbreviated keys for the first
> version? Later we can research this more closely and maybe start deciding
> whether to use abbrev on planning stage.


That comes from the time when we were trying to make incremental sort always
be no worse than full sort.  Now we have separate paths for full and
incremental sorts, and some costing penalty for incremental sort.  So
incremental sort should be selected only when it's expected to give a big
win.  Thus, we can give up on this optimization, at least in the initial
version.

So, removed.

4) It's not clear to me why INITIAL_MEMTUPSIZE is defined the way it is.
>> There needs to be a comment - the intent seems to be making it large
>> enough to exceed ALLOCSET_SEPARATE_THRESHOLD, but it's not quite clear
>> why that's a good idea.
>>
>
> Not sure myself, let's ask the other Alexander.


I've added a comment to INITIAL_MEMTUPSIZE.  However, to be fair, it's not an
invention of this patch: the initial size of the memtuples array was the same
previously.  What this patch does is just move it into a macro.

Also, this note hadn't been addressed yet.

On Sat, Mar 31, 2018 at 11:43 PM, Tomas Vondra  wrote:

> I'm wondering if a static MIN_GROUP_SIZE is good idea. For example, what
> if the subplan is expected to return only very few tuples (say, 33), but
> the query includes LIMIT 1. Now, let's assume the startup/total cost of
> the subplan is 1 and 100. With MIN_GROUP_SIZE 32 we're bound to
> execute it pretty much till the end, while we could terminate after the
> first tuple (if the prefix changes).
>
> So I think we should use a Min(limit,MIN_GROUP_SIZE) here, and perhaps
> this should depend on average group size too.
>

I agree with that.  For bounded sort, the attached patch now selects the
minimal group size as Min(DEFAULT_MIN_GROUP_SIZE, bound).  That should
improve the "LIMIT small_number" case.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-24.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2018-04-06 Thread Alexander Kuzmenkov

Hi all,

This is the other Alexander K. speaking.


On 06.04.2018 20:26, Tomas Vondra wrote:

> I personally am OK with reducing the scope of the patch like this. It's
> still beneficial for the common ORDER BY + LIMIT case, which is good. I
> don't think it may negatively affect other cases (at least I can't think
> of any).


I think we can reduce it even further. Just try incremental sort along 
with full sort over the cheapest path in create_ordered_paths, and don't 
touch anything else. This is a very minimal and a probably safe start, 
and then we can continue working on other, more complex cases. In the 
attached patch I tried to do this. We probably should also remove 
changes in make_sort() and create a separate function 
make_incremental_sort() for it, but I'm done for today.




> 1) pathkeys_useful_for_ordering() still uses enable_incrementalsort,
> which I think is a bad idea. I've complained about it in my review on
> 31/3, and I don't see any explanation why this is a good idea.


Removed.


> 2) Likewise, I've suggested that the claim about abbreviated keys in
> nodeIncrementalsort.c is dubious. No response, and the XXX comment was
> instead merged into the patch:
>
>   * XXX The claim about abbreviated keys seems rather dubious, IMHO.


Not sure about that, maybe just use abbreviated keys for the first 
version? Later we can research this more closely and maybe start 
deciding whether to use abbrev on planning stage.



> 3) There is a comment at cost_merge_append, despite there being no
> relevant changes in that function. Misplaced comment?


Removed.


> 4) It's not clear to me why INITIAL_MEMTUPSIZE is defined the way it is.
> There needs to be a comment - the intent seems to be making it large
> enough to exceed ALLOCSET_SEPARATE_THRESHOLD, but it's not quite clear
> why that's a good idea.


Not sure myself, let's ask the other Alexander.



> 5) I do get this warning when building the code:
>
> costsize.c: In function ‘cost_incremental_sort’:
> costsize.c:1812:2: warning: ISO C90 forbids mixed declarations and code
> [-Wdeclaration-after-statement]
>    List*presortedExprs = NIL;
>    ^~~~
>
> 6) The comment at cost_incremental_sort talks about cost_sort_internal,
> but it's cost_sort_tuplesort I guess.


Fixed.


> 7) The new code in create_sort_path is somewhat ugly, I guess. It's
> correct, but it really needs to be made easier to comprehend. I might
> have time to look into that tomorrow, but I can't promise that.


Removed this code altogether, now the costs are compared by add_path as 
usual.



> Attached is a diff highlighting some of those places, and couple of
> minor code formatting fixes.


Applied.

Also some other changes from me:

 * Remove extra blank lines.
 * label_sort_with_costsize shouldn't have to deal with IncrementalSort
   plans, because they are only created from corresponding Path nodes.
 * Reword a comment in ExecSupportsBackwardsScan.
 * Clarify cost calculations.
 * enable_incrementalsort is checked at path level, we don't have to check
   it again at plan level.
 * enable_sort should act as a cost-based soft disable for both incremental
   and normal sort.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index fa0d1db5fb..2c0c6c3768 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1999,28 +1999,62 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2
  119
 (10 rows)
 
--- CROSS JOIN, not pushed down
+-- CROSS JOIN, not pushed down.  For this query, essential optimization is top-N
+-- sort.  But it can't be processed at remote side, because we never do LIMIT
+-- push down.  Assuming that sorting is not worth it to push down, CROSS JOIN
+-- is also not pushed down in order to transfer less tuples over network.
 EXPLAIN (VERBOSE, COSTS OFF)
-SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
- QUERY PLAN  
--
+SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 100 LIMIT 10;
+QUERY PLAN
+--
  Limit
-   Output: t1.c1, t2.c1
+   Output: t1.c3, t2.c3
->  Sort
- Output: t1.c1, t2.c1
- Sort Key: t1.c1, t2.c1
+ Output: t1.c3, t2.c3
+ Sort Key: t1.c3, t2.c3
  ->  Nested Loop
-   Output: t1.c1, t2.c1
+   Output: t1.c3, t2.c3
->  Foreign Scan on public.ft1 t1
- Output: t1.c1
- Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+ Output: t1.c3
+ Remote SQL: SELECT c3 FROM "S 1"."T 1"
->  Materialize
-   

Re: [HACKERS] [PATCH] Incremental sort

2018-04-06 Thread Tomas Vondra
On 04/06/2018 01:43 AM, Alexander Korotkov wrote:
> Hi!
> 
> On Tue, Apr 3, 2018 at 2:10 PM, Tomas Vondra wrote:
> 
> I think solving this may be fairly straight-forward. Essentially, until
> now we only had one way to do the sort, so it was OK to make the sort
> implicit by checking if the path is sorted
> 
>     if (input not sorted)
>     {
>         ... add a Sort node ...
>     }
> 
> But now we have multiple possible ways to do the sort, with different
> startup/total costs. So the places that create the sorts need to
> actually generate the Sort paths for each sort alternative, and store
> the information in the Sort node (instead of relying on pathkeys).
> 
> Ultimately, this should simplify the createplan.c places making all the
> make_sort calls unnecessary (i.e. the input should be already sorted
> when needed). Otherwise it'd mean the decision needs to be done locally,
> but I don't think that should be needed.
> 
> But it's surely a fairly invasive change to the patch ...
> 
> 
> Right, there are situations when incremental sort has a lower startup cost,
> but a higher total cost.  In order to find the lower cost, we ideally should
> generate paths for both full sort and incremental sort.  However, that would
> increase the total number of paths, and could slow down planning time.
> Another issue is that we don't always generate paths for sorts.  And yes, it
> would be rather invasive.  So, that doesn't look feasible to get into 11.
> 

I agree that's probably not feasible for PG11, considering the freeze is
about 48h from now. Not necessarily because of amount of code needed to
do that (it might be fairly small, actually) but because of the risk of
regressions in other types of plans and lack of time for review/testing.

I do not think this would cause a significant increase in path number.
We already do have the (partially sorted) paths in pathlist, otherwise
v21 wouldn't be able to build the incremental sort path anyway. And the
plans that did the decision in createplan.c could fairly easily do the
decision when constructing the path, I believe.

Looking v21, this affects three different places:

1) create_merge_append_plan

For merge_append the issue is that generate_mergeappend_paths() calls
get_cheapest_path_for_pathkeys(), which however only looks at cheapest
startup/total path, and then simply falls-back to cheapest_total_path if
there are no suitably sorted paths. IMHO if you modify this to also
consider partially-sorted paths, it should work. You'll have to account
for the extra cost of the incremental sort, and it needs to be fairly cheap
(perhaps even by first quickly computing some initial cost estimate -
see e.g. initial_cost_nestloop/final_cost_nestloop).

2) create_mergejoin_plan

For mergejoin, the issue is that sort_inner_and_outer() only looks at
cheapest_total_path for both sides, even before computing the merge
pathkeys. So some of the code would have to move into the foreach loop,
and the paths would be picked by get_cheapest_path_for_pathkeys(). This
time only using total cost, per the comment in sort_inner_and_outer().

3) create_gather_merge_plan

This seems fairly simple - the issue is that gather_grouping_paths()
only looks at cheapest_partial_path. Should be a matter of simply
calling the improved get_cheapest_path_for_pathkeys().

Of course, this is all fairly hand-wavy and it may turn out to be fairly
expensive in some cases. But we can use another trick - we don't need to
search through all partially sorted paths, because for each pathkey
prefix there can be just one "best" path for startup cost and one for
total cost. So we could maintain a much shorter list of partially sorted
paths, I think.

>
> Instead, I decided to cut the usage of incremental sort.  Now, incremental
> sort is generated only in create_sort_path().  The cheaper path is selected
> between incremental sort and full sort, taking limit_tuples into account.
> That limits the usage of incremental sort, but the risk of regression from
> this patch is also minimal.  In fact, incremental sort will be used only
> when a sort is explicitly specified and, at the same time, either LIMIT is
> specified or the dataset to be sorted is large and incremental sort saves
> disk IO.
> 

I personally am OK with reducing the scope of the patch like this. It's
still beneficial for the common ORDER BY + LIMIT case, which is good. I
don't think it may negatively affect other cases (at least I can't think
of any).

It's pretty obvious it may be extremely useful for the other cases too
(particularly for mergejoin on large tables, where it can allow
in-memory sort with quick startup).

But even if you managed to make the necessary code changes, it's
unlikely any experienced committer will look into such a significant
change this close to the cutoff. Either they are going to be away for a
weekend, or they are already looking at other 

Re: [HACKERS] [PATCH] Incremental sort

2018-04-03 Thread Alexander Korotkov
On Sun, Apr 1, 2018 at 12:06 AM, Tomas Vondra wrote:

> On 03/31/2018 10:43 PM, Tomas Vondra wrote:
> > ...
> > But I'm pretty sure it may lead to surprising behavior - for example if
> > you disable incremental sorts (enable_incrementalsort=off), the plan
> > will switch to plain sort without the additional costs. So you'll get a
> > cheaper plan by disabling some operation. That's surprising.
> >
>
> To illustrate this is a valid issue, consider this trivial example:
>
> create table t (a int, b int, c int);
>
> insert into t select 10*random(), 10*random(), 10*random()
>   from generate_series(1,100) s(i);
>
> analyze t;
>
> explain select * from (select * from t order by a,b) foo order by a,b,c;
>
>QUERY PLAN
> 
>  Incremental Sort  (cost=133100.48..264139.27 rows=100 width=12)
>Sort Key: t.a, t.b, t.c
>Presorted Key: t.a, t.b
>->  Sort  (cost=132154.34..134654.34 rows=100 width=12)
>  Sort Key: t.a, t.b
>  ->  Seq Scan on t  (cost=0.00..15406.00 rows=100 width=12)
> (6 rows)
>
> set enable_incrementalsort = off;
>
> explain select * from (select * from t order by a,b) foo order by a,b,c;
>QUERY PLAN
> 
>  Sort  (cost=261402.69..263902.69 rows=100 width=12)
>Sort Key: t.a, t.b, t.c
>->  Sort  (cost=132154.34..134654.34 rows=100 width=12)
>  Sort Key: t.a, t.b
>  ->  Seq Scan on t  (cost=0.00..15406.00 rows=100 width=12)
> (5 rows)
>
> So the cost with incremental sort was 264139, and after disabling the
> incremental cost it dropped to 263902. Granted, the difference is
> negligible in this case, but it's still surprising.
>
> Also, it can be made much more significant by reducing the number of
> prefix groups in the data:
>
> truncate t;
>
> insert into t select 1,1,1 from generate_series(1,100) s(i);
>
> analyze t;
>
> set enable_incrementalsort = on;
>
> explain select * from (select * from t order by a,b) foo order by a,b,c;
>
>QUERY PLAN
> 
>  Incremental Sort  (cost=324165.83..341665.85 rows=100 width=12)
>Sort Key: t.a, t.b, t.c
>Presorted Key: t.a, t.b
>->  Sort  (cost=132154.34..134654.34 rows=100 width=12)
>  Sort Key: t.a, t.b
>  ->  Seq Scan on t  (cost=0.00..15406.00 rows=100 width=12)
> (6 rows)
>
> So that's 263902 vs. 341665, yet we still prefer the incremental mode.


The problem is well defined, thank you.
I'll check what can be done in this area today.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2018-03-31 Thread Tomas Vondra
Hi,

I've been doing a bit more review of the patch today, focusing on the
planner part, and I'm starting to have some doubts regarding the way
incremental sort paths are created. I do have some questions about the
executor and other parts too.

I'll mark this as 'waiting on author' to make it clear the patch is
still being discussed, RFC is not appropriate status for that.

Attached is a patch that highlights some of the interesting places, and
also suggests some minor changes to comments and other tweaks.


1) planning/costing of incremental vs. non-incremental sorts


In short, all the various places that create/cost sorts:

 * createplan.c (make_sort)
 * planner.c (create_sort_path)
 * pathnode.c (cost_sort)

seem to prefer incremental sorts whenever available. Consider for
example this code from create_merge_append_plan():

if (!pathkeys_common_contained_in(pathkeys, subpath->pathkeys,
  _common_pathkeys))
{
Sort   *sort = make_sort(subplan, numsortkeys,
 n_common_pathkeys,
 sortColIdx, sortOperators,
 collations, nullsFirst);

label_sort_with_costsize(root, sort, best_path->limit_tuples);
subplan = (Plan *) sort;
}

This essentially says that when (n_common_pathkeys > 0), the sort is
going to be incremental.

That however seems to rely on an important assumption - when the input
is presorted, the incremental sort is expected to be cheaper than
regular sort.

This assumption however seems to be proven invalid by cost_sort, which
does the common part for both sort modes (incremental/non-incremental)
first, and then does this:

/* Extra costs of incremental sort */
if (presorted_keys > 0)
{
... add something to the costs ...
}

That is, the incremental sort seems to be pretty much guaranteed to be
more expensive than a regular Sort (with the exception of LIMIT queries,
where it's guaranteed to win thanks to lower startup cost).

I don't know how significant the cost difference may be (perhaps not
much), or if it may lead to inefficient plans. For example what if the
cheapest total path is partially sorted by chance, and only has a single
prefix group? Then all the comparisons with pivotSlot are unnecessary.

But I'm pretty sure it may lead to surprising behavior - for example if
you disable incremental sorts (enable_incrementalsort=off), the plan
will switch to plain sort without the additional costs. So you'll get a
cheaper plan by disabling some operation. That's surprising.

So I think it would be more appropriate if those places actually did a
costing of incremental vs. non-incremental sorts, and then constructed
the cheaper option. Essentially we should consider both plain and
incremental sort for each partially sorted input path, and then pick the
right one.

Of course, this is going to be tricky in createplan.c which builds the
plans directly - in that case it might be integrated into make_sort() or
something like that.

Also, I wonder if we could run into problems, due to incremental sort
not supporting things the regular sort does (rewind, backwards scans and
mark/restore).


2) nodeIncrementalsort.c


There's a couple of obsolete comments, that came from nodeSort.c and did
not get tweaked (and so talk about first-time through when incremental
sort needs to do that for each group, etc.). The attached diff tweaks
those, and clarifies a couple of others. I've also added some comments
explaining what the pivotSlot is about etc. There's also a couple of XXX
comments asking additional questions/clarifications.

I'm wondering if a static MIN_GROUP_SIZE is good idea. For example, what
if the subplan is expected to return only very few tuples (say, 33), but
the query includes LIMIT 1. Now, let's assume the startup/total cost of
the subplan is 1 and 100. With MIN_GROUP_SIZE 32 we're bound to
execute it pretty much till the end, while we could terminate after the
first tuple (if the prefix changes).

So I think we should use a Min(limit,MIN_GROUP_SIZE) here, and perhaps
this should depend on average group size too.

The other questionable thing seems to be this claim:

 * We unlikely will have huge groups with incremental sort.  Therefore
 * usage of abbreviated keys would be likely a waste of time.

followed by disabling abbreviated keys in the tuplesort_begin_heap call.
I find this rather dubious and unsupported by any arguments (I certainly
don't see any in the comments).

If would be more acceptable if the estimated number of groups was used
when deciding whether to use incremental sort or not, but that's not the
case - as explained in the first part, we simply prefer incremental
sorts whenever there is a prefix. In those cases we have very little
idea (or even guarantees) regarding the average group size.


Re: [HACKERS] [PATCH] Incremental sort

2018-03-29 Thread Alexander Kuzmenkov

Hi Alexander,

I took a quick look at the patch. Some things I fixed myself in the 
attached patch v.21. Here is the summary:


 * Typo in compare_fractional_path_costs() should be fixed as a separate patch.
 * Remove unused function estimate_pathkeys_groups.
 * Extra MemoryContextReset before tuplesort_end() shouldn't be a big deal,
   so we don't have to add a parameter to tuplesort_free().
 * Add comment to maincontext declaration.
 * Fix typo in INITIAL_MEMTUPSIZE.
 * Remove trailing whitespace.

Some other things I found:

In tuplesort_reset:
if (state->memtupsize < INITIAL_MEMTUPSIZE)
    
I'd add a comment explaining when and why we have to do this. Also maybe 
a comment to other allocations of memtuples in tuplesort_begin() and 
mergeruns(), explaining why it is reallocated and why in maincontext.



In tuplesort_updatemax:
    /* XXX */
    if (spaceUsedOnDisk > state->maxSpaceOnDisk ||
        (spaceUsedOnDisk == state->maxSpaceOnDisk && spaceUsed > 
state->maxSpace))

XXX. Also comparing bools with '>' looks confusing to me.


We should add a comment on top of tuplesort.c, explaining that we now 
have a faster way to sort multiple batches of data using the same sort 
conditions.



The name 'main context' sounds somewhat vague. Maybe 'top context'? Not 
sure.



In ExecSupportBackwardsScan:
        case T_IncrementalSort:
            return false;
This separate case looks useless, I'd either add a comment explaining 
why it can't scan backwards, or just return false by default.




That's all I have for today; tomorrow I'll continue with reviewing the 
planner part of the patch.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d6e387d63..d11777cb90 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1999,28 +1999,62 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2
  119
 (10 rows)
 
--- CROSS JOIN, not pushed down
+-- CROSS JOIN, not pushed down.  For this query, essential optimization is top-N
+-- sort.  But it can't be processed at remote side, because we never do LIMIT
+-- push down.  Assuming that sorting is not worth it to push down, CROSS JOIN
+-- is also not pushed down in order to transfer less tuples over network.
 EXPLAIN (VERBOSE, COSTS OFF)
-SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
- QUERY PLAN  
--
+SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 100 LIMIT 10;
+QUERY PLAN
+--
  Limit
-   Output: t1.c1, t2.c1
+   Output: t1.c3, t2.c3
->  Sort
- Output: t1.c1, t2.c1
- Sort Key: t1.c1, t2.c1
+ Output: t1.c3, t2.c3
+ Sort Key: t1.c3, t2.c3
  ->  Nested Loop
-   Output: t1.c1, t2.c1
+   Output: t1.c3, t2.c3
->  Foreign Scan on public.ft1 t1
- Output: t1.c1
- Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+ Output: t1.c3
+ Remote SQL: SELECT c3 FROM "S 1"."T 1"
->  Materialize
- Output: t2.c1
+ Output: t2.c3
  ->  Foreign Scan on public.ft2 t2
-   Output: t2.c1
-   Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+   Output: t2.c3
+   Remote SQL: SELECT c3 FROM "S 1"."T 1"
 (15 rows)
 
+SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 100 LIMIT 10;
+  c3   |  c3   
+---+---
+ 1 | 00101
+ 1 | 00102
+ 1 | 00103
+ 1 | 00104
+ 1 | 00105
+ 1 | 00106
+ 1 | 00107
+ 1 | 00108
+ 1 | 00109
+ 1 | 00110
+(10 rows)
+
+-- CROSS JOIN, pushed down.  Unlike previous query, remote side is able to
+-- return tuples in given order without full sort, but using index scan and
+-- incremental sort.  This is much cheaper than full sort on local side, even
+-- despite we don't know LIMIT on remote side.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+QUERY PLAN 
+---
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Foreign Scan
+ Output: t1.c1, t2.c1
+ 

Re: [HACKERS] [PATCH] Incremental sort

2018-03-29 Thread Alexander Korotkov
On Wed, Mar 28, 2018 at 7:17 PM, Andres Freund  wrote:

> On 2018-03-28 16:28:01 +0300, Teodor Sigaev wrote:
> > > BTW, patch had conflicts with master.  Please, find rebased version
> attached.
> >
> > Despite by patch conflist patch looks commitable, has anybody objections
> to
> > commit it?
>
> > Patch recieved several rounds of review during 2 years, and seems to me,
> > keeping it out from sources may cause a lost it. Although it suggests
> > performance improvement in rather wide usecases.
>
> My impression it has *NOT* received enough review to be RFC. Not saying
> it's impossible to get there this release, but that just committing it
> doesn't seem wise.
>

I would say that the executor part of this patch has already received plenty
of review.
For sure, there still might be issues.  I just mean that the amount of review
of the executor part of this patch is not less than that of the average patch
we commit.  But the optimizer and costing parts of this patch still need
somebody to take a look at them.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2018-03-28 Thread Andres Freund
Hi,

On 2018-03-28 16:28:01 +0300, Teodor Sigaev wrote:
> > BTW, patch had conflicts with master.  Please, find rebased version 
> > attached.
> 
> Despite the patch conflict, the patch looks committable.  Does anybody have
> objections to committing it?

> The patch received several rounds of review during 2 years, and it seems to
> me that keeping it out of the sources may cause it to be lost, although it
> offers a performance improvement in rather wide use cases.

My impression it has *NOT* received enough review to be RFC. Not saying
it's impossible to get there this release, but that just committing it
doesn't seem wise.

Greetings,

Andres Freund



Re: [HACKERS] [PATCH] Incremental sort

2018-03-28 Thread Alvaro Herrera
Teodor Sigaev wrote:
> > BTW, patch had conflicts with master.  Please, find rebased version 
> > attached.
> 
> Despite the patch conflict, the patch looks committable.  Does anybody have
> objections to committing it?
> 
> The patch received several rounds of review during 2 years, and it seems to
> me that keeping it out of the sources may cause it to be lost, although it
> offers a performance improvement in rather wide use cases.

Can we have a recap on what the patch *does*?  I see there's a
description in Alexander's first email
https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com
but that was a long time ago, and the patch has likely changed in the
meantime ...

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Incremental sort

2018-03-28 Thread Tomas Vondra


On 03/28/2018 05:12 PM, Alexander Korotkov wrote:
> On Wed, Mar 28, 2018 at 4:44 PM, Tomas Vondra wrote:
> 
> On 03/28/2018 03:28 PM, Teodor Sigaev wrote:
> >> BTW, patch had conflicts with master.  Please, find rebased version
> >> attached.
> >
> > Despite the patch conflict, the patch looks committable.  Does anybody have
> > objections to committing it?
> >
> > The patch received several rounds of review during 2 years, and it seems to
> > me that keeping it out of the sources may cause it to be lost, although it
> > offers a performance improvement in rather wide use cases.
> >
> 
> No objections from me - if you want me to do one final round of review
> after the rebase (not sure how invasive it'll turn out), let me know.
> 
> 
> Rebased patch is attached.  Incremental sort gets used in multiple places
> of the partition_aggregate regression test.  I've checked those cases, and
> it seems that incremental sort was selected correctly.
> 

OK, I'll take a look.

> BTW one detail I'd change is name of the GUC variable. enable_incsort
> seems unnecessarily terse - let's go for enable_incremental_sort or
> something like that.
> 
> 
> enable_incsort was already renamed to enable_incrementalsort
> in [1].
> 
> 1.
> https://www.postgresql.org/message-id/CAPpHfduAVmiGDZC%2BdfNL1rEGu0mt45Rd_mxwjY57uqwWhrvQzg%40mail.gmail.com
>  
> 

Ah, apologies. I've been looking at the wrong version.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Incremental sort

2018-03-28 Thread Teodor Sigaev

> BTW, patch had conflicts with master.  Please, find rebased version attached.


Despite the patch conflict, the patch looks committable.  Does anybody have
objections to committing it?


The patch received several rounds of review during 2 years, and it seems to me
that keeping it out of the sources may cause it to be lost, although it offers
a performance improvement in rather wide use cases.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCH] Incremental sort

2018-03-28 Thread Teodor Sigaev

> BTW, patch had conflicts with master.  Please, find rebased version attached.


Sorry, but the patch conflicts with master again.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCH] Incremental sort

2018-03-21 Thread Alexander Korotkov
Hi!

On Wed, Mar 21, 2018 at 2:30 PM, Darafei "Komяpa" Praliaskouski <m...@komzpa.net> wrote:

> On a PostGIS system tuned to prefer parallelism heavily
> (min_parallel_table_scan_size=10kB), we experience issues with the QGIS table
> discovery query with this patch:
>
> Failing query is:
> [local] gis@gis=# SELECT l.f_table_name,l.f_table_schema,l.f_geometry_column,
> upper(l.type),l.srid,l.coord_dimension,c.relkind,obj_description(c.oid)
> FROM geometry_columns l,pg_class c,pg_namespace n
> WHERE c.relname=l.f_table_name AND l.f_table_schema=n.nspname
> AND n.oid=c.relnamespace AND has_schema_privilege(n.nspname,'usage')
> AND has_table_privilege('"'||n.nspname||'"."'||c.relname||'"','select')
> AND l.f_table_schema='public' ORDER BY n.nspname,c.relname,l.f_geometry_column;
>
> ERROR:  XX000: badly formatted node string "INCREMENTALSORT :startup_cost 37"...
> CONTEXT:  parallel worker
> LOCATION:  parseNodeString, readfuncs.c:2693
> Time: 42,052 ms
>
>
> Query plan:
>
>
>   QUERY PLAN
>
>
> 
> 
> 
> 
> ──
> Sort  (cost=38717.21..38717.22 rows=1 width=393)
>   Sort Key: c_1.relname, a.attname
>   ->  Nested Loop  (cost=36059.35..38717.20 rows=1 width=393)
> ->  Index Scan using pg_namespace_nspname_index on pg_namespace n
> (cost=0.28..2.30 rows=1 width=68)
>   Index Cond: (nspname = 'public'::name)
>   Filter: has_schema_privilege((nspname)::text, 'usage'::text)
> ->  Nested Loop  (cost=36059.08..38714.59 rows=1 width=407)
>   ->  Nested Loop Left Join  (cost=36058.65..38712.12 rows=1
> width=334)
> Join Filter: ((s_2.connamespace = n_1.oid) AND
> (a.attnum = ANY (s_2.conkey)))
> ->  Nested Loop Left Join  (cost=36058.51..38711.94
> rows=1 width=298)
>   Join Filter: ((s_1.connamespace = n_1.oid) AND
> (a.attnum = ANY (s_1.conkey)))
>   ->  Nested Loop  (cost=36058.38..38711.75 rows=1
> width=252)
> Join Filter: (a.atttypid = t.oid)
> ->  Gather Merge  (cost=36057.95..38702.65
> rows=444 width=256)
>   Workers Planned: 10
>   ->  Merge Left Join
> (cost=35057.76..37689.01 rows=44 width=256)
> Merge Cond: ((n_1.oid =
> s.connamespace) AND (c_1.oid = s.conrelid))
> Join Filter: (a.attnum = ANY
> (s.conkey))
> ->  Incremental Sort
> (cost=37687.19..37687.30 rows=44 width=210)
>   Sort Key: n_1.oid,
> c_1.oid
>   Presorted Key: n_1.oid
>   ->  Nested Loop
> (cost=34837.25..37685.99 rows=44 width=210)
> ->  Merge Join
> (cost=34836.82..34865.99 rows=9 width=136)
>   Merge Cond:
> (c_1.relnamespace = n_1.oid)
>   ->  Sort
> (cost=34834.52..34849.05 rows=5814 width=72)
> Sort
> Key: c_1.relnamespace
> ->
> Parallel Seq Scan on pg_class c_1  (cost=0.00..34470.99 rows=5814 width=72)
>
> Filter: ((relname <> 'raster_columns'::name) AND (NOT
> pg_is_other_temp_schema(relnamespace)) AND has_table_privilege(oid,
> 'SELECT'::text) AND (relkind = ANY ('{r,v,m,f,p}'::"char"[])))
>   ->  Sort
> (cost=2.30..2.31 rows=1 width=68)
> Sort
> Key: n_1.oid
> ->
> Index Scan using pg_namespace_nspname_index on pg_namespace n_1
> (cost=0.28..2.29 rows=1 width=68)
>
> Index Cond: (nspname = 'public'::name)
> ->  Index Scan
> using pg_attribute_relid_attnum_index on pg_attribute a
> (cost=0.43..200.52 rows=11281 width=78)
>   Index Cond:
> (attrelid = c_1.oid)
>   Filter: (NOT
> attisdropped)
> ->  Sort  (cost=1.35..1.35
> rows=1 width=77)
>   Sort Key:
> s.connamespace, s.conrelid
>  

Re: [HACKERS] [PATCH] Incremental sort

2018-03-21 Thread Komяpa
Hi,

on a PostGIS system tuned to prefer parallelism heavily
(min_parallel_table_scan_size=10kB) we experience issues with the QGIS table
discovery query with this patch:

Failing query is:
[local] gis@gis=# SELECT
l.f_table_name,l.f_table_schema,l.f_geometry_column,upper(l.type),l.srid,l.coord_dimension,c.relkind,obj_description(c.oid)
FROM geometry_columns l,pg_class c,pg_namespace n WHERE
c.relname=l.f_table_name AND l.f_table_schema=n.
nspname AND n.oid=c.relnamespace AND
has_schema_privilege(n.nspname,'usage') AND
has_table_privilege('"'||n.nspname||'"."'||c.relname||'"','select') AND
l.f_table_schema='public' ORDER BY n.nspname,c.relname,l.f_geometry_column;

ERROR:  XX000: badly formatted node string "INCREMENTALSORT :startup_cost
37"...
CONTEXT:  parallel worker
LOCATION:  parseNodeString, readfuncs.c:2693
Time: 42,052 ms


Query plan:


QUERY PLAN


──
Sort  (cost=38717.21..38717.22 rows=1 width=393)
  Sort Key: c_1.relname, a.attname
  ->  Nested Loop  (cost=36059.35..38717.20 rows=1 width=393)
->  Index Scan using pg_namespace_nspname_index on pg_namespace n
(cost=0.28..2.30 rows=1 width=68)
  Index Cond: (nspname = 'public'::name)
  Filter: has_schema_privilege((nspname)::text, 'usage'::text)
->  Nested Loop  (cost=36059.08..38714.59 rows=1 width=407)
  ->  Nested Loop Left Join  (cost=36058.65..38712.12 rows=1
width=334)
Join Filter: ((s_2.connamespace = n_1.oid) AND
(a.attnum = ANY (s_2.conkey)))
->  Nested Loop Left Join  (cost=36058.51..38711.94
rows=1 width=298)
  Join Filter: ((s_1.connamespace = n_1.oid) AND
(a.attnum = ANY (s_1.conkey)))
  ->  Nested Loop  (cost=36058.38..38711.75 rows=1
width=252)
Join Filter: (a.atttypid = t.oid)
->  Gather Merge  (cost=36057.95..38702.65
rows=444 width=256)
  Workers Planned: 10
  ->  Merge Left Join
(cost=35057.76..37689.01 rows=44 width=256)
Merge Cond: ((n_1.oid =
s.connamespace) AND (c_1.oid = s.conrelid))
Join Filter: (a.attnum = ANY
(s.conkey))
->  Incremental Sort
(cost=37687.19..37687.30 rows=44 width=210)
  Sort Key: n_1.oid, c_1.oid
  Presorted Key: n_1.oid
  ->  Nested Loop
(cost=34837.25..37685.99 rows=44 width=210)
->  Merge Join
(cost=34836.82..34865.99 rows=9 width=136)
  Merge Cond:
(c_1.relnamespace = n_1.oid)
  ->  Sort
(cost=34834.52..34849.05 rows=5814 width=72)
Sort
Key: c_1.relnamespace
->
Parallel Seq Scan on pg_class c_1  (cost=0.00..34470.99 rows=5814 width=72)

Filter: ((relname <> 'raster_columns'::name) AND (NOT
pg_is_other_temp_schema(relnamespace)) AND has_table_privilege(oid,
'SELECT'::text) AND (relkind = ANY ('{r,v,m,f,p}'::"char"[])))
  ->  Sort
(cost=2.30..2.31 rows=1 width=68)
Sort
Key: n_1.oid
->
Index Scan using pg_namespace_nspname_index on pg_namespace n_1
(cost=0.28..2.29 rows=1 width=68)

Index Cond: (nspname = 'public'::name)
->  Index Scan
using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.43..200.52
rows=11281 width=78)
  Index Cond:
(attrelid = c_1.oid)
  Filter: (NOT
attisdropped)
->  Sort  (cost=1.35..1.35
rows=1 width=77)
  Sort Key: s.connamespace,
s.conrelid
  ->  Seq Scan on
pg_constraint s  (cost=0.00..1.34 rows=1 width=77)
Filter: (consrc ~~*
'%geometrytype(% = %'::text)
->  Materialize  (cost=0.42..2.45 rows=1
width=4)

Re: [HACKERS] [PATCH] Incremental sort

2018-03-16 Thread Tomas Vondra
On 03/16/2018 09:47 AM, Alexander Korotkov wrote:
> On Fri, Mar 16, 2018 at 5:12 AM, Tomas Vondra wrote:
> 
> I agree those don't seem like an issue in the Incremental Sort patch,
> but like more generic costing problems.
> 
> 
> Yes, I think so too.

I wonder if we could make the costing a bit more pessimistic, to make
these losses less likely, while still keeping the main wins (particularly
for the LIMIT queries). But that seems a bit like a lost cause, I guess.

> Do you think we can mark this patch RFC, assuming that it has
> already received a fair amount of review?
> 

Actually, I was going to propose to switch it to RFC, so I've just done
that. I think the patch is clearly ready for a committer to take a
closer look. I really like this improvement.

I'm going to rerun the tests, but that's mostly because I'm interested in
whether the change from i++ to i-- in cmpSortPresortedCols makes a measurable
difference. I don't expect to find any issues, so why wait with the RFC?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Incremental sort

2018-03-16 Thread Alexander Korotkov
On Fri, Mar 16, 2018 at 5:12 AM, Tomas Vondra 
wrote:

> I agree those don't seem like an issue in the Incremental Sort patch,
> but like more generic costing problems.
>

Yes, I think so too.
Do you think we can mark this patch RFC, assuming that it has already
received a fair amount of review?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2018-03-15 Thread Tomas Vondra


On 03/10/2018 06:05 PM, Alexander Korotkov wrote:
> On Sat, Mar 10, 2018 at 6:42 PM, Alexander Korotkov wrote:
>
> ...
>
> After some investigation of benchmark results, I found 2 sources of
> regressions of incremental sort.
> 
> *Case 1: Underlying node scan loss is bigger than incremental sort win*
> 
> = 33 [Wed Mar  7 10:14:14 CET 2018] scale:1000 groups:10
> work_mem:64MB incremental:on max_workers:0 =
> SELECT * FROM s_1 ORDER BY a, b
>                                                                    QUERY
> PLAN                                                                    
> -
>  Limit  (cost=1588080.84..1588080.84 rows=1 width=20) (actual
> time=5874.527..5874.527 rows=0 loops=1)
>    ->  Incremental Sort  (cost=119371.51..1488081.45 rows=939
> width=20) (actual time=202.842..5653.224 rows=1000 loops=1)
>          Sort Key: s_1.a, s_1.b
>          Presorted Key: s_1.a
>          Sort Method: external merge  Disk: 29408kB
>          Sort Groups: 11
>          ->  Index Scan using s_1_a_idx on s_1  (cost=0.43..323385.52
> rows=939 width=20) (actual time=0.051..1494.105 rows=1000 loops=1)
>  Planning time: 0.269 ms
>  Execution time: 5877.367 ms
> (9 rows)
> 
> = 37 [Wed Mar  7 10:15:51 CET 2018] scale:1000 groups:10
> work_mem:64MB incremental:off max_workers:0 =
> SELECT * FROM s_1 ORDER BY a, b
>                                                           QUERY PLAN   
>                                                       
> --
>  Limit  (cost=1656439.93..1656439.93 rows=1 width=20) (actual
> time=4741.716..4741.716 rows=0 loops=1)
>    ->  Sort  (cost=1531440.69..1556440.54 rows=939 width=20) (actual
> time=3522.156..4519.278 rows=1000 loops=1)
>          Sort Key: s_1.a, s_1.b
>          Sort Method: external merge  Disk: 293648kB
>          ->  Seq Scan on s_1  (cost=0.00..163694.39 rows=939
> width=20) (actual time=0.021..650.322 rows=1000 loops=1)
>  Planning time: 0.249 ms
>  Execution time: 4777.088 ms
> (7 rows)
> 
> In this case the optimizer decided that "Index Scan + Incremental
> Sort" would be cheaper than "Seq Scan + Sort".  But it appears that
> the amount of time we lose by choosing Index Scan over Seq Scan is
> bigger than the amount of time we win by choosing Incremental Sort over
> Sort.  I would note that regular Sort consumes about 10X more disk
> space.  I bet that all of that space fit into the OS cache of the test
> machine, but the optimizer expected actual IO to take place in this
> case, so the actual time does not match the costing.
> 

Yes, you're right that the temporary file(s) likely fit into RAM in this test
(and even if they did not, the storage system is pretty good).

> *Case 2: Underlying node is not parallelized*
> 
> = 178 [Wed Mar  7 11:18:53 CET 2018] scale:1000 groups:100
> work_mem:8MB incremental:on max_workers:2 =
> SELECT * FROM s_2 ORDER BY a, b, c
>                                                                    
>  QUERY PLAN                                                             
>        
> 
>  Limit  (cost=1179047.88..1179047.88 rows=1 width=20) (actual
> time=4819.999..4819.999 rows=0 loops=1)
>    ->  Incremental Sort  (cost=89.04..1079047.34 rows=1054 width=20)
> (actual time=0.203..4603.197 rows=1000 loops=1)
>          Sort Key: s_2.a, s_2.b, s_2.c
>          Presorted Key: s_2.a, s_2.b
>          Sort Method: quicksort  Memory: 135kB
>          Sort Groups: 10201
>          ->  Index Scan using s_2_a_b_idx on s_2  (cost=0.43..406985.62
> rows=1054 width=20) (actual time=0.052..1461.177 rows=1000 loops=1)
>  Planning time: 0.313 ms
>  Execution time: 4820.037 ms
> (9 rows)
> 
> = 182 [Wed Mar  7 11:20:11 CET 2018] scale:1000 groups:100
> work_mem:8MB incremental:off max_workers:2 =
> SELECT * FROM s_2 ORDER BY a, b, c
>                                                                  QUERY
> PLAN                                                                 
> 
>  Limit  (cost=1705580.76..1705580.76 rows=1 width=20) (actual
> time=3985.818..3985.818 rows=0 loops=1)
>    ->  Gather Merge  (cost=649951.66..1622246.98 rows=878 width=20)
> (actual time=1782.354..3750.868 rows=1000 loops=1)
>          Workers Planned: 2
>          Workers Launched: 2
>          ->  Sort  (cost=648951.64..659368.36 rows=4166689 

Re: [HACKERS] [PATCH] Incremental sort

2018-03-10 Thread Alexander Korotkov
On Sat, Mar 10, 2018 at 6:42 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Mar 8, 2018 at 2:49 AM, Tomas Vondra wrote:
> Thank you very much for testing and benchmarking.  I'll investigate
> the regressions you found.
>
>
>> Now, there's a caveat in those tests - the data set is synthetic and
>> perfectly random, i.e. all groups equally likely, no correlations or
>> anything like that.
>>
>> I wonder what is the "worst case" scenario, i.e. how to construct a data
>> set with particularly bad behavior of the incremental sort.
>
>
> I think that depends on the reason for the bad behavior of incremental sort.
> For example, our quicksort implementation behaves very well on
> presorted data, but incremental sort appears to be not so good in
> this case, as Heikki showed upthread.  That prompted me to test
> presorted datasets (which appeared to be the "worst case") more intensively.
> But I suspect that the regressions you found have another reason, and
> correspondingly the "worst case" would also be different.
> When I investigate the reason for the regressions, I'll try to construct
> a "worst case" as well.
>

After some investigation of benchmark results, I found 2 sources of
regressions of incremental sort.

*Case 1: Underlying node scan loss is bigger than incremental sort win*

= 33 [Wed Mar  7 10:14:14 CET 2018] scale:1000 groups:10
work_mem:64MB incremental:on max_workers:0 =
SELECT * FROM s_1 ORDER BY a, b
   QUERY
PLAN
-
 Limit  (cost=1588080.84..1588080.84 rows=1 width=20) (actual
time=5874.527..5874.527 rows=0 loops=1)
   ->  Incremental Sort  (cost=119371.51..1488081.45 rows=939 width=20)
(actual time=202.842..5653.224 rows=1000 loops=1)
 Sort Key: s_1.a, s_1.b
 Presorted Key: s_1.a
 Sort Method: external merge  Disk: 29408kB
 Sort Groups: 11
 ->  Index Scan using s_1_a_idx on s_1  (cost=0.43..323385.52
rows=939 width=20) (actual time=0.051..1494.105 rows=1000 loops=1)
 Planning time: 0.269 ms
 Execution time: 5877.367 ms
(9 rows)

= 37 [Wed Mar  7 10:15:51 CET 2018] scale:1000 groups:10
work_mem:64MB incremental:off max_workers:0 =
SELECT * FROM s_1 ORDER BY a, b
  QUERY PLAN

--
 Limit  (cost=1656439.93..1656439.93 rows=1 width=20) (actual
time=4741.716..4741.716 rows=0 loops=1)
   ->  Sort  (cost=1531440.69..1556440.54 rows=939 width=20) (actual
time=3522.156..4519.278 rows=1000 loops=1)
 Sort Key: s_1.a, s_1.b
 Sort Method: external merge  Disk: 293648kB
 ->  Seq Scan on s_1  (cost=0.00..163694.39 rows=939 width=20)
(actual time=0.021..650.322 rows=1000 loops=1)
 Planning time: 0.249 ms
 Execution time: 4777.088 ms
(7 rows)

In this case the optimizer decided that "Index Scan + Incremental Sort" would
be cheaper than "Seq Scan + Sort".  But it appears that the amount of time we
lose by choosing Index Scan over Seq Scan is bigger than the amount of time
we win by choosing Incremental Sort over Sort.  I would note that regular
Sort consumes about 10X more disk space.  I bet that all of that space fit
into the OS cache of the test machine, but the optimizer expected actual IO
to take place in this case, so the actual time does not match the costing.

*Case 2: Underlying node is not parallelized*

= 178 [Wed Mar  7 11:18:53 CET 2018] scale:1000 groups:100
work_mem:8MB incremental:on max_workers:2 =
SELECT * FROM s_2 ORDER BY a, b, c
 QUERY
PLAN

 Limit  (cost=1179047.88..1179047.88 rows=1 width=20) (actual
time=4819.999..4819.999 rows=0 loops=1)
   ->  Incremental Sort  (cost=89.04..1079047.34 rows=1054 width=20)
(actual time=0.203..4603.197 rows=1000 loops=1)
 Sort Key: s_2.a, s_2.b, s_2.c
 Presorted Key: s_2.a, s_2.b
 Sort Method: quicksort  Memory: 135kB
 Sort Groups: 10201
 ->  Index Scan using s_2_a_b_idx on s_2  (cost=0.43..406985.62
rows=1054 width=20) (actual time=0.052..1461.177 rows=1000 loops=1)
 Planning time: 0.313 ms
 Execution time: 4820.037 ms
(9 rows)

= 182 [Wed Mar  7 11:20:11 CET 2018] scale:1000 groups:100
work_mem:8MB incremental:off max_workers:2 =
SELECT * FROM s_2 ORDER BY a, b, c
 QUERY
PLAN

Re: [HACKERS] [PATCH] Incremental sort

2018-03-10 Thread Alexander Korotkov
On Thu, Mar 8, 2018 at 2:49 AM, Tomas Vondra 
wrote:

> OK, the revised patch works fine - I've done a lot of testing and
> benchmarking, and not a single segfault or any other crash.
>
> Regarding the benchmarks, I generally used queries of the form
>
> SELECT * FROM (SELECT * FROM t ORDER BY a) foo ORDER BY a,b
>
> with the first sort done in various ways:
>
> * regular Sort node
> * indexes with Index Scan
> * indexes with Index Only Scan
>
> and all these three options with and without LIMIT (the limit was set to
> 1% of the source table).
>
> I've also varied parallelism (max_parallel_workers_per_gather was set to
> either 0 or 2), work_mem (from 4MB to 256MB) and data set size (tables
> from 1000 rows to 10M rows).
>
> All of this may seem like an overkill, but I've found a couple of
> regressions thanks to that.
>
> The full scripts and results are available here:
>
> https://github.com/tvondra/incremental-sort-tests
>
> The queries actually executed are a bit more complicated, to eliminate
> overhead due to data transfer to client etc. The same approach was used
> in the other sorting benchmarks we've done in the past.
>
> I'm attaching results for two scales - 10k and 10M rows, preprocessed
> into .ods format. I haven't looked at the other scales yet, but I don't
> expect any surprises there.
>
> Each .ods file contains raw data for one of the tests (matching the .sh
> script filename), pivot table, and comparison of durations with and
> without the incremental sort.
>
> In general, I think the results look pretty impressive. Almost all the
> comparisons are green, which means "faster than master" - usually by
> tens of percent (without limit), or by up to ~95% (with LIMIT).
>
> There are a couple of regressions in two cases sort-indexes and
> sort-indexes-ios.
>
> On the small dataset this seems to be related to the number of groups
> (essentially, number of distinct values in a column). My assumption is
> that there is some additional overhead when "switching" between the
> groups, and with many groups it's significant enough to affect results
> on these tiny tables (where master only takes ~3ms to do the sort). The
> slowdown seems to be
>
> On the large data set it seems to be somehow related to both work_mem
> and number of groups, but I didn't have time to investigate that yet
> (there are explain analyze plans in the results, so feel free to look).
>
> In general, I think this looks really nice. It's certainly awesome with
> the LIMIT case, as it allows us to leverage indexes on a subset of the
> ORDER BY columns.
>

Thank you very much for testing and benchmarking.  I'll investigate
the regressions you found.


> Now, there's a caveat in those tests - the data set is synthetic and
> perfectly random, i.e. all groups equally likely, no correlations or
> anything like that.
>
> I wonder what is the "worst case" scenario, i.e. how to construct a data
> set with particularly bad behavior of the incremental sort.


I think that depends on the reason for the bad behavior of incremental sort.
For example, our quicksort implementation behaves very well on
presorted data, but incremental sort appears to be not so good in
this case, as Heikki showed upthread.  That prompted me to test
presorted datasets (which appeared to be the "worst case") more intensively.
But I suspect that the regressions you found have another reason, and
correspondingly the "worst case" would also be different.
When I investigate the reason for the regressions, I'll try to construct
a "worst case" as well.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2018-03-05 Thread Tomas Vondra
Hi,

I have started reviewing the patch and doing some testing, and I pretty
quickly ran into a segfault. Attached is a simple reproducer and a
backtrace. AFAICS the bug seems to be somewhere in the tuplesort
changes, likely resetting a memory context too soon or something like
that. I haven't investigated it further, but it matches my hunch that
tuplesort is likely where the bugs will be.

Otherwise the patch seems fairly complete. A couple of minor things that
I noticed while eyeballing the changes in a diff editor.


1) On a couple of places the new code has this comment

/* even when not parallel-aware */

while all the immediately preceding blocks use

/* even when not parallel-aware, for EXPLAIN ANALYZE */

I suggest using the same comment, otherwise it kinda suggests it's not
because of EXPLAIN ANALYZE.


2) I think the purpose of sampleSlot should be explicitly documented
(and I'm not sure "sample" is a good term here, as it suggests some sort
of sampling; for example nodeAgg uses grp_firstTuple).


3) skipCols/SkipKeyData seems a bit strange too, I think. I'd use
PresortedKeyData or something like that.


4) In cmpSortSkipCols, when checking if the columns changed, the patch
does this:

n = ((IncrementalSort *) node->ss.ps.plan)->skipCols;

for (i = 0; i < n; i++)
{
... check i-th key ...
}

My hunch is that checking the keys from the last one, i.e.

for (i = (n-1); i >= 0; i--)
{

}

would be faster. The reasoning is that with "ORDER BY a,b" the column
"b" changes more often. But I've been unable to test this because of the
segfault crashes.


5) The changes from

if (pathkeys_contained_in(...))

to

n = pathkeys_common(pathkeys, subpath->pathkeys);


if (n == 0)

seem rather inconvenient to me, as it makes the code unnecessarily
verbose. I wonder if there's a better way to deal with this.
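
To illustrate what I mean, a tiny wrapper could keep the call sites terse
(the helper name below is made up for this sketch; it is not in the patch):

    /*
     * Hypothetical convenience wrapper: return true iff "pathkeys" is fully
     * contained in "subpath_pathkeys", and report the length of the common
     * prefix through *n_common for the incremental-sort case.
     */
    static bool
    pathkeys_count_common(List *pathkeys, List *subpath_pathkeys,
                          int *n_common)
    {
        *n_common = pathkeys_common(pathkeys, subpath_pathkeys);
        return (*n_common == list_length(pathkeys));
    }

Call sites that only care about full containment could then keep a one-line
"if", while incremental-sort-aware call sites also look at *n_common.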

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
test2=# EXPLAIN SELECT * FROM (SELECT * FROM t ORDER BY a) foo ORDER BY a, b;
   QUERY PLAN   

 Incremental Sort  (cost=147027.04..269419.25 rows=100 width=20)
   Sort Key: t.a, t.b
   Presorted Key: t.a
   ->  Sort  (cost=136537.84..139037.84 rows=100 width=20)
 Sort Key: t.a
 ->  Seq Scan on t  (cost=0.00..16370.00 rows=100 width=20)
(6 rows)


crash.sql
Description: application/sql
Core was generated by `postgres: tomas test2 [local] SELECT 
  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  MemoryContextResetOnly (context=0x0) at mcxt.c:158
158 if (!context->isReset)
(gdb) bt
#0  MemoryContextResetOnly (context=0x0) at mcxt.c:158
#1  0x560f622f07b3 in tuplesort_reset (state=0x560f632e56a0) at 
tuplesort.c:1391
#2  0x560f620a2d17 in ExecIncrementalSort (pstate=0x560f632e1890) at 
nodeIncrementalSort.c:280
#3  0x560f6207d5da in ExecProcNode (node=0x560f632e1890) at 
../../../src/include/executor/executor.h:239
#4  ExecutePlan (execute_once=, dest=0x560f632daf28, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_SELECT, use_parallel_mode=, 
planstate=0x560f632e1890, estate=0x560f632e1680) at execMain.c:1721
#5  standard_ExecutorRun (queryDesc=0x560f63233f60, direction=, 
count=0, execute_once=) at execMain.c:361
#6  0x560f621b8cfc in PortalRunSelect (portal=portal@entry=0x560f63277e40, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x560f632daf28) at pquery.c:932
#7  0x560f621ba136 in PortalRun (portal=portal@entry=0x560f63277e40, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
run_once=run_once@entry=1 '\001', dest=dest@entry=0x560f632daf28, 
altdest=altdest@entry=0x560f632daf28, 
completionTag=0x7fffdd875ac0 "") at pquery.c:773
#8  0x560f621b60cb in exec_simple_query (query_string=0x560f63213820 
"SELECT * FROM (SELECT * FROM t ORDER BY a) foo ORDER BY a, b;") at 
postgres.c:1120
#9  0x560f621b7df4 in PostgresMain (argc=, 
argv=argv@entry=0x560f63240678, dbname=, username=) at postgres.c:4144
#10 0x560f61ef56d1 in BackendRun (port=0x560f63236130) at postmaster.c:4412
#11 BackendStartup (port=0x560f63236130) at postmaster.c:4084
#12 ServerLoop () at postmaster.c:1757
#13 0x560f62148242 in PostmasterMain (argc=3, argv=0x560f6320e3e0) at 
postmaster.c:1365
#14 0x560f61ef66b7 in main (argc=3, argv=0x560f6320e3e0) at main.c:228


Re: [HACKERS] [PATCH] Incremental sort

2018-01-08 Thread Alexander Korotkov
On Mon, Jan 8, 2018 at 2:29 PM, Antonin Houska  wrote:

> Alexander Korotkov  wrote:
>
> > Antonin Houska  wrote:
>
> > >  Shouldn't the test contain *both* cases?
>
> > Thank you for pointing that. Sure, both cases are better. I've added
> second case as well as comments. Patch is attached.
>
> I'm fine with the tests now but have a minor comment on this comment:
>
> -- CROSS JOIN, not pushed down, because we don't push down LIMIT and
> remote side
> -- can't perform top-N sort like local side can.
>
> I think the note on LIMIT push-down makes the comment less clear because
> there's no difference in processing the LIMIT: EXPLAIN shows that both
>
> SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1
> OFFSET
> 100 LIMIT 10;
>
> and
>
> SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3
> OFFSET
> 100 LIMIT 10;
>
> evaluate the LIMIT clause only locally.
>
> What I consider the important difference is that the 2nd case does not
> generate the appropriate input for remote incremental sort (while
> incremental
> sort tends to be very cheap). Therefore it's cheaper to do no remote sort
> at
> all and perform the top-N sort locally than to do a regular
> (non-incremental)
> remote sort.
>

Agreed, these comments are not clear enough.  I've rewritten them: they
became much wordier, but now they look clearer to me.  Also, I've swapped
the order of the queries, which seems easier to understand.


> I have no other questions about this patch. I expect the CFM to set the
> status
> to "ready for committer" as soon as the other reviewers confirm they're
> happy
> about the patch status.


Good, thank you.  Let's see what other reviewers will say.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-15.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2018-01-04 Thread Tels
Hello Alexander,

On Thu, January 4, 2018 4:36 pm, Alexander Korotkov wrote:
> On Fri, Dec 8, 2017 at 4:06 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Thank you for pointing that.  Sure, both cases are better.  I've added
>> second case as well as comments.  Patch is attached.

I had a quick look, this isn't a full review, but a few things struck me
on a read through the diff:

There are quite a few places where lines are broken like so:

+   
ExecIncrementalSortInitializeWorker((IncrementalSortState *) planstate,
+   
pwcxt);
Or like this:

+   result = (PlanState *) ExecInitIncrementalSort(
+   
(IncrementalSort *) node, estate, eflags);

e.g. a param is on the next line, but aligned to the very same place where
it would be w/o the linebreak. Or is this just some sort of artefact
because I viewed the diff with tabspacing = 8?

I'd fix the grammar here:

+ * Incremental sort is specially optimized kind of multikey sort 
when
+ * input is already presorted by prefix of required keys list.

Like so:

"Incremental sort is a specially optimized kind of multikey sort used when
the input is already presorted by a prefix of the required keys list."

+ * Consider following example.  We have input tuples consisting 
from

"Consider the following example: We have ..."

+* In incremental sort case we also have to cost to detect sort 
groups.

"we also have to cost the detection of sort groups."

"+   * It turns out into extra copy and comparison for each tuple."

"This turns out to be one extra copy and comparison per tuple."

+ "Portions Copyright (c) 1996-2017"

Should probably be 2018 now - time flies fast :)

return_value = _readMaterial();
else if (MATCH("SORT", 4))
return_value = _readSort();
+   else if (MATCH("INCREMENTALSORT", 7))
+   return_value = _readIncrementalSort();
else if (MATCH("GROUP", 5))
return_value = _readGroup();

I think the ", 7" here is left-over from when it was named "INCSORT", and
it should be MATCH("INCREMENTALSORT", 15)), shouldn't it?

+  space, fase 
when it's value for in-memory

typo: "space, false when ..."

+   boolcmp;
+   cmp = cmpSortSkipCols(node, node->sampleSlot, slot);
+
+   if (cmp)

In the above, the variable cmp could be optimized away with:

+   if (cmpSortSkipCols(node, node->sampleSlot, slot))

(not sure if modern compilers won't do this, anway, though)

+typedef struct IncrementalSortState
+{
+   ScanState   ss; /* its first field is 
NodeTag */
+   boolbounded;/* is the result set
bounded? */
+   int64   bound;  /* if bounded, how many
tuples are needed */

If I'm not wrong, the layout of the struct will include quite a bit of
padding on 64 bit due to the mixing of bool and int64, maybe it would be
better to sort the fields differently, e.g. pack 4 or 8 bools together?
Not sure if that makes much of a difference, though.
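
To illustrate the effect, here is a small standalone C sketch (not the
actual IncrementalSortState definition; the field names are only borrowed
for flavor):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* bool and int64 interleaved: each bool drags in 7 bytes of padding */
    typedef struct Mixed
    {
        bool    bounded;
        int64_t bound;
        bool    sort_Done;
        int64_t bound_Done;
    } Mixed;                    /* typically 32 bytes on 64-bit */

    /* same fields, with the bools packed together at the end */
    typedef struct Packed
    {
        int64_t bound;
        int64_t bound_Done;
        bool    bounded;
        bool    sort_Done;
    } Packed;                   /* typically 24 bytes on 64-bit */

    int
    main(void)
    {
        printf("mixed: %zu, packed: %zu\n", sizeof(Mixed), sizeof(Packed));
        return 0;
    }

Whether saving a few bytes per executor node matters in practice is another
question, of course.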

That's all for now :)

Thank you for your work,

Tels



Re: [HACKERS] [PATCH] Incremental sort

2018-01-04 Thread Alexander Korotkov
On Fri, Dec 8, 2017 at 4:06 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Thank you for pointing that.  Sure, both cases are better.  I've added
> second case as well as comments.  Patch is attached.
>

I just found that the patch fails to apply according to commitfest.cputube.org.
Please find the rebased patch attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-13.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2017-12-08 Thread Alexander Korotkov
Hi!

On Fri, Dec 1, 2017 at 11:39 AM, Antonin Houska  wrote:

> I expected the number of groups that actually appear in the output;
> you consider it the number of groups started. I can't find a similar case
> elsewhere in the code (e.g. the Agg node does not report this kind of
> information), so I have no clue. Someone else will have to decide.
>

OK.

> But there is IncrementalSort node on the remote side.
> > Let's see what happens. Idea of "CROSS JOIN, not pushed down" test is
> that cross join with ORDER BY LIMIT is not beneficial to push down, because
> LIMIT is not pushed down and remote side wouldn't be able to use top-N
> heapsort. But if remote side has incremental sort then it can be
> > used, and fetching first 110 rows is cheap. Let's see plan of original
> "CROSS JOIN, not pushed down" test with incremental sort.
> >
> > # EXPLAIN (ANALYZE, VERBOSE) SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN
> ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
>
> ok, understood, thanks. Perhaps it's worth a comment in the test script.
>
> I'm afraid I still see a problem. The diff removes a query that (although a
> bit different from the one above) lets the CROSS JOIN to be pushed down and
> does introduce the IncrementalSort in the remote database. This query is
> replaced with one that does not allow for the join push down.
>
> *** a/contrib/postgres_fdw/sql/postgres_fdw.sql
> --- b/contrib/postgres_fdw/sql/postgres_fdw.sql
> *** SELECT t1.c1 FROM ft1 t1 WHERE NOT EXIST
> *** 510,517 
>   SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE
> t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
>   -- CROSS JOIN, not pushed down
>   EXPLAIN (VERBOSE, COSTS OFF)
> ! SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1
> OFFSET 100 LIMIT 10;
> ! SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1
> OFFSET 100 LIMIT 10;
>   -- different server, not pushed down. No result expected.
>   EXPLAIN (VERBOSE, COSTS OFF)
>   SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY
> t1.c1, t2.c1 OFFSET 100 LIMIT 10;
> --- 510,517 
>   SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE
> t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
>   -- CROSS JOIN, not pushed down
>   EXPLAIN (VERBOSE, COSTS OFF)
> ! SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3
> OFFSET 100 LIMIT 10;
> ! SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3
> OFFSET 100 LIMIT 10;
>   -- different server, not pushed down. No result expected.
>   EXPLAIN (VERBOSE, COSTS OFF)
>   SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY
> t1.c1, t2.c1 OFFSET 100 LIMIT 10;
>
> Shouldn't the test contain *both* cases?


Thank you for pointing that out.  Sure, both cases are better.  I've added
the second case as well as comments.  The patch is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-12.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2017-12-08 Thread Alexander Korotkov
On Wed, Nov 22, 2017 at 12:01 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> >> I gather that you have
> >> determined empirically that it's better to be able to sort groups of
> >> at least MIN_GROUP_SIZE than to be able to skip the comparisons on the
> >> leading attributes, but why is that the case?
> >
> > Right.  The issue is that not only the case of one tuple per group causes
> > overhead; a few tuples per group (like 2 or 3) is also a case of overhead.
> > Also, the overhead is related not only to sorting.  While investigating
> > the regression case provided by Heikki [1], I've seen extra time spent
> > mostly in the extra copying of the sample tuple and the comparison with
> > it.  In order to cope with this overhead I've introduced MIN_GROUP_SIZE,
> > which allows us to avoid copying sample tuples too frequently.
>
> I see.  I wonder if there could ever be a function like
> ExecMoveTuple(dst, src).  Given the polymorphism involved it'd be
> slightly complicated and you'd probably have a general case that just
> copies the tuple to dst and clears src, but there might be a bunch of
> cases where you can do something more efficient like moving a pointer
> and pin ownership.  I haven't really thought that through and
> there may be fundamental problems with it...
>

ExecMoveTuple(dst, src) would be good.  But it would be hard to implement the
"moving a pointer and pin ownership" principle in our current infrastructure,
because the source and destination can have different memory contexts.
AFAICS, we can't just move a memory area between memory contexts: we have to
allocate a new area, memcpy, and then deallocate the old one.
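
Spelled out in code, that copy-based fallback looks roughly like this
(copy_to_context is a made-up name for this sketch, not something in the
patch or in core):

    static void *
    copy_to_context(MemoryContext dst, const void *src, Size len)
    {
        /* allocate the new area in the destination context ... */
        void   *result = MemoryContextAlloc(dst, len);

        /* ... copy the payload ... */
        memcpy(result, src, len);

        /* ... and the caller pfree()s the old area afterwards */
        return result;
    }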


> If you're going to push the tuples into the sorter every time, then I
> guess there are some special cases that could allow future
> optimisations: (1) if you noticed that every prefix was different, you
> can skip the sort operation (that is, you can use the sorter as a dumb
> tuplestore and just get the tuples out in the same order you put them
> in; not sure if Tuplesort supports that but it presumably could),


In order to notice that every prefix is different, I have to compare every
prefix.  But that may introduce an overhead.  So, the reason I introduced
MIN_GROUP_SIZE is exactly to avoid comparing every prefix...


> (2)
> if you noticed that every prefix was the same (that is, you have only
> one prefix/group in the sorter) then you could sort only on the suffix
> (that is, you could somehow tell Tuplesort to ignore the leading
> columns),


Yes, I did so before.  But again, after introducing MIN_GROUP_SIZE, I lose
the knowledge of whether all the prefixes were the same or different.  This
is why I have to sort by the full column list for now...

(3) as a more complicated optimisation for intermediate
> group sizes 1 < n < MIN_GROUP_SIZE, you could somehow number the
> groups with an integer that increments whenever you see the prefix
> change, and somehow tell tuplesort.c to use that instead of the
> leading columns.


That is an interesting idea.  The reason we have overhead compared with a
plain sort is that we do an extra comparison (and copying), but the result of
that comparison is lost to the sort itself.  If sorting could "reuse" the
prefix comparison, the overhead would be lower.  But the problem is that we
would have to reformat tuples before putting them into the tuplesort.  I
wonder whether tuple reformatting could eat the potential performance win...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2017-12-01 Thread Antonin Houska
Alexander Korotkov  wrote:

> On Wed, Nov 22, 2017 at 1:22 PM, Antonin Houska  wrote:
> 
>  Alexander Korotkov  wrote:
> 
>  > Antonin Houska  wrote:
> 
>  > > * ExecIncrementalSort()
>  > >
>  > > ** if (node->tuplesortstate == NULL)
>  > >
>  > > If both branches contain the expression
>  > >
>  > > node->groupsCount++;
>  > >
>  > > I suggest it to be moved outside the "if" construct.
>  >
>  > Done.
> 
>  One more comment on this: I wonder if the field isn't incremented too
>  early. It seems to me that the value can end up non-zero if the input set is
>  to be empty (not sure if it can happen in practice).
> 
> That happens in practice. On empty input set, incremental sort would count 
> exactly one group.
> 
> # create table t (x int, y int);
> CREATE TABLE
> # create index t_x_idx on t (x);
> CREATE INDEX
> # set enable_seqscan = off;
> SET
> # explain (analyze, buffers) select * from t order by x, y;
> QUERY PLAN
> -
> Incremental Sort (cost=0.74..161.14 rows=2260 width=8) (actual 
> time=0.024..0.024 rows=0 loops=1)
> Sort Key: x, y
> Presorted Key: x
> Sort Method: quicksort Memory: 25kB
> Sort Groups: 1
> Buffers: shared hit=1
> -> Index Scan using t_x_idx on t (cost=0.15..78.06 rows=2260 width=8) (actual 
> time=0.011..0.011 rows=0 loops=1)
> Buffers: shared hit=1
> Planning time: 0.088 ms
> Execution time: 0.066 ms
> (10 rows)
> But from the perspective of how the code works, it's really 1 group. Tuple sort was 
> defined, inserted no tuples, then sorted and got no tuples out of there. So, 
> I'm not sure if it's really incorrect...

I expected the number of groups that actually appear in the output;
you consider it the number of groups started. I can't find a similar case
elsewhere in the code (e.g. the Agg node does not report this kind of
information), so I have no clue. Someone else will have to decide.

> 
> But there is IncrementalSort node on the remote side.
> Let's see what happens. Idea of "CROSS JOIN, not pushed down" test is that 
> cross join with ORDER BY LIMIT is not beneficial to push down, because LIMIT 
> is not pushed down and remote side wouldn't be able to use top-N heapsort. 
> But if remote side has incremental sort then it can be
> used, and fetching first 110 rows is cheap. Let's see plan of original "CROSS 
> JOIN, not pushed down" test with incremental sort.
> 
> # EXPLAIN (ANALYZE, VERBOSE) SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 
> t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;

ok, understood, thanks. Perhaps it's worth a comment in the test script.

I'm afraid I still see a problem. The diff removes a query that (although a
bit different from the one above) lets the CROSS JOIN to be pushed down and
does introduce the IncrementalSort in the remote database. This query is
replaced with one that does not allow for the join push down.

*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
*** SELECT t1.c1 FROM ft1 t1 WHERE NOT EXIST
*** 510,517 
  SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = 
t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
  -- CROSS JOIN, not pushed down
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 
OFFSET 100 LIMIT 10;
! SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 
OFFSET 100 LIMIT 10;
  -- different server, not pushed down. No result expected.
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c1, t2.c1 OFFSET 100 LIMIT 10;
--- 510,517 
  SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = 
t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
  -- CROSS JOIN, not pushed down
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 
OFFSET 100 LIMIT 10;
! SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 
OFFSET 100 LIMIT 10;
  -- different server, not pushed down. No result expected.
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c1, t2.c1 OFFSET 100 LIMIT 10;

Shouldn't the test contain *both* cases?

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



Re: [HACKERS] [PATCH] Incremental sort (was: PoC: Partial sort)

2017-11-29 Thread Michael Paquier
On Mon, Mar 20, 2017 at 6:33 PM, Alexander Korotkov
 wrote:
> Thank you for the report.
> Please, find rebased patch in the attachment.

This patch cannot be applied. Please provide a rebased version. I am
moving it to next CF with waiting on author as status.
-- 
Michael



Re: [HACKERS] [PATCH] Incremental sort

2017-11-22 Thread Antonin Houska
Alexander Korotkov  wrote:

> Antonin Houska  wrote:

> >  * ExecIncrementalSort()
> >
> >  ** if (node->tuplesortstate == NULL)
> > 
> >  If both branches contain the expression
> > 
> >  node->groupsCount++;
> > 
> >  I suggest it to be moved outside the "if" construct.
> 
> Done.

One more comment on this: I wonder if the field isn't incremented too
early. It seems to me that the value can end up non-zero if the input set
turns out to be empty (not sure if that can happen in practice).

And finally one question about regression tests: what's the purpose of the
changes in contrib/postgres_fdw/sql/postgres_fdw.sql ? I see no
IncrementalSort node in the output.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



Re: [HACKERS] [PATCH] Incremental sort

2017-11-21 Thread Thomas Munro
On Tue, Nov 21, 2017 at 1:00 PM, Alexander Korotkov
 wrote:
> On Mon, Nov 20, 2017 at 12:24 AM, Thomas Munro
>  wrote:
>> Is there some reason not to use ApplySortComparator for this?  I think
>> you're missing out on lower-overhead comparators, and in any case it's
>> probably good code reuse, no?
>
> However, for the incremental sort case we don't need to know here whether
> A > B or B > A.  It's enough for us to know whether A = B or A != B.  In
> some cases it's way cheaper.  For instance, for texts an equality check is
> basically memcmp, while a comparison may use the collation.

Ah, right, of course.

>> I gather that you have
>> determined empirically that it's better to be able to sort groups of
>> at least MIN_GROUP_SIZE than to be able to skip the comparisons on the
>> leading attributes, but why is that the case?
>
> Right.  The issue is that not only the case of one tuple per group causes
> overhead; a few tuples per group (like 2 or 3) is also a case of overhead.
> Also, the overhead is related not only to sorting.  While investigating the
> regression case provided by Heikki [1], I've seen extra time spent mostly in
> the extra copying of the sample tuple and the comparison with it.  In order
> to cope with this overhead I've introduced MIN_GROUP_SIZE, which allows us
> to avoid copying sample tuples too frequently.

I see.  I wonder if there could ever be a function like
ExecMoveTuple(dst, src).  Given the polymorphism involved it'd be
slightly complicated and you'd probably have a general case that just
copies the tuple to dst and clears src, but there might be a bunch of
cases where you can do something more efficient like moving a pointer
and pin ownership.  I haven't really thought that through and
there may be fundamental problems with it...

If you're going to push the tuples into the sorter every time, then I
guess there are some special cases that could allow future
optimisations: (1) if you noticed that every prefix was different, you
can skip the sort operation (that is, you can use the sorter as a dumb
tuplestore and just get the tuples out in the same order you put them
in; not sure if Tuplesort supports that but it presumably could), (2)
if you noticed that every prefix was the same (that is, you have only
one prefix/group in the sorter) then you could sort only on the suffix
(that is, you could somehow tell Tuplesort to ignore the leading
columns), (3) as a more complicated optimisation for intermediate
group sizes 1 < n < MIN_GROUP_SIZE, you could somehow number the
groups with an integer that increments whenever you see the prefix
change, and somehow tell tuplesort.c to use that instead of the
leading columns.  Ok, that last one is probably hard but the first two
might be easier...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] [PATCH] Incremental sort

2017-11-20 Thread Peter Geoghegan
On Mon, Nov 20, 2017 at 3:34 PM, Alexander Korotkov
 wrote:
> Thank you very much for the review.  I really appreciate that this topic
> is getting attention.  Please find the next revision of the patch attached.

I would really like to see this get into v11. This is an important
patch that has fallen through the cracks far too many times.

-- 
Peter Geoghegan



Re: [HACKERS] [PATCH] Incremental sort

2017-11-19 Thread Thomas Munro
On Wed, Nov 15, 2017 at 7:42 AM, Alexander Korotkov
 wrote:
> Sure, please find rebased patch attached.

+ /*
+  * Check if first "skipCols" sort values are equal.
+  */
+ static bool
+ cmpSortSkipCols(IncrementalSortState *node, TupleTableSlot *a,
+ TupleTableSlot *b)
+ {
+ int n, i;
+
+ Assert(IsA(node->ss.ps.plan, IncrementalSort));
+
+ n = ((IncrementalSort *) node->ss.ps.plan)->skipCols;
+
+ for (i = 0; i < n; i++)
+ {
+ Datum datumA, datumB, result;
+ bool isnullA, isnullB;
+ AttrNumber attno = node->skipKeys[i].attno;
+ SkipKeyData *key;
+
+ datumA = slot_getattr(a, attno, &isnullA);
+ datumB = slot_getattr(b, attno, &isnullB);
+
+ /* Special case for NULL-vs-NULL, else use standard comparison */
+ if (isnullA || isnullB)
+ {
+ if (isnullA == isnullB)
+ continue;
+ else
+ return false;
+ }
+
+ key = &node->skipKeys[i];
+
+ key->fcinfo.arg[0] = datumA;
+ key->fcinfo.arg[1] = datumB;
+
+ /* just for paranoia's sake, we reset isnull each time */
+ key->fcinfo.isnull = false;
+
+ result = FunctionCallInvoke(&key->fcinfo);
+
+ /* Check for null result, since caller is clearly not expecting one */
+ if (key->fcinfo.isnull)
+ elog(ERROR, "function %u returned NULL", key->flinfo.fn_oid);
+
+ if (!DatumGetBool(result))
+ return false;
+ }
+ return true;
+ }

Is there some reason not to use ApplySortComparator for this?  I think
you're missing out on lower-overhead comparators, and in any case it's
probably good code reuse, no?
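
Roughly, I'm imagining something like this for the loop body (a sketch only;
it assumes each skip key carried a SortSupport entry, say an "ssup" field,
which the posted patch does not have -- it stores an FmgrInfo/FunctionCallInfo
instead):

    for (i = 0; i < n; i++)
    {
        Datum       datumA,
                    datumB;
        bool        isnullA,
                    isnullB;
        AttrNumber  attno = node->skipKeys[i].attno;

        datumA = slot_getattr(a, attno, &isnullA);
        datumB = slot_getattr(b, attno, &isnullB);

        /* any non-zero comparison result means the prefix changed */
        if (ApplySortComparator(datumA, isnullA,
                                datumB, isnullB,
                                &node->skipKeys[i].ssup) != 0)
            return false;
    }
    return true;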

Embarrassingly, I was unaware of this patch and started prototyping
exactly the same thing independently[1].  I hadn't got very far and
will now abandon that, but that's one thing I did differently.  Two
other things that may be different: I had a special case for groups of
size 1 that skipped the sorting, and I only sorted on the suffix
because I didn't put tuples with different prefixes into the sorter (I
was assuming that tuplesort_reset was going to be super efficient,
though I hadn't got around to writing that).  I gather that you have
determined empirically that it's better to be able to sort groups of
at least MIN_GROUP_SIZE than to be able to skip the comparisons on the
leading attributes, but why is that the case?

[1] 
https://github.com/macdice/postgres/commit/ab0f8aff9c4b25d5598aa6b3c630df864302f572

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] [PATCH] Incremental sort

2017-11-15 Thread Antonin Houska
Antonin Houska  wrote:

> Alexander Korotkov  wrote:
> 
> > Patch rebased to current master is attached. I'm going to improve my 
> > testing script and post new results. 
> 
> I wanted to review this patch but incremental-sort-8.patch fails to apply. Can
> you please rebase it again?

I could find the matching HEAD quite easily (9b6cb46), so here are my
comments:

* cost_sort()

** "presorted_keys" missing in the prologue

** when called from label_sort_with_costsize(), 0 is passed for
   "presorted_keys". However label_sort_with_costsize() can sometimes be
   called on an IncrementalSort, in which case there are some "presorted
   keys". See create_mergejoin_plan() for example. (IIUC this should only
   make EXPLAIN inaccurate, but should not cause incorrect decisions.)


** instead of

if (!enable_incrementalsort)
   presorted_keys = false;

you probably meant

if (!enable_incrementalsort)
   presorted_keys = 0;


** instead of

/* Extract presorted keys as list of expressions */
foreach(l, pathkeys)
{
PathKey *key = (PathKey *)lfirst(l);
EquivalenceMember *member = (EquivalenceMember *)
lfirst(list_head(key->pk_eclass->ec_members));

you can use linitial():

/* Extract presorted keys as list of expressions */
foreach(l, pathkeys)
{
PathKey *key = (PathKey *)lfirst(l);
EquivalenceMember *member = (EquivalenceMember *)
linitial(key->pk_eclass->ec_members);


* get_cheapest_fractional_path_for_pathkeys()

The prologue says "... at least partially satisfies the given pathkeys ..."
but I see no change in the function code. In particular the use of
pathkeys_contained_in() does not allow for any kind of partial sorting.


* pathkeys_useful_for_ordering()

Extra whitespace following the comment opening string "/*":

/* 
 * When incremental sort is disabled, pathkeys are useful only when they


* make_sort_from_pathkeys() - the "skipCols" argument should be mentioned in
  the prologue.


* create_sort_plan()

I noticed that pathkeys_common() is called, but the value of n_common_pathkeys
should already be in the path's "skipCols" field if the underlying path is
actually IncrementalSortPath.


* create_unique_plan() does not seem to make use of the incremental
  sort. Shouldn't it do?


* nodeIncrementalSort.c

** These comments seem to contain typos:

"Incremental sort algorithm would sort by xfollowing groups, which have ..."

"Interate while skip cols are same as in saved tuple"

** (This is rather a pedantic comment) I think prepareSkipCols() should be
   defined in front of cmpSortSkipCols().

** the MIN_GROUP_SIZE constant deserves a comment.


* ExecIncrementalSort()

** if (node->tuplesortstate == NULL)

If both branches contain the expression

 node->groupsCount++;

I suggest it to be moved outside the "if" construct.
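
I.e. something of this shape (the branch bodies below merely stand in for
the patch's actual code):

    node->groupsCount++;

    if (node->tuplesortstate == NULL)
    {
        /* first group: set up the tuplesort */
    }
    else
    {
        /* subsequent group: reset the existing tuplesort */
    }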

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at