Re: Re: [HACKERS] [PATCH] Incremental sort

2018-06-01 Thread Alexander Korotkov
Hi, James!

On Thu, May 31, 2018 at 11:10 PM James Coleman  wrote:

> I've attached an updated copy of the patch that applies cleanly to
> current master.
>

Thank you for rebasing this patch.  Next time sending a patch, please make
sure you've bumped
its version, if even you made no changes there besides to pure rebase.
Otherwise, it would be
hard to distinguish patch versions, because patch files have exactly same
names.

I'd like to note, that I'm going to provide revised version of this patch
to the next commitfest.
After some conversations at PGCon regarding this patch, I got more
confident that providing
separate paths for incremental sorts is right.  In the next revision of
this patch, incremental
sort paths would be provided in more cases.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Re: [HACKERS] [PATCH] Incremental sort

2018-05-31 Thread James Coleman
I've attached an updated copy of the patch that applies cleanly to 
current master.



commit 6428245702a40b3e3fa11bb64b7611cdd33a0778
Author: Alexander Korotkov 
Date:   Sat Apr 7 18:51:20 2018 +0300

Implement incremental sort

Incremental sort is an optimized variant of multikey sort for cases when the
input is already sorted by a prefix of the sort keys.  For example when a 
sort
by (key1, key2 ... keyN) is requested, and the input is already sorted by
(key1, key2 ... keyM), M < N, we can divide the input into groups where keys
(key1, ... keyM) are equal, and only sort on the remaining columns.

Incremental sort can give a huge benefit when LIMIT clause is specified,
then it wouldn't even have to read the whole input.  Another huge benefit
of incremental sort is that sorting data in small groups may help to evade
using disk during sort.  However, on small datasets which fit into memory
incremental sort may be slightly slower than full sort.  That was reflected
in costing.

This patch implements very basic usage of incremental sort: it gets used
only in create_ordered_paths(), while it sort can help in much more use 
cases,
for instance in merge join.  But latter would require much more changes in
optimizer and postponed for further releases.

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index e4d9469fdd..61775e6726 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1999,28 +1999,62 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 
FROM ft2 t2 WHERE t1.c1 = t2
  119
 (10 rows)
 
--- CROSS JOIN, not pushed down
+-- CROSS JOIN, not pushed down.  For this query, essential optimization is 
top-N
+-- sort.  But it can't be processed at remote side, because we never do LIMIT
+-- push down.  Assuming that sorting is not worth it to push down, CROSS JOIN
+-- is also not pushed down in order to transfer less tuples over network.
 EXPLAIN (VERBOSE, COSTS OFF)
-SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 
100 LIMIT 10;
- QUERY PLAN  
--
+SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 
100 LIMIT 10;
+QUERY PLAN
+--
  Limit
-   Output: t1.c1, t2.c1
+   Output: t1.c3, t2.c3
->  Sort
- Output: t1.c1, t2.c1
- Sort Key: t1.c1, t2.c1
+ Output: t1.c3, t2.c3
+ Sort Key: t1.c3, t2.c3
  ->  Nested Loop
-   Output: t1.c1, t2.c1
+   Output: t1.c3, t2.c3
->  Foreign Scan on public.ft1 t1
- Output: t1.c1
- Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+ Output: t1.c3
+ Remote SQL: SELECT c3 FROM "S 1"."T 1"
->  Materialize
- Output: t2.c1
+ Output: t2.c3
  ->  Foreign Scan on public.ft2 t2
-   Output: t2.c1
-   Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+   Output: t2.c3
+   Remote SQL: SELECT c3 FROM "S 1"."T 1"
 (15 rows)
 
+SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 
100 LIMIT 10;
+  c3   |  c3   
+---+---
+ 1 | 00101
+ 1 | 00102
+ 1 | 00103
+ 1 | 00104
+ 1 | 00105
+ 1 | 00106
+ 1 | 00107
+ 1 | 00108
+ 1 | 00109
+ 1 | 00110
+(10 rows)
+
+-- CROSS JOIN, pushed down.  Unlike previous query, remote side is able to
+-- return tuples in given order without full sort, but using index scan and
+-- incremental sort.  This is much cheaper than full sort on local side, even
+-- despite we don't know LIMIT on remote side.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 
100 LIMIT 10;
+
QUERY PLAN  
   
+---
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Foreign Scan
+ Output: t1.c1, t2.c1
+ Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+ Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (TRUE)) ORDER BY r1."C 1" ASC NULLS LAST, r2."C 1" ASC NULLS 
LAST
+(6 rows)
+
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 
100 LIMIT 10;
  c1 | c1  
 +-
diff --g