Re: [HACKERS] Pushing down sorted joins

2016-03-09 Thread Ashutosh Bapat
On Wed, Mar 9, 2016 at 9:22 PM, Robert Haas  wrote:

> On Wed, Mar 9, 2016 at 2:23 AM, Ashutosh Bapat
>  wrote:
> > [ new patch ]
>
> This looks OK to me.  Committed!
>
>
Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Pushing down sorted joins

2016-03-09 Thread Robert Haas
On Wed, Mar 9, 2016 at 2:23 AM, Ashutosh Bapat
 wrote:
> [ new patch ]

This looks OK to me.  Committed!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pushing down sorted joins

2016-03-08 Thread Ashutosh Bapat
> This patch needs to be rebased.
>
>
Done.


> +   /*
> +* TODO: we should worry about EPQ path but should
> that path have
> +* pathkeys? I guess, that's not really important
> since it's just
> +* going to evaluate the join from whole-row
> references stuffed in the
> +* corresponding EPQ slots, for which the order doesn't
> matter.
> +*/
>
> The pathkeys for the EPQ path don't matter.  It'll only be called to
> recheck one single row, and there's only one order in which you can
> return one row.
>

Right. Removed the TODO


>
> -   if (bms_equal(em->em_relids, rel->relids))
> +   if (bms_is_subset(em->em_relids, rel->relids))
>
> Why do we need this?
>
>
The function find_em_expr_for_rel() find an equivalence member expression
that has all its Vars come from the given relation. It's not necessary that
it will have Vars from relids that are covered by the given relations. E.g.
in query SELECT A.c1, B.c2 FROM A join B ON ... ORDER BY A.c3, there will
be a single equivalence member A.c3 in the pathkeys and em_relids will
indicate only A. Hence instead of equal, (which used to be OK for single
relation join push-down) we have to use subset operation. We want an
equivalence members whose relids are subset of relids contained by given
relation.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6479640..48bdbef 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -437,20 +437,54 @@ SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1"
  103 | 103
  104 | 104
  105 | 105
  106 | 106
  107 | 107
  108 | 108
  109 | 109
  110 | 110
 (10 rows)
 
+-- A join between local table and foreign join. ORDER BY clause is added to the
+-- foreign join so that the local table can be joined using merge join strategy.
+EXPLAIN (COSTS false, VERBOSE)
+	SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+ QUERY PLAN 
+
+ Limit
+   Output: t1."C 1"
+   ->  Merge Right Join
+ Output: t1."C 1"
+ Merge Cond: (t3.c1 = t1."C 1")
+ ->  Foreign Scan
+   Output: t3.c1
+   Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+   Remote SQL: SELECT r3."C 1" FROM ("S 1"."T 1" r2 INNER JOIN "S 1"."T 1" r3 ON (TRUE)) WHERE ((r2."C 1" = r3."C 1")) ORDER BY r2."C 1" ASC NULLS LAST
+ ->  Index Only Scan using t1_pkey on "S 1"."T 1" t1
+   Output: t1."C 1"
+(11 rows)
+
+SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+ C 1 
+-
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
 RESET enable_hashjoin;
 RESET enable_nestloop;
 -- ===
 -- WHERE with remotely-executable conditions
 -- ===
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
  QUERY PLAN  
 -
  Foreign Scan on public.ft1 t1
Output: c1, c2, c3, c4, c5, c6, c7, c8
@@ -862,32 +896,29 @@ SELECT count(c3) FROM ft1 t1 WHERE t1.c1 === t1.c2;
 -- ===
 -- JOIN queries
 -- ===
 -- Analyze ft4 and ft5 so that we have better statistics. These tables do not
 -- have use_remote_estimate set.
 ANALYZE ft4;
 ANALYZE ft5;
 -- join two tables
 EXPLAIN (COSTS false, VERBOSE)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
- QUERY PLAN 
-
+ QUERY PLAN  

Re: [HACKERS] Pushing down sorted joins

2016-03-08 Thread Robert Haas
On Tue, Feb 23, 2016 at 7:48 AM, Ashutosh Bapat
 wrote:
> Rushabh pointed out that declarations of helper functions
> get_useful_ecs_for_relation and get_useful_pathkeys_for_relation() are part
> of FDW routines declarations rather than helper function declaration. Since
> those functions are related to this patch, the attached patch moves those
> declaration in their right place.

This patch needs to be rebased.

+   /*
+* TODO: we should worry about EPQ path but should
that path have
+* pathkeys? I guess, that's not really important
since it's just
+* going to evaluate the join from whole-row
references stuffed in the
+* corresponding EPQ slots, for which the order doesn't matter.
+*/

The pathkeys for the EPQ path don't matter.  It'll only be called to
recheck one single row, and there's only one order in which you can
return one row.

-   if (bms_equal(em->em_relids, rel->relids))
+   if (bms_is_subset(em->em_relids, rel->relids))

Why do we need this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pushing down sorted joins

2016-02-23 Thread Ashutosh Bapat
Rushabh pointed out that declarations of helper functions
get_useful_ecs_for_relation and get_useful_pathkeys_for_relation() are part
of FDW routines declarations rather than helper function declaration. Since
those functions are related to this patch, the attached patch moves those
declaration in their right place.

On Wed, Feb 17, 2016 at 5:37 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi All,
> Now that we have join pushdown support in postgres_fdw, we can leverage
> the sort pushdown mechanism for base relations to work for pushed down
> joins as well. PFA patch for the same.
>
> The code to find useful set of pathkeys and then generate paths for each
> list of pathkeys is moved into a function which is called for base
> relations and join relations, while creating respective paths. The useful
> pathkeys are same as the base relation i.e. root->query_pathkeys and
> pathkeys useful for merge join as discussed in [1].
>
> I measured performance of pushing down sort for merge joins for query
> SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val =
> ft2.val)) on (lt1.val = ft1.val) where ft1, ft2 are foreign tables, join
> between which gets pushed down to the foreign server and lt is the local
> table.
>
> Without the patch servers prefers local merge join between foreign tables
> followed by merge join with local table by getting the data sorted from the
> foreign server. But with the patch, it pushes down the foreign join and
> also gets the data sorted for local merge join. The times measured over 10
> runs of query with and without patch are
>
> With patch
>  avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
> --+--+--+--
>60310.0369 | 251.075471210925 |59895.064 |60746.496
>
> Without patch
>  avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
> --+--+--+--
>86396.6001 |  254.30988131848 |85906.606 |86742.311
>
> With the patch the execution time of the query reduces by 30%.
>
> The scripts to setup and run query and outputs of running query with and
> without patch are attached.
>
>
> [1]
> http://www.postgresql.org/message-id/CAFjFpRfeKHiCmwJ72p4=zvuzrqsau9tbfyw7vwr-5ppvrcb...@mail.gmail.com
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_join_sort_pd_v2.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Pushing down sorted joins

2016-02-17 Thread Ashutosh Bapat
Hi All,
Now that we have join pushdown support in postgres_fdw, we can leverage the
sort pushdown mechanism for base relations to work for pushed down joins as
well. PFA patch for the same.

The code to find useful set of pathkeys and then generate paths for each
list of pathkeys is moved into a function which is called for base
relations and join relations, while creating respective paths. The useful
pathkeys are same as the base relation i.e. root->query_pathkeys and
pathkeys useful for merge join as discussed in [1].

I measured performance of pushing down sort for merge joins for query
SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val =
ft2.val)) on (lt1.val = ft1.val) where ft1, ft2 are foreign tables, join
between which gets pushed down to the foreign server and lt is the local
table.

Without the patch servers prefers local merge join between foreign tables
followed by merge join with local table by getting the data sorted from the
foreign server. But with the patch, it pushes down the foreign join and
also gets the data sorted for local merge join. The times measured over 10
runs of query with and without patch are

With patch
 avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--+--+--+--
   60310.0369 | 251.075471210925 |59895.064 |60746.496

Without patch
 avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--+--+--+--
   86396.6001 |  254.30988131848 |85906.606 |86742.311

With the patch the execution time of the query reduces by 30%.

The scripts to setup and run query and outputs of running query with and
without patch are attached.


[1]
http://www.postgresql.org/message-id/CAFjFpRfeKHiCmwJ72p4=zvuzrqsau9tbfyw7vwr-5ppvrcb...@mail.gmail.com
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_join_sort_pd.patch
Description: application/download
\set num_samples 10
\set query 'SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val = ft2.val)) on (lt1.val = ft1.val)'
EXPLAIN (VERBOSE, ANALYSE) :query ;
SELECT avg_exe_time, std_dev_exe_time, min_exe_time, max_exe_time
		FROM query_execution_stats(:'query', :num_samples);
\set num_rows (1000*1000*10)
-- Create local tables (to be pointed by foreign tables)
DROP TABLE lt1 CASCADE;
CREATE TABLE lt1(val int, val2 int);
INSERT INTO lt1 SELECT i, i FROM (SELECT generate_series(1, :num_rows)) s(i);
CREATE INDEX i_lt1_val ON lt1(val);


DROP TABLE lt2 CASCADE;
CREATE TABLE lt2(val int, val2 int);
INSERT INTO lt2 SELECT i, i FROM (SELECT generate_series(1, :num_rows)) s(i);
CREATE INDEX i_lt2_val ON lt2(val);

DROP TABLE lt3 CASCADE;
CREATE TABLE lt3(val int, val2 int);
INSERT INTO lt3 SELECT i, i FROM (SELECT generate_series(1, :num_rows)) s(i);
CREATE INDEX i_lt3_val ON lt3(val);

DROP EXTENSION postgres_fdw CASCADE;
create extension postgres_fdw;
CREATE SERVER pg_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER pg_server;
CREATE FOREIGN TABLE ft1 (val int, val2 int) SERVER pg_server OPTIONS (table_name 'lt1',
use_remote_estimate 'true');
CREATE FOREIGN TABLE ft2 (val int, val2 int) SERVER pg_server OPTIONS (table_name 'lt2',
use_remote_estimate 'true');

DROP FUNCTION query_execution_stats(query text, num_samples int,
	OUT avg_exe_time float,
	OUT exec_time_dev float,
	OUT min_exe_time float,
	OUT max_exe_time float);

CREATE FUNCTION query_execution_stats(query text, num_samples int,
	OUT avg_exe_time float,
	OUT std_dev_exe_time float,
	OUT min_exe_time float,
	OUT max_exe_time float)
RETURNS record LANGUAGE plpgsql AS $$
DECLARE
	plan json;
BEGIN
	CREATE TEMPORARY TABLE query_exe_times(exe_time float); 

	-- Execute query a few times (5% of user specified runs) to warm the cache
	FOR i IN 1 .. num_samples/20 LOOP
		EXECUTE query;
	END LOOP;

	FOR i IN 1 .. num_samples LOOP
		EXECUTE 'EXPLAIN (analyze, format json) ' || query INTO plan;
		INSERT INTO query_exe_times VALUES ((plan->0->'Execution Time')::text::float);
		RAISE NOTICE 'completed % samples', i;
	END LOOP;

	SELECT avg(exe_time), stddev(exe_time), min(exe_time), max(exe_time)
		INTO avg_exe_time, std_dev_exe_time, min_exe_time, max_exe_time
		FROM query_exe_times;

	DROP TABLE query_exe_times;
END;
$$;

ANALYZE ft1;
ANALYZE ft2;
ANALYZE lt1;
ANALYZE lt2;
ANALYZE lt3;


sort_pd.out.without_patch
Description: Binary data


sort_pd.out.with_patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers